tech/microsoft/azure-ai

AZURE AI

Microsoft Azure AI agentic platform skill.

production Azure OpenAI REST API, Azure AI Agent Service REST API, Semantic Kernel SDK, Copilot Studio, MCP
requires: tech/microsoft
improves: tech/microsoft

Azure AI — Agentic Platform

Microsoft's agentic AI stack in 2025/2026 centres on four layers:

| Layer | Technology | Role |
| --- | --- | --- |
| Models | Azure OpenAI (GPT-4o, o3, o1), Azure AI Foundry model catalog | Foundation models, reasoning, embeddings |
| Agents | Azure AI Agent Service | Managed threads, tool execution, file search, code interpreter |
| Orchestration | Semantic Kernel | Plugin routing, planner, multi-agent process framework |
| Surface | Copilot Studio / M365 Agents SDK | Expose agents in Teams, Outlook, Copilot Chat |

The 2nth.ai pattern: Workers AI classifies intent at the edge (minimal added latency) → Azure AI Agent handles the stateful, multi-step thread → results stream back through the Worker.

Azure OpenAI — calling from Workers

Azure OpenAI uses the same REST API shape as OpenAI but with a different base URL and api-version parameter. Auth via API key or Entra ID token.

// Chat completion — GPT-4o
const res = await fetch(
  `${env.AZURE_OPENAI_ENDPOINT}/openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'api-key': env.AZURE_OPENAI_KEY,
      // OR: Authorization: `Bearer ${await getAzureToken(env, 'https://cognitiveservices.azure.com/.default')}`
    },
    body: JSON.stringify({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: 'You are a helpful business assistant.' },
        { role: 'user', content: userMessage },
      ],
      max_tokens: 1000,
      temperature: 0.3,
    }),
  }
);
const data = await res.json();
const reply = data.choices[0].message.content;
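The Entra ID path mentioned in the comment above needs a token helper. A minimal sketch of a `getAzureToken` using the OAuth2 client-credentials flow is below; the helper name and the `AZURE_TENANT_ID` / `AZURE_CLIENT_ID` / `AZURE_CLIENT_SECRET` secret names are illustrative, not part of any SDK.

```typescript
// Hypothetical helper for the Entra ID auth path above. Assumes a service
// principal whose tenant/client/secret are stored as Worker secrets
// (the secret names here are illustrative).
interface TokenEnv {
  AZURE_TENANT_ID: string;
  AZURE_CLIENT_ID: string;
  AZURE_CLIENT_SECRET: string;
}

// Pure request builder, kept separate from the fetch so it is easy to test.
function buildTokenRequest(env: TokenEnv, scope: string) {
  return {
    url: `https://login.microsoftonline.com/${env.AZURE_TENANT_ID}/oauth2/v2.0/token`,
    body: new URLSearchParams({
      grant_type: 'client_credentials',
      client_id: env.AZURE_CLIENT_ID,
      client_secret: env.AZURE_CLIENT_SECRET,
      scope,
    }).toString(),
  };
}

async function getAzureToken(env: TokenEnv, scope: string): Promise<string> {
  const { url, body } = buildTokenRequest(env, scope);
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
  const data = await res.json() as { access_token: string };
  return data.access_token; // in production, cache in KV until near expiry
}
```

Call with the scope from the comment above: `getAzureToken(env, 'https://cognitiveservices.azure.com/.default')`.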

Current model deployments (2025/2026):

| Deployment name | Model | Best for |
| --- | --- | --- |
| gpt-4o | GPT-4o | General assistant, structured output, tool use |
| gpt-4o-mini | GPT-4o mini | Fast classification, low-cost tasks |
| o3-mini | o3-mini | Complex reasoning, coding, multi-step planning |
| o1 | o1 | Deep reasoning, long-context analysis |
| text-embedding-3-large | Embedding | RAG, semantic search, Vectorize |

Structured output (JSON schema)

body: JSON.stringify({
  model: 'gpt-4o',
  messages: [...],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'invoice_extraction',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          vendor: { type: 'string' },
          amount: { type: 'number' },
          due_date: { type: 'string' },
          line_items: { type: 'array', items: { type: 'object', properties: { description: { type: 'string' }, amount: { type: 'number' } }, required: ['description','amount'], additionalProperties: false } },
        },
        required: ['vendor','amount','due_date','line_items'],
        additionalProperties: false,
      },
    },
  },
})
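With `strict: true`, the reply content is a JSON string conforming to the schema above. A small parse-and-check sketch (the `Invoice` type name and `parseInvoice` helper are illustrative):

```typescript
// With strict JSON-schema mode, choices[0].message.content is a JSON string
// conforming to the invoice_extraction schema above; parse it into a typed object.
interface Invoice {
  vendor: string;
  amount: number;
  due_date: string;
  line_items: { description: string; amount: number }[];
}

function parseInvoice(content: string): Invoice {
  const parsed = JSON.parse(content) as Invoice;
  // Belt-and-braces check: strict mode should already guarantee these fields.
  for (const key of ['vendor', 'amount', 'due_date', 'line_items'] as const) {
    if (parsed[key] === undefined) throw new Error(`Missing field: ${key}`);
  }
  return parsed;
}
```

Usage: `const invoice = parseInvoice(data.choices[0].message.content);`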

Azure AI Agent Service

Managed agent infrastructure with persistent threads. Think OpenAI Assistants v2, hosted in Azure, with enterprise auth and Azure AI Search integration.

Create an agent

const agent = await fetch(
  `${env.AZURE_AI_PROJECT_ENDPOINT}/agents/v1.0/agents`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'api-key': env.AZURE_AI_KEY },
    body: JSON.stringify({
      model: 'gpt-4o',
      name: '2nth-erp-assistant',
      instructions: 'You are an ERP assistant for 2nth.ai clients. You can query invoices, check stock, and draft supplier communications. Always confirm before taking actions.',
      tools: [
        { type: 'file_search' },
        { type: 'code_interpreter' },
        {
          type: 'function',
          function: {
            name: 'query_invoices',
            description: 'Query open invoices from the ERP system',
            parameters: {
              type: 'object',
              properties: {
                client_name: { type: 'string', description: 'Filter by client name' },
                status: { type: 'string', enum: ['open', 'overdue', 'paid'] },
              },
            },
          },
        },
      ],
    }),
  }
).then(r => r.json());

Thread lifecycle (Worker handler)

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { message, threadId: existingThreadId } = await request.json() as {
      message: string; threadId?: string;
    };

    const base = env.AZURE_AI_PROJECT_ENDPOINT + '/agents/v1.0';
    const headers = { 'Content-Type': 'application/json', 'api-key': env.AZURE_AI_KEY };

    // 1. Get or create thread
    const threadId = existingThreadId ?? (
      await fetch(`${base}/threads`, { method: 'POST', headers, body: '{}' })
        .then(r => r.json()).then(d => d.id)
    );

    // 2. Add user message to thread
    await fetch(`${base}/threads/${threadId}/messages`, {
      method: 'POST', headers,
      body: JSON.stringify({ role: 'user', content: message }),
    });

    // 3. Create a run
    let run = await fetch(`${base}/threads/${threadId}/runs`, {
      method: 'POST', headers,
      body: JSON.stringify({ assistant_id: env.AZURE_AGENT_ID }),
    }).then(r => r.json());

    // 4. Poll until complete (handle tool calls); cap attempts so a stuck run cannot pin the Worker
    let attempts = 0;
    while (attempts++ < 60 && ['queued', 'in_progress', 'requires_action'].includes(run.status)) {
      await new Promise(r => setTimeout(r, 800));
      run = await fetch(`${base}/threads/${threadId}/runs/${run.id}`, { headers }).then(r => r.json());

      if (run.status === 'requires_action') {
        const toolCalls = run.required_action.submit_tool_outputs.tool_calls;
        const outputs = await Promise.all(toolCalls.map(async (tc: any) => {
          const args = JSON.parse(tc.function.arguments);
          let output = '';

          if (tc.function.name === 'query_invoices') {
            // Call your own Worker/D1/ERPNext
            const invoices = await env.DB.prepare(
              'SELECT * FROM invoices WHERE status = ? LIMIT 10'
            ).bind(args.status ?? 'open').all();
            output = JSON.stringify(invoices.results);
          }

          return { tool_call_id: tc.id, output };
        }));

        // Submit tool outputs
        await fetch(`${base}/threads/${threadId}/runs/${run.id}/submit_tool_outputs`, {
          method: 'POST', headers,
          body: JSON.stringify({ tool_outputs: outputs }),
        });
      }
    }

    // 5. Get the assistant reply
    const messages = await fetch(
      `${base}/threads/${threadId}/messages?order=desc&limit=1`, { headers }
    ).then(r => r.json());

    const reply = messages.data[0].content[0].text.value;
    return Response.json({ reply, threadId });
  }
};
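The handler above polls on a fixed 800 ms interval. Under Worker CPU/duration limits, a capped exponential backoff is friendlier; a sketch (helper names are illustrative):

```typescript
// Hypothetical backoff schedule for the polling loop above: start fast,
// back off exponentially, never exceed a ceiling, and bound total attempts.
function backoffDelays(attempts: number, baseMs = 400, maxMs = 5000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, maxMs));
  }
  return delays;
}

async function pollUntil<T>(
  fetchState: () => Promise<T>,
  isDone: (s: T) => boolean,
  maxAttempts = 20,
): Promise<T> {
  let state = await fetchState();
  for (const delay of backoffDelays(maxAttempts)) {
    if (isDone(state)) return state;
    await new Promise(r => setTimeout(r, delay));
    state = await fetchState();
  }
  return state; // caller should still check status; the run may be in progress
}
```

In the handler, `fetchState` would be the run-status GET and `isDone` a check for a terminal status (`completed`, `failed`, `cancelled`, `expired`).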

Semantic Kernel — orchestration layer

Semantic Kernel (SK) is Microsoft's orchestration SDK. Use it from a Cloud Run container (Node.js/Python) when you need multi-agent coordination, complex planning, or plugin chaining that's too stateful for a Worker.

Worker → Cloud Run → SK pattern

// Cloudflare Worker: classify + route
const intent = await classifyWithWorkersAI(message, env);

if (intent === 'complex_multi_step') {
  // Hand off to Cloud Run container running Semantic Kernel
  const result = await fetch(env.SK_SERVICE_URL + '/orchestrate', {
    method: 'POST',
    headers: { Authorization: `Bearer ${await getCloudRunToken(env)}` },
    body: JSON.stringify({ message, threadId, context }),
  }).then(r => r.json());
  return result;
}
// Simple → handle in Worker directly with Azure OpenAI
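The `classifyWithWorkersAI` call above is not defined in this doc; a sketch, assuming a Workers AI binding named `AI` and the Llama 3.1 8B instruct model (the prompt and intent labels mirror the routing pattern used here):

```typescript
// Sketch of the classifyWithWorkersAI helper referenced above (hypothetical).
const INTENTS = ['simple', 'complex_multi_step', 'erp', 'document'] as const;
type Intent = typeof INTENTS[number];

// Pure normalizer: the model replies with free text, so coerce it to a known
// intent label, falling back to 'simple' when nothing matches.
function normalizeIntent(raw: string): Intent {
  const cleaned = raw.trim().toLowerCase();
  return INTENTS.find(i => cleaned.includes(i)) ?? 'simple';
}

async function classifyWithWorkersAI(message: string, env: { AI: any }): Promise<Intent> {
  const out = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      { role: 'system', content: `Classify the user message as exactly one of: ${INTENTS.join(', ')}. Reply with the label only.` },
      { role: 'user', content: message },
    ],
    max_tokens: 8,
  });
  return normalizeIntent(out.response ?? '');
}
```

Keeping the normalizer pure makes the routing decision testable without a model call.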

SK plugin definition (TypeScript)

// NOTE: Semantic Kernel ships first-class SDKs for .NET, Python and Java;
// the TypeScript surface below is an illustrative sketch of the plugin pattern.
import { kernel, KernelArguments } from '@microsoft/semantic-kernel';

class ERPPlugin {
  @kernel.function({ name: 'getOpenInvoices', description: 'Get open invoices from ERPNext' })
  async getOpenInvoices(
    @kernel.parameter({ description: 'Client name filter', required: false })
    clientName?: string
  ): Promise<string> {
    const res = await fetch(`${ERP_BASE_URL}/api/method/frappe.client.get_list`, {
      headers: { Authorization: `token ${API_KEY}:${API_SECRET}` },
      // ...
    });
    return JSON.stringify(await res.json());
  }
}

// Register and invoke
const k = kernel.builder()
  .addAzureOpenAIChatCompletion('gpt-4o', endpoint, apiKey)
  .build();
k.importPluginFromObject(new ERPPlugin(), 'ERP');

const result = await k.invokePromptAsync(
  'Find all overdue invoices for {{$client}} and draft a payment reminder.',
  new KernelArguments({ client: 'Acme Corp' })
);

Copilot Studio — MCP connector

Expose any 2nth.ai Worker as a tool in Microsoft Copilot Studio via MCP (Model Context Protocol). Copilot Studio added native MCP support in 2025.

2nth Worker (MCP server)
  → Copilot Studio MCP connector
  → Microsoft 365 Copilot Chat
  → Teams / Outlook / Word

Worker as MCP server (minimal)

// Simplified sketch: a full MCP server speaks JSON-RPC over streamable HTTP,
// but the discovery + tool-call shape below is enough to illustrate the pattern.
// GET /.well-known/mcp — MCP server manifest
// POST /mcp/tools/call — tool invocation

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === '/.well-known/mcp') {
      return Response.json({
        schema_version: '2025-11-05',
        name: '2nth-erp',
        description: 'Query ERP data — invoices, stock, suppliers',
        tools: [
          {
            name: 'query_invoices',
            description: 'List invoices by status or client',
            inputSchema: {
              type: 'object',
              properties: {
                status: { type: 'string', enum: ['open','overdue','paid'] },
                client: { type: 'string' },
              },
            },
          },
        ],
      });
    }

    if (url.pathname === '/mcp/tools/call' && request.method === 'POST') {
      const { name, arguments: args } = await request.json() as any;
      // Dispatch to your tool handlers; query_invoices shown inline as an example
      const result = name === 'query_invoices'
        ? JSON.stringify((await env.DB.prepare(
            'SELECT * FROM invoices WHERE status = ? LIMIT 10'
          ).bind(args?.status ?? 'open').all()).results)
        : `Unknown tool: ${name}`;
      return Response.json({ content: [{ type: 'text', text: result }] });
    }

    return new Response('Not found', { status: 404 });
  }
};
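As the tool list grows, a dispatch table keeps the call handler flat. A sketch; the handler names and the loose `env` typing are illustrative:

```typescript
// Dispatch table for MCP tool calls: one entry per tool declared in the manifest.
type ToolHandler = (args: Record<string, unknown>, env: any) => Promise<string>;

const toolHandlers: Record<string, ToolHandler> = {
  query_invoices: async (args, env) => {
    const rows = await env.DB.prepare('SELECT * FROM invoices WHERE status = ? LIMIT 10')
      .bind((args.status as string) ?? 'open').all();
    return JSON.stringify(rows.results);
  },
};

async function dispatchTool(name: string, args: Record<string, unknown>, env: any): Promise<string> {
  const handler = toolHandlers[name];
  if (!handler) return JSON.stringify({ error: `Unknown tool: ${name}` });
  return handler(args, env);
}
```

The tool-call route then reduces to `Response.json({ content: [{ type: 'text', text: await dispatchTool(name, args, env) }] })`.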

Register in Copilot Studio

1. Copilot Studio → Your copilot → Tools → Add a tool → MCP
2. Server URL: https://your-worker.workers.dev
3. Auth: API key header (X-API-Key → your Worker secret)
4. Copilot Studio auto-discovers tools from /.well-known/mcp
5. Enable tool → test in Copilot Chat → publish to Teams/Outlook
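Step 3 above puts the shared secret in an X-API-Key header, so the Worker must check it before serving any MCP route. A sketch (the `MCP_API_KEY` secret name is illustrative); the comparison is constant-time to avoid leaking key prefixes via timing:

```typescript
// Constant-time string comparison: XOR every character so the loop cost does
// not depend on where the first mismatch occurs.
function timingSafeEqualStr(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

function isAuthorized(request: Request, expectedKey: string): boolean {
  const provided = request.headers.get('X-API-Key');
  return provided !== null && timingSafeEqualStr(provided, expectedKey);
}

// In the fetch handler, before any MCP route:
// if (!isAuthorized(request, env.MCP_API_KEY)) return new Response('Unauthorized', { status: 401 });
```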

Azure AI Search — RAG retrieval

Azure AI Search is the enterprise RAG layer. Hybrid search (vector + BM25 keyword) typically outperforms pure vector search for business document scenarios.

// Index a document
await fetch(`${env.AZURE_SEARCH_ENDPOINT}/indexes/erp-docs/docs/index?api-version=2024-07-01`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json', 'api-key': env.AZURE_SEARCH_KEY },
  body: JSON.stringify({
    value: [{
      '@search.action': 'upload',
      id: docId,
      content: documentText,
      contentVector: await getEmbedding(documentText, env), // text-embedding-3-large
      source: 'erpnext',
      clientId: 'acme-corp',
    }],
  }),
});

// Hybrid search
const results = await fetch(
  `${env.AZURE_SEARCH_ENDPOINT}/indexes/erp-docs/docs/search?api-version=2024-07-01`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'api-key': env.AZURE_SEARCH_KEY },
    body: JSON.stringify({
      search: userQuery,               // BM25 keyword
      vectorQueries: [{
        kind: 'vector',
        vector: await getEmbedding(userQuery, env),
        fields: 'contentVector',
        k: 5,
      }],
      queryType: 'semantic',
      semanticConfiguration: 'default',
      top: 5,
      select: 'id,content,source,clientId',
    }),
  }
).then(r => r.json());
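Both snippets above call a `getEmbedding` helper that is not defined here. A sketch against the Azure OpenAI embeddings endpoint, assuming a deployment named `text-embedding-3-large` on the same resource (helper and env names are illustrative):

```typescript
// Sketch of the getEmbedding helper used in the indexing and search snippets.
interface EmbedEnv { AZURE_OPENAI_ENDPOINT: string; AZURE_OPENAI_KEY: string }

// Pure URL builder, separated for testability.
function buildEmbeddingUrl(endpoint: string, deployment = 'text-embedding-3-large'): string {
  return `${endpoint}/openai/deployments/${deployment}/embeddings?api-version=2024-06-01`;
}

async function getEmbedding(text: string, env: EmbedEnv): Promise<number[]> {
  const res = await fetch(buildEmbeddingUrl(env.AZURE_OPENAI_ENDPOINT), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'api-key': env.AZURE_OPENAI_KEY },
    body: JSON.stringify({ input: text }),
  });
  if (!res.ok) throw new Error(`Embedding request failed: ${res.status}`);
  const data = await res.json() as { data: { embedding: number[] }[] };
  return data.data[0].embedding;
}
```

The index's `contentVector` field dimensions must match the deployed embedding model's output size.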

2nth.ai agentic routing pattern

User message (Web / Teams / WhatsApp)
  → Cloudflare Worker
      ↓
  Workers AI (Llama 3.1 8B) — classify intent: simple | complex | erp | document
      ↓
  [simple]    → Azure OpenAI GPT-4o mini — direct answer, stream back
  [complex]   → Azure AI Agent Service — create/resume thread, run with tools
  [erp]       → Azure AI Agent + ERPNext function tool via Cloud Run
  [document]  → Azure AI Agent + Azure AI Search file_search tool
      ↓
  D1 / KV — persist threadId per user session
  R2 — store uploaded files before indexing
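The "persist threadId per user session" step in the diagram can be sketched with KV; the key scheme and 24-hour TTL are illustrative choices, not fixed conventions:

```typescript
// Persist one agent threadId per (channel, user) session in KV, so a returning
// user resumes the same Azure AI Agent thread. TTL expires stale sessions.
function threadKey(channel: string, userId: string): string {
  return `thread:${channel}:${userId}`;
}

async function getOrStoreThreadId(
  kv: { get(k: string): Promise<string | null>; put(k: string, v: string, opts?: { expirationTtl?: number }): Promise<void> },
  channel: string,
  userId: string,
  createThread: () => Promise<string>, // POST /threads, as in the run lifecycle above
): Promise<string> {
  const key = threadKey(channel, userId);
  const existing = await kv.get(key);
  if (existing) return existing;
  const threadId = await createThread();
  await kv.put(key, threadId, { expirationTtl: 60 * 60 * 24 }); // 24h session window
  return threadId;
}
```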

Rate limits and quotas

| Service | Default quota | Notes |
| --- | --- | --- |
| GPT-4o | 450k TPM (tokens/min) | Request increase for production |
| GPT-4o mini | 2M TPM | Good for classification/routing |
| o3-mini | 150k TPM | Lower — use for complex tasks only |
| AI Agent Service | 50 concurrent runs | Scales with quota request |
| Azure AI Search | 3 indexes (free tier) | Standard S1: 50 indexes |
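When a TPM quota above is exceeded, Azure OpenAI returns HTTP 429, usually with a Retry-After header. A minimal retry wrapper that honors it (attempt counts and fallback delays are illustrative):

```typescript
// Prefer the server's Retry-After hint; fall back to capped exponential backoff.
function retryDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const headerSecs = retryAfterHeader ? Number(retryAfterHeader) : NaN;
  if (!Number.isNaN(headerSecs)) return headerSecs * 1000;
  return Math.min(1000 * 2 ** attempt, 30_000);
}

async function fetchWithRetry(input: string, init: RequestInit, maxAttempts = 4): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(input, init);
    if (res.status !== 429 || attempt >= maxAttempts - 1) return res;
    await new Promise(r => setTimeout(r, retryDelayMs(res.headers.get('Retry-After'), attempt)));
  }
}
```

Wrap the Azure OpenAI and Agent Service calls shown earlier in `fetchWithRetry` rather than plain `fetch` when running near quota.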

Gotchas