Table of Contents

Sampling

MCP sampling allows servers to request LLM completions from the client. This enables agentic behaviors where a server-side tool delegates reasoning back to the client's language model — for example, summarizing content, generating text, or making decisions.

How sampling works

  1. The server calls SampleAsync (or uses the AsSamplingChatClient adapter) during tool execution.
  2. The request is sent to the connected client over MCP.
  3. The client's SamplingHandler processes the request — typically by forwarding it to an LLM.
  4. The client returns the LLM response to the server, which continues tool execution.

Server: requesting a completion

Inject McpServer into a tool method and use the AsSamplingChatClient extension method to get an <xref:Microsoft.Extensions.AI.IChatClient> that sends requests through the connected client:

[McpServerTool(Name = "SummarizeContent"), Description("Summarizes the given text")]
public static async Task<string> Summarize(
    McpServer server,
    [Description("The text to summarize")] string text,
    CancellationToken cancellationToken)
{
    ChatMessage[] messages =
    [
        new(ChatRole.User, "Briefly summarize the following content:"),
        new(ChatRole.User, text),
    ];

    ChatOptions options = new()
    {
        MaxOutputTokens = 256,
        Temperature = 0.3f,
    };

    return $"Summary: {await server.AsSamplingChatClient().GetResponseAsync(messages, options, cancellationToken)}";
}

Alternatively, use SampleAsync directly for lower-level control:

CreateMessageResult result = await server.SampleAsync(
    new CreateMessageRequestParams
    {
        Messages =
        [
            new SamplingMessage
            {
                Role = Role.User,
                Content = [new TextContentBlock { Text = "What is 2 + 2?" }]
            }
        ],
        MaxTokens = 100,
    },
    cancellationToken);

string response = result.Content.OfType<TextContentBlock>().FirstOrDefault()?.Text ?? string.Empty;

Client: handling sampling requests

Set SamplingHandler when creating the client. This handler is called when a server sends a sampling/createMessage request.

Using an IChatClient

The simplest approach is to use CreateSamplingHandler with any <xref:Microsoft.Extensions.AI.IChatClient> implementation:

IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3");

McpClientOptions options = new()
{
    Handlers = new()
    {
        SamplingHandler = chatClient.CreateSamplingHandler()
    }
};

await using var client = await McpClient.CreateAsync(transport, options);

Custom handler

For full control, provide a custom delegate:

McpClientOptions options = new()
{
    Handlers = new()
    {
        SamplingHandler = async (request, progress, cancellationToken) =>
        {
            // Forward to your LLM, apply content filtering, etc.
            string prompt = request?.Messages?.LastOrDefault()?.Content
                .OfType<TextContentBlock>().FirstOrDefault()?.Text ?? string.Empty;

            return new CreateMessageResult
            {
                Model = "my-model",
                Role = Role.Assistant,
                Content = [new TextContentBlock { Text = $"Response to: {prompt}" }]
            };
        }
    }
};

Capability negotiation

Sampling requires the client to advertise the sampling capability. This is handled automatically — when a SamplingHandler is set, the client includes the sampling capability during initialization. The server can check whether the client supports sampling before calling SampleAsync; if sampling is not supported, the method throws InvalidOperationException.