Allow Prompt/Sampling Messages to contain multiple content blocks. by evalstate · Pull Request #198 · modelcontextprotocol/modelcontextprotocol
I'm slightly worried about allowing message content arrays w/o requiring strict message role alternation.
And very worried about the breaking change.
Most inference APIs (OpenAI's chat completions, Claude's, but also OSS ones such as HF transformers and llama.cpp) require or assume strict assistant / user alternation in messages, with message content being a single string or an array of typed parts.
The current sampling API amounts to a flattened version of this and allows consecutive repeated roles, but it is currently trivial and unambiguous to unflatten, just by grouping consecutive messages by role:
// Sampling messages
[
{"role": "user", "content": {"type": "text", "text": "Describe and enhance this pic:"}},
{"role": "user", "content": {"type": "image", "mimeType": "image/png", "data": "base64..."}},
{"role": "assistant", "content": {"type": "text", "text": "It's dull. I've spiced it up"}},
{"role": "assistant", "content": {"type": "image", "mimeType": "image/png", "data": "base64..."}},
{"role": "user", "content": {"type": "text", "text": "And then?"}}
]
Converted to OpenAI / HF-style format (content: string | ({type: "text", text: string} | ...)[]):
// OpenAI- / HF-style messages
[
{"role": "user", "content": [
{"type": "text", "text": "Describe and enhance this pic:"},
{"type": "image", "mimeType": "image/png", "data": "base64..."}
]},
{"role": "assistant", "content": [
{"type": "text", "text": "It's dull. I've spiced it up"},
{"type": "image", "mimeType": "image/png", "data": "base64..."}
]},
{"role": "user", "content": [{"type": "text", "text": "And then?"}]}
]
Now if we allow this:
[
{"role": "user", "content": [{"type": "text", "text": "content1.1"}, {"type": "text", "text": "content1.2"}]},
{"role": "user", "content": [{"type": "text", "text": "content2"}]}
]
The only way to implement it w/ actual inference APIs will be to coalesce these, losing the kinda-implied semantic grouping of content1.1 and content1.2:
[
{"role": "user", "content": [
{"type": "text", "text": "content1.1"},
{"type": "text", "text": "content1.2"},
{"type": "text", "text": "content2"}
]}
]
My take is we should:
- Have content accept a single MessageContent or an array of them, to avoid a backward-incompatible change:
type MessageContent = TextContent | ImageContent | AudioContent | EmbeddedResource;
export interface PromptMessage {
  role: Role;
  content: MessageContent | MessageContent[];
}
- Introduce a backward-compatible message role alternation rule, maybe something like:
Consecutive sub-sequences of messages with the same role MUST either all have a content with a single MessageContent, or be of length 1.
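
To make the proposed rule and the "group by role" unflattening concrete, here is a rough TypeScript sketch. It is not part of the spec or of any SDK: the content types are simplified stand-ins, and the names runsByRole, satisfiesAlternationRule and unflatten are hypothetical. It also assumes that "a single MessageContent" means non-array content, so a one-element array does not count as single.

```typescript
// Simplified, hypothetical stand-ins for the spec's content types.
type MessageContent =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: string };

type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: MessageContent | MessageContent[];
}

// Split messages into maximal runs of consecutive messages sharing a role.
function runsByRole(messages: Message[]): Message[][] {
  const runs: Message[][] = [];
  for (const m of messages) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0].role === m.role) {
      last.push(m);
    } else {
      runs.push([m]);
    }
  }
  return runs;
}

// Proposed rule: each same-role run is either of length 1, or made only of
// messages whose content is a single (non-array) MessageContent.
// Assumption: a one-element array is NOT treated as "single" here.
function satisfiesAlternationRule(messages: Message[]): boolean {
  return runsByRole(messages).every(
    (run) => run.length === 1 || run.every((m) => !Array.isArray(m.content)),
  );
}

// Unflatten into strictly alternating messages whose content is an array of
// parts, which is the shape most inference APIs accept.
function unflatten(
  messages: Message[],
): { role: Role; content: MessageContent[] }[] {
  return runsByRole(messages).map((run) => {
    const content: MessageContent[] = [];
    for (const m of run) {
      if (Array.isArray(m.content)) {
        for (const part of m.content) content.push(part);
      } else {
        content.push(m.content);
      }
    }
    return { role: run[0].role, content };
  });
}
```

Under this reading, the content1.1 / content1.2 / content2 example above is rejected by satisfiesAlternationRule, while today's flattened sampling messages pass and unflatten without ambiguity.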