[email protected], [email protected], [email protected], [email protected], [email protected]
https://github.com/webmachinelearning/prompt-api/blob/main/README.md
None yet, although some of the shared infrastructure in https://webmachinelearning.github.io/writing-assistance-apis/#supporting will be used.
An API designed for interacting with an AI language model using text, image, and audio inputs. It supports various use cases, from generating image captions and performing visual searches to transcribing audio, classifying sound events, generating text following specific instructions, and extracting information or insights from text. It also supports structured outputs, which constrain responses to a predefined format (typically expressed as a JSON schema), improving conformance and simplifying integration with downstream applications that expect standardized output.
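For illustration, here is a minimal sketch of the structured-output flow. The names (`LanguageModel.create`, `prompt` with a `responseConstraint` option) follow the explainer and may change during the origin trial; the schema itself is a made-up example.

```javascript
// Sketch only: API names follow the prompt-api explainer and may change.
const sentimentSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    confidence: { type: "number", minimum: 0, maximum: 1 },
  },
  required: ["sentiment", "confidence"],
};

async function classifySentiment(text) {
  // Feature-detect: not every browser (or device) ships a language model.
  if (typeof LanguageModel === "undefined") return null;
  const session = await LanguageModel.create();
  const raw = await session.prompt(
    `Classify the sentiment of this review: ${text}`,
    { responseConstraint: sentimentSchema },
  );
  // The response is constrained to the schema, so parsing it is safe.
  return JSON.parse(raw);
}
```

Because the response is constrained to the schema rather than being free-form text, downstream code can consume it without model-specific parsing.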
This API is also exposed in Chrome Extensions, currently as an Origin Trial. This Intent is for exposure as an Origin Trial on the web.
https://github.com/w3ctag/design-reviews/issues/1093
Pending
This feature, like all built-in AI features, has inherent interoperability risks due to the use of AI models whose behavior is not fully specified. See some general discussion in https://www.w3.org/reports/ai-web-impact/#interop.
In particular, because the output in response to a given prompt varies by language model, it is possible for developers to write brittle code that relies on specific output formats or quality, and does not work across multiple browsers or multiple versions of the same browser.
There are some reasons to be optimistic that web developers won't write such brittle code. Language models are inherently nondeterministic, so it is difficult to create dependencies on their exact output. And many users will not have the hardware necessary to run a language model, so developers will need to treat the prompt API as a progressive enhancement, or provide appropriate fallback to cloud services.
Several parts of the API design help steer developers in the right direction as well. The API has clear availability-testing features, and requires developers to state their required capabilities (e.g., modalities and languages) up front. Most importantly, the structured outputs feature helps developers avoid brittle code that relies on specific output formats.
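As a sketch of the availability-testing pattern described above (option names and return values follow the explainer and may differ in the shipped trial):

```javascript
// Sketch only: availability() and its options follow the prompt-api explainer.
async function canPromptWithImages() {
  if (typeof LanguageModel === "undefined") return false;
  // Required capabilities are stated up front; the browser reports one of
  // "unavailable", "downloadable", "downloading", or "available".
  const availability = await LanguageModel.availability({
    expectedInputs: [{ type: "image" }],
    expectedOutputs: [{ type: "text", languages: ["en"] }],
  });
  return availability !== "unavailable";
}
```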
Gecko: No signal (https://github.com/mozilla/standards-positions/issues/1213)
WebKit: No signal (https://github.com/WebKit/standards-positions/issues/495)
Web developers: Strongly positive (https://github.com/webmachinelearning/prompt-api/blob/main/README.md#stakeholder-feedback)
Other signals: We are also working with Microsoft Edge developers on this feature, with them contributing the structured output functionality.
This feature would definitely benefit from having polyfills, backed by any of: cloud services, lazily-loaded client-side models using WebGPU, or the web developer's own server. We anticipate seeing an ecosystem of such polyfills grow as more developers experiment with this API.
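One possible polyfill shape, shown here with a hypothetical `/api/llm` endpoint standing in for a cloud service or the developer's own server:

```javascript
// Prefer the built-in model when it is present and ready.
async function hasUsableBuiltInModel() {
  return typeof LanguageModel !== "undefined" &&
    (await LanguageModel.availability()) === "available";
}

// Otherwise fall back to a hypothetical developer-provided endpoint.
async function promptWithFallback(text) {
  if (await hasUsableBuiltInModel()) {
    const session = await LanguageModel.create();
    return session.prompt(text);
  }
  const res = await fetch("/api/llm", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: text }),
  });
  return (await res.json()).output;
}
```

A fuller polyfill could instead lazily load a client-side model via WebGPU behind the same interface.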
Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications?
None
Validate the technical implementation and developer experience of multimodal inputs with a broader audience and actual usage.
Assess how structured output improves ergonomics and could address interoperability concerns between implementations (e.g. different underlying models).
Gather extensive feedback from a wide range of web developers rooted in real world usage.
Identify diverse and innovative use cases to inform a roadmap of task APIs.
None
It is possible that giving DevTools more insight into the nondeterministic states of the model, e.g. random seeds, could help with debugging. See discussion at https://github.com/webmachinelearning/prompt-api/issues/74.
We also have some internal debugging pages which give more detail on the model's status, e.g. chrome://on-device-internals, and parts of these might be suitable to port into DevTools.
No
Not all platforms will come with a language model. In particular, in the initial stages we are focusing on Windows, Mac, and Linux.
No
We plan to write web platform tests for the API surface as much as possible. The core responses from the model will be difficult to test, but some facets are testable, e.g. the adherence to structured output response constraints.
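For example, a test can assert schema adherence without depending on the model's exact wording. A hand-rolled check against a hypothetical sentiment schema might look like:

```javascript
// Illustrative check: the exact text of a model response is untestable,
// but adherence to a structured-output constraint is.
function conformsToSentimentSchema(raw) {
  let value;
  try {
    value = JSON.parse(raw);
  } catch {
    return false;
  }
  return (
    typeof value === "object" &&
    value !== null &&
    ["positive", "neutral", "negative"].includes(value.sentiment)
  );
}
```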
prompt-api-for-gemini-nano-multimodal-input
AIPromptAPIMultimodalInput
True
https://issues.chromium.org/issues/417530643
We have various use counters for the API, e.g. LanguageModel_Create.
Does the feature depend on any code or APIs outside the Chromium open source repository and its open-source dependencies to function?
Yes: this feature depends on a language model, which is bridged to the open-source parts of the implementation via the interfaces in //services/on_device_model.
Open questions about a feature may be a source of future web compat or interop issues. Please list open issues (e.g. links to known issues in the project for the feature specification) whose resolution may introduce web compat/interop risk (e.g., changing to naming or structure of the API in a non-backward-compatible way).
https://github.com/webmachinelearning/prompt-api/issues/42 is somewhat worth keeping an eye on, but we believe a forward-compatible approach is possible by just providing constant min = max values.
https://chromestatus.com/feature/5134603979063296?gate=5106702730657792
This intent message was generated by Chrome Platform Status.
LGTM to experiment from M139 to M144 inclusive.