Mosaic AI Foundation Model Serving

Serve state-of-the-art foundation models for both real-time and batch inference workloads. This lets you quickly build applications on high-quality generative AI models without maintaining your own model deployment.

* Displayed pricing does not guarantee product availability in a given region. For product availability, see the availability information for AWS, Azure, GCP, and SAP.
1. Azure Databricks, as a first-party service on Microsoft Azure, offers unified billing and support through Microsoft. The Premium tier on Azure Databricks corresponds to the Enterprise tier on AWS and GCP.
2. Hourly pricing is billed in per-minute increments.
3. Throughput in a single unit of provisioned throughput (PT) capacity varies by model and query shape (the mix of input and output tokens). Use the GenAI Calculator to estimate workload-specific throughput and total cost.
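Footnote 2 means that an hourly DBU rate is prorated by the minute. The Python sketch below estimates the DBUs consumed for a given runtime. It assumes that a partial minute is rounded up to a full minute; the source does not say how partial minutes are handled, and the function names are hypothetical.

```python
import math


def billed_minutes(runtime_seconds: int) -> int:
    """Convert a runtime in seconds to billable minutes.

    ASSUMPTION (not stated in the pricing page): a partial minute
    is rounded up to a whole minute.
    """
    if runtime_seconds < 0:
        raise ValueError("runtime cannot be negative")
    return math.ceil(runtime_seconds / 60)


def dbu_for_runtime(dbu_per_hour: float, runtime_seconds: int) -> float:
    """Prorate an hourly DBU rate over the billable minutes."""
    return dbu_per_hour * billed_minutes(runtime_seconds) / 60


# Example: a rate of 85.714 DBU/h (a value from the table) for 90 minutes.
print(dbu_for_runtime(85.714, 90 * 60))
```

Under this assumption, 90 minutes at 85.714 DBU/h comes to about 128.571 DBU, and running one second past 90 minutes would bill a 91st minute.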

Foundation Model Serving: DBU Pricing and Throughput

Model	Pay-per-token: DBU per 1M input tokens (global)	Pay-per-token: DBU per 1M output tokens (global)	Provisioned throughput, scaling bands¹: DBU/h (global)	Provisioned throughput, scaling bands¹: throughput band² (max tokens/s)	Provisioned throughput, entry band³ (base models only): DBU/h (global)	Provisioned throughput, entry band³ (base models only): max tokens/s
Current models
Llama 3.1 405B	35.714	142.857	600.000	3,400	150.000	850
Llama 4 Maverick	7.143	21.429	85.715	3,875	85.715	3,875
Llama 3.3 70B	7.143	21.429	342.857	10,500	85.714	2,600
Llama 3.1 70B	N/A	N/A	342.857	10,500	85.714	2,600
Llama 3.1 8B	2.143	6.429	106.000	23,000	53.571	11,500
Llama 3.2 3B	N/A	N/A	92.857	22,000	46.429	10,900
Llama 3.2 1B	N/A	N/A	85.714	35,000	42.857	15,800
GTE	1.857	N/A	20.000	9,450	20.000	9,450
BGE Large	1.429	N/A	24.000	11,800	24.000	11,800
Older models
DBRX	N/A	N/A	171.429	650	171.429	650
Llama 3 70B	N/A	N/A	212.143	1,000	212.143	1,000
Llama 3 8B	N/A	N/A	106.000	3,000	106.000	3,000
Llama 2 70B	N/A	N/A	290.800	1,200	290.800	1,200
Llama 2 13B	N/A	N/A	112.000	980	112.000	980
Mixtral 8x7B	N/A	N/A	290.857	620	290.857	5,000
MPT-30B	N/A	N/A	112.000	450	112.000	450
MPT-7B	N/A	N/A	20.000	2,450	20.000	2,450

¹: The throughput figures are illustrative and based on a typical real-time use case with 3,500 input tokens and 300 output tokens. Actual throughput varies with the use case, query shape, and other factors. Input/output ratios do not apply to embedding models.

²: The throughput band is a model-specific maximum throughput (tokens per second) provisioned at the hourly price listed above. With provisioned throughput, model throughput is provisioned in increments of the model's throughput band. For higher throughput, the customer must provision a suitable multiple of the throughput band, which is billed at the same multiple of the hourly price.

³: The entry band is available only on AWS in the US, Canada, and Brazil, and on Azure in the US, Canada, Brazil, and the EU. It is not available for fine-tuned versions of the base models.
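Footnote ² implies that provisioned capacity is purchased in whole multiples of a model's throughput band. The Python sketch below, with hypothetical function names, rounds a target throughput up to the next whole band and scales the hourly DBU rate accordingly. Because the band figures are illustrative for a 3,500-input/300-output token workload (footnote ¹), treat the result as a rough estimate rather than a quote.

```python
import math


def bands_needed(target_tokens_per_s: float, band_tokens_per_s: float) -> int:
    """Smallest whole number of throughput bands that covers the target."""
    if target_tokens_per_s <= 0 or band_tokens_per_s <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(target_tokens_per_s / band_tokens_per_s)


def provisioned_dbu_per_hour(
    target_tokens_per_s: float, band_tokens_per_s: float, band_dbu_per_hour: float
) -> float:
    """Hourly DBU cost: the band's hourly rate times the number of bands."""
    return bands_needed(target_tokens_per_s, band_tokens_per_s) * band_dbu_per_hour


# Llama 3.3 70B scaling band from the table: 10,500 tokens/s at 342.857 DBU/h.
bands = bands_needed(25_000, 10_500)
print(bands, provisioned_dbu_per_hour(25_000, 10_500, 342.857))
```

A 25,000 tokens/s target needs three bands, since two would cover only 21,000 tokens/s, which gives about 1,028.571 DBU/h.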

Pay-as-you-go billing with a 14-day free trial, or contact us for committed-use discounts or custom requirements.
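With pay-per-token billing, DBU consumption scales linearly with token counts: input and output tokens are each priced per million tokens. The Python sketch below copies three rows of rates from the table into a hypothetical `pay_per_token_dbu` helper. Embedding models such as GTE have no output-token price (N/A), so the sketch rejects output tokens for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PayPerTokenRate:
    """DBU per 1M tokens; output_dbu is None where the table lists N/A."""
    input_dbu: float
    output_dbu: Optional[float]


# Rates copied from the pricing table (DBU per 1M tokens, global).
RATES = {
    "Llama 3.3 70B": PayPerTokenRate(7.143, 21.429),
    "Llama 3.1 8B": PayPerTokenRate(2.143, 6.429),
    "GTE": PayPerTokenRate(1.857, None),
}


def pay_per_token_dbu(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the DBUs consumed by a pay-per-token workload."""
    rate = RATES[model]
    dbu = input_tokens / 1_000_000 * rate.input_dbu
    if output_tokens:
        if rate.output_dbu is None:
            raise ValueError(f"{model} has no output-token price (embedding model)")
        dbu += output_tokens / 1_000_000 * rate.output_dbu
    return dbu


print(pay_per_token_dbu("Llama 3.3 70B", 2_000_000, 500_000))
```

For example, 2 million input tokens and 0.5 million output tokens on Llama 3.3 70B come to 2 × 7.143 + 0.5 × 21.429 = 25.0005 DBU.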

Frequently asked questions about Mosaic AI Foundation Model Serving