🤖 AI Model Cost Calculator

Per-Token (consumption) & Provisioned Throughput (PTU) — side by side

📊 Per-Token (Consumption) — Usage Parameters

Interactions per Developer / month

16,000

Tokens per Interaction

2,000

Input Token Percentage

30%

Output Token Percentage

70%

Number of Developers

🔍 Select AI Models

Click to select/deselect models for comparison

💰 Cost Results

📈 Model Comparison

Model	Input Price (per 1M tokens)	Output Price (per 1M tokens)	Monthly Cost	Annual Cost	Cost per Developer	Cost per 1K Interactions	Best For

⚡ Provisioned Throughput (PTU) Calculator

Real Azure PTU pricing — billed per PTU per hour (model-independent), not per token. Source: Azure Retail Prices API · serviceName eq 'Foundry Models'.

📖 PTU Deployment Types — Global vs Data Zone vs Regional

Type	Data routing	Price ($/PTU/hr)	Min PTU (× increment, Azure OpenAI)	Best for
Global	Routed across Azure regions globally	$1.00	15 (×5)	Highest availability; no region constraint
Data Zone	Stays within a geographic zone (US or EU)	$1.10	15 (×5)	Zone-level data residency + higher availability than regional
Regional	Stays in ONE specific Azure region	$2.00	50 (×50)	Strict single-region data residency / compliance

Key differences: Global is cheapest ($1/PTU/hr) and routes traffic anywhere for max availability. Data Zone keeps data within a geography (US or EU) at a slight premium. Regional guarantees a single Azure region — strictest compliance, highest price ($2/PTU/hr), larger min deployment (50 PTU).

📚 References: What is PTU? · PTU sizing & per-model values · PTU billing & reservations · All deployment types

Deployment type

Billing mode

PTUs deployed: Min 15 PTU in increments of 5 for Global and Data Zone, min 50 PTU in increments of 50 for Regional (Azure OpenAI models). Llama/DeepSeek min 100 PTU.

Hours per month (hourly mode): 730 ≈ full month. PTUs are billed whether or not requests are made.
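The hourly billing described above can be sketched as a small function. This is only an illustration: the rates, minimums, and increments are copied from the deployment-type table above (Azure OpenAI models), the names `PTU_TYPES` and `ptu_monthly_cost` are hypothetical, and live rates returned by the pricing API may differ.

```python
# Rates and deployment rules copied from the deployment-type table above
# (Azure OpenAI models). Live API rates may differ; treat these as placeholders.
PTU_TYPES = {
    "global":    {"usd_per_ptu_hour": 1.00, "min_ptu": 15, "increment": 5},
    "data_zone": {"usd_per_ptu_hour": 1.10, "min_ptu": 15, "increment": 5},
    "regional":  {"usd_per_ptu_hour": 2.00, "min_ptu": 50, "increment": 50},
}


def ptu_monthly_cost(deployment_type, ptus, hours=730):
    """Hourly-mode cost: PTUs x $/PTU/hr x hours, billed regardless of traffic."""
    rules = PTU_TYPES[deployment_type]
    if ptus < rules["min_ptu"] or ptus % rules["increment"] != 0:
        raise ValueError(
            f"{deployment_type}: need at least {rules['min_ptu']} PTU "
            f"in increments of {rules['increment']}"
        )
    return ptus * rules["usd_per_ptu_hour"] * hours


if __name__ == "__main__":
    # 15 Global PTUs for a full 730-hour month
    print(ptu_monthly_cost("global", 15))  # 10950.0
```

Because the charge does not depend on request volume, the function takes no traffic parameters at all. Only the deployment size and the hours deployed matter.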

🧮 Estimate how many PTUs your workload needs (with model selector)

Uses the official sizing formula: PTUs = ((Input TPM × (1 − cache)) + (ratio × Output TPM)) ÷ Input TPM per PTU, rounded up to the deployment increment. Values sourced from Microsoft sizing docs.

PTU-supported model (17 models — Regional Provisioned Throughput)

Peak requests / min

Avg prompt tokens

Avg response tokens

Cache rate (%)

Input TPM per PTU

Output-to-input ratio
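As a rough sketch, the sizing formula above could be coded as follows. It assumes that Input TPM and Output TPM are derived as peak requests per minute multiplied by the average prompt and response tokens; that derivation, the function name `estimate_ptus`, and the sample values are illustrative assumptions, not values from the Microsoft sizing docs.

```python
import math


def estimate_ptus(peak_rpm, avg_prompt_tokens, avg_response_tokens,
                  cache_rate, input_tpm_per_ptu, output_to_input_ratio,
                  min_ptu=15, increment=5):
    """PTUs = ((Input TPM x (1 - cache)) + (ratio x Output TPM)) / Input TPM per PTU,
    rounded up to the deployment increment and floored at the minimum size."""
    # Assumption: tokens per minute = peak requests per minute x average tokens.
    input_tpm = peak_rpm * avg_prompt_tokens
    output_tpm = peak_rpm * avg_response_tokens

    effective_tpm = input_tpm * (1 - cache_rate) + output_to_input_ratio * output_tpm
    raw_ptus = effective_tpm / input_tpm_per_ptu

    # Round up to the next multiple of the increment, then apply the minimum.
    rounded = math.ceil(raw_ptus / increment) * increment
    return max(min_ptu, rounded)


if __name__ == "__main__":
    # Hypothetical workload and per-model values, chosen only for illustration.
    print(estimate_ptus(peak_rpm=100, avg_prompt_tokens=1000,
                        avg_response_tokens=200, cache_rate=0.25,
                        input_tpm_per_ptu=2500, output_to_input_ratio=4))  # 65
```

For a Regional deployment, passing `min_ptu=50, increment=50` would apply the larger deployment step from the table above.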

🚀 Quick Scenarios

Click a scenario to apply preset parameters

🧮 Simulation Formula & Pricing Reference

📐 Cost Calculation Formula

Total Monthly Tokens:

    Monthly Interactions = Interactions per Developer × Number of Developers
    Total Tokens = Monthly Interactions × Tokens per Interaction

Input/Output Tokens:

    Input Tokens = Total Tokens × Input Percentage
    Output Tokens = Total Tokens × Output Percentage

Monthly Cost per Model:

    Monthly Cost = (Input Tokens ÷ 1,000,000 × Input Price) +
                   (Output Tokens ÷ 1,000,000 × Output Price)

Cost per Developer:

    Cost per Developer = Monthly Cost ÷ Number of Developers
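The formulas above translate fairly directly into code. The sketch below assumes that monthly interactions are the per-developer figure multiplied by the team size, and it uses made-up prices per 1M tokens. The function name `per_token_cost` and the price values are illustrative only.

```python
def per_token_cost(interactions_per_dev, tokens_per_interaction, input_pct,
                   developers, input_price_per_1m, output_price_per_1m):
    """Apply the per-token (consumption) cost formulas for one model."""
    # Assumption: team-wide interactions = per-developer interactions x developers.
    monthly_interactions = interactions_per_dev * developers
    total_tokens = monthly_interactions * tokens_per_interaction

    input_tokens = total_tokens * input_pct
    output_tokens = total_tokens * (1 - input_pct)

    monthly = (input_tokens / 1_000_000 * input_price_per_1m
               + output_tokens / 1_000_000 * output_price_per_1m)
    return {
        "total_tokens": total_tokens,
        "monthly_cost": monthly,
        "annual_cost": monthly * 12,
        "cost_per_developer": monthly / developers,
        "cost_per_1k_interactions": monthly / monthly_interactions * 1000,
    }


if __name__ == "__main__":
    # Calculator defaults (16,000 interactions, 2,000 tokens, 30% input) for
    # 10 developers; the $1 / $4 per 1M token prices are hypothetical.
    result = per_token_cost(16_000, 2_000, 0.30, 10, 1.00, 4.00)
    for key, value in result.items():
        print(f"{key}: {value:,.2f}")
```

With these placeholder prices, the 320M monthly tokens split into 96M input and 224M output tokens, which works out to roughly $992 per month, or about $99 per developer.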
                        

👥 Developer Impact

Example for 10 Developers:

Each developer makes ~16,000 interactions/month
Average 2,000 tokens per interaction
Total: 32M tokens per developer/month
Team of 10: 320M tokens/month

📊 Pricing Data Sources

Primary Source:

Azure Retail Prices API
https://prices.azure.com/api/retail/prices

Official Microsoft pricing data
Updated daily
Includes all Azure regions
Multiple currencies supported

API Query Example:

    GET https://prices.azure.com/api/retail/prices
        ?$filter=serviceFamily eq 'AI + Machine Learning'
        and serviceName eq 'Azure OpenAI Service'
        and skuName eq 'DeepSeek V3.2'
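A minimal sketch of issuing this query from Python's standard library is shown below. OData filter clauses are joined with `and`, and the filter string has to be URL-encoded so that the `+` in 'AI + Machine Learning' is not read as a space. The helper names are hypothetical, and the filter values are simply the ones from the example above, so they may not match what the catalog actually contains.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://prices.azure.com/api/retail/prices"


def build_price_url(service_family, service_name, sku_name):
    """Build a Retail Prices query URL with an OData $filter joined by 'and'."""
    odata_filter = (
        f"serviceFamily eq '{service_family}' "
        f"and serviceName eq '{service_name}' "
        f"and skuName eq '{sku_name}'"
    )
    # quote() percent-encodes spaces, quotes, and '+' inside the filter value.
    return f"{BASE_URL}?$filter={urllib.parse.quote(odata_filter)}"


def fetch_prices(url):
    """Collect price items, following NextPageLink until the last page."""
    items = []
    while url:
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        items.extend(page.get("Items", []))
        url = page.get("NextPageLink")
    return items


if __name__ == "__main__":
    url = build_price_url("AI + Machine Learning",
                          "Azure OpenAI Service", "DeepSeek V3.2")
    print(url)
    # fetch_prices(url) performs the network calls; an empty list means the
    # filter values matched no price records.
```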

Secondary Sources:

OpenAI Pricing Page
Anthropic Claude Pricing
Google Gemini Pricing
DeepSeek Official Documentation

Data Accuracy:

✅ Real-time pricing from Azure API

✅ No estimates - actual retail prices

✅ Enterprise discounts can be applied on top of retail prices

✅ Regional variations included

💡 How to Use This Calculator

Start with developers: Enter the number of developers on your team
Estimate usage: Each developer typically makes 16,000-20,000 interactions/month
Adjust tokens: Code generation uses ~2,000 tokens per interaction, documentation ~5,000 tokens
Compare models: Select multiple models to see cost differences
Apply discounts: Enterprise agreements often provide 15-30% discounts
Optimize: Use caching and prompt optimization to reduce costs by 30-40%
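The last two steps can be sketched as a simple adjustment to a retail monthly cost. Stacking the two reductions multiplicatively is an assumption made for illustration, the function name `adjusted_cost` is hypothetical, and real agreements and caching gains vary.

```python
def adjusted_cost(retail_cost, discount=0.0, optimization_savings=0.0):
    """Apply an enterprise discount and an optimization saving to a retail cost.

    Assumption: optimization first shrinks the billable usage, then the
    discount applies to what remains, so the two factors multiply.
    """
    for name, value in (("discount", discount),
                        ("optimization_savings", optimization_savings)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return retail_cost * (1 - optimization_savings) * (1 - discount)


if __name__ == "__main__":
    # Low end of both quoted ranges: 15% discount, 30% optimization saving.
    print(round(adjusted_cost(1000.0, discount=0.15, optimization_savings=0.30), 2))
```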