DrMKZ

Hati hidup


Evolution of compute cost to tokens


22.4.2026

This was done using Big Pickle in Opencode.

Between March 2023 and April 2026, the cost of accessing AI language models has dropped by approximately 99.7%. What started as a premium service charging $30-60 per million tokens has transformed into a commodity market where quality AI access costs less than a penny per million tokens. This article traces the evolution, examines the drivers, and explores what comes next.

In March 2023, OpenAI launched GPT-4 at $30 per million input tokens and $60 per million output tokens. This was the most capable AI model ever released, and the price reflected that prestige. For the first time, developers could access GPT-4 class intelligence through an API—but at a steep cost.

A typical 1,000-token request (roughly 750 words) cost approximately:

  • Input: $0.03
  • Output: $0.06

For startups building AI applications, this meant every user conversation could cost 3-6 cents in API fees alone. Many companies found that scaling AI features would quickly become prohibitively expensive.
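In code, the arithmetic above is trivial but worth making explicit. A minimal sketch using the GPT-4 launch prices just quoted (`request_cost` is an illustrative helper, not an SDK call):

```python
# GPT-4 launch pricing (March 2023), dollars per million tokens.
INPUT_PRICE_PER_M = 30.0
OUTPUT_PRICE_PER_M = 60.0

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API request (illustrative helper)."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 1,000-token request (roughly 750 words):
print(request_cost(1_000, 0))  # input side:  0.03
print(request_cost(0, 1_000))  # output side: 0.06
```

At 2026 commodity rates (say $0.075 per million input tokens), the same 1,000-token input costs $0.000075, which is why per-request cost effectively stopped mattering.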

First Wave: Competition Enters the Market (2023-2024)

June 2023 – The First Cuts

OpenAI reduced GPT-3.5 pricing, but the market was about to get more competitive.

July 2023 – Claude 2 Enters

Anthropic released Claude 2 at $8 per million input tokens and $24 per million output tokens—significantly cheaper than GPT-4 while offering competitive capabilities.

December 2023 – Google Crashes the Party

Google launched Gemini 1.0 Pro at an astonishing $0.50 per million input tokens—60x cheaper than GPT-4. This marked the beginning of the price war.

March 2024 – The Budget Tier Emerges

Claude 3 Haiku launched at $0.25 per million input tokens, bringing entry-level AI costs down to fractions of a cent.

Second Wave: The Efficiency Revolution (2024-2025)

May 2024 – OpenAI Responds

GPT-4o launched at $5 per million input tokens—an 83% reduction from GPT-4. Gemini 1.5 Flash launched at $0.35 per million input tokens.

July 2024 – Mini Models Take Over

GPT-4o mini debuted at just $0.15 per million input tokens—200x cheaper than GPT-4 from 16 months earlier. This established the “mini” model tier as the new budget option.

August 2024 – Flash Drives to Near-Zero

Gemini 1.5 Flash reduced to $0.075 per million input tokens—cheaper than any competitor and effectively free for most use cases.

December 2024 – DeepSeek Disrupts

DeepSeek V3 launched at $0.14 per million input tokens, demonstrating that open-source models could match proprietary quality at a fraction of the cost.

Third Wave: The Commoditization (2025-2026)

Early 2025 – Flash Wars

Gemini 2.0 Flash launched at $0.10 per million input tokens. OpenAI responded with GPT-4.1 nano at $0.10 per million input tokens.

March 2025 – Frontier Gets Cheaper

GPT-4o reduced to $2.50 per million input tokens (from $5). Claude 3.5 Sonnet reduced to $3.00 per million input tokens (from $15).

Early 2026 – Current State

  • GPT-5.4: $2.50/$10 per million tokens
  • Claude Opus 4.6: $5.00/$25 per million tokens
  • Gemini 2.5 Pro: $1.25/$10 per million tokens
  • Gemini 2.0 Flash-Lite: $0.075/$0.30 per million tokens
  • DeepSeek V3.2: $0.28/$0.42 per million tokens

The Numbers: A Timeline View

| Period | Model | Input $/1M | Output $/1M | Reduction |
|---|---|---|---|---|
| Mar 2023 | GPT-4 | $30.00 | $60.00 | Baseline |
| Jul 2023 | Claude 2 | $8.00 | $24.00 | 73% |
| Dec 2023 | Gemini 1.0 Pro | $0.50 | $1.50 | 98% |
| Mar 2024 | Claude 3 Haiku | $0.25 | $1.25 | 99% |
| Jul 2024 | GPT-4o mini | $0.15 | $0.60 | 99.5% |
| Aug 2024 | Gemini 1.5 Flash | $0.075 | $0.30 | 99.75% |
| 2026 | Gemini 2.0 Flash-Lite | $0.075 | $0.30 | 99.75% |

Total reduction from peak: ~99.7%

Why Prices Fell: Three Major Forces

1. Intense Competition

When GPT-4 launched in 2023, OpenAI had almost no competition. By 2026, more than 15 major providers compete for market share, and that competition has driven prices down relentlessly.

2. Hardware Efficiency

NVIDIA’s GPU improvements, custom AI accelerators from Google (TPU), and specialized inference chips from multiple vendors increased throughput 10-100x while reducing per-token compute costs.

3. Model Architecture Advances

  • Mixture of Experts (MoE): Only activates relevant “experts” per token, dramatically reducing compute
  • Knowledge distillation: Smaller models that perform nearly as well as larger ones
  • Quantization: More efficient numeric representations reduce memory and compute needs
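Quantization is the easiest of these to see concretely. A toy sketch of the memory saving from storing weights as 8-bit integers instead of 32-bit floats (the numbers are made up; production schemes use per-channel scales and calibration, which this deliberately omits):

```python
import struct

# A toy "weight matrix": 1,000 parameters (values are arbitrary).
weights = [0.001 * i for i in range(1000)]

# float32 storage: 4 bytes per weight.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))

# Naive symmetric int8 quantization: scale into [-127, 127], 1 byte each.
scale = max(abs(w) for w in weights) / 127
quantized = bytes(round(w / scale) & 0xFF for w in weights)
int8_bytes = len(quantized)

print(fp32_bytes, int8_bytes)  # 4000 1000 -> a 4x memory reduction
```

The 4x storage saving, and the matching drop in memory bandwidth per token, is a large part of what translates into cheaper inference.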

The Paradox: Why Your AI Bill Still Grew

Here’s the surprising twist: despite per-token prices dropping 99.7%, many organizations report higher total AI spending in 2025-2026.

Why? Three factors:

  1. Usage Explosion: When prices fall, usage grows exponentially. Companies build AI into more products.
  2. Agentic Workloads: AI agents use 10-50x more tokens than simple chat. A single agentic task might involve:
    • System prompts: 1,000-5,000 tokens
    • Tool definitions: 500-3,000 tokens
    • Retrieved context: 1,000-20,000 tokens
    • Multiple LLM calls: 5-20+ per task
  3. Hidden Costs: Tool calls, vector storage, orchestration, monitoring—these add 2-3x to apparent token costs.
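Plugging mid-range values from the ranges above into a quick estimate (all figures illustrative) shows how fast agentic token counts climb:

```python
# Mid-range values from the ranges above (all figures illustrative).
per_call_tokens = {
    "system_prompt": 3_000,       # 1,000-5,000
    "tool_definitions": 1_750,    # 500-3,000
    "retrieved_context": 10_000,  # 1,000-20,000
}
calls_per_task = 12               # 5-20+ LLM calls per task

tokens_per_task = sum(per_call_tokens.values()) * calls_per_task
print(tokens_per_task)  # 177000 -- vs ~2,000 for a simple chat turn

# Even at a budget rate of $0.15 per million input tokens:
cost_per_task = tokens_per_task * 0.15 / 1_000_000
print(cost_per_task)  # roughly $0.027 per task; at scale it adds up fast
```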

What This Means for Developers

The Economics Have Transformed

| Task | 2023 Cost | 2026 Cost |
|---|---|---|
| 1M tokens processed | $30-60 | $0.075-3.00 |
| 1 hour @ 100 tok/sec | $10.80-21.60 | $0.027-1.08 |
| 10,000 daily users (100 msg/day) | $30,000-60,000/mo | $75-3,000/mo |

Best Practices for 2026

  1. Start with cheap models: Gemini Flash or GPT-4.1 nano are “good enough” for 80% of tasks
  2. Use routing: Route simple queries to cheap models, complex ones to premium
  3. Optimize prompts: Smaller prompts = direct cost savings
  4. Cache aggressively: Most LLM calls have significant redundancy
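Routing (point 2) doesn't need to be sophisticated to pay off. A deliberately crude sketch: the model names echo this article, and the length/keyword heuristic is purely illustrative (production routers usually use a small classifier model):

```python
# Crude cost-aware router. Model names follow this article; the
# length/keyword heuristic is illustrative only.
CHEAP, PREMIUM = "gemini-flash", "gpt-5.4"
COMPLEX_MARKERS = ("prove", "debug", "architecture", "step by step")

def route(query: str) -> str:
    """Pick a model tier for a query (toy heuristic)."""
    q = query.lower()
    if len(q) > 500 or any(marker in q for marker in COMPLEX_MARKERS):
        return PREMIUM
    return CHEAP

print(route("What is the capital of France?"))           # gemini-flash
print(route("Debug this race condition step by step."))  # gpt-5.4
```

If 80% of traffic lands on the cheap tier, the blended per-token price sits close to the cheap model's price rather than the premium one's.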

The Future: What’s Next?

Price Predictions (2026-2028)

Industry analysts project another 30-50% reduction in token prices over the next 2-3 years, driven by:

  • Next-generation hardware: Blackwell GPUs, custom silicon
  • Agentic optimization: Models designed for agents, not just chat
  • Open-source pressure: DeepSeek, Llama, and others pushing prices down

Emerging Trends

  1. Per-task pricing: Pay based on actual work done, not just tokens
  2. Hybrid inference: Combining cloud and local for cost-efficiency
  3. Specialized models: Cheap models for specific domains (coding, math, etc.)

Key Takeaways

  1. 99.7% price reduction: From $30/M to $0.075/M input tokens in 3 years
  2. Competition drove prices: Multiple major providers fighting for market share
  3. Hardware + architecture: MoE, quantization, and specialized chips enabled efficiency
  4. Usage grew despite lower prices: Total AI spending increased 3x from 2024-2025
  5. The frontier holds: Best models still cost premium—price decline is at “commodity” tier

Conclusion

The transformation from compute costs to token costs represents one of the fastest price declines in technology history. What was once a premium service accessible only to well-funded startups is now a utility available to anyone with a credit card—and in many cases, free.

For developers and businesses, this means AI is no longer a cost center to optimize around—it’s a building block to design with. The question is no longer “can we afford AI?” but rather “how quickly can we integrate it?”

The era of expensive AI is over. The era of abundant, cheap, capable AI is here.

…..

What about tokens specifically in the medical world?

Well, I am in that world.

Medical-Specific AI Models

Google MedLM (Med-PaLM 2)

Background:

  • Developed by Google Research
  • First medical AI to reach passing scores on USMLE-style exam questions (Med-PaLM: 67.6% in 2022; Med-PaLM 2: 86.5% in 2023)
  • Powers MedLM family on Google Cloud Vertex AI
  • Based on PaLM architecture, fine-tuned for medical domain

Availability:

  • MedLM-medium: Faster throughput, more recent data
  • MedLM-large: More capable, complex tasks
  • Access: restricted to allowlisted customers (US, Brazil, Singapore)
  • Requires Google Cloud account team contact

Important Limitation:

  • NOT designed for clinical diagnosis
  • Output must be verified by healthcare professionals
  • Non-clinical use only under Service Specific Terms
  • HIPAA-compliant but requires proper BAA

Pricing:

  • Uses Google Vertex AI pricing structure
  • Pay-per-character basis
  • Custom pricing through enterprise contracts
  • No public consumer pricing

Amazon HealthScribe

Background:

  • AWS service for clinical documentation
  • AI-powered medical scribe
  • Converts patient-provider conversations to clinical notes
  • HIPAA-eligible

Pricing (Pay-as-you-go):

  • $0.001667 per audio second
  • ~$1.50 for a 15-minute consultation
  • No retention of audio or output text
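The ~$1.50 figure follows directly from the per-second rate. A quick check in Python (the helper function is just for illustration):

```python
# HealthScribe-style per-second audio pricing (rate from above).
RATE_PER_AUDIO_SECOND = 0.001667  # dollars

def consultation_cost(minutes: float) -> float:
    """Cost of one recorded consultation (illustrative helper)."""
    return minutes * 60 * RATE_PER_AUDIO_SECOND

print(round(consultation_cost(15), 2))  # 1.5 -> the ~$1.50 quoted above
```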

Features:

  • Real-time transcription
  • Medical terminology recognition
  • Integration with Epic EHR
  • Privacy-focused (data not used for training)

Microsoft Dragon Copilot

Background:

  • Built into Microsoft Cloud for Healthcare
  • Ambient clinical documentation
  • Voice-enabled AI assistant

Pricing:

  • $600-800 per month per provider
  • 1-3 year contracts
  • Integrated with Microsoft 365

Features:

  • Real-time documentation
  • Clinical decision support
  • Integration with existing EHR systems

Healthcare Natural Language API (Google)

Purpose: Extract healthcare entities from medical text

Pricing (Text Record Volume):

  • Based on text record monthly volume
  • Character-based tiering
  • First 25,000 records free
  • Enterprise pricing for larger volumes

Cloud Healthcare API (Google)

Purpose: FHIR, HL7, DICOM interoperability

Pricing:

  • Storage: $0.004/GB/month (structured)
  • Requests:
    • First 25,000/month: free
    • 25,001-1M: $0.39 per 100,000
    • 1M+: $0.29 per 100,000
  • HIPAA-eligible
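Tiered pricing like this is easy to miscalculate by applying a single tier's rate to the whole volume; each rate applies only to the requests inside that tier. A sketch using the rates listed above (the helper function is hypothetical):

```python
def monthly_request_cost(requests: int) -> float:
    """Marginal-tier cost under the schedule above:
    first 25,000 free, then $0.39/100k up to 1M, $0.29/100k beyond."""
    billable = max(0, requests - 25_000)       # everything past the free tier
    tier1 = min(billable, 1_000_000 - 25_000)  # portion below the 1M mark
    tier2 = max(0, requests - 1_000_000)       # portion above 1M
    return tier1 / 100_000 * 0.39 + tier2 / 100_000 * 0.29

print(monthly_request_cost(20_000))     # 0.0 -- inside the free tier
print(monthly_request_cost(1_000_000))  # 975,000 billable at $0.39/100k
print(monthly_request_cost(2_000_000))  # plus 1M more at $0.29/100k
```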

Comparison: Healthcare AI vs. General AI

| Aspect | General AI | Healthcare AI |
|---|---|---|
| Price | $0.075-30/M tokens | Custom/Enterprise |
| HIPAA | Not required | Required |
| FDA/Reg | None | May apply |
| Training Data | General web | Medical literature |
| Accuracy | Good | Medical-grade |
| Liability | User's risk | Significant |

Cost Considerations for Healthcare Organizations

Direct Costs

API/Usage Fees:

  • Per-character or per-token pricing
  • Volume-based enterprise discounts
  • Monthly minimums common

Integration:

  • EHR integration: $10,000-100,000+
  • Custom development: $50,000-500,000+
  • Maintenance: $10,000-50,000/year

Hidden Costs

Compliance:

  • HIPAA Business Associate Agreement
  • Security audits
  • Privacy impact assessments

Training:

  • Staff training on AI tools
  • Workflow re-engineering
  • Ongoing quality monitoring

Verification:

  • Clinician review time
  • Error tracking and correction
  • Liability insurance

Medical AI Pricing Examples

Scenario 1: Small Practice (2 Providers)

| Service | Monthly Cost |
|---|---|
| Dragon Copilot | $1,200-1,600 |
| EHR integration | $200-500 |
| Training | $500 |
| Total | $1,900-2,600/month |

Scenario 2: Hospital System (50 Providers)

| Service | Monthly Cost |
|---|---|
| MedLM (API) | $5,000-15,000 |
| HealthScribe | $3,000-5,000 |
| Infrastructure | $10,000-20,000 |
| Integration | $5,000-10,000 |
| Training | $2,000-5,000 |
| Total | $25,000-55,000/month |

Medical AI vs. General AI: Key Differences

Why Healthcare AI Costs More

  1. Specialized Training
    • Medical literature, clinical notes
    • USMLE, medical board exams
    • Domain-specific fine-tuning
  2. Regulatory Compliance
    • HIPAA required
    • FDA device classification may apply
    • Audit trails and logging
  3. Liability
    • Medical malpractice exposure
    • Requires human oversight
    • Error documentation
  4. Integration Complexity
    • EHR compatibility
    • Clinical workflows
    • Data standards (FHIR, HL7)

Leading Healthcare AI Providers

By Category

| Category | Provider | Product | Pricing |
|---|---|---|---|
| Foundation Models | Google | MedLM | Enterprise |
| Clinical Docs | Microsoft | Dragon Copilot | $600-800/user/mo |
| Clinical Docs | Amazon | HealthScribe | $0.001667/audio-sec |
| Transcription | Nuance | DAX | Enterprise |
| CDSS | IBM | Watson Health | Enterprise |
| Imaging | Google | Med-PaLM | Enterprise |

The Future of Healthcare AI Pricing

Trends to Watch (2026-2028)

  1. Commoditization: General AI models may meet medical accuracy standards at lower costs
  2. Specialization: More domain-specific medical models at competitive prices
  3. Regulation: FDA guidance on LLM medical devices may increase compliance costs but add clarity
  4. Competition: More players entering healthcare AI could drive prices down 30-50%

Predictions

  • Healthcare AI to follow general AI pricing trajectory (60-80% reduction over 3 years)
  • More SaaS pricing models replacing enterprise contracts
  • Per-visit or per-patient pricing models emerging
  • Insurance reimbursement for AI-assisted care

Key Takeaways for Healthcare Organizations

  1. Don’t use consumer AI for clinical decisions: General AI models lack medical training and liability coverage
  2. Budget for more than API costs: Integration, training, compliance, and verification add 2-5x to direct costs
  3. Look for HIPAA-eligible services: Ensure proper BAA agreements with vendors
  4. Plan for human oversight: All current healthcare AI requires clinician verification
  5. Start with non-clinical uses: Documentation, scheduling, billing AI have lower risk
  6. Enterprise pricing is negotiable: Contact vendors directly, especially for larger deployments

Conclusion

Healthcare AI is following a similar trajectory to general AI but with unique challenges. While prices haven’t collapsed as dramatically as consumer AI, the trend is downward. Organizations should:

  • Evaluate specific use cases
  • Start with low-risk applications
  • Plan for human-in-the-loop workflows
  • Budget comprehensively beyond API costs

The medical AI market will mature rapidly through 2026-2028, with prices likely following the broader market’s 60-80% reduction pattern as competition intensifies and regulatory clarity improves.

Why use tokens? – this answer was generated using DeepSeek.

Great question! Using tokens instead of raw characters or whole words is a deliberate design choice in LLMs. Here’s why tokens are used:

1. Efficiency (Balancing vocabulary size vs. sequence length)

If we used characters:

  • Each character = 1 unit
  • “Hello” = 5 units (short sequence, good)
  • But vocabulary is tiny (only 26 letters + punctuation + digits ≈ 50-100 characters)
  • Problem: The model would need very long sequences to understand meaning → slow and hard to learn long-range patterns

If we used whole words:

  • Each word = 1 unit
  • “Hello” = 1 unit (short sequence, good)
  • But vocabulary would be huge (English has 500,000+ words) → massive memory and computation
  • Problem: Can’t handle misspellings, new words, or rare words (“out-of-vocabulary” problem)

Tokens are the sweet spot:

  • Subword tokens (like “play” + “ing”) keep vocabulary manageable (typically 50,000-100,000 tokens)
  • Sequence length is reasonable (not too short, not too long)
  • Can handle novel words by breaking them into known subwords

2. Handling unknown words gracefully

With word-based models, “misunderestimated” (not a real word) = unknown → error.
With tokens: ["mis", "under", "est", "imated"] → each part is recognized → model can guess meaning from parts.
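A toy WordPiece-style greedy longest-match tokenizer shows the idea. The vocabulary here is hand-picked so the example works; real tokenizers (BPE, SentencePiece) learn their vocabularies from a corpus:

```python
# Toy longest-match subword tokenizer. The vocabulary is hand-picked
# for this example; real vocabularies are learned from data.
VOCAB = {"mis", "under", "est", "imated", "play", "ing"}

def tokenize(word: str) -> list:
    """Greedy longest-prefix match against VOCAB (illustrative only)."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:                              # no match: fall back to one char
            tokens.append(word[i])
            i += 1
    return tokens

print(tokenize("misunderestimated"))  # ['mis', 'under', 'est', 'imated']
print(tokenize("playing"))            # ['play', 'ing']
```

Real tokenizers also handle spaces, capitalization, and arbitrary bytes outside the vocabulary, but the longest-match flavor is the same.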

3. Computational constraints

LLMs use self-attention (Transformer architecture), which has O(n²) complexity — doubling the sequence length quadruples the computation.

  • Characters: 500 characters = 500² = 250,000 operations
  • Tokens: 100 tokens = 100² = 10,000 operations (25x faster!)

4. Learned representations

Tokens aren’t just arbitrary — the tokenizer is trained to find meaningful chunks:

  • Common words (the, and, of) → single tokens
  • Common affixes (un-, -ing, -ed) → separate tokens
  • Spaces and punctuation → separate tokens

This helps the model learn patterns like “un-” often means “opposite of” across many words.

Analogy: Shipping packages

  • Characters = shipping each grain of rice individually (too many packages)
  • Words = shipping each rice cooker as one package (too many different box sizes)
  • Tokens = using standard box sizes that fit 1-3 grains of rice each (efficient!)

Real-world impact of tokenization

Different tokenizers use different vocabulary sizes, which affects how efficiently text is encoded:

  • GPT-4o (OpenAI): ~100K-token vocabulary
  • Llama 3 (Meta): ~128K-token vocabulary
  • Gemini (Google): ~256K-token vocabulary

Some languages (e.g., Chinese, Japanese) require more tokens per meaning because characters don’t map neatly to subwords.

Bila kata menjadi harga, belum tentu rasional menjadi nyata.
lebih banyak kata lebih banyak harga.
tenaga, harga, rupa, belanja, yuran, nilai dan tol,
semua menjadi harga kepada rasa.

..

Interpreted by ChatGPT:

“Bila kata menjadi harga, belum tentu rasional menjadi nyata.”
When words are priced or treated as commodities, rationality doesn’t automatically follow. Just because something is measured, billed, or quantified doesn’t mean it becomes more truthful or logical. It hints at a disconnect between valuation and understanding.

“lebih banyak kata lebih banyak harga.”
This points to a simple but powerful dynamic: verbosity can inflate cost. It could be literal (paying per word, per token) or symbolic—longer explanations, more effort, more “weight.” But it also questions whether more actually means better.

“tenaga, harga, rupa, belanja, yuran, nilai dan tol,”
Here you broaden the idea of “price” beyond money:

  • tenaga (effort/energy)
  • rupa (appearance/image)
  • belanja, yuran, tol (expenses, fees, tolls)
  • nilai (value itself)

Everything becomes a kind of currency or cost attached to expression and experience.

“semua menjadi harga kepada rasa.”
In the end, even rasa—feeling, meaning, inner experience—gets priced. Emotion, authenticity, and human connection are no longer free; they’re mediated by cost, effort, and systems of valuation.

Written by Khairul

April 22, 2026 at 8:43 pm