Data Quality & Provenance
Every record returned by every Silicon Analysts tool, whether via REST or MCP, carries a provenance block describing where the data came from, how confident we are in it, and when it was last refreshed. This page is the canonical reference for what those fields mean.
Where our data is least certain
We publish ranges, not false precision. These are the datapoints where our published range is widest: where the real number is hardest to pin down, and where input from people who work with these numbers would move us most. Confidence here is inferred from the width of the published range.
| Datapoint | Published | Range | Spread |
|---|---|---|---|
| iot-edge gross margin | 45% | 35%–50% | 33% |
| Intel 16nm wafer price | $4,500 | $3,800–$5,200 | 31% |
| Samsung 5nm (SF5) wafer price | $13,000 | $11,000–$15,000 | 31% |
| TSMC 28nm wafer price | $3,000 | $2,550–$3,450 | 30% |
| TSMC 7nm (N7) wafer price | $9,500 | $8,100–$10,900 | 29% |
| TSMC 5nm (N5/N4) wafer price | $18,500 | $16,000–$21,300 | 29% |
| Samsung 3nm (SF3) wafer price | $15,000 | $13,000–$17,000 | 27% |
| TSMC 3nm (N3) wafer price | $19,500 | $17,000–$22,000 | 26% |
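The Spread column appears to be the width of the published range divided by the published value, rounded to a whole percent. For example, (50 - 35) / 45 ≈ 33%, and the same reading reproduces every row above. The sketch below recomputes Spread under that assumption and ranks datapoints widest first. The interface and function names are illustrative, not part of any Silicon Analysts API.

```typescript
// Sketch assuming spread = (high - low) / published, rounded to a whole percent.
interface RangedDatapoint {
  name: string;
  published: number;
  low: number;
  high: number;
}

function spreadPercent(d: RangedDatapoint): number {
  if (d.published <= 0) throw new Error(`published value must be positive: ${d.name}`);
  return Math.round(((d.high - d.low) / d.published) * 100);
}

// Least certain first: sort by spread, descending, without mutating the input.
function widestFirst(points: RangedDatapoint[]): RangedDatapoint[] {
  return [...points].sort((a, b) => spreadPercent(b) - spreadPercent(a));
}

const rows: RangedDatapoint[] = [
  { name: "TSMC 3nm (N3) wafer price", published: 19500, low: 17000, high: 22000 },
  { name: "iot-edge gross margin", published: 45, low: 35, high: 50 },
  { name: "TSMC 28nm wafer price", published: 3000, low: 2550, high: 3450 },
];

for (const r of widestFirst(rows)) {
  console.log(`${r.name}: ${spreadPercent(r)}%`);
}
```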
Drop your own number in any row to see how it compares to the crowd. Input is anonymous and aggregated: individual values are never shown, and the distribution unlocks at 3 contributions. Have a source? Logged-in users can suggest a reviewed correction; see How community data works below.
How community data works
We invite the people who use these tools to sharpen the numbers, but a number nobody can trust is worse than no number. Here's exactly how contributions are handled.
Anonymous by default
No account or email needed. We store only the value and which datapoint it's for, to compute a community range. Values are never tied to your identity and never shown individually. The range also reflects adjustments people make in our tools; you can opt out of being included at any time.
The crowd informs, we decide
Contributions never change our published numbers automatically. Sourced corrections go to a review queue we vet by hand. Implausible values are bounded out, and any stray number is absorbed harmlessly into a median, so it can't move what we publish.
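The sketch below shows how a community signal could be computed under these rules. It rests on three assumptions. First, each contribution is stored as just a datapoint ID and a value. Second, "implausible" means outside the published range widened by a hypothetical PLAUSIBILITY_FACTOR. Third, the 3-contribution unlock threshold counts values after bounding. Every identifier here, including the tsmc-n3-wafer ID, is illustrative.

```typescript
// Hypothetical sketch; identifiers and the bounding rule are assumptions.
interface Contribution {
  datapointId: string; // which datapoint the value is for
  value: number;       // the contributed number; no identity is stored
}

interface CommunitySignal {
  n: number;      // plausible contributions counted
  median: number; // robust middle value: one stray input cannot drag it far
}

const MIN_CONTRIBUTIONS = 3;   // from the page: distribution unlocks at 3
const PLAUSIBILITY_FACTOR = 2; // assumption: not a documented value

function median(sorted: number[]): number {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns null until enough plausible values exist. The published range is only
// read to bound inputs; nothing here writes to the published estimate.
function communitySignal(
  contributions: Contribution[],
  datapointId: string,
  publishedLow: number,
  publishedHigh: number,
): CommunitySignal | null {
  const lo = publishedLow / PLAUSIBILITY_FACTOR;
  const hi = publishedHigh * PLAUSIBILITY_FACTOR;
  const values = contributions
    .filter((c) => c.datapointId === datapointId)
    .map((c) => c.value)
    .filter((v) => Number.isFinite(v) && v >= lo && v <= hi) // bound out implausible values
    .sort((a, b) => a - b);
  if (values.length < MIN_CONTRIBUTIONS) return null;
  return { n: values.length, median: median(values) };
}

const inputs: Contribution[] = [
  { datapointId: "tsmc-n3-wafer", value: 19000 },
  { datapointId: "tsmc-n3-wafer", value: 20500 },
  { datapointId: "tsmc-n3-wafer", value: 21000 },
  { datapointId: "tsmc-n3-wafer", value: 1000000 }, // implausible: bounded out
];
console.log(communitySignal(inputs, "tsmc-n3-wafer", 17000, 22000)); // { n: 3, median: 20500 }
```

A median is used rather than a mean because a single extreme value that slips past the bounds can shift a mean arbitrarily, but it can move a median by at most one rank.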
Three labels, never blurred
A number is always one of: our published estimate (vetted), the community signal (n contributors, median + confidence), or validated (a sourced correction we reviewed). We never present crowd input as fact.
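One way to keep the three labels from blurring is to make the label part of the value's type. Any code that renders a number then has to handle each case explicitly. The sketch below is hypothetical: DisplayedNumber, its fields, and describe are illustrative names, not Silicon Analysts' API.

```typescript
// Hypothetical discriminated union: every displayed number carries its label.
type DisplayedNumber =
  | { label: "published"; value: number }                                   // our vetted estimate
  | { label: "community"; n: number; median: number; confidence: string }  // crowd signal
  | { label: "validated"; value: number; source: string };                 // reviewed, sourced correction

function describe(x: DisplayedNumber): string {
  switch (x.label) {
    case "published":
      return `Published estimate: ${x.value}`;
    case "community":
      // Always report n alongside the median so crowd input never reads as fact.
      return `Community signal (n=${x.n}): median ${x.median}, ${x.confidence} confidence`;
    case "validated":
      return `Validated correction: ${x.value} (source: ${x.source})`;
    default: {
      // Compile-time exhaustiveness check: adding a fourth label breaks the build here.
      const unreachable: never = x;
      return unreachable;
    }
  }
}

console.log(describe({ label: "community", n: 5, median: 19800, confidence: "medium" }));
// Community signal (n=5): median 19800, medium confidence
```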
Showing our confidence, and where the crowd disagrees, isn't a weakness. It's how we stay honest about what's known versus estimated.
Provenance Taxonomy
Two enums classify every record: source_type (how the record was produced) and confidence_tier (how much to trust it). The type contract lives in lib/tools/types.ts.
source_type: how this record was produced
| Value | Meaning |
|---|---|
| research | Sourced from a published external report (Morgan Stanley, TrendForce, public earnings, etc.). Direct attribution. |
| derived | Computed by Silicon Analysts from public inputs via a documented methodology (Monte Carlo cost models, yield equations, etc.). |
| computed | Pure-function output from user-supplied inputs (e.g. calculate_chip_cost results). No data lookup involved. |
| estimated | Analyst judgment where data is sparse or unobservable. |

confidence_tier: qualitative confidence
| Value | Meaning |
|---|---|
| high | Multiple independent sources agree; methodology is well-tested. |
| medium | Single authoritative source or moderate methodology uncertainty. |
| low | Sparse data, significant assumptions, or rapidly changing. |
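Below is a minimal TypeScript sketch of the two enums. The values are reconstructed from the names used on this page. The authoritative contract is lib/tools/types.ts, which may differ in detail. The Provenance field names follow the example tool response at the end of this page. TIER_RANK and meetsTier are hypothetical helpers, not part of the published API.

```typescript
// Reconstructed from the values named on this page; the authoritative
// contract is lib/tools/types.ts and may differ in detail.
const SOURCE_TYPES = ["research", "derived", "computed", "estimated"] as const;
const CONFIDENCE_TIERS = ["high", "medium", "low"] as const;

type SourceType = (typeof SOURCE_TYPES)[number];
type ConfidenceTier = (typeof CONFIDENCE_TIERS)[number];

// Field names follow the provenance block in the example tool response.
interface Provenance {
  last_updated: string; // ISO 8601 timestamp
  source_type: SourceType;
  confidence_tier: ConfidenceTier;
  dataset_version: string;
}

// Hypothetical helper: order the tiers so callers can filter by a minimum.
const TIER_RANK: Record<ConfidenceTier, number> = { low: 0, medium: 1, high: 2 };

function meetsTier(p: Provenance, minimum: ConfidenceTier): boolean {
  return TIER_RANK[p.confidence_tier] >= TIER_RANK[minimum];
}

const record: Provenance = {
  last_updated: "2026-04-07T00:00:00.000Z",
  source_type: "derived",
  confidence_tier: "high",
  dataset_version: "chipSpecs-v1.0",
};
console.log(meetsTier(record, "medium")); // true
```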
An unsupported numeric score such as 0.85 is worse than a reasoned "high": agents will treat the number as more precise than it is. We will move to a numerical confidence once the methodology for computing it is documented.

Update Cadence by Dataset
Last-updated dates below are dataset-level. Per-record overrides are supported in the API; future migrations will populate them as individual chips, nodes, and packaging types refresh asynchronously. Cadence is a target. Refresh history will appear on a planned /changelog page.
| Dataset | Last Updated | Target Cadence | Source Types |
|---|---|---|---|
| AI Accelerators (chipSpecs): Per-chip cost breakdowns derived from Monte Carlo models against public teardown data and analyst reports. | 2026-04-07 | Monthly | derived, research |
| Foundry & Wafer Pricing (foundryData): Wafer price ranges, defect density, NRE/mask-set costs, and node maturity status. Synthesized from TrendForce, Morgan Stanley, CSET, and public filings. | 2026-04-07 | Quarterly | derived, research |
| Packaging & HBM Specs (packagingData): Per-tech packaging cost benchmarks (CoWoS-S/L, EMIB, SoIC, FC-BGA, etc.) plus HBM2 through HBM4 cost per stack and bandwidth/capacity. | 2026-04-07 | Monthly | derived, research |
| HBM Market Analysis (hbmData): Nine sub-tables covering accelerators, specs, market share, spot prices, leading indicators, qualification feed, revenue forecast, supplier revenue, and validation checks. | 2026-04-07 | Monthly | research, derived |
| Supply-chain Headlines (marketPulse): Curated supply-chain headlines with trend direction and impact analysis. Per-item dates parsed to ISO 8601 in the API. | n/a (per-item dates) | Weekly | research |
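The sketch below shows one way a client might resolve a record's effective date and compare it against the target cadence. It assumes a per-record override simply replaces the dataset-level date. The day counts in CADENCE_DAYS and the names DatasetInfo, effectiveLastUpdated, and isPastTarget are all hypothetical; this page defines neither the override field name nor exact intervals. Because cadence is only a target, a true result means "worth checking", not "broken".

```typescript
// Hypothetical sketch; names and day counts are assumptions, not API fields.
type Cadence = "Weekly" | "Monthly" | "Quarterly";

// Assumed upper bounds, in days, for each target cadence.
const CADENCE_DAYS: Record<Cadence, number> = { Weekly: 7, Monthly: 31, Quarterly: 92 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface DatasetInfo {
  lastUpdated: string; // dataset-level date, e.g. "2026-04-07"
  cadence: Cadence;
}

// A per-record override, when present, takes precedence over the dataset date.
function effectiveLastUpdated(dataset: DatasetInfo, recordOverride?: string): string {
  return recordOverride ?? dataset.lastUpdated;
}

// True when the data is older than its target cadence allows.
function isPastTarget(dataset: DatasetInfo, now: Date, recordOverride?: string): boolean {
  const updated = Date.parse(effectiveLastUpdated(dataset, recordOverride));
  if (Number.isNaN(updated)) throw new Error("unparseable last-updated date");
  return (now.getTime() - updated) / MS_PER_DAY > CADENCE_DAYS[dataset.cadence];
}

const foundryData: DatasetInfo = { lastUpdated: "2026-04-07", cadence: "Quarterly" };
console.log(isPastTarget(foundryData, new Date("2026-08-01")));               // true
console.log(isPastTarget(foundryData, new Date("2026-08-01"), "2026-07-15")); // false
```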
Methodology Notes by Source Type
Research
Records attributed directly to a public external publication. Primary sources include TrendForce quarterly reports, Morgan Stanley semiconductor research, Raymond James analyst notes, CSET, IEDM proceedings, and earnings releases. Each record carries a per-row source string in addition to the structured provenance block.
Derived
Records produced by a Silicon Analysts model from public inputs. Examples: per-accelerator cost breakdowns combine Epoch AI Monte Carlo models with TrendForce and Raymond James inputs; wafer price ranges synthesize multiple foundry-pricing sources with the Murphy yield model. See the Semiconductor Cost Guide for the methodology in depth.
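For readers unfamiliar with the Murphy yield model, the sketch below uses its textbook form and feeds it into a cost-per-good-die calculation. The yield is Y = ((1 - e^(-A·D0)) / (A·D0))², with die area A in cm² and defect density D0 in defects/cm². Gross dies per wafer use the common approximation π(d/2)²/S - πd/√(2S), with wafer diameter d in mm and die area S in mm². This is a generic illustration, not Silicon Analysts' model: it ignores edge exclusion and scribe lines. The 100 mm² die and the D0 of 0.1 defects/cm² in the usage line are illustrative values, not published figures.

```typescript
// Textbook Murphy yield: Y = ((1 - exp(-A*D0)) / (A*D0))^2
// areaCm2: die area in cm^2; defectsPerCm2: defect density D0.
function murphyYield(areaCm2: number, defectsPerCm2: number): number {
  const ad = areaCm2 * defectsPerCm2;
  if (ad < 0) throw new Error("area and defect density must be non-negative");
  if (ad < 1e-12) return 1; // limit as A*D0 -> 0
  const f = (1 - Math.exp(-ad)) / ad;
  return f * f;
}

// Common approximation for gross (candidate) dies on a round wafer:
// pi*(d/2)^2 / S - pi*d / sqrt(2*S), with d in mm and S in mm^2.
function grossDiesPerWafer(waferDiameterMm: number, dieAreaMm2: number): number {
  const r = waferDiameterMm / 2;
  const dies =
    (Math.PI * r * r) / dieAreaMm2 - (Math.PI * waferDiameterMm) / Math.sqrt(2 * dieAreaMm2);
  return Math.max(0, Math.floor(dies));
}

// Wafer price spread over the dies expected to work.
function costPerGoodDie(
  waferPriceUsd: number,
  waferDiameterMm: number,
  dieAreaMm2: number,
  defectsPerCm2: number,
): number {
  const yieldFraction = murphyYield(dieAreaMm2 / 100, defectsPerCm2); // mm^2 -> cm^2
  const good = grossDiesPerWafer(waferDiameterMm, dieAreaMm2) * yieldFraction;
  if (good <= 0) throw new Error("no good dies for these parameters");
  return waferPriceUsd / good;
}

// Illustrative: published TSMC N5 wafer price, 300 mm wafer, 100 mm^2 die, assumed D0 = 0.1.
console.log(costPerGoodDie(18500, 300, 100, 0.1).toFixed(2)); // 31.92
```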
Computed
Pure-function output from user-supplied inputs. The calculate_chip_cost tool is the only example today: given die dimensions and process parameters, it returns an estimated chip cost. No data lookup is involved at the record level. Input defaults, however, pull from derived wafer-pricing data, and under the conservative-pick rule that data's freshness sets the ceiling on the result's freshness.
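The conservative-pick rule can be read as "a computed result is only as fresh as its stalest input." The sketch below implements that reading by taking the earliest last_updated across all inputs. Two parts are our own assumptions: that user-supplied inputs are stamped at request time, and the names TimedInput and conservativeLastUpdated.

```typescript
// Hypothetical sketch of the conservative-pick rule for freshness:
// a computed result is only as fresh as its stalest input.
interface TimedInput {
  name: string;
  last_updated: string; // ISO 8601
}

function conservativeLastUpdated(inputs: TimedInput[]): string {
  if (inputs.length === 0) throw new Error("no inputs to derive freshness from");
  let oldest = inputs[0];
  let oldestMs = Date.parse(oldest.last_updated);
  for (const input of inputs) {
    const ms = Date.parse(input.last_updated);
    if (Number.isNaN(ms)) throw new Error(`unparseable date on input ${input.name}`);
    if (ms < oldestMs) {
      oldest = input;
      oldestMs = ms;
    }
  }
  return oldest.last_updated;
}

const freshness = conservativeLastUpdated([
  // Assumption: user-supplied die inputs are stamped at request time.
  { name: "user die dimensions", last_updated: "2026-05-20T10:00:00.000Z" },
  // Default wafer price pulled from the derived foundry dataset.
  { name: "foundryData wafer price", last_updated: "2026-04-07T00:00:00.000Z" },
]);
console.log(freshness); // 2026-04-07T00:00:00.000Z
```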
Estimated
Analyst judgment where data is sparse or unobservable. Rare today; reserved for fields like packaging cost on early-life tech where no published price exists. Always paired with a confidence_tier of low or medium.
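The pairing rule for estimated records is easy to enforce mechanically, for example in a test over every dataset. In the sketch below, pairingError is a hypothetical name, and the unions are reconstructed from this page rather than copied from lib/tools/types.ts.

```typescript
// Types reconstructed from this page, not copied from lib/tools/types.ts.
type SourceType = "research" | "derived" | "computed" | "estimated";
type ConfidenceTier = "high" | "medium" | "low";

// Returns an error message when a record breaks the pairing rule, else null.
function pairingError(source: SourceType, tier: ConfidenceTier): string | null {
  if (source === "estimated" && tier === "high") {
    return 'source_type "estimated" must be paired with confidence_tier "low" or "medium"';
  }
  return null;
}

console.log(pairingError("estimated", "medium") ?? "ok"); // ok
console.log(pairingError("estimated", "high") ?? "ok");   // prints the rule violation
```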
Example: Provenance in a Tool Response
Every record across all 6 tools carries the same provenance block. Below is a representative get_accelerator_costs record (truncated).
```json
{
  "chip": "NVIDIA B200",
  "vendor": "NVIDIA",
  "processNode": "TSMC N4P",
  "estMfgCostUsd": 8500,
  "estSellPriceUsd": 30000,
  "costBreakdown": { "logicDieCostUsd": 220, "hbmCostUsd": 4200, ... },
  "provenance": {
    "last_updated": "2026-04-07T00:00:00.000Z",
    "source_type": "derived",
    "confidence_tier": "high",
    "dataset_version": "chipSpecs-v1.0"
  }
}
```

See the Developer API page for full per-tool response shapes and the /api/v1 manifest for the machine-readable contract.
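An agent consuming these records could check the provenance block before trusting the numbers. The sketch below does this at runtime on parsed JSON. The allowed values are reconstructed from this page; treat lib/tools/types.ts and the /api/v1 manifest as authoritative. readProvenance is a hypothetical helper, not part of the API.

```typescript
// Allowed values reconstructed from this page (assumption: see lib/tools/types.ts).
const SOURCE_TYPES: ReadonlySet<string> = new Set(["research", "derived", "computed", "estimated"]);
const CONFIDENCE_TIERS: ReadonlySet<string> = new Set(["high", "medium", "low"]);

interface Provenance {
  last_updated: string;
  source_type: string;
  confidence_tier: string;
  dataset_version: string;
}

// Throws with a specific message instead of silently trusting a malformed record.
function readProvenance(record: unknown): Provenance {
  if (typeof record !== "object" || record === null) {
    throw new Error("record is not an object");
  }
  const block = (record as { provenance?: unknown }).provenance;
  if (typeof block !== "object" || block === null) {
    throw new Error("missing provenance block");
  }
  const { last_updated, source_type, confidence_tier, dataset_version } =
    block as Record<string, unknown>;
  if (typeof last_updated !== "string" || Number.isNaN(Date.parse(last_updated))) {
    throw new Error("last_updated is not a parseable timestamp");
  }
  if (typeof source_type !== "string" || !SOURCE_TYPES.has(source_type)) {
    throw new Error(`unknown source_type: ${String(source_type)}`);
  }
  if (typeof confidence_tier !== "string" || !CONFIDENCE_TIERS.has(confidence_tier)) {
    throw new Error(`unknown confidence_tier: ${String(confidence_tier)}`);
  }
  if (typeof dataset_version !== "string") {
    throw new Error("dataset_version is missing");
  }
  return { last_updated, source_type, confidence_tier, dataset_version };
}

const raw =
  '{"chip":"NVIDIA B200","provenance":{"last_updated":"2026-04-07T00:00:00.000Z",' +
  '"source_type":"derived","confidence_tier":"high","dataset_version":"chipSpecs-v1.0"}}';
const prov = readProvenance(JSON.parse(raw));
console.log(prov.source_type, prov.confidence_tier); // derived high
```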