A Forecasting Engine Built for Decisions at Scale

Running a marketplace business well means making a lot of distinct decisions — and those decisions don't all require the same kind of answer.

The team managing replenishment needs to know how many units to order this week. The account team is tracking whether price trends will affect demand over the next quarter. Business development is trying to size the revenue opportunity for a brand before that brand has ever sold a unit.

These are different questions with different time horizons, different inputs, and different consequences when the answer is wrong. And yet in most organizations, they're being answered in disconnected ways — each team with its own method, its own assumptions, and no common ground when the numbers don't agree.

We are building the Gierd Forecasting Engine because marketplace businesses deserve better than that. The goal is a single system that can give every team consistent, trustworthy answers to all of those questions, built in a way that takes each one seriously rather than treating them as variations of the same problem.

Forecasting Isn't One Problem

Marketplace businesses don't have one forecasting problem. They have several, and each one requires a different kind of answer.

How many units will sell over the next 90 days? That's a demand question — it drives replenishment decisions, ad spend, and campaign timing.

What will the selling price look like over that same window? That's a price question — and a demand forecast that ignores price movement will be systematically off in predictable ways.

If we're evaluating a new brand as a potential client, how large is the revenue opportunity at a given price point? That's a market sizing question, and it needs to be answered from external market data before any internal history exists.

And once you have the demand forecast, what does it tell you about how many units to order this week given current inventory and lead times? That's the replenishment question — it translates the forecast into a business decision.

These are related questions, but they're not the same question. Systems that try to answer all of them with a single model end up answering none of them well.

The engine is being built around four distinct forecast types, each designed to answer one of those questions specifically. A demand forecast. A price forecast. A pre-onboarding opportunity sizing. And a replenishment signal. Each has its own models, its own evaluation criteria, and its own process for determining which approach earns the job. They share underlying data infrastructure and the same nightly run schedule, but each is purpose-built for its job.
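To make the last of those concrete, here is a minimal sketch of how a demand forecast might be turned into a replenishment quantity. It is an illustration under stated assumptions, not the engine's actual logic: the function name, the weekly review period, and the flat safety stock are all hypothetical.

```python
import math

def order_quantity(daily_forecast, on_hand, on_order, lead_time_days,
                   review_days=7, safety_stock=0):
    """Units to order now so stock lasts until the next order could arrive.

    Hypothetical helper: covers forecast demand over the lead time plus one
    review period, adds a flat safety stock, and nets out current supply.
    """
    cover_days = lead_time_days + review_days
    if len(daily_forecast) < cover_days:
        raise ValueError("forecast horizon is shorter than the coverage window")
    expected_demand = sum(daily_forecast[:cover_days])
    shortfall = expected_demand + safety_stock - on_hand - on_order
    # Never order a negative quantity; round partial units up.
    return max(0, math.ceil(shortfall))

# Example: 4 units/day expected, 50 on hand, 14-day lead time, weekly ordering.
forecast = [4.0] * 30
quantity = order_quantity(forecast, on_hand=50, on_order=0,
                          lead_time_days=14, safety_stock=10)
```

The point of the sketch is the separation of concerns: the demand forecast supplies expected units, and the replenishment step combines it with inventory and lead time to produce a decision.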

One Catalog, Many Demand Patterns

Even within a single client's catalog, demand is not uniform. Some products sell consistently, week after week, with enough historical signal to estimate a demand curve with reasonable confidence. Others sell intermittently — a handful of units a month, irregular timing, sometimes nothing for weeks at a time. Some products have never been in stock long enough to reveal their actual demand at all.

The approach that performs well on consistent sellers tends to fail on intermittent ones. It's not a tuning problem. A method built to find patterns in dense, regular data is working from different assumptions than one built to handle long stretches of zero sales without treating them as evidence of low demand. Forcing one approach to cover both is accepting error that doesn't need to be there.

We kept coming back to this: the right response to a catalog with widely different demand patterns is an approach that adapts to those differences. For each client account, we run multiple methods against real historical data and evaluate them on their actual performance. The one that performs best on that account's data becomes the champion for that account. And because products behave differently even within the same catalog, the competition runs at the product level too — every product gets the approach that was most accurate on that product's own history, not just the approach that performed best on average across the entire account.

No method is assumed to be the best one. Each has to prove it on real data.
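The product-level competition can be sketched in a few lines. This is a simplified illustration, assuming two toy methods, a single holdout window, and mean absolute error as the score; the method names, the holdout length, and the sample data are invented for the example and are not the engine's actual candidates or metrics.

```python
from statistics import mean

def naive_last(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon

def moving_average(history, horizon, window=4):
    """Repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon

def mean_abs_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def pick_champion(history, methods, holdout=4):
    """Score each method on the last `holdout` periods; lowest error wins."""
    train, test = history[:-holdout], history[-holdout:]
    scores = {name: mean_abs_error(test, fn(train, holdout))
              for name, fn in methods.items()}
    return min(scores, key=scores.get), scores

METHODS = {"naive_last": naive_last, "moving_average": moving_average}

# Hypothetical weekly unit sales for two products in the same catalog.
catalog = {
    "steady_sku": [20, 21, 19, 20, 22, 20, 21, 20, 19, 21, 20, 20],
    "lumpy_sku": [0, 5, 0, 0, 3, 0, 0, 4, 0, 0, 6, 0],
}

# Each product gets its own champion, chosen on its own history.
champions = {sku: pick_champion(history, METHODS)[0]
             for sku, history in catalog.items()}
```

On this sample data the two products end up with different champions, which is the behavior described above: the winner is decided per product rather than once for the whole account.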

Honest Ranges, Not False Precision

No model eliminates uncertainty. What separates a useful forecast from a misleading one isn't just whether the point estimate is close — it's whether the uncertainty around that estimate is honest.

A forecast that claims tight confidence when the underlying uncertainty is actually wide creates a specific kind of downstream problem. The team reads the range and plans accordingly. There's no buffer for normal variance. When demand runs high or low outside the stated range — which happens more often than an overconfident forecast admits — the position is wrong with no room to absorb it. False precision doesn't just look bad on a scorecard. It propagates into ordering decisions with no slack built in.

Every demand forecast the engine produces includes three values: a lower estimate, a middle estimate, and an upper estimate. These aren't cosmetic additions to make the output look complete. The engine is evaluated on whether those stated ranges actually contain the true outcome at the right rates. A forecast that claims 80% confidence is held to that: does the true outcome land inside the stated range at least 80% of the time? If it doesn't, that's an overconfidence problem, and the engine treats it accordingly. Overconfident forecasts, not just inaccurate ones, are penalized when selecting which approach earns the job.

This accountability changes how the engine behaves. It creates an incentive to be honest about uncertainty rather than to produce numbers that look tight.
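A minimal sketch of that check, assuming an 80% nominal level: count how often the true outcome landed inside the stated range, then add a penalty to a method's score when coverage falls short. The linear penalty and its weight are assumptions made for illustration; the actual scoring rule is not described here.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Fraction of outcomes that fell inside their stated range."""
    hits = sum(1 for a, lo, hi in zip(actuals, lowers, uppers) if lo <= a <= hi)
    return hits / len(actuals)

def overconfidence_penalty(coverage, nominal=0.80, weight=10.0):
    """Zero when coverage meets the nominal level, growing with the shortfall.

    The linear form and the weight are hypothetical choices.
    """
    return weight * max(0.0, nominal - coverage)

def selection_score(point_error, coverage, nominal=0.80, weight=10.0):
    """Lower is better: point error plus any overconfidence penalty."""
    return point_error + overconfidence_penalty(coverage, nominal, weight)

# Hypothetical outcomes, with two candidate ranges around the same middle estimate.
actuals = [12, 8, 15, 10, 9, 14, 11, 7, 13, 10]
tight_cov = empirical_coverage(actuals, [9] * 10, [11] * 10)
wide_cov = empirical_coverage(actuals, [7] * 10, [14] * 10)

# Same point error, but the tight range misses its 80% claim and scores worse.
tight_score = selection_score(2.0, tight_cov)
wide_score = selection_score(2.0, wide_cov)
```

With identical point estimates, the narrower range looks more precise but loses the comparison, because it contained the true outcome far less often than it claimed.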

When the Data Lies

One more constraint shapes demand forecasting across virtually every ecommerce catalog: stockouts. When a product runs out of inventory, sales drop to zero — not because demand disappeared, but because there was nothing to sell. If you train a model on that data without accounting for it, you're teaching it the wrong lesson. The model reads those zeros as low demand, when they were actually out-of-stock listings.

The engine will identify those periods and exclude them from training. A day when a product couldn't be purchased doesn't tell you how many units customers wanted to buy. Including it distorts the learned demand signal in ways that compound quietly over time. There's a practical guard on this exclusion as well: if removing those periods would leave a product with too little clean training data to build a reliable model, the system handles that situation separately rather than producing a model trained on an insufficient sample.

Excluding a bad signal is not the same as ignoring it. The engine knows those periods happened. It just doesn't learn from them.
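The exclusion and its guard can be sketched as follows. This assumes each day carries an explicit in-stock flag and that the fallback is signaled by returning None; the record shape, the threshold value, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DailyRecord:
    units_sold: int
    in_stock: bool  # False means the product could not be purchased that day

MIN_CLEAN_DAYS = 5  # hypothetical threshold, chosen only for the example

def clean_training_series(records: List[DailyRecord],
                          min_clean_days: int = MIN_CLEAN_DAYS) -> Optional[List[int]]:
    """Drop out-of-stock days; return None if too little clean data remains.

    A None result tells the caller to route the product to separate
    handling instead of fitting a model on an insufficient sample.
    """
    clean = [r.units_sold for r in records if r.in_stock]
    if len(clean) < min_clean_days:
        return None
    return clean

history = [
    DailyRecord(5, True), DailyRecord(0, False), DailyRecord(6, True),
    DailyRecord(0, False), DailyRecord(4, True), DailyRecord(5, True),
    DailyRecord(0, True),  # a genuine zero: in stock, nobody bought
]
training = clean_training_series(history)
sparse = clean_training_series(history[:3])
```

Note that the genuine zero on the last day is kept. Only the zeros caused by stockouts are removed, so the model still sees real days with no sales.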

The System in Action

The engine will run nightly across our client accounts. Every morning, updated forecasts will be available through a self-service dashboard where our team members can explore demand signals, download reports, run scenario simulations, and size new brand opportunities.

What makes this architecture worth the complexity is what it replaces: fragmented numbers produced by different people using different methods for different purposes, with no way to resolve which one to trust when those numbers disagree.

The engine doesn't eliminate judgment — team members still make decisions, and those decisions require contextual knowledge no model has. What it does is give every team access to numbers that come from the same methodology, with uncertainty ranges that mean what they say, so the judgment calls are being made from a shared foundation rather than from competing estimates.

Good forecasting isn't just a matter of picking the right model. It's a matter of asking the right questions, applying the right methodology to each, and being honest about what any model can and can't know. Getting those three things right — across every account and every product in a catalog — is the problem this engine is being built to solve.