
Nate Moran and Zach Bermingham
Gierd Unified Schema: Data Foundations for an Agentic Future

Addressing the Data Slop
MCP, UCP, ACP, A2A. The acronym soup of the moment in commerce. Anthropic, Google, and OpenAI have each staked positions on the protocols that will govern AI commerce, and every major retailer and marketplace has announced an agentic strategy. The narrative is hard to miss: the next era of commerce will run, in meaningful part, through autonomous agents acting on behalf of buyers, sellers, and operators.
It's a compelling vision that rests on a quietly fragile assumption.
AI Slop has become shorthand for the low-quality, hallucinated outputs that erode trust in AI. Far less attention gets paid to its mirror image, which we'll call Data Slop: inputs that are fragmented across platforms, inconsistent across marketplaces, and stripped of the business context that makes them usable. Bad data in, bad data out isn't new. It's the oldest principle in the discipline. AI hasn't retired it; it's raised the stakes considerably. A dashboard that gives a confidently wrong answer is a problem a human catches. An agent doing the same thing at scale, with the authority to act, is something else entirely.
If autonomous agents are going to operate the next era of commerce, what does the data underneath them actually need to look like? Not what most companies have today. Most companies still operate on fragmented data, with definitions that drift between teams and business context that lives in tribal knowledge rather than the data itself. That works when a human is the last line of defense between a query and a decision. It stops working the moment that human steps out of the loop.
At Gierd, we've been building toward a different answer: the Gierd Unified Schema, or GUS. Three things have shaped the work: unification of the underlying data through GUS, context captured alongside it, and modeling shaped for the consumer asking the question, whether that consumer is a human, a system, or an agent.
Three Architectural Commitments
Avoiding Data Slop isn't a tooling problem so much as an architectural posture. Three commitments separate the companies preparing for the agentic future from the ones still treating AI as a feature to bolt on later.
Data-First. Every data domain in the stack (Orders, Inventory, Pricing, Customer Service, Finance, Advertising, etc.) has an owner, a release process, and a service-level commitment. A data domain with an owner is a building block. A data domain without one is a liability.
API-First. Every capability lives as an API before it becomes an output. Systems consume the API, not the other way around. When a customer, a partner, an account manager, and an AI agent all pull the same answer through the same interface, consistency stops being an aspiration. It becomes a property of the architecture.
AI-First. AI isn't a downstream user. It's a primary consumer the architecture is designed for. As our APIs mature, this is the surface we expect to expose to agents directly, so they pull from the same foundation every other consumer does rather than a degraded version of it.
These three commitments describe a posture, not a project. They're the precondition for everything that follows.
The GUS Layer
Multi-channel commerce is fragmented by default. The fragmentation isn't incidental; it's structural, and it shapes every analytical question downstream.
The same SKU is identified five different ways across Amazon, eBay, Walmart, Target Plus, and Best Buy. The same "order" event has different shapes, different timings, and in places different definitions per channel. Layer in the customer's own warehouse management systems, financial systems, and logistics partners, and the harmonization problem becomes the work itself. Fragmentation is the natural state of multi-channel commerce. Unification is an active engineering effort.
GUS is how we resolve it. It's a single canonical schema spanning every Gierd integration point: marketplace APIs, customer systems, logistics partners, and 3P integrations. We follow the widely adopted medallion architecture, moving data through Bronze (raw), Silver (staged), and Gold (enriched and production-ready) layers. GUS is what emerges in Gold: the unified, normalized form of the transactional and analytical datasets that drive multi-channel commerce, including inventory, listings, offers, orders, and transactions. Dimensions carry the context that ties it all together: Accounts, Channels, Listings, Offers, SKUs, and more.
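To make the Silver-to-Gold step concrete, here's a minimal dbt-style sketch of one channel's orders being normalized into a unified shape. The model and column names are illustrative, not our production schema.

-- Illustrative dbt-style model: normalize one channel's raw orders into the unified GUS shape.
-- Model and column names are hypothetical.
with source as (
    select * from {{ ref('stg_amazon__orders') }}
)

select
    'amazon'                                    as channel,
    amazon_order_id                             as order_id,      -- channel-native key kept for lineage
    cast(purchase_date as timestamp)            as ordered_at,    -- every channel lands as a UTC timestamp
    cast(order_total_amount as numeric(18, 2))  as order_total,   -- one numeric type across channels
    order_total_currency                        as currency,
    seller_sku                                  as sku            -- joins to the conformed SKU dimension
from source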
The principle underneath GUS is the operational counterpart to Data Slop: good data in, good data out.
Tony Ojeda recently wrote about our self-healing data pipeline, which keeps GUS current and complete in production. Keeping GUS healthy in production is one half of the story. Building and extending GUS as we onboard new marketplaces and data domains is the other. It used to be some of the most labor-intensive work on a data engineering team's plate. It isn't anymore.
Building GUS with Agents
Our self-healing agents are reactive, detecting and resolving entropy in live production data. The agents described here are constructive. They are the architects of GUS, helping us design and standardize the foundation, not just patch pieces together as we go. Same architectural posture, different point of leverage.
Before this shift, the work was a manual grind. Onboarding a marketplace meant a data engineer starting at the raw table level and working upstream. They'd spend hours parsing API documentation or waiting on product managers to define which fields mattered and how they mapped across disparate channels. It was a cycle of manual comparisons, often against the marketplace frontend, to bridge the gap between how Amazon, Walmart, and eBay defined the same event. Then came the aliasing, the normalization judgment calls, and the inevitable troubleshooting of type mismatches when trying to union five sources into one view. An INT here, a NUMERIC there, a timestamp that wouldn't parse. Is tax included in this order total or not? Constant back-and-forth with engineers, trying to remember what each channel did or didn't provide.
Today, that process is governed by three chained AI agents.
The plan-engineer reads through our documentation and the collection of findings from each marketplace. One-off calculations and data definitions get committed to our agent context files, so data engineers no longer need to read it all themselves or remember bespoke calculations. The agent weighs that context in each statement of work it takes on, producing proposals that don't drift or create hours of unnecessary data validation for the business or analysts.
The build-engineer takes these plans and executes them. As an organization, we have defined standards for what good dbt SQL looks like and how it should function. No longer is a data engineer troubleshooting why unioning five marketplace order tables fails on an INT vs. NUMERIC mismatch; the planning agent anticipates it and instructs the build accordingly.
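A hedged sketch of the kind of standard that removes that troubleshooting entirely: casts are made explicit per channel before anything is unioned, so type drift is resolved upstream of the union rather than debugged at it. The model names here are hypothetical.

-- Illustrative union with explicit casts: INT vs NUMERIC drift is resolved per channel, not at the union.
-- Model names are hypothetical.
select
    'walmart'                            as channel,
    cast(order_id as varchar)            as order_id,
    cast(order_total as numeric(18, 2))  as order_total
from {{ ref('gus_walmart_orders') }}

union all

select
    'ebay'                               as channel,
    cast(order_id as varchar)            as order_id,
    cast(order_total as numeric(18, 2))  as order_total
from {{ ref('gus_ebay_orders') }}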
The review-engineer confirms that the work meets the criteria defined in the plan. Did we actually solve the problem we set out to solve? All too often, engineers go down a rabbit hole that drifts from the original goal; this gate confirms we accomplished what we intended. In the same stage, we ensure that the standards for acceptable work are met and that the marketplace definitions are adhered to.
One job per agent. One artifact per job. Our data-specific requirements change the stakes: the review gate prioritizes spec compliance over code quality because the spec is the contract downstream models rely on, the spec is a versioned and auditable artifact, and we scan for consequences across every model that consumes GUS before any schema change is merged.
This setup mirrors the agentic engineering teams Robert Evans has described. The rhyme is direct: plan-engineer is the Architect, build-engineer is the Engineer, and review-engineer is the Code Reviewer. Three gates to production.
The agents propose. The data engineers refine and approve. No model change reaches production without a human signature. This isn't a training phase. It is the architecture.
Purpose-Led Modeling: One Foundation, Many Shapes
A flawlessly harmonized dataset doesn't answer questions on its own. It needs models that shape it for the consumer asking. GUS is the foundation. What sits on top of it is where the foundation becomes useful, and the shape it takes depends on who's asking.
We maintain three modeling patterns over GUS, each tuned to a different consumer.
The first is star schema modeling for BI and analytics. Fact and dimension tables are the connective tissue that lets analysts trace events across channels through shared dimensions. Tying traffic to orders across Amazon, Walmart, and eBay is trivial when listings, SKUs, and channels are modeled as conformed dimensions. Forecasting, attribution, and channel mix analysis all become possible at the multi-channel level rather than the single-channel level. This is the layer dashboards and BI tools consume, and it's where most analytics engineering work has historically lived.
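As a rough illustration of why conformed dimensions make that tying-together trivial, the sketch below joins traffic and orders at the same SKU-by-channel grain; the table names are hypothetical, not our production models.

-- Illustrative star-schema query: traffic and orders tied together through conformed dimensions.
-- Table names are hypothetical.
with orders as (
    select channel_key, sku_key, sum(units_ordered) as units_ordered
    from fct_orders
    group by 1, 2
),

traffic as (
    select channel_key, sku_key, sum(sessions) as sessions
    from fct_traffic
    group by 1, 2
)

select
    dim_channels.channel_name,
    dim_skus.sku,
    traffic.sessions,
    orders.units_ordered
from orders
join traffic      using (channel_key, sku_key)
join dim_channels using (channel_key)
join dim_skus     using (sku_key)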
The second is contextual wide tables for AI consumption. Star schemas are optimal for human analysts writing SQL, but they are not optimal for agents. An agent asking about SKU performance shouldn't have to navigate a join graph to assemble the answer. The relevant facts and dimensions should already be there. We build wide tables that pre-join the context an agent needs to reason. SKU-level economics across cost, fees, traffic, conversion, returns, and payouts in one queryable surface. Order lifecycle tables that capture the full arc from listing to order to shipment to payout to reconciliation. Same canonical data as the star schemas, different shape, designed for a different consumer.
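For a sense of the shape, here's a minimal sketch of a SKU-economics wide table: everything an agent needs to reason about a SKU on a channel, pre-joined into one row. The model names, column names, and grain are assumptions for illustration.

-- Illustrative wide table: SKU-level economics pre-joined so an agent reads one surface, not a join graph.
-- Model and column names are hypothetical.
select
    orders.channel,
    orders.sku,
    sum(orders.order_total)      as gross_sales,
    sum(orders.units_ordered)    as units_ordered,
    sum(fees.marketplace_fees)   as marketplace_fees,
    sum(returns.units_returned)  as units_returned,
    sum(payouts.net_payout)      as net_payout
from {{ ref('gus_orders') }} as orders
left join {{ ref('gus_fees') }}     as fees    using (channel, order_id, sku)
left join {{ ref('gus_returns') }}  as returns using (channel, order_id, sku)
left join {{ ref('gus_payouts') }}  as payouts using (channel, order_id, sku)
group by 1, 2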
The third is intelligence models, which turn the foundation into decisions. Competitive Pricing Intelligence unifies signals from marketplace APIs, web crawling, and 3P data providers so our teams operate from the freshest market picture available, not yesterday's snapshot. Unified forecasting models support channel sales expectations, replenishment planning, price decay curves, and elasticity analysis, all built on the cross-channel dimensions the star schemas establish. The Unified Commerce Ledger tracks transactions through the full lifecycle (expected → unsettled → settled → reconciled) so finance teams can close the loop between what was sold, what was paid, and what was earned.
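The ledger's lifecycle is simple to express once the transactions are unified. A hedged sketch, assuming hypothetical timestamp columns for each stage:

-- Illustrative lifecycle derivation: each transaction resolves to exactly one ledger state.
-- Column names are hypothetical.
select
    transaction_id,
    case
        when reconciled_at is not null then 'reconciled'
        when settled_at    is not null then 'settled'
        when posted_at     is not null then 'unsettled'
        else 'expected'
    end as ledger_state
from {{ ref('gus_transactions') }}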
The closing piece is decision capture. Every pricing change, listing optimization, and inventory adjustment is recorded alongside the outcome it produced. This isn't just data storage. It is the record of intent and result, and it's what allows the intelligence models to improve over time rather than restart from scratch.
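A minimal sketch of what decision capture can look like at the data level, assuming a hypothetical decision log and outcome model: each recorded action is paired with the result window that followed it.

-- Illustrative decision capture: intent (the action taken) joined to result (what happened next).
-- Model and column names are hypothetical.
select
    decisions.decision_id,
    decisions.decision_type,              -- e.g. price_change, listing_update, inventory_adjustment
    decisions.decided_at,
    outcomes.units_ordered_next_14d,
    outcomes.net_payout_next_14d
from {{ ref('decision_log') }} as decisions
left join {{ ref('decision_outcomes') }} as outcomes
    using (decision_id)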
Three consumers, three shapes, one foundation. That's not a tooling choice. It's the architecture.
Maybe It's Just Semantics
For most of the last decade, the discipline that owned this work had a different name. Analytics Engineering was designed for an era when the consumer of the data was a human looking at a dashboard. Model the data. Own the metric definitions. Build the marts. The success metric was, more or less: did the SQL run, and is the dashboard right?
That era is closing.
The consumer is changing. More and more, the entity asking a question of the data is an AI agent rather than a human. The agent won't look at a chart and apply judgment to a fuzzy answer. It will read the answer, treat it as truth, and act on it. The new bar: did the agent give the right answer when the question was ambiguous, and was that answer grounded in something defensible?
That isn't a messy data problem. It's a context problem.
The discipline emerging to meet it is Context Engineering: encoding business definitions, hierarchies, edge cases, and decision rules into machine-readable context so the answer to a question stays consistent across every consumer. Analytics engineering produced clean data. Context engineering produces defensible answers.
The protocols being laid down right now make the shift urgent. None of them carry their own context. They assume the systems they connect to already have it. In a world where agents query each other across organizational boundaries, the company with the cleanest, best-contextualized foundation wins.
At Gierd, we've been building toward this quietly and deliberately. Models matter, but meaning is the multiplier. AI matters, but only if the data underneath it is something other than slop. When you ask a question of your data, you should get the same answer back regardless of how you ask it. And it should be the right one.