Ray Iyer
Co-founder & CEO, Anglera

Beyond the hero image: the asset and attribute data AI needs

AI vision reads pixels, not specs. The alt text, image metadata, and structured attributes that make a product page understandable to buyers and AI.

Most catalog teams still treat the hero shot as the finish line. Get a clean white-background image, maybe a lifestyle photo, ship the listing. But an AI answer engine looking at that image sees a rectangle of pixels: colors, shapes, a rough silhouette. It cannot see the port count on the back of a switch, the thread pitch on a fitting, or the certification stamped in text too small to render at web resolution. The gap between what a photo shows and what a buyer or an AI needs to know is exactly where products go invisible.

Vision models are good at objects, not specs

Multimodal models like GPT-4o and Gemini have gotten genuinely good at recognizing what an image contains. The visual search market is scaling fast alongside them, projected to more than triple from about $6.3 billion in 2025 to $23.8 billion by 2034. But recognition is not comprehension. A vision model can tell you a picture shows a gray metal enclosure with cables coming out of it. It cannot reliably tell you that the enclosure delivers 370W of PoE budget across 24 ports, or that the mounting bracket is sold separately. That information either lives in text somewhere near the image, or it does not exist to the model at all.

Image quality compounds the problem. Blurry, poorly lit, or low-resolution catalog photos already struggle to match against real-world queries, which is one reason distributors with thin photography budgets lose ground in visual search even before the specs question comes up.

Alt text stopped being a caption

The job of alt text has changed. For years it described what a screen reader should say about an image: "man holding drill." The newer expectation, especially as AI vision systems read surrounding page context to interpret why an image matters, is that alt text carries purpose, not just contents. "Man holding drill" tells an answer engine nothing about torque, chuck size, or battery platform. "18V brushless hammer drill, 1/2 in keyless chuck, compatible with XR battery platform" gives it something to reason with, and it does so without touching the image file at all.

That distinction matters because alt text is one of the few places where product truth and image context sit side by side. If it's generic or missing, the image is decorative as far as any language model is concerned.
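One way to keep alt text tied to product truth is to assemble it from attribute values the record already holds. The sketch below is a minimal illustration, not a description of any particular tool. The function name `build_alt_text`, the attribute keys, and the 125-character cap (a common rule of thumb, not a standard) are all assumptions.

```python
def build_alt_text(attrs, order, max_len=125):
    """Join the non-empty attribute values, in priority order, into one alt string.

    Values are added whole. The first value that would push the string past
    max_len stops the loop, so the text is never cut mid-phrase.
    """
    parts = []
    for key in order:
        value = (attrs.get(key) or "").strip()
        if not value:
            continue  # skip attributes the feed left blank
        if len(", ".join(parts + [value])) > max_len:
            break
        parts.append(value)
    return ", ".join(parts)


# Hypothetical attribute record for the drill example above.
drill = {
    "product_type": "18V brushless hammer drill",
    "chuck": "1/2 in keyless chuck",
    "color": "",
    "battery": "compatible with XR battery platform",
}
priority = ["product_type", "chuck", "color", "battery"]

print(build_alt_text(drill, priority))
```

Ordering the keys by importance means that when the cap is hit, the least important attributes are the ones dropped.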

The metadata layer Google already expects

This isn't just an AI-search theory. Google's own structured data guidance for images asks for creator, license, and copyright fields on the ImageObject type, plus a way to flag whether an image is a real photograph or AI-generated. That's before you get to product structured data proper, where Merchant Center wants multiple images at specific resolutions and aspect ratios tied to accurate price, availability, and identifiers. The image is not a standalone asset. It's one field in a structured record, and it only pays off when the rest of the record is filled in around it.
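As a rough illustration of those image fields, the sketch below builds an ImageObject JSON-LD snippet with schema.org's `contentUrl`, `creator`, `license`, `copyrightNotice`, and `creditText` properties. The helper name `image_object_jsonld`, the example.com URLs, and the company name are placeholders, and the sketch does not cover the photograph-versus-AI-generated flag.

```python
import json


def image_object_jsonld(content_url, creator_name, license_url,
                        copyright_notice, credit_text=None):
    """Return an ImageObject JSON-LD string carrying licensing and credit fields."""
    data = {
        "@context": "https://schema.org/",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "license": license_url,
        "creator": {"@type": "Organization", "name": creator_name},
        "copyrightNotice": copyright_notice,
    }
    if credit_text:
        data["creditText"] = credit_text  # optional, only emitted when supplied
    return json.dumps(data, indent=2)


# All values below are made-up placeholders.
markup = image_object_jsonld(
    content_url="https://example.com/images/switch-front.jpg",
    creator_name="Example Supplier Inc.",
    license_url="https://example.com/image-license",
    copyright_notice="Example Supplier Inc.",
    credit_text="Example Supplier Inc.",
)
print(markup)
```

The output would typically be embedded in a `<script type="application/ld+json">` tag on the product page, alongside the product markup it belongs to.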

Before and after: same photo, different product

Here's what an ordinary supplier feed looks like next to an enriched version of the same SKU:

Raw feed description: "Network switch, 24 port, black, good for office use."

Attribute | Enriched value
Port count | 24 x 10/100/1000 RJ45
PoE budget | 370W total, 802.3bt
Uplink ports | 4 x SFP+ 10G
Mounting | 19 in rack, 1U
Alt text | 24-port managed PoE++ switch, 1U rack-mount, 370W budget, 4x SFP+ uplinks
Image set | Front panel, rear panel, dimensional line drawing
Fan noise | Fanless

Nothing here required a photographer to reshoot anything. It required pulling values out of the supplier's spec sheet, scoring them for completeness, and attaching them to the SKU and its images as text an engine can parse.
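Scoring for completeness can be as simple as counting which required attributes are filled in. The sketch below assumes a hypothetical required-attribute list for this product category and a plain fraction as the score. Real scoring would likely weight attributes and validate values, which this does not attempt.

```python
# Hypothetical list of attributes this category is expected to carry.
REQUIRED = ["port_count", "poe_budget", "uplink_ports", "mounting",
            "alt_text", "image_set", "fan_noise"]


def completeness(record, required=REQUIRED):
    """Return (fraction of required attributes filled, list of missing keys)."""
    missing = [k for k in required if not str(record.get(k) or "").strip()]
    score = (len(required) - len(missing)) / len(required)
    return score, missing


# The raw feed only gives a port count and a color.
raw = {"port_count": "24 port", "color": "black"}

# The enriched record uses the values from the table above.
enriched = {
    "port_count": "24 x 10/100/1000 RJ45",
    "poe_budget": "370W total, 802.3bt",
    "uplink_ports": "4 x SFP+ 10G",
    "mounting": "19 in rack, 1U",
    "alt_text": "24-port managed PoE++ switch, 1U rack-mount, "
                "370W budget, 4x SFP+ uplinks",
    "image_set": "Front panel, rear panel, dimensional line drawing",
    "fan_noise": "Fanless",
}

raw_score, raw_missing = completeness(raw)
enriched_score, enriched_missing = completeness(enriched)
print(f"raw: {raw_score:.2f}, missing {raw_missing}")
print(f"enriched: {enriched_score:.2f}, missing {enriched_missing}")
```

Returning the missing keys alongside the score makes the number actionable: it tells you what to look for in the supplier's spec sheet next.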

Ask an answer engine

Ask an answer engine "which fanless 24-port PoE switch has enough budget for 24 wireless access points" and it needs the PoE-budget number and the fanless attribute in text, matched to a real image of the actual unit. A hero shot alone answers none of that. The structured record next to it answers all of it.
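To make that concrete, here is a minimal sketch of the check an engine could run once the attributes exist as typed fields. The `SwitchRecord` class, the SKU names, and the per-access-point wattage are assumptions: 15.4 W is the 802.3af per-port maximum at the switch, and real access points may draw more or less.

```python
from dataclasses import dataclass


@dataclass
class SwitchRecord:
    sku: str
    ports: int
    poe_budget_w: float
    fanless: bool


def can_power(record, devices, watts_per_device):
    """True if the switch has enough ports and enough total PoE budget."""
    return (record.ports >= devices
            and record.poe_budget_w >= devices * watts_per_device)


def fanless_matches(records, devices, watts_per_device):
    """SKUs that are fanless and can power the requested number of devices."""
    return [r.sku for r in records
            if r.fanless and can_power(r, devices, watts_per_device)]


# Hypothetical catalog; the first entry mirrors the enriched record above.
catalog = [
    SwitchRecord("SW-24-POE-370", ports=24, poe_budget_w=370, fanless=True),
    SwitchRecord("SW-24-POE-370-FAN", ports=24, poe_budget_w=370, fanless=False),
    SwitchRecord("SW-24-POE-190", ports=24, poe_budget_w=190, fanless=True),
]

# 24 access points at an assumed 15.4 W each need about 369.6 W.
print(fanless_matches(catalog, devices=24, watts_per_device=15.4))
# At an assumed 30 W each they need 720 W, which no listed switch provides.
print(fanless_matches(catalog, devices=24, watts_per_device=30.0))
```

The answer flips with the assumed draw per device, which is why the budget has to be a number in the record and not a phrase like "good for office use."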

Structured data helps discovery, not shortcuts around substance

It's worth being honest about the limits here. A widely cited Ahrefs study tracking 1,885 pages that added JSON-LD schema found no meaningful citation lift on pages that were already heavily cited by AI systems (each had 100+ citations before the test), undercutting the idea that markup alone moves the needle. For a typical distributor SKU starting from nothing, the mechanism is different: schema and alt text are how a page gets discovered and correctly parsed in the first place, not a lever you pull on top of already-strong content. Structured data amplifies real product information. It doesn't manufacture it.

Where this leaves catalog teams

Photography budgets and SEO tags both matter less than the plain-text layer connecting them: attributes extracted from real supplier documentation, alt text that names what the product does instead of what it looks like, and image metadata filled in consistently across every SKU rather than the ten hero shots that got extra attention. That's a data operations problem more than a creative one, and it's one most catalogs carry at scale because nobody enriches every SKU by hand.

This is the layer Anglera works on. It plugs into whatever PIM a distributor already runs, or works from a flat file if there isn't one, and continuously extracts and quality-scores the attributes and alt text that sit next to every image, so the picture and the product record finally say the same thing.

About the author

Ray Iyer
Co-founder & CEO, Anglera

Ray is the co-founder and CEO of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
