To Cook is to See

How we use AI for autonomous cooking at scale

With LLMs reshaping white-collar work, the next frontier is Physical AI transforming physical labor. For robots to act in the real world, they must first perceive and understand it, and vision is their most fundamental sense.

Self-driving cars see. Home robots see too, through an egocentric view.

View of a self-driving car

Egocentric views used for humanoid robot training

By the same principle, a robot must be able to see in order to cook, and that’s exactly how Posha works. Just as an expert driver perceives and understands the road (detecting lanes, reading traffic signs, and identifying cars, people, animals, and other objects), a world-class chef perceives multiple attributes of the food while cooking.

Let us dig into the kind of intelligence needed to cook a world-class meal every single time: repeatably, consistently, and at scale.

Identify Ingredients

A chef should be able to tell one ingredient from another, whether it is a vegetable, a protein, or pasta.

Examples shown: broccoli, shiitake mushroom, Yukon Gold potato, penne pasta, bell peppers, washed & soaked rice, cabbage, chicken wings, beef, beets.
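Ingredient identification is, at its core, an image-classification problem. The sketch below shows only the final decision step, assuming a hypothetical model that already outputs one raw score (logit) per ingredient label: the scores are turned into probabilities with a softmax, and the system declines to commit when the top probability is below a confidence threshold. The label list, the 0.6 threshold, and the function names are illustrative assumptions, not Posha's implementation.

```python
import math
from typing import Sequence

# Hypothetical label set, taken from the examples above.
LABELS = ["broccoli", "shiitake mushroom", "yukon gold potato", "penne pasta", "bell peppers"]


def softmax(scores: Sequence[float]) -> list[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identify_ingredient(scores: Sequence[float],
                        labels: Sequence[str] = LABELS,
                        min_confidence: float = 0.6) -> tuple[str, float]:
    """Return (label, probability), or ("unknown", probability) when unsure."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        # Too ambiguous to act on: better to flag than to cook the wrong thing.
        return "unknown", probs[best]
    return labels[best], probs[best]
```

For example, `identify_ingredient([4.0, 0.5, 0.2, 0.1, 0.3])` returns `"broccoli"` with high probability, while five equal scores return `"unknown"`. Rejecting low-confidence predictions is a common design choice when a wrong label would trigger the wrong cooking action.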

Prep Style of an Ingredient

The same ingredient can be prepped differently in different recipes, and a chef should be able to tell these prep styles apart.

The prep style of an ingredient affects how much heat it needs and at what level, as well as how to judge its doneness visually. For example, once done, minced beef looks quite different in color and texture from beef strips.

Examples shown:

Beef: strips, mince, chunks

Carrot: batons, diced, puree

Onions: diced, puree, sliced

Potatoes: wedges, small dice

Bell pepper: small dice, large dice, sliced

Ingredient Size

How large an ingredient is cut directly determines how long it needs to cook.

Examples shown: broccoli in large, medium, and small florets; okra in medium and large cuts.
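A common rule of thumb from heat-conduction physics (not a documented part of Posha's system) is that the time for heat to reach the center of a piece scales roughly with the square of its thickness, because conduction depends on the dimensionless Fourier number αt/L². A minimal sketch, assuming a reference cook time has already been measured for one reference size; the function name and the numbers in the example are hypothetical.

```python
def scaled_cook_time(ref_time_s: float, ref_size_mm: float, size_mm: float) -> float:
    """Estimate the cook time for a new piece size from a reference measurement.

    Assumes conduction-dominated heating, where the time to reach a given
    core temperature scales roughly with thickness squared (constant
    Fourier number). Surface browning, steam, and stirring all bend this
    rule, so treat the result as a first guess for visual feedback to correct.
    """
    if ref_time_s < 0 or ref_size_mm <= 0 or size_mm <= 0:
        raise ValueError("sizes must be positive and the time non-negative")
    return ref_time_s * (size_mm / ref_size_mm) ** 2
```

With these placeholder numbers, if a 10 mm piece takes 120 s, then `scaled_cook_time(120, 10, 20)` estimates 480 s for a 20 mm piece: doubling the size roughly quadruples the time.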

Ingredient Volume

The amount of a particular ingredient strongly affects how much of the other ingredients should be added to the recipe.

For example, the quantity of tomato influences the balance between sweetness and sourness.

Similarly, the amount of water depends directly on the volume of rice.

Examples shown: tomato puree, spinach, and rice, each at high, medium, and low volume.
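A minimal sketch of how a perceived volume could drive the quantities of dependent ingredients, assuming (1) the vision system reports what fraction of the pan an ingredient covers, and (2) dependent quantities scale linearly with the main ingredient, as water does with rice. The bucket cut-offs, the function names, and the 1.5 water-to-rice ratio in the example are hypothetical placeholders, not Posha's parameters.

```python
from enum import Enum


class VolumeLevel(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def classify_volume(fill_fraction: float,
                    low_max: float = 0.33,
                    medium_max: float = 0.66) -> VolumeLevel:
    """Bucket the fraction of the pan an ingredient covers (0 to 1) into a level.

    The cut-offs are illustrative placeholders.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be in [0, 1]")
    if fill_fraction < low_max:
        return VolumeLevel.LOW
    if fill_fraction < medium_max:
        return VolumeLevel.MEDIUM
    return VolumeLevel.HIGH


def scale_dependents(base: dict[str, float],
                     base_main: float,
                     measured_main: float) -> dict[str, float]:
    """Scale dependent quantities linearly with the measured main ingredient."""
    if base_main <= 0 or measured_main < 0:
        raise ValueError("base amount must be positive, measured amount non-negative")
    factor = measured_main / base_main
    return {name: qty * factor for name, qty in base.items()}
```

For example, if a hypothetical base recipe pairs 200 ml of rice with 300 ml of water and the system measures 300 ml of rice, `scale_dependents({"water_ml": 300.0}, 200.0, 300.0)` returns `{"water_ml": 450.0}`. Linear scaling is the simplest assumption; taste balances such as tomato's sweetness and sourness may need non-linear adjustments.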

Ingredient Variant

The same ingredient can come in different forms, and each form changes how it should be cooked.

For example, glass noodles, rice noodles, noodle cakes, and Hakka noodles each call for different amounts of added water, heat profiles, and stirring profiles.

Examples shown:

Chicken: thigh, breast, mince

Lentil: moong, whole brown masoor, split masoor

Grain: foxtail millet, couscous, rice

Noodles: rice noodles, noodle cake, glass noodles, Hakka noodles

Pasta: penne, macaroni, spaghetti

Potato: Yukon Gold, red potato, russet

Tomato: cherry, canned puree, fresh puree, diced fresh

The above only scratches the surface of analyzing the state of ingredients as they are added. Much of the intelligence lies in understanding how food transforms under heat. This transformation shows up as changes in color, size, and texture, and understanding it requires tracking the state of the food in real time.
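One way to picture the input-state analysis described so far is as a single record per ingredient, combining identity, variant, prep style, size, and volume, which then selects a cooking profile. The sketch below is a hypothetical schema, not Posha's: the field names, the profile names, and the fallback order (size, then prep style, then variant are dropped until a match is found) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngredientState:
    """What a vision system might report about an ingredient (hypothetical schema)."""
    ingredient: str                   # e.g. "beef"
    variant: Optional[str] = None     # e.g. "thigh", "glass noodles"
    prep_style: Optional[str] = None  # e.g. "strips", "mince"
    size: Optional[str] = None        # e.g. "small floret"
    volume: Optional[str] = None      # "low", "medium" or "high"; in this sketch
                                      # volume scales quantities, not the profile


def lookup_profile(state: IngredientState,
                   profiles: dict[tuple, str],
                   default: str) -> str:
    """Pick the most specific profile matching the observed state.

    Keys are 4-tuples (ingredient, variant, prep_style, size); missing
    fields are None. The search generalizes by dropping size, then prep
    style, then variant, so ("beef", None, "mince", None) wins over a
    generic ("beef", None, None, None) profile.
    """
    fields = (state.ingredient, state.variant, state.prep_style, state.size)
    for keep in range(len(fields), 0, -1):
        key = fields[:keep] + (None,) * (len(fields) - keep)
        if key in profiles:
            return profiles[key]
    return default
```

With `profiles = {("beef", None, "mince", None): "beef_mince", ("beef", None, None, None): "beef_generic"}`, minced beef maps to `"beef_mince"`, beef strips fall back to `"beef_generic"`, and an unlisted ingredient gets the default.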

Color

One of the most important signals of cooking progress is color. As food heats, it changes shade in predictable ways, from raw to cooked, pale to golden, and golden to brown. By continuously tracking these color shifts in real time, Posha can decide when an ingredient has reached the right stage, whether that is lightly sautéed onions, deeply caramelized onions, or chicken that has just turned from pink to fully white and cooked.
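A minimal sketch of color tracking, assuming a segmentation step (not shown) already supplies the average RGB color of the food region in each frame, with channels in [0, 1]. It only models darkening, such as onions moving from pale to golden to brown; a pink-to-white change like chicken's would need a hue-based signal instead. The function name, the smoothing factor, and the stage thresholds are hypothetical.

```python
import colorsys
from typing import Iterable, Iterator


def browning_stages(frames_rgb: Iterable[tuple[float, float, float]],
                    alpha: float = 0.3,
                    golden_drop: float = 0.15,
                    brown_drop: float = 0.35) -> Iterator[str]:
    """Yield a coarse stage label per frame from the food's mean RGB color.

    Brightness (the HSV "value" channel) is smoothed with an exponential
    moving average to suppress flicker, and the stage is chosen by how far
    the smoothed brightness has fallen relative to the first frame.
    """
    baseline = None
    smoothed = None
    for r, g, b in frames_rgb:
        _, _, v = colorsys.rgb_to_hsv(r, g, b)
        smoothed = v if smoothed is None else alpha * v + (1 - alpha) * smoothed
        if baseline is None:
            baseline = smoothed
        # Relative darkening since the start of cooking.
        drop = (baseline - smoothed) / baseline if baseline > 0 else 0.0
        if drop >= brown_drop:
            yield "browned"
        elif drop >= golden_drop:
            yield "golden"
        else:
            yield "raw"
```

Smoothing before thresholding is the design choice that matters here: a single shadowed or steamy frame should not flip the stage, while a sustained color shift should.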

Texture

Another key signal is texture. As food cooks, its structure changes in ways that matter just as much as color. A potato, for example, starts firm and holds its shape, then gradually softens, and can eventually break down into a mash.

Size

Size is another key signal. Many ingredients shrink or swell as they cook: spinach wilts down to a fraction of its raw volume, meat contracts as it releases moisture, and rice and pasta swell as they absorb water.
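A simple proxy for size, assuming a fixed camera and a hypothetical segmentation step that outputs a binary mask per frame (1 where the ingredient is visible), is the mask's pixel area relative to the first frame. A falling ratio suggests shrinking or wilting; a rising one suggests swelling. Pixel area from a single view is only a rough stand-in for true volume, and the function names and the 0.5 threshold are illustrative assumptions.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # 1 where the ingredient is visible, 0 elsewhere


def mask_area(mask: Mask) -> int:
    """Count the pixels covered by the ingredient."""
    return sum(sum(1 for px in row if px) for row in mask)


def relative_sizes(masks: Sequence[Mask]) -> list[float]:
    """Area of each frame's mask as a fraction of the first frame's area."""
    if not masks:
        return []
    base = mask_area(masks[0])
    if base == 0:
        raise ValueError("first mask is empty")
    return [mask_area(m) / base for m in masks]


def shrunk_below(masks: Sequence[Mask], threshold: float = 0.5) -> bool:
    """True once the visible area has fallen below `threshold` of its start."""
    sizes = relative_sizes(masks)
    return bool(sizes) and sizes[-1] < threshold
```

For three 4x4 masks covering 16, 8, and 4 pixels, `relative_sizes` returns `[1.0, 0.5, 0.25]`, and `shrunk_below` reports `True` for the full sequence.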

Cooking intelligence also involves understanding the food scene as it cooks and tracking key events:

Detecting the moment water transitions from still to actively boiling by observing continuous bubbling and surface movement.

Knowing when boiling starches like rice or pasta have absorbed or lost all free water, leaving the pan visibly dry.

Tracking when frozen or clumped meat breaks apart into smaller, evenly distributed pieces while cooking.

Recognizing when rigid foods like a ramen-noodle cake or dry spaghetti soften, bend, and fully merge into the cooking liquid.
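Event detection of this kind usually needs debouncing, so that one noisy frame does not trigger an action. A minimal sketch for the boiling event, assuming grayscale frames normalized to [0, 1] from a fixed camera: surface motion is measured as the mean absolute change between consecutive frames, and boiling is declared only when that motion stays above a threshold for several comparisons in a row. The threshold, the run length, and the function names are hypothetical; the other events listed above would need their own signals (for example, loss of reflective free water for a drying pan).

```python
from typing import Optional, Sequence

Frame = Sequence[Sequence[float]]  # grayscale in [0, 1], same shape every frame


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel change between two frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(prev, curr):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def detect_boil_onset(frames: Sequence[Frame],
                      threshold: float = 0.1,
                      min_consecutive: int = 3) -> Optional[int]:
    """Index of the frame where sustained surface motion begins, or None.

    Motion must stay at or above `threshold` for `min_consecutive`
    frame-to-frame comparisons in a row, approximating "continuous
    bubbling" rather than a single splash or lighting flicker.
    """
    run = 0
    for i in range(1, len(frames)):
        if motion_energy(frames[i - 1], frames[i]) >= threshold:
            run += 1
            if run >= min_consecutive:
                # The run started at comparison i - min_consecutive + 1.
                return i - min_consecutive + 1
        else:
            run = 0
    return None
```

With five still frames followed by frames that flicker every step, the detector reports onset at frame 5; a single isolated jump is ignored.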

All of this is still largely unexplored territory. There is very little existing literature on understanding food as it cooks, mainly because there has never been enough real, fine-grained data capturing how ingredients transform over time.

At Posha, we are building what we call the Culinary Vision Dataset, the world’s largest collection of real-world cooking data: 5 million data points tracking changes in color, size, texture, and structure across thousands of cooking sessions.

This foundation enables what we call Culinary Artificial Intelligence: a new class of AI that doesn’t just read recipes, but sees, understands, and reasons about food as it cooks.