What We Do: AI-generated Lifestyle Images

Dan Moen

At Full Scene, we want to be the best in the world at creating cost-effective lifestyle images for eCommerce brands.

When we first saw text-to-image models in action, it was clear there would be an opportunity for online retailers to leverage this amazing breakthrough.

We think every product you sell should include lifestyle images. Some brands with big budgets and small assortments can afford to shoot them in the studio or even in the field. Awesome! But that’s probably not you. Our goal is to make it easy and affordable for every brand to include lifestyle images across its entire assortment.

Leveraging the Images You Have

The first critical decision we made was to leverage the images you already have. Most products include at least one image we can work with. By making this decision, we accepted two downsides: 1. We won’t be able to support 100% of products or categories, at least not with our core process. 2. Output will often not be “perfect”. If your brand requires perfection across your entire assortment, this solution is not for you.

A lot goes into determining whether an image will work or not.

Contrast with Background
We need a shot of the product that allows us to extract it from the image. Shots on white or gray backgrounds are ideal, but not required. We run into issues if the product appears to blend into the background, making it difficult or impossible to determine where the product ends and the background begins.

Lighting
The lighting needs to lend itself to blending into a realistic scene. Reflections, transparency and strong lighting from weird angles can make things difficult or impossible for us.

Angle
We generally like mid-angle shots, where it’s easy to create room for a scene in the background. Top-down or high-angle shots often don’t work well for us. Cutting boards, for example, are often shot directly overhead or at a severe angle, which makes them really hard for us to work with.

Quality
Low-quality images generally don’t work at all for us. The AI works hard to generate a scene that fits the product. If the product shot is grainy, the AI will generate an equally low-quality scene around it. We know that search engines favor lightweight pages, but in some cases we can’t work with the images on your site.

Moving and Resizing

Most product shots, especially the ones on white or gray backgrounds, are extreme close-ups. In order to generate a lifestyle image, we need to make space to generate a scene. In almost every case, we will reduce the size of the product and move it.

We analyze each product and use our judgment to place the item in an ideal place for the scene we have in mind.

Here’s an example. In the original image, the product takes up all of the vertical space (border added so you can see the change clearly).

Here we’ve reduced the size by 30% and moved it down, making lots of room to create a scene.

We can also reposition the item up, left and right as needed, but try not to get too crazy.
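
For readers who like to see the arithmetic, here is a minimal Python sketch of the resize-and-move step. It is an illustration only: the `Box` type, the `place_product` function, and the anchor parameters are hypothetical names made up for this example and do not describe Full Scene’s actual system. It assumes the product cutout is pasted onto a fixed-size canvas measured in pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (origin at top-left)."""
    left: int
    top: int
    width: int
    height: int


def place_product(canvas_w, canvas_h, product_w, product_h,
                  scale=0.7, anchor_x=0.5, anchor_y=1.0):
    """Return the Box where the scaled product should be pasted.

    scale    -- fraction of the original size to keep (0.7 = 30% smaller)
    anchor_x -- 0.0 puts the product at the left edge, 1.0 at the right
    anchor_y -- 0.0 puts the product at the top edge, 1.0 at the bottom
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    new_w = round(product_w * scale)
    new_h = round(product_h * scale)
    # Free space left over once the product has been shrunk.
    free_w = canvas_w - new_w
    free_h = canvas_h - new_h
    if free_w < 0 or free_h < 0:
        raise ValueError("scaled product does not fit on the canvas")
    return Box(round(free_w * anchor_x), round(free_h * anchor_y), new_w, new_h)


# A product that fills the full height of a 1000x1000 image,
# reduced by 30% and pushed to the bottom to open up room above it.
print(place_product(1000, 1000, 600, 1000, scale=0.7))
```

Changing `anchor_x` or `anchor_y` slides the product left, right, or up within whatever free space the chosen scale leaves.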

Masking

When we ask the AI to generate an image for us, we provide it with an image of the product along with a mask, to tell it what it’s not allowed to change. Creating accurate masks is hard. We have made massive investments in our masking technology, enabling us to generate accurate masks at scale. It’s just one of the many ingredients in our secret sauce.
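
To show what a mask is in the simplest possible terms, here is a toy Python sketch. It assumes the product has already been cut out and that each pixel carries an alpha (opacity) value from 0 to 255. The function names and the convention that 0 means “protected” and 255 means “free to repaint” are assumptions for this example; real tools differ on which value means which, and producing an accurate cutout in the first place is the hard part this sketch skips entirely.

```python
def mask_from_alpha(alpha, threshold=128):
    """Build a binary mask from a 2D grid of alpha values (0-255).

    Returns a grid of the same shape where 0 marks protected product
    pixels and 255 marks pixels the model may repaint.
    """
    return [[0 if a >= threshold else 255 for a in row] for row in alpha]


def protected_fraction(mask):
    """Fraction of pixels in the mask that are protected (value 0)."""
    total = sum(len(row) for row in mask)
    kept = sum(value == 0 for row in mask for value in row)
    return kept / total if total else 0.0


# A tiny 4x4 "image": the product occupies the middle, with one
# faint edge pixel (alpha 10) that falls below the threshold.
alpha = [
    [0,   0,   0,   0],
    [0, 255, 255,   0],
    [0, 255, 200,  10],
    [0,   0,   0,   0],
]
mask = mask_from_alpha(alpha)
for row in mask:
    print(row)
print(protected_fraction(mask))
```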

Prompts

A prompt is the information you submit to an AI model describing what you’d like it to create. Writing prompts that generate useful output is incredibly difficult. At Full Scene, we’ve developed technology to dynamically generate prompts for inbound products. And we’re always working on improving them. Most of our competitors provide you with an interface and expect you to start writing your own prompts. It’s painful, frustrating work.

But once you crack the code for a category, it just works. That’s another ingredient in our secret sauce. We do the heavy lifting for you.
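
As a rough illustration of generating prompts per category, here is a minimal Python sketch built on plain string templates. Everything in it is hypothetical: the category names, the template wording, and the `build_prompt` function are invented for this example and are far simpler than anything a production system would need.

```python
# Hypothetical per-category templates; the wording is illustrative only.
SCENE_TEMPLATES = {
    "cookware": "{product} on a {surface} in a bright kitchen, {style}",
    "furniture": "{product} in a tidy living room beside a {surface}, {style}",
}
# Used when a product's category has no dedicated template yet.
DEFAULT_TEMPLATE = "{product} on a {surface}, {style}"


def build_prompt(product_name, category,
                 surface="wooden table",
                 style="natural light, photorealistic"):
    """Fill in the scene template for a product's category."""
    template = SCENE_TEMPLATES.get(category.lower(), DEFAULT_TEMPLATE)
    return template.format(product=product_name, surface=surface, style=style)


print(build_prompt("cast iron skillet", "Cookware"))
print(build_prompt("ceramic vase", "decor"))
```

The point of the structure is that the effort goes into each category’s template once, after which every product in that category gets a prompt with no manual writing.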

Image Generation at Scale

When we first started working with this new technology, it became abundantly clear to us that the majority of images generated would be failures. As a result, we knew we would end up throwing away many bad images for every one we kept.

With that in mind, we’ve built our system a little differently. Instead of a human sitting in front of an interface generating a handful of images one batch at a time, we generate hundreds (or thousands) of images at a time, scaling up as much capacity as needed to do so quickly.
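
The difference between clicking “generate” by hand and requesting a whole batch at once can be sketched in a few lines of Python. This is a simplified illustration using the standard library’s thread pool. The `generate_image` function is a stand-in that returns a fake record rather than calling any real image model, and the function names, seed scheme, and worker count are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_image(prompt, seed):
    """Stand-in for a call to an image model; returns a fake result record."""
    return {"prompt": prompt, "seed": seed, "image_id": f"img-{seed:05d}"}


def generate_batch(prompt, count, max_workers=8):
    """Request `count` candidate images for one prompt, several at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with the seeds.
        return list(pool.map(lambda seed: generate_image(prompt, seed),
                             range(count)))


candidates = generate_batch("skillet on a wooden table", count=200)
print(len(candidates), candidates[0]["image_id"])
```

Raising `max_workers` is the toy equivalent of adding capacity: the batch size stays the same, but more requests are in flight at once.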

Quality Assurance

When a new product is introduced to our system, prompts and settings are set automatically. In most cases those settings are fine and left untouched. However, human intervention is still needed in many cases, so we have a human review every product before a single image is generated.

Once the first batch of images is generated, they are added to a queue for a human to review. Our expectation is that we’re only going to keep a small percentage of the output, the best of the best. If we’re not happy with the output as a whole, we flag the product and send it back to be re-run. We don’t want doing business with us to be painful for you, so we’ve built human intervention into our process. It’s another critical ingredient in our secret sauce.
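
The keep-or-re-run decision can be pictured with a short Python sketch. It assumes a reviewer gives each image a score from 1 to 5. The scoring scale, the 5% cap, and the rule that a batch is re-run when nothing passes are all assumptions invented for this example, and `triage_batch` is a hypothetical name.

```python
def triage_batch(scores, keep_fraction=0.05, min_score=4):
    """Decide what to do with one reviewed batch.

    scores        -- dict of image id -> reviewer score (1-5)
    keep_fraction -- largest share of the batch we are willing to keep
    min_score     -- lowest score that counts as good enough to keep
    Returns (kept_ids, rerun), where rerun is True if nothing passed.
    """
    limit = max(1, int(len(scores) * keep_fraction))
    good = [img for img, score in scores.items() if score >= min_score]
    # Highest scores first; ties fall back to id order for stable output.
    good.sort(key=lambda img: (-scores[img], img))
    kept = good[:limit]
    return kept, not kept


reviewed = {"img-a": 5, "img-b": 2, "img-c": 4, "img-d": 1}
print(triage_batch(reviewed, keep_fraction=0.5))
print(triage_batch({"img-e": 2, "img-f": 1}))
```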

Conclusion

We hope this overview helps you understand how we do things, why we’ve chosen the approach we’re taking and how it differs from our competition. AI has just begun to change how we do things, and we’re doing our small part to make the transition easy.