
The future of AI relies on a high school teacher’s free database

Christoph Schuhmann in Hamburg, Germany, on Thursday. | BLOOMBERG

By Aggi Cantrill and Marissa Newman
Bloomberg

Apr 24, 2023

In front of a suburban house on the outskirts of the northern German city of Hamburg, a single word, “LAION,” is scrawled in pencil across a mailbox. It’s the only indication that the home belongs to the person behind a massive data-gathering effort central to the artificial intelligence boom that has seized the world’s attention.

That person is high school teacher Christoph Schuhmann, and LAION, short for “Large-scale AI Open Network,” is his passion project. When Schuhmann isn’t teaching physics and computer science to German teens, he works with a small team of volunteers building the world’s biggest free AI training data set, which has already been used in text-to-image generators such as Google’s Imagen and Stable Diffusion.

Databases like LAION are central to AI text-to-image generators, which rely on them for the enormous amounts of visual material used to deconstruct and create new images. The debut of these products late last year was a paradigm-shifting event: It sent the tech sector’s AI arms race into hyperdrive and raised a myriad of ethical and legal issues. Within a matter of months, lawsuits had been filed against generative AI companies Stability AI and Midjourney for copyright infringement, and critics were sounding the alarm about the violent, sexualized and otherwise problematic images within their data sets, which have been accused of introducing biases that are nearly impossible to mitigate.