vAIsual introduces the world’s largest biometrically released real-life dataset

13 May 2022

vAIsual, a technology company generating on-demand photos using artificial intelligence, launched its dataset of 500,000 legally clean, original images. The dataset contains high-resolution images of real people, who have signed a biometric release authorizing usage of their likeness for AI training.

This makes it the largest legally cleared dataset of human photographs ever released. The image count can be increased to two million through standard augmentation methods.
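To illustrate how 500,000 originals could become two million images, here is a minimal sketch. It assumes each original yields three augmented variants: a horizontal flip, a brightness shift, and a center crop. The announcement does not say which augmentations are meant, so these transforms, the function names, and the toy grayscale image format are all hypothetical.

```python
from typing import List

# Assumption: a toy grayscale image stored as rows of 0-255 pixel values.
Image = List[List[int]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def brighten(img: Image, delta: int = 20) -> Image:
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]


def center_crop(img: Image, margin: int = 1) -> Image:
    """Drop `margin` pixels from every edge."""
    rows = img[margin:len(img) - margin]
    return [row[margin:len(row) - margin] for row in rows]


def augment(img: Image) -> List[Image]:
    """Return the original plus three derived variants (illustrative choice)."""
    return [img, hflip(img), brighten(img), center_crop(img)]


def augmented_total(n_originals: int, variants_per_image: int = 3) -> int:
    """Dataset size once every original is joined by its variants."""
    return n_originals * (1 + variants_per_image)
```

Under these assumptions, `augmented_total(500_000)` returns 2,000,000, which matches the figure quoted above. A production pipeline would more likely rely on an established image library than on hand-written transforms.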

All the original photos were taken in a studio by highly trained professionals under the supervision of machine learning experts, so that the images are optimized for machine learning purposes.

As machine learning requires large volumes of data, companies frequently rely on outside sources to assemble it. Armed with little to no knowledge of the issue, armies of poorly compensated crowdsourced “Turks” scour the internet and quickly build sets of tens of thousands, hundreds of thousands, if not millions of images.

Since no one has the patience or time to vet the results properly, corrupted imagery enters the process.

In facial recognition, the issue is compounded by bias and privacy concerns. Biases related to race, gender, age, ethnicity, and appearance plague the results of even the most trusted facial recognition software.

This dredging method also opens up crippling legal and security traps. A recent study has shown that attackers can insert hidden samples into training data to steal secrets. And recent legislative developments have made some datasets a legal minefield, sometimes creating emotional nightmares for those involved.

The good news is that solid solutions are starting to emerge. vAIsual, already known for its high-quality synthetic media generation, has just released the most extensive dataset of real human faces: 500,000 original, high-resolution photos of real people, immediately available.

All are legally cleared for machine learning via explicit model releases and are of the highest professional quality. It is the world’s largest biometrically released real-life dataset.

“Our biometrically released real-life datasets are the core building blocks of the first generation of truly clean computer vision algorithms,” said Michael Osterrieder, vAIsual co-founder and CEO.

“Companies looking to achieve full legal compliance with the myriad of data protection laws coming into effect in the U.S., Europe and Asia will need easily licensable training data. Between our dataset and our soon-to-be-debuted dataset store, where online licensing will be as easy as A, B, C, vAIsual is positioned as a key partner for any company looking to succeed in the synthetic media space.

“We are at the forefront of AI development. This concerns not only technical but also legal aspects. Until now, too many companies have risked their future by using dirty datasets originally compiled for research purposes. Any of these companies using scraped datasets from research libraries can be sued at any time by many authorities in different jurisdictions – including the U.S. and Europe with its strict GDPR legislation in place.

“We are proud to be able to deliver a legally cleared dataset of human biometrics to enable companies to operate with legal security and peace of mind, without having to worry about the fallout of ethically and legally questionable strategies,” said Osterrieder.

With this offering, the AI development community can move away from corrupted scraped datasets while avoiding any legal pitfalls. According to Osterrieder, machine learning engineers can expect less training time, less GPU usage, lower costs, better yield, better results, and outputs that outperform their dirty-data rivals.