Tag • #google imagen

Paper: Stable Diffusion “memorizes” some images, sparking privacy concerns

news.movim.eu / ArsTechnica · Wednesday, 1 February, 2023 - 18:37 · 1 minute

An image from Stable Diffusion’s training set (left) compared to a similar Stable Diffusion generation (right) when prompted with "Ann Graham Lotz." (credit: Carlini et al., 2023)

On Monday, a group of AI researchers from Google, DeepMind, UC Berkeley, Princeton, and ETH Zurich released a paper outlining an adversarial attack that can extract a small percentage of training images from latent diffusion AI image synthesis models like Stable Diffusion. The paper challenges the views that image synthesis models do not memorize their training data and that training data might remain private if not disclosed.

Recently, AI image synthesis models have been the subject of intense ethical debate and even legal action. Proponents and opponents of generative AI tools regularly argue over the privacy and copyright implications of these new technologies. Adding fuel to either side of the argument could dramatically affect potential legal regulation of the technology, and as a result, this latest paper, authored by Nicholas Carlini et al., has perked up ears in AI circles.

However, Carlini's results are not as clear-cut as they may first appear. Discovering instances of memorization in Stable Diffusion required 175 million image generations for testing and preexisting knowledge of trained images. Researchers only extracted 94 direct matches and 109 perceptual near-matches out of 350,000 high-probability-of-memorization images they tested (a set of known duplicates in the 160 million-image dataset used to train Stable Diffusion), resulting in a roughly 0.03 percent memorization rate in this particular scenario.
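The percentages follow directly from the counts quoted above. As a minimal sketch (the variable names are illustrative, and the combined direct-plus-near rate and the per-image generation count are derived here from the quoted figures, not stated in the excerpt), the arithmetic can be checked like this:

```python
# Figures as quoted in the paragraph above (Carlini et al., 2023, as reported).
DIRECT_MATCHES = 94
NEAR_MATCHES = 109
IMAGES_TESTED = 350_000
GENERATIONS = 175_000_000


def percent(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Rate for direct matches only; this is the "roughly 0.03 percent" figure.
direct_rate = percent(DIRECT_MATCHES, IMAGES_TESTED)

# Derived here: rate if perceptual near-matches are counted as well.
combined_rate = percent(DIRECT_MATCHES + NEAR_MATCHES, IMAGES_TESTED)

# Derived here: average number of generations spent per tested image.
generations_per_image = GENERATIONS // IMAGES_TESTED

print(f"direct matches: {direct_rate:.3f}%")
print(f"direct + near matches: {combined_rate:.3f}%")
print(f"generations per tested image: {generations_per_image}")
```

Counting only the 94 direct matches gives about 0.027 percent, which rounds to the 0.03 percent cited; even with the 109 near-matches included, the rate stays well under a tenth of a percent.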

Artist finds private medical record photos in popular AI training data set

news.movim.eu / ArsTechnica · Wednesday, 21 September, 2022 - 15:43 · 1 minute

Censored medical images found in the LAION-5B data set used to train AI. The black bars and distortion have been added. (credit: Ars Technica)

Late last week, a California-based AI artist who goes by the name Lapine discovered private medical record photos taken by her doctor in 2013 referenced in the LAION-5B image set, which is a scrape of publicly available images on the web. AI researchers download a subset of that data to train AI image synthesis models such as Stable Diffusion and Google Imagen.

Lapine discovered her medical photos on a site called Have I Been Trained that lets artists see if their work is in the LAION-5B data set. Instead of doing a text search on the site, Lapine uploaded a recent photo of herself using the site's reverse image search feature. She was surprised to discover a set of two before-and-after medical photos of her face, which had only been authorized for private use by her doctor, as reflected in an authorization form Lapine tweeted and also provided to Ars.

My face is in the #LAION dataset. In 2013 a doctor photographed my face as part of clinical documentation. He died in 2018 and somehow that image ended up somewhere online and then ended up in the dataset- the image that I signed a consent form for my doctor- not for a dataset. pic.twitter.com/TrvjdZtyjD

— Lapine (@LapineDeLaTerre) September 16, 2022

Lapine has a genetic condition called Dyskeratosis Congenita. "It affects everything from my skin to my bones and teeth," Lapine told Ars Technica in an interview. "In 2013, I underwent a small set of procedures to restore facial contours after having been through so many rounds of mouth and jaw surgeries. These pictures are from my last set of procedures with this surgeon."
