Artificial intelligence is a regular part of your life whether you realize it or not. In my life, I regularly use the Content-Aware Fill feature in Photoshop to remove branches messing with the composition of my wildlife photographs. On long road trips, I frequently use my car’s auto-lane-keep feature to help keep my family safe and make the trip less exhausting. Almost daily, I use speech-to-text on my phone to dictate rough text that I later import into my Mac for editing. Yesterday, I asked Siri a question about a favourite singer. I watch a lot of YouTube videos, and some are clearly narrated by computers. Many of us have experienced “customer service” calls that involve a computer-generated voice asking “You said XYZ. Is that correct?” Search engines, weather reports, climate models, financial and insurance applications, and route planning on your favourite map app all involve some element of artificial intelligence.
AI technology is everywhere, but somehow AI art feels different. Creativity was supposed to be the last bastion of “we are better than robots,” but dang, some of the art created by DALL-E and Stable Diffusion is - I’ll say what many are loath to admit - gorgeous. AI is coming for writers too. In the last week, I laughed out loud at an article written by an AI application. This competence seemed to pop up overnight. When so many of my creative friends already suffer from imposter syndrome, the fear and pushback aimed at this technology were predictable. The possibility of creatives being replaced by “robots” hit way too close to home and bank account.
The objection from artists that resonates with me the most is the non-consensual use of their work to train the “brains” of each AI. Understanding this objection requires some context.
The “sticky wicket” of machine learning is that it is not that different from human learning. As an imperfect analogue, imagine giving the Flash a pencil and a huge curated pile of images labelled with something like “puppy,” then asking him to draw every picture in the pile. You also have him speed-watch some how-to-draw videos. Over time you’d see his drawings improve (the 10,000 hours needed to become an expert doesn’t really apply here). When he finishes the pile, you take away all reference material and say “draw a puppy.” He’s going to be able to do that. That is essentially what is happening.
What many artists are objecting to is having their art included, without their permission, in that “pile.” Why? Because it allows the AI to learn and imitate their style, producing pieces that look very much like their work without any compensation.
If this describes you, there is one option to reduce the likelihood of your art being gathered that doesn’t involve a lawyer. Stable Diffusion is very transparent about what dataset its image generator uses. This engine uses datasets of billions of images that are subsets of data collected by Common Crawl. In the same way that Google uses “spiders” to crawl the web for data to index, the non-profit organization Common Crawl has a spider that captures the data from billions of web pages monthly. The resulting repository of web crawl data, petabytes in size, can be accessed and analyzed by anyone for free.
So, if we assume that Stable Diffusion’s dataset is refreshed from time to time (an assumption I have not been able to confirm), one way to stop it from training on your art is to make sure that it is never collected by Common Crawl in the first place.
On your own website, this is very straightforward. The Common Crawl spider (CCBot) respects instructions in the robots.txt file.
The instructions…
User-agent: CCBot
Disallow: /
… in a robots.txt file in the root directory of your website will tell the Common Crawl spider to leave your data alone. I’ve included links to a couple of instructional videos to help you with this so you can do it yourself or pass it along to your web designer. (Aside: Without knowing where and how DALL-E gets its data, there is no way to do the same for that engine.)
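If you want to double-check that your robots.txt actually says what you think it says, Python’s standard library can parse it for you. This is a minimal sketch using `urllib.robotparser`; the example.com URLs are placeholders standing in for your own site.

```python
from urllib.robotparser import RobotFileParser

# The two lines recommended above, exactly as they would appear
# in your site's robots.txt file.
robots_txt = """\
User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# CCBot (the Common Crawl spider) is blocked from everything...
print(parser.can_fetch("CCBot", "https://example.com/art/puppy.jpg"))

# ...while other well-behaved crawlers, like Googlebot, are unaffected,
# so your site stays visible in search results.
print(parser.can_fetch("Googlebot", "https://example.com/art/puppy.jpg"))
```

To check a live site instead of a string, point the parser at the file with `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()`. Remember that robots.txt is a polite request, not a lock: it only works for crawlers that choose to honour it, which CCBot does.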
How to Add a Robots.txt File - YouTube
Of course, this only limits data collection of images on your own website and for one major crawler. Fortunately, other important sites like DeviantArt are creating ways to state your intentions. As of November 11, the default on that platform is for art to be excluded from their AI generator, DreamUp. They have also created a “noai” directive in web page HTML that they hope the industry will adopt and respect. Rather than fighting “promptists” who have created work that resembles your style, or trying to sue Microsoft over DALL-E, keeping your images out of the system is an option that is increasingly straightforward.
Whether that is actually a good idea is fodder for another post, but not just yet. The wounds are currently too fresh and the fear too real, so I’ll save my opinion for later. As a preview, it will likely include Australia, Ernest Hemingway, Will Eisner, spawning, parallel evolution, Star Trek replicators and even a reference to porn. But that’s for another time.
For more information:
Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion's Image Generator
What DALL-E Dataset Did OpenAI Use? – NightCafe Creator
UPDATE All Deviations Are Opted Out of AI Datasets by team on DeviantArt
Update: December 21, 2022 - DeviantArt’s “noai” and “noimageai” meta tags
DeviantArt is trying to get the industry to adopt two new meta tags, “noai” and “noimageai.” GlobalComix recently installed these as a default, and other organizations, including ArtStation, have added the tags as a user opt-in. All the provisos mentioned above regarding good robot behavior still apply.
Rather than reinvent the wheel, I’d like to point you to an excellent article on how to install either or both of these tags on your own website.
See What is DeviantArt’s new “noai” and “noimageai” meta tag and how to install it by illustrator Aimee Cozza.
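For the curious, the tags themselves are ordinary HTML meta directives placed in the head of each page. Based on DeviantArt’s announcement, they look something like this (the title is a placeholder):

```html
<head>
  <title>My Portfolio</title>
  <!-- Ask AI crawlers not to use anything on this page for training. -->
  <meta name="robots" content="noai">
  <!-- Ask AI crawlers not to use the images specifically. -->
  <meta name="robots" content="noimageai">
</head>
```

As with robots.txt, these tags are requests rather than enforcement; they only matter to crawlers that agree to honour them. Cozza’s article covers the installation details for common platforms.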
But it all pales in importance…
As important as the AI discussion is, it pales in comparison to the announcement out of Lawrence Livermore National Laboratory that hit the news today. Stash December 5th, 2022 in the trivia corner of your brain. On that day, scientists created a fusion reaction that produced more energy than was used to trigger it. This is a “Wright Brothers” moment with world-saving and transforming potential that will, among other things, one day eliminate the need to burn fossil fuels for energy. I never thought I’d see commercial fusion power on the grid in my lifetime, but now I am thinking the odds are 50/50. It makes me well up to consider that this breakthrough is the culmination of decades of research. Perseverance for the win!
Nuclear scientist Marv Adams explains what happened in the successful fusion experiment
In full: Leading scientists discuss breakthrough in nuclear fusion
Bill Nye explains why nuclear fusion breakthrough is a big deal
The good news is that “responsible data collection” is being taught in at least one machine learning course: the HarvardX “TinyML” certification series on edX emphasizes that data engineers should know where their data comes from and obtain permissions when creating AI models.