Protecting Artists from Theft by AI

In 2023, author Melanie Mitchell discovered that a cheap AI-generated imitation of her book on, ironically, the subject of artificial intelligence was for sale on Amazon. She reported it, but the platform didn’t take action to remove it until the story of the theft caught the attention of the media. “I was mad at Amazon for doing so little to prevent this,” says Mitchell. “Right now they don’t see a lot of economic incentive to crack down.”

Other AI-generated knockoffs also turned up on the platform. Journalist Rory Cellan-Jones found a version of his memoir for sale when Amazon’s algorithm recommended the book to him. In his case, Amazon took it down after he pointed it out. But it’s unclear how many other AI-generated imitations may be available at the online retailer, which, it’s perhaps worth noting, began as a humble digital bookstore.

Then, in late 2023, it was revealed that Facebook parent company Meta and OpenAI had been training their AI using pirated books, prompting lawsuits from a number of prominent authors, including Sarah Silverman, Michael Chabon, and Ta-Nehisi Coates.

So far, the lawsuits against Meta and OpenAI have been partially dismissed on the grounds that the AI has not generated any works that are “substantially similar” to the originals. But the issue of piracy is only going to become more pressing for creative professionals who fear that the rapidly advancing technology could rip off their prose and ideas, and potentially put them out of work.

If you can’t copyright the output, what about the input?

Recently, Rodger Morrison, a professor at Troy University’s Sorrell College of Business who studies the intersection between artificial intelligence and business, hit upon a novel concept that may help pave the way to legal protection for artists against AI piracy. While pondering the fact that there is currently no legal mechanism for copyrighting AI-generated content (courts have ruled that it’s not protected because it’s not produced by a human), he had an idea: If you can’t copyright the output, what about the input? He began thinking about the kinds of prompts users feed AI to mimic a particular writer’s style, which are composed of specific words and phrases, also known as “tokens.”

To offer an example of how this mimicry might work, consider master of horror Edgar Allan Poe. To cue an AI model such as ChatGPT or Claude to write prose in Poe’s style, a human user might first prime it with something like: “Poe’s writing style includes gothic elements, dark imagery, psychological depth, cryptic symbolism, melancholic tone, intricate language, unreliable narrators, suspenseful pacing, supernatural exploration, emphasis on mood and atmosphere…”

From there, the AI breaks the prompt down into single-word and sub-word tokens. Spaces between the words are no longer necessary, so the descriptors can be run together into a compact string, a “tokenization.” Assuming that the token string is a faithful reflection of the author’s style, all you would need to do at this point to mimic the great gothic poet is enter an LLM prompt like:

“Poe’s writing style is GothicElements,DarkImagery,PsychologicalDepth,CrypticSymbolism. Please create a poem in Poe’s style that talks about a sad person being haunted by a crow.”
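
As a rough illustration of the string format in the example above, the short Python sketch below collapses plain-language style descriptors into a comma-separated token string and drops it into a prompt. It is only a sketch under stated assumptions: the function names to_style_token, build_token_string, and build_prompt are hypothetical, the code does not call any AI model, and it is not Morrison’s actual method or the sub-word tokenizer an LLM uses internally.

```python
def to_style_token(descriptor: str) -> str:
    """Collapse a descriptor such as 'dark imagery' into 'DarkImagery'."""
    # Capitalize each word, then join the words without spaces.
    return "".join(word.capitalize() for word in descriptor.split())


def build_token_string(descriptors: list[str]) -> str:
    """Join the collapsed descriptors with commas and no spaces."""
    return ",".join(to_style_token(d) for d in descriptors)


def build_prompt(author: str, descriptors: list[str], request: str) -> str:
    """Assemble a prompt in the shape of the example above (hypothetical helper)."""
    return f"{author}'s writing style is {build_token_string(descriptors)}. {request}"


if __name__ == "__main__":
    poe_descriptors = [
        "gothic elements",
        "dark imagery",
        "psychological depth",
        "cryptic symbolism",
    ]
    print(build_prompt(
        "Poe",
        poe_descriptors,
        "Please create a poem in Poe's style that talks about "
        "a sad person being haunted by a crow.",
    ))
```

Whether a string like this truly captures an author’s style is a separate question; the sketch only shows how the descriptors are packed together.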

To test his idea of copyrighting tokens, Morrison began experimenting with creating a token for his own unique writing style, eventually settling on a series of 12 words that seemed to prompt ChatGPT to accurately reproduce it. In March, that string of words became the first writing style token string ever granted a copyright by the United States Copyright Office.

“It’s an important development in terms of thinking about and calling attention to issues around copyright and protection for writers whose work may be mined and imitated by LLMs,” says cultural historian Catherine Clarke of the University of London, whose work has investigated the overlap between literature and artificial intelligence. “We know that users are already asking LLMs to imitate the style of named writers, with varying degrees of success.”

The issue extends to realms outside of prose, as Morrison notes that virtually any medium can be tokenized, from music to graphic arts to architecture, though different disciplines would require different strings of relevant tokens. To replicate the style of a particular musician, for example, one might need to aggregate separate tokens for composition style, instrument playing style, vocal waveform patterns, and other factors.

Morrison says that long-accepted legal precedent may make it difficult for aspiring AI intellectual property thieves to circumvent token copyright. While in theory copiers could attempt to use tokens that are slightly different from the copyrighted version, it could be argued that even the resemblance would be prohibited under law.

“If I were to take a book that is under copyright and change a few things around,” Morrison explains, “then the author of the original work can argue that I violated their copyright. This is called a ‘derivative work infringement’ or ‘substantial similarity infringement.’ Both concepts have a long legal protection history and could easily be applied to protecting a tokenization.”

Clarke agrees that token copyright holds some promise, but says there are uncertainties from a creative standpoint. For example, due to the very issue of similarity raised by Morrison, she questions if such methods will be able to capture the fine line between personal style and what is often referred to as idiolect: If a writer draws from another’s tone and use of idiom in a way that is traditionally accepted—especially within stylistically narrow or formulaic niche genres such as vampire fantasy or investing insights—will that be regarded as an infringement?

She also wonders how it might be applied in the cases of skilled literary writers who tend to vary their style and register across different pieces or genres, or even within a single piece of writing.

These are questions that remain to be explored. But if such token copyrights are not a solution in themselves, says Morrison, they may at least represent a first step toward finding one—and giving authors and artists some control over how their creative work finds its way into the world.

Lead image by Tasnuva Elahi; with images by Piece of Cake and The img / Shutterstock

Nick Hilden

Arts, science, and travel writer Nick Hilden contributes to the likes of the Washington Post, Scientific American, Esquire, Popular Science, National Geographic, and more. You can follow him on Twitter at @nickhilden or Instagram at @nick.hilden.
