Sony v Universal established a very important legal doctrine with regard to "commercially significant non-infringing use". You can take my word for it, you can go and do your own research, you can confirm with an IP lawyer, or you can wait for the court's opinion.
Or I guess you can give me a little bit of time to go and help you do some of your own research, which I will do right now, so just hold on a bit!
I'm not trying to trick people or win some hypothetical argument, I'm trying to help people see how the courts will consider these issues and how and why we should agree with their rulings that these tools are fair use of copyright protected works!
I love copyright, probably more than most people on these forums, but the liability needs to be on the people using these tools. Supabase Clippy is fantastic and they should not bear any costs, even implicitly through OpenAI paying royalties. Someone who releases a "Sarah Anderson Cartoon Maker" tool using Stable Diffusion should still be found to have violated Sarah Anderson's common law right of publicity, just as someone who releases a "Vacation Photo Background Cleaner-Upper" tool using Stable Diffusion should not bear any costs, even implicitly through StabilityAI paying royalties to Sarah Anderson.
These must be considered in their capacities as tools regardless of how they were made, and for reasons foundational to the law itself: How can you prove that I used a given tool such as Stable Diffusion or the "Vacation Photo Background Cleaner-Upper" without concrete evidence, such as the presence of the software on my laptop? Can law enforcement get a warrant to search based on no visible evidence at all that Sarah Anderson's works were somehow used in the training process of a tool that I have on my private property?
Edit:
---
In the Texas Law Review in March 2021, Mark Lemley, a Stanford law professor, and Bryan Casey, then a lecturer in law at Stanford, posed a question: "Will copyright law allow robots to learn?" They argue that, at least in the United States, it should.
"[Machine learning] systems should generally be able to use databases for training, whether or not the contents of that database are copyrighted," they wrote, adding that copyright law isn't the right tool to regulate abuses.
But when it comes to the output of these models – the code suggestions automatically made by the likes of Copilot – the potential for the copyright claim proposed by Butterick looks stronger.
"I actually think there's a decent chance there is a good copyright claim," said Tyler Ochoa, a professor in the law department at Santa Clara University in California, in a phone interview with The Register.
In terms of the ingestion of publicly accessible code, Ochoa said, there may be software license violations but that's probably protected by fair use. While there hasn't been a lot of litigation about that, a number of scholars have taken that position and he said he's inclined to agree.
Both Lemley and Ochoa state that the models themselves are probably protected by fair use. That means it is perfectly fine for OpenAI to train its models on publicly accessible copyright-protected works without asking for permission, paying any royalties, or adhering to any of the terms of the license.
They are also free to distribute this tool and to charge people to use it.
What Ochoa means by a good chance of a copyright claim is that the tool doesn't absolve its users of copyright violation. The liability is on the person using the tool, regardless of which tool is used. With copyright it's not intent that matters; it's that you ended up publishing something that looks enough like someone else's picture that twelve jurors would consider it not too different from a simple photocopy.
Now, this can still be a problem for Copilot because an engineer's company might not want to be injecting a lot of copyright protected code into their products, but for the most part the outputs from Copilot have been non-infringing. That it sometimes produces infringing code does not matter to anyone other than the person using Copilot.
Ochoa goes on in detail about what is and isn't covered by copyright with regards to code, which is one of the things that gives me confidence to use Copilot and know that I'm not putting myself at risk:
But in terms of where Copilot may be vulnerable to a copyright claim, Ochoa believes LLMs that output source code – more so than models that generate images – are likely to echo training data. That may be problematic for GitHub.
"When you're trying to output code, source code, I think you have a very high likelihood that the code that you output is going to look like one or more of the inputs, because the whole point of code is to achieve something functional," he said. "Once something works well, lots of other people are going to repeat it."
Ochoa argues the output is likely to be the same as the training data for one of two reasons: "One is there's only one good way to do it. And the other is [you're] copying basically an open source solution.
"If there's only one good way to do it, OK, then that's probably not eligible for copyright. But chances are that there's just a lot of code in [the training data] that has used the same open source solution, and that the output is going to look very similar to that. And that's just copying."
In other words, the model may suggest code to solve a problem for which there's only really one practical solution, or it's copying from someone's open source that does the same thing. In either case, that's probably because a lot of people have used the same code, and that shows up a lot in the training data, leading to the assistant regurgitating it.
So in practice, it is pretty easy to tell that Copilot is spitting out purely functional suggestions basically all of the time, as there isn't really any other way to wire up a unit test or call a specific API.
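As a concrete illustration (my own hypothetical snippet, not actual Copilot output): there is essentially one idiomatic way to express a trivial unit test in Python, so any two developers, or a model trained on their code, will converge on near-identical text. Expression this constrained by function is generally the kind courts decline to protect.

```python
# Hypothetical example of "purely functional" code: the standard idiom for a
# trivial unit test. The functional goal dictates the form almost completely,
# leaving little room for the creative expression copyright protects.

def add(a, b):
    return a + b

def test_add_returns_sum():
    # Call the function, assert the expected value. There is essentially
    # no other way to wire this up, so independent authors converge on it.
    assert add(2, 3) == 5

test_add_returns_sum()
```

Compare that to, say, the architecture of a whole module: there the author makes many discretionary choices, which is where expressive (and protectable) content starts to appear.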
Ironically, if Copilot gets better at "software architecture" then it starts to cross over into the expressive parts of software that are indeed covered by copyright, meaning these issues of liability become harder for the end user to discern and enough of a problem that GitHub would want to figure out attribution or somehow "clear" the suggestions for the user.
Edit: after the text you added, I believe we are essentially in agreement. IF it is accepted that the way GPT3 was created is a fair use of the works in the training data, THEN I fully agree that (1) OpenAI has every right to sell it even if (2) some uses of it would still constitute copyright infringement, since (3) only specific users would be liable for copyright infringement in their uses.
Where we differ is in how certain we are that the IF is true. I for one believe there is a good chance that the training of an LLM on copyrighted works does infringe on the copyright of those works (if no other exceptions apply, such as the LLM being trained only for academic research purposes, of course).
My original response:
Where Sony v Universal definitely applies though is when evaluating whether OpenAI's selling of GPT3 to others who then use it to create copyright-infringing works would make OpenAI liable for contributory infringement. Here, the similarities are crystal clear, and the conclusion is simple: since there clearly exist non-infringing uses of GPT3 (such as Supabase Clippy), OpenAI is fully in the clear to sell GPT3, just as much as Sony was for selling the VCR.
However, this assumes that OpenAI has the rights to the IP of GPT3 itself in the first place, which is a prerequisite to them being allowed to sell it at all. Sony certainly had the rights to the IP of the VCR - Universal never claimed that the VCR was a derivative work of their movies.
Essentially, in Sony v Universal, Universal was claiming (1) that Sony was liable for contributory infringement, since (2) all customers who used the VCR to record and then play back a Universal show were guilty of copyright infringement. The court established that (2) was in fact fair use, and from there (1) automatically became false, since now there was an established legal way of using the Sony product.
But, in a hypothetical OpenAI v Universal, Universal could plausibly claim that (1) OpenAI is liable for copyright infringement directly, since they are distributing GPT3, (2) which is a derivative work of Universal's IP used in the training set of GPT3.
I honestly think you're more certain because you want the courts to rule in a certain manner.
I'm more certain because I'm thinking about this separately from my opinions about how the courts will rule. Yes, I will just so happen to agree with that ruling, because I agree with the logic embodied in our legal process. I agree that the existing legal doctrines already capture the spirit of what we are asking the courts to judge. I agree that the statutory law, case law, and doctrine that inform their judgment will successfully balance the limited rights of copyright holders against the natural right of the public to unburdened access to the arts, knowledge, and information. And I agree with their process of weighing the potential impact on existing commercial practice against the potential impact on new forms of commercially significant non-infringing practice.
Some things in copyright might just seem unfair, like the case of Baker v Selden:
In 1859, Charles Selden obtained copyright in a book he wrote called Selden's Condensed Ledger, or Book-keeping Simplified, which described an improved system of book-keeping. The book contained about twenty pages, primarily book-keeping forms, and only about 650 words, plus examples and an introduction. In the following years Selden made several other books improving on the initial system. In total, Selden wrote six books, though evidence suggests that they were really six editions of the same book.
Selden, however, was unsuccessful in selling his books. He originally believed he could sell his system to several counties and the United States Department of the Treasury. Those sales never happened. Selden was forced to assign his interest—an interest that apparently was returned to his wife after his death in 1871.
In 1867, W.C.M. Baker produced a book describing a very similar system. Unlike Selden, Baker was more successful at selling his book, selling it to some 40 counties within five years.
Selden's widow, Elizabeth Selden, hired an attorney, Samuel S. Fisher, a former Commissioner of Patents. In 1872, Fisher filed suit against Baker for copyright infringement.
The poor old widow lost. Boohoo. But this was a just ruling!
I'm not sure that the people who think that ChatGPT is guilty of copyright infringement are thinking about the issue in a balanced manner. Luckily our courts probably will!
One strategy that the defense could use to lower their risk profile is to allow open access to their models and allow an entire ecosystem of commercially significant non-infringing uses to blossom because they are aware of how the courts will be influenced based on existing statutory and legal doctrine...
https://www.theregister.com/2022/10/19/github_copilot_copyri...
https://texaslawreview.org/fair-learning/