corneliusdavid 203 Posted July 29
6 hours ago, Dalija Prasnikar said: you don't even have to open it. It may just be a part of your project where AI will go through your complete data
Hmm... Yes, that is a serious consideration. Eye-opening article.
5 hours ago, Anders Melander said: Open Source does not mean that the code can be used freely. There are still conditions that must be observed (attribution, restrictions on usage, etc.). An AI (or the people that train it) doesn't care about that.
True. Good point.
Brandon Staggs 242 Posted July 30
On 7/29/2024 at 3:48 AM, Anders Melander said: Open Source does not mean that the code can be used freely. There are still conditions that must be observed (attribution, restrictions on usage, etc.). An AI (or the people that train it) doesn't care about that.
There will need to be laws written specifically to cover this stuff. Right now, copyright is very clearly understood and enforced as a means of protecting specific expressions of an idea. Your exact code is copyrighted, but the way you solve problems is not. Anyone is free to "train" themselves by publicly available source code. They are not free to copy source code unless the license allows it.
I don't know why programmers think that their code on github should be off-limits to AI training when it is already being used to train human minds.
Anders Melander 1753 Posted July 30
8 minutes ago, Brandon Staggs said: Anyone is free to "train" themselves by publicly available source code.
Are you really comparing the way an actual sentient intelligence learns principles by study to the way a language model works?...
2 minutes ago, Brandon Staggs said: I don't know why programmers think that their code on github should be off-limits to AI training when it is already being used to train human minds.
Well, I guess you are then. While that's an interesting fantasy, the reality is that you are equating two entirely different things. Language models cannot and do not form independent ideas; they can only function by copying, either directly or indirectly, existing work, and nothing prevents an LLM from suggesting a solution based entirely on or copied verbatim from a single source - and that is a license violation.
Brandon Staggs 242 Posted July 30
7 minutes ago, Anders Melander said: Are you really comparing the way an actual sentient intelligence learns principles by study to the way a language model works?... Well, I guess you are then. While that's an interesting fantasy, the reality is that you are equating two entirely different things. Language models cannot and do not form independent ideas; they can only function by copying, either directly or indirectly, existing work, and nothing prevents an LLM from suggesting a solution based entirely on or copied verbatim from a single source - and that is a license violation.
Look, I completely agree about how these things work. I am not a fan of these autocomplete algorithms. But you can't just wave your hands and say "this is different" when it comes to language models being trained with publicly visible source code and people using it to figure out how to solve problems.
What is your LEGAL BASIS for saying it is a license violation? What part of law are you referring to? Various aspects of law protect the intellectual property of source code. Talk about copyright or patents. The only reason you can write a software license and expect someone else to honor it is because of these legal protections, primarily copyright. If you upload your source code to a visible repository and then write in the license file that nobody is allowed to look at your code to learn how it works, you would have no basis for enforcement.
I don't know why you think that language models are somehow some new obviously invalid use. Nobody has written the laws yet.
Stefan Glienke 1978 Posted July 31
18 hours ago, Brandon Staggs said: But you can't just wave your hands and say "this is different" when it comes to language models being trained with publicly visible source code and people using it to figure out how to solve problems. What is your LEGAL BASIS for saying it is a license violation?
Because things like Clean room design exist to minimize the chances of any copyright infringement. There have been lawsuits over "similarly enough looking" code in the past. Also, there is the simple fact that gen AI does not care about any license attribution, which you would have to follow if you take any source code under one of the major permissive licenses that still require attribution.
Brandon Staggs 242 Posted July 31
3 hours ago, Stefan Glienke said: Because things like Clean room design exist to minimize the chances of any copyright infringement. There have been lawsuits over "similarly enough looking" code in the past. Also, there is the simple fact that gen AI does not care about any license attribution, which you would have to follow if you take any source code under one of the major permissive licenses that still require attribution.
How on earth do you hire a competent programmer and make the case they have never looked at open source code available on github? It's one thing to use a clean room to avoid infringing on closed source, but good luck in court saying you have never looked at code on github!
And again, nobody has explained in what legal sense an algorithm scraping text on github to generate summaries is a violation of copyright law when programmers do it all the time. EITHER ONE can lead to an actual license violation. An AI could certainly end up spitting out verbatim code that is a de facto copyright violation, but human coders do this routinely with copy and paste. (Not saying this is okay, but we haven't shut down human access to publicly available source code just because of the potential for copyright infringement.)
Just to be clear, I am not comparing how LLMs are trained to how humans are trained, and I am not suggesting I am "okay" with LLMs being trained by web scraping (whether it be code or blog posts or news articles or books). I am just pointing out that there is no legal basis yet for many of the claims being made here.
And to bring this topic to a point of reality, I think anyone who is concerned with their code being used to train language models has no choice but to keep the code private. The law is not clear, and is probably not on your side, and since the companies pushing AI are the ones with the most money, the law will probably never be on your side.
You're stuck keeping your code private if you don't want it being used to train language models.