What about copyright?
AI models like ChatGPT learn to generate output because their are exposed to a lot of data during their training (see âTâ). This can be data of all sorts, mostly obtained from the internet, including personal writings, music, art. There are at least two problems associated with this. One has to do with the data used for training, as some of it can be copyrighted and might not have been legal to use it. The other one has to do with the abstraction and generation abilities of LLMs: based on what they have seen, they might create new pieces that are extremely similar to the originals, reproducing the style or specific traits of an existing piece of work, or a given artist. Especially small artists can be at risk, since synthetic creations can be inspired from and look like theirs, without actually giving them any credit or payment.