@skaphle (I will just untag some people here if that's okay. I don't really want to spam their mentions if it's not needed.)
I am with you that just taking images from somewhere and putting them into a dataset is (in most cases) illegal, as I outlined with the general opt-out nature of copyright. That is, until a court decides otherwise.
Enforcing it is obviously the problem. Disney and Universal are fighting OpenAI in court over this issue right now. Will be an interesting case to follow.
Scraping in and of itself can still be fair use, btw. It's more a matter of how you use the data. Doing it for AI research in a scientific setting might be deemed legal (i.e. because scientific use is broadly considered fair use).
Whether OpenAI can clear that bar, I dunno. But likely not, since they are using the stuff commercially, not just to research the behaviour of AI and such. But that's ultimately the job of the courts to decide.
As I said as well, I dunno if an explicit prohibition in a license can beat fair use or not. If yes, the NoAI tag might prove useful, as it makes for a clear case, as opposed to the alternative of fighting over whether something is fair use or not. Making that tag machine-readable / embedded also makes sure you cannot claim you did not know as an AI company.
But as I said, I dunno if that's correct or not. If fair use beats that and OpenAI somehow gets fair use through, we are fucked anyway.