Hacker News

I'd wager that there are several players in the AI market who have already scraped and OCR'd every book and magazine on zlib and libgen to feed into their training sets. Google have almost certainly piped everything they have in Google Books into their models, before some future legal case says they can't. It won't take long before the open community starts doing the same.

