AI2 drops biggest open dataset yet for training language models

Language models like GPT-4 and Claude are powerful and useful, but the data on which they are trained is a closely guarded secret. The Allen Institute for AI (AI2) aims to reverse this trend with a new, huge text dataset that’s free to use and open to inspection. Dolma, as the dataset is called,…
Source: Reuters: Health - Category: Consumer Health News Source Type: news