Meta launches LLaMA, a large language model with up to 65 billion parameters
Meta announced on Friday (local time) that it will release a new AI-powered large language model for the research community, joining the AI race with Microsoft, Google, and other companies spurred by ChatGPT. LLaMA, short for "Large Language Model Meta AI," will be made available under a noncommercial license to researchers in government, civil society, and academia.
The company is reportedly developing LLaMA at several parameter sizes (7B, 13B, 33B, and 65B). LLaMA 65B and LLaMA 33B were trained on 1.4 trillion tokens, while the smallest model, LLaMA 7B, was trained on one trillion tokens.
Like other large language models, LLaMA works by taking a sequence of words as input and predicting the next word, generating text recursively by feeding each prediction back in. Meta trained the model on text from the 20 languages with the most speakers, focusing on those written in the Latin and Cyrillic alphabets.
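That generation loop can be sketched with a deliberately tiny stand-in model. The snippet below is a toy bigram predictor, not LLaMA itself: the corpus, the `predict_next` helper, and the `generate` function are all illustrative inventions. Only the loop's shape (predict the next word, append it, repeat) matches how autoregressive language models generate text.

```python
# Toy illustration of autoregressive generation. A real LLM replaces
# predict_next with a neural network conditioned on the whole context;
# the surrounding generation loop has the same shape.
from collections import Counter, defaultdict

# A tiny "training corpus" (hypothetical, for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()

# "Train": count which word follows which.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    # The "model": return the most frequent follower seen in training.
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

def generate(prompt, max_new_tokens=5):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens[-1])
        if nxt is None:  # no continuation known; stop early
            break
        tokens.append(nxt)  # feed the prediction back in (recursion)
    return " ".join(tokens)

print(generate("the"))  # → "the cat sat on the cat"
```

Greedy selection (always taking the most frequent follower) is used here for determinism; production systems typically sample from the predicted distribution instead.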
Of course, like other models, LLaMA is prone to bias, toxic output, and hallucination, and Meta acknowledges that more research is needed to address these shortcomings of large language models.
Meta said that LLaMA is a foundation model, designed to be versatile and applicable to many different use cases rather than fine-tuned for a specific task. By open-sourcing LLaMA's code, Meta hopes other researchers can more easily find new ways to limit or eliminate these problems. The company also provides a set of benchmark evaluations for measuring model bias and toxicity in the accompanying paper, to expose the model's limitations and support further research in this key area.
Notably, Meta also released the large language model OPT-175B in May of last year. That project was likewise aimed at researchers and forms the basis of a new iteration of its chatbot BlenderBot.
Later, the company launched a model called Galactica, which was said to be able to write scientific articles and solve mathematical problems, but its demo was taken down after it repeatedly generated authoritative-sounding but unreliable content.