Published May 16, 2023

Sundar Pichai, CEO of Alphabet Inc., during the Google I/O Developers Conference in Mountain View, Calif., on Wednesday, May 10, 2023.

CNBC has learned that Google’s new large language model, which the company announced last week, uses nearly five times as much training data as its predecessor from 2022, allowing it to perform more advanced coding, math and creative writing tasks.

PaLM 2, the company’s new general-use large language model (LLM) unveiled at Google I/O, was trained on 3.6 trillion tokens, according to internal documents seen by CNBC. Tokens, which are strings of words, are an important building block for training LLMs, because they teach the model to predict the next word that will appear in a sequence.

Google’s previous version of PaLM, which stands for Pathways Language Model, was released in 2022 and trained on 780 billion tokens.
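The "nearly five times" figure can be checked directly from the two reported token counts. The following is a minimal sketch in Python; the constant and function names are illustrative, and the numbers are the ones reported in this article, not independently verified figures.

```python
# Token counts as reported in this article (assumed, not independently verified).
PALM_TOKENS = 780e9     # original PaLM, released in 2022
PALM2_TOKENS = 3.6e12   # PaLM 2, per internal documents seen by CNBC


def token_ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


ratio = token_ratio(PALM2_TOKENS, PALM_TOKENS)
print(f"PaLM 2 was trained on about {ratio:.1f}x as many tokens as PaLM")
```

The ratio comes out to roughly 4.6, which is consistent with the description "nearly five times."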

While Google has been eager to showcase the power of its AI technology and how it can be integrated into search, email, word processing and spreadsheets, the company has been unwilling to publish the size or other details of its training data. OpenAI, the Microsoft-backed creator of ChatGPT, has also kept secret the details of its latest LLM, GPT-4.

The companies say the reason for the lack of disclosure is the competitive nature of the business. Google and OpenAI are rushing to attract users who might want to search for information using chatbots instead of traditional search engines.

But as the AI arms race rages on, the research community is calling for more transparency.

Since unveiling PaLM 2, Google has said the new model is smaller than prior LLMs, which is significant because it means the company’s technology is becoming more efficient while accomplishing more complex tasks. PaLM 2 has 340 billion parameters, according to internal documents. Parameter count is an indication of a model’s complexity. The initial PaLM has 540 billion parameters.
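One way to read the "smaller but more capable" claim is to compare how much training data each model saw per parameter. The sketch below does that arithmetic in Python. The dictionary layout and function name are illustrative, the figures are those reported in this article, and tokens per parameter is only a rough yardstick chosen here for comparison, not a metric Google has cited.

```python
# Figures as reported in this article (assumed, not independently verified).
models = {
    "PaLM": {"params": 540e9, "tokens": 780e9},
    "PaLM 2": {"params": 340e9, "tokens": 3.6e12},
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Return the number of training tokens per model parameter."""
    return tokens / params


for name, m in models.items():
    tpp = tokens_per_param(m["params"], m["tokens"])
    print(f"{name}: {tpp:.1f} training tokens per parameter")

# Relative change in parameter count from PaLM to PaLM 2 (negative means smaller).
size_change = models["PaLM 2"]["params"] / models["PaLM"]["params"] - 1
print(f"Change in parameter count: {size_change:.0%}")
```

On the reported numbers, PaLM 2 has roughly 37% fewer parameters than PaLM but saw about 10.6 training tokens per parameter, compared with about 1.4 for the original model.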

Google did not immediately provide comment for this story.