
    Meta Begins Training Llama 4: Insights from the Llama 3.1 Development Journey

    Meta scientists have commenced training the highly anticipated Llama 4, coinciding with the release of Llama 3.1. In a recent interview, AI research scientist Thomas Scialom, who led the post-training of Llama 2 and Llama 3, shared valuable insights into the development process and future directions for the Llama series.

    Understanding Llama 3.1’s Development

    Data and Synthetic Data Usage

    Llama 3.1, Meta’s latest open-source model, has intrigued many with its impressive capabilities. Key questions revolve around the data it uses, the amount of synthetic data involved, and the decision not to use the Mixture of Experts (MoE) architecture.

    Post-Training and RLHF Processes

    Scialom explained the intricate post-training and Reinforcement Learning from Human Feedback (RLHF) processes, shedding light on the model evaluation methods.

    Parameter Scale and Challenges

    Balancing Factors in Parameter Selection

    Choosing the parameter scale for LLMs requires considering multiple factors, including scaling laws, training time, GPU constraints, and hardware availability across the AI community. Not everyone uses H100 GPUs; thus, the model must accommodate various GPU models and memory sizes.

    Inference and Training Costs

    Lower-precision formats such as FP16 and FP8 alter the cost proportions of inference versus training and fine-tuning. Despite these challenges, Meta aimed to create a scalable model that balances inference efficiency within the constraints of current computing power.
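    The effect of precision on deployability can be illustrated with a back-of-envelope weight-memory estimate. This is a sketch only: real serving also needs memory for the KV cache and activations, and the model sizes below are chosen purely as examples.

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights."""
    return num_params * bytes_per_param / 1e9

# Illustrative model sizes: an 8B and a 405B parameter model.
for n_params in (8e9, 405e9):
    fp16 = model_memory_gb(n_params, 2)  # FP16: 2 bytes per parameter
    fp8 = model_memory_gb(n_params, 1)   # FP8: 1 byte per parameter
    print(f"{n_params / 1e9:.0f}B params: {fp16:.0f} GB at FP16, {fp8:.0f} GB at FP8")
```

    Halving the bytes per parameter halves the weight footprint, which is why a lower-precision variant of a large model can fit on GPUs that its full-precision counterpart cannot.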

    Scaling Law and Model Size

    Re-evaluating the Scaling Law

    While the familiar Scaling Law relates model size to the amount of training data, Scialom pointed out that GPT-3 had more parameters than its training-token budget warranted. In contrast, Chinchilla’s approach makes optimal use of a fixed compute budget by balancing parameters against training tokens. Meta’s strategy deviated from Chinchilla’s principles in the opposite direction: it increased training tokens and training time, pushing the model into an “overtrained” state to enhance performance at inference.
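    The size of this deviation can be sketched with the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the 70B model size below is chosen only for illustration, and the 15T figure is the Llama 3 dataset size mentioned later in this article.

```python
def chinchilla_optimal_tokens(num_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal token budget per Chinchilla's ~20 tokens/parameter heuristic."""
    return num_params * tokens_per_param

params = 70e9                                  # a 70B-parameter model, for illustration
optimal = chinchilla_optimal_tokens(params)    # ~1.4T tokens
actual = 15e12                                 # Llama 3's reported 15T-token dataset
print(f"Chinchilla-optimal budget: {optimal / 1e12:.1f}T tokens")
print(f"Overtraining factor at 15T tokens: {actual / optimal:.1f}x")
```

    Training roughly an order of magnitude past the compute-optimal point is inefficient per unit of training compute, but it yields a smaller model that is cheaper to serve at a given capability level.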

    Expanding Data and Architecture

    Compared to Llama 2, Llama 3’s dataset expanded significantly, from 2T to 15T tokens. Future improvements in architecture will likely go beyond the Transformer, addressing its current inflexibility in allocating compute across tokens.

    Synthetic Data and Filtering

    Filtering High-Quality Data

    Scialom emphasized the importance of filtering high-quality data from the vast amount of text on the internet. For Llama 2, Meta used Llama as a classifier to label and balance topics like mathematics, law, and politics. Llama 3’s post-training relied solely on synthetic data from Llama 2, highlighting the potential of synthetic data as model performance improves.
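    The label-and-balance idea can be sketched as follows. Everything here is illustrative: the keyword-based `classify` function is only a stand-in for prompting the Llama model itself to label a document’s topic, and the topic list and cap are made-up parameters.

```python
from collections import Counter

TOPICS = ["mathematics", "law", "politics", "other"]

def classify(doc: str) -> str:
    """Stand-in for an LLM topic classifier (in practice, a prompt to the model)."""
    for topic in TOPICS[:-1]:
        if topic in doc.lower():
            return topic
    return "other"

def balance_by_topic(docs: list[str], per_topic_cap: int) -> tuple[list[str], Counter]:
    """Keep at most `per_topic_cap` documents per topic to rebalance the training mix."""
    kept, counts = [], Counter()
    for doc in docs:
        topic = classify(doc)
        if counts[topic] < per_topic_cap:
            kept.append(doc)
            counts[topic] += 1
    return kept, counts

docs = ["A law review article", "Another law casebook", "A mathematics proof", "A cooking blog"]
kept, counts = balance_by_topic(docs, per_topic_cap=1)
print(counts)  # at most one document kept per topic
```

    Capping each topic prevents over-represented categories on the web from dominating the training mix, which is the balancing effect described above.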

    Model Evaluation and RLHF

    Challenges in Evaluation

    Evaluating language models remains an open research question, especially as models become more advanced. Overfitting to benchmarks can skew performance metrics. Meta employs various evaluation methods, including reward models, model-as-a-judge, diverse prompts, and benchmarks.

    Iterative RLHF for Improvement

    A practical approach to comparing models involves multiple rounds of RLHF. By sampling annotated prompts and comparing the responses of old and new models, Meta can automatically calculate the win rate, ensuring continuous improvement.
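    The win-rate comparison described above can be sketched as follows. The per-prompt judgments here are invented for illustration; in practice each one would come from a reward model or a model acting as judge, and counting ties as half a win is one common convention, not necessarily Meta’s exact recipe.

```python
def win_rate(judgments: list[str]) -> float:
    """Fraction of prompts where the new model beats the old one; ties count as half."""
    score = sum(1.0 if j == "new" else 0.5 if j == "tie" else 0.0 for j in judgments)
    return score / len(judgments)

# Hypothetical per-prompt verdicts comparing a new model checkpoint against the old one.
judgments = ["new", "new", "tie", "old", "new", "tie", "new", "old"]
print(f"win rate: {win_rate(judgments)}")  # 4 wins + 2 ties/2 = 5.0 out of 8
```

    A win rate above 0.5 indicates the new round of RLHF produced a genuinely better model, which is the signal that drives the iterative loop.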

    The Future: Llama 4 and Agent Technology

    Training Llama 4

    Training for Llama 4 began in June, with a focus on agent technology. Meta has already developed tools like Toolformer and aims to expand these capabilities. The GAIA benchmark, released a year ago, evaluates models’ real-world problem-solving abilities. Systems driven by GPT-4 have shown significant improvements over GPT-3-based ones, indicating the potential for advanced agent functionalities.

    Enhancing Agent Capabilities

    Scialom believes that agent capabilities, such as function calling, following complex instructions, advanced planning, and multi-step reasoning, will mirror the intelligence improvements seen in models like GPT-4. This reflects Meta’s commitment to developing robust agent systems powered by advanced language models.

    By sharing these insights, Meta continues to foster an open-source culture, inviting the AI community to engage with and contribute to the ongoing evolution of the Llama series.
