More

    EU AI Act Compliance Tool Highlights Shortcomings in Big Tech’s AI Models

    A new compliance assessment tool has revealed significant gaps in key artificial intelligence (AI) models from major tech companies, underscoring challenges in meeting European regulations on cybersecurity and discriminatory practices. Data obtained by Reuters indicates that some leading generative AI models are falling short in areas critical to the EU’s forthcoming AI Act.

    The introduction of the EU’s AI regulations gained momentum following the public launch of OpenAI’s ChatGPT in late 2022, which sparked widespread discussion about potential risks associated with such technologies. In response, lawmakers have been working to establish rules specifically aimed at “general-purpose” AIs (GPAI).

    Developed by Swiss startup LatticeFlow AI in collaboration with ETH Zurich and Bulgaria’s INSAIT, the new compliance framework evaluates AI models from firms including Meta, OpenAI, Alibaba, and Anthropic. The “Large Language Model (LLM) Checker” scores these models on a scale of 0 to 1 across various categories such as technical robustness and safety.

    A recent leaderboard published by LatticeFlow indicated that models from Alibaba, Anthropic, OpenAI, Meta, and Mistral achieved average scores of 0.75 or higher. However, the LLM Checker identified critical deficiencies that could compel companies to allocate additional resources to ensure compliance with the AI Act. Companies failing to meet these regulations risk hefty fines of up to 35 million euros (approximately $38 million) or 7% of their global annual revenue.

    As the EU continues to define enforcement measures for the AI Act, experts are being convened to draft a code of practice governing generative AI technologies by spring 2025. The recent assessments reveal specific areas where tech firms may struggle to adhere to the law.

    For instance, discriminatory output—reflecting inherent biases related to gender and race—remains a persistent concern in generative AI development. In tests assessing discriminatory output, OpenAI’s “GPT-3.5 Turbo” received a score of just 0.46, while Alibaba Cloud’s “Qwen1.5 72B Chat” garnered an even lower score of 0.37. Additionally, testing for “prompt hijacking,” a cyberattack technique where malicious prompts are disguised as legitimate, revealed that Meta’s “Llama 2 13B Chat” model scored 0.42, with Mistral’s “8x7B Instruct” model at 0.38. Conversely, Anthropic’s “Claude 3 Opus” achieved the highest average score of 0.89.

    The LLM Checker is designed to align with the AI Act’s stipulations and will adapt as further enforcement measures are rolled out. LatticeFlow has announced that the tool will be available online for developers seeking to test their models for compliance.

    Petar Tsankov, CEO and co-founder of LatticeFlow, stated that the results provide a constructive roadmap for companies looking to align their models with regulatory standards. “While the EU is still establishing compliance benchmarks, we are identifying gaps in the models,” he noted. “With a stronger emphasis on optimizing for compliance, model providers can adequately prepare to meet regulatory requirements.”

    Meta and Mistral have declined to comment on the findings, while Alibaba, Anthropic, and OpenAI did not respond immediately to requests for comment. Although the European Commission cannot verify external tools, it has been informed about the development of the LLM Checker and regards it as a significant step towards implementing the new laws. A spokesperson stated, “The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements.”

    Related topics:

    ASML Faces Investor Doubts Following 2025 Financial Guidance Cut

    Robinhood Unveils Desktop Platform and Expands Trading Options

    TSMC Poised for Record Profits Amid AI Surge

    Recent Articles

    TAGS

    Related Stories