GPT-4 falls to 2nd place among top AI chatbots for the first time

(Daily Point) — Anthropic’s revolutionary artificial intelligence model, Claude 3 Opus, has clinched the coveted top position on the Chatbot Arena leaderboard, outshining OpenAI’s GPT-4 for the first time since its launch last year.

Departing from conventional methods of AI model evaluation, the LMSYS Chatbot Arena adopts a distinctive approach, placing emphasis on human judgment. Participants are tasked with evaluating and ranking responses generated by two distinct models in a blind test scenario.

For an extended period, OpenAI’s GPT-4 has dominated this benchmark, and any contender approaching its performance has often been dubbed “GPT-4 class.” Claude 3’s achievement is therefore particularly noteworthy.

However, while Claude 3 Opus has surpassed GPT-4 in these results, the margin between the two models is narrow. Claude 3’s reign at the top may also be short-lived, with the release of GPT-4.5 reportedly imminent.

Administered by the Large Model Systems Organization (LMSYS), the Chatbot Arena pits a diverse array of large language models against one another in anonymous, randomized battles. Since its inception last year, the benchmark has amassed over 400,000 user votes, consistently featuring models from OpenAI, Google, and Anthropic, as well as emerging contenders such as models from Mistral and Alibaba.

Using the Elo system, commonly employed in chess and e-sports, the benchmark calculates a skill rating for each participating model. In this case, the rated players are not the humans interacting with the chatbots but the AI models themselves: each human vote is treated as the outcome of one match between two models.
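To illustrate how an Elo-style rating responds to individual votes, here is a minimal sketch of the textbook Elo update. It is not LMSYS's actual implementation, which may differ in its details. The K-factor of 32, the starting rating of 1000, and the model names and votes are all assumptions chosen purely for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k=32 is an illustrative assumption, not a value from the article.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, for illustration only.
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [
    ("model_x", "model_y", 1.0),  # voter preferred model_x
    ("model_x", "model_y", 0.5),  # voter called it a tie
    ("model_x", "model_y", 0.0),  # voter preferred model_y
]

for a, b, score_a in votes:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the gap between the two ratings, a win over a higher-rated model moves the ratings more than a win over a lower-rated one. This is one reason a narrow lead on such a leaderboard can change hands as new votes arrive.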

Claude 3 Opus, the flagship model in the Claude 3 lineup, secured the top position on the leaderboard after an influx of more than 70,000 new votes. Remarkably, even the smaller Claude 3 models have performed admirably. Claude 3 Haiku, the smallest model in the series, has delivered impressive results despite not matching the scale of GPT-4 or Claude 3 Opus.

All three Claude 3 models appear in the top 10 of the leaderboard. Opus leads the pack, Sonnet shares fourth place with Gemini Pro, and Haiku shares sixth place with an earlier version of GPT-4.
