Opened Feb 10, 2025 by Martha Macdougall (@marthamacdouga)

If there's Intelligent Life out There


Optimizing LLMs to excel at specific tests backfires on Meta and Stability.


When you purchase through links on our site, we may earn an affiliate commission. Here's how it works.

Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for testing open large language model (LLM) performance across a range of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

From CEO Clem Delangue on Twitter: "Pumped to announce the brand new open LLM leaderboard. We burned 300 H100s to re-run new evaluations like MMLU-Pro for all major open LLMs! Some learnings: Qwen 72B is the king, and Chinese open models are dominating overall. Previous evaluations have become too easy for recent ..." June 26, 2024

Hugging Face's second leaderboard tests language models across four areas: knowledge, reasoning over extremely long contexts, complex math, and instruction following. Six benchmarks are used to assess these qualities, with tests that include solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.
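Hugging Face's blog describes ranking models by averaging benchmark scores after normalizing each one against its random-guessing baseline, so that a multiple-choice test with four options does not inflate a model's average. The sketch below is an illustration of that idea only, not Hugging Face's actual code; the benchmark names and baseline values are assumptions for the example:

```python
# Illustrative sketch (not Hugging Face's actual code): rescale each raw
# benchmark score between its random-guess baseline and a perfect score,
# then average the normalized scores to rank models.

RANDOM_BASELINES = {          # hypothetical baselines, percent correct by chance
    "MMLU-Pro": 10.0,         # ~10 answer choices -> ~10% by guessing
    "GPQA": 25.0,             # 4 answer choices -> 25% by guessing
    "MATH": 0.0,              # free-form answers -> ~0% by guessing
}

def normalize(task: str, raw_score: float) -> float:
    """Map a raw score (percent) onto a 0-100 scale above the random baseline."""
    base = RANDOM_BASELINES[task]
    return max(0.0, (raw_score - base) / (100.0 - base) * 100.0)

def leaderboard_average(scores: dict[str, float]) -> float:
    """Average the normalized per-benchmark scores into one ranking number."""
    return sum(normalize(task, s) for task, s in scores.items()) / len(scores)

model_a = {"MMLU-Pro": 55.0, "GPQA": 40.0, "MATH": 30.0}
print(round(leaderboard_average(model_a), 2))  # -> 33.33
```

Note how a raw 40% on a four-choice benchmark normalizes to only 20, while 30% on free-form math keeps its full value; this is why normalization changes rankings relative to a naive average.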

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also appearing are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT; Hugging Face's leaderboard does not test closed-source models, to ensure reproducibility of results.

Tests to qualify for the leaderboard are run exclusively on Hugging Face's own computers, which, according to CEO Clem Delangue's Twitter, are powered by 300 Nvidia H100 GPUs. Because of Hugging Face's open-source and collaborative nature, anyone is free to submit new models for testing and admission to the leaderboard, with a new voting system prioritizing popular new entries for testing. The leaderboard can be filtered to show only a highlighted set of significant models, to avoid a confusing glut of small LLMs.

As a pillar of the LLM space, Hugging Face has become a trusted source for LLM learning and community collaboration. After its first leaderboard was released in 2023 as a means to compare and reproduce testing results from a number of established LLMs, the board quickly grew in popularity. Reaching high ranks on the board became the goal of many developers, small and large, and as models have become generally more powerful, "smarter," and optimized for the specific tests of the first leaderboard, its results have become less and less meaningful, hence the creation of a second variant.

Some LLMs, including newer variants of Meta's Llama, significantly underperformed on the new leaderboard compared to their high marks on the first. This stemmed from a trend of over-training LLMs only on the first leaderboard's benchmarks, leading to regressions in real-world performance. This regression, thanks to hyperspecific and self-referential data, follows a trend of AI performance growing worse over time, proving once again, as Google's AI answers have shown, that LLM performance is only as good as its training data and that true artificial "intelligence" is still many, many years away.


Dallin Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Dallin covers all the latest tech news.


- bit_user: "LLM performance is only as good as its training data and that true artificial 'intelligence' is still many, many years away." First, this statement discounts the role of network architecture.

Second, intelligence isn't a binary thing; it's more like a spectrum. There are different classes of cognitive tasks and abilities you might be familiar with if you study child development or animal intelligence. The definition of "intelligence" cannot be whether something processes information exactly like humans do, or else the search for extraterrestrial intelligence would be entirely pointless. If there's intelligent life out there, it probably doesn't think quite like we do. Machines that act and behave intelligently needn't necessarily do so, either.

- jp7189: I don't love the click-bait China-vs.-the-world title. The fact is, Qwen is open source, open weights, and can be run anywhere. It can be (and already has been) fine-tuned to add or remove bias. I applaud Hugging Face's work to create standardized tests for LLMs, and for putting the focus on open source, open weights first.

- jp7189: bit_user said: "First, this statement discounts the role of network architecture. Second, intelligence isn't a binary thing; it's more like a spectrum. There are different classes of cognitive tasks and abilities you might be familiar with if you study child development or animal intelligence. The definition of 'intelligence' cannot be whether something processes information exactly like humans do, or else the search for extraterrestrial intelligence would be entirely pointless. If there's intelligent life out there, it probably doesn't think quite like we do. Machines that act and behave intelligently needn't necessarily do so, either."

  We're creating tools to help people, therefore I would argue LLMs are more helpful if we grade them by human intelligence standards.

