
      Nvidia unveils Blackwell B200, the “world’s most powerful chip” designed for AI

      news.movim.eu / ArsTechnica · Tuesday, 19 March - 15:27 · 1 minute

    The GB200 "superchip" covered with a fanciful blue explosion that suggests computational power bursting forth from within. The chip does not actually glow blue in reality.

    Enlarge / The GB200 "superchip" covered with a fanciful blue explosion that suggests computational power bursting forth from within. The chip does not actually glow blue in reality. (credit: Nvidia / Benj Edwards)

    On Monday, Nvidia unveiled the Blackwell B200 tensor core chip—the company's most powerful single-chip GPU, with 208 billion transistors—which Nvidia claims can reduce AI inference operating costs (such as running ChatGPT) and energy consumption by up to 25 times compared to the H100. The company also unveiled the GB200, a "superchip" that combines two B200 chips and a Grace CPU for even more performance.

    The news came as part of Nvidia's annual GTC conference, which is taking place this week at the San Jose Convention Center. Nvidia CEO Jensen Huang delivered the keynote Monday afternoon. "We need bigger GPUs," Huang said during his keynote. The Blackwell platform will allow the training of trillion-parameter AI models that will make today's generative AI models look rudimentary in comparison, he said. For reference, OpenAI's GPT-3, launched in 2020, included 175 billion parameters. Parameter count is a rough indicator of AI model complexity.
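
    As a back-of-the-envelope illustration (our own arithmetic, not Nvidia's), parameter count translates directly into memory footprint, which is why trillion-parameter training demands "bigger GPUs":

```python
# Rough sketch: memory needed just to store model weights at 2 bytes per
# parameter (fp16/bf16), ignoring activations, gradients, and optimizer state.
BYTES_PER_PARAM = 2

for name, params in [("GPT-3 (175B)", 175e9), ("1T-parameter model", 1e12)]:
    gib = params * BYTES_PER_PARAM / 2**30
    print(f"{name}: ~{gib:,.0f} GiB of weights")

# GPT-3 (175B): ~326 GiB; 1T-parameter model: ~1,863 GiB. Both are far beyond
# any single GPU's memory, so training must be sharded across many chips.
```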

    Nvidia named the Blackwell architecture after David Harold Blackwell, a mathematician who specialized in game theory and statistics and was the first Black scholar inducted into the National Academy of Sciences. The platform introduces six technologies for accelerated computing, including a second-generation Transformer Engine, fifth-generation NVLink, a RAS Engine, secure AI capabilities, and a decompression engine for accelerated database queries.


      Nvidia’s “Chat With RTX” is a ChatGPT-style app that runs on your own GPU

      news.movim.eu / ArsTechnica · Thursday, 15 February - 16:54 · 2 minutes

    A promotional image. (credit: Nvidia)

    On Tuesday, Nvidia released Chat With RTX, a free personalized AI chatbot similar to ChatGPT that can run locally on a PC with an Nvidia RTX graphics card. It uses the open-weights Mistral or Llama 2 LLMs and can search through local files and answer questions about them.

    Chat With RTX works on Windows PCs equipped with NVIDIA GeForce RTX 30 or 40 Series GPUs with at least 8GB of VRAM. It uses a combination of retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM software, and RTX acceleration to enable generative AI capabilities directly on users' devices. This setup allows for conversations with the AI model using local files as a dataset.

    "Users can quickly, easily connect local files on a PC as a dataset to an open-source large language model like Mistral or Llama 2, enabling queries for quick, contextually relevant answers," writes Nvidia in a promotional blog post.

    Using Chat With RTX, users can talk about various subjects or ask the AI model to summarize or analyze data, similar to how one might interact with ChatGPT. In particular, the Mistral 7B model has built-in conditioning to avoid certain sensitive topics (like sex and violence, of course), but users could presumably plug in an uncensored AI model and discuss forbidden topics without the paternalism inherent in the censored models.

    Also, the application supports a variety of file formats, including .TXT, .PDF, .DOCX, and .XML. Users can direct the tool to browse specific folders, which Chat With RTX then scans to answer queries quickly. It even allows for the incorporation of information from YouTube videos and playlists, offering a way to include external content in its database of knowledge (in the form of embeddings) without requiring an Internet connection to process queries.
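
    Here is a sketch of what that folder-scanning step plausibly looks like (the function name and details are our own invention; Nvidia hasn't published the app's internals at this level):

```python
from pathlib import Path

# File formats the app advertises support for.
SUPPORTED = {".txt", ".pdf", ".docx", ".xml"}

def collect_documents(folder: str) -> list[Path]:
    """Recursively gather files the indexer knows how to parse."""
    return [p for p in Path(folder).rglob("*")
            if p.suffix.lower() in SUPPORTED]

# Each collected file would then be parsed, chunked, and embedded into the
# local vector index that queries run against -- no Internet connection needed.
```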

    Rough around the edges

    We downloaded and ran Chat With RTX to test it out. The download file is huge, at around 35 gigabytes, owing to the Mistral and Llama LLM weights files being included in the distribution. ("Weights" are the actual neural network files containing the values that represent data learned during the AI training process.) When installing, Chat With RTX downloads even more files, and it executes in a console window using Python with an interface that pops up in a web browser window.

    Several times during our tests on an RTX 3060 with 12GB of VRAM, Chat With RTX crashed. Like many open-source LLM interfaces, Chat With RTX is a mess of layered dependencies, relying on Python, CUDA, TensorRT, and others. Nvidia hasn't cracked the code for making the installation sleek and non-brittle. It's a rough-around-the-edges solution that feels very much like an Nvidia skin over other local LLM interfaces (such as GPT4All). Even so, it's notable that this capability is officially coming directly from Nvidia.

    On the bright side (a massive bright side), local processing capability emphasizes user privacy, as sensitive data does not need to be transmitted to cloud-based services (such as with ChatGPT). Using Mistral 7B feels slightly less capable than GPT-3.5 (the model behind the free version of ChatGPT), which is still remarkable for a local LLM running on a consumer GPU. It's not a true ChatGPT replacement yet, and it can't touch GPT-4 Turbo or Google Gemini Pro/Ultra in processing capability.
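
    Some rough arithmetic (ours, not Nvidia's) shows why a 7B-parameter model is about the largest that fits comfortably on consumer cards, and why the app's 8GB VRAM floor is plausible:

```python
# Weight memory for a 7B-parameter model at different precisions.
params = 7e9
for label, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label}: ~{gib:.1f} GiB for weights alone")

# fp16: ~13.0 GiB (won't fit in 8GB); 8-bit: ~6.5 GiB; 4-bit: ~3.3 GiB.
# Quantization, plus headroom for the KV cache, is what makes 8GB workable.
```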

    Nvidia GPU owners can download Chat With RTX for free on the Nvidia website.


      Nvidia CEO calls for “Sovereign AI” as his firm overtakes Amazon in market value

      news.movim.eu / ArsTechnica · Tuesday, 13 February - 16:41

    The Nvidia logo on a blue background with an American flag. (credit: Nvidia / Benj Edwards)

    On Monday, Nvidia CEO Jensen Huang said that every country should control its own AI infrastructure so it can protect its culture, Reuters reports. He called this concept "Sovereign AI," which an Nvidia blog post defined as each country owning "the production of their own intelligence."

    Huang made the announcement in a discussion with the UAE's Minister of AI, Omar Al Olama, during the World Governments Summit in Dubai. "It codifies your culture, your society’s intelligence, your common sense, your history—you own your own data," Huang told Al Olama.

    The World Governments Summit organization defines itself as "a global, neutral, non-profit organization dedicated to shaping the future of governments." Its annual event attracts over 4,000 delegates from 150 countries, according to Nvidia. It's hosted in the United Arab Emirates, a collection of absolute monarchies with no democratically elected institutions.


      2023 was the year that GPUs stood still

      news.movim.eu / ArsTechnica · Thursday, 28 December - 11:28 · 1 minute

    (credit: Andrew Cunningham)

    In many ways, 2023 was a long-awaited return to normalcy for people who build their own gaming and/or workstation PCs. For the entire year, most mainstream components were available at or a little under their official retail prices, making it possible to build all kinds of PCs at relatively reasonable prices without worrying about restocks or waiting for discounts. It was a welcome continuation of some GPU trends that started in 2022: Nvidia, AMD, and Intel could release a new GPU, and you could consistently buy that GPU for roughly what it was supposed to cost.

    That's where we get into how frustrating 2023 was for GPU buyers, though. Cards like the GeForce RTX 4090 and Radeon RX 7900 series launched in late 2022 and boosted performance beyond what any last-generation cards could achieve. But 2023's midrange GPU launches were less ambitious: most of them merely matched the performance of a last-generation GPU, and they did it for around the same price as the last-gen cards whose performance they matched.

    The midrange runs in place

    Not every midrange GPU launch will get us a GTX 1060—a card that was roughly 50 percent faster than its immediate predecessor and that beat the previous-generation GTX 980 despite costing just a bit over half as much money. But even if your expectations were low, this year's midrange GPU launches were underwhelming.


      After a chaotic three years, GPU sales are starting to look normal-ish again

      news.movim.eu / ArsTechnica · Monday, 4 December - 21:57 · 1 minute

    AMD's Radeon RX 7600. (credit: Andrew Cunningham)

    It's been an up-and-down decade for most consumer technology, with a pandemic-fueled boom in PC sales giving way to a sales crater that the market is still gradually recovering from. But few components have had as hard a time as gaming graphics cards, which were near impossible to buy at reasonable prices for about two years and then crashed hard as GPU companies responded with unattainable new high-end products.

    According to the GPU sales analysts at Jon Peddie Research (JPR), things may finally be evening out. Its data shows that GPU shipments have returned to quarter-over-quarter and year-over-year growth after two years of shrinking sales. This is the second consecutive quarter of growth, which the firm says "strongly indicates that things are finally on the upswing for the graphics industry."

    JPR reports that overall GPU unit shipments (which include integrated and dedicated GPUs) are up 16.8 percent from Q2 and 36.6 percent from a year ago. Dedicated GPU sales increased 37.4 percent from Q2. When comparing year-over-year numbers, the biggest difference is that Nvidia, AMD, and Intel all have current-generation GPUs available in the $200–$300 range, including the GeForce RTX 4060, the Radeon RX 7600, and the Arc A770 and A750, all of which were either unavailable or newly launched in Q3 of 2022.


      Cities: Skylines 2’s troubled launch, and why simulation games are freaking hard

      news.movim.eu / ArsTechnica · Sunday, 19 November - 12:00

    (credit: Paradox Interactive)

    The worst thing about Cities: Skylines 2 is that it was recently released.

    If this hugely ambitious city-builder simulation had been released some time ago, patched over and over again, and updated with some gap-filling DLC, it would be far better off. It could be on its slow-burn second act, like No Man’s Sky, Cyberpunk 2077, or Final Fantasy XIV. It could have settled into a disgruntled-but-still-invested player base, like Destiny 2 or Overwatch 2. Or its technical debts could have been slowly paid off to let its underlying strengths come through, as with Disco Elysium or The Witcher 3.

    But Cities: Skylines 2 (C:S2) is regrettably available now in its current state. It has serious performance problems, both acknowledged by its 30-odd-employee developer Colossal Order and studied in depth by others (which we’ll get into). It has a rough-draft look when compared to its predecessor, which has accumulated eight years of fixes, DLC, and mods to cover a dizzying array of ideas. Worst of all, it was highly anticipated by fans, some of whom have high-end systems that still can't properly run the sluggish game.


      Microsoft launches custom chips to accelerate its plans for AI domination

      news.movim.eu / ArsTechnica · Wednesday, 15 November - 21:53

    A photo of the Microsoft Azure Maia 100 chip, altered with splashes of color by the author to look as if AI itself were bursting forth from its silicon substrate. (credit: Microsoft | Benj Edwards)

    On Wednesday at the Microsoft Ignite conference, Microsoft announced two custom chips designed for accelerating in-house AI workloads through its Azure cloud computing service: Microsoft Azure Maia 100 AI Accelerator and the Microsoft Azure Cobalt 100 CPU.

    Microsoft designed Maia specifically to run large language models like GPT-3.5 Turbo and GPT-4, which underpin its Azure OpenAI services and Microsoft Copilot (formerly Bing Chat). Maia has 105 billion transistors and is manufactured on a 5-nm TSMC process. Meanwhile, Cobalt is a 128-core ARM-based CPU designed to handle conventional computing tasks like powering Microsoft Teams. Microsoft has no plans to sell either one, reserving both for internal use.

    As we've previously seen, Microsoft wants to be "the Copilot company," and it will need a lot of computing power to meet that goal. According to Reuters, Microsoft and other tech firms have struggled with the high cost of delivering AI services, which can run 10 times more than services like search engines.


      Nvidia introduces the H200, an AI-crunching monster GPU that may speed up ChatGPT

      news.movim.eu / ArsTechnica · Monday, 13 November - 21:44 · 1 minute

    The Nvidia H200 GPU covered with a fanciful blue explosion that figuratively represents raw compute power bursting forth in a glowing flurry. (credit: Nvidia | Benj Edwards)

    On Monday, Nvidia announced the HGX H200 Tensor Core GPU, which utilizes the Hopper architecture to accelerate AI applications. It's a follow-up to the H100 GPU, released last year and previously Nvidia's most powerful AI GPU chip. If widely deployed, it could lead to far more powerful AI models—and faster response times for existing ones like ChatGPT—in the near future.

    According to experts, lack of computing power (often called "compute") has been a major bottleneck of AI progress this past year, hindering deployments of existing AI models and slowing the development of new ones. Shortages of powerful GPUs that accelerate AI models are largely to blame. One way to alleviate the compute bottleneck is to make more chips, but you can also make AI chips more powerful. That second approach may make the H200 an attractive product for cloud providers.

    What's the H200 good for? Despite the "G" in the "GPU" name, data center GPUs like this typically aren't for graphics. GPUs are ideal for AI applications because they perform vast numbers of parallel matrix multiplications, which are necessary for neural networks to function. They are essential in both the training portion of building an AI model and the "inference" portion, where people feed inputs into an AI model and it returns results.
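
    For a concrete sense of the workload (our illustration, not an Nvidia benchmark), a single dense layer's forward pass over a batch is one big matrix multiplication, and GPUs exist to run the underlying multiply-accumulates in parallel:

```python
import numpy as np

# One dense layer, one batch: activations (32 x 4096) times weights
# (4096 x 4096) is a single matmul.
batch, d_in, d_out = 32, 4096, 4096
x = np.random.randn(batch, d_in).astype(np.float32)   # input activations
W = np.random.randn(d_in, d_out).astype(np.float32)   # learned weights
y = x @ W                                              # shape: (32, 4096)

# Each output element is a d_in-long dot product, so the FLOP count is:
print(f"~{2 * batch * d_in * d_out / 1e9:.1f} GFLOPs for one layer, one batch")
# ~1.1 GFLOPs -- and large models chain many such layers for every token.
```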


      US surprises Nvidia by speeding up new AI chip export ban

      news.movim.eu / ArsTechnica · Tuesday, 24 October, 2023 - 21:07 · 1 minute

    A press photo of the Nvidia H100 Tensor Core GPU. (credit: Nvidia)

    On Tuesday, chip designer Nvidia announced in an SEC filing that new US export restrictions on sales of its high-end AI GPUs to China have taken effect sooner than expected, according to a report from Reuters. The curbs were initially scheduled to take effect 30 days after their announcement on October 17 and are designed to prevent China, Iran, and Russia from acquiring advanced AI chips.

    The banned chips are advanced graphics processing units (GPUs) that are commonly used for training and running deep learning AI applications similar to ChatGPT and AI image generators, among other uses. GPUs are well-suited for neural networks because their massively parallel architecture performs the matrix multiplications involved in running neural networks faster than conventional processors can.

    The Biden administration initially announced an advanced AI chip export ban in September 2022, and in reaction, Nvidia designed and released new chips, the A800 and H800, to comply with those export rules for the Chinese market. In November 2022, Nvidia told The Verge that the A800 "meets the US Government’s clear test for reduced export control and cannot be programmed to exceed it." However, the new curbs enacted Monday specifically halt the exports of these modified Nvidia AI chips. The Nvidia A100, H100, and L40S chips are also included in the export restrictions.
