Tag • #whisper

chevron_right

Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

news.movim.eu / ArsTechnica · Tuesday, 22 August, 2023 - 19:57 · 1 minute

An illustration of a person holding up a megaphone to a head silhouette that says

Enlarge (credit: Getty Images)

On Tuesday, Meta announced SeamlessM4T , a multimodal AI model for speech and text translations. As a neural network that can process both text and audio, it can perform text-to-speech, speech-to-text, speech-to-speech, and text-to-text translations for "up to 100 languages," according to Meta. Its goal is to help people who speak different languages communicate with each other more effectively.

Continuing Meta's relatively open approach to AI, Meta is releasing SeamlessM4T under a research license (CC BY-NC 4.0) that allows developers to build on the work. They're also releasing SeamlessAlign, which Meta calls "the biggest open multimodal translation dataset to date, totaling 270,000 hours of mined speech and text alignments." That will likely kick-start the training of future translation AI models from other researchers.

Among the features of SeamlessM4T touted on Meta's promotional blog, the company says that the model can perform speech recognition (you give it audio of speech, and it converts it to text), speech-to-text translation (it translates spoken audio to a different language in text), speech-to-speech translation (you feed it speech audio, and it outputs translated speech audio), text-to-text translation (similar to how Google Translate functions), and text-to-speech translation (feed it text and it will translate and speak it out in another language). Each of the text translation functions supports nearly 100 languages, and the speech output functions support about 36 output languages.

Read 6 remaining paragraphs | Comments

chevron_right

OpenAI launches GPT-4 API for everyone

news.movim.eu / ArsTechnica · Monday, 10 July, 2023 - 19:50 · 1 minute

Enlarge (credit: OpenAI)

On Thursday, OpenAI announced that all paying API customers now have access to the GPT-4 API. It also introduced updates to chat-based models, announced a shift from the Completions API to the Chat Completions API, and outlined plans for deprecation of older models.

Generally considered its most powerful API product, the GPT-4 API first launched in March but has been under closed testing until now. As an API, developers can use a special interface to integrate OpenAI's large language model (LLM) into their own products for uses such as summarization, coding assistance, analysis, and composition. The model runs remotely on OpenAI's servers and provides output to other apps over the Internet.

OpenAI says the GPT-4 API with 8K context is accessible to existing developers who have a successful payment history, with plans to open access to new developers by the end of July. And in a move to distance itself from older GPT-3-style models, OpenAI has also opted to begin retiring "Completions API" models in favor of newer Chat Completions API models. Since its March launch , OpenAI says that its Chat Completions API models now account for 97 percent of OpenAI's API GPT usage.

Read 4 remaining paragraphs | Comments

chevron_right

Buzz – Pour traduire ou transcrire de l’audio au format texte à l’aide de l’IA

news.movim.eu / Korben · Friday, 26 May, 2023 - 07:00 · 1 minute

Vous vous souvenez de mon article sur Whisper, cet outil d’IA mis au point par OpenAI ? Pour rappel, ce logiciel est capable de retranscrire en texte n’importe quel fichier audio ou vidéo.

C’est extrêmement pratique pour faire de la retranscription ou des sous-titres de qualité sans se prendre la tête. Et cela dans tout un tas de langues. Mais ce n’était pas forcement facile à prendre en main puisque ça passait par un colab avec des lignes de Python.

Heureusement, depuis les choses ont bien évolué et on a maintenant un logiciel fini très facile à utiliser. Ce logiciel c’est Buzz et ça fonctionne sous macOS, Linux et Windows.

Parmi ses atouts, on retrouve la transcription et la traduction en temps réel à partir du microphone de l’ordinateur, ce qui va faciliter quand même vachement le travail de ceux qui ont besoin de transcrire une réunion, une interview ou même des conversations informelles (un coup de fil par exemple).

L’application permet également d’importer des fichiers audio et vidéo et d’exporter les transcriptions au format CSV, SRT, TXT et VTT, permettant ainsi une compatibilité avec de nombreux logiciels et services comme Youtube.

Et comme si ça ne suffisait pas, Buzz prend en charge les modèles hors ligne tel que Whisper.cpp ou online comme l’API Whisper d’OpenAI. L’application propose également un moteur de recherche pour farfouiller dans les transcriptions audio et surtout un éditeur de texte intégré afin de faciliter le travail de révision des transcriptions.

Je l’ai testé à plusieurs reprises et ça fonctionne vraiment super bien, si vous prenez un modèle Small ou supérieur.

A télécharger ici.

chevron_right

Whisper-ui – Transcrire vos audios / vidéos bien au chaud dans une interface graphique

news.movim.eu / Korben · Sunday, 30 April, 2023 - 07:00 · 1 minute

Vous vous souvenez de Whisper, ce projet d’OpenAI qui permet de convertir au format texte, n’importe quel audio, ce qui permet par exemple de faire des transcriptions d’interview ou des sous-titres pour une vidéo ?

J’avais même fait un tuto sur le sujet .

Et bien bonne nouvelle, le codeur Abhay Kashyap a mis au point une interface graphique pour Whisper simplement nommée Whisper-ui. Cela fonctionne avec Streamlit et ça permet de se constituer une liste de média à traduire ou retranscrire très facilement.

Capture d'écran de l'interface graphique de Whisper-ui montrant la transcription en cours d'un fichier audio

Vous pourrez ensuite naviguer dans vos fichiers et les filtrer comme bon vous semble avec le moteur sur la gauche et évidemment récupérer la retranscription.

Pour installer Whisper-ui, le plus simple c’est de passer par Docker puisqu’il y a un Docker-Compose fourni.

git clone https://github.com/hayabhay/whisper-ui.git
cd whisper-ui
docker-compose up -d

Puis vous rendre sur l’URL suivante : http://localhost:8501/

Sinon, vous pouvez aussi l’installer directement :

sudo apt install ffmpeg
pip install -r requirements.txt

Et lancer le script Python avec Streamlit :

streamlit run app/01_🏠_Home.py

Bref, c’est pratique, ça fait gagner du temps et ça permet d’éviter de se farcir tout en ligne de commande. Et si vous cherchez d’autres projets qui utilisent Whisper, y’a toute une liste merveilleuse ici ! Merci à Nobody pour l’info !

+ d’infos ici.

chevron_right

ChatGPT and Whisper APIs debut, allowing devs to integrate them into apps

news.movim.eu / ArsTechnica · Wednesday, 1 March, 2023 - 19:54

An abstract green artwork created by OpenAI.

Enlarge (credit: OpenAI)

On Wednesday, OpenAI announced the availability of developer APIs for its popular ChatGPT and Whisper AI models that will let developers integrate them into their apps. An API (application programming interface) is a set of protocols that allows different computer programs to communicate with each other. In this case, app developers can extend their apps' abilities with OpenAI technology for an ongoing fee based on usage.

Introduced in late November, ChatGPT generates coherent text in many styles. Whisper , a speech-to-text model that launched in September, can transcribe spoken audio into text.

In particular, demand for a ChatGPT API has been huge, which led to the creation of an unauthorized API late last year that violated OpenAI's terms of service. Now, OpenAI has introduced its own API offering to meet the demand. Compute for the APIs will happen off-device and in the cloud.

Read 6 remaining paragraphs | Comments

chevron_right

Fini les heures de travail pour retranscrire votre audio/vidéo grâce à l’IA de Whisper

news.movim.eu / Korben · Saturday, 28 January, 2023 - 08:00

Salut les amis !

Alors aujourd’hui, je vais vous parler de la retranscription audio à l’aide Whisper, un super outil qui utilise de l’intelligence artificielle.

C’est génial si vous voulez créer des sous-titres pour vos vidéos ou si vous voulez proposer une version textuelle de votre podcast. Vous pouvez ainsi utiliser la retranscription audio pour toute sorte de contenu, comme des films, des séries, des animés, etc. C’est super pratique et vraiment efficace. J’ai été bluffé !

Merci aux Patreons sans qui cette vidéo n’aurait pas vu le jour. Rejoignez nous !!!

hisper

chevron_right

AI model from OpenAI automatically recognizes speech and translates it to English

news.movim.eu / ArsTechnica · Thursday, 22 September, 2022 - 16:48

A pink waveform on a blue background, poetically suggesting audio.

Enlarge (credit: Benj Edwards / Ars Technica)

On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more.

OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in approximately 10 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." It can also detect the spoken language and translate it to English.

OpenAI describes Whisper as an encoder-decoder transformer , a type of neural network that can use context gleaned from input data to learn associations that can then be translated into the model's output. OpenAI presents this overview of Whisper's operation:

Read 4 remaining paragraphs | Comments