
      AI model from OpenAI automatically recognizes speech and translates it to English

      news.movim.eu / ArsTechnica · Thursday, 22 September, 2022 - 16:48

    A pink waveform on a blue background, poetically suggesting audio. (credit: Benj Edwards / Ars Technica)

    On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more.

    OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." Whisper can also detect the spoken language and translate the speech into English.
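    Because Whisper ships as an open source Python package, the transcription, language detection, and translation described above can be tried in a few lines. A minimal sketch, assuming the openai-whisper package is installed and "interview.mp3" stands in for your own audio file:

        # Minimal sketch using the open source "whisper" package
        # (pip install openai-whisper); "interview.mp3" is a placeholder
        # for your own audio file.
        import whisper

        # Released checkpoints range from "tiny" to "large"; smaller
        # models trade accuracy for speed and memory.
        model = whisper.load_model("base")

        # By default Whisper detects the spoken language, then
        # transcribes in that language.
        result = model.transcribe("interview.mp3")
        print(result["language"])  # detected language code, e.g. "de"
        print(result["text"])      # transcript in the original language

        # task="translate" makes Whisper emit an English translation
        # of the audio instead of a same-language transcript.
        translated = model.transcribe("interview.mp3", task="translate")
        print(translated["text"])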

    OpenAI describes Whisper as an encoder-decoder transformer, a type of neural network that uses context gleaned from input data to learn associations that can then be translated into the model's output. OpenAI's announcement includes an overview of how Whisper operates.
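    The encoder-decoder pattern itself is easy to see in code. Below is a toy PyTorch sketch with arbitrary layer counts and sizes, illustrating the general architecture rather than Whisper's actual configuration: the encoder digests the input sequence, and the decoder cross-attends to that encoding while predicting output tokens one at a time.

        # Toy encoder-decoder transformer in PyTorch. Layer counts and
        # sizes are arbitrary; this illustrates the pattern, not
        # Whisper's actual configuration.
        import torch
        import torch.nn as nn

        d_model, vocab_size = 256, 1000
        transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        embed = nn.Embedding(vocab_size, d_model)   # decoder token embeddings
        to_logits = nn.Linear(d_model, vocab_size)  # project back to the vocabulary

        # src stands in for encoded input features (e.g. audio frames);
        # tgt is the tokens decoded so far (here: three dummy token ids).
        src = torch.randn(1, 100, d_model)          # (batch, frames, d_model)
        tgt = embed(torch.tensor([[1, 42, 7]]))     # (batch, tokens, d_model)

        # The decoder cross-attends to the encoder output, so each new
        # token is conditioned on the entire input sequence.
        out = transformer(src, tgt)                 # (1, 3, d_model)
        next_token_logits = to_logits(out[:, -1])   # scores for the next token
        print(next_token_logits.shape)              # torch.Size([1, 1000])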



      New AI assistant can browse, search, and use web apps like a human

      news.movim.eu / ArsTechnica · Thursday, 15 September, 2022 - 16:52

    Still from a demo video showing ACT-1 performing a search on Redfin.com in a browser when asked to "find me a house." (credit: Adept)

    Yesterday, California-based AI firm Adept announced Action Transformer (ACT-1), an AI model that can perform actions in software like a human assistant when given high-level written or verbal commands. It can reportedly operate web apps and perform intelligent searches on websites while clicking, scrolling, and typing in the right fields as if it were a person using the computer.

    In a demo video tweeted by Adept, the company shows someone typing, "Find me a house in Houston that works for a family of 4. My budget is 600K" into a text entry box. Upon submitting the task, ACT-1 automatically browses Redfin.com in a web browser, clicking the proper regions of the website, typing a search entry, and changing the search parameters until a matching house appears on the screen.

    Another demonstration video on Adept's website shows ACT-1 operating Salesforce with prompts such as "add Max Nye at Adept as a new lead" and "log a call with James Veel saying that he's thinking about buying 100 widgets." ACT-1 then clicks the right buttons, scrolls, and fills out the proper forms to finish these tasks. Other demo videos show ACT-1 navigating Google Sheets, Craigslist, and Wikipedia through a browser.
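    Adept has not published ACT-1's interface, so no code for it exists publicly. As a purely hypothetical sketch of the paradigm the demos show (a model emitting structured UI actions that a driver executes in a real browser), here is how such an action loop could be wired up with Playwright; the action schema and the CSS selector are invented for illustration and are not Adept's method.

        # Hypothetical sketch of the "model emits UI actions, a driver
        # executes them" pattern. Adept's real interface is not public;
        # the action schema and CSS selector below are invented.
        # Requires: pip install playwright && playwright install chromium
        from playwright.sync_api import sync_playwright

        # Pretend a model produced these, one action per decoding step.
        actions = [
            {"op": "goto",  "url": "https://www.redfin.com"},
            {"op": "type",  "selector": "#search-box-input", "text": "Houston, TX"},
            {"op": "press", "selector": "#search-box-input", "key": "Enter"},
        ]

        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            for action in actions:  # execute each structured action in order
                if action["op"] == "goto":
                    page.goto(action["url"])
                elif action["op"] == "type":
                    page.fill(action["selector"], action["text"])
                elif action["op"] == "press":
                    page.press(action["selector"], action["key"])
            browser.close()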
