IDEA #60 – Video Search from the Audio

I’m assuming this has to be happening somewhere. Basically, there should be a video search engine that uses speech-to-text technology (which isn’t perfect, but even if it’s 90% right, that’s better than nothing) to index videos based on what is being said in the audio. The software would have to listen to every video it indexes, which could be a very time-consuming task given the amount of video content out there these days.
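
To make the indexing idea a bit more concrete, here is a minimal sketch in Python. It assumes the speech-to-text step has already run and produced timestamped snippets of recognized text; the `SEGMENTS` data, the video IDs, and the function names are all made up for illustration and do not describe any real product.

```python
import re
from collections import defaultdict

# Hypothetical speech-to-text output: (video_id, start_seconds, recognized_text).
# In a real system these segments would come from a speech recognizer.
SEGMENTS = [
    ("video-a", 0.0, "welcome to the cooking show"),
    ("video-a", 12.5, "today we bake an apple pie"),
    ("video-b", 3.0, "the apple keynote starts now"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each spoken word to the (video_id, start_seconds) pairs where it occurs."""
    index = defaultdict(list)
    for video_id, start, text in segments:
        for word in set(tokenize(text)):
            index[word].append((video_id, start))
    return index


def search(index, query):
    """Return {video_id: [times]} for videos whose audio contains every query word."""
    words = tokenize(query)
    if not words:
        return {}
    # Keep only videos that contain all of the query words somewhere.
    matching = set.intersection(
        *({vid for vid, _ in index.get(w, [])} for w in words)
    )
    results = {}
    for vid in matching:
        times = {t for w in words for v, t in index.get(w, []) if v == vid}
        results[vid] = sorted(times)
    return results


index = build_index(SEGMENTS)
print(search(index, "apple"))
```

Storing the timestamp alongside each word is what would let a search result jump straight to the moment a word is spoken, rather than just to the video.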

Pluggd is doing this for podcasts, so it seems it’d be fairly easy to do this for videos. It’s really neat technology: it shows a heatmap of the podcast based on the keyword you’re searching for, so you can see where in the podcast that topic is being discussed. My only reservation is what the terms of service on the video websites would allow.
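
I don’t know how Pluggd actually builds its heatmap, but one simple way to get something similar is to cut the recording into equal time slices and count keyword mentions in each. The sketch below assumes you already have timestamped transcript snippets; the function names and the sample data are hypothetical.

```python
def keyword_heatmap(segments, keyword, duration, buckets=10):
    """Count keyword mentions in each equal-width time slice of a recording.

    segments: list of (start_seconds, recognized_text) pairs (assumed input).
    duration: total length of the recording in seconds.
    """
    counts = [0] * buckets
    width = duration / buckets
    needle = keyword.lower()
    for start, text in segments:
        # Naive whitespace matching; punctuation handling is left out.
        mentions = text.lower().split().count(needle)
        if mentions:
            slot = min(int(start / width), buckets - 1)
            counts[slot] += mentions
    return counts


def render(counts):
    """Draw the counts as a one-line text heatmap, darker meaning more mentions."""
    shades = " .:*#"
    peak = max(counts) or 1
    return "".join(shades[round(c / peak * (len(shades) - 1))] for c in counts)


# Made-up transcript of a 100-second clip.
transcript = [
    (5.0, "apple pie recipe"),
    (8.0, "apple apple"),
    (95.0, "more apple talk"),
]
counts = keyword_heatmap(transcript, "apple", duration=100)
print(counts)
print("[" + render(counts) + "]")
```

The same counts could drive colored bars on a video player’s timeline instead of text characters.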

You could also have this technology see if it recognizes any text or images in the video — thus it might notice a picture of an apple, or a logo, or text that says something.

One problem with this search is that the video might be a comedy, but there’s no way for this technology to know it’s a comedy; it would only know the speech in the audio.

  • Colin Dowling

    Interesting idea, and one that wouldn’t seem that hard to implement, either through software or super cheap labor. IIRC, Google uses the closed caption transcripts to index video. That said, most user-generated content won’t be captioned, so there is definitely a space for this service. No idea what the actual “market” would be outside of licensing the product to video portals, but a solid idea nonetheless.

  • Chandra Bajpai

    Great idea Steve, I’m surprised no one has done it yet, esp. with voice recognition being so advanced today. It might even be able to be cobbled together using open source components. One side benefit you could get is a transcript of the recording…which you could use to drive traffic to your video site via search engines.

  • James D Kirk

    You assumed correctly, Steve! I thought this was a widely known capability. People must check out the technology that has come up with. Simply said it’s pretty amazing. Not perfect mind you, but compared to what’s available, pretty amazing nonetheless.

    Nexidia provides the technology for an Atlanta news station. Check out the “Video Now” feature at and you’ll likely be blown away.

  • Michael

    This website is a ghost town….what happened?

  • Sean Scott

    I would argue that the technology for speech to text is not there yet. I’ve been looking for software/ASPs that do speech to text well and it’s hard to come by. You have to remember that videos have a lot of embedded static and background noise, and that even some of the most advanced speech recognition software has an accuracy of 70 to 85%. You will still need some kind of human transcription to get it where you want it.

    There are a ton of companies out there that use proprietary software to provide a similar service; however, I’d be interested in having a good look at that proprietary process.

    If it was that easy, don’t you think Google video search would have come up with it by now?

  • Don Clark -Atlanta

    A year later, and this is the process we use: has been using speech recognition for some time and it works pretty well, except most of the video is SUPER-compressed FLVs.

    So I use Blinkx to find the source, and then BitTorrent and Miro to find better quality versions.