Voice Gesture

Do you remember Mouse Gesture?

I am currently working on its close relative: Voice Gesture.

The name is not very explicit, but it is a voice recognition library built on the new Microphone feature of Flash Player 10.1.

Here is a very early demonstration:

Voice Gesture from didier.brun on Vimeo.

As you can see, it works quite well (> 95% accuracy), but I have to admit that, for now, the algorithm requires these two conditions:

  • The user is the trainer (I have recorded my own voice models into the library)
  • A silent place
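Both conditions are typical of template matching. One of the trackbacks below summarizes the approach as drawing the audio data, filtering that image and matching it against a previously recorded image. The Python sketch below is only an illustration of that general idea, not the VoiceGesture code: it swaps the spectral image for a simple loudness envelope, and every name in it (`envelope`, `blur`, `similarity`, `recognize`, `make_word`, the 0.9 threshold) is hypothetical.

```python
import math

def envelope(samples, frames=16):
    """Reduce a mono signal to `frames` RMS energy values (a crude 1-D 'picture')."""
    size = max(1, len(samples) // frames)
    values = []
    for i in range(frames):
        chunk = samples[i * size:(i + 1) * size]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk)) if chunk else 0.0
        values.append(rms)
    return values

def blur(values):
    """Three-point moving average, so small timing differences matter less."""
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(values))]

def similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(samples, models, threshold=0.9):
    """Return the best-matching label, or None if no model scores above threshold."""
    probe = blur(envelope(samples))
    best_label, best_score = None, threshold
    for label, model in models.items():
        score = similarity(probe, model)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def make_word(rising, n=1600, rate=8000):
    """Synthetic stand-in for a recorded word: a 440 Hz tone that fades in or out."""
    out = []
    for i in range(n):
        gain = i / n if rising else 1.0 - i / n
        out.append(gain * math.sin(2 * math.pi * 440 * i / rate))
    return out

# "Training": the user records each command once and the filtered picture is stored.
models = {
    "rising": blur(envelope(make_word(True))),
    "falling": blur(envelope(make_word(False))),
}

quieter_take = [0.5 * x for x in make_word(True)]
print(recognize(quieter_take, models))
print(recognize([0.0] * 1600, models))
```

In a scheme like this the stored models come from one speaker, so another voice produces a different picture, and background noise adds energy to every frame and pulls the score toward or below the threshold. That is one plausible reading of why a trained user and a quiet room matter.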

I still have work to do to optimize the algorithm and to build an AIR application for recording and organizing the sound library.

So stay tuned, I will publish this library soon...

PS: Voice Gesture works in a plain web-based Flash Player 10.1 (I recorded this demo using Firefox); it is not specific to AIR 2.0.

EDIT 11/02/01: MAX SAMPLE FILES AVAILABLE

Sorry, I have very little time to update, clean up and publish a good version of VoiceGesture (and there are also a few commercial constraints). However, here is the MAX BOYL zip file, containing:

  • My presentation PDF
  • All samples & exercises using the latest version of VoiceGesture & PitchDetection

The sources are undocumented, but if you have questions, post them in this thread and I will do my best to have a look...

The link : VOICE_GESTURE.zip

Comments (60)

  1. Very nice ! Well done mate.

    Wednesday, December 16, 2009 at 2:00 am #
  2. Tek wrote:

    Your teaser is impressive, I’m pretty sure we will see it used in a real-life project soon. I hope the algorithm can be improved in the near future to remove the shortcomings you currently see in it.

    Wednesday, December 16, 2009 at 2:36 am #
  3. Interesting, apart from non-voice sounds (like the can of coke opening ) the training could be an issue for general usage.

    Great work!!

    Wednesday, December 16, 2009 at 5:02 am #
  4. Really cool work! – looking forward to seeing the published lib. There are some really cool possibilities this will bring.

    Wednesday, December 16, 2009 at 5:59 am #
  5. Really good stuff !
    I’m waiting for the next step and hope to see this library.
    Regards
    JP

    Wednesday, December 16, 2009 at 10:06 am #
  6. Great work Didier, looking forward to seeing this evolve in 2010.

    Also kudos to SidLee for getting behind some R&D, I hope others take notice and follow their example.

    Wednesday, December 16, 2009 at 10:45 am #
  7. lionel wrote:

    Very nice demo! love the pschiiit end :)
    In the demo, you don’t speak a lot…Could that work during a conversation, for example, if I made a pause to launch the correct order ?

    Wednesday, December 16, 2009 at 11:07 am #
  8. Very very nice stuff !
    A new dimension in user experiences in our browser.
    We are all waiting for sources ;)

    Have a nice day !

    P.S.: I spent a few minutes looking for a nonexistent mail on my iPhone because of the sound at 1’14 :p

    Wednesday, December 16, 2009 at 12:46 pm #
  9. fabien wrote:

    Wow ! Nice !
    We can imagine a lot of interesting use cases for accessibility and e-learning… as long as we don’t need to drink too much Coca Cola for calibration :)

    Wednesday, December 16, 2009 at 1:11 pm #
  10. Thanks for your comments guys :)

    Wednesday, December 16, 2009 at 2:40 pm #
  11. Jloa wrote:

    Wow! Really nice. Especially the pretty girl that appeared when he opened the tin ^_^

    Wednesday, December 16, 2009 at 3:11 pm #
  12. Romain wrote:

    Really impressive and I have an idea for future use …
    I’m excited!

    Great work, thanks a lot ;)

    Wednesday, December 16, 2009 at 3:45 pm #
  13. Malatze wrote:

    Cool! I wonder if you used any open-source solution (ported or otherwise), like Sphinx for instance, for the voice recognition, or is this all built from the ground up?

    Wednesday, December 16, 2009 at 7:28 pm #
  14. krys wrote:

    Keep up the great work Didier :)

    k

    Wednesday, December 16, 2009 at 8:12 pm #
  15. Well done! It works great!

    Wednesday, December 16, 2009 at 9:55 pm #
  16. Fardeen wrote:

    Excellent !

    Is it for SidLee?

    Wednesday, December 16, 2009 at 11:16 pm #
  17. Blackiz wrote:

    Wow!!! Great!!

    Thursday, December 17, 2009 at 2:40 am #
  18. dim wrote:

    coca-cola should give you money for this :DDD – very good work !!!

    Saturday, December 19, 2009 at 12:37 pm #
  19. Vijay.R wrote:

    Great Work…Looking forward to see the next step.

    Monday, January 18, 2010 at 11:04 am #
  20. Gaurav wrote:

    Good Stuff !!

    Thursday, February 25, 2010 at 2:53 pm #
  21. ilogyc wrote:

    cool!

    would that library be capable of recognizing phrases? or just simple words and sounds?

    Thursday, April 1, 2010 at 9:31 am #
  22. @ilogyc> It could work with short phrases :)

    Thursday, April 1, 2010 at 4:06 pm #
  23. Og2t wrote:

    Hey Didier! Nice work! Are you using FFT or zero crossing frequency method? Are you thinking of sharing the source soon?

    Monday, May 24, 2010 at 1:05 pm #
  24. Guillaume wrote:

    Very interesting library !
    When do you think you will have a usable library ? Will it cost something ?

    I’m very interested in using this for various things I would like to do.

    Friday, July 23, 2010 at 2:48 pm #
  25. vanilla wrote:

    I recently made one :)
    http://vimeo.com/13637625

    Monday, July 26, 2010 at 10:17 am #
  26. test wrote:

    Is the source going to be available soon?

    Tuesday, August 24, 2010 at 5:09 am #
  27. Library and/or Src wrote:

    Does anyone know where to find the library and/or source code for download?

    Wednesday, September 1, 2010 at 4:42 pm #
  28. Ventoline wrote:

    What happened to Voice Gesture? Here and there speech recognition projects are popping up, but nothing that matches its quality..

    Wednesday, October 6, 2010 at 9:25 am #
  29. karen wrote:

    when will you release the library?
    About when, I can’t wait to see.
    Karen

    Wednesday, November 17, 2010 at 9:41 am #
  30. Joel wrote:

    Does anyone have source code on doing the voice authentication bit at least? I’d like to utilize something like this in an app I’m working on. I see in the trackbacks there are references to drawing the audio and then putting a series of filters against it, but it would be cool to see how that works.

    Sunday, November 28, 2010 at 4:40 pm #
  31. Franchise wrote:

    Didier has a great sense of humor,
    “So keep in touch, I will publish this library soon…” this was in
    December 2009. It is some really amazing work. Maybe he is producing a commercial version of it.

    Thursday, December 9, 2010 at 5:13 pm #
  32. Eric Dolecki wrote:

    Any news on this? I’d love to play with it when possible.

    Wednesday, January 12, 2011 at 6:26 am #
  33. Pierre wrote:

    What is the status of this project? It really makes me want to try it! Any news?

    Wednesday, February 2, 2011 at 9:46 pm #
  34. pressz wrote:

    Hi Didier..
    I really like your work and I’ve downloaded the source code (MAX SAMPLE FILES) you provided here. I would like to see how it works from scratch, but the .fla file is corrupted and cannot be opened. Could you share a working copy of the file again? TQVM.. ^^

    Tuesday, March 29, 2011 at 9:20 am #
  35. wenny wrote:

    Hi, I have downloaded the source and tested it, but I couldn’t try out the FITC_MAP sample because I don’t have your voice. Anyway, I can see the bitmap of the voice gesture. Is it possible for you to share how to make the bitmap of my own voice, so I can test it out? Thanks.

    Sunday, April 3, 2011 at 3:26 pm #
  36. editor wrote:

    Hi,

    I’d like to use what you’ve put together for a voice recognition quiz. I had a Flash programmer do it, but the highest percent we can achieve – in terms of a score – is 65%. And, the larger problem is that sometimes just saying anything into the microphone returns a score higher than the score that is returned when the target word is said! Clearly, he needs some assistance to fix these issues. Is there any chance you could help us out with this?

    Thanks!

    Monday, April 4, 2011 at 7:25 am #
  37. dave wrote:

    @ wenny

    I’m German with very bad English and the fitc_map demo works great for me.
    Try one of the three commands that Didier has imported into the demo:
    say osaka, paris or san francisco in English (or German ;) ) or maybe in French ;) .
    Or record your own voice, add the recorded bitmap to the library and add the model, to use your own voice command.

    @didier wow! Very nice library and a smart idea. Will you publish the source code of the demo from the video above, or is it a future commercial project? A look at the source would be very useful for working on a VoIP client with voice recognition.

    sorry for my very bad english
    greetz from bonn,germany
    dave

    Wednesday, May 4, 2011 at 1:47 am #
  38. steven wrote:

    hi this is a great idea
    but how do you draw the sample data for the library?
    with one of the spectrum functions included or the above-mentioned air application ?

    Please give me a clue

    steven

    Friday, May 6, 2011 at 9:28 am #
  39. Kris wrote:

    Hi,
    this project is really Great!

    Could you please explain how you created the PNG spectrums used in the samples? I see that there are some repetitions of the word to recognise, and I see some blur too. Is there anything more, like filtering or cropping?

    Thanks in advance for your reply :-)

    Monday, May 16, 2011 at 11:33 am #
  40. This is a really good read for me. Must agree that you are one of the coolest bloggers I ever saw. Thanks for posting this useful article.

    Tuesday, June 14, 2011 at 3:57 pm #
  41. usconet@hotmail.com wrote:

    Very fantastic…Wooww…¡¡¡

    Wednesday, June 22, 2011 at 4:59 am #
  42. Ryan hope wrote:

    dude can you please post the script you have used to make the sound recognition possible.

    Tuesday, July 26, 2011 at 11:56 pm #
  43. NextView wrote:

    Bravo. This is impressive.
    Lots of very promising prospects.

    I think it is a good idea to commercialize this technology.
    I wish you good luck.

    I will be the first to buy your commercial version of a program like this, to control my PC just by chatting with it.

    A paid AS3 library, for example. That way we could benefit from it too and you would earn your money as well. That is only fair.
    I don’t know if that is how it usually works.

    Anyway. Bravo again. Let us know about your other work. I’m sure it will interest us just as much.

    Peace.

    Monday, August 15, 2011 at 4:57 am #
  44. Steve wrote:

    Great stuff Didier – this is exactly the sort of thing I’ve been looking for. I started playing around with trying to create something similar but then stumbled across your code. Out of interest what license is the code released under? Am I able to use some of it (e.g. the PitchDetector) for a project under the MIT license?

    Thursday, August 18, 2011 at 3:25 pm #
  45. Rahul Patwa wrote:

    Awesome stuff !! :) :O :)

    Friday, December 16, 2011 at 8:38 am #
  46. Mike wrote:

    This is one stunning library! Amazing work! I just wanted to ask: I’ve been taking a look at your fitc_map project and I can’t make sense out of these lines:

    var s:Sprite=this[e.id.split("_")[0]];
    s['label'].text=e.id.split("_")[0].toUpperCase();
    eaze(s).to(1,{alpha:1}).delay(2).to(1,{alpha:0});

    (they are inside the voiceHandler)

    Any chance you might explain it to me?
    Congratulations once again.

    Tuesday, January 10, 2012 at 12:26 am #
  47. Jimmy wrote:

    http://www.didierbrun.com/VOICE_GESTURE.zip (it could be a starting point).

    Friday, March 30, 2012 at 4:56 am #
  48. www.koo.com wrote:

    i can’t find the zip file… anyone has the zip file ?

    Thursday, May 3, 2012 at 7:33 pm #
  49. thomas wrote:

    nice stuff, please release more

    Thursday, May 10, 2012 at 7:26 pm #
  50. sebb wrote:

    the source zip link is not working, i am really curious about this, any chance you can make it available again? :)

    Friday, May 18, 2012 at 5:03 pm #
  51. The link to download VOICE_GESTURE.zip is good now.

    Thursday, June 7, 2012 at 11:09 am #
  52. Victor Lira wrote:

    Hi,
    I would love to test it and check how you’ve done this, but the download starts and never finished (Network error).

    Please let us know when you fix this.

    Regards

    Tuesday, July 3, 2012 at 9:57 am #
  53. Thanks for uploading the source code. I found your video in 2009, and now 2012 again + source code, well done :)

    Tuesday, September 18, 2012 at 6:00 pm #
  54. bonabony wrote:

    hello sir, may i ask how to sample the voice gesture?

    Friday, November 9, 2012 at 10:24 pm #
  55. madhusudan wrote:

    can you provide any example on flex?then it should be more useful for us..

    Thursday, December 27, 2012 at 10:22 am #
  56. Orlando Leite wrote:

    link is down

    Monday, April 8, 2013 at 7:51 pm #
  57. WEREM wrote:

    Where can I download the VOICE_GESTURE.zip? The link is broken. Thanks.

    Thursday, May 9, 2013 at 7:15 pm #
  58. Lus wrote:

    Download link is broken… :(

    Monday, May 27, 2013 at 4:21 pm #
  59. Radu wrote:

    The link is broken. This is the most advanced speech recognition I have found, and now it’s not available.

    Tuesday, July 2, 2013 at 10:05 am #
  60. KuttyMoorthy wrote:

    Interesting,Very nice demo!but the download link is broken… :(

    Tuesday, November 12, 2013 at 12:40 pm #

Trackbacks/Pingbacks (28)

  1. Voice Gesture-语音识别for FP10.1 | 熠●极光 on Wednesday, December 16, 2009 at 5:41 am

    [...] Translated from the original: http://www.bytearray.org/?p=1151 [...]

     
  2. Hebiflux » Reconnaissance vocale sous flash ? on Wednesday, December 16, 2009 at 10:20 am

    [...] video by Didier Brun at Bytearray that leaves you speechless (ahAHAH! Sorry…) about voice recognition experiments. Based on [...]

     
  3. coderkind.com » Blog Archive » Flash 10.1 voice recognition demo on Wednesday, December 16, 2009 at 1:56 pm

    [...] impressive voice recognition demo here, as discovered via cisnky’s Twitter [...]

     
  4. Reconnaissance vocale par Didier Brun on Wednesday, December 16, 2009 at 2:50 pm

    [...] Didier Brun is an AS3 developer; you can see his work here: http://www.didierbrun.com and you can find more explanation of his experiment here: http://www.bytearray.org/?p=1151 [...]

     
  5. Apukeittiö.fi » Blog Archive » Puheentunnistusta on Wednesday, December 16, 2009 at 4:39 pm

    [...] The things you can do with Flash > http://www.bytearray.org/?p=1151 [...]

     
  6. Business Centered Design Blog » “Voice Gesture” in Flash on Thursday, December 17, 2009 at 11:36 am

    [...] “Voice Gesture” in Flash December 17th, 2009 | Category: General via bytearray.org [...]

     
  7. [...] a good way to stay on top of any new exciting libraries that people are talking about. For example, Voice Gesture an article recently posted on ByteArray.org has been getting a lot attention on Twitter. It [...]

     
  8. Flash platform and more… « dkor on Friday, December 18, 2009 at 2:43 pm

    [...] on his blog a link to… ByteArray! about the coming project of voice recognition call ‘Voice Gesture‘ and developed by Didier Brun. This application take advantage of the new Microphone access [...]

     
  9. Adobe Flash platform and more… « dkor on Friday, December 18, 2009 at 3:04 pm

    [...] on his blog a link to… ByteArray! about the coming project of voice recognition call ‘Voice Gesture‘ and developed by Didier Brun. This application take advantage of the new Microphone access [...]

     
  10. New NUI « UI Addict on Sunday, December 20, 2009 at 10:50 pm

    [...] 20 12 2009 Recently I saw a interesting tweet from Seth Sandler regarding a AS3 speech recognition lib. The demo was very impressive. So I wondered why not for python ?. So I started looking around [...]

     
  11. [...] Voice Gesture [by Didier Brun] [...]

     
  12. Voice Gesture by Didier Brun « Willekeurigheid on Wednesday, December 23, 2009 at 2:43 am

    [...] Drool… more info [...]

     
  13. onebyoneblog » Blog Archive » FITC in Quick Review on Wednesday, February 24, 2010 at 2:19 pm

    [...] lunch came the obligatory Cool Shit hour. Here we got to see Didier Brun show off some voice recognition in Flash Player 10.1, Chris Allen of Infrared5 show off Brass Monkey, a great looking framework which will allow [...]

     
  14. » Blog Archive » Pensamientos flexeros (2010-02-14) on Friday, February 26, 2010 at 11:47 pm

    [...] voice recognition with AS3 http://www.bytearray.org/?p=1151 (via @yacaFx) [...]

     
  15. Mouse ve Voice Gesture Olayları on Monday, March 1, 2010 at 1:19 am

    [...] and Voice gesture: two different experiments on the subject by the same person (Didier Brun-ByteArray): http://www.bytearray.org/?p=1151 http://www.bytearray.org/?p=91 >> [...]

     
  16. FITC Amsterdam » GELB der Powerflasher Blog on Tuesday, March 2, 2010 at 2:53 pm

    [...] 10 minutes of cool shit. What impressed me most was Didier Brun, who demonstrated his speech recognition in Flash using a SingStar clone. Really amazing what the new FlashPlayer 10.1 [...]

     
  17. Pitching the Microphone with Flash Player 10.1 Beta on Thursday, March 11, 2010 at 2:09 pm

    [...] http://www.bytearray.org/?p=1151 [...]

     
  18. Altes Thema, alte Frage ... Sprachsteuerung - Flashforum on Wednesday, May 5, 2010 at 5:56 pm

    [...] [...]

     
  19. Innovation and Flash | RIAgora on Wednesday, June 2, 2010 at 4:03 pm

    [...] Augmented Reality et ma boule de feu: AIR app and the source code Voice gesture recognition: http://www.bytearray.org/?p=1151 Intel8080 CPU emulation: http://www.bytearray.org/?p=622 Street Fighter CPU emulation: [...]

     
  20. Innovación y Flash | RIA212 on Wednesday, June 2, 2010 at 6:44 pm

    [...] Voice recognition. [...]

     
  21. Innovation through Flash on Thursday, June 3, 2010 at 3:11 pm

    [...] shown above. Some of my favourites include an excellent demonstration of face recognition, an AS3 voice recognition library and a fantastic World Construction Kit that utilises the C++ Box2D physics library, running [...]

     
  22. [...] can fall back on libraries for speech recognition. Didier Brun has already announced such a library for Flash Player 10.1 at http://www.bytearray.org/?p=1151. Or one sends the [...]

     
  23. [...] impressive). A good occasion to recall that last December Didier Brun showed a similar system for Flash. Read the article. Editor’s pick: a technology that simulates the relief of [...]

     
  24. FITC // FITC San Francisco Report on Thursday, September 23, 2010 at 5:30 pm

    [...] Also in the realm of audio is Didier Brun’s work. He’s created a clever way to do audio recognition in Flash. Basic summary is he draws out the audio data, applies a series of filters on this image and then matches it to a previously recorded image. Using this technique he can create voice control systems or do things like pitch detection. You can find a post and video demo of his VoiceGesture system here: http://www.bytearray.org/?p=1151 [...]

     
  25. Sprachsteuerung - Flashforum on Monday, January 3, 2011 at 2:41 am

    [...] [...]

     
  26. Microphone.activityLevel anpassen (Klatschen) - Flashforum on Thursday, February 3, 2011 at 3:18 pm

    [...] [...]

     
  27. Useful Flash Goodies « encryptedpixel on Wednesday, March 16, 2011 at 7:35 pm

    [...] Voice recognition, very useful. http://www.bytearray.org/?p=1151 [...]

     
  28. [...] By admin on 17 December 2009 in kuehlhaus-blog with 0 Comments. via bytearray.org [...]