‘open sesame’ door latch
from KWS to SRE (speaker recognition)
Authentication by voice on a local device is worth exploring. Current keyword-spotting methods already use FFT-based features. Maybe our code could be adapted to recognize a person by their voice in a small micro model.
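As a starting point, the FFT front end a KWS model uses could feed a speaker model too. A minimal numpy sketch of a log-magnitude spectrogram; the frame and hop sizes here are illustrative, and a real front end would usually add a mel filterbank:

```python
import numpy as np

def log_spectrogram(audio, frame_len=256, hop=128):
    """Frame the signal, window each frame, take the FFT magnitude,
    and log-compress. frame_len/hop are arbitrary example values."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return np.log(mags + 1e-6)                  # small offset avoids log(0)

# one second of fake 16 kHz audio -> (frames, frequency bins) feature matrix
feats = log_spectrogram(np.random.randn(16000))
print(feats.shape)  # (124, 129)
```

The same feature matrix could then be fed to either a keyword model or a speaker model, which is what makes the KWS-to-SRE adaptation plausible.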
goal:
- where: for the door to my shop and the back door of my house
- when: I am carrying something with both hands that I don’t want to put down
- what: say ‘open sesame’ and have the latch release so I can push open the door
collect data:
- It only needs to work for a few people and it is better if it doesn’t work for anyone else.
- Maybe figure out a way to collect samples with a parallel system running at the door: record the phrase (and other talk) and ship it out to a more powerful server to collect and use for training.
- Iterate models. Someday there will be enough data and the door will open.
revision:
- Eventually it might work. Then it will need some intermediary layer. Maybe that layer won’t be suitable to work on the device.
- Add person recognition layer, either visual or auditory.
- Until that layer is compact and smart enough, it could do a long loop to the server. If the server doesn't agree, it could set off alarms.
- long term goal: approval from @SecurityGuy
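The local-decision-with-server-fallback idea above could be as simple as a two-threshold gate. A sketch; the threshold values are invented and `ask_server` stands in for whatever the real round trip would be:

```python
def decide(local_score, ask_server, accept=0.90, reject=0.50):
    """Hypothetical gate: a confident local match opens the latch,
    a clear miss is denied, and anything borderline takes the long
    loop to the server. If the server disagrees, raise the alarm."""
    if local_score >= accept:
        return "open"
    if local_score < reject:
        return "deny"
    return "open" if ask_server() else "alarm"

# usage with stubs standing in for the real server call
print(decide(0.95, ask_server=lambda: True))   # prints "open"
print(decide(0.70, ask_server=lambda: False))  # prints "alarm"
```

The two thresholds would need tuning on real recordings before anyone, including @SecurityGuy, should trust them.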
hardware:
- a $24, 12 V door latch
- a 120 V to 12 V transformer
- a 3.3 V relay
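On the actuation side, opening the door is just pulsing the 3.3 V relay (which switches the 12 V latch supply) for long enough to push the door. A sketch written against any pin-like object with `on()`/`off()` methods (MicroPython's `machine.Pin` has these); the pin number and hold time below are guesses, not measurements:

```python
import time

def release_latch(relay_pin, hold_s=3.0, sleep=time.sleep):
    """Energize the relay, hold it long enough to push the door
    open, then de-energize so the latch re-locks."""
    relay_pin.on()
    sleep(hold_s)
    relay_pin.off()

# On a MicroPython board this might look like (pin 5 is hypothetical):
#   from machine import Pin
#   release_latch(Pin(5, Pin.OUT))
```

Keeping `sleep` injectable makes the timing testable off-hardware.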
alternate approaches:
Perhaps it is possible to leverage an audio phrase database to train a person-authentication model. Maybe forget 'open sesame' as a training phrase. Maybe use something really common: a bajillion recordings of 'hello', for instance. I would have to say 'hello' a thousand times. Can the model learn which one is me?
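If a model trained on all those 'hello' recordings can turn each utterance into an embedding vector, the "is it me?" question reduces to comparing a new embedding against my enrolled ones. A sketch assuming such embeddings exist; the averaging scheme and the 0.8 threshold are assumptions that would need tuning on real impostor recordings:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_me(new_embedding, enrolled, threshold=0.8):
    """Accept if the new utterance's average similarity to my
    enrolled 'hello' embeddings clears the (invented) threshold."""
    scores = [cosine(new_embedding, e) for e in enrolled]
    return sum(scores) / len(scores) >= threshold
```

Usage: enroll a handful of vectors once, then score each new utterance against them, e.g. `is_me(model(audio), enrolled_vectors)`.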
If that works, how could I make it work on an uncommon word that will open my door? Maybe take all the audio data and reverse it in time, like playing the Beatles' White Album backwards to hear 'Paul is dead'. Instead of 'hello', 'olleh' opens my door. Train on that and build another model.
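Reversing the recordings themselves is a one-liner on a sample array; whether a model trained on reversed 'hello' generalizes to me saying 'olleh' live is the open question:

```python
import numpy as np

def reverse_clip(samples):
    """Time-reverse an audio clip so 'hello' becomes 'olleh'."""
    return samples[::-1].copy()

clip = np.array([0.1, 0.5, -0.2])
print(reverse_clip(clip))  # [-0.2  0.5  0.1]
```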
request for team:
This project seems right for a team. At the core it will require lots of experimentation and testing of models spanning the whole gamut of tinyML development as laid out by Vijay Janapa Reddi and the rest of the leaders and staff of the edX tinyML classes.
Team members could compare their selection of FFT variation, pre-processing, model architecture and inference implementation.
Team could compare and evaluate publicly available audio data or models that we could piggyback on.
Team members could generate and share voiceprint data and experiment on optimal phrase type for voice authentication.
Team could develop a hardware/software subsystem for collecting voice data and storing on a server accessible to the whole team.
Team could discuss, reach consensus on, or produce multiple iterations of an actual hardware/software implementation.
I am very excited and ready to go. Please consider working with me on this or some pivot of this project idea. What do you think?
references and discussion:
“Internet elders such as MIT’s David D. Clark argue that authentication – verification of the identity of a person or organization you are communicating with – should be part of the basic architecture of a new Internet…(Tim Berners Lee) proposed a simple authentication tool – a browser button labeled “Oh, yeah?” that would verify identities” Click "Oh yeah?" | MIT Technology Review
For the Internet of Things (IoT), authentication and ownership of data are key issues that must be faced now, before we have hundreds of microcontrollers in our homes sending data to unnamed servers, with big tech claiming the data as their own and invading our homes. Recently, in an Independent Activities Period (IAP) class at MIT, Tim Berners-Lee admitted to making a big mistake when he wrote HTML: there should have been authentication built in.
Recently I had to call Fidelity a lot since I was the executor of my mom's will. I said a short phrase just a few times over the phone, and now I am authenticated as soon as they hear my voice. What can we do on a microcontroller?