I just spent four hours converting a single ONNX file to TFLite for use with Home Assistant’s Wyoming protocol. I’m reasonably technically savvy – I know what a virtual environment is, I know what Docker is, and I can use a command line.
My specific reasons for wanting to do this are twofold: one, I want to use the wakeword “Computer” with Home Assistant. Yes, I’m aware that there are two existing models for this specific wakeword and they don’t work for me: they have too many false activations as well as trouble in environments with even minor amounts of noise. I am aware of all of the reasons that people cite not to use this specific wakeword and I don’t care. It’s what I want to use and I’m willing to try and train a model that does work for me.
The other reason is more philosophical: I want to be able to create my own wakeword, with whatever words I want, whenever I want. I like self-hosting things, I want to be able to create the models on my own hardware using my own expertise and data. I want to be able to add my own adversarial data into the mix from long-term room recordings I have of the space. (My living room, or any other arbitrary room in my home.) I want total control over the creation, and this is as much a practical as it is a philosophical position for me.
I used to use Mycroft for this, until the organization stopped existing because of the patent troll issue. Integration with Home Assistant was also somewhat iffy, and there was duplicated effort in using the Hue skill to control lights, which is my primary use case. While the wake word itself worked nearly perfectly, it was time to switch to Home Assistant’s voice pipeline for voice control. Home Assistant has tried to make this somewhat user-friendly: I have a custom local AI server set up in my basement to run STT via Rhasspy, and there’s a protocol called Wyoming for creating “satellites” – little physical wake word listening stations that can be as simple as a microphone plugged into a Raspberry Pi.
However, the included wakewords don’t include “Computer”, so I set about trying to figure out how to train that for myself.
Into dependency hell!
Here’s what creating a custom wake word looks like in 2025:
- Start with a Google Colab notebook that may or may not work depending on when Google last changed their runtime environment. Hope the maintainer is still active and willing to keep up. Wade through obscure academic jargon that nobody cares about, like “produces QDQ model with per-tensor quantization and INT8 activations/weights.”
- Get an ONNX file. There are two ways of doing this: the “official” Google Colab notebook, which doesn’t work 90% of the time because the maintainer is absent, or the “new and improved” Google Colab notebook, which does output an actual ONNX file (as of the time of this writing; who knows how long before it breaks) – but the file itself is malformed for use with OpenWakeWord. The maintainer of this latter notebook is aware of this but doesn’t seem to care, and has also chosen not to bother with the original notebook’s functionality of creating a .tflite file, because again, he doesn’t care about it for his use case.
- Get an ONNX file that needs reshaping because the axes are wrong for OpenWakeWord. Write a Python script to add a Transpose node because apparently that’s just something you’re expected to know that you need to do, and also how to do it. This is an absurd proposition.
- Enter dependency hell trying to convert ONNX → TFLite:
  - Try `onnx-tf`: requires TensorFlow 2.15, which needs `tensorflow-probability`, which needs `tensorflow-addons` (which is EOL’d, by the way)
  - Hit a `ValueError: None values not supported` deep in the conversion stack
  - Try `ai-edge-torch`: can’t install because it needs PyTorch 2.4+, but `onnx2torch` works with 2.1+, and pip’s resolver gives up
  - Contemplate how much easier it would be to just use Alexa; pine for the relatively easy days of training models with Mycroft-Precise
- Discover an obscure NXP GitHub repo that has a converter tool, but:
  - No clear installation instructions. You’d better understand `git clone` and know that you need to type `pip install .`
  - The tool is actually a Python module, not an installed command, so you need to run it with `python -m`
  - Oh, and it’s `python3`, not `python`, on your system, which took me a while to figure out.
- Finally get a .tflite file after stumbling through all of the above
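That `python -m` hurdle trips people up because `pip install .` doesn’t put any new command on your PATH. The invocation pattern itself is ordinary; here it is demonstrated with a stdlib module (`json.tool`) standing in for the converter, since the converter’s actual module name depends on the repo you cloned:

```shell
# Run a Python module as if it were a command-line tool.
# json.tool is a stand-in for the converter module here; note that on many
# distros the interpreter is python3, not python.
echo '{"wakeword": "computer"}' | python3 -m json.tool
```

The same `python3 -m <module> <args>` shape is what the NXP tool wants.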
All of this is an unpleasant experience. I realize that I’m playing with alpha features of beta software. But there was a usable workflow with Mycroft years ago, and we’ve abandoned that in favor of a much more opaque (at least to me, I don’t speak Python) workflow with pytorch and numpy and py-this and py-that and a thousand other Python-based scripts, scriptlets, byzantine venvs and ancient scrolls that open smoking portals to the Netherlands.
There is no standard toolkit: everyone has their own script, notebook, or conversion flow. The documents linked in the existing, er…documentation point to Google Colab notebooks that are frequently out of date because Google changed their runtimes. Much of the software depends on extremely old versions of other software, and the documentation (when it exists at all) makes no mention of this, so if you just…install Python based off some tutorial that you found because you’re – again – not a Python uber-nerd – you’ll get an environment that you can’t use. There is no document anywhere that explicitly lays all of this out: “You have to install this version of Python with these specific software versions in this specific order when Mercury is in retrograde.” Python packaging is already bad, but ML/AI libraries take it to another level. There are version conflicts everywhere. EOL’d packages are still required as dependencies. (Seriously, fuck Python. I’m so fucking sick of the tyranny of venvs and new versions that break everything that came before. Even Jamie Zawinski hates it.) There’s a lot of online feedback saying “Just use an LLM to figure out how to do this!” It’s a good idea, but using an LLM on this problem usually ends in a doom loop of trying different combinations of different software versions; eventually your venv is so fucked that it’s better to nuke it from orbit, and you’ve made no progress whatsoever.
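For what it’s worth, the only thing that eventually kept my environment from drifting was pinning everything in one requirements file. The TensorFlow 2.15 pin comes from the onnx-tf requirement mentioned above; the companion versions are my guesses at a compatible set, and you’d need to verify them against each package’s release notes before trusting them:

```text
# requirements.txt – hypothetical pin set for the onnx-tf conversion route
tensorflow==2.15.1
tensorflow-probability==0.23.0   # must match the TF minor version
tensorflow-addons==0.23.0        # EOL'd, but onnx-tf still pulls it in
onnx==1.15.0
onnx-tf==1.10.0
```

At least then, when the stack breaks, it breaks the same way twice.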
To add to this, the workflow (Google Colab) that we’ve all decided to use for the initial ONNX file generation is itself brittle. That tutorial from six months ago is probably useless because the “classic” Google Colab notebook breaks constantly, has out-of-date and incomplete documentation, and is a terrible environment to do this sort of work in, anyway. When (not if!) Google updates their runtime, suddenly the console is filled with thousands of lines of error messages, which print in their own little horizontal “slice” of the screen, each with its own scrollbar. Finding anything, especially after you run into an error, involves scrolling back through thousands of lines, which may or may not be relevant, to find the one error line that actually means something. Abandoning Google Colab as an environment would be a great first step; I think the right move here (if we’re sticking with OpenWakeWord) is a docker container that contains a Jupyter server (although Jupyter’s text is even less readable than Google Colab) and conversion tools to make the .onnx output work with HA, which is what I bet 99% of the people who are opening issues on the OpenWakeWord github page are trying to do.
I do realize that a lot of what I’m complaining about is OpenWakeWord. I have a ton of respect for dscripka (the creator and chief maintainer of OpenWakeWord, as far as I know), but the project is anything but user-friendly, and maintaining documentation or creating practical tutorials for using the software is obviously not a priority. As of the time of this writing, visiting the Hugging Face model page linked from the GitHub page immediately throws a runtime error, while the last release of the software itself is from nearly two years ago (Feb 2024). Again – the writers and maintainers of OpenWakeWord seem to have done incredibly impressive work to create an entirely new system of wake word detection, and this is to their credit. I certainly couldn’t do such a thing, and it’s clear that they’re doing mathematical modeling approaching true artistry that is far beyond my capabilities. But the creator seems to be focused on the math to create the thing, as though it’s more an academic project and not intended to be practical for end users. That’s a valid choice, but a (personally) annoying one, and it’s doubly annoying because Home Assistant / Nabu Casa has chosen to throw their hat in with this particular bit of software, and that feels like a fairly anti-end-user choice, in my opinion. As an organization they have resource constraints, and I’m not trying to ignore that, but why would an organization that values tinkering and figuring things out yourself get so embedded with a system where you can’t really do that easily?
It’s impossible for me not to compare this to the process of training a wake word with the now-defunct Mycroft project. Mycroft-Precise was actually usable; I sunk hours into perfecting my model. It wasn’t perfect, but it had an actual workflow that made sense, real documentation you could follow (some of which I personally wrote and shared with the community), and clear steps: record samples, train model, add in adversarial data by saving wakeword activations, refine. Creating thousands of recordings and strengthening the model over weeks was tedious, but it was understandable. I, or a reasonably tech-savvy user like me, could wrap our heads around the process.
Then Mycroft died, and we got…this. A fragmented, incoherent, multi-system multi-step easy-to-break mess with a nearly-impossible learning curve. You need to:
- Understand machine learning model formats, and the internal formatting of the formats themselves, and how to alter them to work.
- Seriously, I can’t stress enough how crazy it is that someone put in the time to figure out that the file’s axes were out of whack for what OpenWakeWord wanted. I would never have figured that out for myself.
- Navigate Python’s dependency resolver
- Debug cryptic errors in conversion libraries
- Know when to give up on one approach and try another
- Have the patience to spend hours on what should be a simple file conversion
Here’s what would make this bearable. It really comes down to two basic problems, plus tooling and documentation.
- Native ONNX support in Wyoming/OpenWakeWord: TFLite is great for ultra-low-power devices, but Home Assistant and Wyoming satellites typically run on hardware that can handle ONNX just fine. Why force this conversion at all? Just accept .onnx files directly.
- Somewhat related: the idea behind Wyoming is cool…but it’s a lot of work to do a basic thing. Whisper, Piper, and OpenWakeWord all run in their own little containers that I have to have complete control over. That makes sense for GPU-accelerated setups like mine, but being able to deploy the server once and then simply point the satellite back at it would go a long way toward streamlining how it currently works.
- A Docker container with all of the tools, pinned to the correct little fiddly arcane versions of the software required, that fires up a Jupyter server with a notebook we know will create the proper output – plus conversion tools, for as long as Wyoming refuses to allow the use of .onnx files.
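To sketch what such a container might look like – the base image, package pins, and port here are all placeholder assumptions of mine, not a tested recipe:

```dockerfile
# Hypothetical conversion sandbox: Jupyter plus the ONNX -> TFLite toolchain,
# pinned so a runtime update can't silently break the notebook.
FROM python:3.10-slim
RUN pip install --no-cache-dir \
        "tensorflow==2.15.*" \
        tensorflow-probability \
        onnx onnx-tf \
        jupyterlab
WORKDIR /notebooks
EXPOSE 8888
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
```

The point isn’t this exact image; it’s that the versions live in one file somebody maintains, instead of in Google’s ever-shifting Colab runtime.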
- Better documentation: Not just “run this command” but “here’s what to do when it inevitably breaks” and “here are the exact versions that work together.” Walkthrough documentation that doesn’t assume that the user is a Python genius. List commands and give multiple examples regarding usage because command switch formatting is inconsistent. (This isn’t a problem confined to ML python libraries, to be clear.)
I eventually got my .tflite file. No idea if it actually even works yet because my Wyoming satellite Pi needs to be rebuilt. But I shouldn’t have had to fight this hard for it.
Finally, a word about the “entitled prick” problem in FOSS software. That problem usually manifests something like this: there’s a FOSS project, and someone finds it, wants a change, and then stamps their foot on the floor when there’s not enough support for their requested feature, or the fix to their specific problem doesn’t come fast enough, and they end up sounding like a bratty child who wants everyone to do things for them, for free, right now, dammit. I find that attitude annoying! But surely there is room to complain about what I see as end-user-hostile features in a piece of software that tends to reward users poking around and figuring things out. I realize I’m using Home Assistant – and all the attendant integrations and add-ons – for free. That is a blessing I’m receiving from the goodwill of the various developers, and I absolutely acknowledge that. At the same time, if the tools provided are hostile to end users, I think complaints are valid.
Nothing about this ecosystem encourages stability or ease of use. We can do better. Right now, custom wake words are technically possible but practically nearly impossible – which is a problem that doesn’t need to be.

