Recently, we’ve been building some tooling to make the Azuracast experience a little better for our DJs and listeners.
Shooting myself in the foot: background
We’ve been trying to work around a longstanding bug: when a new streamer connects to Azuracast, Azuracast’s Liquidsoap processing picks up the last thing the previous streamer sent as now-playing metadata, and sets it as the metadata for the new streamer.
This makes a lot of sense if you’re coding for the case where a streamer loses connectivity and then resumes; generally the outage will be short, so preserving the now-playing metadata makes sense.
However, we have a rotating set of DJs who each stream for a relatively short time – our standard show is two hours long. This means that if DJ One signs off, and DJ Two starts streaming without sending new metadata after connecting, DJ Two’s set appears to be a continuation of DJ One’s signoff. This is confusing, and for streamers who prefer to simply connect and stream, it means their metadata will be “wrong” for a considerable part of the show.
Azuracast’s now-playing APIs say that we should be able to send the stream metadata any time with a call to the API:
curl -X POST \
--location 'https:///api/station/1/nowplaying/update' \
--header 'Content-Type: application/json' \
--header 'X-API-Key: xxxx:xxxx' \
--data '{ "title" : "Live Broadcast", "artist" : ""}'
The only problem is that on our installation running Azuracast 0.22.1, this returns a 200 and does absolutely nothing. Looking at the logs inside Azuracast, the request is being rejected because a streamer is active. I opened a bug for this, and the recommended solution was to upgrade to the current stable release, 0.23.1.
Round 1: Upgrading Azuracast
2025-10-19, 9 pm: I’d upgraded Azuracast before, and it had been almost completely seamless: put up a notice, run the Azuracast updater, broadcasting stops for a second, and then the new version resumes right where it left off.
Super easy, barely an inconvenience.
After our 7 pm show on Sunday, I noted we’d taken a nightly automated backup of our current 0.22.1 installation, and then went ahead and upgraded: broadcasting stopped for a second, the UI reloaded, I had to log back in, and we were still playing the same track. Fantastic! All according to plan. I had not taken a full backup of my installation, because we all know Azuracast always updates just fine.
This was critical error #1.
2025-10-20, 7:15 pm: The next evening, however, I tried to stream my show. All went well until about an hour and a half in, and suddenly the audio started to stutter and glitch. Badly. I took a look at the Liquidsoap logs on Azuracast and they were not pretty.
2025/10/21 19:18:55 [clock.local_1:2] Latency is too high: we must catchup 54.91 seconds! Check if your system can process your stream fast enough (CPU usage, disk access, etc) or if your stream should be self-sync (can happen when using `input.ffmpeg`). Refer to the latency control section of the documentation for more info.
...
2025/10/21 19:18:56 [clock.local_1:2] Latency is too high: we must catchup 54.97 seconds! Check if your system can process your stream fast enough (CPU usage, disk access, etc) or if your stream should be self-sync (can happen when using `input.ffmpeg`). Refer to the latency control section of the documentation for more info.
...
2025/10/21 19:18:57 [clock.local_1:2] Latency is too high: we must catchup 55.03 seconds! Check if your system can process your stream fast enough (CPU usage, disk access, etc) or if your stream should be self-sync (can happen when using `input.ffmpeg`). Refer to the latency control section of the documentation for more info.
2025/10/21 19:18:57 [input_streamer:2] Generator max buffered length exceeded (441000 < 441180)! Dropping content..
And so on. You can see Liquidsoap having a worse and worse time trying to consume my stream and send it on. I eventually stopped my show early; Liquidsoap did not recover as I expected it to, so I restarted Azuracast, watched the AutoDJ happily stream away, and resolved to look at it the next day.
No reports of problems, so I assumed it was a fluke.
2025-10-21, 7:30 pm: The next evening’s show, it happened again, and it was just as bad. The Tuesday DJ also cut his show short.
We had a very, very broken Azuracast, and there was an all-day streaming concert planned for Saturday, four days away.
Round 2: rollback did not roll
2025-10-21, 8 pm: I started working right after the cancelled show, reasoning that we were indeed very much under time pressure, and that multiple restarts/crashes/reinstalls during our station’s primary listener hours would be a bad idea.
I decided to try a rollback to 0.22.1, where we’d been streaming just fine. Unfortunately, I was lacking a critical piece of information.
When you run Azuracast’s ./docker.sh install, you must “pin” the release level you want in azuracast.env if you don’t want the most recent version. This is not documented in big bold DO THIS OR YOU WILL BE COMPLETELY SCREWED letters in the Azuracast install docs, because of course you always want the most recent stable version, why wouldn’t you?
So I embarked on getting the server fixed with ChatGPT, my faithful but unfortunately clueless-about-version-pinning companion. (Critical error #2: I had picked the wrong tool for the job because it gave me more answers for free.)
I went through multiple iterations of “I’ve reinstalled the server and it’s upgraded itself to 0.23.1 again”. I tried multiple ways to just install 0.22.1 and leave it there.
2025-10-21, 10:02 pm: I downloaded the code at the 0.22.1 tag and tried to run it in development mode and reinstall my automatic backup. It upgraded itself to 0.23.1.
2025-10-21, 10:40pm: I tried building all the Docker images myself at 0.22.1 and restoring the backup. It upgraded itself.
I tried downloading the Docker images, restoring, and just running them. It upgraded itself.
2025-10-21, 11:55 pm: I managed to dig up a full backup of our 0.22.0 install, which was around a year old. Not ideal, but better than nothing at all. I restored it, then tried to install 0.22.1 from source. It chugged for a long time doing the restore…and upgraded itself to 0.23.1.
2025-10-22, 12:24 am: I then made critical error #3: I concluded that the 0.23.1 database on the database Docker volume was the problem, and that I needed to deinstall Azuracast and retry the 0.22.1 install, following the documented deinstall/reinstall process. This was a bad idea, because the deinstall deleted the Docker volumes from my Azuracast install. So now I’d lost all my station media, all my podcasts, and all my playlists. I was very hosed. [If I had not made critical error #1 (skipping the full backup), critical error #3 would not have been a problem.]
2025-10-22, 1:34 am: A painstaking reload of the data from the old backup. It upgraded itself again.
2025-10-22, 3:22 am: Tried again, more carefully. Restore. Wait. Watch it upgrade itself again.
2025-10-22, 4:41 am: Nothing I could think of, or that ChatGPT could think of, could fix it. We were down, hard.
2025-10-22, 5:21 am: The rest of the team is starting to come online. Everything is broken and I’m exhausted. They chase me off to bed, and I try to sleep.
The rest of the team comes through
2025-10-22, 6 am: The rest of the team is up and online. ʞu¡0ɹʞS posts a neutral “we’re down for maintenance” banner on radiospiral.net. Southwind Niehaus suggests that she can provide an alternate Azuracast server for Saturday at 0.21.0, and the team pitches in to get that server set up to be a backup.
2025-10-22, 10 am: Mr. Spiral approves the switchover to Southwind’s server, and offers to send Gypsy Witch the tracks she needs to do her show. (She uses downloads from Azuracast to fill out her playlists.)
2025-10-22, 10:19 am: We start passing out the alternate server URL to Second Life denizens. We decide not to change DNS over to Southwind’s server because of propagation times.
2025-10-22, 10:27 am: plans to populate the substitute server proceed apace.
2025-10-22, 12:16 pm: Radiospiral.net web player repointed to the substitute server, but metadata is not working. Phone alerts wake me up enough to supply the right now-playing metadata URL to ʞu¡0ɹʞS.
2025-10-22, 12:29 pm: Radiospiral.net is switched over. I update the iOS radio app’s config data on GitHub and confirm we have music but no metadata in the app; the metadata server URL was hardcoded in the released version of the app. I make a note to push out a new version with the metadata in the config file.
2025-10-22, 1:09 pm: I am able to find the version on the App Store and make the fix.
2025-10-22, 1:36 pm: Test version of the app up and available to beta testers.
2025-10-22, 2:08 pm: The substitute stream is working in all the correct places in Second Life as well. We close the PI.
I continued work on the iOS app; the real blocker was getting the screenshots right! Once that was done, I submitted the new version of the app on 10-25 and had approval and the new version on the App Store by 10-26. Everything was working well with the substitute server: the Saturday show was successful, and Southwind’s server handled the load perfectly, streaming shows and AutoDJing while I resumed work on restoring 0.22.1.
Actually fixing it, day 1
I had used Claude to help verify the fixes I made to the iOS app, and it worked so much better than ChatGPT on code generation that I went ahead and subscribed at the $20/month level.
I brought up Claude on the Azuracast server, showed it the checked-out source code repo, and asked for help solving the problem of getting to and staying on 0.22.1.
Claude immediately told me about AZURACAST_VERSION, version pinning, and azuracast.env. [Looking back over the timeline, I wasted somewhere around 14.5 hours not knowing about that.]
We set the AZURACAST_VERSION=0.22.1 in azuracast.env.
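For the record, the pin is a single line in azuracast.env (the version value here is the one we wanted; as we understood it, the docker.sh tooling reads this file and stays on the pinned release instead of pulling the latest stable image):

```shell
# azuracast.env: pin the release so updates/reinstalls
# stay on this version instead of jumping to latest stable
AZURACAST_VERSION=0.22.1
```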
Claude suggested a two-stage strategy to restore the nightly from just before the failed upgrade, and the old full backup.
First, I checked out Azuracast again at the 0.22.1 tag and let it install itself. Claude found and fixed a couple issues that were keeping it from building.
Once that was up and I had somewhere to restore the files to, I first restored the old full backup. This got me back the media files, but not the playlists, stations, or podcasts. (It would turn out that the podcasts weren’t in that backup at all, because we hadn’t started hosting them on Azuracast yet when it was taken.) That took about two hours.
We then restored the nightly over the old backup to get the station settings back. That took only a minute, and restored the current configs and database (including playlists). I had to reset my Azuracast login password (the azuracast:account:reset-password CLI command did that).
Because the database and the media library were not in sync, I had a lot of unassigned tracks in the library that I was going to need to get into proper playlists.
Claude helped me build SQL queries and a small PHP program to categorize the tracks by duration:
- under 2 minutes, which are often noisy and/or disruptive
- 2 to 20 minutes (our standard AutoDJ tracks)
- over 30 minutes, which get played on “long-play Sundays”
and to sort them into the existing playlists where they were supposed to go. The few remaining under-2-minute tracks were listened to and filed appropriately. In total this took about an hour, and the server was back in good shape.
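The cutoffs can be sketched as a small shell function. This is purely illustrative (the real sorting was done with SQL queries and a PHP program against the Azuracast database): it takes a track length in seconds and prints the bucket it belongs to.

```shell
#!/bin/sh
# Illustrative sketch of the duration-bucketing logic.
bucket() {
  secs=$1
  if [ "$secs" -lt 120 ]; then
    echo "short"      # under 2 minutes: often noisy/disruptive, review by ear
  elif [ "$secs" -le 1200 ]; then
    echo "standard"   # 2-20 minutes: normal AutoDJ rotation
  elif [ "$secs" -gt 1800 ]; then
    echo "long-play"  # over 30 minutes: long-play Sundays
  else
    echo "review"     # the 20-30 minute gap: file by hand
  fi
}

bucket 90    # short
bucket 600   # standard
bucket 2400  # long-play
```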
Day 2: Future-Proofing (~2 hours)
We discussed what we could do to stop testing in prod. Claude suggested a blue-green deployment strategy — one known-good server at all times, so we could flip from one to the other after doing testing.
We created /var/azuracast-staging as somewhere to build the second server, and configured it to use ports 8000/8443 for its web interface and the 10xxx range for its station ports.
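The staging overrides live in staging’s own azuracast.env. A sketch of what that looks like (the variable names are the ones azuracast.env ships with, but check your copy; the values are just the ones described above):

```shell
# /var/azuracast-staging/azuracast.env: run staging beside production
AZURACAST_HTTP_PORT=8000      # staging web UI (production keeps 80)
AZURACAST_HTTPS_PORT=8443     # staging web UI over TLS (production keeps 443)
AUTO_ASSIGN_PORT_MIN=10000    # station broadcast/streamer ports in the 10xxx range
AUTO_ASSIGN_PORT_MAX=10499
```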
The media storage is shared between prod and staging, with staging getting read-only access. (This is only sort of useful; it doesn’t let us move media around on the staging server, so I may switch to giving each instance its own volumes and swapping them to whichever instance is currently “production”.)
There’s now a DISASTER-RECOVERY.md, a complete disaster recovery guide covering all the scenarios, and an azuracast-upgrade-strategies.md that documents the blue-green deployment.
Lessons learned
If you’re dealing with a PI in an area where you’re not 100% a subject-matter expert, it is critical to have one available, whether human or LLM. I chose the wrong LLM: as soon as I had Claude look at the configuration and told it I wanted to be running at 0.22.1 and stay there, it told me about pinning the version in azuracast.env.
Testing in production, which is what I ended up doing with the upgrade to 0.23.1, was a bad idea. I worked with Claude to come up with a setup allowing me to run a staging Azuracast server in parallel with the production one. This lets me try things on a server that’s okay to break. It’s probably an idea to have a dev one too, but I’ll come back to that later.
Carefully integrating full backups into the upgrade process at the correct points is critical to being able to roll back as quickly as possible. (This is documented in the disaster recovery guide. The recommended backup set uses around half a terabyte of storage, but it checkpoints everything along the way.)
It’s still possible for us to be down for an hour or more, but not for the multiple days this incident took to resolve.