Synology not booting correctly after DSM 7.2.2 Update

In this post, I’ll share the key lessons learned from a lengthy troubleshooting session after upgrading from DSM 7.1.1 to DSM 7.2.2.

The Symptoms:

  • The DSM upgrade from version 7.1.1 to 7.2.2 appeared successful.
  • After the upgrade, DSM booted fine, allowing login and access to SMB shares.
  • However, after the subsequent reboot, DSM got stuck at the “System is getting ready…” message on the login screen.
  • Multiple reboots didn’t resolve the issue.

SSH was unavailable, but fortunately, the serial connection was still functional, providing the only way in.
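
(Getting in over serial is typically just a USB-to-TTL adapter on the board’s console header plus a terminal emulator; Synology consoles usually run at 115200 baud, 8N1.)

screen /dev/ttyUSB0 115200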

Initial Observations: Upon logging in, it became clear that:

  • No disks were mounted.
  • /etc/fstab was empty.
  • cat /proc/mdstat showed all drives and RAID arrays were healthy.
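
For the record, the triage behind those three observations is only a handful of commands (DSM mounts data volumes under /volume1, /volume2, and so on):

ash-4.4# cat /etc/fstab           # came back empty
ash-4.4# cat /proc/mdstat         # md arrays all present and clean
ash-4.4# mount | grep /volume     # nothing: no data volumes mounted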

The first step was to investigate what was supposed to populate /etc/fstab and why it wasn’t doing so. It turns out the job belongs to a binary, /usr/syno/bin/synocfgen, which is reachable through the symlink /usr/syno/cfgen/s00_synocheckfstab.
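
A quick ls confirms the relationship (illustrative output, metadata trimmed):

ash-4.4# ls -l /usr/syno/cfgen/s00_synocheckfstab
lrwxrwxrwx 1 root root ... /usr/syno/cfgen/s00_synocheckfstab -> /usr/syno/bin/synocfgen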

Running the binary manually (/usr/syno/cfgen/s00_synocheckfstab), followed by mount -a, successfully mounted all the volumes.
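
Condensed, the manual recovery looked like this:

ash-4.4# /usr/syno/cfgen/s00_synocheckfstab   # regenerates /etc/fstab
ash-4.4# mount -a                             # mounts every volume fstab now lists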

The Next Step: The question then shifted to determining what was responsible for executing /usr/syno/cfgen/s00_synocheckfstab. After some digging, I found the culprit:

  • A script, /usr/syno/lib/systemd/scripts/volume.sh.
  • It is invoked by syno-volume.service, defined in /usr/lib/systemd/system/syno-volume.service:

ash-4.4# cat /usr/lib/systemd/system/syno-volume.service
[Unit]
Description=Synology volume service
Wants=syno-space.target
After=syno-space.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStartSec=1800s
TimeoutStopSec=600s
ExecStartPre=-/usr/syno/lib/systemd/scripts/volume.sh --bootup-pre-start
ExecStart=/usr/syno/lib/systemd/scripts/volume.sh --bootup-start
ExecStop=/usr/syno/lib/systemd/scripts/volume.sh --stop-all

[X-Synology]

The Problem: The syno-volume.service wasn’t running! Running systemctl status syno-volume.service confirmed this.
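
For completeness, the quick checks; systemctl is-active prints inactive for a unit that never started:

ash-4.4# systemctl status syno-volume.service
ash-4.4# systemctl is-active syno-volume.service
inactive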

The next question: why wasn’t syno-volume.service running? It took quite some time to comb through various logs for messages and errors, all without result. I then turned to the dependencies of syno-volume.service:

  • All the services it depends on were present and running (syno-space and some others…).
  • Somewhat expectedly, none of the services that depend on syno-volume had started (systemctl list-dependencies --reverse syno-volume).

Manually starting the systemd services returned by the reverse list-dependencies command brought back a fully functional NAS… until the next reboot, when it got stuck at the login prompt again.
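
As a stop-gap, those reverse dependencies can be started in one go; this is essentially what I did by hand. A sketch: --plain makes list-dependencies emit a flat list instead of a tree, and tail skips the first line, which is the unit itself:

for unit in $(systemctl list-dependencies --reverse --plain syno-volume | tail -n +2); do
    systemctl start "$unit"
done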

After another round of thinking, I decided to check which systemd target the system boots into by default. It turned out not to be the Synology default at all!

On the broken system:

systemctl get-default
pkg-synobrm-keep-session.target

On a healthy system:

systemctl get-default
syno-bootup-done.target
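
Flipping it back is a single command (using the target name from the healthy box above), although, as covered next, the change did not survive a reboot:

systemctl set-default syno-bootup-done.target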

The final piece of the puzzle was to identify why the default systemd target had been changed (and, of course, manually changing it back was lost on the next reboot, so another round of reverse engineering Synology’s scripts and configs was necessary). Grepping here and there eventually led to /usr/syno/lib/systemd/generators/syno-brm-restore-generator, updated very recently and most likely shipped with the latest DSM.
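
A search along these lines will surface it (illustrative; not necessarily the exact command used back then):

# Find whatever references the odd default target; with -l, grep also
# reports matches inside binary files.
grep -rl 'pkg-synobrm' /usr/syno/lib/systemd/ 2>/dev/null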

Upon examining the syno-brm-restore-generator, it became evident that two files could trigger the logic that changes the default systemd target:

  • /var/run/.brm_restoring
  • /var/lib/abb_recovery_status

The presence of these files appears to be what flips the default target and, with it, derails the boot process.

Comparing against a healthy system that had gone through the same upgrade to DSM 7.2.2 showed that /var/lib/abb_recovery_status and /var/run/.brm_restoring were absent there. Removing the two files allowed systemd to restore its default target, and the NAS became fully functional again.
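
The fix itself, condensed:

ash-4.4# ls -l /var/run/.brm_restoring /var/lib/abb_recovery_status   # confirm they exist
ash-4.4# rm -f /var/run/.brm_restoring /var/lib/abb_recovery_status
ash-4.4# reboot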

However, the reason why these files were present in the first place, despite the system having been upgraded correctly, remains a mystery.