Sometimes it was disks reaching the end of their life, other times my own mistakes configuring services or software and having to restart the installation from scratch, and once even a real, big fire in the datacentre where the server was located.
I just got the new server, and since I still have a few months of rent left on the old one, I will use this time to set up the new server and move the services over from the old one with as little disruption as possible.
I have always used FreeBSD for servers, together with ZFS, and will continue to do so; the current server runs ZFS-on-root on a two-disk ZFS mirror.
I will be using the same configuration for the new server. However, because of a big mistake I made when upgrading FreeBSD on the old server, forgetting to update the boot code on the disks, the system initially did not boot at all after the upgrade. A scary moment, considering this is my main server, running important services like mail for my domain. By some strange luck, using a remote install I managed to boot from one of the disks that still contained my data and got the server up and running again.
After that, I didn't touch the server, afraid it might not boot again.
Since it is an old OVH server that no longer supports IPMI/KVM, I decided to move to a new one, and I have been preparing everything to migrate so I can let go of the old server.
The main thing I wanted to verify is what happens when a disk in the mirror fails: as I read in several places, the system should boot up cleanly from the other disk without any issues.
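Before pulling a disk it is worth confirming the mirror is actually healthy. A minimal sketch of such a check; the helper function is my own, and it simply looks for the message that zpool status -x prints when nothing is wrong:

```shell
#!/bin/sh
# Sketch: check overall pool health before testing disk failure.
# `zpool status -x` prints "all pools are healthy" when there is
# nothing to report; pool_healthy() reads that output on stdin so
# the logic can be exercised without a live pool.
pool_healthy() {
    grep -q "all pools are healthy"
}

# Typical use on the server (commented out here):
# zpool status -x | pool_healthy && echo "mirror OK, safe to test"
```

If the pool shows up as DEGRADED instead, resilver it first before experimenting with disconnecting disks.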
While testing this with the FreeBSD 14.2 release, the system would not boot when I "disconnected" the first disk, so another reinstall was required.
I started searching online, and after looking around in a few places I found that the FreeBSD installer had a bug: while it created the correct partitions on both disks and set up ZFS-on-root, the EFI partition on the second disk was created but never formatted, and obviously the EFI loader was never copied to it.
This is the bug in FreeBSD bugzilla:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258987
With that, the explanation for the boot failure made sense: the EFI partition on the second disk was never initialized, so of course the system couldn't boot from it.
After another reinstall, this time I checked the partition layout on both disks, and the gpart show output looked like this:
# gpart show
=>        40  3907029088  ada0  GPT  (1.8T)
          40      532480     1  efi  (260M)
      532520        2008        - free -  (1.0M)
      534528  3906494464     2  freebsd-zfs  (1.8T)
  3907028992         136        - free -  (68K)

=>        40  3907029088  ada1  GPT  (1.8T)
          40      532480     1  efi  (260M)
      532520        2008        - free -  (1.0M)
      534528  3906494464     2  freebsd-zfs  (1.8T)
  3907028992         136        - free -  (68K)
This shows the correct and expected partitions. We know that the EFI partition on the first disk is correct and contains the loader.efi needed for booting, so for the second disk we first need to format the partition as FAT32 and then copy loader.efi to it, like this:
# newfs_msdos -F 32 -c 1 /dev/ada1p1
# mount -t msdosfs /dev/ada1p1 /mnt
# mkdir -p /mnt/EFI/BOOT
# cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
# umount /mnt
After this was done, I first rebooted without changing anything else, and as expected it booted OK. Then I disconnected the first disk and rebooted again; it also booted correctly, and I could see that it had booted from the second disk, with the system running smoothly as expected.
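A quick way to confirm that the loader on an ESP really matches the one in /boot is to compare checksums. A minimal sketch, assuming the ESP in question is already mounted; the helper names are mine, and the sha256sum fallback is only there so the function also works outside FreeBSD:

```shell
#!/bin/sh
# Sketch: compare the reference loader with the copy on an ESP by checksum.
# Uses FreeBSD's sha256(1) -q output, falling back to sha256sum(1).
checksum() {
    sha256 -q "$1" 2>/dev/null || sha256sum "$1" | cut -d' ' -f1
}

loaders_match() {
    # $1: reference loader (e.g. /boot/loader.efi)
    # $2: copy on the mounted ESP (e.g. /mnt/EFI/BOOT/BOOTX64.efi)
    [ "$(checksum "$1")" = "$(checksum "$2")" ]
}

# Typical use after the copy above (commented out here):
# mount -t msdosfs /dev/ada1p1 /mnt
# loaders_match /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi && echo "in sync"
# umount /mnt
```

Running this after every FreeBSD upgrade is a cheap way to catch an out-of-date loader before the next reboot.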
While searching for solutions to this, I also found a little port/utility called "loaders-update" in ports under sysutils, which can be installed via the ports tree or via pkg install.
The utility has a "dry run" option that examines the disks and partitions and reports what it would do without changing anything; you just issue the command and get the output of what it expected to do:
# loaders-update show-me
loaders-update v1.2.1
One or more efi partition(s) have been found.
Examining ada1p1...
mount -t msdosfs /dev/ada1p1 /mnt
EFI loader /mnt/EFI/BOOT/bootx64.efi is up-to-date.
EFI loader /mnt/EFI/FREEBSD/loader.efi is up-to-date.
umount /mnt
Examining ada0p1...
mount -t msdosfs /dev/ada0p1 /mnt
There is no FreeBSD loader in ada0p1
umount /mnt
One or more freebsd-boot partition(s) have been found.
The root file system is zfs.
Examining ada1...
The pmbr on this disk is up-to-date.
The freebsd-boot partition ada1p2 is up-to-date.
Examining ada0...
The pmbr on this disk is up-to-date.
The freebsd-boot partition ada0p2 is up-to-date.
-------------------------------
Your current boot method is UEFI.
Boot device: IP4 Intel(R) I350 Gigabit Network Connection PciRoot(0x0)/Pci(0x3,0x3)/Pci(0x0,0x0)/MAC(ac1f6b44d2cc,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)
One or more target partition(s) have been found...
All loaders are up-to-date.
-------------------------------
#
As you can see, at this moment the output shows what is correct and expected, but it also shows that while the first disk has an EFI partition, that partition is empty and not populated, which means you need to copy the contents over from the other "good" EFI partition as explained above.
After the changes and checks mentioned above, the system booted correctly from either disk, whether by selecting one in the boot menu or by disconnecting one disk, in which case it booted correctly from the other.
For one last test, since the installer bug above has been fixed and, from what I could find, merged into the FreeBSD 14.3 release, I did a fresh reinstall using the 14.3 base installation. The problem no longer occurred: I could boot from either disk, and when I "disconnected" the first disk the system booted as normal from the second.
There is also a legacy fstab entry that should be removed or commented out, as in the fstab snippet below; otherwise the system will try to mount it and will get stuck on the mount if the first disk is not present:
# Device Mountpoint FStype Options Dump Pass#
# /dev/gpt/efiboot0 /boot/efi msdosfs rw 2 2
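If you would rather keep /boot/efi mounted than drop the entry, FreeBSD's mount options include failok, which in my understanding lets the boot continue even if that particular mount fails. A sketch of such an entry; the efiboot0 label is the installer's default, but verify yours with gpart show -l:

```
# Device             Mountpoint  FStype   Options    Dump  Pass#
/dev/gpt/efiboot0    /boot/efi   msdosfs  rw,failok  2     2
```

I have not verified this variant against a missing first disk myself, so test it before relying on it.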
There was one more interesting problem I found, and at this point I don't know whether it is a real problem in the sense of something missing, like the loaders or boot code, or whether it is me not doing things right.
The problem is that when I manually remove, or in ZFS terms "detach", the first disk from the pool and reboot, the system no longer boots, and I am back to reinstalling it again.
This will be an investigation for a future day...