Bug 88483

Summary:	btrfs raid on plain dmcrypt fails to boot randomly
Product:	systemd	Reporter:	Paolo <palmaway>
Component:	general	Assignee:	systemd-bugs
Status:	NEW ---	QA Contact:	systemd-bugs
Severity:	major
Priority:	medium	CC:	2bluesc, dutch109, freedesktop, liststuff, mail, palmaway, radek
Version:	unspecified
Hardware:	All
OS:	Linux (All)

Description Paolo 2015-01-16 07:01:04 UTC

I run an up-to-date Archlinux (x64). My configuration is as follows: 3 disks encrypted with plain dmcrypt, on top of which there is a btrfs RAID5 filesystem. The disks are decrypted at boot through /etc/crypttab using 3 key files, producing the devices:
/dev/mapper/crypt[a,b,c]
Then a single entry in /etc/fstab mounts the btrfs filesystem.

At boot, only one of the devices is found (systemd message "systemd[1]: Found device /dev/mapper/cryptX."). Which one is found among the three is random, and changes at every boot. I suspect this is due to the devices sharing the same label and uuid because of btrfs, as explained here:
http://lists.freedesktop.org/archives/systemd-devel/2013-June/011400.html

The configuration fails to boot, showing an "A start job is running for" message for the remaining devices, whenever the device found by systemd is not the right one. By "right one" I mean:
- if fstab names a specific device, that one. The messages for the remaining devices have no timeout, so they block the boot process indefinitely, and a reboot is necessary. The chances of booting correctly are therefore 1/3.
- if fstab uses the label or UUID shared by the devices, then /dev/mapper/cryptc (the last one decrypted by the crypttab). In this case there are 3 "start job" messages: one for the /dev/disk/by-[label,uuid] device and two for the remaining devices, as above. The chances of booting correctly are therefore lower, but the /dev/disk/by- message does have a timeout (1 min 30 s) and drops to an emergency shell after that.

I tried:
- creating a .mount file in /etc/systemd/system (which takes precedence over the fstab line) as suggested in https://bbs.archlinux.org/viewtopic.php?id=146708 . The system then always boots, but the filesystem fails to mount with the same probability as before, and it fails silently.
- adding the "device=" option for the 3 devices to the fstab mount options, and even creating a systemd service that runs after cryptsetup.target and before local-fs-pre.target to force a "btrfs device scan", as suggested here: https://unix.stackexchange.com/questions/120907/arch-does-not-mount-btrfs-array-on-boot . Unfortunately, neither of these solves the problem.

Based on the above, I think the problem (bug) is systemd only finding one of the devices. My guess is that it happens because they share the same label and uuid (see the first link).

The btrfs filesystem is used for storage, it is not system related and for various reasons it is not advisable to have it decrypted with the mkinitcpio hooks (sd-encrypt and btrfs).

Comment 1 Lennart Poettering 2015-01-28 19:24:21 UTC

What precisely do your fstab and crypttab look like?

Note that for btrfs RAID only the backing device that makes the RAID set complete is considered active by systemd. That means that systemd will only pick up the last device that is discovered. This is intended that way. For this to properly work you need to reference the btrfs file system by its UUID in fstab, so that it doesn't matter which one is the last one to be picked up.
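
A rough way to see which member devices btrfs itself currently regards as belonging to a complete set is sketched below. It assumes the `btrfs device ready` subcommand from btrfs-progs signals completeness through its exit status; the function name and the `BTRFS_CMD` override are hypothetical, added only so the loop can be tried without real disks.

```shell
#!/bin/bash
# For each device argument, report whether btrfs considers the
# multi-device filesystem it belongs to complete.
# BTRFS_CMD is a hypothetical override used for dry runs and tests.
report_btrfs_ready() {
    local dev
    for dev in "$@"; do
        # Assumption: `btrfs device ready` exits 0 only when every member
        # device has been registered with the kernel.
        if "${BTRFS_CMD:-btrfs}" device ready "$dev" >/dev/null 2>&1; then
            echo "$dev: filesystem complete"
        else
            echo "$dev: still waiting for member devices"
        fi
    done
}
```

It would be called as `report_btrfs_ready /dev/mapper/crypta /dev/mapper/cryptb /dev/mapper/cryptc`, typically as root. Note that this reports the kernel's state at the time of the call, not what was seen at the moment each device first appeared during boot.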

Comment 2 Paolo 2015-02-09 08:02:56 UTC

Hi, and thanks for your reply. However, putting the UUID in fstab doesn't work on my system, as mentioned in the original bug report above. Here is some additional information:

The relevant blkid output:
 
/dev/mapper/cryptb: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="8eb20b58-f736-404d-92dd-bb467da8a275" TYPE="btrfs"
/dev/mapper/cryptc: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="0072ff4d-25e6-44a1-9fae-8cb9b38299d9" TYPE="btrfs"
/dev/mapper/crypta: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="d0a0ff7f-76d0-4572-9e0f-db9f20ea6fa0" TYPE="btrfs"
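
The point that all three mapper devices report the same filesystem UUID and differ only in UUID_SUB can be checked mechanically. The following is a minimal sketch, assuming blkid prints one device per line in the format shown above; the function name is made up for illustration.

```shell
#!/bin/bash
# Count how many devices report each filesystem UUID.
# Input: blkid-style lines on stdin. Output: "<uuid> <count>" per UUID.
count_devices_per_uuid() {
    awk '
        # The leading space in the pattern keeps UUID_SUB="..." from matching.
        match($0, / UUID="[^"]+"/) {
            # Skip the 7-character prefix ( UUID=") and drop the closing quote.
            uuid = substr($0, RSTART + 7, RLENGTH - 8)
            count[uuid]++
        }
        END { for (u in count) print u, count[u] }
    '
}

# Sample data copied from the blkid output above.
count_devices_per_uuid <<'EOF'
/dev/mapper/cryptb: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="8eb20b58-f736-404d-92dd-bb467da8a275" TYPE="btrfs"
/dev/mapper/cryptc: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="0072ff4d-25e6-44a1-9fae-8cb9b38299d9" TYPE="btrfs"
/dev/mapper/crypta: LABEL="data" UUID="ee0726c7-f7d1-4031-8a53-32d384334196" UUID_SUB="d0a0ff7f-76d0-4572-9e0f-db9f20ea6fa0" TYPE="btrfs"
EOF
# prints: ee0726c7-f7d1-4031-8a53-32d384334196 3
```

On a live system the same function could be fed with `blkid | count_devices_per_uuid`; any UUID with a count above 1 is shared by several block devices.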

My /etc/crypttab:

crypta  /dev/sda  /root/key.crypta  cipher=aes-xts-plain64,size=512,hash=plain
cryptb  /dev/sdb  /root/key.cryptb  cipher=aes-xts-plain64,size=512,hash=plain
cryptc  /dev/sdc  /root/key.cryptc  cipher=aes-xts-plain64,size=512,hash=plain

The relevant part of my /etc/fstab:

/dev/mapper/cryptc  /data  btrfs  defaults,device=/dev/mapper/crypta,device=/dev/mapper/cryptb,device=/dev/mapper/cryptc,compress-force=lzo  0 0

Again, if I put the UUID shared by the btrfs RAID filesystem, as reported by blkid (ee0726c7-f7d1-4031-8a53-32d384334196), in the fstab, the problem is not solved. It actually gets worse, as described in the original report above! I get 3 "start job" messages: one for the /dev/disk/by-uuid device and two for the remaining /dev/mapper/cryptX devices.

Comment 3 Lennart Poettering 2015-02-11 19:42:47 UTC

(In reply to Paolo from comment #2)
> Hi, and thanks for your reply. However, putting the UUID in fstab doesn't
> work on my system, as mentioned in the original bug report above. 

Not following. Why does that not work? Can you elaborate? The comment #1 is not very clear about that.

> /dev/mapper/cryptc  /data  btrfs  defaults,device=/dev/mapper/crypta,device=/dev/mapper/cryptb,device=/dev/mapper/cryptc,compress-force=lzo  0 0
> 
> Again, if I put in the fstab the UUID shared by the btrfs raid filesystem
> indicated by blkid (ee0726c7-f7d1-4031-8a53-32d384334196) then the problem
> is not solved, it actually gets worse, as indicated in the original report
> above! I get 3 "start job" messages, one related to the /dev/disk-by-uuid
> and the two relative to the remaining devices /dev/mapper/cryptX.

Hmm, why do you get the latter three? I mean, the idea is to always use the UUID, and nothing else, not a mixture...

Comment 4 Paolo 2015-03-01 17:36:56 UTC

I changed /etc/fstab to contain the following line:

UUID=ee0726c7-f7d1-4031-8a53-32d384334196   /data   btrfs  defaults,compress-force=lzo  0 0

As you can see, I also removed the "device" mount options, in case those were causing the problem. Unfortunately, as stated in my first post, I now get these messages at boot:

- "A start job is running for" /dev/mapper/cryptb and /dev/mapper/cryptc, with no timeout;
- "A start job is running for dev-disk-by\x2duuid-...", with a timeout of 1 min 30 s.

The three messages alternate on screen, and when the timeout for the last one ends, I am able to enter the root password for maintenance.

Please don't ask me why it doesn't work, as that is exactly why I posted a bug report: there is an expected behavior, but it doesn't happen.

Can you please clarify what you mean by "the idea is to always use the UUID"? Do you mean in the crypttab as well? How can I use the UUIDs in the crypttab, since the disks share the same UUID? Can you provide an example?

Comment 5 Paolo 2015-03-01 18:08:31 UTC

I changed both the fstab (for the devices not related to the btrfs filesystem, which previously used LABEL= instead of UUID=) and the crypttab (using the UUID_SUB values from the blkid output as UUID=). Now both files reference only UUIDs.

The situation is *much* worse: now I get 7 "A start job" messages (2 for the "/dev/mapper/..." devices and 5 for "dev-disk-by\x2duuid-..."). The system is virtually unbootable: 20 reboots and none succeeded.

Comment 6 Paolo 2015-03-01 19:06:49 UTC

I have reverted to the previous configuration, and now the system sometimes boots (as explained before, when I am lucky enough that systemd finds /dev/mapper/crypta first).

When I can boot, I cannot run systemd-analyze (it reports that "Bootup is not yet finished." exactly as described in the first link of comment #1). However, if I run "systemctl list-units" I get the following as first lines:

UNIT                                LOAD     ACTIVE   SUB      JOB   DESCRIPTION
proc-sys-fs-binfmt_misc.automount   loaded   active   waiting        Arbitrary Executable File Formats File System Automount Point
dev-mapper-cryptb.device            loaded   inactive dead     start dev-mapper-cryptb.device
dev-mapper-cryptc.device            loaded   inactive dead     start dev-mapper-cryptc.device

I hope this helps. Thanks for looking into this.

Comment 7 Andre 2015-05-09 21:39:31 UTC

I'm also affected by this bug, with a similar setup running up-to-date Archlinux (x64). My configuration consists of 2 disks encrypted with plain dm-crypt with a btrfs RAID1 on top. The disks are also unlocked via a keyfile.

However, in the crypttab I use the UUIDs reported by blkid /dev/sdX, and in fstab I use /dev/mapper/DEVICE1 and also provide device=/dev/mapper/DEVICE2 in the mount options. With this configuration I need 3-5 attempts at each boot to successfully mount the RAID.

Please tell me if I can provide any further information.

Comment 8 Kyle 2015-07-04 23:00:14 UTC

I'm also affected: Arch Linux systemd 221-2 + device-mapper 2.02.122-1 + linux 4.0.7-2

I have 4 physical disks using LUKS + keyfile, and my crypttab uses the partitions' UUIDs.

The fstab file uses the btrfs RAID filesystem's UUID as explained by Lennart (although using LABEL= appears to behave the same).

The system boots and mounts the filesystem as expected; however, there are 3 pending jobs and the system remains in the "starting" state.

$ systemctl list-jobs
JOB UNIT                     TYPE  STATE  
 72 dev-mapper-crypt3.device start running
 58 dev-mapper-crypt0.device start running
 69 dev-mapper-crypt1.device start running

3 jobs listed.

The device missing from that list, crypt2, seems to be the one actually picked up for the mount by btrfs magic, as suggested:

$ systemctl status dev-mapper-crypt2.device
dev-mapper-crypt2.device - /dev/mapper/crypt2
   Follow: unit currently follows state of sys-devices-virtual-block-dm\x2d6.device
   Loaded: loaded
  Drop-In: /run/systemd/generator/dev-mapper-crypt2.device.d
           └─90-device-timeout.conf
   Active: active (plugged) since Sat 2015-07-04 14:54:08 PDT; 1h 3min ago
   Device: /sys/devices/virtual/block/dm-6

Jul 04 14:54:08 puppies systemd[1]: Found device /dev/mapper/crypt2.
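
The stuck device jobs in the list-jobs output above can be extracted with a small filter, which may help when comparing boots. This is a minimal sketch that assumes the four-column layout shown above (JOB, UNIT, TYPE, STATE); the function name is made up for illustration.

```shell
#!/bin/bash
# Print the names of .device units that still have a queued "start" job.
# Input: `systemctl list-jobs` output on stdin; header and summary
# lines are ignored because their second field is not a .device unit.
pending_device_jobs() {
    awk '$2 ~ /\.device$/ && $3 == "start" { print $2 }'
}

# Sample data copied from the list-jobs output above.
pending_device_jobs <<'EOF'
JOB UNIT                     TYPE  STATE
 72 dev-mapper-crypt3.device start running
 58 dev-mapper-crypt0.device start running
 69 dev-mapper-crypt1.device start running

3 jobs listed.
EOF
```

On a running system the input would come from `systemctl list-jobs --no-pager | pending_device_jobs`. An empty result means no device unit is still waiting to start.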

Comment 9 raneon 2015-10-20 22:09:05 UTC

I can confirm this bug as well, on my Arch Linux system with a VM that decrypts 3 disks at boot as a btrfs RAID1. See https://bugs.archlinux.org/task/42884?project=1. I'm quite unhappy, as it has gotten worse recently. I don't want to blame systemd, but so far none of the workarounds reported here has solved the issue. At the moment I need more than 15 boots to get my system up and running. I really hope that somebody will step in to solve this bug.

Comment 10 raneon 2015-10-20 22:20:35 UTC

Forgot to say that I use linux 4.2.3 and systemd 227

Comment 11 dutch109 2016-11-12 13:46:44 UTC

I am hit by this bug too (Arch Linux with Linux 4.4 & systemd 231).

None of the workarounds I found here and elsewhere worked. I tried:
* explicitly requiring all devices in /etc/fstab with device=xxx,device=yyy
* setting the filesystem to noauto,x-systemd.automount in /etc/fstab and noauto in /etc/crypttab (this did work until a recent update)
* adding btrfs to the MODULES array in /etc/mkinitcpio.conf

I finally "fixed" it by setting the filesystems to noauto in /etc/fstab (so they are NOT mounted by systemd at boot), and creating a simple service that mounts the missing partitions later.

/etc/systemd/system/late-mount.service :
[Unit]
Description=Mount directories that systemd fails to mount

[Service]
ExecStart=/etc/systemd/system/late-mount

[Install]
WantedBy=multi-user.target 

/etc/systemd/system/late-mount:
#!/bin/bash -e
mount /xxx
mount /xxx/yyy

Comment 12 liststuff 2017-02-26 06:48:33 UTC

This is a really nasty bug. I wasted hours trying to get my configuration working. I'm on Ubuntu 16.04 with all the latest updates, and this bug is still there. It'd be really appreciated if systemd devs could look into this.

For me the workaround from dutch109 worked, with some modification. systemd/late-mount.service:
[Unit]
Description=Mount encrypted multi-device Btrfs filesystems that systemd fails to mount due to https://bugs.freedesktop.org/show_bug.cgi?id=88483
Before=display-manager.service getty@tty1.service getty@tty2.service getty@tty3.service getty@tty4.service getty@tty5.service getty@tty6.service

[Service]
Type=oneshot
ExecStart=/etc/systemd/system/late-mount

[Install]
WantedBy=multi-user.target

systemd/late-mount:
#!/bin/bash -e
setfont Uni3-TerminusBold32x16.psf.gz
cryptdisks_start sda1_crypt
cryptdisks_start sdb1_crypt
mount /home
