* mount time of multi-disk arrays
@ 2014-07-07 13:38 André-Sebastian Liebe
2014-07-07 13:54 ` Konstantinos Skarlatos
0 siblings, 1 reply; 9+ messages in thread
From: André-Sebastian Liebe @ 2014-07-07 13:38 UTC (permalink / raw)
To: linux-btrfs
Hello List,
can anyone tell me how much time is acceptable and to be expected for a
multi-disk btrfs array with conventional hard disk drives to mount?

I'm having a bit of trouble with my current systemd setup, because it
can no longer mount my btrfs raid after adding the 5th drive. With the
4-drive setup it occasionally failed to mount. Now it fails every time
because the default timeout of 1m 30s is reached and the mount is
aborted.
My last 10 manual mounts took between 1m57s and 2m12s to finish.
My hardware setup consists of:
- Intel Core i7 4770
- Kernel 3.15.2-1-ARCH
- 32GB RAM
- dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
- dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
Thanks in advance
André-Sebastian Liebe
--------------------------------------------------------------------------------------------------
# btrfs fi sh
Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
Total devices 5 FS bytes used 14.21TiB
devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
devid 4 size 3.64TiB used 2.86TiB path /dev/sde
devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
Btrfs v3.14.2-dirty
# btrfs fi df /data/pool0/
Data, single: total=14.28TiB, used=14.19TiB
System, RAID1: total=8.00MiB, used=1.54MiB
Metadata, RAID1: total=26.00GiB, used=20.20GiB
unknown, single: total=512.00MiB, used=0.00

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: mount time of multi-disk arrays
  2014-07-07 13:38 mount time of multi-disk arrays André-Sebastian Liebe
@ 2014-07-07 13:54 ` Konstantinos Skarlatos
  2014-07-07 14:14   ` Austin S Hemmelgarn
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Konstantinos Skarlatos @ 2014-07-07 13:54 UTC (permalink / raw)
To: André-Sebastian Liebe, linux-btrfs

On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
> Hello List,
>
> can anyone tell me how much time is acceptable and to be expected for a
> multi-disk btrfs array with conventional hard disk drives to mount?
>
> I'm having a bit of trouble with my current systemd setup, because it
> can no longer mount my btrfs raid after adding the 5th drive. With the
> 4-drive setup it occasionally failed to mount. Now it fails every time
> because the default timeout of 1m 30s is reached and the mount is
> aborted.
> My last 10 manual mounts took between 1m57s and 2m12s to finish.
I have the exact same problem, and have to manually mount my large
multi-disk btrfs filesystems, so I would be interested in a solution as
well.
>
> My hardware setup consists of:
> - Intel Core i7 4770
> - Kernel 3.15.2-1-ARCH
> - 32GB RAM
> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>
> Thanks in advance
>
> André-Sebastian Liebe
> --------------------------------------------------------------------------------------------------
>
> # btrfs fi sh
> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
> Total devices 5 FS bytes used 14.21TiB
> devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
> devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
> devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
> devid 4 size 3.64TiB used 2.86TiB path /dev/sde
> devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
>
> Btrfs v3.14.2-dirty
>
> # btrfs fi df /data/pool0/
> Data, single: total=14.28TiB, used=14.19TiB
> System, RAID1: total=8.00MiB, used=1.54MiB
> Metadata, RAID1: total=26.00GiB, used=20.20GiB
> unknown, single: total=512.00MiB, used=0.00

-- 
Konstantinos Skarlatos

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: mount time of multi-disk arrays
  2014-07-07 13:54 ` Konstantinos Skarlatos
@ 2014-07-07 14:14   ` Austin S Hemmelgarn
  2014-07-07 16:57     ` André-Sebastian Liebe
  2014-07-07 14:24   ` André-Sebastian Liebe
  2014-07-07 15:48   ` Duncan
  2 siblings, 1 reply; 9+ messages in thread
From: Austin S Hemmelgarn @ 2014-07-07 14:14 UTC (permalink / raw)
To: Konstantinos Skarlatos, André-Sebastian Liebe, linux-btrfs

On 2014-07-07 09:54, Konstantinos Skarlatos wrote:
> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>> Hello List,
>>
>> can anyone tell me how much time is acceptable and to be expected for a
>> multi-disk btrfs array with conventional hard disk drives to mount?
>>
>> I'm having a bit of trouble with my current systemd setup, because it
>> can no longer mount my btrfs raid after adding the 5th drive. With the
>> 4-drive setup it occasionally failed to mount. Now it fails every time
>> because the default timeout of 1m 30s is reached and the mount is
>> aborted.
>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
> I have the exact same problem, and have to manually mount my large
> multi-disk btrfs filesystems, so I would be interested in a solution as
> well.
>
>> My hardware setup consists of:
>> - Intel Core i7 4770
>> - Kernel 3.15.2-1-ARCH
>> - 32GB RAM
>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
>> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>>
>> Thanks in advance
>>
>> André-Sebastian Liebe
>> --------------------------------------------------------------------------------------------------
>>
>> # btrfs fi sh
>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
>> Total devices 5 FS bytes used 14.21TiB
>> devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
>> devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
>> devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
>> devid 4 size 3.64TiB used 2.86TiB path /dev/sde
>> devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
>>
>> Btrfs v3.14.2-dirty
>>
>> # btrfs fi df /data/pool0/
>> Data, single: total=14.28TiB, used=14.19TiB
>> System, RAID1: total=8.00MiB, used=1.54MiB
>> Metadata, RAID1: total=26.00GiB, used=20.20GiB
>> unknown, single: total=512.00MiB, used=0.00
This is interesting; I actually did some profiling of the mount timings
for a bunch of different configurations of 4 (identical other than
hardware age) 1TB Seagate disks. One of the arrangements I tested was
Data using the single profile and Metadata/System using RAID1. Based on
the results I got, and what you are reporting, the mount time doesn't
scale linearly in proportion to the amount of storage space.

You might want to try the RAID10 profile for Metadata; of the
configurations I tested, the fastest used Single for Data and RAID10
for Metadata/System.

Also, based on the System chunk usage, I'm guessing that you have a LOT
of subvolumes/snapshots, and I do know that having very large (100+)
numbers of either does slow down the mount command (I don't think that
we cache subvolume information between mount invocations, so it has to
re-parse the system chunks for each individual mount).

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: mount time of multi-disk arrays
  2014-07-07 14:14 ` Austin S Hemmelgarn
@ 2014-07-07 16:57   ` André-Sebastian Liebe
  0 siblings, 0 replies; 9+ messages in thread
From: André-Sebastian Liebe @ 2014-07-07 16:57 UTC (permalink / raw)
To: Austin S Hemmelgarn, Konstantinos Skarlatos, linux-btrfs

On 07/07/2014 04:14 PM, Austin S Hemmelgarn wrote:
> On 2014-07-07 09:54, Konstantinos Skarlatos wrote:
>> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>>> Hello List,
>>>
>>> can anyone tell me how much time is acceptable and to be expected for a
>>> multi-disk btrfs array with conventional hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> can no longer mount my btrfs raid after adding the 5th drive. With the
>>> 4-drive setup it occasionally failed to mount. Now it fails every time
>>> because the default timeout of 1m 30s is reached and the mount is
>>> aborted.
>>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution as
>> well.
>>
>>> My hardware setup consists of:
>>> - Intel Core i7 4770
>>> - Kernel 3.15.2-1-ARCH
>>> - 32GB RAM
>>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
>>> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>>>
>>> Thanks in advance
>>>
>>> André-Sebastian Liebe
>>> --------------------------------------------------------------------------------------------------
>>>
>>> # btrfs fi sh
>>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
>>> Total devices 5 FS bytes used 14.21TiB
>>> devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
>>> devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
>>> devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
>>> devid 4 size 3.64TiB used 2.86TiB path /dev/sde
>>> devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
>>>
>>> Btrfs v3.14.2-dirty
>>>
>>> # btrfs fi df /data/pool0/
>>> Data, single: total=14.28TiB, used=14.19TiB
>>> System, RAID1: total=8.00MiB, used=1.54MiB
>>> Metadata, RAID1: total=26.00GiB, used=20.20GiB
>>> unknown, single: total=512.00MiB, used=0.00
> This is interesting; I actually did some profiling of the mount timings
> for a bunch of different configurations of 4 (identical other than
> hardware age) 1TB Seagate disks. One of the arrangements I tested was
> Data using the single profile and Metadata/System using RAID1. Based on
> the results I got, and what you are reporting, the mount time doesn't
> scale linearly in proportion to the amount of storage space.
>
> You might want to try the RAID10 profile for Metadata; of the
> configurations I tested, the fastest used Single for Data and RAID10
> for Metadata/System.
Switching Metadata from raid1 to raid10 reduced mount times from
roughly 120s to 38s!
>
> Also, based on the System chunk usage, I'm guessing that you have a LOT
> of subvolumes/snapshots, and I do know that having very large (100+)
> numbers of either does slow down the mount command (I don't think that
> we cache subvolume information between mount invocations, so it has to
> re-parse the system chunks for each individual mount).
No, I had to remove the one and only snapshot to recover from a 'no
space left on device' condition and regain metadata space
(http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html).

-- 
André-Sebastian Liebe

^ permalink raw reply [flat|nested] 9+ messages in thread
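
For reference, the metadata profile switch described above is normally
done with the btrfs balance convert filters. The commands below are a
hedged sketch rather than anything posted in the thread; the mount point
is the one from this setup, but the exact flags should be checked against
btrfs-balance(8) for the btrfs-progs version in use.

  # illustrative only -- convert metadata chunks from raid1 to raid10
  btrfs balance start -mconvert=raid10 /data/pool0
  # the small system chunks typically need the force flag to be converted
  btrfs balance start -sconvert=raid10 -f /data/pool0
  # confirm the new profiles afterwards
  btrfs fi df /data/pool0/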

* Re: mount time of multi-disk arrays
  2014-07-07 13:54 ` Konstantinos Skarlatos
  2014-07-07 14:14   ` Austin S Hemmelgarn
@ 2014-07-07 14:24   ` André-Sebastian Liebe
  2014-07-07 22:34     ` Konstantinos Skarlatos
  2014-07-07 15:48   ` Duncan
  2 siblings, 1 reply; 9+ messages in thread
From: André-Sebastian Liebe @ 2014-07-07 14:24 UTC (permalink / raw)
To: Konstantinos Skarlatos, linux-btrfs

On 07/07/2014 03:54 PM, Konstantinos Skarlatos wrote:
> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>> Hello List,
>>
>> can anyone tell me how much time is acceptable and to be expected for a
>> multi-disk btrfs array with conventional hard disk drives to mount?
>>
>> I'm having a bit of trouble with my current systemd setup, because it
>> can no longer mount my btrfs raid after adding the 5th drive. With the
>> 4-drive setup it occasionally failed to mount. Now it fails every time
>> because the default timeout of 1m 30s is reached and the mount is
>> aborted.
>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
> I have the exact same problem, and have to manually mount my large
> multi-disk btrfs filesystems, so I would be interested in a solution
> as well.
Hi Konstantinos, you can work around this by manually creating a
systemd mount unit:

- First, review the auto-generated systemd mount unit (systemctl show
  <your-mount-unit>.mount). You can get the unit name by running
  'systemctl' and looking for your failed mount.
- Then take the needed values (After, Before, Conflicts,
  RequiresMountsFor, Where, What, Options, Type, WantedBy) and put them
  into a new systemd mount unit file (for example under
  /usr/lib/systemd/system/<your-mount-unit>.mount).
- Now add TimeoutSec with a large enough value below [Mount].
- If you later want to automount your raid, add the WantedBy under
  [Install].
- Now issue a 'systemctl daemon-reload' and look for error messages in
  syslog.
- If there are no errors, you can enable your manual mount entry with
  'systemctl enable <your-mount-unit>.mount' and safely comment out
  your old fstab entry (so systemd no longer generates a unit for it).
-- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< -----------
[Unit]
Description=Mount /data/pool0
After=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device systemd-journald.socket local-fs-pre.target system.slice -.mount
Before=umount.target
Conflicts=umount.target
RequiresMountsFor=/data /dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb

[Mount]
Where=/data/pool0
What=/dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
Options=rw,relatime,skip_balance,compress
Type=btrfs
TimeoutSec=3min

[Install]
WantedBy=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device
-- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< -----------

>
>> My hardware setup consists of:
>> - Intel Core i7 4770
>> - Kernel 3.15.2-1-ARCH
>> - 32GB RAM
>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
>> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>>
>> Thanks in advance
>>
>> André-Sebastian Liebe
>> --------------------------------------------------------------------------------------------------
>>
>> # btrfs fi sh
>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
>> Total devices 5 FS bytes used 14.21TiB
>> devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
>> devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
>> devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
>> devid 4 size 3.64TiB used 2.86TiB path /dev/sde
>> devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
>>
>> Btrfs v3.14.2-dirty
>>
>> # btrfs fi df /data/pool0/
>> Data, single: total=14.28TiB, used=14.19TiB
>> System, RAID1: total=8.00MiB, used=1.54MiB
>> Metadata, RAID1: total=26.00GiB, used=20.20GiB
>> unknown, single: total=512.00MiB, used=0.00
>
> --
> Konstantinos Skarlatos

-- 
André-Sebastian Liebe

^ permalink raw reply [flat|nested] 9+ messages in thread
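
The activation steps from the message above, condensed into commands; the
unit file name data-pool0.mount is a hypothetical example (systemd expects
the unit name to match the escaped mount path, see systemd.mount(5)):

  systemctl daemon-reload
  systemctl enable data-pool0.mount    # hypothetical unit name for /data/pool0
  systemctl start data-pool0.mount
  systemctl status data-pool0.mount    # check for mount errors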

* Re: mount time of multi-disk arrays
  2014-07-07 14:24 ` André-Sebastian Liebe
@ 2014-07-07 22:34   ` Konstantinos Skarlatos
  0 siblings, 0 replies; 9+ messages in thread
From: Konstantinos Skarlatos @ 2014-07-07 22:34 UTC (permalink / raw)
To: André-Sebastian Liebe, linux-btrfs

On 7/7/2014 5:24 μμ, André-Sebastian Liebe wrote:
> On 07/07/2014 03:54 PM, Konstantinos Skarlatos wrote:
>> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>>> Hello List,
>>>
>>> can anyone tell me how much time is acceptable and to be expected for a
>>> multi-disk btrfs array with conventional hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> can no longer mount my btrfs raid after adding the 5th drive. With the
>>> 4-drive setup it occasionally failed to mount. Now it fails every time
>>> because the default timeout of 1m 30s is reached and the mount is
>>> aborted.
>>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution
>> as well.
> Hi Konstantinos, you can work around this by manually creating a
> systemd mount unit:
>
> - First, review the auto-generated systemd mount unit (systemctl show
>   <your-mount-unit>.mount). You can get the unit name by running
>   'systemctl' and looking for your failed mount.
> - Then take the needed values (After, Before, Conflicts,
>   RequiresMountsFor, Where, What, Options, Type, WantedBy) and put them
>   into a new systemd mount unit file (for example under
>   /usr/lib/systemd/system/<your-mount-unit>.mount).
> - Now add TimeoutSec with a large enough value below [Mount].
> - If you later want to automount your raid, add the WantedBy under
>   [Install].
> - Now issue a 'systemctl daemon-reload' and look for error messages in
>   syslog.
> - If there are no errors, you can enable your manual mount entry with
>   'systemctl enable <your-mount-unit>.mount' and safely comment out
>   your old fstab entry (so systemd no longer generates a unit for it).
>
> -- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< -----------
> [Unit]
> Description=Mount /data/pool0
> After=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device systemd-journald.socket local-fs-pre.target system.slice -.mount
> Before=umount.target
> Conflicts=umount.target
> RequiresMountsFor=/data /dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
>
> [Mount]
> Where=/data/pool0
> What=/dev/disk/by-uuid/066141c6-16ca-4a30-b55c-e606b90ad0fb
> Options=rw,relatime,skip_balance,compress
> Type=btrfs
> TimeoutSec=3min
>
> [Install]
> WantedBy=dev-disk-by\x2duuid-066141c6\x2d16ca\x2d4a30\x2db55c\x2de606b90ad0fb.device
> -- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< ----------- 8< -----------
Hi André,
This unit file works for me, thank you for creating it! Can somebody
put it on the wiki?

>>> My hardware setup consists of:
>>> - Intel Core i7 4770
>>> - Kernel 3.15.2-1-ARCH
>>> - 32GB RAM
>>> - dev 1-4 are 4TB Seagate ST4000DM000 (5900rpm)
>>> - dev 5 is a 4TB Western Digital WDC WD40EFRX (5400rpm)
>>>
>>> Thanks in advance
>>>
>>> André-Sebastian Liebe
>>> --------------------------------------------------------------------------------------------------
>>>
>>> # btrfs fi sh
>>> Label: 'apc01_pool0' uuid: 066141c6-16ca-4a30-b55c-e606b90ad0fb
>>> Total devices 5 FS bytes used 14.21TiB
>>> devid 1 size 3.64TiB used 2.86TiB path /dev/sdd
>>> devid 2 size 3.64TiB used 2.86TiB path /dev/sdc
>>> devid 3 size 3.64TiB used 2.86TiB path /dev/sdf
>>> devid 4 size 3.64TiB used 2.86TiB path /dev/sde
>>> devid 5 size 3.64TiB used 2.88TiB path /dev/sdb
>>>
>>> Btrfs v3.14.2-dirty
>>>
>>> # btrfs fi df /data/pool0/
>>> Data, single: total=14.28TiB, used=14.19TiB
>>> System, RAID1: total=8.00MiB, used=1.54MiB
>>> Metadata, RAID1: total=26.00GiB, used=20.20GiB
>>> unknown, single: total=512.00MiB, used=0.00
>> --
>> Konstantinos Skarlatos
> --
> André-Sebastian Liebe

-- 
Konstantinos Skarlatos

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: mount time of multi-disk arrays
  2014-07-07 13:54 ` Konstantinos Skarlatos
  2014-07-07 14:14   ` Austin S Hemmelgarn
  2014-07-07 14:24   ` André-Sebastian Liebe
@ 2014-07-07 15:48   ` Duncan
  2014-07-07 16:40     ` Benjamin O'Connor
  2014-07-07 22:31     ` Konstantinos Skarlatos
  2 siblings, 2 replies; 9+ messages in thread
From: Duncan @ 2014-07-07 15:48 UTC (permalink / raw)
To: linux-btrfs

Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as
excerpted:

> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>>
>> can anyone tell me how much time is acceptable and to be expected for a
>> multi-disk btrfs array with conventional hard disk drives to mount?
>>
>> I'm having a bit of trouble with my current systemd setup, because it
>> can no longer mount my btrfs raid after adding the 5th drive. With the
>> 4-drive setup it occasionally failed to mount. Now it fails every time
>> because the default timeout of 1m 30s is reached and the mount is
>> aborted.
>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>
> I have the exact same problem, and have to manually mount my large
> multi-disk btrfs filesystems, so I would be interested in a solution as
> well.

I don't have a direct answer, as my btrfs devices are all SSD, but...

a) Btrfs, like some other filesystems, is designed not to need a
pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a
quick scan at mount time. However, that isn't always as quick as it
might be, for a number of reasons:

a1) Btrfs is still a relatively immature filesystem and certain
operations are not yet optimized. In particular, multi-device btrfs
operations tend to still use a first-working-implementation type of
algorithm instead of one well optimized for parallel operation, and thus
often serialize access to multiple devices where a more optimized
algorithm would parallelize operations across multiple devices at the
same time. That will come, but it's not there yet.

a2) Certain operations such as orphan cleanup ("orphans" are files that
were deleted while they were in use and thus weren't fully deleted at the
time; if they were still in use at unmount (remount read-only), cleanup
is done at mount time) can delay mount as well.

a3) Inode_cache mount option: Don't use this unless you can explain
exactly WHY you are using it, preferably backed up with benchmark
numbers, etc. It's useful only on 32-bit, generally high-file-activity
server systems and has general-case problems, including long mount times
and possible overflow issues, that make it inappropriate for normal use.
Unfortunately there's a lot of people out there using it that shouldn't
be, and I even saw it listed on at least one distro (not mine!) wiki. =:^(

a4) The space_cache mount option OTOH *IS* appropriate for normal use
(and is in fact enabled by default these days), but particularly after
improper shutdowns it can require rebuilding at mount time -- altho
this should happen /after/ mount, the system will just be busy for some
minutes, until the space cache is rebuilt. But the IO from a space_cache
rebuild on one filesystem could slow down the mounting of filesystems
that mount after it, as well as the boot-time launching of other
post-mount launched services.

If you're seeing the time go up dramatically with the addition of more
filesystem devices, however, and you do /not/ have inode_cache active,
I'd guess it's mainly the not-yet-optimized multi-device operations.

b) As with any systemd-launched unit, however, there are systemd
configuration mechanisms for working around specific unit issues,
including timeout issues. Of course most systems continue to use fstab
and let systemd auto-generate the mount units, and in fact that is
recommended, but either with fstab or directly created mount units,
there's a timeout configuration option that can be set.

b1) The general systemd *.mount unit [Mount] section option appears to be
TimeoutSec=. As is usual with systemd times, the default is seconds, or
pass the unit(s), like "5min 20s".

b2) I don't see it /specifically/ stated, but with a bit of reading
between the lines, the corresponding fstab option appears to be either
x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the
case). You may also want to try x-systemd.device-timeout=, which /is/
specifically mentioned, altho that appears to be specifically the timeout
for the device to appear, NOT for the filesystem to mount after it does.

b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages
for more, that being what the above is based on.

So it might take a bit of experimentation to find the exact command, but
based on the above anyway, it /should/ be pretty easy to tell systemd to
wait a bit longer for that filesystem.

When you find the right invocation, please reply with it here, as I'm
sure there are others who will benefit as well. FWIW, I'm still on
reiserfs for my spinning rust (only btrfs on my ssds), but I expect I'll
switch them to btrfs at some point, so I may well use the information
myself. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 9+ messages in thread
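
As a rough illustration of the fstab route discussed above (not taken from
the thread): x-systemd.device-timeout= is the documented option and only
covers how long systemd waits for the device to appear, while the mount
duration itself is what the TimeoutSec= setting in a mount unit (shown
earlier in the thread) controls. The UUID below is the one from this
filesystem; the options and timeout value are placeholders.

  # /etc/fstab -- illustrative entry only
  UUID=066141c6-16ca-4a30-b55c-e606b90ad0fb /data/pool0 btrfs rw,relatime,compress,x-systemd.device-timeout=5min 0 0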

* Re: mount time of multi-disk arrays
  2014-07-07 15:48 ` Duncan
@ 2014-07-07 16:40   ` Benjamin O'Connor
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin O'Connor @ 2014-07-07 16:40 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs

As a point of reference, my BTRFS filesystem with 11 x 21TB devices in
RAID0 with space cache enabled takes about 4 minutes to mount after a
clean unmount. There is a decent amount of variation in the time (it
has been as low as 3 minutes, or has taken 5 minutes or longer). These
devices are all connected via 10Gb iSCSI.

Mount time seems not to have increased relative to the number of
devices (so far). I think that back when we had only 6 devices, it
still took roughly that amount of time.

-ben

-- 
-----------------------------
Benjamin O'Connor
TechOps Systems Administrator
TripAdvisor Media Group
benoc@tripadvisor.com
c. 617-312-9072
-----------------------------

Duncan wrote:
> Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as
> excerpted:
>
>> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>>> can anyone tell me how much time is acceptable and to be expected for a
>>> multi-disk btrfs array with conventional hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> can no longer mount my btrfs raid after adding the 5th drive. With the
>>> 4-drive setup it occasionally failed to mount. Now it fails every time
>>> because the default timeout of 1m 30s is reached and the mount is
>>> aborted.
>>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution as
>> well.
>
> I don't have a direct answer, as my btrfs devices are all SSD, but...
>
> a) Btrfs, like some other filesystems, is designed not to need a
> pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a
> quick scan at mount time. However, that isn't always as quick as it
> might be, for a number of reasons:
>
> a1) Btrfs is still a relatively immature filesystem and certain
> operations are not yet optimized. In particular, multi-device btrfs
> operations tend to still use a first-working-implementation type of
> algorithm instead of one well optimized for parallel operation, and thus
> often serialize access to multiple devices where a more optimized
> algorithm would parallelize operations across multiple devices at the
> same time. That will come, but it's not there yet.
>
> a2) Certain operations such as orphan cleanup ("orphans" are files that
> were deleted while they were in use and thus weren't fully deleted at the
> time; if they were still in use at unmount (remount read-only), cleanup
> is done at mount time) can delay mount as well.
>
> a3) Inode_cache mount option: Don't use this unless you can explain
> exactly WHY you are using it, preferably backed up with benchmark
> numbers, etc. It's useful only on 32-bit, generally high-file-activity
> server systems and has general-case problems, including long mount times
> and possible overflow issues, that make it inappropriate for normal use.
> Unfortunately there's a lot of people out there using it that shouldn't
> be, and I even saw it listed on at least one distro (not mine!) wiki.
> =:^(
>
> a4) The space_cache mount option OTOH *IS* appropriate for normal use
> (and is in fact enabled by default these days), but particularly after
> improper shutdowns it can require rebuilding at mount time -- altho
> this should happen /after/ mount, the system will just be busy for some
> minutes, until the space cache is rebuilt. But the IO from a space_cache
> rebuild on one filesystem could slow down the mounting of filesystems
> that mount after it, as well as the boot-time launching of other
> post-mount launched services.
>
> If you're seeing the time go up dramatically with the addition of more
> filesystem devices, however, and you do /not/ have inode_cache active,
> I'd guess it's mainly the not-yet-optimized multi-device operations.
>
> b) As with any systemd-launched unit, however, there are systemd
> configuration mechanisms for working around specific unit issues,
> including timeout issues. Of course most systems continue to use fstab
> and let systemd auto-generate the mount units, and in fact that is
> recommended, but either with fstab or directly created mount units,
> there's a timeout configuration option that can be set.
>
> b1) The general systemd *.mount unit [Mount] section option appears to be
> TimeoutSec=. As is usual with systemd times, the default is seconds, or
> pass the unit(s), like "5min 20s".
>
> b2) I don't see it /specifically/ stated, but with a bit of reading
> between the lines, the corresponding fstab option appears to be either
> x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the
> case). You may also want to try x-systemd.device-timeout=, which /is/
> specifically mentioned, altho that appears to be specifically the timeout
> for the device to appear, NOT for the filesystem to mount after it does.
>
> b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages
> for more, that being what the above is based on.
>
> So it might take a bit of experimentation to find the exact command, but
> based on the above anyway, it /should/ be pretty easy to tell systemd to
> wait a bit longer for that filesystem.
>
> When you find the right invocation, please reply with it here, as I'm
> sure there are others who will benefit as well. FWIW, I'm still on
> reiserfs for my spinning rust (only btrfs on my ssds), but I expect I'll
> switch them to btrfs at some point, so I may well use the information
> myself. =:^)

^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: mount time of multi-disk arrays
  2014-07-07 15:48 ` Duncan
  2014-07-07 16:40   ` Benjamin O'Connor
@ 2014-07-07 22:31   ` Konstantinos Skarlatos
  1 sibling, 0 replies; 9+ messages in thread
From: Konstantinos Skarlatos @ 2014-07-07 22:31 UTC (permalink / raw)
To: Duncan, linux-btrfs

On 7/7/2014 6:48 μμ, Duncan wrote:
> Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as
> excerpted:
>
>> On 7/7/2014 4:38 μμ, André-Sebastian Liebe wrote:
>>> can anyone tell me how much time is acceptable and to be expected for a
>>> multi-disk btrfs array with conventional hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> can no longer mount my btrfs raid after adding the 5th drive. With the
>>> 4-drive setup it occasionally failed to mount. Now it fails every time
>>> because the default timeout of 1m 30s is reached and the mount is
>>> aborted.
>>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution as
>> well.
> I don't have a direct answer, as my btrfs devices are all SSD, but...
>
> a) Btrfs, like some other filesystems, is designed not to need a
> pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a
> quick scan at mount time. However, that isn't always as quick as it
> might be, for a number of reasons:
>
> a1) Btrfs is still a relatively immature filesystem and certain
> operations are not yet optimized. In particular, multi-device btrfs
> operations tend to still use a first-working-implementation type of
> algorithm instead of one well optimized for parallel operation, and thus
> often serialize access to multiple devices where a more optimized
> algorithm would parallelize operations across multiple devices at the
> same time. That will come, but it's not there yet.
>
> a2) Certain operations such as orphan cleanup ("orphans" are files that
> were deleted while they were in use and thus weren't fully deleted at the
> time; if they were still in use at unmount (remount read-only), cleanup
> is done at mount time) can delay mount as well.
>
> a3) Inode_cache mount option: Don't use this unless you can explain
> exactly WHY you are using it, preferably backed up with benchmark
> numbers, etc. It's useful only on 32-bit, generally high-file-activity
> server systems and has general-case problems, including long mount times
> and possible overflow issues, that make it inappropriate for normal use.
> Unfortunately there's a lot of people out there using it that shouldn't
> be, and I even saw it listed on at least one distro (not mine!) wiki. =:^(
>
> a4) The space_cache mount option OTOH *IS* appropriate for normal use
> (and is in fact enabled by default these days), but particularly after
> improper shutdowns it can require rebuilding at mount time -- altho
> this should happen /after/ mount, the system will just be busy for some
> minutes, until the space cache is rebuilt. But the IO from a space_cache
> rebuild on one filesystem could slow down the mounting of filesystems
> that mount after it, as well as the boot-time launching of other
> post-mount launched services.
>
> If you're seeing the time go up dramatically with the addition of more
> filesystem devices, however, and you do /not/ have inode_cache active,
> I'd guess it's mainly the not-yet-optimized multi-device operations.
>
> b) As with any systemd-launched unit, however, there are systemd
> configuration mechanisms for working around specific unit issues,
> including timeout issues. Of course most systems continue to use fstab
> and let systemd auto-generate the mount units, and in fact that is
> recommended, but either with fstab or directly created mount units,
> there's a timeout configuration option that can be set.
>
> b1) The general systemd *.mount unit [Mount] section option appears to be
> TimeoutSec=. As is usual with systemd times, the default is seconds, or
> pass the unit(s), like "5min 20s".
>
> b2) I don't see it /specifically/ stated, but with a bit of reading
> between the lines, the corresponding fstab option appears to be either
> x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the
> case). You may also want to try x-systemd.device-timeout=, which /is/
> specifically mentioned, altho that appears to be specifically the timeout
> for the device to appear, NOT for the filesystem to mount after it does.
>
> b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages
> for more, that being what the above is based on.
Thanks for your detailed answer. A mount unit with a larger timeout
works fine; maybe we should tell distro maintainers to raise the limit
for btrfs to 5 minutes or so?

In my experience, mount time definitely grows as the filesystem grows
older, and mounting times out once the snapshot count gets above
500-1000. I guess that's something that can be optimized in the future,
but I believe stability is a much more urgent need now...

> So it might take a bit of experimentation to find the exact command, but
> based on the above anyway, it /should/ be pretty easy to tell systemd to
> wait a bit longer for that filesystem.
>
> When you find the right invocation, please reply with it here, as I'm
> sure there are others who will benefit as well. FWIW, I'm still on
> reiserfs for my spinning rust (only btrfs on my ssds), but I expect I'll
> switch them to btrfs at some point, so I may well use the information
> myself. =:^)

-- 
Konstantinos Skarlatos

^ permalink raw reply [flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-07-07 22:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-07 13:38 mount time of multi-disk arrays André-Sebastian Liebe
2014-07-07 13:54 ` Konstantinos Skarlatos
2014-07-07 14:14   ` Austin S Hemmelgarn
2014-07-07 16:57     ` André-Sebastian Liebe
2014-07-07 14:24   ` André-Sebastian Liebe
2014-07-07 22:34     ` Konstantinos Skarlatos
2014-07-07 15:48   ` Duncan
2014-07-07 16:40     ` Benjamin O'Connor
2014-07-07 22:31     ` Konstantinos Skarlatos