* btrfs stuck with lot's of files
@ 2014-12-01 11:46 Peter Volkov
2014-12-01 18:47 ` Robert White
2014-12-02 1:33 ` Qu Wenruo
0 siblings, 2 replies; 10+ messages in thread
From: Peter Volkov @ 2014-12-01 11:46 UTC
To: linux-btrfs@vger.kernel.org
Hi, guys.
We have a problem with a btrfs file system: sometimes it becomes stuck
without leaving me any way to interrupt it (shutdown -r now is unable to
restart the server). By stuck I mean that processes which previously were
able to write to disk can no longer cope with the load, and the load
average goes up:
top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61, 149.29
Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem: 65922104 total, 65414856 used, 507248 free, 1844 buffers
KiB Swap: 0 total, 0 used, 0 free. 62570804 cached Mem
  PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
 8644 root 20  0       0      0    0 R 96.5  0.0 127:21.95 kworker/u16:16
 5047 dvr  20  0 6884292 122668 4132 S  6.4  0.2 258:59.49 dvrserver
30223 root 20  0   20140   2600 2132 R  6.4  0.0   0:00.01 top
    1 root 20  0    4276   1628 1524 S  0.0  0.0   0:40.19 init
There are about 300 threads on the server, some of which write to disk.
Some information about this btrfs filesystem: it is a 22-disk file
system with raid1 for metadata and raid0 for data:
# btrfs filesystem df /store/
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB
# btrfs property get /store/
ro=false
label=store
# btrfs device stats /store/
(shows all zeros)
# btrfs balance status /store/
No balance found on '/store/'
# btrfs filesystem show /store/
Btrfs v3.17.1
(btw, is it supposed to show only the version here?)
As for the load, we write quite small files (some of 313K, some of
800K), which is why metadata takes that much space. So, back to the problem.
iostat 1 shows the following:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.96    0.00   17.09   65.95    0.00    0.00
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sdf               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdi               1.00         0.00       200.00          0        200
sdl               0.00         0.00         0.00          0          0
sdn              48.00         0.00     17260.00          0      17260
sdm               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdr               0.00         0.00         0.00          0          0
sds               0.00         0.00         0.00          0          0
sdt               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdu               0.00         0.00         0.00          0          0
All writes go to one disk. I tried to debug what is going on in the
kworker with:
$ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
$ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
trace_pipe2.out.xz is attached. Could you comment on what goes wrong
here?
The server has 64 GB of RAM. Is it possible that it is unable to keep all
metadata in memory? Can we increase this memory limit, if one exists?
Thanks in advance for any pointers,
--
Peter.
* Re: btrfs stuck with lot's of files
From: Robert White @ 2014-12-01 18:47 UTC
To: Peter Volkov, linux-btrfs@vger.kernel.org
On 12/01/2014 03:46 AM, Peter Volkov wrote:
> Hi, guys.
> (stuff about getting hung up trying to write to one drive)
That drive (/dev/sdn) is probably starting to fail. Some older drives
basically go unresponsive when they start to go bad. Particularly if
they've gone bad enough to have run out of spare tracks/sectors.
Sometimes they will just refuse to answer. Sometimes they will go into
"try again" mode, and the same activity will be retried indefinitely.
This will then fill up your write queues and jam up all sorts of subsystems.
Step 1: Backup your data. Since you didn't RAID your data at all, when
that drive dies your data is going to go away in fascinating and
unpredictable ways. (RAID1 metadata with no RAID1 or RAID5 of the data
means you have essentially no media failure protection.)
Step 2: Turn on SMART (if you can, and you probably can) and check whether
the drive is in its final moments of life. If your disk is all green lights
according to SMART, you may be able to un-jam it by just doing a
balance, as described and explained after the next time I quote you.
Step 3: Switch your data mode to RAID5. It will cost you about half of
your currently free data space, but it won't leave you _as_ _vulnerable_
to complete data loss as you are now. SMART might be wrong about your
drive being fine if it says it is.
> # btrfs filesystem df /store/
> Data, single: total=11.92TiB, used=10.86TiB
Regardless of the above...
You have a terabyte of unused but allocated data storage. You probably
need to balance your system to un-jam that. That's a lot of space that
is unavailable to the metadata (etc).
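The "terabyte of unused but allocated" figure can be read off the btrfs filesystem df output quoted in this thread. A minimal Python sketch, assuming the exact line format shown there; the helper names parse_df and slack_bytes are made up for illustration and are not part of btrfs-progs:

```python
import re

# Binary unit multipliers as printed by `btrfs filesystem df`.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Matches rows such as "Data, single: total=11.92TiB, used=10.86TiB".
LINE_RE = re.compile(
    r"^(\w+), (\w+): total=([\d.]+)((?:[KMGT]i)?B), "
    r"used=([\d.]+)((?:[KMGT]i)?B)$"
)


def parse_df(text):
    """Return a list of (type, profile, total_bytes, used_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a df row
        kind, profile, total, tunit, used, uunit = m.groups()
        rows.append((kind, profile,
                     float(total) * UNITS[tunit],
                     float(used) * UNITS[uunit]))
    return rows


def slack_bytes(rows, kind, profile):
    """Allocated-but-unused bytes for one block group type and profile."""
    for k, p, total, used in rows:
        if (k, p) == (kind, profile):
            return total - used
    raise KeyError((kind, profile))


# Output copied from the original report in this thread.
DF_OUTPUT = """\
Data, single: total=11.92TiB, used=10.86TiB
System, RAID1: total=8.00MiB, used=1.27MiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=46.00GiB, used=33.49GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=128.00KiB
"""

rows = parse_df(DF_OUTPUT)
data_slack = slack_bytes(rows, "Data", "single") / UNITS["TiB"]
meta_slack = slack_bytes(rows, "Metadata", "RAID1") / UNITS["GiB"]
print(f"data slack:     {data_slack:.2f} TiB")
print(f"metadata slack: {meta_slack:.2f} GiB")
```

On the numbers above this reports about 1.06 TiB of allocated-but-unused data space and about 12.51 GiB of metadata slack. Whether that slack is actually the cause of the stall cannot be decided from these figures alone.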
ASIDE: Having your metadata set to RAID1 (as opposed to the default of
DUP) seems a little iffy since your data is still set to DUP. This
configuration is not going to leave you with a mountable filesystem if
you lose a disk. I'm not sure if the RAID1 layout is going to want to
put specific datum in specific places, but it might, which if it does
might leave you in an irreconcilable position.
Either way, you will probably un-jam your system in the short run by
doing a balance. A full balance (no filter args at all) would be your
best bet.
FURTHER ASIDE: raid1 metadata and raid5 data might be good for you. Given
22 volumes and 10% empty space, it would only cost you half of your
existing empty space. If you don't RAID your data, there is no real
point to putting your metadata in RAID.
[Yes, I said my basic points about your current layout two different
ways and times. You are either "just a little over-committed on space"
or you are "about to lose all your data" and it's impossible to tell
which is the case from here.]
Backup your data. NOW!
* Re: btrfs stuck with lot's of files
From: Qu Wenruo @ 2014-12-02 1:33 UTC
To: Peter Volkov, linux-btrfs@vger.kernel.org
-------- Original Message --------
Subject: btrfs stuck with lot's of files
From: Peter Volkov <pva@gentoo.org>
To: linux-btrfs@vger.kernel.org <linux-btrfs@vger.kernel.org>
Date: 2014-12-01 19:46
> Hi, guys.
>
> We have a problem with btrfs file system: sometimes it became stuck
> without leaving me any way to interrupt it (shutdown -r now is unable to
> restart server). By stuck I mean some processes that previously were
> able to write on disk are unable to cope with load and load average goes
> up:
>
> top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61,
> 149.29
> Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si,
> 0.0 st
> KiB Mem: 65922104 total, 65414856 used, 507248 free, 1844 buffers
> KiB Swap: 0 total, 0 used, 0 free. 62570804 cached
> Mem
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
> COMMAND
> 8644 root 20 0 0 0 0 R 96.5 0.0 127:21.95
> kworker/u16:16
> 5047 dvr 20 0 6884292 122668 4132 S 6.4 0.2 258:59.49
> dvrserver
> 30223 root 20 0 20140 2600 2132 R 6.4 0.0 0:00.01
> top
> 1 root 20 0 4276 1628 1524 S 0.0 0.0 0:40.19
> init
>
>
>
> There are about 300 treads on server, some of which are writing on disk.
> A bit information about this btrfs filesystem: this is 22 disk file
> system with raid1 for metadata and raid0 for data:
>
> # btrfs filesystem df /store/
> Data, single: total=11.92TiB, used=10.86TiB
> System, RAID1: total=8.00MiB, used=1.27MiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=46.00GiB, used=33.49GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=128.00KiB
> # btrfs property get /store/
> ro=false
> label=store
> # btrfs device stats /store/
> (shows all zeros)
> # btrfs balance status /store/
> No balance found on '/store/'
> # btrfs filesystem show /store/
> Btrfs v3.17.1
> (btw, is it supposed to have only version here?)
This is a small bug: if the path given to 'btrfs fi show' has a trailing
'/', it is not recognized.
A patch has already been sent and may be included in the next version.
>
> As for load we write quite small files of size (some of 313K, some of
> 800K), that's why metadata takes that much. So back to the problem.
> iostat 1 exposes following problem:
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 16.96 0.00 17.09 65.95 0.00 0.00
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 0.00 0.00 0.00 0 0
> sdc 0.00 0.00 0.00 0 0
> sdb 0.00 0.00 0.00 0 0
> sde 0.00 0.00 0.00 0 0
> sdd 0.00 0.00 0.00 0 0
> sdf 0.00 0.00 0.00 0 0
> sdg 0.00 0.00 0.00 0 0
> sdj 0.00 0.00 0.00 0 0
> sdh 0.00 0.00 0.00 0 0
> sdk 0.00 0.00 0.00 0 0
> sdi 1.00 0.00 200.00 0 200
> sdl 0.00 0.00 0.00 0 0
> sdn 48.00 0.00 17260.00 0 17260
> sdm 0.00 0.00 0.00 0 0
> sdp 0.00 0.00 0.00 0 0
> sdo 0.00 0.00 0.00 0 0
> sdq 0.00 0.00 0.00 0 0
> sdr 0.00 0.00 0.00 0 0
> sds 0.00 0.00 0.00 0 0
> sdt 0.00 0.00 0.00 0 0
> sdv 0.00 0.00 0.00 0 0
> sdw 0.00 0.00 0.00 0 0
> sdu 0.00 0.00 0.00 0 0
>
>
> write goes to one disk. I've tried to debug what's going in kworker and
> did
>
> $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
> $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
>
> trace_pipe2.out.xz in attachment. Could you comment, what goes wrong
> here?
It seems the attachment was blocked by the mailing list, so I didn't see
it.
>
> Server has 64Gb of RAM. Is it possible that it is unable to keep all
> metadata in memory, can we encrease this memory limit, if exists?
Not possible, it will never happen (if nothing goes wrong....).
The kernel has a page cache mechanism: when memory runs short,
some cached metadata/data can be written back to disk (if dirty) and
dropped to free space, then re-read from disk if needed later.
So the kernel doesn't need to load all the metadata/data into memory, and
that's mostly impossible for a large fs anyway.
One important piece of information is missing: the kernel version.
All I can see is the btrfs-progs version, which doesn't really
help with this kind of kernel hang.
Thanks,
Qu
>
>
> Thanks in advance for any pointers,
> --
> Peter.
>
* Re: btrfs stuck with lot's of files
From: Peter Volkov @ 2014-12-02 1:50 UTC
To: Robert White; +Cc: linux-btrfs@vger.kernel.org
On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
> On 12/01/2014 03:46 AM, Peter Volkov wrote:
> > (stuff about getting hung up trying to write to one drive)
>
> That drive (/dev/sdn) is probably starting to fail.
> (about failed drive)
Thank you, Robert, for the answer. It is unlikely that the drive is failing
here. A similar condition (writes going to a single drive) happens with
other drives, i.e. this write pattern may occur with any drive.
After watching what happens for longer, I see the following. During the
stall a single processor core is 100% busy in kernel space (some kworker
is taking 100% CPU). Ftrace reveals that
btrfs_async_reclaim_metadata_space is the most frequently called function.
So it looks like btrfs is doing some operation on metadata, and until
it finishes everything is stuck (practically no writes reach the
disk). So I'm looking for suggestions on how to cope with this.
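For anyone wanting to repeat the "most frequently called function" tally on their own capture, here is a minimal Python sketch. It assumes each workqueue_queue_work event line carries a function=<name> field, as the kernel's workqueue trace event prints; the sample lines below are invented for illustration and are NOT taken from the attached trace, and count_queued_functions is a hypothetical helper name:

```python
import re
from collections import Counter

# Each workqueue_queue_work event is assumed to contain "function=<symbol>".
FUNC_RE = re.compile(r"\bfunction=(\S+)")


def count_queued_functions(lines):
    """Count how often each work function appears in trace_pipe output."""
    counts = Counter()
    for line in lines:
        m = FUNC_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up sample lines in the general shape of trace_pipe output.
SAMPLE_TRACE = """\
kworker/u16:16-8644 [003] d... 100.000001: workqueue_queue_work: work struct=ffff880000000010 function=btrfs_async_reclaim_metadata_space workqueue=ffff880000000100 req_cpu=16 cpu=4294967295
kworker/u16:16-8644 [003] d... 100.000002: workqueue_queue_work: work struct=ffff880000000020 function=vmstat_update workqueue=ffff880000000200 req_cpu=3 cpu=3
kworker/u16:16-8644 [004] d... 100.000003: workqueue_queue_work: work struct=ffff880000000010 function=btrfs_async_reclaim_metadata_space workqueue=ffff880000000100 req_cpu=16 cpu=4294967295
kworker/u16:16-8644 [004] d... 100.000004: workqueue_queue_work: work struct=ffff880000000030 function=flush_to_ldisc workqueue=ffff880000000300 req_cpu=16 cpu=4294967295
kworker/u16:16-8644 [005] d... 100.000005: workqueue_queue_work: work struct=ffff880000000010 function=btrfs_async_reclaim_metadata_space workqueue=ffff880000000100 req_cpu=16 cpu=4294967295
"""

counts = count_queued_functions(SAMPLE_TRACE.splitlines())
for name, n in counts.most_common():
    print(f"{n:6d}  {name}")
```

To run it on a real capture, pass an open file instead of the sample, e.g. count_queued_functions(open("trace_pipe.out2")). A high count only shows which work item is queued most often; it does not by itself explain why the reclaim never completes.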
> > # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
>
> Reguardless of the above...
>
> You have a terabyte of unused but allocated data storage. You probably
> need to balance your system to un-jamb that. That's a lot of space that
> is unavailable to the metadata (etc).
Well, I'm afraid that a balance will leave the fs "stuck" for even longer.
> ASIDE: Having your metadata set to RAID1 (as opposed to the default of
> DUP) seems a little iffy since your data is still set to DUP.
That's true. But why is the data duplicated? During btrfs volume creation
I explicitly set -d data single.
> FUTHER ASIDE: raid1 metadata and raid5 data might be good for you given
> 22 volumes and 10% empty empty space it would only cost you half of your
> existing empty space. If you don't RAID your data, there is no real
> point to putting your metadata in RAID.
Is raid5 ready for use? According to the post [1] mentioned on [2], there
is still some way to go before it is stable.
[1] http://marc.merlins.org/perso/btrfs/post_2014-03-23_Btrfs-Raid5-Status.html
[2] https://btrfs.wiki.kernel.org/index.php/RAID56
--
Peter.
* Re: btrfs stuck with lot's of files
From: Peter Volkov @ 2014-12-02 2:00 UTC
To: Qu Wenruo; +Cc: linux-btrfs@vger.kernel.org
On Tue, 02/12/2014 at 09:33 +0800, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: btrfs stuck with lot's of files
> From: Peter Volkov <pva@gentoo.org>
> To: linux-btrfs@vger.kernel.org <linux-btrfs@vger.kernel.org>
> Date: 2014-12-01 19:46
> > Hi, guys.
> >
> > We have a problem with btrfs file system: sometimes it became stuck
> > without leaving me any way to interrupt it (shutdown -r now is unable to
> > restart server). By stuck I mean some processes that previously were
> > able to write on disk are unable to cope with load and load average goes
> > up:
> >
> > top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61,
> > 149.29
> > Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si,
> > 0.0 st
> > KiB Mem: 65922104 total, 65414856 used, 507248 free, 1844 buffers
> > KiB Swap: 0 total, 0 used, 0 free. 62570804 cached
> > Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
> > COMMAND
> > 8644 root 20 0 0 0 0 R 96.5 0.0 127:21.95
> > kworker/u16:16
> > 5047 dvr 20 0 6884292 122668 4132 S 6.4 0.2 258:59.49
> > dvrserver
> > 30223 root 20 0 20140 2600 2132 R 6.4 0.0 0:00.01
> > top
> > 1 root 20 0 4276 1628 1524 S 0.0 0.0 0:40.19
> > init
> >
> >
> >
> > There are about 300 treads on server, some of which are writing on disk.
> > A bit information about this btrfs filesystem: this is 22 disk file
> > system with raid1 for metadata and raid0 for data:
> >
> > # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> > System, RAID1: total=8.00MiB, used=1.27MiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
> > # btrfs property get /store/
> > ro=false
> > label=store
> > # btrfs device stats /store/
> > (shows all zeros)
> > # btrfs balance status /store/
> > No balance found on '/store/'
> > # btrfs filesystem show /store/
> > Btrfs v3.17.1
> > (btw, is it supposed to have only version here?)
> This is a small bug that if there is appending '/' in the path for
> 'btrfs fi show', it can't recognize it....
> Patch is already sent and maybe included next version.
> >
> > As for load we write quite small files of size (some of 313K, some of
> > 800K), that's why metadata takes that much. So back to the problem.
> > iostat 1 exposes following problem:
> >
> > avg-cpu: %user %nice %system %iowait %steal %idle
> > 16.96 0.00 17.09 65.95 0.00 0.00
> >
> > Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> > sda 0.00 0.00 0.00 0 0
> > sdc 0.00 0.00 0.00 0 0
> > sdb 0.00 0.00 0.00 0 0
> > sde 0.00 0.00 0.00 0 0
> > sdd 0.00 0.00 0.00 0 0
> > sdf 0.00 0.00 0.00 0 0
> > sdg 0.00 0.00 0.00 0 0
> > sdj 0.00 0.00 0.00 0 0
> > sdh 0.00 0.00 0.00 0 0
> > sdk 0.00 0.00 0.00 0 0
> > sdi 1.00 0.00 200.00 0 200
> > sdl 0.00 0.00 0.00 0 0
> > sdn 48.00 0.00 17260.00 0 17260
> > sdm 0.00 0.00 0.00 0 0
> > sdp 0.00 0.00 0.00 0 0
> > sdo 0.00 0.00 0.00 0 0
> > sdq 0.00 0.00 0.00 0 0
> > sdr 0.00 0.00 0.00 0 0
> > sds 0.00 0.00 0.00 0 0
> > sdt 0.00 0.00 0.00 0 0
> > sdv 0.00 0.00 0.00 0 0
> > sdw 0.00 0.00 0.00 0 0
> > sdu 0.00 0.00 0.00 0 0
> >
> >
> > write goes to one disk. I've tried to debug what's going in kworker and
> > did
> >
> > $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
> > $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
> >
> > trace_pipe2.out.xz in attachment. Could you comment, what goes wrong
> > here?
> It seems that attachment is blocked by mail-list so I didn't see the
> attachment.
I've put it here:
https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing
I've also put some additional information in another message that I just
sent to the mailing list.
> > Server has 64Gb of RAM. Is it possible that it is unable to keep all
> > metadata in memory, can we encrease this memory limit, if exists?
> Not possible, it will never happen (if nothing goes wrong....).
> Kernel has the outstanding page cache mechanism, when memory comes short,
> some cached metadata/data can be flushed back(if dirty) to disk to free
> space.
> And re-read from disk if needed later.
>
> So kernel don't need to load all the metadata/data into memory, and
> that's mostly impossible for large fs.
Thanks for this explanation! I'm still looking for suggestions on how to
cope with btrfs_async_reclaim_metadata_space, which appears most
frequently in the kworker trace.
> And one missing important informantion: kernel version.
This is kernel 3.16.7-gentoo.
--
Peter.
* Re: btrfs stuck with lot's of files
From: Duncan @ 2014-12-02 12:48 UTC
To: linux-btrfs
Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:
> On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
>> On 12/01/2014 03:46 AM, Peter Volkov wrote:
>> > (stuff about getting hung up trying to write to one drive)
>>
>> That drive (/dev/sdn) is probably starting to fail.
>> (about failed drive)
>
> Thank you Robert for the answer. It is not likely that drive fails here.
> Similar condition (write to a single drive) happens with other drives
> i.e. such write pattern may happen with any drive.
>
> After looking at what happens longer I see the following. During stuck
> single processor core is busy 100% of CPU in kernel space (some kworker
> is taking 100% CPU).
FWIW, agreed that it's unlikely to be the drive, especially if you're not
seeing bus resets or drive errors in dmesg and smart says the drive is
fine, as I expect it does/will. It may be a btrfs bug or scaling issue,
of which btrfs still has some, or it could simply be the single mode vs
raid0 mode issue I explain below.
>> > # btrfs filesystem df /store/
>> > Data, single: total=11.92TiB, used=10.86TiB
>>
>> Reguardless of the above...
>>
>> You have a terabyte of unused but allocated data storage. You probably
>> need to balance your system to un-jamb that. That's a lot of space that
>> is unavailable to the metadata (etc).
>
> Well, I'm afraid that balance will put fs into even longer "stuck".
>
>> ASIDE: Having your metadata set to RAID1 (as opposed to the default of
>> DUP) seems a little iffy since your data is still set to DUP.
>
> That's true. But why data is duplicated? During btrfs volume creation
> I've set explicitly -d data single.
I believe Robert mis-wrote (thinko). The btrfs filesystem df clearly
shows that your data is in single mode, the data default mode, not dup
mode, which is normally only available to metadata (not data) on a single-
device filesystem, where it is the metadata default.
However, in the original post you /did/ say raid1 for metadata, raid0 for
data, and the above btrfs filesystem df again clearly says single, not
raid0.
Which is very likely to be your problem. In single mode, btrfs will
create chunks one at a time, picking the device with the most free space
to allocate it on. The normal data chunk size is 1 GiB. Because of the
most-free-space allocation rule, with N devices (22 in your case) of the
same size, after N (22) data chunks are allocated you'll tend to have one
such chunk on each device.
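The most-free-space rule can be illustrated with a toy simulation. This is a rough sketch of the rule as described here, not the kernel's allocator; the 931 GiB device size and the lowest-index tie-break are assumptions made only for the example:

```python
def allocate_single_chunks(free_gib, n_chunks, chunk_gib=1):
    """Place n_chunks one at a time on the device with the most free space.

    free_gib: free space per device, in GiB.
    Returns the list of device indexes chosen, in allocation order.
    Ties are broken by lowest device index (an assumption of this sketch).
    """
    free = list(free_gib)
    placement = []
    for _ in range(n_chunks):
        dev = max(range(len(free)), key=lambda i: free[i])
        if free[dev] < chunk_gib:
            raise ValueError("no device has room for another chunk")
        free[dev] -= chunk_gib
        placement.append(dev)
    return placement


# 22 equally sized, empty devices (931 GiB each is a hypothetical size).
devices = [931] * 22
placement = allocate_single_chunks(devices, 22)
print(placement)
```

With equal devices the first 22 chunks land on devices 0 through 21 in turn, one chunk per device, and at any moment only the newest chunk (on a single device) is the one being filled.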
Each of these 1 GiB chunks (along with space freed up by normal delete
activity in other allocated data chunks) will be filled before another is
allocated.
Which will mean you're writing a GiB worth of data to one device before
you switch to the next one. With your mostly sub-MiB file write pattern,
that's probably 1500-2000 files written to a chunk on that single device,
before another chunk is allocated on the next device.
Thus all your activity on that single device!
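The 1500-2000 figure can be sanity-checked with quick arithmetic. A small Python sketch, assuming the reported "313K" and "800K" mean KiB and that a chunk holds nothing but file data (both simplifications):

```python
KIB = 1024
GIB = 1024 ** 3


def files_per_chunk(file_kib, chunk_bytes=GIB):
    """Whole files of the given size that fit in one data chunk."""
    return int(chunk_bytes / (file_kib * KIB))


small = files_per_chunk(313)                # only 313 KiB files
large = files_per_chunk(800)                # only 800 KiB files
mixed = files_per_chunk((313 + 800) / 2)    # even mix of both sizes

print(small, large, mixed)
```

This gives roughly 3350 files per chunk for 313 KiB files, 1310 for 800 KiB files, and about 1884 for an even mix, so the estimate above is in the right range for a mixed workload.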
In raid0 mode, by contrast, the same 1 GiB chunks will be allocated on
each device, but a stripe of chunks will be allocated across all devices
(22 in your case) at the same time, and data being written is broken up
into much smaller per-device strips. I'm not sure what the actual
per-device strip size is in raid0 mode, but it's *WELL* under a GiB and I
believe in the KiB not MiB range. It might be 128 KiB, the compression
block size when the compress mount option is used.
Obviously were you using raid0 data, you'd see the load spread out at
least somewhat better. But the df says it's single, not raid0.
To get raid0 mode you can use a balance with filters (see the wiki or
recent btrfs-balance manpage), or blow away the existing filesystem and
create a new one, setting --data raid0 when you mkfs.btrfs, and restore
from backups (which you're already prepared to do if you value your data
in any case[1]).
That missing btrfs filesystem show, due to the terminating / in /store/
(simply /store should work) is somewhat frustrating here, as it'd show
per-device sizes and utilization. Assuming near same-sized devices, with
11 TiB of data being far greater than the 1 GiB data chunk size times 22
devices I'd guess you're pretty evened out, utilization-wise, but the
output from both show and df is necessary to get the full story.
>> FUTHER ASIDE: raid1 metadata and raid5 data might be good for you given
>> 22 volumes and 10% empty empty space it would only cost you half of
>> your existing empty space. If you don't RAID your data, there is no
>> real point to putting your metadata in RAID.
>
> Is raid5 ready for use? As I read post[1] mentioned on[2] it is still
> some way to make it stable.
You are absolutely correct. I'd strongly recommend staying AWAY from
btrfs raid5/6 modes at this time. While Robert is becoming an active
regular and has the technical background to point out some things others
miss, he's still reasonably new to this list and may not have been aware
of the incomplete status of raid5/6 modes at this time.
Effectively btrfs raid56 (called raid56, no slash, in btrfs lingo,
because it's the same code that handles both) at this time can be
considered a slower raid0, with parity strips that are written but not
able to be used for full recovery at this point, that will "magically" be
upgraded to raid56 when the btrfs raid56 recovery code is complete.
Operationally it works fine, and the parity strips are indeed written.
It's the scrub and recovery code that's not yet complete. Which means
consider it a raid0 in terms of recovery, a total loss if a single device
is lost, and have your backups and/or willingness to simply say bye to
the data if a device is lost prepared accordingly, and you won't be
caught unprepared.
Which since you're using single mode now but thought you were using raid0
mode already, isn't far from your present situation in any case. So you
might actually want to think about raid56 modes if you do a mkfs.btrfs
for some reason, since you're already going to be prepared for a raid0
level meltdown, loss of all data that's not backed up, and while you'd
not get a lot of benefit from it right now, you /would/ get the automatic
upgrade to actually /recoverable/ raid56 when that code is deployed.
The other alternative if your devices and thus filesystem size are big
enough (> 1 TiB per device, > 22 TiB total), would be raid10 mode for the
data. Btrfs raid1 and raid10 is exactly two-way, so you'd have 11-way-
striping instead of the 22-way you'd have with raid0 or the effective
single-speed you have now due to single-mode data, but would also have
the two-way-mirroring. In addition to the normal benefits of two-way-
mirroring, that lets you take advantage of btrfs checksumming and data
integrity features as well, reading from the good copy (and rewriting the
bad one) if the first copy found doesn't match checksum. If I had the
capacity, raid10 would be my preferred mode here, but it /does/ mean
halving effective capacity of the filesystem.
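The capacity trade-offs between the profiles discussed here can be compared with a back-of-the-envelope calculation. A minimal Python sketch, assuming equal-sized devices, full-width stripes, and ignoring metadata and chunk-granularity overhead; the function names are made up for illustration:

```python
def usable_capacity(n_devices, device_size, profile):
    """Approximate usable data capacity, in the same unit as device_size."""
    raw = n_devices * device_size
    if profile in ("single", "raid0"):
        return raw                            # no redundancy
    if profile in ("raid1", "raid10"):
        return raw / 2                        # exactly two copies of everything
    if profile == "raid5":
        return (n_devices - 1) * device_size  # one device's worth of parity
    if profile == "raid6":
        return (n_devices - 2) * device_size  # two devices' worth of parity
    raise ValueError(f"unknown profile: {profile}")


def min_device_size_for_raid10(data_size, n_devices):
    """Smallest per-device size whose raid10 capacity holds data_size."""
    return 2 * data_size / n_devices


# 22 devices, capacity expressed in "device sizes" (device_size=1.0).
for profile in ("single", "raid0", "raid10", "raid5", "raid6"):
    print(f"{profile:7s} {usable_capacity(22, 1.0, profile):5.1f}")

# Per-device size needed to hold the 10.86 TiB of data currently in use.
print(f"{min_device_size_for_raid10(10.86, 22):.3f} TiB per device")
```

On 22 devices raid10 leaves 11 device-sizes of usable space against 21 for raid5 and 22 for single or raid0. Holding the 10.86 TiB currently in use would take roughly 0.987 TiB per device before any metadata or headroom, which is why raid10 only fits if the devices are somewhat larger than 1 TiB each.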
Hope that helps and best wishes from a fellow gentooer! =:^)
---
[1] Backups: While btrfs isn't entirely experimental any more, it's
still not entirely stable either, and data eating bugs can and do
happen. As such, the sysadmin's rule of thumb that says if you don't
have a backup, you don't care about your data, and an untested backup is
not a backup, applies even more than it does when your data is on a fully
mature filesystem.
Of course the same applies to raid0, so the general btrfs status isn't a
big change from that in any case and I expect you either already have
good backups or are prepared to simply lose the data if a device goes bad
already.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: btrfs stuck with lot's of files
From: Ian Armstrong @ 2014-12-02 18:56 UTC
To: linux-btrfs
On Tue, 2 Dec 2014 12:48:21 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:
> Peter Volkov posted on Tue, 02 Dec 2014 04:50:29 +0300 as excerpted:
>
> > On Mon, 01/12/2014 at 10:47 -0800, Robert White wrote:
> >> On 12/01/2014 03:46 AM, Peter Volkov wrote:
> >> > (stuff about getting hung up trying to write to one drive)
> >>
> >> That drive (/dev/sdn) is probably starting to fail.
> >> (about failed drive)
> >
> > Thank you Robert for the answer. It is not likely that drive fails
> > here. Similar condition (write to a single drive) happens with
> > other drives i.e. such write pattern may happen with any drive.
> >
> > After looking at what happens longer I see the following. During
> > stuck single processor core is busy 100% of CPU in kernel space
> > (some kworker is taking 100% CPU).
>
> FWIW, agreed that it's unlikely to be the drive, especially if you're
> not seeing bus resets or drive errors in dmesg and smart says the
> drive is fine, as I expect it does/will. It may be a btrfs bug or
> scaling issue, of which btrfs still has some, or it could simply be
> the single mode vs raid0 mode issue I explain below.
I encountered a similar problem here a few days ago on a btrfs raid1
partition while using rsync to clone a (~30GB) directory.
Everything started fine, but I came back an hour later to find rsync had
apparently stalled at about 20% with cpu usage at 100% on a single
kworker thread. I was able to kill rsync eventually, and after a while
(don't know how long, but >10 minutes) cpu usage returned to normal.
Restarting rsync resulted in kworker at 100% cpu in less than a minute.
Once stalled there was little drive access happening. Another raid1
partition (mdadm/ext4) on the same drive pair was having no problems.
Nothing showed in the system logs.
In this instance I'd forgotten to delete a temporary 500GB file before
starting rsync, so although recently balanced (musage=80/dusage=80) it
was running at near capacity.
After a reboot, deleting the 500GB file & running balance, everything
returned to normal. Ran rsync again & it completed fine.
Running slackware current, with Kernel 3.16.4
# btrfs filesystem df /mnt/general
Data, RAID1: total=1.38TiB, used=1.38TiB
System, RAID1: total=32.00MiB, used=256.00KiB
Metadata, RAID1: total=6.00GiB, used=4.67GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
# btrfs filesystem show /mnt/general
Label: none uuid: 592376ea-769f-4abb-915e-aa5e49162d90
Total devices 2 FS bytes used 1.38TiB
devid 1 size 1.79TiB used 1.39TiB path /dev/sda4
devid 2 size 1.79TiB used 1.39TiB path /dev/sdd4
Btrfs v3.17.2
--
Ian
* Re: btrfs stuck with lot's of files
From: Duncan @ 2014-12-02 22:42 UTC
To: linux-btrfs
Ian Armstrong posted on Tue, 02 Dec 2014 18:56:13 +0000 as excerpted:
> On Tue, 2 Dec 2014 12:48:21 +0000 (UTC)
> Duncan <1i5t5.duncan@cox.net> wrote:
>
>> FWIW, agreed that it's unlikely to be the drive, especially if you're
>> not seeing bus resets or drive errors in dmesg and smart says the drive
>> is fine, as I expect it does/will. It may be a btrfs bug or scaling
>> issue, of which btrfs still has some, or it could simply be the single
>> mode vs raid0 mode issue I explain below.
>
> I encountered a similar problem here a few days ago on a btrfs raid1
> partition while using rsync to clone a (~30GB) directory.
>
> Everything started fine, but I came back an hour later to find rsync had
> apparently stalled at about 20% with cpu usage at 100% on a single
> kworker thread. I was able to kill rsync eventually, and after a while
> (don't know how long, but >10 minutes) cpu usage returned to normal.
> Restarting rsync resulted in kworker at 100% cpu in less than a minute.
> Once stalled there was little drive access happening. Another raid1
> partition (mdadm/ext4) on the same drive pair was having no problems.
> Nothing showed in the system logs.
>
> In this instance I'd forgotten to delete a temporary 500GB file before
> starting rsync, so although recently balanced (musage=80/dusage=80) it
> was running at near capacity.
>
> After a reboot, deleting the 500GB file & running balance, everything
> returned to normal. Ran rsync again & it completed fine.
>
> Running slackware current, with Kernel 3.16.4
FWIW that was my point -- there are still such bugs out there, often
corner-case so they don't affect most folks most of the time, but out
there.
I had a similar stall recently, a kworker stuck at 100% that went away
after I killed whatever app had triggered the problem (pan, the news
program I'm writing this with, as it happens). In my case I chalked it
up to a known corner-case bug in my slightly old 3.17.0 kernel (my use-
case doesn't do read-only snapshots so I'm not affected by that known bug
that effectively blacklists 3.17.0 for some users; this would have been a
different one). I don't /know/ it was that bug, but it most likely was,
as it's a known but rare corner-case that AFAIK is already fixed in the
late 3.18-rcs.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Reiterate: btrfs stuck with lot's of files
2014-12-02 1:33 ` Qu Wenruo
2014-12-02 2:00 ` Peter Volkov
@ 2014-12-04 22:58 ` Peter Volkov
2014-12-04 23:55 ` Chris Murphy
1 sibling, 1 reply; 10+ messages in thread
From: Peter Volkov @ 2014-12-04 22:58 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
Hi again, guys. Looking at this issue, I suspect this is a bug in btrfs.
We'll have to clean up this installation soon, so if there is any
request to do some debugging, please ask. I'll try to reiterate what
was said in this thread.
Short story: a btrfs filesystem made of 22 1 TB disks with lots of files
(~30240000). Write load is 25 MB/s. After some time the file system
became unable to cope with this load. At that point `sync` also takes
ages to finish and shutdown -r hangs (I guess that is related to sync).
I also see one kernel kworker that is the main suspect for
this behavior: it takes 100% of a CPU core all the time, jumping from core
to core. At the same time, according to iostat, write/read speed is close
to zero and everything is stuck.
Citing some details from previous messages:
> > top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61, 149.29
> > Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
> > %Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si, 0.0 st
> > KiB Mem: 65922104 total, 65414856 used, 507248 free, 1844 buffers
> > KiB Swap: 0 total, 0 used, 0 free. 62570804 cached Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > 8644 root 20 0 0 0 0 R 96.5 0.0 127:21.95 kworker/u16:16
> > 5047 dvr 20 0 6884292 122668 4132 S 6.4 0.2 258:59.49 dvrserver
> > 30223 root 20 0 20140 2600 2132 R 6.4 0.0 0:00.01 top
> > 1 root 20 0 4276 1628 1524 S 0.0 0.0 0:40.19 init
> >
> > There are about 300 treads on server, some of which are writing on disk.
> > A bit information about this btrfs filesystem: this is 22 disk file
> > system with raid1 for metadata and raid0 for data:
> >
> > # btrfs filesystem df /store/
> > Data, single: total=11.92TiB, used=10.86TiB
> > System, RAID1: total=8.00MiB, used=1.27MiB
> > System, single: total=4.00MiB, used=0.00B
> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
> > Metadata, single: total=8.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
> > # btrfs property get /store/
> > ro=false
> > label=store
> > # btrfs device stats /store/
> > (shows all zeros)
> > # btrfs balance status /store/
> > No balance found on '/store/'
# btrfs filesystem show
Label: 'store' uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
Total devices 22 FS bytes used 6.50TiB
devid 1 size 931.51GiB used 558.02GiB path /dev/sdb
devid 2 size 931.51GiB used 559.00GiB path /dev/sdc
devid 3 size 931.51GiB used 559.00GiB path /dev/sdd
devid 4 size 931.51GiB used 559.00GiB path /dev/sde
devid 5 size 931.51GiB used 559.00GiB path /dev/sdf
devid 6 size 931.51GiB used 559.00GiB path /dev/sdg
devid 7 size 931.51GiB used 559.00GiB path /dev/sdh
devid 8 size 931.51GiB used 559.00GiB path /dev/sdi
devid 9 size 931.51GiB used 559.00GiB path /dev/sdj
devid 10 size 931.51GiB used 559.00GiB path /dev/sdk
devid 11 size 931.51GiB used 559.00GiB path /dev/sdl
devid 12 size 931.51GiB used 559.00GiB path /dev/sdm
devid 13 size 931.51GiB used 559.00GiB path /dev/sdn
devid 14 size 931.51GiB used 559.00GiB path /dev/sdo
devid 15 size 931.51GiB used 559.00GiB path /dev/sdp
devid 16 size 931.51GiB used 559.00GiB path /dev/sdq
devid 17 size 931.51GiB used 559.00GiB path /dev/sdr
devid 18 size 931.51GiB used 559.00GiB path /dev/sds
devid 19 size 931.51GiB used 559.00GiB path /dev/sdt
devid 20 size 931.51GiB used 559.00GiB path /dev/sdu
devid 21 size 931.51GiB used 559.01GiB path /dev/sdv
devid 22 size 931.51GiB used 560.01GiB path /dev/sdw
Btrfs v3.17.1
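
As a rough cross-check of the figures above, the per-device allocation can be summed and compared with the chunk totals from `btrfs filesystem df`. This is only a sketch: it assumes that a RAID1 chunk occupies twice its reported size on disk and that GlobalReserve is carved out of metadata rather than allocated separately. All numbers are copied from the output quoted in this message.

```python
# Rough cross-check of the allocation figures quoted above.
# Every number is copied from the `btrfs filesystem show` and
# `btrfs filesystem df` output in this message.

GIB_PER_TIB = 1024.0
MIB_PER_GIB = 1024.0

# Per-device allocated space in GiB: devid 1, devids 2-20, devid 21, devid 22.
allocated_gib = [558.02] + [559.00] * 19 + [559.01, 560.01]
device_size_gib = 931.51

total_allocated_gib = sum(allocated_gib)
total_raw_gib = device_size_gib * len(allocated_gib)
allocated_fraction = total_allocated_gib / total_raw_gib

# Chunk totals from `filesystem df`, converted to raw GiB.
# Assumption: a RAID1 chunk occupies twice its reported size on disk.
# Assumption: GlobalReserve lives inside metadata, so it is not added here.
chunks_raw_gib = (
    11.92 * GIB_PER_TIB            # Data, single
    + 2 * 46.00                    # Metadata, RAID1
    + 2 * 8.00 / MIB_PER_GIB       # System, RAID1
    + 4.00 / MIB_PER_GIB           # System, single
    + 8.00 / MIB_PER_GIB           # Metadata, single
)

print(f"allocated: {total_allocated_gib:.2f} GiB of {total_raw_gib:.2f} GiB "
      f"({allocated_fraction:.1%})")
print(f"chunks from df: {chunks_raw_gib:.2f} GiB")
```

If those assumptions hold, the two totals agree to within rounding, and only about 60% of the raw space has been allocated to chunks so far.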
> > iostat 1 exposes following problem:
> >
> > avg-cpu: %user %nice %system %iowait %steal %idle
> > 16.96 0.00 17.09 65.95 0.00 0.00
> >
> > Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> > sda 0.00 0.00 0.00 0 0
> > sdc 0.00 0.00 0.00 0 0
> > sdb 0.00 0.00 0.00 0 0
> > sde 0.00 0.00 0.00 0 0
> > sdd 0.00 0.00 0.00 0 0
> > sdf 0.00 0.00 0.00 0 0
> > sdg 0.00 0.00 0.00 0 0
> > sdj 0.00 0.00 0.00 0 0
> > sdh 0.00 0.00 0.00 0 0
> > sdk 0.00 0.00 0.00 0 0
> > sdi 1.00 0.00 200.00 0 200
> > sdl 0.00 0.00 0.00 0 0
> > sdn 48.00 0.00 17260.00 0 17260
> > sdm 0.00 0.00 0.00 0 0
> > sdp 0.00 0.00 0.00 0 0
> > sdo 0.00 0.00 0.00 0 0
> > sdq 0.00 0.00 0.00 0 0
> > sdr 0.00 0.00 0.00 0 0
> > sds 0.00 0.00 0.00 0 0
> > sdt 0.00 0.00 0.00 0 0
> > sdv 0.00 0.00 0.00 0 0
> > sdw 0.00 0.00 0.00 0 0
> > sdu 0.00 0.00 0.00 0 0
At that time I saw this load profile. The write load moved from disk to
disk over time, so I do not suspect a broken disk. Currently the write
profile is different:
https://drive.google.com/file/d/0BygFL6N3ZVUAVmxaZ1Q5VTZpSGc/view?usp=sharing
Sometimes it looks like the above, sometimes it is all zero; most of the
time the load is very low.
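
To track which disks are actually receiving writes across many `iostat 1` samples, something like the following could be used. It is a minimal sketch that assumes the six-column device layout shown in the quoted output (name, tps, kB_read/s, kB_wrtn/s, kB_read, kB_wrtn); the function name and the sample text are only illustrative.

```python
def writing_devices(iostat_text):
    """Return {device: kB_wrtn/s} for sd* devices with a nonzero write rate.

    Assumes the six-column `iostat` device layout quoted in this thread.
    Leading mail quote markers ("> > ") are stripped before parsing.
    """
    active = {}
    for line in iostat_text.splitlines():
        fields = line.lstrip("> ").split()
        # Skip headers, the avg-cpu rows, and anything that is not a disk row.
        if len(fields) != 6 or not fields[0].startswith("sd"):
            continue
        try:
            write_rate = float(fields[3])
        except ValueError:
            continue
        if write_rate > 0:
            active[fields[0]] = write_rate
    return active


# A few rows copied from the sample quoted above.
SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
16.96 0.00 17.09 65.95 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.00 0.00 0.00 0 0
sdi 1.00 0.00 200.00 0 200
sdn 48.00 0.00 17260.00 0 17260
sdm 0.00 0.00 0.00 0 0
"""

print(writing_devices(SAMPLE))
```

Running it over successive samples would show whether the single busy disk really changes over time, as described above.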
> > write goes to one disk. I've tried to debug what's going in kworker and
> > did
> >
> > $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
> > $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
I've put the result here:
https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing
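
For anyone who wants to summarize a trace like this, a small script can count which work functions get queued most often. This is a sketch only: it assumes each event line contains `workqueue_queue_work:` followed somewhere by a `function=<name>` field, which is how this tracepoint is usually printed. The sample lines and the names `func_a` and `func_b` are made up for illustration and are not taken from the real trace.

```python
import re
from collections import Counter

# Assumed event layout: "... workqueue_queue_work: ... function=<name> ..."
FUNC_RE = re.compile(r"workqueue_queue_work:.*?\bfunction=(\S+)")


def count_queued_functions(lines):
    """Count queued work items per function name in ftrace output lines."""
    counts = Counter()
    for line in lines:
        match = FUNC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; real output will have different names and fields.
SAMPLE_TRACE = [
    "kworker/u16:16-8644 [003] d... 1.000: workqueue_queue_work: work struct=0x1 function=func_a workqueue=0x2 req_cpu=8 cpu=3",
    "kworker/u16:16-8644 [003] d... 1.001: workqueue_queue_work: work struct=0x3 function=func_b workqueue=0x2 req_cpu=8 cpu=3",
    "kworker/u16:16-8644 [003] d... 1.002: workqueue_queue_work: work struct=0x4 function=func_a workqueue=0x2 req_cpu=8 cpu=3",
    "# tracer: nop",
]

print(count_queued_functions(SAMPLE_TRACE).most_common())
```

Passing an open file object for trace_pipe.out2 as `lines` and looking at the first few entries of `most_common()` should show which functions dominate the queue.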
> > Server has 64Gb of RAM.
The kernel is 3.16.7-gentoo.
--
Peter.
* Re: Reiterate: btrfs stuck with lot's of files
2014-12-04 22:58 ` Reiterate: " Peter Volkov
@ 2014-12-04 23:55 ` Chris Murphy
0 siblings, 0 replies; 10+ messages in thread
From: Chris Murphy @ 2014-12-04 23:55 UTC (permalink / raw)
To: linux-btrfs@vger.kernel.org
On Thu, Dec 4, 2014 at 3:58 PM, Peter Volkov <pva@gentoo.org> wrote:
> Hi, guys again. Looking at this issue, I suspect this is bug in btrfs.
> We'll have to clean up this installation soon, so if there is any
> request to do some debugging, please, ask. I'll try to reiterate what
> was said in this thread.
>
> Short story: btrfs filesystem made of 22 1Tb disks with lot's of files
> (~30240000). Write load is 25 Mbyte/second. After some time file system
> became unable to cope with this load. Also at this time `sync` takes
> ages to finish, shutdown -r hangs (I guess related to sync).
>
> Also I see there is one some kernel kworker that is main suspect for
> this behavior: all the time it takes 100% of CPU core, jumping from core
> to core. At the same time according to iostat write/read speed is close
> to zero and everything is stuck.
>
> Siting some details from previous messages:
>
>> > top - 13:10:58 up 1 day, 9:26, 5 users, load average: 157.76, 156.61, 149.29
>> > Tasks: 235 total, 2 running, 233 sleeping, 0 stopped, 0 zombie
>> > %Cpu(s): 19.8 us, 15.0 sy, 0.0 ni, 60.7 id, 3.9 wa, 0.0 hi, 0.6 si, 0.0 st
>> > KiB Mem: 65922104 total, 65414856 used, 507248 free, 1844 buffers
>> > KiB Swap: 0 total, 0 used, 0 free. 62570804 cached Mem
>> >
>> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> > 8644 root 20 0 0 0 0 R 96.5 0.0 127:21.95 kworker/u16:16
>> > 5047 dvr 20 0 6884292 122668 4132 S 6.4 0.2 258:59.49 dvrserver
>> > 30223 root 20 0 20140 2600 2132 R 6.4 0.0 0:00.01 top
>> > 1 root 20 0 4276 1628 1524 S 0.0 0.0 0:40.19 init
>> >
>> > There are about 300 treads on server, some of which are writing on disk.
>> > A bit information about this btrfs filesystem: this is 22 disk file
>> > system with raid1 for metadata and raid0 for data:
>> >
>> > # btrfs filesystem df /store/
>> > Data, single: total=11.92TiB, used=10.86TiB
>> > System, RAID1: total=8.00MiB, used=1.27MiB
>> > System, single: total=4.00MiB, used=0.00B
>> > Metadata, RAID1: total=46.00GiB, used=33.49GiB
>> > Metadata, single: total=8.00MiB, used=0.00B
>> > GlobalReserve, single: total=512.00MiB, used=128.00KiB
>> > # btrfs property get /store/
>> > ro=false
>> > label=store
>> > # btrfs device stats /store/
>> > (shows all zeros)
>> > # btrfs balance status /store/
>> > No balance found on '/store/'
>
> # btrfs filesystem show
> Label: 'store' uuid: 296404d1-bd3f-417d-8501-02f8d7906bcf
> Total devices 22 FS bytes used 6.50TiB
> devid 1 size 931.51GiB used 558.02GiB path /dev/sdb
> devid 2 size 931.51GiB used 559.00GiB path /dev/sdc
> devid 3 size 931.51GiB used 559.00GiB path /dev/sdd
> devid 4 size 931.51GiB used 559.00GiB path /dev/sde
> devid 5 size 931.51GiB used 559.00GiB path /dev/sdf
> devid 6 size 931.51GiB used 559.00GiB path /dev/sdg
> devid 7 size 931.51GiB used 559.00GiB path /dev/sdh
> devid 8 size 931.51GiB used 559.00GiB path /dev/sdi
> devid 9 size 931.51GiB used 559.00GiB path /dev/sdj
> devid 10 size 931.51GiB used 559.00GiB path /dev/sdk
> devid 11 size 931.51GiB used 559.00GiB path /dev/sdl
> devid 12 size 931.51GiB used 559.00GiB path /dev/sdm
> devid 13 size 931.51GiB used 559.00GiB path /dev/sdn
> devid 14 size 931.51GiB used 559.00GiB path /dev/sdo
> devid 15 size 931.51GiB used 559.00GiB path /dev/sdp
> devid 16 size 931.51GiB used 559.00GiB path /dev/sdq
> devid 17 size 931.51GiB used 559.00GiB path /dev/sdr
> devid 18 size 931.51GiB used 559.00GiB path /dev/sds
> devid 19 size 931.51GiB used 559.00GiB path /dev/sdt
> devid 20 size 931.51GiB used 559.00GiB path /dev/sdu
> devid 21 size 931.51GiB used 559.01GiB path /dev/sdv
> devid 22 size 931.51GiB used 560.01GiB path /dev/sdw
>
> Btrfs v3.17.1
>
>> > iostat 1 exposes following problem:
>> >
>> > avg-cpu: %user %nice %system %iowait %steal %idle
>> > 16.96 0.00 17.09 65.95 0.00 0.00
>> >
>> > Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
>> > sda 0.00 0.00 0.00 0 0
>> > sdc 0.00 0.00 0.00 0 0
>> > sdb 0.00 0.00 0.00 0 0
>> > sde 0.00 0.00 0.00 0 0
>> > sdd 0.00 0.00 0.00 0 0
>> > sdf 0.00 0.00 0.00 0 0
>> > sdg 0.00 0.00 0.00 0 0
>> > sdj 0.00 0.00 0.00 0 0
>> > sdh 0.00 0.00 0.00 0 0
>> > sdk 0.00 0.00 0.00 0 0
>> > sdi 1.00 0.00 200.00 0 200
>> > sdl 0.00 0.00 0.00 0 0
>> > sdn 48.00 0.00 17260.00 0 17260
>> > sdm 0.00 0.00 0.00 0 0
>> > sdp 0.00 0.00 0.00 0 0
>> > sdo 0.00 0.00 0.00 0 0
>> > sdq 0.00 0.00 0.00 0 0
>> > sdr 0.00 0.00 0.00 0 0
>> > sds 0.00 0.00 0.00 0 0
>> > sdt 0.00 0.00 0.00 0 0
>> > sdv 0.00 0.00 0.00 0 0
>> > sdw 0.00 0.00 0.00 0 0
>> > sdu 0.00 0.00 0.00 0 0
>
> At that time I saw such load profile. Write load changed from disk to
> disk with time, so I do not suspect broken disk. Currently write profile
> is different:
> https://drive.google.com/file/d/0BygFL6N3ZVUAVmxaZ1Q5VTZpSGc/view?usp=sharing
> Sometimes like above, sometimes all zero, most time load is very low.
>
>> > write goes to one disk. I've tried to debug what's going in kworker and
>> > did
>> >
>> > $ echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
>> > $ cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.out2
>
> I've put result here:
> https://drive.google.com/file/d/0BygFL6N3ZVUAMWxCQ0tDREE1Uzg/view?usp=sharing
>
Is the Btrfs single profile expected to write to block devices in parallel?
Initially, any write is a new write rather than an overwrite, because
of COW. All writes go into a single chunk on a single device until the
chunk is full, then onto the next device with a new chunk until that
chunk is full. And so on. This behavior only changes once all space is
allocated as a data or metadata chunk on all block devices, which
actually could take some time. If there are many chunks on many
devices that are 90% full, then I don't know how Btrfs decides which
chunks it writes to. But I still don't think it's highly parallelized
like it is on XFS.
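
To make the difference concrete, here is a toy model of the behavior described above, next to a simple raid0-style striping model. It is not the Btrfs allocator: the fill-one-chunk-then-move-on rule is just my description turned into code, and the chunk size, stripe size, and function names are illustrative assumptions.

```python
def devices_touched_single(num_devices, chunk_size, write_sizes):
    """Toy 'single' model: fill one chunk on one device, then move on.

    Returns, for each write, the sorted list of device indexes it touches.
    Sizes are in arbitrary units (say KiB) and must be positive.
    """
    placements = []
    device = 0
    free_in_chunk = chunk_size
    for size in write_sizes:
        remaining = size
        touched = set()
        while remaining > 0:
            if free_in_chunk == 0:
                # Current chunk is full: start a new chunk on the next device.
                device = (device + 1) % num_devices
                free_in_chunk = chunk_size
            step = min(remaining, free_in_chunk)
            free_in_chunk -= step
            remaining -= step
            touched.add(device)
        placements.append(sorted(touched))
    return placements


def devices_touched_raid0(num_devices, stripe_size, write_sizes):
    """Toy raid0 model: consecutive stripes rotate across all devices."""
    placements = []
    offset = 0
    for size in write_sizes:
        first_stripe = offset // stripe_size
        last_stripe = (offset + size - 1) // stripe_size
        touched = {s % num_devices for s in range(first_stripe, last_stripe + 1)}
        placements.append(sorted(touched))
        offset += size
    return placements


writes = [300, 300, 300, 300]  # four small files, sizes in KiB
print(devices_touched_single(4, 1024, writes))
print(devices_touched_raid0(4, 64, writes))
```

In this model each small file lands on one device (occasionally two, at a chunk boundary) under single, while under raid0 the same file is spread across every device. That would match an iostat sample where only one or two disks show writes.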
Are reads parallelized in this case? Unless reads and writes are
parallelized, the single profile isn't scalable. So before calling
something a bug, I'd wonder whether the design expects this layout,
rather than raid0, to be used for the intended use case. The chance of
a single drive dying with 22 drives in the volume is astronomically
high, probably 100% over as short as 6 months, and then what?
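
How high that chance really is depends on the per-drive failure rate, which I don't know for these disks. As a back-of-the-envelope sketch, assuming drives fail independently and at a constant annual failure rate (both are assumptions, and the 5% figure below is only a placeholder), the probability of at least one failure is 1 - (1 - AFR)^(years * drives):

```python
def p_any_failure(num_drives, annual_failure_rate, years):
    """Probability that at least one of num_drives fails within `years`.

    Assumes independent drives and a constant annual failure rate,
    so a single drive survives `years` with probability (1 - AFR) ** years.
    """
    per_drive_survival = (1.0 - annual_failure_rate) ** years
    return 1.0 - per_drive_survival ** num_drives


# Placeholder inputs: 22 drives, an assumed 5% annual failure rate.
for years in (0.5, 1.0, 3.0):
    print(f"{years:>4} years: {p_any_failure(22, 0.05, years):.2f}")
```

The result is very sensitive to the assumed rate and to the time horizon, so plug in whatever failure rate you trust for your hardware.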
I'm unaware of either existing or planned functionality to allow such
a volume to remain functional: to do that, Btrfs needs to delete all
affected files so they're no longer referenced. I've actually thought
of this layout for use with GlusterFS and Ceph, in such a way that a
drive can die and Btrfs informs the distributed filesystem above it
which files are no longer available from this particular storage brick;
next the brick's filesystem can be "cleaned up" by deleting all
missing files, then deleting the missing device, thereby stabilizing
the existing fs. The distributed file system starts replicating
missing files according to its policies.
But right now, if any device dies in your example layout, the
filesystem is functionally lost. Yes, you can get the remaining data out
of it, but it's in a sense 1/22nd broken and not fixable as far as I
know. But I haven't tried fixing this manually, e.g. doing a scrub to get
a listing of missing files, deleting those files, adding a new
device, and deleting the missing device. If the missing files aren't
explicitly deleted, I think the fs still has references for them and
will just return read/corruption errors rather than denying the file
even exists.
--
Chris Murphy