linux-btrfs.vger.kernel.org archive mirror
* umount waiting for 12 hours and still running
@ 2013-11-05 13:42 John Goerzen
  2013-11-05 14:20 ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: John Goerzen @ 2013-11-05 13:42 UTC (permalink / raw)
  To: linux-btrfs

Hello,

More than 12 hours ago, I tried to umount a btrfs filesystem. Something
involving btrfs-cleaner and btrfs-transacti is still running, but I
don't know what.

I have noticed excessively long umount times before, and it is a
significant concern for me.

A bit of background:

The filesystem in question involves two 2TB USB hard drives.  It is 49%
full.  Data is RAID0, metadata is RAID1.  The files stored on it are for
BackupPC, meaning there are many, many directories and hardlinks.  I
would estimate 30 million inodes in use and many of them have dozens of
hardlinks to them.  These disks used to be formatted with ext4.  I used
the e2fs dump to back them up, created a fresh btrfs filesystem, and
used restore to load the data onto it.
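
For the record, the migration went roughly like this (device names,
paths, and mount point illustrative):

  dump -0f - /dev/sdb1 | gzip > /backup/backuppc.dump.gz
  mkfs.btrfs -d raid0 -m raid1 /dev/sdc /dev/sdd
  mount /dev/sdc /mnt/backuppc
  cd /mnt/backuppc && gunzip -c /backup/backuppc.dump.gz | restore -rf -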

Now then.  btrfs seemed to be extremely slow creating hard links. Slow
to the tune of taking hours longer than ext4 to do the same task, and
often triggering kernel "blocked for more than 120 seconds" warnings.  I
thought perhaps converting metadata to raid0 would help, so I started a
btrfs balance start -mconvert=raid0 on it.  According to btrfs fi df, it
churned through the first 900MB out of 26GB of metadata in quick order,
but then the amount of RAID0 metadata bounced up and down between about
950MB and 1019MB -- always just shy of 1GB.  There was an active rsync
job to the disk during this time.  With no apparent progress even after
hours, I tried to cancel the balance.  My cancel command did not return
even after waiting hours.  Finally I rebooted and mounted the FS with
the option to not restart the balance, then it canceled in a few
minutes.  dstat showed all was quiet on the disk.  So I thought I would
unmount it, remount it normally, and start the convert again.
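
For reference, the command sequence was roughly this (mount point
illustrative):

  btrfs balance start -mconvert=raid0 /mnt/backuppc   # the convert
  btrfs balance cancel /mnt/backuppc                  # never returned
  # after the reboot, mount without resuming the balance:
  mount -o skip_balance /dev/sdc /mnt/backuppc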

That unmount is what has been sitting ever since.  According to dstat,
it reads about 360KB per second, every so often writing out about 25MB
per second.  And it's been doing this for 12 hours.

It seems I have encountered numerous problems here:

   * I/O starvation on link(2) and perhaps also unlink(2)
   * btrfs balance convert making no progress after many hours
   * btrfs balance cancel not stopping anything
   * umount taking hours

The umount is still pending, so if there is any debugging I can do,
please let me know.

Kernel 3.10 from Debian wheezy backports on i386.

Thanks,

John


* Re: umount waiting for 12 hours and still running
  2013-11-05 13:42 umount waiting for 12 hours and still running John Goerzen
@ 2013-11-05 14:20 ` Duncan
  2013-11-05 16:11   ` John Goerzen
  0 siblings, 1 reply; 6+ messages in thread
From: Duncan @ 2013-11-05 14:20 UTC (permalink / raw)
  To: linux-btrfs

John Goerzen posted on Tue, 05 Nov 2013 07:42:02 -0600 as excerpted:

> Hello,
> 
> More than 12 hours ago, I tried to umount a btrfs filesystem. Something
> involving btrfs-cleaner and btrfs-transacti is still running, but I
> don't know what.
> 
> I have noticed excessively long umount times before, and it is a
> significant concern for me.
> 
> A bit of background:
> 
> The filesystem in question involves two 2TB USB hard drives.  It is 49%
> full.  Data is RAID0, metadata is RAID1.  The files stored on it are for
> BackupPC, meaning there are many, many directories and hardlinks.  I
> would estimate 30 million inodes in use and many of them have dozens of
> hardlinks to them.

That's a bit of a problem for btrfs at this point, as you rightly mention.

> I thought perhaps converting metadata to raid0 would help.  So I
> started a btrfs balance start -mconvert=raid0 on it.

> Kernel 3.10 from Debian wheezy backports on i386.

There's a known bug with balance on current kernels related to pre-
allocated space (as with the systemd journal or torrent files with some 
clients).

A patch is available and queued for 3.13 and then for stable (which 
doesn't take patches unless they're already in mainline).  But while 
3.12 is out and the 3.13 merge window would normally be open now, Linus 
is taking a week off for travel without a good internet connection, so 
the merge window is delayed a week -- which means this patch is likely 
to be delayed a couple of weeks before it reaches stable.  =:^(

Here's a link to the post with the patch:

[PATCH] Btrfs: relocate csums properly with prealloc extents

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28733

I'd suggest applying that to the latest 3.12 kernel and trying the 
balance again.  Unfortunately that means an unsafe reboot without a 
remount read-only or unmount, but...
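
If you haven't patched a kernel before, it's roughly this (patch
filename illustrative -- save the patch from the list post):

  cd linux-3.12
  patch -p1 < btrfs-relocate-csums-prealloc.patch
  make olddefconfig && make -j4
  make modules_install install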

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: umount waiting for 12 hours and still running
  2013-11-05 14:20 ` Duncan
@ 2013-11-05 16:11   ` John Goerzen
  2013-11-05 18:21     ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: John Goerzen @ 2013-11-05 16:11 UTC (permalink / raw)
  To: linux-btrfs

Duncan <1i5t5.duncan <at> cox.net> writes:

> 
> John Goerzen posted on Tue, 05 Nov 2013 07:42:02 -0600 as excerpted:
> 
> > The filesystem in question involves two 2TB USB hard drives.  It is 49%
> > full.  Data is RAID0, metadata is RAID1.  The files stored on it are for
> > BackupPC, meaning there are many, many directories and hardlinks.  I
> > would estimate 30 million inodes in use and many of them have dozens of
> > hardlinks to them.
> 
> That's a bit of a problem for btrfs at this point, as you rightly mention.

Hi Duncan,

Thank you very much for taking the time to reply.

Can you clarify a bit about what sort of problems I might expect to
encounter with this sort of setup on btrfs?

> 
> > I thought perhaps converting metadata to raid0 would help.  So I
> > started a btrfs balance start -mconvert=raid0 on it.
> 
> > Kernel 3.10 from Debian wheezy backports on i386.
> 
> There's a known bug with balance on current kernels related to pre-
> allocated space (as with the systemd journal or torrent files with some 
> clients).

[snip]

> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28733

I'm almost completely sure that this bug wasn't being hit.  The files
were streamed back by restore(8), and a few were written by BackupPC.  I
checked the source to both just to make sure, and neither has a call to
fallocate.  I do not believe there were sparse files on the disk either,
and I haven't experienced the csum errors mentioned in the post.
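
(The check was along these lines -- paths illustrative:

  grep -rn 'fallocate' backuppc/ dump/   # no hits in either source tree
  filefrag -v /mnt/backuppc/somefile     # prealloc extents show an
                                         # 'unwritten' flag
)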

Thanks again,

-- John




* Re: umount waiting for 12 hours and still running
  2013-11-05 16:11   ` John Goerzen
@ 2013-11-05 18:21     ` Duncan
  0 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2013-11-05 18:21 UTC (permalink / raw)
  To: linux-btrfs

John Goerzen posted on Tue, 05 Nov 2013 16:11:56 +0000 as excerpted:

> Duncan <1i5t5.duncan <at> cox.net> writes:
> 
> 
>> John Goerzen posted on Tue, 05 Nov 2013 07:42:02 -0600 as excerpted:
>> 
>> > The filesystem in question involves two 2TB USB hard drives.  It is
>> > 49% full.  Data is RAID0, metadata is RAID1.  The files stored on it
>> > are for BackupPC, meaning there are many, many directories and
>> > hardlinks.  I would estimate 30 million inodes in use and many of
>> > them have dozens of hardlinks to them.
>> 
>> That's a bit of a problem for btrfs at this point, as you rightly
>> mention.

> Can you clarify a bit about what sort of problems I might expect to
> encounter with this sort of setup on btrfs?

I'm not a dev, nor do I run that sort of setup, so I won't attempt a lot 
of detail.  This is admittedly a bit handwavy, but if you need more, 
just use it as a starting point for your own research.

That out of the way: having followed the list for a while, I've seen 
several reports of complications with high hardlink counts, much like 
yours -- "unresponsive for N seconds" warnings, inordinately long 
processing times for unmounts, etc.

Additionally, it's worth noting that until relatively recently (the wiki 
changelog page says 3.7), btrfs had a rather low limit on the number of 
hardlinks to one file within a single directory, which people using 
btrfs for hardlink-intensive purposes kept hitting.  A developer could 
give you more details, but IIRC the workaround, while it /did/ give 
btrfs the ability to handle them, effectively created a setup where the 
first few hardlinks are handled inline and thus are reasonably fast, but 
beyond that limit an indirect referencing scheme is used that is rather 
less efficient.

I'd guess btrfs' current problems in that regard are thus two-fold: 
first, above a certain level the implementation /does/ get less 
efficient; and second, given the relatively recent kernel 3.7 
implementation, btrfs' many-hardlinks code hasn't had nearly the time to 
shake out bugs and accumulate the incremental optimizations that the 
more basic code has had.  I doubt btrfs will ever be a speed demon in 
this area, but I expect that given another year or so, the high-count 
hardlink code will be somewhat better optimized and tested, simply from 
the incremental effect of bug shakeout and small code changes as btrfs 
continues maturing.

Meanwhile, my own interest in btrfs is as a filesystem for SSDs.  (I 
still use reiserfs on my spinning rust, and I've had very good luck with 
it even thru various shoddy hardware experiences since the 
ordered-by-default code went in around 2.6.16, IIRC, but its journaling 
isn't well suited to SSDs.)  I want to actually use btrfs' data 
checksumming and integrity features, which means raid1 or raid10 mode 
(raid1 in my case), and the speed of SSDs mitigates to a large degree 
the slowness I see others reporting for this and other cases.

Additionally, I run several independent smaller partitions, so if there 
/is/ a problem the damage is contained.  That means I'm typically 
dealing with double-digit gigs per partition at most, which reduces full 
partition scrub and rebalance times from the hours-to-days I see people 
reporting on-list for multi-terabyte spinning rust, to typically 
seconds, perhaps a couple minutes, here.  The time is short enough that 
I typically use the don't-background option and run the scrub/balance in 
the foreground, waiting for the result.

Needless to say, if a full balance is going to take days you don't run 
it very often, but since it's only a couple minutes here, I scrub and 
balance reasonably frequently -- say after a bad shutdown.  (I use 
suspend-to-RAM, and sometimes on resume the SSDs don't stabilize fast 
enough for the kernel, so a device drops from the btrfs raid1 and the 
whole system goes unstable after that, often leading to a bad shutdown 
and reboot.)  Since a full balance rewrites everything to new chunks, 
that tends to limit bitrot and the chance for any errors to build up 
over time.
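
(Concretely, that routine is something like this -- mount point
illustrative:

  btrfs scrub start -B -d /mnt/ssd  # -B: don't background;
                                    # -d: per-device stats
  btrfs balance start /mnt/ssd      # full balance, runs in foreground
)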

My point being that my particular use-case is pretty much diametrically 
opposite yours!  For your backups use-case, I'd probably use something 
less experimental than btrfs, like xfs or ext4 with ordered journaling... 
or the reiserfs I still use on spinning rust, tho people's experience 
with it seems to be either really good or really bad, and while mine is 
definitely good, that doesn't mean yours will be.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: umount waiting for 12 hours and still running
@ 2013-11-05 18:46 Tomasz Chmielewski
  2013-11-05 18:53 ` John Goerzen
  0 siblings, 1 reply; 6+ messages in thread
From: Tomasz Chmielewski @ 2013-11-05 18:46 UTC (permalink / raw)
  To: linux-btrfs@vger.kernel.org, jgoerzen

> More than 12 hours ago, I tried to umount a btrfs filesystem.
> Something involving btrfs-cleaner and btrfs-transacti is still
> running, but I don't know what.

Does "iostat -x 1" or "iostat -k 1" show any disk activity?

Anything interesting in dmesg?
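
If there's no obvious activity, you could also dump the blocked tasks'
kernel stacks, e.g.:

  echo w > /proc/sysrq-trigger     # if sysrq is enabled: dump D-state
                                   # (blocked) tasks to dmesg
  cat /proc/$(pidof umount)/stack  # kernel stack of the hung umount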


-- 
Tomasz Chmielewski
http://wpkg.org


* Re: umount waiting for 12 hours and still running
  2013-11-05 18:46 Tomasz Chmielewski
@ 2013-11-05 18:53 ` John Goerzen
  0 siblings, 0 replies; 6+ messages in thread
From: John Goerzen @ 2013-11-05 18:53 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: linux-btrfs@vger.kernel.org



On 11/05/2013 12:46 PM, Tomasz Chmielewski wrote:
>> More than 12 hours ago, I tried to umount a btrfs filesystem.
>> Something involving btrfs-cleaner and btrfs-transacti is still
>> running, but I don't know what.
>
> Does "iostat -x 1" or "iostat -k 1" show any disk activity?

Yes.  For instance, from iostat -x 1:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00  104.00    0.00   416.00     0.00     8.00     1.03   10.08   10.08    0.00   9.31  96.80
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sdc and sdd are the drives in this btrfs FS, and they are used for 
nothing but that.

iostat -k 1 shows similar levels of activity.

>
> Anything interesting in dmesg?

There was this when it was first mounted:

Nov  4 11:28:43 erwin kernel: [  200.669110] btrfs: use lzo compression
Nov  4 11:28:43 erwin kernel: [  200.669114] btrfs: disk space caching is enabled
Nov  4 11:28:58 erwin kernel: [  215.660695] BTRFS debug (device dm-15): unlinked 1 orphans
Nov  4 11:28:58 erwin kernel: [  215.673535] btrfs: force skipping balance

I later canceled the balance.

Also, several like this:

Nov  4 11:51:23 erwin kernel: [ 1560.552129] INFO: task btrfs-transacti:6775 blocked for more than 120 seconds.
Nov  4 11:51:23 erwin kernel: [ 1560.552200] btrfs-transacti D 90d3d4a2     0  6775      2 0x00000000
Nov  4 11:51:23 erwin kernel: [ 1560.553136]  [<f85852b6>] ? wait_current_trans.isra.20+0x8b/0xb5 [btrfs]
Nov  4 11:51:23 erwin kernel: [ 1560.553217]  [<f858740d>] ? start_transaction+0x1db/0x46f [btrfs]
Nov  4 11:51:23 erwin kernel: [ 1560.553267]  [<f85876e0>] ? btrfs_attach_transaction+0xd/0x10 [btrfs]
Nov  4 11:51:23 erwin kernel: [ 1560.553316]  [<f8580963>] ? transaction_kthread+0xa3/0x158 [btrfs]
Nov  4 11:51:23 erwin kernel: [ 1560.553366]  [<f85808c0>] ? try_to_freeze+0x28/0x28 [btrfs]

but that was before the umount.

Nothing else.

-- John


