Linux Btrfs filesystem development
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: dsterba@suse.cz, linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com
Subject: Re: Poll: time to switch skinny-metadata on by default?
Date: Mon, 27 Oct 2014 00:39:25 -0400
Message-ID: <20141027043924.GI17395@hungrycats.org>
In-Reply-To: <20141020163403.GW22943@twin.jikos.cz>

On Mon, Oct 20, 2014 at 06:34:03PM +0200, David Sterba wrote:
> On Thu, Oct 16, 2014 at 01:33:37PM +0200, David Sterba wrote:
> > I'd like to make it default with the 3.17 release of btrfs-progs.
> > Please let me know if you have objections.
> 
> For the record, 3.17 will not change the defaults. The timing of the
> poll was very bad to get enough feedback before the release. Let's keep
> it open for now.

I don't have hard data, but I do have disturbing soft data:

	12 btrfs filesystems with various mixed workloads

	4 of those w/skinny metadata (converted with btrfstune -x)

	3 of those have processes or the entire filesystem hanging
	every few days, triggering watchdog reboots

I'm still trying to find the smoking gun, but it looks like there's a
problem that only shows up when skinny metadata is enabled (or possibly
one that only shows up when both skinny and non-skinny are mixed?).

One thing that may be significant is _when_ those 3 hanging filesystems
are hanging:  when using rsync to update local files.  These machines are
using the traditional rsync copy-then-rename method rather than --inplace
updates.  There's no problem copying data into an empty directory with
rsync, but as soon as I start updating existing data, some process (not
necessarily rsync) using the filesystem gets stuck within 36 hours,
and stays stuck for days.  If I don't run rsync on the skinny filesystems,
they'll run for a week or more without incident--and if I then start
running rsync again, they hang later the same day.
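
For anyone trying to reproduce this, the difference between the two
update styles can be sketched roughly as below.  This is a minimal Python
illustration of the write patterns only, not rsync's actual code; the
function names update_by_rename and update_in_place are made up for the
example, and nothing here is a confirmed reproducer for the hang.

```python
import os
import tempfile

def update_by_rename(dest_path, data):
    """Replace dest_path by writing a temporary file in the same directory
    and renaming it over the destination (the non --inplace pattern)."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomic rename over the old file: the old inode and its extents
        # are released, and the directory entry points at a new inode.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

def update_in_place(dest_path, data):
    """Overwrite the existing file's contents, keeping the same inode
    (roughly what --inplace does)."""
    with open(dest_path, "r+b") as f:
        f.write(data)
        f.truncate()
```

The rename style creates and frees an inode per updated file, so it
generates considerably more metadata churn than the in-place style, which
may or may not be relevant to the hangs.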

When I get kernel stacks they show ~50 processes stuck all over the
btrfs metadata manipulation code.  If someone wants to wade through
these I can collect them easily enough.
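
One way such stacks could be gathered is sketched below.  It is a hedged
Python example, not the tooling actually used here: it assumes a kernel
that exposes /proc/<pid>/stack (which needs CONFIG_STACKTRACE and usually
root), and it only looks at thread-group leaders, not individual threads.
The helper names are hypothetical.  Writing 'w' to /proc/sysrq-trigger is
an alternative that dumps blocked tasks to the kernel log.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, kernel_stack) for tasks in uninterruptible sleep."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # Reading the kernel stack normally requires root.
            with open(os.path.join(proc, entry, "stack")) as f:
                stack = f.read()
        except OSError:
            continue  # task exited, or the stack file is not readable
        yield pid, comm, stack

def dump_blocked_tasks(proc="/proc"):
    """Print the kernel stack of every task currently in state D."""
    for pid, comm, stack in blocked_tasks(proc):
        print(f"--- {pid} ({comm}) ---")
        print(stack)
```

Calling dump_blocked_tasks() as root during a hang prints one stack per
stuck process, which is easier to diff across incidents than a full
kernel log.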

The 4th skinny-metadata machine--the one that doesn't hang often--is
the only one that isn't using rsync to receive files from elsewhere.
It's also the busiest filesystem (in IOPS) with the largest variety
in its workload, so all things being equal it should be encountering
more random btrfs problems than the other three.

Some of my machines have multiple filesystems, some with skinny and
some without.  I've tried moving the rsync destination tree to the
non-skinny filesystems on those machines, and in those cases I was able
to complete several rsync updates without incident.  That seems to rule
out any system-level problem.

The 8 filesystems without skinny don't have the hang problem.  They have
had a variety of other issues, but not hangs alone.  Currently 3.17 +
stable-queue patches fix all the problems I've encountered so far with
the non-skinny filesystems, so the skinny filesystems are now earning
most of my attention.

With this small sample size and data collection rate I admit I could
just have a spurious correlation.  The data also supports conclusions
such as "Western Digital hard drives cause hangs" or "filesystems
created in August 2014 cause hangs."  I'd encourage anyone with the
infrastructure set up to do a larger-scale test to see if this is--or
is not--reproducible.
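
To put a rough number on how weak or strong the correlation is, the
counts above (3 of 4 skinny filesystems hanging, 0 of 8 non-skinny) can
be run through a one-sided Fisher exact test.  The Python sketch below is
only illustrative: it assumes the 12 filesystems are independent samples,
which they are not (some share machines), and it says nothing about the
confounders mentioned above.  The function name is made up for the
example.

```python
from math import comb

def fisher_one_sided(hang_a, ok_a, hang_b, ok_b):
    """Probability of seeing at least hang_a hangs in group A if the
    hangs were spread at random over both groups (hypergeometric tail)."""
    group_a = hang_a + ok_a
    group_b = hang_b + ok_b
    hangs = hang_a + hang_b
    total = comb(group_a + group_b, hangs)
    p = 0.0
    for k in range(hang_a, min(group_a, hangs) + 1):
        p += comb(group_a, k) * comb(group_b, hangs - k) / total
    return p

# Group A: skinny metadata (3 hanging, 1 not).
# Group B: non-skinny (0 hanging, 8 not).
p = fisher_one_sided(hang_a=3, ok_a=1, hang_b=0, ok_b=8)
print(f"one-sided p = {p:.4f}")  # 4/220, about 0.0182
```

A value near 0.02 is suggestive but easy to reach by chance when several
candidate explanations are being compared against the same 12 data
points.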
