From: Alin Dobre <alin.dobre@elastichosts.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Deadlock/high load
Date: Fri, 20 Jun 2014 17:22:22 +0100
Message-ID: <53A45FBE.6040306@elastichosts.com>
In-Reply-To: <5399C40F.3070509@elastichosts.com>

On 12/06/14 16:15, Alin Dobre wrote:
> Hi all,
> 
> I have a problem that triggers quite often on our production machines. I
> don't really know what's triggering this or how to reproduce it, but the
> machine enters in some sort of deadlock state, where it consumes all the
> i/o and the load average goes very high in seconds (it even gets to over
> 200), sometimes in about a minute or even less, the machine is
> unresponsive and we have to reset it. Rarely, the load just stays high
> (~25) for hours, but it never gets down again, but this happens rarely,
> as I said. In general, the machine is either already unresponsive or is
> about to become unresponsive.
> 
> The last machine that encountered this has 40 cores and the btrfs
> filesystem is running over SSDs. We encountered this on a plain 3.14
> kernel, and also on the latest 3.14.6 kernel + all the patches whose
> summary is marked "btrfs:" that made it in 3.15, straight forward
> backported (cherry-picked) to 3.14.
> 
> Also, no suspicious (malicious) activity from the running processes either.
> 
> I noticed there was another report on 3.13 which was solved by a 3.15rc
> patch, it doesn't seem to be the same thing.
> 
> Since the only chance to obtain something was via a SysRq dump, here's
> what I could get from the last "w" trigger (tasks that are in
> uninterruptable (blocked) state), showing only tasks that are related to
> btrfs:

I tried to reproduce this on a slower, older machine with older SSDs and
got nowhere: the machine stayed up. However, when I tried one of our
faster, newer machines, which also has newer and faster SSDs, I managed
to reproduce it twice.

I should mention that the disks are set up as an MD RAID6 array, with
btrfs on top of it using the single profile for both data and metadata.
To reproduce the problem I ran bonnie++ (bonnie++ -d /home/bonnie -s 4g
-m test -r 1024 -x 100 -u bonnie) inside a container whose memory was
capped to 1GB with cgroups (hence the -r 1024).
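
For anyone who wants to repeat the run, here is a minimal sketch that
only assembles the same bonnie++ command line and prints it, with each
switch annotated. The helper name bonnie_command is hypothetical, and
the script does not execute bonnie++ or set up the cgroup memory cap;
both are assumed to be handled separately.

```python
import shlex


def bonnie_command(directory, size, label, ram_mb, runs, user):
    """Build the bonnie++ argument list; nothing is executed here."""
    return [
        "bonnie++",
        "-d", directory,    # directory in which the test files are created
        "-s", size,         # size of the test file(s)
        "-m", label,        # machine name used to label the results
        "-r", str(ram_mb),  # RAM size in MiB (1024 matches the 1GB cap)
        "-x", str(runs),    # number of test runs
        "-u", user,         # user to run the benchmark as
    ]


# Values taken from the command quoted in this message.
cmd = bonnie_command("/home/bonnie", "4g", "test", 1024, 100, "bonnie")
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it easy to vary a single value
(for example the run count) between attempts without retyping the rest.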

Just before the machine stopped being fully responsive, three processes
were each consuming 100% CPU: md128_raid6, btrfs-transact and
kworker/u82:6. The load was still fairly low, but atop stopped working
at a load average of ~5.

I couldn't dump the SysRq blocked tasks this time, but the three
processes above also appear in my initial report.

As per Liu Bo's request, here is the output of btrfs filesystem df,
taken when atop was already unresponsive:
Data, single: total=73.01GiB, used=28.05GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=3.01GiB, used=1.04GiB
unknown, single: total=368.00MiB, used=0.00
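
For a rough sense of how full each allocation is, the figures above can
be turned into used/total ratios. The following is a minimal sketch
that assumes the report keeps exactly the line layout shown in this
message; the names REPORT, to_bytes and usage are illustrative only and
are not part of any btrfs tool.

```python
import re

# Report lines copied from this message.
REPORT = """\
Data, single: total=73.01GiB, used=28.05GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=3.01GiB, used=1.04GiB
unknown, single: total=368.00MiB, used=0.00
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>\S+), used=(?P<used>\S+)$"
)


def to_bytes(text):
    """Convert a size such as '73.01GiB' (or a bare '0.00') to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGT]iB|B)?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    unit = match.group(2) or "B"  # a bare number is treated as bytes
    return float(match.group(1)) * UNITS[unit]


def usage(report):
    """Map each block group type to its used/total ratio."""
    ratios = {}
    for line in report.strip().splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a report line
        total = to_bytes(match["total"])
        used = to_bytes(match["used"])
        ratios[match["kind"]] = used / total if total else 0.0
    return ratios


for kind, ratio in usage(REPORT).items():
    print(f"{kind:<9} {ratio:6.1%} of allocated space used")
```

By this measure both data and metadata are well below their allocated
totals, so the numbers alone do not point at an out-of-space condition.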

Another thing to mention is that our production machines also see a
fairly high rate of snapshot (or, more rarely, plain creation) and
deletion operations on quota-enabled subvolumes.

Cheers,
Alin.

Thread overview: 6+ messages
2014-06-12 15:15 Deadlock/high load Alin Dobre
2014-06-13  3:37 ` Liu Bo
2014-06-13  6:46   ` Alin Dobre
2014-06-13  6:50   ` Alin Dobre
2014-06-20 16:22 ` Alin Dobre [this message]
2014-06-27 16:12 ` Alin Dobre
