public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Jens Rosenboom <j.rosenboom@x-ion.de>
Cc: Brian Foster <bfoster@redhat.com>, xfs@oss.sgi.com
Subject: Re: [PATCH v2] [RFC] xfs: allocate log vector buffers outside CIL context lock
Date: Sun, 14 Feb 2016 11:16:45 +1100	[thread overview]
Message-ID: <20160214001645.GF14668@dastard> (raw)
In-Reply-To: <CADr68Wa9vG=ZOn4gbHezEyOeM+tCo69s3WVqcVXnZrn+=DdoVA@mail.gmail.com>

On Sat, Feb 13, 2016 at 06:09:17PM +0100, Jens Rosenboom wrote:
> 2016-01-26 15:17 GMT+01:00 Brian Foster <bfoster@redhat.com>:
> > On Wed, Jan 20, 2016 at 12:58:53PM +1100, Dave Chinner wrote:
> >> From: Dave Chinner <dchinner@redhat.com>
> >>
> >> One of the problems we currently have with delayed logging is that
> >> under serious memory pressure we can deadlock memory reclaim. This
> >> occurs when memory reclaim (such as run by kswapd) is reclaiming XFS
> >> inodes and issues a log force to unpin inodes that are dirty in the
> >> CIL.
....
> >> That said, I don't have a reliable deadlock reproducer in the first
> >> place, so I'm interested in hearing what people think about this
> >> approach to solve the problem and ways to test and improve it.
> >>
> >> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >> ---
> >
> > This seems reasonable to me in principle. It would be nice to have some
> > kind of feedback in terms of effectiveness resolving the original
> > deadlock report. I can't think of a good way of testing short of
> > actually instrumenting the deadlock one way or another, unfortunately.
> > Was there a user that might be willing to test or had a detailed enough
> > description of the workload/environment?
> 
> We have seen this issue on our production Ceph cluster sporadically
> and have tried a long time to reproduce it in a lab environment.
....
> kmem_alloc (mode:0x2408240)
> Feb 13 10:51:57 storage-node35 kernel: [10562.614089] XFS:
> ceph-osd(10078) possible memory allocation deadlock size 32856 in
> kmem_alloc (mode:0x2408240)

High order allocation of 32k. That implies a buffer size of at least
32k is in use. Can you tell me what the output of xfs_info <mntpt>
is for one of your filesystems?

I suspect you are using a 64k directory block size, in which case
I'll ask "are you storing millions of files in a single directory"?
If your answer is no, then "don't do that" is an appropriate
solution because large directory block sizes are slower than the
default (4k) for almost all operations until you get up into the
millions of files per directory range.

> Soon after this, operations get so slow that the OSDs die because of
> their suicide timeouts.
> 
> Then I installed onto 3 servers this patch (applied onto kernel
> v4.4.1). The bad news is that I am still getting the kernel messages
> on these machines. The good news, though, is that they appear at a
> much lower frequency and also the impact on performance seems to be
> lower, so the OSD processes on these three nodes did not get killed.

Right, the patch doesn't fix the underlying issue that memory
fragmentation can prevent high order allocation from succeeding for
long periods.  However, it does ensure that the filesystem does not
immediately deadlock memory reclaim when it happens so the system
has a chance to recover. It can still deadlock the filesystem,
though: if we can't commit the transaction, we can't unlock the
objects held by it, and if something sufficiently important is held
by the blocked transaction, everything else can get stuck behind it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


Thread overview: 9+ messages
2016-01-19  4:31 [PATCH] [RFC] xfs: allocate log vector buffers outside CIL context lock Dave Chinner
2016-01-20  1:58 ` [PATCH v2] " Dave Chinner
2016-01-26 14:17   ` Brian Foster
2016-02-13 17:09     ` Jens Rosenboom
2016-02-14  0:16       ` Dave Chinner [this message]
2016-02-15 11:57         ` Jens Rosenboom
2016-02-15 13:28           ` Dave Chinner
2016-03-02 12:45             ` Gavin Guo
2016-03-02 18:00               ` Darrick J. Wong
