All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>,
	linux-mm@kvack.org
Subject: Re: deadlock with latest xfs
Date: Mon, 27 Oct 2008 16:30:04 +1100	[thread overview]
Message-ID: <20081027053004.GF11948@disturbed> (raw)
In-Reply-To: <49051C71.9040404@sgi.com>

On Mon, Oct 27, 2008 at 12:42:09PM +1100, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Sun, Oct 26, 2008 at 11:53:51AM +1100, Dave Chinner wrote:
>>> On Fri, Oct 24, 2008 at 05:48:04PM +1100, Dave Chinner wrote:
>>>> OK, I just hung a single-threaded rm -rf after this completed:
>>>>
>>>> # fsstress -p 1024 -n 100 -d /mnt/xfs2/fsstress
>>>>
>>>> It has hung with this trace:
....
>> Got it now. I can reproduce this in a couple of minutes now that both
>> the test fs and the fs hosting the UML fs images are using lazy-count=1
>> (and the frequent 10s long host system freezes have gone away, too).
>>
>> Looks like *another* new memory allocation problem [1]:
.....
>> We've entered memory reclaim inside the xfsdatad while trying to do
>> unwritten extent completion during I/O completion, and that memory
>> reclaim is now blocked waiting for I/o completion that cannot make
>> progress.
>>
>> Nasty.
>>
>> My initial though is to make _xfs_trans_alloc() able to take a KM_NOFS argument
>> so we don't re-enter the FS here. If we get an ENOMEM in this case, we should
>> then re-queue the I/O completion at the back of the workqueue and let other
>> I/o completions progress before retrying this one. That way the I/O that
>> is simply cleaning memory will make progress, hence allowing memory
>> allocation to occur successfully when we retry this I/O completion...
> It could work - unless it's a synchronous I/O in which case the I/O is not
> complete until the extent conversion takes place.

Right. Pushing unwritten extent conversion onto a different
workqueue is probably the only way to handle this easily.
That's the same solution Irix has been using for a long time
(the xfsc thread)....

> Could we allocate the memory up front before the I/O is issued?

Possibly, but that will create more memory pressure than
allocation in I/O completion because now we could need to hold
thousands of allocations across an I/O - think of the case where
we are running low on memory and have a disk subsystem capable of
a few hundred thousand I/Os per second. the allocation failing would
prevent the I/os from being issued, and if this is buffered writes
into unwritten extents we'd be preventing dirty pages from being
cleaned....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, xfs-oss <xfs@oss.sgi.com>,
	linux-mm@kvack.org
Subject: Re: deadlock with latest xfs
Date: Mon, 27 Oct 2008 16:30:04 +1100	[thread overview]
Message-ID: <20081027053004.GF11948@disturbed> (raw)
In-Reply-To: <49051C71.9040404@sgi.com>

On Mon, Oct 27, 2008 at 12:42:09PM +1100, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Sun, Oct 26, 2008 at 11:53:51AM +1100, Dave Chinner wrote:
>>> On Fri, Oct 24, 2008 at 05:48:04PM +1100, Dave Chinner wrote:
>>>> OK, I just hung a single-threaded rm -rf after this completed:
>>>>
>>>> # fsstress -p 1024 -n 100 -d /mnt/xfs2/fsstress
>>>>
>>>> It has hung with this trace:
....
>> Got it now. I can reproduce this in a couple of minutes now that both
>> the test fs and the fs hosting the UML fs images are using lazy-count=1
>> (and the frequent 10s long host system freezes have gone away, too).
>>
>> Looks like *another* new memory allocation problem [1]:
.....
>> We've entered memory reclaim inside the xfsdatad while trying to do
>> unwritten extent completion during I/O completion, and that memory
>> reclaim is now blocked waiting for I/o completion that cannot make
>> progress.
>>
>> Nasty.
>>
>> My initial though is to make _xfs_trans_alloc() able to take a KM_NOFS argument
>> so we don't re-enter the FS here. If we get an ENOMEM in this case, we should
>> then re-queue the I/O completion at the back of the workqueue and let other
>> I/o completions progress before retrying this one. That way the I/O that
>> is simply cleaning memory will make progress, hence allowing memory
>> allocation to occur successfully when we retry this I/O completion...
> It could work - unless it's a synchronous I/O in which case the I/O is not
> complete until the extent conversion takes place.

Right. Pushing unwritten extent conversion onto a different
workqueue is probably the only way to handle this easily.
That's the same solution Irix has been using for a long time
(the xfsc thread)....

> Could we allocate the memory up front before the I/O is issued?

Possibly, but that will create more memory pressure than
allocation in I/O completion because now we could need to hold
thousands of allocations across an I/O - think of the case where
we are running low on memory and have a disk subsystem capable of
a few hundred thousand I/Os per second. the allocation failing would
prevent the I/os from being issued, and if this is buffered writes
into unwritten extents we'd be preventing dirty pages from being
cleaned....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-10-27  5:30 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-23  9:17 deadlock with latest xfs Lachlan McIlroy
2008-10-23 20:57 ` Christoph Hellwig
2008-10-23 22:28   ` Dave Chinner
2008-10-24  3:08   ` Lachlan McIlroy
2008-10-24  5:24     ` Dave Chinner
2008-10-24  6:48       ` Dave Chinner
2008-10-26  0:53         ` Dave Chinner
2008-10-26  2:50           ` Dave Chinner
2008-10-26  2:50             ` Dave Chinner
2008-10-26  4:20             ` Dave Chinner
2008-10-26  4:20               ` Dave Chinner
2008-10-27  1:42             ` Lachlan McIlroy
2008-10-27  1:42               ` Lachlan McIlroy
2008-10-27  5:30               ` Dave Chinner [this message]
2008-10-27  5:30                 ` Dave Chinner
2008-10-27  6:29                 ` Lachlan McIlroy
2008-10-27  6:29                   ` Lachlan McIlroy
2008-10-27  6:54                   ` Dave Chinner
2008-10-27  6:54                     ` Dave Chinner
2008-10-27  7:31                     ` Lachlan McIlroy
2008-10-27  7:31                       ` Lachlan McIlroy
2008-10-28  6:02             ` Nick Piggin
2008-10-28  6:25               ` Dave Chinner
2008-10-28  6:25                 ` Dave Chinner
2008-10-28  8:56                 ` Nick Piggin
2008-10-24  8:46       ` Lachlan McIlroy
2008-10-26 22:39     ` Dave Chinner
2008-10-27  2:30       ` Timothy Shimmin
2008-10-27  5:47         ` Dave Chinner
2008-10-27  7:33       ` Lachlan McIlroy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081027053004.GF11948@disturbed \
    --to=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=lachlan@sgi.com \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.