From: Dave Chinner <david@fromorbit.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails
Date: Wed, 8 Sep 2010 18:49:23 +1000	[thread overview]
Message-ID: <20100908084923.GW705@dastard> (raw)
In-Reply-To: <20100906215023.GC7362@dastard>

On Tue, Sep 07, 2010 at 07:50:23AM +1000, Dave Chinner wrote:
> On Mon, Sep 06, 2010 at 09:40:15AM +0100, Mel Gorman wrote:
> > On Mon, Sep 06, 2010 at 02:02:43PM +1000, Dave Chinner wrote:
> > > I just went to grab the CAL counters, and found the system in
> > > another livelock.  This time I managed to start the sysrq-trigger
> > > dump while the livelock was in progress - I basically got one shot
> > > at a command before everything stopped responding. Now I'm waiting
> > > for the livelock to pass.... 5min.... the fs_mark workload
> > > has stopped (ctrl-c finally responded), still livelocked....
> > > 10min.... 15min.... 20min.... OK, back now.
> > > 
> > > Interesting - all the fs_mark processes are in D state waiting on IO
> > > completion processing.
> > 
> > Very interesting, maybe they are all stuck in congestion_wait() this
> > time? There are a few sources where that is possible.
> 
> No, they are waiting on log IO completion, not doing allocation or
> in the VM at all.  They are stuck in xlog_get_iclog_state() waiting for
> all the log IO buffers to be processed, which are stuck behind the
> inode buffer IO completions in the kworker threads that I posted.
> 
> This potentially is caused by the kworker thread consolidation - log
> IO completion processing used to be in a separate workqueue for
> processing latency and deadlock prevention reasons - the data and
> metadata IO completion can block, whereas we need the log IO
> completion to occur as quickly as possible. I've seen one deadlock
> that the separate work queues solved w.r.t. loop devices, and I
> suspect that part of the problem here is that transaction completion
> cannot occur (and free the memory it and the CIL holds) because log IO
> completion processing is being delayed significantly by metadata IO
> completion...
.....
> > > Which shows that this wasn't an IPI storm that caused this
> > > particular livelock.
> > 
> > No, but it's possible we got stuck somewhere like too_many_isolated() or
> > in congestion_wait. One thing at a time though, would you mind testing
> > the following patch? I haven't tested this *at all* but it should reduce
> > the number of times drain_all_pages() are called further while not
> > eliminating them entirely.
> 
> Ok, I'll try it later today, but first I think I need to do some
> deeper investigation on the kworker thread behaviour....

Ok, so an update is needed here. I have confirmed that the above
livelock was caused by the kworker thread consolidation, and I have
a fix for it: make the log IO completion workqueue WQ_HIGHPRI so its
work is queued ahead of the data/metadata IO completions. With that
fix in place I've been able to create over a billion inodes without
a livelock occurring. See the thread titled "[2.6.36-rc3] Workqueues,
XFS, dependencies and deadlock" if you want more details.
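
For anyone not familiar with the workqueue flags involved, here's a
minimal sketch of the kind of change described above. It assumes the
alloc_workqueue() API from the 2.6.36 concurrency-managed workqueue
rework; the names are illustrative and this is not the actual XFS
patch:

	#include <linux/workqueue.h>

	/* Illustrative only: a dedicated queue for log IO completion work. */
	static struct workqueue_struct *log_ioend_wq;

	static int __init log_ioend_wq_init(void)
	{
		/*
		 * WQ_HIGHPRI puts this queue's work items on the
		 * high-priority worker pool, so log IO completion does
		 * not sit queued behind data/metadata IO completion
		 * work that may block.
		 */
		log_ioend_wq = alloc_workqueue("log-ioend", WQ_HIGHPRI, 0);
		if (!log_ioend_wq)
			return -ENOMEM;
		return 0;
	}

	/*
	 * At IO completion time the work is then queued on the dedicated
	 * queue instead of a shared one, e.g.:
	 *
	 *	queue_work(log_ioend_wq, &some_log_ioend_work);
	 */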

To confirm that I've been seeing two different livelocks, I removed
Mel's series from my tree (which still contained the above workqueue
fix), and I started seeing short memory allocation livelocks (10-15s
at most) with abnormal increases in CAL counts indicating an
increase in IPIs during the short livelocks.  IOWs, the livelock
wasn't as severe as before the workqueue fix, but it was still
present. Hence the workqueue issue was definitely a contributing
factor to the severity of the memory-allocation-triggered issue.
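
For reference, the reason the CAL counters track these livelocks is
that drain_all_pages() broadcasts a function-call IPI to every CPU
to flush the per-cpu page lists, and CAL in /proc/interrupts counts
exactly those IPIs. Roughly, from memory of the 2.6.36-era
mm/page_alloc.c (a sketch, not the exact code):

	void drain_local_pages(void *arg)
	{
		/* flush this CPU's pcp free lists back to the buddy lists */
		drain_pages(smp_processor_id());
	}

	void drain_all_pages(void)
	{
		/*
		 * on_each_cpu() sends a function-call IPI to every other
		 * CPU and waits for drain_local_pages() to run there, so
		 * frequent calls from the direct reclaim path show up as
		 * a jump in the CAL row of /proc/interrupts.
		 */
		on_each_cpu(drain_local_pages, NULL, 1);
	}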

It is clear that there have been two different livelocks with
different causes triggered by the same test, which has led to a lot
of confusion in this thread. It appears that Mel's patch series as
originally posted in this thread is all that is necessary to avoid
the memory allocation livelock I was seeing. The workqueue fix
solves the other livelock I was seeing once Mel's patches were in
place.

Thanks to everyone for helping me track these livelocks down and
providing lots of suggestions for things to try. I'll keep testing
and looking for livelocks, but my confidence is increasing that
we've got to the root of them now. 

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



Thread overview: 55+ messages
2010-09-03  9:08 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V4 Mel Gorman
2010-09-03  9:08 ` [PATCH 1/3] mm: page allocator: Update free page counters after pages are placed on the free list Mel Gorman
2010-09-03 22:38   ` Andrew Morton
2010-09-05 18:06     ` Mel Gorman
2010-09-03  9:08 ` [PATCH 2/3] mm: page allocator: Calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Mel Gorman
2010-09-03 22:55   ` Andrew Morton
2010-09-03 23:17     ` Christoph Lameter
2010-09-03 23:28       ` Andrew Morton
2010-09-04  0:54         ` Christoph Lameter
2010-09-05 18:12     ` Mel Gorman
2010-09-03  9:08 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2010-09-03 23:00   ` Andrew Morton
2010-09-04  2:25     ` Dave Chinner
2010-09-04  3:21       ` Andrew Morton
2010-09-04  7:58         ` Dave Chinner
2010-09-04  8:14           ` Dave Chinner
     [not found]             ` <20100905015400.GA10714@localhost>
     [not found]               ` <20100905021555.GG705@dastard>
     [not found]                 ` <20100905060539.GA17450@localhost>
     [not found]                   ` <20100905131447.GJ705@dastard>
2010-09-05 13:45                     ` Wu Fengguang
2010-09-05 23:33                       ` Dave Chinner
2010-09-06  4:02                       ` Dave Chinner
2010-09-06  8:40                         ` Mel Gorman
2010-09-06 21:50                           ` Dave Chinner
2010-09-08  8:49                             ` Dave Chinner [this message]
2010-09-09 12:39                               ` Mel Gorman
2010-09-10  6:17                                 ` Dave Chinner
2010-09-07 14:23                         ` Christoph Lameter
2010-09-08  2:13                           ` Wu Fengguang
2010-09-04  3:23       ` Wu Fengguang
2010-09-04  3:59         ` Andrew Morton
2010-09-04  4:37           ` Wu Fengguang
2010-09-05 18:22       ` Mel Gorman
2010-09-05 18:14     ` Mel Gorman
2010-09-08  7:43   ` KOSAKI Motohiro
2010-09-08 20:05     ` Christoph Lameter
2010-09-09 12:41     ` Mel Gorman
2010-09-09 13:45       ` Christoph Lameter
2010-09-09 13:55         ` Mel Gorman
2010-09-09 14:32           ` Christoph Lameter
2010-09-09 15:05             ` Mel Gorman
2010-09-10  2:56               ` KOSAKI Motohiro
2010-09-03 23:05 ` [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V4 Andrew Morton
2010-09-21 11:17   ` Mel Gorman
2010-09-21 12:58     ` [stable] " Greg KH
2010-09-21 14:23       ` Mel Gorman
2010-09-23 18:49         ` Greg KH
2010-09-24  9:14           ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2010-08-31 17:37 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V3 Mel Gorman
2010-08-31 17:37 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2010-08-31 18:26   ` Christoph Lameter
2010-08-23  8:00 [PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator V2 Mel Gorman
2010-08-23  8:00 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2010-08-23 23:17   ` KOSAKI Motohiro
2010-08-16  9:42 [RFC PATCH 0/3] Reduce watermark-related problems with the per-cpu allocator Mel Gorman
2010-08-16  9:42 ` [PATCH 3/3] mm: page allocator: Drain per-cpu lists after direct reclaim allocation fails Mel Gorman
2010-08-16 14:50   ` Rik van Riel
2010-08-17  2:57   ` Minchan Kim
2010-08-18  3:02   ` KAMEZAWA Hiroyuki
2010-08-19 14:47   ` Minchan Kim
2010-08-19 15:10     ` Mel Gorman
