linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Chris Mason <chris.mason@oracle.com>,
	Frans Pop <elendil@planet.nl>,
	David Rientjes <rientjes@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Reinette Chatre <reinette.chatre@intel.com>,
	Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>,
	Karol Lewandowski <karol.k.lewandowski@gmail.com>,
	Mohamed Abbas <mohamed.abbas@intel.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	"John W. Linville" <linville@tuxdriver.com>,
	linux-mm@kvack.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn
Date: Tue, 20 Oct 2009 11:48:39 +0100	[thread overview]
Message-ID: <20091020104839.GC11778@csn.ul.ie> (raw)
In-Reply-To: <20091019161815.GA11487@think>

On Tue, Oct 20, 2009 at 01:18:15AM +0900, Chris Mason wrote:
> On Mon, Oct 19, 2009 at 03:01:52PM +0100, Mel Gorman wrote:
> > 
> > > During the 2nd phase I see the first SKB allocation errors with a music 
> > > skip between reading commits 95.000 and 110.000.
> > > About commit 115.000 there is a very long pause during which the counter 
> > > does not increase, music stops and the desktop freezes completely. The 
> > > first 30 seconds of that freeze there is only very low disk activity (which 
> > > seems strange);
> > 
> > I'm just going to have to depend on Jens here. Jens, the congestion_wait() is
> > on BLK_RW_ASYNC after the commit. Reclaim usually writes pages asynchronously
> > but lumpy reclaim actually waits of pages to write out synchronously so
> > it's not always async.
> 
> Waiting doesn't make it synchronous from the elevator point of view ;)
> If you're using WB_SYNC_NONE, it's a async write.  WB_SYNC_ALL makes it
> a sync write.  I only see WB_SYNC_NONE in vmscan.c, so we should be
> using the async congestion wait.  (the exception is xfs which always
> does async writes).
> 

Right, reclaim always queues the pages for async IO but for lumpy reclaim,
it calls wait_on_page_writeback() but as you say, from an elevator point of
view, it's still async.

> But I'm honestly not 100% sure.  Looking back through the emails, the
> test case is doing IO on top of a whole lot of things on top of
> dm-crypt?  I just tried to figure out if dm-crypt is turning the async
> IO into sync IOs, but didn't quite make sense of it.
> 

I'm not overly sure either.

> Could you also please include which filesystems were being abused during
> the test and how?  Reading through the emails, I think you've got:
> 
> gitk being run 3 times on some FS (NFS?)
> streaming reads on NFS
> swap on dm-crypt
> 
> If other filesystems are being used, please correct me.  Also please
> include if they are on crypto or straight block device.
> 

I've attached a patch below that should allow us to cheat. When it's applied,
it outputs who called congestion_wait(), how long the timeout was and how
long it waited for. By comparing before and after sleep times, we should
be able to see which of the callers has significantly changed and if
it's something easily addressable.

> > Either way, reclaim is usually worried about writing pages but it would appear
> > after this change that a lot of read activity can also stall a process in
> > direct reclaim. What might be happening in Frans's particular case is that the
> > tasklet that allocates high-order pages for the RX buffers is getting stalled
> > by congestion caused by other processes doing reads from the filesystem.
> > While it makes sense from a congestion point of view to halt the IO, the
> > reclaim operations from direct reclaimers is getting delayed for long enough
> > to cause problems for GFP_ATOMIC.
> 
> The congestion_wait code either waits for congestion to clear or for
> a given timeout.  The part that isn't clear is if before the patch
> we waited a very short time (congestion cleared quickly) or a very long
> time (we hit the timeout or congestion cleared slowly).
> 

Using the instrumentation patch, I found with a very basic test that we
are waiting for short periods of time more often with the patch applied

      1 congestion_wait   rw=1 delay 6 timeout 25 :: before commit
      7 kswapd            congestion_wait   rw=1 delay 0 timeout 25 :: before commit
     32 kswapd            congestion_wait sync=0 delay 0 timeout 25 :: after commit
     61 kswapd            congestion_wait   rw=1 delay 1 timeout 25 :: before commit
    133 kswapd            congestion_wait sync=0 delay 1 timeout 25 :: after commit
     16 kswapd            congestion_wait   rw=1 delay 2 timeout 25 :: before commit
     70 kswapd            congestion_wait sync=0 delay 2 timeout 25 :: after commit
      1 try_to_free_pages congestion_wait sync=0 delay 2 timeout 25 :: after commit
     17 kswapd            congestion_wait   rw=1 delay 3 timeout 25 :: before commit
     28 kswapd            congestion_wait sync=0 delay 3 timeout 25 :: after commit
      1 try_to_free_pages congestion_wait sync=0 delay 3 timeout 25 :: after commit
     23 kswapd            congestion_wait   rw=1 delay 4 timeout 25 :: before commit
     16 kswapd            congestion_wait sync=0 delay 4 timeout 25 :: after commit
      5 try_to_free_pages congestion_wait sync=0 delay 4 timeout 25 :: after commit
     20 kswapd            congestion_wait   rw=1 delay 5 timeout 25 :: before commit
     18 kswapd            congestion_wait sync=0 delay 5 timeout 25 :: after commit
      3 try_to_free_pages congestion_wait sync=0 delay 5 timeout 25 :: after commit
     21 kswapd            congestion_wait   rw=1 delay 6 timeout 25 :: before commit
      8 kswapd            congestion_wait sync=0 delay 6 timeout 25 :: after commit
      2 try_to_free_pages congestion_wait sync=0 delay 6 timeout 25 :: after commit
     13 kswapd            congestion_wait   rw=1 delay 7 timeout 25 :: before commit
     12 kswapd            congestion_wait sync=0 delay 7 timeout 25 :: after commit
      2 try_to_free_pages congestion_wait sync=0 delay 7 timeout 25 :: after commit
      8 kswapd            congestion_wait   rw=1 delay 8 timeout 25 :: before commit
      7 kswapd            congestion_wait sync=0 delay 8 timeout 25 :: after commit
      9 kswapd            congestion_wait   rw=1 delay 9 timeout 25 :: before commit
      5 kswapd            congestion_wait sync=0 delay 9 timeout 25 :: after commit
      2 try_to_free_pages congestion_wait sync=0 delay 9 timeout 25 :: after commit
      4 kswapd            congestion_wait   rw=1 delay 10 timeout 25 :: before commit
      5 kswapd            congestion_wait sync=0 delay 10 timeout 25 :: after commit
      1 try_to_free_pages congestion_wait sync=0 delay 10 timeout 25 :: after commit
[... remaining output snipped ...]

The before and after commit are really 2.6.31 and 2.6.31-patch-reverted.
The first column is how many times we delayed for that length of time.
To generate the output, I just took the console log from both kernels with
a basic test, put the congestion_wait lines into two separate files and

cat congestion-*-sorted | sort -n -k5 | uniq -c

to give a count of how many times we delayed for a particular caller.

> The easiest way to tell is to just replace the congestion_wait() calls
> in direct reclaim with schedule_timeout_interruptible(10), test, then
> schedule_timeout_interruptible(HZ/20), then test again.
> 

Reclaim can also call congestion_wait() and maybe the problem isn't
within the page allocator at all but that it's indirectly affected by
timing.

> > 
> > Does this sound plausible to you? If so, what's the best way of
> > addressing this? Changing congestion_wait back to WRITE (assuming that
> > works for Frans)? Changing it to SYNC (again, assuming it actually
> > works) or a revert?
> 
> I don't think changing it to SYNC is a good plan unless we're actually
> doing sync io.  It would be better to just wait on one of the pages that
> you've sent down (or its hashed waitqueue since the page can go away).
> 

Frans, is there any chance you could apply the following patch and get
the console logs for a vanilla kernel and with the congestion patches
reverted? I'm hoping it'll be able to tell us which of the callers has
significantly changed in timing. If there is one caller that has
significantly changed, it might be enough to address just that caller.

=====

  parent reply	other threads:[~2009-10-20 10:48 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <3onW63eFtRF.A.xXH.oMTxKB@chimera>
     [not found] ` <COE24pZSBH.A.k2B.ZNTxKB@chimera>
     [not found]   ` <200910021111.55749.elendil@planet.nl>
2009-10-05  5:13     ` [Bug #14141] order 2 page allocation failures in iwlagn Frans Pop
2009-10-05  6:50       ` Frans Pop
2009-10-05  8:54         ` Frans Pop
2009-10-05  8:57         ` Mel Gorman
2009-10-05 21:34           ` Frans Pop
2009-10-06  0:04             ` David Rientjes
2009-10-06  1:25               ` KOSAKI Motohiro
2009-10-06  8:53               ` Mel Gorman
2009-10-06  9:14                 ` David Rientjes
2009-10-06  9:22                   ` Mel Gorman
2009-10-06 10:23               ` Frans Pop
2009-10-11 23:10         ` Frans Pop
2009-10-11 23:36           ` Frans Pop
2009-10-12 13:43           ` Mel Gorman
2009-10-12 17:32             ` Frans Pop
2009-10-12 18:43               ` Mel Gorman
2009-10-13 20:38               ` Frans Pop
2009-10-14 10:30                 ` Mel Gorman
2009-10-14 13:10                   ` Frans Pop
2009-10-14 15:40                     ` Mel Gorman
2009-10-14 16:13                       ` Frans Pop
2009-10-14 18:34                       ` Frans Pop
2009-10-14 23:56                         ` Mel Gorman
2009-10-15 20:15                           ` Frans Pop
2009-10-16  9:39                             ` Mel Gorman
2009-10-14 16:30                     ` reinette chatre
2009-10-18 23:33                     ` Frans Pop
2009-10-19  0:36                       ` Pekka Enberg
2009-10-19  2:44                         ` Frans Pop
2009-10-19  9:49                           ` [Bug #14141] order 2 page allocation failures (generic) Tobi Oetiker
2009-10-19  9:54                             ` Pekka Enberg
2009-10-19 14:01                               ` Karol Lewandowski
2009-10-19 14:06                                 ` Mel Gorman
2009-10-19 17:09                                   ` Karol Lewandowski
2009-10-20  1:47                                     ` Karol Lewandowski
2009-10-19 13:31                             ` Mel Gorman
2009-10-19 13:40                               ` Tobias Oetiker
2009-10-19 14:09                                 ` Mel Gorman
2009-10-19 14:16                                   ` Tobias Oetiker
2009-10-19 14:59                                     ` Mel Gorman
2009-10-19 20:12                                       ` Tobias Oetiker
2009-10-19 20:17                                         ` Tobias Oetiker
2009-10-20 10:57                                           ` Mel Gorman
2009-10-20 11:44                                             ` Tobias Oetiker
2009-10-20 12:51                                               ` Mel Gorman
2009-10-20 12:58                                                 ` Tobias Oetiker
2009-10-20 13:39                                                   ` Mel Gorman
2009-10-20 13:50                                                     ` Tobias Oetiker
2009-10-20 14:14                                                       ` Mel Gorman
2009-10-20 14:20                                                         ` Tobias Oetiker
2009-10-22 10:27                                                     ` Tobias Oetiker
2009-10-19  2:52                         ` [Bug #14141] order 2 page allocation failures in iwlagn Jens Axboe
2009-10-19 14:01                       ` Mel Gorman
2009-10-19 16:18                         ` Chris Mason
2009-10-19 17:01                           ` Christoph Hellwig
2009-10-19 21:57                             ` Chris Mason
2009-10-19 17:01                           ` Christoph Hellwig
2009-10-20 10:48                           ` Mel Gorman [this message]
2009-10-26 21:06                             ` Frans Pop
2009-10-27 14:54                               ` Mel Gorman
2009-10-27 15:16                                 ` KOSAKI Motohiro
2009-10-27 15:21                                   ` Mel Gorman
2009-10-27 15:52                                 ` Mel Gorman
2009-10-27 16:03                                   ` Chris Mason
2009-10-27 17:21                                     ` Frans Pop
2009-10-27 17:21                                     ` Frans Pop
2009-11-05 20:14                               ` Frans Pop
2009-11-06  9:51                                 ` Frans Pop
2009-11-09 19:00                                   ` Mel Gorman
2009-10-20 10:48                           ` Mel Gorman
2009-10-25 18:54                           ` Frans Pop
2009-10-14 16:28                   ` reinette chatre
2009-10-14 16:50                     ` Mel Gorman
2009-10-14 20:41                       ` reinette chatre
2009-10-14 21:33                         ` Frans Pop
2009-10-14 21:55                           ` reinette chatre
2009-10-15  2:02                         ` Frans Pop
2009-10-15 15:29                           ` reinette chatre
2009-10-15 19:41                             ` Frans Pop
2009-10-16 17:21                               ` reinette chatre
2009-10-17  5:42                               ` reinette chatre
2009-10-27 11:10                                 ` Frans Pop
2009-10-27 16:15                                   ` reinette chatre
     [not found] ` <COE24pZSBH.A.rP.2MTxKB@chimera>
2009-10-21 20:04   ` [PATCH] SLUB: Don't drop __GFP_NOFAIL completely from allocate_slab() (was: Re: [Bug #14265] ifconfig: page allocation failure. order:5,ode:0x8020 w/ e100) Karol Lewandowski
2009-10-21 21:06     ` David Rientjes
2009-10-21 21:20       ` Karol Lewandowski
2009-10-22 10:20         ` Mel Gorman
2009-10-22 21:33           ` Karol Lewandowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091020104839.GC11778@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=bzolnier@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=elendil@planet.nl \
    --cc=jens.axboe@oracle.com \
    --cc=karol.k.lewandowski@gmail.com \
    --cc=kernel-testers@vger.kernel.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linville@tuxdriver.com \
    --cc=mohamed.abbas@intel.com \
    --cc=penberg@cs.helsinki.fi \
    --cc=reinette.chatre@intel.com \
    --cc=rientjes@google.com \
    --cc=rjw@sisk.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).