Re: [PATCH] mm: disallow direct reclaim page writeback

From: Dave Chinner <david@fromorbit.com>
To: Mel Gorman <mel@csn.ul.ie>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Tue, 13 Apr 2010 21:19:02 +1000
Message-ID: <20100413111902.GY2493@dastard>
In-Reply-To: <20100413095815.GU25756@csn.ul.ie>

On Tue, Apr 13, 2010 at 10:58:15AM +0100, Mel Gorman wrote:
> On Tue, Apr 13, 2010 at 10:17:58AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > When we enter direct reclaim we may have used an arbitrary amount of stack
> > space, and hence entering the filesystem to do writeback can then lead to
> > stack overruns. This problem was recently encountered on x86_64 systems with
> > 8k stacks running XFS with simple storage configurations.
> > 
> > Writeback from direct reclaim also adversely affects background writeback. The
> > background flusher threads should already be taking care of cleaning dirty
> > pages, and direct reclaim will kick them if they aren't already doing work. If
> > direct reclaim is also calling ->writepage, it will cause the IO patterns from
> > the background flusher threads to be upset by LRU-order writeback from
> > pageout() which can be effectively random IO. Having competing sources of IO
> > trying to clean pages on the same backing device reduces throughput by
> > increasing the amount of seeks that the backing device has to do to write back
> > the pages.
> > 
> 
> It's already known that the VM requesting specific pages be cleaned and
> reclaimed is a bad IO pattern but unfortunately it is still required by
> lumpy reclaim. This change would appear to break that although I haven't
> tested it to be 100% sure.

How do you test it? I'd really like to be able to test this myself....

> Even without high-order considerations, this patch would appear to make
> fairly large changes to how direct reclaim behaves. It would no longer
> wait on page writeback for example so direct reclaim will return sooner

AFAICT it still waits for pages under writeback in exactly the same manner
it does now. shrink_page_list() does the following completely
separately to the sc->may_writepage flag:

        may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
                (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));

        if (PageWriteback(page)) {
                /*
                 * Synchronous reclaim is performed in two passes,
                 * first an asynchronous pass over the list to
                 * start parallel writeback, and a second synchronous
                 * pass to wait for the IO to complete.  Wait here
                 * for any page for which writeback has already
                 * started.
                 */
                if (sync_writeback == PAGEOUT_IO_SYNC && may_enter_fs)
                        wait_on_page_writeback(page);
                else
                        goto keep_locked;
        }

So if the page is under writeback, PAGEOUT_IO_SYNC is set and
we can enter the fs, it will still wait for writeback to complete
just like it does now.

However, the current code only uses PAGEOUT_IO_SYNC in lumpy
reclaim, so for most typical workloads direct reclaim does not wait
on page writeback, either. Hence, this patch doesn't appear to
change the actions taken on a page under writeback in direct
reclaim....
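
To make the independence from sc->may_writepage concrete, here is a
minimal userspace sketch of the decision quoted above. It is not kernel
code: every sketch_* name, the struct, and the flag bit values are
hypothetical stand-ins invented for illustration. Note that the function
takes no may_writepage argument at all; only the gfp mask, the page
state and the sync mode feed into the wait decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins. The bit values are illustrative
 * only and are not the kernel's real __GFP_IO/__GFP_FS values.
 */
#define SKETCH_GFP_IO 0x1u
#define SKETCH_GFP_FS 0x2u

enum sketch_pageout_io { SKETCH_IO_ASYNC, SKETCH_IO_SYNC };
enum sketch_action { SKETCH_PROCEED, SKETCH_WAIT, SKETCH_KEEP_LOCKED };

struct sketch_page {
	bool writeback;	/* models PageWriteback(page) */
	bool swapcache;	/* models PageSwapCache(page) */
};

/* Mirrors the may_enter_fs expression from the quoted excerpt. */
bool sketch_may_enter_fs(const struct sketch_page *page, unsigned int gfp_mask)
{
	return (gfp_mask & SKETCH_GFP_FS) ||
	       (page->swapcache && (gfp_mask & SKETCH_GFP_IO));
}

/*
 * Decide what happens to a page in the reclaim scan:
 *  - not under writeback: carry on with the rest of the scan
 *  - under writeback, sync pass, fs entry allowed: wait for the IO
 *  - otherwise: leave the page locked on the list (goto keep_locked)
 */
enum sketch_action sketch_writeback_action(const struct sketch_page *page,
					   unsigned int gfp_mask,
					   enum sketch_pageout_io sync_writeback)
{
	assert(page != NULL);

	if (!page->writeback)
		return SKETCH_PROCEED;
	if (sync_writeback == SKETCH_IO_SYNC &&
	    sketch_may_enter_fs(page, gfp_mask))
		return SKETCH_WAIT;
	return SKETCH_KEEP_LOCKED;
}
```

For a page under writeback, calling sketch_writeback_action() with
SKETCH_GFP_FS and SKETCH_IO_SYNC yields SKETCH_WAIT, while the same call
with SKETCH_IO_ASYNC yields SKETCH_KEEP_LOCKED, which is the
non-lumpy-reclaim case described above.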

> than it did potentially going OOM if there were a lot of dirty pages and
> it made no progress during direct reclaim.

I did a fair bit of low/small memory testing. This is a subjective
observation, but I definitely seemed to get less severe OOM
situations and better overall responsiveness with this patch than
when direct reclaim was doing writeback.

> > Hence for direct reclaim we should not allow ->writepages to be entered at all.
> > Set up the relevant scan_control structures to enforce this, and prevent
> > sc->may_writepage from being set in other places in the direct reclaim path in
> > response to other events.
> > 
> 
> If an FS caller cannot re-enter the FS, it should be using GFP_NOFS
> instead of GFP_KERNEL.

This problem is not a filesystem recursion problem which is, as I
understand it, what GFP_NOFS is used to prevent. It's _any_ kernel
code that uses significant stack before trying to allocate memory
that is the problem. e.g. a select() system call:

       Depth    Size   Location    (47 entries)
       -----    ----   --------
 0)     7568      16   mempool_alloc_slab+0x16/0x20
 1)     7552     144   mempool_alloc+0x65/0x140
 2)     7408      96   get_request+0x124/0x370
 3)     7312     144   get_request_wait+0x29/0x1b0
 4)     7168      96   __make_request+0x9b/0x490
 5)     7072     208   generic_make_request+0x3df/0x4d0
 6)     6864      80   submit_bio+0x7c/0x100
 7)     6784      96   _xfs_buf_ioapply+0x128/0x2c0 [xfs]
....
32)     3184      64   xfs_vm_writepage+0xab/0x160 [xfs]
33)     3120     384   shrink_page_list+0x65e/0x840
34)     2736     528   shrink_zone+0x63f/0xe10
35)     2208     112   do_try_to_free_pages+0xc2/0x3c0
36)     2096     128   try_to_free_pages+0x77/0x80
37)     1968     240   __alloc_pages_nodemask+0x3e4/0x710
38)     1728      48   alloc_pages_current+0x8c/0xe0
39)     1680      16   __get_free_pages+0xe/0x50
40)     1664      48   __pollwait+0xca/0x110
41)     1616      32   unix_poll+0x28/0xc0
42)     1584      16   sock_poll+0x1d/0x20
43)     1568     912   do_select+0x3d6/0x700
44)      656     416   core_sys_select+0x18c/0x2c0
45)      240     112   sys_select+0x4f/0x110
46)      128     128   system_call_fastpath+0x16/0x1b

There's 1.6k of stack used before memory allocation is called, 3.1k
used by the time ->writepage is entered, XFS used another 3.5k, and
if the mempool needed to allocate a page it would have blown the
stack. If there was any significant storage subsystem (add dm, md
and/or scsi of some kind), it would have blown the stack.
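
The arithmetic behind those figures can be reproduced from the Depth
column of the trace. The sketch below is only a worked calculation: the
depth constants are copied from the trace above, the 8192-byte limit
assumes the "8k stacks" mentioned in the patch description, and all
names are hypothetical. Depth is cumulative, so the cost of a group of
frames is the difference between the depths at its two edges.

```c
#include <assert.h>

/* Assumes an 8 KiB kernel stack, as stated in the patch description. */
enum { STACK_BYTES = 8192 };

/* Cumulative "Depth" values copied from the trace. */
enum {
	DEPTH_DEEPEST     = 7568,	/*  0) mempool_alloc_slab */
	DEPTH_FS_TOP      = 6784,	/*  7) _xfs_buf_ioapply, deepest XFS frame shown */
	DEPTH_RECLAIM_TOP = 3120,	/* 33) shrink_page_list, calls ->writepage */
	DEPTH_CALLER_TOP  = 1664,	/* 40) __pollwait, calls the page allocator */
};

/* Bytes consumed between two cumulative depth readings. */
int segment_bytes(int deeper, int shallower)
{
	assert(deeper >= shallower);
	return deeper - shallower;
}

/* Stack used by select() and friends before the allocator is entered. */
int caller_bytes(void)
{
	return DEPTH_CALLER_TOP;
}

/* Page allocator plus direct reclaim, up to the ->writepage call. */
int reclaim_bytes(void)
{
	return segment_bytes(DEPTH_RECLAIM_TOP, DEPTH_CALLER_TOP);
}

/* Frames from xfs_vm_writepage down to _xfs_buf_ioapply. */
int fs_bytes(void)
{
	return segment_bytes(DEPTH_FS_TOP, DEPTH_RECLAIM_TOP);
}

/* Block layer frames below XFS (submit_bio to mempool_alloc_slab). */
int block_bytes(void)
{
	return segment_bytes(DEPTH_DEEPEST, DEPTH_FS_TOP);
}

/* What is left of the stack at the deepest point of the trace. */
int headroom_bytes(void)
{
	return STACK_BYTES - DEPTH_DEEPEST;
}
```

With these inputs the caller accounts for 1664 bytes, the allocator and
reclaim for 1456, the XFS frames for 3664 and the block layer for 784,
leaving 624 bytes of headroom. A second trip through the allocator and
reclaim from mempool_alloc costing the same 1456 bytes seen lower in the
trace would not fit in that headroom.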

Basically, there is not enough stack space available to allow direct
reclaim to enter ->writepage _anywhere_ according to the stack usage
profiles we are seeing here....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com