From: Chris Mason <chris.mason@oracle.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Mel Gorman <mel@csn.ul.ie>, Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 07:20:15 -0400 [thread overview]
Message-ID: <20100414112015.GO13327@think> (raw)
In-Reply-To: <877hoa9wlv.fsf@basil.nowhere.org>
On Wed, Apr 14, 2010 at 12:06:36PM +0200, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> >
> > Huh, 912 bytes...for select, really? From poll.h:
> >
> > /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
> > additional memory. */
> > #define MAX_STACK_ALLOC 832
> > #define FRONTEND_STACK_ALLOC 256
> > #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
> > #define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
> >
> > So, select is intentionally trying to use that much stack. It should be using
> > GFP_NOFS if it really wants to suck down that much stack...
>
> There are lots of other call chains which use multiple KB bytes by itself,
> so why not give select() that measly 832 bytes?
>
> You think only file systems are allowed to use stack? :)
Grin, most definitely.
>
> Basically if you cannot tolerate 1K (or more likely more) of stack
> used before your fs is called you're toast in lots of other situations
> anyways.
Well, on a 4K stack kernel, 832 bytes is a very large percentage for
just one function.
Direct reclaim is a problem because it splices parts of the kernel that
normally aren't connected together. The people that code in select see
832 bytes and say that's teeny, I should have taken 3832 bytes.
But they don't realize their function can dive down into ecryptfs then
the filesystem then maybe loop and then perhaps raid6 on top of a
network block device.
>
> > kernel had some sort of way to dynamically allocate ram, it could try
> > that too.
>
> It does this for large inputs, but the whole point of the stack fast
> path is to avoid it for common cases when a small number of fds is
> only needed.
>
> It's significantly slower to go to any external allocator.
Yeah, but since the call chain does eventually go into the allocator,
this function needs to be more stack friendly.
I do agree that we can't really solve this with noinline_for_stack pixie
dust, the long call chains are going to be a problem no matter what.
Reading through all the comments so far, I think the short summary is:
Cleaning pages in direct reclaim helps the VM because it is able to make
sure that lumpy reclaim finds adjacent pages. This isn't a fast
operation, it has to wait for IO (infinitely slow compared to the CPU).
Will it be good enough for the VM if we add a hint to the bdi writeback
threads to work on a general area of the file? The filesystem will get
writepages(), the VM will get the IO it needs started.
I know Mel mentioned before he wasn't interested in waiting for helper
threads, but I don't see how we can work without it.
-chris
WARNING: multiple messages have this Message-ID (diff)
From: Chris Mason <chris.mason@oracle.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Mel Gorman <mel@csn.ul.ie>, Dave Chinner <david@fromorbit.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] mm: disallow direct reclaim page writeback
Date: Wed, 14 Apr 2010 07:20:15 -0400 [thread overview]
Message-ID: <20100414112015.GO13327@think> (raw)
In-Reply-To: <877hoa9wlv.fsf@basil.nowhere.org>
On Wed, Apr 14, 2010 at 12:06:36PM +0200, Andi Kleen wrote:
> Chris Mason <chris.mason@oracle.com> writes:
> >
> > Huh, 912 bytes...for select, really? From poll.h:
> >
> > /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
> > additional memory. */
> > #define MAX_STACK_ALLOC 832
> > #define FRONTEND_STACK_ALLOC 256
> > #define SELECT_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define POLL_STACK_ALLOC FRONTEND_STACK_ALLOC
> > #define WQUEUES_STACK_ALLOC (MAX_STACK_ALLOC - FRONTEND_STACK_ALLOC)
> > #define N_INLINE_POLL_ENTRIES (WQUEUES_STACK_ALLOC / sizeof(struct poll_table_entry))
> >
> > So, select is intentionally trying to use that much stack. It should be using
> > GFP_NOFS if it really wants to suck down that much stack...
>
> There are lots of other call chains which use multiple KB bytes by itself,
> so why not give select() that measly 832 bytes?
>
> You think only file systems are allowed to use stack? :)
Grin, most definitely.
>
> Basically if you cannot tolerate 1K (or more likely more) of stack
> used before your fs is called you're toast in lots of other situations
> anyways.
Well, on a 4K stack kernel, 832 bytes is a very large percentage for
just one function.
Direct reclaim is a problem because it splices parts of the kernel that
normally aren't connected together. The people that code in select see
832 bytes and say that's teeny, I should have taken 3832 bytes.
But they don't realize their function can dive down into ecryptfs then
the filesystem then maybe loop and then perhaps raid6 on top of a
network block device.
>
> > kernel had some sort of way to dynamically allocate ram, it could try
> > that too.
>
> It does this for large inputs, but the whole point of the stack fast
> path is to avoid it for common cases when a small number of fds is
> only needed.
>
> It's significantly slower to go to any external allocator.
Yeah, but since the call chain does eventually go into the allocator,
this function needs to be more stack friendly.
I do agree that we can't really solve this with noinline_for_stack pixie
dust, the long call chains are going to be a problem no matter what.
Reading through all the comments so far, I think the short summary is:
Cleaning pages in direct reclaim helps the VM because it is able to make
sure that lumpy reclaim finds adjacent pages. This isn't a fast
operation, it has to wait for IO (infinitely slow compared to the CPU).
Will it be good enough for the VM if we add a hint to the bdi writeback
threads to work on a general area of the file? The filesystem will get
writepages(), the VM will get the IO it needs started.
I know Mel mentioned before he wasn't interested in waiting for helper
threads, but I don't see how we can work without it.
-chris
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-04-14 11:22 UTC|newest]
Thread overview: 245+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-04-13 0:17 [PATCH] mm: disallow direct reclaim page writeback Dave Chinner
2010-04-13 0:17 ` Dave Chinner
2010-04-13 8:31 ` KOSAKI Motohiro
2010-04-13 8:31 ` KOSAKI Motohiro
2010-04-13 10:29 ` Dave Chinner
2010-04-13 10:29 ` Dave Chinner
2010-04-13 11:39 ` KOSAKI Motohiro
2010-04-13 11:39 ` KOSAKI Motohiro
2010-04-13 14:36 ` Dave Chinner
2010-04-13 14:36 ` Dave Chinner
2010-04-14 3:12 ` Dave Chinner
2010-04-14 3:12 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-15 1:56 ` Dave Chinner
2010-04-15 1:56 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 7:36 ` Dave Chinner
2010-04-14 7:36 ` Dave Chinner
2010-04-13 9:58 ` Mel Gorman
2010-04-13 9:58 ` Mel Gorman
2010-04-13 11:19 ` Dave Chinner
2010-04-13 11:19 ` Dave Chinner
2010-04-13 19:34 ` Mel Gorman
2010-04-13 19:34 ` Mel Gorman
2010-04-13 20:20 ` Chris Mason
2010-04-13 20:20 ` Chris Mason
2010-04-14 1:40 ` Dave Chinner
2010-04-14 1:40 ` Dave Chinner
2010-04-14 4:59 ` KAMEZAWA Hiroyuki
2010-04-14 4:59 ` KAMEZAWA Hiroyuki
2010-04-14 5:41 ` Dave Chinner
2010-04-14 5:41 ` Dave Chinner
2010-04-14 5:54 ` KOSAKI Motohiro
2010-04-14 5:54 ` KOSAKI Motohiro
2010-04-14 6:13 ` Minchan Kim
2010-04-14 7:19 ` Minchan Kim
2010-04-14 7:19 ` Minchan Kim
2010-04-14 9:42 ` KAMEZAWA Hiroyuki
2010-04-14 9:42 ` KAMEZAWA Hiroyuki
2010-04-14 9:42 ` KAMEZAWA Hiroyuki
2010-04-14 10:01 ` Minchan Kim
2010-04-14 10:01 ` Minchan Kim
2010-04-14 10:07 ` Mel Gorman
2010-04-14 10:07 ` Mel Gorman
2010-04-14 10:07 ` Mel Gorman
2010-04-14 10:16 ` Minchan Kim
2010-04-14 10:16 ` Minchan Kim
2010-04-14 7:06 ` Dave Chinner
2010-04-14 7:06 ` Dave Chinner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 7:28 ` Dave Chinner
2010-04-14 7:28 ` Dave Chinner
2010-04-14 8:51 ` Mel Gorman
2010-04-14 8:51 ` Mel Gorman
2010-04-15 1:34 ` Dave Chinner
2010-04-15 1:34 ` Dave Chinner
2010-04-15 1:34 ` Dave Chinner
2010-04-15 4:09 ` KOSAKI Motohiro
2010-04-15 4:09 ` KOSAKI Motohiro
2010-04-15 4:11 ` [PATCH 1/4] vmscan: delegate pageout io to flusher thread if current is kswapd KOSAKI Motohiro
2010-04-15 4:11 ` KOSAKI Motohiro
2010-04-15 4:11 ` KOSAKI Motohiro
2010-04-15 8:05 ` Suleiman Souhlal
2010-04-15 8:05 ` Suleiman Souhlal
2010-04-15 8:17 ` KOSAKI Motohiro
2010-04-15 8:17 ` KOSAKI Motohiro
2010-04-15 8:26 ` KOSAKI Motohiro
2010-04-15 8:26 ` KOSAKI Motohiro
2010-04-15 10:30 ` Johannes Weiner
2010-04-15 10:30 ` Johannes Weiner
2010-04-15 17:24 ` Suleiman Souhlal
2010-04-15 17:24 ` Suleiman Souhlal
2010-04-20 2:56 ` Ying Han
2010-04-20 2:56 ` Ying Han
2010-04-15 9:32 ` Dave Chinner
2010-04-15 9:32 ` Dave Chinner
2010-04-15 9:41 ` KOSAKI Motohiro
2010-04-15 9:41 ` KOSAKI Motohiro
2010-04-15 17:27 ` Suleiman Souhlal
2010-04-15 17:27 ` Suleiman Souhlal
2010-04-15 23:33 ` Dave Chinner
2010-04-15 23:33 ` Dave Chinner
2010-04-15 23:41 ` Suleiman Souhlal
2010-04-15 23:41 ` Suleiman Souhlal
2010-04-16 9:50 ` Alan Cox
2010-04-16 9:50 ` Alan Cox
2010-04-17 3:06 ` Dave Chinner
2010-04-17 3:06 ` Dave Chinner
2010-04-15 8:18 ` KOSAKI Motohiro
2010-04-15 8:18 ` KOSAKI Motohiro
2010-04-15 8:18 ` KOSAKI Motohiro
2010-04-15 10:31 ` Mel Gorman
2010-04-15 10:31 ` Mel Gorman
2010-04-15 11:26 ` KOSAKI Motohiro
2010-04-15 11:26 ` KOSAKI Motohiro
2010-04-15 4:13 ` [PATCH 2/4] vmscan: kill prev_priority completely KOSAKI Motohiro
2010-04-15 4:13 ` KOSAKI Motohiro
2010-04-15 4:13 ` KOSAKI Motohiro
2010-04-15 4:14 ` [PATCH 3/4] vmscan: move priority variable into scan_control KOSAKI Motohiro
2010-04-15 4:14 ` KOSAKI Motohiro
2010-04-15 4:14 ` KOSAKI Motohiro
2010-04-15 4:15 ` [PATCH 4/4] vmscan: delegate page cleaning io to flusher thread if VM pressure is low KOSAKI Motohiro
2010-04-15 4:15 ` KOSAKI Motohiro
2010-04-15 4:15 ` KOSAKI Motohiro
2010-04-15 4:35 ` [PATCH] mm: disallow direct reclaim page writeback KOSAKI Motohiro
2010-04-15 4:35 ` KOSAKI Motohiro
2010-04-15 6:32 ` Dave Chinner
2010-04-15 6:32 ` Dave Chinner
2010-04-15 6:44 ` KOSAKI Motohiro
2010-04-15 6:44 ` KOSAKI Motohiro
2010-04-15 6:58 ` Dave Chinner
2010-04-15 6:58 ` Dave Chinner
2010-04-15 6:20 ` Dave Chinner
2010-04-15 6:20 ` Dave Chinner
2010-04-15 6:35 ` KOSAKI Motohiro
2010-04-15 6:35 ` KOSAKI Motohiro
2010-04-15 8:54 ` Dave Chinner
2010-04-15 8:54 ` Dave Chinner
2010-04-15 10:21 ` KOSAKI Motohiro
2010-04-15 10:21 ` KOSAKI Motohiro
2010-04-15 10:23 ` [PATCH 1/4] vmscan: simplify shrink_inactive_list() KOSAKI Motohiro
2010-04-15 10:23 ` KOSAKI Motohiro
2010-04-15 13:15 ` Mel Gorman
2010-04-15 13:15 ` Mel Gorman
2010-04-15 15:01 ` Andi Kleen
2010-04-15 15:01 ` Andi Kleen
2010-04-15 15:44 ` Mel Gorman
2010-04-15 15:44 ` Mel Gorman
2010-04-15 16:54 ` Andi Kleen
2010-04-15 16:54 ` Andi Kleen
2010-04-15 23:40 ` Dave Chinner
2010-04-15 23:40 ` Dave Chinner
2010-04-16 7:13 ` Andi Kleen
2010-04-16 7:13 ` Andi Kleen
2010-04-16 14:57 ` Mel Gorman
2010-04-16 14:57 ` Mel Gorman
2010-04-17 2:37 ` Dave Chinner
2010-04-17 2:37 ` Dave Chinner
2010-04-16 14:55 ` Mel Gorman
2010-04-16 14:55 ` Mel Gorman
2010-04-15 18:22 ` Valdis.Kletnieks
2010-04-16 9:39 ` Mel Gorman
2010-04-16 9:39 ` Mel Gorman
2010-04-15 10:24 ` [PATCH 2/4] [cleanup] mm: introduce free_pages_prepare KOSAKI Motohiro
2010-04-15 10:24 ` KOSAKI Motohiro
2010-04-15 10:24 ` KOSAKI Motohiro
2010-04-15 13:33 ` Mel Gorman
2010-04-15 13:33 ` Mel Gorman
2010-04-15 10:24 ` [PATCH 3/4] mm: introduce free_pages_bulk KOSAKI Motohiro
2010-04-15 10:24 ` KOSAKI Motohiro
2010-04-15 10:24 ` KOSAKI Motohiro
2010-04-15 13:46 ` Mel Gorman
2010-04-15 13:46 ` Mel Gorman
2010-04-15 10:26 ` [PATCH 4/4] vmscan: replace the pagevec in shrink_inactive_list() with list KOSAKI Motohiro
2010-04-15 10:26 ` KOSAKI Motohiro
2010-04-15 10:28 ` [PATCH] mm: disallow direct reclaim page writeback Mel Gorman
2010-04-15 10:28 ` Mel Gorman
2010-04-15 13:42 ` Chris Mason
2010-04-15 13:42 ` Chris Mason
2010-04-15 17:50 ` tytso
2010-04-15 17:50 ` tytso
2010-04-15 17:50 ` tytso
2010-04-16 15:05 ` Mel Gorman
2010-04-16 15:05 ` Mel Gorman
2010-04-16 15:05 ` Mel Gorman
2010-04-19 15:15 ` Mel Gorman
2010-04-19 15:15 ` Mel Gorman
2010-04-19 17:38 ` Chris Mason
2010-04-19 15:15 ` Mel Gorman
2010-04-16 4:14 ` Dave Chinner
2010-04-16 4:14 ` Dave Chinner
2010-04-16 15:14 ` Mel Gorman
2010-04-16 15:14 ` Mel Gorman
2010-04-18 0:32 ` Andrew Morton
2010-04-18 0:32 ` Andrew Morton
2010-04-18 19:05 ` Christoph Hellwig
2010-04-18 19:05 ` Christoph Hellwig
2010-04-18 16:31 ` Andrew Morton
2010-04-18 16:31 ` Andrew Morton
2010-04-18 19:35 ` Christoph Hellwig
2010-04-18 19:35 ` Christoph Hellwig
2010-04-18 19:11 ` Sorin Faibish
2010-04-18 19:11 ` Sorin Faibish
2010-04-18 19:11 ` Sorin Faibish
2010-04-18 19:10 ` Sorin Faibish
2010-04-18 19:10 ` Sorin Faibish
2010-04-18 19:10 ` Sorin Faibish
2010-04-18 21:30 ` James Bottomley
2010-04-18 21:30 ` James Bottomley
2010-04-18 23:34 ` Sorin Faibish
2010-04-18 23:34 ` Sorin Faibish
2010-04-18 23:34 ` Sorin Faibish
2010-04-19 3:08 ` tytso
2010-04-19 3:08 ` tytso
2010-04-19 0:35 ` Dave Chinner
2010-04-19 0:35 ` Dave Chinner
2010-04-19 0:49 ` Arjan van de Ven
2010-04-19 0:49 ` Arjan van de Ven
2010-04-19 1:08 ` Dave Chinner
2010-04-19 1:08 ` Dave Chinner
2010-04-19 4:32 ` Arjan van de Ven
2010-04-19 4:32 ` Arjan van de Ven
2010-04-19 15:20 ` Mel Gorman
2010-04-19 15:20 ` Mel Gorman
2010-04-23 1:06 ` Dave Chinner
2010-04-23 1:06 ` Dave Chinner
2010-04-23 10:50 ` Mel Gorman
2010-04-23 10:50 ` Mel Gorman
2010-04-15 14:57 ` Andi Kleen
2010-04-15 14:57 ` Andi Kleen
2010-04-15 2:37 ` Johannes Weiner
2010-04-15 2:37 ` Johannes Weiner
2010-04-15 2:43 ` KOSAKI Motohiro
2010-04-15 2:43 ` KOSAKI Motohiro
2010-04-16 23:56 ` Johannes Weiner
2010-04-16 23:56 ` Johannes Weiner
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 6:52 ` KOSAKI Motohiro
2010-04-14 10:06 ` Andi Kleen
2010-04-14 10:06 ` Andi Kleen
2010-04-14 11:20 ` Chris Mason [this message]
2010-04-14 11:20 ` Chris Mason
2010-04-14 12:15 ` Andi Kleen
2010-04-14 12:15 ` Andi Kleen
2010-04-14 12:32 ` Alan Cox
2010-04-14 12:32 ` Alan Cox
2010-04-14 12:34 ` Andi Kleen
2010-04-14 12:34 ` Andi Kleen
2010-04-14 13:23 ` Mel Gorman
2010-04-14 13:23 ` Mel Gorman
2010-04-14 14:07 ` Chris Mason
2010-04-14 14:07 ` Chris Mason
2010-04-14 0:24 ` Minchan Kim
2010-04-14 0:24 ` Minchan Kim
2010-04-14 4:44 ` Dave Chinner
2010-04-14 4:44 ` Dave Chinner
2010-04-14 7:54 ` Minchan Kim
2010-04-14 7:54 ` Minchan Kim
2010-04-16 1:13 ` KAMEZAWA Hiroyuki
2010-04-16 1:13 ` KAMEZAWA Hiroyuki
2010-04-16 4:18 ` KAMEZAWA Hiroyuki
2010-04-16 4:18 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100414112015.GO13327@think \
--to=chris.mason@oracle.com \
--cc=andi@firstfloor.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.