* 2.4 fix for write throttling on x86 >1G
@ 2005-03-11 6:10 Andrea Arcangeli
2005-03-11 16:04 ` Marcelo Tosatti
From: Andrea Arcangeli @ 2005-03-11 6:10 UTC (permalink / raw)
To: linux-kernel; +Cc: Marcelo Tosatti
Hello Marcelo,
I've got a fix for you on 2.4. I got reports of stalls with heavy writes
on 2.4. There was a mistake in nr_free_buffer_pages. That function is
definitely meant _not_ to take highmem into account (dirty cache cannot
spread over highmem in 2.4 [even when on top of fs]). For unknown
reasons it was actually taking highmem into account. The code was
obviously meant not to take it into account (see the GFP_USER and zonelist),
except it wasn't using the zonelist. That is a severe problem because
there will be no write throttling at all, and no bdflush wakeup either.
This should fix it, though my compiler fails to compile 2.4, so I can't
verify it right away. If any problem shows up I'll post a followup.
This is a noop for all systems <800M (1G shouldn't be noticeable
either). This is why most people can't notice.
Thanks.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
--- 2.4.23aa3/mm/page_alloc.c.~1~ 2004-07-04 02:09:42.000000000 +0200
+++ 2.4.23aa3/mm/page_alloc.c 2005-03-11 07:00:23.000000000 +0100
@@ -656,7 +656,7 @@ unsigned int nr_free_buffer_pages (void)
 		class_idx = zone_idx(zone);
 
 		sum += zone->nr_cache_pages;
-		for (zone = pgdat->node_zones; zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
+		for (; zone; zone = *zonep++) {
 			int free = zone->free_pages - zone->watermarks[class_idx].high;
 			if (free <= 0)
 				continue;
* Re: 2.4 fix for write throttling on x86 >1G
2005-03-11 6:10 2.4 fix for write throttling on x86 >1G Andrea Arcangeli
@ 2005-03-11 16:04 ` Marcelo Tosatti
2005-03-11 20:53 ` Andrea Arcangeli
From: Marcelo Tosatti @ 2005-03-11 16:04 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, Marcelo Tosatti
Hi Andrea!
On Fri, Mar 11, 2005 at 07:10:35AM +0100, Andrea Arcangeli wrote:
> Hello Marcelo,
>
> I've got a fix for you on 2.4. I got reports of stalls with heavy writes
> on 2.4.
Out of curiosity, was that on SuSE, not mainline?
> There was a mistake in nr_free_buffer_pages. That function is
> definitely meant _not_ to take highmem into account (dirty cache cannot
> spread over highmem in 2.4 [even when on top of fs]). For unknown
> reasons it was actually taking highmem into account. The code was
> obviously meant not to take it into account (see the GFP_USER and zonelist),
> except it wasn't using the zonelist.
True, initialization of "zone" variable in nr_free_buffer_pages() is
un-nice.
> That is a severe problem because
> there will be no write throttling at all, and no bdflush wakeup either.
>
> This should fix it, though my compiler fails to compile 2.4, so I can't
> verify it right away. If any problem shows up I'll post a followup.
>
> This is a noop for all systems <800M (1G shouldn't be noticeable
> either). This is why most people can't notice.
Do we really want to limit dirty cache to low mem on HIGHIO capable
machines? I'm afraid doing so might hurt performance on such systems.
I think it might be wise to have nr_free_buffer_pages() take highmem
into account if CONFIG_HIGHIO is set ?
> --- 2.4.23aa3/mm/page_alloc.c.~1~ 2004-07-04 02:09:42.000000000 +0200
> +++ 2.4.23aa3/mm/page_alloc.c 2005-03-11 07:00:23.000000000 +0100
> @@ -656,7 +656,7 @@ unsigned int nr_free_buffer_pages (void)
>  		class_idx = zone_idx(zone);
>  
>  		sum += zone->nr_cache_pages;
> -		for (zone = pgdat->node_zones; zone < pgdat->node_zones + MAX_NR_ZONES; zone++) {
> +		for (; zone; zone = *zonep++) {
>  			int free = zone->free_pages - zone->watermarks[class_idx].high;
>  			if (free <= 0)
>  				continue;
* Re: 2.4 fix for write throttling on x86 >1G
2005-03-11 20:53 ` Andrea Arcangeli
@ 2005-03-11 16:55 ` Marcelo Tosatti
From: Marcelo Tosatti @ 2005-03-11 16:55 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
On Fri, Mar 11, 2005 at 09:53:09PM +0100, Andrea Arcangeli wrote:
> Hello Marcelo,
>
> On Fri, Mar 11, 2005 at 01:04:13PM -0300, Marcelo Tosatti wrote:
> > Out of curiosity, that was SuSE not mainline ?
>
> yep.
>
> > Do we really want to limit dirty cache to low mem on HIGHIO capable
> > machines? I'm afraid doing so might hurt performance on such systems.
> >
> > I think it might be wise to have nr_free_buffer_pages() take highmem
> > into account if CONFIG_HIGHIO is set ?
>
> The problem is the buffercache/blkdev-pagecache: it simply can't go in
> highmem. A similar fix happened recently in 2.6 for the same reasons,
> but in 2.6 we allowed it with some logic specific for the
> blkdev-pagecache.
Right, I don't think it is easy or desirable to make that distinction in v2.4.
> nr_free_buffer_pages() was never intended to take highmem into account,
> that's why there's the GFP_USER thing already, except we didn't loop
> into the zonelist, so I didn't try to make a fix similar to 2.6.
Hopefully it is not a big deal to disallow >1GB of dirty pagecache on v2.4.
Applied, thanks.
* Re: 2.4 fix for write throttling on x86 >1G
2005-03-11 16:04 ` Marcelo Tosatti
@ 2005-03-11 20:53 ` Andrea Arcangeli
2005-03-11 16:55 ` Marcelo Tosatti
From: Andrea Arcangeli @ 2005-03-11 20:53 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: linux-kernel
Hello Marcelo,
On Fri, Mar 11, 2005 at 01:04:13PM -0300, Marcelo Tosatti wrote:
> Out of curiosity, that was SuSE not mainline ?
yep.
> Do we really want to limit dirty cache to low mem on HIGHIO capable
> machines? I'm afraid doing so might hurt performance on such systems.
>
> I think it might be wise to have nr_free_buffer_pages() take highmem
> into account if CONFIG_HIGHIO is set ?
The problem is the buffercache/blkdev-pagecache: it simply can't go in
highmem. A similar fix happened recently in 2.6 for the same reasons,
but in 2.6 we allowed it with some logic specific for the
blkdev-pagecache.
nr_free_buffer_pages() was never intended to take highmem into account,
that's why there's the GFP_USER thing already, except we didn't loop
into the zonelist, so I didn't try to make a fix similar to 2.6.