All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Andrea Arcangeli <andrea@novell.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org
Subject: Re: ZONE_PADDING wastes 4 bytes of the new cacheline
Date: Sun, 24 Oct 2004 02:28:28 +1000	[thread overview]
Message-ID: <417A86AC.2080505@yahoo.com.au> (raw)
In-Reply-To: <20041023110334.GS14325@dualathlon.random>

Andrea Arcangeli wrote:
> On Sat, Oct 23, 2004 at 08:22:38PM +1000, Nick Piggin wrote:
> 
>>It is an unlikely scenario, but it is definitely good for robustness.
>>Especially on small memory systems where the amount allocated doesn't
>>have to be that large.
>>
>>Let's say a 16MB system pages_low ~= 64K, so we'll also say we've
> 
> 
> btw, thinking the watermarks linear with the amount of memory isn't
> correct. the watermarks for zone normal against zone normal (i.e. the
> current pages_xx of 2.6) should have an high and low limit indipendent
> on the memory size of the machine. the low limit is what the machine
> needs to avoid locking up in the PF_MEMALLOC paths. So it obviously has
> absolutely nothing to do with the amount of ram in the machine.
> 

No, you are right there of course. However, I think 64K will be the
reality in this case because I just did a sysrq+M and took a look
at my ZONE_DMA (ie. 16MB) limits.

However, it does have something to do with the amount of concurrency
in the system - the chance of multiple tasks in PF_MEMALLOC on a larger
system (with more tasks, more CPUs) will increase of course.

> 64k sounds way too low even for a PDA that doesn't swap, still there are
> PF_MEMALLOC paths in the dcache and fs methods.
> 
> but this is just a side note, let's assume 64k would be sane in this
> workload (a page size smaller than 4k that in turn requires less ram to
> execute method on each page object would make it sane for example).
> 

Maybe - I haven't really looked at those paths at all... but yeah it
is a peripheral issue. We can continue that in another thread sometime
:)

> 
>>currently got 64K free. Someone then wants to do an order 4 allocation
>>OK they succeed (assuming memory isn't fragmented) and there's 0K free.
>>
>>Which is bad because you can now get deadlocks when trying to free
>>memory.
> 
> 
> I got what you mean, I misread that code sorry, you're perfectly right
> about order being needed in that code.
> 
> In 2.4 I had to implement it too of course, it's just much cleaner than
> 2.6.
> 
> static inline unsigned long zone_free_pages(zone_t * zone, unsigned int order)
> {
> 	long free = zone->free_pages - (1UL << order);
> 	return free >= 0 ? free : 0;
> }
> 
> 
> 	for (;;) {
> 		zone_t *z = *(zone++);
> 		if (!z)
> 			break;
> 
> 		if (zone_free_pages(z, order) > z->watermarks[class_idx].low) {
> 			page = rmqueue(z, order);
> 			if (page)
> 				return page;
> 		}
> 	}
> 
> 
> this compares with your 2.6 code:
> 
> 	for (i = 0; (z = zones[i]) != NULL; i++) {
> 		min = z->pages_min;
> 		if (gfp_mask & __GFP_HIGH)
> 			min /= 2;
> 		if (can_try_harder)
> 			min -= min / 4;
> 		min += (1<<order) + z->protection[alloc_type];
> 
> 		if (z->free_pages < min)
> 			continue;
> 
> 		page = buffered_rmqueue(z, order, gfp_mask);
> 		if (page)
> 			goto got_pg;
> 	}
> 

Although you put 2.6 in a bad light with this code ;)
__GFP_HIGH and can_try_harder are pretty important... It
does look like the continue could be replaced with the
2.4 version's negated if statement to be a bit cleaner

> When I was reading "z->free_pages < min" in your code, I was really
> reading like my code here "zone_free_pages(z, order) > z->watermarks[class_idx].low"
> I was taking for given the free_pages - 1UL<<order was already accounted
> in z->free_pages, because I hidden that calculation in a method so I'm
> not used to think about it while reading alloc_pages (I assumed that
> thing was already accounted for in a different function like in 2.4).
> 
> Sorry if I'm biased but I read and modified 2.4 many more times than
> 2.6.
> 

That's OK.

> 
>>Oh if you've still got the three watermarks then that may work -
>>I thought you meant getting rid of one of the *completely*.
>>
>>But I'm still not sure what advantage you see in moving from
>>pages_xxx + protection to a single watermark.
> 
> 
> then what advantage you get to compute pages_xx + protection at runtime
> when reading a pages_xx that already contains the protection would be
> enough? I avoid computations at runtime and I keep the localized in the
> watermark generation. I doubt it makes much difference but this is the
> way I did in 2.4 and it looks cleaner to me, plus this avoids me to
> reinvent the wheel.
> 

In kswapd you really just want the pages_xxx value (well, pages_high).


  reply	other threads:[~2004-10-23 16:30 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-21  1:17 ZONE_PADDING wastes 4 bytes of the new cacheline Andrea Arcangeli
2004-10-21  3:10 ` Nick Piggin
2004-10-21  4:36   ` Andrew Morton
2004-10-21  4:53     ` Nick Piggin
2004-10-21 10:51     ` Mikael Pettersson
2004-10-21 12:45       ` Andrea Arcangeli
2004-10-21 18:54         ` Adam Heath
2004-10-21 20:21           ` DaMouse
2004-10-21 21:24             ` Jon Masters
2004-10-22 10:09               ` DaMouse
2004-10-21 22:26     ` Nick Piggin
2004-10-21 22:45       ` Andrea Arcangeli
2004-10-22  0:34         ` Nick Piggin
2004-10-22  1:10           ` Andrea Arcangeli
2004-10-22  1:26             ` Andrew Morton
2004-10-22  2:55               ` Jesse Barnes
2004-10-22  3:38                 ` Nick Piggin
2004-10-22  3:49                   ` Jesse Barnes
2004-10-22 17:15                     ` Andrea Arcangeli
2004-10-22  3:09               ` Nick Piggin
2004-10-22  3:26                 ` Andrew Morton
2004-10-22  3:35                   ` Nick Piggin
2004-10-22 17:13                     ` Andrea Arcangeli
2004-10-22 17:07                   ` Andrea Arcangeli
2004-10-22 15:50               ` Andrea Arcangeli
2004-10-22  3:02             ` Nick Piggin
2004-10-22 16:58               ` Andrea Arcangeli
2004-10-23  4:33                 ` Nick Piggin
2004-10-23  9:59                   ` Andrea Arcangeli
2004-10-23 10:22                     ` Nick Piggin
2004-10-23 11:03                       ` Andrea Arcangeli
2004-10-23 16:28                         ` Nick Piggin [this message]
2004-10-25 12:44                           ` Andrea Arcangeli
2004-10-25 12:49                             ` Nick Piggin
2004-10-25 13:51                               ` Andrea Arcangeli
2004-10-25 20:09                             ` Robert White

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=417A86AC.2080505@yahoo.com.au \
    --to=nickpiggin@yahoo.com.au \
    --cc=akpm@osdl.org \
    --cc=andrea@novell.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.