public inbox for linux-kernel@vger.kernel.org
* Large slab cache in 2.6.1
@ 2004-02-22  0:50 Mike Fedyk
  2004-02-22  1:09 ` Mike Fedyk
  2004-02-22  2:36 ` Chris Wedgwood
  0 siblings, 2 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  0:50 UTC (permalink / raw)
  To: linux-kernel

Actually 2.6.1-bk2-nfsd-stalefh-nfsd-lofft (two nfsd patches that 
already made it into 2.6.2 and 2.6.3)

http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-memory.html

I have 1.5 GB of ram in this system that will be a Linux Terminal Server 
  (but using Debian & VNC).  There's 600MB+ anonymous memory, 600MB+ 
slab cache, and 100MB page cache.  That's after turning off swap (it was 
400MB into swap at the time).

Turning off the swap only shrank my page cache, and my slab didn't 
shrink one bit.

I'm sending this email because this is a production server, and I'd like 
to know if any patches after 2.6.1 would help this problem.

Thanks,

Mike


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  0:50 Mike Fedyk
@ 2004-02-22  1:09 ` Mike Fedyk
  2004-02-22  1:20   ` William Lee Irwin III
  2004-02-22  2:36 ` Chris Wedgwood
  1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  1:09 UTC (permalink / raw)
  To: linux-kernel

Mike Fedyk wrote:
> I have 1.5 GB of ram in this system that will be a Linux Terminal Server 
>  (but using Debian & VNC).  There's 600MB+ anonymous memory, 600MB+ slab 
> cache, and 100MB page cache.  That's after turning off swap (it was 
> 400MB into swap at the time).

Here's my top slab users:

dentry_cache      585455 763395    256   15    1 : tunables  120   60    8 : slabdata  50893  50893      3
ext3_inode_cache  686837 688135    512    7    1 : tunables   54   27    8 : slabdata  98305  98305      0
buffer_head        34095  78078     48   77    1 : tunables  120   60    8 : slabdata   1014   1014      0
vm_area_struct     42103  44602     64   58    1 : tunables  120   60    8 : slabdata    769    769      0
pte_chain          20964  43740    128   30    1 : tunables  120   60    8 : slabdata   1458   1458      0
radix_tree_node    22494  23520    260   15    1 : tunables   54   27    8 : slabdata   1568   1568      0
filp               14474  15315    256   15    1 : tunables  120   60    8 : slabdata   1021   1021      0
nfs_inode_cache     2822   9264    640    6    1 : tunables   54   27    8 : slabdata   1544   1544      0
size-128            3420   4410    128   30    1 : tunables  120   60    8 : slabdata    147    147      0
size-32             3420   3472     32  112    1 : tunables  120   60    8 : slabdata     31     31      0
size-64             2823   3248     64   58    1 : tunables  120   60    8 : slabdata     56     56      0
proc_inode_cache    2580   3180    384   10    1 : tunables   54   27    8 : slabdata    318    318      0
dnotify_cache       2435   2490     20  166    1 : tunables  120   60    8 : slabdata     15     15      0
sock_inode_cache    1888   1981    512    7    1 : tunables   54   27    8 : slabdata    283    283      0
unix_sock           1682   1710    384   10    1 : tunables   54   27    8 : slabdata    171    171      0
inode_cache         1650   1690    384   10    1 : tunables   54   27    8 : slabdata    169    169      0
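As a back-of-the-envelope check (an editor's sketch; the object counts and sizes come from the listing above), multiplying num_objs by objsize for the two biggest caches shows dentry_cache and ext3_inode_cache alone account for roughly 520 MB of the ~600 MB slab:

```python
# Rough slab accounting from the /proc/slabinfo figures quoted above.
# Fields assumed (slabinfo 2.x layout): name, active_objs, num_objs,
# objsize (bytes), objperslab, pagesperslab, ...
entries = {
    "dentry_cache":     (763395, 256),   # (num_objs, objsize)
    "ext3_inode_cache": (688135, 512),
}

total = 0
for name, (num_objs, objsize) in entries.items():
    bytes_used = num_objs * objsize
    total += bytes_used
    print(f"{name:18s} {bytes_used / 2**20:7.1f} MB")

print(f"{'total':18s} {total / 2**20:7.1f} MB")
```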


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  1:09 ` Mike Fedyk
@ 2004-02-22  1:20   ` William Lee Irwin III
  2004-02-22  2:03     ` Mike Fedyk
  0 siblings, 1 reply; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22  1:20 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

Mike Fedyk wrote:
>>I have 1.5 GB of ram in this system that will be a Linux Terminal Server 
>> (but using Debian & VNC).  There's 600MB+ anonymous memory, 600MB+ slab 
>> cache, and 100MB page cache.  That's after turning off swap (it was 
>> 400MB into swap at the time).

On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
> Here's my top slab users:
> dentry_cache      585455 763395    256   15    1 : tunables  120   60 
>  8 : slabdata  50893  50893      3
> ext3_inode_cache  686837 688135    512    7    1 : tunables   54   27 
>  8 : slabdata  98305  98305      0
> buffer_head        34095  78078     48   77    1 : tunables  120   60 
>  8 : slabdata   1014   1014      0
> vm_area_struct     42103  44602     64   58    1 : tunables  120   60 
>  8 : slabdata    769    769      0
> pte_chain          20964  43740    128   30    1 : tunables  120   60 
>  8 : slabdata   1458   1458      0

Similar issue here; I ran out of filp's/whatever shortly after booting.


-- wli

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  1:20   ` William Lee Irwin III
@ 2004-02-22  2:03     ` Mike Fedyk
  2004-02-22  2:17       ` William Lee Irwin III
  2004-02-22  2:33       ` Nick Piggin
  0 siblings, 2 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  2:03 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III wrote:

> Mike Fedyk wrote:
> 
>>>I have 1.5 GB of ram in this system that will be a Linux Terminal Server 
>>>(but using Debian & VNC).  There's 600MB+ anonymous memory, 600MB+ slab 
>>>cache, and 100MB page cache.  That's after turning off swap (it was 
>>>400MB into swap at the time).
> 
> 
> On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
> 
>>Here's my top slab users:
>>dentry_cache      585455 763395    256   15    1 : tunables  120   60 
>> 8 : slabdata  50893  50893      3
>>ext3_inode_cache  686837 688135    512    7    1 : tunables   54   27 
>> 8 : slabdata  98305  98305      0
>>buffer_head        34095  78078     48   77    1 : tunables  120   60 
>> 8 : slabdata   1014   1014      0
>>vm_area_struct     42103  44602     64   58    1 : tunables  120   60 
>> 8 : slabdata    769    769      0
>>pte_chain          20964  43740    128   30    1 : tunables  120   60 
>> 8 : slabdata   1458   1458      0
> 
> 
> Similar issue here; I ran out of filp's/whatever shortly after booting.

So Nick Piggin's VM patches won't help with this?



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:03     ` Mike Fedyk
@ 2004-02-22  2:17       ` William Lee Irwin III
  2004-02-22  2:38         ` Nick Piggin
  2004-02-22  2:40         ` Mike Fedyk
  2004-02-22  2:33       ` Nick Piggin
  1 sibling, 2 replies; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22  2:17 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

William Lee Irwin III wrote:
>> Similar issue here; I ran out of filp's/whatever shortly after booting.

On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
> So Nick Piggin's VM patches won't help with this?

I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
a vfs issue anyway because there's no actual VM content to it, apart
from the code in question being driven by the VM.


-- wli

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:03     ` Mike Fedyk
  2004-02-22  2:17       ` William Lee Irwin III
@ 2004-02-22  2:33       ` Nick Piggin
  2004-02-22  2:46         ` Nick Piggin
  1 sibling, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  2:33 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: William Lee Irwin III, linux-kernel



Mike Fedyk wrote:

> William Lee Irwin III wrote:
>
>> Mike Fedyk wrote:
>>
>>>> I have 1.5 GB of ram in this system that will be a Linux Terminal 
>>>> Server (but using Debian & VNC).  There's 600MB+ anonymous memory, 
>>>> 600MB+ slab cache, and 100MB page cache.  That's after turning off 
>>>> swap (it was 400MB into swap at the time).
>>>
>>
>>
>> On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
>>
>>> Here's my top slab users:
>>> dentry_cache      585455 763395    256   15    1 : tunables  120   
>>> 60 8 : slabdata  50893  50893      3
>>> ext3_inode_cache  686837 688135    512    7    1 : tunables   54   
>>> 27 8 : slabdata  98305  98305      0
>>> buffer_head        34095  78078     48   77    1 : tunables  120   
>>> 60 8 : slabdata   1014   1014      0
>>> vm_area_struct     42103  44602     64   58    1 : tunables  120   
>>> 60 8 : slabdata    769    769      0
>>> pte_chain          20964  43740    128   30    1 : tunables  120   
>>> 60 8 : slabdata   1458   1458      0
>>
>>
>>
>> Similar issue here; I ran out of filp's/whatever shortly after booting.
>
>
> So Nick Piggin's VM patches won't help with this?
>

Probably not.

The main thing they do is to try to be smarter about which active
mapped pages to evict. The slab shrinking balance is pretty well
unchanged.

However there is one path in try_to_free_pages that I've changed
to shrink the slab where it otherwise wouldn't. It is pretty
unlikely that you would be continually running into this path,
but testing is welcome, as always.

Stupid question: you didn't actually say what the problem is...
having 600MB slab cache and 400MB swap may not actually be a
problem provided the swap is not being used and the cache is.

I have an idea it might be worthwhile to try using inactive list
scanning as an input to slab pressure...

Nick


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  0:50 Mike Fedyk
  2004-02-22  1:09 ` Mike Fedyk
@ 2004-02-22  2:36 ` Chris Wedgwood
  2004-02-22  3:03   ` Linus Torvalds
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  2:36 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

On Sat, Feb 21, 2004 at 04:50:34PM -0800, Mike Fedyk wrote:

> I have 1.5 GB of ram in this system that will be a Linux Terminal
> Server (but using Debian & VNC).  There's 600MB+ anonymous memory,
> 600MB+ slab cache, and 100MB page cache.  That's after turning off
> swap (it was 400MB into swap at the time).

I have a similar annoying problem...  I have a machine which is almost
always idle (single user work station type thing) with 1.5GB of RAM
and I end up with 850M in slab!

For me the main problem seems to be dentry_cache itself bloating
up really big, with those entries keeping fs-specific memory
pinned.

Forcing paging will push this down to acceptable levels but it's a
really irritating solution --- I'm still trying to think of a better
way to stop the dentries from using such a disproportionate amount of
memory.

I've played with -mm kernels and various patches out there...  nothing
seems to put enough pressure on the slab unless I force paging.

akpm, riel --- any (more) ideas here?
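The "force paging" workaround described above amounts to allocating and touching anonymous memory until the VM has to reclaim, which in turn drives the slab shrinkers. A minimal sketch (the `apply_memory_pressure` helper and the sizes are illustrative, not from the thread; the real workaround would pin a large fraction of RAM):

```python
# Sketch of the "force paging" workaround: allocate anonymous memory
# and touch every page so it stays resident, forcing the VM to reclaim
# other memory (including, eventually, slab) to satisfy the allocation.
def apply_memory_pressure(nbytes, chunk=1 << 20):
    chunks = []
    for _ in range(nbytes // chunk):
        buf = bytearray(chunk)
        # touch one byte per 4 KB page so every page is faulted in
        buf[::4096] = b"x" * len(buf[::4096])
        chunks.append(buf)
    return chunks  # hold the references to keep the pages resident

# e.g. hold ~16 MB; on the 1.5 GB machines above you'd use ~1 GB
held = apply_memory_pressure(16 * 1024 * 1024)
```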



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:17       ` William Lee Irwin III
@ 2004-02-22  2:38         ` Nick Piggin
  2004-02-22  2:46           ` William Lee Irwin III
  2004-02-22  2:40         ` Mike Fedyk
  1 sibling, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  2:38 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Mike Fedyk, linux-kernel



William Lee Irwin III wrote:

>William Lee Irwin III wrote:
>
>>>Similar issue here; I ran out of filp's/whatever shortly after booting.
>>>
>
>On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
>
>>So Nick Piggin's VM patches won't help with this?
>>
>
>I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>a vfs issue anyway because there's no actual VM content to it, apart
>from the code in question being driven by the VM.
>
>

Yes they're in -mm and in dire need of more testing.

The intended audience is people whose machines are swapping a lot,
but ensuring they don't break more common cases isn't a bad idea.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:17       ` William Lee Irwin III
  2004-02-22  2:38         ` Nick Piggin
@ 2004-02-22  2:40         ` Mike Fedyk
  2004-02-22  2:58           ` Nick Piggin
  1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  2:40 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III wrote:

> William Lee Irwin III wrote:
> 
>>>Similar issue here; I ran out of filp's/whatever shortly after booting.
> 
> 
> On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
> 
>>So Nick Piggin's VM patches won't help with this?
> 
> 
> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
> a vfs issue anyway because there's no actual VM content to it, apart
> from the code in question being driven by the VM.

Hmm, that's news to me.  Maybe that's a newer patch.  I haven't been 
reading the list much for the last month or so...

Nick had a patch that was supposed to help 2.6 with low memory 
situations to bring it on a par with 2.4 in that respect.  ISTR "active 
recycling" being mentioned about it...

I also did a
find / -xdev -type f -exec cat "{}" \; > /dev/null 2>&1
with no swap and my page cache didn't get any bigger and slab didn't 
shrink. :(
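A run like the find|cat above can be bracketed by snapshots of the Cached and Slab fields of /proc/meminfo to quantify the effect. A minimal parser sketch (the `parse_meminfo` helper and the sample figures are hypothetical; on a live system you would read open("/proc/meminfo")):

```python
# Parse "Key:   <value> kB" lines from /proc/meminfo-style text.
def parse_meminfo(text):
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip().endswith("kB"):
            fields[key] = int(rest.split()[0])  # value in kB
    return fields

sample = """MemTotal:      1553980 kB
Cached:         102400 kB
Slab:           614400 kB"""

info = parse_meminfo(sample)
print(info["Cached"] // 1024, "MB cached,", info["Slab"] // 1024, "MB slab")
```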

Is there anything in 2.6.3 with respect to IDE, MD RAID{1,5}, knfsd, or 
Athlons I should worry about in upgrading from 2.6.1?

Thanks,

Mike


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:38         ` Nick Piggin
@ 2004-02-22  2:46           ` William Lee Irwin III
  0 siblings, 0 replies; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22  2:46 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mike Fedyk, linux-kernel

William Lee Irwin III wrote:
>> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>> a vfs issue anyway because there's no actual VM content to it, apart
>> from the code in question being driven by the VM.

On Sun, Feb 22, 2004 at 01:38:36PM +1100, Nick Piggin wrote:
> Yes they're in -mm and in dire need of more testing.
> The intended audience is people whose machines are swapping a lot,
> but ensuring they don't break more common cases isn't a bad idea.

The only symptom I'm having is running out of filp's shortly after boot.
I'm not having any performance issues. In fact, I'll send an unrelated
post out about how happy I am about performance. =)


-- wli

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:33       ` Nick Piggin
@ 2004-02-22  2:46         ` Nick Piggin
  2004-02-22  2:54           ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  2:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Mike Fedyk, William Lee Irwin III, linux-kernel, Chris Wedgwood



Nick Piggin wrote:

>
> I have an idea it might be worthwhile to try using inactive list
> scanning as an input to slab pressure...
>

Err that is what it does, of course. My idea was the other way
round - use active list scanning as input. So no, that probably
won't help.

Only one way to find out though. Patch against 2.6.3-mm2.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:46         ` Nick Piggin
@ 2004-02-22  2:54           ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  2:54 UTC (permalink / raw)
  Cc: Mike Fedyk, William Lee Irwin III, linux-kernel, Chris Wedgwood

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]



Nick Piggin wrote:

>
>
> Nick Piggin wrote:
>
>>
>> I have an idea it might be worthwhile to try using inactive list
>> scanning as an input to slab pressure...
>>
>
> Err that is what it does, of course. My idea was the other way
> round - use active list scanning as input. So no, that probably
> won't help.
>
> Only one way to find out though. Patch against 2.6.3-mm2.
>

*cough*


[-- Attachment #2: vm-inactive-shrink-slab.patch --]
[-- Type: text/plain, Size: 934 bytes --]

 linux-2.6-npiggin/mm/vmscan.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletion(-)

diff -puN mm/vmscan.c~vm-inactive-shrink-slab mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-inactive-shrink-slab	2004-02-22 13:39:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-02-22 13:45:01.000000000 +1100
@@ -797,6 +797,7 @@ static int
 shrink_zone(struct zone *zone, unsigned int gfp_mask,
 	int nr_pages, int *nr_scanned, struct page_state *ps, int priority)
 {
+	int ret;
 	unsigned long imbalance;
 	unsigned long nr_refill_inact;
 	unsigned long max_scan;
@@ -836,7 +837,10 @@ shrink_zone(struct zone *zone, unsigned 
 	if (max_scan < nr_pages * 2)
 		max_scan = nr_pages * 2;
 
-	return shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_scanned);
+	ret = shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_scanned);
+	/* Account for active list scanning too */
+	*nr_scanned += nr_refill_inact;
+	return ret;
 }
 
 /*

_

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:40         ` Mike Fedyk
@ 2004-02-22  2:58           ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  2:58 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: William Lee Irwin III, linux-kernel



Mike Fedyk wrote:

> William Lee Irwin III wrote:
>
>> William Lee Irwin III wrote:
>>
>>>> Similar issue here; I ran out of filp's/whatever shortly after 
>>>> booting.
>>>
>>
>>
>> On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
>>
>>> So Nick Piggin's VM patches won't help with this?
>>
>>
>>
>> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>> a vfs issue anyway because there's no actual VM content to it, apart
>> from the code in question being driven by the VM.
>
>
> Hmm, that's news to me.  Maybe that's a newer patch.  I haven't been 
> reading the list much for the last month or so...
>
> Nick had a patch that was supposed to help 2.6 with low memory 
> situations to bring it on a par with 2.4 in that respect.  ISTR 
> "active recycling" being mentioned about it...
>

Just an aside, it is hard to get 2.6 "on par" with 2.4 because 2.6 is
often much fairer (although it can still be badly unfair - if we ever
want to fix that we'd probably need per process mm).

There are quite a lot of sorts of low memory situations you can get
into. My (and Nikita's) patches don't help the one you're probably in;
they don't put more pressure on slab.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  2:36 ` Chris Wedgwood
@ 2004-02-22  3:03   ` Linus Torvalds
  2004-02-22  3:11     ` Chris Wedgwood
  2004-02-22  3:21     ` Mike Fedyk
  0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22  3:03 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Mike Fedyk, linux-kernel



On Sat, 21 Feb 2004, Chris Wedgwood wrote:
> 
> Forcing paging will push this down to acceptable levels but it's a
> really irritating solution --- I'm still trying to think of a better
> way to stop the dentries from using such a disproportionate amount of
> memory.

Why?

It's quite likely that especially on a fairly idle machine, the dentry 
cache really _should_ be the biggest single memory user.

Why? Because an idle machine tends to largely be dominated by things like 
"updatedb" and friends running. If there isn't any other real activity, 
there's no reason for a big page cache, nor is there anything that would 
put memory pressure on the dentries, so they grow as much as they can.

Do you see any actual bad behaviour from this?

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:03   ` Linus Torvalds
@ 2004-02-22  3:11     ` Chris Wedgwood
  2004-02-22  3:28       ` Linus Torvalds
  2004-02-22  3:21     ` Mike Fedyk
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  3:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel

On Sat, Feb 21, 2004 at 07:03:50PM -0800, Linus Torvalds wrote:

> It's quite likely that especially on a fairly idle machine, the
> dentry cache really _should_ be the biggest single memory user.

Only because updatedb/find/du populate it sporadically.  Things like
cron jobs run overnight and fill the slab, which *never* shrinks[1].

> Do you see any actual bad behaviour from this?

The page-cache is restricted to small sizes making the fs rather slow
at times.  Ideally with 1.5GB of RAM I'd like to be able to get 800MB
or so into the page-cache... not 200MB.

Maybe gradual page-cache pressure could shrink the slab?


  --cw



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:03   ` Linus Torvalds
  2004-02-22  3:11     ` Chris Wedgwood
@ 2004-02-22  3:21     ` Mike Fedyk
  1 sibling, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  3:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chris Wedgwood, linux-kernel

Linus Torvalds wrote:

> 
> On Sat, 21 Feb 2004, Chris Wedgwood wrote:
> 
>>Forcing paging will push this down to acceptable levels but it's a
>>really irritating solution --- I'm still trying to think of a better
>>way to stop the dentries from using such a disproportionate amount of
>>memory.
> 
> 
> Why?
> 
> It's quite likely that especially on a fairly idle machine, the dentry 
> cache really _should_ be the biggest single memory user.
> 
> Why? Because an idle machine tends to largely be dominated by things like 
> "updatedb" and friends running. If there isn't any other real activity, 
> there's no reason for a big page cache, nor is there anything that would 
> put memory pressure on the dentries, so they grow as much as they can.
> 
> Do you see any actual bad behaviour from this?
> 
> 		Linus

Yes, see another message from me in this thread where I cat all files in 
my drive with 700MB slab (mostly dentries), and 100MB page cache after 
it's done.

Other than that the machine is idle over the weekend.  During the week 
it serves files over samba and knfsd in addition to exporting ~20 KDE 
desktops over VNC, and imap to ~4 users.  The desktops get little use at 
the moment though.

So having a small page cache should be detrimental to this machine.

http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com.html

The url above will show graphs for the machine in question.  But these 
graphs should be particularly interesting:

I'm swapping occasionally, but only ~5 of the 20 KDE desktops are in use 
during the week:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-swap.html
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-memory.html

I have a lot of open inodes, and when that goes down, so does the size 
of my slab:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-open_inodes.html

This is to show the disk activity that should have enlarged my page cache:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-iostat.html

Mike


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:11     ` Chris Wedgwood
@ 2004-02-22  3:28       ` Linus Torvalds
  2004-02-22  3:29         ` Chris Wedgwood
                           ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22  3:28 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Mike Fedyk, linux-kernel



On Sat, 21 Feb 2004, Chris Wedgwood wrote:
> 
> Maybe gradual page-cache pressure could shrink the slab?

What happened to the experiment of having slab pages on the (in)active
lists and letting them be free'd that way? Didn't somebody already do 
that? Ed Tomlinson and Craig Kulesa?

That's still something I'd like to try, although that's obviously 2.7.x 
material, so not useful for right now.

Or did the experiment just never work out well?

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:28       ` Linus Torvalds
@ 2004-02-22  3:29         ` Chris Wedgwood
  2004-02-22  3:31         ` Chris Wedgwood
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  3:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel

On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:

> What happened to the experiment of having slab pages on the
> (in)active lists and letting them be free'd that way? Didn't
> somebody already do that? Ed Tomlinson and Craig Kulesa?

Stupid question, perhaps, but how would I implement this to test it
out?  It doesn't seem entirely trivial, and I'm largely ignorant of
these parts of the kernel.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:28       ` Linus Torvalds
  2004-02-22  3:29         ` Chris Wedgwood
@ 2004-02-22  3:31         ` Chris Wedgwood
  2004-02-22  4:01           ` Nick Piggin
  2004-02-22  6:15         ` Andrew Morton
  2004-02-22 14:03         ` Ed Tomlinson
  3 siblings, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  3:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel

On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:

> What happened to the experiment of having slab pages on the
> (in)active lists and letting them be free'd that way? Didn't
> somebody already do that? Ed Tomlinson and Craig Kulesa?

Just as a data point:

cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
LowTotal:       898448 kB
Slab:           846260 kB

So the slab pressure I have right now is simply because there is
nowhere else for it to grow...



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:31         ` Chris Wedgwood
@ 2004-02-22  4:01           ` Nick Piggin
  2004-02-22  4:10             ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  4:01 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 595 bytes --]



Chris Wedgwood wrote:

>On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
>
>
>>What happened to the experiment of having slab pages on the
>>(in)active lists and letting them be free'd that way? Didn't
>>somebody already do that? Ed Tomlinson and Craig Kulesa?
>>
>
>Just as a data point:
>
>cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
>LowTotal:       898448 kB
>Slab:           846260 kB
>
>So the slab pressure I have right now is simply because there is
>nowhere else it has to grow...
>
>

Can you try the following patch? It is against 2.6.3-mm2.


[-- Attachment #2: vm-slab-balance.patch --]
[-- Type: text/plain, Size: 5719 bytes --]

 linux-2.6-npiggin/fs/dcache.c         |    4 ++--
 linux-2.6-npiggin/fs/dquot.c          |    2 +-
 linux-2.6-npiggin/fs/inode.c          |    4 ++--
 linux-2.6-npiggin/fs/mbcache.c        |    2 +-
 linux-2.6-npiggin/fs/xfs/linux/kmem.h |    2 +-
 linux-2.6-npiggin/include/linux/mm.h  |    3 +--
 linux-2.6-npiggin/mm/vmscan.c         |   22 ++++++++++++----------
 7 files changed, 20 insertions(+), 19 deletions(-)

diff -puN mm/vmscan.c~vm-slab-balance mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-02-22 15:00:24.000000000 +1100
@@ -82,7 +82,6 @@ static long total_memory;
 struct shrinker {
 	shrinker_t		shrinker;
 	struct list_head	list;
-	int			seeks;	/* seeks to recreate an obj */
 	long			nr;	/* objs pending delete */
 };
 
@@ -92,14 +91,13 @@ static DECLARE_MUTEX(shrinker_sem);
 /*
  * Add a shrinker callback to be called from the vm
  */
-struct shrinker *set_shrinker(int seeks, shrinker_t theshrinker)
+struct shrinker *set_shrinker(shrinker_t theshrinker)
 {
         struct shrinker *shrinker;
 
         shrinker = kmalloc(sizeof(*shrinker), GFP_KERNEL);
         if (shrinker) {
 	        shrinker->shrinker = theshrinker;
-	        shrinker->seeks = seeks;
 	        shrinker->nr = 0;
 	        down(&shrinker_sem);
 	        list_add(&shrinker->list, &shrinker_list);
@@ -139,20 +137,24 @@ EXPORT_SYMBOL(remove_shrinker);
  */
 static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
 {
+	unsigned long long to_scan = scanned;
+	unsigned long slab_size = 0;
 	struct shrinker *shrinker;
-	long pages;
 
 	if (down_trylock(&shrinker_sem))
 		return 0;
 
-	pages = nr_used_zone_pages();
 	list_for_each_entry(shrinker, &shrinker_list, list) {
-		unsigned long long delta;
+		slab_size += (*shrinker->shrinker)(0, gfp_mask);
+	}
 
-		delta = 4 * scanned / shrinker->seeks;
-		delta *= (*shrinker->shrinker)(0, gfp_mask);
-		do_div(delta, pages + 1);
-		shrinker->nr += delta;
+	list_for_each_entry(shrinker, &shrinker_list, list) {
+		unsigned long long delta = to_scan;
+		int this_size = (*shrinker->shrinker)(0, gfp_mask);
+		delta *= this_size;
+		do_div(delta, slab_size + 1);
+		/* + 1 to make sure some scanning is eventually done */
+		shrinker->nr += delta + 1;
 		if (shrinker->nr > SHRINK_BATCH) {
 			long nr_to_scan = shrinker->nr;
 
diff -puN include/linux/mm.h~vm-slab-balance include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h	2004-02-22 14:52:45.000000000 +1100
@@ -483,9 +483,8 @@ typedef int (*shrinker_t)(int nr_to_scan
  * to recreate one of the objects that these functions age.
  */
 
-#define DEFAULT_SEEKS 2
 struct shrinker;
-extern struct shrinker *set_shrinker(int, shrinker_t);
+extern struct shrinker *set_shrinker(shrinker_t);
 extern void remove_shrinker(struct shrinker *shrinker);
 
 /*
diff -puN fs/dcache.c~vm-slab-balance fs/dcache.c
--- linux-2.6/fs/dcache.c~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/dcache.c	2004-02-22 14:52:45.000000000 +1100
@@ -657,7 +657,7 @@ static int shrink_dcache_memory(int nr, 
 		if (gfp_mask & __GFP_FS)
 			prune_dcache(nr);
 	}
-	return dentry_stat.nr_unused;
+	return dentry_stat.nr_dentry;
 }
 
 #define NAME_ALLOC_LEN(len)	((len+16) & ~15)
@@ -1564,7 +1564,7 @@ static void __init dcache_init(unsigned 
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
 	
-	set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+	set_shrinker(shrink_dcache_memory);
 
 	if (!dhash_entries)
 		dhash_entries = PAGE_SHIFT < 13 ?
diff -puN fs/dquot.c~vm-slab-balance fs/dquot.c
--- linux-2.6/fs/dquot.c~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/dquot.c	2004-02-22 14:52:45.000000000 +1100
@@ -1661,7 +1661,7 @@ static int __init dquot_init(void)
 	if (!dquot_cachep)
 		panic("Cannot create dquot SLAB cache");
 
-	set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+	set_shrinker(shrink_dqcache_memory);
 
 	return 0;
 }
diff -puN fs/inode.c~vm-slab-balance fs/inode.c
--- linux-2.6/fs/inode.c~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/inode.c	2004-02-22 14:52:45.000000000 +1100
@@ -479,7 +479,7 @@ static int shrink_icache_memory(int nr, 
 		if (gfp_mask & __GFP_FS)
 			prune_icache(nr);
 	}
-	return inodes_stat.nr_unused;
+	return inodes_stat.nr_inodes;
 }
 
 static void __wait_on_freeing_inode(struct inode *inode);
@@ -1394,7 +1394,7 @@ void __init inode_init(unsigned long mem
 	if (!inode_cachep)
 		panic("cannot create inode slab cache");
 
-	set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+	set_shrinker(shrink_icache_memory);
 }
 
 void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
diff -puN fs/mbcache.c~vm-slab-balance fs/mbcache.c
--- linux-2.6/fs/mbcache.c~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/mbcache.c	2004-02-22 14:52:45.000000000 +1100
@@ -629,7 +629,7 @@ mb_cache_entry_find_next(struct mb_cache
 
 static int __init init_mbcache(void)
 {
-	mb_shrinker = set_shrinker(DEFAULT_SEEKS, mb_cache_shrink_fn);
+	mb_shrinker = set_shrinker(mb_cache_shrink_fn);
 	return 0;
 }
 
diff -puN fs/xfs/linux/kmem.h~vm-slab-balance fs/xfs/linux/kmem.h
--- linux-2.6/fs/xfs/linux/kmem.h~vm-slab-balance	2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/xfs/linux/kmem.h	2004-02-22 14:52:45.000000000 +1100
@@ -171,7 +171,7 @@ typedef int (*kmem_shake_func_t)(int, un
 static __inline kmem_shaker_t
 kmem_shake_register(kmem_shake_func_t sfunc)
 {
-	return set_shrinker(DEFAULT_SEEKS, sfunc);
+	return set_shrinker(sfunc);
 }
 
 static __inline void

_

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  4:01           ` Nick Piggin
@ 2004-02-22  4:10             ` Nick Piggin
  2004-02-22  4:30               ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  4:10 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel



Nick Piggin wrote:

>
>
> Chris Wedgwood wrote:
>
>> On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
>>
>>
>>> What happened to the experiment of having slab pages on the
>>> (in)active lists and letting them be free'd that way? Didn't
>>> somebody already do that? Ed Tomlinson and Craig Kulesa?
>>>
>>
>> Just as a data point:
>>
>> cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
>> LowTotal:       898448 kB
>> Slab:           846260 kB
>>
>> So the slab pressure I have right now is simply because there is
>> nowhere else it has to grow...
>>
>>
>
> Can you try the following patch? It is against 2.6.3-mm2.
>

Actually I think the previous shrink_slab formula factors
out to the right thing anyway, so nevermind this patch :P



* Re: Large slab cache in 2.6.1
  2004-02-22  4:10             ` Nick Piggin
@ 2004-02-22  4:30               ` Nick Piggin
  2004-02-22  4:41                 ` Mike Fedyk
  2004-02-22  6:09                 ` Andrew Morton
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  4:30 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel



Nick Piggin wrote:

>
> Actually I think the previous shrink_slab formula factors
> out to the right thing anyway, so nevermind this patch :P
>
>

Although, nr_used_zone_pages probably shouldn't be counting
highmem zones, which might be our problem.




* Re: Large slab cache in 2.6.1
  2004-02-22  4:30               ` Nick Piggin
@ 2004-02-22  4:41                 ` Mike Fedyk
  2004-02-22  5:37                   ` Nick Piggin
  2004-02-22  6:09                 ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  4:41 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel

Nick Piggin wrote:

> 
> 
> Nick Piggin wrote:
> 
>>
>> Actually I think the previous shrink_slab formula factors
>> out to the right thing anyway, so nevermind this patch :P
>>
>>
> 
> Although, nr_used_zone_pages probably shouldn't be counting
> highmem zones, which might be our problem.

What is the kernel parameter to disable highmem?  I saw nohighio, but 
that's not it...

Doesn't "mem=" have alignment problems?



* Re: Large slab cache in 2.6.1
  2004-02-22  4:41                 ` Mike Fedyk
@ 2004-02-22  5:37                   ` Nick Piggin
  2004-02-22  5:44                     ` Chris Wedgwood
  2004-02-22  5:50                     ` Mike Fedyk
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  5:37 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]



Mike Fedyk wrote:

> Nick Piggin wrote:
>
>>
>>
>> Nick Piggin wrote:
>>
>>>
>>> Actually I think the previous shrink_slab formula factors
>>> out to the right thing anyway, so nevermind this patch :P
>>>
>>>
>>
>> Although, nr_used_zone_pages probably shouldn't be counting
>> highmem zones, which might be our problem.
>
>
> What is the kernel parameter to disable highmem?  I saw nohighio, but 
> that's not it...
>

Not sure. That defeats the purpose of trying to get your setup
working nicely though ;)

Can you upgrade to 2.6.3-mm2? It would be ideal if you could
test this patch against that kernel due to the other VM changes.

Chris, could you test this too please? Thanks.


[-- Attachment #2: vm-shrink-slab-lowmem.patch --]
[-- Type: text/plain, Size: 3526 bytes --]

 linux-2.6-npiggin/include/linux/mm.h |    2 +-
 linux-2.6-npiggin/mm/page_alloc.c    |   13 +++++++++----
 linux-2.6-npiggin/mm/vmscan.c        |   22 ++++++++++++----------
 3 files changed, 22 insertions(+), 15 deletions(-)

diff -puN mm/vmscan.c~vm-shrink-slab-lowmem mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-02-22 16:35:06.000000000 +1100
@@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
 	if (down_trylock(&shrinker_sem))
 		return 0;
 
-	pages = nr_used_zone_pages();
+	pages = nr_lowmem_lru_pages();
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 
@@ -857,7 +857,8 @@ shrink_zone(struct zone *zone, unsigned 
  */
 static int
 shrink_caches(struct zone **zones, int priority, int *total_scanned,
-		int gfp_mask, int nr_pages, struct page_state *ps)
+		int *lowmem_scanned, int gfp_mask, int nr_pages,
+		struct page_state *ps)
 {
 	int ret = 0;
 	int i;
@@ -875,7 +876,10 @@ shrink_caches(struct zone **zones, int p
 
 		ret += shrink_zone(zone, gfp_mask,
 				to_reclaim, &nr_scanned, ps, priority);
+
 		*total_scanned += nr_scanned;
+		if (i < ZONE_HIGHMEM)
+			*lowmem_scanned += nr_scanned;
 		if (ret >= nr_pages)
 			break;
 	}
@@ -915,19 +919,17 @@ int try_to_free_pages(struct zone **zone
 		zones[i]->temp_priority = DEF_PRIORITY;
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
-		int total_scanned = 0;
+		int total_scanned = 0, lowmem_scanned = 0;
 		struct page_state ps;
 
 		get_page_state(&ps);
 		nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
-						gfp_mask, nr_pages, &ps);
+				&lowmem_scanned, gfp_mask, nr_pages, &ps);
 
-		if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
-			shrink_slab(total_scanned, gfp_mask);
-			if (reclaim_state) {
-				nr_reclaimed += reclaim_state->reclaimed_slab;
-				reclaim_state->reclaimed_slab = 0;
-			}
+		shrink_slab(lowmem_scanned, gfp_mask);
+		if (reclaim_state) {
+			nr_reclaimed += reclaim_state->reclaimed_slab;
+			reclaim_state->reclaimed_slab = 0;
 		}
 
 		if (nr_reclaimed >= nr_pages) {
diff -puN mm/page_alloc.c~vm-shrink-slab-lowmem mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-02-22 16:35:06.000000000 +1100
@@ -772,13 +772,18 @@ unsigned int nr_free_pages(void)
 
 EXPORT_SYMBOL(nr_free_pages);
 
-unsigned int nr_used_zone_pages(void)
+unsigned int nr_lowmem_lru_pages(void)
 {
+	pg_data_t *pgdat;
 	unsigned int pages = 0;
-	struct zone *zone;
 
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
+	for_each_pgdat(pgdat) {
+		int i;
+		for (i = 0; i < ZONE_HIGHMEM; i++) {
+			struct zone *zone = pgdat->node_zones + i;
+			pages += zone->nr_active + zone->nr_inactive;
+		}
+	}
 
 	return pages;
 }
diff -puN include/linux/mm.h~vm-shrink-slab-lowmem include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h	2004-02-22 16:35:06.000000000 +1100
@@ -625,7 +625,7 @@ static inline struct vm_area_struct * fi
 
 extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
 
-extern unsigned int nr_used_zone_pages(void);
+extern unsigned int nr_lowmem_lru_pages(void);
 
 extern struct page * vmalloc_to_page(void *addr);
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address,

_


* Re: Large slab cache in 2.6.1
  2004-02-22  5:37                   ` Nick Piggin
@ 2004-02-22  5:44                     ` Chris Wedgwood
  2004-02-22  5:52                       ` Nick Piggin
  2004-02-22  5:50                     ` Mike Fedyk
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  5:44 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Mike Fedyk, Linus Torvalds, linux-kernel

On Sun, Feb 22, 2004 at 04:37:46PM +1100, Nick Piggin wrote:

> Can you upgrade to 2.6.3-mm2? It would be ideal if you could test
> this patch against that kernel due to the other VM changes.

Sure.

> Chris, could you test this too please? Thanks.

I tested this change to a stock 2.6.3 kernel and saw a marginally
better situation... 650MB in slab instead of 850MB:

===== page_alloc.c 1.186 vs edited =====
--- 1.186/mm/page_alloc.c	Wed Feb 18 19:43:04 2004
+++ edited/page_alloc.c	Sat Feb 21 21:05:32 2004
@@ -764,13 +764,18 @@
 
 EXPORT_SYMBOL(nr_free_pages);
 
+/*
+ * return the number of non-highmem pages (we should probably rename
+ * this function? --cw)
+ */
 unsigned int nr_used_zone_pages(void)
 {
 	unsigned int pages = 0;
 	struct zone *zone;
 
 	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
+		if (!is_highmem(zone))
+		    pages += zone->nr_active + zone->nr_inactive;
 
 	return pages;
 }


I'll test -mm2 with your patch shortly.

> 

> @@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
>  	if (down_trylock(&shrinker_sem))
>  		return 0;
>  
> -	pages = nr_used_zone_pages();
> +	pages = nr_lowmem_lru_pages();

Cool. I think renaming this is a good idea.

> -unsigned int nr_used_zone_pages(void)
> +unsigned int nr_lowmem_lru_pages(void)
>  {
> +	pg_data_t *pgdat;
>  	unsigned int pages = 0;
> -	struct zone *zone;
>  
> -	for_each_zone(zone)
> -		pages += zone->nr_active + zone->nr_inactive;
> +	for_each_pgdat(pgdat) {
> +		int i;
> +		for (i = 0; i < ZONE_HIGHMEM; i++) {
> +			struct zone *zone = pgdat->node_zones + i;
> +			pages += zone->nr_active + zone->nr_inactive;
> +		}
> +	}

Why not just check is_highmem(zone) here?

> -extern unsigned int nr_used_zone_pages(void);
> +extern unsigned int nr_lowmem_lru_pages(void);

Since shrink_slab() is the only consumer of this why not move the
function to vmscan.c just above shrink_slab()?




* Re: Large slab cache in 2.6.1
  2004-02-22  5:37                   ` Nick Piggin
  2004-02-22  5:44                     ` Chris Wedgwood
@ 2004-02-22  5:50                     ` Mike Fedyk
  2004-02-22  6:01                       ` Nick Piggin
  1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  5:50 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel

Nick Piggin wrote:

> 
> 
> Mike Fedyk wrote:
>> What is the kernel parameter to disable highmem?  I saw nohighio, but 
>> that's not it...
>>
> 
> Not sure. That defeats the purpose of trying to get your setup
> working nicely though ;)
> 
> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
> test this patch against that kernel due to the other VM changes.

I can test on another machine, but it doesn't have as much memory, and 
I'd have to use highmem emulation.

I'd prefer to not have to restart this machine and put a test kernel on it.

> 
> Chris, could you test this too please? Thanks.

Yes, Chris do you have any highmem machines where you can test this patch?

Mike



* Re: Large slab cache in 2.6.1
  2004-02-22  5:44                     ` Chris Wedgwood
@ 2004-02-22  5:52                       ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  5:52 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Mike Fedyk, Linus Torvalds, linux-kernel



Chris Wedgwood wrote:

>On Sun, Feb 22, 2004 at 04:37:46PM +1100, Nick Piggin wrote:
>
>
>>Can you upgrade to 2.6.3-mm2? It would be ideal if you could test
>>this patch against that kernel due to the other VM changes.
>>
>
>Sure.
>
>
>>Chris, could you test this too please? Thanks.
>>
>
>I tested this change to a stock 2.6.3 kernel and saw a marginally
>better situation... 650MB in slab instead of 850MB:
>

In your case, this is probably ideal if the system is
not doing much. You now have a reasonable amount of low
memory available.


>
>===== page_alloc.c 1.186 vs edited =====
>--- 1.186/mm/page_alloc.c	Wed Feb 18 19:43:04 2004
>+++ edited/page_alloc.c	Sat Feb 21 21:05:32 2004
>@@ -764,13 +764,18 @@
> 
> EXPORT_SYMBOL(nr_free_pages);
> 
>+/*
>+ * return the number of non-highmem pages (we should probably rename
>+ * this function? --cw)
>+ */
> unsigned int nr_used_zone_pages(void)
> {
> 	unsigned int pages = 0;
> 	struct zone *zone;
> 
> 	for_each_zone(zone)
>-		pages += zone->nr_active + zone->nr_inactive;
>+		if (!is_highmem(zone))
>+		    pages += zone->nr_active + zone->nr_inactive;
> 
> 	return pages;
> }
>
>
>I'll test -mm2 with your patch shortly.
>
>

My patch will be functionally the same as yours so you'll be
mainly testing the other VM changes (which isn't a bad thing).
Thanks.

>
>>@@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
>> 	if (down_trylock(&shrinker_sem))
>> 		return 0;
>> 
>>-	pages = nr_used_zone_pages();
>>+	pages = nr_lowmem_lru_pages();
>>
>
>Cool. I think renaming this is a good idea.
>
>

Yep.

>>-unsigned int nr_used_zone_pages(void)
>>+unsigned int nr_lowmem_lru_pages(void)
>> {
>>+	pg_data_t *pgdat;
>> 	unsigned int pages = 0;
>>-	struct zone *zone;
>> 
>>-	for_each_zone(zone)
>>-		pages += zone->nr_active + zone->nr_inactive;
>>+	for_each_pgdat(pgdat) {
>>+		int i;
>>+		for (i = 0; i < ZONE_HIGHMEM; i++) {
>>+			struct zone *zone = pgdat->node_zones + i;
>>+			pages += zone->nr_active + zone->nr_inactive;
>>+		}
>>+	}
>>
>
>Why not just check is_highmem(zone) here?
>
>

Why indeed? Easier to read vs a tiny bit faster. It isn't
really a fast path, so your version is probably better.

>>-extern unsigned int nr_used_zone_pages(void);
>>+extern unsigned int nr_lowmem_lru_pages(void);
>>
>
>Since shrink_slab() is the only consumer of this why not move the
>function to vmscan.c just above shrink_slab()?
>
>

Might as well I suppose. I'll incorporate your suggestions if
it tests well and I end up sending it off to Andrew.



* Re: Large slab cache in 2.6.1
  2004-02-22  5:50                     ` Mike Fedyk
@ 2004-02-22  6:01                       ` Nick Piggin
  2004-02-22  6:17                         ` Andrew Morton
  2004-02-22  6:45                         ` Mike Fedyk
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  6:01 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton



Mike Fedyk wrote:

> Nick Piggin wrote:
>
>>
>>
>> Mike Fedyk wrote:
>>
>>> What is the kernel parameter to disable highmem?  I saw nohighio, 
>>> but that's not it...
>>>
>>
>> Not sure. That defeats the purpose of trying to get your setup
>> working nicely though ;)
>>
>> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
>> test this patch against that kernel due to the other VM changes.
>
>
> I can test on another machine, but it doesn't have as much memory, and 
> I'd have to use highmem emulation.
>

Probably not worth the bother. It is easy enough for anyone to
test random things, but the reason your feedback is so important
is because you are actually *using* the system.

> I'd prefer to not have to restart this machine and put a test kernel 
> on it.
>

Fair enough. Maybe if we can get enough testing, some of the mm
changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
we'd better wait until 2.6.10 ;)

>>
>> Chris, could you test this too please? Thanks.
>
>
> Yes, Chris do you have any highmem machines where you can test this 
> patch?
>

The system he's testing on has 1.5G too.




* Re: Large slab cache in 2.6.1
  2004-02-22  4:30               ` Nick Piggin
  2004-02-22  4:41                 ` Mike Fedyk
@ 2004-02-22  6:09                 ` Andrew Morton
  2004-02-22 17:05                   ` Linus Torvalds
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22  6:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: cw, torvalds, mfedyk, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> Although, nr_used_zone_pages probably shouldn't be counting
>  highmem zones, which might be our problem.

yeah.  We should have made that change when making shrink_slab() ignore
highmem scanning.

Something like this (the function needs a rename)

--- 25/mm/page_alloc.c~shrink_slab-highmem-fix	2004-02-21 22:07:32.000000000 -0800
+++ 25-akpm/mm/page_alloc.c	2004-02-21 22:08:03.000000000 -0800
@@ -769,8 +769,10 @@ unsigned int nr_used_zone_pages(void)
 	unsigned int pages = 0;
 	struct zone *zone;
 
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
+	for_each_zone(zone) {
+		if (zone - zone->zone_pgdat->node_zones < ZONE_HIGHMEM)
+			pages += zone->nr_active + zone->nr_inactive;
+	}
 
 	return pages;
 }

_



* Re: Large slab cache in 2.6.1
  2004-02-22  3:28       ` Linus Torvalds
  2004-02-22  3:29         ` Chris Wedgwood
  2004-02-22  3:31         ` Chris Wedgwood
@ 2004-02-22  6:15         ` Andrew Morton
  2004-02-22 16:08           ` Martin J. Bligh
  2004-02-22 14:03         ` Ed Tomlinson
  3 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22  6:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: cw, mfedyk, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
> What happened to the experiment of having slab pages on the (in)active
>  lists and letting them be free'd that way? Didn't somebody already do 
>  that? Ed Tomlinson and Craig Kulesa?

That was Ed.  Because we cannot reclaim slab pages direct from the LRU it
turned out that putting slab pages onto the LRU was merely an extremely
complicated way of making the VFS cache scanning rate proportional to the
pagecache scanning rate.  So we ended up doing just that, without putting
the slab pages on the LRU.




* Re: Large slab cache in 2.6.1
  2004-02-22  6:01                       ` Nick Piggin
@ 2004-02-22  6:17                         ` Andrew Morton
  2004-02-22  6:35                           ` Nick Piggin
  2004-02-22  6:45                         ` Mike Fedyk
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22  6:17 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mfedyk, cw, torvalds, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> Fair enough. Maybe if we can get enough testing, some of the mm
>  changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
>  we'd better wait until 2.6.10 ;)

I need to alight from my lazy tail and test them a bit^Wlot first.  More
like 2.6.5.



* Re: Large slab cache in 2.6.1
  2004-02-22  6:17                         ` Andrew Morton
@ 2004-02-22  6:35                           ` Nick Piggin
  2004-02-22  6:57                             ` Andrew Morton
  2004-02-22  8:36                             ` Chris Wedgwood
  0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  6:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mfedyk, cw, torvalds, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>  
>
>>Fair enough. Maybe if we can get enough testing, some of the mm
>> changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
>> we'd better wait until 2.6.10 ;)
>>    
>>
>
>I need to alight from my lazy tail and test them a bit^Wlot first.  More
>like 2.6.5.
>
>  
>

Can you maybe use this patch then, please?

Thanks


[-- Attachment #2: vm-shrink-slab-lowmem.patch --]
[-- Type: text/plain, Size: 5212 bytes --]

 linux-2.6-npiggin/include/linux/mm.h |    2 -
 linux-2.6-npiggin/mm/page_alloc.c    |   11 ------
 linux-2.6-npiggin/mm/vmscan.c        |   64 ++++++++++++++++++++++++++++-------
 3 files changed, 52 insertions(+), 25 deletions(-)

diff -puN mm/vmscan.c~vm-shrink-slab-lowmem mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c	2004-02-22 17:30:53.000000000 +1100
@@ -122,7 +122,25 @@ void remove_shrinker(struct shrinker *sh
 }
 
 EXPORT_SYMBOL(remove_shrinker);
- 
+
+/*
+ * Returns the number of lowmem pages which are on the lru lists
+ */
+static unsigned int nr_lowmem_lru_pages(void)
+{
+	unsigned int pages = 0;
+	struct zone *zone;
+
+	for_each_zone(zone) {
+		if (unlikely(is_highmem(zone)))
+			continue;
+		pages += zone->nr_active + zone->nr_inactive;
+	}
+
+	return pages;
+}
+
+
 #define SHRINK_BATCH 128
 /*
  * Call the shrink functions to age shrinkable caches
@@ -136,6 +154,24 @@ EXPORT_SYMBOL(remove_shrinker);
  * slab to avoid swapping.
  *
  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
+ *
+ * The formula to work out how much to scan each slab is as follows:
+ * Let S be the number of lowmem LRU pages were scanned (scanned)
+ * Let M be the total number of lowmem LRU pages (pages)
+ * T be the total number of all slab items.
+ * For each slab:
+ * I be number of slab items ((*shrinker->shrinker)(0, gfp_mask))
+ *
+ * "S * M / T" then gives the total number of slab items to scan, N
+ * Then for each slab, "N * T / I" is the number of items to scan for this slab.
+ *
+ * This simplifies to  "S * M / I", or
+ * lowmem lru scanned * items in this slab / total lowmem lru pages
+ *
+ * TODO:
+ * The value of M should be calculated *before* LRU scanning.
+ * Total number of items in each slab should be used, not just freeable ones.
+ * Unfreeable slab items should not count toward the scanning total.
  */
 static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
 {
@@ -145,14 +181,16 @@ static int shrink_slab(unsigned long sca
 	if (down_trylock(&shrinker_sem))
 		return 0;
 
-	pages = nr_used_zone_pages();
+	pages = nr_lowmem_lru_pages();
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 
 		delta = 4 * scanned / shrinker->seeks;
 		delta *= (*shrinker->shrinker)(0, gfp_mask);
 		do_div(delta, pages + 1);
-		shrinker->nr += delta;
+
+		/* +1 to ensure some scanning gets done */
+		shrinker->nr += delta + 1;
 		if (shrinker->nr > SHRINK_BATCH) {
 			long nr_to_scan = shrinker->nr;
 
@@ -857,7 +895,8 @@ shrink_zone(struct zone *zone, unsigned 
  */
 static int
 shrink_caches(struct zone **zones, int priority, int *total_scanned,
-		int gfp_mask, int nr_pages, struct page_state *ps)
+		int *lowmem_scanned, int gfp_mask, int nr_pages,
+		struct page_state *ps)
 {
 	int ret = 0;
 	int i;
@@ -875,7 +914,10 @@ shrink_caches(struct zone **zones, int p
 
 		ret += shrink_zone(zone, gfp_mask,
 				to_reclaim, &nr_scanned, ps, priority);
+
 		*total_scanned += nr_scanned;
+		if (i < ZONE_HIGHMEM)
+			*lowmem_scanned += nr_scanned;
 		if (ret >= nr_pages)
 			break;
 	}
@@ -915,19 +957,17 @@ int try_to_free_pages(struct zone **zone
 		zones[i]->temp_priority = DEF_PRIORITY;
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
-		int total_scanned = 0;
+		int total_scanned = 0, lowmem_scanned = 0;
 		struct page_state ps;
 
 		get_page_state(&ps);
 		nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
-						gfp_mask, nr_pages, &ps);
+				&lowmem_scanned, gfp_mask, nr_pages, &ps);
 
-		if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
-			shrink_slab(total_scanned, gfp_mask);
-			if (reclaim_state) {
-				nr_reclaimed += reclaim_state->reclaimed_slab;
-				reclaim_state->reclaimed_slab = 0;
-			}
+		shrink_slab(lowmem_scanned, gfp_mask);
+		if (reclaim_state) {
+			nr_reclaimed += reclaim_state->reclaimed_slab;
+			reclaim_state->reclaimed_slab = 0;
 		}
 
 		if (nr_reclaimed >= nr_pages) {
diff -puN mm/page_alloc.c~vm-shrink-slab-lowmem mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-02-22 17:04:43.000000000 +1100
@@ -772,17 +772,6 @@ unsigned int nr_free_pages(void)
 
 EXPORT_SYMBOL(nr_free_pages);
 
-unsigned int nr_used_zone_pages(void)
-{
-	unsigned int pages = 0;
-	struct zone *zone;
-
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
-
-	return pages;
-}
-
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
diff -puN include/linux/mm.h~vm-shrink-slab-lowmem include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-shrink-slab-lowmem	2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h	2004-02-22 17:04:26.000000000 +1100
@@ -625,8 +625,6 @@ static inline struct vm_area_struct * fi
 
 extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
 
-extern unsigned int nr_used_zone_pages(void);
-
 extern struct page * vmalloc_to_page(void *addr);
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
 		int write);

_


* Re: Large slab cache in 2.6.1
  2004-02-22  6:01                       ` Nick Piggin
  2004-02-22  6:17                         ` Andrew Morton
@ 2004-02-22  6:45                         ` Mike Fedyk
  2004-02-22  6:58                           ` Nick Piggin
  1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  6:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton

Nick Piggin wrote:

> 
> 
> Mike Fedyk wrote:
> 
>> Nick Piggin wrote:
>>
>>>
>>>
>>> Mike Fedyk wrote:
>>>
>>>> What is the kernel parameter to disable highmem?  I saw nohighio, 
>>>> but that's not it...
>>>>
>>>
>>> Not sure. That defeats the purpose of trying to get your setup
>>> working nicely though ;)
>>>
>>> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
>>> test this patch against that kernel due to the other VM changes.
>>
>>
>>
>> I can test on another machine, but it doesn't have as much memory, and 
>> I'd have to use highmem emulation.
>>
> 
> Probably not worth the bother. It is easy enough for anyone to
> test random things, but the reason your feedback is so important
> is because you are actually *using* the system.

I completely understand what you're saying.  I have seen enough threads 
where someone refused to test patches.  So let me be more specific.

I'll have to test the kernel on two other machines for a few days before 
I put it on this particular machine.  Unfortunately, both of them have < 
1.5GB ram.

So let me know which patches are most likely to fix this problem.

PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to run 
the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm -> your 
patch.

Each step would require a week-day to get a fair comparison.

Mike



* Re: Large slab cache in 2.6.1
  2004-02-22  6:35                           ` Nick Piggin
@ 2004-02-22  6:57                             ` Andrew Morton
  2004-02-22  7:20                               ` Nick Piggin
  2004-02-22  8:36                             ` Chris Wedgwood
  1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22  6:57 UTC (permalink / raw)
  To: Nick Piggin; +Cc: mfedyk, cw, torvalds, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> 
> Can you maybe use this patch then, please?

OK.

> +static unsigned int nr_lowmem_lru_pages(void)

heh, that's what I called it.

> + * Total number of items in each slab should be used, not just freeable ones.
> + * Unfreeable slab items should not count toward the scanning total.

That's up to the individual shrinkers.  What we have for dcache and icache
is close enough.  Most entries on inode_unused and dentry_unused should be
reclaimable, but checking that with some instrumentation wouldn't hurt.

> +		if (i < ZONE_HIGHMEM)
> +			*lowmem_scanned += nr_scanned;

yup.




* Re: Large slab cache in 2.6.1
  2004-02-22  6:45                         ` Mike Fedyk
@ 2004-02-22  6:58                           ` Nick Piggin
  2004-02-22  7:20                             ` Mike Fedyk
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  6:58 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton



Mike Fedyk wrote:

> Nick Piggin wrote:
>
>>
>> Probably not worth the bother. It is easy enough for anyone to
>> test random things, but the reason your feedback is so important
>> is because you are actually *using* the system.
>
>
> I completely understand what you're saying.  I have seen enough 
> threads where someone refused to test patches.  So let me be more 
> specific.
>
> I'll have to test the kernel on two other machines for a few days 
> before I put it on this particular machine.  Unfortunately, both of 
> them have < 1.5GB ram.
>

That is quite alright. I didn't intend to sound pushy in that
message, and I fully understand if you refuse to test patches on
your production machine.

> So let me know which patches are most likely to fix this problem.
>
> PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to 
> run the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm 
> -> your patch.
>
> Each step would require a week-day to get a fair compairison.
>

The last patch I posted would be a good one to test if you possibly
can. You should hear someone shout within a few days if it does
anything nasty, so the 2.6.3-mm+patch path is probably safer ;)

Nick



* Re: Large slab cache in 2.6.1
  2004-02-22  6:57                             ` Andrew Morton
@ 2004-02-22  7:20                               ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22  7:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: mfedyk, cw, torvalds, linux-kernel



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>Can you maybe use this patch then, please?
>>
>
>OK.
>
>
>>+static unsigned int nr_lowmem_lru_pages(void)
>>
>
>heh, that's what I called it.
>
>
>>+ * Total number of items in each slab should be used, not just freeable ones.
>>+ * Unfreeable slab items should not count toward the scanning total.
>>
>
>That's up to the individual shrinkers.  What we have for dcache and icache
>is close enough.  Most entries on inode_unused and dentry_unused should be
>reclaimable, but checking that with some instrumentation wouldn't hurt.
>
>

Yeah it is an issue with the shrinkers, but I put it here so I
only had to write it once.

All items under TODO list are pretty pedantic, but they might
have larger impacts with very small memory systems. They would
definitely improve consistency of shrink_slab's behaviour.

Granted they probably wouldn't do much in the large scheme of
things.



* Re: Large slab cache in 2.6.1
  2004-02-22  6:58                           ` Nick Piggin
@ 2004-02-22  7:20                             ` Mike Fedyk
  0 siblings, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22  7:20 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton

Nick Piggin wrote:

> 
> 
> Mike Fedyk wrote:
>> I'll have to test the kernel on two other machines for a few days 
>> before I put it on this particular machine.  Unfortunately, both of 
>> them have < 1.5GB ram.
>>
> 
> That is quite alright. I didn't intend to sound pushy in that
> message, and I fully understand if you refuse to test patches on
> your production machine.

No problem.  It is really sad when a problem could be fixed if only the 
original reporter had put more effort into testing the proposed fixes.

Heh, so let me keep from being one of those reporters...

> 
>> So let me know which patches are most likely to fix this problem.
>>
>> PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to 
>> run the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm 
>> -> your patch.
>>
>> Each step would require a week-day to get a fair comparison.
>>
> 
> The last patch I posted would be a good one to test if you possibly
> can. You should hear someone shout within a few days if it does
> anything nasty, so the 2.6.3-mm+patch path is probably safer ;)
> 

Ok, I'll get started compiling tonight.  Be sure to CC me if you have 
any updates to this patch.

Mike


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  6:35                           ` Nick Piggin
  2004-02-22  6:57                             ` Andrew Morton
@ 2004-02-22  8:36                             ` Chris Wedgwood
  2004-02-22  9:13                               ` Andrew Morton
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22  8:36 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, mfedyk, torvalds, linux-kernel

On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:

> Can you maybe use this patch then, please?

I probably need to do more testing, but the quick patch I was using
against mainline (bk head) works better than this against 2.6.3-mm2.

I'll poke about more tomorrow.  Time for some z's now.




^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  8:36                             ` Chris Wedgwood
@ 2004-02-22  9:13                               ` Andrew Morton
  2004-02-23  0:16                                 ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22  9:13 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: piggin, mfedyk, torvalds, linux-kernel

Chris Wedgwood <cw@f00f.org> wrote:
>
> On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
> 
> > Can you maybe use this patch then, please?
> 
> I probably need to do more testing, but the quick patch I was using
> against mainline (bk head) works better than this against 2.6.3-mm2.

The patch which went in six months or so back which said "only reclaim slab
if we're scanning lowmem pagecache" was wrong.  I must have been asleep at
the time.

We do need to scan slab in response to highmem page reclaim as well. 
Because all the math is based around the total amount of memory in the
machine, and we know that if we're performing highmem page reclaim then the
lower zones have no free memory.

Also, the fact that this patch makes such a difference on the 1.5G machine
points at a problem in balancing the reclaim rate against the different
zones.  I'll take a look at that separate problem.

This should apply to 2.6.3-mm2.


 mm/vmscan.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff -puN mm/vmscan.c~a mm/vmscan.c
--- 25/mm/vmscan.c~a	2004-02-22 00:37:09.000000000 -0800
+++ 25-akpm/mm/vmscan.c	2004-02-22 00:37:49.000000000 -0800
@@ -922,12 +922,10 @@ int try_to_free_pages(struct zone **zone
 		nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
 						gfp_mask, nr_pages, &ps);
 
-		if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
-			shrink_slab(total_scanned, gfp_mask);
-			if (reclaim_state) {
-				nr_reclaimed += reclaim_state->reclaimed_slab;
-				reclaim_state->reclaimed_slab = 0;
-			}
+		shrink_slab(total_scanned, gfp_mask);
+		if (reclaim_state) {
+			nr_reclaimed += reclaim_state->reclaimed_slab;
+			reclaim_state->reclaimed_slab = 0;
 		}
 
 		if (nr_reclaimed >= nr_pages) {
@@ -1021,11 +1019,9 @@ static int balance_pgdat(pg_data_t *pgda
 			zone->temp_priority = priority;
 			reclaimed = shrink_zone(zone, GFP_KERNEL,
 					to_reclaim, &nr_scanned, ps, priority);
-			if (i < ZONE_HIGHMEM) {
-				reclaim_state->reclaimed_slab = 0;
-				shrink_slab(nr_scanned, GFP_KERNEL);
-				reclaimed += reclaim_state->reclaimed_slab;
-			}
+			reclaim_state->reclaimed_slab = 0;
+			shrink_slab(nr_scanned, GFP_KERNEL);
+			reclaimed += reclaim_state->reclaimed_slab;
 			to_free -= reclaimed;
 			if (zone->all_unreclaimable)
 				continue;

_


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
@ 2004-02-22 11:00 Manfred Spraul
  0 siblings, 0 replies; 56+ messages in thread
From: Manfred Spraul @ 2004-02-22 11:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Linus Torvalds

>
>
>Linus Torvalds <torvalds@osdl.org> wrote:
>>
>> What happened to the experiment of having slab pages on the (in)active
>>  lists and letting them be free'd that way? Didn't somebody already do 
>>  that? Ed Tomlinson and Craig Kulesa?
>
>That was Ed.  Because we cannot reclaim slab pages direct from the LRU
>
I think that this is needed: Bonwick's slab algorithm (i.e. two-level 
linked lists, implemented in cache_alloc_refill and  free_block) is 
intended for unfreeable objects.
The dentry cache is a cache of freeable objects - a different algorithm 
would be more efficient for shrinking the dentry cache after an updatedb.
I had started prototyping, but didn't get far.
--
    Manfred


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  3:28       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2004-02-22  6:15         ` Andrew Morton
@ 2004-02-22 14:03         ` Ed Tomlinson
  2004-02-23  2:28           ` Mike Fedyk
  3 siblings, 1 reply; 56+ messages in thread
From: Ed Tomlinson @ 2004-02-22 14:03 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel

On February 21, 2004 10:28 pm, Linus Torvalds wrote:
> On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>
> > 
> > Maybe gradual page-cache pressure could shrink the slab?
>
> 
> What happened to the experiment of having slab pages on the (in)active
> lists and letting them be free'd that way? Didn't somebody already do 
> that? Ed Tomlinson and Craig Kulesa?

You have a good memory.  

We dropped this experiment since there was a lot of latency between the time a 
slab page became freeable and when it was actually freed.  The current 
callback scheme was designed to balance slab pressure and VM scanning.

Ed Tomlinson
 
> That's still something I'd like to try, although that's obviously 2.7.x 
> material, so not useful for right now.
> 
> Or did the experiment just never work out well?
> 
> 		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  6:15         ` Andrew Morton
@ 2004-02-22 16:08           ` Martin J. Bligh
  2004-02-22 17:55             ` Jamie Lokier
  2004-02-22 21:13             ` Dipankar Sarma
  0 siblings, 2 replies; 56+ messages in thread
From: Martin J. Bligh @ 2004-02-22 16:08 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds
  Cc: cw, mfedyk, linux-kernel, Dipankar Sarma, Maneesh Soni

--Andrew Morton <akpm@osdl.org> wrote (on Saturday, February 21, 2004 22:15:53 -0800):

> Linus Torvalds <torvalds@osdl.org> wrote:
>> 
>> What happened to the experiment of having slab pages on the (in)active
>>  lists and letting them be free'd that way? Didn't somebody already do 
>>  that? Ed Tomlinson and Craig Kulesa?
> 
> That was Ed.  Because we cannot reclaim slab pages direct from the LRU it
> turned out that putting slab pages onto the LRU was merely an extremely
> complicated way of making the VFS cache scanning rate proportional to the
> pagecache scanning rate.  So we ended up doing just that, without putting
> the slab pages on the LRU.

I still don't understand the rationale behind the way we currently do it - 
perhaps I'm just being particularly dense. If we have 10,000 pages full of
dcache, and start going through shooting entries by when they were LRU wrt
the entries, not the dcache itself, then (assuming random access to dcache),
we'll evenly shoot the same number of entries from each dcache page without
actually freeing any pages at all, just trashing the cache.

Now I'm aware access isn't really random, which probably saves our arse.
But then some of the entries will be locked too, which only makes things
worse (we free a bunch of entries from that page, but the page itself
still isn't freeable). So it still seems likely to me that we'll blow 
away at least half of the dcache entries before we free any significant 
number of pages at all. That seems insane to me. Moreover, the more times 
we shrink & fill, the worse the layout will get (less grouping of "recently 
used entries" into the same page).

Moreover, it seems rather expensive to do a write operation for each 
dentry to maintain the LRU list over entries. But maybe we don't do that
anymore with dcache RCU - I lost track of what that does ;-( So doing it
on the page LRU basis still makes a damned sight more sense to me. Don't
we want semantics like "once used vs twice used" preference treatment 
for dentries, etc anyway?

If someone has the patience to explain exactly why I'm crazy (on this topic,
not in general) I'd appreciate it ;-)

M.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  6:09                 ` Andrew Morton
@ 2004-02-22 17:05                   ` Linus Torvalds
  2004-02-23  0:29                     ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22 17:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, cw, mfedyk, linux-kernel



On Sat, 21 Feb 2004, Andrew Morton wrote:
>
> yeah.  We should have made that change when making shrink_slab() ignore
> highmem scanning.
> 
> Something like this (the function needs a rename)

Why not just pass in the list of zones? That way the _caller_ determines 
what zones he is interested in.

So just add a "struct zonelist *zonelist" as the argument, the same way 
"__alloc_pages()" has..

		Linus

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22 16:08           ` Martin J. Bligh
@ 2004-02-22 17:55             ` Jamie Lokier
  2004-02-23  3:45               ` Mike Fedyk
  2004-02-22 21:13             ` Dipankar Sarma
  1 sibling, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2004-02-22 17:55 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Linus Torvalds, cw, mfedyk, linux-kernel,
	Dipankar Sarma, Maneesh Soni

Martin J. Bligh wrote:
> So it still seems likely to me that we'll blow away at least half of
> the dcache entries before we free any significant number of pages at
> all. That seems insane to me.

It's not totally insane to free dcache entries from pages that won't
be freed.  It encourages new entries to be allocated in those pages.

Ideally you'd simply mark those dcache entries as prime candidates for
recycling when new entries are needed, without actually freeing them
until new entries are needed - or until their whole pages can be
released.

Also, biasing new allocations to recycle those old dcache entries, but
also biasing them to recently used pages, so that recently used
entries tend to cluster in the same pages.

(I'm not sure how those ideas would work out in practice; they're just
hand-waving).

-- Jamie

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22 16:08           ` Martin J. Bligh
  2004-02-22 17:55             ` Jamie Lokier
@ 2004-02-22 21:13             ` Dipankar Sarma
  1 sibling, 0 replies; 56+ messages in thread
From: Dipankar Sarma @ 2004-02-22 21:13 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Andrew Morton, Linus Torvalds, cw, mfedyk, linux-kernel,
	Maneesh Soni, Paul McKenney

On Sun, Feb 22, 2004 at 08:08:43AM -0800, Martin J. Bligh wrote:
> I still don't understand the rationale behind the way we currently do it - 
> perhaps I'm just being particularly dense. If we have 10,000 pages full of
> dcache, and start going through shooting entries by when they were LRU wrt
> the entries, not the dcache itself, then (assuming random access to dcache),
> we'll evenly shoot the same number of entries from each dcache page without
> actually freeing any pages at all, just trashing the cache.
> 
> Now I'm aware access isn't really random, which probably saves our arse.
> But then some of the entries will be locked too, which only makes things
> worse (we free a bunch of entries from that page, but the page itself
> still isn't freeable). So it still seems likely to me that we'll blow 
> away at least half of the dcache entries before we free any significant 
> number of pages at all. That seems insane to me. Moreover, the more times 
> we shrink & fill, the worse the layout will get (less grouping of "recently 
> used entries" into the same page).

Do you have a quick test to demonstrate this?  That would be useful.

> Moreover, it seems rather expensive to do a write operation for each 
> dentry to maintain the LRU list over entries. But maybe we don't do that
> anymore with dcache RCU - I lost track of what that does ;-( So doing it
> on the page LRU basis still makes a damned sight more sense to me. Don't
> we want semantics like "once used vs twice used" preference treatment 
> for dentries, etc anyway?

Dcache-RCU hasn't changed the dentry freeing to slab much, it is still
LRU. Given a CPU, dentries are still returned to the slab
in dcache LRU order.

I have always wondered about how useful the global dcache LRU 
mechanism is. This adds another reason for us to go experiment
with it.

Thanks
Dipankar

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22  9:13                               ` Andrew Morton
@ 2004-02-23  0:16                                 ` Nick Piggin
  2004-02-23  0:26                                   ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23  0:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chris Wedgwood, mfedyk, torvalds, linux-kernel



Andrew Morton wrote:

>Chris Wedgwood <cw@f00f.org> wrote:
>
>>On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
>>
>>
>>>Can you maybe use this patch then, please?
>>>
>>I probably need to do more testing, but the quick patch I was using
>>against mainline (bk head) works better than this against 2.6.3-mm2.
>>
>
>The patch which went in six months or so back which said "only reclaim slab
>if we're scanning lowmem pagecache" was wrong.  I must have been asleep at
>the time.
>
>We do need to scan slab in response to highmem page reclaim as well. 
>Because all the math is based around the total amount of memory in the
>machine, and we know that if we're performing highmem page reclaim then the
>lower zones have no free memory.
>
>

I don't understand this. Presumably if the lower zones have no free
memory then we'll be doing lowmem page reclaim too, and that will
be shrinking the slab.

The patch I sent you should (modulo the ->seeks stuff) make it
behave as if the slab pages are on lowmem LRUs and get scanned
accordingly.



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  0:16                                 ` Nick Piggin
@ 2004-02-23  0:26                                   ` Andrew Morton
  2004-02-23  0:34                                     ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23  0:26 UTC (permalink / raw)
  To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> 
> 
> Andrew Morton wrote:
> 
> >Chris Wedgwood <cw@f00f.org> wrote:
> >
> >>On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
> >>
> >>
> >>>Can you maybe use this patch then, please?
> >>>
> >>I probably need to do more testing, but the quick patch I was using
> >>against mainline (bk head) works better than this against 2.6.3-mm2.
> >>
> >
> >The patch which went in six months or so back which said "only reclaim slab
> >if we're scanning lowmem pagecache" was wrong.  I must have been asleep at
> >the time.
> >
> >We do need to scan slab in response to highmem page reclaim as well. 
> >Because all the math is based around the total amount of memory in the
> >machine, and we know that if we're performing highmem page reclaim then the
> >lower zones have no free memory.
> >
> >
> 
> I don't understand this. Presumably if the lower zones have no free
> memory then we'll be doing lowmem page reclaim too, and that will
> be shrinking the slab.

We should be performing lowmem page reclaim, but we're not.  With some
highmem/lowmem size combinations the `incremental min' logic in the page
allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
below pages_high and kswapd then does not perform page reclaim in the
lowmem zone at all.  I'm seeing some workloads where we reclaim 700 highmem
pages for each lowmem page.  This hugely exacerbated the slab problem on
1.5G machines.  I have that fixed up now.

Regardless of that, we do, logically, want to reclaim slab in response to
highmem reclaim pressure because any highmem allocation can be satisfied by
lowmem too.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22 17:05                   ` Linus Torvalds
@ 2004-02-23  0:29                     ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-23  0:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, cw, mfedyk, linux-kernel



Linus Torvalds wrote:

>
>On Sat, 21 Feb 2004, Andrew Morton wrote:
>
>>yeah.  We should have made that change when making shrink_slab() ignore
>>highmem scanning.
>>
>>Something like this (the function needs a rename)
>>
>
>Why not just pass in the list of zones? That way the _caller_ determines 
>what zones he is interested in.
>
>So just add a "struct zonelist *zonelist" as the argument, the same way 
>"__alloc_pages()" has..
>
>

The caller might as well just pass in the total size of all
LRUs that it has scanned. It has to traverse the zones anyway,
and this enables it to take the size *before* shrinking, which
gives you a better number.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  0:26                                   ` Andrew Morton
@ 2004-02-23  0:34                                     ` Nick Piggin
  2004-02-23  0:46                                       ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23  0:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>
>>Andrew Morton wrote:
>>
>>
>>>
>>>We do need to scan slab in response to highmem page reclaim as well. 
>>>Because all the math is based around the total amount of memory in the
>>>machine, and we know that if we're performing highmem page reclaim then the
>>>lower zones have no free memory.
>>>
>>>
>>>
>>I don't understand this. Presumably if the lower zones have no free
>>memory then we'll be doing lowmem page reclaim too, and that will
>>be shrinking the slab.
>>
>
>We should be performing lowmem page reclaim, but we're not.  With some
>highmem/lowmem size combinations the `incremental min' logic in the page
>allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
>below pages_high and kswapd then does not perform page reclaim in the
>lowmem zone at all.  I'm seeing some workloads where we reclaim 700 highmem
>pages for each lowmem page.  This hugely exacerbated the slab problem on
>1.5G machines.  I have that fixed up now.
>
>

This is the incremental min logic doing its work though. Maybe
that should be fixed up to be less aggressive instead of putting
more complexity in the scanner to work around it.

Anyway could you post the patch you're using to fix it?

>Regardless of that, we do, logically, want to reclaim slab in response to
>highmem reclaim pressure because any highmem allocation can be satisfied by
>lowmem too.
>
>

The logical extension of that is: "we want to reclaim *lowmem* in
response to highmem reclaim pressure because any ..."

If this isn't what the scanner is doing then it should be fixed in
a more generic way.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  0:34                                     ` Nick Piggin
@ 2004-02-23  0:46                                       ` Andrew Morton
  2004-02-23  0:54                                         ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23  0:46 UTC (permalink / raw)
  To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> >We should be performing lowmem page reclaim, but we're not.  With some
> >highmem/lowmem size combinations the `incremental min' logic in the page
> >allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
> >below pages_high and kswapd then does not perform page reclaim in the
> >lowmem zone at all.  I'm seeing some workloads where we reclaim 700 highmem
> >pages for each lowmem page.  This hugely exacerbated the slab problem on
> >1.5G machines.  I have that fixed up now.
> >
> >
> 
> This is the incremental min logic doing its work though. Maybe
> that should be fixed up to be less aggressive instead of putting
> more complexity in the scanner to work around it.

The scanner got simpler.

> Anyway could you post the patch you're using to fix it?

Sure.

> >Regardless of that, we do, logically, want to reclaim slab in response to
> >highmem reclaim pressure because any highmem allocation can be satisfied by
> >lowmem too.
> >
> >
> 
> The logical extension of that is: "we want to reclaim *lowmem* in
> response to highmem reclaim pressure because any ..."

yep.

> If this isn't what the scanner is doing then it should be fixed in
> a more generic way.


 include/linux/mmzone.h |    5 ++++-
 mm/page_alloc.c        |   13 ++++++++++++-
 mm/vmscan.c            |   22 +++++++++-------------
 3 files changed, 25 insertions(+), 15 deletions(-)

diff -puN mm/page_alloc.c~zone-balancing-batching mm/page_alloc.c
--- 25/mm/page_alloc.c~zone-balancing-batching	2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/mm/page_alloc.c	2004-02-22 15:15:52.000000000 -0800
@@ -1019,6 +1019,7 @@ void show_free_areas(void)
 			" min:%lukB"
 			" low:%lukB"
 			" high:%lukB"
+			" batch:%lukB"
 			" active:%lukB"
 			" inactive:%lukB"
 			"\n",
@@ -1027,6 +1028,7 @@ void show_free_areas(void)
 			K(zone->pages_min),
 			K(zone->pages_low),
 			K(zone->pages_high),
+			K(zone->reclaim_batch),
 			K(zone->nr_active),
 			K(zone->nr_inactive)
 			);
@@ -1618,6 +1620,8 @@ static void setup_per_zone_pages_min(voi
 			lowmem_pages += zone->present_pages;
 
 	for_each_zone(zone) {
+		unsigned long long reclaim_batch;
+
 		spin_lock_irqsave(&zone->lru_lock, flags);
 		if (is_highmem(zone)) {
 			/*
@@ -1642,8 +1646,15 @@ static void setup_per_zone_pages_min(voi
 			                   lowmem_pages;
 		}
 
-		zone-> pages_low = zone->pages_min * 2;
+		zone->pages_low = zone->pages_min * 2;
 		zone->pages_high = zone->pages_min * 3;
+
+		reclaim_batch = zone->present_pages * SWAP_CLUSTER_MAX;
+		do_div(reclaim_batch, lowmem_pages);
+		zone->reclaim_batch = reclaim_batch;
+		if (zone->reclaim_batch < 4)
+			zone->reclaim_batch = 4;
+
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 	}
 }
diff -puN mm/vmscan.c~zone-balancing-batching mm/vmscan.c
--- 25/mm/vmscan.c~zone-balancing-batching	2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/mm/vmscan.c	2004-02-22 15:15:52.000000000 -0800
@@ -859,13 +859,12 @@ shrink_zone(struct zone *zone, unsigned 
  */
 static int
 shrink_caches(struct zone **zones, int priority, int *total_scanned,
-		int gfp_mask, int nr_pages, struct page_state *ps)
+		int gfp_mask, struct page_state *ps)
 {
 	int ret = 0;
 	int i;
 
 	for (i = 0; zones[i] != NULL; i++) {
-		int to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX);
 		struct zone *zone = zones[i];
 		int nr_scanned;
 
@@ -875,8 +874,8 @@ shrink_caches(struct zone **zones, int p
 		if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 			continue;	/* Let kswapd poll it */
 
-		ret += shrink_zone(zone, gfp_mask,
-				to_reclaim, &nr_scanned, ps, priority);
+		ret += shrink_zone(zone, gfp_mask, zone->reclaim_batch,
+				&nr_scanned, ps, priority);
 		*total_scanned += nr_scanned;
 	}
 	return ret;
@@ -904,7 +903,6 @@ int try_to_free_pages(struct zone **zone
 {
 	int priority;
 	int ret = 0;
-	const int nr_pages = SWAP_CLUSTER_MAX;
 	int nr_reclaimed = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	int i;
@@ -920,7 +918,7 @@ int try_to_free_pages(struct zone **zone
 
 		get_page_state(&ps);
 		nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
-						gfp_mask, nr_pages, &ps);
+						gfp_mask, &ps);
 
 		shrink_slab(total_scanned, gfp_mask);
 		if (reclaim_state) {
@@ -928,7 +926,7 @@ int try_to_free_pages(struct zone **zone
 			reclaim_state->reclaimed_slab = 0;
 		}
 
-		if (nr_reclaimed >= nr_pages) {
+		if (nr_reclaimed >= SWAP_CLUSTER_MAX) {
 			ret = 1;
 			if (gfp_mask & __GFP_FS)
 				wakeup_bdflush(total_scanned);
@@ -1008,13 +1006,11 @@ static int balance_pgdat(pg_data_t *pgda
 			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
 				continue;
 
-			if (nr_pages) {		/* Software suspend */
+			if (nr_pages)		/* Software suspend */
 				to_reclaim = min(to_free, SWAP_CLUSTER_MAX*8);
-			} else {		/* Zone balancing */
-				to_reclaim = zone->pages_high-zone->free_pages;
-				if (to_reclaim <= 0)
-					continue;
-			}
+			else			/* Zone balancing */
+				to_reclaim = zone->reclaim_batch;
+
 			all_zones_ok = 0;
 			zone->temp_priority = priority;
 			reclaimed = shrink_zone(zone, GFP_KERNEL,
diff -puN include/linux/mmzone.h~zone-balancing-batching include/linux/mmzone.h
--- 25/include/linux/mmzone.h~zone-balancing-batching	2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/include/linux/mmzone.h	2004-02-22 15:15:52.000000000 -0800
@@ -69,7 +69,10 @@ struct zone {
 	 */
 	spinlock_t		lock;
 	unsigned long		free_pages;
-	unsigned long		pages_min, pages_low, pages_high;
+	unsigned long		pages_min;
+	unsigned long		pages_low;
+	unsigned long		pages_high;
+	unsigned long		reclaim_batch;
 
 	ZONE_PADDING(_pad1_)
 

_


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  0:46                                       ` Andrew Morton
@ 2004-02-23  0:54                                         ` Nick Piggin
  2004-02-23  1:00                                           ` Andrew Morton
  0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23  0:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
>>This is the incremental min logic doing its work though. Maybe
>>that should be fixed up to be less aggressive instead of putting
>>more complexity in the scanner to work around it.
>>
>
>The scanner got simpler.
>
>
>>Anyway could you post the patch you're using to fix it?
>>
>
>Sure.
>
>
>>>Regardless of that, we do, logically, want to reclaim slab in response to
>>>highmem reclaim pressure because any highmem allocation can be satisfied by
>>>lowmem too.
>>>
>>>
>>>
>>The logical extension of that is: "we want to reclaim *lowmem* in
>>response to highmem reclaim pressure because any ..."
>>
>
>yep.
>
>

Yeah this is good. I thought the patch you were proposing was
to shrink slab on highmem pressure.

Apply some lowmem pressure due to highmem pressure THEN shrink
slab as a result of the lowmem pressure is much better.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  0:54                                         ` Nick Piggin
@ 2004-02-23  1:00                                           ` Andrew Morton
  2004-02-23  1:06                                             ` Nick Piggin
  0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23  1:00 UTC (permalink / raw)
  To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> 
> >
> >yep.
> >
> >
> 
> Yeah this is good. I thought the patch you were proposing was
> to shrink slab on highmem pressure.

That as well.

> Apply some lowmem pressure due to highmem pressure THEN shrink
> slab as a result of the lowmem pressure is much better.

Prove it to me ;)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  1:00                                           ` Andrew Morton
@ 2004-02-23  1:06                                             ` Nick Piggin
  0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-23  1:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>>yep.
>>>
>>>
>>>
>>Yeah this is good. I thought the patch you were proposing was
>>to shrink slab on highmem pressure.
>>
>
>That as well.
>
>

Well this is the complexity I'm talking about. Sure it
is actually "simpler" code wise, but you're making it
conceptually more complex.

>>Apply some lowmem pressure due to highmem pressure THEN shrink
>>slab as a result of the lowmem pressure is much better.
>>
>
>Prove it to me ;)
>
>

Your slab wasn't being shrunk because the slab pressure
calculation was way off for highmem systems. My patch fixed
that, so lowmem pressure should shrink slab properly.

Then with your patch, highmem pressure will apply lowmem
pressure. So the end result is that the slab gets appropriate
pressure.

Can't you just prove to me why that doesn't work? ;)


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22 14:03         ` Ed Tomlinson
@ 2004-02-23  2:28           ` Mike Fedyk
  2004-02-23  3:33             ` Ed Tomlinson
  0 siblings, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-23  2:28 UTC (permalink / raw)
  To: Ed Tomlinson; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel

Ed Tomlinson wrote:

> On February 21, 2004 10:28 pm, Linus Torvalds wrote:
> 
>>On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>>
>>
>>>Maybe gradual page-cache pressure could shrink the slab?
>>
>>
>>What happened to the experiment of having slab pages on the (in)active
>>lists and letting them be free'd that way? Didn't somebody already do 
>>that? Ed Tomlinson and Craig Kulesa?
> 
> 
> You have a good memory.  
> 
> We dropped this experiment since there was a lot of latency between the time a 
> slab page became freeable and when it was actually freed.  The current 
> callback scheme was designed to balance slab pressure and VM scanning.

Does it really matter if there is a lot of latency?  How does this 
affect real-world results?  IOW, if it's not at the end of the LRU, then 
there's probably something better to free instead...


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-23  2:28           ` Mike Fedyk
@ 2004-02-23  3:33             ` Ed Tomlinson
  0 siblings, 0 replies; 56+ messages in thread
From: Ed Tomlinson @ 2004-02-23  3:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mike Fedyk

On February 22, 2004 09:28 pm, Mike Fedyk wrote:
> Ed Tomlinson wrote:
> > On February 21, 2004 10:28 pm, Linus Torvalds wrote:
> >>On Sat, 21 Feb 2004, Chris Wedgwood wrote:
> >>>Maybe gradual page-cache pressure could shrink the slab?
> >>
> >>What happened to the experiment of having slab pages on the (in)active
> >>lists and letting them be free'd that way? Didn't somebody already do
> >>that? Ed Tomlinson and Craig Kulesa?
> >
> > You have a good memory.
> >
> > We dropped this experiment since there was a lot of latency between the
> > time a slab page became freeable and when it was actually freed.  The
> > current call back scheme was designed to balance slab pressure and
> > vm scanning.
>
> Does it really matter if there is a lot of latency?  How does this
> affect real-world results?  IOW, if it's not at the end of the LRU, then
> there's probably something better to free instead...

It mattered.  People noticed and complained.  In any case, as Andrew 
pointed out, we get the same effect, without long latencies, in a simpler 
manner with the current scheme.

Ed

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: Large slab cache in 2.6.1
  2004-02-22 17:55             ` Jamie Lokier
@ 2004-02-23  3:45               ` Mike Fedyk
  0 siblings, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-23  3:45 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Martin J. Bligh, Andrew Morton, Linus Torvalds, cw, linux-kernel,
	Dipankar Sarma, Maneesh Soni

Jamie Lokier wrote:
> It's not totally insane to free dcache entries from pages that won't
> be freed.  It encourages new entries to be allocated in those pages.
> 
> Ideally you'd simply mark those dcache entries as prime candidates for
> recycling when new entries are needed, without actually freeing them
> until new entries are needed - or until their whole pages can be
> released.

This doesn't do much when you want to actually free slab pages though...

I had a similar thought, where you'd mark slab pages whose remaining 
objects should be aggressively freed in future scans, but didn't send it 
since someone else had probably already thought of it.

> 
> Also, biasing new allocations to recycle those old dcache entries, but
> also biasing them to recently used pages, so that recently used
> entries tend to cluster in the same pages.
> 

Hmm, so if slab is on the LRU, then in some cases the page can't be 
freed because of locked slab objects, and new objects get allocated to 
that mostly-free slab page, so you don't end up freeing very many pages.

Though this might better utilize the slab pages...

Mike


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2004-02-23  3:45 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-22 11:00 Large slab cache in 2.6.1 Manfred Spraul
  -- strict thread matches above, loose matches on Subject: below --
2004-02-22  0:50 Mike Fedyk
2004-02-22  1:09 ` Mike Fedyk
2004-02-22  1:20   ` William Lee Irwin III
2004-02-22  2:03     ` Mike Fedyk
2004-02-22  2:17       ` William Lee Irwin III
2004-02-22  2:38         ` Nick Piggin
2004-02-22  2:46           ` William Lee Irwin III
2004-02-22  2:40         ` Mike Fedyk
2004-02-22  2:58           ` Nick Piggin
2004-02-22  2:33       ` Nick Piggin
2004-02-22  2:46         ` Nick Piggin
2004-02-22  2:54           ` Nick Piggin
2004-02-22  2:36 ` Chris Wedgwood
2004-02-22  3:03   ` Linus Torvalds
2004-02-22  3:11     ` Chris Wedgwood
2004-02-22  3:28       ` Linus Torvalds
2004-02-22  3:29         ` Chris Wedgwood
2004-02-22  3:31         ` Chris Wedgwood
2004-02-22  4:01           ` Nick Piggin
2004-02-22  4:10             ` Nick Piggin
2004-02-22  4:30               ` Nick Piggin
2004-02-22  4:41                 ` Mike Fedyk
2004-02-22  5:37                   ` Nick Piggin
2004-02-22  5:44                     ` Chris Wedgwood
2004-02-22  5:52                       ` Nick Piggin
2004-02-22  5:50                     ` Mike Fedyk
2004-02-22  6:01                       ` Nick Piggin
2004-02-22  6:17                         ` Andrew Morton
2004-02-22  6:35                           ` Nick Piggin
2004-02-22  6:57                             ` Andrew Morton
2004-02-22  7:20                               ` Nick Piggin
2004-02-22  8:36                             ` Chris Wedgwood
2004-02-22  9:13                               ` Andrew Morton
2004-02-23  0:16                                 ` Nick Piggin
2004-02-23  0:26                                   ` Andrew Morton
2004-02-23  0:34                                     ` Nick Piggin
2004-02-23  0:46                                       ` Andrew Morton
2004-02-23  0:54                                         ` Nick Piggin
2004-02-23  1:00                                           ` Andrew Morton
2004-02-23  1:06                                             ` Nick Piggin
2004-02-22  6:45                         ` Mike Fedyk
2004-02-22  6:58                           ` Nick Piggin
2004-02-22  7:20                             ` Mike Fedyk
2004-02-22  6:09                 ` Andrew Morton
2004-02-22 17:05                   ` Linus Torvalds
2004-02-23  0:29                     ` Nick Piggin
2004-02-22  6:15         ` Andrew Morton
2004-02-22 16:08           ` Martin J. Bligh
2004-02-22 17:55             ` Jamie Lokier
2004-02-23  3:45               ` Mike Fedyk
2004-02-22 21:13             ` Dipankar Sarma
2004-02-22 14:03         ` Ed Tomlinson
2004-02-23  2:28           ` Mike Fedyk
2004-02-23  3:33             ` Ed Tomlinson
2004-02-22  3:21     ` Mike Fedyk
