* Large slab cache in 2.6.1
@ 2004-02-22 0:50 Mike Fedyk
2004-02-22 1:09 ` Mike Fedyk
2004-02-22 2:36 ` Chris Wedgwood
0 siblings, 2 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 0:50 UTC (permalink / raw)
To: linux-kernel
Actually 2.6.1-bk2-nfsd-stalefh-nfsd-lofft (two nfsd patches that
already made it into 2.6.2 and 2.6.3)
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-memory.html
I have 1.5 GB of ram in this system that will be a Linux Terminal Server
(but using Debian & VNC). There's 600MB+ anonymous memory, 600MB+
slab cache, and 100MB page cache. That's after turning off swap (it was
400MB into swap at the time).
Turning off the swap only shrank my page cache, and my slab didn't
shrink one bit.
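(For anyone wanting to reproduce this kind of check: the ratios above can be sanity-checked from /proc/meminfo. The snippet below is an illustrative sketch with stand-in values roughly matching this machine, not a dump of the real file.)

```python
# Stand-in figures (kB) loosely matching the machine described above;
# on a live system these would come from parsing /proc/meminfo.
meminfo_kb = {
    "MemTotal": 1536000,  # ~1.5 GB RAM
    "Cached":    102400,  # ~100 MB page cache
    "Slab":      614400,  # ~600 MB slab
}

def fraction(field):
    """Share of total RAM used by one meminfo field."""
    return meminfo_kb[field] / meminfo_kb["MemTotal"]

print(f"slab: {fraction('Slab'):.0%} of RAM, "
      f"page cache: {fraction('Cached'):.0%} of RAM")
```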
I'm sending this email because this is a production server, and I'd like
to know if any patches after 2.6.1 would help this problem.
Thanks,
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 0:50 Mike Fedyk
@ 2004-02-22 1:09 ` Mike Fedyk
2004-02-22 1:20 ` William Lee Irwin III
2004-02-22 2:36 ` Chris Wedgwood
1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 1:09 UTC (permalink / raw)
To: linux-kernel
Mike Fedyk wrote:
> I have 1.5 GB of ram in this system that will be a Linux Terminal Server
> (but using Debian & VNC). There's 600MB+ anonymous memory, 600MB+ slab
> cache, and 100MB page cache. That's after turning off swap (it was
> 400MB into swap at the time).
Here's my top slab users:
dentry_cache     585455 763395  256  15 1 : tunables 120 60 8 : slabdata 50893 50893 3
ext3_inode_cache 686837 688135  512   7 1 : tunables  54 27 8 : slabdata 98305 98305 0
buffer_head       34095  78078   48  77 1 : tunables 120 60 8 : slabdata  1014  1014 0
vm_area_struct    42103  44602   64  58 1 : tunables 120 60 8 : slabdata   769   769 0
pte_chain         20964  43740  128  30 1 : tunables 120 60 8 : slabdata  1458  1458 0
radix_tree_node   22494  23520  260  15 1 : tunables  54 27 8 : slabdata  1568  1568 0
filp              14474  15315  256  15 1 : tunables 120 60 8 : slabdata  1021  1021 0
nfs_inode_cache    2822   9264  640   6 1 : tunables  54 27 8 : slabdata  1544  1544 0
size-128           3420   4410  128  30 1 : tunables 120 60 8 : slabdata   147   147 0
size-32            3420   3472   32 112 1 : tunables 120 60 8 : slabdata    31    31 0
size-64            2823   3248   64  58 1 : tunables 120 60 8 : slabdata    56    56 0
proc_inode_cache   2580   3180  384  10 1 : tunables  54 27 8 : slabdata   318   318 0
dnotify_cache      2435   2490   20 166 1 : tunables 120 60 8 : slabdata    15    15 0
sock_inode_cache   1888   1981  512   7 1 : tunables  54 27 8 : slabdata   283   283 0
unix_sock          1682   1710  384  10 1 : tunables  54 27 8 : slabdata   171   171 0
inode_cache        1650   1690  384  10 1 : tunables  54 27 8 : slabdata   169   169 0
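Back-of-the-envelope: multiplying num_objs by objsize from the listing above shows where the slab memory sits. A rough sketch, using the figures copied from the first two rows (per-slab overhead is ignored):

```python
# (name, active_objs, num_objs, objsize) from the slabinfo listing above.
rows = [
    ("dentry_cache",     585455, 763395, 256),
    ("ext3_inode_cache", 686837, 688135, 512),
]

def cache_mb(num_objs, objsize):
    """Approximate memory held by one cache, ignoring slab overhead."""
    return num_objs * objsize / (1024 * 1024)

for name, active, total, objsize in rows:
    print(f"{name}: ~{cache_mb(total, objsize):.0f} MB")
# These two caches alone account for roughly 520 MB of the 600MB+ slab.
```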
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 1:09 ` Mike Fedyk
@ 2004-02-22 1:20 ` William Lee Irwin III
2004-02-22 2:03 ` Mike Fedyk
0 siblings, 1 reply; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22 1:20 UTC (permalink / raw)
To: Mike Fedyk; +Cc: linux-kernel
Mike Fedyk wrote:
>>I have 1.5 GB of ram in this system that will be a Linux Terminal Server
>> (but using Debian & VNC). There's 600MB+ anonymous memory, 600MB+ slab
>> cache, and 100MB page cache. That's after turning off swap (it was
>> 400MB into swap at the time).
On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
> Here's my top slab users:
> dentry_cache 585455 763395 256 15 1 : tunables 120 60
> 8 : slabdata 50893 50893 3
> ext3_inode_cache 686837 688135 512 7 1 : tunables 54 27
> 8 : slabdata 98305 98305 0
> buffer_head 34095 78078 48 77 1 : tunables 120 60
> 8 : slabdata 1014 1014 0
> vm_area_struct 42103 44602 64 58 1 : tunables 120 60
> 8 : slabdata 769 769 0
> pte_chain 20964 43740 128 30 1 : tunables 120 60
> 8 : slabdata 1458 1458 0
Similar issue here; I ran out of filp's/whatever shortly after booting.
-- wli
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 1:20 ` William Lee Irwin III
@ 2004-02-22 2:03 ` Mike Fedyk
2004-02-22 2:17 ` William Lee Irwin III
2004-02-22 2:33 ` Nick Piggin
0 siblings, 2 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 2:03 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel
William Lee Irwin III wrote:
> Mike Fedyk wrote:
>
>>>I have 1.5 GB of ram in this system that will be a Linux Terminal Server
>>>(but using Debian & VNC). There's 600MB+ anonymous memory, 600MB+ slab
>>>cache, and 100MB page cache. That's after turning off swap (it was
>>>400MB into swap at the time).
>
>
> On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
>
>>Here's my top slab users:
>>dentry_cache 585455 763395 256 15 1 : tunables 120 60
>> 8 : slabdata 50893 50893 3
>>ext3_inode_cache 686837 688135 512 7 1 : tunables 54 27
>> 8 : slabdata 98305 98305 0
>>buffer_head 34095 78078 48 77 1 : tunables 120 60
>> 8 : slabdata 1014 1014 0
>>vm_area_struct 42103 44602 64 58 1 : tunables 120 60
>> 8 : slabdata 769 769 0
>>pte_chain 20964 43740 128 30 1 : tunables 120 60
>> 8 : slabdata 1458 1458 0
>
>
> Similar issue here; I ran out of filp's/whatever shortly after booting.
So Nick Piggin's VM patches won't help with this?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:03 ` Mike Fedyk
@ 2004-02-22 2:17 ` William Lee Irwin III
2004-02-22 2:38 ` Nick Piggin
2004-02-22 2:40 ` Mike Fedyk
2004-02-22 2:33 ` Nick Piggin
1 sibling, 2 replies; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22 2:17 UTC (permalink / raw)
To: Mike Fedyk; +Cc: linux-kernel
William Lee Irwin III wrote:
>> Similar issue here; I ran out of filp's/whatever shortly after booting.
On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
> So Nick Piggin's VM patches won't help with this?
I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
a vfs issue anyway because there's no actual VM content to it, apart
from the code in question being driven by the VM.
-- wli
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:03 ` Mike Fedyk
2004-02-22 2:17 ` William Lee Irwin III
@ 2004-02-22 2:33 ` Nick Piggin
2004-02-22 2:46 ` Nick Piggin
1 sibling, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 2:33 UTC (permalink / raw)
To: Mike Fedyk; +Cc: William Lee Irwin III, linux-kernel
Mike Fedyk wrote:
> William Lee Irwin III wrote:
>
>> Mike Fedyk wrote:
>>
>>>> I have 1.5 GB of ram in this system that will be a Linux Terminal
>>>> Server (but using Debian & VNC). There's 600MB+ anonymous memory,
>>>> 600MB+ slab cache, and 100MB page cache. That's after turning off
>>>> swap (it was 400MB into swap at the time).
>>>
>>
>>
>> On Sat, Feb 21, 2004 at 05:09:34PM -0800, Mike Fedyk wrote:
>>
>>> Here's my top slab users:
>>> dentry_cache 585455 763395 256 15 1 : tunables 120
>>> 60 8 : slabdata 50893 50893 3
>>> ext3_inode_cache 686837 688135 512 7 1 : tunables 54
>>> 27 8 : slabdata 98305 98305 0
>>> buffer_head 34095 78078 48 77 1 : tunables 120
>>> 60 8 : slabdata 1014 1014 0
>>> vm_area_struct 42103 44602 64 58 1 : tunables 120
>>> 60 8 : slabdata 769 769 0
>>> pte_chain 20964 43740 128 30 1 : tunables 120
>>> 60 8 : slabdata 1458 1458 0
>>
>>
>>
>> Similar issue here; I ran out of filp's/whatever shortly after booting.
>
>
> So Nick Piggin's VM patches won't help with this?
>
Probably not.
The main thing they do is to try to be smarter about which active
mapped pages to evict. The slab shrinking balance is pretty well
unchanged.
However there is one path in try_to_free_pages that I've changed
to shrink the slab where it otherwise wouldn't. It is pretty
unlikely that you would be continually running into this path,
but testing is welcome, as always.
Stupid question: you didn't actually say what the problem is...
having 600MB slab cache and 400MB swap may not actually be a
problem provided the swap is not being used and the cache is.
I have an idea it might be worthwhile to try using inactive list
scanning as an input to slab pressure...
Nick
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 0:50 Mike Fedyk
2004-02-22 1:09 ` Mike Fedyk
@ 2004-02-22 2:36 ` Chris Wedgwood
2004-02-22 3:03 ` Linus Torvalds
1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 2:36 UTC (permalink / raw)
To: Mike Fedyk; +Cc: linux-kernel
On Sat, Feb 21, 2004 at 04:50:34PM -0800, Mike Fedyk wrote:
> I have 1.5 GB of ram in this system that will be a Linux Terminal
> Server (but using Debian & VNC). There's 600MB+ anonymous memory,
> 600MB+ slab cache, and 100MB page cache. That's after turning off
> swap (it was 400MB into swap at the time).
I have a similar annoying problem... I have a machine which is almost
always idle (single user work station type thing) with 1.5GB of RAM
and I end up with 850M in slab!
For me the main problem seems to be driven by the dentry_cache itself
bloating up really big; those entries keep fs-specific memory
pinned.
Forcing paging will push this down to acceptable levels but it's a
really irritating solution --- I'm still trying to think of a better
way to stop the dentries from using such a disproportionate amount of
memory.
I've played with -mm kernels and various patches out there... nothing
seems to put enough pressure on the slab unless I force paging.
akpm, riel --- any (more) ideas here?
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:17 ` William Lee Irwin III
@ 2004-02-22 2:38 ` Nick Piggin
2004-02-22 2:46 ` William Lee Irwin III
2004-02-22 2:40 ` Mike Fedyk
1 sibling, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 2:38 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: Mike Fedyk, linux-kernel
William Lee Irwin III wrote:
>William Lee Irwin III wrote:
>
>>>Similar issue here; I ran out of filp's/whatever shortly after booting.
>>>
>
>On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
>
>>So Nick Piggin's VM patches won't help with this?
>>
>
>I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>a vfs issue anyway because there's no actual VM content to it, apart
>from the code in question being driven by the VM.
>
>
Yes they're in -mm and in dire need of more testing.
The intended audience is people whose machines are swapping a lot,
but ensuring they don't break more common cases isn't a bad idea.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:17 ` William Lee Irwin III
2004-02-22 2:38 ` Nick Piggin
@ 2004-02-22 2:40 ` Mike Fedyk
2004-02-22 2:58 ` Nick Piggin
1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 2:40 UTC (permalink / raw)
To: William Lee Irwin III; +Cc: linux-kernel
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>Similar issue here; I ran out of filp's/whatever shortly after booting.
>
>
> On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
>
>>So Nick Piggin's VM patches won't help with this?
>
>
> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
> a vfs issue anyway because there's no actual VM content to it, apart
> from the code in question being driven by the VM.
Hmm, that's news to me. Maybe that's a newer patch. I haven't been
reading the list much for the last month or so...
Nick had a patch that was supposed to help 2.6 with low memory
situations to bring it on a par with 2.4 in that respect. ISTR "active
recycling" being mentioned about it...
I also did a
find / -xdev -type f -exec cat "{}" \; > /dev/null 2>&1
with no swap and my page cache didn't get any bigger and slab didn't
shrink. :(
Is there anything in 2.6.3 with respect to IDE, MD RAID{1,5}, knfsd, or
Athlons I should worry about in upgrading from 2.6.1?
Thanks,
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:38 ` Nick Piggin
@ 2004-02-22 2:46 ` William Lee Irwin III
0 siblings, 0 replies; 56+ messages in thread
From: William Lee Irwin III @ 2004-02-22 2:46 UTC (permalink / raw)
To: Nick Piggin; +Cc: Mike Fedyk, linux-kernel
William Lee Irwin III wrote:
>> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>> a vfs issue anyway because there's no actual VM content to it, apart
>> from the code in question being driven by the VM.
On Sun, Feb 22, 2004 at 01:38:36PM +1100, Nick Piggin wrote:
> Yes they're in -mm and in dire need of more testing.
> The indented audience are people who's machines are swapping a lot,
> but ensuring they don't break more common cases isn't a bad idea.
The only symptom I'm having is running out of filp's shortly after boot.
I'm not having any performance issues. In fact, I'll send an unrelated
post out about how happy I am about performance. =)
-- wli
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:33 ` Nick Piggin
@ 2004-02-22 2:46 ` Nick Piggin
2004-02-22 2:54 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 2:46 UTC (permalink / raw)
To: Nick Piggin
Cc: Mike Fedyk, William Lee Irwin III, linux-kernel, Chris Wedgwood
Nick Piggin wrote:
>
> I have an idea it might be worthwhile to try using inactive list
> scanning as an input to slab pressure...
>
Err that is what it does, of course. My idea was the other way
round - use active list scanning as input. So no, that probably
won't help.
Only one way to find out though. Patch against 2.6.3-mm2.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:46 ` Nick Piggin
@ 2004-02-22 2:54 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 2:54 UTC (permalink / raw)
Cc: Mike Fedyk, William Lee Irwin III, linux-kernel, Chris Wedgwood
[-- Attachment #1: Type: text/plain, Size: 388 bytes --]
Nick Piggin wrote:
>
>
> Nick Piggin wrote:
>
>>
>> I have an idea it might be worthwhile to try using inactive list
>> scanning as an input to slab pressure...
>>
>
> Err that is what it does, of course. My idea was the other way
> round - use active list scanning as input. So no, that probably
> won't help.
>
> Only one way to find out though. Patch against 2.6.3-mm2.
>
*cough*
[-- Attachment #2: vm-inactive-shrink-slab.patch --]
[-- Type: text/plain, Size: 934 bytes --]
linux-2.6-npiggin/mm/vmscan.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletion(-)
diff -puN mm/vmscan.c~vm-inactive-shrink-slab mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-inactive-shrink-slab 2004-02-22 13:39:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-22 13:45:01.000000000 +1100
@@ -797,6 +797,7 @@ static int
shrink_zone(struct zone *zone, unsigned int gfp_mask,
int nr_pages, int *nr_scanned, struct page_state *ps, int priority)
{
+ int ret;
unsigned long imbalance;
unsigned long nr_refill_inact;
unsigned long max_scan;
@@ -836,7 +837,10 @@ shrink_zone(struct zone *zone, unsigned
if (max_scan < nr_pages * 2)
max_scan = nr_pages * 2;
- return shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_scanned);
+ ret = shrink_cache(nr_pages, zone, gfp_mask, max_scan, nr_scanned);
+ /* Account for active list scanning too */
+ *nr_scanned += nr_refill_inact;
+ return ret;
}
/*
_
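Modeled in plain Python, the change above just folds active-list scanning into the count that later drives slab pressure (an illustrative sketch, not kernel code; the function name is made up for the example):

```python
# Before the patch, only inactive-list scanning (nr_scanned from
# shrink_cache) fed slab pressure; the patch adds the pages scanned
# while refilling the inactive list (nr_refill_inact) as well.
def scanned_for_slab(nr_scanned, nr_refill_inact, include_active=True):
    return nr_scanned + (nr_refill_inact if include_active else 0)

print(scanned_for_slab(1000, 250))                         # patched behaviour
print(scanned_for_slab(1000, 250, include_active=False))   # old behaviour
```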
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:40 ` Mike Fedyk
@ 2004-02-22 2:58 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 2:58 UTC (permalink / raw)
To: Mike Fedyk; +Cc: William Lee Irwin III, linux-kernel
Mike Fedyk wrote:
> William Lee Irwin III wrote:
>
>> William Lee Irwin III wrote:
>>
>>>> Similar issue here; I ran out of filp's/whatever shortly after
>>>> booting.
>>>
>>
>>
>> On Sat, Feb 21, 2004 at 06:03:14PM -0800, Mike Fedyk wrote:
>>
>>> So Nick Piggin's VM patches won't help with this?
>>
>>
>>
>> I think they're in -mm, and I'd call the vfs slab cache shrinking stuff
>> a vfs issue anyway because there's no actual VM content to it, apart
>> from the code in question being driven by the VM.
>
>
> Hmm, that's news to me. Maybe that's a newer patch. I haven't been
> reading the list much for the last month or so...
>
> Nick had a patch that was supposed to help 2.6 with low memory
> situations to bring it on a par with 2.4 in that respect. ISTR
> "active recycling" being mentioned about it...
>
Just an aside, it is hard to get 2.6 "on par" with 2.4 because 2.6 is
often much fairer (although it can still be badly unfair - if we ever
want to fix that we'd probably need per process mm).
There are quite a lot of sorts of low-memory situations you can get into.
My (and Nikita's) patches don't help the one you're probably in. They
don't put more pressure on slab.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 2:36 ` Chris Wedgwood
@ 2004-02-22 3:03 ` Linus Torvalds
2004-02-22 3:11 ` Chris Wedgwood
2004-02-22 3:21 ` Mike Fedyk
0 siblings, 2 replies; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22 3:03 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Mike Fedyk, linux-kernel
On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>
> Forcing paging will push this down to acceptable levels but it's a
> really irritating solution --- I'm still trying to think of a better
> way to stop the dentries from using such a disproportionate amount of
> memory.
Why?
It's quite likely that especially on a fairly idle machine, the dentry
cache really _should_ be the biggest single memory user.
Why? Because an idle machine tends to largely be dominated by things like
"updatedb" and friends running. If there isn't any other real activity,
there's no reason for a big page cache, nor is there anything that would
put memory pressure on the dentries, so they grow as much as they can.
Do you see any actual bad behaviour from this?
Linus
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:03 ` Linus Torvalds
@ 2004-02-22 3:11 ` Chris Wedgwood
2004-02-22 3:28 ` Linus Torvalds
2004-02-22 3:21 ` Mike Fedyk
1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 3:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel
On Sat, Feb 21, 2004 at 07:03:50PM -0800, Linus Torvalds wrote:
> It's quite likely that especially on a fairly idle machine, the
> dentry cache really _should_ be the biggest single memory user.
Only because updatedb/find/du populate it sporadically. Things like
cron jobs run overnight and fill the slab, which *never* shrinks[1].
> Do you see any actual bad behaviour from this?
The page-cache is restricted to small sizes making the fs rather slow
at times. Ideally with 1.5GB of RAM I'd like to be able to get 800MB
or so into the page-cache... not 200MB.
Maybe gradual page-cache pressure could shrink the slab?
--cw
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:03 ` Linus Torvalds
2004-02-22 3:11 ` Chris Wedgwood
@ 2004-02-22 3:21 ` Mike Fedyk
1 sibling, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 3:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Chris Wedgwood, linux-kernel
Linus Torvalds wrote:
>
> On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>
>>Forcing paging will push this down to acceptable levels but it's a
>>really irritating solution --- I'm still trying to think of a better
>>way to stop the dentries from using such a disproportionate amount of
>>memory.
>
>
> Why?
>
> It's quite likely that especially on a fairly idle machine, the dentry
> cache really _should_ be the biggest single memory user.
>
> Why? Because an idle machine tends to largely be dominated by things like
> "updatedb" and friends running. If there isn't any other real activity,
> there's no reason for a big page cache, nor is there anything that would
> put memory pressure on the dentries, so they grow as much as they can.
>
> Do you see any actual bad behaviour from this?
>
> Linus
Yes, see another message from me in this thread where I cat all the
files on my drive: 700MB slab (mostly dentries) and 100MB page cache
after it's done.
Other than that the machine is idle over the weekend. During the week
it serves files over samba and knfsd in addition to exporting ~20 KDE
desktops over VNC, and imap to ~4 users. The desktops get little use at
the moment though.
So having a small page cache is detrimental to this machine.
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com.html
The url above will show graphs for the machine in question. But these
graphs should be particularly interesting:
I'm swapping occasionally, but only ~5 of the 20 KDE desktops are in use
during the week:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-swap.html
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-memory.html
I have a lot of open inodes, and when that goes down, so does the size
of my slab:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-open_inodes.html
This is to show the disk activity that should have enlarged my page cache:
http://www.matchmail.com/stats/lrrd/matchmail.com/srv-lnx2600.matchmail.com-iostat.html
Mike
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:11 ` Chris Wedgwood
@ 2004-02-22 3:28 ` Linus Torvalds
2004-02-22 3:29 ` Chris Wedgwood
` (3 more replies)
0 siblings, 4 replies; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22 3:28 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Mike Fedyk, linux-kernel
On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>
> Maybe gradual page-cache pressure could shrink the slab?
What happened to the experiment of having slab pages on the (in)active
lists and letting them be free'd that way? Didn't somebody already do
that? Ed Tomlinson and Craig Kulesa?
That's still something I'd like to try, although that's obviously 2.7.x
material, so not useful for right now.
Or did the experiment just never work out well?
Linus
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:28 ` Linus Torvalds
@ 2004-02-22 3:29 ` Chris Wedgwood
2004-02-22 3:31 ` Chris Wedgwood
` (2 subsequent siblings)
3 siblings, 0 replies; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 3:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel
On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
> What happened to the experiment of having slab pages on the
> (in)active lists and letting them be free'd that way? Didn't
> somebody already do that? Ed Tomlinson and Craig Kulesa?
Stupid question perhaps, but how would I implement this to test it
out? It doesn't seem entirely trivial, and I'm largely ignorant in
these parts of the kernel.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:28 ` Linus Torvalds
2004-02-22 3:29 ` Chris Wedgwood
@ 2004-02-22 3:31 ` Chris Wedgwood
2004-02-22 4:01 ` Nick Piggin
2004-02-22 6:15 ` Andrew Morton
2004-02-22 14:03 ` Ed Tomlinson
3 siblings, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 3:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mike Fedyk, linux-kernel
On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
> What happened to the experiment of having slab pages on the
> (in)active lists and letting them be free'd that way? Didn't
> somebody already do that? Ed Tomlinson and Craig Kulesa?
Just as a data point:
cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
LowTotal: 898448 kB
Slab: 846260 kB
So the slab pressure I have right now is simply because there is
nowhere else it has to grow...
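(For scale, working from the two figures in the grep output above:)

```python
# Figures (kB) from the /proc/meminfo grep above.
low_total_kb = 898448
slab_kb = 846260

share = slab_kb / low_total_kb
print(f"slab occupies {share:.0%} of low memory")
```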
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 3:31 ` Chris Wedgwood
@ 2004-02-22 4:01 ` Nick Piggin
2004-02-22 4:10 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 4:01 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 595 bytes --]
Chris Wedgwood wrote:
>On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
>
>
>>What happened to the experiment of having slab pages on the
>>(in)active lists and letting them be free'd that way? Didn't
>>somebody already do that? Ed Tomlinson and Craig Kulesa?
>>
>
>Just as a data point:
>
>cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
>LowTotal: 898448 kB
>Slab: 846260 kB
>
>So the slab pressure I have right now is simply because there is
>nowhere else it has to grow...
>
>
Can you try the following patch? It is against 2.6.3-mm2.
[-- Attachment #2: vm-slab-balance.patch --]
[-- Type: text/plain, Size: 5719 bytes --]
linux-2.6-npiggin/fs/dcache.c | 4 ++--
linux-2.6-npiggin/fs/dquot.c | 2 +-
linux-2.6-npiggin/fs/inode.c | 4 ++--
linux-2.6-npiggin/fs/mbcache.c | 2 +-
linux-2.6-npiggin/fs/xfs/linux/kmem.h | 2 +-
linux-2.6-npiggin/include/linux/mm.h | 3 +--
linux-2.6-npiggin/mm/vmscan.c | 22 ++++++++++++----------
7 files changed, 20 insertions(+), 19 deletions(-)
diff -puN mm/vmscan.c~vm-slab-balance mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-22 15:00:24.000000000 +1100
@@ -82,7 +82,6 @@ static long total_memory;
struct shrinker {
shrinker_t shrinker;
struct list_head list;
- int seeks; /* seeks to recreate an obj */
long nr; /* objs pending delete */
};
@@ -92,14 +91,13 @@ static DECLARE_MUTEX(shrinker_sem);
/*
* Add a shrinker callback to be called from the vm
*/
-struct shrinker *set_shrinker(int seeks, shrinker_t theshrinker)
+struct shrinker *set_shrinker(shrinker_t theshrinker)
{
struct shrinker *shrinker;
shrinker = kmalloc(sizeof(*shrinker), GFP_KERNEL);
if (shrinker) {
shrinker->shrinker = theshrinker;
- shrinker->seeks = seeks;
shrinker->nr = 0;
down(&shrinker_sem);
list_add(&shrinker->list, &shrinker_list);
@@ -139,20 +137,24 @@ EXPORT_SYMBOL(remove_shrinker);
*/
static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
{
+ unsigned long long to_scan = scanned;
+ unsigned long slab_size = 0;
struct shrinker *shrinker;
- long pages;
if (down_trylock(&shrinker_sem))
return 0;
- pages = nr_used_zone_pages();
list_for_each_entry(shrinker, &shrinker_list, list) {
- unsigned long long delta;
+ slab_size += (*shrinker->shrinker)(0, gfp_mask);
+ }
- delta = 4 * scanned / shrinker->seeks;
- delta *= (*shrinker->shrinker)(0, gfp_mask);
- do_div(delta, pages + 1);
- shrinker->nr += delta;
+ list_for_each_entry(shrinker, &shrinker_list, list) {
+ unsigned long long delta = to_scan;
+ int this_size = (*shrinker->shrinker)(0, gfp_mask);
+ delta *= this_size;
+ do_div(delta, slab_size + 1);
+ /* + 1 to make sure some scanning is eventually done */
+ shrinker->nr += delta + 1;
if (shrinker->nr > SHRINK_BATCH) {
long nr_to_scan = shrinker->nr;
diff -puN include/linux/mm.h~vm-slab-balance include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h 2004-02-22 14:52:45.000000000 +1100
@@ -483,9 +483,8 @@ typedef int (*shrinker_t)(int nr_to_scan
* to recreate one of the objects that these functions age.
*/
-#define DEFAULT_SEEKS 2
struct shrinker;
-extern struct shrinker *set_shrinker(int, shrinker_t);
+extern struct shrinker *set_shrinker(shrinker_t);
extern void remove_shrinker(struct shrinker *shrinker);
/*
diff -puN fs/dcache.c~vm-slab-balance fs/dcache.c
--- linux-2.6/fs/dcache.c~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/dcache.c 2004-02-22 14:52:45.000000000 +1100
@@ -657,7 +657,7 @@ static int shrink_dcache_memory(int nr,
if (gfp_mask & __GFP_FS)
prune_dcache(nr);
}
- return dentry_stat.nr_unused;
+ return dentry_stat.nr_dentry;
}
#define NAME_ALLOC_LEN(len) ((len+16) & ~15)
@@ -1564,7 +1564,7 @@ static void __init dcache_init(unsigned
if (!dentry_cache)
panic("Cannot create dentry cache");
- set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+ set_shrinker(shrink_dcache_memory);
if (!dhash_entries)
dhash_entries = PAGE_SHIFT < 13 ?
diff -puN fs/dquot.c~vm-slab-balance fs/dquot.c
--- linux-2.6/fs/dquot.c~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/dquot.c 2004-02-22 14:52:45.000000000 +1100
@@ -1661,7 +1661,7 @@ static int __init dquot_init(void)
if (!dquot_cachep)
panic("Cannot create dquot SLAB cache");
- set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+ set_shrinker(shrink_dqcache_memory);
return 0;
}
diff -puN fs/inode.c~vm-slab-balance fs/inode.c
--- linux-2.6/fs/inode.c~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/inode.c 2004-02-22 14:52:45.000000000 +1100
@@ -479,7 +479,7 @@ static int shrink_icache_memory(int nr,
if (gfp_mask & __GFP_FS)
prune_icache(nr);
}
- return inodes_stat.nr_unused;
+ return inodes_stat.nr_inodes;
}
static void __wait_on_freeing_inode(struct inode *inode);
@@ -1394,7 +1394,7 @@ void __init inode_init(unsigned long mem
if (!inode_cachep)
panic("cannot create inode slab cache");
- set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+ set_shrinker(shrink_icache_memory);
}
void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
diff -puN fs/mbcache.c~vm-slab-balance fs/mbcache.c
--- linux-2.6/fs/mbcache.c~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/mbcache.c 2004-02-22 14:52:45.000000000 +1100
@@ -629,7 +629,7 @@ mb_cache_entry_find_next(struct mb_cache
static int __init init_mbcache(void)
{
- mb_shrinker = set_shrinker(DEFAULT_SEEKS, mb_cache_shrink_fn);
+ mb_shrinker = set_shrinker(mb_cache_shrink_fn);
return 0;
}
diff -puN fs/xfs/linux/kmem.h~vm-slab-balance fs/xfs/linux/kmem.h
--- linux-2.6/fs/xfs/linux/kmem.h~vm-slab-balance 2004-02-22 14:52:45.000000000 +1100
+++ linux-2.6-npiggin/fs/xfs/linux/kmem.h 2004-02-22 14:52:45.000000000 +1100
@@ -171,7 +171,7 @@ typedef int (*kmem_shake_func_t)(int, un
static __inline kmem_shaker_t
kmem_shake_register(kmem_shake_func_t sfunc)
{
- return set_shrinker(DEFAULT_SEEKS, sfunc);
+ return set_shrinker(sfunc);
}
static __inline void
_
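In userspace terms, the patched shrink_slab() above distributes scanning pressure across caches in proportion to their object counts. A rough Python model of that accounting (not kernel code; the cache sizes plugged in are the object counts from earlier in the thread):

```python
# Model of the patched shrink_slab() accounting: each shrinker's scan
# quota is scanned * its_size / (total_size + 1), plus 1 so every cache
# eventually sees some pressure (mirrors the do_div and "+ 1" above).
def shrinker_deltas(scanned, cache_sizes):
    total = sum(cache_sizes.values())
    return {name: scanned * size // (total + 1) + 1
            for name, size in cache_sizes.items()}

caches = {"dentry": 763395, "ext3_inode": 688135, "buffer_head": 78078}
print(shrinker_deltas(1000, caches))
```

The big caches (dentry, inode) soak up most of the scanning, which is exactly the behaviour wanted when they dominate the slab.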
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 4:01 ` Nick Piggin
@ 2004-02-22 4:10 ` Nick Piggin
2004-02-22 4:30 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 4:10 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel
Nick Piggin wrote:
>
>
> Chris Wedgwood wrote:
>
>> On Sat, Feb 21, 2004 at 07:28:24PM -0800, Linus Torvalds wrote:
>>
>>
>>> What happened to the experiment of having slab pages on the
>>> (in)active lists and letting them be free'd that way? Didn't
>>> somebody already do that? Ed Tomlinson and Craig Kulesa?
>>>
>>
>> Just as a data point:
>>
>> cw@taniwha:~/wk/linux/bk-2.5.x$ grep -E '(LowT|Slab)' /proc/meminfo
>> LowTotal: 898448 kB
>> Slab: 846260 kB
>>
>> So the slab pressure I have right now is simply because there is
>> nowhere else it has to grow...
>>
>>
>
> Can you try the following patch? It is against 2.6.3-mm2.
>
Actually I think the previous shrink_slab formula factors
out to the right thing anyway, so nevermind this patch :P
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
2004-02-22 4:10 ` Nick Piggin
@ 2004-02-22 4:30 ` Nick Piggin
2004-02-22 4:41 ` Mike Fedyk
2004-02-22 6:09 ` Andrew Morton
0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 4:30 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel
Nick Piggin wrote:
>
> Actually I think the previous shrink_slab formula factors
> out to the right thing anyway, so nevermind this patch :P
>
>
Although, nr_used_zone_pages probably shouldn't be counting
highmem zones, which might be our problem.
* Re: Large slab cache in 2.6.1
2004-02-22 4:30 ` Nick Piggin
@ 2004-02-22 4:41 ` Mike Fedyk
2004-02-22 5:37 ` Nick Piggin
2004-02-22 6:09 ` Andrew Morton
1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 4:41 UTC (permalink / raw)
To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel
Nick Piggin wrote:
>
>
> Nick Piggin wrote:
>
>>
>> Actually I think the previous shrink_slab formula factors
>> out to the right thing anyway, so nevermind this patch :P
>>
>>
>
> Although, nr_used_zone_pages probably shouldn't be counting
> highmem zones, which might be our problem.
What is the kernel parameter to disable highmem? I saw nohighio, but
that's not it...
Doesn't "mem=" have alignment problems?
* Re: Large slab cache in 2.6.1
2004-02-22 4:41 ` Mike Fedyk
@ 2004-02-22 5:37 ` Nick Piggin
2004-02-22 5:44 ` Chris Wedgwood
2004-02-22 5:50 ` Mike Fedyk
0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 5:37 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 686 bytes --]
Mike Fedyk wrote:
> Nick Piggin wrote:
>
>>
>>
>> Nick Piggin wrote:
>>
>>>
>>> Actually I think the previous shrink_slab formula factors
>>> out to the right thing anyway, so nevermind this patch :P
>>>
>>>
>>
>> Although, nr_used_zone_pages probably shouldn't be counting
>> highmem zones, which might be our problem.
>
>
> What is the kernel parameter to disable highmem? I saw nohighio, but
> that's not it...
>
Not sure. That defeats the purpose of trying to get your setup
working nicely though ;)
Can you upgrade to 2.6.3-mm2? It would be ideal if you could
test this patch against that kernel due to the other VM changes.
Chris, could you test this too please? Thanks.
[-- Attachment #2: vm-shrink-slab-lowmem.patch --]
[-- Type: text/plain, Size: 3526 bytes --]
linux-2.6-npiggin/include/linux/mm.h | 2 +-
linux-2.6-npiggin/mm/page_alloc.c | 13 +++++++++----
linux-2.6-npiggin/mm/vmscan.c | 22 ++++++++++++----------
3 files changed, 22 insertions(+), 15 deletions(-)
diff -puN mm/vmscan.c~vm-shrink-slab-lowmem mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-22 16:35:06.000000000 +1100
@@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
if (down_trylock(&shrinker_sem))
return 0;
- pages = nr_used_zone_pages();
+ pages = nr_lowmem_lru_pages();
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;
@@ -857,7 +857,8 @@ shrink_zone(struct zone *zone, unsigned
*/
static int
shrink_caches(struct zone **zones, int priority, int *total_scanned,
- int gfp_mask, int nr_pages, struct page_state *ps)
+ int *lowmem_scanned, int gfp_mask, int nr_pages,
+ struct page_state *ps)
{
int ret = 0;
int i;
@@ -875,7 +876,10 @@ shrink_caches(struct zone **zones, int p
ret += shrink_zone(zone, gfp_mask,
to_reclaim, &nr_scanned, ps, priority);
+
*total_scanned += nr_scanned;
+ if (i < ZONE_HIGHMEM)
+ *lowmem_scanned += nr_scanned;
if (ret >= nr_pages)
break;
}
@@ -915,19 +919,17 @@ int try_to_free_pages(struct zone **zone
zones[i]->temp_priority = DEF_PRIORITY;
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- int total_scanned = 0;
+ int total_scanned = 0, lowmem_scanned = 0;
struct page_state ps;
get_page_state(&ps);
nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
- gfp_mask, nr_pages, &ps);
+ &lowmem_scanned, gfp_mask, nr_pages, &ps);
- if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
- shrink_slab(total_scanned, gfp_mask);
- if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
+ shrink_slab(lowmem_scanned, gfp_mask);
+ if (reclaim_state) {
+ nr_reclaimed += reclaim_state->reclaimed_slab;
+ reclaim_state->reclaimed_slab = 0;
}
if (nr_reclaimed >= nr_pages) {
diff -puN mm/page_alloc.c~vm-shrink-slab-lowmem mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c 2004-02-22 16:35:06.000000000 +1100
@@ -772,13 +772,18 @@ unsigned int nr_free_pages(void)
EXPORT_SYMBOL(nr_free_pages);
-unsigned int nr_used_zone_pages(void)
+unsigned int nr_lowmem_lru_pages(void)
{
+ pg_data_t *pgdat;
unsigned int pages = 0;
- struct zone *zone;
- for_each_zone(zone)
- pages += zone->nr_active + zone->nr_inactive;
+ for_each_pgdat(pgdat) {
+ int i;
+ for (i = 0; i < ZONE_HIGHMEM; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+ pages += zone->nr_active + zone->nr_inactive;
+ }
+ }
return pages;
}
diff -puN include/linux/mm.h~vm-shrink-slab-lowmem include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h 2004-02-22 16:35:06.000000000 +1100
@@ -625,7 +625,7 @@ static inline struct vm_area_struct * fi
extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
-extern unsigned int nr_used_zone_pages(void);
+extern unsigned int nr_lowmem_lru_pages(void);
extern struct page * vmalloc_to_page(void *addr);
extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
_
* Re: Large slab cache in 2.6.1
2004-02-22 5:37 ` Nick Piggin
@ 2004-02-22 5:44 ` Chris Wedgwood
2004-02-22 5:52 ` Nick Piggin
2004-02-22 5:50 ` Mike Fedyk
1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 5:44 UTC (permalink / raw)
To: Nick Piggin; +Cc: Mike Fedyk, Linus Torvalds, linux-kernel
On Sun, Feb 22, 2004 at 04:37:46PM +1100, Nick Piggin wrote:
> Can you upgrade to 2.6.3-mm2? It would be ideal if you could test
> this patch against that kernel due to the other VM changes.
Sure.
> Chris, could you test this too please? Thanks.
I tested this change to a stock 2.6.3 kernel and saw a marginally
better situation... 650MB in slab instead of 850MB:
===== page_alloc.c 1.186 vs edited =====
--- 1.186/mm/page_alloc.c Wed Feb 18 19:43:04 2004
+++ edited/page_alloc.c Sat Feb 21 21:05:32 2004
@@ -764,13 +764,18 @@
EXPORT_SYMBOL(nr_free_pages);
+/*
+ * return the number of non-highmem pages (we should probably rename
+ * this function? --cw)
+ */
unsigned int nr_used_zone_pages(void)
{
unsigned int pages = 0;
struct zone *zone;
for_each_zone(zone)
- pages += zone->nr_active + zone->nr_inactive;
+ if (!is_highmem(zone))
+ pages += zone->nr_active + zone->nr_inactive;
return pages;
}
I'll test -mm2 with your patch shortly.
>
> @@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
> if (down_trylock(&shrinker_sem))
> return 0;
>
> - pages = nr_used_zone_pages();
> + pages = nr_lowmem_lru_pages();
Cool. I think renaming this is a good idea.
> -unsigned int nr_used_zone_pages(void)
> +unsigned int nr_lowmem_lru_pages(void)
> {
> + pg_data_t *pgdat;
> unsigned int pages = 0;
> - struct zone *zone;
>
> - for_each_zone(zone)
> - pages += zone->nr_active + zone->nr_inactive;
> + for_each_pgdat(pgdat) {
> + int i;
> + for (i = 0; i < ZONE_HIGHMEM; i++) {
> + struct zone *zone = pgdat->node_zones + i;
> + pages += zone->nr_active + zone->nr_inactive;
> + }
> + }
Why not just check is_highmem(zone) here?
> -extern unsigned int nr_used_zone_pages(void);
> +extern unsigned int nr_lowmem_lru_pages(void);
Since shrink_slab() is the only consumer of this why not move the
function to vmscan.c just above shrink_slab()?
* Re: Large slab cache in 2.6.1
2004-02-22 5:37 ` Nick Piggin
2004-02-22 5:44 ` Chris Wedgwood
@ 2004-02-22 5:50 ` Mike Fedyk
2004-02-22 6:01 ` Nick Piggin
1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 5:50 UTC (permalink / raw)
To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel
Nick Piggin wrote:
>
>
> Mike Fedyk wrote:
>> What is the kernel parameter to disable highmem? I saw nohighio, but
>> that's not it...
>>
>
> Not sure. That defeats the purpose of trying to get your setup
> working nicely though ;)
>
> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
> test this patch against that kernel due to the other VM changes.
I can test on another machine, but it doesn't have as much memory, and
I'd have to use highmem emulation.
I'd prefer to not have to restart this machine and put a test kernel on it.
>
> Chris, could you test this too please? Thanks.
Yes, Chris do you have any highmem machines where you can test this patch?
Mike
* Re: Large slab cache in 2.6.1
2004-02-22 5:44 ` Chris Wedgwood
@ 2004-02-22 5:52 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 5:52 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Mike Fedyk, Linus Torvalds, linux-kernel
Chris Wedgwood wrote:
>On Sun, Feb 22, 2004 at 04:37:46PM +1100, Nick Piggin wrote:
>
>
>>Can you upgrade to 2.6.3-mm2? It would be ideal if you could test
>>this patch against that kernel due to the other VM changes.
>>
>
>Sure.
>
>
>>Chris, could you test this too please? Thanks.
>>
>
>I tested this change to a stock 2.6.3 kernel and saw a marginally
>better situation... 650MB in slab instead of 850MB:
>
In your case, this is probably ideal if the system is
not doing much. You now have a reasonable amount of low
memory available.
>
>===== page_alloc.c 1.186 vs edited =====
>--- 1.186/mm/page_alloc.c Wed Feb 18 19:43:04 2004
>+++ edited/page_alloc.c Sat Feb 21 21:05:32 2004
>@@ -764,13 +764,18 @@
>
> EXPORT_SYMBOL(nr_free_pages);
>
>+/*
>+ * return the number of non-highmem pages (we should probably rename
>+ * this function? --cw)
>+ */
> unsigned int nr_used_zone_pages(void)
> {
> unsigned int pages = 0;
> struct zone *zone;
>
> for_each_zone(zone)
>- pages += zone->nr_active + zone->nr_inactive;
>+ if (!is_highmem(zone))
>+ pages += zone->nr_active + zone->nr_inactive;
>
> return pages;
> }
>
>
>I'll test -mm2 with your patch shortly.
>
>
My patch will be functionally the same as yours so you'll be
mainly testing the other VM changes (which isn't a bad thing).
Thanks.
>
>>@@ -145,7 +145,7 @@ static int shrink_slab(unsigned long sca
>> if (down_trylock(&shrinker_sem))
>> return 0;
>>
>>- pages = nr_used_zone_pages();
>>+ pages = nr_lowmem_lru_pages();
>>
>
>Cool. I think renaming this is a good idea.
>
>
Yep.
>>-unsigned int nr_used_zone_pages(void)
>>+unsigned int nr_lowmem_lru_pages(void)
>> {
>>+ pg_data_t *pgdat;
>> unsigned int pages = 0;
>>- struct zone *zone;
>>
>>- for_each_zone(zone)
>>- pages += zone->nr_active + zone->nr_inactive;
>>+ for_each_pgdat(pgdat) {
>>+ int i;
>>+ for (i = 0; i < ZONE_HIGHMEM; i++) {
>>+ struct zone *zone = pgdat->node_zones + i;
>>+ pages += zone->nr_active + zone->nr_inactive;
>>+ }
>>+ }
>>
>
>Why not just check is_highmem(zone) here?
>
>
Why indeed? Easier to read vs a tiny bit faster. It isn't
really a fast path, so your version is probably better.
>>-extern unsigned int nr_used_zone_pages(void);
>>+extern unsigned int nr_lowmem_lru_pages(void);
>>
>
>Since shrink_slab() is the only consumer of this why not move the
>function to vmscan.c just above shrink_slab()?
>
>
Might as well I suppose. I'll incorporate your suggestions if
it tests well and I end up sending it off to Andrew.
* Re: Large slab cache in 2.6.1
2004-02-22 5:50 ` Mike Fedyk
@ 2004-02-22 6:01 ` Nick Piggin
2004-02-22 6:17 ` Andrew Morton
2004-02-22 6:45 ` Mike Fedyk
0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 6:01 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton
Mike Fedyk wrote:
> Nick Piggin wrote:
>
>>
>>
>> Mike Fedyk wrote:
>>
>>> What is the kernel parameter to disable highmem? I saw nohighio,
>>> but that's not it...
>>>
>>
>> Not sure. That defeats the purpose of trying to get your setup
>> working nicely though ;)
>>
>> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
>> test this patch against that kernel due to the other VM changes.
>
>
> I can test on another machine, but it doesn't have as much memory, and
> I'd have to use highmem emulation.
>
Probably not worth the bother. It is easy enough for anyone to
test random things, but the reason your feedback is so important
is because you are actually *using* the system.
> I'd prefer to not have to restart this machine and put a test kernel
> on it.
>
Fair enough. Maybe if we can get enough testing, some of the mm
changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
we'd better wait until 2.6.10 ;)
>>
>> Chris, could you test this too please? Thanks.
>
>
> Yes, Chris do you have any highmem machines where you can test this
> patch?
>
The system he's testing on has 1.5G too.
* Re: Large slab cache in 2.6.1
2004-02-22 4:30 ` Nick Piggin
2004-02-22 4:41 ` Mike Fedyk
@ 2004-02-22 6:09 ` Andrew Morton
2004-02-22 17:05 ` Linus Torvalds
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22 6:09 UTC (permalink / raw)
To: Nick Piggin; +Cc: cw, torvalds, mfedyk, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> Although, nr_used_zone_pages probably shouldn't be counting
> highmem zones, which might be our problem.
yeah. We should have made that change when making shrink_slab() ignore
highmem scanning.
Something like this (the function needs a rename)
--- 25/mm/page_alloc.c~shrink_slab-highmem-fix 2004-02-21 22:07:32.000000000 -0800
+++ 25-akpm/mm/page_alloc.c 2004-02-21 22:08:03.000000000 -0800
@@ -769,8 +769,10 @@ unsigned int nr_used_zone_pages(void)
unsigned int pages = 0;
struct zone *zone;
- for_each_zone(zone)
- pages += zone->nr_active + zone->nr_inactive;
+ for_each_zone(zone) {
+ if (zone - zone->zone_pgdat->node_zones < ZONE_HIGHMEM)
+ pages += zone->nr_active + zone->nr_inactive;
+ }
return pages;
}
_
* Re: Large slab cache in 2.6.1
2004-02-22 3:28 ` Linus Torvalds
2004-02-22 3:29 ` Chris Wedgwood
2004-02-22 3:31 ` Chris Wedgwood
@ 2004-02-22 6:15 ` Andrew Morton
2004-02-22 16:08 ` Martin J. Bligh
2004-02-22 14:03 ` Ed Tomlinson
3 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22 6:15 UTC (permalink / raw)
To: Linus Torvalds; +Cc: cw, mfedyk, linux-kernel
Linus Torvalds <torvalds@osdl.org> wrote:
>
> What happened to the experiment of having slab pages on the (in)active
> lists and letting them be free'd that way? Didn't somebody already do
> that? Ed Tomlinson and Craig Kulesa?
That was Ed. Because we cannot reclaim slab pages direct from the LRU it
turned out that putting slab pages onto the LRU was merely an extremely
complicated way of making the VFS cache scanning rate proportional to the
pagecache scanning rate. So we ended up doing just that, without putting
the slab pages on the LRU.
* Re: Large slab cache in 2.6.1
2004-02-22 6:01 ` Nick Piggin
@ 2004-02-22 6:17 ` Andrew Morton
2004-02-22 6:35 ` Nick Piggin
2004-02-22 6:45 ` Mike Fedyk
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22 6:17 UTC (permalink / raw)
To: Nick Piggin; +Cc: mfedyk, cw, torvalds, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> Fair enough. Maybe if we can get enough testing, some of the mm
> changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
> we'd better wait until 2.6.10 ;)
I need to alight from my lazy tail and test them a bit^Wlot first. More
like 2.6.5.
* Re: Large slab cache in 2.6.1
2004-02-22 6:17 ` Andrew Morton
@ 2004-02-22 6:35 ` Nick Piggin
2004-02-22 6:57 ` Andrew Morton
2004-02-22 8:36 ` Chris Wedgwood
0 siblings, 2 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 6:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: mfedyk, cw, torvalds, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 406 bytes --]
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
>>Fair enough. Maybe if we can get enough testing, some of the mm
>> changes can get into 2.6.4? I'm sure Linus is turning pale, maybe
>> we'd better wait until 2.6.10 ;)
>>
>>
>
>I need to alight from my lazy tail and test them a bit^Wlot first. More
>like 2.6.5.
>
>
>
Can you maybe use this patch then, please?
Thanks
[-- Attachment #2: vm-shrink-slab-lowmem.patch --]
[-- Type: text/plain, Size: 5212 bytes --]
linux-2.6-npiggin/include/linux/mm.h | 2 -
linux-2.6-npiggin/mm/page_alloc.c | 11 ------
linux-2.6-npiggin/mm/vmscan.c | 64 ++++++++++++++++++++++++++++-------
3 files changed, 52 insertions(+), 25 deletions(-)
diff -puN mm/vmscan.c~vm-shrink-slab-lowmem mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/vmscan.c 2004-02-22 17:30:53.000000000 +1100
@@ -122,7 +122,25 @@ void remove_shrinker(struct shrinker *sh
}
EXPORT_SYMBOL(remove_shrinker);
-
+
+/*
+ * Returns the number of lowmem pages which are on the lru lists
+ */
+static unsigned int nr_lowmem_lru_pages(void)
+{
+ unsigned int pages = 0;
+ struct zone *zone;
+
+ for_each_zone(zone) {
+ if (unlikely(is_highmem(zone)))
+ continue;
+ pages += zone->nr_active + zone->nr_inactive;
+ }
+
+ return pages;
+}
+
+
#define SHRINK_BATCH 128
/*
* Call the shrink functions to age shrinkable caches
@@ -136,6 +154,24 @@ EXPORT_SYMBOL(remove_shrinker);
* slab to avoid swapping.
*
* We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
+ *
+ * The formula to work out how much to scan each slab is as follows:
+ * Let S be the number of lowmem LRU pages were scanned (scanned)
+ * Let M be the total number of lowmem LRU pages (pages)
+ * T be the total number of all slab items.
+ * For each slab:
+ * I be number of slab items ((*shrinker->shrinker)(0, gfp_mask))
+ *
+ * "S * M / T" then gives the total number of slab items to scan, N
+ * Then for each slab, "N * T / I" is the number of items to scan for this slab.
+ *
+ * This simplifies to "S * M / I", or
+ * lowmem lru scanned * items in this slab / total lowmem lru pages
+ *
+ * TODO:
+ * The value of M should be calculated *before* LRU scanning.
+ * Total number of items in each slab should be used, not just freeable ones.
+ * Unfreeable slab items should not count toward the scanning total.
*/
static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
{
@@ -145,14 +181,16 @@ static int shrink_slab(unsigned long sca
if (down_trylock(&shrinker_sem))
return 0;
- pages = nr_used_zone_pages();
+ pages = nr_lowmem_lru_pages();
list_for_each_entry(shrinker, &shrinker_list, list) {
unsigned long long delta;
delta = 4 * scanned / shrinker->seeks;
delta *= (*shrinker->shrinker)(0, gfp_mask);
do_div(delta, pages + 1);
- shrinker->nr += delta;
+
+ /* +1 to ensure some scanning gets done */
+ shrinker->nr += delta + 1;
if (shrinker->nr > SHRINK_BATCH) {
long nr_to_scan = shrinker->nr;
@@ -857,7 +895,8 @@ shrink_zone(struct zone *zone, unsigned
*/
static int
shrink_caches(struct zone **zones, int priority, int *total_scanned,
- int gfp_mask, int nr_pages, struct page_state *ps)
+ int *lowmem_scanned, int gfp_mask, int nr_pages,
+ struct page_state *ps)
{
int ret = 0;
int i;
@@ -875,7 +914,10 @@ shrink_caches(struct zone **zones, int p
ret += shrink_zone(zone, gfp_mask,
to_reclaim, &nr_scanned, ps, priority);
+
*total_scanned += nr_scanned;
+ if (i < ZONE_HIGHMEM)
+ *lowmem_scanned += nr_scanned;
if (ret >= nr_pages)
break;
}
@@ -915,19 +957,17 @@ int try_to_free_pages(struct zone **zone
zones[i]->temp_priority = DEF_PRIORITY;
for (priority = DEF_PRIORITY; priority >= 0; priority--) {
- int total_scanned = 0;
+ int total_scanned = 0, lowmem_scanned = 0;
struct page_state ps;
get_page_state(&ps);
nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
- gfp_mask, nr_pages, &ps);
+ &lowmem_scanned, gfp_mask, nr_pages, &ps);
- if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
- shrink_slab(total_scanned, gfp_mask);
- if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
+ shrink_slab(lowmem_scanned, gfp_mask);
+ if (reclaim_state) {
+ nr_reclaimed += reclaim_state->reclaimed_slab;
+ reclaim_state->reclaimed_slab = 0;
}
if (nr_reclaimed >= nr_pages) {
diff -puN mm/page_alloc.c~vm-shrink-slab-lowmem mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/mm/page_alloc.c 2004-02-22 17:04:43.000000000 +1100
@@ -772,17 +772,6 @@ unsigned int nr_free_pages(void)
EXPORT_SYMBOL(nr_free_pages);
-unsigned int nr_used_zone_pages(void)
-{
- unsigned int pages = 0;
- struct zone *zone;
-
- for_each_zone(zone)
- pages += zone->nr_active + zone->nr_inactive;
-
- return pages;
-}
-
#ifdef CONFIG_NUMA
unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
{
diff -puN include/linux/mm.h~vm-shrink-slab-lowmem include/linux/mm.h
--- linux-2.6/include/linux/mm.h~vm-shrink-slab-lowmem 2004-02-22 16:35:06.000000000 +1100
+++ linux-2.6-npiggin/include/linux/mm.h 2004-02-22 17:04:26.000000000 +1100
@@ -625,8 +625,6 @@ static inline struct vm_area_struct * fi
extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
-extern unsigned int nr_used_zone_pages(void);
-
extern struct page * vmalloc_to_page(void *addr);
extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
int write);
_
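[Editor's note: a hedged userspace sketch of the proportional-scan arithmetic described in the shrink_slab() comment of the patch above, for readers following the formula. The function name `slab_scan_delta` is illustrative, not a kernel symbol; this models only the integer math (delta = 4 * scanned / seeks * items / (pages + 1) + 1), not the shrinker callbacks or locking.]

```c
/*
 * Model of the per-shrinker scan-count arithmetic from the patch above.
 *
 * scanned   - lowmem LRU pages scanned this reclaim pass (S)
 * seeks     - relative cost to recreate one object (shrinker->seeks)
 * nr_items  - current object count in this cache (I)
 * lru_pages - total lowmem LRU pages (M)
 */
static unsigned long long
slab_scan_delta(unsigned long scanned, int seeks,
                unsigned long nr_items, unsigned long lru_pages)
{
    unsigned long long delta;

    delta = 4ULL * scanned / seeks;   /* scan pressure, scaled by seek cost */
    delta *= nr_items;                /* proportional to this cache's size  */
    delta /= lru_pages + 1;           /* normalise; +1 avoids divide-by-0   */
    return delta + 1;                 /* "+1 to ensure some scanning gets done" */
}
```

For example, with seeks = 2, scanning 1000 of 100000 lowmem LRU pages against a cache of 50000 items gives 4 * 1000 / 2 * 50000 / 100001 + 1 = 1000 items to scan, so the cache is aged at roughly the same rate as the LRU; and even with scanned = 0 the +1 keeps each shrinker's counter slowly accumulating.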
* Re: Large slab cache in 2.6.1
2004-02-22 6:01 ` Nick Piggin
2004-02-22 6:17 ` Andrew Morton
@ 2004-02-22 6:45 ` Mike Fedyk
2004-02-22 6:58 ` Nick Piggin
1 sibling, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 6:45 UTC (permalink / raw)
To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton
Nick Piggin wrote:
>
>
> Mike Fedyk wrote:
>
>> Nick Piggin wrote:
>>
>>>
>>>
>>> Mike Fedyk wrote:
>>>
>>>> What is the kernel parameter to disable highmem? I saw nohighio,
>>>> but that's not it...
>>>>
>>>
>>> Not sure. That defeats the purpose of trying to get your setup
>>> working nicely though ;)
>>>
>>> Can you upgrade to 2.6.3-mm2? It would be ideal if you could
>>> test this patch against that kernel due to the other VM changes.
>>
>>
>>
>> I can test on another machine, but it doesn't have as much memory, and
>> I'd have to use highmem emulation.
>>
>
> Probably not worth the bother. It is easy enough for anyone to
> test random things, but the reason your feedback is so important
> is because you are actually *using* the system.
I completely understand what you're saying. I have seen enough threads
where someone refused to test patches. So let me be more specific.
I'll have to test the kernel on two other machines for a few days before
I put it on this particular machine. Unfortunately, both of them have <
1.5GB ram.
So let me know which patches are most likely to fix this problem.
PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to run
the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm -> your
patch.
Each step would require a weekday to get a fair comparison.
Mike
* Re: Large slab cache in 2.6.1
2004-02-22 6:35 ` Nick Piggin
@ 2004-02-22 6:57 ` Andrew Morton
2004-02-22 7:20 ` Nick Piggin
2004-02-22 8:36 ` Chris Wedgwood
1 sibling, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22 6:57 UTC (permalink / raw)
To: Nick Piggin; +Cc: mfedyk, cw, torvalds, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
> Can you maybe use this patch then, please?
OK.
> +static unsigned int nr_lowmem_lru_pages(void)
heh, that's what I called it.
> + * Total number of items in each slab should be used, not just freeable ones.
> + * Unfreeable slab items should not count toward the scanning total.
That's up to the individual shrinkers. What we have for dcache and icache
is close enough. Most entries on inode_unused and dentry_unused should be
reclaimable, but checking that with some instrumentation wouldn't hurt.
> + if (i < ZONE_HIGHMEM)
> + *lowmem_scanned += nr_scanned;
yup.
* Re: Large slab cache in 2.6.1
2004-02-22 6:45 ` Mike Fedyk
@ 2004-02-22 6:58 ` Nick Piggin
2004-02-22 7:20 ` Mike Fedyk
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 6:58 UTC (permalink / raw)
To: Mike Fedyk; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton
Mike Fedyk wrote:
> Nick Piggin wrote:
>
>>
>> Probably not worth the bother. It is easy enough for anyone to
>> test random things, but the reason your feedback is so important
>> is because you are actually *using* the system.
>
>
> I completely understand what you're saying. I have seen enough
> threads where someone refused to test patches. So let me be more
> specific.
>
> I'll have to test the kernel on two other machines for a few days
> before I put it on this particular machine. Unfortunately, both of
> them have < 1.5GB ram.
>
That is quite alright. I didn't intend to sound pushy in that
message, and I fully understand if you refuse to test patches on
your production machine.
> So let me know which patches are most likely to fix this problem.
>
> PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to
> run the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm
> -> your patch.
>
> Each step would require a week-day to get a fair compairison.
>
The last patch I posted would be a good one to test if you possibly
can. You should hear someone shout within a few days if it does
anything nasty, so the 2.6.3-mm+patch path is probably safer ;)
Nick
* Re: Large slab cache in 2.6.1
2004-02-22 6:57 ` Andrew Morton
@ 2004-02-22 7:20 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-22 7:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: mfedyk, cw, torvalds, linux-kernel
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>Can you maybe use this patch then, please?
>>
>
>OK.
>
>
>>+static unsigned int nr_lowmem_lru_pages(void)
>>
>
>heh, that's what I called it.
>
>
>>+ * Total number of items in each slab should be used, not just freeable ones.
>>+ * Unfreeable slab items should not count toward the scanning total.
>>
>
>That's up to the individual shrinkers. What we have for dcache and icache
>is close enough. Most entries on inode_unused and dentry_unused should be
>reclaimable, but checking that with some instrumentation wouldn't hurt.
>
>
Yeah it is an issue with the shrinkers, but I put it here so I
only had to write it once.
All the items under the TODO list are pretty pedantic, but they might
have larger impacts on very small memory systems. They would
definitely improve the consistency of shrink_slab's behaviour.
Granted, they probably wouldn't do much in the grand scheme of
things.
* Re: Large slab cache in 2.6.1
2004-02-22 6:58 ` Nick Piggin
@ 2004-02-22 7:20 ` Mike Fedyk
0 siblings, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-22 7:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel, Andrew Morton
Nick Piggin wrote:
>
>
> Mike Fedyk wrote:
>> I'll have to test the kernel on two other machines for a few days
>> before I put it on this particular machine. Unfortunately, both of
>> them have < 1.5GB ram.
>>
>
> That is quite alright. I didn't intend to sound pushy in that
> message, and I fully understand if you refuse to test patches on
> your production machine.
No problem. It is really sad when a problem could be fixed if only the
original reporter had put more effort into testing the proposed fixes.
Heh, so let me keep from being one of those reporters...
>
>> So let me know which patches are most likely to fix this problem.
>>
>> PS, if I can apply them to my 2.6.1 kernel, then I wouldn't have to
>> run the base kernel to compare changes of 2.6.1 -> 2.6.3 -> 2.6.3-mm
>> -> your patch.
>>
>> Each step would require a week-day to get a fair compairison.
>>
>
> The last patch I posted would be a good one to test if you possibly
> can. You should hear someone shout within a few days if it does
> anything nasty, so the 2.6.3-mm+patch path is probably safer ;)
>
Ok, I'll get started compiling tonight. Be sure to CC me if you have
any updates to this patch.
Mike
* Re: Large slab cache in 2.6.1
2004-02-22 6:35 ` Nick Piggin
2004-02-22 6:57 ` Andrew Morton
@ 2004-02-22 8:36 ` Chris Wedgwood
2004-02-22 9:13 ` Andrew Morton
1 sibling, 1 reply; 56+ messages in thread
From: Chris Wedgwood @ 2004-02-22 8:36 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, mfedyk, torvalds, linux-kernel
On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
> Can you maybe use this patch then, please?
I probably need to do more testing, but the quick patch I was using
against mainline (bk head) works better than this against 2.6.3-mm2.
I'll poke about more tomorrow. Time for some z's now.
* Re: Large slab cache in 2.6.1
2004-02-22 8:36 ` Chris Wedgwood
@ 2004-02-22 9:13 ` Andrew Morton
2004-02-23 0:16 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-22 9:13 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: piggin, mfedyk, torvalds, linux-kernel
Chris Wedgwood <cw@f00f.org> wrote:
>
> On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
>
> > Can you maybe use this patch then, please?
>
> I probably need to do more testing, but the quick patch I was using
> against mainline (bk head) works better than this against 2.6.3-mm2.
The patch which went in six months or so back which said "only reclaim slab
if we're scanning lowmem pagecache" was wrong. I must have been asleep at
the time.
We do need to scan slab in response to highmem page reclaim as well.
Because all the math is based around the total amount of memory in the
machine, and we know that if we're performing highmem page reclaim then the
lower zones have no free memory.
Also, the fact that this patch makes such a difference on the 1.5G machine
points at a problem in balancing the reclaim rate against the different
zones. I'll take a look at that separate problem.
This should apply to 2.6.3-mm2.
mm/vmscan.c | 18 +++++++-----------
1 files changed, 7 insertions(+), 11 deletions(-)
diff -puN mm/vmscan.c~a mm/vmscan.c
--- 25/mm/vmscan.c~a 2004-02-22 00:37:09.000000000 -0800
+++ 25-akpm/mm/vmscan.c 2004-02-22 00:37:49.000000000 -0800
@@ -922,12 +922,10 @@ int try_to_free_pages(struct zone **zone
nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
gfp_mask, nr_pages, &ps);
- if (zones[0] - zones[0]->zone_pgdat->node_zones < ZONE_HIGHMEM) {
- shrink_slab(total_scanned, gfp_mask);
- if (reclaim_state) {
- nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
+ shrink_slab(total_scanned, gfp_mask);
+ if (reclaim_state) {
+ nr_reclaimed += reclaim_state->reclaimed_slab;
+ reclaim_state->reclaimed_slab = 0;
}
if (nr_reclaimed >= nr_pages) {
@@ -1021,11 +1019,9 @@ static int balance_pgdat(pg_data_t *pgda
zone->temp_priority = priority;
reclaimed = shrink_zone(zone, GFP_KERNEL,
to_reclaim, &nr_scanned, ps, priority);
- if (i < ZONE_HIGHMEM) {
- reclaim_state->reclaimed_slab = 0;
- shrink_slab(nr_scanned, GFP_KERNEL);
- reclaimed += reclaim_state->reclaimed_slab;
- }
+ reclaim_state->reclaimed_slab = 0;
+ shrink_slab(nr_scanned, GFP_KERNEL);
+ reclaimed += reclaim_state->reclaimed_slab;
to_free -= reclaimed;
if (zone->all_unreclaimable)
continue;
_
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: Large slab cache in 2.6.1
@ 2004-02-22 11:00 Manfred Spraul
0 siblings, 0 replies; 56+ messages in thread
From: Manfred Spraul @ 2004-02-22 11:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, Linus Torvalds
>
>
>Linus Torvalds <torvalds@osdl.org> wrote:
>>
>> What happened to the experiment of having slab pages on the (in)active
>> lists and letting them be free'd that way? Didn't somebody already do
>> that? Ed Tomlinson and Craig Kulesa?
>
>That was Ed. Because we cannot reclaim slab pages direct from the LRU
>
I think that this is needed: Bonwick's slab algorithm (i.e. two-level
linked lists, implemented in cache_alloc_refill and free_block) is
intended for unfreeable objects.
The dentry cache is a cache of freeable objects - a different algorithm
would be more efficient for shrinking the dentry cache after an updatedb.
I had started prototyping, but didn't get far.
--
Manfred
* Re: Large slab cache in 2.6.1
2004-02-22 3:28 ` Linus Torvalds
` (2 preceding siblings ...)
2004-02-22 6:15 ` Andrew Morton
@ 2004-02-22 14:03 ` Ed Tomlinson
2004-02-23 2:28 ` Mike Fedyk
3 siblings, 1 reply; 56+ messages in thread
From: Ed Tomlinson @ 2004-02-22 14:03 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: Linus Torvalds, Mike Fedyk, linux-kernel
On February 21, 2004 10:28 pm, Linus Torvalds wrote:
> On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>
> >
> > Maybe gradual page-cache pressure could shrink the slab?
>
>
> What happened to the experiment of having slab pages on the (in)active
> lists and letting them be free'd that way? Didn't somebody already do
> that? Ed Tomlinson and Craig Kulesa?
You have a good memory.
We dropped this experiment since there was a lot of latency between the time a
slab page became freeable and when it was actually freed. The current
callback scheme was designed to balance slab pressure and VM scanning.
Ed Tomlinson
> That's still something I'd like to try, although that's obviously 2.7.x
> material, so not useful right now.
>
> Or did the experiment just never work out well?
>
> Linus
* Re: Large slab cache in 2.6.1
2004-02-22 6:15 ` Andrew Morton
@ 2004-02-22 16:08 ` Martin J. Bligh
2004-02-22 17:55 ` Jamie Lokier
2004-02-22 21:13 ` Dipankar Sarma
0 siblings, 2 replies; 56+ messages in thread
From: Martin J. Bligh @ 2004-02-22 16:08 UTC (permalink / raw)
To: Andrew Morton, Linus Torvalds
Cc: cw, mfedyk, linux-kernel, Dipankar Sarma, Maneesh Soni
--Andrew Morton <akpm@osdl.org> wrote (on Saturday, February 21, 2004 22:15:53 -0800):
> Linus Torvalds <torvalds@osdl.org> wrote:
>>
>> What happened to the experiment of having slab pages on the (in)active
>> lists and letting them be free'd that way? Didn't somebody already do
>> that? Ed Tomlinson and Craig Kulesa?
>
> That was Ed. Because we cannot reclaim slab pages direct from the LRU it
> turned out that putting slab pages onto the LRU was merely an extremely
> complicated way of making the VFS cache scanning rate proportional to the
> pagecache scanning rate. So we ended up doing just that, without putting
> the slab pages on the LRU.
I still don't understand the rationale behind the way we currently do it -
perhaps I'm just being particularly dense. If we have 10,000 pages full of
dcache, and start going through shooting entries by when they were LRU wrt
the entries, not the dcache itself, then (assuming random access to dcache),
we'll evenly shoot the same number of entries from each dcache page without
actually freeing any pages at all, just trashing the cache.
Now I'm aware access isn't really random, which probably saves our arse.
But then some of the entries will be locked too, which only makes things
worse (we free a bunch of entries from that page, but the page itself
still isn't freeable). So it still seems likely to me that we'll blow
away at least half of the dcache entries before we free any significant
number of pages at all. That seems insane to me. Moreover, the more times
we shrink & fill, the worse the layout will get (less grouping of "recently
used entries" into the same page).
Moreover, it seems rather expensive to do a write operation for each
dentry to maintain the LRU list over entries. But maybe we don't do that
anymore with dcache RCU - I lost track of what that does ;-( So doing it
on the page LRU basis still makes a damned sight more sense to me. Don't
we want semantics like "once used vs twice used" preference treatment
for dentries, etc anyway?
If someone has the patience to explain exactly why I'm crazy (on this topic,
not in general) I'd appreciate it ;-)
M.
* Re: Large slab cache in 2.6.1
2004-02-22 6:09 ` Andrew Morton
@ 2004-02-22 17:05 ` Linus Torvalds
2004-02-23 0:29 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Linus Torvalds @ 2004-02-22 17:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nick Piggin, cw, mfedyk, linux-kernel
On Sat, 21 Feb 2004, Andrew Morton wrote:
>
> yeah. We should have made that change when making shrink_slab() ignore
> highmem scanning.
>
> Something like this (the function needs a rename)
Why not just pass in the list of zones? That way the _caller_ determines
what zones he is interested in.
So just add a "struct zonelist *zonelist" as the argument, the same way
"__alloc_pages()" has.
Linus
* Re: Large slab cache in 2.6.1
2004-02-22 16:08 ` Martin J. Bligh
@ 2004-02-22 17:55 ` Jamie Lokier
2004-02-23 3:45 ` Mike Fedyk
2004-02-22 21:13 ` Dipankar Sarma
1 sibling, 1 reply; 56+ messages in thread
From: Jamie Lokier @ 2004-02-22 17:55 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andrew Morton, Linus Torvalds, cw, mfedyk, linux-kernel,
Dipankar Sarma, Maneesh Soni
Martin J. Bligh wrote:
> So it still seems likely to me that we'll blow away at least half of
> the dcache entries before we free any significant number of pages at
> all. That seems insane to me.
It's not totally insane to free dcache entries from pages that won't
be freed. It encourages new entries to be allocated in those pages.
Ideally you'd simply mark those dcache entries as prime candidates for
recycling when new entries are needed, without actually freeing them
until new entries are needed - or until their whole pages can be
released.
Also, you could bias new allocations to recycle those old dcache entries,
while also biasing them toward recently used pages, so that recently used
entries tend to cluster in the same pages.
(I'm not sure how those ideas would work out in practice; they're just
hand-waving).
-- Jamie
* Re: Large slab cache in 2.6.1
2004-02-22 16:08 ` Martin J. Bligh
2004-02-22 17:55 ` Jamie Lokier
@ 2004-02-22 21:13 ` Dipankar Sarma
1 sibling, 0 replies; 56+ messages in thread
From: Dipankar Sarma @ 2004-02-22 21:13 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andrew Morton, Linus Torvalds, cw, mfedyk, linux-kernel,
Maneesh Soni, Paul McKenney
On Sun, Feb 22, 2004 at 08:08:43AM -0800, Martin J. Bligh wrote:
> I still don't understand the rationale behind the way we currently do it -
> perhaps I'm just being particularly dense. If we have 10,000 pages full of
> dcache, and start going through shooting entries by when they were LRU wrt
> the entries, not the dcache itself, then (assuming random access to dcache),
> we'll evenly shoot the same number of entries from each dcache page without
> actually freeing any pages at all, just trashing the cache.
>
> Now I'm aware access isn't really random, which probably saves our arse.
> But then some of the entries will be locked too, which only makes things
> worse (we free a bunch of entries from that page, but the page itself
> still isn't freeable). So it still seems likely to me that we'll blow
> away at least half of the dcache entries before we free any significant
> number of pages at all. That seems insane to me. Moreover, the more times
> we shrink & fill, the worse the layout will get (less grouping of "recently
> used entries" into the same page).
Do you have a quick test to demonstrate this? That would be useful.
> Moreover, it seems rather expensive to do a write operation for each
> dentry to maintain the LRU list over entries. But maybe we don't do that
> anymore with dcache RCU - I lost track of what that does ;-( So doing it
> on the page LRU basis still makes a damned sight more sense to me. Don't
> we want semantics like "once used vs twice used" preference treatment
> for dentries, etc anyway?
Dcache-RCU hasn't changed how dentries are freed to the slab much; it is
still LRU. Given a CPU, dentries are still returned to the slab
in dcache LRU order.
I have always wondered about how useful the global dcache LRU
mechanism is. This adds another reason for us to go experiment
with it.
Thanks
Dipankar
* Re: Large slab cache in 2.6.1
2004-02-22 9:13 ` Andrew Morton
@ 2004-02-23 0:16 ` Nick Piggin
2004-02-23 0:26 ` Andrew Morton
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23 0:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: Chris Wedgwood, mfedyk, torvalds, linux-kernel
Andrew Morton wrote:
>Chris Wedgwood <cw@f00f.org> wrote:
>
>>On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
>>
>>
>>>Can you maybe use this patch then, please?
>>>
>>I probably need to do more testing, but the quick patch I was using
>>against mainline (bk head) works better than this against 2.6.3-mm2.
>>
>
>The patch which went in six months or so back which said "only reclaim slab
>if we're scanning lowmem pagecache" was wrong. I must have been asleep at
>the time.
>
>We do need to scan slab in response to highmem page reclaim as well.
>Because all the math is based around the total amount of memory in the
>machine, and we know that if we're performing highmem page reclaim then the
>lower zones have no free memory.
>
>
I don't understand this. Presumably if the lower zones have no free
memory then we'll be doing lowmem page reclaim too, and that will
be shrinking the slab.
The patch I sent you should (modulo the ->seeks stuff) make it
behave as if the slab pages are on lowmem LRUs and get scanned
accordingly.
* Re: Large slab cache in 2.6.1
2004-02-23 0:16 ` Nick Piggin
@ 2004-02-23 0:26 ` Andrew Morton
2004-02-23 0:34 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23 0:26 UTC (permalink / raw)
To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
>
> Andrew Morton wrote:
>
> >Chris Wedgwood <cw@f00f.org> wrote:
> >
> >>On Sun, Feb 22, 2004 at 05:35:09PM +1100, Nick Piggin wrote:
> >>
> >>
> >>>Can you maybe use this patch then, please?
> >>>
> >>I probably need to do more testing, but the quick patch I was using
> >>against mainline (bk head) works better than this against 2.6.3-mm2.
> >>
> >
> >The patch which went in six months or so back which said "only reclaim slab
> >if we're scanning lowmem pagecache" was wrong. I must have been asleep at
> >the time.
> >
> >We do need to scan slab in response to highmem page reclaim as well.
> >Because all the math is based around the total amount of memory in the
> >machine, and we know that if we're performing highmem page reclaim then the
> >lower zones have no free memory.
> >
> >
>
> I don't understand this. Presumably if the lower zones have no free
> memory then we'll be doing lowmem page reclaim too, and that will
> be shrinking the slab.
We should be performing lowmem page reclaim, but we're not. With some
highmem/lowmem size combinations the `incremental min' logic in the page
allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
below pages_high and kswapd then does not perform page reclaim in the
lowmem zone at all. I'm seeing some workloads where we reclaim 700 highmem
pages for each lowmem page. This hugely exacerbated the slab problem on
1.5G machines. I have that fixed up now.
Regardless of that, we do, logically, want to reclaim slab in response to
highmem reclaim pressure because any highmem allocation can be satisfied by
lowmem too.
* Re: Large slab cache in 2.6.1
2004-02-22 17:05 ` Linus Torvalds
@ 2004-02-23 0:29 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-23 0:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, cw, mfedyk, linux-kernel
Linus Torvalds wrote:
>
>On Sat, 21 Feb 2004, Andrew Morton wrote:
>
>>yeah. We should have made that change when making shrink_slab() ignore
>>highmem scanning.
>>
>>Something like this (the function needs a rename)
>>
>
>Why not just pass in the list of zones? That way the _caller_ determines
>what zones he is interested in.
>
>So just add a "struct zonelist *zonelist" as the argument, the same way
>"__alloc_pages()" has..
>
>
The caller might as well just pass in the total size of all
LRUs that it has scanned. It has to traverse the zones anyway,
and this enables it to take the size *before* shrinking, which
gives you a better number.
* Re: Large slab cache in 2.6.1
2004-02-23 0:26 ` Andrew Morton
@ 2004-02-23 0:34 ` Nick Piggin
2004-02-23 0:46 ` Andrew Morton
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23 0:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>
>>Andrew Morton wrote:
>>
>>
>>>
>>>We do need to scan slab in response to highmem page reclaim as well.
>>>Because all the math is based around the total amount of memory in the
>>>machine, and we know that if we're performing highmem page reclaim then the
>>>lower zones have no free memory.
>>>
>>>
>>>
>>I don't understand this. Presumably if the lower zones have no free
>>memory then we'll be doing lowmem page reclaim too, and that will
>>be shrinking the slab.
>>
>
>We should be performing lowmem page reclaim, but we're not. With some
>highmem/lowmem size combinations the `incremental min' logic in the page
>allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
>below pages_high and kswapd then does not perform page reclaim in the
>lowmem zone at all. I'm seeing some workloads where we reclaim 700 highmem
>pages for each lowmem page. This hugely exacerbated the slab problem on
>1.5G machines. I have that fixed up now.
>
>
This is the incremental min logic doing its work though. Maybe
that should be fixed up to be less aggressive instead of putting
more complexity in the scanner to work around it.
Anyway could you post the patch you're using to fix it?
>Regardless of that, we do, logically, want to reclaim slab in response to
>highmem reclaim pressure because any highmem allocation can be satisfied by
>lowmem too.
>
>
The logical extension of that is: "we want to reclaim *lowmem* in
response to highmem reclaim pressure because any ..."
If this isn't what the scanner is doing then it should be fixed in
a more generic way.
* Re: Large slab cache in 2.6.1
2004-02-23 0:34 ` Nick Piggin
@ 2004-02-23 0:46 ` Andrew Morton
2004-02-23 0:54 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23 0:46 UTC (permalink / raw)
To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
> >We should be performing lowmem page reclaim, but we're not. With some
> >highmem/lowmem size combinations the `incremental min' logic in the page
> >allocator will prevent __GFP_HIGHMEM allocations from taking ZONE_NORMAL
> >below pages_high and kswapd then does not perform page reclaim in the
> >lowmem zone at all. I'm seeing some workloads where we reclaim 700 highmem
> >pages for each lowmem page. This hugely exacerbated the slab problem on
> >1.5G machines. I have that fixed up now.
> >
> >
>
> This is the incremental min logic doing its work though. Maybe
> that should be fixed up to be less aggressive instead of putting
> more complexity in the scanner to work around it.
The scanner got simpler.
> Anyway could you post the patch you're using to fix it?
Sure.
> >Regardless of that, we do, logically, want to reclaim slab in response to
> >highmem reclaim pressure because any highmem allocation can be satisfied by
> >lowmem too.
> >
> >
>
> The logical extension of that is: "we want to reclaim *lowmem* in
> response to highmem reclaim pressure because any ..."
yep.
> If this isn't what the scanner is doing then it should be fixed in
> a more generic way.
include/linux/mmzone.h | 5 ++++-
mm/page_alloc.c | 13 ++++++++++++-
mm/vmscan.c | 22 +++++++++-------------
3 files changed, 25 insertions(+), 15 deletions(-)
diff -puN mm/page_alloc.c~zone-balancing-batching mm/page_alloc.c
--- 25/mm/page_alloc.c~zone-balancing-batching 2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/mm/page_alloc.c 2004-02-22 15:15:52.000000000 -0800
@@ -1019,6 +1019,7 @@ void show_free_areas(void)
" min:%lukB"
" low:%lukB"
" high:%lukB"
+ " batch:%lukB"
" active:%lukB"
" inactive:%lukB"
"\n",
@@ -1027,6 +1028,7 @@ void show_free_areas(void)
K(zone->pages_min),
K(zone->pages_low),
K(zone->pages_high),
+ K(zone->reclaim_batch),
K(zone->nr_active),
K(zone->nr_inactive)
);
@@ -1618,6 +1620,8 @@ static void setup_per_zone_pages_min(voi
lowmem_pages += zone->present_pages;
for_each_zone(zone) {
+ unsigned long long reclaim_batch;
+
spin_lock_irqsave(&zone->lru_lock, flags);
if (is_highmem(zone)) {
/*
@@ -1642,8 +1646,15 @@ static void setup_per_zone_pages_min(voi
lowmem_pages;
}
- zone-> pages_low = zone->pages_min * 2;
+ zone->pages_low = zone->pages_min * 2;
zone->pages_high = zone->pages_min * 3;
+
+ reclaim_batch = zone->present_pages * SWAP_CLUSTER_MAX;
+ do_div(reclaim_batch, lowmem_pages);
+ zone->reclaim_batch = reclaim_batch;
+ if (zone->reclaim_batch < 4)
+ zone->reclaim_batch = 4;
+
spin_unlock_irqrestore(&zone->lru_lock, flags);
}
}
diff -puN mm/vmscan.c~zone-balancing-batching mm/vmscan.c
--- 25/mm/vmscan.c~zone-balancing-batching 2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/mm/vmscan.c 2004-02-22 15:15:52.000000000 -0800
@@ -859,13 +859,12 @@ shrink_zone(struct zone *zone, unsigned
*/
static int
shrink_caches(struct zone **zones, int priority, int *total_scanned,
- int gfp_mask, int nr_pages, struct page_state *ps)
+ int gfp_mask, struct page_state *ps)
{
int ret = 0;
int i;
for (i = 0; zones[i] != NULL; i++) {
- int to_reclaim = max(nr_pages, SWAP_CLUSTER_MAX);
struct zone *zone = zones[i];
int nr_scanned;
@@ -875,8 +874,8 @@ shrink_caches(struct zone **zones, int p
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
- ret += shrink_zone(zone, gfp_mask,
- to_reclaim, &nr_scanned, ps, priority);
+ ret += shrink_zone(zone, gfp_mask, zone->reclaim_batch,
+ &nr_scanned, ps, priority);
*total_scanned += nr_scanned;
}
return ret;
@@ -904,7 +903,6 @@ int try_to_free_pages(struct zone **zone
{
int priority;
int ret = 0;
- const int nr_pages = SWAP_CLUSTER_MAX;
int nr_reclaimed = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
int i;
@@ -920,7 +918,7 @@ int try_to_free_pages(struct zone **zone
get_page_state(&ps);
nr_reclaimed += shrink_caches(zones, priority, &total_scanned,
- gfp_mask, nr_pages, &ps);
+ gfp_mask, &ps);
shrink_slab(total_scanned, gfp_mask);
if (reclaim_state) {
@@ -928,7 +926,7 @@ int try_to_free_pages(struct zone **zone
reclaim_state->reclaimed_slab = 0;
}
- if (nr_reclaimed >= nr_pages) {
+ if (nr_reclaimed >= SWAP_CLUSTER_MAX) {
ret = 1;
if (gfp_mask & __GFP_FS)
wakeup_bdflush(total_scanned);
@@ -1008,13 +1006,11 @@ static int balance_pgdat(pg_data_t *pgda
if (zone->all_unreclaimable && priority != DEF_PRIORITY)
continue;
- if (nr_pages) { /* Software suspend */
+ if (nr_pages) /* Software suspend */
to_reclaim = min(to_free, SWAP_CLUSTER_MAX*8);
- } else { /* Zone balancing */
- to_reclaim = zone->pages_high-zone->free_pages;
- if (to_reclaim <= 0)
- continue;
- }
+ else /* Zone balancing */
+ to_reclaim = zone->reclaim_batch;
+
all_zones_ok = 0;
zone->temp_priority = priority;
reclaimed = shrink_zone(zone, GFP_KERNEL,
diff -puN include/linux/mmzone.h~zone-balancing-batching include/linux/mmzone.h
--- 25/include/linux/mmzone.h~zone-balancing-batching 2004-02-22 15:15:52.000000000 -0800
+++ 25-akpm/include/linux/mmzone.h 2004-02-22 15:15:52.000000000 -0800
@@ -69,7 +69,10 @@ struct zone {
*/
spinlock_t lock;
unsigned long free_pages;
- unsigned long pages_min, pages_low, pages_high;
+ unsigned long pages_min;
+ unsigned long pages_low;
+ unsigned long pages_high;
+ unsigned long reclaim_batch;
ZONE_PADDING(_pad1_)
_
* Re: Large slab cache in 2.6.1
2004-02-23 0:46 ` Andrew Morton
@ 2004-02-23 0:54 ` Nick Piggin
2004-02-23 1:00 ` Andrew Morton
0 siblings, 1 reply; 56+ messages in thread
From: Nick Piggin @ 2004-02-23 0:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
>>This is the incremental min logic doing its work though. Maybe
>>that should be fixed up to be less aggressive instead of putting
>>more complexity in the scanner to work around it.
>>
>
>The scanner got simpler.
>
>
>>Anyway could you post the patch you're using to fix it?
>>
>
>Sure.
>
>
>>>Regardless of that, we do, logically, want to reclaim slab in response to
>>>highmem reclaim pressure because any highmem allocation can be satisfied by
>>>lowmem too.
>>>
>>>
>>>
>>The logical extension of that is: "we want to reclaim *lowmem* in
>>response to highmem reclaim pressure because any ..."
>>
>
>yep.
>
>
Yeah this is good. I thought the patch you were proposing was
to shrink slab on highmem pressure.
Applying some lowmem pressure due to highmem pressure and THEN shrinking
slab as a result of the lowmem pressure is much better.
* Re: Large slab cache in 2.6.1
2004-02-23 0:54 ` Nick Piggin
@ 2004-02-23 1:00 ` Andrew Morton
2004-02-23 1:06 ` Nick Piggin
0 siblings, 1 reply; 56+ messages in thread
From: Andrew Morton @ 2004-02-23 1:00 UTC (permalink / raw)
To: Nick Piggin; +Cc: cw, mfedyk, torvalds, linux-kernel
Nick Piggin <piggin@cyberone.com.au> wrote:
>
>
> >
> >yep.
> >
> >
>
> Yeah this is good. I thought the patch you were proposing was
> to shrink slab on highmem pressure.
That as well.
> Applying some lowmem pressure due to highmem pressure and THEN shrinking
> slab as a result of the lowmem pressure is much better.
Prove it to me ;)
* Re: Large slab cache in 2.6.1
2004-02-23 1:00 ` Andrew Morton
@ 2004-02-23 1:06 ` Nick Piggin
0 siblings, 0 replies; 56+ messages in thread
From: Nick Piggin @ 2004-02-23 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: cw, mfedyk, torvalds, linux-kernel
Andrew Morton wrote:
>Nick Piggin <piggin@cyberone.com.au> wrote:
>
>>
>>>yep.
>>>
>>>
>>>
>>Yeah this is good. I thought the patch you were proposing was
>>to shrink slab on highmem pressure.
>>
>
>That as well.
>
>
Well this is the complexity I'm talking about. Sure it
is actually "simpler" code-wise, but you're making it
conceptually more complex.
>>Applying some lowmem pressure due to highmem pressure and THEN shrinking
>>slab as a result of the lowmem pressure is much better.
>>
>
>Prove it to me ;)
>
>
Your slab wasn't being shrunk because the slab pressure
calculation was way off for highmem systems. My patch fixed
that, so lowmem pressure should shrink slab properly.
Then with your patch, highmem pressure will apply lowmem
pressure. So the end result is that the slab gets appropriate
pressure.
Can't you just prove to me why that doesn't work? ;)
* Re: Large slab cache in 2.6.1
2004-02-22 14:03 ` Ed Tomlinson
@ 2004-02-23 2:28 ` Mike Fedyk
2004-02-23 3:33 ` Ed Tomlinson
0 siblings, 1 reply; 56+ messages in thread
From: Mike Fedyk @ 2004-02-23 2:28 UTC (permalink / raw)
To: Ed Tomlinson; +Cc: Chris Wedgwood, Linus Torvalds, linux-kernel
Ed Tomlinson wrote:
> On February 21, 2004 10:28 pm, Linus Torvalds wrote:
>
>>On Sat, 21 Feb 2004, Chris Wedgwood wrote:
>>
>>
>>>Maybe gradual page-cache pressure could shrink the slab?
>>
>>
>>What happened to the experiment of having slab pages on the (in)active
>>lists and letting them be free'd that way? Didn't somebody already do
>>that? Ed Tomlinson and Craig Kulesa?
>
>
> You have a good memory.
>
> We dropped this experiment since there was a lot of latency between the time a
> slab page became freeable and when it was actually freed. The current
> callback scheme was designed to balance slab pressure and VM scanning.
Does it really matter if there is a lot of latency? How does this
affect real-world results? IOW, if it's not at the end of the LRU, then
there's probably something better to free instead...
* Re: Large slab cache in 2.6.1
2004-02-23 2:28 ` Mike Fedyk
@ 2004-02-23 3:33 ` Ed Tomlinson
0 siblings, 0 replies; 56+ messages in thread
From: Ed Tomlinson @ 2004-02-23 3:33 UTC (permalink / raw)
To: linux-kernel; +Cc: Mike Fedyk
On February 22, 2004 09:28 pm, Mike Fedyk wrote:
> Ed Tomlinson wrote:
> > On February 21, 2004 10:28 pm, Linus Torvalds wrote:
> >>On Sat, 21 Feb 2004, Chris Wedgwood wrote:
> >>>Maybe gradual page-cache pressure could shrink the slab?
> >>
> >>What happened to the experiment of having slab pages on the (in)active
> >>lists and letting them be free'd that way? Didn't somebody already do
> >>that? Ed Tomlinson and Craig Kulesa?
> >
> > You have a good memory.
> >
> > We dropped this experiment since there was a lot of latency between the
> > time a slab page became freeable and when it was actually freed. The
> > current callback scheme was designed to balance slab pressure and
> > VM scanning.
>
> Does it really matter if there is a lot of latency? How does this
> affect real-world results? IOW, if it's not at the end of the LRU, then
> there's probably something better to free instead...
It mattered. People noticed and complained. In any case, as Andrew
pointed out, we get the same effect, without long latencies, in a simpler
manner with the current scheme.
Ed
* Re: Large slab cache in 2.6.1
2004-02-22 17:55 ` Jamie Lokier
@ 2004-02-23 3:45 ` Mike Fedyk
0 siblings, 0 replies; 56+ messages in thread
From: Mike Fedyk @ 2004-02-23 3:45 UTC (permalink / raw)
To: Jamie Lokier
Cc: Martin J. Bligh, Andrew Morton, Linus Torvalds, cw, linux-kernel,
Dipankar Sarma, Maneesh Soni
Jamie Lokier wrote:
> It's not totally insane to free dcache entries from pages that won't
> be freed. It encourages new entries to be allocated in those pages.
>
> Ideally you'd simply mark those dcache entries as prime candidates for
> recycling when new entries are needed, without actually freeing them
> until new entries are needed - or until their whole pages can be
> released.
This doesn't do much when you want to actually free slab pages though...
I had a similar thought, where you'd mark slab pages whose remaining
objects should be aggressively freed in future scans, but didn't send
it since someone else had probably already thought of it.
>
> Also, biasing new allocations to recycle those old dcache entries, but
> also biasing them to recently used pages, so that recently used
> entries tend to cluster in the same pages.
>
Hmm, so if slab is on the LRU, then in some cases the page can't be
freed because of locked slab objects, new objects get allocated to
that mostly free slab page, and you end up not freeing very many pages.
Though this might better utilize the slab pages...
Mike
end of thread, other threads:[~2004-02-23 3:45 UTC | newest]
Thread overview: 56+ messages
2004-02-22 11:00 Large slab cache in 2.6.1 Manfred Spraul
-- strict thread matches above, loose matches on Subject: below --
2004-02-22 0:50 Mike Fedyk
2004-02-22 1:09 ` Mike Fedyk
2004-02-22 1:20 ` William Lee Irwin III
2004-02-22 2:03 ` Mike Fedyk
2004-02-22 2:17 ` William Lee Irwin III
2004-02-22 2:38 ` Nick Piggin
2004-02-22 2:46 ` William Lee Irwin III
2004-02-22 2:40 ` Mike Fedyk
2004-02-22 2:58 ` Nick Piggin
2004-02-22 2:33 ` Nick Piggin
2004-02-22 2:46 ` Nick Piggin
2004-02-22 2:54 ` Nick Piggin
2004-02-22 2:36 ` Chris Wedgwood
2004-02-22 3:03 ` Linus Torvalds
2004-02-22 3:11 ` Chris Wedgwood
2004-02-22 3:28 ` Linus Torvalds
2004-02-22 3:29 ` Chris Wedgwood
2004-02-22 3:31 ` Chris Wedgwood
2004-02-22 4:01 ` Nick Piggin
2004-02-22 4:10 ` Nick Piggin
2004-02-22 4:30 ` Nick Piggin
2004-02-22 4:41 ` Mike Fedyk
2004-02-22 5:37 ` Nick Piggin
2004-02-22 5:44 ` Chris Wedgwood
2004-02-22 5:52 ` Nick Piggin
2004-02-22 5:50 ` Mike Fedyk
2004-02-22 6:01 ` Nick Piggin
2004-02-22 6:17 ` Andrew Morton
2004-02-22 6:35 ` Nick Piggin
2004-02-22 6:57 ` Andrew Morton
2004-02-22 7:20 ` Nick Piggin
2004-02-22 8:36 ` Chris Wedgwood
2004-02-22 9:13 ` Andrew Morton
2004-02-23 0:16 ` Nick Piggin
2004-02-23 0:26 ` Andrew Morton
2004-02-23 0:34 ` Nick Piggin
2004-02-23 0:46 ` Andrew Morton
2004-02-23 0:54 ` Nick Piggin
2004-02-23 1:00 ` Andrew Morton
2004-02-23 1:06 ` Nick Piggin
2004-02-22 6:45 ` Mike Fedyk
2004-02-22 6:58 ` Nick Piggin
2004-02-22 7:20 ` Mike Fedyk
2004-02-22 6:09 ` Andrew Morton
2004-02-22 17:05 ` Linus Torvalds
2004-02-23 0:29 ` Nick Piggin
2004-02-22 6:15 ` Andrew Morton
2004-02-22 16:08 ` Martin J. Bligh
2004-02-22 17:55 ` Jamie Lokier
2004-02-23 3:45 ` Mike Fedyk
2004-02-22 21:13 ` Dipankar Sarma
2004-02-22 14:03 ` Ed Tomlinson
2004-02-23 2:28 ` Mike Fedyk
2004-02-23 3:33 ` Ed Tomlinson
2004-02-22 3:21 ` Mike Fedyk