* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
@ 2002-06-20 4:07 rwhron
2002-06-20 12:47 ` Dave Jones
0 siblings, 1 reply; 20+ messages in thread
From: rwhron @ 2002-06-20 4:07 UTC (permalink / raw)
To: davej; +Cc: linux-kernel, ckulesa, torvalds
> Absolutely. Maybe Randy Hron (added to Cc) can find some spare time
> to benchmark these sometime before the summit too[1]. It'll be very
> interesting to see where it fits in with the other benchmark results
> he's collected on varying workloads.
I'd like to start benchmarking 2.5 on the quad xeon. You fixed the
aic7xxx driver in 2.5.23-dj1. It also has a qlogic QLA2200.
You mentioned the qlogic driver in 2.5 may not have the new error handling yet.
I haven't been able to get a <SysRq showTasks> on it yet,
but the reproducible scenario for all the 2.5.x kernels I've tried
has been:
mke2fs -q /dev/sdc1
mount -t ext2 -o defaults,noatime /dev/sdc1 /fs1
mkreiserfs /dev/sdc2
mount -t reiserfs -o defaults,noatime /dev/sdc2 /fs2
mke2fs -q -j -J size=400 /dev/sdc3
mount -t ext3 -o defaults,noatime,data=writeback /dev/sdc3 /fs3
for fs in /fs1 /fs2 /fs3
do
        # cpio about a hundred megabytes of benchmarks into $fs
        sync; sync; sync
        umount $fs
done
In 2.5.x, umount(1) hangs in uninterruptible sleep when
umounting the first or second filesystem. In 2.5.23, the sync
was in uninterruptible sleep before umounting /fs2.
The compile error on 2.5.23-dj1 was:
gcc -Wp,-MD,./.qlogicisp.o.d -D__KERNEL__ -I/usr/src/linux-2.5.23-dj1/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=qlogicisp -c -o qlogicisp.o qlogicisp.c
qlogicisp.c:2005: unknown field `abort' specified in initializer
qlogicisp.c:2005: warning: initialization from incompatible pointer type
qlogicisp.c:2005: unknown field `reset' specified in initializer
qlogicisp.c:2005: warning: initialization from incompatible pointer type
make[2]: *** [qlogicisp.o] Error 1
make[2]: Leaving directory `/usr/src/linux-2.5.23-dj1/drivers/scsi'
make[1]: *** [scsi] Error 2
make[1]: Leaving directory `/usr/src/linux-2.5.23-dj1/drivers'
make: *** [drivers] Error 2
Just in case someone with the know-how and can-do wants to[1].
> [1] I am master of subtle hints.
I'll put 2.5.x on top of the quad Xeon benchmark queue as soon as I can.
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-20 4:07 [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b) rwhron
@ 2002-06-20 12:47 ` Dave Jones
0 siblings, 0 replies; 20+ messages in thread
From: Dave Jones @ 2002-06-20 12:47 UTC (permalink / raw)
To: rwhron; +Cc: linux-kernel, ckulesa, torvalds
On Thu, Jun 20, 2002 at 12:07:49AM -0400, rwhron@earthlink.net wrote:
> The compile error on 2.5.23-dj1 was:
>
> gcc -Wp,-MD,./.qlogicisp.o.d -D__KERNEL__ -I/usr/src/linux-2.5.23-dj1/include -Wall -Wstrict-prototypes -Wno-trigraphs -O2 -fno-strict-aliasing -fno-common -fomit-frame-pointer -pipe -mpreferred-stack-boundary=2 -march=i686 -nostdinc -iwithprefix include -DKBUILD_BASENAME=qlogicisp -c -o qlogicisp.o qlogicisp.c
> qlogicisp.c:2005: unknown field `abort' specified in initializer
> qlogicisp.c:2005: warning: initialization from incompatible pointer type
> qlogicisp.c:2005: unknown field `reset' specified in initializer
> qlogicisp.c:2005: warning: initialization from incompatible pointer type
Ok, it looks like it hasn't been updated to include the new-style EH yet
(although there are/were some that had both). Setting the option
"Use SCSI drivers with broken error handling [DANGEROUS]" in the SCSI
submenu will give the same behaviour as that driver does in Linus' tree.
Ie, it will compile, but possibly not have any working error handling.
It should be ok for benchmarking though..
Dave
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
@ 2002-06-25 10:58 rwhron
2002-06-25 21:57 ` Rik van Riel
0 siblings, 1 reply; 20+ messages in thread
From: rwhron @ 2002-06-25 10:58 UTC (permalink / raw)
To: davej; +Cc: linux-kernel
> Maybe Randy Hron (added to Cc) can find some spare time
> to benchmark these sometime before the summit too[1].
dbench isn't scaling as well with the -rmap13b patch.
With 128 processes, dbench throughput is less than 1/3
of mainline.
dbench ext2, 32 processes    Average   High     Low   (MB/sec)
2.5.24                        28.24    28.84   27.30
2.5.24-rmap13b                21.64    23.50   19.71

dbench ext2, 128 processes   Average   High     Low
2.5.24                        19.32    21.05   18.05
2.5.24-rmap13b                 5.34     5.38    5.30
tiobench:
Sequential reads, rmap had about 10% more throughput
and lower max latency.
For random reads, throughput was lower and max latency
was higher with rmap.
Lmbench:
Most metrics look better with rmap. Exceptions
are fork/exec latency and mmap latency. mmap
latency was 18% higher with rmap.
Autoconf build (fork test) was about 5% faster
without rmap.
Details at:
http://home.earthlink.net/~rwhron/kernel/latest.html
--
Randy Hron
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
@ 2002-06-20 14:20 rwhron
0 siblings, 0 replies; 20+ messages in thread
From: rwhron @ 2002-06-20 14:20 UTC (permalink / raw)
To: davej; +Cc: linux-kernel
> "Use SCSI drivers with broken error handling [DANGEROUS]" in the SCSI
> submenu will give the same behaviour as that driver does in Linus' tree.
> Ie, it will compile, but possibly not have any working error handling.
> It should be ok for benchmarking though..
I will try that with the latest -dj after the current run (2.4.19-pre10 +
Jens' block-highmem + Andi Kleen's select/poll) completes.
--
Randy Hron
http://home.earthlink.net/~rwhron/
* [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
@ 2002-06-19 11:18 Craig Kulesa
2002-06-19 16:18 ` Andrew Morton
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Craig Kulesa @ 2002-06-19 11:18 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm
Where: http://loke.as.arizona.edu/~ckulesa/kernel/rmap-vm/
This patch implements Rik van Riel's patches for a reverse mapping VM
atop the 2.5.23 kernel infrastructure. The principal sticky bits in
the port are correct interoperability with Andrew Morton's patches to
clean up and extend the writeback and readahead code, among other things.
This patch reinstates Rik's (active, inactive dirty, inactive clean)
LRU list logic with the rmap information used for proper selection of pages
for eviction and better page aging. It seems to do a pretty good job even
for a first porting attempt. A simple, indicative test suite on a 192 MB
PII machine (loading a large image in GIMP, loading other applications,
heightening memory load to moderate swapout, then going back and
manipulating the original Gimp image to test page aging, then closing all
apps to the starting configuration) shows the following:
2.5.22 vanilla:
Total kernel swapouts during test = 29068 kB
Total kernel swapins during test = 16480 kB
Elapsed time for test: 141 seconds
2.5.23-rmap13b:
Total kernel swapouts during test = 40696 kB
Total kernel swapins during test = 380 kB
Elapsed time for test: 133 seconds
Although rmap's page_launder evicts a ton of pages under load, it seems to
swap the 'right' pages, as it doesn't need to swap them back in again.
This is a good sign. [recent 2.4-aa works pretty nicely too]
Various details for the curious or bored:
- Tested: UP, 16 MB < mem < 256 MB, x86 arch.
Untested: SMP, highmem, other archs.
In particular, I didn't even attempt to port rmap-related
changes to 2.5's arch/arm/mm/mm-armv.c.
- page_launder() is coarse and tends to clean/flush too
many pages at once. This is known behavior, but seems slightly
worse in 2.5 for some reason.
- pf_gfp_mask() doesn't exist in 2.5, nor does PF_NOIO. I have
simply dropped the call in try_to_free_pages() in vmscan.c, but
there is probably a way to reinstate its logic
(i.e. avoid memory balancing I/O if the current task
can't block on I/O). I didn't even attempt it.
- Writeback: instead of forcibly reinstating a page on the
inactive list when !PageActive, page->mapping, !PageDirty, and
!PageWriteback (see mm/page-writeback.c, fs/mpage.c), I just
let it go without any LRU list changes. If the page is
inactive and needs attention, it'll end up on the inactive
dirty list soon anyway, AFAICT. Seems okay so far, but that
may be flawed/sloppy reasoning... We could always look at the
page flags and reinstate the page to the appropriate LRU list
(i.e. inactive clean or dirty) if this turns out to be a
problem...
- Make shrink_[i,d,dq]cache_memory return the result of
kmem_cache_shrink(), not simply 0. Seems pointless to waste
that information, since we're getting it for free. Rik's patch
wants that info anyway...
- Readahead and drop_behind: With the new readahead code, we have
some choices regarding under what circumstances we choose to
drop_behind (i.e. only drop_behind if the reads look really
sequential, etc...). This patch blindly calls drop_behind at
the conclusion of page_cache_readahead(). Hopefully the
drop_behind code correctly interprets the new readahead indices.
It *seems* to behave correctly, but a quick look by another
pair of eyes would be reassuring.
- A couple of trivial rmap cleanups for Rik:
a) Semicolon day! System fails to boot if rmap debugging
is enabled in rmap.c. Fix is to remove the extraneous
semicolon in page_add_rmap():
if (!ptep_to_mm(ptep)); <--
b) The pte_chain_unlock/lock() pair between the tests for
"The page is in active use" and "Anonymous process
memory without backing store" in vmscan.c seems
unnecessary.
c) Drop PG_launder page flag, ala current 2.5 tree.
d) if (page_count(page) == 0) ---> if (!page_count(page))
and things like that...
- To be consistent with 2.4-rmap, this patch includes a
minimal BIO-ified port of Andrew Morton's read-latency2 patch
(i.e. minus the elvtune ioctl stuff) to 2.5, from his patch
sets. This adds about 7 kB to the patch.
- The patch also includes compilation fixes:
(2.5.22)
drivers/scsi/constants.c (undeclared integer variable)
drivers/pci/pci-driver.c (unresolved symbol in pcmcia_core)
(2.5.23)
include/linux/smp.h (define cpu_online_map for UP)
kernel/ksyms.c (export default_wake_function for modules)
arch/i386/i386_syms.c (export ioremap_nocache for modules)
Hope this is of use to someone! It's certainly been a fun and
instructive exercise for me so far. ;)
I'll attempt to keep up with the 2.5 and rmap changes, fix inevitable
bugs in porting, and will upload regular patches to the above URL, at
least until the usual VM suspects start paying more attention to 2.5.
I'll post a quick changelog to the list occasionally if and when any
changes are significant, i.e. other than boring hand patching and
diffing.
Comments, feedback & patches always appreciated!
Craig Kulesa
Steward Observatory, Univ. of Arizona
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 11:18 Craig Kulesa
@ 2002-06-19 16:18 ` Andrew Morton
2002-06-19 17:00 ` Daniel Phillips
` (2 subsequent siblings)
3 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2002-06-19 16:18 UTC (permalink / raw)
To: Craig Kulesa; +Cc: linux-kernel, linux-mm, Rik van Riel
Craig Kulesa wrote:
>
> ...
> Various details for the curious or bored:
>
> - Tested: UP, 16 MB < mem < 256 MB, x86 arch.
> Untested: SMP, highmem, other archs.
>
> In particular, I didn't even attempt to port rmap-related
> changes to 2.5's arch/arm/mm/mm-armv.c.
>
> - page_launder() is coarse and tends to clean/flush too
> many pages at once. This is known behavior, but seems slightly
> worse in 2.5 for some reason.
>
> - pf_gfp_mask() doesn't exist in 2.5, nor does PF_NOIO. I have
> simply dropped the call in try_to_free_pages() in vmscan.c, but
> there is probably a way to reinstate its logic
> (i.e. avoid memory balancing I/O if the current task
> can't block on I/O). I didn't even attempt it.
That's OK. PF_NOIO is a 2.4 "oh shit" for a loop driver deadlock.
That all just fixed itself up.
> - Writeback: instead of forcibly reinstating a page on the
> inactive list when !PageActive, page->mapping, !PageDirty, and
> !PageWriteback (see mm/page-writeback.c, fs/mpage.c), I just
> let it go without any LRU list changes. If the page is
> inactive and needs attention, it'll end up on the inactive
> dirty list soon anyway, AFAICT. Seems okay so far, but that
> may be flawed/sloppy reasoning... We could always look at the
> page flags and reinstate the page to the appropriate LRU list
> (i.e. inactive clean or dirty) if this turns out to be a
> problem...
The thinking there was this: the 2.4 shrink_cache() code was walking the
LRU, running writepage() against dirty pages at the tail. Each written
page was moved to the head of the LRU while under writeout, because we
can't do anything with it yet. Get it out of the way.
When I changed that single-page writepage() into a "clustered 32-page
writeout via ->dirty_pages", the same thing had to happen: get those
pages onto the "far" end of the inactive list.
So basically, you'll need to give them the same treatment as Rik
was giving them when they were written out in vmscan.c. Whatever
that was - it's been a while since I looked at rmap, sorry.
> ...
>
> - To be consistent with 2.4-rmap, this patch includes a
> minimal BIO-ified port of Andrew Morton's read-latency2 patch
> (i.e. minus the elvtune ioctl stuff) to 2.5, from his patch
> sets. This adds about 7 kB to the patch.
Heh. Probably we should not include this in your patch. It gets
in the way of evaluating rmap. I suggest we just suffer with the
existing IO scheduling for the while ;)
> - The patch also includes compilation fixes:
> (2.5.22)
> drivers/scsi/constants.c (undeclared integer variable)
> drivers/pci/pci-driver.c (unresolved symbol in pcmcia_core)
> (2.5.23)
> include/linux/smp.h (define cpu_online_map for UP)
> kernel/ksyms.c (export default_wake_function for modules)
> arch/i386/i386_syms.c (export ioremap_nocache for modules)
>
> Hope this is of use to someone! It's certainly been a fun and
> instructive exercise for me so far. ;)
Good stuff, thanks.
-
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 11:18 Craig Kulesa
2002-06-19 16:18 ` Andrew Morton
@ 2002-06-19 17:00 ` Daniel Phillips
2002-06-19 17:11 ` Dave Jones
2002-06-19 19:04 ` Steven Cole
2002-06-19 22:44 ` William Lee Irwin III
3 siblings, 1 reply; 20+ messages in thread
From: Daniel Phillips @ 2002-06-19 17:00 UTC (permalink / raw)
To: Craig Kulesa, linux-kernel; +Cc: linux-mm, Linus Torvalds
On Wednesday 19 June 2002 13:18, Craig Kulesa wrote:
> Where: http://loke.as.arizona.edu/~ckulesa/kernel/rmap-vm/
>
> This patch implements Rik van Riel's patches for a reverse mapping VM
> atop the 2.5.23 kernel infrastructure...
>
> ...Hope this is of use to someone! It's certainly been a fun and
> instructive exercise for me so far. ;)
It's intensely useful. It changes the whole character of the VM discussion
at the upcoming kernel summit from 'should we port rmap to mainline?' to 'how
well does it work' and 'what problems need fixing'. Much more useful.
Your timing is impeccable. You really need to cc Linus on this work,
particularly your minimal, lru version.
--
Daniel
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 17:00 ` Daniel Phillips
@ 2002-06-19 17:11 ` Dave Jones
2002-06-19 17:35 ` Rik van Riel
0 siblings, 1 reply; 20+ messages in thread
From: Dave Jones @ 2002-06-19 17:11 UTC (permalink / raw)
To: Daniel Phillips
Cc: Craig Kulesa, linux-kernel, linux-mm, Linus Torvalds, rwhron
On Wed, Jun 19, 2002 at 07:00:57PM +0200, Daniel Phillips wrote:
> > ...Hope this is of use to someone! It's certainly been a fun and
> > instructive exercise for me so far. ;)
> It's intensely useful. It changes the whole character of the VM discussion
> at the upcoming kernel summit from 'should we port rmap to mainline?' to 'how
> well does it work' and 'what problems need fixing'. Much more useful.
Absolutely. Maybe Randy Hron (added to Cc) can find some spare time
to benchmark these sometime before the summit too[1]. It'll be very
interesting to see where it fits in with the other benchmark results
he's collected on varying workloads.
Dave
[1] I am master of subtle hints.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 17:11 ` Dave Jones
@ 2002-06-19 17:35 ` Rik van Riel
2002-06-19 19:53 ` Ingo Molnar
0 siblings, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2002-06-19 17:35 UTC (permalink / raw)
To: Dave Jones
Cc: Daniel Phillips, Craig Kulesa, linux-kernel, linux-mm,
Linus Torvalds, rwhron
On Wed, 19 Jun 2002, Dave Jones wrote:
> On Wed, Jun 19, 2002 at 07:00:57PM +0200, Daniel Phillips wrote:
> > > ...Hope this is of use to someone! It's certainly been a fun and
> > > instructive exercise for me so far. ;)
> > It's intensely useful. It changes the whole character of the VM discussion
> > at the upcoming kernel summit from 'should we port rmap to mainline?' to 'how
> > well does it work' and 'what problems need fixing'. Much more useful.
>
> Absolutely. Maybe Randy Hron (added to Cc) can find some spare time
> to benchmark these sometime before the summit too[1]. It'll be very
> interesting to see where it fits in with the other benchmark results
> he's collected on varying workloads.
Note that either version is still untuned and rmap for 2.5
still needs pte-highmem support.
I am encouraged by Craig's test results, which show that
rmap did a LOT less swapin IO and rmap with page aging even
less. The fact that it did too much swapout IO means one
part of the system needs tuning but doesn't say much about
the thing as a whole.
In fact, I have a feeling that our tools are still too
crude; we really need some statistics on what's
happening inside the VM ... I'll work on those shortly.
Once we do have the tools to look at what's happening
inside the VM we should be much better able to tune the
right places inside the VM.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 17:35 ` Rik van Riel
@ 2002-06-19 19:53 ` Ingo Molnar
2002-06-19 20:21 ` Craig Kulesa
2002-06-24 15:02 ` Rik van Riel
0 siblings, 2 replies; 20+ messages in thread
From: Ingo Molnar @ 2002-06-19 19:53 UTC (permalink / raw)
To: Rik van Riel
Cc: Dave Jones, Daniel Phillips, Craig Kulesa, linux-kernel, linux-mm,
Linus Torvalds, rwhron
On Wed, 19 Jun 2002, Rik van Riel wrote:
> I am encouraged by Craig's test results, which show that
> rmap did a LOT less swapin IO and rmap with page aging even
> less. The fact that it did too much swapout IO means one
> part of the system needs tuning but doesn't say much about
> the thing as a whole.
btw., isn't there a fair chance that by 'fixing' the aging+rmap code to
swap out less, you'll ultimately swap in more? [because the extra swapout
likely ended up freeing up RAM as well, which in turn decreases the amount
of thrashing.]
Ingo
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 19:53 ` Ingo Molnar
@ 2002-06-19 20:21 ` Craig Kulesa
2002-06-19 20:24 ` Linus Torvalds
2002-06-24 15:02 ` Rik van Riel
1 sibling, 1 reply; 20+ messages in thread
From: Craig Kulesa @ 2002-06-19 20:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Rik van Riel, Dave Jones, Daniel Phillips, linux-kernel, linux-mm,
Linus Torvalds, rwhron
On Wed, 19 Jun 2002, Ingo Molnar wrote:
> btw., isn't there a fair chance that by 'fixing' the aging+rmap code to
> swap out less, you'll ultimately swap in more? [because the extra swapout
> likely ended up freeing up RAM as well, which in turn decreases the amount
> of thrashing.]
Agreed. Heightened swapout (in this rather simplified example) isn't a
problem in itself, unless it really turns out to be a bottleneck in a
wide variety of loads. As long as the *right* pages are being swapped
and don't have to be paged right back in again.
I'll try a more varied set of tests tonight, with cpu usage tabulated.
-Craig
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 20:21 ` Craig Kulesa
@ 2002-06-19 20:24 ` Linus Torvalds
2002-06-24 21:34 ` Martin J. Bligh
0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2002-06-19 20:24 UTC (permalink / raw)
To: Craig Kulesa
Cc: Ingo Molnar, Rik van Riel, Dave Jones, Daniel Phillips,
linux-kernel, linux-mm, rwhron
On Wed, 19 Jun 2002, Craig Kulesa wrote:
>
> I'll try a more varied set of tests tonight, with cpu usage tabulated.
Please do a few non-swap tests too.
Swapping is the thing that rmap is supposed to _help_, so improvements in
that area are good (and had better happen!), but if you're only looking at
the swap performance, you're ignoring the known problems with rmap, ie the
cases where non-rmap kernels do really well.
Comparing one but not the other doesn't give a very balanced picture..
Linus
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 20:24 ` Linus Torvalds
@ 2002-06-24 21:34 ` Martin J. Bligh
2002-06-24 21:39 ` Rik van Riel
2002-07-04 5:19 ` Daniel Phillips
0 siblings, 2 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-06-24 21:34 UTC (permalink / raw)
To: Linus Torvalds, Craig Kulesa
Cc: Ingo Molnar, Rik van Riel, Dave Jones, Daniel Phillips,
linux-kernel, linux-mm, rwhron
>> I'll try a more varied set of tests tonight, with cpu usage tabulated.
>
> Please do a few non-swap tests too.
>
> Swapping is the thing that rmap is supposed to _help_, so improvements in
> that area are good (and had better happen!), but if you're only looking at
> the swap performance, you're ignoring the known problems with rmap, ie the
> cases where non-rmap kernels do really well.
>
> Comparing one but not the other doesn't give a very balanced picture..
It would also be interesting to see memory consumption figures for a benchmark
with many large processes. With this type of load, memory consumption
through PTEs is already a problem - as far as I can see, rmap triples the
memory requirement of PTEs through the PTE chain's doubly linked list
(an additional 8 bytes per entry) ... perhaps my calculations are wrong?
This is a particular problem for databases that tend to have thousands of
processes attached to a large shared memory area.
A quick rough calculation indicates that the Oracle test I was helping out
with was consuming almost 10Gb of PTEs without rmap - 30Gb for overhead
doesn't sound like fun to me ;-(
M.
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-24 21:34 ` Martin J. Bligh
@ 2002-06-24 21:39 ` Rik van Riel
2002-06-24 21:56 ` Martin J. Bligh
2002-07-04 5:19 ` Daniel Phillips
1 sibling, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2002-06-24 21:39 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Linus Torvalds, Craig Kulesa, Ingo Molnar, Dave Jones,
Daniel Phillips, linux-kernel, linux-mm, rwhron
On Mon, 24 Jun 2002, Martin J. Bligh wrote:
> A quick rough calculation indicates that the Oracle test I was helping
> out with was consuming almost 10Gb of PTEs without rmap - 30Gb for
> overhead doesn't sound like fun to me ;-(
10 GB is already bad enough that rmap isn't so much causing
a problem but increasing an already intolerable problem.
For the large SHM segment you'd probably want to either use
large pages or shared page tables ... in each of these cases
the rmap overhead will disappear together with the page table
overhead.
Now we just need volunteers for the implementation ;)
kind regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-24 21:39 ` Rik van Riel
@ 2002-06-24 21:56 ` Martin J. Bligh
0 siblings, 0 replies; 20+ messages in thread
From: Martin J. Bligh @ 2002-06-24 21:56 UTC (permalink / raw)
To: Rik van Riel
Cc: Linus Torvalds, Craig Kulesa, Ingo Molnar, Dave Jones,
Daniel Phillips, linux-kernel, linux-mm, rwhron
>> A quick rough calculation indicates that the Oracle test I was helping
>> out with was consuming almost 10Gb of PTEs without rmap - 30Gb for
>> overhead doesn't sound like fun to me ;-(
>
> 10 GB is already bad enough that rmap isn't so much causing
> a problem but increasing an already intolerable problem.
Yup, I'm not denying there's a large existing problem there, but
at least we can fit it into memory right now. Just something to bear
in mind when you're benchmarking.
> Now we just need volunteers for the implementation ;)
We have some people looking at it already, but it's not the world's
most trivial problem to solve ;-)
M.
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-24 21:34 ` Martin J. Bligh
2002-06-24 21:39 ` Rik van Riel
@ 2002-07-04 5:19 ` Daniel Phillips
1 sibling, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2002-07-04 5:19 UTC (permalink / raw)
To: Martin J. Bligh, Linus Torvalds, Craig Kulesa
Cc: Ingo Molnar, Rik van Riel, Dave Jones, Daniel Phillips,
linux-kernel, linux-mm, rwhron
On Monday 24 June 2002 23:34, Martin J. Bligh wrote:
> ... as far as I can see, rmap triples the
> memory requirement of PTEs through the PTE chain's doubly linked list
> (an additional 8 bytes per entry)
It's 8 bytes per pte_chain node all right, but it's a singly linked
list, with each pte_chain node pointing at a pte and the next pte_chain
node.
> ... perhaps my calculations are wrong?
Yep. You do not get one pte_chain node per pte, it's one per mapped
page, plus one for each additional sharer of the page. With the
direct pointer optimization, where an unshared struct page points
directly at the pte (rumor has it Dave McCracken has done the patch)
then the pte_chain overhead goes away for all except shared pages.
Then with page table sharing, again the direct pointer optimization
is possible. So the pte_chain overhead drops rapidly, and in any
case, is not proportional to the number of ptes.
For practical purposes, the memory overhead for rmap boils down to
one extra field in struct page; that is, it's proportional to the
number of physical pages, an overhead of less than .1%. In heavy
sharing situations the pte_chain overhead will rise somewhat, but
this is precisely the type of load where reverse mapping is most
needed for efficient and predictable pageout processing, and page
table sharing should help here as well.
--
Daniel
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 19:53 ` Ingo Molnar
2002-06-19 20:21 ` Craig Kulesa
@ 2002-06-24 15:02 ` Rik van Riel
1 sibling, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2002-06-24 15:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Dave Jones, Daniel Phillips, Craig Kulesa, linux-kernel, linux-mm,
Linus Torvalds, rwhron
On Wed, 19 Jun 2002, Ingo Molnar wrote:
> On Wed, 19 Jun 2002, Rik van Riel wrote:
>
> > I am encouraged by Craig's test results, which show that
> > rmap did a LOT less swapin IO and rmap with page aging even
> > less. The fact that it did too much swapout IO means one
> > part of the system needs tuning but doesn't say much about
> > the thing as a whole.
>
> btw., isn't there a fair chance that by 'fixing' the aging+rmap code to
> swap out less, you'll ultimately swap in more? [because the extra swapout
> likely ended up freeing up RAM as well, which in turn decreases the amount
> of thrashing.]
Possibly, but I expect the 'extra' swapouts to be caused
by page_launder writing out too many pages at once and not
just the ones it wants to free.
Cleaning pages and freeing them are separate operations,
what is missing is a mechanism to clean enough pages but
not all inactive pages at once ;)
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 11:18 Craig Kulesa
2002-06-19 16:18 ` Andrew Morton
2002-06-19 17:00 ` Daniel Phillips
@ 2002-06-19 19:04 ` Steven Cole
2002-06-19 22:44 ` William Lee Irwin III
3 siblings, 0 replies; 20+ messages in thread
From: Steven Cole @ 2002-06-19 19:04 UTC (permalink / raw)
To: Craig Kulesa; +Cc: linux-kernel, linux-mm
On Wed, 2002-06-19 at 05:18, Craig Kulesa wrote:
>
>
> Where: http://loke.as.arizona.edu/~ckulesa/kernel/rmap-vm/
>
> This patch implements Rik van Riel's patches for a reverse mapping VM
> atop the 2.5.23 kernel infrastructure. The principal sticky bits in
> the port are correct interoperability with Andrew Morton's patches to
> cleanup and extend the writeback and readahead code, among other things.
> This patch reinstates Rik's (active, inactive dirty, inactive clean)
> LRU list logic with the rmap information used for proper selection of pages
> for eviction and better page aging. It seems to do a pretty good job even
> for a first porting attempt. A simple, indicative test suite on a 192 MB
> PII machine (loading a large image in GIMP, loading other applications,
> heightening memory load to moderate swapout, then going back and
> manipulating the original Gimp image to test page aging, then closing all
> apps to the starting configuration) shows the following:
>
> 2.5.22 vanilla:
> Total kernel swapouts during test = 29068 kB
> Total kernel swapins during test = 16480 kB
> Elapsed time for test: 141 seconds
>
> 2.5.23-rmap13b:
> Total kernel swapouts during test = 40696 kB
> Total kernel swapins during test = 380 kB
> Elapsed time for test: 133 seconds
>
> Although rmap's page_launder evicts a ton of pages under load, it seems to
> swap the 'right' pages, as it doesn't need to swap them back in again.
> This is a good sign. [recent 2.4-aa works pretty nicely too]
>
> Various details for the curious or bored:
>
> - Tested: UP, 16 MB < mem < 256 MB, x86 arch.
> Untested: SMP, highmem, other archs.
^^^
I tried to boot 2.5.23-rmap13b on a dual PIII without success.
Freeing unused kernel memory: 252k freed
    <-- hung here with CONFIG_SMP=y
Adding 1052248k swap on /dev/sda6. Priority:0 extents:1
Adding 1052248k swap on /dev/sdb1. Priority:0 extents:1
The above is the edited dmesg output from booting 2.5.23-rmap13b as a
UP kernel, which successfully booted on the same 2-way box.
Steven
* Re: [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b)
2002-06-19 11:18 Craig Kulesa
` (2 preceding siblings ...)
2002-06-19 19:04 ` Steven Cole
@ 2002-06-19 22:44 ` William Lee Irwin III
3 siblings, 0 replies; 20+ messages in thread
From: William Lee Irwin III @ 2002-06-19 22:44 UTC (permalink / raw)
To: Craig Kulesa; +Cc: linux-kernel, linux-mm
On Wed, Jun 19, 2002 at 04:18:00AM -0700, Craig Kulesa wrote:
> Where: http://loke.as.arizona.edu/~ckulesa/kernel/rmap-vm/
> This patch implements Rik van Riel's patches for a reverse mapping VM
> atop the 2.5.23 kernel infrastructure. The principal sticky bits in
There is a small bit of trouble here: pte_chain_lock() needs to
preempt_disable() and pte_chain_unlock() needs to preempt_enable(),
as they are meant to protect critical sections.
Cheers,
Bill
On Wed, Jun 19, 2002 at 04:18:00AM -0700, Craig Kulesa wrote:
+static inline void pte_chain_lock(struct page *page)
+{
+ /*
+ * Assuming the lock is uncontended, this never enters
+ * the body of the outer loop. If it is contended, then
+ * within the inner loop a non-atomic test is used to
+ * busywait with less bus contention for a good time to
+ * attempt to acquire the lock bit.
+ */
+ while (test_and_set_bit(PG_chainlock, &page->flags)) {
+ while (test_bit(PG_chainlock, &page->flags))
+ cpu_relax();
+ }
+}
+
+static inline void pte_chain_unlock(struct page *page)
+{
+ clear_bit(PG_chainlock, &page->flags);
+}
end of thread, other threads:[~2002-07-04 5:21 UTC | newest]
Thread overview: 20+ messages
-- links below jump to the message on this page --
2002-06-20 4:07 [PATCH] (1/2) reverse mapping VM for 2.5.23 (rmap-13b) rwhron
2002-06-20 12:47 ` Dave Jones
-- strict thread matches above, loose matches on Subject: below --
2002-06-25 10:58 rwhron
2002-06-25 21:57 ` Rik van Riel
2002-06-20 14:20 rwhron
2002-06-19 11:18 Craig Kulesa
2002-06-19 16:18 ` Andrew Morton
2002-06-19 17:00 ` Daniel Phillips
2002-06-19 17:11 ` Dave Jones
2002-06-19 17:35 ` Rik van Riel
2002-06-19 19:53 ` Ingo Molnar
2002-06-19 20:21 ` Craig Kulesa
2002-06-19 20:24 ` Linus Torvalds
2002-06-24 21:34 ` Martin J. Bligh
2002-06-24 21:39 ` Rik van Riel
2002-06-24 21:56 ` Martin J. Bligh
2002-07-04 5:19 ` Daniel Phillips
2002-06-24 15:02 ` Rik van Riel
2002-06-19 19:04 ` Steven Cole
2002-06-19 22:44 ` William Lee Irwin III