[PATCH] rmap 14

* [PATCH] rmap 14
@ 2002-08-16  2:07 ` Rik van Riel
From: Rik van Riel @ 2002-08-16  2:07 UTC
  To: linux-kernel; +Cc: linux-mm

This is a fairly minimal change for rmap14 since I've been
working on 2.5 most of the time. The experimental code in
this version is a hopefully smarter page_launder() that
shouldn't do much more IO than needed and hopefully gets
rid of the stalls that people have seen during heavy swap
activity.  Please test this version. ;)


The first release of the 14th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

           http://surriel.com/patches/2.4/2.4.19-rmap14
and        http://surriel.com/patches/2.4/incr/rmap13c-rmap14
and        http://linuxvm.bkbits.net/
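For readers new to the idea: a reverse mapping lets the VM go from a physical page to every page-table entry (PTE) that maps it. The pte_chains, page_remove_rmap() and try_to_unmap_one() entries in the changelog all revolve around this. The following is a minimal userspace sketch of the idea. It assumes a singly linked chain of PTE pointers hung off each page. The toy_* functions, the pte_t typedef and the struct layouts are hypothetical simplifications, not the code in the patch.

```c
#include <stdlib.h>

/* Hypothetical toy types; the real patch uses the kernel's own. */
typedef unsigned long pte_t;

/* One node per page-table entry that maps a given page. */
struct pte_chain {
    struct pte_chain *next;
    pte_t *ptep;
};

struct page {
    struct pte_chain *pte_chain;  /* reverse mapping: page -> PTEs */
};

/* Record that *ptep now maps this page. Returns 0, or -1 if out of memory. */
static int toy_page_add_rmap(struct page *page, pte_t *ptep)
{
    struct pte_chain *pc = malloc(sizeof(*pc));
    if (!pc)
        return -1;
    pc->ptep = ptep;
    pc->next = page->pte_chain;
    page->pte_chain = pc;
    return 0;
}

/* Drop one mapping. Returns 0 if it was found, -1 if not (the caller
 * decides whether that is fatal; here it is merely reported). */
static int toy_page_remove_rmap(struct page *page, pte_t *ptep)
{
    for (struct pte_chain **pp = &page->pte_chain; *pp; pp = &(*pp)->next) {
        if ((*pp)->ptep == ptep) {
            struct pte_chain *victim = *pp;
            *pp = victim->next;
            free(victim);
            return 0;
        }
    }
    return -1;
}

/* Unmap the page everywhere by clearing each PTE on its chain, without
 * scanning any process page tables. Returns how many PTEs were cleared. */
static int toy_try_to_unmap(struct page *page)
{
    int cleared = 0;
    while (page->pte_chain) {
        struct pte_chain *pc = page->pte_chain;
        *pc->ptep = 0;
        page->pte_chain = pc->next;
        free(pc);
        cleared++;
    }
    return cleared;
}
```

The last function shows the payoff. To evict a shared page, the VM walks a short per-page list instead of every process's page tables. The price is some memory per mapping.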


My big TODO items for a next release are:
  - O(1) page launder - currently functional but slow, needs to be tuned
  - pte-highmem

rmap 14:
  - get rid of stalls during swapping, hopefully          (me)
  - low latency zap_page_range                            (Robert Love)
rmap 13c:
  - add wmb() to wakeup_memwaiters                        (Arjan van de Ven)
  - remap_pmd_range now calls pte_alloc with full address (Paul Mackerras)
  - #ifdef out pte_chain_lock/unlock on UP machines       (Andrew Morton)
  - un-BUG() truncate_complete_page, the race is expected (Andrew Morton, me)
  - remove NUMA changes from rmap13a                      (Christoph Hellwig)
rmap 13b:
  - prevent PF_MEMALLOC recursion for higher order allocs (Arjan van de Ven, me)
  - fix small SMP race, PG_lru                            (Hugh Dickins)
rmap 13a:
  - NUMA changes for page_address                         (Samuel Ortiz)
  - replace vm.freepages with simpler kswapd_minfree      (Christoph Hellwig)
rmap 13:
  - rename touch_page to mark_page_accessed and uninline  (Christoph Hellwig)
  - NUMA bugfix for __alloc_pages                         (William Irwin)
  - kill __find_page                                      (Christoph Hellwig)
  - make pte_chain_freelist per zone                      (William Irwin)
  - protect pte_chains by per-page lock bit               (William Irwin)
  - minor code cleanups                                   (me)
rmap 12i:
  - slab cleanup                                          (Christoph Hellwig)
  - remove references to compiler.h from mm/*             (me)
  - move rmap to marcelo's bk tree                        (me)
  - minor cleanups                                        (me)
rmap 12h:
  - hopefully fix OOM detection algorithm                 (me)
  - drop pte quicklist in anticipation of pte-highmem     (me)
  - replace andrea's highmem emulation by ingo's one      (me)
  - improve rss limit checking                            (Nick Piggin)
rmap 12g:
  - port to armv architecture                             (David Woodhouse)
  - NUMA fix to zone_table initialisation                 (Samuel Ortiz)
  - remove init_page_count                                (David Miller)
rmap 12f:
  - for_each_pgdat macro                                  (William Lee Irwin)
  - put back EXPORT(__find_get_page) for modular rd       (me)
  - make bdflush and kswapd actually start queued disk IO (me)
rmap 12e:
  - RSS limit fix, the limit can be 0 for some reason     (me)
  - clean up for_each_zone define to not need pgdata_t    (William Lee Irwin)
  - fix i810_dma bug introduced with page->wait removal   (William Lee Irwin)
rmap 12d:
  - fix compiler warning in rmap.c                        (Roger Larsson)
  - read latency improvement   (read-latency2)            (Andrew Morton)
rmap 12c:
  - fix small balancing bug in page_launder_zone          (Nick Piggin)
  - wakeup_kswapd / wakeup_memwaiters code fix            (Arjan van de Ven)
  - improve RSS limit enforcement                         (me)
rmap 12b:
  - highmem emulation (for debugging purposes)            (Andrea Arcangeli)
  - ulimit RSS enforcement when memory gets tight         (me)
  - sparc64 page->virtual quickfix                        (Greg Procunier)
rmap 12a:
  - fix the compile warning in buffer.c                   (me)
  - fix divide-by-zero on highmem initialisation  DOH!    (me)
  - remove the pgd quicklist (suspicious ...)             (DaveM, me)
rmap 12:
  - keep some extra free memory on large machines         (Arjan van de Ven, me)
  - higher-order allocation bugfix                        (Adrian Drzewiecki)
  - nr_free_buffer_pages() returns inactive + free mem    (me)
  - pages from unused objects directly to inactive_clean  (me)
  - use fast pte quicklists on non-pae machines           (Andrea Arcangeli)
  - remove sleep_on from wakeup_kswapd                    (Arjan van de Ven)
  - page waitqueue cleanup                                (Christoph Hellwig)
rmap 11c:
  - oom_kill race locking fix                             (Andres Salomon)
  - elevator improvement                                  (Andrew Morton)
  - dirty buffer writeout speedup (hopefully ;))          (me)
  - small documentation updates                           (me)
  - page_launder() never does synchronous IO, kswapd
    and the processes calling it sleep on higher level    (me)
  - deadlock fix in touch_page()                          (me)
rmap 11b:
  - added low latency reschedule points in vmscan.c       (me)
  - make i810_dma.c include mm_inline.h too               (William Lee Irwin)
  - wake up kswapd sleeper tasks on OOM kill so the
    killed task can continue on its way out               (me)
  - tune page allocation sleep point a little             (me)
rmap 11a:
  - don't let refill_inactive() progress count for OOM    (me)
  - after an OOM kill, wait 5 seconds for the next kill   (me)
  - agpgart_be fix for hashed waitqueues                  (William Lee Irwin)
rmap 11:
  - fix stupid logic inversion bug in wakeup_kswapd()     (Andrew Morton)
  - fix it again in the morning                           (me)
  - add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
    seems PPC calls pte_alloc() before mem_map[] init     (me)
  - disable the debugging code in rmap.c ... the code
    is working and people are running benchmarks          (me)
  - let the slab cache shrink functions return a value
    to help prevent early OOM killing                     (Ed Tomlinson)
  - also, don't call the OOM code if we have enough
    free pages                                            (me)
  - move the call to lru_cache_del into __free_pages_ok   (Ben LaHaise)
  - replace the per-page waitqueue with a hashed
    waitqueue, reduces size of struct page from 64
    bytes to 52 bytes (48 bytes on non-highmem machines)  (William Lee Irwin)
rmap 10:
  - fix the livelock for real (yeah right), turned out
    to be a stupid bug in page_launder_zone()             (me)
  - to make sure the VM subsystem doesn't monopolise
    the CPU, let kswapd and some apps sleep a bit under
    heavy stress situations                               (me)
  - let __GFP_HIGH allocations dig a little bit deeper
    into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
  - improve comments all over the place                   (Michael Cohen)
  - don't panic if page_remove_rmap() cannot find the
    rmap in question, it's possible that the memory was
    PG_reserved and belonging to a driver, but the driver
    exited and cleared the PG_reserved bit                (me)
  - fix the VM livelock by replacing > by >= in a few
    critical places in the pageout code                   (me)
  - treat the reclaiming of an inactive_clean page like
    allocating a new page, calling try_to_free_pages()
    and/or fixup_freespace() if required                  (me)
  - when low on memory, don't make things worse by
    doing swapin_readahead                                (me)
rmap 8:
  - add ANY_ZONE to the balancing functions to improve
    kswapd's balancing a bit                              (me)
  - regularize some of the maximum loop bounds in
    vmscan.c for cosmetic purposes                        (William Lee Irwin)
  - move page_address() to architecture-independent
    code, now the removal of page->virtual is portable    (William Lee Irwin)
  - speed up free_area_init_core() by doing a single
    pass over the pages and not using atomic ops          (William Lee Irwin)
  - documented the buddy allocator in page_alloc.c        (William Lee Irwin)
rmap 7:
  - clean up and document vmscan.c                        (me)
  - reduce size of page struct, part one                  (William Lee Irwin)
  - add rmap.h for other archs (untested, not for ARM)    (me)
rmap 6:
  - make the active and inactive_dirty list per zone,
    this is finally possible because we can free pages
    based on their physical address                       (William Lee Irwin)
  - cleaned up William's code a bit                       (me)
  - turn some defines into inlines and move those to
    mm_inline.h (the includes are a mess ...)             (me)
  - improve the VM balancing a bit                        (me)
  - add back inactive_target to /proc/meminfo             (me)
rmap 5:
  - fixed recursive buglet, introduced by directly
    editing the patch for making rmap 4 ;)))              (me)
rmap 4:
  - look at the referenced bits in page tables            (me)
rmap 3:
  - forgot one FASTCALL definition                        (me)
rmap 2:
  - teach try_to_unmap_one() about mremap()               (me)
  - don't assign swap space to pages with buffers         (me)
  - make the rmap.c functions FASTCALL / inline           (me)
rmap 1:
  - fix the swap leak in rmap 0                           (Dave McCracken)
rmap 0:
  - port of reverse mapping VM to 2.4.16                  (me)
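The rmap 11 entry that replaces the per-page waitqueue with a hashed waitqueue trades a little precision for a smaller struct page. Instead of every page embedding its own wait queue head, pages hash into a shared table, so several pages may share one bucket. A minimal sketch of the lookup follows. It assumes an arbitrary 256-entry table and a multiplicative (Fibonacci) hash of the page address. toy_page_waitqueue() and the stand-in structs are hypothetical and are not the patch's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 2^8 shared wait-queue heads for all pages. */
#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

struct wait_queue_head {      /* stand-in for a real wait queue head */
    unsigned int nr_waiters;
};

struct page {                 /* toy page: note there is no wait queue in it */
    unsigned long flags;
};

static struct wait_queue_head wait_table[WAIT_TABLE_SIZE];

/* Multiplicative hashing of the page address: keep the top
 * WAIT_TABLE_BITS bits of the 64-bit product. */
static unsigned int page_wait_hash(const struct page *page)
{
    uint64_t key = (uint64_t)(uintptr_t)page;
    return (unsigned int)((key * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - WAIT_TABLE_BITS));
}

/* Several pages can map to the same head, so a woken waiter has to
 * recheck its own page's state before proceeding. */
static struct wait_queue_head *toy_page_waitqueue(const struct page *page)
{
    return &wait_table[page_wait_hash(page)];
}
```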

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [PATCH] rmap 14
  2002-08-16  2:07 ` Rik van Riel
@ 2002-08-16  2:21   ` Bill Huey
From: Bill Huey @ 2002-08-16  2:21 UTC
  To: Rik van Riel; +Cc: linux-kernel, linux-mm, Bill Huey (Hui)

On Thu, Aug 15, 2002 at 11:07:49PM -0300, Rik van Riel wrote:
> This is a fairly minimal change for rmap14 since I've been
> working on 2.5 most of the time. The experimental code in
> this version is a hopefully smarter page_launder() that
> shouldn't do much more IO than needed and hopefully gets
> rid of the stalls that people have seen during heavy swap
> activity.  Please test this version. ;)
> 
> The first release of the 14th version of the reverse
> mapping based VM is now available.
> This is an attempt at making a more robust and flexible VM
> subsystem, while cleaning up a lot of code at the same time.
> The patch is available from:

Hey,

Again, the combination of what feels like smarter swap decisions and
better interactivity made my machine feel substantially smoother, but
that needs to be backed up by other people's experiences with it.

I wish there was a test for this kind of thing.

bill


* Re: [PATCH] rmap 14
  2002-08-16  2:21   ` Bill Huey
@ 2002-08-16 21:02     ` Mel
From: Mel @ 2002-08-16 21:02 UTC
  To: Bill Huey; +Cc: Rik van Riel, linux-kernel, linux-mm

On Thu, 15 Aug 2002, Bill Huey wrote:

> Again, the combination of a kind of felt increase in intelligence in
> swap decisions and increase in interactivity made my machine feel
> substantally smoother, but it needs to be backed up by other people's
> experiences with it.
>
> I wish there was a test for this kind of thing.
>

Blatant plug, but this is what I'm working on with VM Regress. In the
version I'm working on, I've started the first benchmark, but I've only
been at it for a day, so there is a bit to go yet. As a start, we'll be
able to benchmark page access times and swap decisions. Below are three
live tests run against 2.4.20pre2; the suite is also known to compile
with the latest 2.5 kernel and with 2.4.19-rmap14.

http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/start/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/updatedb/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/withx/mapanon.html

start    was run at system startup
updatedb was run after updatedb had been running for 2 minutes
withx    was run with the system running X, konqueror and a few eterms

The information is still a bit sparse, but some things can already be
seen. The test uses three kernel modules:

mapanon - mmaps, reads/writes and unmaps regions for a caller
pagemap - prints out which pages are swapped/present in all VMAs
zone    - prints out all zone information

A Perl script drives these from userspace to benchmark how quickly data
can be referenced for a given reference pattern. The benchmark isn't
finished yet, so for now it only does a single linear read through
memory, which is why every page has a reference count of 1.
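A single linear pass like this can be approximated from userspace without any kernel module. The minimal sketch below assumes anonymous memory obtained with mmap(2) and timing with clock_gettime(CLOCK_MONOTONIC). time_linear_pass() is a hypothetical helper, not part of VM Regress. It records nanoseconds so that the unit of every sample is explicit.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Map npages of anonymous memory, write one byte into each page in
 * address order (so each page is referenced exactly once), and store
 * how long each write took in ns_out[i], in nanoseconds.
 * Returns 0 on success, -1 on failure. */
static int time_linear_pass(size_t npages, uint64_t *ns_out)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || npages == 0)
        return -1;
    size_t page_size = (size_t)psz;
    size_t len = npages * page_size;
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;
    volatile unsigned char *region = map;  /* volatile: keep every write */
    for (size_t i = 0; i < npages; i++) {
        uint64_t start = now_ns();
        region[i * page_size] = (unsigned char)i;  /* first write faults the page in */
        ns_out[i] = now_ns() - start;
    }
    munmap(map, len);
    return 0;
}
```

Each sample includes the cost of the two clock reads, so it carries a fixed overhead much like the one the reports describe.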

The reports have three sections. The first gives details of the test.
The second is a graph showing how long it took to read/write a page, in
milliseconds. Note that access times have a floor of about 350 because
of module and test overhead; this could easily be "removed" to give a
more realistic view of real page access times. The third is a graph
showing page reference counts in green and present pages in red. The
reference line is flat because the test makes one scan through the
region.
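One simple way to "remove" that floor is to treat the fastest observed sample as pure overhead and subtract it from every sample. This is an assumption, since even the fastest access costs something. remove_overhead_floor() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Treat the smallest sample as fixed measurement overhead (module call
 * plus test harness) and subtract it from every sample in place. */
static void remove_overhead_floor(uint64_t *samples, size_t n)
{
    if (n == 0)
        return;
    uint64_t base = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < base)
            base = samples[i];
    for (size_t i = 0; i < n; i++)
        samples[i] -= base;
}
```

For example, samples of 400, 350, 3000 and 351 become 50, 0, 2650 and 1.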

start shows that page access times were pretty much constant, which is
not surprising.

updatedb was fine until near the end. At that stage, the buffers filled
by updatedb could not be freed and the kernel was having to look hard
for freeable pages. As you can see, times are quick for ages and then
suddenly rise to an average access time of about 3000 milliseconds, with
one access at 630482 milliseconds!!

withx shows spiky access times, which is consistent with large apps
starting up in the background.

Now, where this is going: I plan to write a module that will generate
page references according to a given pattern. Possible reference
patterns are:

o Linear
o Pure random
o Random with Gaussian distribution
o Smooth so the references look like a curve
o Trace data taken from a "real" application or database
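The first four patterns can be generated with a few lines of integer arithmetic. This minimal sketch makes several assumptions:

- a seeded xorshift64 generator, for reproducibility;
- the sum of twelve uniform values as an approximation to a Gaussian, which is a swapped-in technique (Box-Muller is another option);
- a parabola as the "smooth curve".

next_ref() and the enum names are hypothetical. Trace-driven references would instead replay recorded page numbers.

```c
#include <stddef.h>
#include <stdint.h>

enum ref_pattern { REF_LINEAR, REF_RANDOM, REF_GAUSSIAN, REF_SMOOTH };

/* xorshift64: small deterministic PRNG; *state must start nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Page index (0..npages-1) to reference at step i of nsteps.
 * Preconditions: npages > 0 and i < nsteps. */
static size_t next_ref(enum ref_pattern p, size_t i, size_t nsteps,
                       size_t npages, uint64_t *rng)
{
    switch (p) {
    case REF_LINEAR:                       /* 0, 1, 2, ... wrapping around */
        return i % npages;
    case REF_RANDOM:                       /* uniform over all pages */
        return (size_t)(xorshift64(rng) % npages);
    case REF_GAUSSIAN: {
        /* Sum of 12 uniforms in [0,1) is roughly N(6,1); work in 16-bit
         * fixed point, centre on the middle page, sigma = npages/8. */
        int64_t sum = 0;
        for (int k = 0; k < 12; k++)
            sum += (int64_t)(xorshift64(rng) & 0xFFFF);
        int64_t offset = ((sum - 6 * 65536) * (int64_t)npages) / (8 * 65536);
        int64_t idx = (int64_t)npages / 2 + offset;
        if (idx < 0)
            idx = 0;
        if (idx >= (int64_t)npages)
            idx = (int64_t)npages - 1;
        return (size_t)idx;
    }
    case REF_SMOOTH: {
        /* Parabola 4t(1-t): rises from page 0 to the last page, then back. */
        if (nsteps < 2)
            return 0;
        uint64_t a = i, b = nsteps - 1 - i, d = nsteps - 1;
        return (size_t)((uint64_t)(npages - 1) * 4 * a * b / (d * d));
    }
    }
    return 0;
}
```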

Trace data from a real application would be the most interesting, if I
can acquire it, because it would produce "real" information. If we could
simulate database accesses, for instance, it would show exactly how the
VM performed. But even as it is, some benchmarks can easily be adapted
to use VM Regress: if they read /proc/vmregress/pagemap, they will know
which pages are present in memory and which got swapped out. A Perl
library, VMR::Pagemap, is provided to decode the information.
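The output format of /proc/vmregress/pagemap is not described here. A benchmark that only needs to know which pages of its own mapping are resident can instead ask the kernel directly with the standard mincore(2) call. This is a swapped-in technique, not part of VM Regress, and count_resident_pages() is a hypothetical helper. It only covers the caller's own address space. It also cannot tell a swapped-out page from one that was never touched.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + len) are resident in memory.
 * addr must be page-aligned. Returns -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0)
        return -1;
    size_t page_size = (size_t)psz;
    size_t npages = (len + page_size - 1) / page_size;
    unsigned char *vec = malloc(npages ? npages : 1);  /* one byte per page */
    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;  /* bit 0 set => page is resident */
    free(vec);
    return resident;
}
```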

Once the test can be run on the stock kernel, it can be run against rmap,
aa, 2.5.x or whatever other kernels are out there, so that empirical data
can be produced. The first major benchmark will be something like Rik's
webserver benchmark.

Instead of the module memory mapping a region, it will memory map a set
of web pages and images. A set of bots will then act like users
browsing, and each bot will report how long it took to retrieve pages,
with the best, slowest and average access times. The kernel modules will
dump out which pages were present, kernel statistics and so on.
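Each bot's summary needs only a running count, minimum, maximum and total. This minimal sketch assumes that times are recorded in microseconds and puts the unit in the field names so that no number is ambiguous. struct fetch_stats and its functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-bot retrieval statistics; all times in microseconds. */
struct fetch_stats {
    size_t count;
    uint64_t best_us;
    uint64_t slowest_us;
    uint64_t total_us;
};

static void stats_init(struct fetch_stats *s)
{
    s->count = 0;
    s->best_us = UINT64_MAX;   /* any real sample will be smaller */
    s->slowest_us = 0;
    s->total_us = 0;
}

/* Record one page retrieval time. */
static void stats_add(struct fetch_stats *s, uint64_t us)
{
    s->count++;
    if (us < s->best_us)
        s->best_us = us;
    if (us > s->slowest_us)
        s->slowest_us = us;
    s->total_us += us;
}

/* Integer average; 0 when nothing has been recorded. */
static uint64_t stats_average_us(const struct fetch_stats *s)
{
    return s->count ? s->total_us / s->count : 0;
}
```

For example, after recording 120, 80 and 400 microseconds, the best time is 80, the slowest is 400 and the average is 200.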

The next version with this benchmark should be released at some stage
next week. I'm not working on this over the weekend, so I estimate I'll
have the benchmark ready by Wednesday, and I'll give rmap a run with it
to make sure it works correctly. Either way, "your wish" is on the way.

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
@ 2002-08-16 21:02     ` Mel
  0 siblings, 0 replies; 25+ messages in thread
From: Mel @ 2002-08-16 21:02 UTC (permalink / raw)
  To: Bill Huey; +Cc: Rik van Riel, linux-kernel, linux-mm

On Thu, 15 Aug 2002, Bill Huey wrote:

> Again, the combination of a kind of felt increase in intelligence in
> swap decisions and increase in interactivity made my machine feel
> substantally smoother, but it needs to be backed up by other people's
> experiences with it.
>
> I wish there was a test for this kind of thing.
>

Blatant plug but it's what I'm working on with VM Regress. At the version
I'm working on, I've started the first benchmark but I've only been
working on it a day so it's a bit to go yet. As a start, we'll be able to
benchmark page access times and swap decisions. These are three live tests
run against 2.4.20pre2 but the suite is known to compile with the latest
2.5 kernel and with 2.4.19-rmap14.

http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/start/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/updatedb/mapanon.html
http://www.csn.ul.ie/~mel/projects/vmregress/2.4.20pre2/withx/mapanon.html

start    was run at system startup
updatedb is output after updatedb was running 2 minutes
withx    is run with the system running X, konqueror and a few eterms

The information is still a bit sparse but still, things can be told. The
test works by using three kernel modules

mapanon - Will mmap, read/write, close mmaped regions for a caller
pagemap - Print out pages swapped/present in all VMA's
zone    - Print out all zone information

A perl script uses these from userspace to benchmark how quickly data can
be referenced for a given reference pattern. The benchmark isn't finished
yet so all that is done is a linear read through memory once, hence all
page references are 1.

The reports have three sections. The first is details of the test. The
second is a graph showing how long it took to read/write a page in
milliseconds. Note that they are fixed at a min access time of 350 because
of module overhead and test overhead. This could be "removed" easily
enough to give a more realistic view of real page access. The third is a
graph showing page reference counts in green and pages present in read.
The reference line is flat because it's one scan through the region.

start shows that page access times were pretty much constant, not
suprising

updatedb was fine until near the end. At that stage, buffers could not be
freed out that were filled by updatedb and it was having to look hard. So
you can see, times are quick for ages and then suddenly rise to an average
access time of about 3000 milliseconds with one access at 630482
milliseconds!!

withx shows spikey access times for pages which is consistent with large
apps starting up in the background

Now... where this is going. I plan to write a module that will generate
page references to a given pattern. Possible pattern references are

o Linear
o Pure random
o Random with gaussian distribution
o Smooth so the references look like a curve
o Trace data taken from a "real" application or database

The real application could be cool if I could acquire the data and would
produce "real" info. If we could simulate a database accessing for
instance, it could be shown exactly how the VM performed. But as it is,
some benchmarks can be easily adjusted to use VM Regress. If they just
read /proc/vmregress/pagemap, they will know what pages are present in
memory and what got swapped. A perl library VMR::Pagemap is provided to
decode the information.

Once the test can be run on the stock kernel, it can be run against rmap,
aa, 2.5.x or whatever other sort kernels are out there so emperical data
can be produced. The first major benchmark that will be produced will be
something like Rik's webserver benchmark.

Instead of the module memory mapping a region, it will memory map a set of
web pages and images. A set of bots will then act like users browsing. The
bot will say how long it took to retrive pages with a best, slowest and
average access time. The kernel modules will dump out what pages were
present, kernel statistics and so on.

The release date for the next version with this benchmark is next week at
some stage. I'm not working on this over the weekend so I estimate I'll
have this bench ready by Wednesday and I'll give rmap a run with it to
make sure it works correctly. Either way "your wish" is on the way

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: [PATCH] rmap 14
  2002-08-16 21:02     ` Mel
@ 2002-08-16 21:29       ` Scott Kaplan
  -1 siblings, 0 replies; 25+ messages in thread
From: Scott Kaplan @ 2002-08-16 21:29 UTC (permalink / raw)
  To: Mel; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mel,

I appreciate your efforts; the goal is a good one, but I'm concerned about 
some parts of the direction you seem to be taking.

On Friday, August 16, 2002, at 05:02 PM, Mel wrote:

> start    was run at system startup
> updatedb is output after updatedb was running 2 minutes
> withx    is run with the system running X, konqueror and a few eterms

I will acknowledge that you're at the beginning of a long process, and 
that you have much more that you plan to add, but I feel the need to point 
out that this is a *very* small test suite.  ``start'' is more of a 
curiosity than an interesting data point.  The other two are not 
unreasonable starting points.

> updatedb was fine until near the end. At that stage, buffers could not be
> freed out that were filled by updatedb and it was having to look hard. So
> you can see, times are quick for ages and then suddenly rise to an average
> access time of about 3000 milliseconds with one access at 630482
> milliseconds!!

You may want to check your code for sanity:  There are only 1,000 
milliseconds in a second, and I'm skeptical that there was a 630 second 
(that is, 10+ minute) reference.  Were there, perhaps, microseconds?  
There are 1,000,000 of those in a second, so 630,482 would still be half a 
second, which should be enough time for dozens of page faults (approaching
100 of them), so I'm wondering what could possibly cause this measurement.
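The estimate above can be made explicit. Assuming, purely for illustration, that a page fault needing disk I/O costs roughly 6 to 10 ms (a period-typical figure, not one measured in this thread):

```latex
630\,482\ \mu\mathrm{s} \approx 0.63\ \mathrm{s},
\qquad
\frac{0.63\ \mathrm{s}}{10\ \mathrm{ms}} \approx 63,
\qquad
\frac{0.63\ \mathrm{s}}{6\ \mathrm{ms}} \approx 105
```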

Or...was this process descheduled, and what you measured is the interval 
between when this process last ran and when the scheduler put it on the 
CPU again?

> withx shows spikey access times for pages which is consistent with large
> apps starting up in the background

It is?  Why?  Which is the ``large app'' here?  What does it mean to start 
up in the background, and why would that make the page access times 
inconsistent?

> Now... where this is going. I plan to write a module that will generate
> page references to a given pattern. Possible pattern references are
>
> o Linear
> o Pure random
> o Random with gaussian distribution
> o Smooth so the references look like a curve
> o Trace data taken from a "real" application or database

Noooooooooo!

I can't think of a reason to test the VM under any one of the first three 
distributions.  I've never, *ever* seen or heard of a linear or gaussian 
distribution of page references.  As for uniform random (which is what I 
assume you mean by ``pure random''), that's not worth testing.  If a 
workload presents a pure random reference pattern, any on-line policy is 
screwed.  No process can do this on a data set that doesn't fit in memory,
and if it does, there's no hope.
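The claim can be checked with a toy simulation. Under a uniform random pattern over N pages with M frames, every page is equally likely to be referenced next, so any demand-paging policy ends up with a hit ratio of about M/N. The sketch compares LRU and FIFO; the function name, its parameters and the PRNG are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simulate nframes page frames over npages pages under a uniform random
 * reference string. use_lru != 0 selects LRU, otherwise FIFO.
 * Returns the hit ratio. */
double simulate_uniform(size_t npages, size_t nframes, size_t nrefs,
			int use_lru, uint64_t seed)
{
	size_t *page = malloc(nframes * sizeof(*page));
	uint64_t *stamp = malloc(nframes * sizeof(*stamp));
	uint64_t x = seed ? seed : 1;
	size_t used = 0, hits = 0;

	assert(page && stamp);
	assert(nframes > 0 && nframes <= npages && nrefs > 0);
	for (size_t t = 0; t < nrefs; t++) {
		size_t p, f, victim = 0;

		x ^= x << 13;		/* xorshift64 step */
		x ^= x >> 7;
		x ^= x << 17;
		p = (size_t)(x % npages);

		for (f = 0; f < used && page[f] != p; f++)
			;
		if (f < used) {				/* hit */
			hits++;
			if (use_lru)
				stamp[f] = t;		/* LRU: refresh recency */
			continue;
		}
		if (used < nframes) {			/* free frame available */
			victim = used++;
		} else {				/* evict the oldest stamp */
			for (f = 1; f < nframes; f++)
				if (stamp[f] < stamp[victim])
					victim = f;
		}
		page[victim] = p;
		stamp[victim] = t;	/* load time (FIFO) or last use (LRU) */
	}
	free(page);
	free(stamp);
	return (double)hits / (double)nrefs;
}
```

With 100 pages and 20 frames, both policies land near 0.2, which is the point: no policy can exploit a pattern that has no structure.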

The fourth suggestion -- some negative exponential distribution -- is the 
kind of thing about which this group had a long discussion just a few 
weeks ago.  It's a mostly-bad idea.  In short:  If you have a negative 
exponential curve for your distribution, the best on-line policy is LRU, 
and nothing else will improve on it -- it's been proven in the literature 
on the LRUSM model of program behavior long ago.  It's when reference 
behavior *deviates* from that smooth curve that a policy may perform 
better than simple LRU.  Moreover, real workloads differ from that smooth 
curve over time, particularly during phase changes, which is where the 
*real* test of a VM policy occurs.  There's been plenty of work done on 
mathematical models for program behavior.  None of it has been sufficient 
for qualitative (that is, rank ordering) or quantitative evaluation of 
memory management policies.  Those models that come close are complex to 
work with, requiring the setting of a large number of parameters.
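A sketch of the LRU stack model (LRUSM) mentioned above. Each reference is chosen by a stack distance d drawn from a truncated geometric distribution, P(d = k) proportional to q^k (the discrete form of a negative exponential curve), and the referenced page moves to the top of the stack. Once M distinct pages have been touched, LRU with M frames hits exactly when d < M, so its hit ratio should approach 1 - q^M. The function names and parameters are hypothetical, and this illustrates the model only; it does not reproduce the optimality proof from the literature.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static uint64_t xs64(uint64_t *s)
{
	*s ^= *s << 13;
	*s ^= *s >> 7;
	*s ^= *s << 17;
	return *s;
}

/* Generate nrefs references over npages pages from an LRU stack model
 * whose stack distance follows a truncated geometric distribution. */
void lrusm_generate(size_t npages, double q, size_t *refs, size_t nrefs,
		    uint64_t seed)
{
	size_t *stack = malloc(npages * sizeof(*stack));
	uint64_t s = seed ? seed : 1;

	assert(stack && npages > 0 && q >= 0.0 && q < 1.0);
	for (size_t i = 0; i < npages; i++)
		stack[i] = i;
	for (size_t t = 0; t < nrefs; t++) {
		size_t d = 0, page;

		/* Each step deeper happens with probability q. */
		while (d + 1 < npages &&
		       (xs64(&s) >> 11) * (1.0 / 9007199254740992.0) < q)
			d++;
		page = stack[d];
		for (size_t i = d; i > 0; i--)	/* move to top of stack */
			stack[i] = stack[i - 1];
		stack[0] = page;
		refs[t] = page;
	}
	free(stack);
}

/* Plain LRU over nframes frames; returns the hit ratio. */
double lru_hit_ratio(const size_t *refs, size_t nrefs, size_t nframes)
{
	size_t *frame = malloc(nframes * sizeof(*frame));
	size_t *last = malloc(nframes * sizeof(*last));
	size_t used = 0, hits = 0;

	assert(frame && last && nframes > 0 && nrefs > 0);
	for (size_t t = 0; t < nrefs; t++) {
		size_t f, victim = 0;

		for (f = 0; f < used && frame[f] != refs[t]; f++)
			;
		if (f < used) {
			hits++;
			last[f] = t;		/* refresh recency */
			continue;
		}
		if (used < nframes)
			victim = used++;
		else
			for (f = 1; f < nframes; f++)	/* least recently used */
				if (last[f] < last[victim])
					victim = f;
		frame[victim] = refs[t];
		last[victim] = t;
	}
	free(frame);
	free(last);
	return (double)hits / (double)nrefs;
}
```

With q = 0.9 and 10 frames the expected LRU hit ratio is 1 - 0.9^10, about 0.65.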

The last suggestion -- real trace data -- is the best one.  I do wonder 
why you put ``real'' in quotes.  I also wouldn't want trace data taken 
from *one* application or database.  You need a whole suite to represent 
the kinds of reference behavior that a VM system will need to manage.

Again, I recognize that this is a work in progress.  I'd be happy to see 
it yield worthwhile results.  If you use oversimplified models, it won't.  
The results will not be reliable for evaluating performance or making 
comparisons of VM systems.

Scott
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org

iD8DBQE9XW6+8eFdWQtoOmgRAqjiAJ0ZlrQGOg3MFzXYyi+SdvKIa/bvOgCeOWak
7put0ihQbEY0wNXD+objEos=
=4IQt
-----END PGP SIGNATURE-----


* Re: [PATCH] rmap 14
  2002-08-16 21:29       ` Scott Kaplan
@ 2002-08-16 23:02         ` Mel
  -1 siblings, 0 replies; 25+ messages in thread
From: Mel @ 2002-08-16 23:02 UTC (permalink / raw)
  To: Scott Kaplan; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

On Fri, 16 Aug 2002, Scott Kaplan wrote:

> > start    was run at system startup
> > updatedb is output after updatedb was running 2 minutes
> > withx    is run with the system running X, konqueror and a few eterms
>
> I will acknowledge that you're at the beginning of a long process, and
> that you have much more that you plan to add, but I feel the need to point
> out that this is a *very* small test suite.

It will take a *long* time to develop the full test suite to cover
faulting, page alloc, slab, vmscan, buffer caches etc. I have two
choices: I can develop the entire thing and have one large release, or I
can release early with updates so people can keep an eye on it and make sure
they get what they want as well as what I'm looking for. I chose the
latter. This test is what I can produce *now*, and even this benchmark isn't
finished. I put up the three tests as an example of a beginning, to
try and show that a test suite is on the way that isn't a simple shell
script.

> You may want to check your code for sanity:  There are only 1,000
> milliseconds in a second, and I'm skeptical that there was a 630 second
> (that is, 10+ minute) reference.  Were there, perhaps, microseconds?

Nuts, yeah, microseconds. "milliseconds" is a typo. Userland counts in
microseconds; kernel code counts in jiffies. The code is sane, my emails
are not.

> There are 1,000,000 of those in a second, so 630,482 would still be half a
> second, which should be enough time for dozens of page faults (approach
> 100 of them), so I'm wondering what could possibly cause this measurement.
>

I'm not sure. I've noticed the odd twitch of long access time on rare
occasions, but I don't know what causes them yet, or whether they are real
or an artefact of my code. All the time measurement code is in VMR::Time, so
at least the timing code is in one place for anyone who wants to verify it.

> Or...was this process descheduled, and what you measured is the interval
> between when this process last ran and when the scheduler put it on the
> CPU again?
>

The measure is the time when the script asked the module to read a page.
The page is read by echoing to a mapanon_read proc entry. It's looking
like it takes about 350 microseconds to enter the module and perform the
read. I don't call schedule although it is possible I get scheduled. The only
way to be sure would be to collect all timing information within the module
which is perfectly possible. The only trouble is that if the module collects,
only one test instance can run at a time.
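A sketch of timing one write to a proc entry from userland, assuming the measured operation is a single write(2). The helper name is hypothetical, and the exact path and payload for the mapanon_read entry are not given here. Because CLOCK_MONOTONIC measures elapsed wall-clock time, any interval in which the process was descheduled lands in the result, which is exactly the ambiguity raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock time, in microseconds, for one write of msg to path.
 * Includes any time the process spent descheduled. Returns -1 on error. */
long long time_write_us(const char *path, const char *msg)
{
	struct timespec start, end;
	size_t len;
	ssize_t n;
	int fd;

	assert(path != NULL && msg != NULL);
	len = strlen(msg);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &start);
	n = write(fd, msg, len);
	clock_gettime(CLOCK_MONOTONIC, &end);
	close(fd);
	if (n < 0 || (size_t)n != len)
		return -1;
	return ((long long)(end.tv_sec - start.tv_sec) * 1000000000LL +
		(end.tv_nsec - start.tv_nsec)) / 1000;
}
```

Separating fault cost from scheduling delay would need a second clock, for example per-process CPU time, or in-kernel timing as described above.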

The way it is at the moment, I could run 100 instances of this test at the
same time and see how they interact. The module is (or should be) SMP safe,
and these tests were run on a dual-processor machine. I'm waiting for a quad
Xeon xSeries to arrive so I can start running tests there.

> > withx shows spikey access times for pages which is consistent with large
> > apps starting up in the background
>
> It is?  Why?  Which is the ``large app'' here?  What does it mean to start
> up in the background, and why would that make the page access times
> inconsistent?
>

I hadn't thought about this much, but I suspected that the apps and the
test would compete for memory at the same time. Both would have pages
swapped out, so there would be periods of quick accesses followed by a block
of long delays as more was swapped. I haven't thought this fully through yet,
and this is 30 seconds of reasoning, so don't shoot me if I'm wrong.

At the time the test was started, 4 instances of konqueror were starting to
run, and konqueror hogs physical pages quite a lot, so it stands to reason it
would collide with the test. It's not a large app as such, but my machine
isn't exactly a powerhouse either.

> Noooooooooo!
>
> I can't think of a reason to test the VM under any one of the first three
> distributions.  I've never, *ever* seen or heard of a linear or gaussian
> distribution of page references.

I'm familiar with this problem and, believe it or not, I've read a few papers
on the subject. The reason I would write it is that it will help determine
whether the page replacement algorithm is able to detect the working set. If
I refer to pages with a smooth distribution over an area about the size of
physical memory, the pages not being referenced should be swapped out.

It is more a test than a benchmark, but it is somewhere rmap should
shine. If I map memory the same size as physical memory, 2.4.20pre2 will
swap out the whole process because it can't reverse-lookup pages. I want to
see whether rmap selectively swaps out the correct pages. The timing isn't
important because, for the length of the test, a FIFO or random selection
isn't going to be appreciably noticeable. We need to see which pages were
present.

The second real reason to have this is that it is very easy to work out
in advance how the VM should perform for a given simple pattern. The test
should back up what the developer has in their head. It's much easier to
work initially with regular data than true trace information.
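As an example of working it out in advance: under idealised LRU with M frames, the pages resident at the end of a reference string are simply the M most recently used distinct pages. The hypothetical helper below computes that set so it can be compared against what was actually present. The real VM only approximates LRU, so this is a yardstick rather than an exact prediction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mark in resident[0..npages) the pages an idealised LRU with nframes
 * frames would hold after refs[0..nrefs); returns how many were marked. */
size_t expected_lru_resident(const size_t *refs, size_t nrefs,
			     size_t nframes, bool *resident, size_t npages)
{
	size_t count = 0;

	for (size_t i = 0; i < npages; i++)
		resident[i] = false;
	/* Walk backwards: the first nframes distinct pages seen are the
	 * most recently used ones. */
	for (size_t i = nrefs; i > 0 && count < nframes; i--) {
		size_t p = refs[i - 1];

		assert(p < npages);
		if (!resident[p]) {
			resident[p] = true;
			count++;
		}
	}
	return count;
}
```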

Lastly, this isn't a justification for bad reference data, but even
producing data with a known pattern is more reproducible than running kernel
compiles, big dd's, large mmaps etc. and timing the results.

> <Other page reference behaviour>

I see your point and mostly I agree. The problem is that generating the
correct type of data is difficult and a full project in itself, but
generating exact test data is not my immediate concern. The script is going
to receive its reference data from a VMR::Reference perl module, which is
responsible for generating page references. If someone feels that a better
reference pattern should be used, they can add it to the module and re-run
the tests. Alternatively, they can describe how to generate it (or cite a
paper) to me and I'll investigate it if I have the time.

> The last suggestion -- real trace data -- is the best one.  I do wonder
> why you put ``real'' in quotes.

Because all programs are real, even VM Regress, but trace data from it
would be pretty useless. When I said "real", I meant real as in applications
like compilers, database servers, web browsers etc.

> I also wouldn't want trace data taken
> from *one* application or database.  You need a whole suite to represent
> the kinds of reference behavior that a VM system will need to manage.
>

Trace data would be great, but I haven't been thinking about it long and
haven't come up with a reliable way of generating it yet. Given a bit of
thought, a kernel patch could be developed that would allow a process to be
attached to and its page faults read, but that is only part of the picture.
Trapping calls to mark_page_accessed() might help, but I need to think more,
and generating real trace data is a matter for a much later release. I'm
still working on the framework here.

> Again, I recognize that this is a work in progress.  I'd be happy to see
> it yield worthwhile results.  If you use oversimplified models, it won't.
> The results will not be reliable for evaluating performance or making
> comparisons of VM systems.
>

Things have to start with simplified models because they can be easily
understood at a glance. I think it's a bit unreasonable to expect a
full-featured suite in the first release. As I said, I have been working on
this particular benchmark for 1 day, *1* day, and the suite has had only
about 8 or 10 days of development time in total. I like to think I'm not a
bad programmer, but I'm not God :-)

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel


* Re: [PATCH] rmap 14
  2002-08-16 23:02         ` Mel
  (?)
@ 2002-08-19 19:05         ` Scott Kaplan
  2002-08-19 21:04           ` Mel
  -1 siblings, 1 reply; 25+ messages in thread
From: Scott Kaplan @ 2002-08-19 19:05 UTC (permalink / raw)
  To: Mel; +Cc: linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday, August 16, 2002, at 07:02 PM, Mel wrote:

> It will take a *long* time to develop the full test suite to cover,
> faulting, page alloc, slab, vmscan, buffer caches etc..

Agreed.  This is a big project, and I don't expect that the first cut will 
do it all.

>> Or...was this process descheduled, and what you measured is the interval
>> between when this process last ran and when the scheduler put it on the
>> CPU again?
>
> The measure is the time when the script asked the module to read a page.
> [...] I don't call schedule although it is possible I get scheduled.

That's exactly the concern that I had.  Large timing results like that are
more likely because your code was preempted for something else.  It would
probably be good to do *something* about these statistical outliers,
because they can affect averages substantially.  One suggestion is to come
up with a reasonable upper bound -- something like 5x the normal cost of a
page fault when I/O swapping is required -- and eliminate all timings that
are larger than the cutoff.  You miss some measurements, but you avoid
doing weird things to detect cases where the scheduling is interfering
with your timing.  You just need to be sure that it *is* the scheduling
that's causing such anomalies, and not something else.
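The cutoff suggested above can be sketched as a simple filter. The helper name and the example fault cost are assumptions; the factor of 5 comes from the suggestion itself. Dropped samples should still be inspected to confirm that scheduling, and not something else, explains them.

```c
#include <assert.h>
#include <stddef.h>

/* Keep only timings <= factor * fault_cost_us, compacting the kept
 * samples to the front of t[] in their original order.
 * Returns the number of samples kept. */
size_t drop_outliers(double *t, size_t n, double fault_cost_us, double factor)
{
	double cutoff = fault_cost_us * factor;
	size_t kept = 0;

	assert(fault_cost_us > 0.0 && factor > 0.0);
	for (size_t i = 0; i < n; i++)
		if (t[i] <= cutoff)
			t[kept++] = t[i];
	return kept;
}
```

For instance, with an assumed 10,000 us fault cost and a factor of 5, the 630,482 us sample from this thread would be dropped while ordinary 350 us reads survive.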

> At the time the test was started, 4 instances of konqueror were starting to
> run and it hogs physical pages quiet a lot so it stands to reason it would
> collide with the test.

I agree that they're likely to compete.  I don't think it's going to be 
easy, though, to reason a priori about what the result of that competition
will be; that is, it's not clear to me that it will cause bursts of paging 
activity as opposed to some other kind of paging behavior.

> Lastly, this isn't justification for bad refernce data but even producing
> data with a know pattern is more reproducable than running kernel
> compiles, big dd's, large mmaps etc and timing the results.

A good point:  This is a tool for testing that the desired concepts were 
implemented correctly.  I'll buy that.

> Things have to start with simplified models because they can be easily
> understood at a glance. I think it's a bit unreasonable to expect a full
> featured suites at first release.

I agree.  I was heavy-handed, probably unfairly so, but there was a
purpose to the points I tried to make:  *Since* this is a work in progress,
I wanted to provide feedback so that it would avoid some known, poor
directions.  It's good that you know of the limitations of modeling 
reference behavior, but lots of people have fallen into that trap and used 
poor models for evaluative purposes, believing the results to be more 
conclusive and comprehensive than they really were.  I figured that it 
would be better to sound the warning on that problem *before* you got 
deeply into the modeling issues for this project.

Scott

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org

iD8DBQE9YUF08eFdWQtoOmgRAuHAAJ474zwp3PA5UXmZCN5MWgsUzhajeACfepUF
asAVQ/KBoEz9bGFLQ0gpZ4E=
=PVV7
-----END PGP SIGNATURE-----


* Re: [PATCH] rmap 14
  2002-08-19 19:05         ` Scott Kaplan
@ 2002-08-19 21:04           ` Mel
  2002-08-20 18:41             ` Stephen C. Tweedie
  2002-08-21 15:05             ` Scott Kaplan
  0 siblings, 2 replies; 25+ messages in thread
From: Mel @ 2002-08-19 21:04 UTC (permalink / raw)
  To: Scott Kaplan; +Cc: linux-mm

On Mon, 19 Aug 2002, Scott Kaplan wrote:

> > The measure is the time when the script asked the module to read a page.
> > [...] I don't call schedule although it is possible I get scheduled.
>
> That's exactly the concern that I had.  Large timing result like that are
> more likely because your code was preempted for something else.  It would
> probably be good to do *something* about these statistical outliers,
> because they can affect averages substantially.

At the moment I'm not calculating averages and I haven't worked out the
best way to factor in large skews in page reads. For the moment, I'm
taking the easy option and depending on the tester to be able to ignore
the bogus data.

> where the scheduling is interfering with your timing.  You just need to
> be sure that it *is* the scheduling that's causing such anomalies, and
> not something else.
>

As Daniel posted elsewhere, the Linux Trace Toolkit is what would answer
such questions. I'm trying to get as far as possible without using LTT for
the moment but I'll keep what you said in mind as I progress.

> I agree that they're likely to compete.  I don't think it's going to be
> easy, though, to reason a-priori about what the result of that competition
> will be; that is, it's not clear to me that it will cause bursts of paging
> activity as opposed to some other kind of paging behavior.
>

You're right, I'm only guessing what is happening for the moment. There
isn't enough data available yet. I've started trapping the results of
vmstat and graphing them as well to help decide what is happening, but still,
I'm depending on the user to be able to analyse the data themselves for
the moment.

> > Things have to start with simplified models because they can be easily
> > understood at a glance. I think it's a bit unreasonable to expect a full
> > featured suites at first release.
>
> I agree.  I was heavy handed, probably unfairly so, but there was a
> purpose to the points I tried to make:  *Since* this is a work in progress,
> I wanted to provide feedback so that it would avoid some known, poor
> directions.

Understood.

> It's good that you know of the limitations of modeling
> reference behavior, but lots of people have fallen into that trap and used
> poor models for evaluative purposes, believing the results to be more
> conclusive and comprehensive than they really were.

I've read some of the papers that ran into this problem. I think I've come
up with a way it can be addressed, but so far they're only ideas in my head;
I haven't investigated them yet. It is possible LTT can provide the data
itself, in which case I'm off the mark anyway.

Without LTT, this is my preliminary guess at what needs to be done to get
real page faulting data. Note that I haven't researched this at all.

Anyway... add a kernel config option, CONFIG_FTRACE, for Fault Tracing. A
struct would exist that looks something like this:

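#include <linux/types.h>	/* pid_t */
#include <linux/spinlock.h>	/* spinlock_t */
#include <linux/mm.h>		/* pte_t */

struct pagetrap {
	spinlock_t lock;	/* protects pid and callback */
	pid_t pid;		/* process being traced */
	/* called with the faulting pte and the page-aligned address */
	unsigned long (*callback)(pte_t *, unsigned long addr);
};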

handle_pte_fault() and mark_page_accessed() are changed to check this
struct and use the callback if it is registered and the pid is the same as
current->pid. do_exit() is changed to make sure the pid being traced
removes itself correctly. Now... collection.
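To make the proposed hook concrete, here is a userspace simulation of it: a sketch under stated assumptions, not kernel code. A C11 `atomic_flag` stands in for `spinlock_t`, `pte_t` is a placeholder typedef, the current pid is passed in explicitly instead of read from `current->pid`, and `pagetrap_register()`, `pagetrap_hook()`, `pagetrap_exit()` and `count_cb()` are hypothetical names for the registration, the check added to `handle_pte_fault()`/`mark_page_accessed()`, the `do_exit()` cleanup, and an example callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>

/* Placeholder for the kernel's pte_t; illustration only. */
typedef struct { unsigned long val; } pte_t;

struct pagetrap {
	atomic_flag lock;	/* stands in for spinlock_t */
	pid_t pid;		/* pid being traced, 0 when unused */
	unsigned long (*callback)(pte_t *, unsigned long addr);
};

static struct pagetrap trap = { ATOMIC_FLAG_INIT, 0, NULL };

static void trap_lock(void)
{
	while (atomic_flag_test_and_set(&trap.lock))
		;	/* spin, like spin_lock() */
}

static void trap_unlock(void)
{
	atomic_flag_clear(&trap.lock);
}

/* Register a tracer for one pid; fails if a tracer is already present. */
static int pagetrap_register(pid_t pid,
			     unsigned long (*cb)(pte_t *, unsigned long))
{
	int ret = -1;

	trap_lock();
	if (!trap.callback) {
		trap.pid = pid;
		trap.callback = cb;
		ret = 0;
	}
	trap_unlock();
	return ret;
}

/* What handle_pte_fault()/mark_page_accessed() would call. */
static void pagetrap_hook(pid_t current_pid, pte_t *pte, unsigned long addr)
{
	trap_lock();
	if (trap.callback && trap.pid == current_pid)
		trap.callback(pte, addr);
	trap_unlock();
}

/* What do_exit() would call so a traced pid removes itself. */
static void pagetrap_exit(pid_t current_pid)
{
	trap_lock();
	if (trap.callback && trap.pid == current_pid) {
		trap.callback = NULL;
		trap.pid = 0;
	}
	trap_unlock();
}

/* Example callback: counts calls and remembers the last address. */
static unsigned long calls, last_addr;

static unsigned long count_cb(pte_t *pte, unsigned long addr)
{
	(void)pte;
	calls++;
	last_addr = addr;
	return 0;
}
```

Holding the lock across the callback keeps unregistration from racing with a call in progress; in the kernel that would also mean the callback must never sleep.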

A VM Regress module, faulttrace.o, is loaded and exports three proc entries:

trace_begin
trace_read
trace_stop

trace_begin will register a function for the kernel to call back with data
regarding the pid. The address will always be page-aligned, so the lower
bits can be used to show what the action was. This might be mmap/munmap
(from the VM Regress benchmark), a page fault or whatever. The collected
data is dumped to trace_read, which is read periodically.
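Because a page-aligned address always has zero low bits, the action code can be packed into those bits and recovered later with a mask. A minimal sketch, assuming 4 KiB pages; the `VMR_*` constants, action codes and helper names are hypothetical.

```c
#include <assert.h>

#define VMR_PAGE_SHIFT	12			/* assumption: 4 KiB pages */
#define VMR_PAGE_SIZE	(1UL << VMR_PAGE_SHIFT)
#define VMR_PAGE_MASK	(~(VMR_PAGE_SIZE - 1))

/* Hypothetical action codes; any value below VMR_PAGE_SIZE fits. */
enum vmr_action {
	VMR_ACT_FAULT	 = 1,
	VMR_ACT_ACCESSED = 2,
	VMR_ACT_MMAP	 = 3,
	VMR_ACT_MUNMAP	 = 4,
};

/* Pack an action into the low bits of a page-aligned address. */
static unsigned long vmr_encode(unsigned long addr, enum vmr_action act)
{
	return (addr & VMR_PAGE_MASK) | (unsigned long)act;
}

/* Recover the page address from a trace record. */
static unsigned long vmr_decode_addr(unsigned long rec)
{
	return rec & VMR_PAGE_MASK;
}

/* Recover the action code from a trace record. */
static enum vmr_action vmr_decode_action(unsigned long rec)
{
	return (enum vmr_action)(rec & ~VMR_PAGE_MASK);
}
```

Masking the address in `vmr_encode()` means a caller that passes an unaligned address still gets a valid record for the containing page rather than a corrupted action code.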

If this works out, I should be able to calculate the fault rate for a
pid at the very least, as well as get close to real reference behaviour. I
can't think of a way of trapping every real reference from a process
unless LTT would do it.

For trapping mmaps, munmaps and so on, an LD_PRELOAD trick can be used to
get programs to use VM Regress equivalents if available. This might be
utter crap, but if it is, I might come up with a better way of trapping
real data later. It's a problem for the distant future.
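A rough sketch of the LD_PRELOAD idea: wrappers for `mmap()` and `munmap()` that bump counters and then forward to the real libc functions located with `dlsym(RTLD_NEXT, ...)`. The counter and typedef names are hypothetical, and a real tool would log the arguments rather than just count calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical counters a tracing tool might report. */
static unsigned long vmr_mmap_count;
static unsigned long vmr_munmap_count;

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);
typedef int (*munmap_fn)(void *, size_t);

/* Same prototype as libc's mmap(); forwards to the next definition. */
void *mmap(void *addr, size_t length, int prot, int flags, int fd,
	   off_t offset)
{
	static mmap_fn real_mmap;

	if (!real_mmap)
		real_mmap = (mmap_fn)dlsym(RTLD_NEXT, "mmap");
	vmr_mmap_count++;	/* a real tracer would log the arguments */
	return real_mmap(addr, length, prot, flags, fd, offset);
}

/* Same prototype as libc's munmap(); forwards to the next definition. */
int munmap(void *addr, size_t length)
{
	static munmap_fn real_munmap;

	if (!real_munmap)
		real_munmap = (munmap_fn)dlsym(RTLD_NEXT, "munmap");
	vmr_munmap_count++;
	return real_munmap(addr, length);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o vmr_preload.so vmr_preload.c -ldl`, file names hypothetical) and loaded with `LD_PRELOAD`, it only sees calls that go through the dynamic linker: statically linked programs and calls libc makes internally are missed, which is one of the limits of this approach. The lazy `dlsym()` lookup is not thread-safe as written.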

> I figured that it
> would be better to sound the warning on that problem *before* you got
> deeply into the modeling issues for this project.
>

True, thanks for the reminder

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-19 21:04           ` Mel
@ 2002-08-20 18:41             ` Stephen C. Tweedie
  2002-08-21 18:03               ` Mel
  2002-08-21 15:05             ` Scott Kaplan
  1 sibling, 1 reply; 25+ messages in thread
From: Stephen C. Tweedie @ 2002-08-20 18:41 UTC (permalink / raw)
  To: Mel; +Cc: Scott Kaplan, linux-mm

Hi,

On Mon, Aug 19, 2002 at 10:04:19PM +0100, Mel wrote:

> > That's exactly the concern that I had.  Large timing results like that are
> > more likely because your code was preempted for something else.  It would
> > probably be good to do *something* about these statistical outliers,
> > because they can affect averages substantially.
> 
> At the moment I'm not calculating averages and I haven't worked out the
> best way to factor in large skews in page reads. For the moment, I'm
> taking the easy option and depending on the tester to be able to ignore
> the bogus data.

You can get that by a bit of stats: keeping track of the sum of each
value you observe plus their squares and cubes gives you the main
stats you probably want to collect:

	mean			(obvious)
	standard deviation 	(measures variation between samples)
	standard error		(shows how accurate your estimation of
				 the mean is)
and	skew/3rd-moment		(shows how one-sided the distribution
				 is)

Distributions with a long tail to one side will have a high skew.  The
main one is standard error, though --- without that you have no idea
how useful your results are.

--Stephen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-20 18:41             ` Stephen C. Tweedie
@ 2002-08-21 18:03               ` Mel
  0 siblings, 0 replies; 25+ messages in thread
From: Mel @ 2002-08-21 18:03 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: linux-mm

On Tue, 20 Aug 2002, Stephen C. Tweedie wrote:

> You can get that by a bit of stats: keeping track of the sum of each
> value you observe plus their squares and cubes gives you the main
> stats you probably want to collect:
>

Good point. My stats analysis is a bit rusty, bordering on non-existent,
so a refresher course is on the way. I aim to start providing stats
analysis tools within the next few versions. I'm going to focus on data
collection for a while longer before moving on to data analysis.

I've finished the benchmark for anonymous memory referencing for VM
Regress 0.6 (to be released when I have the docs updated). I've run a
series of tests of 2.4.19 vs 2.4.19-rmap14a.

http://www.csn.ul.ie/~mel/vmr/2.4.19/smooth_sin_50000/mapanon.html
http://www.csn.ul.ie/~mel/vmr/2.4.19-rmap14a/smooth_sin_50000/mapanon.html

are two of them. Each is a test of 5,000,000 page references to a memory
range 50000 pages long, almost twice the size of physical memory. It
tracks how long it took to reference a page, page presence versus page
usage frequency, a graph of vmstat output and the vmstat output
itself. It also shows the parameters of the test, the duration of the test,
the kernel version and the output of /proc/cpuinfo and /proc/meminfo.

The only other relevant graph I can think of is one of page age vs page
presence, which would be a lot more useful than page reference count. The
most valuable stats analysis I can think of would be on the page reference
timing data, to filter out badly skewed samples, but as I said, stats
analysis is a while away. I'm considering adding oprofile information if it
is available.

Is there anything obvious I am missing?

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-19 21:04           ` Mel
  2002-08-20 18:41             ` Stephen C. Tweedie
@ 2002-08-21 15:05             ` Scott Kaplan
  2002-08-21 16:28               ` Mel
  1 sibling, 1 reply; 25+ messages in thread
From: Scott Kaplan @ 2002-08-21 15:05 UTC (permalink / raw)
  To: Mel; +Cc: linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday, August 19, 2002, at 05:04 PM, Mel wrote:

> On Mon, 19 Aug 2002, Scott Kaplan wrote:
>> It's good that you know of the limitations of modeling
>> reference behavior, but lots of people have fallen into that trap and used
>> poor models for evaluative purposes, believing the results to be more
>> conclusive and comprehensive than they really were.
>
> I've read some of the papers that met the problem. I think I've come up
> with a way that it can be addressed but it's ideas in my head, I haven't
> investigated them yet.

What papers are those?  We all try to keep up with the literature, but if 
there's something that I've missed here, I'd love to know about it.

Scott

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org

iD8DBQE9Y6ws8eFdWQtoOmgRAvhEAJ94rwinVJW28DWFj/H1qhWhDYA6XgCgoxgF
K2Yl7hNeS9w+Bsz/bBbaT2c=
=AsLy
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-21 15:05             ` Scott Kaplan
@ 2002-08-21 16:28               ` Mel
  2002-08-21 18:39                 ` Scott Kaplan
  0 siblings, 1 reply; 25+ messages in thread
From: Mel @ 2002-08-21 16:28 UTC (permalink / raw)
  To: Scott Kaplan; +Cc: linux-mm

On Wed, 21 Aug 2002, Scott Kaplan wrote:

> What papers are those?  We all try to keep up with the literature, but if
> there's something that I've missed here, I'd love to know about it.
>

For a start, I'm not reviewing literature actively so I'm not up to date on
papers at all, nor am I willing to get involved in an I Know More Papers
Than You discussion here.

I also don't have any of the papers here and I don't have the time to look
at the moment because I'm writing the docs for vmregress 0.6. Three come
to mind. They are all old papers, but the principles are the same, or at
least should be, and they are semi-relevant to collecting trace data for
VM Regress.

One that comes to mind was, I think, called Adaptive Page Replacement Based
on Memory Reference Behaviour, or something very similar to that name. I
think it had something on collecting reference traces based on real
programs. It wasn't under Linux, though; it was Solaris, I believe, so it
is not of direct use to us here.

In Search of a Better Malloc I *think* had something on traces, but it
depended on synthetic traces that had some sort of uniform
distribution. It struck me as not being particularly reliable, but it was a
method that appeared to be used in a number of other papers.

I am almost certain that "Dynamic Storage Allocation, A Critical
Review" has a section on traces: how they tended to be synthetic or
generated by small custom programs, and why this was a bad thing. I believe
it talked briefly about how synthetic traces were used because they were
really easy to produce, not because they were useful. Still, it highlights
the problem of using predictable data.

There are others, but I have a bad memory for individual papers. By and
large, I found that papers on memory allocators tend to focus on trace
data and its usefulness more than page replacement papers did, though I
could be wrong here, so don't quote me on that. This is off-topic anyway,
so I'll shut up now.

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-21 16:28               ` Mel
@ 2002-08-21 18:39                 ` Scott Kaplan
  0 siblings, 0 replies; 25+ messages in thread
From: Scott Kaplan @ 2002-08-21 18:39 UTC (permalink / raw)
  To: Mel; +Cc: linux-mm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday, August 21, 2002, at 12:28 PM, Mel wrote:

> On Wed, 21 Aug 2002, Scott Kaplan wrote:
>
>> What papers are those?  We all try to keep up with the literature, but if
>> there's something that I've missed here, I'd love to know about it.
>
> For a start, I'm not reviewing literature actively so I'm not up to date on
> papers at all, nor am I willing to get involved in an I Know More Papers
> Than You discussion here.

I wasn't trying to get into a contest.  When someone mentions that they
know of papers to support an interesting position, I just want to know
what those papers are, as the resulting list may include some that I
haven't seen.

> One that comes to mind I think was called Adaptive Page Replacement Based
> on Memory Reference Behaviour or something very similar to that name. I
> think it had something on collecting trace reference that was based on
> real programs.

Are you thinking of the paper on the SEQ replacement policy by Glass and 
Cao?  If so, be skeptical of their trace gathering methods.  They gathered 
and reduced their traces in an ad-hoc manner that left some serious 
limitations to the use of those traces.  They also don't mention reference 
behavior modeling at all; they simply gathered some traces based on the 
spec95 benchmark, tested their new page replacement policy, and showed the 
results.

> In Search of a Better Malloc I *think* had something on traces but it
> depended on synthetic traces for information that had some sort of uniform
> distribution. It struck me as not being particularly reliable but it was a
> method that appeared to be used in a number of other papers.

> I am almost certain that "Dynamic Storage Allocation, A Critical
> Review" has a section on traces and how they tended to be synthetic or
> generated by small custom programs and why this was a bad thing. I believe
> it talked briefly about how synthetic traces were used because they were
> really easy to produce, not because they were useful. Still, it highlights
> the problem of using predictable data.

Also try Johnstone and Wilson, ``The Memory Fragmentation Problem: Solved?''
(International Symposium on Memory Management '98) -- it's a later
version of that same work.  I'm unfamiliar with the second paper you
mentioned, but you should also see Zorn and Grunwald, ``Evaluating Models 
of Memory Allocation'' (U. Colorado tech report from '92), in which they 
claim that many simple models are good, but none good enough to synthesize 
a full system workload.

For reference behavior, an old citation, but perhaps a useful one, is 
Richard Carr's ``Virtual Memory Management'' (Ph.D. thesis, 1984 or 
thereabouts), in which he addresses a number of approaches to modeling,
their shortcomings, and then presents a new one that is probably better, 
but goes unvalidated against real program behavior.

> By and large, I found that papers on memory allocators tend to
> focus on trace data and its usefulness more than page replacement papers
> did.  I could be wrong here though but don't quote me on that.

That's likely true.  Use of models for reference traces seemed to die out 
about 15 years ago, as gathering real reference traces became more 
feasible (larger storage, faster chips for compression, etc.)

Scott
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (Darwin)
Comment: For info see http://www.gnupg.org

iD8DBQE9Y95O8eFdWQtoOmgRAhOtAJ4s4tzA184T3UnxB37R+cxKsAcYNwCcCS0q
LVw0VCDFUy47oamDjWtaz54=
=1pLE
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-16 23:02         ` Mel
@ 2002-08-19 19:50           ` Daniel Phillips
  -1 siblings, 0 replies; 25+ messages in thread
From: Daniel Phillips @ 2002-08-19 19:50 UTC (permalink / raw)
  To: Mel, Scott Kaplan; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

On Saturday 17 August 2002 01:02, Mel wrote:
> On Fri, 16 Aug 2002, Scott Kaplan wrote:
> The measure is the time when the script asked the module to read a page.
> The page is read by echoing to a mapanon_read proc entry. It's looking
> like it takes about 350 microseconds to enter the module and perform the
> read. I don't call schedule although it is possible I get scheduled. The only
> way to be sure would be to collect all timing information within the module
> which is perfectly possible. The only trouble is that if the module collects,
> only one test instance can run at a time.

It sounds like you want to try the Linux Trace Toolkit:

   http://www.opersys.com/LTT/

-- 
Daniel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-19 19:50           ` Daniel Phillips
@ 2002-08-19 21:19             ` Mel
  -1 siblings, 0 replies; 25+ messages in thread
From: Mel @ 2002-08-19 21:19 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, linux-mm

On Mon, 19 Aug 2002, Daniel Phillips wrote:

> It sounds like you want to try the Linux Trace Toolkit:
>
>    http://www.opersys.com/LTT/
>

I have looked in its direction a couple of times. I suspect I'll
eventually end up using it to answer some questions, but I'm trying to
get as far as possible without using large kernel patches. At the moment
the extent of the patches involves exporting symbols to modules.

-- 
Mel Gorman
MSc Student, University of Limerick
http://www.csn.ul.ie/~mel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-19 21:19             ` Mel
@ 2002-08-19 21:38               ` Daniel Phillips
  -1 siblings, 0 replies; 25+ messages in thread
From: Daniel Phillips @ 2002-08-19 21:38 UTC (permalink / raw)
  To: Mel; +Cc: linux-kernel, linux-mm

On Monday 19 August 2002 23:19, Mel wrote:
> On Mon, 19 Aug 2002, Daniel Phillips wrote:
> 
> > It sounds like you want to try the Linux Trace Toolkit:
> >
> >    http://www.opersys.com/LTT/
> >
> 
> I have looked in its direction a couple of times. I suspect I'll
> eventually end up using it to answer some questions

That's exactly what I meant - when you uncover something interesting with
your test tool, you investigate it further with LTT.

> but I'm trying to
> get as far as possible without using large kernel patches. At the moment
> the extent of the patches involves exporting symbols to modules

I think you've chosen roughly the right level to approach this.

-- 
Daniel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] rmap 14
  2002-08-16 21:29       ` Scott Kaplan
@ 2002-08-19 18:04         ` Daniel Phillips
  -1 siblings, 0 replies; 25+ messages in thread
From: Daniel Phillips @ 2002-08-19 18:04 UTC (permalink / raw)
  To: Scott Kaplan, Mel; +Cc: Bill Huey, Rik van Riel, linux-kernel, linux-mm

On Friday 16 August 2002 23:29, Scott Kaplan wrote:
> > Now... where this is going. I plan to write a module that will generate
> > page references to a given pattern. Possible pattern references are
> >
> > o Linear
> > o Pure random
> > o Random with gaussian distribution
> > o Smooth so the references look like a curve
> > o Trace data taken from a "real" application or database
> 
> Noooooooooo!
> 
> I can't think of a reason to test the VM under any one of the first three
> distributions.  I've never, *ever* seen or heard of a linear or gaussian
> distribution of page references.  As for uniform random (which is what I
> assume you mean by ``pure random''), that's not worth testing.  If a
> workload presents a pure random reference pattern, any on-line policy is
> screwed.  No process can do this on a data set that doesn't fit in memory,
> and if it does, there's no hope.

I disagree that the linear (which I assume means walk linearly through 
process memory) and random patterns aren't worth testing.  The former should 
produce very understandable behaviour and that's always a good thing.  It's 
an idiot check.  Specifically, with the algorithms we're using, we expect the 
first-touched pages to be chosen for eviction.  It's worth verifying that 
this works as expected.
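The idiot check for the linear case can be reproduced with a toy model. The sketch below is a plain LRU simulator, not the 2.4 or rmap replacement code; `lru_linear_walk()` and the 100-page range with 40 frames are hypothetical. Walking a range larger than memory should evict the first-touched page first, and on every later pass each reference faults.

```c
#include <assert.h>
#include <string.h>

#define NPAGES	100	/* hypothetical size of the walked range, in pages */
#define NFRAMES	40	/* hypothetical number of physical page frames */

/*
 * Walk pages 0..NPAGES-1 in order, `passes` times, under true LRU.
 * Returns the number of faults and stores the first evicted page
 * in *first_victim (-1 if nothing was evicted).
 */
static long lru_linear_walk(int passes, int *first_victim)
{
	long last_use[NPAGES];	/* only meaningful while resident */
	int resident[NPAGES];
	int nresident = 0, pass, page, i;
	long now = 0, faults = 0;

	memset(resident, 0, sizeof(resident));
	*first_victim = -1;

	for (pass = 0; pass < passes; pass++) {
		for (page = 0; page < NPAGES; page++, now++) {
			if (!resident[page]) {
				faults++;
				if (nresident == NFRAMES) {
					int victim = -1;

					/* evict the least recently used page */
					for (i = 0; i < NPAGES; i++)
						if (resident[i] &&
						    (victim < 0 ||
						     last_use[i] < last_use[victim]))
							victim = i;
					resident[victim] = 0;
					nresident--;
					if (*first_victim < 0)
						*first_victim = victim;
				}
				resident[page] = 1;
				nresident++;
			}
			last_use[page] = now;
		}
	}
	return faults;
}
```

The "every reference faults" result is the well-known LRU weakness on a loop larger than memory, which is what makes the linear walk a useful baseline: the expected answer is known in advance, so any deviation points at a mechanism bug rather than a policy question.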

Random gives us a nice baseline against which to evaluate our performance on 
more typical, localized loads.  That is, we need to know we're doing better 
than random, and it's very nice to know by how much.

The gaussian distribution is also interesting because it gives a simplistic 
notion of virtual address locality.  We are supposed to be able to predict 
likelihood of future uses based on historical access patterns, the question 
is: do we?  Comparing the random distribution to gaussian, we ought to see 
somewhat fewer evictions on the gaussian distribution.  (I'll bet right now 
that we completely fail that test, because we just do not examine the 
referenced bits frequently enough to recover any signal from the noise.)
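A toy version of the random-versus-Gaussian comparison, assuming a CLOCK (second-chance) policy as a stand-in for referenced-bit scanning; it is not the kernel's algorithm. Page numbers come from a fixed-seed xorshift32 generator, and the Gaussian is approximated by summing 12 uniform variates (the Irwin-Hall approximation), centred on the middle of a hypothetical 1000-page range with a standard deviation of 100 pages, with 200 frames of memory. All names and sizes are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES	1000	/* hypothetical range size, in pages */
#define NFRAMES	200	/* hypothetical number of physical frames */

static uint32_t rng_state = 2463534242u;	/* fixed seed: repeatable runs */

static uint32_t xorshift32(void)
{
	uint32_t x = rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return rng_state = x;
}

/* Uniform double in [0, 1). */
static double uniform01(void)
{
	return xorshift32() / 4294967296.0;
}

static int uniform_page(void)
{
	return (int)(uniform01() * NPAGES);
}

/* Approximate N(NPAGES/2, NPAGES/10) with 12 uniforms, clamped to range. */
static int gaussian_page(void)
{
	double z = -6.0;
	int i, page;

	for (i = 0; i < 12; i++)
		z += uniform01();
	page = (int)(NPAGES / 2 + z * (NPAGES / 10));
	if (page < 0)
		page = 0;
	if (page >= NPAGES)
		page = NPAGES - 1;
	return page;
}

/* CLOCK: the hand clears referenced bits until it finds a clear one. */
static long clock_faults(int (*next_page)(void), long nrefs)
{
	int frame_page[NFRAMES], referenced[NFRAMES];
	int where[NPAGES];	/* frame holding each page, or -1 */
	int used = 0, hand = 0, f, page;
	long r, faults = 0;

	memset(where, -1, sizeof(where));
	for (r = 0; r < nrefs; r++) {
		page = next_page();
		if (where[page] >= 0) {		/* hit: set referenced bit */
			referenced[where[page]] = 1;
			continue;
		}
		faults++;
		if (used < NFRAMES) {
			f = used++;
		} else {
			while (referenced[hand]) {	/* second chance */
				referenced[hand] = 0;
				hand = (hand + 1) % NFRAMES;
			}
			f = hand;
			where[frame_page[f]] = -1;	/* evict */
			hand = (hand + 1) % NFRAMES;
		}
		frame_page[f] = page;
		referenced[f] = 1;
		where[page] = f;
	}
	return faults;
}
```

With uniform references, about 80% of accesses should fault once memory is full, since only 200 of the 1000 pages can be resident whatever the policy does. The Gaussian run should fault noticeably less; a real VM failing to show that kind of gap on a localized load is the failure mode described above.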

I'll leave the more complex patterns to you and Mel, but these simple 
patterns are particularly interesting to me.  Not as a target for 
optimization, but more to verify that basic mechanisms are working as 
expected.

-- 
Daniel

^ permalink raw reply	[flat|nested] 25+ messages in thread

Thread overview: 25+ messages
2002-08-16  2:07 [PATCH] rmap 14 Rik van Riel
2002-08-16  2:07 ` Rik van Riel
2002-08-16  2:21 ` Bill Huey
2002-08-16  2:21   ` Bill Huey
2002-08-16 21:02   ` Mel
2002-08-16 21:02     ` Mel
2002-08-16 21:29     ` Scott Kaplan
2002-08-16 21:29       ` Scott Kaplan
2002-08-16 23:02       ` Mel
2002-08-16 23:02         ` Mel
2002-08-19 19:05         ` Scott Kaplan
2002-08-19 21:04           ` Mel
2002-08-20 18:41             ` Stephen C. Tweedie
2002-08-21 18:03               ` Mel
2002-08-21 15:05             ` Scott Kaplan
2002-08-21 16:28               ` Mel
2002-08-21 18:39                 ` Scott Kaplan
2002-08-19 19:50         ` Daniel Phillips
2002-08-19 19:50           ` Daniel Phillips
2002-08-19 21:19           ` Mel
2002-08-19 21:19             ` Mel
2002-08-19 21:38             ` Daniel Phillips
2002-08-19 21:38               ` Daniel Phillips
2002-08-19 18:04       ` Daniel Phillips
2002-08-19 18:04         ` Daniel Phillips
