public inbox for linux-kernel@vger.kernel.org
* [PATCH *] rmap VM 11c
@ 2002-01-17 19:22 Rik van Riel
  2002-01-17 23:59 ` Bill Davidsen
  2002-01-18  0:33 ` Adam Kropelin
  0 siblings, 2 replies; 11+ messages in thread
From: Rik van Riel @ 2002-01-17 19:22 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel

For this release, IO tests are very much welcome ...


The third maintenance release of the 11th version of the reverse
mapping based VM is now available.
This is an attempt at making a more robust and flexible VM
subsystem, while cleaning up a lot of code at the same time.
The patch is available from:

           http://surriel.com/patches/2.4/2.4.17-rmap-11c
and        http://linuxvm.bkbits.net/


My big TODO items for a next release are:
  - fix page_launder() so it doesn't submit the whole
    inactive_dirty list for writeout in one go
    ... no longer needed due to fixed elevator ???
  - auto-tuning readahead, readahead per VMA

rmap 11c:
  - oom_kill race locking fix                             (Andres Salomon)
  - elevator improvement                                  (Andrew Morton)
  - dirty buffer writeout speedup (hopefully ;))          (me)
  - small documentation updates                           (me)
  - page_launder() never does synchronous IO, kswapd
    and the processes calling it sleep on higher level    (me)
  - deadlock fix in touch_page()                          (me)
rmap 11b:
  - added low latency reschedule points in vmscan.c       (me)
  - make i810_dma.c include mm_inline.h too               (William Lee Irwin)
  - wake up kswapd sleeper tasks on OOM kill so the
    killed task can continue on its way out               (me)
  - tune page allocation sleep point a little             (me)
rmap 11a:
  - don't let refill_inactive() progress count for OOM    (me)
  - after an OOM kill, wait 5 seconds for the next kill   (me)
  - agpgart_be fix for hashed waitqueues                  (William Lee Irwin)
rmap 11:
  - fix stupid logic inversion bug in wakeup_kswapd()     (Andrew Morton)
  - fix it again in the morning                           (me)
  - add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
    seems PPC calls pte_alloc() before mem_map[] init     (me)
  - disable the debugging code in rmap.c ... the code
    is working and people are running benchmarks          (me)
  - let the slab cache shrink functions return a value
    to help prevent early OOM killing                     (Ed Tomlinson)
  - also, don't call the OOM code if we have enough
    free pages                                            (me)
  - move the call to lru_cache_del into __free_pages_ok   (Ben LaHaise)
  - replace the per-page waitqueue with a hashed
    waitqueue, reduces size of struct page from 64
    bytes to 52 bytes (48 bytes on non-highmem machines)  (William Lee Irwin)
rmap 10:
  - fix the livelock for real (yeah right), turned out
    to be a stupid bug in page_launder_zone()             (me)
  - to make sure the VM subsystem doesn't monopolise
    the CPU, let kswapd and some apps sleep a bit under
    heavy stress situations                               (me)
  - let __GFP_HIGH allocations dig a little bit deeper
    into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
  - improve comments all over the place                   (Michael Cohen)
  - don't panic if page_remove_rmap() cannot find the
    rmap in question, it's possible that the memory was
    PG_reserved and belonging to a driver, but the driver
    exited and cleared the PG_reserved bit                (me)
  - fix the VM livelock by replacing > by >= in a few
    critical places in the pageout code                   (me)
  - treat the reclaiming of an inactive_clean page like
    allocating a new page, calling try_to_free_pages()
    and/or fixup_freespace() if required                  (me)
  - when low on memory, don't make things worse by
    doing swapin_readahead                                (me)
rmap 8:
  - add ANY_ZONE to the balancing functions to improve
    kswapd's balancing a bit                              (me)
  - regularize some of the maximum loop bounds in
    vmscan.c for cosmetic purposes                        (William Lee Irwin)
  - move page_address() to architecture-independent
    code, now the removal of page->virtual is portable    (William Lee Irwin)
  - speed up free_area_init_core() by doing a single
    pass over the pages and not using atomic ops          (William Lee Irwin)
  - documented the buddy allocator in page_alloc.c        (William Lee Irwin)
rmap 7:
  - clean up and document vmscan.c                        (me)
  - reduce size of page struct, part one                  (William Lee Irwin)
  - add rmap.h for other archs (untested, not for ARM)    (me)
rmap 6:
  - make the active and inactive_dirty list per zone,
    this is finally possible because we can free pages
    based on their physical address                       (William Lee Irwin)
  - cleaned up William's code a bit                       (me)
  - turn some defines into inlines and move those to
    mm_inline.h (the includes are a mess ...)             (me)
  - improve the VM balancing a bit                        (me)
  - add back inactive_target to /proc/meminfo             (me)
rmap 5:
  - fixed recursive buglet, introduced by directly
    editing the patch for making rmap 4 ;)))              (me)
rmap 4:
  - look at the referenced bits in page tables            (me)
rmap 3:
  - forgot one FASTCALL definition                        (me)
rmap 2:
  - teach try_to_unmap_one() about mremap()               (me)
  - don't assign swap space to pages with buffers         (me)
  - make the rmap.c functions FASTCALL / inline           (me)
rmap 1:
  - fix the swap leak in rmap 0                           (Dave McCracken)
rmap 0:
  - port of reverse mapping VM to 2.4.16                  (me)

Rik
-- 
"Linux holds advantages over the single-vendor commercial OS"
    -- Microsoft's "Competing with Linux" document

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH *] rmap VM 11c
  2002-01-17 19:22 [PATCH *] rmap VM 11c Rik van Riel
@ 2002-01-17 23:59 ` Bill Davidsen
  2002-01-18  0:05   ` Rik van Riel
  2002-01-18  0:33 ` Adam Kropelin
  1 sibling, 1 reply; 11+ messages in thread
From: Bill Davidsen @ 2002-01-17 23:59 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

On Thu, 17 Jan 2002, Rik van Riel wrote:

> For this release, IO tests are very much welcome ...
> 
> 
> The third maintenance release of the 11th version of the reverse
> mapping based VM is now available.
> This is an attempt at making a more robust and flexible VM
> subsystem, while cleaning up a lot of code at the same time.
> The patch is available from:
> 
>            http://surriel.com/patches/2.4/2.4.17-rmap-11c
> and        http://linuxvm.bkbits.net/

Rik, I tried a simple test, building a kernel in a 128M P-II-400, and when
the load average got up to 50 or so the system became slow;-) On the other
hand it was still usable for most normal things other than incoming mail
which properly blocks at LA>10 or so.

I'll be trying it on a large machine tomorrow, but it at least looks
stable. In real life no sane person would do that, would they? Make with a
nice -10 was essentially invisible.

Maybe tomorrow the latest -aa kernel on the same machine, with and
without my own personal patch.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: [PATCH *] rmap VM 11c
  2002-01-17 23:59 ` Bill Davidsen
@ 2002-01-18  0:05   ` Rik van Riel
  0 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2002-01-18  0:05 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-mm, linux-kernel

On Thu, 17 Jan 2002, Bill Davidsen wrote:

> >            http://surriel.com/patches/2.4/2.4.17-rmap-11c
> > and        http://linuxvm.bkbits.net/

> Rik, I tried a simple test, building a kernel in a 128M P-II-400, and
> when the load average got up to 50 or so the system became slow;-) On
> the other hand it was still usable for most normal things other than
> incoming mail which properly blocks at LA>10 or so.

Hehehe, when the load average is 50 only 2% of the CPU is available
for you. With that many gccs you're also under a memory squeeze with
128 MB of RAM, so it's no big wonder things got slow. ;)
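As a back-of-the-envelope check of that arithmetic (one CPU divided evenly among runnable tasks; a rough sketch, not a scheduler model):

```shell
# At a load average LA on a single CPU, each runnable task gets roughly
# 100/LA percent of the processor.
LA=50
echo "approx $((100 / LA))% of the CPU per task at load average $LA"
# prints: approx 2% of the CPU per task at load average 50
```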

I'm happy to hear the system was still usable, though.

> I'll be trying it on a large machine tomorrow, but it at least looks
> stable. In real life no sane person would do that, would they? Make
> with a nice -10 was essentially invisible.

Neat ...

> Maybe tomorrow the latest -aa kernel on the same machine, with and
> without my own personal patch.

Looking forward to the results.

regards,

Rik
-- 
"Linux holds advantages over the single-vendor commercial OS"
    -- Microsoft's "Competing with Linux" document

http://www.surriel.com/		http://distro.conectiva.com/



* Re: [PATCH *] rmap VM 11c
  2002-01-17 19:22 [PATCH *] rmap VM 11c Rik van Riel
  2002-01-17 23:59 ` Bill Davidsen
@ 2002-01-18  0:33 ` Adam Kropelin
  2002-01-18  0:56   ` Rik van Riel
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Adam Kropelin @ 2002-01-18  0:33 UTC (permalink / raw)
  To: Rik van Riel, linux-mm; +Cc: linux-kernel

Rik van Riel <riel@conectiva.com.br>:
> For this release, IO tests are very much welcome ...

Results from a run of my large FTP transfer test on this new release are...
interesting.

Overall time shows an improvement (6:28), though not enough of one to take the
lead over 2.4.13-ac7.

More interesting, perhaps, is the vmstat output, which shows this at first:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0  47816   2992  84236   0   0    10     0 4462   174   1  33  66
 1  0  0      0  41704   3004  89320   0   0    10     0 4322   167   0  33  67
 0  1  0      0  36004   3012  94064   0   0     9   877 4030   163   1  30  69
 0  1  1      0  33536   3016  96112   0   0     4  1616 1724    62   0  18  82
 0  1  2      0  31068   3020  98160   0   0     4  2048 1729    52   1  15  83
 0  1  1      0  28608   3024 100208   0   0     4  2064 1735    56   1  16  82
 0  1  1      0  26144   3028 102256   0   0     4  2048 1735    50   0  16  84
 0  1  1      0  23684   3032 104304   0   0     5  2048 1713    45   1  15  84
 0  1  1      0  21216   3036 106352   0   0     3  2064 1723    52   1  14  85
 1  0  2      0  18728   3040 108420   0   0     5  2048 1750    59   0  17  82
 0  1  1      0  16292   3044 110448   0   0     3  2064 1722    60   0  15  84
 1  0  1      0  13824   3048 112572   0   0     5  2032 1800    61   0  17  83
 1  0  1      0  11696   3052 114548   0   0     4  2528 1658    47   0  14  86
 1  0  1      0   9232   3056 116596   0   0     4  2048 1735    51   1  13  86
 0  1  2      0   6808   3060 118640   0   0     3  1584 1729    84   0  16  84

(i.e., nice steady writeout reminiscent of -ac)

...but after about 20 seconds, behavior degrades again:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  1  1      0   1500   3124 123268   0   0     0  3788  534    20   0   8  92
 0  1  1      0   1500   3124 123268   0   0     0     0  107    12   0   0 100
 0  1  1      0   1500   3124 123268   0   0     0     0  123    10   0   0 100
 0  1  1      0   1500   3124 123268   0   0     0  3666  123    12   0   2  97
 0  1  1      0   1500   3124 123268   0   0     1   259  109    12   0   8  92
 1  0  0      0   1404   3124 123360   0   0     2     0 1078    28   0   7  92
 1  0  0      0   1404   3136 123444   0   0    11     0 4560   178   0  39  61
 1  0  0      0   1404   3148 123448   0   0    10     0 4620   175   1  34  64
 0  0  0      0   1312   3156 123568   0   0    11     0 4276   181   0  36  64
 0  0  0      0   1404   3168 123492   0   0    10     0 4330   185   1  30  68
 0  1  1      0   1404   3172 123488   0   0     4  6864 1742    69   0  17  83
 0  1  1      0   1408   3172 123488   0   0     0     0  111    12   0   0  99
 0  1  1      0   1408   3172 123488   0   0     0     0  126     8   0   0 100
 0  1  1      0   1404   3172 123480   0   0     0  7456  518    18   0  10  90
 0  1  1      0   1404   3172 123480   0   0     0     0  112    10   0   0 100
 0  1  1      0   1404   3172 123480   0   0     0     0  123     9   0   0 100
 0  1  1      0   1404   3172 123476   0   0     1  7222  120    16   0   5  95
 0  1  1      0   1404   3172 123476   0   0     0     0  106     8   0   0 100
 0  1  1      0   1524   3172 123352   0   0     0  3790  519    18   0   8  92
 0  1  1      0   1524   3172 123352   0   0     0     0  113     8   0   0 100
 0  1  1      0   1524   3172 123352   0   0     0     0  125     8   0   0 100

Previous tests showed fluctuating bo values from the start; this is the first
time I've seen them steady, so something in the patch definitely is showing
through here.
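One way to quantify "steady vs. fluctuating" is to pull the bo column out of the `vmstat 1` log and summarize it; a small sketch (assumes the 16-column vmstat layout shown above, where bo is field 11; the two sample rows are copied from the first table):

```shell
# Summarize the 'bo' (blocks written out per interval) column of vmstat
# output: a small spread around the mean indicates steady writeout.
awk 'NF == 16 && $1 ~ /^[0-9]+$/ {            # data rows only, skip headers
        sum += $11; n++
        if (min == "" || $11 + 0 < min) min = $11 + 0
        if ($11 + 0 > max) max = $11 + 0
     }
     END { printf "samples=%d mean=%.0f spread=%d\n", n, sum / n, max - min }' <<'EOF'
 0  1  1      0  28608   3024 100208   0   0     4  2064 1735    56   1  16  82
 0  1  1      0  26144   3028 102256   0   0     4  2048 1735    50   0  16  84
EOF
# prints: samples=2 mean=2056 spread=16
```

Pointing the same awk program at a saved log (`awk '...' vmstat.log`) gives the mean and spread for a whole run.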

I've a couple more tests to run, such as combining -rmap11c with cpqarray and
eepro driver updates from -ac. I'll keep you posted.

--Adam




* Re: [PATCH *] rmap VM 11c
  2002-01-18  0:33 ` Adam Kropelin
@ 2002-01-18  0:56   ` Rik van Riel
  2002-01-18 10:06   ` Roy Sigurd Karlsbakk
       [not found]   ` <20020118182837.D31076@asooo.flowerfire.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2002-01-18  0:56 UTC (permalink / raw)
  To: Adam Kropelin; +Cc: linux-mm, linux-kernel

On Thu, 17 Jan 2002, Adam Kropelin wrote:
> Rik van Riel <riel@conectiva.com.br>:
> > For this release, IO tests are very much welcome ...
>
> Results from a run of my large FTP transfer test on this new release
> are... interesting.
>
> Overall time shows an improvement (6:28), though not enough of one to
> take the lead over 2.4.13-ac7.

> (i.e., nice steady writeout reminiscent of -ac)
> ...but after about 20 seconds, behavior degrades again:
>
> Previous tests showed fluctuating bo values from the start; this is the first
> time I've seen them steady, so something in the patch definitely is showing
> through here.

Thank you for running this test.  I'll try to debug the situation
and see what's going on ... this definitely isn't behaving like
it should.

kind regards,

Rik
-- 
"Linux holds advantages over the single-vendor commercial OS"
    -- Microsoft's "Competing with Linux" document

http://www.surriel.com/		http://distro.conectiva.com/




* Re: [PATCH *] rmap VM 11c
  2002-01-18  0:33 ` Adam Kropelin
  2002-01-18  0:56   ` Rik van Riel
@ 2002-01-18 10:06   ` Roy Sigurd Karlsbakk
       [not found]   ` <20020118182837.D31076@asooo.flowerfire.com>
  2 siblings, 0 replies; 11+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-01-18 10:06 UTC (permalink / raw)
  To: Adam Kropelin; +Cc: Rik van Riel, linux-mm, linux-kernel

This looks a little like my problem...

See http://karlsbakk.net/dev/kernel/vm-fsckup.txt

On Thu, 17 Jan 2002, Adam Kropelin wrote:

> Rik van Riel <riel@conectiva.com.br>:
> > For this release, IO tests are very much welcome ...
>
> Results from a run of my large FTP transfer test on this new release are...
> interesting.
>
> Overall time shows an improvement (6:28), though not enough of one to take the
> lead over 2.4.13-ac7.
>
> More interesting, perhaps, is the vmstat output, which shows this at first:
>
> <snip>
>
> (i.e., nice steady writeout reminiscent of -ac)
>
> ...but after about 20 seconds, behavior degrades again:
>
> <snip>
>
> Previous tests showed fluctuating bo values from the start; this is the first
> time I've seen them steady, so something in the patch definitely is showing
> through here.
>
> I've a couple more tests to run, such as combining -rmap11c with cpqarray and
> eepro driver updates from -ac. I'll keep you posted.
>
> --Adam
>
>

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.



* Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)
       [not found]   ` <20020118182837.D31076@asooo.flowerfire.com>
@ 2002-01-19  5:08     ` Adam Kropelin
  2002-01-19 17:50       ` Andrea Arcangeli
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Kropelin @ 2002-01-19  5:08 UTC (permalink / raw)
  To: Ken Brownfield
  Cc: Rik van Riel, Dieter Nützel, linux-kernel, Andrea Arcangeli

Ken Brownfield:

> Do you get more even throughput with this:
>
> /bin/echo "10 0 0 0 500 3000 10 0 0" > /proc/sys/vm/bdflush
>
> It seems to help significantly for me under heavy sustained I/O load.

With a little modification, Ken's suggestion makes -rmap11c a winner on my test
case.

/bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush

Switching to synchronous bdflush a little later than Ken did brings performance
up to ~2000 blocks/sec, which is similar to older -ac kernels. This writeout
rate is very consistent (even more so than -ac) and seems to be the top end in
all large writes to the RAID (tried FTP, samba, and local balls-to-the-wall "cat
/dev/zero >..."), which helps show that this is not a network driver or protocol
interaction.

The same bdflush tuning (leaving aa's additional parameters at their defaults)
on 2.4.18pre2aa2 yields some improvement, but rmap is consistently faster by a
good margin. 2.4.17 performs worse with this tuning and is pretty much eating
dust at this point.

Latest Results:
2.4.17-rmap11c: 5:41 (down from 6:58)
2.4.18-pre2aa2: 6:31 (down from 7:10)
2.4.17: 7:06 (up from 6:57)

Congrats, Rik and thanks, Ken!

--Adam




* Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)
  2002-01-19  5:08     ` [PATCH *] rmap VM 11c (RMAP IS A WINNER!) Adam Kropelin
@ 2002-01-19 17:50       ` Andrea Arcangeli
  2002-01-19 18:39         ` Adam Kropelin
  0 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2002-01-19 17:50 UTC (permalink / raw)
  To: Adam Kropelin
  Cc: Ken Brownfield, Rik van Riel, Dieter Nützel, linux-kernel

On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> Ken Brownfield:
> 
> > Do you get more even throughput with this:
> >
> > /bin/echo "10 0 0 0 500 3000 10 0 0" > /proc/sys/vm/bdflush
> >
> > It seems to help significantly for me under heavy sustained I/O load.
> 
> With a little modification, Ken's suggestion makes -rmap11c a winner on my test
> case.
> 
> /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
				  ^
> 
> Switching to synchronous bdflush a little later than Ken did brings performance
> up to ~2000 blocks/sec, which is similar to older -ac kernels. This writeout
> rate is very consistent (even more so than -ac) and seems to be the top end in
> all large writes to the RAID (tried FTP, samba, and local balls-to-the-wall "cat
> /dev/zero >..."), which helps show that this is not a network driver or protocol
> interaction.
> 
> The same bdflush tuning (leaving aa's additional parameters at their defaults)

you cannot set the underlined one to zero (way too low, insane) or leave
it at its default (20) in -aa; either way it is a misconfigured setup
that can lead to anything. The rule is:

	nfract_stop_bdflush <= nfract <= nfract_sync

you set:

	nfract = 10
	nfract_sync = 30

so nfract_stop_bdflush cannot be 20.

Furthermore, you set ndirty to 0, which is also an invalid setup.

With -aa something sane along the above lines is:

	/bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush

this sets ndirty to 2000 (so you will write around 2 MB at every go
with a 1k fs, like I bet you have, not 500k as the default), plus nfract
= 10%, nfract_sync = 30% and nfract_stop_bdflush = 5%
(nfract_stop_bdflush is available only in -aa). Of course ndirty should
really be a function of bytes, not of blocksize, but oh well...

Now it would be interesting to know how it performs this way with -aa.
The fact that you set nfract_stop_bdflush to either 0 or 20 in -aa could
very well explain the regression in async-flushing performance in your
previous test on top of -aa.
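That invariant is easy to check mechanically before poking /proc; a small sh sketch (field order per the 2.4 -aa bdflush layout, values from the example tuple above; the actual write is left commented out):

```shell
#!/bin/sh
# Sanity-check a candidate bdflush tuple against the rule
# nfract_stop_bdflush <= nfract <= nfract_sync before writing it.
# -aa field order: nfract ndirty nrefill nref_dirt interval age_buffer
#                  nfract_sync nfract_stop_bdflush dummy
SETTINGS="10 2000 0 0 500 3000 30 5 0"
set -- $SETTINGS
nfract=$1; nfract_sync=$7; nfract_stop=$8
if [ "$nfract_stop" -le "$nfract" ] && [ "$nfract" -le "$nfract_sync" ]; then
    echo "ok: $nfract_stop <= $nfract <= $nfract_sync"
    # echo "$SETTINGS" > /proc/sys/vm/bdflush   # uncomment to apply
else
    echo "refusing: need nfract_stop <= nfract <= nfract_sync" >&2
    exit 1
fi
```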

Andrea


* Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)
  2002-01-19 17:50       ` Andrea Arcangeli
@ 2002-01-19 18:39         ` Adam Kropelin
  2002-01-19 20:21           ` Andrea Arcangeli
  0 siblings, 1 reply; 11+ messages in thread
From: Adam Kropelin @ 2002-01-19 18:39 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ken Brownfield, Rik van Riel, Dieter Nützel, linux-kernel

Andrea Arcangeli:
> On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> > /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
>   ^
>
> you cannot set the underlined one to zero (way too low, insane) or leave
> it at its default (20) in -aa; either way it is a misconfigured setup
> that can lead to anything. The rule is:
>
> nfract_stop_bdflush <= nfract <= nfract_sync

<snip>

> so nfract_stop_bdflush cannot be 20.

Ok, thanks for straightening me out on that. I figured there might be some
consequence of the additional knobs in -aa that I didn't know about.

> Furthermore, you set ndirty to 0, which is also an invalid setup.

I didn't. That was one of the "additional parameters" that I left at the default
on -aa (500, it seems). Sorry, I should have been clearer about exactly what
settings I used on -aa; the quoted settings were for -rmap only. For reference,
the exact command I tried on -aa was:

/bin/echo "10 500 0 0 500 3000 30 20 0" > /proc/sys/vm/bdflush

> With -aa something sane along the above lines is:
>
> /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush

Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new record for
worst performance: 7:19.

An additional datapoint: The quoted bdflush settings which make 2.4.17-rmap11c a
winner do not do well at all on 2.4.17-rmap11a. Rik's initial reaction to the
issue was that there was a bug and I know he made some changes in rmap11c to
address it. The fact that 11c definitely performs better for me than 11a seems
to support this. Perhaps this bug or a variant thereof also exists in aa?

--Adam




* Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)
  2002-01-19 18:39         ` Adam Kropelin
@ 2002-01-19 20:21           ` Andrea Arcangeli
  2002-01-19 22:15             ` Adam Kropelin
  0 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2002-01-19 20:21 UTC (permalink / raw)
  To: Adam Kropelin
  Cc: Ken Brownfield, Rik van Riel, Dieter Nützel, linux-kernel

On Sat, Jan 19, 2002 at 01:39:22PM -0500, Adam Kropelin wrote:
> Andrea Arcangeli:
> > On Sat, Jan 19, 2002 at 12:08:30AM -0500, Adam Kropelin wrote:
> > > /bin/echo "10 0 0 0 500 3000 30 0 0" > /proc/sys/vm/bdflush
> >   ^
> >
> > you cannot set the underlined one to zero (way too low, insane) or leave
> > it at its default (20) in -aa; either way it is a misconfigured setup
> > that can lead to anything. The rule is:
> >
> > nfract_stop_bdflush <= nfract <= nfract_sync
> 
> <snip>
> 
> > so nfract_stop_bdflush cannot be 20.
> 
> Ok, thanks for straightening me out on that. I figured there might be some
> consequence of  the additional knobs in -aa which I didn't know about.
> 
> > Furthermore, you set ndirty to 0, which is also an invalid setup.
> 
> I didn't. That was one of the "additional parameters" that I left at the default
> on -aa (500, it seems). Sorry, I should have been clearer about exactly what
> settings I used on -aa; the quoted settings were for -rmap only. For reference,
> the exact command I tried on -aa was:
> 
> /bin/echo "10 500 0 0 500 3000 30 20 0" > /proc/sys/vm/bdflush
> 
> > With -aa something sane along the above lines is:
> >
> > /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
> 
> Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new record for
> worst performance: 7:19.

then please try decreasing the ndirty variable again; the above set it
to 2000, and if you have a slow hard disk maybe that's too much, so you
can try setting it to 500 again.

I'd also give a try with the below settings:

	/bin/echo "10 500 0 0 500 3000 80 8 0" > /proc/sys/vm/bdflush

(the 500 may vary, you may try 200 or 1000 instead, etc., but a large
nfract_sync should allow your program to keep getting data)

> An additional datapoint: The quoted bdflush settings which make 2.4.17-rmap11c a
> winner do not do well at all on 2.4.17-rmap11a. Rik's initial reaction to the
> issue was that there was a bug and I know he made some changes in rmap11c to
> address it. The fact that 11c definitely performs better for me than 11a seems
> to support this. Perhaps this bug or a variant thereof also exists in aa?

AFAIK there are no known bugs at the moment (things that can be called
bugs, I mean). I cannot consider this particular speed variation a bug.
And this buffer flushing thing only matters in a few functions in
buffer.c; I don't see how the rmap design can make any difference to
this benchmark (if there's some real change that makes a difference,
it's completely orthogonal to rmap).

On my hardware I benchmarked writes as fast as reads, and personally
I'm fine with the current behaviour of the async flushing for 2.4, so I
don't care much about this (the async flushing points are a heuristic,
so it's hard to make every single case faster than before; that's why
it's tunable, and what you are hitting could be a timing artifact). I
mainly care about finding exactly what makes the difference for you,
just to be sure it's nothing serious, as expected (just different async
flushing wakeup points).

Also, just in case, I'd suggest repeating each benchmark three times,
so we know we are not bitten by random variations in the numbers.

Andrea


* Re: [PATCH *] rmap VM 11c (RMAP IS A WINNER!)
  2002-01-19 20:21           ` Andrea Arcangeli
@ 2002-01-19 22:15             ` Adam Kropelin
  0 siblings, 0 replies; 11+ messages in thread
From: Adam Kropelin @ 2002-01-19 22:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ken Brownfield, Rik van Riel, Dieter Nützel, linux-kernel

(Andrea, the previous version of this mail wasn't supposed to go out yet. I
fat-fingered it and sent it before I was done. This is the full version.)

Andrea Arcangeli:
> On Sat, Jan 19, 2002 at 01:39:22PM -0500, Adam Kropelin wrote:
> > Andrea Arcangeli:
> > > With -aa something sane along the above lines is:
> > >
> > > /bin/echo "10 2000 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
> >
> > Unfortunately, those adjustments on top of 2.4.18-pre2aa2 set a new
> > record for worst performance: 7:19.
>
> then please try to decrease the nfract variable again, the above set it
> to 2000, if you've a slow harddisk maybe that's too much, so you can try
> to set it to 500 again.

Yes, the harddisk is definitely slow: it's a hw RAID5 partition with older
drives, so writes are pretty slow.

I tried various ndirty settings:

/bin/echo "10 300 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
7:33

/bin/echo "10 500 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
6:00

/bin/echo "10 800 0 0 500 3000 30 5 0" > /proc/sys/vm/bdflush
7:17

ndirty=500 seems to be the best and gets much closer to the performance of rmap
and -ac. Writeout is still very bursty compared to the other kernels, but that
may not really matter; I don't know.

> I'd also give a try with the below settings:
>
> /bin/echo "10 500 0 0 500 3000 80 8 0" > /proc/sys/vm/bdflush

7:08

<snip>

> Also just in case, I'd suggest to try to repeat each benchmark three
> times, so we know we are not bitten by random variations in the numbers.

I've been doing a variation on that theme already. The numbers I've been
reporting are best of 2 runs. I have never seen the 2 runs differ by more than
+/- 10 seconds.

--Adam





end of thread, other threads:[~2002-01-19 22:15 UTC | newest]

Thread overview: 11+ messages
2002-01-17 19:22 [PATCH *] rmap VM 11c Rik van Riel
2002-01-17 23:59 ` Bill Davidsen
2002-01-18  0:05   ` Rik van Riel
2002-01-18  0:33 ` Adam Kropelin
2002-01-18  0:56   ` Rik van Riel
2002-01-18 10:06   ` Roy Sigurd Karlsbakk
     [not found]   ` <20020118182837.D31076@asooo.flowerfire.com>
2002-01-19  5:08     ` [PATCH *] rmap VM 11c (RMAP IS A WINNER!) Adam Kropelin
2002-01-19 17:50       ` Andrea Arcangeli
2002-01-19 18:39         ` Adam Kropelin
2002-01-19 20:21           ` Andrea Arcangeli
2002-01-19 22:15             ` Adam Kropelin
