* Linux 2.4.18pre3-ac1
From: Alan Cox @ 2002-01-13 21:44 UTC
To: linux-kernel

People keep bugging me about the -ac tree stuff so this is what's in my
current internal diff with the ll patch and the ide changes excluded.

Much of this is stuff just waiting to go to Marcelo but it has the 32bit
uid quota that some folks consider pretty critical and the rmap-11b VM
which I consider pretty essential.

(Marcelo I'll be sending you stuff I've done from this anyway, if there
is other stuff you want extracting just ask)

Linux 2.4.18pre3-ac1
o 32bit uid quota
o rmap-11b VM (Rik van Riel,
  William Irwin etc)
o Make scsi printer visible (Stefan Wieseckel)
o Report Hercules Fortissimo card (Minya Sorakinu)
o Fix O_NDELAY close mishandling on the following (me)
  sound cards: cmpci, cs46xx, es1370, es1371,
  esssolo1, sonicvibes
o tdfx pixclock handling fix (Jurriaan)
o Fix mishandling of file system size limiting (Andrea Arcangeli)
o generic_serial cleanups (Rasmus Andersen)
o serial.c locking fixes for SMP - move from cli (Kees)
  too
o Truncate fixes from old -ac tree (Andrew Morton)
o Hopefully fix the i2o oops (me)
  | Not the right fix but it'll do till I rewrite this
o Fix non blocking tty blocking bug (Peter Benie)
o IRQ routing workaround for problem HP laptops (Cory Bell)
o Fix the rcpci driver (Pete Popov)
o Fix documentation of aedsp location (Adrian Bunk)
o Fix the worst of the APM ate my cpu problems (Andreas Steinmetz)
o Correct icmp documentation (Pierre Lombard)
o Multiple mxser crash on boot fix (Stephan von Krawczynski)
o ldm header fix (Anton Altaparmakov)
o Fix unchecked kmalloc in i2o_proc (Ragnar Hojland Espinosa)
o Fix unchecked kmalloc in airo_cs (Ragnar Hojland Espinosa)
o Fix unchecked kmalloc in btaudio (Ragnar Hojland Espinosa)
o Fix unchecked kmalloc in qnx4/inode.c (Ragnar Hojland Espinosa)
o Disable DRM4.1 GMX2000 driver (4.0 required) (me)
o Fix sb16 lower speed limit bug (Jori Liesenborgs)
o Fix compilation of orinoco driver (Ben Herrenschmidt)
o ISAPnP init fix (Chris Rankin)
o Export release_console_sem (Andrew Morton)
o Output nat crash fix (Rusty Russell)
o Fix PLIP (Tim Waugh)
o Natsemi driver hang fix (Manfred Spraul)
o Add mono/stereo reporting to gemtek pci radio (Jonathan Hudson)

* Re: Linux 2.4.18pre3-ac1
From: Thiago Rondon @ 2002-01-13 19:23 UTC
To: Alan Cox; +Cc: linux-kernel

[maluco@freak maluco]$ finger @kernel.org
[kernel.org]
The latest stable version of the Linux kernel is:            2.4.17
The latest prepatch for the stable Linux kernel tree is:     2.4.18-pre3
The latest beta version of the Linux kernel is:              2.5.1
The latest prepatch for the beta Linux kernel tree is:       2.5.2-pre11
The latest -ac patch to the stable Linux kernels is:         2.4.13-ac8

Is that message "maintained" by someone? The -ac tree isn't updated.

On Sun, 13 Jan 2002, Alan Cox wrote:
> People keep bugging me about the -ac tree stuff so this is what's in my
> current internal diff with the ll patch and the ide changes excluded.
<snip>

* Re: Linux 2.4.18pre3-ac1
From: Ville Herva @ 2002-01-13 22:52 UTC
To: Alan Cox; +Cc: linux-kernel

On Sun, Jan 13, 2002 at 04:44:46PM -0500, you [Alan Cox] claimed:
> People keep bugging me about the -ac tree stuff so this is what's in my
> current internal diff with the ll patch and the ide changes excluded.

Any big reason why you aren't including those two? I'm pretty sure a lot of
people will eventually bug Marcelo (and you) about merging ide to 2.4
proper (or -ac)... :)

> Linux 2.4.18pre3-ac1
>
> o rmap-11b VM (Rik van Riel,
>   William Irwin etc)

So I gather you find this better than AA vm, even the -aa version?

> o Fix O_NDELAY close mishandling on the following (me)
>   sound cards: cmpci, cs46xx, es1370, es1371,
>   esssolo1, sonicvibes

With 17rc1, es1370 went once or twice into a state where it kept accepting
data _very_ slowly and seemingly nothing came out of the speakers. Actually
I'm not sure if it actually ate any data; echo > /dev/dsp blocked, but some
audio apps _seemed_ to make some progress. rmmod es1370; insmod es1370
succeeded, but didn't help - I had to reboot.

2.4.10ac10 (which is what I ran before 17rc1) never showed this. I wasn't
able to reproduce it on purpose.

I guess this is not the fix for that?

-- v --

v@iki.fi

* Re: Linux 2.4.18pre3-ac1
From: Alan Cox @ 2002-01-13 22:57 UTC
To: Ville Herva; +Cc: Alan Cox, linux-kernel

> On Sun, Jan 13, 2002 at 04:44:46PM -0500, you [Alan Cox] claimed:
> > People keep bugging me about the -ac tree stuff so this is what's in my
> > current internal diff with the ll patch and the ide changes excluded.
>
> Any big reason why you aren't including those two? I'm pretty sure a lot of
> people will eventually bug Marcelo (and you) about merging ide to 2.4
> proper (or -ac)... :)

So I can tell which patch causes problems, if any.

> 2.4.10ac10 (which is what I ran before 17rc1) never showed this. I wasn't
> able to reproduce it on purpose.
>
> I guess this is not the fix for that?

That's the first I've heard of the other problem.

* Re: Linux 2.4.18pre3-ac1
From: Adam Kropelin @ 2002-01-14 0:15 UTC
To: linux-kernel

----- Original Message -----
From: "Alan Cox" <alan@redhat.com>
To: <linux-kernel@vger.kernel.org>
Sent: Sunday, January 13, 2002 4:44 PM
Subject: Linux 2.4.18pre3-ac1

> People keep bugging me about the -ac tree stuff so this is what's in my
> current internal diff with the ll patch and the ide changes excluded.
<snip>

For the sake of completeness I ran my large inbound FTP transfer test (details
in the "Writeout in recent kernels..." thread) on this release. Performance and
observed writeout behavior were essentially the same as for 2.4.17, both stock
and with -rmap11a. Transfer time was 6:56 and writeout was uneven. 2.4.13-ac7 is
still the winner by a significant margin.

Hmmm...

--Adam

* Re: Linux 2.4.18pre3-ac1
From: Alan Cox @ 2002-01-14 0:47 UTC
To: Adam Kropelin; +Cc: linux-kernel

> in the "Writeout in recent kernels..." thread) on this release. Performance and
> observed writeout behavior were essentially the same as for 2.4.17, both stock
> and with -rmap11a. Transfer time was 6:56 and writeout was uneven. 2.4.13-ac7 is
> still the winner by a significant margin.

That is very useful information actually. That does rather imply that some
of the performance hit came from the block I/O elevator differences in the
old ac tree (the ones Linus hated ;)). Now the question (and part of the
reason Linus didn't like them) is why?

Alan

* Re: Linux 2.4.18pre3-ac1
From: Benjamin LaHaise @ 2002-01-14 2:13 UTC
To: Alan Cox; +Cc: Adam Kropelin, linux-kernel

On Mon, Jan 14, 2002 at 12:47:54AM +0000, Alan Cox wrote:
> That is very useful information actually. That does rather imply that some
> of the performance hit came from the block I/O elevator differences in the
> old ac tree (the ones Linus hated ;)). Now the question (and part of the
> reason Linus didn't like them) is why?

IIRC, Linus just didn't like the low/high watermarks for starting & stopping
io. Personally, I liked it and wanted to use that mechanism for deciding
when to submit additional blocks from the buffer cache for the device (it
provides a nice means of encouraging batching). The problem that started
this whole mess was a combination of the missing wake_up in the block layer
that I found, plus the horrendous io latency that we hit with a long io
queue and no priorities. The critical pages for swap in and program
loading, as well as background write outs, need to have a priority boost so
that interactive feel is better.

Of course, with quite a few improvements in when we wait on ios going into
the vm between 2.4.7 and 2.4.17, we don't wait as indiscriminately on io as
we did back then. But write out latency can still harm us. In effect, it is
a latency vs throughput tradeoff.

		-ben
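Ben's low/high watermarks for starting and stopping I/O describe a hysteresis scheme. A minimal sketch, assuming the queue can be modelled as a plain request counter; struct io_queue, io_try_submit() and io_complete() are hypothetical names, not the -ac elevator code or any kernel API.

```c
#include <stdbool.h>

/* Hypothetical queue state; names are illustrative, not the -ac code. */
struct io_queue {
    unsigned int pending;    /* requests currently queued */
    unsigned int low_mark;   /* resume submitting at or below this */
    unsigned int high_mark;  /* stop submitting at or above this */
    bool throttled;          /* submitters must wait while true */
};

/* Called when a submitter wants to queue one more request.
 * Returns true if it may queue now, false if it must wait. */
bool io_try_submit(struct io_queue *q)
{
    if (q->throttled)
        return false;
    q->pending++;
    if (q->pending >= q->high_mark)
        q->throttled = true;     /* queue full: stop new submissions */
    return true;
}

/* Called when the device completes one request.  Returns true when
 * waiting submitters should be woken up. */
bool io_complete(struct io_queue *q)
{
    if (q->pending > 0)
        q->pending--;
    if (q->throttled && q->pending <= q->low_mark) {
        q->throttled = false;    /* drained enough: let a new batch in */
        return true;
    }
    return false;
}
```

The gap between the two marks is what encourages batching: once the queue is throttled, submitters wait until it drains to the low mark, and then a whole batch can be queued again. In this model, a caller that ignores the wake-up io_complete() asks for leaves throttled submitters stuck, which is the general shape of a missing-wake_up stall.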

* Re: Linux 2.4.18pre3-ac1
From: Rik van Riel @ 2002-01-14 2:54 UTC
To: Adam Kropelin; +Cc: linux-kernel

On Sun, 13 Jan 2002, Adam Kropelin wrote:
> From: "Alan Cox" <alan@redhat.com>
>
> > People keep bugging me about the -ac tree stuff so this is what's in my
> > current internal diff with the ll patch and the ide changes excluded.

> For the sake of completeness I ran my large inbound FTP transfer test
> (details in the "Writeout in recent kernels..." thread) on this
> release. Performance and observed writeout behavior were essentially
> the same as for 2.4.17, both stock and with -rmap11a. Transfer time
> was 6:56 and writeout was uneven. 2.4.13-ac7 is still the winner by a
> significant margin.

I'm looking into this bug, I just finished the first large
dbench test set on 2.4.17-rmap11b with 512 MB RAM, tomorrow
I'll run them with 128 and 32 MB of RAM.

Luckily you have already shown the other recent kernels to
have the same performance, so I only have to do half a day
of testing. I'll try to track down this bug and get it fixed.

regards,

Rik
-- 
"Linux holds advantages over the single-vendor commercial OS"
    -- Microsoft's "Competing with Linux" document

http://www.surriel.com/		http://distro.conectiva.com/

* Re: Linux 2.4.18pre3-ac1
From: Eric W. Biederman @ 2002-01-14 5:47 UTC
To: Rik van Riel; +Cc: Adam Kropelin, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On Sun, 13 Jan 2002, Adam Kropelin wrote:
>
> > From: "Alan Cox" <alan@redhat.com>
> >
> > > People keep bugging me about the -ac tree stuff so this is what's in my
> > > current internal diff with the ll patch and the ide changes excluded.
>
> > For the sake of completeness I ran my large inbound FTP transfer test
> > (details in the "Writeout in recent kernels..." thread) on this
> > release. Performance and observed writeout behavior were essentially
> > the same as for 2.4.17, both stock and with -rmap11a. Transfer time
> > was 6:56 and writeout was uneven. 2.4.13-ac7 is still the winner by a
> > significant margin.
>
> I'm looking into this bug, I just finished the first large
> dbench test set on 2.4.17-rmap11b with 512 MB RAM, tomorrow
> I'll run them with 128 and 32 MB of RAM.
>
> Luckily you have already shown the other recent kernels to
> have the same performance, so I only have to do half a day
> of testing. I'll try to track down this bug and get it fixed.

Rik, while you are looking at your reverse mapping code, I would like
to call to your attention the at least tripling of times for fork.

I wouldn't be surprised if the reason your rmap vm handles things like
gcc -j better than the stock kernel is simply the reduced number of
processes, due to slower forking.

Just my 2 cents so we don't forget the caveats of the reverse map
approach.

Eric

* Re: Linux 2.4.18pre3-ac1
From: Rik van Riel @ 2002-01-14 6:17 UTC
To: Eric W. Biederman; +Cc: Adam Kropelin, linux-kernel

On 13 Jan 2002, Eric W. Biederman wrote:
> Rik van Riel <riel@conectiva.com.br> writes:

> Rik, while you are looking at your reverse mapping code, I would like
> to call to your attention the at least tripling of times for fork.

Dave McCracken has measured this on his system; it seems to vary
from 10% for bash to 400% for a process with 10 MB of memory.

This is a problem which will need to be solved. A number of designs
on how to deal with this are ready; implementation needs to be done.

> I wouldn't be surprised if the reason your rmap vm handles things like
> gcc -j better than the stock kernel is simply the reduced number of
> processes, due to slower forking.

I really doubt this, since gcc spends so much more time doing
real work than forking that the time used in fork can be ignored,
even if it gets 3 times slower.

> Just my 2 cents so we don't forget the caveats of the reverse map
> approach.

The main way we can speed up fork easily is by not copying the
page tables at all at fork time but filling them in later at page
fault time. While this might look like it's just moving the overhead
from one place to another, for the typical fork()+exec() case it
means (1) we don't copy the page tables at fork time, (2) we don't
need to free them at exec time, and (3) after the exec, the parent can
just take back the complete page tables without having to take COW
faults on all its pages.

regards,

Rik
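Rik's idea of not copying page tables at fork can be illustrated with a reference-counted page-table page that is shared at fork and copied only when a fault needs a private copy. This is a simplified user-space model, not Daniel's unpublished algorithm and not kernel code: struct pt_page, pt_share(), pt_unshare() and pt_release() are hypothetical, and write-protection of the data pages for COW, locking, and rmap bookkeeping are all left out.

```c
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_PTE 1024   /* entries per page-table page (i386, non-PAE) */

/* Hypothetical page-table page that can be shared between processes. */
struct pt_page {
    unsigned int refcount;            /* processes referencing this table */
    unsigned long pte[PTRS_PER_PTE];  /* simplified page-table entries */
};

/* fork: share the table instead of copying its entries. */
struct pt_page *pt_share(struct pt_page *pt)
{
    pt->refcount++;
    return pt;
}

/* Called on a fault that needs a private table.  If the table is still
 * shared, copy it now: this is the work fork() used to do up front.
 * Returns NULL (table still shared) if the allocation fails. */
struct pt_page *pt_unshare(struct pt_page *pt)
{
    if (pt->refcount == 1)
        return pt;                    /* already private, nothing to do */
    struct pt_page *copy = malloc(sizeof *copy);
    if (!copy)
        return NULL;
    memcpy(copy->pte, pt->pte, sizeof pt->pte);
    copy->refcount = 1;
    pt->refcount--;                   /* this process drops its share */
    return copy;
}

/* exec/exit: drop this process's reference; the last user frees it. */
void pt_release(struct pt_page *pt)
{
    if (--pt->refcount == 0)
        free(pt);
}
```

In the fork()+exec() case Rik describes, the child calls pt_release() at exec and the parent is left as sole owner of its table without any entries having been copied.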

* Re: Linux 2.4.18pre3-ac1
From: Eric W. Biederman @ 2002-01-14 7:25 UTC
To: Rik van Riel; +Cc: Adam Kropelin, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On 13 Jan 2002, Eric W. Biederman wrote:
> > Rik van Riel <riel@conectiva.com.br> writes:
>
> > Rik, while you are looking at your reverse mapping code, I would like
> > to call to your attention the at least tripling of times for fork.
>
> Dave McCracken has measured this on his system; it seems to vary
> from 10% for bash to 400% for a process with 10 MB of memory.

O.k. That sounds about like what I was expecting.

> This is a problem which will need to be solved. A number of designs
> on how to deal with this are ready; implementation needs to be done.
> > I wouldn't be surprised if the reason your rmap vm handles things like
> > gcc -j better than the stock kernel is simply the reduced number of
> > processes, due to slower forking.
>
> I really doubt this, since gcc spends so much more time doing
> real work than forking that the time used in fork can be ignored,
> even if it gets 3 times slower.

But for make -j the forking is done by make and it is nearly a
fork bomb; there is simply a linear increase in the number of processes
instead of an exponential one. So I will at least hold this as a
candidate for the make -j kernel fixes.

> > Just my 2 cents so we don't forget the caveats of the reverse map
> > approach.
>
> The main way we can speed up fork easily is by not copying the
> page tables at all at fork time but filling them in later at page
> fault time. While this might look like it's just moving the overhead
> from one place to another, for the typical fork()+exec() case it
> means (1) we don't copy the page tables at fork time, (2) we don't
> need to free them at exec time, and (3) after the exec, the parent can
> just take back the complete page tables without having to take COW
> faults on all its pages.

Which is definitely a win. Perhaps we could even have paged page tables
at that point.

There is a second piece that should make things faster as well.
Adopt a BSD-style page table allocation where we do an order 1
allocation and allocate both the page table and the reverse page tables
in the same chunk of memory. That means you can jump from one to
the other with pointer arithmetic, so you can lose one element of your
reverse page table chain structure.

Eric

* Re: Linux 2.4.18pre3-ac1
From: David S. Miller @ 2002-01-14 9:28 UTC
To: ebiederm; +Cc: riel, akropel1, linux-kernel

   From: ebiederm@xmission.com (Eric W. Biederman)
   Date: 14 Jan 2002 00:25:16 -0700

   But for make -j the forking is done by make and it is nearly a
   fork bomb

Someone has probably mentioned this, but it is important to recognize
that make uses vfork().
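For reference, the vfork()-then-exec pattern David Miller mentions looks roughly like this in C. It is a minimal sketch, not make's actual code, and spawn_with_vfork() is a hypothetical helper. The child borrows the parent's address space until it calls execv() or _exit(), so no page tables are copied and the rmap fork overhead discussed in this thread does not arise on this path.

```c
#define _DEFAULT_SOURCE          /* expose vfork() in glibc's <unistd.h> */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PATH with ARGV via vfork()+execv() and wait for it.
 * Returns the child's exit status, or -1 on error. */
int spawn_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;               /* vfork failed */
    if (pid == 0) {
        /* Child shares the parent's memory: only exec or _exit here. */
        execv(path, argv);
        _exit(127);              /* exec failed */
    }
    /* Parent resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```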

* Re: Linux 2.4.18pre3-ac1
From: Rik van Riel @ 2002-01-14 12:05 UTC
To: David S. Miller; +Cc: ebiederm, akropel1, linux-kernel

On Mon, 14 Jan 2002, David S. Miller wrote:
>    From: ebiederm@xmission.com (Eric W. Biederman)
>
>    But for make -j the forking is done by make and it is nearly a
>    fork bomb
>
> Someone has probably mentioned this, but it is important to recognize
> that make uses vfork().

Indeed. In the beginning I was also afraid I'd hit the fork()
problem Eric mentions, but after running lots of tests I can't
really say it has shown up in the profiles anywhere.

I'm sure you could make a benchmark to clearly show it, but for
most common workloads it doesn't seem to be much of an issue.

A possible exception to this is apache; I need to look into that
a bit more.

regards,

Rik

* Re: Linux 2.4.18pre3-ac1
From: Daniel Phillips @ 2002-01-21 3:46 UTC
To: Eric W. Biederman, Rik van Riel; +Cc: Adam Kropelin, linux-kernel

On January 14, 2002 08:25 am, Eric W. Biederman wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
> > On 13 Jan 2002, Eric W. Biederman wrote:
> > > Rik, while you are looking at your reverse mapping code, I would like
> > > to call to your attention the at least tripling of times for fork.
> >
> > Dave McCracken has measured this on his system; it seems to vary
> > from 10% for bash to 400% for a process with 10 MB of memory.
>
> O.k. That sounds about like what I was expecting.
> [...]
> > > Just my 2 cents so we don't forget the caveats of the reverse map
> > > approach.
> >
> > The main way we can speed up fork easily is by not copying the
> > page tables at all at fork time but filling them in later at page
> > fault time. While this might look like it's just moving the overhead
> > from one place to another, for the typical fork()+exec() case it
> > means (1) we don't copy the page tables at fork time, (2) we don't
> > need to free them at exec time, and (3) after the exec, the parent can
> > just take back the complete page tables without having to take COW
> > faults on all its pages.
>
> Which is definitely a win. Perhaps we could even have paged page tables
> at that point.

Yes, it's possible but it's of secondary importance. The first, essential
goal has to be to eliminate the rmap fork overhead so that rmap becomes a
'never worse and often better' solution. It's for this reason that I
developed an algorithm a few weeks ago to do lazy page table instantiation
efficiently, which is what Rik is referring to. I'm not quite ready to post
details yet, since I haven't tried it, and frankly, I'm learning about Unix
memory management as I go, so there may well be a gaping hole I've missed.
Hopefully we'll know in a few days, and I'll post the full writeup.

The way I see it, the purpose of lazy page table instantiation is to
overcome objections to the reverse pte mapping vm technique that
have been expressed in the past, namely the slowdown in dup_mmap
inside fork. I.e., if rmap slows down fork then Linus and Davem are
going to veto it, as they've done in the past, because they feel
that the as-yet-unproven advantages of physically-based vm scanning
don't outweigh the easily measurable fork overhead. Personally, I
think that's debatable, but by eliminating the overhead we eliminate
the objection, and as far as I know, it's the only serious
objection.

--
Daniel

* Re: Linux 2.4.18pre3-ac1
From: Richard Gooch @ 2002-01-21 5:30 UTC
To: Daniel Phillips
Cc: Eric W. Biederman, Rik van Riel, Adam Kropelin, linux-kernel

Daniel Phillips writes:
> The way I see it, the purpose of lazy page table instantiation is to
> overcome objections to the reverse pte mapping vm technique that
> have been expressed in the past, namely the slowdown in dup_mmap
> inside fork. I.e., if rmap slows down fork then Linus and Davem are
> going to veto it, as they've done in the past, because they feel
> that the as-yet-unproven advantages of physically-based vm scanning
> don't outweigh the easily measurable fork overhead. Personally, I
> think that's debatable, but by eliminating the overhead we eliminate
> the objection, and as far as I know, it's the only serious
> objection.

Will lazy page table instantiation speed up fork(2) without rmap?
If so, then you've got a problem, because rmap will still be slower
than non-rmap. Linus will happily grab any speedup and make that the
new baseline against which new schemes are compared :-)

				Regards,

					Richard....
Permanent: rgooch@atnf.csiro.au
Current:   rgooch@ras.ucalgary.ca

* Re: Linux 2.4.18pre3-ac1
From: Rik van Riel @ 2002-01-21 5:34 UTC
To: Richard Gooch
Cc: Daniel Phillips, Eric W. Biederman, Adam Kropelin, linux-kernel

On Sun, 20 Jan 2002, Richard Gooch wrote:

> Will lazy page table instantiation speed up fork(2) without rmap?
> If so, then you've got a problem, because rmap will still be slower
> than non-rmap. Linus will happily grab any speedup and make that the
> new baseline against which new schemes are compared :-)

I guess the difference here is "optimised for lmbench"
vs. "optimised to be stable in real workloads" ;)

Rik

* Re: Linux 2.4.18pre3-ac1
From: Eric W. Biederman @ 2002-01-21 7:01 UTC
To: Rik van Riel; +Cc: Richard Gooch, Daniel Phillips, Adam Kropelin, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On Sun, 20 Jan 2002, Richard Gooch wrote:
>
> > Will lazy page table instantiation speed up fork(2) without rmap?
> > If so, then you've got a problem, because rmap will still be slower
> > than non-rmap. Linus will happily grab any speedup and make that the
> > new baseline against which new schemes are compared :-)

But the differences will go down to the noise level. Your average fork
shouldn't need to copy more than one page. So the amount of work is
near constant.

> I guess the difference here is "optimised for lmbench"
> vs. "optimised to be stable in real workloads" ;)

Currently the rmap patch triples the size of the page tables, which is
also an issue. Though it is relatively straightforward to reduce
that to simply double the page table size with an order(1) allocation,
so we can remove one pointer.

Unless I am mistaken, an everyday shell script is a fairly
fork/exec/exit intensive operation. And there are probably more shell
scripts for unix than every other kind of program put together.

An additional possible strike against rmap is that walking through
page tables in virtual address order is fairly cache friendly, while a
random walk has more of a cache penalty.

One more case that is difficult for rmap is the highly mapped case of
something like glibc. You can easily get to a thousand entries or
more for a single page. In which case a doubly linked list may be
more appropriate than a singly linked list (for add/insert), but this
again triples or quadruples the page table size. And none of it
solves having to walk very long lists in some circumstances. The
best you can do is periodically unmap pages, and then you only
have very long lists for highly active pages.

And to be fair rmap has some advantages over the current system. VM
algorithms are somewhat simpler to code when you can code them however
you want to, instead of being constrained by other parts of the
implementation.

To the true sceptic what remains to be shown is

Eric

* Re: Linux 2.4.18pre3-ac1
From: Rik van Riel @ 2002-01-21 12:02 UTC
To: Eric W. Biederman
Cc: Richard Gooch, Daniel Phillips, Adam Kropelin, linux-kernel

On 21 Jan 2002, Eric W. Biederman wrote:

> Currently the rmap patch triples the size of the page tables, which is
> also an issue. Though it is relatively straightforward to reduce
> that to simply double the page table size with an order(1) allocation,
> so we can remove one pointer.

Actually most processes seem to be much smaller than 4 MB _and_
have their pages spread out over their address space. This means
the page tables are sparsely populated and the pte_chain mechanism
should use less memory than doubling the size of the page tables.

> Unless I am mistaken, an everyday shell script is a fairly
> fork/exec/exit intensive operation. And there are probably more shell
> scripts for unix than every other kind of program put together.

Bash and gcc seem to use vfork, not sure about make...

> An additional possible strike against rmap is that walking through
> page tables in virtual address order is fairly cache friendly, while a
> random walk has more of a cache penalty.

In theory. In practice, however, kswapd seems to use less CPU with
the -rmap VM; most notably it doesn't seem to get lost in the worst
case behaviour of the normal VM where it scans hundreds of megabytes
of normal memory because it has a DMA zone shortage...

> One more case that is difficult for rmap is the highly mapped case of
> something like glibc. You can easily get to a thousand entries or
> more for a single page. In which case a doubly linked list may be
> more appropriate than a singly linked list (for add/insert), but this
> again triples or quadruples the page table size. And none of it
> solves having to walk very long lists in some circumstances. The
> best you can do is periodically unmap pages, and then you only
> have very long lists for highly active pages.

I admit this could be an issue. It would be interesting to see
if it is an issue in practice though...

> And to be fair rmap has some advantages over the current system. VM
> algorithms are somewhat simpler to code when you can code them however
> you want to, instead of being constrained by other parts of the
> implementation.
>
> To the true sceptic what remains to be shown is

Well, you could download the patch and look for yourself ;)

http://surriel.com/patches/
http://linuxvm.bkbits.net/

regards,

Rik
* Re: Linux 2.4.18pre3-ac1 2002-01-21 7:01 ` Eric W. Biederman 2002-01-21 12:02 ` Rik van Riel @ 2002-01-21 14:02 ` Daniel Phillips 1 sibling, 0 replies; 20+ messages in thread From: Daniel Phillips @ 2002-01-21 14:02 UTC (permalink / raw) To: Eric W. Biederman, Rik van Riel Cc: Richard Gooch, Adam Kropelin, linux-kernel On January 21, 2002 08:01 am, Eric W. Biederman wrote: > Rik van Riel <riel@conectiva.com.br> writes: > > On Sun, 20 Jan 2002, Richard Gooch wrote: > > > > > Will lazy page table instantiation speed up fork(2) without rmap? > > > If so, then you've got a problem, because rmap will still be slower > > > than non-rmap. Linus will happily grab any speedup and make that the > > > new baseline against which new schemes are compared :-) > > But the differences will go down to the noise level. Your average fork > shouldn't need to copy more than one page. So the amount of work is > near constant. In fact there's no difference at all at fork time since the instantiation work is defered to page fault time. > > I guess the difference here is "optimised for lmbench" > > vs. "optimised to be stable in real workloads" ;) > > Currently the rmap patch triples the size of the page tables which is > also an issue. Though it is relatively straight forward to reduce > that to simply double the page table size with a order(1) allocation, > so we can remove one pointer. As Rik pointed out, the overhead isn't per pte, it's 8 bytes per mapped page plus 4 bytes per physical page. This can be reduced to just 4 bytes per physical page in the case of nonshared pages, and the shared case the 8 bytes per mapped page can be reduced by various strategies. Even as it stands it's not too bad. > Unless I am mistaken an every day shell script is fairly fork/exec/exit > intensive operation. And there are probably more shell scripts for > unix than every other kind of program put together. 
>
> An additional possible strike against rmap is that walking through
> page tables in virtual address order is fairly cache friendly, while a
> random walk has more of a cache penalty.

Yes. I've proposed a small optimization where each pte_chain link points to several ptes, reducing the cache penalty for the pte chain walk. Improving the locality of the pte accesses themselves is not as easy, since that would require the lru list to be in non-random order with respect to ptes, and I don't know any simple way to do that. I also think it doesn't matter much, since this overhead is incurred only when we are doing heavy scanning of the ptes, and we only do that when we are under heavy memory pressure. In theory, the cost of the extra cache hits will be drowned out by the savings from improved page replacement decisions. Of course, this remains to be seen.

> One more case that is difficult for rmap is the highly mapped case of
> something like glibc. You can easily get to a thousand entries or
> more for a single page. In which case a doubly linked list may be
> more appropriate than a singly linked list (for add/insert), but this
> again triples or quadruples the page table size. And none of it
> solves having to walk very long lists in some circumstances. The
> best you can do is periodically unmapping pages, and then you only
> have very long lists for highly active pages.

It's likely that many of the page tables referring to glibc can be shared as well. This is a somewhat different problem than the lazy instantiation, but looks tractable to me. Without such sharing there are various things that can be done to reduce the list maintenance overhead. Such long lists can be special-cased, for example, so that you would go to a doubly linked list or a tree only when sharing exceeds some threshold.

> And to be fair rmap has some advantages over the current system. VM
> algorithms are somewhat simpler to code when you can code them however
> you want to, instead of being constrained by other parts of the
> implementation.

Virtual scanning has a fundamental disadvantage in comparison to reverse mapping: there is a large and unpredictable lag between the time a pte's accessed bit is transferred to a physical page and the time it is queried during the physical scan. Thus, virtual scanning does not scale well, because as memory size increases the age of the accessed information in the physical page becomes increasingly random. There is no simple way to partition the virtual scan by physical region to reduce this lag. Though this is far from the only problem with virtual scanning, in the long run it's the killer.

--
Daniel
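The pte_chain optimization mentioned above (one chain link pointing to several ptes) could look roughly like the following. This is a hypothetical user-space sketch, not code from the rmap patch: pte_t is a stand-in typedef, and struct pte_chain_block, PTES_PER_LINK and the chain_* helpers are invented names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's pte type; the real one is architecture-specific. */
typedef unsigned long pte_t;

/* Hypothetical size: 7 pte pointers plus the next pointer make a block of
 * 32 bytes on a 32-bit machine (64 on a 64-bit one), small enough to fit a
 * single cache line of that size if suitably aligned. */
#define PTES_PER_LINK 7

struct pte_chain_block {
    struct pte_chain_block *next;
    pte_t *ptep[PTES_PER_LINK]; /* NULL marks a free slot */
};

/* Record that ptep maps the page. Fills the head block first and only
 * allocates a new block when it is full. Returns 0, or -1 if out of memory. */
static int chain_add(struct pte_chain_block **head, pte_t *ptep)
{
    struct pte_chain_block *b = *head;

    assert(ptep != NULL); /* NULL is reserved for free slots */
    if (b != NULL) {
        for (int i = 0; i < PTES_PER_LINK; i++) {
            if (b->ptep[i] == NULL) {
                b->ptep[i] = ptep;
                return 0;
            }
        }
    }
    b = calloc(1, sizeof(*b));
    if (b == NULL)
        return -1;
    b->ptep[0] = ptep;
    b->next = *head;
    *head = b;
    return 0;
}

/* Visit every pte mapping the page (e.g. to test and clear its accessed
 * bit). Each block yields up to PTES_PER_LINK ptes, instead of one pte
 * per list node. Returns the number of ptes visited. */
static size_t chain_walk(const struct pte_chain_block *head,
                         void (*fn)(pte_t *ptep))
{
    size_t n = 0;

    for (const struct pte_chain_block *b = head; b != NULL; b = b->next) {
        for (int i = 0; i < PTES_PER_LINK; i++) {
            if (b->ptep[i] != NULL) {
                if (fn != NULL)
                    fn(b->ptep[i]);
                n++;
            }
        }
    }
    return n;
}

/* Release every block in the chain. */
static void chain_free(struct pte_chain_block *head)
{
    while (head != NULL) {
        struct pte_chain_block *next = head->next;
        free(head);
        head = next;
    }
}
```

Packing several pte pointers per link means a walk over a heavily shared page touches roughly one chain block per PTES_PER_LINK mappings rather than one node per mapping. Removing a mapping and compacting emptied slots are left out of this sketch.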
* Re: Linux 2.4.18pre3-ac1
2002-01-21 5:30 ` Richard Gooch
2002-01-21 5:34 ` Rik van Riel
@ 2002-01-21 13:22 ` Daniel Phillips
1 sibling, 0 replies; 20+ messages in thread
From: Daniel Phillips @ 2002-01-21 13:22 UTC
To: Richard Gooch
Cc: Eric W. Biederman, Rik van Riel, Adam Kropelin, linux-kernel

On January 21, 2002 06:30 am, Richard Gooch wrote:
> Daniel Phillips writes:
> > The way I see it, the purpose of lazy page table instantiation is to
> > overcome objections to the reverse pte mapping vm technique that
> > have been expressed in the past, namely the slowdown in dup_mmap
> > inside fork. I.e., if rmap slows down fork then Linus and Davem are
> > going to veto it, as they've done in the past, because they feel
> > that the as-yet-unproven advantages of physically-based vm scanning
> > don't outweigh the easily measurable fork overhead. Personally, I
> > think that's debatable, but by eliminating the overhead we eliminate
> > the objection, and as far as I know, it's the only serious
> > objection.
>
> Will lazy page table instantiation speed up fork(2) without rmap?

Yes.

> If so, then you've got a problem, because rmap will still be slower
> than non-rmap. Linus will happily grab any speedup and make that the
> new baseline against which new schemes are compared :-)

Fortunately, rmap and non-rmap will fork at the same speed, since in each case the work will consist of copying just the page directory and incrementing the use counts of up to 1024 page tables. Page table instantiation, which happens at fault time, will be slower for rmap than for non-rmap. However, there are offsetting factors that suggest the bottom-line performance will be very similar in unloaded cases, and will favor rmap under heavy load.

--
Daniel
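The fork-time work described above (copy the page directory, bump the use count of each page table, defer the rest to fault time) can be sketched in C as follows. The structures and function names are hypothetical simplifications, not the 2.4 kernel's mm_struct or pgd_t; copy-on-write of the data pages, locking and TLB flushing are all omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* i386 without PAE: 1024 entries per page directory and per page table. */
#define PTRS_PER_PGD 1024
#define PTRS_PER_PTE 1024

/* Hypothetical, simplified structures, not the kernel's. */
struct page_table {
    int use_count;                   /* directories currently sharing it */
    unsigned long pte[PTRS_PER_PTE];
};

struct mm {
    struct page_table *pgd[PTRS_PER_PGD];
};

/* fork(): copy just the page directory and take a reference on each page
 * table it points to. No pte is copied, so there is no per-pte work here
 * with or without reverse mapping. */
static struct mm *lazy_fork(const struct mm *parent)
{
    struct mm *child = malloc(sizeof(*child));

    if (child == NULL)
        return NULL;
    memcpy(child->pgd, parent->pgd, sizeof(child->pgd));
    for (int i = 0; i < PTRS_PER_PGD; i++)
        if (child->pgd[i] != NULL)
            child->pgd[i]->use_count++;
    return child;
}

/* Fault time: before modifying a shared table, give this mm a private
 * copy. This is where an rmap VM would also have to create reverse-mapping
 * entries for the copied ptes (not shown). Returns NULL if there is no
 * table at idx or the copy cannot be allocated. */
static struct page_table *unshare_table(struct mm *mm, int idx)
{
    struct page_table *pt, *copy;

    assert(idx >= 0 && idx < PTRS_PER_PGD);
    pt = mm->pgd[idx];
    if (pt == NULL || pt->use_count == 1)
        return pt;
    copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return NULL;
    memcpy(copy->pte, pt->pte, sizeof(copy->pte));
    copy->use_count = 1;
    pt->use_count--;
    mm->pgd[idx] = copy;
    return copy;
}
```

Because lazy_fork never looks at individual ptes, its cost is the same whether or not reverse mappings are kept; in this sketch any extra rmap work lands in unshare_table, at fault time.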
Thread overview: 20+ messages
2002-01-13 21:44 Linux 2.4.18pre3-ac1 Alan Cox
2002-01-13 19:23 ` Thiago Rondon
2002-01-13 22:52 ` Ville Herva
2002-01-13 22:57 ` Alan Cox
2002-01-14 0:15 ` Adam Kropelin
2002-01-14 0:47 ` Alan Cox
2002-01-14 2:13 ` Benjamin LaHaise
2002-01-14 2:54 ` Rik van Riel
2002-01-14 5:47 ` Eric W. Biederman
2002-01-14 6:17 ` Rik van Riel
2002-01-14 7:25 ` Eric W. Biederman
2002-01-14 9:28 ` David S. Miller
2002-01-14 12:05 ` Rik van Riel
2002-01-21 3:46 ` Daniel Phillips
2002-01-21 5:30 ` Richard Gooch
2002-01-21 5:34 ` Rik van Riel
2002-01-21 7:01 ` Eric W. Biederman
2002-01-21 12:02 ` Rik van Riel
2002-01-21 14:02 ` Daniel Phillips
2002-01-21 13:22 ` Daniel Phillips