* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-02  3:45 UTC
To: blaisorblade; +Cc: Andrew Morton, linux-kernel, Linux Memory Management

blaisorblade@yahoo.it wrote:

> The first idea is to use this for UML - it must create a lot of
> single-page mappings, and managing them through separate VMAs is slow.

I don't know about this. The patches add some complexity, I guess because
we now have vmas which cannot communicate the protectedness of the pages.
Still, nobody was too concerned about nonlinear mappings doing the same
for addressing. But this does seem to add more overhead to the common
cases in the VM :(

Now I didn't follow the earlier discussions on this much, but let me try
making a few silly comments to get things going again (cc'ed linux-mm).

I think I would rather this all just be folded under VM_NONLINEAR rather
than having this extra MANYPROTS thing, no? (you're already doing that in
one direction).

> Additional note: this idea, with some further refinements (which I'll
> code after this chunk is accepted), will allow reducing the number of
> used VMAs for most userspace programs - in particular, it will allow
> avoiding the creation of one VMA per guard page (which has PROT_NONE) -
> forcing PROT_NONE on that page will be enough.

I think that's silly. Your VM_MANYPROTS|VM_NONLINEAR vmas will cause more
overhead in faulting and reclaim. It looks like it would take an hour or
two just to code up a patch which puts a VM_GUARDPAGES flag into the vma,
and tells the free area allocator to skip vm_start-1 .. vm_end+1. What
kind of trouble has prevented something simple and easy like that from
going in?

> This will be useful since the VMA lookup at fault time can be a
> bottleneck for some programs (I've received a report about this from
> Ulrich Drepper and I've been told that also Val Henson from Intel is
> interested in this). I guess that since we use RB-trees, the slowness
> is also due to the poor cache locality of RB-trees (since RB nodes are
> within VMAs but aren't accessed together with their content), compared
> for instance with radix trees where the lookup has high cache locality
> (though they have space-usage problems, possibly bigger ones, on
> 64-bit machines).

Let's try to get back to the good old days when people actually reported
their bugs (together with *real* numbers) to the mailing lists. That way,
everybody gets to think about and discuss the problem.

--
SUSE Labs, Novell Inc.
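[To make the guard-page suggestion concrete: a minimal sketch of the
idea. The VM_GUARDPAGES bit and both helper names are hypothetical,
invented here for illustration, not taken from any posted patch.]

	/* Sketch only: the flag value assumes a free bit in vm_flags. */
	#define VM_GUARDPAGES	0x04000000

	/*
	 * Effective bounds of a vma for free-area searching: a guarded
	 * vma also reserves one page below vm_start and one above vm_end.
	 */
	static inline unsigned long vma_search_start(struct vm_area_struct *vma)
	{
		if (vma->vm_flags & VM_GUARDPAGES)
			return vma->vm_start - PAGE_SIZE;
		return vma->vm_start;
	}

	static inline unsigned long vma_search_end(struct vm_area_struct *vma)
	{
		if (vma->vm_flags & VM_GUARDPAGES)
			return vma->vm_end + PAGE_SIZE;
		return vma->vm_end;
	}

	/* Fragment: an arch_get_unmapped_area()-style hole test becomes */
	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
		if (!vma || addr + len <= vma_search_start(vma))
			return addr;		/* hole is big enough */
		addr = vma_search_end(vma);	/* skip the vma plus guards */
	}

[A thread stack would then need one vma instead of two, and a fault on
the guard page finds no vma at all and fails naturally, much as a
PROT_NONE mapping does today.]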
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-02  3:56 UTC
To: blaisorblade; +Cc: Andrew Morton, linux-kernel, Linux Memory Management

Nick Piggin wrote:
> blaisorblade@yahoo.it wrote:
>
>> The first idea is to use this for UML - it must create a lot of
>> single-page mappings, and managing them through separate VMAs is slow.

[...]

> Let's try to get back to the good old days when people actually reported
> their bugs (together with *real* numbers) to the mailing lists. That
> way, everybody gets to think about and discuss the problem.

Speaking of which, let's see some numbers for UML -- performance and
memory. I don't doubt your claims, but I (and others) would be interested
to see them.

Thanks

PS. I'll be away for the next few days.

--
SUSE Labs, Novell Inc.
* Re: [patch 00/14] remap_file_pages protection support
From: Ingo Molnar @ 2006-05-02 11:24 UTC
To: Nick Piggin
Cc: blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management

* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > Let's try to get back to the good old days when people actually
> > reported their bugs (together with *real* numbers) to the mailing
> > lists. That way, everybody gets to think about and discuss the
> > problem.
>
> Speaking of which, let's see some numbers for UML -- performance and
> memory. I don't doubt your claims, but I (and others) would be
> interested to see them.

firstly, thanks for the review feedback!

originally i tested this feature with a minimal amount of RAM simulated
by UML, 128MB or so. That's just 32 thousand pages, but still the
improvement was massive: context-switch times in UML were cut in half or
more. Process-creation times improved 10-fold. With this feature included
I accidentally (for the first time ever!) confused a UML shell prompt
with a real shell prompt. (before that, UML was so slow [even in "skas
mode"] that you'd immediately notice it from the shell's behavior)

the 'have 1 vma instead of 32,000 vmas' thing is a really, really big
plus. It makes UML comparable to Xen, in rough terms of basic VM design.

Now imagine a somewhat larger setup - a 16 GB RAM UML instance with 4
million vmas per UML process ... Frankly, without
sys_remap_file_pages_prot() the UML design is still somewhat of a toy.

	Ingo
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-02 12:19 UTC
To: Ingo Molnar
Cc: blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management

Ingo Molnar wrote:

> originally i tested this feature with a minimal amount of RAM simulated
> by UML, 128MB or so. That's just 32 thousand pages, but still the
> improvement was massive: context-switch times in UML were cut in half
> or more. Process-creation times improved 10-fold. With this feature
> included I accidentally (for the first time ever!) confused a UML shell
> prompt with a real shell prompt. (before that, UML was so slow [even in
> "skas mode"] that you'd immediately notice it from the shell's behavior)

Cool, thanks for the numbers.

> the 'have 1 vma instead of 32,000 vmas' thing is a really, really big
> plus. It makes UML comparable to Xen, in rough terms of basic VM design.
>
> Now imagine a somewhat larger setup - a 16 GB RAM UML instance with 4
> million vmas per UML process ... Frankly, without
> sys_remap_file_pages_prot() the UML design is still somewhat of a toy.

Yes, I guess I imagined the common case might have been slightly better.
However, with reasonable RAM utilisation, fragmentation means I wouldn't
be surprised if it easily gets close to that theoretical worst case.

My request for numbers was more about the Intel/glibc people than Paolo:
I do realise it is a problem for UML. I just like to see nice numbers :)

I think UML's really neat, so I'd love to see this get in. I don't see
any fundamental sticking point, given a few more iterations and some more
discussion.

--
SUSE Labs, Novell Inc.
* Re: [patch 00/14] remap_file_pages protection support
From: Lee Schermerhorn @ 2006-05-02 17:16 UTC
To: Nick Piggin
Cc: blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management

On Tue, 2006-05-02 at 13:45 +1000, Nick Piggin wrote:
> blaisorblade@yahoo.it wrote:
>
> > The first idea is to use this for UML - it must create a lot of
> > single-page mappings, and managing them through separate VMAs is slow.
>
> I don't know about this. The patches add some complexity, I guess
> because we now have vmas which cannot communicate the protectedness of
> the pages. Still, nobody was too concerned about nonlinear mappings
> doing the same for addressing. But this does seem to add more overhead
> to the common cases in the VM :(
>
> Now I didn't follow the earlier discussions on this much, but let me
> try making a few silly comments to get things going again (cc'ed
> linux-mm).
>
> I think I would rather this all just be folded under VM_NONLINEAR
> rather than having this extra MANYPROTS thing, no? (you're already
> doing that in one direction).
<snip>

One way I've seen this done on other systems is to use something like a
prio tree [e.g., see the shared policy support for shmem] for sub-vma
protection ranges. Most vmas [I'm guessing here] will have only the
original protections or will be reprotected in toto. So, one need only
allocate/populate the protection tree when sub-vma protections are
requested. Then, one can test protections via the vma, perhaps with
access/check macros to hide the existence of the protection tree. Of
course, adding a tree-like structure could introduce locking
complications/overhead in some paths where we'd rather not [just guessing
again]. Might be more overhead than just mucking with the ptes [for UML],
but would keep the ptes in sync with the vma's view of "protectedness".

Lee
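[A minimal sketch of the scheme Lee describes, assuming a per-vma tree of
protection ranges. The structure, field, and helper names here are
invented for illustration - the shmem shared-policy code he points to
has a similar shape, but this is not its API.]

	/* Sketch only: all names hypothetical. */
	struct vma_prot_range {
		struct rb_node	node;
		unsigned long	start, end;	/* [start, end) */
		pgprot_t	prot;		/* overrides vm_page_prot */
	};

	/*
	 * Effective protection at addr. In the common case the vma never
	 * grew a sub-range tree, so this is a single NULL test.
	 */
	static pgprot_t vma_prot_at(struct vm_area_struct *vma,
				    unsigned long addr)
	{
		struct rb_node *n;

		if (likely(!vma->prot_ranges))	/* hypothetical field */
			return vma->vm_page_prot;

		for (n = vma->prot_ranges->rb_node; n; ) {
			struct vma_prot_range *r =
				rb_entry(n, struct vma_prot_range, node);

			if (addr < r->start)
				n = n->rb_left;
			else if (addr >= r->end)
				n = n->rb_right;
			else
				return r->prot;
		}
		return vma->vm_page_prot;
	}

[The attraction is the one Lee names: the main vma tree stays small, and
pte protections can always be re-derived from the vma side; the cost is
the extra lookup and locking on paths that consult or modify the
sub-tree.]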
* Re: [patch 00/14] remap_file_pages protection support
From: Blaisorblade @ 2006-05-03  1:20 UTC
To: Lee Schermerhorn
Cc: Nick Piggin, Andrew Morton, linux-kernel, Linux Memory Management,
    Ulrich Drepper, Val Henson

On Tuesday 02 May 2006 19:16, Lee Schermerhorn wrote:
> On Tue, 2006-05-02 at 13:45 +1000, Nick Piggin wrote:
> > blaisorblade@yahoo.it wrote:
> > I think I would rather this all just be folded under VM_NONLINEAR
> > rather than having this extra MANYPROTS thing, no? (you're already
> > doing that in one direction).

> One way I've seen this done on other systems

I'm curious, which ones?

> is to use something like a prio tree [e.g., see the shared policy
> support for shmem] for sub-vma protection ranges.

Which sub-vma ranges? The ones created with mprotect?

I'm curious what the difference is between this sub-tree and the main
tree... you have a point, but I'm missing which one :-) Actually, when
doing a lookup in the main tree the extra nodes in the subtree are not
searched, so you get an advantage.

One possible point is that a VMA maps to one mmap() call (with splits
from mremap(), mprotect(), and partial munmap()s), and then they use
sub-VMAs instead of VMA splits.

> Most vmas [I'm guessing here] will have only the original protections
> or will be reprotected in toto. So, one need only allocate/populate the
> protection tree when sub-vma protections are requested. Then, one can
> test protections via the vma, perhaps with access/check macros to hide
> the existence of the protection tree. Of course, adding a tree-like
> structure could introduce locking complications/overhead in some paths
> where we'd rather not [just guessing again]. Might be more overhead
> than just mucking with the ptes [for UML], but would keep the ptes in
> sync with the vma's view of "protectedness".
>
> Lee

OK, there are two different situations; I'm globally unconvinced until I
understand the usefulness of a separate sub-tree.

a) UML. The answer is _no_ to all guesses, since we must implement the
page tables of a guest virtual machine via mmap() or remap_file_pages().
And they're as fragmented as they get (we currently end up with
one-page-wide VMAs).

b) the proposed glibc usage. Ulrich's original request (which I cut down
because of problems with objrmap) was to have one mapping per DSO,
including code, data and guard page. So you have three protections in one
VMA.

However, this is doable via this remap_file_pages, adding something for
handling private VMAs (handling movement of the anonymous memory you get
on writes); but it's slow on swapout, since it stops using objrmap. So
I've not thought to do it.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
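[For readers unfamiliar with the UML case: the guest's "page tables" are
realized as host mappings of UML's physical-memory file, one page at a
time. A sketch of installing one guest pte, under the assumption that
remap_file_pages() accepts a nonzero prot as this series proposes -
mainline requires prot == 0. Setup of physmem and error handling are
omitted.]

	#define _GNU_SOURCE
	#include <sys/mman.h>

	/* One big MAP_SHARED mapping of UML's physical-memory file. */
	static void *physmem;

	static int install_guest_pte(unsigned long guest_vaddr,
				     size_t phys_pgoff, int prot)
	{
		void *addr = (char *)physmem + guest_vaddr;

		/*
		 * Repoint a single page and set its protection in place.
		 * Without the series, differing protections force a new
		 * mmap() and hence a vma split per page.
		 */
		return remap_file_pages(addr, 4096 /* page size assumed */,
					prot, phys_pgoff, 0);
	}

[The numbers Ingo quotes above come from exactly this: thousands of such
calls mutate one vma's ptes in place instead of creating thousands of
one-page vmas.]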
* Re: [patch 00/14] remap_file_pages protection support
From: Lee Schermerhorn @ 2006-05-03 14:35 UTC
To: Blaisorblade
Cc: Nick Piggin, Andrew Morton, linux-kernel, Linux Memory Management,
    Ulrich Drepper, Val Henson, Bob Picco

On Wed, 2006-05-03 at 03:20 +0200, Blaisorblade wrote:
> On Tuesday 02 May 2006 19:16, Lee Schermerhorn wrote:
> > One way I've seen this done on other systems
>
> I'm curious, which ones?

Let's see: System V [4.2MP] back in the days of USL [Unix Systems Labs]
did this, as did [does] Digital/Compaq/HP Tru64 on alpha. I'm not sure if
the latter came from the original Mach/OSF code or from Bob Picco's
rewrite thereof back in the 90's.

> > is to use something like a prio tree [e.g., see the shared policy
> > support for shmem] for sub-vma protection ranges.
>
> Which sub-vma ranges? The ones created with mprotect?
>
> I'm curious what the difference is between this sub-tree and the main
> tree... you have a point, but I'm missing which one :-) Actually, when
> doing a lookup in the main tree the extra nodes in the subtree are not
> searched, so you get an advantage.

True, you still have to locate the protection range in the subtree and
then still walk the page tables to install the new protections. Of
course, in the bad old days, the only time I saw thousands or 10s of
thousands of different mappings/vmas/regions/whatever was in vm stress
tests. Until squid came along, that is. Then we encountered, on
large-memory alpha systems, 100s of thousands of mmapped files. Had to
replace the linear [honest] vma/mapping list at that point ;-).

> One possible point is that a VMA maps to one mmap() call (with splits
> from mremap(), mprotect(), and partial munmap()s), and then they use
> sub-VMAs instead of VMA splits.

Yeah. That was the point--in response to Nick's comment about the
disconnect between the protections as reported by the vma and the actual
pte protections.

> OK, there are two different situations; I'm globally unconvinced until
> I understand the usefulness of a separate sub-tree.
>
> a) UML. The answer is _no_ to all guesses, since we must implement the
> page tables of a guest virtual machine via mmap() or remap_file_pages().
> And they're as fragmented as they get (we currently end up with
> one-page-wide VMAs).
>
> b) the proposed glibc usage. Ulrich's original request (which I cut
> down because of problems with objrmap) was to have one mapping per DSO,
> including code, data and guard page. So you have three protections in
> one VMA.
>
> However, this is doable via this remap_file_pages, adding something for
> handling private VMAs (handling movement of the anonymous memory you
> get on writes); but it's slow on swapout, since it stops using objrmap.
> So I've not thought to do it.
* Re: [patch 00/14] remap_file_pages protection support
From: Blaisorblade @ 2006-05-03  0:25 UTC
To: Nick Piggin
Cc: Andrew Morton, linux-kernel, Linux Memory Management, Ulrich Drepper,
    Val Henson

On Tuesday 02 May 2006 05:45, Nick Piggin wrote:
> blaisorblade@yahoo.it wrote:
> > This will be useful since the VMA lookup at fault time can be a
> > bottleneck for some programs (I've received a report about this from
> > Ulrich Drepper and I've been told that also Val Henson from Intel is
> > interested in this). I guess that since we use RB-trees, the slowness
> > is also due to the poor cache locality of RB-trees (since RB nodes
> > are within VMAs but aren't accessed together with their content),
> > compared for instance with radix trees where the lookup has high
> > cache locality (though they have space-usage problems, possibly
> > bigger ones, on 64-bit machines).
>
> Let's try to get back to the good old days when people actually
> reported their bugs (together with *real* numbers) to the mailing
> lists. That way, everybody gets to think about and discuss the problem.

Indeed, I've not seen the numbers; I was told of a problem with a
"customer program", and Ingo connected my work with this problem.
Frankly, I've always been astonished that looking up a 10-level tree can
be slow. Poor cache locality is the only explanation I could think of.

That said, it was an add-on, not the original motivation of the work.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
* Re: [patch 00/14] remap_file_pages protection support
From: Ulrich Drepper @ 2006-05-06 16:05 UTC
To: Blaisorblade
Cc: Nick Piggin, Andrew Morton, linux-kernel, Linux Memory Management,
    Val Henson

Blaisorblade wrote:
> Indeed, I've not seen the numbers; I was told of a problem with a
> "customer program", and Ingo connected my work with this problem.
> Frankly, I've always been astonished that looking up a 10-level tree
> can be slow. Poor cache locality is the only explanation I could think
> of.

It might be good if I explain a bit how much we use mmap in libc. The
numbers can really add up quickly.

- for each loaded DSO we might have up to 5 VMAs today:

  1. text segment, protection r-xp, normally never written to
  2. a gap, protection ---p (i.e., no access)
  3. a relro data segment, protection r--p
  4. a data segment, protection rw-p
  5. a bss segment, anonymous memory

The first four are mapped from the file. In fact, the first segment
"allocates" the entire address space of all segments, even if it's longer
than the file. The gap is done using mprotect(PROT_NONE). Then the area
for segments 3 and 4 is mapped in one mmap() call. It maps the same file,
but the offset used in the mmap is not the offset which is naturally
already established through the first mmap. I.e., if the first mmap()
starts at offset 0 and continues for 1000 pages, the gap might start at,
say, an offset of 4 pages and continue for 500 pages. Then the "natural"
offset of the first data page would be 504 pages, but the second mmap()
call would in fact use the offset 4, because the text and data segments
are contiguous in the _file_ (although not in memory).

Anyway, once relocations are done, the protection of the relro segment is
changed, splitting the data segment in two.

So, for DSO loading there would be two steps of improvement:

1. if an mprotect() call didn't split the VMA, we would have 3 VMAs in
   the end instead of 5. A 40% reduction.

2. if I could use remap_file_pages() for the data segment mapping, and
   the call allowed changing the protection and did not split the VMAs,
   then we'd be down to 2 mappings. A 60% reduction.

A second big VMA user is thread stacks. I think the application which was
briefly mentioned in this thread used literally thousands of threads.
Leaving aside the insanity of this (it's unfortunately how many
programmers work), this can create problems because we get at least two
(on IA-64, three) VMAs per thread. I.e., thread stack allocation works
like this:

1. allocate an area big enough for the stack and guard (we don't use
   automatic growing; that cannot work here)

2. change the protection of the guard end of the stack to PROT_NONE.

So, for say 1000 threads we'll end up with 2000 VMAs. Threads are also
important to mention here because

- often they are short-lived and we have to recreate them often. We
  usually reuse stacks but only keep so much allocated memory around. So
  more often than we'd like, we actually free and later re-allocate
  stacks.

- these thousands of stack VMAs are really used concurrently. All threads
  are woken over a period of time.

A third source of VMAs is anonymous memory allocation. mmap is used in
the malloc implementation and directly in various places. For
randomization reasons there isn't really much we can do here; we
shouldn't lump all these allocations together.

A fourth source of VMAs are the programs themselves, which mmap files.
Often read-only mappings of many small files.

Put all this together and non-trivial apps as written today (I don't say
they are high-quality apps) can easily have a few thousand, maybe even
10,000 to 20,000 VMAs. Firefox on my machine uses at the moment ~560 VMAs
and this is with only a handful of threads. Are these the numbers the VM
system is optimized for? I think what our people running the experiments
at the customer site saw is that it's not. The VMA traversal showed up on
the profile lists.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
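[A sketch of the loader sequence just described, reduced to the system
calls involved. The page counts reuse Drepper's example numbers; all
other values are made up and error handling is omitted.]

	#include <sys/mman.h>

	/* Illustrative only: offsets/lengths stand in for the real ELF
	 * program headers of some DSO. */
	void map_dso_sketch(int fd)
	{
		size_t pg = 4096;
		char *base;

		/* Reserve the whole image in one r-xp file mapping (vma 1);
		 * for now this also covers what will become gap and data. */
		base = mmap(NULL, 1000 * pg, PROT_READ | PROT_EXEC,
			    MAP_PRIVATE, fd, 0);

		/* Punch the ---p gap: this splits off vma 2. */
		mprotect(base + 4 * pg, 500 * pg, PROT_NONE);

		/* Map relro+data rw-p in one call, at file offset 4 pages
		 * even though it lands at page 504 in memory (vma 3). */
		mmap(base + 504 * pg, 200 * pg, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_FIXED, fd, 4 * pg);

		/* bss: anonymous memory beyond the file-backed data (vma 5). */
		mmap(base + 704 * pg, 296 * pg, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);

		/* ... after relocation, ld.so re-protects the relro part,
		 * splitting relro (r--p) from data (rw-p): vmas 3 and 4. */
		mprotect(base + 504 * pg, 100 * pg, PROT_READ);
	}

[Drepper's two improvements map directly onto this sketch: an mprotect()
that only rewrote ptes would keep the gap and relro steps from splitting
vmas, and a protection-capable remap_file_pages() would let the data
mapping live inside the very first vma.]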
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-07  4:22 UTC
To: Ulrich Drepper
Cc: Blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management,
    Val Henson

Ulrich Drepper wrote:
> It might be good if I explain a bit how much we use mmap in libc. The
> numbers can really add up quickly.

[...]

Thanks. Very informative.

> Put all this together and non-trivial apps as written today (I don't
> say they are high-quality apps) can easily have a few thousand, maybe
> even 10,000 to 20,000 VMAs. Firefox on my machine uses at the moment
> ~560 VMAs and this is with only a handful of threads. Are these the
> numbers the VM system is optimized for? I think what our people running
> the experiments at the customer site saw is that it's not. The VMA
> traversal showed up on the profile lists.

Your % improvement numbers are of course only talking about memory usage
improvements. Time complexity increases with the log of the number of
VMAs, so while a search within 100,000 vmas might have a CPU cost of 16
arbitrary units, that is only about 300% of the cost in 40 vmas (and not
the 250,000% that the raw number of vmas suggests).

Definitely reducing vmas would be good. If guard ranges around vmas can
be implemented easily and reduce vmas by even 20%, it would come at an
almost zero complexity cost to the kernel.

However, I think another consideration is the vma lookup cache. I need to
get around to looking at this again, but IMO it is inadequate for
threaded applications. Currently we have one last-lookup cached vma for
each mm. You get cacheline bouncing when updating the cache, and the
locality becomes almost useless.

I think possibly each thread should have a private vma cache, with room
for at least its stack vma(s) (and several others, e.g. code, data).
Perhaps the per-mm cache could be dispensed with completely, although it
might be useful e.g. for the heap. It might also be helped by more
entries.

I've got patches lying around to implement this stuff -- I'd be
interested to have more detail about this problem, or distilled test
cases.

Nick

--
SUSE Labs, Novell Inc.
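[Piggin's arithmetic, made explicit - a balanced-tree search costs
roughly log2(n) comparisons:]

	\frac{\log_2 100000}{\log_2 40} \approx \frac{16.6}{5.3} \approx 3
	\quad (\text{about } 300\%),
	\qquad\text{whereas}\qquad
	\frac{100000}{40} = 2500 \quad (250000\%).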
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-13 14:13 UTC
To: Ulrich Drepper
Cc: Blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management,
    Val Henson

Nick Piggin wrote:
> I think possibly each thread should have a private vma cache, with room
> for at least its stack vma(s) (and several others, e.g. code, data).
> Perhaps the per-mm cache could be dispensed with completely, although
> it might be useful e.g. for the heap. It might also be helped by more
> entries.
>
> I've got patches lying around to implement this stuff -- I'd be
> interested to have more detail about this problem, or distilled test
> cases.

OK, I got interested again, but can't get Val's ebizzy to give me a
find_vma-constrained workload yet (though the numbers back up my
assertion that the vma cache is crap for threaded apps).

Without the patch, after bootup, the vma cache gets 208 364 hits out of
438 535 lookups (47.5%)

./ebizzy -t16: 384.29user 754.61system 5:31.87elapsed 343%CPU

And ebizzy gets 7 373 078 hits out of 82 255 863 lookups (8.9%)

With mm + 4 slot LRU per-thread cache (this patch):

After boot, 303 767 / 439 918 = 69.0%

./ebizzy -t16: 388.73user 750.29system 5:30.24elapsed 344%CPU

ebizzy hits: 53 024 083 / 82 055 195 = 64.6%

So on a non-threaded workload, the hit rate is increased by about 50%; on
a threaded workload it is roughly seven times higher. In
rbtree-walk-constrained workloads, the total find_vma speedup should be
linear in the hit-ratio improvement.

I don't think my ebizzy numbers can justify the patch though...

Nick

--
SUSE Labs, Novell Inc.
[-- Attachment #2: vma.patch --]
[-- Type: text/plain, Size: 6766 bytes --]

Index: linux-2.6/mm/mmap.c
===================================================================
--- linux-2.6.orig/mm/mmap.c	2006-05-13 23:31:13.000000000 +1000
+++ linux-2.6/mm/mmap.c	2006-05-13 23:48:53.000000000 +1000
@@ -30,6 +30,99 @@
 #include <asm/cacheflush.h>
 #include <asm/tlb.h>
 
+static void vma_cache_touch(struct mm_struct *mm, struct vm_area_struct *vma)
+{
+	struct task_struct *curr = current;
+	if (mm == curr->mm) {
+		int i;
+		if (curr->vma_cache_sequence != mm->vma_sequence) {
+			curr->vma_cache_sequence = mm->vma_sequence;
+			curr->vma_cache[0] = vma;
+			for (i = 1; i < 4; i++)
+				curr->vma_cache[i] = NULL;
+		} else {
+			int update_mm;
+
+			if (curr->vma_cache[0] == vma)
+				return;
+
+			for (i = 1; i < 4; i++) {
+				if (curr->vma_cache[i] == vma)
+					break;
+			}
+			update_mm = 0;
+			if (i == 4) {
+				update_mm = 1;
+				i = 3;
+			}
+			while (i != 0) {
+				curr->vma_cache[i] = curr->vma_cache[i-1];
+				i--;
+			}
+			curr->vma_cache[0] = vma;
+
+			if (!update_mm)
+				return;
+		}
+	}
+
+	if (mm->vma_cache != vma) /* prevent cacheline bouncing */
+		mm->vma_cache = vma;
+}
+
+static void vma_cache_replace(struct mm_struct *mm, struct vm_area_struct *vma,
+				struct vm_area_struct *repl)
+{
+	mm->vma_sequence++;
+	if (unlikely(mm->vma_sequence == 0)) {
+		struct task_struct *curr = current, *t;
+		t = curr;
+		rcu_read_lock();
+		do {
+			t->vma_cache_sequence = -1;
+			t = next_thread(t);
+		} while (t != curr);
+		rcu_read_unlock();
+	}
+
+	if (mm->vma_cache == vma)
+		mm->vma_cache = repl;
+}
+
+static inline void vma_cache_invalidate(struct mm_struct *mm,
+				struct vm_area_struct *vma)
+{
+	vma_cache_replace(mm, vma, NULL);
+}
+
+static struct vm_area_struct *vma_cache_find(struct mm_struct *mm,
+				unsigned long addr)
+{
+	struct task_struct *curr;
+	struct vm_area_struct *vma;
+
+	inc_page_state(vma_cache_query);
+
+	curr = current;
+	if (mm == curr->mm && mm->vma_sequence == curr->vma_cache_sequence) {
+		int i;
+		for (i = 0; i < 4; i++) {
+			vma = curr->vma_cache[i];
+			if (vma && vma->vm_end > addr && vma->vm_start <= addr){
+				inc_page_state(vma_cache_hit);
+				return vma;
+			}
+		}
+	}
+
+	vma = mm->vma_cache;
+	if (vma && vma->vm_end > addr && vma->vm_start <= addr) {
+		inc_page_state(vma_cache_hit);
+		return vma;
+	}
+
+	return NULL;
+}
+
 static void unmap_region(struct mm_struct *mm,
 		struct vm_area_struct *vma, struct vm_area_struct *prev,
 		unsigned long start, unsigned long end);
@@ -460,8 +553,6 @@
 {
 	prev->vm_next = vma->vm_next;
 	rb_erase(&vma->vm_rb, &mm->mm_rb);
-	if (mm->mmap_cache == vma)
-		mm->mmap_cache = prev;
 }
 
 /*
@@ -586,6 +677,7 @@
 		 * us to remove next before dropping the locks.
 		 */
 		__vma_unlink(mm, next, vma);
+		vma_cache_replace(mm, next, vma);
 		if (file)
 			__remove_shared_vm_struct(next, file, mapping);
 		if (next->anon_vma)
@@ -1384,8 +1476,8 @@
 	if (mm) {
 		/* Check the cache first. */
 		/* (Cache hit rate is typically around 35%.) */
-		vma = mm->mmap_cache;
-		if (!(vma && vma->vm_end > addr && vma->vm_start <= addr)) {
+		vma = vma_cache_find(mm, addr);
+		if (!vma) {
 			struct rb_node * rb_node;
 
 			rb_node = mm->mm_rb.rb_node;
@@ -1405,9 +1497,9 @@
 				} else
 					rb_node = rb_node->rb_right;
 			}
-			if (vma)
-				mm->mmap_cache = vma;
 		}
+		if (vma)
+			vma_cache_touch(mm, vma);
 	}
 	return vma;
 }
@@ -1424,6 +1516,14 @@
 	if (!mm)
 		goto out;
 
+	vma = vma_cache_find(mm, addr);
+	if (vma) {
+		rb_node = rb_prev(&vma->vm_rb);
+		if (rb_node)
+			prev = rb_entry(rb_node, struct vm_area_struct, vm_rb);
+		goto out;
+	}
+
 	/* Guard against addr being lower than the first VMA */
 	vma = mm->mmap;
 
@@ -1445,6 +1545,9 @@
 	}
 
 out:
+	if (vma)
+		vma_cache_touch(mm, vma);
+
 	*pprev = prev;
 	return prev ? prev->vm_next : vma;
 }
@@ -1686,6 +1789,7 @@
 
 	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
 	do {
+		vma_cache_invalidate(mm, vma);
 		rb_erase(&vma->vm_rb, &mm->mm_rb);
 		mm->map_count--;
 		tail_vma = vma;
@@ -1698,7 +1802,6 @@
 	else
 		addr = vma ? vma->vm_start : mm->mmap_base;
 	mm->unmap_area(mm, addr);
-	mm->mmap_cache = NULL;		/* Kill the cache. */
 }
 
 /*
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h	2006-05-13 23:31:05.000000000 +1000
+++ linux-2.6/include/linux/page-flags.h	2006-05-13 23:31:44.000000000 +1000
@@ -164,6 +164,9 @@
 
 	unsigned long pgrotated;	/* pages rotated to tail of the LRU */
 	unsigned long nr_bounce;	/* pages for bounce buffers */
+
+	unsigned long vma_cache_hit;
+	unsigned long vma_cache_query;
 };
 
 extern void get_page_state(struct page_state *ret);
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c	2006-05-13 23:31:05.000000000 +1000
+++ linux-2.6/mm/page_alloc.c	2006-05-13 23:31:44.000000000 +1000
@@ -2389,6 +2389,9 @@
 
 	"pgrotated",
 	"nr_bounce",
+
+	"vma_cache_hit",
+	"vma_cache_query",
 };
 
 static void *vmstat_start(struct seq_file *m, loff_t *pos)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h	2006-05-13 23:31:03.000000000 +1000
+++ linux-2.6/include/linux/sched.h	2006-05-13 23:33:01.000000000 +1000
@@ -293,9 +293,11 @@
 } while (0)
 
 struct mm_struct {
-	struct vm_area_struct * mmap;		/* list of VMAs */
+	struct vm_area_struct *mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
-	struct vm_area_struct * mmap_cache;	/* last find_vma result */
+	struct vm_area_struct *vma_cache;
+	unsigned long vma_sequence;
+
 	unsigned long (*get_unmapped_area) (struct file *filp,
 				unsigned long addr, unsigned long len,
 				unsigned long pgoff, unsigned long flags);
@@ -734,6 +736,8 @@
 	struct list_head ptrace_list;
 
 	struct mm_struct *mm, *active_mm;
+	struct vm_area_struct *vma_cache[4];
+	unsigned long vma_cache_sequence;
 
 /* task state */
 	struct linux_binfmt *binfmt;
Index: linux-2.6/kernel/fork.c
===================================================================
--- linux-2.6.orig/kernel/fork.c	2006-05-13 23:31:03.000000000 +1000
+++ linux-2.6/kernel/fork.c	2006-05-13 23:32:59.000000000 +1000
@@ -197,7 +197,7 @@
 
 	mm->locked_vm = 0;
 	mm->mmap = NULL;
-	mm->mmap_cache = NULL;
+	mm->vma_sequence = oldmm->vma_sequence+1;
 	mm->free_area_cache = oldmm->mmap_base;
 	mm->cached_hole_size = ~0UL;
 	mm->map_count = 0;
@@ -238,6 +238,10 @@
 		tmp->vm_next = NULL;
 		anon_vma_link(tmp);
 		file = tmp->vm_file;
+
+		if (oldmm->vma_cache == mpnt)
+			mm->vma_cache = tmp;
+
 		if (file) {
 			struct inode *inode = file->f_dentry->d_inode;
 			get_file(file);
* Re: [patch 00/14] remap_file_pages protection support
From: Valerie Henson @ 2006-05-13 18:19 UTC
To: Nick Piggin
Cc: Ulrich Drepper, Blaisorblade, Andrew Morton, linux-kernel,
    Linux Memory Management, Val Henson

On Sun, May 14, 2006 at 12:13:21AM +1000, Nick Piggin wrote:
>
> OK, I got interested again, but can't get Val's ebizzy to give me a
> find_vma-constrained workload yet (though the numbers back up my
> assertion that the vma cache is crap for threaded apps).

Hey Nick,

Glad to see you're using it! There are (at least) two ways to do what
you want:

1. Increase the number of threads - this gives you two vma's per thread,
   one for the stack, one for the guard page:

   $ ./ebizzy -t 100

2. Apply the patch at the end of this email and use -p "prevent
   coalescing", -m "always mmap" and an appropriate number of chunks,
   size, and records to search - this works for me:

   $ ./ebizzy -p -m -n 10000 -s 4096 -r 100000

The original program mmapped everything with the same permissions and no
alignment restrictions, so all the mmaps were coalesced into one. This
version alternates PROT_WRITE permissions on the mmap'd areas after they
are written, so you get lots of vma's:

val@goober:~/ebizzy$ ./ebizzy -p -m -n 10000 -s 4096 -r 100000

[2]+  Stopped        ./ebizzy -p -m -n 10000 -s 4096 -r 100000
val@goober:~/ebizzy$ wc -l /proc/`pgrep ebizzy`/maps
10019 /proc/10917/maps

I haven't profiled to see if this brings find_vma to the top, though.

(The patch also moves around some other stuff so that options are in
alphabetical order; apparently I thought 's' came after 'r' and before
'R'...)

-VAL

--- ebizzy.c.old	2006-05-13 10:18:58.000000000 -0700
+++ ebizzy.c	2006-05-13 11:01:42.000000000 -0700
@@ -52,9 +52,10 @@
 static unsigned int always_mmap;
 static unsigned int never_mmap;
 static unsigned int chunks;
+static unsigned int prevent_coalescing;
 static unsigned int records;
-static unsigned int chunk_size;
 static unsigned int random_size;
+static unsigned int chunk_size;
 static unsigned int threads;
 static unsigned int verbose;
 static unsigned int linear;
@@ -76,9 +77,10 @@
 	       "-m\t\t Always use mmap instead of malloc\n"
 	       "-M\t\t Never use mmap\n"
 	       "-n <num>\t Number of memory chunks to allocate\n"
+	       "-p \t\t Prevent mmap coalescing\n"
 	       "-r <num>\t Total number of records to search for\n"
-	       "-s <size>\t Size of memory chunks, in bytes\n"
 	       "-R\t\t Randomize size of memory to copy and search\n"
+	       "-s <size>\t Size of memory chunks, in bytes\n"
 	       "-t <num>\t Number of threads\n"
 	       "-v[v[v]]\t Be verbose (more v's for more verbose)\n"
 	       "-z\t\t Linear search instead of binary search\n",
@@ -98,7 +100,7 @@
 	cmd = argv[0];
 	opterr = 1;
 
-	while ((c = getopt(argc, argv, "mMn:r:s:Rt:vz")) != -1) {
+	while ((c = getopt(argc, argv, "mMn:pr:Rs:t:vz")) != -1) {
 		switch (c) {
 		case 'm':
 			always_mmap = 1;
@@ -111,19 +113,22 @@
 			if (chunks == 0)
 				usage();
 			break;
+		case 'p':
+			prevent_coalescing = 1;
+			break;
 		case 'r':
 			records = atoi(optarg);
 			if (records == 0)
 				usage();
 			break;
+		case 'R':
+			random_size = 1;
+			break;
 		case 's':
 			chunk_size = atoi(optarg);
 			if (chunk_size == 0)
 				usage();
 			break;
-		case 'R':
-			random_size = 1;
-			break;
 		case 't':
 			threads = atoi(optarg);
 			if (threads == 0)
@@ -141,7 +146,7 @@
 	}
 
 	if (verbose)
-		printf("ebizzy 0.1, Copyright 2006 Intel Corporation\n"
+		printf("ebizzy 0.2, Copyright 2006 Intel Corporation\n"
 		       "Written by Val Henson <val_henson@linux.intel.com\n");
 
 	/*
@@ -173,10 +178,11 @@
 	printf("always_mmap %u\n", always_mmap);
 	printf("never_mmap %u\n", never_mmap);
 	printf("chunks %u\n", chunks);
+	printf("prevent coalescing %u\n", prevent_coalescing);
 	printf("records %u\n", records);
 	printf("records per thread %u\n", records_per_thread);
-	printf("chunk_size %u\n", chunk_size);
 	printf("random_size %u\n", random_size);
+	printf("chunk_size %u\n", chunk_size);
 	printf("threads %u\n", threads);
 	printf("verbose %u\n", verbose);
 	printf("linear %u\n", linear);
@@ -251,9 +257,13 @@
 {
 	int i, j;
 
-	for (i = 0; i < chunks; i++)
+	for (i = 0; i < chunks; i++) {
 		for(j = 0; j < chunk_size / record_size; j++)
 			mem[i][j] = (record_t) j;
+		/* Prevent coalescing by alternating permissions */
+		if (prevent_coalescing && (i % 2) == 0)
+			mprotect(mem[i], chunk_size, PROT_READ);
+	}
 
 	if (verbose)
 		printf("Wrote memory\n");
 }
* Re: [patch 00/14] remap_file_pages protection support
From: Valerie Henson @ 2006-05-13 22:54 UTC
To: Nick Piggin
Cc: Ulrich Drepper, Blaisorblade, Andrew Morton, linux-kernel,
    Linux Memory Management, Arjan van de Ven

On Sat, May 13, 2006 at 11:19:46AM -0700, Valerie Henson wrote:
> The original program mmapped everything with the same permissions and
> no alignment restrictions, so all the mmaps were coalesced into one.
> This version alternates PROT_WRITE permissions on the mmap'd areas
> after they are written, so you get lots of vma's:

... which is of course exactly the case that Blaisorblade's patches
should coalesce into one vma. So I wrote another option which uses holes
instead - takes more memory initially, unfortunately. Grab it from:

http://www.nmt.edu/~val/patches/ebizzy.tar.gz

-p for preventing coalescing via protections, -P for preventing via
holes.

-VAL
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-16 13:30 UTC
To: Valerie Henson
Cc: Ulrich Drepper, Blaisorblade, Andrew Morton, linux-kernel,
    Linux Memory Management, Val Henson

Valerie Henson wrote:
> On Sun, May 14, 2006 at 12:13:21AM +1000, Nick Piggin wrote:
>
> > OK, I got interested again, but can't get Val's ebizzy to give me a
> > find_vma-constrained workload yet (though the numbers back up my
> > assertion that the vma cache is crap for threaded apps).
>
> Hey Nick,
>
> Glad to see you're using it! There are (at least) two ways to do what
> you want:
>
> 1. Increase the number of threads - this gives you two vma's per
>    thread, one for the stack, one for the guard page:
>
>    $ ./ebizzy -t 100
>
> 2. Apply the patch at the end of this email and use -p "prevent
>    coalescing", -m "always mmap" and an appropriate number of chunks,
>    size, and records to search - this works for me:
>
>    $ ./ebizzy -p -m -n 10000 -s 4096 -r 100000
>
> The original program mmapped everything with the same permissions and
> no alignment restrictions, so all the mmaps were coalesced into one.
> This version alternates PROT_WRITE permissions on the mmap'd areas
> after they are written, so you get lots of vma's:
>
> val@goober:~/ebizzy$ ./ebizzy -p -m -n 10000 -s 4096 -r 100000
>
> [2]+  Stopped        ./ebizzy -p -m -n 10000 -s 4096 -r 100000
> val@goober:~/ebizzy$ wc -l /proc/`pgrep ebizzy`/maps
> 10019 /proc/10917/maps
>
> I haven't profiled to see if this brings find_vma to the top, though.

Hi Val,

Thanks. I've tried with your most recent ebizzy, with 256 threads and
50,000 vmas (which gives really poor mmap_cache hit rates), and I'm still
unable to get find_vma above a few % of kernel time.

With 50,000 vmas, my per-thread vma cache is much less effective, I guess
because access is pretty random (hopefully more realistic patterns would
get a bigger improvement).

I also tried running kbuild under UML, and could not make find_vma take
much time either [in this case, the per-thread vma cache patch roughly
doubles the number of hits, from about 15% -> 30% (in the host)].

So I guess it's time to go back into my hole. If anyone does come across
a find_vma-constrained workload (especially with threads), I'd be very
interested.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
* Re: [patch 00/14] remap_file_pages protection support
From: Andreas Mohr @ 2006-05-16 13:51 UTC
To: Nick Piggin
Cc: Valerie Henson, Ulrich Drepper, Blaisorblade, Andrew Morton,
    linux-kernel, Linux Memory Management, Val Henson

Hi,

On Tue, May 16, 2006 at 11:30:32PM +1000, Nick Piggin wrote:
> I also tried running kbuild under UML, and could not make find_vma take
> much time either [in this case, the per-thread vma cache patch roughly
> doubles the number of hits, from about 15% -> 30% (in the host)].
>
> So I guess it's time to go back into my hole. If anyone does come
> across a find_vma-constrained workload (especially with threads), I'd
> be very interested.

I cannot offer much other than some random confirmation that, from my own
oprofiling, whatever I did (often running a load-test script that
launches 30 big apps at the same time), find_vma basically always showed
up very prominently in the list of vmlinux-based code (always ranking
within the top 4 or 5 kernel hotspots, next to timer interrupts, ACPI
idle I/O, etc.). Call-tracing showed it originating from mmap syscalls
and the like, and AFAIR quite some find_vma activity came from oprofile
itself.

Profiling done on a 512MB UP Athlon and a P3/700, 2.6.16ish, current
Debian. Sorry for the foggy report; I don't have those logs here right
now.

So yes, improving that part should help in general, but I cannot quite
say that my machines are "constrained" by it. But you probably knew that
already, otherwise you wouldn't have poked in there... ;)

Andreas Mohr
* Re: [patch 00/14] remap_file_pages protection support
From: Valerie Henson @ 2006-05-16 16:31 UTC
To: Andreas Mohr
Cc: Nick Piggin, Ulrich Drepper, Blaisorblade, Andrew Morton,
    linux-kernel, Linux Memory Management, Val Henson

On Tue, May 16, 2006 at 03:51:35PM +0200, Andreas Mohr wrote:
>
> I cannot offer much other than some random confirmation that, from my
> own oprofiling, whatever I did (often running a load-test script that
> launches 30 big apps at the same time), find_vma basically always
> showed up very prominently in the list of vmlinux-based code (always
> ranking within the top 4 or 5 kernel hotspots, next to timer
> interrupts, ACPI idle I/O, etc.). Call-tracing showed it originating
> from mmap syscalls and the like, and AFAIR quite some find_vma activity
> came from oprofile itself.

This is important: which kernel? The cases I saw it in were in a (now
old) SuSE kernel which, as it turns out, had old/different vma lookup
code.

Thanks,
-VAL
* Re: [patch 00/14] remap_file_pages protection support
From: Andreas Mohr @ 2006-05-16 16:47 UTC
To: Valerie Henson
Cc: Nick Piggin, Ulrich Drepper, Blaisorblade, Andrew Morton,
    linux-kernel, Linux Memory Management, Val Henson

Hi,

On Tue, May 16, 2006 at 09:31:12AM -0700, Valerie Henson wrote:
> On Tue, May 16, 2006 at 03:51:35PM +0200, Andreas Mohr wrote:
> > I cannot offer much other than some random confirmation that, from
> > my own oprofiling, whatever I did (often running a load-test script
> > that launches 30 big apps at the same time), find_vma basically
> > always showed up very prominently in the list of vmlinux-based code
> > (always ranking within the top 4 or 5 kernel hotspots, next to timer
> > interrupts, ACPI idle I/O, etc.). Call-tracing showed it originating
> > from mmap syscalls and the like, and AFAIR quite some find_vma
> > activity came from oprofile itself.
>
> This is important: which kernel?

I had some traces still showing find_vma prominently during a profiling
run just yesterday, with a very fresh 2.6.17-rc4-ck1 (IOW, basically
2.6.17-rc4). I added some cache prefetching in the list traversal a while
ago, and IIRC that improved profiling times there, but cache prefetching
is very often a band-aid in search of a real solution: a better
data-handling algorithm.

Andreas Mohr
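[Roughly what such prefetching might look like in the find_vma() rbtree
walk - a sketch, not Andreas's actual change. prefetch() is the hint
from <linux/prefetch.h>; it compiles away on architectures without
support.]

	#include <linux/prefetch.h>

	rb_node = mm->mm_rb.rb_node;
	while (rb_node) {
		struct vm_area_struct *vma_tmp;

		/* Start pulling both children toward the cache while we
		 * examine the current node, hiding part of the next
		 * step's miss latency. */
		prefetch(rb_node->rb_left);
		prefetch(rb_node->rb_right);

		vma_tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);
		if (vma_tmp->vm_end > addr) {
			vma = vma_tmp;
			if (vma_tmp->vm_start <= addr)
				break;
			rb_node = rb_node->rb_left;
		} else
			rb_node = rb_node->rb_right;
	}

[Only one of the two prefetched children is ever used, which is one
reason prefetching here is a band-aid: it spends memory bandwidth to hide
latency rather than shrinking the working set.]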
* Re: [patch 00/14] remap_file_pages protection support
From: Nick Piggin @ 2006-05-17  3:25 UTC
To: Andreas Mohr
Cc: Valerie Henson, Ulrich Drepper, Blaisorblade, Andrew Morton,
    linux-kernel, Linux Memory Management, Val Henson

Andreas Mohr wrote:
> On Tue, May 16, 2006 at 09:31:12AM -0700, Valerie Henson wrote:
> > This is important: which kernel?
>
> I had some traces still showing find_vma prominently during a profiling
> run just yesterday, with a very fresh 2.6.17-rc4-ck1 (IOW, basically
> 2.6.17-rc4). I added some cache prefetching in the list traversal a
> while ago, and IIRC that improved profiling times there, but cache
> prefetching is very often a band-aid in search of a real solution: a
> better data-handling algorithm.

If you want to try out the patch and see what it does for you, that would
be interesting. I'll repost a slightly cleaned up version in a couple of
hours.

Nick
* Re: [patch 00/14] remap_file_pages protection support
From: Blaisorblade @ 2006-05-17  6:10 UTC
To: Andreas Mohr
Cc: Valerie Henson, Nick Piggin, Ulrich Drepper, Andrew Morton,
    linux-kernel, Linux Memory Management, Val Henson

On Tuesday 16 May 2006 18:47, Andreas Mohr wrote:
> On Tue, May 16, 2006 at 09:31:12AM -0700, Valerie Henson wrote:
> > This is important: which kernel?

I'd also add (for everyone): on which processors? L2 cache size probably
plays an important role, if (as I'm convinced) the problem is cache
misses during the rb-tree traversal.

> I had some traces still showing find_vma prominently during a profiling
> run just yesterday, with a very fresh 2.6.17-rc4-ck1 (IOW, basically
> 2.6.17-rc4). I added some cache prefetching in the list traversal a
> while ago,

You mean the rb-tree traversal, I guess! Or was the base kernel so old?

> and IIRC that improved profiling times there, but cache prefetching is
> very often a band-aid in search of a real solution: a better
> data-handling algorithm.

OK, finally I find the time to kick in and ask a couple of questions. The
current algorithm is good but has poor cache locality (IMHO).

First, since you can get find_vma on the profile: I've read that oprofile
can trace L2 cache misses (the article talked about userspace apps, but I
think it applies to kernel space too). I think such a profiling, if
possible, would be particularly interesting: there's no reason whatsoever
that the lookup, even on a 32-level tree (the theoretical maximum, since
we have at most 64K vmas and height_rbtree <= 2 log N), should be so
slow, unless you add cache misses to the picture. The fact that cache
prefetching helps shows this even more.

The lookup has very poor cache locality: the rb-node (3 pointers, i.e. 12
bytes, of which we need only 2 pointers on searches) is surrounded by
non-relevant data we fetch anyway (we don't need the VMA itself for nodes
we merely traverse).

The best data structure for cache locality that I know of is the radix
tree; but changing the implementation is absolutely non-trivial (the
find_vma_prev()-and-friends API is tightly coupled with the rb-tree), and
the size of the tree grows with the virtual address space (which is a
problem on 64-bit archs); finally, you get locality when you do multiple
searches, especially for the root nodes, but not across different levels
inside a single search.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
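[The height bound Blaisorblade cites, worked out for a red-black tree
with n nodes, taking his 64K-vma figure:]

	h \;\le\; 2\log_2 n \;=\; 2\log_2 2^{16} \;=\; 32
	\qquad (\text{the classical tight form is } h \le 2\log_2(n+1))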
* Re: [patch 00/14] remap_file_pages protection support
From: Valerie Henson @ 2006-05-16 16:33 UTC
To: Nick Piggin
Cc: Ulrich Drepper, Blaisorblade, Andrew Morton, linux-kernel,
    Linux Memory Management

On Tue, May 16, 2006 at 11:30:32PM +1000, Nick Piggin wrote:
>
> Hi Val,
>
> Thanks. I've tried with your most recent ebizzy, with 256 threads and
> 50,000 vmas (which gives really poor mmap_cache hit rates), and I'm
> still unable to get find_vma above a few % of kernel time.

How excellent! Sometimes negative results are worth publishing. :)

-VAL
* Re: [patch 00/14] remap_file_pages protection support
  From: Blaisorblade @ 2006-05-03  0:44 UTC
  To: Nick Piggin
  Cc: Andrew Morton, linux-kernel, Linux Memory Management, Ulrich Drepper, Val Henson

On Tuesday 02 May 2006 05:45, Nick Piggin wrote:
> blaisorblade@yahoo.it wrote:
> > The first idea is to use this for UML - it must create a lot of single
> > page mappings, and managing them through separate VMAs is slow.

> I think I would rather this all just folded under VM_NONLINEAR rather
> than having this extra MANYPROTS thing, no? (you're already doing that
> in one direction).

That step is _temporary_, if the extra usages are accepted.

Also, I reported (in the changelog of patch 03/14) a definite API bug you
get if you don't distinguish VM_MANYPROTS from VM_NONLINEAR. I'm pasting it
here because that changelog is rather long:

"In fact, without this flag, we'd indeed have a regression with
remap_file_pages vs. mprotect, on uniform nonlinear VMAs.

mprotect alters the VMA prots and walks each present PTE, ignoring
installed ones, even when pte_file() is on; their saved prots will be
restored on faults, ignoring the VMA ones and losing the mprotect() on
them. So, in do_file_page(), we must anyway restore the VMA prots when the
VMA is uniform, as we used to do before this series of patches."

> > Additional note: this idea, with some further refinements (which I'll
> > code after this chunk is accepted), will allow to reduce the number of
> > used VMAs for most userspace programs - in particular, it will allow
> > to avoid creating one VMA for one guard page (which has PROT_NONE) -
> > forcing PROT_NONE on that page will be enough.

> I think that's silly. Your VM_MANYPROTS|VM_NONLINEAR vmas will cause
> more overhead in faulting and reclaim.

I know that problem. In fact, for that case we want VM_MANYPROTS without
VM_NONLINEAR.

> It looks like it would take an hour or two just to code up a patch which
> puts a VM_GUARDPAGES flag into the vma, and tells the free area
> allocator to skip vm_start-1 .. vm_end+1

We'd have to refine which pages to skip (the example I saw has only one
guard page, if I'm not mistaken), but

> . What kind of troubles has prevented
> something simple and easy like that from going in?

it's a fairly better idea... It's just that the original proposal was
wider, and that we looked at the problem in the wrong way (plus we wanted
to have the present work merged anyway, so that wasn't a problem).

Ulrich wanted to have code+data (+guard on 64-bit) in the same VMA, but I
left the code+data VMA joining out, to think more about it, since
currently it's too slow on swapout.

The other part is avoiding guard VMAs for thread stacks, and that could be
accomplished by your proposal too - only if this work is kept out, however.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
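The fault-path behaviour the quoted changelog describes, reduced to a
sketch. do_file_page() and pte_to_pgoff() are real names from kernels of
that era, and VM_MANYPROTS is the flag from this patch series, but the
helpers pte_file_to_pgprot() and install_page_at() are made-up stand-ins,
not kernel API:

	/* Fault on a pte_file() pte.  With VM_MANYPROTS, honour the
	 * per-page protections saved in the non-present pte; on a
	 * uniform VMA, discard them and reinstall vma->vm_page_prot,
	 * so an intervening mprotect() -- which only rewrites present
	 * ptes -- is not silently undone at fault time. */
	static int do_file_page_sketch(struct vm_area_struct *vma,
				       unsigned long address, pte_t orig_pte)
	{
		unsigned long pgoff = pte_to_pgoff(orig_pte);
		pgprot_t prot;

		if (vma->vm_flags & VM_MANYPROTS)
			prot = pte_file_to_pgprot(orig_pte); /* made up */
		else
			prot = vma->vm_page_prot; /* uniform VMA prots win */

		return install_page_at(vma, address, pgoff, prot); /* made up */
	}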
* Re: [patch 00/14] remap_file_pages protection support
  From: Nick Piggin @ 2006-05-06  9:06 UTC
  To: Blaisorblade
  Cc: Andrew Morton, linux-kernel, Linux Memory Management, Ulrich Drepper, Val Henson

Blaisorblade wrote:
> On Tuesday 02 May 2006 05:45, Nick Piggin wrote:
> > I think I would rather this all just folded under VM_NONLINEAR rather
> > than having this extra MANYPROTS thing, no? (you're already doing that
> > in one direction).
>
> That step is _temporary_, if the extra usages are accepted.

Can we try to get the whole design right from the start?

> Also, I reported (in the changelog of patch 03/14) a definite API bug
> you get if you don't distinguish VM_MANYPROTS from VM_NONLINEAR. I'm
> pasting it here because that changelog is rather long:
>
> "In fact, without this flag, we'd indeed have a regression with
> remap_file_pages vs. mprotect, on uniform nonlinear VMAs.
>
> mprotect alters the VMA prots and walks each present PTE, ignoring
> installed ones, even when pte_file() is on; their saved prots will be
> restored on faults, ignoring the VMA ones and losing the mprotect() on
> them. So, in do_file_page(), we must anyway restore the VMA prots when
> the VMA is uniform, as we used to do before this series of patches."

It is only a bug because you haven't plugged the hole -- make mprotect fix
up the pte_file ones as well.

> Ulrich wanted to have code+data (+guard on 64-bit) in the same VMA, but
> I left the code+data VMA joining out, to think more about it, since
> currently it's too slow on swapout.

Yes, and it would be ridiculous to do this with nonlinear protections
anyway. If the vma code is so slow that glibc wants to merge code and data
vmas together, then we obviously need to fix the data structure (which
will help everyone) rather than hack around it.

> The other part is avoiding guard VMAs for thread stacks, and that could
> be accomplished by your proposal too - only if this work is kept out,
> however.

I see no reason why they couldn't both go in. In fact, having an mmap flag
for adding guard regions around vmas (and perhaps, e.g., a system-wide /
per-process option for stacks) could almost go in tomorrow.
--
SUSE Labs, Novell Inc.
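A sketch of the VM_GUARDPAGES idea under discussion. Neither the flag nor
the helper exists; this only illustrates how a get_unmapped_area()-style
scan could leave a one-page hole on each side of a guarded vma, as Nick
proposes:

	/* Hypothetical: a guarded vma claims one extra page below it. */
	static inline unsigned long vma_guarded_start(struct vm_area_struct *vma)
	{
		return (vma->vm_flags & VM_GUARDPAGES) ?
		       vma->vm_start - PAGE_SIZE : vma->vm_start;
	}

	/* Inside an arch_get_unmapped_area()-style loop: */
	addr = PAGE_ALIGN(addr);
	for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
		if (TASK_SIZE - len < addr)
			return -ENOMEM;
		if (!vma || addr + len <= vma_guarded_start(vma))
			return addr;		/* hole is big enough */
		addr = vma->vm_end;
		if (vma->vm_flags & VM_GUARDPAGES)
			addr += PAGE_SIZE;	/* leave a hole above it too */
	}

Nothing can then be mapped adjacent to the guarded vma, without spending a
separate PROT_NONE vma on the guard page.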
* Re: [patch 00/14] remap_file_pages protection support
  From: Ulrich Drepper @ 2006-05-06 15:26 UTC
  To: Nick Piggin
  Cc: Blaisorblade, Andrew Morton, linux-kernel, Linux Memory Management, Val Henson

Nick Piggin wrote:
> I see no reason why they couldn't both go in. In fact, having an mmap
> flag for adding guard regions around vmas (and perhaps, e.g., a
> system-wide / per-process option for stacks) could almost go in
> tomorrow.

This would have to be flexible, though. For thread stacks, at least, the
programmer is able to specify the size of the guard area; it can be
arbitrarily large.

Also, consider IA-64. Here we have two stacks. We allocate them with one
mmap call, put the guard somewhere in the middle (the optimal ratio of CPU
and register stack size is yet to be determined), and have the stacks grow
toward each other. This results in three VMAs at the moment. Anything
which results in more VMAs obviously isn't good.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
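The IA-64 layout Ulrich describes, pictured with plain POSIX calls (the
sizes and the 50/50 split are illustrative assumptions; glibc's real
allocation logic differs). The single mmap plus the middle mprotect is
exactly what produces three VMAs:

	#include <sys/mman.h>
	#include <stddef.h>

	#define STACKSZ	(512 * 1024)
	#define GUARDSZ	(4096)

	/* One allocation for both stacks: the memory stack grows down
	 * from the top, the register backing store grows up from the
	 * bottom, and the PROT_NONE hole in the middle catches both. */
	void *alloc_dual_stack(void)
	{
		size_t total = 2 * STACKSZ + GUARDSZ;
		char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (base == MAP_FAILED)
			return NULL;
		if (mprotect(base + STACKSZ, GUARDSZ, PROT_NONE) != 0) {
			munmap(base, total);
			return NULL;
		}
		return base;	/* kernel now tracks three VMAs here */
	}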
[parent not found: <20060430173025.752423000@zion.home.lan>]
* Re: [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes
  From: Nick Piggin @ 2006-05-02  3:53 UTC
  To: blaisorblade
  Cc: Andrew Morton, linux-kernel, Linux Memory Management

blaisorblade@yahoo.it wrote:
> From: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
>
> pte_present(pte) implies that pte_pfn(pte) is valid. Normally this
> holds even with a _PAGE_PROTNONE pte, but not when such a PTE is
> installed by the new install_file_pte; previously it stored only file
> offsets, not protections, while with these patches it also stores
> protections, and can set _PAGE_PROTNONE|_PAGE_FILE.

Why is this combination useful? Can't you just drop the _PAGE_FILE from
_PAGE_PROTNONE ptes?

> zap_pte_range, when acting on such a pte, calls vm_normal_page and gets
> &mem_map[0], does page_remove_rmap, and we're easily in trouble,
> because it happens to find a page with mapcount == 0. And it BUGs on
> this!
>
> I've seen this trigger easily and repeatably on UML on 2.6.16-rc3. This
> was likely avoided in the past by the PageReserved test - page 0 *had*
> to be reserved on i386 (dunno about UML).
>
> Implementation follows for UML and i386.
>
> To avoid additional overhead, I also considered adding likely() for
> _PAGE_PRESENT and unlikely() for the rest, but I'm uncertain about the
> validity of possible [un]likely(pte_present()) occurrences.

Not-present pages are likely to be pretty common when unmapping. I don't
like this patch much.
--
SUSE Labs, Novell Inc.
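The mechanics in miniature. The first macro is the classic i386 definition
of that era; the second is only a sketch of the kind of fix the patch
title describes (the patch's exact form may differ), and its extra test is
precisely the "bloat" Nick objects to:

	/* 2.6-era i386: a PROT_NONE pte must still count as present,
	 * because it keeps pointing at a real page. */
	#define pte_present(x)	((x).pte_low & (_PAGE_PRESENT | _PAGE_PROTNONE))

	/* With the series, install_file_pte() can write a pte that is
	 * _PAGE_PROTNONE|_PAGE_FILE but *not* _PAGE_PRESENT.  The macro
	 * above then says "present", pte_pfn() is garbage (here it
	 * decodes to pfn 0, i.e. &mem_map[0]), and zap_pte_range()
	 * walks into vm_normal_page()/page_remove_rmap() and BUGs.
	 * Sketch of a fix: a file pte is never present. */
	#define pte_present_fixed(x)					\
		(((x).pte_low & _PAGE_PRESENT) ||			\
		 (((x).pte_low & (_PAGE_PROTNONE | _PAGE_FILE)) ==	\
		  _PAGE_PROTNONE))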
* Re: [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes
  From: Blaisorblade @ 2006-05-03  1:29 UTC
  To: Nick Piggin
  Cc: Andrew Morton, linux-kernel, Linux Memory Management

On Tuesday 02 May 2006 05:53, Nick Piggin wrote:
> blaisorblade@yahoo.it wrote:
> > From: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
> >
> > pte_present(pte) implies that pte_pfn(pte) is valid. Normally this
> > holds even with a _PAGE_PROTNONE pte, but not when such a PTE is
> > installed by the new install_file_pte; previously it stored only file
> > offsets, not protections, while with these patches it also stores
> > protections, and can set _PAGE_PROTNONE|_PAGE_FILE.

What could be done is to set a PTE with "no protection" using another bit
rather than _PAGE_PROTNONE. This wastes one more bit, but is doable.

> Why is this combination useful? Can't you just drop the _PAGE_FILE from
> _PAGE_PROTNONE ptes?

I must think about this, but the semantics are not entirely the same in
the two cases. You have no page attached when _PAGE_FILE is there, but a
page is attached to a PTE with only _PAGE_PROTNONE. Testing that via
VM_MANYPROTS is just as slow as the current code (this could be changed at
the cost of duplicating code for the linear and nonlinear cases). The
application semantics can also differ when you remap that page as
read/write - the app could have stored an offset there (this is less
definite, since currently you can't remap and keep the offset).

Also, this wouldn't solve the problem, it would make the solution harder:
without _PAGE_FILE, how do I know that there's no page to call
page_remove_rmap() on?

I thought about changing _PAGE_PROTNONE: it is used to keep a page present
and referenced but inaccessible. It seems the page could be released when
_PAGE_PROTNONE is set, but for anonymous memory that's impossible. When I
asked Hugh about this, he imagined the case where an application faults in
a page in a VMA and then mprotect(PROT_NONE)s it; the PTE is set as
PROT_NONE. We could avoid that in the VM_MAYSHARE case (MAP_SHARED was
requested but the file is read-only), but not when anonymous memory is
present - the application could want it back.

> > To avoid additional overhead, I also considered adding likely() for
> > _PAGE_PRESENT and unlikely() for the rest, but I'm uncertain about
> > the validity of possible [un]likely(pte_present()) occurrences.
>
> Not-present pages are likely to be pretty common when unmapping.

Ok, then unlikely() only for the test on _PAGE_PROTNONE && !_PAGE_FILE.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
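Hugh's scenario, reduced to a userspace test (standard POSIX calls, nothing
hypothetical): the data behind a PROT_NONE'd anonymous page must survive,
which is why the kernel cannot free the page and the pte has to keep
pointing at it - on i386, via _PAGE_PROTNONE:

	#include <sys/mman.h>
	#include <assert.h>

	int main(void)
	{
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		assert(p != MAP_FAILED);
		p[0] = 42;			/* fault the anonymous page in */
		mprotect(p, 4096, PROT_NONE);	/* page must stay allocated  */
		mprotect(p, 4096, PROT_READ | PROT_WRITE);
		assert(p[0] == 42);		/* the app wants it back intact */
		return 0;
	}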
* Re: [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes
  From: Nick Piggin @ 2006-05-06 10:03 UTC
  To: Blaisorblade
  Cc: Andrew Morton, linux-kernel, Linux Memory Management

Blaisorblade wrote:
> On Tuesday 02 May 2006 05:53, Nick Piggin wrote:
> > blaisorblade@yahoo.it wrote:
> > > pte_present(pte) implies that pte_pfn(pte) is valid. Normally this
> > > holds even with a _PAGE_PROTNONE pte, but not when such a PTE is
> > > installed by the new install_file_pte; previously it stored only
> > > file offsets, not protections, while with these patches it also
> > > stores protections, and can set _PAGE_PROTNONE|_PAGE_FILE.
>
> What could be done is to set a PTE with "no protection" using another
> bit rather than _PAGE_PROTNONE. This wastes one more bit, but is doable.

I see.

> > Why is this combination useful? Can't you just drop the _PAGE_FILE
> > from _PAGE_PROTNONE ptes?
>
> I must think about this, but the semantics are not entirely the same in
> the two cases.

And yes, this won't work. I was misunderstanding what was happening.

I guess your problem is that you're overloading the pte protection bits of
present ptes as protection bits for not-present (file) ptes. I'd rather
you just used a different encoding for file pte protections, then.
"Wasting" a bit seems much more preferable for this (for most people) very
uncommon case than bloating the pte_present check, which is called in
practically every performance-critical inner loop.

That said, if the patch is i386/UML-specific then I don't have much say in
it. If Ingo/Linus and Jeff/yourself, respectively, accept the patch, then
fine. But I think you should drop the comment from the core code. It seems
wrong.
--
SUSE Labs, Novell Inc.
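One possible shape of the "different encoding" Nick suggests. This is
purely illustrative - the bit positions and macros below are invented and
match no real architecture - but it shows the idea: spend pte bits on
saved protections only in the not-present (file) encoding, so present-pte
flags like _PAGE_PROTNONE are never overloaded and pte_present() stays a
single-mask test:

	/* Invented layout for a not-present file pte:
	 *   bit 0: _PAGE_FILE marker (present bit clear)
	 *   bits 1..3: saved r/w/x protections
	 *   bits 4..31: file offset (pgoff)
	 */
	#define FILE_PTE_PROT_SHIFT	1
	#define FILE_PTE_PROT_MASK	(0x7 << FILE_PTE_PROT_SHIFT)
	#define FILE_PTE_OFF_SHIFT	4

	#define pgoff_prot_to_pte(off, prot)				\
		__pte(((off) << FILE_PTE_OFF_SHIFT) |			\
		      (((prot) << FILE_PTE_PROT_SHIFT) &		\
		       FILE_PTE_PROT_MASK) |				\
		      _PAGE_FILE)

The cost is one fewer bit of representable file offset; the benefit is
that the hot pte_present() path is untouched.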
* Re: [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes
  From: Blaisorblade @ 2006-05-07 17:50 UTC
  To: Nick Piggin
  Cc: Andrew Morton, linux-kernel, Linux Memory Management

On Saturday 06 May 2006 12:03, Nick Piggin wrote:
> I guess your problem is that you're overloading the pte protection bits
> of present ptes as protection bits for not-present (file) ptes. I'd
> rather you just used a different encoding for file pte protections,
> then.

Yes, this is what I said above, so we agree; and indeed this overloading
was decided when the present problem didn't trigger, so it can now change.
As detailed in the patch description, the previous PageReserved handling
prevented freeing page 0 and hid this.

> "Wasting" a bit seems much more preferable for this very uncommon case
> (for most people) than bloating the pte_present check, which is called
> in practically every performance-critical inner loop.

Yes, I thought about this problem; I just wasn't sure how bad it was.

> That said, if the patch is i386/UML-specific then I don't have much say
> in it.

It's presently i386/UML-specific, but will probably be extended.
Implementations for some other archs were already sent and I've collected
them (I'll send them afterwards; I've avoided excess bloat for now).

> If Ingo/Linus and Jeff/yourself, respectively, accept the patch, then
> fine. But I think you should drop the comment from the core code. It
> seems wrong.

Yep, I forgot it there - thanks for the reminder; I've now removed it.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
Thread overview: 27+ messages
[not found] <20060430172953.409399000@zion.home.lan>
2006-05-02 3:45 ` [patch 00/14] remap_file_pages protection support Nick Piggin
2006-05-02 3:56 ` Nick Piggin
2006-05-02 11:24 ` Ingo Molnar
2006-05-02 12:19 ` Nick Piggin
2006-05-02 17:16 ` Lee Schermerhorn
2006-05-03 1:20 ` Blaisorblade
2006-05-03 14:35 ` Lee Schermerhorn
2006-05-03 0:25 ` Blaisorblade
2006-05-06 16:05 ` Ulrich Drepper
2006-05-07 4:22 ` Nick Piggin
2006-05-13 14:13 ` Nick Piggin
2006-05-13 18:19 ` Valerie Henson
2006-05-13 22:54 ` Valerie Henson
2006-05-16 13:30 ` Nick Piggin
2006-05-16 13:51 ` Andreas Mohr
2006-05-16 16:31 ` Valerie Henson
2006-05-16 16:47 ` Andreas Mohr
2006-05-17 3:25 ` Nick Piggin
2006-05-17 6:10 ` Blaisorblade
2006-05-16 16:33 ` Valerie Henson
2006-05-03 0:44 ` Blaisorblade
2006-05-06 9:06 ` Nick Piggin
2006-05-06 15:26 ` Ulrich Drepper
[not found] ` <20060430173025.752423000@zion.home.lan>
2006-05-02 3:53 ` [patch 11/14] remap_file_pages protection support: pte_present should not trigger on PTE_FILE PROTNONE ptes Nick Piggin
2006-05-03 1:29 ` Blaisorblade
2006-05-06 10:03 ` Nick Piggin
2006-05-07 17:50 ` Blaisorblade