* Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Balbir Singh @ 2007-10-02  4:21 UTC
To: Andrew Morton; +Cc: linux-kernel, Linux Memory Management List

Andrew Morton wrote:
> memory-controller-add-documentation.patch
> memory-controller-resource-counters-v7.patch
> memory-controller-resource-counters-v7-fix.patch
> memory-controller-containers-setup-v7.patch
> memory-controller-accounting-setup-v7.patch
> memory-controller-memory-accounting-v7.patch
> memory-controller-memory-accounting-v7-fix.patch
> memory-controller-memory-accounting-v7-fix-swapoff-breakage-however.patch
> memory-controller-task-migration-v7.patch
> memory-controller-add-per-container-lru-and-reclaim-v7.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-fix.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-fix-2.patch
> memory-controller-add-per-container-lru-and-reclaim-v7-cleanup.patch
> memory-controller-improve-user-interface.patch
> memory-controller-oom-handling-v7.patch
> memory-controller-oom-handling-v7-vs-oom-killer-stuff.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-cleanup.patch
> memory-controller-add-switch-to-control-what-type-of-pages-to-limit-v7-fix-2.patch
> memory-controller-make-page_referenced-container-aware-v7.patch
> memory-controller-make-charging-gfp-mask-aware.patch
> memory-controller-make-charging-gfp-mask-aware-fix.patch
> memory-controller-bug_on.patch
> mem-controller-gfp-mask-fix.patch
> memcontrol-move-mm_cgroup-to-header-file.patch
> memcontrol-move-oom-task-exclusion-to-tasklist.patch
> memcontrol-move-oom-task-exclusion-to-tasklist-fix.patch
> oom-add-sysctl-to-enable-task-memory-dump.patch
> kswapd-should-only-wait-on-io-if-there-is-io.patch
>
> Hold.  This needs a serious going-over by page reclaim people.

Hi, Andrew,

I mostly agree with your decision.  I am a little concerned, however,
that as we develop and add more features (e.g. better statistics and
forced reclaim), which are very important, the code base gets larger
and the review takes longer :)  I was hopeful of getting the bare
minimal infrastructure for memory control into mainline, so that
review is easy and additional changes can be well reviewed as well.

Here are the pros and cons of merging the memory controller now:

Pros
1. Smaller size, easy to review and merge
2. Incremental development makes it easier to maintain the code

Cons
1. Needs more review, like you said
2. Although the UI is stable, it's a good chance to review it once
   more before merging the code into mainline

Having said that, I'll continue testing the patches and making the
solution more complete and usable.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Hugh Dickins @ 2007-10-02 15:46 UTC
To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

On Tue, 2 Oct 2007, Balbir Singh wrote:
> Andrew Morton wrote:
> > memory-controller-add-documentation.patch
> > ...
> > kswapd-should-only-wait-on-io-if-there-is-io.patch
> >
> > Hold.  This needs a serious going-over by page reclaim people.
>
> I mostly agree with your decision.  I am a little concerned however
> that as we develop and add more features (e.g. better statistics and
> forced reclaim), which are very important, the code base gets larger,
> the review takes longer :)

I agree with putting the memory controller stuff on hold from 2.6.24.

Sorry, Balbir, I've failed to get back to you, still attending to
priorities.  Let me briefly summarize my issue with the mem controller:
you've not yet given enough attention to swap.

I accept that full swap control is something you're intending to add
incrementally later; but the current state doesn't make sense to me.

The problems are swapoff and swapin readahead.  These pull pages into
the swap cache, which are assigned to the cgroup (or the whatever-we-
call-the-remainder-outside-all-the-cgroups) which is running swapoff
or faulting in its own page; yet they very clearly don't (in general)
belong to that cgroup, but to other cgroups which will be discovered
later.

I did try removing the cgroup mods to mm/swap_state.c, so swap pages
get assigned to a cgroup only once it's really known; but that's not
enough by itself, because cgroup RSS reclaim doesn't touch those
pages, so the cgroup can easily OOM much too soon.

I was thinking that you need a "limbo" cgroup for these pages, which
can be attacked for reclaim along with any cgroup being reclaimed,
but from which pages are readily migrated to their real cgroup once
that's known.  But I had to switch over to other work before trying
that out: perhaps the idea doesn't really fly at all.  And it might
well be no longer needed once full mem+swap control is there.

So in the current memory controller, that unuse_pte mem charge I was
originally worried about failing (I hadn't at that point delved in
to see how it tries to reclaim) actually never fails (and never does
anything): the page is already assigned to some cgroup-or-whatever
and is never charged to vma->vm_mm at that point.

And small point: once that is sorted out and the page is properly
assigned in unuse_pte, you'll be needing to pte_unmap_unlock and
pte_offset_map_lock around the mem_cgroup_charge call there - you're
right to call it with GFP_KERNEL, but cannot do so while holding the
page table locked and mapped.  (But because the page lock is held,
there shouldn't be any raciness to dropping and retaking the ptl.)

Hugh
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Balbir Singh @ 2007-10-03  8:13 UTC
To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Tue, 2 Oct 2007, Balbir Singh wrote:
>> Andrew Morton wrote:
>>> memory-controller-add-documentation.patch
>>> ...
>>> kswapd-should-only-wait-on-io-if-there-is-io.patch
>>>
>>> Hold.  This needs a serious going-over by page reclaim people.
>> I mostly agree with your decision.  I am a little concerned however
>> that as we develop and add more features (e.g. better statistics and
>> forced reclaim), which are very important, the code base gets larger,
>> the review takes longer :)
>
> I agree with putting the memory controller stuff on hold from 2.6.24.
>
> Sorry, Balbir, I've failed to get back to you, still attending to
> priorities.  Let me briefly summarize my issue with the mem controller:
> you've not yet given enough attention to swap.

I am open to suggestions and ways and means of making swap control
complete and more usable.

> I accept that full swap control is something you're intending to add
> incrementally later; but the current state doesn't make sense to me.
>
> The problems are swapoff and swapin readahead.  These pull pages into
> the swap cache, which are assigned to the cgroup (or the whatever-we-
> call-the-remainder-outside-all-the-cgroups) which is running swapoff
> or faulting in its own page; yet they very clearly don't (in general)
> belong to that cgroup, but to other cgroups which will be discovered
> later.

I understand what you're trying to say, but with several approaches
that we tried in the past, we found caches the hardest to account
accurately.  IIRC, with readahead, we don't even know whether all the
pages read ahead will be used; that's why we charge everything to the
cgroup that added the page to the cache.

> I did try removing the cgroup mods to mm/swap_state.c, so swap pages
> get assigned to a cgroup only once it's really known; but that's not
> enough by itself, because cgroup RSS reclaim doesn't touch those
> pages, so the cgroup can easily OOM much too soon.  I was thinking
> that you need a "limbo" cgroup for these pages, which can be attacked
> for reclaim along with any cgroup being reclaimed, but from which
> pages are readily migrated to their real cgroup once that's known.

Is migrating the charge to the real cgroup really required?

> But I had to switch over to other work before trying that out:
> perhaps the idea doesn't really fly at all.  And it might well
> be no longer needed once full mem+swap control is there.
>
> So in the current memory controller, that unuse_pte mem charge I was
> originally worried about failing (I hadn't at that point delved in
> to see how it tries to reclaim) actually never fails (and never
> does anything): the page is already assigned to some cgroup-or-
> whatever and is never charged to vma->vm_mm at that point.

Excellent!

> And small point: once that is sorted out and the page is properly
> assigned in unuse_pte, you'll be needing to pte_unmap_unlock and
> pte_offset_map_lock around the mem_cgroup_charge call there -
> you're right to call it with GFP_KERNEL, but cannot do so while
> holding the page table locked and mapped.  (But because the page
> lock is held, there shouldn't be any raciness to dropping and
> retaking the ptl.)

Good catch!  I'll fix that.

> Hugh

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Hugh Dickins @ 2007-10-03 18:47 UTC
To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

On Wed, 3 Oct 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> >
> > Sorry, Balbir, I've failed to get back to you, still attending to
> > priorities.  Let me briefly summarize my issue with the mem controller:
> > you've not yet given enough attention to swap.
>
> I am open to suggestions and ways and means of making swap control
> complete and more usable.

Well, swap control is another subject.  I guess for that you'll need
to track which cgroup each swap page belongs to (rather more expensive
than the current swap_map of unsigned shorts).  And I doubt it'll be
swap control as such that's required, but control of rss+swap.

But here I'm just worrying about how the existence of swap makes
something of a nonsense of your rss control.

> > I accept that full swap control is something you're intending to add
> > incrementally later; but the current state doesn't make sense to me.
> >
> > The problems are swapoff and swapin readahead.  These pull pages into
> > the swap cache, which are assigned to the cgroup (or the whatever-we-
> > call-the-remainder-outside-all-the-cgroups) which is running swapoff
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I'd appreciate it if you'd teach me the right name for that!

> > or faulting in its own page; yet they very clearly don't (in general)
> > belong to that cgroup, but to other cgroups which will be discovered
> > later.
>
> I understand what you're trying to say, but with several approaches that
> we tried in the past, we found caches the hardest to account accurately.
> IIRC, with readahead, we don't even know whether all the pages
> read ahead will be used; that's why we charge everything to the cgroup
> that added the page to the cache.

Yes, readahead is anyway problematic.  My guess is that in the file
cache case, you'll tend not to go too far wrong by charging to the
one that added - though we're all aware that's fairly unsatisfactory.

My point is that in the swap cache case, it's badly wrong: there's
no page more obviously owned by a cgroup than its anonymous pages
(forgetting for a moment that minority shared between cgroups
until copy-on-write), so it's very wrong for swapin readahead
or swapoff to go charging those to another or to no cgroup.

Imagine a cgroup at its rss limit, with more out on swap.  Then
another cgroup does some swap readahead, bringing pages private
to the first into cache.  Or runs swapoff, which actually plugs
them into the rss of the first cgroup, so it goes over limit.

Those are pages we'd want to swap out when the first cgroup
faults to go further over its limit; but they're now not even
identified as belonging to the right cgroup, so won't be found.

> > I did try removing the cgroup mods to mm/swap_state.c, so swap pages
> > get assigned to a cgroup only once it's really known; but that's not
> > enough by itself, because cgroup RSS reclaim doesn't touch those
> > pages, so the cgroup can easily OOM much too soon.  I was thinking
> > that you need a "limbo" cgroup for these pages, which can be attacked
> > for reclaim along with any cgroup being reclaimed, but from which
> > pages are readily migrated to their real cgroup once that's known.
>
> Is migrating the charge to the real cgroup really required?

My answer is definitely yes.  I'm not suggesting that you need
general migration between cgroups at this stage (something for
later quite likely); but I am suggesting you need one pseudo-cgroup
to hold these cases temporarily, and that you cannot properly track
rss without it (if there is any swap).
> > But I had to switch over to other work before trying that out:
> > perhaps the idea doesn't really fly at all.  And it might well
> > be no longer needed once full mem+swap control is there.
> >
> > So in the current memory controller, that unuse_pte mem charge I was
> > originally worried about failing (I hadn't at that point delved in
> > to see how it tries to reclaim) actually never fails (and never
> > does anything): the page is already assigned to some cgroup-or-
> > whatever and is never charged to vma->vm_mm at that point.
>
> Excellent!

Umm, please explain what's excellent about that.

Hugh
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Balbir Singh @ 2007-10-04  4:16 UTC
To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Wed, 3 Oct 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> Sorry, Balbir, I've failed to get back to you, still attending to
>>> priorities.  Let me briefly summarize my issue with the mem controller:
>>> you've not yet given enough attention to swap.
>> I am open to suggestions and ways and means of making swap control
>> complete and more usable.
>
> Well, swap control is another subject.  I guess for that you'll need
> to track which cgroup each swap page belongs to (rather more expensive
> than the current swap_map of unsigned shorts).  And I doubt it'll be
> swap control as such that's required, but control of rss+swap.

I see what you mean now; other people have recommended a per-cgroup
swap file/device.

> But here I'm just worrying about how the existence of swap makes
> something of a nonsense of your rss control.

Ideally, pages would not reside for too long in swap cache (unless
I've misunderstood swap cache or there are special cases for tmpfs/
ramfs).  Once pages have been swapped back in, they get assigned
back to their respective cgroups in do_swap_page() (where we charge
them back to the cgroup).  The swap cache pages will be the first
ones to go, once the cgroup exceeds its limit.

There might be gaps in my understanding, or I might be missing a use
case scenario where things work differently.

>>> I accept that full swap control is something you're intending to add
>>> incrementally later; but the current state doesn't make sense to me.
>>>
>>> The problems are swapoff and swapin readahead.  These pull pages into
>>> the swap cache, which are assigned to the cgroup (or the whatever-we-
>>> call-the-remainder-outside-all-the-cgroups) which is running swapoff
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> I'd appreciate it if you'd teach me the right name for that!

In the past people have used names like "default cgroup"; we could
use the root cgroup as the default cgroup.

>>> or faulting in its own page; yet they very clearly don't (in general)
>>> belong to that cgroup, but to other cgroups which will be discovered
>>> later.
>> I understand what you're trying to say, but with several approaches that
>> we tried in the past, we found caches the hardest to account accurately.
>> IIRC, with readahead, we don't even know whether all the pages
>> read ahead will be used; that's why we charge everything to the cgroup
>> that added the page to the cache.
>
> Yes, readahead is anyway problematic.  My guess is that in the file
> cache case, you'll tend not to go too far wrong by charging to the
> one that added - though we're all aware that's fairly unsatisfactory.
>
> My point is that in the swap cache case, it's badly wrong: there's
> no page more obviously owned by a cgroup than its anonymous pages
> (forgetting for a moment that minority shared between cgroups
> until copy-on-write), so it's very wrong for swapin readahead
> or swapoff to go charging those to another or to no cgroup.
>
> Imagine a cgroup at its rss limit, with more out on swap.  Then
> another cgroup does some swap readahead, bringing pages private
> to the first into cache.  Or runs swapoff, which actually plugs
> them into the rss of the first cgroup, so it goes over limit.
>
> Those are pages we'd want to swap out when the first cgroup
> faults to go further over its limit; but they're now not even
> identified as belonging to the right cgroup, so won't be found.

Won't the right cgroup assignment happen as discussed above?

>>> I did try removing the cgroup mods to mm/swap_state.c, so swap pages
>>> get assigned to a cgroup only once it's really known; but that's not
>>> enough by itself, because cgroup RSS reclaim doesn't touch those
>>> pages, so the cgroup can easily OOM much too soon.  I was thinking
>>> that you need a "limbo" cgroup for these pages, which can be attacked
>>> for reclaim along with any cgroup being reclaimed, but from which
>>> pages are readily migrated to their real cgroup once that's known.
>>>
>> Is migrating the charge to the real cgroup really required?
>
> My answer is definitely yes.  I'm not suggesting that you need
> general migration between cgroups at this stage (something for
> later quite likely); but I am suggesting you need one pseudo-cgroup
> to hold these cases temporarily, and that you cannot properly track
> rss without it (if there is any swap).

If what I understood and discussed earlier is correct, then we don't
need to go this route.  But I think the idea of having a pseudo-cgroup
is interesting (needs more thought).

>>> So in the current memory controller, that unuse_pte mem charge I was
>>> originally worried about failing (I hadn't at that point delved in
>>> to see how it tries to reclaim) actually never fails (and never
>>> does anything): the page is already assigned to some cgroup-or-
>>> whatever and is never charged to vma->vm_mm at that point.
>>>
>> Excellent!
>
> Umm, please explain what's excellent about that.

Nothing really; I was glad that we don't fail, even though we might
assign pages to some other cgroup.  Not really exciting, but not
failing was a relief :-)  In summary, there's nothing excellent
about it.

> Hugh

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Hugh Dickins @ 2007-10-04 13:16 UTC
To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

On Thu, 4 Oct 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> > Well, swap control is another subject.  I guess for that you'll need
> > to track which cgroup each swap page belongs to (rather more expensive
> > than the current swap_map of unsigned shorts).  And I doubt it'll be
> > swap control as such that's required, but control of rss+swap.
>
> I see what you mean now; other people have recommended a per-cgroup
> swap file/device.

Sounds too inflexible, and too many swap areas, to me.  Perhaps the
right answer will fall in between: assign clusters of swap pages to
different cgroups as needed.  But worry about that some other time.

> > But here I'm just worrying about how the existence of swap makes
> > something of a nonsense of your rss control.
>
> Ideally, pages would not reside for too long in swap cache (unless

Thinking particularly of those brought in by swapoff or swap readahead:
some will get attached to mms once accessed, others will simply get
freed when tasks exit or munmap, others will hang around until they
reach the bottom of the LRU and are reclaimed again by memory pressure.

But as your code stands, that'll be total memory pressure: in-cgroup
memory pressure will tend to miss them, since typically they're
assigned to the wrong cgroup; until then their presence is liable
to cause other pages to be reclaimed which ideally should not be.

> I've misunderstood swap cache or there are special cases for tmpfs/
> ramfs).

ramfs pages are always in RAM, never go out to swap, no need to
worry about them in this regard.  But tmpfs pages can indeed go
out to swap, so whatever we come up with needs to make sense
with them too, yes.  I don't think its swapoff/readahead issues
are any harder to handle than the anonymous mapped page case,
but it will need its own code to handle them.

> Once pages have been swapped back in, they get assigned
> back to their respective cgroups in do_swap_page() (where we charge
> them back to the cgroup).

That's where it should happen, yes; but my point is that it very
often does not.  Because the swap cache page (read in as part of
the readaround cluster of some other cgroup, or in swapoff by some
other cgroup) is already assigned to that other cgroup (by the
mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The
page_cgroup exists and the page has already been accounted" route
when mem_cgroup_charge is called from do_swap_page.  Doesn't it?

Are we misunderstanding each other, because I'm assuming
MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED?
Though I can't see that _MAPPED and _CACHED are actually supported,
there being no reference to them outside the enum that defines them.

Or are you deceived by that ifdef NUMA code in swapin_readahead,
which propagates the fantasy that swap allocation follows vma layout?
That nonsense has been around too long, I'll soon be sending a patch
to remove it.

> The swap cache pages will be the first ones to go, once the cgroup
> exceeds its limit.

No, because they're (in general) booked to the wrong cgroup.

> There might be gaps in my understanding, or I might be missing a use
> case scenario where things work differently.
>
> >>> I accept that full swap control is something you're intending to add
> >>> incrementally later; but the current state doesn't make sense to me.
> >>>
> >>> The problems are swapoff and swapin readahead.  These pull pages into
> >>> the swap cache, which are assigned to the cgroup (or the whatever-we-
> >>> call-the-remainder-outside-all-the-cgroups) which is running swapoff
> >        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > I'd appreciate it if you'd teach me the right name for that!
>
> In the past people have used names like "default cgroup"; we could
> use the root cgroup as the default cgroup.

Okay, thanks.

Hugh
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24)

From: Balbir Singh @ 2007-10-05  3:07 UTC
To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Thu, 4 Oct 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> Well, swap control is another subject.  I guess for that you'll need
>>> to track which cgroup each swap page belongs to (rather more expensive
>>> than the current swap_map of unsigned shorts).  And I doubt it'll be
>>> swap control as such that's required, but control of rss+swap.
>> I see what you mean now; other people have recommended a per-cgroup
>> swap file/device.
>
> Sounds too inflexible, and too many swap areas, to me.  Perhaps the
> right answer will fall in between: assign clusters of swap pages to
> different cgroups as needed.  But worry about that some other time.

Yes, depending on the number of cgroups, we'll need to share swap
areas between them.  It requires more work and more thought.

>>> But here I'm just worrying about how the existence of swap makes
>>> something of a nonsense of your rss control.
>>>
>> Ideally, pages would not reside for too long in swap cache (unless
>
> Thinking particularly of those brought in by swapoff or swap readahead:
> some will get attached to mms once accessed, others will simply get
> freed when tasks exit or munmap, others will hang around until they
> reach the bottom of the LRU and are reclaimed again by memory pressure.
>
> But as your code stands, that'll be total memory pressure: in-cgroup
> memory pressure will tend to miss them, since typically they're
> assigned to the wrong cgroup; until then their presence is liable
> to cause other pages to be reclaimed which ideally should not be.

In-cgroup pressure will not affect them, since they are in different
cgroups.  If there is pressure in the cgroup to which they are wrongly
assigned, they would get reclaimed first.

>> I've misunderstood swap cache or there are special cases for tmpfs/
>> ramfs).
>
> ramfs pages are always in RAM, never go out to swap, no need to
> worry about them in this regard.  But tmpfs pages can indeed go
> out to swap, so whatever we come up with needs to make sense
> with them too, yes.  I don't think its swapoff/readahead issues
> are any harder to handle than the anonymous mapped page case,
> but it will need its own code to handle them.
>
>> Once pages have been swapped back in, they get assigned
>> back to their respective cgroups in do_swap_page() (where we charge
>> them back to the cgroup).
>
> That's where it should happen, yes; but my point is that it very
> often does not.  Because the swap cache page (read in as part of
> the readaround cluster of some other cgroup, or in swapoff by some
> other cgroup) is already assigned to that other cgroup (by the
> mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The
> page_cgroup exists and the page has already been accounted" route
> when mem_cgroup_charge is called from do_swap_page.  Doesn't it?

You are right; at this point I am beginning to wonder whether I should
account for the swap cache at all.  We account for the pages in RSS
when the page comes back into the page table(s) via do_swap_page.
If we believe that the swap cache is transitional, and the current
expected working behaviour does not seem right or is hard to fix, it
might be easy to ignore unuse_pte() and add/remove_from_swap_cache()
for accounting and control.

The expected working behaviour of the memory controller is that
currently, as you point out, several pages get accounted to the cgroup
that initiates swapin readahead or swapoff.  On cgroup pressure (in
the one that initiated swapin or swapoff), the cgroup would discard
these pages first.  These pages are discarded from the cgroup, but
still live on the global LRU.  When the original cgroup is under
pressure, these pages might not be affected, as they belong to a
different cgroup, which might not be under any sort of pressure.

> Are we misunderstanding each other, because I'm assuming
> MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED?
> Though I can't see that _MAPPED and _CACHED are actually supported,
> there being no reference to them outside the enum that defines them.

I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our
discussion.  The accounting is split into mem_cgroup_charge() and
mem_cgroup_cache_charge(); it is when charging the caches that we
check the control_type.

> Or are you deceived by that ifdef NUMA code in swapin_readahead,
> which propagates the fantasy that swap allocation follows vma layout?
> That nonsense has been around too long, I'll soon be sending a patch
> to remove it.

The swapin readahead code under #ifdef NUMA is very confusing.  I also
noticed another confusing thing during my test: swap cache does not
drop to 0, even though I've disabled all swap using swapoff.  Maybe
those are tmpfs pages.  The other interesting thing I tried was running
swapoff after a cgroup went over its limit; the swapoff succeeded, but
I see strange numbers for free swap.  I'll start another thread after
investigating a bit more.

>> The swap cache pages will be the first ones to go, once the cgroup
>> exceeds its limit.
>
> No, because they're (in general) booked to the wrong cgroup.

I meant for the wrong cgroup: in the wrong cgroup, these will be the
first set of pages to be reclaimed.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-05 3:07 ` Balbir Singh @ 2007-10-07 17:41 ` Hugh Dickins 2007-10-08 2:54 ` Balbir Singh 0 siblings, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2007-10-07 17:41 UTC (permalink / raw) To: Balbir Singh; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On Fri, 5 Oct 2007, Balbir Singh wrote: > Hugh Dickins wrote: > > > > That's where it should happen, yes; but my point is that it very > > often does not. Because the swap cache page (read in as part of > > the readaround cluster of some other cgroup, or in swapoff by some > > other cgroup) is already assigned to that other cgroup (by the > > mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The > > page_cgroup exists and the page has already been accounted" route > > when mem_cgroup_charge is called from do_swap_page. Doesn't it? > > > > You are right, at this point I am beginning to wonder if I should > account for the swap cache at all? We account for the pages in RSS > and when the page comes back into the page table(s) via do_swap_page. > If we believe that the swap cache is transitional and the current > expected working behaviour does not seem right or hard to fix, > it might be easy to ignore unuse_pte() and add/remove_from_swap_cache() > for accounting and control. It would be wrong to ignore the unuse_pte() case: what it's intending to do is correct, it's just being prevented by the swapcache issue from doing what it intends at present. (Though I'm not thrilled with the idea of it causing an admin's swapoff to fail because of a cgroup reaching mem limit there, I do agree with your earlier argument that that's the right thing to happen, and it's up to the admin to fix things up - my original objection came from not realizing that normally the cgroup will reclaim from itself to free its mem. Hmm, would the charge fail or the mm get OOM'ed?) 
Ignoring add_to/remove_from swap cache is what I've tried before, and again today. It's not enough: if you try running a memhog (something that allocates and touches more memory than the cgroup is allowed, relying on pushing out to swap to complete), then that works well with the present accounting in add_to/remove_from swap cache, but it OOMs once I remove the memcontrol mods from mm/swap_state.c. I keep going back to investigate why, keep on thinking I understand it, then later realize I don't. Please give it a try, I hope you've got better mental models than I have. And I don't think it will be enough to handle shmem/tmpfs either; but won't worry about that until we've properly understood why exempting swapcache leads to those OOMs, and fixed that up. > > Are we misunderstanding each other, because I'm assuming > > MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED? > > though I can't see that _MAPPED and _CACHED are actually supported, > > there being no reference to them outside the enum that defines them. > > I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our > discussion. The accounting is split into mem_cgroup_charge() and > mem_cgroup_cache_charge(). While charging the caches is when we > check for the control_type. It checks MEM_CGROUP_TYPE_ALL there, yes; but I can't find anything checking for either MEM_CGROUP_TYPE_MAPPED or MEM_CGROUP_TYPE_CACHED. (Or is it hidden in one of those preprocessor ## things which frustrate both my greps and me!?) > > Or are you deceived by that ifdef NUMA code in swapin_readahead, > > which propagates the fantasy that swap allocation follows vma layout? > > That nonsense has been around too long, I'll soon be sending a patch > > to remove it. > > The swapin readahead code under #ifdef NUMA is very confusing. I sent a patch to linux-mm last night, to remove that confusion. 
> I also > noticed another confusing thing during my test, swap cache does not > drop to 0, even though I've disabled all swap using swapoff. Maybe > those are tmpfs pages. The other interesting thing I tried was running > swapoff after a cgroup went over its limit, the swapoff succeeded, > but I see strange numbers for free swap. I'll start another thread > after investigating a bit more. Those indeed are strange behaviours (if the swapoff really has succeeded, rather than lying), I've not seen such and don't have an explanation. tmpfs doesn't add any weirdness there: when there's no swap, there can be no swap cache. Or is the swapoff still in progress? While it's busy, we keep /proc/meminfo looking sensible, but <Alt><SysRq>m can show negative free swap (IIRC). I'll be interested to hear what your investigation shows. Hugh
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-07 17:41 ` Hugh Dickins @ 2007-10-08 2:54 ` Balbir Singh 0 siblings, 0 replies; 32+ messages in thread From: Balbir Singh @ 2007-10-08 2:54 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm Hugh Dickins wrote: > On Fri, 5 Oct 2007, Balbir Singh wrote: >> Hugh Dickins wrote: >>> That's where it should happen, yes; but my point is that it very >>> often does not. Because the swap cache page (read in as part of >>> the readaround cluster of some other cgroup, or in swapoff by some >>> other cgroup) is already assigned to that other cgroup (by the >>> mem_cgroup_cache_charge in __add_to_swap_cache), and so goes "The >>> page_cgroup exists and the page has already been accounted" route >>> when mem_cgroup_charge is called from do_swap_page. Doesn't it? >>> >> You are right, at this point I am beginning to wonder if I should >> account for the swap cache at all? We account for the pages in RSS >> and when the page comes back into the page table(s) via do_swap_page. >> If we believe that the swap cache is transitional and the current >> expected working behaviour does not seem right or hard to fix, >> it might be easy to ignore unuse_pte() and add/remove_from_swap_cache() >> for accounting and control. > > It would be wrong to ignore the unuse_pte() case: what it's intending > to do is correct, it's just being prevented by the swapcache issue > from doing what it intends at present. > OK > (Though I'm not thrilled with the idea of it causing an admin's > swapoff to fail because of a cgroup reaching mem limit there, I do > agree with your earlier argument that that's the right thing to happen, > and it's up to the admin to fix things up - my original objection came > from not realizing that normally the cgroup will reclaim from itself > to free its mem. I'm glad we have that sorted out. Hmm, would the charge fail or the mm get OOM'ed?) 
> Right now, we OOM if charging and reclaim fails. > Ignoring add_to/remove_from swap cache is what I've tried before, > and again today. It's not enough: if you try running a memhog > (something that allocates and touches more memory than the cgroup > is allowed, relying on pushing out to swap to complete), then that > works well with the present accounting in add_to/remove_from swap > cache, but it OOMs once I remove the memcontrol mods from > mm/swap_state.c. I keep going back to investigate why, keep on > thinking I understand it, then later realize I don't. Please > give it a try, I hope you've got better mental models than I have. > I will try it. Another way to try it is to set memory.control_type to 1, that removes charging of cache pages (both swap cache and page cache). I just did a quick small test on the memory controller with swap cache changes disabled and it worked fine for me on my UML image (without OOMing). I'll try the same test on a bigger box. Disabling swap does usually cause an OOM for workloads using anonymous pages if the cgroup goes over its limit (since the cgroup cannot push out memory). > And I don't think it will be enough to handle shmem/tmpfs either; > but won't worry about that until we've properly understood why > exempting swapcache leads to those OOMs, and fixed that up. > Sure. >>> Are we misunderstanding each other, because I'm assuming >>> MEM_CGROUP_TYPE_ALL and you're assuming MEM_CGROUP_TYPE_MAPPED? >>> though I can't see that _MAPPED and _CACHED are actually supported, >>> there being no reference to them outside the enum that defines them. >> I am also assuming MEM_CGROUP_TYPE_ALL for the purpose of our >> discussion. The accounting is split into mem_cgroup_charge() and >> mem_cgroup_cache_charge(). While charging the caches is when we >> check for the control_type. > > It checks MEM_CGROUP_TYPE_ALL there, yes; but I can't find anything > checking for either MEM_CGROUP_TYPE_MAPPED or MEM_CGROUP_TYPE_CACHED. 
> (Or is it hidden in one of those preprocessor ## things which frustrate > both my greps and me!?) > MEM_CGROUP_TYPE_ALL is defined to be (MEM_CGROUP_TYPE_CACHED | MEM_CGROUP_TYPE_MAPPED). I'll make that more explicit with a patch. When the type is not MEM_CGROUP_TYPE_ALL, cached pages are ignored. >>> Or are you deceived by that ifdef NUMA code in swapin_readahead, >>> which propagates the fantasy that swap allocation follows vma layout? >>> That nonsense has been around too long, I'll soon be sending a patch >>> to remove it. >> The swapin readahead code under #ifdef NUMA is very confusing. > > I sent a patch to linux-mm last night, to remove that confusion. > Thanks, I saw that. >> I also >> noticed another confusing thing during my test, swap cache does not >> drop to 0, even though I've disabled all swap using swapoff. Maybe >> those are tmpfs pages. The other interesting thing I tried was running >> swapoff after a cgroup went over its limit, the swapoff succeeded, >> but I see strange numbers for free swap. I'll start another thread >> after investigating a bit more. > > Those indeed are strange behaviours (if the swapoff really has > succeeded, rather than lying), I've not seen such and don't have an > explanation. tmpfs doesn't add any weirdness there: when there's > no swap, there can be no swap cache. Or is the swapoff still in > progress? While it's busy, we keep /proc/meminfo looking sensible, > but <Alt><SysRq>m can show negative free swap (IIRC). > > I'll be interested to hear what your investigation shows. > With the new OOM killer changes, I see negative swap. When I run swapoff with a memory hogger workload, I see (after swapoff succeeds) .... Swap cache: add 473215, delete 473214, find 31744/36688, race 0+0 Free swap = 18446744073709105092kB Total swap = 0kB Free swap: -446524kB ... 
> Hugh -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-02 15:46 ` Hugh Dickins 2007-10-03 8:13 ` Balbir Singh @ 2007-10-04 16:10 ` Paul Menage 1 sibling, 0 replies; 32+ messages in thread From: Paul Menage @ 2007-10-04 16:10 UTC (permalink / raw) To: Hugh Dickins Cc: Balbir Singh, Andrew Morton, Pavel Emelianov, linux-kernel, linux-mm On 10/2/07, Hugh Dickins <hugh@veritas.com> wrote: > > I accept that full swap control is something you're intending to add > incrementally later; but the current state doesn't make sense to me. One comment on swap - ideally it should be a separate subsystem from the memory controller. That way people who are using cpusets to provide memory isolation (rather than using the page-based memory controller) can also get swap isolation. Paul
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh 2007-10-02 15:46 ` Hugh Dickins @ 2007-10-10 21:07 ` Rik van Riel 2007-10-11 6:33 ` Balbir Singh 1 sibling, 1 reply; 32+ messages in thread From: Rik van Riel @ 2007-10-10 21:07 UTC (permalink / raw) To: balbir; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List On Tue, 02 Oct 2007 09:51:11 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > I was hopeful of getting the bare minimal infrastructure for memory > control in mainline, so that review is easy and additional changes > can be well reviewed as well. I am not yet convinced that the way the memory controller code and lumpy reclaim have been merged is correct. I am combing through the code now and will send in a patch when I figure out if/what is wrong. I ran into this because I'm trying to merge the split VM code up to the latest -mm... -- All Rights Reversed
* Re: Memory controller merge (was Re: -mm merge plans for 2.6.24) 2007-10-10 21:07 ` Rik van Riel @ 2007-10-11 6:33 ` Balbir Singh 0 siblings, 0 replies; 32+ messages in thread From: Balbir Singh @ 2007-10-11 6:33 UTC (permalink / raw) To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, Linux Memory Management List Rik van Riel wrote: > On Tue, 02 Oct 2007 09:51:11 +0530 > Balbir Singh <balbir@linux.vnet.ibm.com> wrote: > >> I was hopeful of getting the bare minimal infrastructure for memory >> control in mainline, so that review is easy and additional changes >> can be well reviewed as well. > > I am not yet convinced that the way the memory controller code and > lumpy reclaim have been merged is correct. I am combing through the > code now and will send in a patch when I figure out if/what is wrong. > Hi, Rik, Do you mean the way the memory controller and lumpy reclaim work together? The reclaim in memory controller (on hitting the limit) is not lumpy. Would you like to see that change? Please do share your findings in the form of comments or patches. > I ran into this because I'm trying to merge the split VM code up to > the latest -mm... > Interesting, I'll see if I can find some spare test cycles to help test this code. -- Warm Regards, Balbir Singh Linux Technology Center IBM, ISTL
* kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] [not found] <20071001142222.fcaa8d57.akpm@linux-foundation.org> 2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh @ 2007-10-02 16:06 ` Hugh Dickins 2007-10-02 9:10 ` Nick Piggin 2007-10-02 18:38 ` Mel Gorman 2007-10-02 16:21 ` new aops merge " Hugh Dickins 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin 3 siblings, 2 replies; 32+ messages in thread From: Hugh Dickins @ 2007-10-02 16:06 UTC (permalink / raw) To: Andrew Morton; +Cc: Christoph Lameter, Mel Gorman, linux-kernel, linux-mm On Mon, 1 Oct 2007, Andrew Morton wrote: > # > # slub && antifrag > # > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > slub-exploit-page-mobility-to-increase-allocation-order.patch > slub-reduce-antifrag-max-order.patch > > I think this stuff is in the "mm stuff we don't want to merge" category. > If so, I really should have dropped it ages ago. I agree. I spent a while last week bisecting down to see why my heavily swapping loads take 30%-60% longer with -mm than mainline, and it was here that they went bad. Trying to keep higher orders free is costly. On the other hand, hasn't SLUB efficiency been built on the expectation that higher orders can be used? And it would be a twisted shame for high performance to be held back by some idiot's swapping load. Hugh
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins @ 2007-10-02 9:10 ` Nick Piggin 2007-10-02 18:38 ` Mel Gorman 1 sibling, 0 replies; 32+ messages in thread From: Nick Piggin @ 2007-10-02 9:10 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Christoph Lameter, Mel Gorman, linux-kernel, linux-mm On Wednesday 03 October 2007 02:06, Hugh Dickins wrote: > On Mon, 1 Oct 2007, Andrew Morton wrote: > > # > > # slub && antifrag > > # > > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch > > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > > slub-exploit-page-mobility-to-increase-allocation-order.patch > > slub-reduce-antifrag-max-order.patch > > > > I think this stuff is in the "mm stuff we don't want to merge" > > category. If so, I really should have dropped it ages ago. > > I agree. I spent a while last week bisecting down to see why my heavily > swapping loads take 30%-60% longer with -mm than mainline, and it was > here that they went bad. Trying to keep higher orders free is costly. Yeah, no there's no way we'd merge that. > On the other hand, hasn't SLUB efficiency been built on the expectation > that higher orders can be used? And it would be a twisted shame for > high performance to be held back by some idiot's swapping load. IMO it's a bad idea to create all these dependencies like this. If SLUB can get _more_ performance out of using higher order allocations, then fine. If it is starting off at a disadvantage at the same order, then that should be fixed first, right?
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins 2007-10-02 9:10 ` Nick Piggin @ 2007-10-02 18:38 ` Mel Gorman 2007-10-02 18:28 ` Christoph Lameter 1 sibling, 1 reply; 32+ messages in thread From: Mel Gorman @ 2007-10-02 18:38 UTC (permalink / raw) To: Hugh Dickins; +Cc: Andrew Morton, Christoph Lameter, linux-kernel, linux-mm On Tue, 2007-10-02 at 17:06 +0100, Hugh Dickins wrote: > On Mon, 1 Oct 2007, Andrew Morton wrote: > > # > > # slub && antifrag > > # > > have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch > > only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch > > slub-exploit-page-mobility-to-increase-allocation-order.patch > > slub-reduce-antifrag-max-order.patch > > > > I think this stuff is in the "mm stuff we don't want to merge" category. > > If so, I really should have dropped it ages ago. > > I agree. I spent a while last week bisecting down to see why my heavily > swapping loads take 30%-60% longer with -mm than mainline, and it was > here that they went bad. Trying to keep higher orders free is costly. > Very interesting. I had agreed with these patches being pulled but it was simply on the grounds that there was no agreement it was the right thing to do. It was best to have mainline and -mm behave the same from a fragmentation perspective and revisit this idea from scratch. That it affects swapping loads is news so thanks for that. > On the other hand, hasn't SLUB efficiency been built on the expectation > that higher orders can be used? And it would be a twisted shame for > high performance to be held back by some idiot's swapping load. > My belief is that SLUB can still use the higher orders if configured to do so at boot-time. The loss of these patches means it won't try and do it automatically. Christoph will chime in I'm sure. 
-- Mel Gorman
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 18:38 ` Mel Gorman @ 2007-10-02 18:28 ` Christoph Lameter 2007-10-03 0:37 ` Christoph Lameter 0 siblings, 1 reply; 32+ messages in thread From: Christoph Lameter @ 2007-10-02 18:28 UTC (permalink / raw) To: Mel Gorman; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, linux-mm On Tue, 2 Oct 2007, Mel Gorman wrote: > > I agree. I spent a while last week bisecting down to see why my heavily > > swapping loads take 30%-60% longer with -mm than mainline, and it was > > here that they went bad. Trying to keep higher orders free is costly. The larger order allocations may cause excessive reclaim under certain circumstances. Reclaim will continue to evict pages until a larger order page can be coalesced. And it seems that this eviction is not that well targeted at this point. So lots of pages may be needlessly evicted. > > On the other hand, hasn't SLUB efficiency been built on the expectation > > that higher orders can be used? And it would be a twisted shame for > > high performance to be held back by some idiot's swapping load. > > > > My belief is that SLUB can still use the higher orders if configured to > do so at boot-time. The loss of these patches means it won't try and do > it automatically. Christoph will chime in I'm sure. You can still manually configure those at boot time via slub_max_order etc. I think Mel and I have to rethink how to do these efficiently. Mel has some ideas and there is some talk about using the vmalloc fallback to insure that things always work. Probably we may have to tune things so that fallback is chosen if reclaim cannot get us the larger order page with reasonable effort. The maximum order of allocation used by SLUB may have to depend on the number of page structs in the system since small systems (128M was the case that Peter found) can easier get into trouble. SLAB has similar measures to avoid order 1 allocations for small systems below 32M. 
* Re: kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] 2007-10-02 18:28 ` Christoph Lameter @ 2007-10-03 0:37 ` Christoph Lameter 0 siblings, 0 replies; 32+ messages in thread From: Christoph Lameter @ 2007-10-03 0:37 UTC (permalink / raw) To: Mel Gorman; +Cc: Hugh Dickins, Andrew Morton, linux-kernel, linux-mm On Tue, 2 Oct 2007, Christoph Lameter wrote: > The maximum order of allocation used by SLUB may have to depend on the > number of page structs in the system since small systems (128M was the > case that Peter found) can easier get into trouble. SLAB has similar > measures to avoid order 1 allocations for small systems below 32M. A patch like this? This is based on the number of page structs on the system. Maybe it needs to be based on the number of MAX_ORDER blocks for antifrag? SLUB: Determine slub_max_order depending on the number of pages available Determine the maximum order to be used for slabs and the mininum desired number of objects in a slab from the amount of pages that a system has available (like SLAB does for the order 1/0 distinction). For systems with less than 128M only use order 0 allocations (SLAB does that for <32M only). The order 0 config is useful for small systems to minimize the memory used. Memory easily fragments since we have less than 32k pages to play with. Order 0 insures that higher order allocations are minimized (Larger orders must still be used for objects that do not fit into order 0 pages). Then step up to order 1 for systems < 256000 pages (1G) Order 2 limit to systems < 1000000 page structs (4G) Order 3 for systems larger than that. 
Signed-off-by: Christoph Lameter <clameter@sgi.com> --- mm/slub.c | 49 +++++++++++++++++++++++++------------------------ 1 file changed, 25 insertions(+), 24 deletions(-) Index: linux-2.6/mm/slub.c =================================================================== --- linux-2.6.orig/mm/slub.c 2007-10-02 09:26:16.000000000 -0700 +++ linux-2.6/mm/slub.c 2007-10-02 16:40:22.000000000 -0700 @@ -153,25 +153,6 @@ static inline void ClearSlabDebug(struct /* Enable to test recovery from slab corruption on boot */ #undef SLUB_RESILIENCY_TEST -#if PAGE_SHIFT <= 12 - -/* - * Small page size. Make sure that we do not fragment memory - */ -#define DEFAULT_MAX_ORDER 1 -#define DEFAULT_MIN_OBJECTS 4 - -#else - -/* - * Large page machines are customarily able to handle larger - * page orders. - */ -#define DEFAULT_MAX_ORDER 2 -#define DEFAULT_MIN_OBJECTS 8 - -#endif - /* * Mininum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. @@ -1718,8 +1699,9 @@ static struct page *get_object_page(cons * take the list_lock. */ static int slub_min_order; -static int slub_max_order = DEFAULT_MAX_ORDER; -static int slub_min_objects = DEFAULT_MIN_OBJECTS; +static int slub_max_order; +static int slub_min_objects = 4; +static int manual; /* * Merge control. If this is set then no merging of slab caches will occur. 
@@ -2237,7 +2219,7 @@ static struct kmem_cache *kmalloc_caches static int __init setup_slub_min_order(char *str) { get_option (&str, &slub_min_order); - + manual = 1; return 1; } @@ -2246,7 +2228,7 @@ __setup("slub_min_order=", setup_slub_mi static int __init setup_slub_max_order(char *str) { get_option (&str, &slub_max_order); - + manual = 1; return 1; } @@ -2255,7 +2237,7 @@ __setup("slub_max_order=", setup_slub_ma static int __init setup_slub_min_objects(char *str) { get_option (&str, &slub_min_objects); - + manual = 1; return 1; } @@ -2566,6 +2548,16 @@ int kmem_cache_shrink(struct kmem_cache } EXPORT_SYMBOL(kmem_cache_shrink); +/* + * Table to autotune the maximum slab order based on the number of pages + * that the system has available. + */ +static unsigned long __initdata phys_pages_for_order[PAGE_ALLOC_COSTLY_ORDER] = { + 32768, /* >128M if using 4K pages, >512M (16k), >2G (64k) */ + 256000, /* >1G if using 4k pages, >4G (16k), >16G (64k) */ + 1000000 /* >4G if using 4k pages, >16G (16k), >64G (64k) */ +}; + /******************************************************************** * Basic setup of slabs *******************************************************************/ @@ -2575,6 +2567,15 @@ void __init kmem_cache_init(void) int i; int caches = 0; + if (!manual) { + /* No manual parameters. Autotune for system */ + for (i = 0; i < PAGE_ALLOC_COSTLY_ORDER; i++) + if (num_physpages > phys_pages_for_order[i]) { + slub_max_order++; + slub_min_objects <<= 1; + } + } + #ifdef CONFIG_NUMA /* * Must first have the slab cache available for the allocations of the -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 32+ messages in thread
* new aops merge [was Re: -mm merge plans for 2.6.24] [not found] <20071001142222.fcaa8d57.akpm@linux-foundation.org> 2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh 2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins @ 2007-10-02 16:21 ` Hugh Dickins 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin 3 siblings, 0 replies; 32+ messages in thread From: Hugh Dickins @ 2007-10-02 16:21 UTC (permalink / raw) To: Andrew Morton; +Cc: Nick Piggin, linux-kernel, linux-mm On Mon, 1 Oct 2007, Andrew Morton wrote: > fs-introduce-write_begin-write_end-and-perform_write-aops.patch > introduce-write_begin-write_end-aops-important-fix.patch > introduce-write_begin-write_end-aops-fix2.patch > deny-partial-write-for-loop-dev-fd.patch > mm-restore-kernel_ds-optimisations.patch > implement-simple-fs-aops.patch > implement-simple-fs-aops-fix.patch > ... > fs-remove-some-aop_truncated_page.patch > > Merge Good, fine by me; but forces me to confess, with abject shame, that I still haven't sent you some shmem/tmpfs fixes/cleanups (currently intermingled with some other stuff in my tree, I'm still disentangling). Nothing so bad as to mess up a bisection, but my loop-over-tmpfs tests hang without passing gfp_mask down and down to add_to_swap_cache; and a few other bits. I'll get back on to it. Hugh
* remove zero_page (was Re: -mm merge plans for 2.6.24) [not found] <20071001142222.fcaa8d57.akpm@linux-foundation.org> ` (2 preceding siblings ...) 2007-10-02 16:21 ` new aops merge " Hugh Dickins @ 2007-10-02 17:45 ` Nick Piggin 2007-10-03 10:58 ` Andrew Morton 2007-10-03 15:21 ` Linus Torvalds 3 siblings, 2 replies; 32+ messages in thread From: Nick Piggin @ 2007-10-02 17:45 UTC (permalink / raw) To: Andrew Morton, Torvalds, Linus, linux-mm; +Cc: linux-kernel On Tuesday 02 October 2007 07:22, Andrew Morton wrote: > remove-zero_page.patch > > Linus dislikes it. Probably drop it. I don't know if Linus actually disliked the patch itself, or disliked my (maybe confusingly worded) rationale? To clarify: it is not zero_page that fundamentally causes a problem, but it is a problem that was exposed when I rationalised the page refcounting in the kernel (and mapcounting in the mm). I see about 4 things we can do: 1. Nothing 2. Remove zero_page 3. Reintroduce some refcount special-casing for the zero page 4. zero_page per-node or per-cpu or whatever 1 and 2 kind of imply that nothing much sane should use the zero_page much (the former also implies that we don't care much about those who do, but in that case, why not go for code removal?). 3 and 4 are if we think there are valid heavy users of zero page, or we are worried about hurting badly written apps by removing it. If the former, I'd love to hear about them; if the latter, then it definitely is a valid concern and I have a patch to avoid refcounting (but if this is the case then I do hope that one day we can eventually remove it). 
> mm-use-pagevec-to-rotate-reclaimable-page.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-2.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-function-declaration.patch > mm-use-pagevec-to-rotate-reclaimable-page-fix-bug-at-include-linux-mmh220.patch > mm-use-pagevec-to-rotate-reclaimable-page-kill-redundancy-in-rotate_reclaimable_page.patch > mm-use-pagevec-to-rotate-reclaimable-page-move_tail_pages-into-lru_add_drain.patch > > I guess I'll merge this. Would be nice to have wider performance testing > but I guess it'll be easy enough to undo. Care to give it one more round through -mm? Is it easy enough to keep? I haven't had a chance to review it, which I'd like to do at some point (and I don't think it would hurt to have a bit more testing).
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin @ 2007-10-03 10:58 ` Andrew Morton 2007-10-03 15:21 ` Linus Torvalds 0 siblings, 0 replies; 32+ messages in thread From: Andrew Morton @ 2007-10-03 10:58 UTC (permalink / raw) To: Nick Piggin; +Cc: Torvalds, Linus, linux-mm, linux-kernel On Wed, 3 Oct 2007 03:45:09 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote: > > mm-use-pagevec-to-rotate-reclaimable-page.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-2.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-function-declaration.patch > > mm-use-pagevec-to-rotate-reclaimable-page-fix-bug-at-include-linux-mmh220.patch > > mm-use-pagevec-to-rotate-reclaimable-page-kill-redundancy-in-rotate_reclaimable_page.patch > > mm-use-pagevec-to-rotate-reclaimable-page-move_tail_pages-into-lru_add_drain.patch > > > > I guess I'll merge this. Would be nice to have wider performance testing > > but I guess it'll be easy enough to undo. > > Care to give it one more round through -mm? Is it easy enough to > keep? Yup. Nobody has done much with that code in ages. > I haven't had a chance to review it, which I'd like to do at some > point (and I don't think it would hurt to have a bit more testing). Sure.
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin 2007-10-03 10:58 ` Andrew Morton @ 2007-10-03 15:21 ` Linus Torvalds 2007-10-08 15:17 ` Nick Piggin 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2007-10-03 15:21 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, linux-mm, linux-kernel On Wed, 3 Oct 2007, Nick Piggin wrote: > > I don't know if Linus actually disliked the patch itself, or disliked > my (maybe confusingly worded) rationale? Yes. I'd happily accept the patch, but I'd want it clarified and made obvious what the problem was - and it wasn't the zero page itself, it was a regression in the VM that made it less palatable. I also thought that there were potentially better solutions, namely to simply avoid the VM regression, but I also acknowledge that they may not be worth it - I just want them to be on the table. In short: the real cost of the zero page was the reference counting on the page that we do these days. For example, I really do believe that the problem could fairly easily be fixed by simply not considering zero_page to be a "vm_normal_page()". We already *do* have support for pages not getting ref-counted (since we need it for other things), and I think that zero_page very naturally falls into exactly that situation. So in many ways, I would think that turning zero-page into a nonrefcounted page (the same way we really do have to do for other things anyway) would be the much more *direct* solution, and in many ways the obvious one. HOWEVER - if people think that it's easier to remove zero_page, and want to do it for other reasons, *AND* can hopefully even back up the claim that it never matters with numbers (ie that the extra pagefaults just make the whole zero-page thing pointless), then I'd certainly accept the patch. 
I'd just want the patch *description* to then also be correct, and blame the right situation, instead of blaming zero-page itself. Linus
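The cost Linus points at, every fault and teardown doing an atomic get/put on the one shared ZERO_PAGE's struct page, can be sketched in plain userspace C. This is an illustrative analogy only, not kernel code: the shared atomic below stands in for the page's reference count, and all names are made up for the example.

```c
/*
 * Userspace sketch of the contention pattern under discussion: many
 * CPUs doing a "get" and a "put" on one shared counter means every
 * thread writes the same cacheline.  The shared atomic stands in for
 * ZERO_PAGE's struct-page refcount; nothing here is a kernel API.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define ITERS    200000L

static atomic_long shared_refcount;   /* stand-in for the page's _count */

static void *bounce_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++) {
        /* "get_page()" then "put_page()": two RMWs on one hot line */
        atomic_fetch_add_explicit(&shared_refcount, 1, memory_order_relaxed);
        atomic_fetch_sub_explicit(&shared_refcount, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the workers; with balanced gets/puts the count ends at zero. */
long run_shared_refcount(void)
{
    pthread_t tids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, bounce_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&shared_refcount);
}
```

The counter's final value is always zero; the cost is purely the cacheline ping-ponging between CPUs, which is the same shape of overhead as refcounting one global page on a large NUMA machine.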
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-03 15:21 ` Linus Torvalds @ 2007-10-08 15:17 ` Nick Piggin 2007-10-09 13:00 ` Hugh Dickins 2007-10-09 14:52 ` Linus Torvalds 0 siblings, 2 replies; 32+ messages in thread From: Nick Piggin @ 2007-10-08 15:17 UTC (permalink / raw) To: Linus Torvalds, Hugh Dickins; +Cc: Andrew Morton, linux-mm, linux-kernel On Thursday 04 October 2007 01:21, Linus Torvalds wrote: > On Wed, 3 Oct 2007, Nick Piggin wrote: > > I don't know if Linus actually disliked the patch itself, or disliked > > my (maybe confusingly worded) rationale? > > Yes. I'd happily accept the patch, but I'd want it clarified and made > obvious what the problem was - and it wasn't the zero page itself, it was > a regression in the VM that made it less palatable. OK, revised changelog at the end of this mail... > I also thought that there were potentially better solutions, namely to > simply avoid the VM regression, but I also acknowledge that they may not > be worth it - I just want them to be on the table. > > In short: the real cost of the zero page was the reference counting on the > page that we do these days. For example, I really do believe that the > problem could fairly easily be fixed by simply not considering zero_page > to be a "vm_normal_page()". We already *do* have support for pages not > getting ref-counted (since we need it for other things), and I think that > zero_page very naturally falls into exactly that situation. > > So in many ways, I would think that turning zero-page into a nonrefcounted > page (the same way we really do have to do for other things anyway) would > be the much more *direct* solution, and in many ways the obvious one. That was my first approach. It isn't completely trivial, but vm_normal_page() does play a part (but we end up needing a vm_normal_page() variant -- IIRC vm_normal_or_zero_page()). 
But taken as a whole, non-refcounted zero_page is obviously a lot more work than no zero page at all :) > HOWEVER - if people think that it's easier to remove zero_page, and want > to do it for other reasons, *AND* can hopefully even back up the claim > that it never matters with numbers (ie that the extra pagefaults just make > the whole zero-page thing pointless), then I'd certainly accept the patch. I have done some tests which indicate a couple of very basic common tools don't do much zero-page activity (ie. kbuild). And also combined with some logical arguments to say that a "sane" app wouldn't be using zero_page much. (basically -- if the app cares about memory or cache footprint and is using many pages of zeroes, then it should have a more compressed representation of zeroes anyway). However there is a window for some "insane" code to regress without the zero_page. I'm not arguing that we don't care about those, however I have no way to guarantee they don't exist. I hope we wouldn't get a potentially useless complexity like this stuck in the VM forever just because we don't _know_ whether it's useful to anybody... How about something like this? --- From: Nick Piggin <npiggin@suse.de> The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!). 
There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely
I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see. Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used). As a sanity check -- measuring on my desktop system, there are never many mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not increase much without it. When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second, and 2,000 bounces of the ZERO_PAGE struct page cacheline per second when running kbuild, while saving less than 1 page clearing operation per second (even 1 page clear is far cheaper than a thousand cacheline bounces between CPUs). Of course, neither the logical argument nor these checks give anything like a guarantee of no regressions. However, I think this is a reasonable opportunity to remove the ZERO_PAGE from the pagefault path. The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except complexity and useless benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity.
We can look at replacing them all and ripping out ZERO_PAGE completely if/when this patch gets in.
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-08 15:17 ` Nick Piggin @ 2007-10-09 13:00 ` Hugh Dickins 2007-10-09 14:52 ` Linus Torvalds 1 sibling, 0 replies; 32+ messages in thread From: Hugh Dickins @ 2007-10-09 13:00 UTC (permalink / raw) To: Nick Piggin; +Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note > > A last caveat: the ZERO_PAGE is now refcounted and managed with rmap > (and thus mapcounted and count towards shared rss). These writes to > the struct page could cause excessive cacheline bouncing on big > systems. There are a number of ways this could be addressed if it is > an issue. > > And indeed this cacheline bouncing has shown up on large SGI systems. > There was a situation where an Altix system was essentially livelocked > tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. > This situation can be avoided in userspace, but it does highlight the > potential scalability problem with refcounting ZERO_PAGE, and corner > cases where it can really hurt (we don't want the system to livelock!). > > There are several broad ways to fix this problem: > 1. add back some special casing to avoid refcounting ZERO_PAGE > 2. per-node or per-cpu ZERO_PAGES > 3. remove the ZERO_PAGE completely > > I will argue for 3. The others should also fix the problem, but they > result in more complex code than does 3, with little or no real benefit > that I can see. Why? Sorry, I've no useful arguments to add (and my testing was too much like yours to add any value), but I do want to go on record as still a strong supporter of approach 3 and your patch. Hugh
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-08 15:17 ` Nick Piggin 2007-10-09 13:00 ` Hugh Dickins @ 2007-10-09 14:52 ` Linus Torvalds 2007-10-09 9:31 ` Nick Piggin 1 sibling, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2007-10-09 14:52 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > I have done some tests which indicate a couple of very basic common tools > don't do much zero-page activity (ie. kbuild). And also combined with some > logical arguments to say that a "sane" app wouldn't be using zero_page much. > (basically -- if the app cares about memory or cache footprint and is using > many pages of zeroes, then it should have a more compressed representation > of zeroes anyway). One of the things that zero-page has been used for is absolutely *huge* (but sparse) arrays in Fortran programs. At least in traditional Fortran, it was very hard to do dynamic allocations, so people would allocate the *maximum* array statically, and then not necessarily use everything. I don't know if the pages ever even got paged in, but this is the kind of usage which is *not* insane. Linus
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 14:52 ` Linus Torvalds @ 2007-10-09 9:31 ` Nick Piggin 2007-10-10 2:22 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Nick Piggin @ 2007-10-09 9:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wednesday 10 October 2007 00:52, Linus Torvalds wrote: > On Tue, 9 Oct 2007, Nick Piggin wrote: > > I have done some tests which indicate a couple of very basic common tools > > don't do much zero-page activity (ie. kbuild). And also combined with > > some logical arguments to say that a "sane" app wouldn't be using > > zero_page much. (basically -- if the app cares about memory or cache > > footprint and is using many pages of zeroes, then it should have a more > > compressed representation of zeroes anyway). > > One of the things that zero-page has been used for is absolutely *huge* > (but sparse) arrays in Fortran programs. > > At least in traditional Fortran, it was very hard to do dynamic > allocations, so people would allocate the *maximum* array statically, and > then not necessarily use everything. I don't know if the pages ever even > got paged in, In which case, they would not be using the ZERO_PAGE? If they were paging in (ie. reading) huge reams of zeroes, then maybe their algorithms aren't so good anyway? (I don't know). > but this is the kind of usage which is *not* insane. Yeah, that's why I use the double quotes... I wonder how to find out, though. I guess I could ask SGI if they could ask around -- but that still comes back to the problem of not being able to ever conclusively show that there are no real users of the ZERO_PAGE. Where do you suggest I go from here? Is there any way I can convince you to try it? Make it a config option? (just kidding)
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 9:31 ` Nick Piggin @ 2007-10-10 2:22 ` Linus Torvalds 2007-10-09 10:15 ` Nick Piggin 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2007-10-10 2:22 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > Where do you suggest I go from here? Is there any way I can > convince you to try it? Make it a config option? (just kidding) No, I'll take the damn patch, but quite frankly, I think your arguments suck. I've told you so before, and asked for numbers, and all you do is handwave. And this is like the *third*time*, and you don't even seem to admit that you're handwaving. So let's do it, but dammit: - make sure there aren't any invalid statements like this in the final commit message. - if somebody shows that you were wrong, and points to a real load, please never *ever* make excuses for this again, ok? Is that a deal? I hope we'll never need to hear about this again, but I really object to the way you've tried to "sell" this thing, by basically starting out dishonest about what the problem was, and even now I've yet to see a *single* performance number even though I've asked for them (except for the problem case, which was introduced by *you*) Linus
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 2:22 ` Linus Torvalds @ 2007-10-09 10:15 ` Nick Piggin 2007-10-10 3:06 ` Linus Torvalds 2007-10-10 4:06 ` Hugh Dickins 0 siblings, 2 replies; 32+ messages in thread From: Nick Piggin @ 2007-10-09 10:15 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wednesday 10 October 2007 12:22, Linus Torvalds wrote: > On Tue, 9 Oct 2007, Nick Piggin wrote: > > Where do you suggest I go from here? Is there any way I can > > convince you to try it? Make it a config option? (just kidding) > > No, I'll take the damn patch, but quite frankly, I think your arguments > suck. > > I've told you so before, and asked for numbers, and all you do is I gave 2 other numbers. After that, it really doesn't matter if I give you 2 numbers or 200, because it wouldn't change the fact that there are 3 programs using the ZERO_PAGE that we'll never know about. > handwave. And this is like the *third*time*, and you don't even seem to > admit that you're handwaving. I think I've always admitted I'm handwaving in my assertion that programs would not be using the zero page. My handwaving is an attempt to show that I have some vaguely reasonable reasons to think it will be OK to remove it. That's all. > So let's do it, but dammit: > - make sure there aren't any invalid statements like this in the final > commit message. Was the last one OK? > - if somebody shows that you were wrong, and points to a real load, > please never *ever* make excuses for this again, ok? > > Is that a deal? I hope we'll never need to hear about this again, but I > really object to the way you've tried to "sell" this thing, by basically > starting out dishonest about what the problem was, The dishonesty in the changelog is more of an oversight than an attempt to get it merged. 
It never even crossed my mind that you would be fooled by it ;) To prove my point: the *first* approach I posted to fix this problem was exactly a patch to special-case the zero_page refcounting which was removed with my PageReserved patch. Neither Hugh nor yourself liked it one bit! So I have no particular bias against the zero page or problem admitting I introduced the issue. I do just think this could be a nice opportunity to try getting rid of the zero page and simplify things. > and even now I've yet > to see a *single* performance number even though I've asked for them > (except for the problem case, which was introduced by *you*) Basically: I don't know what else to show you! I expect it would be relatively difficult to find a measurable difference between no zero-page and zero-page with no refcounting problem. Precisely because I can't find anything that really makes use of it. Again: what numbers can I get for you that would make you feel happier about it? Anyway, before you change your mind: it's a deal! If somebody screams then I'll have a patch for you to reintroduce the zero page minus refcounting the next day.
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 10:15 ` Nick Piggin @ 2007-10-10 3:06 ` Linus Torvalds 2007-10-10 4:06 ` Hugh Dickins 1 sibling, 0 replies; 32+ messages in thread From: Linus Torvalds @ 2007-10-10 3:06 UTC (permalink / raw) To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > > I gave 2 other numbers. After that, it really doesn't matter if I give > you 2 numbers or 200, because it wouldn't change the fact that there > are 3 programs using the ZERO_PAGE that we'll never know about. You gave me no timings what-so-ever. Yes, you said "1000 page faults", but no, I have yet to see a *single* actual performance number. Maybe I missed it? Or maybe you just never did them. Was it really so non-obvious that I actually wanted *performance* numbers, not just some random numbers about how many page faults you have? Or did you post them somewhere else? I don't have any memory of having seen any performance numbers what-so-ever, but I admittedly get too much email. Here's three numbers of my own: 8, 17 and 975. So I gave you "numbers", but what do they _mean_? So let me try one more time: - I don't want any excuses about how bad PAGE_ZERO is. You made it bad, it wasn't bad before. - I want numbers. I want the commit message to tell us *why* this is done. The numbers I want is performance numbers, not handwave numbers. Both for the bad case that it's supposed to fix, *and* for "normal load". - I want you to just say that if it turns out that there are people who use ZERO_PAGE, you stop calling them crazy, and promise to look at the alternatives. How much clearer can I be? I have said several times that I think this patch is kind of sad, and the reason I think it's sad is that you (and Hugh) convinced me to take the patch that made it sad in the first place. It didn't *use* to be bad. 
And I've used ZERO_PAGE myself for timing, I've had nice test-programs that knew that it could ignore cache effects and get pure TLB effects when it just allocated memory and didn't write to it. That's why I don't like the lack of numbers. That's why I didn't like the original commit message that tried to blame the wrong part. That's why I didn't like this patch to begin with. But I'm perfectly ready to take it, and see if anybody complains. Hopefully nobody ever will. But by now I absolutely *detest* this patch because of its history, and how I *told* you guys what the reserved bit did, and how you totally ignored it, and then tried to blame ZERO_PAGE for that. So yes, I want the patch to be accompanied by an explanation, which includes the performance side of why it is wanted/needed in the first place. If this patch didn't have that kind of history, I wouldn't give a flying f about it. As it is, this whole thing has a background. Linus
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-09 10:15 ` Nick Piggin 2007-10-10 3:06 ` Linus Torvalds @ 2007-10-10 4:06 ` Hugh Dickins 2007-10-10 5:20 ` Linus Torvalds 1 sibling, 1 reply; 32+ messages in thread From: Hugh Dickins @ 2007-10-10 4:06 UTC (permalink / raw) To: Nick Piggin; +Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel On Tue, 9 Oct 2007, Nick Piggin wrote: > by it ;) To prove my point: the *first* approach I posted to fix this > problem was exactly a patch to special-case the zero_page refcounting > which was removed with my PageReserved patch. Neither Hugh nor yourself > liked it one bit! True (speaking for me; I forget whether Linus ever got to see it). I apologize to you, Nick, for getting you into this position of fighting for something which wasn't your choice in the first place. If I thought we'd have a better kernel by dropping this patch and going back to one that just avoids the refcounting, I'd say do it. No, I still think it's worth trying this one first. But best have your avoid-the-refcounting patch ready and reviewed for emergency use if regression does show up somewhere. Thanks, Hugh [My mails out are at present getting randomly delayed by six hours or so, which makes it extra hard for me to engage usefully in any thread.]
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 4:06 ` Hugh Dickins @ 2007-10-10 5:20 ` Linus Torvalds 2007-10-09 14:30 ` Nick Piggin 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2007-10-10 5:20 UTC (permalink / raw) To: Hugh Dickins; +Cc: Nick Piggin, Andrew Morton, linux-mm, linux-kernel On Wed, 10 Oct 2007, Hugh Dickins wrote: > On Tue, 9 Oct 2007, Nick Piggin wrote: > > by it ;) To prove my point: the *first* approach I posted to fix this > > problem was exactly a patch to special-case the zero_page refcounting > > which was removed with my PageReserved patch. Neither Hugh nor yourself > > liked it one bit! > > True (speaking for me; I forget whether Linus ever got to see it). The problem is, those first "remove ref-counting" patches were ugly *regardless* of ZERO_PAGE. We (yes, largely I) fixed up the mess since. The whole vm_normal_page() and the magic PFN_REMAP thing got rid of a lot of the problems. And I bet that we could do something very similar wrt the zero page too. Basically, the ZERO page could act pretty much exactly like a PFN_REMAP page: the VM would not touch it. No rmap, no page refcounting, no nothing. This following patch is not meant to be even half-way correct (it's not even _remotely_ tested), but is just meant to be a rough "grep for ZERO_PAGE in the VM, and see what happens if you don't ref-count it". Would something like the below work? I dunno. But I suspect it would. I doubt anybody has the energy to actually try to actually follow through on it, which is why I'm not pushing on it any more, and why I'll accept Nick's patch to just remove ZERO_PAGE, but I really *am* very unhappy about this. The "page refcounting cleanups" in the VM back when were really painful. And dammit, I felt like I was the one who had to clean them up after you guys. Which makes me really testy on this subject. 
And yes, I also admit that the vm_normal_page() and the PFN_REMAP thing ended up really improving the VM, and we're pretty much certainly better off now than we were before - but I also think that ZERO_PAGE etc could easily be handled with the same model. After all, if we can make "mmap(/dev/mem)" work with COW and everything, I'd argue that ZERO_PAGE really is just a very very small special case of that! Totally half-assed untested patch to follow, not meant for anything but a "I think this kind of approach should have worked too" comment. So I'm not pushing the patch below, I'm just fighting for people realizing that - the kernel has *always* (since pretty much day 1) done that ZERO_PAGE thing. This means that I would not be at all surprised if some application basically depends on it. I've written test-programs that depends on it - maybe people have written other code that basically has been written for and tested with a kernel that has basically always made read-only zero pages extra cheap. So while it may be true that removing ZERO_PAGE won't affect anybody, I don't think it's a given, and I also don't think it's sane calling people "crazy" for depending on something that has always been true under Linux for the last 15+ years. There are few behaviors that have been around for that long. - make sure the commit message is accurate as to need for this (ie not claim that the ZERO_PAGE itself was the problem, and give some actual performance numbers on what is going on) that's all. 
Linus

---
 mm/memory.c  |   17 ++++++++---------
 mm/migrate.c |    2 +-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index f82b359..0a8cc88 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -386,6 +386,7 @@ static inline int is_cow_mapping(unsigned int flags)
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	unsigned long pfn = pte_pfn(pte);
+	struct page *page;
 
 	if (unlikely(vma->vm_flags & VM_PFNMAP)) {
 		unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
@@ -413,7 +414,11 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_
 	 * The PAGE_ZERO() pages and various VDSO mappings can
 	 * cause them to exist.
 	 */
-	return pfn_to_page(pfn);
+	page = pfn_to_page(pfn);
+	if (PageReserved(page))
+		page = NULL;
+
+	return page;
 }
 
 /*
@@ -968,7 +973,7 @@ no_page_table:
 	if (flags & FOLL_ANON) {
 		page = ZERO_PAGE(address);
 		if (flags & FOLL_GET)
-			get_page(page);
+			page = alloc_page(GFP_KERNEL | GFP_ZERO);
 		BUG_ON(flags & FOLL_WRITE);
 	}
 	return page;
@@ -1131,9 +1136,6 @@ static int zeromap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			pte++;
 			break;
 		}
-		page_cache_get(page);
-		page_add_file_rmap(page);
-		inc_mm_counter(mm, file_rss);
 		set_pte_at(mm, addr, pte, zero_pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -1717,7 +1719,7 @@ gotten:
 
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	if (old_page == ZERO_PAGE(address)) {
+	if (!old_page) {
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
 			goto oom;
@@ -2274,15 +2276,12 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		/* Map the ZERO_PAGE - vm_page_prot is readonly */
 		page = ZERO_PAGE(address);
-		page_cache_get(page);
 		entry = mk_pte(page, vma->vm_page_prot);
 
 		ptl = pte_lockptr(mm, pmd);
 		spin_lock(ptl);
 		if (!pte_none(*page_table))
 			goto release;
-		inc_mm_counter(mm, file_rss);
-		page_add_file_rmap(page);
 	}
 
 	set_pte_at(mm, address, page_table, entry);
diff --git a/mm/migrate.c b/mm/migrate.c
index e2fdbce..8d2e110 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -827,7 +827,7 @@ static int do_move_pages(struct mm_struct *mm, struct page_to_node *pm,
 			goto set_status;
 
 		if (PageReserved(page))		/* Check for zero page */
-			goto put_and_set;
+			goto set_status;
 
 		pp->page = page;
 		err = page_to_nid(page);
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24) 2007-10-10 5:20 ` Linus Torvalds @ 2007-10-09 14:30 ` Nick Piggin 2007-10-10 15:04 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Nick Piggin @ 2007-10-09 14:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel On Wednesday 10 October 2007 15:20, Linus Torvalds wrote: > On Wed, 10 Oct 2007, Hugh Dickins wrote: > > On Tue, 9 Oct 2007, Nick Piggin wrote: > > > by it ;) To prove my point: the *first* approach I posted to fix this > > > problem was exactly a patch to special-case the zero_page refcounting > > > which was removed with my PageReserved patch. Neither Hugh nor yourself > > > liked it one bit! > > > > True (speaking for me; I forget whether Linus ever got to see it). > > The problem is, those first "remove ref-counting" patches were ugly > *regardless* of ZERO_PAGE. > > We (yes, largely I) fixed up the mess since. The whole vm_normal_page() > and the magic PFN_REMAP thing got rid of a lot of the problems. > > And I bet that we could do something very similar wrt the zero page too. > > Basically, the ZERO page could act pretty much exactly like a PFN_REMAP > page: the VM would not touch it. No rmap, no page refcounting, no nothing. > > This following patch is not meant to be even half-way correct (it's not > even _remotely_ tested), but is just meant to be a rough "grep for > ZERO_PAGE in the VM, and see what happens if you don't ref-count it". > > Would something like the below work? I dunno. But I suspect it would. I Sure it will work. It's not completely trivial like your patch, though. The VM has to know about ZERO_PAGE if you also want it to do the "optimised" wp (what you have won't work because it will break all other "not normal" pages which are non-zero I think). And your follow_page path is not going to do the right thing for ZERO_PAGE either I think.
> doubt anybody has the energy to actually try to actually follow through on > it, which is why I'm not pushing on it any more, and why I'll accept Sure they have. http://marc.info/?l=linux-mm&m=117515508009729&w=2 OK, this patch was open coding the tests rather than putting them in vm_normal_page, but vm_normal_page doesn't magically make it a whole lot cleaner (a _little_ bit cleaner, I agree, but in my current patch I still need a vm_normal_or_zero_page() function). > Nick's patch to just remove ZERO_PAGE, but I really *am* very unhappy > about this. Well that's not very good... > The "page refcounting cleanups" in the VM back when were really painful. > And dammit, I felt like I was the one who had to clean them up after you > guys. Which makes me really testy on this subject. OK, but in this case we'll not have a big hard-to-revert set of changes that fundamentally alter assumptions throughout the vm. It will be more a case of "if somebody screams, put the zero page back", won't it? > Totally half-assed untested patch to follow, not meant for anything but a > "I think this kind of approach should have worked too" comment. > > So I'm not pushing the patch below, I'm just fighting for people realizing > that > > - the kernel has *always* (since pretty much day 1) done that ZERO_PAGE > thing. This means that I would not be at all surprised if some > application basically depends on it. I've written test-programs that > depends on it - maybe people have written other code that basically has > been written for and tested with a kernel that has basically always > made read-only zero pages extra cheap. > > So while it may be true that removing ZERO_PAGE won't affect anybody, I > don't think it's a given, and I also don't think it's sane calling > people "crazy" for depending on something that has always been true > under Linux for the last 15+ years. There are few behaviors that have > been around for that long. That's the main question. 
Maybe my wording was a little strong, but I simply personally couldn't think of sane uses of zero page. I'm not prepared to argue that none could possibly exist. It just seems like now might be a good time to just _try_ removing the zero page, because of this peripheral problem caused by my refcounting patch. If it doesn't work out, then at least we'll be wiser for it, we can document why the zero page is needed, and add it back with the refcounting exceptions. > - make sure the commit message is accurate as to need for this (ie not > claim that the ZERO_PAGE itself was the problem, and give some actual > performance numbers on what is going on) OK, maybe this is where we are not on the same page. There are 2 issues really. Firstly, performance problem of refcounting the zero-page -- we've established that it causes this livelock and that we should stop refcounting it, right? Second issue is the performance difference between removing the zero page completely, and de-refcounting it (it's obviously incorrect to argue for zero page removal for performance reasons if the performance improvement is simply coming from avoiding the refcounting). The problem with that is I simply don't know any tests that use the ZERO_PAGE significantly enough to measure a difference. The 1000 COW faults vs < 1 unmap per second thing was simply to show that, on the micro level, performance won't have regressed by removing the zero page. So I'm not arguing to remove the zero page because performance is so much better than having a de-refcounted zero page! I'm saying that we should remove the refcounting one way or the other. If you accept that, then I argue that we should try removing zero page completely rather than just de-refcounting it, because that allows nice simplifications and hopefully nobody will miss the zero page. Does that make sense? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: remove zero_page (was Re: -mm merge plans for 2.6.24)
  2007-10-09 14:30 ` Nick Piggin
@ 2007-10-10 15:04 ` Linus Torvalds
  0 siblings, 0 replies; 32+ messages in thread
From: Linus Torvalds @ 2007-10-10 15:04 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Andrew Morton, linux-mm, linux-kernel

On Wed, 10 Oct 2007, Nick Piggin wrote:
> >
> > It just seems like now might be a good time to just _try_ removing
> > the zero page

Yes. Let's do your patch immediately after the x86 merge, and just see if
anybody screams. It might take a while, because I certainly agree that
whoever would be affected by it is likely to be unusual.

> OK, maybe this is where we are not on the same page.
> There are 2 issues really. Firstly, performance problem of
> refcounting the zero-page -- we've established that it causes
> this livelock and that we should stop refcounting it, right?

Yes, I do agree that refcounting is problematic.

> Second issue is the performance difference between removing the
> zero page completely, and de-refcounting it (it's obviously
> incorrect to argue for zero page removal for performance reasons
> if the performance improvement is simply coming from avoiding
> the refcounting).

Well, even if it's a "when you don't get into the bad behaviour,
performance difference is not measurable", and give a before-and-after
number for some random but interesting load. Even if it's just a kernel
compile..

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.
For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2007-10-11 6:33 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20071001142222.fcaa8d57.akpm@linux-foundation.org>
2007-10-02 4:21 ` Memory controller merge (was Re: -mm merge plans for 2.6.24) Balbir Singh
2007-10-02 15:46 ` Hugh Dickins
2007-10-03 8:13 ` Balbir Singh
2007-10-03 18:47 ` Hugh Dickins
2007-10-04 4:16 ` Balbir Singh
2007-10-04 13:16 ` Hugh Dickins
2007-10-05 3:07 ` Balbir Singh
2007-10-07 17:41 ` Hugh Dickins
2007-10-08 2:54 ` Balbir Singh
2007-10-04 16:10 ` Paul Menage
2007-10-10 21:07 ` Rik van Riel
2007-10-11 6:33 ` Balbir Singh
2007-10-02 16:06 ` kswapd min order, slub max order [was Re: -mm merge plans for 2.6.24] Hugh Dickins
2007-10-02 9:10 ` Nick Piggin
2007-10-02 18:38 ` Mel Gorman
2007-10-02 18:28 ` Christoph Lameter
2007-10-03 0:37 ` Christoph Lameter
2007-10-02 16:21 ` new aops merge " Hugh Dickins
2007-10-02 17:45 ` remove zero_page (was Re: -mm merge plans for 2.6.24) Nick Piggin
2007-10-03 10:58 ` Andrew Morton
2007-10-03 15:21 ` Linus Torvalds
2007-10-08 15:17 ` Nick Piggin
2007-10-09 13:00 ` Hugh Dickins
2007-10-09 14:52 ` Linus Torvalds
2007-10-09 9:31 ` Nick Piggin
2007-10-10 2:22 ` Linus Torvalds
2007-10-09 10:15 ` Nick Piggin
2007-10-10 3:06 ` Linus Torvalds
2007-10-10 4:06 ` Hugh Dickins
2007-10-10 5:20 ` Linus Torvalds
2007-10-09 14:30 ` Nick Piggin
2007-10-10 15:04 ` Linus Torvalds