* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK [not found] ` <20130524140114.GK23650@twins.programming.kicks-ass.net> @ 2013-05-24 15:40 ` Christoph Lameter 2013-05-26 1:11 ` KOSAKI Motohiro 2013-05-27 6:48 ` Peter Zijlstra 0 siblings, 2 replies; 12+ messages in thread From: Christoph Lameter @ 2013-05-24 15:40 UTC (permalink / raw) To: Peter Zijlstra Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz On Fri, 24 May 2013, Peter Zijlstra wrote: > Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages") > broke RLIMIT_MEMLOCK. Nope the patch fixed a problem with double accounting. The problem that we seem to have is to define what mlocked and pinned mean and how this relates to RLIMIT_MEMLOCK. mlocked pages are pages that are movable (not pinned!!!) and that are marked in some way by user space actions as mlocked (POSIX semantics). They are marked with a special page flag (PG_mlocked). Pinned pages are pages that have an elevated refcount because the hardware needs to use these pages for I/O. The elevated refcount may be temporary (then we dont care about this) or for a longer time (such as the memory registration of the IB subsystem). That is when we account the memory as pinned. The elevated refcount stops page migration and other things from trying to move that memory. Pages can be both pinned and mlocked. Before my patch some pages those two issues were conflated since the same counter was used and therefore these pages were counted twice. If an RDMA application was running using mlockall() and was performing large scale I/O then the counters could show extraordinary large numbers and the VM would start to behave erratically. It is important for the VM to know which pages cannot be evicted but that involves many more pages due to dirty pages etc etc. So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages that userspace has mlocked. You want the counter to mean something different it seems. What is it? I think we need to be first clear on what we want to accomplish and what these counters actually should count before changing things. Certainly would appreciate improvements in this area but resurrecting the conflation between mlocked and pinned pages is not the way to go. > This patch proposes to properly fix the problem by introducing > VM_PINNED. This also provides the groundwork for a possible mpin() > syscall or MADV_PIN -- although these are not included. Maybe add a new PIN page flag? Pages are not pinned per vma as the patch seems to assume. > It recognises that pinned page semantics are a strict super-set of > locked page semantics -- a pinned page will not generate major faults > (and thus satisfies mlock() requirements). Not exactly true. Pinned pages may not have the mlocked flag set and they are not managed on the unevictable LRU lists of the MM. > If people find this approach unworkable, I request we revert the above > mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state > again. Cannot do that. This will cause the breakage that the patch was fixing to resurface. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-24 15:40 ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter @ 2013-05-26 1:11 ` KOSAKI Motohiro 2013-05-28 16:19 ` Christoph Lameter 2013-05-27 6:48 ` Peter Zijlstra 1 sibling, 1 reply; 12+ messages in thread From: KOSAKI Motohiro @ 2013-05-26 1:11 UTC (permalink / raw) To: Christoph Lameter Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton, Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org, linux-rdma, Or Gerlitz On Fri, May 24, 2013 at 11:40 AM, Christoph Lameter <cl@linux.com> wrote: > On Fri, 24 May 2013, Peter Zijlstra wrote: > >> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages") >> broke RLIMIT_MEMLOCK. > > Nope the patch fixed a problem with double accounting. > > The problem that we seem to have is to define what mlocked and pinned mean > and how this relates to RLIMIT_MEMLOCK. > > mlocked pages are pages that are movable (not pinned!!!) and that are > marked in some way by user space actions as mlocked (POSIX semantics). > They are marked with a special page flag (PG_mlocked). > > Pinned pages are pages that have an elevated refcount because the hardware > needs to use these pages for I/O. The elevated refcount may be temporary > (then we dont care about this) or for a longer time (such as the memory > registration of the IB subsystem). That is when we account the memory as > pinned. The elevated refcount stops page migration and other things from > trying to move that memory. > > Pages can be both pinned and mlocked. Before my patch some pages those two > issues were conflated since the same counter was used and therefore these > pages were counted twice. If an RDMA application was running using > mlockall() and was performing large scale I/O then the counters could show > extraordinary large numbers and the VM would start to behave erratically. > > It is important for the VM to know which pages cannot be evicted but that > involves many more pages due to dirty pages etc etc. > > So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages > that userspace has mlocked. > > You want the counter to mean something different it seems. What is it? > > I think we need to be first clear on what we want to accomplish and what > these counters actually should count before changing things. Hm. If pinned and mlocked are totally difference intentionally, why IB uses RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up number of pinned pages and other gup users don't. I can't guess IB folk's intent. And now ever IB code has duplicated RLIMIT_MEMLOCK check and at least __ipath_get_user_pages() forget to check capable(CAP_IPC_LOCK). That's bad. > Certainly would appreciate improvements in this area but resurrecting the > conflation between mlocked and pinned pages is not the way to go. > >> This patch proposes to properly fix the problem by introducing >> VM_PINNED. This also provides the groundwork for a possible mpin() >> syscall or MADV_PIN -- although these are not included. > > Maybe add a new PIN page flag? Pages are not pinned per vma as the patch > seems to assume. Generically, you are right. But if VM_PINNED is really only for IB, this is acceptable limitation. They can split vma for their own purpose. Anyway, I agree we should clearly understand the semantics of IB pinning and the userland usage and assumption. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-26 1:11 ` KOSAKI Motohiro @ 2013-05-28 16:19 ` Christoph Lameter 0 siblings, 0 replies; 12+ messages in thread From: Christoph Lameter @ 2013-05-28 16:19 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton, Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org, linux-rdma, Or Gerlitz On Sat, 25 May 2013, KOSAKI Motohiro wrote: > If pinned and mlocked are totally difference intentionally, why IB uses > RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up > number of pinned pages and other gup users don't. > I can't guess IB folk's intent. True another limit would be better. The reason that IB raises the pinned pages is because IB permanently pins those pages. Other users of gup do that temporarily. If there are other users that pin pages permanently should also account for it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-24 15:40 ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter 2013-05-26 1:11 ` KOSAKI Motohiro @ 2013-05-27 6:48 ` Peter Zijlstra 2013-05-28 16:37 ` Christoph Lameter 1 sibling, 1 reply; 12+ messages in thread From: Peter Zijlstra @ 2013-05-27 6:48 UTC (permalink / raw) To: Christoph Lameter Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz On Fri, May 24, 2013 at 03:40:26PM +0000, Christoph Lameter wrote: > On Fri, 24 May 2013, Peter Zijlstra wrote: > > > Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages") > > broke RLIMIT_MEMLOCK. > > Nope the patch fixed a problem with double accounting. And introduces another problem by now under-counting in a much more likely scenario. Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK had a single resource counter. After your patch RLIMIT_MEMLOCK is applied separately to both -- more or less. Your patch changes RLIMIT_MEMLOCK semantics, doesn't mention this in the Changelog which makes one think this was unintentional. And you're refusing to acknowledge that this might be a problem. The fact that RLIMIT_MEMLOCK is the only consumer of the counter tells me you simply didn't seem to care and did a sloppy patch 'solving' your immediate issue. The below discussion of what mlock vs pinned should mean for RLIMIT_MEMLOCK should have been part of that patch, since it changed the previously established behaviour. > The problem that we seem to have is to define what mlocked and pinned mean > and how this relates to RLIMIT_MEMLOCK. > > mlocked pages are pages that are movable (not pinned!!!) and that are > marked in some way by user space actions as mlocked (POSIX semantics). > They are marked with a special page flag (PG_mlocked). NO, mlocked pages are pages that do not leave core memory; IOW do not cause major faults. Pinning pages is a perfectly spec compliant mlock() implementation. See: http://pubs.opengroup.org/onlinepubs/7908799/xsh/mlock.html The fact that we _want_ a weaker implementation because we like to move them despite them being mlocked is us playing weasel words with the spec :-) So much so that we have to introduce a means of actually pinning them _before_ we start to actually migrate mlocked pages (or did somebody actually change that too?). RT people will be unhappy if they hit minor faults or have to wait behind page-migration (esp. since the migrate code only removes the migration PTE after all pages in the batch are migrated). BTW, notice that mlock() is part of the REALTIME spec, and while its wording is such that is allows us to make it unsuitable for actual real-time usage this goes against the intent of the spec. Now in an earlier discussion on the issue 'we' (I can't remember if you participated there, I remember Mel and Kosaki-San) agreed that for 'normal' (read not whacky real-time people) mlock can still be useful and we should introduce a pinned user API for the RT people. > Pinned pages are pages that have an elevated refcount because the hardware > needs to use these pages for I/O. The elevated refcount may be temporary > (then we dont care about this) or for a longer time (such as the memory > registration of the IB subsystem). That is when we account the memory as > pinned. The elevated refcount stops page migration and other things from > trying to move that memory. Again I _know_ that!!! > Pages can be both pinned and mlocked. Right, but apart for mlockall() this is a highly unlikely situation to actually occur. And if you're using mlockall() you've effectively disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter goes funny. > Before my patch some pages those two > issues were conflated since the same counter was used and therefore these > pages were counted twice. If an RDMA application was running using > mlockall() and was performing large scale I/O then the counters could show > extraordinary large numbers and the VM would start to behave erratically. > > It is important for the VM to know which pages cannot be evicted but that > involves many more pages due to dirty pages etc etc. But the VM as such doesn't care, the VM doesn't use either locked_vm nor pinned_vm for anything! The only user is RLIMIT_MEMLOCK and you silently changed semantics. > So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages > that userspace has mlocked. > > You want the counter to mean something different it seems. What is it? It would've been so much better if you'd asked me that before you mucked it up. The intent of RLIMIT_MEMLOCK is clearly to limit the amount of memory the user can lock into core memory as per the direct definition of mlock. By this definition pinned should be included. This leaves us to solve the issue of pinned mlocked memory. > I think we need to be first clear on what we want to accomplish and what > these counters actually should count before changing things. Backward isn't it... _you_ changed it without consideration. > Certainly would appreciate improvements in this area but resurrecting the > conflation between mlocked and pinned pages is not the way to go. > > > This patch proposes to properly fix the problem by introducing > > VM_PINNED. This also provides the groundwork for a possible mpin() > > syscall or MADV_PIN -- although these are not included. > > Maybe add a new PIN page flag? Pages are not pinned per vma as the patch > seems to assume. The IB code does a big get_user_pages(), which last time I checked pins a sequential range of pages. Therefore the VMA approach. The perf thing is guaranteed to be virtually sequential and already is a VMA -- its the result of a user mmap() call. > > It recognises that pinned page semantics are a strict super-set of > > locked page semantics -- a pinned page will not generate major faults > > (and thus satisfies mlock() requirements). > > Not exactly true. Pinned pages may not have the mlocked flag set and they > are not managed on the unevictable LRU lists of the MM. Re-read the spec; pinned pages stay in memory; therefore they qualify. The fact that we've gone all 'smart' about dealing with such pages is immaterial. > > If people find this approach unworkable, I request we revert the above > > mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state > > again. > > Cannot do that. This will cause the breakage that the patch was fixing to > resurface. But fixes my borkage, which I argue is much more user visible since it doesn't require CAP_IPC_LOCK :-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-27 6:48 ` Peter Zijlstra @ 2013-05-28 16:37 ` Christoph Lameter 2013-05-29 7:58 ` [regression] " Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Christoph Lameter @ 2013-05-28 16:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins On Mon, 27 May 2013, Peter Zijlstra wrote: > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK > had a single resource counter. After your patch RLIMIT_MEMLOCK is > applied separately to both -- more or less. Before the patch the count was doubled since a single page was counted twice: Once because it was mlocked (marked with PG_mlock) and then again because it was also pinned (the refcount was increased). Two different things. We have agreed for a long time that mlocked pages are movable. That is not true for pinned pages and therefore pinning pages therefore do not fall into that category (Hugh? AFAICR you came up with that rule?) > NO, mlocked pages are pages that do not leave core memory; IOW do not > cause major faults. Pinning pages is a perfectly spec compliant mlock() > implementation. That is not the definition that we have used so far. > Now in an earlier discussion on the issue 'we' (I can't remember if you > participated there, I remember Mel and Kosaki-San) agreed that for > 'normal' (read not whacky real-time people) mlock can still be useful > and we should introduce a pinned user API for the RT people. Right. I remember that. > > Pinned pages are pages that have an elevated refcount because the hardware > > needs to use these pages for I/O. The elevated refcount may be temporary > > (then we dont care about this) or for a longer time (such as the memory > > registration of the IB subsystem). That is when we account the memory as > > pinned. The elevated refcount stops page migration and other things from > > trying to move that memory. > > Again I _know_ that!!! But then you refuse to acknowledge the difference and want to conflate both. > > Pages can be both pinned and mlocked. > > Right, but apart for mlockall() this is a highly unlikely situation to > actually occur. And if you're using mlockall() you've effectively > disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter > goes funny. mlockall() would never be used on all processes. You still need the RLIMIT_MLOCK to ensure that the box does not lock up. > > I think we need to be first clear on what we want to accomplish and what > > these counters actually should count before changing things. > > Backward isn't it... _you_ changed it without consideration. I applied the categorization that we had agreed on before during the development of page migratiob. Pinning is not compatible. > The IB code does a big get_user_pages(), which last time I checked > pins a sequential range of pages. Therefore the VMA approach. The IB code (and other code) can require the pinning of pages in various ways. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [regression] Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-28 16:37 ` Christoph Lameter @ 2013-05-29 7:58 ` Ingo Molnar 2013-05-29 19:53 ` KOSAKI Motohiro 2013-05-30 18:30 ` Peter Zijlstra 2013-05-30 19:59 ` Pekka Enberg 2 siblings, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2013-05-29 7:58 UTC (permalink / raw) To: Christoph Lameter Cc: Peter Zijlstra, Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins * Christoph Lameter <cl@linux.com> wrote: > On Mon, 27 May 2013, Peter Zijlstra wrote: > > > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK > > had a single resource counter. After your patch RLIMIT_MEMLOCK is > > applied separately to both -- more or less. > > Before the patch the count was doubled since a single page was counted > twice: Once because it was mlocked (marked with PG_mlock) and then again > because it was also pinned (the refcount was increased). Two different > things. Christoph, why are you *STILL* arguing?? You caused a *regression* in a userspace ABI plain and simple, and a security relevant one. Furtermore you modified kernel/events/core.c yet you never even Cc:-ed the parties involved ... All your excuses, obfuscation and attempts to redefine the universe to your liking won't change reality: it worked before, it does not now. Take responsibility for your action for christ's sake and move forward towards a resolution , okay? When can we expect a fix from you for the breakage you caused? Or at least a word that acknowledges that you broke a user ABI carelessly? Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-29 7:58 ` [regression] " Ingo Molnar @ 2013-05-29 19:53 ` KOSAKI Motohiro 2013-05-30 6:32 ` Ingo Molnar 0 siblings, 1 reply; 12+ messages in thread From: KOSAKI Motohiro @ 2013-05-29 19:53 UTC (permalink / raw) To: Ingo Molnar Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins, kosaki.motohiro Hi I'm unhappy you guys uses offensive word so much. Please cool down all you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's patch doesn't have cleaner semantics. And PeterZ proposed make new cleaner one rather than revert. No need to hassle. I'm 100% sure -rt people need stronger-mlock api. Please join discussion to make better API. In my humble opinion is: we should make mlock3(addr, len flags) new syscall (*) and support -rt requirement directly. And current strange IB RLIMIT_MEMLOCK usage should gradually migrate it. (*) or, to enhance mbind() is an option because i expect apps need to pin pages nearby NUMA nodes in many case. As your know, current IB pinning implementation doesn't guarantee no minor fault when fork is used. It's ok for IB. They uses madvise(MADV_NOFORK) too. But I'm not sure *all* of rt application are satisfied this. We might need to implement copy-on-fork or might not. I'd like hear other people's opinion. Also, all developer should know this pinning breaks when memory hot-plug is happen likes cpu bounding bysched_setaffinity() may break when cpu hot-remove. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-29 19:53 ` KOSAKI Motohiro @ 2013-05-30 6:32 ` Ingo Molnar 2013-05-30 20:42 ` KOSAKI Motohiro 0 siblings, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2013-05-30 6:32 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins * KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > Hi > > I'm unhappy you guys uses offensive word so much. Please cool down all > you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's > patch doesn't have cleaner semantics. Erm, this feature _regressed_ after the patch. All other concerns are secondary. What's so difficult to understand about that? > And PeterZ proposed make new cleaner one rather than revert. No need to > hassle. Sorry, that's not how we deal with regressions upstream... I've been pretty polite in fact, compared to say Linus - someone should put a politically correct summary of: https://lkml.org/lkml/2013/2/22/456 https://lkml.org/lkml/2013/3/26/1113 into Documentation/, people keep forgetting about it. I'm sorry if being direct and to the point offends you [no, not really], this discussion got _way_ off the real track it should be on: that this is a _regression_. Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-30 6:32 ` Ingo Molnar @ 2013-05-30 20:42 ` KOSAKI Motohiro 2013-05-31 9:27 ` Ingo Molnar 0 siblings, 1 reply; 12+ messages in thread From: KOSAKI Motohiro @ 2013-05-30 20:42 UTC (permalink / raw) To: Ingo Molnar Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton, Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins >> I'm unhappy you guys uses offensive word so much. Please cool down all >> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's >> patch doesn't have cleaner semantics. > > Erm, this feature _regressed_ after the patch. All other concerns are > secondary. What's so difficult to understand about that? Because it is not new commit at all. Christoph's patch was introduced 10 releases before. $ git describe bc3e53f682 v3.1-7235-gbc3e53f If we just revert it, we may get another and opposite regression report. I'm worried about it. Moreover, I don't think discussion better fix is too difficult for us. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-30 20:42 ` KOSAKI Motohiro @ 2013-05-31 9:27 ` Ingo Molnar 0 siblings, 0 replies; 12+ messages in thread From: Ingo Molnar @ 2013-05-31 9:27 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton, Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins * KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote: > >> I'm unhappy you guys uses offensive word so much. Please cool down all > >> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's > >> patch doesn't have cleaner semantics. > > > > Erm, this feature _regressed_ after the patch. All other concerns are > > secondary. What's so difficult to understand about that? > > Because it is not new commit at all. Christoph's patch was introduced > 10 releases before. That's indeed sad - resource limits are not typically tested by apps. The lack of Cc:s to people affected also helped hide the effects of the change. > $ git describe bc3e53f682 > v3.1-7235-gbc3e53f > > If we just revert it, we may get another and opposite regression report. > I'm worried about it. Moreover, I don't think discussion better fix is > too difficult for us. Sure, I agree that a fix is generally better. Thanks, Ingo -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-28 16:37 ` Christoph Lameter 2013-05-29 7:58 ` [regression] " Ingo Molnar @ 2013-05-30 18:30 ` Peter Zijlstra 2013-05-30 19:59 ` Pekka Enberg 2 siblings, 0 replies; 12+ messages in thread From: Peter Zijlstra @ 2013-05-30 18:30 UTC (permalink / raw) To: Christoph Lameter Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland, infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins On Tue, May 28, 2013 at 04:37:06PM +0000, Christoph Lameter wrote: > On Mon, 27 May 2013, Peter Zijlstra wrote: > > > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK > > had a single resource counter. After your patch RLIMIT_MEMLOCK is > > applied separately to both -- more or less. > > Before the patch the count was doubled since a single page was counted > twice: Once because it was mlocked (marked with PG_mlock) and then again > because it was also pinned (the refcount was increased). Two different things. Before the patch RLIMIT_MEMLOCK was a limit on the sum of both, after it is no longer. This change was not described in the changelog. This is a bug plain and simple and you refuse to acknowledge it. > We have agreed for a long time that mlocked pages are movable. That is not > true for pinned pages and therefore pinning pages therefore do not fall > into that category (Hugh? AFAICR you came up with that rule?) Into what category? RLIMIT_MEMLOCK should be a limit on the amount of pages that are constrained to memory (can not be paged). How does a pinned page not qualify for that? > > NO, mlocked pages are pages that do not leave core memory; IOW do not > > cause major faults. Pinning pages is a perfectly spec compliant mlock() > > implementation. > > That is not the definition that we have used so far. There's two statements there; which one do you disagree with? On the first; that is exactly the mlock() definition you want; excluding major faults only means faults cannot be subject to IO, IOW pages must remain in memory. Not excluding minor faults allows you to unmap and migrate pages. On the second; excluding any faults; that is what the mlock() specs intended -- because that is the thing real-time people really want and mlock is part of the real-time POSIX spec. [ Hence the need to provide mpin() and munpin() syscalls before we make mlock() pages migratable, otherwise RT people are screwed. ] Since excluding any fault per definition also excludes major faults, the second is a clear superset of the first. Both ensure pages stay in memory, thus both should count towards RLIMIT_MEMLOCK. > But then you refuse to acknowledge the difference and want to conflate > both. You're the one that is confused. You're just repeating yourself without providing argument or reason. > > > Pages can be both pinned and mlocked. > > > > Right, but apart for mlockall() this is a highly unlikely situation to > > actually occur. And if you're using mlockall() you've effectively > > disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter > > goes funny. > > mlockall() would never be used on all processes. You still need the > RLIMIT_MLOCK to ensure that the box does not lock up. You're so skilled at missing the point its not funny. The resource counter (formerly mm_struct::locked_vm, currently broken) associated with RLIMIT_MEMLOCK is per process. Therefore when you use mlockall() you've already done away with the limit, and thus the resource counter value is irrelevant. > > > I think we need to be first clear on what we want to accomplish and what > > > these counters actually should count before changing things. > > > > Backward isn't it... _you_ changed it without consideration. > > I applied the categorization that we had agreed on before during the > development of page migratiob. Pinning is not compatible. I've never agreed to any such thing, you changed my code without cc'ing me and without describing the actual change. Please explain how pinned pages are not stuck in memory and can be paged? Once you've convinced me of that, I'll concede they should not be counted towards RLIMIT_MEMLOCK. > > The IB code does a big get_user_pages(), which last time I checked > > pins a sequential range of pages. Therefore the VMA approach. > > The IB code (and other code) can require the pinning of pages in various > ways. You can also mlock()/madvise()/mprotect() in various ways, this is a non-argument against using VMAs. Is there anything besides IB and perf that has user controlled page pinning? If so, it needs fixing. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK 2013-05-28 16:37 ` Christoph Lameter 2013-05-29 7:58 ` [regression] " Ingo Molnar 2013-05-30 18:30 ` Peter Zijlstra @ 2013-05-30 19:59 ` Pekka Enberg 2 siblings, 0 replies; 12+ messages in thread From: Pekka Enberg @ 2013-05-30 19:59 UTC (permalink / raw) To: Christoph Lameter Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton, Linus Torvalds, roland, infinipath, linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins On Mon, 27 May 2013, Peter Zijlstra wrote: >> Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK >> had a single resource counter. After your patch RLIMIT_MEMLOCK is >> applied separately to both -- more or less. On Tue, May 28, 2013 at 7:37 PM, Christoph Lameter <cl@linux.com> wrote: > Before the patch the count was doubled since a single page was counted > twice: Once because it was mlocked (marked with PG_mlock) and then again > because it was also pinned (the refcount was increased). Two different things. Pinned vs. mlocked counting isn't the problem here. You changed RLIMIT_MEMLOCK userspace ABI and that needs to be restored. So the question is how can we preserve the old RLIMIT_MEMLOCK semantics while avoiding the double accounting issue. Pekka -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-05-31 9:27 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <alpine.DEB.2.10.1305221523420.9944@vincent-weaver-1.um.maine.edu>
[not found] ` <alpine.DEB.2.10.1305221953370.11450@vincent-weaver-1.um.maine.edu>
[not found] ` <alpine.DEB.2.10.1305222344060.12929@vincent-weaver-1.um.maine.edu>
[not found] ` <20130523044803.GA25399@ZenIV.linux.org.uk>
[not found] ` <20130523104154.GA23650@twins.programming.kicks-ass.net>
[not found] ` <0000013ed1b8d0cc-ad2bb878-51bd-430c-8159-629b23ed1b44-000000@email.amazonses.com>
[not found] ` <20130523152458.GD23650@twins.programming.kicks-ass.net>
[not found] ` <0000013ed2297ba8-467d474a-7068-45b3-9fa3-82641e6aa363-000000@email.amazonses.com>
[not found] ` <20130523163901.GG23650@twins.programming.kicks-ass.net>
[not found] ` <0000013ed28b638a-066d7dc7-b590-49f8-9423-badb9537b8b6-000000@email.amazonses.com>
[not found] ` <20130524140114.GK23650@twins.programming.kicks-ass.net>
2013-05-24 15:40 ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter
2013-05-26 1:11 ` KOSAKI Motohiro
2013-05-28 16:19 ` Christoph Lameter
2013-05-27 6:48 ` Peter Zijlstra
2013-05-28 16:37 ` Christoph Lameter
2013-05-29 7:58 ` [regression] " Ingo Molnar
2013-05-29 19:53 ` KOSAKI Motohiro
2013-05-30 6:32 ` Ingo Molnar
2013-05-30 20:42 ` KOSAKI Motohiro
2013-05-31 9:27 ` Ingo Molnar
2013-05-30 18:30 ` Peter Zijlstra
2013-05-30 19:59 ` Pekka Enberg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox