public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
       [not found]                   ` <20130524140114.GK23650@twins.programming.kicks-ass.net>
@ 2013-05-24 15:40                     ` Christoph Lameter
  2013-05-26  1:11                       ` KOSAKI Motohiro
  2013-05-27  6:48                       ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Lameter @ 2013-05-24 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz

On Fri, 24 May 2013, Peter Zijlstra wrote:

> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> broke RLIMIT_MEMLOCK.

Nope the patch fixed a problem with double accounting.

The problem that we seem to have is to define what mlocked and pinned mean
and how this relates to RLIMIT_MEMLOCK.

mlocked pages are pages that are movable (not pinned!!!) and that are
marked in some way by user space actions as mlocked (POSIX semantics).
They are marked with a special page flag (PG_mlocked).

Pinned pages are pages that have an elevated refcount because the hardware
needs to use these pages for I/O. The elevated refcount may be temporary
(then we dont care about this) or for a longer time (such as the memory
registration of the IB subsystem). That is when we account the memory as
pinned. The elevated refcount stops page migration and other things from
trying to move that memory.

Pages can be both pinned and mlocked. Before my patch some pages those two
issues were conflated since the same counter was used and therefore these
pages were counted twice. If an RDMA application was running using
mlockall() and was performing large scale I/O then the counters could show
extraordinary large numbers and the VM would start to behave erratically.

It is important for the VM to know which pages cannot be evicted but that
involves many more pages due to dirty pages etc etc.

So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
that userspace has mlocked.

You want the counter to mean something different it seems. What is it?

I think we need to be first clear on what we want to accomplish and what
these counters actually should count before changing things.

Certainly would appreciate improvements in this area but resurrecting the
conflation between mlocked and pinned pages is not the way to go.

> This patch proposes to properly fix the problem by introducing
> VM_PINNED. This also provides the groundwork for a possible mpin()
> syscall or MADV_PIN -- although these are not included.

Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
seems to assume.

> It recognises that pinned page semantics are a strict super-set of
> locked page semantics -- a pinned page will not generate major faults
> (and thus satisfies mlock() requirements).

Not exactly true. Pinned pages may not have the mlocked flag set and they
are not managed on the unevictable LRU lists of the MM.

> If people find this approach unworkable, I request we revert the above
> mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> again.

Cannot do that. This will cause the breakage that the patch was fixing to
resurface.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 15:40                     ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter
@ 2013-05-26  1:11                       ` KOSAKI Motohiro
  2013-05-28 16:19                         ` Christoph Lameter
  2013-05-27  6:48                       ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2013-05-26  1:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz

On Fri, May 24, 2013 at 11:40 AM, Christoph Lameter <cl@linux.com> wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
>
>> Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
>> broke RLIMIT_MEMLOCK.
>
> Nope the patch fixed a problem with double accounting.
>
> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
>
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).
>
> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.
>
> Pages can be both pinned and mlocked. Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
>
pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
>
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.
>
> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
>
> You want the counter to mean something different it seems. What is it?
>
> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.

Hm.
If pinned and mlocked are totally difference intentionally, why IB uses
RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up
number of pinned pages and other gup users don't.
I can't guess IB folk's intent.

And now ever IB code has duplicated RLIMIT_MEMLOCK
check and at least __ipath_get_user_pages() forget to check
capable(CAP_IPC_LOCK).
That's bad.


> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
>
>> This patch proposes to properly fix the problem by introducing
>> VM_PINNED. This also provides the groundwork for a possible mpin()
>> syscall or MADV_PIN -- although these are not included.
>
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.

Generically, you are right. But if VM_PINNED is really only for IB,
this is acceptable
limitation. They can split vma for their own purpose.

Anyway, I agree we should clearly understand the semantics of IB pinning and
the userland usage and assumption.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-24 15:40                     ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter
  2013-05-26  1:11                       ` KOSAKI Motohiro
@ 2013-05-27  6:48                       ` Peter Zijlstra
  2013-05-28 16:37                         ` Christoph Lameter
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2013-05-27  6:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz

On Fri, May 24, 2013 at 03:40:26PM +0000, Christoph Lameter wrote:
> On Fri, 24 May 2013, Peter Zijlstra wrote:
> 
> > Patch bc3e53f682 ("mm: distinguish between mlocked and pinned pages")
> > broke RLIMIT_MEMLOCK.
> 
> Nope the patch fixed a problem with double accounting.

And introduces another problem by now under-counting in a much more
likely scenario.

Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
had a single resource counter. After your patch RLIMIT_MEMLOCK is
applied separately to both -- more or less.

Your patch changes RLIMIT_MEMLOCK semantics, doesn't mention this in the
Changelog which makes one think this was unintentional. And you're
refusing to acknowledge that this might be a problem.

The fact that RLIMIT_MEMLOCK is the only consumer of the counter tells
me you simply didn't seem to care and did a sloppy patch 'solving' your
immediate issue.

The below discussion of what mlock vs pinned should mean for
RLIMIT_MEMLOCK should have been part of that patch, since it changed the
previously established behaviour.

> The problem that we seem to have is to define what mlocked and pinned mean
> and how this relates to RLIMIT_MEMLOCK.
> 
> mlocked pages are pages that are movable (not pinned!!!) and that are
> marked in some way by user space actions as mlocked (POSIX semantics).
> They are marked with a special page flag (PG_mlocked).

NO, mlocked pages are pages that do not leave core memory; IOW do not
cause major faults. Pinning pages is a perfectly spec compliant mlock()
implementation.

See: http://pubs.opengroup.org/onlinepubs/7908799/xsh/mlock.html

The fact that we _want_ a weaker implementation because we like to move
them despite them being mlocked is us playing weasel words with the spec
:-) So much so that we have to introduce a means of actually pinning
them _before_ we start to actually migrate mlocked pages (or did
somebody actually change that too?).

RT people will be unhappy if they hit minor faults or have to wait
behind page-migration (esp. since the migrate code only removes the
migration PTE after all pages in the batch are migrated).

BTW, notice that mlock() is part of the REALTIME spec, and while its
wording is such that is allows us to make it unsuitable for actual
real-time usage this goes against the intent of the spec.

Now in an earlier discussion on the issue 'we' (I can't remember if you
participated there, I remember Mel and Kosaki-San) agreed that for
'normal' (read not whacky real-time people) mlock can still be useful
and we should introduce a pinned user API for the RT people.

> Pinned pages are pages that have an elevated refcount because the hardware
> needs to use these pages for I/O. The elevated refcount may be temporary
> (then we dont care about this) or for a longer time (such as the memory
> registration of the IB subsystem). That is when we account the memory as
> pinned. The elevated refcount stops page migration and other things from
> trying to move that memory.

Again I _know_ that!!!

> Pages can be both pinned and mlocked. 

Right, but apart for mlockall() this is a highly unlikely situation to
actually occur. And if you're using mlockall() you've effectively
disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
goes funny.

> Before my patch some pages those two
> issues were conflated since the same counter was used and therefore these
> pages were counted twice. If an RDMA application was running using
> mlockall() and was performing large scale I/O then the counters could show
> extraordinary large numbers and the VM would start to behave erratically.
> 
> It is important for the VM to know which pages cannot be evicted but that
> involves many more pages due to dirty pages etc etc.

But the VM as such doesn't care, the VM doesn't use either locked_vm nor
pinned_vm for anything! The only user is RLIMIT_MEMLOCK and you silently
changed semantics.

> So far the assumption has been that RLIMIT_MEMLOCK is a limit on the pages
> that userspace has mlocked.
> 
> You want the counter to mean something different it seems. What is it?

It would've been so much better if you'd asked me that before you mucked
it up.

The intent of RLIMIT_MEMLOCK is clearly to limit the amount of memory
the user can lock into core memory as per the direct definition of
mlock. By this definition pinned should be included. This leaves us to
solve the issue of pinned mlocked memory.

> I think we need to be first clear on what we want to accomplish and what
> these counters actually should count before changing things.

Backward isn't it... _you_ changed it without consideration.

> Certainly would appreciate improvements in this area but resurrecting the
> conflation between mlocked and pinned pages is not the way to go.
> 
> > This patch proposes to properly fix the problem by introducing
> > VM_PINNED. This also provides the groundwork for a possible mpin()
> > syscall or MADV_PIN -- although these are not included.
> 
> Maybe add a new PIN page flag? Pages are not pinned per vma as the patch
> seems to assume.

The IB code does a big get_user_pages(), which last time I checked
pins a sequential range of pages. Therefore the VMA approach.

The perf thing is guaranteed to be virtually sequential and already is a
VMA -- its the result of a user mmap() call.

> > It recognises that pinned page semantics are a strict super-set of
> > locked page semantics -- a pinned page will not generate major faults
> > (and thus satisfies mlock() requirements).
> 
> Not exactly true. Pinned pages may not have the mlocked flag set and they
> are not managed on the unevictable LRU lists of the MM.

Re-read the spec; pinned pages stay in memory; therefore they qualify.
The fact that we've gone all 'smart' about dealing with such pages is
immaterial.

> > If people find this approach unworkable, I request we revert the above
> > mentioned patch to at least restore RLIMIT_MEMLOCK to a usable state
> > again.
> 
> Cannot do that. This will cause the breakage that the patch was fixing to
> resurface.

But fixes my borkage, which I argue is much more user visible since it
doesn't require CAP_IPC_LOCK :-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-26  1:11                       ` KOSAKI Motohiro
@ 2013-05-28 16:19                         ` Christoph Lameter
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Lameter @ 2013-05-28 16:19 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, Roland Dreier, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz

On Sat, 25 May 2013, KOSAKI Motohiro wrote:

> If pinned and mlocked are totally difference intentionally, why IB uses
> RLIMIT_MEMLOCK. Why don't IB uses IB specific limit and why only IB raise up
> number of pinned pages and other gup users don't.
> I can't guess IB folk's intent.

True another limit would be better. The reason that IB raises the
pinned pages is because IB permanently pins those pages. Other users of
gup do that temporarily.

If there are other users that pin pages permanently should also account
for it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-27  6:48                       ` Peter Zijlstra
@ 2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
                                             ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Christoph Lameter @ 2013-05-28 16:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins

On Mon, 27 May 2013, Peter Zijlstra wrote:

> Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> had a single resource counter. After your patch RLIMIT_MEMLOCK is
> applied separately to both -- more or less.

Before the patch the count was doubled since a single page was counted
twice: Once because it was mlocked (marked with PG_mlock) and then again
because it was also pinned (the refcount was increased). Two different things.

We have agreed for a long time that mlocked pages are movable. That is not
true for pinned pages and therefore pinning pages therefore do not fall
into that category (Hugh? AFAICR you came up with that rule?)

> NO, mlocked pages are pages that do not leave core memory; IOW do not
> cause major faults. Pinning pages is a perfectly spec compliant mlock()
> implementation.

That is not the definition that we have used so far.

> Now in an earlier discussion on the issue 'we' (I can't remember if you
> participated there, I remember Mel and Kosaki-San) agreed that for
> 'normal' (read not whacky real-time people) mlock can still be useful
> and we should introduce a pinned user API for the RT people.

Right. I remember that.

> > Pinned pages are pages that have an elevated refcount because the hardware
> > needs to use these pages for I/O. The elevated refcount may be temporary
> > (then we dont care about this) or for a longer time (such as the memory
> > registration of the IB subsystem). That is when we account the memory as
> > pinned. The elevated refcount stops page migration and other things from
> > trying to move that memory.
>
> Again I _know_ that!!!

But then you refuse to acknowledge the difference and want to conflate
both.

> > Pages can be both pinned and mlocked.
>
> Right, but apart for mlockall() this is a highly unlikely situation to
> actually occur. And if you're using mlockall() you've effectively
> disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
> goes funny.

mlockall() would never be used on all processes. You still need the
RLIMIT_MLOCK to ensure that the box does not lock up.

> > I think we need to be first clear on what we want to accomplish and what
> > these counters actually should count before changing things.
>
> Backward isn't it... _you_ changed it without consideration.

I applied the categorization that we had agreed on before during the
development of page migratiob. Pinning is not compatible.

> The IB code does a big get_user_pages(), which last time I checked
> pins a sequential range of pages. Therefore the VMA approach.

The IB code (and other code) can require the pinning of pages in various
ways.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [regression] Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
@ 2013-05-29  7:58                           ` Ingo Molnar
  2013-05-29 19:53                             ` KOSAKI Motohiro
  2013-05-30 18:30                           ` Peter Zijlstra
  2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-05-29  7:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, linux-kernel,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	akpm, torvalds, roland, infinipath, linux-mm, linux-rdma,
	Or Gerlitz, Hugh Dickins


* Christoph Lameter <cl@linux.com> wrote:

> On Mon, 27 May 2013, Peter Zijlstra wrote:
> 
> > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> > had a single resource counter. After your patch RLIMIT_MEMLOCK is
> > applied separately to both -- more or less.
> 
> Before the patch the count was doubled since a single page was counted 
> twice: Once because it was mlocked (marked with PG_mlock) and then again 
> because it was also pinned (the refcount was increased). Two different 
> things.

Christoph, why are you *STILL* arguing??

You caused a *regression* in a userspace ABI plain and simple, and a 
security relevant one. Furtermore you modified kernel/events/core.c yet 
you never even Cc:-ed the parties involved ...

All your excuses, obfuscation and attempts to redefine the universe to 
your liking won't change reality: it worked before, it does not now. Take 
responsibility for your action for christ's sake and move forward towards 
a resolution , okay?

When can we expect a fix from you for the breakage you caused? Or at least 
a word that acknowledges that you broke a user ABI carelessly?

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
@ 2013-05-29 19:53                             ` KOSAKI Motohiro
  2013-05-30  6:32                               ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2013-05-29 19:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver,
	linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins,
	kosaki.motohiro

Hi

I'm unhappy you guys uses offensive word so much. Please cool down all you guys. :-/
In fact, _BOTH_ the behavior before and after Cristoph's patch doesn't have cleaner semantics.
And PeterZ proposed make new cleaner one rather than revert. No need to hassle.

I'm 100% sure -rt people need stronger-mlock api. Please join discussion to make better API.
In my humble opinion is: we should make mlock3(addr, len flags) new syscall (*) and support
-rt requirement directly. And current strange IB RLIMIT_MEMLOCK usage should gradually migrate
it.
(*) or, to enhance mbind() is an option because i expect apps need to pin pages nearby NUMA nodes
in many case.

As your know, current IB pinning implementation doesn't guarantee no minor fault when fork
is used. It's ok for IB. They uses madvise(MADV_NOFORK) too. But I'm not sure *all* of rt
application are satisfied this. We might need to implement copy-on-fork or might not. I'd
like hear other people's opinion.

Also, all developer should know this pinning breaks when memory hot-plug is happen likes
cpu bounding bysched_setaffinity() may break when cpu hot-remove.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-29 19:53                             ` KOSAKI Motohiro
@ 2013-05-30  6:32                               ` Ingo Molnar
  2013-05-30 20:42                                 ` KOSAKI Motohiro
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2013-05-30  6:32 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver,
	linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins


* KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:

> Hi
> 
> I'm unhappy you guys uses offensive word so much. Please cool down all 
> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's 
> patch doesn't have cleaner semantics.

Erm, this feature _regressed_ after the patch. All other concerns are 
secondary. What's so difficult to understand about that?

> And PeterZ proposed make new cleaner one rather than revert. No need to 
> hassle.

Sorry, that's not how we deal with regressions upstream...

I've been pretty polite in fact, compared to say Linus - someone should 
put a politically correct summary of:

   https://lkml.org/lkml/2013/2/22/456
   https://lkml.org/lkml/2013/3/26/1113

into Documentation/, people keep forgetting about it.

I'm sorry if being direct and to the point offends you [no, not really], 
this discussion got _way_ off the real track it should be on: that this is 
a _regression_.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
@ 2013-05-30 18:30                           ` Peter Zijlstra
  2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2013-05-30 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Al Viro, Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, trinity, akpm, torvalds, roland,
	infinipath, linux-mm, linux-rdma, Or Gerlitz, Hugh Dickins

On Tue, May 28, 2013 at 04:37:06PM +0000, Christoph Lameter wrote:
> On Mon, 27 May 2013, Peter Zijlstra wrote:
> 
> > Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
> > had a single resource counter. After your patch RLIMIT_MEMLOCK is
> > applied separately to both -- more or less.
> 
> Before the patch the count was doubled since a single page was counted
> twice: Once because it was mlocked (marked with PG_mlock) and then again
> because it was also pinned (the refcount was increased). Two different things.

Before the patch RLIMIT_MEMLOCK was a limit on the sum of both, after it
is no longer. This change was not described in the changelog.

This is a bug plain and simple and you refuse to acknowledge it.

> We have agreed for a long time that mlocked pages are movable. That is not
> true for pinned pages and therefore pinning pages therefore do not fall
> into that category (Hugh? AFAICR you came up with that rule?)

Into what category? RLIMIT_MEMLOCK should be a limit on the amount of
pages that are constrained to memory (can not be paged). How does a
pinned page not qualify for that?

> > NO, mlocked pages are pages that do not leave core memory; IOW do not
> > cause major faults. Pinning pages is a perfectly spec compliant mlock()
> > implementation.
> 
> That is not the definition that we have used so far.

There's two statements there; which one do you disagree with?

On the first; that is exactly the mlock() definition you want; excluding
major faults only means faults cannot be subject to IO, IOW pages must
remain in memory. Not excluding minor faults allows you to unmap and
migrate pages.

On the second; excluding any faults; that is what the mlock() specs
intended -- because that is the thing real-time people really want and
mlock is part of the real-time POSIX spec.

[ Hence the need to provide mpin() and munpin() syscalls before we make
  mlock() pages migratable, otherwise RT people are screwed. ]

Since excluding any fault per definition also excludes major faults,
the second is a clear superset of the first. Both ensure pages stay in
memory, thus both should count towards RLIMIT_MEMLOCK.

> But then you refuse to acknowledge the difference and want to conflate
> both.

You're the one that is confused. You're just repeating yourself without
providing argument or reason. 

> > > Pages can be both pinned and mlocked.
> >
> > Right, but apart for mlockall() this is a highly unlikely situation to
> > actually occur. And if you're using mlockall() you've effectively
> > disabled RLIMIT_MEMLOCK and thus nobody cares if the resource counter
> > goes funny.
> 
> mlockall() would never be used on all processes. You still need the
> RLIMIT_MLOCK to ensure that the box does not lock up.

You're so skilled at missing the point its not funny. 

The resource counter (formerly mm_struct::locked_vm, currently broken)
associated with RLIMIT_MEMLOCK is per process. Therefore when you use
mlockall() you've already done away with the limit, and thus the
resource counter value is irrelevant.

> > > I think we need to be first clear on what we want to accomplish and what
> > > these counters actually should count before changing things.
> >
> > Backward isn't it... _you_ changed it without consideration.
> 
> I applied the categorization that we had agreed on before during the
> development of page migratiob. Pinning is not compatible.

I've never agreed to any such thing, you changed my code without cc'ing
me and without describing the actual change.

Please explain how pinned pages are not stuck in memory and can be
paged? Once you've convinced me of that, I'll concede they should not be
counted towards RLIMIT_MEMLOCK.

> > The IB code does a big get_user_pages(), which last time I checked
> > pins a sequential range of pages. Therefore the VMA approach.
> 
> The IB code (and other code) can require the pinning of pages in various
> ways.

You can also mlock()/madvise()/mprotect() in various ways, this is a
non-argument against using VMAs.

Is there anything besides IB and perf that has user controlled page
pinning? If so, it needs fixing.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-28 16:37                         ` Christoph Lameter
  2013-05-29  7:58                           ` [regression] " Ingo Molnar
  2013-05-30 18:30                           ` Peter Zijlstra
@ 2013-05-30 19:59                           ` Pekka Enberg
  2 siblings, 0 replies; 12+ messages in thread
From: Pekka Enberg @ 2013-05-30 19:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Al Viro, Vince Weaver, LKML, Paul Mackerras,
	Ingo Molnar, Arnaldo Carvalho de Melo, trinity, Andrew Morton,
	Linus Torvalds, roland, infinipath, linux-mm@kvack.org,
	linux-rdma, Or Gerlitz, Hugh Dickins

On Mon, 27 May 2013, Peter Zijlstra wrote:
>> Before your patch pinned was included in locked and thus RLIMIT_MEMLOCK
>> had a single resource counter. After your patch RLIMIT_MEMLOCK is
>> applied separately to both -- more or less.

On Tue, May 28, 2013 at 7:37 PM, Christoph Lameter <cl@linux.com> wrote:
> Before the patch the count was doubled since a single page was counted
> twice: Once because it was mlocked (marked with PG_mlock) and then again
> because it was also pinned (the refcount was increased). Two different things.

Pinned vs. mlocked counting isn't the problem here. You changed
RLIMIT_MEMLOCK userspace ABI and that needs to be restored. So the
question is how can we preserve the old RLIMIT_MEMLOCK semantics while
avoiding the double accounting issue.

                        Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-30  6:32                               ` Ingo Molnar
@ 2013-05-30 20:42                                 ` KOSAKI Motohiro
  2013-05-31  9:27                                   ` Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: KOSAKI Motohiro @ 2013-05-30 20:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	Andrew Morton, Linus Torvalds, Roland Dreier, infinipath,
	linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins

>> I'm unhappy you guys uses offensive word so much. Please cool down all
>> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's
>> patch doesn't have cleaner semantics.
>
> Erm, this feature _regressed_ after the patch. All other concerns are
> secondary. What's so difficult to understand about that?

Because it is not new commit at all. Christoph's patch was introduced
10 releases before.

$ git describe bc3e53f682
v3.1-7235-gbc3e53f

If we just revert it, we may get another and opposite regression
report. I'm worried
about it. Moreover, I don't think discussion better fix is too difficult for us.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK
  2013-05-30 20:42                                 ` KOSAKI Motohiro
@ 2013-05-31  9:27                                   ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2013-05-31  9:27 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Peter Zijlstra, Al Viro, Vince Weaver, LKML,
	Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo, trinity,
	Andrew Morton, Linus Torvalds, Roland Dreier, infinipath,
	linux-mm@kvack.org, linux-rdma, Or Gerlitz, Hugh Dickins


* KOSAKI Motohiro <kosaki.motohiro@gmail.com> wrote:

> >> I'm unhappy you guys uses offensive word so much. Please cool down all
> >> you guys. :-/ In fact, _BOTH_ the behavior before and after Cristoph's
> >> patch doesn't have cleaner semantics.
> >
> > Erm, this feature _regressed_ after the patch. All other concerns are
> > secondary. What's so difficult to understand about that?
> 
> Because it is not new commit at all. Christoph's patch was introduced
> 10 releases before.

That's indeed sad - resource limits are not typically tested by apps. The 
lack of Cc:s to people affected also helped hide the effects of the 
change.

> $ git describe bc3e53f682
> v3.1-7235-gbc3e53f
> 
> If we just revert it, we may get another and opposite regression report. 
> I'm worried about it. Moreover, I don't think discussion better fix is 
> too difficult for us.

Sure, I agree that a fix is generally better.

Thanks,

	Ingo

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2013-05-31  9:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.DEB.2.10.1305221523420.9944@vincent-weaver-1.um.maine.edu>
     [not found] ` <alpine.DEB.2.10.1305221953370.11450@vincent-weaver-1.um.maine.edu>
     [not found]   ` <alpine.DEB.2.10.1305222344060.12929@vincent-weaver-1.um.maine.edu>
     [not found]     ` <20130523044803.GA25399@ZenIV.linux.org.uk>
     [not found]       ` <20130523104154.GA23650@twins.programming.kicks-ass.net>
     [not found]         ` <0000013ed1b8d0cc-ad2bb878-51bd-430c-8159-629b23ed1b44-000000@email.amazonses.com>
     [not found]           ` <20130523152458.GD23650@twins.programming.kicks-ass.net>
     [not found]             ` <0000013ed2297ba8-467d474a-7068-45b3-9fa3-82641e6aa363-000000@email.amazonses.com>
     [not found]               ` <20130523163901.GG23650@twins.programming.kicks-ass.net>
     [not found]                 ` <0000013ed28b638a-066d7dc7-b590-49f8-9423-badb9537b8b6-000000@email.amazonses.com>
     [not found]                   ` <20130524140114.GK23650@twins.programming.kicks-ass.net>
2013-05-24 15:40                     ` [RFC][PATCH] mm: Fix RLIMIT_MEMLOCK Christoph Lameter
2013-05-26  1:11                       ` KOSAKI Motohiro
2013-05-28 16:19                         ` Christoph Lameter
2013-05-27  6:48                       ` Peter Zijlstra
2013-05-28 16:37                         ` Christoph Lameter
2013-05-29  7:58                           ` [regression] " Ingo Molnar
2013-05-29 19:53                             ` KOSAKI Motohiro
2013-05-30  6:32                               ` Ingo Molnar
2013-05-30 20:42                                 ` KOSAKI Motohiro
2013-05-31  9:27                                   ` Ingo Molnar
2013-05-30 18:30                           ` Peter Zijlstra
2013-05-30 19:59                           ` Pekka Enberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox