Re: [PATCH 0/3] ipc subsystem refcounter conversions

public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
       [not found] ` <20170303162352.b6af1c0c3115b3f5f1e7aed3@linux-foundation.org>
@ 2017-05-27 19:58   ` Kees Cook
  2017-05-27 19:58     ` Kees Cook
  2017-05-29  8:39     ` Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: Kees Cook @ 2017-05-27 19:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Elena Reshetova, Peter Zijlstra, Greg KH, Eric W. Biederman,
	Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, Christoph Hellwig,
	axboe@kernel.dk, James Bottomley, x86@kernel.org, Ingo Molnar,
	Arnd Bergmann, David S. Miller, Rik van Riel, linux-arch,
	kernel-hardening

On Fri, Mar 3, 2017 at 4:23 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 20 Feb 2017 13:29:46 +0200 Elena Reshetova <elena.reshetova@intel.com> wrote:
>
>> Now when new refcount_t type and API are finally merged
>> (see include/linux/refcount.h), the following
>> patches convert various refcounters in the ipc susystem from atomic_t
>> to refcount_t. By doing this we prevent intentional or accidental
>> underflows or overflows that can led to use-after-free vulnerabilities.
>>
>> The below patches are fully independent and can be cherry-picked separately.
>> Since we convert all kernel subsystems in the same fashion, resulting
>> in about 300 patches, we have to group them for sending at least in some
>> fashion to be manageable. Please excuse the long cc list.
>
> Again, the refcount_t operations are much more expensive than the bare
> atomic_t operations.  I'm reluctant to merge any of these conversions

Since Manfred did a recent refactor of IPC RCU, and the refcount usage
is minimal, it seemed a good time to ask after these patches again.
(And send an updated one for the refactor.)

> without either
>
> a) a convincing demonstration that the performance impact is
>    sufficiently small (ie: unmeasurable) or

I've sent a few versions of a much faster refcount implementation
which has no measurable difference to atomic_t operations (based on
the PaX implementation). Getting more eyes on that would be nice; I'll
include you on CC when I send the next version.

> b) a compile-time option to disable the refcount_t operations (make
>    them generate the same code as the bare atomic_t ops).  Along with
>    some suitably reliable means of preventing people from accidentally
>    enabling the debug code in production builds.

Since the speed-ups will likely be arch-specific, should my proposed
CONFIG_FAST_REFCOUNT be made arch-indep when no arch-specific
implementation available (via totally unchecked atomic_t ops)? i.e.:

FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
full-verification
FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
with no verification (i.e. no functional change from now)
FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
overflow protection

which means FAST_REFCOUNT would need to be default-on so that mm,
block, net users will remain happy.

Does that sound reasonable?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-27 19:58   ` [PATCH 0/3] ipc subsystem refcounter conversions Kees Cook
@ 2017-05-27 19:58     ` Kees Cook
  2017-05-29  8:39     ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Kees Cook @ 2017-05-27 19:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Elena Reshetova, Peter Zijlstra, Greg KH, Eric W. Biederman,
	Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, Christoph Hellwig,
	axboe@kernel.dk, James Bottomley, x86@kernel.org, Ingo Molnar,
	Arnd Bergmann, David S. Miller, Rik van Riel, linux-arch,
	kernel-hardening@lists.openwall.com, LKML

On Fri, Mar 3, 2017 at 4:23 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 20 Feb 2017 13:29:46 +0200 Elena Reshetova <elena.reshetova@intel.com> wrote:
>
>> Now when new refcount_t type and API are finally merged
>> (see include/linux/refcount.h), the following
>> patches convert various refcounters in the ipc susystem from atomic_t
>> to refcount_t. By doing this we prevent intentional or accidental
>> underflows or overflows that can led to use-after-free vulnerabilities.
>>
>> The below patches are fully independent and can be cherry-picked separately.
>> Since we convert all kernel subsystems in the same fashion, resulting
>> in about 300 patches, we have to group them for sending at least in some
>> fashion to be manageable. Please excuse the long cc list.
>
> Again, the refcount_t operations are much more expensive than the bare
> atomic_t operations.  I'm reluctant to merge any of these conversions

Since Manfred did a recent refactor of IPC RCU, and the refcount usage
is minimal, it seemed a good time to ask after these patches again.
(And send an updated one for the refactor.)

> without either
>
> a) a convincing demonstration that the performance impact is
>    sufficiently small (ie: unmeasurable) or

I've sent a few versions of a much faster refcount implementation
which has no measurable difference to atomic_t operations (based on
the PaX implementation). Getting more eyes on that would be nice; I'll
include you on CC when I send the next version.

> b) a compile-time option to disable the refcount_t operations (make
>    them generate the same code as the bare atomic_t ops).  Along with
>    some suitably reliable means of preventing people from accidentally
>    enabling the debug code in production builds.

Since the speed-ups will likely be arch-specific, should my proposed
CONFIG_FAST_REFCOUNT be made arch-indep when no arch-specific
implementation available (via totally unchecked atomic_t ops)? i.e.:

FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
full-verification
FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
with no verification (i.e. no functional change from now)
FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
overflow protection

which means FAST_REFCOUNT would need to be default-on so that mm,
block, net users will remain happy.

Does that sound reasonable?

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-27 19:58   ` [PATCH 0/3] ipc subsystem refcounter conversions Kees Cook
  2017-05-27 19:58     ` Kees Cook
@ 2017-05-29  8:39     ` Christoph Hellwig
  2017-05-29  8:39       ` Christoph Hellwig
  2017-05-29  9:11       ` Eric W. Biederman
  1 sibling, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2017-05-29  8:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Elena Reshetova, Peter Zijlstra, Greg KH,
	Eric W. Biederman, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn,
	arozansk, Davidlohr Bueso, Manfred Spraul, Christoph Hellwig,
	axboe@kernel.dk, James Bottomley, x86@kernel.org, Ingo Molnar,
	Arnd Bergmann, David S. Miller, Rik van Riel, linux-arch

On Sat, May 27, 2017 at 12:58:14PM -0700, Kees Cook wrote:
> FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
> full-verification
> FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
> with no verification (i.e. no functional change from now)
> FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
> overflow protection
> 
> which means FAST_REFCOUNT would need to be default-on so that mm,
> block, net users will remain happy.
> 
> Does that sound reasonable?

I'd rather turn the options around so that the atomic_t or fast
arch implementations are the defaul.  But either way it needs to
be configurable.  Once that is done we can spread refcount_t everywhere
and everyone will be better off, if only for the documentation value
of the type when they use the atomic_t based implementation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29  8:39     ` Christoph Hellwig
@ 2017-05-29  8:39       ` Christoph Hellwig
  2017-05-29  9:11       ` Eric W. Biederman
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2017-05-29  8:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Elena Reshetova, Peter Zijlstra, Greg KH,
	Eric W. Biederman, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn,
	arozansk, Davidlohr Bueso, Manfred Spraul, Christoph Hellwig,
	axboe@kernel.dk, James Bottomley, x86@kernel.org, Ingo Molnar,
	Arnd Bergmann, David S. Miller, Rik van Riel, linux-arch,
	kernel-hardening@lists.openwall.com, LKML

On Sat, May 27, 2017 at 12:58:14PM -0700, Kees Cook wrote:
> FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
> full-verification
> FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
> with no verification (i.e. no functional change from now)
> FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
> overflow protection
> 
> which means FAST_REFCOUNT would need to be default-on so that mm,
> block, net users will remain happy.
> 
> Does that sound reasonable?

I'd rather turn the options around so that the atomic_t or fast
arch implementations are the defaul.  But either way it needs to
be configurable.  Once that is done we can spread refcount_t everywhere
and everyone will be better off, if only for the documentation value
of the type when they use the atomic_t based implementation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29  8:39     ` Christoph Hellwig
  2017-05-29  8:39       ` Christoph Hellwig
@ 2017-05-29  9:11       ` Eric W. Biederman
  2017-05-29  9:11         ` Eric W. Biederman
  2017-05-29 10:24         ` Peter Zijlstra
  1 sibling, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29  9:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, Andrew Morton, Elena Reshetova, Peter Zijlstra,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

Christoph Hellwig <hch@infradead.org> writes:

> On Sat, May 27, 2017 at 12:58:14PM -0700, Kees Cook wrote:
>> FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
>> full-verification
>> FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
>> with no verification (i.e. no functional change from now)
>> FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
>> overflow protection
>> 
>> which means FAST_REFCOUNT would need to be default-on so that mm,
>> block, net users will remain happy.
>> 
>> Does that sound reasonable?
>
> I'd rather turn the options around so that the atomic_t or fast
> arch implementations are the defaul.  But either way it needs to
> be configurable.  Once that is done we can spread refcount_t everywhere
> and everyone will be better off, if only for the documentation value
> of the type when they use the atomic_t based implementation.

Agreed on the opposite default as opting into common library
implementations is how we typically handle things in linux.

Kees I I have a concern:

__must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
{
        unsigned int new, val = atomic_read(&r->refs);

        do {
                if (!val)
                        return false;

                if (unlikely(val == UINT_MAX))
                        return true;

                new = val + i;
                if (new < val)
                        new = UINT_MAX;

        } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));

        WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");

        return true;
}

Why in the world do you succeed when you the value saturates????

From a code perspective that is bizarre.   The code already has to handle
the case when the counter does not increment.

So since we already have to have that code to handle the failure to
increment I think it would make much more sense if the counters did not
only saturate but report failure when the counter can not increment.

Right now I am inclined to NACK refcount_t conversions because their
semantics don't make sense.  Which would seem to make the code less
correct by introducing bizarre corner cases instead of letting the code
use the it's existing handling of the failure to increment or decrement
the counter.

Fixing the return value would move refcount_t into the realm of
something that is desirable because it has bettern semantics and
is more useful just on a day to day correctness point of view.  Even
ignoring the security implications.

I suspect that would also make it easier for refcount_t implementations
to produce efficient code.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29  9:11       ` Eric W. Biederman
@ 2017-05-29  9:11         ` Eric W. Biederman
  2017-05-29 10:24         ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29  9:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, Andrew Morton, Elena Reshetova, Peter Zijlstra,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

Christoph Hellwig <hch@infradead.org> writes:

> On Sat, May 27, 2017 at 12:58:14PM -0700, Kees Cook wrote:
>> FAST_REFCOUNT=n: use function-based refcount_t with cmpxvhg and
>> full-verification
>> FAST_REFCOUNT=y without arch-specific implementation: use atomic_t
>> with no verification (i.e. no functional change from now)
>> FAST_REFCOUNT=y with arch-specific implementation: use atomic_t with
>> overflow protection
>> 
>> which means FAST_REFCOUNT would need to be default-on so that mm,
>> block, net users will remain happy.
>> 
>> Does that sound reasonable?
>
> I'd rather turn the options around so that the atomic_t or fast
> arch implementations are the defaul.  But either way it needs to
> be configurable.  Once that is done we can spread refcount_t everywhere
> and everyone will be better off, if only for the documentation value
> of the type when they use the atomic_t based implementation.

Agreed on the opposite default as opting into common library
implementations is how we typically handle things in linux.

Kees I I have a concern:

__must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
{
        unsigned int new, val = atomic_read(&r->refs);

        do {
                if (!val)
                        return false;

                if (unlikely(val == UINT_MAX))
                        return true;

                new = val + i;
                if (new < val)
                        new = UINT_MAX;

        } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));

        WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");

        return true;
}

Why in the world do you succeed when you the value saturates????

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29  9:11       ` Eric W. Biederman
  2017-05-29  9:11         ` Eric W. Biederman
@ 2017-05-29 10:24         ` Peter Zijlstra
  2017-05-29 10:24           ` Peter Zijlstra
  2017-05-29 10:49           ` Eric W. Biederman
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 10:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:

> Kees I I have a concern:
> 
> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
> {
>         unsigned int new, val = atomic_read(&r->refs);
> 
>         do {
>                 if (!val)
>                         return false;
> 
>                 if (unlikely(val == UINT_MAX))
>                         return true;
> 
>                 new = val + i;
>                 if (new < val)
>                         new = UINT_MAX;
> 
>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
> 
>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> 
>         return true;
> }
> 
> Why in the world do you succeed when you the value saturates????

Why not? On saturation the object will leak and returning a reference to
it is always good.

> From a code perspective that is bizarre.   The code already has to handle
> the case when the counter does not increment.

I don't see it as bizarre, we turned an overflow/use-after-free into a
leak. That's the primary mechanism here.

As long as we have a reference to a leaked object, we might as well use
it, its not going anywhere.

> Fixing the return value would move refcount_t into the realm of
> something that is desirable because it has bettern semantics and
> is more useful just on a day to day correctness point of view.  Even
> ignoring the security implications.

It changes the semantics between inc_not_zero() and inc(). It also
complicates the semantics of inc_not_zero(), where currently the failure
implies the count is 0 and means no-such-object, you complicate matters
by basically returning 'busy'.

That is a completely new class of failure that is actually hard to deal
with, not to mention that it completely destroys refcount_inc_not_zero()
being a 'simple' replacement for atomic_inc_not_zero().

In case of the current failure, the no-such-object, we can fix that by
creating said object. But what to do on 'busy' ? Surely you don't want
to create another. You'd have to somehow retrofit something to wait on
in every user.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 10:24         ` Peter Zijlstra
@ 2017-05-29 10:24           ` Peter Zijlstra
  2017-05-29 10:49           ` Eric W. Biederman
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 10:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:

> Kees I I have a concern:
> 
> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
> {
>         unsigned int new, val = atomic_read(&r->refs);
> 
>         do {
>                 if (!val)
>                         return false;
> 
>                 if (unlikely(val == UINT_MAX))
>                         return true;
> 
>                 new = val + i;
>                 if (new < val)
>                         new = UINT_MAX;
> 
>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
> 
>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
> 
>         return true;
> }
> 
> Why in the world do you succeed when you the value saturates????

Why not? On saturation the object will leak and returning a reference to
it is always good.

> From a code perspective that is bizarre.   The code already has to handle
> the case when the counter does not increment.

I don't see it as bizarre, we turned an overflow/use-after-free into a
leak. That's the primary mechanism here.

As long as we have a reference to a leaked object, we might as well use
it, its not going anywhere.

> Fixing the return value would move refcount_t into the realm of
> something that is desirable because it has bettern semantics and
> is more useful just on a day to day correctness point of view.  Even
> ignoring the security implications.

It changes the semantics between inc_not_zero() and inc(). It also
complicates the semantics of inc_not_zero(), where currently the failure
implies the count is 0 and means no-such-object, you complicate matters
by basically returning 'busy'.

That is a completely new class of failure that is actually hard to deal
with, not to mention that it completely destroys refcount_inc_not_zero()
being a 'simple' replacement for atomic_inc_not_zero().

In case of the current failure, the no-such-object, we can fix that by
creating said object. But what to do on 'busy' ? Surely you don't want
to create another. You'd have to somehow retrofit something to wait on
in every user.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 10:24         ` Peter Zijlstra
  2017-05-29 10:24           ` Peter Zijlstra
@ 2017-05-29 10:49           ` Eric W. Biederman
  2017-05-29 10:49             ` Eric W. Biederman
                               ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>
>> Kees I I have a concern:
>> 
>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>> {
>>         unsigned int new, val = atomic_read(&r->refs);
>> 
>>         do {
>>                 if (!val)
>>                         return false;
>> 
>>                 if (unlikely(val == UINT_MAX))
>>                         return true;
>> 
>>                 new = val + i;
>>                 if (new < val)
>>                         new = UINT_MAX;
>> 
>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>> 
>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>> 
>>         return true;
>> }
>> 
>> Why in the world do you succeed when you the value saturates????
>
> Why not? On saturation the object will leak and returning a reference to
> it is always good.
>
>> From a code perspective that is bizarre.   The code already has to handle
>> the case when the counter does not increment.
>
> I don't see it as bizarre, we turned an overflow/use-after-free into a
> leak. That's the primary mechanism here.
>
> As long as we have a reference to a leaked object, we might as well use
> it, its not going anywhere.
>
>> Fixing the return value would move refcount_t into the realm of
>> something that is desirable because it has bettern semantics and
>> is more useful just on a day to day correctness point of view.  Even
>> ignoring the security implications.
>
> It changes the semantics between inc_not_zero() and inc(). It also
> complicates the semantics of inc_not_zero(), where currently the failure
> implies the count is 0 and means no-such-object, you complicate matters
> by basically returning 'busy'.

Busy is not a state of a reference count.

It is true I am suggesting treating something with a saturated reference
as not available.  If that is what you mean by busy.  But if it's
reference is zero it is also not available.  So there is no practical
difference.

> That is a completely new class of failure that is actually hard to deal
> with, not to mention that it completely destroys refcount_inc_not_zero()
> being a 'simple' replacement for atomic_inc_not_zero().
>
> In case of the current failure, the no-such-object, we can fix that by
> creating said object. But what to do on 'busy' ? Surely you don't want
> to create another. You'd have to somehow retrofit something to wait on
> in every user.

Using little words.

A return of true from inc_not_zero means we took a reference.
A return of false means we did not take a reference.

The code already handles I took a reference or I did not take a
reference.

Therefore lying with refcount_t is not helpful.  It takes failures
the code could easily handle and turns them into leaks.

At least that is how I have seen reference counts used.  And those
are definitely the plane obivous semantics.

Your changes are definitely not drop in replacements for atomic_t in my
code.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 10:49           ` Eric W. Biederman
@ 2017-05-29 10:49             ` Eric W. Biederman
  2017-05-29 11:30             ` Eric W. Biederman
  2017-05-29 12:13             ` Peter Zijlstra
  2 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 10:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>
>> Kees I I have a concern:
>> 
>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>> {
>>         unsigned int new, val = atomic_read(&r->refs);
>> 
>>         do {
>>                 if (!val)
>>                         return false;
>> 
>>                 if (unlikely(val == UINT_MAX))
>>                         return true;
>> 
>>                 new = val + i;
>>                 if (new < val)
>>                         new = UINT_MAX;
>> 
>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>> 
>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>> 
>>         return true;
>> }
>> 
>> Why in the world do you succeed when you the value saturates????
>
> Why not? On saturation the object will leak and returning a reference to
> it is always good.
>
>> From a code perspective that is bizarre.   The code already has to handle
>> the case when the counter does not increment.
>
> I don't see it as bizarre, we turned an overflow/use-after-free into a
> leak. That's the primary mechanism here.
>
> As long as we have a reference to a leaked object, we might as well use
> it, its not going anywhere.
>
>> Fixing the return value would move refcount_t into the realm of
>> something that is desirable because it has bettern semantics and
>> is more useful just on a day to day correctness point of view.  Even
>> ignoring the security implications.
>
> It changes the semantics between inc_not_zero() and inc(). It also
> complicates the semantics of inc_not_zero(), where currently the failure
> implies the count is 0 and means no-such-object, you complicate matters
> by basically returning 'busy'.

Busy is not a state of a reference count.

It is true I am suggesting treating something with a saturated reference
as not available.  If that is what you mean by busy.  But if it's
reference is zero it is also not available.  So there is no practical
difference.

> That is a completely new class of failure that is actually hard to deal
> with, not to mention that it completely destroys refcount_inc_not_zero()
> being a 'simple' replacement for atomic_inc_not_zero().
>
> In case of the current failure, the no-such-object, we can fix that by
> creating said object. But what to do on 'busy' ? Surely you don't want
> to create another. You'd have to somehow retrofit something to wait on
> in every user.

Using little words.

A return of true from inc_not_zero means we took a reference.
A return of false means we did not take a reference.

The code already handles I took a reference or I did not take a
reference.

Therefore lying with refcount_t is not helpful.  It takes failures
the code could easily handle and turns them into leaks.

At least that is how I have seen reference counts used.  And those
are definitely the plane obivous semantics.

Your changes are definitely not drop in replacements for atomic_t in my
code.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 10:49           ` Eric W. Biederman
  2017-05-29 10:49             ` Eric W. Biederman
@ 2017-05-29 11:30             ` Eric W. Biederman
  2017-05-29 11:30               ` Eric W. Biederman
  2017-05-29 11:39               ` Eric W. Biederman
  2017-05-29 12:13             ` Peter Zijlstra
  2 siblings, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

ebiederm@xmission.com (Eric W. Biederman) writes:

> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>>
>>> Kees I I have a concern:
>>> 
>>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>>> {
>>>         unsigned int new, val = atomic_read(&r->refs);
>>> 
>>>         do {
>>>                 if (!val)
>>>                         return false;
>>> 
>>>                 if (unlikely(val == UINT_MAX))
>>>                         return true;
>>> 
>>>                 new = val + i;
>>>                 if (new < val)
>>>                         new = UINT_MAX;
>>> 
>>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>>> 
>>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>>> 
>>>         return true;
>>> }
>>> 
>>> Why in the world do you succeed when you the value saturates????
>>
>> Why not? On saturation the object will leak and returning a reference to
>> it is always good.
>>
>>> From a code perspective that is bizarre.   The code already has to handle
>>> the case when the counter does not increment.
>>
>> I don't see it as bizarre, we turned an overflow/use-after-free into a
>> leak. That's the primary mechanism here.
>>
>> As long as we have a reference to a leaked object, we might as well use
>> it, its not going anywhere.
>>
>>> Fixing the return value would move refcount_t into the realm of
>>> something that is desirable because it has bettern semantics and
>>> is more useful just on a day to day correctness point of view.  Even
>>> ignoring the security implications.
>>
>> It changes the semantics between inc_not_zero() and inc(). It also
>> complicates the semantics of inc_not_zero(), where currently the failure
>> implies the count is 0 and means no-such-object, you complicate matters
>> by basically returning 'busy'.
>
> Busy is not a state of a reference count.
>
> It is true I am suggesting treating something with a saturated reference
> as not available.  If that is what you mean by busy.  But if it's
> reference is zero it is also not available.  So there is no practical
> difference.
>
>> That is a completely new class of failure that is actually hard to deal
>> with, not to mention that it completely destroys refcount_inc_not_zero()
>> being a 'simple' replacement for atomic_inc_not_zero().
>>
>> In case of the current failure, the no-such-object, we can fix that by
>> creating said object. But what to do on 'busy' ? Surely you don't want
>> to create another. You'd have to somehow retrofit something to wait on
>> in every user.
>
> Using little words.
>
> A return of true from inc_not_zero means we took a reference.
> A return of false means we did not take a reference.
>
> The code already handles I took a reference or I did not take a
> reference.
>
> Therefore lying with refcount_t is not helpful.  It takes failures
> the code could easily handle and turns them into leaks.
>
> At least that is how I have seen reference counts used.  And those
> are definitely the plane obivous semantics.
>
> Your changes are definitely not drop in replacements for atomic_t in my
> code.

To clarify.

If my code uses atomic_inc it does not expect a failure of any sort
and saturate semantics are a fine replacement.

If my code uses atomic_inc_not_zero it knows how to handle a failure
to take a reference count.  Making hiding the failure really bizarre.

A must check function that hides a case I can handle and requires
checking in a case where my code is built not to check is a drop in
replacement for neither.

So anyone who is proposing a refcount_t change as a drop in replacement
for any code I maintain I will nack on sight because refcount_t is not
currently a no-brain drop in replacement.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 11:30             ` Eric W. Biederman
@ 2017-05-29 11:30               ` Eric W. Biederman
  2017-05-29 11:39               ` Eric W. Biederman
  1 sibling, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

ebiederm@xmission.com (Eric W. Biederman) writes:

> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>>
>>> Kees I I have a concern:
>>> 
>>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>>> {
>>>         unsigned int new, val = atomic_read(&r->refs);
>>> 
>>>         do {
>>>                 if (!val)
>>>                         return false;
>>> 
>>>                 if (unlikely(val == UINT_MAX))
>>>                         return true;
>>> 
>>>                 new = val + i;
>>>                 if (new < val)
>>>                         new = UINT_MAX;
>>> 
>>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>>> 
>>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>>> 
>>>         return true;
>>> }
>>> 
>>> Why in the world do you succeed when you the value saturates????
>>
>> Why not? On saturation the object will leak and returning a reference to
>> it is always good.
>>
>>> From a code perspective that is bizarre.   The code already has to handle
>>> the case when the counter does not increment.
>>
>> I don't see it as bizarre, we turned an overflow/use-after-free into a
>> leak. That's the primary mechanism here.
>>
>> As long as we have a reference to a leaked object, we might as well use
>> it, its not going anywhere.
>>
>>> Fixing the return value would move refcount_t into the realm of
>>> something that is desirable because it has bettern semantics and
>>> is more useful just on a day to day correctness point of view.  Even
>>> ignoring the security implications.
>>
>> It changes the semantics between inc_not_zero() and inc(). It also
>> complicates the semantics of inc_not_zero(), where currently the failure
>> implies the count is 0 and means no-such-object, you complicate matters
>> by basically returning 'busy'.
>
> Busy is not a state of a reference count.
>
> It is true I am suggesting treating something with a saturated reference
> as not available.  If that is what you mean by busy.  But if it's
> reference is zero it is also not available.  So there is no practical
> difference.
>
>> That is a completely new class of failure that is actually hard to deal
>> with, not to mention that it completely destroys refcount_inc_not_zero()
>> being a 'simple' replacement for atomic_inc_not_zero().
>>
>> In case of the current failure, the no-such-object, we can fix that by
>> creating said object. But what to do on 'busy' ? Surely you don't want
>> to create another. You'd have to somehow retrofit something to wait on
>> in every user.
>
> Using little words.
>
> A return of true from inc_not_zero means we took a reference.
> A return of false means we did not take a reference.
>
> The code already handles I took a reference or I did not take a
> reference.
>
> Therefore lying with refcount_t is not helpful.  It takes failures
> the code could easily handle and turns them into leaks.
>
> At least that is how I have seen reference counts used.  And those
> are definitely the plane obivous semantics.
>
> Your changes are definitely not drop in replacements for atomic_t in my
> code.

To clarify.

If my code uses atomic_inc it does not expect a failure of any sort
and saturate semantics are a fine replacement.

If my code uses atomic_inc_not_zero it knows how to handle a failure
to take a reference count.  Making hiding the failure really bizarre.

A must check function that hides a case I can handle and requires
checking in a case where my code is built not to check is a drop in
replacement for neither.

So anyone who is proposing a refcount_t change as a drop in replacement
for any code I maintain I will nack on sight because refcount_t is not
currently a no-brain drop in replacement.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 11:30             ` Eric W. Biederman
  2017-05-29 11:30               ` Eric W. Biederman
@ 2017-05-29 11:39               ` Eric W. Biederman
  2017-05-29 11:39                 ` Eric W. Biederman
  2017-05-29 12:23                 ` Peter Zijlstra
  1 sibling, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 11:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

ebiederm@xmission.com (Eric W. Biederman) writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>>>
>>>> Kees I I have a concern:
>>>> 
>>>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>>>> {
>>>>         unsigned int new, val = atomic_read(&r->refs);
>>>> 
>>>>         do {
>>>>                 if (!val)
>>>>                         return false;
>>>> 
>>>>                 if (unlikely(val == UINT_MAX))
>>>>                         return true;
>>>> 
>>>>                 new = val + i;
>>>>                 if (new < val)
>>>>                         new = UINT_MAX;
>>>> 
>>>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>>>> 
>>>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>>>> 
>>>>         return true;
>>>> }
>>>> 
>>>> Why in the world do you succeed when you the value saturates????
>>>
>>> Why not? On saturation the object will leak and returning a reference to
>>> it is always good.
>>>
>>>> From a code perspective that is bizarre.   The code already has to handle
>>>> the case when the counter does not increment.
>>>
>>> I don't see it as bizarre, we turned an overflow/use-after-free into a
>>> leak. That's the primary mechanism here.
>>>
>>> As long as we have a reference to a leaked object, we might as well use
>>> it, its not going anywhere.
>>>
>>>> Fixing the return value would move refcount_t into the realm of
>>>> something that is desirable because it has bettern semantics and
>>>> is more useful just on a day to day correctness point of view.  Even
>>>> ignoring the security implications.
>>>
>>> It changes the semantics between inc_not_zero() and inc(). It also
>>> complicates the semantics of inc_not_zero(), where currently the failure
>>> implies the count is 0 and means no-such-object, you complicate matters
>>> by basically returning 'busy'.
>>
>> Busy is not a state of a reference count.
>>
>> It is true I am suggesting treating something with a saturated reference
>> as not available.  If that is what you mean by busy.  But if it's
>> reference is zero it is also not available.  So there is no practical
>> difference.
>>
>>> That is a completely new class of failure that is actually hard to deal
>>> with, not to mention that it completely destroys refcount_inc_not_zero()
>>> being a 'simple' replacement for atomic_inc_not_zero().
>>>
>>> In case of the current failure, the no-such-object, we can fix that by
>>> creating said object. But what to do on 'busy' ? Surely you don't want
>>> to create another. You'd have to somehow retrofit something to wait on
>>> in every user.
>>
>> Using little words.
>>
>> A return of true from inc_not_zero means we took a reference.
>> A return of false means we did not take a reference.
>>
>> The code already handles I took a reference or I did not take a
>> reference.
>>
>> Therefore lying with refcount_t is not helpful.  It takes failures
>> the code could easily handle and turns them into leaks.
>>
>> At least that is how I have seen reference counts used.  And those
>> are definitely the plane obivous semantics.
>>
>> Your changes are definitely not drop in replacements for atomic_t in my
>> code.
>
> To clarify.
>
> If my code uses atomic_inc it does not expect a failure of any sort
> and saturate semantics are a fine replacement.
>
> If my code uses atomic_inc_not_zero it knows how to handle a failure
> to take a reference count.  Making hiding the failure really bizarre.
>
> A must check function that hides a case I can handle and requires
> checking in a case where my code is built not to check is a drop in
> replacement for neither.
>
> So anyone who is proposing a refcount_t change as a drop in replacement
> for any code I maintain I will nack on sight because refcount_t is not
> currently a no-brain drop in replacement.

*Blink*

I failed to see that there is a refcount_inc.  Too much noise in
the header file I suppose.

But implementing refcount_inc in terms of refcount_inc_not_zero is
totally broken.  The two operations are not the same and the go to
different assumptions the code is making.

That explains why you think refcount_inc_not_zero should lie because
you are implementing refcount_inc with it.  They are semantically very
different operations.  Please separate them.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 11:39               ` Eric W. Biederman
@ 2017-05-29 11:39                 ` Eric W. Biederman
  2017-05-29 12:23                 ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2017-05-29 11:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

ebiederm@xmission.com (Eric W. Biederman) writes:

> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Mon, May 29, 2017 at 04:11:13AM -0500, Eric W. Biederman wrote:
>>>
>>>> Kees I I have a concern:
>>>> 
>>>> __must_check bool refcount_add_not_zero(unsigned int i, refcount_t *r)
>>>> {
>>>>         unsigned int new, val = atomic_read(&r->refs);
>>>> 
>>>>         do {
>>>>                 if (!val)
>>>>                         return false;
>>>> 
>>>>                 if (unlikely(val == UINT_MAX))
>>>>                         return true;
>>>> 
>>>>                 new = val + i;
>>>>                 if (new < val)
>>>>                         new = UINT_MAX;
>>>> 
>>>>         } while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
>>>> 
>>>>         WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
>>>> 
>>>>         return true;
>>>> }
>>>> 
>>>> Why in the world do you succeed when you the value saturates????
>>>
>>> Why not? On saturation the object will leak and returning a reference to
>>> it is always good.
>>>
>>>> From a code perspective that is bizarre.   The code already has to handle
>>>> the case when the counter does not increment.
>>>
>>> I don't see it as bizarre, we turned an overflow/use-after-free into a
>>> leak. That's the primary mechanism here.
>>>
>>> As long as we have a reference to a leaked object, we might as well use
>>> it, its not going anywhere.
>>>
>>>> Fixing the return value would move refcount_t into the realm of
>>>> something that is desirable because it has bettern semantics and
>>>> is more useful just on a day to day correctness point of view.  Even
>>>> ignoring the security implications.
>>>
>>> It changes the semantics between inc_not_zero() and inc(). It also
>>> complicates the semantics of inc_not_zero(), where currently the failure
>>> implies the count is 0 and means no-such-object, you complicate matters
>>> by basically returning 'busy'.
>>
>> Busy is not a state of a reference count.
>>
>> It is true I am suggesting treating something with a saturated reference
>> as not available.  If that is what you mean by busy.  But if it's
>> reference is zero it is also not available.  So there is no practical
>> difference.
>>
>>> That is a completely new class of failure that is actually hard to deal
>>> with, not to mention that it completely destroys refcount_inc_not_zero()
>>> being a 'simple' replacement for atomic_inc_not_zero().
>>>
>>> In case of the current failure, the no-such-object, we can fix that by
>>> creating said object. But what to do on 'busy' ? Surely you don't want
>>> to create another. You'd have to somehow retrofit something to wait on
>>> in every user.
>>
>> Using little words.
>>
>> A return of true from inc_not_zero means we took a reference.
>> A return of false means we did not take a reference.
>>
>> The code already handles I took a reference or I did not take a
>> reference.
>>
>> Therefore lying with refcount_t is not helpful.  It takes failures
>> the code could easily handle and turns them into leaks.
>>
>> At least that is how I have seen reference counts used.  And those
>> are definitely the plane obivous semantics.
>>
>> Your changes are definitely not drop in replacements for atomic_t in my
>> code.
>
> To clarify.
>
> If my code uses atomic_inc it does not expect a failure of any sort
> and saturate semantics are a fine replacement.
>
> If my code uses atomic_inc_not_zero it knows how to handle a failure
> to take a reference count.  Making hiding the failure really bizarre.
>
> A must check function that hides a case I can handle and requires
> checking in a case where my code is built not to check is a drop in
> replacement for neither.
>
> So anyone who is proposing a refcount_t change as a drop in replacement
> for any code I maintain I will nack on sight because refcount_t is not
> currently a no-brain drop in replacement.

*Blink*

I failed to see that there is a refcount_inc.  Too much noise in
the header file I suppose.

But implementing refcount_inc in terms of refcount_inc_not_zero is
totally broken.  The two operations are not the same and the go to
different assumptions the code is making.

That explains why you think refcount_inc_not_zero should lie because
you are implementing refcount_inc with it.  They are semantically very
different operations.  Please separate them.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 10:49           ` Eric W. Biederman
  2017-05-29 10:49             ` Eric W. Biederman
  2017-05-29 11:30             ` Eric W. Biederman
@ 2017-05-29 12:13             ` Peter Zijlstra
  2017-05-29 12:13               ` Peter Zijlstra
  2 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 12:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

On Mon, May 29, 2017 at 05:49:53AM -0500, Eric W. Biederman wrote:

> > It changes the semantics between inc_not_zero() and inc(). It also
> > complicates the semantics of inc_not_zero(), where currently the failure
> > implies the count is 0 and means no-such-object, you complicate matters
> > by basically returning 'busy'.
> 
> Busy is not a state of a reference count.
> 
> It is true I am suggesting treating something with a saturated reference
> as not available.  If that is what you mean by busy.  But if it's
> reference is zero it is also not available.  So there is no practical
> difference.

There is. Previously when inc_not_zero() failed, you _knew_ it was 0 and
therefore know the object no longer 'exists'.

Similarly, if you know you're serialized against 1->0 you can then
assume it will not fail.

That goes out the window the moment you fail for any other condition.

> > That is a completely new class of failure that is actually hard to deal
> > with, not to mention that it completely destroys refcount_inc_not_zero()
> > being a 'simple' replacement for atomic_inc_not_zero().
> >
> > In case of the current failure, the no-such-object, we can fix that by
> > creating said object. But what to do on 'busy' ? Surely you don't want
> > to create another. You'd have to somehow retrofit something to wait on
> > in every user.
> 
> Using little words.
> 
> A return of true from inc_not_zero means we took a reference.
> A return of false means we did not take a reference.
> 
> The code already handles I took a reference or I did not take a
> reference.

I can well imagine code relying on the fact that failing to take a
reference means 0, see below. And once you start to fail for more
conditions, the actual value you failed on becomes interesting in order
to determine how to deal with it.

> Therefore lying with refcount_t is not helpful.

It is _NOT_ lying. It does as promised, it increments when it is not
zero. The fact that that increment can saturate is irrelevant. A
saturated reference is still a valid reference. Sure it causes a leak,
but who bloody cares, it shouldn't happen in the first place.

> It takes failures
> the code could easily handle and turns them into leaks.

That is the 'feature', we get to have leaks. Also those leaks _should_
not happen. They are a result of 'broken' code. So I don't see how
exposing them to the wider world helps anything but spread the pain of
the failure.

Please explain how the below is not subtly broken by changing
inc_not_zero.

struct obj *__obj_lookup(key)
{
	obj = rcu_based_lookup(key);
	if (refcount_inc_not_zero(&obj->ref))
		return obj;

	return NULL;
}

struct obj *obj_find_acquire(key)
{
	/* fast path, lockless lookup */

	rcu_read_lock()
	obj = __obj_lookup(key);
	rcu_read_unlock();

	if (obj)
		return obj;

	/* slow path, serialize */

	lock(&obj_lock);
	/* we're serialized, if it exists we must get a ref */
	obj = __obj_lookup(key);
	if (obj)
		goto unlock;

	/* allocate a new object and insert it */
	obj = obj_alloc();
	obj_init(obj, key);

unlock:
	unlock(&obj_lock);

	return obj;
}

> At least that is how I have seen reference counts used.  And those
> are definitely the plane obivous semantics.

I very strongly disagree. The one thing this primitive does is change
add/sub to be saturating, *consistently*.

You argue to expose the failure case, leading to more error paths
leading to more complication. I'll argue that nobody wants more error
handling, esp. for something that should not happen to begin with.

> Your changes are definitely not drop in replacements for atomic_t in my
> code.

refcount_t was never meant to be a drop-in replacement for atomic_t.
It is meant to be a fairly painless replacement for reference
counts that were previously implemented using atomic_t.

There is also the fairly common usage count scenario, that does not fit
well with refcount_t. I'm not sure we want to shoehorn that into
refcount_t either, we could create yet another type for that.

And there are of course a lot of less common things we're not wanting to
replace at all. atomic_t isn't broken we don't need to fix it.

I have no idea what you do so I cannot comment further.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 12:13             ` Peter Zijlstra
@ 2017-05-29 12:13               ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 12:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

On Mon, May 29, 2017 at 05:49:53AM -0500, Eric W. Biederman wrote:

> > It changes the semantics between inc_not_zero() and inc(). It also
> > complicates the semantics of inc_not_zero(), where currently the failure
> > implies the count is 0 and means no-such-object, you complicate matters
> > by basically returning 'busy'.
> 
> Busy is not a state of a reference count.
> 
> It is true I am suggesting treating something with a saturated reference
> as not available.  If that is what you mean by busy.  But if it's
> reference is zero it is also not available.  So there is no practical
> difference.

There is. Previously when inc_not_zero() failed, you _knew_ it was 0 and
therefore know the object no longer 'exists'.

Similarly, if you know you're serialized against 1->0 you can then
assume it will not fail.

That goes out the window the moment you fail for any other condition.

> > That is a completely new class of failure that is actually hard to deal
> > with, not to mention that it completely destroys refcount_inc_not_zero()
> > being a 'simple' replacement for atomic_inc_not_zero().
> >
> > In case of the current failure, the no-such-object, we can fix that by
> > creating said object. But what to do on 'busy' ? Surely you don't want
> > to create another. You'd have to somehow retrofit something to wait on
> > in every user.
> 
> Using little words.
> 
> A return of true from inc_not_zero means we took a reference.
> A return of false means we did not take a reference.
> 
> The code already handles I took a reference or I did not take a
> reference.

I can well imagine code relying on the fact that failing to take a
reference means 0, see below. And once you start to fail for more
conditions, the actual value you failed on becomes interesting in order
to determine how to deal with it.

> Therefore lying with refcount_t is not helpful.

It is _NOT_ lying. It does as promised, it increments when it is not
zero. The fact that that increment can saturate is irrelevant. A
saturated reference is still a valid reference. Sure it causes a leak,
but who bloody cares, it shouldn't happen in the first place.

> It takes failures
> the code could easily handle and turns them into leaks.

That is the 'feature', we get to have leaks. Also those leaks _should_
not happen. They are a result of 'broken' code. So I don't see how
exposing them to the wider world helps anything but spread the pain of
the failure.

Please explain how the below is not subtly broken by changing
inc_not_zero.

struct obj *__obj_lookup(key)
{
	obj = rcu_based_lookup(key);
	if (refcount_inc_not_zero(&obj->ref))
		return obj;

	return NULL;
}

struct obj *obj_find_acquire(key)
{
	/* fast path, lockless lookup */

	rcu_read_lock()
	obj = __obj_lookup(key);
	rcu_read_unlock();

	if (obj)
		return obj;

	/* slow path, serialize */

	lock(&obj_lock);
	/* we're serialized, if it exists we must get a ref */
	obj = __obj_lookup(key);
	if (obj)
		goto unlock;

	/* allocate a new object and insert it */
	obj = obj_alloc();
	obj_init(obj, key);

unlock:
	unlock(&obj_lock);

	return obj;
}

> At least that is how I have seen reference counts used.  And those
> are definitely the plane obivous semantics.

I very strongly disagree. The one thing this primitive does is change
add/sub to be saturating, *consistently*.

You argue to expose the failure case, leading to more error paths
leading to more complication. I'll argue that nobody wants more error
handling, esp. for something that should not happen to begin with.

> Your changes are definitely not drop in replacements for atomic_t in my
> code.

refcount_t was never meant to be a drop-in replacement for atomic_t.
It is meant to be a fairly painless replacement for reference
counts that were previously implemented using atomic_t.

There is also the fairly common usage count scenario, that does not fit
well with refcount_t. I'm not sure we want to shoehorn that into
refcount_t either, we could create yet another type for that.

And there are of course a lot of less common things we're not wanting to
replace at all. atomic_t isn't broken we don't need to fix it.

I have no idea what you do so I cannot comment further.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 11:39               ` Eric W. Biederman
  2017-05-29 11:39                 ` Eric W. Biederman
@ 2017-05-29 12:23                 ` Peter Zijlstra
  2017-05-29 12:23                   ` Peter Zijlstra
  2017-05-29 15:43                   ` Peter Zijlstra
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 12:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

On Mon, May 29, 2017 at 06:39:44AM -0500, Eric W. Biederman wrote:
> I failed to see that there is a refcount_inc.  Too much noise in
> the header file I suppose.
> 
> But implementing refcount_inc in terms of refcount_inc_not_zero is
> totally broken.  The two operations are not the same and the go to
> different assumptions the code is making.
> 
> That explains why you think refcount_inc_not_zero should lie because
> you are implementing refcount_inc with it.  They are semantically very
> different operations.  Please separate them.

There has been much debate about this. And the best I'll do is add a
comment and/or retain these exact semantics.

What is done is:

	refcount_inc() := WARN_ON(!refcount_inc_not_zero())

Because incrementing a zero reference count is a use-after-free and
something we should not do ever.

This is where the whole usage count vs reference count pain comes from.

Once there are no more _references_ to an object, a reference count
frees the object. Therefore a zero reference count means a dead object
and incrementing from that is fail.

The usage count model otoh counts how many (active) users there are of
an object, and no active users is a good and expected situation. But it
is very explicitly not a reference count. Because even in the no users
case do we have a reference to the object (we've not leaked it after
all, we just don't track all references).

Similarly, refcount_dec() is implemented using dec_and_test() and will
WARN when it hits 0, because this is a leak and we don't want those
either.

A usage count variant otoh would be fine with hitting 0.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 12:23                 ` Peter Zijlstra
@ 2017-05-29 12:23                   ` Peter Zijlstra
  2017-05-29 15:43                   ` Peter Zijlstra
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 12:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

On Mon, May 29, 2017 at 06:39:44AM -0500, Eric W. Biederman wrote:
> I failed to see that there is a refcount_inc.  Too much noise in
> the header file I suppose.
> 
> But implementing refcount_inc in terms of refcount_inc_not_zero is
> totally broken.  The two operations are not the same and the go to
> different assumptions the code is making.
> 
> That explains why you think refcount_inc_not_zero should lie because
> you are implementing refcount_inc with it.  They are semantically very
> different operations.  Please separate them.

There has been much debate about this. And the best I'll do is add a
comment and/or retain these exact semantics.

What is done is:

	refcount_inc() := WARN_ON(!refcount_inc_not_zero())

Because incrementing a zero reference count is a use-after-free and
something we should not do ever.

This is where the whole usage count vs reference count pain comes from.

Once there are no more _references_ to an object, a reference count
frees the object. Therefore a zero reference count means a dead object
and incrementing from that is fail.

The usage count model otoh counts how many (active) users there are of
an object, and no active users is a good and expected situation. But it
is very explicitly not a reference count. Because even in the no users
case do we have a reference to the object (we've not leaked it after
all, we just don't track all references).

Similarly, refcount_dec() is implemented using dec_and_test() and will
WARN when it hits 0, because this is a leak and we don't want those
either.

A usage count variant otoh would be fine with hitting 0.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 12:23                 ` Peter Zijlstra
  2017-05-29 12:23                   ` Peter Zijlstra
@ 2017-05-29 15:43                   ` Peter Zijlstra
  2017-05-29 15:43                     ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 15:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening

On Mon, May 29, 2017 at 02:23:16PM +0200, Peter Zijlstra wrote:
> On Mon, May 29, 2017 at 06:39:44AM -0500, Eric W. Biederman wrote:
> > I failed to see that there is a refcount_inc.  Too much noise in
> > the header file I suppose.
> > 
> > But implementing refcount_inc in terms of refcount_inc_not_zero is
> > totally broken.  The two operations are not the same and the go to
> > different assumptions the code is making.
> > 
> > That explains why you think refcount_inc_not_zero should lie because
> > you are implementing refcount_inc with it.  They are semantically very
> > different operations.  Please separate them.
> 
> There has been much debate about this. And the best I'll do is add a
> comment and/or retain these exact semantics.
> 
> What is done is:
> 
> 	refcount_inc() := WARN_ON(!refcount_inc_not_zero())
> 
> Because incrementing a zero reference count is a use-after-free and
> something we should not do ever.
> 
> This is where the whole usage count vs reference count pain comes from.
> 
> Once there are no more _references_ to an object, a reference count
> frees the object. Therefore a zero reference count means a dead object
> and incrementing from that is fail.
> 
> The usage count model otoh counts how many (active) users there are of
> an object, and no active users is a good and expected situation. But it
> is very explicitly not a reference count. Because even in the no users
> case do we have a reference to the object (we've not leaked it after
> all, we just don't track all references).
> 
> 
> Similarly, refcount_dec() is implemented using dec_and_test() and will
> WARN when it hits 0, because this is a leak and we don't want those
> either.
> 
> A usage count variant otoh would be fine with hitting 0.

A typical pattern for the usage count is caches, where objects are kept
in a data structure (hash/tree and/or list) and we count how many users
there are of said objects. A GC or shrinker will then iterate the
structure and prune those objects that have 0 users.

It is fairly straight forward to convert those to refcount_t by adding
one reference for the data structure itself. The GC/shrinker will then
have to use something like refcount_dec_if_one() to drop the reference
from 1->0 (and we could easily add something like dec_and_lock_if_one if
so desired).

Not all of them mind you, but simple cases can certainly be done without
too much pain.

But clearly there have been conversions of less than desired quality /
clarity though ...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/3] ipc subsystem refcounter conversions
  2017-05-29 15:43                   ` Peter Zijlstra
@ 2017-05-29 15:43                     ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2017-05-29 15:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Christoph Hellwig, Kees Cook, Andrew Morton, Elena Reshetova,
	Greg KH, Ingo Molnar, Alexey Dobriyan, Serge E. Hallyn, arozansk,
	Davidlohr Bueso, Manfred Spraul, axboe@kernel.dk, James Bottomley,
	x86@kernel.org, Ingo Molnar, Arnd Bergmann, David S. Miller,
	Rik van Riel, linux-arch, kernel-hardening@lists.openwall.com,
	LKML

On Mon, May 29, 2017 at 02:23:16PM +0200, Peter Zijlstra wrote:
> On Mon, May 29, 2017 at 06:39:44AM -0500, Eric W. Biederman wrote:
> > I failed to see that there is a refcount_inc.  Too much noise in
> > the header file I suppose.
> > 
> > But implementing refcount_inc in terms of refcount_inc_not_zero is
> > totally broken.  The two operations are not the same and the go to
> > different assumptions the code is making.
> > 
> > That explains why you think refcount_inc_not_zero should lie because
> > you are implementing refcount_inc with it.  They are semantically very
> > different operations.  Please separate them.
> 
> There has been much debate about this. And the best I'll do is add a
> comment and/or retain these exact semantics.
> 
> What is done is:
> 
> 	refcount_inc() := WARN_ON(!refcount_inc_not_zero())
> 
> Because incrementing a zero reference count is a use-after-free and
> something we should not do ever.
> 
> This is where the whole usage count vs reference count pain comes from.
> 
> Once there are no more _references_ to an object, a reference count
> frees the object. Therefore a zero reference count means a dead object
> and incrementing from that is fail.
> 
> The usage count model otoh counts how many (active) users there are of
> an object, and no active users is a good and expected situation. But it
> is very explicitly not a reference count. Because even in the no users
> case do we have a reference to the object (we've not leaked it after
> all, we just don't track all references).
> 
> 
> Similarly, refcount_dec() is implemented using dec_and_test() and will
> WARN when it hits 0, because this is a leak and we don't want those
> either.
> 
> A usage count variant otoh would be fine with hitting 0.

A typical pattern for the usage count is caches, where objects are kept
in a data structure (hash/tree and/or list) and we count how many users
there are of said objects. A GC or shrinker will then iterate the
structure and prune those objects that have 0 users.

It is fairly straight forward to convert those to refcount_t by adding
one reference for the data structure itself. The GC/shrinker will then
have to use something like refcount_dec_if_one() to drop the reference
from 1->0 (and we could easily add something like dec_and_lock_if_one if
so desired).

Not all of them mind you, but simple cases can certainly be done without
too much pain.

But clearly there have been conversions of less than desired quality /
clarity though ...

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-05-29 15:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1487590189-18151-1-git-send-email-elena.reshetova@intel.com>
     [not found] ` <20170303162352.b6af1c0c3115b3f5f1e7aed3@linux-foundation.org>
2017-05-27 19:58   ` [PATCH 0/3] ipc subsystem refcounter conversions Kees Cook
2017-05-27 19:58     ` Kees Cook
2017-05-29  8:39     ` Christoph Hellwig
2017-05-29  8:39       ` Christoph Hellwig
2017-05-29  9:11       ` Eric W. Biederman
2017-05-29  9:11         ` Eric W. Biederman
2017-05-29 10:24         ` Peter Zijlstra
2017-05-29 10:24           ` Peter Zijlstra
2017-05-29 10:49           ` Eric W. Biederman
2017-05-29 10:49             ` Eric W. Biederman
2017-05-29 11:30             ` Eric W. Biederman
2017-05-29 11:30               ` Eric W. Biederman
2017-05-29 11:39               ` Eric W. Biederman
2017-05-29 11:39                 ` Eric W. Biederman
2017-05-29 12:23                 ` Peter Zijlstra
2017-05-29 12:23                   ` Peter Zijlstra
2017-05-29 15:43                   ` Peter Zijlstra
2017-05-29 15:43                     ` Peter Zijlstra
2017-05-29 12:13             ` Peter Zijlstra
2017-05-29 12:13               ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox