Re: [GIT PULL] percpu fix for v4.10-rc6

public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [GIT PULL] percpu fix for v4.10-rc6
       [not found] <20170131165537.GC23970@htj.duckdns.org>
@ 2017-01-31 21:32 ` Linus Torvalds
  2017-01-31 21:41   ` Linus Torvalds
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Linus Torvalds @ 2017-01-31 21:32 UTC (permalink / raw)
  To: Tejun Heo, linux-arch@vger.kernel.org; +Cc: Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 8:55 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live]().
> The bug is caused by storing the return value of
> atomic_long_inc_not_zero() into an int temp variable before returning
> it as a bool.  The interim cast to int loses the upper bits and can
> lead to false negatives.  As percpu_ref uses a high bit to mark a
> draining counter, this can happen relatively easily.  Fixed by using
> bool for the temp variable.

I think this fix is wrong.

The fact is, atomic_long_inc_not_zero() shouldn't be returning
anything with high bits.. Casting to "int" should have absolutely no
impact. "int" is the traditional C truth value (with zero/nonzero
being false/true), and while we're generally moving towards "bool" for
true/false return values, I do think that code that assumes that these
functions can be cast to "int" are right.

For example, we used to have similar bugs in "test_bit()" returning
the actual bit value (which could be high).

And when users hit that problem, we fixed "test_bit()", not the users of it.

So I'd rather fix the places that (insanely) return a 64-bit value.

Is this purely a ppc64 issue, or does it happen somewhere else too?
Basically, the functions

  atomic*_sub_and_test()
  atomic*_dec_and_test()
  atomic*_inc_and_test()
  atomic*_add_negative()
  atomic*_add_unless()

should all be returning a proper bool/int truth value (preferably 0/1
so that we don't need to worry about it), not some random crazy value.

I've pulled this, but I really think it's papering over the real
issue. Adding "linux-arch" mailing list to ask architecture
maintainers to check their implementation of the atomic ops that
return a truth value.

                      Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 21:32 ` [GIT PULL] percpu fix for v4.10-rc6 Linus Torvalds
@ 2017-01-31 21:41   ` Linus Torvalds
  2017-01-31 22:11     ` Tejun Heo
  2017-02-01  5:46   ` Michael Ellerman
  2017-02-01  7:56   ` David Howells
  2 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2017-01-31 21:41 UTC (permalink / raw)
  To: Tejun Heo, linux-arch@vger.kernel.org; +Cc: Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 1:32 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I've pulled this, but I really think it's papering over the real
> issue. Adding "linux-arch" mailing list to ask architecture
> maintainers to check their implementation of the atomic ops that
> return a truth value.

For example, looking at the x86-32 version, I see this:

  static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
  ...
        return (int)a;

which looks really horribly wrong, but the assembly implementation
actually returns 0/1 in %eax so it ends up being right - just
confusingly so.

Also, to make things more confusing, the underscore version
(__atomic_add_unless()) actually returns the old value, not the truth
value of the comparison.

So this area definitely is messy. The x86-64 versions actually look
fairly clean and return nice boolean values.

                Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 21:41   ` Linus Torvalds
@ 2017-01-31 22:11     ` Tejun Heo
  2017-01-31 22:11       ` Tejun Heo
  2017-01-31 22:17       ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Tejun Heo @ 2017-01-31 22:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

Hello, Linus.

On Tue, Jan 31, 2017 at 01:41:17PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2017 at 1:32 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've pulled this, but I really think it's papering over the real
> > issue. Adding "linux-arch" mailing list to ask architecture
> > maintainers to check their implementation of the atomic ops that
> > return a truth value.

Yeah, for sure.

> For example, looking at the x86-32 version, I see this:
> 
>   static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
>   ...
>         return (int)a;
> 
> which looks really horribly wrong, but the assembly implementation
> actually returns 0/1 in %eax so it ends up being right - just
> confusingly so.
> 
> Also, to make things more confusing, the underscore version
> (__atomic_add_unless()) actually returns the old value, not the truth
> value of the comparison.
> 
> So this area definitely is messy. The x86-64 versions actually look
> fairly clean and return nice boolean values.

We have a similar mess with bitops too.  x86 is cleaned up to have
bool returns but the generic implementation and a lot of other archs
return the tested bit instead of 1/0.  It'd be great to make all the
boolean functions actually return bool.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 22:11     ` Tejun Heo
@ 2017-01-31 22:11       ` Tejun Heo
  2017-01-31 22:17       ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2017-01-31 22:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

Hello, Linus.

On Tue, Jan 31, 2017 at 01:41:17PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2017 at 1:32 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I've pulled this, but I really think it's papering over the real
> > issue. Adding "linux-arch" mailing list to ask architecture
> > maintainers to check their implementation of the atomic ops that
> > return a truth value.

Yeah, for sure.

> For example, looking at the x86-32 version, I see this:
> 
>   static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
>   ...
>         return (int)a;
> 
> which looks really horribly wrong, but the assembly implementation
> actually returns 0/1 in %eax so it ends up being right - just
> confusingly so.
> 
> Also, to make things more confusing, the underscore version
> (__atomic_add_unless()) actually returns the old value, not the truth
> value of the comparison.
> 
> So this area definitely is messy. The x86-64 versions actually look
> fairly clean and return nice boolean values.

We have a similar mess with bitops too.  x86 is cleaned up to have
bool returns but the generic implementation and a lot of other archs
return the tested bit instead of 1/0.  It'd be great to make all the
boolean functions actually return bool.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 22:11     ` Tejun Heo
  2017-01-31 22:11       ` Tejun Heo
@ 2017-01-31 22:17       ` Linus Torvalds
  2017-01-31 22:27         ` Tejun Heo
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2017-01-31 22:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 2:11 PM, Tejun Heo <tj@kernel.org> wrote:
>
> We have a similar mess with bitops too.  x86 is cleaned up to have
> bool returns but the generic implementation and a lot of other archs
> return the tested bit instead of 1/0.  It'd be great to make all the
> boolean functions actually return bool.

If they really do return the tested bit, then those architectures
absolutely _will_ contain known bugs.

Because there definitely have been users of the bitop routines that
assign the result to an "int", and I have some dim memory of us also
having had things like drivers that made their own "bool" variables
and use "char" for them.

But I'm not seeing it. The generic bitop pattern seems to be

    static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
    ...
        return (old & mask) != 0;

which is fine.

Just exactly what code did you look at?

                  Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 22:17       ` Linus Torvalds
@ 2017-01-31 22:27         ` Tejun Heo
  2017-02-01  0:32           ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2017-01-31 22:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 02:17:10PM -0800, Linus Torvalds wrote:
> Because there definitely have been users of the bitop routines that
> assign the result to an "int", and I have some dim memory of us also
> having had things like drivers that made their own "bool" variables
> and use "char" for them.
> 
> But I'm not seeing it. The generic bitop pattern seems to be
> 
>     static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
>     ...
>         return (old & mask) != 0;
> 
> which is fine.
> 
> Just exactly what code did you look at?

My bad.  I misread the generic test_bit() code and was reading the
inner helper of ppc, DEFINE_TESTOP macro, which returns the masked
value.  We used to have this problem, right?  I seem to have a memory
of hitting this issue.

Is there a reason we don't make these functions explicitly return
bool?  To avoid unnecessary boolean conversion by the compiler?  If
so, there gotta be a way to avoid that.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 22:27         ` Tejun Heo
@ 2017-02-01  0:32           ` Linus Torvalds
  2017-02-01  0:32             ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2017-02-01  0:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 2:27 PM, Tejun Heo <tj@kernel.org> wrote:
>
> My bad.  I misread the generic test_bit() code and was reading the
> inner helper of ppc, DEFINE_TESTOP macro, which returns the masked
> value.  We used to have this problem, right?  I seem to have a memory
> of hitting this issue.

Yes. We used to haev the problem, but that's long ago.

> Is there a reason we don't make these functions explicitly return
> bool?  To avoid unnecessary boolean conversion by the compiler?  If
> so, there gotta be a way to avoid that.

We could certainly encourage people to do it on a case-by-case basis,
but yes, on some architectures it ends up not being convenient.

Take that i386 case of "atomic64_add_unless()", for example: the
assembly code really *does* end up returning a "bool", but we can't
tell the C compiler that, because we (due to the special calling
conventions) call it from an inline asm. And we pass in the input
values as a 64-bit register pair (%eax:%edx), gcc gets unhappy if we
then get the return value in just %eax (at least some versions did).

So to make gcc happy, we "lie" and say that the inline asm writes to
the 64-bit register pair, and then that "(int)" cast is to just
discard the upper bits in %edx, so that it just returns %eax. Which is
the end result we want.

But casting it to bool would make gcc generate a totally pointless
extra comparison against 0. Of course, as long as everybody just tests
the value, that's fine (you need the compare anyway for the test), but
the code we have now is basically correct and better.

Of course, for even better code, we might just change the asm code to
return the conditional in ZF instead, which avoids the whole "return
register is used as part of a register pair for input" issue entirely,
and would allow us to use the nifty condition code output logic in
gcc, generating even better code, _and_ would allow us to just specify
the return value as "bool" the way we do on x86-64.

But the condition code output feature is fairly new, and somebody
would need to rewrite both the cmpxchg64 and the non-atomic versions
to do the proper thing. So in the meantime the x86-32 version actually
*does* the right thing and returns 0/1, but doesn't show it as a
"bool" type.

The ppc64 code problem might be fixed by just changing the ppc code to
use a "bool" return value, and haev the compiler generate the test
that it currently doesn't (and that causes the problem with high
bits).

                       Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-02-01  0:32           ` Linus Torvalds
@ 2017-02-01  0:32             ` Linus Torvalds
  0 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2017-02-01  0:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-arch@vger.kernel.org, Linux Kernel Mailing List, dougmill

On Tue, Jan 31, 2017 at 2:27 PM, Tejun Heo <tj@kernel.org> wrote:
>
> My bad.  I misread the generic test_bit() code and was reading the
> inner helper of ppc, DEFINE_TESTOP macro, which returns the masked
> value.  We used to have this problem, right?  I seem to have a memory
> of hitting this issue.

Yes. We used to haev the problem, but that's long ago.

> Is there a reason we don't make these functions explicitly return
> bool?  To avoid unnecessary boolean conversion by the compiler?  If
> so, there gotta be a way to avoid that.

We could certainly encourage people to do it on a case-by-case basis,
but yes, on some architectures it ends up not being convenient.

Take that i386 case of "atomic64_add_unless()", for example: the
assembly code really *does* end up returning a "bool", but we can't
tell the C compiler that, because we (due to the special calling
conventions) call it from an inline asm. And we pass in the input
values as a 64-bit register pair (%eax:%edx), gcc gets unhappy if we
then get the return value in just %eax (at least some versions did).

So to make gcc happy, we "lie" and say that the inline asm writes to
the 64-bit register pair, and then that "(int)" cast is to just
discard the upper bits in %edx, so that it just returns %eax. Which is
the end result we want.

But casting it to bool would make gcc generate a totally pointless
extra comparison against 0. Of course, as long as everybody just tests
the value, that's fine (you need the compare anyway for the test), but
the code we have now is basically correct and better.

Of course, for even better code, we might just change the asm code to
return the conditional in ZF instead, which avoids the whole "return
register is used as part of a register pair for input" issue entirely,
and would allow us to use the nifty condition code output logic in
gcc, generating even better code, _and_ would allow us to just specify
the return value as "bool" the way we do on x86-64.

But the condition code output feature is fairly new, and somebody
would need to rewrite both the cmpxchg64 and the non-atomic versions
to do the proper thing. So in the meantime the x86-32 version actually
*does* the right thing and returns 0/1, but doesn't show it as a
"bool" type.

The ppc64 code problem might be fixed by just changing the ppc code to
use a "bool" return value, and haev the compiler generate the test
that it currently doesn't (and that causes the problem with high
bits).

                       Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 21:32 ` [GIT PULL] percpu fix for v4.10-rc6 Linus Torvalds
  2017-01-31 21:41   ` Linus Torvalds
@ 2017-02-01  5:46   ` Michael Ellerman
  2017-02-01  7:56   ` David Howells
  2 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2017-02-01  5:46 UTC (permalink / raw)
  To: Linus Torvalds, Tejun Heo, linux-arch@vger.kernel.org
  Cc: linuxppc-dev, Linux Kernel Mailing List, dougmill

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Jan 31, 2017 at 8:55 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>> Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live]().
>> The bug is caused by storing the return value of
>> atomic_long_inc_not_zero() into an int temp variable before returning
>> it as a bool.  The interim cast to int loses the upper bits and can
>> lead to false negatives.  As percpu_ref uses a high bit to mark a
>> draining counter, this can happen relatively easily.  Fixed by using
>> bool for the temp variable.
>
> I think this fix is wrong.
>
> The fact is, atomic_long_inc_not_zero() shouldn't be returning
> anything with high bits.. Casting to "int" should have absolutely no
> impact. "int" is the traditional C truth value (with zero/nonzero
> being false/true), and while we're generally moving towards "bool" for
> true/false return values, I do think that code that assumes that these
> functions can be cast to "int" are right.
>
> For example, we used to have similar bugs in "test_bit()" returning
> the actual bit value (which could be high).
>
> And when users hit that problem, we fixed "test_bit()", not the users of it.
>
> So I'd rather fix the places that (insanely) return a 64-bit value.
>
> Is this purely a ppc64 issue, or does it happen somewhere else too?

Sorry I'm late to this, I wasn't Cc'ed on the original patch.

It looks like this is only a powerpc issue, we're the only arch other
than x86-32 that implements atomic_long_inc_not_zero() by hand (not
using atomic64_add_unless()).

Assuming all other arches have an atomic64_add_unless() which returns
an int then they should all be safe.

Actually we have a test suite for atomic64. The patch below adds a check
which catches the problem on powerpc at the moment, and passes once I
change our version to return an int. I'll turn it into a proper patch
and send it to whoever maintains the tests.

cheers


diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 46042901130f..813cd05bec9d 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -152,8 +152,10 @@ static __init void test_atomic64(void)
 	long long v0 = 0xaaa31337c001d00dLL;
 	long long v1 = 0xdeadbeefdeafcafeLL;
 	long long v2 = 0xfaceabadf00df001LL;
+	long long v3 = 0x8000000000000000LL;
 	long long onestwos = 0x1111111122222222LL;
 	long long one = 1LL;
+	int r_int;
 
 	atomic64_t v = ATOMIC64_INIT(v0);
 	long long r = v0;
@@ -239,6 +241,11 @@ static __init void test_atomic64(void)
 	BUG_ON(!atomic64_inc_not_zero(&v));
 	r += one;
 	BUG_ON(v.counter != r);
+
+	/* Confirm the return value fits in an int, even if the value doesn't */
+	INIT(v3);
+	r_int = atomic64_inc_not_zero(&v);
+	BUG_ON(!r_int);
 }
 
 static __init int test_atomics(void)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-01-31 21:32 ` [GIT PULL] percpu fix for v4.10-rc6 Linus Torvalds
  2017-01-31 21:41   ` Linus Torvalds
  2017-02-01  5:46   ` Michael Ellerman
@ 2017-02-01  7:56   ` David Howells
       [not found]     ` <CA+55aFyiy2jD80RTbsm3C=G5ifgtj8GQHqFwaYZM+ktgx1embA@mail.gmail.com>
  2 siblings, 1 reply; 12+ messages in thread
From: David Howells @ 2017-02-01  7:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Tejun Heo, linux-arch@vger.kernel.org,
	Linux Kernel Mailing List, dougmill

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The fact is, atomic_long_inc_not_zero() shouldn't be returning
> anything with high bits..

Ummm...  Why's that the case?  If atomic_long_t can never exceed UINT_MAX,
then why does it exist at all?

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
       [not found]         ` <CA+55aFz8TFRykr0qNBjNbK+kavUdbGOm4huf5XhPLQcy0tMyLw@mail.gmail.com>
@ 2017-02-01 10:00           ` David Howells
  2017-02-01 10:00             ` David Howells
  0 siblings, 1 reply; 12+ messages in thread
From: David Howells @ 2017-02-01 10:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Linux Kernel Mailing List, linux-arch, Tejun Heo,
	dougmill

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The return value isn't the value of the atomic variable.
> 
> The return value is whether the increment happened or not (ie the "was it not zero and
> could be incremented" part?)

Okay, fair enough.  I was thinking of it as atomic_long_inc_return() - which
obviously it isn't.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [GIT PULL] percpu fix for v4.10-rc6
  2017-02-01 10:00           ` David Howells
@ 2017-02-01 10:00             ` David Howells
  0 siblings, 0 replies; 12+ messages in thread
From: David Howells @ 2017-02-01 10:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Linux Kernel Mailing List, linux-arch, Tejun Heo,
	dougmill

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The return value isn't the value of the atomic variable.
> 
> The return value is whether the increment happened or not (ie the "was it not zero and
> could be incremented" part?)

Okay, fair enough.  I was thinking of it as atomic_long_inc_return() - which
obviously it isn't.

David

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-02-01 10:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20170131165537.GC23970@htj.duckdns.org>
2017-01-31 21:32 ` [GIT PULL] percpu fix for v4.10-rc6 Linus Torvalds
2017-01-31 21:41   ` Linus Torvalds
2017-01-31 22:11     ` Tejun Heo
2017-01-31 22:11       ` Tejun Heo
2017-01-31 22:17       ` Linus Torvalds
2017-01-31 22:27         ` Tejun Heo
2017-02-01  0:32           ` Linus Torvalds
2017-02-01  0:32             ` Linus Torvalds
2017-02-01  5:46   ` Michael Ellerman
2017-02-01  7:56   ` David Howells
     [not found]     ` <CA+55aFyiy2jD80RTbsm3C=G5ifgtj8GQHqFwaYZM+ktgx1embA@mail.gmail.com>
     [not found]       ` <CA+55aFxRLP_qPBXoLomxj-VYG-R=rKJE8KZ_h4NQ4g74gpNEWQ@mail.gmail.com>
     [not found]         ` <CA+55aFz8TFRykr0qNBjNbK+kavUdbGOm4huf5XhPLQcy0tMyLw@mail.gmail.com>
2017-02-01 10:00           ` David Howells
2017-02-01 10:00             ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox