From: Michael Ellerman
Subject: Re: [GIT PULL] percpu fix for v4.10-rc6
Date: Wed, 01 Feb 2017 16:46:01 +1100
Message-ID: <87vasutqee.fsf@concordia.ellerman.id.au>
References: <20170131165537.GC23970@htj.duckdns.org>
To: Linus Torvalds, Tejun Heo, linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, Linux Kernel Mailing List, dougmill@linux.vnet.ibm.com

Linus Torvalds writes:
> On Tue, Jan 31, 2017 at 8:55 AM, Tejun Heo wrote:
>>
>> Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live]().
>> The bug is caused by storing the return value of
>> atomic_long_inc_not_zero() into an int temp variable before returning
>> it as a bool. The interim cast to int loses the upper bits and can
>> lead to false negatives. As percpu_ref uses a high bit to mark a
>> draining counter, this can happen relatively easily. Fixed by using
>> bool for the temp variable.
>
> I think this fix is wrong.
>
> The fact is, atomic_long_inc_not_zero() shouldn't be returning
> anything with high bits.. Casting to "int" should have absolutely no
> impact. "int" is the traditional C truth value (with zero/nonzero
> being false/true), and while we're generally moving towards "bool" for
> true/false return values, I do think that code that assumes that these
> functions can be cast to "int" are right.
>
> For example, we used to have similar bugs in "test_bit()" returning
> the actual bit value (which could be high).
>
> And when users hit that problem, we fixed "test_bit()", not the users of it.
>
> So I'd rather fix the places that (insanely) return a 64-bit value.
>
> Is this purely a ppc64 issue, or does it happen somewhere else too?
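A minimal userspace sketch of the failure mode Tejun describes, assuming a 32-bit int, a 64-bit long long, and GCC or Clang (both define an out-of-range conversion to int as modulo truncation; ISO C leaves it implementation-defined). The helpers raw_inc_not_zero(), tryget_via_int() and tryget_via_bool() are hypothetical: the first models an inc_not_zero that returns the old counter value instead of 0/1, and it is neither the kernel's nor powerpc's code.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: behaves like inc_not_zero, but returns the old
 * 64-bit counter value itself (nonzero meaning success) instead of 0/1.
 * Not atomic; it only models the return-value problem.
 */
static long long raw_inc_not_zero(long long *counter)
{
	long long old = *counter;

	if (old != 0)
		*counter = old + 1;
	return old;
}

/* The buggy caller pattern: the result passes through an int temporary. */
static bool tryget_via_int(long long *counter)
{
	/* With a 32-bit int, GCC/Clang keep only the low 32 bits here. */
	int ret = raw_inc_not_zero(counter);

	return ret;
}

/* The fix in the pull request: a bool temporary keeps any nonzero value true. */
static bool tryget_via_bool(long long *counter)
{
	bool ret = raw_inc_not_zero(counter);

	return ret;
}
```

Starting from a counter of 0x100000000 (nonzero, but with all low 32 bits clear), tryget_via_int() reports failure even though the counter was incremented to 0x100000001, so the caller never drops that reference: the leak. tryget_via_bool() reports success for the same input.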
Sorry I'm late to this; I wasn't Cc'ed on the original patch.

It looks like this is only a powerpc issue: we're the only arch other
than x86-32 that implements atomic_long_inc_not_zero() by hand (rather
than using atomic64_add_unless()).

Assuming all other arches have an atomic64_add_unless() that returns an
int, they should all be safe.

Actually, we have a test suite for atomic64. The patch below adds a
check that catches the problem on powerpc at the moment, and passes once
I change our version to return an int. I'll turn it into a proper patch
and send it to whoever maintains the tests.

cheers


diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 46042901130f..813cd05bec9d 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -152,8 +152,10 @@ static __init void test_atomic64(void)
 	long long v0 = 0xaaa31337c001d00dLL;
 	long long v1 = 0xdeadbeefdeafcafeLL;
 	long long v2 = 0xfaceabadf00df001LL;
+	long long v3 = 0x8000000000000000LL;
 	long long onestwos = 0x1111111122222222LL;
 	long long one = 1LL;
+	int r_int;
 
 	atomic64_t v = ATOMIC64_INIT(v0);
 	long long r = v0;
@@ -239,6 +241,11 @@ static __init void test_atomic64(void)
 	BUG_ON(!atomic64_inc_not_zero(&v));
 	r += one;
 	BUG_ON(v.counter != r);
+
+	/* Confirm the return value fits in an int, even if the value doesn't */
+	INIT(v3);
+	r_int = atomic64_inc_not_zero(&v);
+	BUG_ON(!r_int);
 }
 
 static __init int test_atomics(void)
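The alternative Linus argues for, fixing the primitive so that it returns a normalized int, can be sketched in userspace with C11 atomics. This is not the powerpc change: sketch_inc_not_zero() is a hypothetical name, atomic_compare_exchange_weak() stands in for the arch's hand-written loop, and the counter is unsigned here only so that wraparound stays well defined in ISO C.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue of an inc_not_zero primitive that
 * returns a normalized int: 1 if the counter was nonzero and has been
 * incremented, 0 if it was zero and has been left untouched.
 * Unsigned arithmetic keeps wraparound well defined in ISO C.
 */
static int sketch_inc_not_zero(_Atomic unsigned long long *v)
{
	unsigned long long old = atomic_load(v);

	do {
		if (old == 0)
			return 0;	/* counter is zero: do not resurrect it */
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));

	return 1;	/* always 0 or 1, never the raw 64-bit value */
}
```

This mirrors the new atomic64_test case: starting from a counter with only the top bit set (0x8000000000000000), the result stored in an int is 1, so a check like BUG_ON(!r_int) would not fire.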