From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 18 Sep 2015 10:51:56 +0200
From: Peter Zijlstra
To: Oleg Nesterov
Cc: Dmitry Vyukov, ebiederm@xmission.com, Al Viro, Andrew Morton,
	Ingo Molnar, Paul McKenney, mhocko@suse.cz, LKML,
	ktsan@googlegroups.com, Kostya Serebryany, Andrey Konovalov,
	Alexander Potapenko, Hans Boehm
Subject: Re: [PATCH] kernel: fix data race in put_pid
Message-ID: <20150918085156.GS3816@twins.programming.kicks-ass.net>
References: <1442496268-47803-1-git-send-email-dvyukov@google.com>
	<20150917160837.GA26050@redhat.com>
	<20150917174456.GA30178@redhat.com>
	<20150917180919.GA32116@redhat.com>
In-Reply-To: <20150917180919.GA32116@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 17, 2015 at 08:09:19PM +0200, Oleg Nesterov wrote:
> On 09/17, Dmitry Vyukov wrote:
> >
> > I can update the patch description, but let me explain it here first.
>
> Yes thanks.
> > Here is the essence of what happens:
>
> Aha, so you really meant that 2 put_pid's can race with each other,
>
> > // thread 1
> > 1: pid->foo = 1; // foo is the first word of pid object
> > // then it does put_pid
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> >    returns false so the function returns
> >
> > // thread 2
> > // executes put_pid
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > // then kmem_cache_free does:
> > 4: *(void**)pid = head->freelist;
> > 5: head->freelist = (void*)pid;
> >
> > This can be executed as:
> >
> > 4: *(void**)pid = head->freelist;
> > 1: pid->foo = 1; // foo is the first word of pid object
> > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and
> >    returns false so the function returns
> > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free
> > 5: head->freelist = (void*)pid;
>
> Unless I am totally confused, everything is simpler. We can forget
> about the hoisting, freelist, etc.
>
> Thread 2 can see the result of atomic_dec_and_test(), but not the
> result of "pid->foo = 1". In this case it can free the object, which
> can be re-allocated _before_ STORE(pid->foo) completes. Of course,
> this would be really bad.
>
> I need to recheck, but afaics this is not possible. This optimization
> is fine, but probably needs a comment. For sure, this code doesn't
> make any sense to me. We rely on delayed_put_pid() called by RCU.
> And note that nobody can write to this pid after it is removed from
> the rcu-protected list.
>
> So I think this is false alarm, but I'll try to recheck tomorrow, it
> is too late for me today.
As an alternative patch, could we not do:

void put_pid(struct pid *pid)
{
	struct pid_namespace *ns;

	if (!pid)
		return;

	ns = pid->numbers[pid->level].ns;
	if ((atomic_read(&pid->count) == 1) ||
	     atomic_dec_and_test(&pid->count)) {
+		smp_read_barrier_depends(); /* ctrl-dep */
		kmem_cache_free(ns->pid_cachep, pid);
		put_pid_ns(ns);
	}
}

That would upgrade the atomic_read() path to a full READ_ONCE_CTRL(),
and thereby prevent any of the kmem_cache_free() stores from leaking
out. And it's free, except on Alpha. Whereas the atomic_read_acquire()
would generate a full memory barrier on a whole bunch of archs.