From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752205AbbIQSMR (ORCPT ); Thu, 17 Sep 2015 14:12:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:43481 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751723AbbIQSMQ (ORCPT ); Thu, 17 Sep 2015 14:12:16 -0400 Date: Thu, 17 Sep 2015 20:09:19 +0200 From: Oleg Nesterov To: Dmitry Vyukov Cc: ebiederm@xmission.com, Al Viro , Andrew Morton , Ingo Molnar , Paul McKenney , mhocko@suse.cz, LKML , ktsan@googlegroups.com, Kostya Serebryany , Andrey Konovalov , Alexander Potapenko , Hans Boehm , Peter Zijlstra Subject: Re: [PATCH] kernel: fix data race in put_pid Message-ID: <20150917180919.GA32116@redhat.com> References: <1442496268-47803-1-git-send-email-dvyukov@google.com> <20150917160837.GA26050@redhat.com> <20150917174456.GA30178@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/17, Dmitry Vyukov wrote: > > I can update the patch description, but let me explain it here first. Yes thanks. > Here is the essence of what happens: Aha, so you really meant that 2 put_pid's can race with each other, > // thread 1 > 1: pid->foo = 1; // foo is the first word of pid object > // then it does put_pid > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and > returns false so the function returns > > // thread 2 > // executes put_pid > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free > // then kmem_cache_free does: > 5: head->freelist = (void*)pid; > > This can be executed as: > > 4: *(void**)pid = head->freelist; > 1: pid->foo = 1; // foo is the first word of pid object > 2: atomic_dec_and_test(&pid->count) // decrements count to 1 and > returns false so the function returns > 3: atomic_load(&pid->count); // returns 1, so proceed to kmem_cache_free > 5: head->freelist = (void*)pid; Unless I am totally confused, everything is simpler. We can forget about the hoisting, freelist, etc. Thread 2 can see the result of atomic_dec_and_test(), but not the result of "pid->foo = 1". In this case in can free the object which can be re-allocated _before_ STORE(pid->foo) completes. Of course, this would be really bad. I need to recheck, but afaics this is not possible. This optimization is fine, but probably needs a comment. We rely on delayed_put_pid() called by RCU. And note that nobody can write to this pid after it is removed from the rcu-protected list. So I think this is false alarm, but I'll try to recheck tomorrow, it is too late for me today. Oleg.