From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932217AbcH3V3D (ORCPT <rfc822;w@1wt.eu>);
        Tue, 30 Aug 2016 17:29:03 -0400
Received: from gate.crashing.org ([63.228.1.57]:37836 "EHLO gate.crashing.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751388AbcH3V3A (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 30 Aug 2016 17:29:00 -0400
Message-ID: <1472592498.2388.40.camel@kernel.crashing.org>
Subject: Re: [RFC][PATCH] Fix a race between rwsem and the scheduler
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Peter Zijlstra <peterz@infradead.org>, Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>,
        LKML <linux-kernel@vger.kernel.org>,
        Nicholas Piggin <nicholas.piggin@gmail.com>
Date: Wed, 31 Aug 2016 07:28:18 +1000
In-Reply-To: <20160830183416.GV10138@twins.programming.kicks-ass.net>
References: <4050f2ce-1aee-d2aa-39e3-36e995b56252@gmail.com>
         <20160830121937.GQ10138@twins.programming.kicks-ass.net>
         <20160830130426.GA17795@redhat.com> <20160830141321.GB2794@worktop>
         <20160830165746.GA29218@redhat.com>
         <20160830183416.GV10138@twins.programming.kicks-ass.net>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.20.5 (3.20.5-1.fc24) 
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2016-08-30 at 20:34 +0200, Peter Zijlstra wrote:
> 
> I'm not actually sure it does. There is the comment from 8643cda549ca4
> which explains the program order guarantees.
> 
> But I'm not sure who or what would imply a full smp_mb() when you call
> schedule() -- I mean, it's true on x86, but that's 'trivial'.

It's always been a requirement that if you actually context switch, a
full mb() is implied (though that isn't the case if you don't actually
switch, i.e. you are back to RUNNING before you even hit schedule()).

On powerpc we have a sync deep in _switch to achieve that.

This is necessary so that a process that wakes up on a different CPU sees
all of its own loads/stores.
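
As a rough illustration of that requirement, here is a userspace sketch,
not the kernel code. C11 atomics and pthreads stand in for the barriers
and the two CPUs, and every name in it (struct task, off_cpu, old_cpu(),
new_cpu(), handoff_demo()) is made up for the example. The seq_cst fence
plays the role of the sync in _switch; the flag is an assumed stand-in
for whatever tells the new CPU the task is off the old one.

```c
/* Userspace sketch only; all names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct task {
	int data;           /* plain store done by the task itself */
	int seen;           /* what the task observes after migrating */
	atomic_int off_cpu; /* set once the old CPU has switched it out */
};

/* Old CPU: the task stores, then gets context switched out. */
static void *old_cpu(void *arg)
{
	struct task *t = arg;

	t->data = 42;                              /* task's own store */
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for the sync */
	atomic_store_explicit(&t->off_cpu, 1, memory_order_relaxed);
	return NULL;
}

/* New CPU: wait until the task is off the old CPU, then resume it. */
static void *new_cpu(void *arg)
{
	struct task *t = arg;

	while (!atomic_load_explicit(&t->off_cpu, memory_order_acquire))
		; /* spin until the old CPU is done with the task */
	t->seen = t->data;                         /* task's own load */
	return NULL;
}

/* Run one handoff and return what the migrated task read back. */
int handoff_demo(void)
{
	struct task t = { .data = 0, .seen = 0 };
	pthread_t a, b;
	int ret;

	atomic_init(&t.off_cpu, 0);
	ret = pthread_create(&b, NULL, new_cpu, &t);
	assert(ret == 0);
	ret = pthread_create(&a, NULL, old_cpu, &t);
	assert(ret == 0);
	(void)ret;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return t.seen;
}
```

Calling handoff_demo() returns 42: the fence before the flag store,
paired with the acquire load on the other side, orders the task's store
before its later load. Drop the fence and that ordering is no longer
guaranteed on a weakly ordered machine.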

> > I mean, I thought that the LOAD/STORE's done by some task can't
> > be re-ordered with LOAD/STORE's done by another task which was
> > running on the same CPU. Wrong?
> 
> If so, I'm not sure how :/
> 
> So smp_mb__before_spinlock() stops stores from @prev, and the ACQUIRE
> from spin_lock(&rq->lock) stops both loads/stores from @next, but afaict
> nothing stops the loads from @prev seeing stores from @next.
> 
> Also not sure this matters though, if they're threads in the same
> process it's a data race already and nobody cares. If they're not threads
> in the same process, they're separated by address space and can't 'see'
> each other anyway.

The architecture switch_to() has to do the right thing.

Cheers,
Ben.