From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932406AbbELIps (ORCPT ); Tue, 12 May 2015 04:45:48 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:47668 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932296AbbELIpo (ORCPT ); Tue, 12 May 2015 04:45:44 -0400 Date: Tue, 12 May 2015 10:45:29 +0200 From: Peter Zijlstra To: Linus Torvalds Cc: Douglas Hatch , Thomas Gleixner , Ingo Molnar , Waiman Long , Linux Kernel Mailing List , "Norton, Scott J" , Peter Anvin , "linux-tip-commits@vger.kernel.org" Subject: Re: [tip:locking/core] locking/pvqspinlock: Replace xchg() by the more descriptive set_mb() Message-ID: <20150512084529.GC21418@twins.programming.kicks-ass.net> References: <20150511145408.GU27504@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 11, 2015 at 10:50:42AM -0700, Linus Torvalds wrote: > On Mon, May 11, 2015 at 7:54 AM, Peter Zijlstra wrote: > > > > Hmm, so I looked at the set_mb() definitions and I figure we want to do > > something like the below, right? > > I don't think you need to do this for the non-smp cases. Well, its the store tearing thing again, we use WRITE_ONCE() in smp_store_release() for the same reason. We want it to be a single store. > The whole > thing is about smp memory ordering, so on UP you don't even need the > WRITE_ONCE(), much less a barrier. No, we actually need both still on UP. Imagine the following sequence: for (;;) { set_current_state(TASK_KILLABLE); if (cond) break; schedule(); } __set_current_state(TASK_RUNNING); vs wake_up_process(p); As we know, set_current_state() is set_mb(), and thus will look like: current->state = TASK_KILLABLE; smp_mb(); if (cond) break; So without the WRITE_ONCE() we can get store tearing, and suppose our compiler is insane and translates the store into 4 byte stores. current->state[0] = TASK_UNINTERRUPTIBLE; current->state[1] = TASK_WAKEKILL >> 8; current->state[2] = 0; current->state[3] = 0; The obvious fail here is to get the wakeup interrupt between [0] and [1]. current->state[0] = TASK_UNINTERRUPTIBLE; wake_up_process(p); p->state = TASK_RUNNING; current->state[1] = TASK_WAKEKILL >> 8; current->state[2] = 0; current->state[3] = 0; With the end result that ->state == TASK_WAKEKILL, from which we'll not wake up unless killed. Similarly, without the barrier(), our friendly compiler is allowed to do: if (cond) break current->state = TASK_KILLABLE; schedule(); Which we all know to be broken. So no, set_mb() (or smp_store_mb()) very much does need the WRITE_ONCE() and a barrier() on UP.