From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752832AbdK0NAi (ORCPT <rfc822;w@1wt.eu>);
        Mon, 27 Nov 2017 08:00:38 -0500
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:57110 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752754AbdK0NAg (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 27 Nov 2017 08:00:36 -0500
Date: Mon, 27 Nov 2017 14:00:28 +0100
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
        Sebastian Ott <sebott@linux.vnet.ibm.com>,
        Ingo Molnar <mingo@kernel.org>,
        Heiko Carstens <heiko.carstens@de.ibm.com>,
        linux-kernel@vger.kernel.org
Subject: Re: [bisected] system hang after boot
In-Reply-To: <20171127125455.GC30679@arm.com>
References: <alpine.LFD.2.21.1711221832100.1728@schleppi>
        <20171122182659.GA22648@arm.com>
        <20171122202217.GO3326@worktop>
        <20171127114947.GA30679@arm.com>
        <20171127134918.4da71a78@mschwideX1>
        <20171127125455.GC30679@arm.com>
X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
x-cbid: 17112713-0016-0000-0000-00000505F2FB
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17112713-0017-0000-0000-00002841D287
Message-Id: <20171127140028.77cfb60a@mschwideX1>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-11-27_06:,,
 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501
 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0
 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000
 definitions=main-1711270180
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 27 Nov 2017 12:54:56 +0000
Will Deacon <will.deacon@arm.com> wrote:

> Hi Martin,
> 
> On Mon, Nov 27, 2017 at 01:49:18PM +0100, Martin Schwidefsky wrote:
> > On Mon, 27 Nov 2017 11:49:48 +0000
> > Will Deacon <will.deacon@arm.com> wrote:  
> > > On Wed, Nov 22, 2017 at 09:22:17PM +0100, Peter Zijlstra wrote:  
> > > > On Wed, Nov 22, 2017 at 06:26:59PM +0000, Will Deacon wrote:
> > > >     
> > > > > Now, I can't see what the break_lock is doing here other than causing
> > > > > problems. Is there a good reason for it, or can you just try removing it
> > > > > altogether? Patch below.    
> > > > 
> > > > The main use is spin_is_contended(), which in turn ends up used in
> > > > __cond_resched_lock() through spin_needbreak().
> > > > 
> > > > This allows better lock wait times for PREEMPT kernels on platforms
> > > > where the lock implementation itself cannot provide 'contended' state.
> > > > 
> > > > In that capacity the write-write race shouldn't be a problem though.    
> > > 
> > > I'm not sure why it isn't a problem: given that the break_lock variable
> > > can read as 1 for a lock that is no longer contended and 0 for a lock that
> > > is currently contended, then the __cond_resched_lock is likely to see a
> > > value of 0 (i.e. spin_needbreak always return false) more often than no
> > > since it's checked by the lock holder.  
> > 
> > Grepping for 'break_lock' the two locking blueprints are the only places
> > where the field is written to. Unless I am blind, the associated unlock
> > functions do *not* reset 'break_lock'.
> > 
> > Without the raw_##op##_can_lock(lock) check the first of the blueprints
> > now looks like this:
> > 
> > void __lockfunc __raw_##op##_lock(locktype##_t *lock)                   \
> > {                                                                       \
> >         for (;;) {                                                      \
> >                 preempt_disable();                                      \
> >                 if (likely(do_raw_##op##_trylock(lock)))                \
> >                         break;                                          \
> >                 preempt_enable();                                       \
> >                                                                         \
> >                 if (!(lock)->break_lock)                                \
> >                         (lock)->break_lock = 1;                         \
> >                 while ((lock)->break_lock)                              \
> >                         arch_##op##_relax(&lock->raw_lock);             \
> >         }                                                               \
> >         (lock)->break_lock = 0;                                         \
> > }                                                                       \
> > 
> > All it takes to create an endless loop is two CPUs, the first acquired the
> > lock and the second tries to get the lock. After the unsuccessful trylock
> > of the second CPU, the first CPU releases the lock and never tries to take
> > it again. The second CPU will be stuck in an endless loop.  
> 
> Yes, it basically relies on the lock holder never winning that race.
> However, Peter's use-case just needs the lock-holder to be able to detect
> contention (which is always best-effort anyway), so I think we can make that
> "work" by removing the while loop above (see my subsequent diff sent to
> Sebastian).

Well, what race? The lock hold just has to hold the lock while another CPU
tries to get it. There is no particular bad timing involved, just a little
bit of contention is enough.

And yes, I think removing the while loop on break_lock will work.

> It's still questionable, because on a machine with store-buffers you really
> want to order writes to break_lock against something else, but it might
> happen to fall out depending on the details of the trylock() implementation.

Even more, if the compiler "proves" that nobody writes to break_lock it can
convert that to "while (1)" loop.

> > I guess my best course of action is to remove GENERIC_LOCKBREAK from
> > arch/s390/Kconfig to avoid this construct altogether. Let us see what
> > breaks if I do that ..  
> 
> We could just consider ripping out GENERIC_LOCKBREAK entirely, but I was
> hoping we could get a simpler fix in for now.

I would opt for removing it entirely.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.