public inbox for linux-kernel@vger.kernel.org
From: Will Deacon <will.deacon@arm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Sebastian Ott <sebott@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [bisected] system hang after boot
Date: Mon, 27 Nov 2017 13:10:31 +0000
Message-ID: <20171127131030.GD30679@arm.com>
In-Reply-To: <20171127140028.77cfb60a@mschwideX1>

On Mon, Nov 27, 2017 at 02:00:28PM +0100, Martin Schwidefsky wrote:
> On Mon, 27 Nov 2017 12:54:56 +0000
> Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Nov 27, 2017 at 01:49:18PM +0100, Martin Schwidefsky wrote:
> > > On Mon, 27 Nov 2017 11:49:48 +0000
> > > Will Deacon <will.deacon@arm.com> wrote:  
> > > > On Wed, Nov 22, 2017 at 09:22:17PM +0100, Peter Zijlstra wrote:  
> > > > > On Wed, Nov 22, 2017 at 06:26:59PM +0000, Will Deacon wrote:
> > > > >     
> > > > > > Now, I can't see what the break_lock is doing here other than causing
> > > > > > problems. Is there a good reason for it, or can you just try removing it
> > > > > > altogether? Patch below.    
> > > > > 
> > > > > The main use is spin_is_contended(), which in turn ends up used in
> > > > > __cond_resched_lock() through spin_needbreak().
> > > > > 
> > > > > This allows better lock wait times for PREEMPT kernels on platforms
> > > > > where the lock implementation itself cannot provide 'contended' state.
> > > > > 
> > > > > In that capacity the write-write race shouldn't be a problem though.    
> > > > 
> > > > I'm not sure why it isn't a problem: given that the break_lock variable
> > > > can read as 1 for a lock that is no longer contended and 0 for a lock that
> > > > is currently contended, __cond_resched_lock() is likely to see a value of
> > > > 0 (i.e. spin_needbreak() returning false) more often than not, since it's
> > > > checked by the lock holder.
> > > 
> > > Grepping for 'break_lock' the two locking blueprints are the only places
> > > where the field is written to. Unless I am blind, the associated unlock
> > > functions do *not* reset 'break_lock'.
> > > 
> > > Without the raw_##op##_can_lock(lock) check the first of the blueprints
> > > now looks like this:
> > > 
> > > void __lockfunc __raw_##op##_lock(locktype##_t *lock)                   \
> > > {                                                                       \
> > >         for (;;) {                                                      \
> > >                 preempt_disable();                                      \
> > >                 if (likely(do_raw_##op##_trylock(lock)))                \
> > >                         break;                                          \
> > >                 preempt_enable();                                       \
> > >                                                                         \
> > >                 if (!(lock)->break_lock)                                \
> > >                         (lock)->break_lock = 1;                         \
> > >                 while ((lock)->break_lock)                              \
> > >                         arch_##op##_relax(&lock->raw_lock);             \
> > >         }                                                               \
> > >         (lock)->break_lock = 0;                                         \
> > > }                                                                       \
> > > 
> > > All it takes to create an endless loop is two CPUs: the first acquires the
> > > lock and the second tries to get it. After the second CPU's unsuccessful
> > > trylock, the first CPU releases the lock and never tries to take it
> > > again. The second CPU is then stuck in an endless loop.
> > 
> > Yes, it basically relies on the lock holder never winning that race.
> > However, Peter's use-case just needs the lock-holder to be able to detect
> > contention (which is always best-effort anyway), so I think we can make that
> > "work" by removing the while loop above (see my subsequent diff sent to
> > Sebastian).
> 
> Well, what race? The lock holder just has to hold the lock while another CPU
> tries to get it. There is no particularly bad timing involved; just a little
> bit of contention is enough.

Yes, you're right. I keep forgetting that break_lock isn't cleared on
unlock.
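
To make the hang concrete, here is a minimal single-threaded replay of the
interleaving Martin describes. It is a sketch, not kernel code: struct
model_lock, model_trylock(), model_unlock() and replay_hang() are
hypothetical names, the arch lock word is modelled as a plain int, and
preemption and memory ordering are ignored. The max_spins bound exists only
so the model terminates; the real inner loop has no bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock-break fields; not kernel API. */
struct model_lock {
	int locked;     /* stands in for the arch lock word */
	int break_lock; /* set by waiters; NOT cleared on unlock */
};

static bool model_trylock(struct model_lock *l)
{
	if (l->locked)
		return false;
	l->locked = 1;
	return true;
}

static void model_unlock(struct model_lock *l)
{
	assert(l->locked);  /* releasing an unheld lock is a bug */
	l->locked = 0;      /* break_lock is left alone, as in the blueprint */
}

/*
 * Replays the interleaving in program order:
 *   CPU0: takes the lock
 *   CPU1: trylock fails, sets break_lock = 1
 *   CPU0: releases the lock and never takes it again
 *   CPU1: enters while (break_lock) ...
 * Returns how many iterations CPU1 spent in the inner loop.
 */
static long replay_hang(long max_spins)
{
	struct model_lock l = { 0, 0 };
	long spins = 0;

	model_trylock(&l);                      /* CPU0 acquires */
	if (!model_trylock(&l)) {               /* CPU1 fails */
		if (!l.break_lock)
			l.break_lock = 1;
	}
	model_unlock(&l);                       /* CPU0 releases */

	/* CPU1: only a later successful acquirer would write 0 here. */
	while (l.break_lock && spins < max_spins)
		spins++;
	return spins;
}
```

With this model, replay_hang(1000) uses up the whole budget even though the
lock is free, because nothing ever writes 0 to break_lock and the waiter
never goes back to the trylock.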

> And yes, I think removing the while loop on break_lock will work.
> 
> > It's still questionable, because on a machine with store-buffers you really
> > want to order writes to break_lock against something else, but it might
> > happen to fall out depending on the details of the trylock() implementation.
> 
> Even worse, if the compiler "proves" that nobody writes to break_lock, it can
> convert that into a "while (1)" loop.

break_lock should be annotated (at least) with READ_ONCE/WRITE_ONCE, which
should prevent that from happening.
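
A minimal sketch of what that annotation looks like, using simplified
userspace stand-ins for READ_ONCE()/WRITE_ONCE(). Assumptions: GCC or Clang
(for __typeof__); the kernel's own macros in include/linux/compiler.h handle
more cases than this; struct model_lock, flag_contention(), is_contended()
and wait_for_clear() are hypothetical names.

```c
#include <stdbool.h>

/*
 * Simplified stand-ins: a volatile access makes the compiler emit a fresh
 * load or store every time, so while (READ_ONCE(x)) cannot be folded
 * into while (1).
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct model_lock {
	int break_lock;
};

/* Waiter side: flag contention once, with an annotated store. */
static void flag_contention(struct model_lock *l)
{
	if (!READ_ONCE(l->break_lock))
		WRITE_ONCE(l->break_lock, 1);
}

/* Lock-holder side, in the spirit of spin_is_contended(). */
static bool is_contended(struct model_lock *l)
{
	return READ_ONCE(l->break_lock) != 0;
}

/*
 * Bounded wait; every iteration re-reads break_lock from memory.
 * Returns true if break_lock was seen clear within max_spins reads.
 */
static bool wait_for_clear(struct model_lock *l, long max_spins)
{
	for (long i = 0; i < max_spins; i++)
		if (!READ_ONCE(l->break_lock))
			return true;
	return false;
}
```

These volatile accesses only constrain the compiler. They do not order
break_lock against other memory accesses at the CPU level, so the
store-buffer concern raised above would still need a barrier (e.g.
smp_mb() in the kernel) or ordering that happens to come from the trylock
implementation.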

> > > I guess my best course of action is to remove GENERIC_LOCKBREAK from
> > > arch/s390/Kconfig to avoid this construct altogether. Let us see what
> > > breaks if I do that ..  
> > 
> > We could just consider ripping out GENERIC_LOCKBREAK entirely, but I was
> > hoping we could get a simpler fix in for now.
> 
> I would opt for removing it entirely.

I'll cook up a patch series, with the first patch just removing the while loop
and subsequent patches removing the stuff altogether.
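
Roughly what the generic lock blueprint reduces to once the inner
while (break_lock) loop is removed, sketched as a userspace model with C11
atomics. Assumptions: struct model_lock and the model_* functions are
hypothetical; atomic_exchange stands in for do_raw_##op##_trylock();
preempt_disable()/preempt_enable() and arch_##op##_relax() are only marked
by comments; model_cond_break() loosely mimics the spin_needbreak() check in
__cond_resched_lock(), not its real implementation. Relaxed ordering on
break_lock deliberately mirrors the plain accesses in the blueprint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a lock carrying the generic break_lock hint. */
struct model_lock {
	atomic_int locked;     /* stands in for the arch lock word */
	atomic_int break_lock; /* best-effort "someone is waiting" hint */
};

/* Stand-in for do_raw_##op##_trylock(). */
static bool model_trylock(struct model_lock *l)
{
	return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void model_unlock(struct model_lock *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
	/* As in the blueprint, unlock does not touch break_lock. */
}

/* Blueprint with the inner while (break_lock) loop removed. */
static void model_lock_acquire(struct model_lock *l)
{
	for (;;) {
		/* preempt_disable() would go here */
		if (model_trylock(l))
			break;
		/* preempt_enable() would go here */

		/* Flag contention, then go straight back to the trylock. */
		if (!atomic_load_explicit(&l->break_lock, memory_order_relaxed))
			atomic_store_explicit(&l->break_lock, 1, memory_order_relaxed);
		/* arch_##op##_relax() would go here */
	}
	atomic_store_explicit(&l->break_lock, 0, memory_order_relaxed);
}

/* Best-effort contention check, in the spirit of spin_is_contended(). */
static bool model_is_contended(struct model_lock *l)
{
	return atomic_load_explicit(&l->break_lock, memory_order_relaxed) != 0;
}

/*
 * Lock-holder side: if a waiter flagged contention, drop the lock and
 * take it again. Returns true if the lock was dropped.
 */
static bool model_cond_break(struct model_lock *l)
{
	if (!model_is_contended(l))
		return false;
	model_unlock(l);
	/* The real helper may also reschedule at this point. */
	model_lock_acquire(l);
	return true;
}
```

A waiter now keeps retrying the trylock, so the two-CPU scenario above makes
progress as soon as the holder releases the lock. break_lock only remains a
best-effort hint for the holder: the write-write race discussed earlier can
still make it read 0 while the lock is contended.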

Will

Thread overview: 14+ messages
2017-11-22 17:46 [bisected] system hang after boot Sebastian Ott
2017-11-22 18:26 ` Will Deacon
2017-11-22 18:54   ` Sebastian Ott
2017-11-22 19:10     ` Will Deacon
2017-11-22 20:22   ` Peter Zijlstra
2017-11-27 11:49     ` Will Deacon
2017-11-27 12:45       ` Will Deacon
2017-11-27 13:05         ` Sebastian Ott
2017-11-27 12:49       ` Martin Schwidefsky
2017-11-27 12:54         ` Will Deacon
2017-11-27 13:00           ` Martin Schwidefsky
2017-11-27 13:10             ` Will Deacon [this message]
2017-11-27 13:13               ` Peter Zijlstra
2017-11-27 13:12             ` Peter Zijlstra
