All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Sewior <bigeasy@linutronix.de>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-s390@vger.kernel.org, Stefan Liebler <stli@linux.ibm.com>
Subject: Re: WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede
Date: Wed, 30 Jan 2019 15:35:57 -0800	[thread overview]
Message-ID: <20190130233557.GA4240@linux.ibm.com> (raw)
In-Reply-To: <alpine.DEB.2.21.1901302314590.8200@nanos.tec.linutronix.de>

On Thu, Jan 31, 2019 at 12:13:51AM +0100, Thomas Gleixner wrote:
> On Wed, 30 Jan 2019, Sebastian Sewior wrote:
> 
> > On 2019-01-30 18:56:54 [+0100], Thomas Gleixner wrote:
> > > TBH, no clue. Below are some more traceprintks which hopefully shed some
> > > light on that mystery. See kernel/futex.c line 30  ...
> > 
> > The robust list it somehow buggy. In the last trace we had the
> > handle_futex_death() of uaddr 3ff9e880140 as the last action. That means
> > it was an entry in 56496's ->list_op_pending entry. This makes sense
> > because it tried to acquire the lock, failed, got killed.
> 
> The robust list of the failing task seems to be correct. 
> 
> > According to uaddr pid 56956 is the owner. So 56956 invoked one of
> > pthread_mutex_lock() / pthread_mutex_timedlock() /
> > pthread_mutex_trylock() and should have obtained the lock in userland.
> > Depending on where it got killed, that mutex should be either recorded in
> > ->list_op_pending or the robust_list (or both if it didn't clear
> > ->list_op_pending yet). But it is not.
> > Similar for pthread_mutex_unlock().
> 
> > We don't have a trace_point if we abort processing the list.
> 
> The only reason why it would abort is due a page fault because that cannot
> be handled in the exit code anymore.
> 
> > On the other hand, it didn't trigger on x86 for hours. Could the atomic
> 
> s/hours/days/ ....
> 
> > ops be the culprit?
> 
> The glibc code does:
> 
>        THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending,
>        		     (void *) (((uintptr_t) &mutex->__data.__list.__next)
>                                    | 1));
> 
>        ....
>        lock in user space
> 
>        or
> 
>        lock in kernel space
> 
>        ENQUEUE_MUTEX_PI (mutex);
>        THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> 
>        ENQUEUE_MUTEX_PI() resolves to a THREAD_GETMEM() which reads the
>        list head from TLS, some list manipulation operations and the final
>        THREAD_SETMEM() which stores the new list head
> 
> Now on x86 THREAD_GETMEM() and THREAD_SETMEM() are resolving to
> 
>     asm volatile ("movX .....")
> 
> on s390 they are
> 
>    descr->member
> 
> based operations.
> 
> Now the important part of the robust list is the store sequence, i.e. the
> list head and final update to the TLS visible part need to come _before_
> list_op_pending is cleared.
> 
> I might be missing something, but there is no compiler barrier in that code
> which would prevent the compiler from reordering the stores. It can
> rightfully do so because there is no compiler visible dependency of these
> two operations.
> 
> On x8664 the asm volatile might prevent it by chance, but it does not have
> a 'memory' specified which would guarantee a compiler barrier.
> 
> On s390 there is certainly nothing.
> 
> So assumed that clearing list_op_pending comes before the list head update,
> then the robust exit code in the kernel will fail to see either of
> them. FAIL.
> 
> I might be wrong as usual, but this would definitely explain the fail very
> well.

On recent versions of GCC, the fix would be to put this between the two
stores that need ordering:

	__atomic_thread_fence(__ATOMIC_RELEASE);

I must defer to Heiko on whether s390 GCC might tear the stores.  My
guess is "probably not".  ;-)

							Thanx, Paul

  reply	other threads:[~2019-01-30 23:35 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-27  8:11 WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered Heiko Carstens
2018-11-28 14:32 ` Thomas Gleixner
2018-11-29 11:23   ` Heiko Carstens
2019-01-21 12:21     ` Heiko Carstens
2019-01-21 13:12       ` Thomas Gleixner
2019-01-22 21:14         ` Thomas Gleixner
2019-01-23  9:24           ` Heiko Carstens
2019-01-23 12:33             ` Thomas Gleixner
2019-01-23 12:40               ` Heiko Carstens
2019-01-28 13:44     ` Peter Zijlstra
2019-01-28 13:58       ` Peter Zijlstra
2019-01-28 15:53         ` Thomas Gleixner
2019-01-29  8:49           ` Peter Zijlstra
2019-01-29 22:15             ` [PATCH] futex: Handle early deadlock return correctly Thomas Gleixner
2019-01-30 12:01               ` Thomas Gleixner
2019-02-08 12:05               ` [tip:locking/urgent] " tip-bot for Thomas Gleixner
2019-01-29  9:01           ` WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered Heiko Carstens
2019-01-29  9:33             ` Peter Zijlstra
2019-01-29  9:45             ` Thomas Gleixner
2019-01-29 10:24               ` Heiko Carstens
2019-01-29 10:35                 ` Peter Zijlstra
2019-01-29 13:03                   ` Thomas Gleixner
2019-01-29 13:23                     ` Heiko Carstens
     [not found]                       ` <20190129151058.GG26906@osiris>
2019-01-29 17:16                         ` Sebastian Sewior
2019-01-29 21:45                           ` Thomas Gleixner
     [not found]                           ` <20190130094913.GC5299@osiris>
2019-01-30 12:15                             ` Thomas Gleixner
     [not found]                               ` <20190130125955.GD5299@osiris>
2019-01-30 13:24                                 ` Sebastian Sewior
2019-01-30 13:29                                   ` Thomas Gleixner
2019-01-30 14:33                                     ` Thomas Gleixner
2019-01-30 17:56                                       ` Thomas Gleixner
2019-01-30 21:07                                         ` Sebastian Sewior
2019-01-30 23:13                                           ` WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggerede Thomas Gleixner
2019-01-30 23:35                                             ` Paul E. McKenney [this message]
2019-01-30 23:55                                               ` Thomas Gleixner
2019-01-31  0:27                                                 ` Thomas Gleixner
2019-01-31  1:45                                                   ` Paul E. McKenney
2019-01-31 16:52                                                   ` Heiko Carstens
2019-01-31 17:06                                                     ` Sebastian Sewior
2019-01-31 20:42                                                       ` Heiko Carstens
2019-02-01 16:12                                                       ` Heiko Carstens
2019-02-01 21:59                                                         ` Thomas Gleixner
     [not found]                                                           ` <20190202091043.GA3381@osiris>
2019-02-02 10:14                                                             ` Thomas Gleixner
2019-02-02 11:20                                                               ` Heiko Carstens
2019-02-03 16:30                                                                 ` Thomas Gleixner
2019-02-04 11:40                                                                   ` Heiko Carstens
2019-01-31  1:44                                                 ` Paul E. McKenney
2019-01-30 13:25                                 ` WARN_ON_ONCE(!new_owner) within wake_futex_pi() triggered Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190130233557.GA4240@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=bigeasy@linutronix.de \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=stli@linux.ibm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.