Re: inconsistent lock state on v4.14.20-rt17

From: Roosen Henri <Henri.Roosen@ginzinger.com>
To: "bigeasy@linutronix.de" <bigeasy@linutronix.de>
Cc: "linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: inconsistent lock state on v4.14.20-rt17
Date: Fri, 9 Mar 2018 09:47:16 +0000
Message-ID: <1520588836.1921.2.camel@ginzinger.com>
In-Reply-To: <20180308180025.zdhu3zl24zfkb3qz@linutronix.de>

On Thu, 2018-03-08 at 19:00 +0100, bigeasy@linutronix.de wrote:
> On 2018-03-08 17:38:59 [+0000], Roosen Henri wrote:
> > > Is the backtrace, that you receive from lockdep, always the same
> > > or is it different sometimes?
> > 
> > It is different each time. So my gut feeling tells me it might be a
> > memory corruption of some kind.. maybe caused by a use after free or
> > so..
> 
> CONFIG_SLUB_DEBUG_ON should (or could) catch this.

Thanks for pointing that out! I'll enable this for the next test run.
If there are other debug options worth switching on, please let me
know.
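
For reference, a toy user-space sketch of the free-poisoning idea that,
as far as I understand, is one of the checks SLUB debugging performs
and the reason it could catch a write after free. This is not the
kernel code: the two poison values are the ones from
include/linux/poison.h, but toy_poison() and toy_check_poison() are
hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Poison bytes as defined in the kernel's include/linux/poison.h. */
#define POISON_FREE 0x6b /* fills a freed object */
#define POISON_END  0xa5 /* marks the last byte of a freed object */

/* Toy stand-in for "free": overwrite the object with the poison pattern. */
static void toy_poison(unsigned char *obj, size_t size)
{
    assert(obj != NULL);
    if (size == 0)
        return;
    memset(obj, POISON_FREE, size - 1);
    obj[size - 1] = POISON_END;
}

/*
 * Toy stand-in for the check done before an object is handed out again.
 * Returns the offset of the first byte that no longer matches the poison
 * pattern, or -1 if the pattern is intact (nothing wrote to the object
 * after it was freed).
 */
static long toy_check_poison(const unsigned char *obj, size_t size)
{
    assert(obj != NULL);
    for (size_t i = 0; i < size; i++) {
        unsigned char expected = (i == size - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expected)
            return (long)i;
    }
    return -1;
}
```

A stray write into a "freed" buffer shows up as a non-negative offset
from toy_check_poison(); an untouched buffer yields -1.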

> 
> > I restarted the target yesterday evening and this morning it was frozen
> > without any trace on the terminal. Attaching a JTAG showed different
> > call-stacks than yesterday; Core #2 (trying to print the info to the
> > terminal) and #3 were spinning on a spin-lock, don't understand what
> > core #0 and #1 were doing.
> 
> maybe #0 and #1 are idle but #2 and #3 should make progress. #2 looks
> like a warning, do you know where it is from or is this everything you
> get? Unless the warning comes from an atomic context you should see
> something on the UART.

#2 and #3 were not making progress; they kept spinning in
arch_spin_lock().

> 
> > Most of the times the call-stacks start at SyS_write() or SyS_read()
> > from hackbench.
> 
> but what you posted was lockdep complaining about RQ lock.

Well, what I actually reported was "since 4.9 we've been chasing random
kernel crashes", and v4.14 has now caught an inconsistent lock state.
The hope was that the trace for the inconsistent lock state would point
to the root cause of the random kernel crashes.

> 
> > Some things I found out by testing on v4.9:
> > - minimum test to reproduce problem "while true; do hackbench -g 100 -l 1000; done &"
> > - reproducible with "hackbench -T" (threads)
> > - reproducible only on iMX6Q, not (yet) on iMX6S, iMX6D
> > - NOT reproducible with "hackbench -p" (pipes)
> 
> interesting.
> 
> > As that might be pointing towards the streaming unix socketpair
> > hackbench is using from multiple forked processes, I had a look at
> > net/unix/af_unix.c and wondered why unix_stream_sendmsg() doesn't
> > increase the reference count on the "other" socket the same as
> > unix_dgram_sendmsg() does. I don't see a reason why "other" is handled
> > differently in both functions, so it smells fishy to me. But I'm not
> > familiar with the net-code, so maybe you could review if the diff below
> > would make sense:
> 
> Commit 830a1e5c212f ("[AF_UNIX]: Remove superfluous reference counting
> in unix_stream_sendmsg") claims that this is not required. But if your
> patch makes a difference then…

Okay, I didn't know the refcounting could be safely removed. The
overnight test with the change reproduced the inconsistent lock state
again, so the change indeed makes no difference.
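
For reference, a minimal user-space sketch of the two transports being
compared in the tests above: an AF_UNIX stream socketpair (hackbench's
default) versus a pipe ("hackbench -p"). It is only an illustration,
not hackbench itself: it uses a single forked sender and a single
receiver rather than hackbench's groups, stream_roundtrip() and
MSG_SIZE are names made up here, and the 100-byte message size is
assumed to match hackbench's default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 100 /* assumed to match hackbench's default message size */

/*
 * Send nmsgs messages of MSG_SIZE bytes from a forked child to the parent,
 * over an AF_UNIX stream socketpair (use_pipe == 0) or a pipe
 * (use_pipe != 0). Returns the number of bytes the parent received,
 * or -1 on error.
 */
static long stream_roundtrip(int use_pipe, size_t nmsgs)
{
    int fds[2]; /* fds[0]: read side (parent), fds[1]: write side (child) */
    char buf[MSG_SIZE];
    long total = 0;
    ssize_t n;
    pid_t pid;
    int status;

    if (use_pipe ? pipe(fds) : socketpair(AF_UNIX, SOCK_STREAM, 0, fds))
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) { /* child: sender */
        close(fds[0]);
        memset(buf, 'x', sizeof(buf));
        for (size_t i = 0; i < nmsgs; i++) {
            size_t off = 0;

            /* write() may be partial, so loop until the message is out */
            while (off < sizeof(buf)) {
                n = write(fds[1], buf + off, sizeof(buf) - off);
                if (n <= 0)
                    _exit(1);
                off += (size_t)n;
            }
        }
        _exit(0); /* exiting closes fds[1], so the parent sees EOF */
    }

    /* parent: receiver, reads until EOF */
    close(fds[1]);
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        total += n;
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0 || n < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return total;
}
```

Calling it as stream_roundtrip(0, 1000) exercises the socketpair path
and stream_roundtrip(1, 1000) the pipe path; in both cases the return
value should equal 1000 * MSG_SIZE, which can be checked with assert().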

> 
> Sebastian

Thanks,
Henri

Thread overview: 10+ messages
2018-03-06 15:27 inconsistent lock state on v4.14.20-rt17 Roosen Henri
2018-03-06 18:21 ` Sebastian Andrzej Siewior
     [not found]   ` <1520411200.1744.11.camel@ginzinger.com>
2018-03-08 15:57     ` bigeasy
2018-03-08 17:38       ` Roosen Henri
2018-03-08 18:00         ` bigeasy
2018-03-09  9:47           ` Roosen Henri [this message]
2018-03-14 19:55             ` bigeasy
2018-03-16 10:30               ` bigeasy
2018-03-17 21:03                 ` bigeasy
2018-03-21  8:31                   ` Roosen Henri
