All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmitry Safonov <dima@arista.com>
To: lkp@lists.01.org
Subject: Re: [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
Date: Mon, 10 Sep 2018 19:50:36 +0100	[thread overview]
Message-ID: <1536605436.2710.5.camel@arista.com> (raw)
In-Reply-To: <20180910051452.GA518@jagdpanzerIV>

[-- Attachment #1: Type: text/plain, Size: 2892 bytes --]

Hi Sergey, Jiri,

On Mon, 2018-09-10 at 14:14 +0900, Sergey Senozhatsky wrote:
> On (09/07/18 08:39), Jiri Slaby wrote:
> > > [  244.944070] 
> > > [  244.944070] Showing all locks held in the system:
> > > [  244.945558] 1 lock held by khungtaskd/18:
> > > [  244.946495]  #0: (____ptrval____) (rcu_read_lock){....}, at:
> > > debug_show_all_locks+0x16/0x190
> > > [  244.948503] 2 locks held by askfirst/235:
> > > [  244.949439]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.951762]  #1: (____ptrval____) (&ldata-
> > > >atomic_read_lock){+.+.}, at: n_tty_read+0x13d/0xa00
> > 
> > Here, it just seems to wait for input from the user.
> > 
> > > [  244.953799] 1 lock held by validate_data/655:
> > > [  244.954814]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.956764] 1 lock held by dnsmasq/668:
> > > [  244.957649]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.959598] 1 lock held by dropbear/734:
> > > [  244.967564]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > 
> > Hmm, I assume there is another task waiting for write_ldsem and
> > that one
> > prevents these readers to proceed. Unfortunately, due to the
> > defunct
> > __ptrval__ pointer hashing crap, we do not see who is waiting for
> > what.
> > But I am guessing all are the same locks.
> 
> Hmm, interesting. Am I getting it right that the test did pass
> before.
> And now we see that sort of/smells like live-lock right after the
> introduction of tty_ldisc_lock() to tty_reopen().
> 
> > So I think, we are forced to limit the waiting to 5 seconds in
> > reopen in
> > the end too (the same as we do for new open in tty_init_dev).
> 
> If I got it right, LKP did test the 5*HZ patch
> 
> 	retval = tty_ldisc_lock(tty, 5 * HZ);
> 
> At least it's
>  In-Reply-To: <20180829022353.23568-3-dima@arista.com>
> 
> and
>  Message-Id: <20180829022353.23568-3-dima@arista.com>
> 
> is the patch which does the 5*HZ lock timeout thing.

Yeah, I also noticed on the weekend that the commit in the mentioned
branch is from v1..

Currently, I tried to reproduce it like ~15-20 times, but unlucky :(

It looks like, the lockup wasn't introduced by this commit, but
unfortunately the commit made it more likely. At least, that's what I
suppose after I've found this report:
https://lkml.org/lkml/2017/11/21/216

It seems to me that the lockup is triggered by:
[  244.948503] 2 locks held by askfirst/235:
[  244.949439]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x50
[  244.951762]  #1: (____ptrval____) (&ldata->atomic_read_lock){+.+.},
at: n_tty_read+0x13d/0xa00

Looking into this..

-- 
Thanks,
             Dmitry

WARNING: multiple messages have this Message-ID (diff)
From: Dmitry Safonov <dima@arista.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Jiri Slaby <jslaby@suse.cz>
Cc: "kernel test robot" <rong.a.chen@intel.com>,
	lkp@01.org, "Daniel Axtens" <dja@axtens.net>,
	"Dmitry Safonov" <0x7f454c46@gmail.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Tan Xiaojun" <tanxiaojun@huawei.com>,
	"Peter Hurley" <peter@hurleysoftware.com>,
	"Pasi Kärkkäinen" <pasik@iki.fi>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Michael Neuling" <mikey@neuling.org>,
	"Mikulas Patocka" <mpatocka@redhat.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [LKP] [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds
Date: Mon, 10 Sep 2018 19:50:36 +0100	[thread overview]
Message-ID: <1536605436.2710.5.camel@arista.com> (raw)
In-Reply-To: <20180910051452.GA518@jagdpanzerIV>

Hi Sergey, Jiri,

On Mon, 2018-09-10 at 14:14 +0900, Sergey Senozhatsky wrote:
> On (09/07/18 08:39), Jiri Slaby wrote:
> > > [  244.944070] 
> > > [  244.944070] Showing all locks held in the system:
> > > [  244.945558] 1 lock held by khungtaskd/18:
> > > [  244.946495]  #0: (____ptrval____) (rcu_read_lock){....}, at:
> > > debug_show_all_locks+0x16/0x190
> > > [  244.948503] 2 locks held by askfirst/235:
> > > [  244.949439]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.951762]  #1: (____ptrval____) (&ldata-
> > > >atomic_read_lock){+.+.}, at: n_tty_read+0x13d/0xa00
> > 
> > Here, it just seems to wait for input from the user.
> > 
> > > [  244.953799] 1 lock held by validate_data/655:
> > > [  244.954814]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.956764] 1 lock held by dnsmasq/668:
> > > [  244.957649]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > > [  244.959598] 1 lock held by dropbear/734:
> > > [  244.967564]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
> > > tty_ldisc_ref_wait+0x25/0x50
> > 
> > Hmm, I assume there is another task waiting for write_ldsem and
> > that one
> > prevents these readers to proceed. Unfortunately, due to the
> > defunct
> > __ptrval__ pointer hashing crap, we do not see who is waiting for
> > what.
> > But I am guessing all are the same locks.
> 
> Hmm, interesting. Am I getting it right that the test did pass
> before.
> And now we see that sort of/smells like live-lock right after the
> introduction of tty_ldisc_lock() to tty_reopen().
> 
> > So I think, we are forced to limit the waiting to 5 seconds in
> > reopen in
> > the end too (the same as we do for new open in tty_init_dev).
> 
> If I got it right, LKP did test the 5*HZ patch
> 
> 	retval = tty_ldisc_lock(tty, 5 * HZ);
> 
> At least it's
>  In-Reply-To: <20180829022353.23568-3-dima@arista.com>
> 
> and
>  Message-Id: <20180829022353.23568-3-dima@arista.com>
> 
> is the patch which does the 5*HZ lock timeout thing.

Yeah, I also noticed on the weekend that the commit in the mentioned
branch is from v1..

Currently, I tried to reproduce it like ~15-20 times, but unlucky :(

It looks like, the lockup wasn't introduced by this commit, but
unfortunately the commit made it more likely. At least, that's what I
suppose after I've found this report:
https://lkml.org/lkml/2017/11/21/216

It seems to me that the lockup is triggered by:
[  244.948503] 2 locks held by askfirst/235:
[  244.949439]  #0: (____ptrval____) (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x25/0x50
[  244.951762]  #1: (____ptrval____) (&ldata->atomic_read_lock){+.+.},
at: n_tty_read+0x13d/0xa00

Looking into this..

-- 
Thanks,
             Dmitry

  reply	other threads:[~2018-09-10 18:50 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-29  2:23 [PATCH 0/4] tty: Hold write ldisc sem in tty_reopen() Dmitry Safonov
2018-08-29  2:23 ` [PATCH 1/4] tty: Drop tty->count on tty_reopen() failure Dmitry Safonov
2018-08-29 14:38   ` Jiri Slaby
2018-08-29 16:13     ` Dmitry Safonov
2018-08-31  6:47       ` Jiri Slaby
2018-08-31 11:54         ` Dmitry Safonov
2018-08-29  2:23 ` [PATCH 2/4] tty: Hold tty_ldisc_lock() during tty_reopen() Dmitry Safonov
2018-08-29  4:34   ` Sergey Senozhatsky
2018-08-29 14:30     ` Dmitry Safonov
2018-08-30  5:16     ` Benjamin Herrenschmidt
2018-08-29 14:40   ` Jiri Slaby
2018-08-29 14:45     ` Jiri Slaby
2018-08-29 16:36     ` Dmitry Safonov
2018-08-29 15:19   ` Tetsuo Handa
2018-08-31  6:51     ` Jiri Slaby
2018-08-31 11:17       ` Tetsuo Handa
2018-08-31 11:21         ` Jiri Slaby
2018-08-31 12:12           ` Dmitry Safonov
2018-09-07  4:50   ` [tty] 0b4f83d510: INFO:task_blocked_for_more_than#seconds kernel test robot
2018-09-07  4:50     ` [LKP] " kernel test robot
2018-09-07  6:39     ` Jiri Slaby
2018-09-07  6:39       ` [LKP] " Jiri Slaby
2018-09-07 11:12       ` Dmitry Safonov
2018-09-07 11:12         ` [LKP] " Dmitry Safonov
2018-09-10  5:14       ` Sergey Senozhatsky
2018-09-10  5:14         ` [LKP] " Sergey Senozhatsky
2018-09-10 18:50         ` Dmitry Safonov [this message]
2018-09-10 18:50           ` Dmitry Safonov
2018-08-29  2:23 ` [PATCH 3/4] tty: Lock tty pair in tty_init_dev() Dmitry Safonov
2018-08-29 14:46   ` Jiri Slaby
2018-08-29 16:28     ` Dmitry Safonov
2018-08-31  6:54       ` Jiri Slaby
2018-08-31 12:22         ` Dmitry Safonov
2018-08-29  2:23 ` [PATCH 4/4] tty/lockdep: Add ldisc_sem asserts Dmitry Safonov
2018-08-30  7:03 ` [PATCH 0/4] tty: Hold write ldisc sem in tty_reopen() Pasi Kärkkäinen
2018-08-30  7:03   ` Pasi Kärkkäinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1536605436.2710.5.camel@arista.com \
    --to=dima@arista.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.