stable.vger.kernel.org archive mirror
* commit af32cc7b causes stalls on 3.18.23
@ 2015-10-31  1:56 Corey Wright
  2015-10-31  2:07 ` Sasha Levin
  0 siblings, 1 reply; 6+ messages in thread
From: Corey Wright @ 2015-10-31  1:56 UTC (permalink / raw)
  To: Kosuke Tatsukawa, Sasha Levin; +Cc: stable

Running applications under GNU Screen causes the 3.18.23 kernel to stall and
the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.

1. Compile 3.18.23 kernel (with specified .config [1], if necessary).
2. Boot kernel with Debian Wheezy userland.
3. Log in (console or SSH).
4. Run Screen (ie "screen").
5. Run "apt-get remove <package>".
6. Type "y<enter>" when asked "Do you want to continue [Y/n]?"
7. Observe no further application output, but eventually kernel messages:

[  246.333488] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=5252 jiffies, g=812, c=811, q=61)
[  246.336486] Task dump for CPU 1:
[  246.336486] apt-get         R  running task        0  2040   2031 0x00000008
[  246.336486]  ffffffffffffffff ffffffff8118cabc ffff880017fec6c0 00000000a005206a
[  246.336486]  8000000000000163 ffffc90000840000 ffff880016829028 0000c9000083ffff
[  246.336486]  ff31880016828008 ffffc90000840000 ffffc9000083ffff ffffffff81815c98
[  246.336486] Call Trace:
[  246.336486]  [<ffffffff8118cabc>] ? alloc_vmap_area+0x26c/0x310
[  246.336486]  [<ffffffff8109cd77>] ? select_idle_sibling+0x27/0x120
[  246.336486]  [<ffffffff8109d1d2>] ? select_task_rq_fair+0x362/0x610
[  246.336486]  [<ffffffff8101d21d>] ? native_sched_clock+0x2d/0x80
[  246.336486]  [<ffffffff8109e592>] ? enqueue_task_fair+0x512/0xac0
[  246.336486]  [<ffffffff81093b89>] ? resched_curr+0x39/0xc0
[  246.336486]  [<ffffffff81094230>] ? check_preempt_curr+0x80/0xa0
[  246.336486]  [<ffffffff81094264>] ? ttwu_do_wakeup+0x14/0xc0
[  246.336486]  [<ffffffff8109756b>] ? try_to_wake_up+0xdb/0x2f0
[  246.336486]  [<ffffffff8155d12d>] ? tty_unlock+0x1d/0x50
[  246.336486]  [<ffffffff811d520a>] ? pollwake+0x6a/0x70
[  246.336486]  [<ffffffff81097780>] ? try_to_wake_up+0x2f0/0x2f0
[  246.336486]  [<ffffffff810a81e4>] ? __wake_up_common+0x54/0x90
[  246.336486]  [<ffffffff8155b56e>] ? down_write+0xe/0x40
[  246.336486]  [<ffffffff8139b8cb>] ? tty_set_termios+0x2bb/0x390
[  246.336486]  [<ffffffff8139be70>] ? set_termios+0x190/0x270
[  246.336486]  [<ffffffff8139c116>] ? tty_mode_ioctl+0x1c6/0x540
[  246.336486]  [<ffffffff8155c692>] ? ldsem_down_read+0x32/0x210
[  246.336486]  [<ffffffff813963ee>] ? tty_ioctl+0x2ce/0xb10
[  246.336486]  [<ffffffff811d1e0c>] ? do_filp_open+0x4c/0xc0
[  246.336486]  [<ffffffff811d41e3>] ? do_vfs_ioctl+0x83/0x500
[  246.336486]  [<ffffffff81078b77>] ? recalc_sigpending+0x17/0x50
[  246.336486]  [<ffffffff81079985>] ? __set_task_blocked+0x35/0x80
[  246.336486]  [<ffffffff8107c32d>] ? __set_current_blocked+0x3d/0x70
[  246.336486]  [<ffffffff811d4701>] ? SyS_ioctl+0xa1/0xc0
[  246.336486]  [<ffffffff8107c53e>] ? SyS_rt_sigprocmask+0x8e/0xc0
[  246.336486]  [<ffffffff8155d4cd>] ? system_call_fastpath+0x16/0x1b

Running "lxc-start -n <container_name>" under Screen immediately triggers the
problem (before it outputs anything or the container is started) with similar
kernel messages:

[  104.706102] INFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=5252 jiffies, g=695, c=694, q=49)
[  104.708653] Task dump for CPU 1:
[  104.708653] lxc-start       R  running task        0  2050   2047 0x00000008
[  104.708653]  ffff8800174e2000 0000000115922590 0000000000000000 ffff880016c00648
[  104.708653]  0000000000000296 ffffffff8155b56e 0000000000000202 ffffffff8139b8cb
[  104.708653]  00007f393d2a16a0 00000500810a8828 000000bf00000005 7f1c030000008a3b
[  104.708653] Call Trace:
[  104.708653]  [<ffffffff8155b56e>] ? down_write+0xe/0x40
[  104.708653]  [<ffffffff8139b8cb>] ? tty_set_termios+0x2bb/0x390
[  104.708653]  [<ffffffff8139be70>] ? set_termios+0x190/0x270
[  104.708653]  [<ffffffff8139c116>] ? tty_mode_ioctl+0x1c6/0x540
[  104.708653]  [<ffffffff8155c692>] ? ldsem_down_read+0x32/0x210
[  104.708653]  [<ffffffff813963ee>] ? tty_ioctl+0x2ce/0xb10
[  104.708653]  [<ffffffff811c3319>] ? get_empty_filp+0xc9/0x1c0
[  104.708653]  [<ffffffff811c3434>] ? alloc_file+0x24/0xc0
[  104.708653]  [<ffffffff81205dd5>] ? anon_inode_getfile+0xd5/0x170
[  104.708653]  [<ffffffff811d41e3>] ? do_vfs_ioctl+0x83/0x500
[  104.708653]  [<ffffffff811dea18>] ? __fd_install+0x28/0x70
[  104.708653]  [<ffffffff81205ecc>] ? anon_inode_getfd+0x5c/0xa0
[  104.708653]  [<ffffffff811d4701>] ? SyS_ioctl+0xa1/0xc0
[  104.708653]  [<ffffffff8155d4cd>] ? system_call_fastpath+0x16/0x1b

I'm available for debugging (just tell me what to do) and testing if needed.

[1] http://pastebin.com/77tHG3wd

Corey
--
undefined@pobox.com


* Re: commit af32cc7b causes stalls on 3.18.23
  2015-10-31  1:56 commit af32cc7b causes stalls on 3.18.23 Corey Wright
@ 2015-10-31  2:07 ` Sasha Levin
  2015-10-31  5:37   ` Corey Wright
  0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2015-10-31  2:07 UTC (permalink / raw)
  To: Corey Wright, Kosuke Tatsukawa; +Cc: stable

On 10/30/2015 09:56 PM, Corey Wright wrote:
> Running applications under GNU Screen causes the 3.18.23 kernel to stall and
> the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
> by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.

Hi Corey,

Does doing the same on mainline work for you?


Thanks,
Sasha



* Re: commit af32cc7b causes stalls on 3.18.23
  2015-10-31  2:07 ` Sasha Levin
@ 2015-10-31  5:37   ` Corey Wright
  2015-10-31  8:47     ` Kosuke Tatsukawa
  2015-10-31 13:02     ` Sasha Levin
  0 siblings, 2 replies; 6+ messages in thread
From: Corey Wright @ 2015-10-31  5:37 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kosuke Tatsukawa, stable

On Fri, 30 Oct 2015 22:07:24 -0400
Sasha Levin <sasha.levin@oracle.com> wrote:

> On 10/30/2015 09:56 PM, Corey Wright wrote:
> > Running applications under GNU Screen causes the 3.18.23 kernel to stall and
> > the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
> > by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.
> 
> Hi Corey,
> 
> Does doing the same on mainline work for you?

is 4.1.12 close enough (which i already had "on-hand")?
 
it contains a similar commit 614ea4ea and tests successfully (ie no bug).

so the problem appears to be limited to 3.18.23.

thanks for getting me looking at and comparing the two versions!

in 3.18.23, commit af32cc7b adds a spin_unlock_irqrestore() inside the
"if (tty->link->packet)" block, but there is already one after the if block,
so ctrl_lock is released twice whenever that branch is taken.

i'm thinking it accidentally got dragged along when porting the commit from
4.1 to 3.18 (because the one that was added in 3.18.23 is in the same location
as the one already in 4.1).
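
to make the imbalance concrete, here is a minimal userspace sketch of the two
shapes of the function.  it is only a model: toy_lock, toy_tty, TOY_FLUSHREAD
and the flush_* / count_bad_unlocks names are hypothetical stand-ins, not
kernel code, and it only counts unbalanced unlocks; it does not reproduce the
stall itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for tty->ctrl_lock: tracks state and counts unbalanced unlocks. */
struct toy_lock {
	bool held;
	int bad_unlocks;
};

/* Hypothetical, cut-down stand-in for struct tty_struct. */
struct toy_tty {
	struct toy_lock ctrl_lock;
	bool link_packet;          /* models tty->link->packet */
	unsigned int ctrl_status;  /* models tty->ctrl_status */
	int wakeups;               /* models wake_up_interruptible() calls */
};

#define TOY_FLUSHREAD 0x01u  /* placeholder bit, not the real TIOCPKT_FLUSHREAD */

void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);  /* non-recursive, like a spinlock */
	l->held = true;
}

void toy_lock_release(struct toy_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;  /* released a lock that nobody holds */
	l->held = false;
}

/* Shape of the function in 3.18.23 per the diff: unlock inside and after the if. */
void flush_3_18_23(struct toy_tty *tty)
{
	toy_lock_acquire(&tty->ctrl_lock);
	if (tty->link_packet) {
		tty->ctrl_status |= TOY_FLUSHREAD;
		toy_lock_release(&tty->ctrl_lock);  /* the extra unlock */
		tty->wakeups++;
	}
	toy_lock_release(&tty->ctrl_lock);
}

/* Shape after the one-line removal: one lock, one unlock on every path. */
void flush_fixed(struct toy_tty *tty)
{
	toy_lock_acquire(&tty->ctrl_lock);
	if (tty->link_packet) {
		tty->ctrl_status |= TOY_FLUSHREAD;
		tty->wakeups++;
	}
	toy_lock_release(&tty->ctrl_lock);
}

/* Run one flush on a fresh toy tty and report how many unlocks were unbalanced. */
int count_bad_unlocks(void (*flush)(struct toy_tty *), bool link_packet)
{
	struct toy_tty tty = { .link_packet = link_packet };

	flush(&tty);
	return tty.ctrl_lock.bad_unlocks;
}
```

in this model, count_bad_unlocks(flush_3_18_23, true) reports one unbalanced
unlock, while flush_fixed reports none on either path.  with link_packet false
both shapes are balanced, which would fit the problem only showing up under
Screen if that is what puts the pty in packet mode (an assumption on my part).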

the below patch fixes the problem and passes the
apt-and-lxc-start-run-under-screen-without-causing-a-stall test.

---------- >8 ----- cut here ----- 8< ----------
Remove extraneous, unmatched spin_unlock_irqrestore() introduced by commit
af32cc7b.

This prevents stalls when running command-line applications under GNU Screen.

Fixes: af32cc7bde63 ("tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c")
Signed-off-by: Corey Wright <undefined@pobox.com>

--- linux-3.18.23~/drivers/tty/n_tty.c	2015-10-30 23:46:48.000000000 -0500
+++ linux-3.18.23/drivers/tty/n_tty.c	2015-10-30 23:46:53.821376173 -0500
@@ -364,7 +364,6 @@ static void n_tty_packet_mode_flush(stru
 	spin_lock_irqsave(&tty->ctrl_lock, flags);
 	if (tty->link->packet) {
 		tty->ctrl_status |= TIOCPKT_FLUSHREAD;
-		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
 		wake_up_interruptible(&tty->link->read_wait);
 	}
 	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
---------- >8 ----- cut here ----- 8< ----------

> Thanks,
> Sasha

thank you!

corey
--
undefined@pobox.com


* Re: commit af32cc7b causes stalls on 3.18.23
  2015-10-31  5:37   ` Corey Wright
@ 2015-10-31  8:47     ` Kosuke Tatsukawa
  2015-10-31 13:02     ` Sasha Levin
  1 sibling, 0 replies; 6+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-31  8:47 UTC (permalink / raw)
  To: Corey Wright; +Cc: Sasha Levin, stable@vger.kernel.org

Corey Wright wrote:
> On Fri, 30 Oct 2015 22:07:24 -0400
> Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>> On 10/30/2015 09:56 PM, Corey Wright wrote:
>> > Running applications under GNU Screen causes the 3.18.23 kernel to stall and
>> > the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
>> > by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.
>> 
>> Hi Corey,
>> 
>> Does doing the same on mainline work for you?
> 
> is 4.1.12 close enough (which i already had "on-hand")?
>  
> it contains a similar commit 614ea4ea and tests successfully (ie no bug).
> 
> so the problem appears to be limited to 3.18.23.
> 
> thanks for getting me looking at and comparing the two versions!
> 
> in 3.18.23 we add a spin_unlock_irqrestore inside the "if (tty->link->packet)"
> block, but we already have one outside of the if block.
> 
> i'm thinking it accidentally got dragged along when porting the commit from
> 4.1 to 3.18 (because the one that was added in 3.18.23 is in the same location
> as the one already in 4.1).
> 
> the below patch fixes the problem and passes the
> apt-and-lxc-start-run-under-screen-without-causing-a-stall test.

Thank you.

The original upstream patch just removes the
  if (waitqueue_active(...))
line, so the spin_unlock_irqrestore() added in commit af32cc7b is extra
and should be removed.


> ---------- >8 ----- cut here ----- 8< ----------
> Remove extraneous, unmatched spin_unlock_irqrestore() introduced by commit
> af32cc7b.
> 
> This prevents stalls when running command-line applications under GNU Screen.
> 
> Fixes: af32cc7bde63 ("tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c")
> Signed-off-by: Corey Wright <undefined@pobox.com>
> 
> --- linux-3.18.23~/drivers/tty/n_tty.c	2015-10-30 23:46:48.000000000 -0500
> +++ linux-3.18.23/drivers/tty/n_tty.c	2015-10-30 23:46:53.821376173 -0500
> @@ -364,7 +364,6 @@ static void n_tty_packet_mode_flush(stru
>  	spin_lock_irqsave(&tty->ctrl_lock, flags);
>  	if (tty->link->packet) {
>  		tty->ctrl_status |= TIOCPKT_FLUSHREAD;
> -		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
>  		wake_up_interruptible(&tty->link->read_wait);
>  	}
>  	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
> ---------- >8 ----- cut here ----- 8< ----------
> 
>> Thanks,
>> Sasha
> 
> thank you!
> 
> corey
> --
> undefined@pobox.com
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com


* Re: commit af32cc7b causes stalls on 3.18.23
  2015-10-31  5:37   ` Corey Wright
  2015-10-31  8:47     ` Kosuke Tatsukawa
@ 2015-10-31 13:02     ` Sasha Levin
  2015-10-31 18:42       ` Corey Wright
  1 sibling, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2015-10-31 13:02 UTC (permalink / raw)
  To: Corey Wright; +Cc: Kosuke Tatsukawa, stable

On 10/31/2015 01:37 AM, Corey Wright wrote:
> On Fri, 30 Oct 2015 22:07:24 -0400
> Sasha Levin <sasha.levin@oracle.com> wrote:
> 
>> On 10/30/2015 09:56 PM, Corey Wright wrote:
>>> Running applications under GNU Screen causes the 3.18.23 kernel to stall and
>>> the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
>>> by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.
>>
>> Hi Corey,
>>
>> Does doing the same on mainline work for you?
>
> is 4.1.12 close enough (which i already had "on-hand")?
>  
> it contains a similar commit 614ea4ea and tests successfully (ie no bug).
> 
> so the problem appears to be limited to 3.18.23.
> 
> thanks for getting me looking at and comparing the two versions!
> 
> in 3.18.23 we add a spin_unlock_irqrestore inside the "if (tty->link->packet)"
> block, but we already have one outside of the if block.
> 
> i'm thinking it accidentally got dragged along when porting the commit from
> 4.1 to 3.18 (because the one that was added in 3.18.23 is in the same location
> as the one already in 4.1).
> 
> the below patch fixes the problem and passes the
> apt-and-lxc-start-run-under-screen-without-causing-a-stall test.
> 
> ---------- >8 ----- cut here ----- 8< ----------
> Remove extraneous, unmatched spin_unlock_irqrestore() introduced by commit
> af32cc7b.
> 
> This prevents stalls when running command-line applications under GNU Screen.
> 
> Fixes: af32cc7bde63 ("tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c")
> Signed-off-by: Corey Wright <undefined@pobox.com>
> 
> --- linux-3.18.23~/drivers/tty/n_tty.c	2015-10-30 23:46:48.000000000 -0500
> +++ linux-3.18.23/drivers/tty/n_tty.c	2015-10-30 23:46:53.821376173 -0500
> @@ -364,7 +364,6 @@ static void n_tty_packet_mode_flush(stru
>  	spin_lock_irqsave(&tty->ctrl_lock, flags);
>  	if (tty->link->packet) {
>  		tty->ctrl_status |= TIOCPKT_FLUSHREAD;
> -		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
>  		wake_up_interruptible(&tty->link->read_wait);
>  	}
>  	spin_unlock_irqrestore(&tty->ctrl_lock, flags);

Hey Corey,

Thanks for looking into it!

I've pushed a corrected backport to the stable queue. If you could please confirm
that it works for you, I'll ship it as 3.18.24.

The tree is available at git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git linux-3.18.y-queue

Thanks again!


* Re: commit af32cc7b causes stalls on 3.18.23
  2015-10-31 13:02     ` Sasha Levin
@ 2015-10-31 18:42       ` Corey Wright
  0 siblings, 0 replies; 6+ messages in thread
From: Corey Wright @ 2015-10-31 18:42 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Kosuke Tatsukawa, stable

On Sat, 31 Oct 2015 09:02:52 -0400
Sasha Levin <sasha.levin@oracle.com> wrote:

> On 10/31/2015 01:37 AM, Corey Wright wrote:
> > On Fri, 30 Oct 2015 22:07:24 -0400
> > Sasha Levin <sasha.levin@oracle.com> wrote:
> > 
> >> On 10/30/2015 09:56 PM, Corey Wright wrote:
> >>> Running applications under GNU Screen causes the 3.18.23 kernel to stall and
> >>> the application to hang.  Reverting commit af32cc7b, ie "tty: fix stall caused
> >>> by missing memory barrier in drivers/tty/n_tty.c", fixes the problem.
> >>
> >> Hi Corey,
> >>
> >> Does doing the same on mainline work for you?
> >
> > is 4.1.12 close enough (which i already had "on-hand")?
> >  
> > it contains a similar commit 614ea4ea and tests successfully (ie no bug).
> > 
> > so the problem appears to be limited to 3.18.23.
> > 
> > thanks for getting me looking at and comparing the two versions!
> > 
> > in 3.18.23 we add a spin_unlock_irqrestore inside the "if (tty->link->packet)"
> > block, but we already have one outside of the if block.
> > 
> > i'm thinking it accidentally got dragged along when porting the commit from
> > 4.1 to 3.18 (because the one that was added in 3.18.23 is in the same location
> > as the one already in 4.1).
> > 
> > the below patch fixes the problem and passes the
> > apt-and-lxc-start-run-under-screen-without-causing-a-stall test.
> > 
> > ---------- >8 ----- cut here ----- 8< ----------
> > Remove extraneous, unmatched spin_unlock_irqrestore() introduced by commit
> > af32cc7b.
> > 
> > This prevents stalls when running command-line applications under GNU Screen.
> > 
> > Fixes: af32cc7bde63 ("tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c")
> > Signed-off-by: Corey Wright <undefined@pobox.com>
> > 
> > --- linux-3.18.23~/drivers/tty/n_tty.c	2015-10-30 23:46:48.000000000 -0500
> > +++ linux-3.18.23/drivers/tty/n_tty.c	2015-10-30 23:46:53.821376173 -0500
> > @@ -364,7 +364,6 @@ static void n_tty_packet_mode_flush(stru
> >  	spin_lock_irqsave(&tty->ctrl_lock, flags);
> >  	if (tty->link->packet) {
> >  		tty->ctrl_status |= TIOCPKT_FLUSHREAD;
> > -		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
> >  		wake_up_interruptible(&tty->link->read_wait);
> >  	}
> >  	spin_unlock_irqrestore(&tty->ctrl_lock, flags);
> 
> Hey Corey,
> 
> Thanks for looking into it!
> 
> I've pushed a corrected backport to the stable queue, if you could please confirm
> that it works for you I'll ship it as 3.18.24.
> 
> The tree is available at git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git linux-3.18.y-queue

i've successfully tested git branch linux-3.18.y-queue (git describe
v3.18.23-2-gb008351) with the revert and the repatch (commits 80e5c4dd and
b008351f, respectively).  running apt-get and lxc-start under screen does not
cause a kernel stall.

> Thanks again!

thank you for maintaining 3.18!

corey
--
undefined@pobox.com

