Re: [PATCH] serial: flush ldisc after hangup

From: Josef Bacik <jbacik@fb.com>
To: Peter Hurley <peter@hurleysoftware.com>,
	gregkh@linuxfoundation.org, jslaby@suse.com,
	linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] serial: flush ldisc after hangup
Date: Tue, 1 Mar 2016 13:21:26 -0500
Message-ID: <56D5DDA6.5080605@fb.com>
In-Reply-To: <56D5DCA0.2040201@hurleysoftware.com>

On 03/01/2016 01:17 PM, Peter Hurley wrote:
> Hi Josef,
>
> On 03/01/2016 10:02 AM, Josef Bacik wrote:
>> We hit a panic pretty consistently in production that looked like this
>>
>> PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
>>   #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
>>   #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
>>   #2 [ffff88015834ba60] oops_end at ffffffff81006478
>>   #3 [ffff88015834ba90] no_context at ffffffff818c5262
>>   #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
>>   #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
>>   #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
>>   #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
>>   #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
>>      [exception RIP: __uart_start+0x1a]
>>      RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
>>      RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
>>      RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
>>      RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
>>      R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
>>      R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
>
> Thanks for the report, but where's the rest of the stack trace?

Whoops, sorry about that.  Here is the full backtrace:

crash> bt
PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
  #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
  #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
  #2 [ffff88015834ba60] oops_end at ffffffff81006478
  #3 [ffff88015834ba90] no_context at ffffffff818c5262
  #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
  #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
  #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
  #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
  #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
     [exception RIP: __uart_start+0x1a]
     RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
     RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
     RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
     RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
     R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
     R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
#10 [ffff88015834bcb0] uart_flush_chars at ffffffff8152fc1e
#11 [ffff88015834bcc0] n_tty_receive_buf_common at ffffffff81516cf1
#12 [ffff88015834bd80] n_tty_receive_buf2 at ffffffff81517414
#13 [ffff88015834bd90] flush_to_ldisc at ffffffff8151ab6d
#14 [ffff88015834bdf0] process_one_work at ffffffff81069871
#15 [ffff88015834be40] worker_thread at ffffffff81069c53
#16 [ffff88015834bec0] kthread at ffffffff8106f429
#17 [ffff88015834bf50] ret_from_fork at ffffffff818d50c8

>
>> It was a NULL pointer dereference, the state->port.tty was NULL so when we go to
>> check tty->stopped in uart_tx_stopped() we panic.  Looking at the other CPU's we
>> were in the middle of uart_open(), and the core actually had a valid pointer in
>> state->port.tty, which points to a race between either close or hangup (the only
>> two places that set state->port.tty to NULL) and open.  Close already flushes
>> the ldisc but hangup does not, which means we could have some characters in the
>> receive buffer in between the hangup and the open, and we end up in this
>> situation.
>
> Yeah, the race is that the ldisc should not be attempting i/o to
> the driver at all. This problem is fixed in -next already, but in the
> tty core rather than in each individual tty driver.
>

Great!  Which patch or patches fix this?  I looked at linux-next and
there is a lot of refactoring in there.  Do I need the whole series, or
is there a specific patch that fixes this problem?  Thanks,

Josef
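
For anyone following the thread, the race described above can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not kernel code and not either of the fixes discussed: `model_tty`, `model_port`, `model_hangup()` and `model_flush_work()` are hypothetical names made up for illustration. It shows why received characters left queued across a hangup make the deferred flush work reach a NULL tty, and why discarding them at hangup time keeps the driver from being touched at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct model_tty {
    bool stopped;            /* models tty->stopped */
};

struct model_port {
    struct model_tty *tty;   /* models state->port.tty; NULL after hangup */
    int rx_pending;          /* characters still queued for the ldisc */
    int tx_started;          /* how many times transmit was kicked */
};

/*
 * Models hangup: the tty pointer is cleared. With flush_ldisc set, the
 * queued receive data is discarded first (the idea behind the patch in
 * the subject line); without it, the data stays queued.
 */
void model_hangup(struct model_port *p, bool flush_ldisc)
{
    if (flush_ldisc)
        p->rx_pending = 0;
    p->tty = NULL;
}

/*
 * Models the deferred flush work that hands queued characters to the
 * ldisc, which then asks the driver to start transmitting. Returns
 * false at the point where the real code would dereference a NULL tty.
 */
bool model_flush_work(struct model_port *p)
{
    if (p->rx_pending == 0)
        return true;         /* nothing queued, driver never touched */

    p->rx_pending = 0;
    if (p->tty == NULL)
        return false;        /* the reported oops: tty->stopped on NULL */

    if (!p->tty->stopped)
        p->tx_started++;
    return true;
}
```

Calling `model_hangup(&port, false)` and then `model_flush_work(&port)` with characters still queued returns false, which stands in for the oops in `__uart_start`. Passing `true` to `model_hangup()` empties the queue first, so the work item returns early without looking at the tty pointer.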
