Re: [PATCH] serial: flush ldisc after hangup

From: Josef Bacik <jbacik@fb.com>
To: Peter Hurley <peter@hurleysoftware.com>,
	gregkh@linuxfoundation.org, jslaby@suse.com,
	linux-serial@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] serial: flush ldisc after hangup
Date: Tue, 1 Mar 2016 13:21:26 -0500
Message-ID: <56D5DDA6.5080605@fb.com>
In-Reply-To: <56D5DCA0.2040201@hurleysoftware.com>

On 03/01/2016 01:17 PM, Peter Hurley wrote:
> Hi Josef,
>
> On 03/01/2016 10:02 AM, Josef Bacik wrote:
>> We hit a panic pretty consistently in production that looked like this
>>
>> PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
>>   #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
>>   #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
>>   #2 [ffff88015834ba60] oops_end at ffffffff81006478
>>   #3 [ffff88015834ba90] no_context at ffffffff818c5262
>>   #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
>>   #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
>>   #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
>>   #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
>>   #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
>>      [exception RIP: __uart_start+0x1a]
>>      RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
>>      RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
>>      RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
>>      RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
>>      R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
>>      R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
>>      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>   #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
>
> Thanks for the report, but where's the rest of the stack trace?

Whoops, sorry about that.  Here is the full backtrace:

crash> bt
PID: 461061  TASK: ffff880203f8bc00  CPU: 2   COMMAND: "kworker/u8:2"
  #0 [ffff88015834b940] machine_kexec at ffffffff8103c1c5
  #1 [ffff88015834b990] crash_kexec at ffffffff810cd448
  #2 [ffff88015834ba60] oops_end at ffffffff81006478
  #3 [ffff88015834ba90] no_context at ffffffff818c5262
  #4 [ffff88015834baf0] __bad_area_nosemaphore at ffffffff818c545a
  #5 [ffff88015834bb40] bad_area_nosemaphore at ffffffff818c548c
  #6 [ffff88015834bb50] __do_page_fault at ffffffff81045ad5
  #7 [ffff88015834bbc0] do_page_fault at ffffffff81045efc
  #8 [ffff88015834bbd0] page_fault at ffffffff818d6b82
     [exception RIP: __uart_start+0x1a]
     RIP: ffffffff8152f30a  RSP: ffff88015834bc80  RFLAGS: 00010046
     RAX: 0000000000000000  RBX: ffffffff822e9920  RCX: 0000000000000036
     RDX: 0000000000003636  RSI: 00000000000000fe  RDI: ffffffff822e9920
     RBP: ffff88015834bca8   R8: 0000000000000000   R9: 00000000ffffffff
     R10: ffff8802546f0d20  R11: 0000000000000000  R12: ffff880254712400
     R13: 0000000000000286  R14: 00000000000000fe  R15: ffff880254712400
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88015834bc80] uart_start at ffffffff8152fbf2
#10 [ffff88015834bcb0] uart_flush_chars at ffffffff8152fc1e
#11 [ffff88015834bcc0] n_tty_receive_buf_common at ffffffff81516cf1
#12 [ffff88015834bd80] n_tty_receive_buf2 at ffffffff81517414
#13 [ffff88015834bd90] flush_to_ldisc at ffffffff8151ab6d
#14 [ffff88015834bdf0] process_one_work at ffffffff81069871
#15 [ffff88015834be40] worker_thread at ffffffff81069c53
#16 [ffff88015834bec0] kthread at ffffffff8106f429
#17 [ffff88015834bf50] ret_from_fork at ffffffff818d50c8

>
>> It was a NULL pointer dereference: state->port.tty was NULL, so when we go to
>> check tty->stopped in uart_tx_stopped() we panic.  Looking at the other CPUs, we
>> were in the middle of uart_open(), and the core actually had a valid pointer in
>> state->port.tty, which points to a race between either close or hangup (the only
>> two places that set state->port.tty to NULL) and open.  Close already flushes
>> the ldisc but hangup does not, which means we could have some characters in the
>> receive buffer in between the hangup and the open, and we end up in this
>> situation.
>
> Yeah, the race is that the ldisc should not be attempting i/o to
> the driver at all. This problem is fixed in -next already, but in the
> tty core rather than in each individual tty driver.
>
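
To make the sequence above concrete, here is a minimal userspace sketch of
the failure mode, not kernel code.  Every name in it (model_tty, model_port,
model_hangup, model_open, model_tx_stopped, model_flush_chars, model_replay)
is a hypothetical stand-in that I made up for illustration.  The NULL check
in model_tx_stopped() only shows where the dereference goes wrong; it is an
assumption of the sketch, and it is neither the flush proposed in the patch
nor the tty core fix that is in -next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the tty and the uart port state. */
struct model_tty {
	bool stopped;
	bool hw_stopped;
};

struct model_port {
	struct model_tty *tty;	/* set to NULL by hangup/close, set again by open */
};

/* Models hangup (or close): the port drops its tty pointer. */
static void model_hangup(struct model_port *port)
{
	port->tty = NULL;
}

/* Models open: the port gets a valid tty pointer again. */
static void model_open(struct model_port *port, struct model_tty *tty)
{
	assert(tty != NULL);
	port->tty = tty;
}

/*
 * Models the "is transmit stopped?" check.  Without the NULL test, reading
 * tty->stopped after a hangup is the dereference that faulted in the report.
 * Treating a missing tty as "stopped" is an assumption of this sketch.
 */
static bool model_tx_stopped(const struct model_port *port)
{
	const struct model_tty *tty = port->tty;

	if (tty == NULL)
		return true;
	return tty->stopped || tty->hw_stopped;
}

/* Models late ldisc work asking the driver to start transmitting. */
static bool model_flush_chars(const struct model_port *port)
{
	return !model_tx_stopped(port);
}

/* Replays the reported order of events: hangup, late flush, open, flush. */
static void model_replay(bool *started_after_hangup, bool *started_after_open)
{
	struct model_tty tty = { .stopped = false, .hw_stopped = false };
	struct model_port port = { .tty = &tty };

	model_hangup(&port);
	*started_after_hangup = model_flush_chars(&port);

	model_open(&port, &tty);
	*started_after_open = model_flush_chars(&port);
}
```

Calling model_replay() shows the window: the flush that lands between hangup
and open sees a NULL tty, while the same call after open sees a valid one.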

Great!  Which patch or patches fix this?  I looked at linux-next and
there's a lot of refactoring.  Do I need all of it, or is there a
specific patch that fixes this problem?  Thanks,

Josef

Thread overview: 5+ messages
2016-03-01 18:02 [PATCH] serial: flush ldisc after hangup Josef Bacik
2016-03-01 18:17 ` Peter Hurley
2016-03-01 18:21   ` Josef Bacik [this message]
2016-03-01 19:01     ` Peter Hurley
2016-03-01 19:04       ` Josef Bacik
