All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Daney <ddaney@caviumnetworks.com>
To: Peter Hurley <peter@hurleysoftware.com>,
	Dann Frazier <dann.frazier@canonical.com>
Cc: Ming Lei <ming.lei@canonical.com>,
	Scot Doyle <lkml14@scotdoyle.com>,
	David Airlie <airlied@linux.ie>,
	dri-devel@lists.freedesktop.org, "Chintakuntla,
	Radha" <Radha.Chintakuntla@caviumnetworks.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jslaby@suse.com>, Pavel Machek <pavel@ucw.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: ast: cursor flashing softlockups
Date: Tue, 17 May 2016 11:07:44 -0700	[thread overview]
Message-ID: <573B5DF0.3080808@caviumnetworks.com> (raw)
In-Reply-To: <573B5743.6000309@caviumnetworks.com>

On 05/17/2016 10:39 AM, David Daney wrote:
> I can confirm this.

I have a patch that I think is the proper fix.  You should see it soon 
(after I test it a bit more)

Thanks,
David Daney


[...]

>>>
>>> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
>>> <dann.frazier@canonical.com> wrote:
>>>> Hi,
>>>>   I'm observing a soft lockup issue w/ the ASPEED controller on an
>>>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>>>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>>>
>>>> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>>>> [swapper/38:0]
>>>>
>>>> I observe this just once each time I boot into debian-installer (I'm
>>>> using a serial console, but the ast module gets loaded during
>>>> startup).
>>>
>>> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
>>> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
>>> when the issue happened.
>>
>> Thanks for tracking this down.
>>
>> This softlockup looks to be caused by:
>>
>>     commit 27a4c827c34ac4256a190cc9d24607f953c1c459
>>     Author: Scot Doyle <lkml14@scotdoyle.com>
>>     Date:   Thu Mar 26 13:56:38 2015 +0000
>>
>>         fbcon: use the cursor blink interval provided by vt
>>
>>         vt now provides a cursor blink interval via vc_data. Use this
>>         interval instead of the currently hardcoded 200 msecs. Store
>> it in
>>         fbcon_ops to avoid locking the console in cursor_timer_handler().
>>
>>         Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
>>         Acked-by: Pavel Machek <pavel@ucw.cz>
>>         Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> and
>>
>>     commit bd63364caa8df38bad2b25b11b2a1b849475cce5
>>     Author: Scot Doyle <lkml14@scotdoyle.com>
>>     Date:   Thu Mar 26 13:54:39 2015 +0000
>>
>>         vt: add cursor blink interval escape sequence
>>
>>         Add an escape sequence to specify the current console's cursor
>> blink
>>         interval. The interval is specified as a number of
>> milliseconds until
>>         the next cursor display state toggle, from 50 to 65535.
>> /proc/loadavg
>>         did not show a difference with a one msec interval, but the lower
>>         bound is set to 50 msecs since slower hardware wasn't tested.
>>
>>         Store the interval in the vc_data structure for later access
>> by fbcon,
>>         initializing the value to fbcon's current hardcoded value of
>> 200 msecs.
>>
>>         Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
>>         Acked-by: Pavel Machek <pavel@ucw.cz>
>>         Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>>
>>
>>> Looks it is a real fbcon/vt issue, see following:
>>>
>>> fbcon_init()
>>>      <-.con_init
>>>            <-visual_init()
>>>
>>> reset_terminal()
>>>      <-vc_init()
>>>
>>> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
>>> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
>>> in fbcon_init().
>>>
>>> And visual_init() is always run before vc_init(), so
>>> ops->cur_blink_jiffies
>>> is initialized as zero and cause the soft lockup issue finally.
>>>
>>> Thanks,
>>> Ming
>>>
>>>>
>>>> perf shows that the CPU caught by the NMI is typically in code
>>>> updating the cursor timer:
>>>>
>>>> -   16.92%  swapper  [kernel.kallsyms]      [k]
>>>> _raw_spin_unlock_irqrestore
>>>>     - _raw_spin_unlock_irqrestore
>>>>        + 16.87% mod_timer
>>>>        + 0.05% cursor_timer_handler
>>>> -   12.15%  swapper  [kernel.kallsyms]      [k] queue_work_on
>>>>     - queue_work_on
>>>>        + 12.00% cursor_timer_handler
>>>>        + 0.15% call_timer_fn
>>>> +   10.98%  swapper  [kernel.kallsyms]      [k] run_timer_softirq
>>>> -    2.23%  swapper  [kernel.kallsyms]      [k] mod_timer
>>>>     - mod_timer
>>>>        + 1.97% cursor_timer_handler
>>>>        + 0.26% call_timer_fn
>>>>
>>>> During the same period, I can see that another CPU is actively
>>>> executing the timer function:
>>>>
>>>> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>>>     - ww_mutex_unlock
>>>>        - 40.70% ast_dirty_update
>>>>             ast_imageblit
>>>>             soft_cursor
>>>>             bit_cursor
>>>>             fb_flashcursor
>>>>             process_one_work
>>>>             worker_thread
>>>>             kthread
>>>>             ret_from_fork
>>>>        + 1.48% ast_imageblit
>>>> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>>>     - __memcpy_toio
>>>>        + 31.54% ast_dirty_update
>>>>        + 8.61% ast_imageblit
>>>>
>>>> Using the graph function tracer on fb_flashcursor(), I see that
>>>> ast_dirty_update usually takes around 60 us, in which it makes 16
>>>> calls to __memcpy_toio(). However, there is always one instance on
>>>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>>>> complete, during which it makes 743 calls to __memcpy_toio(). While
>>>> that  doesn't directly account for the full 22s, I do wonder if that
>>>> maybe a smoking gun.
>>>>
>>>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>>>
>>>>    -dann
>>
>

WARNING: multiple messages have this Message-ID (diff)
From: David Daney <ddaney@caviumnetworks.com>
To: Peter Hurley <peter@hurleysoftware.com>,
	Dann Frazier <dann.frazier@canonical.com>
Cc: Ming Lei <ming.lei@canonical.com>,
	Scot Doyle <lkml14@scotdoyle.com>,
	David Airlie <airlied@linux.ie>,
	<dri-devel@lists.freedesktop.org>,
	"Chintakuntla, Radha" <Radha.Chintakuntla@caviumnetworks.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jslaby@suse.com>, Pavel Machek <pavel@ucw.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: ast: cursor flashing softlockups
Date: Tue, 17 May 2016 11:07:44 -0700	[thread overview]
Message-ID: <573B5DF0.3080808@caviumnetworks.com> (raw)
In-Reply-To: <573B5743.6000309@caviumnetworks.com>

On 05/17/2016 10:39 AM, David Daney wrote:
> I can confirm this.

I have a patch that I think is the proper fix.  You should see it soon 
(after I test it a bit more)

Thanks,
David Daney


[...]

>>>
>>> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
>>> <dann.frazier@canonical.com> wrote:
>>>> Hi,
>>>>   I'm observing a soft lockup issue w/ the ASPEED controller on an
>>>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>>>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>>>
>>>> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>>>> [swapper/38:0]
>>>>
>>>> I observe this just once each time I boot into debian-installer (I'm
>>>> using a serial console, but the ast module gets loaded during
>>>> startup).
>>>
>>> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
>>> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
>>> when the issue happened.
>>
>> Thanks for tracking this down.
>>
>> This softlockup looks to be caused by:
>>
>>     commit 27a4c827c34ac4256a190cc9d24607f953c1c459
>>     Author: Scot Doyle <lkml14@scotdoyle.com>
>>     Date:   Thu Mar 26 13:56:38 2015 +0000
>>
>>         fbcon: use the cursor blink interval provided by vt
>>
>>         vt now provides a cursor blink interval via vc_data. Use this
>>         interval instead of the currently hardcoded 200 msecs. Store
>> it in
>>         fbcon_ops to avoid locking the console in cursor_timer_handler().
>>
>>         Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
>>         Acked-by: Pavel Machek <pavel@ucw.cz>
>>         Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>> and
>>
>>     commit bd63364caa8df38bad2b25b11b2a1b849475cce5
>>     Author: Scot Doyle <lkml14@scotdoyle.com>
>>     Date:   Thu Mar 26 13:54:39 2015 +0000
>>
>>         vt: add cursor blink interval escape sequence
>>
>>         Add an escape sequence to specify the current console's cursor
>> blink
>>         interval. The interval is specified as a number of
>> milliseconds until
>>         the next cursor display state toggle, from 50 to 65535.
>> /proc/loadavg
>>         did not show a difference with a one msec interval, but the lower
>>         bound is set to 50 msecs since slower hardware wasn't tested.
>>
>>         Store the interval in the vc_data structure for later access
>> by fbcon,
>>         initializing the value to fbcon's current hardcoded value of
>> 200 msecs.
>>
>>         Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
>>         Acked-by: Pavel Machek <pavel@ucw.cz>
>>         Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>
>>
>>
>>> Looks it is a real fbcon/vt issue, see following:
>>>
>>> fbcon_init()
>>>      <-.con_init
>>>            <-visual_init()
>>>
>>> reset_terminal()
>>>      <-vc_init()
>>>
>>> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
>>> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
>>> in fbcon_init().
>>>
>>> And visual_init() is always run before vc_init(), so
>>> ops->cur_blink_jiffies
>>> is initialized as zero and cause the soft lockup issue finally.
>>>
>>> Thanks,
>>> Ming
>>>
>>>>
>>>> perf shows that the CPU caught by the NMI is typically in code
>>>> updating the cursor timer:
>>>>
>>>> -   16.92%  swapper  [kernel.kallsyms]      [k]
>>>> _raw_spin_unlock_irqrestore
>>>>     - _raw_spin_unlock_irqrestore
>>>>        + 16.87% mod_timer
>>>>        + 0.05% cursor_timer_handler
>>>> -   12.15%  swapper  [kernel.kallsyms]      [k] queue_work_on
>>>>     - queue_work_on
>>>>        + 12.00% cursor_timer_handler
>>>>        + 0.15% call_timer_fn
>>>> +   10.98%  swapper  [kernel.kallsyms]      [k] run_timer_softirq
>>>> -    2.23%  swapper  [kernel.kallsyms]      [k] mod_timer
>>>>     - mod_timer
>>>>        + 1.97% cursor_timer_handler
>>>>        + 0.26% call_timer_fn
>>>>
>>>> During the same period, I can see that another CPU is actively
>>>> executing the timer function:
>>>>
>>>> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>>>     - ww_mutex_unlock
>>>>        - 40.70% ast_dirty_update
>>>>             ast_imageblit
>>>>             soft_cursor
>>>>             bit_cursor
>>>>             fb_flashcursor
>>>>             process_one_work
>>>>             worker_thread
>>>>             kthread
>>>>             ret_from_fork
>>>>        + 1.48% ast_imageblit
>>>> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>>>     - __memcpy_toio
>>>>        + 31.54% ast_dirty_update
>>>>        + 8.61% ast_imageblit
>>>>
>>>> Using the graph function tracer on fb_flashcursor(), I see that
>>>> ast_dirty_update usually takes around 60 us, in which it makes 16
>>>> calls to __memcpy_toio(). However, there is always one instance on
>>>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>>>> complete, during which it makes 743 calls to __memcpy_toio(). While
>>>> that  doesn't directly account for the full 22s, I do wonder if that
>>>> maybe a smoking gun.
>>>>
>>>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>>>
>>>>    -dann
>>
>

  reply	other threads:[~2016-05-17 18:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-16 20:07 ast: cursor flashing softlockups Dann Frazier
2016-05-17  9:12 ` Ming Lei
2016-05-17 14:29   ` Peter Hurley
2016-05-17 14:29     ` Peter Hurley
2016-05-17 17:39     ` David Daney
2016-05-17 17:39       ` David Daney
2016-05-17 18:07       ` David Daney [this message]
2016-05-17 18:07         ` David Daney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=573B5DF0.3080808@caviumnetworks.com \
    --to=ddaney@caviumnetworks.com \
    --cc=Radha.Chintakuntla@caviumnetworks.com \
    --cc=airlied@linux.ie \
    --cc=dann.frazier@canonical.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jslaby@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkml14@scotdoyle.com \
    --cc=ming.lei@canonical.com \
    --cc=pavel@ucw.cz \
    --cc=peter@hurleysoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.