From: Peter Hurley <peter@hurleysoftware.com>
To: Ming Lei <ming.lei@canonical.com>,
Dann Frazier <dann.frazier@canonical.com>,
Scot Doyle <lkml14@scotdoyle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
dri-devel@lists.freedesktop.org, "Chintakuntla,
Radha" <Radha.Chintakuntla@caviumnetworks.com>,
Pavel Machek <pavel@ucw.cz>, Jiri Slaby <jslaby@suse.com>
Subject: Re: ast: cursor flashing softlockups
Date: Tue, 17 May 2016 07:29:04 -0700 [thread overview]
Message-ID: <573B2AB0.7070004@hurleysoftware.com> (raw)
In-Reply-To: <CACVXFVPXSxjaXPy-+3anwyYR39s3Z_9_K2BeO4uC8QK8jC7Sdw@mail.gmail.com>
[ +to Scot Doyle ]
Scot, please take a look at this soft lockup.
Regards,
Peter Hurley
Hi Ming,
On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
>
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
> <dann.frazier@canonical.com> wrote:
>> Hi,
>> I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [ 32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>> [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
>
> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
> when the issue happened.
Thanks for tracking this down.
This softlockup looks to be caused by:
commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle <lkml14@scotdoyle.com>
Date: Thu Mar 26 13:56:38 2015 +0000
fbcon: use the cursor blink interval provided by vt
vt now provides a cursor blink interval via vc_data. Use this
interval instead of the currently hardcoded 200 msecs. Store it in
fbcon_ops to avoid locking the console in cursor_timer_handler().
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
and
commit bd63364caa8df38bad2b25b11b2a1b849475cce5
Author: Scot Doyle <lkml14@scotdoyle.com>
Date: Thu Mar 26 13:54:39 2015 +0000
vt: add cursor blink interval escape sequence
Add an escape sequence to specify the current console's cursor blink
interval. The interval is specified as a number of milliseconds until
the next cursor display state toggle, from 50 to 65535. /proc/loadavg
did not show a difference with a one msec interval, but the lower
bound is set to 50 msecs since slower hardware wasn't tested.
Store the interval in the vc_data structure for later access by fbcon,
initializing the value to fbcon's current hardcoded value of 200 msecs.
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Looks it is a real fbcon/vt issue, see following:
>
> fbcon_init()
> <-.con_init
> <-visual_init()
>
> reset_terminal()
> <-vc_init()
>
> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
> in fbcon_init().
>
> And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
> is initialized as zero and cause the soft lockup issue finally.
>
> Thanks,
> Ming
>
>>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> - 16.92% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> - _raw_spin_unlock_irqrestore
>> + 16.87% mod_timer
>> + 0.05% cursor_timer_handler
>> - 12.15% swapper [kernel.kallsyms] [k] queue_work_on
>> - queue_work_on
>> + 12.00% cursor_timer_handler
>> + 0.15% call_timer_fn
>> + 10.98% swapper [kernel.kallsyms] [k] run_timer_softirq
>> - 2.23% swapper [kernel.kallsyms] [k] mod_timer
>> - mod_timer
>> + 1.97% cursor_timer_handler
>> + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> - 42.18% kworker/u96:2 [kernel.kallsyms] [k] ww_mutex_unlock
>> - ww_mutex_unlock
>> - 40.70% ast_dirty_update
>> ast_imageblit
>> soft_cursor
>> bit_cursor
>> fb_flashcursor
>> process_one_work
>> worker_thread
>> kthread
>> ret_from_fork
>> + 1.48% ast_imageblit
>> - 40.15% kworker/u96:2 [kernel.kallsyms] [k] __memcpy_toio
>> - __memcpy_toio
>> + 31.54% ast_dirty_update
>> + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that doesn't directly account for the full 22s, I do wonder if that
>> maybe a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>> -dann
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
WARNING: multiple messages have this Message-ID (diff)
From: Peter Hurley <peter@hurleysoftware.com>
To: Ming Lei <ming.lei@canonical.com>,
Dann Frazier <dann.frazier@canonical.com>,
Scot Doyle <lkml14@scotdoyle.com>
Cc: David Airlie <airlied@linux.ie>,
dri-devel@lists.freedesktop.org, "Chintakuntla,
Radha" <Radha.Chintakuntla@caviumnetworks.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jiri Slaby <jslaby@suse.com>, Pavel Machek <pavel@ucw.cz>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: ast: cursor flashing softlockups
Date: Tue, 17 May 2016 07:29:04 -0700 [thread overview]
Message-ID: <573B2AB0.7070004@hurleysoftware.com> (raw)
In-Reply-To: <CACVXFVPXSxjaXPy-+3anwyYR39s3Z_9_K2BeO4uC8QK8jC7Sdw@mail.gmail.com>
[ +to Scot Doyle ]
Scot, please take a look at this soft lockup.
Regards,
Peter Hurley
Hi Ming,
On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
>
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
> <dann.frazier@canonical.com> wrote:
>> Hi,
>> I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [ 32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>> [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
>
> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and
> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler()
> when the issue happened.
Thanks for tracking this down.
This softlockup looks to be caused by:
commit 27a4c827c34ac4256a190cc9d24607f953c1c459
Author: Scot Doyle <lkml14@scotdoyle.com>
Date: Thu Mar 26 13:56:38 2015 +0000
fbcon: use the cursor blink interval provided by vt
vt now provides a cursor blink interval via vc_data. Use this
interval instead of the currently hardcoded 200 msecs. Store it in
fbcon_ops to avoid locking the console in cursor_timer_handler().
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
and
commit bd63364caa8df38bad2b25b11b2a1b849475cce5
Author: Scot Doyle <lkml14@scotdoyle.com>
Date: Thu Mar 26 13:54:39 2015 +0000
vt: add cursor blink interval escape sequence
Add an escape sequence to specify the current console's cursor blink
interval. The interval is specified as a number of milliseconds until
the next cursor display state toggle, from 50 to 65535. /proc/loadavg
did not show a difference with a one msec interval, but the lower
bound is set to 50 msecs since slower hardware wasn't tested.
Store the interval in the vc_data structure for later access by fbcon,
initializing the value to fbcon's current hardcoded value of 200 msecs.
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Looks it is a real fbcon/vt issue, see following:
>
> fbcon_init()
> <-.con_init
> <-visual_init()
>
> reset_terminal()
> <-vc_init()
>
> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path,
> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms
> in fbcon_init().
>
> And visual_init() is always run before vc_init(), so ops->cur_blink_jiffies
> is initialized as zero and cause the soft lockup issue finally.
>
> Thanks,
> Ming
>
>>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> - 16.92% swapper [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>> - _raw_spin_unlock_irqrestore
>> + 16.87% mod_timer
>> + 0.05% cursor_timer_handler
>> - 12.15% swapper [kernel.kallsyms] [k] queue_work_on
>> - queue_work_on
>> + 12.00% cursor_timer_handler
>> + 0.15% call_timer_fn
>> + 10.98% swapper [kernel.kallsyms] [k] run_timer_softirq
>> - 2.23% swapper [kernel.kallsyms] [k] mod_timer
>> - mod_timer
>> + 1.97% cursor_timer_handler
>> + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> - 42.18% kworker/u96:2 [kernel.kallsyms] [k] ww_mutex_unlock
>> - ww_mutex_unlock
>> - 40.70% ast_dirty_update
>> ast_imageblit
>> soft_cursor
>> bit_cursor
>> fb_flashcursor
>> process_one_work
>> worker_thread
>> kthread
>> ret_from_fork
>> + 1.48% ast_imageblit
>> - 40.15% kworker/u96:2 [kernel.kallsyms] [k] __memcpy_toio
>> - __memcpy_toio
>> + 31.54% ast_dirty_update
>> + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that doesn't directly account for the full 22s, I do wonder if that
>> maybe a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>> -dann
next prev parent reply other threads:[~2016-05-17 14:29 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-16 20:07 ast: cursor flashing softlockups Dann Frazier
2016-05-17 9:12 ` Ming Lei
2016-05-17 14:29 ` Peter Hurley [this message]
2016-05-17 14:29 ` Peter Hurley
2016-05-17 17:39 ` David Daney
2016-05-17 17:39 ` David Daney
2016-05-17 18:07 ` David Daney
2016-05-17 18:07 ` David Daney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=573B2AB0.7070004@hurleysoftware.com \
--to=peter@hurleysoftware.com \
--cc=Radha.Chintakuntla@caviumnetworks.com \
--cc=dann.frazier@canonical.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=gregkh@linuxfoundation.org \
--cc=jslaby@suse.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lkml14@scotdoyle.com \
--cc=ming.lei@canonical.com \
--cc=pavel@ucw.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.