From: Peter Hurley <peter@hurleysoftware.com>
To: Ming Lei <ming.lei@canonical.com>,
	Dann Frazier <dann.frazier@canonical.com>,
	Scot Doyle <lkml14@scotdoyle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org, "Chintakuntla,
	Radha" <Radha.Chintakuntla@caviumnetworks.com>,
	Pavel Machek <pavel@ucw.cz>, Jiri Slaby <jslaby@suse.com>
Subject: Re: ast: cursor flashing softlockups
Date: Tue, 17 May 2016 07:29:04 -0700
Message-ID: <573B2AB0.7070004@hurleysoftware.com>
In-Reply-To: <CACVXFVPXSxjaXPy-+3anwyYR39s3Z_9_K2BeO4uC8QK8jC7Sdw@mail.gmail.com>

[ +to Scot Doyle ]

Scot, please take a look at this soft lockup.

Regards,
Peter Hurley


Hi Ming,

On 05/17/2016 02:12 AM, Ming Lei wrote:
> Hi,
> 
> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier
> <dann.frazier@canonical.com> wrote:
>> Hi,
>>  I'm observing a soft lockup issue w/ the ASPEED controller on an
>> arm64 server platform. This was originally seen on Ubuntu's 4.4
>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well.
>>
>> [   32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s!
>> [swapper/38:0]
>>
>> I observe this just once each time I boot into debian-installer (I'm
>> using a serial console, but the ast module gets loaded during
>> startup).
> 
> I have figured out that it is caused by 'mod_timer(timer, jiffies)':
> 'ops->cur_blink_jiffies' is observed to be zero in cursor_timer_handler()
> when the issue happens.

Thanks for tracking this down.
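
To illustrate why a zero interval is a problem, here is a minimal userspace
sketch (not kernel code) of a timer whose handler re-arms itself at
"now + interval". The function name sim_count_runs() and the rule of at most
one handler run per tick are assumptions of this model; the real timer core
may behave worse than this.

```c
#include <stdint.h>

/*
 * Hypothetical model: count how often a self-rearming timer handler runs
 * over `ticks` jiffies. Assumption: the handler runs at most once per tick,
 * which is a conservative simplification of the real timer core.
 */
uint64_t sim_count_runs(uint64_t interval, uint64_t ticks)
{
	uint64_t expires = 0;	/* jiffy at which the timer is due */
	uint64_t runs = 0;

	for (uint64_t now = 0; now < ticks; now++) {
		if (expires <= now) {
			/* Models mod_timer(timer, jiffies + interval). */
			expires = now + interval;
			runs++;
		}
	}
	return runs;
}
```

With an interval of 50 jiffies the handler runs 20 times in 1000 ticks; with
an interval of 0 the timer is never in the future, so it runs on every tick.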

This softlockup looks to be caused by:

	commit 27a4c827c34ac4256a190cc9d24607f953c1c459
	Author: Scot Doyle <lkml14@scotdoyle.com>
	Date:   Thu Mar 26 13:56:38 2015 +0000

	    fbcon: use the cursor blink interval provided by vt
    
	    vt now provides a cursor blink interval via vc_data. Use this
	    interval instead of the currently hardcoded 200 msecs. Store it in
	    fbcon_ops to avoid locking the console in cursor_timer_handler().
    
	    Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
	    Acked-by: Pavel Machek <pavel@ucw.cz>
	    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

and

	commit bd63364caa8df38bad2b25b11b2a1b849475cce5
	Author: Scot Doyle <lkml14@scotdoyle.com>
	Date:   Thu Mar 26 13:54:39 2015 +0000

	    vt: add cursor blink interval escape sequence
    
	    Add an escape sequence to specify the current console's cursor blink
	    interval. The interval is specified as a number of milliseconds until
	    the next cursor display state toggle, from 50 to 65535. /proc/loadavg
	    did not show a difference with a one msec interval, but the lower
	    bound is set to 50 msecs since slower hardware wasn't tested.
    
	    Store the interval in the vc_data structure for later access by fbcon,
	    initializing the value to fbcon's current hardcoded value of 200 msecs.
    
	    Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
	    Acked-by: Pavel Machek <pavel@ucw.cz>
	    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
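
As a rough sketch of what these two commit messages describe, the interval
could be validated and converted as below. Everything here is illustrative:
the helper names, the fallback to the default for out-of-range values, the
round-up conversion, and HZ=250 are assumptions, not the code from either
commit.

```c
/* Assumed tick rate for this sketch only; real kernels vary. */
#define SKETCH_HZ		250u
/* Values taken from the commit messages quoted above. */
#define BLINK_MS_DEFAULT	200u
#define BLINK_MS_MIN		50u
#define BLINK_MS_MAX		65535u

/*
 * Hypothetical helper: accept an escape-sequence parameter only if it is
 * within the documented 50..65535 msec range, else fall back to the default.
 */
unsigned int blink_ms_from_param(unsigned int par)
{
	if (par >= BLINK_MS_MIN && par <= BLINK_MS_MAX)
		return par;
	return BLINK_MS_DEFAULT;
}

/*
 * Hypothetical helper: convert msecs to ticks, rounding up.
 * Note that 0 msecs converts to 0 ticks.
 */
unsigned long blink_ms_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * SKETCH_HZ + 999u) / 1000u;
}
```

The point of interest is the last case: a vc_cur_blink_ms that was never
initialized (zero) converts to zero jiffies, which matches the zero
ops->cur_blink_jiffies observed above.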



> It looks like a real fbcon/vt issue; see the following:
> 
> fbcon_init()
>     <-.con_init
>           <-visual_init()
> 
> reset_terminal()
>     <-vc_init()
> 
> vc->vc_cur_blink_ms is set only in reset_terminal(), on the vc_init() path,
> and ops->cur_blink_jiffies is computed from vc->vc_cur_blink_ms
> in fbcon_init().
> 
> visual_init() always runs before vc_init(), so ops->cur_blink_jiffies
> is initialized to zero, which eventually causes the soft lockup.
> 
> Thanks,
> Ming
> 
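
The ordering Ming describes can be modelled in a few lines of userspace C.
This is only a sketch: the model_* structs and functions are stand-ins for
vc_data, fbcon_ops, fbcon_init() and reset_terminal(), and HZ=250 is assumed.

```c
#define MODEL_HZ		250u	/* assumed tick rate */
#define MODEL_DEFAULT_BLINK_MS	200u	/* default from the vt commit */

struct model_vc  { unsigned int cur_blink_ms; };	/* stand-in for vc_data */
struct model_ops { unsigned long cur_blink_jiffies; };	/* stand-in for fbcon_ops */

static unsigned long model_ms_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * MODEL_HZ + 999u) / 1000u;
}

/* Stand-in for fbcon_init(), reached via visual_init(). */
static void model_fbcon_init(struct model_ops *ops, const struct model_vc *vc)
{
	ops->cur_blink_jiffies = model_ms_to_jiffies(vc->cur_blink_ms);
}

/* Stand-in for reset_terminal(), reached via vc_init(). */
static void model_reset_terminal(struct model_vc *vc)
{
	vc->cur_blink_ms = MODEL_DEFAULT_BLINK_MS;
}

/* Order reported above: the interval is cached before it is set. */
unsigned long model_order_as_reported(void)
{
	struct model_vc vc = { 0 };
	struct model_ops ops = { 0 };

	model_fbcon_init(&ops, &vc);
	model_reset_terminal(&vc);
	return ops.cur_blink_jiffies;
}

/* Hypothetical opposite order, for comparison only. */
unsigned long model_order_swapped(void)
{
	struct model_vc vc = { 0 };
	struct model_ops ops = { 0 };

	model_reset_terminal(&vc);
	model_fbcon_init(&ops, &vc);
	return ops.cur_blink_jiffies;
}
```

In the reported order the cached value stays at zero even though
vc_cur_blink_ms is later set to 200; in the swapped order it would be 50
ticks under the assumed HZ.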
>>
>> perf shows that the CPU caught by the NMI is typically in code
>> updating the cursor timer:
>>
>> -   16.92%  swapper  [kernel.kallsyms]      [k] _raw_spin_unlock_irqrestore
>>    - _raw_spin_unlock_irqrestore
>>       + 16.87% mod_timer
>>       + 0.05% cursor_timer_handler
>> -   12.15%  swapper  [kernel.kallsyms]      [k] queue_work_on
>>    - queue_work_on
>>       + 12.00% cursor_timer_handler
>>       + 0.15% call_timer_fn
>> +   10.98%  swapper  [kernel.kallsyms]      [k] run_timer_softirq
>> -    2.23%  swapper  [kernel.kallsyms]      [k] mod_timer
>>    - mod_timer
>>       + 1.97% cursor_timer_handler
>>       + 0.26% call_timer_fn
>>
>> During the same period, I can see that another CPU is actively
>> executing the timer function:
>>
>> -   42.18%  kworker/u96:2  [kernel.kallsyms]  [k] ww_mutex_unlock
>>    - ww_mutex_unlock
>>       - 40.70% ast_dirty_update
>>            ast_imageblit
>>            soft_cursor
>>            bit_cursor
>>            fb_flashcursor
>>            process_one_work
>>            worker_thread
>>            kthread
>>            ret_from_fork
>>       + 1.48% ast_imageblit
>> -   40.15%  kworker/u96:2  [kernel.kallsyms]  [k] __memcpy_toio
>>    - __memcpy_toio
>>       + 31.54% ast_dirty_update
>>       + 8.61% ast_imageblit
>>
>> Using the graph function tracer on fb_flashcursor(), I see that
>> ast_dirty_update usually takes around 60 us, in which it makes 16
>> calls to __memcpy_toio(). However, there is always one instance on
>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to
>> complete, during which it makes 743 calls to __memcpy_toio(). While
>> that doesn't directly account for the full 22s, I do wonder if that
>> may be a smoking gun.
>>
>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814
>>
>>   -dann
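
For what it's worth, dividing Dann's approximate figures gives the average
cost per __memcpy_toio() call in each case. The helper below is just
arithmetic on the numbers quoted above ("around 60 us" and "~98 ms"); the
function name is made up for this sketch.

```c
/*
 * Hypothetical helper: average time per call in microseconds.
 * Returns 0 when no calls were made, to avoid dividing by zero.
 */
double per_call_us(double total_us, unsigned int calls)
{
	if (calls == 0)
		return 0.0;
	return total_us / calls;
}

/*
 * Typical case: per_call_us(60.0, 16)     is 3.75 us per call.
 * Slow case:    per_call_us(98000.0, 743) is roughly 132 us per call.
 */
```

So the slow instance is not only making more calls; each call is also
roughly 35 times slower on average, if the quoted figures are representative.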

Thread overview: 8+ messages
2016-05-16 20:07 ast: cursor flashing softlockups Dann Frazier
2016-05-17  9:12 ` Ming Lei
2016-05-17 14:29   ` Peter Hurley [this message]
2016-05-17 14:29     ` Peter Hurley
2016-05-17 17:39     ` David Daney
2016-05-17 17:39       ` David Daney
2016-05-17 18:07       ` David Daney
2016-05-17 18:07         ` David Daney
