From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752263AbcEQSHx (ORCPT ); Tue, 17 May 2016 14:07:53 -0400 Received: from mail-bl2on0067.outbound.protection.outlook.com ([65.55.169.67]:21184 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750761AbcEQSHv (ORCPT ); Tue, 17 May 2016 14:07:51 -0400 Authentication-Results: hurleysoftware.com; dkim=none (message not signed) header.d=none;hurleysoftware.com; dmarc=none action=none header.from=caviumnetworks.com; Message-ID: <573B5DF0.3080808@caviumnetworks.com> Date: Tue, 17 May 2016 11:07:44 -0700 From: David Daney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7 MIME-Version: 1.0 To: Peter Hurley , Dann Frazier CC: Ming Lei , Scot Doyle , David Airlie , , "Chintakuntla, Radha" , Greg Kroah-Hartman , Jiri Slaby , Pavel Machek , Linux Kernel Mailing List Subject: Re: ast: cursor flashing softlockups References: <573B2AB0.7070004@hurleysoftware.com> <573B5743.6000309@caviumnetworks.com> In-Reply-To: <573B5743.6000309@caviumnetworks.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [50.233.148.158] X-ClientProxiedBy: SN1PR0701CA0028.namprd07.prod.outlook.com (10.162.96.38) To BN4PR07MB2132.namprd07.prod.outlook.com (10.164.63.14) X-MS-Office365-Filtering-Correlation-Id: 66f56290-53c0-453d-ed39-08d37e7e27d2 X-Microsoft-Exchange-Diagnostics: 1;BN4PR07MB2132;2:RgMbnI8+XjbHXrxvTyHpLV/XmX+TXQXYFUZgSylzsG7rcHv4DxpXtxve6f796BbKy1DXJmHrbAKNtwdcNBkuYaBh1ifQA+PQVlbpzhMybZObjEHm9qkuYRxeUqLCglYUA+D/kFFkIdZDBmBXH0HGM4eFoQXQO70eYsatY5yu/R+vLsHK0yb7X3Jar/LMfp66;3:8OLMxUeIobIl3dvxR/O1t6u412D/Ncs8Vl/2rvGvwOxxwoOD+/hN/kz73NcXkCMxO9fjlk4UUUQ09zKbJRrbzQSRPmojosMD7L0DKAaKSp+VaTKEHB8uOX8oyKyfz1nm;25:jqoAVHbiNyUybOcyBgSfBNYcalrDz7rC925yEAxgnI6B2+pis0Tv9gYdPHZ2ckKqh5pzRIgCOVpeSRqbNFKUghj41pptm+UGe3FxN+sqNBJAzqYpHRC8tdOKPcVaAOOePD4nC38b1aCjSK0blm5SwPJo9ChiQuDwp/eH7zs5BZ5eIl8W4N+syyg70O2gPj2iAthi6AxqIbS86mVR7ZLZ/7DlsKrZQ+2E+cA/1UKzpXvjZEXldvPVl7MApUkd/vKbj491Vnu+6jj2AvaWU+eClM5M9qKKIgzwRwXO47ZgJ9McdzFFCL/dthReHIhydOE2onwh0GcD5lFahFcmXwwVoJzJJOjvUxKEirKjFH93svyjo3LnL8vBB7N9ot6Rt2Yo6xwgkXqIR4qblw+R4VgT/fZ46rlxhnew1ar6OBCFd9o= X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BN4PR07MB2132; X-Microsoft-Exchange-Diagnostics: 1;BN4PR07MB2132;20:yJTIng91IzgUBKegTkILJmHobspl8ohlJMk6LMsBImjg+qdhuFdLhGvkNBYILopF497eSGuIw8rRQytiEYuQyjs0k8+Y4p2fmoMzWKluZEX60x63QIHt9Ph/YV/hZq9q5QOP3KF81w+fsLxmM1+OhUbmQcpeVpJ0oCml0RuW0BbTbTupXurTBTPLLhPR7NClyfPwkxa+pldYEvrXlpM6JYZenFbHb6c+j6A12MgeFx69tOPHN3JCaBfv51/qrl0ckwcnDnnm4W5kHtFqTK7jMMrRkT4JwaRhGZaa55vpyn68gFuxzxFQbebT1FvrKqG2dICJt1mHB3XiG3VBrEbKIqKeHrzrBfoSkP0GpKmyDkYy1yw0xfnUxJg9L7VlP5daA/hhiSX/NUXPCcRhNyClSjUsg+P5eHIKMaaphOLcH0aJnoUKBusu8ssGQtKbDQbYp87fXb7X1bhnbG23DiTFd8pEw5PfUzLVApKQuaQ3gW0gGBlysR24fmgtBUdnBrOT1QN4O4UCvbUHXhtXvd73lqX18MB5ydKEuM90tuve0o+4XHiZudj0TLM2QUEsh4usqcWqQcqOuateyWIiOS2HWZ2LDgNLhsJu+sPr4kjc0eI= X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004)(2401047)(8121501046)(5005006)(10201501046)(3002001);SRVR:BN4PR07MB2132;BCL:0;PCL:0;RULEID:;SRVR:BN4PR07MB2132; X-Microsoft-Exchange-Diagnostics: 1;BN4PR07MB2132;4:s5jG2fz0SVo/GKffYUbjMJWly0+vm6T2a97cz3iCmn7EA5bR06O7+x6nY9BtacBA0mAMYGToVX62dq8Bdnd9vItgtGfJDInr9rt/nOrK42NtTBiRLBzm/3XExGJTTIscVW5ml3T8DAMA53fmJ+NK2yFkpEQ4B3idGwh6xDB83SG5mZkp9iMEaGTRHBsJ0wBDwnmNgC8FXzUUjvfg5n/1axUf8/8S4U5Ieas7f2I+n01TXnDjrwz33RKlbPvzyOEFuLIbPiTpm1+acj/TST2FAcisQP0/6AV+wKYp4MH7aNDShOoccW9cy/189AplmpLTaRLFjclAYZKpCa7EGNnGoOGqjOP2eR5Kq68nj7k8hskDxNGkGkcpdRfKDmoISUI5 X-Forefront-PRVS: 0945B0CC72 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(4630300001)(6009001)(24454002)(377454003)(5001770100001)(4001350100001)(33656002)(15975445007)(2906002)(6116002)(3846002)(586003)(77096005)(5008740100001)(189998001)(50986999)(5004730100002)(4326007)(81166006)(8676002)(64126003)(66066001)(65956001)(92566002)(2950100001)(23676002)(47776003)(19580405001)(50466002)(230700001)(59896002)(83506001)(19580395003)(54356999)(65816999)(87266999)(36756003)(76176999)(53416004)(93886004)(42186005);DIR:OUT;SFP:1101;SCL:1;SRVR:BN4PR07MB2132;H:dl.caveonetworks.com;FPR:;SPF:None;MLV:sfv;LANG:en; X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtCTjRQUjA3TUIyMTMyOzIzOllFS3M0ZFNSdmlKUGlSdGJHRFptbThyeGZL?= =?utf-8?B?SXAzK3lqb1ZWaW9DRytHWFNiWHFISHBoZXVONzdwY2Fzcm83S0hrT0NDUE1S?= =?utf-8?B?MFVHZHFGb1o1a2pxR3prWVdlV1ZHWEFrYytuYWlLK3VxVWxxcXZvNTFISmpT?= =?utf-8?B?c2EwQnBHTmhoeS9HS2ZwK2JPOVhPYzVkMnZkOHBWdnd3SzZNTjcrRjJqNEhE?= =?utf-8?B?QnRIUVlsSnRSY3lDV0lCdmw3c2dCbWduRTJEdHVma3JzSXRQMjRoTHp6K01o?= =?utf-8?B?TTZOTjJtY1JZZWxRamcrNkN3Nmc3VVdPQ003eElDYkRmbjlodnZpei9mQlJY?= =?utf-8?B?KzI2SEduT1NMQVR2ZmIrdmRJZFZEMnNYUDh1Uk1vdmN3YU9GZ2VEbW5hTlpj?= =?utf-8?B?QmNlMU03ZmF2d0dTNkFtQmdNRlAvNjNra2wzZ21sdzNEbmo0OHhRUUtIemto?= =?utf-8?B?VTVYZTAyNHdMZFIvSHZCZ0xBZGJrem5QRVJacjhqNWpRUnZqVVc2b1M2cm04?= =?utf-8?B?VXkyN0FiODZHZGY3akwyK281Q3d6eVY3TUx6U2xzTEwwdnh2TGpzUWFub0Zl?= =?utf-8?B?LzdWaE9Bd2VRZDNiRlRjOUxtYTViY3FEQWJsbmhnaWhZUmR5VXZWQ3JlanpO?= =?utf-8?B?Z1lWKy96Vk1udVZZK25MelRoM2FVdW43MENsWVlQQXJyL0hlc1krRVYwRWFa?= =?utf-8?B?MUhtZ0FkTEVpbm1aRmNlN0FPWU0xSmo4M24xZmpSdEhFMTB3ZDFJVHVwa04r?= =?utf-8?B?RWdDNERSMGppU3BaSm9aTTFPWmhWS2NnWnRDZSszZzZDQkNIVkRwMmlGYkFv?= =?utf-8?B?a1lZU2o5dTJBdGRoTnFWSVpiYk0wODloMVRWMkRxcEZWWUhDaGtzaVlxS0xJ?= =?utf-8?B?WmFKckpqTWpjcEhtNUczQVNFT0MzU3ltVVdtU0k1UlhTNkRnVmZ6QjF4bmpV?= =?utf-8?B?Z3BpVEJXbG82Wmg0dHVQZ0hZVGFUcXh5cWtFZEszM0VPWUR5RXlRNVpDWDNZ?= =?utf-8?B?VkpSTEp0aVJ5R0lZQTl4cWQ0WmdpOWpYWTB6VTEwSS92RmNiaEVqZ3ZDS2Qx?= =?utf-8?B?cTR5emRBYlRHMGtzUlJMb09RdU0wRHFJeVZ1citWaGljT2t2dUF5aDNVQ3o2?= =?utf-8?B?dTM3cmRqc0JaQ2xWMWlZOW1uc2poMllRaHdlWDNlWEZHaGZFQ29pQlk0dFZW?= =?utf-8?B?clR0N0c1WEk1NFQ4Q20xUVE2Sy81NlRXbTQvOGQ1OG5WK25sUFhocXpieUFl?= =?utf-8?B?WUpIODRJSmdtNzZ5K3NCUXQ4T0FzR25xdTJMcWhiV1dhOVNiL24vSmVHbXNn?= =?utf-8?B?azJ2ZXU4Z0tYZzBBaG5hQzBNRVlqc1BQVERycjVmRk0vdEN3Y09lZ3lBQjNP?= =?utf-8?B?RVYreEdSUW0zcmRXY3c4YmRMSVpIckJ1U0Z3YjZzTC9EaXAyOXZ0TjlYdTVM?= =?utf-8?B?TUVMKy9Zbkh1Sm9TNFAycFdPbkx1M0xhTTN3YlcxdUYxN0o3SFBkaXR3WmxH?= =?utf-8?B?UjFLUT09?= X-Microsoft-Exchange-Diagnostics: 1;BN4PR07MB2132;5:z0ZsAwNVMcuS5sExHtsXRUTBXKGWpLTlKfzY5D817F4+CDlVVQoVhlHP+zBGPJ9CHbhYlWXID/cwwX8SAkJHX/H5rCkbG0aRKBe8I7ZpnoT1yDZ+CsVlbfo/KxKbo6+ZiBmqxvjKraTS+/R85Ov/Vg==;24:zpRy5a7U4XRTZHxZUSyWSGBgu//GexuhjpR/YXsBuicRtkPSGTrn+Mt5tj4cPd+wNXckDDKf1L1DmOeq987W3eMFeK7demfmJFmWWPvZE5o=;7:AuAXpEuczyr6gatcqI5ugIk17wD1qhaoaHtCWYQjHnB4fo/svgkt72Q/xRnuVeeKsgZTIKr5M0uX4MveJxvQrNLTNJDkNV0GTMywin2XMuQKsB6T+qmSSTU6qmFY8387zdUiaI7J+6hbvXI3qvTUpwlT2pq4j5nxLuwxybXdKedqjfxi8aIy8/TqcTAK6SjA SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: caviumnetworks.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 17 May 2016 18:07:47.8329 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: BN4PR07MB2132 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/17/2016 10:39 AM, David Daney wrote: > I can confirm this. I have a patch that I think is the proper fix. You should see it soon (after I test it a bit more) Thanks, David Daney [...] >>> >>> On Tue, May 17, 2016 at 4:07 AM, Dann Frazier >>> wrote: >>>> Hi, >>>> I'm observing a soft lockup issue w/ the ASPEED controller on an >>>> arm64 server platform. This was originally seen on Ubuntu's 4.4 >>>> kernel, but it is reproducible w/ vanilla 4.6-rc7 as well. >>>> >>>> [ 32.792656] NMI watchdog: BUG: soft lockup - CPU#38 stuck for 22s! >>>> [swapper/38:0] >>>> >>>> I observe this just once each time I boot into debian-installer (I'm >>>> using a serial console, but the ast module gets loaded during >>>> startup). >>> >>> I have figured out that it is caused by 'mod_timer(timer, jiffies)' and >>> 'ops->cur_blink_jiffies' is observed as zero in cursor_timer_handler() >>> when the issue happened. >> >> Thanks for tracking this down. >> >> This softlockup looks to be caused by: >> >> commit 27a4c827c34ac4256a190cc9d24607f953c1c459 >> Author: Scot Doyle >> Date: Thu Mar 26 13:56:38 2015 +0000 >> >> fbcon: use the cursor blink interval provided by vt >> >> vt now provides a cursor blink interval via vc_data. Use this >> interval instead of the currently hardcoded 200 msecs. Store >> it in >> fbcon_ops to avoid locking the console in cursor_timer_handler(). >> >> Signed-off-by: Scot Doyle >> Acked-by: Pavel Machek >> Signed-off-by: Greg Kroah-Hartman >> >> and >> >> commit bd63364caa8df38bad2b25b11b2a1b849475cce5 >> Author: Scot Doyle >> Date: Thu Mar 26 13:54:39 2015 +0000 >> >> vt: add cursor blink interval escape sequence >> >> Add an escape sequence to specify the current console's cursor >> blink >> interval. The interval is specified as a number of >> milliseconds until >> the next cursor display state toggle, from 50 to 65535. >> /proc/loadavg >> did not show a difference with a one msec interval, but the lower >> bound is set to 50 msecs since slower hardware wasn't tested. >> >> Store the interval in the vc_data structure for later access >> by fbcon, >> initializing the value to fbcon's current hardcoded value of >> 200 msecs. >> >> Signed-off-by: Scot Doyle >> Acked-by: Pavel Machek >> Signed-off-by: Greg Kroah-Hartman >> >> >> >>> Looks it is a real fbcon/vt issue, see following: >>> >>> fbcon_init() >>> <-.con_init >>> <-visual_init() >>> >>> reset_terminal() >>> <-vc_init() >>> >>> vc->vc_cur_blink_ms is just set in reset_terminal() from vc_init() path, >>> and ops->cur_blink_jiffies is figured out from vc->vc_cur_blink_ms >>> in fbcon_init(). >>> >>> And visual_init() is always run before vc_init(), so >>> ops->cur_blink_jiffies >>> is initialized as zero and cause the soft lockup issue finally. >>> >>> Thanks, >>> Ming >>> >>>> >>>> perf shows that the CPU caught by the NMI is typically in code >>>> updating the cursor timer: >>>> >>>> - 16.92% swapper [kernel.kallsyms] [k] >>>> _raw_spin_unlock_irqrestore >>>> - _raw_spin_unlock_irqrestore >>>> + 16.87% mod_timer >>>> + 0.05% cursor_timer_handler >>>> - 12.15% swapper [kernel.kallsyms] [k] queue_work_on >>>> - queue_work_on >>>> + 12.00% cursor_timer_handler >>>> + 0.15% call_timer_fn >>>> + 10.98% swapper [kernel.kallsyms] [k] run_timer_softirq >>>> - 2.23% swapper [kernel.kallsyms] [k] mod_timer >>>> - mod_timer >>>> + 1.97% cursor_timer_handler >>>> + 0.26% call_timer_fn >>>> >>>> During the same period, I can see that another CPU is actively >>>> executing the timer function: >>>> >>>> - 42.18% kworker/u96:2 [kernel.kallsyms] [k] ww_mutex_unlock >>>> - ww_mutex_unlock >>>> - 40.70% ast_dirty_update >>>> ast_imageblit >>>> soft_cursor >>>> bit_cursor >>>> fb_flashcursor >>>> process_one_work >>>> worker_thread >>>> kthread >>>> ret_from_fork >>>> + 1.48% ast_imageblit >>>> - 40.15% kworker/u96:2 [kernel.kallsyms] [k] __memcpy_toio >>>> - __memcpy_toio >>>> + 31.54% ast_dirty_update >>>> + 8.61% ast_imageblit >>>> >>>> Using the graph function tracer on fb_flashcursor(), I see that >>>> ast_dirty_update usually takes around 60 us, in which it makes 16 >>>> calls to __memcpy_toio(). However, there is always one instance on >>>> every boot of the installer where ast_dirty_update() takes ~98 *ms* to >>>> complete, during which it makes 743 calls to __memcpy_toio(). While >>>> that doesn't directly account for the full 22s, I do wonder if that >>>> maybe a smoking gun. >>>> >>>> fyi, this is being tracked at: https://bugs.launchpad.net/bugs/1574814 >>>> >>>> -dann >> >