From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933178AbcINTO7 (ORCPT ); Wed, 14 Sep 2016 15:14:59 -0400
Received: from mail-sn1nam02on0126.outbound.protection.outlook.com ([104.47.36.126]:29328
	"EHLO NAM02-SN1-obe.outbound.protection.outlook.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S933041AbcINTO5 (ORCPT ); Wed, 14 Sep 2016 15:14:57 -0400
Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=waiman.long@hpe.com;
Message-ID: <57D9A1A9.8050506@hpe.com>
Date: Wed, 14 Sep 2016 15:14:49 -0400
From: Waiman Long
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130109 Thunderbird/10.0.12
MIME-Version: 1.0
To: "Theodore Ts'o"
CC: Waiman Long , Arnd Bergmann , Greg Kroah-Hartman , , Linus Torvalds , Scott J Norton , Douglas Hatch
Subject: Re: [PATCH] random: Fix kernel panic due to system_wq use before init
References: <1473879781-23819-1-git-send-email-Waiman.Long@hpe.com>
In-Reply-To: <1473879781-23819-1-git-send-email-Waiman.Long@hpe.com>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [72.71.243.249]
X-OriginatorOrg: hpe.com
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/14/2016 03:03 PM, Waiman Long wrote:
> While booting a 4.8-rc6 kernel on a 16-socket 768-thread Broadwell-EX
> system, the kernel panic'ed with the following log:
>
> [ 51.837010] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
> [ 51.845635] IP: [] __queue_work+0x32/0x420
> [ 52.004366] Call Trace:
> [ 52.007053]
> [ 52.009171] [] queue_work_on+0x27/0x40
> [ 52.015306] [] credit_entropy_bits+0x1d7/0x2a0
> [ 52.022002] [] ? add_interrupt_randomness+0x1b9/0x210
> [ 52.029366] [] add_interrupt_randomness+0x1b9/0x210
> [ 52.036544] [] handle_irq_event_percpu+0x40/0x80
> [ 52.043430] [] handle_irq_event+0x3b/0x60
> [ 52.049655] [] handle_level_irq+0x88/0x100
> [ 52.055968] [] handle_irq+0xab/0x130
> [ 52.061708] [] ? _local_bh_enable+0x21/0x50
> [ 52.068125] [] ? __exit_idle+0x5/0x30
> [ 52.073965] [] do_IRQ+0x4d/0xd0
> [ 52.079229] [] common_interrupt+0x8c/0x8c
> [ 52.085444]
> [ 52.087568] [] ? try_to_free_pmd_page+0x9/0x40
> [ 52.094462] [] ? try_to_free_pmd_page+0x5/0x40
> [ 52.101157] [] ? __unmap_pmd_range.part.5+0x4a/0x70
> [ 52.108330] [] unmap_pmd_range+0x130/0x250
> [ 52.114644] [] __cpa_process_fault+0x47b/0x5a0
> [ 52.121339] [] __change_page_attr+0x78b/0x9e0
> [ 52.127946] [] ? __raw_callee_save___native_queued_spin_unlock+0x15/0x30
> [ 52.137124] [] __change_page_attr_set_clr+0x78/0x300
> [ 52.144404] [] ? __slab_alloc+0x4d/0x5c
> [ 52.150436] [] kernel_map_pages_in_pgd+0x8f/0xd0
> [ 52.157333] [] efi_setup_page_tables+0xcc/0x1d9
> [ 52.164124] [] efi_enter_virtual_mode+0x35e/0x4af
> [ 52.171117] [] start_kernel+0x41f/0x4c8
> [ 52.177142] [] ? set_init_arg+0x55/0x55
> [ 52.183168] [] ? early_idt_handler_array+0x120/0x120
> [ 52.190440] [] x86_64_start_reservations+0x2a/0x2c
> [ 52.197516] [] x86_64_start_kernel+0x14c/0x16f
> [ 52.204214] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 ff 14 25 a0 a7 c2 81 f6 c4 02 0f 85 0d 03 00 00 <41> f6 86 02 01 00 00 01 0f 85 ae 02 00 00 49 c7 c7 18 41 01 00
> [ 52.225516] RIP [] __queue_work+0x32/0x420
> [ 52.231838] RSP
> [ 52.235667] CR2: 0000000000000102
> [ 52.239667] ---[ end trace 2ee7ea9d2908eb72 ]---
> [ 52.244743] Kernel panic - not syncing: Fatal exception in interrupt
>
> Looking at the panic'ed instruction indicated that system_wq was
> used by credit_entropy_bits() before it was initialized in an
> early_initcall.
>
> This patch prevents the schedule_work() call from being made before
> system_wq is initialized.
>
> Signed-off-by: Waiman Long
> ---
>  drivers/char/random.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3efb3bf..3afc519 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -730,8 +730,12 @@ retry:
>  		    r->entropy_total >= 2*random_read_wakeup_bits) {
>  			struct entropy_store *other = &blocking_pool;
>
> -			if (other->entropy_count <=
> -			    3 * other->poolinfo->poolfracbits / 4) {
> +			/*
> +			 * We cannot call schedule_work() before system_wq
> +			 * is initialized.
> +			 */
> +			if (system_wq && (other->entropy_count <=
> +			    3 * other->poolinfo->poolfracbits / 4)) {
>  				schedule_work(&other->push_work);
>  				r->entropy_total = 0;
>  			}

This patch fixed the kernel panic, but the test system still seemed to
hang after the following log messages:

[ 0.276735] random: fast init done
[ 6.230775] random: crng init done

In the stack backtrace above, the kernel hadn't even reached SMP boot
after about 50s, which is extremely slow. I tried the 4.7.3 kernel and
it booted up fine. So I suspect that too many interrupts are being
raised and that handling them consumes most of the CPU cycles. I think
the prime suspect is the random driver. I would like to hear your
thoughts on that.

Cheers,
Longman