From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755524AbcESVpP (ORCPT );
	Thu, 19 May 2016 17:45:15 -0400
Received: from mail.skyhub.de ([78.46.96.112]:46037 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754566AbcESVpM (ORCPT );
	Thu, 19 May 2016 17:45:12 -0400
Date: Thu, 19 May 2016 23:44:55 +0200
From: Borislav Petkov
To: Nicholas Krause
Cc: dougthompson@xmission.com, mchehab@osg.samsung.com,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	Steven Rostedt
Subject: Re: [PATCH] edac:Fix kernel panic regression in edac_mc_reset_delay_period
Message-ID: <20160519214455.GE552@pd.tnic>
References: <1463687097-11306-1-git-send-email-xerofoify@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1463687097-11306-1-git-send-email-xerofoify@gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 19, 2016 at 03:44:57PM -0400, Nicholas Krause wrote:
> This fixes a kernel panic regression in the function,
> edac_mc_reset_delay_period as show by this kernel panic
> trace:
> [ 58.402137] BUG: unable to handle kernel paging request at 0000000000015d10
> [ 58.410564] IP: [] queued_spin_lock_slowpath+0x132/0x170
> [ 58.418941] PGD 3ffcc8067 PUD 3ffc56067 PMD 0
> [ 58.428821] Oops: 0002 [#1] SMP
> [ 58.439076] Modules linked in: xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables
> [ 58.468176] CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
                                      ^^^^^^^^

Ha, what is that program?

> [ 58.478878] Hardware name: HP ProLiant MicroServer, BIOS O41 10/01/2013
> [ 58.488590] task: ffff8803ff9a9300 ti: ffff8803ffbf0000 task.ti: ffff8803ffbf0000
> [ 58.499562] RIP: 0010:[] [] queued_spin_lock_slowpath+0x132/0x170
> [ 58.521850] RSP: 0018:ffff8803ffbf3cf8 EFLAGS: 00010002
> [ 58.532653] RAX: 0000000000002bfe RBX: 0000000000000082 RCX: 0000000000080000
> [ 58.545334] RDX: 0000000000015d10 RSI: 00000000affd0fc4 RDI: ffffffff81d39940
> [ 58.555376] RBP: ffff88040a97b848 R08: ffff88041ed15d00 R09: 0000000000000004
> [ 58.565813] R10: 000000000000000a R11: f000000000000000 R12: ffffffff81d39940
> [ 58.577911] R13: 000000000000c940 R14: ffff8803ffbf3d48 R15: ffff8803ffbf3f28
> [ 58.588311] FS: 00007f639468f780(0000) GS:ffff88041ed00000(0000) knlGS:00000000f7743680
> [ 58.598270] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 58.609814] CR2: 0000000000015d10 CR3: 00000003ffafa000 CR4: 00000000000006e0
> [ 58.620848] Stack:
> [ 58.630118] ffffffff81774d3f 000000000000000f ffffffff810ae889 ffff88040a97b820
> [ 58.640635] ffff8803ffbf3d90 0000000000002000 ffff88040c335c00 00000000000003e8
> [ 58.652220] ffffffff810aed20 0000000000000041 0000000200000000 ffff88040a97b800
> [ 58.662230] Call Trace:
> [ 58.672043] [] ? _raw_spin_lock_irqsave+0x1f/0x30
> [ 58.682221] [] ? lock_timer_base.isra.34+0x49/0x60
> [ 58.693178] [] ? del_timer+0x30/0x70
> [ 58.704839] [] ? try_to_grab_pending+0xa4/0x140
> [ 58.715206] [] ? mod_delayed_work_on+0x39/0x80
> [ 58.725250] [] ? edac_mc_reset_delay_period+0x30/0x50
> [ 58.735572] [] ? edac_set_poll_msec+0x45/0x60
> [ 58.745346] [] ? param_attr_store+0x6b/0xe0
> [ 58.755254] [] ? module_attr_store+0x15/0x20
> [ 58.764869] [] ? kernfs_fop_write+0x142/0x190
> [ 58.774516] [] ? __vfs_write+0x1e/0xe0
> [ 58.783565] [] ? __vfs_read+0xa4/0xd0
> [ 58.792437] [] ? __alloc_fd+0x37/0x160
> [ 58.801108] [] ? vfs_write+0xb0/0x1b0
> [ 58.809465] [] ? SyS_write+0x4b/0xb0
> [ 58.817707] [] ? entry_SYSCALL_64_fastpath+0x17/0x93
> [ 58.825626] Code: f8 66 c7 07 01 00 c3 66 90 f3 c3 48 89 c2 c1 e8 12 48 c1 ea 0c ff c8 83 e2 30 48 98 48 81 c2 00 5d 01 00 48 03 14 c5 40 24 d1 81 <4c> 89 02 41 8b 40 08 85 c0 75 0a f3 90 41 8b 40 08 85 c0 74 f6
> [ 58.852733] RIP [] queued_spin_lock_slowpath+0x132/0x170
> [ 58.861275] RSP
> [ 58.869458] CR2: 0000000000015d10
> [ 58.877632] ---[ end trace 3f286bc71cca15d1 ]---
> [ 58.885869] Kernel panic - not syncing: Fatal exception

So I see the splat but the fix does not look correct... It is more,
like, an uninitialized workqueue somewhere.

How do you trigger this? Write some values into

/sys/module/edac_core/parameters/edac_mc_poll_msec

?

I guess that's that edactest program.

Can I have your .config please?

...

Ok, I think I see it - we initialize the workqueues only when
->edac_check is defined. And you're probably using an EDAC driver which
doesn't define that function, thus the splat.

But which driver are you using? I don't see it in your module list. So
it is either compiled in or you've simply loaded edac_core.ko only.

If you want to write a proper fix, I'd give you a hint: look at
->op_state. That should be tested. :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
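
The ->op_state hint above can be illustrated with a small userspace model.
This is only a sketch of the idea, not the EDAC core code and not the fix
that was eventually merged: every struct, field and function name below is
hypothetical and only loosely modelled on what the mail describes. It assumes
that registration sets up the polling work only when a check callback is
present, and that the reset path should therefore skip any controller that is
not in polling mode instead of touching its never-initialized work item.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names only loosely follow the mail above. */
enum op_state_model { MODEL_RUNNING_POLL, MODEL_RUNNING_INTERRUPT };

struct delayed_work_model {
    bool initialized;          /* stands in for the work item having been set up */
    unsigned long delay_msec;  /* current polling period */
};

struct mem_ctl_model {
    void (*edac_check)(struct mem_ctl_model *mci); /* NULL when the driver does not poll */
    enum op_state_model op_state;
    struct delayed_work_model work;
};

/* Placeholder check callback for a driver that does poll. */
void model_noop_check(struct mem_ctl_model *mci)
{
    (void)mci;
}

/* Registration: the polling work is set up only when a check callback exists. */
void model_add_mc(struct mem_ctl_model *mci, unsigned long poll_msec)
{
    mci->work.initialized = false;
    mci->work.delay_msec = 0;

    if (mci->edac_check != NULL) {
        mci->op_state = MODEL_RUNNING_POLL;
        mci->work.initialized = true;
        mci->work.delay_msec = poll_msec;
    } else {
        mci->op_state = MODEL_RUNNING_INTERRUPT;
    }
}

/*
 * Reset path: test op_state before touching the work item, so controllers
 * whose work was never initialized are left alone.
 * Returns the number of controllers that were re-armed.
 */
int model_reset_delay_period(struct mem_ctl_model *list, size_t n,
                             unsigned long msec)
{
    int rearmed = 0;

    for (size_t i = 0; i < n; i++) {
        if (list[i].op_state != MODEL_RUNNING_POLL)
            continue;
        list[i].work.delay_msec = msec;
        rearmed++;
    }
    return rearmed;
}
```

With one polling and one non-polling controller registered, a call to
model_reset_delay_period() changes the period of the first and leaves the
second untouched, which is the behaviour the hint seems to be asking for.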