From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751812AbdFOPjD (ORCPT ); Thu, 15 Jun 2017 11:39:03 -0400 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:44036 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750749AbdFOPjB (ORCPT ); Thu, 15 Jun 2017 11:39:01 -0400 Date: Thu, 15 Jun 2017 08:38:57 -0700 From: "Paul E. McKenney" To: Tejun Heo Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org Subject: Re: WARN_ON_ONCE() in process_one_work()? Reply-To: paulmck@linux.vnet.ibm.com References: <20170501165747.GA993@linux.vnet.ibm.com> <20170501183807.GA7054@linux.vnet.ibm.com> <20170501184402.GB8921@htj.duckdns.org> <20170501185819.GJ3956@linux.vnet.ibm.com> <20170505171159.GA10296@linux.vnet.ibm.com> <20170613205837.GB7359@htj.duckdns.org> <20170613223103.GX3721@linux.vnet.ibm.com> <20170614151548.GA14462@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170614151548.GA14462@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17061515-0048-0000-0000-000001A7E94B X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00007237; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000212; SDB=6.00875185; UDB=6.00435721; IPR=6.00655273; BA=6.00005423; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015837; XFM=3.00000015; UTC=2017-06-15 15:38:59 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17061515-0049-0000-0000-000041878E9F Message-Id: <20170615153857.GA27788@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-06-15_08:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1703280000 definitions=main-1706150270 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 14, 2017 at 08:15:48AM -0700, Paul E. McKenney wrote: > On Tue, Jun 13, 2017 at 03:31:03PM -0700, Paul E. McKenney wrote: > > On Tue, Jun 13, 2017 at 04:58:37PM -0400, Tejun Heo wrote: > > > Hello, Paul. > > > > > > On Fri, May 05, 2017 at 10:11:59AM -0700, Paul E. McKenney wrote: > > > > Just following up... I have hit this bug a couple of times over the > > > > past few days. Anything I can do to help? > > > > > > My apologies for dropping the ball on this. I've gone over the hot > > > plug code in workqueue several times but can't really find how this > > > would happen. Can you please apply the following patch and see what > > > it says when the problem happens? > > > > I have fired it up, thank you! > > > > Last time I saw one failure in 21 hours of test runs, so I have kicked > > of 42 one-hour test runs. Will see what happens tomorrow morning, > > Pacific Time. > > And none of the 42 runs resulted in a workqueue splat. I will try again > this evening, Pacific Time. > > Who knows, maybe your diagnostic patch is the fix. ;-) And this time, we did get something! Here is the printk() output: [ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 curcpu=2 online=0-2,7 active=0,2,7 Please see below for the full splat from dmesg. Please let me know if you need additional email. My test ID is KSIC 2017.06.14-15:50:08/TREE07.14, just to help me find it in my large pile of test results. ;-) Thanx, Paul ------------------------------------------------------------------------ [ 2126.840625] ------------[ cut here ]------------ [ 2126.841373] WARNING: CPU: 2 PID: 6 at /home/paulmck/public_git/linux-rcu/kernel/workqueue.c:2048 process_one_work+0x5f/0x440 [ 2126.843042] Modules linked in: [ 2126.843521] CPU: 2 PID: 6 Comm: mm_percpu_wq Not tainted 4.12.0-rc4+ #1 [ 2126.844493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 2126.845898] task: ffff985c5e59bc00 task.stack: ffffb8be000fc000 [ 2126.846773] RIP: 0010:process_one_work+0x5f/0x440 [ 2126.847470] RSP: 0018:ffffb8be000ffe60 EFLAGS: 00010002 [ 2126.848240] RAX: ffff985c5e595200 RBX: ffff985c5e56e840 RCX: 0000000000000000 [ 2126.849284] RDX: 0000000000000001 RSI: ffff985c5e8952c0 RDI: ffff985c5e56e840 [ 2126.850337] RBP: ffffb8be000ffe90 R08: 0000000000000002 R09: 0000000000000000 [ 2126.851378] R10: ffffb8be000ffe18 R11: 0000000000000000 R12: ffff985c5e899c00 [ 2126.852410] R13: ffff985c5e89e800 R14: 0000000000000000 R15: ffff985c5e8952c0 [ 2126.853440] FS: 0000000000000000(0000) GS:ffff985c5e900000(0000) knlGS:0000000000000000 [ 2126.854650] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2126.855500] CR2: 00000000000000b0 CR3: 000000001dcfc000 CR4: 00000000000006e0 [ 2126.856558] Call Trace: [ 2126.856944] rescuer_thread+0x207/0x350 [ 2126.857520] kthread+0x104/0x140 [ 2126.858024] ? process_one_work+0x440/0x440 [ 2126.858644] ? kthread_create_on_node+0x40/0x40 [ 2126.859343] ret_from_fork+0x22/0x30 [ 2126.859879] Code: e8 49 8b 45 08 44 8b b0 00 01 00 00 41 83 e6 20 f6 c1 04 44 89 75 d4 75 21 41 8b 54 24 04 65 44 8b 05 de 8a f9 74 41 39 d0 74 0f <0f> ff 80 3d b8 d5 ec 00 00 0f 84 92 03 00 00 48 ba eb 83 b5 80 [ 2126.862715] ---[ end trace df0aaa02bf2d28ee ]--- [ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 curcpu=2 online=0-2,7 active=0,2,7