From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 5 Mar 2017 10:47:36 -0800
From: "Paul E. McKenney"
To: Dmitry Vyukov
Cc: josh@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers,
	jiangshanlai@gmail.com, LKML, syzkaller
Subject: Re: rcu: WARNING in rcu_seq_end
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170304204052.GC30506@linux.vnet.ibm.com>
Message-Id: <20170305184736.GD30506@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote:
> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney wrote:
> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote:
> >> Hello,
> >>
> >> Paul, you wanted bugs in rcu.
> >
> > Well, whether I want them or not, I must deal with them.  ;-)
> >
> >> I've got this WARNING while running syzkaller fuzzer on
> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760:
> >>
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >> Kernel panic - not syncing: panic_on_warn set ...
> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> Workqueue: events wait_rcu_exp_gp
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:15 [inline]
> >>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
> >>  panic+0x1fb/0x412 kernel/panic.c:179
> >>  __warn+0x1c4/0x1e0 kernel/panic.c:540
> >>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
> >>  rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
> >>  rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
> >>  rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
> >>  rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
> >>  wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
> >>  process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
> >>  worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
> >>  kthread+0x326/0x3f0 kernel/kthread.c:227
> >>  ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
> >> Dumping ftrace buffer:
> >>    (ftrace buffer empty)
> >> Kernel Offset: disabled
> >> Rebooting in 86400 seconds..
> >>
> >>
> >> Not reproducible. But looking at the code, shouldn't it be:
> >>
> >>  static void rcu_seq_end(unsigned long *sp)
> >>  {
> >>         smp_mb(); /* Ensure update-side operation before counter increment. */
> >> +       WARN_ON_ONCE(!(*sp & 0x1));
> >>         WRITE_ONCE(*sp, *sp + 1);
> >> -       WARN_ON_ONCE(*sp & 0x1);
> >>  }
> >>
> >> ?
> >>
> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon
> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this
> >> consequently can allow start of next grace periods. Which in turn can
> >> make the warning fire. Am I missing something?
> >>
> >> I don't see any other bad consequences of this. The rest of
> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has
> >> returned and destroyed work on stack and next period has started and
> >> ended, but it seems OK.
> >
> > I believe that this is a good change, but I don't see how it will
> > help in this case.  BTW, may I have your Signed-off-by?
> >
> > The reason I don't believe that it will help is that the
> > rcu_exp_gp_seq_end() function is called from a workqueue handler that
> > is invoked holding ->exp_mutex, and this mutex is not released until
> > after the handler invokes rcu_seq_end() and then wakes up the task that
> > scheduled the workqueue handler.  So the ordering above should not matter
> > (but I agree that your ordering is cleaner).
> >
> > That said, it looks like I am missing some memory barriers, please
> > see the following patch.
> >
> > But what architecture did you see this on?
>
>
> This is just x86.
>
> You seem to assume that wait_event() waits for the wakeup. It does not
> work this way. It can return as soon as the condition becomes true
> without ever waiting:
>
> #define wait_event(wq, condition)                                       \
> do {                                                                    \
>         might_sleep();                                                  \
>         if (condition)                                                  \
>                 break;                                                  \
>         __wait_event(wq, condition);                                    \
> } while (0)

Agreed, hence my patch in the previous email.  I guess I knew that, but
on the day I wrote that code, my fingers didn't.  Or some similar lame
excuse.  ;-)

> Mailed a signed patch:
> https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ

This is the patch you also sent by email, that moves the WARN_ON_ONCE(),
thank you!

							Thanx, Paul
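
For illustration only, here is a minimal single-threaded userspace sketch
of the counter protocol discussed above.  It is not the kernel code: the
model_* names and the model_warnings counter are hypothetical stand-ins
(the latter for WARN_ON_ONCE()), the snapshot and completion helpers are
simplified assumptions about how a waiter decides that a grace period has
ended, and the memory barriers and READ_ONCE()/WRITE_ONCE() are omitted.
It shows that the waiter's condition becomes true as soon as the
end-of-update increment lands, so only a check made before that increment
is guaranteed to see an odd value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): counts failed sanity checks. */
static unsigned long model_warnings;

/* Begin an update: the counter should be even (idle) and becomes odd. */
static void model_seq_start(unsigned long *sp)
{
	if (*sp & 0x1)
		model_warnings++;
	*sp = *sp + 1;
}

/*
 * End an update with the sanity check placed BEFORE the increment, as in
 * the patch proposed in this thread.  Once the increment is visible, a
 * waiter may return and a new update may start, so the counter can be
 * odd again by the time a check placed after the increment would run.
 */
static void model_seq_end(unsigned long *sp)
{
	if (!(*sp & 0x1))
		model_warnings++;
	*sp = *sp + 1;
}

/* Assumed snapshot rule: the even value reached after one full update. */
static unsigned long model_seq_snap(const unsigned long *sp)
{
	return (*sp + 3) & ~0x1UL;
}

/* Wraparound-safe test for "counter has reached the snapshot". */
static bool model_seq_done(const unsigned long *sp, unsigned long snap)
{
	return *sp - snap <= ULONG_MAX / 2;
}
```

A caller would take a snapshot with model_seq_snap(), run
model_seq_start() and model_seq_end(), and poll model_seq_done().  In this
model the condition flips to true immediately after model_seq_end()
stores the new value, which is the point at which wait_event() may return
without ever sleeping.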