* rcu: WARNING in rcu_seq_end
@ 2017-03-04 16:01 Dmitry Vyukov
  2017-03-04 20:40 ` Paul E. McKenney
  0 siblings, 1 reply; 21+ messages in thread
From: Dmitry Vyukov @ 2017-03-04 16:01 UTC (permalink / raw)
  To: Paul McKenney, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML
  Cc: syzkaller

Hello,

Paul, you wanted bugs in rcu.

I've got this WARNING while running syzkaller fuzzer on
86292b33d4b79ee03e2f43ea0381ef85f077c760:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events wait_rcu_exp_gp
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 panic+0x1fb/0x412 kernel/panic.c:179
 __warn+0x1c4/0x1e0 kernel/panic.c:540
 warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
 rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
 rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
 rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
 rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
 wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
 process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
 worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
 kthread+0x326/0x3f0 kernel/kthread.c:227
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..

Not reproducible. But looking at the code, shouldn't it be:

static void rcu_seq_end(unsigned long *sp)
{
	smp_mb(); /* Ensure update-side operation before counter increment. */
+	WARN_ON_ONCE(!(*sp & 0x1));
	WRITE_ONCE(*sp, *sp + 1);
-	WARN_ON_ONCE(*sp & 0x1);
}

?

Otherwise wait_event in _synchronize_rcu_expedited can return as soon
as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this
consequently can allow start of next grace periods. Which in turn can
make the warning fire. Am I missing something?

I don't see any other bad consequences of this. The rest of
rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has
returned and destroyed work on stack and next period has started and
ended, but it seems OK.

^ permalink raw reply	[flat|nested] 21+ messages in thread
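The interleaving described above is easiest to see in a stand-alone sketch.
The following user-space analogue is illustrative only (the names and the
C11 atomics are stand-ins invented for this sketch, not kernel code): the
main thread plays the workqueue handler running rcu_seq_end(), and the
second thread plays a requester that returns from wait_event() the moment
the counter goes even and immediately starts the next grace period.

/*
 * Illustrative user-space analogue of the race; "seq" stands in for
 * rsp->expedited_sequence and seq_end() for rcu_seq_end().
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong seq;

static void seq_end(void)
{
	/* Counter goes even here; a waiter polling it may proceed now. */
	atomic_fetch_add(&seq, 1);
	/* If the next "grace period" has already started, seq is odd again. */
	if (atomic_load(&seq) & 0x1)
		fprintf(stderr, "WARN: sequence odd after seq_end()\n");
}

static void *waiter(void *arg)
{
	(void)arg;
	/* wait_event(..., rcu_seq_done(...)): return once the count is even. */
	while (atomic_load(&seq) & 0x1)
		;
	/* Next grace period starts: the rcu_seq_start() analogue, odd again. */
	atomic_fetch_add(&seq, 1);
	return NULL;
}

int main(void)
{
	pthread_t t;

	atomic_store(&seq, 1);			/* a grace period in flight */
	pthread_create(&t, NULL, waiter, NULL);
	seq_end();				/* the workqueue handler side */
	pthread_join(t, NULL);
	return 0;
}

If the waiter's increment lands between the main thread's increment and its
check, the warning fires even though the grace-period machinery did nothing
wrong, which is the motivation for moving the WARN_ON_ONCE() before the
increment in the patch above.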
* Re: rcu: WARNING in rcu_seq_end 2017-03-04 16:01 rcu: WARNING in rcu_seq_end Dmitry Vyukov @ 2017-03-04 20:40 ` Paul E. McKenney 2017-03-05 10:50 ` Dmitry Vyukov 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-04 20:40 UTC (permalink / raw) To: Dmitry Vyukov Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: > Hello, > > Paul, you wanted bugs in rcu. Well, whether I want them or not, I must deal with them. ;-) > I've got this WARNING while running syzkaller fuzzer on > 86292b33d4b79ee03e2f43ea0381ef85f077c760: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 > rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > Kernel panic - not syncing: panic_on_warn set ... > CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Workqueue: events wait_rcu_exp_gp > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > panic+0x1fb/0x412 kernel/panic.c:179 > __warn+0x1c4/0x1e0 kernel/panic.c:540 > warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 > rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] > rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 > rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] > wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 > process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 > worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 > kthread+0x326/0x3f0 kernel/kthread.c:227 > ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > Dumping ftrace buffer: > (ftrace buffer empty) > Kernel Offset: disabled > Rebooting in 86400 seconds.. > > > Not reproducible. But looking at the code, shouldn't it be: > > static void rcu_seq_end(unsigned long *sp) > { > smp_mb(); /* Ensure update-side operation before counter increment. */ > + WARN_ON_ONCE(!(*sp & 0x1)); > WRITE_ONCE(*sp, *sp + 1); > - WARN_ON_ONCE(*sp & 0x1); > } > > ? > > Otherwise wait_event in _synchronize_rcu_expedited can return as soon > as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this > consequently can allow start of next grace periods. Which in turn can > make the warning fire. Am I missing something? > > I don't see any other bad consequences of this. The rest of > rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has > returned and destroyed work on stack and next period has started and > ended, but it seems OK. I believe that this is a heygood change, but I don't see how it will help in this case. BTW, may I have your Signed-off-by? The reason I don't believe that it will help is that the rcu_exp_gp_seq_end() function is called from a workqueue handler that is invoked holding ->exp_mutex, and this mutex is not released until after the handler invokes rcu_seq_end() and then wakes up the task that scheduled the workqueue handler. So the ordering above should not matter (but I agree that your ordering is cleaner. That said, it looks like I am missing some memory barriers, please see the following patch. But what architecture did you see this on? Thanx, Paul ------------------------------------------------------------------------ commit 1038d97908afb04750d0775748e2165a6e92d851 Author: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com>
Date:   Sat Mar 4 12:33:53 2017 -0800

    rcu: Expedited wakeups need to be fully ordered

    Expedited grace periods use workqueue handlers that wake up the
    requesters, but there is no lock mediating this wakeup.  Therefore,
    memory barriers are required to ensure that the handler's memory
    references are seen by all to occur before synchronize_*_expedited()
    returns to its caller.  Possibly detected by syzkaller.

    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 51ca287828a2..027e123d93c7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -533,6 +533,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 			rnp->exp_seq_rq = s;
 			spin_unlock(&rnp->exp_lock);
 		}
+		smp_mb(); /* All above changes before wakeup. */
 		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
 	}
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
@@ -614,6 +615,7 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	wait_event(rnp->exp_wq[(s >> 1) & 0x3],
 		   sync_exp_work_done(rsp, &rdp->exp_workdone0, s));
+	smp_mb(); /* Workqueue actions happen before return. */

 	/* Let the next expedited grace period start. */
 	mutex_unlock(&rsp->exp_mutex);

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-04 20:40 ` Paul E. McKenney @ 2017-03-05 10:50 ` Dmitry Vyukov 2017-03-05 18:47 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-05 10:50 UTC (permalink / raw) To: Paul McKenney Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: >> Hello, >> >> Paul, you wanted bugs in rcu. > > Well, whether I want them or not, I must deal with them. ;-) > >> I've got this WARNING while running syzkaller fuzzer on >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: >> >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> Kernel panic - not syncing: panic_on_warn set ... >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Workqueue: events wait_rcu_exp_gp >> Call Trace: >> __dump_stack lib/dump_stack.c:15 [inline] >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >> panic+0x1fb/0x412 kernel/panic.c:179 >> __warn+0x1c4/0x1e0 kernel/panic.c:540 >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 >> kthread+0x326/0x3f0 kernel/kthread.c:227 >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 >> Dumping ftrace buffer: >> (ftrace buffer empty) >> Kernel Offset: disabled >> Rebooting in 86400 seconds.. >> >> >> Not reproducible. But looking at the code, shouldn't it be: >> >> static void rcu_seq_end(unsigned long *sp) >> { >> smp_mb(); /* Ensure update-side operation before counter increment. */ >> + WARN_ON_ONCE(!(*sp & 0x1)); >> WRITE_ONCE(*sp, *sp + 1); >> - WARN_ON_ONCE(*sp & 0x1); >> } >> >> ? >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this >> consequently can allow start of next grace periods. Which in turn can >> make the warning fire. Am I missing something? >> >> I don't see any other bad consequences of this. The rest of >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has >> returned and destroyed work on stack and next period has started and >> ended, but it seems OK. > > I believe that this is a heygood change, but I don't see how it will > help in this case. BTW, may I have your Signed-off-by? > > The reason I don't believe that it will help is that the > rcu_exp_gp_seq_end() function is called from a workqueue handler that > is invoked holding ->exp_mutex, and this mutex is not released until > after the handler invokes rcu_seq_end() and then wakes up the task that > scheduled the workqueue handler. So the ordering above should not matter > (but I agree that your ordering is cleaner. > > That said, it looks like I am missing some memory barriers, please > see the following patch. > > But what architecture did you see this on? This is just x86. You seem to assume that wait_event() waits for the wakeup. It does not work this way. 
It can return as soon as the condition becomes true
without ever waiting:

305 #define wait_event(wq, condition)				\
306 do {							\
307 	might_sleep();					\
308 	if (condition)					\
309 		break;					\
310 	__wait_event(wq, condition);			\
311 } while (0)

Mailed a signed patch:
https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-05 10:50 ` Dmitry Vyukov @ 2017-03-05 18:47 ` Paul E. McKenney 2017-03-06 9:24 ` Dmitry Vyukov 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-05 18:47 UTC (permalink / raw) To: Dmitry Vyukov Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote: > On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: > >> Hello, > >> > >> Paul, you wanted bugs in rcu. > > > > Well, whether I want them or not, I must deal with them. ;-) > > > >> I've got this WARNING while running syzkaller fuzzer on > >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: > >> > >> ------------[ cut here ]------------ > >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 > >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> Kernel panic - not syncing: panic_on_warn set ... > >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 > >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > >> Workqueue: events wait_rcu_exp_gp > >> Call Trace: > >> __dump_stack lib/dump_stack.c:15 [inline] > >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > >> panic+0x1fb/0x412 kernel/panic.c:179 > >> __warn+0x1c4/0x1e0 kernel/panic.c:540 > >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 > >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] > >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 > >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] > >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 > >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 > >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 > >> kthread+0x326/0x3f0 kernel/kthread.c:227 > >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > >> Dumping ftrace buffer: > >> (ftrace buffer empty) > >> Kernel Offset: disabled > >> Rebooting in 86400 seconds.. > >> > >> > >> Not reproducible. But looking at the code, shouldn't it be: > >> > >> static void rcu_seq_end(unsigned long *sp) > >> { > >> smp_mb(); /* Ensure update-side operation before counter increment. */ > >> + WARN_ON_ONCE(!(*sp & 0x1)); > >> WRITE_ONCE(*sp, *sp + 1); > >> - WARN_ON_ONCE(*sp & 0x1); > >> } > >> > >> ? > >> > >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon > >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this > >> consequently can allow start of next grace periods. Which in turn can > >> make the warning fire. Am I missing something? > >> > >> I don't see any other bad consequences of this. The rest of > >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has > >> returned and destroyed work on stack and next period has started and > >> ended, but it seems OK. > > > > I believe that this is a heygood change, but I don't see how it will > > help in this case. BTW, may I have your Signed-off-by? > > > > The reason I don't believe that it will help is that the > > rcu_exp_gp_seq_end() function is called from a workqueue handler that > > is invoked holding ->exp_mutex, and this mutex is not released until > > after the handler invokes rcu_seq_end() and then wakes up the task that > > scheduled the workqueue handler. So the ordering above should not matter > > (but I agree that your ordering is cleaner. 
> > > > That said, it looks like I am missing some memory barriers, please > > see the following patch. > > > > But what architecture did you see this on? > > > This is just x86. > > You seem to assume that wait_event() waits for the wakeup. It does not > work this way. It can return as soon as the condition becomes true > without ever waiting: > > 305 #define wait_event(wq, condition) \ > 306 do { \ > 307 might_sleep(); \ > 308 if (condition) \ > 309 break; \ > 310 __wait_event(wq, condition); \ > 311 } while (0) Agreed, hence my patch in the previous email. I guess I knew that, but on the day I wrote that code, my fingers didn't. Or somew similar lame excuse. ;-) > Mailed a signed patch: > https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ This is the patch you also sent by email, that moves the WARN_ON_ONCE(), thank you! Thanx, Paul > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-05 18:47 ` Paul E. McKenney @ 2017-03-06 9:24 ` Dmitry Vyukov 2017-03-06 10:07 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-06 9:24 UTC (permalink / raw) To: Paul McKenney Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote: >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney >> <paulmck@linux.vnet.ibm.com> wrote: >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: >> >> Hello, >> >> >> >> Paul, you wanted bugs in rcu. >> > >> > Well, whether I want them or not, I must deal with them. ;-) >> > >> >> I've got this WARNING while running syzkaller fuzzer on >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: >> >> >> >> ------------[ cut here ]------------ >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> Kernel panic - not syncing: panic_on_warn set ... >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> >> Workqueue: events wait_rcu_exp_gp >> >> Call Trace: >> >> __dump_stack lib/dump_stack.c:15 [inline] >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >> >> panic+0x1fb/0x412 kernel/panic.c:179 >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 >> >> Dumping ftrace buffer: >> >> (ftrace buffer empty) >> >> Kernel Offset: disabled >> >> Rebooting in 86400 seconds.. >> >> >> >> >> >> Not reproducible. But looking at the code, shouldn't it be: >> >> >> >> static void rcu_seq_end(unsigned long *sp) >> >> { >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ >> >> + WARN_ON_ONCE(!(*sp & 0x1)); >> >> WRITE_ONCE(*sp, *sp + 1); >> >> - WARN_ON_ONCE(*sp & 0x1); >> >> } >> >> >> >> ? >> >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this >> >> consequently can allow start of next grace periods. Which in turn can >> >> make the warning fire. Am I missing something? >> >> >> >> I don't see any other bad consequences of this. The rest of >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has >> >> returned and destroyed work on stack and next period has started and >> >> ended, but it seems OK. >> > >> > I believe that this is a heygood change, but I don't see how it will >> > help in this case. BTW, may I have your Signed-off-by? >> > >> > The reason I don't believe that it will help is that the >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that >> > is invoked holding ->exp_mutex, and this mutex is not released until >> > after the handler invokes rcu_seq_end() and then wakes up the task that >> > scheduled the workqueue handler. 
>> > So the ordering above should not matter
>> > (but I agree that your ordering is cleaner.
>> >
>> > That said, it looks like I am missing some memory barriers, please
>> > see the following patch.
>> >
>> > But what architecture did you see this on?
>>
>>
>> This is just x86.
>>
>> You seem to assume that wait_event() waits for the wakeup. It does not
>> work this way. It can return as soon as the condition becomes true
>> without ever waiting:
>>
>> 305 #define wait_event(wq, condition)				\
>> 306 do {							\
>> 307 	might_sleep();					\
>> 308 	if (condition)					\
>> 309 		break;					\
>> 310 	__wait_event(wq, condition);			\
>> 311 } while (0)
>
> Agreed, hence my patch in the previous email.  I guess I knew that, but

Ah, you meant to synchronize rcu_seq_end with rcu_seq_done?
I think you placed the barrier incorrectly for that. rcu_exp_wait_wake
is already too late. The write that unblocks the waiter is in rcu_seq_end,
so you need a release barrier _before_ that write.
Also, can we please start using smp_load_acquire/smp_store_release where
they are just what the doctor ordered. They are faster, more readable,
better for race detectors _and_ would prevent you from introducing this
bug, because you would need to find the exact write that signifies
completion. I.e.:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d80c2587bed8..aa7ba83f6a56 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp)
 static void rcu_seq_end(unsigned long *sp)
 {
 	smp_mb(); /* Ensure update-side operation before counter increment. */
-	WRITE_ONCE(*sp, *sp + 1);
+	smp_store_release(sp, *sp + 1);
 	WARN_ON_ONCE(*sp & 0x1);
 }

@@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp)
  */
 static bool rcu_seq_done(unsigned long *sp, unsigned long s)
 {
-	return ULONG_CMP_GE(READ_ONCE(*sp), s);
+	return ULONG_CMP_GE(smp_load_acquire(sp), s);

 }

> on the day I wrote that code, my fingers didn't.  Or somew similar lame
> excuse.  ;-)
>
>> Mailed a signed patch:
>> https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ
>
> This is the patch you also sent by email, that moves the WARN_ON_ONCE(),
> thank you!

^ permalink raw reply related	[flat|nested] 21+ messages in thread
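The release/acquire pairing proposed here is the standard message-passing
pattern: the release store in rcu_seq_end() would pair with the acquire
load in rcu_seq_done(), so any update-side work done before the counter
goes even is guaranteed visible to a task that observes the new counter
value. A minimal stand-alone C11 sketch of that pairing (hypothetical
names, not the kernel code) looks like this:

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static unsigned long gp_result;	/* written before the "grace period" ends */
static atomic_ulong gp_seq;	/* stand-in for the sequence counter */

static void *updater(void *arg)
{
	(void)arg;
	gp_result = 42;				/* update-side work */
	atomic_store_explicit(&gp_seq, 2,	/* "rcu_seq_end()": release */
			      memory_order_release);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, updater, NULL);
	/* "rcu_seq_done()": the acquire load pairs with the release store. */
	while (atomic_load_explicit(&gp_seq, memory_order_acquire) < 2)
		;
	assert(gp_result == 42);	/* guaranteed by the pairing */
	pthread_join(t, NULL);
	return 0;
}

With plain WRITE_ONCE()/READ_ONCE(), the same guarantee has to come from
explicit smp_mb() calls placed on both sides, which is exactly the
bookkeeping the rest of this thread is trying to get right.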
* Re: rcu: WARNING in rcu_seq_end 2017-03-06 9:24 ` Dmitry Vyukov @ 2017-03-06 10:07 ` Paul E. McKenney 2017-03-06 10:11 ` Dmitry Vyukov 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-06 10:07 UTC (permalink / raw) To: Dmitry Vyukov Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Mon, Mar 06, 2017 at 10:24:24AM +0100, Dmitry Vyukov wrote: > On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote: > >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney > >> <paulmck@linux.vnet.ibm.com> wrote: > >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: > >> >> Hello, > >> >> > >> >> Paul, you wanted bugs in rcu. > >> > > >> > Well, whether I want them or not, I must deal with them. ;-) > >> > > >> >> I've got this WARNING while running syzkaller fuzzer on > >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: > >> >> > >> >> ------------[ cut here ]------------ > >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 > >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> Kernel panic - not syncing: panic_on_warn set ... > >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 > >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > >> >> Workqueue: events wait_rcu_exp_gp > >> >> Call Trace: > >> >> __dump_stack lib/dump_stack.c:15 [inline] > >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > >> >> panic+0x1fb/0x412 kernel/panic.c:179 > >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 > >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 > >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] > >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 > >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] > >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 > >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 > >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 > >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 > >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > >> >> Dumping ftrace buffer: > >> >> (ftrace buffer empty) > >> >> Kernel Offset: disabled > >> >> Rebooting in 86400 seconds.. > >> >> > >> >> > >> >> Not reproducible. But looking at the code, shouldn't it be: > >> >> > >> >> static void rcu_seq_end(unsigned long *sp) > >> >> { > >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ > >> >> + WARN_ON_ONCE(!(*sp & 0x1)); > >> >> WRITE_ONCE(*sp, *sp + 1); > >> >> - WARN_ON_ONCE(*sp & 0x1); > >> >> } > >> >> > >> >> ? > >> >> > >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon > >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this > >> >> consequently can allow start of next grace periods. Which in turn can > >> >> make the warning fire. Am I missing something? > >> >> > >> >> I don't see any other bad consequences of this. The rest of > >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has > >> >> returned and destroyed work on stack and next period has started and > >> >> ended, but it seems OK. > >> > > >> > I believe that this is a heygood change, but I don't see how it will > >> > help in this case. BTW, may I have your Signed-off-by? 
> >> > > >> > The reason I don't believe that it will help is that the > >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that > >> > is invoked holding ->exp_mutex, and this mutex is not released until > >> > after the handler invokes rcu_seq_end() and then wakes up the task that > >> > scheduled the workqueue handler. So the ordering above should not matter > >> > (but I agree that your ordering is cleaner. > >> > > >> > That said, it looks like I am missing some memory barriers, please > >> > see the following patch. > >> > > >> > But what architecture did you see this on? > >> > >> > >> This is just x86. > >> > >> You seem to assume that wait_event() waits for the wakeup. It does not > >> work this way. It can return as soon as the condition becomes true > >> without ever waiting: > >> > >> 305 #define wait_event(wq, condition) \ > >> 306 do { \ > >> 307 might_sleep(); \ > >> 308 if (condition) \ > >> 309 break; \ > >> 310 __wait_event(wq, condition); \ > >> 311 } while (0) > > > > Agreed, hence my patch in the previous email. I guess I knew that, but > > Ah, you meant to synchronize rcu_seq_end with rcu_seq_done? No, there is a mutex release and acquisition that do the synchronization, but only if the wakeup has appropriate barriers. The issue is that part of the mutex's critical section executes in a workqueue, possibly on some other CPU. Thanx, Paul > I think you placed the barrier incorrectly for that. rcu_exp_wait_wake > is already too late. The write that unblocks waiter is in rcu_seq_end > so you need a release barrier _before_ that write. > Also can we please start using smp_load_acquire/smp_store_release > where they are what doctor said. They are faster, more readable, > better for race detectors _and_ would prevent you from introducing > this bug, because you would need to find the exact write that > signifies completion. I.e.: > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index d80c2587bed8..aa7ba83f6a56 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp) > static void rcu_seq_end(unsigned long *sp) > { > smp_mb(); /* Ensure update-side operation before counter increment. */ > - WRITE_ONCE(*sp, *sp + 1); > + smp_store_release(sp, *sp + 1); > WARN_ON_ONCE(*sp & 0x1); > } > > @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp) > */ > static bool rcu_seq_done(unsigned long *sp, unsigned long s) > { > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > + return ULONG_CMP_GE(smp_load_acquire(sp), s); > } > > > > > on the day I wrote that code, my fingers didn't. Or somew similar lame > > excuse. ;-) > > > >> Mailed a signed patch: > >> https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ > > > > This is the patch you also sent by email, that moves the WARN_ON_ONCE(), > > thank you! > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-06 10:07 ` Paul E. McKenney @ 2017-03-06 10:11 ` Dmitry Vyukov 2017-03-06 23:08 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-06 10:11 UTC (permalink / raw) To: Paul McKenney Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Mon, Mar 6, 2017 at 11:07 AM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: > On Mon, Mar 06, 2017 at 10:24:24AM +0100, Dmitry Vyukov wrote: >> On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney >> <paulmck@linux.vnet.ibm.com> wrote: >> > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote: >> >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney >> >> <paulmck@linux.vnet.ibm.com> wrote: >> >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: >> >> >> Hello, >> >> >> >> >> >> Paul, you wanted bugs in rcu. >> >> > >> >> > Well, whether I want them or not, I must deal with them. ;-) >> >> > >> >> >> I've got this WARNING while running syzkaller fuzzer on >> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: >> >> >> >> >> >> ------------[ cut here ]------------ >> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> >> Kernel panic - not syncing: panic_on_warn set ... >> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 >> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> >> >> Workqueue: events wait_rcu_exp_gp >> >> >> Call Trace: >> >> >> __dump_stack lib/dump_stack.c:15 [inline] >> >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >> >> >> panic+0x1fb/0x412 kernel/panic.c:179 >> >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 >> >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] >> >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 >> >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] >> >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 >> >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 >> >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 >> >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 >> >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 >> >> >> Dumping ftrace buffer: >> >> >> (ftrace buffer empty) >> >> >> Kernel Offset: disabled >> >> >> Rebooting in 86400 seconds.. >> >> >> >> >> >> >> >> >> Not reproducible. But looking at the code, shouldn't it be: >> >> >> >> >> >> static void rcu_seq_end(unsigned long *sp) >> >> >> { >> >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ >> >> >> + WARN_ON_ONCE(!(*sp & 0x1)); >> >> >> WRITE_ONCE(*sp, *sp + 1); >> >> >> - WARN_ON_ONCE(*sp & 0x1); >> >> >> } >> >> >> >> >> >> ? >> >> >> >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon >> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this >> >> >> consequently can allow start of next grace periods. Which in turn can >> >> >> make the warning fire. Am I missing something? >> >> >> >> >> >> I don't see any other bad consequences of this. The rest of >> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has >> >> >> returned and destroyed work on stack and next period has started and >> >> >> ended, but it seems OK. >> >> > >> >> > I believe that this is a heygood change, but I don't see how it will >> >> > help in this case. 
BTW, may I have your Signed-off-by? >> >> > >> >> > The reason I don't believe that it will help is that the >> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that >> >> > is invoked holding ->exp_mutex, and this mutex is not released until >> >> > after the handler invokes rcu_seq_end() and then wakes up the task that >> >> > scheduled the workqueue handler. So the ordering above should not matter >> >> > (but I agree that your ordering is cleaner. >> >> > >> >> > That said, it looks like I am missing some memory barriers, please >> >> > see the following patch. >> >> > >> >> > But what architecture did you see this on? >> >> >> >> >> >> This is just x86. >> >> >> >> You seem to assume that wait_event() waits for the wakeup. It does not >> >> work this way. It can return as soon as the condition becomes true >> >> without ever waiting: >> >> >> >> 305 #define wait_event(wq, condition) \ >> >> 306 do { \ >> >> 307 might_sleep(); \ >> >> 308 if (condition) \ >> >> 309 break; \ >> >> 310 __wait_event(wq, condition); \ >> >> 311 } while (0) >> > >> > Agreed, hence my patch in the previous email. I guess I knew that, but >> >> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done? > > No, there is a mutex release and acquisition that do the synchronization, > but only if the wakeup has appropriate barriers. The issue is that > part of the mutex's critical section executes in a workqueue, possibly > on some other CPU. What is that mutex? And what locks/unlocks provide synchronization? I see that one uses exp_mutex and another -- exp_wake_mutex. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-06 10:11 ` Dmitry Vyukov @ 2017-03-06 23:08 ` Paul E. McKenney 2017-03-07 7:05 ` Dmitry Vyukov 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-06 23:08 UTC (permalink / raw) To: Dmitry Vyukov Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Mon, Mar 06, 2017 at 11:11:23AM +0100, Dmitry Vyukov wrote: > On Mon, Mar 6, 2017 at 11:07 AM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > > On Mon, Mar 06, 2017 at 10:24:24AM +0100, Dmitry Vyukov wrote: > >> On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney > >> <paulmck@linux.vnet.ibm.com> wrote: > >> > On Sun, Mar 05, 2017 at 11:50:39AM +0100, Dmitry Vyukov wrote: > >> >> On Sat, Mar 4, 2017 at 9:40 PM, Paul E. McKenney > >> >> <paulmck@linux.vnet.ibm.com> wrote: > >> >> > On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote: > >> >> >> Hello, > >> >> >> > >> >> >> Paul, you wanted bugs in rcu. > >> >> > > >> >> > Well, whether I want them or not, I must deal with them. ;-) > >> >> > > >> >> >> I've got this WARNING while running syzkaller fuzzer on > >> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: > >> >> >> > >> >> >> ------------[ cut here ]------------ > >> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 > >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> >> Kernel panic - not syncing: panic_on_warn set ... > >> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 > >> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > >> >> >> Workqueue: events wait_rcu_exp_gp > >> >> >> Call Trace: > >> >> >> __dump_stack lib/dump_stack.c:15 [inline] > >> >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > >> >> >> panic+0x1fb/0x412 kernel/panic.c:179 > >> >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 > >> >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 > >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] > >> >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 > >> >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] > >> >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 > >> >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 > >> >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 > >> >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 > >> >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > >> >> >> Dumping ftrace buffer: > >> >> >> (ftrace buffer empty) > >> >> >> Kernel Offset: disabled > >> >> >> Rebooting in 86400 seconds.. > >> >> >> > >> >> >> > >> >> >> Not reproducible. But looking at the code, shouldn't it be: > >> >> >> > >> >> >> static void rcu_seq_end(unsigned long *sp) > >> >> >> { > >> >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ > >> >> >> + WARN_ON_ONCE(!(*sp & 0x1)); > >> >> >> WRITE_ONCE(*sp, *sp + 1); > >> >> >> - WARN_ON_ONCE(*sp & 0x1); > >> >> >> } > >> >> >> > >> >> >> ? > >> >> >> > >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon > >> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this > >> >> >> consequently can allow start of next grace periods. Which in turn can > >> >> >> make the warning fire. Am I missing something? > >> >> >> > >> >> >> I don't see any other bad consequences of this. 
The rest of > >> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has > >> >> >> returned and destroyed work on stack and next period has started and > >> >> >> ended, but it seems OK. > >> >> > > >> >> > I believe that this is a heygood change, but I don't see how it will > >> >> > help in this case. BTW, may I have your Signed-off-by? > >> >> > > >> >> > The reason I don't believe that it will help is that the > >> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that > >> >> > is invoked holding ->exp_mutex, and this mutex is not released until > >> >> > after the handler invokes rcu_seq_end() and then wakes up the task that > >> >> > scheduled the workqueue handler. So the ordering above should not matter > >> >> > (but I agree that your ordering is cleaner. > >> >> > > >> >> > That said, it looks like I am missing some memory barriers, please > >> >> > see the following patch. > >> >> > > >> >> > But what architecture did you see this on? > >> >> > >> >> > >> >> This is just x86. > >> >> > >> >> You seem to assume that wait_event() waits for the wakeup. It does not > >> >> work this way. It can return as soon as the condition becomes true > >> >> without ever waiting: > >> >> > >> >> 305 #define wait_event(wq, condition) \ > >> >> 306 do { \ > >> >> 307 might_sleep(); \ > >> >> 308 if (condition) \ > >> >> 309 break; \ > >> >> 310 __wait_event(wq, condition); \ > >> >> 311 } while (0) > >> > > >> > Agreed, hence my patch in the previous email. I guess I knew that, but > >> > >> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done? > > > > No, there is a mutex release and acquisition that do the synchronization, > > but only if the wakeup has appropriate barriers. The issue is that > > part of the mutex's critical section executes in a workqueue, possibly > > on some other CPU. > > What is that mutex? And what locks/unlocks provide synchronization? I > see that one uses exp_mutex and another -- exp_wake_mutex. Both of them. ->exp_mutex is acquired by the task requesting the grace period, and the counter's first increment is done by that task under that mutex. This task then schedules a workqueue, which drives forward the grace period. Upon grace-period completion, the workqueue handler does the second increment (the one that your patch addressed). The workqueue handler then acquires ->exp_wake_mutex and wakes the task that holds ->exp_mutex (along with all other tasks waiting for this grace period), and that task releases ->exp_mutex, which allows the next grace period to start (and the first increment for that next grace period to be carried out under that lock). The workqueue handler releases ->exp_wake_mutex after finishing its wakeups. Does that make sense? Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
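Paul's description maps onto the expedited-grace-period code roughly as
follows. This is only a schematic with elided detail; the real code is in
kernel/rcu/tree_exp.h, and the exact argument lists, the rew.rew_work
field, and the schedule_work() call site are assumptions here rather than
quotes from the source:

/* Task requesting the expedited grace period: */
	mutex_lock(&rsp->exp_mutex);
	rcu_exp_gp_seq_start(rsp);		/* first increment: counter odd */
	schedule_work(&rew.rew_work);		/* handler: wait_rcu_exp_gp() */
	wait_event(rnp->exp_wq[(s >> 1) & 0x3],
		   sync_exp_work_done(rsp, ..., s)); /* may return without sleeping */
	mutex_unlock(&rsp->exp_mutex);		/* lets the next grace period start */

/* Workqueue handler, i.e. rcu_exp_wait_wake(), possibly on another CPU: */
	/* ... wait for the grace period to complete ... */
	rcu_exp_gp_seq_end(rsp);		/* second increment: counter even */
	mutex_lock(&rsp->exp_wake_mutex);
	wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
	mutex_unlock(&rsp->exp_wake_mutex);

Read this way, the point Dmitry raises next is visible in the wait_event()
line: when sync_exp_work_done() already sees the second increment, the
requester never sleeps and never takes any lock the handler also holds at
that moment, so the two mutexes by themselves impose no ordering on that
fast path.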
* Re: rcu: WARNING in rcu_seq_end 2017-03-06 23:08 ` Paul E. McKenney @ 2017-03-07 7:05 ` Dmitry Vyukov 2017-03-07 14:27 ` Boqun Feng 2017-03-07 15:16 ` Paul E. McKenney 0 siblings, 2 replies; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-07 7:05 UTC (permalink / raw) To: Paul McKenney Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Tue, Mar 7, 2017 at 12:08 AM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: >> >> >> >> Hello, >> >> >> >> >> >> >> >> Paul, you wanted bugs in rcu. >> >> >> > >> >> >> > Well, whether I want them or not, I must deal with them. ;-) >> >> >> > >> >> >> >> I've got this WARNING while running syzkaller fuzzer on >> >> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: >> >> >> >> >> >> >> >> ------------[ cut here ]------------ >> >> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 >> >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> >> >> Kernel panic - not syncing: panic_on_warn set ... >> >> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 >> >> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> >> >> >> Workqueue: events wait_rcu_exp_gp >> >> >> >> Call Trace: >> >> >> >> __dump_stack lib/dump_stack.c:15 [inline] >> >> >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >> >> >> >> panic+0x1fb/0x412 kernel/panic.c:179 >> >> >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 >> >> >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 >> >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 >> >> >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] >> >> >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 >> >> >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] >> >> >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 >> >> >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 >> >> >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 >> >> >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 >> >> >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 >> >> >> >> Dumping ftrace buffer: >> >> >> >> (ftrace buffer empty) >> >> >> >> Kernel Offset: disabled >> >> >> >> Rebooting in 86400 seconds.. >> >> >> >> >> >> >> >> >> >> >> >> Not reproducible. But looking at the code, shouldn't it be: >> >> >> >> >> >> >> >> static void rcu_seq_end(unsigned long *sp) >> >> >> >> { >> >> >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ >> >> >> >> + WARN_ON_ONCE(!(*sp & 0x1)); >> >> >> >> WRITE_ONCE(*sp, *sp + 1); >> >> >> >> - WARN_ON_ONCE(*sp & 0x1); >> >> >> >> } >> >> >> >> >> >> >> >> ? >> >> >> >> >> >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon >> >> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this >> >> >> >> consequently can allow start of next grace periods. Which in turn can >> >> >> >> make the warning fire. Am I missing something? >> >> >> >> >> >> >> >> I don't see any other bad consequences of this. The rest of >> >> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has >> >> >> >> returned and destroyed work on stack and next period has started and >> >> >> >> ended, but it seems OK. >> >> >> > >> >> >> > I believe that this is a heygood change, but I don't see how it will >> >> >> > help in this case. BTW, may I have your Signed-off-by? 
>> >> >> > >> >> >> > The reason I don't believe that it will help is that the >> >> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that >> >> >> > is invoked holding ->exp_mutex, and this mutex is not released until >> >> >> > after the handler invokes rcu_seq_end() and then wakes up the task that >> >> >> > scheduled the workqueue handler. So the ordering above should not matter >> >> >> > (but I agree that your ordering is cleaner. >> >> >> > >> >> >> > That said, it looks like I am missing some memory barriers, please >> >> >> > see the following patch. >> >> >> > >> >> >> > But what architecture did you see this on? >> >> >> >> >> >> >> >> >> This is just x86. >> >> >> >> >> >> You seem to assume that wait_event() waits for the wakeup. It does not >> >> >> work this way. It can return as soon as the condition becomes true >> >> >> without ever waiting: >> >> >> >> >> >> 305 #define wait_event(wq, condition) \ >> >> >> 306 do { \ >> >> >> 307 might_sleep(); \ >> >> >> 308 if (condition) \ >> >> >> 309 break; \ >> >> >> 310 __wait_event(wq, condition); \ >> >> >> 311 } while (0) >> >> > >> >> > Agreed, hence my patch in the previous email. I guess I knew that, but >> >> >> >> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done? >> > >> > No, there is a mutex release and acquisition that do the synchronization, >> > but only if the wakeup has appropriate barriers. The issue is that >> > part of the mutex's critical section executes in a workqueue, possibly >> > on some other CPU. >> >> What is that mutex? And what locks/unlocks provide synchronization? I >> see that one uses exp_mutex and another -- exp_wake_mutex. > > Both of them. > > ->exp_mutex is acquired by the task requesting the grace period, and > the counter's first increment is done by that task under that mutex. > This task then schedules a workqueue, which drives forward the grace > period. Upon grace-period completion, the workqueue handler does the > second increment (the one that your patch addressed). The workqueue > handler then acquires ->exp_wake_mutex and wakes the task that holds > ->exp_mutex (along with all other tasks waiting for this grace period), > and that task releases ->exp_mutex, which allows the next grace period to > start (and the first increment for that next grace period to be carried > out under that lock). The workqueue handler releases ->exp_wake_mutex > after finishing its wakeups. Then we need the following for the case when task requesting the grace period does not block, right? diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index d80c2587bed8..aa7ba83f6a56 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp) static void rcu_seq_end(unsigned long *sp) { smp_mb(); /* Ensure update-side operation before counter increment. */ - WRITE_ONCE(*sp, *sp + 1); + smp_store_release(sp, *sp + 1); WARN_ON_ONCE(*sp & 0x1); } @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp) */ static bool rcu_seq_done(unsigned long *sp, unsigned long s) { - return ULONG_CMP_GE(READ_ONCE(*sp), s); + return ULONG_CMP_GE(smp_load_acquire(sp), s); } ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 7:05 ` Dmitry Vyukov @ 2017-03-07 14:27 ` Boqun Feng 2017-03-07 14:43 ` Dmitry Vyukov 2017-03-07 15:16 ` Paul E. McKenney 1 sibling, 1 reply; 21+ messages in thread From: Boqun Feng @ 2017-03-07 14:27 UTC (permalink / raw) To: Dmitry Vyukov Cc: Paul McKenney, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller [-- Attachment #1: Type: text/plain, Size: 2097 bytes --] On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: [...] > >> > >> What is that mutex? And what locks/unlocks provide synchronization? I > >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > Both of them. > > > > ->exp_mutex is acquired by the task requesting the grace period, and > > the counter's first increment is done by that task under that mutex. > > This task then schedules a workqueue, which drives forward the grace > > period. Upon grace-period completion, the workqueue handler does the > > second increment (the one that your patch addressed). The workqueue > > handler then acquires ->exp_wake_mutex and wakes the task that holds > > ->exp_mutex (along with all other tasks waiting for this grace period), > > and that task releases ->exp_mutex, which allows the next grace period to > > start (and the first increment for that next grace period to be carried > > out under that lock). The workqueue handler releases ->exp_wake_mutex > > after finishing its wakeups. > > > Then we need the following for the case when task requesting the grace > period does not block, right? > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the smp_mb__before_atomic() in sync_exp_work_done() already provide the required ordering, no? Regards, Boqun > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index d80c2587bed8..aa7ba83f6a56 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp) > static void rcu_seq_end(unsigned long *sp) > { > smp_mb(); /* Ensure update-side operation before counter increment. */ > - WRITE_ONCE(*sp, *sp + 1); > + smp_store_release(sp, *sp + 1); > WARN_ON_ONCE(*sp & 0x1); > } > > @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp) > */ > static bool rcu_seq_done(unsigned long *sp, unsigned long s) > { > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > + return ULONG_CMP_GE(smp_load_acquire(sp), s); > > } [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 14:27 ` Boqun Feng @ 2017-03-07 14:43 ` Dmitry Vyukov 2017-03-07 15:27 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-07 14:43 UTC (permalink / raw) To: Boqun Feng Cc: Paul McKenney, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > [...] >> >> >> >> What is that mutex? And what locks/unlocks provide synchronization? I >> >> see that one uses exp_mutex and another -- exp_wake_mutex. >> > >> > Both of them. >> > >> > ->exp_mutex is acquired by the task requesting the grace period, and >> > the counter's first increment is done by that task under that mutex. >> > This task then schedules a workqueue, which drives forward the grace >> > period. Upon grace-period completion, the workqueue handler does the >> > second increment (the one that your patch addressed). The workqueue >> > handler then acquires ->exp_wake_mutex and wakes the task that holds >> > ->exp_mutex (along with all other tasks waiting for this grace period), >> > and that task releases ->exp_mutex, which allows the next grace period to >> > start (and the first increment for that next grace period to be carried >> > out under that lock). The workqueue handler releases ->exp_wake_mutex >> > after finishing its wakeups. >> >> >> Then we need the following for the case when task requesting the grace >> period does not block, right? >> > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > smp_mb__before_atomic() in sync_exp_work_done() already provide the > required ordering, no? smp_mb() is probably fine, but smp_mb__before_atomic() is release not acquire. If we want to play that game, then I guess we also need smp_mb__after_atomic() there. But it would be way easier to understand what's happens there and prove that it's correct, if we use store_release/load_acquire. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end
  2017-03-07 14:43 ` Dmitry Vyukov
@ 2017-03-07 15:27 ` Paul E. McKenney
  2017-03-07 18:37 ` Dmitry Vyukov
  2017-03-07 23:05 ` Boqun Feng
  0 siblings, 2 replies; 21+ messages in thread
From: Paul E. McKenney @ 2017-03-07 15:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Boqun Feng, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller

On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote:
> On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
> > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote:
> > [...]
> >> >>
> >> >> What is that mutex? And what locks/unlocks provide synchronization? I
> >> >> see that one uses exp_mutex and another -- exp_wake_mutex.
> >> >
> >> > Both of them.
> >> >
> >> > ->exp_mutex is acquired by the task requesting the grace period, and
> >> > the counter's first increment is done by that task under that mutex.
> >> > This task then schedules a workqueue, which drives forward the grace
> >> > period. Upon grace-period completion, the workqueue handler does the
> >> > second increment (the one that your patch addressed). The workqueue
> >> > handler then acquires ->exp_wake_mutex and wakes the task that holds
> >> > ->exp_mutex (along with all other tasks waiting for this grace period),
> >> > and that task releases ->exp_mutex, which allows the next grace period to
> >> > start (and the first increment for that next grace period to be carried
> >> > out under that lock). The workqueue handler releases ->exp_wake_mutex
> >> > after finishing its wakeups.
> >>
> >> Then we need the following for the case when task requesting the grace
> >> period does not block, right?
> >
> > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the
> > smp_mb__before_atomic() in sync_exp_work_done() already provide the
> > required ordering, no?
>
> smp_mb() is probably fine, but smp_mb__before_atomic() is release not
> acquire. If we want to play that game, then I guess we also need
> smp_mb__after_atomic() there. But it would be way easier to understand
> what's happens there and prove that it's correct, if we use
> store_release/load_acquire.

Fair point, how about the following?

							Thanx, Paul

------------------------------------------------------------------------

commit 6fd8074f1976596898e39f5b7ea1755652533906
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Mar 7 07:21:23 2017 -0800

    rcu: Add smp_mb__after_atomic() to sync_exp_work_done()

    The sync_exp_work_done() function needs to fully order the counter-check
    operation against anything happening after the corresponding grace period.
    This is a theoretical bug, as all current architectures either provide
    full ordering for atomic operation on the one hand or implement,
    however, a little future-proofing is a good thing.  This commit
    therefore adds smp_mb__after_atomic() after the atomic_long_inc()
    in sync_exp_work_done().

    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 027e123d93c7..652071abd9b4 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
+		smp_mb__after_atomic(); /* ^^^ */
 		return true;
 	}
 	return false;

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 15:27 ` Paul E. McKenney @ 2017-03-07 18:37 ` Dmitry Vyukov 2017-03-07 19:09 ` Paul E. McKenney 2017-03-07 23:05 ` Boqun Feng 1 sibling, 1 reply; 21+ messages in thread From: Dmitry Vyukov @ 2017-03-07 18:37 UTC (permalink / raw) To: Paul McKenney Cc: Boqun Feng, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Tue, Mar 7, 2017 at 4:27 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote: >> > [...] >> >> >> >> >> >> What is that mutex? And what locks/unlocks provide synchronization? I >> >> >> see that one uses exp_mutex and another -- exp_wake_mutex. >> >> > >> >> > Both of them. >> >> > >> >> > ->exp_mutex is acquired by the task requesting the grace period, and >> >> > the counter's first increment is done by that task under that mutex. >> >> > This task then schedules a workqueue, which drives forward the grace >> >> > period. Upon grace-period completion, the workqueue handler does the >> >> > second increment (the one that your patch addressed). The workqueue >> >> > handler then acquires ->exp_wake_mutex and wakes the task that holds >> >> > ->exp_mutex (along with all other tasks waiting for this grace period), >> >> > and that task releases ->exp_mutex, which allows the next grace period to >> >> > start (and the first increment for that next grace period to be carried >> >> > out under that lock). The workqueue handler releases ->exp_wake_mutex >> >> > after finishing its wakeups. >> >> >> >> Then we need the following for the case when task requesting the grace >> >> period does not block, right? >> > >> > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the >> > smp_mb__before_atomic() in sync_exp_work_done() already provide the >> > required ordering, no? >> >> smp_mb() is probably fine, but smp_mb__before_atomic() is release not >> acquire. If we want to play that game, then I guess we also need >> smp_mb__after_atomic() there. But it would be way easier to understand >> what's happens there and prove that it's correct, if we use >> store_release/load_acquire. > > Fair point, how about the following? I am not qualified enough to reason about these smp_mb__after_atomic. >From practical point of view there may be enough barriers in the resulting machine code already, but re formal semantics of adding smp_mb__after_atomic after an unrelated subsequent atomic RMW op I gave up. You must be the best candidate for this now :) > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Tue Mar 7 07:21:23 2017 -0800 > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > The sync_exp_work_done() function needs to fully order the counter-check > operation against anything happening after the corresponding grace period. > This is a theoretical bug, as all current architectures either provide > full ordering for atomic operation on the one hand or implement, > however, a little future-proofing is a good thing. This commit > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > in sync_exp_work_done(). > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index 027e123d93c7..652071abd9b4 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > /* Ensure test happens before caller kfree(). */ > smp_mb__before_atomic(); /* ^^^ */ > atomic_long_inc(stat); > + smp_mb__after_atomic(); /* ^^^ */ > return true; > } > return false; ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 18:37 ` Dmitry Vyukov @ 2017-03-07 19:09 ` Paul E. McKenney 0 siblings, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2017-03-07 19:09 UTC (permalink / raw) To: Dmitry Vyukov Cc: Boqun Feng, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Tue, Mar 07, 2017 at 07:37:57PM +0100, Dmitry Vyukov wrote: > On Tue, Mar 7, 2017 at 4:27 PM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > >> > [...] > >> >> >> > >> >> >> What is that mutex? And what locks/unlocks provide synchronization? I > >> >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > >> >> > > >> >> > Both of them. > >> >> > > >> >> > ->exp_mutex is acquired by the task requesting the grace period, and > >> >> > the counter's first increment is done by that task under that mutex. > >> >> > This task then schedules a workqueue, which drives forward the grace > >> >> > period. Upon grace-period completion, the workqueue handler does the > >> >> > second increment (the one that your patch addressed). The workqueue > >> >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > >> >> > ->exp_mutex (along with all other tasks waiting for this grace period), > >> >> > and that task releases ->exp_mutex, which allows the next grace period to > >> >> > start (and the first increment for that next grace period to be carried > >> >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > >> >> > after finishing its wakeups. > >> >> > >> >> Then we need the following for the case when task requesting the grace > >> >> period does not block, right? > >> > > >> > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > >> > smp_mb__before_atomic() in sync_exp_work_done() already provide the > >> > required ordering, no? > >> > >> smp_mb() is probably fine, but smp_mb__before_atomic() is release not > >> acquire. If we want to play that game, then I guess we also need > >> smp_mb__after_atomic() there. But it would be way easier to understand > >> what's happens there and prove that it's correct, if we use > >> store_release/load_acquire. > > > > Fair point, how about the following? > > I am not qualified enough to reason about these smp_mb__after_atomic. > >From practical point of view there may be enough barriers in the > resulting machine code already, but re formal semantics of adding > smp_mb__after_atomic after an unrelated subsequent atomic RMW op I > gave up. You must be the best candidate for this now :) Unfortunately, there are code paths from sync_exp_work_done() that have no memory barriers. :-( And I might be the best candidate, but this email thread has definitely shown that I am not infallable, never mind that there was already plenty of evidence on this particular point. So thank you again for your testing and review efforts! Thanx, Paul > > ------------------------------------------------------------------------ > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > The sync_exp_work_done() function needs to fully order the counter-check > > operation against anything happening after the corresponding grace period. > > This is a theoretical bug, as all current architectures either provide > > full ordering for atomic operation on the one hand or implement, > > however, a little future-proofing is a good thing. 
This commit > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > in sync_exp_work_done(). > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > index 027e123d93c7..652071abd9b4 100644 > > --- a/kernel/rcu/tree_exp.h > > +++ b/kernel/rcu/tree_exp.h > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > /* Ensure test happens before caller kfree(). */ > > smp_mb__before_atomic(); /* ^^^ */ > > atomic_long_inc(stat); > > + smp_mb__after_atomic(); /* ^^^ */ > > return true; > > } > > return false; > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 15:27 ` Paul E. McKenney 2017-03-07 18:37 ` Dmitry Vyukov @ 2017-03-07 23:05 ` Boqun Feng 2017-03-07 23:31 ` Paul E. McKenney 1 sibling, 1 reply; 21+ messages in thread From: Boqun Feng @ 2017-03-07 23:05 UTC (permalink / raw) To: Paul E. McKenney Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller [-- Attachment #1: Type: text/plain, Size: 4122 bytes --] On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > [...] > > >> >> > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > >> > > > >> > Both of them. > > >> > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > >> > the counter's first increment is done by that task under that mutex. > > >> > This task then schedules a workqueue, which drives forward the grace > > >> > period. Upon grace-period completion, the workqueue handler does the > > >> > second increment (the one that your patch addressed). The workqueue > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > >> > start (and the first increment for that next grace period to be carried > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > >> > after finishing its wakeups. > > >> > > >> Then we need the following for the case when task requesting the grace > > >> period does not block, right? > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > required ordering, no? > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > acquire. If we want to play that game, then I guess we also need The point is that smp_mb__before_atomic() + atomic_long_inc() will guarantee a smp_mb() before or right along with the atomic operation, and that's enough because rcu_seq_done() followed by a smp_mb() will give it a acquire-like behavior. > > smp_mb__after_atomic() there. But it would be way easier to understand Adding smp_mb__after_atomic() would be pointless as it's the load of ->expedited_sequence that we want to ensure having acquire behavior rather than the atomic increment of @stat. > > what's happens there and prove that it's correct, if we use > > store_release/load_acquire. > > Fair point, how about the following? > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Date: Tue Mar 7 07:21:23 2017 -0800 > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > The sync_exp_work_done() function needs to fully order the counter-check > operation against anything happening after the corresponding grace period. > This is a theoretical bug, as all current architectures either provide > full ordering for atomic operation on the one hand or implement, > however, a little future-proofing is a good thing. 
This commit > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > in sync_exp_work_done(). > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index 027e123d93c7..652071abd9b4 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > /* Ensure test happens before caller kfree(). */ > smp_mb__before_atomic(); /* ^^^ */ > atomic_long_inc(stat); > + smp_mb__after_atomic(); /* ^^^ */ If we really care about future-proofing, I think it's more safe to change smp_mb__before_atomic() to smp_mb() rather than adding __after_atomic() barrier. Though I think both would be unnecessary ;-) Regards, Boqun > return true; > } > return false; > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
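For concreteness, the alternative Boqun floats at the end of his message would look roughly like the sketch below. This is hypothetical, not a posted patch: a sync_exp_work_done()-style helper whose ordering argument rests on a bare smp_mb() rather than on how smp_mb__before_atomic() happens to be documented. The example_ name is made up; sp is the sequence counter and s a snapshot taken earlier.

    static bool example_work_done(unsigned long *sp, atomic_long_t *stat,
                                  unsigned long s)
    {
            if (ULONG_CMP_GE(READ_ONCE(*sp), s)) {  /* rcu_seq_done()-style counter check */
                    smp_mb();               /* order the check before anything after the GP */
                    atomic_long_inc(stat);  /* statistics only; carries no ordering duty */
                    return true;
            }
            return false;
    }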
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 23:05 ` Boqun Feng @ 2017-03-07 23:31 ` Paul E. McKenney 2017-03-08 1:39 ` Boqun Feng 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-07 23:31 UTC (permalink / raw) To: Boqun Feng Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote: > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > > [...] > > > >> >> > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > >> > > > > >> > Both of them. > > > >> > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > > >> > the counter's first increment is done by that task under that mutex. > > > >> > This task then schedules a workqueue, which drives forward the grace > > > >> > period. Upon grace-period completion, the workqueue handler does the > > > >> > second increment (the one that your patch addressed). The workqueue > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > > >> > start (and the first increment for that next grace period to be carried > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > > >> > after finishing its wakeups. > > > >> > > > >> Then we need the following for the case when task requesting the grace > > > >> period does not block, right? > > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > > required ordering, no? > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > > acquire. If we want to play that game, then I guess we also need > > The point is that smp_mb__before_atomic() + atomic_long_inc() will > guarantee a smp_mb() before or right along with the atomic operation, > and that's enough because rcu_seq_done() followed by a smp_mb() will > give it a acquire-like behavior. Given current architectures, true enough, from what I can see. However, let's take a look at atomic_ops.rst: If a caller requires memory barrier semantics around an atomic_t operation which does not return a value, a set of interfaces are defined which accomplish this:: void smp_mb__before_atomic(void); void smp_mb__after_atomic(void); For example, smp_mb__before_atomic() can be used like so:: obj->dead = 1; smp_mb__before_atomic(); atomic_dec(&obj->ref_count); It makes sure that all memory operations preceding the atomic_dec() call are strongly ordered with respect to the atomic counter operation. In the above example, it guarantees that the assignment of "1" to obj->dead will be globally visible to other cpus before the atomic counter decrement. Without the explicit smp_mb__before_atomic() call, the implementation could legally allow the atomic counter update visible to other cpus before the "obj->dead = 1;" assignment. So the ordering is guaranteed against the atomic operation, not necessarily the stuff after it. 
But again, the implementations I know of do make the guarantee, hence my calling it a theoretical bug in the commit log. > > > smp_mb__after_atomic() there. But it would be way easier to understand > > Adding smp_mb__after_atomic() would be pointless as it's the load of > ->expedited_sequence that we want to ensure having acquire behavior > rather than the atomic increment of @stat. Again, agreed given current code, but atomic_ops.rst doesn't guarantee ordering past the actual atomic operation itself. Thanx, Paul > > > what's happens there and prove that it's correct, if we use > > > store_release/load_acquire. > > > > Fair point, how about the following? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > The sync_exp_work_done() function needs to fully order the counter-check > > operation against anything happening after the corresponding grace period. > > This is a theoretical bug, as all current architectures either provide > > full ordering for atomic operation on the one hand or implement, > > however, a little future-proofing is a good thing. This commit > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > in sync_exp_work_done(). > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > index 027e123d93c7..652071abd9b4 100644 > > --- a/kernel/rcu/tree_exp.h > > +++ b/kernel/rcu/tree_exp.h > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > /* Ensure test happens before caller kfree(). */ > > smp_mb__before_atomic(); /* ^^^ */ > > atomic_long_inc(stat); > > + smp_mb__after_atomic(); /* ^^^ */ > > If we really care about future-proofing, I think it's more safe to > change smp_mb__before_atomic() to smp_mb() rather than adding > __after_atomic() barrier. Though I think both would be unnecessary ;-) > > Regards, > Boqun > > > return true; > > } > > return false; > > ^ permalink raw reply [flat|nested] 21+ messages in thread
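To make the distinction in the quoted documentation concrete, here is an illustrative two-CPU snippet with made-up variables (x, y, cnt, r0, r1). The documentation only promises that CPU 0's store to x is ordered before the atomic increment; whether CPU 0's later load of y is also ordered after the increment is exactly what it leaves open, and is why the added barrier is described as future-proofing.

    /* CPU 0 */
    WRITE_ONCE(x, 1);
    smp_mb__before_atomic();        /* documented: orders x = 1 before the increment */
    atomic_inc(&cnt);
    r0 = READ_ONCE(y);              /* ordering of this later load is the open question */

    /* CPU 1 */
    WRITE_ONCE(y, 1);
    smp_mb();
    r1 = READ_ONCE(x);

    /*
     * If CPU 0's load of y were fully ordered after the atomic increment,
     * the outcome r0 == 0 && r1 == 0 would be forbidden.  The quoted
     * documentation by itself does not promise that; current
     * implementations happen to, hence "theoretical bug".
     */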
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 23:31 ` Paul E. McKenney @ 2017-03-08 1:39 ` Boqun Feng 2017-03-08 2:26 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Boqun Feng @ 2017-03-08 1:39 UTC (permalink / raw) To: Paul E. McKenney Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller [-- Attachment #1: Type: text/plain, Size: 6979 bytes --] On Tue, Mar 07, 2017 at 03:31:54PM -0800, Paul E. McKenney wrote: > On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote: > > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > > > [...] > > > > >> >> > > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > >> > > > > > >> > Both of them. > > > > >> > > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > > > >> > the counter's first increment is done by that task under that mutex. > > > > >> > This task then schedules a workqueue, which drives forward the grace > > > > >> > period. Upon grace-period completion, the workqueue handler does the > > > > >> > second increment (the one that your patch addressed). The workqueue > > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > > > >> > start (and the first increment for that next grace period to be carried > > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > > > >> > after finishing its wakeups. > > > > >> > > > > >> Then we need the following for the case when task requesting the grace > > > > >> period does not block, right? > > > > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > > > required ordering, no? > > > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > > > acquire. If we want to play that game, then I guess we also need > > > > The point is that smp_mb__before_atomic() + atomic_long_inc() will > > guarantee a smp_mb() before or right along with the atomic operation, > > and that's enough because rcu_seq_done() followed by a smp_mb() will > > give it a acquire-like behavior. > > Given current architectures, true enough, from what I can see. > > However, let's take a look at atomic_ops.rst: > > > If a caller requires memory barrier semantics around an atomic_t > operation which does not return a value, a set of interfaces are > defined which accomplish this:: > > void smp_mb__before_atomic(void); > void smp_mb__after_atomic(void); > > For example, smp_mb__before_atomic() can be used like so:: > > obj->dead = 1; > smp_mb__before_atomic(); > atomic_dec(&obj->ref_count); > > It makes sure that all memory operations preceding the atomic_dec() > call are strongly ordered with respect to the atomic counter > operation. In the above example, it guarantees that the assignment of > "1" to obj->dead will be globally visible to other cpus before the > atomic counter decrement. 
> > Without the explicit smp_mb__before_atomic() call, the > implementation could legally allow the atomic counter update visible > to other cpus before the "obj->dead = 1;" assignment. > > So the ordering is guaranteed against the atomic operation, not > necessarily the stuff after it. But again, the implementations I know > of do make the guarantee, hence my calling it a theoretical bug in the > commit log. > Fair enough ;-) It's me who misunderstood this part of document. However, the names of the barriers are smp_mb__{before,after}_atomic(), so if they, semantically, only provide ordering for the corresponding atomic ops rather than a full barrier, I would their names are misleading ;-) > > > > smp_mb__after_atomic() there. But it would be way easier to understand > > > > Adding smp_mb__after_atomic() would be pointless as it's the load of > > ->expedited_sequence that we want to ensure having acquire behavior > > rather than the atomic increment of @stat. > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee > ordering past the actual atomic operation itself. > Neither does atomic_ops.rst guarantee the ordering between a load before the atomic op and memory accesses after the atomic op, right? I.e. atomic_ops.rst doesn't say no for reordering like this: r1 = READ_ONCE(a); ---------+ atomic_long_inc(b); | smp_mb__after_atomic(); | WRITE_ONCE(c); | {r1 = READ_ONCE(a)} <-------+ So it's still not an acquire for READ_ONCE(a), in our case "a" is ->expedited_sequence. To me, we can either fix the atomic_ops.rst or, as I proposed, just change smp_mb__before_atomic() to smp_mb(). Thoughts? Regards, Boqun > Thanx, Paul > > > > > what's happens there and prove that it's correct, if we use > > > > store_release/load_acquire. > > > > > > Fair point, how about the following? > > > > > > Thanx, Paul > > > > > > ------------------------------------------------------------------------ > > > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > > > The sync_exp_work_done() function needs to fully order the counter-check > > > operation against anything happening after the corresponding grace period. > > > This is a theoretical bug, as all current architectures either provide > > > full ordering for atomic operation on the one hand or implement, > > > however, a little future-proofing is a good thing. This commit > > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > > in sync_exp_work_done(). > > > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > index 027e123d93c7..652071abd9b4 100644 > > > --- a/kernel/rcu/tree_exp.h > > > +++ b/kernel/rcu/tree_exp.h > > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > > /* Ensure test happens before caller kfree(). */ > > > smp_mb__before_atomic(); /* ^^^ */ > > > atomic_long_inc(stat); > > > + smp_mb__after_atomic(); /* ^^^ */ > > > > If we really care about future-proofing, I think it's more safe to > > change smp_mb__before_atomic() to smp_mb() rather than adding > > __after_atomic() barrier. 
Though I think both would be unnecessary ;-) > > > > Regards, > > Boqun > > > > > return true; > > > } > > > return false; > > > > > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-08 1:39 ` Boqun Feng @ 2017-03-08 2:26 ` Paul E. McKenney 2017-03-08 2:44 ` Boqun Feng 0 siblings, 1 reply; 21+ messages in thread From: Paul E. McKenney @ 2017-03-08 2:26 UTC (permalink / raw) To: Boqun Feng Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Wed, Mar 08, 2017 at 09:39:13AM +0800, Boqun Feng wrote: > On Tue, Mar 07, 2017 at 03:31:54PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote: > > > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > > > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > > > > [...] > > > > > >> >> > > > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > > >> > > > > > > >> > Both of them. > > > > > >> > > > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > > > > >> > the counter's first increment is done by that task under that mutex. > > > > > >> > This task then schedules a workqueue, which drives forward the grace > > > > > >> > period. Upon grace-period completion, the workqueue handler does the > > > > > >> > second increment (the one that your patch addressed). The workqueue > > > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > > > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > > > > >> > start (and the first increment for that next grace period to be carried > > > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > > > > >> > after finishing its wakeups. > > > > > >> > > > > > >> Then we need the following for the case when task requesting the grace > > > > > >> period does not block, right? > > > > > > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > > > > required ordering, no? > > > > > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > > > > acquire. If we want to play that game, then I guess we also need > > > > > > The point is that smp_mb__before_atomic() + atomic_long_inc() will > > > guarantee a smp_mb() before or right along with the atomic operation, > > > and that's enough because rcu_seq_done() followed by a smp_mb() will > > > give it a acquire-like behavior. > > > > Given current architectures, true enough, from what I can see. > > > > However, let's take a look at atomic_ops.rst: > > > > > > If a caller requires memory barrier semantics around an atomic_t > > operation which does not return a value, a set of interfaces are > > defined which accomplish this:: > > > > void smp_mb__before_atomic(void); > > void smp_mb__after_atomic(void); > > > > For example, smp_mb__before_atomic() can be used like so:: > > > > obj->dead = 1; > > smp_mb__before_atomic(); > > atomic_dec(&obj->ref_count); > > > > It makes sure that all memory operations preceding the atomic_dec() > > call are strongly ordered with respect to the atomic counter > > operation. 
In the above example, it guarantees that the assignment of > > "1" to obj->dead will be globally visible to other cpus before the > > atomic counter decrement. > > > > Without the explicit smp_mb__before_atomic() call, the > > implementation could legally allow the atomic counter update visible > > to other cpus before the "obj->dead = 1;" assignment. > > > > So the ordering is guaranteed against the atomic operation, not > > necessarily the stuff after it. But again, the implementations I know > > of do make the guarantee, hence my calling it a theoretical bug in the > > commit log. > > Fair enough ;-) It's me who misunderstood this part of document. > > However, the names of the barriers are smp_mb__{before,after}_atomic(), > so if they, semantically, only provide ordering for the corresponding > atomic ops rather than a full barrier, I would their names are > misleading ;-) Well, if you have both ordering before and after, then you have full ordering. > > > > > smp_mb__after_atomic() there. But it would be way easier to understand > > > > > > Adding smp_mb__after_atomic() would be pointless as it's the load of > > > ->expedited_sequence that we want to ensure having acquire behavior > > > rather than the atomic increment of @stat. > > > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee > > ordering past the actual atomic operation itself. > > Neither does atomic_ops.rst guarantee the ordering between a load before > the atomic op and memory accesses after the atomic op, right? I.e. > atomic_ops.rst doesn't say no for reordering like this: > > r1 = READ_ONCE(a); ---------+ > atomic_long_inc(b); | > smp_mb__after_atomic(); | > WRITE_ONCE(c); | > {r1 = READ_ONCE(a)} <-------+ > > So it's still not an acquire for READ_ONCE(a), in our case "a" is > ->expedited_sequence. > > To me, we can either fix the atomic_ops.rst or, as I proposed, just > change smp_mb__before_atomic() to smp_mb(). Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(), as is the usual approach when you need full ordering. ;-) Thanx, Paul > Thoughts? > > Regards, > Boqun > > > Thanx, Paul > > > > > > > what's happens there and prove that it's correct, if we use > > > > > store_release/load_acquire. > > > > > > > > Fair point, how about the following? > > > > > > > > Thanx, Paul > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > > > > > The sync_exp_work_done() function needs to fully order the counter-check > > > > operation against anything happening after the corresponding grace period. > > > > This is a theoretical bug, as all current architectures either provide > > > > full ordering for atomic operation on the one hand or implement, > > > > however, a little future-proofing is a good thing. This commit > > > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > > > in sync_exp_work_done(). > > > > > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > > > Signed-off-by: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> > > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > > index 027e123d93c7..652071abd9b4 100644 > > > > --- a/kernel/rcu/tree_exp.h > > > > +++ b/kernel/rcu/tree_exp.h > > > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > > > /* Ensure test happens before caller kfree(). */ > > > > smp_mb__before_atomic(); /* ^^^ */ > > > > atomic_long_inc(stat); > > > > + smp_mb__after_atomic(); /* ^^^ */ > > > > > > If we really care about future-proofing, I think it's more safe to > > > change smp_mb__before_atomic() to smp_mb() rather than adding > > > __after_atomic() barrier. Though I think both would be unnecessary ;-) > > > > > > Regards, > > > Boqun > > > > > > > return true; > > > > } > > > > return false; > > > > > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
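Spelling out Paul's point against Boqun's earlier diagram (same made-up variables a, b, c): with a barrier on each side of the non-value-returning atomic, the early load can no longer be reordered past the later store, because each barrier orders its own side with respect to the atomic operation and the two compose.

    r1 = READ_ONCE(a);
    smp_mb__before_atomic();        /* orders the load of a before the increment */
    atomic_long_inc(b);
    smp_mb__after_atomic();         /* orders the increment before the store to c */
    WRITE_ONCE(c, 1);
    /* net effect: the load of a is ordered before the store to c */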
* Re: rcu: WARNING in rcu_seq_end 2017-03-08 2:26 ` Paul E. McKenney @ 2017-03-08 2:44 ` Boqun Feng 2017-03-08 3:08 ` Paul E. McKenney 0 siblings, 1 reply; 21+ messages in thread From: Boqun Feng @ 2017-03-08 2:44 UTC (permalink / raw) To: Paul E. McKenney Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller [-- Attachment #1: Type: text/plain, Size: 8332 bytes --] On Tue, Mar 07, 2017 at 06:26:03PM -0800, Paul E. McKenney wrote: > On Wed, Mar 08, 2017 at 09:39:13AM +0800, Boqun Feng wrote: > > On Tue, Mar 07, 2017 at 03:31:54PM -0800, Paul E. McKenney wrote: > > > On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote: > > > > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > > > > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > > > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > > > > > [...] > > > > > > >> >> > > > > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > > > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > > > >> > > > > > > > >> > Both of them. > > > > > > >> > > > > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > > > > > >> > the counter's first increment is done by that task under that mutex. > > > > > > >> > This task then schedules a workqueue, which drives forward the grace > > > > > > >> > period. Upon grace-period completion, the workqueue handler does the > > > > > > >> > second increment (the one that your patch addressed). The workqueue > > > > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > > > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > > > > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > > > > > >> > start (and the first increment for that next grace period to be carried > > > > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > > > > > >> > after finishing its wakeups. > > > > > > >> > > > > > > >> Then we need the following for the case when task requesting the grace > > > > > > >> period does not block, right? > > > > > > > > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > > > > > required ordering, no? > > > > > > > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > > > > > acquire. If we want to play that game, then I guess we also need > > > > > > > > The point is that smp_mb__before_atomic() + atomic_long_inc() will > > > > guarantee a smp_mb() before or right along with the atomic operation, > > > > and that's enough because rcu_seq_done() followed by a smp_mb() will > > > > give it a acquire-like behavior. > > > > > > Given current architectures, true enough, from what I can see. 
> > > > > > However, let's take a look at atomic_ops.rst: > > > > > > > > > If a caller requires memory barrier semantics around an atomic_t > > > operation which does not return a value, a set of interfaces are > > > defined which accomplish this:: > > > > > > void smp_mb__before_atomic(void); > > > void smp_mb__after_atomic(void); > > > > > > For example, smp_mb__before_atomic() can be used like so:: > > > > > > obj->dead = 1; > > > smp_mb__before_atomic(); > > > atomic_dec(&obj->ref_count); > > > > > > It makes sure that all memory operations preceding the atomic_dec() > > > call are strongly ordered with respect to the atomic counter > > > operation. In the above example, it guarantees that the assignment of > > > "1" to obj->dead will be globally visible to other cpus before the > > > atomic counter decrement. > > > > > > Without the explicit smp_mb__before_atomic() call, the > > > implementation could legally allow the atomic counter update visible > > > to other cpus before the "obj->dead = 1;" assignment. > > > > > > So the ordering is guaranteed against the atomic operation, not > > > necessarily the stuff after it. But again, the implementations I know > > > of do make the guarantee, hence my calling it a theoretical bug in the > > > commit log. > > > > Fair enough ;-) It's me who misunderstood this part of document. > > > > However, the names of the barriers are smp_mb__{before,after}_atomic(), > > so if they, semantically, only provide ordering for the corresponding > > atomic ops rather than a full barrier, I would their names are > > misleading ;-) > > Well, if you have both ordering before and after, then you have full > ordering. > I mean the names of the barriers are *smp_mb*__before_atomic() and *smp_mb*__after_atomic(), so it's natural to think they provide a smp_mb() in some situations ;-) > > > > > > smp_mb__after_atomic() there. But it would be way easier to understand > > > > > > > > Adding smp_mb__after_atomic() would be pointless as it's the load of > > > > ->expedited_sequence that we want to ensure having acquire behavior > > > > rather than the atomic increment of @stat. > > > > > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee > > > ordering past the actual atomic operation itself. > > > > Neither does atomic_ops.rst guarantee the ordering between a load before > > the atomic op and memory accesses after the atomic op, right? I.e. > > atomic_ops.rst doesn't say no for reordering like this: > > > > r1 = READ_ONCE(a); ---------+ > > atomic_long_inc(b); | > > smp_mb__after_atomic(); | > > WRITE_ONCE(c); | > > {r1 = READ_ONCE(a)} <-------+ > > > > So it's still not an acquire for READ_ONCE(a), in our case "a" is > > ->expedited_sequence. > > > > To me, we can either fix the atomic_ops.rst or, as I proposed, just > > change smp_mb__before_atomic() to smp_mb(). > > Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(), > as is the usual approach when you need full ordering. ;-) > Yes ;-) It's just that "adding a barrier after one operation to provide acquire semantic for another operation" looks weird to me. Regards, Boqun > Thanx, Paul > > > Thoughts? > > > > Regards, > > Boqun > > > > > Thanx, Paul > > > > > > > > > what's happens there and prove that it's correct, if we use > > > > > > store_release/load_acquire. > > > > > > > > > > Fair point, how about the following? 
> > > > > > > > > > Thanx, Paul > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > > > > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > > > > > > > The sync_exp_work_done() function needs to fully order the counter-check > > > > > operation against anything happening after the corresponding grace period. > > > > > This is a theoretical bug, as all current architectures either provide > > > > > full ordering for atomic operation on the one hand or implement, > > > > > however, a little future-proofing is a good thing. This commit > > > > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > > > > in sync_exp_work_done(). > > > > > > > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > > > index 027e123d93c7..652071abd9b4 100644 > > > > > --- a/kernel/rcu/tree_exp.h > > > > > +++ b/kernel/rcu/tree_exp.h > > > > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > > > > /* Ensure test happens before caller kfree(). */ > > > > > smp_mb__before_atomic(); /* ^^^ */ > > > > > atomic_long_inc(stat); > > > > > + smp_mb__after_atomic(); /* ^^^ */ > > > > > > > > If we really care about future-proofing, I think it's more safe to > > > > change smp_mb__before_atomic() to smp_mb() rather than adding > > > > __after_atomic() barrier. Though I think both would be unnecessary ;-) > > > > > > > > Regards, > > > > Boqun > > > > > > > > > return true; > > > > > } > > > > > return false; > > > > > > > > > > > > > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-08 2:44 ` Boqun Feng @ 2017-03-08 3:08 ` Paul E. McKenney 0 siblings, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2017-03-08 3:08 UTC (permalink / raw) To: Boqun Feng Cc: Dmitry Vyukov, josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Wed, Mar 08, 2017 at 10:44:17AM +0800, Boqun Feng wrote: > On Tue, Mar 07, 2017 at 06:26:03PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 08, 2017 at 09:39:13AM +0800, Boqun Feng wrote: > > > On Tue, Mar 07, 2017 at 03:31:54PM -0800, Paul E. McKenney wrote: > > > > On Wed, Mar 08, 2017 at 07:05:13AM +0800, Boqun Feng wrote: > > > > > On Tue, Mar 07, 2017 at 07:27:15AM -0800, Paul E. McKenney wrote: > > > > > > On Tue, Mar 07, 2017 at 03:43:42PM +0100, Dmitry Vyukov wrote: > > > > > > > On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun.feng@gmail.com> wrote: > > > > > > > > On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > > > > > > > > [...] > > > > > > > >> >> > > > > > > > >> >> What is that mutex? And what locks/unlocks provide synchronization? I > > > > > > > >> >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > > > > >> > > > > > > > > >> > Both of them. > > > > > > > >> > > > > > > > > >> > ->exp_mutex is acquired by the task requesting the grace period, and > > > > > > > >> > the counter's first increment is done by that task under that mutex. > > > > > > > >> > This task then schedules a workqueue, which drives forward the grace > > > > > > > >> > period. Upon grace-period completion, the workqueue handler does the > > > > > > > >> > second increment (the one that your patch addressed). The workqueue > > > > > > > >> > handler then acquires ->exp_wake_mutex and wakes the task that holds > > > > > > > >> > ->exp_mutex (along with all other tasks waiting for this grace period), > > > > > > > >> > and that task releases ->exp_mutex, which allows the next grace period to > > > > > > > >> > start (and the first increment for that next grace period to be carried > > > > > > > >> > out under that lock). The workqueue handler releases ->exp_wake_mutex > > > > > > > >> > after finishing its wakeups. > > > > > > > >> > > > > > > > >> Then we need the following for the case when task requesting the grace > > > > > > > >> period does not block, right? > > > > > > > > > > > > > > > > Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the > > > > > > > > smp_mb__before_atomic() in sync_exp_work_done() already provide the > > > > > > > > required ordering, no? > > > > > > > > > > > > > > smp_mb() is probably fine, but smp_mb__before_atomic() is release not > > > > > > > acquire. If we want to play that game, then I guess we also need > > > > > > > > > > The point is that smp_mb__before_atomic() + atomic_long_inc() will > > > > > guarantee a smp_mb() before or right along with the atomic operation, > > > > > and that's enough because rcu_seq_done() followed by a smp_mb() will > > > > > give it a acquire-like behavior. > > > > > > > > Given current architectures, true enough, from what I can see. 
> > > > > > > > However, let's take a look at atomic_ops.rst: > > > > > > > > > > > > If a caller requires memory barrier semantics around an atomic_t > > > > operation which does not return a value, a set of interfaces are > > > > defined which accomplish this:: > > > > > > > > void smp_mb__before_atomic(void); > > > > void smp_mb__after_atomic(void); > > > > > > > > For example, smp_mb__before_atomic() can be used like so:: > > > > > > > > obj->dead = 1; > > > > smp_mb__before_atomic(); > > > > atomic_dec(&obj->ref_count); > > > > > > > > It makes sure that all memory operations preceding the atomic_dec() > > > > call are strongly ordered with respect to the atomic counter > > > > operation. In the above example, it guarantees that the assignment of > > > > "1" to obj->dead will be globally visible to other cpus before the > > > > atomic counter decrement. > > > > > > > > Without the explicit smp_mb__before_atomic() call, the > > > > implementation could legally allow the atomic counter update visible > > > > to other cpus before the "obj->dead = 1;" assignment. > > > > > > > > So the ordering is guaranteed against the atomic operation, not > > > > necessarily the stuff after it. But again, the implementations I know > > > > of do make the guarantee, hence my calling it a theoretical bug in the > > > > commit log. > > > > > > Fair enough ;-) It's me who misunderstood this part of document. > > > > > > However, the names of the barriers are smp_mb__{before,after}_atomic(), > > > so if they, semantically, only provide ordering for the corresponding > > > atomic ops rather than a full barrier, I would their names are > > > misleading ;-) > > > > Well, if you have both ordering before and after, then you have full > > ordering. > > I mean the names of the barriers are *smp_mb*__before_atomic() and > *smp_mb*__after_atomic(), so it's natural to think they provide a > smp_mb() in some situations ;-) > > > > > > > > smp_mb__after_atomic() there. But it would be way easier to understand > > > > > > > > > > Adding smp_mb__after_atomic() would be pointless as it's the load of > > > > > ->expedited_sequence that we want to ensure having acquire behavior > > > > > rather than the atomic increment of @stat. > > > > > > > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee > > > > ordering past the actual atomic operation itself. > > > > > > Neither does atomic_ops.rst guarantee the ordering between a load before > > > the atomic op and memory accesses after the atomic op, right? I.e. > > > atomic_ops.rst doesn't say no for reordering like this: > > > > > > r1 = READ_ONCE(a); ---------+ > > > atomic_long_inc(b); | > > > smp_mb__after_atomic(); | > > > WRITE_ONCE(c); | > > > {r1 = READ_ONCE(a)} <-------+ > > > > > > So it's still not an acquire for READ_ONCE(a), in our case "a" is > > > ->expedited_sequence. > > > > > > To me, we can either fix the atomic_ops.rst or, as I proposed, just > > > change smp_mb__before_atomic() to smp_mb(). > > > > Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(), > > as is the usual approach when you need full ordering. ;-) > > Yes ;-) It's just that "adding a barrier after one operation to provide > acquire semantic for another operation" looks weird to me. But they are memory barriers! They are -suppposed- to look weird! ;-) Thanx, Paul > Regards, > Boqun > > > Thanx, Paul > > > > > Thoughts? 
> > > > > > Regards, > > > Boqun > > > > > > > Thanx, Paul > > > > > > > > > > > what's happens there and prove that it's correct, if we use > > > > > > > store_release/load_acquire. > > > > > > > > > > > > Fair point, how about the following? > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > > > commit 6fd8074f1976596898e39f5b7ea1755652533906 > > > > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > > Date: Tue Mar 7 07:21:23 2017 -0800 > > > > > > > > > > > > rcu: Add smp_mb__after_atomic() to sync_exp_work_done() > > > > > > > > > > > > The sync_exp_work_done() function needs to fully order the counter-check > > > > > > operation against anything happening after the corresponding grace period. > > > > > > This is a theoretical bug, as all current architectures either provide > > > > > > full ordering for atomic operation on the one hand or implement, > > > > > > however, a little future-proofing is a good thing. This commit > > > > > > therefore adds smp_mb__after_atomic() after the atomic_long_inc() > > > > > > in sync_exp_work_done(). > > > > > > > > > > > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > > > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > > > > > > > > > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > > > > > > index 027e123d93c7..652071abd9b4 100644 > > > > > > --- a/kernel/rcu/tree_exp.h > > > > > > +++ b/kernel/rcu/tree_exp.h > > > > > > @@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat, > > > > > > /* Ensure test happens before caller kfree(). */ > > > > > > smp_mb__before_atomic(); /* ^^^ */ > > > > > > atomic_long_inc(stat); > > > > > > + smp_mb__after_atomic(); /* ^^^ */ > > > > > > > > > > If we really care about future-proofing, I think it's more safe to > > > > > change smp_mb__before_atomic() to smp_mb() rather than adding > > > > > __after_atomic() barrier. Though I think both would be unnecessary ;-) > > > > > > > > > > Regards, > > > > > Boqun > > > > > > > > > > > return true; > > > > > > } > > > > > > return false; > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: rcu: WARNING in rcu_seq_end 2017-03-07 7:05 ` Dmitry Vyukov 2017-03-07 14:27 ` Boqun Feng @ 2017-03-07 15:16 ` Paul E. McKenney 1 sibling, 0 replies; 21+ messages in thread From: Paul E. McKenney @ 2017-03-07 15:16 UTC (permalink / raw) To: Dmitry Vyukov Cc: josh, Steven Rostedt, Mathieu Desnoyers, jiangshanlai, LKML, syzkaller On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote: > On Tue, Mar 7, 2017 at 12:08 AM, Paul E. McKenney > <paulmck@linux.vnet.ibm.com> wrote: > >> >> >> >> Hello, > >> >> >> >> > >> >> >> >> Paul, you wanted bugs in rcu. > >> >> >> > > >> >> >> > Well, whether I want them or not, I must deal with them. ;-) > >> >> >> > > >> >> >> >> I've got this WARNING while running syzkaller fuzzer on > >> >> >> >> 86292b33d4b79ee03e2f43ea0381ef85f077c760: > >> >> >> >> > >> >> >> >> ------------[ cut here ]------------ > >> >> >> >> WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533 > >> >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> >> >> Kernel panic - not syncing: panic_on_warn set ... > >> >> >> >> CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276 > >> >> >> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > >> >> >> >> Workqueue: events wait_rcu_exp_gp > >> >> >> >> Call Trace: > >> >> >> >> __dump_stack lib/dump_stack.c:15 [inline] > >> >> >> >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > >> >> >> >> panic+0x1fb/0x412 kernel/panic.c:179 > >> >> >> >> __warn+0x1c4/0x1e0 kernel/panic.c:540 > >> >> >> >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:583 > >> >> >> >> rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533 > >> >> >> >> rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline] > >> >> >> >> rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517 > >> >> >> >> rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline] > >> >> >> >> wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570 > >> >> >> >> process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096 > >> >> >> >> worker_thread+0x223/0x19c0 kernel/workqueue.c:2230 > >> >> >> >> kthread+0x326/0x3f0 kernel/kthread.c:227 > >> >> >> >> ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430 > >> >> >> >> Dumping ftrace buffer: > >> >> >> >> (ftrace buffer empty) > >> >> >> >> Kernel Offset: disabled > >> >> >> >> Rebooting in 86400 seconds.. > >> >> >> >> > >> >> >> >> > >> >> >> >> Not reproducible. But looking at the code, shouldn't it be: > >> >> >> >> > >> >> >> >> static void rcu_seq_end(unsigned long *sp) > >> >> >> >> { > >> >> >> >> smp_mb(); /* Ensure update-side operation before counter increment. */ > >> >> >> >> + WARN_ON_ONCE(!(*sp & 0x1)); > >> >> >> >> WRITE_ONCE(*sp, *sp + 1); > >> >> >> >> - WARN_ON_ONCE(*sp & 0x1); > >> >> >> >> } > >> >> >> >> > >> >> >> >> ? > >> >> >> >> > >> >> >> >> Otherwise wait_event in _synchronize_rcu_expedited can return as soon > >> >> >> >> as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand this > >> >> >> >> consequently can allow start of next grace periods. Which in turn can > >> >> >> >> make the warning fire. Am I missing something? > >> >> >> >> > >> >> >> >> I don't see any other bad consequences of this. The rest of > >> >> >> >> rcu_exp_wait_wake can proceed when _synchronize_rcu_expedited has > >> >> >> >> returned and destroyed work on stack and next period has started and > >> >> >> >> ended, but it seems OK. > >> >> >> > > >> >> >> > I believe that this is a heygood change, but I don't see how it will > >> >> >> > help in this case. BTW, may I have your Signed-off-by? 
> >> >> >> > > >> >> >> > The reason I don't believe that it will help is that the > >> >> >> > rcu_exp_gp_seq_end() function is called from a workqueue handler that > >> >> >> > is invoked holding ->exp_mutex, and this mutex is not released until > >> >> >> > after the handler invokes rcu_seq_end() and then wakes up the task that > >> >> >> > scheduled the workqueue handler. So the ordering above should not matter > >> >> >> > (but I agree that your ordering is cleaner. > >> >> >> > > >> >> >> > That said, it looks like I am missing some memory barriers, please > >> >> >> > see the following patch. > >> >> >> > > >> >> >> > But what architecture did you see this on? > >> >> >> > >> >> >> > >> >> >> This is just x86. > >> >> >> > >> >> >> You seem to assume that wait_event() waits for the wakeup. It does not > >> >> >> work this way. It can return as soon as the condition becomes true > >> >> >> without ever waiting: > >> >> >> > >> >> >> 305 #define wait_event(wq, condition) \ > >> >> >> 306 do { \ > >> >> >> 307 might_sleep(); \ > >> >> >> 308 if (condition) \ > >> >> >> 309 break; \ > >> >> >> 310 __wait_event(wq, condition); \ > >> >> >> 311 } while (0) > >> >> > > >> >> > Agreed, hence my patch in the previous email. I guess I knew that, but > >> >> > >> >> Ah, you meant to synchronize rcu_seq_end with rcu_seq_done? > >> > > >> > No, there is a mutex release and acquisition that do the synchronization, > >> > but only if the wakeup has appropriate barriers. The issue is that > >> > part of the mutex's critical section executes in a workqueue, possibly > >> > on some other CPU. > >> > >> What is that mutex? And what locks/unlocks provide synchronization? I > >> see that one uses exp_mutex and another -- exp_wake_mutex. > > > > Both of them. > > > > ->exp_mutex is acquired by the task requesting the grace period, and > > the counter's first increment is done by that task under that mutex. > > This task then schedules a workqueue, which drives forward the grace > > period. Upon grace-period completion, the workqueue handler does the > > second increment (the one that your patch addressed). The workqueue > > handler then acquires ->exp_wake_mutex and wakes the task that holds > > ->exp_mutex (along with all other tasks waiting for this grace period), > > and that task releases ->exp_mutex, which allows the next grace period to > > start (and the first increment for that next grace period to be carried > > out under that lock). The workqueue handler releases ->exp_wake_mutex > > after finishing its wakeups. First and foremost, thank you for reviewing this stuff!!! > Then we need the following for the case when task requesting the grace > period does not block, right? > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index d80c2587bed8..aa7ba83f6a56 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp) > static void rcu_seq_end(unsigned long *sp) > { > smp_mb(); /* Ensure update-side operation before counter increment. */ > - WRITE_ONCE(*sp, *sp + 1); > + smp_store_release(sp, *sp + 1); > WARN_ON_ONCE(*sp & 0x1); > } The above is not needed, as the smp_mb() covers it completely. 
> @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp) > */ > static bool rcu_seq_done(unsigned long *sp, unsigned long s) > { > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > + return ULONG_CMP_GE(smp_load_acquire(sp), s); And this one is covered by the smp_mb__before_atomic() inside the "if" statement in sync_exp_work_done(), via rcu_exp_gp_seq_done(). That is, sync_exp_work_done() invokes rcu_exp_gp_seq_done() which in turn invokes rcu_seq_done(). Why smp_mb() instead of release-acquire? Because the grace-period relationship is quite strong, it cannot be implemented with release-acquire on all platforms. Thanx, Paul ^ permalink raw reply [flat|nested] 21+ messages in thread
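Putting the pieces of the thread together, the overall flow Paul describes looks roughly like the sketch below, reusing the example_ helpers from the first sketch. Everything here (the statics, the DECLARE_WORK() plumbing, and the function names) is illustrative and heavily simplified; the real code lives in kernel/rcu/tree_exp.h.

    static unsigned long gp_seq;                    /* the sequence counter */
    static DEFINE_MUTEX(exp_mutex);                 /* held by the requester driving a GP */
    static DEFINE_MUTEX(exp_wake_mutex);            /* serializes the wakeups */
    static DECLARE_WAIT_QUEUE_HEAD(gp_wq);

    static void example_gp_handler(struct work_struct *unused)
    {
            /* ... wait for every CPU to pass through a quiescent state ... */
            example_seq_end(&gp_seq);               /* second increment: where the WARN fired */
            mutex_lock(&exp_wake_mutex);
            wake_up_all(&gp_wq);                    /* wake the requester and other waiters */
            mutex_unlock(&exp_wake_mutex);
    }

    static DECLARE_WORK(gp_work, example_gp_handler);

    static void example_synchronize_expedited(void)
    {
            unsigned long s = example_seq_snap(&gp_seq);

            mutex_lock(&exp_mutex);
            example_seq_start(&gp_seq);             /* first increment, under ->exp_mutex */
            schedule_work(&gp_work);                /* handler above drives the grace period */
            wait_event(gp_wq, example_seq_done(&gp_seq, s)); /* may return without any wakeup */
            mutex_unlock(&exp_mutex);               /* allows the next grace period to start */
    }

The subtlety discussed in this thread is that wait_event() can observe the second increment and return without any wakeup at all, so the guarantees have to come from the barriers around the counter operations and from ->exp_mutex/->exp_wake_mutex, not from the wakeup path.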
Thread overview: 21+ messages (end of thread; newest: 2017-03-08 3:14 UTC)

2017-03-04 16:01 rcu: WARNING in rcu_seq_end Dmitry Vyukov
2017-03-04 20:40 ` Paul E. McKenney
2017-03-05 10:50 ` Dmitry Vyukov
2017-03-05 18:47 ` Paul E. McKenney
2017-03-06 9:24 ` Dmitry Vyukov
2017-03-06 10:07 ` Paul E. McKenney
2017-03-06 10:11 ` Dmitry Vyukov
2017-03-06 23:08 ` Paul E. McKenney
2017-03-07 7:05 ` Dmitry Vyukov
2017-03-07 14:27 ` Boqun Feng
2017-03-07 14:43 ` Dmitry Vyukov
2017-03-07 15:27 ` Paul E. McKenney
2017-03-07 18:37 ` Dmitry Vyukov
2017-03-07 19:09 ` Paul E. McKenney
2017-03-07 23:05 ` Boqun Feng
2017-03-07 23:31 ` Paul E. McKenney
2017-03-08 1:39 ` Boqun Feng
2017-03-08 2:26 ` Paul E. McKenney
2017-03-08 2:44 ` Boqun Feng
2017-03-08 3:08 ` Paul E. McKenney
2017-03-07 15:16 ` Paul E. McKenney