From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Andrey Konovalov <andreyknvl@google.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
LKML <linux-kernel@vger.kernel.org>,
syzkaller <syzkaller@googlegroups.com>,
Dmitry Vyukov <dvyukov@google.com>,
Kostya Serebryany <kcc@google.com>
Subject: Re: srcu: BUG in __synchronize_srcu
Date: Fri, 10 Mar 2017 14:26:09 -0800 [thread overview]
Message-ID: <20170310222609.GN30506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAAeHK+yeM99Lvod-EfZPGxMBbmDSzBDRzaLwB+upqYeq6uCZ1Q@mail.gmail.com>
On Fri, Mar 10, 2017 at 08:29:55PM +0100, Andrey Konovalov wrote:
> On Fri, Mar 10, 2017 at 8:28 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> > Hi,
> >
> > I've got the following error report while fuzzing the kernel with
> > syzkaller on an arm64 board.
>
> This also happened on x86 a few times during fuzzing, however it
> wasn't reproducible.
>
> >
> > On linux-next commit 56b8bad5e066c23e8fa273ef5fba50bd3da2ace8 (Mar 8).
> >
> > A reproducer and .config are attached.
> >
> > The bug happens while executing the following syzkaller program in a loop.
> > While it looks kvm-related, it might be that kvm just stresses the
> > srcu subsystem.
> >
> > mmap(&(0x7f0000000000/0xfff000)=nil, (0xfff000), 0x3, 0x32,
> > 0xffffffffffffffff, 0x0)
> > r0 = openat$kvm(0xffffffffffffff9c,
> > &(0x7f0000a05000)="2f6465762f6b766d00", 0x0, 0x0)
> > ioctl$KVM_CREATE_VM(r0, 0xae01, 0x0)
> >
> > ------------[ cut here ]------------
> > kernel BUG at kernel/rcu/srcu.c:436!
This is on v4.10, correct?
> > Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > Modules linked in: meson_drm drm_kms_helper drm dwmac_generic realtek
> > dwmac_meson8b stmmac_platform stmmac meson_rng rng_core meson_gxbb_wdt
> > ipv6
> > CPU: 3 PID: 4250 Comm: a.out Not tainted 4.11.0-rc1-next-20170308-xc2-dirty #3
> > Hardware name: Hardkernel ODROID-C2 (DT)
> > task: ffff800063699700 task.stack: ffff800063cfc000
> > PC is at[< none >] __synchronize_srcu+0x3d0/0x470
> > kernel/rcu/srcu.c:412
> > LR is at[< none >] __synchronize_srcu+0x130/0x470
> > kernel/rcu/srcu.c:434
> > pc : [<ffff20000821a3b8>] lr : [<ffff20000821a118>] pstate: 80000145
> > sp : ffff800063cffb00
> > x29: ffff800063cffb00 x28: ffff80005b1d6e00
> > x27: 1fffe4000156b242 x26: ffff800063cffb70
> > x25: 1fffe4000156b23b x24: ffff20000ab591d8
> > x23: ffff200009dbf000 x22: ffff20000ab591a0
> > x21: ffff20000ab59210 x20: ffff800063cffb70
> > x19: ffff20000ab59190 x18: 0000000000000a03
> > x17: 0000ffff944f3950 x16: ffff20000811f818
> > x15: 0000000000000000 x14: 0000000000000007
> > x13: 0000000000000002 x12: 0000000000000000
> > x11: 0000000000000040 x10: 1fffe400014b568c
> > x9 : ffff20000ab29000 x8 : 0000000000000007
> > x7 : 0000000000000001 x6 : 0000000000000000
> > x5 : 0000000000000040 x4 : 0000000000000003
> > x3 : ffff20000ab59208 x2 : 1fffe4000156b243
> > x1 : 0000000000000000 x0 : ffff80005e71fb70
> >
> > Process a.out (pid: 4250, stack limit = 0xffff800063cfc000)
> > Stack: (0xffff800063cffb00 to 0xffff800063d00000)
> > fb00: ffff800063cffbd0 ffff20000821a480 ffff20000ab59190 1ffff0000b63adc0
> > fb20: dfff200000000000 ffff20000ab59190 ffff80004b9a8a00 1ffff000097351bc
> > fb40: ffff80004b9a8de0 0000000000000000 ffff800060cad328 ffff80005b1d6e00
> > fb60: ffff80004b9a8a00 1ffff000097351bc ffff80004f5e7b70 ffff200008217968
> > fb80: ffff800000000001 dead4ead00010001 dfff2000ffffffff ffffffffffffffff
> > fba0: ffff20000ab4c4b0 0000000000000000 0000000000000000 ffff200009b0b358
> > fbc0: ffff800063cffbc0 ffff800063cffbc0 ffff800063cffbf0 ffff2000083ffd20
> > fbe0: ffff80005b1d6e00 0000000000000140 ffff800063cffc50 ffff2000083aedfc
> > fc00: ffff80004b9a8a00 ffff80004b9a8a00 ffff80004b9a8d78 0000000000000001
> > fc20: 00000000000002a6 ffff80004f406780 ffff80004b9a8aa0 ffff800063699ac8
> > fc40: 1ffff0000c6d3359 ffff800063699700 ffff800063cffd20 ffff20000810caec
> > fc60: ffff80004b9a8a00 ffff80004b9a8b20 ffff80004b9a8d78 0000000000000001
> > fc80: ffff80005ebae2d8 ffff800063699ac8 ffff800063cffca0 ffff20000840fc08
> > fca0: ffff800063cffce0 ffff20000843327c ffff80005ebae2d8 ffff80004b9a8a00
> > fcc0: ffff80005ebae0f0 ffff200009de8000 ffff800063cffce0 ffff2000084332c4
> > fce0: ffff800063cffd20 ffff20000810cc64 ffff80004b9a8a00 ffff80004b9a8a48
> > fd00: ffff80004b9a8d78 0000000000000001 ffff800063cffd20 ffff20000810cae0
> > fd20: ffff800063cffd60 ffff20000811db88 ffff800063699700 ffff800063699700
> > fd40: ffff80004b9a8a00 0000000000000001 00000000000002a6 ffff80004f406780
> > fd60: ffff800063cffe40 ffff20000811f694 ffff80004f406780 0000000000000000
> > fd80: ffff80004f40681c 0000000000000004 1ffff00009e80d03 1ffff00009e80d02
> > fda0: ffff80004f406810 ffff800063699bd0 ffff800063699700 ffff800063699700
> > fdc0: ffff800063cffe80 ffff200008490868 0000000000000000 ffff80005fd45000
> > fde0: ffff80006369972c ffff800063699d48 1ffff0000c6d32e5 0000000000000004
> > fe00: 0000000000000123 000000000000001d 1ffff0000c6d33a9 ffff800063699700
> > fe20: ffff800063cffe30 ffff200008813c5c ffff800063cffe40 ffff20000811f688
> > fe40: ffff800063cffea0 ffff20000811f838 0000000000000000 000060006d24d000
> > fe60: ffffffffffffffff 0000ffff944f3974 0000000000000000 0000000000000015
> > fe80: 0000000000000123 000000000000005e ffff200009852000 ffff20000811f82c
> > fea0: 0000000000000000 ffff200008083f70 0000000000000000 0000000000000015
> > fec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > fee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > ff00: 000000000000005e ffffff80ffffffd0 0101010101010101 0000000000000020
> > ff20: 0000000000000018 0000000056bcb768 0000000000000000 0000ffff945be000
> > ff40: 0000000000413110 0000ffff944f3950 0000000000000a03 00000000004020f8
> > ff60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > ff80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > ffa0: 0000000000000000 0000ffffc0cc99b0 0000000000401248 0000ffffc0cc99b0
> > ffc0: 0000ffff944f3974 0000000000000000 0000000000000000 000000000000005e
> > ffe0: 0000000000000000 0000000000000000 0403030303030100 0807060606060605
> > Call trace:
> > Exception stack(0xffff800063cff910 to 0xffff800063cffa40)
> > f900: ffff20000ab59190 0001000000000000
> > f920: ffff800063cffb00 ffff20000821a3b8 0000000080000145 000000000000003d
> > f940: 1fffe4000156b23b ffff800063cffb70 ffff800063cff980 0001000000000000
> > f960: ffff800063cff9d0 ffff2000081da8e8 ffff800063699ec0 ffff200009df9000
> > f980: ffff800063cff990 ffff20000891e1e0 ffff800063cff9d0 ffff20000891e23c
> > f9a0: ffff200009dbfe18 0000000000000040 0000000000000004 0000000000000001
> > f9c0: 00000000000008ac 00000000000008ac ffff80005e71fb70 0000000000000000
> > f9e0: 1fffe4000156b243 ffff20000ab59208 0000000000000003 0000000000000040
> > fa00: 0000000000000000 0000000000000001 0000000000000007 ffff20000ab29000
> > fa20: 1fffe400014b568c 0000000000000040 0000000000000000 0000000000000002
> > [<ffff20000821a3b8>] __synchronize_srcu+0x3d0/0x470 kernel/rcu/srcu.c:412
> > [<ffff20000821a480>] synchronize_srcu+0x28/0x60 kernel/rcu/srcu.c:516
> > [<ffff2000083ffd20>] __mmu_notifier_release+0x268/0x3e0 mm/mmu_notifier.c:102
> > [< inline >] mmu_notifier_release ./include/linux/mmu_notifier.h:235
> > [<ffff2000083aedfc>] exit_mmap+0x21c/0x288 mm/mmap.c:2941
> > [< inline >] __mmput kernel/fork.c:881
> > [<ffff20000810caec>] mmput+0xdc/0x2e0 kernel/fork.c:903
> > [< inline >] exit_mm kernel/exit.c:557
> > [<ffff20000811db88>] do_exit+0x648/0x2020 kernel/exit.c:865
> > [<ffff20000811f694>] do_group_exit+0xdc/0x260 kernel/exit.c:982
> > [< inline >] SYSC_exit_group kernel/exit.c:993
> > [<ffff20000811f838>] __wake_up_parent+0x0/0x60 kernel/exit.c:991
> > [<ffff200008083f70>] el0_svc_naked+0x24/0x28 arch/arm64/kernel/entry.S:813
> > Code: 97feee10 35fff680 17ffff1c d503201f (d4210000)
> > ---[ end trace b727e9858bfac1ff ]---
> > Kernel panic - not syncing: Fatal exception
So the theory is that if !sp->running, all of SRCU's queues must be empty.
So if you are holding ->queue_lock (with irqs disabled) and you see
!sp->running, and then you enqueue a callback on ->batch_check0, then
that callback must be the first in the list. And the code preceding
the WARN_ON() you triggered does in fact check and enqueue shile holding
->queue_lock with irqs disabled.
And rcu_batch_queue() does operate FIFO as required. (Otherwise,
srcu_barrier() would not work.)
There are only three calls to rcu_batch_queue(), and the one involved with
the WARN_ON() enqueues to ->batch_check0. The other two enqueue to
->batch_queue. Callbacks move from ->batch_queue to ->batch_check0 to
->batch_check1 to ->batch_done, so nothing should slip in front.
Of course, if ->running were ever set to false with any of ->batch_check0,
->batch_check1, or ->batch_done non-empty, this WARN_ON() would trigger.
But srcu_reschedule() sets it to false only if all four batches are
empty (and checks and sets under ->queue_lock()), and all other cases
where it is set to false happen at initialization time, and also clear
out the queues. Of course, if someone raced an init_srcu_struct() with
either a call_srcu() or synchronize_srcu(), all bets are off. Now,
mmu_notifier.c does invoke init_srcu_struct() manually, but it does
so at subsys_initcall() time. Which -might- be after other things are
happening, so one "hail Mary" attempted fix is to remove mmu_notifier_init()
and replace the "static struct srcu_struct srcu" with:
DEFINE_STATIC_SRCU(srcu);
But this might require changing the name -- I vaguely recall some
strangeness where the names of statically defined per-CPU variables need
to be globally unique even when static. Easy enough to do, though.
Might need a similar change to the "srcu" instances defined in vmd.c
and kvm_host.h -- assuming that this change helps.
Another possibility is that something in SRCU is messing with either the
queues or the ->running field without holding ->queue_lock. And that does
seem to be happening -- srcu_advance_batches() invokes rcu_batch_move()
without holding anything. Which seems like it could cause trouble
if someone else was invoking synchronize_srcu() concurrently. Those
particular invocations might be safe due to access only by a single
kthread/workqueue, given that all updates to ->batch_queue are protected
by ->queue_lock (aside from initialization).
But ->batch_check0 is updated by __synchronize_srcu(), though protected
by ->queue_lock, and only if ->running is false, and with both the
check and the update protected by the same ->queue_lock critical section.
If ->running is false, then the workqueue cannot be running, so it remains
to see if all other updates to ->batch_check0 are either with ->queue_lock
held and ->running false on the one hand or from the workqueue handler
on the other:
srcu_collect_new() updates with ->queue_lock held, but does not check
->running. It is invoked only from process_srcu(), which in
turn is invoked only as a workqueue handler. The work is queued
from:
call_srcu(), which does so with ->queue_lock held having just
set ->running to true.
srcu_reschedule(), which invokes it if there are non-empty
queues. This is invoked from __synchronize_srcu()
in the case where it has set ->running to true
after finding the queues empty, which should imply
no other instances.
It is also invoked from process_srcu(), which is
invoked only as a workqueue handler. (Yay
recursive inquiry!)
srcu_advance_batches() updates without locks held. It is invoked as
follows:
__synchronize_srcu() in the case where ->running was set, which
as noted before excludes workqueue handlers.
process_srcu() which as noted before is only invoked from
a workqueue handler.
So an SRCU workqueue is invoked only from a workqueue handler, or from
some other task that transitioned ->running from false to true while
holding ->queuelock. There should therefore only be one SRCU workqueue
per srcu_struct, so this should be safe. Though I hope that it can
be simplified a bit. :-/
So the only suggestion I have at the moment is static definition of
the "srcu" variable. Lai, Josh, Steve, Mathieu, anything I missed?
Thanx, Paul
next prev parent reply other threads:[~2017-03-10 22:26 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-10 19:28 srcu: BUG in __synchronize_srcu Andrey Konovalov
2017-03-10 19:29 ` Andrey Konovalov
2017-03-10 19:42 ` Dmitry Vyukov
2017-03-10 22:26 ` Paul E. McKenney [this message]
2017-03-11 14:25 ` Mathieu Desnoyers
2017-03-11 20:06 ` Paul E. McKenney
2017-03-14 7:47 ` Lance Roy
2017-03-14 16:21 ` Paul E. McKenney
2017-03-27 12:36 ` Dmitry Vyukov
2017-03-27 14:16 ` Paul E. McKenney
2017-03-27 14:51 ` Dmitry Vyukov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170310222609.GN30506@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=andreyknvl@google.com \
--cc=dvyukov@google.com \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=kcc@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=rostedt@goodmis.org \
--cc=syzkaller@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.