From: Uladzislau Rezki <urezki@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>,
Zhang@pc638.lan, Qiang1 <qiang1.zhang@intel.com>,
Zhen Lei <thunder.leizhen@huawei.com>
Cc: "Zhang, Qiang1" <qiang1.zhang@intel.com>,
"Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <quic_neeraju@quicinc.com>,
Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Joel Fernandes <joel@joelfernandes.org>,
"rcu@vger.kernel.org" <rcu@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] rcu: Dump memory object info if callback function is invalid
Date: Mon, 14 Nov 2022 19:22:44 +0100 [thread overview]
Message-ID: <Y3KHdBjAIR3FNKpS@pc638.lan> (raw)
In-Reply-To: <20221111184238.GX725751@paulmck-ThinkPad-P17-Gen-1>
> On Fri, Nov 11, 2022 at 01:05:56PM +0000, Zhang, Qiang1 wrote:
> > On 2022/11/11 19:54, Zhang, Qiang1 wrote:
> > >> When a structure containing an RCU callback rhp is (incorrectly)
> > >> freed and reallocated after rhp is passed to call_rcu(), it is not
> > >> unusual for
> > >> rhp->func to be set to NULL. This defeats the debugging prints used
> > >> rhp->by
> > >> __call_rcu_common() in kernels built with
> > >> CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which expect to identify the
> > >> offending code using the identity of this function.
> > >>
> > >> And in kernels build without CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, things
> > >> are even worse, as can be seen from this splat:
> > >>
> > >> Unable to handle kernel NULL pointer dereference at virtual address 0
> > >> ... ...
> > >> PC is at 0x0
> > >> LR is at rcu_do_batch+0x1c0/0x3b8
> > >> ... ...
> > >> (rcu_do_batch) from (rcu_core+0x1d4/0x284)
> > >> (rcu_core) from (__do_softirq+0x24c/0x344)
> > >> (__do_softirq) from (__irq_exit_rcu+0x64/0x108)
> > >> (__irq_exit_rcu) from (irq_exit+0x8/0x10)
> > >> (irq_exit) from (__handle_domain_irq+0x74/0x9c)
> > >> (__handle_domain_irq) from (gic_handle_irq+0x8c/0x98)
> > >> (gic_handle_irq) from (__irq_svc+0x5c/0x94)
> > >> (__irq_svc) from (arch_cpu_idle+0x20/0x3c)
> > >> (arch_cpu_idle) from (default_idle_call+0x4c/0x78)
> > >> (default_idle_call) from (do_idle+0xf8/0x150)
> > >> (do_idle) from (cpu_startup_entry+0x18/0x20)
> > >> (cpu_startup_entry) from (0xc01530)
> > >>
> > >> This commit therefore adds calls to mem_dump_obj(rhp) to output some
> > >> information, for example:
> > >>
> > >> slab kmalloc-256 start ffff410c45019900 pointer offset 0 size 256
> > >>
> > >> This provides the rough size of the memory block and the offset of
> > >> the rcu_head structure, which as least provides at least a few clues
> > >> to help locate the problem. If the problem is reproducible,
> > >> additional slab debugging can be enabled, for example,
> > >> CONFIG_DEBUG_SLAB=y, which can provide significantly more information.
> > >>
> > >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> > >> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > >> ---
> > >> kernel/rcu/rcu.h | 7 +++++++
> > >> kernel/rcu/srcutiny.c | 1 +
> > >> kernel/rcu/srcutree.c | 1 +
> > >> kernel/rcu/tasks.h | 1 +
> > >> kernel/rcu/tiny.c | 1 +
> > >> kernel/rcu/tree.c | 1 +
> > >> 6 files changed, 12 insertions(+)
> > >>
> > >> v1 --> v2:
> > >> 1. Remove condition "(unsigned long)rhp->func & 0x3", it have problems on x86.
> > >> 2. Paul E. McKenney helped me update the commit message, thanks.
> > >>
> > >
> > > Hi, Zhen Lei
> > >
> > > Maybe the following scenarios should be considered:
> > >
> > > CPU 0
> > > tasks context
> > > spin_lock(&vmap_area_lock)
> > > Interrupt
> > > RCU softirq
> > > rcu_do_batch
> > > mem_dump_obj
> > > vmalloc_dump_obj
> > > spin_lock(&vmap_area_lock) <-- deadlock
> >
> > >Right, thanks. I just saw the robot's report. So this patch should be dropped.
> > >I'll try to add an helper in mm, where I can check whether the lock has been held, and dump the content of memory object.
> >
> > This is a workaround, or maybe try a modification like the following,
> > of course, need to ask Paul's opinion.
>
> Another approach is to schedule a workqueue handler to do the
> mem_dump_obj(). This would allow mem_dump_obj() to run in a clean
> environment.
>
> This would allow vmalloc_dump_obj() to be called unconditionally.
>
> Other thoughts?
>
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ca71de7c9d77..956eb28f9c77 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2591,6 +2591,19 @@ struct vm_struct *find_vm_area(const void *addr)
return va->vm;
}
+static struct vm_struct *
+find_vm_area_trylock(const void *addr)
+{
+ struct vmap_area *va = NULL;
+
+ if (spin_trylock(&vmap_area_lock)) {
+ va = __find_vmap_area((unsigned long)addr, &vmap_area_root);
+ spin_unlock(&vmap_area_lock);
+ }
+
+ return va ? va->vm : NULL;
+}
+
/**
* remove_vm_area - find and remove a continuous kernel virtual area
* @addr: base address
@@ -4048,7 +4061,7 @@ bool vmalloc_dump_obj(void *object)
struct vm_struct *vm;
void *objp = (void *)PAGE_ALIGN((unsigned long)object);
- vm = find_vm_area(objp);
+ vm = find_vm_area_trylock(objp);
if (!vm)
return false;
pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
<snip>
There is one issue though, it is not correct to reference vm->members
after a lock is dropped because of: use after free bugs. It is better
to embed the search and read out: nr_pages, addr, caller and drop the
lock.
--
Uladzislau Rezki
next prev parent reply other threads:[~2022-11-14 18:22 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 10:28 [PATCH v2] rcu: Dump memory object info if callback function is invalid Zhen Lei
2022-11-11 11:54 ` Zhang, Qiang1
2022-11-11 12:34 ` Leizhen (ThunderTown)
2022-11-11 13:05 ` Zhang, Qiang1
2022-11-11 18:42 ` Paul E. McKenney
2022-11-12 2:32 ` Leizhen (ThunderTown)
2022-11-12 2:39 ` Leizhen (ThunderTown)
2022-11-12 6:09 ` Paul E. McKenney
2022-11-12 6:08 ` Paul E. McKenney
2022-11-14 7:18 ` Leizhen (ThunderTown)
2022-11-14 16:06 ` Paul E. McKenney
2022-11-16 14:43 ` Leizhen (ThunderTown)
2022-11-16 19:57 ` Paul E. McKenney
2022-11-17 0:09 ` Zhang, Qiang1
2022-11-14 18:22 ` Uladzislau Rezki [this message]
2022-11-17 1:29 ` Leizhen (ThunderTown)
2022-11-11 15:50 ` Elliott, Robert (Servers)
2022-11-14 7:05 ` Leizhen (ThunderTown)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3KHdBjAIR3FNKpS@pc638.lan \
--to=urezki@gmail.com \
--cc=Zhang@pc638.lan \
--cc=frederic@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=joel@joelfernandes.org \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=paulmck@kernel.org \
--cc=qiang1.zhang@intel.com \
--cc=quic_neeraju@quicinc.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=thunder.leizhen@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.