Re: call_rcu from trace_preempt

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: call_rcu from trace_preempt
Date: Mon, 15 Jun 2015 16:07:02 -0700	[thread overview]
Message-ID: <20150615230702.GB3913@linux.vnet.ibm.com> (raw)
In-Reply-To: <557F509D.2000509@plumgrid.com>

On Mon, Jun 15, 2015 at 03:24:29PM -0700, Alexei Starovoitov wrote:
> Hi Paul,
> 
> I've been debugging the issue reported by Daniel:
> http://thread.gmane.org/gmane.linux.kernel/1974304/focus=1974304
> and it seems I narrowed it down to recursive call_rcu.

By "recursive call_rcu()", you mean invoking call_rcu() twice in a row
on the same memory, like this?

	call_rcu(&p->rh, some_callback_function);
	do_something_quick();
	call_rcu(&p->rh, another_callback_function);

Because this is perfectly legal:

	void recirculating_callback_function(struct rcu_head *p)
	{
		struct foo *fp = container_of(p, struct foo, rh);

		kfree(fp);
	}

	void recirculating_callback_function(struct rcu_head *p)
	{
		call_rcu(p, endpoint_callback_function);
	}

	...

	call_rcu(&fp->rh, recirculating_callback_function);

This sort of thing is actually used in some situations involving
RCU and reference counters.

> From trace_preempt_on() I'm doing:
> e = kmalloc(sizeof(struct elem), GFP_ATOMIC)
> kfree_rcu(e, rcu)

As written, this should be OK, assuming that "rcu" is a field of type
"struct rcu_head" (not a pointer!) within "struct elem".

> which causing all sorts of corruptions like:
> [    2.074175] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.075567] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.102141] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.103547] ODEBUG: deactivate not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.253995] WARNING: CPU: 0 PID: 7 at ../kernel/rcu/tree.c:2976
> __call_rcu.constprop.67+0x1e5/0x350()
> [    2.255510] __call_rcu(): Leaked duplicate callback
> 
> Sometimes stack looks like:
> [    2.145163] WARNING: CPU: 0 PID: 102 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.147465] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> [    2.148022] Modules linked in:
> [    2.148022] CPU: 0 PID: 102 Comm: systemd-udevd Not tainted
> 4.1.0-rc7+ #653
> [    2.148022] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS
> rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org
> 04/01/2014
> [    2.148022]  ffffffff81a34f77 ffff88000fc03d18 ffffffff81781ed4
> 0000000000000105
> [    2.148022]  ffff88000fc03d68 ffff88000fc03d58 ffffffff81064e57
> 0000000000000000
> [    2.148022]  ffff88000fc03e20 ffffffff81c50f00 ffffffff81a34fdf
> 0000000000000286
> [    2.148022] Call Trace:
> [    2.148022]  <IRQ>  [<ffffffff81781ed4>] dump_stack+0x4f/0x7b
> [    2.148022]  [<ffffffff81064e57>] warn_slowpath_common+0x97/0xe0
> [    2.148022]  [<ffffffff81064f56>] warn_slowpath_fmt+0x46/0x50
> [    2.148022]  [<ffffffff813d305c>] debug_print_object+0x8c/0xb0
> [    2.148022]  [<ffffffff813d3cf6>] ? debug_object_active_state+0x66/0x160
> [    2.148022]  [<ffffffff813d3d81>] debug_object_active_state+0xf1/0x160
> [    2.148022]  [<ffffffff810e33b1>] rcu_process_callbacks+0x301/0xae0
> [    2.148022]  [<ffffffff810e3397>] ? rcu_process_callbacks+0x2e7/0xae0
> [    2.148022]  [<ffffffff810e9f28>] ? run_timer_softirq+0x218/0x4c0
> [    2.148022]  [<ffffffff81069dff>] __do_softirq+0x14f/0x670
> [    2.148022]  [<ffffffff8106a5f5>] irq_exit+0xa5/0xb0
> [    2.148022]  [<ffffffff8178ea6a>] smp_apic_timer_interrupt+0x4a/0x60
> [    2.148022]  [<ffffffff8178cea0>] apic_timer_interrupt+0x70/0x80
> [    2.148022]  <EOI>  [<ffffffff813d363c>] ?
> debug_object_activate+0x9c/0x1e0
> [    2.148022]  [<ffffffff8178bd47>] ? _raw_spin_unlock_irqrestore+0x67/0x80
> [    2.148022]  [<ffffffff813d36f6>] debug_object_activate+0x156/0x1e0
> [    2.148022]  [<ffffffff810de037>] rcuhead_fixup_activate+0x37/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff813d3524>] ? __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e308a>] kfree_call_rcu+0x1a/0x20
> [    2.148022]  [<ffffffff8115e230>] trace_preempt_on+0x180/0x290
> [    2.148022]  [<ffffffff8115e17e>] ? trace_preempt_on+0xce/0x290
> [    2.148022]  [<ffffffff810991c3>] preempt_count_sub+0x73/0xf0
> [    2.148022]  [<ffffffff8178bd2b>] _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff813d3524>] __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8115e23c>] ? trace_preempt_on+0x18c/0x290
> [    2.148022]  [<ffffffff813d357b>] debug_object_init+0x1b/0x20
> [    2.148022]  [<ffffffff810de028>] rcuhead_fixup_activate+0x28/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff812036b0>] ? get_max_files+0x20/0x20
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff810e30a7>] call_rcu+0x17/0x20
> [    2.148022]  [<ffffffff81203903>] __fput+0x183/0x200
> [    2.148022]  [<ffffffff812039ce>] ____fput+0xe/0x10
> [    2.148022]  [<ffffffff81088dd5>] task_work_run+0xb5/0xe0
> [    2.148022]  [<ffffffff81002c74>] do_notify_resume+0x64/0x80
> [    2.148022]  [<ffffffff8178c26c>] int_signal+0x12/0x17
> 
> My reading of the code is debug_object_*() bits are reporting real
> problem. In the above trace the call
> debug_rcu_head_unqueue(list);
> from rcu_do_batch() is not finding 'list' in tracked objects.
> 
> I know that doing call_rcu() from trace_preempt is ill advised,
> but I still want to understand why call_rcu corrupts the memory.

Hmmm...  This is what I would expect if you invoked call_rcu()
(or kfree_rcu(), for that matter) on some memory, then freed it
before the grace period ended.  This would cause the same problems
as any other use-after-free error.  Might this be happening?

> Attaching a patch that I'm using for debugging.
> It's doing recursion preemption check, so number of nested call_rcu
> is no more than 2.

Oh...  One important thing is that both call_rcu() and kfree_rcu()
use per-CPU variables, managing a per-CPU linked list.  This is why
they disable interrupts.  If you do another call_rcu() in the middle
of the first one in just the wrong place, you will have two entities
concurrently manipulating the same linked list, which will not go well.

Maybe mark call_rcu() and the things it calls as notrace?  Or you
could maintain a separate per-CPU linked list that gathered up the
stuff to be kfree()ed after a grace period, and some time later
feed them to kfree_rcu()?

> Also if I replace kfree_rcu is this patch with a regular kfree,
> all works fine.
> 
> I'm seeing this crashes in VM with _single_ cpu.
> Kernel is built with CONFIG_PREEMPT, CONFIG_PREEMPT_TRACER and
> CONFIG_DEBUG_OBJECTS_RCU_HEAD.

No surprise -- when you have lists hanging off of per-CPU variables,
it only takes one CPU to tangle the lists.

> Also interesting that size of
> struct elem {
>        u64 pad[32];
>        struct rcu_head rcu;
> };
> that I'm using in kmalloc/kfree_rcu changes the crash.
> If padding is zero, kernel just locksup, if pad[1] I see
> one type of odebug warnings, if pad[32] - another.

The usual consequence of racing a pair of callback insertions on the
same CPU would be that one of them gets leaked, and possible all
subsequent callbacks.  So the lockup is no surprise.  And there are a
lot of other assumptions in nearby code paths about only one execution
at a time from a given CPU.

> Any advise on where to look is greatly appreciated.

What I don't understand is exactly what you are trying to do.  Have more
complex tracers that dynamically allocate memory?  If so, having a per-CPU
list that stages memory to be freed so that it can be passed to call_rcu()
in a safe environment might make sense.  Of course, that list would need
to be managed carefully!

Or am I missing the point of the code below?

							Thanx, Paul

> Thanks!
> 

> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 8523ea345f2b..89433a83dd2d 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -13,6 +13,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
> +#include <linux/slab.h>
>  
>  #include "trace.h"
>  
> @@ -510,8 +511,42 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
>  #endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +struct elem {
> +	u64 pad[32];
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_PER_CPU(int, prog_active);
> +static void * test_alloc(void)
> +{
> +	struct elem *e = NULL;
> +
> +	if (in_nmi())
> +		return e;
> +
> +	preempt_disable_notrace();
> +	if (unlikely(__this_cpu_inc_return(prog_active) != 1))
> +		goto out;
> +
> +	rcu_read_lock();
> +	e = kmalloc(sizeof(struct elem), GFP_ATOMIC);
> +	rcu_read_unlock();
> +	if (!e)
> +		goto out;
> +
> +	kfree_rcu(e, rcu);
> +out:
> +	__this_cpu_dec(prog_active);
> +	preempt_enable_no_resched_notrace();
> +	return e;
> +}
> +
>  void trace_preempt_on(unsigned long a0, unsigned long a1)
>  {
> +	void * buf = 0;
> +	static int cnt = 0;
> +	if (cnt++ > 3000000)
> +		buf = test_alloc();
>  	if (preempt_trace() && !irq_trace())
>  		stop_critical_timing(a0, a1);
>  }

next prev parent reply	other threads:[~2015-06-15 23:07 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-15 22:24 call_rcu from trace_preempt Alexei Starovoitov
2015-06-15 23:07 ` Paul E. McKenney [this message]
2015-06-16  1:09   ` Alexei Starovoitov
2015-06-16  2:14     ` Paul E. McKenney
2015-06-16  5:45       ` Alexei Starovoitov
2015-06-16  6:06         ` Daniel Wagner
2015-06-16  6:25           ` Alexei Starovoitov
2015-06-16  6:34             ` Daniel Wagner
2015-06-16  6:46               ` Alexei Starovoitov
2015-06-16  6:54                 ` Daniel Wagner
2015-06-16 12:27         ` Paul E. McKenney
2015-06-16 12:38           ` Daniel Wagner
2015-06-16 14:16             ` Paul E. McKenney
2015-06-16 15:43               ` Steven Rostedt
2015-06-16 16:07                 ` Paul E. McKenney
2015-06-16 17:13                   ` Daniel Wagner
2015-06-16 15:41             ` Steven Rostedt
2015-06-16 15:52               ` Steven Rostedt
2015-06-16 17:11               ` Daniel Wagner
2015-06-16 17:20             ` Alexei Starovoitov
2015-06-16 17:37               ` Steven Rostedt
2015-06-17  0:33                 ` Alexei Starovoitov
2015-06-17  0:47                   ` Steven Rostedt
2015-06-17  1:04                     ` Alexei Starovoitov
2015-06-17  1:19                       ` Steven Rostedt
2015-06-17  8:11               ` Daniel Wagner
2015-06-17  9:05                 ` Daniel Wagner
2015-06-17 18:39                   ` Alexei Starovoitov
2015-06-17 20:37                     ` Paul E. McKenney
2015-06-17 20:53                       ` Alexei Starovoitov
2015-06-17 21:36                         ` Paul E. McKenney
2015-06-17 23:58                           ` Alexei Starovoitov
2015-06-18  0:20                             ` Paul E. McKenney
2015-06-16 15:37           ` Steven Rostedt
2015-06-16 16:05             ` Paul E. McKenney
2015-06-16 17:14               ` Alexei Starovoitov
2015-06-16 17:39                 ` Paul E. McKenney
2015-06-16 18:57                   ` Steven Rostedt
2015-06-16 19:20                     ` Paul E. McKenney
2015-06-16 19:29                       ` Steven Rostedt
2015-06-16 19:34                         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150615230702.GB3913@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ast@plumgrid.com \
    --cc=daniel.wagner@bmw-carit.de \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.