public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: Daniel Wagner <daniel.wagner@bmw-carit.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: call_rcu from trace_preempt
Date: Mon, 15 Jun 2015 16:07:02 -0700
Message-ID: <20150615230702.GB3913@linux.vnet.ibm.com>
In-Reply-To: <557F509D.2000509@plumgrid.com>

On Mon, Jun 15, 2015 at 03:24:29PM -0700, Alexei Starovoitov wrote:
> Hi Paul,
> 
> I've been debugging the issue reported by Daniel:
> http://thread.gmane.org/gmane.linux.kernel/1974304/focus=1974304
> and it seems I narrowed it down to recursive call_rcu.

By "recursive call_rcu()", you mean invoking call_rcu() twice in a row
on the same memory, like this?

	call_rcu(&p->rh, some_callback_function);
	do_something_quick();
	call_rcu(&p->rh, another_callback_function);

If so, that is a bug.  In contrast, this is perfectly legal:

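	/* Final stage: free the enclosing structure. */
	void endpoint_callback_function(struct rcu_head *p)
	{
		struct foo *fp = container_of(p, struct foo, rh);

		kfree(fp);
	}

	/* First stage: requeue the same rcu_head for another grace period. */
	void recirculating_callback_function(struct rcu_head *p)
	{
		call_rcu(p, endpoint_callback_function);
	}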

	...

	call_rcu(&fp->rh, recirculating_callback_function);

This sort of thing is actually used in some situations involving
RCU and reference counters.

> From trace_preempt_on() I'm doing:
> e = kmalloc(sizeof(struct elem), GFP_ATOMIC)
> kfree_rcu(e, rcu)

As written, this should be OK, assuming that "rcu" is a field of type
"struct rcu_head" (not a pointer!) within "struct elem".
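
To illustrate why the field has to be embedded rather than pointed to,
here is a minimal userspace sketch (not kernel code).  The names
rcu_head_sketch, container_of_sketch(), elem_from_head() and
embedded_head_round_trip() are stand-ins invented for this example.
The idea is that a callback is handed only the address of the head, so
the enclosing allocation has to be recoverable from that address by
subtracting a fixed offset, which only works when the head lives inside
the structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct rcu_head (hypothetical). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Simplified stand-in for the kernel's container_of() macro. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct elem {
	uint64_t pad[32];
	struct rcu_head_sketch rcu;	/* embedded in the object, not a pointer */
};

/* Recover the enclosing struct elem from the address of its embedded head. */
static struct elem *elem_from_head(struct rcu_head_sketch *head)
{
	return container_of_sketch(head, struct elem, rcu);
}

/* Returns 1 if head -> elem recovers the original allocation, 0 otherwise. */
static int embedded_head_round_trip(void)
{
	struct elem *e = malloc(sizeof(*e));
	int ok;

	if (!e)
		return 0;
	ok = (elem_from_head(&e->rcu) == e);
	free(e);
	return ok;
}
```

If "rcu" were instead a pointer to a separately allocated head, the
subtraction would land in unrelated memory.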

> which is causing all sorts of corruption, like:
> [    2.074175] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.075567] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.102141] WARNING: CPU: 0 PID: 3 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.103547] ODEBUG: deactivate not available (active state 0)
> object type: rcu_head hint:           (null)
> 
> [    2.253995] WARNING: CPU: 0 PID: 7 at ../kernel/rcu/tree.c:2976
> __call_rcu.constprop.67+0x1e5/0x350()
> [    2.255510] __call_rcu(): Leaked duplicate callback
> 
> Sometimes stack looks like:
> [    2.145163] WARNING: CPU: 0 PID: 102 at ../lib/debugobjects.c:263
> debug_print_object+0x8c/0xb0()
> [    2.147465] ODEBUG: active_state not available (active state 0)
> object type: rcu_head hint:           (null)
> [    2.148022] Modules linked in:
> [    2.148022] CPU: 0 PID: 102 Comm: systemd-udevd Not tainted
> 4.1.0-rc7+ #653
> [    2.148022] Hardware name: QEMU Standard PC (i440FX + PIIX,
> 1996), BIOS
> rel-1.7.5-rc1-0-gb1d4dc9-20140515_140003-nilsson.home.kraxel.org
> 04/01/2014
> [    2.148022]  ffffffff81a34f77 ffff88000fc03d18 ffffffff81781ed4
> 0000000000000105
> [    2.148022]  ffff88000fc03d68 ffff88000fc03d58 ffffffff81064e57
> 0000000000000000
> [    2.148022]  ffff88000fc03e20 ffffffff81c50f00 ffffffff81a34fdf
> 0000000000000286
> [    2.148022] Call Trace:
> [    2.148022]  <IRQ>  [<ffffffff81781ed4>] dump_stack+0x4f/0x7b
> [    2.148022]  [<ffffffff81064e57>] warn_slowpath_common+0x97/0xe0
> [    2.148022]  [<ffffffff81064f56>] warn_slowpath_fmt+0x46/0x50
> [    2.148022]  [<ffffffff813d305c>] debug_print_object+0x8c/0xb0
> [    2.148022]  [<ffffffff813d3cf6>] ? debug_object_active_state+0x66/0x160
> [    2.148022]  [<ffffffff813d3d81>] debug_object_active_state+0xf1/0x160
> [    2.148022]  [<ffffffff810e33b1>] rcu_process_callbacks+0x301/0xae0
> [    2.148022]  [<ffffffff810e3397>] ? rcu_process_callbacks+0x2e7/0xae0
> [    2.148022]  [<ffffffff810e9f28>] ? run_timer_softirq+0x218/0x4c0
> [    2.148022]  [<ffffffff81069dff>] __do_softirq+0x14f/0x670
> [    2.148022]  [<ffffffff8106a5f5>] irq_exit+0xa5/0xb0
> [    2.148022]  [<ffffffff8178ea6a>] smp_apic_timer_interrupt+0x4a/0x60
> [    2.148022]  [<ffffffff8178cea0>] apic_timer_interrupt+0x70/0x80
> [    2.148022]  <EOI>  [<ffffffff813d363c>] ?
> debug_object_activate+0x9c/0x1e0
> [    2.148022]  [<ffffffff8178bd47>] ? _raw_spin_unlock_irqrestore+0x67/0x80
> [    2.148022]  [<ffffffff813d36f6>] debug_object_activate+0x156/0x1e0
> [    2.148022]  [<ffffffff810de037>] rcuhead_fixup_activate+0x37/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff813d3524>] ? __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8178bd2b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff810e308a>] kfree_call_rcu+0x1a/0x20
> [    2.148022]  [<ffffffff8115e230>] trace_preempt_on+0x180/0x290
> [    2.148022]  [<ffffffff8115e17e>] ? trace_preempt_on+0xce/0x290
> [    2.148022]  [<ffffffff810991c3>] preempt_count_sub+0x73/0xf0
> [    2.148022]  [<ffffffff8178bd2b>] _raw_spin_unlock_irqrestore+0x4b/0x80
> [    2.148022]  [<ffffffff813d3524>] __debug_object_init+0x3f4/0x430
> [    2.148022]  [<ffffffff8115e23c>] ? trace_preempt_on+0x18c/0x290
> [    2.148022]  [<ffffffff813d357b>] debug_object_init+0x1b/0x20
> [    2.148022]  [<ffffffff810de028>] rcuhead_fixup_activate+0x28/0x40
> [    2.148022]  [<ffffffff813d36a1>] debug_object_activate+0x101/0x1e0
> [    2.148022]  [<ffffffff812036b0>] ? get_max_files+0x20/0x20
> [    2.148022]  [<ffffffff810e2d26>] __call_rcu.constprop.67+0x46/0x350
> [    2.148022]  [<ffffffff810e30a7>] call_rcu+0x17/0x20
> [    2.148022]  [<ffffffff81203903>] __fput+0x183/0x200
> [    2.148022]  [<ffffffff812039ce>] ____fput+0xe/0x10
> [    2.148022]  [<ffffffff81088dd5>] task_work_run+0xb5/0xe0
> [    2.148022]  [<ffffffff81002c74>] do_notify_resume+0x64/0x80
> [    2.148022]  [<ffffffff8178c26c>] int_signal+0x12/0x17
> 
> My reading of the code is that the debug_object_*() bits are reporting a
> real problem. In the above trace the call
> debug_rcu_head_unqueue(list);
> from rcu_do_batch() is not finding 'list' in tracked objects.
> 
> I know that doing call_rcu() from trace_preempt is ill advised,
> but I still want to understand why call_rcu corrupts the memory.

Hmmm...  This is what I would expect if you invoked call_rcu()
(or kfree_rcu(), for that matter) on some memory, then freed it
before the grace period ended.  This would cause the same problems
as any other use-after-free error.  Might this be happening?

> Attaching a patch that I'm using for debugging.
> It's doing recursion preemption check, so number of nested call_rcu
> is no more than 2.

Oh...  One important thing is that both call_rcu() and kfree_rcu()
use per-CPU variables, managing a per-CPU linked list.  This is why
they disable interrupts.  If you do another call_rcu() in the middle
of the first one in just the wrong place, you will have two entities
concurrently manipulating the same linked list, which will not go well.
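
A simplified userspace model of this failure, as a sketch only: the
real RCU callback lists are more involved, and the names struct cb,
struct cb_list, enqueue(), enqueue_interrupted() and
reachable_after_race() are invented for this example.  It assumes a
singly linked list with a tail pointer, where an enqueue is two stores,
and it replays a second enqueue landing between those two stores:

```c
#include <assert.h>
#include <stddef.h>

struct cb {
	struct cb *next;
};

struct cb_list {
	struct cb *head;
	struct cb **tail;	/* address of the pointer the next enqueue fills */
};

static void cb_list_init(struct cb_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Normal enqueue: two stores that must not be interleaved with another. */
static void enqueue(struct cb_list *l, struct cb *c)
{
	c->next = NULL;
	*l->tail = c;		/* step 1: link the new element in */
	l->tail = &c->next;	/* step 2: advance the tail */
}

/* Enqueue "outer", with a nested enqueue running between its two steps. */
static void enqueue_interrupted(struct cb_list *l, struct cb *outer,
				struct cb *nested)
{
	outer->next = NULL;
	*l->tail = outer;	/* step 1 of the outer enqueue */
	enqueue(l, nested);	/* re-entry overwrites the slot just filled */
	l->tail = &outer->next;	/* step 2 runs with stale assumptions */
}

static int count_reachable(const struct cb_list *l)
{
	int n = 0;

	for (const struct cb *c = l->head; c; c = c->next)
		n++;
	return n;
}

/* Race two insertions, add a third, and count what the head can reach. */
static int reachable_after_race(void)
{
	struct cb_list l;
	struct cb a, b, c;

	cb_list_init(&l);
	enqueue_interrupted(&l, &a, &b);
	enqueue(&l, &c);
	return count_reachable(&l);
}
```

In this model three elements were queued but only one is reachable from
the head: the outer element is dropped, and the later enqueue is chained
onto the dropped element rather than onto the list.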

Maybe mark call_rcu() and the things it calls as notrace?  Or you
could maintain a separate per-CPU linked list that gathered up the
stuff to be kfree()ed after a grace period, and some time later
feed them to kfree_rcu()?

> Also, if I replace kfree_rcu in this patch with a regular kfree,
> everything works fine.
> 
> I'm seeing these crashes in a VM with a _single_ CPU.
> Kernel is built with CONFIG_PREEMPT, CONFIG_PREEMPT_TRACER and
> CONFIG_DEBUG_OBJECTS_RCU_HEAD.

No surprise -- when you have lists hanging off of per-CPU variables,
it only takes one CPU to tangle the lists.

> Also interesting that size of
> struct elem {
>        u64 pad[32];
>        struct rcu_head rcu;
> };
> that I'm using in kmalloc/kfree_rcu changes the crash.
> If padding is zero, the kernel just locks up; with pad[1] I see
> one type of odebug warning, with pad[32] another.

The usual consequence of racing a pair of callback insertions on the
same CPU would be that one of them gets leaked, and possibly all
subsequent callbacks.  So the lockup is no surprise.  And there are a
lot of other assumptions in nearby code paths about only one execution
at a time from a given CPU.

> Any advice on where to look is greatly appreciated.

What I don't understand is exactly what you are trying to do.  Have more
complex tracers that dynamically allocate memory?  If so, having a per-CPU
list that stages memory to be freed so that it can be passed to call_rcu()
in a safe environment might make sense.  Of course, that list would need
to be managed carefully!
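
One possible shape for such a staging list, as a userspace sketch under
stated assumptions rather than a kernel implementation: a single
instance stands in for the per-CPU variable, the "busy" flag stands in
for whatever recursion protection the real code would use, and the
release() argument stands in for the later call_rcu() or kfree_rcu().
All names here (struct stage, struct staged, stage_free(),
stage_drain(), release_now()) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Link embedded in each object waiting to be released. */
struct staged {
	struct staged *next;
};

/* One instance per CPU in a real kernel; a single instance in this sketch. */
struct stage {
	struct staged *head;
	int busy;		/* nonzero while the list is being modified */
};

/*
 * Called from the restricted (tracing) context.  Returns 0 on success,
 * or -1 if the list is already being modified, in which case the caller
 * keeps ownership of obj and must deal with it some other way.
 */
static int stage_free(struct stage *s, struct staged *obj)
{
	if (s->busy)
		return -1;	/* nested entry: refuse instead of corrupting */
	s->busy = 1;
	obj->next = s->head;
	s->head = obj;
	s->busy = 0;
	return 0;
}

/*
 * Called later from a safe context.  Detaches the whole list, then hands
 * each object to release().  Returns the number of objects released.
 */
static int stage_drain(struct stage *s, void (*release)(struct staged *))
{
	struct staged *list;
	int n = 0;

	s->busy = 1;
	list = s->head;
	s->head = NULL;
	s->busy = 0;

	while (list) {
		struct staged *next = list->next;

		release(list);
		list = next;
		n++;
	}
	return n;
}

/* Stand-in for the deferred free that the safe context would perform. */
static void release_now(struct staged *obj)
{
	free(obj);
}
```

The detach-then-walk split in stage_drain() keeps the window in which
the list itself is touched short, so release() runs without the list
being held busy.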

Or am I missing the point of the code below?

							Thanx, Paul

> Thanks!
> 

> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 8523ea345f2b..89433a83dd2d 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -13,6 +13,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/module.h>
>  #include <linux/ftrace.h>
> +#include <linux/slab.h>
>  
>  #include "trace.h"
>  
> @@ -510,8 +511,42 @@ EXPORT_SYMBOL(trace_hardirqs_off_caller);
>  #endif /*  CONFIG_IRQSOFF_TRACER */
>  
>  #ifdef CONFIG_PREEMPT_TRACER
> +struct elem {
> +	u64 pad[32];
> +	struct rcu_head rcu;
> +};
> +
> +static DEFINE_PER_CPU(int, prog_active);
> +static void * test_alloc(void)
> +{
> +	struct elem *e = NULL;
> +
> +	if (in_nmi())
> +		return e;
> +
> +	preempt_disable_notrace();
> +	if (unlikely(__this_cpu_inc_return(prog_active) != 1))
> +		goto out;
> +
> +	rcu_read_lock();
> +	e = kmalloc(sizeof(struct elem), GFP_ATOMIC);
> +	rcu_read_unlock();
> +	if (!e)
> +		goto out;
> +
> +	kfree_rcu(e, rcu);
> +out:
> +	__this_cpu_dec(prog_active);
> +	preempt_enable_no_resched_notrace();
> +	return e;
> +}
> +
>  void trace_preempt_on(unsigned long a0, unsigned long a1)
>  {
> +	void * buf = 0;
> +	static int cnt = 0;
> +	if (cnt++ > 3000000)
> +		buf = test_alloc();
>  	if (preempt_trace() && !irq_trace())
>  		stop_critical_timing(a0, a1);
>  }

