All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Daniel Wagner <wagi@monom.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>,
	Daniel Wagner <daniel.wagner@bmw-carit.de>,
	LKML <linux-kernel@vger.kernel.org>,
	rostedt@goodmis.org
Subject: Re: call_rcu from trace_preempt
Date: Tue, 16 Jun 2015 07:16:26 -0700	[thread overview]
Message-ID: <20150616141626.GI3913@linux.vnet.ibm.com> (raw)
In-Reply-To: <558018DD.1080701@monom.org>

On Tue, Jun 16, 2015 at 02:38:53PM +0200, Daniel Wagner wrote:
> On 06/16/2015 02:27 PM, Paul E. McKenney wrote:
> > On Mon, Jun 15, 2015 at 10:45:05PM -0700, Alexei Starovoitov wrote:
> >> On 6/15/15 7:14 PM, Paul E. McKenney wrote:
> >>>
> >>> Why do you believe that it is better to fix it within call_rcu()?
> >>
> >> found it:
> >> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> >> index 8cf7304b2867..a3be09d482ae 100644
> >> --- a/kernel/rcu/tree.c
> >> +++ b/kernel/rcu/tree.c
> >> @@ -935,9 +935,9 @@ bool notrace rcu_is_watching(void)
> >>  {
> >>         bool ret;
> >>
> >> -       preempt_disable();
> >> +       preempt_disable_notrace();
> >>         ret = __rcu_is_watching();
> >> -       preempt_enable();
> >> +       preempt_enable_notrace();
> >>         return ret;
> >>  }
> >>
> >> the rcu_is_watching() and __rcu_is_watching() are already marked
> >> notrace, so imo it's a good 'fix'.
> >> What was happening is that the above preempt_enable was triggering
> >> recursive call_rcu that was indeed messing 'rdp' that was
> >> prepared by __call_rcu and before __call_rcu_core could use that.
> > 
> >> btw, also noticed that local_irq_save done by note_gp_changes
> >> is partially redundant. In __call_rcu_core path the irqs are
> >> already disabled.
> > 
> > But you said earlier that nothing happened when interrupts were
> > disabled.  And interrupts are disabled across the call to
> > rcu_is_watching() in __call_rcu_core().  So why did those calls
> > to preempt_disable() and preempt_enable() cause trouble?
> 
> Just for the record: Using a thread for freeing the memory is curing the
> problem without the need to modify rcu_is_watching.

I must confess to liking this approach better than guaranteeing full-up
reentrancy in call_rcu() and kfree_rcu().  ;-)

							Thanx, Paul

> Here is the hack:
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 83c209d..8d73be3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -13,6 +13,8 @@
>  #include <linux/jhash.h>
>  #include <linux/filter.h>
>  #include <linux/vmalloc.h>
> +#include <linux/kthread.h>
> +#include <linux/spinlock.h>
> 
>  struct bpf_htab {
>  	struct bpf_map map;
> @@ -27,10 +29,14 @@ struct bpf_htab {
>  struct htab_elem {
>  	struct hlist_node hash_node;
>  	struct rcu_head rcu;
> +	struct list_head list;
>  	u32 hash;
>  	char key[0] __aligned(8);
>  };
> 
> +static LIST_HEAD(elem_freelist);
> +static DEFINE_SPINLOCK(elem_freelist_lock);
> +
>  /* Called from syscall */
>  static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  {
> @@ -228,6 +234,7 @@ static int htab_map_update_elem(struct bpf_map *map,
> void *key, void *value,
>  	memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
> 
>  	l_new->hash = htab_map_hash(l_new->key, key_size);
> +	INIT_LIST_HEAD(&l_new->list);
> 
>  	/* bpf_map_update_elem() can be called in_irq() */
>  	spin_lock_irqsave(&htab->lock, flags);
> @@ -300,11 +307,17 @@ static int htab_map_delete_elem(struct bpf_map
> *map, void *key)
>  	if (l) {
>  		hlist_del_rcu(&l->hash_node);
>  		htab->count--;
> -		kfree_rcu(l, rcu);
> +		/* kfree_rcu(l, rcu); */
>  		ret = 0;
>  	}
> 
>  	spin_unlock_irqrestore(&htab->lock, flags);
> +
> +	if (l) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		list_add(&l->list, &elem_freelist);
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
>  	return ret;
>  }
> 
> @@ -359,9 +372,31 @@ static struct bpf_map_type_list htab_type
> __read_mostly = {
>  	.type = BPF_MAP_TYPE_HASH,
>  };
> 
> +static int free_thread(void *arg)
> +{
> +	unsigned long flags;
> +	struct htab_elem *l;
> +
> +	while (!kthread_should_stop()) {
> +		spin_lock_irqsave(&elem_freelist_lock, flags);
> +		while (!list_empty(&elem_freelist)) {
> +			l = list_entry(elem_freelist.next,
> +				struct htab_elem, list);
> +			list_del(&l->list);
> +			kfree(l);
> +		}
> +		spin_unlock_irqrestore(&elem_freelist_lock, flags);
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init register_htab_map(void)
>  {
>  	bpf_register_map_type(&htab_type);
> +
> +	kthread_run(free_thread, NULL, "free_thread");
> +
>  	return 0;
>  }
>  late_initcall(register_htab_map);
> 


  reply	other threads:[~2015-06-16 14:16 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-15 22:24 call_rcu from trace_preempt Alexei Starovoitov
2015-06-15 23:07 ` Paul E. McKenney
2015-06-16  1:09   ` Alexei Starovoitov
2015-06-16  2:14     ` Paul E. McKenney
2015-06-16  5:45       ` Alexei Starovoitov
2015-06-16  6:06         ` Daniel Wagner
2015-06-16  6:25           ` Alexei Starovoitov
2015-06-16  6:34             ` Daniel Wagner
2015-06-16  6:46               ` Alexei Starovoitov
2015-06-16  6:54                 ` Daniel Wagner
2015-06-16 12:27         ` Paul E. McKenney
2015-06-16 12:38           ` Daniel Wagner
2015-06-16 14:16             ` Paul E. McKenney [this message]
2015-06-16 15:43               ` Steven Rostedt
2015-06-16 16:07                 ` Paul E. McKenney
2015-06-16 17:13                   ` Daniel Wagner
2015-06-16 15:41             ` Steven Rostedt
2015-06-16 15:52               ` Steven Rostedt
2015-06-16 17:11               ` Daniel Wagner
2015-06-16 17:20             ` Alexei Starovoitov
2015-06-16 17:37               ` Steven Rostedt
2015-06-17  0:33                 ` Alexei Starovoitov
2015-06-17  0:47                   ` Steven Rostedt
2015-06-17  1:04                     ` Alexei Starovoitov
2015-06-17  1:19                       ` Steven Rostedt
2015-06-17  8:11               ` Daniel Wagner
2015-06-17  9:05                 ` Daniel Wagner
2015-06-17 18:39                   ` Alexei Starovoitov
2015-06-17 20:37                     ` Paul E. McKenney
2015-06-17 20:53                       ` Alexei Starovoitov
2015-06-17 21:36                         ` Paul E. McKenney
2015-06-17 23:58                           ` Alexei Starovoitov
2015-06-18  0:20                             ` Paul E. McKenney
2015-06-16 15:37           ` Steven Rostedt
2015-06-16 16:05             ` Paul E. McKenney
2015-06-16 17:14               ` Alexei Starovoitov
2015-06-16 17:39                 ` Paul E. McKenney
2015-06-16 18:57                   ` Steven Rostedt
2015-06-16 19:20                     ` Paul E. McKenney
2015-06-16 19:29                       ` Steven Rostedt
2015-06-16 19:34                         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150616141626.GI3913@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=ast@plumgrid.com \
    --cc=daniel.wagner@bmw-carit.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=wagi@monom.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.