public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <klamm@yandex-team.ru>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: paulmck@linux.vnet.ibm.com, Dipankar Sarma <dipankar@in.ibm.com>,
	zhmurov@yandex-team.ru, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	James Morris <jmorris@namei.org>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	Patrick McHardy <kaber@trash.net>
Subject: Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro
Date: Wed, 22 May 2013 15:58:16 +0400	[thread overview]
Message-ID: <519CB2D8.103@yandex-team.ru> (raw)
In-Reply-To: <1369201765.3301.299.camel@edumazet-glaptop>

On 22.05.2013 09:49, Eric Dumazet wrote:
> On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote:
>
>> Please use ACCESS_ONCE(), which is the standard way to deal with this,
>> and remove the rcu_dereference_raw() in
>> hlist_nulls_for_each_entry_rcu()
>>
>> something like : (for the nulls part only)
>
> Thinking a bit more about this, I am a bit worried by other uses of
> ACCESS_ONCE(ptr->field)
>
> rcu_dereference(ptr->field) intent is to reload ptr->field every time
> from memory.

Exactly.

>
> Do we really need a
>
> #define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
>
> #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
>
> and change all rcu_dererence(ptr->field) occurrences ?

Yes.
But not all: only "head->first". The node variable (or any other iterator) is
always a local variable that changes every cycle. So, we can rely on gcc here.

>
> I probably miss something obvious.
>
> Anyway, following patch seems to also solve the problem, I need a sleep to understand why.
>
>   	struct hlist_nulls_node *node;
> @@ -420,7 +420,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
>   begin:
>   	result = NULL;
>   	badness = 0;
> -	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
> +	udp_portaddr_for_each_entry_rcu(sk, node, head) {
>   		score = compute_score2(sk, net, saddr, sport,
>   				      daddr, hnum, dif);
>   		if (score > badness) {


IMHO, it doesn't solve.
Ok, ok, it can actually solve, as fast ANY modification of related code.
For instance, adding something like
	unsigned int c = 0;
	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
		...
		c++;
		if (c > 1000000)
			printk(...);
	}
also "solves" the problem.

It looks like, that it's semantically strange to use the ACCESS_(FIELD)_ONCE
macro for telling gcc to reread something every time.
So, I created corresponding rcu_dereference_field(_raw/_bh) macros.

Please, find my third attempt to create a patch below.

Regards,
Roman

-----------------------------------------------------------------------

rcu: fix a race in rcu lists traverse macros

Some network functions (udp4_lib_lookup2(), for instance) use the
rcu lists traverse macros (hlist_nulls_for_each_entry_rcu, for instance)
in a way that assumes restarting of a loop. In this case, it is strictly
necessary to reread the head->first value from the memory before each scan.
Without additional hints, gcc caches this value in a register. In this case,
if a cached node is moved to another chain during the scan, we can loop
forever getting wrong nulls values and restarting the loop uninterruptedly.

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Reported-by: Boris Zhmurov <zhmurov@yandex-team.ru>
---
  include/linux/compiler.h      | 6 ++++++
  include/linux/rculist.h       | 9 +++++----
  include/linux/rculist_nulls.h | 2 +-
  include/linux/rcupdate.h      | 9 +++++++++
  4 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..4109fab 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,12 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
   */
  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/*
+ * Same as ACCESS_ONCE(), but used for accessing field of a structure.
+ * The main goal is preventing compiler to store &ptr->field in a register.
+ */
+#define ACCESS_FIELD_ONCE(PTR, FIELD) (((volatile typeof(*PTR) *)PTR)->FIELD)
+
  /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
  #ifdef CONFIG_KPROBES
  # define __kprobes	__attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8089e35..7582cfe 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -282,7 +282,8 @@ static inline void list_splice_init_rcu(struct list_head *list,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define list_for_each_entry_rcu(pos, head, member) \
-	for (pos = list_entry_rcu((head)->next, typeof(*pos), member); \
+	for (pos = list_entry_rcu(ACCESS_FIELD_ONCE(head, next), \
+				  typeof(*pos), member); \
  		&pos->member != (head); \
  		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

@@ -439,7 +440,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
  }

  #define __hlist_for_each_rcu(pos, head)				\
-	for (pos = rcu_dereference(hlist_first_rcu(head));	\
+	for (pos = rcu_dereference_field(head, first);		\
  	     pos;						\
  	     pos = rcu_dereference(hlist_next_rcu(pos)))

@@ -454,7 +455,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu(pos, head, member)			\
-	for (pos = hlist_entry_safe (rcu_dereference_raw(hlist_first_rcu(head)),\
+	for (pos = hlist_entry_safe(rcu_dereference_field_raw(head, first), \
  			typeof(*(pos)), member);			\
  		pos;							\
  		pos = hlist_entry_safe(rcu_dereference_raw(hlist_next_rcu(\
@@ -471,7 +472,7 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
   * as long as the traversal is guarded by rcu_read_lock().
   */
  #define hlist_for_each_entry_rcu_bh(pos, head, member)			\
-	for (pos = hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head)),\
+	for (pos = hlist_entry_safe(rcu_dereference_field_bh(head, first), \
  			typeof(*(pos)), member);			\
  		pos;							\
  		pos = hlist_entry_safe(rcu_dereference_bh(hlist_next_rcu(\
diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
index 2ae1371..ef431b4 100644
--- a/include/linux/rculist_nulls.h
+++ b/include/linux/rculist_nulls.h
@@ -107,7 +107,7 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n,
   *
   */
  #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member)			\
-	for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head));		\
+	for (pos = rcu_dereference_field_raw(head, first);			\
  		(!is_a_nulls(pos)) &&						\
  		({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
  		pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4ccd68e..6fef0c2 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -640,6 +640,9 @@ static inline void rcu_preempt_sleep_check(void)

  #define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/

+#define rcu_dereference_field_raw(PTR, FIELD) \
+		rcu_dereference_raw(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_access_index() - fetch RCU index with no dereferencing
   * @p: The index to read
@@ -704,6 +707,9 @@ static inline void rcu_preempt_sleep_check(void)
   */
  #define rcu_dereference(p) rcu_dereference_check(p, 0)

+#define rcu_dereference_field(PTR, FIELD)		\
+	rcu_dereference(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_dereference_bh() - fetch an RCU-bh-protected pointer for dereferencing
   * @p: The pointer to read, prior to dereferencing
@@ -712,6 +718,9 @@ static inline void rcu_preempt_sleep_check(void)
   */
  #define rcu_dereference_bh(p) rcu_dereference_bh_check(p, 0)

+#define rcu_dereference_field_bh(PTR, FIELD)			\
+	rcu_dereference_bh(ACCESS_FIELD_ONCE(PTR, FIELD))
+
  /**
   * rcu_dereference_sched() - fetch RCU-sched-protected pointer for dereferencing
   * @p: The pointer to read, prior to dereferencing
-- 
1.8.1.2



  reply	other threads:[~2013-05-22 11:58 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-21  9:05 [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro Roman Gushchin
2013-05-21 10:40 ` David Laight
2013-05-21 11:55   ` Roman Gushchin
2013-05-21 13:42     ` David Laight
2013-05-21 12:09 ` Paul E. McKenney
2013-05-21 12:46   ` Roman Gushchin
2013-05-21 12:58     ` Paul E. McKenney
2013-05-21 13:37   ` Eric Dumazet
2013-05-21 13:44   ` Eric Dumazet
2013-05-21 14:47     ` Roman Gushchin
2013-05-21 15:16       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 15:38       ` Eric Dumazet
2013-05-21 15:51         ` Roman Gushchin
2013-05-21 18:12         ` [PATCH v2] " Roman Gushchin
2013-05-22  2:01           ` Eric Dumazet
2013-05-22  5:49             ` Eric Dumazet
2013-05-22 11:58               ` Roman Gushchin [this message]
2013-05-22 12:30                 ` Eric Dumazet
2013-05-22 13:07                   ` Roman Gushchin
2013-05-22 17:45                     ` Paul E. McKenney
2013-05-22 19:17                       ` Roman Gushchin
2013-05-25 11:37                         ` Paul E. McKenney
2013-05-27 11:34                           ` Roman Gushchin
2013-05-27 17:55                           ` Roman Gushchin
2013-05-28  0:12                             ` Eric Dumazet
2013-05-28  9:10                               ` Roman Gushchin
2013-05-29  0:34                                 ` Eric Dumazet
2013-05-29  1:31                                   ` Paul E. McKenney
2013-05-29  5:08                                     ` Eric Dumazet
2013-05-29 10:09                                       ` Roman Gushchin
2013-05-29 19:06                                         ` Eric Dumazet
2013-05-30  8:25                                           ` Roman Gushchin
2013-06-02 23:31                                             ` Eric Dumazet
2013-06-03  2:58                                               ` David Miller
2013-06-03  3:12                                                 ` Eric Dumazet
2013-06-03  3:27                                                   ` David Miller
2013-06-03  3:42                                                     ` Paul E. McKenney
2013-06-03  3:47                                                       ` Eric Dumazet
2013-06-03  3:49                                                       ` David Miller
2013-06-03  6:05                                                         ` Paul E. McKenney
2013-06-10 18:29                                                         ` Boris B. Zhmurov
2013-06-10 18:51                                                           ` Eric Dumazet
2013-06-03  3:48                                                   ` Paul E. McKenney
2013-06-03  3:42                                                 ` Paul E. McKenney
2013-05-29  9:17                                   ` Roman Gushchin
2013-05-29  1:19                                 ` Paul E. McKenney
2013-05-22 13:27                   ` David Laight
2013-05-22 13:36                     ` Eric Dumazet
2013-05-22 14:23                       ` David Laight
2013-05-22 13:55                     ` Roman Gushchin
2013-05-22  9:58             ` Paul E. McKenney
2013-05-22 12:28               ` Eric Dumazet
2013-05-22 13:00                 ` Paul E. McKenney
2013-05-22 14:16                   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=519CB2D8.103@yandex-team.ru \
    --to=klamm@yandex-team.ru \
    --cc=davem@davemloft.net \
    --cc=dipankar@in.ibm.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jmorris@namei.org \
    --cc=kaber@trash.net \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=yoshfuji@linux-ipv6.org \
    --cc=zhmurov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox