netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [IPV4] ROUTE: ip_rt_dump() is unecessary slow
@ 2008-01-07 18:30 Eric Dumazet
  2008-01-08  5:52 ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2008-01-07 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache is
fetched quite fast :


recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem, removing quadratic behavior.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 10915bb..b7aa6dd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2755,11 +2755,10 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 	int idx, s_idx;
 
 	s_h = cb->args[0];
+	if (s_h < 0)
+		s_h = 0;
 	s_idx = idx = cb->args[1];
-	for (h = 0; h <= rt_hash_mask; h++) {
-		if (h < s_h) continue;
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h <= rt_hash_mask; h++) {
 		rcu_read_lock_bh();
 		for (rt = rcu_dereference(rt_hash_table[h].chain), idx = 0; rt;
 		     rt = rcu_dereference(rt->u.dst.rt_next), idx++) {
@@ -2776,6 +2775,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 			dst_release(xchg(&skb->dst, NULL));
 		}
 		rcu_read_unlock_bh();
+		s_idx = 0;
 	}
 
 done:


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  2008-01-07 18:30 [IPV4] ROUTE: ip_rt_dump() is unecessary slow Eric Dumazet
@ 2008-01-08  5:52 ` David Miller
  2008-01-08  6:11   ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-08  5:52 UTC (permalink / raw)
  To: dada1; +Cc: netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Mon, 7 Jan 2008 19:30:02 +0100

> I noticed "ip route list cache x.y.z.t" can be *very* slow.
> 
> While strace-ing -T it I also noticed that first part of route cache is
> fetched quite fast :
 ...
> The following patch corrects this performance/latency problem, removing quadratic behavior.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

I think removing quadratic behavior is a bug fix, thus applied
to net-2.6, thanks Eric :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  2008-01-08  5:52 ` David Miller
@ 2008-01-08  6:11   ` Eric Dumazet
  2008-01-08  6:15     ` David Miller
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2008-01-08  6:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]

David Miller a écrit :
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Mon, 7 Jan 2008 19:30:02 +0100
> 
>> I noticed "ip route list cache x.y.z.t" can be *very* slow.
>>
>> While strace-ing -T it I also noticed that first part of route cache is
>> fetched quite fast :
>  ...
>> The following patch corrects this performance/latency problem, removing quadratic behavior.
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> 
> I think removing quadratic behavior is a bug fix, thus applied
> to net-2.6, thanks Eric :-)

You are the boss :)

I had another patch to submit, so I based it no net-2.6, please just say me if 
you prefer a net-2/6.25 one, as it's not a critical bug fix...

Thank you

[IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: route_rcu.patch --]
[-- Type: text/plain, Size: 1011 bytes --]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d337706..3b7562f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -278,7 +278,7 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 		if (r)
 			break;
 		rcu_read_unlock_bh();
@@ -288,15 +288,15 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 {
-	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
+	struct rt_cache_iter_state *st = seq->private;
 
-	r = r->u.dst.rt_next;
+	r = rcu_dereference(r->u.dst.rt_next);
 	while (!r) {
 		rcu_read_unlock_bh();
 		if (--st->bucket < 0)
 			break;
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 	}
 	return r;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  2008-01-08  6:11   ` Eric Dumazet
@ 2008-01-08  6:15     ` David Miller
  2008-01-08  6:37       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: David Miller @ 2008-01-08  6:15 UTC (permalink / raw)
  To: dada1; +Cc: netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 08 Jan 2008 07:11:30 +0100

> @@ -288,15 +288,15 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>  
>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  {
> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +	struct rt_cache_iter_state *st = seq->private;
>  

Can you explain to me why this rcu_dereference() can be removed?

The rest of your patch is OK and once I understand the above
I'll add it to net-2.6, thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  2008-01-08  6:15     ` David Miller
@ 2008-01-08  6:37       ` Eric Dumazet
  2008-01-09  6:02         ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2008-01-08  6:37 UTC (permalink / raw)
  To: David Miller, Dipankar Sarma; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1581 bytes --]

David Miller a écrit :
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Tue, 08 Jan 2008 07:11:30 +0100
> 
>> @@ -288,15 +288,15 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>>  
>>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>>  {
>> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
>> +	struct rt_cache_iter_state *st = seq->private;
>>  
> 
> Can you explain to me why this rcu_dereference() can be removed?

Very good question, but honestly I really dont see why it was there at the 
first place :

"struct seq_file" is private to this thread, so seq.private is
also private and cannot change while this thread runs rt_cache_get_next().

Reading it once (as guaranted bu rcu_dereference()) or several time if 
compiler really is dumb enough wont change the result...

> 
> The rest of your patch is OK and once I understand the above
> I'll add it to net-2.6, thanks!
> 

Maybe we can first have an Ack from Dipankar Sarma, I can repost the patch
with a more precise ChangeLog and wait his answer ?

[NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
since seq is private to the thread running this function. Reading seq.private
once (as guaranted bu rcu_dereference()) or several time if compiler really is 
dumb enough wont change the result.

But we miss real spots where rcu_dereference() are needed, both in 
rt_cache_get_first() and rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: route_rcu.patch --]
[-- Type: text/plain, Size: 1011 bytes --]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d337706..3b7562f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -278,7 +278,7 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 		if (r)
 			break;
 		rcu_read_unlock_bh();
@@ -288,15 +288,15 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 {
-	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
+	struct rt_cache_iter_state *st = seq->private;
 
-	r = r->u.dst.rt_next;
+	r = rcu_dereference(r->u.dst.rt_next);
 	while (!r) {
 		rcu_read_unlock_bh();
 		if (--st->bucket < 0)
 			break;
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 	}
 	return r;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [IPV4] ROUTE: ip_rt_dump() is unecessary slow
  2008-01-08  6:37       ` Eric Dumazet
@ 2008-01-09  6:02         ` Herbert Xu
  2008-01-09  7:38           ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2008-01-09  6:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, dipankar, netdev

Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> Very good question, but honestly I really dont see why it was there at the 
> first place :

It was there because someone went through this file and robotically
replaced all conditional read barriers with rcu_dereference, even when
it made zero sense.

Basically you can add a conditional barrier either at the point where
the pointer gets read, or where it gets derferenced.  Previously we
did the latter (except that the show function didn't have a barrier
at all which is technically a bug though harmless in pratice).  This
patch moves it to the spot where it gets read which is also OK.

> static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> {
> -       struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +       struct rt_cache_iter_state *st = seq->private;
> 
> -       r = r->u.dst.rt_next;
> +       r = rcu_dereference(r->u.dst.rt_next);
>        while (!r) {
>                rcu_read_unlock_bh();
>                if (--st->bucket < 0)
>                        break;
>                rcu_read_lock_bh();
> -               r = rt_hash_table[st->bucket].chain;
> +               r = rcu_dereference(rt_hash_table[st->bucket].chain);
>        }
>        return r;

Slight optimisation: please move both barriers onto the return statement,
i.e.,

	return rcu_dereference(r);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09  6:02         ` Herbert Xu
@ 2008-01-09  7:38           ` Eric Dumazet
  2008-01-09  9:46             ` Herbert Xu
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2008-01-09  7:38 UTC (permalink / raw)
  To: Herbert Xu, Paul E. McKenney; +Cc: davem, dipankar, netdev

[-- Attachment #1: Type: text/plain, Size: 2290 bytes --]

Herbert Xu a écrit :
> Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Very good question, but honestly I really dont see why it was there at the 
>> first place :
> 
> It was there because someone went through this file and robotically
> replaced all conditional read barriers with rcu_dereference, even when
> it made zero sense.
> 
> Basically you can add a conditional barrier either at the point where
> the pointer gets read, or where it gets derferenced.  Previously we
> did the latter (except that the show function didn't have a barrier
> at all which is technically a bug though harmless in pratice).  This
> patch moves it to the spot where it gets read which is also OK.
> 
>> static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>> {
>> -       struct rt_cache_iter_state *st = rcu_dereference(seq->private);
>> +       struct rt_cache_iter_state *st = seq->private;
>>
>> -       r = r->u.dst.rt_next;
>> +       r = rcu_dereference(r->u.dst.rt_next);
>>        while (!r) {
>>                rcu_read_unlock_bh();
>>                if (--st->bucket < 0)
>>                        break;
>>                rcu_read_lock_bh();
>> -               r = rt_hash_table[st->bucket].chain;
>> +               r = rcu_dereference(rt_hash_table[st->bucket].chain);
>>        }
>>        return r;
> 
> Slight optimisation: please move both barriers onto the return statement,
> i.e.,
> 
> 	return rcu_dereference(r);

I am not sure this is valid, since it will do this :

r = rt_hash_table[st->bucket].chain;
if (r)
     return rcu_dereference(r);

So compiler might be dumb enough do dereference 
&rt_hash_table[st->bucket].chain two times.


It seems Dipankar is busy at this moment, so I will post again the patch and 
ask a comment from Paul :)

Thank you

[NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
since seq is private to the thread running this function. Reading seq.private
once (as guaranted bu rcu_dereference()) or several time if compiler really is 
dumb enough wont change the result.

But we miss real spots where rcu_dereference() are needed, both in 
rt_cache_get_first() and rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: route_rcu.patch --]
[-- Type: text/plain, Size: 1011 bytes --]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d337706..3b7562f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -278,7 +278,7 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 		if (r)
 			break;
 		rcu_read_unlock_bh();
@@ -288,15 +288,15 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 
 static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 {
-	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
+	struct rt_cache_iter_state *st = seq->private;
 
-	r = r->u.dst.rt_next;
+	r = rcu_dereference(r->u.dst.rt_next);
 	while (!r) {
 		rcu_read_unlock_bh();
 		if (--st->bucket < 0)
 			break;
 		rcu_read_lock_bh();
-		r = rt_hash_table[st->bucket].chain;
+		r = rcu_dereference(rt_hash_table[st->bucket].chain);
 	}
 	return r;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09  7:38           ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Eric Dumazet
@ 2008-01-09  9:46             ` Herbert Xu
  2008-01-09 10:37               ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2008-01-09  9:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Paul E. McKenney, davem, dipankar, netdev

On Wed, Jan 09, 2008 at 08:38:56AM +0100, Eric Dumazet wrote:
> 
> I am not sure this is valid, since it will do this :
> 
> r = rt_hash_table[st->bucket].chain;
> if (r)
>     return rcu_dereference(r);
> 
> So compiler might be dumb enough do dereference 
> &rt_hash_table[st->bucket].chain two times.

That wouldn't be a problem at all.  The key is to add a barrier between
reading the pointer:

	r = rt_hash_table[st->bucket].chain

and dereferencing it later, e.g.,

	r->u.dst.rt_next

The barrier is there so that when we dereference r we don't read
stale cache that was there before the memory at r was initialised.
How many times you read the pointer value before the barrier is
irrelevant to the effectiveness of the barrier preceding the
dereference.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09  9:46             ` Herbert Xu
@ 2008-01-09 10:37               ` Eric Dumazet
  2008-01-09 14:22                 ` Paul E. McKenney
                                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Eric Dumazet @ 2008-01-09 10:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Paul E. McKenney, davem, dipankar, netdev

On Wed, 9 Jan 2008 20:46:37 +1100
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Jan 09, 2008 at 08:38:56AM +0100, Eric Dumazet wrote:
> > 
> > I am not sure this is valid, since it will do this :
> > 
> > r = rt_hash_table[st->bucket].chain;
> > if (r)
> >     return rcu_dereference(r);
> > 
> > So compiler might be dumb enough do dereference 
> > &rt_hash_table[st->bucket].chain two times.
> 
> That wouldn't be a problem at all.  The key is to add a barrier between
> reading the pointer:
> 
> 	r = rt_hash_table[st->bucket].chain
> 
> and dereferencing it later, e.g.,
> 
> 	r->u.dst.rt_next
> 
> The barrier is there so that when we dereference r we don't read
> stale cache that was there before the memory at r was initialised.
> How many times you read the pointer value before the barrier is
> irrelevant to the effectiveness of the barrier preceding the
> dereference.

You are absolutely right Herbert, so I changed the patch to :

[NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache

In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
since seq is private to the thread running this function. Reading seq.private
once (as guaranted bu rcu_dereference()) or several time if compiler really is 
dumb enough wont change the result.

But we miss real spots where rcu_dereference() are needed, both in 
rt_cache_get_first() and rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d337706..28484f3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
 			break;
 		rcu_read_unlock_bh();
 	}
-	return r;
+	return rcu_dereference(r);
 }
 
 static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 {
-	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
+	struct rt_cache_iter_state *st = seq->private;
 
 	r = r->u.dst.rt_next;
 	while (!r) {
@@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
 		rcu_read_lock_bh();
 		r = rt_hash_table[st->bucket].chain;
 	}
-	return r;
+	return rcu_dereference(r);
 }
 
 static struct rtable *rt_cache_get_idx(struct seq_file *seq, loff_t pos)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 10:37               ` Eric Dumazet
@ 2008-01-09 14:22                 ` Paul E. McKenney
  2008-01-09 14:31                   ` David Miller
  2008-01-10 11:56                 ` David Miller
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2008-01-09 14:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, davem, dipankar, netdev

On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> On Wed, 9 Jan 2008 20:46:37 +1100
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > On Wed, Jan 09, 2008 at 08:38:56AM +0100, Eric Dumazet wrote:
> > > 
> > > I am not sure this is valid, since it will do this :
> > > 
> > > r = rt_hash_table[st->bucket].chain;
> > > if (r)
> > >     return rcu_dereference(r);
> > > 
> > > So compiler might be dumb enough do dereference 
> > > &rt_hash_table[st->bucket].chain two times.
> > 
> > That wouldn't be a problem at all.  The key is to add a barrier between
> > reading the pointer:
> > 
> > 	r = rt_hash_table[st->bucket].chain
> > 
> > and dereferencing it later, e.g.,
> > 
> > 	r->u.dst.rt_next
> > 
> > The barrier is there so that when we dereference r we don't read
> > stale cache that was there before the memory at r was initialised.
> > How many times you read the pointer value before the barrier is
> > irrelevant to the effectiveness of the barrier preceding the
> > dereference.

Agreed -- as long as you don't try to dereference the pointer before
passing it through rcu_dereference(), and as long as both the initial
fetch of the pointer, the rcu_dereference(), and the actual dereferencing
of the pointer are all within the same RCU read-side critical section.

> You are absolutely right Herbert, so I changed the patch to :
> 
> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> 
> In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
> since seq is private to the thread running this function. Reading seq.private
> once (as guaranted bu rcu_dereference()) or several time if compiler really is 
> dumb enough wont change the result.
> 
> But we miss real spots where rcu_dereference() are needed, both in 
> rt_cache_get_first() and rt_cache_get_next()
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d337706..28484f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>  			break;
>  		rcu_read_unlock_bh();
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
to help out sparse?

>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  {
> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +	struct rt_cache_iter_state *st = seq->private;
> 
>  	r = r->u.dst.rt_next;
>  	while (!r) {
> @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  		rcu_read_lock_bh();
>  		r = rt_hash_table[st->bucket].chain;
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

Ditto for rt_cache_get_next()?

>  static struct rtable *rt_cache_get_idx(struct seq_file *seq, loff_t pos)

There would need to be a __releases(RCU) somewhere -- possibly
in rt_cache_seq_stop(), but need to defer to you guys on this one.

						Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 14:22                 ` Paul E. McKenney
@ 2008-01-09 14:31                   ` David Miller
  2008-01-09 14:43                     ` Paul E. McKenney
  2008-01-09 17:36                     ` Eric Dumazet
  0 siblings, 2 replies; 27+ messages in thread
From: David Miller @ 2008-01-09 14:31 UTC (permalink / raw)
  To: paulmck; +Cc: dada1, herbert, dipankar, netdev

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Wed, 9 Jan 2008 06:22:58 -0800

> On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> > On Wed, 9 Jan 2008 20:46:37 +1100
> > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index d337706..28484f3 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> >  			break;
> >  		rcu_read_unlock_bh();
> >  	}
> > -	return r;
> > +	return rcu_dereference(r);
> >  }
> 
> Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
> to help out sparse?

Sparse can't handle conditional locking very well, as is done here.
There is a seperate thread where Eric reworks how all of this
locking is done in order to pacify sparse and be able to add the
__acquires() etc. tags and some of us found it too ugly to
swallow :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 14:31                   ` David Miller
@ 2008-01-09 14:43                     ` Paul E. McKenney
  2008-01-09 17:36                     ` Eric Dumazet
  1 sibling, 0 replies; 27+ messages in thread
From: Paul E. McKenney @ 2008-01-09 14:43 UTC (permalink / raw)
  To: David Miller; +Cc: dada1, herbert, dipankar, netdev, josh

On Wed, Jan 09, 2008 at 06:31:26AM -0800, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Wed, 9 Jan 2008 06:22:58 -0800
> 
> > On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
> > > On Wed, 9 Jan 2008 20:46:37 +1100
> > > Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > > 
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index d337706..28484f3 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> > >  			break;
> > >  		rcu_read_unlock_bh();
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > 
> > Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
> > to help out sparse?
> 
> Sparse can't handle conditional locking very well, as is done here.
> There is a seperate thread where Eric reworks how all of this
> locking is done in order to pacify sparse and be able to add the
> __acquires() etc. tags and some of us found it too ugly to
> swallow :-)

Ah!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 14:31                   ` David Miller
  2008-01-09 14:43                     ` Paul E. McKenney
@ 2008-01-09 17:36                     ` Eric Dumazet
  1 sibling, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2008-01-09 17:36 UTC (permalink / raw)
  To: David Miller; +Cc: paulmck, herbert, dipankar, netdev

David Miller a écrit :
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Wed, 9 Jan 2008 06:22:58 -0800
>
>   
>> On Wed, Jan 09, 2008 at 11:37:27AM +0100, Eric Dumazet wrote:
>>     
>>> On Wed, 9 Jan 2008 20:46:37 +1100
>>> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>>
>>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>>> index d337706..28484f3 100644
>>> --- a/net/ipv4/route.c
>>> +++ b/net/ipv4/route.c
>>> @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>>>  			break;
>>>  		rcu_read_unlock_bh();
>>>  	}
>>> -	return r;
>>> +	return rcu_dereference(r);
>>>  }
>>>       
>> Would it be possible to tag rt_cache_get_first() with an __acquires(RCU)
>> to help out sparse?
>>     
>
> Sparse can't handle conditional locking very well, as is done here.
> There is a seperate thread where Eric reworks how all of this
> locking is done in order to pacify sparse and be able to add the
> __acquires() etc. tags and some of us found it too ugly to
> swallow :-)
>
>   

I will post a patch series to address this point after next merge 
(net-2.6 -> net-2.6.25)

Thank you





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 10:37               ` Eric Dumazet
  2008-01-09 14:22                 ` Paul E. McKenney
@ 2008-01-10 11:56                 ` David Miller
  2008-01-10 14:06                 ` [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache Eric Dumazet
  2008-01-10 23:10                 ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Jarek Poplawski
  3 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2008-01-10 11:56 UTC (permalink / raw)
  To: dada1; +Cc: herbert, paulmck, dipankar, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Wed, 9 Jan 2008 11:37:27 +0100

> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> 
> In rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
> since seq is private to the thread running this function. Reading seq.private
> once (as guaranted bu rcu_dereference()) or several time if compiler really is 
> dumb enough wont change the result.
> 
> But we miss real spots where rcu_dereference() are needed, both in 
> rt_cache_get_first() and rt_cache_get_next()
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

I've applied this to net-2.6, thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
  2008-01-09 10:37               ` Eric Dumazet
  2008-01-09 14:22                 ` Paul E. McKenney
  2008-01-10 11:56                 ` David Miller
@ 2008-01-10 14:06                 ` Eric Dumazet
  2008-01-11  6:35                   ` David Miller
  2008-01-10 23:10                 ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Jarek Poplawski
  3 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2008-01-10 14:06 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Paul E. McKenney, dipankar, netdev

Hi David

Here is DECNET part, shadowing commit 0bcceadceb0907094ba4e40bf9a7cd9b080f13fb ([IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache )

Thank you


[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache

In dn_rt_cache_get_next(), no need to guard seq->private by a rcu_dereference()
since seq is private to the thread running this function. Reading seq.private
once (as guaranted bu rcu_dereference()) or several time if compiler really is 
dumb enough wont change the result.
 
But we miss real spots where rcu_dereference() are needed, both in 
dn_rt_cache_get_first() and dn_rt_cache_get_next()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 66663e5..0e10ff2 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1665,12 +1665,12 @@ static struct dn_route *dn_rt_cache_get_first(struct seq_file *seq)
 			break;
 		rcu_read_unlock_bh();
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_route *rt)
 {
-	struct dn_rt_cache_iter_state *s = rcu_dereference(seq->private);
+	struct dn_rt_cache_iter_state *s = seq->private;
 
 	rt = rt->u.dst.dn_next;
 	while(!rt) {
@@ -1680,7 +1680,7 @@ static struct dn_route *dn_rt_cache_get_next(struct seq_file *seq, struct dn_rou
 		rcu_read_lock_bh();
 		rt = dn_rt_hash_table[s->bucket].chain;
 	}
-	return rt;
+	return rcu_dereference(rt);
 }
 
 static void *dn_rt_cache_seq_start(struct seq_file *seq, loff_t *pos)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-09 10:37               ` Eric Dumazet
                                   ` (2 preceding siblings ...)
  2008-01-10 14:06                 ` [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache Eric Dumazet
@ 2008-01-10 23:10                 ` Jarek Poplawski
  2008-01-10 23:51                   ` Paul E. McKenney
  2008-01-11  0:00                   ` Herbert Xu
  3 siblings, 2 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-10 23:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, Paul E. McKenney, davem, dipankar, netdev

Eric Dumazet wrote, On 01/09/2008 11:37 AM:
...
> [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
...
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index d337706..28484f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
>  			break;
>  		rcu_read_unlock_bh();
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }
>  
>  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  {
> -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> +	struct rt_cache_iter_state *st = seq->private;
>  
>  	r = r->u.dst.rt_next;
>  	while (!r) {
> @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
>  		rcu_read_lock_bh();
>  		r = rt_hash_table[st->bucket].chain;
>  	}
> -	return r;
> +	return rcu_dereference(r);
>  }

It seems this optimization could've a side effect: if during such a
loop updates are done, and r is seen !NULL during while() check, but
NULL after rcu_dereference(), the listing/counting could stop too
soon. So, IMHO, probably the first version of this patch is more
reliable. (Or alternatively additional check is needed before return.)

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-10 23:10                 ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Jarek Poplawski
@ 2008-01-10 23:51                   ` Paul E. McKenney
  2008-01-11 14:13                     ` Jarek Poplawski
  2008-01-11  0:00                   ` Herbert Xu
  1 sibling, 1 reply; 27+ messages in thread
From: Paul E. McKenney @ 2008-01-10 23:51 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, Herbert Xu, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> Eric Dumazet wrote, On 01/09/2008 11:37 AM:
> ...
> > [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> ...
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index d337706..28484f3 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> >  			break;
> >  		rcu_read_unlock_bh();
> >  	}
> > -	return r;
> > +	return rcu_dereference(r);
> >  }
> >  
> >  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> >  {
> > -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> > +	struct rt_cache_iter_state *st = seq->private;
> >  
> >  	r = r->u.dst.rt_next;
> >  	while (!r) {
> > @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> >  		rcu_read_lock_bh();
> >  		r = rt_hash_table[st->bucket].chain;
> >  	}
> > -	return r;
> > +	return rcu_dereference(r);
> >  }
> 
> It seems this optimization could've a side effect: if during such a
> loop updates are done, and r is seen !NULL during while() check, but
> NULL after rcu_dereference(), the listing/counting could stop too
> soon. So, IMHO, probably the first version of this patch is more
> reliable. (Or alternatively additional check is needed before return.)

Looks to me like "r" is a local variable (argument list), so there
should not be any possibility of it being changed by some other
task, right?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-10 23:10                 ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Jarek Poplawski
  2008-01-10 23:51                   ` Paul E. McKenney
@ 2008-01-11  0:00                   ` Herbert Xu
  2008-01-11  8:30                     ` Jarek Poplawski
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2008-01-11  0:00 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
>
> It seems this optimization could've a side effect: if during such a
> loop updates are done, and r is seen !NULL during while() check, but
> NULL after rcu_dereference(), the listing/counting could stop too
> soon. So, IMHO, probably the first version of this patch is more
> reliable. (Or alternatively additional check is needed before return.)

No, while the value of r->u.dst.rt_next can change between two readings,
the value of r cannot.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
  2008-01-10 14:06                 ` [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache Eric Dumazet
@ 2008-01-11  6:35                   ` David Miller
  0 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2008-01-11  6:35 UTC (permalink / raw)
  To: dada1; +Cc: herbert, paulmck, dipankar, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 10 Jan 2008 15:06:10 +0100

> [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache
 ...
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11  0:00                   ` Herbert Xu
@ 2008-01-11  8:30                     ` Jarek Poplawski
  2008-01-11  9:11                       ` Jarek Poplawski
  2008-01-11 10:37                       ` Herbert Xu
  0 siblings, 2 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11  8:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> >
> > It seems this optimization could've a side effect: if during such a
> > loop updates are done, and r is seen !NULL during while() check, but
> > NULL after rcu_dereference(), the listing/counting could stop too
> > soon. So, IMHO, probably the first version of this patch is more
> > reliable. (Or alternatively additional check is needed before return.)
> 
> No, while the value of r->u.dst.rt_next can change between two readings,
> the value of r cannot.

...Then, of course, it's O.K.!

It looks like I'm really too lazy and/or these selfdocumenting features
of RCU are a bit overrated: one can never be sure which pointer is
really RCU protected without checking a few places?! So, after looking
at this rt_cache_get_next() and this patch only, it's looks like the
third candidate after seq->private and rtable...

Thanks for explanation and sorry for disturbing!
Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11  8:30                     ` Jarek Poplawski
@ 2008-01-11  9:11                       ` Jarek Poplawski
  2008-01-11  9:23                         ` Jarek Poplawski
  2008-01-11 10:38                         ` Herbert Xu
  2008-01-11 10:37                       ` Herbert Xu
  1 sibling, 2 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11  9:11 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> On Fri, Jan 11, 2008 at 11:00:20AM +1100, Herbert Xu wrote:
> > On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > >
> > > It seems this optimization could've a side effect: if during such a
> > > loop updates are done, and r is seen !NULL during while() check, but
> > > NULL after rcu_dereference(), the listing/counting could stop too
> > > soon. So, IMHO, probably the first version of this patch is more
> > > reliable. (Or alternatively additional check is needed before return.)
> > 
> > No, while the value of r->u.dst.rt_next can change between two readings,
> > the value of r cannot.
> 
> ...Then, of course, it's O.K.!
> 
> It looks like I'm really too lazy and/or these selfdocumenting features
> of RCU are a bit overrated: one can never be sure which pointer is
> really RCU protected without checking a few places?! So, after looking
> at this rt_cache_get_next() and this patch only, it's looks like the
> third candidate after seq->private and rtable...

OOPS! ...it seems we are talking about the same, properly documented
(second) poiner yet...

So, IOW: strictly speaking you are right, r can't change here, but I
meant r vs. the returned value! Before the patch the returned value
couldn't be NULL unless all elements of the list were looped. After
this patch it seems possible...

Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11  9:11                       ` Jarek Poplawski
@ 2008-01-11  9:23                         ` Jarek Poplawski
  2008-01-11 10:38                         ` Herbert Xu
  1 sibling, 0 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11  9:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
...
> So, IOW: strictly speaking you are right, r can't change here, but I
> meant r vs. the returned value! Before the patch the returned value
> couldn't be NULL unless all elements of the list were looped. After

...even more strictly:

couldn't be NULL unless all buckets of the hash table were looped. After

> this patch it seems possible...

Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11  8:30                     ` Jarek Poplawski
  2008-01-11  9:11                       ` Jarek Poplawski
@ 2008-01-11 10:37                       ` Herbert Xu
  2008-01-11 12:31                         ` Jarek Poplawski
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2008-01-11 10:37 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> 
> It looks like I'm really too lazy and/or these selfdocumenting features
> of RCU are a bit overrated: one can never be sure which pointer is
> really RCU protected without checking a few places?! So, after looking
> at this rt_cache_get_next() and this patch only, it's looks like the
> third candidate after seq->private and rtable...

Perhaps we could introduce a sparse attribute for it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11  9:11                       ` Jarek Poplawski
  2008-01-11  9:23                         ` Jarek Poplawski
@ 2008-01-11 10:38                         ` Herbert Xu
  2008-01-11 11:30                           ` Jarek Poplawski
  1 sibling, 1 reply; 27+ messages in thread
From: Herbert Xu @ 2008-01-11 10:38 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
> 
> So, IOW: strictly speaking you are right, r can't change here, but I
> meant r vs. the returned value! Before the patch the returned value
> couldn't be NULL unless all elements of the list were looped. After
> this patch it seems possible...

Since rcu_derference(r) is always the same as r this patch cannot
change the value returned.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11 10:38                         ` Herbert Xu
@ 2008-01-11 11:30                           ` Jarek Poplawski
  0 siblings, 0 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11 11:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 09:38:52PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 10:11:40AM +0100, Jarek Poplawski wrote:
> > 
> > So, IOW: strictly speaking you are right, r can't change here, but I
> > meant r vs. the returned value! Before the patch the returned value
> > couldn't be NULL unless all elements of the list were looped. After
> > this patch it seems possible...
> 
> Since rcu_derference(r) is always the same as r this patch cannot
> change the value returned.

Right!!! (But, you mean: "always the same as r" for local r, I hope...)

So, my moronness's selfdocumenting features are not overrated at all!

Thanks again,
Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-11 10:37                       ` Herbert Xu
@ 2008-01-11 12:31                         ` Jarek Poplawski
  0 siblings, 0 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11 12:31 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Eric Dumazet, Paul E. McKenney, davem, dipankar, netdev

On Fri, Jan 11, 2008 at 09:37:42PM +1100, Herbert Xu wrote:
> On Fri, Jan 11, 2008 at 09:30:10AM +0100, Jarek Poplawski wrote:
> > 
> > It looks like I'm really too lazy and/or these selfdocumenting features
> > of RCU are a bit overrated: one can never be sure which pointer is
> > really RCU protected without checking a few places?! So, after looking
> > at this rt_cache_get_next() and this patch only, it's looks like the
> > third candidate after seq->private and rtable...
> 
> Perhaps we could introduce a sparse attribute for it?

I hope I won't be cursed by all those forced to additional writing,
so I'd only admit that after this patch there should be no problem
with identifying RCU protected data properly (maybe only this kind
of rcu_dereference() needs some popularization).

Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
  2008-01-10 23:51                   ` Paul E. McKenney
@ 2008-01-11 14:13                     ` Jarek Poplawski
  0 siblings, 0 replies; 27+ messages in thread
From: Jarek Poplawski @ 2008-01-11 14:13 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Eric Dumazet, Herbert Xu, davem, dipankar, netdev

On Thu, Jan 10, 2008 at 03:51:11PM -0800, Paul E. McKenney wrote:
> On Fri, Jan 11, 2008 at 12:10:42AM +0100, Jarek Poplawski wrote:
> > Eric Dumazet wrote, On 01/09/2008 11:37 AM:
> > ...
> > > [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache
> > ...
> > > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > > index d337706..28484f3 100644
> > > --- a/net/ipv4/route.c
> > > +++ b/net/ipv4/route.c
> > > @@ -283,12 +283,12 @@ static struct rtable *rt_cache_get_first(struct seq_file *seq)
> > >  			break;
> > >  		rcu_read_unlock_bh();
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > >  
> > >  static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  {
> > > -	struct rt_cache_iter_state *st = rcu_dereference(seq->private);
> > > +	struct rt_cache_iter_state *st = seq->private;
> > >  
> > >  	r = r->u.dst.rt_next;
> > >  	while (!r) {
> > > @@ -298,7 +298,7 @@ static struct rtable *rt_cache_get_next(struct seq_file *seq, struct rtable *r)
> > >  		rcu_read_lock_bh();
> > >  		r = rt_hash_table[st->bucket].chain;
> > >  	}
> > > -	return r;
> > > +	return rcu_dereference(r);
> > >  }
> > 
> > It seems this optimization could've a side effect: if during such a
> > loop updates are done, and r is seen !NULL during while() check, but
> > NULL after rcu_dereference(), the listing/counting could stop too
> > soon. So, IMHO, probably the first version of this patch is more
> > reliable. (Or alternatively additional check is needed before return.)
> 
> Looks to me like "r" is a local variable (argument list), so there
> should not be any possibility of it being changed by some other
> task, right?

It seems words could be stronger than then logic (in some cases)...
After forgetting what's dereference usually for, it's all right!

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2008-01-11 14:07 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-07 18:30 [IPV4] ROUTE: ip_rt_dump() is unecessary slow Eric Dumazet
2008-01-08  5:52 ` David Miller
2008-01-08  6:11   ` Eric Dumazet
2008-01-08  6:15     ` David Miller
2008-01-08  6:37       ` Eric Dumazet
2008-01-09  6:02         ` Herbert Xu
2008-01-09  7:38           ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Eric Dumazet
2008-01-09  9:46             ` Herbert Xu
2008-01-09 10:37               ` Eric Dumazet
2008-01-09 14:22                 ` Paul E. McKenney
2008-01-09 14:31                   ` David Miller
2008-01-09 14:43                     ` Paul E. McKenney
2008-01-09 17:36                     ` Eric Dumazet
2008-01-10 11:56                 ` David Miller
2008-01-10 14:06                 ` [DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache Eric Dumazet
2008-01-11  6:35                   ` David Miller
2008-01-10 23:10                 ` [NET] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache Jarek Poplawski
2008-01-10 23:51                   ` Paul E. McKenney
2008-01-11 14:13                     ` Jarek Poplawski
2008-01-11  0:00                   ` Herbert Xu
2008-01-11  8:30                     ` Jarek Poplawski
2008-01-11  9:11                       ` Jarek Poplawski
2008-01-11  9:23                         ` Jarek Poplawski
2008-01-11 10:38                         ` Herbert Xu
2008-01-11 11:30                           ` Jarek Poplawski
2008-01-11 10:37                       ` Herbert Xu
2008-01-11 12:31                         ` Jarek Poplawski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).