netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tiny af_packet.c cleanup
@ 2003-09-13  5:50 Mitchell Blank Jr
  2003-09-13  7:35 ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Blank Jr @ 2003-09-13  5:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

In both places af_packet.c calls sk_run_filter() it does a "filter =
sk->sk_filter" to store the pointer in a local variable.  However, it
never actually uses this variable for anything -- it always refers to
the filter as sk->sk_filter instead.  Clearly this wasn't intentional.

This patch fixes up both of those cases and cleans up the code a tiny bit.

Patch is versus 2.6.0-test5.

-Mitch

--- linux-2.6.0-test5-VIRGIN/net/packet/af_packet.c	2003-09-08 12:39:53.000000000 -0700
+++ linux-2.6.0-test5mnb1/net/packet/af_packet.c	2003-09-12 13:28:53.857179768 -0700
@@ -433,9 +433,9 @@
 		struct sk_filter *filter;
 
 		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
+		filter = sk->sk_filter;
+		if (likely(filter != NULL))	/* re-check under lock */
+			res = sk_run_filter(skb, filter->insns, filter->len);
 		bh_unlock_sock(sk);
 
 		if (res == 0)
@@ -537,9 +537,9 @@
 		struct sk_filter *filter;
 
 		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
+		filter = sk->sk_filter;
+		if (likely(filter != NULL))	/* re-check under lock */
+			res = sk_run_filter(skb, filter->insns, filter->len);
 		bh_unlock_sock(sk);
 
 		if (res == 0)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-13  5:50 [PATCH] tiny af_packet.c cleanup Mitchell Blank Jr
@ 2003-09-13  7:35 ` Francois Romieu
  2003-09-13  8:02   ` Mitchell Blank Jr
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2003-09-13  7:35 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: David S. Miller, netdev

Mitchell Blank Jr <mitch@sfgoth.com> :
[...]
> --- linux-2.6.0-test5-VIRGIN/net/packet/af_packet.c	2003-09-08 12:39:53.000000000 -0700
> +++ linux-2.6.0-test5mnb1/net/packet/af_packet.c	2003-09-12 13:28:53.857179768 -0700
> @@ -433,9 +433,9 @@
>  		struct sk_filter *filter;
>  
>  		bh_lock_sock(sk);
> -		if ((filter = sk->sk_filter) != NULL)
> -			res = sk_run_filter(skb, sk->sk_filter->insns,
> -					    sk->sk_filter->len);
> +		filter = sk->sk_filter;
> +		if (likely(filter != NULL))	/* re-check under lock */
> +			res = sk_run_filter(skb, filter->insns, filter->len);

1 - pointers tests against NULL are naturally supposed "unlikely" by gcc.
2 - I am not completely convinced that the "/* re-check under lock */" 
    comment is really useful either: the lock statement is on the line before.

Make it more spartan:

--- linux-2.6.0-test5-VIRGIN/net/packet/af_packet.c	2003-09-08 12:39:53.000000000 -0700
+++ linux-2.6.0-test5mnb1/net/packet/af_packet.c	2003-09-12 13:28:53.857179768 -0700
@@ -433,9 +433,9 @@
 		struct sk_filter *filter;
 
 		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
+		filter = sk->sk_filter;
+		if (filter)
+			res = sk_run_filter(skb, filter->insns, filter->len);
 		bh_unlock_sock(sk);
 
 		if (res == 0)
@@ -537,9 +537,9 @@
 		struct sk_filter *filter;
 
 		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
+		filter = sk->sk_filter;
+		if (filter)
+			res = sk_run_filter(skb, filter->insns, filter->len);
 		bh_unlock_sock(sk);
 
 		if (res == 0)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-13  7:35 ` Francois Romieu
@ 2003-09-13  8:02   ` Mitchell Blank Jr
  2003-09-13  9:03     ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Blank Jr @ 2003-09-13  8:02 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Francois Romieu wrote:
> 2 - I am not completely convinced that the "/* re-check under lock */" 
>     comment is really useful either: the lock statement is on the line before.

I thought that without the comment someone might think that the second
"if()" wasn't needed (since we had just checked the same value against
NULL a few lines up)

-Mitch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-13  8:02   ` Mitchell Blank Jr
@ 2003-09-13  9:03     ` Francois Romieu
  2003-09-13 20:15       ` Mitchell Blank Jr
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2003-09-13  9:03 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: netdev

Mitchell Blank Jr <mitch@sfgoth.com> :
[...]
> I thought that without the comment someone might think that the second
> "if()" wasn't needed (since we had just checked the same value against
> NULL a few lines up)

Ok, I completely missed the intent.

Actually packet_rcv() is run in a BH context and doesn't race with
SO_{ATTACH/DETACH}_FILTER from sock_setsockopt() which does the
appropriate BH locking (spin_lock_bh(&sk->sk_lock.slock) in
net/core/sock.c::sock_setsockopt and in net/core/filter.c::sk_attach_filter).
packet_rcv() doesn't race with BH either due to the bh_lock_sock (a spin_lock
in disguise) you quoted.

That being said, I don't see how such an explanation could fit in a short,
inlined comment. :o)

--
Ueimor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-13  9:03     ` Francois Romieu
@ 2003-09-13 20:15       ` Mitchell Blank Jr
  2003-09-14 10:55         ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Blank Jr @ 2003-09-13 20:15 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Francois Romieu wrote:
> Actually packet_rcv() is run in a BH context and doesn't race with
> SO_{ATTACH/DETACH}_FILTER from sock_setsockopt() which does the
> appropriate BH locking (spin_lock_bh(&sk->sk_lock.slock) in
> net/core/sock.c::sock_setsockopt and in net/core/filter.c::sk_attach_filter).
> packet_rcv() doesn't race with BH either due to the bh_lock_sock (a spin_lock
> in disguise) you quoted.

I don't understand what you're saying - are you saying that the locking
isn't needed?  Or just the recheck isn't needed?  It sounds like if we're
only avoiding the race because of the bh_lock_sock() then we do need to
recheck, right?

Could you do a patch for what you think it should look like?  You obviously
understand the locking issues here better than I.

-Mitch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-13 20:15       ` Mitchell Blank Jr
@ 2003-09-14 10:55         ` Francois Romieu
  2003-09-14 11:26           ` Mitchell Blank Jr
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2003-09-14 10:55 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: netdev

Mitchell Blank Jr <mitch@sfgoth.com> :
[...]
> I don't understand what you're saying - are you saying that the locking
> isn't needed?  Or just the recheck isn't needed?  It sounds like if we're
> only avoiding the race because of the bh_lock_sock() then we do need to
> recheck, right?

- the locking is needed
- the recheck is needed
(- it does more than the recheck)

My point was related to the comment included in the patch, namely
"/* re-check under lock */"
>From a reader's view, it is important to know why things are done and the
comment only tells _what_ is done.

So why is locking required ?

The comment just before the head of packet_rcv() suggests it.

As usual packet handler [1] in Linux, this code is run in a BH context.
It can race with user-space when it uses the struct sock related part of
the sk_buff (and it could race with tasklets when it messes with the
struct sk_buff itself)

User-space can change sk->sk_filter through setsockopt() +
SO_{ATTACH/DETACH}_FILTER but it takes care to only do so in sections 
surrounded by spin_{lock/unlock}_bh() [2]. As the first "if (sk->sk_filter)"
in packet_rcv() takes no lock, there is a time window where it is possible
that sk->sk_filter has been changed by user-space code on one cpu whereas
packet_rcv() still sees the old value. However, user is guaranteed that a
packet received _after_ his call to setsockopt() has returned will be
correctly handled by packet_rcv(). Whence 1) user is happy 2) packet_rcv()
isn't forced to take a lock before examination of sk->sk_filter when there
is no BPF handling to do (optimization).

Now let's assume some BPF-related action _appears_ to be needed. Without
locking, even filter = sk->sk_filter in packet_rcv() could race with
sk->sk_filter = NULL in sock_setsockopt() and result in partially set
content for "filter". That's why user-space code forbids BH on its CPU and
stops BH on others CPU with sk->sk_lock.slock. As soon as packet_rcv()
try to lock sk->sk_lock.slock, locking between packet_rcv() and user space
is fine.
As there is no point in issuing "forbid BH on my CPU" from packet_rcv(),
this part of the locking is forgotten. Thus packet_rcv() only issues a
spin_{lock/unlock} (which protects from tasklets on different CPU as well).
spin_{lock/unlock} is named bh_{lock/unlock}_sock. I assume it is on
documentation purpose and to allow an evil plan in the future.

[1] net/core/dev.c::dev_add_pack() and struct packet_type in
    include/linux/netdevice.h
[2] net/core/sock.c::sock_setsockopt() ... SO_DETACH_FILTER and
    net/core/filter.c::sk_attach_filter()

> Could you do a patch for what you think it should look like?  You obviously
> understand the locking issues here better than I.

See previously posted patch. Imho the non-trivial part isn't the locking
itself but the fact that the first test of sk->sk_filter is done _without_
lock. May be something like the following comment before this test:

/* Racing with user-space for optimization purpose: don't panic */

Tell me if it is correctly worded and I'll patch it.

Btw, the locking issues are rather well explained in 
Documentation/DocBook/kernel-locking.tmpl (and in Schimmel's book as well).

--
Ueimor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-14 10:55         ` Francois Romieu
@ 2003-09-14 11:26           ` Mitchell Blank Jr
  2003-09-15 22:56             ` David S. Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Blank Jr @ 2003-09-14 11:26 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev

Francois Romieu wrote:
> See previously posted patch. Imho the non-trivial part isn't the locking
> itself but the fact that the first test of sk->sk_filter is done _without_
> lock.

OK, that was what I thought was going on.  I figured the short comment (along
with the likely()) would explain this adequately (i.e. "we're now re-checking
under lock so we get the authorative answer") but maybe it needs more
explaination.

-Mitch

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-14 11:26           ` Mitchell Blank Jr
@ 2003-09-15 22:56             ` David S. Miller
  2003-09-16  3:13               ` Mitchell Blank Jr
  0 siblings, 1 reply; 10+ messages in thread
From: David S. Miller @ 2003-09-15 22:56 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: romieu, netdev

On Sun, 14 Sep 2003 04:26:28 -0700
Mitchell Blank Jr <mitch@sfgoth.com> wrote:

> Francois Romieu wrote:
> > See previously posted patch. Imho the non-trivial part isn't the locking
> > itself but the fact that the first test of sk->sk_filter is done _without_
> > lock.
> 
> OK, that was what I thought was going on.  I figured the short comment (along
> with the likely()) would explain this adequately (i.e. "we're now re-checking
> under lock so we get the authorative answer") but maybe it needs more
> explaination.

When you guys decide on a final patch let me know, the semantic
parts of Mitchell's changes look perfectly fine to me.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-15 22:56             ` David S. Miller
@ 2003-09-16  3:13               ` Mitchell Blank Jr
  2003-09-16  5:41                 ` David S. Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Mitchell Blank Jr @ 2003-09-16  3:13 UTC (permalink / raw)
  To: David S. Miller, romieu; +Cc: netdev

David S. Miller wrote:
> When you guys decide on a final patch let me know, the semantic
> parts of Mitchell's changes look perfectly fine to me.

How about something like the following.  It expands the comment but turns
that bit of code into an inline function so we onlt have to explain it
once.

Untested, but compiles fine.

-Mitch

--- linux-2.6.0-test5-VIRGIN/net/packet/af_packet.c	2003-09-08 12:39:53.000000000 -0700
+++ linux-2.6.0-test5mnb1/net/packet/af_packet.c	2003-09-15 10:56:26.313218168 -0700
@@ -381,6 +381,23 @@
 }
 #endif
 
+static inline unsigned run_filter(struct sk_buff *skb, struct sock *sk, unsigned res)
+{
+	struct sk_filter *filter;
+
+	bh_lock_sock(sk);
+	filter = sk->sk_filter;
+	/*
+	 * Our caller already checked that filter != NULL but we need to
+	 * verify that under bh_lock_sock() to be safe
+	 */
+	if (likely(filter != NULL))
+		res = sk_run_filter(skb, filter->insns, filter->len);
+	bh_unlock_sock(sk);
+
+	return res;
+}
+
 /*
    This function makes lazy skb cloning in hope that most of packets
    are discarded by BPF.
@@ -429,15 +446,7 @@
 	snaplen = skb->len;
 
 	if (sk->sk_filter) {
-		unsigned res = snaplen;
-		struct sk_filter *filter;
-
-		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
-		bh_unlock_sock(sk);
-
+		unsigned res = run_filter(skb, sk, snaplen);
 		if (res == 0)
 			goto drop_n_restore;
 		if (snaplen > res)
@@ -533,15 +542,7 @@
 	snaplen = skb->len;
 
 	if (sk->sk_filter) {
-		unsigned res = snaplen;
-		struct sk_filter *filter;
-
-		bh_lock_sock(sk);
-		if ((filter = sk->sk_filter) != NULL)
-			res = sk_run_filter(skb, sk->sk_filter->insns,
-					    sk->sk_filter->len);
-		bh_unlock_sock(sk);
-
+		unsigned res = run_filter(skb, sk, snaplen);
 		if (res == 0)
 			goto drop_n_restore;
 		if (snaplen > res)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] tiny af_packet.c cleanup
  2003-09-16  3:13               ` Mitchell Blank Jr
@ 2003-09-16  5:41                 ` David S. Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David S. Miller @ 2003-09-16  5:41 UTC (permalink / raw)
  To: Mitchell Blank Jr; +Cc: romieu, netdev

On Mon, 15 Sep 2003 20:13:55 -0700
Mitchell Blank Jr <mitch@sfgoth.com> wrote:

> How about something like the following.  It expands the comment but turns
> that bit of code into an inline function so we onlt have to explain it
> once.

Looks good, applied.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-09-16  5:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-09-13  5:50 [PATCH] tiny af_packet.c cleanup Mitchell Blank Jr
2003-09-13  7:35 ` Francois Romieu
2003-09-13  8:02   ` Mitchell Blank Jr
2003-09-13  9:03     ` Francois Romieu
2003-09-13 20:15       ` Mitchell Blank Jr
2003-09-14 10:55         ` Francois Romieu
2003-09-14 11:26           ` Mitchell Blank Jr
2003-09-15 22:56             ` David S. Miller
2003-09-16  3:13               ` Mitchell Blank Jr
2003-09-16  5:41                 ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).