Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next-2.6] bonding: move dev_addr cpy to bond_enslave
From: David Miller @ 2010-06-02 11:17 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100519111428.GA2788@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 19 May 2010 13:14:29 +0200

> Move the code that copies slave's mac address in case that's the first slave into
> bond_enslave. Ifenslave app does this also but that's not a problem. This is
> something that should be done in bond_enslave, and it shound not matter from
> where is it called.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

(Jiri, please number your patches in a set, even if they should apply
 properly independantly, thanks)

Applied.

^ permalink raw reply

* Re: [RFC] tcp: delack_timer expiration changes for every frame
From: David Miller @ 2010-06-02 11:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ilpo.jarvinen
In-Reply-To: <1274388439.2508.27.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 20 May 2010 22:47:19 +0200

> So we change the delack_timer by a positive delta (~ HZ/10) and a
>  negative delta (~HZ/10), on the typical netperf TCP_RR workload.

Yes, this is silly.

> Here, the incoming frame is handled by netperf, doing a recvmsg().
> tcp_send_delayed_ack() sets the delack_timer to jiffies + HZ/25
> 
> [  392.207721] timer->expires=23045, new expires=23019(10) diff=26 timer=e5ecb754
 ...
> [  392.209221]  [<c1279ce7>] ? tcp_send_delayed_ack+0xb5/0xc1
> [  392.209282]  [<c1276d26>] ? tcp_rcv_established+0x39f/0x4f7

HZ/25 is TCP_DELACK_MIN and TCP_ATO_MIN, but the actual value we use
here involves incorporation of various measurements made on the
connection (RTO, etc.)

 ...
> tcp_v4_rcv() sets the delack timer to 37 ticks, so mod_timer() optimizations is not
> working at all.
> 
> [  392.217454] timer->expires=23019, new expires=23049(37) diff=-30 timer=e5ecb754
 ...
> [  392.218765]  [<c127cf56>] ? tcp_v4_rcv+0x41c/0x6b7
> [  392.218826]  [<c1265832>] ? ip_local_deliver_finish+0xe9/0x178

There must be a tail-call here at tcp_v4_rcv() or something missed in
the backtrace stack scanning logic, because tcp_v4_rcv() and it's main
inline tcp_v4_do_rcv() do not modify established state socket timers,
and in particular do not modify the delack timer, that I can see.

It must be in via tcp_rcv_established() or similar.
...

Nevermind, it's the inlined prequeue stuff.

It uses a seperate calculation of the delack timer offset, independant
of the one made by tcp_send_delayed_ack(), it's timer offset formula is:

	(3 * tcp_rto_min(sk)) / 4

with a MAX of:

	TCP_RTO_MAX

So every time we go in and out of recvmsg() we'll hop between these
two different delayed ACK settings.

The prequeue logic is trying to stretch the delayed ACK to 3/4 of a
window of data.  It's set a bit high, intentionally, in the hopes that
we'll get the process into recvmsg() and have it emit it's response
packet from a subsequent sendmsg() (that the ACK can ride on) before
this timer fires.

But when we drop the socket lock to sleep or return to userspace, one
of the next packets is just going to reset this timer differently.

While the intentions of the prequeue code look legit, the use of two
different delayed ACK timeout schemes has bad implications elsewhere.

For example, if the delack timer does actually fire, there is this
ATO fixup code here:

		if (!icsk->icsk_ack.pingpong) {
			/* Delayed ACK missed: inflate ATO. */
			icsk->icsk_ack.ato = min(icsk->icsk_ack.ato << 1, icsk->icsk_rto);
		} else {
			/* Delayed ACK missed: leave pingpong mode and
			 * deflate ATO.
			 */
			icsk->icsk_ack.pingpong = 0;
			icsk->icsk_ack.ato      = TCP_ATO_MIN;
		}

which is totally wrong if the delack timer offset is the one
calculated by the prequeue code.  Doubling the ATO in that case
is completely the wrong thing to do.

So yes we have all kinds of inconsistencies here and we should
probably unify things so that the timer gets kicked less often.

^ permalink raw reply

* Re: [PATCH] net/mpc52xx_phy: Various code cleanups
From: David Miller @ 2010-06-02 10:45 UTC (permalink / raw)
  To: w.sang; +Cc: linuxppc-dev, netdev, grant.likely
In-Reply-To: <1274264470-2905-1-git-send-email-w.sang@pengutronix.de>

From: Wolfram Sang <w.sang@pengutronix.de>
Date: Wed, 19 May 2010 12:21:10 +0200

> - don't free bus->irq (obsoleted by ca816d98170942371535b3e862813b0aba9b7d90)
> - don't dispose irqs (should be done in of_mdiobus_register())
> - use fec-pointer consistently in transfer()
> - use resource_size()
> - cosmetic fixes
> 
> Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: make bonding_store_slaves simpler
From: David Miller @ 2010-06-02 10:40 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100518154638.GF2878@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 18 May 2010 17:46:39 +0200

> This patch makes bonding_store_slaves function nicer and easier to understand.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: remove redundant checks from bonding_store_slaves V2
From: David Miller @ 2010-06-02 10:40 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel
In-Reply-To: <20100518154452.GE2878@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 18 May 2010 17:44:53 +0200

> (it's actually the same as v1)
> 
> Remove checks that duplicates similar checks in bond_enslave.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: move slave MTU handling from sysfs V2
From: David Miller @ 2010-06-02 10:40 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, fubar, bonding-devel, monis
In-Reply-To: <20100518154016.GD2878@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Tue, 18 May 2010 17:42:40 +0200

> V1->V2: corrected res/ret use
> 
> For some reason, MTU handling (storing, and restoring) is taking  place in
> bond_sysfs. The correct place for this code is in bond_enslave, bond_release.
> So move it there.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: remove unused variable "found"
From: David Miller @ 2010-06-02 10:40 UTC (permalink / raw)
  To: jpirko; +Cc: netdev
In-Reply-To: <20100517134953.GD2878@psychotron.lab.eng.brq.redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Mon, 17 May 2010 15:49:54 +0200

> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/3] net: deliver skbs on inactive slaves to exact matches
From: David Miller @ 2010-06-02 10:36 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: andy, fubar, nhorman, bonding-devel, netdev
In-Reply-To: <4BFDFAAA.8070701@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Wed, 26 May 2010 21:52:58 -0700

> I know you were looking over this series at one point.  Did you have
> any comments?  Sorry for the ping just wanted to keep this on my
> radar.

I'll hold this last one until Jay has a chance to comment on it.

^ permalink raw reply

* Re: [PATCH 2/3] net: fix conflict between null_or_orig and null_or_bond
From: David Miller @ 2010-06-02 10:35 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: andy, fubar, nhorman, bonding-devel, netdev
In-Reply-To: <20100513073111.3528.19949.stgit@jf-dev2-dcblab>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Thu, 13 May 2010 00:31:11 -0700

> If a skb is received on an inactive bond that does not meet
> the special cases checked for by skb_bond_should_drop it should
> only be delivered to exact matches as the comment in
> netif_receive_skb() says.
> 
> However because null_or_bond could also be null this is not
> always true.  This patch renames null_or_bond to orig_or_bond
> and initializes it to orig_dev.  This keeps the intent of
> null_or_bond to pass frames received on VLAN interfaces stacked
> on bonding interfaces without invalidating the statement for
> null_or_orig.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Also applied.

^ permalink raw reply

* Re: [PATCH 1/3] net: init_vlan should not copy slave or master flags
From: David Miller @ 2010-06-02 10:35 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: andy, fubar, nhorman, bonding-devel, netdev
In-Reply-To: <20100513073106.3528.45412.stgit@jf-dev2-dcblab>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Thu, 13 May 2010 00:31:06 -0700

> The vlan device should not copy the slave or master flags from
> the real device. It is not in the bond until added nor is it
> a master.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied, thanks John.

^ permalink raw reply

* Re: [PATCH 2/2 net-next-2.6] net: QDISC_STATE_RUNNING dont need atomic bit ops
From: David Miller @ 2010-06-02 10:25 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, alexander.h.duyck, netdev
In-Reply-To: <1275472233.2725.146.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Jun 2010 11:50:33 +0200

> __QDISC_STATE_RUNNING is always changed while qdisc lock is held.
> 
> We can avoid two atomic operations in xmit path, if we move this bit in
> a new __state container.
> 
> Location of this __state container is carefully chosen so that fast path
> only dirties one qdisc cache line.
> 
> THROTTLED bit could later be moved into this __state location too, to
> avoid dirtying first qdisc cache line.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Also looks good, applied.

One thing about naming.  Here, even though we name the type and the
state member with two leading underscores, the things that actually
modify these state members are helper functions with names that lack
double underscores.

So really, reading the code, you don't see that special considerations
for these state changes might be necessary.

^ permalink raw reply

* Re: [PATCH 1/2 net-next-2.6] net: Define accessors to manipulate QDISC_STATE_RUNNING
From: David Miller @ 2010-06-02 10:24 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, alexander.h.duyck, netdev
In-Reply-To: <1275472154.2725.143.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 02 Jun 2010 11:49:14 +0200

> Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
> second patch will move on another location.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, applied.

^ permalink raw reply

* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Cong Wang @ 2010-06-02 10:04 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Flavio Leitner, linux-kernel, Matt Mackall, netdev, bridge,
	Andy Gospodarek, Neil Horman, Jeff Moyer, Stephen Hemminger,
	bonding-devel, David Miller
In-Reply-To: <24059.1275417767@death.nxdomain.ibm.com>

On 06/02/10 02:42, Jay Vosburgh wrote:
> Cong Wang<amwang@redhat.com>  wrote:
>
>> On 06/01/10 03:08, Flavio Leitner wrote:
>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>> Hi, Flavio,
>>>>
>>>> Please use the attached patch instead, try to see if it solves
>>>> all your problems.
>>>
>>> I tried and it hangs. No backtraces this time.
>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>> notification, so I think it won't work.
>>
>> Ah, I thought the same.
>>
>>>
>>> Please, correct if I'm wrong, but when a failover happens with your
>>> patch applied, the netconsole would be disabled forever even with
>>> another healthy slave, right?
>>>
>>
>> Yes, this is an easy solution, because bonding has several modes,
>> it is complex to make netpoll work in different modes.
>
> 	If I understand correctly, the root cause of the problem with
> netconsole and bonding is that bonding is, ultimately, performing
> printks with a write lock held, and when netconsole recursively calls
> into bonding to send the printk over the netconsole, there is a deadlock
> (when the bonding xmit function attempts to acquire the same lock for
> read).


Yes.

>
> 	You're trying to avoid the deadlock by shutting off netconsole
> (permanently, it looks like) for one problem case: a failover, which
> does some printks with a write lock held.
>
> 	This doesn't look to me like a complete solution, there are
> other cases in bonding that will do printk with write locks held.  I
> suspect those will also hang netconsole as things exist today, and won't
> be affected by your patch below.


I can expect that, bonding modes are complex.

>
> 	For example:
>
> 	The sysfs functions to set the primary (bonding_store_primary)
> or active (bonding_store_active_slave) options: a pr_info is called to
> provide a log message of the results.  These could be tested by setting
> the primary or active options via sysfs, e.g.,
>
> echo eth0>  /sys/class/net/bond0/bonding/primary
> echo eth0>  /sys/class/net/bond0/bonding/active
>
> 	If the kernel is defined with DEBUG, there are a few pr_debug
> calls within write_locks (bond_del_vlan, for example).
>
> 	If the slave's underlying device driver's ndo_vlan_rx_register
> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
> for error cases, e.g., igbvf, ehea, enic), those would also presumably
> deadlock (because bonding holds its write_lock when calling the ndo_
> vlan functions).
>
> 	It also appears that (with the patch below) some nominally
> normal usage patterns will immediately disable netconsole.  The one that
> comes to mind is if the primary= option is set (to "eth1" for this
> example), but that slave not enslaved first (the slaves are added, say,
> eth0 then eth1).  In that situation, when the primary slave (eth1 here)
> is added, the first thing that will happen is a failover, and that will
> disable netconsole.
>

Thanks for your detailed explanation!

This is why I said bonding is complex. I guess we would have to adjust
netpoll code for different bonding cases, one solution seems not fix all.
I am not sure how much work to do, since I am not familiar with bonding
code. Maybe Andy can help?

For the previous patch, it at least can make Flavio happy. :)

Thanks!

^ permalink raw reply

* [PATCH 2/2 net-next-2.6] net: QDISC_STATE_RUNNING dont need atomic bit ops
From: Eric Dumazet @ 2010-06-02  9:50 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: alexander.h.duyck, netdev
In-Reply-To: <1274463643.2439.473.camel@edumazet-laptop>

__QDISC_STATE_RUNNING is always changed while qdisc lock is held.

We can avoid two atomic operations in xmit path, if we move this bit in
a new __state container.

Location of this __state container is carefully chosen so that fast path
only dirties one qdisc cache line.

THROTTLED bit could later be moved into this __state location too, to
avoid dirtying first qdisc cache line.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sch_generic.h |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9707dae..b3591e4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -23,11 +23,17 @@ struct qdisc_rate_table {
 };
 
 enum qdisc_state_t {
-	__QDISC_STATE_RUNNING,
 	__QDISC_STATE_SCHED,
 	__QDISC_STATE_DEACTIVATED,
 };
 
+/*
+ * following bits are only changed while qdisc lock is held
+ */
+enum qdisc___state_t {
+	__QDISC___STATE_RUNNING,
+};
+
 struct qdisc_size_table {
 	struct list_head	list;
 	struct tc_sizespec	szopts;
@@ -72,23 +78,24 @@ struct Qdisc {
 	unsigned long		state;
 	struct sk_buff_head	q;
 	struct gnet_stats_basic_packed bstats;
+	unsigned long		__state;
 	struct gnet_stats_queue	qstats;
 	struct rcu_head     rcu_head;
 };
 
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
-	return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	return test_bit(__QDISC___STATE_RUNNING, &qdisc->__state);
 }
 
 static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 {
-	return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	return !__test_and_set_bit(__QDISC___STATE_RUNNING, &qdisc->__state);
 }
 
 static inline void qdisc_run_end(struct Qdisc *qdisc)
 {
-	clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+	__clear_bit(__QDISC___STATE_RUNNING, &qdisc->__state);
 }
 
 struct Qdisc_class_ops {



^ permalink raw reply related

* [PATCH 1/2 net-next-2.6] net: Define accessors to manipulate QDISC_STATE_RUNNING
From: Eric Dumazet @ 2010-06-02  9:49 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: alexander.h.duyck, netdev
In-Reply-To: <1274463643.2439.473.camel@edumazet-laptop>

Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a
second patch will move on another location.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/pkt_sched.h   |    2 +-
 include/net/sch_generic.h |   15 +++++++++++++++
 net/core/dev.c            |    4 ++--
 net/sched/sch_generic.c   |    4 ++--
 4 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 9d4d87c..d9549af 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -95,7 +95,7 @@ extern void __qdisc_run(struct Qdisc *q);
 
 static inline void qdisc_run(struct Qdisc *q)
 {
-	if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state))
+	if (qdisc_run_begin(q))
 		__qdisc_run(q);
 }
 
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 03ca5d8..9707dae 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -76,6 +76,21 @@ struct Qdisc {
 	struct rcu_head     rcu_head;
 };
 
+static inline bool qdisc_is_running(struct Qdisc *qdisc)
+{
+	return test_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+}
+
+static inline bool qdisc_run_begin(struct Qdisc *qdisc)
+{
+	return !test_and_set_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+}
+
+static inline void qdisc_run_end(struct Qdisc *qdisc)
+{
+	clear_bit(__QDISC_STATE_RUNNING, &qdisc->state);
+}
+
 struct Qdisc_class_ops {
 	/* Child qdisc manipulation */
 	struct netdev_queue *	(*select_queue)(struct Qdisc *, struct tcmsg *);
diff --git a/net/core/dev.c b/net/core/dev.c
index 983a3c1..2733226 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2047,7 +2047,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		kfree_skb(skb);
 		rc = NET_XMIT_DROP;
 	} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
-		   !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
+		   qdisc_run_begin(q)) {
 		/*
 		 * This is a work-conserving queue; there are no old skbs
 		 * waiting to be sent out; and the qdisc is not running -
@@ -2059,7 +2059,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
 			__qdisc_run(q);
 		else
-			clear_bit(__QDISC_STATE_RUNNING, &q->state);
+			qdisc_run_end(q);
 
 		rc = NET_XMIT_SUCCESS;
 	} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bd1892f..37b86ea 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -205,7 +205,7 @@ void __qdisc_run(struct Qdisc *q)
 		}
 	}
 
-	clear_bit(__QDISC_STATE_RUNNING, &q->state);
+	qdisc_run_end(q);
 }
 
 unsigned long dev_trans_start(struct net_device *dev)
@@ -797,7 +797,7 @@ static bool some_qdisc_is_busy(struct net_device *dev)
 
 		spin_lock_bh(root_lock);
 
-		val = (test_bit(__QDISC_STATE_RUNNING, &q->state) ||
+		val = (qdisc_is_running(q) ||
 		       test_bit(__QDISC_STATE_SCHED, &q->state));
 
 		spin_unlock_bh(root_lock);



^ permalink raw reply related

* Re: [RFC PATCH] net: add additional lock to qdisc to increase enqueue/dequeue fairness
From: Eric Dumazet @ 2010-06-02  9:48 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: alexander.h.duyck, netdev
In-Reply-To: <1274463643.2439.473.camel@edumazet-laptop>

Le vendredi 21 mai 2010 à 19:40 +0200, Eric Dumazet a écrit :
> Le vendredi 21 mai 2010 à 18:44 +0200, Eric Dumazet a écrit :
> 
> > We could use cmpxchg() and manipulate several bits at once in fast path.
> > ( __QDISC_STATE_RUNNING, __QDISC_STATE_LOCKED ... ) but making the crowd
> > of cpus spin on the same bits/cacheline than dequeue worker would
> > definitely slowdown the worker.
> > 
> > 
> 
> Maybe I am missing something, but __QDISC_STATE_RUNNING is always
> manipulated with the lock held...
> 
> We might avoid two atomic ops when changing this state (if moved to a
> separate container) in fast path (when a cpu sends only one packet and
> returns)
> 
> 

Here are two patches to implement this idea.

First patch to abstract QDISC_STATE_RUNNING access.

Second patch to add a __qstate container and remove the atomic ops.




^ permalink raw reply

* Re: [PATCH net-2.6] bnx2: Fix hang during rmmod bnx2.
From: David Miller @ 2010-06-02  9:27 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1275440736-18058-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 1 Jun 2010 18:05:36 -0700

> The regression is caused by:
> 
> commit 4327ba435a56ada13eedf3eb332e583c7a0586a9
>     bnx2: Fix netpoll crash.
> 
> If ->open() and ->close() are called multiple times, the same napi structs
> will be added to dev->napi_list multiple times, corrupting the dev->napi_list.
> This causes free_netdev() to hang during rmmod.
> 
> We fix this by calling netif_napi_del() during ->close().
> 
> Also, bnx2_init_napi() must not be in the __devinit section since it is
> called by ->open().
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> Signed-off-by: Benjamin Li <benli@broadcom.com>

Applied, thanks!

^ permalink raw reply

* Re: [net-2.6 PATCH] enic: bug fix: make the set/get netlink VF_PORT support symmetrical
From: David Miller @ 2010-06-02  9:27 UTC (permalink / raw)
  To: scofeldm; +Cc: chrisw, netdev
In-Reply-To: <20100601185933.4323.2212.stgit@localhost.localdomain>

From: Scott Feldman <scofeldm@cisco.com>
Date: Tue, 01 Jun 2010 11:59:33 -0700

> From: Scott Feldman <scofeldm@cisco.com>
> 
> To make get/set netlink VF_PORT truly symmetrical, we need to keep track
> of what items are set and only return those items on get.  Previously, the
> driver wasn't differentiating between a set of attr with a NULL string,
> for example, and not setting the attr at all.  We only want to return
> the NULL string if the attr was actually set with a NULL string.  Otherwise,
> don't return the attr.
> 
> Signed-off-by: Scott Feldman <scofeldm@cisco.com>

Applied, thanks Scott.

^ permalink raw reply

* Re: [PATCH] xfrm: force a dst reference in __xfrm_route_forward()
From: David Miller @ 2010-06-02  9:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275422689.2638.14.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 01 Jun 2010 22:04:49 +0200

> Packets going through __xfrm_route_forward() have a not refcounted dst
> entry, since we enabled a noref forwarding path.
> 
> xfrm_lookup() might incorrectly release this dst entry.
> 
> It's a bit late to make invasive changes in xfrm_lookup(), so lets force
> a refcount in this path.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks for fixing this bug Eric.

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/2] qlcnic: NIC Partitioning - Add non privileged mode support
From: David Miller @ 2010-06-02  9:23 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, amit.salecha, ameen.rahman
In-Reply-To: <alpine.OSX.2.00.1006011428590.45836@macintosh-2.qlogic.org>

From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Date: Tue, 1 Jun 2010 14:33:09 -0700

> Added support for NIC functions that work in non privileged mode where these 
> functions are privileged to do IO only, the control operations are handled via 
> privileged functions.
> Bumped up version number to 5.0.3.
> 
> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/2] qlcnic: NIC Partitioning - Add basic infrastructure support
From: David Miller @ 2010-06-02  9:23 UTC (permalink / raw)
  To: anirban.chakraborty; +Cc: netdev, amit.salecha, ameen.rahman
In-Reply-To: <alpine.OSX.2.00.1006011422230.45836@macintosh-2.qlogic.org>

From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Date: Tue, 1 Jun 2010 14:28:51 -0700

> Following changes have been added to enable the adapter to work in
> NIC partitioning mode where multiple PCI functions of an adapter port can
> be configured to work as NIC functions. The first function that is enumerated on
> the PCI bus assumes the role of management function which, besides being able
> to do all the NIC functionality, can configure other NIC partitions. Other NIC
> functions can be configured as privileged or non privileged functions.
> Privileged function can not configure other NIC functions but can do all the
> NIC functionality including any firmware initialization, chip reset etc. Non
> privileged functions can do only basic IO. For chip reset etc, it depends on the
> privilege or management function.
> 
> 1. Added code to determine PCI function number independent of kernel API.
> 2. Added Driver - FW version 2.0 support.
> 3. Changed producer and consumer register offset calculation.
> 4. Added management and privileged operation modes for npar functions. A module
>  parameter has been added to control it.
> 5. Added support for configuring the eswitch in the adapter.
> 
> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>

Applied.

^ permalink raw reply

* Re: [PATCH 00/12] sfc changes for 2.6.36
From: David Miller @ 2010-06-02  9:21 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1275426967.2114.25.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 01 Jun 2010 22:16:07 +0100

> The major feature here is RX buffer recycling, which improves
> performance on networks with heavy multicast traffic.  Other than that,
> there are various bug fixes and cleanup.
> 
> Ben.
> 
> Ben Hutchings (3):
>   sfc: Rename struct efx_mcdi_phy_cfg to efx_mcdi_phy_data
>   sfc: Only count bad packets in rx_errors
>   sfc: Get port number from CS_PORT_NUM, not PCI function number
> 
> Steve Hodgson (9):
>   sfc: Reschedule any resets scheduled inside efx_pm_freeze()
>   sfc: Workaround flush failures on Falcon B0
>   sfc: Synchronise link_advertising and wanted_fc on Siena
>   sfc: Wait for the link to stay up before running loopback selftest
>   sfc: Allow DRV_GEN events to be used outside of selftests
>   sfc: Remove efx_rx_queue::add_lock
>   sfc: Support only two rx buffers per page
>   sfc: Recycle discarded rx buffers back onto the queue
>   sfc: Allow shared pages to be recycled

ALl applied, thanks Ben.

^ permalink raw reply

* Re: pull request: wireless-2.6 2010-05-28
From: Sedat Dilek @ 2010-06-02  8:42 UTC (permalink / raw)
  To: reinette chatre
  Cc: John W. Linville, davem@davemloft.net,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Zheng, Jiajia, Kolekar, Abhijeet,
	Berg, Johannes
In-Reply-To: <1275409361.2091.27208.camel@rchatre-DESK>

On Tue, Jun 1, 2010 at 6:22 PM, reinette chatre
<reinette.chatre@intel.com> wrote:
> Hi Sedat,
>
> On Sat, 2010-05-29 at 06:34 -0700, Sedat Dilek wrote:
>> Two people confirmed the patch in [2] fixes:
>> 1. iwlwifi-2.6 GIT master (commit f10a237c95abd6d64a3a24553bd1d3bcddd9108b)
>> 2. compat-wireless (2010-05-21)
>>
>> And it fixes also the above mentionned combination.
>
> We are working on getting this patch upstream. The version tested
> contains a significant amount of duplicated code as well as some cleanup
> code mixed in. It is thus not appropriate for an upstream submission at
> this time.
>

What do you mean by "upstream"?
upstream for me is Linus-tree (linux-2.6 GIT master).
Patches against iwlwifi-2.6 GIT trees are not very helpful in my case.

>> As a suggestion:
>> What about "copying" bug-reports (incl. its history) from IWL-BTS into
>> linux-wireless ML?
>> For example (dri-devel related) bug-reports from
>> bugzilla.freedesktop.org are "copied" into dri-devel ML.
>
> Pointing people to the bug report seems to work ok. What benefit does
> the above have?
>

[+] People - not subscribed to the BR - can follow "on change".
[-] Blow up linux-wireless ML?

- Sedat -

^ permalink raw reply

* [PATCH net-next-2.6] net: replace hooks in __netif_receive_skb V5
From: Jiri Pirko @ 2010-06-02  7:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, kaber, eric.dumazet, shemminger
In-Reply-To: <20100601082805.1c84b16d@nehalam>

What this patch does is it removes two receive frame hooks (for bridge and for
macvlan) from __netif_receive_skb. These are replaced them with a single
hook for both. It only supports one hook per device because it makes no
sense to do bridging and macvlan on the same device.

Then a network driver (of virtual netdev like macvlan or bridge) can register
an rx_handler for needed net device.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
v4->v5
	- do rcu_assign_pointer even for assigning null for rcu-protected
	  pointers 
	- cosmetics (renames, comments and __netif_receive_skb if part)

 drivers/net/macvlan.c      |   19 +++++--
 include/linux/if_bridge.h  |    2 -
 include/linux/if_macvlan.h |    4 --
 include/linux/netdevice.h  |    7 +++
 net/bridge/br.c            |    2 -
 net/bridge/br_if.c         |    8 +++
 net/bridge/br_input.c      |   12 +++-
 net/bridge/br_private.h    |    3 +-
 net/core/dev.c             |  119 ++++++++++++++++++++-----------------------
 9 files changed, 93 insertions(+), 83 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 87e8d4c..53422ce 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -145,15 +145,16 @@ static void macvlan_broadcast(struct sk_buff *skb,
 }
 
 /* called under rcu_read_lock() from netif_receive_skb */
-static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
-					    struct sk_buff *skb)
+static struct sk_buff *macvlan_handle_frame(struct sk_buff *skb)
 {
+	struct macvlan_port *port;
 	const struct ethhdr *eth = eth_hdr(skb);
 	const struct macvlan_dev *vlan;
 	const struct macvlan_dev *src;
 	struct net_device *dev;
 	unsigned int len;
 
+	port = rcu_dereference(skb->dev->macvlan_port);
 	if (is_multicast_ether_addr(eth->h_dest)) {
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
@@ -515,6 +516,7 @@ static int macvlan_port_create(struct net_device *dev)
 {
 	struct macvlan_port *port;
 	unsigned int i;
+	int err;
 
 	if (dev->type != ARPHRD_ETHER || dev->flags & IFF_LOOPBACK)
 		return -EINVAL;
@@ -528,13 +530,21 @@ static int macvlan_port_create(struct net_device *dev)
 	for (i = 0; i < MACVLAN_HASH_SIZE; i++)
 		INIT_HLIST_HEAD(&port->vlan_hash[i]);
 	rcu_assign_pointer(dev->macvlan_port, port);
-	return 0;
+
+	err = netdev_rx_handler_register(dev, macvlan_handle_frame);
+	if (err) {
+		rcu_assign_pointer(dev->macvlan_port, NULL);
+		kfree(port);
+	}
+
+	return err;
 }
 
 static void macvlan_port_destroy(struct net_device *dev)
 {
 	struct macvlan_port *port = dev->macvlan_port;
 
+	netdev_rx_handler_unregister(dev);
 	rcu_assign_pointer(dev->macvlan_port, NULL);
 	synchronize_rcu();
 	kfree(port);
@@ -767,14 +777,12 @@ static int __init macvlan_init_module(void)
 	int err;
 
 	register_netdevice_notifier(&macvlan_notifier_block);
-	macvlan_handle_frame_hook = macvlan_handle_frame;
 
 	err = macvlan_link_register(&macvlan_link_ops);
 	if (err < 0)
 		goto err1;
 	return 0;
 err1:
-	macvlan_handle_frame_hook = NULL;
 	unregister_netdevice_notifier(&macvlan_notifier_block);
 	return err;
 }
@@ -782,7 +790,6 @@ err1:
 static void __exit macvlan_cleanup_module(void)
 {
 	rtnl_link_unregister(&macvlan_link_ops);
-	macvlan_handle_frame_hook = NULL;
 	unregister_netdevice_notifier(&macvlan_notifier_block);
 }
 
diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 938b7e8..0d241a5 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -102,8 +102,6 @@ struct __fdb_entry {
 #include <linux/netdevice.h>
 
 extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
-extern struct sk_buff *(*br_handle_frame_hook)(struct net_bridge_port *p,
-					       struct sk_buff *skb);
 extern int (*br_should_route_hook)(struct sk_buff *skb);
 
 #endif
diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 9ea047a..c26a0e4 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -84,8 +84,4 @@ extern int macvlan_link_register(struct rtnl_link_ops *ops);
 extern netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
 				      struct net_device *dev);
 
-
-extern struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *,
-						    struct sk_buff *);
-
 #endif /* _LINUX_IF_MACVLAN_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a249161..244494f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -381,6 +381,8 @@ enum gro_result {
 };
 typedef enum gro_result gro_result_t;
 
+typedef struct sk_buff *rx_handler_func_t(struct sk_buff *skb);
+
 extern void __napi_schedule(struct napi_struct *n);
 
 static inline int napi_disable_pending(struct napi_struct *n)
@@ -957,6 +959,7 @@ struct net_device {
 #endif
 
 	struct netdev_queue	rx_queue;
+	rx_handler_func_t	*rx_handler;
 
 	struct netdev_queue	*_tx ____cacheline_aligned_in_smp;
 
@@ -1693,6 +1696,10 @@ static inline void napi_free_frags(struct napi_struct *napi)
 	napi->skb = NULL;
 }
 
+extern int netdev_rx_handler_register(struct net_device *dev,
+				      rx_handler_func_t *rx_handler);
+extern void netdev_rx_handler_unregister(struct net_device *dev);
+
 extern void		netif_nit_deliver(struct sk_buff *skb);
 extern int		dev_valid_name(const char *name);
 extern int		dev_ioctl(struct net *net, unsigned int cmd, void __user *);
diff --git a/net/bridge/br.c b/net/bridge/br.c
index 76357b5..c8436fa 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -63,7 +63,6 @@ static int __init br_init(void)
 		goto err_out4;
 
 	brioctl_set(br_ioctl_deviceless_stub);
-	br_handle_frame_hook = br_handle_frame;
 
 #if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
 	br_fdb_test_addr_hook = br_fdb_test_addr;
@@ -100,7 +99,6 @@ static void __exit br_deinit(void)
 	br_fdb_test_addr_hook = NULL;
 #endif
 
-	br_handle_frame_hook = NULL;
 	br_fdb_fini();
 }
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 18b245e..d924234 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -147,6 +147,7 @@ static void del_nbp(struct net_bridge_port *p)
 
 	list_del_rcu(&p->list);
 
+	netdev_rx_handler_unregister(dev);
 	rcu_assign_pointer(dev->br_port, NULL);
 
 	br_multicast_del_port(p);
@@ -429,6 +430,11 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 		goto err2;
 
 	rcu_assign_pointer(dev->br_port, p);
+
+	err = netdev_rx_handler_register(dev, br_handle_frame);
+	if (err)
+		goto err3;
+
 	dev_disable_lro(dev);
 
 	list_add_rcu(&p->list, &br->port_list);
@@ -451,6 +457,8 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	br_netpoll_enable(br, dev);
 
 	return 0;
+err3:
+	rcu_assign_pointer(dev->br_port, NULL);
 err2:
 	br_fdb_delete_by_port(br, p, 1);
 err1:
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index d36e700..99647d8 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -131,15 +131,19 @@ static inline int is_link_local(const unsigned char *dest)
 }
 
 /*
- * Called via br_handle_frame_hook.
  * Return NULL if skb is handled
- * note: already called with rcu_read_lock (preempt_disabled)
+ * note: already called with rcu_read_lock (preempt_disabled) from
+ * netif_receive_skb
  */
-struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
+struct sk_buff *br_handle_frame(struct sk_buff *skb)
 {
+	struct net_bridge_port *p;
 	const unsigned char *dest = eth_hdr(skb)->h_dest;
 	int (*rhook)(struct sk_buff *skb);
 
+	if (skb->pkt_type == PACKET_LOOPBACK)
+		return skb;
+
 	if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
 		goto drop;
 
@@ -147,6 +151,8 @@ struct sk_buff *br_handle_frame(struct net_bridge_port *p, struct sk_buff *skb)
 	if (!skb)
 		return NULL;
 
+	p = rcu_dereference(skb->dev->br_port);
+
 	if (unlikely(is_link_local(dest))) {
 		/* Pause frames shouldn't be passed up by driver anyway */
 		if (skb->protocol == htons(ETH_P_PAUSE))
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0f4a74b..c83519b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -331,8 +331,7 @@ extern void br_features_recompute(struct net_bridge *br);
 
 /* br_input.c */
 extern int br_handle_frame_finish(struct sk_buff *skb);
-extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
-				       struct sk_buff *skb);
+extern struct sk_buff *br_handle_frame(struct sk_buff *skb);
 
 /* br_ioctl.c */
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
diff --git a/net/core/dev.c b/net/core/dev.c
index 983a3c1..4a02b58 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2583,70 +2583,14 @@ static inline int deliver_skb(struct sk_buff *skb,
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
-#if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
-
-#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
+#if (defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)) && \
+    (defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE))
 /* This hook is defined here for ATM LANE */
 int (*br_fdb_test_addr_hook)(struct net_device *dev,
 			     unsigned char *addr) __read_mostly;
 EXPORT_SYMBOL_GPL(br_fdb_test_addr_hook);
 #endif
 
-/*
- * If bridge module is loaded call bridging hook.
- *  returns NULL if packet was consumed.
- */
-struct sk_buff *(*br_handle_frame_hook)(struct net_bridge_port *p,
-					struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(br_handle_frame_hook);
-
-static inline struct sk_buff *handle_bridge(struct sk_buff *skb,
-					    struct packet_type **pt_prev, int *ret,
-					    struct net_device *orig_dev)
-{
-	struct net_bridge_port *port;
-
-	if (skb->pkt_type == PACKET_LOOPBACK ||
-	    (port = rcu_dereference(skb->dev->br_port)) == NULL)
-		return skb;
-
-	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
-		*pt_prev = NULL;
-	}
-
-	return br_handle_frame_hook(port, skb);
-}
-#else
-#define handle_bridge(skb, pt_prev, ret, orig_dev)	(skb)
-#endif
-
-#if defined(CONFIG_MACVLAN) || defined(CONFIG_MACVLAN_MODULE)
-struct sk_buff *(*macvlan_handle_frame_hook)(struct macvlan_port *p,
-					     struct sk_buff *skb) __read_mostly;
-EXPORT_SYMBOL_GPL(macvlan_handle_frame_hook);
-
-static inline struct sk_buff *handle_macvlan(struct sk_buff *skb,
-					     struct packet_type **pt_prev,
-					     int *ret,
-					     struct net_device *orig_dev)
-{
-	struct macvlan_port *port;
-
-	port = rcu_dereference(skb->dev->macvlan_port);
-	if (!port)
-		return skb;
-
-	if (*pt_prev) {
-		*ret = deliver_skb(skb, *pt_prev, orig_dev);
-		*pt_prev = NULL;
-	}
-	return macvlan_handle_frame_hook(port, skb);
-}
-#else
-#define handle_macvlan(skb, pt_prev, ret, orig_dev)	(skb)
-#endif
-
 #ifdef CONFIG_NET_CLS_ACT
 /* TODO: Maybe we should just force sch_ingress to be compiled in
  * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
@@ -2742,6 +2686,47 @@ void netif_nit_deliver(struct sk_buff *skb)
 	rcu_read_unlock();
 }
 
+/**
+ *	netdev_rx_handler_register - register receive handler
+ *	@dev: device to register a handler for
+ *	@rx_handler: receive handler to register
+ *
+ *	Register a receive hander for a device. This handler will then be
+ *	called from __netif_receive_skb. A negative errno code is returned
+ *	on a failure.
+ *
+ *	The caller must hold the rtnl_mutex.
+ */
+int netdev_rx_handler_register(struct net_device *dev,
+			       rx_handler_func_t *rx_handler)
+{
+	ASSERT_RTNL();
+
+	if (dev->rx_handler)
+		return -EBUSY;
+
+	rcu_assign_pointer(dev->rx_handler, rx_handler);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_register);
+
+/**
+ *	netdev_rx_handler_unregister - unregister receive handler
+ *	@dev: device to unregister a handler from
+ *
+ *	Unregister a receive hander from a device.
+ *
+ *	The caller must hold the rtnl_mutex.
+ */
+void netdev_rx_handler_unregister(struct net_device *dev)
+{
+
+	ASSERT_RTNL();
+	rcu_assign_pointer(dev->rx_handler, NULL);
+}
+EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
+
 static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
 					      struct net_device *master)
 {
@@ -2794,6 +2779,7 @@ EXPORT_SYMBOL(__skb_bond_should_drop);
 static int __netif_receive_skb(struct sk_buff *skb)
 {
 	struct packet_type *ptype, *pt_prev;
+	rx_handler_func_t *rx_handler;
 	struct net_device *orig_dev;
 	struct net_device *master;
 	struct net_device *null_or_orig;
@@ -2856,12 +2842,17 @@ static int __netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
-	skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
-	if (!skb)
-		goto out;
-	skb = handle_macvlan(skb, &pt_prev, &ret, orig_dev);
-	if (!skb)
-		goto out;
+	/* Handle special case of bridge or macvlan */
+	rx_handler = rcu_dereference(skb->dev->rx_handler);
+	if (rx_handler) {
+		if (pt_prev) {
+			ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = NULL;
+		}
+		skb = rx_handler(skb);
+		if (!skb)
+			goto out;
+	}
 
 	/*
 	 * Make sure frames received on VLAN interfaces stacked on
-- 
1.7.0.1


^ permalink raw reply related

* Re: [PATCHv3] Refactor update of IPv6 flowi destination address for srcrt (RH) option
From: Arnaud Ebalard @ 2010-06-02  7:35 UTC (permalink / raw)
  To: David S. Miller
  Cc: Joe Perches, YOSHIFUJI Hideaki / 吉藤英明,
	netdev
In-Reply-To: <1275422925.19372.114.camel@Joe-Laptop.home>

Hi,

Here is a v3 based on the comments I received from Joe. The helper is
now in exthdrs.c (no more inline) and has a proper name.

Applies on net-2.6. Tested on 2.6.34.

Cheers,

a+

>From ec9c9695345f4821060a78bd3243cdfeaa852897 Mon Sep 17 00:00:00 2001
From: Arnaud Ebalard <arno@natisbad.org>
Date: Wed, 2 Jun 2010 09:25:27 +0200
Subject: [PATCH] Refactor update of IPv6 flowi destination address for srcrt (RH) option

There are more than a dozen occurrences of following code in the
IPv6 stack:

    if (opt && opt->srcrt) {
            struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
            ipv6_addr_copy(&final, &fl.fl6_dst);
            ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
            final_p = &final;
    }

Replace those with a helper. Note that the helper overrides final_p
in all cases. This is ok as final_p was previously initialized to
NULL when declared.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
---
 include/net/ipv6.h               |    4 ++++
 net/dccp/ipv6.c                  |   30 ++++++------------------------
 net/ipv6/af_inet6.c              |    9 ++-------
 net/ipv6/datagram.c              |   18 ++++--------------
 net/ipv6/exthdrs.c               |   24 ++++++++++++++++++++++++
 net/ipv6/inet6_connection_sock.c |    9 ++-------
 net/ipv6/raw.c                   |   10 ++--------
 net/ipv6/syncookies.c            |    9 ++-------
 net/ipv6/tcp_ipv6.c              |   27 ++++++---------------------
 net/ipv6/udp.c                   |   11 +++--------
 10 files changed, 55 insertions(+), 96 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 2600b69..f5808d5 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -551,6 +551,10 @@ extern int 			ipv6_ext_hdr(u8 nexthdr);
 
 extern int ipv6_find_tlv(struct sk_buff *skb, int offset, int type);
 
+extern struct in6_addr *fl6_update_dst(struct flowi *fl,
+				       const struct ipv6_txoptions *opt,
+				       struct in6_addr *orig);
+
 /*
  *	socket options (ipv6_sockglue.c)
  */
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 0916988..6e3f325 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -248,7 +248,7 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sk_buff *skb;
 	struct ipv6_txoptions *opt = NULL;
-	struct in6_addr *final_p = NULL, final;
+	struct in6_addr *final_p, final;
 	struct flowi fl;
 	int err = -1;
 	struct dst_entry *dst;
@@ -265,13 +265,7 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req,
 
 	opt = np->opt;
 
-	if (opt != NULL && opt->srcrt != NULL) {
-		const struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt;
-
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, opt, &final);
 
 	err = ip6_dst_lookup(sk, &dst, &fl);
 	if (err)
@@ -545,19 +539,13 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
 		goto out_overflow;
 
 	if (dst == NULL) {
-		struct in6_addr *final_p = NULL, final;
+		struct in6_addr *final_p, final;
 		struct flowi fl;
 
 		memset(&fl, 0, sizeof(fl));
 		fl.proto = IPPROTO_DCCP;
 		ipv6_addr_copy(&fl.fl6_dst, &ireq6->rmt_addr);
-		if (opt != NULL && opt->srcrt != NULL) {
-			const struct rt0_hdr *rt0 = (struct rt0_hdr *)opt->srcrt;
-
-			ipv6_addr_copy(&final, &fl.fl6_dst);
-			ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-			final_p = &final;
-		}
+		final_p = fl6_update_dst(&fl, opt, &final);
 		ipv6_addr_copy(&fl.fl6_src, &ireq6->loc_addr);
 		fl.oif = sk->sk_bound_dev_if;
 		fl.fl_ip_dport = inet_rsk(req)->rmt_port;
@@ -885,7 +873,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct in6_addr *saddr = NULL, *final_p = NULL, final;
+	struct in6_addr *saddr = NULL, *final_p, final;
 	struct flowi fl;
 	struct dst_entry *dst;
 	int addr_type;
@@ -988,13 +976,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	fl.fl_ip_sport = inet->inet_sport;
 	security_sk_classify_flow(sk, &fl);
 
-	if (np->opt != NULL && np->opt->srcrt != NULL) {
-		const struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt;
-
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, np->opt, &final);
 
 	err = ip6_dst_lookup(sk, &dst, &fl);
 	if (err)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e733942..94b1b9c 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -651,7 +651,7 @@ int inet6_sk_rebuild_header(struct sock *sk)
 
 	if (dst == NULL) {
 		struct inet_sock *inet = inet_sk(sk);
-		struct in6_addr *final_p = NULL, final;
+		struct in6_addr *final_p, final;
 		struct flowi fl;
 
 		memset(&fl, 0, sizeof(fl));
@@ -665,12 +665,7 @@ int inet6_sk_rebuild_header(struct sock *sk)
 		fl.fl_ip_sport = inet->inet_sport;
 		security_sk_classify_flow(sk, &fl);
 
-		if (np->opt && np->opt->srcrt) {
-			struct rt0_hdr *rt0 = (struct rt0_hdr *) np->opt->srcrt;
-			ipv6_addr_copy(&final, &fl.fl6_dst);
-			ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-			final_p = &final;
-		}
+		final_p = fl6_update_dst(&fl, np->opt, &final);
 
 		err = ip6_dst_lookup(sk, &dst, &fl);
 		if (err) {
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 7126846..7d929a2 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -38,10 +38,11 @@ int ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	struct sockaddr_in6	*usin = (struct sockaddr_in6 *) uaddr;
 	struct inet_sock      	*inet = inet_sk(sk);
 	struct ipv6_pinfo      	*np = inet6_sk(sk);
-	struct in6_addr		*daddr, *final_p = NULL, final;
+	struct in6_addr		*daddr, *final_p, final;
 	struct dst_entry	*dst;
 	struct flowi		fl;
 	struct ip6_flowlabel	*flowlabel = NULL;
+	struct ipv6_txoptions   *opt;
 	int			addr_type;
 	int			err;
 
@@ -155,19 +156,8 @@ ipv4_connected:
 
 	security_sk_classify_flow(sk, &fl);
 
-	if (flowlabel) {
-		if (flowlabel->opt && flowlabel->opt->srcrt) {
-			struct rt0_hdr *rt0 = (struct rt0_hdr *) flowlabel->opt->srcrt;
-			ipv6_addr_copy(&final, &fl.fl6_dst);
-			ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-			final_p = &final;
-		}
-	} else if (np->opt && np->opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	opt = flowlabel ? flowlabel->opt : np->opt;
+	final_p = fl6_update_dst(&fl, opt, &final);
 
 	err = ip6_dst_lookup(sk, &dst, &fl);
 	if (err)
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 8a659f9..853a633 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -874,3 +874,27 @@ struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 	return opt;
 }
 
+/**
+ * fl6_update_dst - update flowi destination address with info given
+ *                  by srcrt option, if any.
+ *
+ * @fl: flowi for which fl6_dst is to be updated
+ * @opt: struct ipv6_txoptions in which to look for srcrt opt
+ * @orig: copy of original fl6_dst address if modified
+ *
+ * Returns NULL if no txoptions or no srcrt, otherwise returns orig
+ * and initial value of fl->fl6_dst set in orig
+ */
+struct in6_addr *fl6_update_dst(struct flowi *fl,
+				const struct ipv6_txoptions *opt,
+				struct in6_addr *orig)
+{
+	if (!opt || !opt->srcrt)
+		return NULL;
+
+	ipv6_addr_copy(orig, &fl->fl6_dst);
+	ipv6_addr_copy(&fl->fl6_dst, ((struct rt0_hdr *)opt->srcrt)->addr);
+	return orig;
+}
+
+EXPORT_SYMBOL_GPL(fl6_update_dst);
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 0c5e3c3..8a16280 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -185,7 +185,7 @@ int inet6_csk_xmit(struct sk_buff *skb)
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct flowi fl;
 	struct dst_entry *dst;
-	struct in6_addr *final_p = NULL, final;
+	struct in6_addr *final_p, final;
 
 	memset(&fl, 0, sizeof(fl));
 	fl.proto = sk->sk_protocol;
@@ -199,12 +199,7 @@ int inet6_csk_xmit(struct sk_buff *skb)
 	fl.fl_ip_dport = inet->inet_dport;
 	security_sk_classify_flow(sk, &fl);
 
-	if (np->opt && np->opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, np->opt, &final);
 
 	dst = __inet6_csk_dst_check(sk, np->dst_cookie);
 
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4a4dcbe..864eb8e 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -725,7 +725,7 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 {
 	struct ipv6_txoptions opt_space;
 	struct sockaddr_in6 * sin6 = (struct sockaddr_in6 *) msg->msg_name;
-	struct in6_addr *daddr, *final_p = NULL, final;
+	struct in6_addr *daddr, *final_p, final;
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct raw6_sock *rp = raw6_sk(sk);
@@ -847,13 +847,7 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 	if (ipv6_addr_any(&fl.fl6_src) && !ipv6_addr_any(&np->saddr))
 		ipv6_addr_copy(&fl.fl6_src, &np->saddr);
 
-	/* merge ip6_build_xmit from ip6_output */
-	if (opt && opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, opt, &final);
 
 	if (!fl.oif && ipv6_addr_is_multicast(&fl.fl6_dst))
 		fl.oif = np->mcast_oif;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 34d1f06..1238370 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -240,17 +240,12 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	 * me if there is a preferred way.
 	 */
 	{
-		struct in6_addr *final_p = NULL, final;
+		struct in6_addr *final_p, final;
 		struct flowi fl;
 		memset(&fl, 0, sizeof(fl));
 		fl.proto = IPPROTO_TCP;
 		ipv6_addr_copy(&fl.fl6_dst, &ireq6->rmt_addr);
-		if (np->opt && np->opt->srcrt) {
-			struct rt0_hdr *rt0 = (struct rt0_hdr *) np->opt->srcrt;
-			ipv6_addr_copy(&final, &fl.fl6_dst);
-			ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-			final_p = &final;
-		}
+		final_p = fl6_update_dst(&fl, np->opt, &final);
 		ipv6_addr_copy(&fl.fl6_src, &ireq6->loc_addr);
 		fl.oif = sk->sk_bound_dev_if;
 		fl.mark = sk->sk_mark;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 2b7c3a1..e487080 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -129,7 +129,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct in6_addr *saddr = NULL, *final_p = NULL, final;
+	struct in6_addr *saddr = NULL, *final_p, final;
 	struct flowi fl;
 	struct dst_entry *dst;
 	int addr_type;
@@ -250,12 +250,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	fl.fl_ip_dport = usin->sin6_port;
 	fl.fl_ip_sport = inet->inet_sport;
 
-	if (np->opt && np->opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *)np->opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, np->opt, &final);
 
 	security_sk_classify_flow(sk, &fl);
 
@@ -477,7 +472,7 @@ static int tcp_v6_send_synack(struct sock *sk, struct request_sock *req,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sk_buff * skb;
 	struct ipv6_txoptions *opt = NULL;
-	struct in6_addr * final_p = NULL, final;
+	struct in6_addr * final_p, final;
 	struct flowi fl;
 	struct dst_entry *dst;
 	int err = -1;
@@ -494,12 +489,7 @@ static int tcp_v6_send_synack(struct sock *sk, struct request_sock *req,
 	security_req_classify_flow(req, &fl);
 
 	opt = np->opt;
-	if (opt && opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
-	}
+	final_p = fl6_update_dst(&fl, opt, &final);
 
 	err = ip6_dst_lookup(sk, &dst, &fl);
 	if (err)
@@ -1392,18 +1382,13 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		goto out_overflow;
 
 	if (dst == NULL) {
-		struct in6_addr *final_p = NULL, final;
+		struct in6_addr *final_p, final;
 		struct flowi fl;
 
 		memset(&fl, 0, sizeof(fl));
 		fl.proto = IPPROTO_TCP;
 		ipv6_addr_copy(&fl.fl6_dst, &treq->rmt_addr);
-		if (opt && opt->srcrt) {
-			struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-			ipv6_addr_copy(&final, &fl.fl6_dst);
-			ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-			final_p = &final;
-		}
+		final_p = fl6_update_dst(&fl, opt, &final);
 		ipv6_addr_copy(&fl.fl6_src, &treq->loc_addr);
 		fl.oif = sk->sk_bound_dev_if;
 		fl.mark = sk->sk_mark;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 87be586..1dd1aff 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -927,7 +927,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) msg->msg_name;
-	struct in6_addr *daddr, *final_p = NULL, final;
+	struct in6_addr *daddr, *final_p, final;
 	struct ipv6_txoptions *opt = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
 	struct flowi fl;
@@ -1097,14 +1097,9 @@ do_udp_sendmsg:
 		ipv6_addr_copy(&fl.fl6_src, &np->saddr);
 	fl.fl_ip_sport = inet->inet_sport;
 
-	/* merge ip6_build_xmit from ip6_output */
-	if (opt && opt->srcrt) {
-		struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt;
-		ipv6_addr_copy(&final, &fl.fl6_dst);
-		ipv6_addr_copy(&fl.fl6_dst, rt0->addr);
-		final_p = &final;
+	final_p = fl6_update_dst(&fl, opt, &final);
+	if (final_p)
 		connected = 0;
-	}
 
 	if (!fl.oif && ipv6_addr_is_multicast(&fl.fl6_dst)) {
 		fl.oif = np->mcast_oif;
-- 
1.7.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox