Netdev List
* bridge netpoll support: mismatch between net core and bridge headers
From: Mike Frysinger @ 2010-11-13 23:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

commit 91d2c34a4eed32876ca333b0ca44f3bc56645805 added this bit of code
to net/bridge/br_private.h:
struct net_bridge_port {
    ....
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   struct netpoll          *np;
+#endif
};
....
#ifdef CONFIG_NET_POLL_CONTROLLER
+static inline struct netpoll_info *br_netpoll_info(struct net_bridge *br)
+{
+   return br->dev->npinfo;
+}
....

unfortunately, this is not the define protection that is used in the
core net code (include/linux/netdevice.h):
#ifdef CONFIG_NETPOLL
    struct netpoll_info *npinfo;
#endif

so in my randconfig builds, i'm now seeing frequent failures along the lines of:

In file included from net/bridge/br.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member named ‘npinfo’
make[2]: *** [net/bridge/br.o] Error 1
In file included from net/bridge/br_device.c:24:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member named ‘npinfo’
In file included from net/bridge/br_fdb.c:27:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member named ‘npinfo’
make[2]: *** [net/bridge/br_fdb.o] Error 1
make[2]: *** [net/bridge/br_device.o] Error 1
In file included from net/bridge/br_forward.c:23:
net/bridge/br_private.h: In function ‘br_netpoll_info’:
net/bridge/br_private.h:293: error: ‘struct net_device’ has no member named ‘npinfo’
make[2]: *** [net/bridge/br_forward.o] Error 1

seems to be a regression introduced during the 2.6.36 cycle
-mike


* Re: Kernel rwlock design, Multicore and IGMP
From: Chris Metcalf @ 2010-11-13 23:03 UTC (permalink / raw)
  To: Américo Wang; +Cc: Cypher Wu, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <ZXmP8hjgLHA.4648@exchange1.tad.internal.tilera.com>

On 11/12/2010 2:13 AM, Américo Wang wrote:
> On Fri, Nov 12, 2010 at 11:32:59AM +0800, Cypher Wu wrote:
>> On Thu, Nov 11, 2010 at 11:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Thursday, 11 November 2010 at 21:49 +0800, Cypher Wu wrote:
>>>> I'm using TILEPro, and its in-kernel rwlock is a little different from
>>>> other platforms'. It gives priority to the write lock: once a writer
>>>> tries for it, it blocks subsequent read locks even while the read lock
>>>> is held by others. The code can be read in Linux kernel 2.6.36 in
>>>> arch/tile/lib/spinlock_32.c.
>>>
>>> This seems like a bug to me.
>>> [...]
>>>
>> It doesn't seem to matter whether read_lock() can be nested or not,
>> since rwlock doesn't have an 'owner'; the question is whether we should
>> give write_lock() priority over read_lock(), since if there are a lot
>> of read_lock()s they'll starve write_lock().
>> We should work out a well-defined behavior so that every
>> platform-dependent raw_rwlock is designed under that principle.
> 
> It is a known weakness of rwlock; it is designed like that. :)

Exactly.  The tile rwlock correctly allows recursively reacquiring the read
lock.  But it also gives priority to writers, for the (unfortunately
incorrect) reasons Cypher Wu outlined above, and that can deadlock, e.g.:

- Core A takes a read lock
- Core B tries for a write lock and blocks new read locks
- Core A tries for a (recursive) read lock and blocks

Cores A and B are now deadlocked.

The solution is actually to simplify the tile rwlock implementation so that
both readers and writers contend fairly for the lock.

I'll post a patch in the next day or two for tile.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


* Re: Kernel rwlock design, Multicore and IGMP
From: Peter Zijlstra @ 2010-11-13 22:54 UTC (permalink / raw)
  To: Américo Wang; +Cc: Eric Dumazet, Cypher Wu, linux-kernel, netdev
In-Reply-To: <20101112081945.GA5949@cr0.nay.redhat.com>

On Fri, 2010-11-12 at 16:19 +0800, Américo Wang wrote:
> 
> Just for the record, both Tile and X86 implement rwlock with a write-bias,
> this somewhat reduces the write-starvation problem. 

x86 does no such thing.


* Re: Kernel rwlock design, Multicore and IGMP
From: Peter Zijlstra @ 2010-11-13 22:53 UTC (permalink / raw)
  To: Cypher Wu; +Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <AANLkTik93QUQxSDmwd4Qj-gXQiWWPzd68JPAYAHBAsHR@mail.gmail.com>

On Fri, 2010-11-12 at 11:32 +0800, Cypher Wu wrote:
> It doesn't seem to matter whether read_lock() can be nested or not,
> since rwlock doesn't have an 'owner',

You're mistaken.

> the question is whether we should give write_lock() priority over
> read_lock(), since if there are a lot of read_lock()s they'll starve
> write_lock().

We rely on that behaviour. FWIW, write-preference locks will starve
readers.

> We should work out a well-defined behavior so that every
> platform-dependent raw_rwlock is designed under that principle.

We have that: all archs have a read-preference rwlock_t. They have to,
because code relies on it.


* Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
From: Paul E. McKenney @ 2010-11-13 22:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Andrew Morton, linux-kernel, David Miller, netdev,
	Arnaldo Carvalho de Melo, Ingo Molnar, Andi Kleen, Nick Piggin
In-Reply-To: <alpine.DEB.2.00.1011121313001.16754@router.home>

On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> 
> prefetchw() would be too much overhead?

No idea.  Where do you believe that prefetchw() should be added?

							Thanx, Paul


* [PATCH update 2] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-13 22:07 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, netdev
In-Reply-To: <tkrat.eaac597cf54bb660@s5r6.in-berlin.de>

This prevents firewire-net from submitting write requests in fast
succession until they fail because all 64 transaction labels have been
used up for unfinished split transactions.  The netif_stop/wake_queue API
is used for this purpose.

Without this stop/wake mechanism, datagrams were simply lost whenever
the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
also prevented other, unrelated outbound transactions from being initiated.

The high watermark is set to considerably less than 64 (I chose 8)
because peers which run current Linux firewire-ohci are still easily
saturated by this (i.e. some datagrams are dropped with ack-busy-*
events), depending on the hardware at transmitter and receiver side.

I did not see any change in throughput that was distinguishable from
the usual measurement noise.  To do:  Revisit the choice of queue depth
once firewire-ohci's AR DMA has been improved.

I wonder what a good net_device.tx_queue_len value is.  I just set it
to the same value as the chosen watermark for now.

Note:  This removes some netif_wake_queue calls from the reception code
paths.  They were apparently copy&paste artefacts from a nonsensical
netif_wake_queue use in the older eth1394 driver.  They belong only in
the transmit path.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Update 2:  Maxim told me to de-obfuscate status tracking.  I realized
that netif_queue_stopped can be used for that and thereby noticed bogus
usages of it in the rx path.

 drivers/firewire/net.c |   59 ++++++++++++++++++++++++-----------------
 1 file changed, 35 insertions(+), 24 deletions(-)

Index: b/drivers/firewire/net.c
===================================================================
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -28,8 +28,14 @@
 #include <asm/unaligned.h>
 #include <net/arp.h>
 
-#define FWNET_MAX_FRAGMENTS	25	/* arbitrary limit */
-#define FWNET_ISO_PAGE_COUNT	(PAGE_SIZE < 16 * 1024 ? 4 : 2)
+/* rx limits */
+#define FWNET_MAX_FRAGMENTS		25 /* arbitrary limit */
+#define FWNET_ISO_PAGE_COUNT		(PAGE_SIZE < 16*1024 ? 4 : 2)
+
+/* tx limits */
+#define FWNET_MAX_QUEUED_DATAGRAMS	8 /* should keep AT DMA busy enough */
+#define FWNET_MIN_QUEUED_DATAGRAMS	2
+#define FWNET_TX_QUEUE_LEN		FWNET_MAX_QUEUED_DATAGRAMS /* ? */
 
 #define IEEE1394_BROADCAST_CHANNEL	31
 #define IEEE1394_ALL_NODES		(0xffc0 | 0x003f)
@@ -641,8 +647,6 @@ static int fwnet_finish_incoming_packet(
 		net->stats.rx_packets++;
 		net->stats.rx_bytes += skb->len;
 	}
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return 0;
 
@@ -651,8 +655,6 @@ static int fwnet_finish_incoming_packet(
 	net->stats.rx_dropped++;
 
 	dev_kfree_skb_any(skb);
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
 
 	return -ENOENT;
 }
@@ -784,15 +786,10 @@ static int fwnet_incoming_packet(struct 
 	 * Datagram is not complete, we're done for the
 	 * moment.
 	 */
-	spin_unlock_irqrestore(&dev->lock, flags);
-
-	return 0;
+	retval = 0;
  fail:
 	spin_unlock_irqrestore(&dev->lock, flags);
 
-	if (netif_queue_stopped(net))
-		netif_wake_queue(net);
-
 	return retval;
 }
 
@@ -892,6 +889,13 @@ static void fwnet_free_ptask(struct fwne
 	kmem_cache_free(fwnet_packet_task_cache, ptask);
 }
 
+/* Caller must hold dev->lock. */
+static void dec_queued_datagrams(struct fwnet_device *dev)
+{
+	if (--dev->queued_datagrams == FWNET_MIN_QUEUED_DATAGRAMS)
+		netif_wake_queue(dev->netdev);
+}
+
 static int fwnet_send_packet(struct fwnet_packet_task *ptask);
 
 static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask)
@@ -908,7 +912,7 @@ static void fwnet_transmit_packet_done(s
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = (ptask->outstanding_pkts == 0 && ptask->enqueued);
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	if (ptask->outstanding_pkts == 0) {
 		dev->netdev->stats.tx_packets++;
@@ -979,7 +983,7 @@ static void fwnet_transmit_packet_failed
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = ptask->enqueued;
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	dev->netdev->stats.tx_dropped++;
 	dev->netdev->stats.tx_errors++;
@@ -1064,7 +1068,7 @@ static int fwnet_send_packet(struct fwne
 		if (!free)
 			ptask->enqueued = true;
 		else
-			dev->queued_datagrams--;
+			dec_queued_datagrams(dev);
 
 		spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1083,7 +1087,7 @@ static int fwnet_send_packet(struct fwne
 	if (!free)
 		ptask->enqueued = true;
 	else
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1249,6 +1253,15 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	struct fwnet_peer *peer;
 	unsigned long flags;
 
+	spin_lock_irqsave(&dev->lock, flags);
+
+	/* Can this happen? */
+	if (netif_queue_stopped(dev->netdev)) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+
+		return NETDEV_TX_BUSY;
+	}
+
 	ptask = kmem_cache_alloc(fwnet_packet_task_cache, GFP_ATOMIC);
 	if (ptask == NULL)
 		goto fail;
@@ -1267,9 +1280,6 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	proto = hdr_buf.h_proto;
 	dg_size = skb->len;
 
-	/* serialize access to peer, including peer->datagram_label */
-	spin_lock_irqsave(&dev->lock, flags);
-
 	/*
 	 * Set the transmission type for the packet.  ARP packets and IP
 	 * broadcast packets are sent via GASP.
@@ -1291,7 +1301,7 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 		peer = fwnet_peer_find_by_guid(dev, be64_to_cpu(guid));
 		if (!peer || peer->fifo == FWNET_NO_FIFO_ADDR)
-			goto fail_unlock;
+			goto fail;
 
 		generation         = peer->generation;
 		dest_node          = peer->node_id;
@@ -1345,7 +1355,8 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 		max_payload += RFC2374_FRAG_HDR_SIZE;
 	}
 
-	dev->queued_datagrams++;
+	if (++dev->queued_datagrams == FWNET_MAX_QUEUED_DATAGRAMS)
+		netif_stop_queue(dev->netdev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1356,9 +1367,9 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 	return NETDEV_TX_OK;
 
- fail_unlock:
-	spin_unlock_irqrestore(&dev->lock, flags);
  fail:
+	spin_unlock_irqrestore(&dev->lock, flags);
+
 	if (ptask)
 		kmem_cache_free(fwnet_packet_task_cache, ptask);
 
@@ -1415,7 +1426,7 @@ static void fwnet_init_dev(struct net_de
 	net->addr_len		= FWNET_ALEN;
 	net->hard_header_len	= FWNET_HLEN;
 	net->type		= ARPHRD_IEEE1394;
-	net->tx_queue_len	= 10;
+	net->tx_queue_len	= FWNET_TX_QUEUE_LEN;
 	SET_ETHTOOL_OPS(net, &fwnet_ethtool_ops);
 }
 

-- 
Stefan Richter
-=====-==-=- =-== -==-=
http://arcgraph.de/sr/



* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Eric Dumazet @ 2010-11-13 22:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20101113101320.4b1c9ba7@nehalam>

On Saturday, 13 November 2010 at 10:13 -0800, Stephen Hemminger wrote:
> On Sat, 13 Nov 2010 18:58:50 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Saturday, 13 November 2010 at 09:35 -0800, Stephen Hemminger wrote:
> > > On Sat, 13 Nov 2010 09:15:28 +0100
> > > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > 
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 578debb..ffbd177 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -996,7 +996,10 @@ struct net_device {
> > > >  #endif
> > > >  
> > > >  	rx_handler_func_t	*rx_handler;
> > > > -	void			*rx_handler_data;
> > > > +	union {
> > > > +		void				*rx_handler_data;
> > > > +		struct net_bridge_port __rcu	*br_port_rcu;
> > > > +	};
> > > >  
> > > >  	struct netdev_queue __rcu *ingress_queue;
> > > 
> > > I don't like making the generic hook typed again.
> > > We don't do this for other callbacks, timers, workqueues, ...
> > > Why is it necessary for RCU notation?
> > > 
> > 
> > because rcu_dereference() needs the type for __CHECKER__/sparse checks
> > 
> > #define __rcu_dereference_check(p, c, space) \
> >         ({ \
> >                 typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
> >                 rcu_lockdep_assert(c); \
> >                 rcu_dereference_sparse(p, space); \
> >                 smp_read_barrier_depends(); \
> >                 ((typeof(*p) __force __kernel *)(_________p1)); \
> >         })
> > 
> > So using a "void *ptr" is not an option
> > 
> > It's also cleaner to use
> > 
> > rcu_dereference(dev->br_port_rcu)
> > 
> > instead of 
> > 
> > (struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)
> 
> There must be a better way. What about use of that hook by macvlan and openvswitch?

macvlan and openvswitch (is it part of linux yet ???)

I honestly don't understand your point, Stephen; maybe you could explain
a bit more what the problem is?

I use a union, like many others in the kernel. This is the first
time I hear that adding type safety is not good.

You can use either field at your convenience.

If you are talking about stacking hooks, that has nothing to do with
this (cleanup) RCU patch, but with the earlier introduction of
rx_handler_data/rx_handler?

Please run sparse on an x86_64 machine and watch all the warnings in the
bridge code (with CONFIG_SPARSE_RCU_POINTER=y).

Me confused.




* [PATCH] net: more Kconfig whitespace cleanup
From: Philippe De Muyter @ 2010-11-13 18:43 UTC (permalink / raw)
  To: netdev; +Cc: Philippe De Muyter

indentation for TSI108_ETH entry was too big.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
---
 drivers/net/Kconfig |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 805cf5d..90431de 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2389,12 +2389,12 @@ config SPIDER_NET
 	  Cell Processor-Based Blades from IBM.
 
 config TSI108_ETH
-	   tristate "Tundra TSI108 gigabit Ethernet support"
-	   depends on TSI108_BRIDGE
-	   help
-	     This driver supports Tundra TSI108 gigabit Ethernet ports.
-	     To compile this driver as a module, choose M here: the module
-	     will be called tsi108_eth.
+	tristate "Tundra TSI108 gigabit Ethernet support"
+	depends on TSI108_BRIDGE
+	help
+	  This driver supports Tundra TSI108 gigabit Ethernet ports.
+	  To compile this driver as a module, choose M here: the module
+	  will be called tsi108_eth.
 
 config GELIC_NET
 	tristate "PS3 Gigabit Ethernet driver"
-- 
1.7.1



* Re: [PATCH 4/10] Fix leaking of kernel heap addresses in net/
From: Dan Rosenberg @ 2010-11-13 18:42 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1289673008.3090.350.camel@Dan>

> Actually, this is not even a joke.
> 
> Take a look at how we track what sockets a user wants dumped via
> the inet_diag netlink facility, the socket pointer is used as
> the identification cookie.
> 
> I'm sure we'll now get some more security theatre about how we
> have to undo that too.
> 
> More and more I see this whole idea as extremely ridiculous.
> 
> If I can write to or read kernel memory, I can look up the
> sockets, inodes, and whatever else we're currently exposing
> the addresses of.  Even without a symbol table, which is
> readily available, I can easily find the ksymtab and find the
> inode and socket hash table addresses there.
> 
> This whole exercise is closing the barn door after the horses have
> already escaped, and it's causing all kinds of inconveniences
> that we really have no need for.

Of course, if you can both write to and read from kernel memory, this is
a pointless exercise, but those are not the conditions I am trying to
defend against.  Proactive security assumes that individual bugs will
continue to be found, and takes steps to prevent exploitation of classes
of bugs on a broader scale.  In this case, the goal is for it to be
difficult to exploit arbitrary write bugs WITHOUT having a separate
arbitrary read bug.  Several other steps are being taken in this
direction - there is open discussion about the merits of hiding symbol
information, address space randomization, marking various likely targets
as read-only, and restricting access to debugging information.  However,
none of this really accomplishes much if addresses are still exposed
via /proc.

I'm sorry that you see all this as "security theatre".  It's clear that
there's too much resistance to this effort for it to ever succeed, so
I'm ceasing attempts to get this patch series through.

-Dan



* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Stephen Hemminger @ 2010-11-13 18:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1289671130.2743.28.camel@edumazet-laptop>

On Sat, 13 Nov 2010 18:58:50 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Saturday, 13 November 2010 at 09:35 -0800, Stephen Hemminger wrote:
> > On Sat, 13 Nov 2010 09:15:28 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > 
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 578debb..ffbd177 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -996,7 +996,10 @@ struct net_device {
> > >  #endif
> > >  
> > >  	rx_handler_func_t	*rx_handler;
> > > -	void			*rx_handler_data;
> > > +	union {
> > > +		void				*rx_handler_data;
> > > +		struct net_bridge_port __rcu	*br_port_rcu;
> > > +	};
> > >  
> > >  	struct netdev_queue __rcu *ingress_queue;
> > 
> > I don't like making the generic hook typed again.
> > We don't do this for other callbacks, timers, workqueues, ...
> > Why is it necessary for RCU notation?
> > 
> 
> because rcu_dereference() needs the type for __CHECKER__/sparse checks
> 
> #define __rcu_dereference_check(p, c, space) \
>         ({ \
>                 typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
>                 rcu_lockdep_assert(c); \
>                 rcu_dereference_sparse(p, space); \
>                 smp_read_barrier_depends(); \
>                 ((typeof(*p) __force __kernel *)(_________p1)); \
>         })
> 
> So using a "void *ptr" is not an option
> 
> It's also cleaner to use
> 
> rcu_dereference(dev->br_port_rcu)
> 
> instead of 
> 
> (struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)

There must be a better way. What about use of that hook by macvlan and openvswitch?



-- 


* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Eric Dumazet @ 2010-11-13 17:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20101113093545.6fe9c077@nehalam>

On Saturday, 13 November 2010 at 09:35 -0800, Stephen Hemminger wrote:
> On Sat, 13 Nov 2010 09:15:28 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 578debb..ffbd177 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -996,7 +996,10 @@ struct net_device {
> >  #endif
> >  
> >  	rx_handler_func_t	*rx_handler;
> > -	void			*rx_handler_data;
> > +	union {
> > +		void				*rx_handler_data;
> > +		struct net_bridge_port __rcu	*br_port_rcu;
> > +	};
> >  
> >  	struct netdev_queue __rcu *ingress_queue;
> 
> I don't like making the generic hook typed again.
> We don't do this for other callbacks, timers, workqueues, ...
> Why is it necessary for RCU notation?
> 

because rcu_dereference() needs the type for __CHECKER__/sparse checks

#define __rcu_dereference_check(p, c, space) \
        ({ \
                typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
                rcu_lockdep_assert(c); \
                rcu_dereference_sparse(p, space); \
                smp_read_barrier_depends(); \
                ((typeof(*p) __force __kernel *)(_________p1)); \
        })

So using a "void *ptr" is not an option

It's also cleaner to use

rcu_dereference(dev->br_port_rcu)

instead of 

(struct net_bridge_port *)rcu_dereference(dev->rx_handler_data)





* Re: [PATCH net-next-2.6] bridge: add __rcu annotations
From: Stephen Hemminger @ 2010-11-13 17:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1289636128.2743.15.camel@edumazet-laptop>

On Sat, 13 Nov 2010 09:15:28 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 578debb..ffbd177 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -996,7 +996,10 @@ struct net_device {
>  #endif
>  
>  	rx_handler_func_t	*rx_handler;
> -	void			*rx_handler_data;
> +	union {
> +		void				*rx_handler_data;
> +		struct net_bridge_port __rcu	*br_port_rcu;
> +	};
>  
>  	struct netdev_queue __rcu *ingress_queue;

I don't like making the generic hook typed again.
We don't do this for other callbacks, timers, workqueues, ...
Why is it necessary for RCU notation?

-- 



* Re: ethtool maintenance
From: Ben Hutchings @ 2010-11-13 14:30 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: NetDev, David Miller, Peter Martuccelli
In-Reply-To: <4CDE502D.6000506@garzik.org>

On Sat, 2010-11-13 at 03:45 -0500, Jeff Garzik wrote:
> So, a recent emergency surgery has really set me back, work-wise. 
> ethtool [the userspace utility] 2.6.36 is still not out, and personally 
> it remains a third or fourth priority.
> 
> While it's likely that I could get back to ethtool's patch queue next 
> week, it continues to be low man on the totem pole.  Seems only fair to 
> see if anyone else is interested in maintaining it.
> 
> I emailed Ben Hutchings privately about this, but haven't heard back, so 
> I thought I'd go ahead and email the list.
> 
> Anyone interested?

I am interested, but will need to clear it with my boss before making
such a commitment.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH update] firewire: net: throttle TX queue before running out of tlabels
From: Stefan Richter @ 2010-11-13 12:16 UTC (permalink / raw)
  To: linux1394-devel; +Cc: linux-kernel, netdev
In-Reply-To: <tkrat.39c164e4c52e2fc8@s5r6.in-berlin.de>

This prevents firewire-net from submitting write requests in fast
succession until they fail because all 64 transaction labels have been
used up for unfinished split transactions.  The netif_stop/wake_queue API
is used for this purpose.

Without this stop/wake mechanism, datagrams were simply lost whenever
the tlabel pool was exhausted.  Plus, tlabel exhaustion by firewire-net
also prevented other, unrelated outbound transactions from being initiated.

The high watermark is set to considerably less than 64 (I chose 8)
because peers which run current Linux firewire-ohci are still easily
saturated by this (i.e. some datagrams are dropped with ack-busy-*
events), depending on the hardware at transmitter and receiver side.

I did not see any change in throughput that was distinguishable from
the usual measurement noise.  To do:  Revisit the choice of queue depth
once firewire-ohci's AR DMA has been improved.

I wonder what a good net_device.tx_queue_len value is.  I just set it
to the same value as the chosen watermark for now.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Update:  Stricter version with an early NETDEV_TX_BUSY return if the
.ndo_start_xmit method is called while the driver is stopping (or has
stopped) the transmit queue.  Thus there can never be more than
FWNET_MAX_QUEUED_DATAGRAMS pending outbound 1394 transactions.

 drivers/firewire/net.c |   53 ++++++++++++++++++++++++++++++-----------
 1 file changed, 39 insertions(+), 14 deletions(-)

Index: b/drivers/firewire/net.c
===================================================================
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -28,8 +28,15 @@
 #include <asm/unaligned.h>
 #include <net/arp.h>
 
-#define FWNET_MAX_FRAGMENTS	25	/* arbitrary limit */
-#define FWNET_ISO_PAGE_COUNT	(PAGE_SIZE < 16 * 1024 ? 4 : 2)
+/* rx limits */
+#define FWNET_MAX_FRAGMENTS		25 /* arbitrary limit */
+#define FWNET_ISO_PAGE_COUNT		(PAGE_SIZE < 16*1024 ? 4 : 2)
+
+/* tx limits */
+#define FWNET_MAX_QUEUED_DATAGRAMS	8 /* should keep AT DMA busy enough */
+#define FWNET_MIN_QUEUED_DATAGRAMS	2
+#define FWNET_TX_QUEUE_STOPPED		FWNET_MAX_QUEUED_DATAGRAMS
+#define FWNET_TX_QUEUE_LEN		FWNET_MAX_QUEUED_DATAGRAMS /* ? */
 
 #define IEEE1394_BROADCAST_CHANNEL	31
 #define IEEE1394_ALL_NODES		(0xffc0 | 0x003f)
@@ -892,6 +899,16 @@ static void fwnet_free_ptask(struct fwne
 	kmem_cache_free(fwnet_packet_task_cache, ptask);
 }
 
+/* Caller must hold dev->lock. */
+static void dec_queued_datagrams(struct fwnet_device *dev)
+{
+	if (--dev->queued_datagrams ==
+			FWNET_MIN_QUEUED_DATAGRAMS + FWNET_TX_QUEUE_STOPPED) {
+		dev->queued_datagrams -= FWNET_TX_QUEUE_STOPPED;
+		netif_wake_queue(dev->netdev);
+	}
+}
+
 static int fwnet_send_packet(struct fwnet_packet_task *ptask);
 
 static void fwnet_transmit_packet_done(struct fwnet_packet_task *ptask)
@@ -908,7 +925,7 @@ static void fwnet_transmit_packet_done(s
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = (ptask->outstanding_pkts == 0 && ptask->enqueued);
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	if (ptask->outstanding_pkts == 0) {
 		dev->netdev->stats.tx_packets++;
@@ -979,7 +996,7 @@ static void fwnet_transmit_packet_failed
 	/* Check whether we or the networking TX soft-IRQ is last user. */
 	free = ptask->enqueued;
 	if (free)
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	dev->netdev->stats.tx_dropped++;
 	dev->netdev->stats.tx_errors++;
@@ -1064,7 +1081,7 @@ static int fwnet_send_packet(struct fwne
 		if (!free)
 			ptask->enqueued = true;
 		else
-			dev->queued_datagrams--;
+			dec_queued_datagrams(dev);
 
 		spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1083,7 +1100,7 @@ static int fwnet_send_packet(struct fwne
 	if (!free)
 		ptask->enqueued = true;
 	else
-		dev->queued_datagrams--;
+		dec_queued_datagrams(dev);
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1249,6 +1266,14 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	struct fwnet_peer *peer;
 	unsigned long flags;
 
+	spin_lock_irqsave(&dev->lock, flags);
+
+	if (dev->queued_datagrams > FWNET_MAX_QUEUED_DATAGRAMS) {
+		spin_unlock_irqrestore(&dev->lock, flags);
+
+		return NETDEV_TX_BUSY;
+	}
+
 	ptask = kmem_cache_alloc(fwnet_packet_task_cache, GFP_ATOMIC);
 	if (ptask == NULL)
 		goto fail;
@@ -1267,9 +1292,6 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 	proto = hdr_buf.h_proto;
 	dg_size = skb->len;
 
-	/* serialize access to peer, including peer->datagram_label */
-	spin_lock_irqsave(&dev->lock, flags);
-
 	/*
 	 * Set the transmission type for the packet.  ARP packets and IP
 	 * broadcast packets are sent via GASP.
@@ -1291,7 +1313,7 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 		peer = fwnet_peer_find_by_guid(dev, be64_to_cpu(guid));
 		if (!peer || peer->fifo == FWNET_NO_FIFO_ADDR)
-			goto fail_unlock;
+			goto fail;
 
 		generation         = peer->generation;
 		dest_node          = peer->node_id;
@@ -1345,7 +1367,10 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 		max_payload += RFC2374_FRAG_HDR_SIZE;
 	}
 
-	dev->queued_datagrams++;
+	if (++dev->queued_datagrams == FWNET_MAX_QUEUED_DATAGRAMS) {
+		dev->queued_datagrams += FWNET_TX_QUEUE_STOPPED;
+		netif_stop_queue(dev->netdev);
+	}
 
 	spin_unlock_irqrestore(&dev->lock, flags);
 
@@ -1356,9 +1381,9 @@ static netdev_tx_t fwnet_tx(struct sk_bu
 
 	return NETDEV_TX_OK;
 
- fail_unlock:
-	spin_unlock_irqrestore(&dev->lock, flags);
  fail:
+	spin_unlock_irqrestore(&dev->lock, flags);
+
 	if (ptask)
 		kmem_cache_free(fwnet_packet_task_cache, ptask);
 
@@ -1415,7 +1440,7 @@ static void fwnet_init_dev(struct net_de
 	net->addr_len		= FWNET_ALEN;
 	net->hard_header_len	= FWNET_HLEN;
 	net->type		= ARPHRD_IEEE1394;
-	net->tx_queue_len	= 10;
+	net->tx_queue_len	= FWNET_TX_QUEUE_LEN;
 	SET_ETHTOOL_OPS(net, &fwnet_ethtool_ops);
 }
 

-- 
Stefan Richter
-=====-==-=- =-== -==-=
http://arcgraph.de/sr/



* ethtool maintenance
From: Jeff Garzik @ 2010-11-13  8:45 UTC (permalink / raw)
  To: NetDev; +Cc: David Miller, Peter Martuccelli, Ben Hutchings


So, a recent emergency surgery has really set me back, work-wise. 
ethtool [the userspace utility] 2.6.36 is still not out, and personally 
it remains a third or fourth priority.

While it's likely that I could get back to ethtool's patch queue next 
week, it continues to be low man on the totem pole.  Seems only fair to 
see if anyone else is interested in maintaining it.

I emailed Ben Hutchings privately about this, but haven't heard back, so 
I thought I'd go ahead and email the list.

Anyone interested?

	Jeff






* [RFC] pull linux-2.6 into net-next-2.6 ?
From: Eric Dumazet @ 2010-11-13  8:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Hi David

Andrew pushed to Linus the patch introducing atomic_inc_not_zero_hint()

(commit 3f9d35b9514da675)

Would it be possible to get it into net-next-2.6 so that I can start using
it in the network stack?

Thanks !




* [PATCH net-next-2.6] bridge: add __rcu annotations
From: Eric Dumazet @ 2010-11-13  8:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Stephen Hemminger

Add modern __rcu annotations to the bridge code to reduce sparse errors
(reported with CONFIG_SPARSE_RCU_POINTER=y) and to self-document the code.

Use an anonymous union in net_device to give net_dev->br_port_rcu the
proper type, which yields a cleaner br_port_get_rcu() definition.

br_port_get() is renamed to br_port_get_rtnl() to make it clear that RTNL is held.

Note: add a br_should_route_hook_t typedef; this is the only way we can
get a clean RCU implementation for a function pointer.
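The typedef point can be illustrated outside the kernel. This is a userspace analogy, not the bridge code: C11 _Atomic stands in for the __rcu annotation, a release store for rcu_assign_pointer(), and an acquire load for rcu_dereference() (acquire is stronger than what rcu_dereference() requires). struct pkt, publish_hook(), run_hook() and route_proto4() are hypothetical. Here the typedef names a function type, so should_route_hook_t * is an ordinary function pointer that can carry the qualifier. Grace periods are not modeled, so the sketch only covers publishing a hook, not safely tearing one down and unloading its code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	int proto;
};

/* A function *type*, so "should_route_hook_t *" is a plain function
 * pointer that can carry a qualifier (_Atomic here, __rcu in the kernel). */
typedef int should_route_hook_t(struct pkt *p);

/* Published hook; NULL means no hook is registered. */
static _Atomic(should_route_hook_t *) route_hook = NULL;

/* Writer side, analogous to rcu_assign_pointer(): the release store
 * orders all earlier writes before the pointer becomes visible. */
static void publish_hook(should_route_hook_t *fn)
{
	atomic_store_explicit(&route_hook, fn, memory_order_release);
}

/* Reader side, analogous to rcu_dereference() followed by the call
 * made in br_handle_frame(). Returns 0 when no hook is registered. */
static int run_hook(struct pkt *p)
{
	should_route_hook_t *rhook =
		atomic_load_explicit(&route_hook, memory_order_acquire);

	assert(p != NULL);
	return rhook != NULL ? (*rhook)(p) : 0;
}

/* Hypothetical example hook: request routing for protocol 4 only. */
static int route_proto4(struct pkt *p)
{
	return p->proto == 4;
}
```

Readers load the pointer exactly once into a local and call through that copy, so a concurrent publish_hook() cannot make them test one hook and call another.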

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
---
 include/linux/if_bridge.h             |    3 
 include/linux/netdevice.h             |    5 +
 net/bridge/br.c                       |    5 -
 net/bridge/br_if.c                    |    2 
 net/bridge/br_input.c                 |    4 -
 net/bridge/br_multicast.c             |   78 +++++++++++++++---------
 net/bridge/br_netlink.c               |    4 -
 net/bridge/br_notify.c                |    4 -
 net/bridge/br_private.h               |   11 +--
 net/bridge/netfilter/ebtable_broute.c |    2 
 10 files changed, 72 insertions(+), 46 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 0d241a5..dc813e9 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -102,7 +102,8 @@ struct __fdb_entry {
 #include <linux/netdevice.h>
 
 extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
-extern int (*br_should_route_hook)(struct sk_buff *skb);
+typedef int br_should_route_hook_t(struct sk_buff *skb);
+extern br_should_route_hook_t __rcu *br_should_route_hook;
 
 #endif
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 578debb..ffbd177 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -996,7 +996,10 @@ struct net_device {
 #endif
 
 	rx_handler_func_t	*rx_handler;
-	void			*rx_handler_data;
+	union {
+		void				*rx_handler_data;
+		struct net_bridge_port __rcu	*br_port_rcu;
+	};
 
 	struct netdev_queue __rcu *ingress_queue;
 
diff --git a/net/bridge/br.c b/net/bridge/br.c
index c8436fa..9fad125 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -22,7 +22,8 @@
 
 #include "br_private.h"
 
-int (*br_should_route_hook)(struct sk_buff *skb);
+br_should_route_hook_t __rcu *br_should_route_hook __read_mostly;
+EXPORT_SYMBOL(br_should_route_hook);
 
 static const struct stp_proto br_stp_proto = {
 	.rcv	= br_stp_rcv,
@@ -102,8 +103,6 @@ static void __exit br_deinit(void)
 	br_fdb_fini();
 }
 
-EXPORT_SYMBOL(br_should_route_hook);
-
 module_init(br_init)
 module_exit(br_deinit)
 MODULE_LICENSE("GPL");
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 89ad25a..3a611d2 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -478,7 +478,7 @@ int br_del_if(struct net_bridge *br, struct net_device *dev)
 	if (!br_port_exists(dev))
 		return -EINVAL;
 
-	p = br_port_get(dev);
+	p = br_port_get_rtnl(dev);
 	if (p->br != br)
 		return -EINVAL;
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 25207a1..948c921 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -139,7 +139,7 @@ struct sk_buff *br_handle_frame(struct sk_buff *skb)
 {
 	struct net_bridge_port *p;
 	const unsigned char *dest = eth_hdr(skb)->h_dest;
-	int (*rhook)(struct sk_buff *skb);
+	br_should_route_hook_t *rhook;
 
 	if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
 		return skb;
@@ -174,7 +174,7 @@ forward:
 	case BR_STATE_FORWARDING:
 		rhook = rcu_dereference(br_should_route_hook);
 		if (rhook != NULL) {
-			if (rhook(skb))
+			if ((*rhook)(skb))
 				return skb;
 			dest = eth_hdr(skb)->h_dest;
 		}
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index eb5b256..326e599 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -33,6 +33,9 @@
 
 #include "br_private.h"
 
+#define mlock_dereference(X, br) \
+	rcu_dereference_protected(X, lockdep_is_held(&br->multicast_lock))
+
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 static inline int ipv6_is_local_multicast(const struct in6_addr *addr)
 {
@@ -135,7 +138,7 @@ static struct net_bridge_mdb_entry *br_mdb_ip6_get(
 struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
 					struct sk_buff *skb)
 {
-	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_htable *mdb = rcu_dereference(br->mdb);
 	struct br_ip ip;
 
 	if (br->multicast_disabled)
@@ -235,7 +238,8 @@ static void br_multicast_group_expired(unsigned long data)
 	if (mp->ports)
 		goto out;
 
-	mdb = br->mdb;
+	mdb = mlock_dereference(br->mdb, br);
+
 	hlist_del_rcu(&mp->hlist[mdb->ver]);
 	mdb->size--;
 
@@ -249,16 +253,20 @@ out:
 static void br_multicast_del_pg(struct net_bridge *br,
 				struct net_bridge_port_group *pg)
 {
-	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_htable *mdb;
 	struct net_bridge_mdb_entry *mp;
 	struct net_bridge_port_group *p;
-	struct net_bridge_port_group **pp;
+	struct net_bridge_port_group __rcu **pp;
+
+	mdb = mlock_dereference(br->mdb, br);
 
 	mp = br_mdb_ip_get(mdb, &pg->addr);
 	if (WARN_ON(!mp))
 		return;
 
-	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
 		if (p != pg)
 			continue;
 
@@ -294,10 +302,10 @@ out:
 	spin_unlock(&br->multicast_lock);
 }
 
-static int br_mdb_rehash(struct net_bridge_mdb_htable **mdbp, int max,
+static int br_mdb_rehash(struct net_bridge_mdb_htable __rcu **mdbp, int max,
 			 int elasticity)
 {
-	struct net_bridge_mdb_htable *old = *mdbp;
+	struct net_bridge_mdb_htable *old = rcu_dereference_protected(*mdbp, 1);
 	struct net_bridge_mdb_htable *mdb;
 	int err;
 
@@ -569,7 +577,7 @@ static struct net_bridge_mdb_entry *br_multicast_get_group(
 	struct net_bridge *br, struct net_bridge_port *port,
 	struct br_ip *group, int hash)
 {
-	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_htable *mdb;
 	struct net_bridge_mdb_entry *mp;
 	struct hlist_node *p;
 	unsigned count = 0;
@@ -577,6 +585,7 @@ static struct net_bridge_mdb_entry *br_multicast_get_group(
 	int elasticity;
 	int err;
 
+	mdb = rcu_dereference_protected(br->mdb, 1);
 	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
 		count++;
 		if (unlikely(br_ip_equal(group, &mp->addr)))
@@ -642,10 +651,11 @@ static struct net_bridge_mdb_entry *br_multicast_new_group(
 	struct net_bridge *br, struct net_bridge_port *port,
 	struct br_ip *group)
 {
-	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_htable *mdb;
 	struct net_bridge_mdb_entry *mp;
 	int hash;
 
+	mdb = rcu_dereference_protected(br->mdb, 1);
 	if (!mdb) {
 		if (br_mdb_rehash(&br->mdb, BR_HASH_SIZE, 0))
 			return NULL;
@@ -660,7 +670,7 @@ static struct net_bridge_mdb_entry *br_multicast_new_group(
 
 	case -EAGAIN:
 rehash:
-		mdb = br->mdb;
+		mdb = rcu_dereference_protected(br->mdb, 1);
 		hash = br_ip_hash(mdb, group);
 		break;
 
@@ -692,7 +702,7 @@ static int br_multicast_add_group(struct net_bridge *br,
 {
 	struct net_bridge_mdb_entry *mp;
 	struct net_bridge_port_group *p;
-	struct net_bridge_port_group **pp;
+	struct net_bridge_port_group __rcu **pp;
 	unsigned long now = jiffies;
 	int err;
 
@@ -712,7 +722,9 @@ static int br_multicast_add_group(struct net_bridge *br,
 		goto out;
 	}
 
-	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
 		if (p->port == port)
 			goto found;
 		if ((unsigned long)p->port < (unsigned long)port)
@@ -1106,7 +1118,7 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	struct net_bridge_mdb_entry *mp;
 	struct igmpv3_query *ih3;
 	struct net_bridge_port_group *p;
-	struct net_bridge_port_group **pp;
+	struct net_bridge_port_group __rcu **pp;
 	unsigned long max_delay;
 	unsigned long now = jiffies;
 	__be32 group;
@@ -1145,7 +1157,7 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	if (!group)
 		goto out;
 
-	mp = br_mdb_ip4_get(br->mdb, group);
+	mp = br_mdb_ip4_get(mlock_dereference(br->mdb, br), group);
 	if (!mp)
 		goto out;
 
@@ -1157,7 +1169,9 @@ static int br_ip4_multicast_query(struct net_bridge *br,
 	     try_to_del_timer_sync(&mp->timer) >= 0))
 		mod_timer(&mp->timer, now + max_delay);
 
-	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
 		if (timer_pending(&p->timer) ?
 		    time_after(p->timer.expires, now + max_delay) :
 		    try_to_del_timer_sync(&p->timer) >= 0)
@@ -1178,7 +1192,8 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	struct mld_msg *mld = (struct mld_msg *) icmp6_hdr(skb);
 	struct net_bridge_mdb_entry *mp;
 	struct mld2_query *mld2q;
-	struct net_bridge_port_group *p, **pp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group __rcu **pp;
 	unsigned long max_delay;
 	unsigned long now = jiffies;
 	struct in6_addr *group = NULL;
@@ -1214,7 +1229,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	if (!group)
 		goto out;
 
-	mp = br_mdb_ip6_get(br->mdb, group);
+	mp = br_mdb_ip6_get(mlock_dereference(br->mdb, br), group);
 	if (!mp)
 		goto out;
 
@@ -1225,7 +1240,9 @@ static int br_ip6_multicast_query(struct net_bridge *br,
 	     try_to_del_timer_sync(&mp->timer) >= 0))
 		mod_timer(&mp->timer, now + max_delay);
 
-	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+	for (pp = &mp->ports;
+	     (p = mlock_dereference(*pp, br)) != NULL;
+	     pp = &p->next) {
 		if (timer_pending(&p->timer) ?
 		    time_after(p->timer.expires, now + max_delay) :
 		    try_to_del_timer_sync(&p->timer) >= 0)
@@ -1254,7 +1271,7 @@ static void br_multicast_leave_group(struct net_bridge *br,
 	    timer_pending(&br->multicast_querier_timer))
 		goto out;
 
-	mdb = br->mdb;
+	mdb = mlock_dereference(br->mdb, br);
 	mp = br_mdb_ip_get(mdb, group);
 	if (!mp)
 		goto out;
@@ -1277,7 +1294,9 @@ static void br_multicast_leave_group(struct net_bridge *br,
 		goto out;
 	}
 
-	for (p = mp->ports; p; p = p->next) {
+	for (p = mlock_dereference(mp->ports, br);
+	     p != NULL;
+	     p = mlock_dereference(p->next, br)) {
 		if (p->port != port)
 			continue;
 
@@ -1625,7 +1644,7 @@ void br_multicast_stop(struct net_bridge *br)
 	del_timer_sync(&br->multicast_query_timer);
 
 	spin_lock_bh(&br->multicast_lock);
-	mdb = br->mdb;
+	mdb = mlock_dereference(br->mdb, br);
 	if (!mdb)
 		goto out;
 
@@ -1729,6 +1748,7 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 {
 	struct net_bridge_port *port;
 	int err = 0;
+	struct net_bridge_mdb_htable *mdb;
 
 	spin_lock(&br->multicast_lock);
 	if (br->multicast_disabled == !val)
@@ -1741,15 +1761,16 @@ int br_multicast_toggle(struct net_bridge *br, unsigned long val)
 	if (!netif_running(br->dev))
 		goto unlock;
 
-	if (br->mdb) {
-		if (br->mdb->old) {
+	mdb = mlock_dereference(br->mdb, br);
+	if (mdb) {
+		if (mdb->old) {
 			err = -EEXIST;
 rollback:
 			br->multicast_disabled = !!val;
 			goto unlock;
 		}
 
-		err = br_mdb_rehash(&br->mdb, br->mdb->max,
+		err = br_mdb_rehash(&br->mdb, mdb->max,
 				    br->hash_elasticity);
 		if (err)
 			goto rollback;
@@ -1774,6 +1795,7 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 {
 	int err = -ENOENT;
 	u32 old;
+	struct net_bridge_mdb_htable *mdb;
 
 	spin_lock(&br->multicast_lock);
 	if (!netif_running(br->dev))
@@ -1782,7 +1804,9 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 	err = -EINVAL;
 	if (!is_power_of_2(val))
 		goto unlock;
-	if (br->mdb && val < br->mdb->size)
+
+	mdb = mlock_dereference(br->mdb, br);
+	if (mdb && val < mdb->size)
 		goto unlock;
 
 	err = 0;
@@ -1790,8 +1814,8 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 	old = br->hash_max;
 	br->hash_max = val;
 
-	if (br->mdb) {
-		if (br->mdb->old) {
+	if (mdb) {
+		if (mdb->old) {
 			err = -EEXIST;
 rollback:
 			br->hash_max = old;
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 4a6a378..b301dfc 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -123,7 +123,7 @@ static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 		if (!br_port_exists(dev) || idx < cb->args[0])
 			goto skip;
 
-		if (br_fill_ifinfo(skb, br_port_get(dev),
+		if (br_fill_ifinfo(skb, br_port_get_rtnl(dev),
 				   NETLINK_CB(cb->skb).pid,
 				   cb->nlh->nlmsg_seq, RTM_NEWLINK,
 				   NLM_F_MULTI) < 0)
@@ -171,7 +171,7 @@ static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 
 	if (!br_port_exists(dev))
 		return -EINVAL;
-	p = br_port_get(dev);
+	p = br_port_get_rtnl(dev);
 
 	/* if kernel STP is running, don't allow changes */
 	if (p->br->stp_enabled == BR_KERNEL_STP)
diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c
index 404d4e1..e72e49e 100644
--- a/net/bridge/br_notify.c
+++ b/net/bridge/br_notify.c
@@ -32,7 +32,7 @@ struct notifier_block br_device_notifier = {
 static int br_device_event(struct notifier_block *unused, unsigned long event, void *ptr)
 {
 	struct net_device *dev = ptr;
-	struct net_bridge_port *p = br_port_get(dev);
+	struct net_bridge_port *p;
 	struct net_bridge *br;
 	int err;
 
@@ -40,7 +40,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	if (!br_port_exists(dev))
 		return NOTIFY_DONE;
 
-	p = br_port_get(dev);
+	p = br_port_get_rtnl(dev);
 	br = p->br;
 
 	switch (event) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 75c90ed..32235d4 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -72,7 +72,7 @@ struct net_bridge_fdb_entry
 
 struct net_bridge_port_group {
 	struct net_bridge_port		*port;
-	struct net_bridge_port_group	*next;
+	struct net_bridge_port_group __rcu	*next;
 	struct hlist_node		mglist;
 	struct rcu_head			rcu;
 	struct timer_list		timer;
@@ -86,7 +86,7 @@ struct net_bridge_mdb_entry
 	struct hlist_node		hlist[2];
 	struct hlist_node		mglist;
 	struct net_bridge		*br;
-	struct net_bridge_port_group	*ports;
+	struct net_bridge_port_group __rcu	*ports;
 	struct rcu_head			rcu;
 	struct timer_list		timer;
 	struct timer_list		query_timer;
@@ -151,9 +151,8 @@ struct net_bridge_port
 #endif
 };
 
-#define br_port_get_rcu(dev) \
-	((struct net_bridge_port *) rcu_dereference(dev->rx_handler_data))
-#define br_port_get(dev) ((struct net_bridge_port *) dev->rx_handler_data)
+#define br_port_get_rcu(dev) rcu_dereference(dev->br_port_rcu)
+#define br_port_get_rtnl(dev) rtnl_dereference(dev->br_port_rcu)
 #define br_port_exists(dev) (dev->priv_flags & IFF_BRIDGE_PORT)
 
 struct br_cpu_netstats {
@@ -227,7 +226,7 @@ struct net_bridge
 	unsigned long			multicast_startup_query_interval;
 
 	spinlock_t			multicast_lock;
-	struct net_bridge_mdb_htable	*mdb;
+	struct net_bridge_mdb_htable __rcu	*mdb;
 	struct hlist_head		router_list;
 	struct hlist_head		mglist;
 
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index ae3f106..4ce9d8a 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -87,7 +87,7 @@ static int __init ebtable_broute_init(void)
 	if (ret < 0)
 		return ret;
 	/* see br_input.c */
-	rcu_assign_pointer(br_should_route_hook, ebt_broute);
+	rcu_assign_pointer(br_should_route_hook, (br_should_route_hook_t *)ebt_broute);
 	return 0;
 }
 


* Re: [PATCH net-next-2.6 V2] igmp: RCU conversion of in_dev->mc_list
From: Américo Wang @ 2010-11-13  6:44 UTC
  To: Eric Dumazet
  Cc: Américo Wang, Cypher Wu, linux-kernel, netdev, David Miller
In-Reply-To: <1289576810.3185.261.camel@edumazet-laptop>

>
>Here is an updated version.
>
>[PATCH net-next-2.6 V2] igmp: RCU conversion of in_dev->mc_list
>
>in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock).
>
>This can easily be converted to RCU protection.
>
>Writers hold RTNL, so mc_list_lock is removed, not replaced by a
>spinlock.

Ah, this saves a lot of work.

>
>Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>Cc: Cypher Wu <cypher.w@gmail.com>
>Cc: Américo Wang <xiyou.wangcong@gmail.com>

I just took a quick look; it looks good to me.

Thanks!
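The pattern in the quoted changelog (lock-free readers, writers already serialized by RTNL, hence no separate list lock) can be sketched in userspace. Assumptions: a pthread mutex stands in for RTNL, C11 acquire/release atomics stand in for rcu_dereference()/rcu_assign_pointer(), and struct mc_entry, mc_add() and mc_contains() are hypothetical simplifications of in_dev->mc_list. Removal is deliberately omitted: freeing an entry safely would also require waiting for an RCU grace period (synchronize_rcu() or call_rcu() in the kernel).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified multicast-list entry. */
struct mc_entry {
	unsigned int addr;
	struct mc_entry *_Atomic next;
};

/* List head; readers load it without taking any lock. */
static struct mc_entry *_Atomic mc_list = NULL;

/* Stand-in for RTNL: the one lock that already serializes all writers,
 * which is why no dedicated list lock is needed. */
static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: fully initialize the entry, link it in front of the current
 * head, then publish it with a release store (rcu_assign_pointer()). */
static int mc_add(unsigned int addr)
{
	struct mc_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->addr = addr;
	pthread_mutex_lock(&writer_lock);
	atomic_store_explicit(&e->next,
			      atomic_load_explicit(&mc_list, memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&mc_list, e, memory_order_release);
	pthread_mutex_unlock(&writer_lock);
	return 0;
}

/* Reader: walk the list with acquire loads (rcu_dereference()), no lock. */
static int mc_contains(unsigned int addr)
{
	struct mc_entry *e;

	for (e = atomic_load_explicit(&mc_list, memory_order_acquire); e;
	     e = atomic_load_explicit(&e->next, memory_order_acquire))
		if (e->addr == addr)
			return 1;
	return 0;
}
```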


* Re: Kernel rwlock design, Multicore and IGMP
From: Américo Wang @ 2010-11-13  6:35 UTC
  To: Cypher Wu
  Cc: Américo Wang, Yong Zhang, Eric Dumazet, linux-kernel, netdev
In-Reply-To: <AANLkTi=0Z1zVqCvYwKiQ8cUZoeD_rLgZ+C07GuFgA7E7@mail.gmail.com>

On Fri, Nov 12, 2010 at 07:06:47PM +0800, Cypher Wu wrote:
>>
>> Note that Tile uses a slightly different algorithm.
>>
>
>It seems that rwlock on x86 and tile have different behavior: x86 uses
>RW_LOCK_BIAS. read_lock() tests whether the lock is 0; if so,
>read_lock() has to spin, otherwise it decrements the lock.
>write_lock() first checks whether the lock is RW_LOCK_BIAS; if so, it
>sets the lock to 0 and continues, otherwise it spins.
>I'm not very familiar with the x86 architecture, but the code seems to
>work that way.

No, they should be the same; sorry, I made a mistake in my reply above.

Although Tile's implementation uses shifts while x86 uses inc/dec,
the idea is the same: either writers use the higher bits and readers
the lower bits, or vice versa.

-- 
Live like a child, think like the god.
 


* Re: Kernel rwlock design, Multicore and IGMP
From: Américo Wang @ 2010-11-13  6:28 UTC
  To: Yong Zhang
  Cc: Américo Wang, Eric Dumazet, Cypher Wu, linux-kernel, netdev
In-Reply-To: <20101112130017.GA9752@zhy>

On Fri, Nov 12, 2010 at 09:00:17PM +0800, Yong Zhang wrote:
>On Fri, Nov 12, 2010 at 05:18:18PM +0800, Américo Wang wrote:
>> On Fri, Nov 12, 2010 at 05:09:45PM +0800, Yong Zhang wrote:
>> >On Fri, Nov 12, 2010 at 4:19 PM, Américo Wang <xiyou.wangcong@gmail.com> wrote:
>> >> On Fri, Nov 12, 2010 at 08:27:54AM +0100, Eric Dumazet wrote:
>> >>>On Friday 12 November 2010 at 15:13 +0800, Américo Wang wrote:
>> >>>> On Fri, Nov 12, 2010 at 11:32:59AM +0800, Cypher Wu wrote:
>> >>>> >On Thu, Nov 11, 2010 at 11:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >>>> >> On Thursday 11 November 2010 at 21:49 +0800, Cypher Wu wrote:
>> >>>> >>
>> >>>> >> Hi
>> >>>> >>
>> >>>> >> CC netdev, since you ask questions about network stuff _and_ rwlock
>> >>>> >>
>> >>>> >>
>> >>>> >>> I'm using TILEPro, and its in-kernel rwlock is a little different
>> >>>> >>> from other platforms'. It gives priority to write locks: once a
>> >>>> >>> write lock is attempted, it blocks subsequent read locks even if
>> >>>> >>> the read lock is held by others. Its code can be read in Linux
>> >>>> >>> Kernel 2.6.36 in arch/tile/lib/spinlock_32.c.
>> >>>> >>
>> >>>> >> This seems like a bug to me.
>> >>>> >>
>> >>>> >> read_lock() can be nested. We used such a scheme in the past in
>> >>>> >> iptables (it can re-enter itself), and we switched to a spinlock()
>> >>>> >> instead, after many discussions on lkml and with Linus himself, if
>> >>>> >> I remember correctly.
>> >>>> >>
>> >>>> >Whether read_lock() can be nested or not does not seem to be the
>> >>>> >problem, since an rwlock has no 'owner'. The question is whether
>> >>>> >write_lock() should get priority over read_lock(), since a lot of
>> >>>> >read_lock()s will starve write_lock().
>> >>>> >We should work out a well-defined behavior so that every
>> >>>> >platform-dependent raw_rwlock is designed under that principle.
>> >>>>
>> >>>
>> >>>AFAIK, Lockdep allows read_lock() to be nested.
>> >>>
>> >>>> It is a known weakness of rwlock; it is designed like that. :)
>> >>>>
>> >>>
>> >>>Agreed.
>> >>>
>> >>
>> >> Just for the record, both Tile and x86 implement rwlock with a write bias;
>> >> this somewhat reduces the write-starvation problem.
>> >
>> >Are you sure (on x86)?
>> >
>> >It seems that we never implemented a writer-biased rwlock.
>> >
>> 
>> Try
>> 
>> % grep RW_LOCK_BIAS -nr arch/x86
>> 
>> *And* read the code to see how it works. :)
>
>If read_lock()/write_lock() fails, the subtracted value (1 for
>read_lock() and RW_LOCK_BIAS for write_lock()) is added back.
>So readers and writers contend on the same lock fairly.
>
>An RW_LOCK_BIAS-based rwlock is a variant of the signed-test
>rwlock, so it works the same way as a highest-bit-set
>rwlock.
>
>It seems you were misled by its name (RW_LOCK_BIAS). :)

Ah, no, my mistake: I thought the initial value of the
rwlock was something like 0, but clearly it is RW_LOCK_BIAS.
Yes, then there is certainly no bias toward writers, and x86
must be using almost the same algorithm as Tile.
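The counter scheme Yong Zhang describes can be sketched as trylock operations on C11 atomics. This illustrates the idea under stated assumptions and is not the x86 code: BIAS and the bias_* names are hypothetical, and where the real lock spins and retries, this sketch simply returns false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bias; any value larger than the maximum reader count works. */
#define BIAS 0x01000000

/* The counter starts at BIAS (unlocked). A reader takes 1, a writer BIAS. */
typedef struct {
	atomic_int count;
} bias_rwlock;

static void bias_rwlock_init(bias_rwlock *l)
{
	assert(l);
	atomic_init(&l->count, BIAS);
}

/* Reader: subtract 1. A negative result (old value <= 0) means a writer
 * holds the lock, so undo the subtraction and fail; a real lock would
 * spin and retry here. */
static bool bias_read_trylock(bias_rwlock *l)
{
	if (atomic_fetch_sub_explicit(&l->count, 1, memory_order_acquire) > 0)
		return true;
	atomic_fetch_add_explicit(&l->count, 1, memory_order_relaxed);
	return false;
}

/* Writer: subtract BIAS. Only a result of exactly 0 (old value == BIAS)
 * means there were no readers and no writer; otherwise undo and fail. */
static bool bias_write_trylock(bias_rwlock *l)
{
	if (atomic_fetch_sub_explicit(&l->count, BIAS, memory_order_acquire) == BIAS)
		return true;
	atomic_fetch_add_explicit(&l->count, BIAS, memory_order_relaxed);
	return false;
}

static void bias_read_unlock(bias_rwlock *l)
{
	atomic_fetch_add_explicit(&l->count, 1, memory_order_release);
}

static void bias_write_unlock(bias_rwlock *l)
{
	atomic_fetch_add_explicit(&l->count, BIAS, memory_order_release);
}
```

Because a failed writer undoes its subtraction and leaves no trace in the counter, new readers can keep acquiring the lock while it waits; that is why this layout gives no preference to writers, and why a steady stream of readers can starve one.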

-- 
Live like a child, think like the god.
 


* [PATCH] net: use the macros defined for the members of flowi
From: Changli Gao @ 2010-11-13  4:43 UTC
  To: David S. Miller; +Cc: netdev, Changli Gao

Use the macros defined for the members of flowi to clean the code up.
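The macros work because each one expands to a nested member path, and C99 designated initializers accept such paths, so .fl4_dst = dst becomes .nl_u.ip4_u.daddr = dst. The sketch below uses a simplified mock of struct flowi: member names follow the diff, but the types and layout are assumptions rather than the real include/net/flow.h, and make_flow() is a hypothetical helper mirroring the ip_route_connect() hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified mock of struct flowi; member names follow the diff, but the
 * types and layout are assumptions, not the real include/net/flow.h. */
struct flowi {
	int oif;
	union {
		struct {
			uint32_t daddr, saddr;
			uint8_t tos;
		} ip4_u;
		struct {
			uint16_t daddr, saddr;
		} dn_u;
	} nl_u;
	uint8_t proto;
	union {
		struct {
			uint16_t sport, dport;
		} ports;
		struct {
			uint8_t type, code;
		} icmpt;
	} uli_u;
};

/* Shorthand macros in the style of include/net/flow.h: each expands to a
 * nested member path. */
#define fl4_dst      nl_u.ip4_u.daddr
#define fl4_src      nl_u.ip4_u.saddr
#define fl4_tos      nl_u.ip4_u.tos
#define fl_ip_sport  uli_u.ports.sport
#define fl_ip_dport  uli_u.ports.dport

/* The shorthand also works for ordinary member access. */
_Static_assert(sizeof(((struct flowi *)0)->fl4_dst) == sizeof(uint32_t),
	       "fl4_dst names the IPv4 destination address");

/* Hypothetical helper: one flat list of designated initializers instead
 * of nested .nl_u/.uli_u braces. */
static struct flowi make_flow(int oif, uint32_t dst, uint32_t src,
			      uint8_t tos, uint8_t proto,
			      uint16_t sport, uint16_t dport)
{
	struct flowi fl = { .oif = oif,
			    .fl4_dst = dst,
			    .fl4_src = src,
			    .fl4_tos = tos,
			    .proto = proto,
			    .fl_ip_sport = sport,
			    .fl_ip_dport = dport };
	return fl;
}
```

Members not named in the initializer are zeroed, which is why the br_netfilter hunk can also drop the explicit .saddr = 0 and .proto = 0.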

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/route.h             |   12 ++++-------
 net/atm/clip.c                  |    3 +-
 net/bridge/br_netfilter.c       |    9 +-------
 net/dccp/ipv4.c                 |   13 ++++--------
 net/decnet/dn_route.c           |   22 ++++++++------------
 net/decnet/dn_rules.c           |    2 -
 net/ipv4/af_inet.c              |   18 ++++------------
 net/ipv4/arp.c                  |   12 +++++------
 net/ipv4/fib_frontend.c         |   28 +++++++-------------------
 net/ipv4/fib_semantics.c        |    8 +------
 net/ipv4/icmp.c                 |   28 ++++++++------------------
 net/ipv4/igmp.c                 |    8 ++-----
 net/ipv4/inet_connection_sock.c |   15 +++++--------
 net/ipv4/ip_gre.c               |   30 ++++++++-------------------
 net/ipv4/ip_output.c            |   25 +++++++++--------------
 net/ipv4/ipip.c                 |   20 +++++-------------
 net/ipv4/ipmr.c                 |   18 ++++------------
 net/ipv4/netfilter.c            |    8 +++----
 net/ipv4/raw.c                  |    7 ++----
 net/ipv4/route.c                |   43 +++++++++++++++-------------------------
 net/ipv4/syncookies.c           |   15 +++++--------
 net/ipv4/udp.c                  |   12 ++++-------
 net/ipv4/xfrm4_policy.c         |    8 +------
 net/ipv6/ip6mr.c                |    4 ---
 net/ipv6/netfilter.c            |    6 +----
 net/ipv6/route.c                |   24 +++++-----------------
 net/ipv6/sit.c                  |   14 +++++--------
 net/l2tp/l2tp_ip.c              |   12 ++++-------
 net/netfilter/ipvs/ip_vs_ctl.c  |    6 +----
 net/netfilter/ipvs/ip_vs_xmit.c |   34 +++++++------------------------
 net/netfilter/xt_TEE.c          |   12 +++++------
 net/rxrpc/ar-peer.c             |   10 ++++-----
 32 files changed, 171 insertions(+), 315 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 5cd46d1..b8c1f77 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -169,14 +169,12 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 {
 	struct flowi fl = { .oif = oif,
 			    .mark = sk->sk_mark,
-			    .nl_u = { .ip4_u = { .daddr = dst,
-						 .saddr = src,
-						 .tos   = tos } },
+			    .fl4_dst = dst,
+			    .fl4_src = src,
+			    .fl4_tos = tos,
 			    .proto = protocol,
-			    .uli_u = { .ports =
-				       { .sport = sport,
-					 .dport = dport } } };
-
+			    .fl_ip_sport = sport,
+			    .fl_ip_dport = dport };
 	int err;
 	struct net *net = sock_net(sk);
 
diff --git a/net/atm/clip.c b/net/atm/clip.c
index ff956d1..d257da5 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -502,7 +502,8 @@ static int clip_setentry(struct atm_vcc *vcc, __be32 ip)
 	struct atmarp_entry *entry;
 	int error;
 	struct clip_vcc *clip_vcc;
-	struct flowi fl = { .nl_u = { .ip4_u = { .daddr = ip, .tos = 1}} };
+	struct flowi fl = { .fl4_dst = ip,
+			    .fl4_tos = 1 };
 	struct rtable *rt;
 
 	if (vcc->push != clip_push) {
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 865fd76..36cd0b7 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -412,13 +412,8 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 	if (dnat_took_place(skb)) {
 		if ((err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos, dev))) {
 			struct flowi fl = {
-				.nl_u = {
-					.ip4_u = {
-						 .daddr = iph->daddr,
-						 .saddr = 0,
-						 .tos = RT_TOS(iph->tos) },
-				},
-				.proto = 0,
+				.fl4_dst = iph->daddr,
+				.fl4_tos = RT_TOS(iph->tos),
 			};
 			struct in_device *in_dev = __in_dev_get_rcu(dev);
 
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 3f69ea1..45a434f 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -462,15 +462,12 @@ static struct dst_entry* dccp_v4_route_skb(struct net *net, struct sock *sk,
 {
 	struct rtable *rt;
 	struct flowi fl = { .oif = skb_rtable(skb)->rt_iif,
-			    .nl_u = { .ip4_u =
-				      { .daddr = ip_hdr(skb)->saddr,
-					.saddr = ip_hdr(skb)->daddr,
-					.tos = RT_CONN_FLAGS(sk) } },
+			    .fl4_dst = ip_hdr(skb)->saddr,
+			    .fl4_src = ip_hdr(skb)->daddr,
+			    .fl4_tos = RT_CONN_FLAGS(sk),
 			    .proto = sk->sk_protocol,
-			    .uli_u = { .ports =
-				       { .sport = dccp_hdr(skb)->dccph_dport,
-					 .dport = dccp_hdr(skb)->dccph_sport }
-				     }
+			    .fl_ip_sport = dccp_hdr(skb)->dccph_dport,
+			    .fl_ip_dport = dccp_hdr(skb)->dccph_sport
 			  };
 
 	security_skb_classify_flow(skb, &fl);
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 474d54d..8280e43 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -271,10 +271,10 @@ static void dn_dst_link_failure(struct sk_buff *skb)
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 {
-	return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) |
-		(fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) |
+	return ((fl1->fld_dst ^ fl2->fld_dst) |
+		(fl1->fld_src ^ fl2->fld_src) |
 		(fl1->mark ^ fl2->mark) |
-		(fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |
+		(fl1->fld_scope ^ fl2->fld_scope) |
 		(fl1->oif ^ fl2->oif) |
 		(fl1->iif ^ fl2->iif)) == 0;
 }
@@ -882,11 +882,9 @@ static inline __le16 dn_fib_rules_map_destination(__le16 daddr, struct dn_fib_re
 
 static int dn_route_output_slow(struct dst_entry **pprt, const struct flowi *oldflp, int try_hard)
 {
-	struct flowi fl = { .nl_u = { .dn_u =
-				      { .daddr = oldflp->fld_dst,
-					.saddr = oldflp->fld_src,
-					.scope = RT_SCOPE_UNIVERSE,
-				     } },
+	struct flowi fl = { .fld_dst = oldflp->fld_dst,
+			    .fld_src = oldflp->fld_src,
+			    .fld_scope = RT_SCOPE_UNIVERSE,
 			    .mark = oldflp->mark,
 			    .iif = init_net.loopback_dev->ifindex,
 			    .oif = oldflp->oif };
@@ -1230,11 +1228,9 @@ static int dn_route_input_slow(struct sk_buff *skb)
 	int flags = 0;
 	__le16 gateway = 0;
 	__le16 local_src = 0;
-	struct flowi fl = { .nl_u = { .dn_u =
-				     { .daddr = cb->dst,
-				       .saddr = cb->src,
-				       .scope = RT_SCOPE_UNIVERSE,
-				    } },
+	struct flowi fl = { .fld_dst = cb->dst,
+			    .fld_src = cb->src,
+			    .fld_scope = RT_SCOPE_UNIVERSE,
 			    .mark = skb->mark,
 			    .iif = skb->dev->ifindex };
 	struct dn_fib_res res = { .fi = NULL, .type = RTN_UNREACHABLE };
diff --git a/net/decnet/dn_rules.c b/net/decnet/dn_rules.c
index 48fdf10..6eb91df 100644
--- a/net/decnet/dn_rules.c
+++ b/net/decnet/dn_rules.c
@@ -175,7 +175,7 @@ static int dn_fib_rule_compare(struct fib_rule *rule, struct fib_rule_hdr *frh,
 
 unsigned dnet_addr_type(__le16 addr)
 {
-	struct flowi fl = { .nl_u = { .dn_u = { .daddr = addr } } };
+	struct flowi fl = { .fld_dst = addr };
 	struct dn_fib_res res;
 	unsigned ret = RTN_UNICAST;
 	struct dn_fib_table *tb = dn_fib_get_table(RT_TABLE_LOCAL, 0);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f581f77..f2b6110 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1148,21 +1148,13 @@ int inet_sk_rebuild_header(struct sock *sk)
 	struct flowi fl = {
 		.oif = sk->sk_bound_dev_if,
 		.mark = sk->sk_mark,
-		.nl_u = {
-			.ip4_u = {
-				.daddr	= daddr,
-				.saddr	= inet->inet_saddr,
-				.tos	= RT_CONN_FLAGS(sk),
-			},
-		},
+		.fl4_dst = daddr,
+		.fl4_src = inet->inet_saddr,
+		.fl4_tos = RT_CONN_FLAGS(sk),
 		.proto = sk->sk_protocol,
 		.flags = inet_sk_flowi_flags(sk),
-		.uli_u = {
-			.ports = {
-				.sport = inet->inet_sport,
-				.dport = inet->inet_dport,
-			},
-		},
+		.fl_ip_sport = inet->inet_sport,
+		.fl_ip_dport = inet->inet_dport,
 	};
 
 	security_sk_classify_flow(sk, &fl);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index d8e540c..b564b76 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -433,8 +433,8 @@ static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
 
 static int arp_filter(__be32 sip, __be32 tip, struct net_device *dev)
 {
-	struct flowi fl = { .nl_u = { .ip4_u = { .daddr = sip,
-						 .saddr = tip } } };
+	struct flowi fl = { .fl4_dst = sip,
+			    .fl4_src = tip };
 	struct rtable *rt;
 	int flag = 0;
 	/*unsigned long now; */
@@ -1061,8 +1061,8 @@ static int arp_req_set(struct net *net, struct arpreq *r,
 	if (r->arp_flags & ATF_PERM)
 		r->arp_flags |= ATF_COM;
 	if (dev == NULL) {
-		struct flowi fl = { .nl_u.ip4_u = { .daddr = ip,
-						    .tos = RTO_ONLINK } };
+		struct flowi fl = { .fl4_dst = ip,
+				    .fl4_tos = RTO_ONLINK };
 		struct rtable *rt;
 		err = ip_route_output_key(net, &rt, &fl);
 		if (err != 0)
@@ -1169,8 +1169,8 @@ static int arp_req_delete(struct net *net, struct arpreq *r,
 
 	ip = ((struct sockaddr_in *)&r->arp_pa)->sin_addr.s_addr;
 	if (dev == NULL) {
-		struct flowi fl = { .nl_u.ip4_u = { .daddr = ip,
-						    .tos = RTO_ONLINK } };
+		struct flowi fl = { .fl4_dst = ip,
+				    .fl4_tos = RTO_ONLINK };
 		struct rtable *rt;
 		err = ip_route_output_key(net, &rt, &fl);
 		if (err != 0)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index eb6f69a..d3a1112 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -158,11 +158,7 @@ static void fib_flush(struct net *net)
 struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 {
 	struct flowi fl = {
-		.nl_u = {
-			.ip4_u = {
-				.daddr = addr
-			}
-		},
+		.fl4_dst = addr,
 		.flags = FLOWI_FLAG_MATCH_ANY_IIF
 	};
 	struct fib_result res = { 0 };
@@ -193,7 +189,7 @@ static inline unsigned __inet_dev_addr_type(struct net *net,
 					    const struct net_device *dev,
 					    __be32 addr)
 {
-	struct flowi		fl = { .nl_u = { .ip4_u = { .daddr = addr } } };
+	struct flowi		fl = { .fl4_dst = addr };
 	struct fib_result	res;
 	unsigned ret = RTN_BROADCAST;
 	struct fib_table *local_table;
@@ -247,13 +243,9 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 {
 	struct in_device *in_dev;
 	struct flowi fl = {
-		.nl_u = {
-			.ip4_u = {
-				.daddr = src,
-				.saddr = dst,
-				.tos = tos
-			}
-		},
+		.fl4_dst = src,
+		.fl4_src = dst,
+		.fl4_tos = tos,
 		.mark = mark,
 		.iif = oif
 	};
@@ -853,13 +845,9 @@ static void nl_fib_lookup(struct fib_result_nl *frn, struct fib_table *tb)
 	struct fib_result       res;
 	struct flowi            fl = {
 		.mark = frn->fl_mark,
-		.nl_u = {
-			.ip4_u = {
-				.daddr = frn->fl_addr,
-				.tos = frn->fl_tos,
-				.scope = frn->fl_scope
-			}
-		}
+		.fl4_dst = frn->fl_addr,
+		.fl4_tos = frn->fl_tos,
+		.fl4_scope = frn->fl_scope,
 	};
 
 #ifdef CONFIG_IP_MULTIPLE_TABLES
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 3e0da3e..12d3dc3 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -563,12 +563,8 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 		rcu_read_lock();
 		{
 			struct flowi fl = {
-				.nl_u = {
-					.ip4_u = {
-						.daddr = nh->nh_gw,
-						.scope = cfg->fc_scope + 1,
-					},
-				},
+				.fl4_dst = nh->nh_gw,
+				.fl4_scope = cfg->fc_scope + 1,
 				.oif = nh->nh_oif,
 			};
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c6e2aff..4daebd1 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -386,10 +386,9 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 			daddr = icmp_param->replyopts.faddr;
 	}
 	{
-		struct flowi fl = { .nl_u = { .ip4_u =
-					      { .daddr = daddr,
-						.saddr = rt->rt_spec_dst,
-						.tos = RT_TOS(ip_hdr(skb)->tos) } },
+		struct flowi fl = { .fl4_dst = daddr,
+				    .fl4_src = rt->rt_spec_dst,
+				    .fl4_tos = RT_TOS(ip_hdr(skb)->tos),
 				    .proto = IPPROTO_ICMP };
 		security_skb_classify_flow(skb, &fl);
 		if (ip_route_output_key(net, &rt, &fl))
@@ -542,22 +541,13 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 
 	{
 		struct flowi fl = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = icmp_param.replyopts.srr ?
-						icmp_param.replyopts.faddr :
-						iph->saddr,
-					.saddr = saddr,
-					.tos = RT_TOS(tos)
-				}
-			},
+			.fl4_dst = icmp_param.replyopts.srr ?
+				   icmp_param.replyopts.faddr : iph->saddr,
+			.fl4_src = saddr,
+			.fl4_tos = RT_TOS(tos),
 			.proto = IPPROTO_ICMP,
-			.uli_u = {
-				.icmpt = {
-					.type = type,
-					.code = code
-				}
-			}
+			.fl_icmp_type = type,
+			.fl_icmp_code = code,
 		};
 		int err;
 		struct rtable *rt2;
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 08d0d81..606f92c 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -298,8 +298,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, int size)
 
 	{
 		struct flowi fl = { .oif = dev->ifindex,
-				    .nl_u = { .ip4_u = {
-				    .daddr = IGMPV3_ALL_MCR } },
+				    .fl4_dst = IGMPV3_ALL_MCR,
 				    .proto = IPPROTO_IGMP };
 		if (ip_route_output_key(net, &rt, &fl)) {
 			kfree_skb(skb);
@@ -644,7 +643,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
 
 	{
 		struct flowi fl = { .oif = dev->ifindex,
-				    .nl_u = { .ip4_u = { .daddr = dst } },
+				    .fl4_dst = dst,
 				    .proto = IPPROTO_IGMP };
 		if (ip_route_output_key(net, &rt, &fl))
 			return -1;
@@ -1421,8 +1420,7 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
 /* RTNL is locked */
 static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
 {
-	struct flowi fl = { .nl_u = { .ip4_u =
-				      { .daddr = imr->imr_multiaddr.s_addr } } };
+	struct flowi fl = { .fl4_dst = imr->imr_multiaddr.s_addr };
 	struct rtable *rt;
 	struct net_device *dev = NULL;
 	struct in_device *idev = NULL;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 7174370..06f5f8f 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -358,17 +358,14 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
 	struct ip_options *opt = inet_rsk(req)->opt;
 	struct flowi fl = { .oif = sk->sk_bound_dev_if,
 			    .mark = sk->sk_mark,
-			    .nl_u = { .ip4_u =
-				      { .daddr = ((opt && opt->srr) ?
-						  opt->faddr :
-						  ireq->rmt_addr),
-					.saddr = ireq->loc_addr,
-					.tos = RT_CONN_FLAGS(sk) } },
+			    .fl4_dst = ((opt && opt->srr) ?
+					  opt->faddr : ireq->rmt_addr),
+			    .fl4_src = ireq->loc_addr,
+			    .fl4_tos = RT_CONN_FLAGS(sk),
 			    .proto = sk->sk_protocol,
 			    .flags = inet_sk_flowi_flags(sk),
-			    .uli_u = { .ports =
-				       { .sport = inet_sk(sk)->inet_sport,
-					 .dport = ireq->rmt_port } } };
+			    .fl_ip_sport = inet_sk(sk)->inet_sport,
+			    .fl_ip_dport = ireq->rmt_port };
 	struct net *net = sock_net(sk);
 
 	security_req_classify_flow(req, &fl);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index cab2057..a2e9cfd 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -772,13 +772,9 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev
 	{
 		struct flowi fl = {
 			.oif = tunnel->parms.link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = dst,
-					.saddr = tiph->saddr,
-					.tos = RT_TOS(tos)
-				}
-			},
+			.fl4_dst = dst,
+			.fl4_src = tiph->saddr,
+			.fl4_tos = RT_TOS(tos),
 			.proto = IPPROTO_GRE
 		}
 ;
@@ -951,13 +947,9 @@ static int ipgre_tunnel_bind_dev(struct net_device *dev)
 	if (iph->daddr) {
 		struct flowi fl = {
 			.oif = tunnel->parms.link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = iph->daddr,
-					.saddr = iph->saddr,
-					.tos = RT_TOS(iph->tos)
-				}
-			},
+			.fl4_dst = iph->daddr,
+			.fl4_src = iph->saddr,
+			.fl4_tos = RT_TOS(iph->tos),
 			.proto = IPPROTO_GRE
 		};
 		struct rtable *rt;
@@ -1216,13 +1208,9 @@ static int ipgre_open(struct net_device *dev)
 	if (ipv4_is_multicast(t->parms.iph.daddr)) {
 		struct flowi fl = {
 			.oif = t->parms.link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = t->parms.iph.daddr,
-					.saddr = t->parms.iph.saddr,
-					.tos = RT_TOS(t->parms.iph.tos)
-				}
-			},
+			.fl4_dst = t->parms.iph.daddr,
+			.fl4_src = t->parms.iph.saddr,
+			.fl4_tos = RT_TOS(t->parms.iph.tos),
 			.proto = IPPROTO_GRE
 		};
 		struct rtable *rt;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..5090c7f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -341,15 +341,13 @@ int ip_queue_xmit(struct sk_buff *skb)
 		{
 			struct flowi fl = { .oif = sk->sk_bound_dev_if,
 					    .mark = sk->sk_mark,
-					    .nl_u = { .ip4_u =
-						      { .daddr = daddr,
-							.saddr = inet->inet_saddr,
-							.tos = RT_CONN_FLAGS(sk) } },
+					    .fl4_dst = daddr,
+					    .fl4_src = inet->inet_saddr,
+					    .fl4_tos = RT_CONN_FLAGS(sk),
 					    .proto = sk->sk_protocol,
 					    .flags = inet_sk_flowi_flags(sk),
-					    .uli_u = { .ports =
-						       { .sport = inet->inet_sport,
-							 .dport = inet->inet_dport } } };
+					    .fl_ip_sport = inet->inet_sport,
+					    .fl_ip_dport = inet->inet_dport };
 
 			/* If this fails, retransmit mechanism of transport layer will
 			 * keep trying until route appears or the connection times
@@ -1404,14 +1402,11 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 
 	{
 		struct flowi fl = { .oif = arg->bound_dev_if,
-				    .nl_u = { .ip4_u =
-					      { .daddr = daddr,
-						.saddr = rt->rt_spec_dst,
-						.tos = RT_TOS(ip_hdr(skb)->tos) } },
-				    /* Not quite clean, but right. */
-				    .uli_u = { .ports =
-					       { .sport = tcp_hdr(skb)->dest,
-						 .dport = tcp_hdr(skb)->source } },
+				    .fl4_dst = daddr,
+				    .fl4_src = rt->rt_spec_dst,
+				    .fl4_tos = RT_TOS(ip_hdr(skb)->tos),
+				    .fl_ip_sport = tcp_hdr(skb)->dest,
+				    .fl_ip_dport = tcp_hdr(skb)->source,
 				    .proto = sk->sk_protocol,
 				    .flags = ip_reply_arg_flowi_flags(arg) };
 		security_skb_classify_flow(skb, &fl);
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index cd300aa..e70ad58 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -463,13 +463,9 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	{
 		struct flowi fl = {
 			.oif = tunnel->parms.link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = dst,
-					.saddr = tiph->saddr,
-					.tos = RT_TOS(tos)
-				}
-			},
+			.fl4_dst = dst,
+			.fl4_src = tiph->saddr,
+			.fl4_tos = RT_TOS(tos),
 			.proto = IPPROTO_IPIP
 		};
 
@@ -589,13 +585,9 @@ static void ipip_tunnel_bind_dev(struct net_device *dev)
 	if (iph->daddr) {
 		struct flowi fl = {
 			.oif = tunnel->parms.link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = iph->daddr,
-					.saddr = iph->saddr,
-					.tos = RT_TOS(iph->tos)
-				}
-			},
+			.fl4_dst = iph->daddr,
+			.fl4_src = iph->saddr,
+			.fl4_tos = RT_TOS(iph->tos),
 			.proto = IPPROTO_IPIP
 		};
 		struct rtable *rt;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ef2b008..92aaa3d 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1537,13 +1537,9 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 	if (vif->flags & VIFF_TUNNEL) {
 		struct flowi fl = {
 			.oif = vif->link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = vif->remote,
-					.saddr = vif->local,
-					.tos = RT_TOS(iph->tos)
-				}
-			},
+			.fl4_dst = vif->remote,
+			.fl4_src = vif->local,
+			.fl4_tos = RT_TOS(iph->tos),
 			.proto = IPPROTO_IPIP
 		};
 
@@ -1553,12 +1549,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 	} else {
 		struct flowi fl = {
 			.oif = vif->link,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = iph->daddr,
-					.tos = RT_TOS(iph->tos)
-				}
-			},
+			.fl4_dst = iph->daddr,
+			.fl4_tos = RT_TOS(iph->tos),
 			.proto = IPPROTO_IPIP
 		};
 
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index d88a46c..994a1f2 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -31,10 +31,10 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type)
 	 * packets with foreign saddr to appear on the NF_INET_LOCAL_OUT hook.
 	 */
 	if (addr_type == RTN_LOCAL) {
-		fl.nl_u.ip4_u.daddr = iph->daddr;
+		fl.fl4_dst = iph->daddr;
 		if (type == RTN_LOCAL)
-			fl.nl_u.ip4_u.saddr = iph->saddr;
-		fl.nl_u.ip4_u.tos = RT_TOS(iph->tos);
+			fl.fl4_src = iph->saddr;
+		fl.fl4_tos = RT_TOS(iph->tos);
 		fl.oif = skb->sk ? skb->sk->sk_bound_dev_if : 0;
 		fl.mark = skb->mark;
 		fl.flags = skb->sk ? inet_sk_flowi_flags(skb->sk) : 0;
@@ -47,7 +47,7 @@ int ip_route_me_harder(struct sk_buff *skb, unsigned addr_type)
 	} else {
 		/* non-local src, find valid iif to satisfy
 		 * rp-filter when calling ip_route_input. */
-		fl.nl_u.ip4_u.daddr = iph->saddr;
+		fl.fl4_dst = iph->saddr;
 		if (ip_route_output_key(net, &rt, &fl) != 0)
 			return -1;
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1f85ef2..a3d5ab7 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -549,10 +549,9 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	{
 		struct flowi fl = { .oif = ipc.oif,
 				    .mark = sk->sk_mark,
-				    .nl_u = { .ip4_u =
-					      { .daddr = daddr,
-						.saddr = saddr,
-						.tos = tos } },
+				    .fl4_dst = daddr,
+				    .fl4_src = saddr,
+				    .fl4_tos = tos,
 				    .proto = inet->hdrincl ? IPPROTO_RAW :
 							     sk->sk_protocol,
 				  };
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 66610ea..ec2333f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -684,17 +684,17 @@ static inline bool rt_caching(const struct net *net)
 static inline bool compare_hash_inputs(const struct flowi *fl1,
 					const struct flowi *fl2)
 {
-	return ((((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
-		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
+	return ((((__force u32)fl1->fl4_dst ^ (__force u32)fl2->fl4_dst) |
+		((__force u32)fl1->fl4_src ^ (__force u32)fl2->fl4_src) |
 		(fl1->iif ^ fl2->iif)) == 0);
 }
 
 static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 {
-	return (((__force u32)fl1->nl_u.ip4_u.daddr ^ (__force u32)fl2->nl_u.ip4_u.daddr) |
-		((__force u32)fl1->nl_u.ip4_u.saddr ^ (__force u32)fl2->nl_u.ip4_u.saddr) |
+	return (((__force u32)fl1->fl4_dst ^ (__force u32)fl2->fl4_dst) |
+		((__force u32)fl1->fl4_src ^ (__force u32)fl2->fl4_src) |
 		(fl1->mark ^ fl2->mark) |
-		(*(u16 *)&fl1->nl_u.ip4_u.tos ^ *(u16 *)&fl2->nl_u.ip4_u.tos) |
+		(*(u16 *)&fl1->fl4_tos ^ *(u16 *)&fl2->fl4_tos) |
 		(fl1->oif ^ fl2->oif) |
 		(fl1->iif ^ fl2->iif)) == 0;
 }
@@ -2089,12 +2089,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 {
 	struct fib_result res;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
-	struct flowi fl = { .nl_u = { .ip4_u =
-				      { .daddr = daddr,
-					.saddr = saddr,
-					.tos = tos,
-					.scope = RT_SCOPE_UNIVERSE,
-				      } },
+	struct flowi fl = { .fl4_dst	= daddr,
+			    .fl4_src	= saddr,
+			    .fl4_tos	= tos,
+			    .fl4_scope	= RT_SCOPE_UNIVERSE,
 			    .mark = skb->mark,
 			    .iif = dev->ifindex };
 	unsigned	flags = 0;
@@ -2480,14 +2478,11 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 				const struct flowi *oldflp)
 {
 	u32 tos	= RT_FL_TOS(oldflp);
-	struct flowi fl = { .nl_u = { .ip4_u =
-				      { .daddr = oldflp->fl4_dst,
-					.saddr = oldflp->fl4_src,
-					.tos = tos & IPTOS_RT_MASK,
-					.scope = ((tos & RTO_ONLINK) ?
-						  RT_SCOPE_LINK :
-						  RT_SCOPE_UNIVERSE),
-				      } },
+	struct flowi fl = { .fl4_dst = oldflp->fl4_dst,
+			    .fl4_src = oldflp->fl4_src,
+			    .fl4_tos = tos & IPTOS_RT_MASK,
+			    .fl4_scope = ((tos & RTO_ONLINK) ?
+					  RT_SCOPE_LINK : RT_SCOPE_UNIVERSE),
 			    .mark = oldflp->mark,
 			    .iif = net->loopback_dev->ifindex,
 			    .oif = oldflp->oif };
@@ -2944,13 +2939,9 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr* nlh, void
 			err = -rt->dst.error;
 	} else {
 		struct flowi fl = {
-			.nl_u = {
-				.ip4_u = {
-					.daddr = dst,
-					.saddr = src,
-					.tos = rtm->rtm_tos,
-				},
-			},
+			.fl4_dst = dst,
+			.fl4_src = src,
+			.fl4_tos = rtm->rtm_tos,
 			.oif = tb[RTA_OIF] ? nla_get_u32(tb[RTA_OIF]) : 0,
 			.mark = mark,
 		};
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 650cace..4751920 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -346,17 +346,14 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	 */
 	{
 		struct flowi fl = { .mark = sk->sk_mark,
-				    .nl_u = { .ip4_u =
-					      { .daddr = ((opt && opt->srr) ?
-							  opt->faddr :
-							  ireq->rmt_addr),
-						.saddr = ireq->loc_addr,
-						.tos = RT_CONN_FLAGS(sk) } },
+				    .fl4_dst = ((opt && opt->srr) ?
+						opt->faddr : ireq->rmt_addr),
+				    .fl4_src = ireq->loc_addr,
+				    .fl4_tos = RT_CONN_FLAGS(sk),
 				    .proto = IPPROTO_TCP,
 				    .flags = inet_sk_flowi_flags(sk),
-				    .uli_u = { .ports =
-					       { .sport = th->dest,
-						 .dport = th->source } } };
+				    .fl_ip_sport = th->dest,
+				    .fl_ip_dport = th->source };
 		security_req_classify_flow(req, &fl);
 		if (ip_route_output_key(sock_net(sk), &rt, &fl)) {
 			reqsk_free(req);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 28cb2d7..803887f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -890,15 +890,13 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	if (rt == NULL) {
 		struct flowi fl = { .oif = ipc.oif,
 				    .mark = sk->sk_mark,
-				    .nl_u = { .ip4_u =
-					      { .daddr = faddr,
-						.saddr = saddr,
-						.tos = tos } },
+				    .fl4_dst = faddr,
+				    .fl4_src = saddr,
+				    .fl4_tos = tos,
 				    .proto = sk->sk_protocol,
 				    .flags = inet_sk_flowi_flags(sk),
-				    .uli_u = { .ports =
-					       { .sport = inet->inet_sport,
-						 .dport = dport } } };
+				    .fl_ip_sport = inet->inet_sport,
+				    .fl_ip_dport = dport };
 		struct net *net = sock_net(sk);
 
 		security_sk_classify_flow(sk, &fl);
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index dd1fd8c..b9e28b9 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -22,12 +22,8 @@ static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
 					  xfrm_address_t *daddr)
 {
 	struct flowi fl = {
-		.nl_u = {
-			.ip4_u = {
-				.tos = tos,
-				.daddr = daddr->a4,
-			},
-		},
+		.fl4_dst = daddr->a4,
+		.fl4_tos = tos,
 	};
 	struct dst_entry *dst;
 	struct rtable *rt;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 6f32ffc..9fab274 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1843,9 +1843,7 @@ static int ip6mr_forward2(struct net *net, struct mr6_table *mrt,
 
 	fl = (struct flowi) {
 		.oif = vif->link,
-		.nl_u = { .ip6_u =
-				{ .daddr = ipv6h->daddr, }
-		}
+		.fl6_dst = ipv6h->daddr,
 	};
 
 	dst = ip6_route_output(net, NULL, &fl);
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 7155b24..35915e8 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -18,10 +18,8 @@ int ip6_route_me_harder(struct sk_buff *skb)
 	struct flowi fl = {
 		.oif = skb->sk ? skb->sk->sk_bound_dev_if : 0,
 		.mark = skb->mark,
-		.nl_u =
-		{ .ip6_u =
-		  { .daddr = iph->daddr,
-		    .saddr = iph->saddr, } },
+		.fl6_dst = iph->daddr,
+		.fl6_src = iph->saddr,
 	};
 
 	dst = ip6_route_output(net, skb->sk, &fl);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index fc32833..7763663 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -558,11 +558,7 @@ struct rt6_info *rt6_lookup(struct net *net, const struct in6_addr *daddr,
 {
 	struct flowi fl = {
 		.oif = oif,
-		.nl_u = {
-			.ip6_u = {
-				.daddr = *daddr,
-			},
-		},
+		.fl6_dst = *daddr,
 	};
 	struct dst_entry *dst;
 	int flags = strict ? RT6_LOOKUP_F_IFACE : 0;
@@ -778,13 +774,9 @@ void ip6_route_input(struct sk_buff *skb)
 	int flags = RT6_LOOKUP_F_HAS_SADDR;
 	struct flowi fl = {
 		.iif = skb->dev->ifindex,
-		.nl_u = {
-			.ip6_u = {
-				.daddr = iph->daddr,
-				.saddr = iph->saddr,
-				.flowlabel = (* (__be32 *) iph)&IPV6_FLOWINFO_MASK,
-			},
-		},
+		.fl6_dst = iph->daddr,
+		.fl6_src = iph->saddr,
+		.fl6_flowlabel = (* (__be32 *) iph)&IPV6_FLOWINFO_MASK,
 		.mark = skb->mark,
 		.proto = iph->nexthdr,
 	};
@@ -1463,12 +1455,8 @@ static struct rt6_info *ip6_route_redirect(struct in6_addr *dest,
 	struct ip6rd_flowi rdfl = {
 		.fl = {
 			.oif = dev->ifindex,
-			.nl_u = {
-				.ip6_u = {
-					.daddr = *dest,
-					.saddr = *src,
-				},
-			},
+			.fl6_dst = *dest,
+			.fl6_src = *src,
 		},
 	};
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index d6bfaec..6e48a80 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -730,10 +730,9 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	}
 
 	{
-		struct flowi fl = { .nl_u = { .ip4_u =
-					      { .daddr = dst,
-						.saddr = tiph->saddr,
-						.tos = RT_TOS(tos) } },
+		struct flowi fl = { .fl4_dst = dst,
+				    .fl4_src = tiph->saddr,
+				    .fl4_tos = RT_TOS(tos),
 				    .oif = tunnel->parms.link,
 				    .proto = IPPROTO_IPV6 };
 		if (ip_route_output_key(dev_net(dev), &rt, &fl)) {
@@ -855,10 +854,9 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	iph = &tunnel->parms.iph;
 
 	if (iph->daddr) {
-		struct flowi fl = { .nl_u = { .ip4_u =
-					      { .daddr = iph->daddr,
-						.saddr = iph->saddr,
-						.tos = RT_TOS(iph->tos) } },
+		struct flowi fl = { .fl4_dst = iph->daddr,
+				    .fl4_src = iph->saddr,
+				    .fl4_tos = RT_TOS(iph->tos),
 				    .oif = tunnel->parms.link,
 				    .proto = IPPROTO_IPV6 };
 		struct rtable *rt;
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 0bf6a59..04635e8 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -476,15 +476,13 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 
 		{
 			struct flowi fl = { .oif = sk->sk_bound_dev_if,
-					    .nl_u = { .ip4_u = {
-							.daddr = daddr,
-							.saddr = inet->inet_saddr,
-							.tos = RT_CONN_FLAGS(sk) } },
+					    .fl4_dst = daddr,
+					    .fl4_src = inet->inet_saddr,
+					    .fl4_tos = RT_CONN_FLAGS(sk),
 					    .proto = sk->sk_protocol,
 					    .flags = inet_sk_flowi_flags(sk),
-					    .uli_u = { .ports = {
-							 .sport = inet->inet_sport,
-							 .dport = inet->inet_dport } } };
+					    .fl_ip_sport = inet->inet_sport,
+					    .fl_ip_dport = inet->inet_dport };
 
 			/* If this fails, retransmit mechanism of transport layer will
 			 * keep trying until route appears or the connection times
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5f5daa3..c6f2936 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -110,10 +110,8 @@ static int __ip_vs_addr_is_local_v6(const struct in6_addr *addr)
 	struct rt6_info *rt;
 	struct flowi fl = {
 		.oif = 0,
-		.nl_u = {
-			.ip6_u = {
-				.daddr = *addr,
-				.saddr = { .s6_addr32 = {0, 0, 0, 0} }, } },
+		.fl6_dst = *addr,
+		.fl6_src = { .s6_addr32 = {0, 0, 0, 0} },
 	};
 
 	rt = (struct rt6_info *)ip6_route_output(&init_net, NULL, &fl);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 10bd39c..5325a3f 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -96,12 +96,8 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 		if (!(rt = (struct rtable *)
 		      __ip_vs_dst_check(dest, rtos))) {
 			struct flowi fl = {
-				.oif = 0,
-				.nl_u = {
-					.ip4_u = {
-						.daddr = dest->addr.ip,
-						.saddr = 0,
-						.tos = rtos, } },
+				.fl4_dst = dest->addr.ip,
+				.fl4_tos = rtos,
 			};
 
 			if (ip_route_output_key(net, &rt, &fl)) {
@@ -118,12 +114,8 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 		spin_unlock(&dest->dst_lock);
 	} else {
 		struct flowi fl = {
-			.oif = 0,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = daddr,
-					.saddr = 0,
-					.tos = rtos, } },
+			.fl4_dst = daddr,
+			.fl4_tos = rtos,
 		};
 
 		if (ip_route_output_key(net, &rt, &fl)) {
@@ -178,14 +170,9 @@ __ip_vs_reroute_locally(struct sk_buff *skb)
 		refdst_drop(orefdst);
 	} else {
 		struct flowi fl = {
-			.oif = 0,
-			.nl_u = {
-				.ip4_u = {
-					.daddr = iph->daddr,
-					.saddr = iph->saddr,
-					.tos = RT_TOS(iph->tos),
-				}
-			},
+			.fl4_dst = iph->daddr,
+			.fl4_src = iph->saddr,
+			.fl4_tos = RT_TOS(iph->tos),
 			.mark = skb->mark,
 		};
 		struct rtable *rt;
@@ -216,12 +203,7 @@ __ip_vs_route_output_v6(struct net *net, struct in6_addr *daddr,
 {
 	struct dst_entry *dst;
 	struct flowi fl = {
-		.oif = 0,
-		.nl_u = {
-			.ip6_u = {
-				.daddr = *daddr,
-			},
-		},
+		.fl6_dst = *daddr,
 	};
 
 	dst = ip6_route_output(net, NULL, &fl);
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 22a2d42..231867d 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -70,9 +70,9 @@ tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
 			return false;
 		fl.oif = info->priv->oif;
 	}
-	fl.nl_u.ip4_u.daddr = info->gw.ip;
-	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
-	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
+	fl.fl4_dst = info->gw.ip;
+	fl.fl4_tos = RT_TOS(iph->tos);
+	fl.fl4_scope = RT_SCOPE_UNIVERSE;
 	if (ip_route_output_key(net, &rt, &fl) != 0)
 		return false;
 
@@ -150,9 +150,9 @@ tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info)
 			return false;
 		fl.oif = info->priv->oif;
 	}
-	fl.nl_u.ip6_u.daddr = info->gw.in6;
-	fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
-				  (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
+	fl.fl6_dst = info->gw.in6;
+	fl.fl6_flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
+			   (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
 	dst = ip6_route_output(net, NULL, &fl);
 	if (dst == NULL)
 		return false;
diff --git a/net/rxrpc/ar-peer.c b/net/rxrpc/ar-peer.c
index 9f1729b..a53fb25 100644
--- a/net/rxrpc/ar-peer.c
+++ b/net/rxrpc/ar-peer.c
@@ -47,12 +47,12 @@ static void rxrpc_assess_MTU_size(struct rxrpc_peer *peer)
 	case AF_INET:
 		fl.oif = 0;
 		fl.proto = IPPROTO_UDP,
-		fl.nl_u.ip4_u.saddr = 0;
-		fl.nl_u.ip4_u.daddr = peer->srx.transport.sin.sin_addr.s_addr;
-		fl.nl_u.ip4_u.tos = 0;
+		fl.fl4_dst = peer->srx.transport.sin.sin_addr.s_addr;
+		fl.fl4_src = 0;
+		fl.fl4_tos = 0;
 		/* assume AFS.CM talking to AFS.FS */
-		fl.uli_u.ports.sport = htons(7001);
-		fl.uli_u.ports.dport = htons(7000);
+		fl.fl_ip_sport = htons(7001);
+		fl.fl_ip_dport = htons(7000);
 		break;
 	default:
 		BUG();


* Re: possible kernel oops from user MSS
From: David Miller @ 2010-11-12 23:26 UTC
  To: mzhang; +Cc: netdev
In-Reply-To: <4CDDC6EE.2010005@mvista.com>

From: Min Zhang <mzhang@mvista.com>
Date: Fri, 12 Nov 2010 14:59:58 -0800

> Regarding commit 7a1abd08d52fdeddb3e9a5a33f2f15cc6a5674d2 ("tcp:
> Increase TCP_MAXSEG socket option minimum"): why is the TCP_MAXSEG
> minimum 64? Shouldn't the exact minimum be 40, which is
> TCPOLEN_MD5SIG_ALIGNED(20) + TCPOLEN_TSTAMP_ALIGNED(12) + 8?
> 
> Or is it better to use TCP_MIN_MSS from tcp.h:
> 
> /* Minimal accepted MSS. It is (60+60+8) - (20+20). */
> #define TCP_MIN_MSS        88U

I suppose TCP_MIN_MSS would be better to use, I'll make that
change, thanks.
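The arithmetic in this exchange can be checked mechanically. The following hypothetical userspace sketch encodes both candidate floors as compile-time checks and shows how a setsockopt-style validator might reject values below TCP_MIN_MSS. sketch_validate_user_mss() and SKETCH_MAX_MSS are illustrative names invented here, not kernel code, and the reading of the tcp.h comment given in the code is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Aligned option lengths quoted in the thread, in bytes. */
#define TCPOLEN_MD5SIG_ALIGNED	20
#define TCPOLEN_TSTAMP_ALIGNED	12

/* The floor proposed in the question: MD5 + timestamp options + 8. */
#define PROPOSED_MIN_MSS \
	(TCPOLEN_MD5SIG_ALIGNED + TCPOLEN_TSTAMP_ALIGNED + 8)

/* Value quoted from tcp.h. Assumed reading of its comment: a maximal
 * IPv4 header (60) plus a maximal TCP header (60) plus 8 payload bytes,
 * minus the fixed 20-byte IPv4 and TCP headers. */
#define TCP_MIN_MSS	88U

/* Hypothetical upper bound chosen for this sketch only. */
#define SKETCH_MAX_MSS	32767U

static_assert(PROPOSED_MIN_MSS == 40, "20 + 12 + 8");
static_assert((60 + 60 + 8) - (20 + 20) == TCP_MIN_MSS, "128 - 40");

/* Hypothetical validator: accept a user-supplied TCP_MAXSEG value only
 * if it lies in [TCP_MIN_MSS, SKETCH_MAX_MSS]. Returns 0 or -EINVAL. */
static int sketch_validate_user_mss(int val)
{
	if (val < (int)TCP_MIN_MSS || val > (int)SKETCH_MAX_MSS)
		return -EINVAL;
	return 0;
}
```

With this check, both 40 and the old floor of 64 are rejected, while 88 and a typical Ethernet MSS of 1460 are accepted.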

* RE: [RFC PATCH] network: return errors if we know tcp_connect failed
From: Hua Zhong @ 2010-11-12 23:14 UTC
  To: 'Patrick McHardy'
  Cc: 'Eric Paris', netdev, linux-kernel, davem, kuznet, pekkas,
	jmorris, yoshfuji
In-Reply-To: <4CDCEE65.3060105@trash.net>

> On 11.11.2010 22:58, Hua Zhong wrote:
> >> Yes, I realize this is little different than if the
> >> SYN was dropped in the first network device, but it is different
> >> because we know what happened!  We know that connect() call failed
> >> and that there isn't anything coming back.
> >
> > I would argue that -j DROP should behave exactly as if the packet
> > were dropped in the network, while -j REJECT should signal the
> > failure to the application as soon as possible (which it doesn't
> > seem to do).
> 
> It sends an ICMP error or TCP reset. Interpretation is up to TCP.

Huh? It's the OUTPUT chain we are talking about. There is no ICMP error or
TCP reset.

* Re: [PATCH] r8169: fix checksum broken
From: Francois Romieu @ 2010-11-12 23:13 UTC
  To: Shan Wei; +Cc: netdev@vger.kernel.org, David Miller
In-Reply-To: <20101112224746.GA6676@electric-eye.fr.zoreil.com>

Francois Romieu <romieu@fr.zoreil.com> :
[...]
> Which kind of device do you use : PCI-E 8168 / 810x or PCI 8169 ?

Wrong page. Forget it.

Acked-by: Francois Romieu <romieu@fr.zoreil.com>

-- 
Ueimor
