Netdev List
* Re: [net-next PATCH 2/3] dcb: add DCBX mode to event notifier attributes
From: David Miller @ 2011-10-06 19:50 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: netdev, gospo
In-Reply-To: <20111006185238.2781.36917.stgit@jf-dev1-dcblab>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Thu, 06 Oct 2011 11:52:38 -0700

> Add DCBX mode to event notifiers so listeners can learn
> the currently enabled mode.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied.

* Re: [net-next PATCH 3/3] dcb: Add stub routines for !CONFIG_DCB
From: David Miller @ 2011-10-06 19:50 UTC (permalink / raw)
  To: john.r.fastabend; +Cc: netdev, gospo
In-Reply-To: <20111006185243.2781.67932.stgit@jf-dev1-dcblab>

From: John Fastabend <john.r.fastabend@intel.com>
Date: Thu, 06 Oct 2011 11:52:44 -0700

> To avoid ifdefs in the other code that supports DCB notifiers,
> add stub routines. This method seems popular in other net code,
> for example 8021Q.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Applied.
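
The stub approach described in the quoted commit message can be
sketched in plain C. This is only an illustration under stated
assumptions: CONFIG_FEATURE_X, struct notifier, register_x_notifier()
and call_x_notifiers() are hypothetical names, not the DCB API, and
the stub return values are an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier type, standing in for a kernel notifier block. */
struct notifier {
	int (*call)(unsigned long event, void *data);
};

#ifdef CONFIG_FEATURE_X
/* Real implementations live elsewhere when the feature is built in. */
int register_x_notifier(struct notifier *nb);
int call_x_notifiers(unsigned long event, void *data);
#else
/* Feature compiled out: stubs succeed and do nothing. */
static inline int register_x_notifier(struct notifier *nb)
{
	(void)nb;
	return 0;
}

static inline int call_x_notifiers(unsigned long event, void *data)
{
	(void)event;
	(void)data;
	return 0;
}
#endif

/* The caller needs no #ifdef of its own. */
static int driver_setup(struct notifier *nb)
{
	int err;

	assert(nb != NULL);
	err = register_x_notifier(nb);
	if (err)
		return err;
	return call_x_notifiers(1UL, NULL);
}
```

With the feature disabled, driver_setup() compiles unchanged and the
compiler can drop the empty calls entirely.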

* Re: [PATCH] Break up the single NBD lock into one per NBD device
From: Eric Dumazet @ 2011-10-06 19:53 UTC (permalink / raw)
  To: H.K. Jerry Chu; +Cc: davem, netdev
In-Reply-To: <1317080052-6052-1-git-send-email-hkchu@google.com>

On Monday, 26 September 2011 at 16:34 -0700, H.K. Jerry Chu wrote:
> From: Jerry Chu <hkchu@google.com>
> 
> This patch breaks up the single NBD lock into one per
> disk. The single NBD lock has become a serious performance
> bottleneck when multiple NBD disks are being used.
> 
> The original comment on why a single lock may be ok no
> longer holds for today's much faster NICs.
> 
> Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
> ---
>  drivers/block/nbd.c |   22 +++++++++-------------
>  1 files changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index f533f33..355e15c 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -58,20 +58,9 @@ static unsigned int debugflags;
>  
>  static unsigned int nbds_max = 16;
>  static struct nbd_device *nbd_dev;
> +static spinlock_t *nbd_locks;

static spinlock_t *nbd_locks __read_mostly;

>  static int max_part;
>  
> -/*
> - * Use just one lock (or at most 1 per NIC). Two arguments for this:
> - * 1. Each NIC is essentially a synchronization point for all servers
> - *    accessed through that NIC so there's no need to have more locks
> - *    than NICs anyway.
> - * 2. More locks lead to more "Dirty cache line bouncing" which will slow
> - *    down each lock to the point where they're actually slower than just
> - *    a single lock.
> - * Thanks go to Jens Axboe and Al Viro for their LKML emails explaining this!
> - */
> -static DEFINE_SPINLOCK(nbd_lock);
> -
>  #ifndef NDEBUG
>  static const char *ioctl_cmd_to_ascii(int cmd)
>  {
> @@ -753,6 +742,12 @@ static int __init nbd_init(void)
>  	if (!nbd_dev)
>  		return -ENOMEM;
>  
> +	nbd_locks = kcalloc(nbds_max, sizeof(*nbd_locks), GFP_KERNEL);
> +	if (!nbd_locks) {
> +		kfree(nbd_dev);
> +		return -ENOMEM;
> +	}
> +

	Please add a loop to initialize the spinlocks, to help LOCKDEP...

	for (i = 0; i < nbds_max; i++)
		spin_lock_init(&nbd_locks[i]);

>  	part_shift = 0;
>  	if (max_part > 0) {
>  		part_shift = fls(max_part);
> @@ -784,7 +779,7 @@ static int __init nbd_init(void)
>  		 * every gendisk to have its very own request_queue struct.
>  		 * These structs are big so we dynamically allocate them.
>  		 */
> -		disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);
> +		disk->queue = blk_init_queue(do_nbd_request, &nbd_locks[i]);
>  		if (!disk->queue) {
>  			put_disk(disk);
>  			goto out;
> @@ -832,6 +827,7 @@ out:
>  		put_disk(nbd_dev[i].disk);
>  	}
>  	kfree(nbd_dev);
> +	kfree(nbd_locks);
>  	return err;
>  }
>  
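
The per-device lock array and the explicit init loop requested above
can be modelled outside the kernel. What follows is a minimal
userspace sketch, assuming C11 atomic_flag as a stand-in for
spinlock_t; struct dev_lock, dev_locks_init() and the other names are
hypothetical and are not part of nbd.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's spinlock_t. */
struct dev_lock {
	atomic_flag held;
};

static struct dev_lock *dev_locks;	/* one lock per device */
static unsigned int dev_max = 16;	/* plays the role of nbds_max */

/* Allocate the lock array and explicitly initialize every element. */
static int dev_locks_init(void)
{
	unsigned int i;

	dev_locks = calloc(dev_max, sizeof(*dev_locks));
	if (!dev_locks)
		return -1;

	/* Zeroed memory alone is not a guaranteed "unlocked" state. */
	for (i = 0; i < dev_max; i++)
		atomic_flag_clear(&dev_locks[i].held);
	return 0;
}

/* Returns 1 if the lock for device i was taken, 0 if it was busy. */
static int dev_trylock(unsigned int i)
{
	assert(i < dev_max);
	return !atomic_flag_test_and_set(&dev_locks[i].held);
}

static void dev_unlock(unsigned int i)
{
	assert(i < dev_max);
	atomic_flag_clear(&dev_locks[i].held);
}

static void dev_locks_exit(void)
{
	free(dev_locks);
	dev_locks = NULL;
}
```

In this model, taking the lock of device 0 does not block device 1,
which is the point of splitting the single lock.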

* radvd 1.8.2 released
From: Reuben Hawkins @ 2011-10-06 19:47 UTC (permalink / raw)
  To: radvd-announce-l, radvd Development Discussion, netdev

Hi,

I've just posted a new release of radvd which includes a few bug fixes
and security enhancements.  Please update immediately.

Thanks,
Reuben

* Re: [PATCH] net: fix typos in Documentation/networking/scaling.txt
From: David Miller @ 2011-10-06 20:00 UTC (permalink / raw)
  To: benjamin.poirier; +Cc: netdev, linux-doc, willemb
In-Reply-To: <1317736830-4442-1-git-send-email-benjamin.poirier@gmail.com>

From: Benjamin Poirier <benjamin.poirier@gmail.com>
Date: Tue,  4 Oct 2011 10:00:30 -0400

> The second hunk fixes rps_sock_flow_table but has to re-wrap the paragraph.
> 
> Signed-off-by: Benjamin Poirier <benjamin.poirier@gmail.com>

Applied, thanks.

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: David Miller @ 2011-10-06 20:00 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: mirqus, netdev, jdmason
In-Reply-To: <1317916822.21903.252.camel@zakaz.uk.xensource.com>

From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Thu, 6 Oct 2011 17:00:22 +0100

> Here it is. David, if you want N separate patches (or a git pull
> request?) let me know.

No, this is fine, applied.

Thanks.

* [GIT] Networking
From: David Miller @ 2011-10-06 20:23 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) MD5 signature pool leak fix in TCP, from Zheng Yan.

2) Fix carrier state regression in bridging, from Stephen Hemminger.

3) Unicast forwards between macvtap interfaces should use
   dev_forward_skb() not the vlan->forward() method.  Fix from David
   Ward.

4) TCP's lost_cnt_hint is updated one too many times in some situations,
   fix from Zheng Yan.

5) netfilter needs to use rwlock_init(), from Thomas Gleixner.

Please pull, thanks a lot!

The following changes since commit 6367f1775ebb66b0f0e9e3512159f3257a6fde0e:

  Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dm (2011-10-06 08:31:47 -0700)

are available in the git repository at:

  git://github.com/davem330/net.git master

Benjamin Poirier (1):
      net: fix typos in Documentation/networking/scaling.txt

David Ward (1):
      macvlan/macvtap: Fix unicast between macvtap interfaces in bridge mode

Thomas Gleixner (1):
      netfilter: Use proper rwlock init function

Yan, Zheng (2):
      tcp: properly handle md5sig_pool references
      tcp: properly update lost_cnt_hint during shifting

stephen hemminger (1):
      bridge: leave carrier on for empty bridge

 Documentation/networking/scaling.txt |   10 +++++-----
 drivers/net/macvlan.c                |    2 +-
 net/bridge/br_device.c               |    3 ---
 net/ipv4/tcp_input.c                 |    4 +---
 net/ipv4/tcp_ipv4.c                  |   11 +++++++----
 net/ipv6/tcp_ipv6.c                  |    8 +++++---
 net/netfilter/ipvs/ip_vs_ctl.c       |    2 +-
 7 files changed, 20 insertions(+), 20 deletions(-)

* [PATCH net-next] macvlan: handle fragmented multicast frames
From: Eric Dumazet @ 2011-10-06 20:28 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <4E8CDB9B.6010900@candelatech.com>

On Wednesday, 05 October 2011 at 15:35 -0700, Ben Greear wrote:

> If someone wants to cook up macvlan-ip-defrag patch I'll be happy
> to test it.  But, as far as I can tell, this problem can happen on
> any two interfaces.  The reason that some of mine work (.1q vlans)
> and macvlan didn't is probably because those were separated by
> some virtual network links that imparted extra delay...so the
> vlan consumed all its fragments and passed the complete pkt up
> the stack before the mac-vlan ever saw the initial frame.
> 
> With this in mind, it seems that using multiple udp multicast
> sockets bound to specific devices is fundamentally broken for
> fragmented packets.
> 
> I have no pressing need for this feature, so now that I better understand
> the problem I can just document it and move on to other things.
> 
> Thanks for all the help.
> 

Please test the following patch (note that I had no time to test it, sorry!)

Based on the net-next tree; it might also apply to a 3.0 kernel...

[PATCH net-next] macvlan: handle fragmented multicast frames

Fragmented multicast frames are delivered to a single macvlan port,
because the IP defrag logic considers the other copies redundant.

Implement a defrag step before trying to send the multicast frame.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/macvlan.c  |    3 +++
 include/net/ip.h       |    9 +++++++++
 net/ipv4/ip_fragment.c |   36 ++++++++++++++++++++++++++++++++++++
 net/packet/af_packet.c |   39 +--------------------------------------

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index b100c90..40366eb 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -169,6 +169,9 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 
 	port = macvlan_port_get_rcu(skb->dev);
 	if (is_multicast_ether_addr(eth->h_dest)) {
+		skb = ip_check_defrag(skb, IP_DEFRAG_MACVLAN);
+		if (!skb)
+			return RX_HANDLER_CONSUMED;
 		src = macvlan_hash_lookup(port, eth->h_source);
 		if (!src)
 			/* frame comes from an external address */
diff --git a/include/net/ip.h b/include/net/ip.h
index aa76c7a..c7e066a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -406,9 +406,18 @@ enum ip_defrag_users {
 	IP_DEFRAG_VS_OUT,
 	IP_DEFRAG_VS_FWD,
 	IP_DEFRAG_AF_PACKET,
+	IP_DEFRAG_MACVLAN,
 };
 
 int ip_defrag(struct sk_buff *skb, u32 user);
+#ifdef CONFIG_INET
+struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user);
+#else
+static inline struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
+{
+	return skb;
+}
+#endif
 int ip_frag_mem(struct net *net);
 int ip_frag_nqueues(struct net *net);
 
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 0e0ab98..763589a 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -682,6 +682,42 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 }
 EXPORT_SYMBOL(ip_defrag);
 
+struct sk_buff *ip_check_defrag(struct sk_buff *skb, u32 user)
+{
+	const struct iphdr *iph;
+	u32 len;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return skb;
+
+	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
+		return skb;
+
+	iph = ip_hdr(skb);
+	if (iph->ihl < 5 || iph->version != 4)
+		return skb;
+	if (!pskb_may_pull(skb, iph->ihl*4))
+		return skb;
+	iph = ip_hdr(skb);
+	len = ntohs(iph->tot_len);
+	if (skb->len < len || len < (iph->ihl * 4))
+		return skb;
+
+	if (ip_is_fragment(ip_hdr(skb))) {
+		skb = skb_share_check(skb, GFP_ATOMIC);
+		if (skb) {
+			if (pskb_trim_rcsum(skb, len))
+				return skb;
+			memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+			if (ip_defrag(skb, user))
+				return NULL;
+			skb->rxhash = 0;
+		}
+	}
+	return skb;
+}
+EXPORT_SYMBOL(ip_check_defrag);
+
 #ifdef CONFIG_SYSCTL
 static int zero;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 25e68f5..ff9eed7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1213,43 +1213,6 @@ static struct sock *fanout_demux_cpu(struct packet_fanout *f, struct sk_buff *sk
 	return f->arr[cpu % num];
 }
 
-static struct sk_buff *fanout_check_defrag(struct sk_buff *skb)
-{
-#ifdef CONFIG_INET
-	const struct iphdr *iph;
-	u32 len;
-
-	if (skb->protocol != htons(ETH_P_IP))
-		return skb;
-
-	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
-		return skb;
-
-	iph = ip_hdr(skb);
-	if (iph->ihl < 5 || iph->version != 4)
-		return skb;
-	if (!pskb_may_pull(skb, iph->ihl*4))
-		return skb;
-	iph = ip_hdr(skb);
-	len = ntohs(iph->tot_len);
-	if (skb->len < len || len < (iph->ihl * 4))
-		return skb;
-
-	if (ip_is_fragment(ip_hdr(skb))) {
-		skb = skb_share_check(skb, GFP_ATOMIC);
-		if (skb) {
-			if (pskb_trim_rcsum(skb, len))
-				return skb;
-			memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
-			if (ip_defrag(skb, IP_DEFRAG_AF_PACKET))
-				return NULL;
-			skb->rxhash = 0;
-		}
-	}
-#endif
-	return skb;
-}
-
 static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 			     struct packet_type *pt, struct net_device *orig_dev)
 {
@@ -1268,7 +1231,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	case PACKET_FANOUT_HASH:
 	default:
 		if (f->defrag) {
-			skb = fanout_check_defrag(skb);
+			skb = ip_check_defrag(skb, IP_DEFRAG_AF_PACKET);
 			if (!skb)
 				return 0;
 		}
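
The header checks that ip_check_defrag() performs before handing a
frame to ip_defrag() can be tried in userspace. This is a rough
sketch, assuming a raw byte buffer instead of an skb;
ipv4_needs_defrag() and the IPV4_* constants are hypothetical names,
and the pull, trim and reassembly steps of the patch are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MF		0x2000	/* "more fragments" flag */
#define IPV4_OFFSET	0x1fff	/* fragment offset mask */

/*
 * Returns 1 if buf starts with a sane IPv4 header that belongs to a
 * fragment, 0 otherwise (not IPv4, malformed, or not fragmented).
 */
static int ipv4_needs_defrag(const uint8_t *buf, size_t buflen)
{
	unsigned int version, ihl;
	uint16_t tot_len, frag_off;

	assert(buf != NULL);
	if (buflen < 20)		/* shorter than a minimal header */
		return 0;

	version = buf[0] >> 4;
	ihl = buf[0] & 0x0f;
	if (ihl < 5 || version != 4)
		return 0;
	if (buflen < ihl * 4u)		/* header with options must fit */
		return 0;

	tot_len = (uint16_t)((buf[2] << 8) | buf[3]);
	if (buflen < tot_len || tot_len < ihl * 4u)
		return 0;

	/* A fragment has MF set or a nonzero fragment offset. */
	frag_off = (uint16_t)((buf[6] << 8) | buf[7]);
	return (frag_off & (IPV4_MF | IPV4_OFFSET)) != 0;
}
```

As in the patch, anything that fails validation is simply reported as
"no defrag needed" so the caller can pass it on untouched.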

* Re: IPv4 multicast and mac-vlans acting weird on 3.0.4+
From: Eric Dumazet @ 2011-10-06 20:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <1317850603.3457.21.camel@edumazet-laptop>

On Wednesday, 05 October 2011 at 23:36 +0200, Eric Dumazet wrote:
> On Wednesday, 05 October 2011 at 13:56 -0700, Ben Greear wrote:
> 
> > Wouldn't you have the same problem with two real Ethernet interfaces on
> > the same LAN, or two 802.1Q devices for that matter?  The addrs will all
> > be the same in that case too?
> > 
> 
> Usually multicast is coupled with routing.
> 
> A JOIN message from your app won't be sent on all interfaces...
> 
> But yes, we might have a similar issue with regular vlans.
> 
> Probably nobody noticed yet. Just say no to fragments :)
> 
> > Also, if I have just a single mac-vlan active (the other 3 are 'ifconfig foo down'),
> > I still see the problem with mcast.
> > 
> 
> That's another bug: macvlan doesn't test IFF_UP on broadcasts, only for
> unicast messages. Please test the following patch.
> 
> >  From what you describe, I am thinking I may be hitting a different
> > issue.  Any ideas on how to figure out why exactly the NF_HOOK isn't
> > calling the ip_rcv_finish method?
> > 
> 
> Really I believe I tried to explain the thing already...
> 
> ip_local_deliver() -> ip_defrag() :
> 
> 
> [PATCH] macvlan: dont send frames on DOWN devices
> 
> Reported-by: Ben Greear <greearb@candelatech.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index b100c90..94a0282 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -145,7 +145,8 @@ static void macvlan_broadcast(struct sk_buff *skb,
>  		hlist_for_each_entry_rcu(vlan, n, &port->vlan_hash[i], hlist) {
>  			if (vlan->dev == src || !(vlan->mode & mode))
>  				continue;
> -
> +			if (!(vlan->dev->flags & IFF_UP))
> +				continue;
>  			nskb = skb_clone(skb, GFP_ATOMIC);
>  			err = macvlan_broadcast_one(nskb, vlan, eth,
>  					 mode == MACVLAN_MODE_BRIDGE);
> 

This one is not needed.

When a port is down, it's not in the vlan_hash[] table anymore.

(Not sure why we perform the IFF_UP test for unicast frames.)

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Michał Mirosław @ 2011-10-06 20:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: David Miller, netdev@vger.kernel.org, Jon Mason
In-Reply-To: <1317916822.21903.252.camel@zakaz.uk.xensource.com>

On 6 October 2011 18:00, Ian Campbell
<Ian.Campbell@citrix.com> wrote:
> On Thu, 2011-10-06 at 08:05 +0100, Ian Campbell wrote:
>> On Wed, 2011-10-05 at 22:03 +0100, Michał Mirosław wrote:
>> > 2011/10/5 Ian Campbell <ian.campbell@citrix.com>:
>> > [...]
>> > > --- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
>> > > +++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
>> > > @@ -923,9 +923,9 @@ vxge_xmit(struct sk_buff *skb, struct net_device *dev)
>> > >                if (!frag->size)
>> > >                        continue;
>> > >
>> > > -               dma_pointer = (u64) pci_map_page(fifo->pdev, frag->page,
>> > > -                               frag->page_offset, frag->size,
>> > > -                               PCI_DMA_TODEVICE);
>> > > +               dma_pointer = (u64)skb_frag_dma_map(&fifo->pdev->dev, frag,
>> > > +                                                   0, frag->size,
>> > > +                                                   PCI_DMA_TODEVICE);
>> >
>> > This should be DMA_TO_DEVICE instead of PCI_DMA_TODEVICE.
>> > >                if (unlikely(pci_dma_mapping_error(fifo->pdev, dma_pointer)))
>> > >                        goto _exit2;
>> > I would also change this to dma_mapping_error() in one go.
>> > Just a random patch check.
>> Thanks Michał.
>> I'm sure I must have made the same mistakes in a whole bunch of patches
>> which have already been applied. I'll knock up a fixup patch.
> Here it is. David, if you want N separate patches (or a git pull
> request?) let me know.

There's a catch there, though:

[...]
>                        mapping = skb_frag_dma_map(&tp->pdev->dev, frag, 0,
> -                                                  len, PCI_DMA_TODEVICE);
> +                                                  len, DMA_TO_DEVICE);
>
>                        tnapi->tx_buffers[entry].skb = NULL;
>                        dma_unmap_addr_set(&tnapi->tx_buffers[entry], mapping,
>                                           mapping);
> -                       if (pci_dma_mapping_error(tp->pdev, mapping))
> +                       if (dma_mapping_error(tp->pdev, mapping))

dma_mapping_error() takes struct device *, so those changes should be:

dma_mapping_error(&tp->pdev->dev, mapping)

(Like skb_frag_dma_map()'s first argument).

Best Regards,
Michał Mirosław

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-06 21:08 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, netdev@vger.kernel.org, Jon Mason
In-Reply-To: <CAHXqBFJsGcAdH0zFS_Gd34oR3Ov6ssbXRMzK-KrnSxq8c3=WNg@mail.gmail.com>

On Thu, 2011-10-06 at 21:45 +0100, Michał Mirosław wrote:
> On 6 October 2011 18:00, Ian Campbell
> <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2011-10-06 at 08:05 +0100, Ian Campbell wrote:
> >> On Wed, 2011-10-05 at 22:03 +0100, Michał Mirosław wrote:
> >> > 2011/10/5 Ian Campbell <ian.campbell@citrix.com>:
> >> > [...]
> >> > > --- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
> >> > > +++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
> >> > > @@ -923,9 +923,9 @@ vxge_xmit(struct sk_buff *skb, struct net_device *dev)
> >> > >                if (!frag->size)
> >> > >                        continue;
> >> > >
> >> > > -               dma_pointer = (u64) pci_map_page(fifo->pdev, frag->page,
> >> > > -                               frag->page_offset, frag->size,
> >> > > -                               PCI_DMA_TODEVICE);
> >> > > +               dma_pointer = (u64)skb_frag_dma_map(&fifo->pdev->dev, frag,
> >> > > +                                                   0, frag->size,
> >> > > +                                                   PCI_DMA_TODEVICE);
> >> >
> >> > This should be DMA_TO_DEVICE instead of PCI_DMA_TODEVICE.
> >> > >                if (unlikely(pci_dma_mapping_error(fifo->pdev, dma_pointer)))
> >> > >                        goto _exit2;
> >> > I would also change this to dma_mapping_error() in one go.
> >> > Just a random patch check.
> >> Thanks Michał.
> >> I'm sure I must have made the same mistakes in a whole bunch of patches
> >> which have already been applied. I'll knock up a fixup patch.
> > Here it is. David, if you want N separate patches (or a git pull
> > request?) let me know.
> 
> There's a catch there, though:
> 
> [...]
> >                        mapping = skb_frag_dma_map(&tp->pdev->dev, frag, 0,
> > -                                                  len, PCI_DMA_TODEVICE);
> > +                                                  len, DMA_TO_DEVICE);
> >
> >                        tnapi->tx_buffers[entry].skb = NULL;
> >                        dma_unmap_addr_set(&tnapi->tx_buffers[entry], mapping,
> >                                           mapping);
> > -                       if (pci_dma_mapping_error(tp->pdev, mapping))
> > +                       if (dma_mapping_error(tp->pdev, mapping))
> 
> dma_mapping_error() takes struct device *, so those changes should be:
> 
> dma_mapping_error(&tp->pdev->dev, mapping)
> 
> (Like skb_frag_dma_map()'s first argument).

You are absolutely right, I've no idea how I missed the very obvious
warning this produces. Incremental patch is below, sorry about this!

8<-------------------------------------------------------

From 5be2edc6eec5c66b58f4287f1d3ba3637afa7ad6 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Thu, 6 Oct 2011 22:05:41 +0100
Subject: [PATCH] net: fix argument to dma_mapping_error after conversion to skb_frag_dma_map

The recent conversion from pci_dma_mapping_error to dma_mapping_error missed
the change in the first parameter, which needs to be the struct device *, not
the struct pci_dev *.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/ethernet/broadcom/tg3.c                |    2 +-
 drivers/net/ethernet/marvell/sky2.c                |    4 ++--
 drivers/net/ethernet/pasemi/pasemi_mac.c           |    2 +-
 .../net/ethernet/qlogic/netxen/netxen_nic_main.c   |    2 +-
 drivers/net/ethernet/qlogic/qla3xxx.c              |    2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c   |    2 +-
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |    2 +-
 drivers/net/ethernet/sfc/tx.c                      |    2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 3abcb4d..9dbd1af 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6784,7 +6784,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			tnapi->tx_buffers[entry].skb = NULL;
 			dma_unmap_addr_set(&tnapi->tx_buffers[entry], mapping,
 					   mapping);
-			if (dma_mapping_error(tp->pdev, mapping))
+			if (dma_mapping_error(&tp->pdev->dev, mapping))
 				goto dma_error;
 
 			if (tg3_tx_frag_set(tnapi, &entry, &budget, mapping,
diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 7baff3e..a3ce9b6 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -1231,7 +1231,7 @@ static int sky2_rx_map_skb(struct pci_dev *pdev, struct rx_ring_info *re,
 						    frag->size,
 						    DMA_FROM_DEVICE);
 
-		if (dma_mapping_error(pdev, re->frag_addr[i]))
+		if (dma_mapping_error(&pdev->dev, re->frag_addr[i]))
 			goto map_page_error;
 	}
 	return 0;
@@ -1938,7 +1938,7 @@ static netdev_tx_t sky2_xmit_frame(struct sk_buff *skb,
 		mapping = skb_frag_dma_map(&hw->pdev->dev, frag, 0,
 					   frag->size, DMA_TO_DEVICE);
 
-		if (dma_mapping_error(hw->pdev, mapping))
+		if (dma_mapping_error(&hw->pdev->dev, mapping))
 			goto mapping_unwind;
 
 		upper = upper_32_bits(mapping);
diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index d247030..c6f0056 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1508,7 +1508,7 @@ static int pasemi_mac_start_tx(struct sk_buff *skb, struct net_device *dev)
 		map[i + 1] = skb_frag_dma_map(&mac->dma_pdev->dev, frag, 0,
 					      frag->size, DMA_TO_DEVICE);
 		map_size[i+1] = frag->size;
-		if (dma_mapping_error(mac->dma_pdev, map[i + 1])) {
+		if (dma_mapping_error(&mac->dma_pdev->dev, map[i + 1])) {
 			nfrags = i;
 			goto out_err_nolock;
 		}
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index b061c07..e2ba78b 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1907,7 +1907,7 @@ netxen_map_tx_skb(struct pci_dev *pdev,
 
 		map = skb_frag_dma_map(&pdev->dev, frag, 0, frag->size,
 				       DMA_TO_DEVICE);
-		if (dma_mapping_error(pdev, map))
+		if (dma_mapping_error(&pdev->dev, map))
 			goto unwind;
 
 		nf->dma = map;
diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c b/drivers/net/ethernet/qlogic/qla3xxx.c
index 8932265..46f9b64 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -2391,7 +2391,7 @@ static int ql_send_map(struct ql3_adapter *qdev,
 		map = skb_frag_dma_map(&qdev->pdev->dev, frag, 0, frag->size,
 				       DMA_TO_DEVICE);
 
-		err = dma_mapping_error(qdev->pdev, map);
+		err = dma_mapping_error(&qdev->pdev->dev, map);
 		if (err) {
 			netdev_err(qdev->ndev,
 				   "PCI mapping frags failed with error: %d\n",
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index c9756e7..eac19e7d 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2137,7 +2137,7 @@ qlcnic_map_tx_skb(struct pci_dev *pdev,
 
 		map = skb_frag_dma_map(&pdev->dev, frag, 0, frag->size,
 				       DMA_TO_DEVICE);
-		if (dma_mapping_error(pdev, map))
+		if (dma_mapping_error(&pdev->dev, map))
 			goto unwind;
 
 		nf->dma = map;
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 094ac22..f2d9bb7 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1434,7 +1434,7 @@ static int ql_map_send(struct ql_adapter *qdev,
 		map = skb_frag_dma_map(&qdev->pdev->dev, frag, 0, frag->size,
 				       DMA_TO_DEVICE);
 
-		err = dma_mapping_error(qdev->pdev, map);
+		err = dma_mapping_error(&qdev->pdev->dev, map);
 		if (err) {
 			netif_err(qdev, tx_queued, qdev->ndev,
 				  "PCI mapping frags failed with error: %d.\n",
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 7f47efc..3964a62 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -927,7 +927,7 @@ static int tso_get_fragment(struct tso_state *st, struct efx_nic *efx,
 {
 	st->unmap_addr = skb_frag_dma_map(&efx->pci_dev->dev, frag, 0,
 					  frag->size, DMA_TO_DEVICE);
-	if (likely(!dma_mapping_error(efx->pci_dev, st->unmap_addr))) {
+	if (likely(!dma_mapping_error(&efx->pci_dev->dev, st->unmap_addr))) {
 		st->unmap_single = false;
 		st->unmap_len = frag->size;
 		st->in_len = frag->size;
-- 
1.7.2.5



Ian.

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: David Miller @ 2011-10-06 21:15 UTC (permalink / raw)
  To: mirqus; +Cc: Ian.Campbell, netdev, jdmason
In-Reply-To: <CAHXqBFJsGcAdH0zFS_Gd34oR3Ov6ssbXRMzK-KrnSxq8c3=WNg@mail.gmail.com>

From: Michał Mirosław <mirqus@gmail.com>
Date: Thu, 6 Oct 2011 22:45:57 +0200

> There's a catch there, though:
 ..
> dma_mapping_error() takes struct device *, so those changes should be:
> 
> dma_mapping_error(&tp->pdev->dev, mapping)
> 
> (Like skb_frag_dma_map()'s first argument).

Why don't you take a look at what I committed and pushed out
(hint: I fixed it up when I saw the build warnings)?

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: David Miller @ 2011-10-06 21:16 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: mirqus, netdev, jdmason
In-Reply-To: <1317935310.24742.18.camel@dagon.hellion.org.uk>

From: Ian Campbell <Ian.Campbell@eu.citrix.com>
Date: Thu, 6 Oct 2011 22:08:30 +0100

> You are absolutely right, I've no idea how I missed the very obvious
> warning this produces. Incremental patch is below, sorry about this!

Would really love to know what your patch is against, since I
fixed this problem when I committed your original patch.

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-06 21:19 UTC (permalink / raw)
  To: David Miller; +Cc: mirqus@gmail.com, netdev@vger.kernel.org, jdmason@kudzu.us
In-Reply-To: <20111006.171519.797156146863050743.davem@davemloft.net>

On Thu, 2011-10-06 at 22:15 +0100, David Miller wrote:
> From: Michał Mirosław <mirqus@gmail.com>
> Date: Thu, 6 Oct 2011 22:45:57 +0200
> 
> > There's a catch there, though:
>  ..
> > dma_mapping_error() takes struct device *, so those changes should be:
> > 
> > dma_mapping_error(&tp->pdev->dev, mapping)
> > 
> > (Like skb_frag_dma_map()'s first argument).
> 
> Why don't you take a look at what I committed and pushed out
> (hint: I fixed it up when I saw the build warnings)?

I pulled your tree and it wasn't there yet, and it still isn't. I guess
it's delayed in mirroring?

Ian.

* [PATCH] bridge: fix hang on removal of bridge via netlink
From: Stephen Hemminger @ 2011-10-06 21:19 UTC (permalink / raw)
  To: Sridhar Samudrala, David Miller; +Cc: netdev
In-Reply-To: <1317921532.6433.13.camel@w-sridhar.beaverton.ibm.com>

Need to clean up bridge device timers and ports when a bridge
device is being removed via netlink.

This fixes the problem observed when doing:
 ip link add br0 type bridge
 ip link set dev eth1 master br0
 ip link set br0 up
 ip link del br0

which would cause br0 to hang in unregister_netdev because
of a leftover reference count.

Reported-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Patch is against net-next but should go to net and stable trees
since it is an observable hang on 3.0 and later kernels.

--- a/net/bridge/br_if.c	2011-10-03 11:08:36.304168386 -0700
+++ b/net/bridge/br_if.c	2011-10-06 11:27:47.682488755 -0700
@@ -160,9 +160,10 @@ static void del_nbp(struct net_bridge_po
 	call_rcu(&p->rcu, destroy_nbp_rcu);
 }
 
-/* called with RTNL */
-static void del_br(struct net_bridge *br, struct list_head *head)
+/* Delete bridge device */
+void br_dev_delete(struct net_device *dev, struct list_head *head)
 {
+	struct net_bridge *br = netdev_priv(dev);
 	struct net_bridge_port *p, *n;
 
 	list_for_each_entry_safe(p, n, &br->port_list, list) {
@@ -267,7 +268,7 @@ int br_del_bridge(struct net *net, const
 	}
 
 	else
-		del_br(netdev_priv(dev), NULL);
+		br_dev_delete(dev, NULL);
 
 	rtnl_unlock();
 	return ret;
@@ -446,7 +447,7 @@ void __net_exit br_net_exit(struct net *
 	rtnl_lock();
 	for_each_netdev(net, dev)
 		if (dev->priv_flags & IFF_EBRIDGE)
-			del_br(netdev_priv(dev), &list);
+			br_dev_delete(dev, &list);
 
 	unregister_netdevice_many(&list);
 	rtnl_unlock();
--- a/net/bridge/br_netlink.c	2011-09-16 13:12:58.061369744 -0700
+++ b/net/bridge/br_netlink.c	2011-10-06 11:20:21.808911679 -0700
@@ -210,6 +210,7 @@ static struct rtnl_link_ops br_link_ops
 	.priv_size	= sizeof(struct net_bridge),
 	.setup		= br_dev_setup,
 	.validate	= br_validate,
+	.dellink	= br_dev_delete,
 };
 
 int __init br_netlink_init(void)
--- a/net/bridge/br_private.h	2011-10-06 08:42:27.353044954 -0700
+++ b/net/bridge/br_private.h	2011-10-06 11:25:17.845118817 -0700
@@ -301,6 +301,7 @@ static inline int br_is_root_bridge(cons
 
 /* br_device.c */
 extern void br_dev_setup(struct net_device *dev);
+extern void br_dev_delete(struct net_device *dev, struct list_head *list);
 extern netdev_tx_t br_dev_xmit(struct sk_buff *skb,
 			       struct net_device *dev);
 #ifdef CONFIG_NET_POLL_CONTROLLER
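
Why a type-specific delete hook matters here can be modelled in a few
lines of userspace C. This is a toy sketch under stated assumptions:
struct model_dev, struct model_link_ops and the reference counting
are hypothetical simplifications, and the fallback to a generic
unregister when no hook is set is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: each attached port holds one reference on it. */
struct model_dev {
	int refcnt;
	int nports;
};

/* Toy link ops with an optional type-specific delete hook. */
struct model_link_ops {
	void (*dellink)(struct model_dev *dev);
};

/* Generic path: drops only the device's own reference. */
static void generic_unregister(struct model_dev *dev)
{
	dev->refcnt--;
}

/* Bridge-style path: detach every port first, then unregister. */
static void bridge_delete(struct model_dev *dev)
{
	while (dev->nports > 0) {
		dev->nports--;
		dev->refcnt--;
	}
	generic_unregister(dev);
}

/*
 * Returns the references left after deletion. Anything above zero
 * models the hang: unregister would wait for them forever.
 */
static int link_delete(const struct model_link_ops *ops,
		       struct model_dev *dev)
{
	assert(ops != NULL && dev != NULL);
	if (ops->dellink)
		ops->dellink(dev);
	else
		generic_unregister(dev);
	return dev->refcnt;
}
```

With one port attached, the generic path leaves a reference behind,
while the bridge-style hook brings the count down to zero.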

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Ian Campbell @ 2011-10-06 21:25 UTC (permalink / raw)
  To: David Miller; +Cc: mirqus@gmail.com, netdev@vger.kernel.org, jdmason@kudzu.us
In-Reply-To: <20111006.171645.1154998655978830692.davem@davemloft.net>

On Thu, 2011-10-06 at 22:16 +0100, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@eu.citrix.com>
> Date: Thu, 6 Oct 2011 22:08:30 +0100
> 
> > You are absolutely right, I've no idea how I missed the very obvious
> > warning this produces. Incremental patch is below, sorry about this!
> 
> Would really love to know what your patch is against, since I
> fixed this problem when I committed your original patch.

It was against e878d78b9a74 + the original bad patch, i.e.:

$ git log --pretty=oneline net-next/master^..HEAD
80f4b53b3d2c009689178d12de0bc108ddd580cd net: fix argument to dma_mapping_error after conversion to skb_frag_dma_map
669b1ce22eedd0f7bac048299feb06c67804ed83 net: use DMA_x_DEVICE and dma_mapping_error with skb_frag_dma_map
e878d78b9a7403fabc89ecc93c56928b74d14f01 virtio-net: Verify page list size before fitting into skb

AFAICT e878d78b9a7403fabc89ecc93c56928b74d14f01 is still the head of the
public git://github.com/davem330/net-next master.

Ian.

^ permalink raw reply

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: Michał Mirosław @ 2011-10-06 21:28 UTC (permalink / raw)
  To: David Miller; +Cc: Ian.Campbell, netdev, jdmason
In-Reply-To: <20111006.171519.797156146863050743.davem@davemloft.net>

2011/10/6 David Miller <davem@davemloft.net>:
> From: Michał Mirosław <mirqus@gmail.com>
> Date: Thu, 6 Oct 2011 22:45:57 +0200
>> There's a catch there, though:
>  ..
>> dma_mapping_error() takes struct device *, so those changes should be:
>>
>> dma_mapping_error(&tp->pdev->dev, mapping)
>>
>> (Like skb_frag_dma_map()'s first argument).
> Why don't you take a look at what I committed and pushed out
> (hint: I fixed it up when I saw the build warnings)?

You were just quicker than me or Ian on this one. We'll be faster next time.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 1/8] vxge: convert to SKB paged frag API.
From: David Miller @ 2011-10-06 21:56 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: mirqus, netdev, jdmason
In-Reply-To: <1317936302.24742.23.camel@dagon.hellion.org.uk>

From: Ian Campbell <Ian.Campbell@eu.citrix.com>
Date: Thu, 6 Oct 2011 22:25:02 +0100

> On Thu, 2011-10-06 at 22:16 +0100, David Miller wrote:
>> From: Ian Campbell <Ian.Campbell@eu.citrix.com>
>> Date: Thu, 6 Oct 2011 22:08:30 +0100
>> 
>> > You are absolutely right, I've no idea how I missed the very obvious
>> > warning this produces. Incremental patch is below, sorry about this!
>> 
>> Would really love to know what your patch is against, since I
>> fixed this problem when I committed your original patch.
> 
> It was against e878d78b9a74 + the original bad patch, i.e.:
> 
> $ git log --pretty=oneline net-next/master^..HEAD
> 80f4b53b3d2c009689178d12de0bc108ddd580cd net: fix argument to dma_mapping_error after conversion to skb_frag_dma_map
> 669b1ce22eedd0f7bac048299feb06c67804ed83 net: use DMA_x_DEVICE and dma_mapping_error with skb_frag_dma_map
> e878d78b9a7403fabc89ecc93c56928b74d14f01 virtio-net: Verify page list size before fitting into skb
> 
> AFAICT e878d78b9a7403fabc89ecc93c56928b74d14f01 is still the head of the
> public git://github.com/davem330/net-next master.

I'm an idiot, I didn't push it out when I left the office :-/

Sorry.

But when your commit appears it will have the dma_mapping_error() stuff
fixed up, so don't worry about it. :-)

^ permalink raw reply

* Re: [PATCH] IPv6: DAD from bonding iface is treated as dup address from others
From: Yinglin Sun @ 2011-10-06 22:17 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Neil Horman, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <27199.1317927933@death>

On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>
> Neil Horman <nhorman@tuxdriver.com> wrote:
>
> >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> >> Steps to reproduce this issue:
> >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> >> 2. add an IPv6 address to bond0
> >> 3. DAD packet is sent out from one slave and then is looped back from
> >> the other slave. Therefore, it is treated as a duplicate address and
> >> stays tentative afterwards:
> >>    kern.info:
> >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> >>
> >> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
> >> ---
> >>  net/ipv6/ndisc.c |   15 +++++++++++++--
> >>  1 files changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> >> index 9da6e02..c82f4c7 100644
> >> --- a/net/ipv6/ndisc.c
> >> +++ b/net/ipv6/ndisc.c
> >> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> >>
> >>              if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
> >>                      if (dad) {
> >> +                            const unsigned char *sadr;
> >> +                            sadr = skb_mac_header(skb);
> >> +
> >>                              if (dev->type == ARPHRD_IEEE802_TR) {
> >> -                                    const unsigned char *sadr;
> >> -                                    sadr = skb_mac_header(skb);
> >>                                      if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
> >>                                          sadr[9] == dev->dev_addr[1] &&
> >>                                          sadr[10] == dev->dev_addr[2] &&
> >> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> >>                                              /* looped-back to us */
> >>                                              goto out;
> >>                                      }
> >> +                            } else if (dev->type == ARPHRD_ETHER) {
> >> +                                    if (sadr[6] == dev->dev_addr[0] &&
> >> +                                        sadr[7] == dev->dev_addr[1] &&
> >> +                                        sadr[8] == dev->dev_addr[2] &&
> >> +                                        sadr[9] == dev->dev_addr[3] &&
> >> +                                        sadr[10] == dev->dev_addr[4] &&
> >> +                                        sadr[11] == dev->dev_addr[5]) {
> >> +                                            /* looped-back to us */
> >> +                                            goto out;
> >> +                                    }
> >>                              }
> >>
> >>                              /*
> >> --
> >> 1.7.4.1
> >>
> >Nack, this seems like it will just completely break DAD.  What if there's another
> >system out there with the same mac address.  A response from that system would
> >get dropped by this filter, instead of causing the local system to stop using
> >the address.  What you really want to do is modify
> >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> >some such, and drop the frame there.
>
>        Also NACK; and adding a bit of information.  The balance-xor
> mode is nominally expecting to interact with a switch whose ports are
> set for etherchannel ("static link aggregation"), in which case the
> switch will not loop the packet back around.
>
>        If your switch can do etherchannel, then enable it and the
> problem should go away.  If your switch cannot do this, then you may
> have other issues, because all of the multicast or broadcast packets
> going out any bonding slave will loop around to another slave.  You
> could also use 802.3ad / LACP if your switch supports that.
>
>        For balance-xor (or balance-rr, for that matter) mode to a
> non-etherchannel switch, it's going to be difficult, if not impossible,
> to modify bond_should_deliver_exact_match, because there are no inactive
> slaves.  In this mode, bonding is expecting the switch to balance
> incoming traffic across the ports, and not deliver looped back packets
> or duplicates.  There are no restrictions on what type of traffic
> (mcast, bcast, ucast) may arrive on any given port.
>
>        I can't think of a way to make the non-etherchannel case work
> for balance-xor (or balance-rr) without breaking the DAD functionality
> in the case of an actual duplicate.  I'm not aware of a way to
> distinguish a looped back DAD probe from an actual duplicate address
> probe elsewhere on the network.
>

Hi Neil & Jay,

Thanks a lot for the comments.

The use case is to add an IPv6 address to the bonding interface first,
and then set up the port channel on the switch. We hit this issue, and the
new address stays tentative and unusable even after the port channel is
set up on the switch. This patch is for this valid use case.

Except in failover mode, all slaves are active for receiving packets, so
we receive such looped-back DAD probes and the bonding driver cannot
ignore them. I cannot think of a way to distinguish whether a DAD probe
is looped back or comes from someone else with the same MAC address. They
look the same to the host. If there is another machine with the same MAC
address, this code path gets executed if both are doing DAD at the
same time for the same IPv6 address. Maybe we should find out what the
specification defines for this case?
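
To make the ambiguity concrete, here is a minimal user-space sketch of
the check the patch performs for ARPHRD_ETHER (bytes 6..11 of the
Ethernet header are the source address). The names src_mac_matches and
ETH_SRC_OFFSET are made up for illustration; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN        6  /* length of an Ethernet MAC address */
#define ETH_SRC_OFFSET 6  /* source MAC follows the 6-byte destination MAC */

/*
 * Hypothetical helper: returns true when the source MAC in the Ethernet
 * header starting at mac_header equals dev_addr.
 *
 * A match only says "same MAC". It cannot tell a frame looped back by
 * the switch apart from a frame sent by another host that happens to
 * use the same MAC address, which is the objection raised in this thread.
 */
bool src_mac_matches(const unsigned char *mac_header, size_t len,
                     const unsigned char *dev_addr)
{
    assert(mac_header != NULL && dev_addr != NULL);

    /* Refuse to read past a truncated header. */
    if (len < ETH_SRC_OFFSET + MAC_LEN)
        return false;

    return memcmp(mac_header + ETH_SRC_OFFSET, dev_addr, MAC_LEN) == 0;
}
```

Both a looped-back probe and a genuine duplicate from a host with a
cloned MAC make this return true, so the result alone is not enough to
decide whether to skip DAD failure handling.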

Thanks.

Yinglin

^ permalink raw reply

* Re: [PATCH] SELinux: Fix RCU deref check warning in sel_netport_insert()
From: Paul Moore @ 2011-10-06 22:51 UTC (permalink / raw)
  To: David Howells; +Cc: selinux, netdev
In-Reply-To: <1624.1317821523@redhat.com>

On Wednesday, October 05, 2011 02:32:03 PM David Howells wrote:
> Paul Moore <paul@paul-moore.com> wrote:
> > We should probably do the same for the security/selinux/netif.c as it
> > uses the same logic; David is this something you want to tackle?
> 
> netif.c doesn't use any rcu_dereference*() function directly, though it does
> use list_for_each_entry_rcu().  However, I'm not sure that's a problem. 
> What is it you're referring to?

My apologies, the netport.c and netif.c code is very, very similar, and
whenever I see a patch for just one of the two it causes the reaction you
saw above.  While netif.c has a similar function, sel_netif_insert(), it is
slightly different and doesn't need an rcu_dereference() as the netport.c code
does.

Sorry for the confusion.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH] SELinux: Fix RCU deref check warning in sel_netport_insert() [ver #3]
From: Paul Moore @ 2011-10-06 22:52 UTC (permalink / raw)
  To: David Howells; +Cc: selinux, netdev
In-Reply-To: <20111005111919.30551.77529.stgit@warthog.procyon.org.uk>

On Wednesday, October 05, 2011 12:19:19 PM David Howells wrote:
> Fix the following bug in sel_netport_insert() where rcu_dereference() should
> be rcu_dereference_protected() as sel_netport_lock is held.
> 
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> security/selinux/netport.c:127 invoked rcu_dereference_check() without
> protection!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 0
> 1 lock held by ossec-rootcheck/3323:
>  #0:  (sel_netport_lock){+.....}, at: [<ffffffff8117d775>]
> sel_netport_sid+0xbb/0x226
> 
> stack backtrace:
> Pid: 3323, comm: ossec-rootcheck Not tainted 3.1.0-rc8-fsdevel+ #1095
> Call Trace:
>  [<ffffffff8105cfb7>] lockdep_rcu_dereference+0xa7/0xb0
>  [<ffffffff8117d871>] sel_netport_sid+0x1b7/0x226
>  [<ffffffff8117d6ba>] ? sel_netport_avc_callback+0xbc/0xbc
>  [<ffffffff8117556c>] selinux_socket_bind+0x115/0x230
>  [<ffffffff810a5388>] ? might_fault+0x4e/0x9e
>  [<ffffffff810a53d1>] ? might_fault+0x97/0x9e
>  [<ffffffff81171cf4>] security_socket_bind+0x11/0x13
>  [<ffffffff812ba967>] sys_bind+0x56/0x95
>  [<ffffffff81380dac>] ? sysret_check+0x27/0x62
>  [<ffffffff8105b767>] ? trace_hardirqs_on_caller+0x11e/0x155
>  [<ffffffff81076fcd>] ? audit_syscall_entry+0x17b/0x1ae
>  [<ffffffff811b5eae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff81380d7b>] system_call_fastpath+0x16/0x1b
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  security/selinux/netport.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)

Acked-by: Paul Moore <paul@paul-moore.com>

> diff --git a/security/selinux/netport.c b/security/selinux/netport.c
> index 0b62bd1..7b9eb1f 100644
> --- a/security/selinux/netport.c
> +++ b/security/selinux/netport.c
> @@ -123,7 +123,9 @@ static void sel_netport_insert(struct sel_netport *port)
> if (sel_netport_hash[idx].size == SEL_NETPORT_HASH_BKT_LIMIT) {
>  		struct sel_netport *tail;
>  		tail = list_entry(
> -			rcu_dereference(sel_netport_hash[idx].list.prev),
> +			rcu_dereference_protected(
> +				sel_netport_hash[idx].list.prev,
> +				lockdep_is_held(&sel_netport_lock)),
>  			struct sel_netport, list);
>  		list_del_rcu(&tail->list);
>  		kfree_rcu(tail, rcu);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH] bridge: fix hang on removal of bridge via netlink
From: Sridhar Samudrala @ 2011-10-06 23:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20111006141941.437be127@nehalam.linuxnetplumber.net>

On Thu, 2011-10-06 at 14:19 -0700, Stephen Hemminger wrote:
> Need to clean up bridge device timers and ports when the bridge
> device is being removed via netlink.
> 
> This fixes the problem observed when doing:
>  ip link add br0 type bridge
>  ip link set dev eth1 master br0
>  ip link set br0 up
>  ip link del br0
> 
> which would cause br0 to hang in unregister_netdev because
> of leftover reference count.
> 
> Reported-by: Sridhar Samudrala <sri@us.ibm.com>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: Sridhar Samudrala <sri@us.ibm.com>

> 
> ---
> Patch is against net-next but should go to net and stable trees
> since it is an observable hang on 3.0 and later kernels.
> 
> --- a/net/bridge/br_if.c	2011-10-03 11:08:36.304168386 -0700
> +++ b/net/bridge/br_if.c	2011-10-06 11:27:47.682488755 -0700
> @@ -160,9 +160,10 @@ static void del_nbp(struct net_bridge_po
>  	call_rcu(&p->rcu, destroy_nbp_rcu);
>  }
> 
> -/* called with RTNL */
> -static void del_br(struct net_bridge *br, struct list_head *head)
> +/* Delete bridge device */
> +void br_dev_delete(struct net_device *dev, struct list_head *head)
>  {
> +	struct net_bridge *br = netdev_priv(dev);
>  	struct net_bridge_port *p, *n;
> 
>  	list_for_each_entry_safe(p, n, &br->port_list, list) {
> @@ -267,7 +268,7 @@ int br_del_bridge(struct net *net, const
>  	}
> 
>  	else
> -		del_br(netdev_priv(dev), NULL);
> +		br_dev_delete(dev, NULL);
> 
>  	rtnl_unlock();
>  	return ret;
> @@ -446,7 +447,7 @@ void __net_exit br_net_exit(struct net *
>  	rtnl_lock();
>  	for_each_netdev(net, dev)
>  		if (dev->priv_flags & IFF_EBRIDGE)
> -			del_br(netdev_priv(dev), &list);
> +			br_dev_delete(dev, &list);
> 
>  	unregister_netdevice_many(&list);
>  	rtnl_unlock();
> --- a/net/bridge/br_netlink.c	2011-09-16 13:12:58.061369744 -0700
> +++ b/net/bridge/br_netlink.c	2011-10-06 11:20:21.808911679 -0700
> @@ -210,6 +210,7 @@ static struct rtnl_link_ops br_link_ops
>  	.priv_size	= sizeof(struct net_bridge),
>  	.setup		= br_dev_setup,
>  	.validate	= br_validate,
> +	.dellink	= br_dev_delete,
>  };
> 
>  int __init br_netlink_init(void)
> --- a/net/bridge/br_private.h	2011-10-06 08:42:27.353044954 -0700
> +++ b/net/bridge/br_private.h	2011-10-06 11:25:17.845118817 -0700
> @@ -301,6 +301,7 @@ static inline int br_is_root_bridge(cons
> 
>  /* br_device.c */
>  extern void br_dev_setup(struct net_device *dev);
> +extern void br_dev_delete(struct net_device *dev, struct list_head *list);
>  extern netdev_tx_t br_dev_xmit(struct sk_buff *skb,
>  			       struct net_device *dev);
>  #ifdef CONFIG_NET_POLL_CONTROLLER

^ permalink raw reply

* [PATCH] iproute2: Fix usage and man page for 'ip link'
From: Sridhar Samudrala @ 2011-10-06 23:10 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Add bridge as a supported type with 'ip link' in usage and all the missing
types in 'ip' man page. Also fixed some typos.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>

diff --git a/ip/iplink.c b/ip/iplink.c
index e5325a6..35e6dc6 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -43,7 +43,7 @@ static int iplink_have_newlink(void);
 void iplink_usage(void)
 {
 	if (iplink_have_newlink()) {
-		fprintf(stderr, "Usage: ip link add link DEV [ name ] NAME\n");
+		fprintf(stderr, "Usage: ip link add [link DEV] [ name ] NAME\n");
 		fprintf(stderr, "                   [ txqueuelen PACKETS ]\n");
 		fprintf(stderr, "                   [ address LLADDR ]\n");
 		fprintf(stderr, "                   [ broadcast LLADDR ]\n");
@@ -78,7 +78,7 @@ void iplink_usage(void)
 
 	if (iplink_have_newlink()) {
 		fprintf(stderr, "\n");
-		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }\n");
+		fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can | bridge }\n");
 	}
 	exit(-1);
 }
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 27993a4..49d94f5 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -47,7 +47,7 @@ ip \- show / manipulate routing, devices, policy routing and tunnels
 
 .ti -8
 .IR TYPE " := [ "
-.BR vlan " | " maclan " | " can " ]"
+.BR vlan " | " veth " | " vcan " | " dummy " | " ifb " | " macvlan " | " can " | " bridge " ]"
 
 .ti -8
 .BI "ip link delete " DEVICE
@@ -989,13 +989,28 @@ Link types:
 
 .in +8
 .B vlan
-- 802.1q tagged virrtual LAN interface
+- 802.1q tagged virtual LAN interface
+.sp
+.B veth
+- Virtual ethernet interface
+.sp
+.B vcan
+- Virtual Local CAN interface
+.sp
+.B dummy
+- Dummy network interface
+.sp
+.B ifb
+- Intermediate Functional Block device
 .sp
 .B macvlan
 - virtual interface base on link layer address (MAC)
 .sp
 .B can
 - Controller Area Network interface
+.sp
+.B bridge
+- Ethernet Bridge device
 .in -8
 
 .SS ip link delete - delete virtual link

^ permalink raw reply related

* Re: Asserting ECN from userspace?
From: Andi Kleen @ 2011-10-06 23:52 UTC (permalink / raw)
  To: David Täht; +Cc: netdev, bloat-devel
In-Reply-To: <4E8BF6B2.6030101@gmail.com>

David Täht <dave.taht@gmail.com> writes:
>
> And twiddling them, on a per stream basis, for a single packet, would
> seem to require something more robust than setsockopt/getsockopt
> (although that would work for udp streams)

With netfilter nf_queue you can construct a rule that passes packets
through user space and reinjects them.

I would suggest to just use that to modify the ECN bits.

I'm sure with reasonable google skills you can find some examples
how to do this on the web.
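
As a rough sketch of the "modify the ECN bits" step only: assuming the
user-space handler has the raw IPv4 packet in a buffer (how it gets
there via nf_queue is not shown), setting CE means changing the low two
bits of the second header byte and recomputing the header checksum. The
function names below are made up for illustration, and refusing to mark
Not-ECT packets is my assumption about the desired policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ECN_MASK 0x03  /* low two bits of the IPv4 TOS / traffic class byte */
#define ECN_CE   0x03  /* Congestion Experienced codepoint */

/* One's-complement checksum over an IPv4 header of hdr_len bytes. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: set the ECN field of an IPv4 packet to CE and fix
 * up the header checksum. Returns 0 on success, -1 if the buffer is not
 * a plausible IPv4 packet or the packet is Not-ECT (ECN bits 00).
 */
int ipv4_mark_ce(uint8_t *pkt, size_t len)
{
    assert(pkt != NULL);

    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;

    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;  /* header length in bytes */
    if (ihl < 20 || ihl > len)
        return -1;

    if ((pkt[1] & ECN_MASK) == 0)
        return -1;  /* sender did not negotiate ECN; leave it alone */

    pkt[1] |= ECN_CE;

    /* Checksum is computed with the checksum field zeroed. */
    pkt[10] = 0;
    pkt[11] = 0;
    uint16_t csum = ipv4_header_checksum(pkt, ihl);
    pkt[10] = (uint8_t)(csum >> 8);
    pkt[11] = (uint8_t)(csum & 0xff);
    return 0;
}
```

The modified buffer would then be handed back with the verdict so the
kernel reinjects the changed packet.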

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply

* Re: [PATCH] IPv6: DAD from bonding iface is treated as dup address from others
From: Yinglin Sun @ 2011-10-07  0:03 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Neil Horman, David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <CAN17JHUo00BGbsQ0wDjhcY-wWRuGr-in-i_JBzE5__jgO5be=g@mail.gmail.com>

On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>
> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >
> > Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> > >> Steps to reproduce this issue:
> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> > >> 2. add an IPv6 address to bond0
> > >> 3. DAD packet is sent out from one slave and then is looped back from
> > >> the other slave. Therefore, it is treated as a duplicate address and
> > >> stays tentative afterwards:
> > >>    kern.info:
> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> > >>
> > >> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
> > >> ---
> > >>  net/ipv6/ndisc.c |   15 +++++++++++++--
> > >>  1 files changed, 13 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> > >> index 9da6e02..c82f4c7 100644
> > >> --- a/net/ipv6/ndisc.c
> > >> +++ b/net/ipv6/ndisc.c
> > >> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> > >>
> > >>              if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
> > >>                      if (dad) {
> > >> +                            const unsigned char *sadr;
> > >> +                            sadr = skb_mac_header(skb);
> > >> +
> > >>                              if (dev->type == ARPHRD_IEEE802_TR) {
> > >> -                                    const unsigned char *sadr;
> > >> -                                    sadr = skb_mac_header(skb);
> > >>                                      if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
> > >>                                          sadr[9] == dev->dev_addr[1] &&
> > >>                                          sadr[10] == dev->dev_addr[2] &&
> > >> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> > >>                                              /* looped-back to us */
> > >>                                              goto out;
> > >>                                      }
> > >> +                            } else if (dev->type == ARPHRD_ETHER) {
> > >> +                                    if (sadr[6] == dev->dev_addr[0] &&
> > >> +                                        sadr[7] == dev->dev_addr[1] &&
> > >> +                                        sadr[8] == dev->dev_addr[2] &&
> > >> +                                        sadr[9] == dev->dev_addr[3] &&
> > >> +                                        sadr[10] == dev->dev_addr[4] &&
> > >> +                                        sadr[11] == dev->dev_addr[5]) {
> > >> +                                            /* looped-back to us */
> > >> +                                            goto out;
> > >> +                                    }
> > >>                              }
> > >>
> > >>                              /*
> > >> --
> > >> 1.7.4.1
> > >>
> > >Nack, this seems like it will just completely break DAD.  What if there's another
> > >system out there with the same mac address.  A response from that system would
> > >get dropped by this filter, instead of causing the local system to stop using
> > >the address.  What you really want to do is modify
> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> > >some such, and drop the frame there.
> >
> >        Also NACK; and adding a bit of information.  The balance-xor
> > mode is nominally expecting to interact with a switch whose ports are
> > set for etherchannel ("static link aggregation"), in which case the
> > switch will not loop the packet back around.
> >
> >        If your switch can do etherchannel, then enable it and the
> > problem should go away.  If your switch cannot do this, then you may
> > have other issues, because all of the multicast or broadcast packets
> > going out any bonding slave will loop around to another slave.  You
> > could also use 802.3ad / LACP if your switch supports that.
> >
> >        For balance-xor (or balance-rr, for that matter) mode to a
> > non-etherchannel switch, it's going to be difficult, if not impossible,
> > to modify bond_should_deliver_exact_match, because there are no inactive
> > slaves.  In this mode, bonding is expecting the switch to balance
> > incoming traffic across the ports, and not deliver looped back packets
> > or duplicates.  There are no restrictions on what type of traffic
> > (mcast, bcast, ucast) may arrive on any given port.
> >
> >        I can't think of a way to make the non-etherchannel case work
> > for balance-xor (or balance-rr) without breaking the DAD functionality
> > in the case of an actual duplicate.  I'm not aware of a way to
> > distinguish a looped back DAD probe from an actual duplicate address
> > probe elsewhere on the network.
> >
>
> Hi Neil & Jay,
>
> Thanks a lot for the comments.
>
> The use case is to add an IPv6 address to the bonding interface first,
> and then set up the port channel on the switch. We hit this issue, and the
> new address stays tentative and unusable even after the port channel is
> set up on the switch. This patch is for this valid use case.
>
> Except in failover mode, all slaves are active for receiving packets, so
> we receive such looped-back DAD probes and the bonding driver cannot
> ignore them. I cannot think of a way to distinguish whether a DAD probe
> is looped back or comes from someone else with the same MAC address. They
> look the same to the host. If there is another machine with the same MAC
> address, this code path gets executed if both are doing DAD at the
> same time for the same IPv6 address. Maybe we should find out what the
> specification defines for this case?
>

RFC 4862 has a discussion of this issue:
http://tools.ietf.org/html/rfc4862#appendix-A
A better solution could be to record the number of DAD probes sent out.
If we receive more DAD packets than we sent, there is someone else
on the network who has the same MAC address and sent DAD for the same
IPv6 address. However, this solution doesn't work with a bonding
interface, since all active slaves other than the one sending the DAD
probe will receive the packet looped back. There doesn't seem to be a
simple solution for this issue.

Yinglin

^ permalink raw reply

