Netdev List

* RE: [PATCH 2/4] ipgre: follow state of lower device
From: Christian Benvenuti (benve) @ 2012-05-04 23:34 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: netdev, kaber
In-Reply-To: <20120503154025.0845359e@nehalam.linuxnetplumber.net>

Is this the same issue I described in the email below?

  Subject: Route flush on linkdown: physical vs virtual/stacked interfaces
  http://marc.info/?l=linux-netdev&m=133468470719285&w=2

(i.e., the need to propagate carrier changes to upper-layer devices)

Thanks
/Chris

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Stephen Hemminger
> Sent: Thursday, May 03, 2012 3:40 PM
> To: David Miller
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH 2/4] ipgre: follow state of lower device
> 
> On Sat, 14 Apr 2012 14:53:02 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Stephen Hemminger <shemminger@vyatta.com>
> > Date: Thu, 12 Apr 2012 09:31:17 -0700
> >
> > > GRE tunnels, like other layered devices, should propagate
> > > carrier and RFC2863 state from lower device to tunnel.
> > >
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >
> > Like others I don't like the ugly hash traversal.
> >
> > A small hash on ifindex, iflink, or whatever ought to be easy and make
> > the code look much nicer.
> >
> > Longer term project is that a lot of this tunneling code can be
> > commonized at some point.
> 
> The whole set of tunnels needs to be cleaned up to be something modular, clean
> and cached like the code in Open vSwitch.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 0/3] First pass of cleanups for pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher

After looking over the TCP coalescing and GRO code a couple of days ago, it
occurred to me that pskb_expand_head has a few flaws, some of which are
addressed in this patch series.

This change set takes care of some of the minor cleanup items.  One thing
that caught my eye is that the memmove code in the fast-path is likely no
longer doing anything but burning cycles on a call that doesn't actually
move any memory.

The other change is a follow-on to that: drop the fastpath variable, which
really just indicates whether the skb is cloned or not.

The final change in this set just adds an inline for getting the end offset
since there were multiple places where we were computing end - head to get
the offset and if we are storing it as an offset it makes more sense to
just pull the actual value.

There are a few more items that I will try to get to next week.  The big one
is the fact that pskb_expand_head can mess up the truesize since it can
allocate a new head but never updates the truesize.  I plan on adding a helper
function for the cases where we are just using it to unshare the head so I can
identify the places where we are actually modifying the size.

---

Alexander Duyck (3):
      skb: Add inline helper for getting the skb end offset from head
      skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
      skb: Drop bad code from pskb_expand_head

 drivers/atm/ambassador.c             |    2 +
 drivers/atm/idt77252.c               |    2 +
 drivers/net/wimax/i2400m/usb-rx.c    |    2 +
 drivers/staging/octeon/ethernet-tx.c |    2 +
 include/linux/skbuff.h               |   12 ++++++++-
 net/core/skbuff.c                    |   46 ++++++++++------------------------
 6 files changed, 29 insertions(+), 37 deletions(-)

-- 
Thanks,

Alex

^ permalink raw reply

* [PATCH 1/3] skb: Drop bad code from pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

The fast-path for pskb_expand_head contains a check where the size plus the
unaligned size of skb_shared_info is compared against the size of the data
buffer.  This code path has two issues.  First, after the recent changes by
Eric Dumazet to __alloc_skb and build_skb, the shared info is always placed
in the optimal spot for a buffer size, making this check unnecessary.
Second, the check doesn't take into account the aligned size of shared info.
As a result the code burns cycles doing a memmove with nothing actually
being shifted.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/skbuff.c |   12 ------------
 1 files changed, 0 insertions(+), 12 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c199aa4..4d085d4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -951,17 +951,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
 	}
 
-	if (fastpath && !skb->head_frag &&
-	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
-		memmove(skb->head + size, skb_shinfo(skb),
-			offsetof(struct skb_shared_info,
-				 frags[skb_shinfo(skb)->nr_frags]));
-		memmove(skb->head + nhead, skb->head,
-			skb_tail_pointer(skb) - skb->head);
-		off = nhead;
-		goto adjust_others;
-	}
-
 	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 		       gfp_mask);
 	if (!data)
@@ -997,7 +986,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
 	skb->head     = data;
 	skb->head_frag = 0;
-adjust_others:
 	skb->data    += off;
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
 	skb->end      = size;

^ permalink raw reply related
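
The argument in the changelog above can be checked with a small userspace
model.  This is only a sketch under stated assumptions: CACHE_BYTES and
SHINFO_SIZE are made-up stand-ins for SMP_CACHE_BYTES and
sizeof(struct skb_shared_info), head buffer sizes are assumed to be powers
of two, and the shared info is assumed to sit at the buffer size minus its
aligned size, as the changelog describes for __alloc_skb and build_skb.
None of the helper names below exist in the kernel.

```c
#include <stddef.h>

/* Assumed values for illustration only; the kernel's differ per arch/config. */
#define CACHE_BYTES 64u		/* stand-in for SMP_CACHE_BYTES */
#define SHINFO_SIZE 328u	/* hypothetical unaligned sizeof(struct skb_shared_info) */
#define DATA_ALIGN(x) (((x) + (CACHE_BYTES - 1)) & ~(CACHE_BYTES - 1))

/*
 * Model of the removed test.  ksize is the usable size of the head buffer;
 * the shared info is assumed to start at ksize - DATA_ALIGN(SHINFO_SIZE).
 * Returns 1 if the old fast-path would have been taken.
 */
static int old_fastpath_taken(unsigned int ksize, unsigned int nhead,
			      unsigned int ntail, unsigned int *new_size)
{
	unsigned int end_off = ksize - DATA_ALIGN(SHINFO_SIZE);
	unsigned int size = DATA_ALIGN(nhead + end_off + ntail);

	*new_size = size;
	return size + SHINFO_SIZE <= ksize;
}

/* Count requests that pass the old test and would actually move any bytes. */
static unsigned int count_useful_fastpath_hits(void)
{
	unsigned int ksize, nhead, ntail, size, useful = 0;

	for (ksize = 512; ksize <= 8192; ksize *= 2) {
		unsigned int end_off = ksize - DATA_ALIGN(SHINFO_SIZE);

		for (nhead = 0; nhead <= 128; nhead++) {
			for (ntail = 0; ntail <= 128; ntail++) {
				if (!old_fastpath_taken(ksize, nhead, ntail, &size))
					continue;
				/*
				 * The first memmove shifts shinfo to head + size,
				 * the second shifts the data by nhead bytes.
				 */
				if (size != end_off || nhead != 0)
					useful++;
			}
		}
	}
	return useful;
}
```

In this model the test passes only when nhead and ntail are both zero, so
count_useful_fastpath_hits() finds no request for which either memmove has
a destination different from its source.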

* [PATCH 2/3] skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

Since there is now only one spot that actually uses "fastpath" there isn't
much point in carrying it.  Instead we can just use a check for skb_cloned
to verify if we can perform the fast-path free for the head or not.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/core/skbuff.c |   22 ++++++++--------------
 1 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4d085d4..17e4b1e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -932,7 +932,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	u8 *data;
 	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
 	long off;
-	bool fastpath;
 
 	BUG_ON(nhead < 0);
 
@@ -941,16 +940,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 
 	size = SKB_DATA_ALIGN(size);
 
-	/* Check if we can avoid taking references on fragments if we own
-	 * the last reference on skb->head. (see skb_release_data())
-	 */
-	if (!skb->cloned)
-		fastpath = true;
-	else {
-		int delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
-		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
-	}
-
 	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
 		       gfp_mask);
 	if (!data)
@@ -966,9 +955,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	       skb_shinfo(skb),
 	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
 
-	if (fastpath) {
-		skb_free_head(skb);
-	} else {
+	/*
+	 * if shinfo is shared we must drop the old head gracefully, but if it
+	 * is not we can just drop the old head and let the existing refcount
+	 * be since all we did is relocate the values
+	 */
+	if (skb_cloned(skb)) {
 		/* copy this zero copy skb frags */
 		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
 			if (skb_copy_ubufs(skb, gfp_mask))
@@ -981,6 +973,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 			skb_clone_fraglist(skb);
 
 		skb_release_data(skb);
+	} else {
+		skb_free_head(skb);
 	}
 	off = (data + nhead) - skb->head;
 

^ permalink raw reply related
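
A userspace sketch of why the two tests can be swapped.  It compares the
removed "fastpath" computation with a model of skb_cloned() over a set of
reference-count states.  The struct and the consistency rules used to pick
the states (a nohdr skb holds one payload-only reference, any other skb
holds one full reference) are assumptions made for this illustration, not
something stated in the patch.

```c
#include <stdbool.h>

#define SKB_DATAREF_SHIFT 16
#define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)

/* Reduced stand-in for the sk_buff fields involved; not the kernel struct. */
struct mini_skb {
	bool cloned;
	bool nohdr;
	int dataref;	/* low 16 bits: all refs, high 16 bits: payload-only refs */
};

/* The computation the patch removes. */
static bool old_fastpath(const struct mini_skb *skb)
{
	int delta;

	if (!skb->cloned)
		return true;
	delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
	return skb->dataref == delta;
}

/* Model of skb_cloned(): cloned flag set and more than one reference. */
static bool mini_skb_cloned(const struct mini_skb *skb)
{
	return skb->cloned && (skb->dataref & SKB_DATAREF_MASK) != 1;
}

/* Walk the assumed-consistent states and count disagreements. */
static int count_mismatches(void)
{
	int total, payload, cloned, nohdr, bad = 0;

	for (total = 1; total <= 4; total++) {
		for (payload = 0; payload <= total; payload++) {
			for (cloned = 0; cloned <= 1; cloned++) {
				for (nohdr = 0; nohdr <= 1; nohdr++) {
					struct mini_skb skb;

					/* assumed: a nohdr skb owns a payload-only ref */
					if (nohdr && payload < 1)
						continue;
					/* assumed: any other skb owns a full ref */
					if (!nohdr && total - payload < 1)
						continue;

					skb.cloned = cloned;
					skb.nohdr = nohdr;
					skb.dataref = (payload << SKB_DATAREF_SHIFT) | total;
					if (old_fastpath(&skb) != !mini_skb_cloned(&skb))
						bad++;
				}
			}
		}
	}
	return bad;
}
```

Under those assumptions old_fastpath() and !mini_skb_cloned() agree for
every state visited, which matches the "if (fastpath)" to
"if (skb_cloned(skb))" inversion in the hunk above.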

* [PATCH 3/3] skb: Add inline helper for getting the skb end offset from head
From: Alexander Duyck @ 2012-05-05  0:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, eric.dumazet, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head.  Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb); that way
we can return the correct value without gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 drivers/atm/ambassador.c             |    2 +-
 drivers/atm/idt77252.c               |    2 +-
 drivers/net/wimax/i2400m/usb-rx.c    |    2 +-
 drivers/staging/octeon/ethernet-tx.c |    2 +-
 include/linux/skbuff.h               |   12 +++++++++++-
 net/core/skbuff.c                    |   12 ++++++------
 6 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/atm/ambassador.c b/drivers/atm/ambassador.c
index f8f41e0..89b30f3 100644
--- a/drivers/atm/ambassador.c
+++ b/drivers/atm/ambassador.c
@@ -802,7 +802,7 @@ static void fill_rx_pool (amb_dev * dev, unsigned char pool,
     }
     // cast needed as there is no %? for pointer differences
     PRINTD (DBG_SKB, "allocated skb at %p, head %p, area %li",
-	    skb, skb->head, (long) (skb_end_pointer(skb) - skb->head));
+	    skb, skb->head, (long) skb_end_offset(skb));
     rx.handle = virt_to_bus (skb);
     rx.host_address = cpu_to_be32 (virt_to_bus (skb->data));
     if (rx_give (dev, &rx, pool))
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index 1c05212..8974bd2 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -1258,7 +1258,7 @@ idt77252_rx_raw(struct idt77252_dev *card)
 	tail = readl(SAR_REG_RAWCT);
 
 	pci_dma_sync_single_for_cpu(card->pcidev, IDT77252_PRV_PADDR(queue),
-				    skb_end_pointer(queue) - queue->head - 16,
+				    skb_end_offset(queue) - 16,
 				    PCI_DMA_FROMDEVICE);
 
 	while (head != tail) {
diff --git a/drivers/net/wimax/i2400m/usb-rx.c b/drivers/net/wimax/i2400m/usb-rx.c
index e325768..b78ee67 100644
--- a/drivers/net/wimax/i2400m/usb-rx.c
+++ b/drivers/net/wimax/i2400m/usb-rx.c
@@ -277,7 +277,7 @@ retry:
 		d_printf(1, dev, "RX: size changed to %d, received %d, "
 			 "copied %d, capacity %ld\n",
 			 rx_size, read_size, rx_skb->len,
-			 (long) (skb_end_pointer(new_skb) - new_skb->head));
+			 (long) skb_end_offset(new_skb));
 		goto retry;
 	}
 		/* In most cases, it happens due to the hardware scheduling a
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 56d74dc..418ed03 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -344,7 +344,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	if (unlikely
 	    (skb->truesize !=
-	     sizeof(*skb) + skb_end_pointer(skb) - skb->head)) {
+	     sizeof(*skb) + skb_end_offset(skb))) {
 		/*
 		   printk("TX buffer truesize has been changed\n");
 		 */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 37f5391..91ad5e2 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -645,11 +645,21 @@ static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
 {
 	return skb->head + skb->end;
 }
+
+static inline unsigned int skb_end_offset(const struct sk_buff *skb)
+{
+	return skb->end;
+}
 #else
 static inline unsigned char *skb_end_pointer(const struct sk_buff *skb)
 {
 	return skb->end;
 }
+
+static inline unsigned int skb_end_offset(const struct sk_buff *skb)
+{
+	return skb->end - skb->head;
+}
 #endif
 
 /* Internal */
@@ -2558,7 +2568,7 @@ static inline bool skb_is_recycleable(const struct sk_buff *skb, int skb_size)
 		return false;
 
 	skb_size = SKB_DATA_ALIGN(skb_size + NET_SKB_PAD);
-	if (skb_end_pointer(skb) - skb->head < skb_size)
+	if (skb_end_offset(skb) < skb_size)
 		return false;
 
 	if (skb_shared(skb) || skb_cloned(skb))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 17e4b1e..2c35da8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -829,7 +829,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask)
 {
 	int headerlen = skb_headroom(skb);
-	unsigned int size = (skb_end_pointer(skb) - skb->head) + skb->data_len;
+	unsigned int size = skb_end_offset(skb) + skb->data_len;
 	struct sk_buff *n = alloc_skb(size, gfp_mask);
 
 	if (!n)
@@ -930,7 +930,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 {
 	int i;
 	u8 *data;
-	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
+	int size = nhead + skb_end_offset(skb) + ntail;
 	long off;
 
 	BUG_ON(nhead < 0);
@@ -2727,14 +2727,13 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
 			if (unlikely(!nskb))
 				goto err;
 
-			hsize = skb_end_pointer(nskb) - nskb->head;
+			hsize = skb_end_offset(nskb);
 			if (skb_cow_head(nskb, doffset + headroom)) {
 				kfree_skb(nskb);
 				goto err;
 			}
 
-			nskb->truesize += skb_end_pointer(nskb) - nskb->head -
-					  hsize;
+			nskb->truesize += skb_end_offset(nskb) - hsize;
 			skb_release_head_state(nskb);
 			__skb_push(nskb, doffset);
 		} else {
@@ -2883,7 +2882,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		skb_frag_size_sub(frag, offset);
 
 		/* all fragments truesize : remove (head size + sk_buff) */
-		delta_truesize = skb->truesize - SKB_TRUESIZE(skb_end_pointer(skb) - skb->head);
+		delta_truesize = skb->truesize -
+				 SKB_TRUESIZE(skb_end_offset(skb));
 
 		skb->truesize -= skb->data_len;
 		skb->len -= skb->data_len;

^ permalink raw reply related
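
The two variants of the helper can be compared side by side in userspace.
A minimal sketch, assuming two reduced structs (skb_off and skb_ptr, both
invented here) that mirror the NET_SKBUFF_DATA_USES_OFFSET and pointer
layouts of skb->end; the function names are illustrative only.

```c
#include <stddef.h>

/* skb->end stored as an offset from head (NET_SKBUFF_DATA_USES_OFFSET) */
struct skb_off {
	unsigned char *head;
	unsigned int end;
};

/* skb->end stored as a pointer */
struct skb_ptr {
	unsigned char *head;
	unsigned char *end;
};

static unsigned char *skb_off_end_pointer(const struct skb_off *skb)
{
	return skb->head + skb->end;
}

/* Offset layout: the value is already there, no arithmetic needed. */
static unsigned int skb_off_end_offset(const struct skb_off *skb)
{
	return skb->end;
}

static unsigned char *skb_ptr_end_pointer(const struct skb_ptr *skb)
{
	return skb->end;
}

/* Pointer layout: one subtraction, done in a single place. */
static unsigned int skb_ptr_end_offset(const struct skb_ptr *skb)
{
	return (unsigned int)(skb->end - skb->head);
}

/* Returns 1 when both layouts report the same end pointer and offset. */
static int layouts_agree(unsigned int end)
{
	static unsigned char buf[256];	/* end must not exceed 256 */
	struct skb_off a = { buf, end };
	struct skb_ptr b = { buf, buf + end };

	return skb_off_end_pointer(&a) == skb_ptr_end_pointer(&b) &&
	       skb_off_end_offset(&a) == skb_ptr_end_offset(&b) &&
	       skb_off_end_offset(&a) == end;
}
```

With the offset layout the old expression skb_end_pointer(skb) - skb->head
expands to (head + end) - head, which is the cancellation the changelog
wants to spare the compiler.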

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
From: John Fastabend @ 2012-05-05  5:00 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Sridhar Samudrala, Michael S. Tsirkin, shemminger, bhutchings,
	hadi, jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2
In-Reply-To: <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>

On 5/4/2012 1:34 PM, Roopa Prabhu wrote:
> 
> 
> On Thu, May 3, 2012 at 10:43 PM, Sridhar Samudrala <sri@us.ibm.com> wrote:
> 
>     On 5/3/2012 12:38 PM, John Fastabend wrote:
> 
>         On 5/2/2012 4:36 PM, Sridhar Samudrala wrote:
> 
>             On 5/2/2012 2:52 PM, John Fastabend wrote:
> 
>                 On 5/2/2012 8:08 AM, Michael S. Tsirkin wrote:
> 
>                     On Sun, Apr 15, 2012 at 01:06:37PM -0400, David Miller wrote:
> 
>                         From: John Fastabend <john.r.fastabend@intel.com>
>                         Date: Sun, 15 Apr 2012 09:43:51 -0700
> 
>                             The following series is a submission for net-next to allow
>                             embedded switches and other stacked devices other than the
>                             Linux bridge to manage a forwarding database.
> 
>                             Previously discussed here,
> 
>                             http://lists.openwall.net/netdev/2012/03/19/26
> 
>                             v4: propagate return codes correctly for ndo_dflt_fdb_dump()
> 
>                             v3: resolve the macvlan patch 8/8 to fix a dev_set_promiscuity()
>                                  error and add the flags field to change and get link routines.
> 
>                             v2: addressed feedback from Ben Hutchings resolving a typo in the
>                                  multicast add/del routines and improving the error handling
>                                  when both NTF_SELF and NTF_MASTER are set.
> 
>                             I've tested this with the 'br' tool published by Stephen Hemminger
>                             (soon to be renamed 'bridge', I believe) and various traffic
>                             generators, mostly pktgen, ping, and netperf.
> 
>                         All applied, if we need any more tweaks we can just add them
>                         on top of this work.
> 
>                         Thanks John.
> 
>                     John, do you plan to update kvm userspace to use this interface?
> 
>                 No immediate plans. I would really appreciate it if you or one
>                 of the IBM developers working in this space took it on. Of course
>                 if no one steps up I guess I can eventually get at it but it will
>                 be some time. For now I've been doing this manually with the bridge
>                 tool yet to be published.
> 
> 
>             Does this mean that when we add an interface to a bridge, it need not be put in promiscuous mode,
>             and fdb entries can be added/deleted dynamically instead?
> 
>         The net/bridge will automatically put the interface in promisc mode
>         when the device is attached. We do need to add/delete fdb entries
>         though to allow forwarding packets from the virtual function and
>         any emulated devices e.g. tap devices on the bridge.
> 
> 
>     Consider the following scenario where we have a SR-IOV NIC with 1 PF
>     and 2 VFs (VF1 & VF2).
>     - eth0 is the PF which is attached to bridge br0 and connected to 2 VMs VM1 and VM2.
>     - eth1 is the VF1 terminated on the host and assigned to VM3 via macvtap0 in passthru mode.
>     - VF2 is directly assigned to VM4 via pci-device assignment.
> 
>     VM1      VM2         VM3            VM4
>    (mac1)   (mac2)      (mac3)         (mac4)
>      |        |           |              |
>      |        |           |              |
>    vnet0    vnet1         |              |
>      |        |           |              |
>       \      /            |              |
>        \    /             |              |
>         br0           macvtap0           |
>          |             (mac3)            |
>          |                |              |
>         eth0            eth1             |
>          |             (mac3)            |
>          |                |              |
>      -----------------------------------------
>     |   PF               VF1            VF2   |
>     |                                         |
>     |                   VEB                   |
>      -----------------------------------------
> 
>     In this setup, I think when VM1 and VM2 come up, mac1 and mac2 have to be added to the
>     embedded bridge's fdb.  Once we add these 2 entries, all 4 VMs can talk to each other.
>     Is this correct?
> 

Correct, as Roopa indicated.

>     Now, if VM1 or VM2 wants to add secondary mac addresses, I think we need qemu to add a new fdb
>     entry when it receives an add mac address command via the virtio control vq.
> 
> 
> Yes. I had used (with some tweaks) some existing qemu patches on patchwork to try this out with my implementation.
> 
> The links to the patches on patchwork are listed in my cover mail at http://marc.info/?l=linux-netdev&m=131534911001054&w=2
> 
>  
> 
>     Can we add multiple mac addresses to VFs? For example VM3 and VM4 trying to add a secondary mac address.

Yes, this is why we also added the fdb interface to the macvlan device.

> 
>     What about VMs trying to create VLANs? I think this will work on VM1 and VM2. However with VM3
>     and VM4, I think we need qemu to add VLANs to the VFs when the VMs create them.
> 
> 
> Yes, for VLANs too, the qemu patches pointed out above can be reused.
> 
> Thanks,
> Roopa
>  
> 

^ permalink raw reply

* Re: [PATCH 1/3] skb: Drop bad code from pskb_expand_head
From: Eric Dumazet @ 2012-05-05  5:35 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002645.21292.38368.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> The fast-path for pskb_expand_head contains a check where the size plus the
> unaligned size of skb_shared_info is compared against the size of the data
> buffer.  This code path has two issues.  First, after the recent changes by
> Eric Dumazet to __alloc_skb and build_skb, the shared info is always placed
> in the optimal spot for a buffer size, making this check unnecessary.
> Second, the check doesn't take into account the aligned size of shared info.
> As a result the code burns cycles doing a memmove with nothing actually
> being shifted.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  net/core/skbuff.c |   12 ------------
>  1 files changed, 0 insertions(+), 12 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c199aa4..4d085d4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -951,17 +951,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
>  	}
>  
> -	if (fastpath && !skb->head_frag &&
> -	    size + sizeof(struct skb_shared_info) <= ksize(skb->head)) {
> -		memmove(skb->head + size, skb_shinfo(skb),
> -			offsetof(struct skb_shared_info,
> -				 frags[skb_shinfo(skb)->nr_frags]));
> -		memmove(skb->head + nhead, skb->head,
> -			skb_tail_pointer(skb) - skb->head);
> -		off = nhead;
> -		goto adjust_others;
> -	}
> -
>  	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
>  		       gfp_mask);
>  	if (!data)
> @@ -997,7 +986,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  
>  	skb->head     = data;
>  	skb->head_frag = 0;
> -adjust_others:
>  	skb->data    += off;
>  #ifdef NET_SKBUFF_DATA_USES_OFFSET
>  	skb->end      = size;
> 

I totally agree this code is no longer needed; we already have the
skb_shared_info at the end of the buffer.

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 2/3] skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head
From: Eric Dumazet @ 2012-05-05  5:37 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002651.21292.19680.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> Since there is now only one spot that actually uses "fastpath" there isn't
> much point in carrying it.  Instead we can just use a check for skb_cloned
> to verify if we can perform the fast-path free for the head or not.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  net/core/skbuff.c |   22 ++++++++--------------
>  1 files changed, 8 insertions(+), 14 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 4d085d4..17e4b1e 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -932,7 +932,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	u8 *data;
>  	int size = nhead + (skb_end_pointer(skb) - skb->head) + ntail;
>  	long off;
> -	bool fastpath;
>  
>  	BUG_ON(nhead < 0);
>  
> @@ -941,16 +940,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  
>  	size = SKB_DATA_ALIGN(size);
>  
> -	/* Check if we can avoid taking references on fragments if we own
> -	 * the last reference on skb->head. (see skb_release_data())
> -	 */
> -	if (!skb->cloned)
> -		fastpath = true;
> -	else {
> -		int delta = skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1;
> -		fastpath = atomic_read(&skb_shinfo(skb)->dataref) == delta;
> -	}
> -
>  	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
>  		       gfp_mask);
>  	if (!data)
> @@ -966,9 +955,12 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	       skb_shinfo(skb),
>  	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
>  
> -	if (fastpath) {
> -		skb_free_head(skb);
> -	} else {
> +	/*
> +	 * if shinfo is shared we must drop the old head gracefully, but if it
> +	 * is not we can just drop the old head and let the existing refcount
> +	 * be since all we did is relocate the values
> +	 */
> +	if (skb_cloned(skb)) {
>  		/* copy this zero copy skb frags */
>  		if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>  			if (skb_copy_ubufs(skb, gfp_mask))
> @@ -981,6 +973,8 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  			skb_clone_fraglist(skb);
>  
>  		skb_release_data(skb);
> +	} else {
> +		skb_free_head(skb);
>  	}
>  	off = (data + nhead) - skb->head;
>  
> 

Excellent

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 3/3] skb: Add inline helper for getting the skb end offset from head
From: Eric Dumazet @ 2012-05-05  5:39 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505002656.21292.89799.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> With the recent changes for how we compute the skb truesize it occurs to me
> we are probably going to have a lot of calls to skb_end_pointer -
> skb->head.  Instead of running all over the place doing that it would make
> more sense to just make it a separate inline skb_end_offset(skb); that way
> we can return the correct value without gcc having to do all the
> optimization to cancel out skb->head - skb->head.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  drivers/atm/ambassador.c             |    2 +-
>  drivers/atm/idt77252.c               |    2 +-
>  drivers/net/wimax/i2400m/usb-rx.c    |    2 +-
>  drivers/staging/octeon/ethernet-tx.c |    2 +-
>  include/linux/skbuff.h               |   12 +++++++++++-
>  net/core/skbuff.c                    |   12 ++++++------
>  6 files changed, 21 insertions(+), 11 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH 0/3] First pass of cleanups for pskb_expand_head
From: Eric Dumazet @ 2012-05-05  5:44 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev, davem, jeffrey.t.kirsher
In-Reply-To: <20120505001059.21292.31647.stgit@gitlad.jf.intel.com>

On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
> pull the actual value.
> 
> There are a few more items that I will try to get to next week.  The big one
> is the fact that pskb_expand_head can mess up the truesize since it can
> allocate a new head but never updates the truesize.  I plan on adding a helper
> function for the cases where we are just using it unshare the head so I can
> identify the places where we are actually modifying the size.

In the old days, truesize adjustments were done after
pskb_expand_head() calls. (Maybe because some contexts didn't care about
truesize for ephemeral skbs, not charged to a socket)

So it will be a nice cleanup for sure.

^ permalink raw reply

* Re: [PATCH 0/3] First pass of cleanups for pskb_expand_head
From: Alexander Duyck @ 2012-05-05  6:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, davem, jeffrey.t.kirsher
In-Reply-To: <1336196671.3752.490.camel@edumazet-glaptop>

On 5/4/2012 10:44 PM, Eric Dumazet wrote:
> On Fri, 2012-05-04 at 17:26 -0700, Alexander Duyck wrote:
>> pull the actual value.
>>
>> There are a few more items that I will try to get to next week.  The big one
>> is the fact that pskb_expand_head can mess up the truesize since it can
>> allocate a new head but never updates the truesize.  I plan on adding a helper
>> function for the cases where we are just using it to unshare the head so I can
>> identify the places where we are actually modifying the size.
> In the old days, truesize adjustments were done after
> pskb_expand_head() calls. (Maybe because some contexts didn't care about
> truesize for ephemeral skbs, not charged to a socket)
>
> So it will be a nice cleanup for sure.

I suspect the reason for no truesize adjustment is because this function 
gets called in the transmit path, and we probably should be adjusting 
truesize while there is still a destructor in place that will turn 
around and subtract the truesize from the socket memory.  I'm still 
thinking about what would be the best solution to that, but in the 
meantime I figure I can at least add a helper function to handle all the 
pskb_expand_head(skb, 0, 0, GFP_ATOMIC) cases and just replace them with 
something like skb_unshare_head(skb).  That way I will have a better 
idea of the few cases where we might actually impact truesize.

^ permalink raw reply
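
The helper proposed above could look roughly like the following userspace
sketch.  Everything in it is hypothetical: mini_head, mini_skb and
mini_expand_head() are invented stand-ins for the head buffer, struct
sk_buff and pskb_expand_head(), and mini_skb_unshare_head() only
illustrates the shape of the suggested skb_unshare_head(), which is not an
existing kernel function as far as this thread shows.

```c
#include <stdlib.h>
#include <string.h>

/* Reduced stand-ins for illustration; not the kernel structures. */
struct mini_head {
	int refs;		/* users sharing this head */
	size_t size;		/* bytes in data */
	unsigned char *data;
};

struct mini_skb {
	struct mini_head *head;
};

static struct mini_head *mini_head_alloc(size_t size)
{
	struct mini_head *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->data = calloc(1, size ? size : 1);
	if (!h->data) {
		free(h);
		return NULL;
	}
	h->refs = 1;
	h->size = size;
	return h;
}

static void mini_head_put(struct mini_head *h)
{
	if (--h->refs == 0) {
		free(h->data);
		free(h);
	}
}

/* Stand-in for pskb_expand_head(): always moves the skb to a private head. */
static int mini_expand_head(struct mini_skb *skb, size_t nhead, size_t ntail)
{
	struct mini_head *old = skb->head;
	struct mini_head *new = mini_head_alloc(nhead + old->size + ntail);

	if (!new)
		return -1;
	memcpy(new->data + nhead, old->data, old->size);
	mini_head_put(old);
	skb->head = new;
	return 0;
}

/*
 * Hypothetical helper: callers that only want a private head never pass
 * a size change, so they cannot affect the buffer size accounting.
 */
static int mini_skb_unshare_head(struct mini_skb *skb)
{
	return mini_expand_head(skb, 0, 0);
}
```

Routing the (skb, 0, 0) callers through one wrapper leaves only the
callers that pass a non-zero nhead or ntail as candidates for a truesize
adjustment, which is the separation described in the reply above.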

* Re: [net-next 2/8] e1000e: initial support for i217
From: Bjørn Mork @ 2012-05-05  8:01 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Bruce Allan, netdev, gospo, sassmann
In-Reply-To: <1336127716-20383-3-git-send-email-jeffrey.t.kirsher@intel.com>

Jeff Kirsher <jeffrey.t.kirsher@intel.com> writes:

> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
> index 3a50259..11c4666 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -74,7 +74,9 @@
>  #define E1000_WUS_BC           E1000_WUFC_BC
>  
>  /* Extended Device Control */
> +#define E1000_CTRL_EXT_LPCD  0x00000004     /* LCD Power Cycle Done */
>  #define E1000_CTRL_EXT_SDP3_DATA 0x00000080 /* Value of SW Definable Pin 3 */
> +#define E1000_CTRL_EXT_FORCE_SMBUS 0x00000004 /* Force SMBus mode*/
>  #define E1000_CTRL_EXT_EE_RST    0x00002000 /* Reinitialize from EEPROM */
>  #define E1000_CTRL_EXT_SPD_BYPS  0x00008000 /* Speed Select Bypass */
>  #define E1000_CTRL_EXT_RO_DIS    0x00020000 /* Relaxed Ordering disable */

The mangled sorting and alignment of the new entries made me wonder if
this was a typo.  But reading further below it looks like
E1000_CTRL_EXT_LPCD is input and E1000_CTRL_EXT_FORCE_SMBUS is output.
If that is correct, then it probably deserves a small comment here along
with better sorting and alignment to make it clear that the duplicate
value is intentional?


Bjørn

^ permalink raw reply

* Re: [PATCH 01/13 v4] usb/net: rndis: inline the cpu_to_le32() macro
From: Linus Walleij @ 2012-05-05  9:01 UTC (permalink / raw)
  To: Jussi Kivilinna
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Greg Kroah-Hartman, David S. Miller, Felipe Balbi, Haiyang Zhang,
	Wei Yongjun, Ben Hutchings
In-Reply-To: <20120502182938.15804ij32yd8jsis-tzMWlZeEOor1KXRcyAk9cg@public.gmane.org>

On Wed, May 2, 2012 at 5:29 PM, Jussi Kivilinna
<jussi.kivilinna-E01nCVcF24I@public.gmane.org> wrote:

> Quoting Linus Walleij <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>:
>
>> The header file <linux/usb/rndis_host.h> used a number of #defines
>> that included the cpu_to_le32() macro to assure the result will be
>> in LE endianness. Inlining this into the code instead of using it
>> in the code definitions yields consolidation opportunities later
>> on as you will see in the following patches. The individual
>> drivers also used local defines - all are switched over to the
>> pattern of doing the conversion at the call sites instead.
>>
>
> After this patch, endianness checks with sparse output:
(...)
> Patch fixing this attached.

Thanks! Folded this into patch 1 and added your Signed-off-by.

> Patch-set to clean-up ugliness caused by this patch at:
> http://koti.mbnet.fi/axh/kernel/rndis_wlan/

This seems like a good middle-ground as compared to the
other suggestion to force all defines to be cpu_to_le32().

Do you want me to rebase this on top of my series (there were
a number of conflicts later in the series) and carry it as part
of this patch set?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
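
The pattern discussed in this message (constants kept in CPU byte order, converted to little-endian only where a value is stored into a wire structure) can be sketched in userspace C. This is only an illustration under stated assumptions: it uses glibc's htole32()/le32toh() from <endian.h> as stand-ins for the kernel's cpu_to_le32()/le32_to_cpu(), and EXAMPLE_MSG_QUERY, struct example_hdr and the example_* helpers are hypothetical names, not taken from the RNDIS headers.

```c
#define _DEFAULT_SOURCE /* for htole32()/le32toh() on glibc */
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message id, kept in CPU byte order like the reworked #defines. */
#define EXAMPLE_MSG_QUERY 0x00000004U

/* Hypothetical wire header; both fields are little-endian on the wire. */
struct example_hdr {
	uint32_t msg_type;
	uint32_t msg_len;
};

/* The conversion happens at the call site, when the value is stored. */
static void example_fill_hdr(struct example_hdr *hdr, uint32_t len)
{
	assert(hdr != NULL);
	hdr->msg_type = htole32(EXAMPLE_MSG_QUERY);
	hdr->msg_len = htole32(len);
}

/* Reading converts back before comparing against the CPU-order constant. */
static int example_is_query(const struct example_hdr *hdr)
{
	assert(hdr != NULL);
	return le32toh(hdr->msg_type) == EXAMPLE_MSG_QUERY;
}
```

With the constants in CPU order they can also be used directly in switch statements and debug output, which is presumably part of the consolidation opportunity mentioned above.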

^ permalink raw reply

* Re: [PATCH 00/13 v4] usb/net: rndis: first step toward consolidation
From: Linus Walleij @ 2012-05-05  9:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r, balbi-l0cyMroinI0,
	jussi.kivilinna-E01nCVcF24I, haiyangz-0li6OtcxBFHby3iVrkZq2A,
	yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY,
	ben-/+tVBieCtBitmTQ+vhA3Yw
In-Reply-To: <20120502.202020.1507739621351234969.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

On Thu, May 3, 2012 at 2:20 AM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: "Linus Walleij" <linus.walleij-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Date: Tue,  1 May 2012 20:22:09 +0200
>
>> The REAL v4 patch set... forget v3 :-(
>
> You'll definitely need to submit at least a v5, especially
> after all of the endian bugs that have been spotted.

Yep Jussi is helping me to hash this out...

Thanks,
Linus Walleij

^ permalink raw reply

* Re: IP_MULTICAST_IF getsockopt man part
From: Michael Kerrisk (man-pages) @ 2012-05-05 11:57 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: linux-man, netdev
In-Reply-To: <20120504090013.GA2362@minipsycho>

On Fri, May 4, 2012 at 9:00 PM, Jiri Pirko <jpirko@redhat.com> wrote:
> Hi.
>
> <quote>
> IP_MULTICAST_IF (since Linux 1.2)
>                      Set the local device for a multicast socket.
>                      Argument is an ip_mreqn or ip_mreq structure
>                      similar to IP_ADD_MEMBERSHIP.
> </quote>
>
> That is not true. Setsockopt recognizes only ip_mreqn and in_addr. I
> made a patch which makes it recognize ip_mreq as well, so that would
> probably be ok.
> http://patchwork.ozlabs.org/patch/156815/
>
> On the other hand, getsockopt works only with in_addr. I think that is
> good behaviour, but the man page needs to be corrected accordingly (the
> read part needs to be added here).

Jirka,

I'm having trouble understanding what you mean. Perhaps it would be
simplest if you showed your proposed replacement text for the text
quoted above.

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
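
The behaviour under discussion (setting IP_MULTICAST_IF with a struct in_addr and reading it back as a struct in_addr) can be sketched as a small userspace round trip. This is a minimal sketch, not proposed man-page text; example_mcast_if_roundtrip() is a hypothetical helper name, and the claim that getsockopt() returns an in_addr is taken from the quoted report rather than verified against every kernel version.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set IP_MULTICAST_IF from a struct in_addr and read it back.
 * Returns 0 on success and -1 on any socket error.
 */
static int example_mcast_if_roundtrip(struct in_addr want, struct in_addr *got)
{
	socklen_t len = sizeof(*got);
	int rc = -1;
	int fd;

	assert(got != NULL);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Write side: the short form takes just the local interface address. */
	if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &want, sizeof(want)) == 0 &&
	    /* Read side: the result comes back as a struct in_addr. */
	    getsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, got, &len) == 0 &&
	    len == sizeof(*got))
		rc = 0;

	close(fd);
	return rc;
}
```

Passing INADDR_ANY clears the setting, so that value should round-trip even on a host with no configured interfaces; any other address has to belong to a local interface for the setsockopt() call to succeed.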

^ permalink raw reply

* [net-next 0/4][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-05-05 12:38 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series of patches contains updates for e1000e and ixgbe.

NOTE- The ixgbe patch can and probably should be applied to
David Miller's net tree as well.

The following are changes since commit bd14b1b2e29bd6812597f896dde06eaf7c6d2f24:
  tcp: be more strict before accepting ECN negociation
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Bruce Allan (2):
  e1000e: enable forced master/slave on 82577
  e1000e: increase version number

John Fastabend (1):
  ixgbe: dcb: IEEE PFC stats and reset logic incorrect

Richard Alpe (1):
  e1000e: clear REQ and GNT in EECD (82571 && 82572)

 drivers/net/ethernet/intel/e1000e/82571.c       |   12 ++++-
 drivers/net/ethernet/intel/e1000e/netdev.c      |    2 +-
 drivers/net/ethernet/intel/e1000e/phy.c         |   71 ++++++++++++++--------
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |    7 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    6 ++-
 5 files changed, 69 insertions(+), 29 deletions(-)

-- 
1.7.7.6

^ permalink raw reply

* [net-next 1/4] e1000e: enable forced master/slave on 82577
From: Jeff Kirsher @ 2012-05-05 12:38 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1336221493-913-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Like other supported (igp) PHYs, the driver needs to be able to force the
master/slave mode on 82577.  Since the code is the same as what already
exists in the code flow for igp PHYs, move it to a new function to be
called for both flows.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/phy.c |   71 +++++++++++++++++++-----------
 1 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index ada7133..0334d01 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -639,6 +639,45 @@ s32 e1000e_write_kmrn_reg_locked(struct e1000_hw *hw, u32 offset, u16 data)
 }
 
 /**
+ *  e1000_set_master_slave_mode - Setup PHY for Master/slave mode
+ *  @hw: pointer to the HW structure
+ *
+ *  Sets up Master/slave mode
+ **/
+static s32 e1000_set_master_slave_mode(struct e1000_hw *hw)
+{
+	s32 ret_val;
+	u16 phy_data;
+
+	/* Resolve Master/Slave mode */
+	ret_val = e1e_rphy(hw, PHY_1000T_CTRL, &phy_data);
+	if (ret_val)
+		return ret_val;
+
+	/* load defaults for future use */
+	hw->phy.original_ms_type = (phy_data & CR_1000T_MS_ENABLE) ?
+	    ((phy_data & CR_1000T_MS_VALUE) ?
+	     e1000_ms_force_master : e1000_ms_force_slave) : e1000_ms_auto;
+
+	switch (hw->phy.ms_type) {
+	case e1000_ms_force_master:
+		phy_data |= (CR_1000T_MS_ENABLE | CR_1000T_MS_VALUE);
+		break;
+	case e1000_ms_force_slave:
+		phy_data |= CR_1000T_MS_ENABLE;
+		phy_data &= ~(CR_1000T_MS_VALUE);
+		break;
+	case e1000_ms_auto:
+		phy_data &= ~CR_1000T_MS_ENABLE;
+		/* fall-through */
+	default:
+		break;
+	}
+
+	return e1e_wphy(hw, PHY_1000T_CTRL, phy_data);
+}
+
+/**
  *  e1000_copper_link_setup_82577 - Setup 82577 PHY for copper link
  *  @hw: pointer to the HW structure
  *
@@ -659,7 +698,11 @@ s32 e1000_copper_link_setup_82577(struct e1000_hw *hw)
 	/* Enable downshift */
 	phy_data |= I82577_CFG_ENABLE_DOWNSHIFT;
 
-	return e1e_wphy(hw, I82577_CFG_REG, phy_data);
+	ret_val = e1e_wphy(hw, I82577_CFG_REG, phy_data);
+	if (ret_val)
+		return ret_val;
+
+	return e1000_set_master_slave_mode(hw);
 }
 
 /**
@@ -895,31 +938,7 @@ s32 e1000e_copper_link_setup_igp(struct e1000_hw *hw)
 				return ret_val;
 		}
 
-		ret_val = e1e_rphy(hw, PHY_1000T_CTRL, &data);
-		if (ret_val)
-			return ret_val;
-
-		/* load defaults for future use */
-		phy->original_ms_type = (data & CR_1000T_MS_ENABLE) ?
-			((data & CR_1000T_MS_VALUE) ?
-			e1000_ms_force_master :
-			e1000_ms_force_slave) :
-			e1000_ms_auto;
-
-		switch (phy->ms_type) {
-		case e1000_ms_force_master:
-			data |= (CR_1000T_MS_ENABLE | CR_1000T_MS_VALUE);
-			break;
-		case e1000_ms_force_slave:
-			data |= CR_1000T_MS_ENABLE;
-			data &= ~(CR_1000T_MS_VALUE);
-			break;
-		case e1000_ms_auto:
-			data &= ~CR_1000T_MS_ENABLE;
-		default:
-			break;
-		}
-		ret_val = e1e_wphy(hw, PHY_1000T_CTRL, data);
+		ret_val = e1000_set_master_slave_mode(hw);
 	}
 
 	return ret_val;
-- 
1.7.7.6
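
The register handling in e1000_set_master_slave_mode() can be modelled outside the driver as two pure functions: one that decodes the current mode from the 1000BASE-T control register, and one that applies a requested mode. This is a sketch under assumptions: the EX_* names are hypothetical, and the bit values 0x1000 and 0x0800 are assumed to correspond to CR_1000T_MS_ENABLE and CR_1000T_MS_VALUE; the patch itself does not show them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions in the 1000BASE-T control register. */
#define EX_MS_ENABLE 0x1000u /* manual master/slave configuration enabled */
#define EX_MS_VALUE  0x0800u /* 1 = master, 0 = slave (only when enabled) */

enum ex_ms_type { EX_MS_AUTO, EX_MS_FORCE_MASTER, EX_MS_FORCE_SLAVE };

/* Mirrors the "load defaults for future use" step of the patch. */
static enum ex_ms_type ex_decode_ms(uint16_t reg)
{
	if (!(reg & EX_MS_ENABLE))
		return EX_MS_AUTO;
	return (reg & EX_MS_VALUE) ? EX_MS_FORCE_MASTER : EX_MS_FORCE_SLAVE;
}

/* Mirrors the switch statement: returns the register value to write back. */
static uint16_t ex_apply_ms(uint16_t reg, enum ex_ms_type want)
{
	switch (want) {
	case EX_MS_FORCE_MASTER:
		reg |= EX_MS_ENABLE | EX_MS_VALUE;
		break;
	case EX_MS_FORCE_SLAVE:
		reg |= EX_MS_ENABLE;
		reg &= (uint16_t)~EX_MS_VALUE;
		break;
	case EX_MS_AUTO:
		/* Only the enable bit is cleared; the value bit is left alone. */
		reg &= (uint16_t)~EX_MS_ENABLE;
		break;
	}
	return reg;
}

/* Usage example: applying a mode and decoding it again gives the same mode. */
static void ex_ms_selfcheck(void)
{
	assert(ex_decode_ms(ex_apply_ms(0x0300, EX_MS_FORCE_MASTER)) == EX_MS_FORCE_MASTER);
	assert(ex_decode_ms(ex_apply_ms(0x1b00, EX_MS_AUTO)) == EX_MS_AUTO);
}
```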

^ permalink raw reply related

* [net-next 3/4] e1000e: increase version number
From: Jeff Kirsher @ 2012-05-05 12:38 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1336221493-913-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index b53ea83..f648299 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -56,7 +56,7 @@
 
 #define DRV_EXTRAVERSION "-k"
 
-#define DRV_VERSION "1.11.3" DRV_EXTRAVERSION
+#define DRV_VERSION "2.0.0" DRV_EXTRAVERSION
 char e1000e_driver_name[] = "e1000e";
 const char e1000e_driver_version[] = DRV_VERSION;
 
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 2/4] e1000e: clear REQ and GNT in EECD (82571 && 82572)
From: Jeff Kirsher @ 2012-05-05 12:38 UTC (permalink / raw)
  To: davem; +Cc: Richard Alpe, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1336221493-913-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Richard Alpe <richard.alpe@ericsson.com>

Clear the REQ and GNT bit in the eeprom control register (EECD).
This is required if the eeprom is to be accessed with auto read
EERD register.

After a cold reset this doesn't matter, but if the PBIST MAC test was
executed before booting, the register was left in a dirty state
(the 2 bits were set), which caused the read operation to time out
and return 0.

Reference (page 312):
http://download.intel.com/design/network/manuals/316080.pdf

Reported-by: Aleksandar Igic <aleksandar.igic@dektech.com.au>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/e1000e/82571.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 6a8a908..36db4df 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -999,7 +999,7 @@ static s32 e1000_set_d0_lplu_state_82571(struct e1000_hw *hw, bool active)
  **/
 static s32 e1000_reset_hw_82571(struct e1000_hw *hw)
 {
-	u32 ctrl, ctrl_ext;
+	u32 ctrl, ctrl_ext, eecd;
 	s32 ret_val;
 
 	/*
@@ -1072,6 +1072,16 @@ static s32 e1000_reset_hw_82571(struct e1000_hw *hw)
 	 */
 
 	switch (hw->mac.type) {
+	case e1000_82571:
+	case e1000_82572:
+		/*
+		 * REQ and GNT bits need to be cleared when using AUTO_RD
+		 * to access the EEPROM.
+		 */
+		eecd = er32(EECD);
+		eecd &= ~(E1000_EECD_REQ | E1000_EECD_GNT);
+		ew32(EECD, eecd);
+		break;
 	case e1000_82573:
 	case e1000_82574:
 	case e1000_82583:
-- 
1.7.7.6

^ permalink raw reply related

* [net-next 4/4] ixgbe: dcb: IEEE PFC stats and reset logic incorrect
From: Jeff Kirsher @ 2012-05-05 12:38 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1336221493-913-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: John Fastabend <john.r.fastabend@intel.com>

PFC stats are only tabulated when PFC is enabled. However in IEEE
mode the ieee_pfc pfc_tc bits were not checked and the calculation
was aborted.

This results in statistics not being reported through ethtool and
possibly a false Tx hang occurring when receiving pause frames.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c |    7 +++++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |    6 +++++-
 2 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
index 652e4b0..2feacf6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_dcb_nl.c
@@ -662,6 +662,13 @@ static int ixgbe_dcbnl_ieee_setpfc(struct net_device *dev,
 			return -ENOMEM;
 	}
 
+	if (pfc->pfc_en) {
+		adapter->last_lfc_mode = adapter->hw.fc.current_mode;
+		adapter->hw.fc.current_mode = ixgbe_fc_pfc;
+	} else {
+		adapter->hw.fc.current_mode = adapter->last_lfc_mode;
+	}
+
 	prio_tc = adapter->ixgbe_ieee_ets->prio_tc;
 	memcpy(adapter->ixgbe_ieee_pfc, pfc, sizeof(*adapter->ixgbe_ieee_pfc));
 	return ixgbe_dcb_hw_pfc_config(&adapter->hw, pfc->pfc_en, prio_tc);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b2daff3..4048c9d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -637,7 +637,11 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
 			clear_bit(__IXGBE_HANG_CHECK_ARMED,
 				  &adapter->tx_ring[i]->state);
 		return;
-	} else if (!(adapter->dcb_cfg.pfc_mode_enable))
+	} else if (((adapter->dcbx_cap & DCB_CAP_DCBX_VER_CEE) &&
+		    !(adapter->dcb_cfg.pfc_mode_enable)) ||
+		   ((adapter->dcbx_cap & DCB_CAP_DCBX_VER_IEEE) &&
+		    adapter->ixgbe_ieee_pfc &&
+		    !(adapter->ixgbe_ieee_pfc->pfc_en)))
 		return;
 
 	/* update stats for each tc, only valid with PFC enabled */
-- 
1.7.7.6
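
The early-return condition added to ixgbe_update_xoff_received() can be read as a small predicate: skip the per-TC PFC statistics only if the active DCBX mode says PFC is off. The following is a standalone sketch of that predicate, not driver code. The ex_* types are hypothetical stand-ins for the adapter fields, and the flag values 0x04 and 0x08 are assumed to match DCB_CAP_DCBX_VER_CEE and DCB_CAP_DCBX_VER_IEEE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values of the DCBX capability flags. */
#define EX_DCBX_VER_CEE  0x04u
#define EX_DCBX_VER_IEEE 0x08u

/* Hypothetical stand-in for the IEEE PFC configuration. */
struct ex_ieee_pfc {
	unsigned char pfc_en; /* bitmap of priorities with PFC enabled */
};

/* Hypothetical stand-in for the adapter fields the condition reads. */
struct ex_adapter {
	unsigned int dcbx_cap;
	bool cee_pfc_mode_enable;
	const struct ex_ieee_pfc *ieee_pfc; /* may be NULL */
};

/* True when the PFC statistics update should be skipped. */
static bool ex_skip_pfc_stats(const struct ex_adapter *a)
{
	bool cee_off, ieee_off;

	assert(a != NULL);

	cee_off = (a->dcbx_cap & EX_DCBX_VER_CEE) && !a->cee_pfc_mode_enable;
	ieee_off = (a->dcbx_cap & EX_DCBX_VER_IEEE) &&
		   a->ieee_pfc != NULL && !a->ieee_pfc->pfc_en;

	return cee_off || ieee_off;
}
```

The case the patch is about is IEEE mode with a non-zero pfc_en while the CEE flag pfc_mode_enable is clear: the old single check returned early there, whereas this predicate lets the statistics update proceed.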

^ permalink raw reply related

* ipctl - new tool for efficient read/write of net related sysctl
From: Oskar Berggren @ 2012-05-05 15:13 UTC (permalink / raw)
  To: netdev

Hi,

In a project of mine I need to read (and possibly set) many of the properties
found under /proc/sys/net/ipv4/conf/. This is simple enough, except that
when you have hundreds of interfaces, it is really slow. In my tests it takes
about 4 seconds to read a single variable for 700 interfaces. For a while I
worked around this using the binary sysctl() interface, but this is deprecated.

In an experiment to get around this limitation I have created "ipctl", a kernel
module and accompanying user space library/tool. Communication between
kernel and user space is based on generic netlink. What used to take
seconds now happens in a few milliseconds.

So far I have only implemented support for the proxy_arp setting. Do you
think it's worthwhile to pursue this to create something more complete? Are
there other ideas on how one might get fast read/write of the IP-related
settings in procfs?

The full source code is available at:
https://github.com/oskarb/ipctl

Kernel module enclosed below. Haven't done much kernel programming
before, so comments are most welcome!

/Oskar


#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <net/netlink.h>
#include <net/genetlink.h>
#include "../../include/libipctl/ipctl-nl.h"

#define MOD_AUTHOR "Oskar Berggren <oskar.berggren@gmail.com>"
#define MOD_DESC "A module to offer efficient mass control of the IP sysctl family traditionally controlled through /proc."
#define MOD_VER "0.1"


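static int ipctl_get_proxyarp_by_ifindex(int ifIndex, int *on)
{
	struct net *net = &init_net;
	struct net_device *dev;
	struct in_device *in_dev;
	int rc = -ENODEV;

	/* __in_dev_get_rtnl() must be called with the RTNL lock held. */
	rtnl_lock();

	dev = dev_get_by_index(net, ifIndex);
	if (dev) {
		in_dev = __in_dev_get_rtnl(dev);
		if (in_dev) {
			*on = IN_DEV_CONF_GET(in_dev, PROXY_ARP);
			rc = 0;
		}

		dev_put(dev);  /* Release reference. */
	}

	rtnl_unlock();

	/* -ENODEV if the interface or its IPv4 config does not exist. */
	return rc;
}


static int ipctl_set_proxyarp_by_ifindex(int ifIndex, int on)
{
	struct net *net = &init_net;
	struct net_device *dev;
	struct in_device *in_dev;
	int rc = -ENODEV;

	/* __in_dev_get_rtnl() must be called with the RTNL lock held. */
	rtnl_lock();

	dev = dev_get_by_index(net, ifIndex);
	if (dev) {
		in_dev = __in_dev_get_rtnl(dev);
		if (in_dev) {
			IN_DEV_CONF_SET(in_dev, PROXY_ARP, on);
			rc = 0;
		}

		dev_put(dev);  /* Release reference. */
	}

	rtnl_unlock();

	/* -ENODEV if the interface or its IPv4 config does not exist. */
	return rc;
}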


/* family definition */
static struct genl_family ipctl_gnl_family = {
	.id = GENL_ID_GENERATE,
	.hdrsize = 0,
	.name = IPCTL_GENL_NAME,
	.version = IPCTL_GENL_VERSION,
	.maxattr = IPCTL_ATTR_MAX,
};


static int ipctl_reply(struct sk_buff *skb, struct genl_info *info,
		       int property, int ifIndex, int value)
{
	struct sk_buff *skb_reply;
	void *msg_head;
	int rc = -ENOMEM;

	pr_debug("ipctl: reply start\n");

	skb_reply = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
	if (skb_reply == NULL)
		goto out;

	msg_head = genlmsg_put(skb_reply, 0, info->snd_seq,
			       &ipctl_gnl_family, 0, IPCTL_CMD_GET);
	if (msg_head == NULL)
		goto out_free;

	rc = nla_put_u32(skb_reply, IPCTL_ATTR_PROPERTY, property);
	if (rc != 0)
		goto out_free;

	rc = nla_put_u32(skb_reply, IPCTL_ATTR_IFINDEX, ifIndex);
	if (rc != 0)
		goto out_free;

	rc = nla_put_u8(skb_reply, IPCTL_ATTR_VALUE, value);
	if (rc != 0)
		goto out_free;

	/* finalize the message */
	genlmsg_end(skb_reply, msg_head);

	/* genlmsg_reply() consumes skb_reply, so do not free it afterwards */
	rc = genlmsg_reply(skb_reply, info);
	if (rc != 0)
		goto out;

	return 0;

out_free:
	/* the message was never sent, so release it here */
	nlmsg_free(skb_reply);
out:
	pr_warning("ipctl: Error occurred in reply: %d\n", rc);

	return rc;
}


/* handler for SET messages via NETLINK */
int ipctl_set(struct sk_buff *skb, struct genl_info *info)
{
	/* return 0 on success, negative values on failure */
	int property, ifIndex, value;

	/* Reject requests that lack a required attribute. */
	if (!info->attrs[IPCTL_ATTR_PROPERTY] ||
	    !info->attrs[IPCTL_ATTR_IFINDEX] ||
	    !info->attrs[IPCTL_ATTR_VALUE])
		return -EINVAL;

	property = nla_get_u32(info->attrs[IPCTL_ATTR_PROPERTY]);
	ifIndex = nla_get_u32(info->attrs[IPCTL_ATTR_IFINDEX]);
	value = nla_get_u8(info->attrs[IPCTL_ATTR_VALUE]);

	pr_debug("ipctl: set p=%d i=%d v=%d\n", property, ifIndex, value);

	if (property == IPCTL_PROPERTY_PROXYARP)
		return ipctl_set_proxyarp_by_ifindex(ifIndex, value);

	return 0;
}


/* handler for GET messages via NETLINK */
int ipctl_get(struct sk_buff *skb, struct genl_info *info)
{
	/* return 0 on success, negative values on failure */
	int property, ifIndex;
	int value = 0;
	int retval = 0;

	/* Reject requests that lack a required attribute. */
	if (!info->attrs[IPCTL_ATTR_PROPERTY] ||
	    !info->attrs[IPCTL_ATTR_IFINDEX])
		return -EINVAL;

	property = nla_get_u32(info->attrs[IPCTL_ATTR_PROPERTY]);
	ifIndex = nla_get_u32(info->attrs[IPCTL_ATTR_IFINDEX]);

	pr_debug("ipctl: get p=%d i=%d\n", property, ifIndex);

	if (property == IPCTL_PROPERTY_PROXYARP)
		retval = ipctl_get_proxyarp_by_ifindex(ifIndex, &value);

	if (retval)
		return retval;

	return ipctl_reply(skb, info, property, ifIndex, value);
}


/* NETLINK operation definition */
struct genl_ops ipctl_gnl_ops_set = {
	.cmd = IPCTL_CMD_SET,
	.flags = GENL_ADMIN_PERM,
	.policy = ipctl_genl_policy,
	.doit = ipctl_set,
	.dumpit = NULL,
};

struct genl_ops ipctl_gnl_ops_get = {
	.cmd = IPCTL_CMD_GET,
	.flags = 0,
	.policy = ipctl_genl_policy,
	.doit = ipctl_get,
	.dumpit = NULL,
};


static int __init ipctl_init(void)
{
	int rc;

	printk(KERN_INFO "ipctl: %s.\n", MOD_VER);

	rc = genl_register_family(&ipctl_gnl_family);
	if (rc) {
		printk(KERN_ERR "ipctl: genl_register_family: %d.\n", rc);
		return rc;
	}

	rc = genl_register_ops(&ipctl_gnl_family, &ipctl_gnl_ops_set);
	if (rc) {
		printk(KERN_ERR "ipctl: genl_register_ops: %d.\n", rc);
		goto err_unregister;
	}

	rc = genl_register_ops(&ipctl_gnl_family, &ipctl_gnl_ops_get);
	if (rc) {
		printk(KERN_ERR "ipctl: genl_register_ops: %d.\n", rc);
		goto err_unregister;
	}

	return 0;

err_unregister:
	/* Unregistering the family also drops any ops registered so far. */
	genl_unregister_family(&ipctl_gnl_family);

	/*
	 * A non 0 return means init_module failed; module can't be loaded.
	 */
	return rc;
}


static void __exit ipctl_exit(void)
{
	genl_unregister_family(&ipctl_gnl_family);
}


module_init(ipctl_init);
module_exit(ipctl_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR(MOD_AUTHOR);
MODULE_DESCRIPTION(MOD_DESC);
MODULE_VERSION(MOD_VER);
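
For comparison, the procfs path that the message describes as slow amounts to one open/read/close per interface and per setting. A minimal userspace sketch of that baseline follows; the ex_* helper names are hypothetical and are not part of ipctl or its library, and the interface name is assumed to be trusted input with no '/' in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path for an interface's proxy_arp setting. */
static int ex_proxy_arp_path(char *buf, size_t size, const char *ifname)
{
	int n;

	assert(buf != NULL && ifname != NULL);

	n = snprintf(buf, size, "/proc/sys/net/ipv4/conf/%s/proxy_arp", ifname);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Read one setting: a full open/read/close cycle for every interface. */
static int ex_read_proxy_arp(const char *ifname, int *value)
{
	char path[128];
	FILE *f;
	int ok;

	assert(value != NULL);

	if (ex_proxy_arp_path(path, sizeof(path), ifname) != 0)
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

Calling ex_read_proxy_arp() in a loop over several hundred interface names reproduces the access pattern that the generic netlink interface above is meant to replace.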

^ permalink raw reply

* Re: [net-next 5/9] e1000e: Disable ASPM L1 on 82574
From: Nix @ 2012-05-05 16:33 UTC (permalink / raw)
  To: Wyborny, Carolyn, Matthew Garrett
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, Chris Boot,
	netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <87sjfhaukf.fsf@spindle.srvr.nix>

On 3 May 2012, nix@esperi.org.uk outgrape:

> On 3 May 2012, Carolyn Wyborny told this:
>
>> It would be good to know why/how your system is re-enabling the
>> setting. The problem is not solvable in firmware unfortunately and is
>> somewhat platform dependent. MMIO-tracer might be used to try and see
>
> I entirely forgot about that tool! *Definitely* worth trying.
>
> I'll give it a try this weekend.

Well, mmiotrace was a total flop: massive numbers of unexpected
secondary interrupts and a hard lockup. Still, I've now diagnosed this
bug and it's right up Matthew Garrett's street!

Matthew: the problem here is a server with an 82574L (controlled by the
e1000e driver). This NIC has a hardware bug causing it to lock up in a
way that only a reboot can solve in an hour or two if PCIe ASPM is not
disabled during boot (leaving me with my home directory stuck behind a
dead NIC on a headless machine, most annoying). The driver is attempting
to disable it, but failing.

>> when the re-enabling config space is written, but it might be too
>> heavyweight for a live production system.
>
> Given that the re-enabling happens at around the same time as the boot
> scripts finish running (it's done by the time I can log in), that's not
> going to be a problem. Hence my speculation that it's being re-enabled
> when the interface stabilizes (which is, of course, asynchronous) or
> something like that.

This is wrong. The disable never happens. The BIOS has been told to
enable PCIe ASPM. However, the kernel log says:

May  5 17:06:53 spindle info: [    0.629699]  pci0000:00: Requesting ACPI _OSC control (0x1d)
May  5 17:06:53 spindle info: [    0.629941]  pci0000:00: ACPI _OSC request failed (AE_NOT_FOUND), returned control mask: 0x1d
May  5 17:06:53 spindle info: [    0.630373] ACPI _OSC control for PCIe not granted, disabling ASPM

Unless pcie_aspm=force has been specified on the kernel command line,
this flips aspm_disabled to 1.

The e1000e driver then says (with a bit of extra debugging info I
added):

May  5 17:06:53 spindle info: [    1.248153] e1000e 0000:03:00.0: Disabling ASPM L0s L1
May  5 17:06:53 spindle info: [    1.248393] e1000e 0000:03:00.0: Disabling ASPM via pci_disable_link_state_locked()
May  5 17:06:53 spindle info: [    1.248823] e1000e 0000:03:00.0: aspm disabled, not forcing

i.e. because aspm_disabled is set, pci/pcie/aspm.c refuses to make any
changes at all to ASPM link state, not even to turn *off* ASPM on a
device on which the BIOS turned it on at boot. So ASPM remains enabled
and the NIC eventually locks up.

The question here is how to fix it. It appears that the motherboard or
BIOS on this machine does not grant _OSC control even (especially?) if
you have turned on PCIe ASPM in the BIOS. But perhaps even if _OSC is
not granted you should permit PCIe ASPM to be *disabled* by drivers, just
not enabled? (The BIOS appears to be buggy in this area: if you turn off
ASPM, save, and go back into setup, ASPM has turned itself back on
again!)

I'm not sure what the right thing to do is here: I don't know enough
about this area. But it does seem very strange that the only way I have
to turn off PCIe ASPM reliably on this device is to tell the kernel to
forcibly turn it *on*!
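
The decision chain described in this message can be written down as a tiny model. This is a deliberately simplified sketch of the behaviour as reported above, not the code in pci/pcie/aspm.c; struct ex_aspm_policy and the ex_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the global "do not touch ASPM" switch. */
struct ex_aspm_policy {
	bool aspm_disabled;
};

/*
 * Boot-time decision as described in the report: if _OSC control is not
 * granted, ASPM handling is switched off unless pcie_aspm=force was given.
 */
static void ex_osc_result(struct ex_aspm_policy *p, bool osc_granted, bool force)
{
	assert(p != NULL);
	p->aspm_disabled = !osc_granted && !force;
}

/*
 * State of the link after a driver asks for ASPM to be turned off.
 * Returns true if ASPM is still enabled on the link afterwards.
 */
static bool ex_link_aspm_after_disable(const struct ex_aspm_policy *p,
				       bool bios_enabled)
{
	assert(p != NULL);

	if (p->aspm_disabled)
		return bios_enabled; /* request ignored, BIOS state survives */

	return false; /* request honoured, ASPM is off */
}
```

In this model, a BIOS that enables ASPM but does not grant _OSC leaves the link with ASPM on after the driver's disable request, while adding the force option makes the same request succeed, which is the paradox noted at the end of the message.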

^ permalink raw reply

* Re: [net-next PATCH v4 0/8] Managing the forwarding database(FDB)
From: Michael S. Tsirkin @ 2012-05-05 19:53 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Sridhar Samudrala, John Fastabend, shemminger, bhutchings, hadi,
	jeffrey.t.kirsher, netdev, gregory.v.rose, krkumar2
In-Reply-To: <CAGe6so8q26X=HoQx+P-wkoLMtq1NhRerk98-v0cxhUpvMH4zmQ@mail.gmail.com>

On Fri, May 04, 2012 at 01:34:24PM -0700, Roopa Prabhu wrote:
> the qemu patches pointed out above can be reused.

Do you have plans to do this?

^ permalink raw reply

* [PATCH] net/ipv6/af_inet6.c: checkpatch cleanup
From: Eldad Zack @ 2012-05-05 20:13 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: netdev, linux-kernel, Eldad Zack

af_inet6.c:80: ERROR: do not initialise statics to 0 or NULL
af_inet6.c:259: ERROR: spaces required around that '=' (ctx:VxV)
af_inet6.c:394: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:412: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:422: ERROR: do not use assignment in if condition
af_inet6.c:425: ERROR: do not use assignment in if condition
af_inet6.c:433: ERROR: do not use assignment in if condition
af_inet6.c:437: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:446: ERROR: spaces required around that '=' (ctx:VxV)
af_inet6.c:478: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:485: ERROR: that open brace { should be on the previous line
af_inet6.c:485: ERROR: space required before the open parenthesis '('
af_inet6.c:513: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:629: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:647: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:687: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:709: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
af_inet6.c:1073: ERROR: space required before the open parenthesis '('

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 net/ipv6/af_inet6.c |   29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0ad046c..bf8e146 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -77,7 +77,7 @@ struct ipv6_params ipv6_defaults = {
 	.autoconf = 1,
 };
 
-static int disable_ipv6_mod = 0;
+static int disable_ipv6_mod;
 
 module_param_named(disable, disable_ipv6_mod, int, 0444);
 MODULE_PARM_DESC(disable, "Disable IPv6 module such that it is non-functional");
@@ -256,7 +256,7 @@ out_rcu_unlock:
 /* bind for INET6 API */
 int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
-	struct sockaddr_in6 *addr=(struct sockaddr_in6 *)uaddr;
+	struct sockaddr_in6 *addr = (struct sockaddr_in6 *)uaddr;
 	struct sock *sk = sock->sk;
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
@@ -390,7 +390,6 @@ out_unlock:
 	rcu_read_unlock();
 	goto out;
 }
-
 EXPORT_SYMBOL(inet6_bind);
 
 int inet6_release(struct socket *sock)
@@ -408,7 +407,6 @@ int inet6_release(struct socket *sock)
 
 	return inet_release(sock);
 }
-
 EXPORT_SYMBOL(inet6_release);
 
 void inet6_destroy_sock(struct sock *sk)
@@ -419,10 +417,12 @@ void inet6_destroy_sock(struct sock *sk)
 
 	/* Release rx options */
 
-	if ((skb = xchg(&np->pktoptions, NULL)) != NULL)
+	skb = xchg(&np->pktoptions, NULL);
+	if (skb != NULL)
 		kfree_skb(skb);
 
-	if ((skb = xchg(&np->rxpmtu, NULL)) != NULL)
+	skb = xchg(&np->rxpmtu, NULL);
+	if (skb != NULL)
 		kfree_skb(skb);
 
 	/* Free flowlabels */
@@ -430,10 +430,10 @@ void inet6_destroy_sock(struct sock *sk)
 
 	/* Free tx options */
 
-	if ((opt = xchg(&np->opt, NULL)) != NULL)
+	opt = xchg(&np->opt, NULL);
+	if (opt != NULL)
 		sock_kfree_s(sk, opt, opt->tot_len);
 }
-
 EXPORT_SYMBOL_GPL(inet6_destroy_sock);
 
 /*
@@ -443,7 +443,7 @@ EXPORT_SYMBOL_GPL(inet6_destroy_sock);
 int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 		 int *uaddr_len, int peer)
 {
-	struct sockaddr_in6 *sin=(struct sockaddr_in6 *)uaddr;
+	struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr;
 	struct sock *sk = sock->sk;
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
@@ -474,7 +474,6 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr,
 	*uaddr_len = sizeof(*sin);
 	return 0;
 }
-
 EXPORT_SYMBOL(inet6_getname);
 
 int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
@@ -482,8 +481,7 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	struct sock *sk = sock->sk;
 	struct net *net = sock_net(sk);
 
-	switch(cmd)
-	{
+	switch (cmd) {
 	case SIOCGSTAMP:
 		return sock_get_timestamp(sk, (struct timeval __user *)arg);
 
@@ -509,7 +507,6 @@ int inet6_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 	/*NOTREACHED*/
 	return 0;
 }
-
 EXPORT_SYMBOL(inet6_ioctl);
 
 const struct proto_ops inet6_stream_ops = {
@@ -625,7 +622,6 @@ out_illegal:
 	       p->type);
 	goto out;
 }
-
 EXPORT_SYMBOL(inet6_register_protosw);
 
 void
@@ -643,7 +639,6 @@ inet6_unregister_protosw(struct inet_protosw *p)
 		synchronize_net();
 	}
 }
-
 EXPORT_SYMBOL(inet6_unregister_protosw);
 
 int inet6_sk_rebuild_header(struct sock *sk)
@@ -683,7 +678,6 @@ int inet6_sk_rebuild_header(struct sock *sk)
 
 	return 0;
 }
-
 EXPORT_SYMBOL_GPL(inet6_sk_rebuild_header);
 
 int ipv6_opt_accepted(struct sock *sk, struct sk_buff *skb)
@@ -705,7 +699,6 @@ int ipv6_opt_accepted(struct sock *sk, struct sk_buff *skb)
 	}
 	return 0;
 }
-
 EXPORT_SYMBOL_GPL(ipv6_opt_accepted);
 
 static int ipv6_gso_pull_exthdrs(struct sk_buff *skb, int proto)
@@ -1070,7 +1063,7 @@ static int __init inet6_init(void)
 	BUILD_BUG_ON(sizeof(struct inet6_skb_parm) > sizeof(dummy_skb->cb));
 
 	/* Register the socket-side information for inet6_create.  */
-	for(r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
+	for (r = &inetsw6[0]; r < &inetsw6[SOCK_MAX]; ++r)
 		INIT_LIST_HEAD(r);
 
 	if (disable_ipv6_mod) {
-- 
1.7.10

^ permalink raw reply related

* [PATCH resend] [IPV6] remove sysctl accept_source_route
From: Eldad Zack @ 2012-05-05 20:28 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy
  Cc: linux-kernel, netdev, Eldad Zack

The only place where the accept_source_route flag is checked is when we
are processing the type 2 routing header. In that case we only allow it if
(1) it has segments left = 1 and (2) it matches our home address,
which is the behavior required by RFC 6275 (see sections 8.5, 11.3.3), and
it doesn't make sense to block rh2 when we're a mobile node.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
---
 include/linux/ipv6.h |    2 --
 net/ipv6/addrconf.c  |   10 ----------
 net/ipv6/exthdrs.c   |    6 ------
 3 files changed, 18 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 8260ef7..a77c6fe 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -162,7 +162,6 @@ struct ipv6_devconf {
 #endif
 #endif
 	__s32		proxy_ndp;
-	__s32		accept_source_route;
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 	__s32		optimistic_dad;
 #endif
@@ -208,7 +207,6 @@ enum {
 	DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN,
 	DEVCONF_PROXY_NDP,
 	DEVCONF_OPTIMISTIC_DAD,
-	DEVCONF_ACCEPT_SOURCE_ROUTE,
 	DEVCONF_MC_FORWARDING,
 	DEVCONF_DISABLE_IPV6,
 	DEVCONF_ACCEPT_DAD,
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e3b3421..bca2acf 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -194,7 +194,6 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 #endif
 #endif
 	.proxy_ndp		= 0,
-	.accept_source_route	= 0,	/* we do not accept RH0 by default. */
 	.disable_ipv6		= 0,
 	.accept_dad		= 1,
 };
@@ -228,7 +227,6 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 #endif
 #endif
 	.proxy_ndp		= 0,
-	.accept_source_route	= 0,	/* we do not accept RH0 by default. */
 	.disable_ipv6		= 0,
 	.accept_dad		= 1,
 };
@@ -3905,7 +3903,6 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
 #endif
 #endif
 	array[DEVCONF_PROXY_NDP] = cnf->proxy_ndp;
-	array[DEVCONF_ACCEPT_SOURCE_ROUTE] = cnf->accept_source_route;
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 	array[DEVCONF_OPTIMISTIC_DAD] = cnf->optimistic_dad;
 #endif
@@ -4535,13 +4532,6 @@ static struct addrconf_sysctl_table
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
-			.procname	= "accept_source_route",
-			.data		= &ipv6_devconf.accept_source_route,
-			.maxlen		= sizeof(int),
-			.mode		= 0644,
-			.proc_handler	= proc_dointvec,
-		},
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
 		{
 			.procname       = "optimistic_dad",
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index aa0a51e..597cf2a 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -337,12 +337,8 @@ static int ipv6_rthdr_rcv(struct sk_buff *skb)
 	struct ipv6_rt_hdr *hdr;
 	struct rt0_hdr *rthdr;
 	struct net *net = dev_net(skb->dev);
-	int accept_source_route = net->ipv6.devconf_all->accept_source_route;
 
 	idev = __in6_dev_get(skb->dev);
-	if (idev && accept_source_route > idev->cnf.accept_source_route)
-		accept_source_route = idev->cnf.accept_source_route;
-
 	if (!pskb_may_pull(skb, skb_transport_offset(skb) + 8) ||
 	    !pskb_may_pull(skb, (skb_transport_offset(skb) +
 				 ((skb_transport_header(skb)[1] + 1) << 3)))) {
@@ -393,8 +389,6 @@ looped_back:
 	switch (hdr->type) {
 #if defined(CONFIG_IPV6_MIP6) || defined(CONFIG_IPV6_MIP6_MODULE)
 	case IPV6_SRCRT_TYPE_2:
-		if (accept_source_route < 0)
-			goto unknown_rh;
 		/* Silently discard invalid RTH type 2 */
 		if (hdr->hdrlen != 2 || hdr->segments_left != 1) {
 			IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
-- 
1.7.10

^ permalink raw reply related
