Netdev List
 help / color / mirror / Atom feed
* [PATCH 1/2] xen: netfront: refactor code for checking validity of incoming skbs
From: Ian Campbell @ 2011-01-25 17:09 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, netdev
In-Reply-To: <1295975376.14780.6595.camel@zakaz.uk.xensource.com>

Makes future additional validation clearer.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
---
 drivers/net/xen-netfront.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 546de57..4dc347b 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -809,6 +809,18 @@ out:
 	return err;
 }
 
+static int validate_incoming_skb(struct sk_buff *skb)
+{
+	/*
+	 * If the SKB is partial then we must be able to setup the
+	 * checksum fields in the skb.
+	 */
+	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_setup(skb))
+		return 0;
+
+	return 1;
+}
+
 static int handle_incoming_queue(struct net_device *dev,
 				 struct sk_buff_head *rxq)
 {
@@ -829,13 +841,11 @@ static int handle_incoming_queue(struct net_device *dev,
 		/* Ethernet work: Delayed to here as it peeks the header. */
 		skb->protocol = eth_type_trans(skb, dev);
 
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			if (skb_checksum_setup(skb)) {
-				kfree_skb(skb);
-				packets_dropped++;
-				dev->stats.rx_errors++;
-				continue;
-			}
+		if (!validate_incoming_skb(skb)) {
+			kfree_skb(skb);
+			packets_dropped++;
+			dev->stats.rx_errors++;
+			continue;
 		}
 
 		dev->stats.rx_packets++;
-- 
1.5.6.5


^ permalink raw reply related

* [PATCH 2/2] xen: netfront: Drop GSO SKBs which do not have csum_blank.
From: Ian Campbell @ 2011-01-25 17:10 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: Ian Campbell, Jeremy Fitzhardinge, xen-devel, netdev
In-Reply-To: <1295975376.14780.6595.camel@zakaz.uk.xensource.com>

The Linux network stack expects all GSO SKBs to have ip_summed ==
CHECKSUM_PARTIAL (which implies that the frame contains a partial
checksum) and the Xen network ring protocol similarly expects an SKB
which has GSO set to also have NETRX_csum_blank (which also implies a
partial checksum). Therefore drop such frames on receive otherwise
they will trigger the warning in skb_gso_segment.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
---
 drivers/net/xen-netfront.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 4dc347b..0ea47da 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -818,6 +818,10 @@ static int validate_incoming_skb(struct sk_buff *skb)
 	if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_setup(skb))
 		return 0;
 
+	/* A GSO SKB must be partial. */
+	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb))
+		return 0;
+
 	return 1;
 }
 
-- 
1.5.6.5


^ permalink raw reply related

* [PATCH net-next-2.6] pktgen: speedup fragmented skbs
From: Eric Dumazet @ 2011-01-25 17:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

We spend lot of time clearing pages in pktgen.
(Or not clearing them on ipv6 and leaking kernel memory)

Since we dont modify them, we can use one zeroed page, and get
references on it. This page can use NUMA affinity as well.

Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6

Results using skbs with one frag :

Before patch :

Result: OK: 608980458(c608978520+d1938) nsec, 1000000000
(100byte,1frags)
  1642088pps 1313Mb/sec (1313670400bps) errors: 0

After patch :

Result: OK: 345285014(c345283891+d1123) nsec, 1000000000
(100byte,1frags)
  2896158pps 2316Mb/sec (2316926400bps) errors: 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/pktgen.c |  234 +++++++++++++++++---------------------------
 1 files changed, 93 insertions(+), 141 deletions(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index a9e7fc4..17c4e16 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -251,6 +251,7 @@ struct pktgen_dev {
 	int max_pkt_size;	/* = ETH_ZLEN; */
 	int pkt_overhead;	/* overhead for MPLS, VLANs, IPSEC etc */
 	int nfrags;
+	struct page *page;
 	u64 delay;		/* nano-seconds */
 
 	__u64 count;		/* Default No packets to send */
@@ -1134,6 +1135,10 @@ static ssize_t pktgen_if_write(struct file *file,
 		if (node_possible(value)) {
 			pkt_dev->node = value;
 			sprintf(pg_result, "OK: node=%d", pkt_dev->node);
+			if (pkt_dev->page) {
+				put_page(pkt_dev->page);
+				pkt_dev->page = 0;
+			}
 		}
 		else
 			sprintf(pg_result, "ERROR: node not possible");
@@ -2605,6 +2610,90 @@ static inline __be16 build_tci(unsigned int id, unsigned int cfi,
 	return htons(id | (cfi << 12) | (prio << 13));
 }
 
+static void pktgen_finalize_skb(struct pktgen_dev *pkt_dev, struct sk_buff *skb,
+				int datalen)
+{
+	struct timeval timestamp;
+	struct pktgen_hdr *pgh;
+
+	pgh = (struct pktgen_hdr *)skb_put(skb, sizeof(*pgh));
+	datalen -= sizeof(*pgh);
+
+	if (pkt_dev->nfrags <= 0) {
+		pgh = (struct pktgen_hdr *)skb_put(skb, datalen);
+		memset(pgh + 1, 0, datalen);
+	} else {
+		int frags = pkt_dev->nfrags;
+		int i, len;
+
+
+		if (frags > MAX_SKB_FRAGS)
+			frags = MAX_SKB_FRAGS;
+		len = datalen - frags * PAGE_SIZE;
+		if (len > 0) {
+			memset(skb_put(skb, len), 0, len);
+			datalen = frags * PAGE_SIZE;
+		}
+
+		i = 0;
+		while (datalen > 0) {
+			if (unlikely(!pkt_dev->page)) {
+				int node = numa_node_id();
+
+				if (pkt_dev->node >= 0 && (pkt_dev->flags & F_NODE))
+					node = pkt_dev->node;
+				pkt_dev->page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+				if (!pkt_dev->page)
+					break;
+			}
+			skb_shinfo(skb)->frags[i].page = pkt_dev->page;
+			get_page(pkt_dev->page);
+			skb_shinfo(skb)->frags[i].page_offset = 0;
+			skb_shinfo(skb)->frags[i].size =
+			    (datalen < PAGE_SIZE ? datalen : PAGE_SIZE);
+			datalen -= skb_shinfo(skb)->frags[i].size;
+			skb->len += skb_shinfo(skb)->frags[i].size;
+			skb->data_len += skb_shinfo(skb)->frags[i].size;
+			i++;
+			skb_shinfo(skb)->nr_frags = i;
+		}
+
+		while (i < frags) {
+			int rem;
+
+			if (i == 0)
+				break;
+
+			rem = skb_shinfo(skb)->frags[i - 1].size / 2;
+			if (rem == 0)
+				break;
+
+			skb_shinfo(skb)->frags[i - 1].size -= rem;
+
+			skb_shinfo(skb)->frags[i] =
+			    skb_shinfo(skb)->frags[i - 1];
+			get_page(skb_shinfo(skb)->frags[i].page);
+			skb_shinfo(skb)->frags[i].page =
+			    skb_shinfo(skb)->frags[i - 1].page;
+			skb_shinfo(skb)->frags[i].page_offset +=
+			    skb_shinfo(skb)->frags[i - 1].size;
+			skb_shinfo(skb)->frags[i].size = rem;
+			i++;
+			skb_shinfo(skb)->nr_frags = i;
+		}
+	}
+
+	/* Stamp the time, and sequence number,
+	 * convert them to network byte order
+	 */
+	pgh->pgh_magic = htonl(PKTGEN_MAGIC);
+	pgh->seq_num = htonl(pkt_dev->seq_num);
+
+	do_gettimeofday(&timestamp);
+	pgh->tv_sec = htonl(timestamp.tv_sec);
+	pgh->tv_usec = htonl(timestamp.tv_usec);
+}
+
 static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 					struct pktgen_dev *pkt_dev)
 {
@@ -2613,7 +2702,6 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	struct udphdr *udph;
 	int datalen, iplen;
 	struct iphdr *iph;
-	struct pktgen_hdr *pgh = NULL;
 	__be16 protocol = htons(ETH_P_IP);
 	__be32 *mpls;
 	__be16 *vlan_tci = NULL;                 /* Encapsulates priority and VLAN ID */
@@ -2729,76 +2817,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 			   pkt_dev->pkt_overhead);
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
-
-	if (pkt_dev->nfrags <= 0) {
-		pgh = (struct pktgen_hdr *)skb_put(skb, datalen);
-		memset(pgh + 1, 0, datalen - sizeof(struct pktgen_hdr));
-	} else {
-		int frags = pkt_dev->nfrags;
-		int i, len;
-
-		pgh = (struct pktgen_hdr *)(((char *)(udph)) + 8);
-
-		if (frags > MAX_SKB_FRAGS)
-			frags = MAX_SKB_FRAGS;
-		if (datalen > frags * PAGE_SIZE) {
-			len = datalen - frags * PAGE_SIZE;
-			memset(skb_put(skb, len), 0, len);
-			datalen = frags * PAGE_SIZE;
-		}
-
-		i = 0;
-		while (datalen > 0) {
-			struct page *page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
-			skb_shinfo(skb)->frags[i].page = page;
-			skb_shinfo(skb)->frags[i].page_offset = 0;
-			skb_shinfo(skb)->frags[i].size =
-			    (datalen < PAGE_SIZE ? datalen : PAGE_SIZE);
-			datalen -= skb_shinfo(skb)->frags[i].size;
-			skb->len += skb_shinfo(skb)->frags[i].size;
-			skb->data_len += skb_shinfo(skb)->frags[i].size;
-			i++;
-			skb_shinfo(skb)->nr_frags = i;
-		}
-
-		while (i < frags) {
-			int rem;
-
-			if (i == 0)
-				break;
-
-			rem = skb_shinfo(skb)->frags[i - 1].size / 2;
-			if (rem == 0)
-				break;
-
-			skb_shinfo(skb)->frags[i - 1].size -= rem;
-
-			skb_shinfo(skb)->frags[i] =
-			    skb_shinfo(skb)->frags[i - 1];
-			get_page(skb_shinfo(skb)->frags[i].page);
-			skb_shinfo(skb)->frags[i].page =
-			    skb_shinfo(skb)->frags[i - 1].page;
-			skb_shinfo(skb)->frags[i].page_offset +=
-			    skb_shinfo(skb)->frags[i - 1].size;
-			skb_shinfo(skb)->frags[i].size = rem;
-			i++;
-			skb_shinfo(skb)->nr_frags = i;
-		}
-	}
-
-	/* Stamp the time, and sequence number,
-	 * convert them to network byte order
-	 */
-	if (pgh) {
-		struct timeval timestamp;
-
-		pgh->pgh_magic = htonl(PKTGEN_MAGIC);
-		pgh->seq_num = htonl(pkt_dev->seq_num);
-
-		do_gettimeofday(&timestamp);
-		pgh->tv_sec = htonl(timestamp.tv_sec);
-		pgh->tv_usec = htonl(timestamp.tv_usec);
-	}
+	pktgen_finalize_skb(pkt_dev, skb, datalen);
 
 #ifdef CONFIG_XFRM
 	if (!process_ipsec(pkt_dev, skb, protocol))
@@ -2980,7 +2999,6 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	struct udphdr *udph;
 	int datalen;
 	struct ipv6hdr *iph;
-	struct pktgen_hdr *pgh = NULL;
 	__be16 protocol = htons(ETH_P_IPV6);
 	__be32 *mpls;
 	__be16 *vlan_tci = NULL;                 /* Encapsulates priority and VLAN ID */
@@ -3083,75 +3101,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
 
-	if (pkt_dev->nfrags <= 0)
-		pgh = (struct pktgen_hdr *)skb_put(skb, datalen);
-	else {
-		int frags = pkt_dev->nfrags;
-		int i;
-
-		pgh = (struct pktgen_hdr *)(((char *)(udph)) + 8);
-
-		if (frags > MAX_SKB_FRAGS)
-			frags = MAX_SKB_FRAGS;
-		if (datalen > frags * PAGE_SIZE) {
-			skb_put(skb, datalen - frags * PAGE_SIZE);
-			datalen = frags * PAGE_SIZE;
-		}
-
-		i = 0;
-		while (datalen > 0) {
-			struct page *page = alloc_pages(GFP_KERNEL, 0);
-			skb_shinfo(skb)->frags[i].page = page;
-			skb_shinfo(skb)->frags[i].page_offset = 0;
-			skb_shinfo(skb)->frags[i].size =
-			    (datalen < PAGE_SIZE ? datalen : PAGE_SIZE);
-			datalen -= skb_shinfo(skb)->frags[i].size;
-			skb->len += skb_shinfo(skb)->frags[i].size;
-			skb->data_len += skb_shinfo(skb)->frags[i].size;
-			i++;
-			skb_shinfo(skb)->nr_frags = i;
-		}
-
-		while (i < frags) {
-			int rem;
-
-			if (i == 0)
-				break;
-
-			rem = skb_shinfo(skb)->frags[i - 1].size / 2;
-			if (rem == 0)
-				break;
-
-			skb_shinfo(skb)->frags[i - 1].size -= rem;
-
-			skb_shinfo(skb)->frags[i] =
-			    skb_shinfo(skb)->frags[i - 1];
-			get_page(skb_shinfo(skb)->frags[i].page);
-			skb_shinfo(skb)->frags[i].page =
-			    skb_shinfo(skb)->frags[i - 1].page;
-			skb_shinfo(skb)->frags[i].page_offset +=
-			    skb_shinfo(skb)->frags[i - 1].size;
-			skb_shinfo(skb)->frags[i].size = rem;
-			i++;
-			skb_shinfo(skb)->nr_frags = i;
-		}
-	}
-
-	/* Stamp the time, and sequence number,
-	 * convert them to network byte order
-	 * should we update cloned packets too ?
-	 */
-	if (pgh) {
-		struct timeval timestamp;
-
-		pgh->pgh_magic = htonl(PKTGEN_MAGIC);
-		pgh->seq_num = htonl(pkt_dev->seq_num);
-
-		do_gettimeofday(&timestamp);
-		pgh->tv_sec = htonl(timestamp.tv_sec);
-		pgh->tv_usec = htonl(timestamp.tv_usec);
-	}
-	/* pkt_dev->seq_num++; FF: you really mean this? */
+	pktgen_finalize_skb(pkt_dev, skb, datalen);
 
 	return skb;
 }
@@ -3884,6 +3834,8 @@ static int pktgen_remove_device(struct pktgen_thread *t,
 	free_SAs(pkt_dev);
 #endif
 	vfree(pkt_dev->flows);
+	if (pkt_dev->page)
+		put_page(pkt_dev->page);
 	kfree(pkt_dev);
 	return 0;
 }



^ permalink raw reply related

* Re: [PATCH net-next-2.6] pktgen: speedup fragmented skbs
From: Ben Greear @ 2011-01-25 17:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1295975635.3588.337.camel@edumazet-laptop>

On 01/25/2011 09:13 AM, Eric Dumazet wrote:
> We spend lot of time clearing pages in pktgen.
> (Or not clearing them on ipv6 and leaking kernel memory)
>
> Since we dont modify them, we can use one zeroed page, and get
> references on it. This page can use NUMA affinity as well.
>
> Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6

Some devices, like vlans, can change the skb, but perhaps they
will not actually mess with the paged data?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [net-next 08/12] ixgb: convert to new VLAN model
From: Jesse Gross @ 2011-01-25 17:22 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	bphilips@novell.com, Pieper, Jeffrey E
In-Reply-To: <EA929A9653AAE14F841771FB1DE5A136602DEE63CB@rrsmsx501.amr.corp.intel.com>

On Sun, Jan 23, 2011 at 4:25 PM, Tantilov, Emil S
<emil.s.tantilov@intel.com> wrote:
> Jesse Gross wrote:
>> On Thu, Jan 6, 2011 at 7:29 PM,  <jeffrey.t.kirsher@intel.com> wrote:
>>> +static int ixgb_set_flags(struct net_device *netdev, u32 data) +{
>>> +       struct ixgb_adapter *adapter = netdev_priv(netdev); +
>>> bool need_reset; +       int rc;
>>> +
>>> +       /*
>>> +        * TX vlan insertion does not work per HW design when Rx
>>> stripping is +        * disabled.  Disable txvlan when rxvlan is
>>> off. +        */ +       if ((data & ETH_FLAG_RXVLAN) !=
>>> (netdev->features & NETIF_F_HW_VLAN_RX)) +               data ^=
>>> ETH_FLAG_TXVLAN;
>>
>> Does this really do the right thing?  If the RX vlan setting is
>> changed, it will do the opposite of what the user requested for TX
>> vlan?
>>
>> So if I start with both on (the default) and turn them both off in one
>> command (a valid setting), I will get RX off and TX on (an invalid
>> setting).
>>
>> Why not:
>>
>> if (!(data & ETH_FLAG_RXVLAN))
>>         data &= ~ETH_FLAG_TXVLAN;
>
> Yeah that works for disabling rxvlan, but what if rxvlan is disabled, and the user attempts to enable txvlan? At least our validation argued that we should make it work both ways. Perhaps something like the following?
>
>        if (!(data & ETH_FLAG_RXVLAN) &&
>           (netdev->features & NETIF_F_HW_VLAN_TX))
>                data &= ~ETH_FLAG_TXVLAN;
>        else if (data & ETH_FLAG_TXVLAN)
>                data |= ETH_FLAG_RXVLAN;

I think the logic above does what you describe and will always result
in a consistent state.  Turning dependent features on when needed is a
little bit inconsistent with the rest of Ethtool (for example, turning
on TSO when checksum offloading is off will not enable checksum
offloading, it will produce an error).  However, I know that drivers
aren't completely consistent here and the most important part is that
it enforces valid states, so I don't have a strong opinion.  Ben's
previous suggestion of Ethtool querying again after the operation and
reporting any flags that were automatically changed would help a lot
here.

^ permalink raw reply

* Re: [PATCH net-next-2.6] pktgen: speedup fragmented skbs
From: Eric Dumazet @ 2011-01-25 17:47 UTC (permalink / raw)
  To: Ben Greear; +Cc: David Miller, netdev
In-Reply-To: <4D3F06BF.7010903@candelatech.com>

Le mardi 25 janvier 2011 à 09:22 -0800, Ben Greear a écrit :
> On 01/25/2011 09:13 AM, Eric Dumazet wrote:
> > We spend lot of time clearing pages in pktgen.
> > (Or not clearing them on ipv6 and leaking kernel memory)
> >
> > Since we dont modify them, we can use one zeroed page, and get
> > references on it. This page can use NUMA affinity as well.
> >
> > Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6
> 
> Some devices, like vlans, can change the skb, but perhaps they
> will not actually mess with the paged data?

Yes, its absolutely forbidden to write on paged data.

If necessary, a COW must be done.




^ permalink raw reply

* Re: STMMAC driver: NFS Problem on 2.6.37
From: Chuck Lever @ 2011-01-25 18:04 UTC (permalink / raw)
  To: deepaksi
  Cc: Armando VISCONTI, Trond Myklebust, netdev-u79uwXL29TY76Z2rM5mHXA,
	Linux NFS Mailing List, Shiraz HASHIM, Viresh KUMAR,
	Peppe CAVALLARO, amitgoel
In-Reply-To: <4D3EBA54.4020308-qxv4g6HH51o@public.gmane.org>

See analysis in line.

On Jan 25, 2011, at 6:56 AM, deepaksi wrote:

> Hi All,
> 
> Please find more debug information on the issue. The log below is related to the kernel debug information.
> 
> 1. Kernel Debug Information log : ( The relevant portions are marked in bold )
> 
> [    2.980000] stmmac: Rx Checksum Offload Engine supported
> [    3.980000]  ** phydev->state 4 
> [    4.000000] IP-Config: Complete:                                                                                        /* Ethernet device open print */
> [    4.000000]      device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=192.168.1.1,
> [    4.010000]      host=192.168.1.10, domain=, nis-domain=(none),
> [    4.010000]      bootserver=192.168.1.1, rootserver=192.168.1.1, rootpath=
> [    4.020000] Root-NFS: nfsroot=/opt/STM/STLinux-2.4/devkit/armv7/target
> [    4.030000] NFS: nfs mount opts='nolock,addr=192.168.1.1'
> [    4.030000] NFS:   parsing nfs mount option 'nolock'
> [    4.040000] NFS:   parsing nfs mount option 'addr=192.168.1.1'
> [    4.040000] NFS: MNTPATH: '/opt/STM/STLinux-2.4/devkit/armv7/target'
> [    4.050000] NFS: sending MNT request for 192.168.1.1:/opt/STM/STLinux-2.4/devkit/armv7/target
> [    4.060000] Calling rpc_create
> [    4.060000] RPC:       set up xprt to 192.168.1.1 (autobind) via tcp
> [    4.070000] RPC:       created transport cf894000 with 16 slots
> [    4.080000] xprt_create_transport: RPC:       created transport cf894000 with 16 slots
> [    4.080000] RPC:       creating mount client for 192.168.1.1 (xprt cf894000)
> [    4.090000] RPC:       creating UNIX authenticator for client cfa90800
> [    4.100000] Calling rpc_ping
> [    4.100000] RPC:       new task initialized, procpid 1
> [    4.100000] RPC:       allocated task cfa11980
> [    4.110000] RPC:     1 __rpc_execute flags=0x680
> [    4.110000] RPC:     1 call_start mount3 proc NULL (sync)

The first RPC request is a MNT protocol request to determine the NFS file handle for the export to be mounted.

> [    4.120000] RPC:     1 call_reserve (status 0)
> [    4.120000] RPC:     1 reserved req cfb80000 xid f68b1f23
> [    4.130000] RPC:     1 call_reserveresult (status 0)
> [    4.130000] RPC:     1 call_refresh (status 0)
> [    4.140000] RPC:     1 holding NULL cred c0492798
> [    4.140000] RPC:     1 refreshing NULL cred c0492798
> [    4.150000] RPC:     1 call_refreshresult (status 0)
> [    4.150000] RPC:     1 call_allocate (status 0)
> [    4.160000] RPC:     1 allocated buffer of size 92 at cf894800
> [    4.160000] RPC:     1 call_bind (status 0)
> [    4.170000] RPC:     1 rpcb_getport_async(192.168.1.1, 100005, 3, 6)
> [    4.170000] RPC:     1 sleep_on(queue "xprt_binding" time 4294937713)
> [    4.180000] RPC:     1 added to queue cf8940a4 "xprt_binding"
> [    4.190000] RPC:     1 setting alarm for 60000 ms
> [    4.190000] RPC:     1 rpcb_getport_async: trying rpcbind version 2
> [    4.200000] Calling rpc_create
> [    4.200000] RPC:       set up xprt to 192.168.1.1 (port 111) via tcp
> [    4.210000] RPC:       created transport cf895000 with 16 slots
> [    4.210000] xprt_create_transport: RPC:       created transport cf895000 with 16 slots
> [    4.220000] RPC:       creating rpcbind client for 192.168.1.1 (xprt cf895000)
> [    4.230000] RPC:       creating UNIX authenticator for client cfa90a00
> [    4.230000] rpc_create returns 0xcfa90a00
> [    4.240000] RPC:       new task initialized, procpid 1
> [    4.240000] RPC:       allocated task cfa11a00
> [    4.250000] RPC:       rpc_release_client(cfa90a00)
> [    4.250000] RPC:     1 sync task going to sleep
> [    4.260000] RPC:     2 __rpc_execute flags=0x681
> [    4.260000] RPC:     2 call_start rpcbind2 proc GETPORT (async)

This is a child RPC request, which is an rpcbind query to discover the port where the NFS server's MNT service resides.  This is necessary before a transport socket for the MNT request can be set up.  The MNT request is put to sleep while the rpcbind query proceeds.

> [    4.270000] RPC:     2 call_reserve (status 0)
> [    4.270000] RPC:     2 reserved req cfb81000 xid 59bc347d
> [    4.280000] RPC:     2 call_reserveresult (status 0)
> [    4.280000] RPC:     2 call_refresh (status 0)
> [    4.290000] RPC:     2 looking up UNIX cred
> [    4.290000] RPC:       looking up UNIX cred
> [    4.300000] RPC:       allocating UNIX cred for uid 0 gid 0
> [    4.300000] RPC:     2 refreshing UNIX cred cfa11a80
> [    4.310000] RPC:     2 call_refreshresult (status 0)
> [    4.310000] RPC:     2 call_allocate (status 0)
> [    4.320000] RPC:     2 allocated buffer of size 412 at cf895800
> [    4.320000] RPC:     2 call_bind (status 0)
> [    4.330000] RPC:     2 call_connect xprt cf895000 is not connected
> [    4.330000] RPC:     2 xprt_connect xprt cf895000 is not connected
> [    4.340000] RPC:     2 sleep_on(queue "xprt_pending" time 4294937730)
> [    4.340000] RPC:     2 added to queue cf8951dc "xprt_pending"
> [    4.350000] RPC:     2 setting alarm for 60000 ms
> [    4.360000] RPC:       xs_connect scheduled xprt cf895000
> [    4.360000] RPC:       xs_bind 0.0.0.0:0: ok (0)
> [    4.370000] RPC:       worker connecting xprt cf895000 via tcp to 192.168.1.1 (port 111)
> [    4.370000] RPC:       cf895000 connect status 115 connected 0 sock state 2

A transport socket is set up for the rpcbind query, which connects to port 111 on the server.  The RPC client puts the rpcbind query request to sleep waiting for the transport socket connection to complete.  This is done in a worker thread.

> [    6.980000] PHY: 0:05 - Link is Up - 1000/Full            /* As soon as network link is up, post auto negotiation, we get the error below ( Unable to reach icmp)*/
> [    7.380000] RPC:       xs_error_report client cf895000...    /* The ethernet phy on our board typically takes 3 seconds for auto negotiation process */
> [    7.380000] RPC:       error 113                                 /* In case the frequency of the phy state machine (in phy framework) polling is increased from HZ to HZ/10, we do not face this issue*/

After three seconds, the initial socket connection attempt is getting EHOSTUNREACH.  The transport socket's error report callback wakes up RPC task 2 with status -EAGAIN.

"we do not face this issue" -- in that case, does the connection attempt succeed?  Do you have debugging output for that case?

> [    7.390000] RPC:     2 __rpc_wake_up_task (now 4294938035)
> [    7.390000] RPC:     2 disabling timer
> [    7.390000] RPC:     2 removed from queue cf8951dc "xprt_pending"
> [    7.400000] RPC:       __rpc_wake_up_task done
> [    7.410000] RPC:       xs_tcp_state_change client cf895000...
> [    7.410000] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [    7.420000] RPC:       disconnected transport cf895000

The transport is marked disconnected here by the transport socket's state change callback.  The socket state is TCP_CLOSE.  xprt_disconnect_done() wakes up any pending tasks with -EAGAIN (a second wake up).

> [    7.420000] RPC:     2 __rpc_execute flags=0x681
> [    7.420000] ************* skb->protocol = 608**************
> [    7.420000] ************* skb->protocol = 8**************
> [    7.440000] RPC:     2 xprt_connect_status: retrying

Here is RPC task 2 waking up in xprt_connect_status with status -EAGAIN.

> [    7.440000] RPC:     2 call_connect_status (status -11)
> [    7.450000] RPC:     2 call_transmit (status 0)
> [    7.450000] RPC:     2 xprt_prepare_transmit
> [    7.460000] RPC:     2 rpc_xdr_encode (status 0)
> [    7.460000] RPC:     2 marshaling UNIX cred cfa11a80
> [    7.470000] RPC:     2 using AUTH_UNIX cred cfa11a80 to wrap rpc data
> [    7.470000] RPC:     2 encoding PMAP_GETPORT call (100005, 3, 6, 0)
> [    7.480000] RPC:     2 xprt_transmit(92)
> [    7.480000] RPC:       xs_tcp_send_request(92) = -113

> [    7.490000] RPC:       sendmsg returned unrecognized error 113

I don't see a fresh connection attempt here.  The RPC client appears to be attempting to send on an unconnected transport socket.  The next send request is getting EHOSTUNREACH again, and thus the rpcbind query fails.

I thought there used to be code in xprt_prepare_transmit() to reconnect if needed, but I no longer find it here.  call_connect_status() handles -EAGAIN by advancing the state machine to call_transmit(), so the FSM does expect call_transmit() and its children to deal with an unconnected transport.

Possibly relevant commits: 2a491991, c8485e4d

In any event, the outcome is probably correct: if the RPC client gets EHOSTUNREACH while trying to connect or send, then it will fail this RPC request.

> [    7.490000] RPC:       xs_tcp_state_change client cf895000...
> [    7.500000] RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
> [    7.510000] RPC:       disconnected transport cf895000
> [    7.510000] RPC:       wake_up_next(cf895174 "xprt_resend")
> [    7.520000] RPC:       wake_up_next(cf89510c "xprt_sending")
> [    7.520000] RPC:       setting port for xprt cf894000 to 0
> [    7.530000] RPC:     2 rpcb_getport_done(status -113, port 0)
> [    7.530000] RPC:     2 return 0, status -113
> [    7.540000] RPC:     2 release task
> [    7.540000] RPC:       freeing buffer of size 412 at cf895800
> [    7.550000] RPC:     2 release request cfb81000
> [    7.550000] RPC:       wake_up_next(cf895244 "xprt_backlog")
> [    7.560000] RPC:       rpc_release_client(cfa90a00)
> [    7.560000] RPC:       destroying rpcbind client for 192.168.1.1
> [    7.570000] RPC:       destroying transport cf895000
> [    7.570000] RPC:       xs_destroy xprt cf895000
> [    7.580000] RPC:       xs_close xprt cf895000
> [    7.580000] RPC:       disconnected transport cf895000
> [    7.590000] RPC:     2 freeing task
> [    7.590000] RPC:     1 __rpc_wake_up_task (now 4294938055)
> [    7.600000] RPC:     1 disabling timer
> [    7.600000] RPC:     1 removed from queue cf8940a4 "xprt_binding"
> [    7.610000] RPC:       __rpc_wake_up_task done
> [    7.610000] RPC:     1 sync task resuming
> [    7.620000] RPC:     1 remote rpcbind unreachable: -113
> [    7.620000] RPC:     1 return 0, status -113
> [    7.630000] RPC:     1 release task
> [    7.630000] RPC:       freeing buffer of size 92 at cf894800
> [    7.630000] RPC:     1 release request cfb80000
> [    7.640000] RPC:       wake_up_next(cf894244 "xprt_backlog")
> [    7.640000] RPC:       rpc_release_client(cfa90800)
> [    7.650000] RPC:     1 freeing task
> [    7.650000] rpc_ping returned -113
> [    7.660000] Calling rpc_ping failed

Note that the RPC ping is for RPC task 1, the MNT request, not for the rpcbind query (RPC task 2).

An RPC ping for the MNT request can't proceed until the rpcbind query has discovered the correct server port to connect to.  The rpcbind query is done first, then, and is what is failing to connect.  So RPC ping is a red herring here -- the RPC client doesn't get far enough to send a MNT NULL request.

  [ ... remaining dump contents snipped ... ]

> We have made following observations
> 1. It seems that the time taken by phy auto negotiation process is long and as soon as the link gets up rpc ping request is getting timed out and we receive "Unable to reach ICMP" error. The time out error is same even if you do not connect a network cable and do a nfs boot.
> 2. We tried to modify the rate at which the work queue is scheduled in the phy framework. instead of scheduling every HZ ( 1 sec), we modified it to HZ/10. We did not received the error. This probably reduced the margin of the phy framework informing the kernel that the link is up.
> 3. We tried to use another network card and did a nfs boot. The only relevant difference we could find was the time of auto negotiation.

Can you post a similar debugging dump of a root mount that succeeds using a different network card?

> Are there some changes in the kernel framework w.r.t rpc ping time out ? This problem was not there in previous kernels.

There have been changes in the RPC socket code around how it manages recovery from failed attempts to connect.  We also have new logic now in the RPC client that causes RPC ping to fail immediately if a host can't be reached.

Thanks for your efforts so far.  It would be helpful if you could bisect to determine which commit(s) introduced this RPC client behavior (or any related changes to your network driver behavior).

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [net-next 08/12] ixgb: convert to new VLAN model
From: Tantilov, Emil S @ 2011-01-25 18:20 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	bphilips@novell.com, Pieper, Jeffrey E, Ben Hutchings
In-Reply-To: <AANLkTikS7zw5OfPFTJBUeUyy39Tx26Wkg9FX3n77-v1G@mail.gmail.com>

>-----Original Message-----
>From: Jesse Gross [mailto:jesse@nicira.com]
>Sent: Tuesday, January 25, 2011 9:23 AM
>To: Tantilov, Emil S
>Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
>bphilips@novell.com; Pieper, Jeffrey E
>Subject: Re: [net-next 08/12] ixgb: convert to new VLAN model
>
>On Sun, Jan 23, 2011 at 4:25 PM, Tantilov, Emil S
><emil.s.tantilov@intel.com> wrote:
>> Jesse Gross wrote:
>>> On Thu, Jan 6, 2011 at 7:29 PM,  <jeffrey.t.kirsher@intel.com> wrote:
>>>> +static int ixgb_set_flags(struct net_device *netdev, u32 data) +{
>>>> +       struct ixgb_adapter *adapter = netdev_priv(netdev); +
>>>> bool need_reset; +       int rc;
>>>> +
>>>> +       /*
>>>> +        * TX vlan insertion does not work per HW design when Rx
>>>> stripping is +        * disabled.  Disable txvlan when rxvlan is
>>>> off. +        */ +       if ((data & ETH_FLAG_RXVLAN) !=
>>>> (netdev->features & NETIF_F_HW_VLAN_RX)) +               data ^=
>>>> ETH_FLAG_TXVLAN;
>>>
>>> Does this really do the right thing?  If the RX vlan setting is
>>> changed, it will do the opposite of what the user requested for TX
>>> vlan?
>>>
>>> So if I start with both on (the default) and turn them both off in one
>>> command (a valid setting), I will get RX off and TX on (an invalid
>>> setting).
>>>
>>> Why not:
>>>
>>> if (!(data & ETH_FLAG_RXVLAN))
>>>         data &= ~ETH_FLAG_TXVLAN;
>>
>> Yeah that works for disabling rxvlan, but what if rxvlan is disabled, and
>the user attempts to enable txvlan? At least our validation argued that we
>should make it work both ways. Perhaps something like the following?
>>
>>        if (!(data & ETH_FLAG_RXVLAN) &&
>>           (netdev->features & NETIF_F_HW_VLAN_TX))
>>                data &= ~ETH_FLAG_TXVLAN;
>>        else if (data & ETH_FLAG_TXVLAN)
>>                data |= ETH_FLAG_RXVLAN;
>
>I think the logic above does what you describe and will always result
>in a consistent state.  Turning dependent features on when needed is a
>little bit inconsistent with the rest of Ethtool (for example, turning
>on TSO when checksum offloading is off will not enable checksum
>offloading, it will produce an error).  However, I know that drivers

That is the reason I asked, as I don't want to keep bouncing the patch back and forth. Personally I like the idea of helping the user and adjusting the flags to something that works rather than a generic error message.

>aren't completely consistent here and the most important part is that
>it enforces valid states, so I don't have a strong opinion.  Ben's
>previous suggestion of Ethtool querying again after the operation and
>reporting any flags that were automatically changed would help a lot
>here.

Sure, but I think a savvy user would always check the result of an ethtool command (ie. `ethtool -K` followed with `ethtool -k`, -A/-a, etc).
`
Added Ben in case he has comments.

Thanks,
Emil

^ permalink raw reply

* [PATCH net-next-2.6] bonding: fix return value of bonding_store_slaves_active()
From: Jiri Pirko @ 2011-01-25 18:53 UTC (permalink / raw)
  To: netdev; +Cc: netdev, fubar, davem

count is incorrectly returned even in case of fail. Return ret instead.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 8fd0174..f33155f 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1595,7 +1595,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
 		}
 	}
 out:
-	return count;
+	return ret;
 }
 static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
 		   bonding_show_slaves_active, bonding_store_slaves_active);

^ permalink raw reply related

* Re: [PATCH] ipv6: Revert 'administrative down' address handling changes.
From: Eric W. Biederman @ 2011-01-25 18:55 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, shemminger, brian.haley, lorenzo, herbert
In-Reply-To: <m1zkqp2woo.fsf@fess.ebiederm.org>

ebiederm@xmission.com (Eric W. Biederman) writes:

> David Miller <davem@davemloft.net> writes:
>
>> From: David Miller <davem@davemloft.net>
>> Date: Sun, 23 Jan 2011 23:41:01 -0800 (PST)
>>
>>> Eric B. and co., please do some testing to make sure all of your
>>> disable_ipv6 cases are functioning properly with this applied.
>>
>> Ping?
>
> In progress.  I had to make a small change to your patch to get it
> to apply against 2.6.37.  neigh_ifdown has not been removed from the
> beginning of addrconf_ifdown there.  The piece that was failing for
> me is not failing now so, so far so good.
>
> It was reported that in 2.6.37 there was a new regression that
> 1connecting to ::1 when ipv6 was disabled would not fail immediately but
> would have to wait a while.  With your patch applied I am not seeing
> that behavior either.
>
> Tomorrow I should know if I see any weird side effects with your patch,
> after my regression tests for everything else have finished running.

I have to admit I am seeing weird side effects.  A test that was
mysteriously failing because it could not create a vlan on top of a tap
device has started working again ;)

So this change is looking really good for me.

I have to track through some additional failures that I think are test
bugs, but it looks like I might finally have 2.6.37 in a shape where I
can use it.

Eric

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: fix return value of bonding_store_slaves_active()
From: Jay Vosburgh @ 2011-01-25 19:22 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem
In-Reply-To: <20110125185315.GB2932@psychotron.redhat.com>

Jiri Pirko <jpirko@redhat.com> wrote:

>count is incorrectly returned even in case of fail. Return ret instead.
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>

	It looks like bonding_store_carrier has the same problem, could
you fix it up too and resubmit?

	-J


>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 8fd0174..f33155f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -1595,7 +1595,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
> 		}
> 	}
> out:
>-	return count;
>+	return ret;
> }
> static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
> 		   bonding_show_slaves_active, bonding_store_slaves_active);


^ permalink raw reply

* Re: [PATCH] typhoon: Kill references to UTS_RELEASE
From: David Dillow @ 2011-01-25 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110124.151214.28803162.davem@davemloft.net>

On Mon, 2011-01-24 at 15:12 -0800, David Miller wrote:
> This makes the driver get rebuilt every single time you
> type 'make' which is beyond rediculious.

Indeed, that's stupid.

> I hereby declare this driver to have version "1.0"

Wish you'd gone to 1.6 or 2.0 since the last version before Joe's/my
changes was 1.5.9, but it's pretty irrelevant anyways. Hmm, does
anything complain if we just remove the MODULE_VERSION?



^ permalink raw reply

* Re: [PATCH] typhoon: Kill references to UTS_RELEASE
From: David Miller @ 2011-01-25 20:33 UTC (permalink / raw)
  To: dave; +Cc: netdev
In-Reply-To: <1295984365.11300.4.camel@lap75545.ornl.gov>

From: David Dillow <dave@thedillows.org>
Date: Tue, 25 Jan 2011 14:39:25 -0500

> On Mon, 2011-01-24 at 15:12 -0800, David Miller wrote:
>> I hereby declare this driver to have version "1.0"
> 
> Wish you'd gone to 1.6 or 2.0 since the last version before Joe's/my
> changes was 1.5.9, but it's pretty irrelevant anyways. Hmm, does
> anything complain if we just remove the MODULE_VERSION?

I don't think anything does, but it's presence helps distribution
vendors so they can add their own sub-version suffix to the driver
version when they make local changes.

And you really do want that when users report problems but are using
vendor kernels, it makes the fact that local changes exist from the
vendor easy to see.

^ permalink raw reply

* Re: [PATCH v2 02/16] net: change netdev->features to u32
From: David Miller @ 2011-01-25 20:37 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, bhutchings
In-Reply-To: <20110125110745.GA30979@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Tue, 25 Jan 2011 12:07:45 +0100

> On Mon, Jan 24, 2011 at 03:30:04PM -0800, David Miller wrote:
>> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>> Date: Sat, 22 Jan 2011 23:14:12 +0100 (CET)
>> > Quoting Ben Hutchings: we presumably won't be defining features that
>> > can only be enabled on 64-bit architectures.
>> > 
>> > Occurences found by `grep -r` on net/, drivers/net, include/
>> > 
>> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>> I'll apply this, with one change, adding the suggestion that
>> features and vlan_features be moved next to each other in
>> struct netdev, from Eric Dumazet.
> 
> You can use the one I sent in reply to Eric's suggestion.
> It's this one: http://patchwork.ozlabs.org/patch/80056/

Sorry I only noticed that after I applied my modified version.

^ permalink raw reply

* Re: [PATCH] netconsole: kbuild dependency fix
From: Stanislav Fomichev @ 2011-01-25 20:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110124.233705.260069637.davem@davemloft.net>

> Please show the configuration and build error messages from when the
> build fails, otherwise we have to guess what causes the dependency.
Hm, I can't reproduce it (the config which caused the error was not saved).
And honestly, looking deeper into the code I can't imagine how could I get
an error here.
So sorry for confusing you.

^ permalink raw reply

* Re: [PATCH] ipv6: Revert 'administrative down' address handling changes.
From: David Miller @ 2011-01-25 20:49 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, shemminger, brian.haley, lorenzo, herbert
In-Reply-To: <m1ei803ja4.fsf@fess.ebiederm.org>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 25 Jan 2011 10:55:31 -0800

> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
>> David Miller <davem@davemloft.net> writes:
>>
>>> From: David Miller <davem@davemloft.net>
>>> Date: Sun, 23 Jan 2011 23:41:01 -0800 (PST)
>>>
>>>> Eric B. and co., please do some testing to make sure all of your
>>>> disable_ipv6 cases are functioning properly with this applied.
>>>
>>> Ping?
>>
>> In progress.  I had to make a small change to your patch to get it
>> to apply against 2.6.37.  neigh_ifdown has not been removed from the
>> beginning of addrconf_ifdown there.  The piece that was failing for
>> me is not failing now so, so far so good.
>>
>> It was reported that in 2.6.37 there was a new regression that
>> 1connecting to ::1 when ipv6 was disabled would not fail immediately but
>> would have to wait a while.  With your patch applied I am not seeing
>> that behavior either.
>>
>> Tomorrow I should know if I see any weird side effects with your patch,
>> after my regression tests for everything else have finished running.
> 
> I have to admit I am seeing weird side effects.  A test that was
> mysteriously failing because it could not create a vlan on top of a tap
> device has started working again ;)
> 
> So this change is looking really good for me.

Thanks for the testing feedback, that's looking good enough for me,
I'll start pushing this fix around.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: fix return value of bonding_store_slaves_active()
From: Jiri Pirko @ 2011-01-25 20:52 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, davem
In-Reply-To: <29198.1295983370@death>

Tue, Jan 25, 2011 at 08:22:50PM CET, fubar@us.ibm.com wrote:
>Jiri Pirko <jpirko@redhat.com> wrote:
>
>>count is incorrectly returned even in case of fail. Return ret instead.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>	It looks like bonding_store_carrier has the same problem, could
>you fix it up too and resubmit?

you are right, I missed that. Will resend.

Thanks.

Jirka
>
>	-J
>
>
>>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>>index 8fd0174..f33155f 100644
>>--- a/drivers/net/bonding/bond_sysfs.c
>>+++ b/drivers/net/bonding/bond_sysfs.c
>>@@ -1595,7 +1595,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
>> 		}
>> 	}
>> out:
>>-	return count;
>>+	return ret;
>> }
>> static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
>> 		   bonding_show_slaves_active, bonding_store_slaves_active);
>

^ permalink raw reply

* [PATCH net-next-2.6] bonding: fix return value of couple of store functions
From: Jiri Pirko @ 2011-01-25 21:03 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev, davem
In-Reply-To: <29198.1295983370@death>

count is incorrectly returned even in case of fail. Return ret instead.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 8fd0174..72bb0f6 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1198,7 +1198,7 @@ static ssize_t bonding_store_carrier(struct device *d,
 			bond->dev->name, new_value);
 	}
 out:
-	return count;
+	return ret;
 }
 static DEVICE_ATTR(use_carrier, S_IRUGO | S_IWUSR,
 		   bonding_show_carrier, bonding_store_carrier);
@@ -1595,7 +1595,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
 		}
 	}
 out:
-	return count;
+	return ret;
 }
 static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
 		   bonding_show_slaves_active, bonding_store_slaves_active);

^ permalink raw reply related

* Re: [PATCH net-next-2.6] bonding: fix return value of couple of store functions
From: Jay Vosburgh @ 2011-01-25 21:07 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem
In-Reply-To: <20110125210324.GD2932@psychotron.redhat.com>

Jiri Pirko <jpirko@redhat.com> wrote:

>count is incorrectly returned even in case of fail. Return ret instead.
>
>Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 8fd0174..72bb0f6 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -1198,7 +1198,7 @@ static ssize_t bonding_store_carrier(struct device *d,
> 			bond->dev->name, new_value);
> 	}
> out:
>-	return count;
>+	return ret;
> }
> static DEVICE_ATTR(use_carrier, S_IRUGO | S_IWUSR,
> 		   bonding_show_carrier, bonding_store_carrier);
>@@ -1595,7 +1595,7 @@ static ssize_t bonding_store_slaves_active(struct device *d,
> 		}
> 	}
> out:
>-	return count;
>+	return ret;
> }
> static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
> 		   bonding_show_slaves_active, bonding_store_slaves_active);

^ permalink raw reply

* Re: [PATCH net-next-2.6] bonding: fix return value of couple of store functions
From: David Miller @ 2011-01-25 21:13 UTC (permalink / raw)
  To: fubar; +Cc: jpirko, netdev
In-Reply-To: <304.1295989626@death>

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Tue, 25 Jan 2011 13:07:06 -0800

> Jiri Pirko <jpirko@redhat.com> wrote:
> 
>>count is incorrectly returned even in case of fail. Return ret instead.
>>
>>Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> 
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

Applied, thanks guys.

^ permalink raw reply

* Re: [PATCH] netfilter: ipvs: fix compiler warnings
From: Julian Anastasov @ 2011-01-25 21:22 UTC (permalink / raw)
  To: Simon Horman
  Cc: Patrick McHardy, Changli Gao, Wensong Zhang, David S. Miller,
	netdev, lvs-devel, netfilter-devel
In-Reply-To: <20110125131558.GB2841@verge.net.au>


 	Hello,

On Tue, 25 Jan 2011, Simon Horman wrote:

>> Hi Patrick,
>>
>> I'm a little unsure how you would like me to differentiate
>> between fixes for 2.6.38 - e.g. my previous two pull requests.
>> And patches intended for 2.6.39.
>>
>> In any case I will send a pull request that includes
>> Changli's change against nf-next-2.6.
>
> Actually, I realise that Changli's change is actually
> 2.6.38 material. I'll send a pull request. Please ignore
> my noise about 2.6.39.

 	Simon, I don't see IPVS netns work in 2.6.38-rc2,
so these changes are not for 2.6.38-rc.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH net-next-2.6] pktgen: speedup fragmented skbs
From: David Miller @ 2011-01-25 21:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1295975635.3588.337.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 25 Jan 2011 18:13:55 +0100

> We spend lot of time clearing pages in pktgen.
> (Or not clearing them on ipv6 and leaking kernel memory)
> 
> Since we dont modify them, we can use one zeroed page, and get
> references on it. This page can use NUMA affinity as well.
> 
> Define pktgen_finalize_skb() helper, used both in ipv4 and ipv6
> 
> Results using skbs with one frag :
> 
> Before patch :
> 
> Result: OK: 608980458(c608978520+d1938) nsec, 1000000000
> (100byte,1frags)
>   1642088pps 1313Mb/sec (1313670400bps) errors: 0
> 
> After patch :
> 
> Result: OK: 345285014(c345283891+d1123) nsec, 1000000000
> (100byte,1frags)
>   2896158pps 2316Mb/sec (2316926400bps) errors: 0
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, although I changed "->page = 0;" to "->page = NULL;"

Thanks!

^ permalink raw reply

* Re: [PATCH] pch_gbe: Fix the issue that the receiving data is not normal.
From: David Miller @ 2011-01-25 21:32 UTC (permalink / raw)
  To: toshiharu-linux
  Cc: netdev, linux-kernel, qi.wang, yong.y.wang, andrew.chih.howe.khor,
	joel.clark, kok.howg.ewe
In-Reply-To: <4D3D0373.30003@dsn.okisemi.com>

From: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>
Date: Mon, 24 Jan 2011 13:43:31 +0900

> This PCH_GBE driver had an issue that the receiving data is not normal.
> This driver had not removed correctly the padding data 
> which the DMA include in receiving data. 
> 
> This patch fixed this issue.
> 
> Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com>

There are bugs in these changes:

>  			if (skb_copy_flag) {	/* recycle  skb */
>  				struct sk_buff *new_skb;
>  				new_skb =
> -				    netdev_alloc_skb(netdev,
> -						     length + NET_IP_ALIGN);
> +				    netdev_alloc_skb(netdev, length);
>  				if (new_skb) {
>  					if (!skb_padding_flag) {
>  						skb_reserve(new_skb,
> -								NET_IP_ALIGN);
> +							PCH_GBE_DMA_PADDING);
>  					}
>  					memcpy(new_skb->data, skb->data,
>  						length);

If "!skb_padding_flag" then you will write past the end of the SKB
data in that memcpy.

You cannot allocate only "length" then proceed to reserve PCH_GBE_DMA_PADDING
and then add "length" worth of data on top of that.  In such a cause you
must allocate at least "length + PCH_GBE_DMA_PADDING".

Furthermore you _MUST_ respect NET_IP_ALIGN.  Some platforms set this value
to "0", because otherwise performance suffers greatly.

There are two seperate issues, removing the padding bytes provided by
the device, and aligning the IP headers as wanted by the cpu
architecutre.  Therefore they should be handled seperately, and we
therefore should still see references to NET_IP_ALIGN in your patch.

^ permalink raw reply

* Re: does intel X520-SR(ixgbe) support RSS on single VLAN?
From: Alexander Duyck @ 2011-01-25 21:34 UTC (permalink / raw)
  To: Rui
  Cc: Ben Hutchings, netdev@vger.kernel.org,
	e1000-devel@lists.sourceforge.net
In-Reply-To: <AANLkTik9bcRjJ4N7_rny_pf6qzR2RbUZRKtz5y9sW0Sb@mail.gmail.com>

On 1/25/2011 1:03 AM, Rui wrote:
> On Tue, Jan 25, 2011 at 11:05 AM, Ben Hutchings
> <bhutchings@solarflare.com>  wrote:
>> On Tue, 2011-01-25 at 10:10 +0800, Rui wrote:
>>> On Tue, Jan 25, 2011 at 1:09 AM, Alexander Duyck
>>> <alexander.h.duyck@intel.com>  wrote:
>>>> On 1/24/2011 6:18 AM, Rui wrote:
>>>>>
>>>>> hi
>>>>> does intel X520-SR support RSS on single VLAN?
>>>>>
>>>>> tested with 3 different vlan id and priority packets
>>>>> What I saw is that all packets were always delivered to the same RxQ.
>>>>> looks can not get a different RSS index for these packet?
>>>>> any setting needed?
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>> The X520 should have no problems hashing on a single VLAN tagged frame.
>>>>   However the VLAN will not be a part of the RSS hash.  The  only components
>>>> of the hash are the IPv4/IPv6 source and destination addresses, and if the
>>>> flow is TCP then the port numbers.
>>>>
>>> hi alexander
>>> I got these information from the intel community:
>>>
>>> 'I asked our software engineers about your question, and this is what I learned.
>>> You cannot filter by just VLAN or VLAN priority.  The L4 type will
>>> also play a role in the filter and as such you would only be able to
>>> filter TCP, UDP, and SCTP packets that are bound for a VLAN.
>>> The command itself to setup a filter is “ethtool –U ethX flow-type
>>> tcp4 vlan 0x2000 vlan-mask 0xE000 action Y” where X is the correct
>>> index for the interface and Y is the queue you want to route the
>>> traffic to.  This would have to be repeated for udp4 and sctp4.
>>> I hope this will help.
>>> Mark H"
>>
>> The mask specifies bits to be ignored, so if you want to filter on the
>> basis of only the priority bits you should use vlan-mask 0xfff.  Unless
>> this is another inconsistency I failed to notice...
>>
>>> so my question is that the VLAN is PART of the RSS or not?
>>
>> It's not part of any specified Toeplitz hash.  However, some hardware
>> supports adding the hash (after indirection) to the queue number
>> specified by a filter.  Currently the ethtool API doesn't have a way to
>> request that.
>>
>>> looks the
>>> perfect filter support vlan id ?can the perfect filter support
>>> wildchar,such as: flow-type ANY?
>>
>> It is possible to specify this using flow-type ether, but the ixgbe
>> driver does not yet support that (and I have no idea whether the
>> hardware does).
>>
>> Ben.
>>
>> --
>> Ben Hutchings, Senior Software Engineer, Solarflare Communications
>> Not speaking for my employer; that's the marketing department's job.
>> They asked us to note that Solarflare product names are trademarked.
>>
>>
>
> I got this msg:
> Cannot add new RX n-tuple filter: Operation not supported
> This command is only supported after 2.6.34?

Yes, this command will only work on 2.6.34 and older kernels.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net-next-2.6] pktgen: speedup fragmented skbs
From: Eric Dumazet @ 2011-01-25 21:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110125.132626.39178379.davem@davemloft.net>

Le mardi 25 janvier 2011 à 13:26 -0800, David Miller a écrit :

> Applied, although I changed "->page = 0;" to "->page = NULL;"
> 

Ok, thanks !



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox