Netdev List

* Re: [PATCH net 6/7] qlge: refactoring of ethtool stats.
From: David Miller @ 2012-07-03  1:26 UTC (permalink / raw)
  To: bhutchings
  Cc: jitendra.kalsaria, netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1341278648.2590.31.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 3 Jul 2012 02:24:08 +0100

> This is hardly an urgent fix...

This entire series is suspect, and largely inappropriate for 'net'
this late in the -rc.

^ permalink raw reply

* Re: [PATCH net 6/7] qlge: refactoring of ethtool stats.
From: Ben Hutchings @ 2012-07-03  1:24 UTC (permalink / raw)
  To: Jitendra Kalsaria; +Cc: davem, netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1341272514-5156-7-git-send-email-jitendra.kalsaria@qlogic.com>

This is hardly an urgent fix...

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net 4/7] qlge: Fixed double pci free upon tx_ring->q allocation failure.
From: Ben Hutchings @ 2012-07-03  1:22 UTC (permalink / raw)
  To: Jitendra Kalsaria; +Cc: davem, netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1341272514-5156-5-git-send-email-jitendra.kalsaria@qlogic.com>

On Mon, 2012-07-02 at 19:41 -0400, Jitendra Kalsaria wrote:
> From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
> 
> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
> ---
>  drivers/net/ethernet/qlogic/qlge/qlge_main.c |   14 +++++++-------
>  1 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> index cdbc860..9ecd15f 100644
> --- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> +++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> @@ -2701,20 +2701,20 @@ static int ql_alloc_tx_resources(struct ql_adapter *qdev,
>  	    pci_alloc_consistent(qdev->pdev, tx_ring->wq_size,
>  				 &tx_ring->wq_base_dma);
>  
> -	if ((tx_ring->wq_base == NULL) ||
> -	    tx_ring->wq_base_dma & WQ_ADDR_ALIGN) {
> -		netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
> -		return -ENOMEM;
> -	}
> +	if (!tx_ring->wq_base || tx_ring->wq_base_dma & WQ_ADDR_ALIGN)
> +		goto err;
> +

So in case pci_alloc_consistent() fails, you now try to free anyway.
Not sure whether that's safe; do you feel lucky?

>  	tx_ring->q =
>  	    kmalloc(tx_ring->wq_len * sizeof(struct tx_ring_desc), GFP_KERNEL);
> -	if (tx_ring->q == NULL)
> +	if (!tx_ring->q)
>  		goto err;

Unrelated change.

>  	return 0;
>  err:
>  	pci_free_consistent(qdev->pdev, tx_ring->wq_size,
> -			    tx_ring->wq_base, tx_ring->wq_base_dma);
> +			tx_ring->wq_base, tx_ring->wq_base_dma);

This was nicely indented before...

> +	tx_ring->wq_base = NULL;
> +	netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
>  	return -ENOMEM;
>  }
>  

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply
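
The concern raised in the review above (freeing a buffer whose allocation failed) is commonly avoided with one unwind label per acquired resource. What follows is a minimal userspace sketch of that pattern, not the qlge code: `struct ring`, `ring_alloc()`, `ring_free()`, `alloc_or_fail()` and the `fail_at` test hook are hypothetical names, and `malloc()`/`free()` stand in for the DMA allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's ring: two separately allocated buffers. */
struct ring {
	void *wq_base;	/* stands in for the DMA-coherent work queue */
	void *q;	/* stands in for the descriptor array */
};

/* Test hook (an assumption of this sketch): simulate an allocation failure. */
static void *alloc_or_fail(size_t size, int fail)
{
	return fail ? NULL : malloc(size);
}

/*
 * fail_at: 0 = both allocations succeed, 1 = first fails, 2 = second fails.
 * Returns 0 on success or -ENOMEM, leaving both pointers NULL on failure.
 */
int ring_alloc(struct ring *r, int fail_at)
{
	assert(r != NULL);
	r->q = NULL;

	r->wq_base = alloc_or_fail(4096, fail_at == 1);
	if (!r->wq_base)
		goto err;		/* nothing acquired yet, nothing to free */

	r->q = alloc_or_fail(256, fail_at == 2);
	if (!r->q)
		goto err_free_wq;	/* undo only the first allocation */

	return 0;

err_free_wq:
	free(r->wq_base);
	r->wq_base = NULL;
err:
	return -ENOMEM;
}

/* Release both buffers and clear the pointers so a second call is harmless. */
void ring_free(struct ring *r)
{
	free(r->q);
	free(r->wq_base);
	r->q = NULL;
	r->wq_base = NULL;
}
```

In the patch under review the first check also covers a misaligned address, a case where the allocation did succeed; with this layout such a check would jump to `err_free_wq` rather than `err`.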

* Re: Deleting an alias causes rest to get deleted
From: Ben Hutchings @ 2012-07-03  1:12 UTC (permalink / raw)
  To: Volkan Yazıcı; +Cc: netdev
In-Reply-To: <4FF1FC74.8080401@gmail.com>

On Mon, 2012-07-02 at 22:54 +0300, Volkan Yazıcı wrote:
> Hi!
> 
> I observe an IP aliasing anomaly that occurs when I try to delete an IP 
> alias from an interface. That is, when I delete the first address in a 
> set of IP aliased addresses assigned according to a particular subnet, 
> the rest of the aliases get deleted as well. Check out the snippet below.
[...]
> As a side note, when I first asked this question to Stephen Hemminger 
> (he forwarded me to this mailing list) he also told me that "/In Linux 
> the interface aliases are really a legacy from the BSD style addressing, 
> and don't act the same. It is not common practice to use them./" Is that 
> really the case?
[...]

If you didn't give him the full details shown above, it's possible he
thought you meant alias interfaces such as 'eth0:0'.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
From: David Miller @ 2012-07-03  1:04 UTC (permalink / raw)
  To: bhutchings; +Cc: ogerlitz, roland, yevgenyp, oren, netdev, hadarh
In-Reply-To: <20120702.171507.1288066003825644221.davem@davemloft.net>


Just in case you guys _really_ and _truly_ are so unable to think
outside the box that you can't come up with something reasonable, I'll
get you started with two ideas:

1) A special "chipset" dummy netdev that a special class of ethtool
   commands can run on to set these things.

2) A "chipset" genetlink family with suitable operations and
   attributes.

In both cases appropriate mechanisms are added to provide keys that
are used for chipset matching, and device drivers simply register
a notifier handler that is called on two occasions:

1) When settings are changed.

2) Upon initial handler registration, to acquire the initial settings.

^ permalink raw reply
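
The notifier scheme described above (a handler matched by key, called once at registration and again on every change) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not a kernel API: `struct chipset`, `chipset_register()`, `chipset_set_mode()`, `steering_mode`, and the `demo_driver` handler are all hypothetical names, and a single integer stands in for whatever settings a real chipset would expose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

enum chipset_event {
	CHIPSET_INIT,	/* delivered once, at registration time */
	CHIPSET_CHANGE,	/* delivered whenever the setting changes */
};

typedef void (*chipset_notify_fn)(enum chipset_event ev, int mode, void *priv);

/* Hypothetical per-chipset object holding one setting and its listeners. */
struct chipset {
	const char *key;	/* matching key, e.g. a chipset name */
	int steering_mode;	/* the single shared setting in this sketch */
	struct {
		chipset_notify_fn fn;
		void *priv;
	} handlers[MAX_HANDLERS];
	size_t nr_handlers;
};

/* Register a handler; it immediately receives the current setting. */
int chipset_register(struct chipset *cs, const char *key,
		     chipset_notify_fn fn, void *priv)
{
	assert(cs && key && fn);
	if (strcmp(cs->key, key) != 0 || cs->nr_handlers == MAX_HANDLERS)
		return -1;

	cs->handlers[cs->nr_handlers].fn = fn;
	cs->handlers[cs->nr_handlers].priv = priv;
	cs->nr_handlers++;

	fn(CHIPSET_INIT, cs->steering_mode, priv);
	return 0;
}

/* Change the setting and tell every registered handler about it. */
void chipset_set_mode(struct chipset *cs, int mode)
{
	size_t i;

	cs->steering_mode = mode;
	for (i = 0; i < cs->nr_handlers; i++)
		cs->handlers[i].fn(CHIPSET_CHANGE, mode, cs->handlers[i].priv);
}

/* Example driver side: remembers the last mode and counts each event type. */
struct demo_driver {
	int mode;
	int init_calls;
	int change_calls;
};

void demo_driver_notify(enum chipset_event ev, int mode, void *priv)
{
	struct demo_driver *drv = priv;

	drv->mode = mode;
	if (ev == CHIPSET_INIT)
		drv->init_calls++;
	else
		drv->change_calls++;
}
```

Delivering the initial state through the same callback means a driver needs no separate "read current settings" path, which appears to be the point of the second occasion listed above.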

* Re: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: David Miller @ 2012-07-03  0:18 UTC (permalink / raw)
  To: jitendra.kalsaria; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1341272514-5156-2-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Date: Mon,  2 Jul 2012 19:41:48 -0400

> From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
> 
> qlge driver was acting wrongly when considering TX ring full
> as a TX error. TX ring full is expected behavior when NIC is
> overwhelmed and is expected to happen, as long as packets are
> not lost.
> 
> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

If your driver is properly coded, this code path should never trigger,
ever.  So it is an error, and you need to fix whatever bug exists in
your driver which allows this to happen, rather than this change
here which attempts to sweep the issue under the rug.

^ permalink raw reply

* Re: [PATCH net-next 06/10] {NET,IB}/mlx4: Add device managed flow steering firmware API
From: David Miller @ 2012-07-03  0:15 UTC (permalink / raw)
  To: bhutchings; +Cc: ogerlitz, roland, yevgenyp, oren, netdev, hadarh
In-Reply-To: <1341252445.2590.12.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 2 Jul 2012 19:07:25 +0100

> But there may not be enough commonality to define a non-vendor-specific
> API.  And ethtool really isn't a good way to expose parameters that are
> per-controller rather than per-net-device, particularly if changing them
> may disrupt all running net devices on that controller and not just the
> one used to invoke SIOCETHTOOL.

I fundamentally disagree with you.

Are you really saying that it's OK for every damn vendor to define
their own magic knob to control stuff like this?  Surely you're not.

^ permalink raw reply

* Re: [PATCH v6] sctp: be more restrictive in transport selection on bundled sacks
From: David Miller @ 2012-07-03  0:10 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, vyasevich, linux-sctp
In-Reply-To: <20120702122531.GA29681@hmsreliant.think-freely.org>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 2 Jul 2012 08:25:31 -0400

> On Sun, Jul 01, 2012 at 07:44:25PM -0400, Neil Horman wrote:
>> On Sun, Jul 01, 2012 at 02:43:19PM -0700, David Miller wrote:
>> > From: Neil Horman <nhorman@tuxdriver.com>
>> > Date: Sun, 1 Jul 2012 08:47:50 -0400
>> > 
>> > > Perhaps we could modify the SubmittingPatches document to indicate that an
>> > > Acked-by from a subsystem maintainer implicitly confers authority on the
>> > > upstream receiver to request reasonable stylistic changes that don't affect the
>> > > functionality of the patch in the interests of maintaining coding conventions.
>> > 
>> > Yes, that would make sense.
>> > 
>> 
>> 
>> I'll propose it in a few days.
>> Neil
>> 
> How does this language sound to you?

Looks fine to me.

^ permalink raw reply

* Re: [PATCH 00/12] Swap-over-NFS without deadlocking V8
From: Eric B Munson @ 2012-07-03  0:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Sebastian Andrzej Siewior
In-Reply-To: <20120702143556.GT14154@suse.de>

On Mon, 02 Jul 2012, Mel Gorman wrote:

> On Sun, Jul 01, 2012 at 01:22:54PM -0400, Eric B Munson wrote:
> > On Fri, 29 Jun 2012, Mel Gorman wrote:
> > 
> > > Changelog since V7
> > >   o Rebase to linux-next 20120629
> > >   o bi->page_dma instead of bi->page in intel driver
> > >   o Build fix for !CONFIG_NET					(sebastian)
> > >   o Restore PF_MEMALLOC flags correctly in all cases		(jlayton)
> > > 
> > > Changelog since V6
> > >   o Rebase to linux-next 20120622
> > > 
> > > Changelog since V5
> > >   o Rebase to v3.5-rc3
> > > 
> > > Changelog since V4
> > >   o Catch if SOCK_MEMALLOC flag is cleared with rmem tokens	(davem)
> > > 
> > > Changelog since V3
> > >   o Rebase to 3.4-rc5
> > >   o kmap pages for writing to swap				(akpm)
> > >   o Move forward declaration to reduce chance of duplication	(akpm)
> > > 
> > > Changelog since V2
> > >   o Nothing significant, just rebases. A radix tree lookup being replaced with
> > >     a linear search would be the biggest rebase artifact
> > > 
> > > This patch series is based on top of "Swap-over-NBD without deadlocking v14"
> > > as it depends on the same reservation of PF_MEMALLOC reserves logic.
> > > 
> > > When a user or administrator requires swap for their application, they
> > > create a swap partition or file, format it with mkswap and activate it with
> > > swapon. In diskless systems this is not an option so if swap is required
> > > then swapping over the network is considered.  The two likely scenarios
> > > are when blade servers are used as part of a cluster where the form factor
> > > or maintenance costs do not allow the use of disks and thin clients.
> > > 
> > > The Linux Terminal Server Project recommends the use of the Network
> > > Block Device (NBD) for swap but this is not always an option.  There is
> > > no guarantee that the network attached storage (NAS) device is running
> > > Linux or supports NBD. However, it is likely that it supports NFS so there
> > > are users that want support for swapping over NFS despite any performance
> > > concern. Some distributions currently carry patches that support swapping
> > > over NFS but it would be preferable to support it in the mainline kernel.
> > > 
> > > Patch 1 avoids a stream-specific deadlock that potentially affects TCP.
> > > 
> > > Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
> > > 	reserves.
> > > 
> > > Patch 3 adds three helpers for filesystems to handle swap cache pages.
> > > 	For example, page_file_mapping() returns page->mapping for
> > > 	file-backed pages and the address_space of the underlying
> > > 	swap file for swap cache pages.
> > > 
> > > Patch 4 adds two address_space_operations to allow a filesystem
> > > 	to pin all metadata relevant to a swapfile in memory. Upon
> > > 	successful activation, the swapfile is marked SWP_FILE and
> > > 	the address space operation ->direct_IO is used for writing
> > > 	and ->readpage for reading in swap pages.
> > > 
> > > Patch 5 notes that patch 3 is bolting
> > > 	filesystem-specific-swapfile-support onto the side and that
> > > 	the default handlers have different information to what
> > > 	is available to the filesystem. This patch refactors the
> > > 	code so that there are generic handlers for each of the new
> > > 	address_space operations.
> > > 
> > > Patch 6 adds an API to allow a vector of kernel addresses to be
> > > 	translated to struct pages and pinned for IO.
> > > 
> > > Patch 7 adds support for using highmem pages for swap by kmapping
> > > 	the pages before calling the direct_IO handler.
> > > 
> > > Patch 8 updates NFS to use the helpers from patch 3 where necessary.
> > > 
> > > Patch 9 avoids setting PG_private on PG_swapcache pages within NFS.
> > > 
> > > Patch 10 implements the new swapfile-related address_space operations
> > > 	for NFS and teaches the direct IO handler how to manage
> > > 	kernel addresses.
> > > 
> > > Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
> > > 	where appropriate.
> > > 
> > > Patch 12 fixes a NULL pointer dereference that occurs when using
> > > 	swap-over-NFS.
> > > 
> > > With the patches applied, it is possible to mount a swapfile that is on an
> > > NFS filesystem. Swap performance is not great with a swap stress test taking
> > > roughly twice as long to complete as when the swap device was backed by NBD.
> > 
> > To test this set I am using memory cgroups to force swap usage.  I am seeing
> > the cgroup controller killing my processes instead of using the nfs swapfile.
> > 
> 
> How sure are you that this is not a cgroup bug? For dirty file data on some
> kernels, cgroups can prematurely kill processes if pages are not being
> cleaned fast enough. I would not expect the same problem for anonymous
> pages but it's worth considering. Please also test with a normal swapfile.
> 
> If OOM is disabled and the process hangs, try capturing a sysrq+t and
> see where the process is stuck.
> 

It looks like the problem is with cgroups: when I run without cgroups and limit
memory on the boot command line, everything works fine.  To test, I limited the
machine to 1G of RAM and then ran several memory benchmarks with working set sizes
of 1.5G; all completed successfully with my swap file located on an NFS share.

Tested-by: Eric B Munson <emunson@mgebm.net>

^ permalink raw reply

* Re: [RFC] [TCP 0/3] Receive from socket into bio without copying
From: Willy Tarreau @ 2012-07-03  0:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: chetan loke, Andreas Gruenbacher, netdev, linux-kernel,
	Herbert Xu, David S. Miller
In-Reply-To: <1341265024.22621.464.camel@edumazet-glaptop>

Hi Eric,

On Mon, Jul 02, 2012 at 11:37:04PM +0200, Eric Dumazet wrote:
> On Mon, 2012-07-02 at 15:41 -0400, chetan loke wrote:
> > On Mon, Jul 2, 2012 at 12:06 PM, Andreas Gruenbacher <agruen@linbit.com> wrote:
> > > On Mon, 2012-07-02 at 15:54 +0200, Eric Dumazet wrote:
> > >> So I will just say no to your patches, unless you demonstrate the
> > >> splice() problems, and how you can fix the alignment problem in a new
> > >> layer instead of in the existing zero copy standard one.
> > >
> > > Again, splice or not is not the issue here. It does not, by itself, allow zero
> > > copy from the network directly to disk but it could likely be made to support
> > > that if we can get the alignment right first.  The proposed MSG_NEW_PACKET flag
> > > helps with that, but maybe someone has a better idea.
> > >
> > 
> > Eric - by using splice do you mean something like:
> > 
> > int filedes[2];
> > #define PIPE_SIZE (64*1024)
> > pipe(filedes);
> > ret = splice(sock_fd_from, NULL, filedes[1], NULL, PIPE_SIZE,
> >              SPLICE_F_MORE | SPLICE_F_MOVE);
> > 
> > 
> > ret = splice(filedes[0], NULL, file_fd_to,
> >              &to_offset, ret,
> >              SPLICE_F_MORE | SPLICE_F_MOVE);
> > 
> 
> Yes, that's more or less the plan. You also can play with bigger
> PIPE_SIZE if needed.

I confirm, this is recommended at high bit rates if you're working with
large windows.

> > i.e. splice-in from socket to pipe, and splice-out from pipe to destination?
> > 
> > Andreas - if the above assumption is true then can you apply the
> > 'MSG_NEW_PACKET' on the sender and see if the above pseudo-splice code
> > achieves something similar to what you expect on the receive side (you
> > can also play w/ F_SETPIPE_SZ, although I found very little
> > reduction in CPU usage)? Note: My personal experience - using splice
> > from an input-file-A to output-file-B bought very little CPU
> > reduction (yes, both the files used O_DIRECT). Instead, a simple
> > read/write w/ O_DIRECT from file-A to file-B was much much faster.
> 
> splice() performance from socket to pipe has improved a lot in
> linux-3.5
> 
> It was not true zero copy, until very recent patches.

In fact it was true zero copy in 2.6.25 until we faced a large
amount of data corruption and the zero copy was disabled in 2.6.25.X.
It remained that way until you brought your patches to
reinstate it.

> (It was zero copy only on certain class of NIC, not on the ones found
> on appliances or cheap platforms)
> 
> Willy Tarreau mentioned a nice boost of performance with haproxy.

Yes definitely. The savings are more noticeable on small systems where
memory bandwidth is limited. On a small ARM system bound by RAM bandwidth,
the performance was basically doubled. But I also observed nice savings
on a Core 2 Duo equipped with 2 Myricom 10Gig NICs forwarding at line rate.

> Willy wanted to work on a direct splice from socket to socket, but
> I am not sure it'll bring major speed improvement.

I'm not sure at all either; I'm betting on a few percent saved from the
reduction of syscalls, not much more. This is why I'll probably check
this when I have enough time to kill.

Regards,
Willy

^ permalink raw reply
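
A minimal sketch of the socket to pipe to file relay discussed in this thread, assuming Linux and glibc. `relay_with_splice()` and `RELAY_CHUNK` are hypothetical names introduced for illustration. Both offsets are passed as NULL here because neither a socket nor a pipe is seekable; a caller writing to a regular file at a specific position would pass an offset pointer on the output side instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* splice() and SPLICE_F_* are GNU extensions */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define RELAY_CHUNK (64 * 1024)	/* matches the default pipe capacity on Linux */

/*
 * Move up to len bytes from in_fd to out_fd through an intermediate pipe.
 * Returns the number of bytes moved (short on EOF), or -1 on error.
 */
ssize_t relay_with_splice(int in_fd, int out_fd, size_t len)
{
	int p[2];
	ssize_t total = 0;

	assert(in_fd >= 0 && out_fd >= 0);
	if (pipe(p) < 0)
		return -1;

	while ((size_t)total < len) {
		size_t want = len - (size_t)total;
		ssize_t in, left;

		if (want > RELAY_CHUNK)
			want = RELAY_CHUNK;

		/* Step 1: source -> pipe. */
		in = splice(in_fd, NULL, p[1], NULL, want,
			    SPLICE_F_MOVE | SPLICE_F_MORE);
		if (in < 0) {
			total = -1;
			break;
		}
		if (in == 0)
			break;	/* end of input */

		/* Step 2: pipe -> destination, draining everything just queued. */
		for (left = in; left > 0; ) {
			ssize_t out = splice(p[0], NULL, out_fd, NULL,
					     (size_t)left,
					     SPLICE_F_MOVE | SPLICE_F_MORE);
			if (out <= 0) {
				total = -1;
				goto done;
			}
			left -= out;
		}
		total += in;
	}
done:
	close(p[0]);
	close(p[1]);
	return total;
}
```

As suggested in the thread, a larger pipe (set with `fcntl(fd, F_SETPIPE_SZ, size)`) may help at high bit rates; the sketch keeps the default for simplicity.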

* [PATCH net 5/7] qlge: Categorize receive frame errors from firmware.
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge.h         |    8 +++
 drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c |   14 +++++
 drivers/net/ethernet/qlogic/qlge/qlge_main.c    |   63 +++++++++++++----------
 3 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index 5a639df..e81bbb7 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -1535,6 +1535,14 @@ struct nic_stats {
 	u64 rx_1024_to_1518_pkts;
 	u64 rx_1519_to_max_pkts;
 	u64 rx_len_err_pkts;
+	/* Receive Mac Err stats */
+	u64 rx_code_err;
+	u64 rx_oversize_err;
+	u64 rx_undersize_err;
+	u64 rx_preamble_err;
+	u64 rx_frame_len_err;
+	u64 rx_crc_err;
+	u64 rx_err_count;
 	/*
 	 * These stats come from offset 500h to 5C8h
 	 * in the XGMAC register.
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 31ee6dc..7163f5d 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -226,6 +226,13 @@ static char ql_stats_str_arr[][ETH_GSTRING_LEN] = {
 	{"rx_1024_to_1518_pkts"},
 	{"rx_1519_to_max_pkts"},
 	{"rx_len_err_pkts"},
+	{"rx_code_err"},
+	{"rx_oversize_err"},
+	{"rx_undersize_err"},
+	{"rx_preamble_err"},
+	{"rx_frame_len_err"},
+	{"rx_crc_err"},
+	{"rx_err_count"},
 	{"tx_cbfc_pause_frames0"},
 	{"tx_cbfc_pause_frames1"},
 	{"tx_cbfc_pause_frames2"},
@@ -320,6 +327,13 @@ ql_get_ethtool_stats(struct net_device *ndev,
 	*data++ = s->rx_1024_to_1518_pkts;
 	*data++ = s->rx_1519_to_max_pkts;
 	*data++ = s->rx_len_err_pkts;
+	*data++ = s->rx_code_err;
+	*data++ = s->rx_oversize_err;
+	*data++ = s->rx_undersize_err;
+	*data++ = s->rx_preamble_err;
+	*data++ = s->rx_frame_len_err;
+	*data++ = s->rx_crc_err;
+	*data++ = s->rx_err_count;
 	*data++ = s->tx_cbfc_pause_frames0;
 	*data++ = s->tx_cbfc_pause_frames1;
 	*data++ = s->tx_cbfc_pause_frames2;
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 9ecd15f..06dfafe 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1433,6 +1433,36 @@ map_error:
 	return NETDEV_TX_BUSY;
 }
 
+/* Categorizing receive firmware frame errors */
+static void ql_categorize_rx_err(struct ql_adapter *qdev, u8 rx_err)
+{
+	struct nic_stats *stats = &qdev->nic_stats;
+
+	stats->rx_err_count++;
+
+	switch (rx_err & IB_MAC_IOCB_RSP_ERR_MASK) {
+	case IB_MAC_IOCB_RSP_ERR_CODE_ERR:
+		stats->rx_code_err++;
+		break;
+	case IB_MAC_IOCB_RSP_ERR_OVERSIZE:
+		stats->rx_oversize_err++;
+		break;
+	case IB_MAC_IOCB_RSP_ERR_UNDERSIZE:
+		stats->rx_undersize_err++;
+		break;
+	case IB_MAC_IOCB_RSP_ERR_PREAMBLE:
+		stats->rx_preamble_err++;
+		break;
+	case IB_MAC_IOCB_RSP_ERR_FRAME_LEN:
+		stats->rx_frame_len_err++;
+		break;
+	case IB_MAC_IOCB_RSP_ERR_CRC:
+		stats->rx_crc_err++;
+	default:
+		break;
+	}
+}
+
 /* Process an inbound completion from an rx ring. */
 static void ql_process_mac_rx_gro_page(struct ql_adapter *qdev,
 					struct rx_ring *rx_ring,
@@ -1499,15 +1529,6 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
 	addr = lbq_desc->p.pg_chunk.va;
 	prefetch(addr);
 
-
-	/* Frame error, so drop the packet. */
-	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_info(qdev, drv, qdev->ndev,
-			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
-		rx_ring->rx_errors++;
-		goto err_out;
-	}
-
 	/* The max framesize filter on this chip is set higher than
 	 * MTU since FCoE uses 2k frames.
 	 */
@@ -1593,15 +1614,6 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 	memcpy(skb_put(new_skb, length), skb->data, length);
 	skb = new_skb;
 
-	/* Frame error, so drop the packet. */
-	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_info(qdev, drv, qdev->ndev,
-			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
-		dev_kfree_skb_any(skb);
-		rx_ring->rx_errors++;
-		return;
-	}
-
 	/* loopback self test for ethtool */
 	if (test_bit(QL_SELFTEST, &qdev->flags)) {
 		ql_check_lb_frame(qdev, skb);
@@ -1908,15 +1920,6 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 		return;
 	}
 
-	/* Frame error, so drop the packet. */
-	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_info(qdev, drv, qdev->ndev,
-			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
-		dev_kfree_skb_any(skb);
-		rx_ring->rx_errors++;
-		return;
-	}
-
 	/* The max framesize filter on this chip is set higher than
 	 * MTU since FCoE uses 2k frames.
 	 */
@@ -1999,6 +2002,12 @@ static unsigned long ql_process_mac_rx_intr(struct ql_adapter *qdev,
 
 	QL_DUMP_IB_MAC_RSP(ib_mac_rsp);
 
+	/* Frame error, so drop the packet. */
+	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
+		ql_categorize_rx_err(qdev, ib_mac_rsp->flags2);
+		return (unsigned long)length;
+	}
+
 	if (ib_mac_rsp->flags4 & IB_MAC_IOCB_RSP_HV) {
 		/* The data and headers are split into
 		 * separate buffers.
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 6/7] qlge: refactoring of ethtool stats.
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c |  295 ++++++++++++-----------
 1 files changed, 157 insertions(+), 138 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 7163f5d..3dcf5c8 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -35,10 +35,152 @@
 
 #include "qlge.h"
 
+struct ql_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int sizeof_stat;
+	int stat_offset;
+};
+
+#define QL_SIZEOF(m) FIELD_SIZEOF(struct ql_adapter, m)
+#define QL_OFF(m) offsetof(struct ql_adapter, m)
+
+static const struct ql_stats ql_gstrings_stats[] = {
+	{"tx_pkts", QL_SIZEOF(nic_stats.tx_pkts), QL_OFF(nic_stats.tx_pkts)},
+	{"tx_bytes", QL_SIZEOF(nic_stats.tx_bytes), QL_OFF(nic_stats.tx_bytes)},
+	{"tx_mcast_pkts", QL_SIZEOF(nic_stats.tx_mcast_pkts),
+					QL_OFF(nic_stats.tx_mcast_pkts)},
+	{"tx_bcast_pkts", QL_SIZEOF(nic_stats.tx_bcast_pkts),
+					QL_OFF(nic_stats.tx_bcast_pkts)},
+	{"tx_ucast_pkts", QL_SIZEOF(nic_stats.tx_ucast_pkts),
+					QL_OFF(nic_stats.tx_ucast_pkts)},
+	{"tx_ctl_pkts", QL_SIZEOF(nic_stats.tx_ctl_pkts),
+					QL_OFF(nic_stats.tx_ctl_pkts)},
+	{"tx_pause_pkts", QL_SIZEOF(nic_stats.tx_pause_pkts),
+					QL_OFF(nic_stats.tx_pause_pkts)},
+	{"tx_64_pkts", QL_SIZEOF(nic_stats.tx_64_pkt),
+					QL_OFF(nic_stats.tx_64_pkt)},
+	{"tx_65_to_127_pkts", QL_SIZEOF(nic_stats.tx_65_to_127_pkt),
+					QL_OFF(nic_stats.tx_65_to_127_pkt)},
+	{"tx_128_to_255_pkts", QL_SIZEOF(nic_stats.tx_128_to_255_pkt),
+					QL_OFF(nic_stats.tx_128_to_255_pkt)},
+	{"tx_256_511_pkts", QL_SIZEOF(nic_stats.tx_256_511_pkt),
+					QL_OFF(nic_stats.tx_256_511_pkt)},
+	{"tx_512_to_1023_pkts",	QL_SIZEOF(nic_stats.tx_512_to_1023_pkt),
+					QL_OFF(nic_stats.tx_512_to_1023_pkt)},
+	{"tx_1024_to_1518_pkts", QL_SIZEOF(nic_stats.tx_1024_to_1518_pkt),
+					QL_OFF(nic_stats.tx_1024_to_1518_pkt)},
+	{"tx_1519_to_max_pkts",	QL_SIZEOF(nic_stats.tx_1519_to_max_pkt),
+					QL_OFF(nic_stats.tx_1519_to_max_pkt)},
+	{"tx_undersize_pkts", QL_SIZEOF(nic_stats.tx_undersize_pkt),
+					QL_OFF(nic_stats.tx_undersize_pkt)},
+	{"tx_oversize_pkts", QL_SIZEOF(nic_stats.tx_oversize_pkt),
+					QL_OFF(nic_stats.tx_oversize_pkt)},
+	{"rx_bytes", QL_SIZEOF(nic_stats.rx_bytes), QL_OFF(nic_stats.rx_bytes)},
+	{"rx_bytes_ok",	QL_SIZEOF(nic_stats.rx_bytes_ok),
+					QL_OFF(nic_stats.rx_bytes_ok)},
+	{"rx_pkts", QL_SIZEOF(nic_stats.rx_pkts), QL_OFF(nic_stats.rx_pkts)},
+	{"rx_pkts_ok", QL_SIZEOF(nic_stats.rx_pkts_ok),
+					QL_OFF(nic_stats.rx_pkts_ok)},
+	{"rx_bcast_pkts", QL_SIZEOF(nic_stats.rx_bcast_pkts),
+					QL_OFF(nic_stats.rx_bcast_pkts)},
+	{"rx_mcast_pkts", QL_SIZEOF(nic_stats.rx_mcast_pkts),
+					QL_OFF(nic_stats.rx_mcast_pkts)},
+	{"rx_ucast_pkts", QL_SIZEOF(nic_stats.rx_ucast_pkts),
+					QL_OFF(nic_stats.rx_ucast_pkts)},
+	{"rx_undersize_pkts", QL_SIZEOF(nic_stats.rx_undersize_pkts),
+					QL_OFF(nic_stats.rx_undersize_pkts)},
+	{"rx_oversize_pkts", QL_SIZEOF(nic_stats.rx_oversize_pkts),
+					QL_OFF(nic_stats.rx_oversize_pkts)},
+	{"rx_jabber_pkts", QL_SIZEOF(nic_stats.rx_jabber_pkts),
+					QL_OFF(nic_stats.rx_jabber_pkts)},
+	{"rx_undersize_fcerr_pkts",
+		QL_SIZEOF(nic_stats.rx_undersize_fcerr_pkts),
+				QL_OFF(nic_stats.rx_undersize_fcerr_pkts)},
+	{"rx_drop_events", QL_SIZEOF(nic_stats.rx_drop_events),
+					QL_OFF(nic_stats.rx_drop_events)},
+	{"rx_fcerr_pkts", QL_SIZEOF(nic_stats.rx_fcerr_pkts),
+					QL_OFF(nic_stats.rx_fcerr_pkts)},
+	{"rx_align_err", QL_SIZEOF(nic_stats.rx_align_err),
+					QL_OFF(nic_stats.rx_align_err)},
+	{"rx_symbol_err", QL_SIZEOF(nic_stats.rx_symbol_err),
+					QL_OFF(nic_stats.rx_symbol_err)},
+	{"rx_mac_err", QL_SIZEOF(nic_stats.rx_mac_err),
+					QL_OFF(nic_stats.rx_mac_err)},
+	{"rx_ctl_pkts",	QL_SIZEOF(nic_stats.rx_ctl_pkts),
+					QL_OFF(nic_stats.rx_ctl_pkts)},
+	{"rx_pause_pkts", QL_SIZEOF(nic_stats.rx_pause_pkts),
+					QL_OFF(nic_stats.rx_pause_pkts)},
+	{"rx_64_pkts", QL_SIZEOF(nic_stats.rx_64_pkts),
+					QL_OFF(nic_stats.rx_64_pkts)},
+	{"rx_65_to_127_pkts", QL_SIZEOF(nic_stats.rx_65_to_127_pkts),
+					QL_OFF(nic_stats.rx_65_to_127_pkts)},
+	{"rx_128_255_pkts", QL_SIZEOF(nic_stats.rx_128_255_pkts),
+					QL_OFF(nic_stats.rx_128_255_pkts)},
+	{"rx_256_511_pkts", QL_SIZEOF(nic_stats.rx_256_511_pkts),
+					QL_OFF(nic_stats.rx_256_511_pkts)},
+	{"rx_512_to_1023_pkts",	QL_SIZEOF(nic_stats.rx_512_to_1023_pkts),
+					QL_OFF(nic_stats.rx_512_to_1023_pkts)},
+	{"rx_1024_to_1518_pkts", QL_SIZEOF(nic_stats.rx_1024_to_1518_pkts),
+					QL_OFF(nic_stats.rx_1024_to_1518_pkts)},
+	{"rx_1519_to_max_pkts",	QL_SIZEOF(nic_stats.rx_1519_to_max_pkts),
+					QL_OFF(nic_stats.rx_1519_to_max_pkts)},
+	{"rx_len_err_pkts", QL_SIZEOF(nic_stats.rx_len_err_pkts),
+					QL_OFF(nic_stats.rx_len_err_pkts)},
+	{"rx_code_err",	QL_SIZEOF(nic_stats.rx_code_err),
+					QL_OFF(nic_stats.rx_code_err)},
+	{"rx_oversize_err", QL_SIZEOF(nic_stats.rx_oversize_err),
+					QL_OFF(nic_stats.rx_oversize_err)},
+	{"rx_undersize_err", QL_SIZEOF(nic_stats.rx_undersize_err),
+					QL_OFF(nic_stats.rx_undersize_err)},
+	{"rx_preamble_err", QL_SIZEOF(nic_stats.rx_preamble_err),
+					QL_OFF(nic_stats.rx_preamble_err)},
+	{"rx_frame_len_err", QL_SIZEOF(nic_stats.rx_frame_len_err),
+					QL_OFF(nic_stats.rx_frame_len_err)},
+	{"rx_crc_err", QL_SIZEOF(nic_stats.rx_crc_err),
+					QL_OFF(nic_stats.rx_crc_err)},
+	{"rx_err_count", QL_SIZEOF(nic_stats.rx_err_count),
+					QL_OFF(nic_stats.rx_err_count)},
+	{"tx_cbfc_pause_frames0", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames0),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames0)},
+	{"tx_cbfc_pause_frames1", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames1),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames1)},
+	{"tx_cbfc_pause_frames2", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames2),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames2)},
+	{"tx_cbfc_pause_frames3", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames3),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames3)},
+	{"tx_cbfc_pause_frames4", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames4),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames4)},
+	{"tx_cbfc_pause_frames5", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames5),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames5)},
+	{"tx_cbfc_pause_frames6", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames6),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames6)},
+	{"tx_cbfc_pause_frames7", QL_SIZEOF(nic_stats.tx_cbfc_pause_frames7),
+				QL_OFF(nic_stats.tx_cbfc_pause_frames7)},
+	{"rx_cbfc_pause_frames0", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames0),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames0)},
+	{"rx_cbfc_pause_frames1", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames1),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames1)},
+	{"rx_cbfc_pause_frames2", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames2),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames2)},
+	{"rx_cbfc_pause_frames3", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames3),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames3)},
+	{"rx_cbfc_pause_frames4", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames4),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames4)},
+	{"rx_cbfc_pause_frames5", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames5),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames5)},
+	{"rx_cbfc_pause_frames6", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames6),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames6)},
+	{"rx_cbfc_pause_frames7", QL_SIZEOF(nic_stats.rx_cbfc_pause_frames7),
+				QL_OFF(nic_stats.rx_cbfc_pause_frames7)},
+	{"rx_nic_fifo_drop", QL_SIZEOF(nic_stats.rx_nic_fifo_drop),
+					QL_OFF(nic_stats.rx_nic_fifo_drop)},
+};
+
 static const char ql_gstrings_test[][ETH_GSTRING_LEN] = {
 	"Loopback test  (offline)"
 };
 #define QLGE_TEST_LEN (sizeof(ql_gstrings_test) / ETH_GSTRING_LEN)
+#define QLGE_STATS_LEN ARRAY_SIZE(ql_gstrings_stats)
 
 static int ql_update_ring_coalescing(struct ql_adapter *qdev)
 {
@@ -183,83 +325,19 @@ quit:
 	QL_DUMP_STAT(qdev);
 }
 
-static char ql_stats_str_arr[][ETH_GSTRING_LEN] = {
-	{"tx_pkts"},
-	{"tx_bytes"},
-	{"tx_mcast_pkts"},
-	{"tx_bcast_pkts"},
-	{"tx_ucast_pkts"},
-	{"tx_ctl_pkts"},
-	{"tx_pause_pkts"},
-	{"tx_64_pkts"},
-	{"tx_65_to_127_pkts"},
-	{"tx_128_to_255_pkts"},
-	{"tx_256_511_pkts"},
-	{"tx_512_to_1023_pkts"},
-	{"tx_1024_to_1518_pkts"},
-	{"tx_1519_to_max_pkts"},
-	{"tx_undersize_pkts"},
-	{"tx_oversize_pkts"},
-	{"rx_bytes"},
-	{"rx_bytes_ok"},
-	{"rx_pkts"},
-	{"rx_pkts_ok"},
-	{"rx_bcast_pkts"},
-	{"rx_mcast_pkts"},
-	{"rx_ucast_pkts"},
-	{"rx_undersize_pkts"},
-	{"rx_oversize_pkts"},
-	{"rx_jabber_pkts"},
-	{"rx_undersize_fcerr_pkts"},
-	{"rx_drop_events"},
-	{"rx_fcerr_pkts"},
-	{"rx_align_err"},
-	{"rx_symbol_err"},
-	{"rx_mac_err"},
-	{"rx_ctl_pkts"},
-	{"rx_pause_pkts"},
-	{"rx_64_pkts"},
-	{"rx_65_to_127_pkts"},
-	{"rx_128_255_pkts"},
-	{"rx_256_511_pkts"},
-	{"rx_512_to_1023_pkts"},
-	{"rx_1024_to_1518_pkts"},
-	{"rx_1519_to_max_pkts"},
-	{"rx_len_err_pkts"},
-	{"rx_code_err"},
-	{"rx_oversize_err"},
-	{"rx_undersize_err"},
-	{"rx_preamble_err"},
-	{"rx_frame_len_err"},
-	{"rx_crc_err"},
-	{"rx_err_count"},
-	{"tx_cbfc_pause_frames0"},
-	{"tx_cbfc_pause_frames1"},
-	{"tx_cbfc_pause_frames2"},
-	{"tx_cbfc_pause_frames3"},
-	{"tx_cbfc_pause_frames4"},
-	{"tx_cbfc_pause_frames5"},
-	{"tx_cbfc_pause_frames6"},
-	{"tx_cbfc_pause_frames7"},
-	{"rx_cbfc_pause_frames0"},
-	{"rx_cbfc_pause_frames1"},
-	{"rx_cbfc_pause_frames2"},
-	{"rx_cbfc_pause_frames3"},
-	{"rx_cbfc_pause_frames4"},
-	{"rx_cbfc_pause_frames5"},
-	{"rx_cbfc_pause_frames6"},
-	{"rx_cbfc_pause_frames7"},
-	{"rx_nic_fifo_drop"},
-};
-
 static void ql_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
 {
+	int index;
 	switch (stringset) {
 	case ETH_SS_TEST:
 		memcpy(buf, *ql_gstrings_test, QLGE_TEST_LEN * ETH_GSTRING_LEN);
 		break;
 	case ETH_SS_STATS:
-		memcpy(buf, ql_stats_str_arr, sizeof(ql_stats_str_arr));
+		for (index = 0; index < QLGE_STATS_LEN; index++) {
+			memcpy(buf + index * ETH_GSTRING_LEN,
+				ql_gstrings_stats[index].stat_string,
+				ETH_GSTRING_LEN);
+		}
 		break;
 	}
 }
@@ -270,7 +348,7 @@ static int ql_get_sset_count(struct net_device *dev, int sset)
 	case ETH_SS_TEST:
 		return QLGE_TEST_LEN;
 	case ETH_SS_STATS:
-		return ARRAY_SIZE(ql_stats_str_arr);
+		return QLGE_STATS_LEN;
 	default:
 		return -EOPNOTSUPP;
 	}
@@ -281,76 +359,17 @@ ql_get_ethtool_stats(struct net_device *ndev,
 		     struct ethtool_stats *stats, u64 *data)
 {
 	struct ql_adapter *qdev = netdev_priv(ndev);
-	struct nic_stats *s = &qdev->nic_stats;
+	int index, length;
 
+	length = QLGE_STATS_LEN;
 	ql_update_stats(qdev);
 
-	*data++ = s->tx_pkts;
-	*data++ = s->tx_bytes;
-	*data++ = s->tx_mcast_pkts;
-	*data++ = s->tx_bcast_pkts;
-	*data++ = s->tx_ucast_pkts;
-	*data++ = s->tx_ctl_pkts;
-	*data++ = s->tx_pause_pkts;
-	*data++ = s->tx_64_pkt;
-	*data++ = s->tx_65_to_127_pkt;
-	*data++ = s->tx_128_to_255_pkt;
-	*data++ = s->tx_256_511_pkt;
-	*data++ = s->tx_512_to_1023_pkt;
-	*data++ = s->tx_1024_to_1518_pkt;
-	*data++ = s->tx_1519_to_max_pkt;
-	*data++ = s->tx_undersize_pkt;
-	*data++ = s->tx_oversize_pkt;
-	*data++ = s->rx_bytes;
-	*data++ = s->rx_bytes_ok;
-	*data++ = s->rx_pkts;
-	*data++ = s->rx_pkts_ok;
-	*data++ = s->rx_bcast_pkts;
-	*data++ = s->rx_mcast_pkts;
-	*data++ = s->rx_ucast_pkts;
-	*data++ = s->rx_undersize_pkts;
-	*data++ = s->rx_oversize_pkts;
-	*data++ = s->rx_jabber_pkts;
-	*data++ = s->rx_undersize_fcerr_pkts;
-	*data++ = s->rx_drop_events;
-	*data++ = s->rx_fcerr_pkts;
-	*data++ = s->rx_align_err;
-	*data++ = s->rx_symbol_err;
-	*data++ = s->rx_mac_err;
-	*data++ = s->rx_ctl_pkts;
-	*data++ = s->rx_pause_pkts;
-	*data++ = s->rx_64_pkts;
-	*data++ = s->rx_65_to_127_pkts;
-	*data++ = s->rx_128_255_pkts;
-	*data++ = s->rx_256_511_pkts;
-	*data++ = s->rx_512_to_1023_pkts;
-	*data++ = s->rx_1024_to_1518_pkts;
-	*data++ = s->rx_1519_to_max_pkts;
-	*data++ = s->rx_len_err_pkts;
-	*data++ = s->rx_code_err;
-	*data++ = s->rx_oversize_err;
-	*data++ = s->rx_undersize_err;
-	*data++ = s->rx_preamble_err;
-	*data++ = s->rx_frame_len_err;
-	*data++ = s->rx_crc_err;
-	*data++ = s->rx_err_count;
-	*data++ = s->tx_cbfc_pause_frames0;
-	*data++ = s->tx_cbfc_pause_frames1;
-	*data++ = s->tx_cbfc_pause_frames2;
-	*data++ = s->tx_cbfc_pause_frames3;
-	*data++ = s->tx_cbfc_pause_frames4;
-	*data++ = s->tx_cbfc_pause_frames5;
-	*data++ = s->tx_cbfc_pause_frames6;
-	*data++ = s->tx_cbfc_pause_frames7;
-	*data++ = s->rx_cbfc_pause_frames0;
-	*data++ = s->rx_cbfc_pause_frames1;
-	*data++ = s->rx_cbfc_pause_frames2;
-	*data++ = s->rx_cbfc_pause_frames3;
-	*data++ = s->rx_cbfc_pause_frames4;
-	*data++ = s->rx_cbfc_pause_frames5;
-	*data++ = s->rx_cbfc_pause_frames6;
-	*data++ = s->rx_cbfc_pause_frames7;
-	*data++ = s->rx_nic_fifo_drop;
+	for (index = 0; index < length; index++) {
+		char *p = (char *)qdev +
+			ql_gstrings_stats[index].stat_offset;
+		*data++ = (ql_gstrings_stats[index].sizeof_stat ==
+			sizeof(u64)) ? *(u64 *)p : (*(u32 *)p);
+	}
 }
 
 static int ql_get_settings(struct net_device *ndev,
-- 
1.7.1

^ permalink raw reply related
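
For readers following the refactoring in 6/7: the patch replaces one
assignment per counter with a table of (name, size, offset) entries that is
walked in a loop. Below is a minimal user-space sketch of that idea. It is
not the driver code; demo_adapter, demo_stats, stat_desc and DEMO_STAT are
hypothetical stand-ins for ql_adapter, nic_stats and the QL_SIZEOF/QL_OFF
macros, whose definitions are not shown in this part of the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's counters; not the real nic_stats. */
struct demo_stats {
	uint64_t tx_pkts;
	uint64_t tx_bytes;
	uint32_t rx_fifo_drop;	/* 32-bit on purpose, to exercise the size check */
};

/* Hypothetical stand-in for the adapter private struct. */
struct demo_adapter {
	int unrelated_field;
	struct demo_stats stats;
};

/* One table entry per counter: its name, its size and where it lives. */
struct stat_desc {
	const char *name;
	size_t size;
	size_t offset;
};

#define DEMO_STAT(m) \
	{ #m, sizeof(((struct demo_adapter *)0)->stats.m), \
	  offsetof(struct demo_adapter, stats.m) }

static const struct stat_desc demo_stat_table[] = {
	DEMO_STAT(tx_pkts),
	DEMO_STAT(tx_bytes),
	DEMO_STAT(rx_fifo_drop),
};

#define DEMO_STATS_LEN (sizeof(demo_stat_table) / sizeof(demo_stat_table[0]))

/* Copy every counter into data[] as a 64-bit value, widening 32-bit ones. */
static void demo_get_stats(const struct demo_adapter *ad, uint64_t *data)
{
	size_t i;

	for (i = 0; i < DEMO_STATS_LEN; i++) {
		const char *p = (const char *)ad + demo_stat_table[i].offset;

		if (demo_stat_table[i].size == sizeof(uint64_t)) {
			uint64_t v;

			memcpy(&v, p, sizeof(v));
			data[i] = v;
		} else {
			uint32_t v;

			/* This sketch only knows 32-bit and 64-bit counters. */
			assert(demo_stat_table[i].size == sizeof(uint32_t));
			memcpy(&v, p, sizeof(v));
			data[i] = v;
		}
	}
}
```

With a table like this the string list and the value list cannot drift out
of order, because both are generated from the same entries.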

* [PATCH net 7/7] qlge: Bumped driver version to 1.00.00.31
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge.h b/drivers/net/ethernet/qlogic/qlge/qlge.h
index e81bbb7..5a8c00c 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge.h
+++ b/drivers/net/ethernet/qlogic/qlge/qlge.h
@@ -18,7 +18,7 @@
  */
 #define DRV_NAME  	"qlge"
 #define DRV_STRING 	"QLogic 10 Gigabit PCI-E Ethernet Driver "
-#define DRV_VERSION	"v1.00.00.30.00.00-01"
+#define DRV_VERSION	"v1.00.00.31"
 
 #define WQ_ADDR_ALIGN	0x3	/* 4 byte alignment */
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

The qlge driver wrongly counted a full TX ring as a TX error.
A full TX ring is expected behavior when the NIC is overwhelmed,
and it is not an error as long as no packets are lost.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 09d8d33..cdbc860 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2562,7 +2562,6 @@ static netdev_tx_t qlge_send(struct sk_buff *skb, struct net_device *ndev)
 			   __func__, tx_ring_idx);
 		netif_stop_subqueue(ndev, tx_ring->wq_id);
 		atomic_inc(&tx_ring->queue_stopped);
-		tx_ring->tx_errors++;
 		return NETDEV_TX_BUSY;
 	}
 	tx_ring_desc = &tx_ring->q[tx_ring->prod_idx];
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 4/7] qlge: Fixed double pci free upon tx_ring->q allocation failure.
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index cdbc860..9ecd15f 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2701,20 +2701,20 @@ static int ql_alloc_tx_resources(struct ql_adapter *qdev,
 	    pci_alloc_consistent(qdev->pdev, tx_ring->wq_size,
 				 &tx_ring->wq_base_dma);
 
-	if ((tx_ring->wq_base == NULL) ||
-	    tx_ring->wq_base_dma & WQ_ADDR_ALIGN) {
-		netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
-		return -ENOMEM;
-	}
+	if (!tx_ring->wq_base || tx_ring->wq_base_dma & WQ_ADDR_ALIGN)
+		goto err;
+
 	tx_ring->q =
 	    kmalloc(tx_ring->wq_len * sizeof(struct tx_ring_desc), GFP_KERNEL);
-	if (tx_ring->q == NULL)
+	if (!tx_ring->q)
 		goto err;
 
 	return 0;
 err:
 	pci_free_consistent(qdev->pdev, tx_ring->wq_size,
-			    tx_ring->wq_base, tx_ring->wq_base_dma);
+			tx_ring->wq_base, tx_ring->wq_base_dma);
+	tx_ring->wq_base = NULL;
+	netif_err(qdev, ifup, qdev->ndev, "tx_ring alloc failed.\n");
 	return -ENOMEM;
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 2/7] qlge: Stand-up card should not report supporting wol.
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria
In-Reply-To: <1341272514-5156-1-git-send-email-jitendra.kalsaria@qlogic.com>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c |   46 ++++++++++++++---------
 1 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
index 8e2c2a7..98f04d7 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_ethtool.c
@@ -388,30 +388,40 @@ static void ql_get_drvinfo(struct net_device *ndev,
 static void ql_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct ql_adapter *qdev = netdev_priv(ndev);
-	/* What we support. */
-	wol->supported = WAKE_MAGIC;
-	/* What we've currently got set. */
-	wol->wolopts = qdev->wol;
+	unsigned short ssys_dev = qdev->pdev->subsystem_device;
+
+	if (ssys_dev == 0x0068 || ssys_dev == 0x0180) {
+		wol->supported = WAKE_MAGIC;
+		wol->wolopts = qdev->wol;
+	}
 }
 
 static int ql_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct ql_adapter *qdev = netdev_priv(ndev);
-	int status;
+	unsigned short ssys_dev = qdev->pdev->subsystem_device;
 
-	if (wol->wolopts & ~WAKE_MAGIC)
-		return -EINVAL;
-	qdev->wol = wol->wolopts;
-
-	netif_info(qdev, drv, qdev->ndev, "Set wol option 0x%x\n", qdev->wol);
-	if (!qdev->wol) {
-		u32 wol = 0;
-		status = ql_mb_wol_mode(qdev, wol);
-		netif_err(qdev, drv, qdev->ndev, "WOL %s (wol code 0x%x)\n",
-			  status == 0 ? "cleared successfully" : "clear failed",
-			  wol);
-	}
+	if (ssys_dev == 0x0068 || ssys_dev == 0x0180) {
+		if (wol->wolopts & ~WAKE_MAGIC)
+			return -EINVAL;
+		qdev->wol = wol->wolopts;
+
+		netif_info(qdev, drv, qdev->ndev,
+				"Set wol option 0x%x\n", qdev->wol);
+		if (!qdev->wol) {
+			u32 wol = 0;
+			int status = 0;
 
+			status = ql_mb_wol_mode(qdev, wol);
+			netif_err(qdev, drv, qdev->ndev,
+			"WOL %s (wol code 0x%x)\n",
+			status == 0 ? "cleared successfully" : "clear failed",
+			wol);
+		}
+	} else {
+		netif_info(qdev, drv, qdev->ndev,
+				"WOL is not supported on stand-up card\n");
+	}
 	return 0;
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 0/7] qlge: bug fix
From: Jitendra Kalsaria @ 2012-07-02 23:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver, Jitendra Kalsaria

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>

Please apply it to net.

Thanks,
Jitendra

^ permalink raw reply

* Re: [PATCH v6] bonding support for IPv6 transmit hashing
From: Jay Vosburgh @ 2012-07-02 23:33 UTC (permalink / raw)
  To: John Eaglesham; +Cc: netdev
In-Reply-To: <eb20e8f67e6aad94c233219e058c3e793ed755cb.1341167171.git.linux@8192.net>

John Eaglesham <linux@8192.net> wrote:

>Currently the "bonding" driver does not support load balancing outgoing
>traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
>are currently supported; this patch adds transmit hashing for IPv6 (and
>TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
>bonding driver.
>
>The algorithm chosen (xor'ing the bottom three quads of the source and
>destination addresses together, then xor'ing each byte of that result into
>the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
>was selected after testing almost 400,000 unique IPv6 addresses harvested
>from server logs. This algorithm had the most even distribution for both
>big- and little-endian architectures while still using few instructions. Its
>behavior also attempts to closely match that of the IPv4 algorithm.
>
>The IPv6 flow label was intentionally not included in the hash as it appears
>to be unset in the vast majority of IPv6 traffic sampled, and the current
>algorithm not using the flow label already offers a very even distribution.
>
>Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
>i.e., they are not balanced based on layer 4 information. Additionally,
>IPv6 packets with intermediate headers are not balanced based on layer
>4 information. In practice these intermediate headers are not common and
>this should not cause any problems, and the alternative (a packet-parsing
>loop and look-up table) seemed slow and complicated for little gain.
>
>This is an update to prior patches I submitted. This version includes:
>* Updated and clarified description
>* IPv6 algorithm more closely matches that of IPv4
>* Thorough bounds checking on all xmit functions
>* Consolidate layer 2 hashing logic into one function
>* Update style as per Jay Vosburgh and David Miller
>* Patches against net-next as one patch
>
>Patch has been tested and performs as expected.
>
>John Eaglesham
>
>---
> Documentation/networking/bonding.txt | 32 +++++++++++--
> drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
> 2 files changed, 92 insertions(+), 31 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index bfea8a3..3851dad 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -752,12 +752,23 @@ xmit_hash_policy
> 		protocol information to generate the hash.
>
> 		Uses XOR of hardware MAC addresses and IP addresses to
>-		generate the hash.  The formula is
>+		generate the hash.  The IPv4 formula is
>
> 		(((source IP XOR dest IP) AND 0xffff) XOR
> 			( source MAC XOR destination MAC ))
> 				modulo slave count
>
>+		The IPv6 formula is
>+
>+		hash =
>+			(source ip quad 2 XOR dest IP quad 2) XOR
>+			(source ip quad 3 XOR dest IP quad 3) XOR
>+			(source ip quad 4 XOR dest IP quad 4)
>+
>+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
>+			(source MAC XOR destination MAC))
>+				modulo slave count

	This seems to be missing an XOR, between the end of "XOR hash)"
and the start of "(source MAC".

>+
> 		This algorithm will place all traffic to a particular
> 		network peer on the same slave.  For non-IP traffic,
> 		the formula is the same as for the layer2 transmit
>@@ -778,19 +789,30 @@ xmit_hash_policy
> 		slaves, although a single connection will not span
> 		multiple slaves.
>
>-		The formula for unfragmented TCP and UDP packets is
>+		The formula for unfragmented IPv4 TCP and UDP packets is
>
> 		((source port XOR dest port) XOR
> 			 ((source IP XOR dest IP) AND 0xffff)
> 				modulo slave count
>
>-		For fragmented TCP or UDP packets and all other IP
>-		protocol traffic, the source and destination port
>+		The formula for unfragmented IPv6 TCP and UDP packets is
>+
>+		hash =
>+			(source ip quad 2 XOR dest IP quad 2) XOR
>+			(source ip quad 3 XOR dest IP quad 3) XOR
>+			(source ip quad 4 XOR dest IP quad 4)
>+
>+		((source port XOR dest port) XOR
>+			(hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
>+				modulo slave count
>+
>+		For fragmented TCP or UDP packets and all other IPv4 and
>+		IPv6 protocol traffic, the source and destination port
> 		information is omitted.  For non-IP traffic, the
> 		formula is the same as for the layer2 transmit hash
> 		policy.
>
>-		This policy is intended to mimic the behavior of
>+		The IPv4 policy is intended to mimic the behavior of
> 		certain switches, notably Cisco switches with PFC2 as
> 		well as some Foundry and IBM products.
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f5a40b9..c733d55 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = {
> /*---------------------------- Hashing Policies -----------------------------*/
>
> /*
>+ * Hash for the output device based upon layer 2 data
>+ */
>+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+{
>+	struct ethhdr *data = (struct ethhdr *)skb->data;
>+
>+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>+		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+
>+	return 0;
>+}
>+
>+/*
>  * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP mimic bond_xmit_hash_policy_l2()
>+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>  */
> static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>-	struct iphdr *iph = ip_hdr(skb);
>-
>-	if (skb->protocol == htons(ETH_P_IP)) {
>+	struct iphdr *iph;
>+	struct ipv6hdr *ipv6h;
>+	u32 v6hash;
>+	__be32 *s, *d;
>+
>+	if (skb->protocol == htons(ETH_P_IP) &&
>+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
>+		iph = ip_hdr(skb);
> 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
> 			(data->h_dest[5] ^ data->h_source[5])) % count;
>+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
>+		ipv6h = ipv6_hdr(skb);
>+		s = &ipv6h->saddr.s6_addr32[0];
>+		d = &ipv6h->daddr.s6_addr32[0];
>+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
> 	}
>
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>+	return bond_xmit_hash_policy_l2(skb, count);
> }
>
> /*
>  * Hash for the output device based upon layer 3 and layer 4 data. If
>  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, mimic bond_xmit_hash_policy_l2()
>+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>  */
> static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
> {
>-	struct ethhdr *data = (struct ethhdr *)skb->data;
>-	struct iphdr *iph = ip_hdr(skb);
>-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>-	int layer4_xor = 0;
>+	u32 layer4_xor = 0;
>+	struct iphdr *iph;
>+	struct ipv6hdr *ipv6h;
>+	__be32 *s, *d;
>+	__be16 *layer4hdr;
>
> 	if (skb->protocol == htons(ETH_P_IP)) {
>+		iph = ip_hdr(skb);
> 		if (!ip_is_fragment(iph) &&
>-		    (iph->protocol == IPPROTO_TCP ||
>-		     iph->protocol == IPPROTO_UDP)) {
>+			(iph->protocol == IPPROTO_TCP ||
>+			iph->protocol == IPPROTO_UDP)) {

	Why did these two lines change?

>+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
>+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
>+				skb_headlen(skb) - skb_network_offset(skb))
>+				goto short_header;
> 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
>+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
>+			goto short_header;
> 		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-
>+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;

	This line runs past 80 columns.  There are a few more of these
further down.

>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		ipv6h = ipv6_hdr(skb);
>+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
>+			layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));

	Could this be written as

			layer4hdr = (__be16 *)(ipv6h + 1);

	instead?

	-J

>+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
>+				skb_headlen(skb) - skb_network_offset(skb))
>+				goto short_header;
>+			layer4_xor = (*layer4hdr ^ *(layer4hdr + 1));
>+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
>+			goto short_header;
>+		}
>+		s = &ipv6h->saddr.s6_addr32[0];
>+		d = &ipv6h->daddr.s6_addr32[0];
>+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8);
>+		return layer4_xor % count;
> 	}
>
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>-}
>-
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>-{
>-	struct ethhdr *data = (struct ethhdr *)skb->data;
>-
>-	return (data->h_dest[5] ^ data->h_source[5]) % count;
>+short_header:
>+	return bond_xmit_hash_policy_l2(skb, count);
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>-- 
>1.7.11

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* [PATCH v6] bonding support for IPv6 transmit hashing
From: John Eaglesham @ 2012-07-01 19:13 UTC (permalink / raw)
  To: netdev; +Cc: John Eaglesham
In-Reply-To: <c390ff662ede0f304398a88ca1a96e3773065c60.1341129744.git.linux@8192.net>

Currently the "bonding" driver does not support load balancing outgoing
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4)
are currently supported; this patch adds transmit hashing for IPv6 (and
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the
bonding driver.

The algorithm chosen (xor'ing the bottom three quads of the source and
destination addresses together, then xor'ing each byte of that result into
the bottom byte, finally xor'ing with the last bytes of the MAC addresses)
was selected after testing almost 400,000 unique IPv6 addresses harvested
from server logs. This algorithm had the most even distribution for both
big- and little-endian architectures while still using few instructions. Its
behavior also attempts to closely match that of the IPv4 algorithm.
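
As a rough illustration of the layer2+3 IPv6 hash described above, here is
a minimal user-space sketch. It mirrors the arithmetic in the patch's
bond_xmit_hash_policy_l23() but is not the kernel code: demo_v6_l23_hash
and its parameters are hypothetical names, and the address words are used
as stored, with no byte-order conversion.

```c
#include <assert.h>
#include <stdint.h>

/*
 * s and d are the source and destination IPv6 addresses as four 32-bit
 * words each. Word 0 (the top quad) is ignored; only the bottom three
 * quads take part in the hash.
 */
static unsigned int demo_v6_l23_hash(const uint32_t s[4], const uint32_t d[4],
				     uint8_t src_mac_last, uint8_t dst_mac_last,
				     unsigned int slave_count)
{
	uint32_t hash;

	assert(slave_count > 0);

	/* XOR the bottom three quads of both addresses together. */
	hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);

	/* Fold the upper bytes into the low byte (upper bits are kept). */
	hash ^= (hash >> 24) ^ (hash >> 16) ^ (hash >> 8);

	/* Mix in the last byte of each MAC address, then pick a slave. */
	return (hash ^ dst_mac_last ^ src_mac_last) % slave_count;
}
```

Because only XOR is used, swapping source and destination gives the same
result, so both directions of a flow map to the same slave index.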

The IPv6 flow label was intentionally not included in the hash as it appears
to be unset in the vast majority of IPv6 traffic sampled, and the current
algorithm not using the flow label already offers a very even distribution.

Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets,
i.e., they are not balanced based on layer 4 information. Additionally,
IPv6 packets with intermediate headers are not balanced based on layer
4 information. In practice these intermediate headers are not common and
this should not cause any problems, and the alternative (a packet-parsing
loop and look-up table) seemed slow and complicated for little gain.

This is an update to prior patches I submitted. This version includes:
* Updated and clarified description
* IPv6 algorithm more closely matches that of IPv4
* Thorough bounds checking on all xmit functions
* Consolidate layer 2 hashing logic into one function
* Update style as per Jay Vosburgh and David Miller
* Patches against net-next as one patch

Patch has been tested and performs as expected.

John Eaglesham

---
 Documentation/networking/bonding.txt | 32 +++++++++++--
 drivers/net/bonding/bond_main.c      | 91 +++++++++++++++++++++++++-----------
 2 files changed, 92 insertions(+), 31 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index bfea8a3..3851dad 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -752,12 +752,23 @@ xmit_hash_policy
 		protocol information to generate the hash.
 
 		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is
 
 		(((source IP XOR dest IP) AND 0xffff) XOR
 			( source MAC XOR destination MAC ))
 				modulo slave count
 
+		The IPv6 formula is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		(((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+			(source MAC XOR destination MAC))
+				modulo slave count
+
 		This algorithm will place all traffic to a particular
 		network peer on the same slave.  For non-IP traffic,
 		the formula is the same as for the layer2 transmit
@@ -778,19 +789,30 @@ xmit_hash_policy
 		slaves, although a single connection will not span
 		multiple slaves.
 
-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is
 
 		((source port XOR dest port) XOR
 			 ((source IP XOR dest IP) AND 0xffff)
 				modulo slave count
 
-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		hash =
+			(source ip quad 2 XOR dest IP quad 2) XOR
+			(source ip quad 3 XOR dest IP quad 3) XOR
+			(source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			(hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
 		information is omitted.  For non-IP traffic, the
 		formula is the same as for the layer2 transmit hash
 		policy.
 
-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
 		certain switches, notably Cisco switches with PFC2 as
 		well as some Foundry and IBM products.
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f5a40b9..c733d55 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3345,56 +3345,95 @@ static struct notifier_block bond_netdev_notifier = {
 /*---------------------------- Hashing Policies -----------------------------*/
 
 /*
+ * Hash for the output device based upon layer 2 data
+ */
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+
+	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+
+	return 0;
+}
+
+/*
  * Hash for the output device based upon layer 2 and layer 3 data. If
- * the packet is not IP mimic bond_xmit_hash_policy_l2()
+ * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-
-	if (skb->protocol == htons(ETH_P_IP)) {
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	u32 v6hash;
+	__be32 *s, *d;
+
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		iph = ip_hdr(skb);
 		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
 			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		ipv6h = ipv6_hdr(skb);
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*
  * Hash for the output device based upon layer 3 and layer 4 data. If
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
- * altogether not IP, mimic bond_xmit_hash_policy_l2()
+ * altogether not IP, fall back on bond_xmit_hash_policy_l2()
  */
 static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
 {
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;
+	struct iphdr *iph;
+	struct ipv6hdr *ipv6h;
+	__be32 *s, *d;
+	__be16 *layer4hdr;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
+		iph = ip_hdr(skb);
 		if (!ip_is_fragment(iph) &&
-		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
+			(iph->protocol == IPPROTO_TCP ||
+			iph->protocol == IPPROTO_UDP)) {
+			layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+			if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
 			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
+		} else if (skb_network_header_len(skb) < sizeof(struct iphdr)) {
+			goto short_header;
 		}
-		return (layer4_xor ^
-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
+		return (layer4_xor ^ ((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		ipv6h = ipv6_hdr(skb);
+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
+			layer4hdr = (__be16 *)((u8 *)ipv6h + sizeof(struct ipv6hdr));
+			if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+				skb_headlen(skb) - skb_network_offset(skb))
+				goto short_header;
+			layer4_xor = (*layer4hdr ^ *(layer4hdr + 1));
+		} else if (skb_network_header_len(skb) < sizeof(struct ipv6hdr)) {
+			goto short_header;
+		}
+		s = &ipv6h->saddr.s6_addr32[0];
+		d = &ipv6h->daddr.s6_addr32[0];
+		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
+		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^ (layer4_xor >> 8);
+		return layer4_xor % count;
 	}
 
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
-}
-
-/*
- * Hash for the output device based upon layer 2 data
- */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
-{
-	struct ethhdr *data = (struct ethhdr *)skb->data;
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+short_header:
+	return bond_xmit_hash_policy_l2(skb, count);
 }
 
 /*-------------------------- Device entry points ----------------------------*/
-- 
1.7.11

^ permalink raw reply related

* Re: AW: AW: RFC: replace packets already in queue
From: Eric Dumazet @ 2012-07-02 21:56 UTC (permalink / raw)
  To: Nicolas de Pesloüan; +Cc: Erdt, Ralph, Rick Jones, netdev@vger.kernel.org
In-Reply-To: <4FF20557.4090501@gmail.com>

On Mon, 2012-07-02 at 22:32 +0200, Nicolas de Pesloüan wrote:
> Le 02/07/2012 10:38, Erdt, Ralph a écrit :
> >>> Even if the wireless queue is a problem (because of our setup, this
> >> is
> >>> not a problem), the network stack queue (*) is the biggest queue, and
> >>> a good point to optimize.
> >>
> >> Hmm, I am not convinced you have no queues on wireless.
> >>
> >> Please describe how you managed this.
> >>
> >> In fact this is the biggest problem with wireless: the mac80211 framework
> >> aggressively pulls packets from the Linux packet qdisc in order to perform
> >> packet aggregation.
> >
> > I was not talking about W-LAN (802.11). I'm talking about a proprietary technology which is able to
> > send over KILOMETERs (WLAN < 100m) but with VERY low bandwidth: 9600 bit/s (no Mega, Giga or Kilo!)
> > (W-LAN: slowest: 1 Mbit/s). The device is loosely connected to our boxes: there is no Linux driver, but a
> > program which creates a virtual network device. It sends one packet to the device and
> > then waits for the acknowledgement that the packet was sent. THEN the next packet is sent.
> > There is no further queue, because the wireless link is so slow that there is no need for one! (BTW:
> > the qdisc and the connector are distinct problems/programs. There is no dependency.)
> 
> If I were you, I would use a tun/tap interface and manage a private packet queue in userspace. This
> way, you wouldn't have to manage the overhead of porting your kernel code to every new kernel version.
> 

This seems like a good idea.

Then you can do other coalescing as well, such as aggregating several
TCP ACKs into a single ACK.
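
Editorial sketch of what such a private userspace queue could look like: a small FIFO in which a new packet replaces an older, still-queued packet with the same key instead of being appended behind it. Everything here is an assumption made for illustration: the names pktq_put and pktq_get, the fixed sizes, and the idea of a caller-supplied 32-bit key that identifies "the same kind of packet". In a real program the packets would come from read() on the tun/tap file descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QLEN    8	/* hypothetical queue depth */
#define MAX_PKT 1500	/* hypothetical maximum packet size */

struct pkt {
	uint32_t key;		/* caller-defined flow/kind identifier */
	size_t len;
	uint8_t data[MAX_PKT];
};

struct pktq {
	struct pkt slot[QLEN];
	size_t head;		/* index of the oldest packet */
	size_t count;		/* number of queued packets */
};

/* Returns 1 if an older packet with the same key was replaced in place,
 * 0 if the packet was appended, -1 if it was rejected (full or too long). */
static int pktq_put(struct pktq *q, uint32_t key, const void *data, size_t len)
{
	struct pkt *p;
	size_t i;

	assert(q != NULL && data != NULL);
	if (len > MAX_PKT)
		return -1;

	/* Replace a stale packet, keeping its position in the queue. */
	for (i = 0; i < q->count; i++) {
		p = &q->slot[(q->head + i) % QLEN];
		if (p->key == key) {
			memcpy(p->data, data, len);
			p->len = len;
			return 1;
		}
	}

	if (q->count == QLEN)
		return -1;

	p = &q->slot[(q->head + q->count) % QLEN];
	p->key = key;
	p->len = len;
	memcpy(p->data, data, len);
	q->count++;
	return 0;
}

/* Removes the oldest packet. The returned pointer is only valid until
 * the next pktq_put(); returns NULL if the queue is empty. */
static const struct pkt *pktq_get(struct pktq *q)
{
	const struct pkt *p;

	assert(q != NULL);
	if (q->count == 0)
		return NULL;
	p = &q->slot[q->head];
	q->head = (q->head + 1) % QLEN;
	q->count--;
	return p;
}
```

Replacing in place keeps the packet's original position, so a flow that is updated often does not lose its turn on the slow link. The same lookup could be used to merge ACKs, with the key derived from the TCP 4-tuple.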

^ permalink raw reply

* Re: [RFC] [TCP 0/3] Receive from socket into bio without copying
From: Eric Dumazet @ 2012-07-02 21:37 UTC (permalink / raw)
  To: chetan loke
  Cc: Andreas Gruenbacher, netdev, linux-kernel, Herbert Xu,
	David S. Miller, Willy Tarreau
In-Reply-To: <CAAsGZS6ahzUo+QRcLOyPcj5=mp53O+M9kqByOwPb0zZcr-cgwQ@mail.gmail.com>

On Mon, 2012-07-02 at 15:41 -0400, chetan loke wrote:
> On Mon, Jul 2, 2012 at 12:06 PM, Andreas Gruenbacher <agruen@linbit.com> wrote:
> > On Mon, 2012-07-02 at 15:54 +0200, Eric Dumazet wrote:
> >> So I will just say no to your patches, unless you demonstrate the
> >> splice() problems, and how you can fix the alignment problem in a new
> >> layer instead of in the existing zero copy standard one.
> >
> > Again, splice or not is not the issue here. It does not, by itself, allow zero
> > copy from the network directly to disk but it could likely be made to support
> > that if we can get the alignment right first.  The proposed MSG_NEW_PACKET flag
> > helps with that, but maybe someone has a better idea.
> >
> 
> Eric - by using splice do you mean something like:
> 
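> #define PIPE_SIZE (64*1024)
> int filedes[2];
> pipe(filedes);
> /* splice-in: socket to pipe; a socket is not seekable, so its offset is NULL */
> ret = splice(sock_fd_from, NULL, filedes[1], NULL, PIPE_SIZE,
>              SPLICE_F_MORE | SPLICE_F_MOVE);
>
> /* splice-out: pipe to file, at most the bytes just spliced in */
> ret = splice(filedes[0], NULL, file_fd_to,
>              &to_offset, ret,
>              SPLICE_F_MORE | SPLICE_F_MOVE);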
> 

Yes, that's more or less the plan. You can also try a bigger
PIPE_SIZE if needed.
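
Editorial sketch of the socket to pipe to file relay discussed here, written as a complete loop with error handling. The helper name splice_relay and its pipe_size parameter are hypothetical. The sketch assumes Linux with glibc, a blocking input descriptor, and an output file that was not opened with O_APPEND. Resizing the pipe with F_SETPIPE_SZ is treated as optional, so a failure there is ignored.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Relay everything readable on in_fd (for example a TCP socket) into
 * out_fd (a regular file) through an intermediate pipe.
 * Returns the number of bytes written, or -1 with errno set. */
static ssize_t splice_relay(int in_fd, int out_fd, int pipe_size)
{
	int p[2];
	loff_t out_off = 0;
	ssize_t total = 0;
	ssize_t in, out;

	assert(pipe_size > 0);
	if (pipe(p) < 0)
		return -1;

	/* Optional: ask for a bigger pipe; not fatal if it fails. */
	(void)fcntl(p[1], F_SETPIPE_SZ, pipe_size);

	for (;;) {
		/* The input is not seekable, so its offset argument is NULL. */
		in = splice(in_fd, NULL, p[1], NULL, (size_t)pipe_size,
			    SPLICE_F_MOVE | SPLICE_F_MORE);
		if (in == 0)
			break;			/* end of stream */
		if (in < 0) {
			if (errno == EINTR)
				continue;
			total = -1;
			break;
		}

		/* Drain exactly what was queued; each splice may be short. */
		while (in > 0) {
			out = splice(p[0], NULL, out_fd, &out_off, (size_t)in,
				     SPLICE_F_MOVE | SPLICE_F_MORE);
			if (out < 0 && errno == EINTR)
				continue;
			if (out <= 0) {
				if (out == 0)
					errno = EIO;
				total = -1;
				goto done;
			}
			in -= out;
			total += out;
		}
	}
done:
	close(p[0]);
	close(p[1]);
	return total;
}
```

The inner loop matters because the second splice() may move fewer bytes than the first one queued; data left in the pipe would otherwise be interleaved with the next chunk.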

> 
> i.e. splice-in from socket to pipe, and splice-out from pipe to destination?
> 
> Andreas - if the above assumption is true then can you apply the
> 'MSG_NEW_PACKET' on the sender and see if the above pseudo-splice code
> achieves something similar to what you expect on the receive side(you
> can also play w/ F_SETPIPE_SZ -  although I found very little
> reduction in CPU usage)? Note: My personal experience - using splice
> from an input-file-A to output-file-B bought very minimal cpu
> reduction(yes, both the files used O_DIRECT). Instead, a simple
> read/write w/ O_DIRECT from file-A to file-B was much much faster.

splice() performance from socket to pipe has improved a lot in
linux-3.5.

Until very recent patches, it was not true zero copy.
(It was zero copy only on a certain class of NICs, not on the ones found
on appliances or cheap platforms.)

Willy Tarreau mentioned a nice boost of performance with haproxy.

Willy wanted to work on a direct splice from socket to socket, but
I am not sure it'll bring major speed improvement.

^ permalink raw reply

* Re: AW: AW: RFC: replace packets already in queue
From: Nicolas de Pesloüan @ 2012-07-02 20:32 UTC (permalink / raw)
  To: Erdt, Ralph; +Cc: Eric Dumazet, Rick Jones, netdev@vger.kernel.org
In-Reply-To: <FB112703C4930F4ABEBB5B763F96491139379643@MAILSERV2A.lorien.fkie.fgan.de>

Le 02/07/2012 10:38, Erdt, Ralph a écrit :
>>> Even if the wireless queue is a problem (because of our setup, this is
>>> not a problem), the network stack queue (*) is the biggest queue, and
>>> a good point to optimize.
>>
>> Hmm, I am not convinced you have no queues on wireless.
>>
>> Please describe how you managed this.
>>
>> In fact this is the biggest problem with wireless: the mac80211 framework
>> aggressively pulls packets from the Linux packet qdisc in order to perform
>> packet aggregation.
>
> I am not talking about W-LAN (802.11). I'm talking about a proprietary technology which is able to
> send over KILOMETERs (WLAN < 100m) but with VERY low bandwidth: 9600 bit/s (no Mega, Giga or Kilo!)
> (W-LAN: slowest: 1 Mbit/s). The device is loosely connected to our boxes: no Linux driver, but a
> program which creates a virtual network device. This just sends one packet to the device and
> then waits for the acknowledgement that the packet was sent. THEN the next packet will be sent.
> There is no further queue, because the wireless is so lame that there is no need for one! (BTW:
> the qdisc and the connector are distinct problems/programs. There is no dependency.)

If I were you, I would use a tun/tap interface and manage a private packet queue in userspace. This
way, you wouldn't have to manage the overhead of porting your kernel code to every new kernel version.

	Nicolas.

^ permalink raw reply

* Deleting an alias causes rest to get deleted
From: Volkan Yazıcı @ 2012-07-02 19:54 UTC (permalink / raw)
  To: netdev

Hi!

I observe an IP aliasing anomaly when I try to delete an IP alias from an
interface. When I delete the first address in a set of aliased addresses
that belong to the same subnet, the rest of the aliases get deleted as
well. See the snippet below.

    $ for I in `seq 1 6`; do sudo ip addr add 192.168.2.$I/29 dev eth0; done
    $ ip addr list
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
         inet 127.0.0.1/8 scope host lo
         inet6 ::1/128 scope host
            valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
         link/ether 00:24:54:b9:1c:f8 brd ff:ff:ff:ff:ff:ff
         inet 192.168.1.200/24 brd 192.168.1.255 scope global eth0
         inet 192.168.2.1/29 scope global eth0
         inet 192.168.2.2/29 scope global secondary eth0
         inet 192.168.2.3/29 scope global secondary eth0
         inet 192.168.2.4/29 scope global secondary eth0
         inet 192.168.2.5/29 scope global secondary eth0
         inet 192.168.2.6/29 scope global secondary eth0
         inet6 fe80::224:54ff:feb9:1cf8/64 scope link
            valid_lft forever preferred_lft forever
    3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
         link/ether e8:39:df:6a:21:2a brd ff:ff:ff:ff:ff:ff
    $ sudo ip addr del 192.168.2.1/29 dev eth0
    $ ip addr list
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
         inet 127.0.0.1/8 scope host lo
         inet6 ::1/128 scope host
            valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
         link/ether 00:24:54:b9:1c:f8 brd ff:ff:ff:ff:ff:ff
         inet 192.168.1.200/24 brd 192.168.1.255 scope global eth0
         inet6 fe80::224:54ff:feb9:1cf8/64 scope link
            valid_lft forever preferred_lft forever
    3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
         link/ether e8:39:df:6a:21:2a brd ff:ff:ff:ff:ff:ff

As you can see, deleting 192.168.2.1/29 causes the rest of the aliases
to be deleted as well. This is briefly documented in the ifconfig manual:
"for every scope (i.e. same net with address/netmask combination) all
aliases are deleted, if you delete the first (primary)". So what is the
right way to delete just the first (primary) alias without affecting the
rest? If this is a scoping issue, is it possible to make each alias the
primary within its own dedicated scope?
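
Editorial sketch of the "same net with address/netmask combination" rule quoted from the ifconfig manual. It only illustrates the subnet comparison that explains why 192.168.2.2 through 192.168.2.6 are listed as secondary to 192.168.2.1 in the output above; it is not kernel code, and the helper names ipv4, prefix_mask and same_subnet are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Netmask for a prefix length between 0 and 32. */
static uint32_t prefix_mask(unsigned int prefixlen)
{
	assert(prefixlen <= 32);
	if (prefixlen == 0)
		return 0;
	return (uint32_t)(~(uint32_t)0 << (32 - prefixlen));
}

/* Two addresses configured with the same prefix length share a subnet
 * when their network parts are equal. */
static int same_subnet(uint32_t a, uint32_t b, unsigned int prefixlen)
{
	uint32_t mask = prefix_mask(prefixlen);

	return (a & mask) == (b & mask);
}
```

With a /29 prefix, all six addresses fall into 192.168.2.0/29, so the first one added becomes the primary and the others depend on it. With a /32 prefix each address would be its own subnet and therefore its own primary. Another option that is commonly suggested is the net.ipv4.conf.<dev>.promote_secondaries sysctl, which promotes a secondary address when its primary is removed instead of deleting it.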

As a side note, when I first asked Stephen Hemminger this question (he
referred me to this mailing list), he also told me: "In Linux the
interface aliases are really a legacy from the BSD style addressing, and
don't act the same. It is not common practice to use them." Is that
really the case? As you know, IP aliasing is at the heart of a majority
of the high-availability and clustering solutions on Linux. Is IP
aliasing really a deprecated technology in Linux? Should we avoid using
it? If so, what do you recommend as an alternative?

Best.

^ permalink raw reply

* Re: [PATCH] net: dont use __netdev_alloc_skb for bounce buffer
From: Stefan Bader @ 2012-07-02 20:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1341254172.22621.456.camel@edumazet-glaptop>

[-- Attachment #1: Type: text/plain, Size: 2780 bytes --]

I can confirm that, with the patch below applied, at least the b44 regression is
fixed and the network is usable again.

-Stefan

On 02.07.2012 20:36, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> commit a1c7fff7e1 (net: netdev_alloc_skb() use build_skb()) broke b44 on
> some 64bit machines.
> 
> It appears b44 and b43 use __netdev_alloc_skb() instead of alloc_skb()
> for their bounce buffers.
> 
> There is no need to add an extra NET_SKB_PAD reservation for bounce
> buffers :
> 
> - In TX path, NET_SKB_PAD is useless
> 
> - In RX path in b44, we force a copy of incoming frames if
>   GFP_DMA allocations were needed.
> 
> Reported-and-bisected-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  drivers/net/ethernet/broadcom/b44.c  |    4 ++--
>  drivers/net/wireless/b43legacy/dma.c |    2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index 46b8b7d..d09c6b5 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -656,7 +656,7 @@ static int b44_alloc_rx_skb(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
>  			dma_unmap_single(bp->sdev->dma_dev, mapping,
>  					     RX_PKT_BUF_SZ, DMA_FROM_DEVICE);
>  		dev_kfree_skb_any(skb);
> -		skb = __netdev_alloc_skb(bp->dev, RX_PKT_BUF_SZ, GFP_ATOMIC|GFP_DMA);
> +		skb = alloc_skb(RX_PKT_BUF_SZ, GFP_ATOMIC | GFP_DMA);
>  		if (skb == NULL)
>  			return -ENOMEM;
>  		mapping = dma_map_single(bp->sdev->dma_dev, skb->data,
> @@ -967,7 +967,7 @@ static netdev_tx_t b44_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  			dma_unmap_single(bp->sdev->dma_dev, mapping, len,
>  					     DMA_TO_DEVICE);
>  
> -		bounce_skb = __netdev_alloc_skb(dev, len, GFP_ATOMIC | GFP_DMA);
> +		bounce_skb = alloc_skb(len, GFP_ATOMIC | GFP_DMA);
>  		if (!bounce_skb)
>  			goto err_out;
>  
> diff --git a/drivers/net/wireless/b43legacy/dma.c b/drivers/net/wireless/b43legacy/dma.c
> index f1f8bd0..c8baf02 100644
> --- a/drivers/net/wireless/b43legacy/dma.c
> +++ b/drivers/net/wireless/b43legacy/dma.c
> @@ -1072,7 +1072,7 @@ static int dma_tx_fragment(struct b43legacy_dmaring *ring,
>  	meta->dmaaddr = map_descbuffer(ring, skb->data, skb->len, 1);
>  	/* create a bounce buffer in zone_dma on mapping failure. */
>  	if (b43legacy_dma_mapping_error(ring, meta->dmaaddr, skb->len, 1)) {
> -		bounce_skb = __dev_alloc_skb(skb->len, GFP_ATOMIC | GFP_DMA);
> +		bounce_skb = alloc_skb(skb->len, GFP_ATOMIC | GFP_DMA);
>  		if (!bounce_skb) {
>  			ring->current_slot = old_top_slot;
>  			ring->used_slots = old_used_slots;
> 
> 

^ permalink raw reply
