Netdev List


* [PATCH 0/2] pktgen: Clone skb to avoid corruption of skbs in ndo_start_xmit methods (v3)
From: Neil Horman @ 2011-07-26 16:05 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1311105179-26408-1-git-send-email-nhorman@tuxdriver.com>

Ok, after considering all your comments, Dave suggested this as an alternate
approach:

1) We create a new priv_flag, IFF_SKB_TX_SHARED, to identify drivers capable of
handling shared skbs.  The default is to leave this flag unset.

2) Modify ether_setup to enable this flag, under the assumption that any driver
calling this function is initializing a real ethernet device and as such can
handle shared skbs, since such drivers don't tend to store state in the skb
struct.  Pktgen can then query this flag when a user script attempts to issue
the clone_skb command and decide whether it is to be allowed or not.

3) Audit the network drivers calling ether_setup to identify any code doing so
that can't actually handle shared skbs and manually disable the new flag.  There
are about 10 drivers in this category.
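
The three steps above can be sketched as a small user-space model.  This is only
an illustration under stated assumptions, not the patch itself: struct model_dev
and the model_* helpers are hypothetical names, and the flag value is a
placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the proposed priv_flag; the value is illustrative only. */
#define IFF_SKB_TX_SHARED 0x10000u

/* Hypothetical model of a net device: only the private flags matter here. */
struct model_dev {
	unsigned int priv_flags;
};

/* Step 2: ether_setup-style initialisation opts the device in by default. */
static void model_ether_setup(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->priv_flags |= IFF_SKB_TX_SHARED;
}

/* Step 3: a driver that keeps state in the skb opts back out. */
static void model_disable_tx_sharing(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->priv_flags &= ~IFF_SKB_TX_SHARED;
}

/* pktgen side: clone_skb > 0 is accepted only if the device tolerates
 * shared skbs; clone_skb == 0 never shares an skb and is always fine. */
static bool model_clone_skb_allowed(const struct model_dev *dev,
				    unsigned long clone_skb)
{
	assert(dev != NULL);
	if (clone_skb == 0)
		return true;
	return (dev->priv_flags & IFF_SKB_TX_SHARED) != 0;
}
```

In this model a device that never went through model_ether_setup(), or that
called model_disable_tx_sharing() afterwards, refuses any non-zero clone_skb.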

Change notes:
v3) Fixed Eric's note in which I tested the length of the passed-in string rather
than its converted value for being > 0

Thoughts/reviews appreciated.
Neil

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/2] be2net: use stats-sync to read/write 64-bit stats
From: Stephen Hemminger @ 2011-07-26 14:58 UTC (permalink / raw)
  To: Sathya Perla; +Cc: netdev
In-Reply-To: <1311657015-23465-3-git-send-email-sathya.perla@emulex.com>

On Tue, 26 Jul 2011 10:40:15 +0530
Sathya Perla <sathya.perla@emulex.com> wrote:

> 64-bit stats in be2net are written/read as follows using the stats-sync
> interface for safe access in 32-bit archs:
> 
> 64-bit 		sync			writer			reader
> stats
> ------------------------------------------------------------------------------
> tx_stats	tx_stats->sync		be_xmit			be_get_stats64,
> 								ethtool
> tx-compl	tx_stats->sync_compl	tx-compl-processing	ethtool
> rx-stats	rx_stats->sync		rx-compl-processing	be_get_stats64,
> 								ethtool,
> 								eqd-update
> 
> This patch is based on Stephen Hemminger's earlier patch on the same issue...
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>

Is the tx complete stat even worth the effort? Does it provide a useful metric?
Since rx/tx bytes are already in regular stats, keeping them in ethtool stats
is redundant.

These are just minor nits, you can ignore this advice if you want.

^ permalink raw reply

* Re: [PATCH 1/2] net: add IFF_SKB_TX_SHARED flag to priv_flags
From: Neil Horman @ 2011-07-26 14:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Robert Olsson, Alexey Dobriyan, David S. Miller
In-Reply-To: <1311624682.2996.1.camel@edumazet-laptop>

On Mon, Jul 25, 2011 at 10:11:22PM +0200, Eric Dumazet wrote:
> On Monday, 25 July 2011 at 15:45 -0400, Neil Horman wrote:
> > Pktgen attempts to transmit shared skbs to net devices, which can't be used by
> > some drivers as they keep state information in skbs.  This patch adds a flag
> > marking drivers as being able to handle shared skbs in their tx path.  Drivers
> > are defaulted to being unable to do so, but calling ether_setup enables this
> > flag, as 90% of the drivers calling ether_setup touch real hardware and can
> > handle shared skbs.  A subsequent patch will audit drivers to ensure that the
> > flag is set properly
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > Reported-by: Jiri Pirko <jpirko@redhat.com>
> > CC: Robert Olsson <robert.olsson@its.uu.se>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Alexey Dobriyan <adobriyan@gmail.com>
> > CC: David S. Miller <davem@davemloft.net>
> > ---
> >  include/linux/if.h |    2 ++
> >  net/core/pktgen.c  |    8 +++++---
> >  net/ethernet/eth.c |    1 +
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/if.h b/include/linux/if.h
> > index 3bc63e6..03489ca 100644
> > --- a/include/linux/if.h
> > +++ b/include/linux/if.h
> > @@ -76,6 +76,8 @@
> >  #define IFF_BRIDGE_PORT	0x4000		/* device used as bridge port */
> >  #define IFF_OVS_DATAPATH	0x8000	/* device used as Open vSwitch
> >  					 * datapath port */
> > +#define IFF_TX_SKB_SHARING	0x10000	/* The interface supports sharing
> > +					 * skbs on transmit */
> >  
> >  #define IF_GET_IFACE	0x0001		/* for querying only */
> >  #define IF_GET_PROTO	0x0002
> > diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> > index f76079c..53f3f15 100644
> > --- a/net/core/pktgen.c
> > +++ b/net/core/pktgen.c
> > @@ -1070,7 +1070,9 @@ static ssize_t pktgen_if_write(struct file *file,
> >  		len = num_arg(&user_buffer[i], 10, &value);
> >  		if (len < 0)
> >  			return len;
> > -
> > +		if ((len > 0) &&
> 
> 		if ((value > 0) ...
> 
Thank you Eric, I'll respin this shortly
Neil

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Eric Paris @ 2011-07-26 13:58 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: anton, casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <201107262021.FCC34304.HMOOFQJVSFLFOt@I-love.SAKURA.ne.jp>

On Tue, Jul 26, 2011 at 7:21 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:

> Does SELinux want to receive nosec flag at selinux_socket_sendmsg() because
> calling security_socket_sendmsg() more than once is meaningless for SELinux?

SELinux would not mind having a flag such that we could expedite the
second and later calls.  Did we finally find the first place where
SELinux is going to be the faster LSM?  Never thought I'd see the day!

-Eric

^ permalink raw reply

* invalid requirement from ethtool?
From: Eli Cohen @ 2011-07-26 12:42 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hi,
I see the following text in include/linux/ethtool.h and wonder what the
reasoning is for requiring that both params cannot be zero. I could
not track down when and by whom this text was inserted, as it predates the
use of git to track kernel code, but my feeling is that it is related to a
specific hardware limitation.

        /* How many packets to delay an RX interrupt after
         * a packet arrives.  If 0, only rx_coalesce_usecs is
         * used.  It is illegal to set both usecs and max frames
         * to zero as this would cause RX interrupts to never be
         * generated.
         */
        __u32   rx_max_coalesced_frames;

        /* How many packets to delay a TX interrupt after
         * a packet is sent.  If 0, only tx_coalesce_usecs is
         * used.  It is illegal to set both usecs and max frames
         * to zero as this would cause TX interrupts to never be
         * generated.
         */
        __u32   tx_max_coalesced_frames;

I found this in tg3 driver:
         /* No rx interrupts will be generated if both are zero */
        if ((ec->rx_coalesce_usecs == 0) &&
            (ec->rx_max_coalesced_frames == 0))
                return -EINVAL;

However, bnx2 for example allows setting both to zero.

I think setting both params to zero should be allowed and should mean that
coalescing is disabled, so we can remove these comments from ethtool.h.
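
To make the two behaviours concrete, here is a minimal user-space sketch.  It
assumes a hypothetical struct model_coalesce holding just the two RX fields
quoted above; it is not taken from tg3, bnx2 or ethtool.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the coalescing parameters discussed above. */
struct model_coalesce {
	uint32_t rx_coalesce_usecs;
	uint32_t rx_max_coalesced_frames;
};

/* Strict policy, following the tg3 snippet: both zero is rejected. */
static int model_set_coalesce_strict(const struct model_coalesce *ec)
{
	assert(ec != NULL);
	if (ec->rx_coalesce_usecs == 0 && ec->rx_max_coalesced_frames == 0)
		return -EINVAL;
	return 0;
}

/* Permissive policy, as proposed in this mail: both zero is accepted and
 * simply means "no coalescing".  Returns 1 if coalescing is off, else 0. */
static int model_coalescing_disabled(const struct model_coalesce *ec)
{
	assert(ec != NULL);
	return ec->rx_coalesce_usecs == 0 && ec->rx_max_coalesced_frames == 0;
}
```

Under the strict policy { 0, 0 } yields -EINVAL; under the permissive one the
same setting is valid and reported as "coalescing disabled".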

^ permalink raw reply

* [PATCH] net: sock_sendmsg_nosec() is static
From: Eric Dumazet @ 2011-07-26 12:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Anton Blanchard

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Anton Blanchard <anton@samba.org>
---
 net/socket.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index 02dc82d..b7ce3b7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -580,7 +580,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 }
 EXPORT_SYMBOL(sock_sendmsg);
 
-int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size)
+static int sock_sendmsg_nosec(struct socket *sock, struct msghdr *msg, size_t size)
 {
 	struct kiocb iocb;
 	struct sock_iocb siocb;



^ permalink raw reply related

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 12:21 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Linux Netdev
In-Reply-To: <20110726120442.GC2078@minipsycho.brq.redhat.com>

On Tuesday, 26.07.2011, at 14:04 +0200, Jiri Pirko wrote:
> Tue, Jul 26, 2011 at 01:13:41PM CEST, sebastian.belden@googlemail.com wrote:
> >On Tuesday, 26.07.2011, at 12:46 +0200, Jiri Pirko wrote:
> >> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
> >> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
> >> ># permutation of rx and tx flags
> >> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
> >> 
> >> How so? What's causing that?
> >If you enable the VLINS bit of txctrl and do not alter the vlan tag
> >configuration of the NIC, every packet will get an all-zero vlan tag
> >(0x8100 0000). If you run a network without vlan awareness all packets
> >will be ignored.
> 
> 
> Interesting hw... I would guess that if gfar_tx_vlan() is not called, no
> tag would be put there...
The Freescale documentation says in detail:

if(FCB has valid VLAN field)
	send packet with this one
else
	send packet with default vlan tag

default vlan tag = dfvlan register = 0x8100 0000 by default
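
The rule above can be written out as a small C sketch.  The struct and function
names are hypothetical, and the assumption that no tag is inserted while VLINS
is clear is mine, not a quote from the documentation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default tag named above: TPID 0x8100 followed by an all-zero TCI. */
#define MODEL_DFVLAN_DEFAULT 0x81000000u

/* Hypothetical per-frame control block; only the VLAN part is modelled. */
struct model_fcb {
	bool     vlan_valid;	/* FCB carries a valid VLAN field */
	uint32_t vlan_tag;	/* tag to use when vlan_valid is set */
};

/* Returns true and stores the tag in *tag if the frame goes out tagged.
 * Assumption: with VLINS clear, no tag is inserted at all. */
static bool model_tx_tag(bool vlins, const struct model_fcb *fcb,
			 uint32_t dfvlan, uint32_t *tag)
{
	assert(fcb != NULL && tag != NULL);
	if (!vlins)
		return false;
	/* FCB tag wins; otherwise fall back to the dfvlan register. */
	*tag = fcb->vlan_valid ? fcb->vlan_tag : dfvlan;
	return true;
}
```

With VLINS set and an FCB without a valid VLAN field, every frame gets the
0x8100 0000 tag, which is the situation described in this thread.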
> 
> >
> >In my configuration gianfar system <-> 3c59x system the 3com system
> >discards all packets received with the vlan tag.
> >> 
> >> >
> >> >If VLAN insertion is requested (via ethtool) it will be set at another point ...
> >> >
> >> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
> >> >---
> >> >
> >> > drivers/net/gianfar.c |    6 +-----
> >> > 1 files changed, 1 insertions(+), 5 deletions(-)
> >> >
> >> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> >> >index 835cd25..2659daa 100644
> >> >--- a/drivers/net/gianfar.c
> >> >+++ b/drivers/net/gianfar.c
> >> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> >> > 	if (priv->hwts_rx_en)
> >> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> >> > 
> >> >-	/* keep vlan related bits if it's enabled */
> >> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
> >> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >> >-
> >> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
> >> >-		tctrl |= TCTRL_VLINS;
> >> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >> > 
> >> > 	/* Init rctrl based on our settings */
> >> > 	gfar_write(&regs->rctrl, rctrl);
> >> >
> >> >
> >> 
> >> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
> >> from features (not hw_features) (never add it).
> >> 
> >The only thing I did was to leave VLAN insertion disabled by default.
> >If someone wants it, it may be enabled via ethtool.
> >
> >



^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Jiri Pirko @ 2011-07-26 12:04 UTC (permalink / raw)
  To: Sebastian Pöhn; +Cc: Linux Netdev
In-Reply-To: <1311678821.17939.7.camel@DENEC1DT0191>

Tue, Jul 26, 2011 at 01:13:41PM CEST, sebastian.belden@googlemail.com wrote:
>On Tuesday, 26.07.2011, at 12:46 +0200, Jiri Pirko wrote:
>> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
>> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
>> ># permutation of rx and tx flags
>> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
>> 
>> How so? What's causing that?
>If you enable the VLINS bit of txctrl and do not alter the vlan tag
>configuration of the NIC, every packet will get an all-zero vlan tag
>(0x8100 0000). If you run a network without vlan awareness all packets
>will be ignored.


Interesting hw... I would guess that if gfar_tx_vlan() is not called, no
tag would be put there...

>
>In my configuration gianfar system <-> 3c59x system the 3com system
>discards all packets received with the vlan tag.
>> 
>> >
>> >If VLAN insertion is requested (via ethtool) it will be set at another point ...
>> >
>> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
>> >---
>> >
>> > drivers/net/gianfar.c |    6 +-----
>> > 1 files changed, 1 insertions(+), 5 deletions(-)
>> >
>> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>> >index 835cd25..2659daa 100644
>> >--- a/drivers/net/gianfar.c
>> >+++ b/drivers/net/gianfar.c
>> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
>> > 	if (priv->hwts_rx_en)
>> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
>> > 
>> >-	/* keep vlan related bits if it's enabled */
>> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
>> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>> >-
>> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
>> >-		tctrl |= TCTRL_VLINS;
>> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>> > 
>> > 	/* Init rctrl based on our settings */
>> > 	gfar_write(&regs->rctrl, rctrl);
>> >
>> >
>> 
>> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
>> from features (not hw_features) (never add it).
>> 
>The only thing I did was to leave VLAN insertion disabled by default.
>If someone wants it, it may be enabled via ethtool.
>
>

^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Christoph Hellwig @ 2011-07-26 11:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev
In-Reply-To: <1311677013.2355.25.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 26, 2011 at 12:43:33PM +0200, Eric Dumazet wrote:
> BTW, we have one atomic op that could be avoided in new_inode()
> 
> spin_lock(&inode->i_lock);
> inode->i_state = 0;
> spin_unlock(&inode->i_lock);
> 
> can probably be changed to something less expensive...
> 
> inode->i_state = 0;
> smp_wmb();
> 
> Not clear if we really need a memory barrier either....

I think we already had this in some of the earlier vfs/inode scale
series, but it got lost when Al asked to just put the fundamental
changes in.

For plain new_inode() the barrier shouldn't be needed as we take
the sb list lock just a little later.  I'm not sure about your new
variant, so I'd rather leave that to you.

There are a few other things missing from earlier iterations, most notably
the non-atomic i_count, and the bucket locks for the inode hash, if
you're eager enough to look into that area.

^ permalink raw reply

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Tetsuo Handa @ 2011-07-26 11:21 UTC (permalink / raw)
  To: anton; +Cc: casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <20110726195557.5abcf96d@kryten>

Anton Blanchard wrote:
> Not sure what happened to your email but the gains are evident at
> just 2 packets.

I know your benchmark result, but that result is based on calling
security_socket_sendmsg() only once.

What we worry about is that the overhead of security_socket_sendmsg() kills
the performance gain in the batched case.

> I can help with testing - the commit included a microbenchmark for
> the purposes of analysing its performance.

Yes please. The patch in msg11510.html would want some more discussion, but
the patch in msg11504.html is ready to benchmark on your environment.

Does SELinux want to receive nosec flag at selinux_socket_sendmsg() because
calling security_socket_sendmsg() more than once is meaningless for SELinux?

^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 11:13 UTC (permalink / raw)
  To: Jiri Pirko, Linux Netdev
In-Reply-To: <20110726104559.GA2078@minipsycho.brq.redhat.com>

On Tuesday, 26.07.2011, at 12:46 +0200, Jiri Pirko wrote:
> Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
> >commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
> ># permutation of rx and tx flags
> ># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)
> 
> How so? What's causing that?
If you enable the VLINS bit of txctrl and do not alter the vlan tag
configuration of the NIC, every packet will get an all-zero vlan tag
(0x8100 0000). If you run a network without vlan awareness all packets
will be ignored.

In my configuration gianfar system <-> 3c59x system the 3com system
discards all packets received with the vlan tag.
> 
> >
> >If VLAN insertion is requested (via ethtool) it will be set at another point ...
> >
> >Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
> >---
> >
> > drivers/net/gianfar.c |    6 +-----
> > 1 files changed, 1 insertions(+), 5 deletions(-)
> >
> >diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
> >index 835cd25..2659daa 100644
> >--- a/drivers/net/gianfar.c
> >+++ b/drivers/net/gianfar.c
> >@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> > 	if (priv->hwts_rx_en)
> > 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> > 
> >-	/* keep vlan related bits if it's enabled */
> >-	if (ndev->features & NETIF_F_HW_VLAN_TX)
> >-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> >-
> > 	if (ndev->features & NETIF_F_HW_VLAN_RX)
> >-		tctrl |= TCTRL_VLINS;
> >+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> > 
> > 	/* Init rctrl based on our settings */
> > 	gfar_write(&regs->rctrl, rctrl);
> >
> >
> 
> If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
> from features (not hw_features) (never add it).
> 
The only thing I did was to leave VLAN insertion disabled by default.
If someone wants it, it may be enabled via ethtool.



^ permalink raw reply

* Re: [PATCH] cxgb3i: ref count cdev access to prevent modification while in use
From: Neil Horman @ 2011-07-26 10:50 UTC (permalink / raw)
  To: Divy Le Ray; +Cc: netdev, Steve Wise, David S. Miller, Karen Xie
In-Reply-To: <4E2DDDC5.8040405@chelsio.com>

On Mon, Jul 25, 2011 at 02:19:01PM -0700, Divy Le Ray wrote:
> On 07/25/2011 12:56 PM, Neil Horman wrote:
> >
> >This oops was reported recently:
> >d:mon> e
> >cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
> >    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
> >    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
> >    sp: c0000000fd4c73a0
> >   msr: 8000000000009032
> >   dar: 0
> > dsisr: 40000000
> >  current = 0xc0000000fd640d40
> >  paca    = 0xc00000000054ff80
> >    pid   = 5085, comm = iscsid
> >d:mon> t
> >[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
> >[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
> >[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18 [scsi_transport_iscsi2]
> >[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
> >[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
> >[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
> >[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
> >[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
> >[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
> >[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
> >--- Exception: c01 (System Call) at 00000080da560cfc
> >
> >The root cause was an EEH error, which sent us down the offload_close path in
> >the cxgb3 driver, which in turn sets cdev->lldev to NULL, without regard for
> >upper layer driver (like the cxgbi drivers) which might have execution contexts
> >in the middle of its use. The result is the oops above, when t3_l2t_get attempts
> >to dereference cdev->lldev right after the EEH error handler sets it to NULL.
> >
> >The fix is to reference count the cdev structure.  When an EEH error occurs, the
> >shutdown path:
> >t3_adapter_error->offload_close->cxgb3i_remove_clients->cxgb3i_dev_close
> >will now block until such time as the cdev pointer has a use count of zero.
> >This coupled with the fact that lookups will now skip finding any registered
> >cdev's in cxgbi_device_find_by_[lldev|netdev] with the CXGBI_FLAG_ADAPTER_RESET
> >bit set ensures that on an EEH, the setting of lldev to NULL in offload_close
> >will only happen after there are no longer any active users of the data
> >structure.
> >
> >This has been tested by the reporter and shown to fix the reported oops
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: Divy Le Ray <divy@chelsio.com>
> >CC: Steve Wise <swise@chelsio.com>
> >CC: "David S. Miller" <davem@davemloft.net>
> >
> 
> Also cc-ing Karen.
> 
Thank you Divy. Karen, if you're going to be working on cxgb3i, would you mind
updating the MAINTAINERS file so I and others don't forget to CC you in the
future?

Thanks!
Neil


^ permalink raw reply

* Re: [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Jiri Pirko @ 2011-07-26 10:46 UTC (permalink / raw)
  To: Sebastian Pöhn; +Cc: Linux Netdev
In-Reply-To: <1311674593.17190.7.camel@DENEC1DT0191>

Tue, Jul 26, 2011 at 12:03:13PM CEST, sebastian.belden@googlemail.com wrote:
>commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
># permutation of rx and tx flags
># enabling vlan tag insertion by default (this leads to unusable connections on some configurations)

How so? What's causing that?

>
>If VLAN insertion is requested (via ethtool) it will be set at another point ...
>
>Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
>---
>
> drivers/net/gianfar.c |    6 +-----
> 1 files changed, 1 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
>index 835cd25..2659daa 100644
>--- a/drivers/net/gianfar.c
>+++ b/drivers/net/gianfar.c
>@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
> 	if (priv->hwts_rx_en)
> 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
> 
>-	/* keep vlan related bits if it's enabled */
>-	if (ndev->features & NETIF_F_HW_VLAN_TX)
>-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
>-
> 	if (ndev->features & NETIF_F_HW_VLAN_RX)
>-		tctrl |= TCTRL_VLINS;
>+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
> 
> 	/* Init rctrl based on our settings */
> 	gfar_write(&regs->rctrl, rctrl);
>
>

If you really need that to be done, you should remove NETIF_F_HW_VLAN_TX
from features (not hw_features) (never add it).


^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Eric Dumazet @ 2011-07-26 10:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20110726094204.GA12086@infradead.org>

On Tuesday, 26 July 2011 at 05:42 -0400, Christoph Hellwig wrote:
> On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> > [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> > 
> > Workloads using pipes and sockets hit inode_sb_list_lock contention.
> > 
> > superblock s_inodes list is needed for quota, dirty, pagecache and
> > fsnotify management. pipe/anon/socket fs are clearly not candidates for
> > these.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good to me,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 

Thanks !

BTW, we have one atomic op that could be avoided in new_inode()

spin_lock(&inode->i_lock);
inode->i_state = 0;
spin_unlock(&inode->i_lock);

can probably be changed to something less expensive...

inode->i_state = 0;
smp_wmb();

Not clear if we really need a memory barrier either....




^ permalink raw reply

* [PATCH] gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
From: Sebastian Pöhn @ 2011-07-26 10:03 UTC (permalink / raw)
  To: Linux Netdev, jpirko

commit 87c288c6e9aa31720b72e2bc2d665e24e1653c3e "gianfar: do vlan cleanup" has two issues:
# permutation of rx and tx flags
# enabling vlan tag insertion by default (this leads to unusable connections on some configurations)

If VLAN insertion is requested (via ethtool) it will be set at another point ...

Signed-off-by: Sebastian Poehn <sebastian.poehn@belden.com>
---

 drivers/net/gianfar.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 835cd25..2659daa 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -388,12 +388,8 @@ static void gfar_init_mac(struct net_device *ndev)
 	if (priv->hwts_rx_en)
 		rctrl |= RCTRL_PRSDEP_INIT | RCTRL_TS_ENABLE;
 
-	/* keep vlan related bits if it's enabled */
-	if (ndev->features & NETIF_F_HW_VLAN_TX)
-		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
-
 	if (ndev->features & NETIF_F_HW_VLAN_RX)
-		tctrl |= TCTRL_VLINS;
+		rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
 
 	/* Init rctrl based on our settings */
 	gfar_write(&regs->rctrl, rctrl);



^ permalink raw reply related

* Re: [PATCH] net: Fix security_socket_sendmsg() bypass problem.
From: Anton Blanchard @ 2011-07-26  9:55 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: casey, mjt, davem, netdev, linux-security-module
In-Reply-To: <201107260143.CGH18263.FOOSVMOQFJFLHt@I-love.SAKURA.ne.jp>


Hi,

> I fear it too. Unless many dozens (maybe some hundreds) of packets
> are sent by sendmmsg(), msg11504.html might show better performance
> than msg11510.html . But I don't have a machine to benchmark.

Not sure what happened to your email but the gains are evident at
just 2 packets.

I can help with testing - the commit included a microbenchmark for
the purposes of analysing its performance.

Anton

--

    net: Add sendmmsg socket system call
    
    This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.
    
    I wrote a microbenchmark to test the performance gains of using
    this new syscall:
    
    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
    
    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.
    
    64B UDP
    
    batch   pkts/sec
    1       804570
    2       872800 (+ 8 %)
    4       916556 (+14 %)
    8       939712 (+17 %)
    16      952688 (+18 %)
    32      956448 (+19 %)
    64      964800 (+20 %)
    
    64B raw socket
    
    batch   pkts/sec
    1       1201449
    2       1350028 (+12 %)
    4       1461416 (+22 %)
    8       1513080 (+26 %)
    16      1541216 (+28 %)
    32      1553440 (+29 %)
    64      1557888 (+30 %)
    
    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.
    
    [ Add sparc syscall entries. -DaveM ]

^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Christoph Hellwig @ 2011-07-26  9:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Hellwig, Tim Chen, Al Viro, David Miller, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev
In-Reply-To: <1311672994.2355.17.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 26, 2011 at 11:36:34AM +0200, Eric Dumazet wrote:
> [PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list
> 
> Workloads using pipes and sockets hit inode_sb_list_lock contention.
> 
> superblock s_inodes list is needed for quota, dirty, pagecache and
> fsnotify management. pipe/anon/socket fs are clearly not candidates for
> these.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good to me,

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Eric Dumazet @ 2011-07-26  9:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tim Chen, Al Viro, David Miller, Andi Kleen, Matthew Wilcox,
	Anton Blanchard, npiggin, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <20110726090357.GA13013@infradead.org>

On Tuesday, 26 July 2011 at 05:03 -0400, Christoph Hellwig wrote:
> On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> > Well, not 'last' contention point, as we still hit remove_inode_hash(),
> 
> There should be no need to put pipe or anon inodes on the inode hash.
> Probably sockets don't need it either, but I'd need to look at it in
> detail.
> 
> > inode_wb_list_del()
> 
> They should never be on the wb list either, doing an unlocked check for
> actually being on the list before taking the lock should help you.

Yes, it might even help regular inodes ;)

> 
> > inode_lru_list_del(),
> 
> No real need to keep inodes in the LRU if we only allocate them using
> new_inode but never look them up either.  You might want to try setting
> .drop_inode to generic_delete_inode for these.

Yes, I'll take a look, thanks.

> 
> > +struct inode *__new_inode(struct super_block *sb)
> > +{
> > +	struct inode *inode = alloc_inode(sb);
> > +
> > +	if (inode) {
> > +		spin_lock(&inode->i_lock);
> > +		inode->i_state = 0;
> > +		spin_unlock(&inode->i_lock);
> > +		INIT_LIST_HEAD(&inode->i_sb_list);
> > +	}
> > +	return inode;
> > +}
> 
> This needs a much better name like new_inode_pseudo, and a kerneldoc 
> comment explaining when it is safe to use, and the consequences, which
> appear to me to be:
> 
>  - fs may never be unmounted
>  - quotas can't work on the filesystem
>  - writeback can't work on the filesystem

Thanks for reviewing, here is v2 of the patch, addressing your comments.


[PATCH v2] vfs: dont chain pipe/anon/socket on superblock s_inodes list

Workloads using pipes and sockets hit inode_sb_list_lock contention.

superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
v2: address Christoph comments

 fs/anon_inodes.c   |    2 +-
 fs/inode.c         |   39 ++++++++++++++++++++++++++++++---------
 fs/pipe.c          |    2 +-
 include/linux/fs.h |    3 ++-
 net/socket.c       |    2 +-
 5 files changed, 35 insertions(+), 13 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..f11e43e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
  */
 static struct inode *anon_inode_mkinode(void)
 {
-	struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(anon_inode_mnt->mnt_sb);
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..319b93b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	spin_lock(&inode_sb_list_lock);
-	list_del_init(&inode->i_sb_list);
-	spin_unlock(&inode_sb_list_lock);
+	if (!list_empty(&inode->i_sb_list)) {
+		spin_lock(&inode_sb_list_lock);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_sb_list_lock);
+	}
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -797,6 +799,29 @@ unsigned int get_next_ino(void)
 EXPORT_SYMBOL(get_next_ino);
 
 /**
+ *	new_inode_pseudo 	- obtain an inode
+ *	@sb: superblock
+ *
+ *	Allocates a new inode for given superblock.
+ *	Inode wont be chained in superblock s_inodes list
+ *	This means :
+ *	- fs can't be unmount
+ *	- quotas, fsnotify, writeback can't work
+ */
+struct inode *new_inode_pseudo(struct super_block *sb)
+{
+	struct inode *inode = alloc_inode(sb);
+
+	if (inode) {
+		spin_lock(&inode->i_lock);
+		inode->i_state = 0;
+		spin_unlock(&inode->i_lock);
+		INIT_LIST_HEAD(&inode->i_sb_list);
+	}
+	return inode;
+}
+
+/**
  *	new_inode 	- obtain an inode
  *	@sb: superblock
  *
@@ -814,13 +839,9 @@ struct inode *new_inode(struct super_block *sb)
 
 	spin_lock_prefetch(&inode_sb_list_lock);
 
-	inode = alloc_inode(sb);
-	if (inode) {
-		spin_lock(&inode->i_lock);
-		inode->i_state = 0;
-		spin_unlock(&inode->i_lock);
+	inode = new_inode_pseudo(sb);
+	if (inode)
 		inode_sb_list_add(inode);
-	}
 	return inode;
 }
 EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..0e0be1d 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
 
 static struct inode * get_pipe_inode(void)
 {
-	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+	struct inode *inode = new_inode_pseudo(pipe_mnt->mnt_sb);
 	struct pipe_inode_info *pipe;
 
 	if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..cc363fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
 extern void end_writeback(struct inode *);
 extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *new_inode_pseudo(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..26ed35c 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
 	struct inode *inode;
 	struct socket *sock;
 
-	inode = new_inode(sock_mnt->mnt_sb);
+	inode = new_inode_pseudo(sock_mnt->mnt_sb);
 	if (!inode)
 		return NULL;
 



^ permalink raw reply related

* Re: [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Christoph Hellwig @ 2011-07-26  9:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tim Chen, Al Viro, David Miller, Christoph Hellwig, Andi Kleen,
	Matthew Wilcox, Anton Blanchard, npiggin, linux-kernel,
	linux-fsdevel, netdev
In-Reply-To: <1311668466.2355.12.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 26, 2011 at 10:21:06AM +0200, Eric Dumazet wrote:
> Well, not 'last' contention point, as we still hit remove_inode_hash(),

There should be no need to put pipe or anon inodes on the inode hash.
Probably sockets don't need it either, but I'd need to look at it in
detail.

> inode_wb_list_del()

They should never be on the wb list either; doing an unlocked check for
actually being on the list before taking the lock should help you.
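
A userspace sketch of the unlocked check suggested above, assuming a
pthread mutex in place of the kernel spinlock and a hand-rolled node type
in place of struct list_head. All names (struct node, sb_list, node_del,
lock_acquisitions) are hypothetical and exist only for illustration. The
unlocked test is only safe when nothing else can link the node
concurrently, which is the case for an inode that is being torn down.

```c
#include <pthread.h>
#include <stdbool.h>

/* Circular doubly linked node, modelled on the kernel's struct list_head. */
struct node {
	struct node *next, *prev;
};

struct node sb_list = { &sb_list, &sb_list };
pthread_mutex_t sb_list_lock = PTHREAD_MUTEX_INITIALIZER;
unsigned long lock_acquisitions;	/* how often the lock was taken */

/* A node that points to itself is on no list. */
void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

bool node_unlinked(const struct node *n)
{
	return n->next == n;
}

void node_add(struct node *n)
{
	pthread_mutex_lock(&sb_list_lock);
	lock_acquisitions++;
	n->next = sb_list.next;
	n->prev = &sb_list;
	sb_list.next->prev = n;
	sb_list.next = n;
	pthread_mutex_unlock(&sb_list_lock);
}

void node_del(struct node *n)
{
	/* Unlocked check: a node that was never linked skips the lock. */
	if (node_unlinked(n))
		return;

	pthread_mutex_lock(&sb_list_lock);
	lock_acquisitions++;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
	pthread_mutex_unlock(&sb_list_lock);
}
```

Deleting a node that was only initialised leaves lock_acquisitions
unchanged, while deleting a linked node takes the lock once.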

> inode_lru_list_del(),

No real need to keep inodes in the LRU if we only allocate them using
new_inode but never look them up either.  You might want to try setting
.drop_inode to generic_delete_inode for these.

> +struct inode *__new_inode(struct super_block *sb)
> +{
> +	struct inode *inode = alloc_inode(sb);
> +
> +	if (inode) {
> +		spin_lock(&inode->i_lock);
> +		inode->i_state = 0;
> +		spin_unlock(&inode->i_lock);
> +		INIT_LIST_HEAD(&inode->i_sb_list);
> +	}
> +	return inode;
> +}

This needs a much better name like new_inode_pseudo, and a kerneldoc
comment explaining when it is safe to use, and the consequences, which
appear to me to be:

 - fs may never be unmounted
 - quotas can't work on the filesystem
 - writeback can't work on the filesystem

> @@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
>  
>  	spin_lock_prefetch(&inode_sb_list_lock);
>  
> -	inode = alloc_inode(sb);
> -	if (inode) {
> -		spin_lock(&inode->i_lock);
> -		inode->i_state = 0;
> -		spin_unlock(&inode->i_lock);
> -		inode_sb_list_add(inode);
> -	}
> +	inode = __new_inode(sb);
> +	if (inode)
> +			inode_sb_list_add(inode);

bad indentation.


^ permalink raw reply

* [PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list
From: Eric Dumazet @ 2011-07-26  8:21 UTC (permalink / raw)
  To: Tim Chen, Al Viro, David Miller
  Cc: Christoph Hellwig, Andi Kleen, Matthew Wilcox, Anton Blanchard,
	npiggin, linux-kernel, linux-fsdevel, netdev
In-Reply-To: <1311660013.2996.6.camel@edumazet-laptop>

On Tuesday, 26 July 2011 at 08:00 +0200, Eric Dumazet wrote:

> Next step is to not chain pipes/sockets into superblock s_inodes list
> 
> inode_sb_list_add()/inode_sb_list_del() is the very last contention
> point because of spin_lock(&inode_sb_list_lock);

Well, not the 'last' contention point, as we still hit remove_inode_hash(),
inode_wb_list_del(), inode_lru_list_del(), but that's a clear win on my
2x4x2 machine: 9 seconds instead of 22 on a close(socket()) benchmark.
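
The benchmark program is not included in this mail. A minimal sketch of
what a close(socket()) loop could look like follows; the function name
socket_churn and its parameters are made up for illustration, and the
iteration count and process count of the real test are unknown.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create and immediately close `iterations` sockets.
 * Returns the number of sockets that were created successfully.
 * Each successful socket() allocates a socket inode and each close()
 * releases it again.
 */
long socket_churn(int domain, int type, long iterations)
{
	long ok = 0;

	for (long i = 0; i < iterations; i++) {
		int fd = socket(domain, type, 0);

		if (fd < 0)
			continue;	/* e.g. EMFILE or unsupported family */
		close(fd);
		ok++;
	}
	return ok;
}
```

To see lock contention, one would presumably run one such loop per CPU
in parallel and time the whole run, for example with
socket_churn(AF_INET, SOCK_STREAM, 1000000) in each process.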


[PATCH] vfs: dont chain pipe/anon/socket on superblock s_inodes list

Workloads using pipes and sockets hit inode_sb_list_lock contention.

superblock s_inodes list is needed for quota, dirty, pagecache and
fsnotify management. pipe/anon/socket fs are clearly not candidates for
these.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/anon_inodes.c   |    2 +-
 fs/inode.c         |   31 +++++++++++++++++++++----------
 fs/pipe.c          |    2 +-
 include/linux/fs.h |    3 ++-
 net/socket.c       |    2 +-
 5 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 4d433d3..269499e 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -187,7 +187,7 @@ EXPORT_SYMBOL_GPL(anon_inode_getfd);
  */
 static struct inode *anon_inode_mkinode(void)
 {
-	struct inode *inode = new_inode(anon_inode_mnt->mnt_sb);
+	struct inode *inode = __new_inode(anon_inode_mnt->mnt_sb);
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
diff --git a/fs/inode.c b/fs/inode.c
index 96c77b8..8a6d62b 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -362,9 +362,11 @@ EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-	spin_lock(&inode_sb_list_lock);
-	list_del_init(&inode->i_sb_list);
-	spin_unlock(&inode_sb_list_lock);
+	if (!list_empty(&inode->i_sb_list)) {
+		spin_lock(&inode_sb_list_lock);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_sb_list_lock);
+	}
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -796,6 +798,19 @@ unsigned int get_next_ino(void)
 }
 EXPORT_SYMBOL(get_next_ino);
 
+struct inode *__new_inode(struct super_block *sb)
+{
+	struct inode *inode = alloc_inode(sb);
+
+	if (inode) {
+		spin_lock(&inode->i_lock);
+		inode->i_state = 0;
+		spin_unlock(&inode->i_lock);
+		INIT_LIST_HEAD(&inode->i_sb_list);
+	}
+	return inode;
+}
+
 /**
  *	new_inode 	- obtain an inode
  *	@sb: superblock
@@ -814,13 +829,9 @@ struct inode *new_inode(struct super_block *sb)
 
 	spin_lock_prefetch(&inode_sb_list_lock);
 
-	inode = alloc_inode(sb);
-	if (inode) {
-		spin_lock(&inode->i_lock);
-		inode->i_state = 0;
-		spin_unlock(&inode->i_lock);
-		inode_sb_list_add(inode);
-	}
+	inode = __new_inode(sb);
+	if (inode)
+			inode_sb_list_add(inode);
 	return inode;
 }
 EXPORT_SYMBOL(new_inode);
diff --git a/fs/pipe.c b/fs/pipe.c
index 1b7f9af..937b962 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -948,7 +948,7 @@ static const struct dentry_operations pipefs_dentry_operations = {
 
 static struct inode * get_pipe_inode(void)
 {
-	struct inode *inode = new_inode(pipe_mnt->mnt_sb);
+	struct inode *inode = __new_inode(pipe_mnt->mnt_sb);
 	struct pipe_inode_info *pipe;
 
 	if (!inode)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a665804..60be54f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2310,7 +2310,8 @@ extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);
 extern void end_writeback(struct inode *);
 extern void __destroy_inode(struct inode *);
-extern struct inode *new_inode(struct super_block *);
+extern struct inode *__new_inode(struct super_block *sb);
+extern struct inode *new_inode(struct super_block *sb);
 extern void free_inode_nonrcu(struct inode *inode);
 extern int should_remove_suid(struct dentry *);
 extern int file_remove_suid(struct file *);
diff --git a/net/socket.c b/net/socket.c
index 02dc82d..b4b8a08 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -467,7 +467,7 @@ static struct socket *sock_alloc(void)
 	struct inode *inode;
 	struct socket *sock;
 
-	inode = new_inode(sock_mnt->mnt_sb);
+	inode = __new_inode(sock_mnt->mnt_sb);
 	if (!inode)
 		return NULL;
 


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: write() udp socket
From: ZHOU Xiaobo @ 2011-07-26  5:39 UTC (permalink / raw)
  To: Huajun Li; +Cc: netdev

------------------
Sincerely yours
                         ZHOU Xiaobo

------------------ Original ------------------
From:  "Huajun Li"<huajun.li.lee@gmail.com>;
Date:  Sun, Jul 24, 2011 04:33 PM
To:  "ZHOU Xiaobo"<xb.zhou@qq.com>;
Cc:  "netdev"<netdev@vger.kernel.org>;
Subject:  Re: write() udp socket

2011/7/23 ZHOU Xiaobo <xb.zhou@qq.com>:
> question No1:
> When I call
> ssize_t write(int fd, const void *buf, size_t count);
>
>
> on a nonblocking UDP socket, is the return value  always equal to 'count'?
>
>

I don't think so.  The function may be interrupted by a signal or return
for some other reason, so the return value only represents the size it
successfully wrote to the fd.

UDP is a datagram protocol, so I think it guarantees that the 'buffer' passed
to 'write()' is sent in its entirety, as an atomic operation.

> question No2:
> Can I write() a UDP socket in multiple threads without locking?
>

In my opinion, you could. However, the receiver may not get what you expected.

What will happen? My only concern is whether the application 'buffer' could be
sent partially, which would be unacceptable.

>
> thanks
>
>
> ------------------
> Sincerely yours
>                         ZHOU Xiaobo

^ permalink raw reply

* Re: write() udp socket
From: ZHOU Xiaobo @ 2011-07-26  5:32 UTC (permalink / raw)
  To: Rick Jones, Huajun Li; +Cc: netdev

------------------
Sincerely yours
                         ZHOU Xiaobo
 
 
 
------------------ Original ------------------
From:  "Rick Jones"<rick.jones2@hp.com>;
Date:  Tue, Jul 26, 2011 01:38 AM
To:  "Huajun Li"<huajun.li.lee@gmail.com>;
Cc:  "ZHOU Xiaobo"<xb.zhou@qq.com>; "netdev"<netdev@vger.kernel.org>;
Subject:  Re: write() udp socket
 
On 07/24/2011 01:33 AM, Huajun Li wrote:
> 2011/7/23 ZHOU Xiaobo<xb.zhou@qq.com>:
>> question No1:
>> When I call
>> ssize_t write(int fd, const void *buf, size_t count);
>>
>>
>> on a nonblocking UDP socket, is the return value  always equal to 'count'?
>>
>>
>
> I don't think so.  The function may be interrupted by a signal or return
> for some other reason, so the return value only represents the size it
> successfully wrote to the fd.

I believe it should either appear to succeed or fail.  write() best not
be sending partial UDP datagrams.  That would be "bad."



Yes, that is what I think too. If so, is the answer to question No. 2 also 'yes'?
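
A small sketch of the "succeed or fail" view of write() on a datagram
socket discussed above. It treats a short count as an error instead of
looping to send the remainder, since a datagram is not expected to be
split. The helper names write_datagram and datagram_roundtrip are
hypothetical, and the demo uses an AF_UNIX datagram socketpair as a
stand-in for UDP so that it needs no network setup.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one datagram. Returns 0 if the whole buffer went out as a single
 * datagram, -1 with errno set otherwise. On a non-blocking socket,
 * EAGAIN/EWOULDBLOCK means nothing was sent and the caller may retry.
 */
int write_datagram(int fd, const void *buf, size_t len)
{
	ssize_t n;

	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);	/* interrupted: retry */

	if (n < 0)
		return -1;
	if ((size_t)n != len) {		/* not expected for datagrams */
		errno = EIO;
		return -1;
	}
	return 0;
}

/*
 * Demo: send `len` bytes (at most 2048) over a local datagram socketpair
 * and return the number of bytes the receiver got in one read().
 */
ssize_t datagram_roundtrip(const void *buf, size_t len)
{
	int sv[2];
	char rx[2048];
	ssize_t got = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (write_datagram(sv[0], buf, len) == 0)
		got = read(sv[1], rx, sizeof(rx));
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

Whether several threads may call such a helper on one socket without
locking is the open question No. 2 in this thread; the sketch does not
answer it.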

^ permalink raw reply

* Re: IPv6: autoconfiguration and suspend/resume or link down/up
From: Anirban Chakraborty @ 2011-07-26  5:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jiri Bohac, netdev, Herbert Xu, David Miller, stephen hemminger
In-Reply-To: <79ahktw3gy268nl9yyk76cxe.1311104529189@email.android.com>


On Jul 19, 2011, at 12:42 PM, Stephen Hemminger wrote:

> The bridge forwarding table, route cache, and neighbor table could have the same problem. I thought carrier was supposed to toggle on suspend or hibernate.

In the case of a VM using a VF as a NIC device, the VF would not know of a suspend event unless the hypervisor sends such a notification to the PF,
which the PF could relay back to the VF. Does KVM send such a notification at present? The other option would be to bring down the interface in the
VM.

-Anirban



> 
> Jiri Bohac <jbohac@suse.cz> wrote:
> 
>> Hi,
>> 
>> I came across a surprising behaviour with IPv6 autoconfiguration,
>> which I think is a bug, but I would first like to hear other
>> people's opinions before trying to fix this:
>> 
>> Problem 1: all the address/route lifetimes are kept in jiffies
>> and jiffies don't get incremented on resume. So when a
>> route/address lifetime is 30 minutes and the system resumes after
>> 1 hour, the route/address should be considered expired, but it is
>> not.
>> 
>> Problem 2: when a system is moved to a new network a RS is not
>> sent. Thus, IPv6 does not autoconfigure until the router sends a
>> periodic RA. This can occur both while the system is alive and
>> while it is suspended. I think the autoconfigured state should be
>> discarded when the kernel suspects the system could have been
>> moved to a different network.
>> 
>> When the cable is unplugged and plugged in again, we already get
>> notified through linkwatch -> netdev_state_change ->
>> -> call_netdevice_notifiers(NETDEV_CHANGE, ...)
>> However, if the device has already been autoconfigured,
>> addrconf_notify() only handles this event by printing a
>> message.
>> 
>> So my idea was to:
>> - handle link up/down in addrconf_notify() similarly to
>> NETDEV_UP/NETDEV_DOWN
>> 
>> - on suspend, faking a link down event; on resume, faking a link up event
>> (or better, having a special event type for suspend/resume)
>> 
>> This would cause autoconfiguration to be restarted on resume as
>> well as cable plug/unplug, solving both the above problems.
>> 
>> Or do we want to completely rely on userspace tools
>> (networkmanager/ifplug) and expect them to do NETDEV_DOWN on
>> unplug/suspend and NETDEV_UP on plug/resume?
>> 
>> Any thoughts?
>> 
>> -- 
>> Jiri Bohac <jbohac@suse.cz>
>> SUSE Labs, SUSE CZ
>> 
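
Problem 1 in the quoted mail has a userspace analogue that may make it
easier to see: on Linux, CLOCK_MONOTONIC does not advance while the
system is suspended, whereas CLOCK_BOOTTIME does. The sketch below is
only that analogy, not a proposed kernel fix; clock_seconds and
lifetime_expired are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Whole seconds on the given clock, or -1 if the clock is unavailable. */
long long clock_seconds(clockid_t id)
{
	struct timespec ts;

	if (clock_gettime(id, &ts) != 0)
		return -1;
	return (long long)ts.tv_sec;
}

/*
 * True once `lifetime` seconds have passed since `stamp`.
 * `stamp` and `now` must come from the same clock.
 */
bool lifetime_expired(long long stamp, long long lifetime, long long now)
{
	return now - stamp >= lifetime;
}
```

With a 30 minute lifetime stamped just before a one hour suspend, a
check based on CLOCK_MONOTONIC would still see the lifetime as valid
right after resume, while the same check based on CLOCK_BOOTTIME would
report it as expired.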


^ permalink raw reply

* [PATCH net-next-2.6 2/2] be2net: use stats-sync to read/write 64-bit stats
From: Sathya Perla @ 2011-07-26  5:10 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1311657015-23465-1-git-send-email-sathya.perla@emulex.com>

64-bit stats in be2net are written/read as follows using the stats-sync
interface for safe access in 32-bit archs:

64-bit 		sync			writer			reader
stats
------------------------------------------------------------------------------
tx_stats	tx_stats->sync		be_xmit			be_get_stats64,
								ethtool
tx-compl	tx_stats->sync_compl	tx-compl-processing	ethtool
rx-stats	rx_stats->sync		rx-compl-processing	be_get_stats64,
								ethtool,
								eqd-update

This patch is based on Stephen Hemminger's earlier patch on the same issue...
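
For readers unfamiliar with the stats-sync idea, here is a userspace
sketch of the underlying sequence-counter scheme, written with C11
atomics. It is not the kernel's u64_stats_sync API: struct stat64,
stat64_add and stat64_read are hypothetical names, the 64-bit counter is
deliberately split into two 32-bit halves to model a 32-bit machine, and
a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

struct stat64 {
	atomic_uint seq;		/* even: stable, odd: update running */
	_Atomic uint32_t lo, hi;	/* the two halves of the counter */
};

/* Writer side. Only one writer may update a given counter. */
void stat64_add(struct stat64 *s, uint64_t delta)
{
	uint64_t v = ((uint64_t)atomic_load_explicit(&s->hi,
				memory_order_relaxed) << 32) |
		     atomic_load_explicit(&s->lo, memory_order_relaxed);
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	v += delta;
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(v >> 32),
			      memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both halves come from the same update. */
uint64_t stat64_read(struct stat64 *s)
{
	unsigned int start;
	uint32_t lo, hi;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		lo = atomic_load_explicit(&s->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&s->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&s->seq, memory_order_relaxed));

	return ((uint64_t)hi << 32) | lo;
}
```

The reader never blocks the writer; it simply re-reads when it notices
that an update overlapped with its read. That is the property that makes
this scheme attractive for per-queue counters updated in the data path.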

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/benet/be.h         |    5 ++-
 drivers/net/benet/be_cmds.c    |    1 -
 drivers/net/benet/be_ethtool.c |   61 +++++++++++++++++++++----------
 drivers/net/benet/be_main.c    |   77 +++++++++++++++++++++++++---------------
 4 files changed, 93 insertions(+), 51 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 68227fd..af57b51 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -29,6 +29,7 @@
 #include <linux/interrupt.h>
 #include <linux/firmware.h>
 #include <linux/slab.h>
+#include <linux/u64_stats_sync.h>
 
 #include "be_hw.h"
 
@@ -174,6 +175,8 @@ struct be_tx_stats {
 	u64 tx_compl;
 	ulong tx_jiffies;
 	u32 tx_stops;
+	struct u64_stats_sync sync;
+	struct u64_stats_sync sync_compl;
 };
 
 struct be_tx_obj {
@@ -206,6 +209,7 @@ struct be_rx_stats {
 	u32 rx_mcast_pkts;
 	u32 rx_compl_err;	/* completions with err set */
 	u32 rx_pps;		/* pkts per second */
+	struct u64_stats_sync sync;
 };
 
 struct be_rx_compl_info {
@@ -518,7 +522,6 @@ static inline bool be_multi_rxq(const struct be_adapter *adapter)
 extern void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
 		u16 num_popped);
 extern void be_link_status_update(struct be_adapter *adapter, bool link_up);
-extern void netdev_stats_update(struct be_adapter *adapter);
 extern void be_parse_stats(struct be_adapter *adapter);
 extern int be_load_fw(struct be_adapter *adapter, u8 *func);
 #endif				/* BE_H */
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index e15f06a..7dc4741 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -83,7 +83,6 @@ static int be_mcc_compl_process(struct be_adapter *adapter,
 			 (compl->tag0 == OPCODE_ETH_GET_PPORT_STATS)) &&
 			(compl->tag1 == CMD_SUBSYSTEM_ETH)) {
 			be_parse_stats(adapter);
-			netdev_stats_update(adapter);
 			adapter->stats_cmd_sent = false;
 		}
 	} else {
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 0300b9d..e92a8d8 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -74,10 +74,12 @@ static const struct be_ethtool_stat et_stats[] = {
 };
 #define ETHTOOL_STATS_NUM ARRAY_SIZE(et_stats)
 
-/* Stats related to multi RX queues */
+/* Stats related to multi RX queues: get_stats routine assumes bytes, pkts
+ * are first and second members respectively.
+ */
 static const struct be_ethtool_stat et_rx_stats[] = {
-	{DRVSTAT_RX_INFO(rx_bytes)},
-	{DRVSTAT_RX_INFO(rx_pkts)},
+	{DRVSTAT_RX_INFO(rx_bytes)},/* If moving this member see above note */
+	{DRVSTAT_RX_INFO(rx_pkts)}, /* If moving this member see above note */
 	{DRVSTAT_RX_INFO(rx_polls)},
 	{DRVSTAT_RX_INFO(rx_events)},
 	{DRVSTAT_RX_INFO(rx_compl)},
@@ -88,8 +90,11 @@ static const struct be_ethtool_stat et_rx_stats[] = {
 };
 #define ETHTOOL_RXSTATS_NUM (ARRAY_SIZE(et_rx_stats))
 
-/* Stats related to multi TX queues */
+/* Stats related to multi TX queues: get_stats routine assumes compl is the
+ * first member
+ */
 static const struct be_ethtool_stat et_tx_stats[] = {
+	{DRVSTAT_TX_INFO(tx_compl)}, /* If moving this member see above note */
 	{DRVSTAT_TX_INFO(tx_bytes)},
 	{DRVSTAT_TX_INFO(tx_pkts)},
 	{DRVSTAT_TX_INFO(tx_reqs)},
@@ -243,32 +248,48 @@ be_get_ethtool_stats(struct net_device *netdev,
 	struct be_rx_obj *rxo;
 	struct be_tx_obj *txo;
 	void *p;
-	int i, j, base;
+	unsigned int i, j, base = 0, start;
 
 	for (i = 0; i < ETHTOOL_STATS_NUM; i++) {
 		p = (u8 *)&adapter->drv_stats + et_stats[i].offset;
-		data[i] = (et_stats[i].size == sizeof(u64)) ?
-				*(u64 *)p: *(u32 *)p;
+		data[i] = *(u32 *)p;
 	}
+	base += ETHTOOL_STATS_NUM;
 
-	base = ETHTOOL_STATS_NUM;
 	for_all_rx_queues(adapter, rxo, j) {
-		for (i = 0; i < ETHTOOL_RXSTATS_NUM; i++) {
-			p = (u8 *)rx_stats(rxo) + et_rx_stats[i].offset;
-			data[base + j * ETHTOOL_RXSTATS_NUM + i] =
-				(et_rx_stats[i].size == sizeof(u64)) ?
-					*(u64 *)p: *(u32 *)p;
+		struct be_rx_stats *stats = rx_stats(rxo);
+
+		do {
+			start = u64_stats_fetch_begin_bh(&stats->sync);
+			data[base] = stats->rx_bytes;
+			data[base + 1] = stats->rx_pkts;
+		} while (u64_stats_fetch_retry_bh(&stats->sync, start));
+
+		for (i = 2; i < ETHTOOL_RXSTATS_NUM; i++) {
+			p = (u8 *)stats + et_rx_stats[i].offset;
+			data[base + i] = *(u32 *)p;
 		}
+		base += ETHTOOL_RXSTATS_NUM;
 	}
 
-	base = ETHTOOL_STATS_NUM + adapter->num_rx_qs * ETHTOOL_RXSTATS_NUM;
 	for_all_tx_queues(adapter, txo, j) {
-		for (i = 0; i < ETHTOOL_TXSTATS_NUM; i++) {
-			p = (u8 *)tx_stats(txo) + et_tx_stats[i].offset;
-			data[base + j * ETHTOOL_TXSTATS_NUM + i] =
-				(et_tx_stats[i].size == sizeof(u64)) ?
-					*(u64 *)p: *(u32 *)p;
-		}
+		struct be_tx_stats *stats = tx_stats(txo);
+
+		do {
+			start = u64_stats_fetch_begin_bh(&stats->sync_compl);
+			data[base] = stats->tx_compl;
+		} while (u64_stats_fetch_retry_bh(&stats->sync_compl, start));
+
+		do {
+			start = u64_stats_fetch_begin_bh(&stats->sync);
+			for (i = 1; i < ETHTOOL_TXSTATS_NUM; i++) {
+				p = (u8 *)stats + et_tx_stats[i].offset;
+				data[base + i] =
+					(et_tx_stats[i].size == sizeof(u64)) ?
+						*(u64 *)p : *(u32 *)p;
+			}
+		} while (u64_stats_fetch_retry_bh(&stats->sync, start));
+		base += ETHTOOL_TXSTATS_NUM;
 	}
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 9cfbfdf..9f2f66c 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -396,36 +396,44 @@ void be_parse_stats(struct be_adapter *adapter)
 			erx->rx_drops_no_fragments[rxo->q.id];
 }
 
-void netdev_stats_update(struct be_adapter *adapter)
+static struct rtnl_link_stats64 *be_get_stats64(struct net_device *netdev,
+					struct rtnl_link_stats64 *stats)
 {
+	struct be_adapter *adapter = netdev_priv(netdev);
 	struct be_drv_stats *drvs = &adapter->drv_stats;
-	struct net_device_stats *dev_stats = &adapter->netdev->stats;
 	struct be_rx_obj *rxo;
 	struct be_tx_obj *txo;
-	unsigned long pkts = 0, bytes = 0, mcast = 0, drops = 0;
+	u64 pkts, bytes;
+	unsigned int start;
 	int i;
 
 	for_all_rx_queues(adapter, rxo, i) {
-		pkts += rx_stats(rxo)->rx_pkts;
-		bytes += rx_stats(rxo)->rx_bytes;
-		mcast += rx_stats(rxo)->rx_mcast_pkts;
-		drops += rx_stats(rxo)->rx_drops_no_skbs;
+		const struct be_rx_stats *rx_stats = rx_stats(rxo);
+		do {
+			start = u64_stats_fetch_begin_bh(&rx_stats->sync);
+			pkts = rx_stats(rxo)->rx_pkts;
+			bytes = rx_stats(rxo)->rx_bytes;
+		} while (u64_stats_fetch_retry_bh(&rx_stats->sync, start));
+		stats->rx_packets += pkts;
+		stats->rx_bytes += bytes;
+		stats->multicast += rx_stats(rxo)->rx_mcast_pkts;
+		stats->rx_dropped += rx_stats(rxo)->rx_drops_no_skbs +
+					rx_stats(rxo)->rx_drops_no_frags;
 	}
-	dev_stats->rx_packets = pkts;
-	dev_stats->rx_bytes = bytes;
-	dev_stats->multicast = mcast;
-	dev_stats->rx_dropped = drops;
 
-	pkts = bytes = 0;
 	for_all_tx_queues(adapter, txo, i) {
-		pkts += tx_stats(txo)->tx_pkts;
-		bytes += tx_stats(txo)->tx_bytes;
+		const struct be_tx_stats *tx_stats = tx_stats(txo);
+		do {
+			start = u64_stats_fetch_begin_bh(&tx_stats->sync);
+			pkts = tx_stats(txo)->tx_pkts;
+			bytes = tx_stats(txo)->tx_bytes;
+		} while (u64_stats_fetch_retry_bh(&tx_stats->sync, start));
+		stats->tx_packets += pkts;
+		stats->tx_bytes += bytes;
 	}
-	dev_stats->tx_packets = pkts;
-	dev_stats->tx_bytes = bytes;
 
 	/* bad pkts received */
-	dev_stats->rx_errors = drvs->rx_crc_errors +
+	stats->rx_errors = drvs->rx_crc_errors +
 		drvs->rx_alignment_symbol_errors +
 		drvs->rx_in_range_errors +
 		drvs->rx_out_range_errors +
@@ -434,26 +442,24 @@ void netdev_stats_update(struct be_adapter *adapter)
 		drvs->rx_dropped_too_short +
 		drvs->rx_dropped_header_too_small +
 		drvs->rx_dropped_tcp_length +
-		drvs->rx_dropped_runt +
-		drvs->rx_tcp_checksum_errs +
-		drvs->rx_ip_checksum_errs +
-		drvs->rx_udp_checksum_errs;
+		drvs->rx_dropped_runt;
 
 	/* detailed rx errors */
-	dev_stats->rx_length_errors = drvs->rx_in_range_errors +
+	stats->rx_length_errors = drvs->rx_in_range_errors +
 		drvs->rx_out_range_errors +
 		drvs->rx_frame_too_long;
 
-	dev_stats->rx_crc_errors = drvs->rx_crc_errors;
+	stats->rx_crc_errors = drvs->rx_crc_errors;
 
 	/* frame alignment errors */
-	dev_stats->rx_frame_errors = drvs->rx_alignment_symbol_errors;
+	stats->rx_frame_errors = drvs->rx_alignment_symbol_errors;
 
 	/* receiver fifo overrun */
 	/* drops_no_pbuf is no per i/f, it's per BE card */
-	dev_stats->rx_fifo_errors = drvs->rxpp_fifo_overflow_drop +
+	stats->rx_fifo_errors = drvs->rxpp_fifo_overflow_drop +
 				drvs->rx_input_fifo_overflow_drop +
 				drvs->rx_drops_no_pbuf;
+	return stats;
 }
 
 void be_link_status_update(struct be_adapter *adapter, bool link_up)
@@ -479,12 +485,14 @@ static void be_tx_stats_update(struct be_tx_obj *txo,
 {
 	struct be_tx_stats *stats = tx_stats(txo);
 
+	u64_stats_update_begin(&stats->sync);
 	stats->tx_reqs++;
 	stats->tx_wrbs += wrb_cnt;
 	stats->tx_bytes += copied;
 	stats->tx_pkts += (gso_segs ? gso_segs : 1);
 	if (stopped)
 		stats->tx_stops++;
+	u64_stats_update_end(&stats->sync);
 }
 
 /* Determine number of WRB entries needed to xmit data in an skb */
@@ -905,7 +913,8 @@ static void be_rx_eqd_update(struct be_adapter *adapter, struct be_rx_obj *rxo)
 	struct be_rx_stats *stats = rx_stats(rxo);
 	ulong now = jiffies;
 	ulong delta = now - stats->rx_jiffies;
-	u32 eqd;
+	u64 pkts;
+	unsigned int start, eqd;
 
 	if (!rx_eq->enable_aic)
 		return;
@@ -920,8 +929,13 @@ static void be_rx_eqd_update(struct be_adapter *adapter, struct be_rx_obj *rxo)
 	if (delta < HZ)
 		return;
 
-	stats->rx_pps = (stats->rx_pkts - stats->rx_pkts_prev) / (delta / HZ);
-	stats->rx_pkts_prev = stats->rx_pkts;
+	do {
+		start = u64_stats_fetch_begin_bh(&stats->sync);
+		pkts = stats->rx_pkts;
+	} while (u64_stats_fetch_retry_bh(&stats->sync, start));
+
+	stats->rx_pps = (pkts - stats->rx_pkts_prev) / (delta / HZ);
+	stats->rx_pkts_prev = pkts;
 	stats->rx_jiffies = now;
 	eqd = stats->rx_pps / 110000;
 	eqd = eqd << 3;
@@ -942,6 +956,7 @@ static void be_rx_stats_update(struct be_rx_obj *rxo,
 {
 	struct be_rx_stats *stats = rx_stats(rxo);
 
+	u64_stats_update_begin(&stats->sync);
 	stats->rx_compl++;
 	stats->rx_bytes += rxcp->pkt_size;
 	stats->rx_pkts++;
@@ -949,6 +964,7 @@ static void be_rx_stats_update(struct be_rx_obj *rxo,
 		stats->rx_mcast_pkts++;
 	if (rxcp->err)
 		stats->rx_compl_err++;
+	u64_stats_update_end(&stats->sync);
 }
 
 static inline bool csum_passed(struct be_rx_compl_info *rxcp)
@@ -1878,8 +1894,9 @@ static int be_poll_tx_mcc(struct napi_struct *napi, int budget)
 				netif_wake_subqueue(adapter->netdev, i);
 			}
 
-			adapter->drv_stats.tx_events++;
+			u64_stats_update_begin(&tx_stats(txo)->sync_compl);
 			tx_stats(txo)->tx_compl += tx_compl;
+			u64_stats_update_end(&tx_stats(txo)->sync_compl);
 		}
 	}
 
@@ -1893,6 +1910,7 @@ static int be_poll_tx_mcc(struct napi_struct *napi, int budget)
 	napi_complete(napi);
 
 	be_eq_notify(adapter, tx_eq->q.id, true, false, 0);
+	adapter->drv_stats.tx_events++;
 	return 1;
 }
 
@@ -2843,6 +2861,7 @@ static struct net_device_ops be_netdev_ops = {
 	.ndo_set_rx_mode	= be_set_multicast_list,
 	.ndo_set_mac_address	= be_mac_addr_set,
 	.ndo_change_mtu		= be_change_mtu,
+	.ndo_get_stats64	= be_get_stats64,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_vlan_rx_add_vid	= be_vlan_add_vid,
 	.ndo_vlan_rx_kill_vid	= be_vlan_rem_vid,
-- 
1.7.4


^ permalink raw reply related

* [PATCH net-next-2.6 1/2] be2net: cleanup and refactor stats code
From: Sathya Perla @ 2011-07-26  5:10 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1311657015-23465-1-git-send-email-sathya.perla@emulex.com>

In preparation for 64-bit stats interface, the following cleanups help
streamline the code:
1) made some more rx/tx stats stored by driver 64 bit
2) made some HW stats (err/drop counters) stored in be_drv_stats 32 bit to
   keep the code simple as BE provides 32-bit counters only.
3) removed duplication of netdev stats in ethtool
4) removed some unnecessary stats and fixed some names

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/benet/be.h         |  110 ++++++++---------
 drivers/net/benet/be_cmds.c    |   20 ---
 drivers/net/benet/be_cmds.h    |   53 +-------
 drivers/net/benet/be_ethtool.c |   65 +++--------
 drivers/net/benet/be_main.c    |  263 +++++++++++++---------------------------
 5 files changed, 155 insertions(+), 356 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index c85768c..68227fd 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -167,15 +167,13 @@ struct be_mcc_obj {
 };
 
 struct be_tx_stats {
-	u32 be_tx_reqs;		/* number of TX requests initiated */
-	u32 be_tx_stops;	/* number of times TX Q was stopped */
-	u32 be_tx_wrbs;		/* number of tx WRBs used */
-	u32 be_tx_compl;	/* number of tx completion entries processed */
-	ulong be_tx_jiffies;
-	u64 be_tx_bytes;
-	u64 be_tx_bytes_prev;
-	u64 be_tx_pkts;
-	u32 be_tx_rate;
+	u64 tx_bytes;
+	u64 tx_pkts;
+	u64 tx_reqs;
+	u64 tx_wrbs;
+	u64 tx_compl;
+	ulong tx_jiffies;
+	u32 tx_stops;
 };
 
 struct be_tx_obj {
@@ -195,22 +193,19 @@ struct be_rx_page_info {
 };
 
 struct be_rx_stats {
-	u32 rx_post_fail;/* number of ethrx buffer alloc failures */
-	u32 rx_polls;	/* number of times NAPI called poll function */
-	u32 rx_events;	/* number of ucast rx completion events  */
-	u32 rx_compl;	/* number of rx completion entries processed */
-	ulong rx_dropped; /* number of skb allocation errors */
-	ulong rx_jiffies;
 	u64 rx_bytes;
-	u64 rx_bytes_prev;
 	u64 rx_pkts;
-	u32 rx_rate;
+	u64 rx_pkts_prev;
+	ulong rx_jiffies;
+	u32 rx_drops_no_skbs;	/* skb allocation errors */
+	u32 rx_drops_no_frags;	/* HW has no fetched frags */
+	u32 rx_post_fail;	/* page post alloc failures */
+	u32 rx_polls;		/* NAPI calls */
+	u32 rx_events;
+	u32 rx_compl;
 	u32 rx_mcast_pkts;
-	u32 rxcp_err;	/* Num rx completion entries w/ err set. */
-	ulong rx_fps_jiffies;	/* jiffies at last FPS calc */
-	u32 rx_frags;
-	u32 prev_rx_frags;
-	u32 rx_fps;		/* Rx frags per second */
+	u32 rx_compl_err;	/* completions with err set */
+	u32 rx_pps;		/* pkts per second */
 };
 
 struct be_rx_compl_info {
@@ -247,43 +242,40 @@ struct be_rx_obj {
 
 struct be_drv_stats {
 	u8 be_on_die_temperature;
-	u64 be_tx_events;
-	u64 eth_red_drops;
-	u64 rx_drops_no_pbuf;
-	u64 rx_drops_no_txpb;
-	u64 rx_drops_no_erx_descr;
-	u64 rx_drops_no_tpre_descr;
-	u64 rx_drops_too_many_frags;
-	u64 rx_drops_invalid_ring;
-	u64 forwarded_packets;
-	u64 rx_drops_mtu;
-	u64 rx_crc_errors;
-	u64 rx_alignment_symbol_errors;
-	u64 rx_pause_frames;
-	u64 rx_priority_pause_frames;
-	u64 rx_control_frames;
-	u64 rx_in_range_errors;
-	u64 rx_out_range_errors;
-	u64 rx_frame_too_long;
-	u64 rx_address_match_errors;
-	u64 rx_dropped_too_small;
-	u64 rx_dropped_too_short;
-	u64 rx_dropped_header_too_small;
-	u64 rx_dropped_tcp_length;
-	u64 rx_dropped_runt;
-	u64 rx_ip_checksum_errs;
-	u64 rx_tcp_checksum_errs;
-	u64 rx_udp_checksum_errs;
-	u64 rx_switched_unicast_packets;
-	u64 rx_switched_multicast_packets;
-	u64 rx_switched_broadcast_packets;
-	u64 tx_pauseframes;
-	u64 tx_priority_pauseframes;
-	u64 tx_controlframes;
-	u64 rxpp_fifo_overflow_drop;
-	u64 rx_input_fifo_overflow_drop;
-	u64 pmem_fifo_overflow_drop;
-	u64 jabber_events;
+	u32 tx_events;
+	u32 eth_red_drops;
+	u32 rx_drops_no_pbuf;
+	u32 rx_drops_no_txpb;
+	u32 rx_drops_no_erx_descr;
+	u32 rx_drops_no_tpre_descr;
+	u32 rx_drops_too_many_frags;
+	u32 rx_drops_invalid_ring;
+	u32 forwarded_packets;
+	u32 rx_drops_mtu;
+	u32 rx_crc_errors;
+	u32 rx_alignment_symbol_errors;
+	u32 rx_pause_frames;
+	u32 rx_priority_pause_frames;
+	u32 rx_control_frames;
+	u32 rx_in_range_errors;
+	u32 rx_out_range_errors;
+	u32 rx_frame_too_long;
+	u32 rx_address_match_errors;
+	u32 rx_dropped_too_small;
+	u32 rx_dropped_too_short;
+	u32 rx_dropped_header_too_small;
+	u32 rx_dropped_tcp_length;
+	u32 rx_dropped_runt;
+	u32 rx_ip_checksum_errs;
+	u32 rx_tcp_checksum_errs;
+	u32 rx_udp_checksum_errs;
+	u32 tx_pauseframes;
+	u32 tx_priority_pauseframes;
+	u32 tx_controlframes;
+	u32 rxpp_fifo_overflow_drop;
+	u32 rx_input_fifo_overflow_drop;
+	u32 pmem_fifo_overflow_drop;
+	u32 jabber_events;
 };
 
 struct be_vf_cfg {
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 054fa67..e15f06a 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -82,26 +82,6 @@ static int be_mcc_compl_process(struct be_adapter *adapter,
 		if (((compl->tag0 == OPCODE_ETH_GET_STATISTICS) ||
 			 (compl->tag0 == OPCODE_ETH_GET_PPORT_STATS)) &&
 			(compl->tag1 == CMD_SUBSYSTEM_ETH)) {
-			if (adapter->generation == BE_GEN3) {
-				if (lancer_chip(adapter)) {
-					struct lancer_cmd_resp_pport_stats
-						*resp = adapter->stats_cmd.va;
-					be_dws_le_to_cpu(&resp->pport_stats,
-						sizeof(resp->pport_stats));
-				} else {
-					struct be_cmd_resp_get_stats_v1 *resp =
-							adapter->stats_cmd.va;
-
-				be_dws_le_to_cpu(&resp->hw_stats,
-							sizeof(resp->hw_stats));
-				}
-			} else {
-				struct be_cmd_resp_get_stats_v0 *resp =
-							adapter->stats_cmd.va;
-
-				be_dws_le_to_cpu(&resp->hw_stats,
-							sizeof(resp->hw_stats));
-			}
 			be_parse_stats(adapter);
 			netdev_stats_update(adapter);
 			adapter->stats_cmd_sent = false;
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 8e4d488..d3342c4 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -693,8 +693,7 @@ struct be_cmd_resp_get_stats_v0 {
 	struct be_hw_stats_v0 hw_stats;
 };
 
-#define make_64bit_val(hi_32, lo_32)	(((u64)hi_32<<32) | lo_32)
-struct lancer_cmd_pport_stats {
+struct lancer_pport_stats {
 	u32 tx_packets_lo;
 	u32 tx_packets_hi;
 	u32 tx_unicast_packets_lo;
@@ -871,16 +870,16 @@ struct lancer_cmd_req_pport_stats {
 	struct be_cmd_req_hdr hdr;
 	union {
 		struct pport_stats_params params;
-		u8 rsvd[sizeof(struct lancer_cmd_pport_stats)];
+		u8 rsvd[sizeof(struct lancer_pport_stats)];
 	} cmd_params;
 };
 
 struct lancer_cmd_resp_pport_stats {
 	struct be_cmd_resp_hdr hdr;
-	struct lancer_cmd_pport_stats pport_stats;
+	struct lancer_pport_stats pport_stats;
 };
 
-static inline  struct lancer_cmd_pport_stats*
+static inline struct lancer_pport_stats*
 	pport_stats_from_cmd(struct be_adapter *adapter)
 {
 	struct lancer_cmd_resp_pport_stats *cmd = adapter->stats_cmd.va;
@@ -1383,8 +1382,7 @@ struct be_cmd_resp_get_stats_v1 {
 	struct be_hw_stats_v1 hw_stats;
 };
 
-static inline void *
-hw_stats_from_cmd(struct be_adapter *adapter)
+static inline void *hw_stats_from_cmd(struct be_adapter *adapter)
 {
 	if (adapter->generation == BE_GEN3) {
 		struct be_cmd_resp_get_stats_v1 *cmd = adapter->stats_cmd.va;
@@ -1397,34 +1395,6 @@ hw_stats_from_cmd(struct be_adapter *adapter)
 	}
 }
 
-static inline void *be_port_rxf_stats_from_cmd(struct be_adapter *adapter)
-{
-	if (adapter->generation == BE_GEN3) {
-		struct be_hw_stats_v1 *hw_stats = hw_stats_from_cmd(adapter);
-		struct be_rxf_stats_v1 *rxf_stats = &hw_stats->rxf;
-
-		return &rxf_stats->port[adapter->port_num];
-	} else {
-		struct be_hw_stats_v0 *hw_stats = hw_stats_from_cmd(adapter);
-		struct be_rxf_stats_v0 *rxf_stats = &hw_stats->rxf;
-
-		return &rxf_stats->port[adapter->port_num];
-	}
-}
-
-static inline void *be_rxf_stats_from_cmd(struct be_adapter *adapter)
-{
-	if (adapter->generation == BE_GEN3) {
-		struct be_hw_stats_v1 *hw_stats = hw_stats_from_cmd(adapter);
-
-		return &hw_stats->rxf;
-	} else {
-		struct be_hw_stats_v0 *hw_stats = hw_stats_from_cmd(adapter);
-
-		return &hw_stats->rxf;
-	}
-}
-
 static inline void *be_erx_stats_from_cmd(struct be_adapter *adapter)
 {
 	if (adapter->generation == BE_GEN3) {
@@ -1438,19 +1408,6 @@ static inline void *be_erx_stats_from_cmd(struct be_adapter *adapter)
 	}
 }
 
-static inline void *be_pmem_stats_from_cmd(struct be_adapter *adapter)
-{
-	if (adapter->generation == BE_GEN3) {
-		struct be_hw_stats_v1 *hw_stats = hw_stats_from_cmd(adapter);
-
-		return &hw_stats->pmem;
-	} else {
-		struct be_hw_stats_v0 *hw_stats = hw_stats_from_cmd(adapter);
-
-		return &hw_stats->pmem;
-	}
-}
-
 extern int be_pci_fnum_get(struct be_adapter *adapter);
 extern int be_cmd_POST(struct be_adapter *adapter);
 extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 7fd8130..0300b9d 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -26,33 +26,18 @@ struct be_ethtool_stat {
 	int offset;
 };
 
-enum {NETSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT,
-			DRVSTAT};
+enum {DRVSTAT_TX, DRVSTAT_RX, DRVSTAT};
 #define FIELDINFO(_struct, field) FIELD_SIZEOF(_struct, field), \
 					offsetof(_struct, field)
-#define NETSTAT_INFO(field) 	#field, NETSTAT,\
-					FIELDINFO(struct net_device_stats,\
-						field)
 #define DRVSTAT_TX_INFO(field)	#field, DRVSTAT_TX,\
 					FIELDINFO(struct be_tx_stats, field)
 #define DRVSTAT_RX_INFO(field)	#field, DRVSTAT_RX,\
 					FIELDINFO(struct be_rx_stats, field)
-#define ERXSTAT_INFO(field)	#field, ERXSTAT,\
-					FIELDINFO(struct be_erx_stats_v1, field)
 #define	DRVSTAT_INFO(field)	#field, DRVSTAT,\
-					FIELDINFO(struct be_drv_stats, \
-						field)
+					FIELDINFO(struct be_drv_stats, field)
 
 static const struct be_ethtool_stat et_stats[] = {
-	{NETSTAT_INFO(rx_packets)},
-	{NETSTAT_INFO(tx_packets)},
-	{NETSTAT_INFO(rx_bytes)},
-	{NETSTAT_INFO(tx_bytes)},
-	{NETSTAT_INFO(rx_errors)},
-	{NETSTAT_INFO(tx_errors)},
-	{NETSTAT_INFO(rx_dropped)},
-	{NETSTAT_INFO(tx_dropped)},
-	{DRVSTAT_INFO(be_tx_events)},
+	{DRVSTAT_INFO(tx_events)},
 	{DRVSTAT_INFO(rx_crc_errors)},
 	{DRVSTAT_INFO(rx_alignment_symbol_errors)},
 	{DRVSTAT_INFO(rx_pause_frames)},
@@ -71,9 +56,6 @@ static const struct be_ethtool_stat et_stats[] = {
 	{DRVSTAT_INFO(rx_ip_checksum_errs)},
 	{DRVSTAT_INFO(rx_tcp_checksum_errs)},
 	{DRVSTAT_INFO(rx_udp_checksum_errs)},
-	{DRVSTAT_INFO(rx_switched_unicast_packets)},
-	{DRVSTAT_INFO(rx_switched_multicast_packets)},
-	{DRVSTAT_INFO(rx_switched_broadcast_packets)},
 	{DRVSTAT_INFO(tx_pauseframes)},
 	{DRVSTAT_INFO(tx_controlframes)},
 	{DRVSTAT_INFO(rx_priority_pause_frames)},
@@ -96,24 +78,24 @@ static const struct be_ethtool_stat et_stats[] = {
 static const struct be_ethtool_stat et_rx_stats[] = {
 	{DRVSTAT_RX_INFO(rx_bytes)},
 	{DRVSTAT_RX_INFO(rx_pkts)},
-	{DRVSTAT_RX_INFO(rx_rate)},
 	{DRVSTAT_RX_INFO(rx_polls)},
 	{DRVSTAT_RX_INFO(rx_events)},
 	{DRVSTAT_RX_INFO(rx_compl)},
 	{DRVSTAT_RX_INFO(rx_mcast_pkts)},
 	{DRVSTAT_RX_INFO(rx_post_fail)},
-	{DRVSTAT_RX_INFO(rx_dropped)},
-	{ERXSTAT_INFO(rx_drops_no_fragments)}
+	{DRVSTAT_RX_INFO(rx_drops_no_skbs)},
+	{DRVSTAT_RX_INFO(rx_drops_no_frags)}
 };
 #define ETHTOOL_RXSTATS_NUM (ARRAY_SIZE(et_rx_stats))
 
 /* Stats related to multi TX queues */
 static const struct be_ethtool_stat et_tx_stats[] = {
-	{DRVSTAT_TX_INFO(be_tx_rate)},
-	{DRVSTAT_TX_INFO(be_tx_reqs)},
-	{DRVSTAT_TX_INFO(be_tx_wrbs)},
-	{DRVSTAT_TX_INFO(be_tx_stops)},
-	{DRVSTAT_TX_INFO(be_tx_compl)}
+	{DRVSTAT_TX_INFO(tx_bytes)},
+	{DRVSTAT_TX_INFO(tx_pkts)},
+	{DRVSTAT_TX_INFO(tx_reqs)},
+	{DRVSTAT_TX_INFO(tx_wrbs)},
+	{DRVSTAT_TX_INFO(tx_compl)},
+	{DRVSTAT_TX_INFO(tx_stops)}
 };
 #define ETHTOOL_TXSTATS_NUM (ARRAY_SIZE(et_tx_stats))
 
@@ -260,20 +242,11 @@ be_get_ethtool_stats(struct net_device *netdev,
 	struct be_adapter *adapter = netdev_priv(netdev);
 	struct be_rx_obj *rxo;
 	struct be_tx_obj *txo;
-	void *p = NULL;
+	void *p;
 	int i, j, base;
 
 	for (i = 0; i < ETHTOOL_STATS_NUM; i++) {
-		switch (et_stats[i].type) {
-		case NETSTAT:
-			p = &netdev->stats;
-			break;
-		case DRVSTAT:
-			p = &adapter->drv_stats;
-			break;
-		}
-
-		p = (u8 *)p + et_stats[i].offset;
+		p = (u8 *)&adapter->drv_stats + et_stats[i].offset;
 		data[i] = (et_stats[i].size == sizeof(u64)) ?
 				*(u64 *)p: *(u32 *)p;
 	}
@@ -281,15 +254,7 @@ be_get_ethtool_stats(struct net_device *netdev,
 	base = ETHTOOL_STATS_NUM;
 	for_all_rx_queues(adapter, rxo, j) {
 		for (i = 0; i < ETHTOOL_RXSTATS_NUM; i++) {
-			switch (et_rx_stats[i].type) {
-			case DRVSTAT_RX:
-				p = (u8 *)&rxo->stats + et_rx_stats[i].offset;
-				break;
-			case ERXSTAT:
-				p = (u32 *)be_erx_stats_from_cmd(adapter) +
-								rxo->q.id;
-				break;
-			}
+			p = (u8 *)rx_stats(rxo) + et_rx_stats[i].offset;
 			data[base + j * ETHTOOL_RXSTATS_NUM + i] =
 				(et_rx_stats[i].size == sizeof(u64)) ?
 					*(u64 *)p: *(u32 *)p;
@@ -299,7 +264,7 @@ be_get_ethtool_stats(struct net_device *netdev,
 	base = ETHTOOL_STATS_NUM + adapter->num_rx_qs * ETHTOOL_RXSTATS_NUM;
 	for_all_tx_queues(adapter, txo, j) {
 		for (i = 0; i < ETHTOOL_TXSTATS_NUM; i++) {
-			p = (u8 *)&txo->stats + et_tx_stats[i].offset;
+			p = (u8 *)tx_stats(txo) + et_tx_stats[i].offset;
 			data[base + j * ETHTOOL_TXSTATS_NUM + i] =
 				(et_tx_stats[i].size == sizeof(u64)) ?
 					*(u64 *)p: *(u32 *)p;
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index c411bb1..9cfbfdf 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -245,14 +245,14 @@ netdev_addr:
 
 static void populate_be2_stats(struct be_adapter *adapter)
 {
-
-	struct be_drv_stats *drvs = &adapter->drv_stats;
-	struct be_pmem_stats *pmem_sts = be_pmem_stats_from_cmd(adapter);
+	struct be_hw_stats_v0 *hw_stats = hw_stats_from_cmd(adapter);
+	struct be_pmem_stats *pmem_sts = &hw_stats->pmem;
+	struct be_rxf_stats_v0 *rxf_stats = &hw_stats->rxf;
 	struct be_port_rxf_stats_v0 *port_stats =
-		be_port_rxf_stats_from_cmd(adapter);
-	struct be_rxf_stats_v0 *rxf_stats =
-		be_rxf_stats_from_cmd(adapter);
+					&rxf_stats->port[adapter->port_num];
+	struct be_drv_stats *drvs = &adapter->drv_stats;
 
+	be_dws_le_to_cpu(hw_stats, sizeof(*hw_stats));
 	drvs->rx_pause_frames = port_stats->rx_pause_frames;
 	drvs->rx_crc_errors = port_stats->rx_crc_errors;
 	drvs->rx_control_frames = port_stats->rx_control_frames;
@@ -267,12 +267,10 @@ static void populate_be2_stats(struct be_adapter *adapter)
 	drvs->rx_dropped_too_small = port_stats->rx_dropped_too_small;
 	drvs->rx_dropped_too_short = port_stats->rx_dropped_too_short;
 	drvs->rx_out_range_errors = port_stats->rx_out_range_errors;
-	drvs->rx_input_fifo_overflow_drop =
-		port_stats->rx_input_fifo_overflow;
+	drvs->rx_input_fifo_overflow_drop = port_stats->rx_input_fifo_overflow;
 	drvs->rx_dropped_header_too_small =
 		port_stats->rx_dropped_header_too_small;
-	drvs->rx_address_match_errors =
-		port_stats->rx_address_match_errors;
+	drvs->rx_address_match_errors = port_stats->rx_address_match_errors;
 	drvs->rx_alignment_symbol_errors =
 		port_stats->rx_alignment_symbol_errors;
 
@@ -280,36 +278,30 @@ static void populate_be2_stats(struct be_adapter *adapter)
 	drvs->tx_controlframes = port_stats->tx_controlframes;
 
 	if (adapter->port_num)
-		drvs->jabber_events =
-			rxf_stats->port1_jabber_events;
+		drvs->jabber_events = rxf_stats->port1_jabber_events;
 	else
-		drvs->jabber_events =
-			rxf_stats->port0_jabber_events;
+		drvs->jabber_events = rxf_stats->port0_jabber_events;
 	drvs->rx_drops_no_pbuf = rxf_stats->rx_drops_no_pbuf;
 	drvs->rx_drops_no_txpb = rxf_stats->rx_drops_no_txpb;
 	drvs->rx_drops_no_erx_descr = rxf_stats->rx_drops_no_erx_descr;
 	drvs->rx_drops_invalid_ring = rxf_stats->rx_drops_invalid_ring;
 	drvs->forwarded_packets = rxf_stats->forwarded_packets;
 	drvs->rx_drops_mtu = rxf_stats->rx_drops_mtu;
-	drvs->rx_drops_no_tpre_descr =
-		rxf_stats->rx_drops_no_tpre_descr;
-	drvs->rx_drops_too_many_frags =
-		rxf_stats->rx_drops_too_many_frags;
+	drvs->rx_drops_no_tpre_descr = rxf_stats->rx_drops_no_tpre_descr;
+	drvs->rx_drops_too_many_frags = rxf_stats->rx_drops_too_many_frags;
 	adapter->drv_stats.eth_red_drops = pmem_sts->eth_red_drops;
 }
 
 static void populate_be3_stats(struct be_adapter *adapter)
 {
-	struct be_drv_stats *drvs = &adapter->drv_stats;
-	struct be_pmem_stats *pmem_sts = be_pmem_stats_from_cmd(adapter);
-
-	struct be_rxf_stats_v1 *rxf_stats =
-		be_rxf_stats_from_cmd(adapter);
+	struct be_hw_stats_v1 *hw_stats = hw_stats_from_cmd(adapter);
+	struct be_pmem_stats *pmem_sts = &hw_stats->pmem;
+	struct be_rxf_stats_v1 *rxf_stats = &hw_stats->rxf;
 	struct be_port_rxf_stats_v1 *port_stats =
-		be_port_rxf_stats_from_cmd(adapter);
+					&rxf_stats->port[adapter->port_num];
+	struct be_drv_stats *drvs = &adapter->drv_stats;
 
-	drvs->rx_priority_pause_frames = 0;
-	drvs->pmem_fifo_overflow_drop = 0;
+	be_dws_le_to_cpu(hw_stats, sizeof(*hw_stats));
 	drvs->rx_pause_frames = port_stats->rx_pause_frames;
 	drvs->rx_crc_errors = port_stats->rx_crc_errors;
 	drvs->rx_control_frames = port_stats->rx_control_frames;
@@ -327,12 +319,10 @@ static void populate_be3_stats(struct be_adapter *adapter)
 		port_stats->rx_dropped_header_too_small;
 	drvs->rx_input_fifo_overflow_drop =
 		port_stats->rx_input_fifo_overflow_drop;
-	drvs->rx_address_match_errors =
-		port_stats->rx_address_match_errors;
+	drvs->rx_address_match_errors = port_stats->rx_address_match_errors;
 	drvs->rx_alignment_symbol_errors =
 		port_stats->rx_alignment_symbol_errors;
-	drvs->rxpp_fifo_overflow_drop =
-		port_stats->rxpp_fifo_overflow_drop;
+	drvs->rxpp_fifo_overflow_drop = port_stats->rxpp_fifo_overflow_drop;
 	drvs->tx_pauseframes = port_stats->tx_pauseframes;
 	drvs->tx_controlframes = port_stats->tx_controlframes;
 	drvs->jabber_events = port_stats->jabber_events;
@@ -342,10 +332,8 @@ static void populate_be3_stats(struct be_adapter *adapter)
 	drvs->rx_drops_invalid_ring = rxf_stats->rx_drops_invalid_ring;
 	drvs->forwarded_packets = rxf_stats->forwarded_packets;
 	drvs->rx_drops_mtu = rxf_stats->rx_drops_mtu;
-	drvs->rx_drops_no_tpre_descr =
-		rxf_stats->rx_drops_no_tpre_descr;
-	drvs->rx_drops_too_many_frags =
-		rxf_stats->rx_drops_too_many_frags;
+	drvs->rx_drops_no_tpre_descr = rxf_stats->rx_drops_no_tpre_descr;
+	drvs->rx_drops_too_many_frags = rxf_stats->rx_drops_too_many_frags;
 	adapter->drv_stats.eth_red_drops = pmem_sts->eth_red_drops;
 }
 
@@ -353,22 +341,15 @@ static void populate_lancer_stats(struct be_adapter *adapter)
 {
 
 	struct be_drv_stats *drvs = &adapter->drv_stats;
-	struct lancer_cmd_pport_stats *pport_stats = pport_stats_from_cmd
-						(adapter);
-	drvs->rx_priority_pause_frames = 0;
-	drvs->pmem_fifo_overflow_drop = 0;
-	drvs->rx_pause_frames =
-		make_64bit_val(pport_stats->rx_pause_frames_hi,
-				 pport_stats->rx_pause_frames_lo);
-	drvs->rx_crc_errors = make_64bit_val(pport_stats->rx_crc_errors_hi,
-						pport_stats->rx_crc_errors_lo);
-	drvs->rx_control_frames =
-			make_64bit_val(pport_stats->rx_control_frames_hi,
-			pport_stats->rx_control_frames_lo);
+	struct lancer_pport_stats *pport_stats =
+					pport_stats_from_cmd(adapter);
+
+	be_dws_le_to_cpu(pport_stats, sizeof(*pport_stats));
+	drvs->rx_pause_frames = pport_stats->rx_pause_frames_lo;
+	drvs->rx_crc_errors = pport_stats->rx_crc_errors_lo;
+	drvs->rx_control_frames = pport_stats->rx_control_frames_lo;
 	drvs->rx_in_range_errors = pport_stats->rx_in_range_errors;
-	drvs->rx_frame_too_long =
-		make_64bit_val(pport_stats->rx_internal_mac_errors_hi,
-					pport_stats->rx_frames_too_long_lo);
+	drvs->rx_frame_too_long = pport_stats->rx_frames_too_long_lo;
 	drvs->rx_dropped_runt = pport_stats->rx_dropped_runt;
 	drvs->rx_ip_checksum_errs = pport_stats->rx_ip_checksum_errors;
 	drvs->rx_tcp_checksum_errs = pport_stats->rx_tcp_checksum_errors;
@@ -382,32 +363,24 @@ static void populate_lancer_stats(struct be_adapter *adapter)
 				pport_stats->rx_dropped_header_too_small;
 	drvs->rx_input_fifo_overflow_drop = pport_stats->rx_fifo_overflow;
 	drvs->rx_address_match_errors = pport_stats->rx_address_match_errors;
-	drvs->rx_alignment_symbol_errors =
-		make_64bit_val(pport_stats->rx_symbol_errors_hi,
-				pport_stats->rx_symbol_errors_lo);
+	drvs->rx_alignment_symbol_errors = pport_stats->rx_symbol_errors_lo;
 	drvs->rxpp_fifo_overflow_drop = pport_stats->rx_fifo_overflow;
-	drvs->tx_pauseframes = make_64bit_val(pport_stats->tx_pause_frames_hi,
-					pport_stats->tx_pause_frames_lo);
-	drvs->tx_controlframes =
-		make_64bit_val(pport_stats->tx_control_frames_hi,
-				pport_stats->tx_control_frames_lo);
+	drvs->tx_pauseframes = pport_stats->tx_pause_frames_lo;
+	drvs->tx_controlframes = pport_stats->tx_control_frames_lo;
 	drvs->jabber_events = pport_stats->rx_jabbers;
-	drvs->rx_drops_no_pbuf = 0;
-	drvs->rx_drops_no_txpb = 0;
-	drvs->rx_drops_no_erx_descr = 0;
 	drvs->rx_drops_invalid_ring = pport_stats->rx_drops_invalid_queue;
-	drvs->forwarded_packets = make_64bit_val(pport_stats->num_forwards_hi,
-						pport_stats->num_forwards_lo);
-	drvs->rx_drops_mtu = make_64bit_val(pport_stats->rx_drops_mtu_hi,
-						pport_stats->rx_drops_mtu_lo);
-	drvs->rx_drops_no_tpre_descr = 0;
+	drvs->forwarded_packets = pport_stats->num_forwards_lo;
+	drvs->rx_drops_mtu = pport_stats->rx_drops_mtu_lo;
 	drvs->rx_drops_too_many_frags =
-		make_64bit_val(pport_stats->rx_drops_too_many_frags_hi,
-				pport_stats->rx_drops_too_many_frags_lo);
+				pport_stats->rx_drops_too_many_frags_lo;
 }
 
 void be_parse_stats(struct be_adapter *adapter)
 {
+	struct be_erx_stats_v1 *erx = be_erx_stats_from_cmd(adapter);
+	struct be_rx_obj *rxo;
+	int i;
+
 	if (adapter->generation == BE_GEN3) {
 		if (lancer_chip(adapter))
 			populate_lancer_stats(adapter);
@@ -416,6 +389,11 @@ void be_parse_stats(struct be_adapter *adapter)
 	} else {
 		populate_be2_stats(adapter);
 	}
+
+	/* as erx_v1 is longer than v0, ok to use v1 defn for v0 access */
+	for_all_rx_queues(adapter, rxo, i)
+		rx_stats(rxo)->rx_drops_no_frags =
+			erx->rx_drops_no_fragments[rxo->q.id];
 }
 
 void netdev_stats_update(struct be_adapter *adapter)
@@ -431,19 +409,7 @@ void netdev_stats_update(struct be_adapter *adapter)
 		pkts += rx_stats(rxo)->rx_pkts;
 		bytes += rx_stats(rxo)->rx_bytes;
 		mcast += rx_stats(rxo)->rx_mcast_pkts;
-		drops += rx_stats(rxo)->rx_dropped;
-		/*  no space in linux buffers: best possible approximation */
-		if (adapter->generation == BE_GEN3) {
-			if (!(lancer_chip(adapter))) {
-				struct be_erx_stats_v1 *erx =
-					be_erx_stats_from_cmd(adapter);
-				drops += erx->rx_drops_no_fragments[rxo->q.id];
-			}
-		} else {
-			struct be_erx_stats_v0 *erx =
-					be_erx_stats_from_cmd(adapter);
-			drops += erx->rx_drops_no_fragments[rxo->q.id];
-		}
+		drops += rx_stats(rxo)->rx_drops_no_skbs;
 	}
 	dev_stats->rx_packets = pkts;
 	dev_stats->rx_bytes = bytes;
@@ -452,8 +418,8 @@ void netdev_stats_update(struct be_adapter *adapter)
 
 	pkts = bytes = 0;
 	for_all_tx_queues(adapter, txo, i) {
-		pkts += tx_stats(txo)->be_tx_pkts;
-		bytes += tx_stats(txo)->be_tx_bytes;
+		pkts += tx_stats(txo)->tx_pkts;
+		bytes += tx_stats(txo)->tx_bytes;
 	}
 	dev_stats->tx_packets = pkts;
 	dev_stats->tx_bytes = bytes;
@@ -508,89 +474,17 @@ void be_link_status_update(struct be_adapter *adapter, bool link_up)
 	}
 }
 
-/* Update the EQ delay n BE based on the RX frags consumed / sec */
-static void be_rx_eqd_update(struct be_adapter *adapter, struct be_rx_obj *rxo)
-{
-	struct be_eq_obj *rx_eq = &rxo->rx_eq;
-	struct be_rx_stats *stats = &rxo->stats;
-	ulong now = jiffies;
-	u32 eqd;
-
-	if (!rx_eq->enable_aic)
-		return;
-
-	/* Wrapped around */
-	if (time_before(now, stats->rx_fps_jiffies)) {
-		stats->rx_fps_jiffies = now;
-		return;
-	}
-
-	/* Update once a second */
-	if ((now - stats->rx_fps_jiffies) < HZ)
-		return;
-
-	stats->rx_fps = (stats->rx_frags - stats->prev_rx_frags) /
-			((now - stats->rx_fps_jiffies) / HZ);
-
-	stats->rx_fps_jiffies = now;
-	stats->prev_rx_frags = stats->rx_frags;
-	eqd = stats->rx_fps / 110000;
-	eqd = eqd << 3;
-	if (eqd > rx_eq->max_eqd)
-		eqd = rx_eq->max_eqd;
-	if (eqd < rx_eq->min_eqd)
-		eqd = rx_eq->min_eqd;
-	if (eqd < 10)
-		eqd = 0;
-	if (eqd != rx_eq->cur_eqd)
-		be_cmd_modify_eqd(adapter, rx_eq->q.id, eqd);
-
-	rx_eq->cur_eqd = eqd;
-}
-
-static u32 be_calc_rate(u64 bytes, unsigned long ticks)
-{
-	u64 rate = bytes;
-
-	do_div(rate, ticks / HZ);
-	rate <<= 3;			/* bytes/sec -> bits/sec */
-	do_div(rate, 1000000ul);	/* MB/Sec */
-
-	return rate;
-}
-
-static void be_tx_rate_update(struct be_tx_obj *txo)
-{
-	struct be_tx_stats *stats = tx_stats(txo);
-	ulong now = jiffies;
-
-	/* Wrapped around? */
-	if (time_before(now, stats->be_tx_jiffies)) {
-		stats->be_tx_jiffies = now;
-		return;
-	}
-
-	/* Update tx rate once in two seconds */
-	if ((now - stats->be_tx_jiffies) > 2 * HZ) {
-		stats->be_tx_rate = be_calc_rate(stats->be_tx_bytes
-						  - stats->be_tx_bytes_prev,
-						 now - stats->be_tx_jiffies);
-		stats->be_tx_jiffies = now;
-		stats->be_tx_bytes_prev = stats->be_tx_bytes;
-	}
-}
-
 static void be_tx_stats_update(struct be_tx_obj *txo,
 			u32 wrb_cnt, u32 copied, u32 gso_segs, bool stopped)
 {
 	struct be_tx_stats *stats = tx_stats(txo);
 
-	stats->be_tx_reqs++;
-	stats->be_tx_wrbs += wrb_cnt;
-	stats->be_tx_bytes += copied;
-	stats->be_tx_pkts += (gso_segs ? gso_segs : 1);
+	stats->tx_reqs++;
+	stats->tx_wrbs += wrb_cnt;
+	stats->tx_bytes += copied;
+	stats->tx_pkts += (gso_segs ? gso_segs : 1);
 	if (stopped)
-		stats->be_tx_stops++;
+		stats->tx_stops++;
 }
 
 /* Determine number of WRB entries needed to xmit data in an skb */
@@ -1005,10 +899,16 @@ static int be_set_vf_tx_rate(struct net_device *netdev,
 	return status;
 }
 
-static void be_rx_rate_update(struct be_rx_obj *rxo)
+static void be_rx_eqd_update(struct be_adapter *adapter, struct be_rx_obj *rxo)
 {
-	struct be_rx_stats *stats = &rxo->stats;
+	struct be_eq_obj *rx_eq = &rxo->rx_eq;
+	struct be_rx_stats *stats = rx_stats(rxo);
 	ulong now = jiffies;
+	ulong delta = now - stats->rx_jiffies;
+	u32 eqd;
+
+	if (!rx_eq->enable_aic)
+		return;
 
 	/* Wrapped around */
 	if (time_before(now, stats->rx_jiffies)) {
@@ -1016,29 +916,39 @@ static void be_rx_rate_update(struct be_rx_obj *rxo)
 		return;
 	}
 
-	/* Update the rate once in two seconds */
-	if ((now - stats->rx_jiffies) < 2 * HZ)
+	/* Update once a second */
+	if (delta < HZ)
 		return;
 
-	stats->rx_rate = be_calc_rate(stats->rx_bytes - stats->rx_bytes_prev,
-				now - stats->rx_jiffies);
+	stats->rx_pps = (stats->rx_pkts - stats->rx_pkts_prev) / (delta / HZ);
+	stats->rx_pkts_prev = stats->rx_pkts;
 	stats->rx_jiffies = now;
-	stats->rx_bytes_prev = stats->rx_bytes;
+	eqd = stats->rx_pps / 110000;
+	eqd = eqd << 3;
+	if (eqd > rx_eq->max_eqd)
+		eqd = rx_eq->max_eqd;
+	if (eqd < rx_eq->min_eqd)
+		eqd = rx_eq->min_eqd;
+	if (eqd < 10)
+		eqd = 0;
+	if (eqd != rx_eq->cur_eqd) {
+		be_cmd_modify_eqd(adapter, rx_eq->q.id, eqd);
+		rx_eq->cur_eqd = eqd;
+	}
 }
 
 static void be_rx_stats_update(struct be_rx_obj *rxo,
 		struct be_rx_compl_info *rxcp)
 {
-	struct be_rx_stats *stats = &rxo->stats;
+	struct be_rx_stats *stats = rx_stats(rxo);
 
 	stats->rx_compl++;
-	stats->rx_frags += rxcp->num_rcvd;
 	stats->rx_bytes += rxcp->pkt_size;
 	stats->rx_pkts++;
 	if (rxcp->pkt_type == BE_MULTICAST_PACKET)
 		stats->rx_mcast_pkts++;
 	if (rxcp->err)
-		stats->rxcp_err++;
+		stats->rx_compl_err++;
 }
 
 static inline bool csum_passed(struct be_rx_compl_info *rxcp)
@@ -1174,7 +1084,7 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 
 	skb = netdev_alloc_skb_ip_align(netdev, BE_HDR_LEN);
 	if (unlikely(!skb)) {
-		rxo->stats.rx_dropped++;
+		rx_stats(rxo)->rx_drops_no_skbs++;
 		be_rx_compl_discard(adapter, rxo, rxcp);
 		return;
 	}
@@ -1389,7 +1299,7 @@ static void be_post_rx_frags(struct be_rx_obj *rxo, gfp_t gfp)
 		if (!pagep) {
 			pagep = be_alloc_pages(adapter->big_page_size, gfp);
 			if (unlikely(!pagep)) {
-				rxo->stats.rx_post_fail++;
+				rx_stats(rxo)->rx_post_fail++;
 				break;
 			}
 			page_dmaaddr = dma_map_page(&adapter->pdev->dev, pagep,
@@ -1899,7 +1809,7 @@ static int be_poll_rx(struct napi_struct *napi, int budget)
 	struct be_rx_compl_info *rxcp;
 	u32 work_done;
 
-	rxo->stats.rx_polls++;
+	rx_stats(rxo)->rx_polls++;
 	for (work_done = 0; work_done < budget; work_done++) {
 		rxcp = be_rx_compl_get(rxo);
 		if (!rxcp)
@@ -1968,8 +1878,8 @@ static int be_poll_tx_mcc(struct napi_struct *napi, int budget)
 				netif_wake_subqueue(adapter->netdev, i);
 			}
 
-			adapter->drv_stats.be_tx_events++;
-			txo->stats.be_tx_compl += tx_compl;
+			adapter->drv_stats.tx_events++;
+			tx_stats(txo)->tx_compl += tx_compl;
 		}
 	}
 
@@ -2031,7 +1941,6 @@ static void be_worker(struct work_struct *work)
 	struct be_adapter *adapter =
 		container_of(work, struct be_adapter, work.work);
 	struct be_rx_obj *rxo;
-	struct be_tx_obj *txo;
 	int i;
 
 	if (!adapter->ue_detected && !lancer_chip(adapter))
@@ -2060,11 +1969,7 @@ static void be_worker(struct work_struct *work)
 			be_cmd_get_stats(adapter, &adapter->stats_cmd);
 	}
 
-	for_all_tx_queues(adapter, txo, i)
-		be_tx_rate_update(txo);
-
 	for_all_rx_queues(adapter, rxo, i) {
-		be_rx_rate_update(rxo);
 		be_rx_eqd_update(adapter, rxo);
 
 		if (rxo->rx_post_starved) {
-- 
1.7.4

