Netdev List
* Re: ipv6 hitting route max_size
From: Simon Kirby @ 2011-06-09  4:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, yoshfuji
In-Reply-To: <20110607.005645.1883989402770213985.davem@davemloft.net>

On Tue, Jun 07, 2011 at 12:56:45AM -0700, David Miller wrote:

> From: Simon Kirby <sim@hostway.ca>
> Date: Mon, 6 Jun 2011 16:15:21 -0700
> 
> > Ok, makes sense, but the result is now that ipv4 loads a full Internet
> > table with no adjustments, while ipv6 does not. Would it make sense to
> > change 4096 to 1048576, or would it be better to count only clones of
> > the actual route or something along those lines?
> 
> Simon can you give this patch a try?

Didn't apply to 2.6.39, so I tried 3.0-rc2, but I get an Oops when
running the example reproduction case I gave before (

for ((i = 0;i < 4200;i++)); do ip route add unreachable 2000::$i; done

) both with and without your patch applied:

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
PGD 3ed7c8067 PUD 3ed5a1067 PMD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack tg3 e100 libphy

Pid: 8932, comm: ip Not tainted 3.0.0-rc2-amd64-net #1 To Be Filled By O.E.M. To Be Filled By O.E.M./TYAN High-End Dual AMD Opteron, S2882
RIP: 0010:[<ffffffff8143e2b7>]  [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
RSP: 0018:ffff8803e59939f8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8803e5993a58 RCX: 0000000000000038
RDX: 00000000000000a0 RSI: 0000000000000008 RDI: 00000000000000a0
RBP: ffffffff817b3300 R08: ffffffff816c8980 R09: 0000000000000000
R10: 0000000000000001 R11: dead000000200200 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00000000fffffff4
FS:  00007f5f11908700(0000) GS:ffff8803ffc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 00000003edfdd000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 8932, threadinfo ffff8803e5992000, task ffff8803ed70dfa0)
Stack:
 0000000000000000 0000000000000000 ffff8803eecd2a80 0000000000000000
 0000000000000000 ffff8803eedabe00 ffff8803ee671600 ffffffff813b8a30
 ffff8803fe00ac00 ffff8803e5993b50 0000000000000000 ffffffff8143e89c
Call Trace:
 [<ffffffff813b8a30>] ? rtnetlink_rcv+0x30/0x30
 [<ffffffff8143e89c>] ? inet6_rtm_newroute+0x1c/0x30
 [<ffffffff813cb3b9>] ? netlink_rcv_skb+0x89/0xb0
 [<ffffffff813b8a1f>] ? rtnetlink_rcv+0x1f/0x30
 [<ffffffff813cb013>] ? netlink_unicast+0x283/0x2d0
 [<ffffffff813cb930>] ? netlink_sendmsg+0x230/0x390
 [<ffffffff8139639b>] ? sock_sendmsg+0xab/0xe0
 [<ffffffff810925eb>] ? __alloc_pages_nodemask+0x10b/0x700
 [<ffffffff810a3fc2>] ? __do_fault+0x3e2/0x4c0
 [<ffffffff81395b9e>] ? move_addr_to_kernel+0x2e/0x40
 [<ffffffff813a1fd9>] ? verify_iovec+0x69/0xd0
 [<ffffffff813972e2>] ? __sys_sendmsg+0x172/0x300
 [<ffffffff81027465>] ? do_page_fault+0x1a5/0x430
 [<ffffffff813cb6be>] ? netlink_autobind+0x8e/0xd0
 [<ffffffff81395bfc>] ? move_addr_to_user+0x4c/0x60
 [<ffffffff81396f55>] ? sys_getsockname+0xd5/0xe0
 [<ffffffff81397634>] ? sys_sendmsg+0x44/0x80
 [<ffffffff814a35bb>] ? system_call_fastpath+0x16/0x1b
Code: 31 c9 31 d2 45 31 c0 31 f6 41 bf f4 ff ff ff e8 b0 2d f7 ff 48 8d 90 a0 00 00 00 49 89 c4 b9 38 00 00 00 31 c0 4d 85 e4 48 89 d7 <f3> ab 0f 84 06 03 00 00 66 41 c7 44 24 6a ff ff 31 c0 f6 43 16
RIP  [<ffffffff8143e2b7>] ip6_route_add+0xe7/0x6b0
 RSP <ffff8803e59939f8>
CR2: 00000000000000a0
---[ end trace 370907621d87fefc ]---

I don't see many changes to ip6_route_add other than c3968a857a6b6c3.
Checking shortly once I get a git tree on this box, but no ipmi and I'm
remote at the moment.

Btw, maybe rt6_alloc_clone or rt6_alloc_cow needs to clear the DST_NOCOUNT
flag from rt->dst.flags for it to count any of them? Didn't verify.

Simon-

^ permalink raw reply

* Re: [PATCH net-next-2.6] inetpeer: remove unused list
From: Eric Dumazet @ 2011-06-09  3:47 UTC (permalink / raw)
  To: David Miller; +Cc: tim.c.chen, andi, netdev
In-Reply-To: <20110608.170557.2273990802892743973.davem@davemloft.net>

On Wednesday, 08 June 2011 at 17:05 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 09 Jun 2011 01:35:34 +0200
> 
> > Andi Kleen and Tim Chen reported huge contention on inetpeer
> > unused_peers.lock, on memcached workload on a 40 core machine, with
> > disabled route cache.
> > 
> > It appears we constantly flip peers refcnt between 0 and 1 values, and
> > we must insert/remove peers from unused_peers.list, holding a contended
> > spinlock.
> > 
> > Remove this list completely and perform a garbage collection on-the-fly,
> > at lookup time, using the expired nodes we met during the tree
> > traversal.
> > 
> > This removes a lot of code, makes locking more standard, and obsoletes
> > two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
> > removes two pointers in inet_peer structure.
> > 
> > There is still a false sharing effect because refcnt is in the first cache
> > line of the object [where the links and keys used by lookups are located]; we
> > might move it to the end of the inet_peer structure to let this first cache
> > line be mostly read by cpus.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Didn't expect you to implement this so fast :-)
> 
> Applied, thanks!

Thanks David

I would be glad to hear about numbers from Andi and Tim, I am very
jealous of their 80 Threads machine ;)
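
The on-the-fly garbage collection described in the quoted patch text can be
sketched roughly as below. This is a simplified userspace model, not the
inetpeer code: it assumes a singly linked list instead of the real tree, has
no locking or RCU, and the names peer_add() and peer_lookup_gc() are
hypothetical. The lookup frees expired, unreferenced entries it passes on the
way to its target, so no separate unused list (and no lock for it) is needed.

```c
#include <stdlib.h>

struct peer {
	struct peer *next;
	unsigned int key;
	int refcnt;			/* 0 means nobody holds this entry */
	unsigned long last_used;	/* time of the last successful lookup */
};

/* Hypothetical helper: push a new, unreferenced entry at the list head. */
static struct peer *peer_add(struct peer **head, unsigned int key,
			     unsigned long now)
{
	struct peer *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->key = key;
	p->refcnt = 0;
	p->last_used = now;
	p->next = *head;
	*head = p;
	return p;
}

/*
 * Look up `key`; while walking, unlink and free any entry that is both
 * unreferenced and older than `ttl`. Entries beyond the match are not
 * visited, so they are left for a later lookup to collect.
 */
static struct peer *peer_lookup_gc(struct peer **head, unsigned int key,
				   unsigned long now, unsigned long ttl)
{
	struct peer **link = head;
	struct peer *p;

	while ((p = *link) != NULL) {
		if (p->key == key) {
			p->refcnt++;
			p->last_used = now;
			return p;
		}
		if (p->refcnt == 0 && now - p->last_used > ttl) {
			*link = p->next;	/* expired: collect it now */
			free(p);
			continue;
		}
		link = &p->next;
	}
	return NULL;
}
```

The cost of collection is paid by lookups that already touch those entries,
which is the trade the patch description makes in place of a dedicated list.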




^ permalink raw reply

* Re: [PATCH] RFC2988bis + taking RTT sample from 3WHS for the passive open side
From: Jerry Chu @ 2011-06-09  3:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Hagen Paul Pfeifer, tsunanet,
	netdev@vger.kernel.org
In-Reply-To: <1307576361.3980.26.camel@edumazet-laptop>

On Wed, Jun 8, 2011 at 4:39 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday, 08 June 2011 at 15:26 -0700, Jerry Chu wrote:
>> Eric,
>>
>> It just occurred to me now that initRTO is being reduced, both TCP_SYN_RETRIES
>> and TCP_SYNACK_RETRIES should be bumped up a bit, to e.g., 7 (?) to meet the
>> 3 minutes R2 requirement per RFC1122.
>>
>> If you agree, I will submit another patch.
>>
>
> Good catch, but has no RFC lowered this 3-minute requirement yet?

Not that I know of. BTW, I'm not nearly as concerned about RFC compliance
as about what's the "right thing" to do, e.g., if apps have some expectation of
how long a connect() call will persist before ETIMEDOUT...

With exponential backoff, an initRTO of 3 secs and 5 retries will last 189 secs
before ETIMEDOUT. Setting initRTO to 1 sec reduces this time to 63 secs.
I guess that's still more than one minute, so it's not too bad. I don't think
it's a good idea to raise retries to 7, or 247 secs, which seems too long (more
than 4 minutes) just to be RFC compliant. Raising retries to 6, allowing
127 secs, might be more reasonable though.
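
The figures above can be reproduced with a small model. This is only a
sketch under stated assumptions: every attempt (the initial SYN plus each
retry) waits one full timeout, the timeout doubles after each expiry, and a
single timeout is capped at 120 secs (the kernel's TCP_RTO_MAX). The function
name syn_timeout_secs() is made up for illustration.

```c
/* Assumed cap on a single retransmission timeout, in seconds. */
#define RTO_MAX_SECS 120

/*
 * Total seconds before connect() gives up: the initial SYN and each of
 * the `retries` retransmissions wait one timeout; the timeout doubles
 * after every expiry until it hits the cap.
 */
static unsigned int syn_timeout_secs(unsigned int init_rto,
				     unsigned int retries)
{
	unsigned int rto = init_rto;
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i <= retries; i++) {
		total += rto;
		rto *= 2;
		if (rto > RTO_MAX_SECS)
			rto = RTO_MAX_SECS;
	}
	return total;
}
```

With initRTO 3 and 5 retries the waits are 3+6+12+24+48+96 = 189. With
initRTO 1 the totals are 63 (5 retries), 127 (6 retries) and 247 (7 retries);
the last wait in the 7-retry case is the capped 120 rather than 128, which is
why the total is 247 and not 255.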

Thoughts?

Jerry

^ permalink raw reply

* Re: [PATCH 4/9] ifb: convert to 64 bit stats
From: Eric Dumazet @ 2011-06-09  3:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110608203235.2a2be3e4@nehalam.ftrdhcpuser.net>

On Wednesday, 08 June 2011 at 20:32 -0700, Stephen Hemminger wrote:

> IFB is running in single thread context. Therefore just locking would
> be enough

Yep, that is provided by include/linux/u64_stats_sync.h for free.

As a bonus, no locking on 64bit arches and a clean interface.
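
For readers unfamiliar with why a 64-bit counter needs help on 32-bit
machines, here is a userspace sketch of the general sequence-counter idea.
It is not the u64_stats_sync API; it uses C11 atomics, assumes a single
writer, and all names (counter64, counter64_add, counter64_read) are
hypothetical. The counter is stored as two 32-bit halves, the way a 32-bit
CPU would write it, and the reader retries if it might have seen halves from
two different updates.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A 64-bit counter kept as two 32-bit halves plus a sequence number. */
struct counter64 {
	atomic_uint seq;	/* odd while the writer is mid-update */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

static void counter64_init(struct counter64 *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->lo, 0);
	atomic_init(&c->hi, 0);
}

/* Writer side; assumes one writer at a time (e.g. a single context). */
static void counter64_add(struct counter64 *c, uint32_t n)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);
	uint64_t v = ((uint64_t)atomic_load_explicit(&c->hi,
						     memory_order_relaxed) << 32) |
		     atomic_load_explicit(&c->lo, memory_order_relaxed);

	v += n;
	/* Mark the update as in progress before touching the halves. */
	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&c->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	/* Publish: an even sequence number means the halves are consistent. */
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until an unchanged, even sequence brackets the read. */
static uint64_t counter64_read(struct counter64 *c)
{
	unsigned int start, end;
	uint32_t lo, hi;

	do {
		start = atomic_load_explicit(&c->seq, memory_order_acquire);
		lo = atomic_load_explicit(&c->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&c->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&c->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);

	return ((uint64_t)hi << 32) | lo;
}
```

On a 64-bit machine the two halves collapse into one naturally atomic load or
store, which is why the sequence counter can be dropped there.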




^ permalink raw reply

* Re: [PATCH 4/9] ifb: convert to 64 bit stats
From: Stephen Hemminger @ 2011-06-09  3:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev
In-Reply-To: <1307590146.3980.28.camel@edumazet-laptop>

On Thu, 09 Jun 2011 05:29:06 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wednesday, 08 June 2011 at 17:54 -0700, Stephen Hemminger wrote:
> > plain text document attachment (ifb-stats64.patch)
> > Convert input functional block device to use 64 bit stats.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > 
> > --- a/drivers/net/ifb.c	2011-06-07 16:58:31.317079332 -0700
> > +++ b/drivers/net/ifb.c	2011-06-07 17:29:02.958161955 -0700
> > @@ -42,7 +42,14 @@ struct ifb_private {
> >  	struct tasklet_struct   ifb_tasklet;
> >  	int     tasklet_pending;
> >  	struct sk_buff_head     rq;
> > +	u64 rx_packets;
> > +	u64 rx_bytes;
> > +	u64 rx_dropped;
> > +
> >  	struct sk_buff_head     tq;
> > +	u64 tx_packets;
> > +	u64 tx_bytes;
> > +	u64 tx_dropped;
> >  };
> >  
> 
> Hi Stephen
> 
> This needs special synchronization on 32bit arches, as Ben pointed out
> for other drivers.

IFB is running in single thread context. Therefore just locking would
be enough

^ permalink raw reply

* Re: [PATCH 4/9] ifb: convert to 64 bit stats
From: Eric Dumazet @ 2011-06-09  3:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110609005417.507470402@vyatta.com>

On Wednesday, 08 June 2011 at 17:54 -0700, Stephen Hemminger wrote:
> plain text document attachment (ifb-stats64.patch)
> Convert input functional block device to use 64 bit stats.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/drivers/net/ifb.c	2011-06-07 16:58:31.317079332 -0700
> +++ b/drivers/net/ifb.c	2011-06-07 17:29:02.958161955 -0700
> @@ -42,7 +42,14 @@ struct ifb_private {
>  	struct tasklet_struct   ifb_tasklet;
>  	int     tasklet_pending;
>  	struct sk_buff_head     rq;
> +	u64 rx_packets;
> +	u64 rx_bytes;
> +	u64 rx_dropped;
> +
>  	struct sk_buff_head     tq;
> +	u64 tx_packets;
> +	u64 tx_bytes;
> +	u64 tx_dropped;
>  };
>  

Hi Stephen

This needs special synchronization on 32bit arches, as Ben pointed out
for other drivers.




^ permalink raw reply

* Re: [PATCH 3/9] veth: convert to 64 bit statistics
From: Stephen Hemminger @ 2011-06-09  3:22 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David S. Miller, netdev
In-Reply-To: <1307584841.22348.528.camel@localhost>

On Thu, 09 Jun 2011 03:00:41 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Wed, 2011-06-08 at 17:53 -0700, Stephen Hemminger wrote:
> > Not much change, device was already keeping per cpu statistics.
> > Use recent 64 statistics interface.
> [...]
> 
> It's also going to need to use u64_stats_sync functions.
> 
> Ben.
> 

No, veth is doing per-cpu updates, therefore the new code has the same guarantee
as the old code.

^ permalink raw reply

* Re: [PATCH net-next 0/2] bonding: delete two unused variables
From: Américo Wang @ 2011-06-09  3:19 UTC (permalink / raw)
  To: Weiping Pan; +Cc: fubar, andy, netdev, linux-kernel
In-Reply-To: <cover.1307587586.git.panweiping3@gmail.com>

On Thu, Jun 9, 2011 at 10:51 AM, Weiping Pan <panweiping3@gmail.com> wrote:
> Delete two unused variables in bonding.
>
> Weiping Pan (2):
>  bonding: delete unused ad_timer
>  bonding: delete unused arp_mon_pt
>

Both look good to me,

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Thanks.

^ permalink raw reply

* [PATCH net-next 2/2] bonding: delete unused arp_mon_pt
From: Weiping Pan @ 2011-06-09  2:51 UTC (permalink / raw)
  To: fubar, andy; +Cc: netdev, linux-kernel, Weiping Pan
In-Reply-To: <cover.1307587586.git.panweiping3@gmail.com>

Now all received packets are handled by bond_handle_frame,
arp_mon_pt isn't used any more.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bonding.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ea1d005..382903f 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -240,7 +240,6 @@ struct bonding {
 	struct   bond_params params;
 	struct   list_head vlan_list;
 	struct   vlan_group *vlgrp;
-	struct   packet_type arp_mon_pt;
 	struct   workqueue_struct *wq;
 	struct   delayed_work mii_work;
 	struct   delayed_work arp_work;
-- 
1.7.4.4


^ permalink raw reply related

* [PATCH net-next 1/2] bonding: delete unused ad_timer
From: Weiping Pan @ 2011-06-09  2:51 UTC (permalink / raw)
  To: fubar, andy; +Cc: netdev, linux-kernel, Weiping Pan
In-Reply-To: <cover.1307587586.git.panweiping3@gmail.com>

Now we use agg_select_timer and ad_work.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
---
 drivers/net/bonding/bond_3ad.h |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 0ee3f16..fb67185 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -257,7 +257,6 @@ struct ad_bond_info {
 	int lacp_fast;		/* whether fast periodic tx should be
 				 * requested
 				 */
-	struct timer_list ad_timer;
 };
 
 struct ad_slave_info {
-- 
1.7.4.4

^ permalink raw reply related

* [PATCH net-next 0/2] bonding: delete two unused variables
From: Weiping Pan @ 2011-06-09  2:51 UTC (permalink / raw)
  To: fubar, andy; +Cc: netdev, linux-kernel, Weiping Pan

Delete two unused variables in bonding.

Weiping Pan (2):
  bonding: delete unused ad_timer
  bonding: delete unused arp_mon_pt

 drivers/net/bonding/bond_3ad.h |    1 -
 drivers/net/bonding/bonding.h  |    1 -
 2 files changed, 0 insertions(+), 2 deletions(-)

-- 
1.7.4.4


^ permalink raw reply

* [Patch] netpoll: prevent netpoll setup on slave devices
From: Amerigo Wang @ 2011-06-09  2:42 UTC (permalink / raw)
  To: netdev; +Cc: WANG Cong, Neil Horman

In commit 8d8fc29d02a33e4bd5f4fa47823c1fd386346093
(netpoll: disable netpoll when enslave a device), we automatically
disable netpoll when the underlying device is being enslaved.
We also need to prevent people from setting up netpoll on
devices that are already enslaved.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>

---
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2d7d6d4..42ea4b0 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -792,6 +792,12 @@ int netpoll_setup(struct netpoll *np)
 		return -ENODEV;
 	}
 
+	if (ndev->master) {
+		printk(KERN_ERR "%s: %s is a slave device, aborting.\n",
+		       np->name, np->dev_name);
+		return -EBUSY;
+	}
+
 	if (!netif_running(ndev)) {
 		unsigned long atmost, atleast;
 

^ permalink raw reply related

* Re: [PATCH 3/9] veth: convert to 64 bit statistics
From: Ben Hutchings @ 2011-06-09  2:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110609005417.449670103@vyatta.com>

On Wed, 2011-06-08 at 17:53 -0700, Stephen Hemminger wrote:
> Not much change, device was already keeping per cpu statistics.
> Use recent 64 statistics interface.
[...]

It's also going to need to use u64_stats_sync functions.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 2/9] xen: convert to 64 bit stats interface
From: Ben Hutchings @ 2011-06-09  1:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, Jeremy Fitzhardinge, netdev, xen-devel
In-Reply-To: <20110609005417.385577995@vyatta.com>

On Wed, 2011-06-08 at 17:53 -0700, Stephen Hemminger wrote:
> Convert xen driver to 64 bit statistics interface.
> This driver was already counting packets per queue in a 64 bit value so not
> a huge change.
[...]

I think this driver will need to use u64_stats_sync.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 9/9] s2io: implement 64 bit stats
From: Ben Hutchings @ 2011-06-09  1:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, Jon Mason, netdev
In-Reply-To: <20110609005417.817071451@vyatta.com>

On Wed, 2011-06-08 at 17:54 -0700, Stephen Hemminger wrote:
> plain text document attachment (s2io-stats.patch)
> Convert s2io driver to 64 bit statistics interface.
> This driver keeps its private set of counters so change those
> to the 64 bit netlink type.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/drivers/net/s2io.c	2011-06-07 17:35:37.372117743 -0700
> +++ b/drivers/net/s2io.c	2011-06-07 17:44:14.102680070 -0700
> @@ -4897,7 +4897,8 @@ static void s2io_updt_stats(struct s2io_
>   *  Return value:
>   *  pointer to the updated net_device_stats structure.
>   */
> -static struct net_device_stats *s2io_get_stats(struct net_device *dev)
> +static struct rtnl_link_stats64 *s2io_get_stats(struct net_device *dev,
> +						struct rtnl_link_stats64 *net_stats)
>  {
>  	struct s2io_nic *sp = netdev_priv(dev);
>  	struct mac_info *mac_control = &sp->mac_control;
> @@ -4916,40 +4917,32 @@ static struct net_device_stats *s2io_get
>  	 */
>  	delta = ((u64) le32_to_cpu(stats->rmac_vld_frms_oflow) << 32 |
>  		le32_to_cpu(stats->rmac_vld_frms)) - sp->stats.rx_packets;
> -	sp->stats.rx_packets += delta;
> -	dev->stats.rx_packets += delta;
> +	sp->stats.tx_packets += delta;

This is now adding to tx_packets, not rx_packets.

>  	delta = ((u64) le32_to_cpu(stats->tmac_frms_oflow) << 32 |
>  		le32_to_cpu(stats->tmac_frms)) - sp->stats.tx_packets;
>  	sp->stats.tx_packets += delta;
> -	dev->stats.tx_packets += delta;
>  
>  	delta = ((u64) le32_to_cpu(stats->rmac_data_octets_oflow) << 32 |
>  		le32_to_cpu(stats->rmac_data_octets)) - sp->stats.rx_bytes;
>  	sp->stats.rx_bytes += delta;
> -	dev->stats.rx_bytes += delta;
[...]

It seems to me that the delta calculations and sp->stats are no longer
necessary.  These can be just:

	net_stats->rx_packets = ((u64) le32_to_cpu(stats->rmac_vld_frms_oflow) << 32 |
				 le32_to_cpu(stats->rmac_vld_frms));
 	net_stats->tx_packets = ((u64) le32_to_cpu(stats->tmac_frms_oflow) << 32 |
				 le32_to_cpu(stats->tmac_frms));
	net_stats->rx_bytes = ((u64) le32_to_cpu(stats->rmac_data_octets_oflow) << 32 |
		 	       le32_to_cpu(stats->rmac_data_octets));
	...
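
The expression used in each of these assignments can be checked in isolation
with a userspace sketch. It assumes the two hardware words have already been
converted to CPU byte order (the le32_to_cpu() step is left out), and
combine_counter() is a hypothetical name, not something in the driver.

```c
#include <stdint.h>

/*
 * Join a 32-bit overflow word and a 32-bit low word into one 64-bit count.
 * The cast must happen before the shift: shifting a 32-bit value by 32 is
 * undefined, so the high word is widened to 64 bits first.
 */
static uint64_t combine_counter(uint32_t oflow, uint32_t low)
{
	return ((uint64_t)oflow << 32) | low;
}
```

Because the hardware already keeps the full 64-bit total across the two
words, the result can be stored directly, with no running delta.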

[...]
> @@ -7447,7 +7437,7 @@ static int rx_osm_handler(struct ring_in
>  		if (err_mask != 0x5) {
>  			DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%x\n",
>  				  dev->name, err_mask);
> -			dev->stats.rx_crc_errors++;
> +			sp->stats.rx_crc_errors++;
[...]

This won't be atomic on 32-bit systems.  If this isn't double-counting
CRC errors (I suspect it is) then there needs to be a separate unsigned
long counter that's folded in by s2io_get_stats().

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [RFC] breakage in sysfs_readdir() and s_instances abuse in sysfs
From: Al Viro @ 2011-06-09  1:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-fsdevel, Linus Torvalds, netdev, Linux Containers
In-Reply-To: <20110607225902.GS11521@ZenIV.linux.org.uk>

On Tue, Jun 07, 2011 at 11:59:02PM +0100, Al Viro wrote:

> Completely untested patch follows:

FWIW, modulo a couple of brainos it seems to work here without blowing
everything up.  Actual freeing of struct net is delayed until after
umount of sysfs instances referring to it, shutdown is *not* delayed, sysfs
entries are removed nicely as they ought to be, no objects from other net_ns
are picked by readdir or lookup...
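
The split between "shut down" and "free" can be modelled in userspace as two
independent reference counts. This is a sketch of the idea only, with
hypothetical names (struct ns, ns_put, ns_mount_get, ns_put_passive) and a
flag standing in for the real free; it is not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ns {
	atomic_int count;	/* active users; reaching 0 triggers shutdown */
	atomic_int passive;	/* holders of the memory; reaching 0 frees it */
	bool shut_down;
	bool freed;		/* stands in for the real free in this sketch */
};

static void ns_init(struct ns *ns)
{
	atomic_init(&ns->count, 1);
	atomic_init(&ns->passive, 1);	/* reference owned by the shutdown path */
	ns->shut_down = false;
	ns->freed = false;
}

/* Drop a passive reference; the last one releases the memory. */
static void ns_put_passive(struct ns *ns)
{
	if (atomic_fetch_sub(&ns->passive, 1) == 1)
		ns->freed = true;	/* real code would free the object here */
}

/* A mounted instance pins the memory without keeping the namespace alive. */
static void ns_mount_get(struct ns *ns)
{
	atomic_fetch_add(&ns->passive, 1);
}

/*
 * Drop an active reference; the last one shuts the namespace down and
 * then gives up the initial passive reference.
 */
static void ns_put(struct ns *ns)
{
	if (atomic_fetch_sub(&ns->count, 1) == 1) {
		ns->shut_down = true;
		ns_put_passive(ns);
	}
}
```

With one mount outstanding, the last ns_put() shuts the namespace down
immediately, and the memory only goes away at the later ns_put_passive() from
the unmount path.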

If somebody has objections - yell.  If I don't hear anything, the following
goes to -next:

commit 72d4e98002b45598bb88af74eeac20874f2789be
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Jun 8 21:13:01 2011 -0400

    Delay struct net freeing while there's a sysfs instance referring to it
    
    	* new refcount in struct net, controlling actual freeing of the memory
    	* new method in kobj_ns_type_operations (->put_ns())
    	* ->current_ns() semantics change - it's supposed to be followed by
    corresponding ->put_ns().  For struct net in case of CONFIG_NET_NS it bumps
    the new refcount; net_put_ns() decrements it and calls net_free() if the
    last reference has been dropped.
    	* old net_free() callers call net_put_ns() instead.
    	* sysfs_exit_ns() is gone, along with a large part of callchain
    leading to it; now that the references stored in ->ns[...] stay valid we
    do not need to hunt them down and replace them with NULL.  That fixes
    problems in sysfs_lookup() and sysfs_readdir(), along with getting rid
    of sb->s_instances abuse.
    
    	Note that struct net *shutdown* logic has not changed - cleanup_net()
    is called exactly when it used to be called.  The only thing postponed by
    having a sysfs instance referring to that struct net is actual freeing of
    memory occupied by struct net.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 2668957..1e7a2ee 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -111,8 +111,11 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 		info->ns[type] = kobj_ns_current(type);
 
 	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
-	if (IS_ERR(sb) || sb->s_fs_info != info)
+	if (IS_ERR(sb) || sb->s_fs_info != info) {
+		for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++)
+			kobj_ns_put(type, info->ns[type]);
 		kfree(info);
+	}
 	if (IS_ERR(sb))
 		return ERR_CAST(sb);
 	if (!sb->s_root) {
@@ -131,11 +134,14 @@ static struct dentry *sysfs_mount(struct file_system_type *fs_type,
 static void sysfs_kill_sb(struct super_block *sb)
 {
 	struct sysfs_super_info *info = sysfs_info(sb);
+	int type;
 
 	/* Remove the superblock from fs_supers/s_instances
 	 * so we can't find it, before freeing sysfs_super_info.
 	 */
 	kill_anon_super(sb);
+	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++)
+		kobj_ns_put(type, info->ns[type]);
 	kfree(info);
 }
 
@@ -145,28 +151,6 @@ static struct file_system_type sysfs_fs_type = {
 	.kill_sb	= sysfs_kill_sb,
 };
 
-void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
-{
-	struct super_block *sb;
-
-	mutex_lock(&sysfs_mutex);
-	spin_lock(&sb_lock);
-	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
-		struct sysfs_super_info *info = sysfs_info(sb);
-		/*
-		 * If we see a superblock on the fs_supers/s_instances
-		 * list the unmount has not completed and sb->s_fs_info
-		 * points to a valid struct sysfs_super_info.
-		 */
-		/* Ignore superblocks with the wrong ns */
-		if (info->ns[type] != ns)
-			continue;
-		info->ns[type] = NULL;
-	}
-	spin_unlock(&sb_lock);
-	mutex_unlock(&sysfs_mutex);
-}
-
 int __init sysfs_init(void)
 {
 	int err = -ENOMEM;
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 3d28af3..2ed2404 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -136,7 +136,7 @@ struct sysfs_addrm_cxt {
  * instance).
  */
 struct sysfs_super_info {
-	const void *ns[KOBJ_NS_TYPES];
+	void *ns[KOBJ_NS_TYPES];
 };
 #define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
 extern struct sysfs_dirent sysfs_root;
diff --git a/include/linux/kobject_ns.h b/include/linux/kobject_ns.h
index 82cb5bf..5fa481c 100644
--- a/include/linux/kobject_ns.h
+++ b/include/linux/kobject_ns.h
@@ -38,9 +38,10 @@ enum kobj_ns_type {
  */
 struct kobj_ns_type_operations {
 	enum kobj_ns_type type;
-	const void *(*current_ns)(void);
+	void *(*current_ns)(void);
 	const void *(*netlink_ns)(struct sock *sk);
 	const void *(*initial_ns)(void);
+	void (*put_ns)(void *);
 };
 
 int kobj_ns_type_register(const struct kobj_ns_type_operations *ops);
@@ -48,9 +49,9 @@ int kobj_ns_type_registered(enum kobj_ns_type type);
 const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent);
 const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj);
 
-const void *kobj_ns_current(enum kobj_ns_type type);
+void *kobj_ns_current(enum kobj_ns_type type);
 const void *kobj_ns_netlink(enum kobj_ns_type type, struct sock *sk);
 const void *kobj_ns_initial(enum kobj_ns_type type);
-void kobj_ns_exit(enum kobj_ns_type type, const void *ns);
+void kobj_ns_put(enum kobj_ns_type type, void *ns);
 
 #endif /* _LINUX_KOBJECT_NS_H */
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c3acda6..e2696d7 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -177,9 +177,6 @@ struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
 struct sysfs_dirent *sysfs_get(struct sysfs_dirent *sd);
 void sysfs_put(struct sysfs_dirent *sd);
 
-/* Called to clear a ns tag when it is no longer valid */
-void sysfs_exit_ns(enum kobj_ns_type type, const void *tag);
-
 int __must_check sysfs_init(void);
 
 #else /* CONFIG_SYSFS */
@@ -338,10 +335,6 @@ static inline void sysfs_put(struct sysfs_dirent *sd)
 {
 }
 
-static inline void sysfs_exit_ns(int type, const void *tag)
-{
-}
-
 static inline int __must_check sysfs_init(void)
 {
 	return 0;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 2bf9ed9..415af64 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -35,8 +35,11 @@ struct netns_ipvs;
 #define NETDEV_HASHENTRIES (1 << NETDEV_HASHBITS)
 
 struct net {
+	atomic_t		passive;	/* To decided when the network
+						 * namespace should be freed.
+						 */
 	atomic_t		count;		/* To decided when the network
-						 *  namespace should be freed.
+						 *  namespace should be shut down.
 						 */
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_t		use_count;	/* To track references we
@@ -154,6 +157,9 @@ int net_eq(const struct net *net1, const struct net *net2)
 {
 	return net1 == net2;
 }
+
+extern void net_put_ns(void *);
+
 #else
 
 static inline struct net *get_net(struct net *net)
@@ -175,6 +181,8 @@ int net_eq(const struct net *net1, const struct net *net2)
 {
 	return 1;
 }
+
+#define net_put_ns NULL
 #endif
 
 
diff --git a/lib/kobject.c b/lib/kobject.c
index 82dc34c..43116b2 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -948,9 +948,9 @@ const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj)
 }
 
 
-const void *kobj_ns_current(enum kobj_ns_type type)
+void *kobj_ns_current(enum kobj_ns_type type)
 {
-	const void *ns = NULL;
+	void *ns = NULL;
 
 	spin_lock(&kobj_ns_type_lock);
 	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
@@ -987,23 +987,15 @@ const void *kobj_ns_initial(enum kobj_ns_type type)
 	return ns;
 }
 
-/*
- * kobj_ns_exit - invalidate a namespace tag
- *
- * @type: the namespace type (i.e. KOBJ_NS_TYPE_NET)
- * @ns: the actual namespace being invalidated
- *
- * This is called when a tag is no longer valid.  For instance,
- * when a network namespace exits, it uses this helper to
- * make sure no sb's sysfs_info points to the now-invalidated
- * netns.
- */
-void kobj_ns_exit(enum kobj_ns_type type, const void *ns)
+void kobj_ns_put(enum kobj_ns_type type, void *ns)
 {
-	sysfs_exit_ns(type, ns);
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type] && kobj_ns_ops_tbl[type]->put_ns)
+		kobj_ns_ops_tbl[type]->put_ns(ns);
+	spin_unlock(&kobj_ns_type_lock);
 }
 
-
 EXPORT_SYMBOL(kobject_get);
 EXPORT_SYMBOL(kobject_put);
 EXPORT_SYMBOL(kobject_del);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 11b98bc..0c799f1 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1179,9 +1179,14 @@ static void remove_queue_kobjects(struct net_device *net)
 #endif
 }
 
-static const void *net_current_ns(void)
+static void *net_current_ns(void)
 {
-	return current->nsproxy->net_ns;
+	struct net *ns = current->nsproxy->net_ns;
+#ifdef CONFIG_NET_NS
+	if (ns)
+		atomic_inc(&ns->passive);
+#endif
+	return ns;
 }
 
 static const void *net_initial_ns(void)
@@ -1199,19 +1204,10 @@ struct kobj_ns_type_operations net_ns_type_operations = {
 	.current_ns = net_current_ns,
 	.netlink_ns = net_netlink_ns,
 	.initial_ns = net_initial_ns,
+	.put_ns = net_put_ns,
 };
 EXPORT_SYMBOL_GPL(net_ns_type_operations);
 
-static void net_kobj_ns_exit(struct net *net)
-{
-	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
-}
-
-static struct pernet_operations kobj_net_ops = {
-	.exit = net_kobj_ns_exit,
-};
-
-
 #ifdef CONFIG_HOTPLUG
 static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
 {
@@ -1339,6 +1335,5 @@ EXPORT_SYMBOL(netdev_class_remove_file);
 int netdev_kobject_init(void)
 {
 	kobj_ns_type_register(&net_ns_type_operations);
-	register_pernet_subsys(&kobj_net_ops);
 	return class_register(&net_class);
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6c6b86d..19feb20 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -128,6 +128,7 @@ static __net_init int setup_net(struct net *net)
 	LIST_HEAD(net_exit_list);
 
 	atomic_set(&net->count, 1);
+	atomic_set(&net->passive, 1);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -210,6 +211,13 @@ static void net_free(struct net *net)
 	kmem_cache_free(net_cachep, net);
 }
 
+void net_put_ns(void *p)
+{
+	struct net *ns = p;
+	if (ns && atomic_dec_and_test(&ns->passive))
+		net_free(ns);
+}
+
 struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 {
 	struct net *net;
@@ -230,7 +238,7 @@ struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 	}
 	mutex_unlock(&net_mutex);
 	if (rv < 0) {
-		net_free(net);
+		net_put_ns(net);
 		return ERR_PTR(rv);
 	}
 	return net;
@@ -286,7 +294,7 @@ static void cleanup_net(struct work_struct *work)
 	/* Finally it is safe to free my network namespace structure */
 	list_for_each_entry_safe(net, tmp, &net_exit_list, exit_list) {
 		list_del_init(&net->exit_list);
-		net_free(net);
+		net_put_ns(net);
 	}
 }
 static DECLARE_WORK(net_cleanup_work, cleanup_net);

^ permalink raw reply related

* [PATCH 6/9] enic: update to support 64 bit stats
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller, Christian Benvenuti, Vasanthy Kolluri,
	Roopa Prabhu, David Wang
  Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: enic-stats64.patch --]
[-- Type: text/plain, Size: 1503 bytes --]

The device driver already uses 64 bit statistics, it just
doesn't use the 64 bit interface.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/enic/enic_main.c	2011-06-07 16:58:31.317079332 -0700
+++ b/drivers/net/enic/enic_main.c	2011-06-07 17:29:09.670195233 -0700
@@ -801,10 +801,10 @@ static netdev_tx_t enic_hard_start_xmit(
 }
 
 /* dev_base_lock rwlock held, nominally process context */
-static struct net_device_stats *enic_get_stats(struct net_device *netdev)
+static struct rtnl_link_stats64 *enic_get_stats(struct net_device *netdev,
+						struct rtnl_link_stats64 *net_stats)
 {
 	struct enic *enic = netdev_priv(netdev);
-	struct net_device_stats *net_stats = &netdev->stats;
 	struct vnic_stats *stats;
 
 	enic_dev_stats_dump(enic, &stats);
@@ -2117,7 +2117,7 @@ static const struct net_device_ops enic_
 	.ndo_open		= enic_open,
 	.ndo_stop		= enic_stop,
 	.ndo_start_xmit		= enic_hard_start_xmit,
-	.ndo_get_stats		= enic_get_stats,
+	.ndo_get_stats64	= enic_get_stats,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_rx_mode	= enic_set_rx_mode,
 	.ndo_set_multicast_list	= enic_set_rx_mode,
@@ -2139,7 +2139,7 @@ static const struct net_device_ops enic_
 	.ndo_open		= enic_open,
 	.ndo_stop		= enic_stop,
 	.ndo_start_xmit		= enic_hard_start_xmit,
-	.ndo_get_stats		= enic_get_stats,
+	.ndo_get_stats64	= enic_get_stats,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= enic_set_mac_address,
 	.ndo_set_rx_mode	= enic_set_rx_mode,



^ permalink raw reply

* [PATCH 1/9] vmxnet3: convert to 64 bit stats interface
From: Stephen Hemminger @ 2011-06-09  0:53 UTC (permalink / raw)
  To: David S. Miller, Shreyas Bhatewara; +Cc: netdev, pv-drivers
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: vmxnet3-stats64.patch --]
[-- Type: text/plain, Size: 4534 bytes --]

Convert vmxnet3 driver to 64 bit statistics interface.
This driver was already counting packets per queue in 64 bit values, so this
is not a huge change. Eliminate the unused old net_device_stats structure.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/vmxnet3/vmxnet3_drv.c	2011-06-04 22:01:31.932389206 +0900
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c	2011-06-07 19:50:23.189420166 +0900
@@ -2864,7 +2864,7 @@ vmxnet3_probe_device(struct pci_dev *pde
 		.ndo_set_mac_address = vmxnet3_set_mac_addr,
 		.ndo_change_mtu = vmxnet3_change_mtu,
 		.ndo_set_features = vmxnet3_set_features,
-		.ndo_get_stats = vmxnet3_get_stats,
+		.ndo_get_stats64 = vmxnet3_get_stats64,
 		.ndo_tx_timeout = vmxnet3_tx_timeout,
 		.ndo_set_multicast_list = vmxnet3_set_mc,
 		.ndo_vlan_rx_register = vmxnet3_vlan_rx_register,
--- a/drivers/net/vmxnet3/vmxnet3_ethtool.c	2011-06-04 21:54:27.110282630 +0900
+++ b/drivers/net/vmxnet3/vmxnet3_ethtool.c	2011-06-07 20:13:34.492319266 +0900
@@ -113,15 +113,15 @@ vmxnet3_global_stats[] = {
 };
 
 
-struct net_device_stats *
-vmxnet3_get_stats(struct net_device *netdev)
+struct rtnl_link_stats64 *
+vmxnet3_get_stats64(struct net_device *netdev,
+		   struct rtnl_link_stats64 *stats)
 {
 	struct vmxnet3_adapter *adapter;
 	struct vmxnet3_tq_driver_stats *drvTxStats;
 	struct vmxnet3_rq_driver_stats *drvRxStats;
 	struct UPT1_TxStats *devTxStats;
 	struct UPT1_RxStats *devRxStats;
-	struct net_device_stats *net_stats = &netdev->stats;
 	unsigned long flags;
 	int i;
 
@@ -132,36 +132,36 @@ vmxnet3_get_stats(struct net_device *net
 	VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_STATS);
 	spin_unlock_irqrestore(&adapter->cmd_lock, flags);
 
-	memset(net_stats, 0, sizeof(*net_stats));
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		devTxStats = &adapter->tqd_start[i].stats;
 		drvTxStats = &adapter->tx_queue[i].stats;
-		net_stats->tx_packets += devTxStats->ucastPktsTxOK +
-					devTxStats->mcastPktsTxOK +
-					devTxStats->bcastPktsTxOK;
-		net_stats->tx_bytes += devTxStats->ucastBytesTxOK +
-				      devTxStats->mcastBytesTxOK +
-				      devTxStats->bcastBytesTxOK;
-		net_stats->tx_errors += devTxStats->pktsTxError;
-		net_stats->tx_dropped += drvTxStats->drop_total;
+		stats->tx_packets += devTxStats->ucastPktsTxOK +
+				     devTxStats->mcastPktsTxOK +
+				     devTxStats->bcastPktsTxOK;
+		stats->tx_bytes += devTxStats->ucastBytesTxOK +
+				   devTxStats->mcastBytesTxOK +
+				   devTxStats->bcastBytesTxOK;
+		stats->tx_errors += devTxStats->pktsTxError;
+		stats->tx_dropped += drvTxStats->drop_total;
 	}
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		devRxStats = &adapter->rqd_start[i].stats;
 		drvRxStats = &adapter->rx_queue[i].stats;
-		net_stats->rx_packets += devRxStats->ucastPktsRxOK +
-					devRxStats->mcastPktsRxOK +
-					devRxStats->bcastPktsRxOK;
-
-		net_stats->rx_bytes += devRxStats->ucastBytesRxOK +
-				      devRxStats->mcastBytesRxOK +
-				      devRxStats->bcastBytesRxOK;
-
-		net_stats->rx_errors += devRxStats->pktsRxError;
-		net_stats->rx_dropped += drvRxStats->drop_total;
-		net_stats->multicast +=  devRxStats->mcastPktsRxOK;
+		stats->rx_packets += devRxStats->ucastPktsRxOK +
+				     devRxStats->mcastPktsRxOK +
+				     devRxStats->bcastPktsRxOK;
+
+		stats->rx_bytes += devRxStats->ucastBytesRxOK +
+				   devRxStats->mcastBytesRxOK +
+				   devRxStats->bcastBytesRxOK;
+
+		stats->rx_errors += devRxStats->pktsRxError;
+		stats->rx_dropped += drvRxStats->drop_total;
+		stats->multicast +=  devRxStats->mcastPktsRxOK;
 	}
-	return net_stats;
+
+	return stats;
 }
 
 static int
--- a/drivers/net/vmxnet3/vmxnet3_int.h	2011-06-04 22:02:03.048543506 +0900
+++ b/drivers/net/vmxnet3/vmxnet3_int.h	2011-06-07 19:54:22.890608779 +0900
@@ -323,7 +323,6 @@ struct vmxnet3_adapter {
 	struct Vmxnet3_TxQueueDesc	*tqd_start;     /* all tx queue desc */
 	struct Vmxnet3_RxQueueDesc	*rqd_start;	/* all rx queue desc */
 	struct net_device		*netdev;
-	struct net_device_stats		net_stats;
 	struct pci_dev			*pdev;
 
 	u8			__iomem *hw_addr0; /* for BAR 0 */
@@ -407,7 +406,9 @@ vmxnet3_create_queues(struct vmxnet3_ada
 		      u32 tx_ring_size, u32 rx_ring_size, u32 rx_ring2_size);
 
 extern void vmxnet3_set_ethtool_ops(struct net_device *netdev);
-extern struct net_device_stats *vmxnet3_get_stats(struct net_device *netdev);
+
+extern struct rtnl_link_stats64 *
+vmxnet3_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats);
 
 extern char vmxnet3_driver_name[];
 #endif



^ permalink raw reply

* [PATCH 7/9] myricom: update to 64 bit stats
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller, Andrew Gallatin, Brice Goglin; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: myricom-stats.patch --]
[-- Type: text/plain, Size: 2184 bytes --]

Driver was already keeping 64 bit counters, just not using the new interface.

PS: IMHO drivers should not duplicate network device stats
into ethtool stats. It is useless duplication.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/net/myri10ge/myri10ge.c	2011-06-07 16:58:31.333079418 -0700
+++ b/drivers/net/myri10ge/myri10ge.c	2011-06-07 17:29:13.298213223 -0700
@@ -377,7 +377,8 @@ static inline void put_be32(__be32 val,
 	__raw_writel((__force __u32) val, (__force void __iomem *)p);
 }
 
-static struct net_device_stats *myri10ge_get_stats(struct net_device *dev);
+static struct rtnl_link_stats64 *myri10ge_get_stats(struct net_device *dev,
+						    struct rtnl_link_stats64 *stats);
 
 static void set_fw_name(struct myri10ge_priv *mgp, char *name, bool allocated)
 {
@@ -1831,13 +1832,14 @@ myri10ge_get_ethtool_stats(struct net_de
 {
 	struct myri10ge_priv *mgp = netdev_priv(netdev);
 	struct myri10ge_slice_state *ss;
+	struct rtnl_link_stats64 link_stats;
 	int slice;
 	int i;
 
 	/* force stats update */
-	(void)myri10ge_get_stats(netdev);
+	(void)myri10ge_get_stats(netdev, &link_stats);
 	for (i = 0; i < MYRI10GE_NET_STATS_LEN; i++)
-		data[i] = ((unsigned long *)&netdev->stats)[i];
+		data[i] = ((u64 *)&link_stats)[i];
 
 	data[i++] = (unsigned int)mgp->tx_boundary;
 	data[i++] = (unsigned int)mgp->wc_enabled;
@@ -2976,11 +2978,11 @@ drop:
 	return NETDEV_TX_OK;
 }
 
-static struct net_device_stats *myri10ge_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *myri10ge_get_stats(struct net_device *dev,
+						    struct rtnl_link_stats64 *stats)
 {
 	struct myri10ge_priv *mgp = netdev_priv(dev);
 	struct myri10ge_slice_netstats *slice_stats;
-	struct net_device_stats *stats = &dev->stats;
 	int i;
 
 	spin_lock(&mgp->stats_lock);
@@ -3790,7 +3792,7 @@ static const struct net_device_ops myri1
 	.ndo_open		= myri10ge_open,
 	.ndo_stop		= myri10ge_close,
 	.ndo_start_xmit		= myri10ge_xmit,
-	.ndo_get_stats		= myri10ge_get_stats,
+	.ndo_get_stats64	= myri10ge_get_stats,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= myri10ge_change_mtu,
 	.ndo_fix_features	= myri10ge_fix_features,



^ permalink raw reply
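
The ethtool hunk in the myricom patch above walks the stats structure as a
flat array, so the element type has to match the field width: the old
net_device_stats fields are unsigned long, the rtnl_link_stats64 fields are
64 bit. A minimal sketch of that idea follows. Note that it uses memcpy()
rather than the pointer cast in the patch, and that demo_link_stats64,
DEMO_NET_STATS_LEN and demo_export_stats are hypothetical names, not
myri10ge code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stats structure made only of 64 bit fields. */
struct demo_link_stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
};

/* Number of 64 bit slots in the structure above. */
#define DEMO_NET_STATS_LEN \
	(sizeof(struct demo_link_stats64) / sizeof(uint64_t))

/*
 * Export the counters as a flat array of 64 bit values, one slot per
 * field, in declaration order. Returns the number of slots written.
 */
static size_t demo_export_stats(const struct demo_link_stats64 *stats,
				uint64_t *data)
{
	memcpy(data, stats, sizeof(*stats));

	return DEMO_NET_STATS_LEN;
}
```

Stepping through the same memory in unsigned long sized slots would read half
of each counter on a 32 bit build, which is presumably why the cast changes
together with the structure type.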

* [PATCH 9/9] s2io: implement 64 bit stats
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller, Jon Mason; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: s2io-stats.patch --]
[-- Type: text/plain, Size: 4085 bytes --]

Convert s2io driver to 64 bit statistics interface.
This driver keeps its own private set of counters, so change those
to the 64 bit netlink type.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/net/s2io.c	2011-06-07 17:35:37.372117743 -0700
+++ b/drivers/net/s2io.c	2011-06-07 17:44:14.102680070 -0700
@@ -4897,7 +4897,8 @@ static void s2io_updt_stats(struct s2io_
  *  Return value:
  *  pointer to the updated net_device_stats structure.
  */
-static struct net_device_stats *s2io_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *s2io_get_stats(struct net_device *dev,
+						struct rtnl_link_stats64 *net_stats)
 {
 	struct s2io_nic *sp = netdev_priv(dev);
 	struct mac_info *mac_control = &sp->mac_control;
@@ -4916,40 +4917,32 @@ static struct net_device_stats *s2io_get
 	 */
 	delta = ((u64) le32_to_cpu(stats->rmac_vld_frms_oflow) << 32 |
 		le32_to_cpu(stats->rmac_vld_frms)) - sp->stats.rx_packets;
 	sp->stats.rx_packets += delta;
-	dev->stats.rx_packets += delta;
 
 	delta = ((u64) le32_to_cpu(stats->tmac_frms_oflow) << 32 |
 		le32_to_cpu(stats->tmac_frms)) - sp->stats.tx_packets;
 	sp->stats.tx_packets += delta;
-	dev->stats.tx_packets += delta;
 
 	delta = ((u64) le32_to_cpu(stats->rmac_data_octets_oflow) << 32 |
 		le32_to_cpu(stats->rmac_data_octets)) - sp->stats.rx_bytes;
 	sp->stats.rx_bytes += delta;
-	dev->stats.rx_bytes += delta;
 
 	delta = ((u64) le32_to_cpu(stats->tmac_data_octets_oflow) << 32 |
 		le32_to_cpu(stats->tmac_data_octets)) - sp->stats.tx_bytes;
 	sp->stats.tx_bytes += delta;
-	dev->stats.tx_bytes += delta;
 
 	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_errors;
 	sp->stats.rx_errors += delta;
-	dev->stats.rx_errors += delta;
 
 	delta = ((u64) le32_to_cpu(stats->tmac_any_err_frms_oflow) << 32 |
 		le32_to_cpu(stats->tmac_any_err_frms)) - sp->stats.tx_errors;
 	sp->stats.tx_errors += delta;
-	dev->stats.tx_errors += delta;
 
 	delta = le64_to_cpu(stats->rmac_drop_frms) - sp->stats.rx_dropped;
 	sp->stats.rx_dropped += delta;
-	dev->stats.rx_dropped += delta;
 
 	delta = le64_to_cpu(stats->tmac_drop_frms) - sp->stats.tx_dropped;
 	sp->stats.tx_dropped += delta;
-	dev->stats.tx_dropped += delta;
 
 	/* The adapter MAC interprets pause frames as multicast packets, but
 	 * does not pass them up.  This erroneously increases the multicast
@@ -4961,19 +4954,16 @@ static struct net_device_stats *s2io_get
 	delta -= le64_to_cpu(stats->rmac_pause_ctrl_frms);
 	delta -= sp->stats.multicast;
 	sp->stats.multicast += delta;
-	dev->stats.multicast += delta;
 
 	delta = ((u64) le32_to_cpu(stats->rmac_usized_frms_oflow) << 32 |
 		le32_to_cpu(stats->rmac_usized_frms)) +
 		le64_to_cpu(stats->rmac_long_frms) - sp->stats.rx_length_errors;
 	sp->stats.rx_length_errors += delta;
-	dev->stats.rx_length_errors += delta;
 
 	delta = le64_to_cpu(stats->rmac_fcs_err_frms) - sp->stats.rx_crc_errors;
 	sp->stats.rx_crc_errors += delta;
-	dev->stats.rx_crc_errors += delta;
 
-	return &dev->stats;
+	return memcpy(net_stats, &sp->stats, sizeof(*net_stats));
 }
 
 /**
@@ -7447,7 +7437,7 @@ static int rx_osm_handler(struct ring_in
 		if (err_mask != 0x5) {
 			DBG_PRINT(ERR_DBG, "%s: Rx error Value: 0x%x\n",
 				  dev->name, err_mask);
-			dev->stats.rx_crc_errors++;
+			sp->stats.rx_crc_errors++;
 			swstats->mem_freed
 				+= skb->truesize;
 			dev_kfree_skb(skb);
@@ -7731,7 +7721,7 @@ static int rts_ds_steer(struct s2io_nic
 static const struct net_device_ops s2io_netdev_ops = {
 	.ndo_open	        = s2io_open,
 	.ndo_stop	        = s2io_close,
-	.ndo_get_stats	        = s2io_get_stats,
+	.ndo_get_stats64        = s2io_get_stats,
 	.ndo_start_xmit    	= s2io_xmit,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_multicast_list = s2io_set_multicast,
--- a/drivers/net/s2io.h	2011-06-07 17:38:50.921077503 -0700
+++ b/drivers/net/s2io.h	2011-06-07 17:43:59.478607560 -0700
@@ -871,7 +871,7 @@ struct s2io_nic {
 
 	struct mac_addr def_mac_addr[256];
 
-	struct net_device_stats stats;
+	struct rtnl_link_stats64 stats;
 	int high_dma_flag;
 	int device_enabled_once;
 



^ permalink raw reply
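
The s2io handler above builds each 64 bit value from a 32 bit hardware
counter and its 32 bit overflow word, then advances the software total by the
difference. A small sketch of that arithmetic, assuming the register values
are already in host byte order (the driver itself converts with
le32_to_cpu()); demo_join_counter and demo_sync_counter are hypothetical
helper names:

```c
#include <stdint.h>

/* Join a 32 bit hardware counter with its 32 bit overflow word. */
static uint64_t demo_join_counter(uint32_t oflow, uint32_t low)
{
	return ((uint64_t)oflow << 32) | low;
}

/*
 * Bring a software total up to the current hardware value.
 * The delta is measured against *total and added back to the same
 * counter. Returns the delta that was applied.
 */
static uint64_t demo_sync_counter(uint64_t *total, uint64_t hw_value)
{
	uint64_t delta = hw_value - *total;

	*total += delta;

	return delta;
}
```

The cast before the shift matters: shifting a 32 bit value left by 32 would
not produce the intended upper half.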

* [PATCH 5/9] netxen: convert to 64 bit statistics
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller, Amit Kumar Salecha; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: netxen-getstats64.patch --]
[-- Type: text/plain, Size: 1752 bytes --]

Change to the 64 bit statistics interface; the driver was already
maintaining 64 bit values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/netxen/netxen_nic_main.c	2011-06-07 20:33:48.150337471 +0900
+++ b/drivers/net/netxen/netxen_nic_main.c	2011-06-07 20:57:29.912628921 +0900
@@ -92,7 +92,8 @@ static irqreturn_t netxen_msi_intr(int i
 static irqreturn_t netxen_msix_intr(int irq, void *data);
 
 static void netxen_config_indev_addr(struct net_device *dev, unsigned long);
-static struct net_device_stats *netxen_nic_get_stats(struct net_device *netdev);
+static struct rtnl_link_stats64 *netxen_nic_get_stats(struct net_device *dev,
+						      struct rtnl_link_stats64 *stats);
 static int netxen_nic_set_mac(struct net_device *netdev, void *p);
 
 /*  PCI Device ID Table  */
@@ -520,7 +521,7 @@ static const struct net_device_ops netxe
 	.ndo_open	   = netxen_nic_open,
 	.ndo_stop	   = netxen_nic_close,
 	.ndo_start_xmit    = netxen_nic_xmit_frame,
-	.ndo_get_stats	   = netxen_nic_get_stats,
+	.ndo_get_stats64   = netxen_nic_get_stats,
 	.ndo_validate_addr = eth_validate_addr,
 	.ndo_set_multicast_list = netxen_set_multicast_list,
 	.ndo_set_mac_address    = netxen_nic_set_mac,
@@ -2110,10 +2111,10 @@ request_reset:
 	clear_bit(__NX_RESETTING, &adapter->state);
 }
 
-static struct net_device_stats *netxen_nic_get_stats(struct net_device *netdev)
+static struct rtnl_link_stats64 *netxen_nic_get_stats(struct net_device *netdev,
+						      struct rtnl_link_stats64 *stats)
 {
 	struct netxen_adapter *adapter = netdev_priv(netdev);
-	struct net_device_stats *stats = &netdev->stats;
 
 	stats->rx_packets = adapter->stats.rx_pkts + adapter->stats.lro_pkts;
 	stats->tx_packets = adapter->stats.xmitfinished;



^ permalink raw reply

* [PATCH 3/9] veth: convert to 64 bit statistics
From: Stephen Hemminger @ 2011-06-09  0:53 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: veth-stats64.patch --]
[-- Type: text/plain, Size: 2294 bytes --]

Not much change; the device was already keeping per cpu statistics.
Use the recent 64 bit statistics interface.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/veth.c	2011-06-07 20:04:42.221679879 +0900
+++ b/drivers/net/veth.c	2011-06-07 20:10:04.259276778 +0900
@@ -24,12 +24,12 @@
 #define MAX_MTU 65535		/* Max L3 MTU (arbitrary) */
 
 struct veth_net_stats {
-	unsigned long	rx_packets;
-	unsigned long	tx_packets;
-	unsigned long	rx_bytes;
-	unsigned long	tx_bytes;
-	unsigned long	tx_dropped;
-	unsigned long	rx_dropped;
+	u64	rx_packets;
+	u64	tx_packets;
+	u64	rx_bytes;
+	u64	tx_bytes;
+	u64	tx_dropped;
+	u64	rx_dropped;
 };
 
 struct veth_priv {
@@ -159,32 +159,27 @@ rx_drop:
  * general routines
  */
 
-static struct net_device_stats *veth_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *veth_get_stats64(struct net_device *dev,
+						  struct rtnl_link_stats64 *tot)
 {
 	struct veth_priv *priv;
 	int cpu;
-	struct veth_net_stats *stats, total = {0};
+	struct veth_net_stats *stats;
 
 	priv = netdev_priv(dev);
 
 	for_each_possible_cpu(cpu) {
 		stats = per_cpu_ptr(priv->stats, cpu);
 
-		total.rx_packets += stats->rx_packets;
-		total.tx_packets += stats->tx_packets;
-		total.rx_bytes   += stats->rx_bytes;
-		total.tx_bytes   += stats->tx_bytes;
-		total.tx_dropped += stats->tx_dropped;
-		total.rx_dropped += stats->rx_dropped;
+		tot->rx_packets += stats->rx_packets;
+		tot->tx_packets += stats->tx_packets;
+		tot->rx_bytes   += stats->rx_bytes;
+		tot->tx_bytes   += stats->tx_bytes;
+		tot->tx_dropped += stats->tx_dropped;
+		tot->rx_dropped += stats->rx_dropped;
 	}
-	dev->stats.rx_packets = total.rx_packets;
-	dev->stats.tx_packets = total.tx_packets;
-	dev->stats.rx_bytes   = total.rx_bytes;
-	dev->stats.tx_bytes   = total.tx_bytes;
-	dev->stats.tx_dropped = total.tx_dropped;
-	dev->stats.rx_dropped = total.rx_dropped;
 
-	return &dev->stats;
+	return tot;
 }
 
 static int veth_open(struct net_device *dev)
@@ -254,7 +249,7 @@ static const struct net_device_ops veth_
 	.ndo_stop            = veth_close,
 	.ndo_start_xmit      = veth_xmit,
 	.ndo_change_mtu      = veth_change_mtu,
-	.ndo_get_stats       = veth_get_stats,
+	.ndo_get_stats64     = veth_get_stats64,
 	.ndo_set_mac_address = eth_mac_addr,
 };
 



^ permalink raw reply
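
As a rough illustration of the veth conversion above: per cpu counters are
summed straight into the structure supplied by the caller. Since the sums use
+=, this only gives correct totals if the buffer starts out zeroed. In the
kernel I believe dev_get_stats() clears it before calling the handler, but
treat that as an assumption here. demo_cpu_stats and demo_sum_stats are
hypothetical names, and a plain array stands in for the per cpu data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per cpu counter block, modelled loosely on veth_net_stats. */
struct demo_cpu_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_bytes;
	uint64_t tx_bytes;
};

/*
 * Add every cpu's counters into *tot and return tot.
 * The caller must pass in a zeroed structure, because the loop
 * accumulates rather than assigns.
 */
static struct demo_cpu_stats *
demo_sum_stats(const struct demo_cpu_stats *per_cpu, size_t ncpus,
	       struct demo_cpu_stats *tot)
{
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		tot->rx_packets += per_cpu[cpu].rx_packets;
		tot->tx_packets += per_cpu[cpu].tx_packets;
		tot->rx_bytes   += per_cpu[cpu].rx_bytes;
		tot->tx_bytes   += per_cpu[cpu].tx_bytes;
	}

	return tot;
}
```

Summing into 64 bit fields means the total can exceed 32 bits even when each
cpu's share is small.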

* [PATCH 4/9] ifb: convert to 64 bit stats
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: ifb-stats64.patch --]
[-- Type: text/plain, Size: 2495 bytes --]

Convert the intermediate functional block (ifb) device to use 64 bit stats.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/net/ifb.c	2011-06-07 16:58:31.317079332 -0700
+++ b/drivers/net/ifb.c	2011-06-07 17:29:02.958161955 -0700
@@ -42,7 +42,14 @@ struct ifb_private {
 	struct tasklet_struct   ifb_tasklet;
 	int     tasklet_pending;
 	struct sk_buff_head     rq;
+	u64 rx_packets;
+	u64 rx_bytes;
+	u64 rx_dropped;
+
 	struct sk_buff_head     tq;
+	u64 tx_packets;
+	u64 tx_bytes;
+	u64 tx_dropped;
 };
 
 static int numifbs = 2;
@@ -57,7 +64,6 @@ static void ri_tasklet(unsigned long dev
 
 	struct net_device *_dev = (struct net_device *)dev;
 	struct ifb_private *dp = netdev_priv(_dev);
-	struct net_device_stats *stats = &_dev->stats;
 	struct netdev_queue *txq;
 	struct sk_buff *skb;
 
@@ -77,15 +83,16 @@ static void ri_tasklet(unsigned long dev
 
 		skb->tc_verd = 0;
 		skb->tc_verd = SET_TC_NCLS(skb->tc_verd);
-		stats->tx_packets++;
-		stats->tx_bytes +=skb->len;
+
+		dp->tx_packets++;
+		dp->tx_bytes +=skb->len;
 
 		rcu_read_lock();
 		skb->dev = dev_get_by_index_rcu(&init_net, skb->skb_iif);
 		if (!skb->dev) {
 			rcu_read_unlock();
 			dev_kfree_skb(skb);
-			stats->tx_dropped++;
+			dp->tx_dropped++;
 			if (skb_queue_len(&dp->tq) != 0)
 				goto resched;
 			break;
@@ -120,9 +127,26 @@ resched:
 
 }
 
+static struct rtnl_link_stats64 *ifb_stats64(struct net_device *dev,
+					     struct rtnl_link_stats64 *stats)
+{
+	struct ifb_private *dp = netdev_priv(dev);
+
+	stats->rx_packets = dp->rx_packets;
+	stats->rx_bytes = dp->rx_bytes;
+	stats->rx_dropped = dp->rx_dropped;
+	stats->tx_packets = dp->tx_packets;
+	stats->tx_bytes = dp->tx_bytes;
+	stats->tx_dropped = dp->tx_dropped;
+
+	return stats;
+}
+
+
 static const struct net_device_ops ifb_netdev_ops = {
 	.ndo_open	= ifb_open,
 	.ndo_stop	= ifb_close,
+	.ndo_get_stats64 = ifb_stats64,
 	.ndo_start_xmit	= ifb_xmit,
 	.ndo_validate_addr = eth_validate_addr,
 };
@@ -153,15 +177,14 @@ static void ifb_setup(struct net_device
 static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ifb_private *dp = netdev_priv(dev);
-	struct net_device_stats *stats = &dev->stats;
 	u32 from = G_TC_FROM(skb->tc_verd);
 
-	stats->rx_packets++;
-	stats->rx_bytes+=skb->len;
+	dp->rx_packets++;
+	dp->rx_bytes+=skb->len;
 
 	if (!(from & (AT_INGRESS|AT_EGRESS)) || !skb->skb_iif) {
 		dev_kfree_skb(skb);
-		stats->rx_dropped++;
+		dp->rx_dropped++;
 		return NETDEV_TX_OK;
 	}
 



^ permalink raw reply

* [PATCH 8/9] niu: support 64 bit stats interface
From: Stephen Hemminger @ 2011-06-09  0:54 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110609005356.160260858@vyatta.com>

[-- Attachment #1: niu-stats.patch --]
[-- Type: text/plain, Size: 2244 bytes --]

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/niu.c	2011-06-07 16:58:31.333079418 -0700
+++ b/drivers/net/niu.c	2011-06-07 17:29:17.234232746 -0700
@@ -6249,9 +6249,10 @@ static void niu_sync_mac_stats(struct ni
 		niu_sync_bmac_stats(np);
 }
 
-static void niu_get_rx_stats(struct niu *np)
+static void niu_get_rx_stats(struct niu *np,
+			     struct rtnl_link_stats64 *stats)
 {
-	unsigned long pkts, dropped, errors, bytes;
+	u64 pkts, dropped, errors, bytes;
 	struct rx_ring_info *rx_rings;
 	int i;
 
@@ -6273,15 +6274,16 @@ static void niu_get_rx_stats(struct niu
 	}
 
 no_rings:
-	np->dev->stats.rx_packets = pkts;
-	np->dev->stats.rx_bytes = bytes;
-	np->dev->stats.rx_dropped = dropped;
-	np->dev->stats.rx_errors = errors;
+	stats->rx_packets = pkts;
+	stats->rx_bytes = bytes;
+	stats->rx_dropped = dropped;
+	stats->rx_errors = errors;
 }
 
-static void niu_get_tx_stats(struct niu *np)
+static void niu_get_tx_stats(struct niu *np,
+			     struct rtnl_link_stats64 *stats)
 {
-	unsigned long pkts, errors, bytes;
+	u64 pkts, errors, bytes;
 	struct tx_ring_info *tx_rings;
 	int i;
 
@@ -6300,20 +6302,22 @@ static void niu_get_tx_stats(struct niu
 	}
 
 no_rings:
-	np->dev->stats.tx_packets = pkts;
-	np->dev->stats.tx_bytes = bytes;
-	np->dev->stats.tx_errors = errors;
+	stats->tx_packets = pkts;
+	stats->tx_bytes = bytes;
+	stats->tx_errors = errors;
 }
 
-static struct net_device_stats *niu_get_stats(struct net_device *dev)
+static struct rtnl_link_stats64 *niu_get_stats(struct net_device *dev,
+					       struct rtnl_link_stats64 *stats)
 {
 	struct niu *np = netdev_priv(dev);
 
 	if (netif_running(dev)) {
-		niu_get_rx_stats(np);
-		niu_get_tx_stats(np);
+		niu_get_rx_stats(np, stats);
+		niu_get_tx_stats(np, stats);
 	}
-	return &dev->stats;
+
+	return stats;
 }
 
 static void niu_load_hash_xmac(struct niu *np, u16 *hash)
@@ -9711,7 +9715,7 @@ static const struct net_device_ops niu_n
 	.ndo_open		= niu_open,
 	.ndo_stop		= niu_close,
 	.ndo_start_xmit		= niu_start_xmit,
-	.ndo_get_stats		= niu_get_stats,
+	.ndo_get_stats64	= niu_get_stats,
 	.ndo_set_multicast_list	= niu_set_rx_mode,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= niu_set_mac_addr,



^ permalink raw reply

* [PATCH 2/2] tun: dont force inline of functions
From: Stephen Hemminger @ 2011-06-09  0:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev
In-Reply-To: <20110609003306.651532958@vyatta.com>

[-- Attachment #1: tun-tap-noinline.patch --]
[-- Type: text/plain, Size: 1708 bytes --]

Current standard practice is to not mark most functions as inline
and let the compiler decide instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/tun.c	2011-05-13 16:43:25.643040523 -0700
+++ b/drivers/net/tun.c	2011-05-13 16:47:06.041228664 -0700
@@ -550,9 +550,9 @@ static unsigned int tun_chr_poll(struct
 
 /* prepad is the amount to reserve at front.  len is length after that.
  * linear is a hint as to how much to copy (usually headers). */
-static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
-					    size_t prepad, size_t len,
-					    size_t linear, int noblock)
+static struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
+				     size_t prepad, size_t len,
+				     size_t linear, int noblock)
 {
 	struct sock *sk = tun->socket.sk;
 	struct sk_buff *skb;
@@ -578,9 +578,9 @@ static inline struct sk_buff *tun_alloc_
 }
 
 /* Get packet from user space buffer */
-static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
-				       const struct iovec *iv, size_t count,
-				       int noblock)
+static ssize_t tun_get_user(struct tun_struct *tun,
+			    const struct iovec *iv, size_t count,
+			    int noblock)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
@@ -729,9 +729,9 @@ static ssize_t tun_chr_aio_write(struct
 }
 
 /* Put packet to the user space buffer */
-static __inline__ ssize_t tun_put_user(struct tun_struct *tun,
-				       struct sk_buff *skb,
-				       const struct iovec *iv, int len)
+static ssize_t tun_put_user(struct tun_struct *tun,
+			    struct sk_buff *skb,
+			    const struct iovec *iv, int len)
 {
 	struct tun_pi pi = { 0, skb->protocol };
 	ssize_t total = 0;



^ permalink raw reply

