Netdev List
 help / color / mirror / Atom feed
* Re: Possible bug in net/ipv4/route.c?
From: Stephen Hemminger @ 2010-07-05 18:03 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Sol Kavy, linux-kernel, gren, gjin, msezgin, silgen,
	David S. Miller, netdev
In-Reply-To: <20100705120413.GA6219@gondor.apana.org.au>

On Mon, 5 Jul 2010 20:04:13 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Sol Kavy <skavy@ubicom.com> wrote:
> > Found Linux: 2.6.28
> > Arch: Ubicom32 <not yet pushed>
> > Project: uCLinux based Router
> > Test: Bit torrent Stress Test
> > 
> > Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> > 
> > The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> > 
> > In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> > 
> > 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> > 
> > 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> > 
> > 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> > 
> > Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.
> 
> Thanks for the report!
>  
> > Patch:  
> > 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 125ee64..d13805f 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff *skb)
> > {
> >         struct rtable *rt;
> > 
> > +       /*
> > +         * Since link failure can be called with skbs from many layers (see arp)
> > +         * the cb area of the skb must be cleared before use. Because the cb area 
> > +         * can be formatted according to the caller layer's cb area format and it may cause
> > +         * corruptions when it is handled in a different network layer.
> > +         */
> > +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> >         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
> >         rt = skb->rtable;
> > 
> > The packet is enqueud by:
> > do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> > 
> > The packet is then dequeued by: 
> > do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> > 
> > Because the Ubicom Ethernet driver overwrites the common buffer area, the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.   
> 
> Generally this area should be cleared on entry to each stack
> intending on using it.  So in this case, I'd point the finger
> of blame at the bridge stack for letting this packet into the
> IP stack through the back entrance without taking the proper
> precautions.

The CB is used in two places in the bridge code.
1) The recent multicast changes (IGMP)
2) Netfilter / ebtables to store header.

Ebtables is okay because the only part of the cb[] it uses
is the incoming device (brdev) which is always initialized coming into
the bridge: br_dev_xmit and br_handle_frame_finish

The IGMP code looks buggy.

/* net device transmit always called with no BH (preempt_disabled) */
netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
{
... [A]
	BR_INPUT_SKB_CB(skb)->brdev = dev;

	skb_reset_mac_header(skb);
	skb_pull(skb, ETH_HLEN);

	if (is_multicast_ether_addr(dest)) {
		if (br_multicast_rcv(br, NULL, skb))
			goto out;

		mdst = br_mdb_get(br, skb);
		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
			... [B]

The problem is that br_dev_xmit is looking at flags in the CB[] that
are uninitialized.

if br_dev_xmit cleared the CB at [A] the mrouters_only would always be zero
at [B].

Where should the mrouters and igmp_only fields in skb be initialized?



^ permalink raw reply

* Re: setsockopt(IP_TOS) being privileged or distinct capability?
From: Philip Prindeville @ 2010-07-05 18:04 UTC (permalink / raw)
  To: Alexander Clouter; +Cc: netdev
In-Reply-To: <20100703234813.GJ24655@chipmunk>

On 7/3/10 5:48 PM, Alexander Clouter wrote:
> Hi,
>
> * Philip Prindeville<philipp_subx@redfish-solutions.com>  [2010-07-03 17:07:52-0600]:
>    
>> On 7/3/10 12:55 PM, Alexander Clouter wrote:
>>      
>>>
>>>        
>>>> Does anyone else think that setsockopt(IP_TOS) should be a privileged
>>>> operation, perhaps using CAP_NET_ADMIN, or maybe even adding separate
>>>> granularity as CAP_NET_TOS?
>>>>
>>>>
>>>>          
>>> I really would prefer not having to run telnet and ssh *clients* as
>>> root. :)
>>>        
>> Don't ping and traceroute -I currently run as root?
>>
>>      
> Indeed, but I have no idea what that has to do with ToS/DSCP flags?
>    

The logic being that if having a RAW socket requires privilege, but it's 
necessary to have reasonable security for user-invokable programs... and 
we manage to do this without too much trouble for those to programs, 
then it shouldn't be an undue hardship to do the same for ssh.

> ping and (old skool) traceroute use ICMP where you need to open a
> privileged socket; to send and receive ICMP packets.  Opening a UDP/TCP
> is an unprivileged operation and so is setsockopt(IP_TOS).
>    

Right.  And I'm saying because of the potentially disruptive nature of 
setsockopt(IP_TOS), perhaps it should require privilege.


> I'm guessing, if you excuse me Google-stalking you), this is all linked
> to:
>
> https://bugzilla.mindrot.org/show_bug.cgi?id=1733
>    

That would be a very good guess.

And google-stalking is fine.  I draw the line at leaving dead cats at my 
front door.


> You have to bear in mind ToS is a marking that userland can utilise to
> request that the network provides it with a particular QoS, this does
> not mean for an instant the network has to honour that (I know my ISP
> does not and neither does my work network I sysadmin for)...otherwise
> nothing would stop me using:
>    

I understand that.  That's part of the reason that I've submitted 
patches for APR, Apache, Thunderbird, Firefox, Proftpd, Curl, wget, 
etc.  There is pressure within certain technical groups to get ISP's to 
voluntarily implement RFC-4594... that's the carrot.  The stick being 
FCC heavy-handed regulation of the ISP's if they don't.

Once QoS markings actually *are* implemented in carrier networks, the 
potential for abuse is non-insignificant.  Hence the suggestion to make 
it privileged.


> iptables -t mangle -I POSTROUTING -j DSCP --set-dscp-class EF
>    

Except that "iptables" is also a privileged operation.


> QoS is meaningless unless you place boundaries on the policies; the
> ToS/DSCP marking should only be used as a *hint* for classification of
> traffic flows.
>    

Like I said, there's an effort to push the ISP's into implementing 
RFC-4594 widespread.  Their previous arguments for not doing so were (a) 
most software doesn't implement QoS (hence the scrub I did above), and 
(b) there were no standard markings.  RFC-4594 attempts to impose a 
standard.


> For example, 'interactive' and 'low latency' (in the case of SSH or
> telnet) should not exceed 10kB/s...unless you like to play 0verkill :)
> Anything marking it's traffic as interactive but shutting traffic at
> 500kB/s is obviously telling lies.  If you build your policing rules to
> blindly accept whatever is in the ToS/DSCP field, you are configuring a
> DoS vector on your network.
>
> Cheers
>
>    

When you say "interactive" and "low latency" are you referring to the 
RFC-791 mappings for those, or to the RFC-4594 mappings of those classes?

Here we use Arno's Iptables Firewall with the traffic-shaper plugin I 
wrote.  This does shaping and policing within traffic classes.

And yes, doing an out-of-the-box shaper for Fedora is on my TODO list.

-Philip


^ permalink raw reply

* Re: setsockopt(IP_TOS) being privileged or distinct capability?
From: Philip Prindeville @ 2010-07-05 18:08 UTC (permalink / raw)
  To: Alexander Clouter; +Cc: netdev
In-Reply-To: <20100703234813.GJ24655@chipmunk>

Say, on a slightly different subject... but still having to do with 
setsockopt(IP_TOS)... is it intentional that system call effectively 
does nothing if the socket is in listen(), connect(), or bind() states?

In other words, you have to issue the setsockopt() immediately after the 
socket() call, or it doesn't do anything.

As I remember, that's slightly different semantics from BSD, which 
allows you to change the markings on a bound or listening socket.

I've not walked through the kernel sources to see why this is.

-Philip


^ permalink raw reply

* Re: [PATCH net-next-2.6] tg3: 64bits stats
From: Eric Dumazet @ 2010-07-05 18:07 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Matt Carlson, Michael Chan, netdev, David Miller
In-Reply-To: <1278351070.2087.70.camel@achroite.uk.solarflarecom.com>

Le lundi 05 juillet 2010 à 18:31 +0100, Ben Hutchings a écrit :
> On Mon, 2010-07-05 at 18:03 +0200, Eric Dumazet wrote:
> > Le lundi 05 juillet 2010 à 11:14 +0200, Eric Dumazet a écrit :
> > > After commit be1f3c2c027c (net: Enable 64-bit net device statistics on
> > > 32-bit architectures), we can now provide 64bit stats, even on 32bit
> > > arches.
> > > 
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > ---
> > 
> > Please disregard this patch.
> > 
> > There is small possibility a reader might read a 64bit value while
> > another writer makes a change to it, changing high 32bit value.
> > 
> > A change in core network would be needed to make this 100% safe,
> > possibly using a seqlock to protect dev->stats64
> 
> I really didn't want to add this overhead and complication to readers
> when only some drivers need it.
> 

Overhead would be minimal, only in get_stats() method and only for
drivers that want to provide 64 bit stats...

Clearly, rx/tx path must not have any overhead.

Right now, even RTNL 64bit stats are not safe.

Should we revert prior patches or try to make progress ?




^ permalink raw reply

* Re: [PATCHv3 2/2] sfc: Implement 64-bit net device statistics on all architectures
From: Eric Dumazet @ 2010-07-05 18:16 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Stephen Hemminger, Arnd Bergmann, netdev,
	linux-net-drivers
In-Reply-To: <1276017672.2185.12.camel@achroite.uk.solarflarecom.com>

Le mardi 08 juin 2010 à 18:21 +0100, Ben Hutchings a écrit :
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> ---
> This is unchanged from v1.
> 
> Ben.
> 
>  drivers/net/sfc/efx.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
> index 26b0cc2..8ad476a 100644
> --- a/drivers/net/sfc/efx.c
> +++ b/drivers/net/sfc/efx.c
> @@ -1492,11 +1492,11 @@ static int efx_net_stop(struct net_device *net_dev)
>  }
>  
>  /* Context: process, dev_base_lock or RTNL held, non-blocking. */
> -static struct net_device_stats *efx_net_stats(struct net_device *net_dev)
> +static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
>  {
>  	struct efx_nic *efx = netdev_priv(net_dev);
>  	struct efx_mac_stats *mac_stats = &efx->mac_stats;
> -	struct net_device_stats *stats = &net_dev->stats;
> +	struct rtnl_link_stats64 *stats = &net_dev->stats64;
>  
>  	spin_lock_bh(&efx->stats_lock);
>  	efx->type->update_stats(efx);
> @@ -1630,7 +1630,7 @@ static void efx_set_multicast_list(struct net_device *net_dev)
>  static const struct net_device_ops efx_netdev_ops = {
>  	.ndo_open		= efx_net_open,
>  	.ndo_stop		= efx_net_stop,
> -	.ndo_get_stats		= efx_net_stats,
> +	.ndo_get_stats64	= efx_net_stats,
>  	.ndo_tx_timeout		= efx_watchdog,
>  	.ndo_start_xmit		= efx_hard_start_xmit,
>  	.ndo_validate_addr	= eth_validate_addr,
> -- 
> 1.6.2.5
> 

Ben, David

I believe following patch is needed after our recent commits.
Not sure a seqlock is really needed, maybe a spinlock would be enough.

Thanks

[PATCH net-next-2.6] net: fix 64 bit counters on 32 bit arches

There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed.

I believe we need to add some synchronisation to avoid this.

A driver that provides 64bit stats on a 32bit arches should make sure it
gets the seqlock before updating them. Other drivers can stay unchanged,
since upper 32bits are not updated.

This new seqlock should be acquired in slow path only in process context
(no BH protection for instance).

Drivers should not use it in their rx/tx path. (They better stay with
32bit counters on 32bit arches)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/loopback.c    |    4 ++
 drivers/net/macvlan.c     |    2 +
 drivers/net/sfc/efx.c     |    2 +
 include/linux/netdevice.h |   43 +++++++++++++++++++++++
 net/8021q/vlan_dev.c      |    2 +
 net/core/dev.c            |   38 ++++++++++++--------
 net/core/rtnetlink.c      |   66 +++++++++++++++++++-----------------
 7 files changed, 112 insertions(+), 45 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 4dd0510..5e82e4a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -123,12 +123,16 @@ static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev)
 		bytes   += tbytes;
 		packets += tpackets;
 	}
+
+	netdev_stats_update_begin(dev);
 	stats->rx_packets = packets;
 	stats->tx_packets = packets;
 	stats->rx_dropped = drops;
 	stats->rx_errors  = drops;
 	stats->rx_bytes   = bytes;
 	stats->tx_bytes   = bytes;
+	netdev_stats_update_end(dev);
+
 	return stats;
 }
 
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index e6d626e..c39a90a 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -458,11 +458,13 @@ static struct rtnl_link_stats64 *macvlan_dev_get_stats64(struct net_device *dev)
 			/* rx_errors is an ulong, updated without syncp protection */
 			accum.rx_errors		+= p->rx_errors;
 		}
+		netdev_stats_update_begin(dev);
 		stats->rx_packets = accum.rx_packets;
 		stats->rx_bytes   = accum.rx_bytes;
 		stats->rx_errors  = accum.rx_errors;
 		stats->rx_dropped = accum.rx_errors;
 		stats->multicast  = accum.rx_multicast;
+		netdev_stats_update_end(dev);
 	}
 	return stats;
 }
diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 35b3f29..06afff6 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1543,6 +1543,7 @@ static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
 	efx->type->update_stats(efx);
 	spin_unlock_bh(&efx->stats_lock);
 
+	netdev_stats_update_begin(net_dev);
 	stats->rx_packets = mac_stats->rx_packets;
 	stats->tx_packets = mac_stats->tx_packets;
 	stats->rx_bytes = mac_stats->rx_bytes;
@@ -1565,6 +1566,7 @@ static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
 	stats->tx_errors = (stats->tx_window_errors +
 			    mac_stats->tx_bad);
 
+	netdev_stats_update_end(net_dev);
 	return stats;
 }
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d27368..d6726cf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -886,6 +886,9 @@ struct net_device {
 	int			ifindex;
 	int			iflink;
 
+#if BITS_PER_LONG==32
+	seqlock_t		stats_lock;
+#endif
 	union {
 		struct rtnl_link_stats64 stats64;
 		struct net_device_stats stats;
@@ -1080,6 +1083,46 @@ struct net_device {
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+
+static inline void netdev_stats_initlock(struct net_device *dev)
+{
+#if BITS_PER_LONG==32
+	seqlock_init(&dev->stats_lock);
+#endif
+}
+
+static inline void netdev_stats_update_begin(struct net_device *dev)
+{
+#if BITS_PER_LONG==32
+	write_seqlock(&dev->stats_lock);
+#endif
+}
+
+static inline void netdev_stats_update_end(struct net_device *dev)
+{
+#if BITS_PER_LONG==32
+	write_sequnlock(&dev->stats_lock);
+#endif
+}
+
+static inline unsigned int netdev_stats_fetch_begin(const struct net_device *dev)
+{
+#if BITS_PER_LONG==32
+	return read_seqbegin(&dev->stats_lock);
+#else
+	return 0;
+#endif
+}
+
+static bool inline netdev_stats_fetch_retry(const struct net_device *dev, unsigned int start)
+{
+#if BITS_PER_LONG==32
+	return read_seqretry(&dev->stats_lock, start);
+#else
+	return false;
+#endif
+}
+
 #define	NETDEV_ALIGN		32
 
 static inline
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index c6456cb..3d4e2f0 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -830,10 +830,12 @@ static struct rtnl_link_stats64 *vlan_dev_get_stats64(struct net_device *dev)
 			/* rx_errors is an ulong, not protected by syncp */
 			accum.rx_errors  += p->rx_errors;
 		}
+		netdev_stats_update_begin(dev);
 		stats->rx_packets = accum.rx_packets;
 		stats->rx_bytes   = accum.rx_bytes;
 		stats->rx_errors  = accum.rx_errors;
 		stats->multicast  = accum.rx_multicast;
+		netdev_stats_update_end(dev);
 	}
 	return stats;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 93b8929..b41c7ac 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3704,24 +3704,31 @@ void dev_seq_stop(struct seq_file *seq, void *v)
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
 	const struct rtnl_link_stats64 *stats = dev_get_stats(dev);
+	struct rtnl_link_stats64 temp;
+	unsigned int start;
+	
+	do {
+		start = netdev_stats_fetch_begin(dev);
+		memcpy(&temp, stats, sizeof(temp));
+	} while (netdev_stats_fetch_retry(dev, start));
 
 	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
 		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
-		   dev->name, stats->rx_bytes, stats->rx_packets,
-		   stats->rx_errors,
-		   stats->rx_dropped + stats->rx_missed_errors,
-		   stats->rx_fifo_errors,
-		   stats->rx_length_errors + stats->rx_over_errors +
-		    stats->rx_crc_errors + stats->rx_frame_errors,
-		   stats->rx_compressed, stats->multicast,
-		   stats->tx_bytes, stats->tx_packets,
-		   stats->tx_errors, stats->tx_dropped,
-		   stats->tx_fifo_errors, stats->collisions,
-		   stats->tx_carrier_errors +
-		    stats->tx_aborted_errors +
-		    stats->tx_window_errors +
-		    stats->tx_heartbeat_errors,
-		   stats->tx_compressed);
+		   dev->name, temp.rx_bytes, temp.rx_packets,
+		   temp.rx_errors,
+		   temp.rx_dropped + temp.rx_missed_errors,
+		   temp.rx_fifo_errors,
+		   temp.rx_length_errors + temp.rx_over_errors +
+		    temp.rx_crc_errors + temp.rx_frame_errors,
+		   temp.rx_compressed, temp.multicast,
+		   temp.tx_bytes, temp.tx_packets,
+		   temp.tx_errors, temp.tx_dropped,
+		   temp.tx_fifo_errors, temp.collisions,
+		   temp.tx_carrier_errors +
+		    temp.tx_aborted_errors +
+		    temp.tx_window_errors +
+		    temp.tx_heartbeat_errors,
+		   temp.tx_compressed);
 }
 
 /*
@@ -5386,6 +5393,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	if (dev_addr_init(dev))
 		goto free_rx;
 
+	netdev_stats_initlock(dev);
 	dev_mc_init(dev);
 	dev_uc_init(dev);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e645778..b3a3812 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -610,37 +610,43 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 	a->tx_compressed = b->tx_compressed;
 }
 
-static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
+static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b,
+				   struct net_device *dev)
 {
 	struct rtnl_link_stats64 a;
-
-	a.rx_packets = b->rx_packets;
-	a.tx_packets = b->tx_packets;
-	a.rx_bytes = b->rx_bytes;
-	a.tx_bytes = b->tx_bytes;
-	a.rx_errors = b->rx_errors;
-	a.tx_errors = b->tx_errors;
-	a.rx_dropped = b->rx_dropped;
-	a.tx_dropped = b->tx_dropped;
-
-	a.multicast = b->multicast;
-	a.collisions = b->collisions;
-
-	a.rx_length_errors = b->rx_length_errors;
-	a.rx_over_errors = b->rx_over_errors;
-	a.rx_crc_errors = b->rx_crc_errors;
-	a.rx_frame_errors = b->rx_frame_errors;
-	a.rx_fifo_errors = b->rx_fifo_errors;
-	a.rx_missed_errors = b->rx_missed_errors;
-
-	a.tx_aborted_errors = b->tx_aborted_errors;
-	a.tx_carrier_errors = b->tx_carrier_errors;
-	a.tx_fifo_errors = b->tx_fifo_errors;
-	a.tx_heartbeat_errors = b->tx_heartbeat_errors;
-	a.tx_window_errors = b->tx_window_errors;
-
-	a.rx_compressed = b->rx_compressed;
-	a.tx_compressed = b->tx_compressed;
+	unsigned int start;
+
+	do {
+		start = netdev_stats_fetch_begin(dev);
+
+		a.rx_packets = b->rx_packets;
+		a.tx_packets = b->tx_packets;
+		a.rx_bytes = b->rx_bytes;
+		a.tx_bytes = b->tx_bytes;
+		a.rx_errors = b->rx_errors;
+		a.tx_errors = b->tx_errors;
+		a.rx_dropped = b->rx_dropped;
+		a.tx_dropped = b->tx_dropped;
+	
+		a.multicast = b->multicast;
+		a.collisions = b->collisions;
+
+		a.rx_length_errors = b->rx_length_errors;
+		a.rx_over_errors = b->rx_over_errors;
+		a.rx_crc_errors = b->rx_crc_errors;
+		a.rx_frame_errors = b->rx_frame_errors;
+		a.rx_fifo_errors = b->rx_fifo_errors;
+		a.rx_missed_errors = b->rx_missed_errors;
+
+		a.tx_aborted_errors = b->tx_aborted_errors;
+		a.tx_carrier_errors = b->tx_carrier_errors;
+		a.tx_fifo_errors = b->tx_fifo_errors;
+		a.tx_heartbeat_errors = b->tx_heartbeat_errors;
+		a.tx_window_errors = b->tx_window_errors;
+
+		a.rx_compressed = b->rx_compressed;
+		a.tx_compressed = b->tx_compressed;
+	} while (netdev_stats_fetch_retry(dev, start));
 	memcpy(v, &a, sizeof(a));
 }
 
@@ -854,7 +860,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			sizeof(struct rtnl_link_stats64));
 	if (attr == NULL)
 		goto nla_put_failure;
-	copy_rtnl_link_stats64(nla_data(attr), stats);
+	copy_rtnl_link_stats64(nla_data(attr), stats, dev);
 
 	if (dev->dev.parent)
 		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));



^ permalink raw reply related

* Re: [PATCH net-next-2.6] tg3: 64bits stats
From: Ben Hutchings @ 2010-07-05 18:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Matt Carlson, Michael Chan, netdev, David Miller
In-Reply-To: <1278353230.2877.601.camel@edumazet-laptop>

On Mon, 2010-07-05 at 20:07 +0200, Eric Dumazet wrote:
> Le lundi 05 juillet 2010 à 18:31 +0100, Ben Hutchings a écrit :
> > On Mon, 2010-07-05 at 18:03 +0200, Eric Dumazet wrote:
> > > Le lundi 05 juillet 2010 à 11:14 +0200, Eric Dumazet a écrit :
> > > > After commit be1f3c2c027c (net: Enable 64-bit net device statistics on
> > > > 32-bit architectures), we can now provide 64bit stats, even on 32bit
> > > > arches.
> > > > 
> > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > > ---
> > > 
> > > Please disregard this patch.
> > > 
> > > There is small possibility a reader might read a 64bit value while
> > > another writer makes a change to it, changing high 32bit value.
> > > 
> > > A change in core network would be needed to make this 100% safe,
> > > possibly using a seqlock to protect dev->stats64
> > 
> > I really didn't want to add this overhead and complication to readers
> > when only some drivers need it.
> > 
> 
> Overhead would be minimal, only in get_stats() method and only for
> drivers that want to provide 64 bit stats...
> 
> Clearly, rx/tx path must not have any overhead.
> 
> Right now, even RTNL 64bit stats are not safe.
> 
> Should we revert prior patches or try to make progress ?

I think you should use a similar approach here as you did in the
loopback driver, i.e. update private variables in the RX and TX path and
then copy/aggregate them in the implementation ndo_get_stats64 (only
without the need for percpu stats).

If you want to include a seqlock in the driver stats interface, you can
do that but it's not going to be pretty and we're still going to need
additional seqlocks for per-queue (or percpy) stats in some drivers.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCHv3 2/2] sfc: Implement 64-bit net device statistics on all architectures
From: Eric Dumazet @ 2010-07-05 18:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Stephen Hemminger, Arnd Bergmann, netdev,
	linux-net-drivers
In-Reply-To: <1278353780.2877.620.camel@edumazet-laptop>

Le lundi 05 juillet 2010 à 20:16 +0200, Eric Dumazet a écrit :
> Le mardi 08 juin 2010 à 18:21 +0100, Ben Hutchings a écrit :
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> > ---
> > This is unchanged from v1.
> > 
> > Ben.
> > 
> >  drivers/net/sfc/efx.c |    6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
> > index 26b0cc2..8ad476a 100644
> > --- a/drivers/net/sfc/efx.c
> > +++ b/drivers/net/sfc/efx.c
> > @@ -1492,11 +1492,11 @@ static int efx_net_stop(struct net_device *net_dev)
> >  }
> >  
> >  /* Context: process, dev_base_lock or RTNL held, non-blocking. */
> > -static struct net_device_stats *efx_net_stats(struct net_device *net_dev)
> > +static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
> >  {
> >  	struct efx_nic *efx = netdev_priv(net_dev);
> >  	struct efx_mac_stats *mac_stats = &efx->mac_stats;
> > -	struct net_device_stats *stats = &net_dev->stats;
> > +	struct rtnl_link_stats64 *stats = &net_dev->stats64;
> >  
> >  	spin_lock_bh(&efx->stats_lock);
> >  	efx->type->update_stats(efx);
> > @@ -1630,7 +1630,7 @@ static void efx_set_multicast_list(struct net_device *net_dev)
> >  static const struct net_device_ops efx_netdev_ops = {
> >  	.ndo_open		= efx_net_open,
> >  	.ndo_stop		= efx_net_stop,
> > -	.ndo_get_stats		= efx_net_stats,
> > +	.ndo_get_stats64	= efx_net_stats,
> >  	.ndo_tx_timeout		= efx_watchdog,
> >  	.ndo_start_xmit		= efx_hard_start_xmit,
> >  	.ndo_validate_addr	= eth_validate_addr,
> > -- 
> > 1.6.2.5
> > 
> 
> Ben, David
> 
> I believe following patch is needed after our recent commits.
> Not sure a seqlock is really needed, maybe a spinlock would be enough.
> 

One other way would be to add a rtnl_link_stats64 param to
ndo_get_stats64() method, and ask drivers to copy their stats in this
zone, instead of returning &dev->stats64 or something...

And also change dev_get_stats() with this new parameter.

Each caller would use a private copy, with no risk of concurrent
updates.




^ permalink raw reply

* Re: [PATCH net-next-2.6] tg3: 64bits stats
From: Eric Dumazet @ 2010-07-05 18:35 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Matt Carlson, Michael Chan, netdev, David Miller
In-Reply-To: <1278354656.2087.108.camel@achroite.uk.solarflarecom.com>

Le lundi 05 juillet 2010 à 19:30 +0100, Ben Hutchings a écrit :

> I think you should use a similar approach here as you did in the
> loopback driver, i.e. update private variables in the RX and TX path and
> then copy/aggregate them in the implementation ndo_get_stats64 (only
> without the need for percpu stats).
> 
> If you want to include a seqlock in the driver stats interface, you can
> do that but it's not going to be pretty and we're still going to need
> additional seqlocks for per-queue (or percpy) stats in some drivers.

Yes, I provided one patch but am working on a different one, requiring a
new rtnl_link_stats64 param to ndo_get_stats64() methods and
dev_get_stats() as well.

dev->stats64 should not be overwritten without some synchronization, so
just disallow it for the moment...



^ permalink raw reply

* RE: Possible bug in net/ipv4/route.c?
From: Sol Kavy @ 2010-07-05 16:37 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Greg Ren, Guojun Jin, Murat Sezgin, Sener Ilgen, David S. Miller,
	Stephen Hemminger, netdev
In-Reply-To: <20100705120413.GA6219@gondor.apana.org.au>

So is there some place else that should have the clear instead of ipv4_link_failure()?

Sol

-----Original Message-----
From: Herbert Xu [mailto:herbert@gondor.apana.org.au] 
Sent: Monday, July 05, 2010 5:04 AM
To: Sol Kavy
Cc: linux-kernel@vger.kernel.org; Greg Ren; Guojun Jin; Murat Sezgin; Sener Ilgen; David S. Miller; Stephen Hemminger; netdev@vger.kernel.org
Subject: Re: Possible bug in net/ipv4/route.c?

Sol Kavy <skavy@ubicom.com> wrote:
> Found Linux: 2.6.28
> Arch: Ubicom32 <not yet pushed>
> Project: uCLinux based Router
> Test: Bit torrent Stress Test
> 
> Note: The top of Linus git net/ipv4/route.c appears to have the same issue.
> 
> The following is a patch for clearing out IP options area in an input skb during link failure processing.  Without this patch, the icmp_send() can result in a call to ip_options_echo() where the common buffer area of the skb is incorrectly interpreted.  Depending on the previous use of the skb->cb[], the interpreted option length values can cause stack corruption by copying more than 40 bytes to the output options.
> 
> In our case, a driver is using the skb->cb[] area to hold driver specific data.   The driver is not zeroing out the area after use.  I can see three basic solutions:
> 
> 1) Drivers are not allowed to use the skb->cb[] area at all.  Ubicom should modify the driver to use a different approach.
> 
> 2) The layer using skb->cb[] should clear this area after use and before handing the skb to another layer.  Ubicom should modify the driver to clear the skb->cb[] area before sending it up the line.
> 
> 3) Any layer that "uses" the skb->cb[] area must clear the area before use.  In which case, the proposed patch would fix the problem for the ipv4_link_failure().  I believe that this is the correct fix because I see ip_rcv() clears the skb->cb[] before using it.
> 
> Can someone confirm that this is the appropriate fix?  If this is documented somewhere, please direct me to the documentation.

Thanks for the report!
 
> Patch:
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 
> 125ee64..d13805f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1606,6 +1606,14 @@ static void ipv4_link_failure(struct sk_buff 
> *skb) {
>         struct rtable *rt;
> 
> +       /*
> +         * Since link failure can be called with skbs from many 
> +layers (see arp)
> +         * the cb area of the skb must be cleared before use. Because 
> +the cb area
> +         * can be formatted according to the caller layer's cb area 
> +format and it may cause
> +         * corruptions when it is handled in a different network layer.
> +         */
> +       memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
>         icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
>         rt = skb->rtable;
> 
> The packet is enqueud by:
> do_IRQ()->do_softirq()->__do_softirq()->net_rx_action()->ubi32_eth_napi_poll()->ubi32_eth_receive()->__vlan_hwaccel_rx()->netif_receive_skb()->br_handle_frame()->nf_hook_slow()->br_nf_pre_routing_finish()->br_nfr_pre_routing_finish_bridge()->neight_resolve_output()->__neigh_event_send().
> 
> The packet is then dequeued by: 
> do_IRQ() -> irq_exit() -> do_softirq() -> run_timer_softirq() -> neigh_timer_handler() -> arp_error_report() -> ipv4_link_failure() -> icmp_send() -> ip_options_echo().
> 
> Because the Ubicom Ethernet driver overwrites the common buffer area, 
> the enqueued packet contains garbage when casted as an IP options data structure.  This results in ip_options_echo() miss reading the option length information and overwriting memory.  By clearing the skb->cb[] before processing the icmp_send() against the packet, we ensure that ip_options_echo() does not corrupt memory.

Generally this area should be cleared on entry to each stack intending on using it.  So in this case, I'd point the finger of blame at the bridge stack for letting this packet into the IP stack through the back entrance without taking the proper precautions.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH net-next-2.6] tg3: 64bits stats
From: Ben Hutchings @ 2010-07-05 18:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Matt Carlson, Michael Chan, netdev, David Miller
In-Reply-To: <1278353230.2877.601.camel@edumazet-laptop>

On Mon, 2010-07-05 at 20:07 +0200, Eric Dumazet wrote:
> Le lundi 05 juillet 2010 à 18:31 +0100, Ben Hutchings a écrit :
> > On Mon, 2010-07-05 at 18:03 +0200, Eric Dumazet wrote:
> > > Le lundi 05 juillet 2010 à 11:14 +0200, Eric Dumazet a écrit :
> > > > After commit be1f3c2c027c (net: Enable 64-bit net device statistics on
> > > > 32-bit architectures), we can now provide 64bit stats, even on 32bit
> > > > arches.
> > > > 
> > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > > ---
> > > 
> > > Please disregard this patch.
> > > 
> > > There is small possibility a reader might read a 64bit value while
> > > another writer makes a change to it, changing high 32bit value.
> > > 
> > > A change in core network would be needed to make this 100% safe,
> > > possibly using a seqlock to protect dev->stats64
> > 
> > I really didn't want to add this overhead and complication to readers
> > when only some drivers need it.
> > 
> 
> Overhead would be minimal, only in get_stats() method and only for
> drivers that want to provide 64 bit stats...
> 
> Clearly, rx/tx path must not have any overhead.
> 
> Right now, even RTNL 64bit stats are not safe.

Ugh, I was assuming that callers of dev_get_stats() would hold
dev_base_lock.  However only netstat_show() seems to do so currently.

Ben.

> Should we revert prior patches or try to make progress ?
> 
> 
> 
-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH net-next-2.6] tg3: 64bits stats
From: Eric Dumazet @ 2010-07-05 18:59 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Matt Carlson, Michael Chan, netdev, David Miller
In-Reply-To: <1278355263.2087.116.camel@achroite.uk.solarflarecom.com>

Le lundi 05 juillet 2010 à 19:41 +0100, Ben Hutchings a écrit :
> Ugh, I was assuming that callers of dev_get_stats() would hold
> dev_base_lock.  However only netstat_show() seems to do so currently.

Right, but a read_lock() would not be enough, even if all users would
use it.



^ permalink raw reply

* Re: [PATCH 3/3] pm_qos: get rid of the allocation in pm_qos_add_request()
From: mark gross @ 2010-07-05 19:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: Takashi Iwai, Linux PM, markgross, netdev
In-Reply-To: <1278338568.2850.1.camel@mulgrave.site>

On Mon, Jul 05, 2010 at 09:02:48AM -0500, James Bottomley wrote:
> On Mon, 2010-07-05 at 08:41 +0200, Takashi Iwai wrote:
> > sorry for the late reply, as I've been on vacation in the last week
> > (and shut off mails intentionally :)
> 
> Envy forbids me from saying that's OK.
> 
> > At Mon, 28 Jun 2010 12:44:48 -0500,
> > James Bottomley wrote:
> > > 
> > > Since every caller has to squirrel away the returned pointer anyway,
> > > they might as well supply the memory area.  This fixes a bug in a few of
> > > the call sites where the returned pointer was dereferenced without
> > > checking it for NULL (which gets returned if the kzalloc failed).
> > > 
> > > I'd like to hear how sound and netdev feels about this: it will add
> > > about two more pointers worth of data to struct netdev and struct
> > > snd_pcm_substream .. but I think it's worth it.  If you're OK, I'll add
> > > your acks and send through the pm tree.
> > > 
> > > This also looks to me like an android independent clean up (even though
> > > it renders the request_add atomically callable).  I also added include
> > > guards to include/linux/pm_qos_params.h
> > 
> > I like the patch very well, too.
> > But, just wondering...
> > 
> > > @@ -262,6 +260,11 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> > >  	if (!pm_qos_req) /*guard against callers passing in null */
> > >  		return;
> > >  
> > > +	if (pm_qos_request_active(pm_qos_req)) {
> > > +		WARN(1, KERN_ERR "pm_qos_update_request() called for unknown object\n");
> > > +		return;
> > > +	}
> > > +
> > 
> > Is this correct...?  Shouldn't it be a negative check?
> 
> Yes, it should be a negative check ... I'll update the patch.  I guess
> this still means that no-one has managed to test it on a functional
> system ...
> 

well I guess that explains the warning I got on my back port of this
patch.


[   62.944788] ------------[ cut here ]------------                                        
[   62.944788] WARNING: at kernel/pm_qos_params.c:266                                      
pm_qos_update_request+0x21/0x46)                                                           
[   62.944788] pm_qos_update_request() called for unknown object                           
[   62.944788] Modules linked in: mrst_sspi cfspi_slave chnl_chr                           
caif_chr chnl_net caf                                                                      
[   62.944788] Pid: 91, comm: mrst/0 Tainted: G        W  2.6.31.6-mrst
#30                
[   62.944788] Call Trace:                                                                 
[   62.944788]  [<c0145b2e>] ? pm_qos_update_request+0x21/0x46                             
[   62.944788]  [<c012fff3>] warn_slowpath_common+0x60/0x77                                
[   62.944788]  [<c013003e>] warn_slowpath_fmt+0x24/0x27                                   
[   62.944788]  [<c0145b2e>] pm_qos_update_request+0x21/0x46                               
[   62.944788]  [<c03029f2>] int_transfer_complete_work+0x19/0x65                          
[   62.944788]  [<c013f02a>] worker_thread+0x153/0x1df                                     
[   62.944788]  [<c03029d9>] ? int_transfer_complete_work+0x0/0x65                         
[   62.944788]  [<c0141df1>] ? autoremove_wake_function+0x0/0x30                           
[   62.944788]  [<c0141c7c>] kthread+0x64/0x69                                             
[   62.944788]  [<c013eed7>] ? worker_thread+0x0/0x1df                                     
[   62.944788]  [<c0141c18>] ? kthread+0x0/0x69                                            
[   62.944788]  [<c01034df>] kernel_thread_helper+0x7/0x10                                 
[   62.944788] ---[ end trace 1723851b79e06c5d ]---                                        
[   62.944788] ------------[ cut here ]------------                                        
[   62.944788] WARNING: at kernel/pm_qos_params.c:266          

Sorry, I've been swamped by work and personal things the past 2 weeks.

--mgross


^ permalink raw reply

* bridge br_multicast: BUG: unable to handle kernel NULL pointer dereference
From: Frank Arnold @ 2010-07-05 19:05 UTC (permalink / raw)
  To: Stephen Hemminger, YOSHIFUJI Hideaki, Herbert Xu; +Cc: netdev

Hi,

we see a kernel NULL pointer dereference during testing of the KVM tree,
currently based on 2.6.35-rc3. We are using bridge to connect the KVM
guests through the hosts network interface. Here is the trace:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000028                                               
IP: [<ffffffffa0196da0>] __br_ip4_hash+0x0/0x7c [bridge]                                                                
PGD 0                                                                                                                   
Oops: 0000 [#1] SMP                                                                                                     
last sysfs file: /sys/module/lockd/initstate                                                                            
CPU 3                                                                                                                   
Modules linked in: nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc bridge stp ipv6 kvm_amd kvm snd_hda_codec_atihdmi 
snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd pcspkr serio_raw ata_generic r8169 so
undcore i2c_piix4 pata_acpi i2c_core joydev snd_page_alloc mii pata_atiixp shpchp [last unloaded: scsi_wait_scan]       
                                                                                                                        
Pid: 0, comm: swapper Not tainted 2.6.35.20100705_8dea564-1.fc11.osrc.x86_64 #1 GA-MA74GM-S2H/GA-MA74GM-S2H             
RIP: 0010:[<ffffffffa0196da0>]  [<ffffffffa0196da0>] __br_ip4_hash+0x0/0x7c [bridge]                                    
RSP: 0018:ffff880001b838a8  EFLAGS: 00010246                                                                            
RAX: ffff880126028000 RBX: 0000000000000000 RCX: ffff880127b3a828                                                       
RDX: 0000000001b80008 RSI: 0000000064ffffef RDI: 0000000000000000                                                       
RBP: ffff880001b838b0 R08: ffff8800054c3870 R09: 0000000000000000                                                       
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001b83a00                                                       
R13: ffff880001b83a00 R14: ffff880127b3a800 R15: ffff880125ccc400                                                       
FS:  00007f17d45ea6f0(0000) GS:ffff880001b80000(0000) knlGS:0000000000000000                                            
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                                                                       
CR2: 0000000000000028 CR3: 00000000016b0000 CR4: 00000000000006e0                                                       
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                       
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400                                                       
Process swapper (pid: 0, threadinfo ffff880127ab4000, task ffff880127ab96b0)                                            
Stack:                                                                                                                  
 ffffffffa0196f48 ffff880001b838d0 ffffffffa01970be ffff880126028640                                                    
<0> ffff880125ccc400 ffff880001b83910 ffffffffa0197511 ffff880001b83900                                                 
<0> ffff880127b3a800 ffff8800054c3868 ffff880126028640 ffff880127b3a800                                                 
Call Trace:                                                                                                             
 <IRQ>                                                                                                                  
 [<ffffffffa0196f48>] ? br_ip_hash+0x1f/0x28 [bridge]                                                                   
 [<ffffffffa01970be>] br_mdb_ip_get+0x12/0x24 [bridge]                                                                  
 [<ffffffffa0197511>] br_multicast_leave_group+0x62/0x160 [bridge]                                                      
 [<ffffffffa0199028>] br_multicast_rcv+0x60e/0xcda [bridge]                                                             
 [<ffffffff81043320>] ? local_bh_enable_ip+0x9/0xb                                                                      
 [<ffffffff81369f85>] ? _raw_spin_unlock_bh+0xf/0x11                                                                    
 [<ffffffff812f9a1a>] ? packet+0x1a/0x24                                                                                
 [<ffffffff812f777b>] ? nf_conntrack_in+0x4ee/0x59f                                                                     
 [<ffffffffa01907d5>] ? fdb_create+0x28/0x73 [bridge]                                                                   
 [<ffffffffa0190945>] ? br_fdb_update+0x125/0x134 [bridge]                                                              
 [<ffffffffa0191e74>] br_handle_frame_finish+0x6d/0x1ba [bridge]                                                        
 [<ffffffffa0191e07>] ? br_handle_frame_finish+0x0/0x1ba [bridge]                                                       
 [<ffffffffa0195c79>] NF_HOOK_THRESH+0x46/0x4d [bridge]                                                                 
 [<ffffffffa0195ed2>] ? nf_bridge_push_encap_header+0x2f/0x3c [bridge]                                                  
 [<ffffffffa0196c65>] br_nf_pre_routing_finish+0x222/0x231 [bridge]                                                     
 [<ffffffff812f4a10>] ? nf_hook_slow+0x65/0xc6                                                                          
 [<ffffffffa0196a43>] ? br_nf_pre_routing_finish+0x0/0x231 [bridge]                                                     
 [<ffffffffa0196a43>] ? br_nf_pre_routing_finish+0x0/0x231 [bridge]                                                     
 [<ffffffffa0195c79>] NF_HOOK_THRESH+0x46/0x4d [bridge]                                                                 
 [<ffffffffa019609a>] ? nf_bridge_alloc+0x1d/0x3a [bridge]                                                              
 [<ffffffffa0196a26>] br_nf_pre_routing+0x550/0x56d [bridge]                                                            
 [<ffffffff812f4968>] nf_iterate+0x41/0x84                                                                              
 [<ffffffffa0191e07>] ? br_handle_frame_finish+0x0/0x1ba [bridge]                                                       
 [<ffffffff812f4a10>] nf_hook_slow+0x65/0xc6                                                                            
 [<ffffffffa0191e07>] ? br_handle_frame_finish+0x0/0x1ba [bridge]                                                       
 [<ffffffffa0191e07>] ? br_handle_frame_finish+0x0/0x1ba [bridge]                                                       
 [<ffffffffa0191df5>] NF_HOOK.clone.0+0x41/0x53 [bridge]                                                                
 [<ffffffffa0192137>] br_handle_frame+0x176/0x18f [bridge]                                                              
 [<ffffffff812d54e5>] __netif_receive_skb+0x2b0/0x3f5                                                                   
 [<ffffffff810592d2>] ? ktime_get_real+0x11/0x3e                                                                        
 [<ffffffff812d612c>] netif_receive_skb+0x52/0x59                                                                       
 [<ffffffff812d0ce6>] ? __netdev_alloc_skb+0x2f/0x4b                                                                    
 [<ffffffffa0054ff1>] rtl8169_rx_interrupt+0x385/0x4d6 [r8169]                                                          
 [<ffffffff81222203>] ? scsi_next_command+0x3e/0x46                                                                     
 [<ffffffff812354b3>] ? __ata_qc_complete+0xdf/0xe7                                                                     
 [<ffffffffa0057614>] rtl8169_poll+0x37/0x1a1 [r8169]                                                                   
 [<ffffffff812d62ed>] net_rx_action+0xab/0x18c                                                                          
 [<ffffffffa00565f4>] ? rtl8169_interrupt+0x2cb/0x36e [r8169]                                                           
 [<ffffffff81043446>] __do_softirq+0x97/0x125                                                                           
 [<ffffffff8101a026>] ? ack_apic_level+0x78/0x1ce                                                                       
 [<ffffffff810038dc>] call_softirq+0x1c/0x28                                                                            
 [<ffffffff81004e61>] do_softirq+0x41/0x7e                                                                              
 [<ffffffff810431ce>] irq_exit+0x36/0x78                                                                                
 [<ffffffff8100459c>] do_IRQ+0xa7/0xbe                                                                                  
 [<ffffffff8136a1d3>] ret_from_intr+0x0/0x11                                                                            
 <EOI>                                                                                                                  
 [<ffffffff8102036c>] ? native_safe_halt+0x6/0x8                                                                        
 [<ffffffff8136d161>] ? atomic_notifier_call_chain+0x13/0x15                                                            
 [<ffffffff81009696>] default_idle+0x27/0x44                                                                            
 [<ffffffff81001d3a>] cpu_idle+0x58/0x93                                                                                
 [<ffffffff81364944>] start_secondary+0x1a4/0x1a8                                                                       
Code: 7e 66 81 fa 81 00 74 0d 31 c0 66 81 fa 88 64 0f 94 c0 c1 e0 03 89 c2 48 29 93 e0 00 00 00 01 43 68 31 c0 5b 41 5c 
c9 c3 90 90 90 <8b> 47 28 89 f1 ba b9 79 37 9e c1 e9 0d 29 f2 55 29 f0 48 89 e5                                         
RIP  [<ffffffffa0196da0>] __br_ip4_hash+0x0/0x7c [bridge]                                                               
 RSP <ffff880001b838a8>                                                                                                 
CR2: 0000000000000028                                                                                                   
---[ end trace c0f05a4e3727475d ]---                                                                                    
Kernel panic - not syncing: Fatal exception in interrupt                                                                


-- 
Frank Arnold 
Systems Design Technician, Software Test
AMD Operating System Research Center
Dresden, Germany
Tel: +49 351 448 356702


Legal Information:
Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach b. München

Geschäftsführer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis München
Registergericht München, HRB Nr. 43632



^ permalink raw reply

* [PATCH net-next-2.6 V2] net:  fix 64 bit counters on 32 bit arches
From: Eric Dumazet @ 2010-07-05 20:05 UTC (permalink / raw)
  To: Ben Hutchings, David Miller
  Cc: Stephen Hemminger, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <1278354682.2877.639.camel@edumazet-laptop>

Le lundi 05 juillet 2010 à 20:31 +0200, Eric Dumazet a écrit :

> One other way would be to add a rtnl_link_stats64 param to
> ndo_get_stats64() method, and ask drivers to copy their stats in this
> zone, instead of returning &dev->stats64 or something...
> 
> And also change dev_get_stats() with this new parameter.
> 
> Each caller would use a private copy, with no risk of concurrent
> updates.
> 
> 

Here is an updated patch, without added lock but temp storage.

Thanks

[PATCH net-next-2.6 V2] net: fix 64 bit counters on 32 bit arches

There is a small possibility that a reader gets incorrect values on 32
bit arches. SNMP applications could catch incorrect counters when a
32bit high part is changed by another stats consumer/provider.

One way to solve this is to add a rtnl_link_stats64 param to all 
ndo_get_stats64() methods, and also add such a parameter to
dev_get_stats().

Rule is that we are not allowed to use dev->stats64 as a temporary
storage for 64bit stats, but a caller provided area (usually on stack)

Old drivers (only providing get_stats() method) need no changes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/bonding/bond_main.c   |   64 +++++++++++++---------------
 drivers/net/ixgbe/ixgbe_ethtool.c |    8 ++-
 drivers/net/loopback.c            |    4 -
 drivers/net/macvlan.c             |    6 +-
 drivers/net/sfc/efx.c             |    3 -
 drivers/net/sfc/ethtool.c         |    3 -
 include/linux/netdevice.h         |   12 +++--
 net/8021q/vlan_dev.c              |    6 --
 net/8021q/vlanproc.c              |    3 -
 net/bridge/br_device.c            |    4 -
 net/core/dev.c                    |   25 +++++++---
 net/core/net-sysfs.c              |    4 +
 net/core/rtnetlink.c              |    3 -
 13 files changed, 79 insertions(+), 66 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a95a41b..9bb9bfa 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3804,51 +3804,49 @@ static int bond_close(struct net_device *bond_dev)
 	return 0;
 }
 
-static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev)
+static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
+						struct rtnl_link_stats64 *stats)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
-	struct rtnl_link_stats64 *stats = &bond_dev->stats64;
-	struct rtnl_link_stats64 local_stats;
+	struct rtnl_link_stats64 temp;
 	struct slave *slave;
 	int i;
 
-	memset(&local_stats, 0, sizeof(local_stats));
+	memset(stats, 0, sizeof(*stats));
 
 	read_lock_bh(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
 		const struct rtnl_link_stats64 *sstats =
-			dev_get_stats(slave->dev);
-
-		local_stats.rx_packets += sstats->rx_packets;
-		local_stats.rx_bytes += sstats->rx_bytes;
-		local_stats.rx_errors += sstats->rx_errors;
-		local_stats.rx_dropped += sstats->rx_dropped;
-
-		local_stats.tx_packets += sstats->tx_packets;
-		local_stats.tx_bytes += sstats->tx_bytes;
-		local_stats.tx_errors += sstats->tx_errors;
-		local_stats.tx_dropped += sstats->tx_dropped;
-
-		local_stats.multicast += sstats->multicast;
-		local_stats.collisions += sstats->collisions;
-
-		local_stats.rx_length_errors += sstats->rx_length_errors;
-		local_stats.rx_over_errors += sstats->rx_over_errors;
-		local_stats.rx_crc_errors += sstats->rx_crc_errors;
-		local_stats.rx_frame_errors += sstats->rx_frame_errors;
-		local_stats.rx_fifo_errors += sstats->rx_fifo_errors;
-		local_stats.rx_missed_errors += sstats->rx_missed_errors;
-
-		local_stats.tx_aborted_errors += sstats->tx_aborted_errors;
-		local_stats.tx_carrier_errors += sstats->tx_carrier_errors;
-		local_stats.tx_fifo_errors += sstats->tx_fifo_errors;
-		local_stats.tx_heartbeat_errors += sstats->tx_heartbeat_errors;
-		local_stats.tx_window_errors += sstats->tx_window_errors;
+			dev_get_stats(slave->dev, &temp);
+
+		stats->rx_packets += sstats->rx_packets;
+		stats->rx_bytes += sstats->rx_bytes;
+		stats->rx_errors += sstats->rx_errors;
+		stats->rx_dropped += sstats->rx_dropped;
+
+		stats->tx_packets += sstats->tx_packets;
+		stats->tx_bytes += sstats->tx_bytes;
+		stats->tx_errors += sstats->tx_errors;
+		stats->tx_dropped += sstats->tx_dropped;
+
+		stats->multicast += sstats->multicast;
+		stats->collisions += sstats->collisions;
+
+		stats->rx_length_errors += sstats->rx_length_errors;
+		stats->rx_over_errors += sstats->rx_over_errors;
+		stats->rx_crc_errors += sstats->rx_crc_errors;
+		stats->rx_frame_errors += sstats->rx_frame_errors;
+		stats->rx_fifo_errors += sstats->rx_fifo_errors;
+		stats->rx_missed_errors += sstats->rx_missed_errors;
+
+		stats->tx_aborted_errors += sstats->tx_aborted_errors;
+		stats->tx_carrier_errors += sstats->tx_carrier_errors;
+		stats->tx_fifo_errors += sstats->tx_fifo_errors;
+		stats->tx_heartbeat_errors += sstats->tx_heartbeat_errors;
+		stats->tx_window_errors += sstats->tx_window_errors;
 	}
 
-	memcpy(stats, &local_stats, sizeof(struct net_device_stats));
-
 	read_unlock_bh(&bond->lock);
 
 	return stats;
diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 5275e9c..7e85e61 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -55,7 +55,7 @@ struct ixgbe_stats {
 				offsetof(struct ixgbe_adapter, m)
 #define IXGBE_NETDEV_STAT(m)	NETDEV_STATS, \
 				sizeof(((struct net_device *)0)->m), \
-				offsetof(struct net_device, m)
+				offsetof(struct net_device, m) - offsetof(struct net_device, stats)
 
 static struct ixgbe_stats ixgbe_gstrings_stats[] = {
 	{"rx_packets", IXGBE_NETDEV_STAT(stats.rx_packets)},
@@ -998,16 +998,18 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	u64 *queue_stat;
 	int stat_count = sizeof(struct ixgbe_queue_stats) / sizeof(u64);
+	struct rtnl_link_stats64 temp;
+	const struct rtnl_link_stats64 *net_stats;
 	int j, k;
 	int i;
 	char *p = NULL;
 
 	ixgbe_update_stats(adapter);
-	dev_get_stats(netdev);
+	net_stats = dev_get_stats(netdev, &temp);
 	for (i = 0; i < IXGBE_GLOBAL_STATS_LEN; i++) {
 		switch (ixgbe_gstrings_stats[i].type) {
 		case NETDEV_STATS:
-			p = (char *) netdev +
+			p = (char *) net_stats +
 					ixgbe_gstrings_stats[i].stat_offset;
 			break;
 		case IXGBE_STATS:
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 4dd0510..9a09967 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -98,10 +98,10 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
-static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev)
+static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev,
+						      struct rtnl_link_stats64 *stats)
 {
 	const struct pcpu_lstats __percpu *pcpu_lstats;
-	struct rtnl_link_stats64 *stats = &dev->stats64;
 	u64 bytes = 0;
 	u64 packets = 0;
 	u64 drops = 0;
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index e6d626e..6112f14 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -431,12 +431,12 @@ static void macvlan_uninit(struct net_device *dev)
 	free_percpu(vlan->rx_stats);
 }
 
-static struct rtnl_link_stats64 *macvlan_dev_get_stats64(struct net_device *dev)
+static struct rtnl_link_stats64 *macvlan_dev_get_stats64(struct net_device *dev,
+							 struct rtnl_link_stats64 *stats)
 {
-	struct rtnl_link_stats64 *stats = &dev->stats64;
 	struct macvlan_dev *vlan = netdev_priv(dev);
 
-	dev_txq_stats_fold(dev, &dev->stats);
+	dev_txq_stats_fold(dev, (struct net_device_stats *)stats);
 
 	if (vlan->rx_stats) {
 		struct macvlan_rx_stats *p, accum = {0};
diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 35b3f29..ba674c5 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1533,11 +1533,10 @@ static int efx_net_stop(struct net_device *net_dev)
 }
 
 /* Context: process, dev_base_lock or RTNL held, non-blocking. */
-static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
+static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev, struct rtnl_link_stats64 *stats)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_mac_stats *mac_stats = &efx->mac_stats;
-	struct rtnl_link_stats64 *stats = &net_dev->stats64;
 
 	spin_lock_bh(&efx->stats_lock);
 	efx->type->update_stats(efx);
diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index 3b8b0a0..fd19d6a 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -469,12 +469,13 @@ static void efx_ethtool_get_stats(struct net_device *net_dev,
 	struct efx_mac_stats *mac_stats = &efx->mac_stats;
 	struct efx_ethtool_stat *stat;
 	struct efx_channel *channel;
+	struct rtnl_link_stats64 temp;
 	int i;
 
 	EFX_BUG_ON_PARANOID(stats->n_stats != EFX_ETHTOOL_NUM_STATS);
 
 	/* Update MAC and NIC statistics */
-	dev_get_stats(net_dev);
+	dev_get_stats(net_dev, &temp);
 
 	/* Fill detailed statistics buffer */
 	for (i = 0; i < EFX_ETHTOOL_NUM_STATS; i++) {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4d27368..60de653 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -666,7 +666,8 @@ struct netdev_rx_queue {
  *	Callback uses when the transmitter has not made any progress
  *	for dev->watchdog ticks.
  *
- * struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev);
+ * struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev
+ *                      struct rtnl_link_stats64 *storage);
  * struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
  *	Called when a user wants to get the network device usage
  *	statistics. Drivers must do one of the following:
@@ -733,7 +734,8 @@ struct net_device_ops {
 						   struct neigh_parms *);
 	void			(*ndo_tx_timeout) (struct net_device *dev);
 
-	struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev);
+	struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev,
+						     struct rtnl_link_stats64 *storage);
 	struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
 
 	void			(*ndo_vlan_rx_register)(struct net_device *dev,
@@ -2139,8 +2141,10 @@ extern void		netdev_features_change(struct net_device *dev);
 /* Load a device via the kmod */
 extern void		dev_load(struct net *net, const char *name);
 extern void		dev_mcast_init(void);
-extern const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev);
-extern void		dev_txq_stats_fold(const struct net_device *dev, struct net_device_stats *stats);
+extern const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
+						     struct rtnl_link_stats64 *storage);
+extern void		dev_txq_stats_fold(const struct net_device *dev,
+					   struct net_device_stats *stats);
 
 extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index c6456cb..7865a4c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -803,11 +803,9 @@ static u32 vlan_ethtool_get_flags(struct net_device *dev)
 	return dev_ethtool_get_flags(vlan->real_dev);
 }
 
-static struct rtnl_link_stats64 *vlan_dev_get_stats64(struct net_device *dev)
+static struct rtnl_link_stats64 *vlan_dev_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
-	struct rtnl_link_stats64 *stats = &dev->stats64;
-
-	dev_txq_stats_fold(dev, &dev->stats);
+	dev_txq_stats_fold(dev, (struct net_device_stats *)stats);
 
 	if (vlan_dev_info(dev)->vlan_rx_stats) {
 		struct vlan_rx_stats *p, accum = {0};
diff --git a/net/8021q/vlanproc.c b/net/8021q/vlanproc.c
index df56f5c..80e280f 100644
--- a/net/8021q/vlanproc.c
+++ b/net/8021q/vlanproc.c
@@ -278,6 +278,7 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset)
 {
 	struct net_device *vlandev = (struct net_device *) seq->private;
 	const struct vlan_dev_info *dev_info = vlan_dev_info(vlandev);
+	struct rtnl_link_stats64 temp;
 	const struct rtnl_link_stats64 *stats;
 	static const char fmt[] = "%30s %12lu\n";
 	static const char fmt64[] = "%30s %12llu\n";
@@ -286,7 +287,7 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset)
 	if (!is_vlan_dev(vlandev))
 		return 0;
 
-	stats = dev_get_stats(vlandev);
+	stats = dev_get_stats(vlandev, &temp);
 	seq_printf(seq,
 		   "%s  VID: %d	 REORDER_HDR: %i  dev->priv_flags: %hx\n",
 		   vlandev->name, dev_info->vlan_id,
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index edf639e..075c435 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -98,10 +98,10 @@ static int br_dev_stop(struct net_device *dev)
 	return 0;
 }
 
-static struct rtnl_link_stats64 *br_get_stats64(struct net_device *dev)
+static struct rtnl_link_stats64 *br_get_stats64(struct net_device *dev,
+						struct rtnl_link_stats64 *stats)
 {
 	struct net_bridge *br = netdev_priv(dev);
-	struct rtnl_link_stats64 *stats = &dev->stats64;
 	struct br_cpu_netstats tmp, sum = { 0 };
 	unsigned int cpu;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 93b8929..b6d570a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3703,7 +3703,8 @@ void dev_seq_stop(struct seq_file *seq, void *v)
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-	const struct rtnl_link_stats64 *stats = dev_get_stats(dev);
+	struct rtnl_link_stats64 temp;
+	const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
 	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
 		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
@@ -5281,23 +5282,29 @@ EXPORT_SYMBOL(dev_txq_stats_fold);
 /**
  *	dev_get_stats	- get network device statistics
  *	@dev: device to get statistics from
+ *	@storage: place to store stats
  *
  *	Get network statistics from device. The device driver may provide
  *	its own method by setting dev->netdev_ops->get_stats64 or
  *	dev->netdev_ops->get_stats; otherwise the internal statistics
  *	structure is used.
  */
-const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev)
+const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
+					      struct rtnl_link_stats64 *storage)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
-	if (ops->ndo_get_stats64)
-		return ops->ndo_get_stats64(dev);
-	if (ops->ndo_get_stats)
-		return (struct rtnl_link_stats64 *)ops->ndo_get_stats(dev);
-
-	dev_txq_stats_fold(dev, &dev->stats);
-	return &dev->stats64;
+	if (ops->ndo_get_stats64) {
+		memset(storage, 0, sizeof(*storage));
+		return ops->ndo_get_stats64(dev, storage);
+	}
+	if (ops->ndo_get_stats) {
+		memcpy(storage, ops->ndo_get_stats(dev), sizeof(*storage));
+		return storage;
+	}
+	memcpy(storage, &dev->stats, sizeof(*storage));
+	dev_txq_stats_fold(dev, (struct net_device_stats *)storage);
+	return storage;
 }
 EXPORT_SYMBOL(dev_get_stats);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index ea3bb4c..914f42b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -330,7 +330,9 @@ static ssize_t netstat_show(const struct device *d,
 
 	read_lock(&dev_base_lock);
 	if (dev_isalive(dev)) {
-		const struct rtnl_link_stats64 *stats = dev_get_stats(dev);
+		struct rtnl_link_stats64 temp;
+		const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
+
 		ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *) stats) + offset));
 	}
 	read_unlock(&dev_base_lock);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e645778..5e773ea 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -791,6 +791,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 {
 	struct ifinfomsg *ifm;
 	struct nlmsghdr *nlh;
+	struct rtnl_link_stats64 temp;
 	const struct rtnl_link_stats64 *stats;
 	struct nlattr *attr;
 
@@ -847,7 +848,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	if (attr == NULL)
 		goto nla_put_failure;
 
-	stats = dev_get_stats(dev);
+	stats = dev_get_stats(dev, &temp);
 	copy_rtnl_link_stats(nla_data(attr), stats);
 
 	attr = nla_reserve(skb, IFLA_STATS64,



^ permalink raw reply related

* Re: Fwd: Possible bug in net/ipv4/route.c?
From: Henrique de Moraes Holschuh @ 2010-07-05 20:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, yoshfuji, netdev, linux-kernel, Stephen Hemminger
In-Reply-To: <1278336898.2877.212.camel@edumazet-laptop>

On Mon, 05 Jul 2010, Eric Dumazet wrote:
> Le lundi 05 juillet 2010 à 21:22 +0800, Herbert Xu a écrit :
> > On Mon, Jul 05, 2010 at 02:59:14PM +0200, Eric Dumazet wrote:
> > >
> > > Why do we clear full 48 bytes skb->cb[] in skb_alloc(), if no protocol
> > > stack should rely it being zero ?
> > 
> > Unless a protocol is allocating the skb itself, then the fact
> > that skb_alloc clears skb->cb is no guarantee that the skb->cb
> > will be zero.
> 
> I see. We could :
> 
> Avoid this memset(skb->cb, 0, sizeof(skb->cb)) in fastpath.

Any chances of skb->cb being leaked to userspace or the network, due to
driver bugs or other such oddities?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply

* Re: Fwd: Possible bug in net/ipv4/route.c?
From: Eric Dumazet @ 2010-07-05 20:18 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Herbert Xu, yoshfuji, netdev, linux-kernel, Stephen Hemminger
In-Reply-To: <20100705200728.GB11096@khazad-dum.debian.net>

Le lundi 05 juillet 2010 à 17:07 -0300, Henrique de Moraes Holschuh a
écrit :
> On Mon, 05 Jul 2010, Eric Dumazet wrote:
> > Le lundi 05 juillet 2010 à 21:22 +0800, Herbert Xu a écrit :
> > > On Mon, Jul 05, 2010 at 02:59:14PM +0200, Eric Dumazet wrote:
> > > >
> > > > Why do we clear full 48 bytes skb->cb[] in skb_alloc(), if no protocol
> > > > stack should rely it being zero ?
> > > 
> > > Unless a protocol is allocating the skb itself, then the fact
> > > that skb_alloc clears skb->cb is no guarantee that the skb->cb
> > > will be zero.
> > 
> > I see. We could :
> > 
> > Avoid this memset(skb->cb, 0, sizeof(skb->cb)) in fastpath.
> 
> Any chances of skb->cb being leaked to userspace or the network, due to
> driver bugs or other such oddities?
> 

Not "a priori", but a bug is always possible ;)

cb[] is internal use only, should not be sent to network or user land.




^ permalink raw reply

* Re: [linux-pm] [PATCH 3/3] pm_qos: get rid of the allocation in pm_qos_add_request()
From: Rafael J. Wysocki @ 2010-07-05 21:07 UTC (permalink / raw)
  To: linux-pm; +Cc: James Bottomley, Takashi Iwai, netdev, markgross
In-Reply-To: <1278338568.2850.1.camel@mulgrave.site>

On Monday, July 05, 2010, James Bottomley wrote:
> On Mon, 2010-07-05 at 08:41 +0200, Takashi Iwai wrote:
> > sorry for the late reply, as I've been on vacation in the last week
> > (and shut off mails intentionally :)
> 
> Envy forbids me from saying that's OK.
> 
> > At Mon, 28 Jun 2010 12:44:48 -0500,
> > James Bottomley wrote:
> > > 
> > > Since every caller has to squirrel away the returned pointer anyway,
> > > they might as well supply the memory area.  This fixes a bug in a few of
> > > the call sites where the returned pointer was dereferenced without
> > > checking it for NULL (which gets returned if the kzalloc failed).
> > > 
> > > I'd like to hear how sound and netdev feels about this: it will add
> > > about two more pointers worth of data to struct netdev and struct
> > > snd_pcm_substream .. but I think it's worth it.  If you're OK, I'll add
> > > your acks and send through the pm tree.
> > > 
> > > This also looks to me like an android independent clean up (even though
> > > it renders the request_add atomically callable).  I also added include
> > > guards to include/linux/pm_qos_params.h
> > 
> > I like the patch very well, too.
> > But, just wondering...
> > 
> > > @@ -262,6 +260,11 @@ void pm_qos_update_request(struct pm_qos_request_list *pm_qos_req,
> > >  	if (!pm_qos_req) /*guard against callers passing in null */
> > >  		return;
> > >  
> > > +	if (pm_qos_request_active(pm_qos_req)) {
> > > +		WARN(1, KERN_ERR "pm_qos_update_request() called for unknown object\n");
> > > +		return;
> > > +	}
> > > +
> > 
> > Is this correct...?  Shouldn't it be a negative check?
> 
> Yes, it should be a negative check ... I'll update the patch.

I've already fixed it in my tree.

> I guess this still means that no-one has managed to test it on a functional
> system ...

Well, it's been for a while in linux-next ...

Rafael

^ permalink raw reply

* Re: bnx2/5709: Strange interrupts spread
From: Rick Jones @ 2010-07-05 22:56 UTC (permalink / raw)
  To: Christophe Ngo Van Duc; +Cc: netdev@vger.kernel.org
In-Reply-To: <AANLkTillLjztYe7mmyO_LoTXavDtTd2nsaAsd22z0dG-@mail.gmail.com>

Christophe Ngo Van Duc wrote:
> Is it a requirement that the interface has an IP address for the TX
> and RX hash to work?

At the risk of typing into Michael's keyboard, chances are, what the NIC does is 
follow Microsoft's specs for RSS, which as far as I can tell, discusses hashing 
either just on the IP addresses (v4 or v6) or the IP addresses and TCP port 
numbers, so unless Broadcom has an enhancement/extension, if the traffic 
arriving is not TCP/IP and not multiple connections, it probably does not get 
spread-out.

rick jones

^ permalink raw reply

* [net-next PATCH 0/6] qlge: fixes for qlge.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer


1) Restore promiscuous setting after reset.
2) Don't use firmware when forcing firmware dump.
3) Reduce print level in data path statements.
4) Fix possible endian issue for rx UDP csum.
5) Make adapter drop frame errors and pass up csum errors.
6) Change version to v1.00.00.25.00.00-01.


^ permalink raw reply

* [net-next PATCH 4/6] qlge: Fix possible endian issue for rx UDP csum.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index e39451b..a41b6b5 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -1677,7 +1677,7 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 			/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				cpu_to_be16(IP_MF|IP_OFFSET))) {
+				ntohs(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1997,7 +1997,7 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 		/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				cpu_to_be16(IP_MF|IP_OFFSET))) {
+				ntohs(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG, qdev->ndev,
 					     "TCP checksum done!\n");
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 6/6] qlge: Change version to v1.00.00.25.00.00-01.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>


Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 27f83d6..06b2188 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -16,7 +16,7 @@
  */
 #define DRV_NAME  	"qlge"
 #define DRV_STRING 	"QLogic 10 Gigabit PCI-E Ethernet Driver "
-#define DRV_VERSION	"v1.00.00.23.00.00-01"
+#define DRV_VERSION	"v1.00.00.25.00.00-01"
 
 #define PFX "qlge: "
 
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 3/6] qlge: Reduce print level in data path statements.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index e99c2c6..e39451b 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -1521,7 +1521,7 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
 
 	/* Frame error, so drop the packet. */
 	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_err(qdev, drv, qdev->ndev,
+		netif_info(qdev, drv, qdev->ndev,
 			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
 		rx_ring->rx_errors++;
 		goto err_out;
@@ -1618,7 +1618,7 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 
 	/* Frame error, so drop the packet. */
 	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_err(qdev, drv, qdev->ndev,
+		netif_info(qdev, drv, qdev->ndev,
 			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
 		dev_kfree_skb_any(skb);
 		rx_ring->rx_errors++;
@@ -1939,7 +1939,7 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 
 	/* Frame error, so drop the packet. */
 	if (ib_mac_rsp->flags2 & IB_MAC_IOCB_RSP_ERR_MASK) {
-		netif_err(qdev, drv, qdev->ndev,
+		netif_info(qdev, drv, qdev->ndev,
 			  "Receive error, flags2 = 0x%x\n", ib_mac_rsp->flags2);
 		dev_kfree_skb_any(skb);
 		rx_ring->rx_errors++;
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 5/6] qlge: Make adapter drop frame errors and pass up csum errors.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>

Statistics are available and the driver doesn't need the actual frame.
This TCP/UDP and IP headers checksum errors will still be passed to the
driver.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge_main.c |   30 ++++++++++++++++++++++++++++--
 1 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index a41b6b5..dd9e86c 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -574,6 +574,22 @@ static int ql_set_routing_reg(struct ql_adapter *qdev, u32 index, u32 mask,
 			    (RT_IDX_ALL_ERR_SLOT << RT_IDX_IDX_SHIFT);/* index */
 			break;
 		}
+	case RT_IDX_IP_CSUM_ERR: /* Pass up IP CSUM error frames. */
+		{
+			value = RT_IDX_DST_DFLT_Q | /* dest */
+				RT_IDX_TYPE_NICQ | /* type */
+				(RT_IDX_IP_CSUM_ERR_SLOT <<
+				RT_IDX_IDX_SHIFT); /* index */
+			break;
+		}
+	case RT_IDX_TU_CSUM_ERR: /* Pass up TCP/UDP CSUM error frames. */
+		{
+			value = RT_IDX_DST_DFLT_Q | /* dest */
+				RT_IDX_TYPE_NICQ | /* type */
+				(RT_IDX_TCP_UDP_CSUM_ERR_SLOT <<
+				RT_IDX_IDX_SHIFT); /* index */
+			break;
+		}
 	case RT_IDX_BCAST:	/* Pass up Broadcast frames to default Q. */
 		{
 			value = RT_IDX_DST_DFLT_Q |	/* dest */
@@ -3587,10 +3603,20 @@ static int ql_route_initialize(struct ql_adapter *qdev)
 	if (status)
 		return status;
 
-	status = ql_set_routing_reg(qdev, RT_IDX_ALL_ERR_SLOT, RT_IDX_ERR, 1);
+	status = ql_set_routing_reg(qdev, RT_IDX_IP_CSUM_ERR_SLOT,
+						RT_IDX_IP_CSUM_ERR, 1);
+	if (status) {
+		netif_err(qdev, ifup, qdev->ndev,
+			"Failed to init routing register "
+			"for IP CSUM error packets.\n");
+		goto exit;
+	}
+	status = ql_set_routing_reg(qdev, RT_IDX_TCP_UDP_CSUM_ERR_SLOT,
+						RT_IDX_TU_CSUM_ERR, 1);
 	if (status) {
 		netif_err(qdev, ifup, qdev->ndev,
-			  "Failed to init routing register for error packets.\n");
+			"Failed to init routing register "
+			"for TCP/UDP CSUM error packets.\n");
 		goto exit;
 	}
 	status = ql_set_routing_reg(qdev, RT_IDX_BCAST_SLOT, RT_IDX_BCAST, 1);
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 2/6] qlge: Don't use firmware when forcing firmware dump.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>

In some cases the firmware may be dead.  Instead we dump the firmware
parameters and then restart it.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h     |    1 -
 drivers/net/qlge/qlge_dbg.c |    7 +------
 drivers/net/qlge/qlge_mpi.c |   17 -----------------
 3 files changed, 1 insertions(+), 24 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index 01b0634..27f83d6 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -2227,7 +2227,6 @@ int ql_dump_risc_ram_area(struct ql_adapter *qdev, void *buf,
 		u32 ram_addr, int word_count);
 int ql_core_dump(struct ql_adapter *qdev,
 		struct ql_mpi_coredump *mpi_coredump);
-int ql_mb_sys_err(struct ql_adapter *qdev);
 int ql_mb_about_fw(struct ql_adapter *qdev);
 int ql_wol(struct ql_adapter *qdev);
 int ql_mb_wol_set_magic(struct ql_adapter *qdev, u32 enable_wol);
diff --git a/drivers/net/qlge/qlge_dbg.c b/drivers/net/qlge/qlge_dbg.c
index 68a1c9b..548e901 100644
--- a/drivers/net/qlge/qlge_dbg.c
+++ b/drivers/net/qlge/qlge_dbg.c
@@ -1237,12 +1237,7 @@ static void ql_get_core_dump(struct ql_adapter *qdev)
 			  "Force Coredump can only be done from interface that is up.\n");
 		return;
 	}
-
-	if (ql_mb_sys_err(qdev)) {
-		netif_err(qdev, ifup, qdev->ndev,
-			  "Fail force coredump with ql_mb_sys_err().\n");
-		return;
-	}
+	ql_queue_fw_error(qdev);
 }
 
 void ql_gen_reg_dump(struct ql_adapter *qdev,
diff --git a/drivers/net/qlge/qlge_mpi.c b/drivers/net/qlge/qlge_mpi.c
index 3c00462..f84e857 100644
--- a/drivers/net/qlge/qlge_mpi.c
+++ b/drivers/net/qlge/qlge_mpi.c
@@ -606,23 +606,6 @@ end:
 	return status;
 }
 
-int ql_mb_sys_err(struct ql_adapter *qdev)
-{
-	struct mbox_params mbc;
-	struct mbox_params *mbcp = &mbc;
-	int status;
-
-	memset(mbcp, 0, sizeof(struct mbox_params));
-
-	mbcp->in_count = 1;
-	mbcp->out_count = 0;
-
-	mbcp->mbox_in[0] = MB_CMD_MAKE_SYS_ERR;
-
-	status = ql_mailbox_command(qdev, mbcp);
-	return status;
-}
-
 /* Get MPI firmware version. This will be used for
  * driver banner and for ethtool info.
  * Returns zero on success.
-- 
1.6.0.2


^ permalink raw reply related

* [net-next PATCH 1/6] qlge: Restore promiscuous setting after reset.
From: Ron Mercer @ 2010-07-05 22:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, ron.mercer
In-Reply-To: <1278368382-23887-1-git-send-email-ron.mercer@qlogic.com>

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
---
 drivers/net/qlge/qlge.h      |    1 +
 drivers/net/qlge/qlge_main.c |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/qlge/qlge.h b/drivers/net/qlge/qlge.h
index bfb8b32..01b0634 100644
--- a/drivers/net/qlge/qlge.h
+++ b/drivers/net/qlge/qlge.h
@@ -2246,6 +2246,7 @@ netdev_tx_t ql_lb_send(struct sk_buff *skb, struct net_device *ndev);
 void ql_check_lb_frame(struct ql_adapter *, struct sk_buff *);
 int ql_own_firmware(struct ql_adapter *qdev);
 int ql_clean_lb_rx_ring(struct rx_ring *rx_ring, int budget);
+void qlge_set_multicast_list(struct net_device *ndev);
 
 #if 1
 #define QL_ALL_DUMP
diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index fa4b24c..e99c2c6 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -3919,6 +3919,11 @@ static int ql_adapter_up(struct ql_adapter *qdev)
 	if ((ql_read32(qdev, STS) & qdev->port_init) &&
 			(ql_read32(qdev, STS) & qdev->port_link_up))
 		ql_link_on(qdev);
+	/* Restore rx mode. */
+	clear_bit(QL_ALLMULTI, &qdev->flags);
+	clear_bit(QL_PROMISCUOUS, &qdev->flags);
+	qlge_set_multicast_list(qdev->ndev);
+
 	ql_enable_interrupts(qdev);
 	ql_enable_all_completion_interrupts(qdev);
 	netif_tx_start_all_queues(qdev->ndev);
@@ -4204,7 +4209,7 @@ static struct net_device_stats *qlge_get_stats(struct net_device
 	return &ndev->stats;
 }
 
-static void qlge_set_multicast_list(struct net_device *ndev)
+void qlge_set_multicast_list(struct net_device *ndev)
 {
 	struct ql_adapter *qdev = (struct ql_adapter *)netdev_priv(ndev);
 	struct netdev_hw_addr *ha;
-- 
1.6.0.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox