Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] rps: tcp: fix rps_sock_flow_table table updates
From: David Miller @ 2010-06-04 22:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1275591838.2533.37.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 21:03:58 +0200

> I believe a moderate SYN flood attack can corrupt RFS flow table
> (rps_sock_flow_table), making RPS/RFS much less effective.
> 
> Even in a normal situation, server handling short lived sessions suffer
> from bad steering for the first data packet of a session, if another SYN
> packet is received for another session.
> 
> We do following action in tcp_v4_rcv() :
> 
> 	sock_rps_save_rxhash(sk, skb->rxhash);
> 
> We should _not_ do this if sk is a LISTEN socket, as about each
> packet received on a LISTEN socket has a different rxhash than
> previous one.
>  -> RPS_NO_CPU markers are spread all over rps_sock_flow_table.
> 
> Also, it makes sense to protect sk->rxhash field changes with socket
> lock (We currently can change it even if user thread owns the lock
> and might use rxhash)
> 
> This patch moves sock_rps_save_rxhash() to a sock locked section,
> and only for non LISTEN sockets.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] tcp: use correct net ns in cookie_v4_check()
From: David Miller @ 2010-06-04 22:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1275579947.2456.80.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Jun 2010 17:45:47 +0200

> [PATCH] tcp: use correct net ns in cookie_v4_check()
> 
> Its better to make a route lookup in appropriate namespace.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks Eric!

^ permalink raw reply

* Re: [PATCH] syncookies: remove Kconfig text line about disabled-by-default
From: David Miller @ 2010-06-04 22:57 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1275561750-24111-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Thu,  3 Jun 2010 12:42:30 +0200

> syncookies default to on since
> e994b7c901ded7200b525a707c6da71f2cf6d4bb
> (tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL).
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied, thanks Florian.

^ permalink raw reply

* Re: 200 millisecond timeouts in TCP
From: Ryousei Takano @ 2010-06-04 22:55 UTC (permalink / raw)
  To: Satoru SATOH; +Cc: Ivan Novick, netdev
In-Reply-To: <20100604150202.GA8268@localhost.localdomain>

Hi satoru,

On Sat, Jun 5, 2010 at 12:02 AM, Satoru SATOH <satoru.satoh@gmail.com> wrote:
> It's possible to tune the RTO min value (rto_min) per route since
> 2.6.23+.  (see also the manual of iproute2, ip(8) )
>
Yeah, I forgot it.  Per route configuration is a nice idea compared
with global sysctls.

Thanks!
Ryousei

> - satoru
>
> On Fri, Jun 04, 2010 at 03:58:57PM +0900, Ryousei Takano wrote:
>> On Fri, Jun 4, 2010 at 7:37 AM, Ivan Novick <novickivan@gmail.com> wrote:
>> > Hello,
>> >
>> > Using tcpdump and systemtap I am seeing that sometimes retransmission
>> > of data is sent after waiting 200 milliseconds.  However sometimes
>> > retransmissions happen quicker.
>> >
>> > Is there a specifc event that causes these 200 milisec delays to kick
>> > in?  Are those events identifiable in netstat -s output?
>> >
>> > Also do you know if the timeout numbers for TCP are configurable parameters?
>> >
>> The minimum RTO value is fixed to 200 ms.  It is useful to make the min/max
>> RTO values tunable.  For example, reducing the minimum RTO value is effective
>> for TCP incast problem [1].  Of course, it may occur spurious retransmissions.
>>
>> [1] Vijay Vasudevan, et al, Safe and Effective Fine-grained TCP Retransmissions
>> for Datacenter Communication, SIGCOMM2009
>>
>> In Solaris, there are two tunable parameters: tcp_rexmit_interval_min/max.
>>
>> Do you have plan to introduce sysctl parameters like these to the Linux.
>>
>> Thanks,
>> Ryousei
>

^ permalink raw reply

* Re: Ethernet drivers wiki page
From: Stephen Hemminger @ 2010-06-04 22:23 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: netdev, H. Peter Anvin, Johannes Berg, linux-kernel
In-Reply-To: <AANLkTik1isTBolg4IDchOlmKqXOLjQoobZrJ-9bb8ZL5@mail.gmail.com>

On Fri, 4 Jun 2010 15:04:17 -0700
"Luis R. Rodriguez" <mcgrof@gmail.com> wrote:

> On Fri, Jun 4, 2010 at 2:52 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> > Where can I start adding documentation for our Ethernet drivers
> > upstream? I was hoping for a wiki but I don't think we have a netdev
> > one yet. Shall we create one or piggy back on something else? I'm
> > reviewing:
> >
> > https://wiki.kernel.org/
> >
> > FWIW, I want to see something like this:
> >
> > http://wireless.kernel.org/en/users/Drivers
> >
> > But for ethernet.
> 
> Since I've stuffed a few ethernet drivers on compat-wireless I might
> as well start carrying more. So now we'd have 802.11, Bluetooth and
> Ethernet all backported using the same framework, automatically, down
> to at least the oldest stable kernel supported listed on kernel.org
> and using all a generic compat module.
> 
> I want to document all this crap.

Why not Documentation/networking in kernel source? There is already
documentation there though much of it is out of date.
This has the advantage of having change log and matching kernel version.

^ permalink raw reply

* Re: Is CONFIG_IP_VS_IPV6 still/really dangerous?
From: Ferenc Wagner @ 2010-06-04 22:21 UTC (permalink / raw)
  To: lvs-users; +Cc: netdev, robert.gallagher, Julius Volz
In-Reply-To: <AANLkTikQPaHYuOOAiJtIft9NnOZ2S1LXTVMi_BLD8gau@mail.gmail.com>

Julius Volz <juliusv@google.com> writes:

> On Fri, Jun 4, 2010 at 3:56 PM, Ferenc Wagner <wferi@niif.hu> wrote:
>
>> In commit fab0de02fb0da83b90cec7fce4294747d86d5c6f CONFIG_IP_VS_IPV6 is
>> described as:
>>
>>    Add IPv6 support to IPVS. This is incomplete and might be dangerous.
>>
>> I agree its implementation is incomplete.  But I wonder if it's really
>> dangerous in the sense that generic distribution kernels shouldn't
>> enable it, because it can break unrelated (eg. IPv4 IPVS) functionality.
>>
>> What does that warning mean today?  Isn't it out of date?
>
> I wrote the IPv6 support back in the day, but never used it
> large-scale. Rob Gallagher from HEAnet was doing some bigger
> experiments with it, but I'm not sure how far it went. CCing him.
>
> There are probably some other people out there that have tested it
> extensively. Maybe try the lvs-users and lvs-devel mailing lists?

Sounds like a good idea!  So:

Dear lvs-users,

did you experience any breakage as a result of switching on
CONFIG_IP_VS_IPV6?  I mean apart from it having incomplete
functionality.  The gist of the question is whether this option
is suitable for generic distro kernels or not, cf. above.
-- 
Thanks for your time,
Feri.

^ permalink raw reply

* [PATCHv2 2/2] sfc: Implement 64-bit net device statistics on all architectures
From: Ben Hutchings @ 2010-06-04 22:06 UTC (permalink / raw)
  To: David Miller; +Cc: Stephen Hemminger, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <1275689093.2095.36.camel@achroite.uk.solarflarecom.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This is unchanged from v1.

Ben.

 drivers/net/sfc/efx.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 26b0cc2..8ad476a 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1492,11 +1492,11 @@ static int efx_net_stop(struct net_device *net_dev)
 }
 
 /* Context: process, dev_base_lock or RTNL held, non-blocking. */
-static struct net_device_stats *efx_net_stats(struct net_device *net_dev)
+static struct rtnl_link_stats64 *efx_net_stats(struct net_device *net_dev)
 {
 	struct efx_nic *efx = netdev_priv(net_dev);
 	struct efx_mac_stats *mac_stats = &efx->mac_stats;
-	struct net_device_stats *stats = &net_dev->stats;
+	struct rtnl_link_stats64 *stats = &net_dev->stats64;
 
 	spin_lock_bh(&efx->stats_lock);
 	efx->type->update_stats(efx);
@@ -1630,7 +1630,7 @@ static void efx_set_multicast_list(struct net_device *net_dev)
 static const struct net_device_ops efx_netdev_ops = {
 	.ndo_open		= efx_net_open,
 	.ndo_stop		= efx_net_stop,
-	.ndo_get_stats		= efx_net_stats,
+	.ndo_get_stats64	= efx_net_stats,
 	.ndo_tx_timeout		= efx_watchdog,
 	.ndo_start_xmit		= efx_hard_start_xmit,
 	.ndo_validate_addr	= eth_validate_addr,
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCHv2 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Ben Hutchings @ 2010-06-04 22:04 UTC (permalink / raw)
  To: David Miller; +Cc: Stephen Hemminger, Arnd Bergmann, netdev, linux-net-drivers

Use struct rtnl_link_stats64 as the statistics structure.

On 32-bit architectures, insert 32 bits of padding after/before each
field of struct net_device_stats to make its layout compatible with
struct rtnl_link_stats64.  Add an anonymous union in net_device; move
stats into the union and add struct rtnl_link_stats64 stats64.

Add net_device_ops::ndo_get_stats64, implementations of which will
return a pointer to struct rtnl_link_stats64.  Drivers that implement
this operation must not update the structure asynchronously.

Change dev_get_stats() to call ndo_get_stats64 if available, and to
return a pointer to struct rtnl_link_stats64.  Change callers of
dev_get_stats() accordingly.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This time without the horrible accessor macros.

Ben.

 include/linux/if_link.h   |    3 +-
 include/linux/netdevice.h |   91 ++++++++++++++++++++++++++------------------
 net/8021q/vlanproc.c      |   13 +++---
 net/core/dev.c            |   19 +++++----
 net/core/net-sysfs.c      |   12 +++---
 net/core/rtnetlink.c      |    6 +-
 6 files changed, 83 insertions(+), 61 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 85c812d..7fcad2e 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -4,7 +4,7 @@
 #include <linux/types.h>
 #include <linux/netlink.h>
 
-/* The struct should be in sync with struct net_device_stats */
+/* This struct should be in sync with struct rtnl_link_stats64 */
 struct rtnl_link_stats {
 	__u32	rx_packets;		/* total packets received	*/
 	__u32	tx_packets;		/* total packets transmitted	*/
@@ -37,6 +37,7 @@ struct rtnl_link_stats {
 	__u32	tx_compressed;
 };
 
+/* The main device statistics structure */
 struct rtnl_link_stats64 {
 	__u64	rx_packets;		/* total packets received	*/
 	__u64	tx_packets;		/* total packets transmitted	*/
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a249161..dceacc2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -159,45 +159,49 @@ static inline bool dev_xmit_complete(int rc)
 #define MAX_HEADER (LL_MAX_HEADER + 48)
 #endif
 
-#endif  /*  __KERNEL__  */
-
 /*
- *	Network device statistics. Akin to the 2.0 ether stats but
- *	with byte counters.
+ *	Old network device statistics. Fields are native words
+ *	(unsigned long) so they can be read and written atomically.
+ *	Each field is padded to 64 bits for compatibility with
+ *	rtnl_link_stats64.
  */
 
+#if BITS_PER_LONG == 64
+#define NET_DEVICE_STATS_DEFINE(name)	u64 name
+#elif defined(__LITTLE_ENDIAN)
+#define NET_DEVICE_STATS_DEFINE(name)	u32 name, pad_ ## name
+#else
+#define NET_DEVICE_STATS_DEFINE(name)	u32 pad_ ## name, name
+#endif
+
 struct net_device_stats {
-	unsigned long	rx_packets;		/* total packets received	*/
-	unsigned long	tx_packets;		/* total packets transmitted	*/
-	unsigned long	rx_bytes;		/* total bytes received 	*/
-	unsigned long	tx_bytes;		/* total bytes transmitted	*/
-	unsigned long	rx_errors;		/* bad packets received		*/
-	unsigned long	tx_errors;		/* packet transmit problems	*/
-	unsigned long	rx_dropped;		/* no space in linux buffers	*/
-	unsigned long	tx_dropped;		/* no space available in linux	*/
-	unsigned long	multicast;		/* multicast packets received	*/
-	unsigned long	collisions;
-
-	/* detailed rx_errors: */
-	unsigned long	rx_length_errors;
-	unsigned long	rx_over_errors;		/* receiver ring buff overflow	*/
-	unsigned long	rx_crc_errors;		/* recved pkt with crc error	*/
-	unsigned long	rx_frame_errors;	/* recv'd frame alignment error */
-	unsigned long	rx_fifo_errors;		/* recv'r fifo overrun		*/
-	unsigned long	rx_missed_errors;	/* receiver missed packet	*/
-
-	/* detailed tx_errors */
-	unsigned long	tx_aborted_errors;
-	unsigned long	tx_carrier_errors;
-	unsigned long	tx_fifo_errors;
-	unsigned long	tx_heartbeat_errors;
-	unsigned long	tx_window_errors;
-	
-	/* for cslip etc */
-	unsigned long	rx_compressed;
-	unsigned long	tx_compressed;
+	NET_DEVICE_STATS_DEFINE(rx_packets);
+	NET_DEVICE_STATS_DEFINE(tx_packets);
+	NET_DEVICE_STATS_DEFINE(rx_bytes);
+	NET_DEVICE_STATS_DEFINE(tx_bytes);
+	NET_DEVICE_STATS_DEFINE(rx_errors);
+	NET_DEVICE_STATS_DEFINE(tx_errors);
+	NET_DEVICE_STATS_DEFINE(rx_dropped);
+	NET_DEVICE_STATS_DEFINE(tx_dropped);
+	NET_DEVICE_STATS_DEFINE(multicast);
+	NET_DEVICE_STATS_DEFINE(collisions);
+	NET_DEVICE_STATS_DEFINE(rx_length_errors);
+	NET_DEVICE_STATS_DEFINE(rx_over_errors);
+	NET_DEVICE_STATS_DEFINE(rx_crc_errors);
+	NET_DEVICE_STATS_DEFINE(rx_frame_errors);
+	NET_DEVICE_STATS_DEFINE(rx_fifo_errors);
+	NET_DEVICE_STATS_DEFINE(rx_missed_errors);
+	NET_DEVICE_STATS_DEFINE(tx_aborted_errors);
+	NET_DEVICE_STATS_DEFINE(tx_carrier_errors);
+	NET_DEVICE_STATS_DEFINE(tx_fifo_errors);
+	NET_DEVICE_STATS_DEFINE(tx_heartbeat_errors);
+	NET_DEVICE_STATS_DEFINE(tx_window_errors);
+	NET_DEVICE_STATS_DEFINE(rx_compressed);
+	NET_DEVICE_STATS_DEFINE(tx_compressed);
 };
 
+#endif  /*  __KERNEL__  */
+
 
 /* Media selection options. */
 enum {
@@ -660,10 +664,19 @@ struct netdev_rx_queue {
  *	Callback uses when the transmitter has not made any progress
  *	for dev->watchdog ticks.
  *
+ * struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev);
  * struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
  *	Called when a user wants to get the network device usage
- *	statistics. If not defined, the counters in dev->stats will
- *	be used.
+ *	statistics. Drivers must do one of the following:
+ *	1. Define @ndo_get_stats64 to update a rtnl_link_stats64 structure
+ *	   (which should normally be dev->stats64) and return a ponter to
+ *	   it. The structure must not be changed asynchronously.
+ *	2. Define @ndo_get_stats to update a net_device_stats64 structure
+ *	   (which should normally be dev->stats) and return a pointer to
+ *	   it. The structure may be changed asynchronously only if each
+ *	   field is written atomically.
+ *	3. Update dev->stats asynchronously and atomically, and define
+ *	   neither operation.
  *
  * void (*ndo_vlan_rx_register)(struct net_device *dev, struct vlan_group *grp);
  *	If device support VLAN receive accleration
@@ -718,6 +731,7 @@ struct net_device_ops {
 						   struct neigh_parms *);
 	void			(*ndo_tx_timeout) (struct net_device *dev);
 
+	struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev);
 	struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
 
 	void			(*ndo_vlan_rx_register)(struct net_device *dev,
@@ -867,7 +881,10 @@ struct net_device {
 	int			ifindex;
 	int			iflink;
 
-	struct net_device_stats	stats;
+	union {
+		struct rtnl_link_stats64 stats64;
+		struct net_device_stats stats;
+	};
 
 #ifdef CONFIG_WIRELESS_EXT
 	/* List of functions to handle Wireless Extensions (instead of ioctl).
@@ -2118,7 +2135,7 @@ extern void		netdev_features_change(struct net_device *dev);
 /* Load a device via the kmod */
 extern void		dev_load(struct net *net, const char *name);
 extern void		dev_mcast_init(void);
-extern const struct net_device_stats *dev_get_stats(struct net_device *dev);
+extern const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev);
 extern void		dev_txq_stats_fold(const struct net_device *dev, struct net_device_stats *stats);
 
 extern int		netdev_max_backlog;
diff --git a/net/8021q/vlanproc.c b/net/8021q/vlanproc.c
index afead35..df56f5c 100644
--- a/net/8021q/vlanproc.c
+++ b/net/8021q/vlanproc.c
@@ -278,8 +278,9 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset)
 {
 	struct net_device *vlandev = (struct net_device *) seq->private;
 	const struct vlan_dev_info *dev_info = vlan_dev_info(vlandev);
-	const struct net_device_stats *stats;
+	const struct rtnl_link_stats64 *stats;
 	static const char fmt[] = "%30s %12lu\n";
+	static const char fmt64[] = "%30s %12llu\n";
 	int i;
 
 	if (!is_vlan_dev(vlandev))
@@ -291,12 +292,12 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset)
 		   vlandev->name, dev_info->vlan_id,
 		   (int)(dev_info->flags & 1), vlandev->priv_flags);
 
-	seq_printf(seq, fmt, "total frames received", stats->rx_packets);
-	seq_printf(seq, fmt, "total bytes received", stats->rx_bytes);
-	seq_printf(seq, fmt, "Broadcast/Multicast Rcvd", stats->multicast);
+	seq_printf(seq, fmt64, "total frames received", stats->rx_packets);
+	seq_printf(seq, fmt64, "total bytes received", stats->rx_bytes);
+	seq_printf(seq, fmt64, "Broadcast/Multicast Rcvd", stats->multicast);
 	seq_puts(seq, "\n");
-	seq_printf(seq, fmt, "total frames transmitted", stats->tx_packets);
-	seq_printf(seq, fmt, "total bytes transmitted", stats->tx_bytes);
+	seq_printf(seq, fmt64, "total frames transmitted", stats->tx_packets);
+	seq_printf(seq, fmt64, "total bytes transmitted", stats->tx_bytes);
 	seq_printf(seq, fmt, "total headroom inc",
 		   dev_info->cnt_inc_headroom_on_tx);
 	seq_printf(seq, fmt, "total encap on xmit",
diff --git a/net/core/dev.c b/net/core/dev.c
index 983a3c1..71a6fd8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3686,10 +3686,10 @@ void dev_seq_stop(struct seq_file *seq, void *v)
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-	const struct net_device_stats *stats = dev_get_stats(dev);
+	const struct rtnl_link_stats64 *stats = dev_get_stats(dev);
 
-	seq_printf(seq, "%6s: %7lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu "
-		   "%8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n",
+	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
+		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
 		   dev->name, stats->rx_bytes, stats->rx_packets,
 		   stats->rx_errors,
 		   stats->rx_dropped + stats->rx_missed_errors,
@@ -5266,18 +5266,21 @@ EXPORT_SYMBOL(dev_txq_stats_fold);
  *	@dev: device to get statistics from
  *
  *	Get network statistics from device. The device driver may provide
- *	its own method by setting dev->netdev_ops->get_stats; otherwise
- *	the internal statistics structure is used.
+ *	its own method by setting dev->netdev_ops->get_stats64 or
+ *	dev->netdev_ops->get_stats; otherwise the internal statistics
+ *	structure is used.
  */
-const struct net_device_stats *dev_get_stats(struct net_device *dev)
+const struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 
+	if (ops->ndo_get_stats64)
+		return ops->ndo_get_stats64(dev);
 	if (ops->ndo_get_stats)
-		return ops->ndo_get_stats(dev);
+		return (struct rtnl_link_stats64 *)ops->ndo_get_stats(dev);
 
 	dev_txq_stats_fold(dev, &dev->stats);
-	return &dev->stats;
+	return &dev->stats64;
 }
 EXPORT_SYMBOL(dev_get_stats);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 99e7052..ea3bb4c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -29,6 +29,7 @@ static const char fmt_hex[] = "%#x\n";
 static const char fmt_long_hex[] = "%#lx\n";
 static const char fmt_dec[] = "%d\n";
 static const char fmt_ulong[] = "%lu\n";
+static const char fmt_u64[] = "%llu\n";
 
 static inline int dev_isalive(const struct net_device *dev)
 {
@@ -324,14 +325,13 @@ static ssize_t netstat_show(const struct device *d,
 	struct net_device *dev = to_net_dev(d);
 	ssize_t ret = -EINVAL;
 
-	WARN_ON(offset > sizeof(struct net_device_stats) ||
-			offset % sizeof(unsigned long) != 0);
+	WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
+			offset % sizeof(u64) != 0);
 
 	read_lock(&dev_base_lock);
 	if (dev_isalive(dev)) {
-		const struct net_device_stats *stats = dev_get_stats(dev);
-		ret = sprintf(buf, fmt_ulong,
-			      *(unsigned long *)(((u8 *) stats) + offset));
+		const struct rtnl_link_stats64 *stats = dev_get_stats(dev);
+		ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *) stats) + offset));
 	}
 	read_unlock(&dev_base_lock);
 	return ret;
@@ -343,7 +343,7 @@ static ssize_t show_##name(struct device *d,				\
 			   struct device_attribute *attr, char *buf) 	\
 {									\
 	return netstat_show(d, attr, buf,				\
-			    offsetof(struct net_device_stats, name));	\
+			    offsetof(struct rtnl_link_stats64, name));	\
 }									\
 static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1a2af24..e645778 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -579,7 +579,7 @@ static unsigned int rtnl_dev_combine_flags(const struct net_device *dev,
 }
 
 static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
-				 const struct net_device_stats *b)
+				 const struct rtnl_link_stats64 *b)
 {
 	a->rx_packets = b->rx_packets;
 	a->tx_packets = b->tx_packets;
@@ -610,7 +610,7 @@ static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
 	a->tx_compressed = b->tx_compressed;
 }
 
-static void copy_rtnl_link_stats64(void *v, const struct net_device_stats *b)
+static void copy_rtnl_link_stats64(void *v, const struct rtnl_link_stats64 *b)
 {
 	struct rtnl_link_stats64 a;
 
@@ -791,7 +791,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 {
 	struct ifinfomsg *ifm;
 	struct nlmsghdr *nlh;
-	const struct net_device_stats *stats;
+	const struct rtnl_link_stats64 *stats;
 	struct nlattr *attr;
 
 	nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* Re: Ethernet drivers wiki page
From: Luis R. Rodriguez @ 2010-06-04 22:04 UTC (permalink / raw)
  To: netdev; +Cc: H. Peter Anvin, Stephen Hemminger, Johannes Berg, linux-kernel
In-Reply-To: <AANLkTinSjJsqNvEhJZIE6cJxEkBtg-DO2dqZWMczUy2i@mail.gmail.com>

On Fri, Jun 4, 2010 at 2:52 PM, Luis R. Rodriguez <mcgrof@gmail.com> wrote:
> Where can I start adding documentation for our Ethernet drivers
> upstream? I was hoping for a wiki but I don't think we have a netdev
> one yet. Shall we create one or piggy back on something else? I'm
> reviewing:
>
> https://wiki.kernel.org/
>
> FWIW, I want to see something like this:
>
> http://wireless.kernel.org/en/users/Drivers
>
> But for ethernet.

Since I've stuffed a few ethernet drivers on compat-wireless I might
as well start carrying more. So now we'd have 802.11, Bluetooth and
Ethernet all backported using the same framework, automatically, down
to at least the oldest stable kernel supported listed on kernel.org
and using all a generic compat module.

I want to document all this crap.

  Luis

^ permalink raw reply

* Re: Ethernet drivers wiki page
From: Ben Hutchings @ 2010-06-04 22:01 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: netdev, H. Peter Anvin, Stephen Hemminger, Johannes Berg
In-Reply-To: <AANLkTinSjJsqNvEhJZIE6cJxEkBtg-DO2dqZWMczUy2i@mail.gmail.com>

On Fri, 2010-06-04 at 14:52 -0700, Luis R. Rodriguez wrote:
> Where can I start adding documentation for our Ethernet drivers
> upstream? I was hoping for a wiki but I don't think we have a netdev
> one yet. Shall we create one or piggy back on something else? I'm
> reviewing:
> 
> https://wiki.kernel.org/
> 
> FWIW, I want to see something like this:
> 
> http://wireless.kernel.org/en/users/Drivers
> 
> But for ethernet.

Wherever you put it, it should probably be linked from:

http://www.linuxfoundation.org/collaborate/workgroups/networking/

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Ethernet drivers wiki page
From: Luis R. Rodriguez @ 2010-06-04 21:52 UTC (permalink / raw)
  To: netdev; +Cc: H. Peter Anvin, Stephen Hemminger, Johannes Berg

Where can I start adding documentation for our Ethernet drivers
upstream? I was hoping for a wiki but I don't think we have a netdev
one yet. Shall we create one or piggy back on something else? I'm
reviewing:

https://wiki.kernel.org/

FWIW, I want to see something like this:

http://wireless.kernel.org/en/users/Drivers

But for ethernet.

  Luis

^ permalink raw reply

* RE: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
From: Narendra_K @ 2010-06-04 21:15 UTC (permalink / raw)
  To: greg; +Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch
In-Reply-To: <20100604204955.GA20329@kroah.com>

> 
> Um, what?
> 
> Why are you resending this over and over with no additional content
> added?
> 
> If you have a new patch, send it.  Don't bury it at the bottom of
> another message, and then linewrap the thing...
> 

Greg, Sorry for the inconvenience caused. I have sent the V3 patch with
changes mentioned.

With regards,
Narendra K

^ permalink raw reply

* Re: [PATCH V3 1/2] Export firmware assigned labels of network devices to sysfs
From: Narendra K @ 2010-06-04 21:07 UTC (permalink / raw)
  To: greg
  Cc: netdev, linux-hotplug, linux-pci, matt_domsch, jordan_hargrave,
	charles_rose
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE612918@blrx3m08.blr.amer.dell.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Narendra K
> Sent: Friday, June 04, 2010 2:37 AM
> To: greg@kroah.com
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-pci@vger.kernel.org; Domsch, Matt
> Subject: Re: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
> 
> In addition to these changes there are a coulple of changes i have done -
> 
> 1.Removed the check for network devices and evaulate _DSM for any pci device
> that has _DSM defined in adherence to the spec.
> 
> 2.Renamed the functions pci_create,remove-acpi_attr_files to 
> pci_create,remove_firmware_label_files.
> 
> 3.Added checks for conditional compilation of if CONFIG_ACPI || CONFIG_DMI
> 

V2 -> V3: I had missed fixing warnings in the patch. Please find the V3
of the patch with the warnings fixed -


From: Narendra K <narendra_k@dell.com>
Subject: [PATCH V3 1/2] Export firmware assigned labels of pci devices to sysfs

This patch exports the firmware assigned labels of pci devices to
sysfs which could be used by user space.

Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
---
 drivers/firmware/dmi_scan.c |   24 ++++
 drivers/pci/Makefile        |    2 +-
 drivers/pci/pci-label.c     |  274 +++++++++++++++++++++++++++++++++++++++++++
 drivers/pci/pci-sysfs.c     |    5 +
 drivers/pci/pci.h           |    2 +
 include/linux/dmi.h         |    9 ++
 6 files changed, 315 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/pci-label.c

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index d464672..7d8439b 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -277,6 +277,28 @@ static void __init dmi_save_ipmi_device(const struct dmi_header *dm)
 	list_add_tail(&dev->list, &dmi_devices);
 }
 
+static void __init dmi_save_devslot(int id, int seg, int bus, int devfn, const char *name)
+{
+	struct dmi_devslot *slot;
+
+	slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
+	if (!slot) {
+		printk(KERN_ERR "dmi_save_devslot: out of memory.\n");
+		return;
+	}	
+	slot->id = id;
+	slot->seg = seg;
+	slot->bus = bus;
+	slot->devfn = devfn;
+
+	strcpy((char *)&slot[1], name);
+	slot->dev.type = DMI_DEV_TYPE_DEVSLOT;
+	slot->dev.name = (char *)&slot[1];
+	slot->dev.device_data = slot;
+
+	list_add(&slot->dev.list, &dmi_devices);
+}
+
 static void __init dmi_save_extended_devices(const struct dmi_header *dm)
 {
 	const u8 *d = (u8*) dm + 5;
@@ -285,6 +307,7 @@ static void __init dmi_save_extended_devices(const struct dmi_header *dm)
 	if ((*d & 0x80) == 0)
 		return;
 
+	dmi_save_devslot(-1, *(u16 *)(d+2), *(d+4), *(d+5), dmi_string_nosave(dm, *(d-1)));
 	dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
 }
 
@@ -333,6 +356,7 @@ static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
 		break;
 	case 41:	/* Onboard Devices Extended Information */
 		dmi_save_extended_devices(dm);
+		break;
 	}
 }
 
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 0b51857..69c503a 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -4,7 +4,7 @@
 
 obj-y		+= access.o bus.o probe.o remove.o pci.o \
 			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
-			irq.o vpd.o
+			irq.o vpd.o pci-label.o
 obj-$(CONFIG_PROC_FS) += proc.o
 obj-$(CONFIG_SYSFS) += slot.o
 
diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
new file mode 100644
index 0000000..d6e0e9b
--- /dev/null
+++ b/drivers/pci/pci-label.c
@@ -0,0 +1,274 @@
+/*
+ * Purpose: Export the firmware label associated with a pci network interface
+ * device to sysfs
+ * Copyright (C) 2010 Dell Inc.
+ * by Narendra K <Narendra_K@dell.com>, Jordan Hargrave <Jordan_Hargrave@dell.com>
+ *
+ * This code checks if the pci network device has a related ACPI _DSM. If
+ * available, the code calls the _DSM to retrieve the index and string and
+ * exports them to sysfs. If the ACPI _DSM is not available, it falls back on
+ * SMBIOS. SMBIOS defines type 41 for onboard pci devices. This code retrieves
+ * strings associated with the type 41 and exports it to sysfs.
+ *
+ * Please see http://linux.dell.com/wiki/index.php/Oss/libnetdevname for more
+ * information.
+ */
+
+#include <linux/dmi.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/module.h>
+#include <linux/acpi.h>
+#include <linux/pci-acpi.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acpi_bus.h>
+#include "pci.h"
+
+#define	DEVICE_LABEL_DSM	0x07
+
+#if defined CONFIG_DMI
+
+struct smbios_attribute {
+	struct attribute attr;
+	ssize_t (*show) (struct device *dev, struct device_attribute *, char *buf);
+	ssize_t (*test) (struct device *dev, char *buf);
+};
+
+static ssize_t
+smbiosname_string_exists(struct device *dev, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+  	const struct dmi_device *dmi;
+  	struct dmi_devslot *dslot;
+  	int bus;
+  	int devfn;
+
+  	bus = pdev->bus->number;
+  	devfn = pdev->devfn;
+
+  	dmi = NULL;
+  	while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEVSLOT, NULL, dmi)) != NULL) {
+    		dslot = dmi->device_data;
+    		if (dslot && dslot->bus == bus && dslot->devfn == devfn) {
+			if (buf)
+      				return scnprintf(buf, PAGE_SIZE, "%s\n", dmi->name);
+			return strlen(dmi->name);
+		}
+	}
+	
+	return 0;
+}
+
+static ssize_t
+smbiosname_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return smbiosname_string_exists(dev, buf);
+}
+
+struct smbios_attribute smbios_attr_label = {
+	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
+	.show = smbiosname_show,
+	.test = smbiosname_string_exists,
+};
+
+static int 
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+	if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev, NULL)) {
+		if (sysfs_create_file(&pdev->dev.kobj, &smbios_attr_label.attr))
+			return -1;
+		return 0;
+	}
+	return -1;	
+}
+
+static int 
+pci_remove_smbiosname_file(struct pci_dev *pdev)
+{
+	if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev, NULL)) {
+		sysfs_remove_file(&pdev->dev.kobj, &smbios_attr_label.attr);
+		return 0;
+	}
+	return -1;
+}
+#else
+static inline int 
+pci_create_smbiosname_file(struct pci_dev *pdev)
+{
+	return -1;
+}
+
+static inline int 
+pci_remove_smbiosname_file(struct pci_dev *pdev)
+{
+	return -1;
+}
+#endif
+
+#if defined CONFIG_ACPI
+
+static const char device_label_dsm_uuid[] = {
+	0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
+	0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
+};
+
+struct acpi_attribute {
+	struct attribute attr;
+	ssize_t (*show) (struct device *dev, struct device_attribute *attr, char *buf);
+	ssize_t (*test) (struct device *dev, char *buf, char *attribute);
+};
+
+static int
+dsm_get_label(acpi_handle handle, int func, 
+              struct acpi_buffer *output, 
+              char *buf, char *attribute)
+{
+	struct acpi_object_list input;
+	union acpi_object params[4];
+	union acpi_object *obj; 
+	int len = 0;
+
+	int err;
+
+	input.count = 4;
+	input.pointer = params;
+	params[0].type = ACPI_TYPE_BUFFER;
+	params[0].buffer.length = sizeof(device_label_dsm_uuid);
+	params[0].buffer.pointer = (char *)device_label_dsm_uuid;
+	params[1].type = ACPI_TYPE_INTEGER;
+	params[1].integer.value = 0x02;
+	params[2].type = ACPI_TYPE_INTEGER;
+	params[2].integer.value = func;
+	params[3].type = ACPI_TYPE_PACKAGE;
+	params[3].package.count = 0;
+	params[3].package.elements = NULL;
+
+	err = acpi_evaluate_object(handle, "_DSM", &input, output);
+	if (err) 
+		return -1;
+	
+	
+	obj = (union acpi_object *)output->pointer;
+
+	switch (obj->type) {
+	case ACPI_TYPE_PACKAGE:
+		if (obj->package.count != 2) 
+			break;
+		len = obj->package.elements[0].integer.value;
+		if (buf) {
+			if (!strncmp(attribute, "index", strlen(attribute)))
+				scnprintf(buf, PAGE_SIZE, "%llu\n", 
+				obj->package.elements[0].integer.value);
+			else
+				scnprintf(buf, PAGE_SIZE, "%s\n", 
+				obj->package.elements[1].string.pointer);
+			kfree(output->pointer);
+			return strlen(buf);
+		}
+		kfree(output->pointer);
+		return len;
+	break;
+	default:
+		kfree(output->pointer);
+	}
+	return -1;
+}	
+
+static ssize_t
+acpi_index_string_exist(struct device *dev, char *buf, char *attribute)
+{
+	struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
+	acpi_handle handle;
+	int length;
+	
+	handle = DEVICE_ACPI_HANDLE(dev);
+
+	if (!handle) 
+		return -1;
+
+	if ((length = dsm_get_label(handle, DEVICE_LABEL_DSM, 
+				    &output, buf, attribute)) < 0)
+		return -1;
+
+	return length;
+}
+	
+static ssize_t
+acpilabel_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return acpi_index_string_exist(dev, buf, "label");
+}
+
+static ssize_t
+acpiindex_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return acpi_index_string_exist(dev, buf, "index");
+}
+
+struct acpi_attribute acpi_attr_label = {
+	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
+	.show = acpilabel_show,
+	.test = acpi_index_string_exist,
+};
+
+struct acpi_attribute acpi_attr_index = {
+	.attr = {.name = "index", .mode = 0444, .owner = THIS_MODULE},
+	.show = acpiindex_show,
+	.test = acpi_index_string_exist,
+};
+
+static int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+	if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev, NULL, NULL) > 0) {
+		if (sysfs_create_file(&pdev->dev.kobj, &acpi_attr_label.attr)) 
+			return -1;
+		if (sysfs_create_file(&pdev->dev.kobj, &acpi_attr_index.attr)) 
+			return -1;
+		return 0;
+	}
+	return -1;
+}
+
+static int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+	if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev, NULL, NULL) > 0) {
+		sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_label.attr);
+		sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_index.attr);
+		return 0;
+	}
+	return -1;
+}
+#else
+static inline int
+pci_create_acpi_index_label_files(struct pci_dev *pdev)
+{
+	return -1;
+}
+
+static inline int
+pci_remove_acpi_index_label_files(struct pci_dev *pdev)
+{
+	return -1;
+}
+#endif
+
+int pci_create_firmware_label_files(struct pci_dev *pdev)
+{
+	if (!pci_create_acpi_index_label_files(pdev))
+		return 0;
+	if (!pci_create_smbiosname_file(pdev))
+		return 0;
+	return -ENODEV;
+}
+
+int pci_remove_firmware_label_files(struct pci_dev *pdev)
+{
+	if (!pci_remove_acpi_index_label_files(pdev))
+		return 0;
+	if (!pci_remove_smbiosname_file(pdev))
+		return 0;
+	return -ENODEV;
+}
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index fad9398..4ed517f 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1073,6 +1073,8 @@ int __must_check pci_create_sysfs_dev_files (struct pci_dev *pdev)
 	if (retval)
 		goto err_vga_file;
 
+	pci_create_firmware_label_files(pdev);
+
 	return 0;
 
 err_vga_file:
@@ -1140,6 +1142,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev *pdev)
 		sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
 		kfree(pdev->rom_attr);
 	}
+
+	pci_remove_firmware_label_files(pdev);
+
 }
 
 static int __init pci_sysfs_init(void)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4eb10f4..f223283 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -11,6 +11,8 @@
 extern int pci_uevent(struct device *dev, struct kobj_uevent_env *env);
 extern int pci_create_sysfs_dev_files(struct pci_dev *pdev);
 extern void pci_remove_sysfs_dev_files(struct pci_dev *pdev);
+extern int pci_create_firmware_label_files(struct pci_dev *pdev);
+extern int pci_remove_firmware_label_files(struct pci_dev *pdev);
 extern void pci_cleanup_rom(struct pci_dev *dev);
 #ifdef HAVE_PCI_MMAP
 extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
diff --git a/include/linux/dmi.h b/include/linux/dmi.h
index a8a3e1a..cc57c3a 100644
--- a/include/linux/dmi.h
+++ b/include/linux/dmi.h
@@ -20,6 +20,7 @@ enum dmi_device_type {
 	DMI_DEV_TYPE_SAS,
 	DMI_DEV_TYPE_IPMI = -1,
 	DMI_DEV_TYPE_OEM_STRING = -2,
+	DMI_DEV_TYPE_DEVSLOT = -3,
 };
 
 struct dmi_header {
@@ -37,6 +38,14 @@ struct dmi_device {
 
 #ifdef CONFIG_DMI
 
+struct dmi_devslot {
+	struct dmi_device dev;
+	int id;
+	int seg;
+	int bus;
+	int devfn;
+};
+
 extern int dmi_check_system(const struct dmi_system_id *list);
 const struct dmi_system_id *dmi_first_match(const struct dmi_system_id *list);
 extern const char * dmi_get_system_info(int field);
-- 
1.6.5.2

With regards,
Narendra K

^ permalink raw reply related

* Re: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
From: Greg KH @ 2010-06-04 21:01 UTC (permalink / raw)
  To: Narendra_K; +Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch
In-Reply-To: <20100604204955.GA20329@kroah.com>

On Fri, Jun 04, 2010 at 01:49:55PM -0700, Greg KH wrote:
> On Sat, Jun 05, 2010 at 02:01:35AM +0530, Narendra_K@Dell.com wrote:
> > 
> > 
> > With regards,
> 
> Um, what?
> 
> Why are you resending this over and over with no additional content
> added?

And you have an out-of-office autoreply.

I give up...

^ permalink raw reply

* Re: [PATCH 2/2] pktgen: receive packets and process incoming rate
From: Eric Dumazet @ 2010-06-02 13:00 UTC (permalink / raw)
  To: Daniel Turull; +Cc: netdev, robert, jens.laas
In-Reply-To: <4C06453B.1080801@gmail.com>

Le mercredi 02 juin 2010 à 13:49 +0200, Daniel Turull a écrit :
> This patch adds receiver part to pktgen taking advantages of SMP systems
> with multiple rx queues:
> - Creation of new proc file  /proc/net/pktgen/pgrx to control and display the receiver.
> - It uses PER-CPU variable to store the results per each CPU.
> - Results displayed per CPU and aggregated.
> - The packet handler is add in the protocols handlers (dev_Add_pack())
> - Available statistics: packets and bytes received, work time and rate
> - Only process pktgen packets
> - It is possible to select the incoming interface 
> - Documentation updated with the new commands to control the receiver part.
> 

Interesting, but does it belongs to pktgen ?

other comments included :)

> Signed-off-by: Daniel Turull <daniel.turull@gmail.com>
> 


>  
> +/*Recevier parameters per cpu*/
> +struct pktgen_rx {
> +	u64 rx_packets;		/*packets arrived*/

unsigned long rx_packets ?

> +	u64 rx_bytes;		/*bytes arrived*/
> +
> +	ktime_t start_time;	/*first time stamp of a packet*/
> +	ktime_t last_time;	/*last packet arrival */
> +};


> +int pktgen_rcv_basic(struct sk_buff *skb, struct net_device *dev,
> +			 struct packet_type *pt, struct net_device *orig_dev)
> +{
> +	/* Check magic*/
> +	struct iphdr *iph = ip_hdr(skb);
> +	struct pktgen_hdr *pgh;
> +	void *vaddr;

Following code is ... well ... interesting... But ...

1) Is it IPV6 compatable ? pktgen can be ipv6 or ipv4
2) Is it resistant to malicious packets ? (very small ones)
3) No checksum ?

I think you should use standard mechanisms... (pskb_may_pull(), ...)
Take a look at __netpoll_rx() for an example.

> +	if (skb_is_nonlinear(skb)) {
> +		vaddr = kmap_skb_frag(&skb_shinfo(skb)->frags[0]);
> +		pgh = (struct pktgen_hdr *)
> +			(vaddr+skb_shinfo(skb)->frags[0].page_offset);
> +	} else {
> +		pgh = (struct pktgen_hdr *)(((char *)(iph)) + 28);
> +	}
> +
> +	if (unlikely(pgh->pgh_magic != PKTGEN_MAGIC_NET))
> +		goto end;
> +
> +	if (unlikely(!__get_cpu_var(pktgen_rx_data).rx_packets))
> +		__get_cpu_var(pktgen_rx_data).start_time = ktime_now();
> +
> +	__get_cpu_var(pktgen_rx_data).last_time = ktime_now();
> +

Its a bit suboptimal to use __get_cpu_var several time. Take a look at
disassembly code :)

Use a pointer instead


> +	/* Update counter of packets*/
> +	__get_cpu_var(pktgen_rx_data).rx_packets++;
> +	__get_cpu_var(pktgen_rx_data).rx_bytes += skb->len+14;

This +14 seems suspect (what about vlan tags ?)
Use ETH_HLEN instead at a very minimum?

> +end:
> +	if (skb_is_nonlinear(skb))
> +		kunmap_skb_frag(vaddr);

Should not recognised packets be allowed to flight in other parts of
kernel stack ? This way, we could use ssh to remotely control this
pktgen session ;)

> +	kfree_skb(skb);
> +	return 0;
> +}
> +


^ permalink raw reply

* Re: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
From: Greg KH @ 2010-06-04 20:49 UTC (permalink / raw)
  To: Narendra_K; +Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch
In-Reply-To: <EDA0A4495861324DA2618B4C45DCB3EE612913@blrx3m08.blr.amer.dell.com>

On Sat, Jun 05, 2010 at 02:01:35AM +0530, Narendra_K@Dell.com wrote:
> 
> 
> With regards,

Um, what?

Why are you resending this over and over with no additional content
added?

If you have a new patch, send it.  Don't bury it at the bottom of
another message, and then linewrap the thing...

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Ben Hutchings @ 2010-06-04 20:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <20100604133930.34e2d53b@nehalam>

On Fri, 2010-06-04 at 13:39 -0700, Stephen Hemminger wrote:
> On Fri, 04 Jun 2010 19:15:18 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> > On Fri, 2010-06-04 at 10:28 -0700, Stephen Hemminger wrote:
> > > On Thu, 03 Jun 2010 20:11:38 +0100
> > > Ben Hutchings <bhutchings@solarflare.com> wrote:
> > > 
> > > > static inline u64 rtnl_link_stats64_read(const u64 *field)
> > > > {
> > > > 	return ACCESS_ONCE(*field);
> > > > }
> > > > static inline u32 rtnl_link_stats64_read32(const u64 *field)
> > > > {
> > > > 	return ACCESS_ONCE(*field);
> > > > }
> > > 
> > > Do we really care if compiler reorders access. I think not.
> > > There was no order guarantee in the past.
> > 
> > Since these reads are potentially racing with writes, we want to ensure
> > that they are atomic.  Without the volatile-qualification, the compiler
> > can legitimately split or repeat the reads, though I don't see any
> > particular reason why this is a likely optimisation.
> > 
> > Ben.
> > 
> 
> But this part of the code is only being run on on 64 bit machines.
> Updates of basic types for the CPU are atomic, lots of other code
> already assumes this.
> Take off your tin hat, this is excessive paranoia.

I like my tin hat, I'm sure I saw more oopses before I put it on. ;-)

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Stephen Hemminger @ 2010-06-04 20:39 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <1275675318.2095.30.camel@achroite.uk.solarflarecom.com>

On Fri, 04 Jun 2010 19:15:18 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Fri, 2010-06-04 at 10:28 -0700, Stephen Hemminger wrote:
> > On Thu, 03 Jun 2010 20:11:38 +0100
> > Ben Hutchings <bhutchings@solarflare.com> wrote:
> > 
> > > static inline u64 rtnl_link_stats64_read(const u64 *field)
> > > {
> > > 	return ACCESS_ONCE(*field);
> > > }
> > > static inline u32 rtnl_link_stats64_read32(const u64 *field)
> > > {
> > > 	return ACCESS_ONCE(*field);
> > > }
> > 
> > Do we really care if compiler reorders access. I think not.
> > There was no order guarantee in the past.
> 
> Since these reads are potentially racing with writes, we want to ensure
> that they are atomic.  Without the volatile-qualification, the compiler
> can legitimately split or repeat the reads, though I don't see any
> particular reason why this is a likely optimisation.
> 
> Ben.
> 

But this part of the code is only being run on on 64 bit machines.
Updates of basic types for the CPU are atomic, lots of other code
already assumes this.
Take off your tin hat, this is excessive paranoia.

-- 

^ permalink raw reply

* RE: [PATCH V2 1/2] Export firmware assigned labels of network devices to sysfs
From: Narendra_K @ 2010-06-04 20:31 UTC (permalink / raw)
  To: Narendra_K, greg; +Cc: netdev, linux-hotplug, linux-pci, Matt_Domsch
In-Reply-To: <20100603210715.GA18025@auslistsprd01.us.dell.com>



With regards,
Narendra K


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Narendra K
> Sent: Friday, June 04, 2010 2:37 AM
> To: greg@kroah.com
> Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org; linux-
> pci@vger.kernel.org; Domsch, Matt
> Subject: Re: [PATCH V2 1/2] Export firmware assigned labels of network
> devices to sysfs
> 
> > -----Original Message-----
> > From: Greg KH [mailto:greg@kroah.com]
> > Sent: Friday, May 28, 2010 9:11 PM
> > To: K, Narendra
> > Cc: netdev@vger.kernel.org; linux-hotplug@vger.kernel.org;
> > linux-pci@vger.kernel.org; Domsch, Matt; Hargrave, Jordan; Rose,
> > Charles; Nijhawan, Vijay
> > Subject: Re: [PATCH 1/2] Export firmware assigned labels of network
> > devices to sysfs
> >
> Thanks for the comments.
> 
> > On Fri, May 28, 2010 at 06:55:21AM -0500, K, Narendra wrote:
> > > Please refer to the PCI-SIG Draft ECN
> > > "PCIe Device Labeling under Operating Systems Draft ECN" at this
> link
> > -
> > > http://www.pcisig.com/specifications/pciexpress/review_zone/.
> > >
> > > It would be great to know your views on this ECN. Please let us
> know
> > if you have
> > > have any suggestions or changes.
> >
> > Note that only members of the PCI-SIG can do this, which pretty much
> > rules out any "normal" Linux kernel developer :(
> >
> > Care to post a public version of this for us to review?
> > > --- /dev/null
> > > +++ b/drivers/pci/pci-label.c
> > > @@ -0,0 +1,242 @@
> > > +/*
> > > + * File:       drivers/pci/pci-label.c
> >
> > This line is not needed, we know the file name :)
> >
> > > + * Purpose:    Export the firmware label associated with a pci
> > network interface
> > > + * device to sysfs
> > > + * Copyright (C) 2010 Dell Inc.
> > > + * by Narendra K <Narendra_K@dell.com>, Jordan Hargrave
> > <Jordan_Hargrave@dell.com>
> > > + *
> > > + * This code checks if the pci network device has a related ACPI
> > _DSM. If
> > > + * available, the code calls the _DSM to retrieve the index and
> > string and
> > > + * exports them to sysfs. If the ACPI _DSM is not available, it
> falls
> > back on
> > > + * SMBIOS. SMBIOS defines type 41 for onboard pci devices. This
> code
> > retrieves
> > > + * strings associated with the type 41 and exports it to sysfs.
> > > + *
> > > + * Please see
> http://linux.dell.com/wiki/index.php/Oss/libnetdevname
> > for more
> > > + * information.
> > > + */
> > > +
> > > +#include <linux/pci-label.h>
> >
> > Why is this file in include/linux/ ?  Who needs it there?  Can't it
> just
> > be in in the drivers/pci/ directory?  Actually all you need is 2
> > functions in there, so it could go into the internal pci.h file in
> that
> > directory without a problem, right?
> >
> 
> This file is removed and functions are moved to the internal pci.h
> 
> > > +
> > > +static ssize_t
> > > +smbiosname_string_exists(struct device *dev, char *buf)
> > > +{
> > > +       struct pci_dev *pdev = to_pci_dev(dev);
> > > +       const struct dmi_device *dmi;
> > > +       struct dmi_devslot *dslot;
> > > +       int bus;
> > > +       int devfn;
> > > +
> > > +       bus = pdev->bus->number;
> > > +       devfn = pdev->devfn;
> > > +
> > > +       dmi = NULL;
> > > +       while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEVSLOT, NULL,
> > dmi)) != NULL) {
> > > +               dslot = dmi->device_data;
> > > +               if (dslot && dslot->bus == bus && dslot->devfn ==
> > devfn) {
> > > +                       if (buf)
> > > +                               return scnprintf(buf, PAGE_SIZE,
> > "%s\n", dmi->name);
> > > +                       return strlen(dmi->name);
> > > +               }
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static ssize_t
> > > +smbiosname_show(struct device *dev, struct device_attribute *attr,
> > char *buf)
> > > +{
> > > +       return smbiosname_string_exists(dev, buf);
> > > +}
> > > +
> > > +struct smbios_attribute smbios_attr_label = {
> > > +       .attr = {.name = __stringify(label), .mode = 0444, .owner =
> > THIS_MODULE},
> >
> > Can't you just put "label" as the name?
> >
> 
> This is fixed.
> 
> > > +       .show = smbiosname_show,
> > > +       .test = smbiosname_string_exists,
> > > +};
> > > +
> > > +static int
> > > +pci_create_smbiosname_file(struct pci_dev *pdev)
> > > +{
> > > +       if (smbios_attr_label.test &&
> > smbios_attr_label.test(&pdev->dev, NULL)) {
> > > +               sysfs_create_file(&pdev->dev.kobj,
> > &smbios_attr_label.attr);
> > > +               return 0;
> > > +       }
> > > +       return -1;
> > > +}
> > > +
> > > +static int
> > > +pci_remove_smbiosname_file(struct pci_dev *pdev)
> > > +{
> > > +       if (smbios_attr_label.test &&
> > smbios_attr_label.test(&pdev->dev, NULL)) {
> > > +               sysfs_remove_file(&pdev->dev.kobj,
> > &smbios_attr_label.attr);
> > > +               return 0;
> > > +       }
> > > +       return -1;
> > > +}
> > > +
> > > +static const char dell_dsm_uuid[] = {
> >
> > Um, a dell specific uuid in a generic file?  What happens when we
> need
> > to support another manufacturer?
> >
> 
> My understanding of uuid was incorrect. I have renamed it to a more
> generic
> device_label_dsm_uuid and ACPI_DSM_FUNCTION to DEVICE_LABEL_DSM
> 
> > > +       0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
> > > +       0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
> > > +};
> > > +
> > > +
> > > +static int
> > > +dsm_get_label(acpi_handle handle, int func,
> > > +              struct acpi_buffer *output,
> > > +              char *buf, char *attribute)
> > > +{
> > > +       struct acpi_object_list input;
> > > +       union acpi_object params[4];
> > > +       union acpi_object *obj;
> > > +       int len = 0;
> > > +
> > > +       int err;
> > > +
> > > +       input.count = 4;
> > > +       input.pointer = params;
> > > +       params[0].type = ACPI_TYPE_BUFFER;
> > > +       params[0].buffer.length = sizeof(dell_dsm_uuid);
> > > +       params[0].buffer.pointer = (char *)dell_dsm_uuid;
> > > +       params[1].type = ACPI_TYPE_INTEGER;
> > > +       params[1].integer.value = 0x02;
> > > +       params[2].type = ACPI_TYPE_INTEGER;
> > > +       params[2].integer.value = func;
> > > +       params[3].type = ACPI_TYPE_PACKAGE;
> > > +       params[3].package.count = 0;
> > > +       params[3].package.elements = NULL;
> > > +
> > > +       err = acpi_evaluate_object(handle, "_DSM", &input, output);
> > > +       if (err) {
> > > +               printk(KERN_INFO "failed to evaulate _DSM\n");
> > > +               return -1;
> > > +       }
> > > +
> > > +       obj = (union acpi_object *)output->pointer;
> > > +
> > > +       switch (obj->type) {
> > > +       case ACPI_TYPE_PACKAGE:
> > > +               if (obj->package.count == 2) {
> > > +                       len = obj-
> >package.elements[0].integer.value;
> > > +                       if (buf) {
> > > +                               if (!strncmp(attribute, "index",
> > strlen(attribute)))
> > > +                                       scnprintf(buf, PAGE_SIZE,
> > "%lu\n",
> > > +
> > obj->package.elements[0].integer.value);
> > > +                               else
> > > +                                       scnprintf(buf, PAGE_SIZE,
> > "%s\n",
> > > +
> > obj->package.elements[1].string.pointer);
> > > +                               kfree(output->pointer);
> > > +                               return strlen(buf);
> > > +                       }
> > > +               }
> > > +               kfree(output->pointer);
> > > +               return len;
> > > +       break;
> > > +       default:
> > > +               return -1;
> > > +       }
> > > +}
> > > +
> > > +static ssize_t
> > > +acpi_index_string_exist(struct device *dev, char *buf, char
> > *attribute)
> > > +{
> > > +       struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > +       struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
> > > +       acpi_handle handle;
> > > +       int length;
> > > +       int is_addin_card = 0;
> > > +
> > > +       if ((pdev->class >> 16) != PCI_BASE_CLASS_NETWORK)
> > > +               return -1;
> > > +
> > > +       handle = DEVICE_ACPI_HANDLE(dev);
> > > +
> > > +       if (!handle) {
> > > +               /*
> > > +                * The device is an add-in network controller and
> does
> > have
> > > +                * a valid handle. Try until we get the handle for
> the
> > parent
> > > +                * bridge
> > > +                */
> > > +               struct pci_bus *pbus;
> > > +               for (pbus = pdev->bus; pbus; pbus = pbus->parent) {
> > > +                       handle =
> > DEVICE_ACPI_HANDLE(&(pbus->self->dev));
> > > +                       if (handle)
> > > +                               break;
> > > +
> > > +               }
> > > +       }
> > > +
> > > +       if ((length = dsm_get_label(handle, DELL_DSM_NETWORK,
> > > +                                   &output, buf, attribute)) < 0)
> > > +               return -1;
> > > +
> > > +       return length;
> > > +}
> > > +
> > > +static ssize_t
> > > +acpilabel_show(struct device *dev, struct device_attribute *attr,
> > char *buf)
> > > +{
> > > +       return acpi_index_string_exist(dev, buf, "label");
> > > +}
> > > +
> > > +static ssize_t
> > > +acpiindex_show(struct device *dev, struct device_attribute *attr,
> > char *buf)
> > > +{
> > > +       return acpi_index_string_exist(dev, buf, "index");
> > > +}
> > > +
> > > +struct acpi_attribute acpi_attr_label = {
> > > +       .attr = {.name = __stringify(label), .mode = 0444, .owner =
> > THIS_MODULE},
> > > +       .show = acpilabel_show,
> > > +       .test = acpi_index_string_exist,
> > > +};
> > > +
> > > +struct acpi_attribute acpi_attr_index = {
> > > +       .attr = {.name = __stringify(index), .mode = 0444, .owner =
> > THIS_MODULE},
> > > +       .show = acpiindex_show,
> > > +       .test = acpi_index_string_exist,
> > > +};
> > > +
> > > +static int
> > > +pci_create_acpi_index_label_files(struct pci_dev *pdev)
> > > +{
> > > +       if (acpi_attr_label.test && acpi_attr_label.test(&pdev-
> >dev,
> > NULL) > 0) {
> > > +               sysfs_create_file(&pdev->dev.kobj,
> > &acpi_attr_label.attr);
> > > +               sysfs_create_file(&pdev->dev.kobj,
> > &acpi_attr_index.attr);
> > > +               return 0;
> > > +       }
> > > +       return -1;
> > > +}
> > > +
> > > +static int
> > > +pci_remove_acpi_index_label_files(struct pci_dev *pdev)
> > > +{
> > > +       if (acpi_attr_label.test && acpi_attr_label.test(&pdev-
> >dev,
> > NULL) > 0) {
> > > +               sysfs_remove_file(&pdev->dev.kobj,
> > &acpi_attr_label.attr);
> > > +               sysfs_remove_file(&pdev->dev.kobj,
> > &acpi_attr_index.attr);
> > > +               return 0;
> > > +       }
> > > +       return -1;
> > > +}
> > > +
> > > +int pci_create_acpi_attr_files(struct pci_dev *pdev)
> > > +{
> > > +       if (!pci_create_acpi_index_label_files(pdev))
> > > +               return 0;
> > > +       if (!pci_create_smbiosname_file(pdev))
> > > +               return 0;
> > > +       return -ENODEV;
> > > +}
> > > +EXPORT_SYMBOL(pci_create_acpi_attr_files);
> >
> > EXPORT_SYMBOL_GPL?
> >
> > Wait, why does this need to be exported at all?  What module is ever
> > going to call this function?
> >
> > > +int pci_remove_acpi_attr_files(struct pci_dev *pdev)
> > > +{
> > > +       if (!pci_remove_acpi_index_label_files(pdev))
> > > +               return 0;
> > > +       if (!pci_remove_smbiosname_file(pdev))
> > > +               return 0;
> > > +       return -ENODEV;
> > > +
> > > +}
> > > +EXPORT_SYMBOL(pci_remove_acpi_attr_files);
> >
> > Same here, what module will call this?
> >
> 
> These functions need not be exported as they are not called by any
> module.
> 
> > > +++ b/include/linux/pci-label.h
> >
> > As discussed above, this whole file does not need to exist.
> >
> > > +extern int pci_create_acpi_attr_files(struct pci_dev *pdev);
> > > +extern int pci_remove_acpi_attr_files(struct pci_dev *pdev);
> >
> > Just put these two functions in the drivers/pci/pci.h file.
> >
> Fixed.
> 
> In addition to these changes there are a coulple of changes i have done
> -
> 
> 1.Removed the check for network devices and evaulate _DSM for any pci
> device
> that has _DSM defined in adherence to the spec.
> 
> 2.Renamed the functions pci_create,remove-acpi_attr_files to
> pci_create,remove_firmware_label_files.
> 
> 3.Added checks for conditional compilation of if CONFIG_ACPI ||
> CONFIG_DMI
> 
> Note: While testing the patch with CONFIG_ACPI set to no, the
> compilation
> would fail with the below message.
> 
>   CC      drivers/pci/pci-label.o
> In file included from drivers/pci/pci-label.c:24:
> include/linux/pci-acpi.h:39: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or
> ‘__attribute__’
> before ‘acpi_find_root_bridge_handle’
> 
> I had to add make this change to proceed with the compilation. It would
> be
> great to know if i am missing something in the way conditional
> compilation
> is implemented or is it a issue.
> 
> ---
>  include/linux/pci-acpi.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
> index c8b6473..bc40827 100644
> --- a/include/linux/pci-acpi.h
> +++ b/include/linux/pci-acpi.h
> @@ -36,8 +36,8 @@ static inline acpi_handle
> acpi_pci_get_bridge_handle(struct pci_bus *pbus)
>  					      pbus->number);
>  }
>  #else
> -static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev
> *pdev)
> -{ return NULL; }
> +static inline void acpi_find_root_bridge_handle(struct pci_dev *pdev)
> +{ }
>  #endif
> 
>  #endif	/* _PCI_ACPI_H_ */
> 
> 
> Please find the patch with above suggestions and changes -
> 
> 
> From: Narendra K <narendra_k@dell.com>
> Subject: [PATCH V2 1/2] Export firmware assigned labels of pci devices
> to sysfs
> 
> This patch exports the firmware assigned labels of pci devices to
> sysfs which could be used by user space.
> 
> Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
> Signed-off-by: Narendra K <narendra_k@dell.com>
> ---
>  drivers/firmware/dmi_scan.c |   24 ++++
>  drivers/pci/Makefile        |    2 +-
>  drivers/pci/pci-label.c     |  273
> +++++++++++++++++++++++++++++++++++++++++++
>  drivers/pci/pci-sysfs.c     |    5 +
>  drivers/pci/pci.h           |    2 +
>  include/linux/dmi.h         |    9 ++
>  6 files changed, 314 insertions(+), 1 deletions(-)
>  create mode 100644 drivers/pci/pci-label.c
> 
> diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
> index d464672..7d8439b 100644
> --- a/drivers/firmware/dmi_scan.c
> +++ b/drivers/firmware/dmi_scan.c
> @@ -277,6 +277,28 @@ static void __init dmi_save_ipmi_device(const
> struct dmi_header *dm)
>  	list_add_tail(&dev->list, &dmi_devices);
>  }
> 
> +static void __init dmi_save_devslot(int id, int seg, int bus, int
> devfn, const char *name)
> +{
> +	struct dmi_devslot *slot;
> +
> +	slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
> +	if (!slot) {
> +		printk(KERN_ERR "dmi_save_devslot: out of memory.\n");
> +		return;
> +	}
> +	slot->id = id;
> +	slot->seg = seg;
> +	slot->bus = bus;
> +	slot->devfn = devfn;
> +
> +	strcpy((char *)&slot[1], name);
> +	slot->dev.type = DMI_DEV_TYPE_DEVSLOT;
> +	slot->dev.name = (char *)&slot[1];
> +	slot->dev.device_data = slot;
> +
> +	list_add(&slot->dev.list, &dmi_devices);
> +}
> +
>  static void __init dmi_save_extended_devices(const struct dmi_header
> *dm)
>  {
>  	const u8 *d = (u8*) dm + 5;
> @@ -285,6 +307,7 @@ static void __init dmi_save_extended_devices(const
> struct dmi_header *dm)
>  	if ((*d & 0x80) == 0)
>  		return;
> 
> +	dmi_save_devslot(-1, *(u16 *)(d+2), *(d+4), *(d+5),
> dmi_string_nosave(dm, *(d-1)));
>  	dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
>  }
> 
> @@ -333,6 +356,7 @@ static void __init dmi_decode(const struct
> dmi_header *dm, void *dummy)
>  		break;
>  	case 41:	/* Onboard Devices Extended Information */
>  		dmi_save_extended_devices(dm);
> +		break;
>  	}
>  }
> 
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 0b51857..69c503a 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -4,7 +4,7 @@
> 
>  obj-y		+= access.o bus.o probe.o remove.o pci.o \
>  			pci-driver.o search.o pci-sysfs.o rom.o setup-res.o \
> -			irq.o vpd.o
> +			irq.o vpd.o pci-label.o
>  obj-$(CONFIG_PROC_FS) += proc.o
>  obj-$(CONFIG_SYSFS) += slot.o
> 
> diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
> new file mode 100644
> index 0000000..b35d48c
> --- /dev/null
> +++ b/drivers/pci/pci-label.c
> @@ -0,0 +1,273 @@
> +/*
> + * Purpose: Export the firmware label associated with a pci network
> interface
> + * device to sysfs
> + * Copyright (C) 2010 Dell Inc.
> + * by Narendra K <Narendra_K@dell.com>, Jordan Hargrave
> <Jordan_Hargrave@dell.com>
> + *
> + * This code checks if the pci network device has a related ACPI _DSM.
> If
> + * available, the code calls the _DSM to retrieve the index and string
> and
> + * exports them to sysfs. If the ACPI _DSM is not available, it falls
> back on
> + * SMBIOS. SMBIOS defines type 41 for onboard pci devices. This code
> retrieves
> + * strings associated with the type 41 and exports it to sysfs.
> + *
> + * Please see http://linux.dell.com/wiki/index.php/Oss/libnetdevname
> for more
> + * information.
> + */
> +
> +#include <linux/dmi.h>
> +#include <linux/sysfs.h>
> +#include <linux/pci.h>
> +#include <linux/pci_ids.h>
> +#include <linux/module.h>
> +#include <linux/acpi.h>
> +#include <linux/pci-acpi.h>
> +#include <acpi/acpi_drivers.h>
> +#include <acpi/acpi_bus.h>
> +#include "pci.h"
> +
> +#define	DEVICE_LABEL_DSM	0x07
> +
> +#if defined CONFIG_DMI
> +
> +struct smbios_attribute {
> +	struct attribute attr;
> +	ssize_t (*show) (struct device *dev, char *buf);
> +	ssize_t (*test) (struct device *dev, char *buf);
> +};
> +
> +static ssize_t
> +smbiosname_string_exists(struct device *dev, char *buf)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +  	const struct dmi_device *dmi;
> +  	struct dmi_devslot *dslot;
> +  	int bus;
> +  	int devfn;
> +
> +  	bus = pdev->bus->number;
> +  	devfn = pdev->devfn;
> +
> +  	dmi = NULL;
> +  	while ((dmi = dmi_find_device(DMI_DEV_TYPE_DEVSLOT, NULL, dmi))
> != NULL) {
> +    		dslot = dmi->device_data;
> +    		if (dslot && dslot->bus == bus && dslot->devfn == devfn) {
> +			if (buf)
> +      				return scnprintf(buf, PAGE_SIZE, "%s\n",
> dmi->name);
> +			return strlen(dmi->name);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static ssize_t
> +smbiosname_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> +{
> +	return smbiosname_string_exists(dev, buf);
> +}
> +
> +struct smbios_attribute smbios_attr_label = {
> +	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
> +	.show = smbiosname_show,
> +	.test = smbiosname_string_exists,
> +};
> +
> +static int
> +pci_create_smbiosname_file(struct pci_dev *pdev)
> +{
> +	if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev,
> NULL)) {
> +		sysfs_create_file(&pdev->dev.kobj,
> &smbios_attr_label.attr);
> +		return 0;
> +	}
> +	return -1;
> +}
> +
> +static int
> +pci_remove_smbiosname_file(struct pci_dev *pdev)
> +{
> +	if (smbios_attr_label.test && smbios_attr_label.test(&pdev->dev,
> NULL)) {
> +		sysfs_remove_file(&pdev->dev.kobj,
> &smbios_attr_label.attr);
> +		return 0;
> +	}
> +	return -1;
> +}
> +#else
> +static inline int
> +pci_create_smbiosname_file(struct pci_dev *pdev)
> +{
> +	return -1;
> +}
> +
> +static inline int
> +pci_remove_smbiosname_file(struct pci_dev *pdev)
> +{
> +	return -1;
> +}
> +#endif
> +
> +#if defined CONFIG_ACPI
> +
> +static const char device_label_dsm_uuid[] = {
> +	0xD0, 0x37, 0xC9, 0xE5, 0x53, 0x35, 0x7A, 0x4D,
> +	0x91, 0x17, 0xEA, 0x4D, 0x19, 0xC3, 0x43, 0x4D
> +};
> +
> +struct acpi_attribute {
> +	struct attribute attr;
> +	ssize_t (*show) (struct device *dev, char *buf);
> +	ssize_t (*test) (struct device *dev, char *buf);
> +};
> +
> +static int
> +dsm_get_label(acpi_handle handle, int func,
> +              struct acpi_buffer *output,
> +              char *buf, char *attribute)
> +{
> +	struct acpi_object_list input;
> +	union acpi_object params[4];
> +	union acpi_object *obj;
> +	int len = 0;
> +
> +	int err;
> +
> +	input.count = 4;
> +	input.pointer = params;
> +	params[0].type = ACPI_TYPE_BUFFER;
> +	params[0].buffer.length = sizeof(device_label_dsm_uuid);
> +	params[0].buffer.pointer = (char *)device_label_dsm_uuid;
> +	params[1].type = ACPI_TYPE_INTEGER;
> +	params[1].integer.value = 0x02;
> +	params[2].type = ACPI_TYPE_INTEGER;
> +	params[2].integer.value = func;
> +	params[3].type = ACPI_TYPE_PACKAGE;
> +	params[3].package.count = 0;
> +	params[3].package.elements = NULL;
> +
> +	err = acpi_evaluate_object(handle, "_DSM", &input, output);
> +	if (err)
> +		return -1;
> +
> +
> +	obj = (union acpi_object *)output->pointer;
> +
> +	switch (obj->type) {
> +	case ACPI_TYPE_PACKAGE:
> +		if (obj->package.count != 2)
> +			break;
> +		len = obj->package.elements[0].integer.value;
> +		if (buf) {
> +			if (!strncmp(attribute, "index", strlen(attribute)))
> +				scnprintf(buf, PAGE_SIZE, "%lu\n",
> +				obj->package.elements[0].integer.value);
> +			else
> +				scnprintf(buf, PAGE_SIZE, "%s\n",
> +				obj->package.elements[1].string.pointer);
> +			kfree(output->pointer);
> +			return strlen(buf);
> +		}
> +		kfree(output->pointer);
> +		return len;
> +	break;
> +	default:
> +		kfree(output->pointer);
> +		return -1;
> +	}
> +}
> +
> +static ssize_t
> +acpi_index_string_exist(struct device *dev, char *buf, char
> *attribute)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +
> +	struct acpi_buffer output = {ACPI_ALLOCATE_BUFFER, NULL};
> +	acpi_handle handle;
> +	int length;
> +
> +	handle = DEVICE_ACPI_HANDLE(dev);
> +
> +	if (!handle)
> +		return -1;
> +
> +	if ((length = dsm_get_label(handle, DEVICE_LABEL_DSM,
> +				    &output, buf, attribute)) < 0)
> +		return -1;
> +
> +	return length;
> +}
> +
> +static ssize_t
> +acpilabel_show(struct device *dev, struct device_attribute *attr, char
> *buf)
> +{
> +	return acpi_index_string_exist(dev, buf, "label");
> +}
> +
> +static ssize_t
> +acpiindex_show(struct device *dev, struct device_attribute *attr, char
> *buf)
> +{
> +	return acpi_index_string_exist(dev, buf, "index");
> +}
> +
> +struct acpi_attribute acpi_attr_label = {
> +	.attr = {.name = "label", .mode = 0444, .owner = THIS_MODULE},
> +	.show = acpilabel_show,
> +	.test = acpi_index_string_exist,
> +};
> +
> +struct acpi_attribute acpi_attr_index = {
> +	.attr = {.name = "index", .mode = 0444, .owner = THIS_MODULE},
> +	.show = acpiindex_show,
> +	.test = acpi_index_string_exist,
> +};
> +
> +static int
> +pci_create_acpi_index_label_files(struct pci_dev *pdev)
> +{
> +	if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev,
> NULL) > 0) {
> +		sysfs_create_file(&pdev->dev.kobj, &acpi_attr_label.attr);
> +		sysfs_create_file(&pdev->dev.kobj, &acpi_attr_index.attr);
> +		return 0;
> +	}
> +	return -1;
> +}
> +
> +static int
> +pci_remove_acpi_index_label_files(struct pci_dev *pdev)
> +{
> +	if (acpi_attr_label.test && acpi_attr_label.test(&pdev->dev,
> NULL) > 0) {
> +		sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_label.attr);
> +		sysfs_remove_file(&pdev->dev.kobj, &acpi_attr_index.attr);
> +		return 0;
> +	}
> +	return -1;
> +}
> +#else
> +static inline int
> +pci_create_acpi_index_label_files(struct pci_dev *pdev)
> +{
> +	return -1;
> +}
> +
> +static inline int
> +pci_remove_acpi_index_label_files(struct pci_dev *pdev)
> +{
> +	return -1;
> +}
> +#endif
> +
> +int pci_create_firmware_label_files(struct pci_dev *pdev)
> +{
> +	if (!pci_create_acpi_index_label_files(pdev))
> +		return 0;
> +	if (!pci_create_smbiosname_file(pdev))
> +		return 0;
> +	return -ENODEV;
> +}
> +
> +int pci_remove_firmware_label_files(struct pci_dev *pdev)
> +{
> +	if (!pci_remove_acpi_index_label_files(pdev))
> +		return 0;
> +	if (!pci_remove_smbiosname_file(pdev))
> +		return 0;
> +	return -ENODEV;
> +}
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index fad9398..4ed517f 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -1073,6 +1073,8 @@ int __must_check pci_create_sysfs_dev_files
> (struct pci_dev *pdev)
>  	if (retval)
>  		goto err_vga_file;
> 
> +	pci_create_firmware_label_files(pdev);
> +
>  	return 0;
> 
>  err_vga_file:
> @@ -1140,6 +1142,9 @@ void pci_remove_sysfs_dev_files(struct pci_dev
> *pdev)
>  		sysfs_remove_bin_file(&pdev->dev.kobj, pdev->rom_attr);
>  		kfree(pdev->rom_attr);
>  	}
> +
> +	pci_remove_firmware_label_files(pdev);
> +
>  }
> 
>  static int __init pci_sysfs_init(void)
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4eb10f4..f223283 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -11,6 +11,8 @@
>  extern int pci_uevent(struct device *dev, struct kobj_uevent_env
> *env);
>  extern int pci_create_sysfs_dev_files(struct pci_dev *pdev);
>  extern void pci_remove_sysfs_dev_files(struct pci_dev *pdev);
> +extern int pci_create_firmware_label_files(struct pci_dev *pdev);
> +extern int pci_remove_firmware_label_files(struct pci_dev *pdev);
>  extern void pci_cleanup_rom(struct pci_dev *dev);
>  #ifdef HAVE_PCI_MMAP
>  extern int pci_mmap_fits(struct pci_dev *pdev, int resno,
> diff --git a/include/linux/dmi.h b/include/linux/dmi.h
> index a8a3e1a..cc57c3a 100644
> --- a/include/linux/dmi.h
> +++ b/include/linux/dmi.h
> @@ -20,6 +20,7 @@ enum dmi_device_type {
>  	DMI_DEV_TYPE_SAS,
>  	DMI_DEV_TYPE_IPMI = -1,
>  	DMI_DEV_TYPE_OEM_STRING = -2,
> +	DMI_DEV_TYPE_DEVSLOT = -3,
>  };
> 
>  struct dmi_header {
> @@ -37,6 +38,14 @@ struct dmi_device {
> 
>  #ifdef CONFIG_DMI
> 
> +struct dmi_devslot {
> +	struct dmi_device dev;
> +	int id;
> +	int seg;
> +	int bus;
> +	int devfn;
> +};
> +
>  extern int dmi_check_system(const struct dmi_system_id *list);
>  const struct dmi_system_id *dmi_first_match(const struct dmi_system_id
> *list);
>  extern const char * dmi_get_system_info(int field);
> --
> 1.6.5.2
> 
> With regards,
> Narendra K
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: still having r8169 woes with XID 18000000
From: Timo Teräs @ 2010-06-04 20:24 UTC (permalink / raw)
  To: Phil Sutter; +Cc: françois romieu, netdev
In-Reply-To: <4C09387F.1050403@iki.fi>

On 06/04/2010 08:31 PM, Timo Teräs wrote:
> On 06/04/2010 04:43 PM, Phil Sutter wrote:
>> On Fri, Jun 04, 2010 at 04:02:11PM +0300, Timo Teräs wrote:
>>>> Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
>>>> enabled function rtl8169_phy_power_up() as well as some more invocations
>>>> of rtl8169_phy_power_down().
>>>>
>>>> This is probably the solution to these (at least in our case) very
>>>> sporadic, but highly annoying, problems. In fact, when our NIC didn't
>>>> detect any link, it needed a full power-cycle (no success with
>>>> reset-button), so almost not workaroundable.
>>>
> However, removing the specific phy config code
> (rtl8169scd_hw_phy_config) which was introduced by commit 2e955856ff
> seems to solve it. At least I was not able to reproduce the failure with
> 20-30 module reloads.

Ok, I figured that either the data the phy config writes is bad, or mdio
io is failing, so I added some additional checks to mdio_write and
mdio_read. More loops (upto 2000 iterations) and debug print if it
ultimately failed. And it did!

At bootup I got this:

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:09.0: PCI->APIC IRQ transform: INT A -> IRQ 18
r8169 0000:00:09.0: no PCI Express capability
eth0: RTL8169sc/8110sc at 0xf835c000, 00:30:18:a6:2b:6c, XID 18000000 IRQ 18
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0b.0: PCI->APIC IRQ transform: INT A -> IRQ 19
r8169 0000:00:0b.0: no PCI Express capability
eth1: RTL8169sc/8110sc at 0xf8360000, 00:30:18:a6:2b:6d, XID 18000000 IRQ 19
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0c.0: PCI->APIC IRQ transform: INT A -> IRQ 16
r8169 0000:00:0c.0: no PCI Express capability
eth2: RTL8169sc/8110sc at 0xf8364000, 00:30:18:a6:2b:6e, XID 18000000 IRQ 16
r8169: mdio_write(f8364000, 0x00000003, 0000000a1) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000000, 000001000) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000000, 00000a0ff) required 2000 cycles
r8169: mdio_write(f8364000, 0x00000014, 00000fb54) required 2000 cycles

And eth2 was not working. Reloading the module gave a lot of other
mdio_write and mdio_read errors.

It seems to be pretty random when the errors occur, but that's the
reason why the NIC stops working: mdio_write() fails (one or more times)
at some crucial point of the board specific phy config code resulting in
bad state.

Any ideas how to debug this further?

I guess next step is to compile the Realtek driver and see if that works
right.

> One more curiosity: if i do a hard power reset, the NIC has green link
> indicator led after power up. When loading the kernel module it goes to
> orange/red. I wonder why the difference.

Figured this. At startup it goes to 100mbit/s fixed mode. After module
load it gets 1gb/s. Setting it manually to 100mbit/s changes the color
back to green. So it's just a speed indicator.

^ permalink raw reply

* [PATCH nf-next-2.6 2/2] conntrack: per_cpu untracking
From: Eric Dumazet @ 2010-06-04 20:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Changli Gao, Netfilter Developers, netdev
In-Reply-To: <1275668732.2482.201.camel@edumazet-laptop>

NOTRACK makes all cpus share a cache line on nf_conntrack_untracked
twice per packet, slowing down performance.

This patch converts it to a per_cpu variable.

We assume same cpu is used for a given packet, entering and exiting the
NOTRACK state.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/netfilter/nf_conntrack.h |    5 +--
 net/netfilter/nf_conntrack_core.c    |   36 ++++++++++++++++++-------
 2 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 3bc38c7..84a4b6f 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -261,11 +261,10 @@ extern s16 (*nf_ct_nat_offset)(const struct nf_conn *ct,
 			       u32 seq);
 
 /* Fake conntrack entry for untracked connections */
+DECLARE_PER_CPU(struct nf_conn, nf_conntrack_untracked);
 static inline struct nf_conn *nf_ct_untracked_get(void)
 {
-	extern struct nf_conn nf_conntrack_untracked;
-
-	return &nf_conntrack_untracked;
+	return &__raw_get_cpu_var(nf_conntrack_untracked);
 }
 extern void nf_ct_untracked_status_or(unsigned long bits);
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 6c1da21..9c66141 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -62,8 +62,8 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
 
-struct nf_conn nf_conntrack_untracked;
-EXPORT_SYMBOL_GPL(nf_conntrack_untracked);
+DEFINE_PER_CPU(struct nf_conn, nf_conntrack_untracked);
+EXPORT_PER_CPU_SYMBOL(nf_conntrack_untracked);
 
 static int nf_conntrack_hash_rnd_initted;
 static unsigned int nf_conntrack_hash_rnd;
@@ -1183,10 +1183,21 @@ static void nf_ct_release_dying_list(struct net *net)
 	spin_unlock_bh(&nf_conntrack_lock);
 }
 
+static int untrack_refs(void)
+{
+	int cnt = 0, cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct nf_conn *ct = &per_cpu(nf_conntrack_untracked, cpu);
+
+		cnt += atomic_read(&ct->ct_general.use) - 1;
+	}
+	return cnt;
+}
+
 static void nf_conntrack_cleanup_init_net(void)
 {
-	/* wait until all references to nf_conntrack_untracked are dropped */
-	while (atomic_read(&nf_conntrack_untracked.ct_general.use) > 1)
+	while (untrack_refs() > 0)
 		schedule();
 
 	nf_conntrack_helper_fini();
@@ -1323,14 +1334,17 @@ module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint,
 
 void nf_ct_untracked_status_or(unsigned long bits)
 {
-	nf_conntrack_untracked.status |= bits;
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		per_cpu(nf_conntrack_untracked, cpu).status |= bits;
 }
 EXPORT_SYMBOL_GPL(nf_ct_untracked_status_or);
 
 static int nf_conntrack_init_init_net(void)
 {
 	int max_factor = 8;
-	int ret;
+	int ret, cpu;
 
 	/* Idea from tcp.c: use 1/16384 of memory.  On i386: 32MB
 	 * machine has 512 buckets. >= 1GB machines have 16384 buckets. */
@@ -1369,10 +1383,12 @@ static int nf_conntrack_init_init_net(void)
 		goto err_extend;
 #endif
 	/* Set up fake conntrack: to never be deleted, not in any hashes */
-#ifdef CONFIG_NET_NS
-	nf_conntrack_untracked.ct_net = &init_net;
-#endif
-	atomic_set(&nf_conntrack_untracked.ct_general.use, 1);
+	for_each_possible_cpu(cpu) {
+		struct nf_conn *ct = &per_cpu(nf_conntrack_untracked, cpu);
+
+		write_pnet(&ct->ct_net, &init_net);
+		atomic_set(&ct->ct_general.use, 1);
+	}
 	/*  - and look it like as a confirmed connection */
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 	return 0;



^ permalink raw reply related

* Re: [v5 Patch 1/3] netpoll: add generic support for bridge and bonding devices
From: Andy Gospodarek @ 2010-06-04 19:18 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jay Vosburgh, Flavio Leitner, linux-kernel, Matt Mackall, netdev,
	bridge, Andy Gospodarek, Neil Horman, Jeff Moyer,
	Stephen Hemminger, bonding-devel, David Miller
In-Reply-To: <4C062CBD.7090906@redhat.com>

On Wed, Jun 02, 2010 at 06:04:45PM +0800, Cong Wang wrote:
> On 06/02/10 02:42, Jay Vosburgh wrote:
>> Cong Wang<amwang@redhat.com>  wrote:
>>
>>> On 06/01/10 03:08, Flavio Leitner wrote:
>>>> On Mon, May 31, 2010 at 01:56:52PM +0800, Cong Wang wrote:
>>>>> Hi, Flavio,
>>>>>
>>>>> Please use the attached patch instead, try to see if it solves
>>>>> all your problems.
>>>>
>>>> I tried and it hangs. No backtraces this time.
>>>> The bond_change_active_slave() prints before NETDEV_BONDING_FAILOVER
>>>> notification, so I think it won't work.
>>>
>>> Ah, I thought the same.
>>>
>>>>
>>>> Please, correct if I'm wrong, but when a failover happens with your
>>>> patch applied, the netconsole would be disabled forever even with
>>>> another healthy slave, right?
>>>>
>>>
>>> Yes, this is an easy solution, because bonding has several modes,
>>> it is complex to make netpoll work in different modes.
>>
>> 	If I understand correctly, the root cause of the problem with
>> netconsole and bonding is that bonding is, ultimately, performing
>> printks with a write lock held, and when netconsole recursively calls
>> into bonding to send the printk over the netconsole, there is a deadlock
>> (when the bonding xmit function attempts to acquire the same lock for
>> read).
>
>
> Yes.
>
>>
>> 	You're trying to avoid the deadlock by shutting off netconsole
>> (permanently, it looks like) for one problem case: a failover, which
>> does some printks with a write lock held.
>>
>> 	This doesn't look to me like a complete solution, there are
>> other cases in bonding that will do printk with write locks held.  I
>> suspect those will also hang netconsole as things exist today, and won't
>> be affected by your patch below.
>
>
> I can expect that, bonding modes are complex.
>
>>
>> 	For example:
>>
>> 	The sysfs functions to set the primary (bonding_store_primary)
>> or active (bonding_store_active_slave) options: a pr_info is called to
>> provide a log message of the results.  These could be tested by setting
>> the primary or active options via sysfs, e.g.,
>>
>> echo eth0>  /sys/class/net/bond0/bonding/primary
>> echo eth0>  /sys/class/net/bond0/bonding/active
>>
>> 	If the kernel is defined with DEBUG, there are a few pr_debug
>> calls within write_locks (bond_del_vlan, for example).
>>
>> 	If the slave's underlying device driver's ndo_vlan_rx_register
>> or ndo_vlan_rx_kill_vid functions call printk (and it looks like some do
>> for error cases, e.g., igbvf, ehea, enic), those would also presumably
>> deadlock (because bonding holds its write_lock when calling the ndo_
>> vlan functions).
>>
>> 	It also appears that (with the patch below) some nominally
>> normal usage patterns will immediately disable netconsole.  The one that
>> comes to mind is if the primary= option is set (to "eth1" for this
>> example), but that slave not enslaved first (the slaves are added, say,
>> eth0 then eth1).  In that situation, when the primary slave (eth1 here)
>> is added, the first thing that will happen is a failover, and that will
>> disable netconsole.
>>
>
> Thanks for your detailed explanation!
>
> This is why I said bonding is complex. I guess we would have to adjust
> netpoll code for different bonding cases, one solution seems not fix all.
> I am not sure how much work to do, since I am not familiar with bonding
> code. Maybe Andy can help?
>

Sorry I've been silent until now.  This does seem quite similar to a
problem I've previously encountered when dealing with bonding+netpoll on
some old 2.6.9-based kernels.  There is no guarantee the methods used
there will apply here, but I'll talk about them anyway.

As Flavio noticed, recursive calls into the bond transmit routines were
not a good idea.  I discovered the same and worked around this issue by
checking to see if we could take the bond->lock for writing before
continuing.  If we could not get, I wanted to signal that this should be
queued for transmission later.  Based on the flow of netpoll_send_skb
(or possibly for another reason that is escaping me right now) I added
one of these checks in bond_poll_controller too.  These aren't the
prettiest fixes, but seemed to work well for me when I did this work in
the past.  I realize the differences are not that great compared to some
of the patches posted by Flavio, but I think they are worth trying.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index ef60244..d7b9b99 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1290,6 +1290,12 @@ static bool slaves_support_netpoll(struct net_device *bond_dev)
 static void bond_poll_controller(struct net_device *bond_dev)
 {
 	struct net_device *dev = bond_dev->npinfo->netpoll->real_dev;
+	struct bonding *bond = netdev_priv(bond_dev);
+
+	if (!write_trylock(&bond->lock))
+		return;
+	write_unlock(&bond->lock);
+
 	if (dev != bond_dev)
 		netpoll_poll_dev(dev);
 }
@@ -4418,7 +4424,11 @@ static void bond_set_xmit_hash_policy(struct bonding *bond)
 
 static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	const struct bonding *bond = netdev_priv(dev);
+	struct bonding *bond = netdev_priv(dev);
+
+	if (!write_trylock(&bond->lock))
+		return NETDEV_TX_BUSY;
+	write_unlock(&bond->lock);
 
 	switch (bond->params.mode) {
 	case BOND_MODE_ROUNDROBIN:

The other key to all of this is to make sure that queuing is done
correctly now that we expect to queue these frames and have them sent at
some point when there is a member of the bond that is actually capable
of sending them out.

The new style of sending queued skbs in a workqueue is much better than
what was done in the 2.6.9 timeframe, but careful attention should still
be paid to txq lock and which processor is the owner.  Returning
something other than NETDEV_TX_OK from bond_start_xmit and checking for
locks being held there should also help with any deadlocks that show up
while running in queue_process (though they would not be recursive).

I'm not in a good spot to test this right now, but I can take a look at
next week and we can try and track down any of the other deadlocks that
currently exist as I suspect this will not resolve all of the issues.

^ permalink raw reply related

* Re: [PATCH 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Ben Hutchings @ 2010-06-04 18:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <20100604102852.08ce3cd1@nehalam>

On Fri, 2010-06-04 at 10:28 -0700, Stephen Hemminger wrote:
> On Thu, 03 Jun 2010 20:11:38 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> > static inline u64 rtnl_link_stats64_read(const u64 *field)
> > {
> > 	return ACCESS_ONCE(*field);
> > }
> > static inline u32 rtnl_link_stats64_read32(const u64 *field)
> > {
> > 	return ACCESS_ONCE(*field);
> > }
> 
> Do we really care if compiler reorders access. I think not.
> There was no order guarantee in the past.

Since these reads are potentially racing with writes, we want to ensure
that they are atomic.  Without the volatile-qualification, the compiler
can legitimately split or repeat the reads, though I don't see any
particular reason why this is a likely optimisation.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: still having r8169 woes with XID 18000000
From: Timo Teräs @ 2010-06-04 17:31 UTC (permalink / raw)
  To: Phil Sutter; +Cc: françois romieu, netdev
In-Reply-To: <20100604134351.7981F4CD45@orbit.nwl.cc>

On 06/04/2010 04:43 PM, Phil Sutter wrote:
> On Fri, Jun 04, 2010 at 04:02:11PM +0300, Timo Teräs wrote:
>>> Comparing r8169-6.013 with it's predecessor 6.012, you'll find a newly
>>> enabled function rtl8169_phy_power_up() as well as some more invocations
>>> of rtl8169_phy_power_down().
>>>
>>> This is probably the solution to these (at least in our case) very
>>> sporadic, but highly annoying, problems. In fact, when our NIC didn't
>>> detect any link, it needed a full power-cycle (no success with
>>> reset-button), so almost not workaroundable.
>>
>> Sounds very similar to the problem I have. Thanks for the pointers!
>>
>> It looks like the r8169 driver does have phy power up code in it, but
>> it's only executed for specific versions of the chip. Realtek driver
>> seems to do it unconditionally.
> 
> Hmm. I actually never looked at the corresponding parts of the
> in-tree-driver, but that would have definitely been the next step in
> order to fix it.
> 
>> The check seems to be:
>>                 if ((tp->mac_version == RTL_GIGA_MAC_VER_11) ||
>>                     (tp->mac_version == RTL_GIGA_MAC_VER_12) ||
>>                     (tp->mac_version >= RTL_GIGA_MAC_VER_17)) {
>>
>> I wonder if I should just add my mac version there (_VER_05) and test if
>> it'll make it better.
> 
> Surely worth a try. On the other hand, looking at the sheer mass of
> problem reports regarding this driver, making it worse is rather hard to
> do I guess. :)

Ok. The issue is semi-reliably producible with just removing the
kernel driver and reloading it. The problem occurs maybe with 10-20% chance.

So far, it looks like the phy wakeup does not really help. Adding the
_VER_05 check did not help.

However, removing the specific phy config code
(rtl8169scd_hw_phy_config) which was introduced by commit 2e955856ff
seems to solve it. At least I was not able to reproduce the failure with
20-30 module reloads.

One more curiosity: if i do a hard power reset, the NIC has green link
indicator led after power up. When loading the kernel module it goes to
orange/red. I wonder why the difference.

- Timo

^ permalink raw reply

* Re: [PATCH 1/2] net: Enable 64-bit net device statistics on 32-bit architectures
From: Stephen Hemminger @ 2010-06-04 17:28 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, Arnd Bergmann, netdev, linux-net-drivers
In-Reply-To: <1275592298.2106.36.camel@achroite.uk.solarflarecom.com>

On Thu, 03 Jun 2010 20:11:38 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> static inline u64 rtnl_link_stats64_read(const u64 *field)
> {
> 	return ACCESS_ONCE(*field);
> }
> static inline u32 rtnl_link_stats64_read32(const u64 *field)
> {
> 	return ACCESS_ONCE(*field);
> }

Do we really care if compiler reorders access. I think not.
There was no order guarantee in the past.

-- 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox