Netdev List

* Re: [PATCH 2/2] Ethtool: add EEE to ethtool's documentation
From: Ben Hutchings @ 2012-06-06 15:56 UTC
  To: Yuval Mintz; +Cc: netdev, eilong, peppe.cavallaro
In-Reply-To: <1338878470-24784-3-git-send-email-yuvalmin@broadcom.com>

On Tue, 2012-06-05 at 09:41 +0300, Yuval Mintz wrote:
> under Synopsis:
> 	ethtool --get-eee devname
> 	ethtool --set-eee devname [eee on|off] [tx-lpi on|off] [tx-timer N] [advertise N]
> 
> under Options:
>        --get-eee
>               Queries  the  specified network device for its support in Efficient Energy Ethernet (ac-
> 	      cording to the IEEE 802.3az specifications)
>        --set-eee
> 	      Sets the device EEE behaviour.
>        eee on|off
> 	      Enables/Disables the device support in EEE.
>        tx-lpi on|off
> 	      Determines whether the device should assert its tx lpi.
>        advertise N
>               Sets the speeds for which the device would advertise EEE capabliities.   Values  are  as
> 	      for --change advertise
>        tx-timer N
>               Sets  the  amount  of time the device should stay in idle mode prior to asserting its tx
>               lpi (in microseconds). This has meaning only when tx lpi is on.

Please just fold this change into the patch that adds the options.  Also
there is no need to repeat the content in the commit message.

> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
>  ethtool.8.in |   32 ++++++++++++++++++++++++++++++++
>  1 files changed, 32 insertions(+), 0 deletions(-)
> 
> diff --git a/ethtool.8.in b/ethtool.8.in
> index 523b737..b906d8e 100644
> --- a/ethtool.8.in
> +++ b/ethtool.8.in
> @@ -335,6 +335,16 @@ ethtool \- query or control network driver and hardware settings
>  .I devname flag
>  .A1 on off
>  .RB ...
> +.HP
> +.B ethtool \-\-get\-eee
> +.I devname
> +.HP
> +.B ethtool \-\-set\-eee
> +.I devname
> +.B2 eee on off
> +.B2 tx-lpi on off
> +.BN tx-timer
> +.BN advertise
>  .
>  .\" Adjust lines (i.e. full justification) and hyphenate.
>  .ad
> @@ -817,6 +827,28 @@ Sets the device's private flags as specified.
>  .I flag
>  .A1 on off
>  Sets the state of the named private flag.
> +.TP
> +.B \-\-get\-eee
> +Queries the specified network device for its support in Efficient Energy
> +Ethernet (according to the IEEE 802.3az specifications)

'of', not 'in'

> +.TP
> +.B \-\-set\-eee
> +Sets the device EEE behaviour.
> +.TP
> +.A2 eee on off
> +Enables/Disables the device support in EEE.

Lower-case 'd' for 'disables'.
'of', not 'in'

> +.TP
> +.A2 tx-lpi on off
> +Determines whether the device should assert its tx lpi.

'TX LPI' should be in upper-case.

> +.TP
> +.BI advertise \ N
> +Sets the speeds for which the device would advertise EEE capabliities.

'should', not 'would'
'capabilities', not 'capabliities'

> +Values are as for
> +.B \-\-change advertise
> +.TP
> +.BI tx-timer \ N
> +Sets the amount of time the device should stay in idle mode prior to asserting
> +its tx lpi (in microseconds). This has meaning only when tx lpi is on.

Same here.

Ben.

>  .SH BUGS
>  Not supported (in part or whole) on all network drivers.
>  .SH AUTHOR

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: Possible deadlock in ipv6?
From: Eric Dumazet @ 2012-06-06 15:53 UTC
  To: Vladimir Davydov; +Cc: netdev
In-Reply-To: <4FCF6DF4.2090304@parallels.com>

On Wed, 2012-06-06 at 18:49 +0400, Vladimir Davydov wrote:
> I'm not familiar with the linux net subsystem, so I would appreciate if 
> someone could clarify if the following call chain is possible:
> 
> addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for 
> writing and calls
> 
>      pneigh_ifdown
>      pndisc_destructor
>      ipv6_dev_mc_dec
>      __ipv6_dev_mc_dec
>      igmp6_group_dropped
>      igmp6_leave_group
>      igmp6_send
>      icmp6_dst_alloc
>      ip6_neigh_lookup
>      neigh_create
> 
> and neigh_create() locks nd_tbl.lock for writing again resulting in a 
> deadlock.

It seems a deadlock is indeed possible, good catch!
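
The pattern in the quoted call chain, a non-recursive lock taken again further down the same call path, can be sketched in userspace. This is only an illustration under stated assumptions: `table_sketch`, `ifdown_path()` and `create_entry()` are hypothetical stand-ins, not kernel code, and a C11 `atomic_flag` try-lock replaces the kernel lock so that the nested attempt reports failure instead of hanging.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a table guarded by a non-recursive lock. */
struct table_sketch {
	atomic_flag lock;
};

/* Try to take the lock; returns false if it is already held. */
static bool table_trylock(struct table_sketch *t)
{
	return !atomic_flag_test_and_set_explicit(&t->lock,
						  memory_order_acquire);
}

static void table_unlock(struct table_sketch *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Plays the role of neigh_create(): it needs the table lock itself. */
static bool create_entry(struct table_sketch *t)
{
	if (!table_trylock(t))
		return false;	/* a blocking lock would wait forever here */
	table_unlock(t);
	return true;
}

/* Plays the role of the ifdown path: holds the lock across the call. */
static bool ifdown_path(struct table_sketch *t)
{
	bool nested_ok;

	if (!table_trylock(t))
		return false;
	nested_ok = create_entry(t);	/* re-enters while the lock is held */
	table_unlock(t);
	return nested_ok;
}
```

With `struct table_sketch t = { ATOMIC_FLAG_INIT };`, calling `create_entry(&t)` on its own succeeds, while `ifdown_path(&t)` reports that the nested acquisition failed. With a blocking lock, that failure would be a hang.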

* Re: [PATCH 1/2] Ethtool: Add EEE support
From: Ben Hutchings @ 2012-06-06 15:48 UTC
  To: Yuval Mintz; +Cc: netdev, eilong, peppe.cavallaro
In-Reply-To: <1338878470-24784-2-git-send-email-yuvalmin@broadcom.com>

On Tue, 2012-06-05 at 09:41 +0300, Yuval Mintz wrote:
> This patch adds 2 new ethtool commands which can be
> used to manipulate network interfaces' support in
> EEE.
> 
> Output of 'get' has the following form:
> 
> 	EEE Settings for p2p1:
> 		EEE status: enabled - active
> 		Tx LPI: 1000 (u)
> 		Supported EEE link modes:  10000baseT/Full
> 		Advertised EEE link modes:  10000baseT/Full
> 		Link partner advertised EEE link modes:  10000baseT/Full
> 
> Thanks goes to Giuseppe Cavallaro for his original patch.
[...]
> diff --git a/ethtool.c b/ethtool.c
> index f18f611..063e72b 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -359,7 +359,8 @@ static int do_version(struct cmd_context *ctx)
>  	return 0;
>  }
>  
> -static void dump_link_caps(const char *prefix, const char *an_prefix, u32 mask);
> +static void dump_link_caps(const char *prefix, const char *an_prefix, u32 mask,
> +			    u8 all);

For now, use int for booleans.  At some point I would like to see a
thorough cleanup of ethtool to use bool where appropriate - but that's
independent of this.

[...]
>  /* Print link capability flags (supported, advertised or lp_advertised).
>   * Assumes that the corresponding SUPPORTED and ADVERTISED flags are equal.
>   */
>  static void
> -dump_link_caps(const char *prefix, const char *an_prefix, u32 mask)
> +dump_link_caps(const char *prefix, const char *an_prefix, u32 mask, u8 all)
>  {
>  	int indent;
>  	int did1;
> @@ -456,24 +457,26 @@ dump_link_caps(const char *prefix, const char *an_prefix, u32 mask)
>  		 fprintf(stdout, "Not reported");
>  	fprintf(stdout, "\n");
>  
> -	fprintf(stdout, "	%s pause frame use: ", prefix);
> -	if (mask & ADVERTISED_Pause) {
> -		fprintf(stdout, "Symmetric");
> -		if (mask & ADVERTISED_Asym_Pause)
> -			fprintf(stdout, " Receive-only");
> -		fprintf(stdout, "\n");
> -	} else {
> -		if (mask & ADVERTISED_Asym_Pause)
> -			fprintf(stdout, "Transmit-only\n");
> +	if (all) {

It might be clearer to invert this flag and name it something like
'link_mode_only'.
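
A rough sketch of what the inverted flag might look like, reduced to a standalone function. The `CAP_*` bit values, the name `dump_link_caps_sketch` and its return value (the number of lines written) are assumptions made for illustration; they are not taken from ethtool, and the link mode listing itself is left out.

```c
#include <stdio.h>

/* Hypothetical bit values standing in for the ADVERTISED_* flags. */
#define CAP_PAUSE	(1u << 0)
#define CAP_ASYM_PAUSE	(1u << 1)
#define CAP_AUTONEG	(1u << 2)

/*
 * Print capability lines for one mask.  When link_mode_only is non-zero
 * (the EEE case), the pause and auto-negotiation lines are skipped.
 * Returns the number of lines written.
 */
static int dump_link_caps_sketch(FILE *out, const char *prefix,
				 unsigned int mask, int link_mode_only)
{
	int lines = 0;

	fprintf(out, "\t%s link modes: (omitted in this sketch)\n", prefix);
	lines++;

	if (link_mode_only)
		return lines;

	fprintf(out, "\t%s pause frame use: ", prefix);
	if (mask & CAP_PAUSE)
		fprintf(out, "Symmetric%s\n",
			(mask & CAP_ASYM_PAUSE) ? " Receive-only" : "");
	else
		fprintf(out, "%s\n",
			(mask & CAP_ASYM_PAUSE) ? "Transmit-only" : "No");
	lines++;

	fprintf(out, "\t%s auto-negotiation: %s\n", prefix,
		(mask & CAP_AUTONEG) ? "Yes" : "No");
	lines++;

	return lines;
}
```

The EEE callers would then pass 1 and the existing callers 0, which reads more naturally at the call site than an `all` argument of 0.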

> +		fprintf(stdout, "	%s pause frame use: ", prefix);
> +		if (mask & ADVERTISED_Pause) {
> +			fprintf(stdout, "Symmetric");
> +			if (mask & ADVERTISED_Asym_Pause)
> +				fprintf(stdout, " Receive-only");
> +			fprintf(stdout, "\n");
> +		} else {
> +			if (mask & ADVERTISED_Asym_Pause)
> +				fprintf(stdout, "Transmit-only\n");
> +			else
> +				fprintf(stdout, "No\n");
> +		}
> +
> +		fprintf(stdout, "	%s auto-negotiation: ", an_prefix);
> +		if (mask & ADVERTISED_Autoneg)
> +			fprintf(stdout, "Yes\n");
>  		else
>  			fprintf(stdout, "No\n");
>  	}
> -
> -	fprintf(stdout, "	%s auto-negotiation: ", an_prefix);
> -	if (mask & ADVERTISED_Autoneg)
> -		fprintf(stdout, "Yes\n");
> -	else
> -		fprintf(stdout, "No\n");
>  }
>  
>  static int dump_ecmd(struct ethtool_cmd *ep)
[...]
> @@ -1116,6 +1120,36 @@ static int dump_rxfhash(int fhash, u64 val)
>  	return 0;
>  }
>  
> +static int dump_eeecmd(struct ethtool_eee *ep)

Is there any reason for this not to return void?

> +{
> +
> +	fprintf(stdout, "	EEE status: ");
> +	if (!ep->supported) {
> +		fprintf(stdout, "not supported\n");
> +		return 0;
> +	} else if (!ep->eee_enabled) {
> +		fprintf(stdout, "disabled\n");
> +	} else {
> +		fprintf(stdout, "enabled - ");
> +		if (ep->eee_active)
> +			fprintf(stdout, "active\n");
> +		else
> +			fprintf(stdout, "inactive\n");
> +	}
> +
> +	fprintf(stdout, "	Tx LPI:");
> +	if (ep->tx_lpi_enabled)
> +		fprintf(stdout, " %d (u)\n", ep->tx_lpi_timer);

"us" not "(u)"

> +	else
> +		fprintf(stdout, " disabled\n");
> +
> +	dump_link_caps("Supported EEE", "", ep->supported, 0);
> +	dump_link_caps("Advertised EEE", "", ep->advertised, 0);
> +	dump_link_caps("Link partner advertised EEE", "", ep->lp_advertised, 0);
> +
> +	return 0;
> +}
> +
[...]
> +static int do_seee(struct cmd_context *ctx)
> +{
> +	int adv_c = -1, lpi_c = -1, lpi_time_c = -1, eee_c = -1;
> +	int change = -1, change2 = -1;
> +	struct ethtool_eee eeecmd;
> +	struct cmdline_info cmdline_eee[] = {
> +		{ "advertise",    CMDL_U32,  &adv_c,       &eeecmd.advertised },
> +		{ "tx-lpi",       CMDL_BOOL, &lpi_c,   &eeecmd.tx_lpi_enabled },
> +		{ "tx-timer",	  CMDL_U32,  &lpi_time_c, &eeecmd.tx_lpi_timer},
> +		{ "eee",	  CMDL_BOOL, &eee_c,	   &eeecmd.eee_enabled},
> +	};
> +
> +	if (ctx->argc == 0)
> +		exit_bad_args();
> +
> +	parse_generic_cmdline(ctx, &change, cmdline_eee,
> +			      ARRAY_SIZE(cmdline_eee));
> +
> +	eeecmd.cmd = ETHTOOL_GEEE;
> +	if (send_ioctl(ctx, &eeecmd)) {
> +		perror("Cannot get EEE settings");
> +		return 1;
> +	}
> +
> +	do_generic_set(cmdline_eee, ARRAY_SIZE(cmdline_eee), &change2);
> +
> +	if (change2) {
> +
> +		eeecmd.cmd = ETHTOOL_SEEE;
> +		if (send_ioctl(ctx, &eeecmd)) {
> +			perror("Cannot set EEE settings");
> +			return 1;
> +		}
> +	}
> +
> +	return 1;

return 0, I think!

> +}
> +
>  int send_ioctl(struct cmd_context *ctx, void *cmd)
>  {
>  #ifndef TEST_ETHTOOL
> @@ -3423,6 +3516,12 @@ static const struct option {
>  	  "		[ hex on|off ]\n"
>  	  "		[ offset N ]\n"
>  	  "		[ length N ]\n" },
> +	{ "--get-eee", 1, do_geee, "Get EEE settings"},
> +	{ "--set-eee", 1, do_seee, "Set EEE settings",
> +	  "		[ eee on|off ]\n"
> +	  "		[ advertise %x ]\n"
> +	  "		[ tx-lpi on|off ]\n"
> +	  "		[ tx-timer %x ]\n"},

The tx-timer value would normally be specified in decimal, so put "%d"
here.

>  	{ "-h|--help", 0, show_usage, "Show this help" },
>  	{ "--version", 0, do_version, "Show version number" },
>  	{}

You also need to add some test cases for the command line parsing in
test-cmdline.c.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [net-next PATCH v2 1/3] Added kernel support in EEE Ethtool commands
From: Ben Hutchings @ 2012-06-06 15:20 UTC
  To: Yuval Mintz; +Cc: davem, netdev, eilong, peppe.cavallaro
In-Reply-To: <1338973098-16439-2-git-send-email-yuvalmin@broadcom.com>

On Wed, 2012-06-06 at 11:58 +0300, Yuval Mintz wrote:
> This patch extends the kernel's ethtool interface by adding support
> for 2 new EEE commands - get_eee and set_eee.
> 
> Thanks goes to Giuseppe Cavallaro for his original patch adding this support.
> 
> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
> Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
> ---
>  include/linux/ethtool.h |   32 ++++++++++++++++++++++++++++++++
>  net/core/ethtool.c      |   40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 72 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index e17fa71..6250e1f 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -137,6 +137,32 @@ struct ethtool_eeprom {
>  };
>  
>  /**
> + * struct ethtool_eee - Energy Efficient Ethernet information
> + * @cmd: ETHTOOL_{G,S}EEE
> + * @supported: Link speeds for which there is eee support.
> + * @advertised: Link speeds the interface advertises (AN) as eee capable.
> + * @lp_advertised: Link speeds the link partner advertised as eee capable.

And these are bitmasks of SUPPORTED_* & ADVERTISED_* flags, right?
Maybe 'link modes' not 'link speeds'?

Otherwise, this all looks good to me (with limited knowledge of EEE).

Ben.

> + * @eee_active: Result of the eee auto negotiation.
> + * @eee_enabled: EEE configured mode (enabled/disabled).
> + * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
> + *	that eee was negotiated.
> + * @tx_lpi_timer: Time in microseconds the interface delays prior to asserting
> + *	its tx lpi (after reaching 'idle' state). Effective only when eee
> + *	was negotiated and tx_lpi_enabled was set.
> + */
> +struct ethtool_eee {
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 15:19 UTC
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606144941.GA17092@redhat.com>

On Wed, 2012-06-06 at 17:49 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> > 
> > > We currently do all stats either on napi callback or from
> > > start_xmit callback.
> > > This makes them safe, yes?
> > 
> > Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> > include/linux/u64_stats_sync.h section 6)
> > 
> >  * 6) If counter might be written by an interrupt, readers should block interrupts.
> >  *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> >  *     read partial values)
> > 
> > Yes, it's tricky...
> 
> Sounds good, but I have a question: this relies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

When this stuff was discussed, we chose to have a nop on 64bits.

Your point has little to do with 64bit stats, it was already like that
with 'long int' counters.

Consider an average driver doing:

dev->stats.rx_bytes += skb->len;
dev->stats.rx_packets++;

A concurrent reader can read an updated rx_bytes and a 'previous'
rx_packets value.

'Fixing' this requires a lot of work and memory barriers (in all
drivers), for very little gain (at most one packet of error).

u64_stats_sync was really meant to be 0-cost on 64bit arches.
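
For readers unfamiliar with the idea, here is a minimal userspace sketch of the sequence-counter pattern that u64_stats_sync relies on where 64-bit loads are not atomic. It is an illustration only, not the kernel implementation: the names `stats_sketch`, `stats_add()` and `stats_read()` are made up, C11 atomics stand in for the kernel primitives, and a single writer is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct stats_sketch {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic uint64_t bytes;
	_Atomic uint64_t packets;
};

/* Writer side: bump seq to odd, update both counters, bump seq to even. */
static void stats_add(struct stats_sketch *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1) == 0);		/* single-writer assumption */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->bytes,
		atomic_load_explicit(&s->bytes, memory_order_relaxed) + len,
		memory_order_relaxed);
	atomic_store_explicit(&s->packets,
		atomic_load_explicit(&s->packets, memory_order_relaxed) + 1,
		memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both counters were read under one even seq. */
static void stats_read(struct stats_sketch *s, uint64_t *bytes,
		       uint64_t *packets)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		*packets = atomic_load_explicit(&s->packets,
						memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&s->seq, memory_order_relaxed));
}
```

A reader using `stats_read()` always gets a matching bytes/packets pair, at the price of the extra counter and barriers on every update. That cost is what the 64-bit fast path avoids.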

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Stephen Hemminger @ 2012-06-06 15:14 UTC
  To: Michael S. Tsirkin
  Cc: Eric Dumazet, Jason Wang, netdev, rusty, linux-kernel,
	virtualization
In-Reply-To: <20120606144941.GA17092@redhat.com>

On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Sounds good, but I have a question: this relies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 14:49 UTC
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338988210.2760.4485.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> 
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
> 
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
> 
>  * 6) If counter might be written by an interrupt, readers should block interrupts.
>  *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
>  *     read partial values)
> 
> Yes, it's tricky...

Sounds good, but I have a question: this relies on counters
being atomic on 64 bit.
Would not it be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
>  		u64 tpackets, tbytes, rpackets, rbytes;
>  
>  		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> +			start = u64_stats_fetch_begin_bh(&stats->syncp);
>  			tpackets = stats->tx_packets;
>  			tbytes   = stats->tx_bytes;
>  			rpackets = stats->rx_packets;
>  			rbytes   = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>  
>  		tot->rx_packets += rpackets;
>  		tot->tx_packets += tpackets;
> 

* Possible deadlock in ipv6?
From: Vladimir Davydov @ 2012-06-06 14:49 UTC
  To: netdev

I'm not familiar with the Linux net subsystem, so I would appreciate it if
someone could clarify whether the following call chain is possible:

addrconf_ifdown() calls neigh_ifdown(nd_tbl) which locks nd_tbl.lock for 
writing and calls

     pneigh_ifdown
     pndisc_destructor
     ipv6_dev_mc_dec
     __ipv6_dev_mc_dec
     igmp6_group_dropped
     igmp6_leave_group
     igmp6_send
     icmp6_dst_alloc
     ip6_neigh_lookup
     neigh_create

and neigh_create() locks nd_tbl.lock for writing again resulting in a 
deadlock.

Thank you.

* Re: [PATCH] ip.7: Improve explanation about calling listen or connect
From: Flavio Leitner @ 2012-06-06 14:44 UTC
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Peter Schiffer, linux-man-u79uwXL29TY76Z2rM5mHXA, netdev
In-Reply-To: <4FBF66D8.7060007-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


Hi,

Could someone tell me the current state of this patch?
It has been a month already with no feedback.
Thanks,
fbl

On Fri, 25 May 2012 13:02:48 +0200
Peter Schiffer <pschiffe-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Hi Michael,
> 
> do you have any comments for this update? Or do you need some supporting 
> info?
> 
> peter
> 
> On 05/09/2012 02:30 PM, Flavio Leitner wrote:
> > Signed-off-by: Flavio Leitner<fbl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >   man7/ip.7 |   15 +++++++++------
> >   1 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/man7/ip.7 b/man7/ip.7
> > index 9f560df..84fe32d 100644
> > --- a/man7/ip.7
> > +++ b/man7/ip.7
> > @@ -69,12 +69,11 @@ For
> >   you may specify a valid IANA IP protocol defined in
> >   RFC\ 1700 assigned numbers.
> >   .PP
> > -.\" FIXME ip current does an autobind in listen, but I'm not sure
> > -.\" if that should be documented.
> >   When a process wants to receive new incoming packets or connections, it
> >   should bind a socket to a local interface address using
> >   .BR bind (2).
> > -Only one IP socket may be bound to any given local (address, port) pair.
> > +In this case, only one IP socket may be bound to any given local
> > +(address, port) pair.
> >   When
> >   .B INADDR_ANY
> >   is specified in the bind call, the socket will be bound to
> > @@ -82,10 +81,14 @@ is specified in the bind call, the socket will be bound to
> >   local interfaces.
> >   When
> >   .BR listen (2)
> > -or
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port with the local address set to
> > +.BR INADDR_ANY .
> > +When
> >   .BR connect (2)
> > -are called on an unbound socket, it is automatically bound to a
> > -random free port with the local address set to
> > +is called on an unbound socket, the socket is automatically bound
> > +to a random free port or an usable shared port with the local address
> > +set to
> >   .BR INADDR_ANY .
> >
> >   A TCP local socket address that has been bound is unavailable for
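
The autobind behaviour that the proposed text documents for listen(2) can be observed from userspace. The following is a small sketch, assuming a Linux system where IPv4 TCP sockets are available; `autobound_port()` is a name invented for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Call listen(2) on a TCP socket that was never bound and report the
 * port the kernel picked.  Returns the port number, or -1 on any error
 * or if the local address is not INADDR_ANY.
 */
static int autobound_port(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int port = -1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (listen(fd, 1) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0 &&
	    addr.sin_addr.s_addr == htonl(INADDR_ANY))
		port = ntohs(addr.sin_port);

	close(fd);
	return port;
}
```

The connect(2) case is not covered by this sketch, since it needs a reachable peer.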

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 14:36 UTC
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=jGHo82mo-s8Tfs9LWzfu2GkrS4eZJoeOpHhpXHMr6csg@mail.gmail.com>

2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
>> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>>
>>>> Well, well, well, after having tested several configurations, several
>>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>>> newer one (I tried 3.2 and 3.4).
>>>>
>>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>>> set to 4096. I am sending packets only, nothing on RX.
>>>> I send from 1Gbps upto 2.4Gbps and I see no drops in tc with 2.6.26
>>>> kernel, but a lot of drops with a newer kernel.
>>>> So, I don't know if I missed something in my kernel configuration, but
>>>> I have used the 2.6.26 one as a reference, in order to set the same
>>>> options (DMA related, etc).
>>>>
>>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>>> it partially.
>>>> 1Gbps requires a txqueulen of 9000, 2.4Gbps requires more than 20000 !
>>>>
>>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>>
>>>
>>> Yep.
>>>
>>> This driver wants to limit number of tx completions, thats just wrong.
>>>
>>> Fix and dirty patch:
>>>
>>>
>>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>>> index c5c4c0e..1e8f8a6 100644
>>> --- a/drivers/net/ethernet/emulex/benet/be.h
>>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>>>  #define MAX_TX_QS              8
>>>  #define MAX_ROCE_EQS           5
>>>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>>> -#define BE_TX_BUDGET           256
>>> +#define BE_TX_BUDGET           65535
>>>  #define BE_NAPI_WEIGHT         64
>>>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>>>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>>>
>>
>> I will try that in a few minutes.
>> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
>> a broadcom (bnx2x).
>>
>
> And it is not really better, still need about 18000 at 2.4Gbps in
> order to avoid drops...
> I really think there is something in the networking stack or in my
> configuration (DMA ? Something else ?)...
> As it doesn't seem to be driver related as I said...
>

If it can help, on a 3.0 kernel a txqueuelen of 9000 is sufficient to
get this bandwidth on TX.

JM
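
To put these queue lengths in perspective, here is a back-of-the-envelope calculation. It assumes every packet is exactly 4000 bytes on the wire (headers ignored) and that the queue drains at the full sending rate; the function names are invented for this sketch.

```c
#include <stdint.h>

/* Packets per second needed to sustain a bit rate with fixed-size packets. */
static uint64_t packets_per_second(uint64_t bits_per_second,
				   uint64_t packet_bytes)
{
	return bits_per_second / (8 * packet_bytes);
}

/* Time in milliseconds that a full queue of qlen packets represents. */
static uint64_t queue_depth_ms(uint64_t qlen, uint64_t pps)
{
	return qlen * 1000 / pps;
}
```

Under these assumptions, 2.4Gbps is 75000 packets per second, so a txqueuelen of 20000 holds roughly 266 ms of traffic. At 1Gbps (31250 packets per second), a txqueuelen of 9000 holds 288 ms.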

* [PATCH 1/1] block/nbd: micro-optimization in nbd request completion
From: Chetan Loke @ 2012-06-06 14:15 UTC
  To: Paul.Clements, axboe, linux-kernel; +Cc: netdev, Chetan Loke


Add in-flight cmds to the tail. That way, while searching (during request completion), we will always get a hit on the first element.


Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 drivers/block/nbd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 061427a..8957b9f 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -481,7 +481,7 @@ static void nbd_handle_req(struct nbd_device *nbd, struct request *req)
 		nbd_end_request(req);
 	} else {
 		spin_lock(&nbd->queue_lock);
-		list_add(&req->queuelist, &nbd->queue_head);
+		list_add_tail(&req->queuelist, &nbd->queue_head);
 		spin_unlock(&nbd->queue_lock);
 	}
 
-- 
1.7.5.2
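
A small userspace sketch of why tail insertion helps, assuming replies arrive in the order the requests were sent. The singly linked `queue_sketch` and its helpers are hypothetical stand-ins for the kernel's `list_head` API; `find_steps()` counts how many nodes a search from the head visits.

```c
#include <stddef.h>

struct req_sketch {
	int id;
	struct req_sketch *next;
};

struct queue_sketch {
	struct req_sketch *head;
	struct req_sketch *tail;
};

/* Insert at the front, like list_add(). */
static void queue_add_head(struct queue_sketch *q, struct req_sketch *r)
{
	r->next = q->head;
	q->head = r;
	if (!q->tail)
		q->tail = r;
}

/* Insert at the back, like list_add_tail(). */
static void queue_add_tail(struct queue_sketch *q, struct req_sketch *r)
{
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Number of nodes visited from the head until id is found, 0 if absent. */
static int find_steps(const struct queue_sketch *q, int id)
{
	int steps = 0;
	const struct req_sketch *r;

	for (r = q->head; r; r = r->next) {
		steps++;
		if (r->id == id)
			return steps;
	}
	return 0;
}
```

With three requests queued, looking up the oldest one takes one step after tail insertion and three steps after head insertion.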

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Eric Dumazet @ 2012-06-06 13:10 UTC
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <20120606111357.GA15070@redhat.com>

On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:

> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?

Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)

 * 6) If counter might be written by an interrupt, readers should block interrupts.
 *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
 *     read partial values)

Yes, it's tricky...

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;

 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin_bh(&stats->syncp);
 			tpackets = stats->tx_packets;
 			tbytes   = stats->tx_bytes;
 			rpackets = stats->rx_packets;
 			rbytes   = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));

 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 13:07 UTC
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=hHDPnvRFcQ0w+D=AP+QK6ic4X=tva6Yw_XGwuTbAYjhQ@mail.gmail.com>

2012/6/6 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
>> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>>
>>> Well, well, well, after having tested several configurations, several
>>> drivers, I have a big difference between an old 2.6.26 kernel and a
>>> newer one (I tried 3.2 and 3.4).
>>>
>>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>>> set to 4096. I am sending packets only, nothing on RX.
>>> I send from 1Gbps upto 2.4Gbps and I see no drops in tc with 2.6.26
>>> kernel, but a lot of drops with a newer kernel.
>>> So, I don't know if I missed something in my kernel configuration, but
>>> I have used the 2.6.26 one as a reference, in order to set the same
>>> options (DMA related, etc).
>>>
>>> I easily reproduce this problem and setting a bigger txqueuelen solves
>>> it partially.
>>> 1Gbps requires a txqueulen of 9000, 2.4Gbps requires more than 20000 !
>>>
>>> If you have any idea, I am interested, as this is a big issue for my use case.
>>>
>>
>> Yep.
>>
>> This driver wants to limit number of tx completions, thats just wrong.
>>
>> Fix and dirty patch:
>>
>>
>> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
>> index c5c4c0e..1e8f8a6 100644
>> --- a/drivers/net/ethernet/emulex/benet/be.h
>> +++ b/drivers/net/ethernet/emulex/benet/be.h
>> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>>  #define MAX_TX_QS              8
>>  #define MAX_ROCE_EQS           5
>>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
>> -#define BE_TX_BUDGET           256
>> +#define BE_TX_BUDGET           65535
>>  #define BE_NAPI_WEIGHT         64
>>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>>
>
> I will try that in a few minutes.
> I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
> a broadcom (bnx2x).
>

And it is not really better: I still need a txqueuelen of about 18000 at
2.4Gbps in order to avoid drops.
I really think there is something in the networking stack or in my
configuration (DMA? Something else?), as it doesn't seem to be
driver related, as I said.

JM

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 12:34 UTC
  To: Eric Dumazet; +Cc: Sathya.Perla, netdev
In-Reply-To: <1338980484.2760.4219.camel@edumazet-glaptop>

2012/6/6 Eric Dumazet <eric.dumazet@gmail.com>:
> On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:
>
>> Well, well, well, after having tested several configurations, several
>> drivers, I have a big difference between an old 2.6.26 kernel and a
>> newer one (I tried 3.2 and 3.4).
>>
>> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
>> set to 4096. I am sending packets only, nothing on RX.
>> I send from 1Gbps upto 2.4Gbps and I see no drops in tc with 2.6.26
>> kernel, but a lot of drops with a newer kernel.
>> So, I don't know if I missed something in my kernel configuration, but
>> I have used the 2.6.26 one as a reference, in order to set the same
>> options (DMA related, etc).
>>
>> I easily reproduce this problem and setting a bigger txqueuelen solves
>> it partially.
>> 1Gbps requires a txqueulen of 9000, 2.4Gbps requires more than 20000 !
>>
>> If you have any idea, I am interested, as this is a big issue for my use case.
>>
>
> Yep.
>
> This driver wants to limit number of tx completions, thats just wrong.
>
> Fix and dirty patch:
>
>
> diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
> index c5c4c0e..1e8f8a6 100644
> --- a/drivers/net/ethernet/emulex/benet/be.h
> +++ b/drivers/net/ethernet/emulex/benet/be.h
> @@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
>  #define MAX_TX_QS              8
>  #define MAX_ROCE_EQS           5
>  #define MAX_MSIX_VECTORS       (MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
> -#define BE_TX_BUDGET           256
> +#define BE_TX_BUDGET           65535
>  #define BE_NAPI_WEIGHT         64
>  #define MAX_RX_POST            BE_NAPI_WEIGHT /* Frags posted at a time */
>  #define RX_FRAGS_REFILL_WM     (RX_Q_LEN - MAX_RX_POST)
>

I will try that in a few minutes.
I also have a mlx4 driver (mlx4_en) which has a similar behaviour, and
a broadcom (bnx2x).

JM

* Re: [PATCH] net: sierra_net: device IDs for Aircard 320U++
From: Greg KH @ 2012-06-06 12:16 UTC
  To: Bjørn Mork
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Dan Williams, linux-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8, Autif Khan,
	Tom Cassidy, stable-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87wr3kua2k.fsf-lbf33ChDnrE/G1V5fR+Y7Q@public.gmane.org>

On Wed, Jun 06, 2012 at 10:19:15AM +0200, Bjørn Mork wrote:
> Greg KH <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org> writes:
> > On Wed, Jun 06, 2012 at 09:18:10AM +0200, Bjørn Mork wrote:
> >> Adding device IDs for Aircard 320U and two other devices
> >> found in the out-of-tree version of this driver.
> >> 
> >> Cc: linux-ywE8TTl5eJHWpu6QEFMNjNBPR1lH4CV8@public.gmane.org
> >> Cc: Autif Khan <autif.mlist-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Cc: Tom Cassidy <tomas.cassidy-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> Signed-off-by: Bjørn Mork <bjorn-yOkvZcmFvRU@public.gmane.org>
> >> ---
> >>  drivers/net/usb/sierra_net.c |   14 ++++++++++----
> >>  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > Wait, Tom just sent me a patch adding these device ids to the sierra
> > serial driver, why would the same device work for both drivers?
> 
> Because it's a composite device.  Was this a trick question? :-)
> 
> >  Where should the device id go?
> 
> To both drivers.  The device is similar to the 1199:68a3 device already
> supported by both drivers.  It has a number of serial ports (depending
> on how many features like GPS etc. are enabled) supported by the "sierra"
> driver and one ethernet interface speaking Sierra's HIP protocol
> supported by the "sierra_net" driver.

Ok, thanks for clearing that up, I'll take the serial patch, and I'm
sure that David will take this one.

	Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] netdev: mv643xx_eth: Prevent build on PPC32
From: Josh Boyer @ 2012-06-06 11:21 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Ben Hutchings, Lennert Buytenhek, Olof Johansson, netdev
In-Reply-To: <20120606052910.GA674@lunn.ch>

On Wed, Jun 06, 2012 at 07:29:10AM +0200, Andrew Lunn wrote:
> > The proper fix, from my minimal looking, was one of:
> > 
> > 1) revert the change for ARM that introduced the clk stuff
> > 2) do a similar change as the original commit but with a bunch of
> > #ifdef-ery
> > 3) implement the clkdev API stuff for 32-bit ppc
> > 
> > Honestly, I'd go for either 1 or 2.  The commit that introduced it was
> > broken to begin with, but that isn't my call.
> 
> I broke it. Sorry.
> 
> At the time, there was a push to remove all the #ifdefs. The following
> patchset was doing this:
> 
> https://lkml.org/lkml/2012/4/21/94
> 
> it would provide dummy implementations for those systems without clk
> support. However, it seems that patch set never made it in, and i did
> not declare my dependency on it.
> 
> I'm happy to add #ifdef. However, i would first like to understand
> what was 'broken to begin with'.

Simply that a commit was introduced that did not build on all the
existing platforms the driver supports.  The world is not ARM, or x86,
or PPC32, etc.  I haven't looked to see if it would still function
correctly in the presence of a dummy clk implementation, but if not that
would also be bad.
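
The "dummy implementation" idea mentioned above can be sketched in userspace C. This is only an illustration under stated assumptions: SKETCH_HAVE_CLK, struct sketch_clk and the sketch_* functions are hypothetical names invented here, not the kernel clk API and not the code from the patch set linked above. The point it shows is that the caller needs no #ifdef of its own when stubs are provided for platforms without clock support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock handle; stands in for whatever the platform provides. */
struct sketch_clk {
	int enabled;
};

#ifdef SKETCH_HAVE_CLK
/* Platform with clock support: hand out a real object and switch it on. */
static struct sketch_clk sketch_the_clk;

static struct sketch_clk *sketch_clk_get(void)
{
	return &sketch_the_clk;
}

static int sketch_clk_enable(struct sketch_clk *clk)
{
	assert(clk != NULL);
	clk->enabled = 1;
	return 0;
}
#else
/* Platform without clock support: dummy stubs that always succeed. */
static struct sketch_clk *sketch_clk_get(void)
{
	return NULL;
}

static int sketch_clk_enable(struct sketch_clk *clk)
{
	(void)clk;	/* nothing to enable */
	return 0;
}
#endif

/* Driver-side path: identical on both kinds of platform, no #ifdef here. */
static int sketch_probe(void)
{
	struct sketch_clk *clk = sketch_clk_get();

	return sketch_clk_enable(clk);
}
```

Whether a driver still functions correctly on top of such stubs is the open question raised in the message; the sketch only shows why it would build.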

josh

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Michael S. Tsirkin @ 2012-06-06 11:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> > 
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Stephen Hemminger <shemminger@vyatta.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Jason Wang <jasowang@redhat.com>
> > ---
> 
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
> 
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
> 
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

We currently do all stats either on napi callback or from
start_xmit callback.
This makes them safe, yes?

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
> 
> 
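
The lost-increment scenario described in the quoted text can be replayed in a small single-threaded model. This is a sketch under assumptions, not the virtio-net code: struct sketch_syncp and the sketch_* functions are hypothetical names, and the interleaving is hard-coded rather than produced by real concurrency. It shows that when the rx and tx paths share one counter and their non-atomic "begin" increments overlap, the sequence can end up odd, which is the state in which a reader keeps retrying.

```c
#include <assert.h>

/* Hypothetical stand-in for a sequence counter; not the kernel's u64_stats_sync. */
struct sketch_syncp {
	unsigned int seq;	/* odd while a writer is inside its critical section */
};

/* A reader may only finish once it observes an even sequence. */
static int sketch_reader_can_finish(const struct sketch_syncp *s)
{
	return (s->seq & 1u) == 0;
}

/*
 * Replay one bad interleaving of an rx writer and a tx writer.
 * Each "begin" is a non-atomic load/add/store, and both loads
 * happen before either store.
 */
static void sketch_interleaved_update(struct sketch_syncp *rx,
				      struct sketch_syncp *tx)
{
	unsigned int rx_tmp, tx_tmp;

	assert(rx && tx);
	rx_tmp = rx->seq;	/* rx begin: load */
	tx_tmp = tx->seq;	/* tx begin: load */
	rx->seq = rx_tmp + 1;	/* rx begin: store */
	tx->seq = tx_tmp + 1;	/* tx begin: store (overwrites rx's if shared) */
	rx->seq = rx->seq + 1;	/* rx end */
	tx->seq = tx->seq + 1;	/* tx end */
}

/* Returns 1 if a reader would never see an even sequence after the replay. */
static int sketch_shared_counter_stuck(void)
{
	struct sketch_syncp shared = { 0 };

	sketch_interleaved_update(&shared, &shared);
	return !sketch_reader_can_finish(&shared);
}

/* Same replay, but with one counter per path. */
static int sketch_separate_counters_stuck(void)
{
	struct sketch_syncp rx = { 0 }, tx = { 0 };

	sketch_interleaved_update(&rx, &tx);
	return !sketch_reader_can_finish(&rx) || !sketch_reader_can_finish(&tx);
}
```

With the shared counter, four increments are issued but only three take effect, so the final value is odd; with separate counters each path ends on an even value.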

^ permalink raw reply

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Eric Dumazet @ 2012-06-06 11:01 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: Sathya.Perla, netdev
In-Reply-To: <CAL8zT=ggT7Y2on6qmsp3u9CLOCwd6nOr3VjQfEsGZzA+O6us0A@mail.gmail.com>

On Wed, 2012-06-06 at 12:04 +0200, Jean-Michel Hautbois wrote:

> Well, well, well, after having tested several configurations, several
> drivers, I have a big difference between an old 2.6.26 kernel and a
> newer one (I tried 3.2 and 3.4).
> 
> Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
> set to 4096. I am sending packets only, nothing on RX.
> I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
> kernel, but a lot of drops with a newer kernel.
> So, I don't know if I missed something in my kernel configuration, but
> I have used the 2.6.26 one as a reference, in order to set the same
> options (DMA related, etc).
> 
> I easily reproduce this problem and setting a bigger txqueuelen solves
> it partially.
> 1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000 !
> 
> If you have any idea, I am interested, as this is a big issue for my use case.
> 

Yep.

This driver wants to limit the number of tx completions, that's just wrong.

Quick and dirty patch:


diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index c5c4c0e..1e8f8a6 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,7 +105,7 @@ static inline char *nic_name(struct pci_dev *pdev)
 #define MAX_TX_QS		8
 #define MAX_ROCE_EQS		5
 #define MAX_MSIX_VECTORS	(MAX_RSS_QS + MAX_ROCE_EQS) /* RSS qs + RoCE */
-#define BE_TX_BUDGET		256
+#define BE_TX_BUDGET		65535
 #define BE_NAPI_WEIGHT		64
 #define MAX_RX_POST		BE_NAPI_WEIGHT /* Frags posted at a time */
 #define RX_FRAGS_REFILL_WM	(RX_Q_LEN - MAX_RX_POST)
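
To illustrate what a cap on tx completions per poll pass means, here is a minimal userspace model. It is a sketch under assumptions and not the be2net code: the sketch_* names and the backlog figures used in the note below are made up for illustration; only the two budget values (256 and 65535) come from the patch.

```c
#include <assert.h>

/* Reap at most `budget` TX completions in one poll pass; returns how many. */
static unsigned int sketch_reap_tx(unsigned int *pending, unsigned int budget)
{
	unsigned int done = *pending < budget ? *pending : budget;

	*pending -= done;
	return done;
}

/* Number of poll passes needed to drain `pending` completions. */
static unsigned int sketch_passes_to_drain(unsigned int pending,
					   unsigned int budget)
{
	unsigned int passes = 0;

	assert(budget > 0);
	while (pending > 0) {
		sketch_reap_tx(&pending, budget);
		passes++;
	}
	return passes;
}
```

For an assumed backlog of 2048 completions, sketch_passes_to_drain(2048, 256) needs 8 passes while sketch_passes_to_drain(2048, 65535) needs 1. Whether this accounts for the drops reported above is what the patch is meant to test.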

^ permalink raw reply related

* Re: Difficulties to get 1Gbps on be2net ethernet card
From: Jean-Michel Hautbois @ 2012-06-06 10:04 UTC (permalink / raw)
  To: Sathya.Perla; +Cc: eric.dumazet, netdev
In-Reply-To: <CAL8zT=iP+5o11am67ZBVOd=QrOfcjFWDydiuM9RrrAKe_k7LZw@mail.gmail.com>

2012/5/30 Jean-Michel Hautbois <jhautbois@gmail.com>:
> 2012/5/30  <Sathya.Perla@emulex.com>:
>>>-----Original Message-----
>>>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
>>>Behalf Of Jean-Michel Hautbois
>>>
>>>2012/5/30 Jean-Michel Hautbois <jhautbois@gmail.com>:
>>>
>>>I used vmstat in order to see the differences between the two kernels.
>>>The main difference is the number of interrupts per second.
>>>I have an average of 87500 on 3.2 and 7500 on 2.6, 10 times lower !
>>>I suspect the be2net driver to be the main cause, and I checked the
>>>/proc/interrupts file in order to be sure.
>>>
>>>I have for eth1-tx on 2.6.26 about 2200 interrupts per second and 23000 on 3.2.
>>>BTW, it is named eth1-q0 on 3.2 (and tx and rx are the same IRQ)
>>>whereas there is eth1-rx0 and eth1-tx on 2.6.26.
>>
>> Yes, there is an issue with be2net interrupt mitigation in the recent code with
>> RX and TX on the same Evt-Q (commit 10ef9ab4). The high interrupt rate happens when a TX blast is
>> done while RX is relatively silent on a queue pair. Interrupt rate due to TX completions is not being
>> mitigated.
>>
>> I have a fix and will send it out soon..
>>
>> thanks,
>> -Sathya
>
> Hi Sathya !
> Thanks for this information !
> I had the correct diagnosis :). I am waiting for your fix.
>

Well, well, well, after having tested several configurations, several
drivers, I have a big difference between an old 2.6.26 kernel and a
newer one (I tried 3.2 and 3.4).

Here is my stream : UDP packets (multicast), 4000 bytes length, MTU
set to 4096. I am sending packets only, nothing on RX.
I send from 1Gbps up to 2.4Gbps and I see no drops in tc with 2.6.26
kernel, but a lot of drops with a newer kernel.
So, I don't know if I missed something in my kernel configuration, but
I have used the 2.6.26 one as a reference, in order to set the same
options (DMA related, etc).

I easily reproduce this problem and setting a bigger txqueuelen solves
it partially.
1Gbps requires a txqueuelen of 9000, 2.4Gbps requires more than 20000 !

If you have any idea, I am interested, as this is a big issue for my use case.
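
For scale, the txqueuelen figures above can be converted into the amount of traffic, in time, that a full queue holds at the send rate. The helper below is a back-of-the-envelope sketch: sketch_queue_depth_ms is a hypothetical name, and it assumes every queued packet is exactly 4000 bytes and ignores protocol headers.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds of traffic that a full queue holds at a given send rate. */
static uint64_t sketch_queue_depth_ms(uint64_t packets,
				      uint64_t bytes_per_packet,
				      uint64_t bits_per_second)
{
	uint64_t queued_bits;

	assert(bits_per_second > 0);
	queued_bits = packets * bytes_per_packet * 8u;
	return queued_bits * 1000u / bits_per_second;	/* rounds down */
}
```

Under those assumptions, 9000 packets at 1Gbps is 288 ms of traffic, and 20000 packets at 2.4Gbps is about 266 ms.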

JM

^ permalink raw reply

* Deadlock, L2TP over IP are not working, 3.4.1
From: Denys Fedoryshchenko @ 2012-06-06  9:54 UTC (permalink / raw)
  To: davem, netdev, linux-kernel

It seems L2TP is not working, at least for me, due to some bug.

Script I use to reproduce it:
SERVER=192.168.11.2
LOCALIP=`curl http://${SERVER}:8080/myip`
ID=`curl http://${SERVER}:8080/tunid` # It will generate some number, let's say 2
echo ID: ${ID}
modprobe l2tp_ip
modprobe l2tp_eth
ip l2tp add tunnel remote ${SERVER} local ${LOCALIP} tunnel_id ${ID} peer_tunnel_id ${ID} encap ip
ip l2tp add session name tun100 tunnel_id ${ID} session_id 1 peer_session_id 1
ip link set dev tun100 up
ip addr add dev tun100 10.0.6.${ID}/24

Here is the report for the latest stable kernel. I can reproduce it on multiple
PCs.
It is a new setup, so I am not sure whether it worked on older kernels
(regression or not).

[ 8683.927442] ======================================================
[ 8683.927555] [ INFO: possible circular locking dependency detected ]
[ 8683.927672] 3.4.1-build-0061 #14 Not tainted
[ 8683.927782] -------------------------------------------------------
[ 8683.927895] swapper/0/0 is trying to acquire lock:
[ 8683.928007]  (slock-AF_INET){+.-...}, at: [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]
[ 8683.928121] but task is already holding lock:
[ 8683.928121]  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119
[ 8683.928121]
[ 8683.928121] which lock already depends on the new lock.
[ 8683.928121]
[ 8683.928121]
[ 8683.928121] the existing dependency chain (in reverse order) is:
[ 8683.928121]
[ 8683.928121] -> #1 (_xmit_ETHER#2){+.-...}:
[ 8683.928121]        [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]        [<c0304e0c>] ip_send_reply+0xf2/0x1ce
[ 8683.928121]        [<c0317dbc>] tcp_v4_send_reset+0x153/0x16f
[ 8683.928121]        [<c0317f4a>] tcp_v4_do_rcv+0x172/0x194
[ 8683.928121]        [<c031929b>] tcp_v4_rcv+0x387/0x5a0
[ 8683.928121]        [<c03001d0>] ip_local_deliver_finish+0x13a/0x1e9
[ 8683.928121]        [<c0300645>] NF_HOOK.clone.11+0x46/0x4d
[ 8683.928121]        [<c030075b>] ip_local_deliver+0x41/0x45
[ 8683.928121]        [<c03005dd>] ip_rcv_finish+0x31a/0x33c
[ 8683.928121]        [<c0300645>] NF_HOOK.clone.11+0x46/0x4d
[ 8683.928121]        [<c0300960>] ip_rcv+0x201/0x23d
[ 8683.928121]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]        [<c02deae8>] netif_receive_skb+0x4e/0x7d
[ 8683.928121]        [<e08d5ef3>] rtl8139_poll+0x243/0x33d [8139too]
[ 8683.928121]        [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]        [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]
[ 8683.928121] -> #0 (slock-AF_INET){+.-...}:
[ 8683.928121]        [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[ 8683.928121]        [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]        [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]        [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[ 8683.928121]        [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[ 8683.928121]        [<c02f064c>] sch_direct_xmit+0x55/0x119
[ 8683.928121]        [<c02e0528>] dev_queue_xmit+0x282/0x418
[ 8683.928121]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]        [<c031f524>] arp_xmit+0x22/0x24
[ 8683.928121]        [<c031f567>] arp_send+0x41/0x48
[ 8683.928121]        [<c031fa7d>] arp_process+0x289/0x491
[ 8683.928121]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]        [<c031f7a0>] arp_rcv+0xb1/0xc3
[ 8683.928121]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]        [<c02de9d3>] process_backlog+0x69/0x130
[ 8683.928121]        [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]        [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]
[ 8683.928121] other info that might help us debug this:
[ 8683.928121]
[ 8683.928121]  Possible unsafe locking scenario:
[ 8683.928121]
[ 8683.928121]        CPU0                    CPU1
[ 8683.928121]        ----                    ----
[ 8683.928121]   lock(_xmit_ETHER#2);
[ 8683.928121]                                lock(slock-AF_INET);
[ 8683.928121]                                lock(_xmit_ETHER#2);
[ 8683.928121]   lock(slock-AF_INET);
[ 8683.928121]
[ 8683.928121]  *** DEADLOCK ***
[ 8683.928121]
[ 8683.928121] 3 locks held by swapper/0/0:
[ 8683.928121]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30
[ 8683.928121]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30
[ 8683.928121]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119
[ 8683.928121]
[ 8683.928121] stack backtrace:
[ 8683.928121] Pid: 0, comm: swapper/0 Not tainted 3.4.1-build-0061 #14
[ 8683.928121] Call Trace:
[ 8683.928121]  [<c034bdd2>] ? printk+0x18/0x1a
[ 8683.928121]  [<c0158904>] print_circular_bug+0x1ac/0x1b6
[ 8683.928121]  [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[ 8683.928121]  [<c015a561>] lock_acquire+0x71/0x85
[ 8683.928121]  [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<c034da2d>] _raw_spin_lock+0x33/0x40
[ 8683.928121]  [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[ 8683.928121]  [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[ 8683.928121]  [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[ 8683.928121]  [<c02f064c>] sch_direct_xmit+0x55/0x119
[ 8683.928121]  [<c02e0528>] dev_queue_xmit+0x282/0x418
[ 8683.928121]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[ 8683.928121]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]  [<c031f524>] arp_xmit+0x22/0x24
[ 8683.928121]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[ 8683.928121]  [<c031f567>] arp_send+0x41/0x48
[ 8683.928121]  [<c031fa7d>] arp_process+0x289/0x491
[ 8683.928121]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[ 8683.928121]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[ 8683.928121]  [<c031f7a0>] arp_rcv+0xb1/0xc3
[ 8683.928121]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[ 8683.928121]  [<c02de91b>] __netif_receive_skb+0x329/0x378
[ 8683.928121]  [<c02de9d3>] process_backlog+0x69/0x130
[ 8683.928121]  [<c02df103>] net_rx_action+0x90/0x15d
[ 8683.928121]  [<c012b2b5>] __do_softirq+0x7b/0x118
[ 8683.928121]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[ 8683.928121]  <IRQ>  [<c012b4d0>] ? irq_exit+0x41/0x91
[ 8683.928121]  [<c0103c6f>] ? do_IRQ+0x79/0x8d
[ 8683.928121]  [<c0157ea1>] ? trace_hardirqs_off_caller+0x2e/0x86
[ 8683.928121]  [<c034ef6e>] ? common_interrupt+0x2e/0x34
[ 8683.928121]  [<c0108a33>] ? default_idle+0x23/0x38
[ 8683.928121]  [<c01091a8>] ? cpu_idle+0x55/0x6f
[ 8683.928121]  [<c033df25>] ? rest_init+0xa1/0xa7
[ 8683.928121]  [<c033de84>] ? __read_lock_failed+0x14/0x14
[ 8683.928121]  [<c0498745>] ? start_kernel+0x303/0x30a
[ 8683.928121]  [<c0498209>] ? repair_env_string+0x51/0x51
[ 8683.928121]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf


[158595.436934]
[158595.437018] ======================================================
[158595.437111] [ INFO: possible circular locking dependency detected ]
[158595.437198] 3.4.0-build-0061 #12 Tainted: G        W
[158595.437281] -------------------------------------------------------
[158595.437365] swapper/0/0 is trying to acquire lock:
[158595.437447]  (slock-AF_INET){+.-...}, at: [<f86453ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437613]
[158595.437613] but task is already holding lock:
[158595.437763]  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] sch_direct_xmit+0x36/0x119
[158595.437837]
[158595.437837] which lock already depends on the new lock.
[158595.437837]
[158595.437837]
[158595.437837] the existing dependency chain (in reverse order) is:
[158595.437837]
[158595.437837] -> #1 (_xmit_ETHER#2){+.-...}:
[158595.437837]        [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]        [<c034de94>] _raw_spin_lock_irqsave+0x40/0x50
[158595.437837]        [<c017c1f2>] get_page_from_freelist+0x227/0x398
[158595.437837]        [<c017c5a7>] __alloc_pages_nodemask+0xef/0x5f9
[158595.437837]        [<c019c34f>] alloc_slab_page+0x1d/0x21
[158595.437837]        [<c019c39f>] new_slab+0x4c/0x164
[158595.437837]        [<c019d259>] __slab_alloc.clone.59.clone.64+0x247/0x2de
[158595.437837]        [<c019dd21>] __kmalloc_track_caller+0x55/0xa4
[158595.437837]        [<c02d56fb>] __alloc_skb+0x51/0x100
[158595.437837]        [<c02d2cfa>] sock_alloc_send_pskb+0x9e/0x263
[158595.437837]        [<c02d2ed7>] sock_alloc_send_skb+0x18/0x1d
[158595.437837]        [<c0303e04>] __ip_append_data.clone.52+0x302/0x6dc
[158595.437837]        [<c030494c>] ip_append_data+0x80/0x88
[158595.437837]        [<c03209dd>] icmp_push_reply+0x5c/0x101
[158595.437837]        [<c0321555>] icmp_send+0x31d/0x342
[158595.437837]        [<f862b05c>] send_unreach+0x19/0x1b [ipt_REJECT]
[158595.437837]        [<f862b0f5>] reject_tg+0x53/0x2de [ipt_REJECT]
[158595.437837]        [<c033359a>] ipt_do_table+0x3ad/0x410
[158595.437837]        [<f856c0c4>] iptable_filter_hook+0x56/0x5e [iptable_filter]
[158595.437837]        [<c02f9941>] nf_iterate+0x36/0x5c
[158595.437837]        [<c02f99bf>] nf_hook_slow+0x58/0xf1
[158595.437837]        [<c0301f33>] ip_forward+0x295/0x2a2
[158595.437837]        [<c0300969>] ip_rcv_finish+0x31a/0x33c
[158595.437837]        [<c03009d1>] NF_HOOK.clone.11+0x46/0x4d
[158595.437837]        [<c0300cec>] ip_rcv+0x201/0x23d
[158595.437837]        [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]        [<c02dee74>] netif_receive_skb+0x4e/0x7d
[158595.437837]        [<c02def60>] napi_skb_finish+0x1e/0x34
[158595.437837]        [<c02df389>] napi_gro_receive+0x20/0x24
[158595.437837]        [<f850e213>] rtl8169_poll+0x2e6/0x52c [r8169]
[158595.437837]        [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]        [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]
[158595.437837] -> #0 (slock-AF_INET){+.-...}:
[158595.437837]        [<c015a08b>] __lock_acquire+0x9a3/0xc27
[158595.437837]        [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]        [<c034ddad>] _raw_spin_lock+0x33/0x40
[158595.437837]        [<f86453ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]        [<f86591fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[158595.437837]        [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
[158595.437837]        [<c02f09d8>] sch_direct_xmit+0x55/0x119
[158595.437837]        [<c02e08b4>] dev_queue_xmit+0x282/0x418
[158595.437837]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]        [<c031f8b0>] arp_xmit+0x22/0x24
[158595.437837]        [<c031f8f3>] arp_send+0x41/0x48
[158595.437837]        [<c031fe09>] arp_process+0x289/0x491
[158595.437837]        [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]        [<c031fb2c>] arp_rcv+0xb1/0xc3
[158595.437837]        [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]        [<c02ded5f>] process_backlog+0x69/0x130
[158595.437837]        [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]        [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]
[158595.437837] other info that might help us debug this:
[158595.437837]
[158595.437837]  Possible unsafe locking scenario:
[158595.437837]
[158595.437837]        CPU0                    CPU1
[158595.437837]        ----                    ----
[158595.437837]   lock(_xmit_ETHER#2);
[158595.437837]                                lock(slock-AF_INET);
[158595.437837]                                lock(_xmit_ETHER#2);
[158595.437837]   lock(slock-AF_INET);
[158595.437837]
[158595.437837]  *** DEADLOCK ***
[158595.437837]
[158595.437837] 3 locks held by swapper/0/0:
[158595.437837]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbf9c>] rcu_lock_acquire+0x0/0x30
[158595.437837]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbf9c>] rcu_lock_acquire+0x0/0x30
[158595.437837]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f09b9>] sch_direct_xmit+0x36/0x119
[158595.437837]
[158595.437837] stack backtrace:
[158595.437837] Pid: 0, comm: swapper/0 Tainted: G        W    3.4.0-build-0061 #12
[158595.437837] Call Trace:
[158595.437837]  [<c034c156>] ? printk+0x18/0x1a
[158595.437837]  [<c0158a74>] print_circular_bug+0x1ac/0x1b6
[158595.437837]  [<c015a08b>] __lock_acquire+0x9a3/0xc27
[158595.437837]  [<c015a6d1>] lock_acquire+0x71/0x85
[158595.437837]  [<f86453ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<c034ddad>] _raw_spin_lock+0x33/0x40
[158595.437837]  [<f86453ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<f86453ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[158595.437837]  [<f86591fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[158595.437837]  [<c02e0573>] dev_hard_start_xmit+0x333/0x3f2
[158595.437837]  [<c02f09d8>] sch_direct_xmit+0x55/0x119
[158595.437837]  [<c02e08b4>] dev_queue_xmit+0x282/0x418
[158595.437837]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
[158595.437837]  [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]  [<c031f8b0>] arp_xmit+0x22/0x24
[158595.437837]  [<c02e0632>] ? dev_hard_start_xmit+0x3f2/0x3f2
[158595.437837]  [<c031f8f3>] arp_send+0x41/0x48
[158595.437837]  [<c031fe09>] arp_process+0x289/0x491
[158595.437837]  [<c031fb80>] ? __neigh_lookup.clone.20+0x42/0x42
[158595.437837]  [<c031f887>] NF_HOOK.clone.19+0x45/0x4c
[158595.437837]  [<c031fb2c>] arp_rcv+0xb1/0xc3
[158595.437837]  [<c031fb80>] ? __neigh_lookup.clone.20+0x42/0x42
[158595.437837]  [<c02deca7>] __netif_receive_skb+0x329/0x378
[158595.437837]  [<c02ded5f>] process_backlog+0x69/0x130
[158595.437837]  [<c02df48f>] net_rx_action+0x90/0x15d
[158595.437837]  [<c012b42d>] __do_softirq+0x7b/0x118
[158595.437837]  [<c013236e>] ? do_send_specific+0xb/0x8f
[158595.437837]  [<c012b3b2>] ? local_bh_enable+0xd/0xd
[158595.437837]  <IRQ>  [<c012b648>] ? irq_exit+0x41/0x91
[158595.437837]  [<c0103c73>] ? do_IRQ+0x79/0x8d
[158595.437837]  [<c0158011>] ? trace_hardirqs_off_caller+0x2e/0x86
[158595.437837]  [<c034f2ee>] ? common_interrupt+0x2e/0x34
[158595.437837]  [<c015007b>] ? ktime_get_ts+0x8f/0x9b
[158595.437837]  [<c0108a0a>] ? mwait_idle+0x50/0x5a
[158595.437837]  [<c01091ac>] ? cpu_idle+0x55/0x6f
[158595.437837]  [<c033e2b1>] ? rest_init+0xa1/0xa7
[158595.437837]  [<c033e210>] ? __read_lock_failed+0x14/0x14
[158595.437837]  [<c049874f>] ? start_kernel+0x30d/0x314
[158595.437837]  [<c0498209>] ? repair_env_string+0x51/0x51
[158595.437837]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf

[63546.808787]
[63546.809025] ======================================================
[63546.809259] [ INFO: possible circular locking dependency detected ]
[63546.809494] 3.4.1-build-0061 #14 Not tainted
[63546.809685] -------------------------------------------------------
[63546.809685] swapper/0/0 is trying to acquire lock:
[63546.809685]  (slock-AF_INET){+.-...}, at: [<f8c593ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]
[63546.809685] but task is already holding lock:
[63546.809685]  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119
[63546.809685]
[63546.809685] which lock already depends on the new lock.
[63546.809685]
[63546.809685]
[63546.809685] the existing dependency chain (in reverse order) is:
[63546.809685]
[63546.809685] -> #1 (_xmit_ETHER#2){+.-...}:
[63546.809685]        [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]        [<c034dc06>] _raw_spin_lock_bh+0x38/0x45
[63546.809685]        [<c02a4e8a>] ppp_push+0x59/0x4b3
[63546.809685]        [<c02a66b9>] ppp_xmit_process+0x41b/0x4be
[63546.809685]        [<c02a69b9>] ppp_write+0x90/0xa1
[63546.809685]        [<c01a2e8c>] vfs_write+0x7e/0xab
[63546.809685]        [<c01a2ffc>] sys_write+0x3d/0x5e
[63546.809685]        [<c034e191>] syscall_call+0x7/0xb
[63546.809685]
[63546.809685] -> #0 (slock-AF_INET){+.-...}:
[63546.809685]        [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[63546.809685]        [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]        [<c034da2d>] _raw_spin_lock+0x33/0x40
[63546.809685]        [<f8c593ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]        [<f8c751fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[63546.809685]        [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[63546.809685]        [<c02f064c>] sch_direct_xmit+0x55/0x119
[63546.809685]        [<c02e0528>] dev_queue_xmit+0x282/0x418
[63546.809685]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]        [<c031f524>] arp_xmit+0x22/0x24
[63546.809685]        [<c031f567>] arp_send+0x41/0x48
[63546.809685]        [<c031fa7d>] arp_process+0x289/0x491
[63546.809685]        [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]        [<c031f7a0>] arp_rcv+0xb1/0xc3
[63546.809685]        [<c02de91b>] __netif_receive_skb+0x329/0x378
[63546.809685]        [<c02de9d3>] process_backlog+0x69/0x130
[63546.809685]        [<c02df103>] net_rx_action+0x90/0x15d
[63546.809685]        [<c012b2b5>] __do_softirq+0x7b/0x118
[63546.809685]
[63546.809685] other info that might help us debug this:
[63546.809685]
[63546.809685]  Possible unsafe locking scenario:
[63546.809685]
[63546.809685]        CPU0                    CPU1
[63546.809685]        ----                    ----
[63546.809685]   lock(_xmit_ETHER#2);
[63546.809685]                                lock(slock-AF_INET);
[63546.809685]                                lock(_xmit_ETHER#2);
[63546.809685]   lock(slock-AF_INET);
[63546.809685]
[63546.809685]  *** DEADLOCK ***
[63546.809685]
[63546.809685] 3 locks held by swapper/0/0:
[63546.809685]  #0:  (rcu_read_lock){.+.+..}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30
[63546.809685]  #1:  (rcu_read_lock_bh){.+....}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30
[63546.809685]  #2:  (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119
[63546.809685]
[63546.809685] stack backtrace:
[63546.809685] Pid: 0, comm: swapper/0 Not tainted 3.4.1-build-0061 #14
[63546.809685] Call Trace:
[63546.809685]  [<c034bdd2>] ? printk+0x18/0x1a
[63546.809685]  [<c0158904>] print_circular_bug+0x1ac/0x1b6
[63546.809685]  [<c0159f1b>] __lock_acquire+0x9a3/0xc27
[63546.809685]  [<c015a561>] lock_acquire+0x71/0x85
[63546.809685]  [<f8c593ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<c034da2d>] _raw_spin_lock+0x33/0x40
[63546.809685]  [<f8c593ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<f8c593ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core]
[63546.809685]  [<f8c751fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth]
[63546.809685]  [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2
[63546.809685]  [<c02f064c>] sch_direct_xmit+0x55/0x119
[63546.809685]  [<c02e0528>] dev_queue_xmit+0x282/0x418
[63546.809685]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[63546.809685]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]  [<c031f524>] arp_xmit+0x22/0x24
[63546.809685]  [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2
[63546.809685]  [<c031f567>] arp_send+0x41/0x48
[63546.809685]  [<c031fa7d>] arp_process+0x289/0x491
[63546.809685]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[63546.809685]  [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c
[63546.809685]  [<c031f7a0>] arp_rcv+0xb1/0xc3
[63546.809685]  [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42
[63546.809685]  [<c02de91b>] __netif_receive_skb+0x329/0x378
[63546.809685]  [<c02de9d3>] process_backlog+0x69/0x130
[63546.809685]  [<c02df103>] net_rx_action+0x90/0x15d
[63546.809685]  [<c012b2b5>] __do_softirq+0x7b/0x118
[63546.809685]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[63546.809685]  [<c012b23a>] ? local_bh_enable+0xd/0xd
[63546.809685]  <IRQ>  [<c012b4d0>] ? irq_exit+0x41/0x91
[63546.809685]  [<c0103c6f>] ? do_IRQ+0x79/0x8d
[63546.809685]  [<c0157ea1>] ? trace_hardirqs_off_caller+0x2e/0x86
[63546.809685]  [<c034ef6e>] ? common_interrupt+0x2e/0x34
[63546.809685]  [<c015007b>] ? do_gettimeofday+0x20/0x29
[63546.809685]  [<c0108a06>] ? mwait_idle+0x50/0x5a
[63546.809685]  [<c01091a8>] ? cpu_idle+0x55/0x6f
[63546.809685]  [<c033df25>] ? rest_init+0xa1/0xa7
[63546.809685]  [<c033de84>] ? __read_lock_failed+0x14/0x14
[63546.809685]  [<c0498745>] ? start_kernel+0x303/0x30a
[63546.809685]  [<c0498209>] ? repair_env_string+0x51/0x51
[63546.809685]  [<c04980a8>] ? i386_start_kernel+0xa8/0xaf
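
The "possible circular locking dependency" in the reports above comes down to two lock classes being taken in opposite orders. The idea can be sketched as a tiny lock-order graph. This is a simplified model written for illustration, not how lockdep is implemented: struct sketch_lock_graph and the sketch_* names are hypothetical, and only the two lock class names are taken from the reports.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_LOCKS 8

/* edge[a][b] != 0 means "b was acquired while a was held". */
struct sketch_lock_graph {
	unsigned char edge[SKETCH_MAX_LOCKS][SKETCH_MAX_LOCKS];
};

static void sketch_graph_init(struct sketch_lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is `to` reachable from `from` over recorded edges? */
static int sketch_reachable(const struct sketch_lock_graph *g, int from,
			    int to, unsigned char *seen)
{
	int next;

	if (from == to)
		return 1;
	seen[from] = 1;
	for (next = 0; next < SKETCH_MAX_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    sketch_reachable(g, next, to, seen))
			return 1;
	return 0;
}

/*
 * Record "acquired taken while held". Returns 1 if `acquired` already
 * leads back to `held`, i.e. the new edge closes a cycle.
 */
static int sketch_acquire(struct sketch_lock_graph *g, int held, int acquired)
{
	unsigned char seen[SKETCH_MAX_LOCKS] = { 0 };
	int circular;

	assert(held >= 0 && held < SKETCH_MAX_LOCKS);
	assert(acquired >= 0 && acquired < SKETCH_MAX_LOCKS);
	circular = sketch_reachable(g, acquired, held, seen);
	g->edge[held][acquired] = 1;
	return circular;
}

enum { SKETCH_SLOCK_AF_INET, SKETCH_XMIT_ETHER };

/* Replay the two orderings from the report; bit 1 = first, bit 0 = second. */
static int sketch_replay_report(void)
{
	struct sketch_lock_graph g;
	int first, second;

	sketch_graph_init(&g);
	/* Existing chain: _xmit_ETHER#2 taken while slock-AF_INET is held. */
	first = sketch_acquire(&g, SKETCH_SLOCK_AF_INET, SKETCH_XMIT_ETHER);
	/* New acquisition: slock-AF_INET taken while _xmit_ETHER#2 is held. */
	second = sketch_acquire(&g, SKETCH_XMIT_ETHER, SKETCH_SLOCK_AF_INET);
	return (first << 1) | second;
}
```

The first ordering is recorded without complaint; the second is flagged because the opposite order is already in the graph, which matches the CPU0/CPU1 scenario printed in each report.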

---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

^ permalink raw reply

* Re: [PATCH 7/7] netfilter: add user-space connection tracking helper infrastructure
From: Ferenc Wagner @ 2012-06-06  9:39 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, wferi
In-Reply-To: <1338812485-4232-8-git-send-email-pablo@netfilter.org>

pablo@netfilter.org writes:

> * Security: Avoid complex string matching and mangling in kernel-space
>   running in unprivileged mode.

Or rather in privileged mode?

> 2) Add rules to enable the FTP user-space helper which is
>    used to track traffic going to TCP port 10000.

The examples use port 21 in the iptables commands and the expectations:

>  iptables -I OUTPUT -t raw -p tcp --dport 21 -j CT --helper ftp
>  iptables -I PREROUTING -t raw -p tcp --dport 21 -j CT --helper ftp
>
>     [NEW] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
> [DESTROY] 301 proto=6 src=192.168.1.136 dst=130.89.148.12 sport=0 dport=54037 mask-src=255.255.255.255 mask-dst=255.255.255.255 sport=0 dport=65535 master-src=192.168.1.136 master-dst=130.89.148.12 sport=57127 dport=21 class=0 helper=ftp
-- 
Regards,
Feri.

^ permalink raw reply

* Re: [PATCH] virtio-net: fix a race on 32bit arches
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, Stephen Hemminger
In-Reply-To: <1338972341.2760.3944.camel@edumazet-glaptop>

On 06/06/2012 04:45 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet<edumazet@google.com>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet<edumazet@google.com>
>> Cc: Stephen Hemminger<shemminger@vyatta.com>
>> Cc: Michael S. Tsirkin<mst@redhat.com>
>> Cc: Jason Wang<jasowang@redhat.com>
>> ---
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Thanks for explaining; the current virtio-net is safe, I think. But the patch
is still needed, as my patch would update the statistics in irq.
>
> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
>
>
>

^ permalink raw reply

* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Jason Wang @ 2012-06-06  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606082752.GA12767@redhat.com>

On 06/06/2012 04:27 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
>> Statistics counters are useful for debugging and performance optimization, so this
>> patch lets the virtio_net driver collect the following and export them to userspace
>> through "ethtool -S":
>>
>> - number of packets sent/received
>> - number of bytes sent/received
>> - number of callbacks for tx/rx
>> - number of kick for tx/rx
>> - number of bytes/packets queued for tx
>>
>> As virtnet_stats are per-cpu, both per-cpu and global statistics are
>> collected, like:
>>
>> NIC statistics:
>>       tx_bytes[0]: 1731209929
>>       tx_packets[0]: 60685
>>       tx_kicks[0]: 63
>>       tx_callbacks[0]: 73
>>       tx_queued_bytes[0]: 1935749360
>>       tx_queued_packets[0]: 80652
>>       rx_bytes[0]: 2695648
>>       rx_packets[0]: 40767
>>       rx_kicks[0]: 1
>>       rx_callbacks[0]: 2077
>>       tx_bytes[1]: 9105588697
>>       tx_packets[1]: 344150
>>       tx_kicks[1]: 162
>>       tx_callbacks[1]: 905
>>       tx_queued_bytes[1]: 8901049412
>>       tx_queued_packets[1]: 324184
>>       rx_bytes[1]: 23679828
>>       rx_packets[1]: 358770
>>       rx_kicks[1]: 6
>>       rx_callbacks[1]: 17717
>>       tx_bytes: 10836798626
>>       tx_packets: 404835
>>       tx_kicks: 225
>>       tx_callbacks: 978
>>       tx_queued_bytes: 10836798772
>>       tx_queued_packets: 404836
>>       rx_bytes: 26375476
>>       rx_packets: 399537
>>       rx_kicks: 7
>>       rx_callbacks: 19794
>>
>> TODO:
>>
>> - more statistics
>> - calculate the pending bytes/pkts
>>
> Do we need that? pending is (queued - packets), no?
>   

No, not if we choose to calculate it in tools.
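
A tool-side calculation could be as small as the sketch below. It assumes the counter values have already been parsed from "ethtool -S" output; the function name is hypothetical and not part of ethtool or the driver.

```c
#include <stdint.h>

/* Pending work = what was queued minus what has completed.
 * The result is signed because a per-cpu pair can go negative: a packet
 * queued on one cpu may be accounted as sent on another.  The unsigned
 * subtraction wraps, and the cast assumes a two's complement target. */
static int64_t toy_pending(uint64_t queued, uint64_t completed)
{
	return (int64_t)(queued - completed);
}
```

With the totals from the sample output, toy_pending(404836, 404835) gives 1 pending packet and toy_pending(10836798772, 10836798626) gives 146 pending bytes. The per-cpu rows do not pair up the same way: for cpu 1, tx_queued_packets (324184) is smaller than tx_packets (344150), so only the totals are meaningful here.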
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> ---
>> Changes from v1:
>>
>> - style & typo fixes
>> - convert the statistics fields to array
>> - use unlikely()
>> ---
>>   drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>>   1 files changed, 113 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 6e4aa6f..909a0a7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>>   enum virtnet_stats_type {
>>   	VIRTNET_TX_BYTES,
>>   	VIRTNET_TX_PACKETS,
>> +	VIRTNET_TX_KICKS,
>> +	VIRTNET_TX_CBS,
>> +	VIRTNET_TX_Q_BYTES,
>> +	VIRTNET_TX_Q_PACKETS,
>>   	VIRTNET_RX_BYTES,
>>   	VIRTNET_RX_PACKETS,
>> +	VIRTNET_RX_KICKS,
>> +	VIRTNET_RX_CBS,
>>   	VIRTNET_NUM_STATS,
>>   };
>>
>> @@ -54,6 +60,21 @@ struct virtnet_stats {
>>   	u64 data[VIRTNET_NUM_STATS];
>>   };
>>
>> +static struct {
> static const?
>

Sorry, forgot this.
>> +	char string[ETH_GSTRING_LEN];
>> +} virtnet_stats_str_attr[] = {
>> +	{ "tx_bytes" },
>> +	{ "tx_packets" },
>> +	{ "tx_kicks" },
>> +	{ "tx_callbacks" },
>> +	{ "tx_queued_bytes" },
>> +	{ "tx_queued_packets" },
>> +	{ "rx_bytes" },
>> +	{ "rx_packets" },
>> +	{ "rx_kicks" },
>> +	{ "rx_callbacks" },
>> +};
>> +
>>   struct virtnet_info {
>>   	struct virtio_device *vdev;
>>   	struct virtqueue *rvq, *svq, *cvq;
>> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>>   static void skb_xmit_done(struct virtqueue *svq)
>>   {
>>   	struct virtnet_info *vi = svq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_TX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Suppress further interrupts. */
>>   	virtqueue_disable_cb(svq);
>> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   {
>>   	int err;
>>   	bool oom;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>
>>   	do {
>>   		if (vi->mergeable_rx_bufs)
>> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>>   	} while (err > 0);
>>   	if (unlikely(vi->num > vi->max))
>>   		vi->max = vi->num;
>> -	virtqueue_kick(vi->rvq);
>> +	if (virtqueue_kick_prepare(vi->rvq)) {
> if (unlikely())
> also move stats here where they are actually used?

Sure.
>> +		virtqueue_notify(vi->rvq);
>> +		u64_stats_update_begin(&stats->syncp);
>> +		stats->data[VIRTNET_RX_KICKS]++;
>> +		u64_stats_update_end(&stats->syncp);
>> +	}
>>   	return !oom;
>>   }
>>
>>   static void skb_recv_done(struct virtqueue *rvq)
>>   {
>>   	struct virtnet_info *vi = rvq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	stats->data[VIRTNET_RX_CBS]++;
>> +	u64_stats_update_end(&stats->syncp);
>> +
>>   	/* Schedule NAPI, Suppress further interrupts if successful. */
>>   	if (napi_schedule_prep(&vi->napi)) {
>>   		virtqueue_disable_cb(rvq);
>> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>>   static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct virtnet_info *vi = netdev_priv(dev);
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>   	int capacity;
>> +	bool kick;
>>
>>   	/* Free up any pending old buffers before queueing new ones. */
>>   	free_old_xmit_skbs(vi);
>> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   		kfree_skb(skb);
>>   		return NETDEV_TX_OK;
>>   	}
>> -	virtqueue_kick(vi->svq);
>> +
>> +	kick = virtqueue_kick_prepare(vi->svq);
>> +	if (unlikely(kick))
>> +		virtqueue_notify(vi->svq);
>> +
>> +	u64_stats_update_begin(&stats->syncp);
>> +	if (unlikely(kick))
>> +		stats->data[VIRTNET_TX_KICKS]++;
>> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
>> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
>> +	u64_stats_update_end(&stats->syncp);
>>
>>   	/* Don't wait up for transmitted skbs to be freed. */
>>   	skb_orphan(skb);
>> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>>
>>   }
>>
>> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
>> +{
>> +	int i, cpu;
>> +	switch (stringset) {
>> +	case ETH_SS_STATS:
>> +		for_each_possible_cpu(cpu)
>> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
>> +				sprintf(buf, "%s[%u]",
>> +					virtnet_stats_str_attr[i].string, cpu);
>> +				buf += ETH_GSTRING_LEN;
> I would do
> 	 ret = snprintf(buf, ETH_GSTRING_LEN, ...)
> 	 BUG_ON(ret >= ETH_GSTRING_LEN);
> here to make it more robust.

Ok.
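
For reference, the bounded formatting suggested above looks roughly like this in userspace. It is a sketch under stated assumptions: GSTRING_LEN stands in for ETH_GSTRING_LEN, format_stat_name() is a hypothetical helper, and it returns an error where the kernel version would use BUG_ON().

```c
#include <stdio.h>

#define GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

/* Format "name[cpu]" into one fixed-size slot of GSTRING_LEN bytes.
 * Returns 0 on success, -1 if the result would not fit. */
static int format_stat_name(char *slot, const char *name, unsigned int cpu)
{
	int ret = snprintf(slot, GSTRING_LEN, "%s[%u]", name, cpu);

	/* snprintf() returns the length it wanted to write, excluding the
	 * terminating NUL, so ret >= GSTRING_LEN means truncation. */
	if (ret < 0 || ret >= GSTRING_LEN)
		return -1;
	return 0;
}
```

Unlike sprintf(), this never writes past the slot, so an overlong name shows up as a detectable error instead of corrupting the next string.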
>> +			}
>> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
>> +			memcpy(buf, virtnet_stats_str_attr[i].string,
>> +				ETH_GSTRING_LEN);
>> +			buf += ETH_GSTRING_LEN;
>> +		}
> 		So why not just memcpy the whole array there?
> 		memcpy(buf, virtnet_stats_str_attr,
> 		       sizeof virtnet_stats_str_attr);
>
>> +		break;
>> +	}
>> +}
>> +
>> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
>> +{
>> +	switch (sset) {
>> +	case ETH_SS_STATS:
> also add
> 	BUILD_BUG_ON(VIRTNET_NUM_STATS != (sizeof virtnet_stats_str_attr) / ETH_GSTRING_LEN);
>

Ok.
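
The same compile-time check can be tried outside the kernel with C11 _Static_assert, used here as a stand-in for BUILD_BUG_ON(). The enum and table below are shortened, hypothetical copies of the ones in the patch, and GSTRING_LEN stands in for ETH_GSTRING_LEN.

```c
#include <stddef.h>

#define GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

enum toy_stats_type {
	TOY_TX_BYTES,
	TOY_TX_PACKETS,
	TOY_RX_BYTES,
	TOY_RX_PACKETS,
	TOY_NUM_STATS,
};

static const struct {
	char string[GSTRING_LEN];
} toy_stats_str[] = {
	{ "tx_bytes" },
	{ "tx_packets" },
	{ "rx_bytes" },
	{ "rx_packets" },
};

/* Fails the build if a stat is added to the enum but not to the name
 * table, or the other way round. */
_Static_assert((size_t)TOY_NUM_STATS == sizeof toy_stats_str / GSTRING_LEN,
	       "stats enum and name table are out of sync");

/* Number of entries in the name table. */
static size_t toy_name_count(void)
{
	return sizeof toy_stats_str / sizeof toy_stats_str[0];
}
```

Because each entry is exactly GSTRING_LEN bytes, the table is also laid out the way the ethtool string buffer expects, which is what makes the single memcpy() of the whole array possible.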
>> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +}
>> +
>> +static void virtnet_get_ethtool_stats(struct net_device *dev,
>> +				      struct ethtool_stats *stats, u64 *buf)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(dev);
>> +	int cpu, i;
>> +	unsigned int start;
>> +	struct virtnet_stats sample, total;
>> +
>> +	memset(&total, 0, sizeof(total));
> sizeof total
> when operand is a variable,
> to distinguish from when it is a type.

Sure.
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
>> +		do {
>> +			start = u64_stats_fetch_begin(&s->syncp);
>> +			memcpy(&sample.data, &s->data,
>> +			       sizeof(u64) * VIRTNET_NUM_STATS);
>> +		} while (u64_stats_fetch_retry(&s->syncp, start));
>> +
>> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
>> +			*buf = sample.data[i];
>> +			total.data[i] += sample.data[i];
>> +			buf++;
>> +		}
>> +	}
>> +
>> +	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
>> +}
>> +
>>   static const struct ethtool_ops virtnet_ethtool_ops = {
>>   	.get_drvinfo = virtnet_get_drvinfo,
>>   	.get_link = ethtool_op_get_link,
>>   	.get_ringparam = virtnet_get_ringparam,
>> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
>> +	.get_strings = virtnet_get_strings,
>> +	.get_sset_count = virtnet_get_sset_count,
>>   };
>>
>>   #define MIN_MTU 68

^ permalink raw reply

* Re: [V2 RFC net-next PATCH 2/2] virtio_net: export more statistics through ethtool
From: Michael S. Tsirkin @ 2012-06-06  9:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120606075217.29081.30713.stgit@amd-6168-8-1.englab.nay.redhat.com>

On Wed, Jun 06, 2012 at 03:52:17PM +0800, Jason Wang wrote:
> Statistics counters are useful for debugging and performance optimization, so this
> patch lets the virtio_net driver collect the following and export them to userspace
> through "ethtool -S":
> 
> - number of packets sent/received
> - number of bytes sent/received
> - number of callbacks for tx/rx
> - number of kick for tx/rx
> - number of bytes/packets queued for tx
> 
> As virtnet_stats are per-cpu, both per-cpu and global statistics are
> collected, like:
> 
> NIC statistics:
>      tx_bytes[0]: 1731209929
>      tx_packets[0]: 60685
>      tx_kicks[0]: 63
>      tx_callbacks[0]: 73
>      tx_queued_bytes[0]: 1935749360
>      tx_queued_packets[0]: 80652
>      rx_bytes[0]: 2695648
>      rx_packets[0]: 40767
>      rx_kicks[0]: 1
>      rx_callbacks[0]: 2077
>      tx_bytes[1]: 9105588697
>      tx_packets[1]: 344150
>      tx_kicks[1]: 162
>      tx_callbacks[1]: 905
>      tx_queued_bytes[1]: 8901049412
>      tx_queued_packets[1]: 324184
>      rx_bytes[1]: 23679828
>      rx_packets[1]: 358770
>      rx_kicks[1]: 6
>      rx_callbacks[1]: 17717
>      tx_bytes: 10836798626
>      tx_packets: 404835
>      tx_kicks: 225
>      tx_callbacks: 978
>      tx_queued_bytes: 10836798772
>      tx_queued_packets: 404836
>      rx_bytes: 26375476
>      rx_packets: 399537
>      rx_kicks: 7
>      rx_callbacks: 19794
> 
> TODO:
> 
> - more statistics
> - calculate the pending bytes/pkts
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> ---
> Changes from v1:
> 
> - style & typo fixes
> - convert the statistics fields to array
> - use unlikely()
> ---
>  drivers/net/virtio_net.c |  115 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 113 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6e4aa6f..909a0a7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -44,8 +44,14 @@ module_param(gso, bool, 0444);
>  enum virtnet_stats_type {
>  	VIRTNET_TX_BYTES,
>  	VIRTNET_TX_PACKETS,
> +	VIRTNET_TX_KICKS,
> +	VIRTNET_TX_CBS,
> +	VIRTNET_TX_Q_BYTES,
> +	VIRTNET_TX_Q_PACKETS,

What about counting the time we spend with the queue
stopped and the number of times we stop the queue?

>  	VIRTNET_RX_BYTES,
>  	VIRTNET_RX_PACKETS,
> +	VIRTNET_RX_KICKS,
> +	VIRTNET_RX_CBS,

What about a counter for oom on rx?

>  	VIRTNET_NUM_STATS,
>  };
>  
> @@ -54,6 +60,21 @@ struct virtnet_stats {
>  	u64 data[VIRTNET_NUM_STATS];
>  };
>  
> +static struct {
> +	char string[ETH_GSTRING_LEN];
> +} virtnet_stats_str_attr[] = {
> +	{ "tx_bytes" },
> +	{ "tx_packets" },
> +	{ "tx_kicks" },
> +	{ "tx_callbacks" },
> +	{ "tx_queued_bytes" },
> +	{ "tx_queued_packets" },
> +	{ "rx_bytes" },
> +	{ "rx_packets" },
> +	{ "rx_kicks" },
> +	{ "rx_callbacks" },
> +};
> +
>  struct virtnet_info {
>  	struct virtio_device *vdev;
>  	struct virtqueue *rvq, *svq, *cvq;
> @@ -146,6 +167,11 @@ static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
>  static void skb_xmit_done(struct virtqueue *svq)
>  {
>  	struct virtnet_info *vi = svq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_TX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Suppress further interrupts. */
>  	virtqueue_disable_cb(svq);
> @@ -465,6 +491,7 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
>  	bool oom;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  
>  	do {
>  		if (vi->mergeable_rx_bufs)
> @@ -481,13 +508,24 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  	} while (err > 0);
>  	if (unlikely(vi->num > vi->max))
>  		vi->max = vi->num;
> -	virtqueue_kick(vi->rvq);
> +	if (virtqueue_kick_prepare(vi->rvq)) {
> +		virtqueue_notify(vi->rvq);
> +		u64_stats_update_begin(&stats->syncp);
> +		stats->data[VIRTNET_RX_KICKS]++;
> +		u64_stats_update_end(&stats->syncp);
> +	}
>  	return !oom;
>  }
>  
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>  	struct virtnet_info *vi = rvq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->data[VIRTNET_RX_CBS]++;
> +	u64_stats_update_end(&stats->syncp);
> +

This is the data path, so it is not entirely free.
I am guessing the overhead is not measurable, but
did you check?

An alternative is to count when napi callbacks
are invoked. If we also count when the weight was exceeded
we get almost the same result.


>  	/* Schedule NAPI, Suppress further interrupts if successful. */
>  	if (napi_schedule_prep(&vi->napi)) {
>  		virtqueue_disable_cb(rvq);
> @@ -630,7 +668,9 @@ static int xmit_skb(struct virtnet_info *vi, struct sk_buff *skb)
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	int capacity;
> +	bool kick;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
>  	free_old_xmit_skbs(vi);
> @@ -655,7 +695,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		kfree_skb(skb);
>  		return NETDEV_TX_OK;
>  	}
> -	virtqueue_kick(vi->svq);
> +
> +	kick = virtqueue_kick_prepare(vi->svq);
> +	if (unlikely(kick))
> +		virtqueue_notify(vi->svq);
> +
> +	u64_stats_update_begin(&stats->syncp);
> +	if (unlikely(kick))
> +		stats->data[VIRTNET_TX_KICKS]++;
> +	stats->data[VIRTNET_TX_Q_BYTES] += skb->len;
> +	stats->data[VIRTNET_TX_Q_PACKETS]++;
> +	u64_stats_update_end(&stats->syncp);
>  
>  	/* Don't wait up for transmitted skbs to be freed. */
>  	skb_orphan(skb);
> @@ -943,10 +993,71 @@ static void virtnet_get_drvinfo(struct net_device *dev,
>  
>  }
>  
> +static void virtnet_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
> +{
> +	int i, cpu;
> +	switch (stringset) {
> +	case ETH_SS_STATS:
> +		for_each_possible_cpu(cpu)
> +			for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +				sprintf(buf, "%s[%u]",
> +					virtnet_stats_str_attr[i].string, cpu);
> +				buf += ETH_GSTRING_LEN;
> +			}
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			memcpy(buf, virtnet_stats_str_attr[i].string,
> +				ETH_GSTRING_LEN);
> +			buf += ETH_GSTRING_LEN;
> +		}
> +		break;
> +	}
> +}
> +
> +static int virtnet_get_sset_count(struct net_device *dev, int sset)
> +{
> +	switch (sset) {
> +	case ETH_SS_STATS:
> +		return VIRTNET_NUM_STATS * (num_possible_cpus() + 1);
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +}
> +
> +static void virtnet_get_ethtool_stats(struct net_device *dev,
> +				      struct ethtool_stats *stats, u64 *buf)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int cpu, i;
> +	unsigned int start;
> +	struct virtnet_stats sample, total;
> +
> +	memset(&total, 0, sizeof(total));
> +
> +	for_each_possible_cpu(cpu) {
> +		struct virtnet_stats *s = per_cpu_ptr(vi->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&s->syncp);
> +			memcpy(&sample.data, &s->data,
> +			       sizeof(u64) * VIRTNET_NUM_STATS);
> +		} while (u64_stats_fetch_retry(&s->syncp, start));
> +
> +		for (i = 0; i < VIRTNET_NUM_STATS; i++) {
> +			*buf = sample.data[i];
> +			total.data[i] += sample.data[i];
> +			buf++;
> +		}
> +	}
> +
> +	memcpy(buf, &total.data, sizeof(u64) * VIRTNET_NUM_STATS);
> +}
> +
>  static const struct ethtool_ops virtnet_ethtool_ops = {
>  	.get_drvinfo = virtnet_get_drvinfo,
>  	.get_link = ethtool_op_get_link,
>  	.get_ringparam = virtnet_get_ringparam,
> +	.get_ethtool_stats = virtnet_get_ethtool_stats,
> +	.get_strings = virtnet_get_strings,
> +	.get_sset_count = virtnet_get_sset_count,
>  };
>  
>  #define MIN_MTU 68

^ permalink raw reply

* [net-next PATCH v2 3/3] bnx2x: Added EEE Ethtool support.
From: Yuval Mintz @ 2012-06-06  8:58 UTC (permalink / raw)
  To: davem, netdev; +Cc: eilong, bhutchings, peppe.cavallaro, Yuval Mintz
In-Reply-To: <1338973098-16439-1-git-send-email-yuvalmin@broadcom.com>

This patch extends bnx2x's ethtool interface to enable
control of the EEE feature, as well as to report statistics
about it.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 .../net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c    |  134 ++++++++++++++++++++
 1 files changed, 134 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
index ddc18ee..bf30e28 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
@@ -177,6 +177,8 @@ static const struct {
 			4, STATS_FLAGS_FUNC, "recoverable_errors" },
 	{ STATS_OFFSET32(unrecoverable_error),
 			4, STATS_FLAGS_FUNC, "unrecoverable_errors" },
+	{ STATS_OFFSET32(eee_tx_lpi),
+			4, STATS_FLAGS_PORT, "Tx LPI entry count"}
 };
 
 #define BNX2X_NUM_STATS		ARRAY_SIZE(bnx2x_stats_arr)
@@ -1543,6 +1545,136 @@ static const struct {
 	{ "idle check (online)" }
 };
 
+static u32 bnx2x_eee_to_adv(u32 eee_adv)
+{
+	u32 modes = 0;
+
+	if (eee_adv & SHMEM_EEE_100M_ADV)
+		modes |= ADVERTISED_100baseT_Full;
+	if (eee_adv & SHMEM_EEE_1G_ADV)
+		modes |= ADVERTISED_1000baseT_Full;
+	if (eee_adv & SHMEM_EEE_10G_ADV)
+		modes |= ADVERTISED_10000baseT_Full;
+
+	return modes;
+}
+
+static u32 bnx2x_adv_to_eee(u32 modes, u32 shift)
+{
+	u32 eee_adv = 0;
+	if (modes & ADVERTISED_100baseT_Full)
+		eee_adv |= SHMEM_EEE_100M_ADV;
+	if (modes & ADVERTISED_1000baseT_Full)
+		eee_adv |= SHMEM_EEE_1G_ADV;
+	if (modes & ADVERTISED_10000baseT_Full)
+		eee_adv |= SHMEM_EEE_10G_ADV;
+
+	return eee_adv << shift;
+}
+
+static int bnx2x_get_eee(struct net_device *dev, struct ethtool_eee *edata)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	u32 eee_cfg;
+
+	if (!SHMEM2_HAS(bp, eee_status[BP_PORT(bp)])) {
+		DP(BNX2X_MSG_ETHTOOL, "BC Version does not support EEE\n");
+		return -EOPNOTSUPP;
+	}
+
+	eee_cfg = SHMEM2_RD(bp, eee_status[BP_PORT(bp)]);
+
+	edata->supported =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_SUPPORTED_MASK) >>
+				 SHMEM_EEE_SUPPORTED_SHIFT);
+
+	edata->advertised =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_ADV_STATUS_MASK) >>
+				 SHMEM_EEE_ADV_STATUS_SHIFT);
+	edata->lp_advertised =
+		bnx2x_eee_to_adv((eee_cfg & SHMEM_EEE_LP_ADV_STATUS_MASK) >>
+				 SHMEM_EEE_LP_ADV_STATUS_SHIFT);
+
+	/* SHMEM value is in 16u units --> Convert to 1u units. */
+	edata->tx_lpi_timer = (eee_cfg & SHMEM_EEE_TIMER_MASK) << 4;
+
+	edata->eee_enabled    = (eee_cfg & SHMEM_EEE_REQUESTED_BIT)	? 1 : 0;
+	edata->eee_active     = (eee_cfg & SHMEM_EEE_ACTIVE_BIT)	? 1 : 0;
+	edata->tx_lpi_enabled = (eee_cfg & SHMEM_EEE_LPI_REQUESTED_BIT) ? 1 : 0;
+
+	return 0;
+}
+
+static int bnx2x_set_eee(struct net_device *dev, struct ethtool_eee *edata)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	u32 eee_cfg;
+	u32 advertised;
+
+	if (IS_MF(bp))
+		return 0;
+
+	if (!SHMEM2_HAS(bp, eee_status[BP_PORT(bp)])) {
+		DP(BNX2X_MSG_ETHTOOL, "BC Version does not support EEE\n");
+		return -EOPNOTSUPP;
+	}
+
+	eee_cfg = SHMEM2_RD(bp, eee_status[BP_PORT(bp)]);
+
+	if (!(eee_cfg & SHMEM_EEE_SUPPORTED_MASK)) {
+		DP(BNX2X_MSG_ETHTOOL, "Board does not support EEE!\n");
+		return -EOPNOTSUPP;
+	}
+
+	advertised = bnx2x_adv_to_eee(edata->advertised,
+				      SHMEM_EEE_ADV_STATUS_SHIFT);
+	if ((advertised != (eee_cfg & SHMEM_EEE_ADV_STATUS_MASK))) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Direct manipulation of EEE advertisment is not supported\n");
+		return -EINVAL;
+	}
+
+	if (edata->tx_lpi_timer > EEE_MODE_TIMER_MASK) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Maximal Tx Lpi timer supported is %x(u)\n",
+		   EEE_MODE_TIMER_MASK);
+		return -EINVAL;
+	}
+	if (edata->tx_lpi_enabled &&
+	    (edata->tx_lpi_timer < EEE_MODE_NVRAM_AGGRESSIVE_TIME)) {
+		DP(BNX2X_MSG_ETHTOOL,
+		   "Minimal Tx Lpi timer supported is %d(u)\n",
+		   EEE_MODE_NVRAM_AGGRESSIVE_TIME);
+		return -EINVAL;
+	}
+
+	/* All is well; Apply changes*/
+	if (edata->eee_enabled)
+		bp->link_params.eee_mode |= EEE_MODE_ADV_LPI;
+	else
+		bp->link_params.eee_mode &= ~EEE_MODE_ADV_LPI;
+
+	if (edata->tx_lpi_enabled)
+		bp->link_params.eee_mode |= EEE_MODE_ENABLE_LPI;
+	else
+		bp->link_params.eee_mode &= ~EEE_MODE_ENABLE_LPI;
+
+	bp->link_params.eee_mode &= ~EEE_MODE_TIMER_MASK;
+	bp->link_params.eee_mode |= (edata->tx_lpi_timer &
+				    EEE_MODE_TIMER_MASK) |
+				    EEE_MODE_OVERRIDE_NVRAM |
+				    EEE_MODE_OUTPUT_TIME;
+
+	/* Restart link to propagate changes */
+	if (netif_running(dev)) {
+		bnx2x_stats_handle(bp, STATS_EVENT_STOP);
+		bnx2x_link_set(bp);
+	}
+
+	return 0;
+}
+
+
 enum {
 	BNX2X_CHIP_E1_OFST = 0,
 	BNX2X_CHIP_E1H_OFST,
@@ -2472,6 +2604,8 @@ static const struct ethtool_ops bnx2x_ethtool_ops = {
 	.get_rxfh_indir_size	= bnx2x_get_rxfh_indir_size,
 	.get_rxfh_indir		= bnx2x_get_rxfh_indir,
 	.set_rxfh_indir		= bnx2x_set_rxfh_indir,
+	.get_eee		= bnx2x_get_eee,
+	.set_eee		= bnx2x_set_eee,
 };
 
 void bnx2x_set_ethtool_ops(struct net_device *netdev)
-- 
1.7.9.rc2

^ permalink raw reply related
