Netdev List
* Re: [PATCH] net: drivers/net/hippi/Kconfig should be sourced
From: David Miller @ 2011-11-09 20:46 UTC
  To: pebolle; +Cc: netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <1320784270.14409.404.camel@x61.thuisdomein>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Tue, 08 Nov 2011 21:31:10 +0100

> Commit ff5a3b509e ("hippi: Move the HIPPI driver") moved the HIPPI
> driver into drivers/net/hippi. It didn't source
> drivers/net/hippi/Kconfig though, so it didn't make all necessary
> Kconfig changes. So let drivers/net/Kconfig source HIPPI's Kconfig file.
> 
> Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
> ---
> git grep tested only. Perhaps the exact spot where
> drivers/net/hippi/Kconfig gets sourced is relevant, so this needs
> (build) testing by people actually familiar with the HIPPI driver. 

Please at least type "make oldconfig" with CONFIG_HIPPI enabled or
similar before submitting patches like this.

There is nothing architecture or platform specific about getting
the option enabled enough for you to see this:

drivers/net/hippi/Kconfig:40: syntax error
drivers/net/hippi/Kconfig:20: missing end statement for this entry
drivers/net/Kconfig:28: missing end statement for this entry
drivers/Kconfig:1: missing end statement for this entry
drivers/net/hippi/Kconfig:39: invalid statement
drivers/net/Kconfig:341: unexpected end statement
drivers/Kconfig:139: unexpected end statement
make[1]: *** [oldconfig] Error 1
make: *** [oldconfig] Error 2

I've fixed this up but if you can't be bothered to type "make" I
seriously can't be bothered to even look at your patch submissions.


* [PATCH 1/2] include/net/cfg80211.h: Fix issue of make htmldocs
From: Marcos Paulo de Souza @ 2011-11-09 20:46 UTC
  To: johannes; +Cc: davem, netdev, rdunlap, Marcos Paulo de Souza

Add documentation for the sta_modify_mask member of struct
station_parameters and the sta_flags member of struct station_info.

Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com>
---
 include/net/cfg80211.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 92cf1c2..bbf6bf7 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -447,6 +447,7 @@ enum station_parameters_apply_mask {
  *	(bitmask of BIT(NL80211_STA_FLAG_...))
  * @sta_flags_set: station flags values
  *	(bitmask of BIT(NL80211_STA_FLAG_...))
+ * @sta_modify_mask: apply new uAPSD parameters
  * @listen_interval: listen interval or -1 for no change
  * @aid: AID or zero for no change
  * @plink_action: plink action to take
@@ -606,6 +607,7 @@ struct sta_bss_parameters {
  * @tx_failed: number of failed transmissions (retries exceeded, no ACK)
  * @rx_dropped_misc:  Dropped for un-specified reason.
  * @bss_param: current BSS parameters
+ * @sta_flags: Station flags mask/set
  * @generation: generation number for nl80211 dumps.
  *	This number should increase every time the list of stations
  *	changes, i.e. when a station is added or removed, so that
-- 
1.7.4.4


* [PATCH] net/can/mscan: add listen only mode
From: Marc Kleine-Budde @ 2011-11-09 20:50 UTC
  To: linux-can; +Cc: netdev, davem, Marc Kleine-Budde

This patch adds listen only mode to the mscan controller.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
---

The patch targets net-next/master and can be pulled:

The following changes since commit e56c57d0d3fdbbdf583d3af96bfb803b8dfa713e:

  net: rename sk_clone to sk_clone_lock (2011-11-08 17:07:07 -0500)

are available in the git repository at:
  git://git.pengutronix.de/git/mkl/linux-2.6.git can/mscan-listen-only-for-net-next

Marc Kleine-Budde (1):
      net/can/mscan: add listen only mode

 drivers/net/can/mscan/mscan.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/mscan/mscan.c b/drivers/net/can/mscan/mscan.c
index ec4a311..74f3b18 100644
--- a/drivers/net/can/mscan/mscan.c
+++ b/drivers/net/can/mscan/mscan.c
@@ -581,7 +581,10 @@ static int mscan_open(struct net_device *dev)
 
 	priv->open_time = jiffies;
 
-	clrbits8(&regs->canctl1, MSCAN_LISTEN);
+	if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
+		setbits8(&regs->canctl1, MSCAN_LISTEN);
+	else
+		clrbits8(&regs->canctl1, MSCAN_LISTEN);
 
 	ret = mscan_start(dev);
 	if (ret)
@@ -690,7 +693,8 @@ struct net_device *alloc_mscandev(void)
 	priv->can.bittiming_const = &mscan_bittiming_const;
 	priv->can.do_set_bittiming = mscan_do_set_bittiming;
 	priv->can.do_set_mode = mscan_do_set_mode;
-	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES;
+	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES |
+		CAN_CTRLMODE_LISTENONLY;
 
 	for (i = 0; i < TX_QUEUE_SIZE; i++) {
 		priv->tx_queue[i].id = i;
-- 
1.7.4.1
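The two hunks above boil down to a small pattern: advertise the new mode in ctrlmode_supported, then make the LISTEN bit in CANCTL1 follow the configured mode instead of clearing it unconditionally. Below is a minimal userspace sketch of that pattern. setbits8()/clrbits8() are re-implemented as plain byte operations, the "reject unadvertised bits" check is an assumption about how the CAN core validates requests, and all bit values are assumed for illustration; take the real ones from the driver and <linux/can/netlink.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values assumed for illustration only. */
#define MSCAN_LISTEN             0x10u
#define CAN_CTRLMODE_LOOPBACK    0x01u
#define CAN_CTRLMODE_LISTENONLY  0x02u
#define CAN_CTRLMODE_3_SAMPLES   0x04u

/* What the patched alloc_mscandev() advertises. */
#define MSCAN_CTRLMODE_SUPPORTED (CAN_CTRLMODE_3_SAMPLES | CAN_CTRLMODE_LISTENONLY)

/* Userspace stand-ins for the driver's register helpers. */
static void setbits8(uint8_t *reg, uint8_t mask) { *reg |= mask; }
static void clrbits8(uint8_t *reg, uint8_t mask) { *reg &= (uint8_t)~mask; }

/* A requested mode is acceptable only if every bit in it is advertised. */
static bool ctrlmode_is_supported(uint32_t requested, uint32_t supported)
{
	return (requested & ~supported) == 0;
}

/* Mirrors the patched mscan_open(): LISTEN follows the configured mode. */
static void apply_listen_mode(uint8_t *canctl1, uint32_t ctrlmode)
{
	if (ctrlmode & CAN_CTRLMODE_LISTENONLY)
		setbits8(canctl1, MSCAN_LISTEN);
	else
		clrbits8(canctl1, MSCAN_LISTEN);
}
```

Other bits already present in the register are left untouched; only MSCAN_LISTEN changes. From userspace, iproute2's CAN support would typically request the mode with something like `ip link set can0 type can bitrate 125000 listen-only on`.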



* Re: [PATCH 0/2] AH fixes for asynchronous hash algorithms.
From: David Miller @ 2011-11-09 20:56 UTC
  To: nbowler; +Cc: netdev, linux-kernel
In-Reply-To: <1320790365-29152-1-git-send-email-nbowler@elliptictech.com>

From: Nick Bowler <nbowler@elliptictech.com>
Date: Tue,  8 Nov 2011 17:12:43 -0500

> Here are two fixes for AH when using an asynchronous hmac driver.  Both
> are -stable candidates as these problems appear to have been present
> since AH was converted to use ahash way back in 2.6.33.
> 
> These code paths are not exercised when using the default software hash
> implementations which do not use the ahash callbacks, but the issues can be
> reproduced by using cryptd to create an asynchronous hash algorithm for
> testing.
> 
> This driver could probably do with some cleanups to reduce the code
> duplication (and thus test coverage) between the asynchronous callbacks
> and synchronous code paths, which should help avoid these kinds of
> problems in the future.  These code paths apparently do not see a
> lot of testing.  But that's for a later patch series.
> 
> Nick Bowler (2):
>   ah: Correctly pass error codes in ahash output callback.
>   ah: Read nexthdr value before overwriting it in ahash input callback.

Thanks a lot for these bug fixes Nick, both applied.

Also queued up for -stable.


* Re: [PATCH] ipv4: fix for ip_options_rcv_srr() daddr update.
From: David Miller @ 2011-11-09 20:59 UTC
  To: lw; +Cc: netdev
In-Reply-To: <4EBA2E30.8050102@cn.fujitsu.com>

From: Li Wei <lw@cn.fujitsu.com>
Date: Wed, 09 Nov 2011 15:39:28 +0800

> When opt->srr_is_hit is set, skb_rtable(skb) has already been updated
> for 'nexthop'. Since iph->daddr should always equal
> skb_rtable(skb)->rt_dst, we need to update iph->daddr as well.
> 
> Signed-off-by: Li Wei <lw@cn.fujitsu.com>

Applied, thank you.


* Re: [PATCH net-next] ipv4: reduce percpu needs for icmpmsg mibs
From: David Miller @ 2011-11-09 21:04 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1320793483.26025.29.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Nov 2011 00:04:43 +0100

> Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
> (can be ~88000 us).
> 
> This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
> for all possible cpus can read 16 Mbytes of memory.
> 
> ICMP messages are not considered a fast path on a typical server, and
> in practice only a few cpus handle them anyway. We can afford an atomic
> operation instead of using percpu data.
> 
> This saves 4096 bytes per cpu and per network namespace.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> If this patch is accepted, I'll submit the IPv6 part as well.

Looks good, applied, thanks Eric.
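A rough userspace model of the trade-off described above, using C11 <stdatomic.h> in place of the kernel's atomic_long_t: one shared table of counters updated with relaxed atomic adds, plus the arithmetic for the per-cpu layout it replaces. The split into 256 ICMP types counted separately for input and output (512 counters) and the use of unsigned long counters are assumptions chosen to match the 4096 bytes per cpu quoted above on a 64-bit machine; the struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define ICMPMSG_TYPES    256
#define ICMPMSG_MIB_MAX  (2 * ICMPMSG_TYPES)  /* input + output */

/* One table shared by all cpus instead of one copy per possible cpu. */
struct icmpmsg_mib_shared {
	atomic_long mibs[ICMPMSG_MIB_MAX];
};

/* out == 0 counts received messages, out == 1 counts sent ones. */
static void icmpmsg_inc(struct icmpmsg_mib_shared *m, int out, unsigned char type)
{
	/* Relaxed ordering is enough for a statistics counter. */
	atomic_fetch_add_explicit(&m->mibs[out * ICMPMSG_TYPES + type], 1,
				  memory_order_relaxed);
}

/* Reading is a single load; no folding over cpus is needed. */
static long icmpmsg_read(struct icmpmsg_mib_shared *m, int out, unsigned char type)
{
	return atomic_load_explicit(&m->mibs[out * ICMPMSG_TYPES + type],
				    memory_order_relaxed);
}

/* Memory a per-cpu layout needs: one full table per possible cpu. */
static size_t percpu_table_bytes(size_t possible_cpus)
{
	return possible_cpus * ICMPMSG_MIB_MAX * sizeof(unsigned long);
}
```

On a 64-bit build, percpu_table_bytes(1) is 4096 bytes, and assuming 4096 possible cpus the per-cpu layout needs 16 MiB, all of which a /proc/net/snmp read has to walk; the shared atomic table stays at 4 KiB regardless of the cpu count.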


* Re: [PATCH] net/usb: Misc. fixes for the LG-VL600 LTE USB modem
From: David Miller @ 2011-11-09 21:06 UTC
  To: prox; +Cc: dcbw, oliver, gregkh, netdev, linux-kernel
In-Reply-To: <20111109185714.GA15884@prolixium.com>

From: Mark Kamichoff <prox@prolixium.com>
Date: Wed, 9 Nov 2011 13:57:14 -0500

> For (a), it's my understanding that __constant_htons() should be used
> only for initializers and htons() used in other cases, since it handles
> checking for constants.  I suppose you're right and this is a little
> gratuitous, but I wanted to keep things clean.
> 
> As far as (b), sorry!  That's an error on my part.  I must have been
> practicing another coding style at the time.  The braces certainly
> shouldn't be there, let me know if I should resubmit.

Please get rid of the gratuitous htons() etc. changes and keep this
patch purely to the bug fixes and resubmit.

Thank you.


* Re: [PATCH] net: drivers/net/hippi/Kconfig should be sourced
From: Paul Bolle @ 2011-11-09 21:08 UTC
  To: David Miller; +Cc: netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <20111109.154643.2008804116058722848.davem@davemloft.net>

On Wed, 2011-11-09 at 15:46 -0500, David Miller wrote:
> Please at least type "make oldconfig" with CONFIG_HIPPI enabled or
> similar before submitting patches like this.
> 
> There is nothing architecture or platform specific about getting
> the option enabled enough for you to see this:
> 
> drivers/net/hippi/Kconfig:40: syntax error
> drivers/net/hippi/Kconfig:20: missing end statement for this entry
> drivers/net/Kconfig:28: missing end statement for this entry
> drivers/Kconfig:1: missing end statement for this entry
> drivers/net/hippi/Kconfig:39: invalid statement
> drivers/net/Kconfig:341: unexpected end statement
> drivers/Kconfig:139: unexpected end statement
> make[1]: *** [oldconfig] Error 1
> make: *** [oldconfig] Error 2
> 
> I've fixed this up but if you can't be bothered to type "make" I
> seriously can't be bothered to even look at your patch submissions.

Would it be better if I hadn't submitted this as a patch (with a
warning, which you perhaps missed, that I didn't build test it) but as a
simple message to notify the people who wrote the patch that started all
this, netdev and you, that that commit was incomplete? If so, I'd be
glad to only do that in the future.


Paul Bolle


* Re: net: Add network priority cgroup
From: Neil Horman @ 2011-11-09 21:09 UTC
  To: Dave Taht; +Cc: netdev, John Fastabend, Robert Love, David S. Miller
In-Reply-To: <CAA93jw7G90kGBu2JnaEdWv3J0OSPO8eg55TMrZZCWPD-pdRf_g@mail.gmail.com>

On Wed, Nov 09, 2011 at 09:27:08PM +0100, Dave Taht wrote:
> On Wed, Nov 9, 2011 at 8:57 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > Data Center Bridging environments are currently somewhat limited in their
> > ability to provide a general mechanism for controlling traffic priority.
> 
> 
> 
> >
> > Specifically they are unable to administratively control the priority at which
> > various types of network traffic are sent.
> >
> > Currently, the only ways to set the priority of a network buffer are:
> >
> > 1) Through the use of the SO_PRIORITY socket option
> > 2) By using low level hooks, like a tc action
> >
> 2), above is a little vague.
> 
> There are dozens of ways to control the relative priorities of network
> streams in addition to priority, notably diffserv, various forms of
> fair queuing, and active queue management techniques like RED, Blue,
> etc.
> 
I'm referring explicitly to skb->priority here.  Sorry if I wasn't clear.

> The priority field within the Linux skb is used for multiple purposes
> - in addition to SO_PRIORITY it is also used for queue selection
> within tc for a variety of queuing disciplines. Certain bands are
> reserved for vlan and wireless queueing, (these features are rarely
> used)
> 
Yes.

> Twiddling with it on one level or creating a controller for it can and
> will still be messed up by attempts to sanely use it elsewhere in the
> stack.
> 
Why?  It's not like it can't already be twiddled with via SO_PRIORITY.  This does
exactly the same thing, it just lets us do it via an administrative interface
rather than a programmatic one.  I don't disagree that the use of skb->priority
is complex, but this doesn't add any complexity that isn't already there.  It
just gives us a general way to assign priorities for those that know how to use
it consistently, in a way that doesn't require application modification.  That's
something that DCB needs.

> >
> > (1) is difficult from an administrative perspective because it requires the
> > application to be coded to not just assume the default priority is sufficient,
> > and to expose an administrative interface to allow priority adjustment.  Such
> > a solution is not scalable in a DCB environment.
> >
> 
> Nor any other complex environment. Or even a simple one.
Yes.

> 
> >
> > (2) is also difficult, as it requires constant administrative oversight of
> > applications so as to build appropriate rules to match traffic belonging to
> 
> Yes, your description of option 2, as simplified above, is difficult.
> 
> However certain algorithms are intended to improve fairness between
> flows that do not require as much oversight and classification.
> 
Yes, but DCB is orthogonal to software traffic control.  It's hardware queueing
based on the priority value of an skb.  As such, when a DCB enabled multiqueue
adapter selects the output queues in dev_pick_tx, it needs to have the
skb->priority value set properly.  Since we don't run any of the tc filters or
classifiers until after that's complete, we can't use those to adjust the skb
priority, as the root qdisc is already selected.

> However, even when RED or a newer queue management algorithm such as
> QFQ or DRR is applied, classes of traffic exist that benefit from more
> specialized diffserv or diffserv-like behavior.
> 
I understand, but again, DCB is orthogonal to that.  DCB is a hardware based
solution that steers traffic to various output queues in the NIC based on the
skb->priority value.  Take a look at ixgbe_select_queue for an example.

> However, the evidence for something more complex in server
> environments than simple priority management is compelling at this
> point.
> 
> > various classes, so that priority can be appropriately set. It is further
> > limiting when DCB enabled hardware is in use, due to the fact that tc rules are
> > only run after a root qdisc has been selected (DCB enabled hardware may reserve
> > hw queues for various traffic classes and needs the priority to be set prior to
> > selecting the root qdisc)
> >
> 
> Multiple applications (somewhat) rightly set priorities according to
> their view of the world.
> 
> background traffic and immediate traffic often set the appropriate
> diffserv bits, other traffic can do the same, and at least a few apps
> set the priority field also in the hope that that will do some good,
> and perhaps more should.
> 
Agreed, and this patch respects that.  It only sets the priority of an skb that
doesn't already have its priority set.  See skb_update_prio.
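The rule stated here can be modeled in a few lines. This is a hypothetical userspace sketch, not the kernel's skb_update_prio(): the struct and field names are invented for illustration, and treating a priority of 0 as "not set" is an assumption taken from the description. The cgroup's per-interface value is applied only when the packet has an owning socket, the egress device has a priority map, and nothing (such as SO_PRIORITY) has already set a priority.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface map: priority for each cgroup index. */
struct prio_map {
	size_t len;
	const uint32_t *priomap;
};

/* Hypothetical socket: remembers which net_prio cgroup it belongs to. */
struct sock_model {
	unsigned int cgrp_prioidx;
};

/* Hypothetical packet: only the fields the rule looks at. */
struct skb_model {
	uint32_t priority;            /* 0 means "not set" */
	const struct sock_model *sk;  /* owning socket, may be NULL */
	const struct prio_map *map;   /* egress device's map, may be NULL */
};

/* Fill in the priority from the cgroup map only if it is still unset. */
static void update_prio(struct skb_model *skb)
{
	if (skb->priority == 0 && skb->sk && skb->map &&
	    skb->sk->cgrp_prioidx < skb->map->len)
		skb->priority = skb->map->priomap[skb->sk->cgrp_prioidx];
}
```

An application that sets SO_PRIORITY keeps its value, which is why the cgroup mechanism does not override programs that already manage their own priority.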

> 
> >
> > I've discussed various solutions with John Fastabend, and we saw a cgroup as
> > being a good general solution to this problem.  The network priority cgroup
> 
> Not if you are wanting to apply queue management further down the stack!
> 
I'm not saying you can use the two together! I understand that this solution
interferes with the use of skb->priority in various queuing disciplines (just
like a program using SO_PRIORITY would), but the way those disciplines work is
incompatible with DCB at the moment.  You wouldn't use them all at the same
time.  I'd be happy to add some documentation to my patch to reflect that if you
like.

> >
> > allows for a per-interface priority map to be built per cgroup.  Any traffic
> > originating from an application in a cgroup, that does not explicitly set its
> > priority with SO_PRIORITY will have its priority assigned to the value
> > designated for that group on that interface.
> 
> > This allows a user space daemon,
> > when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
> > based on the APP_TLV value received and administratively assign applications to
> > that priority using the existing cgroup utility infrastructure.
> 
> I would like it if the many uses of the priority field were reduced to
> one use per semantic grouping.
> 
> You are adding a controller to something that is already
> ill-controlled and ill-defined, overly overloaded and both under and
> over used, to be managed in userspace by code to be designed later, and
> then re-mapped once it exits a vm into another host or hardware queue
> management system which may or may not share similar assumptions.
> 
> Don't get me wrong, I LIKE the controller idea, but think the priority
> field needs to be un-overloaded first to avoid ill-effects elsewhere
> in the users of the down-stream subsystems.
> 
We can certainly discuss the idea of separating the various semantic uses of
skb->priority out, but I don't think this patch is the place to do it. The
DCB use case for priority already exists (it specifically uses the prio_tc_map
as indexed by skb->priority in __skb_tx_hash).  I'm just adding a means of
controlling it more easily and reliably. 
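To make the DCB path concrete, here is a simplified model of priority-based tx queue selection: skb->priority is masked into a 16-entry prio_tc_map to pick a traffic class, each class owns a contiguous range of hardware queues, and the flow hash spreads packets within that range. The structures are hypothetical simplifications of what __skb_tx_hash() consults, written for illustration; check net/core/dev.c in your tree before relying on the details.

```c
#include <stdint.h>

#define TC_BITMASK    15  /* 16 priority slots in prio_tc_map */
#define TC_MAX_QUEUE  16

/* Range of hardware queues owned by one traffic class. */
struct txq_range {
	uint16_t offset;
	uint16_t count;
};

/* Hypothetical, simplified view of a DCB-capable device. */
struct dev_model {
	uint8_t num_tc;                            /* 0: no traffic classes */
	uint8_t prio_tc_map[TC_BITMASK + 1];       /* priority -> class */
	struct txq_range tc_to_txq[TC_MAX_QUEUE];  /* class -> queue range */
	uint16_t real_num_tx_queues;
};

static uint16_t pick_tx_queue(const struct dev_model *dev,
			      uint32_t priority, uint32_t hash)
{
	uint16_t qoffset = 0;
	uint16_t qcount = dev->real_num_tx_queues;

	if (dev->num_tc) {
		uint8_t tc = dev->prio_tc_map[priority & TC_BITMASK];

		qoffset = dev->tc_to_txq[tc].offset;
		qcount = dev->tc_to_txq[tc].count;
	}
	/* Scale the 32-bit hash into [0, qcount) without a modulo. */
	return (uint16_t)(((uint64_t)hash * qcount) >> 32) + qoffset;
}
```

With two classes of four queues each and priority 5 mapped to class 1, every packet of priority 5 lands in queues 4 to 7 whatever its hash, which is why the priority has to be set before this step rather than by a tc filter afterwards.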

> > Tested by John and myself, with good results
> 
> With what?
> 
What else?  An ixgbe adapter and ping.  I created a test netprio cgroup, assigned a
priority value to it, and did a cgexec -g net_prio:test ping www.yahoo.com
with a printk in the ixgbe tx method to validate that the proper queue mapping
was selected.

Neil

> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: John Fastabend <john.r.fastabend@intel.com>
> > CC: Robert Love <robert.w.love@intel.com>
> > CC: "David S. Miller" <davem@davemloft.net>
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> --
> Dave Täht
> SKYPE: davetaht
> 
> http://www.bufferbloat.net
> 


* Re: net: Add network priority cgroup
From: John Fastabend @ 2011-11-09 21:10 UTC
  To: Dave Taht
  Cc: Neil Horman, netdev@vger.kernel.org, Love, Robert W,
	David S. Miller
In-Reply-To: <CAA93jw7G90kGBu2JnaEdWv3J0OSPO8eg55TMrZZCWPD-pdRf_g@mail.gmail.com>

On 11/9/2011 12:27 PM, Dave Taht wrote:
> On Wed, Nov 9, 2011 at 8:57 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>>
>> Data Center Bridging environments are currently somewhat limited in their
>> ability to provide a general mechanism for controlling traffic priority.
> 
> 
> 
>>
>> Specifically they are unable to administratively control the priority at which
>> various types of network traffic are sent.
>>
>> Currently, the only ways to set the priority of a network buffer are:
>>
>> 1) Through the use of the SO_PRIORITY socket option
>> 2) By using low level hooks, like a tc action
>>
> 2), above is a little vague.
> 
> There are dozens of ways to control the relative priorities of network
> streams in addition to priority, notably diffserv, various forms of
> fair queuing, and active queue management techniques like RED, Blue,
> etc.
> 

Maybe dozens of ways to control traffic using various combinations of
qdiscs but I think for classification we have a small set of reasonably
defined mechanisms.

 - tc filter/action
 - netfilter infrastructure think CLASSIFY (iptables/ebtables)
 - socket options SO_PRIORITY and SO_TOS

By the way, setting the tos bits also sets sk->priority. What other
classifications did I miss?
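The remark that setting the TOS bits also sets the socket priority can be illustrated with a small table lookup. The sketch below is modeled on the kernel's ip_tos2prio[] mapping, reproduced from memory of net/ipv4/route.c, so verify it against your tree; on Linux the TOS byte itself is set with the IP_TOS socket option. The TC_PRIO_* values are those of <linux/pkt_sched.h>, and tos_to_priority() is a hypothetical name.

```c
#include <stdint.h>

/* Priority bands from <linux/pkt_sched.h>. */
#define TC_PRIO_BESTEFFORT        0
#define TC_PRIO_BULK              2
#define TC_PRIO_INTERACTIVE_BULK  4
#define TC_PRIO_INTERACTIVE       6

/* The four RFC 1349 TOS bits sit in bits 1..4 of the TOS byte. */
#define IPTOS_TOS_MASK  0x1E
#define IPTOS_TOS(tos)  ((tos) & IPTOS_TOS_MASK)

/* One entry per value of the four TOS bits; each odd entry ("min cost"
 * bit set) maps to the same band as its even neighbour. */
static const uint8_t tos2prio[16] = {
	TC_PRIO_BESTEFFORT,       TC_PRIO_BESTEFFORT,
	TC_PRIO_BESTEFFORT,       TC_PRIO_BESTEFFORT,
	TC_PRIO_BULK,             TC_PRIO_BULK,
	TC_PRIO_BULK,             TC_PRIO_BULK,
	TC_PRIO_INTERACTIVE,      TC_PRIO_INTERACTIVE,
	TC_PRIO_INTERACTIVE,      TC_PRIO_INTERACTIVE,
	TC_PRIO_INTERACTIVE_BULK, TC_PRIO_INTERACTIVE_BULK,
	TC_PRIO_INTERACTIVE_BULK, TC_PRIO_INTERACTIVE_BULK,
};

/* Socket priority derived from a TOS byte. */
static uint8_t tos_to_priority(uint8_t tos)
{
	return tos2prio[IPTOS_TOS(tos) >> 1];
}
```

For example, IPTOS_LOWDELAY (0x10) maps to TC_PRIO_INTERACTIVE (6) and IPTOS_THROUGHPUT (0x08) to TC_PRIO_BULK (2).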

> The priority field within the Linux skb is used for multiple purposes
> - in addition to SO_PRIORITY it is also used for queue selection
> within tc for a variety of queuing disciplines. Certain bands are
> reserved for vlan and wireless queueing, (these features are rarely
> used)
> 
> Twiddling with it on one level or creating a controller for it can and
> will still be messed up by attempts to sanely use it elsewhere in the
> stack.
> 

The skb->priority is used by some qdiscs and also with vlan egress_maps.

Without knowing the wireless situation it seems you can either not manage
priority over wireless links if this is a problem or perhaps we can clean
up the wireless queueing and integrate it with the appropriate qdisc.

Could the wireless skb->priority usage be tied into mqprio?

>>
>> (1) is difficult from an administrative perspective because it requires the
>> application to be coded to not just assume the default priority is sufficient,
>> and to expose an administrative interface to allow priority adjustment.  Such
>> a solution is not scalable in a DCB environment.
>>
> 
> Nor any other complex environment. Or even a simple one.
> 
>>
>> (2) is also difficult, as it requires constant administrative oversight of
>> applications so as to build appropriate rules to match traffic belonging to
> 
> Yes, your description of option 2, as simplified above, is difficult.
> 
> However certain algorithms are intended to improve fairness between
> flows that do not require as much oversight and classification.
> 
> However, even when RED or a newer queue management algorithm such as
> QFQ or DRR is applied, classes of traffic exist that benefit from more
> specialized diffserv or diffserv-like behavior.
> 
> However, the evidence for something more complex in server
> environments than simple priority management is compelling at this
> point.
> 
>> various classes, so that priority can be appropriately set. It is further
>> limiting when DCB enabled hardware is in use, due to the fact that tc rules are
>> only run after a root qdisc has been selected (DCB enabled hardware may reserve
>> hw queues for various traffic classes and needs the priority to be set prior to
>> selecting the root qdisc)
>>
> 
> Multiple applications (somewhat) rightly set priorities according to
> their view of the world.
> 
> background traffic and immediate traffic often set the appropriate
> diffserv bits, other traffic can do the same, and at least a few apps
> set the priority field also in the hope that that will do some good,
> and perhaps more should.

These patches do not overwrite existing priorities. So applications
that manage the priority can continue to do this.

> 
> 
>>
>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>> being a good general solution to this problem.  The network priority cgroup
> 
> Not if you are wanting to apply queue management further down the stack!
> 

I don't follow. Here you're saying that you have queue management that the
QoS layer is unaware of? OK, so any qdisc or priority mechanism is going to
interfere with 'further down the stack'.

>>
>> allows for a per-interface priority map to be built per cgroup.  Any traffic
>> originating from an application in a cgroup, that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface.
> 
>> This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
> 
> I would like it if the many uses of the priority field were reduced to
> one use per semantic grouping.
> 
> You are adding a controller to something that is already
> ill-controlled and ill-defined, overly overloaded and both under and
> over used, to be managed in userspace by code to be designed later, and
> then re-mapped once it exits a vm into another host or hardware queue
> management system which may or may not share similar assumptions.
> 

I don't think it's ill-defined or ill-controlled. The priority can be
set by well defined mechanisms. We provide another mechanism to set
the priority without having to modify existing applications and a
mechanism for administrators/tools to set dynamically.

Overloaded, perhaps; the egress_map is a bit of an overloading of this.
But it has existed for a long time.

IMHO hardware queue management systems should be integrated into the
qdisc layer if possible. DCB enabled hardware had similar problems
trying to do hardware queue management without involving the OS and
had to add hacks into select_queue() or hard coded traffic types
into the base drivers to work around this. 'mqprio' and dev support
for traffic classes was my take at a generic mechanism to expose this
to the OS.


> Don't get me wrong, I LIKE the controller idea, but think the priority
> field needs to be un-overloaded first to avoid ill-effects elsewhere
> in the users of the down-stream subsystems.
> 

But doesn't this help the down-stream subsystems as well? The priority
will eventually be pushed down the stack.

>> Tested by John and myself, with good results
> 
> With what?
> 

I tested this with mqprio using the net_prio cgroups to set the priority
and using mqprio to bind hardware queue sets to each priority. Then
I used netperf, ping, and the cg* tools to test I/O.

As a side note I expect you could also use this in conjunction with
the vlan egress_map to push applications onto 802.1Q priorities.

>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: John Fastabend <john.r.fastabend@intel.com>
>> CC: Robert Love <robert.w.love@intel.com>
>> CC: "David S. Miller" <davem@davemloft.net>
> 
> 
> 
> --
> Dave Täht
> SKYPE: davetaht
> 
> http://www.bufferbloat.net


* Re: [PATCH] iMX28 Ethernet driver fix
From: David Miller @ 2011-11-09 21:11 UTC
  To: phorton; +Cc: netdev, linux-arm-kernel
In-Reply-To: <20111109124411.GA31046@axolotl.localnet>

From: Peter Horton <phorton@bitbox.co.uk>
Date: Wed, 9 Nov 2011 12:44:11 +0000

> -	if (((unsigned long) bufaddr) & FEC_ALIGNMENT) {
> +	if ((((unsigned long) bufaddr) & FEC_ALIGNMENT) ||
> +		((id_entry->driver_data & FEC_QUIRK_SWAP_FRAME) &&
> +		skb_cloned(skb)))
> +	{

Please format this condition properly:

	if (A ||
	    (B &&
             C)) {
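Applied to the quoted condition, that layout looks like the sketch below: each continuation line is aligned with the opening parenthesis it belongs to, and the brace stays on the line that closes the condition when a block follows. The function wrapper, the bool parameter standing in for skb_cloned(skb), and the FEC_ALIGNMENT / FEC_QUIRK_SWAP_FRAME values are hypothetical placeholders so the fragment compiles on its own; only the layout of the if statement is the point.

```c
#include <stdbool.h>

#define FEC_ALIGNMENT         0x3       /* placeholder value */
#define FEC_QUIRK_SWAP_FRAME  (1 << 1)  /* placeholder value */

/* Hypothetical helper: must this tx buffer be copied before DMA? */
static bool fec_needs_bounce(const void *bufaddr, unsigned long driver_data,
			     bool skb_is_cloned)
{
	if ((((unsigned long) bufaddr) & FEC_ALIGNMENT) ||
	    ((driver_data & FEC_QUIRK_SWAP_FRAME) &&
	     skb_is_cloned))
		return true;

	return false;
}
```

Because the body here is a single statement, the brace is dropped, following the kernel's usual brace rule; with a multi-statement body the opening brace would go at the end of the `skb_is_cloned)) {` line, exactly as in the template above.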


* Re: [PATCH] net: drivers/net/hippi/Kconfig should be sourced
From: David Miller @ 2011-11-09 21:15 UTC
  To: pebolle; +Cc: netdev, linux-kernel, jeffrey.t.kirsher
In-Reply-To: <1320872916.27598.49.camel@x61.thuisdomein>

From: Paul Bolle <pebolle@tiscali.nl>
Date: Wed, 09 Nov 2011 22:08:36 +0100

> Would it be better if I hadn't submitted this as a patch

Yes, because eventually someone who actually cared about the
situation would submit a properly tested patch.

If nobody else notices the problem, that's fine too, because it means
nobody else cares about whether HIPPI is missing from the build or
not.


* Re: [PATCH V4 net-next] neigh: new unresolved queue limits
From: David Miller @ 2011-11-09 21:16 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1320837249.2315.26.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Nov 2011 12:14:09 +0100

> unres_qlen is the number of frames we are able to queue per unresolved
> neighbour. Its default value (3) was never changed and is responsible
> for strange drops, especially if IP fragments are used, or multiple
> sessions start in parallel. Even a single tcp flow can hit this limit.
 ...

Ok, I've applied this, let's see what happens :-)

Thanks!
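The quoted description explains why a fixed limit of 3 frames per unresolved neighbour is too small. Assuming the replacement limit is expressed in bytes (the thread itself only shows that ->queue_len went away), a byte-bounded queue that evicts the oldest frames to make room could look like this hypothetical sketch. The struct, the function, the ring size and the 64 KiB limit used in the example are all illustrative choices, not the patch's code.

```c
#include <stddef.h>

#define QUEUE_CAP 64  /* ring slots; arbitrary for the example */

/* Hypothetical per-neighbour queue of frames waiting for resolution. */
struct unres_queue {
	size_t sizes[QUEUE_CAP];  /* ring buffer of queued frame sizes */
	size_t head, len;         /* oldest entry, number of entries */
	size_t bytes;             /* sum of queued frame sizes */
	size_t limit_bytes;       /* tunable byte limit */
	size_t dropped;           /* frames discarded to make room */
};

static void unres_enqueue(struct unres_queue *q, size_t frame_size)
{
	/* Evict the oldest frames until the new one fits under the byte
	 * limit and a ring slot is free. */
	while (q->len > 0 &&
	       (q->bytes + frame_size > q->limit_bytes || q->len == QUEUE_CAP)) {
		q->bytes -= q->sizes[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->len--;
		q->dropped++;
	}
	q->sizes[(q->head + q->len) % QUEUE_CAP] = frame_size;
	q->len++;
	q->bytes += frame_size;
}
```

Ten 1500-byte frames fit under a 64 KiB limit without any drop, where a 3-frame limit with the same oldest-first eviction would already have discarded seven of them; a burst of IP fragments or several parallel connection attempts behaves the same way.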


* Re: pull request: wireless 2011-11-09
From: David Miller @ 2011-11-09 21:20 UTC
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20111109193504.GA32400@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 9 Nov 2011 14:35:04 -0500

> Regarding the Bluetooth fixes, Gustavo says this:
> 
> Please let me know if there are problems!

Gustavo says what? :-)


* Re: [PATCH V4 net-next] neigh: new unresolved queue limits
From: David Miller @ 2011-11-09 21:21 UTC
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <20111109.161644.505896539772671525.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 09 Nov 2011 12:14:09 +0100
> 
>> unres_qlen is the number of frames we are able to queue per unresolved
>> neighbour. Its default value (3) was never changed and is responsible
>> for strange drops, especially if IP fragments are used, or multiple
>> sessions start in parallel. Even a single tcp flow can hit this limit.
>  ...
> 
> Ok, I've applied this, let's see what happens :-)

Early answer, build fails.

Please test build this patch with DECNET enabled and resubmit.  The
decnet neigh layer still refers to the removed ->queue_len member.

Thanks.


* Re: pull request: wireless 2011-11-09
From: John W. Linville @ 2011-11-09 21:25 UTC
  To: David Miller; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20111109.162015.2261725491621555303.davem@davemloft.net>


On Wed, Nov 09, 2011 at 04:20:15PM -0500, David Miller wrote:
> From: "John W. Linville" <linville@tuxdriver.com>
> Date: Wed, 9 Nov 2011 14:35:04 -0500
> 
> > Regarding the Bluetooth fixes, Gustavo says this:
> > 
> > Please let me know if there are problems!
> 
> Gustavo says what? :-)

Hmmm...obviously not my best day...

Gustavo says:

"Three more fixes for Linux 3.2. One is a USB device ID addition, and the other
two patches combined fix a connection issue. The first one, from Arek Lichwa,
reverts the wrong fix, and a second commit from Andrzej Kaczmarek fixes the
issue properly."

Hth! :-)

John

P.S.  Pull request head is commit e29ec6247053ad60bd0b36f155b647364a615097.
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.



* [057/262] MAINTAINERS: update Qualcomm Atheros addresses
From: Greg KH @ 2011-11-09 21:26 UTC
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, stable, netdev, jouni, yangjie, vthiagar,
	senthilb, Luis R. Rodriguez, John W. Linville
In-Reply-To: <20111109212847.GA20838@kroah.com>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>

commit fe8e084455f273b32cc57a5fbaf6c22ef984d657 upstream.

Qualcomm ate up Atheros; none of the old e-mail addresses
work anymore, and e-mails sent to them will bounce. Update
the addresses to the new shiny Qualcomm Atheros (QCA) ones.

Cc: stable@kernel.org
Cc: netdev@vger.kernel.org
Cc: jouni@qca.qualcomm.com
Cc: yangjie@qca.qualcomm.com
Cc: vthiagar@qca.qualcomm.com
Cc: senthilb@qca.qualcomm.com
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 MAINTAINERS |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1221,7 +1221,7 @@ F:	Documentation/aoe/
 F:	drivers/block/aoe/
 
 ATHEROS ATH GENERIC UTILITIES
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
 L:	linux-wireless@vger.kernel.org
 S:	Supported
 F:	drivers/net/wireless/ath/*
@@ -1229,7 +1229,7 @@ F:	drivers/net/wireless/ath/*
 ATHEROS ATH5K WIRELESS DRIVER
 M:	Jiri Slaby <jirislaby@gmail.com>
 M:	Nick Kossifidis <mickflemm@gmail.com>
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
 M:	Bob Copeland <me@bobcopeland.com>
 L:	linux-wireless@vger.kernel.org
 L:	ath5k-devel@lists.ath5k.org
@@ -1238,10 +1238,10 @@ S:	Maintained
 F:	drivers/net/wireless/ath/ath5k/
 
 ATHEROS ATH9K WIRELESS DRIVER
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
-M:	Jouni Malinen <jmalinen@atheros.com>
-M:	Vasanthakumar Thiagarajan <vasanth@atheros.com>
-M:	Senthil Balasubramanian <senthilkumar@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
+M:	Jouni Malinen <jouni@qca.qualcomm.com>
+M:	Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
+M:	Senthil Balasubramanian <senthilb@qca.qualcomm.com>
 L:	linux-wireless@vger.kernel.org
 L:	ath9k-devel@lists.ath9k.org
 W:	http://wireless.kernel.org/en/users/Drivers/ath9k
@@ -1269,7 +1269,7 @@ F:	drivers/input/misc/ati_remote2.c
 ATLX ETHERNET DRIVERS
 M:	Jay Cliburn <jcliburn@gmail.com>
 M:	Chris Snook <chris.snook@gmail.com>
-M:	Jie Yang <jie.yang@atheros.com>
+M:	Jie Yang <yangjie@qca.qualcomm.com>
 L:	netdev@vger.kernel.org
 W:	http://sourceforge.net/projects/atl1
 W:	http://atl1.sourceforge.net

* [065/264] MAINTANERS: update Qualcomm Atheros addresses
From: Greg KH @ 2011-11-09 21:31 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: torvalds, akpm, alan, stable, netdev, jouni, yangjie, vthiagar,
	senthilb, Luis R. Rodriguez, John W. Linville
In-Reply-To: <20111109213508.GA3476@kroah.com>

3.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>

commit fe8e084455f273b32cc57a5fbaf6c22ef984d657 upstream.

Qualcomm ate up Atheros, so all of the old e-mail addresses
no longer work and e-mails sent to them will bounce. Update
the addresses to the new shiny Qualcomm Atheros (QCA) ones.

Cc: stable@kernel.org
Cc: netdev@vger.kernel.org
Cc: jouni@qca.qualcomm.com
Cc: yangjie@qca.qualcomm.com
Cc: vthiagar@qca.qualcomm.com
Cc: senthilb@qca.qualcomm.com
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 MAINTAINERS |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1230,7 +1230,7 @@ F:	Documentation/aoe/
 F:	drivers/block/aoe/
 
 ATHEROS ATH GENERIC UTILITIES
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
 L:	linux-wireless@vger.kernel.org
 S:	Supported
 F:	drivers/net/wireless/ath/*
@@ -1238,7 +1238,7 @@ F:	drivers/net/wireless/ath/*
 ATHEROS ATH5K WIRELESS DRIVER
 M:	Jiri Slaby <jirislaby@gmail.com>
 M:	Nick Kossifidis <mickflemm@gmail.com>
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
 M:	Bob Copeland <me@bobcopeland.com>
 L:	linux-wireless@vger.kernel.org
 L:	ath5k-devel@lists.ath5k.org
@@ -1247,10 +1247,10 @@ S:	Maintained
 F:	drivers/net/wireless/ath/ath5k/
 
 ATHEROS ATH9K WIRELESS DRIVER
-M:	"Luis R. Rodriguez" <lrodriguez@atheros.com>
-M:	Jouni Malinen <jmalinen@atheros.com>
-M:	Vasanthakumar Thiagarajan <vasanth@atheros.com>
-M:	Senthil Balasubramanian <senthilkumar@atheros.com>
+M:	"Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
+M:	Jouni Malinen <jouni@qca.qualcomm.com>
+M:	Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
+M:	Senthil Balasubramanian <senthilb@qca.qualcomm.com>
 L:	linux-wireless@vger.kernel.org
 L:	ath9k-devel@lists.ath9k.org
 W:	http://wireless.kernel.org/en/users/Drivers/ath9k

* Re: pull request: wireless 2011-11-09
From: David Miller @ 2011-11-09 21:35 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20111109212505.GC10712@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 9 Nov 2011 16:25:05 -0500

> On Wed, Nov 09, 2011 at 04:20:15PM -0500, David Miller wrote:
>> From: "John W. Linville" <linville@tuxdriver.com>
>> Date: Wed, 9 Nov 2011 14:35:04 -0500
>> 
>> > Regarding the Bluetooth fixes, Gustavo says this:
>> > 
>> > Please let me know if there are problems!
>> 
>> Gustavo says what? :-)
> 
> Hmmm...obviously not my best day...
> 
> Gustavo says:
> 
> "3 more fixes for Linux 3.2. One is a USB device ID addition, and the
> other two patches combined fix a connection issue. The first one, from
> Arek Lichwa, reverts the wrong fix, and a second commit, from Andrzej
> Kaczmarek, fixes the issue properly."
> 
> Hth! :-)

That's better :-)

Pulled, thanks a lot John!

* Re: [PATCH net-next] ipv4: PKTINFO doesnt need dst reference
From: David Miller @ 2011-11-09 21:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhutchings, pstaszewski, netdev
In-Reply-To: <1320859475.3916.21.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Nov 2011 18:24:35 +0100

> [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
> 
> When a socket uses IP_PKTINFO notifications, we currently force a dst
> reference for each received skb. The reader has to access the dst to get
> the needed information (rt_iif & rt_spec_dst) and must then release the
> dst reference.
> 
> We also forced a dst reference if the skb was put in the socket backlog,
> even without IP_PKTINFO handling. This happens under stress/load.
> 
> We can instead store the needed information in skb->cb[], so that only
> the softirq handler really accesses the dst, improving cache hit ratios.
> 
> This removes two atomic operations per packet, as well as false
> sharing.
> 
> On a benchmark using a single-threaded receiver (doing only recvmsg()
> calls), I can reach 720,000 pps instead of 570,000 pps.
> 
> IP_PKTINFO is typically used by DNS servers and by any multihoming-aware
> UDP application.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, if it compiles I'll push it out to net-next :-)
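For readers unfamiliar with IP_PKTINFO, here is a minimal userspace sketch (not part of the patch) of how a UDP server typically consumes it: enable the option with setsockopt(), then read struct in_pktinfo out of the ancillary data returned by recvmsg(). The helper names open_pktinfo_udp_socket() and recv_with_pktinfo() are hypothetical; IP_PKTINFO, struct in_pktinfo and the CMSG_* macros are the standard Linux/glibc ones. ipi_ifindex and ipi_spec_dst appear to be the values the patch derives from rt_iif and rt_spec_dst.

```c
#define _GNU_SOURCE /* make sure glibc exposes struct in_pktinfo */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: UDP socket bound to 127.0.0.1 on an ephemeral port
 * with IP_PKTINFO enabled. Returns the fd, or -1 on error. */
static int open_pktinfo_udp_socket(uint16_t *port_out)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int on = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel pick a port */
	if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0 ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &alen) < 0) {
		close(fd);
		return -1;
	}
	*port_out = ntohs(addr.sin_port);
	return fd;
}

/* Hypothetical helper: receive one datagram and copy its IP_PKTINFO
 * control message into *info. Returns the payload length, or -1 on error
 * or when no IP_PKTINFO data was attached. */
static ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
				 struct in_pktinfo *info)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	union { /* control buffer suitably aligned for struct cmsghdr */
		char raw[CMSG_SPACE(sizeof(struct in_pktinfo))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *c;
	ssize_t n;

	assert(info != NULL);
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.raw;
	msg.msg_controllen = sizeof(ctrl.raw);

	n = recvmsg(fd, &msg, 0);
	if (n < 0)
		return -1;
	for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
			/* ipi_ifindex: arrival interface, ipi_spec_dst: local
			 * address, ipi_addr: IP header destination address */
			memcpy(info, CMSG_DATA(c), sizeof(*info));
			return n;
		}
	}
	return -1;
}
```

A multihomed server would typically reuse ipi_spec_dst as the source address of its reply, so the answer leaves from the address the query arrived on.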

* [PATCH][RESEND] net/usb: Misc. fixes for the LG-VL600 LTE USB modem
From: Mark Kamichoff @ 2011-11-09 21:48 UTC (permalink / raw)
  To: oliver, gregkh; +Cc: linux-usb, netdev, linux-kernel, Mark Kamichoff

Add checking for valid magic values (needed for stability in case
corrupted packets are received) and remove some other unneeded checks.
Also, fix flagging the device as WWAN (Bugzilla bug #39952).
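The validation idea in the first sentence can be sketched in userspace as follows: reject a frame unless the buffer holds a full header, the magic is one of the two values the patch accepts (0x53544448 or 0x44544d48, read big-endian), and the header's little-endian length matches the buffer length. The 8-byte header layout (length at offset 0, magic at offset 4) and the name frame_hdr_ok() are hypothetical simplifications, not the driver's real struct vl600_frame_hdr. The sketch checks the length before it reads the magic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL600_MAGIC_A 0x53544448u /* "STDH" */
#define VL600_MAGIC_B 0x44544d48u /* "DTMH" */

/* Hypothetical simplified header: le32 frame length, then be32 magic. */
#define HDR_LEN 8

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Returns 1 if buf looks like a well-formed frame, 0 otherwise. */
static int frame_hdr_ok(const uint8_t *buf, size_t len)
{
	uint32_t magic;

	assert(buf != NULL || len == 0);
	if (len < HDR_LEN) /* never read the header past the buffer end */
		return 0;
	magic = get_be32(buf + 4);
	if (magic != VL600_MAGIC_A && magic != VL600_MAGIC_B)
		return 0; /* garbage: do not trust the length field */
	return get_le32(buf) == len;
}
```

Checking the magic before trusting the length field is what prevents a corrupted header from driving a huge allocation.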

Signed-off-by: Mark Kamichoff <prox@prolixium.com>
---
 drivers/net/usb/cdc_ether.c |    2 +-
 drivers/net/usb/lg-vl600.c  |   25 +++++++++++--------------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index c924ea2..99ed6eb 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -567,7 +567,7 @@ static const struct usb_device_id	products [] = {
 {
 	USB_DEVICE_AND_INTERFACE_INFO(0x1004, 0x61aa, USB_CLASS_COMM,
 			USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
-	.driver_info = (unsigned long)&wwan_info,
+	.driver_info = 0,
 },
 
 /*
diff --git a/drivers/net/usb/lg-vl600.c b/drivers/net/usb/lg-vl600.c
index d43db32..9c26c63 100644
--- a/drivers/net/usb/lg-vl600.c
+++ b/drivers/net/usb/lg-vl600.c
@@ -144,10 +144,11 @@ static int vl600_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	}
 
 	frame = (struct vl600_frame_hdr *) buf->data;
-	/* NOTE: Should check that frame->magic == 0x53544448?
-	 * Otherwise if we receive garbage at the beginning of the frame
-	 * we may end up allocating a huge buffer and saving all the
-	 * future incoming data into it.  */
+	/* Yes, check that frame->magic == 0x53544448 (or 0x44544d48),
+	 * otherwise we may run out of memory w/a bad packet */
+	if (ntohl(frame->magic) != 0x53544448 &&
+			ntohl(frame->magic) != 0x44544d48)
+		goto error;
 
 	if (buf->len < sizeof(*frame) ||
 			buf->len != le32_to_cpup(&frame->len)) {
@@ -296,6 +297,11 @@ encapsulate:
 	 * overwrite the remaining fields.
 	 */
 	packet = (struct vl600_pkt_hdr *) skb->data;
+	/* The VL600 wants IPv6 packets to have an IPv4 ethertype
+	 * Since this modem only supports IPv4 and IPv6, just set all
+	 * frames to 0x0800 (ETH_P_IP)
+	 */
+	packet->h_proto = htons(ETH_P_IP);
 	memset(&packet->dummy, 0, sizeof(packet->dummy));
 	packet->len = cpu_to_le32(orig_len);
 
@@ -308,21 +314,12 @@ encapsulate:
 	if (skb->len < full_len) /* Pad */
 		skb_put(skb, full_len - skb->len);
 
-	/* The VL600 wants IPv6 packets to have an IPv4 ethertype
-	 * Check if this is an IPv6 packet, and set the ethertype
-	 * to 0x800
-	 */
-	if ((skb->data[sizeof(struct vl600_pkt_hdr *) + 0x22] & 0xf0) == 0x60) {
-		skb->data[sizeof(struct vl600_pkt_hdr *) + 0x20] = 0x08;
-		skb->data[sizeof(struct vl600_pkt_hdr *) + 0x21] = 0;
-	}
-
 	return skb;
 }
 
 static const struct driver_info	vl600_info = {
 	.description	= "LG VL600 modem",
-	.flags		= FLAG_ETHER | FLAG_RX_ASSEMBLE,
+	.flags		= FLAG_RX_ASSEMBLE | FLAG_WWAN,
 	.bind		= vl600_bind,
 	.unbind		= vl600_unbind,
 	.status		= usbnet_cdc_status,
-- 
1.7.5.4

* Re: [PATCH net-next] ipv4: PKTINFO doesnt need dst reference
From: Eric Dumazet @ 2011-11-09 22:03 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, pstaszewski, netdev
In-Reply-To: <20111109.163708.2156133928191684256.davem@davemloft.net>

On Wednesday 09 November 2011 at 16:37 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 09 Nov 2011 18:24:35 +0100
> 
> > [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
> > 
> > When a socket uses IP_PKTINFO notifications, we currently force a dst
> > reference for each received skb. The reader has to access the dst to get
> > the needed information (rt_iif & rt_spec_dst) and must then release the
> > dst reference.
> > 
> > We also forced a dst reference if the skb was put in the socket backlog,
> > even without IP_PKTINFO handling. This happens under stress/load.
> > 
> > We can instead store the needed information in skb->cb[], so that only
> > the softirq handler really accesses the dst, improving cache hit ratios.
> > 
> > This removes two atomic operations per packet, as well as false
> > sharing.
> > 
> > On a benchmark using a single-threaded receiver (doing only recvmsg()
> > calls), I can reach 720,000 pps instead of 570,000 pps.
> > 
> > IP_PKTINFO is typically used by DNS servers and by any multihoming-aware
> > UDP application.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good, if it compiles I'll push it out to net-next :-)

Argh :(  Fingers crossed :)

BTW, on my bnx2x adapter, even small UDP frames use more than PAGE_SIZE
bytes:

skb->truesize=4352 len=26 (payload only)

Now that truesize is more precise, we hit the shared
udp_memory_allocated badly, even with single frames.

I wonder if we shouldn't increase SK_MEM_QUANTUM a bit to avoid
ping/pong...

-#define SK_MEM_QUANTUM ((int)PAGE_SIZE)
+#define SK_MEM_QUANTUM ((int)PAGE_SIZE * 2)
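The "more than PAGE_SIZE" arithmetic can be sketched as a round-up division, assuming a 4096-byte page: socket memory is charged in whole SK_MEM_QUANTUM units (we believe via the sk_mem_pages() helper, which uses a shift because the quantum is a power of two), so the 4352-byte truesize quoted above costs two quanta at PAGE_SIZE but one at PAGE_SIZE * 2. The function name mem_quanta() is hypothetical, and this sketch models only the rounding, not when quanta are returned to the shared udp_memory_allocated pool.

```c
#include <assert.h>

#define PAGE_SIZE_4K 4096 /* assumption: 4 KB pages, as on x86 */

/* Number of whole accounting quanta needed to cover amt bytes. */
static int mem_quanta(int amt, int quantum)
{
	assert(amt >= 0 && quantum > 0);
	return (amt + quantum - 1) / quantum;
}
```

For example, mem_quanta(4352, PAGE_SIZE_4K) is 2, while mem_quanta(4352, 2 * PAGE_SIZE_4K) is 1.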

* Re: [PATCH] libteam: fix function names to include 'bond'
From: Jiri Pirko @ 2011-11-09 22:04 UTC (permalink / raw)
  To: Flavio Leitner
  Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
	tgraf, ebiederm, mirqus, kaber, greearb, jesse, benjamin.poirier,
	jzupka
In-Reply-To: <1320862846-6000-1-git-send-email-fbl@redhat.com>


Hi Flavio.

Thomas included these two functions in the latest libnl upstream. The
bond versions wouldn't work because of the "bond" type check.

Jirka

Wed, Nov 09, 2011 at 07:20:46PM CET, fbl@redhat.com wrote:
>Signed-off-by: Flavio Leitner <fbl@redhat.com>
>---
>
> I found those while trying to test the V6 patch using the latest
> libteam (commit 5e9790816606a6dd4e7f6f32c0bb0c45e5d13b31)
> and libnl-3.2.2 (latest stable).
> thanks,
> fbl
>
> lib/libteam.c |    4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/lib/libteam.c b/lib/libteam.c
>index feb13b6..e7ae6b0 100644
>--- a/lib/libteam.c
>+++ b/lib/libteam.c
>@@ -1331,7 +1331,7 @@ int team_port_add(struct team_handle *th, uint32_t port_ifindex)
> {
> 	int err;
> 
>-	err = rtnl_link_enslave_ifindex(th->nl_cli.sock, th->ifindex,
>+	err = rtnl_link_bond_enslave_ifindex(th->nl_cli.sock, th->ifindex,
> 					port_ifindex);
> 	return -nl2syserr(err);
> }
>@@ -1350,6 +1350,6 @@ int team_port_remove(struct team_handle *th, uint32_t port_ifindex)
> {
> 	int err;
> 
>-	err = rtnl_link_release_ifindex(th->nl_cli.sock, port_ifindex);
>+	err = rtnl_link_bond_release_ifindex(th->nl_cli.sock, port_ifindex);
> 	return -nl2syserr(err);
> }
>-- 
>1.7.6
>

* [PATCH V5 net-next] neigh: new unresolved queue limits
From: Eric Dumazet @ 2011-11-09 22:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111109.162137.808999062815992591.davem@davemloft.net>

On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
> 
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> > 
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used or multiple
> >> sessions start in parallel. Even a single TCP flow can hit this limit.
> >  ...
> > 
> > Ok, I've applied this, let's see what happens :-)
> 
> Early answer, build fails.
> 
> Please test build this patch with DECNET enabled and resubmit.  The
> decnet neigh layer still refers to the removed ->queue_len member.
> 
> Thanks.

Ouch, this was fixed on one machine yesterday, but not on the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits

unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used or multiple
sessions start in parallel. Even a single TCP flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms

--- 192.168.20.108 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.322/0.322/0.322/0.000 ms

Increasing unres_qlen can be dangerous, since an attacker might try to
fill many queues with many packets and consume all memory.

Switch to a byte limit (limiting the total truesize of queued skbs), and
allow a default limit of 64 Kbytes per unresolved neighbour. This new
limit seems big, but since a single packet can consume 64 Kbytes, it
actually reduces the memory window offered to attackers (the old limit
of 3 packets could pin up to about 3 x 64 Kbytes).

unres_qlen is kept for compatibility, but is internally converted
to/from the byte limit.
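The byte-limited queue policy described above (drop the oldest queued packets until the new one fits, then queue it, even if it alone exceeds the limit) can be sketched in userspace as a small ring of truesize values. The struct pending_queue, pq_enqueue() and the fixed QCAP capacity are hypothetical stand-ins for the neighbour's arp_queue, arp_queue_len_bytes and the unres_discards counter; only sizes are tracked, not real skbs.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64 /* hypothetical fixed ring capacity for the demo */

struct pending_queue {
	unsigned int truesize[QCAP]; /* truesize of each queued packet */
	size_t head, count;
	unsigned int bytes;          /* like arp_queue_len_bytes */
	unsigned int limit_bytes;    /* like parms->queue_len_bytes */
	unsigned long discards;      /* like the unres_discards statistic */
};

/* Queue a packet of the given truesize, evicting the oldest first. */
static void pq_enqueue(struct pending_queue *q, unsigned int truesize)
{
	while (q->bytes + truesize > q->limit_bytes && q->count > 0) {
		q->bytes -= q->truesize[q->head];
		q->head = (q->head + 1) % QCAP;
		q->count--;
		q->discards++;
	}
	/* As in the patch, an empty queue still accepts one oversized packet. */
	assert(q->count < QCAP);
	q->truesize[(q->head + q->count) % QCAP] = truesize;
	q->count++;
	q->bytes += truesize;
}
```

With a 65536-byte limit, three 30000-byte packets leave two queued (60000 bytes) and one discard.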

# cd /proc/sys/net/ipv4/neigh/default/
# grep . unres_qlen*
unres_qlen:31
unres_qlen_bytes:65536
# echo 10 >unres_qlen
# grep . unres_qlen*
unres_qlen:10
unres_qlen_bytes:21540
# echo 30000 >unres_qlen_bytes
# grep . unres_qlen*
unres_qlen:14
unres_qlen_bytes:30000
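The unres_qlen values shown above follow from a round-up division of the byte limit by SKB_TRUESIZE(ETH_FRAME_LEN) when reading, and a multiplication when writing. This sketch assumes that truesize is 2154 bytes, which is what the listed numbers imply (10 packets become 21540 bytes); the real value depends on architecture and kernel configuration. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed SKB_TRUESIZE(ETH_FRAME_LEN), inferred from the output above. */
#define ETH_FRAME_TRUESIZE 2154

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Reading unres_qlen: byte limit rounded up to whole frames. */
static int qlen_from_bytes(int bytes)
{
	assert(bytes >= 0);
	return DIV_ROUND_UP(bytes, ETH_FRAME_TRUESIZE);
}

/* Writing unres_qlen: frame count scaled to a byte limit. */
static int bytes_from_qlen(int qlen)
{
	assert(qlen >= 0);
	return qlen * ETH_FRAME_TRUESIZE;
}
```

Because of the rounding, writing 30000 to unres_qlen_bytes reads back as unres_qlen 14, matching the transcript.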

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
V5: decnet compile error fix

 Documentation/networking/ip-sysctl.txt |   10 +
 include/linux/neighbour.h              |    1 
 include/net/neighbour.h                |    3 
 net/atm/clip.c                         |    2 
 net/core/neighbour.c                   |  162 +++++++++++++++--------
 net/decnet/dn_neigh.c                  |    2 
 net/ipv4/arp.c                         |    2 
 net/ipv6/ndisc.c                       |    2 
 8 files changed, 128 insertions(+), 56 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index f049a1c..b886706 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -31,6 +31,16 @@ neigh/default/gc_thresh3 - INTEGER
 	when using large numbers of interfaces and when communicating
 	with large numbers of directly-connected peers.
 
+neigh/default/unres_qlen_bytes - INTEGER
+	The maximum number of bytes which may be used by packets
+	queued for each	unresolved address by other network layers.
+	(added in linux 3.3)
+
+neigh/default/unres_qlen - INTEGER
+	The maximum number of packets which may be queued for each
+	unresolved address by other network layers.
+	(deprecated in linux 3.3) : use unres_qlen_bytes instead.
+
 mtu_expires - INTEGER
 	Time, in seconds, that cached PMTU information is kept.
 
diff --git a/include/linux/neighbour.h b/include/linux/neighbour.h
index a7003b7..b188f68 100644
--- a/include/linux/neighbour.h
+++ b/include/linux/neighbour.h
@@ -116,6 +116,7 @@ enum {
 	NDTPA_PROXY_DELAY,		/* u64, msecs */
 	NDTPA_PROXY_QLEN,		/* u32 */
 	NDTPA_LOCKTIME,			/* u64, msecs */
+	NDTPA_QUEUE_LENBYTES,		/* u32 */
 	__NDTPA_MAX
 };
 #define NDTPA_MAX (__NDTPA_MAX - 1)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 2720884..7ae5acf 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -59,7 +59,7 @@ struct neigh_parms {
 	int	reachable_time;
 	int	delay_probe_time;
 
-	int	queue_len;
+	int	queue_len_bytes;
 	int	ucast_probes;
 	int	app_probes;
 	int	mcast_probes;
@@ -99,6 +99,7 @@ struct neighbour {
 	rwlock_t		lock;
 	atomic_t		refcnt;
 	struct sk_buff_head	arp_queue;
+	unsigned int		arp_queue_len_bytes;
 	struct timer_list	timer;
 	unsigned long		used;
 	atomic_t		probes;
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 8523940..32c41b8 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -329,7 +329,7 @@ static struct neigh_table clip_tbl = {
 		.gc_staletime 		= 60 * HZ,
 		.reachable_time 	= 30 * HZ,
 		.delay_probe_time 	= 5 * HZ,
-		.queue_len 		= 3,
+		.queue_len_bytes 	= 64 * 1024,
 		.ucast_probes 		= 3,
 		.mcast_probes 		= 3,
 		.anycast_delay 		= 1 * HZ,
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 039d51e..2684794 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -238,6 +238,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 				   it to safe state.
 				 */
 				skb_queue_purge(&n->arp_queue);
+				n->arp_queue_len_bytes = 0;
 				n->output = neigh_blackhole;
 				if (n->nud_state & NUD_VALID)
 					n->nud_state = NUD_NOARP;
@@ -702,6 +703,7 @@ void neigh_destroy(struct neighbour *neigh)
 		printk(KERN_WARNING "Impossible event.\n");
 
 	skb_queue_purge(&neigh->arp_queue);
+	neigh->arp_queue_len_bytes = 0;
 
 	dev_put(neigh->dev);
 	neigh_parms_put(neigh->parms);
@@ -842,6 +844,7 @@ static void neigh_invalidate(struct neighbour *neigh)
 		write_lock(&neigh->lock);
 	}
 	skb_queue_purge(&neigh->arp_queue);
+	neigh->arp_queue_len_bytes = 0;
 }
 
 static void neigh_probe(struct neighbour *neigh)
@@ -980,15 +983,20 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 
 	if (neigh->nud_state == NUD_INCOMPLETE) {
 		if (skb) {
-			if (skb_queue_len(&neigh->arp_queue) >=
-			    neigh->parms->queue_len) {
+			while (neigh->arp_queue_len_bytes + skb->truesize >
+			       neigh->parms->queue_len_bytes) {
 				struct sk_buff *buff;
+
 				buff = __skb_dequeue(&neigh->arp_queue);
+				if (!buff)
+					break;
+				neigh->arp_queue_len_bytes -= buff->truesize;
 				kfree_skb(buff);
 				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
 			}
 			skb_dst_force(skb);
 			__skb_queue_tail(&neigh->arp_queue, skb);
+			neigh->arp_queue_len_bytes += skb->truesize;
 		}
 		rc = 1;
 	}
@@ -1175,6 +1183,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 			write_lock_bh(&neigh->lock);
 		}
 		skb_queue_purge(&neigh->arp_queue);
+		neigh->arp_queue_len_bytes = 0;
 	}
 out:
 	if (update_isrouter) {
@@ -1747,7 +1756,11 @@ static int neightbl_fill_parms(struct sk_buff *skb, struct neigh_parms *parms)
 		NLA_PUT_U32(skb, NDTPA_IFINDEX, parms->dev->ifindex);
 
 	NLA_PUT_U32(skb, NDTPA_REFCNT, atomic_read(&parms->refcnt));
-	NLA_PUT_U32(skb, NDTPA_QUEUE_LEN, parms->queue_len);
+	NLA_PUT_U32(skb, NDTPA_QUEUE_LENBYTES, parms->queue_len_bytes);
+	/* approximative value for deprecated QUEUE_LEN (in packets) */
+	NLA_PUT_U32(skb, NDTPA_QUEUE_LEN,
+		    DIV_ROUND_UP(parms->queue_len_bytes,
+				 SKB_TRUESIZE(ETH_FRAME_LEN)));
 	NLA_PUT_U32(skb, NDTPA_PROXY_QLEN, parms->proxy_qlen);
 	NLA_PUT_U32(skb, NDTPA_APP_PROBES, parms->app_probes);
 	NLA_PUT_U32(skb, NDTPA_UCAST_PROBES, parms->ucast_probes);
@@ -1974,7 +1987,11 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 
 			switch (i) {
 			case NDTPA_QUEUE_LEN:
-				p->queue_len = nla_get_u32(tbp[i]);
+				p->queue_len_bytes = nla_get_u32(tbp[i]) *
+						     SKB_TRUESIZE(ETH_FRAME_LEN);
+				break;
+			case NDTPA_QUEUE_LENBYTES:
+				p->queue_len_bytes = nla_get_u32(tbp[i]);
 				break;
 			case NDTPA_PROXY_QLEN:
 				p->proxy_qlen = nla_get_u32(tbp[i]);
@@ -2635,117 +2652,158 @@ EXPORT_SYMBOL(neigh_app_ns);
 
 #ifdef CONFIG_SYSCTL
 
-#define NEIGH_VARS_MAX 19
+static int proc_unres_qlen(ctl_table *ctl, int write, void __user *buffer,
+			   size_t *lenp, loff_t *ppos)
+{
+	int size, ret;
+	ctl_table tmp = *ctl;
+
+	tmp.data = &size;
+	size = DIV_ROUND_UP(*(int *)ctl->data, SKB_TRUESIZE(ETH_FRAME_LEN));
+	ret = proc_dointvec(&tmp, write, buffer, lenp, ppos);
+	if (write && !ret)
+		*(int *)ctl->data = size * SKB_TRUESIZE(ETH_FRAME_LEN);
+	return ret;
+}
+
+enum {
+	NEIGH_VAR_MCAST_PROBE,
+	NEIGH_VAR_UCAST_PROBE,
+	NEIGH_VAR_APP_PROBE,
+	NEIGH_VAR_RETRANS_TIME,
+	NEIGH_VAR_BASE_REACHABLE_TIME,
+	NEIGH_VAR_DELAY_PROBE_TIME,
+	NEIGH_VAR_GC_STALETIME,
+	NEIGH_VAR_QUEUE_LEN,
+	NEIGH_VAR_QUEUE_LEN_BYTES,
+	NEIGH_VAR_PROXY_QLEN,
+	NEIGH_VAR_ANYCAST_DELAY,
+	NEIGH_VAR_PROXY_DELAY,
+	NEIGH_VAR_LOCKTIME,
+	NEIGH_VAR_RETRANS_TIME_MS,
+	NEIGH_VAR_BASE_REACHABLE_TIME_MS,
+	NEIGH_VAR_GC_INTERVAL,
+	NEIGH_VAR_GC_THRESH1,
+	NEIGH_VAR_GC_THRESH2,
+	NEIGH_VAR_GC_THRESH3,
+	NEIGH_VAR_MAX
+};
 
 static struct neigh_sysctl_table {
 	struct ctl_table_header *sysctl_header;
-	struct ctl_table neigh_vars[NEIGH_VARS_MAX];
+	struct ctl_table neigh_vars[NEIGH_VAR_MAX + 1];
 	char *dev_name;
 } neigh_sysctl_template __read_mostly = {
 	.neigh_vars = {
-		{
+		[NEIGH_VAR_MCAST_PROBE] = {
 			.procname	= "mcast_solicit",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_UCAST_PROBE] = {
 			.procname	= "ucast_solicit",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_APP_PROBE] = {
 			.procname	= "app_solicit",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_RETRANS_TIME] = {
 			.procname	= "retrans_time",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_userhz_jiffies,
 		},
-		{
+		[NEIGH_VAR_BASE_REACHABLE_TIME] = {
 			.procname	= "base_reachable_time",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_jiffies,
 		},
-		{
+		[NEIGH_VAR_DELAY_PROBE_TIME] = {
 			.procname	= "delay_first_probe_time",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_jiffies,
 		},
-		{
+		[NEIGH_VAR_GC_STALETIME] = {
 			.procname	= "gc_stale_time",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_jiffies,
 		},
-		{
+		[NEIGH_VAR_QUEUE_LEN] = {
 			.procname	= "unres_qlen",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
+			.proc_handler	= proc_unres_qlen,
+		},
+		[NEIGH_VAR_QUEUE_LEN_BYTES] = {
+			.procname	= "unres_qlen_bytes",
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_PROXY_QLEN] = {
 			.procname	= "proxy_qlen",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_ANYCAST_DELAY] = {
 			.procname	= "anycast_delay",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_userhz_jiffies,
 		},
-		{
+		[NEIGH_VAR_PROXY_DELAY] = {
 			.procname	= "proxy_delay",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_userhz_jiffies,
 		},
-		{
+		[NEIGH_VAR_LOCKTIME] = {
 			.procname	= "locktime",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_userhz_jiffies,
 		},
-		{
+		[NEIGH_VAR_RETRANS_TIME_MS] = {
 			.procname	= "retrans_time_ms",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_ms_jiffies,
 		},
-		{
+		[NEIGH_VAR_BASE_REACHABLE_TIME_MS] = {
 			.procname	= "base_reachable_time_ms",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_ms_jiffies,
 		},
-		{
+		[NEIGH_VAR_GC_INTERVAL] = {
 			.procname	= "gc_interval",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec_jiffies,
 		},
-		{
+		[NEIGH_VAR_GC_THRESH1] = {
 			.procname	= "gc_thresh1",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_GC_THRESH2] = {
 			.procname	= "gc_thresh2",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
 			.proc_handler	= proc_dointvec,
 		},
-		{
+		[NEIGH_VAR_GC_THRESH3] = {
 			.procname	= "gc_thresh3",
 			.maxlen		= sizeof(int),
 			.mode		= 0644,
@@ -2778,47 +2836,49 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	if (!t)
 		goto err;
 
-	t->neigh_vars[0].data  = &p->mcast_probes;
-	t->neigh_vars[1].data  = &p->ucast_probes;
-	t->neigh_vars[2].data  = &p->app_probes;
-	t->neigh_vars[3].data  = &p->retrans_time;
-	t->neigh_vars[4].data  = &p->base_reachable_time;
-	t->neigh_vars[5].data  = &p->delay_probe_time;
-	t->neigh_vars[6].data  = &p->gc_staletime;
-	t->neigh_vars[7].data  = &p->queue_len;
-	t->neigh_vars[8].data  = &p->proxy_qlen;
-	t->neigh_vars[9].data  = &p->anycast_delay;
-	t->neigh_vars[10].data = &p->proxy_delay;
-	t->neigh_vars[11].data = &p->locktime;
-	t->neigh_vars[12].data  = &p->retrans_time;
-	t->neigh_vars[13].data  = &p->base_reachable_time;
+	t->neigh_vars[NEIGH_VAR_MCAST_PROBE].data  = &p->mcast_probes;
+	t->neigh_vars[NEIGH_VAR_UCAST_PROBE].data  = &p->ucast_probes;
+	t->neigh_vars[NEIGH_VAR_APP_PROBE].data  = &p->app_probes;
+	t->neigh_vars[NEIGH_VAR_RETRANS_TIME].data  = &p->retrans_time;
+	t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].data  = &p->base_reachable_time;
+	t->neigh_vars[NEIGH_VAR_DELAY_PROBE_TIME].data  = &p->delay_probe_time;
+	t->neigh_vars[NEIGH_VAR_GC_STALETIME].data  = &p->gc_staletime;
+	t->neigh_vars[NEIGH_VAR_QUEUE_LEN].data  = &p->queue_len_bytes;
+	t->neigh_vars[NEIGH_VAR_QUEUE_LEN_BYTES].data  = &p->queue_len_bytes;
+	t->neigh_vars[NEIGH_VAR_PROXY_QLEN].data  = &p->proxy_qlen;
+	t->neigh_vars[NEIGH_VAR_ANYCAST_DELAY].data  = &p->anycast_delay;
+	t->neigh_vars[NEIGH_VAR_PROXY_DELAY].data = &p->proxy_delay;
+	t->neigh_vars[NEIGH_VAR_LOCKTIME].data = &p->locktime;
+	t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].data  = &p->retrans_time;
+	t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].data  = &p->base_reachable_time;
 
 	if (dev) {
 		dev_name_source = dev->name;
 		/* Terminate the table early */
-		memset(&t->neigh_vars[14], 0, sizeof(t->neigh_vars[14]));
+		memset(&t->neigh_vars[NEIGH_VAR_GC_INTERVAL], 0,
+		       sizeof(t->neigh_vars[NEIGH_VAR_GC_INTERVAL]));
 	} else {
 		dev_name_source = neigh_path[NEIGH_CTL_PATH_DEV].procname;
-		t->neigh_vars[14].data = (int *)(p + 1);
-		t->neigh_vars[15].data = (int *)(p + 1) + 1;
-		t->neigh_vars[16].data = (int *)(p + 1) + 2;
-		t->neigh_vars[17].data = (int *)(p + 1) + 3;
+		t->neigh_vars[NEIGH_VAR_GC_INTERVAL].data = (int *)(p + 1);
+		t->neigh_vars[NEIGH_VAR_GC_THRESH1].data = (int *)(p + 1) + 1;
+		t->neigh_vars[NEIGH_VAR_GC_THRESH2].data = (int *)(p + 1) + 2;
+		t->neigh_vars[NEIGH_VAR_GC_THRESH3].data = (int *)(p + 1) + 3;
 	}
 
 
 	if (handler) {
 		/* RetransTime */
-		t->neigh_vars[3].proc_handler = handler;
-		t->neigh_vars[3].extra1 = dev;
+		t->neigh_vars[NEIGH_VAR_RETRANS_TIME].proc_handler = handler;
+		t->neigh_vars[NEIGH_VAR_RETRANS_TIME].extra1 = dev;
 		/* ReachableTime */
-		t->neigh_vars[4].proc_handler = handler;
-		t->neigh_vars[4].extra1 = dev;
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].proc_handler = handler;
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME].extra1 = dev;
 		/* RetransTime (in milliseconds)*/
-		t->neigh_vars[12].proc_handler = handler;
-		t->neigh_vars[12].extra1 = dev;
+		t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].proc_handler = handler;
+		t->neigh_vars[NEIGH_VAR_RETRANS_TIME_MS].extra1 = dev;
 		/* ReachableTime (in milliseconds) */
-		t->neigh_vars[13].proc_handler = handler;
-		t->neigh_vars[13].extra1 = dev;
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].proc_handler = handler;
+		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].extra1 = dev;
 	}
 
 	t->dev_name = kstrdup(dev_name_source, GFP_KERNEL);
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 7f0eb08..9e73aa1 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -107,7 +107,7 @@ struct neigh_table dn_neigh_table = {
 		.gc_staletime =	60 * HZ,
 		.reachable_time =		30 * HZ,
 		.delay_probe_time =	5 * HZ,
-		.queue_len =		3,
+		.queue_len_bytes =	64*1024,
 		.ucast_probes =	0,
 		.app_probes =		0,
 		.mcast_probes =	0,
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 96a164a..d732827 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -177,7 +177,7 @@ struct neigh_table arp_tbl = {
 		.gc_staletime		= 60 * HZ,
 		.reachable_time		= 30 * HZ,
 		.delay_probe_time	= 5 * HZ,
-		.queue_len		= 3,
+		.queue_len_bytes	= 64*1024,
 		.ucast_probes		= 3,
 		.mcast_probes		= 3,
 		.anycast_delay		= 1 * HZ,
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 44e5b7f..4a20982 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -141,7 +141,7 @@ struct neigh_table nd_tbl = {
 		.gc_staletime		= 60 * HZ,
 		.reachable_time		= ND_REACHABLE_TIME,
 		.delay_probe_time	= 5 * HZ,
-		.queue_len		= 3,
+		.queue_len_bytes	= 64*1024,
 		.ucast_probes		= 3,
 		.mcast_probes		= 3,
 		.anycast_delay		= 1 * HZ,

* [PATCH net-next v1 0/9] forcedeth: stats & debug enhancements
From: David Decotigny @ 2011-11-09 22:09 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: David S. Miller, Ian Campbell, Eric Dumazet, Jeff Kirsher,
	Ben Hutchings, David Decotigny

These changes implement the ndo_get_stats64 API and add a few more
stats and debugging features for forcedeth. They also ensure that
stats updates are correct on SMP systems, both 32-bit and 64-bit.

Regarding the "implement ndo_get_stats64() API" patch, I'm not sure
I'm protecting the 64-bit stats the right way. Ideally, I would
like them to be non-blocking (u64_stats_sync.h), but as there are
several sources for updates, I don't think I can do without locking or
per-CPU stats. Would per-CPU stats be better here (note: I expect the
contention on netdev_priv(dev)->stats_lock to be _VERY_ low)?
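The per-CPU alternative raised in the question can be sketched as one counter set per CPU, each with a single writer, folded together only when stats are read (as a get_stats64-style callback would). This single-threaded userspace sketch uses a hypothetical NR_SLOTS in place of real per-CPU data and omits the per-slot synchronization (for example u64_stats_sync) that a kernel version would still need for consistent 64-bit reads on 32-bit hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_SLOTS 4 /* hypothetical stand-in for the number of CPUs */

struct pcpu_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct dev_stats {
	struct pcpu_stats slot[NR_SLOTS]; /* one writer per slot: no lock */
};

/* Fast path: update only the current CPU's slot. */
static void stats_rx(struct dev_stats *s, int cpu, uint64_t bytes)
{
	assert(cpu >= 0 && cpu < NR_SLOTS);
	s->slot[cpu].rx_packets++;
	s->slot[cpu].rx_bytes += bytes;
}

/* Slow path: sum all slots into one total when stats are queried. */
static struct pcpu_stats stats_fold(const struct dev_stats *s)
{
	struct pcpu_stats total;
	int i;

	memset(&total, 0, sizeof(total));
	for (i = 0; i < NR_SLOTS; i++) {
		total.rx_packets += s->slot[i].rx_packets;
		total.rx_bytes += s->slot[i].rx_bytes;
	}
	return total;
}
```

The trade-off is cheap, contention-free updates at the cost of a loop over all CPUs on every read.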

Tested:
  ~150Mbps incoming TCP, ethtool -S in a loop, x86_64 16-way:
     tx_bytes: 1413863329
     rx_packets: 38918872
     tx_packets: 19828148
     rx_bytes: 57818685991

############################################
# Patch Set Summary:

David Decotigny (6):
  forcedeth: expose module parameters in /sys/module
  forcedeth: stats for rx_packets based on hardware registers
  forcedeth: implement ndo_get_stats64() API
  forcedeth: account for dropped RX frames
  forcedeth: stats updated with a deferrable timer
  forcedeth: whitespace/indentation fixes

Mike Ditto (1):
  forcedeth: Add messages to indicate using MSI or MSI-X

Sameer Nanda (2):
  forcedeth: allow to silence "TX timeout" debug messages
  forcedeth: new ethtool stat counter for TX timeouts

 drivers/net/ethernet/nvidia/forcedeth.c |  271 +++++++++++++++++++++----------
 1 files changed, 184 insertions(+), 87 deletions(-)

-- 
1.7.3.1
