Netdev List

* Re: [PATCH] 3c505: Fix compile breakage
From: David Miller @ 2011-10-31 21:57 UTC
  To: joe; +Cc: bharrosh, jbaron, philb, linux-kernel, netdev, randy.dunlap, sfr
In-Reply-To: <1320097524.4399.8.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Mon, 31 Oct 2011 23:45:24 +0200

> The joys of preprocessor games with c90 named initializers.
> 
> commit 07613b0b5ef8
> ("dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions")
> uses a ".filename" named initializer.
> 
> When filename is also a #define this fails to compile.
> 
> Remove #define filename from 3c505.c
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Andrew Morton submitted a fix for this earlier today.

http://marc.info/?l=linux-netdev&m=132009089027792&w=2

* [PATCH] [PATCH] HFSC (7) & (8) doc... (fixups)
From: Michal Soltys @ 2011-10-31 21:56 UTC
  To: vapier; +Cc: stephen.hemminger, netdev
In-Reply-To: <4EAF1904.8060307@ziu.info>

A few minor changes and small additions.
---
 man/man7/tc-hfsc.7 |   94 ++++++++++++++++++++++++++++++++++++----------------
 man/man8/tc-hfsc.8 |    2 +-
 man/man8/tc-stab.8 |   49 +++++++++++++++-----------
 3 files changed, 94 insertions(+), 51 deletions(-)

diff --git a/man/man7/tc-hfsc.7 b/man/man7/tc-hfsc.7
index bcdea7b..9a9d85a 100644
--- a/man/man7/tc-hfsc.7
+++ b/man/man7/tc-hfsc.7
@@ -1,4 +1,4 @@
-.TH HFSC 7 "25 February 2009" iproute2 Linux
+.TH HFSC 7 "31 October 2011" iproute2 Linux
 .ce 1
 \fBHIERARCHICAL FAIR SERVICE CURVE\fR
 .
@@ -158,7 +158,7 @@ curve.
 .IP "V()"
 In linkshare criterion, arbitrates which packet to send next. Note that V() is
 function of a virtual time \- see \fBLINKSHARE CRITERION\fR section for
-details.  Virtual time \&'vt' corresponds to packets' heads
+details. Virtual time \&'vt' corresponds to packets' heads
 (vt\~=\~V^(\-1)(w)). Based on LS service curve.
 .IP "F()"
 An extension to linkshare criterion, used to limit at which speed linkshare
@@ -187,12 +187,12 @@ Interface 10mbit, two classes, both with two\-piece linear service curves:
 .PP
 Assume for a moment, that we only use D() for both finding eligible packets,
 and choosing the most fitting one, thus eligible time would be computed as
-D^(\-1)(w) and deadline time would be computed as D^(\-1)(w+l).  If the 2nd
+D^(\-1)(w) and deadline time would be computed as D^(\-1)(w+l). If the 2nd
 class starts sending packets 1 second after the 1st class, it's of course
 impossible to guarantee 14mbit, as the interface capability is only 10mbit.
 The only workaround in this scenario is to allow the 1st class to send the
 packets earlier that would normally be allowed. That's where separate E() comes
-to help.  Putting all the math aside (see HFSC paper for details), E() for RT
+to help. Putting all the math aside (see HFSC paper for details), E() for RT
 concave service curve is just like D(), but for the RT convex service curve \-
 it's constructed using \fIonly\fR RT service curve's 2nd slope (in our example
 \- 7mbit).
@@ -255,7 +255,7 @@ Such approach has its price though. The problem is analogous to what was
 presented in previous section and is caused by non\-linearity of service
 curves:
 .IP 1) 4
-either it's impossible to guarantee both service curves and satisfy fairness
+either it's impossible to guarantee service curves and satisfy fairness
 during certain time periods:
 
 .RS 4
@@ -278,40 +278,40 @@ beyond of what the interface is capable of.
 .RE
 
 .IP 2) 4
-and/or it's impossible to guarantee service curves of all classes at all
+and/or it's impossible to guarantee service curves of all classes at the same
+time [fairly or not]:
 
 .RS 4
-Even if we didn't use virtual time and allowed a session to be "punished",
-there's a possibility that service curves of all classes couldn't be
-guaranteed for a brief period. Consider following, a bit more complicated
-example:
-
-Root interface, classes A and B with concave and convex curve (summing up to
-root), A1 & A2 (children of A), \fIboth\fR with concave curves summing up to A,
-B1 & B2 (children of B), \fIboth\fR with convex curves summing up to B.
 
-Assume that A2, B1 and B2 are constantly backlogged, and at some later point
-A1 becomes backlogged. We can easily choose slopes, so that even if we
-"punish" A2 for earlier excess bandwidth received, A1 will have no chance of
-getting bandwidth corresponding to its first slope. Following from the above
-example:
+This is similar to the above case, but a bit more subtle. We will consider two
+subtrees, arbitrated by their common (root here) parent:
 
 .nf
+R (root) \- 10mbit
+
 A  \- 7mbit, then 3mbit
 A1 \- 5mbit, then 2mbit
 A2 \- 2mbit, then 1mbit
 
 B  \- 3mbit, then 7mbit
-B1 \- 2mbit, then 5mbit
-B2 \- 1mbit, then 2mbit
 .fi
 
-At the point when A1 starts sending, it should get 5mbit to not violate its
-service curve. A2 gets punished and doesn't send at all, B1 and B2 both keep
-sending at their 5mbit and 2mbit. But as you can see, we already are beyond
-interface's capacity \- at 12mbit. A1 could get 3mbit at most. If we used
-virtual times and kept fairness property, A1 and A2 would send at 3mbit
-together with 5:2 ratio (so respectively at ~2.14mbit and ~0.86mbit).
+R arbitrates between left subtree (A) and right (B). Assume that A2 and B are
+constantly backlogged, and at some later point A1 becomes backlogged (when all
+other classes are in their 2nd linear part).
+
+What happens now? B (choice made by R) will \fIalways\fR get 7mbit as R is
+only (obviously) concerned with the ratio between its direct children. Thus A
+subtree gets 3mbit, but its children would want (at the point when A1 became
+backlogged) 5mbit + 1mbit. That's of course impossible, as they can only get
+3mbit due to interface limitation.
+
+In the left subtree \- we have the same situation as previously (fair split
+between A1 and A2, but violated guarantees), but in the whole tree \- there's
+no fairness (B got 7mbit, but A1 and A2 have to fit together in 3mbit) and
+there's no guarantees for all classes (only B got what it wanted). Even if we
+violated fairness in the A subtree and set A2's service curve to 0, A1 would
+still not get the required bandwidth.
 .RE
 .
 .SH "UPPERLIMIT CRITERION"
@@ -416,6 +416,19 @@ In the other words - LS criterion is meaningless in the above example.
 You can quickly "workaround" it by making sure each leaf class has RT service
 curve assigned (thus guaranteeing all of them will get some bandwidth), but it
 doesn't make it any more valid.
+
+Keep in mind - if you use nonlinear curves and irregularities explained above
+happen \fIonly\fR in the first segment, then there's little wrong with
+"overusing" RT curve a bit:
+
+.nf
+A \- ls 5.0mbit, rt 9mbit/30ms, then 1mbit
+B \- ls 2.5mbit
+C \- ls 2.5mbit
+.fi
+
+Here, the vt of A will "spike" in the initial period, but then A will never get more
+than 1mbit, until B & C catch up. Then everything will be back to normal.
 .
 .SH "LINUX AND TIMER RESOLUTION"
 .
@@ -434,7 +447,7 @@ If you have \&'tickless system' enabled, then the timer interrupt will trigger
 as slowly as possible, but each time a scheduler throttles itself (or any
 other part of the kernel needs better accuracy), the rate will be increased as
 needed / possible. The ceiling is either \&'timer frequency' if \&'high
-resolution timer support' is not available or not compiled in. Otherwise it's
+resolution timer support' is not available or not compiled in, or it's
 hardware dependent and can go \fIfar\fR beyond the highest \&'timer frequency'
 setting available.
 
@@ -458,7 +471,7 @@ tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10mbit
 
 Assuming packet of ~1KB size and HZ=100, that averages to ~0.8mbit \- anything
 beyond it (e.g. the above example with specified rate over 10x bigger) will
-require appropriate queuing and cause bursts every ~10 ms.  As you can
+require appropriate queuing and cause bursts every ~10 ms. As you can
 imagine, any HFSC's RT guarantees will be seriously invalidated by that.
 Aforementioned example is mainly important if you deal with old hardware \- as
 it's particularly popular for home server chores. Even then, you can easily
@@ -510,6 +523,29 @@ curve there, and in such scenario HFSC simply doesn't throttle at all.
 So, in rare case you need those speeds with only RT service curve, or with UL
 service curve \- remember about drawbacks.
 .
+.SH "CAVEAT: RANDOM ONLINE EXAMPLES"
+.
+For reasons unknown (though well guessed), many examples you can google love to
+overuse UL criterion and stuff it in every node possible. This makes no sense
+and works against what HFSC tries to do (and does pretty damn well). Use UL
+where it makes sense - on the uppermost node to match upstream router's uplink
+capacity. Or - in special cases, such as testing (limit certain subtree to some
+speed) or customers that must never get more than certain speed. In the last
+case you can usually achieve the same by just using RT criterion without LS+UL
+on leaf nodes.
+
+As for router case - remember it's good to differentiate between "traffic to
+router" (remote console, web config, etc.) and "outgoing traffic", so for
+example:
+
+.nf
+tc qdisc add dev eth0 root handle 1:0 hfsc default 0x8002
+tc class add dev eth0 parent 1:0 classid 1:999 hfsc rt m2 50mbit
+tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 2mbit ul m2 2mbit
+.fi
+
+\&... so "internet" tree under 1:1 and "router itself" as 1:999
+.
 .SH "LAYER2 ADAPTATION"
 .
 Please refer to \fBtc\-stab\fR(8)
diff --git a/man/man8/tc-hfsc.8 b/man/man8/tc-hfsc.8
index 22018c0..c5ff331 100644
--- a/man/man8/tc-hfsc.8
+++ b/man/man8/tc-hfsc.8
@@ -1,4 +1,4 @@
-.TH HFSC 8 "25 February 2009" iproute2 Linux
+.TH HFSC 8 "31 October 2011" iproute2 Linux
 .
 .SH NAME
 HFSC \- Hierarchical Fair Service Curve's control under linux
diff --git a/man/man8/tc-stab.8 b/man/man8/tc-stab.8
index 1442a69..522ea00 100644
--- a/man/man8/tc-stab.8
+++ b/man/man8/tc-stab.8
@@ -1,4 +1,4 @@
-.TH STAB 8 "25 February 2009" iproute2 Linux
+.TH STAB 8 "31 October 2011" iproute2 Linux
 .
 .SH NAME
 tc\-stab \- Generic size table manipulations
@@ -42,14 +42,14 @@ size is calculated only once \- when a qdisc enqueues the packet. Initial root
 enqueue initializes it to the real packet's size.
 
 Each qdisc can use different size table, but the adjusted size is stored in
-area shared by whole qdisc hierarchy attached to the interface (technically,
-it's stored in skb). The effect is, that if you have such setup, the last qdisc
-with a stab in a chain "wins". For example, consider HFSC with simple pfifo
-attached to one of its leaf classes. If that pfifo qdisc has stab defined, it
-will override lengths calculated during HFSC's enqueue, and in turn, whenever
-HFSC tries to dequeue a packet, it will use potentially invalid size in its
-calculations. Normal setups will usually include stab defined only on root
-qdisc, but further overriding gives extra flexibility for less usual setups.
+area shared by whole qdisc hierarchy attached to the interface. The effect is,
+that if you have such setup, the last qdisc with a stab in a chain "wins". For
+example, consider HFSC with simple pfifo attached to one of its leaf classes.
+If that pfifo qdisc has stab defined, it will override lengths calculated
+during HFSC's enqueue, and in turn, whenever HFSC tries to dequeue a packet, it
+will use potentially invalid size in its calculations. Normal setups will
+usually include stab defined only on root qdisc, but further overriding gives
+extra flexibility for less usual setups.
 
 Initial size table is calculated by \fBtc\fR tool using \fBmtu\fR and
 \fBtsize\fR parameters. The algorithm sets each slot's size to the smallest
@@ -59,18 +59,16 @@ table will usually support more than is required by \fBmtu\fR.
 
 For example, with \fBmtu\fR\~=\~1500 and \fBtsize\fR\~=\~128, a table with 128
 slots will be created, where slot 0 will correspond to sizes 0\-16, slot 1 to
-17\~\-\~32, \&..., slot 127 to 2033\~\-\~2048. Note, that the sizes
-are shifted 1 byte (normally you would expect 0\~\-\~15, 16\~\-\~31, \&...,
-2032\~\-\~2047). Sizes assigned to each slot depend on \fBlinklayer\fR parameter.
+17\~\-\~32, \&..., slot 127 to 2033\~\-\~2048. Sizes assigned to each slot
+depend on \fBlinklayer\fR parameter.
 
 Stab calculation is also safe for an unusual case, when a size assigned to a
 slot would be larger than 2^16\-1 (you will lose the accuracy though).
 
 During kernel part of packet size adjustment, \fBoverhead\fR will be added to
-original size, and after subtracting 1 (to land in the proper slot \- see above
-about shifting by 1 byte) slot will be calculated. If the size would cause
-overflow, more than 1 slot will be used to get the final size. It of course will
-affect accuracy, but it's only a guard against unusual situations.
+original size, and then slot will be calculated. If the size would cause
+overflow, more than 1 slot will be used to get the final size. It of course
+will affect accuracy, but it's only a guard against unusual situations.
 
 Currently there're two methods of creating values stored in the size table \-
 ethernet and atm (adsl):
@@ -82,8 +80,8 @@ This is basically 1\-1 mapping, so following our example from above
 and so on, up to slot 127 with 2048. Note, that \fBmpu\fR\~>\~0 must be
 specified, and slots that would get less than specified by \fBmpu\fR, will get
 \fBmpu\fR instead. If you don't specify \fBmpu\fR, the size table will not be
-created at all, although any \fBoverhead\fR value will be respected during
-calculations.
+created at all (it wouldn't make any difference), although any \fBoverhead\fR
+value will be respected during calculations.
 .IP "atm, adsl"
 .br
 ATM linklayer consists of 53 byte cells, where each of them provides 48 bytes
@@ -127,7 +125,7 @@ IPoA in LLC case requires SNAP, instead of LLC\-NLPID (see rfc2684) \- this is
 the reason, why it actually takes more space than PPPoA.
 .IP \(bu
 In rare cases, FCS might be preserved on protocols that include ethernet frame
-(Bridged and PPPoE).  In such situation, any ethernet specific padding
+(Bridged and PPPoE). In such situation, any ethernet specific padding
 guaranteeing 64 bytes long frame size has to be included as well (see rfc2684).
 In the other words, it also guarantees that any packet you send will take
 minimum 2 atm cells. You should set \fBmpu\fR accordingly for that.
@@ -136,11 +134,20 @@ When size table is consulted, and you're shaping traffic for the sake of
 another modem/router, ethernet header (without padding) will already be added
 to initial packet's length. You should compensate for that by subtracting 14
 from the above overheads in such case. If you're shaping directly on the router
-(for example, with speedtouch usb modem) using ppp daemon, layer2 header will
-not be added yet.
+(for example, with speedtouch usb modem) using ppp daemon, you're using raw ip
+interface without underlying layer2, so nothing will be added.
 
 For more thorough explanations, please see \fB[1]\fR and \fB[2]\fR.
 .
+.SH "ETHERNET CARDS CONSIDERATIONS"
+.
+It's often forgotten, that modern network cards (even cheap ones on desktop
+motherboards) and/or their drivers often support different offloading
+mechanisms. In context of traffic shaping, 'tso' and 'gso' might cause
+undesirable effects, due to massive tcp segments being considered during
+traffic shaping (including stab calculations). For slow uplink interfaces,
+it's good to use \fBethtool\fR to turn off offloading features.
+.
 .SH "SEE ALSO"
 .
 \fBtc\fR(8), \fBtc\-hfsc\fR(7), \fBtc\-hfsc\fR(8),
-- 
1.7.7.1
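
Two figures in the man page text above can be checked with simple arithmetic:
the ~0.8mbit ceiling for ~1KB packets at HZ=100 (tc-hfsc.7), and the claim
that a 64 byte frame takes at least 2 ATM cells of 53 bytes carrying 48 bytes
of payload each (tc-stab.8). The following is only a rough sketch under those
stated numbers; the function and constant names are hypothetical and are not
taken from tc or the kernel.

```c
#include <assert.h>

/* Hypothetical names; the sizes are the ones quoted in tc-stab.8. */
enum { DEMO_ATM_CELL_SIZE = 53, DEMO_ATM_CELL_PAYLOAD = 48 };

/*
 * Bits per second that can be sent if exactly one packet of
 * packet_bytes is released on every timer tick (hz ticks per second).
 */
unsigned long demo_rate_per_tick_bps(unsigned long packet_bytes,
				     unsigned long hz)
{
	assert(hz > 0);
	return packet_bytes * 8UL * hz;
}

/*
 * Bytes occupied on an ATM link by payload_bytes of data: round up to
 * whole cells, then count the full 53 bytes of every cell.
 */
unsigned int demo_atm_wire_bytes(unsigned int payload_bytes)
{
	unsigned int cells = (payload_bytes + DEMO_ATM_CELL_PAYLOAD - 1) /
			     DEMO_ATM_CELL_PAYLOAD;

	return cells * DEMO_ATM_CELL_SIZE;
}
```

With 1000 byte packets and hz = 100 the first function gives 800000 bits per
second, i.e. the ~0.8mbit mentioned in the text. A 64 byte frame passed to the
second function needs 2 cells, so 106 bytes on the wire, before any additional
per-protocol overhead is taken into account.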

* Re: [PATCH] HFSC (7) & (8) documentation + assorted changes
From: Michal Soltys @ 2011-10-31 21:54 UTC
  To: Mike Frysinger; +Cc: stephen.hemminger, netdev
In-Reply-To: <CAJaTeTom1yW_22CbTUa+6iMMpFsomnMhm6-WV0Uft+0FrzjqvQ@mail.gmail.com>

On 11-10-26 12:02, Mike Frysinger wrote:
>
> if you want to post updates to the content, i can take care of the
> *roff formatting.
> -mike
> --

I made a few minor adjustments / fixups on top of the patch you posted.
Though they are simple, please check that I didn't mess something up.

* [PATCH] 3c505: Fix compile breakage
From: Joe Perches @ 2011-10-31 21:45 UTC
  To: Boaz Harrosh, Jason Baron
  Cc: Philip Blundell, linux-kernel, netdev, Randy Dunlap,
	Stephen Rothwell
In-Reply-To: <4EAF0F34.4070702@panasas.com>

The joys of preprocessor games with c90 named initializers.

commit 07613b0b5ef8
("dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions")
uses a ".filename" named initializer.

When filename is also a #define this fails to compile.

Remove #define filename from 3c505.c
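
The clash can be illustrated outside the kernel. This is only a minimal
sketch: struct demo_desc and DEFINE_DEMO_DESC are hypothetical stand-ins for
struct _ddebug and its defining macro, and BREAK_THE_BUILD is an invented
switch. With the #define enabled, the ".filename" designator is rewritten by
the preprocessor into a "." followed by a string constant, which matches the
"expected identifier before string constant" errors in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct _ddebug; only the
 * member name "filename" matters for this illustration. */
struct demo_desc {
	const char *filename;
	const char *function;
	unsigned int lineno;
};

/*
 * Building with -DBREAK_THE_BUILD reproduces the class of error: the
 * designator ".filename" in the macro below is then expanded to "."
 * followed by a string literal, which is not a valid designator.
 */
#ifdef BREAK_THE_BUILD
#define filename __FILE__
#endif

/* Sketch of a descriptor-defining macro that relies on named initializers. */
#define DEFINE_DEMO_DESC(name)			\
	struct demo_desc name = {		\
		.filename = __FILE__,		\
		.function = __func__,		\
		.lineno = __LINE__,		\
	}

/* Returns a descriptor filled in at this call site. */
struct demo_desc make_demo_desc(void)
{
	DEFINE_DEMO_DESC(d);

	assert(d.filename != NULL);
	return d;
}
```

Using __FILE__ directly at the call sites, as the patch does, keeps the
identifier "filename" free for use as a member name.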

Signed-off-by: Joe Perches <joe@perches.com>

---

On Mon, 2011-10-31 at 14:12 -0700, Boaz Harrosh wrote:
> Doing a "make ARCH=i386 allmodconfig" on  linus/master  [f362f98] gives me the below
> compilation breakage.
> (Fedora_15_amd64 machine)
> It's probably old news but I thought I'd report it as part of my obligation
> as a Kernel monkey

Thanks Boaz.

Good monkey, <gives peanut>

 drivers/net/ethernet/i825xx/3c505.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/i825xx/3c505.c b/drivers/net/ethernet/i825xx/3c505.c
index 40e1a17..ba82a26 100644
--- a/drivers/net/ethernet/i825xx/3c505.c
+++ b/drivers/net/ethernet/i825xx/3c505.c
@@ -126,15 +126,13 @@
  *
  *********************************************************/
 
-#define filename __FILE__
-
 #define timeout_msg "*** timeout at %s:%s (line %d) ***\n"
 #define TIMEOUT_MSG(lineno) \
-	pr_notice(timeout_msg, filename, __func__, (lineno))
+	pr_notice(timeout_msg, __FILE__, __func__, (lineno))
 
 #define invalid_pcb_msg "*** invalid pcb length %d at %s:%s (line %d) ***\n"
 #define INVALID_PCB_MSG(len) \
-	pr_notice(invalid_pcb_msg, (len), filename, __func__, __LINE__)
+	pr_notice(invalid_pcb_msg, (len), __FILE__, __func__, __LINE__)
 
 #define search_msg "%s: Looking for 3c505 adapter at address %#x..."
 

* [patch 1/1] net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy
From: akpm @ 2011-10-31 21:33 UTC
  To: kaber; +Cc: davem, netdev, netfilter-devel, akpm, alex, stable, stable

From: Alex Bligh <alex@alex.org.uk>
Subject: net/netfilter/nf_conntrack_netlink.c: fix Oops on container destroy

Problem:

A repeatable Oops can be caused if a container with networking
unshared is destroyed when it has nf_conntrack entries yet to expire.

A copy of the oops follows below. A perl program generating the oops
repeatably is attached inline below.

Analysis:

The oops is called from cleanup_net when the namespace is
destroyed. conntrack iterates through outstanding events and calls
death_by_timeout on each of them, which in turn produces a call to
ctnetlink_conntrack_event. This calls nfnetlink_has_listeners, which
oopses because net->nfnl is NULL.

The perl program generates the container through fork() then
clone(NS_NEWNET). It does not explicitly set up netlink, but I presume
it was set up, else net->nfnl
would have been NULL earlier (i.e. when an earlier connection
timed out). This would thus suggest that net->nfnl is made NULL
during the destruction of the container, which I think is done by
nfnetlink_net_exit_batch.

I can see that the various subsystems are deinitialised in the opposite
order to which the relevant register_pernet_subsys calls are called,
and both nf_conntrack and nfnetlink_net_ops register their relevant
subsystems. If nfnetlink_net_ops registered later than nf_conntrack,
then its exit routine would have been called first, which would cause
the oops described. I am not sure there is anything to prevent this
happening in a container environment.

Whilst there's perhaps a more complex problem revolving around ordering
of subsystem deinit, it seems to me that missing a netlink event on a
container that is dying is not a disaster. An early check for net->nfnl
being non-NULL in ctnetlink_conntrack_event appears to fix this. There
may remain a potential race condition if it becomes NULL immediately
after being checked (I am not sure any lock is held at this point or
how synchronisation for subsystem deinitialization works).
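
The ordering problem and the early check can be modelled in a few lines of
userspace C. This is a sketch only: struct demo_net, struct demo_sock and the
demo_* functions are hypothetical, heavily reduced stand-ins, not the kernel's
types or API, and the model ignores the possible race mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace netlink socket. */
struct demo_sock {
	int listeners;
};

/* Hypothetical stand-in for struct net. */
struct demo_net {
	struct demo_sock *nfnl;	/* NULL once netlink has been torn down */
};

/* Dereferences net->nfnl, so it must not run after netlink teardown. */
static int demo_has_listeners(const struct demo_net *net)
{
	return net->nfnl->listeners > 0;
}

/* Models the netlink exit routine running before conntrack cleanup. */
void demo_netlink_exit(struct demo_net *net)
{
	net->nfnl = NULL;
}

/* Returns 1 if an event would be delivered, 0 if it is skipped. */
int demo_conntrack_event(const struct demo_net *net)
{
	assert(net != NULL);

	/* Early check: a dying namespace may already have lost its socket. */
	if (!net->nfnl)
		return 0;

	if (!demo_has_listeners(net))
		return 0;

	return 1;
}
```

Without the early check, calling demo_conntrack_event() after
demo_netlink_exit() would dereference a NULL pointer, which is the userspace
analogue of the oops described here.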

Patch:

The patch attached should apply on everything from 2.6.26 (if not before)
onwards; it appears to be a problem on all kernels. This was taken against
Ubuntu-3.0.0-11.17 which is very close to 3.0.4. I have torture-tested it
with the above perl script for 15 minutes or so; the perl script hung the
machine within 20 seconds without this patch.

Applicability:

If this is the right solution, it should be applied to all stable kernels
as well as head. Apart from the minor overhead of checking one variable
against NULL, it can never 'do the wrong thing', because if net->nfnl
is NULL, an oops will inevitably result. Therefore, checking is a reasonable
thing to do unless it can be proven that net->nfnl will never be NULL.

Check net->nfnl for NULL in ctnetlink_conntrack_event to avoid Oops on
container destroy

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 net/netfilter/nf_conntrack_netlink.c |    5 +++++
 1 file changed, 5 insertions(+)

diff -puN net/netfilter/nf_conntrack_netlink.c~net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy net/netfilter/nf_conntrack_netlink.c
--- a/net/netfilter/nf_conntrack_netlink.c~net-netfilter-nf_conntrack_netlinkc-fix-oops-on-container-destroy
+++ a/net/netfilter/nf_conntrack_netlink.c
@@ -570,6 +570,11 @@ ctnetlink_conntrack_event(unsigned int e
 		return 0;

 	net = nf_ct_net(ct);
+
+	/* container deinit, netlink may have died before death_by_timeout */
+	if (!net->nfnl)
+		return 0;
+
 	if (!item->report && !nfnetlink_has_listeners(net, group))
 		return 0;

_

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Rick Jones @ 2011-10-31 21:29 UTC
  To: Daniel Baluta
  Cc: David Miller, eric.dumazet, kuznet, jmorris, yoshfuji, kaber,
	netdev, luto
In-Reply-To: <CAEnQRZAGc42q1LCEDs=QigLkXeHBui=KnAm5=5xrEzvA-LDcGg@mail.gmail.com>

On 10/31/2011 01:02 PM, Daniel Baluta wrote:
> On Mon, Oct 31, 2011 at 8:10 PM, Rick Jones<rick.jones2@hp.com>  wrote:
>> Whether tracked as bytes or segments, my take is that to ask applications to
>> have to think about another non-portable socket option is ungood.  I would
>> suggest taking the time to work-out the automagic heuristic to drop the
>> deferred ACK count on connections where it being large is un-desirable and
>> then not need to worry about the limits being global.
>
> Your suggestion deserves further investigation, it looks tricky to
> find a good heuristic for increasing/decreasing the ACK deferred count.

Well, presumably you can observe the behaviour of some HP-UX and/or 
Solaris receivers to get some ideas.

>> If I recall correctly, in one of your earlier posts you mentioned something
>> about a 20% performance boost.  What were the specific conditions of that
>> testing?  Was it over a setup where the receiver already had LRO/GRO or was
>> it over a more plain receiver NIC without that functionality?
>
> If I remember correctly on the receiver side there was no LRO/GRO, but we
> tweaked some of /proc/sys/net/ipv4 parameters (e.g tcp_rmem).
> Also, the traffic was highly unidirectional with many clients feeding multimedia
> content to a server.
>
> Anyhow, we used our custom kernel which is an older kernel version.
> Are there any recommended benchmarks/tools for testing this kind of parameters?

Well, the last time I was tilting after the ACK avoidance windmill I 
used my favorite tool, netperf.  I believe I posted some HP-UX data 
showing the effect of different values of tcp_deferred_ack_max.  Both on 
throughput, and on CPU utilization/service demand.  Of course, I have 
something of a bias in that regard :)

rick jones

* Re: [PATCH] bonding:update speed/duplex for NETDEV_CHANGE
From: Jay Vosburgh @ 2011-10-31 21:23 UTC
  To: Ben Hutchings; +Cc: Weiping Pan, netdev, andy, linux-kernel
In-Reply-To: <1320094108.2735.15.camel@bwh-desktop>

Ben Hutchings <bhutchings@solarflare.com> wrote:

>On Mon, 2011-10-31 at 13:32 -0700, Jay Vosburgh wrote:
>[...]
>> 	This particular case arises only during enslavement.  The call
>> to bond_update_speed_duplex has failed, but the device is marked by
>> bonding to be up.  Bonding complains that the device isn't down, but it
>> cannot get speed and duplex, and therefore is assuming them to be
>> 100/Full.
>> 
>> 	The catch is that this happens only for the ARP monitor, because
>> it initially presumes a slave to be up regardless of actual carrier
>> state (for historical reasons related to very old 10 or 10/100 drivers,
>> prior to the introduction of netif_carrier_*).
>
>Right, I gathered that.  Is there any reason to use the ARP monitor when
>all slaves support link state notification?  Maybe the bonding
>documentation should recommend miimon in section 7, not just in section
>2.

	The ARP monitor can validate that traffic actually flows from
the slave to some destination in the switch domain (and back), so, for
example, it's useful in cases that multiple switch hops exist between
the host and the local router.  A link failure in the middle of the path
won't affect carrier on the local device, but still may cause a
communications break.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* 3c505.c: Does not compile on linus/master  [f362f98]
From: Boaz Harrosh @ 2011-10-31 21:12 UTC
  To: Philip Blundell, linux-kernel, netdev; +Cc: Randy Dunlap, Stephen Rothwell

Doing a "make ARCH=i386 allmodconfig" on  linus/master  [f362f98] gives me the below
compilation breakage.

(Fedora_15_amd64 machine)

It's probably old news but I thought I'd report it as part of my obligation
as a Kernel monkey

Cheers
Boaz
----

/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘send_pcb’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:390:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:390:4: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:436:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:435:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘start_receive’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:557:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:557:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘receive_packet’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:629:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:629:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_interrupt’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:667:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:665:5: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:689:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:689:6: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:724:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:723:8: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:729:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:728:9: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:736:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:736:8: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:746:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:746:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:756:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:755:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:766:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:765:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:776:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:775:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:792:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:792:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:800:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:800:7: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:821:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:820:6: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_open’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:854:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:854:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:916:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:916:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:940:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:940:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:962:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:962:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘send_packet’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:992:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:992:4: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1014:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1014:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1040:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1040:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_timeout’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1057:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1057:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_start_xmit’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1079:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1079:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1088:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1088:4: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1094:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1094:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_get_stats’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1113:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1113:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_close’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1175:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1175:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_set_mc_list’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1219:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1219:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1253:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1253:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_sense’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1289:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1289:3: error: expected ‘}’ before ‘.’ token
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c: In function ‘elp_autodetect’:
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1355:1: error: expected identifier before string constant
/net/ca-quad-11a-boot/samana/bharrosh/git/loo-ct/drivers/net/ethernet/i825xx/3c505.c:1355:3: error: expected ‘}’ before ‘.’ token

^ permalink raw reply

* [PATCH] neigh: print nud_state in neigh timer handler.
From: Daniel Baluta @ 2011-10-31 21:10 UTC (permalink / raw)
  To: davem
  Cc: eric.dumazet, gregory.v.rose, jeffrey.t.kirsher, netdev,
	Daniel Baluta, Daniel Baluta

From: Daniel Baluta <daniel.baluta@gmail.com>

For debugging purposes it is useful to know the exact state of a
non NUD_IN_TIMER neighbour entry whose timer handler just expired.

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
---
 net/core/neighbour.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 909ecb3..6a8a311 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -874,7 +874,7 @@ static void neigh_timer_handler(unsigned long arg)
 
 	if (!(state & NUD_IN_TIMER)) {
 #ifndef CONFIG_SMP
-		printk(KERN_WARNING "neigh: timer & !nud_in_timer\n");
+		printk(KERN_WARNING "neigh: timer & !nud_in_timer, state:0x%x\n", state);
 #endif
 		goto out;
 	}
-- 
1.7.1
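
When reading the new "state:0x%x" value out of the log, it can help to
decode the bits by hand. The following is a minimal userspace sketch,
not part of the patch: the NUD_* values mirror the kernel's neighbour
headers, while nud_state_names() and nud_in_timer() are hypothetical
helper names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Neighbour state bits, mirroring the kernel's NUD_* definitions. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_STALE	0x04
#define NUD_DELAY	0x08
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80

/* States in which the neighbour timer is expected to be armed. */
#define NUD_IN_TIMER	(NUD_INCOMPLETE | NUD_REACHABLE | NUD_DELAY | NUD_PROBE)

/* Hypothetical helper: write the '|'-separated names of all set bits. */
char *nud_state_names(unsigned int state, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} tbl[] = {
		{ NUD_INCOMPLETE, "INCOMPLETE" }, { NUD_REACHABLE, "REACHABLE" },
		{ NUD_STALE, "STALE" },           { NUD_DELAY, "DELAY" },
		{ NUD_PROBE, "PROBE" },           { NUD_FAILED, "FAILED" },
		{ NUD_NOARP, "NOARP" },           { NUD_PERMANENT, "PERMANENT" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (!(state & tbl[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		strncat(buf, "NONE", len - 1);
	return buf;
}

/* Hypothetical helper: non-zero if the state should have a running timer. */
int nud_in_timer(unsigned int state)
{
	return (state & NUD_IN_TIMER) != 0;
}
```

A logged "state:0x4" would decode to STALE, which is outside
NUD_IN_TIMER and so matches the branch that prints the warning.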

^ permalink raw reply related

* Re: [PATCH] bonding:update speed/duplex for NETDEV_CHANGE
From: Ben Hutchings @ 2011-10-31 20:48 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Weiping Pan, netdev, andy, linux-kernel
In-Reply-To: <14973.1320093129@death>

On Mon, 2011-10-31 at 13:32 -0700, Jay Vosburgh wrote:
[...]
> 	This particular case arises only during enslavement.  The call
> to bond_update_speed_duplex has failed, but the device is marked by
> bonding to be up.  Bonding complains that the device isn't down, but it
> cannot get speed and duplex, and therefore is assuming them to be
> 100/Full.
> 
> 	The catch is that this happens only for the ARP monitor, because
> it initially presumes a slave to be up regardless of actual carrier
> state (for historical reasons related to very old 10 or 10/100 drivers,
> prior to the introduction of netif_carrier_*).

Right, I gathered that.  Is there any reason to use the ARP monitor when
all slaves support link state notification?  Maybe the bonding
documentation should recommend miimon in section 7, not just in section
2.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] bonding:update speed/duplex for NETDEV_CHANGE
From: Jay Vosburgh @ 2011-10-31 20:32 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Weiping Pan, netdev, andy, linux-kernel
In-Reply-To: <1320084906.2735.9.camel@bwh-desktop>

Ben Hutchings <bhutchings@solarflare.com> wrote:

>On Mon, 2011-10-31 at 22:19 +0800, Weiping Pan wrote:
>> Zheng Liang (lzheng@redhat.com) found a bug: if we configure bonding with
>> the ARP monitor, the bonding driver sometimes cannot get the speed and
>> duplex from its slaves and assumes them to be 100Mb/sec and Full; see
>> /proc/net/bonding/bond0.
>> There is no such problem when miimon is used.
>> 
>> (Take igb for example.)
>> The reason is that after dev_open() in bond_enslave(),
>> bond_update_speed_duplex() calls igb_get_settings(), but that function
>> runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
>> because igb gets an error value for status.
>> So even though dev_open() has been called, the device is not really ready
>> to report its settings.
>> 
>> Maybe it is safe for us to call igb_get_settings() only after
>> this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex,
>> Flow Control: RX".
>[...]

	I'll first point out that this patch is somewhat cosmetic, and
really only affects what shows up in /proc/net/bonding/bond0 for speed
and duplex.  The reason is that the modes that actually need to use
the speed and duplex information require the miimon for link state
checking, and that code path does the right thing already.

	This has probably been wrong all along, but relatively recently
code was added to show the speed and duplex in /proc/net/bonding/bond0,
so it now has a visible effect.

	So, the patch is ok as far as it goes, in that it will keep the
values displayed in the /proc file up to date.

	However, I'm not sure that faking the speed/duplex to 100/Full
is still the correct thing to do.  For the modes that use the
information, the ethtool state won't be queried if carrier is down (and
in those cases, if the speed / duplex returns an error while carrier is
up, we should probably pay attention).  For the modes where the
information is merely cosmetic, displaying "unknown" as ethtool does is
probably a more accurate representation.

	Can you additionally remove the "fake to 100/Full" logic?  This
involves changing bond_update_speed_duplex to not fake the speed and
duplex, changing bond_enslave to not issue that warning, and changing
bond_info_show_slave to handle "bad" speed and duplex values.

	Anybody see a problem with doing that?
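
To make the "handle bad speed and duplex values" part concrete, here is
a minimal userspace sketch of the formatting side. It is not bonding
code: the demo_* names and constants are hypothetical, and treating
both 0 and ~0 as "unknown" is an assumption based on the values
mentioned in this thread.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinels for this sketch, not the kernel's definitions. */
#define DEMO_SPEED_UNKNOWN	((uint32_t)-1)
#define DEMO_DUPLEX_HALF	0
#define DEMO_DUPLEX_FULL	1

/* Format a link speed for display instead of faking 100 Mbps. */
const char *demo_format_speed(uint32_t speed, char *buf, size_t len)
{
	if (speed == 0 || speed == DEMO_SPEED_UNKNOWN)
		snprintf(buf, len, "Unknown");
	else
		snprintf(buf, len, "%u Mbps", (unsigned int)speed);
	return buf;
}

/* Format a duplex value; anything other than half or full is unknown. */
const char *demo_format_duplex(unsigned int duplex)
{
	switch (duplex) {
	case DEMO_DUPLEX_HALF:
		return "Half";
	case DEMO_DUPLEX_FULL:
		return "Full";
	default:
		return "Unknown";
	}
}
```

With helpers along these lines, a slave whose driver reported -1 for
both fields would be shown as "Unknown" / "Unknown" rather than
100 Mbps / Full.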

>For any device with autonegotiation enabled, you generally cannot get
>the speed and duplex settings until the link is up.  While the link is
>down, you may see a value of 0, ~0, or the best mode currently
>advertised.  So I think that the bonding driver should avoid updating
>the slave speed and duplex values whenever autoneg is enabled and the
>link is down.

	Well, it's a little more complicated than that.  Bonding already
generally avoids checking the speed and duplex if the slave isn't up (or
at least normally won't complain if it fails).

	This particular case arises only during enslavement.  The call
to bond_update_speed_duplex has failed, but the device is marked by
bonding to be up.  Bonding complains that the device isn't down, but it
cannot get speed and duplex, and therefore is assuming them to be
100/Full.

	The catch is that this happens only for the ARP monitor, because
it initially presumes a slave to be up regardless of actual carrier
state (for historical reasons related to very old 10 or 10/100 drivers,
prior to the introduction of netif_carrier_*).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Subnet router anycast for FE80/10 ?
From: Andreas Hofmeister @ 2011-10-31 20:22 UTC (permalink / raw)
  To: netdev

Hi,

I noticed that once forwarding has been enabled on an interface, there 
is a "subnet router anycast address" for the link-local address prefix 
FE80/10.

This address seems not to be explicitly mentioned in any RFC, but RFC 
4291 says "All routers are required to support the Subnet-Router anycast 
addresses for the subnets to which they have interfaces."

In the sense that a Linux router actually has an address FE80/10 on each 
ipv6 enabled interface, it seems to be correct to also have FE80:: as an 
anycast address on all interfaces which have ipv6 and forwarding enabled.
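
For reference, RFC 4291 (section 2.6.1) defines the Subnet-Router
anycast address as the subnet prefix with all interface identifier bits
set to zero. The sketch below applies that rule to a raw 16-byte
address; subnet_router_anycast() is a hypothetical helper written for
this illustration, not a kernel function, and the prefix length to feed
it is left to the caller.

```c
#include <stdint.h>

/*
 * Hypothetical helper: copy the first prefix_len bits of addr into out
 * and clear every bit after the prefix.
 */
void subnet_router_anycast(const uint8_t addr[16], unsigned int prefix_len,
			   uint8_t out[16])
{
	unsigned int i;

	if (prefix_len > 128)
		prefix_len = 128;

	for (i = 0; i < 16; i++) {
		unsigned int bit_start = i * 8;

		if (prefix_len >= bit_start + 8) {
			/* Byte lies entirely inside the prefix. */
			out[i] = addr[i];
		} else if (prefix_len <= bit_start) {
			/* Byte lies entirely inside the interface ID. */
			out[i] = 0;
		} else {
			/* Prefix ends inside this byte: keep the top bits. */
			unsigned int keep = prefix_len - bit_start;

			out[i] = addr[i] & (uint8_t)(0xff << (8 - keep));
		}
	}
}
```

For a link-local address such as fe80::211:22ff:fe33:4455, both a /10
and a /64 prefix length yield fe80::, which is the address observed on
the forwarding interfaces.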

But then, FE80/10 is not actually supposed to be routed at all and so a 
router cannot really be a router for that particular subnet?

Or is "FE80::" just supposed to be the anycast equivalent for the "all 
routers" multicast address ff02::2 ?

Maybe someone on this list could enlighten me.

Ciao
  Andi

^ permalink raw reply

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Daniel Baluta @ 2011-10-31 20:02 UTC (permalink / raw)
  To: Rick Jones
  Cc: David Miller, eric.dumazet, kuznet, jmorris, yoshfuji, kaber,
	netdev, luto
In-Reply-To: <4EAEE487.5080905@hp.com>

On Mon, Oct 31, 2011 at 8:10 PM, Rick Jones <rick.jones2@hp.com> wrote:
> Whether tracked as bytes or segments, my take is that to ask applications to
> have to think about another non-portable socket option is ungood.  I would
> suggest taking the time to work-out the automagic heuristic to drop the
> deferred ACK count on connections where it being large is un-desirable and
> then not need to worry about the limits being global.

Your suggestion deserves further investigation; it looks tricky to
find a good heuristic for increasing/decreasing the deferred ACK count.

>
> Given the stack's existing propensity to try to decide when to increase the
> window I might even go so far as to suggest the sense of the heuristic be
> flipped and it seek to decide when it is ok to increase the number of
> segments/bytes per ACK.  To what extent one needs to go beyond what happens
> already with the stretching of ACKs via GRO/LRO or if that mechanism can
> serve as part of the logic of the heuristic is probably a fertile area for
> discussion.
>
> If I recall correctly, in one of your earlier posts you mentioned something
> about a 20% performance boost.  What were the specific conditions of that
> testing?  Was it over a setup where the receiver already had LRO/GRO or was
> it over a more plain receiver NIC without that functionality?

If I remember correctly, on the receiver side there was no LRO/GRO, but we
tweaked some of the /proc/sys/net/ipv4 parameters (e.g. tcp_rmem).
Also, the traffic was highly unidirectional, with many clients feeding
multimedia content to a server.

Anyhow, we used our custom kernel, which is an older kernel version.
Are there any recommended benchmarks/tools for testing these kinds of
parameters?

Daniel.

^ permalink raw reply

* [patch 1/1] drivers/net/ethernet/i825xx/3c505.c: fix build with dynamic debug
From: akpm @ 2011-10-31 19:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, akpm, akpm, jbaron, philb

From: Andrew Morton <akpm@google.com>
Subject: drivers/net/ethernet/i825xx/3c505.c: fix build with dynamic debug

The `#define filename' screws up the expansion of
DEFINE_DYNAMIC_DEBUG_METADATA:

drivers/net/ethernet/i825xx/3c505.c: In function 'send_pcb':
drivers/net/ethernet/i825xx/3c505.c:390: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:390: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c:436: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:435: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c: In function 'start_receive':
drivers/net/ethernet/i825xx/3c505.c:557: error: expected identifier before string constant
drivers/net/ethernet/i825xx/3c505.c:557: error: expected '}' before '.' token
drivers/net/ethernet/i825xx/3c505.c: In function 'receive_packet':
drivers/net/ethernet/i825xx/3c505.c:629: error: expected identifier before string constant

etc

So remove that #define and "open-code" it.

Cc: Philip Blundell <philb@gnu.org>
Cc: David Miller <davem@davemloft.net>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/net/ethernet/i825xx/3c505.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff -puN drivers/net/ethernet/i825xx/3c505.c~drivers-net-ethernet-i825xx-3c505c-fix-build-with-dynamic-debug drivers/net/ethernet/i825xx/3c505.c
--- a/drivers/net/ethernet/i825xx/3c505.c~drivers-net-ethernet-i825xx-3c505c-fix-build-with-dynamic-debug
+++ a/drivers/net/ethernet/i825xx/3c505.c
@@ -126,15 +126,13 @@
  *
  *********************************************************/
 
-#define filename __FILE__
-
 #define timeout_msg "*** timeout at %s:%s (line %d) ***\n"
 #define TIMEOUT_MSG(lineno) \
-	pr_notice(timeout_msg, filename, __func__, (lineno))
+	pr_notice(timeout_msg, __FILE__, __func__, (lineno))
 
 #define invalid_pcb_msg "*** invalid pcb length %d at %s:%s (line %d) ***\n"
 #define INVALID_PCB_MSG(len) \
-	pr_notice(invalid_pcb_msg, (len), filename, __func__, __LINE__)
+	pr_notice(invalid_pcb_msg, (len), __FILE__, __func__, __LINE__)
 
 #define search_msg "%s: Looking for 3c505 adapter at address %#x..."
 
_
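
The failure mode can be reproduced outside the kernel. The sketch below
is a reduced illustration, not the dynamic debug code itself: struct
demo_ddebug and DEMO_DEFINE_METADATA are hypothetical stand-ins for
struct _ddebug and DEFINE_DYNAMIC_DEBUG_METADATA. It compiles as shown;
enabling the commented-out #define makes the preprocessor rewrite the
".filename" designator into a string constant, which gives the same
"expected identifier before string constant" error.

```c
/* Hypothetical stand-in for the descriptor; only two fields are modelled. */
struct demo_ddebug {
	const char *filename;
	unsigned int lineno;
};

/*
 * Enabling this line breaks the initializer below: macro expansion also
 * applies to the identifier after the '.', so ".filename" turns into
 * '.' followed by a string constant.
 */
/* #define filename __FILE__ */

/* Hypothetical macro using a named initializer for the filename field. */
#define DEMO_DEFINE_METADATA(name)		\
	static struct demo_ddebug name = {	\
		.filename = __FILE__,		\
		.lineno = __LINE__,		\
	}

/* Return the file name recorded for this call site. */
const char *demo_call_site_file(void)
{
	DEMO_DEFINE_METADATA(descriptor);
	return descriptor.filename;
}
```

Using __FILE__ directly at the call sites, as the patch does, keeps the
identifier "filename" free for use as a struct member name.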

^ permalink raw reply

* Re: [PATCH 2/2 v4] net/smsc911x: Add regulator support
From: Mike Frysinger @ 2011-10-31 18:21 UTC (permalink / raw)
  To: Robert Marklund
  Cc: netdev, Steve Glendinning, Mathieu Poirier, Paul Mundt, linux-sh,
	Sascha Hauer, Tony Lindgren, linux-omap, uclinux-dist-devel,
	Linus Walleij
In-Reply-To: <1320064719-14449-1-git-send-email-robert.marklund@stericsson.com>

[-- Attachment #1: Type: Text/Plain, Size: 437 bytes --]

On Monday 31 October 2011 08:38:39 Robert Marklund wrote:
> ChangeLog v3->v4:
> - Remove dual prints and old comment on Mike's request.
> - Split the request_free function at Mike's and Sascha's request.

would be nice if the enable/disable were split as well ...

>  	iounmap(pdata->ioaddr);
> 
> +	(void)smsc911x_enable_disable_resources(pdev, false);

i don't think the (void) cast is necessary

otherwise looks fine
-mike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Rick Jones @ 2011-10-31 18:10 UTC (permalink / raw)
  To: Daniel Baluta
  Cc: David Miller, eric.dumazet, kuznet, jmorris, yoshfuji, kaber,
	netdev, luto
In-Reply-To: <CAEnQRZCggUnoVZXyXfZ6-Om+hwQL_6Oo3dPODsXAH+iJYqN=jw@mail.gmail.com>

Whether tracked as bytes or segments, my take is that to ask 
applications to have to think about another non-portable socket option 
is ungood.  I would suggest taking the time to work-out the automagic 
heuristic to drop the deferred ACK count on connections where it being 
large is un-desirable and then not need to worry about the limits being 
global.

Given the stack's existing propensity to try to decide when to increase 
the window I might even go so far as to suggest the sense of the 
heuristic be flipped and it seek to decide when it is ok to increase the 
number of segments/bytes per ACK.  To what extent one needs to go beyond 
what happens already with the stretching of ACKs via GRO/LRO or if that 
mechanism can serve as part of the logic of the heuristic is probably a 
fertile area for discussion.

If I recall correctly, in one of your earlier posts you mentioned 
something about a 20% performance boost.  What were the specific 
conditions of that testing?  Was it over a setup where the receiver 
already had LRO/GRO or was it over a more plain receiver NIC without 
that functionality?

rick jones

^ permalink raw reply

* Re: [PATCH] bonding:update speed/duplex for NETDEV_CHANGE
From: Ben Hutchings @ 2011-10-31 18:15 UTC (permalink / raw)
  To: Weiping Pan; +Cc: netdev, fubar, andy, linux-kernel
In-Reply-To: <e00468a2cbb8a25d7a89028e876769449454309f.1320070684.git.wpan@redhat.com>

On Mon, 2011-10-31 at 22:19 +0800, Weiping Pan wrote:
> Zheng Liang (lzheng@redhat.com) found a bug: if we configure bonding with
> the ARP monitor, the bonding driver sometimes cannot get the speed and
> duplex from its slaves and assumes them to be 100Mb/sec and Full; see
> /proc/net/bonding/bond0.
> There is no such problem when miimon is used.
> 
> (Take igb for example.)
> The reason is that after dev_open() in bond_enslave(),
> bond_update_speed_duplex() calls igb_get_settings(), but that function
> runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
> because igb gets an error value for status.
> So even though dev_open() has been called, the device is not really ready
> to report its settings.
> 
> Maybe it is safe for us to call igb_get_settings() only after
> this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex,
> Flow Control: RX".
[...]

For any device with autonegotiation enabled, you generally cannot get
the speed and duplex settings until the link is up.  While the link is
down, you may see a value of 0, ~0, or the best mode currently
advertised.  So I think that the bonding driver should avoid updating
the slave speed and duplex values whenever autoneg is enabled and the
link is down.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering support for passthru mode
From: Rose, Gregory V @ 2011-10-31 17:39 UTC (permalink / raw)
  To: Roopa Prabhu, netdev@vger.kernel.org
  Cc: sri@us.ibm.com, dragos.tatulea@gmail.com, kvm@vger.kernel.org,
	arnd@arndb.de, mst@redhat.com, davem@davemloft.net,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <CAD42458.3856A%roprabhu@cisco.com>

> -----Original Message-----
> From: Roopa Prabhu [mailto:roprabhu@cisco.com]
> Sent: Monday, October 31, 2011 10:09 AM
> To: Rose, Gregory V; netdev@vger.kernel.org
> Cc: sri@us.ibm.com; dragos.tatulea@gmail.com; kvm@vger.kernel.org;
> arnd@arndb.de; mst@redhat.com; davem@davemloft.net; mchan@broadcom.com;
> dwang2@cisco.com; shemminger@vyatta.com; eric.dumazet@gmail.com;
> kaber@trash.net; benve@cisco.com
> Subject: Re: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address
> filtering support for passthru mode
> 
> 
> 
> 
> On 10/31/11 9:38 AM, "Rose, Gregory V" <gregory.v.rose@intel.com> wrote:
> 
> >> -----Original Message-----
> >> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org]
> >> On Behalf Of Roopa Prabhu
> >> Sent: Friday, October 28, 2011 7:34 PM
> >> To: netdev@vger.kernel.org
> >> Cc: sri@us.ibm.com; dragos.tatulea@gmail.com; kvm@vger.kernel.org;
> >> arnd@arndb.de; mst@redhat.com; davem@davemloft.net; Rose, Gregory V;
> >> mchan@broadcom.com; dwang2@cisco.com; shemminger@vyatta.com;
> >> eric.dumazet@gmail.com; kaber@trash.net; benve@cisco.com
> >> Subject: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering
> >> support for passthru mode
> >>
> >> v2 -> v3
> >> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >> - Support for SRIOV VFs.
> >> [Note: The get filters msg might get too big for SRIOV vfs.
> >>         But this patch follows existing sriov vf get code and
> >> accommodates filters for all VFs in a PF.
> >>         And for the SRIOV case I have only tested the fact that the VF
> >> arguments are getting delivered to rtnetlink correctly. The rest of
> >> the code follows existing sriov vf handling code so it should work
> >> just fine]
> >> - Fixed all op and netlink attribute names to start with IFLA_RX_FILTER
> >> - Changed macvlan filter ops to call corresponding lowerdev op if
> lowerdev
> >>   supports it for passthru mode. Else it falls back on macvlan handling
> >> the
> >>   filters locally as in v1 and v2
> >>
> >> v1 -> v2
> >> - Instead of TUNSETTXFILTER introduced rtnetlink interface for the same
> >>
> >
> > [snip...]
> >
> >>
> >> This patch series implements the following
> >> 01/6 rtnetlink: Netlink interface for setting MAC and VLAN filters
> >> 02/6 netdev: Add netdev_ops to set and get MAC/VLAN rx filters
> >> 03/6 rtnetlink: Add support to set MAC/VLAN filters
> >> 04/6 rtnetlink: Add support to get MAC/VLAN filters
> >> 05/6 macvlan: Add support to set MAC/VLAN filter netdev ops
> >> 06/6 macvlan: Add support to get MAC/VLAN filter netdev ops
> >>
> >> Please comment. Thanks.
> >
> > After some preliminary review this looks pretty good to me in so far as
> adding
> > the necessary hooks to do what I need to do.  I appreciate your effort
> on
> > this.
> >
> > I'm sort of a hands-on type of person so I need to apply this patch to a
> > private git tree and then take it for a test drive (so to speak).  If I have
> > further comments I'll get back to you.
> >
> Sounds good.
> 
> > Did you have any plans for modifying any user space tools such as 'ip' to use
> > this interface?
> >
> 
> Yes, I have an iproute2 sample patch for setting and displaying the filters
> which I have been using to test this interface. I can send the patch to you
> after some cleanup if you think it will be useful for you to try out this
> interface.
> 
> Thanks Greg.

Yes, please do.

Thanks,

- Greg


^ permalink raw reply

* Re: Linux 3.1-rc9
From: Simon Kirby @ 2011-10-31 17:32 UTC (permalink / raw)
  To: Thomas Gleixner, David Miller
  Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
	Dave Jones, Martin Schwidefsky, Ingo Molnar, Network Development
In-Reply-To: <20111025202049.GB25043@hostway.ca>

On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:

> On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> 
> > Ok, hit the hang about 4 more times, but only this morning on a box with
> > a serial cable attached. Yay!
> 
> Here's lockdep output from another box. This one looks a bit different.

One more, again a bit different. The last few lockups have looked like
this. Not sure why, but we're hitting this at a few a day now. Thomas,
this is without your patch, but as you said, that's right before a free
and should print a separate lockdep warning.

No "huh" lines until after the trace on this one. I'll move to 3.1 with
cherry-picked b0691c8e now.

Simon-

[104661.173798] 
[104661.173801] =======================================================
[104661.179922] [ INFO: possible circular locking dependency detected ]
[104661.179922] 3.1.0-rc10-hw-lockdep+ #51
[104661.179922] -------------------------------------------------------
[104661.179922] watchdog.pl/29331 is trying to acquire lock:
[104661.179922]  (slock-AF_INET/1){+.-.-.}, at: [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.179922] 
[104661.179922] but task is already holding lock:
[104661.179922]  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.179922] 
[104661.179922] which lock already depends on the new lock.
[104661.179922] 
[104661.179922] 
[104661.179922] the existing dependency chain (in reverse order) is:
[104661.239412] 
[104661.239412] -> #1 (slock-AF_INET){+.-.-.}:
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55fc>] _raw_spin_lock+0x3c/0x50
[104661.244767]        [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]        [<ffffffff8164cb33>] inet_csk_clone+0x13/0x90
[104661.244767]        [<ffffffff816669a5>] tcp_create_openreq_child+0x25/0x4d0
[104661.244767]        [<ffffffff81664c78>] tcp_v4_syn_recv_sock+0x48/0x2c0
[104661.244767]        [<ffffffff816667f5>] tcp_check_req+0x335/0x4c0
[104661.244767]        [<ffffffff81663e5e>] tcp_v4_do_rcv+0x29e/0x460
[104661.244767]        [<ffffffff816648ac>] tcp_v4_rcv+0x88c/0xc10   
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81610ec0>] process_backlog+0xd0/0x1e0
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0 
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250  
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30    
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0      
[104661.244767]        [<ffffffff81063dbd>] local_bh_enable_ip+0xed/0x110
[104661.244767]        [<ffffffff816f5e9f>] _raw_spin_unlock_bh+0x3f/0x50
[104661.244767]        [<ffffffff81602e41>] release_sock+0x161/0x1d0
[104661.244767]        [<ffffffff816762ed>] inet_stream_connect+0x6d/0x2f0
[104661.244767]        [<ffffffff815fcfeb>] kernel_connect+0xb/0x10
[104661.244767]        [<ffffffff816aaf86>] xs_tcp_setup_socket+0x2a6/0x4c0
[104661.244767]        [<ffffffff81078cf9>] process_one_work+0x1e9/0x560   
[104661.244767]        [<ffffffff81079403>] worker_thread+0x193/0x420      
[104661.244767]        [<ffffffff81080466>] kthread+0x96/0xb0
[104661.244767]        [<ffffffff817001c4>] kernel_thread_helper+0x4/0x10
[104661.244767] 
[104661.244767] -> #0 (slock-AF_INET/1){+.-.-.}:
[104661.244767]        [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]        [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]        [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]        [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10
[104661.244767]        [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]        [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]        [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]        [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]        [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]        [<ffffffff81612e24>] netif_receive_skb+0x104/0x120  
[104661.244767]        [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]        [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]        [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]        [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]        [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]        [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]        [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]        [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]        [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]        [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]        [<ffffffff816f6273>] ret_from_intr+0x0/0x1a     
[104661.244767]        [<ffffffff816f65b5>] page_fault+0x25/0x30     
[104661.244767] 
[104661.244767] other info that might help us debug this:
[104661.244767] 
[104661.244767]  Possible unsafe locking scenario:
[104661.244767]        
[104661.244767]        CPU0                    CPU1
[104661.244767]        ----                    ----
[104661.244767]   lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]                                lock(slock-AF_INET);
[104661.244767]   lock(slock-AF_INET);
[104661.244767] 
[104661.244767]  *** DEADLOCK ***
[104661.244767] 
[104661.244767] 3 locks held by watchdog.pl/29331:
[104661.244767]  #0:  (slock-AF_INET){+.-.-.}, at: [<ffffffff81604540>] sk_clone+0x120/0x420
[104661.244767]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff816109f5>] __netif_receive_skb+0x165/0x560
[104661.244767]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816418a0>] ip_local_deliver_finish+0x40/0x2f0
[104661.244767] 
[104661.244767] stack backtrace:
[104661.244767] Pid: 29331, comm: watchdog.pl Not tainted 3.1.0-rc10-hw-lockdep+ #51
[104661.244767] Call Trace:
[104661.244767]  <IRQ>  [<ffffffff81097eab>] print_circular_bug+0x21b/0x330
[104661.244767]  [<ffffffff8109a000>] __lock_acquire+0x2040/0x2180
[104661.244767]  [<ffffffff8109a7b9>] lock_acquire+0x109/0x140
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff816f55aa>] _raw_spin_lock_nested+0x3a/0x50
[104661.244767]  [<ffffffff81664887>] ? tcp_v4_rcv+0x867/0xc10
[104661.244767]  [<ffffffff81664887>] tcp_v4_rcv+0x867/0xc10  
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81636978>] ? nf_hook_slow+0x148/0x1a0
[104661.244767]  [<ffffffff81641960>] ip_local_deliver_finish+0x100/0x2f0
[104661.244767]  [<ffffffff816418a0>] ? ip_local_deliver_finish+0x40/0x2f0
[104661.244767]  [<ffffffff81641bdd>] ip_local_deliver+0x8d/0xa0
[104661.244767]  [<ffffffff81641203>] ip_rcv_finish+0x1a3/0x510 
[104661.244767]  [<ffffffff816417e2>] ip_rcv+0x272/0x2f0
[104661.244767]  [<ffffffff81610d67>] __netif_receive_skb+0x4d7/0x560
[104661.244767]  [<ffffffff816109f5>] ? __netif_receive_skb+0x165/0x560
[104661.244767]  [<ffffffff81612e24>] netif_receive_skb+0x104/0x120
[104661.244767]  [<ffffffff81612d43>] ? netif_receive_skb+0x23/0x120
[104661.244767]  [<ffffffff816133ab>] ? dev_gro_receive+0x29b/0x380 
[104661.244767]  [<ffffffff816132a2>] ? dev_gro_receive+0x192/0x380 
[104661.244767]  [<ffffffff81612f70>] napi_skb_finish+0x50/0x70
[104661.244767]  [<ffffffff81613635>] napi_gro_receive+0xc5/0xd0
[104661.244767]  [<ffffffffa000ad50>] bnx2_poll_work+0x610/0x1560 [bnx2]
[104661.244767]  [<ffffffffa000bde6>] bnx2_poll+0x66/0x250 [bnx2]
[104661.244767]  [<ffffffff81613880>] net_rx_action+0x140/0x2c0  
[104661.244767]  [<ffffffff810640b8>] __do_softirq+0x138/0x250   
[104661.244767]  [<ffffffff817002bc>] call_softirq+0x1c/0x30     
[104661.244767]  [<ffffffff810153c5>] do_softirq+0x95/0xd0       
[104661.244767]  [<ffffffff81063c8d>] irq_exit+0xdd/0x110        
[104661.244767]  [<ffffffff81014b74>] do_IRQ+0x64/0xe0           
[104661.244767]  [<ffffffff816f6273>] common_interrupt+0x73/0x73
[104661.244767]  <EOI>  [<ffffffff816f99b3>] ? do_page_fault+0x93/0x520
[104661.244767]  [<ffffffff816f99af>] ? do_page_fault+0x8f/0x520
[104661.244767]  [<ffffffff81149afc>] ? vfsmount_lock_local_unlock+0x1c/0x40
[104661.244767]  [<ffffffff8114a79b>] ? mntput_no_expire+0x3b/0x150
[104661.244767]  [<ffffffff8114a8ca>] ? mntput+0x1a/0x30
[104661.244767]  [<ffffffff8112c540>] ? fput+0x190/0x230
[104661.244767]  [<ffffffff813a60ed>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[104661.244767]  [<ffffffff816f65b5>] page_fault+0x25/0x30
[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104663.418206] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104666.420003] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104672.425159] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?
[104684.423542] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000102, exited with 00000103?
[104691.206752] huh, entered softirq 3 NET_RX ffffffff81613740 preempt_count 00000101, exited with 00000102?

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering support for passthru mode
From: Roopa Prabhu @ 2011-10-31 17:09 UTC (permalink / raw)
  To: Rose, Gregory V, netdev@vger.kernel.org
  Cc: sri@us.ibm.com, dragos.tatulea@gmail.com, kvm@vger.kernel.org,
	arnd@arndb.de, mst@redhat.com, davem@davemloft.net,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <43F901BD926A4E43B106BF17856F075501A1BD5241@orsmsx508.amr.corp.intel.com>




On 10/31/11 9:38 AM, "Rose, Gregory V" <gregory.v.rose@intel.com> wrote:

>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>> On Behalf Of Roopa Prabhu
>> Sent: Friday, October 28, 2011 7:34 PM
>> To: netdev@vger.kernel.org
>> Cc: sri@us.ibm.com; dragos.tatulea@gmail.com; kvm@vger.kernel.org;
>> arnd@arndb.de; mst@redhat.com; davem@davemloft.net; Rose, Gregory V;
>> mchan@broadcom.com; dwang2@cisco.com; shemminger@vyatta.com;
>> eric.dumazet@gmail.com; kaber@trash.net; benve@cisco.com
>> Subject: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering
>> support for passthru mode
>> 
>> v2 -> v3
>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
>> - Support for SRIOV VFs.
>> [Note: The get filters msg might get too big for SRIOV vfs.
>>         But this patch follows existing sriov vf get code and
>> accommodate filters for all VF's in a PF.
>>         And for the SRIOV case I have only tested the fact that the VF
>> arguments are getting delivered to rtnetlink correctly. The rest of
>> the code follows existing sriov vf handling code so it should work
>> just fine]
>> - Fixed all op and netlink attribute names to start with IFLA_RX_FILTER
>> - Changed macvlan filter ops to call corresponding lowerdev op if lowerdev
>>   supports it for passthru mode. Else it falls back on macvlan handling
>> the
>>   filters locally as in v1 and v2
>> 
>> v1 -> v2
>> - Instead of TUNSETTXFILTER introduced rtnetlink interface for the same
>> 
> 
> [snip...]
> 
>> 
>> This patch series implements the following
>> 01/6 rtnetlink: Netlink interface for setting MAC and VLAN filters
>> 02/6 netdev: Add netdev_ops to set and get MAC/VLAN rx filters
>> 03/6 rtnetlink: Add support to set MAC/VLAN filters
>> 04/6 rtnetlink: Add support to get MAC/VLAN filters
>> 05/6 macvlan: Add support to set MAC/VLAN filter netdev ops
>> 06/6 macvlan: Add support to get MAC/VLAN filter netdev ops
>> 
>> Please comment. Thanks.
> 
> After some preliminary review this looks pretty good to me in so far as adding
> the necessary hooks to do what I need to do.  I appreciate your effort on
> this.
> 
> I'm sort of a hands-on type of person so I need to apply this patch to a
> private git tree and then take it for a test drive (so to speak).  If I have
> further comments I'll get back to you.
> 
Sounds good. 

> Did you have any plans for modifying any user space tools such as 'ip' to use
> this interface?
> 

Yes, I have an iproute2 sample patch for setting and displaying the filters
which I have been using to test this interface. I can send the patch to you
after some cleanup if you think it will be useful for you to try out this
interface.

Thanks Greg.

^ permalink raw reply

* RE: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering support for passthru mode
From: Rose, Gregory V @ 2011-10-31 16:38 UTC (permalink / raw)
  To: Roopa Prabhu, netdev@vger.kernel.org
  Cc: sri@us.ibm.com, dragos.tatulea@gmail.com, kvm@vger.kernel.org,
	arnd@arndb.de, mst@redhat.com, davem@davemloft.net,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <20111029023159.5198.60245.stgit@rhel6.1>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Roopa Prabhu
> Sent: Friday, October 28, 2011 7:34 PM
> To: netdev@vger.kernel.org
> Cc: sri@us.ibm.com; dragos.tatulea@gmail.com; kvm@vger.kernel.org;
> arnd@arndb.de; mst@redhat.com; davem@davemloft.net; Rose, Gregory V;
> mchan@broadcom.com; dwang2@cisco.com; shemminger@vyatta.com;
> eric.dumazet@gmail.com; kaber@trash.net; benve@cisco.com
> Subject: [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering
> support for passthru mode
> 
> v2 -> v3
> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> - Support for SRIOV VFs.
> 	[Note: The get filters msg might get too big for SRIOV vfs.
>         But this patch follows existing sriov vf get code and
> 	accommodate filters for all VF's in a PF.
>         And for the SRIOV case I have only tested the fact that the VF
> 	arguments are getting delivered to rtnetlink correctly. The rest of
> 	the code follows existing sriov vf handling code so it should work
> 	just fine]
> - Fixed all op and netlink attribute names to start with IFLA_RX_FILTER
> - Changed macvlan filter ops to call corresponding lowerdev op if lowerdev
>   supports it for passthru mode. Else it falls back on macvlan handling
> the
>   filters locally as in v1 and v2
> 
> v1 -> v2
> - Instead of TUNSETTXFILTER introduced rtnetlink interface for the same
> 

[snip...]

> 
> This patch series implements the following
> 01/6 rtnetlink: Netlink interface for setting MAC and VLAN filters
> 02/6 netdev: Add netdev_ops to set and get MAC/VLAN rx filters
> 03/6 rtnetlink: Add support to set MAC/VLAN filters
> 04/6 rtnetlink: Add support to get MAC/VLAN filters
> 05/6 macvlan: Add support to set MAC/VLAN filter netdev ops
> 06/6 macvlan: Add support to get MAC/VLAN filter netdev ops
> 
> Please comment. Thanks.

After some preliminary review this looks pretty good to me in so far as adding the necessary hooks to do what I need to do.  I appreciate your effort on this.

I'm sort of a hands-on type of person so I need to apply this patch to a private git tree and then take it for a test drive (so to speak).  If I have further comments I'll get back to you.

Did you have any plans for modifying any user space tools such as 'ip' to use this interface?

- Greg

> 
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: Christian Benvenuti <benve@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH] vlan: Don't propagate flag changes on down interfaces.
From: Matthijs Kooijman @ 2011-10-31 14:53 UTC (permalink / raw)
  To: netdev; +Cc: Matthijs Kooijman
In-Reply-To: <20111031134411.GG14392@login.drsnuggles.stderr.nl>

When (de)configuring a vlan interface, the IFF_ALLMULTI and IFF_PROMISC
flags are cleared or set on the underlying interface. So, if these flags
are changed on a vlan interface that is not up, the flags on the underlying
interface might be set or cleared twice.

Only propagating flag changes when a device is up makes sure this does
not happen. It also makes sure that an underlying device is not set to
promiscuous or allmulti mode for a vlan device that is down.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
---
 net/8021q/vlan_dev.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 9d40a07..0e72689 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -470,10 +470,12 @@ static void vlan_dev_change_rx_flags(struct net_device *dev, int change)
 {
 	struct net_device *real_dev = vlan_dev_info(dev)->real_dev;

-	if (change & IFF_ALLMULTI)
-		dev_set_allmulti(real_dev, dev->flags & IFF_ALLMULTI ? 1 : -1);
-	if (change & IFF_PROMISC)
-		dev_set_promiscuity(real_dev, dev->flags & IFF_PROMISC ? 1 : -1);
+	if (dev->flags & IFF_UP) {
+		if (change & IFF_ALLMULTI)
+			dev_set_allmulti(real_dev, dev->flags & IFF_ALLMULTI ? 1 : -1);
+		if (change & IFF_PROMISC)
+			dev_set_promiscuity(real_dev, dev->flags & IFF_PROMISC ? 1 : -1);
+	}
 }

 static void vlan_dev_set_rx_mode(struct net_device *vlan_dev)
-- 
1.7.7
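
The double counting described in the commit message can be illustrated with a
small user-space model. This is only a sketch, not kernel code: the sk_ names,
the single promiscuity counter, and the guard_up switch are all hypothetical,
and it assumes that opening the vlan device propagates an already-set flag to
the underlying device, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IFF_UP and IFF_PROMISC. */
#define SK_IFF_UP      0x1u
#define SK_IFF_PROMISC 0x2u

/* Underlying device: keeps a reference count of promiscuity requests. */
struct sk_real_dev {
	int promiscuity;
};

/* Vlan device stacked on top of an underlying device. */
struct sk_vlan_dev {
	unsigned int flags;
	struct sk_real_dev *real;
};

static void sk_real_set_promiscuity(struct sk_real_dev *real, int inc)
{
	real->promiscuity += inc;
	assert(real->promiscuity >= 0);	/* the count must never underflow */
}

/* Bringing the vlan device up propagates a flag that is already set. */
static void sk_vlan_open(struct sk_vlan_dev *vlan)
{
	vlan->flags |= SK_IFF_UP;
	if (vlan->flags & SK_IFF_PROMISC)
		sk_real_set_promiscuity(vlan->real, 1);
}

/*
 * Change the promiscuous flag on the vlan device. With guard_up == true
 * the change is propagated only while the device is up (the behaviour the
 * patch proposes); with guard_up == false it is always propagated.
 */
static void sk_vlan_set_promisc(struct sk_vlan_dev *vlan, bool on, bool guard_up)
{
	unsigned int old = vlan->flags;
	unsigned int change;

	if (on)
		vlan->flags |= SK_IFF_PROMISC;
	else
		vlan->flags &= ~SK_IFF_PROMISC;
	change = old ^ vlan->flags;

	if (guard_up && !(vlan->flags & SK_IFF_UP))
		return;
	if (change & SK_IFF_PROMISC)
		sk_real_set_promiscuity(vlan->real,
					(vlan->flags & SK_IFF_PROMISC) ? 1 : -1);
}

/* Set promisc on a down vlan device, then open it; return the real count. */
static int sk_promisc_after_down_set_then_open(bool guard_up)
{
	struct sk_real_dev real = { 0 };
	struct sk_vlan_dev vlan = { 0, &real };

	sk_vlan_set_promisc(&vlan, true, guard_up);
	sk_vlan_open(&vlan);
	return real.promiscuity;
}
```

In this model, sk_promisc_after_down_set_then_open(false) leaves the underlying
device with a count of 2 for a single vlan device, while the guarded variant
leaves it at 1.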

^ permalink raw reply related

* [PATCH] bonding:update speed/duplex for NETDEV_CHANGE
From: Weiping Pan @ 2011-10-31 14:19 UTC (permalink / raw)
  To: netdev; +Cc: fubar, andy, linux-kernel, Weiping Pan
In-Reply-To: <4EAE0D9A.9060408@gmail.com>

Zheng Liang (lzheng@redhat.com) found a bug: when bonding is configured with
the arp monitor, the bonding driver sometimes cannot get the speed and duplex
from its slaves and assumes them to be 100Mb/sec and Full; see
/proc/net/bonding/bond0.
The problem does not occur when miimon is used.

(Taking igb as an example)
The reason is that after dev_open() in bond_enslave(),
bond_update_speed_duplex() calls igb_get_settings(), but that function
runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
because igb gets an error value for status.
So even though dev_open() has been called, the device is not really ready to
report its settings.

It is probably only safe to call igb_get_settings() after the message
"igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX" shows up.

So I prefer to update the speed and duplex of a slave when it receives a
NETDEV_CHANGE/NETDEV_UP event.

Signed-off-by: Weiping Pan <wpan@redhat.com>
---
 drivers/net/bonding/bond_main.c |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c34cc1e..f5458eb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3220,6 +3220,7 @@ static int bond_slave_netdev_event(unsigned long event,
 {
 	struct net_device *bond_dev = slave_dev->master;
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave = NULL;

 	switch (event) {
 	case NETDEV_UNREGISTER:
@@ -3230,20 +3231,16 @@ static int bond_slave_netdev_event(unsigned long event,
 				bond_release(bond_dev, slave_dev);
 		}
 		break;
+	case NETDEV_UP:
 	case NETDEV_CHANGE:
-		if (bond->params.mode == BOND_MODE_8023AD || bond_is_lb(bond)) {
-			struct slave *slave;
-
-			slave = bond_get_slave_by_dev(bond, slave_dev);
-			if (slave) {
-				u32 old_speed = slave->speed;
-				u8  old_duplex = slave->duplex;
-
-				bond_update_speed_duplex(slave);
+		slave = bond_get_slave_by_dev(bond, slave_dev);
+		if (slave) {
+			u32 old_speed = slave->speed;
+			u8  old_duplex = slave->duplex;

-				if (bond_is_lb(bond))
-					break;
+			bond_update_speed_duplex(slave);

+			if (bond->params.mode == BOND_MODE_8023AD) {
 				if (old_speed != slave->speed)
 					bond_3ad_adapter_speed_changed(slave);
 				if (old_duplex != slave->duplex)
-- 
1.7.4
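
The timing problem in the commit message can be sketched with a user-space
model. This is not the bonding driver: every sk_ name is hypothetical, the
link_ready flag stands in for the driver being able to report its settings,
and the 100/Full fallback mirrors the values the commit message says bonding
assumes when the query fails.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_event { SK_NETDEV_UP, SK_NETDEV_CHANGE, SK_NETDEV_UNREGISTER };
enum sk_mode { SK_MODE_ACTIVE_BACKUP, SK_MODE_8023AD };

#define SK_SPEED_100   100
#define SK_DUPLEX_FULL 1

/* Hypothetical NIC: settings are only readable once the link is ready. */
struct sk_nic {
	bool link_ready;
	int speed;
	int duplex;
};

/* Hypothetical slave: caches the speed/duplex last read from the NIC. */
struct sk_slave {
	struct sk_nic *nic;
	int speed;
	int duplex;
	int ad_speed_changes;	/* counts notifications sent in 802.3ad mode */
};

/* Read settings from the NIC, falling back to 100/Full if it is not ready. */
static void sk_update_speed_duplex(struct sk_slave *slave)
{
	assert(slave && slave->nic);

	slave->speed = SK_SPEED_100;
	slave->duplex = SK_DUPLEX_FULL;
	if (!slave->nic->link_ready)
		return;
	slave->speed = slave->nic->speed;
	slave->duplex = slave->nic->duplex;
}

/* Refresh the cached settings on UP/CHANGE regardless of bonding mode. */
static void sk_slave_event(struct sk_slave *slave, enum sk_mode mode,
			   enum sk_event event)
{
	switch (event) {
	case SK_NETDEV_UP:
	case SK_NETDEV_CHANGE: {
		int old_speed = slave->speed;

		sk_update_speed_duplex(slave);
		/* Only 802.3ad needs to be told that the speed changed. */
		if (mode == SK_MODE_8023AD && old_speed != slave->speed)
			slave->ad_speed_changes++;
		break;
	}
	default:
		break;
	}
}

/* Enslave before the link is ready, then deliver a CHANGE event. */
static int sk_speed_after_link_up(void)
{
	struct sk_nic nic = { false, 1000, SK_DUPLEX_FULL };
	struct sk_slave slave = { &nic, 0, 0, 0 };

	sk_update_speed_duplex(&slave);	/* too early: falls back to 100 */
	nic.link_ready = true;
	sk_slave_event(&slave, SK_MODE_ACTIVE_BACKUP, SK_NETDEV_CHANGE);
	return slave.speed;
}
```

In this model the slave first records the 100/Full fallback and only picks up
the NIC's 1000 once the CHANGE event arrives, which is the behaviour the patch
aims for in every bonding mode.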

^ permalink raw reply related
