Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Michael S. Tsirkin @ 2010-06-30 22:31 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: David Miller, arozansk, herbert.xu, quintela, kvm, virtualization,
	netdev, linux-kernel, ykaul, markmc
In-Reply-To: <4C2BC04B.3000100@codemonkey.ws>

On Wed, Jun 30, 2010 at 05:08:11PM -0500, Anthony Liguori wrote:
> On 06/29/2010 08:04 AM, Michael S. Tsirkin wrote:
> >On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote:
> >>From: "Michael S. Tsirkin"<mst@redhat.com>
> >>Date: Mon, 28 Jun 2010 13:08:07 +0300
> >>
> >>>Userspace virtio server has the following hack
> >>>so guests rely on it, and we have to replicate it, too:
> >>>
> >>>Use port number to detect incoming IPv4 DHCP response packets,
> >>>and fill in the checksum for these.
> >>>
> >>>The issue we are solving is that on linux guests, some apps
> >>>that use recvmsg with AF_PACKET sockets, don't know how to
> >>>handle CHECKSUM_PARTIAL;
> >>>The interface to return the relevant information was added
> >>>in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> >>>and older userspace does not use it.
> >>>One important user of recvmsg with AF_PACKET is dhclient,
> >>>so we add a work-around just for DHCP.
> >>>
> >>>Don't bother applying the hack to IPv6 as userspace virtio does not
> >>>have a work-around for that - let's hope guests will do the right
> >>>thing wrt IPv6.
> >>>
> >>>Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
> >>Yikes, this is awful too.
> >>
> >>Nothing in the kernel should be mucking around with procotol packets
> >>like this by default.  In particular, what the heck does port 67 mean?
> >>Locally I can use it for whatever I want for my own purposes, I don't
> >>have to follow the conventions for service ports as specified by the
> >>IETF.
> >>
> >>But I can't have the packet checksum state be left alone for port 67
> >>traffic on a box using virtio because you have this hack there.
> >>
> >>And yes it's broken on machines using the qemu thing, but at least the
> >>hack there is restricted to userspace.
> >Yes, and I think it was a mistake to add the hack there. This is what
> >prevented applications from using the new interface in the 3 years
> >since it was first introduced.
> 
> It's far more complicated than that.  dhclient is part of ISC's DHCP
> package.  They do not have a public SCM and instead require you to
> join their Software Guild to get access to it.
> 
> This problem was identified in one distribution and the patch was
> pushed upstream but because they did not have a public SCM, most
> other distributions did not see the fix until it appeared in a
> release.  ISC has a pretty long release cycle historically.
> 
> ISC's had the fix for a long time but there was a 3-year gap in
> their releases and since their SCM isn't public, users are stuck
> with the last release.
> 
> This hack makes sense in QEMU as we have a few hacks like this to
> fix broken guests.
>  A primary use of virtualization is to run old
> applications so it makes sense for us to do that.

IMO it was wrong to put it in qemu: originally, if a distro shipped
a broken virtio/dhclient combo, it was it's own bug to fix.
But now that qemu has shipped the work-around for so long,
broken guests seemed work. So we *still* see the bug re-surface in new guests.

And since they are fairly new, it is interesting to
get decent performance from them now.

> 
> I don't think it makes sense for vhost to do this.  These guests are
> so old that they don't have the requisite features to achieve really
> high performance anyway.
> 
> I've always thought making vhost totally transparent was a bad idea
> and this is one of the reasons.

It does not have to be fully transparent. You can insert your own ring
in the middle, and copy descriptors around.  And we stop on errors and
let userspace handle.  This will come handy if we get e.g. virtio bug
that we need to work around.

> We can do a lot of ugly things in
> userspace that we shouldn't be doing in the kernel.
> 
> Regards,
> 
> Anthony Liguori

QEMU is only userspace for the host. It is the hardware for the guest.
So IMO we should not be doing the ugly things there either.

-- 
MST

^ permalink raw reply

* Re: b44: Reset due to FIFO overflow.
From: James Courtier-Dutton @ 2010-06-30 22:20 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, erblichs, netdev
In-Reply-To: <20100630.132220.129754921.davem@davemloft.net>

On 30 June 2010 21:22, David Miller <davem@davemloft.net> wrote:
> From: James Courtier-Dutton <james.dutton@gmail.com>
> Date: Mon, 28 Jun 2010 11:17:59 +0100
>
>> Interesting, which hardware, apart from the b44, is it that "requires"
>> a hardware reset after a RX FIFO overflow.
>
> This problem is quite common, actually.
>
> Even though it shouldn't be, this is seemingly one of the least tested
> paths of a networking chip.
>
> You'd think the recovery would be easy, flush the fifos and drop the
> packet, then rewind the RX descriptor pointer.
>
> But it's not and I've seen everything from RX descriptor corruption
> to random DMA splats elsewhere corrupting memory entirely, as a result
> of a networking card taking a RX fifo overflow.
>

Well, I have just written a patch (see other thread) to try and reset
the FIFO instead of a complete HW reset.
How do I know if I have RX descriptor corruption, or random DMA splats?
I have not detected any problems so far.

^ permalink raw reply

* Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Anthony Liguori @ 2010-06-30 22:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, arozansk, herbert.xu, quintela, kvm, virtualization,
	netdev, linux-kernel, ykaul, markmc
In-Reply-To: <20100629130439.GD3603@redhat.com>

On 06/29/2010 08:04 AM, Michael S. Tsirkin wrote:
> On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote:
>    
>> From: "Michael S. Tsirkin"<mst@redhat.com>
>> Date: Mon, 28 Jun 2010 13:08:07 +0300
>>
>>      
>>> Userspace virtio server has the following hack
>>> so guests rely on it, and we have to replicate it, too:
>>>
>>> Use port number to detect incoming IPv4 DHCP response packets,
>>> and fill in the checksum for these.
>>>
>>> The issue we are solving is that on linux guests, some apps
>>> that use recvmsg with AF_PACKET sockets, don't know how to
>>> handle CHECKSUM_PARTIAL;
>>> The interface to return the relevant information was added
>>> in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
>>> and older userspace does not use it.
>>> One important user of recvmsg with AF_PACKET is dhclient,
>>> so we add a work-around just for DHCP.
>>>
>>> Don't bother applying the hack to IPv6 as userspace virtio does not
>>> have a work-around for that - let's hope guests will do the right
>>> thing wrt IPv6.
>>>
>>> Signed-off-by: Michael S. Tsirkin<mst@redhat.com>
>>>        
>> Yikes, this is awful too.
>>
>> Nothing in the kernel should be mucking around with procotol packets
>> like this by default.  In particular, what the heck does port 67 mean?
>> Locally I can use it for whatever I want for my own purposes, I don't
>> have to follow the conventions for service ports as specified by the
>> IETF.
>>
>> But I can't have the packet checksum state be left alone for port 67
>> traffic on a box using virtio because you have this hack there.
>>
>> And yes it's broken on machines using the qemu thing, but at least the
>> hack there is restricted to userspace.
>>      
> Yes, and I think it was a mistake to add the hack there. This is what
> prevented applications from using the new interface in the 3 years
> since it was first introduced.
>    

It's far more complicated than that.  dhclient is part of ISC's DHCP 
package.  They do not have a public SCM and instead require you to join 
their Software Guild to get access to it.

This problem was identified in one distribution and the patch was pushed 
upstream but because they did not have a public SCM, most other 
distributions did not see the fix until it appeared in a release.  ISC 
has a pretty long release cycle historically.

ISC's had the fix for a long time but there was a 3-year gap in their 
releases and since their SCM isn't public, users are stuck with the last 
release.

This hack makes sense in QEMU as we have a few hacks like this to fix 
broken guests.  A primary use of virtualization is to run old 
applications so it makes sense for us to do that.

I don't think it makes sense for vhost to do this.  These guests are so 
old that they don't have the requisite features to achieve really high 
performance anyway.

I've always thought making vhost totally transparent was a bad idea and 
this is one of the reasons.  We can do a lot of ugly things in userspace 
that we shouldn't be doing in the kernel.

Regards,

Anthony Liguori

^ permalink raw reply

* [PATCH 1/1] ehea: Allocate stats buffer with GFP_KERNEL
From: Brian King @ 2010-06-30 21:59 UTC (permalink / raw)
  To: ossthema; +Cc: osstklei, raisch, netdev, brking


Since ehea_get_stats calls ehea_h_query_ehea_port, which
can sleep, we can also sleep when allocating a page in
this function. This fixes some memory allocation failure
warnings seen under low memory conditions.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 drivers/net/ehea/ehea_main.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/net/ehea/ehea_main.c~ehea_get_stats_gfp drivers/net/ehea/ehea_main.c
--- linux-2.6/drivers/net/ehea/ehea_main.c~ehea_get_stats_gfp	2010-06-28 09:46:51.000000000 -0500
+++ linux-2.6-bjking1/drivers/net/ehea/ehea_main.c	2010-06-28 09:46:51.000000000 -0500
@@ -335,7 +335,7 @@ static struct net_device_stats *ehea_get
 
 	memset(stats, 0, sizeof(*stats));
 
-	cb2 = (void *)get_zeroed_page(GFP_ATOMIC);
+	cb2 = (void *)get_zeroed_page(GFP_KERNEL);
 	if (!cb2) {
 		ehea_error("no mem for cb2");
 		goto out;
_

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2] x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
From: David Miller @ 2010-06-30 21:34 UTC (permalink / raw)
  To: hpa
  Cc: jeffrey.t.kirsher, netdev, gospo, bphilips, tglx, mingo, x86,
	alexander.h.duyck
In-Reply-To: <4C2BB7F1.6050009@zytor.com>

From: "H. Peter Anvin" <hpa@zytor.com>
Date: Wed, 30 Jun 2010 14:32:33 -0700

> On 06/30/2010 02:28 PM, David Miller wrote:
>> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> Date: Tue, 29 Jun 2010 21:38:00 -0700
>> 
>>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>>
>>> x86 architectures can handle unaligned accesses in hardware, and it has
>>> been shown that unaligned DMA accesses can be expensive on Nehalem
>>> architectures.  As such we should overwrite NET_IP_ALIGN to resolve
>>> this issue.
>>>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>> Cc: x86@kernel.org
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> 
>> Can I get an x86'er ACK on this?  I can merge it in via net-next-2.6
>> which is probably most convenient for people who want to see the
>> networking performance effects of this change.
> 
> Acked-by: H. Peter Anvin <hpa@zytor.com>

Applied, thanks everyone.

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2] x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
From: H. Peter Anvin @ 2010-06-30 21:32 UTC (permalink / raw)
  To: David Miller
  Cc: jeffrey.t.kirsher, netdev, gospo, bphilips, tglx, mingo, x86,
	alexander.h.duyck
In-Reply-To: <20100630.142832.51275605.davem@davemloft.net>

On 06/30/2010 02:28 PM, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 29 Jun 2010 21:38:00 -0700
> 
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> x86 architectures can handle unaligned accesses in hardware, and it has
>> been shown that unaligned DMA accesses can be expensive on Nehalem
>> architectures.  As such we should overwrite NET_IP_ALIGN to resolve
>> this issue.
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: x86@kernel.org
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> 
> Can I get an x86'er ACK on this?  I can merge it in via net-next-2.6
> which is probably most convenient for people who want to see the
> networking performance effects of this change.

Acked-by: H. Peter Anvin <hpa@zytor.com>

	-hpa

^ permalink raw reply

* Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: David Miller @ 2010-06-30 21:30 UTC (permalink / raw)
  To: mst
  Cc: arozansk, herbert.xu, quintela, kvm, virtualization, netdev,
	linux-kernel, ykaul, markmc
In-Reply-To: <20100629130439.GD3603@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 29 Jun 2010 16:04:39 +0300

> Since using the module involves updating the management tools
> as well, if we go down this route it will be much less painful
> for everyone to do push it upstream.

Ok, you can make your case to Patrick McHardy and if he'll merge
it into his netfilter GIT tree I guess I'll have to take it :)

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2] x86: Align skb w/ start of cacheline on newer core 2/Xeon Arch
From: David Miller @ 2010-06-30 21:28 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: netdev, gospo, bphilips, tglx, mingo, hpa, x86, alexander.h.duyck
In-Reply-To: <20100630043728.9224.64191.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 29 Jun 2010 21:38:00 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> x86 architectures can handle unaligned accesses in hardware, and it has
> been shown that unaligned DMA accesses can be expensive on Nehalem
> architectures.  As such we should overwrite NET_IP_ALIGN to resolve
> this issue.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Can I get an x86'er ACK on this?  I can merge it in via net-next-2.6
which is probably most convenient for people who want to see the
networking performance effects of this change.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] ixgbe: add 1g PHY support for 82599
From: David Miller @ 2010-06-30 21:27 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, donald.c.skidmore
In-Reply-To: <20100630043017.8987.49958.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 29 Jun 2010 21:30:59 -0700

> From: Don Skidmore <donald.c.skidmore@intel.com>
> 
> Add support for 1G SFP+ PHY's to 82599.
> 
> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] bridge: add per bridge device controls for invoking iptables
From: David Miller @ 2010-06-30 21:27 UTC (permalink / raw)
  To: shemminger; +Cc: kaber, netdev
In-Reply-To: <20100630142440.68adfdb1@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 30 Jun 2010 14:24:40 -0700

> On Mon, 28 Jun 2010 14:47:00 +0200
> kaber@trash.net wrote:
> 
>> From: Patrick McHardy <kaber@trash.net>
>> 
>> Support more fine grained control of bridge netfilter iptables invocation
>> by adding seperate brnf_call_*tables parameters for each device using the
>> sysfs interface. Packets are passed to layer 3 netfilter when either the
>> global parameter or the per bridge parameter is enabled.
>> 
>> Signed-off-by: Patrick McHardy <kaber@trash.net>
> 
> Looks like a good idea.
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: David S. Miller <davem@davemloft.net>

Patrick since this is mostly netfilter'ish, please toss it into one
of your trees.

Thanks!

^ permalink raw reply

* Re: [PATCH] bridge: add per bridge device controls for invoking iptables
From: Stephen Hemminger @ 2010-06-30 21:24 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1277729220-11775-1-git-send-email-kaber@trash.net>

On Mon, 28 Jun 2010 14:47:00 +0200
kaber@trash.net wrote:

> From: Patrick McHardy <kaber@trash.net>
> 
> Support more fine grained control of bridge netfilter iptables invocation
> by adding seperate brnf_call_*tables parameters for each device using the
> sysfs interface. Packets are passed to layer 3 netfilter when either the
> global parameter or the per bridge parameter is enabled.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Looks like a good idea.

Acked-by: Stephen Hemminger <shemminger@vyatta.com>

-- 

^ permalink raw reply

* Re: TCP not triggering a fast retransmit?
From: David Miller @ 2010-06-30 21:22 UTC (permalink / raw)
  To: bhutchings; +Cc: novickivan, netdev, jmatthews, theath, herbert
In-Reply-To: <1277931829.4878.9.camel@localhost>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 22:03:49 +0100

> In that packet capture I see TCP payload lengths which are 2, 3 and 4
> times the usual MSS of 1448 bytes, which implies that GRO or LRO is in
> use.  In RHEL 5.4 the TCP stack does not ACK often enough in this case
> because it is missing this change:
> 
> commit ff9b5e0f08cb650d113eef0c654f931c0a7ae730
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Thu Aug 31 15:11:02 2006 -0700
> 
>     [TCP]: Fix rcv mss estimate for LRO

It certainly could be, I'll try make sure this gets rectified,
thanks!

^ permalink raw reply

* Re: [PATCH 0/2] cxgb4vf: small fixes to new driver
From: David Miller @ 2010-06-30 21:22 UTC (permalink / raw)
  To: leedom; +Cc: netdev
In-Reply-To: <201006301413.42716.leedom@chelsio.com>

From: Casey Leedom <leedom@chelsio.com>
Date: Wed, 30 Jun 2010 14:13:42 -0700

> P.S. Is the below in a FAQ and/or Wiki somewhere?  if not, I think it would make 
> a valuable addition.  (And if it already is in a FAQ/Wiki then I'm even more of 
> a Patch Bozo ...)

See the "DISCUSSION" section of "git help am"

^ permalink raw reply

* Re: [PATCH 0/2] cxgb4vf: small fixes to new driver
From: Casey Leedom @ 2010-06-30 21:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100630.140513.171484395.davem@davemloft.net>

| From: David Miller <davem@davemloft.net>
| Date: Wednesday, June 30, 2010 02:05 pm
| 
| I've applied both patches but you really need to fix up how you
| submit these changes.

  Thanks David.  I won't submit any more patches till I get my local git patch 
experts to vet the results.  You shouldn't be asked to do such mechanical patch 
fixups.  I appreciate your time in describing this.  Thanks!

Hoping not to be a Patch Bozo in future submissions,
Casey

P.S. Is the below in a FAQ and/or Wiki somewhere?  if not, I think it would make 
a valuable addition.  (And if it already is in a FAQ/Wiki then I'm even more of 
a Patch Bozo ...)

| 1) Your Subject: line becomes the commit message header.
| 
|    It should be a single statement, prefixed by "xxx: "
|    where "xxx" is the subsystem or driver you are making
|    changes to.  Here it would be "cxgb4vf: "
| 
|    It should not bleed into the rest of commit message body, like
|    your's did.
| 
| 2) You should not include all of the commit crap from GIT in the body
|    of your email.  I just have to edit all of that junk out before I
|    apply your patch.
| 
| A perfect email patch submission looks like this (my comments are in
| {} braces):
| 
| From: Me <me@wherever.com>
| Subject: [PATCH N/M] subsystem: Make whatever do whatever.
| 
| { Next line is optional, it goes into your email body and is used
|   when the patch author is someone other than the person sending
|   the email }
| 
| From: Real Author <cooldude@wherever.com>
| 
| This explains what this commit message is doing.
| 
| It gives code path traces, pretty ascii-art diagrams, and cross
| references when doing so helps other people understand the change.
| 
| Signed-off-by: Real Author <cooldude@wherever.com>
| Signed-off-by: Me <me@wherever.com>
| 
| { "---" marks the end of the commit message text, afterwards you
|    can add whatever auxiliary information you want people to know about
|    the patch, but for whatever reason it'snt appropriate for the
|    commit message. }
| 
| ---
| 
| This is some extra information I want the list to see when I post
| this patch.
| 
| { And finally the full patch comes next. }
| 
| Ok?  All of the GIT tools know exactly how to pick apart the above
| formatted patch and apply it to the tree with the author, etc. all
| set properly.
| 
| And this is the format output by "git send-email" so you can use it
| to help construct proper patch postings even if you don't want to
| use "git send-email" to send the email directly.

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/2] sfc: Add support for RX flow hash control
From: David Miller @ 2010-06-30 21:10 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1277910388.2082.15.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 16:06:28 +0100

> Allow ethtool to query the number of RX rings, the fields used in RX
> flow hashing and the hash indirection table.
> 
> Allow ethtool to update the RX flow hash indirection table.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/2] ethtool: Add support for control of RX flow hash indirection
From: David Miller @ 2010-06-30 21:10 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1277910323.2082.14.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 16:05:23 +0100

> Many NICs use an indirection table to map an RX flow hash value to one
> of an arbitrary number of queues (not necessarily a power of 2).  It
> can be useful to remove some queues from this indirection table so
> that they are only used for flows that are specifically filtered
> there.  It may also be useful to weight the mapping to account for
> user processes with the same CPU-affinity as the RX interrupts.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Looks good, applied, thanks Ben.

^ permalink raw reply

* Re: [PATCH net-next-2.6 3/3] vmxnet3: Remove incorrect implementation of ethtool_ops::get_flags()
From: David Miller @ 2010-06-30 21:10 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, sbhatewara, pv-drivers
In-Reply-To: <1277902060.2082.13.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 13:47:40 +0100

> Only some netdev feature flags correspond directly to ethtool feature
> flags.  ethtool_op_get_flags() does the right thing.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 2/3] netdev: Make ethtool_ops::set_flags() return -EINVAL for unsupported flags
From: David Miller @ 2010-06-30 21:10 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev
In-Reply-To: <1277902016.2082.12.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 13:46:56 +0100

> The documented error code for attempts to set unsupported flags (or
> to clear flags that cannot be disabled) is EINVAL, not EOPNOTSUPP.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6 1/3] ethtool: Change ethtool_op_set_flags to validate flags
From: David Miller @ 2010-06-30 21:10 UTC (permalink / raw)
  To: bhutchings
  Cc: netdev, linux-net-drivers, sgruszka, amit.salecha, amwang,
	anirban.chakraborty, dm, scofeldm, vkolluri, roprabhu,
	e1000-devel, buytenh, gallatin, brice, shemminger, jgarzik
In-Reply-To: <1277901872.2082.10.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Jun 2010 13:44:32 +0100

> ethtool_op_set_flags() does not check for unsupported flags, and has
> no way of doing so.  This means it is not suitable for use as a
> default implementation of ethtool_ops::set_flags.
> 
> Add a 'supported' parameter specifying the flags that the driver and
> hardware support, validate the requested flags against this, and
> change all current callers to pass this parameter.
> 
> Change some other trivial implementations of ethtool_ops::set_flags to
> call ethtool_op_set_flags().
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
> Acked-by: Jeff Garzik <jgarzik@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH 0/2] cxgb4vf: small fixes to new driver
From: David Miller @ 2010-06-30 21:05 UTC (permalink / raw)
  To: leedom; +Cc: netdev
In-Reply-To: <201006291552.14816.leedom@chelsio.com>


I've applied both patches but you really need to fix up how you
submit these changes.

1) Your Subject: line becomes the commit message header.

   It should be a single statement, prefixed by "xxx: "
   where "xxx" is the subsystem or driver you are making
   changes to.  Here it would be "cxgb4vf: "

   It should not bleed into the rest of commit message body, like
   your's did.

2) You should not include all of the commit crap from GIT in the body
   of your email.  I just have to edit all of that junk out before I
   apply your patch.

A perfect email patch submission looks like this (my comments are in
{} braces):

From: Me <me@wherever.com>
Subject: [PATCH N/M] subsystem: Make whatever do whatever.

{ Next line is optional, it goes into your email body and is used
  when the patch author is someone other than the person sending
  the email }

From: Real Author <cooldude@wherever.com>

This explains what this commit message is doing.

It gives code path traces, pretty ascii-art diagrams, and cross
references when doing so helps other people understand the change.

Signed-off-by: Real Author <cooldude@wherever.com>
Signed-off-by: Me <me@wherever.com>

{ "---" marks the end of the commit message text, afterwards you
   can add whatever auxiliary information you want people to know about
   the patch, but for whatever reason it'snt appropriate for the
   commit message. }

---

This is some extra information I want the list to see when I post
this patch.

{ And finally the full patch comes next. }

Ok?  All of the GIT tools know exactly how to pick apart the above
formatted patch and apply it to the tree with the author, etc. all
set properly.

And this is the format output by "git send-email" so you can use it
to help construct proper patch postings even if you don't want to
use "git send-email" to send the email directly.

^ permalink raw reply

* Re: TCP not triggering a fast retransmit?
From: Ben Hutchings @ 2010-06-30 21:03 UTC (permalink / raw)
  To: Ivan Novick; +Cc: netdev, jmatthews, Tim Heath, Herbert Xu
In-Reply-To: <AANLkTinJC6uS1GTbKJWL9AlEs2e5GleQvJbTUZrsQHaE@mail.gmail.com>

On Wed, 2010-06-30 at 11:04 -0700, Ivan Novick wrote:
> Hello all,
> 
> Attached is a packet capture from my application that is running on
> RedHat Enterprise Linux 5.4
> 
> I am seeing a Retransmission timeout but I was hoping this case would
> go into fast retransmit and not RTO.
> 
> I am wondering why did the sender not send more data?  If the sender
> was to send more data and extend the window then it would seem the
> duplicate acks or SACKS should trigger fast retransmit.
[...]

In that packet capture I see TCP payload lengths which are 2, 3 and 4
times the usual MSS of 1448 bytes, which implies that GRO or LRO is in
use.  In RHEL 5.4 the TCP stack does not ACK often enough in this case
because it is missing this change:

commit ff9b5e0f08cb650d113eef0c654f931c0a7ae730
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Aug 31 15:11:02 2006 -0700

    [TCP]: Fix rcv mss estimate for LRO

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4: sysctl to block responding on down interface
From: David Miller @ 2010-06-30 20:58 UTC (permalink / raw)
  To: shemminger; +Cc: joakim.tjernlund, netdev
In-Reply-To: <20100630135535.0e3a5ea1@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 30 Jun 2010 13:55:35 -0700

>> The fact that the syctl knob, when enabled, can't even function properly
>> in this "multiple interfaces with same address" case is another reason I
>> have decided to not apply this.
> 
> We already have sysctl knobs that exist to work around broken printer TCP,
> middleboxes and other broken stacks; my opinion this is just another one
> of those types of workarounds.

But that sysctl knob for the printer workaround doesn't break legitimate
configurations like this one does.

^ permalink raw reply

* Re: [PATCH net-next-2.6] ipv4: sysctl to block responding on down interface
From: Stephen Hemminger @ 2010-06-30 20:55 UTC (permalink / raw)
  To: David Miller; +Cc: joakim.tjernlund, netdev
In-Reply-To: <20100622.101537.245382806.davem@davemloft.net>

On Tue, 22 Jun 2010 10:15:37 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Fri, 11 Jun 2010 08:48:54 -0700
> 
> > The initial problem report was for a management application that used ICMP
> > to check link availability.
> 
> That application is buggy, and even if we apply this patch it will
> only properly function when speaking to systems in a non-default
> configuration.  And, it would be a non-default setting which, by your
> own admission below, cannot function properly in valid interface
> configurations.

It is a remote management system not a local application.
The management system is stupid, but it is hard to argue with
customers that other system is broken. 

> It's easier to fix the app to work in all cases than to add another
> sysctl knob hack for a segment of the world that can't seem to wrap
> their head around the fact that our behavior is valid, specified, and
> an explicit design decision meant to increase the chances of
> successful communication between two systems.
> 
> > The default is disabled to maintain compatibility with previous behavior.
> > This is not recommended for server systems because it makes fail over more
> > difficult, and does not account for configurations where multiple interfaces
> > have the same IP address.
> 
> The fact that the syctl knob, when enabled, can't even function properly
> in this "multiple interfaces with same address" case is another reason I
> have decided to not apply this.

We already have sysctl knobs that exist to work around broken printer TCP,
middleboxes and other broken stacks; my opinion this is just another one
of those types of workarounds.

^ permalink raw reply

* Re: [PATCH v2] bonding: check if clients MAC addr has changed
From: David Miller @ 2010-06-30 20:52 UTC (permalink / raw)
  To: fleitner; +Cc: bonding-devel, fubar, netdev, gospo
In-Reply-To: <1277835879-26834-1-git-send-email-fleitner@redhat.com>

From: Flavio Leitner <fleitner@redhat.com>
Date: Tue, 29 Jun 2010 15:24:39 -0300

> When two systems using bonding devices in adaptive load
> balancing (ALB) communicates with each other, an endless
> ping-pong of ARP replies starts between these two systems.
> 
> What happens? In the ALB mode, bonding driver keeps track
> of each client connected in a hash table, so it can do the
> receive load balancing (RLB). This hash table is updated
> when an ARP reply is received, then it scans for the client
> entry, updates its MAC address and flag it to be announced
> later. Therefore, two seconds later, the alb monitor runs
> and send for each updated client entry two ARP replies
> updating this specific client. The same process happens on
> the receiving system, causing the endless ping-pong of arp
> replies.
> 
> See more information including the relevant functions below:
 ...
> Signed-off-by: Flavio Leitner <fleitner@redhat.com>
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

Applied, thanks.

Probably one of the best networking commit log messages I've seen in a
long time.  Nice work.

^ permalink raw reply

* Re: [RFC] [PATCH] ethtool: Change ethtool_op_set_flags to validate flags
From: David Miller @ 2010-06-30 20:47 UTC (permalink / raw)
  To: sgruszka; +Cc: bhutchings, amit.salecha, netdev, amwang, anirban.chakraborty
In-Reply-To: <20100630132111.72559f9f@dhcp-lab-109.englab.brq.redhat.com>

From: Stanislaw Gruszka <sgruszka@redhat.com>
Date: Wed, 30 Jun 2010 13:21:11 +0200

> On Tue, 29 Jun 2010 17:01:01 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
>> This is the sort of change I'd like to see.
...
> Looks good for me as well.
> 
> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>

Me too.

Ben, please submit this formally.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox