Netdev List
* Re: Weird TCP retransmit behaviour in recent kernels
From: Ilpo Järvinen @ 2010-05-15 22:45 UTC (permalink / raw)
  To: Michael Smith; +Cc: Netdev
In-Reply-To: <alpine.LNX.2.00.1005141815560.3517@pentagram.it.hurts.ca>

On Fri, 14 May 2010, Michael Smith wrote:

> I'm struggling with TCP sessions stalling when Windows XP SP2 clients
> connect to a SUSE Linux Enterprise 11 server (kernel 2.6.27.x). The
> problem doesn't occur with kernel 2.6.18.8 on the server, and I'm
> wondering if something's changed since then in the retransmit logic.
> 
> It seems like when consecutive packets are lost, the SLES11
> server retransmits the first packet when the timeout fires. The client
> ACKs, but the server doesn't retransmit the next lost packet; instead,
> it sends a couple more new packets,

Which is an expected and desired change, known as F-RTO (RFC 5682).

> which don't get ACKed.

This is where your problem is: they should get ACKed in a _compliant_
network (with duplicate ACKs).

> The new packets don't show up in Wireshark - either something in the 
> network is dropping them,

There's some non-compliant middlebox in the network?

> or maybe Windows doesn't forward them to WinPcap because
> there's a hole in the sequence. The timeout fires again after double
> the time, and the second packet is retransmitted and ACKed, then
> more brand new packets are sent out. The transfer quickly grinds to a
> halt.
>
> There's a WAN and VPN between the clients and the server. HTTP downloads
> from the server stall at various points depending on the client. The
> point at which the connection stalls seems to be dependent on latency.
> For example, if the RTT to the client is 12 ms, the connection might
> usually stall after 120 KB; if it's 20 ms, it might stall at 1200 KB.
> 
> The problem doesn't occur when a Windows client talks to a Windows
> server.  When a Linux client talks to the SLES11 server, the connection
> doesn't stall completely but slows to a crawl (~3 KB/sec, as opposed to
> typical 50-200 KB/sec).
> 
> I was able to work around the problem for most clients by locking the
> TCP congestion window to a maximum of 6 on the SLES11 server. Some sites
> are pathologically bad and the connection stalls unless I lock the
> congestion window to 1 (!!).
> 
> I've put up a couple of sample traces from a pathological site where
> the problem shows up with cwnd locked to 3:
> 
> http://www.hurts.ca/sles11.router.pcap.gz - view from the server's firewall
> http://www.hurts.ca/sles11.windows.pcap.gz - view from a client PC
> 
> On the firewall, you can see the problem around packets 93-104. The server
> sends sequence 66781, 68041, 69301; retransmits 66781, gets an ACK, then
> sends 70561, 71821; retransmits 68041, gets an ACK, then sends 73081,
> 74341, and so on. On the client, the "future" sequence packets after
> the ACK never show up in Wireshark.
> 
> I've tried all of the obvious things:
> - disabling TCP segment/checksum offloading functions on client and server;
> - disabling SACK;
> - trying all available congestion control algorithms on SLES11
>   (cubic, reno, veno, illinois);
> - turning off anti-virus on the client.
> 
> The only 100% reliable workaround seems to be to proxy the connections
> through a kernel 2.6.18.8 machine on the same subnet. It seems like
> the problem exists with a vanilla 2.6.31 kernel, too.
> 
> Has anyone seen something like this before? Any ideas where to go next? I'm
> pretty sure there's nothing strange in the network - just plain old Cisco
> routers and site-to-site VPNs.

Some have seen similar phenomena; every time it has been a fault in some
middlebox/peer that does not do what it should. You can disable F-RTO
using the tcp_frto sysctl if you like. However, I disagree with you, as I'm
pretty sure there is some broken middlebox in the network (one that is trying
to be too intelligent).
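
For readers following along, the decision F-RTO makes after a retransmission
timeout can be sketched roughly as below. This is a simplified reading of the
basic algorithm in RFC 5682, not the kernel's implementation; the enum and
function names are made up for illustration.

```c
/* Simplified sketch of the basic F-RTO decision (RFC 5682).
 * All names are hypothetical; this is not kernel code. */

enum frto_ack {
    ACK_ADVANCES,   /* cumulative ACK moves forward */
    ACK_DUPLICATE   /* duplicate ACK, the window does not move */
};

enum frto_verdict {
    FRTO_SPURIOUS_TIMEOUT,  /* original segments were only delayed */
    FRTO_GENUINE_LOSS       /* fall back to conventional RTO recovery */
};

/* first/second are the two ACKs seen after the RTO retransmission. */
enum frto_verdict frto_classify(enum frto_ack first, enum frto_ack second)
{
    /* A duplicate ACK right after the retransmission means real loss. */
    if (first == ACK_DUPLICATE)
        return FRTO_GENUINE_LOSS;

    /* The first ACK advanced: at this point the sender transmits up to
     * two NEW segments instead of retransmitting the next old one. */

    /* A duplicate ACK for those new segments still means real loss. */
    if (second == ACK_DUPLICATE)
        return FRTO_GENUINE_LOSS;

    /* Both ACKs advanced the window: the timeout was spurious. */
    return FRTO_SPURIOUS_TIMEOUT;
}
```

In the reported trace neither outcome is reached: the two new segments never
arrive, so no ACK of either kind comes back and the retransmission timer
simply fires again.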

-- 
 i.


* Re: [PATCH 13/20] net/caif: Use kzalloc
From: Sjur Brændeland @ 2010-05-15 22:54 UTC (permalink / raw)
  To: Julia Lawall; +Cc: David S. Miller, netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005132203150.6282@ask.diku.dk>

Julia Lawall <julia@diku.dk> wrote:
>Use kzalloc rather than the combination of kmalloc and memset.

Thank you, this looks good to me.
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
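
As a userspace analogy only (the patch itself is kernel code and is not shown
here), the same cleanup looks like replacing malloc() plus memset() with a
single calloc() call. The function names below are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, then clear in a second step.
 * This mirrors the kmalloc() + memset() pattern. */
void *alloc_then_clear(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);
    return p;
}

/* After: one call that returns zeroed memory.
 * This mirrors what kzalloc() does in the kernel. */
void *alloc_zeroed(size_t size)
{
    return calloc(1, size);
}
```

Both helpers return zero-filled memory or NULL; the second form is shorter and
cannot get the size out of sync between the two calls.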


* [PATCH REPOST] PCI: Disable MSI for MCP55 on P5N32-E SLI
From: Ben Hutchings @ 2010-05-16  1:28 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-pci, netdev

As reported in <http://bugs.debian.org/552299>, MSI appears to be
broken for this on-board device.  We already have a quirk for the
P5N32-SLI Premium; extend it to cover both variants of the board.

Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org
---
 drivers/pci/quirks.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 27c0e6e..4807825 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2218,15 +2218,16 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SERVERWORKS,
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8132_BRIDGE,
 			 ht_enable_msi_mapping);
 
-/* The P5N32-SLI Premium motherboard from Asus has a problem with msi
+/* The P5N32-SLI motherboards from Asus have a problem with msi
  * for the MCP55 NIC. It is not yet determined whether the msi problem
  * also affects other devices. As for now, turn off msi for this device.
  */
 static void __devinit nvenet_msi_disable(struct pci_dev *dev)
 {
-	if (dmi_name_in_vendors("P5N32-SLI PREMIUM")) {
+	if (dmi_name_in_vendors("P5N32-SLI PREMIUM") ||
+	    dmi_name_in_vendors("P5N32-E SLI")) {
 		dev_info(&dev->dev,
-			 "Disabling msi for MCP55 NIC on P5N32-SLI Premium\n");
+			 "Disabling msi for MCP55 NIC on P5N32-SLI\n");
 		dev->no_msi = 1;
 	}
 }
-- 
1.7.0.3





* Re: [PATCH] rndis_host: Poll status channel before control channel
From: David Miller @ 2010-05-16  5:54 UTC (permalink / raw)
  To: ben; +Cc: dbrownell, john.carr, netdev, vzeeaxwl, herton
In-Reply-To: <1273930635.2564.23.camel@localhost>

From: Ben Hutchings <ben@decadent.org.uk>
Date: Sat, 15 May 2010 14:37:15 +0100

> On Wed, 2010-05-12 at 23:42 -0700, David Miller wrote:
>> From: Ben Hutchings <ben@decadent.org.uk>
>> Date: Tue, 20 Apr 2010 00:08:28 +0100
>> 
>> > Some RNDIS devices don't respond on the control channel until polled
>> > on the status channel.  In particular, this was reported to be the
>> > case for the 2Wire HomePortal 1000SW.
>> > 
>> > This is roughly based on a patch by John Carr <john.carr@unrouted.co.uk>
>> > which is reported to be needed for use with some Windows Mobile devices
>> > and which is currently applied by Mandriva.
>> > 
>> > Reported-by: Mark Glassberg <vzeeaxwl@myfairpoint.net>
>> > Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
>> > Tested-by: Mark Glassberg <vzeeaxwl@myfairpoint.net>
>> > ---
>> > Note that this change hasn't yet been tested with any other RNDIS
>> > devices.  John, can you confirm whether this also handles the WinMob
>> > devices?
>> 
>> Still waiting for this to get tested.  Is there really nobody in the
>> world with RNDIS devices who can test this patch?  If so, maybe that's
>> a good reason to not apply it :-))))
> 
> This has been in Debian unstable since 1 May and I haven't seen any
> fall-out yet.  However I acknowledge that absence of evidence is not
> evidence of absence.

I think I'll toss it into net-next-2.6 and we'll see if any monsters
come out of that.

Thanks.


* Re: [PATCH 1/4] bridge: netpoll cleanup
From: David Miller @ 2010-05-16  6:12 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, bridge
In-Reply-To: <20100510193320.775633381@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 10 May 2010 12:31:08 -0700

> Move code around so that the ifdef for NETPOLL_CONTROLLER doesn't have to
> show up in the main code path. The control functions should be in helpers
> that are only compiled if needed.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.


* Re: [PATCH 2/4] bridge: change console message interface
From: David Miller @ 2010-05-16  6:12 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, bridge
In-Reply-To: <20100510193320.859557093@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 10 May 2010 12:31:09 -0700

> Use one set of macros for all bridge messages.
> 
> Note: can't use netdev_XXX macros because bridge is purely
> virtual and has no device parent.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.


* Re: [PATCH 3/4] bridge: netfilter use net_ratelimit
From: David Miller @ 2010-05-16  6:12 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, bridge
In-Reply-To: <20100510193320.936126854@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 10 May 2010 12:31:10 -0700

> The function __br_dnat_complain is basically reimplementing existing
> net_ratelimit.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

This code is no longer there after the recent netfilter merges.


* Re: [PATCH 4/4] bridge: update sysfs link names if port device names have changed
From: David Miller @ 2010-05-16  6:12 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, bridge, simon
In-Reply-To: <20100510193321.020451781@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 10 May 2010 12:31:11 -0700

> From: Simon Arlott <simon@fire.lp0.eu>
> 
> Links for each port are created in sysfs using the device
> name, but this could be changed after being added to the
> bridge.
> 
> As well as being unable to remove interfaces after this
> occurs (because userspace tools don't recognise the new
> name, and the kernel won't recognise the old name), adding
> another interface with the old name to the bridge will
> cause an error trying to create the sysfs link.
> 
> This fixes the problem by listening for NETDEV_CHANGENAME
> notifications and renaming the link.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=12743
> 
> Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.


* Re: [patch 2.6.35 00/25] WiMAX pull request
From: David Miller @ 2010-05-16  6:24 UTC (permalink / raw)
  To: inaky; +Cc: netdev, wimax, inaky.perez-gonzalez
In-Reply-To: <cover.1273708027.git.inaky.perez-gonzalez@intel.com>

From: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Date: Fri, 14 May 2010 14:44:59 -0700

> The following changes since commit 2b0b05ddc04b6d45e71cd36405df512075786f1e:
>   Marcel Holtmann (1):
>         Bluetooth: Fix issues where sk_sleep() helper is needed now
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax.git master
> 
> Patches follow for reviewing convenience.

Pulled, thanks a lot.


* Re: [PATCH 0/6] netns support in the kobject layer
From: David Miller @ 2010-05-16  6:26 UTC (permalink / raw)
  To: greg
  Cc: ebiederm, gregkh, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev
In-Reply-To: <20100506200404.GA21805@kroah.com>

From: Greg KH <greg@kroah.com>
Date: Thu, 6 May 2010 13:04:04 -0700

> On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
>> 
>> With the tagged sysfs support finally merged into Greg's tree,
>> it is time for the last little bits of work to get the kobject
>> layer and network namespaces to play together properly.
>> 
>> These patches are roughly evenly divided between network layer work
>> and sysfs layer work.  Last time this conundrum came up I believe
>> we decided that the easiest way to handle this was for Greg to carry
>> all of the patches.  David, Greg does that still make sense?
> 
> That's fine, if I get David's ack on these.

Looks good to me:

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [Patch v10 0/3] net: reserve ports for applications using fixed port numbers
From: David Miller @ 2010-05-16  6:28 UTC (permalink / raw)
  To: amwang
  Cc: linux-kernel, opurdila, ebiederm, eric.dumazet, penguin-kernel,
	netdev, nhorman, xiaosuo, adobriyan
In-Reply-To: <20100505103033.5600.77502.sendpatchset@localhost.localdomain>

From: Amerigo Wang <amwang@redhat.com>
Date: Wed, 5 May 2010 06:26:34 -0400

> 
> Changes from the previous version:
> - Use 'true' and 'false' for bool's;
> - Fix some coding style problems;
> - Allow appending lines to bitmap proc file so that it will be
>   easier to add new bits.

Applied to net-next-2.6, thanks.


* Re: [PATCH] skge: use the DMA state API instead of the pci equivalents
From: David Miller @ 2010-05-16  6:29 UTC (permalink / raw)
  To: shemminger; +Cc: fujita.tomonori, netdev
In-Reply-To: <20100514183307.6c14f294@nehalam>

From: Stephen Hemminger <shemminger@linux-foundation.org>
Date: Fri, 14 May 2010 18:33:07 -0700

> On Wed, 28 Apr 2010 09:57:04 +0900
> FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:
> 
>> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> 
> Yes, this works fine. Sorry for the delay but that test system
> was offline for several months and the disk went bad.
> 
> Acked-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks.


* Re: [RFC 0/5] generic rx recycling
From: David Miller @ 2010-05-16  6:32 UTC (permalink / raw)
  To: sebastian; +Cc: netdev, tglx
In-Reply-To: <1273070870-7821-1-git-send-email-sebastian@breakpoint.cc>

From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Wed,  5 May 2010 16:47:45 +0200

> This series merges the rx recycling code, trying to come up with generic
> code. Recycling skbs from the tx path for incoming rx skips the memory
> allocator and improves latency during memory pressure.
> This is now used by just four drivers in the tree which were doing
> this on their own.

You're adding new unnecessary SMP locking to all of these drivers.

In the original gianfar code the recycle queue is accessed locklessly
using __skb_dequeue() et al.  But you're using the skb_dequeue()
interface in the generic version, which takes the SKB queue lock.
That is absolutely unnecessary where these drivers make these
calls, since they already need to have their chip RX path locked.
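
The difference between the two interfaces can be illustrated with a small
userspace model. This is only a sketch: the struct and function names are
hypothetical, and a C11 atomic_flag spinlock stands in for the SKB queue lock.

```c
#include <stdatomic.h>
#include <stddef.h>

struct buf {
    struct buf *next;
};

struct buf_queue {
    struct buf *head;
    atomic_flag lock;                 /* stand-in for the queue spinlock */
    unsigned long lock_acquisitions;  /* counts how often the lock was taken */
};

void queue_init(struct buf_queue *q)
{
    q->head = NULL;
    atomic_flag_clear(&q->lock);
    q->lock_acquisitions = 0;
}

/* Lockless push: the caller must already hold some outer lock. */
void queue_push_unlocked(struct buf_queue *q, struct buf *b)
{
    b->next = q->head;
    q->head = b;
}

/* Lockless pop, analogous in spirit to __skb_dequeue(). */
struct buf *queue_pop_unlocked(struct buf_queue *q)
{
    struct buf *b = q->head;

    if (b) {
        q->head = b->next;
        b->next = NULL;
    }
    return b;
}

/* Locked pop, analogous in spirit to skb_dequeue(): same work plus a
 * lock round trip that is wasted if the caller is already serialized. */
struct buf *queue_pop_locked(struct buf_queue *q)
{
    struct buf *b;

    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
    q->lock_acquisitions++;
    b = queue_pop_unlocked(q);
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
    return b;
}
```

When every caller already runs under the driver's RX path lock, the unlocked
variant gives the same result without the extra atomic operations.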


* Re: [PATCH net-next-2.6] net: adjust handle_macvlan to pass port struct to hook
From: David Miller @ 2010-05-16  6:48 UTC (permalink / raw)
  To: kaber; +Cc: jpirko, netdev
In-Reply-To: <4BE819B6.3020101@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Mon, 10 May 2010 16:35:34 +0200

> Jiri Pirko wrote:
>> Now there's a null check here and again in the hook. Looking at the bridge
>> bits, which are similar, the port structure is rcu_dereferenced right away in
>> handle_bridge and passed to the hook. Looks nicer.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> 
> Looks fine, thanks.
> 
> Acked-by: Patrick McHardy <kaber@trash.net>

Applied, thanks.


* Re: [PATCH] xfrm: fix policy unreferencing on larval drop
From: David Miller @ 2010-05-16  6:49 UTC (permalink / raw)
  To: timo.teras; +Cc: netdev
In-Reply-To: <1273146734-8022-1-git-send-email-timo.teras@iki.fi>

From: Timo Teras <timo.teras@iki.fi>
Date: Thu,  6 May 2010 14:52:14 +0300

> I mistakenly had the error path to use num_pols to decide how
> many policies we need to drop (cruft from earlier patch set
> version which did not handle socket policies right).
> 
> This is wrong since normally we do not keep explicit references
> (instead we hold reference to the cache entry which holds references
> to policies). drop_pols is set to num_pols if we are holding the
> references, so use that. Otherwise we eventually BUG_ON inside
> xfrm_policy_destroy due to premature policy deletion.
> 
> Signed-off-by: Timo Teras <timo.teras@iki.fi>

Applied, thanks a lot Timo.


* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
From: David Miller @ 2010-05-16  6:56 UTC (permalink / raw)
  To: therbert; +Cc: eric.dumazet, netdev
In-Reply-To: <AANLkTikLgHvtpCtBTKmJZBwixmZDHjRjGb1c59oAemli@mail.gmail.com>

From: Tom Herbert <therbert@google.com>
Date: Thu, 6 May 2010 08:12:57 -0700

> I'm contemplating changing SO_TIMESTAMP to not enable global
> timestamps, but only take the timestamp for a packet once the socket
> is identified and the timestamp flag is set (this is the technique
> done in FreeBSD and Solaris, so I believe the external semantics
> would still be valid).

This is not tenable.

Users have made it clear in the past that when they ask for a timestamp
they really want the timestamp as close to the device receive handling
path as possible.

Users basically really want timestamps in two places:

1) As near the device RX handling as possible

2) The point at which recvmsg() got the data

The former is obtainable from SO_TIMESTAMP and the latter from
gettimeofday().

So putting it way down to the point where we choose the socket isn't
going to work at all.

FreeBSD and Solaris combined have a tiny sliver of the number of users
we have to cater to, so they can have all kinds of latitude with which
to break things like that.  So saying they do something is like saying
"the moon was out tonight", it has no relevance on whether we are able
to do it too :-)

The real fix is to make the devices less stupid and give us timestamps
directly, and thanks to things like PTP support in hardware that's
actually more and more of a reality these days.


* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
From: David Miller @ 2010-05-16  6:57 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev
In-Reply-To: <1273162488.2853.43.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 May 2010 18:14:48 +0200

> [PATCH v2 net-next-2.6] net: Consistent skb timestamping
 ...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.


* Re: [PATCH 2.6.34-rc6] net: Improve ks8851 snl transmit performance
From: David Miller @ 2010-05-16  7:26 UTC (permalink / raw)
  To: Tristram.Ha; +Cc: ben, netdev, linux-kernel, x0066660, s-jan
In-Reply-To: <14385191E87B904DBD836449AA30269D021A64@MORGANITE.micrel.com>

From: "Ha, Tristram" <Tristram.Ha@Micrel.Com>
Date: Thu, 6 May 2010 15:50:27 -0700

> From: Tristram Ha <Tristram.Ha@micrel.com>
> 
> Under heavy transmission the driver will put 4 1514-byte packets in
> queue and stop the device transmit queue.  Only the last packet
> triggers the transmit done interrupt and wakes up the device
> transmit queue.  That means a bit of time is wasted when the CPU
> cannot send any more packets.
> 
> The new implementation triggers the transmit interrupt when the
> transmit buffer left is less than 3 packets.  The maximum transmit
> buffer size is 6144 bytes.  This allows the device transmit queue to
> be restarted sooner so that CPU can send more packets.
> 
> For TCP receiving it also has the benefit of not triggering any transmit interrupt at all.
> 
> There is a driver option no_tx_opt so that the driver can revert to
> original implementation.  This allows user to verify if the transmit
> performance actually improves.
> 
> Signed-off-by: Tristram Ha <Tristram.Ha@micrel.com>

First, if you want to post patches you have to format them properly as
ascii text with no longer than 80 column lines in your commit message.
I really don't want to hear about your email client being a reason
you can't do this properly :-)

Second, I don't think you can use the skb->ip_summed for this hacked
state tracking you are using.  The packet might be shared with other
entities, and therefore if you change the field it won't be correct
for them any more.


* Re: [PATCH 1/2] netdev/fec: fix performance impact from mdio poll operation
From: David Miller @ 2010-05-16  7:28 UTC (permalink / raw)
  To: bryan.wu; +Cc: s.hauer, gerg, amit.kucheria, netdev, linux-kernel
In-Reply-To: <1273199239-11057-2-git-send-email-bryan.wu@canonical.com>

From: Bryan Wu <bryan.wu@canonical.com>
Date: Fri,  7 May 2010 10:27:18 +0800

> BugLink: http://bugs.launchpad.net/bugs/546649
> BugLink: http://bugs.launchpad.net/bugs/457878
> 
> After introducing phylib support, users experienced a performance drop. That is
> because of the mdio polling operation of phylib. Use msleep to replace the busy
> waiting cpu_relax() and remove the warning message.
> 
> Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
> Acked-by: Andy Whitcroft <apw@canonical.com>

As you've already been told, making these MDIO interfaces fail silently
is not acceptable.

Please fix this bug properly and resubmit this patch series.

Thanks.


* Re: TCP-MD5 checksum failure on x86_64 SMP
From: David Miller @ 2010-05-16  7:30 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, bhaskie, bhutchings, netdev
In-Reply-To: <20100507103639.4f1a51fa@nehalam>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 7 May 2010 10:36:39 -0700

> On Fri, 07 May 2010 19:21:33 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> On Friday, 07 May 2010 at 10:14 -0700, Stephen Hemminger wrote:
>> 
>> > Forget the per cpu data; the pool should just be scrapped.
>> > 
>> > The only reason the pool exists is the crypto hash state, which
>> > should just be moved into the md5_info (88 bytes).  The pseudo
>> > header can just be generated on the stack before passing to the crypto
>> > code.
>> 
>> 
>> Sure, but I'm afraid there is no generic API to do that (if we want to
>> reuse crypto/md5.c code).
> 
> It looks like the pool is just an optimization to avoid opening too
> many crypto API connections.  This should only be an issue if offloading
> MD5.

It's an issue because creating a crypto API context is expensive, so this
influences our connection rates with MD5.


* Re: TCP-MD5 checksum failure on x86_64 SMP
From: David Miller @ 2010-05-16  7:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhaskie, shemminger, bhutchings, netdev
In-Reply-To: <1273219222.2261.11.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 07 May 2010 10:00:22 +0200

> [PATCH] tcp: fix MD5 (RFC2385) support
> 
> TCP MD5 support uses percpu data for temporary storage. It currently
> disables preemption so that the same storage cannot be reclaimed by another
> thread on the same cpu.
> 
> We also have to make sure a softirq handler won't try to use the same
> context. Various bug reports demonstrated corruptions.
> 
> The fix is to disable preemption and BH.
> 
> Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.


* Re: TCP-MD5 checksum failure on x86_64 SMP
From: David Miller @ 2010-05-16  7:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhaskie, shemminger, bhutchings, netdev
In-Reply-To: <1273267137.2325.31.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 07 May 2010 23:18:57 +0200

> [PATCH] net: Introduce sk_route_nocaps
> 
> TCP-MD5 sessions have intermittent failures when the route cache is
> invalidated. ip_queue_xmit() has to find a new route and calls
> sk_setup_caps(sk, &rt->u.dst), destroying the
> 
> sk->sk_route_caps &= ~NETIF_F_GSO_MASK
> 
> that MD5 desperately tries to make all over its way (from
> tcp_transmit_skb() for example).
> 
> So we send a few bad packets, and everything is fine when
> tcp_transmit_skb() is called again for this socket.
> 
> Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
> socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.
> 
> Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Since the connection does recover eventually, I'm stuffing this into
net-next-2.6 instead of net-2.6.

After some time in net-next-2.6, we can submit it to -stable.
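
The idea behind sk_route_nocaps, as the patch description puts it, is a
per-socket mask of capability bits that must stay cleared even when the
capabilities are recomputed for a new route. A toy model of that, with
hypothetical names and made-up flag values (not the kernel's NETIF_F_*
constants), might look like this:

```c
#include <stdint.h>

/* Illustrative feature bits only; the values are invented. */
#define FEAT_SG   (1u << 0)
#define FEAT_CSUM (1u << 1)
#define FEAT_GSO  (1u << 2)

struct sock_caps {
    uint32_t route_caps;    /* capabilities currently usable */
    uint32_t route_nocaps;  /* capabilities this socket must never use */
};

/* Recompute capabilities for a new route. Without the nocaps mask,
 * this step would silently re-enable bits that were cleared earlier. */
void setup_caps(struct sock_caps *sk, uint32_t dev_features)
{
    sk->route_caps = dev_features & ~sk->route_nocaps;
}

/* Permanently forbid some capabilities for this socket. */
void forbid_caps(struct sock_caps *sk, uint32_t flags)
{
    sk->route_nocaps |= flags;
    sk->route_caps &= ~flags;
}
```

With this shape, a lower layer can call setup_caps() as often as routes change
and the forbidden bits stay off.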


* Re: VLAN I/F's and TX queue.
From: David Miller @ 2010-05-16  7:40 UTC (permalink / raw)
  To: joakim.tjernlund; +Cc: kaber, eric.dumazet, netdev
In-Reply-To: <OFC83B46CB.0E764A8E-ONC125771F.005109E4-C125771F.00518338@transmode.se>

From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Date: Mon, 10 May 2010 16:50:20 +0200

> Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
>>
>> Joakim Tjernlund wrote:
>> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
>> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
>> >>>    the VLAN interface(s) since that is where the pkg is lost, why
>> >>>    isn't it so?
>> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
>> >> on eth0.XXX stats. What is wrong with this ?
>> >
>> > In this case one lost pkg is accounted for twice, once on eth0.1 and
>> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
>> > top of eth0.1
>> >
>> > I would at least expect eth0 to also account lost pkgs too.
>> > I was confused by the current accounting as I knew that
>> > the underlying HW I/F should be the only I/F that could
>> > drop pkgs.
>>
>> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
>> it reaches eth0, so it's only accounted on the upper devices.
> 
> hmm, I am afraid I don't follow this. Why would a pkg be dropped before
> it reaches eth0?

Because we have packet schedulers that sit before the device transmit
happens, and those packet schedulers enforce limits based upon
classification results or other criteria, and if those limits are
exceeded packets are dropped and NET_XMIT_CN is returned back up into
the transmit path of the networking stack.

The device never sees that packet get submitted to its ->ndo_start_xmit()
routine, and this is entirely intentional.  And it is entirely intentional
that NET_XMIT_CN gets passed up into the caller, where protocols such as
TCP can key off this information to make congestion control decisions.
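
A toy model of the accounting discussed in this thread is sketched below. It
is an illustration only: the struct, the function and the constant values are
hypothetical, and the real stack is considerably more involved. Each stacked
device counts a drop when the layer beneath it reports a failure, while the
bottom device's packet scheduler refuses the packet before the hardware
transmit routine would ever see it.

```c
#include <stddef.h>

/* Illustrative return codes, named after the ones mentioned above. */
enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_CN = 2 };

struct toy_dev {
    struct toy_dev *lower;      /* NULL for the physical device */
    unsigned int qdisc_limit;   /* scheduler queue limit (physical only) */
    unsigned int qdisc_len;     /* packets currently queued */
    unsigned long qdisc_drops;  /* drops counted by the scheduler */
    unsigned long tx_dropped;   /* drops counted in device statistics */
};

int toy_xmit(struct toy_dev *dev)
{
    int ret;

    if (dev->lower) {
        /* Stacked device: hand the packet down and account the
         * outcome reported by the lower layer. */
        ret = toy_xmit(dev->lower);
        if (ret != TOY_XMIT_SUCCESS)
            dev->tx_dropped++;
        return ret;
    }

    /* Physical device: the scheduler sits in front of the hardware.
     * Over the limit, the packet is dropped here and the congestion
     * code travels back up; the device stats are not touched. */
    if (dev->qdisc_len >= dev->qdisc_limit) {
        dev->qdisc_drops++;
        return TOY_XMIT_CN;
    }
    dev->qdisc_len++;
    return TOY_XMIT_SUCCESS;
}
```

With eth0.1.1 stacked on eth0.1 stacked on eth0, one refused packet raises
tx_dropped on both VLAN devices and on neither counter of eth0's device
statistics, which matches the behaviour described in the thread.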



* Re: [PATCH v2] sctp: Fix a race between ICMP protocol unreachable and connect()
From: David Miller @ 2010-05-16  7:46 UTC (permalink / raw)
  To: vladislav.yasevich; +Cc: yjwei, netdev, linux-sctp
In-Reply-To: <4BE81144.8020806@hp.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>
Date: Mon, 10 May 2010 09:59:32 -0400

> 
> 
> Wei Yongjun wrote:
>> [PATCH] sctp: delete active ICMP proto unreachable timer when free transport
>> 
>> A transport may be freed before the ICMP proto unreachable timer expires, so
>> we should delete the active ICMP proto unreachable timer when the transport
>> is going away.
>> 
>> Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
 ...
> ACK.  This fixes a race against close().  Although that will be fairly hard to
> do, it is possible.
> 
> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>

Applied, thanks.


* Re: [patch 0/3] s390: qeth patches for 2.6.35
From: David Miller @ 2010-05-16  7:50 UTC (permalink / raw)
  To: frank.blaschka; +Cc: netdev, linux-s390
In-Reply-To: <20100512053444.035939000@de.ibm.com>

From: frank.blaschka@de.ibm.com
Date: Wed, 12 May 2010 07:34:44 +0200

> here are some qeth patches for 2.6.35 (net-next).
> 
> shortlog:
> Ursula Braun (1)
> qeth: new message if OLM limit is reached
> 
> Frank Blaschka (2)
> qeth: exploit HW TX checksumming
> qeth: synchronize configuration interface

Looks good, all applied, thanks!


