Netdev List


* Re: Mostly revert "e1000/e1000e: Move PCI-Express device IDs over to e1000e"
From: Adrian Bunk @ 2008-01-30 23:58 UTC
  To: Linus Torvalds
  Cc: Randy Dunlap, Linux Kernel Mailing List, auke-jan.h.kok, jeff,
	David S. Miller, akpm, netdev
In-Reply-To: <alpine.LFD.1.00.0801301631400.3426@www.l.google.com>

On Wed, Jan 30, 2008 at 04:51:04PM +1100, Linus Torvalds wrote:
> 
> 
> On Tue, 29 Jan 2008, Randy Dunlap wrote:
> > 
> > Andrew was concerned about this when the driver was in -mm.
> > He asked for a patch that would set E1000E to same value as E1000
> > and I supplied that.  Auke acked it IIRC.  Other people vetoed it.  :(
> 
> Yeah, I've been discussing with Jeff and the gang.
> 
> I think we have agreed on a solution where the ID's show up in the old 
> driver if the new driver is not enabled at all.
> 
> (And as a side note: it turns out that the problem I experienced didn't 
> come from the new e1000e driver after all, so I'll be removing the 
> EXPERIMENTAL flag again).
> 
> So I'd suggest the final patch be something like this, but I'm sending it 
> out just as an example of how we could solve this, not necessarily as a 
> final patch.
> 
> Jeff, Auke, would something like this be acceptable? It makes it very 
> obvious in the driver table which entries are for the PCIE versions that 
> would be handled by the E1000E driver if it is enabled..
> 
> Untested, but as mentioned, this is more of a "this looks maintainable and 
> like it should solve the issues" rather than anything I was planning on 
> committing now.

I don't like it:

We should aim at having exactly one driver for one card.

Your patch has effects like e.g. a kernel behaving differently when 
adding and compiling the e1000e module later compared to having it 
originally in the .config.

And fun like "The card works on my machine with the e1000 driver, why 
doesn't it work in your machine with the e1000 driver?".

And in terms of maintainability, people will disable the e1000e driver 
in their kernel to work around bugs in it instead of reporting the 
bugs. Exactly what we don't want to happen.

And unless we want to keep this situation forever, we will have to 
remove the support for the PCI-Express adapters from the e1000 driver at 
some point anyway, so why not make a clean cut now? Whatever problems 
this causes will be the same now or in a few years.
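A minimal sketch of the compile-time selection Linus describes, assuming a hypothetical macro SKETCH_E1000E_ENABLED in place of the real Kconfig symbol and a simplified ID struct (not the kernel's struct pci_device_id). It also illustrates Adrian's concern: whether this table claims 8086:108C is decided when it is compiled, so the answer depends on the .config used at build time.

```c
#include <assert.h>

/* Simplified ID entry; not the kernel's struct pci_device_id. */
struct sketch_pci_id {
	unsigned short vendor;
	unsigned short device;
};

/* SKETCH_E1000E_ENABLED is a hypothetical stand-in for the Kconfig symbol;
 * it is fixed when this table is compiled (e.g. -DSKETCH_E1000E_ENABLED). */
static const struct sketch_pci_id sketch_e1000_ids[] = {
	{ 0x8086, 0x100E },	/* 82540EM, a classic PCI part: always listed */
#ifndef SKETCH_E1000E_ENABLED
	{ 0x8086, 0x108C },	/* 82573E (PCIe): only when e1000e is not built */
#endif
	{ 0, 0 }		/* terminator */
};

/* Returns 1 if this build of the table claims vendor:device. */
static int sketch_claims(unsigned short vendor, unsigned short device)
{
	const struct sketch_pci_id *id;

	assert(vendor != 0);	/* vendor 0 marks the end of the table */
	for (id = sketch_e1000_ids; id->vendor != 0; id++)
		if (id->vendor == vendor && id->device == device)
			return 1;
	return 0;
}
```

Built with -DSKETCH_E1000E_ENABLED, the same source stops claiming 8086:108C.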

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: e1000 full-duplex TCP performance well below wire speed
From: SANGTAE HA @ 2008-01-31  0:17 UTC
  To: Bruce Allen; +Cc: Linux Kernel Mailing List, netdev, Stephen Hemminger
In-Reply-To: <Pine.LNX.4.63.0801301610240.19938@trinity.phys.uwm.edu>

Hi Bruce,

On Jan 30, 2008 5:25 PM, Bruce Allen <ballen@gravity.phys.uwm.edu> wrote:
>
> In our application (cluster computing) we use a very tightly coupled
> high-speed low-latency network.  There is no 'wide area traffic'.  So it's
> hard for me to understand why any networking components or software layers
> should take more than milliseconds to ramp up or back off in speed.
> Perhaps we should be asking for a TCP congestion avoidance algorithm which
> is designed for a data center environment where there are very few hops
> and typical packet delivery times are tens or hundreds of microseconds.
> It's very different than delivering data thousands of km across a WAN.
>

If your network latency is low, any type of protocol should give you
more than 900Mbps. I guess the RTT between your two machines is
less than 4ms, and I remember that the throughputs of all
high-speed protocols (including tcp-reno) were more than 900Mbps with
a 4ms RTT. So my question is: which kernel version did you use with your
Broadcom NIC to get more than 900Mbps?

I have two machines connected by a gig switch, so I can check what
happens in my environment. Could you post the parameters you used
for netperf testing? Also, if you set any other parameters for your
testing, please post them here so that I can see whether the same
happens to me as well.

Regards,
Sangtae


* [PATCHv2] PHYLIB: Add BCM5482 PHY support
From: Nate Case @ 2008-01-31  0:28 UTC
  To: Maciej W. Rozycki; +Cc: Andy Fleming, netdev

This Broadcom PHY is similar to other bcm54xx devices.

Signed-off-by: Nate Case <ncase@xes-inc.com>
---
Note: This is a re-submission, correcting the bad indentation in the first patch

 drivers/net/phy/broadcom.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 29666c8..5b80358 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -141,6 +141,20 @@ static struct phy_driver bcm5461_driver = {
 	.driver 	= { .owner = THIS_MODULE },
 };
 
+static struct phy_driver bcm5482_driver = {
+	.phy_id		= 0x0143bcb0,
+	.phy_id_mask	= 0xfffffff0,
+	.name		= "Broadcom BCM5482",
+	.features	= PHY_GBIT_FEATURES,
+	.flags		= PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
+	.config_init	= bcm54xx_config_init,
+	.config_aneg	= genphy_config_aneg,
+	.read_status	= genphy_read_status,
+	.ack_interrupt	= bcm54xx_ack_interrupt,
+	.config_intr	= bcm54xx_config_intr,
+	.driver 	= { .owner = THIS_MODULE },
+};
+
 static int __init broadcom_init(void)
 {
 	int ret;
@@ -154,8 +168,13 @@ static int __init broadcom_init(void)
 	ret = phy_driver_register(&bcm5461_driver);
 	if (ret)
 		goto out_5461;
+	ret = phy_driver_register(&bcm5482_driver);
+	if (ret)
+		goto out_5482;
 	return ret;
 
+out_5482:
+	phy_driver_unregister(&bcm5461_driver);
 out_5461:
 	phy_driver_unregister(&bcm5421_driver);
 out_5421:
@@ -166,6 +185,7 @@ out_5411:
 
 static void __exit broadcom_exit(void)
 {
+	phy_driver_unregister(&bcm5482_driver);
 	phy_driver_unregister(&bcm5461_driver);
 	phy_driver_unregister(&bcm5421_driver);
 	phy_driver_unregister(&bcm5411_driver);
-- 
1.5.3.3
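Two small userspace sketches of what the patch relies on, with hypothetical stub functions standing in for phy_driver_register()/phy_driver_unregister(). The first is a masked ID comparison in the style of phylib's driver matching: with a mask of 0xfffffff0 the low four bits of the PHY identifier (the revision field) are ignored. The second has the same shape as broadcom_init(): if registering the new BCM5482 entry fails, the drivers registered before it are unregistered in reverse order.

```c
#include <assert.h>
#include <stdint.h>

/* Masked comparison: only the bits set in mask take part. */
static int sketch_phy_id_matches(uint32_t dev_id, uint32_t drv_id, uint32_t mask)
{
	return (dev_id & mask) == (drv_id & mask);
}

/* Hypothetical stand-ins for phy_driver_register()/phy_driver_unregister(). */
#define SKETCH_NDRV 4
static int sketch_registered[SKETCH_NDRV];
static int sketch_fail_at = -1;	/* index whose registration should fail */

static int sketch_register(int i)
{
	assert(i >= 0 && i < SKETCH_NDRV);
	if (i == sketch_fail_at)
		return -1;
	sketch_registered[i] = 1;
	return 0;
}

static void sketch_unregister(int i)
{
	sketch_registered[i] = 0;
}

/* A failure jumps to a label that unwinds only the registrations
 * that already succeeded, in reverse order. */
static int sketch_init(void)
{
	int ret;

	ret = sketch_register(0);	/* bcm5411 */
	if (ret)
		goto out_0;
	ret = sketch_register(1);	/* bcm5421 */
	if (ret)
		goto out_1;
	ret = sketch_register(2);	/* bcm5461 */
	if (ret)
		goto out_2;
	ret = sketch_register(3);	/* bcm5482, the new entry */
	if (ret)
		goto out_3;
	return 0;

out_3:
	sketch_unregister(2);
out_2:
	sketch_unregister(1);
out_1:
	sketch_unregister(0);
out_0:
	return ret;
}
```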





* Re: [PATCH] Optimize cxgb3 xmit path (a bit)
From: Divy Le Ray @ 2008-01-31  0:04 UTC
  To: Krishna Kumar; +Cc: jeff, netdev, davem
In-Reply-To: <20080130070016.29078.94125.sendpatchset@N20wks267652wss.in.ibm.com>

Krishna Kumar wrote:
> Changes:
> 	1. Add common code for stopping queue.
> 	2. No need to call netif_stop_queue followed by netif_wake_queue (and
> 	   in fact a netif_start_queue could have been used instead), instead
> 	   call stop_queue if required, and remove code under USE_GTS macro.
> 	3. There is no need to check for netif_queue_stopped, as the network
> 	   core guarantees that for us (I am sure every driver could remove
> 	   that check, e.g. e1000 - I have tested that path a few billion times
> 	   with about a few hundred thousand qstops but the condition never
> 	   hit even once).
>
> Thanks,
>   

Hi Krishna,

Thanks for the work.
There is however a bit more cleaning to do regarding the USE_GTS macro.
I'll post a patch soon that will take your points into account.

Cheers,
Divy


* Re: net-2.6.25 is no more...
From: David Miller @ 2008-01-31  0:36 UTC
  To: dlezcano; +Cc: netdev
In-Reply-To: <47A0B62D.3080005@fr.ibm.com>

From: Daniel Lezcano <dlezcano@fr.ibm.com>
Date: Wed, 30 Jan 2008 18:38:53 +0100

> David Miller wrote:
> > From: Daniel Lezcano <dlezcano@fr.ibm.com>
> > Date: Wed, 30 Jan 2008 10:03:09 +0100
> > 
> >> David Miller wrote:
> >>> Now that the bulk has been merged over and we are
> >>> actively working alongside Linus's tree I have moved
> >>> all current patch applying to net-2.6 instead of net-2.6.25,
> >>> so the current tree to use is:
> >>>
> >>> 	kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git
> >> This tree is for fixes only, right ? or shall we send enhancement 
> >> patches to net-2.6 until net-2.6.26 appears ?
> > 
> > The latter.
> 
> What will happen to the patches sent to net-2.6.25 a few days ago during 
> the merge ? Should I resend them against net-2.6 ?

I have them in my backlog, so just be patient as even just doing
a read-only pass on my email while being here at LCA08 is super
painful.


* Re: Mostly revert "e1000/e1000e: Move PCI-Express device IDs over to e1000e"
From: Frans Pop @ 2008-01-31  1:26 UTC
  To: Adrian Bunk
  Cc: akpm, auke-jan.h.kok, davem, jeff, linux-kernel, netdev,
	randy.dunlap, torvalds
In-Reply-To: <20080130235836.GV29368@does.not.exist>

Adrian Bunk wrote:
>> Jeff, Auke, would something like this be acceptable? It makes it very
>> obvious in the driver table which entries are for the PCIE versions that
>> would be handled by the E1000E driver if it is enabled..
> 
> I don't like it:
> We should aim at having exactly one driver for one card.

There is one thing I don't understand, but that may well be just me...

From Linus' original patch:
> +++ b/drivers/net/e1000/e1000_main.c
> +     INTEL_E1000_ETHERNET_DEVICE(0x108C),

So, apparently support for 8086:108c was removed from the e1000 driver.

From my lspci:
$ lspci -nn | grep Ether
01:00.0 Ethernet controller [0200]: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) [8086:108c] (rev 03)

But when I look at where that card is sitting:
$ readlink pci/devices/0000\:01\:00.0/driver
../../../../bus/pci/drivers/e1000

So, it's on the PCI bus, not on the PCI-Express bus (which I also have, but
which has no devices on it).

Or does the e1000e driver also support cards on the PCI bus?

If that's the case then the original changelog entry "Move PCI-Express
device IDs over to e1000e" is misleading as it's not only PCI-Express
devices...

Hmmm. Or does which driver is loaded decide on which bus the device ends up?

Confused,
FJP


* Re: [PATCH retry] bluetooth : add conn add/del workqueues to avoid connection fail
From: Dave Young @ 2008-01-31  1:29 UTC
  To: Marcel Holtmann; +Cc: netdev, David Miller, bluez-devel, linux-kernel
In-Reply-To: <1201688462.6218.47.camel@violet>

On Jan 30, 2008 6:21 PM, Marcel Holtmann <marcel@holtmann.org> wrote:
> Hi Dave,
>
> > > The Bluetooth hci_conn sysfs add/del operations are executed in the default workqueue.
> > > If del_conn is executed after a new add_conn for the same target,
> > > add_conn will fail with a "same kobject name" warning.
> > >
> > > This patch adds btaddconn and btdelconn workqueues and flushes the
> > > btdelconn workqueue in the add_conn function to avoid the issue.
> > >
> > > Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
> >
> > This looks good, applied, thanks Dave.
> >
> > I've queued this up for 2.6.25 merging, if you want me to
> > schedule it for -stable, just let me know.
>
> don't include it. I first have to stress test it on one of my machines.
> Besides that I have to do some coding style cleanups.

Sorry, I thought you forgot it.
Thanks.

BTW, for the bus_id bug, if there's no urgent need, I think we could do
the fix after Kay's driver core bus_id changes, which should be there
soon.

>
> Regards
>
> Marcel
>
>
>
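A userspace sketch of the ordering the patch enforces. The queues and names here are hypothetical stand-ins, not the kernel workqueue API: a deletion that is still pending when the same name is re-added produces the duplicate-name failure, and running pending deletions before the add (the role of flushing btdelconn in add_conn) avoids it.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX 8
#define SKETCH_NAMELEN 18	/* "xx:xx:xx:xx:xx:xx" plus NUL */

/* Registered names, standing in for sysfs kobjects. */
static char sketch_names[SKETCH_MAX][SKETCH_NAMELEN];
static int sketch_nnames;

/* Deferred deletions, standing in for work queued on btdelconn. */
static char sketch_pending[SKETCH_MAX][SKETCH_NAMELEN];
static int sketch_npending;

static void sketch_queue_del(const char *name)
{
	assert(sketch_npending < SKETCH_MAX && strlen(name) < SKETCH_NAMELEN);
	strcpy(sketch_pending[sketch_npending++], name);
}

/* Runs every queued deletion now, like flushing the btdelconn queue. */
static void sketch_flush_del(void)
{
	for (int i = 0; i < sketch_npending; i++) {
		for (int j = 0; j < sketch_nnames; j++) {
			if (strcmp(sketch_names[j], sketch_pending[i]) != 0)
				continue;
			/* Remove entry j by moving the last entry into its slot. */
			sketch_nnames--;
			if (j != sketch_nnames)
				strcpy(sketch_names[j], sketch_names[sketch_nnames]);
			break;
		}
	}
	sketch_npending = 0;
}

/* Adds name; returns -1 on a duplicate (the "same kobject name" case).
 * flush_first mirrors flushing btdelconn at the start of add_conn. */
static int sketch_add(const char *name, int flush_first)
{
	if (flush_first)
		sketch_flush_del();
	for (int j = 0; j < sketch_nnames; j++)
		if (strcmp(sketch_names[j], name) == 0)
			return -1;
	assert(sketch_nnames < SKETCH_MAX && strlen(name) < SKETCH_NAMELEN);
	strcpy(sketch_names[sketch_nnames++], name);
	return 0;
}
```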


* Re: [PATCH] [1/1] Deprecate tcp_tw_{reuse,recycle}
From: Andi Kleen @ 2008-01-31  2:59 UTC
  To: Ben Greear; +Cc: netdev
In-Reply-To: <47A0CE75.5080200@candelatech.com>

On Wednesday 30 January 2008 20:22, Ben Greear wrote:

> We use these features to enable creating very high numbers of short-lived
> TCP connections, primarily used as a test tool for other network
> devices.

Hopefully these other network devices don't do any NAT then
or don't otherwise violate the IP-matches-PAWS assumption.
Most likely they do actually, so enabling TW recycle
for testing is probably not even safe for you.

Modern systems have a lot of RAM so even without tw recycle
you should be able to get a very high number of connections.
A timewait socket is around 128 bytes on 64bit; this means
with a GB of memory you can already support > 8 Million TW sockets.
On 32bit it's even more.
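A quick check of Andi's arithmetic, taking his figure of about 128 bytes per TIME_WAIT socket on 64-bit as given and ignoring allocator and hash-table overhead: 2^30 bytes / 2^7 bytes per socket = 2^23 = 8,388,608 sockets, consistent with "> 8 Million".

```c
#include <assert.h>
#include <stdint.h>

/* How many objects of a given size fit in a memory budget (no overhead). */
static uint64_t sketch_sockets_per_budget(uint64_t budget_bytes, uint64_t bytes_per_socket)
{
	assert(bytes_per_socket > 0);
	return budget_bytes / bytes_per_socket;
}
```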

The optimization was originally written at a time when 64MB systems
were common.

If you don't care about data integrity have you considered just 
using some custom UDP based protocol or run one of the user space
TCP stacks and disable all data integrity features? If you do care about
data integrity then you should probably disable tw recycle anyways.

The deprecation period will be fairly long (several months), so you'll have 
enough time to migrate to another method.

> Perhaps just document the adverse effects and/or have it print out a
> warning on the console whenever the feature is enabled?

"This feature is insecure and does not work on the internet or with NAT" ? 

Somehow this just does not seem right to me. 

-Andi


* Re: [PATCH] [AF_RXRPC]: constify function pointer tables
From: David Miller @ 2008-01-31  3:05 UTC
  To: jengelh; +Cc: dhowells, netdev
In-Reply-To: <Pine.LNX.4.64.0801222047350.5722@fbirervta.pbzchgretzou.qr>

From: Jan Engelhardt <jengelh@computergmbh.de>
Date: Tue, 22 Jan 2008 20:47:51 +0100 (CET)

> Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>

Applied, thanks.


* Re: [resend][PATCH] Introducing socket mark socket option
From: David Miller @ 2008-01-31  3:08 UTC
  To: kaber; +Cc: panther, netfilter-devel, netdev, linux-arch
In-Reply-To: <47985DD4.3000706@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Thu, 24 Jan 2008 10:43:48 +0100

> Laszlo Attila Toth wrote:
> > A userspace program may wish to set the mark for each packet it sends
> > without using the netfilter MARK target. Changing the mark can be used
> > for mark based routing without netfilter or for packet filtering.
> > 
> > It requires CAP_NET_ADMIN capability.
> 
> 
> Looks good to me.

Applied, thanks.


* Re: [PATCH 1/1][NETNS] Add missing initialization of nl_info.nl_net in rtm_to_fib6_config()
From: David Miller @ 2008-01-31  3:09 UTC
  To: benjamin.thery; +Cc: netdev, den, dlezcano
In-Reply-To: <20080124103619.032868994@theryb.frec.bull.fr>

From: Benjamin Thery <benjamin.thery@bull.net>
Date: Thu, 24 Jan 2008 11:32:21 +0100

> Add missing initialization of the new nl_info.nl_net field in 
> rtm_to_fib6_config(). This will be needed to store the network namespace
> associated with the fib6_config struct.
> 
> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>

Applied, thanks.


* Re: [XFRM]: constify 'struct xfrm_type'
From: David Miller @ 2008-01-31  3:12 UTC
  To: dada1; +Cc: netdev
In-Reply-To: <20080124122621.3248c651.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 24 Jan 2008 12:26:21 +0100

> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.


* Re: [PATCH net-2.6.25][NETNS]: Fix race between put_net() and netlink_kernel_create().
From: David Miller @ 2008-01-31  3:31 UTC
  To: xemul; +Cc: den, netdev, devel, adobriyan
In-Reply-To: <47988F61.9070505@openvz.org>

From: Pavel Emelyanov <xemul@openvz.org>
Date: Thu, 24 Jan 2008 16:15:13 +0300

> The comment about "race free view of the set of network 
> namespaces" was a bit hasty. Look (there even can be only 
> one CPU, as discovered by Alexey Dobriyan and Denis Lunev):
 ...
> Instead, I propose to create the socket inside the init_net
> namespace and then re-attach it to the desired one right
> after the socket is created.
> 
> After doing this, we also have to be careful on error paths
> not to drop a reference on the namespace that we never
> took.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
> Acked-by: Denis Lunev <den@openvz.org>

Applied, thanks.


* Re: [NET]: should explicitely initialize atomic_t field in struct dst_ops
From: David Miller @ 2008-01-31  4:08 UTC
  To: dada1; +Cc: netdev
In-Reply-To: <20080124161117.5727c45c.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 24 Jan 2008 16:11:17 +0100

> All but one of the static struct dst_ops initializations lack an
> explicit initialization of the entries field.
> 
> As this field is atomic_t, we should use ATOMIC_INIT(0), and not
> rely on the atomic_t implementation.
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Applied, thanks Eric.
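A userspace imitation of what Eric's patch asks for, with hypothetical sketch_ names: the counter lives inside a wrapper struct that the kernel defines per architecture, so static initializers should go through the wrapper's init macro instead of assuming a plain zero-filled int. The struct below is a trimmed-down stand-in, not the real struct dst_ops.

```c
#include <assert.h>

/* Imitation of an atomic_t-style wrapper and its initializer macro. */
typedef struct { int counter; } sketch_atomic_t;
#define SKETCH_ATOMIC_INIT(i) { (i) }

/* Hypothetical, trimmed-down stand-in for struct dst_ops. */
struct sketch_dst_ops {
	int gc_thresh;
	sketch_atomic_t entries;
};

static struct sketch_dst_ops sketch_ops = {
	.gc_thresh = 1024,
	.entries   = SKETCH_ATOMIC_INIT(0),	/* explicit, as the patch asks */
};
```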


* Re: [PATCH] [NET] cpmac: convert to new Fixed PHY infrastructure (was: Re: fixed phy support (warning related to FIXED_MII_100_FDX))
From: Kumar Gala @ 2008-01-31  4:30 UTC
  To: avorontsov; +Cc: linuxppc-dev list, netdev, Jeff Garzik, Eugene Konev
In-Reply-To: <20080121204953.GA11384@localhost.localdomain>

> From: Anton Vorontsov <avorontsov@ru.mvista.com>
> Subject: [PATCH] [NET] cpmac: convert to new Fixed PHY infrastructure
>
> This patch converts cpmac to the new Fixed PHY infrastructure, though it
> doesn't fix all the problems with that driver. I didn't even bother to
> test-compile this patch, because the cpmac driver is broken in several
> ways:
>
> 1. This driver won't compile by itself because the header describing
>    its platform data is missing;
> 2. It assumes that fixed PHYs should be created by the ethernet driver.
>    That is a wrong assumption: creating fixed PHYs is the platform
>    code's job, and the driver must blindly accept the bus_id and phy_id
>    platform data variables instead.
>
> Also, it seems that the driver doesn't have any in-tree users, so
> there is nothing further to fix.
>
> The main purpose of that patch is to get rid of the following Kconfig warning:
>
> scripts/kconfig/conf -s arch/powerpc/Kconfig
> drivers/net/Kconfig:1713:warning: 'select' used by config symbol 'CPMAC' refers to undefined symbol 'FIXED_MII_100_FDX'
>
> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
> ---
> drivers/net/Kconfig |    4 +--
> drivers/net/cpmac.c |   55 ++++++++++++++++----------------------------------
> 2 files changed, 19 insertions(+), 40 deletions(-)

applied.

- k



* Re: [PACTH 1/1] drivers/net/usb: AX88178 100Mbps problem
From: David Hollis @ 2008-01-31  4:41 UTC
  To: Reinin Oyama; +Cc: netdev
In-Reply-To: <479DD97A.8020307@hasiru.net>


On Mon, 2008-01-28 at 22:32 +0900, Reinin Oyama wrote:
> Asix 88178 does not work under 100Mbps connection.
> This patch correct the problem.
> kernel version: 2.6.24

Please don't post the patch as a .gz; it's very small, so just post it as
text.

Otherwise:

Acked-by: David Hollis <dhollis@davehollis.com>

-- 
David Hollis <dhollis@davehollis.com>



* RE: Mostly revert "e1000/e1000e: Move PCI-Express device IDs over to e1000e"
From: Brandeburg, Jesse @ 2008-01-31  4:59 UTC
  To: Frans Pop, Adrian Bunk
  Cc: akpm, Kok, Auke-jan H, davem, jeff, linux-kernel, netdev,
	randy.dunlap, torvalds
In-Reply-To: <E1JKOBl-0007ne-EE@faramir.fjphome.nl>

Frans Pop wrote:
> There is one thing I don't understand, but that may well be just me...
> 
> From Linus' original patch:
>> +++ b/drivers/net/e1000/e1000_main.c
>> +     INTEL_E1000_ETHERNET_DEVICE(0x108C),
> 
> So, apparently support for 8086:108c was removed from the e1000
> driver. 

When it was enabled to be supported by e1000e.
 
> From my lspci:
> $ lspci -nn | grep Ether
> 01:00.0 Ethernet controller [0200]: Intel Corporation 82573E Gigabit
> Ethernet Controller (Copper) [8086:108c] (rev 03) 
> 
> But when I look at where that card is sitting:
> $ readlink pci/devices/0000\:01\:00.0/driver
> ../../../../bus/pci/drivers/e1000
> 
> So, it's on the PCI bus, not on the PCI-Express bus (which I also
> have, but 
> which has no devices on it).

82573E/L are PCIe devices only, don't let the use of "PCI configuration
space" confuse you.  All PCIe devices support PCI configuration space.
This allows systems with PCIe to work right (or mostly right) with all
the PCI supporting software like Linux.
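A sketch of how software can still recognise a PCIe function even though it is reached through ordinary PCI configuration space: PCIe functions advertise a PCI Express capability (ID 0x10) in the standard capability list. The function walks that list in a 256-byte config-space dump; the SKETCH_ macro names are hypothetical, while the offsets (Status register at 0x06 with bit 4 meaning "capability list present", Capabilities Pointer at 0x34) follow the PCI specification. How the dump is obtained (for example from a device's sysfs config file) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PCI_STATUS        0x06	/* Status register (16-bit, little-endian) */
#define SKETCH_PCI_STATUS_CAPLST 0x10	/* bit 4: capability list present */
#define SKETCH_PCI_CAP_PTR       0x34	/* pointer to the first capability */
#define SKETCH_PCI_CAP_ID_EXP    0x10	/* PCI Express capability ID */

/* Returns the offset of the PCI Express capability, or 0 if none is listed. */
static unsigned sketch_find_pcie_cap(const uint8_t cfg[256])
{
	uint16_t status;
	unsigned pos;
	int ttl = 48;	/* bounds the walk in case the list is malformed */

	assert(cfg != NULL);
	status = (uint16_t)(cfg[SKETCH_PCI_STATUS] | (cfg[SKETCH_PCI_STATUS + 1] << 8));
	if (!(status & SKETCH_PCI_STATUS_CAPLST))
		return 0;
	pos = cfg[SKETCH_PCI_CAP_PTR] & 0xfc;
	while (pos >= 0x40 && ttl-- > 0) {
		if (cfg[pos] == SKETCH_PCI_CAP_ID_EXP)	/* byte 0: capability ID */
			return pos;
		pos = cfg[pos + 1] & 0xfc;		/* byte 1: next pointer */
	}
	return 0;
}
```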
 
> Or does the e1000e driver also support cards on the PCI bus?

E1000e is targeted at the PCIe devices only.
 
> If that's the case then the original changelog entry "Move PCI-Express
> device IDs over to e1000e" is misleading as it's not only PCI-Express
> devices...

Unfortunate bit of confusion over terminology.
 
> Hmmm. Or does which driver is loaded decide on which bus the device
> ends up? 

Hope this helped,
  Jesse


* Re: [git patches] net driver fixes
From: Sam Ravnborg @ 2008-01-31  5:05 UTC
  To: Francois Romieu; +Cc: Jeff Garzik, David Miller, netdev, LKML
In-Reply-To: <20080130224711.GA24133@electric-eye.fr.zoreil.com>

On Wed, Jan 30, 2008 at 11:47:11PM +0100, Francois Romieu wrote:
> Sam Ravnborg <sam@ravnborg.org> :
> [...]
> > > -static struct pci_device_id sis190_pci_tbl[] __devinitdata = {
> > > +static struct pci_device_id sis190_pci_tbl[] = {
> > >  	{ PCI_DEVICE(PCI_VENDOR_ID_SI, 0x0190), 0, 0, 0 },
> > >  	{ PCI_DEVICE(PCI_VENDOR_ID_SI, 0x0191), 0, 0, 1 },
> > >  	{ 0, },
> > 
> > The __devinitdata is OK; it is the following __devinitdata that had
> > to be __devinitconst.
> 
> Strangely enough, removing the __devinitdata from the sis190_pci_tbl
> silences the error message here. Do you have an explanation?
gcc complains if you put const and non-const data in the same section,
which is the case in this driver.

The bugs are exposed now that __devinitdata is no longer an empty define.

	Sam
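A userspace sketch of the rule Sam describes, using GCC's section attribute with hypothetical section names (the kernel's __devinitdata and __devinitconst expand to its own section names): writable and const objects go to different sections. Moving sketch_name into SKETCH_INITDATA would make GCC report a section type conflict, the same class of error the sis190 table triggered. GCC and ELF specific.

```c
#include <assert.h>

/* Hypothetical section names standing in for the kernel's init sections. */
#define SKETCH_INITDATA  __attribute__((section(".sketch.initdata")))
#define SKETCH_INITCONST __attribute__((section(".sketch.initconst")))

struct sketch_pci_id {
	unsigned short vendor, device;
	unsigned long driver_data;
};

/* Writable table: goes in the writable section. */
static struct sketch_pci_id sketch_tbl[] SKETCH_INITDATA = {
	{ 0x1039, 0x0190, 0 },	/* SiS vendor ID, devices from the table above */
	{ 0x1039, 0x0191, 1 },
	{ 0, 0, 0 },
};

/* Const object: must not share a section with writable data. */
static const char sketch_name[] SKETCH_INITCONST = "sis190";
```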


* Re: ipcomp regression in 2.6.24
From: Herbert Xu @ 2008-01-31  5:32 UTC
  To: Marco Berizzi; +Cc: davem, Daniel.Beschorner, netdev
In-Reply-To: <BAY103-DAV204C57458B3E41DE1AA365B2360@phx.gbl>

On Wed, Jan 30, 2008 at 10:14:46AM +0100, Marco Berizzi wrote:
>
> Sorry for bother you again.
> I have applied to 2.6.24, but ipcomp doesn't work anyway.
> I have patched a clean 2.6.24 tree and I did a complete
> rebuild.
> With tcpdump I see both the esp packets going in/out but
> I don't see the clear packets on the interface.

After testing it here it looks like there is this little typo
which means that you can't actually use IPComp for anything
that's not compressible :)

[IPCOMP]: Fix reception of incompressible packets

I made a silly typo by entering IPPROTO_IP (== 0) instead of
IPPROTO_IPIP (== 4).  This broke the reception of incompressible
packets.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/net/ipv4/xfrm4_tunnel.c b/net/ipv4/xfrm4_tunnel.c
index 3268451..152f83d 100644
--- a/net/ipv4/xfrm4_tunnel.c
+++ b/net/ipv4/xfrm4_tunnel.c
@@ -50,7 +50,7 @@ static struct xfrm_type ipip_type = {
 
 static int xfrm_tunnel_rcv(struct sk_buff *skb)
 {
-	return xfrm4_rcv_spi(skb, IPPROTO_IP, ip_hdr(skb)->saddr);
+	return xfrm4_rcv_spi(skb, IPPROTO_IPIP, ip_hdr(skb)->saddr);
 }
 
 static int xfrm_tunnel_err(struct sk_buff *skb, u32 info)
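A toy sketch of why the one-word typo mattered: in this sketch the protocol argument passed to xfrm4_rcv_spi() is assumed to take part in the state lookup, so asking for protocol 0 (IPPROTO_IP) can never find a state registered under IPPROTO_IPIP (4). The table and lookup function are hypothetical simplifications, not the kernel's xfrm state hash.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Toy state entry keyed by (protocol, spi); not struct xfrm_state. */
struct sketch_state {
	int proto;
	uint32_t spi;
};

/* One tunnel state registered under IPPROTO_IPIP. As in xfrm_tunnel_rcv(),
 * the "spi" is the sender's address; this value is made up. */
static const struct sketch_state sketch_states[] = {
	{ IPPROTO_IPIP, 0xc0a80001u },
};

static const struct sketch_state *sketch_lookup(int proto, uint32_t spi)
{
	assert(proto >= 0 && proto <= 255);	/* IP protocol numbers fit in a byte */
	for (size_t i = 0; i < sizeof(sketch_states) / sizeof(sketch_states[0]); i++)
		if (sketch_states[i].proto == proto && sketch_states[i].spi == spi)
			return &sketch_states[i];
	return NULL;
}
```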


* RE: e1000 full-duplex TCP performance well below wire speed
From: Brandeburg, Jesse @ 2008-01-31  5:43 UTC
  To: Bruce Allen; +Cc: netdev, Carsten Aulbert, Henning Fehrmann, Bruce Allen
In-Reply-To: <Pine.LNX.4.63.0801301635470.19938@trinity.phys.uwm.edu>

Bruce Allen wrote:
> Hi Jesse,
> 
> It's good to be talking directly to one of the e1000 developers and
> maintainers.  Although at this point I am starting to think that the
> issue may be TCP stack related and nothing to do with the NIC.  Am I
> correct that these are quite distinct parts of the kernel?

Yes, quite.
 
> Important note: we ARE able to get full duplex wire speed (over 900
> Mb/s simultaneously in both directions) using UDP.  The problems occur
> only with TCP connections.

That eliminates bus bandwidth issues, probably, but small packets take
up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
 
>>> The test was done with various mtu sizes ranging from 1500 to 9000,
>>> with ethernet flow control switched on and off, and using reno and
>>> cubic as a TCP congestion control.
>> 
>> As asked in LKML thread, please post the exact netperf command used
>> to start the client/server, whether or not you're using irqbalanced
>> (aka irqbalance) and what cat /proc/interrupts looks like (you ARE
>> using MSI, right?)
> 
> I have to wait until Carsten or Henning wake up tomorrow (now 23:38 in
> Germany).  So we'll provide this info in ~10 hours.

I would suggest you try TCP_RR with a command line something like this:
netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K

I think you'll have to compile netperf with burst mode support enabled.

> I assume that the interrupt load is distributed among all four cores
> -- the default affinity is 0xff, and I also assume that there is some
> type of interrupt aggregation taking place in the driver.  If the
> CPUs were not able to service the interrupts fast enough, I assume
> that we would also see loss of performance with UDP testing.
> 
>> One other thing you can try with e1000 is disabling the dynamic
>> interrupt moderation by loading the driver with
>> InterruptThrottleRate=8000,8000,... (the number of commas depends on
>> your number of ports) which might help in your particular benchmark.
> 
> OK.  Is 'dynamic interrupt moderation' another name for 'interrupt
> aggregation'?  Meaning that if more than one interrupt is generated
> in a given time interval, then they are replaced by a single
> interrupt? 

Yes, InterruptThrottleRate=8000 means there will be no more than 8000
ints/second from that adapter, and if interrupts are generated faster
than that they are "aggregated."

Interestingly, since you are interested in ultra-low latency and may be
willing to give up some CPU for it during bulk transfers, you should try
InterruptThrottleRate=1 (which can generate up to 70000 ints/s).
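Back-of-the-envelope arithmetic for a cap of 8000 interrupts per second in one direction at 1 Gb/s. The framing numbers are assumptions added here, not figures from the thread: full-size 1500-byte untagged frames plus 14 bytes of header, 4 of FCS, 8 of preamble/SFD and 12 of inter-frame gap, i.e. 1538 bytes on the wire. That gives about 81,274 frames/s, so roughly 10 frames per interrupt and at least 125 µs between interrupts.

```c
#include <assert.h>

/* Bytes a frame occupies on the wire: MTU + header + FCS + preamble/SFD + IFG. */
static unsigned long sketch_wire_bytes(unsigned long mtu)
{
	return mtu + 14 + 4 + 8 + 12;
}

/* Maximum full-size frames per second at a given line rate. */
static unsigned long sketch_frames_per_sec(unsigned long bits_per_sec, unsigned long mtu)
{
	return bits_per_sec / (sketch_wire_bytes(mtu) * 8);
}

/* Minimum spacing between interrupts, in microseconds, for a throttle rate. */
static unsigned long sketch_min_irq_gap_us(unsigned long ints_per_sec)
{
	assert(ints_per_sec > 0);
	return 1000000UL / ints_per_sec;
}
```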

>> just for completeness can you post the dump of ethtool -e eth0 and
>> lspci -vvv?
> 
> Yup, we'll give that info also.
> 
> Thanks again!

You're welcome; it's an interesting discussion.  I hope we can come to a
good conclusion.

Jesse


* Re: ipcomp regression in 2.6.24
From: David Miller @ 2008-01-31  5:48 UTC
  To: herbert; +Cc: pupilla, Daniel.Beschorner, netdev
In-Reply-To: <20080131053221.GA4739@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 31 Jan 2008 16:32:21 +1100

> [IPCOMP]: Fix reception of incompressible packets
> 
> I made a silly typo by entering IPPROTO_IP (== 0) instead of
> IPPROTO_IPIP (== 4).  This broke the reception of incompressible
> packets.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied, thanks!


* Re: net-2.6.25 is no more...
From: Jeff Garzik @ 2008-01-31  6:09 UTC
  To: David Miller
  Cc: dlezcano, netdev, linux-wireless, John Linville
In-Reply-To: <20080130.054807.182676684.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

David Miller wrote:
> From: Daniel Lezcano <dlezcano@fr.ibm.com>
> Date: Wed, 30 Jan 2008 10:03:09 +0100
> 
>> David Miller wrote:
>>> Now that the bulk has been merged over and we are
>>> actively working alongside Linus's tree I have moved
>>> all current patch applying to net-2.6 instead of net-2.6.25,
>>> so the current tree to use is:
>>>
>>> 	kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git
>> This tree is for fixes only, right ? or shall we send enhancement 
>> patches to net-2.6 until net-2.6.26 appears ?
> 
> The latter.

So...  what to do about changes which are not bug fixes?  Such changes 
should not go upstream immediately, because that's the standard rule for 
merge windows.

And queueing them on our side just re-creates the same old situation we 
just changed away from, after all.

	Jeff


* Re: net-2.6.25 is no more...
From: David Miller @ 2008-01-31  6:21 UTC
  To: jeff
  Cc: dlezcano, netdev, linux-wireless, linville
In-Reply-To: <47A16611.3000602-o2qLIJkoznsdnm+yROfE0A@public.gmane.org>

From: Jeff Garzik
Date: Thu, 31 Jan 2008 01:09:21 -0500

> David Miller wrote:
> > The latter.
> 
> So...  what to do about changes which are not bug fixes?  Such changes 
> should not go upstream immediately, because that's the standard rule for 
> merge windows.
> 
> And queueing them on our side just re-creates the same old situation we 
> just changed away from, after all.

I'm saying that everything should go to the net-2.6 tree right now.

When the merge window closes, net-2.6 will be for bug fixes only and
I'll hold off on creating net-2.6.26 for about a week so that people
concentrate on regression fixes.

Yes, things will pile up but I think it's appropriate to hold off on
merging features into the 2.6.26 queue for just one week.


* Re: [PATCH] [1/1] Deprecate tcp_tw_{reuse,recycle}
From: Ben Greear @ 2008-01-31  6:37 UTC
  To: Andi Kleen; +Cc: netdev
In-Reply-To: <200801310359.07362.ak@suse.de>

Andi Kleen wrote:
> On Wednesday 30 January 2008 20:22, Ben Greear wrote:
>
>   
>> We use these features to enable creating very high numbers of short-lived
>> TCP connections, primarily used as a test tool for other network
>> devices.
>>     
>
> Hopefully these other network devices don't do any NAT then
> or don't otherwise violate the IP-matches-PAWS assumption.
> Most likely they do actually, so enabling TW recycle
> for testing is probably not even safe for you.
>
> Modern systems have a lot of RAM so even without tw recycle
> you should be able to get a very high number of connections.
> A timewait socket is around 128 bytes on 64bit; this means
> with a GB of memory you can already support > 8 Million TW sockets.
> On 32bit it's even more.
>   
I believe the problem was that all of my ports were used up with
TIME_WAIT sockets and so it couldn't create more.  My test
case was similar to this:

1. Have one machine B listen for connections on one interface (one IP).
2. Have one machine A make a connection to B, and close the connection
   immediately or soon after it was established.
goto 2

The goal was to make the maximum number of TCP connections per second.
The data passed is just filler, and for the fastest settings, we don't
pass data at all.  Without setting tcp_tw_recycle to 1, the system could
do only a few thousand connections per second.  With it set to 1, I think
I was getting around 10,000.  In any case, it was significantly faster
than without recycle enabled.

So, is there a better way to max out the connections per second without 
having to use tcp_tw_recycle?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com


* Re: [PATCH] [1/1] Deprecate tcp_tw_{reuse,recycle}
From: Andi Kleen @ 2008-01-31  6:55 UTC
  To: Ben Greear; +Cc: netdev
In-Reply-To: <47A16CBB.3000409@candelatech.com>


> I believe the problem was that all of my ports were used up with
> TIME_WAIT sockets and so it couldn't create more.  My test
> case was similar to this:

Ah, that's simple to solve then: use more IP addresses and bind 
to them in RR in your user program.

Arguably the Linux TCP code should be able to do this by itself
when enough IP addresses are available, but it's not very hard
to do in user space using bind(2).
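A minimal sketch of Andi's suggestion, assuming the client machine already has several local IPv4 addresses configured (how they are configured is outside this sketch): each new connection binds to the next local address in turn, with port 0 so the kernel picks a free ephemeral port on that address, and then connects. The function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connects to remote from the next local address in round-robin order.
 * Returns the connected fd, or -errno on failure. */
static int sketch_connect_rr(const struct sockaddr_in *locals, size_t nlocals,
			     size_t *next, const struct sockaddr_in *remote)
{
	struct sockaddr_in local;
	int fd, err;

	assert(nlocals > 0);
	local = locals[*next % nlocals];
	*next = (*next + 1) % nlocals;
	local.sin_port = 0;	/* let the kernel choose a port on this address */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;
	if (bind(fd, (const struct sockaddr *)&local, sizeof(local)) < 0 ||
	    connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) < 0) {
		err = errno;
		close(fd);
		return -err;
	}
	return fd;
}
```

Keep one `next` counter across calls so the addresses rotate; each extra local address brings its own set of local ports toward the same remote endpoint, so TIME_WAIT sockets take correspondingly longer to exhaust them.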

BTW it's also a very unusual case -- in most cases there are more
remote IP addresses.

> So, is there a better way to max out the connections per second without 
> having to use tcp_tw_recycle?

Well, did you profile where the bottlenecks were?

Perhaps also just increase the memory allowed for TCP sockets.

-Andi


