Netdev List
* Re: [PATCH 01/11] vxge: enable rxhash
From: Jon Mason @ 2010-11-08 22:53 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Sivakumar Subramani, Sreenivasa Honnur,
	Ramkrishna Vepa
In-Reply-To: <20101108.124452.115944240.davem@davemloft.net>

On Mon, Nov 08, 2010 at 12:44:52PM -0800, David Miller wrote:
> 
> This patch set doesn't apply at all to the current tree, please
> respin them, thanks.

When I sent out the series on Thursday, the tree did not have
"vxge: make functions local and remove dead code".  When that patch
was originally released (Oct 15), I asked for it not to be included, as
it would break a soon-to-be-released patch series.  I did not see any
e-mail afterward, so I assumed this was acceptable to you.  We then
ran the driver through our internal tests to verify its functionality,
which would need to be re-done if the patches are respun.

I have a reworked version of that patch which can be applied after
this patch series.  Is it acceptable to you to revert the commit,
apply the series, then apply the modified version of the "local
functions" patch?  I have already sniff tested it on our hardware
without issues.

Thanks,
Jon

^ permalink raw reply

* [PATCH 07/17][trivial] net, wireless: Remove unnecessary casts of void ptr returning alloc function return values
From: Jesper Juhl @ 2010-11-08 23:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: trivial, Ulrich Kunitz, Daniel Drake, John W. Linville,
	linux-wireless, netdev

Hi,

The [vk][cmz]alloc(_node) family of functions returns void pointers.
Casting those return values to other pointer types is unnecessary,
since the conversion happens implicitly.

This patch removes such casts from drivers/net/
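A minimal userspace sketch of the same point, assuming standard C: malloc() returns void * just like kmalloc(), and the conversion to another object pointer type happens on assignment without a cast. `alloc_pairs()` is a hypothetical helper, not code from zd1211rw.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: room for `count` 16-bit address/value pairs.
 * malloc() returns void *, which C converts implicitly to uint16_t *,
 * so no cast is needed. In pre-C99 code a cast could even hide a
 * missing <stdlib.h> by silencing the int-to-pointer warning. */
static uint16_t *alloc_pairs(size_t count)
{
	uint16_t *buf = malloc(count * 2 * sizeof(*buf));

	return buf; /* NULL on failure; the caller must check */
}
```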


Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 zd_chip.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/zd1211rw/zd_chip.c b/drivers/net/wireless/zd1211rw/zd_chip.c
index 87a95bc..dfcebed 100644
--- a/drivers/net/wireless/zd1211rw/zd_chip.c
+++ b/drivers/net/wireless/zd1211rw/zd_chip.c
@@ -117,8 +117,7 @@ int zd_ioread32v_locked(struct zd_chip *chip, u32 *values, const zd_addr_t *addr
 
 	/* Allocate a single memory block for values and addresses. */
 	count16 = 2*count;
-	a16 = (zd_addr_t *) kmalloc(count16 * (sizeof(zd_addr_t) + sizeof(u16)),
-		                   GFP_KERNEL);
+	a16 = kmalloc(count16 * (sizeof(zd_addr_t) + sizeof(u16)), GFP_KERNEL);
 	if (!a16) {
 		dev_dbg_f(zd_chip_dev(chip),
 			  "error ENOMEM in allocation of a16\n");



-- 
Jesper Juhl <jj@chaosbits.net>             http://www.chaosbits.net/
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply related

* [PATCH 1/2] r8169: revert "Handle rxfifo errors on 8168 chips"
From: Francois Romieu @ 2010-11-08 23:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Andreas Radke, Matthew Garrett, Daniel J Blueman

The original patch helps under obscure conditions (no pun intended), but
some 8168 chips do not like it. The change needs to be restricted to
a specific 8168 version.

This reverts commit 801e147cde02f04b5c2f42764cd43a89fc7400a2.

Regression at https://bugzilla.kernel.org/show_bug.cgi?id=20882
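The revert has two parts: RxFIFOOver leaves the 8168 interrupt event mask, and the FIFO-overflow workaround in the interrupt handler only runs for RTL_GIGA_MAC_VER_11. The sketch below models that gating with plain bitmasks; the bit values, the enum and `needs_fifo_workaround()` are hypothetical placeholders rather than the real r8169 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt status bits (placeholders, not r8169 values). */
enum {
	RX_OK        = 1u << 0,
	RX_ERR       = 1u << 1,
	TX_OK        = 1u << 2,
	TX_ERR       = 1u << 3,
	RX_OVERFLOW  = 1u << 4,
	LINK_CHG     = 1u << 5,
	RX_FIFO_OVER = 1u << 6,
	SYS_ERR      = 1u << 15,
};

/* Hypothetical chip revisions. */
enum mac_version { MAC_VER_11 = 11, MAC_VER_12 = 12 };

/* Event mask without RX_FIFO_OVER, like the reverted 8168 .intr_event. */
static const uint16_t intr_event = SYS_ERR | LINK_CHG | RX_OVERFLOW |
				   TX_ERR | TX_OK | RX_OK | RX_ERR;

/* Run the FIFO workaround only when the bit is set AND the chip is
 * the one revision the workaround is restricted to. */
static bool needs_fifo_workaround(uint16_t status, enum mac_version ver)
{
	return (status & RX_FIFO_OVER) && ver == MAC_VER_11;
}
```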

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Andreas Radke <a.radke@arcor.de>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Daniel J Blueman <daniel.blueman@gmail.com>
---
 drivers/net/r8169.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index d88ce9f..3a0877e 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2931,7 +2931,7 @@ static const struct rtl_cfg_info {
 		.hw_start	= rtl_hw_start_8168,
 		.region		= 2,
 		.align		= 8,
-		.intr_event	= SYSErr | RxFIFOOver | LinkChg | RxOverflow |
+		.intr_event	= SYSErr | LinkChg | RxOverflow |
 				  TxErr | TxOK | RxOK | RxErr,
 		.napi_event	= TxErr | TxOK | RxOK | RxOverflow,
 		.features	= RTL_FEATURE_GMII | RTL_FEATURE_MSI,
@@ -4588,7 +4588,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		}
 
 		/* Work around for rx fifo overflow */
-		if (unlikely(status & RxFIFOOver)) {
+		if (unlikely(status & RxFIFOOver) &&
+		(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
 			netif_stop_queue(dev);
 			rtl8169_tx_timeout(dev);
 			break;
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH 2/2] r8169: fix sleeping while holding spinlock.
From: Francois Romieu @ 2010-11-08 23:23 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Daniel J Blueman, Rafael J. Wysocki

As device_set_wakeup_enable() can now sleep, move the call
outside the critical section.
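The ordering the patch establishes is: update shared state under the lock, release the lock, then make the call that may sleep. A userspace analogue with POSIX threads follows. Holding a pthread mutex does not forbid blocking the way a kernel spinlock taken with spin_lock_irq() does, so this only illustrates the ordering; `struct wol_state`, `set_wol()` and `set_wakeup_enable()` are hypothetical names standing in for the r8169 code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct wol_state {
	pthread_mutex_t lock;  /* stands in for tp->lock */
	uint32_t wolopts;      /* driver state protected by lock */
	bool wakeup_enabled;   /* state owned by the possibly-sleeping call */
};

/* Stand-in for device_set_wakeup_enable(): assume it may block. */
static void set_wakeup_enable(struct wol_state *s, bool enable)
{
	s->wakeup_enabled = enable;
}

static int set_wol(struct wol_state *s, uint32_t wolopts)
{
	pthread_mutex_lock(&s->lock);
	s->wolopts = wolopts;          /* quick, non-blocking update only */
	pthread_mutex_unlock(&s->lock);

	/* The call that may sleep runs only after the lock is dropped. */
	set_wakeup_enable(s, wolopts != 0);
	return 0;
}
```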

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/net/r8169.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 3a0877e..4c4d169 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -846,10 +846,10 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 	else
 		tp->features &= ~RTL_FEATURE_WOL;
 	__rtl8169_set_wol(tp, wol->wolopts);
-	device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
-
 	spin_unlock_irq(&tp->lock);
 
+	device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
+
 	return 0;
 }
 
-- 
1.7.2.3


^ permalink raw reply related

* Re: [2.6.37-rc1, patch] gianfar: fix sleep in atomic...
From: Rafael J. Wysocki @ 2010-11-08 23:30 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: David S. Miller, Francois Romieu, Linux Kernel, netdev
In-Reply-To: <AANLkTimf_mVrRPEcj8qBbYw1iYHfWdeKUNS3Kk-dfhTT@mail.gmail.com>

On Tuesday, November 02, 2010, Daniel J Blueman wrote:
> Since device_set_wakeup_enable now sleeps, it should not be called
> from a critical section. Since wol_en is not updated elsewhere, we can
> omit the locking entirely.
> 
> Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>

Acked-by: Rafael J. Wysocki <rjw@sisk.pl>

> diff --git a/drivers/net/gianfar_ethtool.c b/drivers/net/gianfar_ethtool.c
> index 5c566eb..e641d7c 100644
> --- a/drivers/net/gianfar_ethtool.c
> +++ b/drivers/net/gianfar_ethtool.c
> @@ -635,10 +635,8 @@ static int gfar_set_wol(struct net_device *dev,
> struct ethtool_wolinfo *wol)
>  	if (wol->wolopts & ~WAKE_MAGIC)
>  		return -EINVAL;
> 
> -	spin_lock_irqsave(&priv->bflock, flags);
>  	priv->wol_en = wol->wolopts & WAKE_MAGIC ? 1 : 0;
>  	device_set_wakeup_enable(&dev->dev, priv->wol_en);
> -	spin_unlock_irqrestore(&priv->bflock, flags);
> 
>  	return 0;
>  }
> 

^ permalink raw reply

* Re: Netlink limitations
From: Pablo Neira Ayuso @ 2010-11-08 23:36 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Thomas Graf, Patrick McHardy, David S. Miller, netdev
In-Reply-To: <alpine.LNX.2.01.1011081958410.31946@obet.zrqbmnf.qr>

On 08/11/10 20:21, Jan Engelhardt wrote:
> On Monday 2010-11-08 16:16, Thomas Graf wrote:
>>>
>>> Messages are not limited to 64k, individual attributes are. Holger
>>> started working on a nlattr32, which uses 32 bit for the length
>>> value.
>>
>> Also, it is not required to pack everything in attributes. Your protocol
>> may specify that the whole message payload consists of chained attributes.
>> Alternatively you may as well split your attribute chain and dump them
>> as several messages.
> 
> Yeah with NETLINK_URELEASE that seems the way to go. However, what are
> compelling arguments to use Netlink over other forms of bidirectional
> communication? (To play devil's advocate, one could use nlattr32/TLVs
> over ioctl too.)

Netlink also provides an event-based notification infrastructure. Of
course, you could implement that on top of a new socket family that supports
your new ioctl operations taking things in TLV format.

However, I guess that the whole thing will start looking like netlink
quite a lot in the end ;-).
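For reference on the 64k limit mentioned above: a classic Netlink attribute (struct nlattr) starts with a 16-bit length, which includes the 4-byte header, and a 16-bit type, with payloads padded to 4-byte boundaries, while the enclosing nlmsghdr uses a 32-bit nlmsg_len. The sketch below encodes attributes with the same layout in userspace; `struct tlv_hdr`, `TLV_ALIGN` and `tlv_put()` are hypothetical names, not libnl or kernel helpers, and it says nothing about the layout of the proposed nlattr32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct nlattr: 16-bit length (header + payload), 16-bit type. */
struct tlv_hdr {
	uint16_t len;
	uint16_t type;
};

#define TLV_ALIGNTO 4u
#define TLV_ALIGN(n) (((n) + TLV_ALIGNTO - 1) & ~(size_t)(TLV_ALIGNTO - 1))
#define TLV_HDRLEN TLV_ALIGN(sizeof(struct tlv_hdr))
/* Largest payload a 16-bit length field can describe. */
#define TLV_MAX_PAYLOAD (UINT16_MAX - TLV_HDRLEN)

/* Append one attribute at buf + off. Returns the new offset, or 0 if
 * the payload is too big for a 16-bit length or does not fit in buf. */
static size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
		      uint16_t type, const void *data, size_t dlen)
{
	struct tlv_hdr h;
	size_t total;

	if (dlen > TLV_MAX_PAYLOAD)
		return 0;
	total = TLV_ALIGN(TLV_HDRLEN + dlen);   /* padded size on the wire */
	if (off + total > cap)
		return 0;

	h.len = (uint16_t)(TLV_HDRLEN + dlen);  /* unpadded, as in nlattr */
	h.type = type;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + TLV_HDRLEN, data, dlen);
	memset(buf + off + TLV_HDRLEN + dlen, 0, total - TLV_HDRLEN - dlen);
	return off + total;
}
```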

^ permalink raw reply

* Re: [Bugme-new] [Bug 22142] New: skge module doesn't work in 2.6.37-rc1
From: Andrew Morton @ 2010-11-08 23:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: bugzilla-daemon, bugme-daemon, netdev, jtmettala
In-Reply-To: <bug-22142-10286@https.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 5 Nov 2010 23:14:21 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=22142
> 
>            Summary: skge module doesn't work in 2.6.37-rc1
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.37-rc1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: jtmettala@gmail.com
>         Regression: Yes
> 
> 
> Here is the original report.
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/670955
> 
> I hope the attached file has enough information. It has a trace.
> 

skge_devinit() did a nearly-NULL deref.

[    8.521324] Intel ICH 0000:00:1f.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[    8.521384] Intel ICH 0000:00:1f.5: setting latency timer to 64
[    8.683032] skge 0000:02:05.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
[    8.683091] skge: 1.13 addr 0xfbffc000 irq 22 chip Yukon-Lite rev 7
[    8.696044] BUG: unable to handle kernel NULL pointer dereference at 00000008
[    8.696162] IP: [<f800a215>] skge_devinit+0x1a5/0x210 [skge]
[    8.696246] *pde = 00000000 
[    8.696320] Oops: 0002 [#1] SMP 
[    8.696425] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/input/input4/mouse1/uevent
[    8.696478] Modules linked in: skge(+) i2c_algo_bit joydev snd_mpu401 snd_mpu401_uart snd_seq_midi snd_intel8x0(+) usbhid hid snd_ac97_codec snd_rawmidi snd_seq_midi_event snd_seq ac97_bus snd_pcm snd_seq_device snd_timer snd_page_alloc snd ppdev firewire_sbp2 shpchp parport_pc asus_atk0110 firewire_core floppy crc_itu_t ns558 soundcore gameport psmouse serio_raw lp parport
[    8.697688] 
[    8.697730] Pid: 329, comm: modprobe Not tainted 2.6.37-2-generic #9-Ubuntu P5P800/To Be Filled By O.E.M.
[    8.697783] EIP: 0060:[<f800a215>] EFLAGS: 00010246 CPU: 0
[    8.697829] EIP is at skge_devinit+0x1a5/0x210 [skge]
[    8.697872] EAX: 00000000 EBX: f5fbb000 ECX: 00000000 EDX: 00000000
[    8.697916] ESI: f5fbb440 EDI: f5f68300 EBP: f5ff5dfc ESP: f5ff5de4
[    8.697960]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[    8.698004] Process modprobe (pid: 329, ti=f5ff4000 task=f5880000 task.ti=f5ff4000)
[    8.698054] Stack:
[    8.698093]  00000040 f5f68300 00000000 f6581000 00000000 f5f68300 f5ff5e3c f800f789
[    8.698400]  f800fdd4 f800febd fbffc000 00000000 00000016 f8010131 00000007 c0423af5
[    8.698706]  00000292 00000001 f5f68344 f6581000 f5ff5e5c f6581060 f5ff5e54 c0388937
[    8.699012] Call Trace:
[    8.699059]  [<f800f789>] ? skge_probe+0x284/0x41b [skge]
[    8.699108]  [<c0423af5>] ? pm_runtime_enable+0x45/0x70
[    8.699155]  [<c0388937>] ? local_pci_probe+0x47/0xb0
[    8.699201]  [<c0389e18>] ? pci_device_probe+0x68/0x90
[    8.699247]  [<c041cb6d>] ? really_probe+0x4d/0x150
[    8.699292]  [<c0424fab>] ? pm_runtime_barrier+0x4b/0xb0
[    8.699337]  [<c041ce0c>] ? driver_probe_device+0x3c/0x60
[    8.699383]  [<c041ceb1>] ? __driver_attach+0x81/0x90
[    8.699428]  [<c041ce30>] ? __driver_attach+0x0/0x90
[    8.699473]  [<c041be98>] ? bus_for_each_dev+0x48/0x70
[    8.699518]  [<c041ca1e>] ? driver_attach+0x1e/0x20
[    8.699562]  [<c041ce30>] ? __driver_attach+0x0/0x90
[    8.699606]  [<c041c5d1>] ? bus_add_driver+0xc1/0x2c0
[    8.699652]  [<c03897c0>] ? pci_device_remove+0x0/0xf0
[    8.699697]  [<c041d0f6>] ? driver_register+0x66/0x110
[    8.699742]  [<c04fd807>] ? dmi_matches+0x47/0xb0
[    8.699787]  [<c0388ed5>] ? __pci_register_driver+0x45/0xb0
[    8.699834]  [<f802102f>] ? skge_init_module+0x2f/0x31 [skge]
[    8.699880]  [<c0101255>] ? do_one_initcall+0x35/0x170
[    8.699927]  [<f8021000>] ? skge_init_module+0x0/0x31 [skge]
[    8.699973]  [<c018807b>] ? sys_init_module+0x9b/0x1e0
[    8.700012]  [<c02252a2>] ? sys_write+0x42/0x70
[    8.700012]  [<c010309f>] ? sysenter_do_call+0x12/0x28
[    8.700012] Code: 40 04 66 89 42 04 0f b6 8b 21 01 00 00 8d 83 00 01 00 00 8b 93 78 01 00 00 e8 c8 a9 36 c8 89 d8 e8 b1 57 52 c8 8b 83 00 02 00 00 <f0> 80 48 08 01 83 c4 0c 89 d8 5b 5e 5f 5d c3 8d 74 26 00 31 d2 
[    8.700012] EIP: [<f800a215>] skge_devinit+0x1a5/0x210 [skge] SS:ESP 0068:f5ff5de4
[    8.700012] CR2: 0000000000000008
[    8.702518] ---[ end trace 997185377b275fcf ]---


^ permalink raw reply

* Re: [PATCH 2/2] r8169: fix sleeping while holding spinlock.
From: Andrew Hendry @ 2010-11-08 23:48 UTC (permalink / raw)
  To: Francois Romieu
  Cc: netdev, David S. Miller, Daniel J Blueman, Rafael J. Wysocki
In-Reply-To: <20101108232358.GB13720@electric-eye.fr.zoreil.com>

I was getting this error on boot: "BUG: scheduling while atomic:
ethtool/1430/0x00000002". The patch fixed them.

Acked-by: Andrew Hendry <andrew.hendry@gmail.com>

On Tue, Nov 9, 2010 at 10:23 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> As device_set_wakeup_enable can now sleep, move the call to outside
> the critical section.
>
> Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
> ---
>  drivers/net/r8169.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 3a0877e..4c4d169 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -846,10 +846,10 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
>        else
>                tp->features &= ~RTL_FEATURE_WOL;
>        __rtl8169_set_wol(tp, wol->wolopts);
> -       device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
> -
>        spin_unlock_irq(&tp->lock);
>
> +       device_set_wakeup_enable(&tp->pci_dev->dev, wol->wolopts);
> +
>        return 0;
>  }
>
> --
> 1.7.2.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Andrew Hendry @ 2010-11-09  0:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <1289228785.2820.203.camel@edumazet-laptop>

Results on an i7 860 @ 2.80GHz machine, no virtualization involved, 2.6.37-rc1+:

# time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 50.2022 s, 209 MB/s

real	0m50.210s
user	0m1.094s
sys	0m57.589s
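As a rough cross-check of the numbers in this thread, assuming dd's convention of decimal megabytes: bs=1M count=10000 moves 10000 x 1 MiB = 10,485,760,000 bytes, and throughput is bytes divided by elapsed seconds. `mb_per_sec()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in decimal MB/s (10^6 bytes per second), the unit dd prints. */
static double mb_per_sec(uint64_t bytes, double seconds)
{
	return (double)bytes / seconds / 1e6;
}

/* 10000 blocks of 1 MiB, as in "dd if=/dev/zero bs=1M count=10000". */
static const uint64_t total_bytes = 10000ull * 1024 * 1024;
```

With 50.2022 s this gives about 208.9 MB/s, which dd rounds to 209 MB/s. The 38 s quoted for Phoronix's machine works out to about 276 MB/s, and the 8 to 9 s in Eric's quoted message to roughly 1165 to 1311 MB/s.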



On Tue, Nov 9, 2010 at 2:06 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 08 November 2010 at 12:04 +0100, Eric Dumazet wrote:
>> On Monday, 08 November 2010 at 11:58 +0100, Jesper Dangaard Brouer
>> wrote:
>> > On Fri, 2010-11-05 at 21:29 +0100, Eric Dumazet wrote:
>> > > On Friday, 05 November 2010 at 11:49 +0100, Jesper Dangaard Brouer
>> > > wrote:
>> > > > Hi Eric,
>> > > >
>> > > > A colleague sent me a link to someone who has done some quite extensive
>> > > > performance measurements across different kernel versions.
>> > > >
>> > > > I noticed that the loopback performance has gotten quite bad:
>> > > >
>> > > > http://www.phoronix.com/scan.php?page=article&item=linux_2612_2637&num=6
>> > > >
>> > > > I thought you might be interested in the link.
>> > > >
>> > > > See you around :-)
>> > >
>> > > Hi !
>> > >
>> > > The problem is: I have no idea exactly which test they use;
>> > > do you have info about it?
>> >
>> > It's called the Phoronix test-suite; their website is:
>> > http://www.phoronix-test-suite.com/?k=home
>> >
>> > On my Ubuntu workstation their software comes as a software package:
>> >  sudo aptitude install phoronix-test-suite
>> >
>> > They seem to be related to the test/review site:
>> > http://www.phoronix.com/
>> >
>> >
>> >
>> > > This probably can be explained very fast.
>> >
>> > The loopback test seems to be the only real networking test they do.
>> > It looks like they just copy a very big file via loopback, and record the
>> > time it took... quite simple.
>> >
>> > Their tests seem to be focused on CPU util/speed, graphics/games.
>> >
>> >
>> > The thing that caught my attention, was that they seemed interested in
>> > doing performance regression testing on all kernel versions...
>> >
>> > So, I thought it would be great if someone else would do automated
>> > performance regression testing for us :-),  Too bad they only have a
>> > very simple network test.
>> >
>> >
>>
>
>>
>
> CC netdev, if you don't mind.
>
>
> Their network test is basically :
>
> netcat -l 9999 >/dev/null &
> time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
>
> They say it takes 38 seconds on their "super fast" processor
>
> On my dev machine, not super fast (E5540 @2.53GHz), I get 8 or 9
> seconds, even if only one CPU is online, all others offline.
>
> Go figure... maybe an artifact of the virtualization they use.
>
> I suspect some problem with the ticket spinlocks and a call to
> hypervisor to say 'I am spinning on a spinlock, see if you need to do
> something useful', or maybe ACPI problem (going to/from idle)
>
>
>
>
>

^ permalink raw reply

* Takes > 1 second to delete macvlan with global IPv6 address on it.
From: Ben Greear @ 2010-11-09  0:20 UTC (permalink / raw)
  To: NetDev

This is on an otherwise lightly loaded 2.6.36 + hacks system, 12 physical interfaces,
and two VETH interfaces.

It's much faster to delete an interface when it has no IPv6 address:

[root@ct503-60 lanforge]# time ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan

real	0m0.005s
user	0m0.001s
sys	0m0.004s
[root@ct503-60 lanforge]# time ip link delete eth5#0

real	0m0.033s
user	0m0.001s
sys	0m0.005s
[root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan

[root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
[root@ct503-60 lanforge]# time ip link delete eth5#0

real	0m1.030s
user	0m0.000s
sys	0m0.013s


Funnily enough, if you explicitly remove the IPv6 address first, it seems
to run at normal speed (even adding both operations' times together):

[root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
[root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
[root@ct503-60 lanforge]# time ip -6 addr delete 2002::1/64 dev eth5#0

real	0m0.001s
user	0m0.000s
sys	0m0.001s
[root@ct503-60 lanforge]# time ip link delete eth5#0

real	0m0.028s
user	0m0.001s
sys	0m0.005s


Take it easy,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* [patch v3] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-09  1:08 UTC (permalink / raw)
  To: lvs-devel, netdev; +Cc: Julian Anastasov, Herbert Xu

Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Simon Horman <horms@verge.net.au>

--- 

* LRO is still an outstanding issue, but as it's deprecated in favour
  of GRO, perhaps it doesn't need to be solved.

* v1
  - Based on 2.6.35

* v2
  - Rebase on current nf-next-2.6 tree (~2.6.37-rc1)

* v3
  - Use skb_is_gso() instead of netif_needs_gso() as suggested by
    Julian Anastasov and confirmed by Herbert Xu.

Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_xmit.c
===================================================================
--- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_xmit.c	2010-11-08 16:27:31.000000000 +0900
+++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_xmit.c	2010-11-08 16:29:19.000000000 +0900
@@ -408,7 +408,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, s
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
+	    !skb_is_gso(skb)) {
 		ip_rt_put(rt);
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -461,7 +462,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -560,7 +561,8 @@ ip_vs_nat_xmit(struct sk_buff *skb, stru
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
+	    !skb_is_gso(skb)) {
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL_PKT(0, AF_INET, pp, skb, 0,
 				 "ip_vs_nat_xmit(): frag needed for");
@@ -675,7 +677,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, s
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -790,8 +792,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, s
 
 	df |= (old_iph->frag_off & htons(IP_DF));
 
-	if ((old_iph->frag_off & htons(IP_DF))
-	    && mtu < ntohs(old_iph->tot_len)) {
+	if ((old_iph->frag_off & htons(IP_DF) &&
+	    mtu < ntohs(old_iph->tot_len) && !skb_is_gso(skb))) {
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error_put;
@@ -903,7 +905,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);
 
-	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)) {
+	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
+	    !skb_is_gso(skb)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 
@@ -1008,7 +1011,8 @@ ip_vs_dr_xmit(struct sk_buff *skb, struc
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu) {
+	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu &&
+	    !skb_is_gso(skb)) {
 		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
 		ip_rt_put(rt);
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -1174,7 +1178,8 @@ ip_vs_icmp_xmit(struct sk_buff *skb, str
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF))) {
+	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF)) &&
+	    !skb_is_gso(skb)) {
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
 		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
 		goto tx_error_put;
@@ -1288,7 +1293,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb,
 
 	/* MTU checking */
 	mtu = dst_mtu(&rt->dst);
-	if (skb->len > mtu) {
+	if (skb->len > mtu && !skb_is_gso(skb)) {
 		if (!skb->dev) {
 			struct net *net = dev_net(skb_dst(skb)->dev);
 

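Every hunk applies the same rule: report "fragmentation needed" (or its IPv6 counterpart) only if the packet is over the path MTU and is not a GSO packet, since a GRO-aggregated skb is only transiently larger than the MTU and GSO will split it into segments on output. A condensed sketch of the two predicates, with plain parameters standing in for skb->len, the DF bit and skb_is_gso(); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors: (skb->len > mtu) && (iph->frag_off & htons(IP_DF)) && !skb_is_gso(skb) */
static bool ipv4_needs_frag_icmp(size_t len, size_t mtu, bool df, bool is_gso)
{
	return len > mtu && df && !is_gso;
}

/* IPv6 routers never fragment, so there is no DF bit to test.
 * Mirrors: skb->len > mtu && !skb_is_gso(skb) */
static bool ipv6_needs_too_big(size_t len, size_t mtu, bool is_gso)
{
	return len > mtu && !is_gso;
}
```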
^ permalink raw reply

* Re: [patch v3] ipvs: allow transmit of GRO aggregated skbs
From: Simon Horman @ 2010-11-09  1:18 UTC (permalink / raw)
  To: lvs-devel, netdev; +Cc: Julian Anastasov, Herbert Xu
In-Reply-To: <20101109010847.GA13974@verge.net.au>

On Tue, Nov 09, 2010 at 10:08:49AM +0900, Simon Horman wrote:
> Attempt at allowing LVS to transmit skbs of greater than MTU length that
> have been aggregated by GRO and can thus be deaggregated by GSO.
> 
> Cc: Julian Anastasov <ja@ssi.bg>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Simon Horman <horms@verge.net.au>
> 
> --- 
> 
> * LRO is still an outstanding issue, but as it's deprecated in favour
>   of GRO, perhaps it doesn't need to be solved.
> 
> * v1
>   - Based on 2.6.35
> 
> * v2
>   - Rebase on current nf-next-2.6 tree (~2.6.37-rc1)
> 
> * v3
>   - Use skb_is_gso() instead of netif_needs_gso() as suggested by
>     Julian Anastasov and confirmed by Herbert Xu.

On thinking about this a bit more, I believe that this is material
for stable as it's affecting deployed systems. I'll back-port it
and add the appropriate CC once it's seen a bit more testing.

> 
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_xmit.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_xmit.c	2010-11-08 16:27:31.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_xmit.c	2010-11-08 16:29:19.000000000 +0900
> @@ -408,7 +408,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, s
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
> +	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
> +	    !skb_is_gso(skb)) {
>  		ip_rt_put(rt);
>  		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
>  		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
> @@ -461,7 +462,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if (skb->len > mtu) {
> +	if (skb->len > mtu && !skb_is_gso(skb)) {
>  		if (!skb->dev) {
>  			struct net *net = dev_net(skb_dst(skb)->dev);
>  
> @@ -560,7 +561,8 @@ ip_vs_nat_xmit(struct sk_buff *skb, stru
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
> +	if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
> +	    !skb_is_gso(skb)) {
>  		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
>  		IP_VS_DBG_RL_PKT(0, AF_INET, pp, skb, 0,
>  				 "ip_vs_nat_xmit(): frag needed for");
> @@ -675,7 +677,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, s
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if (skb->len > mtu) {
> +	if (skb->len > mtu && !skb_is_gso(skb)) {
>  		if (!skb->dev) {
>  			struct net *net = dev_net(skb_dst(skb)->dev);
>  
> @@ -790,8 +792,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, s
>  
>  	df |= (old_iph->frag_off & htons(IP_DF));
>  
> -	if ((old_iph->frag_off & htons(IP_DF))
> -	    && mtu < ntohs(old_iph->tot_len)) {
> +	if ((old_iph->frag_off & htons(IP_DF) &&
> +	    mtu < ntohs(old_iph->tot_len) && !skb_is_gso(skb))) {
>  		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
>  		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
>  		goto tx_error_put;
> @@ -903,7 +905,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb
>  	if (skb_dst(skb))
>  		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);
>  
> -	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)) {
> +	if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) &&
> +	    !skb_is_gso(skb)) {
>  		if (!skb->dev) {
>  			struct net *net = dev_net(skb_dst(skb)->dev);
>  
> @@ -1008,7 +1011,8 @@ ip_vs_dr_xmit(struct sk_buff *skb, struc
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu) {
> +	if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu &&
> +	    !skb_is_gso(skb)) {
>  		icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
>  		ip_rt_put(rt);
>  		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
> @@ -1174,7 +1178,8 @@ ip_vs_icmp_xmit(struct sk_buff *skb, str
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF))) {
> +	if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF)) &&
> +	    !skb_is_gso(skb)) {
>  		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
>  		IP_VS_DBG_RL("%s(): frag needed\n", __func__);
>  		goto tx_error_put;
> @@ -1288,7 +1293,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb,
>  
>  	/* MTU checking */
>  	mtu = dst_mtu(&rt->dst);
> -	if (skb->len > mtu) {
> +	if (skb->len > mtu && !skb_is_gso(skb)) {
>  		if (!skb->dev) {
>  			struct net *net = dev_net(skb_dst(skb)->dev);
>  
> 

^ permalink raw reply

* Re: [PATCH 07/17][trivial] net, wireless: Remove unnecessary casts of void ptr returning alloc function return values
From: John W. Linville @ 2010-11-09  1:23 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: linux-kernel, trivial, Ulrich Kunitz, Daniel Drake,
	linux-wireless, netdev
In-Reply-To: <alpine.LNX.2.00.1011082330390.23697-h2p7t3/P30RzeRGmFJ5qR7ZzlVVXadcDXqFh9Ls21Oc@public.gmane.org>

On Tue, Nov 09, 2010 at 12:09:13AM +0100, Jesper Juhl wrote:
> Hi,
> 
> The [vk][cmz]alloc(_node) family of functions return void pointers which
> it's completely unnecessary/pointless to cast to other pointer types since
> that happens implicitly.
> 
> This patch removes such casts from drivers/net/
> 
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
>  zd_chip.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> diff --git a/drivers/net/wireless/zd1211rw/zd_chip.c b/drivers/net/wireless/zd1211rw/zd_chip.c
> index 87a95bc..dfcebed 100644
> --- a/drivers/net/wireless/zd1211rw/zd_chip.c
> +++ b/drivers/net/wireless/zd1211rw/zd_chip.c
> @@ -117,8 +117,7 @@ int zd_ioread32v_locked(struct zd_chip *chip, u32 *values, const zd_addr_t *addr
>  
>  	/* Allocate a single memory block for values and addresses. */
>  	count16 = 2*count;
> -	a16 = (zd_addr_t *) kmalloc(count16 * (sizeof(zd_addr_t) + sizeof(u16)),
> -		                   GFP_KERNEL);
> +	a16 = kmalloc(count16 * (sizeof(zd_addr_t) + sizeof(u16)), GFP_KERNEL);
>  	if (!a16) {
>  		dev_dbg_f(zd_chip_dev(chip),
>  			  "error ENOMEM in allocation of a16\n");

kcalloc?
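"kcalloc?" presumably suggests replacing the open-coded multiplication with an array allocator that checks for overflow, e.g. kcalloc(count16, sizeof(zd_addr_t) + sizeof(u16), GFP_KERNEL). Unlike the original kmalloc() call, kcalloc() also zeroes the buffer. The userspace sketch below shows the kind of overflow check such allocators perform before multiplying; `alloc_array()` is a hypothetical helper built on calloc(), not a kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of `size` bytes each, refusing when n * size
 * would wrap around SIZE_MAX instead of silently allocating too little. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;         /* the product would overflow */
	return calloc(n, size);  /* zero-filled, like kcalloc() */
}
```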

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH 01/11] vxge: enable rxhash
From: David Miller @ 2010-11-09  2:38 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, Sivakumar.Subramani, Sreenivasa.Honnur, Ramkrishna.Vepa
In-Reply-To: <20101108225340.GA16247@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Mon, 8 Nov 2010 16:53:40 -0600

> On Mon, Nov 08, 2010 at 12:44:52PM -0800, David Miller wrote:
>> 
>> This patch set doesn't apply at all to the current tree, please
>> respin them, thanks.
> 
> When I sent out the series on Thursday, the tree did not have
> "vxge: make functions local and remove dead code".  When that patch
> was originally released (Oct 15), I asked for it to not be included as
> it would break soon-to-be-released patch series.  I did not see any
> e-mail afterward, so I assumed this was acceptable to you.  We then
> ran the driver through our internal tests to verify its functionality,
> which would need to be re-done if the patches are respun.
> 
> I have a reworked version of that patch which can be applied after
> this patch series.  Is it acceptable to you to revert the commit,
> apply the series, then apply the modified version of the "local
> functions" patch?  I have already sniff tested it on our hardware
> without issues.

Ummm, no.  I'm not reverting a correct patch just so that your
original patches apply properly.

Please just respin the patch series on top of the current tree.

^ permalink raw reply

* Re: [Bugme-new] [Bug 22142] New: skge module doesn't work in 2.6.37-rc1
From: David Miller @ 2010-11-09  2:46 UTC (permalink / raw)
  To: akpm; +Cc: shemminger, bugzilla-daemon, bugme-daemon, netdev, jtmettala
In-Reply-To: <20101108154306.0f93eddb.akpm@linux-foundation.org>

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 8 Nov 2010 15:43:06 -0800

> skge_devinit() did a nearly-NULL deref.

Fixed in net-2.6:

--------------------
skge: Remove tx queue stopping in skge_devinit()

After commit e6484930d7c73d324bccda7d43d131088da697b9 ("net: allocate tx
queues in register_netdevice"), the netif_stop_queue() call in
skge_devinit() causes an Oops at skge_probe() time.

Signed-off-by: Guillaume Chazarain <guichaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/skge.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index bfec2e0..220e039 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3858,7 +3858,6 @@ static struct net_device *skge_devinit(struct skge_hw *hw, int port,
 
 	/* device is off until link detection */
 	netif_carrier_off(dev);
-	netif_stop_queue(dev);
 
 	return dev;
 }
-- 
1.7.3.2
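A minimal userspace model of the ordering problem. It assumes, as the title of e6484930d7c7 says, that the tx queue array is only allocated at registration time, and that skge_devinit() runs before the device is registered. All names are hypothetical, and assert() stands in for the Oops. The fix simply drops the early call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's structures. */
struct model_txq {
	bool stopped;
};

struct model_netdev {
	struct model_txq *tx;		/* NULL until model_register() */
	unsigned int num_tx_queues;
};

/* Analogue of register_netdevice() after the change: queues appear here. */
static int model_register(struct model_netdev *dev)
{
	dev->tx = calloc(dev->num_tx_queues, sizeof(*dev->tx));
	return dev->tx != NULL ? 0 : -1;
}

/* Analogue of netif_stop_queue(): only safe once the queues exist. */
static void model_stop_queue(struct model_netdev *dev)
{
	assert(dev->tx != NULL);	/* in the kernel this was the Oops */
	dev->tx[0].stopped = true;
}
```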


^ permalink raw reply related

* Re: [PATCH] inet: fix ip_mc_drop_socket()
From: Miles Lane @ 2010-11-09  4:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Markus Trippelsdorf, David Miller, paulmck, ilpo.jarvinen, LKML,
	Len Brown, netdev
In-Reply-To: <1289250954.2790.11.camel@edumazet-laptop>

Looks good here.

On Mon, Nov 8, 2010 at 4:15 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Hmm, I believe I found the bug.
>
> Thanks guys !
>
> [PATCH] inet: fix ip_mc_drop_socket()
>
> commit 8723e1b4ad9be4444 (inet: RCU changes in inetdev_by_index())
> forgot one call site in ip_mc_drop_socket()
>
> We should not decrease idev refcount after inetdev_by_index() call,
> since refcount is not increased anymore.
>
> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
> Reported-by: Miles Lane <miles.lane@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/igmp.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index c8877c6..3c53c2d 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -2306,10 +2306,8 @@ void ip_mc_drop_socket(struct sock *sk)
>
>                in_dev = inetdev_by_index(net, iml->multi.imr_ifindex);
>                (void) ip_mc_leave_src(sk, iml, in_dev);
> -               if (in_dev != NULL) {
> +               if (in_dev != NULL)
>                        ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
> -                       in_dev_put(in_dev);
> -               }
>                /* decrease mem now to avoid the memleak warning */
>                atomic_sub(sizeof(*iml), &sk->sk_omem_alloc);
>                call_rcu(&iml->rcu, ip_mc_socklist_reclaim);
>
>
>
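The rule behind the fix, modelled in userspace. Once a lookup returns a borrowed, RCU-protected pointer instead of taking a reference, the matching put must be removed too. Otherwise every call drops a reference the caller never owned. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct in_device: a refcount and a count of
 * joined multicast groups. */
struct model_in_dev {
	int refcnt;
	int groups;
	int ifindex;
};

/* Like inetdev_by_index() after the RCU conversion: returns a borrowed
 * pointer and does NOT take a reference. */
static struct model_in_dev *model_by_index(struct model_in_dev *tbl,
					   size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return &tbl[i];
	return NULL;
}

/* Pre-fix pattern: pairs the lookup with a put, as if a reference had been
 * taken, so each call steals one reference from the real owner. */
static void leave_group_buggy(struct model_in_dev *tbl, size_t n, int ifindex)
{
	struct model_in_dev *d = model_by_index(tbl, n, ifindex);

	if (d != NULL) {
		d->groups--;
		d->refcnt--;		/* the removed in_dev_put() */
	}
}

/* Post-fix pattern: use the borrowed pointer, drop nothing. */
static void leave_group_fixed(struct model_in_dev *tbl, size_t n, int ifindex)
{
	struct model_in_dev *d = model_by_index(tbl, n, ifindex);

	if (d != NULL)
		d->groups--;
}
```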

^ permalink raw reply

* Re: [PATCH] virtio_net: Fix queue full check
From: Krishna Kumar2 @ 2010-11-09  4:26 UTC (permalink / raw)
  To: Rusty Russell; +Cc: davem, Michael S. Tsirkin, netdev, yvugenfi
In-Reply-To: <201011080938.47938.rusty@rustcorp.com.au>

Rusty Russell <rusty@rustcorp.com.au> wrote on 11/08/2010 04:38:47 AM:

> Re: [PATCH] virtio_net: Fix queue full check
>
> On Thu, 4 Nov 2010 10:54:24 pm Michael S. Tsirkin wrote:
> > I thought about this some more.  I think the original
> > code is actually correct in returning ENOSPC: indirect
> > buffers are nice, but it's a mistake
> > to rely on them as a memory allocation might fail.
> >
> > And if you look at virtio-net, it is dropping packets
> > under memory pressure which is not really a happy outcome:
> > the packet will get freed, reallocated and we get another one,
> > adding pressure on the allocator instead of releasing it
> > until we free up some buffers.
> >
> > So I now think we should calculate the capacity
> > assuming non-indirect entries, and if we manage to
> > use indirect, all the better.
>
> I've long said it's a weakness in the network stack that it insists
> drivers stop the tx queue before they *might* run out of room, leading to
> worst-case assumptions and underutilization of the tx ring.
>
> However, I lost that debate, and so your patch is the way it's supposed
> to work.  The other main indirect user (block) doesn't care as its queue
> allows for post-attempt blocking.
>
> I enhanced your commentary a little:
>
> Subject: virtio: return correct capacity to users
> Date: Thu, 4 Nov 2010 14:24:24 +0200
> From: "Michael S. Tsirkin" <mst@redhat.com>
>
> We can't rely on indirect buffers for capacity
> calculations because they need a memory allocation
> which might fail.  In particular, virtio_net can get
> into this situation under stress, and it drops packets
> and performs badly.
>
> So return the number of buffers we can guarantee users.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
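Below is a userspace model of the capacity rule in this patch. An indirect table lets a packet occupy a single ring slot. Because that table needs an allocation that may fail, the capacity reported back counts only free direct descriptors, and the driver stops its queue whenever a worst-case packet might not fit. All names are hypothetical. The 2 + MAX_SKB_FRAGS threshold is an assumption based on virtio_net of this period.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed worst case for one packet: a descriptor for the virtio header,
 * one for the linear data and one per page fragment.  18 is MAX_SKB_FRAGS
 * with 4 KiB pages in kernels of this era; treat the value as an assumption.
 */
#define MODEL_MAX_FRAGS		18
#define MODEL_WORST_CASE	(2 + MODEL_MAX_FRAGS)

struct model_vring {
	unsigned int num_free;	/* free direct descriptors in the ring */
};

/*
 * Queue a packet needing 'descs' scatter-gather entries.  If the indirect
 * table could be allocated the packet occupies one ring slot, otherwise it
 * needs 'descs' direct slots.  Either way, report only the capacity that
 * holds without any further allocation: the number of free direct slots.
 * Returns -1 if the packet does not fit.
 */
static int model_add_buf(struct model_vring *vq, unsigned int descs,
			 bool indirect_alloc_ok)
{
	unsigned int used = indirect_alloc_ok ? 1 : descs;

	if (used > vq->num_free)
		return -1;
	vq->num_free -= used;
	return (int)vq->num_free;
}

/* Stop the tx queue as soon as a worst-case packet might not fit. */
static bool model_should_stop(int capacity)
{
	return capacity < MODEL_WORST_CASE;
}
```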

I have tested this patch for 3-4 hours and so far I have not seen the tx
full error. I am not sure whether "Tested-By" applies to this situation,
but just in case:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-By: Krishna Kumar2 <krkumar2@in.ibm.com>
Tested-By: Krishna Kumar2 <krkumar2@in.ibm.com>

I think both this patch and the original patch I submitted
are needed? That patch removes ENOMEM check and the increment
of dev->stats.tx_fifo_errors, and reports "memory failure".

Thanks,

- KK


^ permalink raw reply

* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-11-09  4:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: anthony, arnd, avi, davem, eric.dumazet, kvm, netdev, rusty
In-Reply-To: <20101026085709.GC23530@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 10/26/2010 02:27:09 PM:

> Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
>
> On Mon, Oct 25, 2010 at 09:20:38PM +0530, Krishna Kumar2 wrote:
> > > Krishna Kumar2/India/IBM@IBMIN wrote on 10/20/2010 02:24:52 PM:
> >
> > Any feedback, comments, objections, issues or bugs about the
> > patches? Please let me know if something needs to be done.
> >
> > Some more test results:
> > _____________________________________________________
> >          Host->Guest BW (numtxqs=2)
> > #       BW%     CPU%    RCPU%   SD%     RSD%
> > _____________________________________________________
>
> I think we discussed the need for external to guest testing
> over 10G. For large messages we should not see any change
> but you should be able to get better numbers for small messages
> assuming a MQ NIC card.

I had to make a few changes to qemu (and a minor change to the macvtap
driver) to get multiple TXQ support working with macvtap. The NIC is an
ixgbe card.

__________________________________________________________________________
            Org vs New (I/O: 512 bytes, #numtxqs=2, #vhosts=3)
#      BW1     BW2 (%)       SD1    SD2 (%)        RSD1    RSD2 (%)
__________________________________________________________________________
1      14367   13142 (-8.5)  56     62 (10.7)      8        8 (0)
2      3652    3855 (5.5)    37     35 (-5.4)      7        6 (-14.2)
4      12529   12059 (-3.7)  65     77 (18.4)      35       35 (0)
8      13912   14668 (5.4)   288    332 (15.2)     175      184 (5.1)
16     13433   14455 (7.6)   1218   1321 (8.4)     920      943 (2.5)
24     12750   13477 (5.7)   2876   2985 (3.7)     2514     2348 (-6.6)
32     11729   12632 (7.6)   5299   5332 (.6)      4934     4497 (-8.8)
40     11061   11923 (7.7)   8482   8364 (-1.3)    8374     7495 (-10.4)
48     10624   11267 (6.0)   12329  12258 (-.5)    12762    11538 (-9.5)
64     10524   10596 (.6)    21689  22859 (5.3)    23626    22403 (-5.1)
80     9856    10284 (4.3)   35769  36313 (1.5)    39932    36419 (-8.7)
96     9691    10075 (3.9)   52357  52259 (-.1)    58676    53463 (-8.8)
128    9351    9794 (4.7)    114707 94275 (-17.8)  114050   97337 (-14.6)
__________________________________________________________________________
Avg:      BW: (3.3)      SD: (-7.3)      RSD: (-11.0)
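The Avg row does not appear to be the mean of the per-row percentages. For BW, that mean would be about 3.6%. Instead, the row matches the percentage change of the column totals. This is my reading of the numbers, not something stated in the mail. The sketch below reproduces the BW and SD averages of the 512-byte table; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage change between the column totals of the "Org" and "New" runs. */
static double total_pct_change(const long *before, const long *after, size_t n)
{
	long sum_before = 0, sum_after = 0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		sum_before += before[i];
		sum_after += after[i];
	}
	return 100.0 * (double)(sum_after - sum_before) / (double)sum_before;
}

/* BW and SD columns of the 512-byte table (rows 1 through 128). */
static const long bw1[] = { 14367, 3652, 12529, 13912, 13433, 12750, 11729,
			    11061, 10624, 10524, 9856, 9691, 9351 };
static const long bw2[] = { 13142, 3855, 12059, 14668, 14455, 13477, 12632,
			    11923, 11267, 10596, 10284, 10075, 9794 };
static const long sd1[] = { 56, 37, 65, 288, 1218, 2876, 5299, 8482, 12329,
			    21689, 35769, 52357, 114707 };
static const long sd2[] = { 62, 35, 77, 332, 1321, 2985, 5332, 8364, 12258,
			    22859, 36313, 52259, 94275 };
```

With these columns the function returns about 3.3 for BW and -7.3 for SD, matching the Avg row.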

__________________________________________________________________________
            Org vs New (I/O: 1K, #numtxqs=8, #vhosts=5)
#      BW1      BW2 (%)       SD1   SD2 (%)        RSD1   RSD2 (%)
__________________________________________________________________________
1      16509    15985 (-3.1)  45    47 (4.4)       7       7 (0)
2      6963     4499 (-35.3)  17    51 (200.0)     7       7 (0)
4      12932    11080 (-14.3) 49    74 (51.0)      35      35 (0)
8      13878    14095 (1.5)   223   292 (30.9)     175     181 (3.4)
16     13440    13698 (1.9)   980   1131 (15.4)    926     942 (1.7)
24     12680    12927 (1.9)   2387  2463 (3.1)     2526    2342 (-7.2)
32     11714    12261 (4.6)   4506  4486 (-.4)     4941    4463 (-9.6)
40     11059    11651 (5.3)   7244  7081 (-2.2)    8349    7437 (-10.9)
48     10580    11095 (4.8)   10811 10500 (-2.8)   12809   11403 (-10.9)
64     10569    10566 (0)     19194 19270 (.3)     23648   21717 (-8.1)
80     9827     10753 (9.4)   31668 29425 (-7.0)   39991   33824 (-15.4)
96     10043    10150 (1.0)   45352 44227 (-2.4)   57766   51131 (-11.4)
128    9360     9979 (6.6)    92058 79198 (-13.9)  114381  92873 (-18.8)
__________________________________________________________________________
Avg:      BW: (-.5)      SD: (-7.5)      RSD: (-14.7)

Is there anything else you would like me to test/change, or shall
I submit the next version (with the above macvtap changes)?

Thanks,

- KK


^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Eric Dumazet @ 2010-11-09  5:22 UTC (permalink / raw)
  To: Andrew Hendry; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <AANLkTi=HhouZymj0R7JsDy-X1LDbfT_WL0x10EMhdOho@mail.gmail.com>

On Tuesday 09 November 2010 at 11:05 +1100, Andrew Hendry wrote:
> results on an i7 860 @ 2.80Ghz machine, no virtualization involved. 2.6.37-rc1+
> 
> # time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 50.2022 s, 209 MB/s
> 
> real	0m50.210s
> user	0m1.094s
> sys	0m57.589s

Thanks !

Could you take a perf snapshot during the test?

# perf record -a -g sleep 10
# perf report




^ permalink raw reply

* Re: Takes > 1 second to delete macvlan with global IPv6 address on it.
From: Eric Dumazet @ 2010-11-09  6:15 UTC (permalink / raw)
  To: Ben Greear; +Cc: NetDev
In-Reply-To: <4CD893C6.2030803@candelatech.com>

On Monday 08 November 2010 at 16:20 -0800, Ben Greear wrote:
> This is on an otherwise lightly loaded 2.6.36 + hacks system, 12 physical interfaces,
> and two VETH interfaces.
> 
> It's much faster to delete an interface when it has no IPv6 address:
> 
> [root@ct503-60 lanforge]# time ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> 
> real	0m0.005s
> user	0m0.001s
> sys	0m0.004s
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m0.033s
> user	0m0.001s
> sys	0m0.005s
> [root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> 
> [root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m1.030s
> user	0m0.000s
> sys	0m0.013s
> 
> 
> Funnily enough, if you explicitly remove the IPv6 address first, it seems
> to run at normal speed (adding both operations' times together)
> 
> [root@ct503-60 lanforge]# ip link add link eth5 up name eth5#0 address 00:00:00:00:00:01 type macvlan
> [root@ct503-60 lanforge]# ip -6 addr add 2002::1/64 dev eth5#0
> [root@ct503-60 lanforge]# time ip -6 addr delete 2002::1/64 dev eth5#0
> 
> real	0m0.001s
> user	0m0.000s
> sys	0m0.001s
> [root@ct503-60 lanforge]# time ip link delete eth5#0
> 
> real	0m0.028s
> user	0m0.001s
> sys	0m0.005s
> 

The key here is that you have to wait a bit (2 seconds) between
"ip -6 addr add..." and "ip link delete"; otherwise the delete is fast.

So ipv6 misses a cleanup somewhere, and a device refcount is still held.

Here is a debugging patch against current kernels:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 072652d..820d9ed 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1799,6 +1799,7 @@ extern void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
+	WARN_ON(dev->reg_state == NETREG_UNREGISTERED);
 	irqsafe_cpu_dec(*dev->pcpu_refcnt);
 }
 
This gives:

[  418.614227] ------------[ cut here ]------------
[  418.614281] WARNING: at include/linux/netdevice.h:1802 in6_dev_finish_destroy+0xc9/0xf0()
[  418.614348] Hardware name: ProLiant BL460c G6
[  418.614392] Modules linked in: macvlan ipmi_devintf ipmi_si ipmi_msghandler dm_mod tg3 libphy sg [last unloaded: x_tables]
[  418.614804] Pid: 5403, comm: ip Tainted: G        W   2.6.37-rc1-00186-g5c6f178-dirty #271
[  418.614857] Call Trace:
[  418.614901]  [<ffffffff814ecac9>] ? in6_dev_finish_destroy+0xc9/0xf0
[  418.614952]  [<ffffffff81046440>] warn_slowpath_common+0x90/0xc0
[  418.615002]  [<ffffffff8104648a>] warn_slowpath_null+0x1a/0x20
[  418.615051]  [<ffffffff814ecac9>] in6_dev_finish_destroy+0xc9/0xf0
[  418.615101]  [<ffffffff814f469e>] ip6_dst_ifdown+0x5e/0x60
[  418.615150]  [<ffffffff81448318>] dst_ifdown+0x38/0x110
[  418.615198]  [<ffffffff81448457>] dst_dev_event+0x67/0x130
[  418.615247]  [<ffffffff815d2888>] notifier_call_chain+0x58/0x80
[  418.615298]  [<ffffffff8106b86e>] __raw_notifier_call_chain+0xe/0x10
[  418.615348]  [<ffffffff8106b886>] raw_notifier_call_chain+0x16/0x20
[  418.615432]  [<ffffffff814408d7>] call_netdevice_notifiers+0x37/0x70
[  418.615496]  [<ffffffff81440a47>] netdev_run_todo+0x137/0x260
[  418.615560]  [<ffffffff8144f11e>] rtnl_unlock+0xe/0x10
[  418.615621]  [<ffffffff8144f18a>] rtnetlink_rcv+0x2a/0x40
[  418.615684]  [<ffffffff8148b043>] netlink_unicast+0x2c3/0x2d0
[  418.615747]  [<ffffffff81438a8b>] ? memcpy_fromiovec+0x7b/0xa0
[  418.615810]  [<ffffffff8148bddd>] netlink_sendmsg+0x24d/0x380
[  418.615874]  [<ffffffff8142dad0>] sock_sendmsg+0xc0/0xf0
[  418.615938]  [<ffffffff81458370>] ? verify_compat_iovec+0x80/0x130
[  418.616002]  [<ffffffff8142e894>] sys_sendmsg+0x1a4/0x340
[  418.616065]  [<ffffffff810dad46>] ? handle_mm_fault+0x676/0x8b0
[  418.616129]  [<ffffffff815d2610>] ? do_page_fault+0x2a0/0x4c0
[  418.616192]  [<ffffffff8142df09>] ? sys_recvmsg+0x49/0x70
[  418.616254]  [<ffffffff81457f14>] compat_sys_sendmsg+0x14/0x20
[  418.616317]  [<ffffffff81458cbf>] compat_sys_socketcall+0x1cf/0x220
[  418.616380]  [<ffffffff815cf1e5>] ? page_fault+0x25/0x30
[  418.616443]  [<ffffffff8102ec60>] sysenter_dispatch+0x7/0x2e
[  418.616520] ---[ end trace c2d75997b525ef59 ]---
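The debugging hunk turns a silent late reference drop into a backtrace that names the holder. Here that holder is in6_dev_finish_destroy(), reached from ip6_dst_ifdown(). The same check is sketched below in userspace. fprintf() stands in for WARN_ON(), and the caller name must be passed explicitly (for example with __func__). All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERED };

/* Hypothetical stand-in for the bits of struct net_device the check uses. */
struct model_dev {
	enum model_reg_state reg_state;
	int refcnt;
	unsigned int late_puts;	/* how many times the check fired */
};

/*
 * Same idea as the WARN_ON() added to dev_put(): a reference dropped after
 * the device is already unregistered means someone held it too long, so
 * report who did it instead of just decrementing silently.
 */
static void model_dev_put(struct model_dev *dev, const char *caller)
{
	if (dev->reg_state == MODEL_UNREGISTERED) {
		dev->late_puts++;
		fprintf(stderr, "WARNING: dev_put() after unregister, from %s\n",
			caller);
	}
	assert(dev->refcnt > 0);
	dev->refcnt--;
}
```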



^ permalink raw reply related

* Re: [PATCH] via-rhine: hardware VLAN support
From: Roger Luethi @ 2010-11-09  6:18 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, David S. Miller
In-Reply-To: <AANLkTim40QH2AWz8YtW_y3=WjEU0_Rom9-CPFj-O5MCt@mail.gmail.com>

On Mon, 08 Nov 2010 12:53:57 -0800, Jesse Gross wrote:
> On Mon, Nov 8, 2010 at 8:21 AM, Roger Luethi <rl@hellgate.ch> wrote:
> > On Fri, 05 Nov 2010 11:31:56 -0700, Jesse Gross wrote:
> >> On Fri, Nov 5, 2010 at 3:43 AM, Roger Luethi <rl@hellgate.ch> wrote:
> >> > This patch adds VLAN hardware support for Rhine chips.
> >>
> >> This uses the old interfaces for vlan acceleration.  We're working to
> >> switch drivers over to use the new methods and the old ones will be
> >> going away in the future.  It would be great if we can avoid adding
> >> more code that uses those interfaces.
> >
> > Can you point me to a driver that has been switched to use the new methods
> > already? Is there some other form of documentation?
> 
> bnx2 is an example of a driver that has been converted.  The commit
> that actually made the change was
> 7d0fd2117e3d0550d7987b3aff2bfbc0244cf7c6, which should highlight the
> differences.  A key point is that drivers should no longer reference
> vlan groups at all.

Thank you. I will take a look and submit a revised patch.

^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Eric Dumazet @ 2010-11-09  6:23 UTC (permalink / raw)
  To: Andrew Hendry; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <1289280152.2790.23.camel@edumazet-laptop>

On Tuesday 09 November 2010 at 06:22 +0100, Eric Dumazet wrote:
> On Tuesday 09 November 2010 at 11:05 +1100, Andrew Hendry wrote:
> > results on an i7 860 @ 2.80Ghz machine, no virtualization involved. 2.6.37-rc1+
> > 
> > # time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
> > 10000+0 records in
> > 10000+0 records out
> > 10485760000 bytes (10 GB) copied, 50.2022 s, 209 MB/s
> > 
> > real	0m50.210s
> > user	0m1.094s
> > sys	0m57.589s
> 
> Thanks !
> 
> Could you take a perf snapshot during the test?
> 
> # perf record -a -g sleep 10
> # perf report
> 
> 

On my laptop 
Intel(R) Core(TM)2 Duo CPU     T8300  @ 2.40GHz
(2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64
GNU/Linux) :

time dd if=/dev/zero bs=1M count=10000|netcat 127.0.0.1 9999
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 38.2691 s, 274 MB/s

real	0m38.274s
user	0m1.870s
sys	0m38.370s


perf top result:

-------------------------------------------------------------------------------------------------
   PerfTop:    1948 irqs/sec  kernel:90.7%  exact:  0.0% [1000Hz cycles],  (all, 2 CPUs)
-------------------------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ ___________________

             1867.00 12.4% copy_user_generic_string    [kernel.kallsyms]  
             1166.00  7.7% __ticket_spin_lock          [kernel.kallsyms]  
              744.00  4.9% __clear_user                [kernel.kallsyms]  
              667.00  4.4% system_call                 [kernel.kallsyms]  
              329.00  2.2% tcp_sendmsg                 [kernel.kallsyms]  
              304.00  2.0% schedule                    [kernel.kallsyms]  
              257.00  1.7% _raw_spin_unlock_irqrestore [kernel.kallsyms]  
              231.00  1.5% fget_light                  [kernel.kallsyms]  
              216.00  1.4% do_poll                     [kernel.kallsyms]  
              203.00  1.3% __read_chk                  /lib/libc-2.12.1.so
              202.00  1.3% __pollwait                  [kernel.kallsyms]  
              201.00  1.3% __poll                      /lib/libc-2.12.1.so
              187.00  1.2% system_call_after_swapgs    [kernel.kallsyms]  
              176.00  1.2% __write                     /lib/libc-2.12.1.so
              173.00  1.1% _raw_spin_lock_irqsave      [kernel.kallsyms]  
              163.00  1.1% tcp_recvmsg                 [kernel.kallsyms]  
              158.00  1.0% do_sys_poll                 [kernel.kallsyms]  
              153.00  1.0% vfs_write                   [kernel.kallsyms]  
              143.00  0.9% pipe_read                   [kernel.kallsyms]  
              141.00  0.9% fput                        [kernel.kallsyms]  
              121.00  0.8% common_file_perm            [kernel.kallsyms]  
              120.00  0.8% _cond_resched               [kernel.kallsyms]  


# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0   1456 120056  51572 2606876    0    0   158    41  254  190  9  2 88  0
 2  0   1456 120140  51580 2606868    0    0    12     0  758 158309 11 76 13  0
 2  0   1456 119520  51588 2606896    0    0     0   176  778 160749  8 80 12  0
 2  0   1456 120388  51588 2606896    0    0     0     0  730 158201  9 76 16  0
 3  0   1456 120388  51588 2606896    0    0     0     0  745 158490  8 76 16  0
 2  0   1456 120520  51588 2606896    0    0     0     0  991 159120  9 78 13  0
 2  0   1456 120024  51588 2606896    0    0     0     0  653 160023 10 79 11  0
 3  0   1456 120520  51588 2606896    0    0     0     0  659 160614  8 78 14  0
 2  0   1456 120272  51596 2606896    0    0     0    80  695 159922 10 75 14  0
 4  0   1456 120272  51596 2606896    0    0     0     0  675 158010  7 79 14  0


# powertop
     PowerTOP version 1.13      (C) 2007 Intel Corporation

< Detailed C-state information is not P-states (frequencies)
                                      Turbo Mode    43.1%
                                        2.40 Ghz    48.0%
                                        2.00 Ghz     8.2%
                                        1.60 Ghz     0.7%
                                        1200 Mhz     0.1%

Wakeups-from-idle per second : 542.9    interval: 10.0s
no ACPI power usage estimate available

Top causes for wakeups:
  21.9% (196.5)   [kernel scheduler] Load balancing tick
  21.2% (190.7)   [Rescheduling interrupts] <kernel IPI>
  12.7% (114.0)   PS/2 keyboard/mouse/touchpad interrupt
  12.0% (107.9)   plugin-containe
  11.1% ( 99.3)   alsa-sink
   6.0% ( 53.8)   firefox-bin
   4.4% ( 39.7)   fping
   3.9% ( 35.2)   Xorg
   1.3% ( 11.3)   [b43] <interrupt>
   1.1% ( 10.0)   ksoftirqd/0
   0.4% (  4.0)D  nagios3
   0.2% (  1.9)D  gnome-terminal
   0.7% (  6.4)   [Thermal event interrupts] <kernel IPI>




^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Andrew Hendry @ 2010-11-09  6:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <1289283797.2790.84.camel@edumazet-laptop>

Most of my slowdown was due to kmemleak being left on.

After fixing that, it is still a lot slower than your dev system.
# time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 25.8182 s, 406 MB/s

real	0m25.821s
user	0m1.502s
sys	0m33.463s

------------------------------------------------------------------------------------------------------------------
   PerfTop:     241 irqs/sec  kernel:56.8%  exact:  0.0% [1000Hz cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------------------------------------------

             samples  pcnt function                    DSO
             _______ _____ ___________________________ ______________________________________

             1255.00  8.7% hpet_msi_next_event         /lib/modules/2.6.37-rc1+/build/vmlinux
             1081.00  7.5% copy_user_generic_string    /lib/modules/2.6.37-rc1+/build/vmlinux
              863.00  6.0% __ticket_spin_lock          /lib/modules/2.6.37-rc1+/build/vmlinux
              498.00  3.5% do_sys_poll                 /lib/modules/2.6.37-rc1+/build/vmlinux
              455.00  3.2% system_call                 /lib/modules/2.6.37-rc1+/build/vmlinux
              409.00  2.8% fget_light                  /lib/modules/2.6.37-rc1+/build/vmlinux
              348.00  2.4% tcp_sendmsg                 /lib/modules/2.6.37-rc1+/build/vmlinux
              269.00  1.9% fsnotify                    /lib/modules/2.6.37-rc1+/build/vmlinux
              258.00  1.8% _raw_spin_unlock_irqrestore /lib/modules/2.6.37-rc1+/build/vmlinux
              223.00  1.6% _raw_spin_lock_irqsave      /lib/modules/2.6.37-rc1+/build/vmlinux
              203.00  1.4% __clear_user                /lib/modules/2.6.37-rc1+/build/vmlinux
              184.00  1.3% tcp_poll                    /lib/modules/2.6.37-rc1+/build/vmlinux
              178.00  1.2% vfs_write                   /lib/modules/2.6.37-rc1+/build/vmlinux
              165.00  1.1% tcp_recvmsg                 /lib/modules/2.6.37-rc1+/build/vmlinux
              152.00  1.1% pipe_read                   /lib/modules/2.6.37-rc1+/build/vmlinux
              149.00  1.0% schedule                    /lib/modules/2.6.37-rc1+/build/vmlinux
              135.00  0.9% rw_verify_area              /lib/modules/2.6.37-rc1+/build/vmlinux
              135.00  0.9% __pollwait                  /lib/modules/2.6.37-rc1+/build/vmlinux
              130.00  0.9% __write                     /lib/libc-2.12.1.so
              127.00  0.9% __ticket_spin_unlock        /lib/modules/2.6.37-rc1+/build/vmlinux
              126.00  0.9% __poll                      /lib/libc-2.12.1.so



^ permalink raw reply

* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Eric Dumazet @ 2010-11-09  6:38 UTC (permalink / raw)
  To: Andrew Hendry; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <AANLkTikAPaU_2=wS_T3V-8xFZm-G3qutJBxY8yb0QCYL@mail.gmail.com>

On Tuesday 09 November 2010 at 17:30 +1100, Andrew Hendry wrote:
> Most of my slowdown was due to kmemleak being left on.
> 
> After fixing that, it is still a lot slower than your dev system.
> # time dd if=/dev/zero bs=1M count=10000 | netcat  127.0.0.1 9999
> 10000+0 records in
> 10000+0 records out
> 10485760000 bytes (10 GB) copied, 25.8182 s, 406 MB/s
> 
> real	0m25.821s
> user	0m1.502s
> sys	0m33.463s
> 
> ------------------------------------------------------------------------------------------------------------------
>    PerfTop:     241 irqs/sec  kernel:56.8%  exact:  0.0% [1000Hz cycles],  (all, 8 CPUs)
> ------------------------------------------------------------------------------------------------------------------
> 
>              samples  pcnt function                    DSO
>              _______ _____ ___________________________ ______________________________________
> 
>              1255.00  8.7% hpet_msi_next_event         /lib/modules/2.6.37-rc1+/build/vmlinux
>              1081.00  7.5% copy_user_generic_string    /lib/modules/2.6.37-rc1+/build/vmlinux
>               863.00  6.0% __ticket_spin_lock          /lib/modules/2.6.37-rc1+/build/vmlinux
>               498.00  3.5% do_sys_poll                 /lib/modules/2.6.37-rc1+/build/vmlinux
>               455.00  3.2% system_call                 /lib/modules/2.6.37-rc1+/build/vmlinux
>               409.00  2.8% fget_light                  /lib/modules/2.6.37-rc1+/build/vmlinux
>               348.00  2.4% tcp_sendmsg                 /lib/modules/2.6.37-rc1+/build/vmlinux
>               269.00  1.9% fsnotify                    /lib/modules/2.6.37-rc1+/build/vmlinux
>               258.00  1.8% _raw_spin_unlock_irqrestore /lib/modules/2.6.37-rc1+/build/vmlinux
>               223.00  1.6% _raw_spin_lock_irqsave      /lib/modules/2.6.37-rc1+/build/vmlinux
>               203.00  1.4% __clear_user                /lib/modules/2.6.37-rc1+/build/vmlinux
>               184.00  1.3% tcp_poll                    /lib/modules/2.6.37-rc1+/build/vmlinux
>               178.00  1.2% vfs_write                   /lib/modules/2.6.37-rc1+/build/vmlinux
>               165.00  1.1% tcp_recvmsg                 /lib/modules/2.6.37-rc1+/build/vmlinux
>               152.00  1.1% pipe_read                   /lib/modules/2.6.37-rc1+/build/vmlinux
>               149.00  1.0% schedule                    /lib/modules/2.6.37-rc1+/build/vmlinux
>               135.00  0.9% rw_verify_area              /lib/modules/2.6.37-rc1+/build/vmlinux
>               135.00  0.9% __pollwait                  /lib/modules/2.6.37-rc1+/build/vmlinux
>               130.00  0.9% __write                     /lib/libc-2.12.1.so
>               127.00  0.9% __ticket_spin_unlock        /lib/modules/2.6.37-rc1+/build/vmlinux
>               126.00  0.9% __poll                      /lib/libc-2.12.1.so
> 
> 


Hmm, your clock source is HPET (hpet_msi_next_event tops your profile at
8.7%); that might explain the problem on a scheduler-intensive workload.

My HP dev machine:
# grep . /sys/devices/system/clocksource/clocksource0/*
/sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm 
/sys/devices/system/clocksource/clocksource0/current_clocksource:tsc

My laptop:
$ grep . /sys/devices/system/clocksource/clocksource0/*
/sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm 
/sys/devices/system/clocksource/clocksource0/current_clocksource:tsc




* Re: Loopback performance from kernel 2.6.12 to 2.6.37
From: Eric Dumazet @ 2010-11-09  6:42 UTC
  To: Andrew Hendry; +Cc: Jesper Dangaard Brouer, netdev
In-Reply-To: <1289284715.2790.87.camel@edumazet-laptop>

On Tuesday, 09 November 2010 at 07:38 +0100, Eric Dumazet wrote:

> Hmm, your clock source is HPET, that might explain the problem on a
> scheduler intensive workload.
> 

And if a packet sniffer (dhclient, for example) causes every packet to
be timestamped, that can also explain a slowdown, even without any
scheduler artifacts.

To see which packet sockets are open:

cat /proc/net/packet

> My HP dev machine
> # grep . /sys/devices/system/clocksource/clocksource0/*
> /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm 
> /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc
> 
> My laptop:
> $ grep . /sys/devices/system/clocksource/clocksource0/*
> /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm 
> /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc
> 






This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox