Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv4 net-next-2.6 4/5] XFRM,IPv6: Add IRO remapping hook in xfrm_input()
From: Arnaud Ebalard @ 2010-10-06 21:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Eric Dumazet, Hideaki YOSHIFUJI, netdev
In-Reply-To: <20101006012509.GA3540@gondor.apana.org.au>

Hi Herbert,

Herbert Xu <herbert@gondor.apana.org.au> writes:

> On Wed, Oct 06, 2010 at 01:28:42AM +0200, Arnaud Ebalard wrote:
>>
>>  1) First, current net-next-2.6 (no patches applied)
>>  2) then, with my patches applied and CONFIG_XFRM_SUB_POLICY enabled
>>  3) then, with my patches applied and CONFIG_XFRM_SUB_POLICY disabled
>
> To measure the effect of this properly you should use null
> encryption/hashing and look at the CPU utilisation with minimum
> packet sizes.

I did that w/ the following kernels on the receiver (same as before):

 - kernel 1: w/o patches (net-next-2.6)
 - kernel 2: patches applied w/ CONFIG_XFRM_SUB_POLICY
 - kernel 3: patches applied w/o CONFIG_XFRM_SUB_POLICY

5 runs of sending 1073741824 bytes over TCP protected using ESP (null
enc and null auth) in transport mode, forcing small packets:

dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x <v4orv6dst> 1234 

5 runs for each kernel for IPv4 and IPv6. I keep only the 3 central
values provided by dd indicating the completion of the emission (remove
highest and smallest) and do an average w/ those 3 values: 

              IPv4          IPv6
kernel 1    70.0815s      72.4843s
kernel 2    69.8335s      72.3462s
kernel 3    69.9758s      72.3588s

I used the exact same method for the tests for the 3 cases (reboot,
started the test, nothing else running). I am unable to explain why it's
longer for the test to complete with an unpatched kernel but this is
what I get. Maybe it is just an artefact and the impact is just simply
too small to be mesured. Anyway, the whole set of results are at the end
of the email. 


>> > With your remapping, would it be possible to add dummy xfrm_state
>> > objects with the remapped destination address that could then call
>> > xfrm6_input_addr?
>> >
>> > That way normal IPsec users would not be affected at all while
>> > preserving your new functionality.
>> 
>> I don't think I can do that easily (at all?) with what XFRM provides,
>> can I? Or at least I don't see how it is possible because I would need
>> some kind of policy for the state to be applied and the only trigger I
>> see is the src/dst address mismatch when processing the IPsec packet.
>
> So do you know the remapped destination addresses a priori?

I have both the HoA and the CoA (this is x->coaddr in my state. '::' is
allowed if I want to allow anything), but the appplication of the state
needs to be done for traffic meant for the HoA and not blindly for all
the IPsec traffic received with the CoA (as source or destination).

it's not possible to just blindly remap things based on the on-wire
address and the fact it is IPsec traffic. For instance, if some tunnel
mode IPsec traffic between a MN and its HA is used in parallel, I cannot
remap the CoA from received packets on the MN to the HoA.

That's the reason why the lookup is done via something stable (i.e. the
HoA) which is derived from the SPI during SA lookup. 


> If not then then other possibility would be to add the code hook
> in case of xfrm_state_lookup failure.

This would work for destination address. But it has the drawback of
requiring a first lookup (guaranteed to fail if destination remapping
for the feature) and then a second (somehow w/o using the destination
address). Source check would still be done in all cases.


> But more importantly you need to solve the hash collission issue
> that I mentioned earlier.  Without that it won't work at all.

You are correct. The fact that the byspi hash table contains all states
(for in and out traffic) and that the state lookup does not involve the
direction but only the destination address (damn, the one I remove ;-))
is a real issue. It just makes the state lookup unreliable when I pass
NULL as daddr for incoming traffic. Thanks for pointing that.

I don't see any good solution yet (state have no direction) but I will
definitely focus on that issue and spend some time on it tomorrow.

Cheers,

a+

The full set of results for the tests:

* kernel 1

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 70.0393 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.2442 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.1995 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0057 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.9887 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.5605 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.9725 s, 14.7 MB/s
1073741824 bytes (1.1 GB) copied, 72.3099 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.435 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.4573 s, 14.8 MB/s

* kernel 2

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.6845 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.9419 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 69.8615 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0142 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 69.6970 s, 15.4 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4252 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 71.5411 s, 15.0 MB/s
1073741824 bytes (1.1 GB) copied, 72.2388 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3745 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.6553 s, 14.8 MB/s

* kernel 3

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x 192.168.0.14 1234
1073741824 bytes (1.1 GB) copied, 69.8569 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.4445 s, 15.2 MB/s
1073741824 bytes (1.1 GB) copied, 69.8989 s, 15.4 MB/s
1073741824 bytes (1.1 GB) copied, 70.0238 s, 15.3 MB/s
1073741824 bytes (1.1 GB) copied, 70.0047 s, 15.3 MB/s

$ dd if=/dev/zero bs=1024 count=1048576 | nc --mtu=100 -x fdff::1 1234 
1073741824 bytes (1.1 GB) copied, 72.4677 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.2757 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.2261 s, 14.9 MB/s
1073741824 bytes (1.1 GB) copied, 72.3331 s, 14.8 MB/s
1073741824 bytes (1.1 GB) copied, 72.8262 s, 14.7 MB/s


^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] bonding: add Speed/Duplex information to /proc/net/bonding/bond
From: Krzysztof Olędzki @ 2010-10-06 21:32 UTC (permalink / raw)
  To: David Miller; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <20101006.142424.246555203.davem@davemloft.net>

On 2010-10-06 23:24, David Miller wrote:
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 18:19:04 +0200
>
>> Subject: bonding: add Speed/Duplex information to /proc/net/bonding/bond*
>>
>> Effect:
>>   Slave Interface: eth5
>>   MII Status: up
>>   Speed: 10000 Mbps
>>   Duplex: full
>>   Link Failure Count: 0
>>   Permanent HW addr: XX:XX:XX:XX:XX:XX
>>   Slave queue ID: 0
>>
>> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
>
> Changing the layout of a procfs file is pretty much always unsafe.
>
> Some useland program somewhere depends upon the current layout, and
> therefore very likely could break if we change the contents.
>
> We really can't apply a patch like this.  A most portable and
> extensible interface (netlink, ethtool) should be used to provide this
> information to userspace.

Sure, no problem.

Please note however that this procfs file was recently changed ("Slave 
queue ID" was added) so if this is indeed a problem than we would 
already heard about this.

Best regards,

			Krzysztof Olędzki

^ permalink raw reply

* Re: [net-next-2.6 PATCH 3/3] bonding: reread information about speed and duplex when interface goes up
From: David Miller @ 2010-10-06 21:28 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b87e.NzbQ35P2qRrA2iuz%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:19:10 +0200

>>From 43285224a785e90c7d4cff2be0766ca8df6ddfb9 Mon Sep 17 00:00:00 2001
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 17:09:02 +0200
> Subject: bonding: reread information about speed and duplex when interface goes up
> 
> When an interface was enslaved when it was down, bonding thinks
> it has speed -1 even after it goes up. This leads into selecting
> a wrong active interface in active/backup mode on mixed 10G/1G or
> 1G/100M environment.
> 
> before:
>  bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
> 
> after:
>  bonding: bond0: link status definitely up for interface eth5, 10000 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Applied.

^ permalink raw reply

* Re: [net-next-2.6 PATCH 1/3] bonding: print information about speed and duplex seen by the driver
From: David Miller @ 2010-10-06 21:28 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b873.iZ0Dea7Y1OFcVmf8%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:18:59 +0200

>>From d4fb10933bb88f2bf410b0abe42f151c13e3df85 Mon Sep 17 00:00:00 2001
> From: Krzysztof Piotr Oledzki <ole@ans.pl>
> Date: Thu, 30 Sep 2010 17:05:19 +0200
> Subject: bonding: print information about speed and duplex seen by the driver
> 
> before:
>  bonding: bond0: link status definitely up for interface eth5
>  bonding: bond0: link status definitely up for interface eth0
> 
> after:
>  bonding: bond0: link status definitely up for interface eth5, 100 Mbps full duplex.
>  bonding: bond0: link status definitely up for interface eth0, 100 Mbps full duplex.
> 
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Applied.

^ permalink raw reply

* Re: Regarding to your linux kernel CL
From: Chung-Yih Wang (王崇懿) @ 2010-10-06 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, timo.teras, netdev
In-Reply-To: <20101006.011448.112578977.davem@davemloft.net>

Do you mean if we should move the genid/cookie logic into sk_dst_check
and remove the dst->ops->check()? If so, I dont think it will be a
good idea to remove 'dst->ops->check' since it is per dst entry's
check not just for ipv4 entry. At least, the fact of obsolete=2
reveals that the entry is freed and is ready for being released. That
will be weird if we still use the obsolete entry for routing the
socket's traffic(for sure, that's why the packets of the socket goes
nowhere but dropped).

On Wed, Oct 6, 2010 at 1:14 AM, David Miller <davem@davemloft.net> wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 6 Oct 2010 15:59:46 +0800
>
>> Perhaps we should move the genid/cookie logic into the dst so that
>> we can eliminate the dst->check call or at least make it a lot less
>> frequent.
>
> That sounds like a good idea.
>

^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/3] bonding: add Speed/Duplex information to /proc/net/bonding/bond
From: David Miller @ 2010-10-06 21:24 UTC (permalink / raw)
  To: ole; +Cc: fubar, bonding-devel, netdev
In-Reply-To: <4ca4b878.KGc4gLtXHLSFlAdV%ole@ans.pl>

From: Krzysztof Piotr Oledzki <ole@ans.pl>
Date: Thu, 30 Sep 2010 18:19:04 +0200

> Subject: bonding: add Speed/Duplex information to /proc/net/bonding/bond*
> 
> Effect:
>  Slave Interface: eth5
>  MII Status: up
>  Speed: 10000 Mbps
>  Duplex: full
>  Link Failure Count: 0
>  Permanent HW addr: XX:XX:XX:XX:XX:XX
>  Slave queue ID: 0
> 
> Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Changing the layout of a procfs file is pretty much always unsafe.

Some useland program somewhere depends upon the current layout, and
therefore very likely could break if we change the contents.

We really can't apply a patch like this.  A most portable and
extensible interface (netlink, ethtool) should be used to provide this
information to userspace.

^ permalink raw reply

* Re: [PATCH] net/fec: fix link/queue interaction, wake queue iff link is up
From: David Miller @ 2010-10-06 21:15 UTC (permalink / raw)
  To: oskar; +Cc: netdev, dan, bigeasy, hjk
In-Reply-To: <1286364718-20070-1-git-send-email-oskar@linutronix.de>

From: Oskar Schirmer <oskar@linutronix.de>
Date: Wed,  6 Oct 2010 13:31:58 +0200

> with hardware slow in negotiation, the system did freeze
> while trying to mount root on nfs at boot time.
> 
> while the driver did report being busy when the link is down
> or no transmission buffers are available, it did not stop the
> queue, causing instant retries. furthermore, transmission being
> triggered with link down was caused by unconditional queue
> wakes, especially on timeouts.
> 
> now, wake queue only iff link is up and transmission
> buffers are available, and dont forget to wake queue
> when link has been adjusted. next, add stop queue notification
> upon driver induced transmission problems, so network stack has
> a chance to handle the situation.
> 
> Signed-off-by: Oskar Schirmer <oskar@linutronix.de>

You should never have to do this.

As long as netif_carrier_on() and netif_carrier_off() are invoked
properly by the driver, the core networking will never send you
packets when netif_carrier_ok() is false.

I am not applying this patch, drivers should not manage tx queue
state using link status as a condition, that's the job of the
core networking.

^ permalink raw reply

* Re: [PATCH net-next 00/19] bnx2x, cnic, bnx2i: Supporting 57712 device
From: David Miller @ 2010-10-06 21:11 UTC (permalink / raw)
  To: dmitry; +Cc: netdev, eilong, mchan
In-Reply-To: <1286370813.30552.7.camel@lb-tlvb-dmitry>

From: "Dmitry Kravkov" <dmitry@broadcom.com>
Date: Wed, 6 Oct 2010 15:13:33 +0200

> 01 cnic: Pass cp pointer to BNX2X_HW_CID.
> 02 cnic: Use pfid for internal memory offsets.
> 03 cnic: Fine-tune ring init code.
> 04 bnx2x: create folder for bnx2x firmware files
> 05 bnx2x: add 6.0.34 fw files
> 06 bnx2x, cnic, bnx2i: use new FW/HSI
> 07 bnx2x: remove old FW files 
> 08 bnx2x: rename MF related fields
> 09 bnx2x: change type of spq_left to atomic
> 10 bnx2x: Add 57712 support
> 11 bnx2x: remove unused parameter in reuse_rx_skb()
> 12 bnx2x: remove unused fields in main driver structure
> 13 bnx2x: use proper constants for dma_unmap* calls
> 14 bnx2x: use L1_CACHE_BYTES instead of magic number
> 15 bnx2x: move msix table initialization to probe()
> 16 bnx2x, cnic: Fix SPQ return credit
> 17 bnx2x: code beautify
> 18 bnx2x: properly initialize FW stats
> 19 bnx2x: update version to 1.60.00-1
> 
> Please consider applying to net-next.

All applied, thanks.

^ permalink raw reply

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: David Miller @ 2010-10-06 20:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: mcarlson, bhutchings, john.r.fastabend, netdev, therbert
In-Reply-To: <1286390945.9417.46.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 06 Oct 2010 20:49:05 +0200

> Le mercredi 06 octobre 2010 à 11:14 -0700, Matt Carlson a écrit :
> 
>> Yes.  We were missing a call to this function in the legacy case.
>> 
>> 
>> [PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs
>> 
>> Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
>> netif_set_real_num_{rx,tx}_queues()" added a new call to
>> netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
>> to be added to the legacy path to correctly reflect the actual number of
>> rx queues.
>> 
>> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
...
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply

* Segmentation fault when using ping6
From: Rogerio Pimentel @ 2010-10-06 20:27 UTC (permalink / raw)
  To: netdev, linux-arm-kernel

Hi,

I'm having a "segmentation fault" problem when using ping6.

When testing on ARM9 platforms (i.MX25 and i.MX27), it returns the error:

root@freescale ~$ ping6 ::1
PING ::1 (::1): 56 data bytes
Segmentation fault

When testing on ARM11 and ARM Cortex A8 platforms (i.MX31 and i.MX51), 
it works:

root@freescale ~$ ping6 ::1
PING ::1 (::1): 56 data bytes
64 bytes from ::1: seq=0 ttl=64 time=0.331 ms
64 bytes from ::1: seq=1 ttl=64 time=0.214 ms
64 bytes from ::1: seq=2 ttl=64 time=0.176 ms
64 bytes from ::1: seq=3 ttl=64 time=0.155 ms

On all cases, I'm using Linux Kernel 2.6.35 (from Mainline) and Busybox 
1.15.0

Does anybody have any idea?

Thanks in advance,

Rogerio Pimentel



^ permalink raw reply

* Re: drivers/net/Makefile - question about link order comment introduced in 2.3.43
From: David Miller @ 2010-10-06 20:15 UTC (permalink / raw)
  To: joe; +Cc: netdev, sam
In-Reply-To: <1286363049.2156.113.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 06 Oct 2010 04:04:09 -0700

> It wasn't that way in 2.3.43-pre7 , but it is in 2.3.43
> and has remained there since.  There's nothing in lkml
> history as far as I can find to describe a reason for it.
> 
> http://lkml.org/lkml/2000/2/10
> (or thereabout)
> 
> Anyone know why it's like that or if that Makefile link
> ordering is still needed?

I suspect it's not needed any more.

I also suspect it's an obsolete sparc specific hack to get the primary
ethernet interface to always show up as eth0.

^ permalink raw reply

* Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.37
From: David Miller @ 2010-10-06 20:11 UTC (permalink / raw)
  To: mst; +Cc: kvm, virtualization, netdev, linux-kernel, jasowang
In-Reply-To: <20101005182732.GA25796@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 5 Oct 2010 20:27:32 +0200

> It looks like it was a quiet cycle for vhost-net:
> probably because most of energy was spent on bugfixes
> that went in for 2.6.36.
> People are working on multiqueue, tracing but I'm not
> sure it'll get done in time for 2.6.37 - so here's
> a tree with a single patch that helps windows guests
> which we definitely want in the next kernel.
> 
> Please merge for 2.6.37.

Pulled, thanks Michael.

^ permalink raw reply

* Re: [PATCH wireless-2.6 or stable] iwlwifi: return error when fail to start scanning
From: John W. Linville @ 2010-10-06 20:01 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Florian Mickler, stable-DgEjT+Ai2ygdnm+yROfE0A,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	wey-yi.w.guy-ral2JQCrhuEAvxtiuMwx3w,
	reinette.chatre-ral2JQCrhuEAvxtiuMwx3w,
	ilw-VuQAYsv1563Yd54FQh9/CA, johannes.berg-ral2JQCrhuEAvxtiuMwx3w,
	ben.m.cahill-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006160434.GB24581-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Wed, Oct 06, 2010 at 06:04:35PM +0200, Stanislaw Gruszka wrote:
> This is stable friendly backport of wireless-testing commit
> 3eecce527c7434dfd16517ce08b49632c8a26717 "iwlwifi: unify scan start
> checks". Wireless-testing version include also a lot of cleanups.
> 
> On error case in {iwl3945,iwlagn}_request_scan we queue abort_scan work,
> which in a fact do nothing, so we never complete the scan. Thats gives
> "WARNING: at net/wireless/core.c:614 wdev_cleanup_work" and stopped
> network connection until reload iwlwifi modules. Return of -EIO, and
> stopping queuing any work is a fix.
> 
> Note there are still many cases when we can get: 
> 
> WARNING: at net/wireless/core.c:614 wdev_cleanup_work  
> WARNING: at net/mac80211/scan.c:266 ieee80211_scan_completed
> WARNING: at net/mac80211/scan.c:269 ieee80211_scan_complete
> 
> which are not fixed. Unfortunately does not exist single, small fix
> for that problems, and we probably will see some more bug reports
> with scan warnings. As solution we rewrite iwl-scan.c code to avoid
> all possible race conditions. That quite big patch set is queued
> for 2.6.37 .
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> Patch is intended for wireless-2.6, or in case when it does not
> make 2.6.36 release, for -stable 2.6.36.y 

I'm fine with this patch for -stable, but I'm not sure it is worthwhile
to rush it into 2.6.36.  The problem it fixes is intermittent and
not always very severe, and we've already been living with it for
4 releases.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org			might be all we have.  Be ready.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] SIW: iWARP Protocol headers
From: David Dillow @ 2010-10-06 19:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bernard Metzler, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006183734.GK24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Wed, 2010-10-06 at 12:37 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 06, 2010 at 02:22:55PM -0400, David Dillow wrote:
> > On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> > > On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> > > It is actually a little more complicated than just this. I assume you
> > > are casting the structures over packet payloads? In this case you
> > > have to guarentee alignment (or used packed everywhere). Does iwarp
> > > have provisions for alignment? If so you can construct your bitfields
> > > using the alignment type, ie if iWarp guarantees 4 bytes then the
> > > biggest type you can use is u32 - then you can avoid using packed.
> > > 
> > > Mind you, I'm not sure how to guarentee alignment when you consider
> > > all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?
> > 
> > You don't have to. The TCP stack, IIRC, requires the architecture to fix
> > up misaligned accesses, or at least it used to. It will try to keep
> > things aligned if possible -- see the comment above NET_IP_ALIGN in
> > include/linux/skbuff.h.
> 
> I was under the impression that this was just to align the IP
> header.

It is, and it assumes an ethernet header of 14 bytes. As you note, VLANs
and everything else can mess with that, which is why the TCP stack
requires the architecture to handle unaligned accesses. It's a bonus if
iWarp can guarantee an alignment WRT the IP or TCP header, but not
required.

> > The structures in Bernard's header have all fields naturally
> > aligned, so no need for the packed attribute and its baggage.
> 
> No, they arent:
> 
> > +struct iwarp_rdma_write {
> > +	struct iwarp_ctrl	ctrl;
> > +	__u32			sink_stag;
> > +	__u64			sink_to;
> > +} __attribute__((__packed__));
> 
> For instance has sink_to unaligned if the maximum aligment of the
> payload is only guarenteed to be 4. The proper thing to do in this
> case is probably __attribute__((__packed__, __aligned__(4))) if that
> works the way I think.. :)

struct iwarp_ctrl consists of 2 u16s, so sink_stag is aligned to 4 bytes
from the start of the struct iwarp_rdma_write, and sink_to is at 8 bytes
-- so they are naturally aligned with respect to the structure. The
compiler will not add padding, so no need for the packed attribute.

Note I'm not talking about alignment in memory -- that'll depend on
where the data starts, and as I mentioned, the stack doesn't make any
guarantees about that, relying on the architecture to fixup any
misaligned accesses.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2010-10-06 19:03 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: anthony, arnd, avi, davem, kvm, netdev, rusty, herbert
In-Reply-To: <OF806283F6.38268772-ON652577B4.005E805A-652577B4.00611B1C@in.ibm.com>

On Wed, Oct 06, 2010 at 11:13:31PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 10/05/2010 11:53:23 PM:
> 
> > > > Any idea where does this come from?
> > > > Do you see more TX interrupts? RX interrupts? Exits?
> > > > Do interrupts bounce more between guest CPUs?
> > > > 4. Identify reasons for single netperf BW regression.
> > >
> > > After testing various combinations of #txqs, #vhosts, #netperf
> > > sessions, I think the drop for 1 stream is due to TX and RX for
> > > a flow being processed on different cpus.
> >
> > Right. Can we fix it?
> 
> I am not sure how to. My initial patch had one thread but gave
> small gains and ran into limitations once number of sessions
> became large.

Sure. We will need multiple RX queues, and have a single
thread handle a TX and RX pair. Then we need to make sure packets
from a given flow on TX land on the same thread on RX.
As flows can be hashed differently, for this to work we'll have to
expose this info in host/guest interface.
But since multiqueue implies host/guest ABI changes anyway,
this point is moot.

BTW, an interesting approach could be using bonding
and multiple virtio-net interfaces.
What are the disadvantages of such a setup?  One advantage
is it can be made to work in existing guests.

> > >  I did two more tests:
> > >     1. Pin vhosts to same CPU:
> > >         - BW drop is much lower for 1 stream case (- 5 to -8% range)
> > >         - But performance is not so high for more sessions.
> > >     2. Changed vhost to be single threaded:
> > >           - No degradation for 1 session, and improvement for upto
> > >          8, sometimes 16 streams (5-12%).
> > >           - BW degrades after that, all the way till 128 netperf
> sessions.
> > >           - But overall CPU utilization improves.
> > >             Summary of the entire run (for 1-128 sessions):
> > >                 txq=4:  BW: (-2.3)      CPU: (-16.5)    RCPU: (-5.3)
> > >                 txq=16: BW: (-1.9)      CPU: (-24.9)    RCPU: (-9.6)
> > >
> > > I don't see any reasons mentioned above.  However, for higher
> > > number of netperf sessions, I see a big increase in retransmissions:
> >
> > Hmm, ok, and do you see any errors?
> 
> I haven't seen any in any statistics, messages, etc.

Herbert, could you help out debugging this increase in retransmissions
please?  Older mail on netdev in this thread has some numbers that seem
to imply that we start hitting retransmissions much more as # of flows
goes up.

> Also no
> retranmissions for txq=1.

While it's nice that we have this parameter, the need to choose between
single stream and multi stream performance when you start the vm makes
this patch much less interesting IMHO.


-- 
MST

^ permalink raw reply

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: Eric Dumazet @ 2010-10-06 18:49 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Ben Hutchings, John Fastabend, netdev@vger.kernel.org,
	therbert@google.com
In-Reply-To: <20101006181458.GA6817@mcarlson.broadcom.com>

Le mercredi 06 octobre 2010 à 11:14 -0700, Matt Carlson a écrit :

> Yes.  We were missing a call to this function in the legacy case.
> 
> 
> [PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs
> 
> Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
> netif_set_real_num_{rx,tx}_queues()" added a new call to
> netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
> to be added to the legacy path to correctly reflect the actual number of
> rx queues.
> 
> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
> ---
>  drivers/net/tg3.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 16e1a95..e5b9ec5 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -8906,6 +8906,7 @@ defcfg:
>  		tp->irq_cnt = 1;
>  		tp->napi[0].irq_vec = tp->pdev->irq;
>  		netif_set_real_num_tx_queues(tp->dev, 1);
> +		netif_set_real_num_rx_queues(tp->dev, 1);
>  	}
>  }
>  

Thanks, this fixed the thing for me, once device is UP

# insmod drivers/net/tg3.ko
# grep . /sys/class/net/eth3/queues/*/rps_cpus
/sys/class/net/eth3/queues/rx-0/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-1/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-2/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-3/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-4/rps_cpus:00000000
# echo ff >/sys/class/net/eth3/queues/rx-2/rps_cpus
# grep . /sys/class/net/eth3/queues/*/rps_cpus
/sys/class/net/eth3/queues/rx-0/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-1/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-2/rps_cpus:000000ff
/sys/class/net/eth3/queues/rx-3/rps_cpus:00000000
/sys/class/net/eth3/queues/rx-4/rps_cpus:00000000
# ifconfig eth3 up
# grep . /sys/class/net/eth3/queues/*/rps_cpus
00000000

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply

* Re: [PATCH] SIW: iWARP Protocol headers
From: Jason Gunthorpe @ 2010-10-06 18:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Bernard Metzler, Bart Van Assche, linux-rdma, linux-rdma-owner,
	netdev
In-Reply-To: <1286389375.26136.24.camel@lap75545.ornl.gov>

On Wed, Oct 06, 2010 at 02:22:55PM -0400, David Dillow wrote:
> On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> > On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> > It is actually a little more complicated than just this. I assume you
> > are casting the structures over packet payloads? In this case you
> > have to guarentee alignment (or used packed everywhere). Does iwarp
> > have provisions for alignment? If so you can construct your bitfields
> > using the alignment type, ie if iWarp guarantees 4 bytes then the
> > biggest type you can use is u32 - then you can avoid using packed.
> > 
> > Mind you, I'm not sure how to guarentee alignment when you consider
> > all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?
> 
> You don't have to. The TCP stack, IIRC, requires the architecture to fix
> up misaligned accesses, or at least it used to. It will try to keep
> things aligned if possible -- see the comment above NET_IP_ALIGN in
> include/linux/skbuff.h.

I was under the impression that this was just to align the IP
header. There are quite a variety of options and additional stuff that
can be inserted at each layer, VLAN for ethernet, IP option/extension
headers, TCP option headers, etc, etc. I seem to remember that because
of all this there are cases where processing unaligned payloads is
unavoidable? Hence my question if iWarp has restrictions (particularly
on TCP) that might guarentee alignment.

> The structures in Bernard's header have all fields naturally
> aligned, so no need for the packed attribute and its baggage.

No, they arent:

> +struct iwarp_rdma_write {
> +	struct iwarp_ctrl	ctrl;
> +	__u32			sink_stag;
> +	__u64			sink_to;
> +} __attribute__((__packed__));

For instance has sink_to unaligned if the maximum aligment of the
payload is only guarenteed to be 4. The proper thing to do in this
case is probably __attribute__((__packed__, __aligned__(4))) if that
works the way I think.. :)

Jason

^ permalink raw reply

* Re: [PATCH] SIW: iWARP Protocol headers
From: David Dillow @ 2010-10-06 18:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bernard Metzler, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006172518.GI24268-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On Wed, 2010-10-06 at 11:25 -0600, Jason Gunthorpe wrote:
> On Wed, Oct 06, 2010 at 02:52:46PM +0200, Bernard Metzler wrote:
> It is actually a little more complicated than just this. I assume you
> are casting the structures over packet payloads? In this case you
> have to guarentee alignment (or used packed everywhere). Does iwarp
> have provisions for alignment? If so you can construct your bitfields
> using the alignment type, ie if iWarp guarantees 4 bytes then the
> biggest type you can use is u32 - then you can avoid using packed.
> 
> Mind you, I'm not sure how to guarentee alignment when you consider
> all the possible sources of unalignment in the stack: TCP, IP, L2 stack ?

You don't have to. The TCP stack, IIRC, requires the architecture to fix
up misaligned accesses, or at least it used to. It will try to keep
things aligned if possible -- see the comment above NET_IP_ALIGN in
include/linux/skbuff.h.

The structures in Bernard's header have all fields naturally aligned, so
no need for the packed attribute and its baggage.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [net-next-2.6 PATCH] net: netif_set_real_num_rx_queues may cap num_rx_queues at init time
From: Matt Carlson @ 2010-10-06 18:14 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eric Dumazet, Matthew Carlson, John Fastabend,
	netdev@vger.kernel.org, therbert@google.com
In-Reply-To: <1286379105.2371.15.camel@achroite.uk.solarflarecom.com>

On Wed, Oct 06, 2010 at 08:31:45AM -0700, Ben Hutchings wrote:
> On Wed, 2010-10-06 at 17:23 +0200, Eric Dumazet wrote:
> > Le mercredi 06 octobre 2010 ?? 16:07 +0100, Ben Hutchings a ??crit :
> > 
> > > The waste of memory is minimal now that we only allocate kobjects for
> > > real_num_rx_queues.
> > 
> > Thats strange, here with tg3 (and mono queue device) :
> > 
> > 0a:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5715S
> > Gigabit Ethernet (rev a3)
> > 
> > grep . /sys/class/net/eth2/queues/rx-*/rps_cpus
> > /sys/class/net/eth2/queues/rx-0/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-1/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-2/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-3/rps_cpus:00000000
> > /sys/class/net/eth2/queues/rx-4/rps_cpus:00000000
> 
> It looks like I missed a necessary call to
> netif_set_real_num_rx_queues() in tg3.  I suggest that Matt should check
> and correct this since I got the numbers wrong last time.

Yes.  We were missing a call to this function in the legacy case.


[PATCH net-next] tg3: Set real_num_rx_queues for non-multiq devs

Commit 2ddaad397c47de012dfb956b0c05540da1a0dde5 entitled "tg3: Use
netif_set_real_num_{rx,tx}_queues()" added a new call to
netif_set_real_num_rx_queues in tg3_enable_msix().  This call also needs
to be added to the legacy path to correctly reflect the actual number of
rx queues.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 16e1a95..e5b9ec5 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -8906,6 +8906,7 @@ defcfg:
 		tp->irq_cnt = 1;
 		tp->napi[0].irq_vec = tp->pdev->irq;
 		netif_set_real_num_tx_queues(tp->dev, 1);
+		netif_set_real_num_rx_queues(tp->dev, 1);
 	}
 }
 
-- 
1.7.2.2



^ permalink raw reply related

* Re: [PATCH] SIW: Queue pair
From: Bart Van Assche @ 2010-10-06 17:57 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1286261719-5197-1-git-send-email-bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org>

On Tue, Oct 5, 2010 at 8:55 AM, Bernard Metzler <bmt-OA+xvbQnYDHMbYB6QlFGEg@public.gmane.org> wrote:
>
> [ ... ]
> +int
> +siw_qp_modify(struct siw_qp *qp, struct siw_qp_attrs *attrs,
> +             enum siw_qp_attr_mask mask)
> +{
> [ ... ]
> +               case SIW_QP_STATE_ERROR:
> +                       siw_rq_flush(qp);
> +                       qp->attrs.state = SIW_QP_STATE_ERROR;
> +                       drop_conn = 1;
> +                       break;

At least for InfiniBand when the state of a queue pair is changed to
Error, work requests must be completed in error. The above code just
flushes pending work requests. That doesn't look correct to me.

A quote from the InfiniBand Architecture Specification, table 91, QP
State Transition Properties:

Any state to Reset: QP attributes are reset to the same values after
the QP was created. Outstanding Work Requests are removed from the
queues without notifying the Consumer.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Arnd Bergmann @ 2010-10-06 17:50 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: anthony, avi, davem, Ben Greear, kvm, Michael S. Tsirkin, netdev,
	rusty
In-Reply-To: <OFB5315194.76E80A8A-ON652577B4.005DBA77-652577B4.005E77C6@in.ibm.com>

On Wednesday 06 October 2010 19:14:42 Krishna Kumar2 wrote:
> Arnd Bergmann <arnd@arndb.de> wrote on 10/06/2010 05:49:00 PM:
> 
> > > I don't see any reasons mentioned above.  However, for higher
> > > number of netperf sessions, I see a big increase in retransmissions:
> > > _______________________________________
> > > #netperf      ORG           NEW
> > >             BW (#retr)    BW (#retr)
> > > _______________________________________
> > > 1          70244 (0)     64102 (0)
> > > 4          21421 (0)     36570 (416)
> > > 8          21746 (0)     38604 (148)
> > > 16         21783 (0)     40632 (464)
> > > 32         22677 (0)     37163 (1053)
> > > 64         23648 (4)     36449 (2197)
> > > 128        23251 (2)     31676 (3185)
> > > _______________________________________
> >
> >
> > This smells like it could be related to a problem that Ben Greear found
> > recently (see "macvlan:  Enable qdisc backoff logic"). When the hardware
> > is busy, used to just drop the packet. With Ben's patch, we return
> -EAGAIN
> > to qemu (or vhost-net) to trigger a resend.
> >
> > I suppose what we really should do is feed that condition back to the
> > guest network stack and implement the backoff in there.
> 
> Thanks for the pointer. I will take a look at this as I hadn't seen
> this patch earlier. Is there any way to figure out if this is the
> issue?

I think a good indication would be if this changes with/without the
patch, and if you see -EAGAIN in qemu with the patch applied.

	Arnd

^ permalink raw reply

* Re: [PATCH wireless-2.6 or stable] iwlwifi: return error when fail to start scanning
From: Guy, Wey-Yi @ 2010-10-06 17:48 UTC (permalink / raw)
  To: Florian Mickler, Stanislaw Gruszka
  Cc: linville@tuxdriver.com, stable@kernel.org,
	linux-wireless@vger.kernel.org, Chatre, Reinette,
	ilw@linux.intel.com, Berg, Johannes, Cahill, Ben M,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20101006194537.35e8b00e@schatten.dmk.lab>

On Wed, 2010-10-06 at 10:45 -0700, Florian Mickler wrote:
> Hi Stanislaw!
> 
> On Wed, 6 Oct 2010 18:04:35 +0200
> Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> 
> > This is stable friendly backport of wireless-testing commit
> > 3eecce527c7434dfd16517ce08b49632c8a26717 "iwlwifi: unify scan start
> > checks". Wireless-testing version include also a lot of cleanups.
> > 
> > On error case in {iwl3945,iwlagn}_request_scan we queue abort_scan work,
> > which in a fact do nothing, so we never complete the scan. Thats gives
> > "WARNING: at net/wireless/core.c:614 wdev_cleanup_work" and stopped
> > network connection until reload iwlwifi modules. Return of -EIO and
> > stopping queuing any work is a fix.
> > 
> > Note there are still many cases when we can get: 
> > 
> > WARNING: at net/wireless/core.c:614 wdev_cleanup_work  
> > WARNING: at net/mac80211/scan.c:266 ieee80211_scan_completed
> > WARNING: at net/mac80211/scan.c:269 ieee80211_scan_complete
> > 
> > which are not fixed. Unfortunately does not exist single, small fix
> > for that problems, and we probably will see some more bug reports
> > with scan warnings. As solution we rewrite iwl-scan.c code to avoid
> > all possible race conditions. That quite big patch set is queued
> > for 2.6.37 .
> > 
> > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>

looks good to me

Thanks
Wey



^ permalink raw reply

* Re: [PATCH wireless-2.6 or stable] iwlwifi: return error when fail to start scanning
From: Florian Mickler @ 2010-10-06 17:45 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: linville-2XuSBdqkA4R54TAoqtyWWQ, stable-DgEjT+Ai2ygdnm+yROfE0A,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	wey-yi.w.guy-ral2JQCrhuEAvxtiuMwx3w,
	reinette.chatre-ral2JQCrhuEAvxtiuMwx3w,
	ilw-VuQAYsv1563Yd54FQh9/CA, johannes.berg-ral2JQCrhuEAvxtiuMwx3w,
	ben.m.cahill-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20101006160434.GB24581-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hi Stanislaw!

On Wed, 6 Oct 2010 18:04:35 +0200
Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> This is stable friendly backport of wireless-testing commit
> 3eecce527c7434dfd16517ce08b49632c8a26717 "iwlwifi: unify scan start
> checks". Wireless-testing version include also a lot of cleanups.
> 
> On error case in {iwl3945,iwlagn}_request_scan we queue abort_scan work,
> which in a fact do nothing, so we never complete the scan. Thats gives
> "WARNING: at net/wireless/core.c:614 wdev_cleanup_work" and stopped
> network connection until reload iwlwifi modules. Return of -EIO, and
> stopping queuing any work is a fix.
> 
> Note there are still many cases when we can get: 
> 
> WARNING: at net/wireless/core.c:614 wdev_cleanup_work  
> WARNING: at net/mac80211/scan.c:266 ieee80211_scan_completed
> WARNING: at net/mac80211/scan.c:269 ieee80211_scan_complete
> 
> which are not fixed. Unfortunately does not exist single, small fix
> for that problems, and we probably will see some more bug reports
> with scan warnings. As solution we rewrite iwl-scan.c code to avoid
> all possible race conditions. That quite big patch set is queued
> for 2.6.37 .
> 
> Signed-off-by: Stanislaw Gruszka <sgruszka-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---

Not that it matters much, but I've looked through it and couldn't find
anything wrong with it.

 Reviewed-by: Florian Mickler <florian-sVu6HhrpSfRAfugRpC6u6w@public.gmane.org>

Regards,
Flo

> Patch is intended for wireless-2.6, or in case when it does not
> make 2.6.36 release, for -stable 2.6.36.y 
> 
>  drivers/net/wireless/iwlwifi/iwl-3945.h     |    2 +-
>  drivers/net/wireless/iwlwifi/iwl-agn-lib.c  |   18 ++++++------------
>  drivers/net/wireless/iwlwifi/iwl-agn.h      |    2 +-
>  drivers/net/wireless/iwlwifi/iwl-core.h     |    2 +-
>  drivers/net/wireless/iwlwifi/iwl-scan.c     |   14 +++++++++++---
>  drivers/net/wireless/iwlwifi/iwl3945-base.c |   19 ++++++-------------
>  6 files changed, 26 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl-3945.h b/drivers/net/wireless/iwlwifi/iwl-3945.h
> index bb2aeeb..98509c5 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-3945.h
> +++ b/drivers/net/wireless/iwlwifi/iwl-3945.h
> @@ -295,7 +295,7 @@ extern const struct iwl_channel_info *iwl3945_get_channel_info(
>  extern int iwl3945_rs_next_rate(struct iwl_priv *priv, int rate);
>  
>  /* scanning */
> -void iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif);
> +int iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif);
>  
>  /* Requires full declaration of iwl_priv before including */
>  #include "iwl-io.h"
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
> index 8fd00a6..2be8faa 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn-lib.c
> @@ -1147,7 +1147,7 @@ static int iwl_get_channels_for_scan(struct iwl_priv *priv,
>  	return added;
>  }
>  
> -void iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
> +int iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  {
>  	struct iwl_host_cmd cmd = {
>  		.id = REPLY_SCAN_CMD,
> @@ -1394,24 +1394,18 @@ void iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  	scan->len = cpu_to_le16(cmd.len);
>  
>  	set_bit(STATUS_SCAN_HW, &priv->status);
> -	if (iwl_send_cmd_sync(priv, &cmd))
> +	if (iwl_send_cmd_sync(priv, &cmd)) {
> +		clear_bit(STATUS_SCAN_HW, &priv->status);
>  		goto done;
> +	}
>  
>  	queue_delayed_work(priv->workqueue, &priv->scan_check,
>  			   IWL_SCAN_CHECK_WATCHDOG);
>  
> -	return;
> +	return 0;
>  
>   done:
> -	/* Cannot perform scan. Make sure we clear scanning
> -	* bits from status so next scan request can be performed.
> -	* If we don't clear scanning status bit here all next scan
> -	* will fail
> -	*/
> -	clear_bit(STATUS_SCAN_HW, &priv->status);
> -	clear_bit(STATUS_SCANNING, &priv->status);
> -	/* inform mac80211 scan aborted */
> -	queue_work(priv->workqueue, &priv->abort_scan);
> +	return -EIO;
>  }
>  
>  int iwlagn_manage_ibss_station(struct iwl_priv *priv,
> diff --git a/drivers/net/wireless/iwlwifi/iwl-agn.h b/drivers/net/wireless/iwlwifi/iwl-agn.h
> index cc6464d..015da26 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-agn.h
> +++ b/drivers/net/wireless/iwlwifi/iwl-agn.h
> @@ -216,7 +216,7 @@ void iwl_reply_statistics(struct iwl_priv *priv,
>  			  struct iwl_rx_mem_buffer *rxb);
>  
>  /* scan */
> -void iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif);
> +int iwlagn_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif);
>  
>  /* station mgmt */
>  int iwlagn_manage_ibss_station(struct iwl_priv *priv,
> diff --git a/drivers/net/wireless/iwlwifi/iwl-core.h b/drivers/net/wireless/iwlwifi/iwl-core.h
> index 5e6ee3d..110de0f 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-core.h
> +++ b/drivers/net/wireless/iwlwifi/iwl-core.h
> @@ -109,7 +109,7 @@ struct iwl_hcmd_utils_ops {
>  				  __le16 fc, __le32 *tx_flags);
>  	int  (*calc_rssi)(struct iwl_priv *priv,
>  			  struct iwl_rx_phy_res *rx_resp);
> -	void (*request_scan)(struct iwl_priv *priv, struct ieee80211_vif *vif);
> +	int (*request_scan)(struct iwl_priv *priv, struct ieee80211_vif *vif);
>  };
>  
>  struct iwl_apm_ops {
> diff --git a/drivers/net/wireless/iwlwifi/iwl-scan.c b/drivers/net/wireless/iwlwifi/iwl-scan.c
> index a4b3663..290140a 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-scan.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-scan.c
> @@ -298,6 +298,7 @@ EXPORT_SYMBOL(iwl_init_scan_params);
>  
>  static int iwl_scan_initiate(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  {
> +	int ret;
>  	lockdep_assert_held(&priv->mutex);
>  
>  	IWL_DEBUG_INFO(priv, "Starting scan...\n");
> @@ -308,9 +309,11 @@ static int iwl_scan_initiate(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  	if (WARN_ON(!priv->cfg->ops->utils->request_scan))
>  		return -EOPNOTSUPP;
>  
> -	priv->cfg->ops->utils->request_scan(priv, vif);
> +	ret = priv->cfg->ops->utils->request_scan(priv, vif);
> +	if (ret)
> +		clear_bit(STATUS_SCANNING, &priv->status);
> +	return ret;
>  
> -	return 0;
>  }
>  
>  int iwl_mac_hw_scan(struct ieee80211_hw *hw,
> @@ -380,6 +383,7 @@ void iwl_internal_short_hw_scan(struct iwl_priv *priv)
>  
>  void iwl_bg_start_internal_scan(struct work_struct *work)
>  {
> +	int ret;
>  	struct iwl_priv *priv =
>  		container_of(work, struct iwl_priv, start_internal_scan);
>  
> @@ -414,7 +418,11 @@ void iwl_bg_start_internal_scan(struct work_struct *work)
>  	if (WARN_ON(!priv->cfg->ops->utils->request_scan))
>  		goto unlock;
>  
> -	priv->cfg->ops->utils->request_scan(priv, NULL);
> +	ret = priv->cfg->ops->utils->request_scan(priv, NULL);
> +	if (ret) {
> +		clear_bit(STATUS_SCANNING, &priv->status);
> +		priv->is_internal_short_scan = false;
> +	}
>   unlock:
>  	mutex_unlock(&priv->mutex);
>  }
> diff --git a/drivers/net/wireless/iwlwifi/iwl3945-base.c b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> index d31661c..fc82da0 100644
> --- a/drivers/net/wireless/iwlwifi/iwl3945-base.c
> +++ b/drivers/net/wireless/iwlwifi/iwl3945-base.c
> @@ -2799,7 +2799,7 @@ static void iwl3945_rfkill_poll(struct work_struct *data)
>  
>  }
>  
> -void iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
> +int iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  {
>  	struct iwl_host_cmd cmd = {
>  		.id = REPLY_SCAN_CMD,
> @@ -3000,25 +3000,18 @@ void iwl3945_request_scan(struct iwl_priv *priv, struct ieee80211_vif *vif)
>  	scan->len = cpu_to_le16(cmd.len);
>  
>  	set_bit(STATUS_SCAN_HW, &priv->status);
> -	if (iwl_send_cmd_sync(priv, &cmd))
> +	if (iwl_send_cmd_sync(priv, &cmd)) {
> +		clear_bit(STATUS_SCAN_HW, &priv->status);
>  		goto done;
> +	}
>  
>  	queue_delayed_work(priv->workqueue, &priv->scan_check,
>  			   IWL_SCAN_CHECK_WATCHDOG);
>  
> -	return;
> +	return 0;
>  
>   done:
> -	/* can not perform scan make sure we clear scanning
> -	 * bits from status so next scan request can be performed.
> -	 * if we dont clear scanning status bit here all next scan
> -	 * will fail
> -	*/
> -	clear_bit(STATUS_SCAN_HW, &priv->status);
> -	clear_bit(STATUS_SCANNING, &priv->status);
> -
> -	/* inform mac80211 scan aborted */
> -	queue_work(priv->workqueue, &priv->abort_scan);
> +	return -EIO;
>  }
>  
>  static void iwl3945_bg_restart(struct work_struct *data)
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Krishna Kumar2 @ 2010-10-06 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: anthony, arnd, avi, davem, kvm, netdev, rusty
In-Reply-To: <20101005182323.GA25852@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 10/05/2010 11:53:23 PM:

> > > Any idea where does this come from?
> > > Do you see more TX interrupts? RX interrupts? Exits?
> > > Do interrupts bounce more between guest CPUs?
> > > 4. Identify reasons for single netperf BW regression.
> >
> > After testing various combinations of #txqs, #vhosts, #netperf
> > sessions, I think the drop for 1 stream is due to TX and RX for
> > a flow being processed on different cpus.
>
> Right. Can we fix it?

I am not sure how to. My initial patch had one thread but gave
small gains and ran into limitations once number of sessions
became large.

> >  I did two more tests:
> >     1. Pin vhosts to same CPU:
> >         - BW drop is much lower for 1 stream case (- 5 to -8% range)
> >         - But performance is not so high for more sessions.
> >     2. Changed vhost to be single threaded:
> >           - No degradation for 1 session, and improvement for upto
> >          8, sometimes 16 streams (5-12%).
> >           - BW degrades after that, all the way till 128 netperf
sessions.
> >           - But overall CPU utilization improves.
> >             Summary of the entire run (for 1-128 sessions):
> >                 txq=4:  BW: (-2.3)      CPU: (-16.5)    RCPU: (-5.3)
> >                 txq=16: BW: (-1.9)      CPU: (-24.9)    RCPU: (-9.6)
> >
> > I don't see any reasons mentioned above.  However, for higher
> > number of netperf sessions, I see a big increase in retransmissions:
>
> Hmm, ok, and do you see any errors?

I haven't seen any in any statistics, messages, etc. Also no
retranmissions for txq=1.

> > Single netperf case didn't have any retransmissions so that is not
> > the cause for drop.  I tested ixgbe (MQ):
> > ___________________________________________________________
> > #netperf      ixgbe             ixgbe (pin intrs to cpu#0 on
> >                                        both server/client)
> >             BW (#retr)          BW (#retr)
> > ___________________________________________________________
> > 1           3567 (117)          6000 (251)
> > 2           4406 (477)          6298 (725)
> > 4           6119 (1085)         7208 (3387)
> > 8           6595 (4276)         7381 (15296)
> > 16          6651 (11651)        6856 (30394)
>
> Interesting.
> You are saying we get much more retransmissions with physical nic as
> well?

Yes, with ixgbe. I re-ran with 16 netperfs running for 15 secs on
both ixgbe and cxgb3 just now to reconfirm:

ixgbe: BW: 6186.85  SD/Remote: 135.711, 339.376  CPU/Remote: 79.99, 200.00,
Retrans: 545
cxgb3: BW: 8051.07  SD/Remote: 144.416, 260.487  CPU/Remote: 110.88,
200.00, Retrans: 0

However 64 netperfs for 30 secs gave:

ixgbe: BW: 6691.12  SD/Remote: 8046.617, 5259.992  CPU/Remote: 1223.86,
799.97, Retrans: 1424
cxgb3: BW: 7799.16  SD/Remote: 2589.875, 4317.013  CPU/Remote: 480.39
800.64, Retrans: 649

# ethtool -i eth4
driver: ixgbe
version: 2.0.84-k2
firmware-version: 0.9-3
bus-info: 0000:1f:00.1

# ifconfig output:
       RX packets:783241 errors:0 dropped:0 overruns:0 frame:0
       TX packets:689533 errors:0 dropped:0 overruns:0 carrier:0
       collisions:0 txqueuelen:1000

# lspci output:
1f:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network
Connec
tion (rev 01)
        Subsystem: Intel Corporation Ethernet Server Adapter X520-2
        Flags: bus master, fast devsel, latency 0, IRQ 30
        Memory at 98900000 (64-bit, prefetchable) [size=512K]
        I/O ports at 2020 [size=32]
        Memory at 98a00000 (64-bit, prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-40-4a-b4
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: ixgbe
        Kernel modules: ixgbe

> > I haven't done this right now since I don't have a setup.  I guess
> > it would be limited by wire speed and gains may not be there.  I
> > will try to do this later when I get the setup.
>
> OK but at least need to check that it does not hurt things.

Yes, sure.

> > Summary:
> >
> > 1. Average BW increase for regular I/O is best for #txq=16 with the
> >    least CPU utilization increase.
> > 2. The average BW for 512 byte I/O is best for lower #txq=2. For higher
> >    #txqs, BW increased only after a particular #netperf sessions - in
> >    my testing that limit was 32 netperf sessions.
> > 3. Multiple txq for guest by itself doesn't seem to have any issues.
> >    Guest CPU% increase is slightly higher than BW improvement.  I
> >    think it is true for all mq drivers since more paths run in parallel
> >    upto the device instead of sleeping and allowing one thread to send
> >    all packets via qdisc_restart.
> > 4. Having high number of txqs gives better gains and reduces cpu util
> >    on the guest and the host.
> > 5. MQ is intended for server loads.  MQ should probably not be
explicitly
> >    specified for client systems.
> > 6. No regression with numtxqs=1 (or if mq option is not used) in any
> >    testing scenario.
>
> Of course txq=1 can be considered a kind of fix, but if we know the
> issue is TX/RX flows getting bounced between CPUs, can we fix this?
> Workload-specific optimizations can only get us this far.

I will test with your patch tomorrow night once I am back.

Thanks,

- KK


^ permalink raw reply

* Re: [PATCH wireless-2.6 or stable] iwlwifi: return error when fail to start scanning
From: Florian Mickler @ 2010-10-06 17:32 UTC (permalink / raw)
  To: Guy, Wey-Yi
  Cc: Stanislaw Gruszka, linville@tuxdriver.com, stable@kernel.org,
	linux-wireless@vger.kernel.org, Chatre, Reinette,
	ilw@linux.intel.com, Berg, Johannes, Cahill, Ben M,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1286381578.12594.18.camel@wwguy-ubuntu>

On Wed, 06 Oct 2010 09:12:58 -0700
"Guy, Wey-Yi" <wey-yi.w.guy@intel.com> wrote:

> Hi Gruszka,
> 
> On Wed, 2010-10-06 at 09:04 -0700, Stanislaw Gruszka wrote:
> > This is stable friendly backport of wireless-testing commit
> > 3eecce527c7434dfd16517ce08b49632c8a26717 "iwlwifi: unify scan start
> > checks". Wireless-testing version include also a lot of cleanups.
> > 
> > On error case in {iwl3945,iwlagn}_request_scan we queue abort_scan work,
> > which in a fact do nothing, so we never complete the scan. Thats gives
> > "WARNING: at net/wireless/core.c:614 wdev_cleanup_work" and stopped
> > network connection until reload iwlwifi modules. Return of -EIO, and
> > stopping queuing any work is a fix.
> > 
> > Note there are still many cases when we can get: 
> > 
> > WARNING: at net/wireless/core.c:614 wdev_cleanup_work  
> > WARNING: at net/mac80211/scan.c:266 ieee80211_scan_completed
> > WARNING: at net/mac80211/scan.c:269 ieee80211_scan_complete
> > 
> > which are not fixed. Unfortunately does not exist single, small fix
> > for that problems, and we probably will see some more bug reports
> > with scan warnings. As solution we rewrite iwl-scan.c code to avoid
> > all possible race conditions. That quite big patch set is queued
> > for 2.6.37 .
> > 
> > Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> > ---
> > Patch is intended for wireless-2.6, or in case when it does not
> > make 2.6.36 release, for -stable 2.6.36.y 
> > 
> 
> Is this the replacement for Mickler's patch?
> 
> Thanks
> Wey
> 

This is on top of current mainline, as far as I see. (And it replaces
the patch I sent few hours ago)

Regards,
Flo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox