Netdev List
* Re: Occasional oops with IPSec and IPv6.
From: Nick Bowler @ 2011-11-18 19:26 UTC
  To: Timo Teräs; +Cc: Eric Dumazet, netdev, David S. Miller
In-Reply-To: <4EC6A38E.6060404@iki.fi>

On 2011-11-18 20:27 +0200, Timo Teräs wrote:
> On 11/18/2011 06:39 PM, Eric Dumazet wrote:
> > On Friday, 18 November 2011 at 11:27 -0500, Nick Bowler wrote:
> >> On 2011-11-17 14:09 -0500, Nick Bowler wrote:
> >>> One of the tests we do with IPsec involves sending and receiving UDP
> >>> datagrams of all sizes from 1 to N bytes, where N is much larger than
> >>> the MTU.  In this particular instance, the MTU is 1500 bytes and N is
> >>> 10000 bytes.  This test works fine with IPv4, but I'm getting an
> >>> occasional oops on Linus' master with IPv6 (output at end of email).  We
> >>> also run the same test where N is less than the MTU, and it does not
> >>> trigger this issue.  The resulting fallout seems to eventually lock up
> >>> the box (although it continues to work for a little while afterwards).
> >>>
> >>> The issue appears timing related, and it doesn't always occur.  This
> >>> probably also explains why I've not seen this issue before now, as we
> >>> recently upgraded all our lab systems to machines from this century
> >>> (with newfangled dual core processors).  This also makes it somewhat
> >>> hard to reproduce, but I can trigger it pretty reliably by running 'yes'
> >>> in an ssh session (which doesn't use IPsec) while running the test:
> >>> it'll usually trigger in 2 or 3 runs.  The choice of cipher suite
> >>> appears to be irrelevant.
[...]
> > Please note commit 80c802f307 added a known bug, fixed in commit
> > 0b150932197b (xfrm: avoid possible oopse in xfrm_alloc_dst)
> > 
> > Given commit 80c802f307 complexity, we can assume other bugs are to be
> > fixed as well.
[...]
> This looks quite different. And I've been trying to figure out what
> causes this. However, the OOPS happens at ip6_fragment(), indicating
> that there was not enough allocated headroom (skb underrun). My initial
> thought is an ipv6 bug that just got uncovered by my commit, especially
> since the ipv4 side is happy. But I haven't yet been able to figure this one
> out.
> 
> Could you also try Herbert's latest patch set:
>   [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment
> 
> This changes how the headroom is calculated, and *might* fix this issue
> too if it's caused by the same SMP race condition which got uncovered by
> my other commit earlier.

I applied all six of those patches, but I still see a crash.  However,
the call trace seems to be slightly different.  I've appended the trace
from the run with these patches applied, just in case it's significant.

NOTE: I did not carefully look at the traces of all the crashes I've
triggered.  This particular backtrace could potentially have appeared
before applying these patches and I would not have noticed.

[   45.318137] NET: Registered protocol family 15
[  125.153082] skb_under_panic: text:c1215d1d len:1462 put:14 head:f2ff1000 data:f2ff0ffa tail:0xf2ff15b0 end:0xf2ff1780 dev:p10p1
[  125.165124] ------------[ cut here ]------------
[  125.166001] kernel BUG at net/core/skbuff.c:147!
[  125.166001] invalid opcode: 0000 [#1] PREEMPT SMP 
[  125.166001] Modules linked in: authenc esp6 xfrm6_mode_transport deflate zlib_deflate ctr twofish_generic twofish_common camellia serpent blowfish_generic blowfish_common cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 hmac crypto_null af_key nfs lockd auth_rpcgss sunrpc rng_core iptable_filter ip_tables ip6table_filter ip6_tables x_tables psmouse sg r8169 mii evdev button ipv6 autofs4 usbhid ohci_hcd ehci_hcd usbcore usb_common sd_mod radeon ttm drm_kms_helper drm backlight i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect [last unloaded: scsi_wait_scan]
[  125.196579] 
[  125.196579] Pid: 2792, comm: udp_burst Not tainted 3.2.0-rc2-00115-g8b662f5 #54 System manufacturer System Product Name/M4A785T-M
[  125.196579] EIP: 0060:[<c11ff2af>] EFLAGS: 00010246 CPU: 0
[  125.196579] EIP is at skb_push+0x52/0x5b
[  125.196579] EAX: 00000089 EBX: f39cb000 ECX: 00000080 EDX: 00000003
[  125.196579] ESI: f39cb000 EDI: f39cb000 EBP: f29abb10 ESP: f29abae4
[  125.196579]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  125.196579] Process udp_burst (pid: 2792, ti=f29aa000 task=f3d8a2c0 task.ti=f29aa000)
[  125.196579] Stack:
[  125.196579]  c13a2756 c1215d1d 000005b6 0000000e f2ff1000 f2ff0ffa f2ff15b0 f2ff1780
[  125.196579]  f39cb000 00000000 f4fcdec0 f29abb28 c1215d1d 000086dd c1215d01 f29a5600
[  125.196579]  00000002 f29abb44 c120ee6d f4fcdec0 00000000 000005a8 f4fcde00 f29a5600
[  125.196579] Call Trace:
[  125.196579]  [<c1215d1d>] ? eth_header+0x1c/0x8b
[  125.196579]  [<c1215d1d>] eth_header+0x1c/0x8b
[  125.196579]  [<c1215d01>] ? eth_rebuild_header+0x53/0x53
[  125.196579]  [<c120ee6d>] dev_hard_header.constprop.12+0x28/0x32
[  125.196579]  [<c120ef74>] neigh_resolve_output+0xfd/0x138
[  125.196579]  [<f838af19>] ip6_finish_output2+0x280/0x31a [ipv6]
[  125.196579]  [<f838bf61>] ip6_fragment+0x3bd/0x939 [ipv6]
[  125.196579]  [<f838ac99>] ? NF_HOOK.constprop.4+0x30/0x30 [ipv6]
[  125.196579]  [<f838c51c>] ip6_finish_output+0x3f/0x4c [ipv6]
[  125.196579]  [<f838c5e1>] ip6_output+0xb8/0xc0 [ipv6]
[  125.196579]  [<c12520f1>] xfrm_output_resume+0x75/0x2c5
[  125.196579]  [<c125234e>] xfrm_output2+0xd/0xf
[  125.196579]  [<c12523e3>] xfrm_output+0x93/0x9c
[  125.196579]  [<f83a8b32>] xfrm6_output_finish+0x13/0x15 [ipv6]
[  125.196579]  [<f83a8a1f>] __xfrm6_output+0x108/0x10d [ipv6]
[  125.196579]  [<f83a8b7b>] xfrm6_output+0x47/0x4c [ipv6]
[  125.196579]  [<f838a7b4>] dst_output+0x12/0x15 [ipv6]
[  125.196579]  [<f838b36a>] ip6_local_out+0x17/0x1a [ipv6]
[  125.196579]  [<f838d27b>] ip6_push_pending_frames+0x2a4/0x346 [ipv6]
[  125.196579]  [<f839a035>] udp_v6_push_pending_frames+0x213/0x271 [ipv6]
[  125.196579]  [<f839ae84>] ? udpv6_sendmsg+0x68d/0x832 [ipv6]
[  125.196579]  [<f839aea6>] udpv6_sendmsg+0x6af/0x832 [ipv6]
[  125.196579]  [<c123fe84>] ? ip_fast_csum+0x30/0x30
[  125.196579]  [<c12403c0>] inet_sendmsg+0x4e/0x57
[  125.196579]  [<c11f8de6>] sock_sendmsg+0xbe/0xd9
[  125.196579]  [<c1052d64>] ? trace_hardirqs_off+0xb/0xd
[  125.196579]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.196579]  [<c1055715>] ? trace_hardirqs_on_caller+0x10e/0x13f
[  125.196579]  [<c10542df>] ? mark_lock+0x26/0x1ea
[  125.196579]  [<c10acdbb>] ? fget_light+0x28/0x7c
[  125.196579]  [<c11fa23a>] sys_sendto+0xb1/0xcd
[  125.196579]  [<c10548e7>] ? __lock_acquire+0x444/0xb17
[  125.196579]  [<c1270bb1>] ? _raw_spin_unlock_irq+0x39/0x45
[  125.196579]  [<c1055038>] ? lock_release_non_nested+0x7e/0x1bb
[  125.196579]  [<c11fa26e>] sys_send+0x18/0x1a
[  125.196579]  [<c11fa877>] sys_socketcall+0xce/0x19a
[  125.196579]  [<c11507c0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  125.196579]  [<c1271650>] sysenter_do_call+0x12/0x36
[  125.196579] Code: c1 85 f6 0f 45 de 53 ff b1 98 00 00 00 ff b1 94 00 00 00 50 ff b1 9c 00 00 00 52 ff 71 50 ff 75 04 68 56 27 3a c1 e8 5a c7 06 00 <0f> 0b 8d 65 f8 5b 5e 5d c3 55 89 c1 89 e5 56 53 83 79 54 00 8b 
[  125.196579] EIP: [<c11ff2af>] skb_push+0x52/0x5b SS:ESP 0068:f29abae4
[  125.544777] ---[ end trace 3ca7fd586035bfb5 ]---
[  125.549588] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[  125.557655] in_atomic(): 0, irqs_disabled(): 0, pid: 2792, name: udp_burst
[  125.565415] INFO: lockdep is turned off.
[  125.569682] Pid: 2792, comm: udp_burst Tainted: G      D      3.2.0-rc2-00115-g8b662f5 #54
[  125.578640] Call Trace:
[  125.581476]  [<c10307b1>] ? console_unlock+0x1b6/0x1c9
[  125.587209]  [<c1024dbd>] __might_sleep+0xe2/0xe9
[  125.592457]  [<c126ff47>] down_read+0x17/0x3b
[  125.597311]  [<c105fc85>] acct_collect+0x39/0x134
[  125.602749]  [<c1032c08>] do_exit+0x188/0x5de
[  125.607604]  [<c1031464>] ? kmsg_dump+0xdf/0xe7
[  125.612710]  [<c1004737>] oops_end+0x92/0x9a
[  125.617647]  [<c1004868>] die+0x51/0x59
[  125.622008]  [<c1002626>] do_trap+0x89/0xa2
[  125.626665]  [<c1002776>] ? do_bounds+0x52/0x52
[  125.631781]  [<c10027e7>] do_invalid_op+0x71/0x7b
[  125.637157]  [<c11ff2af>] ? skb_push+0x52/0x5b
[  125.642175]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.647256]  [<c10307b1>] ? console_unlock+0x1b6/0x1c9
[  125.653106]  [<c102369b>] ? need_resched+0x14/0x1e
[  125.658517]  [<c126f1f7>] ? preempt_schedule+0x40/0x46
[  125.664271]  [<c1030c19>] ? vprintk+0x390/0x3ae
[  125.669417]  [<c1052d01>] ? trace_hardirqs_off_caller+0x2e/0x86
[  125.675999]  [<c11507d0>] ? trace_hardirqs_off_thunk+0xc/0x10
[  125.682561]  [<c127140b>] error_code+0x5f/0x64
[  125.687553]  [<c1002776>] ? do_bounds+0x52/0x52
[  125.692621]  [<c11ff2af>] ? skb_push+0x52/0x5b
[  125.697723]  [<c1215d1d>] ? eth_header+0x1c/0x8b
[  125.702905]  [<c1215d1d>] eth_header+0x1c/0x8b
[  125.707963]  [<c1215d01>] ? eth_rebuild_header+0x53/0x53
[  125.713945]  [<c120ee6d>] dev_hard_header.constprop.12+0x28/0x32
[  125.720617]  [<c120ef74>] neigh_resolve_output+0xfd/0x138
[  125.726714]  [<f838af19>] ip6_finish_output2+0x280/0x31a [ipv6]
[  125.733397]  [<f838bf61>] ip6_fragment+0x3bd/0x939 [ipv6]
[  125.739483]  [<f838ac99>] ? NF_HOOK.constprop.4+0x30/0x30 [ipv6]
[  125.746261]  [<f838c51c>] ip6_finish_output+0x3f/0x4c [ipv6]
[  125.752772]  [<f838c5e1>] ip6_output+0xb8/0xc0 [ipv6]
[  125.758684]  [<c12520f1>] xfrm_output_resume+0x75/0x2c5
[  125.764729]  [<c125234e>] xfrm_output2+0xd/0xf
[  125.769960]  [<c12523e3>] xfrm_output+0x93/0x9c
[  125.775292]  [<f83a8b32>] xfrm6_output_finish+0x13/0x15 [ipv6]
[  125.781988]  [<f83a8a1f>] __xfrm6_output+0x108/0x10d [ipv6]
[  125.788515]  [<f83a8b7b>] xfrm6_output+0x47/0x4c [ipv6]
[  125.794659]  [<f838a7b4>] dst_output+0x12/0x15 [ipv6]
[  125.800633]  [<f838b36a>] ip6_local_out+0x17/0x1a [ipv6]
[  125.806889]  [<f838d27b>] ip6_push_pending_frames+0x2a4/0x346 [ipv6]
[  125.814176]  [<f839a035>] udp_v6_push_pending_frames+0x213/0x271 [ipv6]
[  125.821792]  [<f839ae84>] ? udpv6_sendmsg+0x68d/0x832 [ipv6]
[  125.828447]  [<f839aea6>] udpv6_sendmsg+0x6af/0x832 [ipv6]
[  125.834931]  [<c123fe84>] ? ip_fast_csum+0x30/0x30
[  125.840522]  [<c12403c0>] inet_sendmsg+0x4e/0x57
[  125.845919]  [<c11f8de6>] sock_sendmsg+0xbe/0xd9
[  125.851343]  [<c1052d64>] ? trace_hardirqs_off+0xb/0xd
[  125.857271]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.862642]  [<c1055715>] ? trace_hardirqs_on_caller+0x10e/0x13f
[  125.869618]  [<c10542df>] ? mark_lock+0x26/0x1ea
[  125.875028]  [<c10acdbb>] ? fget_light+0x28/0x7c
[  125.880431]  [<c11fa23a>] sys_sendto+0xb1/0xcd
[  125.885688]  [<c10548e7>] ? __lock_acquire+0x444/0xb17
[  125.891665]  [<c1270bb1>] ? _raw_spin_unlock_irq+0x39/0x45
[  125.898057]  [<c1055038>] ? lock_release_non_nested+0x7e/0x1bb
[  125.904803]  [<c11fa26e>] sys_send+0x18/0x1a
[  125.909815]  [<c11fa877>] sys_socketcall+0xce/0x19a
[  125.915539]  [<c11507c0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  125.922127]  [<c1271650>] sysenter_do_call+0x12/0x36
[  185.166028] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=60002 jiffies)
[  185.167017] INFO: Stall ended before state dump start

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)

* Re: Should "N/A" dust bunnies be swept from fw_version?
From: Rick Jones @ 2011-11-18 19:23 UTC
  To: David Miller; +Cc: netdev
In-Reply-To: <20111118.141046.1309959676401370888.davem@davemloft.net>

On 11/18/2011 11:10 AM, David Miller wrote:
> From: Rick Jones<rick.jones2@hp.com>
> Date: Fri, 18 Nov 2011 11:09:48 -0800
>
>> On 11/17/2011 04:19 PM, Ben Hutchings wrote:
>>> On Thu, 2011-11-17 at 15:27 -0800, Rick Jones wrote:
>>>> In the discussion on "enable virtio_net to return bus_info in ethtool -i
>>>> consistent with emulated NICs" Ben Hutchings had the following feedback
>>>> on what might go into bus_info:
>>>>
>>>>> Please use the existing 'not implemented' value, which is the empty
>>>>> string.  If you think ethtool should print some helpful message instead
>>>>> of an empty string, please submit a patch for ethtool.
>>>>
>>>> When I was sweeping in the .get_drvinfo routines, I noticed many drivers
>>>> would return "N/A" for fw_version - presumably they were drivers for
>>>> cards without firmware.  Should those be removed to have the fw_version
>>>> be the empty string, or should those sleeping dust bunnies be allowed to
>>>> lie?
>>>
>>> I much prefer the empty string; the ethtool utility can turn that into a
>>> user-friendly placeholder if it's considered confusing.
>>
>> Any other opinions out there?  Anyone? Anyone?-)
>
> I agree with Ben, just provide the empty string.

OK, when I have time to pick-up my broom, I'll do some additional sweeping.

rick

* Re: b43: BCM 4331: MacBook 8,1: No connection after suspend
From: John W. Linville @ 2011-11-18 19:22 UTC
  To: Nico -telmich- Schottelius, LKML, netdev, Arend van Spriel
  Cc: b43-dev, linux-wireless, Rafał Miłecki, Larry Finger,
	Michael Buesch
In-Reply-To: <20111118173242.GA2101@schottelius.org>

Arend has nothing to do with b43, and b43 has its own
mailing list, b43-dev@lists.infradead.org.  Even if it didn't,
linux-wireless@vger.kernel.org would be a more appropriate list than
this one.

John

On Fri, Nov 18, 2011 at 06:32:42PM +0100, Nico -telmich- Schottelius wrote:
> Hello,
> 
> new notebook, new problems (*):
> 
> Running 3.2.0-rc1 on the MacBook Pro 8,1 with the BCM4331
> (14e4:4331) the WLAN indeed works:
> 
> [   86.231702] b43-phy0: Broadcom 4331 WLAN found (core revision 29)
> [   86.269190] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
> [   86.270486] Broadcom 43xx driver loaded [ Features: PMNLS ]
> [   87.677265] b43-phy0: Loading firmware version 666.2 (2011-02-23 01:15:07)
> 
> But: After a suspend it seems not to receive any packets anymore:
> 
> [ 2334.494845] wlan0: authenticate with 64:87:d7:37:89:89 (try 1)
> [ 2334.694035] wlan0: authenticate with 64:87:d7:37:89:89 (try 2)
> [ 2334.893909] wlan0: authenticate with 64:87:d7:37:89:89 (try 3)
> [ 2335.093824] wlan0: authentication with 64:87:d7:37:89:89 timed out
> 
> wpa_supplicant thus retries to connect to the network again and again
> without success.
> 
> This is reproducible on the MBP 8,1 and is *NOT* fixed if I unload
> b43 and modprobe it again. It is also "not fixed" when doing
> multiple suspends/resumes.
> 
> Any pointers to this?
> 
> Cheers,
> 
> Nico
> 
> 
> (*) feels like in good old times...
> 
> -- 
> PGP key: 7ED9 F7D3 6B10 81D7 0EC5  5C09 D7DC C8E4 3187 7DF0

-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Olaf Hering @ 2011-11-18 19:17 UTC
  To: Ben Hutchings
  Cc: netdev, xen-devel, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk
In-Reply-To: <1321643431.2883.39.camel@bwh-desktop>

On Fri, Nov 18, Ben Hutchings wrote:

> On Fri, 2011-11-18 at 19:43 +0100, Olaf Hering wrote:
> > On Fri, Nov 18, Ben Hutchings wrote:
> > 
> > > On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
> > > > The reported data refers to VMWare vmxnet.
> > > NAK, we should not just make things up.
> > 
> > So how about removing veth_get_settings, vmxnet3_get_settings,
> > tun_get_settings and other functions that escaped my grep?
> 
> If they can't provide meaningful information then maybe they should be
> removed.  However, that could result in a regression for existing
> working configurations.  (This isn't the same as the case you're trying
> to fix, since those applications have never worked with xen-netfront or
> many other drivers that don't implement get_settings.)

That may be.
How about a new generic ethtool_op_get_settings_veth which returns fake
values for all relevant drivers (virtio, xen-netfront, and the ones
listed above)?

Olaf

* Re: NFS TCP race condition with SOCK_ASYNC_NOSPACE
From: Trond Myklebust @ 2011-11-18 19:14 UTC
  To: Andrew Cooper
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <4EC6AC47.60404-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

On Fri, 2011-11-18 at 19:04 +0000, Andrew Cooper wrote: 
> On 18/11/11 18:52, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 18:40 +0000, Andrew Cooper wrote: 
> >> Hello,
> >>
> >> As described originally in
> >> http://www.spinics.net/lists/linux-nfs/msg25314.html, we were
> >> encountering a bug whereby the NFS session was unexpectedly timing out.
> >>
> >> I believe I have found the source of the race condition causing the timeout.
> >>
> >> Brief overview of setup:
> >>   10Gb network, NFS mounted using TCP.  Problem reproduces with
> >> multiple different NICs, with synchronous or asynchronous mounts, and
> >> with soft and hard mounts.  Reproduces on 2.6.32 and I am currently
> >> trying to reproduce with mainline. (I don't have physical access to the
> >> servers so installing stuff is not fantastically easy)
> >>
> >>
> >>
> >> In net/sunrpc/xprtsock.c:xs_tcp_send_request(), we try to write data to
> >> the sock buffer using xs_sendpages()
> >>
> >> When the sock buffer is nearly full, we get an EAGAIN from
> >> xs_sendpages() which causes a break out of the loop.  Lower down the
> >> function, we switch on status which causes us to call xs_nospace() with
> >> the task.
> >>
> >> In xs_nospace(), we test the SOCK_ASYNC_NOSPACE bit from the socket, and
> >> in the rare case where that bit is clear, we return 0 instead of
> >> EAGAIN.  This promptly overwrites status in xs_tcp_send_request().
> >>
> >> The result is that xs_tcp_release_xprt() finds a request which has no
> >> error, but has not sent all of the bytes in its send buffer.  It cleans
> >> up by setting XPRT_CLOSE_WAIT which causes xprt_clear_locked() to queue
> >> xprt->task_cleanup, which closes the TCP connection.
> >>
> >>
> >> Under normal operation, the TCP connection goes down and back up without
> >> interruption to the NFS layer.  However, when the NFS server hangs in a
> >> half closed state, the client forces a RST of the TCP connection,
> >> leading to the timeout.
> >>
> >> I have tried a few naive fixes such as changing the default return value
> >> in xs_nospace() from 0 to -EAGAIN (meaning that 0 will never be
> >> returned) but this causes a kernel memory leak.  Can someone with a
> >> better understanding of these interactions than me have a look?  It
> >> seems that the if (test_bit()) test in xs_nospace() should have an else
> >> clause.
> > I fully agree with your analysis. The correct thing to do here is to
> > always return either EAGAIN or ENOTCONN. Thank you very much for working
> > this one out!
> >
> > Trond
> 
> Returning EAGAIN seems to cause a kernel memory leak, as the oomkiller
> starts going after processes holding large amounts of LowMem.  Returning

The EAGAIN should trigger a retry of the send.

> ENOTCONN causes the NFS session to complain about a timeout in the logs,
> and in the case of a soft mount, gives an EIO to the calling process.

Correct. ENOTCONN means that the connection was lost.

> From the looks of the TCP stream, and from the looks of some
> targeted debugging, nothing is actually wrong, so the client should not
> be trying to FIN the TCP connection.  Is it possible that there is a
> more sinister reason for SOCK_ASYNC_NOSPACE being clear?

Normally, it means that we're out of the out-of-write-buffer condition
that caused the socket to fail (i.e. the socket has made progress
sending more data, so that we can now resume sending more). Returning
EAGAIN in that condition is correct.

> I can attempt to find which of the many calls to clear that bit is
> actually causing the problem, but I have a feeling that it is going to be a
> little more tricky to narrow down.
> 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: David Miller @ 2011-11-18 19:11 UTC
  To: bhutchings; +Cc: olaf, netdev, xen-devel, jeremy.fitzhardinge, konrad.wilk
In-Reply-To: <1321638394.2883.32.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 18 Nov 2011 17:46:34 +0000

> On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
>> Add a .get_settings function that returns fake data so that ethtool can
>> get enough information. For some applications, like VCS, this is useful;
>> otherwise some application logic will panic.
>> The reported data refers to VMWare vmxnet.
>> 
>> Signed-off-by: Xin Wei Hu <xwhu@suse.com>
>> Signed-off-by: Chunyan Liu <cyliu@suse.com>
>> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> 
> NAK, we should not just make things up.

Agreed, if you cannot determine the values with certainty do not
implement this method.

Fix the tools which cannot function without this information.

* Re: use a special value of -2 for virtual devices to report indeterminate speed?
From: Ben Hutchings @ 2011-11-18 19:13 UTC
  To: Rick Jones
  Cc: Jeremy Fitzhardinge, Olaf Hering, netdev@vger.kernel.org,
	xen-devel@lists.xensource.com, Konrad Rzeszutek Wilk
In-Reply-To: <4EC6AAF3.6080803@hp.com>

On Fri, 2011-11-18 at 10:58 -0800, Rick Jones wrote:
> On 11/18/2011 10:46 AM, Jeremy Fitzhardinge wrote:
> > On 11/18/2011 10:44 AM, Rick Jones wrote:
> >>   It could I suppose, decide
> >> based on the physical NIC to which it is attached, so long as folks
> >> using the virtual NIC don't expect its attributes to be the same from
> >> system to system.
> >
> > And assuming there's a physical NIC at all.
> 
> It sounds like we need a way to specify "Indeterminate" for link speed?
> Or some verbiage to that effect. Right now 0 and -1 cause ethtool to
> report "Unknown!"
> 
>          if (speed == 0 || speed == (u16)(-1) || speed == (u32)(-1))
>                  fprintf(stdout, "Unknown!\n");
>          else
>                  fprintf(stdout, "%uMb/s\n", speed);
> 
> 
> How about -2 for the u32 cast value of speed returning "Indeterminate" 
> or something like that?  Not in "proper" patch format:
> 
> 	if (speed == 0 || speed == (u16)(-1) || speed == (u32)(-1))
> 		fprintf(stdout, "Unknown!\n");
> 	else if (speed == (u32)(-2))
> 		fprintf(stdout, "Indeterminate.\n");
> 	else
> 		fprintf(stdout, "%uMb/s\n", speed);

I'm open to something like this, but the problem with assigning new
magic numbers is that older versions of ethtool won't know to report
them as special.

We should also consider stacked drivers like bonding (and presumably
team) that expect real numbers when the link is up.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: Should "N/A" dust bunnies be swept from fw_version?
From: David Miller @ 2011-11-18 19:10 UTC
  To: rick.jones2; +Cc: netdev
In-Reply-To: <4EC6AD7C.2070307@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 18 Nov 2011 11:09:48 -0800

> On 11/17/2011 04:19 PM, Ben Hutchings wrote:
>> On Thu, 2011-11-17 at 15:27 -0800, Rick Jones wrote:
>>> In the discussion on "enable virtio_net to return bus_info in ethtool -i
>>> consistent with emulated NICs" Ben Hutchings had the following feedback
>>> on what might go into bus_info:
>>>
>>>> Please use the existing 'not implemented' value, which is the empty
>>>> string.  If you think ethtool should print some helpful message instead
>>>> of an empty string, please submit a patch for ethtool.
>>>
>>> When I was sweeping in the .get_drvinfo routines, I noticed many drivers
>>> would return "N/A" for fw_version - presumably they were drivers for
>>> cards without firmware.  Should those be removed to have the fw_version
>>> be the empty string, or should those sleeping dust bunnies be allowed to
>>> lie?
>>
>> I much prefer the empty string; the ethtool utility can turn that into a
>> user-friendly placeholder if it's considered confusing.
> 
> Any other opinions out there?  Anyone? Anyone?-)

I agree with Ben, just provide the empty string.

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Ben Hutchings @ 2011-11-18 19:10 UTC
  To: Olaf Hering; +Cc: netdev, xen-devel, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk
In-Reply-To: <20111118184336.GA16027@aepfle.de>

On Fri, 2011-11-18 at 19:43 +0100, Olaf Hering wrote:
> On Fri, Nov 18, Ben Hutchings wrote:
> 
> > On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
> > > The reported data refers to VMWare vmxnet.
> > NAK, we should not just make things up.
> 
> So how about removing veth_get_settings, vmxnet3_get_settings,
> tun_get_settings and other functions that escaped my grep?

If they can't provide meaningful information then maybe they should be
removed.  However, that could result in a regression for existing
working configurations.  (This isn't the same as the case you're trying
to fix, since those applications have never worked with xen-netfront or
many other drivers that don't implement get_settings.)

Ben.

* Re: Should "N/A" dust bunnies be swept from fw_version?
From: Rick Jones @ 2011-11-18 19:09 UTC
  To: netdev
In-Reply-To: <1321575574.2749.55.camel@bwh-desktop>

On 11/17/2011 04:19 PM, Ben Hutchings wrote:
> On Thu, 2011-11-17 at 15:27 -0800, Rick Jones wrote:
>> In the discussion on "enable virtio_net to return bus_info in ethtool -i
>> consistent with emulated NICs" Ben Hutchings had the following feedback
>> on what might go into bus_info:
>>
>>> Please use the existing 'not implemented' value, which is the empty
>>> string.   If you think ethtool should print some helpful message instead
>>> of an empty string, please submit a patch for ethtool.
>>
>> When I was sweeping in the .get_drvinfo routines, I noticed many drivers
>> would return "N/A" for fw_version - presumably they were drivers for
>> cards without firmware.  Should those be removed to have the fw_version
>> be the empty string, or should those sleeping dust bunnies be allowed to
>> lie?
>
> I much prefer the empty string; the ethtool utility can turn that into a
> user-friendly placeholder if it's considered confusing.

Any other opinions out there?  Anyone? Anyone?-)

rick jones

* Re: NFS TCP race condition with SOCK_ASYNC_NOSPACE
From: Andrew Cooper @ 2011-11-18 19:04 UTC
  To: Trond Myklebust
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1321642368.2653.35.camel-SyLVLa/KEI9HwK5hSS5vWB2eb7JE58TQ@public.gmane.org>

On 18/11/11 18:52, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 18:40 +0000, Andrew Cooper wrote: 
>> Hello,
>>
>> As described originally in
>> http://www.spinics.net/lists/linux-nfs/msg25314.html, we were
>> encountering a bug whereby the NFS session was unexpectedly timing out.
>>
>> I believe I have found the source of the race condition causing the timeout.
>>
>> Brief overview of setup:
>>   10Gb network, NFS mounted using TCP.  Problem reproduces with
>> multiple different NICs, with synchronous or asynchronous mounts, and
>> with soft and hard mounts.  Reproduces on 2.6.32 and I am currently
>> trying to reproduce with mainline. (I don't have physical access to the
>> servers so installing stuff is not fantastically easy)
>>
>>
>>
>> In net/sunrpc/xprtsock.c:xs_tcp_send_request(), we try to write data to
>> the sock buffer using xs_sendpages()
>>
>> When the sock buffer is nearly full, we get an EAGAIN from
>> xs_sendpages() which causes a break out of the loop.  Lower down the
>> function, we switch on status which causes us to call xs_nospace() with
>> the task.
>>
>> In xs_nospace(), we test the SOCK_ASYNC_NOSPACE bit from the socket, and
>> in the rare case where that bit is clear, we return 0 instead of
>> EAGAIN.  This promptly overwrites status in xs_tcp_send_request().
>>
>> The result is that xs_tcp_release_xprt() finds a request which has no
>> error, but has not sent all of the bytes in its send buffer.  It cleans
>> up by setting XPRT_CLOSE_WAIT which causes xprt_clear_locked() to queue
>> xprt->task_cleanup, which closes the TCP connection.
>>
>>
>> Under normal operation, the TCP connection goes down and back up without
>> interruption to the NFS layer.  However, when the NFS server hangs in a
>> half closed state, the client forces a RST of the TCP connection,
>> leading to the timeout.
>>
>> I have tried a few naive fixes such as changing the default return value
>> in xs_nospace() from 0 to -EAGAIN (meaning that 0 will never be
>> returned) but this causes a kernel memory leak.  Can someone with a
>> better understanding of these interactions than me have a look?  It
>> seems that the if (test_bit()) test in xs_nospace() should have an else
>> clause.
> I fully agree with your analysis. The correct thing to do here is to
> always return either EAGAIN or ENOTCONN. Thank you very much for working
> this one out!
>
> Trond

Returning EAGAIN seems to cause a kernel memory leak, as the oomkiller
starts going after processes holding large amounts of LowMem.  Returning
ENOTCONN causes the NFS session to complain about a timeout in the logs,
and in the case of a soft mount, gives an EIO to the calling process.

From the looks of the TCP stream, and from the looks of some
targeted debugging, nothing is actually wrong, so the client should not
be trying to FIN the TCP connection.  Is it possible that there is a
more sinister reason for SOCK_ASYNC_NOSPACE being clear?

I can attempt to find which of the many calls to clear that bit is
actually causing the problem, but I have a feeling that it is going to be a
little more tricky to narrow down.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

* use a special value of -2 for virtual devices to report indeterminate speed?
From: Rick Jones @ 2011-11-18 18:58 UTC
  To: Jeremy Fitzhardinge
  Cc: Ben Hutchings, Olaf Hering, netdev@vger.kernel.org,
	xen-devel@lists.xensource.com, Konrad Rzeszutek Wilk
In-Reply-To: <4EC6A802.9090805@goop.org>

On 11/18/2011 10:46 AM, Jeremy Fitzhardinge wrote:
> On 11/18/2011 10:44 AM, Rick Jones wrote:
>>   It could I suppose, decide
>> based on the physical NIC to which it is attached, so long as folks
>> using the virtual NIC don't expect its attributes to be the same from
>> system to system.
>
> And assuming there's a physical NIC at all.

It sounds like we need a way to specify "Indeterminate" for link speed?
Or some verbiage to that effect. Right now 0 and -1 cause ethtool to
report "Unknown!"

         if (speed == 0 || speed == (u16)(-1) || speed == (u32)(-1))
                 fprintf(stdout, "Unknown!\n");
         else
                 fprintf(stdout, "%uMb/s\n", speed);


How about -2 for the u32 cast value of speed returning "Indeterminate" 
or something like that?  Not in "proper" patch format:

	if (speed == 0 || speed == (u16)(-1) || speed == (u32)(-1))
		fprintf(stdout, "Unknown!\n");
	else if (speed == (u32)(-2))
		fprintf(stdout, "Indeterminate\n");
	else
		fprintf(stdout, "%uMb/s\n", speed);
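For illustration, here is a minimal sketch of the proposed check as standalone C, assuming the -2 sentinel suggested above. It is only a proposal in this thread, not an existing ethtool or kernel constant, and SPEED_INDETERMINATE_PROPOSED, classify_speed() and print_speed() are hypothetical names. Separating classification from printing makes the three cases easy to check:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: the value proposed in this thread, not an ethtool constant. */
#define SPEED_INDETERMINATE_PROPOSED ((uint32_t)-2)

enum speed_kind {
	SPEED_KIND_KNOWN,		/* a real speed in Mb/s */
	SPEED_KIND_UNKNOWN,		/* 0, (u16)-1 or (u32)-1, as ethtool treats them */
	SPEED_KIND_INDETERMINATE,	/* proposed: the device has no fixed speed */
};

static enum speed_kind classify_speed(uint32_t speed)
{
	if (speed == 0 || speed == (uint16_t)-1 || speed == (uint32_t)-1)
		return SPEED_KIND_UNKNOWN;
	if (speed == SPEED_INDETERMINATE_PROPOSED)
		return SPEED_KIND_INDETERMINATE;
	return SPEED_KIND_KNOWN;
}

/* Prints the same strings as the snippet above, each newline-terminated. */
static void print_speed(uint32_t speed)
{
	switch (classify_speed(speed)) {
	case SPEED_KIND_UNKNOWN:
		fprintf(stdout, "Unknown!\n");
		break;
	case SPEED_KIND_INDETERMINATE:
		fprintf(stdout, "Indeterminate\n");
		break;
	case SPEED_KIND_KNOWN:
		fprintf(stdout, "%uMb/s\n", (unsigned int)speed);
		break;
	}
}
```

With this sketch, print_speed(SPEED_INDETERMINATE_PROPOSED) prints "Indeterminate" and print_speed(1000) prints "1000Mb/s".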

Signed-off-by: Rick Jones <rick.jones2@hp.com>	

rick jones

^ permalink raw reply

* Re: NFS TCP race condition with SOCK_ASYNC_NOSPACE
From: Trond Myklebust @ 2011-11-18 18:52 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4EC6A681.30902-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>

On Fri, 2011-11-18 at 18:40 +0000, Andrew Cooper wrote: 
> Hello,
> 
> As described originally in
> http://www.spinics.net/lists/linux-nfs/msg25314.html, we were
> encountering a bug whereby the NFS session was unexpectedly timing out.
> 
> I believe I have found the source of the race condition causing the timeout.
> 
> Brief overview of setup:
>   10Gb network, NFS mounted using TCP.  Problem reproduces with
> multiple different NICs, with synchronous or asynchronous mounts, and
> with soft and hard mounts.  Reproduces on 2.6.32 and I am currently
> trying to reproduce with mainline. (I don't have physical access to the
> servers so installing stuff is not fantastically easy)
> 
> 
> 
> In net/sunrpc/xprtsock.c:xs_tcp_send_request(), we try to write data to
> the sock buffer using xs_sendpages()
> 
> When the sock buffer is nearly full, we get an EAGAIN from
> xs_sendpages() which causes a break out of the loop.  Lower down the
> function, we switch on status, which causes us to call xs_nospace() with
> the task.
> 
> In xs_nospace(), we test the SOCK_ASYNC_NOSPACE bit from the socket, and
> in the rare case where that bit is clear, we return 0 instead of
> EAGAIN.  This promptly overwrites status in xs_tcp_send_request().
> 
> The result is that xs_tcp_release_xprt() finds a request which has no
> error, but has not sent all of the bytes in its send buffer.  It cleans
> up by setting XPRT_CLOSE_WAIT which causes xprt_clear_locked() to queue
> xprt->task_cleanup, which closes the TCP connection.
> 
> 
> Under normal operation, the TCP connection goes down and back up without
> interruption to the NFS layer.  However, when the NFS server hangs in a
> half closed state, the client forces a RST of the TCP connection,
> leading to the timeout.
> 
> I have tried a few naive fixes such as changing the default return value
> in xs_nospace() from 0 to -EAGAIN (meaning that 0 will never be
> returned) but this causes a kernel memory leak.  Can someone with a
> better understanding of these interactions than me have a look?  It
> seems that the if (test_bit()) test in xs_nospace() should have an else
> clause.

I fully agree with your analysis. The correct thing to do here is to
always return either EAGAIN or ENOTCONN. Thank you very much for working
this one out!
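As a rough illustration of that rule, a user-space model of xs_nospace() might look like the following. It is a hypothetical sketch, not the actual fix: struct model_xprt and model_xs_nospace() are made-up names, and where the real function queues the task to wait for write space, the model only reports that through an output flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transport state that matters here. */
struct model_xprt {
	bool connected;
	bool async_nospace;	/* stands in for SOCK_ASYNC_NOSPACE */
};

/*
 * Model of the agreed rule: once the send came up short, never report
 * success.  The result is -EAGAIN whether or not the bit is still set,
 * or -ENOTCONN if the connection has gone away.
 */
static int model_xs_nospace(const struct model_xprt *x, bool *wait_for_space)
{
	*wait_for_space = false;
	if (!x->connected)
		return -ENOTCONN;
	/* In this model, only a set bit means "wait for write space first". */
	*wait_for_space = x->async_nospace;
	return -EAGAIN;
}
```

The point of the model is that the bit only influences whether the task waits, never whether the caller sees an error.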

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org
www.netapp.com

^ permalink raw reply

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Jeremy Fitzhardinge @ 2011-11-18 18:46 UTC (permalink / raw)
  To: Rick Jones
  Cc: Ben Hutchings, Olaf Hering, netdev@vger.kernel.org,
	xen-devel@lists.xensource.com, Konrad Rzeszutek Wilk
In-Reply-To: <4EC6A778.1000503@hp.com>

On 11/18/2011 10:44 AM, Rick Jones wrote:
>  It could I suppose, decide 
> based on the physical NIC to which it is attached, so long as folks 
> using the virtual NIC don't expect its attributes to be the same from 
> system to system.

And assuming there's a physical NIC at all.

    J

^ permalink raw reply

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Rick Jones @ 2011-11-18 18:44 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Olaf Hering, netdev, xen-devel, Jeremy Fitzhardinge,
	Konrad Rzeszutek Wilk
In-Reply-To: <1321638394.2883.32.camel@bwh-desktop>

On 11/18/2011 09:46 AM, Ben Hutchings wrote:
> On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
>> Add a .get_settings function that returns fake data so that ethtool can
>> get enough information.  For some applications, like VCS, this is
>> useful; otherwise some application logic will panic.
>> The reported data refers to VMWare vmxnet.
>>
>> Signed-off-by: Xin Wei Hu <xwhu@suse.com>
>> Signed-off-by: Chunyan Liu <cyliu@suse.com>
>> Signed-off-by: Olaf Hering <olaf@aepfle.de>
>
> NAK, we should not just make things up.

Which raises an interesting question for a virtual interface that isn't
pretending to be a specific NIC type: what should the reported speed be?
Is it a 10/100 NIC?  A 1 or 10 GbE NIC?  3.14 GbE?  For other emulated
interfaces, it rather falls out of the emulation.  We can say that the
driver may not make stuff up, but it would seem that something running in
the host/hypervisor/dom0/whatever will have to.  It could, I suppose, decide
based on the physical NIC to which it is attached, so long as folks 
using the virtual NIC don't expect its attributes to be the same from 
system to system.

rick

^ permalink raw reply

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Olaf Hering @ 2011-11-18 18:43 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, xen-devel, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk
In-Reply-To: <1321638394.2883.32.camel@bwh-desktop>

On Fri, Nov 18, Ben Hutchings wrote:

> On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
> > The reported data refers to VMWare vmxnet.
> NAK, we should not just make things up.

So how about removing veth_get_settings, vmxnet3_get_settings,
tun_get_settings and other functions that escaped my grep?

Olaf

^ permalink raw reply

* NFS TCP race condition with SOCK_ASYNC_NOSPACE
From: Andrew Cooper @ 2011-11-18 18:40 UTC (permalink / raw)
  To: linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA

Hello,

As described originally in
http://www.spinics.net/lists/linux-nfs/msg25314.html, we were
encountering a bug whereby the NFS session was unexpectedly timing out.

I believe I have found the source of the race condition causing the timeout.

Brief overview of setup:
  10Gb network, NFS mounted using TCP.  Problem reproduces with
multiple different NICs, with synchronous or asynchronous mounts, and
with soft and hard mounts.  Reproduces on 2.6.32 and I am currently
trying to reproduce with mainline. (I don't have physical access to the
servers so installing stuff is not fantastically easy)



In net/sunrpc/xprtsock.c:xs_tcp_send_request(), we try to write data to
the sock buffer using xs_sendpages()

When the sock buffer is nearly full, we get an EAGAIN from
xs_sendpages() which causes a break out of the loop.  Lower down the
function, we switch on status, which causes us to call xs_nospace() with
the task.

In xs_nospace(), we test the SOCK_ASYNC_NOSPACE bit from the socket, and
in the rare case where that bit is clear, we return 0 instead of
EAGAIN.  This promptly overwrites status in xs_tcp_send_request().

The result is that xs_tcp_release_xprt() finds a request which has no
error, but has not sent all of the bytes in its send buffer.  It cleans
up by setting XPRT_CLOSE_WAIT which causes xprt_clear_locked() to queue
xprt->task_cleanup, which closes the TCP connection.
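The sequence described here can be modelled in user space roughly as follows. This is a hypothetical sketch, not the code in net/sunrpc/xprtsock.c: the model_* names are made up, the socket is reduced to a single flag standing in for SOCK_ASYNC_NOSPACE, and the short send from xs_sendpages() is assumed rather than performed. It shows how a clear bit turns -EAGAIN into 0 while bytes remain unsent:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in; the real functions take kernel structures. */
struct model_sock {
	bool async_nospace;	/* stands in for SOCK_ASYNC_NOSPACE */
};

/* xs_nospace() as reported: returns 0 unless the bit is set. */
static int model_xs_nospace_reported(const struct model_sock *sk)
{
	int ret = 0;

	if (sk->async_nospace)
		ret = -EAGAIN;
	return ret;
}

/* Assume xs_sendpages() hit a full socket buffer, so status starts as
 * -EAGAIN and is then overwritten by the xs_nospace() result. */
static int model_send_request(const struct model_sock *sk)
{
	int status = -EAGAIN;

	switch (status) {
	case -EAGAIN:
		status = model_xs_nospace_reported(sk);
		break;
	default:
		break;
	}
	return status;
}

/* As described above: no error but unsent bytes leads the release path
 * to set XPRT_CLOSE_WAIT, closing the connection. */
static bool model_release_closes(int status, unsigned int bytes_unsent)
{
	return status == 0 && bytes_unsent > 0;
}
```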


Under normal operation, the TCP connection goes down and back up without
interruption to the NFS layer.  However, when the NFS server hangs in a
half closed state, the client forces a RST of the TCP connection,
leading to the timeout.

I have tried a few naive fixes such as changing the default return value
in xs_nospace() from 0 to -EAGAIN (meaning that 0 will never be
returned) but this causes a kernel memory leak.  Can someone with a
better understanding of these interactions than me have a look?  It
seems that the if (test_bit()) test in xs_nospace() should have an else
clause.

Thanks in advance,

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply

* Re: Occasional oops with IPSec and IPv6.
From: Timo Teräs @ 2011-11-18 18:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Nick Bowler, netdev, David S. Miller
In-Reply-To: <1321634378.3277.35.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 11/18/2011 06:39 PM, Eric Dumazet wrote:
> Le vendredi 18 novembre 2011 à 11:27 -0500, Nick Bowler a écrit :
>> On 2011-11-17 14:09 -0500, Nick Bowler wrote:
>>> One of the tests we do with IPsec involves sending and receiving UDP
>>> datagrams of all sizes from 1 to N bytes, where N is much larger than
>>> the MTU.  In this particular instance, the MTU is 1500 bytes and N is
>>> 10000 bytes.  This test works fine with IPv4, but I'm getting an
>>> occasional oops on Linus' master with IPv6 (output at end of email).  We
>>> also run the same test where N is less than the MTU, and it does not
>>> trigger this issue.  The resulting fallout seems to eventually lock up
>>> the box (although it continues to work for a little while afterwards).
>>>
>>> The issue appears timing related, and it doesn't always occur.  This
>>> probably also explains why I've not seen this issue before now, as we
>>> recently upgraded all our lab systems to machines from this century
>>> (with newfangled dual core processors).  This also makes it somewhat
>>> hard to reproduce, but I can trigger it pretty reliably by running 'yes'
>>> in an ssh session (which doesn't use IPsec) while running the test:
>>> it'll usually trigger in 2 or 3 runs.  The choice of cipher suite
>>> appears to be irrelevant.
>>>
>>> I built a relatively old kernel (2.6.34) and could not reproduce the
>>> issue there, so I ran a git bisect.  It pointed to the following, which
>>> (unsurprisingly) no longer reverts cleanly.
>>>
>>> Let me know if you need any more info.  I'll see if I can reproduce the
>>> issue with a smaller test case...
>>
>> OK, here's a somewhat straightforward way to reproduce it that I've
>> found.  It uses a short test program called "udp_burst" which simply
>> transmits a bunch of UDP datagrams at all sizes between 1 and 10000,
>> included at the end of this mail.
>>[snip]
> 
> Please note commit 80c802f307 added a known bug, fixed in commit
> 0b150932197b (xfrm: avoid possible oopse in xfrm_alloc_dst)
> 
> Given commit 80c802f307 complexity, we can assume other bugs are to be
> fixed as well.
> 
> Unfortunately, Timo seems unresponsive.

This looks quite different, and I've been trying to figure out what
causes it.  The oops happens in ip6_fragment(), indicating that not
enough headroom was allocated (skb underrun).  My initial thought is
that this is an IPv6 bug that just got uncovered by my commit,
especially since the IPv4 side is happy.  But I haven't yet been able
to figure this one out.
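To make the term concrete: "headroom" is the space reserved in front of the packet data so that lower layers can prepend their headers. The following is a hypothetical model only (struct buf_model and model_push() are made-up names; the kernel's struct sk_buff and skb_push() are far more involved), assuming a fixed 256-byte buffer:

```c
#include <stddef.h>

/* Hypothetical model of an skb-like buffer, not struct sk_buff. */
struct buf_model {
	unsigned char storage[256];
	size_t data_off;	/* bytes before the data: the headroom */
};

/*
 * Prepend a header of len bytes in front of the current data.  The
 * kernel's skb_push() calls skb_under_panic() when the data pointer would
 * move before the start of the buffer; this model returns NULL instead.
 */
static unsigned char *model_push(struct buf_model *b, size_t len)
{
	if (len > b->data_off)
		return NULL;	/* not enough headroom: an "skb underrun" */
	b->data_off -= len;
	return b->storage + b->data_off;
}
```

In this model, pushing a 40-byte IPv6 header into 48 bytes of headroom succeeds and leaves 8 bytes; a further 16-byte push would underrun.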

Could you also try Herbert's latest patch set:
  [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment

This changes how the headroom is calculated, and *might* fix this issue
too if it's caused by the same SMP race condition which got uncovered by
my other commit earlier.

- Timo

^ permalink raw reply

* Re: Unable to flush ICMP redirect routes in kernel 3.0+
From: David Miller @ 2011-11-18 18:04 UTC (permalink / raw)
  To: fbl; +Cc: eric.dumazet, famzah, netdev, segoon
In-Reply-To: <20111118152142.717324b1@asterix.rh>

From: Flavio Leitner <fbl@redhat.com>
Date: Fri, 18 Nov 2011 15:21:42 -0200

> On Fri, 18 Nov 2011 18:07:53 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Le vendredi 18 novembre 2011 à 15:05 -0200, Flavio Leitner a écrit :
>> 
>> > Sorry, I meant that we are trying to avoid doing this:
>> > +			hash = rt_hash(daddr, skeys[s], ikeys[i],rt_genid(net));
>> > +
>> > +			rthp = &rt_hash_table[hash].chain;
>> > +
>> > +			while ((rt = rcu_dereference(*rthp)) != NULL) {
>> > +				rthp = &rt->dst.rt_next;
>> 
>> Sure, but this is still needed right now.
> 
> Yes, David will not be happy, unfortunately :)

He better be happy that someone is fixing all the bugs he added.

^ permalink raw reply

* Re: [PATCH] xen-netfront: report link speed to ethtool
From: Ben Hutchings @ 2011-11-18 17:46 UTC (permalink / raw)
  To: Olaf Hering; +Cc: netdev, xen-devel, Jeremy Fitzhardinge, Konrad Rzeszutek Wilk
In-Reply-To: <20111118164805.GA14345@aepfle.de>

On Fri, 2011-11-18 at 17:48 +0100, Olaf Hering wrote:
> Add a .get_settings function that returns fake data so that ethtool can
> get enough information.  For some applications, like VCS, this is
> useful; otherwise some application logic will panic.
> The reported data refers to VMWare vmxnet.
> 
> Signed-off-by: Xin Wei Hu <xwhu@suse.com>
> Signed-off-by: Chunyan Liu <cyliu@suse.com>
> Signed-off-by: Olaf Hering <olaf@aepfle.de>

NAK, we should not just make things up.

Ben.

> ---
>  drivers/net/xen-netfront.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> Index: linux-3.2-rc2/drivers/net/xen-netfront.c
> ===================================================================
> --- linux-3.2-rc2.orig/drivers/net/xen-netfront.c
> +++ linux-3.2-rc2/drivers/net/xen-netfront.c
> @@ -1727,6 +1727,17 @@ static void netback_changed(struct xenbu
>  	}
>  }
>  
> +static int xennet_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
> +{
> +	ecmd->supported = SUPPORTED_1000baseT_Full | SUPPORTED_TP;
> +	ecmd->advertising = ADVERTISED_TP;
> +	ecmd->port = PORT_TP;
> +	ecmd->transceiver = XCVR_INTERNAL;
> +	ecmd->speed = SPEED_1000;
> +	ecmd->duplex = DUPLEX_FULL;
> +	return 0;
> +}
> +
>  static const struct xennet_stat {
>  	char name[ETH_GSTRING_LEN];
>  	u16 offset;
> @@ -1774,6 +1785,7 @@ static const struct ethtool_ops xennet_e
>  {
>  	.get_link = ethtool_op_get_link,
>  
> +	.get_settings = xennet_get_settings,
>  	.get_sset_count = xennet_get_sset_count,
>  	.get_ethtool_stats = xennet_get_ethtool_stats,
>  	.get_strings = xennet_get_strings,
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: lock-warning seen on giving rds-info command
From: David Miller @ 2011-11-18 17:44 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kumaras, netdev, venkat.x.venkatsubra
In-Reply-To: <1321604802.2444.40.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Nov 2011 09:26:42 +0100

> You mean that following sequence triggers a warning ?
> 
> spin_lock_irqsave(&rds_sock_lock, flags);
> ...
> read_lock_bh(&sk->sk_callback_lock);
> read_unlock_bh(&sk->sk_callback_lock);   // HERE
> ...

This sequence has always been disallowed.

BHs are triggered by software, so when you do local_bh_enable() it checks
to make sure you haven't tried to do this with hard IRQs disabled, and
triggers the warning if so.

static void __local_bh_enable(unsigned int cnt)
{
	WARN_ON_ONCE(in_irq());
	WARN_ON_ONCE(!irqs_disabled());
 ...
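The rule can be illustrated with a single-CPU user-space model. This is a hypothetical sketch: struct cpu_model and the model_* helpers stand in for spin_lock_irqsave(), read_lock_bh() and friends and track only the state relevant to the warning, assuming read_lock_bh()/read_unlock_bh() disable and re-enable BHs around the lock:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical single-CPU state; not kernel code. */
struct cpu_model {
	bool hardirqs_disabled;
	int bh_disable_count;
	int warnings;
};

static void model_spin_lock_irqsave(struct cpu_model *c, bool *flags)
{
	*flags = c->hardirqs_disabled;	/* remember the previous state */
	c->hardirqs_disabled = true;
}

static void model_spin_unlock_irqrestore(struct cpu_model *c, bool flags)
{
	c->hardirqs_disabled = flags;
}

/* The BH half of read_lock_bh(). */
static void model_local_bh_disable(struct cpu_model *c)
{
	c->bh_disable_count++;
}

/* The BH half of read_unlock_bh(): re-enabling BHs may run pending
 * softirqs, which must not happen with hard IRQs disabled. */
static void model_local_bh_enable(struct cpu_model *c)
{
	if (c->hardirqs_disabled) {
		c->warnings++;
		fprintf(stderr, "WARNING: BHs re-enabled with hard IRQs disabled\n");
	}
	c->bh_disable_count--;
}
```

Without the spin_lock_irqsave() region around it, the same BH disable/enable pair produces no warning in this model.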

^ permalink raw reply

* Re: [net-next-2.6 PATCH 0/6 v4] macvlan: MAC Address filtering support for passthru mode
From: Ben Hutchings @ 2011-11-18 17:40 UTC (permalink / raw)
  To: Greg Rose
  Cc: Roopa Prabhu, netdev@vger.kernel.org, davem@davemloft.net,
	chrisw@redhat.com, sri@us.ibm.com, dragos.tatulea@gmail.com,
	kvm@vger.kernel.org, arnd@arndb.de, mst@redhat.com,
	mchan@broadcom.com, dwang2@cisco.com, shemminger@vyatta.com,
	eric.dumazet@gmail.com, kaber@trash.net, benve@cisco.com
In-Reply-To: <4EC68EBB.3080303@intel.com>

On Fri, 2011-11-18 at 08:58 -0800, Greg Rose wrote:
> On 11/17/2011 4:44 PM, Ben Hutchings wrote:
> > On Thu, 2011-11-17 at 16:32 -0800, Greg Rose wrote:
> >> On 11/17/2011 4:15 PM, Ben Hutchings wrote:
> >>> Sorry to come to this rather late.
> >>>
> >>> On Tue, 2011-11-08 at 23:55 -0800, Roopa Prabhu wrote:
> >>> [...]
> >>>> v2 ->   v3
> >>>> - Moved set and get filter ops from rtnl_link_ops to netdev_ops
> >>>> - Support for SRIOV VFs.
> >>>>           [Note: The get filters msg (in the way current get rtnetlink handles
> >>>>           it) might get too big for SRIOV vfs. This patch follows existing sriov
> >>>>           vf get code and tries to accommodate filters for all VFs in a PF.
> >>>>           And for the SRIOV case I have only tested the fact that the VF
> >>>>           arguments are getting delivered to rtnetlink correctly. The code
> >>>>           follows existing sriov vf handling code so rest of it should work fine]
> >>> [...]
> >>>
> >>> This is already broken for large numbers of VFs, and increasing the
> >>> amount of information per VF is going to make the situation worse.  I am
> >>> no netlink expert but I think that the current approach of bundling all
> >>> information about an interface in a single message may not be
> >>> sustainable.
> >>>
> >>> Also, I'm unclear on why this interface is to be used to set filtering
> >>> for the (PF) net device as well as for related VFs.  Doesn't that
> >>> duplicate the functionality of ndo_set_rx_mode and
> >>> ndo_vlan_rx_{add,kill}_vid?
> >>
> >> Functionally yes but contextually no.  This allows the PF driver to know
> >> that it is setting these filters in the context of the existence of VFs,
> >> allowing it to take appropriate action.  The other two functions may be
> >> called without the presence of SR-IOV enablement and the existence of VFs.
> >>
> >> Anyway, that's why I asked Roopa to add that capability.
> >
> > I don't follow.  The PF driver already knows whether it has enabled VFs.
> >
> > How do filters set this way interact with filters set through the
> > existing operations?  Should they override promiscuous mode?  None of
> > this has been specified.
> 
> Promiscuous mode is exactly the issue this feature is intended for.  I'm 
> not familiar with the Solarflare device, but Intel HW promiscuous mode is 
> only promiscuous on the physical port, not on the VEB.  So a packet sent 
> from a VF will not be captured by the PF across the VEB unless the MAC 
> and VLAN filters have been programmed into the HW.

Yes, I get it.  The hardware bridge needs to know more about the address
configuration on the host than the driver is getting at the moment.

> So you may not need 
> the feature for your devices but it is required for Intel devices.

Well we don't have the hardware bridge but that means each VF driver
needs to know whether to fall back to the software bridge.  The net
driver needs much the same additional information.

> And 
> it's a fairly simple request, just allow -1 to indicate that the target 
> of the filter requests is for the PF itself.  Using the already existing 
> set_rx_mode function won't work because the PF driver will look at it 
> and figure it's in promiscuous mode anyway, so it won't set the filters 
> into the HW.  At least that is how it is in the case of our HW and 
> driver.  Again, the behavior of your HW and driver is unknown to me and 
> thus you may not require this feature.
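A minimal sketch of the convention Greg describes, assuming a plain int VF index where -1 addresses the PF itself (FILTER_TARGET_PF, enum filter_target_kind and classify_filter_target() are hypothetical names, not the rtnetlink API from the patch set):

```c
/* Hypothetical: -1 selects the PF, 0..num_vfs-1 select a VF. */
#define FILTER_TARGET_PF (-1)

enum filter_target_kind { TARGET_PF, TARGET_VF, TARGET_INVALID };

static enum filter_target_kind classify_filter_target(int vf, int num_vfs)
{
	if (vf == FILTER_TARGET_PF)
		return TARGET_PF;	/* filters for the PF's own traffic */
	if (vf >= 0 && vf < num_vfs)
		return TARGET_VF;	/* filters on behalf of one VF */
	return TARGET_INVALID;	/* out of range: reject the request */
}
```

With an explicit PF target, the PF driver is told directly that the filters are wanted in the SR-IOV context, rather than having to infer it from set_rx_mode, where promiscuous mode would hide them.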

What concerns me is that this seems to be a workaround rather than a fix
for over-use of promiscuous mode, and it changes the semantics of
filtering modes in ways that haven't been well-specified.

What if there's a software bridge between two net devices corresponding
to separate physical ports, so that they really need to be promiscuous?
What if the administrator runs tcpdump and really wants the (PF) net
device to be promiscuous?

These cases shouldn't break because of VF acceleration.  Or at least we
should make a conscious and documented decision that 'promiscuous'
doesn't mean that if you enable it on your network adapter.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: b43: BCM 4331: MacBook 8,1: No connection after suspend
From: Nico Schottelius @ 2011-11-18 17:39 UTC (permalink / raw)
  To: Nico -telmich- Schottelius, LKML, netdev, Arend van Spriel
In-Reply-To: <20111118173242.GA2101@schottelius.org>

[-- Attachment #1: Type: text/plain, Size: 203 bytes --]

Nico -telmich- Schottelius [Fri, Nov 18, 2011 at 06:32:42PM +0100]:
> Hello,
> 
> new notebook, new problems (*):

+ full dmesg attached.

-- 
PGP key: 7ED9 F7D3 6B10 81D7 0EC5  5C09 D7DC C8E4 3187 7DF0

[-- Attachment #2: dmesg.no-connection-after-suspend.gz --]
[-- Type: application/octet-stream, Size: 145793 bytes --]

^ permalink raw reply

* Re: [PATCH] net: Use kmemdup rather than duplicating its implementation
From: Jesse Brandeburg @ 2011-11-18 17:32 UTC (permalink / raw)
  To: Thomas Meyer
  Cc: samuel@sortiz.org, davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1321569820.1624.307.camel@localhost.localdomain>

On Thu, 17 Nov 2011 14:43:40 -0800
Thomas Meyer <thomas@m3y3r.de> wrote:

> The semantic patch that makes this change is available
> in scripts/coccinelle/api/memdup.cocci.
> 
> Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
> ---
> 
> diff -u -p a/net/irda/irttp.c b/net/irda/irttp.c
> --- a/net/irda/irttp.c 2011-11-07 19:39:06.071138486 +0100
> +++ b/net/irda/irttp.c 2011-11-08 10:59:07.152748948 +0100
> @@ -1461,14 +1461,13 @@ struct tsap_cb *irttp_dup(struct tsap_cb
>  	}
>  
>  	/* Allocate a new instance */
> -	new = kmalloc(sizeof(struct tsap_cb), GFP_ATOMIC);
> +	new = kmemdup(orig, sizeof(struct tsap_cb), GFP_ATOMIC);
>  	if (!new) {
>  		IRDA_DEBUG(0, "%s(), unable to kmalloc\n", __func__);
>  		spin_unlock_irqrestore(&irttp->tsaps->hb_spinlock, flags);
>  		return NULL;
>  	}
>  	/* Dup */

this ^^^ comment should be removed also.

> -	memcpy(new, orig, sizeof(struct tsap_cb));
>  	spin_lock_init(&new->lock);
>  
>  	/* We don't need the old instance any more */
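For reference, kmemdup(src, len, gfp) allocates len bytes and copies src into them, so it replaces exactly the kmalloc()+memcpy() pair removed above. A user-space analogue, as a sketch only (memdup_sketch(), struct tsap_model and tsap_model_dup() are hypothetical stand-ins for the kernel function and struct tsap_cb):

```c
#include <stdlib.h>
#include <string.h>

/* User-space stand-in for kmemdup(): allocate and copy in one step,
 * returning NULL if the allocation fails (no gfp flags here). */
static void *memdup_sketch(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Hypothetical struct standing in for struct tsap_cb. */
struct tsap_model {
	int lock_state;	/* re-initialised after the copy */
	int id;
};

static struct tsap_model *tsap_model_dup(const struct tsap_model *orig)
{
	struct tsap_model *copy = memdup_sketch(orig, sizeof(*copy));

	if (!copy)
		return NULL;
	copy->lock_state = 0;	/* like spin_lock_init(): don't inherit lock state */
	return copy;
}
```

As in irttp_dup(), anything that must not be shared with the original, such as the lock, still has to be re-initialised after the copy.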

^ permalink raw reply

* b43: BCM 4331: MacBook 8,1: No connection after suspend
From: Nico -telmich- Schottelius @ 2011-11-18 17:32 UTC (permalink / raw)
  To: LKML, netdev, Arend van Spriel

Hello,

new notebook, new problems (*):

Running 3.2.0-rc1 on the MacBook Pro 8,1 with the BCM4331
(14e4:4331) the WLAN indeed works:

[   86.231702] b43-phy0: Broadcom 4331 WLAN found (core revision 29)
[   86.269190] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   86.270486] Broadcom 43xx driver loaded [ Features: PMNLS ]
[   87.677265] b43-phy0: Loading firmware version 666.2 (2011-02-23 01:15:07)

But after a suspend, it no longer seems to receive any packets:

[ 2334.494845] wlan0: authenticate with 64:87:d7:37:89:89 (try 1)
[ 2334.694035] wlan0: authenticate with 64:87:d7:37:89:89 (try 2)
[ 2334.893909] wlan0: authenticate with 64:87:d7:37:89:89 (try 3)
[ 2335.093824] wlan0: authentication with 64:87:d7:37:89:89 timed out

wpa_supplicant thus retries to connect to the network again and again
without success.

This is reproducible on the MBP 8,1 and is *NOT* fixed if I unload
b43 and modprobe it again. Doing multiple suspends/resumes does not
fix it either.

Any pointers to this?

Cheers,

Nico


(*) feels like in good old times...

-- 
PGP key: 7ED9 F7D3 6B10 81D7 0EC5  5C09 D7DC C8E4 3187 7DF0

^ permalink raw reply

