3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic.

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
@ 2013-01-21 21:07 Ben Greear
  2013-01-22  0:32 ` Ben Greear
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-21 21:07 UTC (permalink / raw)
  To: netdev, linux-nfs@vger.kernel.org

I posted about this a few days ago, but this time the patches applied
are minimal and there are no out-of-tree kernel modules loaded.

I have to have the NFS patches (see below) for this test case to run
at all.

Test case is 2500 macvlans, each with a file-IO process running
on it doing writing on it's own mountpoint.  This is using NFSv4,
O_DIRECT, and 10k write() calls (and each file is 10k long, so
open/close file operations as well.  Smaller files and write sizes
showed the problem as well.

The bug is most easily hit during my application's restart, where interfaces
are being (re)configured and the file-io processes are being
started.  This includes changes to: IP addrs, routes, ip routing rules,
mount/unmount operations, etc.

I suspect it may be some sort of race between tearing down the IP protocol
and having traffic currently being received on that
interface.  I have not been able to hit this using HTTP traffic,
but perhaps that is just part of the race and not a particular
problem with NFS.

The patches applied on top of 3.7.3 are some NFS crash fixes Trond posted
last Friday and my own patches I just posted to linux-nfs:

http://www.spinics.net/lists/linux-nfs/msg34811.html

If anyone has any suggestions for how to further debug this,
please let me know.

All of the crashes are in the same place, but the value of
the bad address changes...sometimes appears as if NULL were
dereferenced...

<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/greearb/kernel/2.6/linux-3.7.x64/vmlinux...done.
(gdb) l *(ip_rcv_finish+0x2b7)
0xffffffff8149c783 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb) quit

IPv6: ADDRCONF(NETDEV_CHANGE): eth2#903: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#923: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#943: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#963: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): eth2#983: link becomes ready
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffff88040d7d8000
IP: [<ffff88040d7d8000>] 0xffff88040d7d7fff
PGD 1a0c063 PUD df78e067 PMD 800000040d6001e3
Oops: 0011 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput coretemp hwmon kvm_intel kvm gpio_ich iTCO_wdt iTCO_vendor_support microcode pcspkr lpc_ich i2c_i801 
e1000e ioatdma igb i7core_edac edac_core ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 6
Pid: 47, comm: ksoftirqd/6 Tainted: G         C   3.7.3+ #37 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffff88040d7d8000>]  [<ffff88040d7d8000>] 0xffff88040d7d7fff
RSP: 0018:ffff88040d75bc00  EFLAGS: 00010282
RAX: ffff88040435cd80 RBX: ffff8803dd83e300 RCX: ffff8803dd83e300
RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff8803dd83e300
RBP: ffff88040d75bc28 R08: ffffffff8149c4cc R09: ffff88040d75bbf0
R10: dead000000200200 R11: dead000000100100 R12: ffff8803e10c88fc
R13: ffff8803dd83e300 R14: ffff88040d38e000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff88040d7d8000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/6 (pid: 47, threadinfo ffff88040d75a000, task ffff88040d751730)
Stack:
  ffffffff8149c783 ffff8803dd83e300 ffffffff8149c4cc ffff8803dd83e300
  ffff88040d38e000 ffff88040d75bc58 ffffffff8149cae8 0000000080000000
  0000000000000001 ffff8803dd83e300 ffff88040d38e000 ffff88040d75bc88
Call Trace:
  [<ffffffff8149c783>] ? ip_rcv_finish+0x2b7/0x2cf
  [<ffffffff8149c4cc>] ? inet_del_protocol+0x37/0x37
  [<ffffffff8149cae8>] NF_HOOK.clone.1+0x4c/0x53
  [<ffffffff8149cd73>] ip_rcv+0x237/0x268
  [<ffffffff81468c46>] __netif_receive_skb+0x487/0x530
  [<ffffffff81468de5>] process_backlog+0xf6/0x1d7
  [<ffffffff8146b19b>] net_rx_action+0xad/0x20c
  [<ffffffff8108d292>] __do_softirq+0x9c/0x161
  [<ffffffff8108d37a>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a6f3b>] smpboot_thread_fn+0x253/0x258
  [<ffffffff810a6ce8>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff8109ff60>] kthread+0xbf/0xc7
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
  [<ffffffff815292bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff8109fea1>] ? __init_kthread_worker+0x37/0x37
Code: 36 a4 36 6a ea 6a ea 6a ea 6a ea e9 81 e9 81 e9 81 e9 81 71 48 71 48 71 48 71 48 ea 50 ea 50 ea 50 ea 50 f7 e4 f7 e4 f7 e4 f7 e4 <22> 22 00 00 ad 4e ad de 
ff ff ff ff 00 00 00 00 ff ff ff ff ff
RIP  [<ffff88040d7d8000>] 0xffff88040d7d7fff
  RSP <ffff88040d75bc00>
CR2: ffff88040d7d8000
---[ end trace 1c131533c1de5e8b ]---

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-21 21:07 3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic Ben Greear
@ 2013-01-22  0:32 ` Ben Greear
  2013-01-22  4:40   ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-22  0:32 UTC (permalink / raw)
  To: netdev, linux-nfs@vger.kernel.org

On 01/21/2013 01:07 PM, Ben Greear wrote:
> I posted about this a few days ago, but this time the patches applied
> are minimal and there are no out-of-tree kernel modules loaded.

Here's another crash, this time with SLUB memory debugging turned on.

Seems much harder to hit this way...I've only managed this one.  I
believe the RCX register might be interesting...that 6b is probably
freed memory poisioning.  Maybe skb or skb_dest() is already freed?

I have added a 'verify_mem_not_deleted(skb)' before the
dst_input line in ip_rcv_finish..but so far, it hasn't
hit the problem...


general protection fault: 0000 [#1] PREEMPT SMP
Modules linked in: 8021q garp stp llc nfsv4 auth_rpcgss nfs fscache macvlan pktgen lockd sunrpc iTCO_wdt iTCO_vendor_support gpio_ich coretemp hwmon kvm_intel 
kvm microcode pcspkr i2c_i801 lpc_ich igb e1000e ioatdma ptp i7core_edac pps_core edac_core dca uinput ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core 
[last unloaded: macvlan]
CPU 2
Pid: 22, comm: rcuc/2 Tainted: G         C O 3.7.3+ #38 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<ffffffff814c196d>]  [<ffffffff814c196d>] ip_rcv_finish+0x334/0x34f
RSP: 0018:ffff88041fc43d98  EFLAGS: 00010282
RAX: ffff880369f67180 RBX: ffff8803e4144fc0 RCX: 000000006b6b6b6b
RDX: ffff8803dd632f64 RSI: ffffffff81a2a580 RDI: ffff8803e4144fc0
RBP: ffff88041fc43db8 R08: ffffffff814c1639 R09: ffffffff814c1639
R10: ffff8803dd632f64 R11: ffff8803dd632f64 R12: ffff8803dd632f64
R13: ffff88041fc58420 R14: ffff88040cfa8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000079e008 CR3: 00000003d5759000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/2 (pid: 22, threadinfo ffff88040d7d4000, task ffff88040d7d8000)
Stack:
  ffff8803e4144fc0 ffffffff814c1639 ffff8803e4144fc0 ffff88040cfa8000
  ffff88041fc43de8 ffffffff814c1e61 ffff880480000000 0000000000000246
  ffff8803e4144fc0 ffff88040cfa8000 ffff88041fc43e18 ffffffff814c20ec
Call Trace:
  <IRQ>
  [<ffffffff814c1639>] ? skb_dst+0x41/0x41
  [<ffffffff814c1e61>] NF_HOOK.clone.1+0x4c/0x53
  [<ffffffff814c20ec>] ip_rcv+0x237/0x267
  [<ffffffff814887f0>] __netif_receive_skb+0x537/0x5e0
  [<ffffffff8148895a>] process_backlog+0xc1/0x1b7
  [<ffffffff8148b082>] ? net_rx_action+0x1c4/0x1e2
  [<ffffffff8148af70>] net_rx_action+0xb2/0x1e2
  [<ffffffff8108eeb8>] __do_softirq+0xab/0x17b
  [<ffffffff81557cfc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd46>] do_softirq+0x46/0x9e
  [<ffffffff810fb709>] ? rcu_cpu_kthread+0xf6/0x12f
  [<ffffffff8108f08d>] _local_bh_enable_ip+0xb3/0xe7
  [<ffffffff8108f0d9>] local_bh_enable+0xd/0x10
  [<ffffffff810fb709>] rcu_cpu_kthread+0xf6/0x12f
  [<ffffffff810aacde>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810aaa8b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a3439>] kthread+0xc2/0xca
  [<ffffffff810a3377>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff815569bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a3377>] ? __init_kthread_worker+0x56/0x56
Code: 8b 80 58 04 00 00 48 8b 80 48 01 00 00 65 48 ff 80 c8 00 00 00 8b 53 68 65 48 01 90 e8 00 00 00 48 89 df e8
8e fc ff ff 48 89 df <ff> 50 50 eb 0d 48 89 df e8 2c eb fb ff b8 01 00 00 00 5b 41 5c
RIP  [<ffffffff814c196d>] ip_rcv_finish+0x334/0x34f
  RSP <ffff88041fc43d98>
[drm:mgag200_bo_reserve] *ERROR* reserve failed ffff8803feba9fb0


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  0:32 ` Ben Greear
@ 2013-01-22  4:40   ` Eric Dumazet
  2013-01-22  5:57     ` Ben Greear
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-22  4:40 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
> On 01/21/2013 01:07 PM, Ben Greear wrote:
> > I posted about this a few days ago, but this time the patches applied
> > are minimal and there are no out-of-tree kernel modules loaded.
> 
> Here's another crash, this time with SLUB memory debugging turned on.
> 
> Seems much harder to hit this way...I've only managed this one.  I
> believe the RCX register might be interesting...that 6b is probably
> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
> 
> I have added a 'verify_mem_not_deleted(skb)' before the
> dst_input line in ip_rcv_finish..but so far, it hasn't
> hit the problem...
> 

There is no way skb is freed here.

I would say macvlan is at fault here.

It probably lacks a proper (dev->flags & IFF_UP) test.

Please try :

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d3fb97d..c07bdef 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -111,9 +111,15 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 				 const struct ethhdr *eth, bool local)
 {
 	struct net_device *dev = vlan->dev;
+
 	if (!skb)
 		return NET_RX_DROP;
 
+	if (!(dev->flags & IFF_UP)) {
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
 	if (local)
 		return vlan->forward(dev, skb);
 



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  4:40   ` Eric Dumazet
@ 2013-01-22  5:57     ` Ben Greear
  2013-01-22 17:08       ` Ben Greear
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-22  5:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/21/2013 08:40 PM, Eric Dumazet wrote:
> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>> I posted about this a few days ago, but this time the patches applied
>>> are minimal and there are no out-of-tree kernel modules loaded.
>>
>> Here's another crash, this time with SLUB memory debugging turned on.
>>
>> Seems much harder to hit this way...I've only managed this one.  I
>> believe the RCX register might be interesting...that 6b is probably
>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>
>> I have added a 'verify_mem_not_deleted(skb)' before the
>> dst_input line in ip_rcv_finish..but so far, it hasn't
>> hit the problem...
>>
>
> There is no way skb is freed here.
>
> I would say macvlan is at fault here.
>
> It probably lacks a proper (dev->flags & IFF_UP) test.

Thanks for the patch.  It's running OK so far, but I'll need to
do a bunch more testing tomorrow to make sure I'm not just getting
lucky!

I'll let you know how it goes tomorrow.

Thanks,
Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22  5:57     ` Ben Greear
@ 2013-01-22 17:08       ` Ben Greear
  2013-01-22 17:17         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-22 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/21/2013 09:57 PM, Ben Greear wrote:
> On 01/21/2013 08:40 PM, Eric Dumazet wrote:
>> On Mon, 2013-01-21 at 16:32 -0800, Ben Greear wrote:
>>> On 01/21/2013 01:07 PM, Ben Greear wrote:
>>>> I posted about this a few days ago, but this time the patches applied
>>>> are minimal and there are no out-of-tree kernel modules loaded.
>>>
>>> Here's another crash, this time with SLUB memory debugging turned on.
>>>
>>> Seems much harder to hit this way...I've only managed this one.  I
>>> believe the RCX register might be interesting...that 6b is probably
>>> freed memory poisioning.  Maybe skb or skb_dest() is already freed?
>>>
>>> I have added a 'verify_mem_not_deleted(skb)' before the
>>> dst_input line in ip_rcv_finish..but so far, it hasn't
>>> hit the problem...
>>>
>>
>> There is no way skb is freed here.
>>
>> I would say macvlan is at fault here.
>>
>> It probably lacks a proper (dev->flags & IFF_UP) test.
>
> Thanks for the patch.  It's running OK so far, but I'll need to
> do a bunch more testing tomorrow to make sure I'm not just getting
> lucky!
>
> I'll let you know how it goes tomorrow.

Unfortunately, I hit it again this morning after the first restart of
my application (which bounces all 3000 interfaces).  Memory poisoning
was disabled.

(gdb) l *(ip_rcv_finish+0x2b9)
0xffffffff814a8ab3 is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:373).
368					skb->len);
369		} else if (rt->rt_type == RTN_BROADCAST)
370			IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
371					skb->len);
372	
373		return dst_input(skb);
374	
375	drop:
376		kfree_skb(skb);
377		return NET_RX_DROP;
(gdb)

(gdb) l *(skb_dst+0x5a)
0xffffffff814a87fa is in ip_rcv_finish (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:320).
315	
316	int sysctl_ip_early_demux __read_mostly = 1;
317	EXPORT_SYMBOL(sysctl_ip_early_demux);
318	
319	static int ip_rcv_finish(struct sk_buff *skb)
320	{
321		const struct iphdr *iph = ip_hdr(skb);
322		struct rtable *rt;
323	
324		if (sysctl_ip_early_demux && !skb_dst(skb)) {

nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc lockd sunrpc macvlan pktgen uinput coretemp hwmon kvm_intel kvm iTCO_wdt 
iTCO_vendor_support gpio_ich microcode pcspkr i2c_i801 lpc_ich e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 0
Pid: 9, comm: rcuc/0 Tainted: G         C O 3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fc03da0  EFLAGS: 00010286
RAX: ffff88023045f5c0 RBX: ffff88034491ab00 RCX: 00000000ffff8800
RDX: ffff88025920d4fc RSI: ffffffff81a2a500 RDI: ffff88034491ab00
RBP: ffff88041fc03dc8 R08: ffffffff814a87fa R09: ffff88041fc03d90
R10: ffff88025920d4fc R11: ffff88041fc03e28 R12: ffff88025920d4fc
R13: ffff88041fc18b60 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/0 (pid: 9, threadinfo ffff88040d4e2000, task ffff88040d4e8000)
Stack:
  ffffffff814a8ab3 ffff88034491ab00 ffffffff814a87fa ffff88034491ab00
  ffff88040d3f8000 ffff88041fc03df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff88034491ab00 ffff88040d3f8000 ffff88041fc03e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
nfs: server 10.1.8.1 not responding, timed out
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.nfs: server 10.1.0.1 not responding, timed out
nfs: server 10.1.0.1 not responding, timed out

>
> Thanks,
> Ben
>
>
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:08       ` Ben Greear
@ 2013-01-22 17:17         ` Eric Dumazet
  2013-01-22 17:26           ` Ben Greear
  2013-01-22 17:26           ` Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:17 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:

> Unfortunately, I hit it again this morning after the first restart of
> my application (which bounces all 3000 interfaces).  Memory poisoning
> was disabled.

Is your NFS traffic using TCP or UDP ?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:17         ` Eric Dumazet
@ 2013-01-22 17:26           ` Ben Greear
  2013-01-22 17:26           ` Eric Dumazet
  1 sibling, 0 replies; 21+ messages in thread
From: Ben Greear @ 2013-01-22 17:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/22/2013 09:17 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>
>> Unfortunately, I hit it again this morning after the first restart of
>> my application (which bounces all 3000 interfaces).  Memory poisoning
>> was disabled.
>
> Is your NFS traffic using TCP or UDP ?

It's using TCP.

Thanks,
Ben

>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:17         ` Eric Dumazet
  2013-01-22 17:26           ` Ben Greear
@ 2013-01-22 17:26           ` Eric Dumazet
  2013-01-22 22:18             ` Ben Greear
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-22 17:26 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
> 
> > Unfortunately, I hit it again this morning after the first restart of
> > my application (which bounces all 3000 interfaces).  Memory poisoning
> > was disabled.
> 
> Is your NFS traffic using TCP or UDP ?
> 

Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
to netif_rx()

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 68a43fe..4f049ee 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -114,6 +114,7 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 	if (!skb)
 		return NET_RX_DROP;
 
+	skb_dst_drop(skb);
 	if (local)
 		return vlan->forward(dev, skb);
 
@@ -220,6 +221,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
 	if (!skb)
 		goto out;
 
+	skb_dst_drop(skb);
 	skb->dev = dev;
 	skb->pkt_type = PACKET_HOST;
 



^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 17:26           ` Eric Dumazet
@ 2013-01-22 22:18             ` Ben Greear
  2013-01-23  2:32               ` Ben Greear
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-22 22:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/22/2013 09:26 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>
>>> Unfortunately, I hit it again this morning after the first restart of
>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>> was disabled.
>>
>> Is your NFS traffic using TCP or UDP ?
>>
>
> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
> to netif_rx()

I just saw another crash.  It had run 2 user-space restarts and
2 reboots, but on the third reboot, it crashed coming up.  It seemed
to last longer this time, but that could just be luck as it's never
been super easy to reproduce this quickly.

For completeness, here is the diff I was using:

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 68a43fe..eb55c88 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -111,9 +111,16 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
                                  const struct ethhdr *eth, bool local)
  {
         struct net_device *dev = vlan->dev;
+
         if (!skb)
                 return NET_RX_DROP;

+       if (!(dev->flags & IFF_UP)) {
+               kfree_skb(skb);
+               return NET_RX_DROP;
+       }
+
+       skb_dst_drop(skb);
         if (local)
                 return vlan->forward(dev, skb);

@@ -220,6 +227,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
         if (!skb)
                 goto out;

+       skb_dst_drop(skb);
         skb->dev = dev;
         skb->pkt_type = PACKET_HOST;


BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: nf_nat_ipv4 nf_nat nfsv4 auth_rpcgss nfs fscache 8021q garp stp llc macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich 
coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich e1000e igb ptp ioatdma i7core_edac pps_core dca edac_core ipv6 mgag200 i2c_algo_bit 
drm_kms_helper ttm drm i2c_core [last unloaded: iptable_nat]
CPU 5
Pid: 40, comm: rcuc/5 Tainted: G         C   3.7.3+ #43 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff88041fca3da0  EFLAGS: 00010282
RAX: ffff88030ae8bc80 RBX: ffff880198694500 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff880198694500
RBP: ffff88041fca3dc8 R08: ffffffff814a87fa R09: ffff88041fca3d90
R10: ffff8803dc45b8fc R11: ffff88041fca3e28 R12: ffff8803dc45b8fc
R13: ffff880198694500 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rcuc/5 (pid: 40, threadinfo ffff88040d73c000, task ffff88040d723ea0)
Stack:
  ffffffff814a8ab3 ffff880198694500 ffffffff814a87fa ffff880198694500
  ffff88040d3f8000 ffff88041fca3df8 ffffffff814a8e66 0000000080000000
  ffffffff81472e61 ffff880198694500 ffff88040d3f8000 ffff88041fca3e28
Call Trace:
  <IRQ>
  [<ffffffff814a8ab3>] ? ip_rcv_finish+0x2b9/0x2d1
  [<ffffffff814a87fa>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8e66>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a90f3>] ip_rcv+0x237/0x268
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff81538ddc>] call_softirq+0x1c/0x30
  <EOI>
  [<ffffffff8100bd21>] do_softirq+0x41/0x7e
  [<ffffffff8108d68b>] _local_bh_enable_ip+0x7a/0x9f
  [<ffffffff8108d6c8>] local_bh_enable+0xd/0x11
  [<ffffffff810f3661>] rcu_cpu_kthread+0xe6/0x11f
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537afc>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
  RSP <ffff88041fca3da0>
CR2: 0000000000000000


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-22 22:18             ` Ben Greear
@ 2013-01-23  2:32               ` Ben Greear
  2013-01-23  6:11                 ` Eric Dumazet
  2013-01-23 23:55                 ` Ben Greear
  0 siblings, 2 replies; 21+ messages in thread
From: Ben Greear @ 2013-01-23  2:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/22/2013 02:18 PM, Ben Greear wrote:
> On 01/22/2013 09:26 AM, Eric Dumazet wrote:
>> On Tue, 2013-01-22 at 09:17 -0800, Eric Dumazet wrote:
>>> On Tue, 2013-01-22 at 09:08 -0800, Ben Greear wrote:
>>>
>>>> Unfortunately, I hit it again this morning after the first restart of
>>>> my application (which bounces all 3000 interfaces).  Memory poisoning
>>>> was disabled.
>>>
>>> Is your NFS traffic using TCP or UDP ?
>>>
>>
>> Oh well, it seems macvlan.c has to skb_drop_dst(skb) before giving skb
>> to netif_rx()
>
> I just saw another crash.  It had run 2 user-space restarts and
> 2 reboots, but on the third reboot, it crashed coming up.  It seemed
> to last longer this time, but that could just be luck as it's never
> been super easy to reproduce this quickly.

I added a patch to set dst->input and dst->output to 0xdeadbeef before
freeing the memory.  (The warn-on below did NOT hit)

@@ -452,6 +452,9 @@ static inline int dst_output(struct sk_buff *skb)
  /* Input packet from network to transport.  */
  static inline int dst_input(struct sk_buff *skb)
  {
+       if (WARN_ON(((unsigned long)(skb_dst(skb))) < 4000)) {
+               printk("Bad skb_dst: %lu\n", skb->_skb_refdst);
+       }
         return skb_dst(skb)->input(skb);
  }

diff --git a/net/core/dst.c b/net/core/dst.c
index ee6153e..234b168 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -245,6 +245,7 @@ again:
                 dst->ops->destroy(dst);
         if (dst->dev)
                 dev_put(dst->dev);
+       dst->input = dst->output = 0xdeadbeef;
         kmem_cache_free(dst->ops->kmem_cachep, dst);

         dst = child;

Looks like we do indeed access freed memory, based on this crash I saw on
the next reboot:

[root@lf1011-12060006 ~]# BUG: unable to handle kernel paging request at 00000000deadbeef
IP: [<00000000deadbeef>] 0xdeadbeee
PGD 0
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: macvlan pktgen lockd sunrpc uinput iTCO_wdt iTCO_vendor_support gpio_ich coretemp hwmon kvm_intel kvm microcode pcspkr i2c_i801 lpc_ich 
e1000e i7core_edac ioatdma edac_core igb ptp pps_core dca ipv6 mgag200 i2c_algo_bit drm_kms_helper ttm drm i2c_core
CPU 8
Pid: 59, comm: ksoftirqd/8 Tainted: G         C O 3.7.3+ #46 Iron Systems Inc. EE2610R/X8ST3
RIP: 0010:[<00000000deadbeef>]  [<00000000deadbeef>] 0xdeadbeee
RSP: 0018:ffff88040d7d7bc0  EFLAGS: 00010286
RAX: ffff8803d97fc900 RBX: ffff8803d4d30d00 RCX: 0000000000000028
RDX: ffffffff81aafcb0 RSI: ffffffff81a2a500 RDI: ffff8803d4d30d00
RBP: ffff88040d7d7be8 R08: ffffffff814a8812 R09: ffff88040d7d7bb0
R10: ffff8803c9dfd8fc R11: ffff88040d7d7c48 R12: ffff8803c9dfd8fc
R13: ffff8803d4d30d00 R14: ffff88040d3f8000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000deadbeef CR3: 0000000001a0b000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ksoftirqd/8 (pid: 59, threadinfo ffff88040d7d6000, task ffff88040d7e1f50)
Stack:
  ffffffff814a8b02 ffff8803d4d30d00 ffffffff814a8812 ffff8803d4d30d00
  ffff88040d3f8000 ffff88040d7d7c18 ffffffff814a8eb5 0000000080000000
  ffffffff81472e61 ffff8803d4d30d00 ffff88040d3f8000 ffff88040d7d7c48
Call Trace:
  [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
  [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
  [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
  [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
  [<ffffffff814a9142>] ip_rcv+0x237/0x269
  [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
  [<ffffffff81473f91>] process_backlog+0xf9/0x1da
  [<ffffffff8147639a>] net_rx_action+0xad/0x218
  [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
  [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
  [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
  [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
  [<ffffffff810a0a6d>] kthread+0xc2/0xca
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
  [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
  [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
Code:  Bad RIP value.
RIP  [<00000000deadbeef>] 0xdeadbeee
  RSP <ffff88040d7d7bc0>
CR2: 00000000deadbeef
---[ end trace eed854e70ff0a575 ]---
Kernel panic - not syncing: Fatal excepti

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  2:32               ` Ben Greear
@ 2013-01-23  6:11                 ` Eric Dumazet
  2013-01-23  7:14                   ` Ben Greear
  2013-01-23 23:55                 ` Ben Greear
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-23  6:11 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote:

> diff --git a/net/core/dst.c b/net/core/dst.c
> index ee6153e..234b168 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -245,6 +245,7 @@ again:
>                  dst->ops->destroy(dst);
>          if (dst->dev)
>                  dev_put(dst->dev);
> +       dst->input = dst->output = 0xdeadbeef;
>          kmem_cache_free(dst->ops->kmem_cachep, dst);

Great !

You could comment the kmem_cache_free() as well to get better chances to
hit the bug, and maybe start a bisection to find the faulty commit ?




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  6:11                 ` Eric Dumazet
@ 2013-01-23  7:14                   ` Ben Greear
  2013-01-23 13:35                     ` Eric Dumazet
  2013-01-23 14:42                     ` Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Ben Greear @ 2013-01-23  7:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/22/2013 10:11 PM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 18:32 -0800, Ben Greear wrote:
>
>> diff --git a/net/core/dst.c b/net/core/dst.c
>> index ee6153e..234b168 100644
>> --- a/net/core/dst.c
>> +++ b/net/core/dst.c
>> @@ -245,6 +245,7 @@ again:
>>                   dst->ops->destroy(dst);
>>           if (dst->dev)
>>                   dev_put(dst->dev);
>> +       dst->input = dst->output = 0xdeadbeef;
>>           kmem_cache_free(dst->ops->kmem_cachep, dst);
>
> Great !
>
> You could comment the kmem_cache_free() as well to get better chances to
> hit the bug, and maybe start a bisection to find the faulty commit ?

I suspect the bug goes back at least as far as 3.3.  And since
I need the NFS patches for this test case, bisecting will be pure hell.

I'll work on some more code instrumentation tomorrow.

One thing that came to mind while I was looking at the code today:

How are the non-ref-counted dst objects used safely?  Any chance
that tearing down the IP protocol on a device (or deleting a device)
could delete a dst that is referenced by an skb (and thus crashes as
I see)?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  7:14                   ` Ben Greear
@ 2013-01-23 13:35                     ` Eric Dumazet
  2013-01-23 18:15                       ` Ben Greear
  2013-01-23 14:42                     ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-23 13:35 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> How are the non-ref-counted dst objects used safely?  Any chance
> that tearing down the IP protocol on a device (or deleting a device)
> could delete a dst that is referenced by an skb (and thus crashes as
> I see)?

There is probably a bug. Normally it should be RCU protected.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  7:14                   ` Ben Greear
  2013-01-23 13:35                     ` Eric Dumazet
@ 2013-01-23 14:42                     ` Eric Dumazet
  2013-01-23 21:53                       ` Ben Greear
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-23 14:42 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:

> I suspect the bug goes back at least as far as 3.3.  And since
> I need the NFS patches for this test case, bisecting will be pure hell.

Make sure the bug is still present in net-next.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 13:35                     ` Eric Dumazet
@ 2013-01-23 18:15                       ` Ben Greear
  2013-01-23 21:43                         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-23 18:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/23/2013 05:35 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> How are the non-ref-counted dst objects used safely?  Any chance
>> that tearing down the IP protocol on a device (or deleting a device)
>> could delete a dst that is referenced by an skb (and thus crashes as
>> I see)?
>
> There is probably a bug. Normally it should be RCU protected.

I'm building net-next, and will do some testing on it later today
if all goes well.

In the meantime, a few questions about RCU.

If the non-ref-counted dst is protected by RCU, does that mean
that from the time we acquire the pointer (ie, set it in the skb),
to the time we clear the pointer from the skb, we must be under
RCU read lock?

If so, that means that ip_rcv_finish must be called under RCU
read lock, etc?

Btw, this seems non related to NFS, so I'm dropping them
from CC after this email unless I hear a request otherwise.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 18:15                       ` Ben Greear
@ 2013-01-23 21:43                         ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-01-23 21:43 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Wed, 2013-01-23 at 10:15 -0800, Ben Greear wrote:
> On 01/23/2013 05:35 AM, Eric Dumazet wrote:
> > On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
> >
> >> How are the non-ref-counted dst objects used safely?  Any chance
> >> that tearing down the IP protocol on a device (or deleting a device)
> >> could delete a dst that is referenced by an skb (and thus crashes as
> >> I see)?
> >
> > There is probably a bug. Normally it should be RCU protected.
> 
> I'm building net-next, and will do some testing on it later today
> if all goes well.
> 
> In the meantime, a few questions about RCU.
> 
> If the non-ref-counted dst is protected by RCU, does that mean
> that from the time we acquire the pointer (ie, set it in the skb),
> to the time we clear the pointer from the skb, we must be under
> RCU read lock?
> 
> If so, that means that ip_rcv_finish must be called under RCU
> read lock, etc?

Obviously yes.  Check skb_dst_force() for the rare cases we want to
escape the RCU section.




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 14:42                     ` Eric Dumazet
@ 2013-01-23 21:53                       ` Ben Greear
  0 siblings, 0 replies; 21+ messages in thread
From: Ben Greear @ 2013-01-23 21:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/23/2013 06:42 AM, Eric Dumazet wrote:
> On Tue, 2013-01-22 at 23:14 -0800, Ben Greear wrote:
>
>> I suspect the bug goes back at least as far as 3.3.  And since
>> I need the NFS patches for this test case, bisecting will be pure hell.
>
> Make sure the bug is still present in net-next.
>

My test case isn't running well on net-next.  Seems most
of the file-io processes are hung trying to open a file:

[root@lf1011-12060006 lanforge]# cat /proc/10612/stack
[<ffffffff8115fadb>] do_last+0x1db/0xa4e
[<ffffffff8116090b>] path_openat+0xcb/0x363
[<ffffffff81160ca8>] do_filp_open+0x38/0x84
[<ffffffff81153bfc>] do_sys_open+0x6d/0xff
[<ffffffff81153cbb>] sys_open+0x1c/0x1e
[<ffffffff8154f629>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff


I'm going back to 3.7.3 and will try to learn some more
about what it's doing there.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23  2:32               ` Ben Greear
  2013-01-23  6:11                 ` Eric Dumazet
@ 2013-01-23 23:55                 ` Ben Greear
  2013-01-24  0:01                   ` Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-23 23:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/22/2013 06:32 PM, Ben Greear wrote:

So, I'm slowly making some progress.  I've verified that the skb
has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
method.  I'm trying to track it backwards and figure out which
device it belongs to, etc....takes a while to reproduce though.

One thing about this stack trace below...the dev_seq_stop() does
a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
can cause dev_seq_stop() to run, but if this stack is legit,
then maybe by the time we enter the ip_rcv_finish() code we are
running without rcu_readlock() held?

If so, that would probably explain the bug.

> Call Trace:
>   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
>   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
>   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
>   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
>   [<ffffffff814a9142>] ip_rcv+0x237/0x269
>   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
>   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
>   [<ffffffff8147639a>] net_rx_action+0xad/0x218
>   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
>   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
>   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
>   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
>   [<ffffffff810a0a6d>] kthread+0xc2/0xca
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
>   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56

## This is from a slightly different kernel image...but probably this part is legit.

0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
461		/* Our transport medium may have padded the buffer out. Now we know it
462		 * is IP we can trim to the true length of the frame.
463		 * Note this now means skb->len holds ntohs(iph->tot_len).
464		 */
465		if (pskb_trim_rcsum(skb, len)) {
466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
467			goto drop;
468		}
469	
470		/* Remove any debris in the socket control block */

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-23 23:55                 ` Ben Greear
@ 2013-01-24  0:01                   ` Eric Dumazet
  2013-01-24  0:13                     ` Ben Greear
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:01 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> On 01/22/2013 06:32 PM, Ben Greear wrote:
> 
> So, I'm slowly making some progress.  I've verified that the skb
> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> method.  I'm trying to track it backwards and figure out which
> device it belongs to, etc....takes a while to reproduce though.
> 
> One thing about this stack trace below...the dev_seq_stop() does
> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> can cause dev_seq_stop() to run, but if this stack is legit,
> then maybe by the time we enter the ip_rcv_finish() code we are
> running without rcu_readlock() held?
> 
> If so, that would probably explain the bug.
> 

The whole thing is run under rcu_read_lock() done in
__netif_receive_skb()

My suspicion was that we called netif_rx() from macvlan leaving a
not refcounted skb dst.

But the patch I sent to you didnt solve the bug, so its something else.

You could trace at which point the dst was released. (where you set
dst->input/output to deadbeef)

> > Call Trace:
> >   [<ffffffff814a8b02>] ? ip_rcv_finish+0x2f0/0x308
> >   [<ffffffff814a8812>] ? skb_dst+0x5a/0x5a
> >   [<ffffffff814a8eb5>] NF_HOOK.clone.1+0x4c/0x54
> >   [<ffffffff81472e61>] ? dev_seq_stop+0xb/0xb
> >   [<ffffffff814a9142>] ip_rcv+0x237/0x269
> >   [<ffffffff81473def>] __netif_receive_skb+0x487/0x530
> >   [<ffffffff81473f91>] process_backlog+0xf9/0x1da
> >   [<ffffffff8147639a>] net_rx_action+0xad/0x218
> >   [<ffffffff8108d50a>] __do_softirq+0x9c/0x161
> >   [<ffffffff8108d5f2>] run_ksoftirqd+0x23/0x42
> >   [<ffffffff810a7ebe>] smpboot_thread_fn+0x253/0x259
> >   [<ffffffff810a7c6b>] ? test_ti_thread_flag.clone.0+0x11/0x11
> >   [<ffffffff810a0a6d>] kthread+0xc2/0xca
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> >   [<ffffffff81537b7c>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff810a09ab>] ? __init_kthread_worker+0x56/0x56
> 
> 
> ## This is from a slightly different kernel image...but probably this part is legit.
> 
> 0xffffffff814a92b3 is in ip_rcv (/home/greearb/git/linux-3.7.dev.y/net/ipv4/ip_input.c:466).
> 461		/* Our transport medium may have padded the buffer out. Now we know it
> 462		 * is IP we can trim to the true length of the frame.
> 463		 * Note this now means skb->len holds ntohs(iph->tot_len).
> 464		 */
> 465		if (pskb_trim_rcsum(skb, len)) {
> 466			IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
> 467			goto drop;
> 468		}
> 469	
> 470		/* Remove any debris in the socket control block */
> 
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:01                   ` Eric Dumazet
@ 2013-01-24  0:13                     ` Ben Greear
  2013-01-24  0:23                       ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Ben Greear @ 2013-01-24  0:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, linux-nfs@vger.kernel.org

On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
>> On 01/22/2013 06:32 PM, Ben Greear wrote:
>>
>> So, I'm slowly making some progress.  I've verified that the skb
>> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
>> method.  I'm trying to track it backwards and figure out which
>> device it belongs to, etc....takes a while to reproduce though.
>>
>> One thing about this stack trace below...the dev_seq_stop() does
>> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
>> can cause dev_seq_stop() to run, but if this stack is legit,
>> then maybe by the time we enter the ip_rcv_finish() code we are
>> running without rcu_readlock() held?
>>
>> If so, that would probably explain the bug.
>>
>
> The whole thing is run under rcu_read_lock() done in
> __netif_receive_skb()

I was worried that the dev_seq_stop might be called
incorrectly causing an asymetric unlock.  I have no
idea how that might happened, but several crashes
have that dev_seq_stop method listed, so it got me suspicious.

>
> My suspicion was that we called netif_rx() from macvlan leaving a
> not refcounted skb dst.
>
> But the patch I sent to you didnt solve the bug, so its something else.
>
> You could trace at which point the dst was released. (where you set
> dst->input/output to deadbeef)

My current code is in some garbage collector timer code, but I can
work on saving the call-site that first pokes the dst into the
garbage collection list...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 3.7.3+:  Bad paging request in ip_rcv_finish while running NFS traffic.
  2013-01-24  0:13                     ` Ben Greear
@ 2013-01-24  0:23                       ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2013-01-24  0:23 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, linux-nfs@vger.kernel.org

On Wed, 2013-01-23 at 16:13 -0800, Ben Greear wrote:
> On 01/23/2013 04:01 PM, Eric Dumazet wrote:
> > On Wed, 2013-01-23 at 15:55 -0800, Ben Greear wrote:
> >> On 01/22/2013 06:32 PM, Ben Greear wrote:
> >>
> >> So, I'm slowly making some progress.  I've verified that the skb
> >> has bogus dst (0xdeadbeef) at the top of the ip_rcv_finish
> >> method.  I'm trying to track it backwards and figure out which
> >> device it belongs to, etc....takes a while to reproduce though.
> >>
> >> One thing about this stack trace below...the dev_seq_stop() does
> >> a rcu read-unlock.  Now, I can't figure out exactly how ip_rcv()
> >> can cause dev_seq_stop() to run, but if this stack is legit,
> >> then maybe by the time we enter the ip_rcv_finish() code we are
> >> running without rcu_readlock() held?
> >>
> >> If so, that would probably explain the bug.
> >>
> >
> > The whole thing is run under rcu_read_lock() done in
> > __netif_receive_skb()
> 
> I was worried that the dev_seq_stop might be called
> incorrectly causing an asymetric unlock.  I have no
> idea how that might happened, but several crashes
> have that dev_seq_stop method listed, so it got me suspicious.

dev_seq_stop() is some word in the kernel stack, result of a prior
system call. Stack is not cleanup.

Each function reserves an amount of stack but not always write on all
reserved space (some automatic variables might be not set)

Note the "? " before the name : linux printed the symbol but this was
not a call site for this particular call graph. Its only an extra
indication, that can be useful sometimes.




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-01-24  0:24 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-21 21:07 3.7.3+: Bad paging request in ip_rcv_finish while running NFS traffic Ben Greear
2013-01-22  0:32 ` Ben Greear
2013-01-22  4:40   ` Eric Dumazet
2013-01-22  5:57     ` Ben Greear
2013-01-22 17:08       ` Ben Greear
2013-01-22 17:17         ` Eric Dumazet
2013-01-22 17:26           ` Ben Greear
2013-01-22 17:26           ` Eric Dumazet
2013-01-22 22:18             ` Ben Greear
2013-01-23  2:32               ` Ben Greear
2013-01-23  6:11                 ` Eric Dumazet
2013-01-23  7:14                   ` Ben Greear
2013-01-23 13:35                     ` Eric Dumazet
2013-01-23 18:15                       ` Ben Greear
2013-01-23 21:43                         ` Eric Dumazet
2013-01-23 14:42                     ` Eric Dumazet
2013-01-23 21:53                       ` Ben Greear
2013-01-23 23:55                 ` Ben Greear
2013-01-24  0:01                   ` Eric Dumazet
2013-01-24  0:13                     ` Ben Greear
2013-01-24  0:23                       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox