From: Joe Jin <joe.jin@oracle.com>
To: Frank Blaschka <frank.blaschka@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"zheng.x.li@oracle.com" <zheng.x.li@oracle.com>
Subject: kernel panic in skb_copy_bits
Date: Thu, 27 Jun 2013 10:58:16 +0800
Message-ID: <51CBAA48.3080802@oracle.com>
Hi,
When we run a failover test with iSCSI + multipath by resetting the switches
on OVM (2.6.39), we hit this panic:
BUG: unable to handle kernel paging request at ffff88006d9e8d48
IP: [<ffffffff812605bb>] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops: 0000 [#1] SMP
CPU 7
Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed
dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
Stack:
ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
Call Trace:
<IRQ>
[<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
[<ffffffff8142f173>] skb_copy+0xf3/0x120
[<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
[<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
[<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
[<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
[<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
[<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
[<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
[<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
[<ffffffff81511d3c>] call_softirq+0x1c/0x30
[<ffffffff810172e5>] do_softirq+0x65/0xa0
[<ffffffff8107656b>] irq_exit+0xab/0xc0
[<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
[<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
<EOI>
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
[<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
[<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
[<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [<ffffffff812605bb>] memcpy+0xb/0x120
RSP <ffff8801003c3d58>
CR2: ffff88006d9e8d48
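For anyone reading the dump, here is a rough sketch of the arithmetic behind the register values above. The Code: bytes around the fault marker decode to mov %rdi,%rax; mov %edx,%ecx; shr $3,%ecx; and $7,%edx; rep movsq, so RCX counts the 8-byte words still to copy and RDX holds the trailing byte count. The helper names below are made up for illustration, and reading R14 (0x2b8) as the copy length is only an assumption based on the numbers matching.

```c
#include <stdint.h>

/* Register values copied from the oops above. */
static const uint64_t OOPS_RAX = 0xffff880076b9e280ULL; /* memcpy return value = start of dest */
static const uint64_t OOPS_RDI = 0xffff880076b9e280ULL; /* current destination pointer */
static const uint64_t OOPS_RSI = 0xffff88006d9e8d48ULL; /* current source pointer */
static const uint64_t OOPS_CR2 = 0xffff88006d9e8d48ULL; /* faulting address */
static const uint64_t OOPS_RCX = 0x57ULL;               /* quadwords left for rep movsq */
static const uint64_t OOPS_RDX = 0x0ULL;                /* trailing bytes (len & 7) */

/* Bytes still to be copied when the fault hit (hypothetical helper). */
static uint64_t bytes_remaining(uint64_t rcx, uint64_t rdx)
{
    return rcx * 8 + rdx;
}

/* Bytes already copied: how far the destination advanced from its start. */
static uint64_t bytes_copied(uint64_t rax, uint64_t rdi)
{
    return rdi - rax;
}

/* Non-zero when the faulting address is the source pointer, i.e. a read fault. */
static int fault_on_source(uint64_t cr2, uint64_t rsi)
{
    return cr2 == rsi;
}
```

With the values above, bytes_remaining(OOPS_RCX, OOPS_RDX) is 696 (0x2b8), bytes_copied(OOPS_RAX, OOPS_RDI) is 0, and fault_on_source(OOPS_CR2, OOPS_RSI) is true. In other words, memcpy faulted reading the very first word of the source, which fits "PTE 0" (page not present) and the read error code 0000.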
Reviewing the vmcore, I found that skb->users was 1 at the time of the panic. Checking the network neighbour
history, I found that skb_get() was replaced by skb_copy() in commit 7e36763b2c:
commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
Author: Frank Blaschka <frank.blaschka@de.ibm.com>
Date: Mon Mar 3 12:16:04 2008 -0800
[NET]: Fix race in generic address resolution.
neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
has increased skbs refcount and calls solicit with the
skb. neigh_timer_handler should not increase skbs refcount but make a
copy of the skb and do solicit with the copy.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
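To make the difference concrete, here is a minimal userspace sketch of the two approaches the commit message contrasts. It is a toy model, not kernel code: struct toy_skb and the toy_* functions are hypothetical stand-ins for struct sk_buff, skb_get(), skb_copy() and kfree_skb(), and it ignores locking, fragments and everything else the real structures carry.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: a refcount plus an owned data buffer. */
struct toy_skb {
    int users;            /* models skb->users */
    size_t len;
    unsigned char *data;
};

/* Allocate a buffer holding a private copy of the payload, with one reference. */
static struct toy_skb *toy_alloc(const void *payload, size_t len)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->data = malloc(len);
    if (!skb->data) {
        free(skb);
        return NULL;
    }
    memcpy(skb->data, payload, len);
    skb->len = len;
    skb->users = 1;
    return skb;
}

/* Models skb_get(): same object, one more reference, data is never read. */
static struct toy_skb *toy_get(struct toy_skb *skb)
{
    skb->users++;
    return skb;
}

/* Models skb_copy(): a new private object; every byte of the source is read. */
static struct toy_skb *toy_copy(const struct toy_skb *skb)
{
    return toy_alloc(skb->data, skb->len);
}

/* Models kfree_skb(): drop one reference, free on the last. Returns 1 if freed. */
static int toy_put(struct toy_skb *skb)
{
    if (--skb->users > 0)
        return 0;
    free(skb->data);
    free(skb);
    return 1;
}
```

In this model toy_get() only bumps the counter, so two paths end up sharing one object, while toy_copy() gives the caller its own object but has to read the source data to build it. That read is the step that corresponds to the memcpy in the trace above: if the data behind the queued skb is no longer valid, the copy is where it shows up.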
So can you please give some details of the race? From the vmcore it looks like the skb data
has been freed; I suspect an skb_get() was lost somewhere?
After I reverted the above commit, the panic did not occur during our testing.
Any input would be appreciated!
Best Regards,
Joe