* BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
@ 2010-09-03 0:20 James Harper
2010-09-03 0:26 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 4+ messages in thread
From: James Harper @ 2010-09-03 0:20 UTC (permalink / raw)
To: xen-devel
I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" happen
twice now and the server fairly quickly (over the course of 5-10
minutes) becomes unusable after that happens. The call trace appears to
be:
Call Trace:
[<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa
[<ffffffff8100ef72>] ? check_events+0x12/0x20
[<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7
[<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b
[<ffffffff812b39a5>] ? page_fault+0x25/0x30
[<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30
[<ffffffff81045265>] ? schedule_tail+0x92/0x96
[<ffffffff81011983>] ? ret_from_fork+0x13/0x80
That was under 2.6.32.17-g69a73fa. I've since upgraded to
2.6.32.18-ge6b9b2c but the changelogs don't show anything obviously
related to the crash (but as usual there are a lot of them so maybe I've
missed something?)
Is this a known problem under 2.6.32.17-g69a73fa that has since been
fixed?
Thanks
James
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
2010-09-03 0:20 BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] James Harper
@ 2010-09-03 0:26 ` Jeremy Fitzhardinge
2010-09-03 0:35 ` James Harper
0 siblings, 1 reply; 4+ messages in thread
From: Jeremy Fitzhardinge @ 2010-09-03 0:26 UTC (permalink / raw)
To: James Harper; +Cc: xen-devel
On 09/02/2010 05:20 PM, James Harper wrote:
> I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" happen
> twice now and the server fairly quickly (over the course of 5-10
> minutes) becomes unusable after that happens. The call trace appears to
> be:
>
> Call Trace:
> [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa
> [<ffffffff8100ef72>] ? check_events+0x12/0x20
> [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7
> [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b
> [<ffffffff812b39a5>] ? page_fault+0x25/0x30
> [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30
> [<ffffffff81045265>] ? schedule_tail+0x92/0x96
> [<ffffffff81011983>] ? ret_from_fork+0x13/0x80
>
> That was under 2.6.32.17-g69a73fa. I've since upgraded to
> 2.6.32.18-ge6b9b2c but the changelogs don't show anything obviously
> related to the crash (but as usual there are a lot of them so maybe I've
> missed something?)
>
> Is this a known problem under 2.6.32.17-g69a73fa that has since been
> fixed?
Hard to know; nothing springs to mind right now. What else is going on
at the time? What's the full softlockup message? Is there any more
context?
There have been some fixes relating to long-standing interrupt migration
bugs, but they're after 69a73fa.
J
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
2010-09-03 0:26 ` Jeremy Fitzhardinge
@ 2010-09-03 0:35 ` James Harper
2010-09-05 23:44 ` James Harper
0 siblings, 1 reply; 4+ messages in thread
From: James Harper @ 2010-09-03 0:35 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel
>
> On 09/02/2010 05:20 PM, James Harper wrote:
> > I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]"
happen
> > twice now and the server fairly quickly (over the course of 5-10
> > minutes) becomes unusable after that happens. The call trace appears
to
> > be:
> >
> > Call Trace:
> > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa
> > [<ffffffff8100ef72>] ? check_events+0x12/0x20
> > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7
> > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b
> > [<ffffffff812b39a5>] ? page_fault+0x25/0x30
> > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30
> > [<ffffffff81045265>] ? schedule_tail+0x92/0x96
> > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80
> >
> > That was under 2.6.32.17-g69a73fa. I've since upgraded to
> > 2.6.32.18-ge6b9b2c but the changelogs don't show anything obviously
> > related to the crash (but as usual there are a lot of them so maybe
I've
> > missed something?)
> >
> > Is this a known problem under 2.6.32.17-g69a73fa that has since been
> > fixed?
>
> Hard to know; nothing springs to mind right now. What else is going
on
> at the time? What's the full softlockup message? Is there any more
> context?
I just noticed that it coincided pretty much exactly when a backup runs.
I take a snapshot in Dom0 then block-attach it to the linux DomU so it
can be backed up from DomU which makes restores easier etc, rather than
doing the backup in Dom0. I hadn't noticed before because it's the
Windows DomU that people complain about first, not the linux DomU.
Block-attach/detach would explain why udevd is involved too which makes
more sense. The block-attach happens every hour on the hour. The full
cut&paste of the first hang is:
Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
1442, event-channel 13, protocol 1 (x86_64-abi)
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
CPU#1 stuck for 61s! [udevd:4865]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO
G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
xen_blkback
blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
e_core uhci_hc
Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
scsi_wait_scan]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO
G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
xen_blkback
blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
e_core uhci_hc
Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
scsi_wait_scan]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd
Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
e030:[<ffffffff8100922a>] [<ffffffff8100922a>]
hypercall_page+0x22a/0x1001
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
e02b:ffff8800114d5dd0 EFLAGS: 00000246
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000
RBX: 0000000000000001 RCX: ffffffff8100922a
keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log
Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref 311,
event-channel 13, protocol 1 (x86_64-abi)
Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref
1334, event-channel 13, protocol 1 (x86_64-abi)
Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
1442, event-channel 13, protocol 1 (x86_64-abi)
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
CPU#1 stuck for 61s! [udevd:4865]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG
xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic
bnx2 ehci_hcd ide_core uhci_hc
Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
scsi_wait_scan]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG
xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic
bnx2 ehci_hcd ide_core uhci_hc
Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
scsi_wait_scan]
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd
Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
e030:[<ffffffff8100922a>] [<ffffffff8100922a>]
hypercall_page+0x22a/0x1001
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
e02b:ffff8800114d5dd0 EFLAGS: 00000246
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000
RBX: 0000000000000001 RCX: ffffffff8100922a
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX: ffffffff814293d0
RSI: 0000000000000000 RDI: 0000000000000000
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP: 00007fbc628a9800
R08: 0000000000000000 R09: 0000000000000000
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10: 0000000000000002
R11: 0000000000000246 R12: 0000000000000002
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13: ffff8800114d5e88
R14: ffff880017f9cfc0 R15: ffff880017eed500
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS:
00007fbc628a9770(0000) GS:ffff880002de8000(0000) knlGS:0000000000000000
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000 ES:
0000 CR0: 000000008005003b
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2: 00007fbc628a9800
CR3: 000000001e2d7000 CR4: 0000000000002660
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace:
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100e8e9>]
? xen_force_evtchn_callback+0x9/0xa
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef72>]
? check_events+0x12/0x20
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef19>]
? xen_irq_enable_direct_end+0x0/0x7
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8102fcc1>]
? do_page_fault+0x9e/0x27b
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff812b39a5>]
? page_fault+0x25/0x30
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8116f28d>]
? __put_user_4+0x1d/0x30
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81045265>]
? schedule_tail+0x92/0x96
Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81011983>]
? ret_from_fork+0x13/0x80
There's another hang at 09:03:09 which I can post if you'd find it
useful.
If the block-attach happened at 09:00:42 and the crash happened at
09:02:13 then I assume that whatever udevd started doing started at
09:01:12 (eg 61 seconds earlier) which means that it's more likely to be
the block-detach at the end of the backup causing the hang not the
block-attach. The backup runs pretty quick.
I'm inclined to turn off the snapshot backup if this isn't a known and
resolved problem...
Thanks
James
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
2010-09-03 0:35 ` James Harper
@ 2010-09-05 23:44 ` James Harper
0 siblings, 0 replies; 4+ messages in thread
From: James Harper @ 2010-09-05 23:44 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: xen-devel
This is still happening with 2.6.32.18-ge6b9b2c but I'm now not so sure
if it's a problem with blkback after all. My script does this:
echo source=$source >>/tmp/attach-snapshot.log
echo dest=$dest >>/tmp/attach-snapshot.log
lvcreate --snapshot --name=$dest --size=1G $source
lvscan >>/tmp/attach-snapshot.log
xm block-attach $domname phy:$dest $destdev w
/tmp/attach-snapshot.log shows "source=" and "dest=" but not a result of
"lvscan", which would indicate that it's the "lvcreate" that's hung, not
the "xm block-attach". Disk writes are still working so I know that it's
not just a missed buffer or anything. That would explain why udevd is
hanging too - lvcreate would involve creating a new device while "xm
block-attach" wouldn't. I think I'm barking up the wrong tree on
xen-devel unless anyone else has any suggestions?
Thanks
James
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-
> bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Friday, 3 September 2010 10:36
> To: Jeremy Fitzhardinge
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s!
[udevd:4865]
>
> >
> > On 09/02/2010 05:20 PM, James Harper wrote:
> > > I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]"
> happen
> > > twice now and the server fairly quickly (over the course of 5-10
> > > minutes) becomes unusable after that happens. The call trace
appears
> to
> > > be:
> > >
> > > Call Trace:
> > > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa
> > > [<ffffffff8100ef72>] ? check_events+0x12/0x20
> > > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7
> > > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b
> > > [<ffffffff812b39a5>] ? page_fault+0x25/0x30
> > > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30
> > > [<ffffffff81045265>] ? schedule_tail+0x92/0x96
> > > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80
> > >
> > > That was under 2.6.32.17-g69a73fa. I've since upgraded to
> > > 2.6.32.18-ge6b9b2c but the changelogs don't show anything
obviously
> > > related to the crash (but as usual there are a lot of them so
maybe
> I've
> > > missed something?)
> > >
> > > Is this a known problem under 2.6.32.17-g69a73fa that has since
been
> > > fixed?
> >
> > Hard to know; nothing springs to mind right now. What else is going
> on
> > at the time? What's the full softlockup message? Is there any more
> > context?
>
> I just noticed that it coincided pretty much exactly when a backup
runs.
> I take a snapshot in Dom0 then block-attach it to the linux DomU so it
> can be backed up from DomU which makes restores easier etc, rather
than
> doing the backup in Dom0. I hadn't noticed before because it's the
> Windows DomU that people complain about first, not the linux DomU.
> Block-attach/detach would explain why udevd is involved too which
makes
> more sense. The block-attach happens every hour on the hour. The full
> cut&paste of the first hang is:
>
> Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
> 1442, event-channel 13, protocol 1 (x86_64-abi)
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
> CPU#1 stuck for 61s! [udevd:4865]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULO
> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
> f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback
> blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
> tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
> ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
> ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
> dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
> usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
> e_core uhci_hc
> Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULO
> G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n
> f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback
> blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
> tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc
> ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c
> ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log
> dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata
> usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id
> e_core uhci_hc
> Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:
udevd
> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
> e030:[<ffffffff8100922a>] [<ffffffff8100922a>]
> hypercall_page+0x22a/0x1001
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
> e02b:ffff8800114d5dd0 EFLAGS: 00000246
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:
0000000000040000
> RBX: 0000000000000001 RCX: ffffffff8100922a
> keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log
> Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref
311,
> event-channel 13, protocol 1 (x86_64-abi)
> Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref
> 1334, event-channel 13, protocol 1 (x86_64-abi)
> Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref
> 1442, event-channel 13, protocol 1 (x86_64-abi)
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup -
> CPU#1 stuck for 61s! [udevd:4865]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULOG
> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
> nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
> snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
> i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
> ata_generic libata usb_storage scsi_mod usbhid hid piix
ide_pci_generic
> bnx2 ehci_hcd ide_core uhci_hc
> Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1:
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in:
> ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun
> act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc
ipt_ULOG
> xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT
> xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp
> iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc
> nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp
> nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4
> nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback
> xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo
> snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core
> i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror
> dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix
> ata_generic libata usb_storage scsi_mod usbhid hid piix
ide_pci_generic
> bnx2 ehci_hcd ide_core uhci_hc
> Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded:
> scsi_wait_scan]
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm:
udevd
> Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP:
> e030:[<ffffffff8100922a>] [<ffffffff8100922a>]
> hypercall_page+0x22a/0x1001
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP:
> e02b:ffff8800114d5dd0 EFLAGS: 00000246
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX:
0000000000040000
> RBX: 0000000000000001 RCX: ffffffff8100922a
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX:
ffffffff814293d0
> RSI: 0000000000000000 RDI: 0000000000000000
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP:
00007fbc628a9800
> R08: 0000000000000000 R09: 0000000000000000
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10:
0000000000000002
> R11: 0000000000000246 R12: 0000000000000002
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13:
ffff8800114d5e88
> R14: ffff880017f9cfc0 R15: ffff880017eed500
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS:
> 00007fbc628a9770(0000) GS:ffff880002de8000(0000)
knlGS:0000000000000000
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000
ES:
> 0000 CR0: 000000008005003b
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2:
00007fbc628a9800
> CR3: 000000001e2d7000 CR4: 0000000000002660
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0:
0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3:
0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace:
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100e8e9>]
> ? xen_force_evtchn_callback+0x9/0xa
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100ef72>]
> ? check_events+0x12/0x20
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8100ef19>]
> ? xen_irq_enable_direct_end+0x0/0x7
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8102fcc1>]
> ? do_page_fault+0x9e/0x27b
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff812b39a5>]
> ? page_fault+0x25/0x30
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff8116f28d>]
> ? __put_user_4+0x1d/0x30
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff81045265>]
> ? schedule_tail+0x92/0x96
> Sep 3 09:02:13 keechsvr2 kernel: [235983.760505]
[<ffffffff81011983>]
> ? ret_from_fork+0x13/0x80
>
> There's another hang at 09:03:09 which I can post if you'd find it
> useful.
>
> If the block-attach happened at 09:00:42 and the crash happened at
> 09:02:13 then I assume that whatever udevd started doing started at
> 09:01:12 (eg 61 seconds earlier) which means that it's more likely to
be
> the block-detach at the end of the backup causing the hang not the
> block-attach. The backup runs pretty quick.
>
> I'm inclined to turn off the snapshot backup if this isn't a known and
> resolved problem...
>
> Thanks
>
> James
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-09-05 23:44 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-03 0:20 BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] James Harper
2010-09-03 0:26 ` Jeremy Fitzhardinge
2010-09-03 0:35 ` James Harper
2010-09-05 23:44 ` James Harper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.