BUG: soft lockup detected on CPU#0!

xen-devel.lists.xenproject.org archive mirror

* BUG: soft lockup detected on CPU#0!
@ 2006-04-05  2:39 Christopher S. Aker
  2006-04-05  2:45 ` Itamar Reis Peixoto
  2006-04-05 10:16 ` Keir Fraser
  0 siblings, 2 replies; 8+ messages in thread
From: Christopher S. Aker @ 2006-04-05  2:39 UTC
  To: xen-devel

BUG: soft lockup detected on CPU#0!
Pid: 17178, comm:        xvd 275 fd:86
EIP: 0061:[<c0133b65>] CPU: 1
EIP is at kthread_should_stop+0x15/0x20
  EFLAGS: 00000246    Not tainted  (2.6.16-xen0 #1)
EAX: 00000001 EBX: 00000000 ECX: c05431b4 EDX: 00000000
ESI: 00000000 EDI: cc7b143c EBP: c5f11f98 DS: 007b ES: 007b
CR0: 8005003b CR2: 090a8928 CR3: 0635f000 CR4: 00000660
  [<c0387bd5>] blkif_schedule+0x25/0x4a0
  [<c0133e90>] autoremove_wake_function+0x0/0x60
  [<c0133c6f>] kthread+0xff/0x110
  [<c0387bb0>] blkif_schedule+0x0/0x4a0
  [<c0133b70>] kthread+0x0/0x110
  [<c0102ca5>] kernel_thread_helper+0x5/0x10

These are happening every few minutes, for four domains, and rising. 
Machine is under somewhat high disk load (lots of scp processes copying 
large file systems, cfq disk sched).

xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0

Those domains have now zombified and are unkillable.  I'll grab the
latest updates and reboot the box.

-Chris

* Re: BUG: soft lockup detected on CPU#0!
  2006-04-05  2:39 Christopher S. Aker
@ 2006-04-05  2:45 ` Itamar Reis Peixoto
  2006-04-05 10:16 ` Keir Fraser
  1 sibling, 0 replies; 8+ messages in thread
From: Itamar Reis Peixoto @ 2006-04-05  2:45 UTC
  To: Christopher S. Aker, xen-devel

I have the same problem.

http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=543

xen_changeset          : Sat Apr  1 14:59:12 2006 +0100 9511:60071beccf18

If you find a solution first, please send it to me.



> BUG: soft lockup detected on CPU#0!
> Pid: 17178, comm:        xvd 275 fd:86
> EIP: 0061:[<c0133b65>] CPU: 1
> EIP is at kthread_should_stop+0x15/0x20
>  EFLAGS: 00000246    Not tainted  (2.6.16-xen0 #1)
> EAX: 00000001 EBX: 00000000 ECX: c05431b4 EDX: 00000000
> ESI: 00000000 EDI: cc7b143c EBP: c5f11f98 DS: 007b ES: 007b
> CR0: 8005003b CR2: 090a8928 CR3: 0635f000 CR4: 00000660
>  [<c0387bd5>] blkif_schedule+0x25/0x4a0
>  [<c0133e90>] autoremove_wake_function+0x0/0x60
>  [<c0133c6f>] kthread+0xff/0x110
>  [<c0387bb0>] blkif_schedule+0x0/0x4a0
>  [<c0133b70>] kthread+0x0/0x110
>  [<c0102ca5>] kernel_thread_helper+0x5/0x10
>
> These are happening every few minutes, for four domains, and rising. 
> Machine is under somewhat high disk load (lots of scp processes copying 
> large file systems, cfq disk sched).
>
> xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0
>
> Those domains have now zombified and are unkillable.  I'll grab the latest
> updates and reboot the box.
>
> -Chris

* Re: BUG: soft lockup detected on CPU#0!
  2006-04-05  2:39 Christopher S. Aker
  2006-04-05  2:45 ` Itamar Reis Peixoto
@ 2006-04-05 10:16 ` Keir Fraser
  2006-04-05 18:32   ` Christopher S. Aker
  1 sibling, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2006-04-05 10:16 UTC
  To: Christopher S. Aker; +Cc: xen-devel


On 5 Apr 2006, at 03:39, Christopher S. Aker wrote:

> These are happening every few minutes, for four domains, and rising. 
> Machine is under somewhat high disk load (lots of scp processes 
> copying large file systems, cfq disk sched).
>
> xen_changeset: Sun Mar 26 11:50:39 2006 +0100 9441:30ae67d6e5f0
>
> Those domains have now zombified and are unkillable.  I'll grab the
> latest updates and reboot the box.

Since it looks like a problem with the blkback kernel thread, it's 
worth doing:
  echo 1 >/sys/module/blkback/parameters/debug_lvl

That may get some kernel tracing (at level KERN_DEBUG) from that thread 
and we can see if it's got into a bad looping state.
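
A minimal sketch of that step, assuming a root shell in dom0 and that blkback exposes the parameter at the path given above. The function name set_blkback_debug and its optional path argument are illustrative only, not part of Xen:

```shell
# Turn on blkback debug tracing and confirm the setting stuck.
# The default path is the one quoted in this message; the second
# argument exists only so the function can be tried on another file.
set_blkback_debug() {
    level="$1"
    param="${2:-/sys/module/blkback/parameters/debug_lvl}"

    if [ ! -w "$param" ]; then
        echo "cannot write $param (is blkback loaded? are you root?)" >&2
        return 1
    fi

    echo "$level" > "$param" || return 1
    # Read the value back so a silent failure is noticed.
    [ "$(cat "$param")" = "$level" ]
}
```

After running `set_blkback_debug 1`, any new output from the thread should be visible in the kernel ring buffer, for example with `dmesg | tail -n 50`.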

  -- Keir

* Re: BUG: soft lockup detected on CPU#0!
  2006-04-05 10:16 ` Keir Fraser
@ 2006-04-05 18:32   ` Christopher S. Aker
  2006-04-06  6:56     ` Keir Fraser
  2006-04-06  9:23     ` Keir Fraser
  0 siblings, 2 replies; 8+ messages in thread
From: Christopher S. Aker @ 2006-04-05 18:32 UTC
  To: Keir Fraser; +Cc: xen-devel

Keir Fraser wrote:
> Since it looks like a problem with the blkback kernel thread, it's worth 
> doing:
>  echo 1 >/sys/module/blkback/parameters/debug_lvl
> 
> That may get some kernel tracing (at level KERN_DEBUG) from that thread 
> and we can see if it's got into a bad looping state.

After an update and a reboot, and turning off soft lockup detection, I'm 
still getting zombie domains.  It also appears that after this happens, 
no new block devices can be attached.

Here's a summary of the different debug outputs:

(after restarting Xend)
==> /var/log/xend.log <==
[2006-04-05 14:29:09 xend] DEBUG (XendDomain:197) Cannot recreate 
information for dying domain 54.  Xend will ignore this domain from now on.
[2006-04-05 14:29:09 xend] DEBUG (XendDomain:197) Cannot recreate 
information for dying domain 73.  Xend will ignore this domain from now on.

Apr  5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit
Apr  5 14:28:40 host56 kernel: xvd 73 fd:85: not connected (13 pending)
Apr  5 14:28:40 host56 kernel: xvd 73 fd:85: I/O pending, delaying exit
Apr  5 14:28:40 host56 kernel: xvd 73 fd:85: not connected (13 pending)
^-- these flood syslog
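
To gauge how badly each backend thread is flooding, the messages can be tallied per device. A rough sketch, assuming syslog writes kernel messages to /var/log/messages; that path and the function name summarize_blkback_flood are assumptions, not taken from the report:

```shell
# Count "I/O pending" / "not connected" messages per blkback thread.
# Each thread is identified by the "xvd <id> <maj:min>" tag in the log.
summarize_blkback_flood() {
    logfile="${1:-/var/log/messages}"   # assumed syslog location
    grep -E 'kernel: xvd [0-9]+ [0-9a-f]+:[0-9a-f]+: (I/O pending|not connected)' "$logfile" |
        sed -E 's/^.*kernel: (xvd [0-9]+ [0-9a-f]+:[0-9a-f]+): .*$/\1/' |
        sort | uniq -c | sort -rn
}
```

The output is one line per thread, busiest first, which makes it easy to compare against the dying domain IDs that xend reports.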

Apr  5 14:28:40 host56 kernel: ined (13 pe, delayed (13 pe, delayined 
(13 , delayed (13 , delayied (13 , delayined (13 , delayed (13 pend, 
delayed (13 , delayined (13 pe, delayined (13 pe, delayined (13 , 
delayed (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, 
delayined (13 p, delayined (13 pen, delayed (13 pe, delayined (13 , 
delayied (13 pe, delayed (13 , delayined (13 , delayed (13 pendin, 
delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , 
delayed (13 pe, delayed (13 , delayined (13 pe, delayined (13 pendin, 
delayined (13 pe, delaying ed (13 pe, delayined (13 pe, delayined (13 
pe, delayed (13 pe, delayed (13 , delayin, delayined (13 pending, 
delayined (13 , delaying ed (13 pe, delayed (13 pe, delayined (13 , 
delayed (13 pe, delayed (13 , delayined (13 pe, delayed (13 pendin, 
delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , 
delayed (13 pe, delayed (13 , delayined (13 , delayied (13 pendin, 
delayined (13 , delayined (13 pe, delayined (13 pe, delayed (13 pe, 
delayed (13
Apr  5 14:28:40 host56 kernel: elayined (13 , delayed (13 pendin, 
delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 pe, 
delayed (13 pe, delayed (13 p, delayined (13 , delayed (13 pendin, 
delayined (13 , delayined (13 pe, delayed (13 pe, delayined (13 , 
delayed (13 p, delayed (13 pe, delayined (13 pe, delayined (13 pend, 
delayined (13 , delaying ed (13 peed (13 , delayined (13 , delayined (13 
pe, delayed (13 pe, delayined (13 p, delayined (13 pend, delayined (13 , 
delayined (13 pe, delayined (13 pe, de, delayined (13 pe, delayed (13 , 
delayined (13 , delayed (13 pendin, delayined (13 , delayined (13 pen, 
delayed (13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined 
(13 , delayed (13 pendin, delayined (13 , delayined (13 pe, delayined 
(13 pe, delayined (13 , delayed (13 pe, delayed (13 , delayined (13 , 
delayed (13 pendin, delayined (13 , delayined (13 pe, delayed (13 pe, 
delayined (13 , delayined (13 pe, delayed (13 , delayined (13 p, delayed 
(13 pend, delayed (13 , delayined (13 pe, dela
^-- these are flooding, but not quite as often.

This leaves Xen/Xend in an unstable condition, I'm thinking the only way 
out is a reboot...

-Chris

* Re: BUG: soft lockup detected on CPU#0!
  2006-04-05 18:32   ` Christopher S. Aker
@ 2006-04-06  6:56     ` Keir Fraser
  2006-04-06  9:23     ` Keir Fraser
  1 sibling, 0 replies; 8+ messages in thread
From: Keir Fraser @ 2006-04-06  6:56 UTC
  To: Christopher S. Aker; +Cc: xen-devel


On 5 Apr 2006, at 19:32, Christopher S. Aker wrote:

> pe, delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , 
> delayined (13 pe, delayined (13 pe, delayined (13 , delayed (13 pe, 
> delayed (13 , delayined (13 , delayed (13 pendin, delayined (13 , 
> delayined (13 pe, delayed (13 pe, delayined (13 , delayined (13 pe, 
> delayed (13 , delayined (13 p, delayed (13 pend, delayed (13 , 
> delayined (13 pe, dela
> ^-- these are flooding, but not quite as often.
>
> This leaves Xen/Xend in an unstable condition, I'm thinking the only 
> way out is a reboot...

Thanks, I'll create a fix for xen-unstable which can later be 
backported to 3.0.2.

  -- Keir

* Re: BUG: soft lockup detected on CPU#0!
  2006-04-05 18:32   ` Christopher S. Aker
  2006-04-06  6:56     ` Keir Fraser
@ 2006-04-06  9:23     ` Keir Fraser
  1 sibling, 0 replies; 8+ messages in thread
From: Keir Fraser @ 2006-04-06  9:23 UTC
  To: Christopher S. Aker; +Cc: xen-devel

On 5 Apr 2006, at 19:32, Christopher S. Aker wrote:

> After an update and a reboot, and turning off soft lockup detection, 
> I'm still getting zombie domains.  It also appears that after this 
> happens, no new block devices can be attached.

Okay, this issue should be fixed in both the -unstable and -3.0-testing 
trees. I pushed directly into the 3.0.2 release tree as the original 
blkback kernel-thread loop was very broken indeed. I'm sure the fix is 
a strict improvement. :-)

Look for changeset comment "Fix the blkif_schedule() kthread loop." 
when you pull -- that's the changeset that contains the fix.

  -- Keir

* BUG: soft lockup detected on CPU#0!
@ 2007-09-10  2:14 Abhinav Srivastava
  0 siblings, 0 replies; 8+ messages in thread
From: Abhinav Srivastava @ 2007-09-10  2:14 UTC
  To: xen-devel

Hi there,

I am using Xen 3.0.3-testing and get the following
error in domU while doing a large copy:

#time cp -R linux-2.6.18 linux-cp
BUG: soft lockup detected on CPU#0!

Pid: 998, comm:                   cp
EIP: 0061:[<c0101227>] CPU: 0
EIP is at 0xc0101227
 EFLAGS: 00000286    Not tainted  (2.6.16.29-xen #52)
EAX: 00000000 EBX: deadbeef ECX: deadbeef EDX: c056dc40
ESI: 0000055e EDI: bf874000 EBP: c7b6a000 DS: 007b ES: 007b
CR0: 8005003b CR2: 08061494 CR3: 07204000 CR4: 00000640
 [<c010693b>] sysxenentrynotifier+0x3b/0x50
 [<c0104f8c>] syscall_call+0xa/0x16

sysxenentrynotifier is my function, which I call from
the system call handler to intercept system calls. I am
also logging events inside Xen and passing them to dom0
using a shared-memory setup between dom0 and Xen. When I
copy events to dom0, I pause domU for some time using
set_bit(VCPUF_blocked) and then un-pause it using
clear_bit(VCPUF_blocked).

The copy runs fine for some time, but then domU hangs,
and when I press any key in domU it prints the error
shown above.

If I perform the same operation with my logging code
disabled, it works fine. Is this a problem in my code,
or a Xen problem that is triggered when I pause or
un-pause domU?

Any help or pointers in this regard would be very much
appreciated.

Thanks,
Abhinav


* BUG: soft lockup detected on CPU#0!
@ 2010-08-02 22:54 Luke S Crawford
  0 siblings, 0 replies; 8+ messages in thread
From: Luke S Crawford @ 2010-08-02 22:54 UTC
  To: xen-devel


so yeah, i've been getting these on both my amd G34 systems as well 
as my socket F (mcp55) systems.  three times in as many months, all on 
different servers.  

relevant bits of xm info:
xen_major              : 3
xen_minor              : 4
xen_extra              : .3

this is with the xen.org 2.6.18 kernel that comes with 3.4.3
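
For anyone comparing hosts, the three fields above can be collapsed into a single version string. A small sketch that reads `xm info` style output on stdin; the function name xen_version_from_info is made up for illustration:

```shell
# Build "major.minor<extra>" from the xen_major / xen_minor / xen_extra
# lines of `xm info` output supplied on standard input.
xen_version_from_info() {
    awk -F' *: *' '
        { sub(/[ \t]+$/, "") }              # drop trailing whitespace
        $1 == "xen_major" { major = $2 }
        $1 == "xen_minor" { minor = $2 }
        $1 == "xen_extra" { extra = $2 }    # already carries its leading dot
        END { if (major != "") print major "." minor extra }
    '
}
```

Usage would be `xm info | xen_version_from_info`, which for the fields quoted above gives 3.4.3.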


Call Trace:
 <IRQ> [<ffffffff8025894a>] softlockup_tick+0xce/0xe0
 [<ffffffff8020df6c>] timer_interrupt+0x3a8/0x402
 [<ffffffff80258c34>] handle_IRQ_event+0x4e/0x96
 [<ffffffff80258d20>] __do_IRQ+0xa4/0x105
 [<ffffffff8020bd6c>] do_IRQ+0x44/0x4d
 [<ffffffff80351f4c>] evtchn_do_upcall+0x19e/0x256
 [<ffffffff80209d8e>] do_hypervisor_callback+0x1e/0x2c
 <EOI> [<ffffffff8035d93e>] show_rd_sect+0x0/0x68
 [<ffffffff802ee0bc>] __read_lock_failed+0x8/0x14
 [<ffffffff803494de>] get_device+0x17/0x20
 [<ffffffff8040415d>] .text.lock.spinlock+0x53/0x8a
 [<ffffffff8035d965>] show_rd_sect+0x27/0x68
 [<ffffffff802be588>] sysfs_read_file+0xa5/0x12c
 [<ffffffff8028031c>] vfs_read+0xcb/0x171
 [<ffffffff802806fb>] sys_read+0x45/0x6e
 [<ffffffff802097b2>] tracesys+0xab/0xb5



-- 
Luke S. Crawford
http://prgmr.com/xen/         -   Hosting for the technically adept
http://nostarch.com/xen.htm   -   We don't assume you are stupid.  
