linux-ide.vger.kernel.org archive mirror
From: Richard Kennedy <richard@rsk.demon.co.uk>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: Tejun Heo <tj@kernel.org>, Rob Landley <rlandley@parallels.com>,
	Pete Clements <clem@clem.clem-digital.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: Commit 7eaceaccab5f40 causing boot hang.
Date: Thu, 31 Mar 2011 14:23:51 +0100
Message-ID: <1301577831.1984.2.camel@castor.rsk>
In-Reply-To: <4D947F29.5050203@fusionio.com>

On Thu, 2011-03-31 at 15:18 +0200, Jens Axboe wrote:
> On 2011-03-31 15:09, Richard Kennedy wrote:
> > On 31/03/11 13:33, Jens Axboe wrote:
> >> On 2011-03-31 14:11, Tejun Heo wrote:
> >>> Hello,
> >>>
> >>> On Thu, Mar 31, 2011 at 12:37:42PM +0200, Jens Axboe wrote:
> >>>> It seems to hard hang, looks very odd:
> >>>>
> >>>> [   84.056007] BUG: soft lockup - CPU#0 stuck for 67s! [kworker/0:2:743]
> >>>> [   84.056008] Modules linked in:
> >>>> [   84.056008] irq event stamp: 334859658
> >>>> [   84.056008] hardirqs last  enabled at (334859657): [<ffffffff815c40c7>] _raw_spin_unlock_irq+0x2b/0x30
> >>>> [   84.056008] hardirqs last disabled at (334859658): [<ffffffff815c42e7>] save_args+0x67/0x70
> >>>> [   84.056008] softirqs last  enabled at (334855538): [<ffffffff81044819>] __do_softirq+0x1a3/0x1c2
> >>>> [   84.056008] softirqs last disabled at (334855525): [<ffffffff815cb9cc>] call_softirq+0x1c/0x30
> >>>> [   84.056008] CPU 0 
> >>>> [   84.056008] Modules linked in:
> >>>> [   84.056008] 
> >>>> [   84.056008] Pid: 743, comm: kworker/0:2 Not tainted 2.6.39-rc1+ #12 Bochs Bochs
> >>>> [   84.056008] RIP: 0010:[<ffffffff815c40c9>]  [<ffffffff815c40c9>] _raw_spin_unlock_irq+0x2d/0x30
> >>>> [   84.056008] RSP: 0018:ffff88003d343d98  EFLAGS: 00000202
> >>>> [   84.056008] RAX: 0000000013f58d89 RBX: 0000000000000006 RCX: ffff88003d2c5998
> >>>> [   84.056008] RDX: 0000000000000006 RSI: ffff88003d343da0 RDI: ffff88003db19508
> >>>> [   84.056008] RBP: ffff88003d343da0 R08: ffff88003fc15c00 R09: 0000000000000001
> >>>> [   84.056008] R10: ffffffff81e0d040 R11: ffff88003d343d60 R12: ffffffff815cb18e
> >>>> [   84.056008] R13: 0000000000000001 R14: ffff88003d2c5998 R15: ffffffff81069aec
> >>>> [   84.056008] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> >>>> [   84.056008] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >>>> [   84.056008] CR2: 000000000060d828 CR3: 000000003d3f8000 CR4: 00000000000006f0
> >>>> [   84.056008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>> [   84.056008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >>>> [   84.056008] Process kworker/0:2 (pid: 743, threadinfo ffff88003d342000, task ffff88003db18f60)
> >>>> [   84.056008] Stack:
> >>>> [   84.056008]  ffff88003d2c5870 ffff88003d343dc0 ffffffff812171d3 ffff88003fc15c00
> >>>> [   84.056008]  ffff88003d31e6c0 ffff88003d343e50 ffffffff81053e99 ffffffff81053e0b
> >>>> [   84.056008]  ffff88003d342010 ffff88003db18f60 0000000000000046 ffff88003fc15c05
> >>>> [   84.056008] Call Trace:
> >>>> [   84.056008]  [<ffffffff812171d3>] blk_delay_work+0x32/0x36
> >>>> [   84.056008]  [<ffffffff81053e99>] process_one_work+0x230/0x397
> >>>> [   84.056008]  [<ffffffff81053e0b>] ? process_one_work+0x1a2/0x397
> >>>> [   84.056008]  [<ffffffff8105612a>] worker_thread+0x136/0x255
> >>>> [   84.056008]  [<ffffffff81055ff4>] ? manage_workers+0x190/0x190
> >>>> [   84.056008]  [<ffffffff8105974a>] kthread+0x7d/0x85
> >>>> [   84.056008]  [<ffffffff815cb8d4>] kernel_thread_helper+0x4/0x10
> >>>> [   84.056008]  [<ffffffff815c4440>] ? retint_restore_args+0xe/0xe
> >>>> [   84.056008]  [<ffffffff810596cd>] ? __init_kthread_worker+0x56/0x56
> >>>> [   84.056008]  [<ffffffff815cb8d0>] ? gs_change+0xb/0xb
> >>>> [   84.056008] Code: 01 00 00 00 48 89 e5 53 48 89 fb 48 83 c7 18 48 83 ec 08 48 8b 55 08 e8 11 7b aa ff 48 89 df e8 03 05 c7 ff e8 f3 5e aa ff fb 5e <5b> c9 c3 55 48 89 e5 41 54 49 89 fc 48 8b 55 08 48 83 c7 18 53 
> >>>
> > >>> Is CONFIG_PREEMPT[_VOLUNTARY] set?  Soft lockup detection works by
> > >>> checking whether a high-priority RT task gets scheduled periodically,
> > >>> so busy looping in kernel code with preemption disabled, or with
> > >>> CONFIG_PREEMPT disabled, would trigger it.  The backtrace doesn't mean
> > >>> the CPU is stuck there without progressing.  It just shows where the
> > >>> CPU was at the moment of triggering, and the soft lockup triggering
> > >>> itself indicates that IRQs and bottom halves are running fine.
> >>>
> > >>> I think the more likely failure mode is that something is looping in
> > >>> the block path, expecting someone else to do something, but because it
> > >>> is busy looping and preemption isn't enabled, that someone else can't
> > >>> proceed.  We had a similar problem in md lately.
> >>>
> >>>  http://thread.gmane.org/gmane.linux.raid/33020
> >>
> > >> I think this sounds exactly right, and it also explains why SMP works.
> > >> And yes, CONFIG_PREEMPT_VOLUNTARY=y is set.
> >>
> >>> If this is the case, a good way to debug would be triggering sysrq-l
> >>> multiple times while the machine is hung and see where the CPU is busy
> >>> looping.
> >>
> >> Doesn't respond to sysrq. I'll poke around.
> >>
> > 
> > Hi Jens,
> > 
> > > I'm seeing a problem with fio never completing when writing to 2 disks
> > > simultaneously. In my test case I'm writing 2Gb to both an LVM volume &
> > > a pata drive on x86_64 on an AMD X2. Could this be a related issue?
> > > 
> > > Nothing is reported in the log, and lockup detection doesn't report
> > > anything either. The write seems to have finished (the disk light
> > > activity has stopped) and the cpu cores are both below 10% usage, but
> > > fio never returns. The test does complete sometimes, but it seems to be
> > > only 1 in 4.
> 
> So when you say PATA, it's /dev/hdaX something as well?
> 
> > I'm going to try tracing it and see if I can spot where it's stuck.
> 
> Thanks, that would be nice.
> 
The second drive is /dev/sdb1, mounted on /opt; both file systems are
ext4.
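
Tejun's suggestion quoted above, triggering sysrq-l several times to see
where the CPU is busy looping, amounts to tallying the innermost reliable
frame across the captured backtraces. Below is a minimal sketch in Python,
assuming the dumps have been saved as plain text in the same format as the
trace quoted above. The names top_frames and tally_top_frames are
hypothetical helpers for illustration, not part of any kernel tool.

```python
import re
from collections import Counter

# Matches frames such as "[<ffffffff812171d3>] blk_delay_work+0x32/0x36".
# A "? " before the symbol marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def top_frames(log_text):
    """Return the first reliable frame after each 'Call Trace:' line."""
    tops = []
    awaiting_frame = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            # A new backtrace starts; look for its innermost frame.
            awaiting_frame = True
            continue
        if not awaiting_frame:
            continue
        match = FRAME_RE.search(line)
        # Skip frames marked with "?" and keep only the first reliable one.
        if match and match.group(1) is None:
            tops.append(match.group(2))
            awaiting_frame = False
    return tops


def tally_top_frames(log_text):
    """Count how often each symbol is the innermost reliable frame."""
    return Counter(top_frames(log_text))
```

A symbol that dominates the tally across many samples is a reasonable first
place to look for the loop, although a single sample (as with the soft
lockup report) only shows where the CPU happened to be at that moment.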