From: Jens Axboe <jens.axboe@oracle.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>, linux-ia64@vger.kernel.org
Subject: Re: system hang on latest git
Date: Tue, 29 Jan 2008 20:16:48 +0000
Message-ID: <20080129201648.GA15220@kernel.dk>
In-Reply-To: <20080129201136.GX15220@kernel.dk>

On Tue, Jan 29 2008, Jens Axboe wrote:
> On Tue, Jan 29 2008, Luck, Tony wrote:
> > I pulled Linus' tree this morning (git head = 0ba6c33bcddc64a54b5f1c25a696c4767dc76292)
> > and built for ia64 (using arch/ia64/configs/tiger_defconfig).   System booted
> > OK, but when I stressed it a little (building another kernel with "make -j32")
> > it hung.
> > 
> > The console has a bunch (98) of warnings about tasks blocked for more than 120
> > seconds like this:
> > INFO: task grep:9168 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > 
> > Call Trace:
> >  [<a000000100704120>] schedule+0x11c0/0x1340
> >                                 sp=e0000001ed8afbf0 bsp=e0000001ed8a1280
> >  [<a00000010024e720>] do_get_write_access+0x660/0xbe0
> >                                 sp=e0000001ed8afc20 bsp=e0000001ed8a1208
> >  [<a00000010024f060>] journal_get_write_access+0x40/0x80
> >                                 sp=e0000001ed8afca0 bsp=e0000001ed8a11c8
> >  [<a000000100245db0>] __ext3_journal_get_write_access+0x30/0xa0
> >                                 sp=e0000001ed8afca0 bsp=e0000001ed8a1190
> >  [<a00000010022dea0>] ext3_reserve_inode_write+0x80/0x120
> >                                 sp=e0000001ed8afca0 bsp=e0000001ed8a1158
> >  [<a00000010022df70>] ext3_mark_inode_dirty+0x30/0x80
> >                                 sp=e0000001ed8afca0 bsp=e0000001ed8a1130
> >  [<a000000100232530>] ext3_dirty_inode+0xd0/0x120
> >                                 sp=e0000001ed8afcc0 bsp=e0000001ed8a1100
> >  [<a000000100170e20>] __mark_inode_dirty+0xa0/0x3e0
> >                                 sp=e0000001ed8afcc0 bsp=e0000001ed8a10b0
> >  [<a00000010015b570>] touch_atime+0x310/0x340
> >                                 sp=e0000001ed8afcc0 bsp=e0000001ed8a1088
> >  [<a0000001000d6c20>] do_generic_mapping_read+0x780/0x7a0
> >                                 sp=e0000001ed8afce0 bsp=e0000001ed8a0fe0
> >  [<a0000001000db250>] generic_file_aio_read+0x290/0x340
> >                                 sp=e0000001ed8afce0 bsp=e0000001ed8a0f80
> >  [<a00000010012c990>] do_sync_read+0x170/0x200
> >                                 sp=e0000001ed8afd10 bsp=e0000001ed8a0f40
> >  [<a00000010012cbd0>] vfs_read+0x1b0/0x2e0
> >                                 sp=e0000001ed8afe20 bsp=e0000001ed8a0ef0
> >  [<a00000010012d250>] sys_read+0x70/0xe0
> >                                 sp=e0000001ed8afe20 bsp=e0000001ed8a0e78
> >  [<a00000010000a4a0>] ia64_ret_from_syscall+0x0/0x20
> >                                 sp=e0000001ed8afe30 bsp=e0000001ed8a0e78
> > 
> > 
> > [The stack trace has several variations ... some from sys_read(), some from
> > sys_open(), some from sys_execve(), some from sys_mmap() etc. 84/98 stack
> > traces pass through the touch_atime->__mark_inode_dirty path ... all 98
> > are attached]
> > 
> > A quick dig into processor state shows 8 cpus are idle.  7 are spinning
> > in __spin_lock_irq() from __make_request() and one is in spin_lock() from
> > as_merged_requests().
> 
> Looks like a deadlock on queue lock and ioc lock, but I don't see
> immediately what the problem is. I can't stick around for longer
> tonight, but I'll get to the bottom of this tomorrow.

Actually, can you try this? It has a known race but nothing to worry
about, and it removes ioc->lock from irq context.

diff --git a/block/as-iosched.c b/block/as-iosched.c
index b201d16..585aad2 100644
--- a/block/as-iosched.c
+++ b/block/as-iosched.c
@@ -235,10 +235,8 @@ static void as_put_io_context(struct request *rq)
 	aic = RQ_IOC(rq)->aic;
 
 	if (rq_is_sync(rq) && aic) {
-		spin_lock(&aic->lock);
 		set_bit(AS_TASK_IORUNNING, &aic->state);
 		aic->last_end_request = jiffies;
-		spin_unlock(&aic->lock);
 	}
 
 	put_io_context(RQ_IOC(rq));

-- 
Jens Axboe
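
[Annotation, not part of the original message.] The patch above leaves
the two updates in place and only drops the lock around them. In the
kernel, set_bit() is an atomic bit operation, and the timestamp write
is a single word-sized store, so one plausible reading is that the lock
was not needed for either update on its own. The sketch below models
that shape in userspace C11. Every name in it (io_ctx_model,
model_request_done, model_io_running, TASK_IORUNNING_BIT) is
hypothetical and chosen for illustration; it is not the as-iosched
code and makes no claim about which race the message refers to.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-task I/O context.
 * Names are illustrative only and are not kernel API. */
#define TASK_IORUNNING_BIT 0u

struct io_ctx_model {
    atomic_ulong state;             /* flag word, bits set atomically */
    atomic_ulong last_end_request;  /* timestamp, last writer wins */
};

/* Completion-side update performed without taking any lock.
 * The flag is set with an atomic read-modify-write, and the timestamp
 * is a plain atomic store: concurrent callers may overwrite each
 * other's timestamp, but the word is never torn. */
static void model_request_done(struct io_ctx_model *ctx, unsigned long now)
{
    atomic_fetch_or(&ctx->state, 1UL << TASK_IORUNNING_BIT);
    atomic_store_explicit(&ctx->last_end_request, now, memory_order_relaxed);
}

/* Reader side: reports whether the "I/O running" flag has been set. */
static bool model_io_running(struct io_ctx_model *ctx)
{
    return (atomic_load(&ctx->state) & (1UL << TASK_IORUNNING_BIT)) != 0;
}
```

Because neither update takes a lock, a completion path running in
interrupt context can never spin on a lock that the interrupted code
already holds. What is given up is that the flag and the timestamp are
no longer updated as one unit, so a reader may briefly observe one
without the other.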


Thread overview: 11+ messages
2008-01-29 19:15 system hang on latest git Luck, Tony
2008-01-29 20:11 ` Jens Axboe
2008-01-29 20:11   ` Jens Axboe
2008-01-29 20:16   ` Jens Axboe [this message]
2008-01-29 20:16     ` Jens Axboe
2008-01-29 21:46     ` Olof Johansson
2008-01-29 21:46       ` Olof Johansson
2008-01-29 21:38       ` Jens Axboe
2008-01-29 21:38         ` Jens Axboe
2008-01-29 21:56         ` Olof Johansson
2008-01-29 21:56           ` Olof Johansson
