From: Andrew Morton <akpm@osdl.org>
To: "Charles C. Bennett, Jr." <ccb@acm.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>
Subject: Re: BUG: write-lock lockup
Date: Sat, 17 Jun 2006 10:07:10 -0700
Message-ID: <20060617100710.ec05131f.akpm@osdl.org>
In-Reply-To: <1150142023.3621.22.camel@cbox.memecycle.com>
On Mon, 12 Jun 2006 15:53:43 -0400
"Charles C. Bennett, Jr." <ccb@acm.org> wrote:
>
> Hi All -
>
> I'm seeing write-lock lockups... this is happening with
> the Fedora kernels from at least as far back as their 2.6.15-1.1833_FC4
> (2.6.15.5) and as recently as their 2.6.16-1.2111_FC4 (2.6.16.17).
>
> It's a beefy box:
> Gateway 9515R - Intel ICH5/ICH5R
> Two Dual Core 3.0 GHz Xeons, 4GB RAM
> All disks via Emulex LP101-H (thor), switched fabric to
> Hitachi WMS SAN storage.
>
> The lockups are not process-specific. Running mke2fs on large
> filesystems seems to make it happen sooner rather than later.
Apart from these messages, does the machine otherwise work OK?
> I can fish out any other data you need, run tests, etc.
>
>
> Jun 9 20:43:40 localhost kernel: BUG: write-lock lockup on CPU#6, mkfs.ext2/3258, f799c864 (Not tainted)
> Jun 9 20:43:40 localhost kernel: [<c01f21d7>] __write_lock_debug+0xb4/0xdd [<c01f223f>] _raw_write_lock+0x3f/0x7c
> Jun 9 20:43:40 localhost kernel: [<c015532c>] shrink_list+0x197/0x45d [<c01557a3>] shrink_cache+0xe7/0x29b
> Jun 9 20:43:40 localhost kernel: [<c014d955>] bad_range+0x22/0x2f [<c014dfe3>] __rmqueue+0xd1/0x156
> Jun 9 20:43:40 localhost kernel: [<c0155db9>] shrink_zone+0x89/0xd8 [<c0155e6e>] shrink_caches+0x66/0x74
> Jun 9 20:43:40 localhost kernel: [<c0155f29>] try_to_free_pages+0xad/0x1b7 [<c014e8a5>] __alloc_pages+0x136/0x2ec
> Jun 9 20:43:40 localhost kernel: [<c016ca3e>] __block_commit_write+0x83/0x8f [<c014be23>] generic_file_buffered_write+0x16b/0x655
> Jun 9 20:43:40 localhost kernel: [<c012a04b>] current_fs_time+0x5a/0x75 [<c012a04b>] current_fs_time+0x5a/0x75
> Jun 9 20:43:40 localhost kernel: [<c0182cd8>] inode_update_time+0x2d/0x99 [<c014c5b0>] __generic_file_aio_write_nolock+0x2a3/0x4d9
> Jun 9 20:43:40 localhost kernel: [<c014c822>] generic_file_aio_write_nolock+0x3c/0x92 [<c01719e0>] blkdev_file_write+0x0/0x24
> Jun 9 20:43:40 localhost kernel: [<c014c99c>] generic_file_write_nolock+0x85/0x9f [<c0139196>] autoremove_wake_function+0x0/0x37
> Jun 9 20:43:40 localhost kernel: [<c0171a00>] blkdev_file_write+0x20/0x24 [<c01699f2>] vfs_write+0xa2/0x15a
> Jun 9 20:43:40 localhost kernel: [<c0169b55>] sys_write+0x41/0x6a [<c0104035>] syscall_call+0x7/0xb
> Jun 9 21:02:58 localhost kernel: BUG: write-lock lockup on CPU#1, mkfs.ext2/3259, f799c864 (Not tainted)
> Jun 9 21:02:58 localhost kernel: [<c01f21d7>] __write_lock_debug+0xb4/0xdd [<c01f223f>] _raw_write_lock+0x3f/0x7c
> Jun 9 21:02:58 localhost kernel: [<c032bd15>] _write_lock_irqsave+0x9/0xd [<c015070d>] test_clear_page_writeback+0x2d/0xa6
> Jun 9 21:02:58 localhost kernel: [<c014a09c>] end_page_writeback+0x76/0x84 [<c016b2c9>] end_buffer_async_write+0xbf/0x12a
> Jun 9 21:02:58 localhost kernel: [<c01f1167>] memmove+0x24/0x2d [<c014d292>] mempool_free+0x3a/0x73
> Jun 9 21:02:58 localhost kernel: [<c016db6a>] end_bio_bh_io_sync+0x0/0x4f [<c016db6a>] end_bio_bh_io_sync+0x0/0x4f
> Jun 9 21:02:58 localhost kernel: [<c016db8d>] end_bio_bh_io_sync+0x23/0x4f [<c016f307>] bio_endio+0x3e/0x69
> Jun 9 21:02:58 localhost kernel: [<c01e2e88>] __end_that_request_first+0x101/0x235 [<f885c8af>] scsi_end_request+0x1b/0xb0 [scsi_mod]
> Jun 9 21:02:58 localhost kernel: [<f885cbd5>] scsi_io_completion+0x151/0x4d2 [scsi_mod] [<f8833c97>] sd_rw_intr+0x70/0x3c1 [sd_mod]
> Jun 9 21:02:58 localhost kernel: [<f8857cad>] scsi_finish_command+0x82/0xd0 [scsi_mod] [<f8857b8e>] scsi_softirq+0xc0/0x137 [scsi_mod]
> Jun 9 21:02:58 localhost kernel: [<c012a1f2>] __do_softirq+0x72/0xdc [<c0106383>] do_softirq+0x4b/0x4f
> Jun 9 21:02:58 localhost kernel: =======================
> Jun 9 21:02:58 localhost kernel: [<c0106265>] do_IRQ+0x55/0x86 [<c0104a7a>] common_interrupt+0x1a/0x20
> Jun 9 21:03:05 localhost kernel: [<c014e387>] free_hot_cold_page+0xd5/0x126 [<c014eaf4>] __pagevec_free+0x1f/0x2e
> Jun 9 21:03:08 localhost kernel: [<c0154371>] __pagevec_release_nonlru+0x29/0x8b [<c01553aa>] shrink_list+0x215/0x45d
> Jun 9 21:03:09 localhost kernel: [<c01557a3>] shrink_cache+0xe7/0x29b [<c0155db9>] shrink_zone+0x89/0xd8
> Jun 9 21:03:09 localhost kernel: [<c0155e6e>] shrink_caches+0x66/0x74 [<c0155f29>] try_to_free_pages+0xad/0x1b7
> Jun 9 21:03:09 localhost kernel: [<c014e8a5>] __alloc_pages+0x136/0x2ec [<c016ca3e>] __block_commit_write+0x83/0x8f
> Jun 9 21:03:10 localhost kernel: [<c014be23>] generic_file_buffered_write+0x16b/0x655 [<c012a04b>] current_fs_time+0x5a/0x75
> Jun 9 21:03:11 localhost kernel: [<c0182cd8>] inode_update_time+0x2d/0x99 [<c014c5b0>] __generic_file_aio_write_nolock+0x2a3/0x4d9
> Jun 9 21:03:11 localhost kernel: [<c014c822>] generic_file_aio_write_nolock+0x3c/0x92 [<c01719e0>] blkdev_file_write+0x0/0x24
> Jun 9 21:03:12 localhost kernel: [<c014c99c>] generic_file_write_nolock+0x85/0x9f [<c0139196>] autoremove_wake_function+0x0/0x37
> Jun 9 21:03:12 localhost kernel: [<c0171a00>] blkdev_file_write+0x20/0x24 [<c01699f2>] vfs_write+0xa2/0x15a
> Jun 9 21:03:12 localhost kernel: [<c0169b55>] sys_write+0x41/0x6a [<c0104035>] syscall_call+0x7/0xb
Well. Obviously __write_lock_debug():print_once was meant to have static
scope (__read_mostly, too). But that's a cosmetic thing.
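For concreteness, that cosmetic fix would be roughly the following - a
sketch only, assuming the 2.6.16-era lib/spinlock_debug.c where print_once
is a plain function-local variable and so gets reinitialised on every call:

	static void __write_lock_debug(rwlock_t *lock)
	{
		/* was: int print_once = 1; */
		static int print_once __read_mostly = 1;
		...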
I suspect that the debug code has simply gone wrong here - that there is
so much read_lock() traffic happening with this workload that the debug
version of write_lock() simply isn't able to take the lock.
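For reference, the loop in question looks roughly like this - reconstructed
from the 2.6.16-era lib/spinlock_debug.c, so details may differ slightly.
The outer loop retries forever; only the inner loop's iteration count
determines how long we spin before the "lockup" message is printed:

	static void __write_lock_debug(rwlock_t *lock)
	{
		int print_once = 1;
		u64 i;

		for (;;) {
			for (i = 0; i < loops_per_jiffy * HZ; i++) {
				if (__raw_write_trylock(&lock->raw_lock))
					return;
				__delay(1);
			}
			/* lockup suspected - report it, then keep trying: */
			if (print_once) {
				print_once = 0;
				printk("BUG: write-lock lockup on CPU#%d, %s/%d, %p\n",
					raw_smp_processor_id(), current->comm,
					current->pid, lock);
				dump_stack();
			}
		}
	}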
This might fix it, but it'll break the timing calculations in that loop.
--- devel/lib/spinlock_debug.c~a 2006-06-17 03:08:35.000000000 -0700
+++ devel-akpm/lib/spinlock_debug.c 2006-06-17 03:08:56.000000000 -0700
@@ -219,7 +219,7 @@ static void __write_lock_debug(rwlock_t
 		for (i = 0; i < loops_per_jiffy * HZ; i++) {
 			if (__raw_write_trylock(&lock->raw_lock))
 				return;
-			__delay(1);
+			cpu_relax();
 		}
 		/* lockup suspected: */
 		if (print_once) {
_
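Rough arithmetic on why the timing would break (assuming the usual meaning
of loops_per_jiffy, i.e. the number of __delay() units per jiffy):

	loops_per_jiffy iterations of __delay(1)  ~= 1 jiffy
	loops_per_jiffy * HZ iterations           ~= HZ jiffies ~= 1 second

so the lockup message currently fires after roughly one second of failing
to take the lock.  With cpu_relax() each iteration is just a pause
instruction, so the same iteration count completes in far less than a
second and the warning would trigger much too eagerly.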
(I'm a little surprised that RH-FC ships with this debug option enabled. It's
good for kernel development, but not optimal for users...)
Ingo, we have two bugs in there - some assistance and thought would be
appreciated, please.
Thread overview: 18+ messages
2006-06-12 19:53 BUG: write-lock lockup Charles C. Bennett, Jr.
2006-06-17 17:07 ` Andrew Morton [this message]
2006-06-19 7:02 ` [patch] increase spinlock-debug looping timeouts from 1 sec to 1 min Ingo Molnar
2006-06-19 7:59 ` Andrew Morton
2006-06-19 8:12 ` Ingo Molnar
2006-06-19 8:21 ` Ingo Molnar
2006-06-19 8:32 ` Andrew Morton
2006-06-19 8:35 ` Ingo Molnar
2006-06-19 9:13 ` Andrew Morton
2006-06-19 11:39 ` Ingo Molnar
2006-06-19 19:55 ` Andrew Morton
2006-06-20 8:06 ` Arjan van de Ven
2006-06-20 8:40 ` [patch] fix spinlock-debug looping Ingo Molnar
2006-06-20 8:52 ` Andrew Morton
2006-06-20 9:15 ` Ingo Molnar
2006-06-20 9:32 ` Andrew Morton
2006-06-20 9:34 ` Ingo Molnar
2006-06-20 16:02 ` Dave Olson