linux-raid.vger.kernel.org archive mirror
* 2.6.24-rc6 reproducible raid5 hang
@ 2007-12-27 17:06 dean gaudet
  2007-12-27 17:39 ` dean gaudet
  2007-12-27 19:52 ` Justin Piszcz
  0 siblings, 2 replies; 30+ messages in thread
From: dean gaudet @ 2007-12-27 17:06 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1093 bytes --]

hey neil -- remember that raid5 hang which only one or two others and i 
ever experienced, and which was hard to reproduce?  we were debugging it 
well over a year ago (that box has 400+ days of uptime now, so it was at 
least that long ago :)  the workaround was to increase 
stripe_cache_size... i seem to have found a way to reproduce something 
which looks much the same.

setup:

- 2.6.24-rc6
- system has 8GiB RAM but no swap
- 8x750GB in a raid5 with one spare, chunksize 1024KiB.
- mkfs.xfs default options
- mount -o noatime
- dd if=/dev/zero of=/mnt/foo bs=4k count=2621440

that sequence hangs for me within 10 seconds... and i can unhang / rehang 
it by toggling between stripe_cache_size 256 and 1024.  i detect the hang 
by watching "iostat -kx /dev/sd? 5".
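
for reference, here's the toggle spelled out -- md0 is just a placeholder, 
substitute whatever your array is actually called:

  # start the ~10GiB sequential write
  dd if=/dev/zero of=/mnt/foo bs=4k count=2621440 &

  # watch per-disk throughput; the hang shows up as every member going idle
  iostat -kx /dev/sd? 5

  # unhang / rehang by toggling the stripe cache
  echo 1024 > /sys/block/md0/md/stripe_cache_size
  echo 256  > /sys/block/md0/md/stripe_cache_size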

i've attached the kernel log where i dumped task and timer state while it 
was hung... note that at some point you'll see an xfs mount with an 
external journal, but the hang happens with an internal journal as well.

looks like it's using the raid456 module and async api.
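
for anyone else reproducing this, a quick sketch of how to confirm which 
modules are in play (just the usual commands):

  cat /proc/mdstat               # shows the active personality for each array
  lsmod | egrep 'raid456|async'  # lists raid456 and, if built modular, the async_tx helpers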

anyhow let me know if you need more info / have any suggestions.

-dean

[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 19281 bytes --]

[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 25339 bytes --]

* Re: 2.6.24-rc6 reproducible raid5 hang
@ 2008-01-23 13:37 Tim Southerwood
  2008-01-23 17:43 ` Carlos Carvalho
  0 siblings, 1 reply; 30+ messages in thread
From: Tim Southerwood @ 2008-01-23 13:37 UTC (permalink / raw)
  To: linux-raid

Sorry if this breaks threaded mail readers; I only just subscribed to 
the list, so I don't have the original post to reply to.

I believe I'm having the same problem.

Regarding XFS on a raid5 md array:

Kernels 2.6.22-14 (Ubuntu Gutsy generic and server builds) *and* 
2.6.24-rc8 (pure build from virgin sources) compiled for amd64 arch.

RAID 5 configured across 4 x 500 GB SATA disks (Nforce sata_nv driver, 
Asus M2N-E mobo, Athlon 64 X2, 4 GB RAM).

The md chunk size is 1024k. The array is allocated to an LVM2 PV, then 
sliced up into logical volumes. Taking one sample logical volume of 
150 GB, I ran:

mkfs.xfs -d su=1024k,sw=3 -L vol_linux /dev/vg00/vol_linux
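
For the record, sw=3 corresponds to the three data disks of a 4-disk 
RAID 5 and su matches the 1024k chunk. A rough sketch of how to 
double-check the geometry (the mount point below is only a placeholder):

  mdadm --detail /dev/md1 | grep -i chunk   # confirm the 1024k chunk size
  xfs_info /mnt/vol_linux                   # placeholder mount point; reports sunit/swidth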

I then found that putting a high write load on that filesystem caused a 
hang. The high load could be as little as a single rsync of an Ubuntu 
Gutsy mirror (many tens of GB) from my old server to this one. The hang 
would typically happen within a few hours.

I could generate relatively quick hangs by running xfs_fsr (defragger) 
in parallel.

Trying the workaround of upping /sys/block/md1/md/stripe_cache_size to 
4096 seems (fingers crossed) to have helped. I've been running the rsync 
again, plus xfs_fsr and a few 11 GB dd's to the same filesystem.

I also noticed that the write speed increased dramatically with a bigger 
stripe_cache_size.
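
As a rough sketch of what the bigger cache costs in memory (the cache is 
counted in 4 KiB pages per member device, so this is only an estimate):

  echo 4096 > /sys/block/md1/md/stripe_cache_size
  # ~4096 pages * 4 KiB * 4 member disks = roughly 64 MiB of stripe cache

which is cheap next to 4 GB of RAM.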

A more detailed analysis after the hang showed that:

- I could still log in;
- one CPU core was stuck at 100% I/O wait;
- the other core was usable, with care.

So I managed to get a SysRq-T dump, and one place the system appeared 
blocked was via this path:

[ 2039.466258] xfs_fsr       D 0000000000000000     0  7324   7308
[ 2039.466260]  ffff810119399858 0000000000000082 0000000000000000 0000000000000046
[ 2039.466263]  ffff810110d6c680 ffff8101102ba998 ffff8101102ba770 ffffffff8054e5e0
[ 2039.466265]  ffff8101102ba998 000000010014a1e6 ffffffffffffffff ffff810110ddcb30
[ 2039.466268] Call Trace:
[ 2039.466277]  [<ffffffff8808a26b>] :raid456:get_active_stripe+0x1cb/0x610
[ 2039.466282]  [<ffffffff80234000>] default_wake_function+0x0/0x10
[ 2039.466289]  [<ffffffff88090ff8>] :raid456:make_request+0x1f8/0x610
[ 2039.466293]  [<ffffffff80251c20>] autoremove_wake_function+0x0/0x30
[ 2039.466295]  [<ffffffff80331121>] __up_read+0x21/0xb0
[ 2039.466300]  [<ffffffff8031f336>] generic_make_request+0x1d6/0x3d0
[ 2039.466303]  [<ffffffff80280bad>] vm_normal_page+0x3d/0xc0
[ 2039.466307]  [<ffffffff8031f59f>] submit_bio+0x6f/0xf0
[ 2039.466311]  [<ffffffff802c98cc>] dio_bio_submit+0x5c/0x90
[ 2039.466313]  [<ffffffff802c9943>] dio_send_cur_page+0x43/0xa0
[ 2039.466316]  [<ffffffff802c99ee>] submit_page_section+0x4e/0x150
[ 2039.466319]  [<ffffffff802ca2e2>] __blockdev_direct_IO+0x742/0xb50
[ 2039.466342]  [<ffffffff8832e9a2>] :xfs:xfs_vm_direct_IO+0x182/0x190
[ 2039.466357]  [<ffffffff8832edb0>] :xfs:xfs_get_blocks_direct+0x0/0x20
[ 2039.466370]  [<ffffffff8832e350>] :xfs:xfs_end_io_direct+0x0/0x80
[ 2039.466375]  [<ffffffff80444fb5>] __wait_on_bit_lock+0x65/0x80
[ 2039.466380]  [<ffffffff80272883>] generic_file_direct_IO+0xe3/0x190
[ 2039.466385]  [<ffffffff802729a4>] generic_file_direct_write+0x74/0x150
[ 2039.466402]  [<ffffffff88336db2>] :xfs:xfs_write+0x492/0x8f0
[ 2039.466421]  [<ffffffff883099bc>] :xfs:xfs_iunlock+0x2c/0xb0
[ 2039.466437]  [<ffffffff88336866>] :xfs:xfs_read+0x186/0x240
[ 2039.466443]  [<ffffffff8029e5b9>] do_sync_write+0xd9/0x120
[ 2039.466448]  [<ffffffff80251c20>] autoremove_wake_function+0x0/0x30
[ 2039.466457]  [<ffffffff8029eead>] vfs_write+0xdd/0x190
[ 2039.466461]  [<ffffffff8029f5b3>] sys_write+0x53/0x90
[ 2039.466465]  [<ffffffff8020c29e>] system_call+0x7e/0x83
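
For anyone wanting to capture the same kind of dump, a sketch of the 
usual sequence (assuming sysrq is available on the box):

  echo 1 > /proc/sys/kernel/sysrq   # enable sysrq if it isn't already
  echo t > /proc/sysrq-trigger      # dump task states, i.e. SysRq-T
  dmesg                             # the backtraces land in the kernel log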


That said, I'm of the opinion that the system should not deadlock, even 
if the tunable parameters are unfavourable. I'm happy with the workaround 
(indeed the system performs better with it).

However, it will take a week's worth of testing before I'm willing to 
commission this as my new fileserver.

So, if there is anything anyone would like me to try, I'm happy to 
volunteer as a guinea pig :)

Yes, I can build and patch kernels, but I'm not hot at debugging them, so 
if kernel core dumps or the like are needed, please point me at the right 
document or hint at which commands I need to read about.

Cheers

Tim


Thread overview: 30+ messages
2007-12-27 17:06 2.6.24-rc6 reproducible raid5 hang dean gaudet
2007-12-27 17:39 ` dean gaudet
2007-12-29 16:48   ` dean gaudet
2007-12-29 20:47     ` Dan Williams
2007-12-29 20:58       ` dean gaudet
2007-12-29 21:50         ` Justin Piszcz
2007-12-29 22:11           ` dean gaudet
2007-12-29 22:21             ` dean gaudet
2007-12-29 22:06         ` Dan Williams
2007-12-30 17:58           ` dean gaudet
2008-01-09 18:28             ` Dan Williams
2008-01-10  0:09               ` Neil Brown
2008-01-10  3:07                 ` Dan Williams
2008-01-10  3:57                   ` Neil Brown
2008-01-10  4:56                     ` Dan Williams
2008-01-10 20:28                     ` Bill Davidsen
2008-01-10  7:13                 ` dean gaudet
2008-01-10 18:49                   ` Dan Williams
2008-01-11  1:46                     ` Neil Brown
2008-01-11  2:14                       ` dean gaudet
2008-01-10 17:59                 ` dean gaudet
2007-12-27 19:52 ` Justin Piszcz
2007-12-28  0:08   ` dean gaudet
  -- strict thread matches above, loose matches on Subject: below --
2008-01-23 13:37 Tim Southerwood
2008-01-23 17:43 ` Carlos Carvalho
2008-01-24 20:30   ` Tim Southerwood
2008-01-28 17:29     ` Tim Southerwood
2008-01-29 14:16       ` Carlos Carvalho
2008-01-29 22:58         ` Bill Davidsen
2008-02-14 10:13           ` Burkhard Carstens
