* [SPAM] Write hang on ARM based target
@ 2009-06-08 15:46 Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-08 15:46 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg
[-- Attachment #1.1: Type: text/plain, Size: 979 bytes --]
Greetings,
I'm attempting to get NILFS up and running on a Marvell DB-88F6281-BP-A
(ARM) and have experienced a consistent hang during data transfer. This
target board is running a 2.6.22.18 kernel and I've tested it against
NILFS versions 2.0.12, 2.0.14 and the latest git pull (6/4/09). I've
eliminated the garbage collector and and have enabled full debugging for
log capture. The failure is not immediate or at a fixed point -
typically in the 1-2GB neighborhood of writing variably sized files.
This same exercise (a simple recursive local copy) works flawlessly on
my host Ubuntu i86 machine as well as on the target with an ext3 file
system.
Is there a specific kernel feature that I might be missing and need to
enable? A kernel version sensitivity? Is this log information
something you would be interested in seeing? If so, how verbose would
you like it or better still, how would you like me to collect it?
Thanks,
William Dunphy
[-- Attachment #1.2: Type: text/html, Size: 2831 bytes --]
[-- Attachment #2: Type: text/plain, Size: 158 bytes --]
_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users
^ permalink raw reply [flat|nested] 14+ messages in thread[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: [SPAM] Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-08 16:49 ` Ryusuke Konishi [not found] ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-08 16:49 UTC (permalink / raw) To: users-JrjvKiOkagjYtjvyW6yDsg, WDunphy-/3AnZwnt+WyVkR+q5Y5grQ Hi! On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote: > Greetings, > > I'm attempting to get NILFS up and running on a Marvell DB-88F6281-BP-A > (ARM) and have experienced a consistent hang during data transfer. This > target board is running a 2.6.22.18 kernel and I've tested it against > NILFS versions 2.0.12, 2.0.14 and the latest git pull (6/4/09). I've > eliminated the garbage collector and and have enabled full debugging for > log capture. The failure is not immediate or at a fixed point - > typically in the 1-2GB neighborhood of writing variably sized files. > This same exercise (a simple recursive local copy) works flawlessly on > my host Ubuntu i86 machine as well as on the target with an ext3 file > system. > > Is there a specific kernel feature that I might be missing and need to > enable? A kernel version sensitivity? Is this log information > something you would be interested in seeing? If so, how verbose would > you like it or better still, how would you like me to collect it? Thanks for reporting. What will happen if you trigger SysRq when you meet the hang? # echo t > /proc/sysrq-trigger I don't know whether the arm kernel codes is supporting sysrq, but we may get the stack dump of the nilfs task hung at some lock. If it works, it would help us to narrow down the problem. Also, you can know the progress of segment writer by specifying the following debug options, # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option This feature is only available on the nilfs2 out-of-tree module built with CONFIG_NILFS_DEBUG=y option. Could you try either of these? Thanks in advance, Ryusuke Konishi ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: [SPAM] Write hang on ARM based target [not found] ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-06-08 20:30 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-08 20:30 UTC (permalink / raw) To: Ryusuke Konishi, users-JrjvKiOkagjYtjvyW6yDsg Ryusuke, Thanks for the prompt reply. >Hi! >On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote: >> Greetings, >> >> I'm attempting to get NILFS up and running on a Marvell >> DB-88F6281-BP-A >> (ARM) and have experienced a consistent hang during data transfer. >> This target board is running a 2.6.22.18 kernel and I've tested it >> against NILFS versions 2.0.12, 2.0.14 and the latest git pull >> (6/4/09). I've eliminated the garbage collector and and have enabled >> full debugging for log capture. The failure is not immediate or at a >> fixed point - typically in the 1-2GB neighborhood of writing variably sized files. >> This same exercise (a simple recursive local copy) works flawlessly on >> my host Ubuntu i86 machine as well as on the target with an ext3 file >> system. >> >> Is there a specific kernel feature that I might be missing and need to >> enable? A kernel version sensitivity? Is this log information >> something you would be interested in seeing? If so, how verbose would >> you like it or better still, how would you like me to collect it? > >Thanks for reporting. > >What will happen if you trigger SysRq when you meet the hang? > > # echo t > /proc/sysrq-trigger > >I don't know whether the arm kernel codes is supporting sysrq, but we may get the stack dump of >the nilfs task hung at some lock. If it works, it would help us to narrow down the problem. Here's the stack trace of the copy process which has entered a permanent disk sleep state: cp D C002CAA4 0 339 333 (NOTLB) [<c002c580>] (schedule+0x0/0x600) from [<c002d54c>] (schedule_timeout+0x98/0xc4) [<c002d4b4>] (schedule_timeout+0x0/0xc4) from [<c002cc28>] (io_schedule_timeout+0x34/0x58) r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0 [<c002cbf4>] (io_schedule_timeout+0x0/0x58) from [<c014b834>] (congestion_wait+0x70/0x94) r4:c03df664 [<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>] (balance_dirty_pages_ratelimited_nr+0x12c/0x1c0) r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce [<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from [<c0142860>] (generic_file_buffered_write+0x570/0x670) [<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>] (__generic_file_aio_write_nolock+0x450/0x4a0) [<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from [<c0142e78>] (generic_file_aio_write+0x78/0xf4) [<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>] (do_sync_write+0xb8/0x108) [<c0160378>] (do_sync_write+0x0/0x108) from [<c0160d18>] (vfs_write+0xb8/0x148) [<c0160c60>] (vfs_write+0x0/0x148) from [<c01612dc>] (sys_write+0x44/0x70) r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000 [<c0161298>] (sys_write+0x0/0x70) from [<c0027400>] (ret_fast_syscall+0x0/0x2c) r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549 >Also, you can know the progress of segment writer by specifying the following debug options, > > # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option > >This feature is only available on the nilfs2 out-of-tree module built with CONFIG_NILFS_DEBUG=y >option. Here's the tail of the syslog with segment and seginfo logging enhanced as requested: NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to seq=159 NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159 NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment semaphore NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159, state=0 NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0 NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Bunch of repeated lock, lock, unlock entries... NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to seq=703 NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703 NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment semaphore NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703, state=0 NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0 NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Bunch of repeated lock, lock, unlock entries... NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Game over. As you can see, I've cut out some repeated sections to trim down the output. If you need more log info beyond this tail, let me know how much and how you would like me to pass it to you. Bill ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-09 19:04 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-09 19:04 UTC (permalink / raw) To: Ryusuke Konishi, users-JrjvKiOkagjYtjvyW6yDsg Ryusuke, I've simplified my test approach by simply copying a single 10 byte file repeatedly. When doing so, I get consistent NILFS log output at the time of hang. Examination of /proc/sysreq-trigger also shows the copy process to be in the same state as before (at sys_write). Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: begin (mode=0x1) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** LSEG BEGIN Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** GC INODE STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** FILE STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** IFILE STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called (ino=6, stage-count=3) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: called (ino=6) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: done (ino=6) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: called (ino=6) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done (err=0) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** CP STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called (ino=4, stage-count=4) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: called (ino=4) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: done (ino=4) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: called (ino=4) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done (err=0) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** SUFILE STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called (ino=5, stage-count=5) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: called (ino=5) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: done (ino=5) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: called (ino=5) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done (err=0) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** DAT STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called (ino=3, stage-count=6) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: called (ino=3) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: done (ino=3) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: called (ino=3) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done (err=0) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** SR STAGE Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: add a super root block Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** LSEG END Jun 9 12:55:14 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 9 12:55:14 kernel: full segment: segnum=101, start=206848, end=208895 Jun 9 12:55:14 kernel: partial segment: start=208788, rest=108 Jun 9 12:55:14 kernel: ------------------ SUMMARY ----------------- Jun 9 12:55:14 kernel: nfinfo = 4 (number of files) Jun 9 12:55:14 kernel: nblocks = 22 (number of blocks) Jun 9 12:55:14 kernel: sumbytes = 408 (size of summary in bytes) Jun 9 12:55:14 kernel: nsumblk = 1 (number of summary blocks) Jun 9 12:55:14 kernel: flags = LOGBGN|LOGEND|SR Jun 9 12:55:14 kernel: ============================================ Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 9 12:55:14 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=64) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=1670304, size=90112) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=208788, #re) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=2 to seq=1 Jun 9 12:55:14 kernel: NILFS(segment) nilfs_transaction_unlock: task c08c0820 unlocked segment semaphore Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Bill -----Original Message----- From: Dunphy, Bill Sent: Monday, June 08, 2009 2:31 PM To: 'Ryusuke Konishi'; users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org Subject: RE: [NILFS users] [SPAM] Write hang on ARM based target Ryusuke, Thanks for the prompt reply. >Hi! >On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote: >> Greetings, >> >> I'm attempting to get NILFS up and running on a Marvell >> DB-88F6281-BP-A >> (ARM) and have experienced a consistent hang during data transfer. >> This target board is running a 2.6.22.18 kernel and I've tested it >> against NILFS versions 2.0.12, 2.0.14 and the latest git pull >> (6/4/09). I've eliminated the garbage collector and and have enabled >> full debugging for log capture. The failure is not immediate or at a >> fixed point - typically in the 1-2GB neighborhood of writing variably sized files. >> This same exercise (a simple recursive local copy) works flawlessly >> on my host Ubuntu i86 machine as well as on the target with an ext3 >> file system. >> >> Is there a specific kernel feature that I might be missing and need >> to enable? A kernel version sensitivity? Is this log information >> something you would be interested in seeing? If so, how verbose >> would you like it or better still, how would you like me to collect it? > >Thanks for reporting. > >What will happen if you trigger SysRq when you meet the hang? > > # echo t > /proc/sysrq-trigger > >I don't know whether the arm kernel codes is supporting sysrq, but we may get the stack dump of >the nilfs task hung at some lock. If it works, it would help us to narrow down the problem. Here's the stack trace of the copy process which has entered a permanent disk sleep state: cp D C002CAA4 0 339 333 (NOTLB) [<c002c580>] (schedule+0x0/0x600) from [<c002d54c>] (schedule_timeout+0x98/0xc4) [<c002d4b4>] (schedule_timeout+0x0/0xc4) from [<c002cc28>] (io_schedule_timeout+0x34/0x58) r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0 [<c002cbf4>] (io_schedule_timeout+0x0/0x58) from [<c014b834>] (congestion_wait+0x70/0x94) r4:c03df664 [<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>] (balance_dirty_pages_ratelimited_nr+0x12c/0x1c0) r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce [<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from [<c0142860>] (generic_file_buffered_write+0x570/0x670) [<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>] (__generic_file_aio_write_nolock+0x450/0x4a0) [<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from [<c0142e78>] (generic_file_aio_write+0x78/0xf4) [<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>] (do_sync_write+0xb8/0x108) [<c0160378>] (do_sync_write+0x0/0x108) from [<c0160d18>] (vfs_write+0xb8/0x148) [<c0160c60>] (vfs_write+0x0/0x148) from [<c01612dc>] (sys_write+0x44/0x70) r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000 [<c0161298>] (sys_write+0x0/0x70) from [<c0027400>] (ret_fast_syscall+0x0/0x2c) r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549 >Also, you can know the progress of segment writer by specifying the >following debug options, > > # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option > >This feature is only available on the nilfs2 out-of-tree module built with CONFIG_NILFS_DEBUG=y >option. Here's the tail of the syslog with segment and seginfo logging enhanced as requested: NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to seq=159 NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159 NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment semaphore NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159, state=0 NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0 NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Bunch of repeated lock, lock, unlock entries... NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703 NILFS(segment) nilfs_segctor_thread: woke NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702, state=0 NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to seq=703 NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703 NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment semaphore NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703, state=0 NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0 NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Bunch of repeated lock, lock, unlock entries... NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment semaphore NILFS(segment) nilfs_transaction_lock: locked NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment semaphore // Game over. As you can see, I've cut out some repeated sections to trim down the output. If you need more log info beyond this tail, let me know how much and how you would like me to pass it to you. Bill ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-12 4:23 ` Ryusuke Konishi [not found] ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-12 4:23 UTC (permalink / raw) To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ; +Cc: users-JrjvKiOkagjYtjvyW6yDsg Hi Dunphy, Thank you for sending me the trace log. I will look into the problem later. Ryusuke Konishi On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote: > > Ryusuke, > > I've simplified my test approach by simply copying a single 10 byte file > repeatedly. When doing so, I get consistent NILFS log output at the > time of hang. Examination of /proc/sysreq-trigger also shows the copy > process to be in the same state as before (at sys_write). > > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: begin > (mode=0x1) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > LSEG BEGIN > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > GC INODE STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > FILE STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > IFILE STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called > (ino=6, stage-count=3) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > called (ino=6) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > done (ino=6) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: > called (ino=6) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done > (err=0) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > CP STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called > (ino=4, stage-count=4) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > called (ino=4) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > done (ino=4) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: > called (ino=4) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done > (err=0) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > SUFILE STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called > (ino=5, stage-count=5) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > called (ino=5) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > done (ino=5) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: > called (ino=5) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done > (err=0) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > DAT STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called > (ino=3, stage-count=6) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > called (ino=3) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers: > done (ino=3) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers: > called (ino=3) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done > (err=0) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > SR STAGE > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: add > a super root block > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: ** > LSEG END > Jun 9 12:55:14 kernel: ========= NILFS SEGMENT INFORMATION ======== > > Jun 9 12:55:14 kernel: full segment: segnum=101, start=206848, > end=208895 > Jun 9 12:55:14 kernel: partial segment: start=208788, rest=108 > > Jun 9 12:55:14 kernel: ------------------ SUMMARY ----------------- > > Jun 9 12:55:14 kernel: nfinfo = 4 (number of files) > > Jun 9 12:55:14 kernel: nblocks = 22 (number of blocks) > > Jun 9 12:55:14 kernel: sumbytes = 408 (size of summary in bytes) > > Jun 9 12:55:14 kernel: nsumblk = 1 (number of summary blocks) > > Jun 9 12:55:14 kernel: flags = LOGBGN|LOGEND|SR > > Jun 9 12:55:14 kernel: ============================================ > > Jun 9 12:55:14 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: called > Jun 9 12:55:14 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: done > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > called > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > done > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > called > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > done > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage: > called > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage: > done > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > called > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > done > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting > summary blocks > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated > bio (max_vecs=64) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting > normal blocks (index=1) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting > bio (start_sector=1670304, size=90112) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitted a > segment (err=0, pseg_start=208788, #re) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 > > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed > > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write: > completing segment (flags=0x7) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write: > completed a segment having a super root ) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_do_construct: > submitted all segments > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: end > (stage=9) > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_notify: complete > requests from seq=2 to seq=1 > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_transaction_unlock: task > c08c0820 unlocked segment semaphore > Jun 9 12:55:14 kernel: NILFS(segment) nilfs_segctor_thread: sequence: > req=1, done=1, state=0 > > Bill > > -----Original Message----- > From: Dunphy, Bill > Sent: Monday, June 08, 2009 2:31 PM > To: 'Ryusuke Konishi'; users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org > Subject: RE: [NILFS users] [SPAM] Write hang on ARM based target > > Ryusuke, > > Thanks for the prompt reply. > > >Hi! > >On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote: > >> Greetings, > >> > >> I'm attempting to get NILFS up and running on a Marvell > >> DB-88F6281-BP-A > >> (ARM) and have experienced a consistent hang during data transfer. > >> This target board is running a 2.6.22.18 kernel and I've tested it > >> against NILFS versions 2.0.12, 2.0.14 and the latest git pull > >> (6/4/09). I've eliminated the garbage collector and and have enabled > > >> full debugging for log capture. The failure is not immediate or at a > > >> fixed point - typically in the 1-2GB neighborhood of writing variably > sized files. > >> This same exercise (a simple recursive local copy) works flawlessly > >> on my host Ubuntu i86 machine as well as on the target with an ext3 > >> file system. > >> > >> Is there a specific kernel feature that I might be missing and need > >> to enable? A kernel version sensitivity? Is this log information > >> something you would be interested in seeing? If so, how verbose > >> would you like it or better still, how would you like me to collect > it? > > > >Thanks for reporting. > > > >What will happen if you trigger SysRq when you meet the hang? > > > > # echo t > /proc/sysrq-trigger > > > >I don't know whether the arm kernel codes is supporting sysrq, but we > may get the stack dump of >the nilfs task hung at some lock. If it > works, it would help us to narrow down the problem. > > Here's the stack trace of the copy process which has entered a permanent > disk sleep state: > > cp D C002CAA4 0 339 333 (NOTLB) > [<c002c580>] (schedule+0x0/0x600) from [<c002d54c>] > (schedule_timeout+0x98/0xc4) [<c002d4b4>] (schedule_timeout+0x0/0xc4) > from [<c002cc28>] (io_schedule_timeout+0x34/0x58) > r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0 [<c002cbf4>] > (io_schedule_timeout+0x0/0x58) from [<c014b834>] > (congestion_wait+0x70/0x94) > r4:c03df664 > [<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>] > (balance_dirty_pages_ratelimited_nr+0x12c/0x1c0) > r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce > [<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from > [<c0142860>] (generic_file_buffered_write+0x570/0x670) > [<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>] > (__generic_file_aio_write_nolock+0x450/0x4a0) > [<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from > [<c0142e78>] (generic_file_aio_write+0x78/0xf4) > [<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>] > (do_sync_write+0xb8/0x108) [<c0160378>] (do_sync_write+0x0/0x108) from > [<c0160d18>] (vfs_write+0xb8/0x148) [<c0160c60>] (vfs_write+0x0/0x148) > from [<c01612dc>] (sys_write+0x44/0x70) > r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000 [<c0161298>] > (sys_write+0x0/0x70) from [<c0027400>] (ret_fast_syscall+0x0/0x2c) > r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549 > > >Also, you can know the progress of segment writer by specifying the > >following debug options, > > > > # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option > > > >This feature is only available on the nilfs2 out-of-tree module built > with CONFIG_NILFS_DEBUG=y >option. > > Here's the tail of the syslog with segment and seginfo logging enhanced > as requested: > > NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159 > NILFS(segment) nilfs_segctor_thread: woke > NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158, > state=0 > NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) > NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to > seq=159 > NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159 > NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment > semaphore > NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159, > state=0 > NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0 > NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment > semaphore > NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160 > NILFS(segment) nilfs_segctor_thread: woke > NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159, > state=0 > NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) > NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0 > unlocked segment semaphore > NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment > semaphore > > // Bunch of repeated lock, lock, unlock entries... > > NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment > semaphore > NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703 > NILFS(segment) nilfs_segctor_thread: woke > NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702, > state=0 > NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes) > NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to > seq=703 > NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703 > NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment > semaphore > NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703, > state=0 > NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0 > NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment > semaphore > > // Bunch of repeated lock, lock, unlock entries... > > NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment > semaphore > NILFS(segment) nilfs_transaction_lock: locked > NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment > semaphore > > // Game over. > > As you can see, I've cut out some repeated sections to trim down the > output. If you need more log info beyond this tail, let me know how > much and how you would like me to pass it to you. > > Bill ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-06-18 18:36 ` Ryusuke Konishi [not found] ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-18 18:36 UTC (permalink / raw) To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Hi, On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote: > On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote: > > > > Ryusuke, > > > > I've simplified my test approach by simply copying a single 10 byte file > > repeatedly. When doing so, I get consistent NILFS log output at the > > time of hang. Examination of /proc/sysreq-trigger also shows the copy > > process to be in the same state as before (at sys_write). > > Thank you for sending me the trace log. I will look into the problem > later. > > Ryusuke Konishi In another thread, I've sent a patch that may cause the hang problem. Here, I attach the out-of-tree version of the patch. Does this make a difference in your problem ? Thanks, Ryusuke Konishi -- diff --git a/fs/dat.c b/fs/dat.c index 036b05d..b05d9d4 100644 --- a/fs/dat.c +++ b/fs/dat.c @@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat)); - if (entry->de_blocknr != cpu_to_le64(0) || - entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) { - printk(KERN_CRIT - "%s: vbn = %llu, start = %llu, end = %llu, pbn = %llu\n", - __func__, (unsigned long long)req->pr_entry_nr, - (unsigned long long)le64_to_cpu(entry->de_start), - (unsigned long long)le64_to_cpu(entry->de_end), - (unsigned long long)le64_to_cpu(entry->de_blocknr)); - } entry->de_blocknr = cpu_to_le64(blocknr); kunmap_atomic(kaddr, KM_USER0); diff --git a/fs/segment.c b/fs/segment.c index 3cbb3fe..225d5ae 100644 --- a/fs/segment.c +++ b/fs/segment.c @@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct nilfs_sc_info *sci, err = nilfs_segbuf_write(segbuf, &wi); res = nilfs_segbuf_wait(segbuf, &wi); - err = unlikely(err) ? : res; + err = unlikely(err) ? err : res; if (unlikely(err)) return err; } return 0; } -static int nilfs_page_has_uncleared_buffer(struct page *page) -{ - struct buffer_head *head, *bh; - - head = bh = page_buffers(page); - do { - if (buffer_dirty(bh) && !list_empty(&bh->b_assoc_buffers)) - return 1; - bh = bh->b_this_page; - } while (bh != head); - return 0; -} - static void __nilfs_end_page_io(struct page *page, int err) { if (!err) { @@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err) if (!page) return; - if (buffer_nilfs_node(page_buffers(page)) && - nilfs_page_has_uncleared_buffer(page)) - /* For b-tree node pages, this function may be called twice - or more because they might be split in a segment. - This check assures that cleanup has been done for all - buffers in a split btnode page. */ + if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) + /* + * For b-tree node pages, this function may be called twice + * or more because they might be split in a segment. + */ return; __nilfs_end_page_io(page, err); @@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct nilfs_sc_info *sci, } if (bh->b_page != fs_page) { nilfs_end_page_io(fs_page, err); - if (unlikely(fs_page == failed_page)) + if (fs_page && fs_page == failed_page) goto done; fs_page = bh->b_page; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-06-18 18:42 ` Ryusuke Konishi 2009-06-18 19:48 ` Dunphy, Bill 1 sibling, 0 replies; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-18 18:42 UTC (permalink / raw) To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ Cc: users-JrjvKiOkagjYtjvyW6yDsg, konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg On Fri, 19 Jun 2009 03:36:42 +0900 (JST), Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org> wrote: > Hi, > On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote: > > On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote: > > > > > > Ryusuke, > > > > > > I've simplified my test approach by simply copying a single 10 byte file > > > repeatedly. When doing so, I get consistent NILFS log output at the > > > time of hang. Examination of /proc/sysreq-trigger also shows the copy > > > process to be in the same state as before (at sys_write). > > > > Thank you for sending me the trace log. I will look into the problem > > later. > > > > Ryusuke Konishi > > In another thread, I've sent a patch that may cause the hang problem. > Here, I attach the out-of-tree version of the patch. s/cause/fix/ Ryusuke > Does this make a difference in your problem ? > > Thanks, > Ryusuke Konishi > -- > > diff --git a/fs/dat.c b/fs/dat.c > index 036b05d..b05d9d4 100644 > --- a/fs/dat.c > +++ b/fs/dat.c > @@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, > entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, > req->pr_entry_bh, kaddr); > entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat)); > - if (entry->de_blocknr != cpu_to_le64(0) || > - entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) { > - printk(KERN_CRIT > - "%s: vbn = %llu, start = %llu, end = %llu, pbn = %llu\n", > - __func__, (unsigned long long)req->pr_entry_nr, > - (unsigned long long)le64_to_cpu(entry->de_start), > - (unsigned long long)le64_to_cpu(entry->de_end), > - (unsigned long long)le64_to_cpu(entry->de_blocknr)); > - } > entry->de_blocknr = cpu_to_le64(blocknr); > kunmap_atomic(kaddr, KM_USER0); > > diff --git a/fs/segment.c b/fs/segment.c > index 3cbb3fe..225d5ae 100644 > --- a/fs/segment.c > +++ b/fs/segment.c > @@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct nilfs_sc_info *sci, > err = nilfs_segbuf_write(segbuf, &wi); > > res = nilfs_segbuf_wait(segbuf, &wi); > - err = unlikely(err) ? : res; > + err = unlikely(err) ? err : res; > if (unlikely(err)) > return err; > } > return 0; > } > > -static int nilfs_page_has_uncleared_buffer(struct page *page) > -{ > - struct buffer_head *head, *bh; > - > - head = bh = page_buffers(page); > - do { > - if (buffer_dirty(bh) && !list_empty(&bh->b_assoc_buffers)) > - return 1; > - bh = bh->b_this_page; > - } while (bh != head); > - return 0; > -} > - > static void __nilfs_end_page_io(struct page *page, int err) > { > if (!err) { > @@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err) > if (!page) > return; > > - if (buffer_nilfs_node(page_buffers(page)) && > - nilfs_page_has_uncleared_buffer(page)) > - /* For b-tree node pages, this function may be called twice > - or more because they might be split in a segment. > - This check assures that cleanup has been done for all > - buffers in a split btnode page. */ > + if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) > + /* > + * For b-tree node pages, this function may be called twice > + * or more because they might be split in a segment. > + */ > return; > > __nilfs_end_page_io(page, err); > @@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct nilfs_sc_info *sci, > } > if (bh->b_page != fs_page) { > nilfs_end_page_io(fs_page, err); > - if (unlikely(fs_page == failed_page)) > + if (fs_page && fs_page == failed_page) > goto done; > fs_page = bh->b_page; > } > _______________________________________________ > users mailing list > users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org > https://www.nilfs.org/mailman/listinfo/users ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Write hang on ARM based target [not found] ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org> 2009-06-18 18:42 ` Ryusuke Konishi @ 2009-06-18 19:48 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 1 sibling, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-18 19:48 UTC (permalink / raw) To: Ryusuke Konishi Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 and experienced the following patch issue: $ patch -p1 <your.patch (Stripping trailing CRs from patch.) patching file fs/dat.c (Stripping trailing CRs from patch.) patching file fs/segment.c patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err) I could attempt to do by hand but decided to let you chime in first. Thoughts? Bill -----Original Message----- From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] Sent: Thursday, June 18, 2009 12:37 PM To: Dunphy, Bill Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: Re: [NILFS users] Write hang on ARM based target Hi, On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote: > On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote: > > > > Ryusuke, > > > > I've simplified my test approach by simply copying a single 10 byte > > file repeatedly. When doing so, I get consistent NILFS log output > > at the time of hang. Examination of /proc/sysreq-trigger also shows > > the copy process to be in the same state as before (at sys_write). > > Thank you for sending me the trace log. I will look into the problem > later. > > Ryusuke Konishi In another thread, I've sent a patch that may cause the hang problem. Here, I attach the out-of-tree version of the patch. Does this make a difference in your problem ? Thanks, Ryusuke Konishi -- diff --git a/fs/dat.c b/fs/dat.c index 036b05d..b05d9d4 100644 --- a/fs/dat.c +++ b/fs/dat.c @@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req, entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr, req->pr_entry_bh, kaddr); entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat)); - if (entry->de_blocknr != cpu_to_le64(0) || - entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) { - printk(KERN_CRIT - "%s: vbn = %llu, start = %llu, end = %llu, pbn = %llu\n", - __func__, (unsigned long long)req->pr_entry_nr, - (unsigned long long)le64_to_cpu(entry->de_start), - (unsigned long long)le64_to_cpu(entry->de_end), - (unsigned long long)le64_to_cpu(entry->de_blocknr)); - } entry->de_blocknr = cpu_to_le64(blocknr); kunmap_atomic(kaddr, KM_USER0); diff --git a/fs/segment.c b/fs/segment.c index 3cbb3fe..225d5ae 100644 --- a/fs/segment.c +++ b/fs/segment.c @@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct nilfs_sc_info *sci, err = nilfs_segbuf_write(segbuf, &wi); res = nilfs_segbuf_wait(segbuf, &wi); - err = unlikely(err) ? : res; + err = unlikely(err) ? err : res; if (unlikely(err)) return err; } return 0; } -static int nilfs_page_has_uncleared_buffer(struct page *page) -{ - struct buffer_head *head, *bh; - - head = bh = page_buffers(page); - do { - if (buffer_dirty(bh) && !list_empty(&bh->b_assoc_buffers)) - return 1; - bh = bh->b_this_page; - } while (bh != head); - return 0; -} - static void __nilfs_end_page_io(struct page *page, int err) { if (!err) { @@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err) if (!page) return; - if (buffer_nilfs_node(page_buffers(page)) && - nilfs_page_has_uncleared_buffer(page)) - /* For b-tree node pages, this function may be called twice - or more because they might be split in a segment. - This check assures that cleanup has been done for all - buffers in a split btnode page. */ + if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page)) + /* + * For b-tree node pages, this function may be called twice + * or more because they might be split in a segment. + */ return; __nilfs_end_page_io(page, err); @@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct nilfs_sc_info *sci, } if (bh->b_page != fs_page) { nilfs_end_page_io(fs_page, err); - if (unlikely(fs_page == failed_page)) + if (fs_page && fs_page == failed_page) goto done; fs_page = bh->b_page; } ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-19 1:47 ` Ryusuke Konishi [not found] ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-19 1:47 UTC (permalink / raw) To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg [-- Attachment #1: Type: Text/Plain, Size: 823 bytes --] Hi, On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote: > Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 and > experienced the following patch issue: > > $ patch -p1 <your.patch > (Stripping trailing CRs from patch.) > patching file fs/dat.c > (Stripping trailing CRs from patch.) > patching file fs/segment.c > patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static > void nilfs_end_page_io(struct page *page, int err) > > I could attempt to do by hand but decided to let you chime in first. > > Thoughts? > > Bill Hmm, strange. I will attach the compressed one. It is against the latest git, but will be also applicable to v2.0.14. $ cd nilfs2-module $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 | patch -p1 How does this go ? Ryusuke Konishi [-- Attachment #2: nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 --] [-- Type: Application/Octet-Stream, Size: 1159 bytes --] [-- Attachment #3: Type: text/plain, Size: 158 bytes --] _______________________________________________ users mailing list users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org https://www.nilfs.org/mailman/listinfo/users ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-06-19 15:01 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-19 15:01 UTC (permalink / raw) To: Ryusuke Konishi Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Thanks. That patch integrated successfully. Ran it again with the following result: Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919 Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758 Jun 19 08:38:26 kernel: ------------------ SUMMARY ----------------- Jun 19 08:38:26 kernel: nfinfo = 4 (number of files) Jun 19 08:38:26 kernel: nblocks = 14 (number of blocks) Jun 19 08:38:26 kernel: sumbytes = 312 (size of summary in bytes) Jun 19 08:38:26 kernel: nsumblk = 1 (number of summary blocks) Jun 19 08:38:26 kernel: flags = LOGBGN|LOGEND|SR Jun 19 08:38:26 kernel: ============================================ Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=16) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=2 to seq=1 Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Bill -----Original Message----- From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] Sent: Thursday, June 18, 2009 7:48 PM To: Dunphy, Bill Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: Re: [NILFS users] Write hang on ARM based target Hi, On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote: > Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 > and experienced the following patch issue: > > $ patch -p1 <your.patch > (Stripping trailing CRs from patch.) > patching file fs/dat.c > (Stripping trailing CRs from patch.) > patching file fs/segment.c > patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static > void nilfs_end_page_io(struct page *page, int err) > > I could attempt to do by hand but decided to let you chime in first. > > Thoughts? > > Bill Hmm, strange. I will attach the compressed one. It is against the latest git, but will be also applicable to v2.0.14. $ cd nilfs2-module $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 | patch -p1 How does this go ? Ryusuke Konishi ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-22 18:58 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-22 18:58 UTC (permalink / raw) To: Ryusuke Konishi Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg An update. I've just begun some testing on a different reference design board utilizing the 88F6281. This particular board has native support in the 2.6.30 kernel which allowed me to give the in-tree version of NILFS a go. This board/kernel version combination ran though the testing mentioned below without a hitch this weekend (1 million loops). I've since performed a number of massive simultaneous data transfers without any errors. Performance appeared to be much better from a high level as well. So at this point, it appears to me that there is a NILFS sensitivity to the 2.6.22.18 kernel and/or a board oddity (even though other file systems worked flawlessly). My near term plan is to move forward with this new board/kernel combination. However, I will keep the original board and it's 2.6.22.18 kernel up and available if you would like me to try some other changes - you decide. In the meanwhile, I'll start banging away on this platform and report in if I see any strange behavior. Bill -----Original Message----- From: Dunphy, Bill Sent: Friday, June 19, 2009 9:01 AM To: 'Ryusuke Konishi' Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: RE: [NILFS users] Write hang on ARM based target Thanks. That patch integrated successfully. Ran it again with the following result: Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919 Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758 Jun 19 08:38:26 kernel: ------------------ SUMMARY ----------------- Jun 19 08:38:26 kernel: nfinfo = 4 (number of files) Jun 19 08:38:26 kernel: nblocks = 14 (number of blocks) Jun 19 08:38:26 kernel: sumbytes = 312 (size of summary in bytes) Jun 19 08:38:26 kernel: nsumblk = 1 (number of summary blocks) Jun 19 08:38:26 kernel: flags = LOGBGN|LOGEND|SR Jun 19 08:38:26 kernel: ============================================ Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=16) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=2 to seq=1 Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Bill -----Original Message----- From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] Sent: Thursday, June 18, 2009 7:48 PM To: Dunphy, Bill Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: Re: [NILFS users] Write hang on ARM based target Hi, On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote: > Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 > and experienced the following patch issue: > > $ patch -p1 <your.patch > (Stripping trailing CRs from patch.) > patching file fs/dat.c > (Stripping trailing CRs from patch.) > patching file fs/segment.c > patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static > void nilfs_end_page_io(struct page *page, int err) > > I could attempt to do by hand but decided to let you chime in first. > > Thoughts? > > Bill Hmm, strange. I will attach the compressed one. It is against the latest git, but will be also applicable to v2.0.14. $ cd nilfs2-module $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 | patch -p1 How does this go ? Ryusuke Konishi ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-23 3:35 ` Ryusuke Konishi [not found] ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Ryusuke Konishi @ 2009-06-23 3:35 UTC (permalink / raw) To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Hi Bill, On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote: > An update. > > I've just begun some testing on a different reference design board > utilizing the 88F6281. This particular board has native support in the > 2.6.30 kernel which allowed me to give the in-tree version of NILFS a > go. This board/kernel version combination ran though the testing > mentioned below without a hitch this weekend (1 million loops). I've > since performed a number of massive simultaneous data transfers without > any errors. Performance appeared to be much better from a high level as > well. So at this point, it appears to me that there is a NILFS > sensitivity to the 2.6.22.18 kernel and/or a board oddity (even though > other file systems worked flawlessly). My near term plan is to move > forward with this new board/kernel combination. However, I will keep > the original board and it's 2.6.22.18 kernel up and available if you > would like me to try some other changes - you decide. In the meanwhile, > I'll start banging away on this platform and report in if I see any > strange behavior. > > Bill Sorry for my late reply. I found an inconsistent state in the value of sequence counter shown in your log. I think some sort of synchronization problem is present. If so, I think we should resolve the problem because it may occur in any RISC architectures. Could you test if the attached patch makes a difference? The patch adds volatile specifiers to sequence counters which may be shared among different tasks. Thanks, Ryusuke Konishi > -----Original Message----- > From: Dunphy, Bill > Sent: Friday, June 19, 2009 9:01 AM > To: 'Ryusuke Konishi' > Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org > Subject: RE: [NILFS users] Write hang on ARM based target > > Thanks. That patch integrated successfully. > > Ran it again with the following result: > > Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ======== > > Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919 > > Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758 > > Jun 19 08:38:26 kernel: ------------------ SUMMARY ----------------- > > Jun 19 08:38:26 kernel: nfinfo = 4 (number of files) > > Jun 19 08:38:26 kernel: nblocks = 14 (number of blocks) > > Jun 19 08:38:26 kernel: sumbytes = 312 (size of summary in bytes) > > Jun 19 08:38:26 kernel: nsumblk = 1 (number of summary blocks) > > Jun 19 08:38:26 kernel: flags = LOGBGN|LOGEND|SR > > Jun 19 08:38:26 kernel: ============================================ > > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: called > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > summary blocks > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated > bio (max_vecs=16) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > normal blocks (index=1) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting > bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel: > NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, > pseg_start=81162, #req) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completing segment (flags=0x7) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completed a segment having a super root ) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct: > submitted all segments > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end > (stage=9) diff --git a/fs/segment.c b/fs/segment.c index 84201ce..c9c28c2 100644 --- a/fs/segment.c +++ b/fs/segment.c @@ -2530,7 +2530,7 @@ void nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci) struct nilfs_segctor_wait_request { wait_queue_t wq; - __u32 seq; + volatile __u32 seq; int err; atomic_t done; }; @@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, struct nilfs_segctor_req { int mode; - __u32 seq_accepted; + volatile __u32 seq_accepted; int sc_err; /* construction failure */ int sb_err; /* super block writeback failure */ }; diff --git a/fs/segment.h b/fs/segment.h index 44dca64..8533783 100644 --- a/fs/segment.h +++ b/fs/segment.h @@ -163,8 +163,8 @@ struct nilfs_sc_info { wait_queue_head_t sc_wait_daemon; wait_queue_head_t sc_wait_task; - __u32 sc_seq_request; - __u32 sc_seq_done; + volatile __u32 sc_seq_request; + volatile __u32 sc_seq_done; int sc_sync; unsigned long sc_interval; -- 1.6.2 ^ permalink raw reply related [flat|nested] 14+ messages in thread
[parent not found: <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org> @ 2009-06-23 15:34 ` Dunphy, Bill [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> 0 siblings, 1 reply; 14+ messages in thread From: Dunphy, Bill @ 2009-06-23 15:34 UTC (permalink / raw) To: Ryusuke Konishi Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Ryusuke, This patch failed to integrate with the latest git pull again. Looking at the lines it was trying to substitute I can see why. Being a relative rookie when it comes to git, am I missing a step here? My assumption is that anything we do in unison will be against the latest and greatest - is this a bad assumption? My sequence is a bit overkill but seems to ensure a clean start: 1) git clone http://git.nilfs.org/nilfs-module-git 2) cd nilfs2-module 3) patch -p1 < "your patch" If you have any suggestions as to how I should be doing this differently, I'm all ears. In any event, the good news is that was able to hand patch the necessary files (segment.c/segment.h) - pretty simple, straight forward change. The bad news is that it still ends in a hang state: Jun 23 09:12:52 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 23 09:12:52 kernel: full segment: segnum=22, start=45056, end=47103 Jun 23 09:12:52 kernel: partial segment: start=45413, rest=1691 Jun 23 09:12:52 kernel: ------------------ SUMMARY ----------------- Jun 23 09:12:52 kernel: nfinfo = 4 (number of files) Jun 23 09:12:52 kernel: nblocks = 14 (number of blocks) Jun 23 09:12:52 kernel: sumbytes = 312 (size of summary in bytes) Jun 23 09:12:52 kernel: nsumblk = 1 (number of summary blocks) Jun 23 09:12:52 kernel: flags = LOGBGN|LOGEND|SR Jun 23 09:12:52 kernel: ============================================ Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 23 09:12:52 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=16) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=363304, size=57344,) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=45413, #req) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=1 to seq=1 Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Bill -----Original Message----- From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] Sent: Monday, June 22, 2009 9:35 PM To: Dunphy, Bill Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: Re: [NILFS users] Write hang on ARM based target Hi Bill, On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote: > An update. > > I've just begun some testing on a different reference design board > utilizing the 88F6281. This particular board has native support in > the 2.6.30 kernel which allowed me to give the in-tree version of > NILFS a go. This board/kernel version combination ran though the > testing mentioned below without a hitch this weekend (1 million > loops). I've since performed a number of massive simultaneous data > transfers without any errors. Performance appeared to be much better > from a high level as well. So at this point, it appears to me that > there is a NILFS sensitivity to the 2.6.22.18 kernel and/or a board > oddity (even though other file systems worked flawlessly). My near > term plan is to move forward with this new board/kernel combination. > However, I will keep the original board and it's 2.6.22.18 kernel up > and available if you would like me to try some other changes - you > decide. In the meanwhile, I'll start banging away on this platform > and report in if I see any strange behavior. > > Bill Sorry for my late reply. I found an inconsistent state in the value of sequence counter shown in your log. I think some sort of synchronization problem is present. If so, I think we should resolve the problem because it may occur in any RISC architectures. Could you test if the attached patch makes a difference? The patch adds volatile specifiers to sequence counters which may be shared among different tasks. Thanks, Ryusuke Konishi > -----Original Message----- > From: Dunphy, Bill > Sent: Friday, June 19, 2009 9:01 AM > To: 'Ryusuke Konishi' > Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org > Subject: RE: [NILFS users] Write hang on ARM based target > > Thanks. That patch integrated successfully. > > Ran it again with the following result: > > Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ======== > > Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, > end=81919 > > Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758 > > Jun 19 08:38:26 kernel: ------------------ SUMMARY ----------------- > > Jun 19 08:38:26 kernel: nfinfo = 4 (number of files) > > Jun 19 08:38:26 kernel: nblocks = 14 (number of blocks) > > Jun 19 08:38:26 kernel: sumbytes = 312 (size of summary in bytes) > > Jun 19 08:38:26 kernel: nsumblk = 1 (number of summary blocks) > > Jun 19 08:38:26 kernel: flags = LOGBGN|LOGEND|SR > > Jun 19 08:38:26 kernel: ============================================ > > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: called > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > summary blocks > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated > bio (max_vecs=16) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > normal blocks (index=1) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: > submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel: > NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, > pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segbuf_wait: called nbio=1 > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait > completed > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completing segment (flags=0x7) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completed a segment having a super root ) Jun 19 08:38:26 kernel: > NILFS(segment) nilfs_segctor_do_construct: > submitted all segments > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end > (stage=9) diff --git a/fs/segment.c b/fs/segment.c index 84201ce..c9c28c2 100644 --- a/fs/segment.c +++ b/fs/segment.c @@ -2530,7 +2530,7 @@ void nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci) struct nilfs_segctor_wait_request { wait_queue_t wq; - __u32 seq; + volatile __u32 seq; int err; atomic_t done; }; @@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, struct nilfs_segctor_req { int mode; - __u32 seq_accepted; + volatile __u32 seq_accepted; int sc_err; /* construction failure */ int sb_err; /* super block writeback failure */ }; diff --git a/fs/segment.h b/fs/segment.h index 44dca64..8533783 100644 --- a/fs/segment.h +++ b/fs/segment.h @@ -163,8 +163,8 @@ struct nilfs_sc_info { wait_queue_head_t sc_wait_daemon; wait_queue_head_t sc_wait_task; - __u32 sc_seq_request; - __u32 sc_seq_done; + volatile __u32 sc_seq_request; + volatile __u32 sc_seq_done; int sc_sync; unsigned long sc_interval; -- 1.6.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
[parent not found: <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>]
* Re: Write hang on ARM based target [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org> @ 2009-06-24 21:16 ` Dunphy, Bill 0 siblings, 0 replies; 14+ messages in thread From: Dunphy, Bill @ 2009-06-24 21:16 UTC (permalink / raw) To: Ryusuke Konishi Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg Ryusuke, Here's another data point. I've run the same test on the new ARM reference design hardware, but this time running the 2.6.22.18 kernel. It also hangs after a similar period of time and when examining the logging output, it appears very similar to the original error. Jun 24 15:06:25 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 24 15:06:25 kernel: full segment: segnum=124, start=253952, end=255999 Jun 24 15:06:25 kernel: partial segment: start=254180, rest=1820 Jun 24 15:06:25 kernel: ------------------ SUMMARY ----------------- Jun 24 15:06:25 kernel: nfinfo = 4 (number of files) Jun 24 15:06:25 kernel: nblocks = 17 (number of blocks) Jun 24 15:06:25 kernel: sumbytes = 344 (size of summary in bytes) Jun 24 15:06:25 kernel: nsumblk = 1 (number of summary blocks) Jun 24 15:06:25 kernel: flags = LOGBGN|LOGEND|SR Jun 24 15:06:25 kernel: ============================================ Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 24 15:06:25 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=64) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=2033440, size=69632) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=254180, #re) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=1 to seq=1 Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Not too surprising - just eliminates the possibility of bad hardware. Bill -----Original Message----- From: Dunphy, Bill Sent: Tuesday, June 23, 2009 9:35 AM To: 'Ryusuke Konishi' Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: RE: [NILFS users] Write hang on ARM based target Ryusuke, This patch failed to integrate with the latest git pull again. Looking at the lines it was trying to substitute I can see why. Being a relative rookie when it comes to git, am I missing a step here? My assumption is that anything we do in unison will be against the latest and greatest - is this a bad assumption? My sequence is a bit overkill but seems to ensure a clean start: 1) git clone http://git.nilfs.org/nilfs-module-git 2) cd nilfs2-module 3) patch -p1 < "your patch" If you have any suggestions as to how I should be doing this differently, I'm all ears. In any event, the good news is that was able to hand patch the necessary files (segment.c/segment.h) - pretty simple, straight forward change. The bad news is that it still ends in a hang state: Jun 23 09:12:52 kernel: ========= NILFS SEGMENT INFORMATION ======== Jun 23 09:12:52 kernel: full segment: segnum=22, start=45056, end=47103 Jun 23 09:12:52 kernel: partial segment: start=45413, rest=1691 Jun 23 09:12:52 kernel: ------------------ SUMMARY ----------------- Jun 23 09:12:52 kernel: nfinfo = 4 (number of files) Jun 23 09:12:52 kernel: nblocks = 14 (number of blocks) Jun 23 09:12:52 kernel: sumbytes = 312 (size of summary in bytes) Jun 23 09:12:52 kernel: nsumblk = 1 (number of summary blocks) Jun 23 09:12:52 kernel: flags = LOGBGN|LOGEND|SR Jun 23 09:12:52 kernel: ============================================ Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_payload_blocknr: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: called Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: done Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting summary blocks Jun 23 09:12:52 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated bio (max_vecs=16) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting normal blocks (index=1) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting bio (start_sector=363304, size=57344,) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, pseg_start=45413, #req) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1 Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write: completing segment (flags=0x7) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write: completed a segment having a super root ) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_do_construct: submitted all segments Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_construct: end (stage=9) Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_notify: complete requests from seq=1 to seq=1 Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_thread: sequence: req=1, done=1, state=0 Bill -----Original Message----- From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] Sent: Monday, June 22, 2009 9:35 PM To: Dunphy, Bill Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org Subject: Re: [NILFS users] Write hang on ARM based target Hi Bill, On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote: > An update. > > I've just begun some testing on a different reference design board > utilizing the 88F6281. This particular board has native support in > the 2.6.30 kernel which allowed me to give the in-tree version of > NILFS a go. This board/kernel version combination ran though the > testing mentioned below without a hitch this weekend (1 million > loops). I've since performed a number of massive simultaneous data > transfers without any errors. Performance appeared to be much better > from a high level as well. So at this point, it appears to me that > there is a NILFS sensitivity to the 2.6.22.18 kernel and/or a board > oddity (even though other file systems worked flawlessly). My near > term plan is to move forward with this new board/kernel combination. > However, I will keep the original board and it's 2.6.22.18 kernel up > and available if you would like me to try some other changes - you > decide. In the meanwhile, I'll start banging away on this platform > and report in if I see any strange behavior. > > Bill Sorry for my late reply. I found an inconsistent state in the value of sequence counter shown in your log. I think some sort of synchronization problem is present. If so, I think we should resolve the problem because it may occur in any RISC architectures. Could you test if the attached patch makes a difference? The patch adds volatile specifiers to sequence counters which may be shared among different tasks. Thanks, Ryusuke Konishi > -----Original Message----- > From: Dunphy, Bill > Sent: Friday, June 19, 2009 9:01 AM > To: 'Ryusuke Konishi' > Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org > Subject: RE: [NILFS users] Write hang on ARM based target > > Thanks. That patch integrated successfully. > > Ran it again with the following result: > > Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ======== > > Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, > end=81919 > > Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758 > > Jun 19 08:38:26 kernel: ------------------ SUMMARY ----------------- > > Jun 19 08:38:26 kernel: nfinfo = 4 (number of files) > > Jun 19 08:38:26 kernel: nblocks = 14 (number of blocks) > > Jun 19 08:38:26 kernel: sumbytes = 312 (size of summary in bytes) > > Jun 19 08:38:26 kernel: nsumblk = 1 (number of summary blocks) > > Jun 19 08:38:26 kernel: flags = LOGBGN|LOGEND|SR > > Jun 19 08:38:26 kernel: ============================================ > > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: called > Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segctor_update_payload_blocknr: done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > called > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums: > done > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > summary blocks > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated > bio (max_vecs=16) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting > normal blocks (index=1) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: > submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel: > NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, > pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment) > nilfs_segbuf_wait: called nbio=1 > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait > completed > > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completing segment (flags=0x7) > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write: > completed a segment having a super root ) Jun 19 08:38:26 kernel: > NILFS(segment) nilfs_segctor_do_construct: > submitted all segments > Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end > (stage=9) diff --git a/fs/segment.c b/fs/segment.c index 84201ce..c9c28c2 100644 --- a/fs/segment.c +++ b/fs/segment.c @@ -2530,7 +2530,7 @@ void nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci) struct nilfs_segctor_wait_request { wait_queue_t wq; - __u32 seq; + volatile __u32 seq; int err; atomic_t done; }; @@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode, struct nilfs_segctor_req { int mode; - __u32 seq_accepted; + volatile __u32 seq_accepted; int sc_err; /* construction failure */ int sb_err; /* super block writeback failure */ }; diff --git a/fs/segment.h b/fs/segment.h index 44dca64..8533783 100644 --- a/fs/segment.h +++ b/fs/segment.h @@ -163,8 +163,8 @@ struct nilfs_sc_info { wait_queue_head_t sc_wait_daemon; wait_queue_head_t sc_wait_task; - __u32 sc_seq_request; - __u32 sc_seq_done; + volatile __u32 sc_seq_request; + volatile __u32 sc_seq_done; int sc_sync; unsigned long sc_interval; -- 1.6.2 ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-06-24 21:16 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-06-08 15:46 [SPAM] Write hang on ARM based target Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-08 16:49 ` Ryusuke Konishi
[not found] ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-08 20:30 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-09 19:04 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-12 4:23 ` Ryusuke Konishi
[not found] ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-18 18:36 ` Ryusuke Konishi
[not found] ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-18 18:42 ` Ryusuke Konishi
2009-06-18 19:48 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-19 1:47 ` Ryusuke Konishi
[not found] ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-19 15:01 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-22 18:58 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-23 3:35 ` Ryusuke Konishi
[not found] ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-23 15:34 ` Dunphy, Bill
[not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-24 21:16 ` Dunphy, Bill
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox