linux-fsdevel.vger.kernel.org archive mirror
* [RFC 0/5] dio: clean up completion phase of direct_io_worker()
@ 2006-09-05 23:57 Zach Brown
From: Zach Brown @ 2006-09-05 23:57 UTC
  To: linux-fsdevel, linux-aio, linux-kernel

dio: clean up completion phase of direct_io_worker()

There have been a lot of bugs recently due to the way direct_io_worker() tries
to decide how to finish direct IO operations.  In the worst cases it has
failed to call aio_complete() at all (a hang) or has called it more than once
(an oops).

This set of patches cleans up the completion phase with the goal of removing
the complexity that led to these bugs.  We end up with one path that
calculates the result of the operation after all of the bios have completed.
That path runs, and generates the result, when the final reference on the dio
structure is released.
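
A minimal userspace model of that idea, assuming C11 atomics stand in for
whatever locking the kernel uses: the submitter holds one reference, each
in-flight bio holds another, and whichever caller drops the last reference
runs the single completion path.  All names (sketch_dio, sketch_dio_put() and
so on) are hypothetical; this is not the code in these patches.

```c
/*
 * Hypothetical userspace model of refcount-driven dio completion.
 * Not the code from the patch series; every name here is invented.
 */
#include <stdatomic.h>

struct sketch_dio {
	atomic_int refcount;    /* submitter's ref + one per in-flight bio */
	atomic_long bytes_done; /* progress reported by finished bios */
	long result;            /* set once, by the single completion path */
	int completions;        /* times completion ran; should always be 1 */
};

struct sketch_outcome {
	long result;
	int completions;
};

/* The one completion path; runs only after every reference is gone. */
static void sketch_dio_complete(struct sketch_dio *dio)
{
	dio->result = atomic_load(&dio->bytes_done);
	dio->completions++; /* stands in for calling aio_complete() */
}

static void sketch_dio_get(struct sketch_dio *dio)
{
	atomic_fetch_add(&dio->refcount, 1);
}

/* Whoever drops the last reference, bio or submitter, completes the op. */
static void sketch_dio_put(struct sketch_dio *dio)
{
	if (atomic_fetch_sub(&dio->refcount, 1) == 1)
		sketch_dio_complete(dio);
}

/* Bio end-io model: record progress, then drop this bio's reference. */
static void sketch_bio_end(struct sketch_dio *dio, long bytes)
{
	atomic_fetch_add(&dio->bytes_done, bytes);
	sketch_dio_put(dio);
}

/*
 * Issue nbios bios of bio_bytes each.  The first `early` of them finish
 * before the submitter drops its reference; the rest finish afterwards,
 * the way an async completion would.
 */
static struct sketch_outcome sketch_run(int nbios, int early, long bio_bytes)
{
	struct sketch_dio dio;

	atomic_init(&dio.refcount, 1); /* the submitter's reference */
	atomic_init(&dio.bytes_done, 0);
	dio.result = -1;
	dio.completions = 0;

	for (int i = 0; i < nbios; i++)
		sketch_dio_get(&dio); /* one reference per bio */
	int i = 0;
	for (; i < early && i < nbios; i++)
		sketch_bio_end(&dio, bio_bytes);
	sketch_dio_put(&dio); /* submitter has finished issuing */
	for (; i < nbios; i++)
		sketch_bio_end(&dio, bio_bytes);

	struct sketch_outcome out = { dio.result, dio.completions };
	return out;
}
```

Because the decision hinges only on the count reaching zero, the model runs
its completion exactly once whether the bios finish before or after the
submitter drops its reference, which is the property whose violation caused
the hangs and oopses described above.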

I tried to progress towards the final state in steps that were relatively easy
to understand.  Each step should compile but I only tested the final result of
having all the patches applied.

The patches result in a slight net decrease in code and binary size:

 2.6.18-rc4-dio-cleanup/fs/direct-io.c |    8
 2.6.18-rc5-dio-cleanup/fs/direct-io.c |   94 +++++------
 2.6.18-rc5-dio-cleanup/mm/filemap.c   |    4
 fs/direct-io.c                        |  273 ++++++++++++++--------------------
 4 files changed, 159 insertions(+), 220 deletions(-)

   text    data     bss     dec     hex filename
2592385  450996  210296 3253677  31a5ad vmlinux.before
2592113  450980  210296 3253389  31a48d vmlinux.after
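
The size table can be checked with simple subtraction.  This small C program
recomputes the per-section savings from the size(1) columns quoted above; the
struct and function names are made up for illustration.

```c
/* Columns copied from the size(1) output above (text, data, bss, dec). */
struct size_row {
	long text, data, bss, dec;
};

static const struct size_row vmlinux_before = { 2592385, 450996, 210296, 3253677 };
static const struct size_row vmlinux_after  = { 2592113, 450980, 210296, 3253389 };

/* Bytes saved in each column; positive means the image shrank. */
static struct size_row size_savings(struct size_row before, struct size_row after)
{
	struct size_row saved = {
		.text = before.text - after.text,
		.data = before.data - after.data,
		.bss  = before.bss  - after.bss,
		.dec  = before.dec  - after.dec,
	};
	return saved;
}

/* size(1) reports dec as the sum of the three sections. */
static int row_is_consistent(struct size_row r)
{
	return r.text + r.data + r.bss == r.dec;
}
```

It gives 272 bytes saved in text, 16 in data, none in bss, and 288 in total
(0x31a5ad - 0x31a48d = 0x120 = 288); the diffstat likewise shows a net 61
lines removed (220 - 159).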

The patches pass light testing with aio-stress, the direct IO tests I could
manage to get running in LTP, and some home-brew functional tests.  The series
is still making its way through stress testing.  I don't think it should be
merged until we hear back from that more rigorous testing.

I'm sending the patches out without waiting for the stress tests in the hope
of getting some feedback (and maybe volunteers for testing!).

- z

* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
@ 2006-09-21 12:24 Veerendra Chandrappa
From: Veerendra Chandrappa @ 2006-09-21 12:24 UTC
  To: linux-kernel, linux-fsdevel

Hello,

I applied the DIO patches to 2.6.18-rc6 (from kernel.org), built the kernel,
and ran the Aio-DioStressTest from the LTP test suite (ltp-full-20060822) on
EXT2, EXT3 and XFS filesystems.  The tests passed on EXT2 and EXT3, but on XFS
I got the following stack trace and the machine went down.

kernel BUG at kernel/workqueue.c:113!
invalid opcode: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    2
EIP:    0060:[<c012df03>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-rc6-dio #1)
EIP is at queue_work+0x86/0x90
eax: f7900780   ebx: f790077c   ecx: f7900754   edx: 00000002
esi: c5f4f8e0   edi: 00000000   ebp: c5d63ca8   esp: c5d63c94
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, ti=c5d62000 task=c5d2b030 task.ti=c5d62000)
Stack: f268f180 c5d63cb4 3e39c000 00000000 00010000 c5d63cb0 c02b43a2 c5d63cc8
       c02b5e2f f7900754 00000000 3e39c000 f4e90000 c5d63d04 c018e77a f3235780
       3e39c000 00000000 00010000 f2604b20 f268f180 c5d63d04 00010000 3e39c000
Call Trace:
 [<c0103cea>] show_stack_log_lvl+0xcc/0xdc
 [<c0103f0f>] show_registers+0x1b7/0x22b
 [<c0104143>] die+0x139/0x235
 [<c01042bd>] do_trap+0x7e/0xb4
 [<c01045f3>] do_invalid_op+0xb5/0xbf
 [<c0103945>] error_code+0x39/0x40
 [<c02b43a2>] xfs_finish_ioend+0x20/0x22
 [<c02b5e2f>] xfs_end_io_direct+0x3c/0x68
 [<c018e77a>] dio_complete+0xe3/0xfe
 [<c018e82d>] dio_bio_end_aio+0x98/0xb1
 [<c016e889>] bio_endio+0x4e/0x78
 [<c02cdc89>] __end_that_request_first+0xcd/0x416
 [<c02ce015>] end_that_request_chunk+0x1f/0x21
 [<c0380442>] scsi_end_request+0x2d/0xe8
 [<c0380715>] scsi_io_completion+0x10c/0x409
 [<c03a986b>] sd_rw_intr+0x188/0x2c6
 [<c037b832>] scsi_finish_command+0x4e/0x96
 [<c0380f44>] scsi_softirq_done+0xaa/0x10b
 [<c02ce073>] blk_done_softirq+0x5c/0x6a
 [<c01227a4>] __do_softirq+0x6d/0xe3
 [<c0122858>] do_softirq+0x3e/0x40
 [<c01228a1>] irq_exit+0x47/0x49
 [<c01054ef>] do_IRQ+0x2f/0x5d
 [<c0103826>] common_interrupt+0x1a/0x20
 [<c0100d84>] cpu_idle+0x9a/0xb0
 [<c010e077>] start_secondary+0xeb/0x32c
 [<00000000>] 0x0
 [<c5d63fb4>] 0xc5d63fb4
Code: ff ff b8 01 00 00 00 e8 87 9a fe ff 89 e0 25 00 e0 ff ff 8b 40 08 a8 08 75 0a 83 c4 08 89 f8 5b 5e 5f 5d c3 e8 2f 0f 3
EIP: [<c012df03>] queue_work+0x86/0x90 SS:ESP 0068:c5d63c94
 <0>Kernel panic - not syncing: Fatal exception in interrupt

I also ran some of the tests from
http://developer.osdl.org/daniel/AIO/TESTS/ and they passed.
For further testing, I plan to put a stress workload on DB2 with its AIO-DIO
features enabled.  For error-path testing, I am considering using kprobes to
inject I/O errors.

I will let you know how it progresses.

Regards
Veerendra C
LTC-ISL, IBM.


Thread overview: 17+ messages
2006-09-05 23:57 [RFC 0/5] dio: clean up completion phase of direct_io_worker() Zach Brown
2006-09-05 23:57 ` [RFC 1/5] dio: centralize completion in dio_complete() Zach Brown
2006-09-05 23:57 ` [RFC 2/5] dio: call blk_run_address_space() once per op Zach Brown
2006-09-05 23:57 ` [RFC 3/5] dio: formalize bio counters as a dio reference count Zach Brown
2006-09-05 23:57 ` [RFC 4/5] dio: remove duplicate bio wait code Zach Brown
2006-09-05 23:57 ` [RFC 5/5] dio: only call aio_complete() after returning -EIOCBQUEUED Zach Brown
2006-09-06  4:35 ` bogofilter ate 3/5 Zach Brown
2006-09-06  5:00   ` Willy Tarreau
2006-09-06  7:22   ` Mike Galbraith
2006-09-08 22:16   ` Matthias Andree
2006-09-06  7:36 ` [RFC 0/5] dio: clean up completion phase of direct_io_worker() Suparna Bhattacharya
2006-09-06 16:36   ` Zach Brown
2006-09-06 14:57 ` Jeff Moyer
2006-09-06 16:46   ` Zach Brown
2006-09-06 18:13     ` Jeff Moyer
  -- strict thread matches above, loose matches on Subject: below --
2006-09-21 12:24 Veerendra Chandrappa
     [not found] <OFBE544A3C.7C1B2C64-ON652571F0.003C21B6-652571F0.003C2DF3@in.ibm.com>
2006-09-21 18:38 ` Zach Brown
