* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
[not found] <OFBE544A3C.7C1B2C64-ON652571F0.003C21B6-652571F0.003C2DF3@in.ibm.com>
@ 2006-09-21 18:38 ` Zach Brown
From: Zach Brown @ 2006-09-21 18:38 UTC (permalink / raw)
To: Veerendra Chandrappa; +Cc: linux-fsdevel, linux-aio, linux-kernel, suparna, xfs
> on EXT2, EXT3 and XFS filesystems. For the EXT2 and EXT3 filesystems the
> tests went okay. But I got a stack trace on the XFS filesystem and the
> machine went down.
Fantastic, thanks for running these tests.
> kernel BUG at kernel/workqueue.c:113!
> EIP is at queue_work+0x86/0x90
We were able to set the pending bit but then found that list_empty()
failed on the work struct's entry list_head. Let's call this memory
corruption of some kind.
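For reference, the check that fires looks roughly like this (paraphrased
from memory of 2.6.18's queue_work(), not quoted verbatim):

	int queue_work(struct workqueue_struct *wq, struct work_struct *work)
	{
		int ret = 0, cpu = get_cpu();

		if (!test_and_set_bit(0, &work->pending)) {
			/* we just claimed the pending bit, so this work must
			 * not already be sitting on a cpu workqueue list */
			BUG_ON(!list_empty(&work->entry));
			__queue_work(per_cpu_ptr(wq->cpu_wq, cpu), work);
			ret = 1;
		}
		put_cpu();
		return ret;
	}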
> [<c02b43a2>] xfs_finish_ioend+0x20/0x22
> [<c02b5e2f>] xfs_end_io_direct+0x3c/0x68
> [<c018e77a>] dio_complete+0xe3/0xfe
> [<c018e82d>] dio_bio_end_aio+0x98/0xb1
> [<c016e889>] bio_endio+0x4e/0x78
> [<c02cdc89>] __end_that_request_first+0xcd/0x416
It was completing an AIO request.
	ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
			iomap.iomap_target->bt_bdev,
			iov, offset, nr_segs,
			xfs_get_blocks_direct,
			xfs_end_io_direct);
	if (unlikely(ret <= 0 && iocb->private))
		xfs_destroy_ioend(iocb->private);
It looks like xfs_vm_direct_io() is destroying the ioend in the case
where direct IO returns -EIOCBQUEUED. Later the AIO will complete and
try to call queue_work() on the freed ioend. This wasn't a problem
before, when blockdev_direct_IO_*() would just return the number of
bytes in the op that was in flight. That test should be

	if (unlikely(ret != -EIOCBQUEUED && iocb->private))

I'll update the patch set and send it out.
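For clarity, the corrected call site would then look roughly like this (the
surrounding code is the snippet quoted above; only the check changes):

	ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
			iomap.iomap_target->bt_bdev,
			iov, offset, nr_segs,
			xfs_get_blocks_direct,
			xfs_end_io_direct);

	/*
	 * -EIOCBQUEUED means the AIO is still in flight and will call
	 * xfs_end_io_direct() later, so the ioend must stay alive here.
	 */
	if (unlikely(ret != -EIOCBQUEUED && iocb->private))
		xfs_destroy_ioend(iocb->private);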
This makes me worry that XFS might have other paths that need to know
about the magical -EIOCBQUEUED case, which actually means that an AIO
DIO is in flight.
Could I coerce some XFS guys into investigating whether we might have
other problems with bubbling -EIOCBQUEUED up from
blockdev_direct_IO_own_locking() through to xfs_file_aio_write()'s
caller before xfs_end_io_direct() is called?
- z
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
@ 2006-09-21 12:24 Veerendra Chandrappa
From: Veerendra Chandrappa @ 2006-09-21 12:24 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel
Hello,

I applied the DIO patches and built the kernel 2.6.18-rc6 (kernel.org)
and executed the Aio-DioStressTest of the LTP test suite
(ltp-full-20060822) on EXT2, EXT3 and XFS filesystems. For the EXT2 and
EXT3 filesystems the tests went okay. But I got a stack trace on the XFS
filesystem and the machine went down.
kernel BUG at kernel/workqueue.c:113!
invalid opcode: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 2
EIP: 0060:[<c012df03>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-rc6-dio #1)
EIP is at queue_work+0x86/0x90
eax: f7900780 ebx: f790077c ecx: f7900754 edx: 00000002
esi: c5f4f8e0 edi: 00000000 ebp: c5d63ca8 esp: c5d63c94
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c5d62000 task=c5d2b030 task.ti=c5d62000)
Stack: f268f180 c5d63cb4 3e39c000 00000000 00010000 c5d63cb0 c02b43a2 c5d63cc8
       c02b5e2f f7900754 00000000 3e39c000 f4e90000 c5d63d04 c018e77a f3235780
       3e39c000 00000000 00010000 f2604b20 f268f180 c5d63d04 00010000 3e39c000
Call Trace:
[<c0103cea>] show_stack_log_lvl+0xcc/0xdc
[<c0103f0f>] show_registers+0x1b7/0x22b
[<c0104143>] die+0x139/0x235
[<c01042bd>] do_trap+0x7e/0xb4
[<c01045f3>] do_invalid_op+0xb5/0xbf
[<c0103945>] error_code+0x39/0x40
[<c02b43a2>] xfs_finish_ioend+0x20/0x22
[<c02b5e2f>] xfs_end_io_direct+0x3c/0x68
[<c018e77a>] dio_complete+0xe3/0xfe
[<c018e82d>] dio_bio_end_aio+0x98/0xb1
[<c016e889>] bio_endio+0x4e/0x78
[<c02cdc89>] __end_that_request_first+0xcd/0x416
[<c02ce015>] end_that_request_chunk+0x1f/0x21
[<c0380442>] scsi_end_request+0x2d/0xe8
[<c0380715>] scsi_io_completion+0x10c/0x409
[<c03a986b>] sd_rw_intr+0x188/0x2c6
[<c037b832>] scsi_finish_command+0x4e/0x96
[<c0380f44>] scsi_softirq_done+0xaa/0x10b
[<c02ce073>] blk_done_softirq+0x5c/0x6a
[<c01227a4>] __do_softirq+0x6d/0xe3
[<c0122858>] do_softirq+0x3e/0x40
[<c01228a1>] irq_exit+0x47/0x49
[<c01054ef>] do_IRQ+0x2f/0x5d
[<c0103826>] common_interrupt+0x1a/0x20
[<c0100d84>] cpu_idle+0x9a/0xb0
[<c010e077>] start_secondary+0xeb/0x32c
[<00000000>] 0x0
[<c5d63fb4>] 0xc5d63fb4
Code: ff ff b8 01 00 00 00 e8 87 9a fe ff 89 e0 25 00 e0 ff ff 8b 40 08 a8 08 75 0a 83 c4 08 89 f8 5b 5e 5f 5d c3 e8 2f 0f 3
EIP: [<c012df03>] queue_work+0x86/0x90 SS:ESP 0068:c5d63c94
<0>Kernel panic - not syncing: Fatal exception in interrupt
Also, I executed some of the tests from
http://developer.osdl.org/daniel/AIO/TESTS/ and they went fine.
For further testing, I am planning to put a stress workload on DB2 with
its AIO-DIO features enabled. For error-path testing, I am contemplating
using kprobes to inject IO errors.
I will let you know the progress as it happens.
Regards
Veerendra C
LTC-ISL, IBM.
* [RFC 0/5] dio: clean up completion phase of direct_io_worker()
@ 2006-09-05 23:57 Zach Brown
2006-09-06 7:36 ` Suparna Bhattacharya
2006-09-06 14:57 ` Jeff Moyer
From: Zach Brown @ 2006-09-05 23:57 UTC (permalink / raw)
To: linux-fsdevel, linux-aio, linux-kernel
dio: clean up completion phase of direct_io_worker()
There have been a lot of bugs recently due to the way direct_io_worker() tries
to decide how to finish direct IO operations. In the worst examples it has
failed to call aio_complete() at all (hang) or called it too many times (oops).
This set of patches cleans up the completion phase with the goal of removing
the complexity that led to these bugs. We end up with one path that
calculates the result of the operation after all of the bios have completed.
We decide when to generate a result of the operation using that path based on
the final release of a refcount on the dio structure.
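To make the shape of that concrete, here is a minimal sketch of the idea;
the field and helper names are illustrative only, not the actual patch:

	struct dio {
		atomic_t	refcount;	/* one ref per bio in flight,
						 * plus one for the submitter */
		/* ... */
	};

	/* the single path that computes the result and completes the op */
	static void dio_complete_path(struct dio *dio);

	static void dio_drop_ref(struct dio *dio)
	{
		/*
		 * Called by the submitting context and by each bio's end_io;
		 * whoever releases the final reference generates the result,
		 * whether the op was synchronous or AIO.
		 */
		if (atomic_dec_and_test(&dio->refcount))
			dio_complete_path(dio);
	}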
I tried to progress towards the final state in steps that were relatively easy
to understand. Each step should compile but I only tested the final result of
having all the patches applied.
The patches result in a slight net decrease in code and binary size:
2.6.18-rc4-dio-cleanup/fs/direct-io.c | 8
2.6.18-rc5-dio-cleanup/fs/direct-io.c | 94 +++++------
2.6.18-rc5-dio-cleanup/mm/filemap.c | 4
fs/direct-io.c | 273 ++++++++++++++--------------------
4 files changed, 159 insertions(+), 220 deletions(-)
text data bss dec hex filename
2592385 450996 210296 3253677 31a5ad vmlinux.before
2592113 450980 210296 3253389 31a48d vmlinux.after
The patches pass light testing with aio-stress, the direct IO tests I could
manage to get running in LTP, and some home-brew functional tests. It's still
making its way through stress testing. It should not be merged until we hear
from that more rigorous testing, I don't think.
I hoped to get some feedback (and maybe volunteers for testing!) by sending the
patches out before waiting for the stress tests.
- z
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
2006-09-05 23:57 Zach Brown
@ 2006-09-06 7:36 ` Suparna Bhattacharya
2006-09-06 16:36 ` Zach Brown
2006-09-06 14:57 ` Jeff Moyer
From: Suparna Bhattacharya @ 2006-09-06 7:36 UTC (permalink / raw)
To: Zach Brown; +Cc: linux-fsdevel, linux-aio, linux-kernel
On Tue, Sep 05, 2006 at 04:57:32PM -0700, Zach Brown wrote:
> There have been a lot of bugs recently due to the way direct_io_worker() tries
> to decide how to finish direct IO operations. In the worst examples it has
> failed to call aio_complete() at all (hang) or called it too many times (oops).
>
> This set of patches cleans up the completion phase with the goal of removing
> the complexity that led to these bugs. We end up with one path that
> calculates the result of the operation after all of the bios have completed.
> We decide when to generate a result of the operation using that path based on
> the final release of a refcount on the dio structure.
Very nice piece of work! Thanks for taking this up :)
I have looked through all the patches and the completion path unification
logic looks sound to me (btw, the explanations and comments are very good too).
Not the usual band-aids that we often end up with, but a real cleanup that
should go some way in making at least the most error-prone part of the DIO
code more maintainable (I hope/wish we could also do something similar with
simplifying the locking as well).
Of course with this code, we have to await rigorous testing
... and more reviews, but please consider this as my ack for the approach.
Regards
Suparna
>
> I tried to progress towards the final state in steps that were relatively easy
> to understand. Each step should compile but I only tested the final result of
> having all the patches applied.
>
> The patches result in a slight net decrease in code and binary size:
>
> 2.6.18-rc4-dio-cleanup/fs/direct-io.c | 8
> 2.6.18-rc5-dio-cleanup/fs/direct-io.c | 94 +++++------
> 2.6.18-rc5-dio-cleanup/mm/filemap.c | 4
> fs/direct-io.c | 273 ++++++++++++++--------------------
> 4 files changed, 159 insertions(+), 220 deletions(-)
>
> text data bss dec hex filename
> 2592385 450996 210296 3253677 31a5ad vmlinux.before
> 2592113 450980 210296 3253389 31a48d vmlinux.after
>
> The patches pass light testing with aio-stress, the direct IO tests I could
> manage to get running in LTP, and some home-brew functional tests. It's still
> making its way through stress testing. It should not be merged until we hear
> from that more rigorous testing, I don't think.
>
> I hoped to get some feedback (and maybe volunteers for testing!) by sending the
> patches out before waiting for the stress tests.
>
> - z
>
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
2006-09-06 7:36 ` Suparna Bhattacharya
@ 2006-09-06 16:36 ` Zach Brown
From: Zach Brown @ 2006-09-06 16:36 UTC (permalink / raw)
To: suparna; +Cc: linux-fsdevel, linux-aio, linux-kernel
> code more maintainable (I hope/wish we could also do something similar with
> simplifying the locking as well).
I agree, and that is definitely on my medium-term todo list.
> Of course with this code, we have to await rigorous testing
> ... and more reviews, but please consider this as my ack for the approach.
Yeah, absolutely. Is there a chance that IBM can throw some testing
cycles at it? I have it queued up for some moderately sized DB runs
over here (FC arrays, cable pulling, that kind of thing).
- z
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
2006-09-05 23:57 Zach Brown
2006-09-06 7:36 ` Suparna Bhattacharya
@ 2006-09-06 14:57 ` Jeff Moyer
2006-09-06 16:46 ` Zach Brown
From: Jeff Moyer @ 2006-09-06 14:57 UTC (permalink / raw)
To: Zach Brown; +Cc: linux-fsdevel, linux-aio, linux-kernel
==> Regarding [RFC 0/5] dio: clean up completion phase of direct_io_worker(); Zach Brown <zach.brown@oracle.com> adds:
zach.brown> There have been a lot of bugs recently due to the way
zach.brown> direct_io_worker() tries to decide how to finish direct IO
zach.brown> operations. In the worst examples it has failed to call
zach.brown> aio_complete() at all (hang) or called it too many times
zach.brown> (oops).
zach.brown> This set of patches cleans up the completion phase with the
zach.brown> goal of removing the complexity that led to these bugs. We
zach.brown> end up with one path that calculates the result of the
zach.brown> operation after all of the bios have completed. We decide
zach.brown> when to generate a result of the operation using that path
zach.brown> based on the final release of a refcount on the dio structure.
[...]
zach.brown> I hoped to get some feedback (and maybe volunteers for
zach.brown> testing!) by sending the patches out before waiting for the
zach.brown> stress tests.
This all looks good, the code is much easier to follow. What do you think
about making dio->result an unsigned quantity? It should never be negative
now that there is an io_error field.
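A sketch of the suggestion, purely illustrative (these are not the actual
field types in the patches):

	struct dio {
		/* ... */
		int	io_error;	/* first -Exxx error seen, 0 if none */
		size_t	result;		/* bytes issued; with errors carried in
					 * io_error it never needs to go negative */
	};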
ACK.
Jeff
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
2006-09-06 14:57 ` Jeff Moyer
@ 2006-09-06 16:46 ` Zach Brown
2006-09-06 18:13 ` Jeff Moyer
From: Zach Brown @ 2006-09-06 16:46 UTC (permalink / raw)
To: Jeff Moyer; +Cc: linux-fsdevel, linux-aio, linux-kernel
> This all looks good, the code is much easier to follow. What do you think
> about making dio->result an unsigned quantity? It should never be negative
> now that there is an io_error field.
Yeah, that has always bugged me too. I considered renaming it 'issued',
or something, as part of this patchset but thought we could do it later.
While we're on this topic, I'm nervous that we increment it when
do_direct_IO fails. It might be sound, but that we consider it the
amount of work "transferred" for dio->end_io makes me want to make sure
there aren't confusing corner cases here.
- z
* Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
2006-09-06 16:46 ` Zach Brown
@ 2006-09-06 18:13 ` Jeff Moyer
From: Jeff Moyer @ 2006-09-06 18:13 UTC (permalink / raw)
To: Zach Brown; +Cc: linux-fsdevel, linux-aio, linux-kernel
==> Regarding Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker(); Zach Brown <zach.brown@oracle.com> adds:
>> This all looks good, the code is much easier to follow. What do you think
>> about making dio->result an unsigned quantity? It should never be negative
>> now that there is an io_error field.
zach.brown> Yeah, that has always bugged me too. I considered renaming it
zach.brown> 'issued', or something, as part of this patchset but thought we
zach.brown> could do it later.
I figured since you were doing some house-keeping, we might as well clean
up as much as possible. It's up to you, though. ;)
zach.brown> While we're on this topic, I'm nervous that we increment it
zach.brown> when do_direct_IO fails. It might be sound, but that we
zach.brown> consider it the amount of work "transferred" for dio->end_io
zach.brown> makes me want to make sure there aren't confusing corner cases
zach.brown> here.
It does look non-obvious when reading the code. However, I'm pretty sure
it's right. dio->block_in_file is only updated if there is no error
returned from submit_page_section. As such, it really does reflect how
much work was done before the error, right? It does seem odd that we do
this math in two separate places, though.
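For reference, the accounting being discussed looks roughly like this
(paraphrased from the per-segment loop in direct_io_worker(), not quoted
verbatim):

	ret = do_direct_IO(dio);

	/*
	 * Credit the bytes that were actually mapped before any failure:
	 * dio->block_in_file only advances past blocks that were submitted,
	 * so the subtraction leaves out whatever was never issued.
	 */
	dio->result += iov[seg].iov_len -
		((dio->final_block_in_request - dio->block_in_file) << blkbits);

	if (ret) {
		dio_cleanup(dio);
		break;
	}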
-Jeff