From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zach Brown
Subject: Re: [RFC 0/5] dio: clean up completion phase of direct_io_worker()
Date: Thu, 21 Sep 2006 11:38:13 -0700
Message-ID: <4512DC15.8050101@oracle.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org, linux-kernel@vger.kernel.org, suparna@in.ibm.com, xfs@oss.sgi.com
Return-path:
Received: from agminet01.oracle.com ([141.146.126.228]:35462 "EHLO agminet01.oracle.com") by vger.kernel.org with ESMTP id S1751444AbWIUSiv (ORCPT ); Thu, 21 Sep 2006 14:38:51 -0400
To: Veerendra Chandrappa
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

> on EXT2, EXT3 and XFS filesystems. For the EXT2 and EXT3 filesystems the
> tests went okay. But I got stack trace on XFS filesystem and the machine
> went down.

Fantastic, thanks for running these tests.

> kernel BUG at kernel/workqueue.c:113!
> EIP is at queue_work+0x86/0x90

We were able to set the pending bit but then found that list_empty()
failed on the work queue's entry list_head.  Let's call this memory
corruption of some kind.

> [] xfs_finish_ioend+0x20/0x22
> [] xfs_end_io_direct+0x3c/0x68
> [] dio_complete+0xe3/0xfe
> [] dio_bio_end_aio+0x98/0xb1
> [] bio_endio+0x4e/0x78
> [] __end_that_request_first+0xcd/0x416

It was completing an AIO request.

	ret = blockdev_direct_IO_own_locking(rw, iocb, inode,
		iomap.iomap_target->bt_bdev,
		iov, offset, nr_segs,
		xfs_get_blocks_direct,
		xfs_end_io_direct);

	if (unlikely(ret <= 0 && iocb->private))
		xfs_destroy_ioend(iocb->private);

It looks like xfs_vm_direct_IO() is destroying the ioend in the case
where direct IO is returning -EIOCBQUEUED.  Later the AIO will complete
and try to call queue_work on the freed ioend.  This wasn't a problem
before when blockdev_direct_IO_*() would just return the number of bytes
in the op that was in flight.

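To make the failure concrete, here is a small user-space sketch (not
kernel code) of how the existing check behaves under the old and new
return conventions.  EIOCBQUEUED is redefined locally with the value
from the kernel's include/linux/errno.h, and struct fake_ioend,
old_test_destroys() and completion_hits_freed_ioend() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno, not visible in user space; redefined for this sketch. */
#define EIOCBQUEUED 529

/* Hypothetical stand-in for the ioend hanging off iocb->private. */
struct fake_ioend {
	bool freed;
};

/* The existing test: tear down the ioend on any non-positive return. */
static bool old_test_destroys(long ret, const struct fake_ioend *priv)
{
	return ret <= 0 && priv != NULL;
}

/*
 * Models the caller followed by the later AIO completion.
 * Returns true if that completion would touch an already-freed ioend.
 */
static bool completion_hits_freed_ioend(long ret, struct fake_ioend *io)
{
	if (old_test_destroys(ret, io))
		io->freed = true;	/* models xfs_destroy_ioend() */

	/* Only a queued AIO still has a completion to come. */
	return ret == -EIOCBQUEUED && io->freed;
}

/* Walks through both return conventions; aborts if the model disagrees. */
static void demo(void)
{
	struct fake_ioend old_style = { false };
	struct fake_ioend new_style = { false };

	/* Old convention: bytes in flight, positive, ioend left alone. */
	assert(!completion_hits_freed_ioend(4096, &old_style));

	/* New convention: -EIOCBQUEUED is negative, so the ioend is freed early. */
	assert(completion_hits_freed_ioend(-EIOCBQUEUED, &new_style));
}
```

Calling demo() exercises both cases: the positive byte count leaves the
ioend alone, while -EIOCBQUEUED satisfies ret <= 0 and frees it ahead of
the completion.
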
That test should be

	if (unlikely(ret != -EIOCBQUEUED && iocb->private))

I'll update the patch set and send it out.

This makes me worry that XFS might have other paths that need to know
about the magical -EIOCBQUEUED case which actually means that an AIO DIO
is in flight.

Could I coerce some XFS guys into investigating if we might have other
problems with trying to bubble -EIOCBQUEUED up from
blockdev_direct_IO_own_locking() through to xfs_file_aio_write()'s
caller before calling xfs_end_io_direct()?

- z