From mboxrd@z Thu Jan 1 00:00:00 1970
From: chris.mason@oracle.com
Subject: (unknown)
Date: Thu, 21 Dec 2006 13:56:06 -0700
Message-ID: <200612212056.kBLKPBFO011611@rgmgw2.us.oracle.com>
Return-path:
Received: from rgminet02.oracle.com ([148.87.113.119]:16369
	"EHLO rgminet02.oracle.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1423105AbWLUVCi (ORCPT );
	Thu, 21 Dec 2006 16:02:38 -0500
Received: from rgminet01.oracle.com (rgminet01.oracle.com [148.87.113.118])
	by rgminet02.oracle.com (Switch-3.2.4/Switch-3.1.7)
	with ESMTP id kBLL2ak6015255
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for ; Thu, 21 Dec 2006 14:02:36 -0700
To: unlisted-recipients:; (no To-header on input)
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

>>From chris.mason@oracle.com Thu Dec 21 15:35:02 2006
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [PATCH 5 of 8] Make ext3 safe for the new DIO locking rules
X-Mercurial-Node: bebaf8972a3198faf661ab988af0f53cd49856bb
Message-Id:
In-Reply-To:
Date: Thu, 21 Dec 2006 15:35:01 -0400
From: Chris Mason
To: linux-fsdevel@vger.kernel.org, akpm@osdl.org, zach.brown@oracle.com

This creates a version of ext3_get_block that starts and ends a
transaction.

By starting and ending the transaction inside get_block, this is able to
avoid lock inversion problems when the DIO code tries to take page locks
inside blockdev_direct_IO. (transaction locks must always happen after
page locks).

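For readers who want the ordering rule in isolation, here is a minimal userspace sketch of the idea, not kernel code. Every toy_* name is hypothetical and only stands in for the journal and page-lock machinery; the assumption that the block-mapping callback needs a running transaction is part of the toy model. It counts how often a page lock is taken while a transaction is already open, first with one transaction held across the whole direct I/O call and then with a transaction started and stopped inside each get_block call.

```c
#include <assert.h>

/* Hypothetical stand-ins for journal and page-lock state; not kernel APIs. */
static int txn_open;		/* 1 while a transaction handle is held */
static int order_violations;	/* page locks taken while a transaction was open */

static void toy_journal_start(void) { txn_open = 1; }
static void toy_journal_stop(void)  { txn_open = 0; }

/* Rule being modelled: page locks first, transaction locks after. */
static void toy_lock_page(void)
{
	if (txn_open)
		order_violations++;
}

typedef int (*toy_get_block_t)(long iblock);

/* Toy block mapping; assumed here to need a running transaction. */
static int toy_get_block(long iblock)
{
	(void)iblock;
	return txn_open ? 0 : -1;
}

/* Wrapper that starts and stops a transaction around one mapping call. */
static int toy_get_block_direct_io(long iblock)
{
	int ret;

	toy_journal_start();
	ret = toy_get_block(iblock);
	toy_journal_stop();
	return ret;
}

/* Toy direct I/O loop: take a page lock, then map one block. */
static int toy_direct_io(toy_get_block_t get_block, long nblocks)
{
	long i;

	for (i = 0; i < nblocks; i++) {
		int ret;

		toy_lock_page();
		ret = get_block(i);
		if (ret)
			return ret;
	}
	return 0;
}

/* One transaction held across the whole direct I/O call. */
int toy_violations_outer_txn(long nblocks)
{
	order_violations = 0;
	toy_journal_start();
	toy_direct_io(toy_get_block, nblocks);
	toy_journal_stop();
	return order_violations;
}

/* No outer transaction; one transaction per get_block call. */
int toy_violations_per_block_txn(long nblocks)
{
	order_violations = 0;
	txn_open = 0;
	toy_direct_io(toy_get_block_direct_io, nblocks);
	return order_violations;
}
```

In this model the outer-transaction variant takes every page lock with a transaction open, while the per-call variant never does, because each transaction is closed again before the next page lock is taken.
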
Signed-off-by: Chris Mason

diff -r 385bc75d9266 -r bebaf8972a31 fs/ext3/inode.c
--- a/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
+++ b/fs/ext3/inode.c	Thu Dec 21 15:31:30 2006 -0500
@@ -1673,6 +1673,30 @@ static int ext3_releasepage(struct page
 	return journal_try_to_free_buffers(journal, page, wait);
 }
 
+static int ext3_get_block_direct_IO(struct inode *inode, sector_t iblock,
+			struct buffer_head *bh_result, int create)
+{
+	int ret = 0;
+	handle_t *handle = ext3_journal_start(inode, DIO_CREDITS);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		goto out;
+	}
+	ret = ext3_get_block(inode, iblock, bh_result, create);
+	/*
+	 * Reacquire the handle: ext3_get_block() can restart the transaction
+	 */
+	handle = journal_current_handle();
+	if (handle) {
+		int err;
+		err = ext3_journal_stop(handle);
+		if (!ret)
+			ret = err;
+	}
+out:
+	return ret;
+}
+
 /*
  * If the O_DIRECT write will extend the file then add this inode to the
  * orphan list. So recovery will truncate it back to the original size
@@ -1693,39 +1717,58 @@ static ssize_t ext3_direct_IO(int rw, st
 	int orphan = 0;
 	size_t count = iov_length(iov, nr_segs);
 
-	if (rw == WRITE) {
-		loff_t final_size = offset + count;
-
+	if (rw == WRITE && (offset + count > inode->i_size)) {
 		handle = ext3_journal_start(inode, DIO_CREDITS);
 		if (IS_ERR(handle)) {
 			ret = PTR_ERR(handle);
 			goto out;
 		}
-		if (final_size > inode->i_size) {
-			ret = ext3_orphan_add(handle, inode);
-			if (ret)
-				goto out_stop;
-			orphan = 1;
-			ei->i_disksize = inode->i_size;
-		}
-	}
-
+		ret = ext3_orphan_add(handle, inode);
+		if (ret) {
+			ext3_journal_stop(handle);
+			goto out;
+		}
+		ei->i_disksize = inode->i_size;
+		ret = ext3_journal_stop(handle);
+		if (ret) {
+			/* something has gone horribly wrong, cleanup
+			 * the orphan list in ram
+			 */
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+		orphan = 1;
+	}
+
+	/*
+	 * the placeholder page code may take a page lock, so we have
+	 * to stop any running transactions before calling
+	 * blockdev_direct_IO. Use ext3_get_block_direct_IO to start
+	 * and stop a transaction on each get_block call.
+	 */
 	ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
-				 ext3_get_block, NULL);
+				 ext3_get_block_direct_IO, NULL);
 
 	/*
 	 * Reacquire the handle: ext3_get_block() can restart the transaction
 	 */
 	handle = journal_current_handle();
 
-out_stop:
-	if (handle) {
+	if (orphan) {
 		int err;
-
-		if (orphan && inode->i_nlink)
+		handle = ext3_journal_start(inode, DIO_CREDITS);
+		if (IS_ERR(handle)) {
+			ret = PTR_ERR(handle);
+			if (inode->i_nlink)
+				ext3_orphan_del(NULL, inode);
+			goto out;
+		}
+
+		if (inode->i_nlink)
 			ext3_orphan_del(handle, inode);
-		if (orphan && ret > 0) {
+		if (ret > 0) {
 			loff_t end = offset + ret;
 			if (end > inode->i_size) {
 				ei->i_disksize = end;