From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755940Ab1DYUXZ (ORCPT ); Mon, 25 Apr 2011 16:23:25 -0400 Received: from 1wt.eu ([62.212.114.60]:33735 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754763Ab1DYUXV (ORCPT ); Mon, 25 Apr 2011 16:23:21 -0400 Message-Id: <20110425200233.733236421@pcw.home.local> User-Agent: quilt/0.48-1 Date: Mon, 25 Apr 2011 22:02:49 +0200 From: Willy Tarreau To: linux-kernel@vger.kernel.org, stable@kernel.org, stable-review@kernel.org Cc: Chuck Lever , Trond Myklebust , Greg Kroah-Hartman Subject: [PATCH 017/173] NFS: Fix "kernel BUG at fs/aio.c:554!" In-Reply-To: <46075c3a3ef08be6d70339617d6afc98@local> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 2.6.27.59-stable review patch. If anyone has any objections, please let us know. ------------------ From: Chuck Lever commit 839f7ad6932d95f4d5ae7267b95c574714ff3d5b upstream. Nick Piggin reports: > I'm getting use after frees in aio code in NFS > > [ 2703.396766] Call Trace: > [ 2703.396858] [] ? native_sched_clock+0x27/0x80 > [ 2703.396959] [] ? put_lock_stats+0xe/0x40 > [ 2703.397058] [] ? lock_release_holdtime+0xa8/0x140 > [ 2703.397159] [] lock_acquire+0x95/0x1b0 > [ 2703.397260] [] ? aio_put_req+0x2b/0x60 > [ 2703.397361] [] ? get_parent_ip+0x11/0x50 > [ 2703.397464] [] _raw_spin_lock_irq+0x41/0x80 > [ 2703.397564] [] ? aio_put_req+0x2b/0x60 > [ 2703.397662] [] aio_put_req+0x2b/0x60 > [ 2703.397761] [] do_io_submit+0x2be/0x7c0 > [ 2703.397895] [] sys_io_submit+0xb/0x10 > [ 2703.397995] [] system_call_fastpath+0x16/0x1b > > Adding some tracing, it is due to nfs completing the request then > returning something other than -EIOCBQUEUED, so aio.c > also completes the request. To address this, prevent the NFS direct I/O engine from completing async iocbs when the forward path returns an error without starting any I/O. This fix appears to survive ^C during both "xfstest no. 208" and "fsx -Z." It's likely this bug has existed for a very long while, as we are seeing very similar symptoms in OEL 5. Copying stable. Signed-off-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman --- fs/nfs/direct.c | 34 ++++++++++++++++++++-------------- 1 file changed, 20 insertions(+), 14 deletions(-) --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -401,15 +401,18 @@ static ssize_t nfs_direct_read_schedule_ pos += vec->iov_len; } + /* + * If no bytes were started, return the error, and let the + * generic layer handle the completion. + */ + if (requested_bytes == 0) { + nfs_direct_req_release(dreq); + return result < 0 ? result : -EIO; + } + if (put_dreq(dreq)) nfs_direct_complete(dreq); - - if (requested_bytes != 0) - return 0; - - if (result < 0) - return result; - return -EIO; + return 0; } static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov, @@ -829,15 +832,18 @@ static ssize_t nfs_direct_write_schedule pos += vec->iov_len; } + /* + * If no bytes were started, return the error, and let the + * generic layer handle the completion. + */ + if (requested_bytes == 0) { + nfs_direct_req_release(dreq); + return result < 0 ? result : -EIO; + } + if (put_dreq(dreq)) nfs_direct_write_complete(dreq, dreq->inode); - - if (requested_bytes != 0) - return 0; - - if (result < 0) - return result; - return -EIO; + return 0; } static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,