From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff Moyer Subject: Re: [patch] fs: aio fix rcu lookup Date: Tue, 18 Jan 2011 12:21:58 -0500 Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Andrew Morton , linux-fsdevel , linux-kernel@vger.kernel.org To: Nick Piggin Return-path: Received: from mx1.redhat.com ([209.132.183.28]:20739 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753110Ab1ARRWb convert rfc822-to-8bit (ORCPT ); Tue, 18 Jan 2011 12:22:31 -0500 Sender: linux-fsdevel-owner@vger.kernel.org List-ID: Nick Piggin writes: > On Tue, Jan 18, 2011 at 6:07 AM, Jeff Moyer wrote= : >> Nick Piggin writes: >> >>> On Sat, Jan 15, 2011 at 1:52 AM, Jeff Moyer wro= te: >>>> Nick Piggin writes: >>>> >>>>> Hi, >>>>> >>>>> While hunting down a bug in NFS's AIO, I believe I found this >>>>> buggy code... >>>>> >>>>> fs: aio fix rcu ioctx lookup >>>>> >>>>> aio-dio-invalidate-failure GPFs in aio_put_req from io_submit. >>>>> >>>>> lookup_ioctx doesn't implement the rcu lookup pattern properly. >>>>> rcu_read_lock does not prevent refcount going to zero, so we >>>>> might take a refcount on a zero count ioctx. >>>> >>>> So, does this patch fix the problem? =C2=A0You didn't actually say= =2E... >>> >>> No, it seemd to be an NFS AIO problem, although it was a >>> slightly older kernel so I'll re test after -rc1 if I haven't heard >>> back about it. >> >> OK. >> >>> Do you agree with the theoretical problem? I didn't try to >>> write a racer to break it yet. Inserting a delay before the >>> get_ioctx might do the trick. >> >> I'm not convinced, no. =C2=A0The last reference to the kioctx is alw= ays the >> process, released in the exit_aio path, or via sys_io_destroy. =C2=A0= In both >> cases, we cancel all aios, then wait for them all to complete before >> dropping the final reference to the context. > > That wouldn't appear to prevent a concurrent thread from doing an > io operation that requires ioctx lookup, and taking the last referenc= e > after the io_cancel thread drops the ref. io_cancel isn't of any concern here. When io_setup is called, it creates the ioctx and takes 2 references to it. There are two paths to destroying the ioctx: one is through process exit, the other is through a call to sys_io_destroy. The former means that you can't submit more I/O anyway (which in turn means that there won't be any more lookups on the ioctx), so I'll focus on the latter. What you're asking about, then, is a race between lookup_ioctx and io_destroy. The first thing io_destroy does is to set ctx->dead to 1 and remove the ioctx from the list: spin_lock(&mm->ioctx_lock); was_dead =3D ioctx->dead; ioctx->dead =3D 1; hlist_del_rcu(&ioctx->list); spin_unlock(&mm->ioctx_lock); if (likely(!was_dead)) put_ioctx(ioctx); /* twice for the list */ aio_cancel_all(ioctx); wait_for_all_aios(ioctx); wake_up(&ioctx->wait); put_ioctx(ioctx); /* once for the lookup */ The lookup code is this: rcu_read_lock(); hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) { if (ctx->user_id =3D=3D ctx_id && !ctx->dead) { get_ioctx(ctx); ret =3D ctx; break; =2E.. In order for the race to occur, the lookup code would have to find the ioctx on the list without the dead mark set. Then, the io_destroy code would have to do all of its work, including its two put_ioctx calls, an= d finally the get_ioctx from the lookup would have to happen. Possible? Maybe. It certainly isn't explicitly protected against. Go ahead and re-post the patch. I agree that it's a theoretical race. =3D= ) Cheers, Jeff -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel= " in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html