From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jan Kara Subject: Re: [patch] fs: aio fix rcu lookup Date: Wed, 19 Jan 2011 14:21:23 +0100 Message-ID: <20110119132123.GC4246@quack.suse.cz> References: <20110118190114.GA5070@quack.suse.cz> <20110118235236.GA14087@quack.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Jan Kara , Jeff Moyer , Andrew Morton , linux-fsdevel , linux-kernel@vger.kernel.org To: Nick Piggin Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Wed 19-01-11 11:20:45, Nick Piggin wrote: > On Wed, Jan 19, 2011 at 10:52 AM, Jan Kara wrote: > > On Wed 19-01-11 09:17:23, Nick Piggin wrote: > >> On Wed, Jan 19, 2011 at 6:01 AM, Jan Kara wrote: > >> > On Tue 18-01-11 10:24:24, Nick Piggin wrote: > >> >> On Tue, Jan 18, 2011 at 6:07 AM, Jeff Moyer = wrote: > >> >> > Nick Piggin writes: > >> >> >> Do you agree with the theoretical problem? I didn't try to > >> >> >> write a racer to break it yet. Inserting a delay before the > >> >> >> get_ioctx might do the trick. > >> >> > > >> >> > I'm not convinced, no. =A0The last reference to the kioctx is= always the > >> >> > process, released in the exit_aio path, or via sys_io_destroy= =2E =A0In both > >> >> > cases, we cancel all aios, then wait for them all to complete= before > >> >> > dropping the final reference to the context. > >> >> > >> >> That wouldn't appear to prevent a concurrent thread from doing = an > >> >> io operation that requires ioctx lookup, and taking the last re= ference > >> >> after the io_cancel thread drops the ref. > >> >> > >> >> > So, while I agree that what you wrote is better, I remain unc= onvinced of > >> >> > it solving a real-world problem. =A0Feel free to push it in a= s a cleanup, > >> >> > though. > >> >> > >> >> Well I think it has to be technically correct first. If there i= s indeed a > >> >> guaranteed ref somehow, it just needs a comment. > >> > =A0Hmm, the code in io_destroy() indeed looks fishy. We delete t= he ioctx > >> > from the hash table and set ioctx->dead which is supposed to sto= p > >> > lookup_ioctx() from finding it (see the !ctx->dead check in > >> > lookup_ioctx()). There's even a comment in io_destroy() saying: > >> > =A0 =A0 =A0 =A0/* > >> > =A0 =A0 =A0 =A0 * Wake up any waiters. =A0The setting of ctx->de= ad must be seen > >> > =A0 =A0 =A0 =A0 * by other CPUs at this point. =A0Right now, we = rely on the > >> > =A0 =A0 =A0 =A0 * locking done by the above calls to ensure this= consistency. > >> > =A0 =A0 =A0 =A0 */ > >> > But since lookup_ioctx() is called without any lock or barrier n= othing > >> > really seems to prevent the list traversal and ioctx->dead test = to happen > >> > before io_destroy() and get_ioctx() after io_destroy(). > >> > > >> > But wouldn't the right fix be to call synchronize_rcu() in io_de= stroy()? > >> > Because with your fix we could still return 'dead' ioctx and I d= on't think > >> > we are supposed to do that... > >> > >> With my fix we won't oops, I was a bit concerned about ->dead, > >> yes but I don't know what semantics it is attempted to have there. > > =A0But wouldn't it do something bad if the memory gets reallocated = for > > something else and set to non-zero? E.g. memory corruption? >=20 > I don't see how it would with my patch. Ah, you are right. Since the whole structure gets freed only after th= e RCU period expires, we are guaranteed to see 0 in ctx->users until we drop rcu_read_lock. Maybe a comment like the above would be useful at the pl= ace where you use try_get_ioctx() but your patch works. > >> synchronize_rcu() in io_destroy() =A0does not prevent it from retu= rning > >> as soon as lookup_ioctx drops the rcu_read_lock(). > > =A0Yes, exactly. So references obtained before synchronize_rcu() wo= uld be > > completely fine and valid references and there would be no referenc= es after > > synchronize_rcu() because they'd see 'dead' set. But looking at the= code > > again it still would not be enough because we could still race with > > io_submit_one() adding new IO to the ioctx which will be freed just= after > > put_ioctx() in do_io_submit(). > > > > The patch below implements what I have in mind - it should be proba= bly > > split into two but I'd like to hear comments on that before doing t= hese > > cosmetic touches ;) >=20 > Well this seems to solve it too, but it is 2 problems here. It is cha= nging > the semantics of io_destroy which requires the big synchronize_rcu() > hammer. >=20 > But I don't believe that's necessarily desirable, or required. In fac= t it is > explicitly not reuired because it only says that it _may_ cancel outs= tanding > requests. Well, we are not required to cancel all the outstanding AIO because o= f the API requirement, that's granted. But we must do it because of the way h= ow the code is written. Outstanding IO requests reference ioctx but they a= re not counted in ctx->users but in ctx->reqs_active. So the code relies o= n the fact that the reference held by the hash table protects ctx from be= ing freed and io_destroy() waits for requests before dropping the last reference to ctx. But there's the second race I describe making it poss= ible for new IO to be created after io_destroy() has waited for all IO to finish... Honza --=20 Jan Kara SUSE Labs, CR