From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:48869 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750778AbcBJV0H (ORCPT );
	Wed, 10 Feb 2016 16:26:07 -0500
Date: Wed, 10 Feb 2016 21:26:03 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation
Message-ID: <20160210212603.GL17997@ZenIV.linux.org.uk>
References: <20160208233535.GC17997@ZenIV.linux.org.uk>
	<20160209033203.GE17997@ZenIV.linux.org.uk>
	<20160209174049.GG17997@ZenIV.linux.org.uk>
	<20160209221623.GI17997@ZenIV.linux.org.uk>
	<20160209224050.GJ17997@ZenIV.linux.org.uk>
	<20160209231328.GK17997@ZenIV.linux.org.uk>
	<20160210164435.GA4950@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160210164435.GA4950@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Feb 10, 2016 at 04:44:36PM +0000, Al Viro wrote:
> > That breakage had been introduced between 2.8.5 and 2.8.6 (at some point
> > during the spring of 2012). AFAICS, all versions starting with 2.8.6 are
> > vulnerable...
>
> BTW, what about kill -9 delivered to readdir in progress? There's no
> cancel for those (and AFAICS the daemon will reject cancel on anything
> other than FILE_IO), so what's to stop another thread from picking the
> same readdir slot and getting (daemon-side) two of them spewing into
> the same area of shared memory? Is it simply that daemon-side the shared
> memory on readdir is touched only upon request completion in completely
> serialized process_vfs_requests()? That doesn't seem to be enough -
> suppose the second readdir request completes (daemon-side) first, its results
> get packed into shared memory slot and it is reported to kernel, which
> proceeds to repack and copy that data to userland. In the meanwhile,
> daemon completes the _earlier_ readdir and proceeds to pack its results into
> the same slot of shared memory. Sure, the kernel won't take that (the
> op with the matching tag has been gone already), but the data is stored
> into shared memory *before* writev() on the control device that would pass
> the response to the kernel, so it still gets overwritten. Right under
> decoding readdir()...
>
> Or is there something in the daemon that would guarantee readdir responses
> to happen in the same order in which it had picked the requests? I'm not
> familiar enough with that beast (and overall control flow in there is, er,
> not the most transparent one I've seen), so I might be missing something,
> but I don't see anything obvious that would guarantee such ordering.
>
> Please, clarify.

Two more questions:
	* why do we need cancel to be held back while we are going through
ORANGEFS_DEV_REMOUNT_ALL? IOW, why do we need to take request_mutex for them?
	* your ->kill_sb() starts with telling daemon that fs is gone, then
proceeds to evict dentries/inodes. Sure, you don't have page cache (or that
would've been instantly fatal - dirty pages would need to be written out,
for one thing), but why do it in this order? IOW, why not _start_ with
kill_anon_super(), then do the rest of the work?