From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-fsdevel-owner@vger.kernel.org>
Received: from zeniv.linux.org.uk ([195.92.253.2]:48869 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750778AbcBJV0H (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 10 Feb 2016 16:26:07 -0500
Date: Wed, 10 Feb 2016 21:26:03 +0000
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Message-ID: <20160210212603.GL17997@ZenIV.linux.org.uk>
References: <CAOg9mSR1-7xOZCC=8dbx+zfHDep4NyUZm0e5XBDQzpCFaCNH7Q@mail.gmail.com>
 <20160208233535.GC17997@ZenIV.linux.org.uk>
 <20160209033203.GE17997@ZenIV.linux.org.uk>
 <CAOg9mSTFivsJDL-Ppruivd8gp_iThdq6N6+bWg0TfLXpV=rs8g@mail.gmail.com>
 <20160209174049.GG17997@ZenIV.linux.org.uk>
 <CAOg9mSTzE7KmrVC1zWSgC+vo20HfKLrsM3VPkxnYLN9roi+ZOw@mail.gmail.com>
 <20160209221623.GI17997@ZenIV.linux.org.uk>
 <20160209224050.GJ17997@ZenIV.linux.org.uk>
 <20160209231328.GK17997@ZenIV.linux.org.uk>
 <20160210164435.GA4950@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160210164435.GA4950@ZenIV.linux.org.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Wed, Feb 10, 2016 at 04:44:36PM +0000, Al Viro wrote:
> > That breakage had been introduced between 2.8.5 and 2.8.6 (at some point
> > during the spring of 2012).  AFAICS, all versions starting with 2.8.6 are
> > vulnerable...
> 
> BTW, what about kill -9 delivered to readdir in progress?  There's no
> cancel for those (and AFAICS the daemon will reject cancel on anything
> other than FILE_IO), so what's to stop another thread from picking the
> same readdir slot and getting (daemon-side) two of them spewing into
> the same area of shared memory?  Is it simply that daemon-side the shared
> memory on readdir is touched only upon request completion in completely
> serialized process_vfs_requests()?  That doesn't seem to be enough -
> suppose the second readdir request completes (daemon-side) first, its results
> get packed into shared memory slot and it is reported to kernel, which
> proceeds to repack and copy that data to userland.  In the meanwhile,
> daemon completes the _earlier_ readdir and proceeds to pack its results into
> the same slot of shared memory.  Sure, the kernel won't take that (the
> op with the matching tag has been gone already), but the data is stored
> into shared memory *before* writev() on the control device that would pass
> the response to the kernel, so it still gets overwritten.  Right under
> decoding readdir()...
> 
> Or is there something in the daemon that would guarantee readdir responses
> to happen in the same order in which it had picked the requests?  I'm not
> familiar enough with that beast (and overall control flow in there is, er,
> not the most transparent one I've seen), so I might be missing something,
> but I don't see anything obvious that would guarantee such ordering.
> 
> Please, clarify.

Two more questions:
	* why do we need cancels to be held back while we are going through
ORANGEFS_DEV_REMOUNT_ALL?  IOW, why do we need to take request_mutex for
cancels?
	* your ->kill_sb() starts with telling the daemon that the fs is gone,
then proceeds to evict dentries/inodes.  Sure, you don't have page cache
(or that would've been instantly fatal - dirty pages would need to be
written out, for one thing), but why do it in this order?  IOW, why not
_start_ with kill_anon_super(), then do the rest of the work?
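
To make the ordering in the second question concrete, here is a minimal
userspace sketch.  It is only a mock-up: every mock_* name is hypothetical,
the struct is a stand-in rather than the kernel's struct super_block, and
the stubs merely record call order instead of doing any real work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, not the kernel's struct super_block. */
struct super_block {
	int evicted;		/* dentries/inodes are gone */
	int daemon_told;	/* daemon has been told the fs is gone */
};

enum step { STEP_EVICT, STEP_TELL_DAEMON };

#define MAX_STEPS 4
static enum step steps[MAX_STEPS];
static size_t nsteps;

static void record(enum step s)
{
	if (nsteps < MAX_STEPS)
		steps[nsteps++] = s;
}

/* Stand-in for kill_anon_super(): evicts dentries/inodes. */
static void mock_kill_anon_super(struct super_block *sb)
{
	sb->evicted = 1;
	record(STEP_EVICT);
}

/* Stand-in for whatever tells the daemon that the fs is gone. */
static void mock_tell_daemon_fs_gone(struct super_block *sb)
{
	/* In the proposed order nothing is left cached at this point. */
	assert(sb->evicted);
	sb->daemon_told = 1;
	record(STEP_TELL_DAEMON);
}

/* Proposed order: start with eviction, then do the rest of the work. */
static void mock_kill_sb(struct super_block *sb)
{
	mock_kill_anon_super(sb);
	mock_tell_daemon_fs_gone(sb);
}
```

Calling mock_kill_sb() on a zeroed struct records STEP_EVICT first and
STEP_TELL_DAEMON second; swapping the two calls trips the assert, which is
the ordering the question is about.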