From mboxrd@z Thu Jan 1 00:00:00 1970 From: Al Viro Subject: Re: updated orangefs tree at kernel.org Date: Sat, 10 Oct 2015 03:31:57 +0100 Message-ID: <20151010023157.GE22011@ZenIV.linux.org.uk> References: <20151008210745.GW22011@ZenIV.linux.org.uk> <20151008230333.GX22011@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Linus Torvalds , linux-fsdevel To: Mike Marshall Return-path: Received: from zeniv.linux.org.uk ([195.92.253.2]:54904 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750975AbbJJCcA (ORCPT ); Fri, 9 Oct 2015 22:32:00 -0400 Content-Disposition: inline In-Reply-To: <20151008230333.GX22011@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Fri, Oct 09, 2015 at 12:03:33AM +0100, Al Viro wrote: > What I have in mind is something like (*ABSOLUTELY* *UNTESTED*) > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git orangefs-untested OK, I've thrown several more fixes into that branch, but that's the low-hanging fruit. The really interesting questions I see right now are: * it needs some form of lseek() for directories - currently absent, not even rewind to the beginning. Userland won't be happy * races around the umount - I don't understand the protocol well enough to fix those, but this area looks fishy as hell * racing copy_attributes_to_inode() and the way it mangles the metadata of live inodes. ->i_mode updates in particular are seriously broken, so are updates of symlink bodies, etc. * truncation of long symlinks and the lack of NUL-termination in there. * atrocious userland API for downcalls (response to everything other than readdir must come in 4-element iovec array passed to writev(), no matter where the segment boundaries are; response to readdir must come in 5-element iovec array passed to writev(), the boundaries among the first 4 segments do not matter, the 5th segment covering exactly the payload). IMO that needs to be fixed before we merge the damn thing - it's really too ugly to live. I would really like to hear from somebody familiar with the userland side - what responses does it actually submit? The validation kernel-side is basically inexistent, and while I suspect that we could handle the actually sent responses much cleaner, the full set of everything that would be accepted by the current code is a bloody mess and would be much harder to handle in a clean way. What's more, the response layouts are messy, and if it is still possible to change that API, we'd be much better off if we cleaned them up. * for some reason the lookup request tells the server whether there was LOOKUP_FOLLOW in flags. This is really odd - no other filesystem cares about that bit and until now its presence in ->lookup() flags had been basically at convenience of fs/namei.c; it doesn't match anything useful and I'm very surprised by seeing orangefs pass it to server. LOOKUP_FOLLOW is almost certainly a wrong approximation to whatever orangefs is trying to achieve. What is it about?