linux-fsdevel.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@kernel.org>
To: "Fu, Rodney" <rfu@panasas.com>, Matthew Wilcox <willy@infradead.org>
Cc: "hch@lst.de" <hch@lst.de>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	linux-api <linux-api@vger.kernel.org>
Subject: Re: Provision for filesystem specific open flags
Date: Mon, 20 Nov 2017 08:38:16 -0500
Message-ID: <1511185096.4228.8.camel@kernel.org>
In-Reply-To: <BN3PR0801MB2257249A7388086676CBA811AB2B0@BN3PR0801MB2257.namprd08.prod.outlook.com>

On Mon, 2017-11-13 at 15:16 +0000, Fu, Rodney wrote:
> > > > No.  If you want new flags bits, make a public proposal.  Maybe some 
> > > > other filesystem would also benefit from them.
> > > 
> > > Ah, I see what you mean now, thanks.
> > > 
> > > I would like to propose O_CONCURRENT_WRITE as a new open flag.  It is 
> > > currently used in the Panasas filesystem (panfs) and defined with value:
> > > 
> > > #define O_CONCURRENT_WRITE 020000000000
> > > 
> > > This flag has been provided by panfs to HPC users via the mpich 
> > > package for well over a decade.  See:
> > > 
> > > https://github.com/pmodels/mpich/blob/master/src/mpi/romio/adio/ad_panfs/ad_panfs_open6.c#L344
> > > 
> > > O_CONCURRENT_WRITE indicates to the filesystem that the application 
> > > doing the open is participating in a coordinated distributed manner 
> > > with other such applications, possibly running on different hosts.  
> > > This allows the panfs filesystem to delegate some of the cache 
> > > coherency responsibilities to the application, improving performance.
> > > 
> > > The reason this flag is used on open as opposed to having a post-open 
> > > ioctl or fcntl SETFL is to allow panfs to catch and reject opens by 
> > > applications that attempt to access files that have already been 
> > > opened by applications that have set O_CONCURRENT_WRITE.
> > OK, let me just check I understand.  Once any application has opened the inode
> > with O_CONCURRENT_WRITE, all subsequent attempts to open the same inode without
> > O_CONCURRENT_WRITE will fail.  Presumably also if somebody already has the inode
> > open without O_CONCURRENT_WRITE set, the first open with O_CONCURRENT_WRITE will
> > fail?
> 
> Yes on both counts.  Opening with O_CONCURRENT_WRITE, followed by an open
> without will fail.  Opening without O_CONCURRENT_WRITE followed by one with it
> will also fail.
> 
> > Are opens with O_RDONLY also blocked?
> 
> No they are not.  The decision to grant access is based solely on the
> O_CONCURRENT_WRITE flag.
> 
> > This feels a lot like leases ... maybe there's an opportunity to give better
> > semantics here -- rather than rejecting opens without O_CONCURRENT_WRITE, all
> > existing users could be forced to use the stricter coherency model?
> 
> I don't think that will work, at least not from the perspective of trying to
> maintain good performance.  A user that does not open with O_CONCURRENT_WRITE
> does not know how to adhere to the proper access patterns that maintain
> coherency.  To continue to allow all users access after that point, the
> filesystem will have to force all users into a non-cacheable mode.  Instead, we
> reject stray opens to allow any existing CONCURRENT_WRITE application to
> complete in a higher performance mode.
> 
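
The open-time conflict rule described in the quoted exchange can be sketched as a small user-space model. This is only an illustration under stated assumptions, not panfs code: the struct, the function names and the -EBUSY return value are all hypothetical, and the access mode (O_RDONLY and friends) is deliberately not modeled, since the quoted answer says only that the decision depends on the O_CONCURRENT_WRITE flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-inode bookkeeping; not taken from panfs. */
struct cw_inode_state {
    unsigned int cw_openers;    /* opens that passed O_CONCURRENT_WRITE */
    unsigned int plain_openers; /* opens that did not pass it */
};

/*
 * Admit or reject an open based only on whether the flag was given.
 * Returns 0 on success or -EBUSY on a conflict. The error code is an
 * assumption; the thread does not say what panfs actually returns.
 */
static int cw_try_open(struct cw_inode_state *st, bool concurrent_write)
{
    if (concurrent_write) {
        /* A flagged open conflicts with any existing unflagged open. */
        if (st->plain_openers > 0)
            return -EBUSY;
        st->cw_openers++;
    } else {
        /* An unflagged open conflicts with any existing flagged open. */
        if (st->cw_openers > 0)
            return -EBUSY;
        st->plain_openers++;
    }
    return 0;
}

/* Drop one opener of the given kind when its file is closed. */
static void cw_release(struct cw_inode_state *st, bool concurrent_write)
{
    if (concurrent_write) {
        assert(st->cw_openers > 0);
        st->cw_openers--;
    } else {
        assert(st->plain_openers > 0);
        st->plain_openers--;
    }
}
```

In this model any number of flagged opens can coexist, and the inode becomes available to unflagged opens again only after the last flagged opener has released it (and vice versa).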

(added linux-api@vger.kernel.org to the cc list...)

Actually, it feels more like O_EXLOCK / O_SHLOCK to me:

    https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html

Those are not quite the same semantics as what you're describing for
O_CONCURRENT_WRITE, but the handling of conflicts would be similar. 

Maybe it's possible to dovetail your new flag on top of a credible
O_EXLOCK/O_SHLOCK implementation? It'd be nice to have those to
implement VFS-level share/deny locking. Most NFS and SMB servers could
make good use of it.
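
For readers unfamiliar with O_EXLOCK / O_SHLOCK: on the BSD systems that provide them, they take a flock(2)-style exclusive or shared lock as part of the open. A rough user-space approximation on Linux is to open the file and then try a non-blocking flock(). The sketch below is only that approximation, under stated assumptions: open_with_lock() is a hypothetical helper name, and unlike a real open flag it is not atomic (there is a window between open() and flock()), which is part of why a VFS-level implementation would be needed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path`, then try to take a flock(2) lock
 * without blocking. `lock_op` must be LOCK_SH or LOCK_EX.
 * Returns the fd on success, or -1 with errno set (EWOULDBLOCK when a
 * conflicting lock is already held on the file).
 */
static int open_with_lock(const char *path, int flags, mode_t mode, int lock_op)
{
    assert(lock_op == LOCK_SH || lock_op == LOCK_EX);

    int fd = open(path, flags, mode);
    if (fd < 0)
        return -1;

    if (flock(fd, lock_op | LOCK_NB) < 0) {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Several LOCK_SH openers can coexist, while a LOCK_EX opener excludes everyone else. That gives the "reject conflicting opens" behavior discussed above, but only among processes that cooperate by taking the lock, which is a weaker guarantee than the filesystem-enforced check Rodney describes.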


-- 
Jeff Layton <jlayton@kernel.org>
