From: Jeff Layton <jlayton@kernel.org>
To: "Fu, Rodney" <rfu@panasas.com>, Matthew Wilcox <willy@infradead.org>
Cc: "hch@lst.de" <hch@lst.de>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
linux-api <linux-api@vger.kernel.org>
Subject: Re: Provision for filesystem specific open flags
Date: Mon, 20 Nov 2017 08:38:16 -0500
Message-ID: <1511185096.4228.8.camel@kernel.org>
In-Reply-To: <BN3PR0801MB2257249A7388086676CBA811AB2B0@BN3PR0801MB2257.namprd08.prod.outlook.com>
On Mon, 2017-11-13 at 15:16 +0000, Fu, Rodney wrote:
> > > > No. If you want new flag bits, make a public proposal. Maybe some
> > > > other filesystem would also benefit from them.
> > >
> > > Ah, I see what you mean now, thanks.
> > >
> > > I would like to propose O_CONCURRENT_WRITE as a new open flag. It is
> > > currently used in the Panasas filesystem (panfs) and defined with value:
> > >
> > > #define O_CONCURRENT_WRITE 020000000000
> > >
> > > This flag has been provided by panfs to HPC users via the mpich
> > > package for well over a decade. See:
> > >
> > > https://github.com/pmodels/mpich/blob/master/src/mpi/romio/adio/ad_panfs/ad_panfs_open6.c#L344
> > >
> > > O_CONCURRENT_WRITE indicates to the filesystem that the application
> > > doing the open is participating in a coordinated distributed manner
> > > with other such applications, possibly running on different hosts.
> > > This allows the panfs filesystem to delegate some of the cache
> > > coherency responsibilities to the application, improving performance.
> > >
> > > The reason this flag is used on open as opposed to having a post-open
> > > ioctl or fcntl SETFL is to allow panfs to catch and reject opens by
> > > applications that attempt to access files that have already been
> > > opened by applications that have set O_CONCURRENT_WRITE.
> > OK, let me just check I understand. Once any application has opened the inode
> > with O_CONCURRENT_WRITE, all subsequent attempts to open the same inode without
> > O_CONCURRENT_WRITE will fail. Presumably also if somebody already has the inode
> > open without O_CONCURRENT_WRITE set, the first open with O_CONCURRENT_WRITE will
> > fail?
>
> Yes on both counts. Opening with O_CONCURRENT_WRITE, followed by an open
> without it, will fail. Opening without O_CONCURRENT_WRITE followed by one with
> it will also fail.
>
> > Are opens with O_RDONLY also blocked?
>
> No they are not. The decision to grant access is based solely on the
> O_CONCURRENT_WRITE flag.
>
> > This feels a lot like leases ... maybe there's an opportunity to give better
> > semantics here -- rather than rejecting opens without O_CONCURRENT_WRITE, all
> > existing users could be forced to use the stricter coherency model?
>
> I don't think that will work, at least not from the perspective of trying to
> maintain good performance. A user that does not open with O_CONCURRENT_WRITE
> does not know how to adhere to the proper access patterns that maintain
> coherency. To continue to allow all users access after that point, the
> filesystem will have to force all users into a non-cacheable mode. Instead, we
> reject stray opens to allow any existing CONCURRENT_WRITE application to
> complete in a higher performance mode.
>
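A minimal user-space sketch of the open-time conflict rule confirmed above, to make the two failure cases concrete. struct inode_opens, try_open() and release() are hypothetical names, not panfs or VFS code. The flag value is the one quoted in the proposal. The sketch deliberately ignores the access mode, so it models only the "concurrent vs. non-concurrent" check and says nothing about how O_RDONLY opens are handled.

```c
#include <stdbool.h>

/* Value quoted in the panfs proposal; defined here only for the sketch. */
#ifndef O_CONCURRENT_WRITE
#define O_CONCURRENT_WRITE 020000000000
#endif

/* Hypothetical per-inode bookkeeping (not panfs code). */
struct inode_opens {
    unsigned int concurrent; /* granted opens that set O_CONCURRENT_WRITE */
    unsigned int plain;      /* granted opens that did not */
};

/*
 * Decide at open time whether this open may proceed. Opens of the same
 * class are compatible; mixing classes is rejected (the caller would
 * fail the open). Access mode (O_RDONLY etc.) is not modelled.
 */
static bool try_open(struct inode_opens *ino, unsigned int flags)
{
    if (flags & O_CONCURRENT_WRITE) {
        if (ino->plain > 0)
            return false;   /* file already open without the flag */
        ino->concurrent++;
    } else {
        if (ino->concurrent > 0)
            return false;   /* a concurrent-write user holds the file */
        ino->plain++;
    }
    return true;
}

/* Drop an open previously granted by try_open() with the same flags. */
static void release(struct inode_opens *ino, unsigned int flags)
{
    if (flags & O_CONCURRENT_WRITE)
        ino->concurrent--;
    else
        ino->plain--;
}
```

Keeping a count per class, rather than a single mode bit, means the file only switches classes once the last open of the current class has been released. Doing the check inside open itself, rather than in a later fcntl, is what lets it reject the stray open before any I/O happens.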
(added linux-api@vger.kernel.org to the cc list...)
Actually, it feels more like O_EXLOCK / O_SHLOCK to me:
https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
Those are not quite the same semantics as what you're describing for
O_CONCURRENT_WRITE, but the handling of conflicts would be similar.
Maybe it's possible to dovetail your new flag on top of a credible
O_EXLOCK/O_SHLOCK implementation? It'd be nice to have those to
implement VFS-level share/deny locking. Most NFS and SMB servers could
make good use of it.
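For comparison, a minimal sketch of the conflict check behind share/deny locking. It is loosely modelled on the NFSv4 OPEN share_access/share_deny pair: each open states the access it wants and the access it denies to others, and an open fails if its access meets an existing deny or its deny meets an existing access. SHARE_READ, SHARE_WRITE, struct share_state and share_open() are hypothetical names. This is not an existing VFS interface, and it is not what O_SHLOCK/O_EXLOCK do on the BSDs, where they take flock-style advisory locks.

```c
#include <stdbool.h>

/* Hypothetical access/deny bits, loosely after NFSv4 share reservations. */
enum { SHARE_READ = 1, SHARE_WRITE = 2 };

/* Hypothetical per-file counts of granted access and deny modes. */
struct share_state {
    unsigned int readers, writers;      /* opens holding read/write access */
    unsigned int deny_read, deny_write; /* opens denying read/write to others */
};

static bool share_open(struct share_state *s, unsigned int access,
                       unsigned int deny)
{
    /* The new open's access must not collide with an existing deny. */
    if ((access & SHARE_READ) && s->deny_read)
        return false;
    if ((access & SHARE_WRITE) && s->deny_write)
        return false;
    /* The new open's deny must not collide with an existing access. */
    if ((deny & SHARE_READ) && s->readers)
        return false;
    if ((deny & SHARE_WRITE) && s->writers)
        return false;

    /* No conflict: record the grant. */
    if (access & SHARE_READ)
        s->readers++;
    if (access & SHARE_WRITE)
        s->writers++;
    if (deny & SHARE_READ)
        s->deny_read++;
    if (deny & SHARE_WRITE)
        s->deny_write++;
    return true;
}
```

The rejection path is the part that resembles O_CONCURRENT_WRITE. A conflicting open fails at open time instead of pushing the existing users into a weaker coherency mode.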
--
Jeff Layton <jlayton@kernel.org>