linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Bernd Schubert <bschubert@ddn.com>
Cc: Dharmendra Hans <dharamhans87@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org,
	fuse-devel <fuse-devel@lists.sourceforge.net>,
	linux-kernel@vger.kernel.org, Dharmendra Singh <dsingh@ddn.com>
Subject: Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create
Date: Fri, 6 May 2022 13:07:37 -0400	[thread overview]
Message-ID: <YnVV2Rr4NMyFj5oF@redhat.com> (raw)
In-Reply-To: <78c2beed-b221-71b4-019f-b82522d98f1e@ddn.com>

On Fri, May 06, 2022 at 06:41:17PM +0200, Bernd Schubert wrote:
> 
> 
> On 5/6/22 16:12, Vivek Goyal wrote:
> 
> [...]
> 
> > On Fri, May 06, 2022 at 11:04:05AM +0530, Dharmendra Hans wrote:
> 
> > 
> > Ok, looks like your fuse file server is talking to a another file
> > server on network and that's why you are mentioning two network trips.
> > 
> > Let us differentiate between two things first.
> > 
> > A. FUSE protocol semantics
> > B. Implementation of FUSE protocl by libfuse.
> > 
> > I think I am stressing on A and you are stressing on B. I just want
> > to see what's the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE
> > from fuse protocol point of view. Again look at from kernel's point of
> > view and don't worry about libfuse is going to implement it.
> > Implementations can vary.
> 
> Agreed, I don't think we need to bring in network for the kernel to libfuse
> API.
> 
> > 
> >  From kernel's perspective FUSE_CREATE is supposed to create + open a
> > file. It is possible file already exists. Look at include/fuse_lowlevel.h
> > description for create().
> > 
> >          /**
> >           * Create and open a file
> >           *
> >           * If the file does not exist, first create it with the specified
> >           * mode, and then open it.
> >           */
> > 
> > I notice that fuse is offering a high level API as well as low level
> > API. I primarily know about low level API. To me these are just two
> > different implementation but things don't change how kernel sends
> > fuse messages and what it expects from server in return.
> > 
> > Now with FUSE_ATOMIC_CREATE, from kernel's perspective, only difference
> > is that in reply message file server will also indicate if file was
> > actually created or not. Is that right?
> > 
> > And I am focussing on this FUSE API apsect. I am least concerned at
> > this point of time who libfuse decides to actually implement FUSE_CREATE
> > or FUSE_ATOMIC_CREATE etc. You might make a single call in libfuse
> > server (instead of two) and that's performance optimization in libfuse.
> > Kernel does not care how many calls did you make in file server to
> > implement FUSE_CREATE or FUSE_ATOMIC_CREATE. All it cares is that
> > create and open the file.
> > 
> > So while you might do things in more atomic manner in file server and
> > cut down on network traffic, kernel fuse API does not care. All it cares
> > about is create + open a file.
> > 
> > Anyway, from kernel's perspective, I think you should be able to
> > just use FUSE_CREATE and still be do "lookup + create + open".
> > FUSE_ATOMIC_CREATE is just allows one additional optimization so
> > that you know whether to invalidate parent dir's attrs or not.
> > 
> > In fact kernel is not putting any atomicity requirements as well on
> > file server. And that's why I think this new command should probably
> > be called FUSE_CREATE_EXT because it just sends back additional
> > info.
> > 
> > All the atomicity stuff you have been describing is that you are
> > trying to do some optimizations in libfuse implementation to implement
> > FUSE_ATOMIC_CREATE so that you send less number of commands over
> > network. That's a good idea but fuse kernel API does not require you
> > do these atomically, AFAICS.
> > 
> > Given I know little bit of fuse low level API, If I were to implement
> > this in virtiofs/passthrough_ll.c, I probably will do following.
> > 
> > A. Check if caller provided O_EXCL flag.
> > B. openat(O_CREAT | O_EXCL)
> > C. If success, we created the file. Set file_created = 1.
> > 
> > D. If error and error != -EEXIST, send error back to client.
> > E. If error and error == -EEXIST, if caller did provide O_EXCL flag,
> >     return error.
> > F. openat() returned -EEXIST and caller did not provide O_EXCL flag,
> >     that means file already exists.  Set file_created = 0.
> > G. Do lookup() etc to create internal lo_inode and stat() of file.
> > H. Send response back to client using fuse_reply_create().
> > This is one sample implementation for fuse lowlevel API. There could
> > be other ways to implement. But all that is libfuse + filesystem
> > specific and kernel does not care how many operations you use to
> > complete and what's the atomicity etc. Of course less number of
> > operations you do better it is.
> > 
> > Anyway, I think I have said enough on this topic. IMHO, FUSE_CREATE
> > descritpion (fuse_lowlevel.h) already mentions that "If the file does not
> > exist, first create it with the specified mode and then open it". That
> > means intent of protocol is that file could already be there as well.
> > So I think we probably should implement this optimization (in kernel)
> > using FUSE_CREATE command and then add FUSE_CREATE_EXT to add optimization
> > about knowing whether file was actually created or not.
> > 
> > W.r.t libfuse optimizations, I am not sure why can't you do optimizations
> > with FUSE_CREATE and why do you need FUSE_CREATE_EXT necessarily. If
> > are you worried that some existing filesystems will break, I think
> > you can create an internal helper say fuse_create_atomic() and then
> > use that if filesystem offers it. IOW, libfuse will have two
> > ways to implement FUSE_CREATE. And if filesystem offers a new way which
> > cuts down on network traffic, libfuse uses more efficient method. We
> > should not have to change kernel FUSE API just because libfuse can
> > do create + open operation more efficiently.
> 
> Ah right, I like this. As I had written before, the first patch version was
> using FUSE_CREATE and I was worried to break something. Yes, it should be
> possible split into lookup+create on the libfuse side. That being said,
> libfuse will need to know which version it is - there might be an old kernel
> sending the non-optimized version - libfuse should not do another lookup
> then.

I am confused about one thing. For FUSE_CREATE command, how does it
matter whether kernel has done lookup() before sending FUSE_CREATE. All
FUSE_CREATE seems to say that crate a file (if it does not exist already)
and then open it and return file handle as well as inode attributes. It
does not say anything about whether a LOOKUP has already been done 
by kernel or not.

It looks like you are assuming that if FUSE_CREATE is coming, that
means client has already done FUSE_LOOKUP. So there is something we
are not on same page about.

I looked at fuse_lowlevel API and passthrough_ll.c and there is no
assumption whether FUSE_LOOKUP has already been called by client
before calling FUSE_CREATE. Similarly, I looked at virtiofs code
and I can't see any such assumption there as well.

https://github.com/qemu/qemu/blob/master/tools/virtiofsd/passthrough_ll.c

So I am sort of lost. May be you can help me understsand this.

> Now there is 'fi.flags = arg->flags', but these are already taken by
> open/fcntl flags - I would not feel comfortable to overload these. At best,
> struct fuse_create_in currently had a padding field, we could convert these
> to something like 'ext_fuse_open_flags' and then use it for fuse internal
> things. Difficulty here is that I don't know if all kernel implementations
> zero the struct (BSD, MacOS), so I guess we would need to negotiate at
> startup/init time and would need another main feature flag? And with that
> I'm not be sure anymore if the result would be actually more simple than
> what we have right now for the first patch.

If FUSE_CREATE indeed has a dependency on FUSE_LOOKUP have been called
before that, then I agree that we can't implement new semantics with
FUSE_CREATE and we will have to introduce a new op say
FUSE_ATOMIC_CREATE/FUSE_LOOKUP_CREATE/FUSE_CREATE_EXT.

But looking at fuse API, I don't see FUSE_CREATE ever guaranteeing that
a FUSE_LOOKUP has been done before this.

Thanks
Vivek


  reply	other threads:[~2022-05-06 17:07 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-02 10:25 [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Dharmendra Singh
2022-05-02 10:25 ` [PATCH v4 1/3] FUSE: Implement atomic lookup + create Dharmendra Singh
2022-05-03 12:43   ` Vivek Goyal
2022-05-03 14:13   ` Vivek Goyal
2022-05-03 19:53   ` Vivek Goyal
2022-05-03 20:48     ` Bernd Schubert
2022-05-04  4:26     ` Dharmendra Hans
2022-05-04 14:47       ` Vivek Goyal
2022-05-04 15:46         ` Bernd Schubert
2022-05-04 17:31           ` Vivek Goyal
2022-05-05  4:51         ` Dharmendra Hans
2022-05-05 14:26           ` Vivek Goyal
2022-05-06  5:34             ` Dharmendra Hans
2022-05-06 14:12               ` Vivek Goyal
2022-05-06 16:41                 ` Bernd Schubert
2022-05-06 17:07                   ` Vivek Goyal [this message]
2022-05-06 18:45                     ` Bernd Schubert
2022-05-07 10:42                       ` Jean-Pierre André
2022-05-11 10:08                         ` Bernd Schubert
2022-05-02 10:25 ` [PATCH v4 2/3] FUSE: Implement atomic lookup + open Dharmendra Singh
2022-05-04 18:20   ` Vivek Goyal
2022-05-05  6:39     ` Dharmendra Hans
2022-05-02 10:25 ` [PATCH v4 3/3] FUSE: Avoid lookup in d_revalidate() Dharmendra Singh
2022-05-04 20:39   ` Vivek Goyal
2022-05-04 21:05     ` Bernd Schubert
2022-05-05  5:49     ` Dharmendra Hans
2022-05-04 19:18 ` [PATCH v4 0/3] FUSE: Implement atomic lookup + open/create Vivek Goyal
2022-05-05  6:12   ` Dharmendra Hans
2022-05-05 12:54     ` Vivek Goyal
2022-05-05 15:13       ` Bernd Schubert
2022-05-05 19:59         ` Vivek Goyal
2022-05-11  9:40           ` Miklos Szeredi
2022-05-11  9:59             ` Bernd Schubert
2022-05-11 17:21             ` Vivek Goyal
2022-05-11 19:30               ` Vivek Goyal
2022-05-12  8:16                 ` Dharmendra Hans
2022-05-12 15:24                   ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YnVV2Rr4NMyFj5oF@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=bschubert@ddn.com \
    --cc=dharamhans87@gmail.com \
    --cc=dsingh@ddn.com \
    --cc=fuse-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).