linux-fsdevel.vger.kernel.org archive mirror
From: Hans Beckerus <hans.beckerus@gmail.com>
To: Nikhilesh Reddy <reddyn@codeaurora.org>,
	Nikolaus Rath <nikolaus@rath.org>
Cc: Jan Kara <jack@suse.cz>, Richard Weinberger <richard@nod.at>,
	Miklos Szeredi <miklos@szeredi.hu>,
	fuse-devel <fuse-devel@lists.sourceforge.net>,
	Greg KH <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@amacapital.net>,
	sven.utcke@gmx.de, Al Viro <viro@zeniv.linux.org.uk>,
	Linux API <linux-api@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [fuse-devel] [PATCH] fuse: Add support for fuse stacked I/O
Date: Fri, 15 Jan 2016 22:56:11 +0100
Message-ID: <56996AFB.1070006@gmail.com>
In-Reply-To: <56994884.9060002@codeaurora.org>

On 2016-01-15 8:29, Nikhilesh Reddy wrote:
> On Fri 15 Jan 2016 09:51:50 AM PST, Nikolaus Rath wrote:
>> On Jan 15 2016, Antonio SJ Musumeci <trapexit@spawn.link> wrote:
>>> The idea is that you want to be able to reason about open, create, etc. but
>>> don't care about the data transfer.
>>>
>>> I have N filesystems I wish to unionize. When I create a new file I want to
>>> pick the drive with the most free space (or some other algo). creat is
>>> called, succeeds, and now the application issuing this starts writing. The
>>> FUSE fs doesn't care about the writes. It just wanted to pick the drive
>>> this file should have been created on. Anything I'd do with the FD after
>>> that I'm happy to short circuit. I don't need to be asked what to do when
>>> fstat'ing this FD or anything which in FUSE hands over the 'fh'. It's just
>>> a file descriptor for me and I'd simply be calling the same function.
>>>
>>> Ideally I think one would want to be able to select which functions to
>>> short circuit and maybe even have it so that a short circuited function
>>> could propagate back through FUSE on error. But the read and write short
>>> circuiting is probably the biggest win given the overhead.
>>
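A minimal sketch of the creation-time policy described above, in Python for brevity: pick the branch whose filesystem reports the most space available. The names free_bytes and pick_branch are made up for this illustration and are not part of any FUSE API; a real union file system would call something like this from its create handler and then open the file on the chosen branch.

```python
import os
from typing import Callable, Iterable


def free_bytes(path: str) -> int:
    """Space available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def pick_branch(branches: Iterable[str],
                free: Callable[[str], int] = free_bytes) -> str:
    """Return the branch with the most free space (the first one wins ties)."""
    # 'free' is injectable so that another policy can be swapped in.
    return max(branches, key=free)
```

Everything after this decision (write, read, fstat on the resulting fd) is what the file system would be happy to have short-circuited.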
>> I think you should avoid using the term "stacked" completely (which
>> would also make Christoph happy). There have been several discussions in
>> the past about adding a "fd delegation" function to FUSE. Generally, the
>> idea is that the FUSE userspace code tells the FUSE kernel module to
>> internally "delegate" writes and reads for a given file (or even a
>> range in that file) to a different file descriptor provided by
>> userspace.
>>
>> I think that function would be useful, and not just for union file
>> systems. There are many FUSE file systems that end up writing the data
>> into some other file on the disk without doing any transformations on
>> the data itself. Especially with the range feature, they would all
>> benefit from the ability to delegate reads and writes.
I agree with Nikolaus here. I do believe there might be use cases that
could benefit from this.
A typical example is a FUSE fs that wants to handle reads but does not
care about writes, other than that they should go transparently to the
underlying fs. With that, moving a file from the underlying fs to the
FUSE mount point, if both are on e.g. the same physical partition, would
be a more or less instant operation, right?
This also requires that the operations are selectable, though: a user
should be able to choose which operations to bypass.
I understand that this will need adaptations to libfuse as well.
Another question is whether an inotify write-type watch on the FUSE
mount point would be affected by this.
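To make "selectable" concrete, here is a rough sketch of the use case above: reads still go through the file system's handler, while writes are sent straight to the backing file. Bypass, OpenFile and the handler signatures are hypothetical; neither the patch nor libfuse defines them, and in a real implementation the delegated path would run inside the kernel rather than through os.pread/os.pwrite.

```python
import os
from typing import Callable


class Bypass:
    """Bit flags for operations a file system might ask to delegate (hypothetical)."""
    NONE = 0
    READ = 1
    WRITE = 2


class OpenFile:
    """One open file: a backing fd plus the operations that skip the handlers."""

    def __init__(self, backing_fd: int, bypass: int,
                 on_read: Callable[[int, int, int], bytes],
                 on_write: Callable[[int, bytes, int], int]) -> None:
        self.backing_fd = backing_fd
        self.bypass = bypass
        self.on_read = on_read
        self.on_write = on_write

    def read(self, size: int, offset: int) -> bytes:
        if self.bypass & Bypass.READ:
            return os.pread(self.backing_fd, size, offset)   # delegated
        return self.on_read(self.backing_fd, size, offset)   # handler decides

    def write(self, data: bytes, offset: int) -> int:
        if self.bypass & Bypass.WRITE:
            return os.pwrite(self.backing_fd, data, offset)  # delegated
        return self.on_write(self.backing_fd, data, offset)  # handler decides
```

Opening a file with Bypass.WRITE alone matches the example: the handler sees every read and never sees a write.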

>> However, Miklos has said in the past that the performance gain from this
>> is very small. You can get almost as good a result by splicing from one
>> fd to the other in userspace. In that case this function could actually
>> be implemented completely in libfuse.
>>
>>
>> Do you have any benchmark results that compare a splice-in-userspace
>> approach with your patch?
>>
>>
>> Best,
>> -Nikolaus
>>
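For reference, the userspace alternative mentioned above looks roughly like this: data moves from one fd to another through a pipe with splice(2), so it is never copied into a userspace buffer. This is only a sketch. splice_copy and _copy_loop are illustrative names, os.splice needs Python 3.10+ on Linux, and the read/write fallback is there only so the example runs elsewhere. In an actual FUSE daemon one end would be the /dev/fuse channel rather than a regular file.

```python
import os


def _copy_loop(src_fd: int, dst_fd: int, count: int, chunk: int) -> int:
    """Plain read/write fallback; copies through a userspace buffer."""
    moved = 0
    while moved < count:
        buf = os.read(src_fd, min(chunk, count - moved))
        if not buf:                      # end of input
            break
        view = memoryview(buf)
        while view:                      # handle short writes
            view = view[os.write(dst_fd, view):]
        moved += len(buf)
    return moved


def splice_copy(src_fd: int, dst_fd: int, count: int, chunk: int = 1 << 16) -> int:
    """Move up to count bytes from src_fd to dst_fd at their current offsets."""
    if not hasattr(os, "splice"):        # Python < 3.10 or not Linux
        return _copy_loop(src_fd, dst_fd, count, chunk)
    pipe_r, pipe_w = os.pipe()           # splice(2) needs a pipe on one side
    moved = 0
    try:
        while moved < count:
            got = os.splice(src_fd, pipe_w, min(chunk, count - moved))
            if got == 0:                 # end of input
                break
            left = got
            while left:                  # drain the pipe into the target
                left -= os.splice(pipe_r, dst_fd, left)
            moved += got
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
    return moved
```

Even in the splice case every request still makes a round trip through the userspace daemon, and that round trip is the part a kernel-side delegation would remove.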
>
> Hi
>
> @Linus
> Thanks for taking the time to reply to my email. It means a lot.
>
> FUSE allows users to implement extensions to filesystems, such as enforcing policy or permissions, without having to modify the kernel or maintain the policy in the kernel.
>
> One such example is what Antonio described above.
> Another example is a FUSE-based filesystem that tries to enforce additional permissions on a FAT-based mount point.
>
> From what I could find, there are many FUSE-based filesystems out there that do things during the open call but simply pass the read and write I/O calls through to the local "lower" filesystem where they actually store the data.
>
> From what I understand, unionfs, overlayfs and similar filesystems are primarily used to support a merged or unified view of directories and do not offer mechanisms to add policy or other checks or extensions to the I/O operations without modifying the kernel.
>
> The main motivation is to make FUSE perform better in such use cases without losing the ease of implementing and extending in userspace.
>
>
>
> @Nikolaus
> Our local benchmarks on embedded devices (where power and CPU usage are critical) show that splice does not help as much; running multiple CPUs results in increased power usage.
>
> The results below are from a specific device model.
>
> IOPS is the number of 4K reads or writes that could be performed each second.
>
>                            regular       spliced       Stacked I/O
> sequential write (MiB/s)   56.55633333   100.34445     141.7096667
> sequential read (MiB/s)    49.644        60.43434      122.367
>
> random write (IOPS)        2554.333333   4053.4545     8572
> random read (IOPS)         977.3333333   1223.34       1432.666667
>
> The above tests were performed using a file size of 1 GB.
>
> Stacked I/O showed the best performance (almost the same as the native EXT4 filesystem that is storing the real file).
>
> We also measured a 5% saving in power and in CPU timeslices used. (Splice did not improve this at all compared to default FUSE.)
>
> Random I/O, i.e. seeking to random parts of a file and reading (use cases such as ELF and *.so loading from FUSE-based filesystems), also improved.
>
>
> Similarly, mmap'ed I/O (in an extended patch to this one, still in progress) showed a significant improvement, about 400% over default FUSE.
>
> Also, we can call it FUSE_DELEGATED_IO if that helps :).
> I chose to call it stacked I/O since we are technically stacking the FUSE reads/writes on ext4, FAT or other filesystems.
>
> Please let me know if you have any questions.
>
> @everyone
> Thanks so much for your comments and the interest.
> Also many of you have shown support for the patch in private emails.
> I would be grateful if you could voice the same support on the public thread so that everyone knows that there is interest in this patch.
>
>
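The ratios implied by the benchmark table quoted above can be worked out directly. The snippet below only restates the quoted figures and divides them by the default FUSE ("regular") column; no new measurements are involved, and the names results and speedups are just for this illustration.

```python
# Figures copied from the quoted benchmark table: (regular, spliced, stacked).
results = {
    "sequential write (MiB/s)": (56.55633333, 100.34445, 141.7096667),
    "sequential read (MiB/s)": (49.644, 60.43434, 122.367),
    "random write (IOPS)": (2554.333333, 4053.4545, 8572),
    "random read (IOPS)": (977.3333333, 1223.34, 1432.666667),
}


def speedups(row):
    """Return (spliced / regular, stacked / regular) for one table row."""
    regular, spliced, stacked = row
    return spliced / regular, stacked / regular


for name, row in results.items():
    spliced_x, stacked_x = speedups(row)
    print(f"{name:26s} spliced x{spliced_x:.2f}  stacked x{stacked_x:.2f}")
```

By this arithmetic the patch comes out at about 2.5x default FUSE for sequential I/O, about 3.4x for random writes and about 1.5x for random reads, against roughly 1.2x to 1.8x for splice.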


