From: Nikolaus Rath <Nikolaus@rath.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: fuse-devel <fuse-devel@lists.sourceforge.net>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Matthew Wilcox <willy@infradead.org>,
Dave Chinner <dchinner@redhat.com>
Subject: Re: [fuse-devel] 512 byte aligned write + O_DIRECT for xfstests
Date: Mon, 22 Jun 2020 08:35:03 +0100
Message-ID: <87mu4vsgd4.fsf@vostro.rath.org>
In-Reply-To: <CAOQ4uxiYG3Z9rnXB6F+fnRtoV1e3k=WP5-mgphgkKsWw+jUK=Q@mail.gmail.com> (Amir Goldstein's message of "Mon, 22 Jun 2020 09:37:35 +0300")
On Jun 22 2020, Amir Goldstein <amir73il@gmail.com> wrote:
> [+CC fsdevel folks]
>
> On Mon, Jun 22, 2020 at 8:33 AM Nikolaus Rath <Nikolaus@rath.org> wrote:
>>
>> On Jun 21 2020, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> >> I am not sure that is correct. At step 6, the write() request from
>> >> userspace is still being processed. I don't think that it is reasonable
>> >> to expect that the write() request is atomic, i.e. you can't expect to
>> >> see none or all of the data that is *currently being written*.
>> >
>> > Apparently the standard is quite clear on this:
>> >
>> > "All of the following functions shall be atomic with respect to each
>> > other in the effects specified in POSIX.1-2017 when they operate on
>> > regular files or symbolic links:
>> >
>> > [...]
>> > pread()
>> > read()
>> > readv()
>> > pwrite()
>> > write()
>> > writev()
>> > [...]
>> >
>> > If two threads each call one of these functions, each call shall
>> > either see all of the specified effects of the other call, or none of
>> > them."[1]
>> >
>> > Thanks,
>> > Miklos
>> >
>> > [1]
>> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
>>
>> Thanks for digging this up, I did not know about this.
>>
>> That leaves FUSE in a rather uncomfortable place though, doesn't it?
>> What does the kernel do when userspace issues a write request that's
>> bigger than the FUSE userspace pipe? It sounds like either the request must
>> be split (so it becomes non-atomic), or you'd have to return a short
>> write (which IIRC is not supposed to happen for local filesystems).
>>
>
> What makes you say that short writes are not supposed to happen?
I don't think it was an authoritative source, but I've repeatedly read
that "you do not have to worry about short reads/writes when accessing
the local disk". I expect this assumption to be baked into many
programs, whether it is valid or not.
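
For illustration, a program that does not want to rely on that
assumption has to loop until everything is written. A minimal sketch in
Python (write_all is a hypothetical helper name, not part of FUSE or
libfuse):

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, retrying after short writes.

    Hypothetical helper for illustration only.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write() returns the number of bytes actually written, which
        # may be fewer than requested (a short write).
        written = os.write(fd, view[total:])
        if written == 0:
            raise OSError("write() made no progress")
        total += written
    return total
```

Note that such a loop is not atomic as a whole: a concurrent reader can
observe the state between two iterations.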
> Seems like the options for FUSE are:
> - Take shared i_rwsem lock on read like XFS and regress performance of
>   mixed rw workload
> - Do the above only for non-direct and writeback_cache to minimize the
>   damage potential
> - Return short read/write for direct IO if request is bigger than FUSE
>   buffer size
> - Add a FUSE mode that implements direct IO internally as something like
>   RWF_UNCACHED [2] - this is a relaxed version of "no caching" in client or
>   a stricter version of "cache write-through" in the sense that during an
>   ongoing large write operation, read of those fresh written bytes only is
>   served from the client cache copy and not from the server.
I didn't understand all of that, but it seems to me that there is a
fundamental problem with splitting up a single write into multiple FUSE
requests, because the second request may fail after the first one
succeeds.
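
To make the concern concrete, here is a toy model in Python. It is only
a sketch and not the kernel's actual code path: split_write,
send_request and FlakyBackend are hypothetical names, and I am assuming
the usual write(2) convention that an error after partial progress is
reported as a short count.

```python
import errno


def split_write(send_request, data, offset, max_req):
    """Issue one logical write as several requests of at most max_req bytes.

    send_request(chunk, offset) is a hypothetical stand-in for a single
    FUSE write request; it raises OSError on failure.
    """
    done = 0
    while done < len(data):
        chunk = data[done:done + max_req]
        try:
            send_request(chunk, offset + done)
        except OSError:
            if done == 0:
                raise  # nothing written yet, so report the error itself
            # Earlier requests already succeeded and cannot be undone,
            # so the caller can only be told about a short write.
            return done
        done += len(chunk)
    return done


class FlakyBackend:
    """Toy in-memory 'server' that fails after a fixed number of requests."""

    def __init__(self, fail_after):
        self.buf = bytearray()
        self.remaining = fail_after

    def write(self, chunk, offset):
        if self.remaining == 0:
            raise OSError(errno.EIO, "simulated failure")
        self.remaining -= 1
        end = offset + len(chunk)
        if len(self.buf) < end:
            # Grow the buffer with zero bytes up to the end of this write.
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = chunk
```

With FlakyBackend(1) and max_req=4, an 8-byte write leaves the first 4
bytes in place and returns 4: the caller sees neither "all" nor "none".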
Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«