linux-api.vger.kernel.org archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Chris Mason <clm@fb.com>, Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA
Date: Mon, 3 Jun 2019 09:17:19 +0300
Message-ID: <CAOQ4uxhqJJvr=uHmn_vPPPwZDCQoL2GFug30quFScNORT5Fw=w@mail.gmail.com>
In-Reply-To: <20190603042540.GH29573@dread.disaster.area>

> > Actually, one of my use cases is "atomic rename" of files with
> > no data (looking for atomicity w.r.t xattr and mtime), so this "atomic rename"
> > thread should not be interfering with other workloads at all.
>
> Which should already be guaranteed because a) rename is supposed to be
> atomic, and b) metadata ordering requirements in journalled
> filesystems. If they lose xattrs across rename, there's something
> seriously wrong with the filesystem implementation.  I'm really not
> sure what you think filesystems are actually doing with metadata
> across rename operations....
>

Dave,

We are going in circles so much that my head is spinning.
I don't blame anyone for having a hard time keeping up with the plot, because
it spans many threads and subjects, so let me reiterate:

- I *do* know that rename provides me the needed "metadata barrier"
  w.r.t. xattr on xfs/ext4 today.
- I *do* know the sync_file_range()+rename() callback provides the "data barrier"
  I need on xfs/ext4 today
- I *do* use this internal fs knowledge in my applications
- I even fixed up sync_file_range() per your suggestion, so I won't need to use
  the FIEMAP_FLAG_SYNC hack
- An attempt by CrashMonkey developers to document this behavior was
  "shot down" for many justified reasons
- Without any documentation or an explicit API with a clear guarantee, users
  cannot write efficient applications without being aware of the underlying
  filesystem and following that filesystem's development to make sure its
  behavior has not changed
- The most recent proposal I made at LSF, based on Jan's suggestion, is
  to change nothing in the filesystem implementations, but to use a new *explicit*
  verb to communicate the expectation of the application, so that filesystems
  are free to change their behavior in the future in the absence of the new verb
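For readers following along, the xfs/ext4 pattern described in the list can be sketched roughly as follows. This is a minimal illustration, not code from this thread: the helper name write_then_rename() and the file names are hypothetical, and the ordering it relies on is exactly the undocumented, filesystem-specific behavior being discussed. Note that sync_file_range() does not flush the device write cache or the file's metadata, so this is not a durability guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write buf to tmp, wait for writeback of tmp's data,
 * then rename tmp over final. The "data barrier" property relies on
 * present xfs/ext4 behavior, not on any documented guarantee. */
static int write_then_rename(const char *tmp, const char *final,
                             const void *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Treat a short write as failure to keep the sketch simple. */
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    /* offset 0 with nbytes 0 means "from offset to end of file". */
    if (sync_file_range(fd, 0, 0,
                        SYNC_FILE_RANGE_WAIT_BEFORE |
                        SYNC_FILE_RANGE_WRITE |
                        SYNC_FILE_RANGE_WAIT_AFTER) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;
    /* No fsync() of the file or the directory: not crash-durable. */
    return rename(tmp, final);
}
```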

Once again, ATOMIC_METADATA is a noop in present xfs/ext4.
ATOMIC_DATA is sync_file_range() in present xfs/ext4.
The APIs I *need* from the kernel *do* exist, but the filesystem developers
(except xfs) are not willing to document the guarantee that the existing
interfaces provide in the present.
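To make that mapping concrete, here is a sketch of how the proposed linkat() flags could be emulated today on xfs/ext4 with existing calls. The DEMO_ATOMIC_* constants and demo_linkat_atomic() are hypothetical stand-ins invented for illustration; they are not the AT_ATOMIC_* flags from the RFC, which were never merged into the kernel. The emulation is only meaningful on filesystems that behave like present xfs/ext4.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical flags for this sketch only; values are arbitrary and
 * unrelated to any kernel definition. */
#define DEMO_ATOMIC_METADATA 0x1
#define DEMO_ATOMIC_DATA     0x2

/* Emulates the proposed semantics, assuming xfs/ext4 behavior:
 *  - DEMO_ATOMIC_METADATA: nothing extra (a noop on xfs/ext4 today)
 *  - DEMO_ATOMIC_DATA: sync_file_range() on the source before linking
 * fd must refer to the same file as oldpath. */
static int demo_linkat_atomic(int fd, const char *oldpath,
                              const char *newpath, int flags)
{
    if (flags & DEMO_ATOMIC_DATA) {
        if (sync_file_range(fd, 0, 0,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -1;
    }
    /* DEMO_ATOMIC_METADATA relies on existing journal ordering. */
    return linkat(AT_FDCWD, oldpath, AT_FDCWD, newpath, 0);
}
```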

[...]
> So, in the interests of /informed debate/, please implement what you
> want using batched AIO_FSYNC + rename/linkat completion callback and
> measure what it achieves. Then implement a sync_file_range/linkat
> thread pool that provides the same functionality to the application
> (i.e. writeback concurrency in userspace) and measure it. Then we
> can discuss what the relative overhead is with numbers and can
> perform analysis to determine what the cause of the performance
> differential actually is.
>

Fair enough.
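As one possible starting point for the second experiment Dave describes, a sync_file_range() thread pool that commits files by name might look like the sketch below. It assumes POSIX threads and a simple mutex-protected job index; struct job, run_pool() and the 64-thread cap are hypothetical choices for illustration. It uses rename(), though linkat() would slot into commit_one() the same way, and it says nothing about performance.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_THREADS 64  /* arbitrary cap for this sketch */

/* One commit job: write back tmp's data, then rename it over final. */
struct job {
    const char *tmp;
    const char *final;
    int result;              /* 0 on success, -1 on failure */
};

/* Shared queue: workers claim jobs by advancing `next` under `lock`. */
struct pool {
    struct job *jobs;
    size_t njobs;
    size_t next;
    pthread_mutex_t lock;
};

static int commit_one(const struct job *j)
{
    int fd = open(j->tmp, O_WRONLY);
    if (fd < 0)
        return -1;
    int ret = sync_file_range(fd, 0, 0,
                              SYNC_FILE_RANGE_WAIT_BEFORE |
                              SYNC_FILE_RANGE_WRITE |
                              SYNC_FILE_RANGE_WAIT_AFTER);
    close(fd);
    if (ret < 0)
        return -1;
    return rename(j->tmp, j->final);   /* linkat() would also fit here */
}

static void *worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        size_t i = p->next;
        if (i < p->njobs)
            p->next++;
        pthread_mutex_unlock(&p->lock);
        if (i >= p->njobs)
            return NULL;            /* queue drained */
        p->jobs[i].result = commit_one(&p->jobs[i]);
    }
}

/* Runs all jobs on up to nthreads helper threads plus the calling
 * thread. Returns the number of jobs that failed. */
static size_t run_pool(struct job *jobs, size_t njobs, int nthreads)
{
    struct pool p = { jobs, njobs, 0, PTHREAD_MUTEX_INITIALIZER };
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[started], NULL, worker, &p) != 0)
            break;                  /* continue with those that started */
        started++;
    }
    worker(&p);                     /* the caller helps drain the queue */
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);

    size_t failed = 0;
    for (size_t i = 0; i < njobs; i++)
        if (jobs[i].result != 0)
            failed++;
    return failed;
}
```

Because the calling thread also drains the queue, every job is processed even if no helper thread could be created.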

> Neither of these things require kernel modifications, but you need
> to provide the evidence that existing APIs are insufficient.

APIs are sufficient if I know which filesystem I am running on.
btrfs needs a different set of syscalls to get the same thing done.
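In practice, knowing which filesystem you are running on usually means probing it with fstatfs(2). The sketch below assumes the f_type magic numbers from <linux/magic.h> and applies a hypothetical policy: use sync_file_range() on xfs/ext4, where the message says it suffices, and fall back to fsync() elsewhere. The fsync() fallback is an assumption for illustration; the message does not say which syscalls btrfs actually requires.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/magic.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Returns 1 if fd lives on xfs or ext4, where (per this thread)
 * sync_file_range() acts as a data barrier before rename/link.
 * Caveat: ext2/ext3/ext4 share one magic (0xEF53), so f_type alone
 * cannot tell them apart. */
static int has_srf_data_barrier(int fd)
{
    struct statfs sfs;
    if (fstatfs(fd, &sfs) < 0)
        return 0;
    switch (sfs.f_type) {
    case XFS_SUPER_MAGIC:
    case EXT4_SUPER_MAGIC:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical policy: the lighter call where its behavior is known,
 * the heavier but well-defined fsync() everywhere else. */
static int data_barrier(int fd)
{
    if (has_srf_data_barrier(fd))
        return sync_file_range(fd, 0, 0,
                               SYNC_FILE_RANGE_WAIT_BEFORE |
                               SYNC_FILE_RANGE_WRITE |
                               SYNC_FILE_RANGE_WAIT_AFTER);
    return fsync(fd);
}
```

This is precisely the kind of filesystem-specific branching an explicit verb would make unnecessary.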

> Indeed, we now have the new async ioring stuff that can run async
> sync_file_range calls, so you probably need to benchmark replacing
> AIO_FSYNC with that interface as well. This new API likely does
> exactly what you want without the journal/device cache flush
> overhead of AIO_FSYNC....
>

Indeed, I am keeping a close watch on io_uring.

Thanks,
Amir.


Thread overview: 15+ messages
2019-05-27 17:26 [RFC][PATCH] link.2: AT_ATOMIC_DATA and AT_ATOMIC_METADATA Amir Goldstein
2019-05-28 20:06 ` Darrick J. Wong
2019-05-29  5:58   ` Amir Goldstein
2019-05-28 20:26 ` Theodore Ts'o
2019-05-29  5:38   ` Amir Goldstein
2019-05-31 15:21     ` Amir Goldstein
2019-05-31 16:41       ` Theodore Ts'o
2019-05-31 17:22         ` Amir Goldstein
2019-05-31 19:21           ` Theodore Ts'o
2019-05-31 22:45         ` Dave Chinner
2019-05-31 23:28           ` Dave Chinner
2019-06-01  8:01             ` Amir Goldstein
2019-06-03  4:25               ` Dave Chinner
2019-06-03  6:17                 ` Amir Goldstein [this message]
2019-06-01  7:21           ` Amir Goldstein
