From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
lsf-pc <lsf-pc@lists.linux-foundation.org>,
Jan Kara <jack@suse.cz>, Christian Brauner <brauner@kernel.org>,
Josef Bacik <josef@toxicpanda.com>,
Jeff Layton <jlayton@kernel.org>
Subject: Re: [LSF/MM/BPF TOPIC] vfs write barriers
Date: Thu, 23 Jan 2025 11:34:29 +1100
Message-ID: <Z5GOlVQpN47LLmo1@dread.disaster.area>
In-Reply-To: <CAOQ4uxgYERCmPrTXjuM4Q3HdWK_HxuOkkpAEnesDHCAD=9fsOg@mail.gmail.com>
On Mon, Jan 20, 2025 at 12:41:33PM +0100, Amir Goldstein wrote:
> For the HSM prototype, we track changes to a filesystem during
> a given time period by handling pre-modify vfs events and recording
> the file handles of changed objects.
>
> sb_write_barrier(sb) provides an (internal so far) vfs API to wait
> for in-flight syscalls that can be still modifying user visible in-core
> data/metadata, without blocking new syscalls.
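The semantics quoted above (wait for writers already in flight, never block new ones) can be modelled in userspace roughly as follows. This is only a toy sketch under stated assumptions: the class `WriteBarrier` and its methods `write_begin()`, `write_end()` and `barrier()` are hypothetical names, and the epoch-counting scheme is an assumption of the sketch. Nothing in this thread says how sb_write_barrier() is implemented in the kernel.

```python
import threading


class WriteBarrier:
    """Toy model of a 'wait for in-flight writers' barrier.

    Hypothetical illustration only; not the kernel's sb_write_barrier().
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._epoch = 0      # bumped by every barrier() call
        self._active = {}    # epoch -> number of writers still in flight

    def write_begin(self):
        """Mark the start of a modification; never blocks on a barrier."""
        with self._cond:
            epoch = self._epoch
            self._active[epoch] = self._active.get(epoch, 0) + 1
            return epoch

    def write_end(self, epoch):
        """Mark the end of a modification started in the given epoch."""
        with self._cond:
            self._active[epoch] -= 1
            if self._active[epoch] == 0:
                del self._active[epoch]
                self._cond.notify_all()

    def barrier(self, timeout=None):
        """Wait until every writer that began before this call has ended.

        Returns True once those writers have drained, False on timeout.
        Writers that begin after this call land in a newer epoch and are
        not waited for.
        """
        with self._cond:
            self._epoch += 1
            cutoff = self._epoch
            return self._cond.wait_for(
                lambda: all(e >= cutoff for e in self._active), timeout
            )
```

In this model a caller brackets each modification with `write_begin()` and `write_end()`, and a consumer calls `barrier()` to learn that everything started earlier has finished. Note that the sketch says nothing about what has reached stable storage; it only orders in-memory activity.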
Yes, I get this part. What I don't understand is how it is in any
way useful....
> The method described in the HSM prototype [3] uses this API
> to persist the state that all the changes until time T were "observed".
>
> > This proposed write barrier does not seem capable of providing any
> > sort of physical data or metadata/data write ordering guarantees, so
> > I'm a bit lost in how it can be used to provide reliable "crash
> > consistent change tracking" when there is no relationship between
> > the data/metadata in memory and data/metadata on disk...
>
> That's a good question. A bit hard to explain but I will try.
>
> The short answer is that the vfs write barrier does *not* by itself
> provide the guarantee for "crash consistent change tracking".
>
> In the prototype, the "crash consistent change tracking" guarantee
> is provided by the fact that the change records are recorded as
> metadata in the same filesystem, prior to the modification, and
> those metadata records are strictly ordered by the filesystem before
> the actual change.
This doesn't make any sense to me - you seem to be assuming that I
know an awful lot about how your HSM prototype works.
What's in a change record, when does it get written, what are its
persistence semantics, what filesystem metadata is it being written
to? How does this relate to the actual dirty data that is
resident in the page cache that hasn't been written to stable
storage yet? Is there another change record to say the data the
first change record tracks has been written to persistent storage?
> The vfs write barrier allows to partition the change tracking records
> into overlapping time periods in a way that allows the *consumer* of
> the changes to consume the changes in a "crash consistent manner",
> because:
> 1. All the in-core changes recorded before the barrier are fully
> observable after the barrier
> 2. All the in-core changes that started after the barrier, will be recorded
> for the future change query
>
> I would love to discuss the merits and pitfalls of this method, but the
> main thing I wanted to get feedback on is whether anyone finds the
> described vfs API useful for anything other than the change tracking
> system that I described.
This seems like a very specialised niche use case right now, but I
still have no clear idea how the application using this proposed
write barrier actually works to achieve the stated functionality
this feature provides it with...
-Dave.
--
Dave Chinner
david@fromorbit.com