From: Gao Xiang <hsiangkao@linux.alibaba.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>,
"Darrick J. Wong" <djwong@kernel.org>,
Amir Goldstein <amir73il@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <ast@kernel.org>,
linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH] bpf: add bpf_real_inode() kfunc
Date: Fri, 10 Apr 2026 15:29:39 +0800
Message-ID: <6bfe805e-be0d-4bd5-ac45-1b58fc55839f@linux.alibaba.com>
In-Reply-To: <adihXuFuuS5uoJ31@infradead.org>

On 2026/4/10 15:06, Christoph Hellwig wrote:
> On Fri, Apr 10, 2026 at 02:46:00PM +0800, Gao Xiang wrote:
>>> It needs to be applied in-memory for every change, and persisted to
>>> disk on every fsync or equivalent operation.
>>
>> Yes, yet it doesn't change my evaluation: and you need
>> to consider background writebacks too (since writeback
>> will update data and then impact the whole hash tree).
>>
>> Currently data writeback can be applied for each block
>> independently, but if you consider maintaining a hash
>> tree (rather than simple checksums), I guess you have
>> to keep strict atomicity between data writeback,
>> metadata and hash-tree writeback, otherwise the hashes
>> and partial writeback data will be mismatched.
>
> You write the leaf checksum with each block. The rest of the chain
> leading up to the root is kept in metadata tied to the inode and needs
> to be written atomically with the transaction commit that updates the
> on-disk metadata to point to the newly written block.
>
>> Yes, the OOB approach for leaf hashes will help to
>> reduce write amplification, but my current observation
>> is that it won't have any help to read amplification,
>> especially for small random read; overall it depends
>> on the target workload.
>
> For HDD it roughly halves the number of seeks for random reads, and
> at least significantly reduces it significantly but quite a bit
> less. For SSD it reduces the IOPS in a similar way, but for that

Yes, seeks can be reduced, and it will help more if all
related leaf block hashes can be loaded in a single request
(even if some of the main data blocks are not needed). In
practice, a PoC that measures the difference between keeping
hashes with individual blocks and keeping them in an extended
area may help to get a more detailed idea.
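
As a rough illustration of what such a PoC would count, here
is a minimal Python sketch. It is only a toy model, not a
measurement: the 4096-byte block size, the 32-byte digest
size, the LRU cache of hash blocks and both function names
are assumptions made for this example, and the upper tree
levels are assumed to be cached already.

```python
HASH_SIZE = 32          # assumed digest size in bytes (e.g. SHA-256)
BLOCK_SIZE = 4096       # assumed data and hash block size in bytes
HASHES_PER_BLOCK = BLOCK_SIZE // HASH_SIZE  # 128 leaf hashes per hash block


def requests_separate_area(reads, cached_hash_blocks=0):
    """Count device requests when leaf hashes live in a separate hash area.

    Each read needs its data block plus the hash block covering it, unless
    that hash block is among the `cached_hash_blocks` most recently used ones.
    """
    cache = []  # hash block indexes in LRU order, newest last
    requests = 0
    for blk in reads:
        requests += 1  # the data block itself
        hash_blk = blk // HASHES_PER_BLOCK
        if hash_blk in cache:
            cache.remove(hash_blk)  # hit: refresh its LRU position below
        else:
            requests += 1  # miss: extra request for the leaf hash block
        cache.append(hash_blk)
        if len(cache) > cached_hash_blocks:
            cache.pop(0)  # evict the least recently used hash block
    return requests


def requests_per_block(reads):
    """Count device requests when each leaf hash is stored with its block."""
    return len(reads)  # the hash arrives together with the data
```

With no hash blocks cached, every random read in this model
costs two requests in the separate-area layout and one in the
per-block layout, which is the "roughly halves" case above.
The model ignores request sizes and latency, so real numbers
still have to come from the PoC.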

If it turns out to be useful, I think dm-verity could work
this way too as a bonus, as long as the device supports it
(and converting between these two metadata formats should be
trivial).

> you need to max out the IOPS, which for most workloads you won't
> on anything currently (and probably in the future) using erofs.

The problem is not always about maximum IOPS (I also suspect
there are such real long-term max-IOPS workloads too); small
random burst I/Os with low latency are what we usually care
about most in typical use cases.

Thanks,
Gao Xiang