public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Harmstone <mark@harmstone.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, Boris Burkov <boris@bur.io>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: add BTRFS_IOC_GET_CSUMS ioctl
Date: Tue, 7 Apr 2026 19:13:45 +0100	[thread overview]
Message-ID: <a40afd17-bfb6-486d-8632-9663d6cf7ce3@harmstone.com> (raw)
In-Reply-To: <2bb3df33-a9e0-48fc-bff4-957c7d7cb8eb@gmx.com>

I think all three of us are confusing each other a little here.

The ioctl answers the question: if I were to read X bytes of data from a 
file at Y offset and calculated the csums manually, what would the value 
be? To which the kernel responds either with the values, that the read 
is guaranteed to return zero and thus we can use the precomputed csum 
for the zero sector, or that the value isn't known and userspace has to 
do it anyway.

The value isn't known if it's a nodatasum file or if it's compressed. We 
store the csums of compressed extents, but crucially it's over the 
compressed data. So there's no one-to-one mapping between file blocks 
and compressed sectors (by definition, because it's compressed), and 
bookending means that it might be data we don't have access to.

We absolutely can't give non-root users csums to arbitrary data, that's 
definitely a security breach.

Userspace can already obtain the csums from the disk for a file by using 
FIEMAP and the tree search ioctl. But I believe the consensus around the 
tree search ioctl is a) that it's very difficult to use, as you need to 
know the internals of btrfs, and b) it requires CAP_SYS_ADMIN, at a time 
when containerization and finer-grained access controls means this is 
frowned upon.

This ioctl is a simpler way of doing the csum lookup, and without 
requiring root.

On 04/04/2026 12.00 am, Qu Wenruo wrote:
> 
> 
> 在 2026/4/4 09:14, Boris Burkov 写道:
>> On Fri, Apr 03, 2026 at 08:16:26AM +1030, Qu Wenruo wrote:
>>>
>>>
>>> 在 2026/4/3 03:35, Mark Harmstone 写道:
>>>> On 25/03/2026 9.04 pm, Qu Wenruo wrote:
>>> [...]
>>>>>
>>>>> That's done by progs through fiemap. There will be a flag ENCODED
>>>>> for compressed file extents.
>>>>
>>>> No, this still won't work I'm afraid. The ioctl is answering the
>>>> question "what's the csum of the sector no. such-and-such in this
>>>> file?". That can't be answered for compressed extents, as the csums are
>>>> on the compressed data.
>>>
>>> My point is, since you are not trying to fetching the csum of compressed
>>> extent in the first place, you don't need to bother that situation at 
>>> all.
>>>
>>> And even for compressed extents, it is still possible to fetch the csum,
>>> after all we're just search the csum tree for a given logical bytenr.
>>>
>>> There will be some extra concerns like fiemap can not return the real
>>> compressed length, but again we ruled our compressed extents in the 
>>> first
>>> place.
>>>
>>
>> I may be misinterpreting, but I feel like the question at hand is how
>> much hardening the ioctl requires to be correct, and how much work it
>> can delegate to userspace.
>>
>> Suppose we make it require root, I think we could make the interface
>> much simpler and just use the logical offsets instead of a file based
>> interface, and we can leave it up to userspace entirely to figure out
>> which ranges they care about.
>>
>> OTOH, if we agree we want the csums ioctl to be unprivileged, which
>> means that the interface must assume that the input could be bad, on
>> overlapping extents, not marked up properly, etc... In that case, I do
>> not quite know *exactly* what is redundant with fiemap, what is exactly
>> necessary for safety vs for caller convenience, etc.
>>
>> Basically, I think the options are roughly:
>>
>> - Mark's proposal: A smart, convenient GET_CSUMS that does everything
>>    turnkey and as helpfully as possible. Lots of redundance with fiemap.
>>    Safe to make unprivileged.
>> - Qu's review: Require the user to do the fiemap part themselves and
>>    don't make GET_CSUMS quite as turnkey. It is unclear to me whether it
>>    is possible to make such a version unprivileged safely *without*
>>    the fiemap redundancy.
>> - Boris's strawman: A dumb, inconvenient GET_CSUMS that expects a lot of
>>    userspace but doesn't check anything and definitely needs root. If we
>>    do go root-only, I feel like this might be the best interface?
> 
> Well, my idea is more aligned with yours, except the root part.
> 
> Our ideas share the same part, the ioctl just handles things inside the 
> csum tree without bothering subvolume tree.
> 
> Yes, bad inputs can lead to a lot of information leakage if we allow 
> non-root users to use this ioctl, but I doubt if they can really do 
> anything with the information they got.
> 
> One still needs proper privilege to call fiemap on a file, so even if 
> one knows there are some csum at random logical bytenr, unless they can 
> access fiemap result of files that are utilizing those bytenrs, the csum 
> is still useless.
> 
> But I'm also fine with root privilege requirement for the ioctl too, as 
> to me stricter requirement has no obvious disadvantage, and can release 
> us from safety concerns.
> 
>>
>> And the questions are:
>> 1. How badly do we want non-root? In practice, mkfs is root when writing
>> disks but not necessarily when writing image files, so it's a bit of a
>> toss up there. At meta we tend to end up sad when mkfs has root-only
>> functionality that we want.
> 
> I'm fine either way.
> 
>> 2. What is the bare minimum processing needed to safely allow non-root
>> callers with arbitrarily wrong input? I don't see how we can assume they
>> will use fiemap correctly and not hit bookends, or set correct tags on
>> the input, for a few examples.
> 
> They just pass random logical into the ioctl, and we return whatever 
> they want, including something to show which range has csum, and the csum.
> 
> Let me be clear again, it's just a variant of TREE_SEARCH, except the 
> existing TREE_SEARCH is not good enough for csum tree search.
> 
> We don't need to bother if it's bookend/compressed or whatever, if they 
> want to do stupid things, that's their choice, and just reading the csum 
> shouldn't cause any writes/effects/damage to the fs, so let they do 
> whatever.
> 
> Thanks,
> Qu
> 
>>
>> Thanks,
>> Boris
>>
> 


  reply	other threads:[~2026-04-07 18:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-20 12:50 [PATCH] btrfs: add BTRFS_IOC_GET_CSUMS ioctl Mark Harmstone
2026-03-20 13:03 ` Mark Harmstone
2026-03-20 22:18 ` Qu Wenruo
2026-03-25  7:34   ` Qu Wenruo
2026-03-25 14:43     ` Mark Harmstone
2026-03-25 21:04       ` Qu Wenruo
2026-04-02 17:05         ` Mark Harmstone
2026-04-02 21:46           ` Qu Wenruo
2026-04-03 22:44             ` Boris Burkov
2026-04-03 23:00               ` Qu Wenruo
2026-04-07 18:13                 ` Mark Harmstone [this message]
2026-04-07 21:52                   ` Qu Wenruo
2026-04-07 22:13                     ` Boris Burkov
2026-04-07 22:39                       ` Qu Wenruo
2026-04-08 13:22                         ` Mark Harmstone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a40afd17-bfb6-486d-8632-9663d6cf7ce3@harmstone.com \
    --to=mark@harmstone.com \
    --cc=boris@bur.io \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox