Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
	Qu Wenruo <wqu@suse.com>, Jonathan Corbet <corbet@lwn.net>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-doc@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
Date: Wed, 3 Apr 2024 17:49:42 +1030	[thread overview]
Message-ID: <305008f4-9e17-4435-bb1d-a56b1de63c9b@gmx.com> (raw)
In-Reply-To: <d01b4606-38fa-4f27-8fbd-31de505ba3a3@dorminy.me>



在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
>>> This means, we will emit a entry that uses the end to the physical
>>> extent end.
>>>
>>> Considering a file layout like this:
>>>
>>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>          generation 7 type 1 (regular)
>>>          extent data disk byte 13631488 nr 65536
>>>          extent data offset 0 nr 4096 ram 65536
>>>          extent compression 0 (none)
>>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>          generation 8 type 1 (regular)
>>>          extent data disk byte 13697024 nr 4096
>>>          extent data offset 0 nr 4096 ram 4096
>>>          extent compression 0 (none)
>>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>          generation 7 type 1 (regular)
>>>          extent data disk byte 13631488 nr 65536
>>>          extent data offset 8192 nr 57344 ram 65536
>>>          extent compression 0 (none)
>>>
>>> For fiemap, we would got something like this:
>>>
>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>
>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>> My concern is on the first entry. It indicates that we have wasted
>>> 60K (phy len is 64K, while logical len is only 4K)
>>>
>>> But that information is not correct, as in reality we only wasted 4K,
>>> the remaining 56K is still referred by file range [8K, 64K).
>>>
>>> Do you mean that user space program should maintain a mapping of each
>>> utilized physical range, and when handling the reported file range
>>> [8K, 64K), the user space program should find that the physical range
>>> covers with one existing extent, and do calculation correctly?
>>
>> My goal is to give an unprivileged interface for tools like compsize
>> to figure out how much space is used by a particular set of files.
>> They report the total disk space referenced by the provided list of
>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>> extents pertaining to the requested files and deduplicating extents
>> based on disk_bytenr.
>>
>> It seems simplest to me for userspace for the kernel to emit the
>> entire extent for each part of it referenced in a file, and let
>> userspace deal with deduplicating extents. This is also most similar
>> to the existing tree-search based interface. Reporting whole extents
>> gives more flexibility for userspace to figure out how to report
>> bookend extents, or shared extents, or ...
>>
>> It does seem a little weird where if you request with fiemap only e.g.
>> 4k-16k range in that example file you'll get reported all 68k
>> involved, but I can't figure out a way to fix that without having the
>> kernel keep track of used parts of the extents as part of reporting,
>> which sounds expensive.
>>
>> You're right that I'm being inconsistent, taking off extent_offset
>> from the reported disk size when that isn't what I should be doing, so
>> I fixed that in v3.
>
> Ah, I think I grasp a point I'd missed before.
> - Without setting disk_bytenr to the actual start of the data on disk,
> there's no way to find the location of the actual data on disk within
> the extent from fiemap alone

Yes, that's my point.

> - But reporting disk_bytenr + offset, to get actual start of data on
> disk, means we need to report a physical size to figure out the end of
> the extent and we can't know the beginning.

disk_bytenr + offset + disk_num_bytes, and with the existing things like
length (aka, num_bytes), filepos (aka, key.offset) flags
(compression/hole/preallocated etc), we have everything we need to know
for regular extents.

For compressed extents, we also need ram_bytes.

If you ask me, I'd say put all the extra members into fiemap entry if we
have the space...

It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
that case it would be u64 * 2 + u32 * 2.

But I'm also 100% sure, the extra members would not be welcomed by other
filesystems either.

Thanks,
Qu

>
> We can't convey both actual location, start, and end of the extent in
> just two pieces of information.
>
> On the other hand, if someone really needs to know the actual location
> on disk of their data, they could use the tree_search ioctl as root to
> do so?
>
> So I still think we should be reporting entire extents but am less
> confident that it doesn't break existing users. I am not sure how common
> it is to take fiemap output on btrfs and use it to try to get to
> physical data on disk - do you know of a tool that does so?
>
> Thank you!
>

  reply	other threads:[~2024-04-03  7:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
2024-03-31  9:03   ` Qu Wenruo
2024-04-03  5:32     ` Sweet Tea Dorminy
2024-04-03  5:52       ` Qu Wenruo
2024-04-03  7:18         ` Sweet Tea Dorminy
2024-04-03  6:02       ` Sweet Tea Dorminy
2024-04-03  7:19         ` Qu Wenruo [this message]
2024-04-09 18:52           ` David Sterba
2024-04-09 19:31             ` Andreas Dilger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=305008f4-9e17-4435-bb1d-a56b1de63c9b@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=brauner@kernel.org \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=dsterba@suse.com \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@meta.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sweettea-kernel@dorminy.me \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox