* Re: Logical to Physical Address Mapping/Translation in Btrfs
2023-12-19 18:55 saranyag
@ 2023-12-19 8:00 ` Andrei Borzenkov
0 siblings, 0 replies; 6+ messages in thread
From: Andrei Borzenkov @ 2023-12-19 8:00 UTC
To: saranyag; +Cc: linux-btrfs
On Tue, Dec 19, 2023 at 9:18 AM <saranyag@cdac.in> wrote:
>
> Hi,
>
> May I know how the logical address is translated to the physical address in
> Btrfs?
>
> I have read the official documentation of Btrfs available here
> (https://btrfs.readthedocs.io/en/latest/Introduction.html). It is not
> covering the address translation part in detail.
>
> I have also gone through the Btrfs source code
> (https://github.com/torvalds/linux/tree/master/fs/btrfs). I could not figure
> out the address translation from the code also.
>
> After referring to the following functions, what I could understand is that
> after getting the logical address of Chunk tree root from the superblock, we
> need to convert it into the corresponding physical address for parsing into
> the next level.
>
> int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices
> *fs_devices, char *options)
>
> int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>
> static int read_one_chunk(struct btrfs_key *key, struct extent_buffer
> *leaf, struct btrfs_chunk *chunk)
>
> Any hints or pointers to the documentation on this is greatly appreciated.
>
Current btrfs-progs has the
btrfs inspect-internal map-swapfile
command, which returns a physical offset on the device. You could look
there for the implementation.
* Logical to Physical Address Mapping/Translation in Btrfs
@ 2023-12-19 11:29 saranyag
2023-12-19 22:20 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: saranyag @ 2023-12-19 11:29 UTC
To: linux-btrfs
Hi,
May I know how a logical address is translated to a physical address in
Btrfs?
I have read the official Btrfs documentation available here
(https://btrfs.readthedocs.io/en/latest/Introduction.html). It does not
cover address translation in detail.
I have also gone through the Btrfs source code
(https://github.com/torvalds/linux/tree/master/fs/btrfs), but I could not
figure out the address translation from the code either.
From the following functions, what I understand is that after getting the
logical address of the chunk tree root from the superblock, we need to
convert it into the corresponding physical address before parsing the
next level.
int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices
*fs_devices, char *options)
int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
static int read_one_chunk(struct btrfs_key *key, struct extent_buffer
*leaf, struct btrfs_chunk *chunk)
Any hints or pointers to documentation on this are greatly appreciated.
I would also like to know about the implementation in btrfs-progs/source code.
Thanks in advance
Saranya G
CDAC
------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
------------------------------------------------------------------------------------------------------------
* Logical to Physical Address Mapping/Translation in Btrfs
@ 2023-12-19 18:55 saranyag
2023-12-19 8:00 ` Andrei Borzenkov
0 siblings, 1 reply; 6+ messages in thread
From: saranyag @ 2023-12-19 18:55 UTC
To: linux-btrfs
Hi,
May I know how a logical address is translated to a physical address in
Btrfs?
I have read the official Btrfs documentation available here
(https://btrfs.readthedocs.io/en/latest/Introduction.html). It does not
cover address translation in detail.
I have also gone through the Btrfs source code
(https://github.com/torvalds/linux/tree/master/fs/btrfs), but I could not
figure out the address translation from the code either.
From the following functions, what I understand is that after getting the
logical address of the chunk tree root from the superblock, we need to
convert it into the corresponding physical address before parsing the
next level.
int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices
*fs_devices, char *options)
int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
static int read_one_chunk(struct btrfs_key *key, struct extent_buffer
*leaf, struct btrfs_chunk *chunk)
Any hints or pointers to documentation on this are greatly appreciated.
Thanks in advance
Saranya G
CDAC
* Re: Logical to Physical Address Mapping/Translation in Btrfs
2023-12-19 11:29 Logical to Physical Address Mapping/Translation in Btrfs saranyag
@ 2023-12-19 22:20 ` Qu Wenruo
2023-12-20 7:24 ` Andrei Borzenkov
0 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2023-12-19 22:20 UTC
To: saranyag, linux-btrfs
On 2023/12/19 21:59, saranyag@cdac.in wrote:
> Hi,
>
> May I know how the logical address is translated to the physical address in
> Btrfs?
This is documented in btrfs-dev-docs/chunks.txt:
https://github.com/btrfs/btrfs-dev-docs/blob/master/chunks.txt
>
> I have read the official documentation of Btrfs available here
> (https://btrfs.readthedocs.io/en/latest/Introduction.html). It is not
> covering the address translation part in detail.
>
> I have also gone through the Btrfs source code
> (https://github.com/torvalds/linux/tree/master/fs/btrfs). I could not figure
> out the address translation from the code also.
>
> After referring to the following functions, what I could understand is that
> after getting the logical address of Chunk tree root from the superblock, we
> need to convert it into the corresponding physical address for parsing into
> the next level.
>
> int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices
> *fs_devices, char *options)
>
> int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>
> static int read_one_chunk(struct btrfs_key *key, struct extent_buffer
> *leaf, struct btrfs_chunk *chunk)
>
> Any hints or pointers to the documentation on this is greatly appreciated.
> I want to know the implementation part in btrfs-progs/source code.
For the implementation, you need to check btrfs_map_block() (same name
in both btrfs-progs and the kernel), which is the core of the logical ->
physical mapping.
All the functions you mentioned just read the chunk tree into memory.
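Before any per-profile arithmetic, a mapping routine has to find which
chunk contains the logical address. The following is only a rough Python
sketch of that lookup, not the kernel's or btrfs-progs' data structure:
the CHUNKS list and find_chunk() are hypothetical names, and the two
entries are the chunk items from the dump-tree output quoted later in
this thread.

```python
from bisect import bisect_right

# Hypothetical in-memory view of the chunk tree, sorted by logical start.
# Each entry is (logical_start, length, profile); values come from the
# CHUNK_ITEM dump quoted in this thread.
CHUNKS = sorted([
    (2446327808, 2147483648, "DATA|RAID0"),
    (4593811456, 268435456, "METADATA|RAID1"),
])


def find_chunk(logical):
    """Return (chunk, offset_inside_chunk), or None if no chunk covers it."""
    starts = [chunk[0] for chunk in CHUNKS]
    # Index of the last chunk whose start is <= logical.
    i = bisect_right(starts, logical) - 1
    if i < 0:
        return None
    start, length, _profile = CHUNKS[i]
    if logical >= start + length:
        return None  # address falls in a hole after this chunk
    return CHUNKS[i], logical - start


if __name__ == "__main__":
    print(find_chunk(4596957184))
```

The offset returned here is what the rest of the thread calls the block
group offset; the profile of the chunk then decides how that offset is
turned into per-device physical addresses.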
Thanks,
Qu
>
> Thanks in advance
> Saranya G
> CDAC
* Re: Logical to Physical Address Mapping/Translation in Btrfs
2023-12-19 22:20 ` Qu Wenruo
@ 2023-12-20 7:24 ` Andrei Borzenkov
2023-12-20 8:51 ` Qu Wenruo
0 siblings, 1 reply; 6+ messages in thread
From: Andrei Borzenkov @ 2023-12-20 7:24 UTC
To: Qu Wenruo; +Cc: saranyag, linux-btrfs
On Wed, Dec 20, 2023 at 1:20 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/12/19 21:59, saranyag@cdac.in wrote:
> > Hi,
> >
> > May I know how the logical address is translated to the physical address in
> > Btrfs?
>
> This is documented in btrfs-dev-docs/chunks.txt:
>
> https://github.com/btrfs/btrfs-dev-docs/blob/master/chunks.txt
>
I tried to read it three times and I still do not understand it.
It starts with showing two chunks - metadata and data
--><--
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 4593811456) itemoff 15863 itemsize 112
    length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
    io_align 65536 io_width 65536 sector_size 4096
    num_stripes 2 sub_stripes 1
        stripe 0 devid 2 offset 2425356288
        dev_uuid a7963b67-1277-49ff-bb1d-9d81c5605f1b
        stripe 1 devid 1 offset 2446327808
        dev_uuid 5f8b54f0-2a35-4330-a06b-9c8fd935bc36

item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2446327808) itemoff 15975 itemsize 112
    length 2147483648 owner 2 stripe_len 65536 type DATA|RAID0
    io_align 65536 io_width 65536 sector_size 4096
    num_stripes 2 sub_stripes 1
        stripe 0 devid 2 offset 1351614464
        dev_uuid a7963b67-1277-49ff-bb1d-9d81c5605f1b
        stripe 1 devid 1 offset 1372585984
        dev_uuid 5f8b54f0-2a35-4330-a06b-9c8fd935bc36
--><--
Then it apparently talks about writing into metadata chunk, judging by
the logical address
--><--
Consider we want to write 2m at at 4596957184 - that's 3m past the start of
the data chunk in the previous example. In order to see where in the physical
stripe this write will go into we need to derive the following values:
block group offset = [address within block group] - [start address of
block group]
block_group_offset = 4596957184 - 4593811456 = 3145728 => 3m
--><--
The chunk start 4593811456 is metadata. But when talking about
physical location it suddenly takes address of the device extent of
the data chunk
--><--
physical_address = [physical stripe start] + [logical_stripe] *
[logical stripe_size]
physical_address = 1351614464 + 48 * 64k = 1351614464 + 3145728 = 1354760192
--><--
1351614464 is the address of the stripe 0 of the data chunk. It is
completely unclear whether it is intentional or not.
Nor does it explain how the device extent (physical stripe) is
selected and how it jumps from the block_group_offset to the
(physical) stripe number.
Intermixing "chunk" and "block group" does not help in understanding it either.
And I suspect RAID5/6 is something entirely different ...
* Re: Logical to Physical Address Mapping/Translation in Btrfs
2023-12-20 7:24 ` Andrei Borzenkov
@ 2023-12-20 8:51 ` Qu Wenruo
0 siblings, 0 replies; 6+ messages in thread
From: Qu Wenruo @ 2023-12-20 8:51 UTC
To: Andrei Borzenkov; +Cc: saranyag, linux-btrfs
On 2023/12/20 17:54, Andrei Borzenkov wrote:
> On Wed, Dec 20, 2023 at 1:20 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/12/19 21:59, saranyag@cdac.in wrote:
>>> Hi,
>>>
>>> May I know how the logical address is translated to the physical address in
>>> Btrfs?
>>
>> This is documented in btrfs-dev-docs/chunks.txt:
>>
>> https://github.com/btrfs/btrfs-dev-docs/blob/master/chunks.txt
>>
>
> I tried to read it three times and I still do not understand it.
>
> It starts with showing two chunks - metadata and data
>
> --><--
> item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 4593811456) itemoff 15863 itemsize 112
>     length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
>     io_align 65536 io_width 65536 sector_size 4096
>     num_stripes 2 sub_stripes 1
>         stripe 0 devid 2 offset 2425356288
>         dev_uuid a7963b67-1277-49ff-bb1d-9d81c5605f1b
>         stripe 1 devid 1 offset 2446327808
>         dev_uuid 5f8b54f0-2a35-4330-a06b-9c8fd935bc36
>
> item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2446327808) itemoff 15975 itemsize 112
>     length 2147483648 owner 2 stripe_len 65536 type DATA|RAID0
>     io_align 65536 io_width 65536 sector_size 4096
>     num_stripes 2 sub_stripes 1
>         stripe 0 devid 2 offset 1351614464
>         dev_uuid a7963b67-1277-49ff-bb1d-9d81c5605f1b
>         stripe 1 devid 1 offset 1372585984
>         dev_uuid 5f8b54f0-2a35-4330-a06b-9c8fd935bc36
> --><--
>
> Then it apparently talks about writing into metadata chunk, judging by
> the logical address
Firstly, btrfs uses "chunk" and "block group" interchangeably.
"Block group" is used more in the extent tree, where we have
BLOCK_GROUP_ITEM, which focuses on used/free space.
For logical -> physical mapping we use "chunk" more frequently, as
that's the name used by the chunk tree and CHUNK_ITEM.
>
> --><--
> Consider we want to write 2m at at 4596957184 - that's 3m past the start of
> the data chunk in the previous example. In order to see where in the physical
> stripe this write will go into we need to derive the following values:
>
> block group offset = [address within block group] - [start address of
> block group]
> block_group_offset = 4596957184 - 4593811456 = 3145728 => 3m
> --><--
>
> The chunk start 4593811456 is metadata. But when talking about
> physical location it suddenly takes address of the device extent of
> the data chunk
>
> --><--
> physical_address = [physical stripe start] + [logical_stripe] *
> [logical stripe_size]
> physical_address = 1351614464 + 48 * 64k = 1351614464 + 3145728 = 1354760192
> --><--
Damn it, this part is for RAID0, and it is not correct, since our write
should land in the RAID1 METADATA chunk.
RAID1* is pretty simple: every mirror is a full copy of the others, with
no striping/rotation.
Thus a write at block group offset 3M goes to both mirrors:
physical_address = [physical stripe start] + [offset inside bg]
So the result should be:
Mirror 1 [devid 2] physical address = 2425356288 + 3M = 2428502016
Mirror 2 [devid 1] physical address = 2446327808 + 3M = 2449473536
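As a rough illustration of the RAID1 rule above (every mirror gets the
same offset inside its dev extent), here is a small Python sketch. The
function name raid1_map() and the (devid, offset) tuple layout are made
up for this example; the numbers are the METADATA|RAID1 chunk from the
quoted dump.

```python
def raid1_map(logical, chunk_start, stripes):
    """Map a logical address in a RAID1 chunk to every mirror.

    stripes is a list of (devid, dev_extent_start) pairs, one per mirror.
    Returns a list of (devid, physical_address) pairs.
    """
    bg_off = logical - chunk_start  # offset inside the block group
    return [(devid, start + bg_off) for devid, start in stripes]


# METADATA|RAID1 chunk from the dump: logical start 4593811456,
# stripe 0 on devid 2, stripe 1 on devid 1.
mirrors = raid1_map(
    4596957184,
    4593811456,
    [(2, 2425356288), (1, 2446327808)],
)

if __name__ == "__main__":
    for devid, physical in mirrors:
        print(f"devid {devid} physical {physical}")
```

A write has to go to every entry in the returned list, while a read can
be served from any one of them.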
>
> 1351614464 is the address of the stripe 0 of the data chunk. It is
> completely unclear whether it is intentional or not.
>
> Nor does it explain how the device extent (physical stripe) is
> selected and how it jumps from the block_group_offset to the
> (physical) stripe number.
For RAID0 the situation is a little more complex, but still easy to
understand.
A RAID0 chunk is split into 64K stripes, and consecutive stripes are
spread across the devices in turn.
If we have a RAID0 chunk with 2 devices, just like the data chunk example:
Off inside bg   0          +64K       +128K      +192K
                | Stripe 0 | Stripe 1 | Stripe 2 | ...
Then stripe 0 is at the first dev extent of that data chunk, at offset 0
into the dev extent (devid 2, physical 1351614464 + 0).
Stripe 1 is at the second dev extent, also at offset 0 into the dev
extent (devid 1, physical 1372585984 + 0).
And stripe 2 is at the first dev extent again, but at offset 64K into
the dev extent.
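The stripe layout described above can be enumerated with a few lines of
Python. This is only a sketch: STRIPE_LEN, DEV_EXTENTS and
stripe_location() are illustrative names, and the two dev extents are
the ones from the DATA|RAID0 chunk in the quoted dump.

```python
STRIPE_LEN = 64 * 1024

# (devid, physical start of the dev extent), in chunk stripe order.
DEV_EXTENTS = [(2, 1351614464), (1, 1372585984)]


def stripe_location(stripe_nr):
    """Return (devid, physical start) of the given 64K logical stripe."""
    nr_dev = len(DEV_EXTENTS)
    devid, extent_start = DEV_EXTENTS[stripe_nr % nr_dev]
    # Each full round over all devices advances one stripe inside each extent.
    return devid, extent_start + (stripe_nr // nr_dev) * STRIPE_LEN


layout = [stripe_location(n) for n in range(4)]

if __name__ == "__main__":
    for n, (devid, physical) in enumerate(layout):
        print(f"stripe {n}: devid {devid} physical {physical}")
```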
So a write at bg offset 1M in the RAID0 data chunk would land at:
1) stripe_nr = bg_off / stripe_len
             = 1M / 64K = 16
2) Choose which dev extent to write into:
   stripe_index = (bg_off / stripe_len) % nr_dev
                = (1M / 64K) % 2 = 0
   Thus we choose dev stripe 0 (devid 2, offset 1351614464) of that
   data chunk.
3) Final physical offset:
   physical_off = dev_extent_off + offset_in_stripe + skipped_physical
                = dev_extent_off + (bg_off & stripe_mask) +
                  stripe_nr / nr_dev * stripe_len
                = 1351614464 + (1M & (64K - 1)) + 16 / 2 * 64K
                = 1351614464 + 0 + 8 * 64K
                = 1352138752
Remember, all of these calculations are the same as for regular RAID0.
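The three steps above translate almost directly into code. The Python
below is a sketch under the assumptions of this example only (a fixed
power-of-two stripe length, dev extents given as (devid, offset) pairs);
raid0_map() is a made-up name and not the kernel's btrfs_map_block().

```python
def raid0_map(bg_off, dev_extents, stripe_len=64 * 1024):
    """Map an offset inside a RAID0 block group to (devid, physical_off).

    dev_extents is a list of (devid, dev_extent_off) in chunk stripe order.
    stripe_len must be a power of two for the mask below to be valid.
    """
    nr_dev = len(dev_extents)
    stripe_nr = bg_off // stripe_len                  # step 1
    stripe_index = stripe_nr % nr_dev                 # step 2
    offset_in_stripe = bg_off & (stripe_len - 1)      # bg_off & stripe_mask
    skipped_physical = (stripe_nr // nr_dev) * stripe_len
    devid, dev_extent_off = dev_extents[stripe_index]
    return devid, dev_extent_off + offset_in_stripe + skipped_physical  # step 3


DATA_EXTENTS = [(2, 1351614464), (1, 1372585984)]

if __name__ == "__main__":
    print(raid0_map(1024 * 1024, DATA_EXTENTS))
```

Note that a write longer than what remains of the current 64K stripe
continues on the next device, so a caller would have to split the range
at stripe boundaries and map each piece separately.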
For RAID10, each RAID0 stripe is itself RAID1.
Just replace nr_dev above with (nr_dev / 2).
In btrfs we use sub_stripes to distinguish the two calculations:
RAID10 always has sub_stripes = 2, and RAID0 always has sub_stripes = 1.
Thus nr_dev above can be replaced with (nr_dev / sub_stripes), which
then covers both RAID0 and RAID10.
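A sketch of that generalisation in Python follows. Two things here are
assumptions of the example rather than statements from this mail: that
the copies of one RAID0 stripe are adjacent entries in the chunk's stripe
list, and the four-device RAID10 extents, which are made-up values.
striped_map() is a hypothetical name.

```python
def striped_map(bg_off, dev_extents, sub_stripes, stripe_len=64 * 1024):
    """Map a block group offset for RAID0 (sub_stripes=1) or RAID10 (=2).

    Returns a list of (devid, physical_off), one entry per copy.
    """
    groups = len(dev_extents) // sub_stripes          # nr_dev / sub_stripes
    stripe_nr = bg_off // stripe_len
    group = stripe_nr % groups
    offset_in_stripe = bg_off & (stripe_len - 1)
    skipped_physical = (stripe_nr // groups) * stripe_len
    # Assumption: copies of one group are adjacent entries in dev_extents.
    first = group * sub_stripes
    return [
        (devid, start + offset_in_stripe + skipped_physical)
        for devid, start in dev_extents[first:first + sub_stripes]
    ]


# RAID0 data chunk from the dump.
RAID0_EXTENTS = [(2, 1351614464), (1, 1372585984)]
# Made-up 4-device RAID10 chunk, every dev extent starting at 1 GiB.
RAID10_EXTENTS = [(1, 1 << 30), (2, 1 << 30), (3, 1 << 30), (4, 1 << 30)]

if __name__ == "__main__":
    print(striped_map(1 << 20, RAID0_EXTENTS, sub_stripes=1))
    print(striped_map(1 << 20, RAID10_EXTENTS, sub_stripes=2))
```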
>
> Intermixing "chunk" and "block group" does not help in understanding it either.
>
> And I suspect RAID5/6 is something entirely different ...
RAID5/6 is mostly based on RAID0, but with more rotation involved, and is
thus more complex.
I can explain RAID56 in more detail once you have grasped the RAID0,
RAID1 and RAID10 parts.
RAID0 behavior is shared between LVM striped, dm-raid0 and btrfs RAID0
(IIRC).
The same goes for RAID1, among LVM mirrored, dm-raid1 and btrfs RAID1.
Thanks,
Qu
>