From: Liu Bo <liubo2009@cn.fujitsu.com>
To: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Newbie questions on some of btrfs code...
Date: Mon, 21 May 2012 17:33:37 +0800
Message-ID: <4FBA0BF1.1000901@cn.fujitsu.com>
In-Reply-To: <CAHf9xvZ8+ZXFLRM5R_xjekQOsf2Xaj0UEANMVW5BBJ2YjjhGuw@mail.gmail.com>
On 05/21/2012 04:20 PM, Alex Lyakas wrote:
> Hi Liu,
> do you think that this should not happen? I see this all the time, and
> I am not doing any stress tests. Just creating a file and writing some
> data at different offsets, to create "holes" in the file offset space.
> btrfsck does not produce any errors.
I happen to know how it works :)
This comes from our COW feature: when we rewrite a file extent from its middle part,
we find another space for the new data and leave the original extent alone.
So for the following situation:
> item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 4096 ram 8192
> extent compression 0
In your case, after the first 'size 5' inline extent is written,
"nr 4096 < ram 8192" could come from:
1) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=12 count=4 conv=notrunc;sync
2) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=8 count=4 conv=notrunc;sync
1) makes
> item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 8192 ram 8192
> extent compression 0
2) makes
> item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 4096 ram 8192
> extent compression 0
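A rough model of the effect described above, as a sketch only: the struct and the trim_tail() helper are made up for illustration and are not the kernel's code. It shows why a write that cuts into the tail of an existing item shrinks "nr" while "ram" keeps the size of the original extent.

```c
#include <stdint.h>

/* Hypothetical mirror of the fields printed in the dump:
 * "extent data offset <offset> nr <num_bytes> ram <ram_bytes>". */
struct extent_view {
    uint64_t file_offset; /* key offset: where the item starts in the file */
    uint64_t offset;      /* offset into the original extent */
    uint64_t num_bytes;   /* bytes of the file this item still covers */
    uint64_t ram_bytes;   /* size of the original extent */
};

/* Keep only the part of *ext that lies before write_start.
 * Returns 1 if the item was trimmed, 0 if the write does not cut into it. */
static int trim_tail(struct extent_view *ext, uint64_t write_start)
{
    uint64_t end = ext->file_offset + ext->num_bytes;

    if (write_start <= ext->file_offset || write_start >= end)
        return 0;
    ext->num_bytes = write_start - ext->file_offset;
    /* ram_bytes is left alone: it still describes the original extent. */
    return 1;
}
```

Starting from the item left by 1) (file offset 4096, nr 8192, ram 8192), trim_tail() with write_start 8192 leaves nr 4096 and ram 8192, which is the item shown after 2).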
> I am using kernel 3.3.6 and btrfs-progrs compiled from
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git,
> as advised by wiki.
>
> For example, I have now the following file:
> item 20 key (266 INODE_ITEM 0) itemoff 2369 itemsize 160
> inode generation 64 size 200005 block group 0 mode 100644 links 1
> item 21 key (266 INODE_REF 256) itemoff 2348 itemsize 21
> inode ref index 10 namelen 11 name: sparse_file
> item 22 key (266 EXTENT_DATA 0) itemoff 2322 itemsize 26
> inline extent data size 5 ram 5 compress 0
> item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 4096 ram 8192
> extent compression 0
> item 24 key (266 EXTENT_DATA 8192) itemoff 2216 itemsize 53
> extent data disk byte 432013312 nr 4096
> extent data offset 0 nr 4096 ram 4096
> extent compression 0
> item 25 key (266 EXTENT_DATA 12288) itemoff 2163 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 86016 ram 90112
> extent compression 0
> item 26 key (266 EXTENT_DATA 98304) itemoff 2110 itemsize 53
> extent data disk byte 432017408 nr 4096
> extent data offset 0 nr 4096 ram 4096
> extent compression 0
> item 27 key (266 EXTENT_DATA 102400) itemoff 2057 itemsize 53
> extent data disk byte 0 nr 0
> extent data offset 0 nr 94208 ram 98304
> extent compression 0
> item 28 key (266 EXTENT_DATA 196608) itemoff 2004 itemsize 53
> extent data disk byte 432021504 nr 4096
> extent data offset 0 nr 4096 ram 4096
> extent compression 0
>
> Some observations for it:
> # There is a real "hole" between first two extents, because the length
> of first extent is 5 bytes, but second extent starts at offset 4096.
> Is this expected? I see this all the time.
Yup, our extents are sectorsize-aligned (4096 here).
> # There are several extents with
> btrfs_file_extent_item::disk_bytenr==0. According to some hints within
> the kernel btrfs code, I presume that these are zero-extents. So when
> I see disk_bytenr==0, I should not try looking up this extent in
> extent tree or in chunk tree, I should assume that this extent should
> be filled by zeros. Is my understanding correct?
'disk_bytenr == 0' means a dummy extent (a hole), which has no data.
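As a small illustration, the check can be applied to items 23..28 of inode 266 from the dump above. The reg_extent struct and both helpers are hypothetical names for this sketch; only the disk_bytenr and num_bytes values come from the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced view of a regular (non-inline) file extent item. */
struct reg_extent {
    uint64_t disk_bytenr; /* 0 means there is no data on disk */
    uint64_t num_bytes;   /* bytes of the file covered by this item */
};

static int is_hole(const struct reg_extent *e)
{
    return e->disk_bytenr == 0;
}

/* Sum the file bytes covered by dummy extents. */
static uint64_t hole_bytes(const struct reg_extent *items, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        if (is_hole(&items[i]))
            total += items[i].num_bytes;
    return total;
}

/* Items 23..28 of inode 266, copied from the dump. */
static const struct reg_extent inode_266[] = {
    { 0,         4096  },
    { 432013312, 4096  },
    { 0,         86016 },
    { 432017408, 4096  },
    { 0,         94208 },
    { 432021504, 4096  },
};
```

For this file the dummy extents cover 4096 + 86016 + 94208 = 184320 bytes, and only the three 4096-byte extents point at real data.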
> # The last extent has offset=196608 and size=4096. Adding them up
> gives 200704. However, the file size within INODE_ITEM is 200005. So
> this is the issue you asked about.
>
Given the sectorsize alignment, the file size in INODE_ITEM is the correct one, 200005 here.
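The arithmetic can be checked with a small helper. This is a sketch that assumes sectorsize is a power of two; align_up() is a name made up here. Rounding the inode size up to the sector size gives exactly the end of the last extent.

```c
#include <stdint.h>

/* Round size up to the next multiple of sectorsize (a power of two). */
static uint64_t align_up(uint64_t size, uint64_t sectorsize)
{
    return (size + sectorsize - 1) & ~(sectorsize - 1);
}
```

With a 4096-byte sectorsize, align_up(200005, 4096) is 200704, which is 196608 + 4096. The same holds for inode 270 quoted further down: align_up(1593360, 4096) is 1597440, which is 1470464 + 126976.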
> I have some more pesky questions, which hopefully you or some other
> devs can help with. Or at least point me at a relevant code to look
> at.
>
> # What is BTRFS_FILE_EXTENT_PREALLOC? How should I treat
> btrfs_file_extent_item of such type?
>
IIRC, PREALLOC comes from fallocate or something like that: we allocate the
space in advance and use it later.
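For a reader of the fs tree, one way to treat the three extent types might look like the sketch below. The names are made up for illustration; the numeric values are meant to mirror BTRFS_FILE_EXTENT_INLINE, _REG and _PREALLOC (0, 1, 2). Treating PREALLOC as zeros is my understanding of how fallocate'd but not yet written ranges read back, not something stated in this reply.

```c
#include <stdint.h>

/* Assumed to match BTRFS_FILE_EXTENT_INLINE / _REG / _PREALLOC. */
enum { FILE_EXTENT_INLINE = 0, FILE_EXTENT_REG = 1, FILE_EXTENT_PREALLOC = 2 };

/* Where the bytes of an extent should be taken from. */
enum data_source { SRC_ITEM, SRC_DISK, SRC_ZEROS };

static enum data_source extent_source(uint8_t type, uint64_t disk_bytenr)
{
    if (type == FILE_EXTENT_INLINE)
        return SRC_ITEM;   /* data is stored inside the item itself */
    if (type == FILE_EXTENT_PREALLOC)
        return SRC_ZEROS;  /* space is reserved but not written yet */
    /* Regular extent: disk_bytenr == 0 is a dummy extent with no data. */
    return disk_bytenr == 0 ? SRC_ZEROS : SRC_DISK;
}
```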
> # Why btrfs_previous_item() in btrfs-progs in different from kernel
> code? In kernel code, there are additional checks like this:
> nritems = btrfs_header_nritems(leaf);
> if (nritems == 0)
> return 1;
> if (path->slots[0] == nritems)
> path->slots[0]--;
>
The kernel side is just more careful; it's OK.
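To make the quoted checks concrete, here is the same logic on plain integers. This is only a model: fixup_slot() is a hypothetical name, and the real code works on a btrfs_path and a leaf buffer. The point is that a slot equal to nritems is one past the last item, so it has to be pulled back before the item is read.

```c
/* Returns 1 when the leaf is empty (there is no item to look at).
 * Otherwise returns 0, and if *slot was one past the last item it is
 * moved back onto the last item. */
static int fixup_slot(int nritems, int *slot)
{
    if (nritems == 0)
        return 1;
    if (*slot == nritems)
        (*slot)--;
    return 0;
}
```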
> # What is the btrfs_dir_item::data_len value is used for? I saw it
> appearing in XATTR_ITEM, but not in DIR_INDEX/DIR_ITEM
>
data_len is xattr-related; please check the source code: btrfs_set_acl()
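A sketch of how data_len fits into the item layout, under the assumption that the fixed btrfs_dir_item header is 30 bytes and is followed by the name and then by data_len bytes of xattr value. The struct and function names below are made up for illustration.

```c
#include <stdint.h>

/* Assumed size of the fixed btrfs_dir_item header:
 * location key (17) + transid (8) + data_len (2) + name_len (2) + type (1). */
#define DIR_ITEM_HDR_SIZE 30u

struct dir_item_layout {
    uint32_t name_off; /* the name starts right after the header */
    uint32_t data_off; /* the xattr value starts right after the name */
    uint32_t total;    /* bytes used by this entry */
};

static struct dir_item_layout layout_dir_item(uint16_t name_len, uint16_t data_len)
{
    struct dir_item_layout l;

    l.name_off = DIR_ITEM_HDR_SIZE;
    l.data_off = l.name_off + name_len;
    l.total = l.data_off + data_len;
    return l;
}
```

For a DIR_ITEM or DIR_INDEX entry data_len is 0, so data_off equals total and nothing follows the name. For an XATTR_ITEM the value occupies the last data_len bytes.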
thanks,
liubo
> Thanks!
> Alex.
>
> On Mon, May 21, 2012 at 4:59 AM, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>> On 05/18/2012 09:32 PM, Alex Lyakas wrote:
>>
>>> Thank you, Hugo, for the detailed explanation. I am now able to find
>>> the CHUNK_ITEMs and to successfully locate the file data on disk.
>>> Can you maybe address several follow-up questions I have?
>>>
>>> # When looking for CHUNK_ITEMs, should I check that their
>>> btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA (and not SYSTEM/METADATA
>>> etc)? Or file extent should always be mapped to BTRFS_BLOCK_GROUP_DATA
>>> chunk?
>>>
>>> # It looks like I don't even need to bother with the extent tree at
>>> this point, because from EXTENT_DATA in fs tree I can navigate
>>> directly to CHUNK_ITEM in chunk tree, correct?
>>>
>>> # For replicating RAID levels, you said there will be multiple
>>> CHUNK_ITEMs. How do I find them then? Should I know in advance how
>>> much there should be, and look for them, considering only
>>> btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA? (I don't bother for
>>> replication at this point, though).
>>>
>>> # If I find in the fs tree an EXTENT_DATA of type
>>> BTRFS_FILE_EXTENT_PREALLOC, how should I treat it? What does it mean?
>>> (BTRFS_FILE_EXTENT_INLINE are easy to treat).
>>>
>>> # One of my files has two EXTENT_DATAs, like this:
>>> item 14 key (270 EXTENT_DATA 0) itemoff 1812 itemsize 53
>>> extent data disk byte 432508928 nr 1474560
>>> extent data offset 0 nr 1470464 ram 1474560
>>> extent compression 0
>>> item 15 key (270 EXTENT_DATA 1470464) itemoff 1759 itemsize 53
>>> extent data disk byte 432082944 nr 126976
>>> extent data offset 0 nr 126976 ram 126976
>>> extent compression 0
>>> Summing btrfs_file_extent_item::num_bytes gives
>>> 1470464+126976=1597440. (I know that I should not be summing
>>> btrfs_file_extent_item::disk_num_bytes, but num_bytes).
>>> However, it's INODE_ITEM gives size of 1593360, which is less:
>>> item 11 key (270 INODE_ITEM 0) itemoff 1970 itemsize 160
>>> inode generation 26 size 1593360 block group 0 mode 100700 links 1
>>>
>>> Is this a valid situation, or I should always consider size in
>>> INODE_ITEM as the correct one?
>>>
>>
>> Hi Alex,
>>
>> Have you tried btrfsck on this 'inode size mismatch' box?
>>
>> And I'm interest in if it can be reproduced and how?
>>
>>
>> thanks,
>> liubo
>>
>>> Thanks again,
>>> Alex.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>