From: Liu Bo <liubo2009@cn.fujitsu.com>
To: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Newbie questions on some of btrfs code...
Date: Mon, 21 May 2012 17:33:37 +0800
Message-ID: <4FBA0BF1.1000901@cn.fujitsu.com>
In-Reply-To: <CAHf9xvZ8+ZXFLRM5R_xjekQOsf2Xaj0UEANMVW5BBJ2YjjhGuw@mail.gmail.com>

On 05/21/2012 04:20 PM, Alex Lyakas wrote:

> Hi Liu,
> do you think that this should not happen? I see this all the time, and
> I am not doing any stress tests. Just creating a file and writing some
> data at different offsets, to create "holes" in the file offset space.
> btrfsck does not produce any errors.


I happen to know how it works :)

This comes from our COW feature: when we rewrite part of a file extent, e.g. its middle,
we find new space for the new data and leave the original extent alone.

So for the following situation:
> 	item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 4096 ram 8192
> 		extent compression 0

In your case, after the first 'size 5' inline extent is written,
"nr 4096 < ram 8192" could come from the following two steps:
1) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=12 count=4 conv=notrunc;sync
2) dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k seek=8 count=4 conv=notrunc;sync

1) makes
> 	item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 8192 ram 8192
> 		extent compression 0

2) makes
> 	item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 4096 ram 8192
> 		extent compression 0
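
The trimming seen in step 2) can be sketched in C as follows. This is only a
minimal model, not the kernel's code: `struct file_extent` and `trim_tail()`
are hypothetical names, and only the case where a new write covers the tail of
an existing extent is handled. It shows why "nr" shrinks while "ram" keeps the
length the extent had when it was first created.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of one EXTENT_DATA item. */
struct file_extent {
	uint64_t file_offset; /* key offset: where the item starts in the file */
	uint64_t num_bytes;   /* "nr": bytes of the file this item still covers */
	uint64_t ram_bytes;   /* "ram": length of the extent as first created */
};

/*
 * Model a COW write that starts at write_start and overwrites the tail of *e.
 * The item is shortened so that it ends at write_start; ram_bytes is left
 * untouched because the original extent itself is not rewritten.
 */
static void trim_tail(struct file_extent *e, uint64_t write_start)
{
	assert(write_start > e->file_offset);
	assert(write_start < e->file_offset + e->num_bytes);
	e->num_bytes = write_start - e->file_offset;
}
```

Starting from the item left by step 1) (file offset 4096, nr 8192, ram 8192),
calling `trim_tail(&e, 8192)` leaves nr 4096 and ram 8192, which matches the
dump after step 2).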

> I am using kernel 3.3.6 and btrfs-progs compiled from
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git,
> as advised by wiki.
> 
> For example, I have now the following file:
> 	item 20 key (266 INODE_ITEM 0) itemoff 2369 itemsize 160
> 		inode generation 64 size 200005 block group 0 mode 100644 links 1
> 	item 21 key (266 INODE_REF 256) itemoff 2348 itemsize 21
> 		inode ref index 10 namelen 11 name: sparse_file
> 	item 22 key (266 EXTENT_DATA 0) itemoff 2322 itemsize 26
> 		inline extent data size 5 ram 5 compress 0
> 	item 23 key (266 EXTENT_DATA 4096) itemoff 2269 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 4096 ram 8192
> 		extent compression 0
> 	item 24 key (266 EXTENT_DATA 8192) itemoff 2216 itemsize 53
> 		extent data disk byte 432013312 nr 4096
> 		extent data offset 0 nr 4096 ram 4096
> 		extent compression 0
> 	item 25 key (266 EXTENT_DATA 12288) itemoff 2163 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 86016 ram 90112
> 		extent compression 0
> 	item 26 key (266 EXTENT_DATA 98304) itemoff 2110 itemsize 53
> 		extent data disk byte 432017408 nr 4096
> 		extent data offset 0 nr 4096 ram 4096
> 		extent compression 0
> 	item 27 key (266 EXTENT_DATA 102400) itemoff 2057 itemsize 53
> 		extent data disk byte 0 nr 0
> 		extent data offset 0 nr 94208 ram 98304
> 		extent compression 0
> 	item 28 key (266 EXTENT_DATA 196608) itemoff 2004 itemsize 53
> 		extent data disk byte 432021504 nr 4096
> 		extent data offset 0 nr 4096 ram 4096
> 		extent compression 0
> 
> Some observations for it:
> # There is a real "hole" between first two extents, because the length
> of first extent is 5 bytes, but second extent starts at offset 4096.
> Is this expected? I see this all the time.


Yup, our extents are aligned to the sectorsize, e.g. 4096.
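
As a small illustration of that alignment, here is a sketch of the usual
round-up computation. `align_up()` is a hypothetical helper name, and a
power-of-two sectorsize is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of sectorsize (must be a power of two). */
static uint64_t align_up(uint64_t n, uint64_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return (n + sectorsize - 1) & ~(sectorsize - 1);
}
```

With a sectorsize of 4096, `align_up(5, 4096)` gives 4096, which is where the
extent following the 5-byte inline extent starts in the dump above.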


> # There are several extents with
> btrfs_file_extent_item::disk_bytenr==0. According to some hints within
> the kernel btrfs code, I presume that these are zero-extents. So when
> I see disk_bytenr==0, I should not try looking up this extent in
> extent tree or in chunk tree, I should assume that this extent should
> be filled by zeros. Is my understanding correct?


'disk_bytenr == 0' means a dummy (hole) extent, which has no data on disk.
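
For a reader of the on-disk format, the check could look like the sketch
below. `struct reg_extent` and `fill_if_hole()` are hypothetical names, and
the zero-fill follows the presumption in the quoted question (a hole in a
sparse file reads back as zeros); it is not taken from the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the fields of a regular file extent item. */
struct reg_extent {
	uint64_t disk_bytenr; /* 0 means there is nothing on disk to look up */
	uint64_t num_bytes;   /* bytes of the file covered by this item */
};

static int extent_is_hole(const struct reg_extent *e)
{
	return e->disk_bytenr == 0;
}

/*
 * If *e is a hole, fill buf with len zero bytes and return 1.
 * Otherwise return 0: the caller has to map disk_bytenr through the
 * chunk tree and read the data from disk.
 */
static int fill_if_hole(const struct reg_extent *e, unsigned char *buf, size_t len)
{
	assert(len <= e->num_bytes);
	if (!extent_is_hole(e))
		return 0;
	memset(buf, 0, len);
	return 1;
}
```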


> # The last extent has offset=196608 and size=4096. Adding them up
> gives 200704. However, the file size within INODE_ITEM is 200005. So
> this is the issue you asked about.
> 


Because extents are sectorsize-aligned, the last extent can extend past the end of the file;
the file size in INODE_ITEM is the correct one, 200005 here.
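
In other words, when reading the last extent, only the part below the inode
size is file data. A minimal sketch, with `valid_bytes()` being a hypothetical
helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes of an extent item that lie inside the file, given the
 * item's file offset, its num_bytes ("nr") and the size from INODE_ITEM.
 */
static uint64_t valid_bytes(uint64_t file_offset, uint64_t num_bytes, uint64_t i_size)
{
	uint64_t end;

	assert(num_bytes <= UINT64_MAX - file_offset); /* no overflow */
	end = file_offset + num_bytes;
	if (file_offset >= i_size)
		return 0;
	return (end > i_size ? i_size : end) - file_offset;
}
```

For the last extent in the dump, `valid_bytes(196608, 4096, 200005)` is 3397:
the remaining bytes of the 4096-byte extent are padding beyond EOF.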


> I have some more pesky questions, which hopefully you or some other
> devs can help with. Or at least point me at a relevant code to look
> at.
> 
> # What is BTRFS_FILE_EXTENT_PREALLOC? How should I treat
> btrfs_file_extent_item of such type?
> 


IIRC, PREALLOC comes from fallocate or something like that: the space is allocated
in advance and will be used in the future.
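
One way a reader could dispatch on the extent type is sketched below. The
numeric values mirror BTRFS_FILE_EXTENT_INLINE/REG/PREALLOC from the btrfs
headers; `enum read_source` and `classify_extent()` are hypothetical names.
Treating PREALLOC as zeros is an assumption that goes beyond what is said
above: it reflects the usual fallocate semantics, where preallocated but
unwritten space reads back as zeros.

```c
#include <stdint.h>

/* Values of btrfs_file_extent_item::type. */
enum extent_type {
	FILE_EXTENT_INLINE   = 0, /* BTRFS_FILE_EXTENT_INLINE */
	FILE_EXTENT_REG      = 1, /* BTRFS_FILE_EXTENT_REG */
	FILE_EXTENT_PREALLOC = 2  /* BTRFS_FILE_EXTENT_PREALLOC */
};

/* Hypothetical: where a reader should take the file data from. */
enum read_source { READ_FROM_ITEM, READ_FROM_DISK, READ_ZEROS, READ_UNKNOWN };

static enum read_source classify_extent(int type, uint64_t disk_bytenr)
{
	switch (type) {
	case FILE_EXTENT_INLINE:
		return READ_FROM_ITEM;  /* data is stored inside the item */
	case FILE_EXTENT_REG:
		/* disk_bytenr == 0 is a hole, nothing to look up on disk */
		return disk_bytenr == 0 ? READ_ZEROS : READ_FROM_DISK;
	case FILE_EXTENT_PREALLOC:
		return READ_ZEROS;      /* assumption: reserved, not yet written */
	default:
		return READ_UNKNOWN;
	}
}
```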


> # Why is btrfs_previous_item() in btrfs-progs different from kernel
> code? In kernel code, there are additional checks like this:
> 		nritems = btrfs_header_nritems(leaf);
> 		if (nritems == 0)
> 			return 1;
> 		if (path->slots[0] == nritems)
> 			path->slots[0]--;
> 


The kernel side is just more careful; the difference is OK.


> # What is the btrfs_dir_item::data_len value used for? I saw it
> appearing in XATTR_ITEM, but not in DIR_INDEX/DIR_ITEM
> 


data_len is xattr-related; please check the source code, e.g. btrfs_set_acl().


thanks,
liubo

> Thanks!
> Alex.
> 
> 
> 
> 
> 
> On Mon, May 21, 2012 at 4:59 AM, Liu Bo <liubo2009@cn.fujitsu.com> wrote:
>> On 05/18/2012 09:32 PM, Alex Lyakas wrote:
>>
>>> Thank you, Hugo, for the detailed explanation. I am now able to find
>>> the CHUNK_ITEMs and to successfully locate the file data on disk.
>>> Can you maybe address several follow-up questions I have?
>>>
>>> # When looking for CHUNK_ITEMs, should I check that their
>>> btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA (and not SYSTEM/METADATA
>>> etc)? Or file extent should always be mapped to BTRFS_BLOCK_GROUP_DATA
>>> chunk?
>>>
>>> # It looks like I don't even need to bother with the extent tree at
>>> this point, because from EXTENT_DATA in fs tree I can navigate
>>> directly to CHUNK_ITEM in chunk tree, correct?
>>>
>>> # For replicating RAID levels, you said there will be multiple
>>> CHUNK_ITEMs. How do I find them then? Should I know in advance how
>>> much there should be, and look for them, considering only
>>> btrfs_chunk::type==BTRFS_BLOCK_GROUP_DATA? (I don't bother for
>>> replication at this point, though).
>>>
>>> # If I find in the fs tree an EXTENT_DATA of type
>>> BTRFS_FILE_EXTENT_PREALLOC, how should I treat it? What does it mean?
>>> (BTRFS_FILE_EXTENT_INLINE are easy to treat).
>>>
>>> # One of my files has two EXTENT_DATAs, like this:
>>>       item 14 key (270 EXTENT_DATA 0) itemoff 1812 itemsize 53
>>>               extent data disk byte 432508928 nr 1474560
>>>               extent data offset 0 nr 1470464 ram 1474560
>>>               extent compression 0
>>>       item 15 key (270 EXTENT_DATA 1470464) itemoff 1759 itemsize 53
>>>               extent data disk byte 432082944 nr 126976
>>>               extent data offset 0 nr 126976 ram 126976
>>>               extent compression 0
>>> Summing btrfs_file_extent_item::num_bytes gives
>>> 1470464+126976=1597440. (I know that I should not be summing
>>> btrfs_file_extent_item::disk_num_bytes, but num_bytes).
>>> However, its INODE_ITEM gives size of 1593360, which is less:
>>>       item 11 key (270 INODE_ITEM 0) itemoff 1970 itemsize 160
>>>               inode generation 26 size 1593360 block group 0 mode 100700 links 1
>>>
>>> Is this a valid situation, or I should always consider size in
>>> INODE_ITEM as the correct one?
>>>
>>
>> Hi Alex,
>>
>> Have you tried btrfsck on this 'inode size mismatch' box?
>>
>> And I'm interested in whether it can be reproduced, and how.
>>
>>
>> thanks,
>> liubo
>>
>>> Thanks again,
>>> Alex.
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 


