linux-btrfs.vger.kernel.org archive mirror
From: Zach Brown <zach.brown@oracle.com>
To: Sage Weil <sage@newdream.net>
Cc: Chris Mason <chris.mason@oracle.com>,
	btrfs-devel@oss.oracle.com, linux-btrfs@vger.kernel.org
Subject: Re: [Btrfs-devel] cloning file data
Date: Fri, 25 Apr 2008 09:50:36 -0700
Message-ID: <48120BDC.9000900@oracle.com>
In-Reply-To: <200804250941.35343.chris.mason@oracle.com>


> We've written into the middle of that 100MB extent, and we need to do COW.  
> One option is to read the whole thing, change 4k and write it all back.  
> Instead, btrfs does something like this (+/- off by need more coffee errors):
> 
> file pos = 0 -> [ old extent, offset = 0, num_bytes = 400k ]
> file pos = 409600 -> [ new 4k extent, offset = 0, num_bytes = 4k ]
> file pos = 413696 -> [ old extent, offset = 413696, num_bytes = 100MB - 404k]
> 
> An extra reference is taken on the old extent to reflect that we're pointing 
> to it twice.

Once you learn how to parse the debug-tree output, this is pretty easy
to see.  We can watch the leaves of the fs tree for the inode and extent
items of the file we're working with:

# dd if=/dev/zero bs=1M count=1k of=/tmp/image
# losetup /dev/loop0 /tmp/image
# ./mkfs.btrfs /dev/loop0
# mount -t btrfs /dev/loop0 /mnt/btrfs

# dd if=/dev/zero bs=64M count=1 of=/mnt/btrfs/test
# sync

# ./debug-tree /tmp/image

	item 5 key (256 11 258) itemoff 3779 itemsize 26
		dir index 258 type 1
		namelen 4 datalen 0 name: test
	[...]
	item 1 key (258 1 0) itemoff 2699 itemsize 108
		inode generation 0 size 67108864 [...]
	[...]
	item 3 key (258 12 0) itemoff 2652 itemsize 41
		extent data disk byte 190382080 nr 67108864
		extent data offset 0 nr 67108864

In the root directory we find a dirent for our test file, which shows
that it has objectid 258.  Then we find its inode with size=64m, and a
file extent item that references the 64m extent on disk starting at byte
offset 190382080.
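
For anyone who would rather script this than eyeball it, here is a
minimal sketch that pulls the keys out of output shaped like the dump
above.  It assumes only the "item N key (objectid type offset)" layout
visible in this message; parse_keys and items_for are made-up helper
names, and the sample text is trimmed from the dump above.

```python
import re

# Sample lines copied (and trimmed) from the debug-tree output above.
SAMPLE = """\
    item 5 key (256 11 258) itemoff 3779 itemsize 26
        dir index 258 type 1
        namelen 4 datalen 0 name: test
    item 1 key (258 1 0) itemoff 2699 itemsize 108
        inode generation 0 size 67108864
    item 3 key (258 12 0) itemoff 2652 itemsize 41
        extent data disk byte 190382080 nr 67108864
        extent data offset 0 nr 67108864
"""

# Matches the "item N key (objectid type offset)" header of each item.
KEY_RE = re.compile(r"item (\d+) key \((\d+) (\d+) (\d+)\)")


def parse_keys(text):
    """Return an (objectid, type, offset) tuple for every item header."""
    keys = []
    for line in text.splitlines():
        m = KEY_RE.search(line)
        if m:
            # Group 1 is the slot number in the leaf; the key is groups 2-4.
            keys.append(tuple(int(g) for g in m.groups()[1:]))
    return keys


def items_for(text, objectid):
    """Keep only the keys that belong to one objectid (hypothetical helper)."""
    return [k for k in parse_keys(text) if k[0] == objectid]


for key in items_for(SAMPLE, 258):
    print(key)
```

Filtering on objectid 258 leaves just the inode item and the file extent
item for our test file, which are the two things worth watching.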

So now we overwrite a 4k region in the file at offset 64k.

# dd if=/dev/zero bs=4k count=1 seek=16 of=/mnt/btrfs/test conv=notrunc
# sync

# ./debug-tree /tmp/image

	item 1 key (258 1 0) itemoff 2699 itemsize 108
		inode generation 0 size 67108864 [...]
	[...]
	item 3 key (258 12 0) itemoff 2652 itemsize 41
		extent data disk byte 190382080 nr 67108864
		extent data offset 0 nr 65536
	item 4 key (258 12 65536) itemoff 2611 itemsize 41
		extent data disk byte 257490944 nr 4096
		extent data offset 0 nr 4096
	item 5 key (258 12 69632) itemoff 2570 itemsize 41
		extent data disk byte 190382080 nr 67108864
		extent data offset 69632 nr 67039232

We still have the same inode, and it has the same size, but its extent
items look very different.  The first extent item still references the
old 64m extent on disk, but note the 'nr 65536': it now maps only the
first 64k of that 64m into the file.  Then we have the 4k extent that we
just wrote.  Finally, we have another reference to the old 64m extent,
at offset 69632, covering the remaining data after the new 4k.
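
The arithmetic behind those three items can be sketched in a few lines.
This is only a toy model of the bookkeeping, not the kernel code:
FileExtent and cow_split are names invented for illustration, and the
new extent's disk byte is simply copied from the dump above since the
allocator picks it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileExtent:
    file_pos: int   # key offset: where this item starts in the file
    disk_byte: int  # start of the referenced extent on disk
    disk_nr: int    # length of the whole extent on disk
    offset: int     # where this item's data starts within the disk extent
    nr: int         # how many bytes of it are mapped into the file


def cow_split(old, write_pos, write_len, new_disk_byte):
    """Split `old` around a write that lies strictly inside it.

    Returns head, new and tail items.  Head and tail both keep pointing
    at the old disk extent; only offset and nr change.
    """
    assert old.file_pos < write_pos
    assert write_pos + write_len < old.file_pos + old.nr
    head_nr = write_pos - old.file_pos
    tail_pos = write_pos + write_len
    tail_skip = tail_pos - old.file_pos  # bytes of `old` before the tail
    return [
        FileExtent(old.file_pos, old.disk_byte, old.disk_nr,
                   old.offset, head_nr),
        FileExtent(write_pos, new_disk_byte, write_len, 0, write_len),
        FileExtent(tail_pos, old.disk_byte, old.disk_nr,
                   old.offset + tail_skip, old.nr - tail_skip),
    ]


# The 64m extent from the first dump, then the dd with bs=4k seek=16.
old = FileExtent(0, 190382080, 67108864, 0, 67108864)
items = cow_split(old, write_pos=16 * 4096, write_len=4096,
                  new_disk_byte=257490944)
for it in items:
    print(it)
```

The three results line up with items 3, 4 and 5 in the second dump:
nr 65536, then the 4k extent at file offset 65536, then offset 69632
with nr 67039232.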

The extra credit assignment is to observe the effect of these extent
reference item changes on the reference count items which are stored
over in the leaves of the extent allocation tree.
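
As a rough starting point, you can at least tally how many file extent
items point at each disk extent.  Note the assumption: this counts only
what is visible in the fs tree dump above.  The actual reference count
items in the extent allocation tree are not shown in this message and
are not modeled here, and count_references is a made-up name.

```python
from collections import Counter

# (file_pos, disk_byte, disk_nr) for each file extent item of objectid
# 258, copied from the debug-tree output taken after the 4k overwrite.
after_write = [
    (0, 190382080, 67108864),
    (65536, 257490944, 4096),
    (69632, 190382080, 67108864),
]


def count_references(file_extents):
    """Count how many file extent items point at each disk extent."""
    return Counter((disk_byte, disk_nr)
                   for _file_pos, disk_byte, disk_nr in file_extents)


refs = count_references(after_write)
for (disk_byte, disk_nr), count in sorted(refs.items()):
    print(f"disk byte {disk_byte} nr {disk_nr}: {count} file extent item(s)")
```

The old 64m extent shows up twice and the new 4k extent once, which is
the "pointing to it twice" case Chris describes above.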

debug-tree is fantastic, but it can be kind of intimidating if you don't
already know what all the numbers mean :).  Reducing the barrier to
understanding its output might be a great project for someone interested
in learning the disk format without having to learn how to work with the
kernel code.

- z
