Linux Btrfs filesystem development
From: Wade Cline <clinew@linux.vnet.ibm.com>
To: Alex Lyakas <alex.btrfs@zadarastorage.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs seems to do COW while inode has NODATACOW set
Date: Thu, 25 Oct 2012 13:52:00 -0700
Message-ID: <5089A670.7080108@linux.vnet.ibm.com>
In-Reply-To: <CAOcd+r2U9XjvjS6bZPNXkQh1JGJxohmHyAb5CAQNySwCyV1TzA@mail.gmail.com>

On 10/25/2012 12:09 PM, Alex Lyakas wrote:

> Wade, thanks.
>
> Yes, with the preallocated extent I saw the behavior you describe, and
> it makes perfect sense to alloc a new EXTENT_DATA in this case.
> In my case, I did another simple test:
>
> Before:
> 	item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
> 		inode generation 5 transid 5 size 5368709120 nbytes 5368709120 owner[0:0] mode 100644
> 		inode blockgroup 0 nlink 1 flags 0x3 seq 0
> 	item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
> 		inode ref index 2 namelen 5 name: vol-1
> 	item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
> 		extent data disk byte 5368709120 nr 131072
> 		extent data offset 0 nr 131072 ram 131072
> 		extent compression 0
> 	item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
> 		extent data disk byte 5905842176 nr 33423360
> 		extent data offset 0 nr 33423360 ram 33423360
> 		extent compression 0
>                  ...
>
> I am going to do a single write of a 4 KiB block into the (257 EXTENT_DATA
> 131072) extent:
>
> dd if=/dev/urandom of=/mnt/src/subvol-1/vol-1 bs=4096 seek=32 count=1 conv=notrunc
>
> After:
> 	item 4 key (257 INODE_ITEM 0) itemoff 3593 itemsize 160
> 		inode generation 5 transid 21 size 5368709120 nbytes 5368709120 owner[0:0] mode 100644
> 		inode blockgroup 0 nlink 1 flags 0x3 seq 1
> 	item 5 key (257 INODE_REF 256) itemoff 3578 itemsize 15
> 		inode ref index 2 namelen 5 name: vol-1
> 	item 6 key (257 EXTENT_DATA 0) itemoff 3525 itemsize 53
> 		extent data disk byte 5368709120 nr 131072
> 		extent data offset 0 nr 131072 ram 131072
> 		extent compression 0
> 	item 7 key (257 EXTENT_DATA 131072) itemoff 3472 itemsize 53
> 		extent data disk byte 5368840192 nr 4096
> 		extent data offset 0 nr 4096 ram 4096
> 		extent compression 0
> 	item 8 key (257 EXTENT_DATA 135168) itemoff 3419 itemsize 53
> 		extent data disk byte 5905842176 nr 33423360
> 		extent data offset 4096 nr 33419264 ram 33423360
> 		extent compression 0
>
> We clearly see that a new extent has been allocated for some reason
> (bytenr=5368840192), and previous extent (bytenr=5905842176) is still
> there, but used at offset of 4096. This is exactly cow, I believe.
Hmm, I'm pretty sure that using 'dd' in this fashion skips the first 32 4096-byte
blocks and thus writes -past- the end of the first, 131072-byte extent (i.e., it
writes file offsets 131072 through 135167). This causes a new extent to be
allocated after the previous extent.
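
To double-check the arithmetic, here is a minimal sketch in Python. The helper
name dd_write_range is made up for illustration, and it assumes dd's 'seek'
counts output blocks of size 'bs', with zero-based file offsets as in the
EXTENT_DATA keys:

```python
def dd_write_range(bs, seek, count):
    """Return (first, last) zero-based file offsets that dd overwrites.

    Assumes 'seek' is counted in output blocks of 'bs' bytes.
    """
    start = bs * seek
    end = start + bs * count - 1
    return start, end


# The command quoted above: bs=4096 seek=32 count=1
print(dd_write_range(4096, 32, 1))  # (131072, 135167)

# For comparison, seek=31 would cover the last 4096 bytes of the first extent
print(dd_write_range(4096, 31, 1))  # (126976, 131071)
```

So with seek=32 the write begins at the first byte after the 131072-byte extent
at file offset 0.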

But even if using 'dd' with a 'seek' value of '31' created a new EXTENT_DATA, it
would not necessarily be data CoW, since data CoW refers only to the location of
the -data- (i.e., not metadata and thus not EXTENT_DATA) on disk. The key thing
is to look at where the EXTENT_DATAs are pointing to, not how many EXTENT_DATAs
there are.
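
One way to do that comparison by hand is sketched below in Python. This is only
an illustration: the tuples are copied from the EXTENT_DATA items quoted above
(the items elided by "..." are not included), the helper name disk_location is
made up, and it assumes no compression, so that a file offset maps to
"disk byte" + "extent data offset" + the distance into the item.

```python
# Each tuple: (file_offset, disk_bytenr, extent_offset, num_bytes),
# taken from the "Before" and "After" dumps quoted in this thread.
BEFORE = [
    (0, 5368709120, 0, 131072),
    (131072, 5905842176, 0, 33423360),
]
AFTER = [
    (0, 5368709120, 0, 131072),
    (131072, 5368840192, 0, 4096),
    (135168, 5905842176, 4096, 33419264),
]


def disk_location(extents, file_off):
    """Map a file offset to its on-disk byte address, or None if unmapped."""
    for start, disk_bytenr, extent_off, length in extents:
        if start <= file_off < start + length:
            return disk_bytenr + extent_off + (file_off - start)
    return None


# Print where a few file offsets live on disk before and after the write.
for off in (0, 131072, 135168):
    print(off, disk_location(BEFORE, off), disk_location(AFTER, off))
```

For the items listed here, offsets 0 and 135168 resolve to the same disk address
before and after the write, while offset 131072 resolves to a different one.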

> However, your hint about not being able to read into memory may be
> useful; it would be good if we can find the place in the code that
> makes that decision to cow.
Try looking at the callers of btrfs_cow_block(), but you'll be on your own from
there :)

> I guess I am looking for a way to never ever allocate new EXTENT_DATAs
> on a fully-mapped file. Is there one?
Hmm, I don't think that this exists right now. You could try a '-o autodefrag' to
minimize the number of EXTENT_DATAs, though.

Regards,
Wade

>
> Thanks!
> Alex.

