Linux Btrfs filesystem development
From: Mike Power <dodtsair@gmail.com>
To: cwillu <cwillu@cwillu.com>
Cc: Hugo Mills <hugo@carfax.org.uk>, linux-btrfs@vger.kernel.org
Subject: Re: copy on write misconception
Date: Sat, 23 Feb 2013 10:30:14 -0800
Message-ID: <51290AB6.3090709@alumni.calpoly.edu>
In-Reply-To: <CAE5mzvhLSL1DQEGK6gnm+9Hgg5Umu33Tx3m5LkHdjqq+_pUYpg@mail.gmail.com>

On 02/22/2013 10:35 AM, cwillu wrote:
> On Fri, Feb 22, 2013 at 11:41 AM, Mike Power <dodtsair@gmail.com> wrote:
>> On 02/22/2013 09:16 AM, Hugo Mills wrote:
>>> On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
>>>> I think I have a misconception of what copy on write in btrfs means
>>>> for individual files.
>>>>
>>>> I had originally thought that I could create a large file:
>>>> time dd if=/dev/zero of=10G bs=1G count=10
>>>> 10+0 records in
>>>> 10+0 records out
>>>> 10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
>>>>
>>>> real    1m41.082s
>>>> user    0m0.000s
>>>> sys    0m7.792s
>>>>
>>>> Then if I copied this file no blocks would be copied until they are
>>>> written.  Hence the two files would use the same blocks underneath.
>>>> But specifically that copy would be fast.  Since it would only need
>>>> to write some metadata.  But when I copy the file:
>>>> time cp 10G 10G2
>>>>
>>>> real    3m38.790s
>>>> user    0m0.124s
>>>> sys    0m10.709s
>>>>
>>>> Oddly enough it actually takes longer than the initial file
>>>> creation.  So I am guessing that the long duration copy of the file
>>>> is expected and that is not one of the virtues of btrfs copy on
>>>> write.  Does that sound right?
>>>      You probably want cp --reflink=always, which makes a CoW copy of
>>> the file's metadata only. The resulting files have the semantics of
>>> two different files, but share their blocks until a part of one of
>>> them is modified (at which point, the modified blocks are no longer
>>> shared).
>>>
>>>      Hugo.
>>>
>> I see, and it works great:
>> time cp --reflink=always 10G 10G3
>>
>> real    0m0.028s
>> user    0m0.000s
>> sys    0m0.000s
>>
>> So from the user perspective I might say I want to opt out of this feature,
>> not opt in.  I want all copies by all applications done as copy on write.
>> But if my understanding is correct, that is up to the application being
>> called (in this case cp) and how it in turn makes calls to the system.
>>
>> In short, I can't remount the btrfs filesystem with some new args that say
>> "always copy on write files", because that is what it already does.
> There's no "copy a file" syscall; when a program copies a file, it
> opens a new file, and writes all the bytes from the old to the new.
> Converting this to a reflink would require btrfs to implement full
> de-dup (which is rather expensive), and still wouldn't prevent the
> program from reading and writing all 10 GB (and so wouldn't be any
> faster).
>
> You can set an alias in your shell to make cp --reflink=auto the
> default, but that won't affect other programs, nor other users.
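
For reference, the alias suggested above might look like the line below.
This is only a sketch: it assumes GNU coreutils cp (which provides
--reflink) and that the line goes in a per-user shell startup file such
as ~/.bashrc, neither of which is stated in the thread.

```shell
# Assumption: GNU cp, interactive shell, per-user startup file.
# --reflink=auto asks for a CoW clone and quietly falls back to a
# normal copy when the filesystem cannot provide one.
alias cp='cp --reflink=auto'
```

The alias only affects commands typed into that user's interactive
shell; running "command cp" or "\cp" bypasses it.
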
Thanks for the help, guys.  I learned that if I want an application to
support this behavior, it must specifically choose to implement it.
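
As a rough illustration of what "choosing to implement it" could look
like for a script, here is a minimal sketch.  The function name copy_cow
is hypothetical (it does not come from the thread), and the sketch
assumes a cp that either accepts --reflink=always or rejects it with a
non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: copy_cow SRC DST
# Explicitly asks for a reflink (metadata-only) copy first, and falls
# back to an ordinary byte-for-byte copy when that is not possible.
copy_cow() {
    src=$1
    dst=$2
    if cp --reflink=always -- "$src" "$dst" 2>/dev/null; then
        echo "reflinked"
    else
        # A failed clone attempt may leave an empty destination behind.
        rm -f -- "$dst"
        cp -- "$src" "$dst" && echo "copied"
    fi
}
```

Calling copy_cow 10G 10G4 on btrfs would be expected to take the
reflink path; on a filesystem without clone support it would take the
slow path, which is essentially what cp --reflink=auto does on its own.
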
