From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-da0-f41.google.com ([209.85.210.41]:47110 "EHLO
    mail-da0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1755622Ab3BWSaR (ORCPT ); Sat, 23 Feb 2013 13:30:17 -0500
Received: by mail-da0-f41.google.com with SMTP id e20so862050dak.28
    for ; Sat, 23 Feb 2013 10:30:16 -0800 (PST)
Message-ID: <51290AB6.3090709@alumni.calpoly.edu>
Date: Sat, 23 Feb 2013 10:30:14 -0800
From: Mike Power
MIME-Version: 1.0
To: cwillu
CC: Hugo Mills , linux-btrfs@vger.kernel.org
Subject: Re: copy on write misconception
References: <5127A6C0.6080104@alumni.calpoly.edu>
    <20130222171604.GI14283@carfax.org.uk>
    <5127ADC3.7090808@alumni.calpoly.edu>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 02/22/2013 10:35 AM, cwillu wrote:
> On Fri, Feb 22, 2013 at 11:41 AM, Mike Power wrote:
>> On 02/22/2013 09:16 AM, Hugo Mills wrote:
>>> On Fri, Feb 22, 2013 at 09:11:28AM -0800, Mike Power wrote:
>>>> I think I have a misconception of what copy on write in btrfs means
>>>> for individual files.
>>>>
>>>> I had originally thought that I could create a large file:
>>>> time dd if=/dev/zero of=10G bs=1G count=10
>>>> 10+0 records in
>>>> 10+0 records out
>>>> 10737418240 bytes (11 GB) copied, 100.071 s, 107 MB/s
>>>>
>>>> real 1m41.082s
>>>> user 0m0.000s
>>>> sys 0m7.792s
>>>>
>>>> Then if I copied this file no blocks would be copied until they are
>>>> written. Hence the two files would use the same blocks underneath.
>>>> But specifically that copy would be fast, since it would only need
>>>> to write some metadata. But when I copy the file:
>>>> time cp 10G 10G2
>>>>
>>>> real 3m38.790s
>>>> user 0m0.124s
>>>> sys 0m10.709s
>>>>
>>>> Oddly enough it actually takes longer than the initial file
>>>> creation.
>>>> So I am guessing that the long duration copy of the file
>>>> is expected and that is not one of the virtues of btrfs copy on
>>>> write. Does that sound right?
>>> You probably want cp --reflink=always, which makes a CoW copy of
>>> the file's metadata only. The resulting files have the semantics of
>>> two different files, but share their blocks until a part of one of
>>> them is modified (at which point, the modified blocks are no longer
>>> shared).
>>>
>>> Hugo.
>>>
>> I see, and it works great:
>> time cp --reflink=always 10G 10G3
>>
>> real 0m0.028s
>> user 0m0.000s
>> sys 0m0.000s
>>
>> So from the user perspective I might say I want to opt out of this feature,
>> not opt in. I want all copies by all applications done as a copy on write.
>> But if my understanding is correct that is up to the application being
>> called (in this case cp) and how it in turn makes calls to the system.
>>
>> In short I can't remount the btrfs filesystem with some new args that say
>> always copy on write files because that is what it already does.
> There's no "copy a file" syscall; when a program copies a file, it
> opens a new file, and writes all the bytes from the old to the new.
> Converting this to a reflink would require btrfs to implement full
> de-dup (which is rather expensive), and still wouldn't prevent the
> program from reading and writing all 10 GB (and so wouldn't be any
> faster).
>
> You can set an alias in your shell to make cp --reflink=auto the
> default, but that won't affect other programs, nor other users.

Thanks for the help, guys. I learned that if I want an application to
support this behavior, it must specifically choose to implement it.
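
To make that last point concrete, here is roughly what "choosing to
implement it" could look like inside a program. This is only a minimal
sketch, assuming Linux and the FICLONE ioctl (the same request number
btrfs exposes as BTRFS_IOC_CLONE). The helper name reflink_or_copy is
made up for illustration and is not an existing API. It tries the clone
first and falls back to reading and writing every byte, which is roughly
the behavior of cp --reflink=auto:

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int). btrfs headers define the
# same request as BTRFS_IOC_CLONE. Hard-coded here (Linux only) because the
# fcntl module may not export it.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Copy src to dst, sharing blocks when the filesystem allows it.

    Hypothetical helper. Returns "reflink" if the clone ioctl succeeded,
    or "copy" if it fell back to an ordinary byte-by-byte copy.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share src's extents; only
            # metadata is written, so this is fast regardless of size.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Filesystem has no reflink support, or src and dst are on
            # different filesystems: do what a plain cp does.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"
```

Called as reflink_or_copy("10G", "10G4"), this should report "reflink"
on btrfs and "copy" on a filesystem without reflink support. Either way
the program itself has to make the call. The filesystem cannot do it on
the program's behalf.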