From: Austin S Hemmelgarn <ahferroin7@gmail.com>
To: Erkki Seppala <flux-btrfs@inside.org>, linux-btrfs@vger.kernel.org
Subject: Re: btrfs autodefrag?
Date: Mon, 19 Oct 2015 07:56:51 -0400
Message-ID: <5624DA83.40200@gmail.com>
In-Reply-To: <m49zizfbcsb.fsf@coffee.modeemi.fi>

On 2015-10-19 02:19, Erkki Seppala wrote:
> Hugo Mills <hugo@carfax.org.uk> writes:
>>     It has to be disabled because if you enable it, there's a race
>> condition: since you're overwriting existing data (rather than CoWing
>> it), you can't update the checksums atomically. So, in the interests
>> of consistency, checksums are disabled.
>
> I suppose this has been suggested before, but couldn't it store both the
> new and the old checksums and be satisfied if either of them match?
Actually, I don't think that's been suggested before; read on, however, 
for an explanation of why we don't do that.
>
> The user is probably not happy that a partial write is going to be
> difficult to read from the device due to a checksum error, but there is
> no promise of recently-overwritten data state with traditional
> filesystems either in case of sudden powerdown, assuming there is no
> data journaling..
And that is exactly how things are now: when something is 
marked NOCOW, it has essentially zero guarantee of data consistency 
after a crash.  As things are now, though, there is a guarantee that you 
can still read the file.  Using checksums the way you suggest would 
leave it unreadable most of the time, because if the crash interrupts a 
write, it's statistically unlikely that we wrote the _whole_ block (IOW, 
without COW we can't guarantee that the data was completely written), 
for the following reasons:
a. While some disks do write single sectors atomically, most don't, and 
if the power dies while the disk is writing a single sector, there is no 
certainty about what that sector will read back as.
b. Even if item a is not an issue, one block in BTRFS usually spans 
multiple sectors on disk, and most disks have volatile write 
caches, so it is quite possible that the power will die partway through 
writing the block.
c. Even if neither item a nor item b is an issue (for example, 
you have a storage controller with a non-volatile write cache, have 
write caching turned off on the disks, and the controller is smart enough 
to remove writes from its cache only after the disks report them 
complete), there is still a small but distinct possibility that the 
crash will corrupt the write cache or cause some other 
hardware-related issue.
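To make item b concrete, here is a small simulation of the "accept either the old or the new checksum" idea against a torn multi-sector write.  It is only an illustration, not btrfs code: the 512-byte sector and 8-sector (4 KiB) block are assumptions, zlib's CRC-32 stands in for the crc32c btrfs actually uses, and verify_either() and torn_write() are hypothetical helpers.  It also assumes each individual sector is written atomically (i.e. item a is not an issue).

```python
import zlib

SECTOR = 512             # assumed logical sector size in bytes
SECTORS_PER_BLOCK = 8    # assumed: one 4 KiB block spans 8 sectors


def checksum(data: bytes) -> int:
    # Stand-in checksum: zlib's CRC-32, not the crc32c btrfs really uses.
    return zlib.crc32(data)


def verify_either(data: bytes, old_csum: int, new_csum: int) -> bool:
    """Hypothetical rule from the proposal: accept the block if it
    matches either the old or the new checksum."""
    c = checksum(data)
    return c == old_csum or c == new_csum


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Model a crash after only the first `sectors_written` sectors of
    the new block reached the disk; the remaining sectors still hold
    the old data (each sector assumed to be written atomically)."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


old_block = bytes([0xAA]) * (SECTOR * SECTORS_PER_BLOCK)
new_block = bytes([0x55]) * (SECTOR * SECTORS_PER_BLOCK)
old_csum, new_csum = checksum(old_block), checksum(new_block)

# For each possible crash point, does the on-disk block pass verification?
results = {
    n: verify_either(torn_write(old_block, new_block, n), old_csum, new_csum)
    for n in range(SECTORS_PER_BLOCK + 1)
}
print(results)
```

Only the two "clean" outcomes (nothing written, or everything written) pass; every partially written block matches neither checksum, so under this scheme it would fail verification and become unreadable rather than merely stale.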
