From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: [auto-]defrag, nodatacow - general suggestions?(was: btrfs: poor performance on deleting many large files?)
Date: Thu, 17 Dec 2015 04:06:56 +0000 (UTC)
Message-ID: <pan$7df3b$2174f8e9$efd1ab67$2dbe2f31@cox.net>
In-Reply-To: 1450303141.6242.50.camel@scientia.net

Christoph Anton Mitterer posted on Wed, 16 Dec 2015 22:59:01 +0100 as
excerpted:

> On Wed, 2015-12-09 at 16:36 +0000, Duncan wrote:
>> But... as I've pointed out in other replies, in many cases including
>> this specific one (bittorrent), applications have already had to
>> develop their own integrity management features

> Well let's move discussion upon that into the "dear developers, can we
> have notdatacow + checksumming, plz?" where I showed in one of the more
> recent threads that bittorrent seems rather to be the only thing which
> does use that per default... while on the VM image front, nothing seems
> to support it, and on the DB front, some support it, but don't use it
> per default.
> 
>> In the bittorrent case specifically, torrent chunks are already
>> checksummed, and if they don't verify upon download, the chunk is
>> thrown away and redownloaded.

> I'm not a bittorrent expert, because I don't use it, but that sounds to
> be more like the edonkey model, where - while there are checksums -
> these are only used until the download completes. Then you have the
> complete file, any checksum info thrown away, and the file again being
> "at risk" (i.e. not checksum protected).

[I'm breaking this into smaller replies again.]

Just to note here: I said "integrity management features", which 
includes more than checksumming.  As Austin Hemmelgarn has been pointing 
out, DBs and some VMs do COW, some DBs do checksumming or at least have 
that option, and both VMs and DBs generally do at least some level of 
consistency checking as they load.  Those are all "integrity management 
features" at some level.

As for bittorrent, I /think/ the checksums are in the torrent files 
themselves (and if I'm not mistaken, much as git, the chunks within the 
file are actually IDed by checksum, not specific position), so as long as 
the torrent is active, uploading or downloading, the torrent file will by 
definition be retained, and the checksums along with it.  And ideally, 
people will continue to torrent the files long after they've finished 
downloading them, in which case they'll still need the torrent files 
themselves, along with the checksum info.
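
To make the re-check concrete, here's a minimal sketch of verifying a 
downloaded file against the per-piece hashes, assuming a single-file 
torrent in the original (v1) format, where as far as I know the info 
dictionary carries a "piece length" and a "pieces" string of concatenated 
20-byte SHA-1 digests stored in piece order.  The function name 
bad_pieces is made up for illustration, and decoding the torrent file 
itself is left out, since that needs a bencode parser.

```python
import hashlib

SHA1_LEN = 20  # size of one SHA-1 digest in bytes


def bad_pieces(path, piece_length, pieces):
    """Return the indices of pieces whose SHA-1 does not match.

    `pieces` is assumed to be the concatenated 20-byte digests, in piece
    order, as taken from an already-decoded v1 torrent file.
    Trailing data beyond the last expected piece is not checked.
    """
    expected = [pieces[i:i + SHA1_LEN]
                for i in range(0, len(pieces), SHA1_LEN)]
    bad = []
    with open(path, "rb") as f:
        for index, digest in enumerate(expected):
            chunk = f.read(piece_length)  # the last piece may be shorter
            if hashlib.sha1(chunk).digest() != digest:
                bad.append(index)
    return bad
```

An empty list back means every piece still matches; a real client would 
redownload whatever indices come back.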

And for longer term storage, people really should be copying/moving their 
torrented files elsewhere, in such a way that they either eliminate the 
fragmentation if the files weren't nocowed, or eliminate the nocow 
attribute and get them checksum-protected as normal for files not 
intended to be constantly randomly rewritten, which will be the case once 
they're no longer being actively downloaded.  Of course that's at the 
slightly technically oriented user level, but then, the whole nocow 
thing, or even caring about checksums and longer term file integrity in 
the first place, is also technically oriented user level.  Normal users 
will just download without worrying about the nocow in the first place, 
and perhaps wonder why the disk is thrashing so, but not be inclined to 
do anything about it except perhaps switch back to their old filesystem, 
where it was faster and the disk didn't sound as bad.  So they'll either 
stay and automatically get the checksumming along with the worse 
performance, or go back to a filesystem without checksumming and think 
it's fine, as they know no different.

Meanwhile, if they do it correctly there's no window without protection, 
as the torrent file can also be used to re-verify the copy once moved, 
before the original is deleted.
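
A rough sketch of that copy, verify, then delete order follows, under the 
assumption that the destination directory doesn't carry the nocow 
attribute (lsattr would show a C there if it did), so the newly written 
file gets normal COW and checksums.  The name copy_verify_delete is 
hypothetical, and SHA-256 here merely stands in for re-checking against 
the torrent's own piece hashes.

```python
import hashlib
import os


def _sha256(path, bufsize=1 << 20):
    """Hash a file in fixed-size blocks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def copy_verify_delete(src, dst, bufsize=1 << 20):
    """Copy src to dst, re-read dst to verify, and only then remove src."""
    h = hashlib.sha256()
    # "xb" refuses to overwrite an existing destination file.
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        for block in iter(lambda: fin.read(bufsize), b""):
            h.update(block)
            fout.write(block)
        fout.flush()
        os.fsync(fout.fileno())
    # Note: this re-read may be served from the page cache, not the disk.
    if _sha256(dst) != h.hexdigest():
        os.remove(dst)
        raise OSError("copy of %s failed verification" % src)
    os.remove(src)
    return h.hexdigest()
```

The source is removed only after the copy has been read back and 
compared, so a complete copy of the data exists at every step.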

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman