From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: dsterba@suse.cz, Waxhead <waxhead@online.no>,
	linux-btrfs@vger.kernel.org
Subject: Re: Is stability a joke? (wiki updated)
Date: Mon, 19 Sep 2016 08:32:14 -0400
Message-ID: <8dc842dc-c9a9-5662-1222-2d6785a66359@gmail.com>
In-Reply-To: <20160919034701.GE21290@hungrycats.org>
On 2016-09-18 23:47, Zygo Blaxell wrote:
> On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:
>> 4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
>> is healthy.
>
> I've found issues with OOB dedup (clone/extent-same):
>
> 1.  Don't dedup data that has not been committed--either call fsync()
> on it, or check the generation numbers on each extent before deduping
> it, or make sure the data is not being actively modified during dedup;
> otherwise, a race condition may lead to the filesystem locking up and
> becoming inaccessible until the kernel is rebooted.  This is particularly
> important if you are doing bedup-style incremental dedup on a live system.
>
> I've worked around #1 by placing a fsync() call on the src FD immediately
> before calling FILE_EXTENT_SAME.  When I do an A/B experiment with and
> without the fsync, "with-fsync" runs for weeks at a time without issues,
> while "without-fsync" hangs, sometimes in just a matter of hours.  Note
> that the fsync() doesn't resolve the underlying race condition, it just
> makes the filesystem hang less often.
>
> 2.  There is a practical limit to the number of times a single duplicate
> extent can be deduplicated.  As more references to a shared extent
> are created, any part of the filesystem that uses backref walking code
> gets slower.  This includes dedup itself, balance, device replace/delete,
> FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
> files are executables).  Several factors (including file size and number
> of snapshots) are involved, making it difficult to devise workarounds or
> set up test cases.  99.5% of the time, these operations just get slower
> by a few ms each time a new reference is created, but the other 0.5% of
> the time, write operations will abruptly grow to consume hours of CPU
> time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
> when they touch one of these over-shared extents.  When this occurs,
> it effectively (but not literally) crashes the host machine.
>
> I've worked around #2 by building tables of "toxic" hashes that occur too
> frequently in a filesystem to be deduped, and using these tables in dedup
> software to ignore any duplicate data matching them.  These tables can
> be relatively small as they only need to list hashes that are repeated
> more than a few thousand times, and typical filesystems (up to 10TB or
> so) have only a few hundred such hashes.
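The "toxic hash" table from the quoted paragraph above could be sketched roughly as follows. This is an illustration under assumptions, not the actual dedup software being described: the 4096-byte block size, the use of SHA-256, and the function names block_hashes, build_toxic_table and dedupe_candidates are all invented here, and the threshold is left to the caller (the quoted text only says "more than a few thousand times").

```python
import hashlib
from collections import Counter


def block_hashes(data, block_size=4096):
    """Yield one digest per fixed-size block of data (assumed block size)."""
    for off in range(0, len(data), block_size):
        yield hashlib.sha256(data[off:off + block_size]).digest()


def build_toxic_table(hashes, threshold):
    """Return the set of hashes that occur more than threshold times."""
    counts = Counter(hashes)
    return {h for h, n in counts.items() if n > threshold}


def dedupe_candidates(hashes, toxic):
    """Return hashes that are duplicated but not listed as toxic."""
    counts = Counter(hashes)
    return {h for h, n in counts.items() if n > 1 and h not in toxic}
```

A dedup pass would then skip any block whose hash is in the toxic set, so that no single extent accumulates an excessive number of references.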
>
> I happened to have a couple of machines taken down by these issues this
> very weekend, so I can confirm the issues are present in kernels 4.4.21,
> 4.5.7, and 4.7.4.

OK, that's good to know.  In my case, I'm not operating on a very big
data set (less than 40GB, but the storage cluster I'm doing this on only
has about 200GB of total space, so I'm trying to conserve as much as
possible), and it's mostly static data (less than 100MB worth of changes
a day except on Sunday when I run backups), so it makes sense that I've
not seen either of these issues.

The second one sounds like the same performance issue caused by having
very large numbers of snapshots, and based on what's happening, I don't
think there's any way we could fix it without rewriting certain core code.