From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: dsterba@suse.cz, Waxhead <waxhead@online.no>,
linux-btrfs@vger.kernel.org
Subject: Re: Is stability a joke? (wiki updated)
Date: Mon, 19 Sep 2016 11:33:42 -0400
Message-ID: <20160919153341.GA4703@hungrycats.org>
In-Reply-To: <8dc842dc-c9a9-5662-1222-2d6785a66359@gmail.com>
On Mon, Sep 19, 2016 at 08:32:14AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-18 23:47, Zygo Blaxell wrote:
> >On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:
> >>4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> >>is healthy.
> >
> >I've found issues with OOB dedup (clone/extent-same):
> >
> >1. Don't dedup data that has not been committed--either call fsync()
> >on it, or check the generation numbers on each extent before deduping
> >it, or make sure the data is not being actively modified during dedup;
> >otherwise, a race condition may lead to the filesystem locking up and
> >becoming inaccessible until the kernel is rebooted. This is particularly
> >important if you are doing bedup-style incremental dedup on a live system.
> >
> >I've worked around #1 by placing an fsync() call on the src FD immediately
> >before calling FILE_EXTENT_SAME. When I do an A/B experiment with and
> >without the fsync, "with-fsync" runs for weeks at a time without issues,
> >while "without-fsync" hangs, sometimes in just a matter of hours. Note
> >that the fsync() doesn't resolve the underlying race condition, it just
> >makes the filesystem hang less often.
> >
> >2. There is a practical limit to the number of times a single duplicate
> >extent can be deduplicated. As more references to a shared extent
> >are created, any part of the filesystem that uses backref walking code
> >gets slower. This includes dedup itself, balance, device replace/delete,
> >FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
> >files are executables). Several factors (including file size and number
> >of snapshots) are involved, making it difficult to devise workarounds or
> >set up test cases. 99.5% of the time, these operations just get slower
> >by a few ms each time a new reference is created, but the other 0.5% of
> >the time, write operations will abruptly grow to consume hours of CPU
> >time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
> >when they touch one of these over-shared extents. When this occurs,
> >it effectively (but not literally) crashes the host machine.
> >
> >I've worked around #2 by building tables of "toxic" hashes that occur too
> >frequently in a filesystem to be deduped, and using these tables in dedup
> >software to ignore any duplicate data matching them. These tables can
> >be relatively small as they only need to list hashes that are repeated
> >more than a few thousand times, and typical filesystems (up to 10TB or
> >so) have only a few hundred such hashes.
> >
> >I happened to have a couple of machines taken down by these issues this
> >very weekend, so I can confirm the issues are present in kernels 4.4.21,
> >4.5.7, and 4.7.4.
> OK, that's good to know. In my case, I'm not operating on a very big data
> set (less than 40GB, but the storage cluster I'm doing this on only has
> about 200GB of total space, so I'm trying to conserve as much as possible),
> and it's mostly static data (less than 100MB worth of changes a day except
> on Sunday when I run backups), so it makes sense that I've not seen either
> of these issues.
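For illustration only, here is a minimal Python sketch of the fsync()-before-dedup workaround for issue #1 quoted above. It assumes the generic FIDEDUPERANGE ioctl from <linux/fs.h> (kernel 4.5 and later), which as far as I know shares its number and struct layout with the older btrfs-specific BTRFS_IOC_FILE_EXTENT_SAME; the hard-coded request number assumes the asm-generic ioctl encoding used on x86 and arm. The helper names (pack_request, parse_result, dedupe_with_fsync) are hypothetical. The fsync() only makes the hang less likely; it does not close the race.

```python
import fcntl
import os
import struct

# Assumption: asm-generic ioctl encoding (x86, arm). This equals
# _IOWR(0x94, 54, struct file_dedupe_range) from <linux/fs.h>.
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range header: src_offset, src_length, dest_count,
# reserved1, reserved2 -> 24 bytes, no padding.
RANGE_HDR = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved -> 32 bytes.
RANGE_INFO = struct.Struct("=qQQiI")


def pack_request(src_offset, length, dest_fd, dest_offset):
    """Build a request buffer describing a single destination range."""
    hdr = RANGE_HDR.pack(src_offset, length, 1, 0, 0)
    info = RANGE_INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return bytearray(hdr + info)


def parse_result(buf):
    """Return (status, bytes_deduped) for the first destination entry.

    status is 0 (FILE_DEDUPE_RANGE_SAME), 1 (FILE_DEDUPE_RANGE_DIFFERS),
    or a negative errno.
    """
    _fd, _off, bytes_deduped, status, _res = RANGE_INFO.unpack_from(
        buf, RANGE_HDR.size)
    return status, bytes_deduped


def dedupe_with_fsync(src_fd, src_offset, length, dest_fd, dest_offset):
    # Workaround from the thread: flush the source before deduping.
    # This only narrows the race window; it does not remove the race.
    os.fsync(src_fd)
    buf = pack_request(src_offset, length, dest_fd, dest_offset)
    # With a mutable bytearray, fcntl.ioctl hands the buffer to the
    # kernel and the kernel's results are written back into it.
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    return parse_result(buf)
```

A caller would open both files, call dedupe_with_fsync(src.fileno(), 0, length, dst.fileno(), 0), and check both status and bytes_deduped rather than assume the whole range was deduped.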
I ran into issue #2 on an 8GB filesystem last weekend. The lower limit
on filesystem size could be as low as a few megabytes if the shared
extents are arranged in *just* the right way.
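A rough sketch of the "toxic hash" filter for issue #2, again in Python and under loud assumptions: block hashes are already computed by some scanner, the cutoff is a hypothetical stand-in for "more than a few thousand times", and build_toxic_table/filter_candidates are illustrative names, not functions from any existing dedup tool. Counting every hash in memory as done here would not scale to a multi-terabyte filesystem; a real tool would need a more compact counting structure.

```python
from collections import Counter
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical cutoff: the thread only says "more than a few thousand".
DEFAULT_TOXIC_THRESHOLD = 4096

# A dedup candidate: (block hash, file path, byte offset).
Candidate = Tuple[bytes, str, int]


def build_toxic_table(block_hashes: Iterable[bytes],
                      threshold: int = DEFAULT_TOXIC_THRESHOLD) -> Set[bytes]:
    """Return the hashes that occur strictly more than `threshold` times."""
    counts = Counter(block_hashes)
    return {h for h, n in counts.items() if n > threshold}


def filter_candidates(candidates: Iterable[Candidate],
                      toxic: Set[bytes]) -> Iterator[Candidate]:
    """Yield only candidates whose hash is not in the toxic table,
    so over-shared data is never handed to the dedup ioctl."""
    for cand in candidates:
        if cand[0] not in toxic:
            yield cand
```

Because the table only holds the few hundred most-repeated hashes, it stays small even when the scan that builds it is large.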
> The second one sounds like the same performance issue caused by having very
> large numbers of snapshots, and based on what's happening, I don't think
> there's any way we could fix it without rewriting certain core code.
find_parent_nodes (part of the backref-walking code) is the usual culprit
for CPU usage. Fixing this is required for in-band dedup as well, so I
assume someone has it on their roadmap and will get it done eventually.