From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
Cc: dsterba@suse.cz, Waxhead <waxhead@online.no>,
linux-btrfs@vger.kernel.org
Subject: Re: Is stability a joke? (wiki updated)
Date: Mon, 19 Sep 2016 11:33:42 -0400
Message-ID: <20160919153341.GA4703@hungrycats.org>
In-Reply-To: <8dc842dc-c9a9-5662-1222-2d6785a66359@gmail.com>
On Mon, Sep 19, 2016 at 08:32:14AM -0400, Austin S. Hemmelgarn wrote:
> On 2016-09-18 23:47, Zygo Blaxell wrote:
> >On Mon, Sep 12, 2016 at 12:56:03PM -0400, Austin S. Hemmelgarn wrote:
> >>4. File Range Cloning and Out-of-band Dedupe: Similarly, work fine if the FS
> >>is healthy.
> >
> >I've found issues with OOB dedup (clone/extent-same):
> >
> >1. Don't dedup data that has not been committed--either call fsync()
> >on it, or check the generation numbers on each extent before deduping
> >it, or make sure the data is not being actively modified during dedup;
> >otherwise, a race condition may lead to the filesystem locking up and
> >becoming inaccessible until the kernel is rebooted. This is particularly
> >important if you are doing bedup-style incremental dedup on a live system.
> >
> >I've worked around #1 by placing a fsync() call on the src FD immediately
> >before calling FILE_EXTENT_SAME. When I do an A/B experiment with and
> >without the fsync, "with-fsync" runs for weeks at a time without issues,
> >while "without-fsync" hangs, sometimes in just a matter of hours. Note
> >that the fsync() doesn't resolve the underlying race condition, it just
> >makes the filesystem hang less often.
> >
> >2. There is a practical limit to the number of times a single duplicate
> >extent can be deduplicated. As more references to a shared extent
> >are created, any part of the filesystem that uses backref walking code
> >gets slower. This includes dedup itself, balance, device replace/delete,
> >FIEMAP, LOGICAL_INO, and mmap() (which can be bad news if the duplicate
> >files are executables). Several factors (including file size and number
> >of snapshots) are involved, making it difficult to devise workarounds or
> >set up test cases. 99.5% of the time, these operations just get slower
> >by a few ms each time a new reference is created, but the other 0.5% of
> >the time, write operations will abruptly grow to consume hours of CPU
> >time or dozens of gigabytes of RAM (in millions of kmalloc-32 slabs)
> >when they touch one of these over-shared extents. When this occurs,
> >it effectively (but not literally) crashes the host machine.
> >
> >I've worked around #2 by building tables of "toxic" hashes that occur too
> >frequently in a filesystem to be deduped, and using these tables in dedup
> >software to ignore any duplicate data matching them. These tables can
> >be relatively small as they only need to list hashes that are repeated
> >more than a few thousand times, and typical filesystems (up to 10TB or
> >so) have only a few hundred such hashes.
> >
> >I happened to have a couple of machines taken down by these issues this
> >very weekend, so I can confirm the issues are present in kernels 4.4.21,
> >4.5.7, and 4.7.4.
> OK, that's good to know. In my case, I'm not operating on a very big data
> set (less than 40GB, but the storage cluster I'm doing this on only has
> about 200GB of total space, so I'm trying to conserve as much as possible),
> and it's mostly static data (less than 100MB worth of changes a day except
> on Sunday when I run backups), so it makes sense that I've not seen either
> of these issues.
I ran into issue #2 on an 8GB filesystem last weekend. The lower limit
on filesystem size could be as low as a few megabytes if the data is
arranged in *just* the right way.
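
The "toxic" hash workaround for #2 quoted above can be sketched roughly as
follows. This is a minimal illustration of the idea, not the actual dedup
software: count how often each block hash occurs, put every hash seen more
than a threshold number of times into a table, and skip those hashes when
picking dedup candidates. The 4096-byte block size, SHA-256, the function
names, and the exact threshold are all assumptions made for the example;
the only figure taken from the text is "more than a few thousand" repeats.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096        # assumed dedup block size
TOXIC_THRESHOLD = 3000   # hypothetical cutoff ("more than a few thousand")


def block_hashes(data, block_size=BLOCK_SIZE):
    """Yield (offset, digest) for each block of a bytes object."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield offset, hashlib.sha256(block).digest()


def build_toxic_table(blobs, threshold=TOXIC_THRESHOLD):
    """Return the set of digests that occur more than `threshold` times."""
    counts = Counter()
    for data in blobs:
        counts.update(digest for _, digest in block_hashes(data))
    return {digest for digest, n in counts.items() if n > threshold}


def dedup_candidates(blobs, toxic):
    """Group (blob_index, offset) by digest, ignoring toxic digests.

    Only digests seen at least twice are returned, since a single
    occurrence has nothing to be deduplicated against.
    """
    groups = {}
    for index, data in enumerate(blobs):
        for offset, digest in block_hashes(data):
            if digest in toxic:
                continue  # over-shared data: leave it alone
            groups.setdefault(digest, []).append((index, offset))
    return {d: locs for d, locs in groups.items() if len(locs) > 1}
```

A real tool would presumably hash extents read from disk rather than
in-memory byte strings and keep the table between runs, but the filtering
step would look much the same.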
> The second one sounds like the same performance issue caused by having very
> large numbers of snapshots, and based on what's happening, I don't think
> there's any way we could fix it without rewriting certain core code.
find_parent_nodes is the usual culprit for CPU usage. Fixing this is
required for in-band dedup as well, so I assume someone has it on their
roadmap and will get it done eventually.
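
Going back to workaround #1 quoted above: a minimal sketch of "fsync() on
the src FD immediately before calling FILE_EXTENT_SAME", assuming Linux and
the generic FIDEDUPERANGE ioctl, which should share its request number and
argument layout with BTRFS_IOC_FILE_EXTENT_SAME. The hard-coded request
value assumes the common asm-generic ioctl encoding (x86, arm). The name
dedupe_range and its ioctl/fsync parameters are hypothetical; the
parameters exist only so the function can be exercised without a btrfs
filesystem. As noted in the text, the fsync only makes the hang less
frequent and does not resolve the underlying race.

```python
import fcntl
import os
import struct

# _IOWR(0x94, 54, struct file_dedupe_range) from <linux/fs.h>, assuming the
# asm-generic ioctl encoding (some architectures encode this differently).
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range: src_offset, src_length, dest_count, 2x reserved
HDR = "=QQHHI"
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped,
# status, reserved
INFO = "=qQQiI"


def dedupe_range(src_fd, src_off, length, dst_fd, dst_off,
                 ioctl=fcntl.ioctl, fsync=os.fsync):
    """Dedupe one range of dst_fd against src_fd.

    Returns (status, bytes_deduped). status is 0 if the ranges matched and
    were deduped, 1 if the data differed, or a negative errno value.
    """
    # Workaround from the text: commit the source data first, so the
    # ioctl does not run against uncommitted extents.
    fsync(src_fd)

    buf = bytearray(
        struct.pack(HDR, src_off, length, 1, 0, 0)
        + struct.pack(INFO, dst_fd, dst_off, 0, 0, 0)
    )
    # With a mutable buffer, the kernel's results are written back into buf.
    ioctl(src_fd, FIDEDUPERANGE, buf)

    _, _, bytes_deduped, status, _ = struct.unpack_from(
        INFO, buf, struct.calcsize(HDR))
    return status, bytes_deduped
```

Usage would be along the lines of dedupe_range(src.fileno(), 0, 131072,
dst.fileno(), 0) with dst opened for writing; the offsets and length here
are made-up values.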