public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Chandan Babu R <chandanrlinux@gmail.com>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Allison Henderson <allison.henderson@oracle.com>,
	"Luis R. Rodriguez" <mcgrof@kernel.org>,
	Theodore Tso <tytso@mit.edu>
Subject: Re: [PATCH V14 00/16] Bail out if transaction can cause extent count to overflow
Date: Wed, 25 May 2022 18:21:40 +1000	[thread overview]
Message-ID: <20220525082140.GG1098723@dread.disaster.area> (raw)
In-Reply-To: <CAOQ4uxhhvsH8zLHxVc=HNViO12cssWFK4y+Pq5Jsz=2ymGaypg@mail.gmail.com>

On Tue, May 24, 2022 at 07:05:07PM +0300, Amir Goldstein wrote:
> On Tue, May 24, 2022 at 8:36 AM Amir Goldstein <amir73il@gmail.com> wrote:
> 
> Allow me to rephrase that using a less hypothetical use case.
> 
> Our team is working on an out-of-band dedupe tool, much like
> https://markfasheh.github.io/duperemove/duperemove.html
> but for larger scale filesystems and testing focus is on xfs.

Dedupe is nothing new. It's been running in production systems for a
while now, e.g. Veeam has a production server back end for its
reflink/dedupe-based backup software that is hosted on XFS.

The only scalability issues we've seen with those systems managing
tens of TB of heavily cross-linked files so far have been limited to
how long unlink of those large files takes. Dedupe/reflink speeds up
ingest for backup farms, but it slows down removal/garbage
collection of backups that are no longer needed. The big
reflink/dedupe backup farms I've seen problems with are generally
dealing with extent counts per file in the tens of millions,
which is still very manageable.

Maybe we'll see more problems as data sets grow, but it's also
likely that the crosslinked data sets the applications build will
scale out (more base files) instead of up (larger base files). This
will mean they remain at the "tens of millions of extents per file"
level and won't stress the filesystem any more than they already do.

> In certain settings, such as containers, the tool does not control the
> running kernel and *if* we require a new kernel, the newest we can
> require in this setting is 5.10.y.

*If* you have a customer that creates a billion extents in a single
file, then you could consider backporting this. But until managing
billions of extents per file is an actual issue for production
filesystems, it's unnecessary to backport these changes.

> How would the tool know that it can safely create millions of dups
> that may get fragmented?

Millions of shared extents in a single file aren't a problem at all.
Millions of references to a single shared block aren't a problem at
all, either.

But there are limits to how much you can share a single block, and
those limits are *highly variable* because they are dependent on
free space being available to record references.  e.g. XFS can
share a single block a maximum of 2^32 - 1 times. If a user turns on
rmapbt, that max share limit drops way down to however many
individual rmap records can be stored in the rmap btree before the
AG runs out of space. If the AGs are small and/or full of other data,
that could limit sharing of a single block to a few hundred
references.

IOWs, applications creating shared extents must expect the operation
to fail at any time, without warning. And dedupe applications need
to be able to index multiple replicas of the same block so that they
aren't limited to deduping that data to a single block that has
arbitrary limits on how many times it can be shared.

> Does anyone *object* to including this series in the stable kernel
> after it passes the tests?

If you end up having a customer that hits a billion extents in a
single file, then you can backport these patches to the 5.10.y
series. But without any obvious production need for these patches,
they don't fit the criteria for stable backports...

Don't change what ain't broke.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 31+ messages
2021-01-10 16:07 [PATCH V14 00/16] Bail out if transaction can cause extent count to overflow Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 01/16] xfs: Add helper for checking per-inode extent count overflow Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 02/16] xfs: Check for extent overflow when trivally adding a new extent Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 03/16] xfs: Check for extent overflow when punching a hole Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 04/16] xfs: Check for extent overflow when adding dir entries Chandan Babu R
2021-01-12  1:34   ` Darrick J. Wong
2021-01-10 16:07 ` [PATCH V14 05/16] xfs: Check for extent overflow when removing " Chandan Babu R
2021-01-12  1:38   ` Darrick J. Wong
2021-01-10 16:07 ` [PATCH V14 06/16] xfs: Check for extent overflow when renaming " Chandan Babu R
2021-01-12  1:37   ` Darrick J. Wong
2021-01-10 16:07 ` [PATCH V14 07/16] xfs: Check for extent overflow when adding/removing xattrs Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 08/16] xfs: Check for extent overflow when writing to unwritten extent Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 09/16] xfs: Check for extent overflow when moving extent from cow to data fork Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 10/16] xfs: Check for extent overflow when remapping an extent Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 11/16] xfs: Check for extent overflow when swapping extents Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 12/16] xfs: Introduce error injection to reduce maximum inode fork extent count Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 13/16] xfs: Remove duplicate assert statement in xfs_bmap_btalloc() Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 14/16] xfs: Compute bmap extent alignments in a separate function Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 15/16] xfs: Process allocated extent " Chandan Babu R
2021-01-10 16:07 ` [PATCH V14 16/16] xfs: Introduce error injection to allocate only minlen size extents for files Chandan Babu R
2022-05-23 11:15 ` [PATCH V14 00/16] Bail out if transaction can cause extent count to overflow Amir Goldstein
2022-05-23 15:50   ` Chandan Babu R
2022-05-23 19:06     ` Amir Goldstein
2022-05-25  5:49       ` Amir Goldstein
2022-05-23 22:43   ` Dave Chinner
2022-05-24  5:36     ` Amir Goldstein
2022-05-24 16:05       ` Amir Goldstein
2022-05-25  8:21         ` Dave Chinner [this message]
2022-05-25  7:33       ` Dave Chinner
2022-05-25  7:48         ` Amir Goldstein
2022-05-25  8:38           ` Dave Chinner
