Re: [PATCH v3 1/2] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim

From: tytso@mit.edu
To: Andreas Dilger <adilger@dilger.ca>
Cc: Wang Shilong <wangshilong1991@gmail.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Wang Shilong <wshilong@ddn.com>, Shuichi Ihara <sihara@ddn.com>
Subject: Re: [PATCH v3 1/2] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim
Date: Mon, 10 Aug 2020 09:24:57 -0400
Message-ID: <20200810132457.GA14208@mit.edu>
In-Reply-To: <9789BE11-11FB-42B2-A5BE-D4887838ED10@dilger.ca>

On Sat, Aug 08, 2020 at 10:33:08PM -0600, Andreas Dilger wrote:
> What about storing "s_min_freed_blocks_to_trim" persistently in the
> superblock, and then the admin can adjust this as desired?  If it is
> set =1, then the "lazy trim" optimization would be disabled (every
> FITRIM request would honor the trim requests whenever there is a
> freed block in a group).  I suppose we could allow =0 to mean "do not
> store the WAS_TRIMMED flag persistently", so there would be no change
> for current behavior, and it would require a tune2fs option to set the
> new value into the superblock (though we might consider setting this
> to a non-zero value in mke2fs by default).

Currently the minimum number of blocks to trim is passed in to FITRIM
from userspace, so we would need to define how the passed-in value from
the fstrim program interacts with the value stored in the superblock.
Would we always ignore the value passed-in from userspace?  That
doesn't seem right...
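
To make the open question concrete, here is a minimal sketch of two
ways the values could be combined.  Everything in it is hypothetical:
the function name, the policy names, and the assumption that the
userspace minimum and the proposed s_min_freed_blocks_to_trim are the
same kind of threshold (both counted in blocks).  Nothing in this
thread settles on either policy.

```python
def effective_threshold(user_min_blocks, sb_min_blocks, policy):
    """Pick the trim threshold, in blocks, under a hypothetical policy.

    user_min_blocks: minimum passed in with the FITRIM request.
    sb_min_blocks:   proposed persistent superblock value; 0 means
                     "not set", as suggested in the quoted proposal.
    """
    if sb_min_blocks == 0:
        # No persistent value: behave as today and honor userspace.
        return user_min_blocks
    if policy == "superblock_wins":
        # The userspace value is ignored entirely.
        return sb_min_blocks
    if policy == "larger_wins":
        # Userspace may raise the threshold but never lower it.
        return max(user_min_blocks, sb_min_blocks)
    raise ValueError("unknown policy: " + policy)
```

Under "superblock_wins" a request such as fstrim -m 8M would have its
minimum silently replaced, which is the behavior questioned above.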

> The other thing we were thinking about was changing the "-o discard" code
> to leverage the WAS_TRIMMED flag, and just do bulk trim periodically
> in the filesystem as blocks are freed from groups, rather than tracking
> freed extents in memory and submitting trims actively during IO.  Instead,
> it would track groups that exceed "s_min_freed_blocks_to_trim", and trim
> the whole group in the background when the filesystem is not active.

Hmm, maybe.  That's an awful lot of complexity, which is my concern
with that approach.

Part of the problem here is that discard is being used for different
things for different use cases and devices with different discard
speeds.  Right now, one of the primary uses of -o discard is for
people who have fast discard implementation(s) and/or people who really
want to make sure every freed block is immediately discarded --- perhaps
to meet security / privacy requirements (such as HIPAA compliance,
etc.).  I don't want to break that.

We now have a requirement from people who have very slow discards --- I
think at one point people mentioned devices using HDDs, probably in
some kind of dm-thin use case?  One solution that we can use for those
is simply to use fstrim -m 8M or some such.  But it appears that part
of the problem is that people do want more precision than that?
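
As a rough illustration of the precision that a large minimum gives
up, the sketch below filters a made-up list of free extents by a
minimum length.  The 4 KiB block size, the extent list, and the
function name are assumptions for the example, and rounding the
minimum down to whole blocks is a choice made here, not a statement
about the kernel.

```python
def extents_to_trim(free_extents, minlen_bytes, block_size=4096):
    """Keep only free extents at least minlen_bytes long.

    free_extents: list of (start_block, length_in_blocks) tuples.
    """
    # Convert the byte minimum to blocks (rounded down; an assumption).
    min_blocks = minlen_bytes // block_size
    return [(start, n) for (start, n) in free_extents if n >= min_blocks]


# Made-up free extents: (start block, length in blocks).
free = [(0, 16), (100, 2048), (5000, 4096), (20000, 512)]

# 8 MiB with 4 KiB blocks is a 2048-block minimum.
kept = extents_to_trim(free, 8 * 1024 * 1024)

# Blocks that are free but never discarded under this minimum.
skipped_blocks = sum(n for _, n in free) - sum(n for _, n in kept)
```

Here the 16-block and 512-block extents are never discarded, however
many times fstrim runs with that minimum.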

Another solution might be to skip trimming any block group that has
freshly freed blocks still pending a commit, and to leave that block
group alone until the commit has completed.  That might also help
reduce contention on a busy file system.
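
A minimal sketch of that selection step, assuming each block group
carries a flag saying whether it has freed blocks waiting on a commit.
The Group type, its fields, and the function name are all hypothetical
and do not correspond to ext4's in-kernel structures.

```python
from dataclasses import dataclass


@dataclass
class Group:
    number: int
    free_blocks: int
    pending_commit: bool  # freshly freed blocks still await a commit


def select_groups_for_trim(groups):
    """Split block groups into those safe to trim now and those deferred."""
    ready = [g.number for g in groups if not g.pending_commit]
    deferred = [g.number for g in groups if g.pending_commit]
    return ready, deferred
```

The deferred groups would be retried on a later pass, once their
commit has completed and the flag has been cleared.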

Yet another solution might be to bias block allocations towards LBA
ranges that have been deleted recently --- since another way to avoid
trims is to simply overwrite those LBAs.  But then the question is
how much memory are we willing to dedicate towards tracking recently
released LBAs, and to what level of granularity?  Perhaps we just
track the freed extents, and if they don't get used within a certain
period, or if we start getting put under memory pressure, we then send
the discards at that point.
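
One way to picture that last idea is a small tracker that remembers
freed extents, forgets the ones that get reused, and hands back the
rest for discard once they age out or the table grows too large.  This
is only a sketch: the class, its method names, the max_entries stand-in
for memory pressure, and the whole-extent matching (partial overlaps
are ignored) are assumptions made for the example.

```python
class FreedExtentTracker:
    """Remember freed extents until they are reused or must be discarded."""

    def __init__(self, max_age, max_entries):
        self.max_age = max_age          # how long to wait for reuse
        self.max_entries = max_entries  # crude stand-in for memory pressure
        self.pending = {}               # (start, length) -> time freed

    def record_free(self, start, length, now):
        self.pending[(start, length)] = now

    def record_reuse(self, start, length):
        # An extent that is overwritten no longer needs a discard.
        self.pending.pop((start, length), None)

    def collect_discards(self, now):
        """Return extents to discard: aged out, or oldest over the budget."""
        due = [e for e, t in self.pending.items() if now - t >= self.max_age]
        remaining = [e for e in self.pending if e not in due]
        overflow = len(remaining) - self.max_entries
        if overflow > 0:
            # dicts preserve insertion order, so these are the oldest entries.
            due.extend(remaining[:overflow])
        for e in due:
            del self.pending[e]
        return due
```

The two knobs correspond to the questions above: max_entries bounds
how much memory goes to tracking, and the (start, length) key fixes
the granularity at whole extents.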

Ultimately, though, this is a space full of trade-offs, and I'm
reminded of one of my father's favorite Chinese sayings: "You're
demanding a horse which can run fast, but which doesn't eat much
grass." (又要马儿跑，又要马儿不吃草).  Or translated more
idiomatically, you can't have your cake and eat it too.  It seems this
desire transcends all cultures.  :-)

	       	   	      	   	- Ted

Thread overview: 12+ messages
2020-06-22 13:14 [PATCH v3 1/2] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim Wang Shilong
2020-06-22 13:14 ` [PATCH 2/2] ext4: avoid trimming block group if only few blocks freed Wang Shilong
2020-06-22 14:16   ` [PATCH v2 " Wang Shilong
2020-06-22 17:20     ` Andreas Dilger
2020-06-22 17:18 ` [PATCH v3 1/2] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim Andreas Dilger
2020-08-06  4:47 ` tytso
2020-08-08  1:29   ` Wang Shilong
2020-08-08 15:18     ` tytso
2020-08-09  4:33       ` Andreas Dilger
2020-08-10 13:24         ` tytso [this message]
2020-08-12 23:14           ` Andreas Dilger
2020-08-14  8:06           ` Christoph Hellwig