From: Martin Wilck <mwilck@suse.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>, Ted Tso <tytso@mit.edu>,
	mwilck@suse.de, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] ext4: Avoid trim error on fs with small groups
Date: Fri, 19 Nov 2021 16:43:53 +0100	[thread overview]
Message-ID: <0a3061a3f443c86cf3b38007e85c51d94e9d7845.camel@suse.com> (raw)
In-Reply-To: <yq1y25n8lpb.fsf@ca-mkp.ca.oracle.com>

Hi Martin, Lukas,

On Tue, 2021-11-16 at 23:35 -0500, Martin K. Petersen wrote:
> 
> Lukas,
> 
> > My understanding always was that the request needs to be both
> > properly aligned and of a certain minimum size (discard_granularity)
> > to take effect.
> > 
> > If what you're saying is true then I feel like the documentation of
> > discard_granularity
> > 
> > Documentation/ABI/testing/sysfs-block
> > Documentation/block/queue-sysfs.rst
> > 
> > should change because that's not how I understand the notion of
> > internal allocation unit. However the sentence you quoted is not
> > entirely clear to me either so I'll leave that decision to someone
> > who understands it better than me.
> > 
> > Martin could you please clarify it for us ?
> 
> The rationale behind exposing the discard granularity to userland was
> to facilitate mkfs.* picking allocation units that were friendly wrt.
> the device's internal granularity. The nature of that granularity
> depends on the type of device (thin provisioned, resource provisioned,
> SSD, etc.).
> 
> Just like the other queue limits the discard granularity was meant as a
> hint (some of that hintiness got lost as a result of conflating zeroing
> and deallocating but that has since been resolved).
> 
> It is true that some devices get confused if you submit discards that
> are smaller than their internal granularity. However, those devices are
> typically the ones that don't actually support reporting that
> granularity in the first place! Whereas SCSI devices generally don't
> care. They'll happily ignore any parts of the request that are smaller
> than whatever size they use internally.
> 
> One of the problems with "optimizing" away pieces that are smaller than
> the reported granularity is when you combine devices. Say you have a
> RAID1 of an SSD that reports 4096 bytes and one that reports 256MB. The
> stacked device will then have a discard granularity of 256MB and thus
> the SSD with the smaller granularity won't receive any of the discards
> that might otherwise help it manage its media.

I've checked btrfs and xfs, and they treat the minlen like Jan's patch
would: the minlen is raised to the device's discard granularity, and
discarding smaller ranges is silently not even attempted.

So this seems to be common practice among file system implementations,
and thus Jan's patch would make ext4 behave like the others. But I'm
still uncertain if this behavior is ideal, and your remarks seem to
confirm that.

The fstrim man page text is highly misleading. "The default value is
zero, discarding every free block" is obviously wrong, given the
current actual behavior of the major filesystems.

The way this is currently implemented, the user has no way to actually
"discard every free block" to the extent supported by the device. I
think that being able to do this would be desirable, at least in
certain situations.

If we changed this, then with the default value of 0 used by fstrim,
file systems would attempt to discard every free block, which is slow
and would likely either fail or have no effect on many devices. Maybe
we should treat the value "0" like "automatic", i.e. adjust the minlen
to the device's granularity like we do now, but leave the value
unchanged if the user explicitly sets a small non-zero value? That way
"fstrim -m 512" could be used to force discarding every block that can
be discarded.

Regards
Martin


Thread overview: 8+ messages
     [not found] <20211112152202.26614-1-jack@suse.cz>
     [not found] ` <20211115114821.swt3nqtw2pdgahsq@work>
     [not found]   ` <20211115125141.GD23412@quack2.suse.cz>
     [not found]     ` <59b60aae9401a043f7d7cec0f8004f2ca7d4f4db.camel@suse.com>
     [not found]       ` <20211115145312.g4ptf22rl55jf37l@work>
     [not found]         ` <4e4d1ac7735c38f1395db19b97025bf411982b60.camel@suse.com>
2021-11-16 11:56           ` [PATCH] ext4: Avoid trim error on fs with small groups Lukas Czerner
2021-11-17  4:35             ` Martin K. Petersen
2021-11-19 15:43               ` Martin Wilck [this message]
2021-11-22 13:53                 ` Lukas Czerner
2022-01-03 15:34                   ` Martin Wilck
2022-01-03 18:59                     ` Lukas Czerner
2022-01-04 14:55                       ` Jan Kara
2022-01-04 15:05                         ` Martin Wilck
