Re: discard feature, mkfs.ext4 and mmc default fallback to normal erase op

From: "Theodore Y. Ts'o" <tytso@mit.edu>
To: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Michael Walle <michael@walle.cc>,
	linux-ext4@vger.kernel.org,
	"linux-mmc@vger.kernel.org" <linux-mmc@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>
Subject: Re: discard feature, mkfs.ext4 and mmc default fallback to normal erase op
Date: Wed, 9 Dec 2020 11:35:33 -0500
Message-ID: <20201209163533.GI52960@mit.edu>
In-Reply-To: <CAPDyKFrpccdMyqsxTi2dotbtr3_7hL0hUjWXc5Sx52kGnrDuLw@mail.gmail.com>

On Wed, Dec 09, 2020 at 03:51:24PM +0100, Ulf Hansson wrote:
> 
> Even if the discarded blocks are flushed at some wisely selected
> point, when the device is idle, that doesn't guarantee that the
> internal garbage collection runs inside the device. In the end that
> depends on the FW implementation of the card - and I assume it's
> likely triggered based on some internal idle time and the amount of
> "garbage" there is to deal with.

At least from a file system perspective, I don't care when the
internal garbage collection actually runs inside the device.  What I
do care about is that (a) a read of a discarded sector returns zeros
after it has been discarded (or the storage device needs to tell me I
can't count on that), and (b) that, for write endurance reasons, the
garbage collection will *eventually* happen.
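
As a rough illustration of point (a), the sketch below reads the discard-related queue attributes that Linux exposes under sysfs for a block device. The function names and the `sysfs_root` parameter are made up for this example. The "supported" check is an assumption based on the kernel treating a zero discard limit as "no discard". The `discard_zeroes_data` attribute is documented as always reading 0 on current kernels, so in practice the answer to "can I count on zeros?" is "no".

```python
from pathlib import Path

# Attribute names as found under /sys/block/<disk>/queue/ on Linux.
DISCARD_ATTRS = ("discard_granularity", "discard_max_bytes", "discard_zeroes_data")


def read_discard_limits(disk, sysfs_root="/sys/block"):
    """Return the discard-related queue attributes for `disk` as integers.

    Attributes that are missing or unreadable are reported as None.
    """
    queue = Path(sysfs_root) / disk / "queue"
    limits = {}
    for name in DISCARD_ATTRS:
        try:
            limits[name] = int((queue / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


def discard_supported(limits):
    """Assumption: a zero (or missing) granularity or limit means no discard."""
    return bool(limits.get("discard_granularity")) and bool(limits.get("discard_max_bytes"))


def reads_zero_after_discard(limits):
    """True only if the kernel explicitly promises zeros after a discard.

    On current kernels this attribute always reads 0, so expect False.
    """
    return limits.get("discard_zeroes_data") == 1
```

For example, `read_discard_limits("mmcblk0")` would inspect an eMMC or SD device if one exists under that name; the device name is only an example.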

If the list of erase blocks or flash pages that are not in use is
tracked in such a way that they are actually garbage collected before
the device actually needs free blocks, it really doesn't matter if it
happens right away, or hours later.  (If the device is 90% free,
because it was just formatted and we did a pre-discard at format time,
then it could happen hours or days later.)
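
The argument above can be illustrated with a toy model. This is not how any real FTL works; the class, its method names, and the three-state bookkeeping are all invented for the example. It only shows that as long as discarded blocks stay on a tracked list, erasing them can be deferred until the free pool actually runs dry.

```python
class ToyFTL:
    """Toy model: each erase block is free, in use, or stale (discarded, not yet erased)."""

    def __init__(self, num_blocks):
        self.free = set(range(num_blocks))
        self.in_use = set()
        self.stale = set()
        self.erase_count = 0

    def write(self):
        """Allocate one erase block, reclaiming stale blocks only if the free pool is empty."""
        if not self.free:
            self.collect()
        if not self.free:
            raise RuntimeError("device full: no free or stale blocks left")
        block = self.free.pop()
        self.in_use.add(block)
        return block

    def discard(self, block):
        """Record that a block's contents are no longer needed; nothing is erased yet."""
        if block in self.in_use:
            self.in_use.remove(block)
            self.stale.add(block)

    def collect(self, limit=None):
        """Erase up to `limit` stale blocks (all of them when limit is None)."""
        victims = list(self.stale)[:limit]
        for block in victims:
            self.stale.remove(block)
            self.free.add(block)
            self.erase_count += 1
        return len(victims)
```

Filling a four-block `ToyFTL`, discarding two blocks, and then writing again performs no erase at discard time and two erases at the next write. A model that dropped entries from `stale` would instead run out of blocks, which is the "loses track" failure described in the next paragraph.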

But if the device's FTL is too incompetent such that it loses track of
which erase blocks / flash pages do need to be GC'ed, such that it
impacts device lifetime... well then, that's sad, and it would be nice
to find out about this without having to do an expensive,
time-consuming certification process.  (OTOH, all the big companies
are doing hardware certifications anyway, because you can't fully
trust the storage vendors, and how many storage vendors are really
going to admit, or make it easy to determine, "the FTL is so
cost-optimized that it's cr*p"?  :-)

Having a way to tell the storage device that it would be better to
suspend GC, or to accelerate GC, because we know the device is about
to become much less likely to perform writes, would certainly be a
good and useful thing to do.  I see that as mostly being useful for
improving I/O performance, though, especially for low-end flash.  I
suspect that high-end SSDs, which are designed to handle continuous
write streams without much performance degradation, have enough oomph
in their internal CPU that they can do GC in real time while the
device is under a continuous random write workload, with only minimal
performance impact.

> *) Use the runtime PM framework to detect an idle period and then
> trigger background operations. The problem is, that we don't really
> know how long we will be idle, meaning that we don't know if it's
> really a wise decision to trigger the background operations in the
> end.
> 
> **) Invent a new type of generic block request, as to let userspace
> trigger this.

I think you really want to give userspace the ability to trigger this.
Whether it's via a generic block request, or an ioctl, I'll leave that
to the people who maintain the driver and/or block layer.  That's
because userspace will have knowledge of things like "the screen is
off", or "the phone is on the wireless charger", and/or the user has
said "OK, Google, goodnight" to trigger the night-time home automation
commands.
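
A minimal sketch of the kind of decision userspace could make from those signals follows. Everything here is hypothetical: the `DeviceState` fields, the function name, and the particular combination of signals are assumptions for illustration. The mechanism for actually delivering the hint (block request or ioctl) is the part left open above, so the sketch stops at the decision.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of what userspace knows about the system."""
    screen_off: bool
    on_charger: bool
    user_said_goodnight: bool = False


def should_request_background_gc(state):
    """Return True when userspace expects a long quiet period.

    Assumed policy: an explicit "goodnight" is enough on its own;
    otherwise require both the screen to be off and the device to be charging.
    """
    if state.user_said_goodnight:
        return True
    return state.screen_off and state.on_charger
```

Requiring two passive signals but only one explicit one is a design choice of this sketch, not something proposed in the thread.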

We can of course try to make some automatic determinations based on
the runtime PM framework, but that doesn't necessarily tell us the
likelihood that the system will become busy in the future; OTOH, maybe
that doesn't matter, if the storage needs only a very tiny amount of
time after it's told, "stop GC", to finish up what it's doing so it
can respond to I/O requests at full speed?

						- Ted

Thread overview: 11+ messages
2020-12-07 15:10 discard feature, mkfs.ext4 and mmc default fallback to normal erase op Michael Walle
2020-12-07 18:35 ` Theodore Y. Ts'o
2020-12-07 20:39   ` Michael Walle
2020-12-08  2:40     ` Theodore Y. Ts'o
2020-12-08  9:49       ` Ulf Hansson
2020-12-08 11:26         ` Michael Walle
2020-12-08 16:17           ` Ulf Hansson
2020-12-08 20:57             ` Michael Walle
2020-12-08 16:52           ` Theodore Y. Ts'o
2020-12-09 14:51             ` Ulf Hansson
2020-12-09 16:35               ` Theodore Y. Ts'o [this message]
