From: Lukas Czerner <lczerner@redhat.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, axboe@kernel.dk,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] block: reintroduce discard_zeroes_data sysfs file and BLKDISCARDZEROES
Date: Thu, 17 Aug 2017 09:47:44 +0200
Message-ID: <20170817074744.blcsel6w2ctbjjd4@rh_laptop>
In-Reply-To: <yq1inhnneu5.fsf@oracle.com>
On Wed, Aug 16, 2017 at 09:49:22PM -0400, Martin K. Petersen wrote:
> e standards tweaked the definitions a bit so the semantics became
> even more confusing and harder to honor in the drivers.
>
> As a result, we changed things so that discards are only used to
> de-provision blocks. And the zeroout call/ioctl is used to zero block
> ranges.
>
> Which ATA/SCSI/NVMe command is issued on the back-end depends on what's
> supported by the device and is hidden from the caller.
>
> However, zeroout is guaranteed to return a zeroed block range on
> subsequent reads. The blocks may be unmapped, anchored, written
> explicitly, written with write same, or a combination thereof. But you
> are guaranteed predictable results.
>
> Whereas a discarded region may be sliced and diced and rounded off
> before it hits the device. Which is then free to ignore all or parts of
> the request.
>
> Consequently, discard_zeroes_data is meaningless. Because there is no
> guarantee that all of the discarded blocks will be acted upon. It
> kinda-sorta sometimes worked (if the device was whitelisted, had a
> reported alignment of 0, a granularity of 512 bytes, stacking didn't get
> in the way, and you were lucky on the device end). But there were always
> conditions.
Thanks for the detailed explanation. That's very useful to know!
>
> So taking a step back: What information specifically were you trying to
> obtain from querying that flag? And why do you need it?
There are many users that have historically benefited from the
"discard_zeroes_data" semantics. One example is mkfs: it's beneficial
to discard the blocks before creating a file system, and if we also get
deterministic zeroes on read, even better, since we then do not have to
initialize some portions of the file system manually.
Another example is virtualization, where they can support
efficient "Wipe After Delete" and "Enable Discard" when the device
reports "discard_zeroes_data". I am sure there are other examples.
So I understand now that Deterministic Read Zero after TRIM is not
reliable, so we do not want to use that flag because we can't guarantee
it in that case. However, there are other situations where we can, such
as a loop device (which might be especially useful for VMs) whose backing
file system supports punch hole, or even SCSI write same with UNMAP?
Currently user space can call fallocate with FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE; however, if that succeeds, we're only guaranteed
that the range has been zeroed, not that it has been unmapped/discarded?
(That's not very clear from the comments.) None of the modes seems to
guarantee both zeroout and unmap on success. And still, there seems to
be no way to tell from user space what's actually supported without
ending up calling fallocate, is there? Before, we had discard_zeroes_data,
which people learned to rely on in certain situations, even though it
might have been shaky.
I actually like the rewrite Christoph did, even though the documentation
seems to be lacking. But I wonder whether it's possible to bring back
the former functionality, at least in some form.
Thanks!
-Lukas
>
> --
> Martin K. Petersen Oracle Linux Engineering
Thread overview: 12+ messages
2017-08-16 13:19 [PATCH] block: reintroduce discard_zeroes_data sysfs file and BLKDISCARDZEROES Lukas Czerner
2017-08-16 15:18 ` Christoph Hellwig
2017-08-16 15:48 ` Lukas Czerner
2017-08-17 1:49 ` Martin K. Petersen
2017-08-17 7:47 ` Lukas Czerner [this message]
2017-08-17 8:17 ` Christoph Hellwig
2017-08-17 8:41 ` Lukas Czerner
2017-08-17 9:52 ` Christoph Hellwig
2017-08-17 13:35 ` Lukas Czerner
2017-08-17 17:47 ` Martin K. Petersen
2017-08-17 19:35 ` Lukas Czerner
2017-08-17 20:39 ` Theodore Ts'o