From: Christoph Hellwig <hch@lst.de>
To: "Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-fsdevel@vger.kernel.org, axboe@kernel.dk,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH] block: reintroduce discard_zeroes_data sysfs file and BLKDISCARDZEROES
Date: Thu, 17 Aug 2017 10:17:11 +0200
Message-ID: <20170817081711.GA24626@lst.de>
In-Reply-To: <20170817074744.blcsel6w2ctbjjd4@rh_laptop>

On Thu, Aug 17, 2017 at 09:47:44AM +0200, Lukas Czerner wrote:
> There are many users that historically benefit from the
> "discard_zeroes_data" semantics. For example mkfs, where it's beneficial
> to discard the blocks before creating a file system and if we also get
> deterministic zeroes on read, even better since we do not have to
> initialize some portions of the file system manually.

But that's not what discard_zeroes_data gives you, unfortunately.

> So I understand now that Deterministic Read Zero after TRIM is not
> reliable so we do not want to use that flag because we can't guarantee
> it in this case. However there are other situations where we can, such
> as a loop device (might be especially useful for VMs) where the backing
> file system supports punch hole, or even SCSI WRITE SAME with UNMAP?
> 
> Currently user space can call fallocate with FALLOC_FL_PUNCH_HOLE |
> FALLOC_FL_KEEP_SIZE however if that succeeds we're only guaranteed that
> the range has been zeroed, not unmapped/discarded ? (that's not very
> clear from the comments). None of the modes seems to guarantee both
> zeroout and unmap in case of success. However still, there seems to be no
> way to tell what's actually supported from user space without ending up
> calling fallocate, is there? While before we had discard_zeroes_data
> which people learned to rely on in certain situations, even though it
> might have been shaky.

You never get (and never got) a guarantee that the blocks were unmapped,
as none of the storage protocols ever requires the device to deallocate.
Because devices have their own internal chunk/block size, anything else
would be impractical.

But fallocate FALLOC_FL_PUNCH_HOLE on a block device sets the REQ_UNMAP
hint, which asks the driver to unmap if at all possible.  Note that
unmap or not is not a binary decision - typical devices will deallocate
all whole blocks inside the range, and zero the rest.
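
To make the fallocate call discussed above concrete, here is a minimal
userspace sketch in C. It is an illustration under stated assumptions, not
code from the patch or the kernel tree: the helper name
punch_hole_reads_zero, the sizes, and the 0xAB fill pattern are all made up
for the example. It runs against a regular file rather than a block device,
because punching a hole in a block device destroys data; on a regular file
the file system handles the request. It only checks the one property
user space can observe, namely that the punched range reads back as zeroes.
Whether anything was actually deallocated underneath is not visible here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): write a non-zero pattern to
 * `path`, punch a hole over the middle of it, and check that the punched
 * range reads back as zeroes. The file is removed again before returning.
 *
 * Returns 1 if the range reads as zeroes, 0 if it does not, and -1 on
 * error with errno set (e.g. EOPNOTSUPP if hole punching is unsupported).
 */
int punch_hole_reads_zero(const char *path)
{
    enum { FILE_SIZE = 64 * 1024, HOLE_OFF = 16 * 1024, HOLE_LEN = 32 * 1024 };
    static char buf[FILE_SIZE];
    int result = -1;
    int saved_errno;
    int i;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Fill the file with a recognisable non-zero pattern. */
    memset(buf, 0xAB, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
        goto out;

    /* PUNCH_HOLE must be combined with KEEP_SIZE; the file size is unchanged. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  HOLE_OFF, HOLE_LEN) != 0)
        goto out;

    /* Read the punched range back and verify every byte is zero. */
    if (pread(fd, buf, HOLE_LEN, HOLE_OFF) != (ssize_t)HOLE_LEN)
        goto out;

    result = 1;
    for (i = 0; i < HOLE_LEN; i++) {
        if (buf[i] != 0) {
            result = 0;
            break;
        }
    }

out:
    saved_errno = errno;   /* keep the errno of the failing call */
    close(fd);
    unlink(path);
    errno = saved_errno;
    return result;
}
```

Calling punch_hole_reads_zero("/tmp/punch-test.bin") from a small main()
should return 1 on file systems that implement hole punching. A return of
-1 with errno set to EOPNOTSUPP means the call itself is not supported
there, which is the "you only find out by calling fallocate" situation
described in the quoted mail.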

