From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from verein.lst.de ([213.95.11.211]:44581 "EHLO newverein.lst.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751628AbdHQIRN (ORCPT ); Thu, 17 Aug 2017 04:17:13 -0400
Date: Thu, 17 Aug 2017 10:17:11 +0200
From: Christoph Hellwig
To: "Martin K. Petersen" , linux-fsdevel@vger.kernel.org, axboe@kernel.dk,
	Christoph Hellwig
Subject: Re: [PATCH] block: reintroduce discard_zeroes_data sysfs file and
	BLKDISCARDZEROES
Message-ID: <20170817081711.GA24626@lst.de>
References: <1502889581-19483-1-git-send-email-lczerner@redhat.com>
	<20170816151803.GB18932@lst.de>
	<20170816154845.y5kcq3ssbp7efduy@rh_laptop>
	<20170817074744.blcsel6w2ctbjjd4@rh_laptop>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170817074744.blcsel6w2ctbjjd4@rh_laptop>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu, Aug 17, 2017 at 09:47:44AM +0200, Lukas Czerner wrote:
> There are many users that historically benefit from the
> "discard_zeroes_data" semantics. For example mkfs, where it's beneficial
> to discard the blocks before creating a file system and if we also get
> deterministic zeroes on read, even better since we do not have to
> initialize some portions of the file system manually.

But that's not what discard_zeroes_data gives you unfortunately.

> So I understand now that Deterministic Read Zero after TRIM is not
> reliable so we do not want to use that flag because we can't guarantee
> it in this case. However there are other situations where we can such
> as loop device (might be especially useful for VM) where backing file
> system supports punch hole, or even SCSI write same with UNMAP ?
>
> Currently user space can call fallocate with FALLOC_FL_PUNCH_HOLE |
> FALLOC_FL_KEEP_SIZE however if that succeeds we're only guaranteed that
> the range has been zeroed, not unmapped/discarded ? (that's not very
> clear from the comments).
> None of the modes seems to guarantee both
> zeroout and unmap in case of success. However still, there seem to be no
> way to tell what's actually supported from user space without ending up
> calling fallocate, is there ? While before we had discard_zeroes_data
> which people learned to rely on in certain situations, even though it
> might have been shaky.

You never get (and never got) a guarantee that the blocks were unmapped,
as none of the storage protocols ever requires the device to deallocate.
Because devices have their own internal chunk/block size, anything else
would be impractical.

But fallocate FALLOC_FL_PUNCH_HOLE on a block device sets the REQ_UNMAP
hint, which asks the driver to unmap if at all possible. Note that
unmap or not is not a binary decision - typical devices will deallocate
all whole blocks inside the range, and zero the rest.
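
To make the "not a binary decision" point concrete, here is a minimal
sketch of how a range might split against a device's internal chunk size.
This is an illustration only, not kernel or firmware code: the names
split_range and struct range_split are made up, and the chunk size is an
assumed parameter. A real device picks its own granularity and is still
free to zero everything instead of deallocating.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): describes how the byte range
 * [start, start + len) splits against an assumed device chunk size.
 */
struct range_split {
	uint64_t head_zero;   /* bytes before the first whole chunk: zeroed */
	uint64_t unmap_start; /* offset of the first whole chunk */
	uint64_t unmap_len;   /* bytes in whole chunks: may be deallocated */
	uint64_t tail_zero;   /* bytes after the last whole chunk: zeroed */
};

/* Assumes start + len does not overflow uint64_t. */
static struct range_split split_range(uint64_t start, uint64_t len,
				      uint64_t chunk)
{
	/* Default: nothing can be deallocated, the whole range is zeroed. */
	struct range_split s = { len, start, 0, 0 };
	uint64_t end, first, last;

	if (chunk == 0 || len == 0)
		return s;

	end = start + len;
	first = (start + chunk - 1) / chunk * chunk; /* round start up */
	last = end / chunk * chunk;                  /* round end down */

	if (first < last) {
		/* At least one whole chunk lies inside the range. */
		s.head_zero = first - start;
		s.unmap_start = first;
		s.unmap_len = last - first;
		s.tail_zero = end - last;
	}

	/* Every byte of the range is either zeroed or in a whole chunk. */
	assert(s.head_zero + s.unmap_len + s.tail_zero == len);
	return s;
}
```

With start = 1000, len = 10000 and an assumed 4096-byte chunk, this gives
3096 bytes zeroed at the head, one whole chunk (4096 bytes at offset 4096)
that can be deallocated, and 2808 bytes zeroed at the tail. A range smaller
than one chunk, or one that straddles a single chunk boundary, ends up
zeroed entirely.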