From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from aserp1040.oracle.com ([141.146.126.69]:43415 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751552AbdHQBte (ORCPT ); Wed, 16 Aug 2017 21:49:34 -0400
To: Lukas Czerner
Cc: linux-fsdevel@vger.kernel.org, axboe@kernel.dk, martin.petersen@oracle.com, Christoph Hellwig
Subject: Re: [PATCH] block: reintroduce discard_zeroes_data sysfs file and BLKDISCARDZEROES
From: "Martin K. Petersen"
References: <1502889581-19483-1-git-send-email-lczerner@redhat.com> <20170816151803.GB18932@lst.de> <20170816154845.y5kcq3ssbp7efduy@rh_laptop>
Date: Wed, 16 Aug 2017 21:49:22 -0400
In-Reply-To: <20170816154845.y5kcq3ssbp7efduy@rh_laptop> (Lukas Czerner's message of "Wed, 16 Aug 2017 17:48:45 +0200")
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Lukas,

> I'd like to be able to recognize where we have a device that does
> support write zero with unmap, TRIM with RZAT and whatever else that
> provides this.

The problem was that the original REQ_DISCARD was used both for de-provisioning block ranges and for zeroing them. On top of that, the storage standards tweaked the definitions a bit so the semantics became even more confusing and harder to honor in the drivers.

As a result, we changed things so that discards are only used to de-provision blocks. And the zeroout call/ioctl is used to zero block ranges.

Which ATA/SCSI/NVMe command is issued on the back-end depends on what's supported by the device and is hidden from the caller.

However, zeroout is guaranteed to return a zeroed block range on subsequent reads. The blocks may be unmapped, anchored, written explicitly, written with write same, or a combination thereof. But you are guaranteed predictable results.

Whereas a discarded region may be sliced and diced and rounded off before it hits the device. Which is then free to ignore all or parts of the request.

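To make the distinction above concrete, here is a minimal userspace sketch of the two calls, assuming Linux and the BLKDISCARD/BLKZEROOUT request numbers from <linux/fs.h> as encoded on x86 and arm. The helper names (pack_range, discard, zeroout, whole_granules) are made up for illustration, and whole_granules() is only a simplified model of the "rounded off" behaviour, not the kernel's actual splitting logic.

```python
import fcntl
import struct

# ioctl request numbers from <linux/fs.h>: _IO(0x12, 119) and _IO(0x12, 127).
# Assumption: these values hold where _IO() adds no direction bits
# (x86, arm); other architectures may encode them differently.
BLKDISCARD = 0x1277
BLKZEROOUT = 0x127F


def pack_range(offset, length):
    """Pack the uint64_t range[2] = {start, length} argument, in bytes."""
    return struct.pack("=QQ", offset, length)


def discard(fd, offset, length):
    """De-provision hint only: later reads are not guaranteed to be zero."""
    fcntl.ioctl(fd, BLKDISCARD, pack_range(offset, length))


def zeroout(fd, offset, length):
    """Zero the range: later reads return zeroes, whatever command was used."""
    fcntl.ioctl(fd, BLKZEROOUT, pack_range(offset, length))


def whole_granules(offset, length, granularity, alignment=0):
    """Hypothetical model: keep only the granules the range fully covers.

    Returns (start, length) in bytes, or (0, 0) if no whole granule fits.
    """
    # Round the start up and the end down to granule boundaries.
    first = -(-(offset - alignment) // granularity)
    last = (offset + length - alignment) // granularity
    if last <= first:
        return (0, 0)
    return (alignment + first * granularity, (last - first) * granularity)
```

Both ioctls need a block device opened for writing and sector-aligned offsets and lengths, and both destroy data in the range, so this is something to try on a scratch device only. In this model a 10000-byte discard at offset 1000 with a 4096-byte granularity shrinks to the single granule at 4096, which is why a discarded range says nothing definite about what a later read returns.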
Consequently, discard_zeroes_data is meaningless. Because there is no guarantee that all of the discarded blocks will be acted upon. It kinda-sorta sometimes worked (if the device was whitelisted, had a reported alignment of 0, a granularity of 512 bytes, stacking didn't get in the way, and you were lucky on the device end). But there were always conditions.

So taking a step back: What information specifically were you trying to obtain from querying that flag? And why do you need it?

-- 
Martin K. Petersen
Oracle Linux Engineering