* [RFC] Discard update for 3.19 @ 2014-11-07 5:08 Martin K. Petersen 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen ` (5 more replies) 0 siblings, 6 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 5:08 UTC (permalink / raw) To: linux-scsi, linux-ide, linux-fsdevel, neilb This update is mainly motivated by an attempt to make the discard_zeroes_data reporting more accurate. As we have discussed several times in the past we are stuck with pretty weak guarantees from the T10/T13 standards. And as a result we feel compelled to tighten up the scenarios under which we advertise discard_zeroes_data since several applications and subsystems depend on it being accurate. The first patch is the most controversial. It disables discard_zeroes_data for libata devices unless they explicitly have been whitelisted. I had hoped to have a more comprehensive list of drives but I didn't have much luck in procuring the identify strings that would allow me to generate the whitelist matching patterns. I could use some help here. The second patch tweaks the SCSI disk driver to prefer WRITE SAME w/ UNMAP instead of the UNMAP command since the former has deterministic behavior. The lack of a hard discard_zeroes_data guarantees has also prevented us from having a variant of blkdev_issue_zeroout() that discards if possible. The last patch in this series will add such a call that the filesystems and virt block drivers can use to clear and deprovision block ranges. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen @ 2014-11-07 5:08 ` Martin K. Petersen 2014-11-07 8:24 ` Christoph Hellwig ` (2 more replies) 2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen ` (4 subsequent siblings) 5 siblings, 3 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 5:08 UTC (permalink / raw) To: linux-scsi, linux-ide, linux-fsdevel, neilb; +Cc: Martin K. Petersen As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return Zero After Trim) flags in the ATA Command Set are unreliable in the sense that they only define what happens if the device successfully executed the DSM TRIM command. TRIM is only advisory, however, and the device is free to silently ignore all or parts of the request. In practice this renders the DRAT and RZAT flags completely useless and because the results are unpredictable we decided to disable discard in MD for 3.18 to avoid the risk of data corruption. Hardware vendors in the real world obviously need better guarantees than what the standards bodies provide. Unfortuntely those guarantees are encoded in product requirements documents rather than somewhere we can key off of them programatically. So we are compelled to disabling discard_zeroes_data for all devices unless we explicitly have data to support whitelisting them. This patch whitelists SSDs from a few of the main vendors. None of the whitelists are based on written guarantees. They are purely based on empirical evidence collected from internal and external users that have tested or qualified these drives in RAID deployments. The whitelist is only meant as a starting point and is by no means comprehensive: - All intel SSD models except for 510 - Micron M5* - Samsung SSDs - Seagate SSDs Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- drivers/ata/libata-core.c | 18 ++++++++++++++---- drivers/ata/libata-scsi.c | 10 ++++++---- include/linux/libata.h | 1 + 3 files changed, 21 insertions(+), 8 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index c5ba15af87d3..f41f24a8bc21 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4225,10 +4225,20 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER }, /* devices that don't properly handle queued TRIM commands */ - { "Micron_M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Crucial_CT???M500SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Micron_M550*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, - { "Crucial_CT*M550SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + { "Micron_M5?0*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | + ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "Crucial_CT???M5?0SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + + /* + * DRAT/RZAT are weak guarantees. Explicitly black/whitelist + * SSDs that provide reliable zero after TRIM. + */ + { "INTEL*SSDSC2MH*", NULL, 0, }, /* Blacklist intel 510 */ + { "INTEL*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "SSD*INTEL*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "Samsung*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "SAMSUNG*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, + { "ST[1248][0248]0[FH]*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, /* * Some WD SATA-I drives spin up and down erratically when the link diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index 0586f66d70fa..deaa6e34ed4d 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -2515,13 +2515,15 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) rbuf[15] = lowest_aligned; if (ata_id_has_trim(args->id)) { - rbuf[14] |= 0x80; /* TPE */ + rbuf[14] |= 0x80; /* LBPME */ - if (ata_id_has_zero_after_trim(args->id)) - rbuf[14] |= 0x40; /* TPRZ */ + if (ata_id_has_zero_after_trim(args->id) && + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); + rbuf[14] |= 0x40; /* LBPRZ */ + } } } - return 0; } diff --git a/include/linux/libata.h b/include/linux/libata.h index bd5fefeaf548..45ac825b8366 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -421,6 +421,7 @@ enum { ATA_HORKAGE_NO_NCQ_TRIM = (1 << 19), /* don't use queued TRIM */ ATA_HORKAGE_NOLPM = (1 << 20), /* don't use LPM */ ATA_HORKAGE_WD_BROKEN_LPM = (1 << 21), /* some WDs have broken LPM */ + ATA_HORKAGE_ZERO_AFTER_TRIM = (1 << 22),/* guarantees zero after trim */ /* DMA mask for user DMA control: User visible values; DO NOT renumber */ -- 1.9.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen @ 2014-11-07 8:24 ` Christoph Hellwig 2014-11-07 15:37 ` Martin K. Petersen 2014-12-05 16:45 ` Paolo Bonzini 2014-12-05 22:58 ` Elliott, Robert (Server Storage) 2 siblings, 1 reply; 28+ messages in thread From: Christoph Hellwig @ 2014-11-07 8:24 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb On Fri, Nov 07, 2014 at 12:08:12AM -0500, Martin K. Petersen wrote: > if (ata_id_has_trim(args->id)) { > - rbuf[14] |= 0x80; /* TPE */ > + rbuf[14] |= 0x80; /* LBPME */ > > - if (ata_id_has_zero_after_trim(args->id)) > - rbuf[14] |= 0x40; /* TPRZ */ > + if (ata_id_has_zero_after_trim(args->id) && > + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { > + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); I think this should _info, not _warn. Otherwise looks good to me, Reviewed-by: Christoph Hellwig <hch@lst.de> It would be nice if there was a way to trigger the flag from userspace, so that we don't need to rebuild the kernel to add a whitelist entry. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-11-07 8:24 ` Christoph Hellwig @ 2014-11-07 15:37 ` Martin K. Petersen 0 siblings, 0 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 15:37 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: >> + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); Christoph> I think this should _info, not _warn. Fixed. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen 2014-11-07 8:24 ` Christoph Hellwig @ 2014-12-05 16:45 ` Paolo Bonzini 2014-12-05 22:58 ` Elliott, Robert (Server Storage) 2 siblings, 0 replies; 28+ messages in thread From: Paolo Bonzini @ 2014-12-05 16:45 UTC (permalink / raw) To: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb On 07/11/2014 06:08, Martin K. Petersen wrote: > The whitelist is only meant as a starting point and is by no means > comprehensive: > > - All intel SSD models except for 510 > - Micron M5* > - Samsung SSDs > - Seagate SSDs > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > drivers/ata/libata-core.c | 18 ++++++++++++++---- > drivers/ata/libata-scsi.c | 10 ++++++---- > include/linux/libata.h | 1 + > 3 files changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c > index c5ba15af87d3..f41f24a8bc21 100644 > --- a/drivers/ata/libata-core.c > +++ b/drivers/ata/libata-core.c > @@ -4225,10 +4225,20 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { > { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER }, > > /* devices that don't properly handle queued TRIM commands */ > - { "Micron_M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Crucial_CT???M500SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Micron_M550*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > - { "Crucial_CT*M550SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, > + { "Micron_M5?0*", NULL, ATA_HORKAGE_NO_NCQ_TRIM | > + ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "Crucial_CT???M5?0SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. BTW. it's the same hardware as the M550, so probably the same set of quirks should apply to both. Paolo > + > + /* > + * DRAT/RZAT are weak guarantees. Explicitly black/whitelist > + * SSDs that provide reliable zero after TRIM. > + */ > + { "INTEL*SSDSC2MH*", NULL, 0, }, /* Blacklist intel 510 */ > + { "INTEL*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "SSD*INTEL*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "Samsung*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "SAMSUNG*SSD*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > + { "ST[1248][0248]0[FH]*", NULL, ATA_HORKAGE_ZERO_AFTER_TRIM, }, > > /* > * Some WD SATA-I drives spin up and down erratically when the link > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 0586f66d70fa..deaa6e34ed4d 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -2515,13 +2515,15 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) > rbuf[15] = lowest_aligned; > > if (ata_id_has_trim(args->id)) { > - rbuf[14] |= 0x80; /* TPE */ > + rbuf[14] |= 0x80; /* LBPME */ > > - if (ata_id_has_zero_after_trim(args->id)) > - rbuf[14] |= 0x40; /* TPRZ */ > + if (ata_id_has_zero_after_trim(args->id) && > + dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { > + ata_dev_warn(dev, "Enabling discard_zeroes_data\n"); > + rbuf[14] |= 0x40; /* LBPRZ */ > + } > } > } > - > return 0; > } > > diff --git a/include/linux/libata.h b/include/linux/libata.h > index bd5fefeaf548..45ac825b8366 100644 > --- a/include/linux/libata.h > +++ b/include/linux/libata.h > @@ -421,6 +421,7 @@ enum { > ATA_HORKAGE_NO_NCQ_TRIM = (1 << 19), /* don't use queued TRIM */ > ATA_HORKAGE_NOLPM = (1 << 20), /* don't use LPM */ > ATA_HORKAGE_WD_BROKEN_LPM = (1 << 21), /* some WDs have broken LPM */ > + ATA_HORKAGE_ZERO_AFTER_TRIM = (1 << 22),/* guarantees zero after trim */ > > /* DMA mask for user DMA control: User visible values; DO NOT > renumber */ > ^ permalink raw reply [flat|nested] 28+ messages in thread
* RE: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen 2014-11-07 8:24 ` Christoph Hellwig 2014-12-05 16:45 ` Paolo Bonzini @ 2014-12-05 22:58 ` Elliott, Robert (Server Storage) 2014-12-08 15:15 ` Theodore Ts'o 2 siblings, 1 reply; 28+ messages in thread From: Elliott, Robert (Server Storage) @ 2014-12-05 22:58 UTC (permalink / raw) To: Martin K. Petersen, Paolo Bonzini, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de > -----Original Message----- > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of Martin K. Petersen > Sent: Thursday, 06 November, 2014 11:08 PM > To: linux-scsi@vger.kernel.org; linux-ide@vger.kernel.org; linux- > fsdevel@vger.kernel.org; neilb@suse.de > Cc: Martin K. Petersen > Subject: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return > zeroes after TRIM > > As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return > Zero After Trim) flags in the ATA Command Set are unreliable in the > sense that they only define what happens if the device successfully > executed the DSM TRIM command. TRIM is only advisory, however, and the > device is free to silently ignore all or parts of the request. > > In practice this renders the DRAT and RZAT flags completely useless and > because the results are unpredictable we decided to disable discard in > MD for 3.18 to avoid the risk of data corruption. > > Hardware vendors in the real world obviously need better guarantees than > what the standards bodies provide. Unfortuntely those guarantees are > encoded in product requirements documents rather than somewhere we can > key off of them programatically. So we are compelled to disabling > discard_zeroes_data for all devices unless we explicitly have data to > support whitelisting them. > > This patch whitelists SSDs from a few of the main vendors. None of the > whitelists are based on written guarantees. They are purely based on > empirical evidence collected from internal and external users that have > tested or qualified these drives in RAID deployments. > > The whitelist is only meant as a starting point and is by no means > comprehensive: > > - All intel SSD models except for 510 > - Micron M5* > - Samsung SSDs > - Seagate SSDs > That description and Paolo's reply: > From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi- > owner@vger.kernel.org] On Behalf Of Paolo Bonzini > Sent: Friday, 05 December, 2014 10:45 AM ... > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. make me concerned about this whitelist approach. I think you need a manufacturer assertion that this is indeed the design intent; you cannot be certain based on observation from outside. Since the SCSI and ATA standards allow ignoring the hint, it might be honored most of the time, but ignored in some rare cases (e.g., drive firmware has a malloc() failure that only happens when the drive is handling an overtemperature condition and six other problems at the same time). Maybe there should be two levels: * vendor asserts the drive is designed to always honor the hint * community observes the drive always seems to honor the hint and a sysfs flag for users to select the level at which they feel safe. A user running 3 replicas of the data in different sites might be more trusting than a user for which this is the only copy of the data. --- Rob Elliott HP Server Storage ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-12-05 22:58 ` Elliott, Robert (Server Storage) @ 2014-12-08 15:15 ` Theodore Ts'o 2014-12-08 15:28 ` James Bottomley 2014-12-08 22:59 ` One Thousand Gnomes 0 siblings, 2 replies; 28+ messages in thread From: Theodore Ts'o @ 2014-12-08 15:15 UTC (permalink / raw) To: Elliott, Robert (Server Storage) Cc: Martin K. Petersen, Paolo Bonzini, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > make me concerned about this whitelist approach. > > I think you need a manufacturer assertion that this is indeed > the design intent; you cannot be certain based on observation > from outside. How is this different from a manufacturer assertion that they follow a SCSI or ATA standard? There have been cases in the distant past (fortunately) of disk manufacturers that ignored a CACHE FLUSH command just to get higher Winbench scores. Does that mean we can't trust them to do anything right? What I'd suggest instead is that if a vendor states this on a spec sheet --- more than just an e-mail assertion --- so they can be sued if they knowingly misrepresent their product, that we take their word at it. Of course, there will be bugs, which is why we have blacklists, or why we can remove them from the list if it turns out there are edge conditions where it appears the disk doesn't quite do the right thing. After all, we generally take the manufacturer's word that air bags will work as claimed, even if potentially 11 million of them are currently subject to recall. And do we think that "the community" would necessarily be more suited than the vendors and the manufacturer to figure out whether or not air bags or drives are working as desired? That being said, if someone wants to create a open source program which stress tests SSD's to look for cases where it is dropping a requested discard, that would certainly be a good thing to do... Cheers, - Ted ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-12-08 15:15 ` Theodore Ts'o @ 2014-12-08 15:28 ` James Bottomley 2014-12-08 22:59 ` One Thousand Gnomes 1 sibling, 0 replies; 28+ messages in thread From: James Bottomley @ 2014-12-08 15:28 UTC (permalink / raw) To: Theodore Ts'o Cc: Elliott, Robert (Server Storage), Martin K. Petersen, Paolo Bonzini, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de On Mon, 2014-12-08 at 10:15 -0500, Theodore Ts'o wrote: > On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > > > make me concerned about this whitelist approach. > > > > I think you need a manufacturer assertion that this is indeed > > the design intent; you cannot be certain based on observation > > from outside. > > How is this different from a manufacturer assertion that they follow a > SCSI or ATA standard? There have been cases in the distant past > (fortunately) of disk manufacturers that ignored a CACHE FLUSH command > just to get higher Winbench scores. Does that mean we can't trust > them to do anything right? That answer depends on device type manufacturer. USB devices, hell no. ATA devices, maybe and SCSI devices usually. The main problem is usually testing. Consumer devices like USB and (s)ATA rarely get tested on anything but windows. USB devices tend to supply their own driver, so they're on the "we fix it in the driver" model which is why they bite us so badly. (S)ATA usually comply, but they only test what windows exercises, so if windows doesn't do it, chances are it never got tested. SCSI devices still tend to be tested in legacy UNIX environments, which are as diverse as we are. > What I'd suggest instead is that if a vendor states this on a spec > sheet --- more than just an e-mail assertion --- so they can be sued > if they knowingly misrepresent their product, that we take their word > at it. Of course, there will be bugs, which is why we have > blacklists, or why we can remove them from the list if it turns out > there are edge conditions where it appears the disk doesn't quite do > the right thing. > > After all, we generally take the manufacturer's word that air bags > will work as claimed, even if potentially 11 million of them are > currently subject to recall. And do we think that "the community" > would necessarily be more suited than the vendors and the manufacturer > to figure out whether or not air bags or drives are working as > desired? > > That being said, if someone wants to create a open source program > which stress tests SSD's to look for cases where it is dropping a > requested discard, that would certainly be a good thing to do... The purpose of DRAT and RZAT is to enable disk arrays deterministically to use TRIM/Unmap so arrays know what happens to stripes on discard. Arrays are being built of mostly SATA technology these days, so some manufacturers have retargetted to arrays and consumer technology (and are testing the array cases). However, windows doesn't use either feature, so manufacturers not targetting arrays will never test this feature. Hence, in this case, I think a whitelist does make sense. James ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM 2014-12-08 15:15 ` Theodore Ts'o 2014-12-08 15:28 ` James Bottomley @ 2014-12-08 22:59 ` One Thousand Gnomes 1 sibling, 0 replies; 28+ messages in thread From: One Thousand Gnomes @ 2014-12-08 22:59 UTC (permalink / raw) To: Theodore Ts'o Cc: Elliott, Robert (Server Storage), Martin K. Petersen, Paolo Bonzini, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de On Mon, 8 Dec 2014 10:15:59 -0500 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Fri, Dec 05, 2014 at 10:58:09PM +0000, Elliott, Robert (Server Storage) wrote: > > > I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero. > > > > make me concerned about this whitelist approach. > > > > I think you need a manufacturer assertion that this is indeed > > the design intent; you cannot be certain based on observation > > from outside. > > How is this different from a manufacturer assertion that they follow a > SCSI or ATA standard? There have been cases in the distant past > (fortunately) of disk manufacturers that ignored a CACHE FLUSH command > just to get higher Winbench scores. Does that mean we can't trust > them to do anything right? At the time they never promised to honour cache flush. The reason it was became mandatory in the specification was in part so that the vendors could all force each other to play fair. If its "optional" then it's tough..., if they say they meet the standard it's class action 8) If this is a promise then it ought to be good James: "USB devices tend to supply their own driver" has not been true for some years now. Microsoft provide an in-box driver and vendors have the choice of using that or certifying their own via WHQL, which is a bit like choosing between free ice cream and banging your head against a plank cover in nails. Alan ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen @ 2014-11-07 5:08 ` Martin K. Petersen 2014-11-07 8:25 ` Christoph Hellwig 2014-11-10 23:43 ` Paolo Bonzini 2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen ` (3 subsequent siblings) 5 siblings, 2 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 5:08 UTC (permalink / raw) To: linux-scsi, linux-ide, linux-fsdevel, neilb; +Cc: Martin K. Petersen The T10 SBC UNMAP command does not provide any hard guarantees that blocks will return zeroes on a subsequent READ. This is due to the fact that the device server is free to silently ignore all or parts of the request. The only way to ensure that a block consistently returns zeroes after being unmapped is to use WRITE SAME with the UNMAP bit set. Should the device be unable to unmap one or more blocks described by the command it is required to manually write zeroes to them. Until now we have preferred UNMAP over the WRITE SAME variants to accommodate thinly provisioned devices that predated the final SBC-3 spec. This patch changes the heuristic so that we favor WRITE SAME(16) or (10) over UNMAP if these commands are marked as supported in the Logical Block Provisioning VPD page. The patch also disables discard_zeroes_data for devices operating in UNMAP mode. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- drivers/scsi/sd.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index b041eca8955d..95bfb7bfbb9d 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -656,7 +656,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) unsigned int logical_block_size = sdkp->device->sector_size; unsigned int max_blocks = 0; - q->limits.discard_zeroes_data = sdkp->lbprz; + q->limits.discard_zeroes_data = 0; q->limits.discard_alignment = sdkp->unmap_alignment * logical_block_size; q->limits.discard_granularity = @@ -680,11 +680,13 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) case SD_LBP_WS16: max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)SD_MAX_WS16_BLOCKS); + q->limits.discard_zeroes_data = sdkp->lbprz; break; case SD_LBP_WS10: max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)SD_MAX_WS10_BLOCKS); + q->limits.discard_zeroes_data = sdkp->lbprz; break; case SD_LBP_ZERO: @@ -2622,12 +2624,12 @@ static void sd_read_block_limits(struct scsi_disk *sdkp) } else { /* LBP VPD page tells us what to use */ - if (sdkp->lbpu && sdkp->max_unmap_blocks) - sd_config_discard(sdkp, SD_LBP_UNMAP); - else if (sdkp->lbpws) + if (sdkp->lbpws) sd_config_discard(sdkp, SD_LBP_WS16); else if (sdkp->lbpws10) sd_config_discard(sdkp, SD_LBP_WS10); + else if (sdkp->lbpu && sdkp->max_unmap_blocks) + sd_config_discard(sdkp, SD_LBP_UNMAP); else sd_config_discard(sdkp, SD_LBP_DISABLE); } -- 1.9.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP 2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen @ 2014-11-07 8:25 ` Christoph Hellwig 2014-11-10 23:43 ` Paolo Bonzini 1 sibling, 0 replies; 28+ messages in thread From: Christoph Hellwig @ 2014-11-07 8:25 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb Looks good, Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP 2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen 2014-11-07 8:25 ` Christoph Hellwig @ 2014-11-10 23:43 ` Paolo Bonzini 1 sibling, 0 replies; 28+ messages in thread From: Paolo Bonzini @ 2014-11-10 23:43 UTC (permalink / raw) To: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb On 07/11/2014 06:08, Martin K. Petersen wrote: > The T10 SBC UNMAP command does not provide any hard guarantees that > blocks will return zeroes on a subsequent READ. This is due to the fact > that the device server is free to silently ignore all or parts of the > request. > > The only way to ensure that a block consistently returns zeroes after > being unmapped is to use WRITE SAME with the UNMAP bit set. Should the > device be unable to unmap one or more blocks described by the command it > is required to manually write zeroes to them. > > Until now we have preferred UNMAP over the WRITE SAME variants to > accommodate thinly provisioned devices that predated the final SBC-3 > spec. This patch changes the heuristic so that we favor WRITE SAME(16) > or (10) over UNMAP if these commands are marked as supported in the > Logical Block Provisioning VPD page. > > The patch also disables discard_zeroes_data for devices operating in > UNMAP mode. > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > drivers/scsi/sd.c | 10 ++++++---- > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c > index b041eca8955d..95bfb7bfbb9d 100644 > --- a/drivers/scsi/sd.c > +++ b/drivers/scsi/sd.c > @@ -656,7 +656,7 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) > unsigned int logical_block_size = sdkp->device->sector_size; > unsigned int max_blocks = 0; > > - q->limits.discard_zeroes_data = sdkp->lbprz; > + q->limits.discard_zeroes_data = 0; > q->limits.discard_alignment = sdkp->unmap_alignment * > logical_block_size; > q->limits.discard_granularity = > @@ -680,11 +680,13 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode) > case SD_LBP_WS16: > max_blocks = min_not_zero(sdkp->max_ws_blocks, > (u32)SD_MAX_WS16_BLOCKS); > + q->limits.discard_zeroes_data = sdkp->lbprz; > break; > > case SD_LBP_WS10: > max_blocks = min_not_zero(sdkp->max_ws_blocks, > (u32)SD_MAX_WS10_BLOCKS); > + q->limits.discard_zeroes_data = sdkp->lbprz; > break; > > case SD_LBP_ZERO: > @@ -2622,12 +2624,12 @@ static void sd_read_block_limits(struct scsi_disk *sdkp) > > } else { /* LBP VPD page tells us what to use */ > > - if (sdkp->lbpu && sdkp->max_unmap_blocks) > - sd_config_discard(sdkp, SD_LBP_UNMAP); > - else if (sdkp->lbpws) > + if (sdkp->lbpws) > sd_config_discard(sdkp, SD_LBP_WS16); > else if (sdkp->lbpws10) > sd_config_discard(sdkp, SD_LBP_WS10); > + else if (sdkp->lbpu && sdkp->max_unmap_blocks) > + sd_config_discard(sdkp, SD_LBP_UNMAP); > else > sd_config_discard(sdkp, SD_LBP_DISABLE); > } > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen 2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen @ 2014-11-07 5:08 ` Martin K. Petersen 2014-11-07 8:26 ` Christoph Hellwig 2014-11-11 0:04 ` Darrick J. Wong 2014-11-07 12:09 ` [RFC] Discard update for 3.19 Bernd Schubert ` (2 subsequent siblings) 5 siblings, 2 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 5:08 UTC (permalink / raw) To: linux-scsi, linux-ide, linux-fsdevel, neilb; +Cc: Martin K. Petersen blkdev_issue_discard() will zero a given block range on disk. This is done by way of either WRITE SAME or regular WRITE. I.e. the blocks on disk will be written and thus provisioned. There are use cases where the desired behavior is to zero the blocks but unprovision them if possible. The blocks must deterministically contain zeroes when they are subsequently read back. This patch introduces a blkdev_issue_zeroout_discard() call that provides this functionality. If a block device guarantees discard_zeroes_data the new function will use discard to clear the block range. If the device does not support discard_zeroes_data or if the discard request fails we will fall back to blkdev_issue_zeroout() to ensure predictable results. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- block/blk-lib.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- include/linux/blkdev.h | 2 ++ 2 files changed, 44 insertions(+), 2 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index 8411be3c19d3..2ffec6a01c71 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -278,14 +278,18 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, } /** - * blkdev_issue_zeroout - zero-fill a block range + * blkdev_issue_zeroout - zero-fill and provision a block range * @bdev: blockdev to write * @sector: start sector * @nr_sects: number of sectors to write * @gfp_mask: memory allocation flags (for bio_alloc) * * Description: - * Generate and issue number of bios with zerofiled pages. + * Zero-fill a block range. The blocks will be provisioned + * (allocated/anchored) and are guaranteed to return zeroes when read + * back. This function will attempt to use WRITE SAME to optimize the + * process if the block device supports it. Otherwise it will fall back + * to zeroing the blocks using regular WRITE calls. */ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, @@ -305,3 +309,39 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); } EXPORT_SYMBOL(blkdev_issue_zeroout); + +/** + * blkdev_issue_zeroout_discard - zero-fill and attempt to discard block range + * @bdev: blockdev to write + * @sector: start sector + * @nr_sects: number of sectors to write + * @gfp_mask: memory allocation flags (for bio_alloc) + * + * Description: + * Zero-fill a block range. In contrast to blkdev_issue_zeroout() this + * function will attempt to deprovision (deallocate/discard) the blocks + * in question. It will only do so if the underlying device guarantees + * that subsequent READ operations to the block range in question will + * return zeroes. If the device does not provide hard guarantees or if + * the DISCARD attempt should fail the block range will be explicitly + * zeroed using blkdev_issue_zeroout(). + */ + +int blkdev_issue_zeroout_discard(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask) +{ + struct request_queue *q = bdev_get_queue(bdev); + + if (blk_queue_discard(q) && q->limits.discard_zeroes_data) { + unsigned char bdn[BDEVNAME_SIZE]; + + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0)) + return 0; + + bdevname(bdev, bdn); + pr_err("%s: DISCARD failed. Manually zeroing.\n", bdn); + } + + return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); +} +EXPORT_SYMBOL(blkdev_issue_zeroout_discard); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index aac0f9ea952a..078b6e5f488a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1164,6 +1164,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct page *page); extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask); +extern int blkdev_issue_zeroout_discard(struct block_device *bdev, + sector_t sector, sector_t nr_sects, gfp_t gfp_mask); static inline int sb_issue_discard(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) { -- 1.9.3 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen @ 2014-11-07 8:26 ` Christoph Hellwig 2014-11-07 15:42 ` Martin K. Petersen 2014-11-11 0:04 ` Darrick J. Wong 1 sibling, 1 reply; 28+ messages in thread From: Christoph Hellwig @ 2014-11-07 8:26 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote: > blkdev_issue_discard() will zero a given block range on disk. This is > done by way of either WRITE SAME or regular WRITE. I.e. the blocks on > disk will be written and thus provisioned. > > There are use cases where the desired behavior is to zero the blocks but > unprovision them if possible. The blocks must deterministically contain > zeroes when they are subsequently read back. > > This patch introduces a blkdev_issue_zeroout_discard() call that > provides this functionality. If a block device guarantees > discard_zeroes_data the new function will use discard to clear the block > range. If the device does not support discard_zeroes_data or if the > discard request fails we will fall back to blkdev_issue_zeroout() to > ensure predictable results. I'm not a fan of adding another function here and would prefer a flag, but it looks correct, so: Reviewed-by: Christoph Hellwig <hch@lst.de> ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 8:26 ` Christoph Hellwig @ 2014-11-07 15:42 ` Martin K. Petersen 2014-11-07 16:20 ` Theodore Ts'o 0 siblings, 1 reply; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 15:42 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb, Theodore Ts'o >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: Christoph> I'm not a fan of adding another function here and would Christoph> prefer a flag, but it looks correct, That was my original approach too but I didn't want to stomp over all the existing callers. Although there only are few. Ted: Which would you prefer? -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 15:42 ` Martin K. Petersen @ 2014-11-07 16:20 ` Theodore Ts'o 2014-11-07 16:27 ` Martin K. Petersen 0 siblings, 1 reply; 28+ messages in thread From: Theodore Ts'o @ 2014-11-07 16:20 UTC (permalink / raw) To: Martin K. Petersen Cc: Christoph Hellwig, linux-scsi, linux-ide, linux-fsdevel, neilb On Fri, Nov 07, 2014 at 10:42:24AM -0500, Martin K. Petersen wrote: > >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes: > > Christoph> I'm not a fan of adding another function here and would > Christoph> prefer a flag, but it looks correct, > > That was my original approach too but I didn't want to stomp over all > the existing callers. Although there only are few. > > Ted: Which would you prefer? There are *very* few users of blkdev_issue_zeroout(), and aside for a single drbd, they are all in the block layer. It would only start affecting ext4 when we plumb that flag through to sb_issue_zeroout (which your patch doesn't currently do), at which point it will affect 4 call sites in ext4, and a call site in gfs2 and hpfs2. So I'd be in favor of adding a flag to to blkdev_issue_zeroout(), and I would have a slight preference for also modifying sb_issue_zeroout so the flag gets plumbed all the way through to the fs-level callers. Cheers, - Ted ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 16:20 ` Theodore Ts'o @ 2014-11-07 16:27 ` Martin K. Petersen 2014-11-14 20:22 ` Martin K. Petersen 0 siblings, 1 reply; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 16:27 UTC (permalink / raw) To: Theodore Ts'o Cc: Martin K. Petersen, Christoph Hellwig, linux-scsi, linux-ide, linux-fsdevel, neilb >>>>> "Ted" == Theodore Ts'o <tytso@mit.edu> writes: Ted> So I'd be in favor of adding a flag to to blkdev_issue_zeroout(), Ted> and I would have a slight preference for also modifying Ted> sb_issue_zeroout so the flag gets plumbed all the way through to Ted> the fs-level callers. OK. I'll do that, then. I always liked the flag better. I was just trying to minimize the impact. What would you prefer as the default for the ext4 use case? To allocate or to discard? -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 16:27 ` Martin K. Petersen @ 2014-11-14 20:22 ` Martin K. Petersen 2014-11-17 18:53 ` Darrick J. Wong 0 siblings, 1 reply; 28+ messages in thread From: Martin K. Petersen @ 2014-11-14 20:22 UTC (permalink / raw) To: Martin K. Petersen Cc: Theodore Ts'o, Christoph Hellwig, linux-scsi, linux-ide, linux-fsdevel, neilb >>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes: Martin> What would you prefer as the default for the ext4 use case? To Martin> allocate or to discard? I didn't get a preference for whether sb_issue_zeroout() should discard or allocate. But here's an updated patch 3... commit eb23c9e71e08b7f467cbc36990a1a01a94a7b959 Author: Martin K. Petersen <martin.petersen@oracle.com> Date: Thu Nov 6 14:36:05 2014 -0500 block: Add discard flag to blkdev_issue_zeroout() function blkdev_issue_discard() will zero a given block range. This is done by way of explicit writing, thus provisioning or allocating the blocks on disk. There are use cases where the desired behavior is to zero the blocks but unprovision them if possible. The blocks must deterministically contain zeroes when they are subsequently read back. This patch adds a flag to blkdev_issue_zeroout() that provides this variant. If the discard flag is set and a block device guarantees discard_zeroes_data we will use REQ_DISCARD to clear the block range. If the device does not support discard_zeroes_data or if the discard request fails we will fall back to first REQ_WRITE_SAME and then a regular REQ_WRITE. Also update the callers of blkdev_issue_zero() to reflect the new flag and make sb_issue_zeroout() prefer the discard approach. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> diff --git a/block/blk-lib.c b/block/blk-lib.c index 8411be3c19d3..715e948f58a4 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -283,23 +283,45 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, * @sector: start sector * @nr_sects: number of sectors to write * @gfp_mask: memory allocation flags (for bio_alloc) + * @discard: whether to discard the block range * * Description: - * Generate and issue number of bios with zerofiled pages. + + * Zero-fill a block range. If the discard flag is set and the block + * device guarantees that subsequent READ operations to the block range + * in question will return zeroes, the blocks will be discarded. Should + * the discard request fail, if the discard flag is not set, or if + * discard_zeroes_data is not supported, this function will resort to + * zeroing the blocks manually, thus provisioning (allocating, + * anchoring) them. If the block device supports the WRITE SAME command + * blkdev_issue_zeroout() will use it to optimize the process of + * clearing the block range. Otherwise the zeroing will be performed + * using regular WRITE calls. */ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask) + sector_t nr_sects, gfp_t gfp_mask, bool discard) { + struct request_queue *q = bdev_get_queue(bdev); + unsigned char bdn[BDEVNAME_SIZE]; + + if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) { + + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0)) + return 0; + + bdevname(bdev, bdn); + pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn); + } + if (bdev_write_same(bdev)) { - unsigned char bdn[BDEVNAME_SIZE]; if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, ZERO_PAGE(0))) return 0; bdevname(bdev, bdn); - pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn); + pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn); } return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); diff --git a/block/ioctl.c b/block/ioctl.c index 6c7bf903742f..7d8befde2aca 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -198,7 +198,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start, if (start + len > (i_size_read(bdev->bd_inode) >> 9)) return -EINVAL; - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL); + return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false); } static int put_ushort(unsigned long arg, unsigned short val) diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 6960fb064731..ee5b9611c51c 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -1388,7 +1388,7 @@ int drbd_submit_peer_request(struct drbd_device *device, list_add_tail(&peer_req->w.list, &device->active_ee); spin_unlock_irq(&device->resource->req_lock); if (blkdev_issue_zeroout(device->ldev->backing_bdev, - sector, data_size >> 9, GFP_NOIO)) + sector, data_size >> 9, GFP_NOIO, false)) peer_req->flags |= EE_WAS_ERROR; drbd_endio_write_sec_final(peer_req); return 0; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index aac0f9ea952a..f3ad69d8de45 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1163,7 +1163,7 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct page *page); extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask); + sector_t nr_sects, gfp_t gfp_mask, bool discard); static inline int sb_issue_discard(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) { @@ -1177,7 +1177,7 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, return blkdev_issue_zeroout(sb->s_bdev, block << (sb->s_blocksize_bits - 9), nr_blocks << (sb->s_blocksize_bits - 9), - gfp_mask); + gfp_mask, true); } extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm); ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-14 20:22 ` Martin K. Petersen @ 2014-11-17 18:53 ` Darrick J. Wong 0 siblings, 0 replies; 28+ messages in thread From: Darrick J. Wong @ 2014-11-17 18:53 UTC (permalink / raw) To: Martin K. Petersen Cc: Theodore Ts'o, Christoph Hellwig, linux-scsi, linux-ide, linux-fsdevel, neilb On Fri, Nov 14, 2014 at 03:22:05PM -0500, Martin K. Petersen wrote: > >>>>> "Martin" == Martin K Petersen <martin.petersen@oracle.com> writes: > > Martin> What would you prefer as the default for the ext4 use case? To > Martin> allocate or to discard? > > I didn't get a preference for whether sb_issue_zeroout() should discard > or allocate. In the discussions I've had on the ext4 list, we seem to be leaning towards discard and falling back to allocate if necessary. --D > > But here's an updated patch 3... > > commit eb23c9e71e08b7f467cbc36990a1a01a94a7b959 > Author: Martin K. Petersen <martin.petersen@oracle.com> > Date: Thu Nov 6 14:36:05 2014 -0500 > > block: Add discard flag to blkdev_issue_zeroout() function > > blkdev_issue_discard() will zero a given block range. This is done by > way of explicit writing, thus provisioning or allocating the blocks on > disk. > > There are use cases where the desired behavior is to zero the blocks but > unprovision them if possible. The blocks must deterministically contain > zeroes when they are subsequently read back. > > This patch adds a flag to blkdev_issue_zeroout() that provides this > variant. If the discard flag is set and a block device guarantees > discard_zeroes_data we will use REQ_DISCARD to clear the block range. If > the device does not support discard_zeroes_data or if the discard > request fails we will fall back to first REQ_WRITE_SAME and then a > regular REQ_WRITE. > > Also update the callers of blkdev_issue_zero() to reflect the new flag > and make sb_issue_zeroout() prefer the discard approach. > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 8411be3c19d3..715e948f58a4 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -283,23 +283,45 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > * @sector: start sector > * @nr_sects: number of sectors to write > * @gfp_mask: memory allocation flags (for bio_alloc) > + * @discard: whether to discard the block range > * > * Description: > - * Generate and issue number of bios with zerofiled pages. > + > + * Zero-fill a block range. If the discard flag is set and the block > + * device guarantees that subsequent READ operations to the block range > + * in question will return zeroes, the blocks will be discarded. Should > + * the discard request fail, if the discard flag is not set, or if > + * discard_zeroes_data is not supported, this function will resort to > + * zeroing the blocks manually, thus provisioning (allocating, > + * anchoring) them. If the block device supports the WRITE SAME command > + * blkdev_issue_zeroout() will use it to optimize the process of > + * clearing the block range. Otherwise the zeroing will be performed > + * using regular WRITE calls. > */ > > int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > - sector_t nr_sects, gfp_t gfp_mask) > + sector_t nr_sects, gfp_t gfp_mask, bool discard) > { > + struct request_queue *q = bdev_get_queue(bdev); > + unsigned char bdn[BDEVNAME_SIZE]; > + > + if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data) { > + > + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0)) > + return 0; > + > + bdevname(bdev, bdn); > + pr_warn("%s: DISCARD failed. Manually zeroing.\n", bdn); > + } > + > if (bdev_write_same(bdev)) { > - unsigned char bdn[BDEVNAME_SIZE]; > > if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask, > ZERO_PAGE(0))) > return 0; > > bdevname(bdev, bdn); > - pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn); > + pr_warn("%s: WRITE SAME failed. Manually zeroing.\n", bdn); > } > > return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); > diff --git a/block/ioctl.c b/block/ioctl.c > index 6c7bf903742f..7d8befde2aca 100644 > --- a/block/ioctl.c > +++ b/block/ioctl.c > @@ -198,7 +198,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start, > if (start + len > (i_size_read(bdev->bd_inode) >> 9)) > return -EINVAL; > > - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL); > + return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false); > } > > static int put_ushort(unsigned long arg, unsigned short val) > diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c > index 6960fb064731..ee5b9611c51c 100644 > --- a/drivers/block/drbd/drbd_receiver.c > +++ b/drivers/block/drbd/drbd_receiver.c > @@ -1388,7 +1388,7 @@ int drbd_submit_peer_request(struct drbd_device *device, > list_add_tail(&peer_req->w.list, &device->active_ee); > spin_unlock_irq(&device->resource->req_lock); > if (blkdev_issue_zeroout(device->ldev->backing_bdev, > - sector, data_size >> 9, GFP_NOIO)) > + sector, data_size >> 9, GFP_NOIO, false)) > peer_req->flags |= EE_WAS_ERROR; > drbd_endio_write_sec_final(peer_req); > return 0; > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index aac0f9ea952a..f3ad69d8de45 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1163,7 +1163,7 @@ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, > extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, > sector_t nr_sects, gfp_t gfp_mask, struct page *page); > extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > - sector_t nr_sects, gfp_t gfp_mask); > + sector_t nr_sects, gfp_t gfp_mask, bool discard); > static inline int sb_issue_discard(struct super_block *sb, sector_t block, > sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) > { > @@ -1177,7 +1177,7 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, > return blkdev_issue_zeroout(sb->s_bdev, > block << (sb->s_blocksize_bits - 9), > nr_blocks << (sb->s_blocksize_bits - 9), > - gfp_mask); > + gfp_mask, true); > } > > extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm); > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen 2014-11-07 8:26 ` Christoph Hellwig @ 2014-11-11 0:04 ` Darrick J. Wong 2014-11-11 2:33 ` Martin K. Petersen 2014-11-17 19:28 ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong 1 sibling, 2 replies; 28+ messages in thread From: Darrick J. Wong @ 2014-11-11 0:04 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb, linux-api On Fri, Nov 07, 2014 at 12:08:14AM -0500, Martin K. Petersen wrote: > blkdev_issue_discard() will zero a given block range on disk. This is > done by way of either WRITE SAME or regular WRITE. I.e. the blocks on > disk will be written and thus provisioned. > > There are use cases where the desired behavior is to zero the blocks but > unprovision them if possible. The blocks must deterministically contain > zeroes when they are subsequently read back. > > This patch introduces a blkdev_issue_zeroout_discard() call that > provides this functionality. If a block device guarantees > discard_zeroes_data the new function will use discard to clear the block > range. If the device does not support discard_zeroes_data or if the > discard request fails we will fall back to blkdev_issue_zeroout() to > ensure predictable results. Can this be plumbed into a BLK* ioctl too? I'll write a patch, if this is ok with everyone: struct blkzeroout_t { __u64 start; __u64 end; __u32 flags; }; #define BLKZEROOUT_DISCARD_OK 1 #define BLKZEROOUT_V2 _IOR(0x12, 127, sizeof(struct blkzeroout_t)) ...and make it zap the page cache per earlier discussion. This seems to be a good fit with what we've been discussing for mke2fs. --D > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> > --- > block/blk-lib.c | 44 ++++++++++++++++++++++++++++++++++++++++++-- > include/linux/blkdev.h | 2 ++ > 2 files changed, 44 insertions(+), 2 deletions(-) > > diff --git a/block/blk-lib.c b/block/blk-lib.c > index 8411be3c19d3..2ffec6a01c71 100644 > --- a/block/blk-lib.c > +++ b/block/blk-lib.c > @@ -278,14 +278,18 @@ static int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > } > > /** > - * blkdev_issue_zeroout - zero-fill a block range > + * blkdev_issue_zeroout - zero-fill and provision a block range > * @bdev: blockdev to write > * @sector: start sector > * @nr_sects: number of sectors to write > * @gfp_mask: memory allocation flags (for bio_alloc) > * > * Description: > - * Generate and issue number of bios with zerofiled pages. > + * Zero-fill a block range. The blocks will be provisioned > + * (allocated/anchored) and are guaranteed to return zeroes when read > + * back. This function will attempt to use WRITE SAME to optimize the > + * process if the block device supports it. Otherwise it will fall back > + * to zeroing the blocks using regular WRITE calls. > */ > > int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > @@ -305,3 +309,39 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); > } > EXPORT_SYMBOL(blkdev_issue_zeroout); > + > +/** > + * blkdev_issue_zeroout_discard - zero-fill and attempt to discard block range > + * @bdev: blockdev to write > + * @sector: start sector > + * @nr_sects: number of sectors to write > + * @gfp_mask: memory allocation flags (for bio_alloc) > + * > + * Description: > + * Zero-fill a block range. In contrast to blkdev_issue_zeroout() this > + * function will attempt to deprovision (deallocate/discard) the blocks > + * in question. It will only do so if the underlying device guarantees > + * that subsequent READ operations to the block range in question will > + * return zeroes. If the device does not provide hard guarantees or if > + * the DISCARD attempt should fail the block range will be explicitly > + * zeroed using blkdev_issue_zeroout(). > + */ > + > +int blkdev_issue_zeroout_discard(struct block_device *bdev, sector_t sector, > + sector_t nr_sects, gfp_t gfp_mask) > +{ > + struct request_queue *q = bdev_get_queue(bdev); > + > + if (blk_queue_discard(q) && q->limits.discard_zeroes_data) { > + unsigned char bdn[BDEVNAME_SIZE]; > + > + if (!blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0)) > + return 0; > + > + bdevname(bdev, bdn); > + pr_err("%s: DISCARD failed. Manually zeroing.\n", bdn); > + } > + > + return blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask); > +} > +EXPORT_SYMBOL(blkdev_issue_zeroout_discard); > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index aac0f9ea952a..078b6e5f488a 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1164,6 +1164,8 @@ extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, > sector_t nr_sects, gfp_t gfp_mask, struct page *page); > extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, > sector_t nr_sects, gfp_t gfp_mask); > +extern int blkdev_issue_zeroout_discard(struct block_device *bdev, > + sector_t sector, sector_t nr_sects, gfp_t gfp_mask); > static inline int sb_issue_discard(struct super_block *sb, sector_t block, > sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) > { > -- > 1.9.3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function 2014-11-11 0:04 ` Darrick J. Wong @ 2014-11-11 2:33 ` Martin K. Petersen 2014-11-17 19:28 ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong 1 sibling, 0 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-11 2:33 UTC (permalink / raw) To: Darrick J. Wong Cc: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb, linux-api >>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes: Darrick> Can this be plumbed into a BLK* ioctl too? I'll write a patch, Darrick> if this is ok with everyone: Darrick> ...and make it zap the page cache per earlier discussion. This Darrick> seems to be a good fit with what we've been discussing for Darrick> mke2fs. That sounds good to me. I'll get the updated patch out tomorrow. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] block: create ioctl to discard-or-zeroout a range of blocks 2014-11-11 0:04 ` Darrick J. Wong 2014-11-11 2:33 ` Martin K. Petersen @ 2014-11-17 19:28 ` Darrick J. Wong 1 sibling, 0 replies; 28+ messages in thread From: Darrick J. Wong @ 2014-11-17 19:28 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb, linux-api Create a new ioctl to expose the block layer's newfound ability to issue either a zeroing discard, a WRITE SAME with a zero page, or a regular write with the zero page. This BLKZEROOUT2 ioctl takes {start, length, flags} as parameters. So far, the only flag available is to enable the zeroing discard part -- without it, the call invokes the old BLKZEROOUT behavior. start and length have the same meaning as in BLKZEROOUT. Furthermore, because BLKZEROOUT2 issues commands directly to the storage device, we must invalidate the page cache (as a regular O_DIRECT write would do) to avoid returning stale cache contents at a later time. This patch depends on mkp's earlier patch "block: Introduce blkdev_issue_zeroout_discard() function". Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- block/ioctl.c | 45 ++++++++++++++++++++++++++++++++++++++------- include/uapi/linux/fs.h | 7 +++++++ 2 files changed, 45 insertions(+), 7 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 7d8befd..ff623d5 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -186,19 +186,39 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start, } static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start, - uint64_t len) + uint64_t len, uint32_t flags) { + int ret; + struct address_space *mapping; + uint64_t end = start + len - 1; + + if (flags & ~BLKZEROOUT2_DISCARD_OK) + return -EINVAL; if (start & 511) return -EINVAL; if (len & 511) return -EINVAL; - start >>= 9; - len >>= 9; - - if (start + len > (i_size_read(bdev->bd_inode) >> 9)) + if (end >= i_size_read(bdev->bd_inode)) return -EINVAL; - return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL, false); + /* Invalidate the page cache, including dirty pages */ + mapping = bdev->bd_inode->i_mapping; + truncate_inode_pages_range(mapping, start, end); + + ret = blkdev_issue_zeroout(bdev, start >> 9, len >> 9, GFP_KERNEL, + flags & BLKZEROOUT2_DISCARD_OK); + if (ret) + goto out; + + /* + * Invalidate again; if someone wandered in and dirtied a page, + * the caller will be given -EBUSY. + */ + ret = invalidate_inode_pages2_range(mapping, + start >> PAGE_CACHE_SHIFT, + end >> PAGE_CACHE_SHIFT); +out: + return ret; } static int put_ushort(unsigned long arg, unsigned short val) @@ -326,7 +346,18 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, if (copy_from_user(range, (void __user *)arg, sizeof(range))) return -EFAULT; - return blk_ioctl_zeroout(bdev, range[0], range[1]); + return blk_ioctl_zeroout(bdev, range[0], range[1], 0); + } + case BLKZEROOUT2: { + struct blkzeroout2 p; + + if (!(mode & FMODE_WRITE)) + return -EBADF; + + if (copy_from_user(&p, (void __user *)arg, sizeof(p))) + return -EFAULT; + + return blk_ioctl_zeroout(bdev, p.start, p.length, p.flags); } case HDIO_GETGEO: { diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 3735fa0..54d24ea 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -150,6 +150,13 @@ struct inodes_stat_t { #define BLKSECDISCARD _IO(0x12,125) #define BLKROTATIONAL _IO(0x12,126) #define BLKZEROOUT _IO(0x12,127) +struct blkzeroout2 { + __u64 start; + __u64 length; + __u32 flags; +}; +#define BLKZEROOUT2_DISCARD_OK 1 +#define BLKZEROOUT2 _IOR(0x12, 127, struct blkzeroout2) #define BMAP_IOCTL 1 /* obsolete - kept for compatibility */ #define FIBMAP _IO(0x00,1) /* bmap access */ ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen ` (2 preceding siblings ...) 2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen @ 2014-11-07 12:09 ` Bernd Schubert 2014-11-07 12:11 ` Bernd Schubert 2014-11-10 14:19 ` Christoph Hellwig 5 siblings, 0 replies; 28+ messages in thread From: Bernd Schubert @ 2014-11-07 12:09 UTC (permalink / raw) To: linux-fsdevel > The second patch tweaks the SCSI disk driver to prefer WRITE SAME w/ > UNMAP instead of the UNMAP command since the former has deterministic > behavior. I have seen data corruption on an Intel SSD 510 after sending WRITE SAME UNMAP. As far as I remember, these disks ignore writes after sending that command. Unmap worked fine, though. Cheers, Bernd ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen ` (3 preceding siblings ...) 2014-11-07 12:09 ` [RFC] Discard update for 3.19 Bernd Schubert @ 2014-11-07 12:11 ` Bernd Schubert 2014-11-07 15:46 ` Martin K. Petersen 2014-11-10 14:19 ` Christoph Hellwig 5 siblings, 1 reply; 28+ messages in thread From: Bernd Schubert @ 2014-11-07 12:11 UTC (permalink / raw) To: Martin K. Petersen, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de > The second patch tweaks the SCSI disk driver to prefer WRITE SAME w/ > UNMAP instead of the UNMAP command since the former has deterministic > behavior. I have seen data corruption on an Intel SSD 510 after sending WRITE SAME UNMAP. As far as I remember, these disks ignore writes after sending that command. Unmap worked fine, though. So possibly there is another blacklisting required. Cheers, Bernd ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-07 12:11 ` Bernd Schubert @ 2014-11-07 15:46 ` Martin K. Petersen 0 siblings, 0 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-07 15:46 UTC (permalink / raw) To: Bernd Schubert Cc: Martin K. Petersen, linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org, neilb@suse.de >>>>> "Bernd" == Bernd Schubert <bschubert@ddn.com> writes: Bernd, Bernd> I have seen data corruption on an Intel SSD 510 after sending Bernd> WRITE SAME UNMAP. As far as I remember, these disks ignore writes Bernd> after sending that command. Unmap worked fine, though. So Bernd> possibly there is another blacklisting required. Well, the 510 is a SATA drive and doesn't implement either command so that would be an LSI SATL problem. I recall that we had some problems with the LSI firmware blacklisting DRAT/RZAT on that particular drive so I did the same in my libata patch. I don't have any bugs open with WRITE SAME or UNMAP on recent LSI firmware, though. We use it extensively here. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen ` (4 preceding siblings ...) 2014-11-07 12:11 ` Bernd Schubert @ 2014-11-10 14:19 ` Christoph Hellwig 2014-11-12 18:40 ` Ewan Milne 5 siblings, 1 reply; 28+ messages in thread From: Christoph Hellwig @ 2014-11-10 14:19 UTC (permalink / raw) To: Martin K. Petersen; +Cc: linux-scsi, linux-ide, linux-fsdevel, neilb Looks like there is no real dependency between these patches, so we might take on each through the libata, scsi and block trees. Can I get another review for the sd patch, please? ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-10 14:19 ` Christoph Hellwig @ 2014-11-12 18:40 ` Ewan Milne 2014-11-12 19:41 ` Martin K. Petersen 0 siblings, 1 reply; 28+ messages in thread From: Ewan Milne @ 2014-11-12 18:40 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb On Mon, 2014-11-10 at 06:19 -0800, Christoph Hellwig wrote: > Looks like there is no real dependency between these patches, so we > might take on each through the libata, scsi and block trees. > > Can I get another review for the sd patch, please? > The changes in [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP look fine to me. I was wondering, though, if the changes to add the "devices_handle_discard_safely" module parameter to the MD raid drivers are really needed if this is fixed. (It's great to be able to disable the use of discard if desired, but how is an administrator actually supposed to know if the devices they have *really* work properly?) Maybe the default value of this parameter should be changed, or the parameter should be changed to have an inverse sense, i.e. "disable use of discard"... -Ewan ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [RFC] Discard update for 3.19 2014-11-12 18:40 ` Ewan Milne @ 2014-11-12 19:41 ` Martin K. Petersen 0 siblings, 0 replies; 28+ messages in thread From: Martin K. Petersen @ 2014-11-12 19:41 UTC (permalink / raw) To: emilne Cc: Christoph Hellwig, Martin K. Petersen, linux-scsi, linux-ide, linux-fsdevel, neilb >>>>> "Ewan" == Ewan Milne <emilne@redhat.com> writes: Ewan> The changes in [PATCH 2/3] sd: Disable discard_zeroes_data for Ewan> UNMAP look fine to me. I was wondering, though, if the changes to Ewan> add the "devices_handle_discard_safely" module parameter to the MD Ewan> raid drivers are really needed if this is fixed. I was expecting Neil to either revert or toggle the default once this change was in place. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2014-12-08 22:59 UTC | newest] Thread overview: 28+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-07 5:08 [RFC] Discard update for 3.19 Martin K. Petersen 2014-11-07 5:08 ` [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM Martin K. Petersen 2014-11-07 8:24 ` Christoph Hellwig 2014-11-07 15:37 ` Martin K. Petersen 2014-12-05 16:45 ` Paolo Bonzini 2014-12-05 22:58 ` Elliott, Robert (Server Storage) 2014-12-08 15:15 ` Theodore Ts'o 2014-12-08 15:28 ` James Bottomley 2014-12-08 22:59 ` One Thousand Gnomes 2014-11-07 5:08 ` [PATCH 2/3] sd: Disable discard_zeroes_data for UNMAP Martin K. Petersen 2014-11-07 8:25 ` Christoph Hellwig 2014-11-10 23:43 ` Paolo Bonzini 2014-11-07 5:08 ` [PATCH 3/3] block: Introduce blkdev_issue_zeroout_discard() function Martin K. Petersen 2014-11-07 8:26 ` Christoph Hellwig 2014-11-07 15:42 ` Martin K. Petersen 2014-11-07 16:20 ` Theodore Ts'o 2014-11-07 16:27 ` Martin K. Petersen 2014-11-14 20:22 ` Martin K. Petersen 2014-11-17 18:53 ` Darrick J. Wong 2014-11-11 0:04 ` Darrick J. Wong 2014-11-11 2:33 ` Martin K. Petersen 2014-11-17 19:28 ` [PATCH] block: create ioctl to discard-or-zeroout a range of blocks Darrick J. Wong 2014-11-07 12:09 ` [RFC] Discard update for 3.19 Bernd Schubert 2014-11-07 12:11 ` Bernd Schubert 2014-11-07 15:46 ` Martin K. Petersen 2014-11-10 14:19 ` Christoph Hellwig 2014-11-12 18:40 ` Ewan Milne 2014-11-12 19:41 ` Martin K. Petersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).