From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
linux-ide@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] libata: Whitelist SSDs that are known to properly return zeroes after TRIM
Date: Wed, 7 Jan 2015 15:15:37 +1100
Message-ID: <20150107041537.GH31508@dastard>
In-Reply-To: <20150107025455.GL3077@htj.dyndns.org>

On Tue, Jan 06, 2015 at 09:54:55PM -0500, Tejun Heo wrote:
> Hello, Martin.
>
> On Tue, Jan 06, 2015 at 07:05:40PM -0500, Martin K. Petersen wrote:
> > Tejun> Isn't that kinda niche and specialized tho?
> >
> > I don't think so. There are two reasons for zeroing block ranges:
> >
> > 1) To ensure they contain zeroes on subsequent reads
> >
> > 2) To preallocate them or anchor them down on thin provisioned devices
> >
> > The filesystem folks have specifically asked to be able to make that
> > distinction. Hence the patch that changes blkdev_issue_zeroout().
> >
> > You really don't want to write out gobs and gobs of zeroes and cause
> > unnecessary flash wear if all you care about is the blocks being in a
> > deterministic state.
>
> I think I'm still missing something. Are there enough cases where
> filesystems want to write out zeroes during operation?

IMO, yes.

w.r.t. thinp devices, we need to be able to guarantee that
preallocated regions in the filesystem are actually backed by real
blocks in the thinp device so we don't get ENOSPC from the thinp
device. No filesystems do this yet because we don't have a mechanism
for telling the lower layers "preallocate these blocks to zero".

The biggest issue is that we currently have no easy way to say
"these blocks need to contain zeros, but we aren't actually using
them yet". i.e. the filesystem code assumes that they contain zeros
(e.g. in ext4 inode tables because mkfs used to zero them) if they
haven't been used, so when it reads them it detects that
initialisation is needed because the blocks are empty....

FWIW, some filesystems need these regions to actually contain
zeros because they can't track unwritten extents (e.g.
gfs2). Having sb_issue_zeroout() just do the right thing enables us
to efficiently zero the regions they are preallocating...

> Earlier in the
> thread, it was mentioned that this is currently mostly useful for
> raids which need the blocks actually cleared for checksum consistency,
> which basically means that raid metadata handling isn't (yet) capable
> of just marking those (parts of) stripes as unused. If a filesystem
> wants to read back zeros from data blocks, wouldn't it be just marking
> the matching index as such?

Not all filesystems can do this for user data (see gfs2 case above)
and no Linux filesystem tracks whether free space contains zeros or
stale data. Hence if we want blocks to be zeroed on disk, we
currently have to write zeros to them and hence they get pinned in
devices as "used space" even though they may never get used again.
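To illustrate the trade-off described above, here is a minimal sketch of a toy thin-provisioned device. It is not code from this thread and not kernel code: the class `ThinDevice`, its methods, and the `allocate` flag are all hypothetical names chosen for the example. It assumes unmapped blocks read back as zeros, which is the guarantee a device has to provide before a discard can stand in for writing zeros.

```python
import errno


class ThinDevice:
    """Toy model (hypothetical): a small physical pool backs a larger logical space."""

    def __init__(self, logical_blocks, pool_blocks):
        self.logical_blocks = logical_blocks
        self.pool_free = pool_blocks
        self.mapped = {}  # logical block number -> stored value

    def write(self, lba, value):
        # A write to an unmapped block must take a block from the pool.
        if lba not in self.mapped:
            if self.pool_free == 0:
                raise OSError(errno.ENOSPC, "thin pool exhausted")
            self.pool_free -= 1
        self.mapped[lba] = value

    def read(self, lba):
        # Assumption: unmapped blocks deterministically read as zero.
        return self.mapped.get(lba, 0)

    def discard(self, lba):
        # Unmapping returns the backing block to the pool.
        if lba in self.mapped:
            del self.mapped[lba]
            self.pool_free += 1

    def zero_range(self, start, count, allocate):
        """Zero a range, either pinning backing blocks or releasing them."""
        for lba in range(start, start + count):
            if allocate:
                self.write(lba, 0)   # intent 2: preallocate, later writes cannot hit ENOSPC
            else:
                self.discard(lba)    # intent 1: reads return zeros, no space is consumed


dev = ThinDevice(logical_blocks=8, pool_blocks=4)
dev.zero_range(0, 4, allocate=True)    # pool is now fully pinned by zeroed blocks
dev.write(2, 0xAB)                     # succeeds: block 2 is already backed
dev.zero_range(0, 4, allocate=False)   # same zero contents on read, pool space released
```

In this model both calls leave the range reading as zeros; they differ only in whether pool space stays pinned, which is the distinction the caller has to be able to express.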
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com