From: Greg Freemyer <greg.freemyer@gmail.com>
To: Neil Brown <neilb@suse.de>
Cc: Matthew Wilcox <matthew@wil.cx>, Theodore Tso <tytso@mit.edu>,
Ric Wheeler <rwheeler@redhat.com>, "Jörn Engel" <joern@logfs.org>,
Matthew Wilcox <willy@linux.intel.com>,
Jens Axboe <jens.axboe@oracle.com>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Is TRIM/DISCARD going to be a performance problem?
Date: Tue, 12 May 2009 09:28:53 -0400
Message-ID: <87f94c370905120628r2352e923h43dca1645e197b6c@mail.gmail.com>
In-Reply-To: <18952.46829.115084.46432@notabene.brown>
On Mon, May 11, 2009 at 7:38 PM, Neil Brown <neilb@suse.de> wrote:
> On Monday May 11, greg.freemyer@gmail.com wrote:
>>
>> And since the mdraid layer is not currently planning to track what has
>> been discarded over time, when a re-shape comes along, it will
>> effectively un-trim everything and rewrite 100% of the FS.
>
> You might not call them "plans" exactly, but I have had thoughts
> about tracking which parts of a raid5 had 'live' data and which were
> trimmed. I think that is the only way I could support TRIM, unless
> devices guarantee that all trimmed blocks read as zeros, and that seems
> unlikely.
Neil,
Re: raid 5, etc. No FS info/discussion
The latest T13 proposed spec I saw explicitly allows reads from
trimmed sectors to return non-determinate data on some devices. There
is a per-device flag you can read to see whether a device does that.
I think mdraid needs to simply assume all trimmed sectors return
non-determinate data. Either that, or simply check that per-device
flag and refuse to accept a drive that may return
non-determinate data.
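The two options above can be sketched roughly as follows. This is only an illustration: `Member`, `Policy`, and `admit` are hypothetical names, not mdraid or libata interfaces, and the per-device flag is modeled as a plain boolean rather than as the actual bit in the device's identify data.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    ASSUME_INDETERMINATE = "assume"   # treat every trimmed sector as garbage
    REJECT_INDETERMINATE = "reject"   # refuse drives that may return garbage


@dataclass
class Member:
    """Hypothetical view of one array member (not a real mdraid structure)."""
    name: str
    trim_supported: bool
    deterministic_after_trim: bool  # stands in for the per-device flag


def admit(dev: Member, policy: Policy) -> bool:
    """Return True if the device may join the array under the given policy."""
    if not dev.trim_supported:
        return True  # never trimmed, so never indeterminate
    if policy is Policy.REJECT_INDETERMINATE:
        return dev.deterministic_after_trim
    # ASSUME_INDETERMINATE: accept everything, but the array must then
    # never trust the contents of a trimmed sector.
    return True
```

Under the first policy every drive is accepted and the burden moves to the array; under the second the array stays simple but some drives are refused.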
Regardless, ignoring reshape, why do you need to track it?
... thinking
Oh yes, you will have to track it at least at the stripe level.
After a stripe discard, p = d1 ^ d2 is no longer guaranteed to hold,
because p, d1 and d2 are all potentially non-determinate. At first
that is fine: who cares that d1 = p ^ d2 is not true for your discarded
stripe? d1 is effectively just random data anyway.
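As a small illustration of the parity argument, here is a sketch for a single stripe with two data chunks and one parity chunk. The byte values standing in for "indeterminate" reads are made up; a real device could return anything.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


# A live stripe: parity is computed from the data, so a lost data chunk
# can be rebuilt from the other two members.
d1 = b"\x11\x22\x33\x44"
d2 = b"\xa0\xb0\xc0\xd0"
p = xor(d1, d2)
live_ok = xor(p, d2) == d1

# The same stripe after a discard on devices that return indeterminate
# data. These values are arbitrary stand-ins for whatever the devices
# happen to return; nothing ties them together.
d1_t = b"\x00\x00\x00\x00"
d2_t = b"\xde\xad\xbe\xef"
p_t = b"\x01\x02\x03\x04"
trimmed_ok = xor(p_t, d2_t) == d1_t
```

Here `live_ok` is true and `trimmed_ok` is false, which is harmless only as long as nothing on the stripe is live.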
But as soon as either d1 or d2 is written to, you will need to force
the entire stripe back into a determinate state or else you will have
unprotected data sitting on that stripe. You can only do that if you
know the entire stripe was previously indeterminate. So you have no
option but to track the state of the stripes if mdraid is going to
support discards with devices that advertise themselves as returning
indeterminate data.
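A toy model of that argument, assuming a 3-member array (d1, d2, p) and a per-stripe "discarded" set. All names here (`Raid5Sketch`, `discard`, `write`, `rebuild_d1`) are hypothetical and this is not how md is implemented; zero-filling the untouched chunk is just one possible way to make the stripe determinate again.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


CHUNK = 4
# Fixed, unrelated values standing in for indeterminate reads.
GARBAGE = [b"\xde\xad\xbe\xef", b"\x01\x02\x03\x04", b"\x55\xaa\x55\xaa"]


class Raid5Sketch:
    """Toy array; each stripe is stored as [d1, d2, p]."""

    def __init__(self, nstripes: int, track: bool):
        zero = bytes(CHUNK)
        # All zeros satisfies p = d1 ^ d2.
        self.stripes = [[zero, zero, zero] for _ in range(nstripes)]
        self.discarded = set()  # stripes known to be indeterminate
        self.track = track

    def discard(self, s: int) -> None:
        self.stripes[s] = list(GARBAGE)
        if self.track:
            self.discarded.add(s)

    def write(self, s: int, idx: int, data: bytes) -> None:
        """Write one data chunk (idx 0 for d1, idx 1 for d2)."""
        d = self.stripes[s]
        if s in self.discarded:
            # Full-stripe write: define every data chunk, recompute parity.
            d[idx] = data
            d[1 - idx] = bytes(CHUNK)
            d[2] = xor(d[0], d[1])
            self.discarded.remove(s)
        else:
            # Ordinary read-modify-write: p_new = p_old ^ old ^ new.
            d[2] = xor(xor(d[2], d[idx]), data)
            d[idx] = data

    def rebuild_d1(self, s: int) -> bytes:
        """What a rebuild would produce for d1 if its device were lost."""
        d = self.stripes[s]
        return xor(d[2], d[1])


def protected_after_write(track: bool) -> bool:
    """Discard a stripe, write d1, and check whether d1 is recoverable."""
    new = b"\x10\x20\x30\x40"
    array = Raid5Sketch(1, track)
    array.discard(0)
    array.write(0, 0, new)
    return array.rebuild_d1(0) == new
```

Without tracking, the read-modify-write path folds garbage into the parity, so the newly written chunk cannot be rebuilt; with tracking, the first write after a discard restores p = d1 ^ d2 for the whole stripe.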
So Neil, it looks like you need to move from thoughts about tracking
discards to planning to track discards.
FYI: I don't know if it is just for show, or if people really plan to do
it, but I have seen several people build very high performance RAID
arrays from SSDs already. It seems that about 8 SSDs max out the
current generation of SATA controllers, PCI Express, etc.
Since SSDs with trim support should be even faster, I suspect these
ultra-high performance setups will want to use them.
Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com