* Software RAID and TRIM
From: Tom De Mulder @ 2011-06-28 15:31 UTC (permalink / raw)
To: linux-raid
Hi,
I'm investigating SSD performance on Linux, in particular for RAID
devices.
As I understand it—and please correct me if I'm wrong—currently software
RAID does not pass through TRIM to the underlying devices. TRIM is
essential for the continued high performance of SSDs, which otherwise
degrade over time.
I don't think there would be any harm in passing this command through to
the underlying devices: if they don't support it they would simply ignore
it, and if they do, it would make high-performance software RAID of SSDs a
possibility.
Is this something that's in the works?
Many thanks,
--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 28/06/2011 : The Moon is Waning Crescent (22% of Full)
* Re: Software RAID and TRIM
From: Mathias Burén @ 2011-06-28 16:11 UTC (permalink / raw)
To: Tom De Mulder; +Cc: linux-raid

On 28 June 2011 16:31, Tom De Mulder <tdm27@cam.ac.uk> wrote:
> I'm investigating SSD performance on Linux, in particular for RAID devices.
> [...]
> Is this something that's in the works?

IIRC md can already pass TRIM down, but I think the filesystem needs to
know about the underlying architecture, or something, for TRIM to work in
RAID. There's numerous discussions on this in the archives of this mailing
list.

/M
* Re: Software RAID and TRIM
From: Tom De Mulder @ 2011-06-29 10:32 UTC (permalink / raw)
To: Mathias Burén; +Cc: linux-raid

On Tue, 28 Jun 2011, Mathias Burén wrote:

> IIRC md can already pass TRIM down, but I think the filesystem needs
> to know about the underlying architecture, or something, for TRIM to
> work in RAID.

Yes, it's (usually/ideally) the filesystem's job to invoke the TRIM
command, and that's what ext4 can do. I have it working just fine on
single drives, but for reasons of service reliability I would need to get
RAID to work.

I tried (on an admittedly vanilla Ubuntu 2.6.38 kernel) the same on a
two-drive RAID1 md and it definitely didn't work (the blocks didn't get
marked as unused and zeroed).

> There's numerous discussions on this in the archives of
> this mailing list.

Given how fast things move in the world of SSDs at the moment, I wanted to
check if any progress was made since. :-) I don't seem to be able to find
any reference to this in recent kernel source commits (but I'm a complete
amateur when it comes to git).

Thanks,

--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)
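A minimal sketch of the kind of single-drive check described above (device
name, mount point and sizes are hypothetical, and behaviour after TRIM is
drive-dependent):

    # mount ext4 with online discard enabled
    mount -o discard /dev/sdb1 /mnt/ssdtest
    # write a test file and note its physical extents before deleting it
    dd if=/dev/zero of=/mnt/ssdtest/testfile bs=1M count=64 conv=fsync
    filefrag -v /mnt/ssdtest/testfile
    # deleting the file should now cause ext4 to send TRIM for the freed blocks
    rm /mnt/ssdtest/testfile
    sync
    # on drives that return zeroes after TRIM, reading one of the old sectors
    # back (e.g. hdparm --read-sector <lba> /dev/sdb) should show it zeroed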
* Re: Software RAID and TRIM
From: NeilBrown @ 2011-06-29 10:45 UTC (permalink / raw)
To: Tom De Mulder; +Cc: Mathias Burén, linux-raid

On Wed, 29 Jun 2011 11:32:55 +0100 (BST) Tom De Mulder <tdm27@cam.ac.uk> wrote:
> Given how fast things move in the world of SSDs at the moment, I wanted to
> check if any progress was made since. :-)

Trim support for md is a long way down my list of interesting projects
(and no-one else has volunteered).

It is not at all straightforward to implement.

For stripe/parity RAID (RAID4/5/6) it is only safe to discard full stripes
at a time, and the md layer would need to keep a record of which stripes
had been discarded so that it didn't risk trusting data (and parity) read
from those stripes. So you would need some sort of bitmap of invalid
stripes, and you would need the fs to discard in very large chunks for it
to be useful at all.

For copying RAID (RAID1, RAID10) you really need the same bitmap. There
isn't the same risk of reading and trusting discarded parity, but a resync
which didn't know about discarded ranges would undo the discard for you.

So it basically requires another bitmap to be stored with the metadata,
and a fairly fine-grained bitmap it would need to be. Then every read and
resync checks the bitmap and ignores or returns 0 for discarded ranges,
and every write needs to check and, if the range was discarded, clear the
bit and write to the whole range.

So: do-able, but definitely non-trivial.

NeilBrown
* Re: Software RAID and TRIM
From: Tom De Mulder @ 2011-06-29 11:10 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

On Wed, 29 Jun 2011, NeilBrown wrote:

> It is not at all straightforward to implement.
> [...]
> For copying RAID (RAID1, RAID10) you really need the same bitmap. There
> isn't the same risk of reading and trusting discarded parity, but a resync
> which didn't know about discarded ranges would undo the discard for you.

However, that might not necessarily be a problem; tools exist that can be
run manually (slightly fsck-like) and tell the drive which blocks can be
erased.

> So: do-able, but definitely non-trivial.

Thanks very much for your response, you make some very good points.

I shall, for the time being, chop my SSDs in half and let them treat the
empty half as spare area, which should make performance degradation a
non-issue. I hope.

Cheers,

--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)
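A sketch of the "use only half the drive" approach described above,
assuming brand-new (or secure-erased) drives appearing as /dev/sdb and
/dev/sdc; the sizes are illustrative:

    # partition only the first half of each SSD; the untouched second half
    # then serves as extra spare area for the drive's garbage collection
    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart primary 1MiB 50%
    parted -s /dev/sdc mklabel gpt
    parted -s /dev/sdc mkpart primary 1MiB 50%
    # the unused half must never be written (or must be discarded first),
    # otherwise the controller cannot treat it as free space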
* Re: Software RAID and TRIM
From: Scott E. Armitage @ 2011-06-29 11:48 UTC (permalink / raw)
To: linux-raid

On Wed, Jun 29, 2011 at 7:10 AM, Tom De Mulder <tdm27@cam.ac.uk> wrote:
> However, that might not necessarily be a problem; tools exist that can be
> run manually (slightly fsck-like) and tell the drive which blocks can be
> erased.

For RAID5/6 at least, md will still require knowledge of what stripes are
and are not in use by the filesystem. In the current implementation, the
entire array must be consistent, regardless of whether or not a particular
block is in use. As far as my understanding goes, any level of TRIM
support for parity arrays would be a fundamental shift in the way md
treats the array.

The simplest solution I see is to do as Neil suggested, and mimic TRIM
support at the RAID level, passing commands down as necessary. An
alternative solution would be to add a second TRIM layer, where md
maintains a list of what is or is not in use; once an entire stripe has
been discarded by the filesystem, it can send a single TRIM command to
each member drive to drop the entire stripe contents. This adds
abstraction for the filesystem layer, allowing it to treat the RAID array
like a regular SSD, but adds significant complexity to md itself.

-Scott

p.s. Sorry if you receive this twice; Majordomo rejected the first one on
HTML subpart basis.

--
Scott Armitage, B.A.Sc., M.A.Sc. candidate
Space Flight Laboratory
University of Toronto Institute for Aerospace Studies
4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
* Re: Software RAID and TRIM
From: Roberto Spadim @ 2011-06-29 12:46 UTC (permalink / raw)
To: Scott E. Armitage; +Cc: linux-raid

Some ideas, maybe just for a test: we could send TRIM commands only on
RAID1 or linear ("raid0 linear") arrays, since they don't stripe; this
could be "easy" to develop. When the filesystem sends a TRIM, we pass it
down to the underlying device (/dev/sdX99). There is the problem of the
data offset (for RAID1), and maybe some devices only accept TRIM on
4096-byte blocks, maybe not. We could implement it and put it in an
alpha/beta release to test, like the ext4 people are doing with the
discard command (it's a user option today).

2011/6/29 Scott E. Armitage <launchpad@scott.armitage.name>:
> The simplest solution I see is to do as Neil suggested, and mimic TRIM
> support at the RAID level, passing commands down as necessary. [...]

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Software RAID and TRIM
From: David Brown @ 2011-06-29 12:46 UTC (permalink / raw)
To: linux-raid

On 29/06/2011 12:45, NeilBrown wrote:
> So it basically requires another bitmap to be stored with the metadata,
> and a fairly fine-grained bitmap it would need to be. [...]
> So: do-able, but definitely non-trivial.

Wouldn't the sync/no-sync tracking you already have planned be usable for
tracking discarded areas? Or will that not be fine-grained enough for the
purpose?
* Re: Software RAID and TRIM
From: NeilBrown @ 2011-06-30 0:28 UTC (permalink / raw)
To: David Brown; +Cc: linux-raid

On Wed, 29 Jun 2011 14:46:08 +0200 David Brown <david@westcontrol.com> wrote:
> Wouldn't the sync/no-sync tracking you already have planned be usable for
> tracking discarded areas? Or will that not be fine-grained enough for the
> purpose?

That would be a necessary precursor to DISCARD support: yes.
DISCARD would probably require a much finer grain than I would otherwise
suggest, but I would design the feature to allow a range of granularities.

NeilBrown
* Re: Software RAID and TRIM
From: David Brown @ 2011-06-30 7:50 UTC (permalink / raw)
To: linux-raid

On 30/06/2011 02:28, NeilBrown wrote:
> That would be a necessary precursor to DISCARD support: yes.
> DISCARD would probably require a much finer grain than I would otherwise
> suggest, but I would design the feature to allow a range of granularities.

I suppose the big win for the sync/no-sync tracking is when initialising
an array - arrays that haven't been written don't need to be in sync.

But you will probably be best with a list of sync (or no-sync) areas for
that job, rather than a bitmap, as there won't be very many such blocks (a
few dozen, perhaps, for multiple partitions and filesystems like XFS that
write in different areas), and as the disk gets used, the "no-sync" areas
will decrease in size and number. For DISCARD, however, you'd get no-sync
areas scattered around the disk.
* Re: Software RAID and TRIM
From: Namhyung Kim @ 2011-06-29 13:39 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid

NeilBrown <neilb@suse.de> writes:
> Trim support for md is a long way down my list of interesting projects
> (and no-one else has volunteered).

Just out of curiosity, what's on your list? :)

--
Regards,
Namhyung Kim
* Re: Software RAID and TRIM
From: NeilBrown @ 2011-06-30 0:27 UTC (permalink / raw)
To: Namhyung Kim; +Cc: linux-raid

On Wed, 29 Jun 2011 22:39:24 +0900 Namhyung Kim <namhyung@gmail.com> wrote:
> Just out of curiosity, what's on your list? :)

http://neil.brown.name/blog/20110216044002

I have code for the first - the bad block log - and it seems to work.
But I really need to design and then perform some more testing.

NeilBrown
* Re: Software RAID and TRIM
From: Lutz Vieweg @ 2011-07-17 22:11 UTC (permalink / raw)
To: linux-raid

NeilBrown wrote:
> Trim support for md is a long way down my list of interesting projects (and
> no-one else has volunteered).

That's a pity. Actually, we were desperate enough about being able to
discard unused sectors from our SSDs "behind" MD that we implemented a
user-space work-around (using fallocate and BLKDISCARD ioctls after
finding out which physical devices are hidden behind the RAID), but that
is awkward compared to just using "fstrim" or the like, as it means that
during the discards the filesystem appears "almost full", and the
work-around supports only RAID-1.

> It is not at all straightforward to implement.

For RAID5/6, I understand that. But supporting RAID 0/1, and maybe even
RAID 10, should not be that difficult. (dm-raid does support this, though
we don't like dm-raid too much for several other reasons.)

If somebody is investing in SSDs today, it is for speed. So if you are
setting up an SSD-based RAID, it's unlikely that you'll aim for RAID5/6
anyway.

> For copying RAID (RAID1, RAID10) you really need the same bitmap. There
> isn't the same risk of reading and trusting discarded parity, but a resync
> which didn't know about discarded ranges would undo the discard for you.

That is true, but not really a problem. Yes, the write performance will
suffer until the next "fstrim" is done, but performance suffers from the
resync anyway, so that's nothing extra, and SSD users will certainly issue
"fstrim" periodically anyway.

I guess you would make many people happy if MD-raid supported passing
through discards, even if it was only for RAID 0/1, and even if a resync
meant you'd have to issue an additional "fstrim".

Regards,

Lutz Vieweg
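An outline of the kind of user-space work-around Lutz describes, heavily
simplified and with hypothetical paths and device names; the offset
arithmetic (filesystem block size, partition start, md data offset) is the
fiddly part and is only hinted at here. Where the util-linux blkdiscard
tool is not available, a small program issuing the BLKDISCARD ioctl
directly would be needed instead:

    # 1. claim (almost) all free space in the filesystem in one placeholder file
    fallocate -l 100G /mnt/ssd/trim-placeholder
    sync
    # 2. list the file's physical extents on the md device
    filefrag -v /mnt/ssd/trim-placeholder
    # 3. for each extent, add the RAID member's data offset (see "Data Offset"
    #    in `mdadm --examine /dev/sda2`) and discard that byte range on each
    #    member, e.g.:
    blkdiscard --offset <bytes> --length <bytes> /dev/sda2
    # 4. delete the placeholder so the filesystem gets its space back
    rm /mnt/ssd/trim-placeholder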
* Re: Software RAID and TRIM
From: Lutz Vieweg @ 2011-07-17 21:57 UTC (permalink / raw)
To: linux-raid

Tom De Mulder wrote:
> Yes, it's (usually/ideally) the filesystem's job to invoke the TRIM
> command

Well, for us, voluntary (cron-triggered) batch discards have proven to be
the better option. If you leave it to the filesystem to trigger the
discards, you might lose write performance when you need it most. In
comparison, a voluntarily triggered discard at some low-usage time is
painless.

Regards,

Lutz Vieweg
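For reference, the cron-triggered batch discard Lutz mentions can be as
simple as the following; the schedule and mount point are illustrative,
and it assumes the filesystem and kernel support FITRIM and that fstrim is
installed:

    # /etc/cron.d/fstrim -- discard unused blocks once a week, at a quiet time
    30 4 * * 0  root  /sbin/fstrim -v /data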
* Re: Software RAID and TRIM
From: Tom De Mulder @ 2011-06-29 10:33 UTC (permalink / raw)
To: linux-raid

On 28/06/11, David Brown wrote:

> However, AFAIUI, you are wrong about TRIM being essential for the
> continued high performance of SSDs. As long as your SSDs have some
> over-provisioning (or you only partition something like 90% of the
> drive), and they have good garbage collection, then TRIM will have
> minimal effect.

While you are mostly correct, over time even consumer SSDs will end up in
this state.

Maybe I should have specified--my particular aim is to try and use (fairly
high-end) consumer SSDs for "enterprise" server applications, hence the
research into RAID. Most hardware RAID controllers that I know of don't
pass on the TRIM command (for various reasons), so I was hoping to have
more luck with software RAID.

Best,

--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)
* Re: Software RAID and TRIM
From: David Brown @ 2011-06-29 12:42 UTC (permalink / raw)
To: linux-raid

On 29/06/2011 12:33, Tom De Mulder wrote:
> While you are mostly correct, over time even consumer SSDs will end up
> in this state.

I don't quite follow you here - what state will consumer SSDs end up in?

> Maybe I should have specified--my particular aim is to try and use
> (fairly high-end) consumer SSDs for "enterprise" server applications,
> hence the research into RAID. Most hardware RAID controllers that I know
> of don't pass on the TRIM command (for various reasons), so I was hoping
> to have more luck with software RAID.

Now you know /why/ hardware RAID controllers don't implement TRIM!

Have you tried any real-world benchmarking with realistic loads with a
single SSD, ext4, and TRIM on and off? Almost every article I've seen on
the subject uses very synthetic benchmarks, almost always on Windows, and
few are done with current garbage-collecting SSDs. It seems to be accepted
wisdom from the early days of SSDs that TRIM makes a big difference - and
few people challenge that with real numbers or real thought, even though
the internal structure of the flash has changed dramatically (transparent
compression, for example, gives a completely different effect).

Of course, if you /do/ try it yourself and can show clear figures, then
I'm willing to change my mind :-) If I had a spare SSD, I'd do the testing
myself.
* Re: Software RAID and TRIM
From: Tom De Mulder @ 2011-06-29 12:55 UTC (permalink / raw)
To: David Brown; +Cc: linux-raid

On Wed, 29 Jun 2011, David Brown wrote:

>> While you are mostly correct, over time even consumer SSDs will end up
>> in this state.
> I don't quite follow you here - what state will consumer SSDs end up in?

Sorry, I meant to say "SSDs in typical consumer desktop machines". The
state where writes are very slow.

> Have you tried any real-world benchmarking with realistic loads with a
> single SSD, ext4, and TRIM on and off? [...]
> Of course, if you /do/ try it yourself and can show clear figures, then
> I'm willing to change my mind :-) If I had a spare SSD, I'd do the
> testing myself.

I have a set of 4 Intel 510 SSDs purely for testing, and I have used these
to simulate the kinds of workload I would expect them to experience in a
server environment (focused mainly on database access). So far, those
tests have focused on using single drives (i.e. without RAID) on a variety
of controllers.

Once the drives get fuller (something which does happen on servers) I do
indeed see write latencies in the order of several milliseconds (I saw
from 1500µs to 6000µs), as the drive suddenly struggles to free entire
blocks, where initially latency was in the single digits.

I am hoping to get my hands on some SandForce controller-based SSDs as
well, to compare, but even they show degradation as they get fuller in
AnandTech's tests (and those tests seem, IME, trustworthy).

My current plan is to sacrifice half the capacity by partitioning, stick 2
of them in md RAID1 (so, without TRIM) and over the next few days run
benchmarks over them, to see what the end result is.

Best,

--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 29/06/2011 : The Moon is Waning Crescent (18% of Full)
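A sketch of how the test setup Tom outlines might look; the device names,
filesystem and fio parameters are all illustrative, and fio is assumed to
be installed:

    # RAID1 across the half-size partitions; md does not pass TRIM through
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/ssdraid
    # sustained random-write test; watch how latency develops as the drives fill
    fio --name=randwrite --directory=/mnt/ssdraid --rw=randwrite --bs=4k \
        --size=20g --ioengine=libaio --iodepth=32 --direct=1 \
        --runtime=600 --time_based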
* Re: Software RAID and TRIM
From: Roberto Spadim @ 2011-06-29 13:02 UTC (permalink / raw)
To: Tom De Mulder; +Cc: David Brown, linux-raid

Nice. Does anyone know whether FreeBSD, NetBSD or another OS already has
this (RAID with TRIM), so we could run some benchmarks without spending
time developing it?

2011/6/29 Tom De Mulder <tdm27@cam.ac.uk>:
> My current plan is to sacrifice half the capacity by partitioning, stick 2
> of them in md RAID1 (so, without TRIM) and over the next few days run
> benchmarks over them, to see what the end result is.

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: Software RAID and TRIM
From: David Brown @ 2011-06-29 13:10 UTC (permalink / raw)
To: linux-raid

On 29/06/2011 14:55, Tom De Mulder wrote:
> Sorry, I meant to say "SSDs in typical consumer desktop machines". The
> state where writes are very slow.

Well, many consumer-level systems use older or cheaper SSDs which don't
have the benefit of newer garbage collection, and don't have much
over-provisioning (you can always do that yourself by leaving some space
unpartitioned - but "consumer" users would typically not do that). And
remember that users in this class, who will probably have small SSDs to
keep costs down, will have fairly full drives - making TRIM almost
useless.

> My current plan is to sacrifice half the capacity by partitioning, stick
> 2 of them in md RAID1 (so, without TRIM) and over the next few days run
> benchmarks over them, to see what the end result is.

Well, try it and see - and let us know the results. 50% manual
over-provisioning seems excessive, but I guess that's what you'll find out
with the tests.
* Re: Software RAID and TRIM
From: Mikael Abrahamsson @ 2011-06-30 5:51 UTC (permalink / raw)
To: Tom De Mulder; +Cc: David Brown, linux-raid

On Wed, 29 Jun 2011, Tom De Mulder wrote:

> I have a set of 4 Intel 510 SSDs purely for testing, and I have used
> these to simulate the kinds of workload I would expect them to
> experience in a server environment (focused mainly on database access).

From the tests I have read, the Intel 510 is actually worse than the
Intel X-25 G1/G2/320 models, with exactly the symptoms you're describing.
It's fast for linear reads and writes, but not so good for random writes,
especially not when it's getting full.

> Once the drives get fuller (something which does happen on servers) I do
> indeed see write latencies [...], as the drive suddenly struggles to free
> entire blocks, where initially latency was in the single digits.

Yeah, this is a common problem, especially for older drives. A lot has
happened with garbage collection, but the fact remains that a lot of SSD
vendors have too little spare area, so the recommendation you make about
leaving a large area unused is something I do as well, and it works.

> I am hoping to get my hands on some SandForce controller-based SSDs as
> well, to compare, but even they show degradation as they get fuller in
> AnandTech's tests (and those tests seem, IME, trustworthy).

Include the Intel 320 as well; I think it should be viable for your usage
pattern.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
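As an aside on "leaving a large area unused": besides leaving space
unpartitioned, the visible capacity of a drive can also be reduced with a
Host Protected Area via hdparm -N. Whether the controller then actually
treats the hidden LBAs as extra spare area is drive-dependent, and the
hidden area must not have been written to beforehand (or the drive should
be secure-erased first); the sector count and device below are
illustrative, and newer hdparm versions may additionally require the
--yes-i-know-what-i-am-doing flag:

    # show current / native max sector count
    hdparm -N /dev/sdb
    # limit the visible capacity (the 'p' prefix makes the setting permanent)
    hdparm -Np281250000 /dev/sdb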
* Re: Software RAID and TRIM
From: Tom De Mulder @ 2011-07-04 9:13 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: David Brown, linux-raid

On Thu, 30 Jun 2011, Mikael Abrahamsson wrote:

> From the tests I have read, the Intel 510 is actually worse than the
> Intel X-25 G1/G2/320 models, with exactly the symptoms you're describing.
> It's fast for linear reads and writes, but not so good for random writes,
> especially not when it's getting full.

Yes; that's why I'm looking forward to also getting some SandForce 22xx
based drives (probably OCZ Vertex 3) to test.

> Include the Intel 320 as well; I think it should be viable for your usage
> pattern.

I wasn't too impressed by the AnandTech review of the 320, and (as
everywhere) my funds are limited. :-)

--
Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service
+44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH
-> 04/07/2011 : The Moon is Waxing Crescent (17% of Full)
* Re: Software RAID and TRIM
From: Werner Fischer @ 2011-07-04 16:26 UTC (permalink / raw)
To: Tom De Mulder; +Cc: linux-raid

Hi Tom,

1) Regarding software RAID and TRIM:
There is a script, raid1ext4trim.sh-1.4 by Chris Caputo, that does a TRIM
for ext4 file systems on a software RAID1. According to the comments in
the script it only supports RAID volumes which reside on complete disks
(e.g. /dev/sdb and /dev/sdc), not on RAID partitions (e.g. /dev/sdb1 and
/dev/sdc1). The script is shipped with hdparm; get hdparm 9.37 at
http://sourceforge.net/projects/hdparm/ and you'll find it in the
subfolder hdparm-9.37/wiper/contrib/. I have not tested the script yet;
maybe I can do some tests tomorrow.

2) Regarding choosing the right SSD:
I would strongly recommend an SSD with integrated power-outage protection.
Intel's 320 series has this built in:
http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf

I have done some power-outage tests today, including a Vertex 3 and an
Intel 320 series drive, using diskchecker.pl from
http://brad.livejournal.com/2116715.html

Result:

-> For the Vertex 3, diskchecker.pl reported lost data:
[root@f15-ocz-vertex3 ~]# ./diskchecker.pl -s 10.10.30.199 verify testfile2
 verifying: 0.00%
 verifying: 1.42%
Error at page 52141, 0 seconds before end.
 verifying: 6.31%
Error at page 83344, 0 seconds before end.
 verifying: 11.12%
Error at page 163555, 0 seconds before end.
[...]
Total errors: 12
Histogram of seconds before end:
 0 12
[root@f15-ocz-vertex3 ~]#

-> For the Intel 320 series, diskchecker.pl did not report data loss:
[root@f15-intel-320 ~]# ./diskchecker.pl -s 10.10.30.199 verify testfile2
 verifying: 0.00%
 verifying: 0.12%
[...]
 verifying: 99.82%
 verifying: 100.00%
Total errors: 0
[root@f15-intel-320 ~]#

I did the tests multiple times. I also had some runs on the Vertex 3
without errors, but with the Intel 320 series no single test reported an
error.

I did the tests with Fedora 15 on the SSDs; here are the details from
hdparm -I:

OCZ Vertex 3:
        Model Number:       OCZ-VERTEX3
        Serial Number:      OCZ-OQZF2I45DYZ47T3C
        Firmware Revision:  2.06
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
        [...]
        device size with M = 1000*1000:      120034 MBytes (120 GB)

Intel 320 Series:
        Model Number:       INTEL SSDSA2CW160G3
        Serial Number:      CVPR112601AL160DGN
        Firmware Revision:  4PC10302
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
        [...]
        device size with M = 1000*1000:      160041 MBytes (160 GB)

Regards,
Werner

On Mon, 2011-07-04 at 10:13 +0100, Tom De Mulder wrote:
> I wasn't too impressed by the AnandTech review of the 320, and (as
> everywhere) my funds are limited. :-)
* Re: Software RAID and TRIM
From: Lutz Vieweg @ 2011-07-17 22:31 UTC (permalink / raw)
To: linux-raid

Werner Fischer wrote:
> 1) Regarding software RAID and TRIM:
> There is a script, raid1ext4trim.sh-1.4 by Chris Caputo, that does a TRIM
> for ext4 file systems on a software RAID1. [...]
> The script is shipped with hdparm

I wonder why people would use the "hdparm" tool to issue TRIM commands at
a low level, when you can do it much more portably with the BLKDISCARD
ioctl...

> I would strongly recommend an SSD with integrated power-outage protection

Your results seem to indicate differences, but how is that evidence for
SSDs corrupting filesystems?

As long as the SSD actually tells the truth about draining its caches when
asked to, the journaling of the filesystem will keep the metadata intact -
but not necessarily the data inside the files: for very plausible
performance reasons, most filesystems will _not_ try to sync non-metadata
by default!

Nevertheless, sensitivity to power-outage situations has been a subject of
many SSD updates for different controllers, so there may have been real
issues, too.

Regards,

Lutz Vieweg
* Re: Software RAID and TRIM
From: Lutz Vieweg @ 2011-07-17 22:16 UTC (permalink / raw)
To: linux-raid

Tom De Mulder wrote:
> I have a set of 4 Intel 510 SSDs purely for testing, and I have used
> these to simulate the kinds of workload I would expect them to
> experience in a server environment

Beware: the Intel SSDs are documented to voluntarily throttle write speed
if they detect a lot of writing going on, to meet their lifetime
advertisement. (I have not read anything like that in the documentation
for Marvell/Micron/Indilinx/SandForce controllers, and indeed, when wiped
once per week, our SSDs keep up their initial performance. And yes, I find
it acceptable that they might wear out after >= 3 years. :-)

Regards,

Lutz Vieweg
* Re: Software RAID and TRIM
From: Lutz Vieweg @ 2011-07-17 22:00 UTC (permalink / raw)
To: linux-raid

Tom De Mulder wrote:
> Maybe I should have specified--my particular aim is to try and use
> (fairly high-end) consumer SSDs for "enterprise" server applications

That's exactly what we do. After all, "RAID" is still the acronym for
"Redundant Array of _Inexpensive_ Disks", no matter how many times the
big-$$$ vendors try to tell you otherwise. And a software RAID built from
some cheap consumer SSDs easily outperforms those overpriced "enterprise
class" SSD devices they try to sell you.

> Most hardware RAID controllers that I know of don't pass on the TRIM
> command

Not only that, they also add a lot of latency to the SSD communication.
There simply is no reason any more to use hardware RAID at all.

Regards,

Lutz Vieweg
* Re: Software RAID and TRIM
From: Johannes Truschnigg @ 2011-06-28 16:17 UTC (permalink / raw)
To: Tom De Mulder; +Cc: linux-raid

Hi Tom,

On Tue, Jun 28, 2011 at 04:31:35PM +0100, Tom De Mulder wrote:
> Is this something that's in the works?

IIRC, dm-raid supports passthrough of DSM/TRIM commands for the RAID0 and
RAID1 levels it provides. Maybe that's already enough for your purposes? I
don't know if there's any development going on on the md side of things in
that regard; others on this list will surely be able to answer that
question, however.

Have a nice day!

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www: http://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp: johannes@truschnigg.info

Please do not bother me with HTML-eMail or attachments. Thank you.
* Re: Software RAID and TRIM
From: David Brown @ 2011-06-28 16:40 UTC (permalink / raw)
To: linux-raid

On 28/06/11 17:31, Tom De Mulder wrote:
> As I understand it—and please correct me if I'm wrong—currently software
> RAID does not pass through TRIM to the underlying devices. TRIM is
> essential for the continued high performance of SSDs, which otherwise
> degrade over time.

I don't think you are wrong about software RAID not passing TRIM down to
the device (IIRC, it /can/ be passed down through LVM raid setups, but
they are slower and less flexible than md raid).

However, AFAIUI, you are wrong about TRIM being essential for the
continued high performance of SSDs. As long as your SSDs have some
over-provisioning (or you only partition something like 90% of the drive),
and they have good garbage collection, then TRIM will have minimal effect.
TRIM only makes a big difference in benchmarks which fill up most of a
disk, then erase the files, then start writing them again - and even then
it is mainly with older flash controllers.

I think other SSD optimisations, such as those in BTRFS, are much more
important. These include bypassing or disabling code that is aimed at
optimising disk access and minimising head movement - such code is of
great benefit with hard disks, but helps little and adds latency on SSD
systems.

(I haven't done any benchmarks to justify this opinion, nor do I have
direct links - it's based on my understanding of TRIM and how SSDs work,
and how SSD controllers have changed between early devices and current
ones.)
* Re: Software RAID and TRIM 2011-06-28 16:40 ` David Brown @ 2011-07-17 21:52 ` Lutz Vieweg 2011-07-18 5:14 ` Mikael Abrahamsson ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Lutz Vieweg @ 2011-07-17 21:52 UTC (permalink / raw) To: linux-raid David Brown wrote: > However, AFAIUI, you are wrong about TRIM being essential for the > continued high performance of SSDs. As long as your SSDs have some > over-provisioning (or you only partition something like 90% of the > drive), and it's got good garbage collection, then TRIM will have > minimal effect. I beg to differ. We are using SSDs in very much the way that Tom de Mulder intends, and from our extensive performance measurements over many months now I can say that (at least if you do have significant amounts of write operations) it _does_ make a lot of difference whether you periodically discard the unused sectors or not. (For us, the write performance measured to be about half as good when there are no free erase blocks available anymore.) Of course, you can only benefit from discards if your filesystem is not full (because then there is nothing to discard). But any kind of "garbage collection" by the SSD itself will not have the same effect, since it cannot know which blocks are in use by the filesystem. > I think other SSD-optimisations, such as those in BTRFS, are much more > important. Actually, (apart from btrfs still being in development, not really ready for production use, yet), XFS (-o delaylog,barrier) performs better on our SSDs than btrfs - without any SSD-specific options. What is really an important factor for SSD performance: The controller. The same SSDs perform with significantly lower latency for us when connected to SATA controller channels than when connected to SAS controllers (and they perform abysmal when used as hardware-RAID constituents, in comparison). Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-17 21:52 ` Lutz Vieweg @ 2011-07-18 5:14 ` Mikael Abrahamsson 2011-07-18 10:35 ` David Brown 2011-07-18 10:53 ` Tom De Mulder 2 siblings, 0 replies; 45+ messages in thread From: Mikael Abrahamsson @ 2011-07-18 5:14 UTC (permalink / raw) To: Lutz Vieweg; +Cc: linux-raid On Sun, 17 Jul 2011, Lutz Vieweg wrote: > David Brown wrote: >> However, AFAIUI, you are wrong about TRIM being essential for the continued >> high performance of SSDs. As long as your SSDs have some over-provisioning >> (or you only partition something like 90% of the drive), and it's got good >> garbage collection, then TRIM will have minimal effect. > > I beg to differ. > > Of course, you can only benefit from discards if your filesystem > is not full (because then there is nothing to discard). But any > kind of "garbage collection" by the SSD itself will not have the > same effect, since it cannot know which blocks are in use by the > filesystem. Well, that's what you gain from only using 90% of the drive space for data (be it via partition or some other means): you increase the overprovisioning and thus the drive has more empty space to play with, even if you fill up the FS to 100%. So yes, TRIM is nice but if you want consistent performance then you need to assume that your FS is going to be 100% full anyway, so then you have to limit the FS block use to 80-90% of the total drive space. -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-17 21:52 ` Lutz Vieweg 2011-07-18 5:14 ` Mikael Abrahamsson @ 2011-07-18 10:35 ` David Brown 2011-07-18 10:48 ` Tom De Mulder 2011-07-18 18:09 ` Lutz Vieweg 2011-07-18 10:53 ` Tom De Mulder 2 siblings, 2 replies; 45+ messages in thread From: David Brown @ 2011-07-18 10:35 UTC (permalink / raw) To: linux-raid On 17/07/2011 23:52, Lutz Vieweg wrote: > David Brown wrote: >> However, AFAIUI, you are wrong about TRIM being essential for the >> continued high performance of SSDs. As long as your SSDs have some >> over-provisioning (or you only partition something like 90% of the >> drive), and it's got good garbage collection, then TRIM will have >> minimal effect. > > I beg to differ. > Well, I don't have your experience here (I have a couple of 60G SSD's in RAID0, without TRIM, but that's hardly in the same class). So I don't expect you to put much weight on my opinions. But maybe it will give you reason for more testing. > We are using SSDs in very much the way that Tom de Mulder intends, > and from our extensive performance measurements over many months > now I can say that (at least if you do have significant amounts > of write operations) it _does_ make a lot of difference whether you > periodically discard the unused sectors or not. > (For us, the write performance measured to be about half as good > when there are no free erase blocks available anymore.) > If there are no free erase blocks, then your SSD's don't have enough over-provisioning. This is, after all, the whole point of having more physical flash than the logical disk size would suggest. Depending on the quality of the SSD (more expensive ones have more over-provisioning), and the usage patterns (if you have lots of small random writes, you'll need more extra space), then you might have to "manually" over-provision the disk by only partitioning about 90% of the disk. Of course, you must make sure that the remaining 10% is "discarded", or left untouched from new, and that you use the partition for your RAID and not the whole disk. So now you have plenty of erase blocks at any time, and your write performance will be good. TRIM, on the other hand, does not give you any extra free erase blocks. If you think it does, you've misunderstood it. TRIM exists to make garbage collection a little more efficient - when garbage collecting an erase block that contains TRIM'ed blocks, the TRIM'ed blocks don't need to be copied. This saves a small amount of time in the copying, and allows slightly denser packing. It may sometimes lead to saving whole erase blocks, but that's seldom the case in practice except when erasing large files. If your disks are reasonably full, then TRIM will not help much because the garbage collection will be desperately trying to piece together small bits into complete erase blocks, and your performance will drop through the floor. If you have plenty of overprovisioning, then the SSD still has lots of completely free erase blocks whenever it needs them. If your filesystem re-uses (logical) blocks, then TRIM will not help. It is /always/ more efficient for the FS to simply write new data to the same block, rather than TRIM'ing it first. TRIM is a very expensive command - it acts a bit like a write, but it is not a queued command. Thus the block layer must wait for /all/ IO commands to have completed, then issue the TRIM, then wait for it to complete, and then carry on with new commands. 
On some SSD's, it will (according to something I read) trigger garbage collection, which may slow down the SSD. Even without that, the performance of most meta-data operations (such as delete) will drop considerably when they also need to do TRIM. <http://people.redhat.com/jmoyer/discard/ext4_batched_discard/ext4_discard.html> <http://lwn.net/Articles/347511/> <http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=116034&threadid=115697&roomid=2> On the other hand, your off-line batch TRIM during low use periods could well be a win. The cost of these discards is not going to be an issue, and large batched discards are going to be far more useful to the SSD than small scattered ones. I believe that there has been work on a similar system in XFS - I don't know what happened to that, or if there is any way to make it work in concert with md raid. What will make a big difference to using SSD's in md raid is the sync/no-sync tracking. This will avoid a lot of unnecessary writes, especially with a new array, and leave the SSD with more free blocks (at least until the disk is getting full of data). It is also much higher up the things-to-do list, because it will be useful for all uses of md raid, and is a prerequisite to general discard support. (Strictly speaking it is not needed for SSD's that guarantee a zero return on TRIM'ed blocks - but only some SSD's give that guarantee.) > Of course, you can only benefit from discards if your filesystem > is not full (because then there is nothing to discard). But any > kind of "garbage collection" by the SSD itself will not have the > same effect, since it cannot know which blocks are in use by the > filesystem. > Garbage collection will recycle blocks that have been overwritten. The filesystem knows which logical blocks are in use, and which are free. Filesystems already heavily re-use blocks, in the aim of preferring faster outer tracks on HD's, and minimizing head movement. So when a file is erased, there's a good chance that those same logical blocks will be re-used soon - TRIM is of no benefit in that case. >> I think other SSD-optimisations, such as those in BTRFS, are much more >> important. > > Actually, (apart from btrfs still being in development, not really > ready for production use, yet), XFS (-o delaylog,barrier) performs > better on our SSDs than btrfs - without any SSD-specific options. > btrfs is ready for some uses, but is not mature and real-world tested enough for serious systems (and its tools are still lacking somewhat). But more generally, different filesystems are faster and slower for different usage patterns. One SSD optimisation that many filesystems could implement is to be less concerned about fragmentation. Most modern filesystems go out of their way to try to reduce fragmentation, which is great for HD use. But on SSD's, you should be happy to fragment files if it promotes re-use of erased blocks, as long as fragments aim to fill complete erase blocks (in size and alignment). > What is really an important factor for SSD performance: The controller. > The same SSDs perform with significantly lower latency for us when > connected to SATA controller channels than when connected to SAS > controllers (and they perform abysmal when used as hardware-RAID > constituents, in comparison). That is /very/ interesting to know, and is a data point I haven't read elsewhere (though I knew about poor performance of hardware RAID with SSD). Thanks for sharing that.
> > Regards, > > Lutz Vieweg > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 45+ messages in thread
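The off-line batch TRIM mentioned above is what the FITRIM ioctl provides (it is what the fstrim tool from util-linux issues): the filesystem walks its own free-space map and discards only unused ranges, in large chunks, at a time of the administrator's choosing. A rough sketch of the call, assuming a kernel and filesystem with FITRIM support (ext4 gained it in 2.6.37) and a mount point that sits directly on a TRIM-capable device rather than on md:

    #!/usr/bin/env python
    # Sketch: batched discard of a mounted filesystem's free space, i.e. what
    # "fstrim <mountpoint>" does, via the FITRIM ioctl.
    import fcntl, os, struct, sys

    FITRIM = 0xC0185879  # _IOWR('X', 121, struct fstrim_range)

    def fstrim(mountpoint, minlen=0):
        # struct fstrim_range { __u64 start; __u64 len; __u64 minlen; };
        rng = struct.pack("QQQ", 0, 2 ** 64 - 1, minlen)
        fd = os.open(mountpoint, os.O_RDONLY)
        try:
            res = fcntl.ioctl(fd, FITRIM, rng)
        finally:
            os.close(fd)
        # the kernel writes the number of trimmed bytes back into "len"
        return struct.unpack("QQQ", res)[1]

    if __name__ == "__main__":
        mnt = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ssd"
        print("%s: %d bytes trimmed" % (mnt, fstrim(mnt)))

If the filesystem or the underlying device cannot do discards, the ioctl normally just fails with EOPNOTSUPP rather than doing anything harmful.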
* Re: Software RAID and TRIM 2011-07-18 10:35 ` David Brown @ 2011-07-18 10:48 ` Tom De Mulder 2011-07-18 18:09 ` Lutz Vieweg 1 sibling, 0 replies; 45+ messages in thread From: Tom De Mulder @ 2011-07-18 10:48 UTC (permalink / raw) To: linux-raid On Mon, 18 Jul 2011, David Brown wrote: First, I'd like to say that I've done more testing, and found that even after very prolonged, sustained heavy use, the (Intel 510) SSDs I partitioned 50/50 with half left unused didn't show any degradation in performance. That's after about a week of constant writing/erasing. > If your disks are reasonably full, then TRIM will not help much because the > garbage collection will be desperately trying to piece together small bits > into complete erase blocks, and your performance will drop through the floor. However, it won't drop as low as it would without TRIM in the same situation. But with a continuous heavy workload, even TRIM won't help, and over-provisioning is the way to go. Best, -- Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH -> 18/07/2011 : The Moon is Waning Gibbous (83% of Full) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-18 10:35 ` David Brown @ 2011-07-18 18:09 ` Lutz Vieweg 2011-07-18 20:18 ` David Brown 1 sibling, 1 reply; 45+ messages in thread From: Lutz Vieweg @ 2011-07-18 18:09 UTC (permalink / raw) To: linux-raid On 07/18/2011 12:35 PM, David Brown wrote: > If there are no free erase blocks, then your SSD's don't have enough over-provisioning. When you think about "How many free erase blocks are enough?" you'll come to the conclusion that this simply depends on the usage pattern. Ideally, you'll want every write to a SSD to go to a completely free erase block, because if it doesn't, it's both slower and will probably also lead to a higher average number of write cycles (because more than one read-modify-write cycle per erase block may be required to fill it with new data, if that new data cannot be buffered in the SSDs RAM.) If the goal is to have every write go to a free erase block, then you need to free up at least as many erase blocks per time period as data will be written during that time period (assuming the worst case that all writes will _not_ go to blocks that have been written to before). Of course you can accomplish this by over-providing so much flash space that the SSD will always be capable of re-arranging the used data blocks such that they are tightly packed into fully used erase blocks, while the rest of the erase blocks are completely empty. But that is a pretty expensive approach, essentially this requires 100% over-provisioning (or: 50% usable capacity, or twice the price for the storage). And, you still have to trust that the SSD will use that over-provisioned space the way you want (e.g. the SSD firmware could be inclined to only re-arrange erase blocks that have a certain ratio of unused sectors within them). One good thing about explicitly discarding sectors, while using most of the offered space is (besides the significant cost argument) that your SSD will likely invest effort to re-arrange sectors into fully allocated and fully free erase blocks exactly at the time when this makes most sense for you. It will have to copy only data that is actually still valid (reducing wear), and you may even choose a time at which you know that significant amounts of data have been deleted. > Depending on the quality of the SSD (more expensive ones have more over-provisioning) Alas, manufacturers tend to ask twice the price for much less than twice the over-provisioning, so it's still advisable to buy the cheaper SSD and choose over-provisioning ratio by using only part of it... > TRIM, on the other hand, does not give you any extra free erase blocks. If you think it does, you've > misunderstood it. I have to disagree on this :-) Imagine a SSD with 10 erase blocks capacity, each having place for 10 sectors. Let's assume the SSD advertises only 90 sectors total capacity, over-providing one erase block. Now I write 8 files each of 10 sectors size on the SSDs, then delete 2 of the 8 files. If the SSD now performs some "garbage collection", it will not have more than 2 free erase blocks. But if I discard/TRIM the unused sectors, and the SSD does the right thing about it, there will be 4 free erase blocks. So, yes, TRIM can gain you extra free erase blocks, but of course only if there is unused space in the filesystem. > It may sometimes lead to saving > whole erase blocks, but that's seldom the case in practice except when erasing large files.
Our different perception may result from our use-case involving frequent deletion of files, while yours doesn't. But this is not only about "large files", only. Obviously, all modern SSDs are capable of re-arranging data into fully allocated and fully free erase-blocks, and this process can benefit from every single sector that has been discarded. > If your filesystem re-uses (logical) blocks, then TRIM will not help. If the only thing the filesystem does is overwriting blocks that held valid data right until they are overwritten with newer valid data, then TRIM will certainly not help. But every discard that happens in between an invalidation of data and the overwriting of the same logical block can potentially benefit from a TRIM in between. Imagine a file of 1000 sectors, all valid data. Now your application decides to overwrite that file with 1000 sectors of newer data. Let's assume the FS is clever enough to use the same 1000 logical sectors for this. But let's also assume the RAM-cache of the SSD is only 20 logical sectors in size, and one erase-block is 10 sectors in size. Now the SSD needs to start writing from its RAM buffer to flash at least after 20 sectors of data have been processed. If you are lucky, and everything was written in sequence, and well aligned, then the SSD may just need to erase and overwrite flash blocks that were formerly used for the same logical sectors. But if you are unlucky, the logical sectors to write are spread across different flash erase blocks. Thus the SSD can at best only mark them "unused" and has to write the data to a different (hopefully completely free) erase block. Again, if lucky (or heavily over-provisioned), you had >= 100 free erase blocks available when you started writing, and after they were written, 100 other erase blocks that held the older data can be freed after all 1000 sectors have been written. But if you are unlucky, not that many free erase blocks were available when starting to write. Then, to write the new data, the SSD needs to read data from non-completely-free erase blocks, fill the unused sectors within them with the new data, and write back the erase-blocks - which means much lower performance, and more wear. Now the same procedure with a "TRIM": After laying out the logical sectors to write to (but before writing to them), the filesystem can issue a "discard" on all those sectors. This will enable the SSD to mark all 100 erase blocks as completely free - even without additional "re-arranging". The following write operation to 1000 sectors may require erase-before write (if no pre-existing completely free erase-blocks can be used), but that is much better than having to do "read-modify-erase-write" cycles to the flash (and a larger number of that, since data has to be copied that the SSD cannot know to be obsolete). So: While re-arranging of valid data into erase-blocks may be expensive enough to do it only "batched" from time to time, even the simple marking of sectors as discarded can help the performance and endurance of a SSD. > It is /always/ more efficient > for the FS to simply write new data to the same block, rather than TRIM'ing it first. Depends on how expensive the marking of sectors as free is for the SSD, and how likely newly written data that fits into the SSDs cache will cause the freeing of complete erase blocks. > TRIM is a very expensive command That seems to depend a lot on the firmware of different drives. But I agree that it might not be a good idea to rely on it being cheap. 
From the behaviour of the SSDs we like best it seems that TRIM is often only causing cheap "marking as free" operations, while sometimes, every few weeks, the SSD is actually doing a lot of re-arranging ("garbage collecting"?) stuff after the discards have been issued. (Certainly also depends a lot on the usage pattern.) > I believe that there has been work on a similar system > in XFS Yes, XFS supports that now, but alas, we cannot use it with MD, as MD will discard the discards :-) > What will make a big difference to using SSD's in md raid is the sync/no-sync tracking. This will > avoid a lot of unnecessary writes, especially with a new array, and leave the SSD with more free > blocks (at least until the disk is getting full of data). Hmmm... the sync/no-sync tracking will save you exactly one write to all sectors. That's certainly a good thing, but since a single "fstrim" after the sync will restore the "good performance" situation, I don't consider that an urgent feature. > Filesystems already heavily re-use blocks, in the aim > of preferring faster outer tracks on HD's, and minimizing head movement. So when a file is erased, > there's a good chance that those same logical blocks will be re-used soon - TRIM is of no benefit in > that case. It is of benefit - to the performance of exactly those writes that go to the formerly used logical blocks. > btrfs is ready for some uses, but is not mature and real-world tested enough for serious systems > (and its tools are still lacking somewhat). Let's not divert the discussion too much. I'll happily re-try btrfs when the developers say it's not experimental anymore, and when there's a "fsck"-like utility to check its integrity. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
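The 10-erase-block example above is easy to sanity-check with a toy calculation; the sketch below just counts completely free erase blocks under the two assumptions (drive-internal garbage collection that cannot see filesystem deletions, versus the filesystem discarding the freed sectors). It is an idealised model, not a description of any real firmware:

    #!/usr/bin/env python
    # Toy model of the example above: 10 erase blocks of 10 sectors, 90 sectors
    # advertised (one block over-provisioned). Write 8 files of 10 sectors
    # each, then delete 2 of them.
    BLOCKS, SECTORS_PER_BLOCK = 10, 10
    FILES, FILE_SECTORS, DELETED = 8, 10, 2

    written = FILES * FILE_SECTORS            # 80 sectors were written to flash
    live = (FILES - DELETED) * FILE_SECTORS   # 60 sectors the filesystem still uses

    # Without TRIM the drive must treat all written sectors as valid,
    # so at best 10 - 80/10 = 2 erase blocks can be completely free.
    free_without_trim = BLOCKS - written // SECTORS_PER_BLOCK

    # With TRIM the drive knows only 60 sectors are live and can pack them
    # into 6 erase blocks, leaving 4 completely free.
    free_with_trim = BLOCKS - live // SECTORS_PER_BLOCK

    print("free erase blocks without TRIM:", free_without_trim)  # -> 2
    print("free erase blocks with TRIM:   ", free_with_trim)     # -> 4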
* Re: Software RAID and TRIM 2011-07-18 18:09 ` Lutz Vieweg @ 2011-07-18 20:18 ` David Brown 2011-07-19 9:29 ` Lutz Vieweg 0 siblings, 1 reply; 45+ messages in thread From: David Brown @ 2011-07-18 20:18 UTC (permalink / raw) To: linux-raid On 18/07/11 20:09, Lutz Vieweg wrote: > On 07/18/2011 12:35 PM, David Brown wrote: >> If there are no free erase blocks, then your SSD's don't have enough >> over-provisioning. > > When you think about "How many free erase blocks are enough?" you'll > come to the conclusion that this simply depends on the usage pattern. > Yes. > Ideally, you'll want every write to a SSD to go to a completely free > erase block, because if it doesn't, it's both slower and will probably > also lead to a higher average number of write cycles (because more than > one read-modify-write cycle per erase block may be required to fill it > with new data, if that new data cannot be buffered in the SSDs RAM.) > No. You don't need to fill an erase block for writing - writes are done as write blocks (I think 4K is the norm). That's the odd thing about flash - erase is done in much larger blocks than writes. > If the goal is to have every write go to a free erase block, then you > need to free up at least as many erase blocks per time period as data > will be written during that time period (assuming the worst case that > all writes will _not_ go to blocks that have been written to before). > Again, no - since you don't have to write to whole erase blocks. > Of course you can accomplish this by over-providing so much flash space > that the SSD will always be capable of re-arranging the used data blocks > such that they are tightly packed into fully used erase blocks, while > the rest of the erase blocks are completely empty. > But that is a pretty expensive approach, essentially this requires 100% > over-provisioning (or: 50% usable capacity, or twice the price for the > storage). The level of over-provisioning that can be useful will depend on the usage patterns, such as how much and how scattered your deletes are. There will be diminishing returns for increased overprovisioning - the balance is up to the user, but I can't imagine 50% being sensible. I wonder if you are mixing up the theoretical peak write speeds to a new SSD with real-world write speeds to a disk in use. These are not the same, and no amount of TRIM'ing or over-provisioning will let you see those speeds in anything but a synthetic benchmark. Your aim is /not/ to go mad trying to reach the marketing-claimed speeds in a real application, but to balance /good/ and /consistent/ speeds with a sensible cost. Understand that SSD's are very fast, but not as fast as a marketer or an initial benchmark suggests, and you will be much happier with your disks. > And, you still have to trust that the SSD will use that over-provisioned > space the way you want (e.g. the SSD firmware could be inclined to only > re-arrange erase blocks that have a certain ratio of unused sectors > within them). > You want to pick an SSD with good garbage collection, if that's what you mean. > One good thing about explicitly discarding sectors, while using most of > the offered space is (besides the significant cost argument) that your > SSD will likely invest effort to re-arrange sectors into fully allocated > and fully free erase blocks exactly at the time when this makes most > sense for you.
It will have to copy only data that is actually still > valid (reducing wear), and you may even choose a time at which you know > that significant amounts of data have been deleted. > The reality is that for most applications and usage patterns, logical blocks that are deleted and not re-used are in the minority. It is true that when garbage-collecting a block, the SSD can hop over the discarded blocks. But since they are in the minority, it's a small effect. It could even be a detrimental effect - it could encourage the SSD to garbage-collect a block that would otherwise be left untouched, leading to extra effort and wear (but giving you a little more free space). Any effort done by the SSD on TRIM'ed blocks is wasted if these (logical) blocks are overwritten by the filesystem later, except if the SSD was otherwise short on free blocks. Again, the use of explicit batch discards gives a better effect than automatic TRIMs on deletes. >> Depending on the quality of the SSD (more expensive ones have more >> over-provisioning) > > Alas, manufacturers tend to ask twice the price for much less than twice > the over-provisioning, > so it's still advisable to buy the cheaper SSD and choose > over-provisioning ratio by using > only part of it... > Fair enough. > >> TRIM, on the other hand, does not give you any extra free erase >> blocks. If you think it does, you've >> misunderstood it. > > I have to disagree on this :-) > > Imagine a SSD with 10 erase blocks capacity, each having place for 10 > sectors. > Let's assume the SSD advertises only 90 sectors total capacity, > over-providing one erase block. > Now I write 8 files each of 10 sectors size on the SSDs, then delete 2 > of the 8 files. > > If the SSD now performs some "garbage collection", it will not have more > than 2 free erase blocks. > > But if I discard/TRIM the unused sectors, and the SSD does the right > thing about it, there will be 4 free erase blocks. > > So, yes, TRIM can gain you extra free erase blocks, but of course only > if there is unused space in the filesystem. > OK, let me rephrase - TRIM does not give you /significantly/ more free erase blocks /in real life/. You can construct arrangements, like you described, where the SSD can get noticeably more erase blocks through the use of TRIM. But under use, things are different as blocks are written and re-written. Your example would break as soon as you take into account the writing of the directory to the disk, messing up your neat blocks. And again, appropriately scheduled batch TRIM will give better results than automatic TRIM, and /may/ be worth the effort. > >> It may sometimes lead to saving >> whole erase blocks, but that's seldom the case in practice except when >> erasing large files. > > Our different perception may result from our use-case involving frequent > deletion of files, while yours doesn't. > Perhaps. The nature of most filesystems is to grow - more data gets written than erased. But many of the effects here are usage pattern dependent. > But this is not only about "large files", only. Obviously, all modern > SSDs are capable of re-arranging data into fully allocated and fully > free erase-blocks, and this process can benefit from every single sector > that has been discarded. > > >> If your filesystem re-uses (logical) blocks, then TRIM will not help. > > If the only thing the filesystem does is overwriting blocks that held > valid data right until they are overwritten with newer valid data, then > TRIM will certainly not help. 
> > But every discard that happens in between an invalidation of data and > the overwriting of the same logical block can potentially benefit from a > TRIM in between. Imagine a file of 1000 sectors, all valid data. Now > your application decides to overwrite that file with 1000 sectors of > newer data. Let's assume the FS is clever enough to use the same 1000 > logical sectors for this. But let's also assume the RAM-cache of the SSD > is only 20 logical sectors in size, and one erase-block is 10 > sectors in size. Now the SSD needs to start writing from its RAM buffer > to flash at least after 20 sectors of data have been processed. If you > are lucky, and everything was written in sequence, and well aligned, > then the SSD may just need to erase and overwrite flash blocks that were > formerly used for the same logical sectors. But if you are unlucky, the > logical sectors to write are spread across different flash erase blocks. > Thus the SSD can at best only mark them "unused" and has to write the > data to a different (hopefully completely free) erase block. Again, if > lucky (or heavily over-provisioned), you had >= 100 free erase blocks > available when you started writing, and after they were written, 100 > other erase blocks that held the older data can be freed after all 1000 > sectors have been written. But if you are unlucky, not that many free > erase blocks were available when starting to write. Then, to write the > new data, the SSD needs to read data from non-completely-free erase > blocks, fill the unused sectors within them with the new data, and write > back the erase-blocks - which means much lower performance, and more wear. > Now the same procedure with a "TRIM": After laying out the logical > sectors to write to (but before writing to them), the filesystem can > issue a "discard" on all those sectors. This will enable the SSD to mark > all 100 erase blocks as completely free - even without additional > "re-arranging". The following write operation to 1000 sectors may > require erase-before write (if no pre-existing completely free > erase-blocks can be used), but that is much better than having to do > "read-modify-erase-write" cycles to the flash (and a larger number of > that, since data has to be copied that the SSD cannot know to be obsolete). > > So: While re-arranging of valid data into erase-blocks may be expensive > enough to do it only "batched" from time to time, even the simple > marking of sectors as discarded can help the performance and endurance > of a SSD. > Again, I think your arguments only work on very artificial data. But perhaps this is close to your real-world usage patterns. >> It is /always/ more efficient >> for the FS to simply write new data to the same block, rather than >> TRIM'ing it first. > > Depends on how expensive the marking of sectors as free is for the SSD, > and how likely newly written data that fits into the SSDs cache will > cause the freeing of complete erase blocks. > > >> TRIM is a very expensive command > > That seems to depend a lot on the firmware of different drives. > But I agree that it might not be a good idea to rely on it being cheap. > > From the behaviour of the SSDs we like best it seems that TRIM is often > only causing cheap "marking as free" operations, while sometimes, every > few weeks, the SSD is actually doing a lot of re-arranging ("garbage > collecting"?) stuff after the discards have been issued. > (Certainly also depends a lot on the usage pattern.) 
> My main point about TRIM being expensive is the effect it has on the block IO queue, regardless of the implementation in the SSD. Again, this is less relevant to batched TRIMs during low-use times. >> I believe that there has been work on a similar system >> in XFS > > Yes, XFS supports that now, but alas, we cannot use it with MD, as MD > will discard the discards :-) > > >> What will make a big difference to using SSD's in md raid is the >> sync/no-sync tracking. This will >> avoid a lot of unnecessary writes, especially with a new array, and >> leave the SSD with more free >> blocks (at least until the disk is getting full of data). > > Hmmm... the sync/no-sync tracking will save you exactly one write to all > sectors. That's certainly a good thing, but since a single "fstrim" > after the sync will restore the "good performance" situation, I don't > consider that an urgent feature. > I really hope your SSD's return zeros for TRIM'ed blocks, and that you are sure all your TRIMs are in full raid stripes - otherwise you will /seriously/ mess up your raid arrays. One definite problem with RAID on SSD's is that this first write will mean that the SSD has no more free erase blocks than if the filesystem were full, as the SSD doesn't know the blocks can be recycled. Of course, it will see that pretty quickly as soon as the filesystem writes real data, but it will still have extra waste. For mirrored drives, this may mean a difference in speed in the two drives as one has more freedom for garbage collection than the other (for RAID5, this effect is spread evenly over the disks). > >> Filesystems already heavily re-use blocks, in the aim >> of preferring faster outer tracks on HD's, and minimizing head >> movement. So when a file is erased, >> there's a good chance that those same logical blocks will be re-used >> soon - TRIM is of no benefit in >> that case. > > It is of benefit - to the performance of exactly those writes that go to > the formerly used logical blocks. > > >> btrfs is ready for some uses, but is not mature and real-world tested >> enough for serious systems >> (and its tools are still lacking somewhat). > > Let's not divert the discussion too much. I'll happily re-try btrfs when > the developers say it's not experimental anymore, and when there's a > "fsck"-like utility to check its integrity. > > Regards, > > Lutz Vieweg > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-18 20:18 ` David Brown @ 2011-07-19 9:29 ` Lutz Vieweg 2011-07-19 10:22 ` David Brown 0 siblings, 1 reply; 45+ messages in thread From: Lutz Vieweg @ 2011-07-19 9:29 UTC (permalink / raw) To: linux-raid On 07/18/2011 10:18 PM, David Brown wrote: > You don't need to fill an erase block for writing - writes are done as write blocks (I think 4K is > the norm). You are right on that. Those sectors in a partially used erase block that have not been written to since the last erase of the whole erase block can be written to as good as sectors in completely empty erase blocks. > My main point about TRIM being expensive is the effect it has on the block IO queue, regardless of > the implementation in the SSD. Because of those effects on the block-IO-queue, the user-space work-around we implemented to discard the SSDs our RAID-1s consist of will not discard "one area on all SSDs at a time", but rather iterate first through all unused areas on one SSD, then iterate through the same list of areas on the second SSD. The effect of this is very much to our liking: While we can see near-100%-utilization on one SSD at a time during the discards, the other SSD will happily service the readers, and even the writes that go to the /dev/md* device are buffered in main memory long enough that we do not really see a significantly bad impact on the service. (This might be different, though, if the discards were done during peak-write-load times of the day.) > I really hope your SSD's return zeros for TRIM'ed blocks For RAID-1, the only consequence of not doing so is just that "data-check" runs may result in a > 0 mismatch_cnt. It does not destroy any of your data, and as long as I have two SSDs in a RAID, both of which give a non-error result when reading a sector, I would have no indication of "which of the returned sector contents to prefer", anyway. (I admit that for health monitoring it is useful to have a meaningful mismatch_cnt.) > and that you are sure all your TRIMs are > in full raid stripes - otherwise you will /seriously/ mess up your raid arrays. Again, for RAID0/1 (even 10) I don't see why this would harm any data. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
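For the health-monitoring point above: an explicit data-check can be triggered, and the resulting mismatch count read back, through md's sysfs interface. A small sketch (the array name is an example and it needs root; as noted, on SSDs that do not return a deterministic value for trimmed sectors a non-zero count does not necessarily indicate real damage):

    #!/usr/bin/env python
    # Sketch: start an md "check" pass and report mismatch_cnt when it is done,
    # via /sys/block/<md>/md/.
    import os, sys, time

    def check_array(md="md0"):
        base = "/sys/block/%s/md" % md
        open(os.path.join(base, "sync_action"), "w").write("check\n")
        while open(os.path.join(base, "sync_action")).read().strip() != "idle":
            time.sleep(10)  # poll until the check pass has finished
        return int(open(os.path.join(base, "mismatch_cnt")).read())

    if __name__ == "__main__":
        md = sys.argv[1] if len(sys.argv) > 1 else "md0"
        print("%s: mismatch_cnt = %d" % (md, check_array(md)))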
* Re: Software RAID and TRIM 2011-07-19 9:29 ` Lutz Vieweg @ 2011-07-19 10:22 ` David Brown 2011-07-19 13:41 ` Lutz Vieweg 2011-07-19 14:19 ` Tom De Mulder 0 siblings, 2 replies; 45+ messages in thread From: David Brown @ 2011-07-19 10:22 UTC (permalink / raw) To: linux-raid On 19/07/2011 11:29, Lutz Vieweg wrote: > On 07/18/2011 10:18 PM, David Brown wrote: >> You don't need to fill an erase block for writing - writes are done >> as write blocks (I think 4K is the norm). > > You are right on that. Those sectors in a partially used erase block > that have not been written to since the last erase of the whole erase > block can be written to as good as sectors in completely empty erase > blocks. > > >> My main point about TRIM being expensive is the effect it has on >> the block IO queue, regardless of the implementation in the SSD. > > Because of those effects on the block-IO-queue, the user-space > work-around we implemented to discard the SSDs our RAID-1s consist of > will not discard "one area on all SSDs at a time", but rather iterate > first through all unused areas on one SSD, then iterate through the > same list of areas on the second SSD. > Do you take the arrays off-line during this process, or at least make them read-only? If not, how do you ensure that the lists are valid? > The effect of this is very much to our liking: While we can see > near-100%-utilization on one SSD at a time during the discards, the > other SSD will happily service the readers, and even the writes that > go to the /dev/md* device are buffered in main memory long enough > that we do not really see a significantly bad impact on the service. > (This might be different, though, if the discards were done during > peak-write-load times of the day.) > > >> I really hope your SSD's return zeros for TRIM'ed blocks > > For RAID-1, the only consequence of not doing so is just that > "data-check" runs may result in a > 0 mismatch_cnt. It does not > destroy any of your data, and as long as I have two SSDs in a RAID, > both of which give a non-error result when reading a sector, I would > have no indication of "which of the returned sector contents to > prefer", anyway. > > (I admit that for health monitoring it is useful to have a meaningful > mismatch_cnt.) > >> and that you are sure all your TRIMs are in full raid stripes - >> otherwise you will /seriously/ mess up your raid arrays. > > Again, for RAID0/1 (even 10) I don't see why this would harm any > data. > Fair enough for RAID1. Just don't try it with RAID5! ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-19 10:22 ` David Brown @ 2011-07-19 13:41 ` Lutz Vieweg 2011-07-19 15:06 ` David Brown 2011-07-19 14:19 ` Tom De Mulder 1 sibling, 1 reply; 45+ messages in thread From: Lutz Vieweg @ 2011-07-19 13:41 UTC (permalink / raw) To: linux-raid On 07/19/2011 12:22 PM, David Brown wrote: >> Because of those effects on the block-IO-queue, the user-space >> work-around we implemented to discard the SSDs our RAID-1s consist of >> will not discard "one area on all SSDs at a time", but rather iterate >> first through all unused areas on one SSD, then iterate through the >> same list of areas on the second SSD. > > Do you take the arrays off-line during this process, or at least make > them read-only? No, we keep them online and writeable. > If not, how do you ensure that the lists are valid? The discard procedure works by..: - use SYS_fallocate to allocate the free space on the device (minus some safety margin for the writes that will happen during the procedure) for a temporary file (notice that with fallocate on XFS, you can allocate space for a file without actually ever writing to it) - use ioctl FIEMAP to get a list of the logical blocks that were allocated - use ioctl BLKDISCARD to discard these blocks - remove the temporary file Since the blocks to discard are allocated for the temporary file during the procedure, they will not be used otherwise. Obviously, we would still prefer using "fstrim", because then there would be no need for that temporary file, the "safety margin" and a temporary high fill level of the filesystem. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
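A rough code outline of that procedure (written for illustration, not the actual tool referred to above): it assumes the filesystem sits directly on the named device starting at offset 0, so the physical offsets FIEMAP reports can be fed straight to BLKDISCARD; for md RAID1 members one would additionally add each member's data offset and repeat the discard pass once per device. The paths, the safety margin and the extent batch size are made-up placeholders, and it needs root:

    #!/usr/bin/env python
    # Sketch of the user-space batched-discard workaround described above:
    #   1. fallocate a temporary file covering (most of) the free space
    #   2. FIEMAP the file to learn which physical ranges it occupies
    #   3. BLKDISCARD those ranges on the underlying device
    #   4. delete the temporary file
    import ctypes, fcntl, os, struct

    DEVICE = "/dev/sdb1"               # device holding the filesystem (example)
    TMPFILE = "/mnt/ssd/.discard-tmp"  # temporary file on that filesystem
    MARGIN = 512 * 1024 * 1024         # leave 512 MiB for writes during the run

    FIEMAP = 0xC020660B                # _IOWR('f', 11, struct fiemap)
    BLKDISCARD = 0x1277                # _IO(0x12, 119)
    HDR_SIZE, EXTENT_SIZE, BATCH = 32, 56, 16
    FIEMAP_FLAG_SYNC, FIEMAP_EXTENT_LAST = 0x1, 0x1

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_longlong, ctypes.c_longlong]

    def file_extents(fd):
        # yield (physical_offset, length) of the file's extents via FIEMAP
        logical = 0
        while True:
            hdr = struct.pack("QQIIII", logical, 2 ** 64 - 1 - logical,
                              FIEMAP_FLAG_SYNC, 0, BATCH, 0)
            res = fcntl.ioctl(fd, FIEMAP, hdr + b"\0" * (EXTENT_SIZE * BATCH))
            mapped = struct.unpack("QQIIII", res[:HDR_SIZE])[3]
            if mapped == 0:
                return
            last = 0
            for i in range(mapped):
                off = HDR_SIZE + i * EXTENT_SIZE
                fe_logical, fe_physical, fe_length = struct.unpack(
                    "QQQ", res[off:off + 24])
                last = struct.unpack("I", res[off + 40:off + 44])[0] \
                       & FIEMAP_EXTENT_LAST
                yield fe_physical, fe_length
                logical = fe_logical + fe_length
            if last:
                return

    def batched_discard():
        st = os.statvfs(os.path.dirname(TMPFILE))
        size = st.f_bavail * st.f_frsize - MARGIN
        fd = os.open(TMPFILE, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            if libc.fallocate(fd, 0, 0, size) != 0:
                raise OSError(ctypes.get_errno(), "fallocate failed")
            dev = os.open(DEVICE, os.O_WRONLY)
            try:
                for phys, length in file_extents(fd):
                    fcntl.ioctl(dev, BLKDISCARD,
                                struct.pack("QQ", phys, length))
            finally:
                os.close(dev)
        finally:
            os.close(fd)
            os.unlink(TMPFILE)

    if __name__ == "__main__":
        batched_discard()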
* Re: Software RAID and TRIM 2011-07-19 13:41 ` Lutz Vieweg @ 2011-07-19 15:06 ` David Brown 2011-07-20 10:39 ` Lutz Vieweg 0 siblings, 1 reply; 45+ messages in thread From: David Brown @ 2011-07-19 15:06 UTC (permalink / raw) To: linux-raid On 19/07/2011 15:41, Lutz Vieweg wrote: > On 07/19/2011 12:22 PM, David Brown wrote: >>> Because of those effects on the block-IO-queue, the user-space >>> work-around we implemented to discard the SSDs our RAID-1s consist of >>> will not discard "one area on all SSDs at a time", but rather iterate >>> first through all unused areas on one SSD, then iterate through the >>> same list of areas on the second SSD. >> >> Do you take the arrays off-line during this process, or at least make >> them read-only? > > No, we keep them online and writeable. > >> If not, how do you ensure that the lists are valid? > > The discard procedure works by..: > > - use SYS_fallocate to allocate the free space on the device (minus > some safety margin for the writes that will happen during the procedure) > for a temporary file (notice that with fallocate on XFS, you can > allocate space for a file without actually ever writing to it) > > - use ioctl FIEMAP to get a list of the logical blocks that were > allocated > > - use ioctl BLKDISCARD to discard these blocks > > - remove the temporary file > > Since the blocks to discard are allocated for the temporary > file during the procedure, they will not be used otherwise. > > Obviously, we would still prefer using "fstrim", because then > there would be no need for that temporary file, the "safety margin" > and a temporary high fill level of the filesystem. > > Regards, > > Lutz Vieweg > It certainly sounds like a safe procedure, but I can see why you feel it's not quite as elegant as it could be. You will also be "discarding" blocks that have never been written (at least, not since the last discard...) - is there much overhead in that? ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-19 15:06 ` David Brown @ 2011-07-20 10:39 ` Lutz Vieweg 0 siblings, 0 replies; 45+ messages in thread From: Lutz Vieweg @ 2011-07-20 10:39 UTC (permalink / raw) To: linux-raid On 07/19/2011 05:06 PM, David Brown wrote: > It certainly sounds like a safe procedure, but I can see why you feel it's not quite as elegant as > it could be. You will also be "discarding" blocks that have never been written (at least, not since > the last discard...) - is there much overhead in that? Luckily the SSDs we use do not require significant time to process a discard on areas that were already free - e.g. discarding ~ 250G of SSD space that is already empty this way takes only ~ 10 seconds. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-19 10:22 ` David Brown 2011-07-19 13:41 ` Lutz Vieweg @ 2011-07-19 14:19 ` Tom De Mulder 2011-07-20 7:42 ` David Brown 2011-07-20 12:13 ` Werner Fischer 1 sibling, 2 replies; 45+ messages in thread From: Tom De Mulder @ 2011-07-19 14:19 UTC (permalink / raw) To: linux-raid In case people are interested, I ran more benchmarks. The impact of TRIM on an over-provisioned drive is remarkable: a 25% performance loss when using Postmark. Because this isn't really on-topic for the MD mailing list, I've put it somewhere else: http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/ My next goal, when I have the time, is to compare different amounts of over-provisioning. -- Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH -> 19/07/2011 : The Moon is Waning Gibbous (75% of Full) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-19 14:19 ` Tom De Mulder @ 2011-07-20 7:42 ` David Brown 2011-07-20 12:20 ` Lutz Vieweg 2011-07-20 12:13 ` Werner Fischer 1 sibling, 1 reply; 45+ messages in thread From: David Brown @ 2011-07-20 7:42 UTC (permalink / raw) To: linux-raid On 19/07/2011 16:19, Tom De Mulder wrote: > > In case people are interested, I ran more benchmarks. The impact of TRIM > on an over-provisioned drive is remarkable: a 25% performance loss when > using Postmark. > > Because this isn't really on-topic for the MD mailing list, I've put it > somewhere else: > It is a little off-topic, perhaps, but still of interest to many RAID users precisely because of the myths and inaccurate data surrounding TRIM. There are too many people that think TRIM is essential to SSD's, RAID doesn't support TRIM, therefore you shouldn't use RAID and SSD's together. > http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/ > To try to explain your results - first it's easy to see why md raid1 with discard is a little slower than md raid1 without discard - the raid layer ignores the discards, so they can't help or hinder much, and the filesystem is doing a bit of extra work (sending the discards) to no purpose. It is also easy to see why a single SSD with no discards is about the same speed. You are using RAID1 - reads and writes are not striped in any way, so the speed is the same as for a single disk. If the test accessed multiple files in parallel (especially reads), you'd see faster reads. The telling figure here, though, is that TRIM made the single drive significantly slower. > My next goal, when I have the time, is to compare different amounts of > over-provisioning. > Also try using RAID10,far for your arrays. That will work the SSD's harder, and perhaps give a better comparison. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-20 7:42 ` David Brown @ 2011-07-20 12:20 ` Lutz Vieweg 0 siblings, 0 replies; 45+ messages in thread From: Lutz Vieweg @ 2011-07-20 12:20 UTC (permalink / raw) To: linux-raid On 07/20/2011 09:42 AM, David Brown wrote: >> http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/ > > The telling figure here, though, is that TRIM made the single drive significantly slower. More precisely, online-TRIM of ext4 on Intel SSDs seems to be a bad combination. I think it's clear you cannot gain much from TRIM if you're willing to spend the money for 2 times overprovisioning, anyway. You can lose significantly from online-trim when the filesystem issues a lot of TRIM commands all the time and when the SSD is slow to process them. TRIM gains you an advantage with less over-provisioning, and is better done in batches after significant amounts of data have been written/deleted. When you try with different levels of over-provisioning, also try with batched discards (fstrim) between runs of your benchmark. Regards, Lutz Vieweg ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-19 14:19 ` Tom De Mulder 2011-07-20 7:42 ` David Brown @ 2011-07-20 12:13 ` Werner Fischer 2011-07-20 12:25 ` Lutz Vieweg 1 sibling, 1 reply; 45+ messages in thread From: Werner Fischer @ 2011-07-20 12:13 UTC (permalink / raw) To: linux-raid On Tue, 2011-07-19 at 15:19 +0100, Tom De Mulder wrote: > In case people are interested, I ran more benchmarks. The impact of TRIM > on an over-provisioned drive is remarkable: a 25% performance loss when > using Postmark. > > Because this isn't really on-topic for the MD mailing list, I've put it > somewhere else: > > http://tdm27.wordpress.com/2011/07/19/some-solid-state-drive-benchmarks/ > > My next goal, when I have the time, is to compare different amounts of > over-provisioning. There is a paper from Intel "Over-provisioning an Intel® SSD" (analyzing X25-M 160 GB Gen.2 SSDs): http://cache-www.intel.com/cd/00/00/45/95/459555_459555.pdf On page 10 of this Intel presentation they mention that a spare area >27% of native capacity has diminishing returns for such an SSD: http://maltiel-consulting.com/Enterprise_Data_Integrity_Increasing_Endurance.pdf Regards, Werner -- : Werner Fischer : Technology Specialist : Thomas-Krenn.AG | The server-experts : http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-20 12:13 ` Werner Fischer @ 2011-07-20 12:25 ` Lutz Vieweg 0 siblings, 0 replies; 45+ messages in thread From: Lutz Vieweg @ 2011-07-20 12:25 UTC (permalink / raw) To: linux-raid On 07/20/2011 02:13 PM, Werner Fischer wrote: > There is a paper from Intel "Over-provisioning an Intel® SSD" (analyzing > X25-M 160 GB Gen.2 SSDs): > http://cache-www.intel.com/cd/00/00/45/95/459555_459555.pdf > > On page 10 of this Intel presentation they mention that a spare area >> 27% of native capacity has diminishing returns for such an SSD: > http://maltiel-consulting.com/Enterprise_Data_Integrity_Increasing_Endurance.pdf (This latter document is password protected.) The first document, though, claims almost linear benefit (regarding IOs/sec) from much higher amounts of over-provisioning. Alas, their chart does not extend into the region where saturation of the effect must occur for sure. Regards, Lutz Vieweg -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-17 21:52 ` Lutz Vieweg 2011-07-18 5:14 ` Mikael Abrahamsson 2011-07-18 10:35 ` David Brown @ 2011-07-18 10:53 ` Tom De Mulder 2011-07-18 12:13 ` Werner Fischer 2 siblings, 1 reply; 45+ messages in thread From: Tom De Mulder @ 2011-07-18 10:53 UTC (permalink / raw) To: linux-raid On Sun, 17 Jul 2011, Lutz Vieweg wrote: > What is really an important factor for SSD performance: The controller. > The same SSDs perform with significantly lower latency for us when > connected to SATA controller channels than when connected to SAS > controllers (and they perform abysmal when used as hardware-RAID > constituents, in comparison). Interesting. I think it depends a lot on the controller. On a Dell server with PERC5/i RAID controller (actually made by LSI) I saw some performance degradation but not enough that I'd consider it a deal-breaker for situations where I really cared about the RAID functionality, more than about the loss of performance. After all, the latency is still massively lower than it is with spinning disk. I have a really great Areca RAID controller in a different server, but unfortunately it's in use and it'll be a while before I get another one I can use for testing. Given how well it does in other respects, I have high hopes for it. Best, -- Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH -> 18/07/2011 : The Moon is Waning Gibbous (83% of Full) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Software RAID and TRIM 2011-07-18 10:53 ` Tom De Mulder @ 2011-07-18 12:13 ` Werner Fischer 0 siblings, 0 replies; 45+ messages in thread From: Werner Fischer @ 2011-07-18 12:13 UTC (permalink / raw) To: linux-raid On Mon, 2011-07-18 at 11:53 +0100, Tom De Mulder wrote: > On Sun, 17 Jul 2011, Lutz Vieweg wrote: > > > What is really an important factor for SSD performance: The controller. > > The same SSDs perform with significantly lower latency for us when > > connected to SATA controller channels than when connected to SAS > > controllers (and they perform abysmal when used as hardware-RAID > > constituents, in comparison). > > Interesting. > > I think it depends a lot on the controller. On a Dell server with PERC5/i > RAID controller (actually made by LSI) I saw some performance degradation > but not enough that I'd consider it a deal-breaker for situations where I > really cared about the RAID functionality, more than about the loss of > performance. After all, the latency is still massively lower than it is > with spinning disk. > > I have a really great Areca RAID controller in a different server, but > unfortunately it's in use and it'll be a while before I get another one I > can use for testing. Given how well it does in other respects, I have high > hopes for it. I agree that the controller can influence performance: 1. SATA controller: direct communication 2. SAS controller: Serial ATA Tunneling Protocol (STP) is used, this can have an impact on performance 3. Hardware RAID controller: depending on the controller, performance impact can be from low to very high Regards, Werner > > > Best, > > -- > Tom De Mulder <tdm27@cam.ac.uk> - Cambridge University Computing Service > +44 1223 3 31843 - New Museums Site, Pembroke Street, Cambridge CB2 3QH > -> 18/07/2011 : The Moon is Waning Gibbous (83% of Full) > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- : Werner Fischer : Technology Specialist : Thomas-Krenn.AG | The server-experts : http://www.thomas-krenn.com | http://www.thomas-krenn.com/wiki ^ permalink raw reply [flat|nested] 45+ messages in thread