From: Roberto Spadim <roberto@spadim.com.br>
To: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
Cc: "Eric D. Mudama" <edmudama@bounceswoosh.org>,
"Scott E. Armitage" <launchpad@scott.armitage.name>,
David Brown <david@westcontrol.com>,
linux-raid@vger.kernel.org
Subject: Re: SSD - TRIM command
Date: Wed, 9 Feb 2011 17:27:41 -0200 [thread overview]
Message-ID: <AANLkTikD78oz3PHUAKSUWo3MiThowEZX4+RmaAOjDepP@mail.gmail.com> (raw)
In-Reply-To: <20110209192101.GA20745@lazy.lzy>
just to make READ behave consistently with any mix of drives:
if the device supports TRIM, use it;
if not, WRITE 0x000000... to the range.
afterwards, if we READ from /dev/md0,
we get the same information (0x000000) no matter whether the member is an SSD
or a hard disk, with or without the TRIM function
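The idea can be sketched as a toy model; the class and method names below are illustrative, not md's real interface, and the model assumes deterministic all-zero reads after TRIM:

```python
# Toy model of a two-member mirror where one member lacks TRIM.  An
# md-level "discard" either issues TRIM (assumed deterministic-zero
# reads) or emulates it with an ordinary WRITE of zeroes, so a later
# READ returns the same data from either member.
SECTOR = 512

class Member:
    def __init__(self, supports_trim):
        self.supports_trim = supports_trim
        self.data = bytearray(8 * SECTOR)
        self.trimmed = set()          # sectors the drive may reclaim

    def discard(self, sector, count):
        sectors = range(sector, sector + count)
        if self.supports_trim:
            # Real TRIM: mark the range unused; the drive is assumed
            # to return zeroes for it afterwards.
            self.trimmed.update(sectors)
        else:
            # Emulation: an explicit WRITE of zeroes.
            for s in sectors:
                self.data[s * SECTOR:(s + 1) * SECTOR] = bytes(SECTOR)

    def read(self, sector):
        if sector in self.trimmed:
            return bytes(SECTOR)      # deterministic zero after TRIM
        return bytes(self.data[sector * SECTOR:(sector + 1) * SECTOR])

ssd = Member(supports_trim=True)
hdd = Member(supports_trim=False)
for m in (ssd, hdd):                  # same data on both mirrors
    m.data[0:SECTOR] = b"\xff" * SECTOR

for m in (ssd, hdd):                  # md-level discard of sector 0
    m.discard(0, 1)

# Both members now return identical (zero) data for the range.
assert ssd.read(0) == hdd.read(0) == bytes(SECTOR)
```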
ext4 sends the TRIM command (but it's a user option, and should be used only
with TRIM-supporting disks)
swap sends it too (it's not a user option; the kernel checks whether the device
can execute TRIM and, if not, doesn't send it (I don't know exactly what it
does instead, but we could reuse the same code to 'emulate' the TRIM command, like swap does))
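The kernel's capability check can be approximated from userspace by reading the request-queue attributes under sysfs; on Linux, a `discard_max_bytes` of 0 means the device will not accept discard requests. A small sketch (the helper name is made up; the sysfs paths are the real ones):

```python
# Check whether a block device advertises discard (TRIM) support by
# reading its request-queue attribute in sysfs.
from pathlib import Path

def supports_discard(dev: str, sysfs: str = "/sys/block") -> bool:
    """True if /sys/block/<dev>/queue/discard_max_bytes is non-zero."""
    attr = Path(sysfs) / dev / "queue" / "discard_max_bytes"
    try:
        return int(attr.read_text()) > 0
    except (FileNotFoundError, ValueError):
        return False
```

For example, `supports_discard("sda")` on a real system; an md layer could run the same check per member to decide between sending TRIM and emulating it.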
why emulate? because we can then use a mixed array (SSD + HD) and still get more
performance from the TRIM-enabled disks with ext4 (or any other filesystem that
uses md as its device)
the point is: add support for the TRIM command to md devices
today I don't know if it exists (I think not)
if the support exists, how does it work? can we mix TRIM-enabled and
non-TRIM devices in the same RAID array?
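For RAID-5 (a concern raised further down the thread), mixing interacts with parity: if a trimmed chunk reads back as deterministic zeroes, the parity must be updated by XOR-ing out the old chunk contents, much like a normal read-modify-write, or the stripe becomes inconsistent. A tiny sketch of the arithmetic with plain XOR parity (nothing md-specific):

```python
# XOR parity over a 3-data-disk stripe; chunks are 4-byte blobs here.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

chunks = [b"\x12\x34\x56\x78", b"\x9a\xbc\xde\xf0", b"\x11\x22\x33\x44"]
parity = b"\x00" * 4
for c in chunks:
    parity = xor(parity, c)

# TRIM chunk 1 on a drive with deterministic-zero reads: it now reads 0x00.
old = chunks[1]
chunks[1] = b"\x00" * 4

# Without a parity update, recomputing parity from the data no longer
# matches the stored parity -- the stripe is inconsistent.
recomputed = b"\x00" * 4
for c in chunks:
    recomputed = xor(recomputed, c)
assert recomputed != parity

# Read-modify-write style fix: XOR out the old data (the new data is zero).
parity = xor(parity, old)
assert parity == recomputed
```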
the first option is: don't use TRIM at all
the second: use TRIM when possible, emulate TRIM when impossible
the third: accept TRIM only if all devices are TRIM-enabled (this
should be a run-time decision, since we can remove a mirror with TRIM
support and add a mirror without TRIM support)
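The three policies could be sketched like this (hypothetical names, not an existing md interface); note that the third one has to be re-evaluated at run time whenever the member set changes:

```python
from enum import Enum

class TrimPolicy(Enum):
    NEVER = "never"            # option 1: don't use TRIM
    EMULATE = "emulate"        # option 2: TRIM where possible, zero-fill elsewhere
    ALL_OR_NOTHING = "strict"  # option 3: TRIM only if every member supports it

def handle_discard(policy, members):
    """Return the per-member action for one incoming discard request.
    `members` maps member name -> supports_trim (bool)."""
    if policy is TrimPolicy.NEVER:
        return {name: "ignore" for name in members}
    if policy is TrimPolicy.ALL_OR_NOTHING:
        # Re-checked on every request, since a member without TRIM
        # support may have been hot-added after array assembly.
        action = "trim" if all(members.values()) else "ignore"
        return {name: action for name in members}
    return {name: "trim" if ok else "write-zeroes"
            for name, ok in members.items()}
```

For example, `handle_discard(TrimPolicy.EMULATE, {"ssd": True, "hdd": False})` would TRIM the SSD and zero-fill the HD, keeping later reads identical on both mirrors.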
2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> yeah =)
>> a question...
>> if I send a TRIM to a sector
>> and then read from it,
>> what do I get?
>> 0x00000000000000000000000000000000000 ?
>> if yes, we could translate TRIM to WRITE on devices without TRIM (hard disks),
>> just to get the same READ information
>
> It seems the 0x0 is not a standard. Return values
> seem to be quite undefined, even if 0x0 *might*
> be common.
>
> Second, why do you want to emulate the 0x0 thing?
>
> I do not see the point of writing zero on a device
> which does not support TRIM. Just doing nothing seems a
> better choice, even in a mixed environment.
>
> bye,
>
> pg
>
>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> it's just a discussion, right? no implementation yet, right?
>> >
>> > Of course...
>> >
>> >> what I think:
>> >> if the device accepts TRIM, we can use TRIM.
>> >> if not, we must translate TRIM into something equivalent (maybe many WRITEs?),
>> >> so that when we READ from the disk we get the same information
>> >
>> > TRIM is not about writing at all. TRIM tells the
>> > device that the addressed block is no longer used,
>> > so it (the SSD) can do whatever it wants with it.
>> >
>> > The only software layer having the same "knowledge"
>> > is the filesystem; the other layers do not have
>> > any decisional power over block allocation.
>> > Except for metadata, of course.
>> >
>> > So, IMHO, a software TRIM can only be in the FS.
>> >
>> > bye,
>> >
>> > pg
>> >
>> >> the translation could be done by the kernel (not md), maybe as options in
>> >> libata, the nbd device...
>> >> the other option is to do it in md, with an internal (md) TRIM-translate function
>> >>
>> >> who sends TRIM?
>> >> internal md information: md can generate it (if necessary; maybe it's
>> >> not...) for parity disks (not data disks)
>> >> the filesystem, or another upper-layer program (a database with direct device
>> >> access): we could accept TRIM from the filesystem/database and send it to the
>> >> disks/mirrors, translating it when necessary (with an internal or kernel
>> >> translate function)
>> >>
>> >>
>> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>> >> >> nice =)
>> >> >> but note that the parity block is RAID information, not filesystem information
>> >> >> for RAID we could implement TRIM where possible (like swap does)
>> >> >> and also accept TRIM from the filesystem and send it to all
>> >> >> disks (if it's a RAID-1 with mirrors, we should send it to all mirrors)
>> >> >
>> >> > To all disks, also in the case of RAID-5?
>> >> >
>> >> > What if the TRIM covers only a single SSD block
>> >> > belonging to a single chunk of a stripe?
>> >> > That is, a *single* SSD of the RAID-5.
>> >> >
>> >> > Should md re-read the block and re-write (not TRIM)
>> >> > the parity?
>> >> >
>> >> > I think anything that has to do with checking &
>> >> > repairing must be carefully considered...
>> >> >
>> >> > bye,
>> >> >
>> >> > pg
>> >> >
>> >> >> I don't know exactly what TRIM does, but I think of it as a very big write
>> >> >> of a pattern; for example:
>> >> >> set sector1='00000000000000000000000000000000000000000000000000'
>> >> >> could be replaced by:
>> >> >> trim sector1
>> >> >> it's faster on the SATA link, and it's useful information for the
>> >> >> drive (it can record a single '0' flag for the sector and know
>> >> >> that the whole sector is 0; on a read it can answer from internal
>> >> >> memory without touching the media, and on a write it fills in the
>> >> >> zeroes first. But that is an internal function of the hard disk/SSD,
>> >> >> not a problem for md RAID... md RAID just needs to know how to use
>> >> >> it and optimize for it =] )
>> >> >>
>> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> >> >> ext4 sends TRIM commands to the device (disk / md RAID / nbd)
>> >> >> >> the kernel's swap code sends these commands (when possible) to the device too
>> >> >> >> for the internal RAID-5 parity disk this could be done by md; for the data
>> >> >> >> disks it should be done by ext4
>> >> >> >
>> >> >> > That's an interesting point.
>> >> >> >
>> >> >> > On which basis should a parity "block" get a TRIM?
>> >> >> >
>> >> >> > If you ask me, I think the complete TRIM story is, at
>> >> >> > best, a temporary patch.
>> >> >> >
>> >> >> > IMHO the wear levelling should be handled by the filesystem
>> >> >> > and, with awareness of this, by the underlying device drivers.
>> >> >> > The reason is that the FS knows best what is going on with the
>> >> >> > blocks and what will happen to them.
>> >> >> >
>> >> >> > bye,
>> >> >> >
>> >> >> > pg
>> >> >> >
>> >> >> >>
>> >> >> >> the other question... about resync writing only what differs:
>> >> >> >> this is very good, since write and read speeds can differ on an SSD
>> >> >> >> (an HD doesn't have this 'problem')
>> >> >> >> and I'm sure that writing only the difference is better than writing everything
>> >> >> >> (SSD life will be longer; for an HD, maybe... I think it will be longer too)
>> >> >> >>
>> >> >> >>
>> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>> >> >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote:
>> >> >> >> >>
>> >> >> >> >> Who sends this command? If md can assume that deterministic mode is
>> >> >> >> >> always set, then RAID-1 at least would remain consistent. For RAID-5,
>> >> >> >> >> consistency of the parity information depends on the deterministic
>> >> >> >> >> pattern used and the number of disks. If you used deterministic
>> >> >> >> >> all-zero, then the parity information would always be consistent, but this
>> >> >> >> >> is probably not preferable, since every TRIM command would incur an
>> >> >> >> >> extra write for each bit in each page of the block.
>> >> >> >> >
>> >> >> >> > True, and there are several solutions. Maybe track space used via
>> >> >> >> > some mechanism, such that when you trim you're only trimming the
>> >> >> >> > entire stripe width so no parity is required for the trimmed regions.
>> >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
>> >> >> >> > with SMART data, to indicate when you need to replace the device
>> >> >> >> > preemptively, ahead of eventual failure.
>> >> >> >> >
>> >> >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity,
>> >> >> >> > you could expect wear leveling to wear all the devices evenly, since
>> >> >> >> > on average, the # of writes to all devices will be the same. Only a
>> >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>> >> >> >> >
>> >> >> >> > --eric
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Eric D. Mudama
>> >> >> >> > edmudama@bounceswoosh.org
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Roberto Spadim
>> >> >> >> Spadim Technology / SPAEmpresarial
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> > piergiorgio
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Roberto Spadim
>> >> >> Spadim Technology / SPAEmpresarial
>> >> >
>> >> > --
>> >> >
>> >> > piergiorgio
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Roberto Spadim
>> >> Spadim Technology / SPAEmpresarial
>> >
>> > --
>> >
>> > piergiorgio
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>
> --
>
> piergiorgio
>
--
Roberto Spadim
Spadim Technology / SPAEmpresarial