linux-raid.vger.kernel.org archive mirror
From: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>
To: Roberto Spadim <roberto@spadim.com.br>
Cc: Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>,
	"Eric D. Mudama" <edmudama@bounceswoosh.org>,
	"Scott E. Armitage" <launchpad@scott.armitage.name>,
	David Brown <david@westcontrol.com>,
	linux-raid@vger.kernel.org
Subject: Re: SSD - TRIM command
Date: Wed, 9 Feb 2011 20:21:02 +0100	[thread overview]
Message-ID: <20110209192101.GA20745@lazy.lzy> (raw)
In-Reply-To: <AANLkTina-4yvjFgR4dxvvYvNZ56gfZ4S324OWh-zBjji@mail.gmail.com>

> yeah =)
> A question...
> If I send a TRIM to a sector and then read from it,
> what do I get?
> 0x00000000000000000000000000000000000?
> If yes, we could translate TRIM into a WRITE on devices without TRIM (hard disks),
> just to get the same READ result.

It seems that returning 0x0 is not standard. The return
values seem to be largely undefined, even if 0x0 *might*
be common.
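A drive advertises whether anything predictable comes back after TRIM. Nothing guarantees it. Here is a minimal sketch. It assumes the ATA IDENTIFY DEVICE bits that Linux libata checks: word 169 bit 0 means TRIM is supported, word 69 bit 14 means deterministic read after TRIM, and word 69 bit 5 means read zeroes after TRIM. The hypothetical helper trim_readback() uses these bits to classify the expected read-back from a raw 256-word IDENTIFY buffer. It omits the ATA version check that libata also performs.

```python
# Hypothetical helper: classify post-TRIM read-back behaviour from a raw
# ATA IDENTIFY DEVICE buffer (256 16-bit words). Bit positions follow the
# checks in Linux libata; libata's additional ATA-version check is omitted.

WORD_ADDITIONAL_SUPP = 69   # "additional supported" feature word
WORD_DATA_SET_MGMT = 169    # DATA SET MANAGEMENT support word

TRIM_SUPPORTED = 1 << 0     # word 169, bit 0
DRAT = 1 << 14              # word 69, bit 14: deterministic read after TRIM
RZAT = 1 << 5               # word 69, bit 5: read zeroes after TRIM


def trim_readback(identify):
    """Return 'no-trim', 'undefined', 'deterministic' or 'zeroes'."""
    if len(identify) != 256:
        raise ValueError("IDENTIFY DEVICE data is 256 words")
    if not identify[WORD_DATA_SET_MGMT] & TRIM_SUPPORTED:
        return "no-trim"
    w69 = identify[WORD_ADDITIONAL_SUPP]
    if not w69 & DRAT:
        return "undefined"       # reads of a trimmed LBA may vary
    if w69 & RZAT:
        return "zeroes"          # deterministic and zero-filled
    return "deterministic"       # stable, but the pattern is unspecified


# Synthetic buffer, not data from a real drive.
ident = [0] * 256
ident[WORD_DATA_SET_MGMT] = TRIM_SUPPORTED
print(trim_readback(ident))      # undefined
ident[WORD_ADDITIONAL_SUPP] = DRAT | RZAT
print(trim_readback(ident))      # zeroes
```

In other words, "0x0 after TRIM" is an optional capability that a drive may or may not report.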

Second, why do you want to emulate the 0x0 thing?

I do not see the point of writing zeros to a device
which does not support TRIM. Simply doing nothing seems
a better choice, even in a mixed environment.
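The "just do nothing" policy is easy to express. The toy below uses hypothetical names (Member, forward_discard). It is not md's code. It forwards a discard only to members that support it. For comparison, it also counts the bytes that a zero-write emulation would cost on the other members.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    supports_discard: bool
    discarded: list = field(default_factory=list)  # (start, length) ranges sent
    bytes_written: int = 0                          # zero-fill bytes written


def forward_discard(members, start, length, emulate_with_zeros=False):
    for m in members:
        if m.supports_discard:
            m.discarded.append((start, length))   # pass the TRIM down
        elif emulate_with_zeros:
            m.bytes_written += length             # would be a real write
        # otherwise do nothing: the range is unused above this layer


ssd = Member("ssd", True)
hdd = Member("hdd", False)
forward_discard([ssd, hdd], start=0, length=1 << 20)
print(ssd.discarded, hdd.bytes_written)   # [(0, 1048576)] 0
```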

bye,

pg
 
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> It's just a discussion, right? No implementation yet, right?
> >
> > Of course...
> >
> >> What I think:
> >> if the device accepts TRIM, we can use TRIM.
> >> If not, we must translate TRIM into something similar (maybe many
> >> WRITEs?), so that when we READ from the disk we get the same information.
> >
> > TRIM is not about writing at all. TRIM tells the
> > device that the addressed block is no longer used,
> > so it (the SSD) can do whatever it wants with it.
> >
> > The only software layer having the same "knowledge"
> > is the filesystem; the other layers do not have
> > any say in block allocation, except for metadata,
> > of course.
> >
> > So, IMHO, a software TRIM can only be in the FS.
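Here is the point in concrete form. The allocator knows which blocks are free, so it is the layer that can turn its free set into discard requests. The minimal sketch below assumes free space is available as a set of block numbers. The helper free_extents() is hypothetical. It coalesces free blocks into (start, count) extents. A batched discard such as fstrim does roughly this kind of work over a filesystem's free space.

```python
# Toy illustration: only the allocator knows which blocks are free, so it is
# the natural place to build discard requests. Adjacent free block numbers
# are merged into (start_block, block_count) extents.


def free_extents(free_blocks):
    extents = []
    for b in sorted(free_blocks):
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)   # extend the current run
        else:
            extents.append((b, 1))             # start a new run
    return extents


print(free_extents({3, 4, 5, 9, 10, 42}))   # [(3, 3), (9, 2), (42, 1)]
```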
> >
> > bye,
> >
> > pg
> >
> >> The translation could be done by the kernel (not md), maybe as options in
> >> libata, the nbd device...
> >> Another option is to do it in md, with an internal (md) TRIM translation function.
> >>
> >> Who sends TRIM?
> >> md, from internal information: md can generate it (if necessary, maybe it's
> >> not...) for parity disks (not data disks).
> >> The filesystem or another upper-layer program (a database with direct device
> >> access): we could accept TRIM from the filesystem/database and send it to the
> >> disks/mirrors, translating it when necessary (with an internal or kernel
> >> translation function).
> >>
> >>
> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
> >> >> nice =)
> >> >> but note that the parity block is RAID information, not filesystem information.
> >> >> For RAID we could implement TRIM where possible (like swap does),
> >> >> and also take the TRIM we receive from the filesystem and send it to all
> >> >> disks (if it's a RAID1 with mirrors, we should send it to all mirrors).
> >> >
> >> > To all disks, also in the case of RAID-5?
> >> >
> >> > What if the TRIM belongs only to a single SSD block
> >> > belonging to a single chunk of a stripe?
> >> > That is a *single* SSD of the RAID-5.
> >> >
> >> > Should md re-read the block and re-write (not TRIM)
> >> > the parity?
> >> >
> >> > I think anything that has to do with checking &
> >> > repairing must be carefully considered...
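A toy stripe shows the parity question. In RAID-5, parity is the byte-wise XOR of the data chunks. Suppose only one data chunk is TRIMmed and the SSD later returns different bytes for it. The stripe then fails its check until the parity is recomputed and rewritten. The helpers below are hypothetical and work only on in-memory bytes.

```python
import operator
from functools import reduce


def xor_chunks(chunks):
    # Byte-wise XOR across equally sized chunks (RAID-5 style parity).
    return bytes(reduce(operator.xor, column) for column in zip(*chunks))


def stripe_consistent(data, parity):
    return xor_chunks(data) == parity


data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_chunks(data)
print(stripe_consistent(data, parity))   # True

data[1] = b"\xde\xad"    # chunk 1 was TRIMmed and now reads back other bytes
print(stripe_consistent(data, parity))   # False

parity = xor_chunks(data)  # re-read the chunk and rewrite the parity
print(stripe_consistent(data, parity))   # True
```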
> >> >
> >> > bye,
> >> >
> >> > pg
> >> >
> >> >> I don't know very well what TRIM does, but I think it's like a very big write
> >> >> expressed with only a few bits, for example:
> >> >> set sector1='00000000000000000000000000000000000000000000000000'
> >> >> could be replaced by:
> >> >> trim sector1
> >> >> It's faster over SATA, and it's useful information for the
> >> >> disk (it can put a single '0' at the start of the sector and know
> >> >> that the whole sector is 0; if it tries to read any data it can use
> >> >> internal memory (without reading the disk); if a write is done it should
> >> >> first write 0000 to the bits and then do the write operation. But that's an
> >> >> internal function of the hard disk/SSD, not a problem for md RAID... md
> >> >> RAID only needs to know how to optimize and use it =] )
> >> >>
> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> >> >> ext4 sends TRIM commands to the device (disk/md RAID/nbd);
> >> >> >> kernel swap sends these commands (when possible) to the device too.
> >> >> >> For the internal RAID5 parity disk this could be done by md; for data
> >> >> >> disks it should be done by ext4.
> >> >> >
> >> >> > That's an interesting point.
> >> >> >
> >> >> > On which basis should a parity "block" get a TRIM?
> >> >> >
> >> >> > If you ask me, I think the complete TRIM story is, at
> >> >> > best, a temporary patch.
> >> >> >
> >> >> > IMHO the wear levelling should be handled by the filesystem
> >> >> > and, with awareness of this, by the underlying device drivers.
> >> >> > The reason is that the FS knows better what's going on with the
> >> >> > blocks and what will happen.
> >> >> >
> >> >> > bye,
> >> >> >
> >> >> > pg
> >> >> >
> >> >> >>
> >> >> >> The other question... about a resync that only writes what is different:
> >> >> >> this is very good, since write and read speeds can differ for SSDs
> >> >> >> (HDs don't have this 'problem'),
> >> >> >> but I'm sure that writing only what differs is better than writing everything
> >> >> >> (SSD life will be longer; for HDs maybe... I think it will be longer too).
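A "compare, then write only what differs" resync can be sketched as follows. This is a toy over two in-memory mirror images. The resync() helper is hypothetical and is not md's resync code. It returns how many chunks actually had to be written.

```python
def resync(source: bytearray, target: bytearray, chunk: int) -> int:
    """Make target equal to source; return the number of chunks written."""
    if len(source) != len(target):
        raise ValueError("mirrors must be the same size")
    written = 0
    for off in range(0, len(source), chunk):
        src = source[off:off + chunk]
        if target[off:off + chunk] != src:   # read and compare first
            target[off:off + chunk] = src    # write only the differing chunk
            written += 1
    return written


a = bytearray(b"A" * 16)
b = bytearray(b"A" * 16)
b[5] = ord("x")
print(resync(a, b, chunk=4), b == a)   # 1 True
```

Every chunk is still read, but a mostly-in-sync mirror needs very few writes.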
> >> >> >>
> >> >> >>
> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> >> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >> >> >> >>
> >> >> >> >> Who sends this command? If md can assume that determinate mode is
> >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> >> >> >> consistency of the parity information depends on the determinate
> >> >> >> >> pattern used and the number of disks. If you used determinate
> >> >> >> >> all-zero, then parity information would always be consistent, but this
> >> >> >> >> is probably not preferable since every TRIM command would incur an
> >> >> >> >> extra write for each bit in each page of the block.
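A small worked example checks Scott's remark about the pattern and the number of disks. Assume every chunk of a RAID-5 stripe, data and parity alike, is TRIMmed. Assume also that every drive then reads back the same fixed byte p. The stripe is consistent only if the XOR of the n data chunks equals p. The XOR of n copies of p is p for odd n and 0 for even n. So an all-zero pattern works for any n, while any other pattern depends on n.

```python
def trimmed_stripe_consistent(pattern: int, n_data: int) -> bool:
    # Parity reads back as `pattern`; XOR the n data chunks, which also
    # read back as `pattern`, and compare.
    xor = 0
    for _ in range(n_data):
        xor ^= pattern
    return xor == pattern


for p in (0x00, 0xFF):
    print(hex(p), [trimmed_stripe_consistent(p, n) for n in (2, 3, 4)])
# 0x0 [True, True, True]
# 0xff [False, True, False]
```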
> >> >> >> >
> >> >> >> > True, and there are several solutions.  Maybe track space used via
> >> >> >> > some mechanism, such that when you trim you're only trimming the
> >> >> >> > entire stripe width so no parity is required for the trimmed regions.
> >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
> >> >> >> > with SMART data, to indicate when you need to replace the device
> >> >> >> > preemptively, before it eventually fails.
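Eric's first option is to trim only whole stripes. That amounts to clipping each array-level discard to the stripe boundaries it fully covers. The minimal sketch below assumes byte offsets in the array's data space. It also assumes the stripe data width is the chunk size times the number of data disks. split_discard() is a hypothetical helper.

```python
def split_discard(start, length, stripe_data_bytes):
    """Return (full_stripe_range or None, [partial head/tail ranges])."""
    end = start + length
    first = -(-start // stripe_data_bytes) * stripe_data_bytes  # round up
    last = (end // stripe_data_bytes) * stripe_data_bytes       # round down
    if first >= last:
        return None, [(start, length)]      # no whole stripe covered
    partial = []
    if start < first:
        partial.append((start, first - start))
    if last < end:
        partial.append((last, end - last))
    return (first, last - first), partial


# 64 KiB chunks, 3 data disks -> 192 KiB of data per stripe
full, rest = split_discard(100 * 1024, 500 * 1024, 3 * 64 * 1024)
print(full, rest)   # (196608, 393216) [(102400, 94208), (589824, 24576)]
```

The head and tail fragments are the cases that would still raise the parity-update question asked earlier in the thread.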
> >> >> >> >
> >> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> >> >> >> > you could expect wear leveling to wear all the devices evenly, since
> >> >> >> > on average, the # of writes to all devices will be the same.  Only a
> >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
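The wear argument depends on where parity lands. The sketch below counts parity chunks per member over many stripes. It assumes RAID-4 keeps parity on the last disk. It assumes RAID-5 moves parity one disk to the left on each stripe, which to my understanding matches md's default left-symmetric layout. The helper names are hypothetical.

```python
from collections import Counter


def parity_disk(stripe, n_disks, level):
    if level == 4:
        return n_disks - 1                    # dedicated parity disk
    return n_disks - 1 - (stripe % n_disks)   # parity rotates leftwards


def parity_counts(n_disks, n_stripes, level):
    # How many stripes put their parity chunk on each member disk.
    return Counter(parity_disk(s, n_disks, level) for s in range(n_stripes))


print(sorted(parity_counts(4, 1000, 4).items()))  # [(3, 1000)]
print(sorted(parity_counts(4, 1000, 5).items()))
# [(0, 250), (1, 250), (2, 250), (3, 250)]
```

Each small write also updates parity. With RAID-4 the single parity disk therefore absorbs all of those extra writes.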
> >> >> >> >
> >> >> >> > --eric
> >> >> >> >
> >> >> >> > --
> >> >> >> > Eric D. Mudama
> >> >> >> > edmudama@bounceswoosh.org
> >> >> >> >
> >> >> >> > --
> >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> >> > the body of a message to majordomo@vger.kernel.org
> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Roberto Spadim
> >> >> >> Spadim Technology / SPAEmpresarial
> >> >> >
> >> >> > --
> >> >> >
> >> >> > piergiorgio
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Roberto Spadim
> >> >> Spadim Technology / SPAEmpresarial
> >> >
> >> > --
> >> >
> >> > piergiorgio
> >> >
> >>
> >>
> >>
> >> --
> >> Roberto Spadim
> >> Spadim Technology / SPAEmpresarial
> >
> > --
> >
> > piergiorgio
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial

-- 

piergiorgio


Thread overview: 70+ messages
2011-02-07 20:07 SSD - TRIM command Roberto Spadim
2011-02-08 17:37 ` maurice
2011-02-08 18:31   ` Roberto Spadim
     [not found]     ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
2011-02-08 20:50       ` Roberto Spadim
2011-02-08 21:18         ` maurice
2011-02-08 21:33           ` Roberto Spadim
2011-02-09  7:44   ` Stan Hoeppner
2011-02-09  9:05     ` Eric D. Mudama
2011-02-09 15:45       ` Chris Worley
2011-02-09 13:29     ` David Brown
2011-02-09 14:39       ` Roberto Spadim
2011-02-09 15:00         ` Scott E. Armitage
2011-02-09 15:52           ` Chris Worley
2011-02-09 19:15             ` Doug Dumitru
2011-02-09 19:22               ` Roberto Spadim
2011-02-09 16:19           ` Eric D. Mudama
2011-02-09 16:28             ` Scott E. Armitage
2011-02-09 17:17               ` Eric D. Mudama
2011-02-09 18:18                 ` Roberto Spadim
2011-02-09 18:24                   ` Piergiorgio Sartor
2011-02-09 18:30                     ` Roberto Spadim
2011-02-09 18:38                       ` Piergiorgio Sartor
2011-02-09 18:46                         ` Roberto Spadim
2011-02-09 18:52                           ` Roberto Spadim
2011-02-09 19:13                           ` Piergiorgio Sartor
2011-02-09 19:16                             ` Roberto Spadim
2011-02-09 19:21                               ` Piergiorgio Sartor [this message]
2011-02-09 19:27                                 ` Roberto Spadim
2011-02-21 18:24             ` Phillip Susi
2011-02-21 18:30               ` Roberto Spadim
2011-02-09 15:49         ` David Brown
2011-02-21 18:20           ` Phillip Susi
2011-02-21 18:25             ` Roberto Spadim
2011-02-21 18:34               ` Phillip Susi
2011-02-21 18:48                 ` Roberto Spadim
2011-02-21 18:51               ` Mathias Burén
2011-02-21 19:32                 ` Roberto Spadim
2011-02-21 19:38                   ` Mathias Burén
2011-02-21 19:39                     ` Mathias Burén
2011-02-21 19:43                       ` Roberto Spadim
2011-02-21 20:45                       ` Phillip Susi
2011-02-21 19:39                   ` Roberto Spadim
2011-02-21 19:51                     ` Doug Dumitru
2011-02-21 19:57                       ` Roberto Spadim
2011-02-21 20:47                     ` Phillip Susi
2011-02-21 21:02                       ` Mathias Burén
2011-02-21 22:52                         ` Roberto Spadim
2011-02-21 23:41                           ` Mathias Burén
2011-02-21 23:42                             ` Mathias Burén
2011-02-21 23:52                               ` Roberto Spadim
2011-02-22  0:25                                 ` Mathias Burén
2011-02-22  0:30                                 ` Brendan Conoboy
2011-02-22  0:36                                 ` Eric D. Mudama
2011-02-22  1:46                                   ` Roberto Spadim
2011-02-22  1:52                                     ` Mathias Burén
2011-02-22  1:55                                       ` Roberto Spadim
2011-02-22  2:01                                         ` Eric D. Mudama
2011-02-22  2:02                                         ` Mikael Abrahamsson
2011-02-22  2:22                                           ` Guy Watkins
2011-02-22  2:27                                             ` Roberto Spadim
2011-02-22  3:45                                               ` NeilBrown
2011-02-22  4:37                                                 ` Roberto Spadim
2011-02-22  2:38                                         ` Phillip Susi
2011-02-22  3:29                                           ` Roberto Spadim
2011-02-22  3:42                                             ` Roberto Spadim
2011-02-22  4:04                                             ` Phillip Susi
2011-02-22  4:30                                               ` Roberto Spadim
2011-02-22 14:45                                                 ` Phillip Susi
2011-02-22 17:15                                                   ` Roberto Spadim
2011-02-22  0:32                           ` Eric D. Mudama
