linux-raid.vger.kernel.org archive mirror
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Greg Freemyer <greg.freemyer@gmail.com>
Cc: david@lang.hm, Markus Trippelsdorf <markus@trippelsdorf.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nitin Gupta <ngupta@vflare.org>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Discard support (was Re: [PATCH] swap: send callback when swap slot is freed)
Date: Thu, 13 Aug 2009 19:18:15 +0000
Message-ID: <1250191095.3901.116.camel@mulgrave.site>
In-Reply-To: <87f94c370908131115r680a7523w3cdbc78b9e82373c@mail.gmail.com>

On Thu, 2009-08-13 at 14:15 -0400, Greg Freemyer wrote:
> On Thu, Aug 13, 2009 at 12:33 PM, <david@lang.hm> wrote:
> > On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
> >
> >> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
> >>>
> >>> I am planning a complete overhaul of the discard work.  Users can send
> >>> down discard requests as frequently as they like.  The block layer will
> >>> cache them, and invalidate them if writes come through.  Periodically,
> >>> the block layer will send down a TRIM or an UNMAP (depending on the
> >>> underlying device) and get rid of the blocks that have remained unwanted
> >>> in the interim.
> >>
> >> That is a very good idea. I've tested your original TRIM implementation on
> >> my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
> >> milliseconds to digest a single TRIM command. And since your
> >> implementation sends a TRIM for each extent of each deleted file, the
> >> whole system is unusable after a short while.
> >> An optimal solution would be to consolidate the discard requests, bundle
> >> them and send them to the drive as infrequently as possible.
> >
> > or queue them up and send them when the drive is idle (you would need to
> > keep track to make sure the space isn't re-used)
> >
> > as an example, if you would consider spinning down a drive you don't hurt
> > performance by sending accumulated trim commands.
> >
> > David Lang
> 
> An alternate approach is for the block layer to maintain its own bitmap
> of used / unused sectors / blocks. Unmap commands from the filesystem
> just cause the bitmap to be updated.  No other effect.
> 
> (Big unknown: Where will the bitmap live between reboots?  Require DM
> volumes so we can have a dedicated bitmap volume in the mix to store
> the bitmap to? Maybe on mount, the filesystem has to be scanned to
> initially populate the bitmap?   Other options?)

I wouldn't really have it live anywhere.  Discard is best effort; it's
not required for fs integrity.  As long as we don't discard an in-use
block we're free to do anything else (including forget to discard,
rediscard a discarded block etc).
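
A minimal sketch of that invariant, combined with the caching scheme
Matthew describes above, written as a toy Python model rather than kernel
code. The class name DiscardCache and its methods are hypothetical and
exist only for illustration: discards accumulate and merge, any write
removes the written blocks from the pending set, and a flush hands back
whatever is still unwanted. Losing the pending list (for example across a
reboot) is harmless here, since it only means some blocks never get
discarded.

```python
class DiscardCache:
    """Toy model of a block-layer discard cache (hypothetical, not kernel code)."""

    def __init__(self):
        # Sorted, non-overlapping, half-open (start, end) block ranges.
        self.pending = []

    def discard(self, start, length):
        """Record [start, start + length) as unwanted, merging with neighbours."""
        new_start, new_end = start, start + length
        merged = []
        for s, e in self.pending:
            if e < new_start or s > new_end:
                merged.append((s, e))  # disjoint and not adjacent: keep as-is
            else:
                # Overlapping or adjacent: fold into the new range.
                new_start, new_end = min(s, new_start), max(e, new_end)
        merged.append((new_start, new_end))
        self.pending = sorted(merged)

    def write(self, start, length):
        """A write puts blocks back in use, so they must never be discarded."""
        w_start, w_end = start, start + length
        kept = []
        for s, e in self.pending:
            if e <= w_start or s >= w_end:
                kept.append((s, e))  # untouched by this write
                continue
            if s < w_start:
                kept.append((s, w_start))  # part before the write survives
            if e > w_end:
                kept.append((w_end, e))  # part after the write survives
        self.pending = kept

    def flush(self):
        """Return the ranges to send down as TRIM/UNMAP and forget them."""
        out, self.pending = self.pending, []
        return out
```

For example, discarding blocks 0-7 and 8-15 and then writing blocks 4-5
leaves two ranges to flush, 0-3 and 6-15, instead of two separate
commands that would have covered in-use blocks.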

It is theoretically possible to run all of this from user space using
the fs mappings, a bit like a defrag command.

One other option would just be to scan on mount, discard everything
empty and redo on next mount ... this might be just the thing for
laptops.

> Assuming we have a persistent bitmap in place, have a background
> scanner that kicks in when the cpu / disk is idle.  It just
> continuously scans the bitmap looking for contiguous blocks of unused
> sectors.  Each time it finds one, it sends the largest possible unmap
> down the block stack and eventually to the device.
> 
> When normal cpu / disk activity kicks in, this process goes to sleep.
> 
> That way much of the smarts are concentrated in the block layer, not
> in the filesystem code.  And it is being done when the disk is
> otherwise idle, so you don't have the ncq interference.
> 
> Even laptop users should have enough idle cpu available to manage
> this.  Enterprise would get the large discards it wants, and
> unmentioned in the previous discussion, mdraid gets the large discards
> it also wants.
> 
> i.e., if an mdraid raid5/raid6 volume is built of SSDs, it will only be
> able to discard a full stripe at a time. Otherwise the P = D1 ^ D2
> parity relation is lost.
> 
> Another benefit of the above is the code should be extremely safe and testable.
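
The scanner Greg describes can be sketched roughly as follows, in Python
purely for illustration. The function names, the in-memory list used as
the bitmap, and the stripe_blocks parameter are all assumptions, not
anything from the block layer or mdraid. The first function walks the
bitmap and reports each maximal run of unused blocks; the second trims a
run down to whole stripes, which is the constraint raid5/raid6 would
impose so that parity stays consistent.

```python
def free_runs(bitmap):
    """Yield (start, length) for each maximal run of unused blocks.

    bitmap[i] is True when block i is in use.
    """
    start = None
    for i, used in enumerate(bitmap):
        if not used and start is None:
            start = i  # a free run begins here
        elif used and start is not None:
            yield (start, i - start)  # run ended at the previous block
            start = None
    if start is not None:
        yield (start, len(bitmap) - start)  # run extends to the end


def stripe_aligned(run, stripe_blocks):
    """Shrink a run to whole stripes; return None if no full stripe fits."""
    start, length = run
    first = -(-start // stripe_blocks) * stripe_blocks  # round start up
    last = (start + length) // stripe_blocks * stripe_blocks  # round end down
    if last <= first:
        return None
    return (first, last - first)
```

With a 4-block stripe, a free run covering blocks 1-10 shrinks to the
single full stripe at blocks 4-7, and a 2-block run yields nothing to
discard at all.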

Actually, I think that if we go in-kernel, the discard might be better
tied into the block plugging mechanism.  The real test might be: if there
are no outstanding commands and the queue is plugged, keep it plugged and
begin discarding.

James


