From: Stan Hoeppner <stan@hardwarefreak.com>
To: NeilBrown <neilb@suse.de>
Cc: John Stoffel <john@stoffel.org>, Marc MERLIN <marc@merlins.org>,
	linux-raid@vger.kernel.org
Subject: Re: Very long raid5 init/rebuild times
Date: Fri, 24 Jan 2014 02:24:09 -0600
Message-ID: <52E22329.7070903@hardwarefreak.com>
In-Reply-To: <20140124140729.6cf73745@notabene.brown>

On 1/23/2014 9:07 PM, NeilBrown wrote:
> On Thu, 23 Jan 2014 19:02:21 -0600 Stan Hoeppner <stan@hardwarefreak.com>
> wrote:
> 
>> On 1/23/2014 8:28 AM, John Stoffel wrote:
>>
>>> But more importantly, maybe it would make sense to have this number
>>> automatically scale with memory size?  If you only have 1gig stay at
>>> 256, but then jump more aggressively to 1024, 2048, 4096 and 8192 and
>>> then (for now) capping at 8192.  
>>
>> Setting the default based strictly on memory capacity won't work.  See
>> this discussion for background.
>>
>> http://www.spinics.net/lists/raid/msg45364.html
>>
> 
> I would like to see the stripe cache grow on demand, shrink when idle, and
> use the "shrinker" interface to shrink even when not idle if there is memory
> pressure.
> So if someone wants a project....
> 
> NeilBrown


I'm a user, not a kernel hacker, and I don't know C.  Three strikes
right there. :(  Otherwise I'd love to tackle it.  I do have some
comments/ideas on the subject.

Progressively growing and shrinking the cache should be relatively
straightforward.  We can already resize it on the fly today by writing
to /sys/block/mdX/md/stripe_cache_size.  What's needed is code to track
the volume or rate of data flowing into md, and to interface with the
shrinker.
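
For the shrinker side, I imagine something roughly like the sketch
below -- purely illustrative, and the helper names and the idle-stripe
counter are stand-ins for md's real stripe accounting, not raid5.c
code:

#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/shrinker.h>

/* Stand-in for md's internal idle-stripe accounting (hypothetical). */
static atomic_t idle_stripes = ATOMIC_INIT(0);

/* Tell the VM how many cached stripes are idle and reclaimable now. */
static unsigned long stripe_cache_count(struct shrinker *shrink,
                                        struct shrink_control *sc)
{
        return atomic_read(&idle_stripes);
}

/* Release up to sc->nr_to_scan idle stripes and report how many went.
 * Real code would free the stripe_head and its pages here rather than
 * just adjusting a counter. */
static unsigned long stripe_cache_scan(struct shrinker *shrink,
                                       struct shrink_control *sc)
{
        unsigned long freed = min_t(unsigned long, sc->nr_to_scan,
                        (unsigned long)atomic_read(&idle_stripes));

        atomic_sub((int)freed, &idle_stripes);
        return freed;
}

static struct shrinker stripe_cache_shrinker = {
        .count_objects = stripe_cache_count,
        .scan_objects  = stripe_cache_scan,
        .seeks         = DEFAULT_SEEKS,
};

/* register_shrinker(&stripe_cache_shrinker) at array setup,
 * unregister_shrinker() at teardown. */

The hooks aren't the hard part; deciding which stripes count as idle,
and how aggressively to hand them back, is.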

I think the difficult aspect of this will be determining the upper
bound on the cache size for a given system, as the optimum cache size
correlates directly with the throughput of the hardware.  With the
current power-of-2 restrictions, admittedly less-than-thorough testing
indicates that disk-based arrays prefer a value of 1024-2048 for
maximum throughput, whereas SSD arrays prefer 4096.  In either case,
going to the next legal value decreases throughput while doubling the
RAM consumed.
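
To illustrate what I mean by device class -- a hypothetical heuristic,
nothing like this exists in md today, and the 2048/4096 ceilings are
just the rough sweet spots above -- the block queue's rotational flag
already separates rust from flash:

#include <stdio.h>

/* Hypothetical: cap stripe_cache_size at 2048 for rotating disks and
 * 4096 for SSDs, based on /sys/block/<dev>/queue/rotational. */
static unsigned long cache_ceiling(const char *dev)
{
        char path[256];
        int rotational = 1;     /* assume spinning disk if unreadable */
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational",
                 dev);
        f = fopen(path, "r");
        if (f) {
                if (fscanf(f, "%d", &rotational) != 1)
                        rotational = 1;
                fclose(f);
        }
        return rotational ? 2048UL : 4096UL;
}

int main(void)
{
        printf("suggested ceiling for sda: %lu\n", cache_ceiling("sda"));
        return 0;
}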

So here we need some way to determine device throughput, or at least
device class, and set an upper bound accordingly.  I also think we
should consider unhitching our wagon from powers of 2 if we're going to
be dynamically growing/shrinking the cache; grow/shrink should be
progressive, in smaller steps.  With 5 drives, growing from 2048 to
4096 grabs 40MB of pages, and shrinking back down dumps that 40MB, then
20MB, then 10MB, and finally 5MB on the way back to the 1MB-per-drive
default of 256.  This may cause a lot of memory thrashing on some
systems and workloads, evicting application data from the L2/L3 caches,
so we may want to be careful about how much memory we're shuffling and
how often.
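
For reference, the numbers above are just stripes x member disks x
page size -- a back-of-the-envelope sketch assuming 4KB pages and
ignoring struct stripe_head overhead:

#include <stdio.h>

#define CACHE_PAGE_SIZE 4096UL

/* Rough stripe cache data footprint: one page per member disk per
 * cached stripe. */
static unsigned long cache_bytes(unsigned long entries,
                                 unsigned int disks)
{
        return entries * disks * CACHE_PAGE_SIZE;
}

int main(void)
{
        const unsigned long sizes[] = { 256, 512, 1024, 2048, 4096 };
        unsigned int i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("stripe_cache_size=%-4lu with 5 disks -> %3lu MiB\n",
                       sizes[i], cache_bytes(sizes[i], 5) >> 20);
        return 0;
}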

-- 
Stan


