Linux RAID subsystem development
From: Shaohua Li <shli@fb.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: NeilBrown <neilb@suse.de>,
	linux-raid <linux-raid@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Kernel-team@fb.com
Subject: Re: [RFC] raid5: add a log device to fix raid5/6 write hole issue
Date: Thu, 9 Apr 2015 09:03:27 -0700
Message-ID: <20150409160327.GA2087406@devbig257.prn2.facebook.com>
In-Reply-To: <CAPcyv4j2RjqY=Ns8rXMUypGovU7_beNfpuL-iW0Fx9522ogPkA@mail.gmail.com>

On Thu, Apr 09, 2015 at 08:37:03AM -0700, Dan Williams wrote:
> On Wed, Apr 8, 2015 at 11:15 PM, Shaohua Li <shli@fb.com> wrote:
> > On Thu, Apr 09, 2015 at 03:04:59PM +1000, NeilBrown wrote:
> >> On Wed, 8 Apr 2015 17:43:11 -0700 Shaohua Li <shli@fb.com> wrote:
> >>
> >> > Hi,
> >> > This is what I'm working on now, and I hope to have the basic code
> >> > running next week. The new design will do caching and fix the write hole
> >> > issue too. Before I post the code, I'd like to check whether the design
> >> > has any obvious issues.
> >>
> >> I can't say I'm excited about it....
> >>
> >> You still haven't explained why you would ever want to read data from the
> >> "cache". Why not just keep everything in the stripe cache until it is safe
> >> in the RAID? I asked before and you said:
> >>
> >> >> I'm not enthusiastic about using the stripe cache, though; we can't keep
> >> >> all data in the stripe cache. What we really need is an index.
> >>
> >> which is hardly an answer. Why can't you keep all the data in the stripe
> >> cache? How much data is there? How much memory can you afford to dedicate?
> >>
> >> If you cannot keep everything in memory, you must have some very long
> >> sustained bursts of writes that are much faster than the RAID can accept.
> >>
> >>
> >> Your cache layout seems very rigid. I would much rather have a layout that
> >> is very general and flexible. If you want to always allocate a chunk at a
> >> time, fine, but don't force that on the cache layout.
> >>
> >> The log really should be very simple: a block describing what comes next,
> >> then lots of data/parity, then another block and more data, etc.
> >> Each metadata block points to the next one.
> >> If you need an index of the cache, you keep that in memory. On restart, you
> >> read all of the metadata blocks and build up the index.
> >>
> >> I think that space in the log should be reclaimed in exactly the order that
> >> it is written, so the active part of the log is contiguous.   Obviously
> >> individual blocks become inactive in arbitrary order as they are written to
> >> the RAID, but each extent of the log becomes free in order.
> >> If you want that to happen out of order, you would need to present a very
> >> good reason.
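The log layout described in the quoted message above (a metadata block describing the data/parity that follows, each metadata block pointing to the next, the index kept only in memory and rebuilt on restart, and space reclaimed strictly in write order) can be sketched as a toy model. This is not md code: `MetaBlock`, `Log`, `replay`, and the use of a sequence number to detect the end of the valid log are hypothetical choices for illustration. A real implementation would also need checksums, a superblock recording where replay starts, and careful write ordering for crash safety.

```python
from dataclasses import dataclass


@dataclass
class MetaBlock:
    """Hypothetical metadata block describing the payload blocks after it."""
    seq: int          # sequence number; replay stops at the first mismatch
    sectors: list     # RAID sector for each payload block that follows
    next_pos: int     # logical log position of the next metadata block


class Log:
    """Toy log device. Logical positions grow forever; physical = pos % size."""

    def __init__(self, size):
        self.size = size
        self.blocks = [None] * size
        self.head = 0       # next logical position to write
        self.tail = 0       # oldest logical position still live
        self.seq = 0        # sequence number for the next record
        self.records = []   # in memory only: [meta_pos, next_pos, flushed], oldest first

    def write_block(self, pos, block):
        self.blocks[pos % self.size] = block

    def read_block(self, pos):
        return self.blocks[pos % self.size]

    def append(self, sectors, payloads):
        """Write one metadata block followed by its data/parity payload."""
        needed = 1 + len(payloads)
        if self.head - self.tail + needed > self.size:
            raise RuntimeError("log full: reclaim before writing more")
        meta_pos = self.head
        meta = MetaBlock(self.seq, list(sectors), meta_pos + needed)
        self.write_block(meta_pos, meta)
        for i, data in enumerate(payloads):
            self.write_block(meta_pos + 1 + i, data)
        self.records.append([meta_pos, meta.next_pos, False])
        self.head = meta.next_pos
        self.seq += 1
        return meta_pos

    def mark_flushed(self, meta_pos):
        """Record that every block of this extent has reached the RAID."""
        for rec in self.records:
            if rec[0] == meta_pos:
                rec[2] = True

    def reclaim(self):
        """Free space strictly in write order, so the live region stays contiguous."""
        while self.records and self.records[0][2]:
            self.tail = self.records.pop(0)[1]


def replay(log, start_pos, start_seq):
    """Rebuild the in-memory index (sector -> log position) after a restart.

    start_pos/start_seq stand in for what a superblock would record.
    """
    index = {}
    pos, seq = start_pos, start_seq
    while True:
        meta = log.read_block(pos)
        if not isinstance(meta, MetaBlock) or meta.seq != seq:
            break  # stale or non-metadata block: end of the valid log
        for i, sector in enumerate(meta.sectors):
            index[sector] = pos + 1 + i  # newer records override older ones
        pos, seq = meta.next_pos, seq + 1
    return index
```

Reclaiming only from the oldest record is what keeps the live region between `tail` and `head` contiguous: an extent whose blocks reach the RAID early simply waits until every older extent has also been flushed.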
> >
> > I had the same idea when I was thinking about a caching layer, but
> > memory size is the main blocking issue. If the solution requires a large
> > amount of extra memory, it isn't cost-effective, which makes it a hard sell
> > to replace hardware RAID with software RAID. The design depends entirely
> > on whether we can store all data in memory. I don't have an answer yet for
> > how much memory we should use to make the aggregation efficient. I guess
> > only numbers can tell. I'll try to collect some data and get back to you.
> >
> 
> Another consideration to keep in mind is persistent memory.  I'm
> working on an in-kernel mechanism to claim and map pmem and a
> raid-write-cache is an obvious first application.  I'll include you on
> the initial submission of that capability.

Exactly. We are planning to use pmem in the future, once it is mature and
widely used. Until then, SSD is still the best option.

Thanks,
Shaohua
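NeilBrown's argument that only long bursts of writes faster than the RAID can accept would overflow an in-memory stripe cache, and the plan above to collect data on how much memory is needed, both reduce to a simple queueing bound: the backlog grows by the difference between the incoming write rate and the rate the array can drain, and the memory needed is the peak of that backlog. A minimal sketch, assuming a constant drain rate and per-second write samples (both simplifications); `peak_backlog` is a hypothetical helper and the rates below are made-up placeholders, not measurements from this thread:

```python
def peak_backlog(incoming_per_sec, drain_per_sec):
    """Return the largest amount of data that must be buffered at once.

    incoming_per_sec: data written by applications in each one-second interval.
    drain_per_sec:    data the RAID can absorb per second (assumed constant).
    Units are whatever the caller uses consistently (e.g. MB).
    """
    backlog = 0
    peak = 0
    for incoming in incoming_per_sec:
        # Whatever the array cannot absorb this second carries over.
        backlog = max(0, backlog + incoming - drain_per_sec)
        peak = max(peak, backlog)
    return peak


# Hypothetical burst: 60 s at 1500 MB/s into an array that drains 1000 MB/s,
# followed by 60 s of idle time.
burst = [1500] * 60 + [0] * 60
print(peak_backlog(burst, 1000))  # 30000 (MB), i.e. (1500 - 1000) * 60
```

In practice the drain rate of a RAID5/6 array is not constant: partial-stripe writes need read-modify-write or reconstruct-write cycles and drain more slowly than full-stripe writes, so the drain rate would have to be measured for the actual workload.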

Thread overview: 22+ messages
2015-03-30 22:25 [RFC] raid5: add a log device to fix raid5/6 write hole issue Shaohua Li
2015-04-01  3:47 ` Dan Williams
2015-04-01  5:53   ` Shaohua Li
2015-04-01  6:02     ` NeilBrown
2015-04-01 17:14       ` Shaohua Li
2015-04-01 18:36   ` Piergiorgio Sartor
2015-04-01 18:46     ` Dan Williams
2015-04-01 20:07       ` Jiang, Dave
2015-04-01 18:46     ` Alireza Haghdoost
2015-04-01 19:57       ` Wols Lists
2015-04-01 20:04         ` Alireza Haghdoost
2015-04-01 20:18           ` Wols Lists
2015-04-01 20:17         ` Jens Axboe
2015-04-01 21:53 ` NeilBrown
2015-04-01 23:40   ` Shaohua Li
2015-04-02  0:19     ` NeilBrown
2015-04-02  4:07       ` Shaohua Li
2015-04-09  0:43         ` Shaohua Li
2015-04-09  5:04           ` NeilBrown
2015-04-09  6:15             ` Shaohua Li
2015-04-09 15:37               ` Dan Williams
2015-04-09 16:03                 ` Shaohua Li [this message]