Re: [RFC] Metadata Replication for Ext4

From: Aditya Kali <adityakali@google.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Eric Sandeen <sandeen@redhat.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Lukas Czerner <lczerner@redhat.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	Nauman Rafique <nauman@google.com>,
	Theodore Tso <tytso@google.com>, Ric Wheeler <rwheeler@redhat.com>,
	"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [RFC] Metadata Replication for Ext4
Date: Wed, 26 Oct 2011 16:39:23 -0700
Message-ID: <CAGr1F2Fuuoowh59k-_e6cK1v0Z2TMEtM4FpDTf2APjmViqUu-g@mail.gmail.com>
In-Reply-To: <20111021155428.GA21564@infradead.org>

Thanks all for your feedback. Summarizing the discussion so far,
three main solutions have been suggested for replicating metadata:
1) Use an mke2fs hack to store all metadata in the 1st block group, and
use dm and raid1 to mirror the 1st block group (most of the metadata).
    Pros: Simple approach that does not require any ext4 changes.
    Cons: The added overhead of raid and device mapper will be
significant for fast SSDs.
    Cons: Management overhead on a large number of machines.
    Cons: Need to add support in raid to read from the mirror if the
primary fails.
2) Have a separate metadata device and access all ext4 metadata from
it. This device could be raid1 or whatever.
    Pros: No need for device mapper.
    Pros: Solves many other problems (SSDs can be used to cache
metadata for disks, etc.).
    Cons: Will need to significantly over-allocate space (running out
of space on this device potentially means no more writes to the
filesystem).
    Cons: A lot of ext4 code changes.
3) A replica inode that resides on either the same device or an
external device (this proposal).
    Pros: No need for device mapper or other additional layers.
    Pros: Simpler management in production.
    Cons: Not generic (Ext4-specific).
    Cons: Complicates Ext4 for questionable gain (especially with the
inode being on the same device).

#2 seems to be the ideal solution, but it would take a substantial
amount of effort and require a lot of ext4 changes.
One other alternative that comes to mind is to have an external
"replica device" (a hybrid of ideas #2 and #3) instead of an entire
"metadata device". All metadata writes that go to the original will
also go to the replica device, and the filesystem can optionally read
from the replica first. With this, we get the benefits of #2 and #3
without needing a lot of ext4 (or any other filesystem) changes.
What do you think? Could this be implemented without much intrusion
into the ext4 codebase?
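
To make the "replica device" idea a bit more concrete, here is a rough
sketch in Python. It is purely illustrative and is not ext4 code: the
BlockDevice and ReplicatedMetadata classes, the read_replica_first
option, and the CRC32 prefix used to detect a bad copy are all
assumptions made for this example and are not part of the proposal.

```python
import zlib


class BlockDevice:
    """Hypothetical in-memory stand-in for a block device."""

    def __init__(self):
        self.blocks = {}

    def write(self, blkno, data):
        self.blocks[blkno] = bytes(data)

    def read(self, blkno):
        # A missing key models an unreadable block.
        return self.blocks[blkno]


class ReplicatedMetadata:
    """Mirror every metadata write to a replica; optionally read it first."""

    def __init__(self, primary, replica, read_replica_first=False):
        self.primary = primary
        self.replica = replica
        self.read_replica_first = read_replica_first

    @staticmethod
    def _seal(payload):
        # Assumption: prepend a CRC32 so a corrupted copy is detectable.
        return zlib.crc32(payload).to_bytes(4, "little") + payload

    @staticmethod
    def _unseal(raw):
        crc, payload = int.from_bytes(raw[:4], "little"), raw[4:]
        if zlib.crc32(payload) != crc:
            raise IOError("metadata checksum mismatch")
        return payload

    def write(self, blkno, payload):
        # Every metadata write goes to both the original and the replica.
        sealed = self._seal(payload)
        self.primary.write(blkno, sealed)
        self.replica.write(blkno, sealed)

    def read(self, blkno):
        # Try the preferred device first, then fall back to the other copy.
        order = [self.primary, self.replica]
        if self.read_replica_first:
            order.reverse()
        last_error = None
        for dev in order:
            try:
                return self._unseal(dev.read(blkno))
            except (KeyError, IOError) as err:
                last_error = err
        raise IOError(f"block {blkno} unreadable on both devices") from last_error
```

In this sketch the filesystem-side change is limited to the write and
read paths for metadata blocks, which is roughly the level of intrusion
the question above is asking about.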

Thanks,

On Fri, Oct 21, 2011 at 8:54 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Oct 21, 2011 at 10:52:11AM -0500, Eric Sandeen wrote:
>> With an SSD, you -really- don't know the independent failure domains,
>> with all the garbage collection & remapping that they may do, right?
>
> In fact, some popular consumer SSDs do some fairly efficient data
> de-duplication, which completely renders any metadata redundancy on a
> single one of these devices void.
>
>



-- 
Aditya
