linux-fsdevel.vger.kernel.org archive mirror
From: Joel Becker <jlbec@evilplan.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: lsf-pc@lists.linuxfoundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Theodore Tso <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [LSF/FS TOPIC] Ext4 snapshots status update
Date: Tue, 29 Mar 2011 17:34:35 -0700
Message-ID: <20110330003429.GA32669@noexit>
In-Reply-To: <AANLkTimuR-oOprBR7Xkehx01ojrdxYOmgqdnX7wNbpt6@mail.gmail.com>

On Wed, Mar 23, 2011 at 10:19:38PM +0200, Amir Goldstein wrote:
> On Fri, Feb 4, 2011 at 2:20 AM, Joel Becker <jlbec@evilplan.org> wrote:
> > On Fri, Feb 04, 2011 at 12:33:39AM +0200, Amir Goldstein wrote:
> >        I've already got a design for a front-end snapshot program that
> > implements a policy on top of this generic behavior.  This design would
> > cover both first-class and hidden style snapshots, because it assumes
> > snapshots are in a distinct namespace.  I haven't gotten around to
> > implementing it yet, but btrfs and other snapshottable filesystems were
> > part of the design goal.
> 
> Any chance of getting a copy of that design of yours, to get a head start
> for LSF?

	Yeah, I owe it to you.  It wasn't a written-down thing, it was a
hammered-out-in-our-heads thing among some ocfs2 developers.  I'm going
to braindump here to get us going.  First, I'll speak to your points.

> Here are some other generic snapshot related topics we may want to discuss:
> 
> 1. Collaborating on the use of inode flags COW_FL, NOCOW_FL, suggested by Chris.

	I'm unsure where these fit, perhaps because I missed the
discussion between Chris and you.  ocfs2 has the inode flag
OCFS2_REFCOUNTED_FL to signify a refcount tree is attached to the inode.
This is ocfs2's structure for maintaining extent reference counts.  Is
your COW_FL the same?  Or is it a permission flag?  NOCOW_FL sounds
like: "Set this flag on the inode and it will prevent CoW."

> 2. How to deal with mmap write to COW file, when you get ENOSPC.

	We just fail the write with VM_FAULT_SIGBUS, like an mmap write to a
hole.  It's what happens for most other CoW filesystems today.  If
you're using CoW, you should be aware of what to expect.

> 3. Adding buffer_remap() flag for buffered I/O code, meaning, there is
> an existing mapping to initialize a page on partial write, but still need
> to call get_block() to get a (possibly) new mapping.

	Since ocfs2 doesn't allocate in get_block(), this doesn't affect
us.  We notice the refcounted extent in write_begin() and CoW it right
there.  Same place we clean up unwritten extents.
 
--snip--

	Now, about my snapshot thoughts as promised.  My understanding
of the snapshots you have implemented in ext4 is that they are like some
SAN snapshots; they are hidden objects not visible unless you use
special access.  They are particular to a given inode and are children
of that inode.  What happens when you remove the visible inode?  Do the
snapshots disappear?  Do you have limitations on how many snapshots a
particular inode can have?  These questions plagued us when we originally
set out to design inode snapshots for ocfs2.
	Once we settled on a mechanism for CoW among ocfs2 inodes, we
quickly decided that a snapshot should be visible in the namespace.
This gave rise to the reflink(2) call, though that name is deprecated in
favor of fastcopy(2).  Currently our API is OCFS2_IOC_REFLINK (see,
legacy!), but we eventually want to get the system call upstream.  In
ocfs2-land, we decided to keep policy out of the kernel.
OCFS2_IOC_REFLINK creates a new inode that shares all the extents of the
source in CoW fashion, but once it returns, that new inode is a peer of
the source.  There is no parent->child relationship.
	Thus, for ocfs2 (and forgive the legacy names, the binary hasn't
changed yet), a "snapshot" is just:

    snapshot: reflink source target.snap && chmod 0444 target.snap

You can add "chattr +i target.snap" in there if you like.
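
To make the one-liner above concrete, here is a minimal sketch of it as a
shell function.  The function name make_snapshot and the REFLINK variable
are assumptions for illustration only; REFLINK just names whatever command
creates the CoW copy and defaults to the reflink binary mentioned above.
The refusal to overwrite an existing target is also an assumption, not
something the mail specifies.

```shell
# Hypothetical helper: take a read-only "snapshot" of one file.
# REFLINK (assumption) is the command that makes the CoW copy.
: "${REFLINK:=reflink}"

make_snapshot() {
    src=$1
    dst=$2
    # Assumed policy: never clobber an existing snapshot.
    if [ -e "$dst" ]; then
        echo "make_snapshot: $dst already exists" >&2
        return 1
    fi
    # Same two steps as the one-liner: reflink, then drop write bits.
    "$REFLINK" "$src" "$dst" && chmod 0444 "$dst"
}
```

Usage would be something like "make_snapshot source target.snap"; adding
"chattr +i" afterwards stays optional, as noted.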
	Since there is no "snapshot namespace" stuff for ocfs2 in the
kernel, it was our intention to propose a snapshot(8) binary that works
like mkfs/fsck; snapshot(8) just calls snapshot.<fstype>(8).  Our
plan was to place snapshot policy in snapshot.ocfs2(8).  This
implementation would handle managing the <mountpoint>/.snapshot/...
namespace behind the user:

    ? cd /mnt/ocfs2
    ? snapshot file1  # Creates /mnt/ocfs2/.snapshot/file1.<timestamp>
    <timestamp>
    ? snapshot file1 test  # Creates /mnt/ocfs2/.snapshot/file1.test
    test
    ? snapshot list file1
    Snapshots for file1:
        <timestamp>
        test

Something like that.
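
A rough sketch of how the front end might be split, assuming it really
does mirror mkfs/fsck.  None of this exists; the function names
(snapshot_path, helper_name, dispatch) and the timestamp format are made
up for illustration, and how the front end learns the filesystem type is
left out.

```shell
# Build the name a snapshot would get under <mountpoint>/.snapshot/.
# Usage: snapshot_path MOUNTPOINT FILE [LABEL]
snapshot_path() {
    mnt=$1
    file=$2
    # With no label, fall back to a timestamp (format is an assumption).
    label=${3:-$(date +%Y%m%d-%H%M%S)}
    printf '%s/.snapshot/%s.%s\n' "${mnt%/}" "$(basename "$file")" "$label"
}

# Name of the per-filesystem helper, in the style of mkfs.<fstype>.
helper_name() {
    printf 'snapshot.%s\n' "$1"
}

# Hand the remaining arguments to the helper for the given fstype.
# Usage: dispatch FSTYPE ARGS...
dispatch() {
    fstype=$1
    shift
    helper=$(helper_name "$fstype")
    if ! command -v "$helper" >/dev/null 2>&1; then
        echo "snapshot: no helper $helper for filesystem type $fstype" >&2
        return 1
    fi
    "$helper" "$@"
}
```

The point of the split is that only snapshot.<fstype> knows how a
snapshot is actually taken; the generic front end just picks the helper
and passes the arguments through.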
	A different snapshot model like ext4 could have snapshot.ext4(8)
call the kernel or whatever mechanism was appropriate.  A filesystem
from a NAS filer could use filer-specific calls.
	Beyond that, I wanted snapshot(8) to handle scheduling of
snapshots.  The usual daily/weekly stuff should be easy to schedule
generically.
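
For the scheduling side, one possible shape is to let a periodic job pass
a generated label to the front end.  This is only a sketch under an
assumed naming scheme of <period>-<date>; periodic_label is a
hypothetical helper, not part of any proposed syntax.

```shell
# Produce a label such as "daily-20110330" for a scheduled snapshot.
# Usage: periodic_label PERIOD [DATE]
periodic_label() {
    period=$1
    # Date stamp defaults to today; the format is an assumption.
    stamp=${2:-$(date +%Y%m%d)}
    case $period in
        daily|weekly)
            printf '%s-%s\n' "$period" "$stamp"
            ;;
        *)
            echo "periodic_label: unknown period $period" >&2
            return 1
            ;;
    esac
}
```

A daily job could then run something like
'snapshot file1 "$(periodic_label daily)"' using the syntax from the
example session above.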
	That's my brain dump.  I could enumerate proposed command
syntaxes, but I don't think that's necessary.

Joel

-- 

"Depend on the rabbit's foot if you will, but remember, it didn't
 help the rabbit."
	- R. E. Shay

			http://www.jlbec.org/
			jlbec@evilplan.org
