From: Joel Becker <jlbec@evilplan.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: lsf-pc@lists.linuxfoundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Theodore Tso <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [LSF/FS TOPIC] Ext4 snapshots status update
Date: Tue, 29 Mar 2011 17:34:35 -0700
Message-ID: <20110330003429.GA32669@noexit>
In-Reply-To: <AANLkTimuR-oOprBR7Xkehx01ojrdxYOmgqdnX7wNbpt6@mail.gmail.com>

On Wed, Mar 23, 2011 at 10:19:38PM +0200, Amir Goldstein wrote:
> On Fri, Feb 4, 2011 at 2:20 AM, Joel Becker <jlbec@evilplan.org> wrote:
> > On Fri, Feb 04, 2011 at 12:33:39AM +0200, Amir Goldstein wrote:
> >        I've already got a design for a front-end snapshot program that
> > implements a policy on top of this generic behavior.  This design would
> > cover both first-class and hidden style snapshots, because it assumes
> > snapshots are in a distinct namespace.  I haven't gotten around to
> > implementing it yet, but btrfs and other snapshottable filesystems were
> > part of the design goal.
> 
> Any chance of getting a copy of that design of yours, to get a head start
> for LSF?

	Yeah, I owe it to you.  It wasn't a written-down thing, it was a
hammered-out-in-our-heads thing among some ocfs2 developers.  I'm going
to braindump here to get us going.  First, I'll speak to your points.

> Here are some other generic snapshot related topics we may want to discuss:
> 
> 1. Coordinating the use of inode flags COW_FL, NOCOW_FL, suggested by Chris.

	I'm unsure where these fit, perhaps because I missed the
discussion between Chris and you.  ocfs2 has the inode flag
OCFS2_REFCOUNTED_FL to signify a refcount tree is attached to the inode.
This is ocfs2's structure for maintaining extent reference counts.  Is
your COW_FL the same?  Or is it a permission flag?  NOCOW_FL sounds
like: "Set this flag on the inode and it will prevent CoW."

> 2. How to deal with mmap write to COW file, when you get ENOSPC.

	We just fail the write with VM_FAULT_SIGBUS, the same as an mmap
write to a hole.  It's what happens for most other CoW filesystems today.
If you're using CoW, you should be aware of what to expect.

> 3. Adding buffer_remap() flag for buffered I/O code, meaning, there is
> an existing mapping to initialize a page on partial write, but still need
> to call get_block() to get a (possibly) new mapping.

	Since ocfs2 doesn't allocate in get_block(), this doesn't affect
us.  We notice the refcounted extent in write_begin() and CoW it right
there.  Same place we clean up unwritten extents.
 
--snip--

	Now, about my snapshot thoughts as promised.  My understanding
of the snapshots you have implemented in ext4 is that they are like some
SAN snapshots; they are hidden objects not visible unless you use
special access.  They are particular to a given inode and are children
of that inode.  What happens when you remove the visible inode?  Do the
snapshots disappear?  Do you have limitations on how many snapshots a
particular inode can have?  These questions plagued us when we originally
set out to design inode snapshots for ocfs2.
	Once we settled on a mechanism for CoW among ocfs2 inodes, we
quickly decided that a snapshot should be visible in the namespace.
This gave rise to the reflink(2) call, though that name is deprecated in
favor of fastcopy(2).  Currently our API is OCFS2_IOC_REFLINK (see,
legacy!), but we eventually want to get the system call upstream.  In
ocfs2-land, we decided to keep policy out of the kernel.
OCFS2_IOC_REFLINK creates a new inode that shares all the extents of the
source in CoW fashion, but once it returns, that new inode is a peer of
the source.  There is no parent->child relationship.
	Thus, for ocfs2 (and forgive the legacy names, the binary hasn't
changed yet), a "snapshot" is just:

    snapshot: reflink source target.snap && chmod 0444 target.snap

You can add "chattr +i target.snap" in there if you like.
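	A minimal userspace sketch of that one-liner, assuming a "reflink"
binary on PATH that takes "source target" as shown above.  The function
names and the injectable clone parameter are hypothetical, added only so
the policy step (chmod 0444) can be exercised on a filesystem without
reflink support:

```python
import os
import subprocess


def reflink_clone(source, target):
    # Assumption: a "reflink" binary is on PATH and is invoked as
    # "reflink source target", matching the one-liner in the mail.
    subprocess.run(["reflink", source, target], check=True)


def make_snapshot(source, target, clone=reflink_clone):
    """Create target as a CoW peer of source, then make it read-only.

    The kernel side only creates a peer inode; marking it read-only is
    the userspace policy that turns the peer into a "snapshot".
    """
    clone(source, target)
    os.chmod(target, 0o444)
    return target
```

Passing clone=shutil.copyfile gives a full (non-CoW) copy, which is only
useful for trying the wrapper out; it is not a substitute for reflink.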
	Since there is no "snapshot namespace" stuff for ocfs2 in the
kernel, it was our intention to propose a snapshot(8) binary that works
like mkfs/fsck; snapshot(8) just calls snapshot.<fstype>(8).  Our
plan was to place snapshot policy in snapshot.ocfs2(8).  This
implementation would handle managing the <mountpoint>/.snapshot/...
namespace on the user's behalf:

    ? cd /mnt/ocfs2
    ? snapshot file1  # Creates /mnt/ocfs2/.snapshot/file1.<timestamp>
    <timestamp>
    ? snapshot file1 test  # Creates /mnt/ocfs2/.snapshot/file1.test
    test
    ? snapshot list file1
    Snapshots for file1:
        <timestamp>
        test

Something like that.
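	The naming in that session could be sketched roughly as follows.
This is an illustration under assumptions, not the planned implementation:
the function names are hypothetical, the timestamp format is made up (none
is given here), and only the file's basename is used, since the example
only covers a file at the top of the mount point:

```python
import os
import time

SNAPSHOT_DIR = ".snapshot"  # directory name taken from the example session


def snapshot_path(mountpoint, filename, label=None, now=None):
    """Return <mountpoint>/.snapshot/<name>.<label or timestamp>."""
    if label is None:
        # Assumption: the timestamp format is not specified in the mail.
        label = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    name = os.path.basename(filename)
    return os.path.join(mountpoint, SNAPSHOT_DIR, name + "." + label)


def list_snapshots(mountpoint, filename):
    """Return the sorted labels of the snapshots recorded for filename."""
    prefix = os.path.basename(filename) + "."
    snapdir = os.path.join(mountpoint, SNAPSHOT_DIR)
    if not os.path.isdir(snapdir):
        return []
    # Simple prefix match: snapshots of "file1.bak" would also show up
    # under "file1".  A real tool would need an unambiguous encoding.
    return sorted(entry[len(prefix):] for entry in os.listdir(snapdir)
                  if entry.startswith(prefix))
```

With these, snapshot_path("/mnt/ocfs2", "file1", "test") yields
/mnt/ocfs2/.snapshot/file1.test, matching the second command above.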
	A different snapshot model like ext4 could have snapshot.ext4(8)
call the kernel or whatever mechanism was appropriate.  A filesystem
from a NAS filer could use filer-specific calls.
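	The mkfs/fsck-style dispatch could look something like the sketch
below.  It assumes the filesystem type is looked up from text in the
/proc/mounts format (device, mount point, fstype, ...); the function names
are hypothetical, and the path is expected to be absolute and normalized:

```python
def fstype_for(path, mounts_text):
    """Return the fstype of the longest mount point containing path."""
    best_dir, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_dir, fstype = fields[1], fields[2]
        # Compare with trailing slashes so /mnt/ocfs2 does not match
        # /mnt/ocfs2x.  ">=" lets a later mount on the same directory win.
        prefix = mount_dir.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_dir) >= len(best_dir):
            best_dir, best_type = mount_dir, fstype
    return best_type


def helper_command(path, mounts_text, args=()):
    """Build the argv for the per-filesystem helper, snapshot.<fstype>."""
    fstype = fstype_for(path, mounts_text)
    if fstype is None:
        raise LookupError("no mount point found for " + path)
    return ["snapshot." + fstype, *args]
```

A front end would read /proc/mounts, call helper_command() with the
absolute path of the file, and exec the result.  Mount points containing
spaces are escaped in /proc/mounts and are not handled by this sketch.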
	Beyond that, I wanted snapshot(8) to handle scheduling of
snapshots.  The usual daily/weekly stuff should be easy to schedule
generically.
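	One way the generic scheduling check might be sketched, assuming
the tool records the time of the last snapshot per schedule.  The schedule
names, intervals, and function name are illustrative assumptions only:

```python
DAY = 24 * 60 * 60
# Assumption: fixed-length intervals are good enough for a sketch.
INTERVALS = {"daily": DAY, "weekly": 7 * DAY}


def due_schedules(last_taken, now):
    """Return the schedule names whose interval has elapsed.

    last_taken maps a schedule name to the epoch time of its most recent
    snapshot; a missing entry means no snapshot has been taken yet.
    """
    due = []
    for name, interval in INTERVALS.items():
        last = last_taken.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due
```

Something run periodically (cron, for instance) could call this and then
invoke the per-filesystem helper once for each schedule that is due.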
	That's my brain dump.  I could enumerate proposed command
syntaxes, but I don't think that's necessary.

Joel

-- 

"Depend on the rabbit's foot if you will, but remember, it didn't
 help the rabbit."
	- R. E. Shay

			http://www.jlbec.org/
			jlbec@evilplan.org
