Re: What to do about subvolumes?

linux-fsdevel.vger.kernel.org archive mirror

From: Josef Bacik <josef@redhat.com>
To: Goffredo Baroncelli <kreijack@libero.it>
Cc: Josef Bacik <josef@redhat.com>,
	ssorce@redhat.com, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, chris.mason@oracle.com
Subject: Re: What to do about subvolumes?
Date: Wed, 1 Dec 2010 13:36:57 -0500
Message-ID: <20101201183656.GD7021@localhost.localdomain>
In-Reply-To: <201012011933.40169.kreijack@libero.it>

On Wed, Dec 01, 2010 at 07:33:39PM +0100, Goffredo Baroncelli wrote:
> On Wednesday, 01 December, 2010, Josef Bacik wrote:
> > Hello,
> > 
> 
> Hi Josef
> 
> > 
> > === What are subvolumes? ===
> > 
> > They are just another tree.  In BTRFS we have various b-trees to describe the
> > filesystem.  A few of them are filesystem wide, such as the extent tree, chunk
> > tree, root tree etc.  The trees that hold the actual filesystem data, that is
> > inodes and such, are kept in their own b-tree.  This is how subvolumes and
> > snapshots appear on disk, they are simply new b-trees with all of the file data
> > contained within them.
> > 
> > === What do subvolumes look like? ===
> > 
> [...]
> > 
> > 2) Obviously you can't just rm -rf subvolumes.  Because they are roots there's
> > extra metadata to keep track of them, so you have to use one of our ioctls to
> > delete subvolumes/snapshots.
> 
> Sorry, but I can't understand this sentence. It is clear that a directory and
> a subvolume have totally different on-disk formats. But why would it not be
> possible to remove a subvolume via the normal rmdir(2) syscall? I posted a
> patch some months ago: when rmdir is invoked on a subvolume, the same
> action as the ioctl BTRFS_IOC_SNAP_DESTROY is performed.
> 
> See https://patchwork.kernel.org/patch/260301/
>  

Oh hey, that's cool.  That would be reasonable I think.  I was just saying that
currently we can't remove subvolumes/snapshots via rm, not that it wasn't
possible at all.  So I think what you did would be a good thing to have.
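
For reference, here is a minimal sketch of what deleting through the ioctl
involves from userspace, written in Python.  The request number and the layout
of struct btrfs_ioctl_vol_args are assumptions taken from linux/btrfs.h, and
the encoding assumes the common _IOW layout (x86, arm).  The helper names are
made up for illustration.

```python
import fcntl
import os
import struct

# Assumptions (linux/btrfs.h, asm-generic ioctl encoding):
#   struct btrfs_ioctl_vol_args { __s64 fd; char name[4088]; };
#   BTRFS_IOC_SNAP_DESTROY = _IOW(0x94, 15, struct btrfs_ioctl_vol_args)
BTRFS_IOCTL_MAGIC = 0x94
BTRFS_PATH_NAME_MAX = 4087
VOL_ARGS_FORMAT = "=q%ds" % (BTRFS_PATH_NAME_MAX + 1)
VOL_ARGS_SIZE = struct.calcsize(VOL_ARGS_FORMAT)

IOC_WRITE = 1  # direction value used by _IOW on x86/arm


def iow(magic, nr, size):
    """Hypothetical helper mirroring the kernel's _IOW() macro."""
    return (IOC_WRITE << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_SNAP_DESTROY = iow(BTRFS_IOCTL_MAGIC, 15, VOL_ARGS_SIZE)


def build_vol_args(name):
    """Pack the subvolume name into a btrfs_ioctl_vol_args buffer."""
    raw = os.fsencode(name)
    if not raw or len(raw) > BTRFS_PATH_NAME_MAX or b"/" in raw:
        raise ValueError("name must be a single path component")
    # fd is left as 0 in this sketch; struct.pack pads the name with NULs.
    return bytearray(struct.pack(VOL_ARGS_FORMAT, 0, raw))


def destroy_subvolume(parent_dir, name):
    """Ask btrfs to delete subvolume `name` inside `parent_dir`."""
    fd = os.open(parent_dir, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # A mutable buffer is passed because the struct exceeds 1024 bytes.
        fcntl.ioctl(fd, BTRFS_IOC_SNAP_DESTROY, build_vol_args(name))
    finally:
        os.close(fd)
```

With the rmdir patch, a plain os.rmdir() on the subvolume would presumably end
up doing the same work as destroy_subvolume() here.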

> [...]
> > 
> > There is one tricky thing.  When you create a subvolume, the directory inode
> > that is created in the parent subvolume has the inode number of 256.  So if you
> > have a bunch of subvolumes in the same parent subvolume, you are going to have a
> > bunch of directories with the inode number of 256.  This is so when users cd
> > into a subvolume we can know it's a subvolume and do all the normal voodoo to
> > start looking in the subvolume's tree instead of the parent subvolume's tree.
> > 
> > This is where things go a bit sideways.  We had serious problems with NFS, but
> > thankfully NFS gives us a bunch of hooks to get around these problems.
> > CIFS/Samba do not, so we will have problems there, not to mention any other
> > userspace application that looks at inode numbers.
> 
> How is this (or how should it be) different from a mounted filesystem?
> For example:
> 
> # cd /tmp
> # btrfs subvolume create sub-a
> # btrfs subvolume create sub-b
> # mkdir mount-a; mkdir mount-b
> # mount /dev/sda6 mount-a		# an ext4 fs
> # mount /dev/sdb2 mount-b		# an ext3 fs
> # stat -c "%8i %n" sub-a sub-b mount-a mount-b
>      256 sub-a
>      256 sub-b
>        2 mount-a
>        2 mount-b
> 
> In this case the inode numbers returned are equal for both the mounted 
> filesystems and the subvolumes. However, the fsid is different.
> 
> # stat -fc "%8i %n" sub-a sub-b mount-a mount-b .
> cdc937c1a203df74 sub-a
> cdc937c1a203df77 sub-b
> b27d147f003561c8 mount-a
> d49e1a3d2333d2e1 mount-b
> cdc937c1a203df75 .
> 
> Moreover, I suggest looking at the difference between the inode numbers
> returned by readdir(3) and stat(3).
>

Yeah you are right, the inode numbering can probably be the same, we just need
to make them logically different mounts so things like NFS and samba still work
right.
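
To make the point concrete: a userspace tool that wants to tell files apart
has to look at the (st_dev, st_ino) pair rather than the inode number alone,
and it can spot a boundary by comparing a directory's st_dev with its
parent's.  Below is a minimal Python sketch of those checks.  The function
names are just for illustration, and whether subvolumes report distinct st_dev
values is an assumption about how btrfs presents them.

```python
import os


def file_identity(path):
    """Return the (device, inode) pair that identifies a file."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def is_boundary(directory):
    """True if `directory` is on a different device than its parent."""
    parent = os.path.join(directory, os.pardir)
    return os.stat(directory).st_dev != os.stat(parent).st_dev


def readdir_stat_mismatches(directory):
    """List entries whose readdir inode differs from their stat inode."""
    mismatches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.inode() is what readdir reported for this name;
            # lstat asks about whatever is visible at that path now.
            if entry.inode() != os.lstat(entry.path).st_ino:
                mismatches.append(entry.name)
    return sorted(mismatches)
```

On the example above, is_boundary() would be expected to return True for
mount-a and mount-b, and readdir_stat_mismatches("/tmp") would be expected to
list them, since readdir reports the inode of the covered directory.  For
sub-a and sub-b the result depends on the st_dev assumption.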

> [...]
> > I feel like I'm forgetting something here, hopefully somebody will point it out.
> > 
> 
> Another point that I would like to discuss is how to manage the "pivoting"
> between subvolumes. One of the most beautiful features of btrfs is the
> snapshot capability. In fact it is possible to make a snapshot of the root of
> the filesystem and to mount it on a subsequent reboot.
> But it is very complicated to manage the pivoting of a snapshot of a root
> filesystem, because I cannot delete the "old root" due to the fact that the
> "new root" is placed in the "old root".
> 
> A possible solution is not to put the root of the filesystem (where /usr,
> /etc, ... are placed) in the root of the btrfs filesystem; but the idea that
> the root of a filesystem should be placed in a subvolume, which in turn is
> placed in the root of a btrfs filesystem, should be accepted from the
> beginning...
> 
> I am open to other opinions.
> 

Agreed, one of the things that Chris and I have discussed is the possibility of
just having dangling roots, since really the directories are just an easy way to
get to the subvolumes.  This would let you delete the original volume and use
the snapshot from then on out.  Something to do in the future for sure.  Thanks,

Josef
