From: Gabriel de Perthuis
Date: Tue, 23 Jul 2013 19:47:41 +0200
To: Hugo Mills
CC: Linux Btrfs, Jerome Haltom
Subject: Re: Q: Why subvolumes?
Message-ID: <51EEC1BD.9030001@gmail.com>
In-Reply-To: <20130723150620.GG20517@carfax.org.uk>

> Now... since the snapshot's FS tree is a direct duplicate of the
> original FS tree (actually, it's the same tree, but they look like
> different things to the outside world), they share everything --
> including things like inode numbers. This is OK within a subvolume,
> because we have the semantics that subvolumes have their own distinct
> inode-number spaces. If we could snapshot arbitrary subsections of the
> FS, we'd end up having to fix up inode numbers to ensure that they
> were unique -- which can't really be an atomic operation (unless you
> want to have the FS locked while the kernel updates the inodes of the
> billion files you just snapshotted).

I don't think so; I just checked some snapshots and the inos are the
same. Btrfs just changes the dev_id of subvolumes (somehow the VFS
allows this).

> The other thing to talk about here is that while the FS tree is a
> tree structure, it's not a direct one-to-one map to the directory tree
> structure. In fact, it looks more like a list of inodes, in inode
> order, with some extra info for easily tracking through the list. The
> B-tree structure of the FS tree is just a fast indexing method. So
> snapshotting a directory entry within the FS tree would require
> (somehow) making an atomic copy, or CoW copy, of only the parts of the
> FS tree that fall under the directory in question -- so you'd end up
> trying to take a sequence of records in the FS tree, of arbitrary size
> (proportional roughly to the number of entries in the directory) and
> copying them to somewhere else in the same tree in such a way that you
> can automatically dereference the copies when you modify them. So,
> ultimately, it boils down to being able to do CoW operations at the
> byte level, which is going to introduce huge quantities of extra
> metadata, and it all starts looking really awkward to implement (plus
> having to deal with the long time taken to copy the directory entries
> for the thing you're snapshotting).

Btrfs already does CoW of arbitrarily large files (extent lists); doing
the same for directories doesn't seem impossible.