From: "Theodore Ts'o" <tytso@mit.edu>
To: Xin Zhao <uszhaoxin@gmail.com>
Cc: Al Viro <viro@ftp.linux.org.uk>,
mingz@ele.uri.edu, mikado4vn@gmail.com,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: Question regarding to store file system metadata in database
Date: Mon, 20 Mar 2006 17:19:11 -0500
Message-ID: <20060320221911.GB11447@thunk.org>
In-Reply-To: <4ae3c140603200713m24a5af0agd891a709286deb47@mail.gmail.com>
On Mon, Mar 20, 2006 at 10:13:43AM -0500, Xin Zhao wrote:
> Second, I should give some background on why we are
> considering storing metadata in a database. We are
> currently developing a file system that allows multiple virtual
> machines to share a base software environment. With our current design,
> a new VM can be deployed in several seconds by inheriting the file
> system of an existing VM. If a VM modifies a shared file, the file
> system does copy-on-write to generate a private copy for that VM.
> Thus, there could be multiple physical copies for a virtual pathname.
> Even more complicated, a physical copy could be shared by an arbitrary
> subset of VMs. Now let's consider how to support this using a regular
> file system. You can treat VMs as clients or users of a standard
> Linux system. Consider the following scenario: VM2 inherits VM1's file
> system. The physical copy for virtual file F is F.1. Then VM2 modifies
> file F and gets its private copy F.2. Now VM3 inherits VM2's file
> system. The inheritance graph is as follows:
> VM1-->VM2-->VM3
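To make the scenario concrete, here is a toy model of the per-VM namespace described above. It is a sketch under stated assumptions, not the poster's design. CowNamespace and its methods are hypothetical. It tracks only the mapping from virtual pathname to physical copy, so no file data is copied. A child VM takes a snapshot of its parent's view at the moment it inherits.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class CowNamespace:
    """Toy model (hypothetical): each VM maps virtual pathnames to physical
    copies; copies stay shared until some VM writes to them."""

    def __init__(self) -> None:
        self.views: Dict[str, Dict[str, str]] = {}            # VM -> {virtual path: physical copy}
        self.sharers: Dict[str, Set[str]] = defaultdict(set)  # physical copy -> VMs mapping it
        self.versions: Dict[str, int] = defaultdict(int)      # virtual path -> copies allocated so far

    def _new_copy(self, path: str) -> str:
        # Name physical copies F.1, F.2, ... as in the example above.
        self.versions[path] += 1
        return f"{path}.{self.versions[path]}"

    def create_vm(self, vm: str) -> None:
        self.views[vm] = {}

    def inherit(self, child: str, parent: str) -> None:
        # The child starts with a snapshot of the parent's view; no data is copied.
        self.views[child] = dict(self.views[parent])
        for phys in self.views[child].values():
            self.sharers[phys].add(child)

    def create_file(self, vm: str, path: str) -> None:
        phys = self._new_copy(path)
        self.views[vm][path] = phys
        self.sharers[phys].add(vm)

    def write(self, vm: str, path: str) -> str:
        phys = self.views[vm][path]
        if self.sharers[phys] != {vm}:
            # Still shared with other VMs: copy-on-write into a private copy.
            self.sharers[phys].discard(vm)
            phys = self._new_copy(path)
            self.views[vm][path] = phys
            self.sharers[phys].add(vm)
        return phys  # an unshared copy is written in place

    def resolve(self, vm: str, path: str) -> Optional[str]:
        return self.views[vm].get(path)


fs = CowNamespace()
fs.create_vm("VM1")
fs.create_file("VM1", "F")   # VM1: F -> F.1
fs.inherit("VM2", "VM1")     # VM2 shares F.1
fs.write("VM2", "F")         # VM2 gets private copy F.2
fs.inherit("VM3", "VM2")     # VM1-->VM2-->VM3; VM3 shares F.2 with VM2
print(fs.resolve("VM1", "F"), fs.resolve("VM2", "F"), fs.resolve("VM3", "F"))  # F.1 F.2 F.2
```

After this sequence, F.2 is shared by VM2 and VM3 but not by VM1. That is the case where one physical copy is shared by an arbitrary subset of VMs. A further write by VM3 would give it a private F.3.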
Why not leverage devicemapper, and implement multiple hierarchical
copy-on-write snapshots at the block device level? It would be much
easier....
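Ted's suggestion moves the sharing below the file system. Each VM gets its own block device, and a snapshot records only the blocks that differ from what it inherited. The toy model below illustrates the idea with a chain of read-only backing layers plus one writable overlay per VM. This is the scheme used by qcow2-style backing files. It is not device-mapper's implementation or interface, and Layer, VMDisk and BLOCK_SIZE are hypothetical names.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4096  # assumed block size for this toy model


class Layer:
    """One copy-on-write layer: blocks written here, plus a read-only backing layer."""

    def __init__(self, backing: Optional["Layer"] = None) -> None:
        self.backing = backing
        self.blocks: Dict[int, bytes] = {}  # toy: data is stored as given, not padded
        self.frozen = False

    def read(self, n: int) -> bytes:
        layer: Optional[Layer] = self
        while layer is not None:  # fall through to older layers
            if n in layer.blocks:
                return layer.blocks[n]
            layer = layer.backing
        return bytes(BLOCK_SIZE)  # never-written blocks read as zeros

    def write(self, n: int, data: bytes) -> None:
        if self.frozen:
            raise PermissionError("layer is shared by snapshots and read-only")
        self.blocks[n] = data


class VMDisk:
    """Hypothetical per-VM block device built from a chain of layers."""

    def __init__(self, layer: Layer) -> None:
        self.layer = layer

    def read(self, n: int) -> bytes:
        return self.layer.read(n)

    def write(self, n: int, data: bytes) -> None:
        self.layer.write(n, data)

    def snapshot(self) -> "VMDisk":
        # Freeze the current layer; parent and child each get a fresh overlay on it.
        base = self.layer
        base.frozen = True
        self.layer = Layer(base)
        return VMDisk(Layer(base))


vm1 = VMDisk(Layer())
vm1.write(0, b"F.1")
vm2 = vm1.snapshot()  # VM1-->VM2
vm2.write(0, b"F.2")  # copy-on-write lands in VM2's overlay only
vm3 = vm2.snapshot()  # VM2-->VM3
print(vm1.read(0), vm2.read(0), vm3.read(0))  # b'F.1' b'F.2' b'F.2'
```

The file system on top sees an ordinary block device, so it needs no pathname mapping at all. The trade-off is that sharing is tracked per block rather than per file.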
> We do know many file systems already use DB-like techniques to index
> metadata, for example the B-tree used by ReiserFS and the HTree used by ext3.
> But they do not provide the feature we need. This exposes at least one
> fundamental limitation: they do not make it easy to extend the metadata. So
> at least some extension must be made to make the mapping efficient. So
> we thought, "since they are using DB-like techniques, why not simply use
> a DB?" At least a DB makes it simple to extend the metadata of a file
> system. For example, in our case, we might also want to add a hash value
> of the file's content to its metadata. This allows us to merge
> several files with identical contents into one to save disk space,
> which is important in our scenario since we assume that many VMs use
> identical software environments.
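The content-hash idea can be sketched in a few lines. This sketch assumes SHA-256 as the hash and plain files on a local file system. content_digest and duplicate_groups are hypothetical helpers, not part of any existing file system. Files are first bucketed by size, so only files that could possibly match get hashed.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def content_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so large files are not loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(paths: Iterable[str]) -> List[List[str]]:
    """Group files whose contents hash identically (candidates for merging)."""
    by_size: Dict[int, List[str]] = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_digest: Dict[str, List[str]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for p in same_size:
            by_digest[content_digest(p)].append(p)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

A matching digest is only strong evidence of identical content. Before replacing the candidates with one shared copy, a merger that actually frees disk space would normally byte-compare them, for example with filecmp.cmp(a, b, shallow=False).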
Why not use a DB? Because most databases are big and bloated and
not something you want to have in the kernel (not even Hans Reiser was
crazy enough to propose stuffing an SQL interpreter into the kernel :-)
--- and if you put the generic database (complete with SQL interpreter
and all the rest) in userspace, then doing upcalls into userspace and
having the database interpret the SQL query, etc., takes time.
If you don't care about performance, by all means, try using FUSE and
implementing a user-space filesystem. It will be slow as all get-out,
but maybe it won't matter for your application.
> Also, I am not proposing to use a DB to store all metadata. As mentioned
> before, currently I am only considering storing the pathname-inode
> mapping. Other attributes like atime and ctime are stored in the standard
> way. So this is essentially a layer above a standard FS. Because only the
> open() syscall needs to access this metadata across the
> kernel boundary, I am expecting a moderate performance impact. But I
> am not sure about this. Does anyone have any experience with that?
That won't be just open(), but stat(), readdir(), unlink(), rename(),
etc. It's all going to depend on your workload and how much
filesystem access it requires. It certainly won't be a general
purpose solution, and for some workloads it will be disastrously slow.
But hey, if you don't believe me, go ahead and try implementing it.....
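Ted's point, that the pathname mapping is consulted by far more than open(), can be made concrete with a minimal sketch. It assumes the mapping lives in a SQLite table in user space. NameMap, its schema and its method names are hypothetical and do not come from the thread. The comment beside each method names the system calls that would hit it. A query counter stands in for the upcalls that the proposed design would add.

```python
import sqlite3
from typing import List, Optional


def _parent(path: str) -> str:
    return path.rsplit("/", 1)[0] or "/"


class NameMap:
    """Hypothetical user-space pathname -> inode table for one file system."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE namemap (vm TEXT, path TEXT, parent TEXT, ino INTEGER,"
            " PRIMARY KEY (vm, path))"
        )
        self.db.execute("CREATE INDEX by_parent ON namemap (vm, parent)")
        self.queries = 0  # each query would also be an upcall in the proposed design

    def _q(self, sql: str, args: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.db.execute(sql, args)

    def add(self, vm: str, path: str, ino: int) -> None:  # create(), mkdir()
        self._q("INSERT INTO namemap VALUES (?, ?, ?, ?)", (vm, path, _parent(path), ino))

    def lookup(self, vm: str, path: str) -> Optional[int]:  # open(), stat(), access()
        row = self._q("SELECT ino FROM namemap WHERE vm = ? AND path = ?", (vm, path)).fetchone()
        return row[0] if row else None

    def readdir(self, vm: str, directory: str) -> List[str]:  # readdir()
        rows = self._q(
            "SELECT path FROM namemap WHERE vm = ? AND parent = ? ORDER BY path", (vm, directory)
        ).fetchall()
        return [path.rsplit("/", 1)[1] for (path,) in rows]

    def unlink(self, vm: str, path: str) -> None:  # unlink(), rmdir()
        self._q("DELETE FROM namemap WHERE vm = ? AND path = ?", (vm, path))

    def rename(self, vm: str, old: str, new: str) -> None:  # rename()
        # Renaming a directory rewrites the row of every descendant, not just one entry.
        # Simplification: assumes `new` does not already exist.
        prefix = old + "/"
        rows = self._q(
            "SELECT path FROM namemap WHERE vm = ? AND (path = ? OR substr(path, 1, ?) = ?)",
            (vm, old, len(prefix), prefix),
        ).fetchall()
        for (path,) in rows:
            moved = new + path[len(old):]
            self._q(
                "UPDATE namemap SET path = ?, parent = ? WHERE vm = ? AND path = ?",
                (moved, _parent(moved), vm, path),
            )


names = NameMap()
names.add("VM2", "/etc", 10)
names.add("VM2", "/etc/passwd", 11)
names.add("VM2", "/etc/hosts", 12)
names.rename("VM2", "/etc", "/conf")  # touches three rows, not one
print(names.lookup("VM2", "/conf/passwd"), names.readdir("VM2", "/conf"))  # 11 ['hosts', 'passwd']
```

Even in this small example, renaming one directory costs four queries, and each lookup behind open() or stat() costs another. In the design under discussion, each of those queries would also mean a round trip between the kernel and the user-space database.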
- Ted