linux-fsdevel.vger.kernel.org archive mirror
* [RFC 1/2] AUFS: merging/stacking several filesystems
@ 2008-04-02  5:12 hooanon05
  2008-04-02 15:11 ` Tomas M
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: hooanon05 @ 2008-04-02  5:12 UTC (permalink / raw)
  To: linux-fsdevel


Hello fs-developers,

I am developing a stackable unification filesystem which unifies several
directories and provides a single merged directory.
I guess most people already know what it is. When users access a file,
the access is passed/redirected/converted (sorry, I am not sure
which English word is correct) to the real file on the member
filesystem. The member filesystem is called the 'lower filesystem' or
'branch' and has a mode, either 'readonly' or 'readwrite'. File
deletion is handled as a 'whiteout' on the upper writable branch.

On this ML, there have been discussions about UnionMount (Jan Blunck
and Bharata B Rao) and Unionfs (Erez Zadok). They took different
approaches to implementing the merged view.
The former tries to put it into the VFS, and the latter implements it as a
separate filesystem.
(If I misunderstand these implementations, please let me know and
I shall correct it, because it has been a long time since I last read
their source files.)
UnionMount's approach can be small, but it may be hard to share
branches between several UnionMounts, since the whiteout is
implemented in the inode on the branch filesystem and is always
shared. According to Bharata's recent post, readdir does not seem to
be finished yet.
Unionfs has a longer history. When I got the idea of a stacking
filesystem (Aug 2005), it already existed. It has virtual super_block,
inode, dentry and file objects, and each has an array pointing to the
lower objects of the same kind. After contributing many patches to
Unionfs, I restarted my project AUFS (Jun 2006).

In AUFS, the structure of the filesystem is similar to Unionfs, but I
implemented my own ideas, approaches and enhancements in it.
Here are some of them and the intention of this post is to get some
initial feedback about its design.
You can see the actual details, documents, CVS logs, and how people
are using it from
<http://aufs.sf.net>.

Kindly review and let me know your comments.


o file mapping -- mmap and sharing pages
----------------------------------------------------------------------
In AUFS, the file-mapped pages are shared between the lower file and
the AUFS virtual one by overriding the vm operations (vm_ops),
particularly ->fault().

In aufs_mmap(),
- get and store the vm_ops of the lower file.
- map the aufs file by generic_file_mmap() and set aufs's vm operations.

In aufs_fault(),
- a race can happen here, for instance in a multithreaded library.
- get the aufs file from the passed vma, sleeping if needed.
- get the lower file from the aufs file.
- call ->fault() in the previously stored vm_ops, with vm_file set to
  the lower file.
- restore vm_file and wake_up anyone who went to sleep.
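
As a rough illustration of the aufs_fault() steps above, here is a
userspace Python model. It is not aufs code: Vma, lower_fault() and
aufs_fault_model() are hypothetical stand-ins, and a condition variable
plays the role of the sleep/wake_up pair.

```python
import threading

class Vma:
    """Stand-in for a vma: only the fields this sketch needs."""
    def __init__(self, aufs_file):
        self.vm_file = aufs_file
        self.busy = False                  # True while vm_file is swapped
        self.cond = threading.Condition()

def lower_fault(vma):
    """Stand-in for the lower ->fault(); it only looks at vma.vm_file."""
    return "page from " + vma.vm_file

def aufs_fault_model(vma, lower_file):
    with vma.cond:
        while vma.busy:                    # someone else swapped vm_file: sleep
            vma.cond.wait()
        vma.busy = True
        saved = vma.vm_file                # the aufs file
        vma.vm_file = lower_file           # lower ->fault() must see its own file
    try:
        return lower_fault(vma)
    finally:
        with vma.cond:
            vma.vm_file = saved            # restore the aufs file
            vma.busy = False
            vma.cond.notify_all()          # wake up anyone who slept
```

The point of the model is only the ordering: vm_file is swapped and
restored under a guard, so two threads faulting on the same vma never
see each other's temporary value.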

When a member filesystem is added to or deleted from the stack (often
called a union), a same-named file may be unveiled, and its contents
will be replaced by the new one when a process calls read(2) through a
previously opened file.
(Some users may not want the file data to be refreshed. For such users,
I plan to implement a mount option 'refrof' which decides whether to
refresh the opened files or not.)
In this case, an already mapped file will not be updated, since the
contents are a part of a process and should not be changed by AUFS
branch management. Of course, if the branch being deleted has a
busy file, it cannot be deleted from the union.

In UnionMount, this does not matter since it doesn't have its own inode
and file objects.
In Unionfs, the memory pages mapped to file data were copied from
the lower (real) file into Unionfs's virtual one and handled by
address_space operations. Recently Unionfs changed to the approach I
suggested last December, which AUFS has used since Jul 2006.


o external inode number table and bitmap (XINO/XIB)
----------------------------------------------------------------------
Because aufs has its own virtual inode, it has to manage the inode
number. Generally iunique() is used for this purpose, but when a user
executes chmod/chown -R on a large directory, or rmdir on a dir which
has children, a problem may arise. chmod/chown -R checks the
inode number, but the number may be changed/re-assigned
silently/internally, and the command will then return an error. In
rmdir, dentry_unhash() is called and the child dentry/inode is
unhashed. It means the inode number of the child will be
changed/re-assigned when it is accessed again.

To keep the inode number unchanged, aufs has an external inode number
table and bitmap (called 'xino' and 'xib') per branch
filesystem. The table is a regular file which is created on the first
writable branch automatically by default. When several branches exist
on the same (real) filesystem, those files are shared.
If xino/xib is unnecessary for a user, he can specify the 'noxino' mount
option to disable it.
Aufs shows the size of these files via sysfs.
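
A minimal in-memory sketch of the idea, in Python, may help. It assumes
one table per real filesystem (keyed by a hypothetical fs_id) plus one
bitmap of numbers in use; the real xino/xib are files on a branch, and
the class and method names here are invented for illustration.

```python
class Xino:
    """Toy model: stable aufs inode numbers for (fs_id, lower inode) pairs."""

    def __init__(self):
        self.tables = {}     # fs_id -> {lower inode number: aufs inode number}
        self.in_use = set()  # stands in for the bitmap ('xib')

    def _alloc(self):
        ino = 1
        while ino in self.in_use:   # lowest free number
            ino += 1
        self.in_use.add(ino)
        return ino

    def lookup(self, fs_id, lower_ino):
        """Return the aufs inode number, assigning one on first use."""
        table = self.tables.setdefault(fs_id, {})
        if lower_ino not in table:
            table[lower_ino] = self._alloc()
        return table[lower_ino]

    def release(self, fs_id, lower_ino):
        """Forget the mapping and free its number in the bitmap."""
        ino = self.tables[fs_id].pop(lower_ino)
        self.in_use.discard(ino)
```

Because the number is looked up from the table instead of being
generated again, a dentry that is unhashed and later re-created gets the
same inode number back.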

Currently these xino/xib files are created and deleted at aufs mount
time (the files stay open), but I have a request from users who
are using aufs on an NFS server and exporting it. So I will implement an
option not to delete the xino/xib files and to re-use them after the NFS
server reboots.

In UnionMount, this does not matter since it doesn't have its own inode.
Unionfs took the iunique() approach and still has the above
problem. But they have already started the Unionfs-ODF branch, which has
another mounted filesystem and delegates the inode number management to
it. The ODF approach has some overhead since it requires
creating/removing files/dirs on another filesystem.


o cache coherency or user's direct access to branch filesystems
  (UDBA) -- inotify
----------------------------------------------------------------------
Users may create/delete/change files on a branch, bypassing aufs, at
any time (user's direct branch access, UDBA). Because aufs has its own
inode and file objects and they are cached in a generic way, it has to
maintain the inode attributes and the directory listing.

In order to implement this, aufs has three levels of detection test. The
strictest test uses the inotify (CONFIG_INOTIFY) feature. When a user
specifies this test level, aufs sets an inotify watch on every
branch dir in the cache. When an aufs dir inode object is created and
cached, it refers to the real dirs on the branches, and aufs sets
inotify watches on them and is notified when UDBA occurs. The watch
is cleared when the aufs dir inode is purged from the system
inode cache.
When UDBA occurs, aufs registers a function with the 'events' thread by
schedule_work(), and the function sets a special status in the
cached aufs inode private data. When the same file is accessed through
aufs, aufs detects the status and refreshes all necessary data.

The other two levels of test don't use inotify. The simplest test
level checks nothing. It is for readonly filesystems such as
cdrom (even if the strictest test is specified, aufs doesn't set
inotify on such filesystems). The middle level (default)
checks/compares inode attributes in d_revalidate(). It means this
test level may not be effective for a negative dentry.
In most cases, I guess the default level is enough, and users can execute
'mount -o remount /aufs' to discard the unused caches. But if a user
really wants UDBA to be reflected soon, the highest test level will help
him/her.
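
The default level can be pictured with a small userspace sketch in
Python. It is only an analogy: CachedInode is a hypothetical class, and
the choice of attributes to compare (inode number, size, mtime) is an
assumption made for this example, not a statement about what aufs
compares.

```python
import os

class CachedInode:
    """Cached attributes of one file on a branch (hypothetical stand-in)."""

    def __init__(self, path):
        self.path = path
        self.attrs = self._read_attrs()

    def _read_attrs(self):
        st = os.stat(self.path)
        return (st.st_ino, st.st_size, st.st_mtime_ns)

    def revalidate(self):
        """Return True if the cache still matches the branch file.

        On a mismatch (someone changed the file behind our back), refresh
        the cached attributes and return False.
        """
        current = self._read_attrs()
        if current == self.attrs:
            return True
        self.attrs = current
        return False
```

A check of this kind only runs when the entry is looked at again, and it
needs an existing file to compare against, which is why it says little
about a name that was cached as nonexistent.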


o hardlink over branches, pseudo-link
----------------------------------------------------------------------
When a file on a lower readonly branch is hard-linked (fileA and
fileB) and a user modifies fileA, aufs copies it up to the upper
writable branch and makes the originally requested change to fileA on
the upper branch. On the writable branch, fileA is not hardlinked. It
means fileB on the lower branch still has the old contents.

To address this problem, aufs introduced a 'pseudo-link' (plink), which
is a logical hardlink over branches. It maintains a simple inode list
in memory and checks whether the accessed inode is in the list.
As a result, fileB is handled as if it existed on the writable branch, by
referencing fileA's inode on the writable branch as fileB's inode.

Additionally, to support the case where fileA on the writable branch is
deleted, aufs creates another hardlink on the writable branch, which
exists under a special directory to hide it from users.

At remount/umount time, the /sbin/{mount,umount}.aufs scripts check the
pseudo-linked inode list in aufs, reproduce all real hardlinks on
the writable branch, and flush the list in memory (but these scripts
have a potential race problem).
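
The effect of the plink list can be shown with a toy two-branch model in
Python. Everything here (the Union class, its dictionaries, the inode
numbers) is hypothetical; it keeps only the lookup-by-lower-inode idea
and leaves out the hidden hardlink and the remount-time flush.

```python
class Union:
    """Toy model: a readonly lower branch plus a writable upper branch."""

    def __init__(self, lower_names, lower_data):
        self.lower_names = lower_names  # name -> lower inode number
        self.lower_data = lower_data    # lower inode number -> bytes
        self.plink = {}                 # lower inode number -> copied-up data

    def write(self, name, data):
        ino = self.lower_names[name]
        if ino not in self.plink:       # first write: copy up once per inode
            self.plink[ino] = bytearray(self.lower_data[ino])
        self.plink[ino][:] = data       # modify the upper copy only

    def read(self, name):
        ino = self.lower_names[name]
        if ino in self.plink:           # pseudo-linked: use the upper inode
            return bytes(self.plink[ino])
        return self.lower_data[ino]
```

With fileA and fileB sharing one lower inode, a write through fileA is
visible through fileB, while the data on the lower branch stays
untouched.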


Thank you for reading this long post and my broken English.

Junjiro Okajima


* Re: [RFC 1/2] AUFS: merging/stacking several filesystems
  2008-04-02  5:12 [RFC 1/2] AUFS: merging/stacking several filesystems hooanon05
@ 2008-04-02 15:11 ` Tomas M
  2008-04-03  6:56   ` hooanon05
  2008-04-03  6:53 ` hooanon05
  2008-04-03  6:58 ` [RFC 2/2] " hooanon05
  2 siblings, 1 reply; 9+ messages in thread
From: Tomas M @ 2008-04-02 15:11 UTC (permalink / raw)
  To: linux-fsdevel

Hi all.

I am happy to see the AuFS author on this list, and I hope some people will review the design and post their own comments and ideas, in order to make AuFS even better. :) I have been an AuFS user for a long time, and what I really appreciate (from the user's point of view) is the following:

- AuFS supports writable branch balancing. That means you can set up several partitions for writing, and AuFS will split all new/modified files between them, based on free disk space, existence of the parent directory, randomly, or combinations of these.

- AuFS supports a huge number of branches. I'm currently using hundreds of branches with just a small slowdown (which is to be expected).

- AuFS provides a list of branches through /sys, which doesn't have the limitation that /proc/mounts has. For that reason, it works correctly even with thousands of branches (while so many branches would break /proc/mounts altogether).

- AuFS implements the 'rr' branch mode, which means 'really-readonly'. This is really useful, particularly with ISO images or SquashFS filesystems as a branch, as AuFS doesn't need to re-lookup those filesystems. (You know, a readonly 'ro' branch can be modified from another place, e.g. over the network, so a 'direct branch access' can occur even for read-only directories, and AuFS handles it correctly.)

- Last but not least, AuFS is really stable in real-world situations. I used unionfs in the past, but my second name for it was 'NULL POINTER DEREFERENCE'. I can see those errors still happening in the latest unionfs as well; the last one I've found is from 27th of May 2008 ... BUG: unable to handle kernel NULL pointer dereference. ... I have absolutely no idea what that means, but the same errors have kept appearing in unionfs for years. You won't see anything like that in AuFS. Guess why knoppix and other projects switched to it :)

That's all from me :)
Thanks

Tomas M
slax.org



* [RFC 1/2] AUFS: merging/stacking several filesystems
  2008-04-02  5:12 [RFC 1/2] AUFS: merging/stacking several filesystems hooanon05
  2008-04-02 15:11 ` Tomas M
@ 2008-04-03  6:53 ` hooanon05
  2008-04-03  6:58 ` [RFC 2/2] " hooanon05
  2 siblings, 0 replies; 9+ messages in thread
From: hooanon05 @ 2008-04-03  6:53 UTC (permalink / raw)
  To: linux-fsdevel


> Here are some of them and the intention of this post is to get some
> initial feedback about its design.
	:::
> Kindly review and let me know your comments.

o readdir -- virtual dir block on memory (VDIR)
----------------------------------------------------------------------
This is an approach I posted a few months ago in reply to UnionMount's
post. It constructs a virtual dir block in memory. For readdir, aufs
calls vfs_readdir() internally for each lower dir, merges their
entries while eliminating the whiteout-ed ones, and gives the result to
the file (dir) object. So the file object keeps its entry list until it
is closed. The entry list is updated when the file position is zero
and the list has become old. This decision is made by aufs automatically.
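
The merge step can be sketched in a few lines of Python. This is a
simplified model, not the VDIR implementation: branches are plain lists
of names with the highest branch first, and the whiteout for "foo" is
assumed to be an entry named ".wh.foo" (a naming convention this post
does not spell out). Opaque directories are not modelled.

```python
WH_PREFIX = ".wh."   # assumed whiteout naming for this sketch

def merge_readdir(branches):
    """Merge per-branch name lists (highest branch first) into one listing."""
    seen = set()        # names already emitted from a higher branch
    whiteouts = set()   # names hidden by whiteouts on higher branches
    merged = []
    for names in branches:
        hidden_here = set()
        for name in names:
            if name.startswith(WH_PREFIX):
                # A whiteout hides the same name on the branches below.
                hidden_here.add(name[len(WH_PREFIX):])
            elif name not in seen and name not in whiteouts:
                seen.add(name)
                merged.append(name)
        whiteouts |= hidden_here
    return merged
```

For example, with an upper branch holding "a" and ".wh.b" and a lower
branch holding "b", "c" and "a", the merged listing is "a" and "c".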

It may consume rather a lot of memory and cpu cycles. To reduce the
number of memory allocations, the implementation became rather tricky.

Some people may call it a security hole or a DoS attack vector, since an
opened and once readdir-ed dir (file object) holds its entry list and
puts pressure on system memory. But I'd say it is similar to
files under /proc or /sys. The virtual files on procfs and sysfs also
hold a memory page (generally) while they are open. When an idea to
reduce memory for them is introduced, it will be applied to aufs too.


o policies for selecting one among multiple writable branches,
  parent-dir, round-robin and most-free-space
----------------------------------------------------------------------
When there is more than one writable branch, aufs has to decide
the target branch for file creation or copy-up. By default, the highest
writable branch which has the parent (or ancestor) dir of the target
file is chosen (top-down-parent policy).
At users' request, aufs has some other policies to select the writable
branch: round-robin and most-free-space policies for file creation, and
top-down-parent, bottom-up-parent and bottom-up policies for copy-up.

As expected, the round-robin policy selects branches in circular order.
When you have two writable branches and create 10 new files, 5 files
will be created on each branch. The mkdir(2) systemcall is an exception.
When you create 10 new directories, all are created on the same branch.
The most-free-space policy selects the one which has the most free
space among the writable branches. The amount of free space is
checked by aufs internally, and users can specify the checking interval.
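
The two creation policies are easy to picture in Python. The names
RoundRobin and most_free_space are made up for this sketch, the free
space is passed in as a plain dictionary instead of being sampled
periodically, and the mkdir(2) exception is not modelled.

```python
import itertools

class RoundRobin:
    """Hand out writable branches in circular order."""

    def __init__(self, branches):
        self._cycle = itertools.cycle(branches)

    def select(self):
        return next(self._cycle)

def most_free_space(free_bytes):
    """Pick the branch with the largest free space.

    free_bytes maps a branch name to its free bytes, as sampled at the
    last check.
    """
    return max(free_bytes, key=free_bytes.get)
```

With two writable branches, ten calls to select() give five picks for
each branch, matching the example above.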

The policies for copy-up are simpler:
- top-down-parent is equivalent to the same-named create policy,
- bottom-up-parent selects the nearest writable branch above the
  copyup-source where the parent dir exists,
- bottom-up selects the nearest writable branch above the
  copyup-source, regardless of the existence of the parent dir.
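
The three copy-up policies can be compared side by side in a small
Python sketch. It assumes branches are held in a list with index 0 as
the highest branch, that src is the index of the copyup-source, and that
only branches above src are candidates; the Branch type and function
names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    writable: bool
    has_parent: bool   # does the parent dir of the file exist here?

def top_down_parent(branches, src):
    """Highest writable branch above src that has the parent dir."""
    for i in range(src):
        if branches[i].writable and branches[i].has_parent:
            return i
    return None

def bottom_up_parent(branches, src):
    """Nearest writable branch above src that has the parent dir."""
    for i in range(src - 1, -1, -1):
        if branches[i].writable and branches[i].has_parent:
            return i
    return None

def bottom_up(branches, src):
    """Nearest writable branch above src, parent dir or not."""
    for i in range(src - 1, -1, -1):
        if branches[i].writable:
            return i
    return None
```

Each function returns the index of the selected branch, or None when no
branch qualifies.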

There are some rules or exceptions in applying these policies.
- If there is a readonly branch above the policy-selected branch and
  the parent dir is marked as opaque (a variation of whiteout), or the
  target (to-be-created) file is whiteout-ed on the upper readonly
  branch, then the policy is ignored and the target file is created
  on the nearest writable branch above the readonly branch.
- If there is a writable branch above the policy-selected branch and
  the parent dir is marked as opaque or the target file is whiteout-ed
  on the branch, then the policy is ignored and the target file
  is created on the highest one among the upper writable branches
  which have a diropq or whiteout. In the case of a whiteout, aufs
  removes it as usual.
- link(2) and rename(2) systemcalls are exceptions in every policy.
  They try to select the branch where the source exists whenever
  possible, since copying up a large file takes a long time. If that is
  not possible, i.e. the branch where the source exists is readonly,
  then they follow the copyup policy.
- There is an exception for rename(2) when the target exists.
  If the rename target exists, aufs compares the indices of the branches
  where the source and the target exist and selects the higher
  one. If the selected branch is readonly, then aufs follows the
  copyup policy.


o revert everything after an error on a branch in a single systemcall,
  and remove/rename dir -- temporary name and EXDEV
----------------------------------------------------------------------
Since aufs handles several filesystems internally, it is important to
revert everything after an error happened on a branch internally, and to
return the expected systemcall error.
To do this, aufs selects only one target writable branch for
create/remove operations and doesn't change other
branches. Additionally, aufs has to pay attention to the order of
internal operations to make them revertible at any point. The general
rule is as follows.

For creation,
- lock the real dir on the target branch
- lookup a whiteout for the target
- actual creation of the target
- unlink the whiteout for it, if it exists
- d_instantiate()
- unlock the real dir

For removal,
- lock the real dir on the target branch
- create a whiteout for the target, if needed
- actual removal of the target, if it exists on the target branch
- unlock the real dir
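
To make the ordering concrete, here is an in-memory Python model of the
two sequences with an explicit revert step. It is a sketch under
assumptions: the ".wh." whiteout name, the BranchDir class and its
fault-injection hook (fail_on) are invented for this example, and
d_instantiate() has no counterpart here.

```python
import threading

WH_PREFIX = ".wh."   # assumed whiteout naming for this sketch

class BranchDir:
    """One real directory on the target writable branch."""

    def __init__(self, names=()):
        self.names = set(names)
        self.lock = threading.Lock()
        self.fail_on = None   # e.g. ("unlink", ".wh.f") to inject an error

    def _maybe_fail(self, op, name):
        if self.fail_on == (op, name):
            raise OSError("injected failure: %s %s" % (op, name))

    def create(self, name):
        self._maybe_fail("create", name)
        self.names.add(name)

    def unlink(self, name):
        self._maybe_fail("unlink", name)
        self.names.remove(name)

def create_model(d, name):
    wh = WH_PREFIX + name
    with d.lock:                      # lock the real dir
        had_wh = wh in d.names        # look up the whiteout
        d.create(name)                # actual creation
        if had_wh:
            try:
                d.unlink(wh)          # remove the whiteout
            except OSError:
                d.names.discard(name) # revert the creation
                raise

def remove_model(d, name, exists_below):
    wh = WH_PREFIX + name
    with d.lock:                      # lock the real dir
        made_wh = False
        if exists_below:              # whiteout needed to hide lower entries
            d.create(wh)
            made_wh = True
        if name in d.names:
            try:
                d.unlink(name)        # actual removal
            except OSError:
                if made_wh:
                    d.names.discard(wh)   # revert the whiteout
                raise
```

In both functions a failure in the later step leaves the directory
exactly as it was before the call, and the error is passed back to the
caller.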

Generally rename(2) can handle the destination dir which already
exists, and aufs_rename() basically calls vfs_rename() on the writable
branch. When an empty dst-dir exists on the lower branch(es), aufs has
to make the renamed dir opaque (which is a variation of whiteout and
called 'diropq') by creating a special 'diropq' file under the renamed
dir.
If aufs cannot create the 'diropq' file, aufs cannot revert the
previous vfs_rename().

To address this problem, aufs renames the existing dst-dir to a
temporary whiteout-ed name before the actual vfs_rename(). After
all operations have succeeded, aufs_rename() passes the temporary name
to another kernel thread and returns.
The kernel thread removes the temporary name later.
If aufs cannot create the 'diropq' file, it tries to vfs_rename() the
src-dir back to its old name, and the temporary name back to the old
dst-dir name.
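
The sequence can be modelled in Python with a dictionary standing in for
the parent directory on the writable branch. The temporary name format,
the function name and the make_opaque callback are all hypothetical; the
list 'deferred' plays the role of the work handed to the other kernel
thread.

```python
def rename_dir_model(entries, src, dst, make_opaque, deferred):
    """Rename dir src over an existing dir dst, revertibly.

    entries maps a dir name to its set of children.
    """
    tmp = ".wh..tmp." + dst             # hypothetical temporary name
    entries[tmp] = entries.pop(dst)     # 1. move the existing dst aside
    entries[dst] = entries.pop(src)     # 2. the actual rename
    try:
        make_opaque(entries[dst])       # 3. create the 'diropq' marker
    except OSError:
        entries[src] = entries.pop(dst) # revert step 2
        entries[dst] = entries.pop(tmp) # revert step 1
        raise
    deferred.append(tmp)                # 4. removed later by another thread
```

Because the old dst-dir still exists under the temporary name until the
very end, a failure at step 3 can always be undone by two renames.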

This approach is implemented in aufs_rmdir() too (except when the branch
is NFS), and is very effective when the target dir has many whiteouts,
since aufs has to unlink the child whiteouts before calling vfs_rmdir().
That may take a long time, and the user has to wait until the
_logically_ empty dir is removed.
With this approach, the user doesn't need to wait so long.
But when the number of child whiteouts is small, nobody likes this
overhead. So aufs has an option which specifies the threshold of the
number of child whiteouts.

In rename(2), when the target dir has its children on several branches,
aufs_rename() returns -EXDEV, since it may cause many/long internal
copy-ups. Generally mv(1) supports this case and retries with
create/copy for each child.
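
From the caller's side, the fallback that mv(1) performs looks roughly
like the following Python sketch. move_dir() is a hypothetical helper
for directories only, and its 'rename' parameter exists just so the
EXDEV case can be exercised without two filesystems; Python's own
shutil.move() does a similar fallback.

```python
import errno
import os
import shutil

def move_dir(src, dst, rename=os.rename):
    """Rename a directory, falling back to copy and delete on EXDEV."""
    try:
        rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:
            raise                  # some other error: pass it on
        shutil.copytree(src, dst)  # re-create each child under dst
        shutil.rmtree(src)         # then remove the original tree
```

So returning -EXDEV moves the child-by-child copying out of a single
systemcall and into userspace.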


Thank you for reading this long post and my broken English.

Junjiro Okajima


* Re: [RFC 1/2] AUFS: merging/stacking several filesystems
  2008-04-02 15:11 ` Tomas M
@ 2008-04-03  6:56   ` hooanon05
  0 siblings, 0 replies; 9+ messages in thread
From: hooanon05 @ 2008-04-03  6:56 UTC (permalink / raw)
  To: Tomas M; +Cc: linux-fsdevel


Thanx Tomas,

Tomas M:
> - AuFS provides a list of branches through /sys, which doesn't have the limitation that /proc/mounts has. For that reason, it works correctly even with thousands of branches (while so many branches would break /proc/mounts altogether).

Strictly speaking, it is not a limitation of /proc/mounts but of
mount(8) or /etc/mtab. I corrected the aufs document and myself a few
weeks ago.


Junjiro Okajima


* [RFC 2/2] AUFS: merging/stacking several filesystems
  2008-04-02  5:12 [RFC 1/2] AUFS: merging/stacking several filesystems hooanon05
  2008-04-02 15:11 ` Tomas M
  2008-04-03  6:53 ` hooanon05
@ 2008-04-03  6:58 ` hooanon05
  2008-05-12  4:43   ` hooanon05
  2 siblings, 1 reply; 9+ messages in thread
From: hooanon05 @ 2008-04-03  6:58 UTC (permalink / raw)
  To: linux-fsdevel



* Re: [RFC 2/2] AUFS: merging/stacking several filesystems
  2008-04-03  6:58 ` [RFC 2/2] " hooanon05
@ 2008-05-12  4:43   ` hooanon05
  2008-05-12  4:54     ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: hooanon05 @ 2008-05-12  4:43 UTC (permalink / raw)
  To: akpm, linux-fsdevel


> > Here are some of them and the intention of this post is to get some
> > initial feedback about its design.
	:::

I posted some of the ideas, designs and approaches which are implemented
in the AUFS stackable filesystem about a month ago.
While I still plan to implement some more features, the current
AUFS is in good shape and has been used by many people for years.
Since I have received requests to submit AUFS to the mainline more
than once, I'd now like to ask you to include AUFS in mainline.
But the source is large (see below).
Should I send all of these files to this ML, or ask you to download them
from CVS?
If AUFS were much smaller, I would send the files here without asking.


Junjiro Okajima

----------------------------------------------------------------------
$ wc -l fs/aufs25/*.[ch]
    56 fs/aufs25/aufs.h
   109 fs/aufs25/br_fuse.c
   391 fs/aufs25/br_nfs.c
    69 fs/aufs25/br_xfs.c
   932 fs/aufs25/branch.c
   345 fs/aufs25/branch.h
  1043 fs/aufs25/cpup.c
    82 fs/aufs25/cpup.h
   246 fs/aufs25/dcsub.c
    54 fs/aufs25/dcsub.h
   485 fs/aufs25/debug.c
   210 fs/aufs25/debug.h
  1020 fs/aufs25/dentry.c
   384 fs/aufs25/dentry.h
   425 fs/aufs25/dinfo.c
   573 fs/aufs25/dir.c
   146 fs/aufs25/dir.h
   113 fs/aufs25/dlgt.c
   597 fs/aufs25/export.c
   661 fs/aufs25/f_op.c
   826 fs/aufs25/file.c
   246 fs/aufs25/file.h
   185 fs/aufs25/finfo.c
   708 fs/aufs25/hin_or_dlgt.c
   188 fs/aufs25/hinode.h
  1114 fs/aufs25/hinotify.c
   844 fs/aufs25/i_op.c
   828 fs/aufs25/i_op_add.c
   582 fs/aufs25/i_op_del.c
   832 fs/aufs25/i_op_ren.c
   290 fs/aufs25/iinfo.c
   425 fs/aufs25/inode.c
   336 fs/aufs25/inode.h
   307 fs/aufs25/misc.c
   201 fs/aufs25/misc.h
   243 fs/aufs25/module.c
    78 fs/aufs25/module.h
  1489 fs/aufs25/opts.c
   245 fs/aufs25/opts.h
   349 fs/aufs25/plink.c
   111 fs/aufs25/robr.c
   268 fs/aufs25/sbinfo.c
   906 fs/aufs25/super.c
   409 fs/aufs25/super.h
   107 fs/aufs25/sysaufs.c
   150 fs/aufs25/sysaufs.h
   498 fs/aufs25/sysfs.c
   112 fs/aufs25/sysrq.c
   963 fs/aufs25/vdir.c
   653 fs/aufs25/vfsub.c
   493 fs/aufs25/vfsub.h
   693 fs/aufs25/wbr_policy.c
  1058 fs/aufs25/whout.c
   140 fs/aufs25/whout.h
   321 fs/aufs25/wkq.c
   160 fs/aufs25/wkq.h
  1304 fs/aufs25/xino.c
 26603 total
----------------------------------------------------------------------

(http://aufs.sf.net)

Aufs -- Another Unionfs
Junjiro Okajima

# $Id: README,v 1.79 2008/05/04 23:55:14 sfjro Exp $


0. Introduction
----------------------------------------
In the early days, aufs was an entirely re-designed and re-implemented
Unionfs Version 1.x series. After many original ideas, approaches,
improvements and implementations, it has become totally different from
Unionfs while keeping the basic features.
Recently, the Unionfs Version 2.x series has begun taking some of the
same approaches as aufs.
Unionfs is being developed by Professor Erez Zadok at Stony Brook
University and his team.
If you don't know Unionfs, I recommend becoming familiar with it
before using aufs. Some terminology in aufs follows Unionfs's.

Bug reports (including my broken English), suggestions, comments
and donations are always welcome. Your bug report may help other users,
including future users. Bug reports about behavior which doesn't follow
unix/linux filesystem semantics are especially important.


1. Features
----------------------------------------
- unite several directories into a single virtual filesystem. The member
  directory is called a branch.
- you can specify the permission flags for each branch, which are
  'readonly', 'readwrite' and 'whiteout-able'.
- by an upper writable branch, internal copyup and whiteout, files/dirs
  on a readonly branch are logically modifiable.
- dynamic branch manipulation, add, del.
- etc... see Unionfs in detail.

Also there are many enhancements in aufs, such as:
- safer and faster
- keep inode number by external inode number table
- keep the timestamps of file/dir in internal copyup operation
- seekable directory, supporting NFS readdir.
- support mmap(2) including /proc/PID/exe symlink, without page-copy
- whiteout is hardlinked in order to reduce the consumption of inodes
  on branch
- do not copyup, nor create a whiteout when it is unnecessary
- revert a single systemcall when an error occurs in aufs
- remount interface instead of ioctl
- maintain /etc/mtab by an external shell script, /sbin/mount.aufs.
- loopback mounted filesystem as a branch
- kernel thread for removing a dir which has plenty of whiteouts
- support copyup sparse file (a file which has a 'hole' in it)
- default permission flags for branches
- selectable permission flags for ro branch, whether whiteout can
  exist or not
- export via NFS.
- support <sysfs>/fs/aufs.
- support multiple writable branches, some policies to select one
  among multiple writable branches.
- a new semantics for link(2) and rename(2) to support multiple
  writable branches.
- a delegation of the internal branch access to support task I/O
  accounting, which also supports Linux Security Modules (LSM) mainly
  for Suse AppArmor.
- nested mount, i.e. aufs as readonly no-whiteout branch of another aufs.
- copyup-on-open or copyup-on-write
- show-whiteout mode
- show configuration even out of kernel tree
- no glibc changes are required.
- and more... see the aufs manual for details
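
To illustrate the "external inode number table" item above, here is a
minimal sketch of the idea: each (branch, inode number on that branch)
pair is mapped to one stable inode number in the merged view, and a
copyup registers the new copy under the same number. This is only a
conceptual model. The XinoTable class, its method names and the in-memory
dict are assumptions for this example and do not reflect the actual xino
file format.

```python
import itertools


class XinoTable:
    """Map (branch id, inode number on that branch) to a stable virtual inode number."""

    def __init__(self, first_ino=2):
        # first_ino is a hypothetical starting value for this sketch
        self._next = itertools.count(first_ino)
        self._table = {}

    def get(self, branch_id, lower_ino):
        """Return the virtual inode number, allocating one on first use."""
        key = (branch_id, lower_ino)
        if key not in self._table:
            self._table[key] = next(self._next)
        return self._table[key]

    def copyup(self, src, dst):
        """Record that dst, a new copy on a writable branch, is the same file as src."""
        ino = self.get(*src)
        self._table[dst] = ino
        return ino
```

In this model, a file first seen as inode 100 on branch 1 keeps its
virtual inode number after being copied up to inode 555 on branch 0,
which is what applications comparing st_ino across a copyup would need.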

Aufs is still in the development stage, especially:
- pseudo hardlink (hardlink over branches)
- allow manual direct access to a file on a branch, i.e. bypassing aufs,
  including NFS or remote filesystem branches.
- refine xino and revalidate
- pseudo-link in NFS-exporting

(current work)
- reorder the branch index without del/re-add.
- permanent xino files

(next work)
- an option for refreshing the opened files after add/del branches
- 'move' policy for copy-up between two writable branches, after
  checking free space.
- ioctl to manipulate file between branches.
- and documentation

(just an idea)
- remount option copy/move between two branches. (unnecessary?)
- O_DIRECT (unnecessary?)
- light version, without branch manipulation. (unnecessary?)
- SMP, because I don't have such a machine. But several users reported
  that aufs is working fine on SMP machines.
- copyup in userspace
- inotify in userspace
- xattr, acl


2. Download
----------------------------------------
The CVS tree is in the aufs project on SourceForge.
Here are simple instructions to get the aufs source files. It is recommended
to refer to the SourceForge documentation about CVS.
	$ mkdir aufs.wcvs
	$ cd aufs.wcvs
	$ cvs -d:pserver:anonymous@aufs.cvs.sourceforge.net:/cvsroot/aufs login
	(CVS password is empty)
	$ cvs -z3 -d:pserver:anonymous@aufs.cvs.sourceforge.net:/cvsroot/aufs co aufs

In order to update files after the first checkout,
	$ cd aufs.wcvs/aufs
	$ cvs update
Do not forget the -A option for 'cvs update' if you have ever run
'cvs update' with a specific file version.

In order to see the difference between two versions (two dates),
	$ cd aufs.wcvs/aufs
	$ cvs diff -D20061212 -D20061219

I usually update the CVS tree every Monday.
I always try to put a stable version in CVS, so you can use CVS
instead of the SourceForge File Release. All changes are summarized
and reported to the aufs-users at lists.sourceforge.net ML. I
recommend that you join this ML.

(snip)


* Re: [RFC 2/2] AUFS: merging/stacking several filesystems
  2008-05-12  4:43   ` hooanon05
@ 2008-05-12  4:54     ` Andrew Morton
  2008-05-16 14:20       ` hooanon05
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2008-05-12  4:54 UTC (permalink / raw)
  To: hooanon05; +Cc: linux-fsdevel

On Mon, 12 May 2008 13:43:56 +0900 hooanon05@yahoo.co.jp wrote:

> 
> > > Here are some of them and the intention of this post is to get some
> > > initial feedback about its design.
> 	:::
> 
> I have posted some of ideas, design or approaches which are implemented
> in AUFS stackable filesystem about a month before.
> While I have a plan to implement some more features still, the current
> AUFS status is better and used many people for years.
> Since I have received requests to submit AUFS into the mainline more
> than once, Now I'd ask you to include AUFS into mainline.
> But the source is large (see below).
> Should I send all of these files to this ML, or ask you to download them
> from CVS?
> If AUFS was much smaller, I would send files here without asking.
> 

Yup, prepare a patch series and email them out.  cc linux-kernel too.

> ...
>
> 0. Introduction
> ----------------------------------------
> In the early days, aufs was entirely re-designed and re-implemented
> Unionfs Version 1.x series. After many original ideas, approaches,
> improvements and implementations, it becomes totally different from
> Unionfs while keeping the basic features.
> Recently, Unionfs Version 2.x series begin taking some of same
> approaches to aufs's.
> Unionfs is being developed by Professor Erez Zadok at Stony Brook
> University and his team.
> If you don't know Unionfs, I recommend you becoming familiar with it
> before using aufs. Some terminology in aufs follows Unionfs's.

We'd be interested in hearing if aufs addresses any of the shortcomings
which reviewers identified in unionfs and if so, how.

> CVS tree is in aufs project of SourceForge.

Nobody is set up to handle cvs, sorry.  It would be worth the time to
migrate it to git.



* Re: [RFC 2/2] AUFS: merging/stacking several filesystems
  2008-05-12  4:54     ` Andrew Morton
@ 2008-05-16 14:20       ` hooanon05
  2008-05-16 14:36         ` hooanon05
  0 siblings, 1 reply; 9+ messages in thread
From: hooanon05 @ 2008-05-16 14:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-fsdevel, linux-kernel


Andrew Morton:
> On Mon, 12 May 2008 13:43:56 +0900 hooanon05@yahoo.co.jp wrote:
> 
> > 
> > > > Here are some of them and the intention of this post is to get some
> > > > initial feedback about its design.
> > 	:::
> > 
> > I have posted some of ideas, design or approaches which are implemented
> > in AUFS stackable filesystem about a month before.
> > While I have a plan to implement some more features still, the current
> > AUFS status is better and used many people for years.
> > Since I have received requests to submit AUFS into the mainline more
> > than once, Now I'd ask you to include AUFS into mainline.
> > But the source is large (see below).
> > Should I send all of these files to this ML, or ask you to download them
> > from CVS?
> > If AUFS was much smaller, I would send files here without asking.
> > 
> 
> Yup, prepare a patch series and email them out.  cc linux-kernel too.

OK, I will send these patches to both MLs.
I'd ask linux-kernel people to read the previous posts too.
http://marc.info/?l=linux-fsdevel&m=120716468102834&w=2
http://marc.info/?l=linux-fsdevel&m=120720593114664&w=2

 Documentation/filesystems/aufs/README      |  374 +++++++
 Documentation/filesystems/aufs/aufs.5      | 1608 ++++++++++++++++++++++++++++
 Documentation/filesystems/aufs/aulchown.c  |   29 +
 Documentation/filesystems/aufs/auplink     |  170 +++
 Documentation/filesystems/aufs/mount.aufs  |  205 ++++
 Documentation/filesystems/aufs/umount.aufs |   33 +
 fs/Kconfig                                 |    2 +
 fs/Makefile                                |    1 +
 fs/aufs/Kconfig                            |  203 ++++
 fs/aufs/Makefile                           |   57 +
 fs/aufs/aufs.h                             |   56 +
 fs/aufs/br_fuse.c                          |  109 ++
 fs/aufs/br_nfs.c                           |  391 +++++++
 fs/aufs/br_xfs.c                           |   69 ++
 fs/aufs/branch.c                           |  933 ++++++++++++++++
 fs/aufs/branch.h                           |  345 ++++++
 fs/aufs/cpup.c                             | 1043 ++++++++++++++++++
 fs/aufs/cpup.h                             |   82 ++
 fs/aufs/dcsub.c                            |  246 +++++
 fs/aufs/dcsub.h                            |   54 +
 fs/aufs/debug.c                            |  485 +++++++++
 fs/aufs/debug.h                            |  210 ++++
 fs/aufs/dentry.c                           | 1020 ++++++++++++++++++
 fs/aufs/dentry.h                           |  384 +++++++
 fs/aufs/dinfo.c                            |  425 ++++++++
 fs/aufs/dir.c                              |  573 ++++++++++
 fs/aufs/dir.h                              |  146 +++
 fs/aufs/dlgt.c                             |  113 ++
 fs/aufs/export.c                           |  597 +++++++++++
 fs/aufs/f_op.c                             |  665 ++++++++++++
 fs/aufs/file.c                             |  822 ++++++++++++++
 fs/aufs/file.h                             |  246 +++++
 fs/aufs/finfo.c                            |  185 ++++
 fs/aufs/hin_or_dlgt.c                      |  708 ++++++++++++
 fs/aufs/hinode.h                           |  188 ++++
 fs/aufs/hinotify.c                         | 1114 +++++++++++++++++++
 fs/aufs/i_op.c                             |  844 +++++++++++++++
 fs/aufs/i_op_add.c                         |  828 ++++++++++++++
 fs/aufs/i_op_del.c                         |  582 ++++++++++
 fs/aufs/i_op_ren.c                         |  832 ++++++++++++++
 fs/aufs/iinfo.c                            |  290 +++++
 fs/aufs/inode.c                            |  425 ++++++++
 fs/aufs/inode.h                            |  336 ++++++
 fs/aufs/misc.c                             |  307 ++++++
 fs/aufs/misc.h                             |  201 ++++
 fs/aufs/module.c                           |  243 +++++
 fs/aufs/module.h                           |   78 ++
 fs/aufs/opts.c                             | 1493 ++++++++++++++++++++++++++
 fs/aufs/opts.h                             |  245 +++++
 fs/aufs/plink.c                            |  349 ++++++
 fs/aufs/robr.c                             |  111 ++
 fs/aufs/sbinfo.c                           |  268 +++++
 fs/aufs/super.c                            |  891 +++++++++++++++
 fs/aufs/super.h                            |  410 +++++++
 fs/aufs/sysaufs.c                          |  104 ++
 fs/aufs/sysaufs.h                          |  150 +++
 fs/aufs/sysfs.c                            |  459 ++++++++
 fs/aufs/sysrq.c                            |  112 ++
 fs/aufs/vdir.c                             |  963 +++++++++++++++++
 fs/aufs/vfsub.c                            |  653 +++++++++++
 fs/aufs/vfsub.h                            |  493 +++++++++
 fs/aufs/wbr_policy.c                       |  693 ++++++++++++
 fs/aufs/whout.c                            | 1058 ++++++++++++++++++
 fs/aufs/whout.h                            |  140 +++
 fs/aufs/wkq.c                              |  321 ++++++
 fs/aufs/wkq.h                              |  160 +++
 fs/aufs/xino.c                             | 1249 +++++++++++++++++++++
 fs/namei.c                                 |    2 +-
 include/linux/aufs_type.h                  |  111 ++
 include/linux/lockdep.h                    |    4 +
 include/linux/namei.h                      |    1 +
 71 files changed, 29296 insertions(+), 1 deletions(-)


Junjiro Okajima


* Re: [RFC 2/2] AUFS: merging/stacking several filesystems
  2008-05-16 14:20       ` hooanon05
@ 2008-05-16 14:36         ` hooanon05
  0 siblings, 0 replies; 9+ messages in thread
From: hooanon05 @ 2008-05-16 14:36 UTC (permalink / raw)
  To: Andrew Morton, linux-fsdevel, linux-kernel


> Andrew Morton:
> > Yup, prepare a patch series and email them out.  cc linux-kernel too.
> 
> Ok, I will send these patches to two MLs.

These patches are against linux-trees.git v2.6.25-mm1.


Junjiro Okajima



Thread overview: 9+ messages
2008-04-02  5:12 [RFC 1/2] AUFS: merging/stacking several filesystems hooanon05
2008-04-02 15:11 ` Tomas M
2008-04-03  6:56   ` hooanon05
2008-04-03  6:53 ` hooanon05
2008-04-03  6:58 ` [RFC 2/2] " hooanon05
2008-05-12  4:43   ` hooanon05
2008-05-12  4:54     ` Andrew Morton
2008-05-16 14:20       ` hooanon05
2008-05-16 14:36         ` hooanon05
