From: hooanon05@yahoo.co.jp
To: Erez Zadok <ezk@cs.sunysb.edu>
Cc: hch@infradead.org, viro@ftp.linux.org.uk,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [UNIONFS] 00/42 Unionfs and related patches review
Date: Sat, 15 Dec 2007 06:15:59 +0900 [thread overview]
Message-ID: <E1J3Ht2-0000mH-0V@jroun> (raw)
In-Reply-To: <200712131529.lBDFT5Ch025553@agora.fsl.cs.sunysb.edu>
Hello Professor Zadok,
Erez Zadok:
> I believe that small VFS changes to help stackable file systems are
> perfectly reasonable, and a good thing; and I'm working on such patches.
> Conversely, I am very mindful of the VFS's complexity, so I also believe
> that massive VFS changes are a bad thing; I want to avoid bloating the VFS
> or de-stabilizing it just for the sake of stacking or any single stackable
> f/s. I am also concerned about not changing existing "lower" file systems
> whatsoever, because they are well tested and stable.
I have no objection against your opinion about massive VFS changes or
existing "lower" filesystems.
> from). So in my opinion, the chances are very slim that a large amount of
> data changes will happen on a lower inode all within one second and not be
> detected by our mtime/cite cache-coherency algorithms.
I agree that time-based checking is available in many cases.
But there will exist some opeartions which are done in one
second, and it may not be available when a user changes the clock/time
of his system.
> Also, time-based cache coherency is a [sic] time-honored technique in NFS.
> Users have gotten used to the fact that if they change something on the
> server (i.e., the "layer" below the client), that those changes many not be
> immediately visible on the client (esp. with heavy caching on the client).
> So if it's been good enough for NFS for over two decades, I don't see a
> compelling reason to complicate unionfs for a slim chance of detecting
> changes that occur within one second.
Since NFS is a remote filesystem, I don't think it is a good idea to
compare the behaviour of if and a stackable filesystem, since all
lower(branch) filesystems are able to be local filesystems.
> Right now my code to detect when a lower object has changed is very simple
> and short: just one "if" statement to compare the corresponding inode
> mtimes. I'd like to keep things simple if at all possible. Fundamentally,
> all I need is ONE simple bit of information that will tell me that the upper
> and lower inodes are no longer in sync. Just one bit, not a whole complex
> data structure with callbacks and bit-maps and such.
Agreed, so the inotify handler should set a flag or atomic_inc/dec
the internal generation, or enqueue such job and handle it
later (shortly). Of course, when the dentry/inode of the stackable
filesystem corresspoding to the modfied file are not cached, the handler
has nothing to do.
Additionally, it is only directories to be set inotify for monitoring,
instead of all files. The inotify handler for a directory receives all
necessary (for a stackable fs) events for its children.
(but there are a few limitations or exceptions)
> What you propose violates the clean layer separation in a way that I'm not
> too comfortable with (even if it works for you :-) I believe stackable file
:::
> system, each struct file/dentry/inode has a corresponding lower object. Our
> FiST templates for Linux include an extra mode---called "fist lite"---which
> saves on inodes and pages by having BOTH the upper and lower dentry point to
> the lower inode. This saves memory (shared pages) and reduces layering
> overhead (but you can't intercept mmap ops, which some stackable f/s like to
> do). The cost of such trick is violating the clean layering separation: a
> dentry of the upper file system now points to an inode (via dentry->d_inode)
> of the lower file system! To me, this is dangerous in the long run because
> objects from one layer can be "leaked" to another layer, with potentially
> disastrous results.
Currently, I don't think sharing page is any kind of
violation. Additionally the dentry of the upper file system does NOT
point to the inode of the lower file system. Of course it can implement
->mmap operation.
> What you propose above with vm_operations requires a sequence of operations
> in the vm->fault operation which looks like:
>
> saved_file = vma->vm_file;
> vma->vm_file = hidden_file;
> call the lower ->fault op passing it the modified vma
> vma->vm_file = saved_file;
Basically, yes.
But there are several things to do such as locking.
> Even if both of these techniques work (and they do, at least in limited
> testing I've done), there is something very unpleasant about having to
> temporarily override a field's value, then fix it again, after coming back
> from calling the lower op. Aside from the uncleanliness of this kind of
> technique, it can seriously lead to races and other data corruptions when/if
> the temporarily-fixed fields "leak" outside the current code. (I have a
> strong feeling that several kernel developers might vomit in disgust if I
> dared to submit such hacky patches to unionfs... :-)
I guess probably you forgot some locking.
> To me, any time such a hack has to be employed, it tells me that there's
> something wrong with the API in question. And so I'd much rather see the
> API fixed The Right Way[tm], than promote such potentially unsafe practices.
If you changed some important members of internal structures without
locking, it would be unsafe and violate something.
Finally I think the approach of sharing pages, you may call it
zero-copy conversely your approach, is safe. At least, this approach is
working over a year while several people are using it.
Of course, I never say it is bug-free. There may exist a problem which
simply I don't know yet.
Sincerely,
Junjiro Okajima
prev parent reply other threads:[~2007-12-14 21:16 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-10 2:41 [UNIONFS] 00/42 Unionfs and related patches review Erez Zadok
2007-12-10 2:41 ` [PATCH 01/42] Unionfs: filesystems documentation index Erez Zadok
2007-12-10 14:47 ` Jan Engelhardt
2007-12-13 15:06 ` Erez Zadok
2007-12-13 21:25 ` Jan Engelhardt
2007-12-10 2:41 ` [PATCH 02/42] Unionfs: unionfs " Erez Zadok
2007-12-10 2:41 ` [PATCH 03/42] Unionfs: documentation for general concepts Erez Zadok
2007-12-10 2:41 ` [PATCH 04/42] Unionfs: usage documentation for users Erez Zadok
2007-12-10 2:41 ` [PATCH 05/42] Unionfs: documentation for any known issues Erez Zadok
2007-12-10 2:41 ` [PATCH 06/42] Unionfs: documentation about renaming operations Erez Zadok
2007-12-10 2:41 ` [PATCH 07/42] Unionfs maintainers Erez Zadok
2007-12-10 2:41 ` [PATCH 08/42] Makefile: hook to compile unionfs Erez Zadok
2007-12-10 2:41 ` [PATCH 09/42] Unionfs: main Makefile Erez Zadok
2007-12-10 2:41 ` [PATCH 10/42] Unionfs: fanout header definitions Erez Zadok
2007-12-10 2:41 ` [PATCH 11/42] Unionfs: main header file Erez Zadok
2007-12-10 2:41 ` [PATCH 12/42] Unionfs: common file copyup/revalidation operations Erez Zadok
2007-12-10 2:41 ` [PATCH 13/42] Unionfs: basic file operations Erez Zadok
2007-12-10 2:41 ` [PATCH 14/42] Unionfs: lower-level copyup routines Erez Zadok
2007-12-10 2:41 ` [PATCH 15/42] Unionfs: dentry revalidation Erez Zadok
2007-12-10 2:41 ` [PATCH 16/42] Unionfs: lower-level lookup routines Erez Zadok
2007-12-10 2:41 ` [PATCH 17/42] Unionfs: rename method and helpers Erez Zadok
2007-12-10 2:41 ` [PATCH 18/42] Unionfs: directory reading file operations Erez Zadok
2007-12-10 2:41 ` [PATCH 19/42] Unionfs: readdir helper functions Erez Zadok
2007-12-10 2:41 ` [PATCH 20/42] Unionfs: readdir state helpers Erez Zadok
2007-12-10 2:41 ` [PATCH 21/42] Unionfs: inode operations Erez Zadok
2007-12-10 2:41 ` [PATCH 22/42] Unionfs: unlink/rmdir operations Erez Zadok
2007-12-10 2:41 ` [PATCH 23/42] Unionfs: address-space operations Erez Zadok
2007-12-10 2:41 ` [PATCH 24/42] Unionfs: mount-time and stacking-interposition functions Erez Zadok
2007-12-10 2:41 ` [PATCH 25/42] Unionfs: super_block operations Erez Zadok
2007-12-10 2:41 ` [PATCH 26/42] Unionfs: extended attributes operations Erez Zadok
2007-12-10 2:42 ` [PATCH 27/42] Unionfs: async I/O queue headers Erez Zadok
2007-12-10 2:42 ` [PATCH 28/42] Unionfs: async I/O queue operations Erez Zadok
2007-12-10 2:42 ` [PATCH 29/42] Unionfs: miscellaneous helper routines Erez Zadok
2007-12-10 2:42 ` [PATCH 30/42] Unionfs: debugging infrastructure Erez Zadok
2007-12-10 2:42 ` [PATCH 31/42] VFS: fs_stack header cleanups Erez Zadok
2007-12-10 2:42 ` [PATCH 32/42] Unionfs file system magic number Erez Zadok
2007-12-10 2:42 ` [PATCH 33/42] MM: extern for drop_pagecache_sb Erez Zadok
2007-12-10 14:04 ` Adrian Bunk
2007-12-13 15:05 ` Erez Zadok
2007-12-10 2:42 ` [PATCH 34/42] VFS path get/put ops used by Unionfs Erez Zadok
2007-12-10 2:42 ` [PATCH 35/42] Unionfs: common header file for user-land utilities and kernel Erez Zadok
2007-12-10 2:42 ` [PATCH 36/42] VFS: export drop_pagecache_sb Erez Zadok
2007-12-12 5:38 ` Nick Piggin
2007-12-13 15:24 ` Erez Zadok
2007-12-13 22:47 ` Nick Piggin
2007-12-10 2:42 ` [PATCH 37/42] VFS: export release_open_intent symbol Erez Zadok
2007-12-10 2:42 ` [PATCH 38/42] VFS: simplified fsstack_copy_attr_all Erez Zadok
2007-12-10 2:42 ` [PATCH 39/42] Put Unionfs and eCryptfs under one layered filesystems menu Erez Zadok
2007-12-10 2:42 ` [PATCH 40/42] eCryptfs: use simplified fs_stack API for dentry operations Erez Zadok
2007-12-10 2:42 ` [PATCH 41/42] eCryptfs: use simplified fs_stack API for inode operations Erez Zadok
2007-12-10 2:42 ` [PATCH 42/42] eCryptfs: use simplified fs_stack API for main operations Erez Zadok
2007-12-10 3:20 ` [UNIONFS] 00/42 Unionfs and related patches review hooanon05
2007-12-13 15:29 ` Erez Zadok
2007-12-14 21:15 ` hooanon05 [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=E1J3Ht2-0000mH-0V@jroun \
--to=hooanon05@yahoo.co.jp \
--cc=akpm@linux-foundation.org \
--cc=ezk@cs.sunysb.edu \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@ftp.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).