From: Arnd Bergmann <arnd@arndb.de>
To: hooanon05@yahoo.co.jp
Cc: Jamie Lokier <jamie@shareable.org>,
Phillip Lougher <phillip@lougher.demon.co.uk>,
David Newall <davidn@davidnewall.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
hch@lst.de
Subject: Re: [RFC 0/7] [RFC] cramfs: fake write support
Date: Mon, 2 Jun 2008 13:15:40 +0200
Message-ID: <200806021315.41211.arnd@arndb.de>
In-Reply-To: <9159.1212402992@jrobl>
On Monday 02 June 2008, hooanon05@yahoo.co.jp wrote:
> > * data inconsistency problems when simultaneously accessing the underlying
> > fs and the union.
>
> Aufs has three levels of detecting direct access to the lower
> (branch) filesystems (i.e. bypassing aufs). I guess the strictest level
> is a good answer to your question. It is based on the inotify
> feature. Aufs sets an inotify watch on every accessed directory on the
> lower fs. While those inodes are cached, aufs receives the inotify
> events for their children/files and marks the aufs data for the file as
> obsolete. When the file is accessed later, aufs retrieves the latest
> inode (or dentry) again.
> The inotify watch is removed when the aufs dir inode is discarded
> from cache.

This is a very complicated approach, and I'm not sure it even addresses
the case where you have a shared mmap on both files. With VFS-based union
mounts, they share one inode, so you don't need inotify in the first
place, and shared mmaps work automatically.
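
The shared-mmap concern can be illustrated with a toy model. This is only a sketch under stated assumptions: `Inode`, `StackedInode` and `on_lower_event` are hypothetical names, the byte array stands in for cached pages, and nothing here reflects how aufs or any real union mount actually handles mmap.

```python
class Inode:
    """Hypothetical in-memory inode with its own cached pages."""
    def __init__(self, data):
        self.pages = bytearray(data)


class StackedInode(Inode):
    """Union-layer inode that keeps a private copy of the lower pages."""
    def __init__(self, lower):
        super().__init__(bytes(lower.pages))
        self.lower = lower
        self.obsolete = False

    def on_lower_event(self):
        # Stand-in for receiving a change notification for the lower file.
        self.obsolete = True

    def read(self):
        # Re-fetch from the lower inode only if we were told it changed.
        if self.obsolete:
            self.pages = bytearray(self.lower.pages)
            self.obsolete = False
        return bytes(self.pages)


lower = Inode(b"old")
stacked = StackedInode(lower)   # stacking model: second inode, second copy
shared = lower                  # VFS-style model: the very same inode object

lower.pages[0:3] = b"new"       # a write that bypasses the union

stale_view = stacked.read()     # private copy, no notification yet
shared_view = bytes(shared.pages)
stacked.on_lower_event()
fresh_view = stacked.read()     # copy refreshed after the notification
```

In this model the stacked copy is only as fresh as its last notification, while the shared inode has nothing to synchronize.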
> > * duplication of dentry and inode data structures in the union wastes
> > memory and cpu cycles.
>
> Aufs has its own dentry and inode object as normal fs has. And they have
> pointers to the corresponding ones on the lower fs. If you make a union
> from two real filesystems, then aufs inode will have (at most) two
> pointers as its private data.
> Do you mean having pointers is a duplication?

I mean having your own dentry and inode object is duplication. The
underlying file system already has them, so if you have your own,
you need to keep them synchronized. I guess that in order to do
a lookup on a file, you need the steps of

1. lookup in aufs dentry cache -> fail
2. lookup in underlying dentry cache -> fail
3. try to read dentry from disk -> fail
4. repeat 2-3 until found, or arrive at lowest level
5. create an inode in memory for the lower file system
6. create dentry in memory on lower file system, pointing
   to that
7. create an aufs specific inode pointing to the underlying
   inode
8. create an aufs specific dentry object to point to that
9. create a struct inode representing the aufs inode
10. create another VFS dentry to point to that

when you really should just return the dentry found by the
lower file system.
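
The object count implied by the guessed steps above can be sketched as a small model. It is an illustration of the argument, not of aufs internals: `Branch`, `stacked_lookup` and `vfs_union_lookup` are hypothetical names, and the extra allocations simply mirror steps 7-10 as listed.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Branch:
    """Hypothetical stand-in for one lower file system."""
    files: set
    dcache: dict = field(default_factory=dict)

    def lookup(self, name, allocs):
        if name in self.dcache:          # dentry cache hit
            return self.dcache[name]
        if name not in self.files:       # "read from disk" fails
            return None
        allocs["inode"] += 1             # step 5: lower inode
        allocs["dentry"] += 1            # step 6: lower dentry
        self.dcache[name] = ("dentry:" + name, "inode:" + name)
        return self.dcache[name]


def stacked_lookup(branches, name, allocs):
    """Stacking model: wrap the lower result in the union's own objects."""
    for branch in branches:              # steps 2-4: walk down the branches
        found = branch.lookup(name, allocs)
        if found is not None:
            allocs["inode"] += 2         # steps 7 and 9
            allocs["dentry"] += 2        # steps 8 and 10
            return ("union-dentry:" + name, found)
    return None


def vfs_union_lookup(branches, name, allocs):
    """VFS-union model: hand back the lower dentry unchanged."""
    for branch in branches:
        found = branch.lookup(name, allocs)
        if found is not None:
            return found
    return None


def make_branches():
    return [Branch(files={"top-only"}), Branch(files={"passwd"})]


stacked_allocs = Counter()
stacked_lookup(make_branches(), "passwd", stacked_allocs)

vfs_allocs = Counter()
vfs_union_lookup(make_branches(), "passwd", vfs_allocs)
```

Under these assumptions the stacked lookup allocates three inodes and three dentries for one file, where the VFS-style lookup allocates one of each.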
> > * whiteouts are in the same namespace as regular files, so conflicts are
> > possible.
>
> Yes, that's right.
> Aufs reserves ".wh." as a whiteout prefix, and prohibits users from
> handling such filenames inside aufs. It might be a problem as you wrote,
> but users can create/remove them directly on the lower fs and I have
> never received a request about this reserved prefix.

It's not so much a practical limitation as an exploitable feature.
E.g. an unprivileged user may use this to get an application into
an error condition by asking for an invalid file name.
POSIX reserves a well-defined set of invalid file names, and
deviating from it means that you are not compliant, in a
potentially unexpected way.
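
A minimal sketch of the namespace conflict, assuming whiteouts are ordinary directory entries that carry the ".wh." prefix from the quoted text. `merged_listing` and `create_in_union` are hypothetical helpers, and the exception type is only a placeholder, since the thread does not say which error a real implementation returns.

```python
WH_PREFIX = ".wh."  # reserved whiteout prefix mentioned in the thread


def merged_listing(upper, lower):
    """Names visible through the union of an upper and a lower directory."""
    hidden = {n[len(WH_PREFIX):] for n in upper if n.startswith(WH_PREFIX)}
    visible = {n for n in upper if not n.startswith(WH_PREFIX)}
    visible |= {n for n in lower
                if n not in hidden and not n.startswith(WH_PREFIX)}
    return sorted(visible)


def create_in_union(upper, name):
    """Create a file through the union, refusing the reserved prefix."""
    if name.startswith(WH_PREFIX):
        # Assumption: placeholder error, not a documented return code.
        raise PermissionError(f"{name!r} uses the reserved prefix")
    upper.add(name)


upper = {"a", ".wh.b"}      # ".wh.b" whites out "b" from the lower branch
lower = {"b", "c"}
listing = merged_listing(upper, lower)

try:
    create_in_union(upper, ".wh.notes")   # an otherwise ordinary file name
    rejected = False
except PermissionError:
    rejected = True
```

The name ".wh.notes" is an ordinary file name on most file systems, yet the model has to refuse it, which is the kind of surprise an application cannot anticipate.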
> > * mounting a large number of aufs on top of each other eventually
> > overflows the kernel stack, e.g. in readdir.
>
> The aufs readdir operation consumes memory, but not stack. If it were
> implemented as a recursive function, it might cause a stack
> overflow. But actually it is a loop.
> The memory is used for storing entry names and eliminating whited-out
> ones, and the result will be cached for a specified time. So memory
> (other than stack) will be consumed.
How does aufs know that one of its branches is an aufs itself?
If you detect this, do you fold it into a single aufs instance with
more branches?
In case you don't do it, I don't see how you get around the stack
overflow, but if you do it, you have again added a whole lot of
complexity for something that should be trivial when done right.
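
The distinction between a loop inside one layer and calls across layers can be sketched as follows. This is a toy model, not aufs code: `Dir`, `Union` and `flatten` are hypothetical, whiteouts are ignored, and the returned depth merely counts nested `readdir` calls as a proxy for stack use.

```python
class Dir:
    """Plain directory on a real file system."""
    def __init__(self, names):
        self.names = list(names)

    def readdir(self, depth=1):
        return set(self.names), depth


class Union:
    """Union layer whose branches may themselves be unions."""
    def __init__(self, branches):
        self.branches = list(branches)

    def readdir(self, depth=1):
        names, deepest = set(), depth
        for branch in self.branches:             # a loop within this layer...
            sub, d = branch.readdir(depth + 1)   # ...but a call into the next
            names |= sub
            deepest = max(deepest, d)
        return names, deepest


def flatten(union):
    """Fold nested unions into one union with more branches."""
    flat = []
    for branch in union.branches:
        if isinstance(branch, Union):
            flat.extend(flatten(branch).branches)
        else:
            flat.append(branch)
    return Union(flat)


# Mount five unions on top of each other.
stack = Dir(["a"])
for i in range(5):
    stack = Union([Dir([f"f{i}"]), stack])

nested_names, nested_depth = stack.readdir()
flat_names, flat_depth = flatten(stack).readdir()
```

In the model, call depth grows with every nested mount even though each `readdir` is itself a loop, and it stays constant only once the nesting is detected and folded.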
> > * allowing multiple writable branches (instead of just stacking
> > one rw copy on a number of ro file systems) is confusing to the user
> > and complicates the implementation a lot.
>
> Probably you are right. Initially aufs had only one policy to select the
> writable branch. But several users requested other policies such as
> round-robin or most-free-space, and aufs has implemented them.
> I don't think users will be confused by these policies. While I tried to
> keep it simple, I guess some people will say it is complex.
I personally think that a policy other than writing to the top is crazy
enough, but randomly writing to multiple places is much worse, as it
becomes unpredictable what the file system does, not just unexpected.
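
The three policies named above can be compared in a short sketch. The branch names, free-space figures and function names are made up for illustration and say nothing about how aufs selects a branch internally.

```python
import itertools


def pick_top(branches):
    """Always write to the topmost branch."""
    return branches[0]["name"]


def pick_most_free(branches):
    """Write to whichever branch currently reports the most free space."""
    return max(branches, key=lambda b: b["free"])["name"]


def make_round_robin(branches):
    """Return a picker that cycles through the branches in order."""
    names = itertools.cycle([b["name"] for b in branches])
    return lambda: next(names)


# Hypothetical writable branches, topmost first.
branches = [{"name": "rw0", "free": 10}, {"name": "rw1", "free": 50}]
next_rr = make_round_robin(branches)

placements = {
    "top": [pick_top(branches) for _ in range(3)],
    "round_robin": [next_rr() for _ in range(3)],
    "most_free": [pick_most_free(branches) for _ in range(3)],
}
```

Only the first policy lets a user say where a new file lands without knowing the creation history or the current free space of every branch.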
Arnd <><