From: Dave Chinner <david@fromorbit.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Christoph Hellwig <hch@infradead.org>, xfs@oss.sgi.com
Subject: Re: beginners project: RENAME_WHITEOUT
Date: Tue, 11 Nov 2014 00:52:49 +1100
Message-ID: <20141110135249.GR28565@dastard>
In-Reply-To: <CAJfpegsvZtCxV-GSJ4ON7=JeNkZ7o2=+fmbOTrxDAfm==b1XBw@mail.gmail.com>
On Mon, Nov 10, 2014 at 10:25:40AM +0100, Miklos Szeredi wrote:
> On Sun, Nov 9, 2014 at 12:42 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Nov 07, 2014 at 11:09:59AM -0800, Christoph Hellwig wrote:
> >> The overlayfs merge introduces a new rename flag to create whiteouts.
> >> Should be fairly easy to implement.
> >>
> >> Miklos, do you have any good documentation and/or test cases for this?
> >
> > So overlayfs uses some weird char dev hack to implement whiteout
> > inodes in directories? Why do we need a whiteout inode on disk?
> > what information is actually stored in the whiteout inode that
> > overlayfs actually needs? Only readdir and lookup care about
> > whiteouts, and AFAICT nothing of the inode is ever used except
> > checking the chrdev/whiteoutdev hack via ovl_is_whiteout(dentry).
> >
> > Indeed, whatever happened to just storing the whiteout in the dirent
> > via DT_WHT and using that information on lookup in the lower
> > filesystem to mark the dentry returned appropriately without needing
> > to lookup a real inode?
>
> The filesystem is free to implement whiteouts as a dirent without an actual inode.

Sure, but overlayfs won't make use of it, so we'd have to hack
around overlayfs's ignorance of DT_WHT in several different places
to do this. e.g.

	- in mknod to intercept creation of magical whiteout chardevs
	- in readdir so we can convert them to DT_CHR so overlayfs
	  can detect them,
	- in ->lookup so we can create magical chardev inodes in
	  memory rather than try to read them from disk.
	- in rename we have to detect the magical chardev inodes so
	  we know it's a whiteout we are dealing with

This is difficult because overlayfs hard codes the definition of a
whiteout into the VFS interface implementation as well as its
internal directory implementation. This leaves almost no room for
anyone to optimise the back end implementation because the
translation layers are complex and fiddly....
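
As a rough illustration of the mknod and readdir hacks, here is a
userspace model in C of the two type mappings involved. This is not
kernel code: the function names are made up for this sketch, and it
assumes the overlayfs convention that a whiteout is a char device
with device number 0/0.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for DT_WHT and IFTODT in <dirent.h> */
#endif
#include <dirent.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: choose the on-disk dirent type for a mknod()
 * request. A char device with device number 0/0 is intercepted and
 * recorded as a whiteout dirent; everything else keeps its normal type.
 */
static unsigned char ondisk_type_for_mknod(mode_t mode, dev_t dev)
{
	if (S_ISCHR(mode) && major(dev) == 0 && minor(dev) == 0)
		return DT_WHT;
	return IFTODT(mode);
}

/*
 * Hypothetical helper: the type readdir has to report so that
 * overlayfs, which only knows the chardev form, detects the whiteout.
 */
static unsigned char readdir_type_for_overlayfs(unsigned char ondisk)
{
	return ondisk == DT_WHT ? DT_CHR : ondisk;
}
```

The lookup and rename cases would need the same kind of translation
in the other direction, which is where the complexity piles up.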
> But we do need at least an inode in the VFS, since the whiteout needs
> to be presented to userspace when not part of the overlay.

Sure, but that's a different problem.

> The DT_WHT
> makes the typical mistake of trying to make the implementation nice,
> while not caring about user interfaces.

You're implying the d_type field in a dirent is something that it is
not. d_type has only one purpose in life - to allow userspace
applications to avoid a stat() call to find out the type of the
object the dirent points to.
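
A minimal sketch of that usage pattern in C, assuming glibc-style
DT_* and DTTOIF definitions. entry_type() and count_subdirs() are
illustrative names for this example, not part of any API:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for DT_* and DTTOIF in <dirent.h> */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Return the S_IFMT file type bits for a directory entry. fstatat()
 * is called only when the filesystem did not fill in d_type.
 * Returns 0 if the fallback stat fails.
 */
static mode_t entry_type(int dfd, const struct dirent *de)
{
	struct stat st;

	if (de->d_type != DT_UNKNOWN)
		return DTTOIF(de->d_type);	/* no stat() needed */
	if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
		return 0;
	return st.st_mode & S_IFMT;
}

/* Count the subdirectories of path; returns -1 if it cannot be opened. */
static int count_subdirs(const char *path)
{
	DIR *d = opendir(path);
	struct dirent *de;
	int n = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (S_ISDIR(entry_type(dirfd(d), de)))
			n++;
	}
	closedir(d);
	return n;
}
```

On a filesystem that fills in d_type, count_subdirs() walks the
whole directory without a single stat() call.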
> This is usually a big mistake, user interfaces are much more important
> than implementation details, and an already existing file type on
> which all the usual operations work (stat, unlink) is much better in
> this respect than a completely new object which is unknown and
> unmanageable for the vast majority of applications.

Sure, but again that's not the issue I'm commenting on. The dirent
type has no effect on stat, unlink, etc that are done on the dirent
after it is returned to userspace.

So why is overloading DT_CHR to mean 'either a char device or a
whiteout entry' a sane user interface design decision? d_type *was*
a simple, obvious, effective and efficient user interface that
allowed users to avoid extra syscalls. It's been used this way by
userspace for, what, 15 years?

With the overlayfs "magic" we now have the situation where d_type is
not sufficient to avoid a stat() call to determine the type the
dirent points to. IOWs, we've just fucked up a perfectly good
interface that is widely used because somebody thought that using
the DT_WHT value in the d_type field for whiteout dirents is a "bad
interface".
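
To show the extra work this pushes onto userspace, here is a hedged
sketch in C of what an application has to do to tell a whiteout from
a real char device. The function names are invented for this
example, and it assumes the overlayfs convention that a whiteout is
a char device with device number 0/0:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for DT_* in <dirent.h> */
#endif
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Whiteout as overlayfs defines it: char device, device number 0/0. */
static int stat_is_whiteout(const struct stat *st)
{
	return S_ISCHR(st->st_mode) &&
	       major(st->st_rdev) == 0 && minor(st->st_rdev) == 0;
}

/*
 * Returns 1 if the entry is a whiteout, 0 if not, -1 on stat failure.
 * d_type can only rule entries out: DT_CHR (or DT_UNKNOWN) still
 * costs a stat call to learn what the entry really is.
 */
static int dirent_is_whiteout(int dfd, const struct dirent *de)
{
	struct stat st;

	if (de->d_type != DT_CHR && de->d_type != DT_UNKNOWN)
		return 0;
	if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
		return -1;
	return stat_is_whiteout(&st);
}
```

Had the dirent carried DT_WHT, the d_type check alone would have
answered the question.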
> The special chardev was Linus' idea, but I agree with him completely
> on this point. Introducing DT_WHT on the userspace API would be much
> more of a hack than reusing existing objects and operations.

Magical char dev for access, unlink, etc: no problems there.
DT_CHR for the whiteout dirent type: completely fucked up.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com