From: Jamie Lokier <jamie@shareable.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: viro@parcelfarce.linux.theplanet.co.uk,
David Woodhouse <dwmw2@infradead.org>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Andrew Morton <akpm@osdl.org>, Carsten Otte <cotte@de.ibm.com>,
Carsten Otte <cotte@freenet.de>,
linux-fsdevel@vger.kernel.org, sct@redhat.com,
Dave Kleikamp <shaggy@austin.ibm.com>
Subject: Re: [PATCH] ext3 [linux-2.6.2.]: accessing already freed inodes when under memory pressure
Date: Fri, 2 Apr 2004 21:40:57 +0100
Message-ID: <20040402204057.GD653@mail.shareable.org>
In-Reply-To: <Pine.LNX.4.58.0404021127200.1122@ppc970.osdl.org>
Linus Torvalds wrote:
> Naah. You _want_ user space to see that they are the same file, and then
> the algorithm should be: "open+open+fstat+fstat+cmp st_dev/st_ino".
The trouble with that is the many programs which assume that if
st_nlink == 1, then the file has only one path. This is a critical
optimisation for any program which looks at a lot of files checking
for equivalent files, and is widely assumed. I've always thought it a
reliable basic unix assumption.
For example: rsync -H, cp -a, and Emacs backup-by-copying-when-linked.
You may argue that using "rsync -H" or "cp -a" on a tree that contains
bind mounts is broken by design.
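A minimal sketch of the two checks under discussion, written in Python for illustration. It is not code from rsync, cp or Emacs, and the helper names `same_file` and `hardlink_groups` are made up for this example. The first function follows the quoted "open+open+fstat+fstat+cmp st_dev/st_ino" recipe. The second applies the st_nlink == 1 shortcut, so it would skip a single-linked file that is also reachable through a bind mount.

```python
import os


def same_file(path_a, path_b):
    """open + open + fstat + fstat, then compare (st_dev, st_ino)."""
    fd_a = os.open(path_a, os.O_RDONLY)
    try:
        fd_b = os.open(path_b, os.O_RDONLY)
        try:
            st_a = os.fstat(fd_a)
            st_b = os.fstat(fd_b)
            return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
        finally:
            os.close(fd_b)
    finally:
        os.close(fd_a)


def hardlink_groups(paths):
    """Group paths that name the same inode.

    Files with st_nlink == 1 are skipped without being remembered.
    That is the optimisation described above: such a file is assumed
    to have exactly one path.
    """
    seen = {}
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink == 1:
            continue  # assumed unique, so never compared with anything
        seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    return [group for group in seen.values() if len(group) > 1]
```

The saving is that only multiply-linked files need an entry in the table, which matters when scanning a large tree.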
Then there are occasions when you want to traverse a tree, but not
cross mounts. Programs do that by checking whether the st_dev field
returned by stat() or fstat() on a directory is different from its parent.
For example: find -xdev, tar --one-file-system, cp -x.
You may argue that bind mounts shouldn't count for the purpose of
--one-file-system, but there should be some reasonable way for
programs to recognise the bind mount topology, to offer the behaviour
of --one-file-system-not-crossing-bind-mounts.
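The st_dev comparison described above can be sketched like this. It is an illustration in Python, not the implementation used by find, tar or cp, and the function name `walk_one_file_system` is made up for the example.

```python
import os


def walk_one_file_system(root):
    """Yield every path under root without descending into directories
    whose st_dev differs from that of root."""
    root_dev = os.lstat(root).st_dev
    stack = [root]
    yield root
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                yield entry.path
                if not entry.is_dir(follow_symlinks=False):
                    continue
                child_dev = entry.stat(follow_symlinks=False).st_dev
                if child_dev == root_dev:
                    stack.append(entry.path)
                # Otherwise this is a mount point of another device:
                # it is reported above but not entered.
```

A bind mount of a directory from the same filesystem reports the same st_dev as its parent, so this walk descends into it like any other directory.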
Regular files can be bind-mounted. So even if bind mounts did change
st_dev, programs which check st_dev when entering a directory wouldn't
recognise bind mounted files.
Then there are programs such as optimised Make and caching systems
and servers which use dnotify. dnotify is reliable for single-linked
files, and when there are multiple links, it's still reliable if you
discover all the st_nlink paths to a file. However, it isn't reliable
if there are any bind mounts, because you don't know whether you have
all paths to a file (without grubbing inside /proc/mounts, and that
has race conditions anyway).
The obvious strategy for such programs is to ignore bind mounts and
give incorrect results if there are any. However, it would be much
better if they could detect when a file has multiple paths that aren't
mentioned in st_nlink, so they wouldn't depend on dnotify in such cases.
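For reference, "grubbing inside /proc/mounts" could look roughly like the sketch below. It is a heuristic in Python, and the function name `multiply_mounted` is made up for the example. It assumes that a bind mount is listed with the same source and filesystem type as the original mount. Pseudo-filesystems that share a source name (for example several tmpfs instances) would show up as false positives, and mount points containing octal escapes are not decoded.

```python
from collections import defaultdict


def multiply_mounted(mounts_text):
    """Map each (source, fstype) listed more than once to its mount points.

    mounts_text is the content of /proc/mounts: one mount per line,
    whitespace-separated, with source, mount point and type first.
    """
    by_source = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        source, mount_point, fstype = fields[:3]
        by_source[(source, fstype)].append(mount_point)
    return {key: points for key, points in by_source.items()
            if len(points) > 1}
```

A caller would pass in the result of reading /proc/mounts. The answer describes the mount table only at the moment it was read, which is the race mentioned above.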
> But I agree that it might be good to also have a way to enquire about
> mount information. One logical place for that might be "fstatfs()".
> We've got a few spare bytes there, so it wouldn't be impossible to do.
Sounds like a good idea. I appreciate Al's point about st_dev having
an actual real meaning. I was thinking that with 64-bit dev_t, it
might be ok to reserve some bits of that to distinguish bind mounts,
so that a program can still get the underlying device id if it wants that.
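To make the idea concrete, here is a purely hypothetical sketch in Python of reserving the upper bits of a 64-bit device number for a bind-mount tag. Nothing here reflects an existing kernel interface: the 48/16 bit split and the names `tag_dev`, `underlying_dev` and `bind_tag` are assumptions chosen only for illustration.

```python
DEV_BITS = 48                      # hypothetical: low bits hold the real device id
TAG_BITS = 64 - DEV_BITS           # hypothetical: high bits hold the bind-mount tag
DEV_MASK = (1 << DEV_BITS) - 1


def tag_dev(dev, tag):
    """Combine an underlying device id with a bind-mount tag."""
    if not 0 <= dev <= DEV_MASK:
        raise ValueError("device id does not fit in the reserved low bits")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the reserved high bits")
    return (tag << DEV_BITS) | dev


def underlying_dev(tagged):
    """Recover the device id that the tagged value was built from."""
    return tagged & DEV_MASK


def bind_tag(tagged):
    """Recover the bind-mount tag; 0 would mean 'not a bind mount'."""
    return tagged >> DEV_BITS
```

With a layout like this, two views of the same filesystem would compare unequal on the full value, while a program that wants the real device could mask the tag off.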
There's a precedent for different views of a filesystem having
different st_dev values. Think about loopback NFS: possibly multiple
NFS filesystems, and a real one, all referring to the same set of
files, and _all_ of them have different st_dev values. Semantically a
bind mount does not seem so different from that.
-- Jamie