From: Al Viro <viro@ZenIV.linux.org.uk>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Martin Steigerwald <martin@lichtvoll.de>,
Matthew Wilcox <willy@infradead.org>,
dsterba@suse.cz, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-m68k@lists.linux-m68k.org,
Debian m68k <debian-68k@lists.debian.org>
Subject: Re: moving affs + RDB partition support to staging?
Date: Sun, 6 May 2018 08:40:00 +0100 [thread overview]
Message-ID: <20180506073955.GJ30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20180506005946.GI30522@ZenIV.linux.org.uk>
On Sun, May 06, 2018 at 01:59:51AM +0100, Al Viro wrote:
> > There is nothing at the moment that needs fixing.
>
> Funny, that... I'd been going through the damn thing for the
> last week or so; open-by-fhandle/nfs export support is completely
> buggered. And as for the rest... the least said about the error
> handling, the better - something like rename() hitting an IO
> error (read one) can not only screw the on-disk data into the
> ground, it can do seriously bad things to kernel data structures.
... and while we are at it, consider the following:
mkdir a
mkdir b
touch a/x
ln a/x b
<umount and mount again, to get cold dcache, or just wait for memory
pressure to evict those suckers>
process A: unlink("a/x");
process B: open("b/x");
We had two entries - one in a, another in b; the one in a is ST_FILE one,
the one in b - ST_LINKFILE, with ->original refering to the ST_FILE one.
unlink() needs to kick the entry out of a; since it can't leave an
ST_FILE not included into any directory (and since the file should live on
due to b/x still being there) it ends up removing ST_LINKFILE entry from
b and moving ST_FILE one in its place. That happens in affs_remove_link().
However, open() gets there just as this surgery is about to begin.
It gets to affs_lookup(), which finds the entry for b/x, reads it,
stashes block number of that entry into dentry->d_fsdata, notices
that it's ST_LINKFILE one, picks the block number of ST_FILE one
out of it and proceeds to look up the inode; that doesn't end up
doing any work (the same inode is in icache due to a/x having been
looked up by unlink(2)).
In the meanwhile, since the hash table of b has been unlocked once
we'd done hash lookup, affs_remove_link() can proceed with the
surgery. It *does* try to catch dentries with ->d_fsdata containing
the block number of sacrificed ST_LINKFILE and reset that to block
number of ST_FILE that gets moved in its place. However, it does
so by
static void
affs_fix_dcache(struct inode *inode, u32 entry_ino)
{
struct dentry *dentry;
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
if (entry_ino == (u32)(long)dentry->d_fsdata) {
dentry->d_fsdata = (void *)inode->i_ino;
break;
}
}
spin_unlock(&inode->i_lock);
}
i.e. looping through dentries that point to our inode. Except that
*our* dentry isn't attached to inode yet, so we are SOL - it's
left with ->d_fsdata pointing to (destroyed) ST_LINKFILE.
After a while process B closes the file and unlinks it. Take a look
at affs_remove_header() and you'll see how much fun we are in for -
it uses ->d_fsdata to pick the entry to find and remove...
That's an fs corruptor, plain and simple. As far as I had been able
to reconstruct the history, leftover after Roman's locking changes
17 years ago. AFAICS, I'd missed it back then - the races I'd spotted
had been dealt with about a year later (when we started to lock the
victim in vfs_unlink/vfs_rmdir/vfs_rename), but this one went unnoticed...
WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@ZenIV.linux.org.uk>
To: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Martin Steigerwald <martin@lichtvoll.de>,
Matthew Wilcox <willy@infradead.org>,
dsterba@suse.cz, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
linux-m68k@vger.kernel.org,
Debian m68k <debian-68k@lists.debian.org>
Subject: Re: moving affs + RDB partition support to staging?
Date: Sun, 6 May 2018 08:40:00 +0100 [thread overview]
Message-ID: <20180506073955.GJ30522@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20180506005946.GI30522@ZenIV.linux.org.uk>
On Sun, May 06, 2018 at 01:59:51AM +0100, Al Viro wrote:
> > There is nothing at the moment that needs fixing.
>
> Funny, that... I'd been going through the damn thing for the
> last week or so; open-by-fhandle/nfs export support is completely
> buggered. And as for the rest... the least said about the error
> handling, the better - something like rename() hitting an IO
> error (read one) can not only screw the on-disk data into the
> ground, it can do seriously bad things to kernel data structures.
... and while we are at it, consider the following:
mkdir a
mkdir b
touch a/x
ln a/x b
<umount and mount again, to get cold dcache, or just wait for memory
pressure to evict those suckers>
process A: unlink("a/x");
process B: open("b/x");
We had two entries - one in a, another in b; the one in a is ST_FILE one,
the one in b - ST_LINKFILE, with ->original refering to the ST_FILE one.
unlink() needs to kick the entry out of a; since it can't leave an
ST_FILE not included into any directory (and since the file should live on
due to b/x still being there) it ends up removing ST_LINKFILE entry from
b and moving ST_FILE one in its place. That happens in affs_remove_link().
However, open() gets there just as this surgery is about to begin.
It gets to affs_lookup(), which finds the entry for b/x, reads it,
stashes block number of that entry into dentry->d_fsdata, notices
that it's ST_LINKFILE one, picks the block number of ST_FILE one
out of it and proceeds to look up the inode; that doesn't end up
doing any work (the same inode is in icache due to a/x having been
looked up by unlink(2)).
In the meanwhile, since the hash table of b has been unlocked once
we'd done hash lookup, affs_remove_link() can proceed with the
surgery. It *does* try to catch dentries with ->d_fsdata containing
the block number of sacrificed ST_LINKFILE and reset that to block
number of ST_FILE that gets moved in its place. However, it does
so by
static void
affs_fix_dcache(struct inode *inode, u32 entry_ino)
{
struct dentry *dentry;
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
if (entry_ino == (u32)(long)dentry->d_fsdata) {
dentry->d_fsdata = (void *)inode->i_ino;
break;
}
}
spin_unlock(&inode->i_lock);
}
i.e. looping through dentries that point to our inode. Except that
*our* dentry isn't attached to inode yet, so we are SOL - it's
left with ->d_fsdata pointing to (destroyed) ST_LINKFILE.
After a while process B closes the file and unlinks it. Take a look
at affs_remove_header() and you'll see how much fun we are in for -
it uses ->d_fsdata to pick the entry to find and remove...
That's an fs corruptor, plain and simple. As far as I had been able
to reconstruct the history, leftover after Roman's locking changes
17 years ago. AFAICS, I'd missed it back then - the races I'd spotted
had been dealt with about a year later (when we started to lock the
victim in vfs_unlink/vfs_rmdir/vfs_rename), but this one went unnoticed...
next prev parent reply other threads:[~2018-05-06 7:41 UTC|newest]
Thread overview: 91+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-25 15:46 Moving unmaintained filesystems to staging Matthew Wilcox
2018-04-25 15:47 ` Christoph Hellwig
2018-04-25 20:30 ` David Sterba
2018-04-26 2:57 ` Matthew Wilcox
2018-04-26 10:28 ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Martin Steigerwald
2018-04-26 10:28 ` Martin Steigerwald
2018-04-26 10:45 ` moving affs + RDB partition support to staging? John Paul Adrian Glaubitz
2018-04-26 10:45 ` John Paul Adrian Glaubitz
2018-04-26 10:59 ` David Sterba
2018-04-26 11:06 ` John Paul Adrian Glaubitz
2018-05-06 0:59 ` Al Viro
2018-05-06 7:40 ` Al Viro [this message]
2018-05-06 7:40 ` Al Viro
2018-05-06 20:46 ` Al Viro
2018-05-06 20:46 ` Al Viro
2018-05-06 20:49 ` John Paul Adrian Glaubitz
2018-05-06 21:32 ` Al Viro
2018-05-06 21:32 ` Al Viro
2018-05-07 2:15 ` Al Viro
2018-05-07 2:15 ` Al Viro
2018-05-07 2:40 ` Michael Schmitz
2018-05-07 2:40 ` Michael Schmitz
2018-05-07 7:08 ` Martin Steigerwald
2018-05-07 7:08 ` Martin Steigerwald
2018-05-07 20:50 ` Michael Schmitz
2018-05-07 20:50 ` Michael Schmitz
2018-05-07 20:56 ` Ingo Jürgensmann
2018-05-07 20:58 ` John Paul Adrian Glaubitz
2018-05-06 8:40 ` John Paul Adrian Glaubitz
2018-05-06 8:40 ` John Paul Adrian Glaubitz
2018-05-06 10:12 ` Martin Steigerwald
2018-04-26 11:00 ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Christoph Hellwig
2018-04-26 11:00 ` Christoph Hellwig
2018-04-26 11:08 ` Geert Uytterhoeven
2018-04-26 11:08 ` Geert Uytterhoeven
2018-04-26 23:56 ` Finn Thain
2018-04-26 23:56 ` Finn Thain
2018-04-27 1:43 ` moving affs + RDB partition support to staging? jdow
2018-04-27 1:43 ` jdow
2018-04-27 1:26 ` jdow
2018-04-27 1:26 ` jdow
2018-05-06 8:52 ` John Paul Adrian Glaubitz
2018-05-06 10:10 ` Martin Steigerwald
2018-05-06 10:10 ` Martin Steigerwald
2018-05-07 4:54 ` jdow
2018-04-27 2:11 ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Michael Schmitz
2018-04-27 2:11 ` Michael Schmitz
2018-06-24 9:06 ` Martin Steigerwald
2018-06-24 11:33 ` moving affs + RDB partition support to staging? jdow
2018-06-24 11:40 ` jdow
2018-06-26 2:23 ` Michael Schmitz
2018-06-26 5:17 ` jdow
2018-06-26 8:12 ` Martin Steigerwald
2018-06-26 9:46 ` jdow
2018-06-26 8:31 ` Michael Schmitz
2018-06-26 9:45 ` jdow
2018-06-27 1:07 ` Michael Schmitz
2018-06-27 6:24 ` jdow
2018-06-27 8:03 ` Martin Steigerwald
2018-06-28 2:57 ` jdow
2018-06-28 7:40 ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-27 9:00 ` moving affs + RDB partition support to staging? Michael Schmitz
2018-06-28 3:44 ` jdow
2018-06-28 5:43 ` Michael Schmitz
2018-06-28 6:39 ` jdow
2018-06-28 8:16 ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-28 10:00 ` Amiga RDB partition support for disks >= 2 TB jdow
2018-06-28 11:30 ` Martin Steigerwald
2018-06-28 11:38 ` Martin Steigerwald
2018-06-28 12:31 ` jdow
2018-06-28 8:07 ` Amiga RDB partition support for disks >= 2 TB (was: Re: moving affs + RDB partition support to staging?) Martin Steigerwald
2018-06-27 7:57 ` moving affs + RDB partition support to staging? Martin Steigerwald
2018-06-28 2:56 ` jdow
2018-06-26 8:02 ` Martin Steigerwald
2018-06-26 8:40 ` Michael Schmitz
2018-06-26 9:31 ` jdow
2018-06-25 7:53 ` moving affs + RDB partition support to staging? (was: Re: Moving unmaintained filesystems to staging) Michael Schmitz
2018-06-25 8:26 ` Martin Steigerwald
2018-06-25 8:40 ` Geert Uytterhoeven
2018-04-27 8:01 ` Martin Steigerwald
2018-04-27 8:01 ` Martin Steigerwald
2018-04-26 4:58 ` Moving unmaintained filesystems to staging Nikolay Borisov
2018-04-26 5:30 ` Willy Tarreau
2018-04-26 6:11 ` Pavel Machek
2018-04-26 10:36 ` Martin Steigerwald
2018-05-03 9:18 ` Pavel Machek
2018-04-27 1:10 ` Luis R. Rodriguez
2018-04-29 12:07 ` Greg KH
2018-04-29 20:07 ` Ondrej Zary
2018-04-29 23:37 ` Greg KH
2018-05-01 10:14 ` Pavel Machek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180506073955.GJ30522@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=axboe@kernel.dk \
--cc=debian-68k@lists.debian.org \
--cc=dsterba@suse.cz \
--cc=glaubitz@physik.fu-berlin.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=martin@lichtvoll.de \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.