From: Ingo Molnar <mingo@elte.hu>
To: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Miles Lane <miles.lane@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
Arjan van de Ven <arjan@infradead.org>
Subject: Re: 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected!
Date: Mon, 12 Jun 2006 11:40:11 +0200 [thread overview]
Message-ID: <20060612094011.GA32640@elte.hu> (raw)
In-Reply-To: <1150102041.24273.15.camel@imp.csi.cam.ac.uk>
* Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> I will do that when I have some more spare time... I don't think it
> is particularly urgent given it has always been like this and if
> anyone is trying to kill their file system even fixing this will not
> stop them... They would just need to run the ntfsprogs provided
> utilities to achieve anything they like... [...]
i agree that it's not urgent, but i think we have a small disagreement
wrt. the philosophy behind that conclusion :-)
manipulating a device file (i.e. running ntfsprogs) is a known "hang
yourself" thing that people are and must be aware of - but messing
around within the VFS namespace of a filesystem itself ought to be safe,
even if the file can only be accessed via a special mount option and if
it's called $Bitmap. The mount option doesnt really say that it's
unsafe:
show_sys_files
If show_sys_files is specified, show the system files in direc-
tory listings. Otherwise the default behaviour is to hide the
system files. Note that even when show_sys_files is specified,
"$MFT" may will not be visible due to bugs/mis-features in
glibc. Further, note that irrespective of show_sys_files, all
files are accessible by name, i.e. you can always do "ls -l
'$UpCase'" for example to specifically show the system file con-
taining the Unicode upcase table.
and the dangers of direct namespace access are apparently anticipated in
the permissions of the $Mft itself:
[root@neptune mnt2]# ls -l \$Mft
---------- 1 root root 69632 Jan 1 1970 $Mft
and finally, lets call this a real bug in NTFS: we both worked hard to
cover it via the lock validator =B-)
in any case, the message will only trigger if someone abuses one of the
system files, so there is no urgency, and that will also serve as a
reminder.
below i've attached a cleaned up version of my NTFS annotations so far.
I think they are pretty straightforward and nonintrusive - for
!CONFIG_LOCKDEP they add zero changes to ntfs.ko. Most of the linecount
of the patch comes from comments - so i think we might even consider
this a "document locking rules" patch :-)
Ingo
---------------------------
Subject: lock validator: annotate NTFS locking rules
From: Ingo Molnar <mingo@elte.hu>
NTFS uses lots of type-opaque objects which acquire their true
identity runtime - so the lock validator needs to be helped in
a couple of places to figure out object types.
Many thanks to Anton Altaparmakov for giving lots of explanations
about NTFS locking rules.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
fs/ntfs/inode.c | 33 +++++++++++++++++++++++++++++++++
fs/ntfs/mft.c | 2 +-
fs/ntfs/super.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)
Index: linux/fs/ntfs/inode.c
===================================================================
--- linux.orig/fs/ntfs/inode.c
+++ linux/fs/ntfs/inode.c
@@ -367,6 +367,12 @@ static void ntfs_destroy_extent_inode(nt
kmem_cache_free(ntfs_inode_cache, ni);
}
+/*
+ * The attribute runlist lock has separate locking rules from the
+ * normal runlist lock, so split the two lock-types:
+ */
+static struct lockdep_type_key attr_list_rl_lock_type;
+
/**
* __ntfs_init_inode - initialize ntfs specific part of an inode
* @sb: super block of mounted volume
@@ -394,6 +400,8 @@ void __ntfs_init_inode(struct super_bloc
ni->attr_list_size = 0;
ni->attr_list = NULL;
ntfs_init_runlist(&ni->attr_list_rl);
+ lockdep_reinit_key(&ni->attr_list_rl.lock,
+ &attr_list_rl_lock_type);
ni->itype.index.bmp_ino = NULL;
ni->itype.index.block_size = 0;
ni->itype.index.vcn_size = 0;
@@ -405,6 +413,13 @@ void __ntfs_init_inode(struct super_bloc
ni->ext.base_ntfs_ino = NULL;
}
+/*
+ * Extent inodes get MFT-mapped in a nested way, while the base inode
+ * is still mapped. Teach this nesting to the lock validator by creating
+ * a separate type for nested inode's mrec_lock's:
+ */
+static struct lockdep_type_key extent_inode_mrec_lock_key;
+
inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
unsigned long mft_no)
{
@@ -413,6 +428,7 @@ inline ntfs_inode *ntfs_new_extent_inode
ntfs_debug("Entering.");
if (likely(ni != NULL)) {
__ntfs_init_inode(sb, ni);
+ lockdep_reinit_key(&ni->mrec_lock, &extent_inode_mrec_lock_key);
ni->mft_no = mft_no;
ni->type = AT_UNUSED;
ni->name = NULL;
@@ -1722,6 +1738,15 @@ err_out:
return err;
}
+/*
+ * The MFT inode has special locking, so teach the lock validator
+ * about this by splitting off the locking rules of the MFT from
+ * the locking rules of other inodes. The MFT inode can never be
+ * accessed from the VFS side (or even internally), only by the
+ * map_mft functions.
+ */
+static struct lockdep_type_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
+
/**
* ntfs_read_inode_mount - special read_inode for mount time use only
* @vi: inode to read
@@ -2148,6 +2173,14 @@ int ntfs_read_inode_mount(struct inode *
ntfs_attr_put_search_ctx(ctx);
ntfs_debug("Done.");
ntfs_free(m);
+
+ /*
+ * Split the locking rules of the MFT inode from the
+ * locking rules of other inodes:
+ */
+ lockdep_reinit_key(&ni->runlist.lock, &mft_ni_runlist_lock_key);
+ lockdep_reinit_key(&ni->mrec_lock, &mft_ni_mrec_lock_key);
+
return 0;
em_put_err_out:
Index: linux/fs/ntfs/mft.c
===================================================================
--- linux.orig/fs/ntfs/mft.c
+++ linux/fs/ntfs/mft.c
@@ -348,7 +348,7 @@ map_err_out:
base_ni->ext.extent_ntfs_inos = tmp;
}
base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
- mutex_unlock(&base_ni->extent_lock);
+ mutex_unlock_non_nested(&base_ni->extent_lock);
atomic_dec(&base_ni->count);
ntfs_debug("Done 2.");
*ntfs_ino = ni;
Index: linux/fs/ntfs/super.c
===================================================================
--- linux.orig/fs/ntfs/super.c
+++ linux/fs/ntfs/super.c
@@ -1724,6 +1724,14 @@ upcase_failed:
return FALSE;
}
+/*
+ * The lcn and mft bitmap inodes are NTFS-internal inodes with
+ * their own special locking rules:
+ */
+static struct lockdep_type_key
+ lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
+ mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
+
/**
* load_system_files - open the system files using normal functions
* @vol: ntfs super block describing device whose system files to load
@@ -1780,6 +1788,10 @@ static BOOL load_system_files(ntfs_volum
ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
goto iput_mirr_err_out;
}
+ lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
+ &mftbmp_runlist_lock_key);
+ lockdep_reinit_key(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
+ &mftbmp_mrec_lock_key);
/* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
if (!load_and_init_upcase(vol))
goto iput_mftbmp_err_out;
@@ -1802,6 +1814,11 @@ static BOOL load_system_files(ntfs_volum
iput(vol->lcnbmp_ino);
goto bitmap_failed;
}
+ lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
+ &lcnbmp_runlist_lock_key);
+ lockdep_reinit_key(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
+ &lcnbmp_mrec_lock_key);
+
NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
iput(vol->lcnbmp_ino);
@@ -2742,6 +2759,17 @@ static int ntfs_fill_super(struct super_
struct inode *tmp_ino;
int blocksize, result;
+ /*
+ * We do a pretty difficult piece of bootstrap by reading the
+ * MFT (and other metadata) from disk into memory. We'll only
+ * release this metadata during umount, so the locking patterns
+ * observed during bootstrap do not count. So turn off the
+ * observation of locking patterns (strictly for this context
+ * only) while mounting NTFS. [The validator is still active
+ * otherwise, even for this context: it will for example record
+ * lock type registrations.]
+ */
+ lockdep_off();
ntfs_debug("Entering.");
#ifndef NTFS_RW
sb->s_flags |= MS_RDONLY;
@@ -2753,6 +2781,7 @@ static int ntfs_fill_super(struct super_
if (!silent)
ntfs_error(sb, "Allocation of NTFS volume structure "
"failed. Aborting mount...");
+ lockdep_on();
return -ENOMEM;
}
/* Initialize ntfs_volume structure. */
@@ -2939,6 +2968,7 @@ static int ntfs_fill_super(struct super_
mutex_unlock(&ntfs_lock);
sb->s_export_op = &ntfs_export_ops;
lock_kernel();
+ lockdep_on();
return 0;
}
ntfs_error(sb, "Failed to allocate root directory.");
@@ -3058,6 +3088,7 @@ err_out_now:
sb->s_fs_info = NULL;
kfree(vol);
ntfs_debug("Failed, returning -EINVAL.");
+ lockdep_on();
return -EINVAL;
}
next prev parent reply other threads:[~2006-06-12 9:41 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-06-08 4:27 2.6.17-rc6-mm1 -- BUG: possible circular locking deadlock detected! Miles Lane
2006-06-08 7:32 ` Anton Altaparmakov
2006-06-08 9:55 ` Ingo Molnar
2006-06-08 10:53 ` Anton Altaparmakov
2006-06-08 11:23 ` Ingo Molnar
2006-06-09 8:09 ` Anton Altaparmakov
2006-06-10 7:59 ` Ingo Molnar
2006-06-10 8:48 ` Anton Altaparmakov
2006-06-11 5:31 ` Ingo Molnar
2006-06-11 7:10 ` Anton Altaparmakov
2006-06-12 8:31 ` Ingo Molnar
2006-06-12 8:47 ` Anton Altaparmakov
2006-06-12 9:40 ` Ingo Molnar [this message]
2006-06-12 10:24 ` Anton Altaparmakov
2006-06-12 10:56 ` Ingo Molnar
2006-06-13 5:41 ` Miles Lane
2006-06-13 21:18 ` Ingo Molnar
2006-06-08 16:11 ` Ingo Molnar
2006-06-08 21:17 ` Anton Altaparmakov
2006-06-09 7:19 ` Ingo Molnar
2006-06-09 8:31 ` Anton Altaparmakov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060612094011.GA32640@elte.hu \
--to=mingo@elte.hu \
--cc=aia21@cam.ac.uk \
--cc=akpm@osdl.org \
--cc=arjan@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miles.lane@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.