From: Anton Altaparmakov <aia21@cam.ac.uk>
To: Christoph Hellwig <hch@lst.de>
Cc: ntfs-dev <linux-ntfs-dev@lists.sourceforge.net>,
fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: fishy ->put_inode usage in ntfs
Date: Mon, 14 Feb 2005 20:44:55 +0000 (GMT) [thread overview]
Message-ID: <Pine.LNX.4.60.0502142019001.4269@hermes-1.csi.cam.ac.uk> (raw)
In-Reply-To: <20050213163512.GA3686@lst.de>
On Sun, 13 Feb 2005, Christoph Hellwig wrote:
> On Thu, Feb 10, 2005 at 02:59:32PM +0000, Anton Altaparmakov wrote:
> > On Thu, 2005-02-10 at 15:50 +0100, Christoph Hellwig wrote:
> > > On Thu, Feb 10, 2005 at 02:48:26PM +0000, Anton Altaparmakov wrote:
> > > > If the igrab() were not done, it would be possible for clear_inode to be
> > > > called on the 'parent' inode whilst at the same time one or more attr
> > > > inodes (belonging to this 'parent') are in use and Bad Things(TM) would
> > > > happen...
> > >
> > > What bad thing specificly? If there's shared information we should
> > > probably refcount them separately.
> >
> > Each attr inode stores a pointer to its parent inode in NTFS_I(struct
> > inode *vi)->ext.base_ntfs_ino. This pointer would point to random
> > memory if clear_inode is called on the parent whilst the attr inode is
> > still in use.
>
> Ok. Al Viro suggested to put a spinlock around the bmp_ino pointer and
> then igrab the master inode when accessing it, iput it when done.
No lock is needed. It wasn't needed in the past so why should it be
needed now?
> Below is a rather half-backed patch to implement the suggestion. It
> compiles, but I'm pretty sure I introduced various bugs.
This patch may compile but would never work. At all. A few reasons I can
think of off the top of my head whilst reading the patch are inserted in
it below.
> Also the new spinlock isn't nice, it'd probably be better to reuse some
> other lock for this single field.
As I said I don't understand what you want a lock for.
> --- 1.85/fs/ntfs/dir.c 2004-11-05 13:48:35 +01:00
> +++ edited/fs/ntfs/dir.c 2005-02-13 17:34:51 +01:00
> @@ -1102,7 +1102,7 @@ static int ntfs_readdir(struct file *fil
> {
> s64 ia_pos, ia_start, prev_ia_pos, bmp_pos;
> loff_t fpos;
> - struct inode *bmp_vi, *vdir = filp->f_dentry->d_inode;
> + struct inode *bmp_vi = NULL, *vdir = filp->f_dentry->d_inode;
> struct super_block *sb = vdir->i_sb;
> ntfs_inode *ndir = NTFS_I(vdir);
> ntfs_volume *vol = NTFS_SB(sb);
> @@ -1250,17 +1250,14 @@ skip_index_root:
> /* Get the offset into the index allocation attribute. */
> ia_pos = (s64)fpos - vol->mft_record_size;
> ia_mapping = vdir->i_mapping;
> - bmp_vi = ndir->itype.index.bmp_ino;
> +
> + bmp_vi = ntfs_grab_bmp_inode(vdir);
> if (unlikely(!bmp_vi)) {
> - ntfs_debug("Inode 0x%lx, regetting index bitmap.", vdir->i_ino);
> - bmp_vi = ntfs_attr_iget(vdir, AT_BITMAP, I30, 4);
> - if (IS_ERR(bmp_vi)) {
> - ntfs_error(sb, "Failed to get bitmap attribute.");
> - err = PTR_ERR(bmp_vi);
> - goto err_out;
> - }
> - ndir->itype.index.bmp_ino = bmp_vi;
> + ntfs_error(sb, "Failed to get bitmap attribute.");
> + err = -EINVAL;
> + goto err_out;
So every time we get a concurrent clear_inode() and iget() for the same
inode what happens? We get your "Failed to get bitmap attribute." every
time? Or can clear_inode only be called once the inode is removed from
icache? The code you are removing is getting the inode back in the case
that it has disappeared (which I was able to cause really easily by
applying even moderate read-access stress on a mounted volume)...
> }
> +
> bmp_mapping = bmp_vi->i_mapping;
> /* Get the starting bitmap bit position and sanity check it. */
> bmp_pos = ia_pos >> ndir->itype.index.block_size_bits;
> @@ -1445,6 +1442,8 @@ EOD:
> abort:
> kfree(name);
> done:
> + if (bmp_vi)
> + ntfs_ungrab_bmp_inode(vdir);
> #ifdef DEBUG
> if (!rc)
> ntfs_debug("EOD, fpos 0x%llx, returning 0.", fpos);
> @@ -1455,6 +1454,8 @@ done:
> filp->f_pos = fpos;
> return 0;
> err_out:
> + if (bmp_vi)
> + ntfs_ungrab_bmp_inode(vdir);
> if (bmp_page)
> ntfs_unmap_page(bmp_page);
> if (ia_page) {
> @@ -1537,8 +1538,16 @@ static int ntfs_dir_fsync(struct file *f
>
> ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
> BUG_ON(!S_ISDIR(vi->i_mode));
> - if (NInoIndexAllocPresent(ni) && ni->itype.index.bmp_ino)
> - write_inode_now(ni->itype.index.bmp_ino, !datasync);
> +
> + if (NInoIndexAllocPresent(ni)) {
> + struct inode *bmp_inode = ntfs_grab_bmp_inode(vi);
> +
> + if (bmp_inode) {
> + write_inode_now(bmp_inode, !datasync);
> + ntfs_ungrab_bmp_inode(vi);
> + }
> + }
> +
> ret = ntfs_write_inode(vi, 1);
> write_inode_now(vi, !datasync);
> err = sync_blockdev(vi->i_sb->s_bdev);
> --- 1.154/fs/ntfs/inode.c 2004-11-09 11:42:05 +01:00
> +++ edited/fs/ntfs/inode.c 2005-02-13 17:40:55 +01:00
> @@ -388,6 +388,7 @@ void __ntfs_init_inode(struct super_bloc
> ni->attr_list = NULL;
> ntfs_init_runlist(&ni->attr_list_rl);
> ni->itype.index.bmp_ino = NULL;
> + spin_lock_init(&ni->itype.index.bmp_ino_lock);
> ni->itype.index.block_size = 0;
> ni->itype.index.vcn_size = 0;
> ni->itype.index.collation_rule = 0;
> @@ -414,6 +415,29 @@ inline ntfs_inode *ntfs_new_extent_inode
> return ni;
> }
>
> +/*
> + * Grab primary inode when we're using the attr inode because they
> + * share state.
> + */
> +struct inode *ntfs_grab_bmp_inode(struct inode *inode)
> +{
> + struct inode *bmp_inode;
> + ntfs_inode *ni = NTFS_I(inode);
> +
> + spin_lock(&ni->itype.index.bmp_ino_lock);
> + bmp_inode = ni->itype.index.bmp_ino;
> + if (bmp_inode && !igrab(inode))
> + bmp_inode = NULL;
> + spin_unlock(&ni->itype.index.bmp_ino_lock);
> +
> + return inode;
> +}
What about when bmp_inode is NULL? Surely this function should be
regetting it? (See above comment.)
> +
> +void ntfs_ungrab_bmp_inode(struct inode *inode)
> +{
> + iput(inode);
> +}
> +
> /**
> * ntfs_is_extended_system_file - check if a file is in the $Extend directory
> * @ctx: initialized attribute search context
> @@ -1365,7 +1389,6 @@ static int ntfs_read_locked_attr_inode(s
> * Make sure the base inode doesn't go away and attach it to the
> * attribute inode.
> */
> - igrab(base_vi);
> ni->ext.base_ntfs_ino = base_ni;
> ni->nr_extents = -1;
>
> @@ -1651,7 +1674,6 @@ skip_large_index_stuff:
> * Make sure the base inode doesn't go away and attach it to the
> * index inode.
> */
> - igrab(base_vi);
> ni->ext.base_ntfs_ino = base_ni;
> ni->nr_extents = -1;
>
> @@ -2102,37 +2124,6 @@ err_out:
> return -1;
> }
>
> -/**
> - * ntfs_put_inode - handler for when the inode reference count is decremented
> - * @vi: vfs inode
> - *
> - * The VFS calls ntfs_put_inode() every time the inode reference count (i_count)
> - * is about to be decremented (but before the decrement itself.
> - *
> - * If the inode @vi is a directory with two references, one of which is being
> - * dropped, we need to put the attribute inode for the directory index bitmap,
> - * if it is present, otherwise the directory inode would remain pinned for
> - * ever.
> - */
> -void ntfs_put_inode(struct inode *vi)
> -{
> - if (S_ISDIR(vi->i_mode) && atomic_read(&vi->i_count) == 2) {
> - ntfs_inode *ni = NTFS_I(vi);
> - if (NInoIndexAllocPresent(ni)) {
> - struct inode *bvi = NULL;
> - down(&vi->i_sem);
> - if (atomic_read(&vi->i_count) == 2) {
> - bvi = ni->itype.index.bmp_ino;
> - if (bvi)
> - ni->itype.index.bmp_ino = NULL;
> - }
> - up(&vi->i_sem);
> - if (bvi)
> - iput(bvi);
This iput() is essential. Your patch doesn't seem to place it anywhere.
This means that all the bmp_ino inodes in the mounted fs will never have
their reference dropped so when you try to umount you will get "VFS: Busy
inodes. Self destruct in 5 seconds."...
> - }
> - }
> -}
> -
> static void __ntfs_clear_inode(ntfs_inode *ni)
> {
> /* Free all alocated memory. */
> @@ -2198,18 +2189,6 @@ void ntfs_clear_big_inode(struct inode *
> {
> ntfs_inode *ni = NTFS_I(vi);
>
> - /*
> - * If the inode @vi is an index inode we need to put the attribute
> - * inode for the index bitmap, if it is present, otherwise the index
> - * inode would disappear and the attribute inode for the index bitmap
> - * would no longer be referenced from anywhere and thus it would remain
> - * pinned for ever.
> - */
> - if (NInoAttr(ni) && (ni->type == AT_INDEX_ALLOCATION) &&
> - NInoIndexAllocPresent(ni) && ni->itype.index.bmp_ino) {
> - iput(ni->itype.index.bmp_ino);
> - ni->itype.index.bmp_ino = NULL;
> - }
You cannot remove this code or even move it to below the write commit and
__ntfs_clear_inode call or it will break.
> #ifdef NTFS_RW
> if (NInoDirty(ni)) {
> BOOL was_bad = (is_bad_inode(vi));
> @@ -2236,15 +2215,22 @@ void ntfs_clear_big_inode(struct inode *
>
> __ntfs_clear_inode(ni);
>
> + /* Release the base inode if we are holding it. */
> if (NInoAttr(ni)) {
> - /* Release the base inode if we are holding it. */
> - if (ni->nr_extents == -1) {
> - iput(VFS_I(ni->ext.base_ntfs_ino));
> + ntfs_inode *base_inode = ni->ext.base_ntfs_ino;
You are not testing nr_extents and thus base_inode could be NULL or worse
could be a different meaning altogether (ni->ext is a union the meaning of
which is determined by nr_extents which must be checked before accessing
ni->ext).
> +
> + if (S_ISDIR(VFS_I(base_inode)->i_mode)) {
> + spin_lock(&base_inode->itype.index.bmp_ino_lock);
> + base_inode->itype.index.bmp_ino = NULL;
> + iput(VFS_I(base_inode));
Why are we putting the base inode here? I thought your patch removes the
getting of a reference to it...
> + spin_unlock(&base_inode->itype.index.bmp_ino_lock);
> + ni->ext.base_ntfs_ino = NULL;
> + } else if (ni->nr_extents == -1) {
> ni->nr_extents = 0;
> ni->ext.base_ntfs_ino = NULL;
> + iput(VFS_I(base_inode));
Ditto.
And when the directory is being cleared? Where is the iput() for the
bmp_ino? It used to be in put_inode but you get rid of it in your patch.
Why btw? It seems really silly to remove it considering it is there and
it works and it is not hurting anybody...
> }
> }
> - return;
> }
>
> /**
> --- 1.45/fs/ntfs/inode.h 2004-10-25 11:46:30 +02:00
> +++ edited/fs/ntfs/inode.h 2005-02-13 17:32:58 +01:00
> @@ -101,6 +101,7 @@ struct _ntfs_inode {
> struct { /* It is a directory, $MFT, or an index inode. */
> struct inode *bmp_ino; /* Attribute inode for the
> index $BITMAP. */
> + spinlock_t bmp_ino_lock;
> u32 block_size; /* Size of an index block. */
> u32 vcn_size; /* Size of a vcn in this
> index. */
> @@ -275,6 +276,9 @@ extern struct inode *ntfs_attr_iget(stru
> extern struct inode *ntfs_index_iget(struct inode *base_vi, ntfschar *name,
> u32 name_len);
>
> +extern struct inode *ntfs_grab_bmp_inode(struct inode *inode);
> +extern void ntfs_ungrab_bmp_inode(struct inode *inode);
> +
> extern struct inode *ntfs_alloc_big_inode(struct super_block *sb);
> extern void ntfs_destroy_big_inode(struct inode *inode);
> extern void ntfs_clear_big_inode(struct inode *vi);
> @@ -295,8 +299,6 @@ extern ntfs_inode *ntfs_new_extent_inode
> extern void ntfs_clear_extent_inode(ntfs_inode *ni);
>
> extern int ntfs_read_inode_mount(struct inode *vi);
> -
> -extern void ntfs_put_inode(struct inode *vi);
>
> extern int ntfs_show_options(struct seq_file *sf, struct vfsmount *mnt);
>
> --- 1.184/fs/ntfs/super.c 2005-01-05 03:48:14 +01:00
> +++ edited/fs/ntfs/super.c 2005-02-13 17:15:47 +01:00
> @@ -2186,9 +2186,6 @@ static int ntfs_statfs(struct super_bloc
> static struct super_operations ntfs_sops = {
> .alloc_inode = ntfs_alloc_big_inode, /* VFS: Allocate new inode. */
> .destroy_inode = ntfs_destroy_big_inode, /* VFS: Deallocate inode. */
> - .put_inode = ntfs_put_inode, /* VFS: Called just before
> - the inode reference count
> - is decreased. */
> #ifdef NTFS_RW
> //.dirty_inode = NULL, /* VFS: Called from
> // __mark_inode_dirty(). */
>
Best regards,
Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
next prev parent reply other threads:[~2005-02-14 20:45 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-10-14 11:26 fishy ->put_inode usage in ntfs Christoph Hellwig
2004-10-14 12:39 ` Anton Altaparmakov
2004-10-14 12:44 ` Matthew Wilcox
2004-10-14 13:27 ` Anton Altaparmakov
2004-10-14 14:59 ` Anton Altaparmakov
2004-10-14 12:59 ` Christoph Hellwig
2004-10-14 13:26 ` Anton Altaparmakov
2005-02-10 10:47 ` Christoph Hellwig
2005-02-10 14:40 ` Anton Altaparmakov
2005-02-10 14:42 ` Christoph Hellwig
2005-02-10 14:48 ` Anton Altaparmakov
2005-02-10 14:50 ` Anton Altaparmakov
2005-02-10 14:51 ` Christoph Hellwig
2005-02-10 14:50 ` Christoph Hellwig
2005-02-10 14:59 ` Anton Altaparmakov
2005-02-13 16:35 ` Christoph Hellwig
2005-02-14 20:44 ` Anton Altaparmakov [this message]
2005-03-01 23:17 ` David Woodhouse
2005-03-02 8:43 ` Anton Altaparmakov
2005-03-02 8:53 ` David Woodhouse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.60.0502142019001.4269@hermes-1.csi.cam.ac.uk \
--to=aia21@cam.ac.uk \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-ntfs-dev@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).