linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ram Pai <linuxram@us.ibm.com>
To: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org, linux-fsdevel@vger.kernel.org,
	William H Taber <wtaber@us.ibm.com>
Subject: Re: [autofs] [RFC PATCH]autofs4: hang and proposed fix
Date: Wed, 16 Nov 2005 17:52:42 -0800	[thread overview]
Message-ID: <1132192362.5720.163.camel@localhost> (raw)
In-Reply-To: <Pine.LNX.4.63.0511170627150.3049@donald.themaw.net>

On Wed, 2005-11-16 at 14:57, Ian Kent wrote:
> On Wed, 16 Nov 2005, Ram Pai wrote:
> 
> > On Wed, 2005-11-16 at 04:41, Ian Kent wrote:
> > > On Wed, 16 Nov 2005, Ram Pai wrote:
> > > 
> > > Thanks for you effort Ram.
> > > 
> > > > Autofs4 assumes that its ->revalidate() function gets called with the
> > > > parent_dentry's_inode_semaphore released. This is true mostly
> > > > but not in one particular case.
> > > 
> > > Yep. Certainly does.
> > > 
> > > Isn't my mistake not noticing that the inode semaphore is taken in 
> > > vfs_readdir?
> > > 
> > > It's been like that all along and I can't understand how I didn't notice 
> > > it before.
> > > 
> > > Help me out a bit here please Ram.
> > > 
> > > Aren't there other paths that enter revalidate without holding the 
> > > semaphore? chdir?
> > 
> > Looking at the code and it seemed to me that ->revalidate() function is
> > always with the semaphore held. Atleast VFS seem to have that
> > assumption.
> 
> My reading gives me the opposite impression.
> 
> The chdir example above calls the path walking routine which calls the 
> lookup method with the semaphore held and revalidate without it held. 
> Clearly, there are a number of other examples.
> 
> Certainly, I could be wrong and I'll be checking that.

I see your point. Looking more through the code it looks like
the convention about how ->revalidate() gets called, seems to be
inconsistent in VFS.

in do_lookup() which calls ->revalidate(), the semaphore is not-held.

Where as lookup_one_len() is expected to be called with the semaphore
held. This function calls lookup_hash() which calls cached_lookup()
which later calls ->revalidate(), and here ->revalidate() is called with
the semaphore held.  Is this the source of the bug?


> > 
> > And looking at the autofs4 code, I  get the impression, that it
> > assumes that the semaphore is released when it gets called. Which seems
> > to be inconsistent and wrong.
> 
> It does but I thought that was VFS design and I'm willing to be corrected 
> if I'm wrong.
> 
> Thinking about it and looking at the stack trace I'm having a bit of 
> trouble working out why there is a mount wait being triggered here at all. 
> 
> I think this is a getdents call so then there should have been an open 
> which would have done the mount wait, followed by the getdents call itself 
> and finally a close. I see this sequence all the time when I'm using debug 
> to log activity and it's also evident from the corresponding functions in 
> fs/libfs.c.

In this case there is no explicit open on a autofs4 directory. The
readdir is taking place on a directory belonging to the stubfs
filesystem. Internally stubfs filesystem is trying to open the
automounter's dentry through the 
lookup_one_len() call.  And this triggers the automouter into action.

> 
> What I can't work out is how getdents appears to be called without having 
> called open. Is there anything more that you can tell me about how you 
> have been able to demonstrate this error.
> 
Maybe you are missing the stubfs part. The stubfs is kind of
in-the-middle filesystem which sits between the application and
the autofs4. 

Will Taber: Am I saying this right?

Take a look at the test patch for stubfs posted at
http://www.sudhaa.com/~ram/readahead/stubfs.patch


For clarity here is the scenario:

P1 executes 'ls' on a directory belonging to stubfs. 
          stubfs's ->lookup() gets
	  called and it internally redirects that lookup to autofs4
          by calling lookup_one_len() on /net/ram 
          note: /net belongs to autofs4 and lookup_one_len() is
           called holding the inode-semaphore of /net .
           lookup_one_len() calls lookup_hash() which finds that there
	   is no cached dentry for 'ram', and hence allocates a dentry
           and calls ->lookup() of autofs4. 
           autofs4 adds the dentry to the dcache and calls its
	   ->revalidate() after releasing the semaphore.
           ->revalidate tries to wake up the automounter daemon, and
            goes to sleep on a waitq.

P2 executes 'ls' on another directory belonging to stubfs. 
         stubfs's ->lookup()
         gets called and it internally redirects that lookup to autofs4
         by calling lookup_one_len() on /net/ram. lookup_one_len() is
         called holding the inode-semaphore of /net.  
         lookup_one_len() calls lookup_hash() which calls
         cached_lookup(). cached_lookup() finds the dentry
         corresponding to 'ram' in the dcache. So it calls
         ->revalidate() on it. NOTE: this time autofs4's 
         ->revalidate() is called holding the semaphore.
         ->revalidate() goes to sleep on the same waitq 
         waiting on the automounter to wake him up.

automouter: the automounter now comes in and tries to hold 
           the semaphore on /net and deadlocks.

The question is: Who is the culprit?  stubfs?  VFS? or
             autofs4?

RP


> > 
> > > 
> > > Does uping an open semaphore allow other undesirable side affects?
> > > 
> > > Do you think it would perhaps be better to release the semaphore in 
> > > autofs4_readdir ... hang on the stack trace doesn't look like a readdir 
> > > ... I'll have to check 2.6.15-rc1 ... ?
> > > 
> > 
> > One automounter process is in sys_mkdir() system call and the other if I
> > recollect correctly was in sys_getdent64() system call.
> > 
> > Yes this problem can be demonstrated on all versions of the 2.6 kernel.
> > Infact I reproduced it on 2.6.15-rc1 kernel.
> > 
> > > Apart from the above, looking at the patch and assuming that the semaphore 
> > > is always held it would probaby be better to move semaphore open/close 
> > > into try_to_fill_dentry as any control process using autofs, such as automount, 
> > > must never cause a mount wait to be called (oz_mode = 1).
> > 
> > 
> > 
> > > 
> > > Ideas?
> > 
> > One thing is sure, VFS assumes that the semaphore is held when
> > ->revalidate() is called, and that convention is followed religiously by
> > all filesytems.  autofs4 has this special case of releasing the
> > semaphore if it is waiting to be woken up by the automounter daemon. So
> > as you said, may be the semaphore must be released just before sleeping
> > on the waitq?
> > 
> > RP
> > 
> > > 
> > > > 
> > > > Process P1  calls autofs4's ->lookup(). The lookup finds that the dentry
> > > > does not exist. It creates a dentry and adds to the cache. Releases
> > > > the parent's inode's semaphore and than calls ->revalidate().
> > > > 
> > > > Process P2 meanwhile comes in and cached_lookup() gets called. It finds
> > > > the dentry in the cache and finds ->revalidate() function exists. So
> > > > it calls ->revalidate() holding the parent's inode's semaphore.
> > > > 
> > > > Now the automounter daemon comes in and tries to hold the same semaphore
> > > > in order to mount. But since the semaphore is held by P2 it
> > > > goes to sleep.
> > > > 
> > > > Process P1 and P2 continue waiting for the mount to complete and it never
> > > > happens. Deadlock.
> > > > 
> > > > The stack of the deadlock is as follows:
> > > > 
> > > > ls            S 00000000     0 13049  11954                     (NOTLB)
> > > > f5221df0 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > > > 00000000 f5d44a70 c721b520 00000000 d4f33800 003d0990 c721b9d8 f5d44030
> > > > f5d44164 f5220000 f5221e3c f3dd6880 f5221e68 c0215207 f3b95580 80000000
> > > > Call Trace:
> > > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > > [<c02144e0>] autofs4_lookup+0x110/0x150
> > > > [<c016f3f5>] __lookup_hash+0x85/0xb0
> > > > [<c016f42a>] lookup_hash+0xa/0x10
> > > > [<c016f483>] lookup_one_len+0x53/0x70
> > > > [<f8851293>] stubfs_readdir+0x113/0x170 [stubfs]
> > > > [<c0172fcb>] vfs_readdir+0x8b/0xa0
> > > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > 
> > > > ls            S C011B1AF     0 13050  11898                     (NOTLB)
> > > > f1337df0 00000082 f1337e04 c011b1af 06ce3f60 00000027 00000027 00000080
> > > > 06d03f60 00000000 c721b520 00000000 d4f33800 003d0990 f1337df0 f5d44a70
> > > > f5d44ba4 f1336000 f1337e3c f3dd6880 f1337e68 c0215207 f3b95580 80000000
> > > > Call Trace:
> > > > [<c0215207>] autofs4_wait+0x307/0x3d0
> > > > [<c02141d3>] try_to_fill_dentry+0xf3/0x150
> > > > [<c0214389>] autofs4_revalidate+0x159/0x170
> > > > [<c016dc77>] cached_lookup+0x47/0x80
> > > > [<c016f3ca>] __lookup_hash+0x5a/0xb0
> > > > [<c016f42a>] lookup_hash+0xa/0x10
> > > > [<c016f483>] lookup_one_len+0x53/0x70  
> > > > [<f88512e3>] stubfs_readdir+0x163/0x170 [stubfs]
> > > > [<c0172fcb>] vfs_readdir+0x8b/0xa0  
> > > > [<c01733b3>] sys_getdents64+0x63/0xb5
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > 
> > > > automount     D 00000010     0 13052  13016                     (NOTLB)
> > > > f3321f00 fff80000 00000007 00000010 f3321f68 c7b1cd20 00000000 f3321f34
> > > > f3321ee8 f5e92a70 c7233520 00000000 d5304100 003d0990 c7233560 f1e31a70
> > > > f1e31ba4 f5f59914 f5f5991c 00000296 f3321f38 c03b4cd3 f1e31a70 00000001
> > > > Call Trace:
> > > > [<c03b4cd3>] __down+0x83/0xe0
> > > > [<c03b3632>] __down_failed+0xa/0x10
> > > > [<c0171e6d>] .text.lock.namei+0xeb/0x1de
> > > > [<c0170482>] sys_mkdir+0x52/0xd0
> > > > [<c010464d>] syscall_call+0x7/0xb
> > > > BUG: soft lockup detected on CPU#0!
> > > > 
> > > > 
> > > > I have coded up a tentative fix. The patch releases the semaphore in
> > > > ->revalidate() function, instead of the caller of that function.  Not
> > > > sure if this is the right fix. Tested it and verified that the deadlock
> > > > is fixed.  But I am not sure if it opens up other bugs. Please validate.
> > > > 
> > > > 
> > > >  fs/autofs4/root.c |   26 +++++++++++++++-----------
> > > >  1 files changed, 15 insertions(+), 11 deletions(-)
> > > > 
> > > > Index: 2.6.15-rc1/fs/autofs4/root.c
> > > > ===================================================================
> > > > --- 2.6.15-rc1.orig/fs/autofs4/root.c
> > > > +++ 2.6.15-rc1/fs/autofs4/root.c
> > > > @@ -386,40 +386,47 @@ static int autofs4_revalidate(struct den
> > > >  	struct autofs_sb_info *sbi = autofs4_sbi(dir->i_sb);
> > > >  	int oz_mode = autofs4_oz_mode(sbi);
> > > >  	int flags = nd ? nd->flags : 0;
> > > >  	int status = 1;
> > > >  
> > > > +	up(&dir->i_sem);
> > > >  	/* Pending dentry */
> > > >  	if (autofs4_ispending(dentry)) {
> > > >  		if (!oz_mode)
> > > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > > -		return status;
> > > > +			status = try_to_fill_dentry(dentry, dir->i_sb,
> > > > +					sbi, flags);
> > > > +		goto out;
> > > >  	}
> > > >  
> > > >  	/* Negative dentry.. invalidate if "old" */
> > > > -	if (dentry->d_inode == NULL)
> > > > -		return (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > > +	if (dentry->d_inode == NULL) {
> > > > +		status = (dentry->d_time - jiffies <= AUTOFS_NEGATIVE_TIMEOUT);
> > > > +		goto out;
> > > > +	}
> > > >  
> > > >  	/* Check for a non-mountpoint directory with no contents */
> > > >  	spin_lock(&dcache_lock);
> > > >  	if (S_ISDIR(dentry->d_inode->i_mode) &&
> > > >  	    !d_mountpoint(dentry) && 
> > > >  	    list_empty(&dentry->d_subdirs)) {
> > > >  		DPRINTK("dentry=%p %.*s, emptydir",
> > > >  			 dentry, dentry->d_name.len, dentry->d_name.name);
> > > >  		spin_unlock(&dcache_lock);
> > > >  		if (!oz_mode)
> > > > -			status = try_to_fill_dentry(dentry, dir->i_sb, sbi, flags);
> > > > -		return status;
> > > > +			status = try_to_fill_dentry(dentry, dir->i_sb, sbi,
> > > > +					flags);
> > > > +		goto out;
> > > >  	}
> > > >  	spin_unlock(&dcache_lock);
> > > >  
> > > >  	/* Update the usage list */
> > > >  	if (!oz_mode)
> > > >  		autofs4_update_usage(dentry);
> > > >  
> > > > -	return 1;
> > > > +out:
> > > > +	down(&dir->i_sem);
> > > > +	return status;
> > > >  }
> > > >  
> > > >  static void autofs4_dentry_release(struct dentry *de)
> > > >  {
> > > >  	struct autofs_info *inf;
> > > > @@ -485,15 +492,12 @@ static struct dentry *autofs4_lookup(str
> > > >  		spin_unlock(&dentry->d_lock);
> > > >  	}
> > > >  	dentry->d_fsdata = NULL;
> > > >  	d_add(dentry, NULL);
> > > >  
> > > > -	if (dentry->d_op && dentry->d_op->d_revalidate) {
> > > > -		up(&dir->i_sem);
> > > > +	if (dentry->d_op && dentry->d_op->d_revalidate)
> > > >  		(dentry->d_op->d_revalidate)(dentry, nd);
> > > > -		down(&dir->i_sem);
> > > > -	}
> > > >  
> > > >  	/*
> > > >  	 * If we are still pending, check if we had to handle
> > > >  	 * a signal. If so we can force a restart..
> > > >  	 */
> > > > 
> > > > _______________________________________________
> > > > autofs mailing list
> > > > autofs@linux.kernel.org
> > > > http://linux.kernel.org/mailman/listinfo/autofs
> > > > 
> > > 
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 


  reply	other threads:[~2005-11-17  1:52 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-16 10:17 [RFC PATCH]autofs4: hang and proposed fix Ram Pai
2005-11-16 12:41 ` [autofs] " Ian Kent
2005-11-16 16:50   ` Ram Pai
2005-11-16 22:57     ` Ian Kent
2005-11-17  1:52       ` Ram Pai [this message]
2005-11-17 18:50         ` [autofs] " Ian Kent
2005-11-17 19:19           ` William H. Taber
2005-11-17 20:39             ` Ram Pai
2005-11-17 22:31               ` William H. Taber
2005-11-18 14:57                 ` Ian Kent
2005-11-18 14:54               ` Ian Kent
2005-11-18 14:44             ` Ian Kent
2005-11-18 15:20               ` William H. Taber
2005-11-18 16:30                 ` Ian Kent
2005-11-18 17:12                   ` William H. Taber
2005-11-18 18:57                     ` Ram Pai
2005-11-18 20:08                       ` William H. Taber
2005-11-19  2:52                         ` Ian Kent
2005-11-21 16:40                           ` William H. Taber
2005-11-22 13:13                             ` Ian Kent
2005-11-22 17:48                               ` [autofs] " William H. Taber
2005-11-23 14:11                                 ` Ian Kent
2005-11-23 16:42                                   ` William H. Taber
2005-11-23 17:52                                     ` Ian Kent
2005-11-23 18:47                                       ` William H. Taber
2005-11-19  1:40                     ` Ian Kent
2005-11-16 15:22 ` Jeff Moyer
2005-11-16 17:00   ` [autofs] " Ram Pai
2005-11-16 18:25     ` Jeff Moyer
2005-11-16 19:24       ` William H. Taber
2005-11-16 19:51         ` Ram Pai
2005-11-27 10:47 ` Ian Kent
2005-11-28 17:19   ` William H. Taber
2005-11-28 23:12     ` Badari Pulavarty
2005-11-29 14:19       ` Ian Kent
2005-11-29 16:34         ` William H. Taber
2005-11-30 14:02           ` Ian Kent
2005-11-30 16:49             ` Badari Pulavarty
2005-11-30 17:04               ` Trond Myklebust
2005-11-30 21:10                 ` William H. Taber
2005-11-29 14:20     ` Ian Kent
2005-11-30  1:16 ` [autofs] " Jeff Moyer
2005-11-30  1:56   ` Trond Myklebust
2005-11-30  4:15     ` Jeff Moyer
2005-11-30  6:14       ` Trond Myklebust
2005-11-30 15:44         ` Ian Kent
2005-11-30 15:53           ` [autofs] " Trond Myklebust
2005-11-30 16:12             ` Ian Kent
2005-11-30 16:27               ` Ian Kent
2005-11-30 16:45               ` [autofs] " Trond Myklebust
2005-11-30 20:32     ` William H. Taber
2005-11-30 20:53       ` Trond Myklebust
2005-11-30 21:30         ` William H. Taber
2005-11-30 22:32           ` Trond Myklebust
2005-12-01 16:27             ` William H. Taber
2005-12-01 12:09           ` Ian Kent
2005-12-01 16:30             ` William H. Taber
2005-12-02 13:49               ` Ian Kent
2005-12-02 14:07                 ` Jeff Moyer
2005-12-02 15:21                   ` Ian Kent
2005-12-02 16:35                     ` [autofs] " Will Taber
2005-12-02 17:11                       ` Ian Kent
2005-12-02 15:34                 ` Will Taber
2005-12-02 17:29                   ` Ian Kent
2005-12-02 18:12                     ` Trond Myklebust
2005-12-04 12:56                       ` Christoph Hellwig
2005-12-04 12:57                         ` Christoph Hellwig
2005-12-04 14:58                           ` Ian Kent
2005-12-04 17:17                             ` [autofs] " Christoph Hellwig
2005-12-05 14:02                               ` Ian Kent
2005-12-06 21:20                               ` Jeff Moyer
2005-12-06 21:40                                 ` Christoph Hellwig
2005-12-06 22:37                                   ` Jeff Moyer
2005-12-07 14:52                                   ` Will Taber
2005-12-07 15:18                                     ` Christoph Hellwig
2005-12-07 15:22                                   ` Brian Long
2005-12-07 15:25                                     ` Christoph Hellwig
2005-12-07 17:46                                     ` Will Taber
2005-12-08 14:16                                       ` Ian Kent
2005-12-09 12:12                                       ` Christoph Hellwig
2005-12-09 13:33                                         ` John T. Kohl
2005-12-13 18:39                                           ` Christoph Hellwig
2005-12-04 14:56                         ` Ian Kent
2005-12-02 19:04                     ` [autofs] " Will Taber
2005-12-04  9:39                       ` Ian Kent
2005-12-02 16:04                 ` [autofs] " Jeff Moyer
2005-12-02 17:36                   ` Ian Kent
2005-12-02 18:33                     ` [autofs] " Will Taber
2005-12-04  9:52                       ` Ian Kent
2005-12-04 14:54                         ` Ian Kent
2005-12-05 15:40                           ` Ian Kent
2005-11-30 14:48   ` [autofs] " Ian Kent

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1132192362.5720.163.camel@localhost \
    --to=linuxram@us.ibm.com \
    --cc=autofs@linux.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=raven@themaw.net \
    --cc=wtaber@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).