From: Ian Kent <raven@themaw.net>
To: Sage Weil <sage@newdream.net>
Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org,
akpm@linux-foundation.org, Al Viro <viro@zeniv.linux.org.uk>,
Andreas Dilger <adilger@sun.com>,
Yehuda Sadeh <yehuda@newdream.net>
Subject: Re: [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held
Date: Thu, 26 Mar 2009 11:09:31 +0900 [thread overview]
Message-ID: <49CAE3DB.3030909@themaw.net> (raw)
In-Reply-To: <Pine.LNX.4.64.0903251148500.19272@cobra.newdream.net>
Sage Weil wrote:
> On Thu, 26 Mar 2009, Ian Kent wrote:
>> Ian Kent wrote:
>>> Sage Weil wrote:
>>>> On Tue, 24 Mar 2009, Ian Kent wrote:
>>>>> Ian Kent wrote:
>>>>>> Sage Weil wrote:
>>>>>>> real_lookup() is called by do_lookup() if dentry revalidation fails. If
>>>>>>> the cache is re-populated while waiting for i_mutex, it may find that
>>>>>>> a d_lookup() subsequently succeeds (see the "Uhhuh! Nasty case" comment).
>>>>>>>
>>>>>>> Previously, real_lookup() would drop i_mutex and do_revalidate() again. If
>>>>>>> revalidate failed _again_, however, it would give up with -ENOENT. The
>>>>>>> problem here that network file systems may be invalidating dentries via
>>>>>>> server callbacks, e.g. due to concurrent access from another client, and
>>>>>>> -ENOENT is frequently the wrong answer.
>>>>>> This will be something of a problem for autofs4 (and autofs).
>>>>>> It would require fairly significant changes to the revalidate code.
>>>>> Or maybe not that significant, I'm not sure yet.
>>>> Can you be more specific about your concern? Since the locking rules
>>>> aren't really changing (see below), the main thing to worry about is the
>>>> loss of the wonky -ENOENT error..
>>> Historically, the lock inconsistency has been around for a long time and
>>> autofs(4) has worked around that issue. The situation is that when
>>> autofs(4) needs to send a request back to user space it must be sent
>>> without the mutex held. This callback can be needed in the lookup or in
>>> revalidate, depending on the situation. Currently when the callback is
>>> needed in revalidate, the lock is never held.
>>>
>>> If the lock is now held through revalidate then I will need to move the
>>> lock release/re-acquire from the lookup to the revalidate. My initial
>>> concern was that if we need to release and re-aquire the lock more than
>>> once in the revalidate code the complexity of potential deadlock goes up
>>> considerably. Having put so much effort into dealing with deadlock
>>> issues in lookup with just one release and re-acquire I'm worried I'll
>>> end up with more problems.
>>>
>>> However, it looks like I will only need to do the release/re-acquire
>>> once in revalidate, so maybe the status-quo will be maintained and no
>>> additional issues will surface.
>>>
>> It's just occurred to me that your proposing to hold the mutex over one
>> of the two calls that can occur as a result of do_lookup(). That's not
>> good because then we possibly have mixed locking for the same VFS
>> operation. Last time I tried to deal with this I got into a "am I the
>> lock owner, can I release it" dilemma.
>>
>> So, I guess I'm saying that, to do this we would need to hold the lock
>> for both the calls to revalidate.
>
> I guess the question is whether consistent revalidate rules from the point
> of do_lookup() are intended. There are various other paths (through
> lookup_hash() and cached_lookup()) that do take the mutex. Whether you
> get 'mixed locking' is just a matter of what starting point you choose.
>
> And as far as I can tell, that's a good thing. In general, we want an
> unlocked revalidate to avoid mutex contention during cached lookups. But
> we also want locked revalidate for paths where changes are being made to
> avoid races. It sounds like autofs was just getting lucky before since
> none of the operations that use those paths are supported.
It's not due to luck.
>
> Would it be possible to avoid the upcall in revalidate, and instead fail
> and let the subsequent lookup path do it? (I'm guessing the upcall
> doesn't happen for _every_ revalidate?)
Yes, that's right, just every revalidate from processes that aren't the
automount process itself. The normal case is the mount succeeds and
further walks follow the mount from then on until it expires.
It was more than three years ago when I tried to make everything go
through lookup so my memory is pretty unclear now. In the end I think
there was one case in real_lookup() where the lookup was skipped,
revalidate was called and failed but lookup wasn't then called again and
I got an incorrect failure. AFAICR all other code paths that hold the
mutex over lookup and revalidate perform the revalidate first and then
the lookup if that fails, which avoids this case.
>
>
>> But if revalidate returns an error code then it will be returned and so
>> __link_path_walk() will then return that. At least that is what I
>> intended when I did it, but maybe I didn't get it quite right for
>> everyone, for example there would be no call to d_invalidate in that
>> case. Can you perhaps use that mechanism?
>
> I'm not sure what you mean. When this problem currently comes up,
> revalidate doesn't want to return an error... it wants ->lookup() to be
> called again, but instead real_lookup() is returning -ENOENT. The obvious
> alternative of looping (instead of holding i_mutex over the revalidate)
> would (at least theoretically) be starvation prone.
I just don't understand what you need to resolve.
If returning an error isn't what you need then that's just the way it is.
Ian
next prev parent reply other threads:[~2009-03-26 2:09 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-19 20:16 [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Sage Weil
2009-03-19 20:16 ` [PATCH 2/2] vfs: clean up real_lookup Sage Weil
2009-03-19 20:22 ` Christoph Hellwig
2009-03-19 20:35 ` Sage Weil
2009-03-19 20:23 ` [PATCH 1/2] vfs: make real_lookup do dentry revalidation with i_mutex held Christoph Hellwig
2009-03-24 4:14 ` Ian Kent
2009-03-24 4:18 ` Ian Kent
2009-03-25 4:29 ` Sage Weil
2009-03-25 6:08 ` Ian Kent
2009-03-25 16:11 ` Ian Kent
2009-03-25 19:11 ` Sage Weil
2009-03-26 2:09 ` Ian Kent [this message]
2009-03-26 3:53 ` Sage Weil
2009-03-26 8:00 ` Ian Kent
2009-03-26 10:38 ` Ian Kent
2009-03-29 8:53 ` Ian Kent
2009-04-03 0:58 ` Sage Weil
2009-04-03 2:00 ` Ian Kent
2009-04-03 3:07 ` Sage Weil
2009-06-22 17:15 ` Sage Weil
2009-06-23 0:37 ` Ian Kent
2009-06-23 2:40 ` H. Peter Anvin
2009-06-25 7:21 ` Ian Kent
2009-06-25 13:41 ` H. Peter Anvin
2009-06-25 13:58 ` Christoph Hellwig
2009-06-23 2:42 ` H. Peter Anvin
2009-06-24 2:28 ` Ian Kent
2009-06-24 5:45 ` Sage Weil
2009-06-24 9:17 ` Ian Kent
2009-06-24 17:46 ` Sage Weil
2009-06-25 2:50 ` Ian Kent
2009-06-25 4:13 ` Ian Kent
2009-06-25 4:49 ` Sage Weil
2009-06-25 5:52 ` Ian Kent
2009-09-17 6:36 ` Ian Kent
2009-07-20 2:45 ` Ian Kent
2009-07-28 22:47 ` Sage Weil
2009-07-29 2:59 ` Ian Kent
2009-07-29 16:57 ` Sage Weil
2009-07-30 0:56 ` Ian Kent
2009-07-30 17:47 ` Sage Weil
2009-07-31 2:03 ` Ian Kent
2009-03-26 3:54 ` Ian Kent
2009-03-26 4:03 ` Sage Weil
2009-03-26 5:07 ` Ian Kent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49CAE3DB.3030909@themaw.net \
--to=raven@themaw.net \
--cc=adilger@sun.com \
--cc=akpm@linux-foundation.org \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=sage@newdream.net \
--cc=viro@zeniv.linux.org.uk \
--cc=yehuda@newdream.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).