regression when opening directories on NFSv4

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* regression when opening directories on NFSv4
@ 2011-09-21 15:58 Jeff Layton
  2011-09-21 18:53 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2011-09-21 15:58 UTC (permalink / raw)
  To: linux-nfs

We had a regression reported against RHEL concerning the opening of
directories and it looks like that same problem is in current mainline
code too. If you do the following on a directory that is not yet in the
dcache you get an EISDIR error:

     open("/mnt/nfs/dir1", O_RDONLY)         = -1 EISDIR (Is a directory)

If however, you stat the directory first, the open works. The
difference seems to be that in the first case we're going through the
lookup codepath, and in the second we go through d_revalidate.

In the first case, we send an OPEN call to the server and it responds
with NFS4ERR_ISDIR. That gets translated to -EISDIR, and returned to
userspace. It wasn't always this way though, and I think the regression
was introduced in commit d953126a2.

That patch was added to fix an oops due to a buggy server, and I'm
unclear on how best to fix this. It seems like we need to allow the
server to fall back to doing a normal lookup when we get -EISDIR on the
OPEN call, but how do we ensure that we don't end up with the same oops
from that server bug?

Thoughts?
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: regression when opening directories on NFSv4
  2011-09-21 15:58 regression when opening directories on NFSv4 Jeff Layton
@ 2011-09-21 18:53 ` Trond Myklebust
  2011-09-21 19:10   ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2011-09-21 18:53 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Wed, 2011-09-21 at 11:58 -0400, Jeff Layton wrote: 
> We had a regression reported against RHEL concerning the opening of
> directories and it looks like that same problem is in current mainline
> code too. If you do the following on a directory that is not yet in the
> dcache you get an EISDIR error:
> 
>      open("/mnt/nfs/dir1", O_RDONLY)         = -1 EISDIR (Is a directory)
> 
> If however, you stat the directory first, the open works. The
> difference seems to be that in the first case we're going through the
> lookup codepath, and in the second we go through d_revalidate.
> 
> In the first case, we send an OPEN call to the server and it responds
> with NFS4ERR_ISDIR. That gets translated to -EISDIR, and returned to
> userspace. It wasn't always this way though, and I think the regression
> was introduced in commit d953126a2.
> 
> That patch was added to fix an oops due to a buggy server, and I'm
> unclear on how best to fix this. It seems like we need to allow the
> server to fall back to doing a normal lookup when we get -EISDIR on the
> OPEN call, but how do we ensure that we don't end up with the same oops
> from that server bug?

How about returning an error if we get to the file->f_ops->open on a
regular file in NFSv4?

In NFSv4.1 we could theoretically issue an OPEN will CLAIM_FH (assuming
there are servers out there that support it), but for NFSv4 as far as I
can see, there will always be a potential for races.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: regression when opening directories on NFSv4
  2011-09-21 18:53 ` Trond Myklebust
@ 2011-09-21 19:10   ` Jeff Layton
  2011-09-21 19:30     ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2011-09-21 19:10 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, 21 Sep 2011 14:53:12 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2011-09-21 at 11:58 -0400, Jeff Layton wrote: 
> > We had a regression reported against RHEL concerning the opening of
> > directories and it looks like that same problem is in current mainline
> > code too. If you do the following on a directory that is not yet in the
> > dcache you get an EISDIR error:
> > 
> >      open("/mnt/nfs/dir1", O_RDONLY)         = -1 EISDIR (Is a directory)
> > 
> > If however, you stat the directory first, the open works. The
> > difference seems to be that in the first case we're going through the
> > lookup codepath, and in the second we go through d_revalidate.
> > 
> > In the first case, we send an OPEN call to the server and it responds
> > with NFS4ERR_ISDIR. That gets translated to -EISDIR, and returned to
> > userspace. It wasn't always this way though, and I think the regression
> > was introduced in commit d953126a2.
> > 
> > That patch was added to fix an oops due to a buggy server, and I'm
> > unclear on how best to fix this. It seems like we need to allow the
> > server to fall back to doing a normal lookup when we get -EISDIR on the
> > OPEN call, but how do we ensure that we don't end up with the same oops
> > from that server bug?
> 
> How about returning an error if we get to the file->f_ops->open on a
> regular file in NFSv4?
> 

That would probably be reasonable. I'll see if I can come up with a
patch. The tricky part of course is ensuring that nothing regresses...

I think this is probably safe for the most part. The d_revalidate
codepath has always allowed you to end up with an open context with
NULL state. 

Granted the buggy server case here is exceedingly rare, but it seems
like the code already assumes that a ctx reached via filp may have a
NULL state pointer.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: regression when opening directories on NFSv4
  2011-09-21 19:10   ` Jeff Layton
@ 2011-09-21 19:30     ` Trond Myklebust
  2011-09-22 14:34       ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2011-09-21 19:30 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-nfs

On Wed, 2011-09-21 at 15:10 -0400, Jeff Layton wrote: 
> On Wed, 21 Sep 2011 14:53:12 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > On Wed, 2011-09-21 at 11:58 -0400, Jeff Layton wrote: 
> > > We had a regression reported against RHEL concerning the opening of
> > > directories and it looks like that same problem is in current mainline
> > > code too. If you do the following on a directory that is not yet in the
> > > dcache you get an EISDIR error:
> > > 
> > >      open("/mnt/nfs/dir1", O_RDONLY)         = -1 EISDIR (Is a directory)
> > > 
> > > If however, you stat the directory first, the open works. The
> > > difference seems to be that in the first case we're going through the
> > > lookup codepath, and in the second we go through d_revalidate.
> > > 
> > > In the first case, we send an OPEN call to the server and it responds
> > > with NFS4ERR_ISDIR. That gets translated to -EISDIR, and returned to
> > > userspace. It wasn't always this way though, and I think the regression
> > > was introduced in commit d953126a2.
> > > 
> > > That patch was added to fix an oops due to a buggy server, and I'm
> > > unclear on how best to fix this. It seems like we need to allow the
> > > server to fall back to doing a normal lookup when we get -EISDIR on the
> > > OPEN call, but how do we ensure that we don't end up with the same oops
> > > from that server bug?
> > 
> > How about returning an error if we get to the file->f_ops->open on a
> > regular file in NFSv4?
> > 
> 
> That would probably be reasonable. I'll see if I can come up with a
> patch. The tricky part of course is ensuring that nothing regresses...
> 
> I think this is probably safe for the most part. The d_revalidate
> codepath has always allowed you to end up with an open context with
> NULL state. 
> 
> Granted the buggy server case here is exceedingly rare, but it seems
> like the code already assumes that a ctx reached via filp may have a
> NULL state pointer.

I agree that the buggy server is rare, but you can potentially reproduce
the problem using something like the following script

mkdir b; touch a; while true do mv a c; mv b a; mv c b; done

It will probably mostly either succeed or fail with ENOENT, but every
now and then it should be possible to tickle the above issue.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: regression when opening directories on NFSv4
  2011-09-21 19:30     ` Trond Myklebust
@ 2011-09-22 14:34       ` Jeff Layton
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Layton @ 2011-09-22 14:34 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Wed, 21 Sep 2011 15:30:12 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> On Wed, 2011-09-21 at 15:10 -0400, Jeff Layton wrote: 
> > On Wed, 21 Sep 2011 14:53:12 -0400
> > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > 
> > > On Wed, 2011-09-21 at 11:58 -0400, Jeff Layton wrote: 
> > > > We had a regression reported against RHEL concerning the opening of
> > > > directories and it looks like that same problem is in current mainline
> > > > code too. If you do the following on a directory that is not yet in the
> > > > dcache you get an EISDIR error:
> > > > 
> > > >      open("/mnt/nfs/dir1", O_RDONLY)         = -1 EISDIR (Is a directory)
> > > > 
> > > > If however, you stat the directory first, the open works. The
> > > > difference seems to be that in the first case we're going through the
> > > > lookup codepath, and in the second we go through d_revalidate.
> > > > 
> > > > In the first case, we send an OPEN call to the server and it responds
> > > > with NFS4ERR_ISDIR. That gets translated to -EISDIR, and returned to
> > > > userspace. It wasn't always this way though, and I think the regression
> > > > was introduced in commit d953126a2.
> > > > 
> > > > That patch was added to fix an oops due to a buggy server, and I'm
> > > > unclear on how best to fix this. It seems like we need to allow the
> > > > server to fall back to doing a normal lookup when we get -EISDIR on the
> > > > OPEN call, but how do we ensure that we don't end up with the same oops
> > > > from that server bug?
> > > 
> > > How about returning an error if we get to the file->f_ops->open on a
> > > regular file in NFSv4?
> > > 
> > 
> > That would probably be reasonable. I'll see if I can come up with a
> > patch. The tricky part of course is ensuring that nothing regresses...
> > 
> > I think this is probably safe for the most part. The d_revalidate
> > codepath has always allowed you to end up with an open context with
> > NULL state. 
> > 
> > Granted the buggy server case here is exceedingly rare, but it seems
> > like the code already assumes that a ctx reached via filp may have a
> > NULL state pointer.
> 
> I agree that the buggy server is rare, but you can potentially reproduce
> the problem using something like the following script
> 
> mkdir b; touch a; while true do mv a c; mv b a; mv c b; done
> 
> It will probably mostly either succeed or fail with ENOENT, but every
> now and then it should be possible to tickle the above issue.
> 

Ok, I sent you a patch that fixes the bug.

I ran the above on the server and a program in a loop that did opens on
the client, but was never able to reproduce the server-side bug. It
seemed to be OK in other testing though.
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-09-22 14:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-21 15:58 regression when opening directories on NFSv4 Jeff Layton
2011-09-21 18:53 ` Trond Myklebust
2011-09-21 19:10   ` Jeff Layton
2011-09-21 19:30     ` Trond Myklebust
2011-09-22 14:34       ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox