nfs_refresh_inode: inode number mismatch


* nfs_refresh_inode: inode number mismatch
@ 2001-02-08  1:13 Jun Sun
  2001-02-08  1:22 ` Neil Brown
  0 siblings, 1 reply; 27+ messages in thread
From: Jun Sun @ 2001-02-08  1:13 UTC (permalink / raw)
  To: linux-kernel

This is a weird problem that I am looking at right now.  It seems to indicate a
bug in the nfs server.

I have a MIPS machine that boots from an NFS root fs hosted on a redhat 6.2
workstation.  Everything works fine, except that after a few reboots I start to
see error messages like the following:

Freeing unused kernel memory: 24k freed
INIT: version 2.77 booting
nfs_refresh_inode: inode number mismatch
expected (0x308/0x28b3d2), got (0x308/0x12b91b)
INIT: Entering runlevel: 3
sh-2.03# 

Restarting the nfs server on the host does not get rid of the messages.
The messages do go away if I reboot the host.

I traced the network packets, and the server appears to be returning the wrong
fileid in a "write reply" message.  Below is a segment of the extracted packet
trace.  For the same handle it returned earlier to the client, the nfs server
later returns a different fileid.  The confusing part is that the server
handles the first write request, and a couple of other requests, correctly,
but then returns a wrong fileid for the second write.

In my particular setup, it seems only certain files (inodes) tend to get
screwed up.

Does anybody have an idea as to what is wrong here?

Please cc your reply to my email address.  TIA.

Jun

------------------
round 3:

case 1:

2177 lookup:
        ioctl.save

2178 lookup reply:
        fileid: 2667474
        handle:
cabaebfed2b32800e6ab2800080300000803000054c21100b2302b0c00000000

2181 write:
        offset:0
        total count: 60
        handle:
cabaebfed2b32800e6ab2800080300000803000054c21100b2302b0c00000000

2182 write reply:
        fileid: 2667474
        size: 60

2183 setattr:
        handle:
cabaebfed2b32800e6ab2800080300000803000054c21100b2302b0c00000000

2184 setattr reply:
        fileid: 2667474

2185 write:
        handle:
cabaebfed2b32800e6ab2800080300000803000054c21100b2302b0c00000000

2186 write reply:
        fileid: 1227035
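
The kernel message and the packet trace report the same numbers in different
bases: the second field of each (dev/ino) pair in the message is the fileid in
hexadecimal. A minimal Python sketch of the conversion follows. The function
name `parse_mismatch` is made up for illustration, and the remark that the
handle contains the fileid in little-endian byte order is only an observation
read off this one trace, not a statement about how the server lays out its
file handles.

```python
import re

# Pattern for the second line of the kernel warning, e.g.
# "expected (0x308/0x28b3d2), got (0x308/0x12b91b)"
MISMATCH_RE = re.compile(
    r"expected \((0x[0-9a-f]+)/(0x[0-9a-f]+)\), got \((0x[0-9a-f]+)/(0x[0-9a-f]+)\)"
)

def parse_mismatch(line):
    """Return ((dev, ino) expected, (dev, ino) got) as integers."""
    m = MISMATCH_RE.search(line)
    if m is None:
        raise ValueError("not an inode mismatch line: %r" % line)
    exp_dev, exp_ino, got_dev, got_ino = (int(g, 16) for g in m.groups())
    return (exp_dev, exp_ino), (got_dev, got_ino)

expected, got = parse_mismatch("expected (0x308/0x28b3d2), got (0x308/0x12b91b)")
print(expected[1], got[1])  # 2667474 1227035, the two fileids seen in the trace

# File handle copied from the trace.
handle = bytes.fromhex(
    "cabaebfed2b32800e6ab2800080300000803000054c21100b2302b0c00000000"
)
# Observation only: the expected fileid shows up little-endian inside the handle.
le_ino = expected[1].to_bytes(4, "little")
print(handle.find(le_ino))  # 4 (byte offset within the handle)
```

Read this way, the handle still names fileid 2667474, while the second write
reply carries 1227035.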
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: nfs_refresh_inode: inode number mismatch
  2001-02-08  1:13 Jun Sun
@ 2001-02-08  1:22 ` Neil Brown
  2001-02-08  8:08   ` Russell King
  0 siblings, 1 reply; 27+ messages in thread
From: Neil Brown @ 2001-02-08  1:22 UTC (permalink / raw)
  To: Jun Sun; +Cc: linux-kernel

On Wednesday February 7, jsun@mvista.com wrote:
> 
> This is a weird problem that I am looking at right.  It seems to indicate a
> bug in the nfs server.
> 
> I have a MIPS machine that boots from a NFS root fs hosted on a redhat 6.2
> workstation.  Everything works fine except that after a few reboots I start to
> see the error messages like the following:

What version of Linux?  If it is less than 2.2.18, then an upgrade 
will help you a lot.

If it is >= 2.2.18, I will look some more.

NeilBrown

* Re: nfs_refresh_inode: inode number mismatch
  2001-02-08  1:22 ` Neil Brown
@ 2001-02-08  8:08   ` Russell King
  2001-02-09  0:02     ` Jun Sun
  0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2001-02-08  8:08 UTC (permalink / raw)
  To: Neil Brown; +Cc: Jun Sun, linux-kernel

Neil Brown writes:
> On Wednesday February 7, jsun@mvista.com wrote:
> > This is a weird problem that I am looking at right.  It seems to indicate a
> > bug in the nfs server.
> > 
> > I have a MIPS machine that boots from a NFS root fs hosted on a redhat 6.2
> > workstation.  Everything works fine except that after a few reboots I start to
> > see the error messages like the following:
> 
> What version of Linux?  If it is less than 2.2.18, then an upgrade 
> will help you a lot.
> 
> If it is >= 2.2.18, I will look some more.

Note that you need to upgrade the server, not the client.  Also, make sure
you don't reboot the client more than once in a 2 minute time window.
--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


* Re: nfs_refresh_inode: inode number mismatch
  2001-02-08  8:08   ` Russell King
@ 2001-02-09  0:02     ` Jun Sun
  0 siblings, 0 replies; 27+ messages in thread
From: Jun Sun @ 2001-02-09  0:02 UTC (permalink / raw)
  To: Russell King; +Cc: Neil Brown, linux-kernel

Russell King wrote:
> 
> Neil Brown writes:
> > On Wednesday February 7, jsun@mvista.com wrote:
> > > This is a weird problem that I am looking at right.  It seems to indicate a
> > > bug in the nfs server.
> > >
> > > I have a MIPS machine that boots from a NFS root fs hosted on a redhat 6.2
> > > workstation.  Everything works fine except that after a few reboots I start to
> > > see the error messages like the following:
> >
> > What version of Linux?  If it is less than 2.2.18, then an upgrade
> > will help you a lot.
> >
> > If it is >= 2.2.18, I will look some more.
> 
> Note that you need to upgrade the server, not the client.  Also, make sure
> you don't reboot the client more than once in a 2 minute time window.

My server was 2.2.14.  I upgraded it to 2.2.18.  It appears that the problem
is gone, although it will probably take a while to be sure.

I do find the "no more than once in 2 minutes" requirement amusing ... :-)

Jun

* Re: nfs_refresh_inode: inode number mismatch
  2001-02-22 22:30 Scott A McConnell
@ 2001-02-22 21:59 ` Russell King
  2001-02-23  9:30   ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2001-02-22 21:59 UTC (permalink / raw)
  To: Scott A McConnell; +Cc: linux-kernel

Scott A McConnell writes:
> I am running RedHat Linux version 2.2.16-3 on my PC and Hardhat Linux
> version 2.4.0-test5 on my MIPS board. Any thoughts or suggestions?
> 
> I saw a discussion start on the ARM list along these lines but I never
> saw a solution.

The problem is partly caused by the NFS server indefinitely caching the
mapping from NFS request XIDs to responses, and the NFS client not having a
way to generate a random initial XID (thus, after each reboot, it starts at
the same XID number).

Upgrade your NFS server to kernel 2.2.18, and don't reboot more than once
in a 2 minute window.
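
The interaction described above can be modelled in a few lines of Python. This
is a toy sketch, not knfsd's implementation: the class `ReplyCache`, the fixed
initial XID of 1, and the fileid values 100 and 200 are all hypothetical, and a
real cache would key on more than the XID alone (client address and port, for
instance). `ttl=None` stands for a cache that never expires entries.

```python
class ReplyCache:
    """Toy model of a server-side duplicate request cache keyed by XID."""

    def __init__(self, ttl):
        self.ttl = ttl          # seconds a cached reply stays valid; None = forever
        self.entries = {}       # xid -> (time stored, reply)

    def handle(self, xid, now, compute_reply):
        entry = self.entries.get(xid)
        if entry is not None:
            stored_at, reply = entry
            if self.ttl is None or now - stored_at < self.ttl:
                return reply    # replay the old reply without doing the work
        reply = compute_reply()
        self.entries[xid] = (now, reply)
        return reply


def boot_and_write(cache, now, fileid):
    """A client that always starts at the same XID after a reboot."""
    first_xid = 1  # hypothetical fixed initial XID
    return cache.handle(first_xid, now, lambda: {"fileid": fileid})


cache = ReplyCache(ttl=120)
reply_a = boot_and_write(cache, now=0, fileid=100)    # first boot
reply_b = boot_and_write(cache, now=30, fileid=200)   # reboot within 2 minutes
reply_c = boot_and_write(cache, now=300, fileid=200)  # reboot after the window
print(reply_a, reply_b, reply_c)
```

In this model the second boot receives the reply cached for the first boot
(fileid 100 instead of 200), which is the kind of stale answer that would make
the client complain. The third boot, outside the window, gets a fresh reply.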

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* nfs_refresh_inode: inode number mismatch
@ 2001-02-22 22:30 Scott A McConnell
  2001-02-22 21:59 ` Russell King
  0 siblings, 1 reply; 27+ messages in thread
From: Scott A McConnell @ 2001-02-22 22:30 UTC (permalink / raw)
  To: linux-kernel

I am getting NFS errors/warnings:

VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 196k freed
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
                     ^/var/run/utmp
^/var/log/wtmp                        **************
nfs_refresh_inode: inode number mismatch
expected (0x806/0x62b48), got (0x806/0x6246a)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x62b4f), got (0x806/0x6246a)

^/var/run/inetd.pid
*****************
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x62b48), got (0x806/0x6246a)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x62b48), got (0x806/0x6246a)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x42d60), got (0x806/0x42d5f)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)
nfs_refresh_inode: inode number mismatch
expected (0x806/0x6246a), got (0x806/0x62b48)

I am running RedHat Linux version 2.2.16-3 on my PC and Hardhat Linux
version 2.4.0-test5 on my MIPS board. Any thoughts or suggestions?

I saw a discussion start on the ARM list along these lines but I never
saw a solution.

Please CC me at samcconn@cotw.com

Thanks,
Scott




* Re: nfs_refresh_inode: inode number mismatch
  2001-02-22 21:59 ` Russell King
@ 2001-02-23  9:30   ` Trond Myklebust
  0 siblings, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2001-02-23  9:30 UTC (permalink / raw)
  To: Russell King; +Cc: Scott A McConnell, linux-kernel

>>>>> " " == Russell King <rmk@arm.linux.org.uk> writes:

     > Scott A McConnell writes:
    >> I am running RedHat Linux version 2.2.16-3 on my PC and Hardhat
    >> Linux version 2.4.0-test5 on my MIPS board. Any thoughts or
    >> suggestions?
    >>
    >> I saw a discussion start on the ARM list along these lines but
    >> I never saw a solution.

     > The problem is partly caused by the NFS server indefinitely
     > caching NFS request XIDs to responses, and the NFS client not
     > having a way to generate a random initial XID.  (thus, for each
     > reboot, it starts at the same XID number).

That shouldn't be true in the latest kernels. knfsd should normally
cache requests for no longer than 2 minutes with the changes made by
Neil following your bugreport.

Cheers,
   Trond


* nfs_refresh_inode: inode number mismatch
@ 2001-07-17  0:24 Marco d'Itri
  2001-07-17  9:44 ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Marco d'Itri @ 2001-07-17  0:24 UTC (permalink / raw)
  To: linux-kernel

Jul 18 00:15:07 newsserver kernel: nfs_refresh_inode: inode number mismatch
Jul 18 00:15:07 newsserver kernel: expected (0x3b30ac75/0x48d5), got (0x3b30ac75/0x8d04)

I've got a flood of these messages while talking to a procom NAS.
Should I worry? Upgrade/patch the kernel? Yell at procom tech support?


Linux newsserver 2.4.5 #1 Fri Jun 22 18:18:56 CEST 2001 i686 unknown

192.168.139.11:/news_store on /shared/archive type nfs (rw,noatime,rsize=8192,wsize=8192,udp,nfsvers=3,addr=192.168.139.11)


-- 
ciao,
Marco


* Re: nfs_refresh_inode: inode number mismatch
  2001-07-17  0:24 Marco d'Itri
@ 2001-07-17  9:44 ` Trond Myklebust
  2001-07-18 22:25   ` Marco d'Itri
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2001-07-17  9:44 UTC (permalink / raw)
  To: Marco d'Itri; +Cc: linux-kernel

>>>>> " " == Marco d'Itri <md@Linux.IT> writes:

     > Jul 18 00:15:07 newsserver kernel: nfs_refresh_inode: inode
     > number mismatch Jul 18 00:15:07 newsserver kernel: expected
     > (0x3b30ac75/0x48d5), got (0x3b30ac75/0x8d04)

     > I've got a flood of these messages while talking to a procom
     > NAS this.  Should I worry? Upgrade/patch the kernel? Yell at
     > procom tech support?

Have you applied any extra patches to NFS? I remember one of my
patches (available from my WWW-page, but clearly marked experimental)
was generating these messages gratuitously.

If, on the other hand, you're using a clean kernel, I'd look into what
the server is doing. It sounds like it's doing the same thing that the
userland `nfs-server' does: namely to recycle filehandles after a file
gets deleted...

Cheers,
  Trond


* Re: nfs_refresh_inode: inode number mismatch
  2001-07-17  9:44 ` Trond Myklebust
@ 2001-07-18 22:25   ` Marco d'Itri
  2001-07-19 11:00     ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Marco d'Itri @ 2001-07-18 22:25 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-kernel

On Jul 17, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:

 >     > Jul 18 00:15:07 newsserver kernel: nfs_refresh_inode: inode
 >     > number mismatch Jul 18 00:15:07 newsserver kernel: expected
 >     > (0x3b30ac75/0x48d5), got (0x3b30ac75/0x8d04)

 >     > I've got a flood of these messages while talking to a procom
 >     > NAS this.  Should I worry? Upgrade/patch the kernel? Yell at
 >     > procom tech support?

 >Have you applied any extra patches to NFS? I remember one of my
No, the kernel is plain unpatched 2.4.5.

 >If, on the other hand, you're using a clean kernel, I'd look into what
 >the server is doing. It sounds like it's doing the same thing that the
 >userland `nfs-server' does: namely to recycle filehandles after a file
 >gets deleted...
Anything specific I can tell their tech support?

Can I ignore these messages, or do I risk data corruption?

-- 
ciao,
Marco


* Re: nfs_refresh_inode: inode number mismatch
  2001-07-18 22:25   ` Marco d'Itri
@ 2001-07-19 11:00     ` Trond Myklebust
  0 siblings, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2001-07-19 11:00 UTC (permalink / raw)
  To: Marco d'Itri; +Cc: Linux Kernel

>>>>> " " == Marco d'Itri <md@Linux.IT> writes:

     > On Jul 17, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
    >> > Jul 18 00:15:07 newsserver kernel: nfs_refresh_inode: inode
    >> > number mismatch Jul 18 00:15:07 newsserver kernel: expected
    >> > (0x3b30ac75/0x48d5), got (0x3b30ac75/0x8d04)

    >> If, on the other hand, you're using a clean kernel, I'd look
    >> into what the server is doing. It sounds like it's doing the
    >> same thing that the userland `nfs-server' does: namely to
    >> recycle filehandles after a file gets deleted...
     > Anything specific I can tell to their tech support?

     > Can I ignore these messages or I risk data corruption?

There's always a small danger of data corruption, since the NFS client
can't rely on the file handle actually being a pointer to the file we
expect.

Try 2.4.6 first though, as a couple of fixes were implemented there
that should reduce the frequency of such messages. Basically we ensure
that an inode is removed from the cache when we believe that the file
has been deleted.

A proper fix, though, would be for the server to implement filehandles
that are unique as per RFC1813...
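
To illustrate why recycled filehandles produce the mismatch, here is a toy
Python model. It is a sketch under assumptions: `ToyServer`, its slot table,
and the generation counter are invented for illustration. Adding a generation
number to the handle is a common way to keep handles unique, but nothing in
this thread says how any particular server does it.

```python
class ToyServer:
    """Hands out file handles; optionally reuses them after deletion."""

    def __init__(self, unique_handles):
        self.unique_handles = unique_handles
        self.next_fileid = 1
        self.free_slots = []     # slot numbers released by remove()
        self.slots = {}          # slot -> (generation, fileid)
        self.generation = {}     # slot -> how many times it has been reused

    def create(self):
        slot = self.free_slots.pop() if self.free_slots else len(self.generation)
        gen = self.generation.get(slot, -1) + 1
        self.generation[slot] = gen
        fileid = self.next_fileid
        self.next_fileid += 1
        self.slots[slot] = (gen, fileid)
        # A recycling server exposes only the slot; a careful one adds gen.
        return (slot, gen) if self.unique_handles else (slot,)

    def remove(self, handle):
        slot = handle[0]
        del self.slots[slot]
        self.free_slots.append(slot)

    def getattr(self, handle):
        slot = handle[0]
        if slot not in self.slots:
            return "ESTALE"
        gen, fileid = self.slots[slot]
        if self.unique_handles and handle[1] != gen:
            return "ESTALE"
        return fileid


def demo(unique_handles):
    server = ToyServer(unique_handles)
    old = server.create()
    cached_fileid = server.getattr(old)   # what the client remembers
    server.remove(old)
    server.create()                       # a different file takes the slot
    return cached_fileid, server.getattr(old)

print(demo(unique_handles=False))  # (1, 2): same handle, different fileid
print(demo(unique_handles=True))   # (1, 'ESTALE'): the old handle is rejected
```

With recycling, the client's cached handle silently starts naming another
file, so the fileid it gets back no longer matches the one it expects. With
unique handles the client gets a stale-handle error instead and can drop the
inode.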

Cheers,
  Trond


* nfs_refresh_inode: inode number mismatch
@ 2003-06-03 23:54 Frank Cusack
  2003-06-04 14:19 ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Frank Cusack @ 2003-06-03 23:54 UTC (permalink / raw)
  To: lkml, trond.myklebust

Hi,

[Previously sent to nfs@sourceforge with no response]

I'm using a frankenstein kernel, 2.4.21-rc3 with some -ac bits,
and 2.5.69 NFS+RPC backported to it.  Like the CITI kernel (for krb5),
but a little more aggressive on the bits backported.  For the purpose
of this email, I think the code I have questions with is similar or even
identical from 2.4.21->2.5.69.  I can reproduce this problem on a RH
2.4.20-9smp kernel.

Consider these two shells running on the same machine:

	    1				    2

	cd /nfs				cd /nfs
	mkdir t
	echo foo > t/foo
	less t/foo
	 [less waits for input]
					rm -rf t
	'v'
	 [vi tries to access t/foo]

At this point, fs/nfs/inode.c:__nfs_refresh_inode() prints the "inode
number mismatch" error.  AFAICT, this is just noise, but the noise is
driving me crazy. :-)

Now, if sequence 2 is run on a different machine, there is no error!
So that hints to me that the local cache just needs to be cleared,
perhaps in nfs_rmdir() or maybe in nfs_unlink()/nfs_safe_remove().
I've tried a few things, but I'm not familiar enough with the code
and am making slow progress.  I can suppress this error by testing
for 'unlinked but open' in __nfs_refresh_inode:

	if (NFS_FILEID(inode) != fattr->fileid) {
		if (inode->i_nlink)	/* stay quiet if the inode no longer exists */
			printk(...);
	}

Do you think this is safe?  Some minimal logs:

kernel: NFS: dentry_delete(t/.nfs01c7d70600000001, 2)	| renamed file
kernel: NFS: delete_inode(e/29873926)			| unlink of renamed foo

kernel: NFS: refresh_inode(e/29873923 ct=1 info=0x6)	| accessing t/
kernel: nfs_refresh_inode: inode number mismatch
kernel: expected (0xe/0x1c7d703), got (0xe/0xe63bc2)
kernel: NFS: dentry_delete(fsstress/t, 0)
kernel: NFS: delete_inode(e/29873923)

and then access calls beginning at the root.  I apologize for the likely
uselessness of the above logs.  I can email some annotated logs if desired,
but the problem is very easy to reproduce, so I'll hold off for now.

This problem only exists for nfsv3.  This problem doesn't occur if there
is a third process also holding foo open (note that the directory does
get removed, just no kernel error when trying to access it).

The 2.2 kernel doesn't have this problem, because (apparently) it doesn't
allow you to unlink a .nfsXXX file while it's open (and therefore you
cannot remove the dir).

Which made me look around (2.5.69):  In nfs_silly_rename(), the new
dentry (sdentry) gets a d_count of 1.  Doesn't this indicate that no
one is holding this file open?  (which then tells nfs_unlink() to just
call nfs_safe_remove() rather than nfs_silly_rename())  Is that really
desirable?  Even if I set the d_count to match what the previous
dentry->d_count had, and avoid calling dput(sdentry), on the next run
through nfs_unlink() the d_count is 1 and it just goes to nfs_safe_remove().
Clearly, I don't understand what the d_move() is for.
(My guess is to avoid nfs_async_unlink() getting passed a dentry which
we are actually about to get rid of, but I haven't wrapped my head around
the dcache yet.)

Then I noticed that the DCACHE_NFSFS_RENAMED flag seems a little racy.
nfs_async_unlink() sets this and when the call completes,
nfs_complete_unlink() resets it.  So while it's being deleted, if an
rm -rf quickly picks up the .nfs name before the async unlink returns,
it won't get removed.  But if the nfs call completes first, it does
get removed.  Is the intention just to prevent removal of the .nfs
file until the old file is removed on the server?  What's the benefit
of this?

So, even with that error message quieted, fsstress reports lots of
inode mismatches.  I am in the process of trying to piece together a
simple reproducible sequence of NFS calls.

This is against a netapp server, although I can't see how the server would
matter.

Thanks for any advice, guidance, or hopefully fixes!  BTW, I'm interested
to hear what tools folks use to stress the NFS client.

/fc


* Re: nfs_refresh_inode: inode number mismatch
  2003-06-03 23:54 Frank Cusack
@ 2003-06-04 14:19 ` Trond Myklebust
  2003-06-04 21:20   ` Frank Cusack
  0 siblings, 1 reply; 27+ messages in thread
From: Trond Myklebust @ 2003-06-04 14:19 UTC (permalink / raw)
  To: Frank Cusack; +Cc: lkml, trond.myklebust

>>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:

     > Hi, [Previously sent to nfs@sourceforge with no response]

     > I'm using a frankenstein kernel, 2.4.21-rc3 with some -ac bits,
     > and 2.5.69 NFS+RPC backported to it.  Like the CITI kernel (for
     > krb5), but a little more aggressive on the bits backported.
     > For the purpose of this email, I think the code I have
     > questions with is similar or even identical from
     > 2.4.21->2.5.69.  I can reproduce this problem on a RH
     > 2.4.20-9smp kernel.

     > Consider these two shells running on the same machine:

     > 	    1				    2

     > 	cd /nfs				cd /nfs
     > 	mkdir t
     > 	echo foo > t/foo
     > 	less t/foo
     > 	 [less waits for input]
     > 					rm -rf t
     > 	'v'
     > 	 [vi tries to access t/foo]

     > At this point, fs/nfs/inode.c:__nfs_refresh_inode() prints the
     > "inode number mismatch" error.  AFAICT, this is just noise, but
     > the noise is driving me crazy. :-)

Inode number mismatch points to either an obvious server error (it
is not providing unique filehandles) or corruption of the fattr struct
that was passed to nfs_refresh_inode().

Cheers,
  Trond


* Re: nfs_refresh_inode: inode number mismatch
  2003-06-04 14:19 ` Trond Myklebust
@ 2003-06-04 21:20   ` Frank Cusack
  2003-06-04 21:28     ` Trond Myklebust
  2003-06-05  9:11     ` Adrian Cox
  0 siblings, 2 replies; 27+ messages in thread
From: Frank Cusack @ 2003-06-04 21:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: lkml

On Wed, Jun 04, 2003 at 04:19:38PM +0200, Trond Myklebust wrote:
> >>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:
>      > At this point, fs/nfs/inode.c:__nfs_refresh_inode() prints the
>      > "inode number mismatch" error.  AFAICT, this is just noise, but
>      > the noise is driving me crazy. :-)
> 
> Inode number mismatch points to either an obvious server error (it
> is not providing unique filehandles) or corruption of the fattr struct
> that was passed to nfs_refresh_inode().

Clearly it's not the former.  No way a netapp filer is going to have
this problem.  I can't imagine *any* nfs server having this problem.

Could you take another look at the specific case I cited?  At the time
I try to access the file, the path to it no longer exists.  No information
on this file should exist.

/fc


* Re: nfs_refresh_inode: inode number mismatch
  2003-06-04 21:20   ` Frank Cusack
@ 2003-06-04 21:28     ` Trond Myklebust
  2003-06-05  9:11     ` Adrian Cox
  1 sibling, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2003-06-04 21:28 UTC (permalink / raw)
  To: Frank Cusack; +Cc: lkml

>>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:

     > Could you take another look at the specific case I cited?  At
     > the time I try to access the file, the path to it no longer
     > exists.  No information on this file should exist.

I cannot duplicate.

Cheers,
  Trond


* Re: nfs_refresh_inode: inode number mismatch
  2003-06-04 21:20   ` Frank Cusack
  2003-06-04 21:28     ` Trond Myklebust
@ 2003-06-05  9:11     ` Adrian Cox
  2003-06-05  9:13       ` Russell King
  1 sibling, 1 reply; 27+ messages in thread
From: Adrian Cox @ 2003-06-05  9:11 UTC (permalink / raw)
  To: Frank Cusack; +Cc: trond.myklebust, linux-kernel

On Wed, 4 Jun 2003 14:20:47 -0700
"Frank Cusack" <fcusack@fcusack.com> wrote:

> On Wed, Jun 04, 2003 at 04:19:38PM +0200, Trond Myklebust wrote:
> > >>>>> " " == Frank Cusack <fcusack@fcusack.com> writes:
> >      > At this point, fs/nfs/inode.c:__nfs_refresh_inode() prints
> >      > the "inode number mismatch" error.  AFAICT, this is just
> >      > noise, but the noise is driving me crazy. :-)
> > 
> > Inode number mismatch points to either an obvious server error
> > (it is not providing unique filehandles) or corruption of the fattr
> > struct that was passed to nfs_refresh_inode().

There's a very common cause on embedded boards that don't have
real-time clocks. Without a clock the client uses the same XID on every
run, leading to lots of these messages. Is your clock broken?

- Adrian Cox
http://www.humboldt.co.uk/


* Re: nfs_refresh_inode: inode number mismatch
  2003-06-05  9:11     ` Adrian Cox
@ 2003-06-05  9:13       ` Russell King
  2003-06-05 13:51         ` Trond Myklebust
  0 siblings, 1 reply; 27+ messages in thread
From: Russell King @ 2003-06-05  9:13 UTC (permalink / raw)
  To: Adrian Cox; +Cc: Frank Cusack, trond.myklebust, linux-kernel

On Thu, Jun 05, 2003 at 10:11:20AM +0100, Adrian Cox wrote:
> There's a very common cause on embedded boards that don't have
> real-time clocks. Without a clock the client uses the same XID on every
> run, leading to lots of these messages. Is your clock broken?

BTDT.

If this is the case, you need to ensure that you don't reboot the client
before the server's XID cache times out the XID numbers.  For Linux knfsd,
that's around 2 minutes.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: nfs_refresh_inode: inode number mismatch
  2003-06-05  9:13       ` Russell King
@ 2003-06-05 13:51         ` Trond Myklebust
  0 siblings, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2003-06-05 13:51 UTC (permalink / raw)
  To: Russell King; +Cc: Adrian Cox, Frank Cusack, linux-kernel

>>>>> " " == Russell King <rmk@arm.linux.org.uk> writes:

     > If this is the case, you need to ensure that you don't reboot
     > the client before the server's XID cache times out the XID
     > numbers.  For Linux knfsd, that's around 2 minutes.

Note that older versions of knfsd didn't time out their replay cache
at all...

Cheers,
  Trond


* nfs_refresh_inode: inode number mismatch
@ 2007-09-11 16:41 Chris Carlson
  2007-09-11 17:02 ` Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Chris Carlson @ 2007-09-11 16:41 UTC (permalink / raw)
  To: nfs

We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
copies files from one mount point to another (both are different 
directories on the same NFS server mounted at different points).

After 30 copies of a hundred files are made, the system is rebooted and 
the test repeats.

After 2 reboots, an NFS file is created, and we get the following error 
from the kernel:

nfs_refresh_inode: inode number mismatch
expected (0x11/0xdacea3), got (0x11/0xb8d5e3)

We're trying to work out how to diagnose the problem.  Is there a good 
place to put printks or breakpoints?

Thanks for any assistance you can provide.

Chris Carlson

CONFIDENTIALITY NOTICE:

This email, together with any attachments, is intended only for use by Aristos Logic Corporation and the
individual(s) to which it is addressed and may contain information that is privileged, confidential or 
exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination,
distribution, or copying of this email, or any attachment, is strictly prohibited. If you have received this
email in error, please notify Aristos Logic Corporation by sending an email to sysadm@aristoslogic.com
and delete this email, along with any attachments, from your computer.

Thank you.

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 16:41 nfs_refresh_inode: inode number mismatch Chris Carlson
@ 2007-09-11 17:02 ` Jeff Layton
  2007-09-11 17:43   ` Jeff Layton
  2007-09-28  0:11   ` Chris Carlson
  2007-09-11 17:09 ` Chuck Lever
  2007-09-21 20:46 ` Trond Myklebust
  2 siblings, 2 replies; 27+ messages in thread
From: Jeff Layton @ 2007-09-11 17:02 UTC (permalink / raw)
  To: Chris Carlson; +Cc: nfs

On Tue, 11 Sep 2007 09:41:51 -0700
"Chris Carlson" <c.carlson@aristoslogic.com> wrote:

> 
> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
> runs to copy files from one mount point to another (both are different 
> directories on the same NFS server mounted at different points).
> 
> After 30 copies of a hundred files are made, the system is rebooted and 
> the test repeats.
> 
> After 2 reboots, an NFS file is created, and we get the following error 
> from the kernel:
> 
> nfs_refresh_inode: inode number mismatch
> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
> 

This may be a bug in the NetApp. I saw some similar messages when
working on an issue and it turned out to be a filer bug. I ended
up tracking it down by doing network captures, and then searching them
for the 'expected' and 'got' sequence of bytes in wireshark. It showed
that in some cases the netapp was sending back a new fileid in the
WCC attributes for the dir when a create call would fail.
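
As a rough aid for that kind of search, the numbers from the kernel message
can be turned into the byte strings they would occupy on the wire. The sketch
below assumes XDR (big-endian) encoding, with the fileid as an 8-byte value in
NFSv3 and a 4-byte value in NFSv2. The helper names and the sample payload are
hypothetical, and reading a real capture file is left out.

```python
import struct

def fileid_patterns(fileid):
    """Byte strings a fileid could appear as in XDR (big-endian) encoding."""
    return {
        "nfsv3_uint64": struct.pack(">Q", fileid),
        "nfsv2_uint32": struct.pack(">I", fileid),
    }

def find_all(haystack, needle):
    """Offsets of every occurrence of needle in haystack."""
    offsets, start = [], haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets

# Values from: expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
expected, got = 0xdacea3, 0xb8d5e3
for name, value in (("expected", expected), ("got", got)):
    print(name, fileid_patterns(value)["nfsv3_uint64"].hex())

# Stand-in for raw packet payload bytes; a real capture would be read from disk.
payload = b"\x01" * 4 + struct.pack(">Q", got) + b"\x01" * 4
print(find_all(payload, fileid_patterns(got)["nfsv3_uint64"]))  # [4]
```

The hex strings printed by the loop are what one would paste into a capture
tool's byte search.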

> We're just trying to figure out what to do to figure out what the 
> problem is.  Is there a good place to place printks or breakpoints?
> 
> Thanks for any assistance you can provide.
> 
> Chris Carlson


-- 
Jeff Layton <jlayton@redhat.com>


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 16:41 nfs_refresh_inode: inode number mismatch Chris Carlson
  2007-09-11 17:02 ` Jeff Layton
@ 2007-09-11 17:09 ` Chuck Lever
  2007-09-11 21:15   ` Chris Carlson
  2007-09-21 20:46 ` Trond Myklebust
  2 siblings, 1 reply; 27+ messages in thread
From: Chuck Lever @ 2007-09-11 17:09 UTC (permalink / raw)
  To: Chris Carlson; +Cc: nfs

Hi Chris-

Chris Carlson wrote:
> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
> runs to copy files from one mount point to another (both are different 
> directories on the same NFS server mounted at different points).
> 
> After 30 copies of a hundred files are made, the system is rebooted and 
> the test repeats.
> 
> After 2 reboots, an NFS file is created, and we get the following error 
> from the kernel:
> 
> nfs_refresh_inode: inode number mismatch
> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
> 
> We're just trying to figure out what to do to figure out what the 
> problem is.  Is there a good place to place printks or breakpoints?

This may be due to an RPC XID collision.  Which 2.4 kernel are you 
using?  The Linux NFS client may be sending the same XID sequence on the 
same port number after each reboot, in which case the server will 
respond with a cached reply rather than doing real work.  The cached 
reply may contain old file ID information, which triggers the "inode 
number mismatch" message you see in your log.

One way to detect if this is happening is to use "pktt" on the filer. 
You can capture a packet trace across client reboots to determine if

A) the transport socket's port number is the same across reboots, and

B) the RPC XID sequence is the same
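
To make the collision scenario above concrete, here is a minimal sketch of a server that replays cached replies. It is a toy model only: the class `ToyServer`, the cache key of (client IP, client port, XID), and the addresses and numbers used are all assumptions for illustration, and real servers key and expire their reply caches in their own ways.

```python
# Toy model of a server-side duplicate reply cache.
# All names and values here are hypothetical; this is not filer code.

class ToyServer:
    def __init__(self):
        # (client_ip, client_port, xid) -> reply sent for that request
        self.reply_cache = {}
        self.next_fileid = 0xb8d5e3

    def create(self, client_ip, client_port, xid, name):
        key = (client_ip, client_port, xid)
        if key in self.reply_cache:
            # Looks like a retransmission: replay the old reply, do no work.
            return self.reply_cache[key]
        fileid = self.next_fileid
        self.next_fileid += 1
        reply = {"name": name, "fileid": fileid}
        self.reply_cache[key] = reply
        return reply


server = ToyServer()

# Boot 1: the client creates a file using XID 1000 from port 800.
first = server.create("10.0.0.5", 800, 1000, "a.txt")

# Boot 2: same IP, same port, same starting XID, but a different request.
# The server cannot tell it apart from a retransmission of the old one.
second = server.create("10.0.0.5", 800, 1000, "b.txt")

# Boot 2 with a fresh XID: the server does real work.
third = server.create("10.0.0.5", 800, 1001, "b.txt")

print(first, second, third)
```

In this model the second request gets back the reply for "a.txt", stale file ID included, which is the kind of mismatch the client would then complain about.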

[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 290 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel
version:2.1
end:vcard



* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 17:02 ` Jeff Layton
@ 2007-09-11 17:43   ` Jeff Layton
  2007-09-11 21:11     ` Chris Carlson
  2007-09-28  0:11   ` Chris Carlson
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff Layton @ 2007-09-11 17:43 UTC (permalink / raw)
  To: Chris Carlson; +Cc: nfs

On Tue, 11 Sep 2007 13:02:26 -0400
Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 11 Sep 2007 09:41:51 -0700
> "Chris Carlson" <c.carlson@aristoslogic.com> wrote:
> 
> > 
> > We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
> > the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
> > runs to copy files from one mount point to another (both are different 
> > directories on the same NFS server mounted at different points).
> > 
> > After 30 copies of a hundred files are made, the system is rebooted and 
> > the test repeats.
> > 
> > After 2 reboots, an NFS file is created, and we get the following error 
> > from the kernel:
> > 
> > nfs_refresh_inode: inode number mismatch
> > expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
> > 
> 
> This may be a bug in the NetApp. I saw some similar messages when
> working on an issue and it turned out to be a filer bug. I ended
> up tracking it down by doing network captures, and then searching them
> for the 'expected' and 'got' sequence of bytes in wireshark. It showed
> that in some cases the netapp was sending back a new fileid in the
> WCC attributes for the dir when a create call would fail.
> 

For the record, the NetApp engineers I worked with on this issue referenced
NetApp BURT:

244015: We should not have pre/post attributes in case of an error coming
from exports code

You might want to check that your filer has that fix.
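
The capture search described above can be prepared with a small script. This is a sketch under assumptions: it treats the two numbers in each pair of the kernel message as filesystem id and file id, and it relies on XDR encoding integers big-endian, with the file id being 64 bits wide in NFSv3 and 32 bits in NFSv2. The helper names (`parse_mismatch`, `fileid_patterns`, `find_all`) are made up for this example.

```python
import re
import struct

# Matches the second line of the kernel message quoted in this thread.
MISMATCH_RE = re.compile(
    r"expected \(0x([0-9a-f]+)/0x([0-9a-f]+)\), got \(0x([0-9a-f]+)/0x([0-9a-f]+)\)"
)


def parse_mismatch(line):
    """Return the (fsid, fileid) pairs from an 'inode number mismatch' line."""
    m = MISMATCH_RE.search(line)
    if m is None:
        raise ValueError("not an inode number mismatch line")
    exp_fsid, exp_fileid, got_fsid, got_fileid = (int(g, 16) for g in m.groups())
    return {"expected": (exp_fsid, exp_fileid), "got": (got_fsid, got_fileid)}


def fileid_patterns(fileid):
    """Byte sequences a file id would appear as on the wire (big-endian XDR)."""
    patterns = {"nfsv3": struct.pack(">Q", fileid)}      # 64-bit file id
    if fileid < 2**32:
        patterns["nfsv2"] = struct.pack(">I", fileid)    # 32-bit file id
    return patterns


def find_all(haystack, needle):
    """Return every offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


line = "expected (0x11/0xdacea3), got (0x11/0xb8d5e3)"
ids = parse_mismatch(line)
for label, (fsid, fileid) in ids.items():
    for version, pattern in fileid_patterns(fileid).items():
        print(label, version, pattern.hex())
```

The printed hex strings can be pasted into a packet analyzer's byte search, or `find_all` can be run over the raw bytes of a capture file to locate candidate packets.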

-- 
Jeff Layton <jlayton@redhat.com>


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 17:43   ` Jeff Layton
@ 2007-09-11 21:11     ` Chris Carlson
  0 siblings, 0 replies; 27+ messages in thread
From: Chris Carlson @ 2007-09-11 21:11 UTC (permalink / raw)
  To: Jeff Layton; +Cc: nfs


Thanks a lot, Jeff.  It looks like this is the problem.  You just saved 
me from days of research and testing.

Chris


Jeff Layton wrote:
> On Tue, 11 Sep 2007 13:02:26 -0400
> Jeff Layton <jlayton@redhat.com> wrote:
>
>   
>> On Tue, 11 Sep 2007 09:41:51 -0700
>> "Chris Carlson" <c.carlson@aristoslogic.com> wrote:
>>
>>     
>>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
>>> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
>>> runs to copy files from one mount point to another (both are different 
>>> directories on the same NFS server mounted at different points).
>>>
>>> After 30 copies of a hundred files are made, the system is rebooted and 
>>> the test repeats.
>>>
>>> After 2 reboots, an NFS file is created, and we get the following error 
>>> from the kernel:
>>>
>>> nfs_refresh_inode: inode number mismatch
>>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>>
>>>       
>> This may be a bug in the NetApp. I saw some similar messages when
>> working on an issue and it turned out to be a filer bug. I ended
>> up tracking it down by doing network captures, and then searching them
>> for the 'expected' and 'got' sequence of bytes in wireshark. It showed
>> that in some cases the netapp was sending back a new fileid in the
>> WCC attributes for the dir when a create call would fail.
>>
>>     
>
> For the record, the NetApp engineers I worked with on this issue referenced
> NetApp BURT:
>
> 244015: We should not have pre/post attributes in case of an error coming
> from exports code
>
> You might want to check that your filer has that fix.
>
>   


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 17:09 ` Chuck Lever
@ 2007-09-11 21:15   ` Chris Carlson
  0 siblings, 0 replies; 27+ messages in thread
From: Chris Carlson @ 2007-09-11 21:15 UTC (permalink / raw)
  To: chuck.lever; +Cc: nfs


Thanks, Chuck.  Your directions and Jeff's helped a lot.

We are running Linux 2.4.20 on our clients and NetApps ONTAP 6.3.3 on 
the servers that are causing the problem.  Based on all of this info, it 
appears the Linux client is blameless.

Thanks again,
Chris


Chuck Lever wrote:
> Hi Chris-
>
> Chris Carlson wrote:
>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers 
>> as the root filesystem and Linux 2.6 mounted filesystems.  A simple 
>> test runs to copy files from one mount point to another (both are 
>> different directories on the same NFS server mounted at different 
>> points).
>>
>> After 30 copies of a hundred files are made, the system is rebooted 
>> and the test repeats.
>>
>> After 2 reboots, an NFS file is created, and we get the following 
>> error from the kernel:
>>
>> nfs_refresh_inode: inode number mismatch
>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>
>> We're just trying to figure out what to do to figure out what the 
>> problem is.  Is there a good place to place printks or breakpoints?
>
> This may be due to an RPC XID collision.  Which 2.4 kernel are you 
> using?  The Linux NFS client may be sending the same XID sequence on 
> the same port number after each reboot, in which case the server will 
> respond with a cached reply rather than doing real work.  The cached 
> reply may contain old file ID information, which triggers the "inode 
> number mismatch" message you see in your log.
>
> One way to detect if this is happening is to use "pktt" on the filer. 
> You can capture a packet trace across client reboots to determine if
>
> A) the transport socket's port number is the same across reboots, and
>
> B) the RPC XID sequence is the same


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 16:41 nfs_refresh_inode: inode number mismatch Chris Carlson
  2007-09-11 17:02 ` Jeff Layton
  2007-09-11 17:09 ` Chuck Lever
@ 2007-09-21 20:46 ` Trond Myklebust
  2 siblings, 0 replies; 27+ messages in thread
From: Trond Myklebust @ 2007-09-21 20:46 UTC (permalink / raw)
  To: Chris Carlson; +Cc: nfs

On Tue, 2007-09-11 at 09:41 -0700, Chris Carlson wrote:
> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
> runs to copy files from one mount point to another (both are different 
> directories on the same NFS server mounted at different points).
> 
> After 30 copies of a hundred files are made, the system is rebooted and 
> the test repeats.
> 
> After 2 reboots, an NFS file is created, and we get the following error 
> from the kernel:
> 
> nfs_refresh_inode: inode number mismatch
> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
> 
> We're just trying to figure out what to do to figure out what the 
> problem is.  Is there a good place to place printks or breakpoints?

It is a known problem with some versions of OnTap: they sometimes return
corrupted attribute information when an operation is denied due to a
'read-only' export option.
You should be able to fix the problem by upgrading to a more recent
version of OnTap.

Cheers
  Trond



* Re: nfs_refresh_inode: inode number mismatch
  2007-09-11 17:02 ` Jeff Layton
  2007-09-11 17:43   ` Jeff Layton
@ 2007-09-28  0:11   ` Chris Carlson
  2007-09-28 14:52     ` Chuck Lever
  1 sibling, 1 reply; 27+ messages in thread
From: Chris Carlson @ 2007-09-28  0:11 UTC (permalink / raw)
  To: nfs

A few weeks ago, I asked for assistance in finding the cause for an 
issue with NFS we were experiencing.  The original message is below.

We followed up on a response we received that pointed to an old 
version of the OnTap system on our NetApps servers.  Apparently, it is a 
known caching problem with NetApps NFS servers.

Suddenly, we discovered the same problem with our Snap Appliance 
servers.  Now we can't blame it on NetApps.

A theory we came up with was that the real-time clock on our boards is 
not operational.  Is it possible that, across our frequent reboots, the 
sequence numbers of NFS RPC calls coincide with those of previous runs, and 
the server is responding with cached replies that carried the same sequence 
numbers on a previous run?

I have noticed that in Linux 2.4, the random seed appears to be 
generated from the lower 16 bits of the MAC address.  This implies to me 
that it is quite likely the sequence numbers would be identical from one 
run to the next.

Does our theory that the server is sending cached responses sound plausible?

Thanks for your time,
Chris

> On Tue, 11 Sep 2007 09:41:51 -0700
> "Chris Carlson" <c.carlson@aristoslogic.com> wrote:
>
>   
>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
>> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
>> runs to copy files from one mount point to another (both are different 
>> directories on the same NFS server mounted at different points).
>>
>> After 30 copies of a hundred files are made, the system is rebooted and 
>> the test repeats.
>>
>> After 2 reboots, an NFS file is created, and we get the following error 
>> from the kernel:
>>
>> nfs_refresh_inode: inode number mismatch
>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>
>>     
>
>   
>   


* Re: nfs_refresh_inode: inode number mismatch
  2007-09-28  0:11   ` Chris Carlson
@ 2007-09-28 14:52     ` Chuck Lever
  0 siblings, 0 replies; 27+ messages in thread
From: Chuck Lever @ 2007-09-28 14:52 UTC (permalink / raw)
  To: Chris Carlson; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 3289 bytes --]

Hi Chris-

Chris Carlson wrote:
> A few weeks ago, I asked for assistance in finding the cause for an 
> issue with NFS we were experiencing.  The original message is below.
> 
> We followed up on a response we received that pointed to an old 
> version of the OnTap system on our NetApps servers.  Apparently, it is a 
> known caching problem with NetApps NFS servers.
> 
> Suddenly, we discovered the same problem with our Snap Appliance 
> servers.  Now we can't blame it on NetApps.
> 
> A theory we came up with was that the real-time clock on our boards is 
> not operational.  Is it possible that, across our frequent reboots, the 
> sequence numbers of NFS RPC calls coincide with those of previous runs, and 
> the server is responding with cached replies that carried the same sequence 
> numbers on a previous run?
> 
> I have noticed that in Linux 2.4, the random seed appears to be 
> generated from the lower 16 bits of the MAC address.  This implies to me 
> that it is quite likely the sequence numbers would be identical from one 
> run to the next.
> 
> Does our theory that the server is sending cached responses sound plausible?

This theory is what I suggested in my reply to your original e-mail.  So 
I think it's plausible!  :-)

If the client's XID generator starts at the same value after every 
reboot, the port number the client uses to connect is the same, and the 
client's IP address is the same, the server has little to distinguish 
fresh RPC requests from old ones.
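
The effect of a fixed seed can be sketched as follows. This is an illustration under assumptions, not the kernel's actual XID generator: the mixing function in `xid_sequence` is a simple stand-in, `seed_from_mac` mirrors the "lower 16 bits of the MAC address" observation quoted above, and the MAC address and the `boot_counter` value are hypothetical.

```python
def seed_from_mac(mac):
    """Seed taken from the lower 16 bits of the MAC address."""
    octets = bytes.fromhex(mac.replace(":", ""))
    return int.from_bytes(octets[-2:], "big")


def xid_sequence(seed, count=5):
    """Derive a starting XID from the seed, then count upward (32-bit wrap).

    The multiply-and-add step is only a stand-in for whatever mixing a real
    client performs; the point is that it is fully determined by the seed.
    """
    first = (seed * 1103515245 + 12345) & 0xFFFFFFFF
    return [(first + i) & 0xFFFFFFFF for i in range(count)]


mac = "00:11:22:33:44:55"          # hypothetical board address

# Two boots with no clock and no other entropy: identical seed both times.
boot1 = xid_sequence(seed_from_mac(mac))
boot2 = xid_sequence(seed_from_mac(mac))

# Mixing in something that changes per boot (here a hypothetical persistent
# boot counter) gives the second boot a different starting XID.
boot_counter = 1
boot3 = xid_sequence(seed_from_mac(mac) ^ (boot_counter << 16))

print([hex(x) for x in boot1])
print([hex(x) for x in boot2])
print([hex(x) for x in boot3])
```

With the same client IP address and port on top of an identical XID sequence, requests from the second boot look to the server exactly like retransmissions from the first.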

>> On Tue, 11 Sep 2007 09:41:51 -0700
>> "Chris Carlson" <c.carlson@aristoslogic.com> wrote:
>>
>>   
>>> We are running MontaVista Embedded Linux 2.4 with NetApps NFS servers as 
>>> the root filesystem and Linux 2.6 mounted filesystems.  A simple test 
>>> runs to copy files from one mount point to another (both are different 
>>> directories on the same NFS server mounted at different points).
>>>
>>> After 30 copies of a hundred files are made, the system is rebooted and 
>>> the test repeats.
>>>
>>> After 2 reboots, an NFS file is created, and we get the following error 
>>> from the kernel:
>>>
>>> nfs_refresh_inode: inode number mismatch
>>> expected (0x11/0xdacea3), got (0x11/0xb8d5e3)
>>>
>>>     
>>   
>>   
> 


end of thread, other threads:[~2007-09-28 14:52 UTC | newest]

Thread overview: 27+ messages
2007-09-11 16:41 nfs_refresh_inode: inode number mismatch Chris Carlson
2007-09-11 17:02 ` Jeff Layton
2007-09-11 17:43   ` Jeff Layton
2007-09-11 21:11     ` Chris Carlson
2007-09-28  0:11   ` Chris Carlson
2007-09-28 14:52     ` Chuck Lever
2007-09-11 17:09 ` Chuck Lever
2007-09-11 21:15   ` Chris Carlson
2007-09-21 20:46 ` Trond Myklebust
  -- strict thread matches above, loose matches on Subject: below --
2003-06-03 23:54 Frank Cusack
2003-06-04 14:19 ` Trond Myklebust
2003-06-04 21:20   ` Frank Cusack
2003-06-04 21:28     ` Trond Myklebust
2003-06-05  9:11     ` Adrian Cox
2003-06-05  9:13       ` Russell King
2003-06-05 13:51         ` Trond Myklebust
2001-07-17  0:24 Marco d'Itri
2001-07-17  9:44 ` Trond Myklebust
2001-07-18 22:25   ` Marco d'Itri
2001-07-19 11:00     ` Trond Myklebust
2001-02-22 22:30 Scott A McConnell
2001-02-22 21:59 ` Russell King
2001-02-23  9:30   ` Trond Myklebust
2001-02-08  1:13 Jun Sun
2001-02-08  1:22 ` Neil Brown
2001-02-08  8:08   ` Russell King
2001-02-09  0:02     ` Jun Sun
