* nfs problems with 2.6.18-rc1
@ 2006-07-13 18:22 Janos Farkas
2006-07-16 23:23 ` Neil Brown
2006-07-17 9:03 ` Tony Reix
0 siblings, 2 replies; 10+ messages in thread
From: Janos Farkas @ 2006-07-13 18:22 UTC (permalink / raw)
To: linux-kernel, nfs
Hi!
I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
weird things with the nfs mounted /home s.
I frequently face EACCESs where a few minutes ago there wasn't any
problem, and after a retry everything does work again.
An example that easily trips it is keeping mutt open
on a single mailbox file (strace -tt| grep stat):
20:04:08.393815 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000, ...}) = 0
20:08:41.859168 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000, ...}) = 0
20:09:30.975876 stat64("mailbox", 0xbfe8966c) = -1 EACCES (Permission denied)
This results in a bit scary "Mailbox was corrupted!" message, but
otherwise harmless. Reopening the mailbox succeeds immediately.
A sample session with an rsync session updating files on the nfs mounted
/home/:
-----
> rsync...
receiving file list ... done
file1
rsync: close failed on "/home/path/.file1.UgEmSh": Permission denied (13)
rsync error: error in file IO (code 11) at receiver.c(628) [receiver]
rsync: connection unexpectedly closed (2490 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(471) [generator]
> rsync...
receiving file list ... done
rsync: recv_generator: failed to stat "/home/path/file2": Permission denied (13)
rsync: recv_generator: failed to stat "/home/path/file3": Permission denied (13)
rsync: recv_generator: failed to stat "/home/path/file4": Permission denied (13)
rsync: recv_generator: failed to stat "/home/path/file5": Permission denied (13)
rsync: recv_generator: failed to stat "/home/path/file6": Permission denied (13)
rsync: recv_generator: failed to stat "/home/path/file7": Permission denied (13)
-----
I also think this is related in the dmesg. Think, because there's no
other trace of any "read error" on the disks, and the comments in
mm/filemap.c (but the message is not that much help to be sure of this).
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 0K
The relevant part of the /proc/mounts file:
-----
automount(pid1831) /home autofs rw 0 0
HOST:/export/PATH /home/path nfs rw,vers=3,rsize=8192,wsize=8192,hard,intr,nolock,proto=udp,timeo=7,retrans=3,sec=sys,addr=HOST 0 0
-----
How can I help with tracing this? git bisecting on these machines takes
at least an hour per step, (and no reasonable connectivity either to
compile elsewhere much quicker).
Janos
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: nfs problems with 2.6.18-rc1
2006-07-13 18:22 nfs problems with 2.6.18-rc1 Janos Farkas
@ 2006-07-16 23:23 ` Neil Brown
2006-07-17 10:08 ` Janos Farkas
2006-07-17 9:03 ` Tony Reix
1 sibling, 1 reply; 10+ messages in thread
From: Neil Brown @ 2006-07-16 23:23 UTC (permalink / raw)
To: Janos Farkas; +Cc: linux-kernel, nfs
On Thursday July 13, chexum+dev@gmail.com wrote:
> Hi!
>
> I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
> weird things with the nfs mounted /home s.
So this is both the client and the server that you upgraded? That
makes is harder to point the finger of blame :-)
>
> I frequently face EACCESs where a few minutes ago there wasn't any
> problem, and after a retry everything does work again.
>
I wonder if that is pointing the finger at
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8c7b389e532e964f07057dac8a56c43465544759
as that is a recent change that returns 'EACCES'... but I cannot see
that being relevant in this case as it only affects directories.
>
> How can I help with tracing this? git bisecting on these machines takes
> at least an hour per step, (and no reasonable connectivity either to
> compile elsewhere much quicker).
The standard answer for tracing nfs problems if 'tcpdump'.
e.g.
tcpdump -s 0 -w /tmp/trace host $CLIENT and host $SERVER and port 2049
that should show whether the error is coming from the server, or if
the client is generating it all by itself.
If you can get a reasonably small '/tmp/trace', compress it and attach
it to an email.
Also turn on tracing. Something like:
on server
echo 32767 > /proc/sys/sunrpc/nfsd_debug
on client
echo 32767 > /proc/sys/sunrpc/nfs_debug
You can be a bit more selective by only enabling individual flags.
For the server, these are in include/linux/nfsd/debug.h
You probably want FH, EXPORT AUTH PROC FILEOP
For the client, they are near the end of include/linux/nfs_fs.h
Not sure which to choose... maybe just all of them.
NeilBrown
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: nfs problems with 2.6.18-rc1
2006-07-16 23:23 ` Neil Brown
@ 2006-07-17 10:08 ` Janos Farkas
2006-07-18 12:20 ` Janos Farkas
0 siblings, 1 reply; 10+ messages in thread
From: Janos Farkas @ 2006-07-17 10:08 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel, nfs
On 2006-07-17 at 09:23:38, Neil Brown wrote:
> On Thursday July 13, chexum+dev@gmail.com wrote:
> > I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
> > weird things with the nfs mounted /home s.
> So this is both the client and the server that you upgraded? That
> makes is harder to point the finger of blame :-)
Yeah, smart thing :) Thank you for the response! When I have
downgraded the server to 2.6.17, none of those errors happened again.
(And I left the setup that way, because some programs have been
corrupting their config files because of the spurious read errors --
most notably zsh and firefox.)
> I wonder if that is pointing the finger at
>
> http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8c7b389e532e964f07057dac8a56c43465544759
>
> as that is a recent change that returns 'EACCES'... but I cannot see
> that being relevant in this case as it only affects directories.
As far as I can see, that patch is from May 21st, and went in 17-rc5,
thus obviously 2.6.17 is newer than that... But a bunch of nfs changes
happened in 18-rc1.
> The standard answer for tracing nfs problems if 'tcpdump'.
> e.g.
> tcpdump -s 0 -w /tmp/trace host $CLIENT and host $SERVER and port 2049
>
> that should show whether the error is coming from the server, or if
> the client is generating it all by itself.
Well, I tried that, but I couldn't see errors flying on the wire. That
would point to flaws in the 18-rc1 client (but only surfacing with an
rc1 server?), or in my eyes :)
> Also turn on tracing. Something like:
> on server
> echo 32767 > /proc/sys/sunrpc/nfsd_debug
> on client
> echo 32767 > /proc/sys/sunrpc/nfs_debug
>
> You can be a bit more selective by only enabling individual flags.
> For the server, these are in include/linux/nfsd/debug.h
> You probably want FH, EXPORT AUTH PROC FILEOP
>
> For the client, they are near the end of include/linux/nfs_fs.h
> Not sure which to choose... maybe just all of them.
Thanks, that's probably what I've been looking for. I'll try to debug
this with a 2.6.18-rc2(client):2.6.17(server) and 2.6.18-rc2(both) setup
later this day, but those machines are currently in the process of
reorganizing their RAID config :)
--
Janos
romfs is at http://romfs.sourceforge.net/
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: nfs problems with 2.6.18-rc1
2006-07-17 10:08 ` Janos Farkas
@ 2006-07-18 12:20 ` Janos Farkas
2006-07-18 23:29 ` Neil Brown
0 siblings, 1 reply; 10+ messages in thread
From: Janos Farkas @ 2006-07-18 12:20 UTC (permalink / raw)
To: Neil Brown, linux-kernel, nfs
> On 2006-07-17 at 09:23:38, Neil Brown wrote:
> > The standard answer for tracing nfs problems if 'tcpdump'.
> > e.g.
> > tcpdump -s 0 -w /tmp/trace host $CLIENT and host $SERVER and port 2049
> >
> > that should show whether the error is coming from the server, or if
> > the client is generating it all by itself.
Closing in, I have a dump between these two machines running 18-rc2 that
has the error on the wire, but I'm not sure how much more would be
relevant:
(This is probably a successful stat() of the "mailbox" file (having the
size of 39074 bytes))
13:34:56.382832 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A20000002D
13:34:56.383109 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:34:56.383328 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A20000002D
13:34:56.383556 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:34:56.383723 access fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A20000002D 002d
13:34:56.383951 reply ok 120 access attr: REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222475.000000 1153173671.000000 1153222475.000000 c 000d
13:34:56.384241 setattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000
13:34:56.384585 reply ok 144 setattr PRE: sz 39074 mtime 1153173671.000000 ctime 1153222475.000000 POST: REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222475.000000 1153173671.000000 1153222496.000000
13:34:56.386376 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000
13:34:56.387209 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:34:56.387489 getattr fh Unknown/0100000100FD000002000000755104000AA487A2755104000AA487A200000000
13:34:56.388137 reply ok 112 getattr DIR 40755 ids 2001/500 sz 4096
13:34:56.388317 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000
13:34:56.388560 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
(... after a few lock files, reading through the whole file)
13:34:56.392869 lookup fh Unknown/0100000100FD000002000000755104000AA487A2000000076D61696C626F7800 "mailbox"
13:34:56.393129 reply ok 236 lookup fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000001 REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222475.000000 1153173671.000000 1153222496.000000 post dattr: DIR 40755 ids 2001/500 sz 4096 nlink 2 rdev 0/0 fsid fd00 fileid 45175 a/m/ctime 1153222413.000000 1153222496.000000 1153222496.000000
13:34:56.393325 access fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A20000002D 002d
13:34:56.393556 reply ok 120 access attr: REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222475.000000 1153173671.000000 1153222496.000000 c 000d
13:34:56.393953 read fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000 8192 bytes @ 0
13:34:56.393967 read fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000 8192 bytes @ 8192
13:34:56.394631 reply ok 1472 read REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 8192 bytes
13:34:56.395717 reply ok 1472 read REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 8192 bytes
13:34:56.398241 read fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000 8192 bytes @ 16384
13:34:56.398257 read fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000000 8192 bytes @ 24576
13:34:56.398266 read fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A2000000006306 bytes @ 32768
13:34:56.399338 reply ok 1472 read REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 8192 bytes
13:34:56.400120 reply ok 1472 read REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 8192 bytes
13:34:56.401822 reply ok 1472 read REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 6306 bytes EOF
13:34:56.403059 getattr fh Unknown/0100000100FD000002000000755104000AA487A2742049207468696E6B0A6162
13:34:56.403359 reply ok 112 getattr DIR 40755 ids 2001/500 sz 4096
13:34:56.403538 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A26B0A6162
13:34:56.403743 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:34:56.403931 lookup fh Unknown/0100000100FD000002000000755104000AA487A20000000C6D61696C626F782E "mailbox.lock"
13:34:56.404298 reply ok 236 lookup fh Unknown/0100000200FD000002000000775104000EA487A2755104000AA487A200000001 REG 100000 ids 2001/500 sz 0 n link 1 rdev 0/0 fsid fd00 fileid 45177 a/m/ctime 1153222496.000000 1153222496.000000 1153222496.000000 post dattr: DIR 40755 ids 2001/500 sz 4096 nlink 2 rdev 0/0 fsid fd00 fileid 45175 a/m/ctime 1153222413.000000 1153222496.000000 1153222496.000000
13:34:56.404484 remove fh Unknown/0100000100FD000002000000755104000AA487A20000000C6D61696C626F782E "mailbox.lock"
13:34:56.404844 reply ok 120 remove PRE: POST: DIR 40755 ids 2001/500 sz 4096 nlink 2 rdev 0/0 fsid fd00 fileid 45175 a/m/ctime 1153222413.000000 1153222496.000000 1153222496.000000
13:34:56.405728 lookup fh Unknown/0100000100FD000002000000755104000AA487A2000000076D61696C626F7800 "mailbox"
13:34:56.406419 reply ok 236 lookup fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200000001 REG 100600 ids 2001/500 sz 39074 nlink 1 rdev 0/0 fsid fd00 fileid 45176 a/m/ctime 1153222496.000000 1153173671.000000 1153222496.000000 post dattr: DIR 40755 ids 2001/500 sz 4096 nlink 2 rdev 0/0 fsid fd00 fileid 45175 a/m/ctime 1153222413.000000 1153222496.000000 1153222496.000000
(More operations, and pause. After a while, pressing a key makes mutt to
revalidate the mailbox file, nothing more skipped before the error)
13:35:23.139297 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200030001
13:35:23.139958 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:36:29.902394 access fh Unknown/0100000100FD000002000000755104000AA487A20000001F0AA487A200030001 001f
13:36:29.903109 reply ok 120 access attr: DIR 40755 ids 2001/500 sz 4096 nlink 2 rdev 0/0 fsid fd00 fileid 45175 a/m/ctime 1153222413.000000 1153222496.000000 1153222496.000000 c 001f
13:36:29.903314 getattr fh Unknown/0100000200FD000002000000765104000BA487A2755104000AA487A200030001
13:36:29.903603 reply ok 112 getattr REG 100600 ids 2001/500 sz 39074
13:37:41.196223 getattr fh Unknown/0100000000FD000002000000000000000004517644BCC7600000000044BC08A7
13:37:41.196678 reply ok 112 getattr DIR 40775 ids 0/0 sz 4096
13:37:41.196903 getattr fh Unknown/0100000000FD000402000000000000000000000244BCA4790000000044A22F45
13:37:41.197146 reply ok 112 getattr DIR 40755 ids 0/0 sz 1024
13:37:41.197304 getattr fh Unknown/0100000000FD000102000000000000000000000244BCB6A80000000042CACA97
13:37:41.197494 reply ok 112 getattr DIR 40755 ids 0/0 sz 4096
13:37:51.254708 access fh Unknown/0100000100FD000002000000755104000AA487A20000001F0AA487A200030001 001f
13:37:51.255375 reply ok 32 access ERROR: Permission denied attr:
13:37:51.255924 access fh Unknown/0100000000FD0001020000000000001F0AA487A20000001F0AA487A200030001 001f
13:37:51.256216 reply ok 120 access attr: DIR 40755 ids 0/0 sz 4096 nlink 4 rdev 0/0 fsid fd01 fileid 2 a/m/ctime 1153217912.000000 1139267032.000000 1139267032.000000 c 0003
13:37:51.256434 getattr fh Unknown/0100000100FD00010200000001DC0500E909CCBB0000001F0AA487A200030001
13:37:51.256655 reply ok 112 getattr DIR 40755 ids 0/0 sz 4096
13:37:51.256821 access fh Unknown/0100000100FD00010200000001DC0500E909CCBB0000001F0AA487A200030001 001f
13:37:51.257048 reply ok 120 access attr: DIR 40755 ids 0/0 sz 4096 nlink 3 rdev 0/0 fsid fd01 fileid 5dc01 a/m/ctime 1153217912.000000 1120496574.000000 1120496574.000000 c 0003
Apparently, the access that results in EACCESS is having the same
filehandle as one above that suceeded, but it is an fh of the current
directory that contained the mailbox.lock file (judging by the fileid).
But it's still confusing to me. I also have this in raw tcpdump
format. Would the client and/or server debug log be more helpful?
Janos
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: nfs problems with 2.6.18-rc1
2006-07-18 12:20 ` Janos Farkas
@ 2006-07-18 23:29 ` Neil Brown
2006-07-20 8:11 ` [NFS] " Frank van Maarseveen
0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2006-07-18 23:29 UTC (permalink / raw)
To: Janos Farkas; +Cc: linux-kernel, nfs
On Tuesday July 18, chexum+dev@gmail.com wrote:
> > On 2006-07-17 at 09:23:38, Neil Brown wrote:
> > > The standard answer for tracing nfs problems if 'tcpdump'.
> > > e.g.
> > > tcpdump -s 0 -w /tmp/trace host $CLIENT and host $SERVER and port 2049
> > >
> > > that should show whether the error is coming from the server, or if
> > > the client is generating it all by itself.
>
> Closing in, I have a dump between these two machines running 18-rc2 that
> has the error on the wire, but I'm not sure how much more would be
> relevant:
Hmmm...
Interesting, but confusing.
The filehandle of the directory seems to keep changing. Maybe tcpdump
is showing too many bytes. Access to the raw dump would help.
> 13:37:51.254708 access fh Unknown/0100000100FD000002000000755104000AA487A20000001F0AA487A200030001 001f
> 13:37:51.255375 reply ok 32 access ERROR: Permission denied attr:
'access' should never return 'Permission denied' so there is
definitely something wrong here. I was expected nfserr_acces rather
than nfserr_perm...
The server logs at the same time would help a lot.
NeilBrown
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [NFS] nfs problems with 2.6.18-rc1
2006-07-18 23:29 ` Neil Brown
@ 2006-07-20 8:11 ` Frank van Maarseveen
0 siblings, 0 replies; 10+ messages in thread
From: Frank van Maarseveen @ 2006-07-20 8:11 UTC (permalink / raw)
To: Neil Brown; +Cc: Janos Farkas, nfs, linux-kernel
On Wed, Jul 19, 2006 at 09:29:28AM +1000, Neil Brown wrote:
[...]
>
> > 13:37:51.254708 access fh Unknown/0100000100FD000002000000755104000AA487A20000001F0AA487A200030001 001f
> > 13:37:51.255375 reply ok 32 access ERROR: Permission denied attr:
>
> 'access' should never return 'Permission denied' so there is
> definitely something wrong here. I was expected nfserr_acces rather
> than nfserr_perm...
Maybe a silly remark but:
EACCES = Permission denied
EPERM = Operation not permitted
--
Frank
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [NFS] nfs problems with 2.6.18-rc1
2006-07-13 18:22 nfs problems with 2.6.18-rc1 Janos Farkas
2006-07-16 23:23 ` Neil Brown
@ 2006-07-17 9:03 ` Tony Reix
2006-07-17 11:25 ` Janos Farkas
1 sibling, 1 reply; 10+ messages in thread
From: Tony Reix @ 2006-07-17 9:03 UTC (permalink / raw)
To: Janos Farkas; +Cc: Aimé Le Rouzic, linux-kernel, nfs
Le jeudi 13 juillet 2006 à 20:22 +0200, Janos Farkas a écrit :
> Hi!
>
> I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
> weird things with the nfs mounted /home s.
>
> I frequently face EACCESs where a few minutes ago there wasn't any
> problem, and after a retry everything does work again.
>
....
Do you have the same problem with the last stable version (2.6.17-rc2)
we have tested with in depth ?
See: http://nfsv4.bullopensource.org/
Thanks,
Tony
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: nfs problems with 2.6.18-rc1
2006-07-17 9:03 ` Tony Reix
@ 2006-07-17 11:25 ` Janos Farkas
0 siblings, 0 replies; 10+ messages in thread
From: Janos Farkas @ 2006-07-17 11:25 UTC (permalink / raw)
To: Tony Reix; +Cc: Aimé Le Rouzic, linux-kernel, nfs
On 2006-07-17 at 11:03:46, Tony Reix wrote:
> Le jeudi 13 juillet 2006 ? 20:22 +0200, Janos Farkas a écrit :
> > I recently updated two (old) hosts to 2.6.18-rc1, and started
> > noticing
> > weird things with the nfs mounted /home s.
> >
> > I frequently face EACCESs where a few minutes ago there wasn't any
> > problem, and after a retry everything does work again.
> Do you have the same problem with the last stable version (2.6.17-rc2)
> we have tested with in depth ?
> See: http://nfsv4.bullopensource.org/
Even stock 2.6.17 as server works for me with a 2.6.18-rc1 client (v3).
--
Janos
romfs is at http://romfs.sourceforge.net/
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [NFS] nfs problems with 2.6.18-rc1
@ 2006-07-13 20:15 Ottman, Tom
0 siblings, 0 replies; 10+ messages in thread
From: Ottman, Tom @ 2006-07-13 20:15 UTC (permalink / raw)
To: Janos Farkas, linux-kernel, nfs
All of today I have been receiving your email strings, and I am not
aware why.
The reference that is close is the NFS, which on my end is National
Flood Services in Kalispell, MT.
I you could remove me from your distribution list, it would be greatly
appreciated.
Thanks,
Thomas Ottman
Program Manager, NFS
(888) 888-2169 ext 545
Fax (406) 257-1292
tom.ottman@nfsmt.com
-----Original Message-----
From: nfs-bounces@lists.sourceforge.net
[mailto:nfs-bounces@lists.sourceforge.net] On Behalf Of Janos Farkas
Sent: Thursday, July 13, 2006 12:22 PM
To: linux-kernel@vger.kernel.org; nfs@lists.sourceforge.net
Subject: [NFS] nfs problems with 2.6.18-rc1
Hi!
I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
weird things with the nfs mounted /home s.
I frequently face EACCESs where a few minutes ago there wasn't any
problem, and after a retry everything does work again.
An example that easily trips it is keeping mutt open
on a single mailbox file (strace -tt| grep stat):
20:04:08.393815 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000,
...}) = 0
20:08:41.859168 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000,
...}) = 0
20:09:30.975876 stat64("mailbox", 0xbfe8966c) = -1 EACCES (Permission
denied)
This results in a bit scary "Mailbox was corrupted!" message, but
otherwise harmless. Reopening the mailbox succeeds immediately.
A sample session with an rsync session updating files on the nfs mounted
/home/:
-----
> rsync...
receiving file list ... done
file1
rsync: close failed on "/home/path/.file1.UgEmSh": Permission denied
(13)
rsync error: error in file IO (code 11) at receiver.c(628) [receiver]
rsync: connection unexpectedly closed (2490 bytes received so far)
[generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(471)
[generator]
> rsync...
receiving file list ... done
rsync: recv_generator: failed to stat "/home/path/file2": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file3": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file4": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file5": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file6": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file7": Permission
denied (13)
-----
I also think this is related in the dmesg. Think, because there's no
other trace of any "read error" on the disks, and the comments in
mm/filemap.c (but the message is not that much help to be sure of this).
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 0K
The relevant part of the /proc/mounts file:
-----
automount(pid1831) /home autofs rw 0 0
HOST:/export/PATH /home/path nfs
rw,vers=3,rsize=8192,wsize=8192,hard,intr,nolock,proto=udp,timeo=7,retra
ns=3,sec=sys,addr=HOST 0 0
-----
How can I help with tracing this? git bisecting on these machines takes
at least an hour per step, (and no reasonable connectivity either to
compile elsewhere much quicker).
Janos
------------------------------------------------------------------------
-
Using Tomcat but need to do more? Need to support web services,
security?
Get stuff done quickly with pre-integrated technology to make your job
easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 10+ messages in thread* RE: [NFS] nfs problems with 2.6.18-rc1
@ 2006-07-13 20:38 Gilman, Mark
0 siblings, 0 replies; 10+ messages in thread
From: Gilman, Mark @ 2006-07-13 20:38 UTC (permalink / raw)
To: Ottman, Tom, Janos Farkas, linux-kernel, nfs
Remove me too please.
-----Original Message-----
From: nfs-bounces@lists.sourceforge.net
[mailto:nfs-bounces@lists.sourceforge.net] On Behalf Of Ottman, Tom
Sent: Thursday, July 13, 2006 2:15 PM
To: Janos Farkas; linux-kernel@vger.kernel.org;
nfs@lists.sourceforge.net
Subject: Re: [NFS] nfs problems with 2.6.18-rc1
All of today I have been receiving your email strings, and I am not
aware why.
The reference that is close is the NFS, which on my end is National
Flood Services in Kalispell, MT.
I you could remove me from your distribution list, it would be greatly
appreciated.
Thanks,
Thomas Ottman
Program Manager, NFS
(888) 888-2169 ext 545
Fax (406) 257-1292
tom.ottman@nfsmt.com
-----Original Message-----
From: nfs-bounces@lists.sourceforge.net
[mailto:nfs-bounces@lists.sourceforge.net] On Behalf Of Janos Farkas
Sent: Thursday, July 13, 2006 12:22 PM
To: linux-kernel@vger.kernel.org; nfs@lists.sourceforge.net
Subject: [NFS] nfs problems with 2.6.18-rc1
Hi!
I recently updated two (old) hosts to 2.6.18-rc1, and started noticing
weird things with the nfs mounted /home s.
I frequently face EACCESs where a few minutes ago there wasn't any
problem, and after a retry everything does work again.
An example that easily trips it is keeping mutt open
on a single mailbox file (strace -tt| grep stat):
20:04:08.393815 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000,
...}) = 0
20:08:41.859168 stat64("mailbox", {st_mode=S_IFREG|0600, st_size=401000,
...}) = 0
20:09:30.975876 stat64("mailbox", 0xbfe8966c) = -1 EACCES (Permission
denied)
This results in a bit scary "Mailbox was corrupted!" message, but
otherwise harmless. Reopening the mailbox succeeds immediately.
A sample session with an rsync session updating files on the nfs mounted
/home/:
-----
> rsync...
receiving file list ... done
file1
rsync: close failed on "/home/path/.file1.UgEmSh": Permission denied
(13)
rsync error: error in file IO (code 11) at receiver.c(628) [receiver]
rsync: connection unexpectedly closed (2490 bytes received so far)
[generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(471)
[generator]
> rsync...
receiving file list ... done
rsync: recv_generator: failed to stat "/home/path/file2": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file3": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file4": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file5": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file6": Permission
denied (13)
rsync: recv_generator: failed to stat "/home/path/file7": Permission
denied (13)
-----
I also think this is related in the dmesg. Think, because there's no
other trace of any "read error" on the disks, and the comments in
mm/filemap.c (but the message is not that much help to be sure of this).
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 28K
Reducing readahead size to 4K
Reducing readahead size to 0K
The relevant part of the /proc/mounts file:
-----
automount(pid1831) /home autofs rw 0 0
HOST:/export/PATH /home/path nfs
rw,vers=3,rsize=8192,wsize=8192,hard,intr,nolock,proto=udp,timeo=7,retra
ns=3,sec=sys,addr=HOST 0 0
-----
How can I help with tracing this? git bisecting on these machines takes
at least an hour per step, (and no reasonable connectivity either to
compile elsewhere much quicker).
Janos
------------------------------------------------------------------------
-
Using Tomcat but need to do more? Need to support web services,
security?
Get stuff done quickly with pre-integrated technology to make your job
easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
------------------------------------------------------------------------
-
Using Tomcat but need to do more? Need to support web services,
security?
Get stuff done quickly with pre-integrated technology to make your job
easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-07-20 8:11 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-13 18:22 nfs problems with 2.6.18-rc1 Janos Farkas
2006-07-16 23:23 ` Neil Brown
2006-07-17 10:08 ` Janos Farkas
2006-07-18 12:20 ` Janos Farkas
2006-07-18 23:29 ` Neil Brown
2006-07-20 8:11 ` [NFS] " Frank van Maarseveen
2006-07-17 9:03 ` Tony Reix
2006-07-17 11:25 ` Janos Farkas
-- strict thread matches above, loose matches on Subject: below --
2006-07-13 20:15 [NFS] " Ottman, Tom
2006-07-13 20:38 Gilman, Mark
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox