* nfs problem 2.4.19-pre9
@ 2002-06-01 14:02 Kenneth Johansson
2002-06-01 19:09 ` Trond Myklebust
0 siblings, 1 reply; 6+ messages in thread
From: Kenneth Johansson @ 2002-06-01 14:02 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]
I have had a problem for some time that processes get stuck in D state
and I now have a way to get this to happen at will.
One way to do this is to copy a file from one nfs mounted directory to
another. It dose not happen on the same mount and not when copying from
nfs to a local disk. To make this even more complex it works with cp and
mv but not in mc(midnight commander F6 ).
The nfs mounts is from the same server and same disk on that server,
mounted using automount.
linux version is
client 2.4.19-pre9
server 2.4.19-pre8 using reiserfs on a raid1 partition.
here is a calltrace on mc when it is dead.
Trace; c0129875 <__lock_page+a1/c8>
Trace; c01298b1 <lock_page+15/1c>
Trace; c0129faa <do_generic_file_read+28e/464>
Trace; c012a466 <generic_file_read+7e/130>
Trace; c012a360 <file_read_actor+0/88>
Trace; c0173a2d <nfs_file_read+9d/ac>
Trace; c01372f7 <sys_read+8f/100>
Trace; c01088cb <system_call+33/38>
The nfs mount is unusable after this. Every process that uses it enters
D state.
I have attached a strace of mc when moving file /home/ken/3 to
/delta/kernel/
mc 4.5.55
And now to something completely different.
(please forward to gconf people)
I also found what I would call a serious problem with gconf during the
testing. It goes something like this.
1. have homedir on nfs
2. pull network cable when gconf app active(nautilus, galeon ..)
3. Shutdown
4. insert network cable. restart
5. watch all gconf apps fail to start.
gconf 1.0.9
[-- Attachment #2: mc.log --]
[-- Type: text/plain, Size: 6508 bytes --]
eteuid32() = 1000
stat64("/delta/kernel", {st_mode=S_IFDIR|S_ISGID|0755, st_size=240, ...}) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[23;28H\33[m\17\17\33[37;44m \16"..., 1176) = 1176
gettimeofday({1022936718, 201427}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[28;26H\33[m\17\17\33[30;47mTarget /d"..., 45) = 45
gettimeofday({1022936718, 202476}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
lstat64("/home/ken/2", {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
lstat64("/delta/kernel/2", 0xbffff88c) = -1 ENOENT (No such file or directory)
rename("/home/ken/2", "/delta/kernel/2") = -1 EXDEV (Invalid cross-device link)
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 203840}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 204324}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
stat64("/delta/kernel/2", 0xbfffd78c) = -1 ENOENT (No such file or directory)
lstat64("/home/ken/2", {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
gettimeofday({1022936718, 205249}, NULL) = 0
open("/home/ken/2", O_RDONLY|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0664, st_size=1048576, ...}) = 0
open("/delta/kernel/2", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0600) = 7
fstat64(7, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[30;26HFile\33[5C[\33[40C] 0%", 28) = 28
gettimeofday({1022936718, 207295}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4, "RIFF\324\355;\24AVI LISTp\"\0\0hdrlavih8\0\0\0"..., 8192) = 8192
gettimeofday({1022936718, 209270}, NULL) = 0
gettimeofday({1022936718, 209332}, NULL) = 0
write(7, "RIFF\324\355;\24AVI LISTp\"\0\0hdrlavih8\0\0\0"..., 8192) = 8192
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 209986}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[30;80H1\33[30;82H", 17) = 17
gettimeofday({1022936718, 211172}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
gettimeofday({1022936718, 211684}, NULL) = 0
gettimeofday({1022936718, 211744}, NULL) = 0
write(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 212366}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[30;36H\33[1m\33[37;40m \33[43C\33[m\17\33["..., 47) = 47
gettimeofday({1022936718, 213603}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4, "\177\177\177\177\177\177\177\177\177\177\177\177\177\177"..., 8192) = 8192
gettimeofday({1022936718, 908626}, NULL) = 0
gettimeofday({1022936718, 908709}, NULL) = 0
write(7, "\177\177\177\177\177\177\177\177\177\177\177\177\177\177"..., 8192) = 8192
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 909309}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 909890}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4, "}}~~~~~~}}~~\177\200\200\200\200\201\202\202\203\202\202"..., 8192) = 8192
gettimeofday({1022936718, 910396}, NULL) = 0
gettimeofday({1022936718, 910456}, NULL) = 0
write(7, "}}~~~~~~}}~~\177\200\200\200\200\201\202\202\203\202\202"..., 8192) = 8192
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 910892}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[30;80H3\33[30;82H", 17) = 17
gettimeofday({1022936718, 912201}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4, "\177\200\201\201\202\202\202\201\201\177~}||}~\177\201"..., 8192) = 8192
gettimeofday({1022936718, 912711}, NULL) = 0
gettimeofday({1022936718, 912771}, NULL) = 0
write(7, "\177\200\201\201\202\202\202\201\201\177~}||}~\177\201"..., 8192) = 8192
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
gettimeofday({1022936718, 913260}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
select(1024, [0], NULL, NULL, {0, 0}) = 0 (Timeout)
write(1, "\33[30;37H\33[1m\33[37;40m \33[42C\33[m\17\33["..., 47) = 47
gettimeofday({1022936718, 914553}, NULL) = 0
rt_sigaction(SIGINT, {0x8077c78, [], 0x4000000}, NULL, 8) = 0
select(1024, [0 3], NULL, NULL, {0, 0}) = 0 (Timeout)
rt_sigaction(SIGINT, {SIG_IGN}, NULL, 8) = 0
read(4,
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: nfs problem 2.4.19-pre9
2002-06-01 14:02 nfs problem 2.4.19-pre9 Kenneth Johansson
@ 2002-06-01 19:09 ` Trond Myklebust
2002-06-01 19:39 ` Kenneth Johansson
0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2002-06-01 19:09 UTC (permalink / raw)
To: Kenneth Johansson; +Cc: linux-kernel@vger.kernel.org
>>>>> " " == Kenneth Johansson <ken@canit.se> writes:
> I have had a problem for some time that processes get stuck in
> D state and I now have a way to get this to happen at will.
> One way to do this is to copy a file from one nfs mounted
> directory to another. It dose not happen on the same mount and
> not when copying from nfs to a local disk. To make this even
> more complex it works with cp and mv but not in mc(midnight
> commander F6 ).
Sounds like a network driver problem or something like that. UDP
appears to trigger these lockups a lot more easily than does TCP.
Try testing with a different brand of networking card...
Cheers,
Trond
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: nfs problem 2.4.19-pre9
2002-06-01 19:09 ` Trond Myklebust
@ 2002-06-01 19:39 ` Kenneth Johansson
2002-06-01 19:50 ` Trond Myklebust
0 siblings, 1 reply; 6+ messages in thread
From: Kenneth Johansson @ 2002-06-01 19:39 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-kernel@vger.kernel.org
On Sat, 2002-06-01 at 21:09, Trond Myklebust wrote:
> >>>>> " " == Kenneth Johansson <ken@canit.se> writes:
>
> > I have had a problem for some time that processes get stuck in
> > D state and I now have a way to get this to happen at will.
>
> > One way to do this is to copy a file from one nfs mounted
> > directory to another. It dose not happen on the same mount and
> > not when copying from nfs to a local disk. To make this even
> > more complex it works with cp and mv but not in mc(midnight
> > commander F6 ).
>
> Sounds like a network driver problem or something like that. UDP
> appears to trigger these lockups a lot more easily than does TCP.
>
> Try testing with a different brand of networking card...
>
I have three cards but they are all the same :(
3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30).
Also Why only this nfs mount. I can still telnet to other computers and
use nfs on another mount point so it's not like I lose the network.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: nfs problem 2.4.19-pre9
2002-06-01 19:39 ` Kenneth Johansson
@ 2002-06-01 19:50 ` Trond Myklebust
2002-06-01 20:10 ` Kenneth Johansson
0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2002-06-01 19:50 UTC (permalink / raw)
To: Kenneth Johansson; +Cc: Trond Myklebust, linux-kernel@vger.kernel.org
>>>>> " " == Kenneth Johansson <ken@canit.se> writes:
> I have three cards but they are all the same :( 3Com
> Corporation 3c905B 100BaseTX [Cyclone] (rev 30).
> Also Why only this nfs mount. I can still telnet to other
> computers and use nfs on another mount point so it's not like I
> lose the network.
Fair enough. Have you tried a tcpdump?
cheers,
Trond
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: nfs problem 2.4.19-pre9
2002-06-01 19:50 ` Trond Myklebust
@ 2002-06-01 20:10 ` Kenneth Johansson
2002-06-02 10:52 ` Trond Myklebust
0 siblings, 1 reply; 6+ messages in thread
From: Kenneth Johansson @ 2002-06-01 20:10 UTC (permalink / raw)
To: trond.myklebust; +Cc: linux-kernel@vger.kernel.org
On Sat, 2002-06-01 at 21:50, Trond Myklebust wrote:
> >>>>> " " == Kenneth Johansson <ken@canit.se> writes:
>
> > I have three cards but they are all the same :( 3Com
> > Corporation 3c905B 100BaseTX [Cyclone] (rev 30).
>
> > Also Why only this nfs mount. I can still telnet to other
> > computers and use nfs on another mount point so it's not like I
> > lose the network.
>
> Fair enough. Have you tried a tcpdump?
No for the simple reason that I would not know what to look for.
I can send you a trace if you want. I guess you only need a trace from
the first stat to read fails but it has to wait an hour or two it's not
a good time to crash just now.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: nfs problem 2.4.19-pre9
2002-06-01 20:10 ` Kenneth Johansson
@ 2002-06-02 10:52 ` Trond Myklebust
0 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2002-06-02 10:52 UTC (permalink / raw)
To: Kenneth Johansson; +Cc: linux-kernel@vger.kernel.org
>>>>> " " == Kenneth Johansson <ken@canit.se> writes:
>> Fair enough. Have you tried a tcpdump?
> I can send you a trace if you want. I guess you only need a
> trace from the first stat to read fails but it has to wait an
> hour or two it's not a good time to crash just now.
Problem is very apparent from the tcpdump: your client is only
receiving 2 or 3 out of the 6 UDP fragments in the NFS read
reply from the server. The rest is getting lost en route.
Check out the NFS FAQ on nfs.sourceforge.net. The relevant section is
the bit that asks questions of the form:
1) Are both server and client running on the same speed network
(i.e. are both switched 100Mbit/100Mbit or 10Mbit/10Mbit)?
2) If you are using a switch, are you also using autonegotiation,
or have you forced one or both of the cards (forcing is *bad*
if your switch/hub is autonegotiating)
Cheers,
Trond
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2002-06-02 10:52 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-01 14:02 nfs problem 2.4.19-pre9 Kenneth Johansson
2002-06-01 19:09 ` Trond Myklebust
2002-06-01 19:39 ` Kenneth Johansson
2002-06-01 19:50 ` Trond Myklebust
2002-06-01 20:10 ` Kenneth Johansson
2002-06-02 10:52 ` Trond Myklebust
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox