* Processes hanging, directory hanging
@ 2006-08-01 13:30 Filipe Brandenburger
2006-08-04 14:38 ` Filipe Brandenburger
0 siblings, 1 reply; 5+ messages in thread
From: Filipe Brandenburger @ 2006-08-01 13:30 UTC
To: nfs; +Cc: f.soto
Hello,
I'm facing a rather strange situation on one of my hosts. I recently
upgraded a piece of server software, and after about a week of running,
several processes hang, and some directories hang as well.
The processes hang in "D" (disk wait) state, so I cannot strace or gdb
them to find out what they were doing or where they were.
But the strangest thing is the directories. Some directories on NFS start
to hang in such a way that if I try to "cd" into them or "ls" them
(sometimes even TAB completion triggers it), the process hangs, stays in
"disk wait" state, and there's no way I can get it back. If I strace a
process that changes into one of these hung directories, it gets as far
as the "getent32" call and hangs.
I'm using RHEL4, but I tried upgrading the kernel to the latest release,
and the problem happens on the latest kernel as well (which, at the time
I upgraded, was 2.6.17.6).
So I ask:
1) Do you know of any currently unresolved bug that could cause this?
2) It seems to me that the problem is in the kernel, but somehow it's
being induced by the new version of the application... What could the
application be doing wrong to cause such a problem?
3) How can I see what's happening? Since strace and gdb (the tools I
know) no longer work, I couldn't find any way to debug the problem...
Should I try to dump something from the kernel? Where exactly should I
look?
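On question 3: when strace and gdb cannot attach, /proc still exposes each task's state (in /proc/<pid>/stat) and the kernel function it is sleeping in (in /proc/<pid>/wchan). The sketch below, using a hypothetical helper named `list_d_state` and assuming a Linux /proc layout, lists every task in state "D" with its wait channel; it only reads /proc and never touches the NFS mount. For full kernel stack traces of all tasks, SysRq-T (or writing `t` to /proc/sysrq-trigger, where SysRq support is compiled in) dumps them to the kernel log.

```python
import os


def list_d_state(proc="/proc"):
    """Return sorted (pid, comm, wchan) for every task whose state is 'D'."""
    hung = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Format is "pid (comm) state ...". comm may contain spaces or ')',
        # so locate it using the first '(' and the LAST ')'.
        lparen = stat.find("(")
        rparen = stat.rfind(")")
        comm = stat[lparen + 1:rparen]
        state = stat[rparen + 2:rparen + 3]
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"  # wchan missing or unreadable
        hung.append((int(entry), comm, wchan))
    return sorted(hung)


if __name__ == "__main__":
    for pid, comm, wchan in list_d_state():
        print(f"{pid:>7}  {comm:<16}  {wchan}")
```

Running it a few times while the hang is in progress shows whether the same tasks stay stuck in the same kernel function.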
Thanks in advance,
Filipe Brandenburger
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: Processes hanging, directory hanging
2006-08-01 13:30 Processes hanging, directory hanging Filipe Brandenburger
@ 2006-08-04 14:38 ` Filipe Brandenburger
2006-08-04 15:09 ` Trond Myklebust
0 siblings, 1 reply; 5+ messages in thread
From: Filipe Brandenburger @ 2006-08-04 14:38 UTC
To: nfs
Hello,
Does anybody have any hints on this? I still have this problem, and I
still don't have any clue about what to do next...
Or should I post this to another list, like a kernel list? It seems to
me that the problem is related to the NFS client, but I can't be 100%
sure of that...
Thanks a lot,
Filipe Brandenburger
* Re: Processes hanging, directory hanging
2006-08-04 14:38 ` Filipe Brandenburger
@ 2006-08-04 15:09 ` Trond Myklebust
2006-08-04 16:51 ` Filipe Brandenburger
0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2006-08-04 15:09 UTC
To: Filipe Brandenburger; +Cc: nfs
On Fri, 2006-08-04 at 11:38 -0300, Filipe Brandenburger wrote:
> Hello,
>
> Please, anybody has any hints on this? I'm still with this problem, and
> I still don't have any clues about what to do next...
>
> Or should I try to post this on other list, like a kernel list? It
> seems to be that the problem is related to the NFS client, but I can't
> be 100% sure of that...
So you upgraded the server, and the clients started to hang. What makes
you think this is a client problem?
Have you tried comparing 'nfsstat' output on the client and server to
see whether the server is processing the client requests? A tcpdump to
see whether the client is receiving server replies would be useful too.
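The client half of that comparison needs nothing from the server: the "Client rpc stats" that nfsstat prints (calls, retrans, authrefresh) come from the `rpc` line of /proc/net/rpc/nfs. A minimal sketch, assuming that file layout and using hypothetical helper names, samples the counters twice. A retransmission count that keeps climbing while few new calls complete would suggest requests are going unanswered, which a tcpdump can then confirm or rule out.

```python
import time


def read_rpc_counters(path="/proc/net/rpc/nfs"):
    """Return (calls, retrans) from the client 'rpc' line; these are the
    first two fields nfsstat shows under 'Client rpc stats'."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "rpc":
                return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line in " + path)


def sample_rpc(interval=10.0, path="/proc/net/rpc/nfs"):
    """Return (new calls, new retransmissions) seen during `interval` seconds."""
    calls0, retrans0 = read_rpc_counters(path)
    time.sleep(interval)
    calls1, retrans1 = read_rpc_counters(path)
    return calls1 - calls0, retrans1 - retrans0


if __name__ == "__main__":
    calls, retrans = sample_rpc()
    print(f"calls: {calls}  retransmissions: {retrans}")
```

Because these counters cover the whole client, this sidesteps the problem of isolating one client's traffic in the server's statistics.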
Also, check what software you upgraded on the server. If it was samba,
and you have oplock support enabled, then the problem could be related
to leases (IIRC there were a few kernel bugs w.r.t. leases that had to
be fixed recently).
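To check whether anything on the client holds leases at all, /proc/locks lists each active lock with its kind (POSIX, FLOCK, LEASE) in the second column, and blocked waiters carry an extra `->` marker. A rough sketch, assuming that layout and using a hypothetical `lock_kinds` helper, that tallies entries by kind:

```python
from collections import Counter


def lock_kinds(path="/proc/locks"):
    """Count entries in /proc/locks by kind (POSIX, FLOCK, LEASE, ...)."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue
            # fields[0] is the "N:" index; blocked waiters add a "->" marker.
            kind = fields[2] if fields[1] == "->" else fields[1]
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    for kind, n in sorted(lock_kinds().items()):
        print(f"{kind:<8} {n}")
```

A LEASE count of zero while the hang is in progress would make the lease theory less likely, though it does not rule out a lease that was already broken.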
Cheers,
Trond
* Re: Processes hanging, directory hanging
2006-08-04 15:09 ` Trond Myklebust
@ 2006-08-04 16:51 ` Filipe Brandenburger
2006-08-04 22:22 ` Filipe Brandenburger
0 siblings, 1 reply; 5+ messages in thread
From: Filipe Brandenburger @ 2006-08-04 16:51 UTC
To: Trond Myklebust; +Cc: nfs, f.soto
Hi.
On Fri, 04 Aug 2006 11:09:52 -0400, Trond Myklebust
<trond.myklebust@fys.uio.no> wrote:
> So you upgraded the server, and the clients started to hang. What
> makes you think this is a client problem?
No, I didn't upgrade the NFS server... I only upgraded the LMTP server,
which is the software that uses the NFS client on Linux... Sorry if I
wrote that confusingly.
I suspect the problem is in the NFS client because the directory just
"disappears" or "hangs" when I try to cd/ls it on that client. The
problem doesn't happen on any other client, and rebooting the client
solves it.
> Have you tried comparing 'nfsstat' output on the client and server to
> see if the server is processing the client requests. A tcpdump to see
> if the client is receiving server replies would be useful too.
OK, I'll try to provide a tcpdump. Comparing "nfsstat" output is
difficult, I think, because the server still gets requests from the
other clients, and it would be hard for me to isolate the requests from
that client alone... However, I'll try to get a tcpdump on the server
too (the servers are EMC storage, not Linux, but I can get them to run
tcpdump too).
> Also, check what software you upgraded on the server. If it was samba,
> and you have oplock support enabled, then the problem could be related
> to leases (IIRC there were a few kernel bugs w.r.t. leases that had to
> be fixed recently).
The software on the client is an LMTP server; it's developed in-house,
and it got a fairly large upgrade with new features. I don't believe
the problem is related to locks (as in "flock" or "fcntl") because the
software uses Maildir, which doesn't need locks. Anyway, I'll
investigate this issue of oplocks and leases with the developers to see
whether the problem could be related to that.
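Why Maildir can skip locking: each message is written to a uniquely named file under tmp/ and only then moved into new/, so readers never see a partial message and two deliveries never write the same file. A minimal sketch of that delivery step, with a hypothetical `deliver` function and a simplified unique-name recipe (time, PID, random bytes, hostname); it is not the in-house LMTP server's code. The link-then-unlink step follows the original Maildir description, while many deliverers use a single rename() instead.

```python
import os
import socket
import time


def deliver(maildir, message):
    """Deliver `message` (bytes) into `maildir` without any file locking.

    Hypothetical helper; the unique-name recipe is simplified.
    """
    # Maildir names must not contain '/' or ':'; escape them in the hostname.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = f"{int(time.time())}.P{os.getpid()}R{os.urandom(4).hex()}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # 1. Write the whole message under tmp/; "xb" fails if the name exists.
    with open(tmp_path, "xb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # push the data to the server before publishing
    # 2. Publish it: readers only scan new/ and cur/, so they never see a
    #    half-written file and no lock is needed.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```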
Thank you very much!
Filipe Brandenburger
* Re: Processes hanging, directory hanging
2006-08-04 16:51 ` Filipe Brandenburger
@ 2006-08-04 22:22 ` Filipe Brandenburger
0 siblings, 0 replies; 5+ messages in thread
From: Filipe Brandenburger @ 2006-08-04 22:22 UTC
To: nfs; +Cc: f.soto
Hi,
I found this post, which seems to be about the same problem:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105211049508464&w=2
Although it's about kernel 2.4, and he doesn't say anything about a
directory "hanging", only about processes stuck in "D", it seems similar
to my problem.
He says something about using "intr"; I'll try turning it off to see if
it helps.
In another post in the same thread,
http://marc.theaimsgroup.com/?l=linux-kernel&m=105214854307866&w=2
he says:
"This happens when you mount an NFS mount with the 'hard' option
(default) and a mount's handle expires incorrectly (eg: server crash)."
Could this be the cause of my problem? But then what could cause the
mount's handle to expire incorrectly?
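The quoted explanation turns on the `hard` mount option (the default), and `intr` comes up above. As general background rather than a diagnosis: with `hard`, the client retries an unanswered request indefinitely, which is what leaves processes in "D", and on kernels of this era `intr` is what lets signals interrupt such a wait. A quick way to confirm which options the client actually applied is /proc/mounts; the sketch below uses a hypothetical `nfs_mounts` helper and assumes the usual six-field /proc/mounts layout.

```python
def nfs_mounts(path="/proc/mounts"):
    """List NFS mounts with whether they are hard (the default) and intr."""
    result = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
                continue
            opts = set(fields[3].split(","))
            result.append({
                "source": fields[0],
                # Spaces in paths appear octal-escaped (\040); left as-is here.
                "mountpoint": fields[1],
                "hard": "soft" not in opts,  # no 'soft' means a hard mount
                "intr": "intr" in opts,
            })
    return result


if __name__ == "__main__":
    for m in nfs_mounts():
        print(f"{m['mountpoint']}: {'hard' if m['hard'] else 'soft'}, "
              f"{'intr' if m['intr'] else 'nointr'}")
```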
If someone can give me some hints on how to debug this more deeply (that
is, how to find out what's going on with these processes inside the
kernel), please let me know.
Thanks a lot,
Filipe Brandenburger
end of thread (newest: 2006-08-04 22:22 UTC)
Thread overview: 5+ messages
2006-08-01 13:30 Processes hanging, directory hanging Filipe Brandenburger
2006-08-04 14:38 ` Filipe Brandenburger
2006-08-04 15:09 ` Trond Myklebust
2006-08-04 16:51 ` Filipe Brandenburger
2006-08-04 22:22 ` Filipe Brandenburger