* Processes hanging, directory hanging
From: Filipe Brandenburger @ 2006-08-01 13:30 UTC
To: nfs; +Cc: f.soto

Hello,

I'm facing a rather strange situation on one of my hosts. I recently upgraded one piece of server software, and after about a week of running, several processes hang, and some directories hang as well.

The processes hang in "D" (disk wait) state. Because of that, I cannot attach strace or gdb to them to find out what they were doing or where they were.

But the strangest part is the directories. Some directories on NFS start to hang: if I try to "cd" into them or "ls" them (sometimes even TAB completion triggers it), the process hangs, stays in "disk wait" state, and there is no way to get it back. If I strace a process that changes directory into one of these hung directories, it gets as far as "getent32" and hangs.

I'm using RHEL4, but I also tried upgrading the kernel to the latest release (2.6.17.6 at the time), and the problem happens on that kernel as well.

So I ask:

1) Do you know of any currently unsolved bug that could cause this?

2) It seems to me that the problem is in the kernel, but somehow it's being induced by the new version of the application... What could the application be doing wrong to cause such a problem?

3) How can I see what's happening? Since strace and gdb (which are the tools I know) don't work on these processes, I couldn't find any way to debug the problem... Should I try to dump something from the kernel? Where exactly should I look?

Thanks in advance,
Filipe Brandenburger

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
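
One possible way to approach question 3 without strace or gdb is to read the per-process files under /proc, which stay readable even while a process is stuck. The sketch below is only an illustration, not something proposed in the thread: it lists every process whose state field in /proc/<pid>/stat is "D" and prints the kernel symbol from /proc/<pid>/wchan, if the kernel exposes one. The function names are made up for this example, and on some kernels or for non-root users wchan may be empty or "0".

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    head, _, tail = stat_text.rpartition(")")
    pid_text, _, comm = head.partition("(")
    return int(pid_text), comm, tail.split()[0]


def read_text(path):
    """Read a small /proc file, returning '' if it is missing or unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return ""


def blocked_processes(proc_root="/proc"):
    """Return a list of (pid, comm, wchan) for every process in state D."""
    found = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # not a process directory
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan or "?"))
    return found
```

Calling `blocked_processes()` as root on the affected client and printing the result should show where in the kernel each stuck process is sleeping. A fuller picture, with complete kernel stacks for every task, can usually be obtained by writing `t` to /proc/sysrq-trigger and reading the kernel log, assuming magic SysRq is enabled on that kernel.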
* Re: Processes hanging, directory hanging
From: Filipe Brandenburger @ 2006-08-04 14:38 UTC
To: nfs

Hello,

Does anybody have any hints on this? I still have this problem, and I still have no clue about what to do next...

Or should I post this on another list, such as a kernel list? It seems to me that the problem is related to the NFS client, but I can't be 100% sure of that...

Thanks a lot,
Filipe Brandenburger
* Re: Processes hanging, directory hanging
From: Trond Myklebust @ 2006-08-04 15:09 UTC
To: Filipe Brandenburger; +Cc: nfs

On Fri, 2006-08-04 at 11:38 -0300, Filipe Brandenburger wrote:
> Hello,
>
> Does anybody have any hints on this? I still have this problem, and
> I still have no clue about what to do next...
>
> Or should I post this on another list, such as a kernel list? It
> seems to me that the problem is related to the NFS client, but I can't
> be 100% sure of that...

So you upgraded the server, and the clients started to hang. What makes you think this is a client problem?

Have you tried comparing 'nfsstat' output on the client and server to see if the server is processing the client requests? A tcpdump to see if the client is receiving server replies would be useful too.

Also, check what software you upgraded on the server. If it was samba, and you have oplock support enabled, then the problem could be related to leases (IIRC there were a few kernel bugs w.r.t. leases that had to be fixed recently).

Cheers,
  Trond
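
As a rough illustration of the client side of the 'nfsstat' comparison suggested above, the sketch below reads the RPC counters that the Linux NFS client exports in /proc/net/rpc/nfs (the "rpc" line carries calls, retransmissions and authentication refreshes, in that order) and computes how much they changed between two samples. This is an assumption-laden sketch rather than anything posted in the thread; the function names are invented for the example.

```python
def parse_rpc_counters(text):
    """Extract the client RPC counters from the contents of /proc/net/rpc/nfs."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "rpc":
            # Field order on the "rpc" line: calls, retrans, authrefrsh.
            calls, retrans, authrefrsh = (int(value) for value in fields[1:4])
            return {"calls": calls, "retrans": retrans, "authrefrsh": authrefrsh}
    raise ValueError("no 'rpc' line found")


def counter_delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name] for name in before}


def sample(path="/proc/net/rpc/nfs"):
    """Read the current counters from the running kernel (Linux NFS client only)."""
    with open(path) as handle:
        return parse_rpc_counters(handle.read())
```

Taking `sample()` twice, a few seconds apart, while an `ls` is hanging, and passing both results to `counter_delta`, gives a quick hint: a retransmission count that keeps growing while the call count barely moves would suggest the client is resending requests that never get answered, which is the situation a tcpdump would then confirm or rule out.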
* Re: Processes hanging, directory hanging
From: Filipe Brandenburger @ 2006-08-04 16:51 UTC
To: Trond Myklebust; +Cc: nfs, f.soto

Hi.

On Fri, 04 Aug 2006 11:09:52 -0400, Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> So you upgraded the server, and the clients started to hang. What
> makes you think this is a client problem?

No, I didn't upgrade the NFS server... I only upgraded the LMTP server, which is the software that uses the NFS client on Linux... Sorry if I wrote this in a confusing way.

I suspect the NFS client because the directory just "disappears" or "hangs" when I try to cd into it or ls it on that client. The problem doesn't happen on any other client, and rebooting the client solves the problem.

> Have you tried comparing 'nfsstat' output on the client and server to
> see if the server is processing the client requests? A tcpdump to see
> if the client is receiving server replies would be useful too.

OK, I'll try to provide a tcpdump. Comparing "nfsstat" output will be difficult, I guess, because the server still gets requests from the other clients, and it would be hard for me to isolate the requests from that one client... However, I'll try to get a tcpdump on the server too (the servers are EMC storage, not Linux, but I can get them to run tcpdump as well).

> Also, check what software you upgraded on the server. If it was samba,
> and you have oplock support enabled, then the problem could be related
> to leases (IIRC there were a few kernel bugs w.r.t. leases that had to
> be fixed recently).

The software on the client is an LMTP server. It's developed in-house, and it got a fairly large upgrade for new features. I don't believe the problem is related to locks (as in "flock" or "fcntl"), because the software uses Maildir, which doesn't need locks. Anyway, I'll look into this issue of oplocks and leases with the developers, to see if the problem could be related to that.

Thank you very much!
Filipe Brandenburger
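
For readers unfamiliar with why Maildir delivery needs no flock or fcntl locks, the sketch below shows the general idea: each message is written under a unique name in tmp/ and then moved into new/ with a single rename, so a reader never sees a partially written file. It is a generic illustration under stated assumptions, not the in-house LMTP server's code; `unique_name` and `deliver` are hypothetical helpers, and the naming scheme is a simplified version of the usual time/pid/host pattern.

```python
import itertools
import os
import socket
import time

# Per-process delivery counter, so two deliveries in the same second differ.
_sequence = itertools.count()


def unique_name():
    """Build a file name that should be unique across hosts and processes."""
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    return f"{int(time.time())}.P{os.getpid()}Q{next(_sequence)}.{host}"


def deliver(maildir, message_bytes):
    """Write one message into maildir/new without taking any lock."""
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    # O_EXCL makes creation fail instead of overwriting if the name is reused.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(message_bytes)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data reached the server
    # The rename publishes the complete file in one step.
    os.rename(tmp_path, new_path)
    return new_path
```

Note that this pattern replaces locking with directory operations (create, rename, and later directory listings by the mail reader), so a Maildir workload exercises the NFS directory code paths heavily even though it never takes a file lock.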
* Re: Processes hanging, directory hanging
From: Filipe Brandenburger @ 2006-08-04 22:22 UTC
To: nfs; +Cc: f.soto

Hi,

I found this post, which seems to be about the same problem:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105211049508464&w=2

Although it's about kernel 2.4, and he doesn't say anything about a directory "hanging", only about processes stuck in "D", it looks like my problem. He says something about using "intr"; I'll try to turn it off to see if it helps.

In another post in the same thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105214854307866&w=2

He says: "This happens when you mount an NFS mount with the 'hard' option (default) and a mount's handle expires incorrectly (eg: server crash)."

Could this be the cause of my problem? But then, what could cause the mount's handle to expire incorrectly?

If someone can give me some hints on how to debug this more deeply (that is, how to find out what's going on with these processes inside the kernel), please tell me how to do so.

Thanks a lot,
Filipe Brandenburger
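
Before changing "intr" or "hard", it may help to confirm which options the affected mounts are actually using. The sketch below is a minimal illustration, assuming the standard /proc/mounts layout (device, mount point, filesystem type, comma-separated options); the helper names and the sample server name in the usage note are hypothetical.

```python
def nfs_mounts(mounts_text):
    """Return [(mount_point, options_set)] for NFS entries in /proc/mounts text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # skip malformed lines and non-NFS filesystems
        result.append((fields[1], set(fields[3].split(","))))
    return result


def describe(options):
    """Return (recovery_mode, interruptible) for one mount's option set."""
    # "hard" is the default, so a mount is hard unless "soft" is listed.
    recovery = "soft" if "soft" in options else "hard"
    return recovery, "intr" in options


def current_nfs_mounts(path="/proc/mounts"):
    """Read the running system's mount table (Linux only)."""
    with open(path) as handle:
        return nfs_mounts(handle.read())
```

For a line such as `emc1:/vol/mail /var/mail nfs rw,vers=3,hard,intr,proto=tcp 0 0` (an invented example), `describe` returns `("hard", True)`. With a hard mount the client retries indefinitely when the server stops answering, which matches processes sitting in "D" state; whether that is what is happening here still has to be confirmed from the network trace.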