* knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-02-26 13:28 UTC
To: nfs

There seems to be some odd behaviour with knfsd. I have a box with a raid10, and a single cp or rsync operation should not effectively kill knfsd performance. It does, however.

First, nfs works very well as long as this box does not have any local disk i/o. I can, literally, transfer files at the upper limit of my 100mbit network connection. Directory reads are fast, file transfers are great in both directions. "Instant" would be the word I would use for access. It works great, fast and beautifully. There does not appear to be any configuration issue at play here.

The problems start as soon as any local I/O starts. Directory listings over nfs can take > 5 or 6 seconds, once I start my rsync backup process. File transfer rates fall through the floor.

However, directory listings, locally on the box, are still instant. File reads are instant. There is a _very_ minor slowdown, but my raid10 array is doing a great job at handling a single rsync session + a single directory request or copy request.

Again, with knfsd, performance bombs.

There is obviously something wacky in the way the kernel is scheduling things here. Any ideas, patches, suggestions?

Kernel 2.6.10, NFSv3 mounted, noatime mounts.

More info can be provided if needed, but again.. this setup works beautifully under load from multiple NFS clients. It is fast, responsive, you name it. However, one _single_ cp or rsync session can bring NFS responsiveness to its knees, without tasking the cpu, ram or swap.

Thanks.

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Olaf Kirch @ 2005-02-28 10:06 UTC
To: Brad Barnett; +Cc: nfs

On Sat, Feb 26, 2005 at 08:28:54AM -0500, Brad Barnett wrote:
> There is obviously something wacky in the way the kernel is scheduling
> things here. Any ideas, patches, suggestions?

That's because knfsd will write things to disk synchronously unless you tell it not to. That can throttle other NFS activity in two locations:

 - by tying up all knfsd threads on the server. Try to bump the
   number of nfsd processes

 - by tying up all RPC slots on the client. Make sure your wsize
   isn't too big (8k is reasonable)

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
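
Olaf's first suggestion can be checked roughly along these lines. This is only a sketch: it assumes a 2.6-era server whose /proc/net/rpc/nfsd carries a "th" line where the first two fields are the thread count and the number of times all threads were busy, and the helper name nfsd_thread_stats is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize the "th" line of /proc/net/rpc/nfsd-style
# input read from stdin. Assumed layout: "th <threads> <times-all-busy> ...".
nfsd_thread_stats() {
    awk '$1 == "th" { print "threads=" $2, "all_busy=" $3 }'
}

# Only read the real file where it exists (i.e. on an NFS server).
if [ -r /proc/net/rpc/nfsd ]; then
    nfsd_thread_stats < /proc/net/rpc/nfsd
fi
```

If all_busy keeps climbing while the problem is visible, raising the thread count on the server (for example "rpc.nfsd 16") is the knob Olaf refers to. On the client, the write size is chosen at mount time, with something like "mount -o rsize=8192,wsize=8192 server:/export /mnt", where the server name and both paths are placeholders.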

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-02-28 15:23 UTC
To: Olaf Kirch; +Cc: nfs

On Mon, 28 Feb 2005 11:06:33 +0100 Olaf Kirch <okir@suse.de> wrote:

> On Sat, Feb 26, 2005 at 08:28:54AM -0500, Brad Barnett wrote:
> > There is obviously something wacky in the way the kernel is scheduling
> > things here. Any ideas, patches, suggestions?
>
> That's because knfsd will write things to disk synchronously unless you
> tell it not to. That can throttle other NFS activity in two locations:

During my tests involving "ls", no one else was accessing the server. I have noatime set for both client and server mounts.. just in case.

So, there should be no writes for knfsd to do. There was only one read operation, and that was a "ls -R /nfsmount".

> - by tying up all knfsd threads on the server. Try to bump the
>   number of nfsd processes
>
> - by tying up all RPC slots on the client. Make sure your wsize
>   isn't too big (8k is reasonable)

There is only one client (during my tests), so #1 can't be the case. Number 2 applies to write operations, although I have spent over 5 hours trying every possible permutation to see if any significant advantage can be had.

This is what I don't understand. Why is one single 'ls' on a single client, the only nfs client, brought to a standstill by a single cp or rsync? It's very weird, and it does not seem to be because of write operations the client is performing.

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Olaf Kirch @ 2005-02-28 15:44 UTC
To: Brad Barnett; +Cc: nfs

On Mon, Feb 28, 2005 at 10:23:07AM -0500, Brad Barnett wrote:
> During my tests involving "ls", no one else was accessing the server. I
> have noatime set for both client and server mounts.. just in case.
>
> So, there should be no writes for knfsd to do. There was only one
> read operation, and that was a "ls -R /nfsmount".

Well, you were talking about rsync and cp, so it's either reads or writes going over the wire, or both.

> > - by tying up all knfsd threads on the server. Try to bump the
> >   number of nfsd processes
> >
> > - by tying up all RPC slots on the client. Make sure your wsize
> >   isn't too big (8k is reasonable)
>
> There is only one client (during my tests), so #1 can't be the case.

One NFS client can issue many requests simultaneously, thereby tying up more than one nfsd thread.

> This is what I don't understand. Why is one single 'ls' on a single
> client, the only nfs client, brought to a standstill by a single cp or
> rsync? It's very weird, and it does not seem to be because of write
> operations the client is performing.

Where do these cp and rsync calls occur? From your first message I assumed they were on the client, operating on the NFS mounted file system.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-02-28 16:20 UTC
To: Olaf Kirch; +Cc: nfs

On Mon, 28 Feb 2005 16:44:55 +0100 Olaf Kirch <okir@suse.de> wrote:

> On Mon, Feb 28, 2005 at 10:23:07AM -0500, Brad Barnett wrote:
> > During my tests involving "ls", no one else was accessing the server.
> > I have noatime set for both client and server mounts.. just in case.
> >
> > So, there should be no writes for knfsd to do. There was only one
> > read operation, and that was a "ls -R /nfsmount".
>
> Well, you were talking about rsync and cp, so it's either reads or
> writes going over the wire, or both.

The rsync or cp operations are on the server.

> > > - by tying up all knfsd threads on the server. Try to bump the
> > >   number of nfsd processes
> > >
> > > - by tying up all RPC slots on the client. Make sure your wsize
> > >   isn't too big (8k is reasonable)
> >
> > There is only one client (during my tests), so #1 can't be the case.
>
> One NFS client can issue many requests simultaneously, thereby tying
> up more than one nfsd thread.

Yes, but the only activity is a single "ls" on the client.. I don't think this would use more than one thread.

> > This is what I don't understand. Why is one single 'ls' on a single
> > client, the only nfs client, brought to a standstill by a single cp or
> > rsync? It's very weird, and it does not seem to be because of write
> > operations the client is performing.
>
> Where do these cp and rsync calls occur? From your first message I
> assumed they were on the client, operating on the NFS mounted file
> system.

The cp or rsync are occurring locally on the server.

E.g.:

One client has an nfs mount. It issues an "ls". The response is instant, without slowdowns.

I start a long and extensive cp -a process on the nfs server. Local 'ls' responses are instant. Write and read operations are instant (it's a raid 10) on the local box, as well. However, my single remote client's "ls" operation changes to a jerky, slow operation.. with upwards of 5 second pauses in reads.

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Olaf Kirch @ 2005-03-01 9:55 UTC
To: Brad Barnett; +Cc: nfs

On Mon, Feb 28, 2005 at 11:20:18AM -0500, Brad Barnett wrote:
> I start a long and extensive cp -a process on the nfs server. Local 'ls'
> responses are instant. Write and read operations are instant (it's a raid
> 10) on the local box, as well. However, my single remote client's "ls"
> operation changes to a jerky, slow operation.. with upwards of 5 second
> pauses in reads.

Are you using NFS over UDP? If you ping the server from the client, do the round trip time and packet loss rate change when you start the heavy IO jobs on the server?

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-03-01 11:57 UTC
To: nfs

On Tue, 1 Mar 2005 10:55:48 +0100 Olaf Kirch <okir@suse.de> wrote:

> On Mon, Feb 28, 2005 at 11:20:18AM -0500, Brad Barnett wrote:
> > I start a long and extensive cp -a process on the nfs server. Local
> > 'ls' responses are instant. Write and read operations are instant
> > (it's a raid 10) on the local box, as well. However, my single remote
> > client's "ls" operation changes to a jerky, slow operation.. with
> > upwards of 5 second pauses in reads.
>
> Are you using NFS over UDP? If you ping the server from the client, do
> the round trip time and packet loss rate change when you start the
> heavy IO jobs on the server?

I just tried, and ping times do not visibly change (0.1ms before and after).

However, this is what is really irking me. This isn't a heavy I/O job. This is just _one_ cp. Nothing else is happening on the entire server! I just did, in the above test:

client: ls -R /home

The client is fine, for very long periods of time...

Then, while the above command is still happening:

server: cp -a /raid/home /raid/hometest

Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it _stops_. It then sits there for seconds, and spews out a page in small jumps. Again, a ls /raid/home on the _server_ barely slows, and is constant.

I'm really scratching my head here.

* RE: knfsd brought to its knees, by a simple rsync or cp operation
From: Roger Heflin @ 2005-03-01 14:21 UTC
To: 'Brad Barnett', nfs

Brad,

My post will bounce, so it won't go to the list; my email and domain name don't agree.

The problem is simple, but I have never found a decent solution to it.

The basic issue is that a large local cp quickly fills up the buffer cache on the local machine and can cause the nfsd processes to starve and have difficulty getting their I/O in. Watch what happens to the buffer cache and disk when this is happening. The later versions of linux should be worse, as they fill the buffer cache faster.

I suspect that the problem is that there is always a long line of operations to take care of, and when NFS comes along it has to get in line behind whatever is already queued up.

I have seen it on older versions of linux (2.2), where it took around 2-3 such copies to make things really bad, but 1 would do a good job of making response bad.

Roger
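
One rough way to "watch what happens to the buffer cache", as Roger suggests, is to sample the dirty and writeback counters while the local copy runs. The sketch below assumes a kernel whose /proc/meminfo has "Dirty:" and "Writeback:" lines; the helper name dirty_writeback is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the Dirty and Writeback lines from
# /proc/meminfo-style input read on stdin.
dirty_writeback() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# Take one sample; wrap this in "while sleep 1; do ...; done" to watch
# the values change while the cp is running.
if [ -r /proc/meminfo ]; then
    dirty_writeback < /proc/meminfo
fi
```

Running "vmstat 1" alongside gives the matching view of blocks read and written per second.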

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Olaf Kirch @ 2005-03-01 14:37 UTC
To: Brad Barnett; +Cc: nfs

On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it
> _stops_. It then sits there for seconds, and spews out a page in small
> jumps. Again, a ls /raid/home on the _server_ barely slows, and is
> constant.
>
> I'm really scratching my head here.

Well, it sounds like something's eating the network bandwidth, or otherwise interfering with nfsd responsiveness. Again, are you using UDP or TCP? If UDP, look at nfsstat output to see if you have a high retransmit count.

If it's really a problem with scheduling, it should make a difference if you run the rsync job with lower priority, and/or renice the nfsd threads to run with higher priority.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
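
The retransmit check Olaf asks for can be approximated on the client as below. This is a sketch under the assumption that /proc/net/rpc/nfs contains a line of the form "rpc <calls> <retrans> <authrefresh>"; the helper name retrans_pct is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: percentage of client RPC calls that were
# retransmitted, from /proc/net/rpc/nfs-style input read on stdin.
# Assumed layout of the relevant line: "rpc <calls> <retrans> <authrefresh>".
retrans_pct() {
    awk '$1 == "rpc" && $2 > 0 { printf "%.2f\n", 100 * $3 / $2 }'
}

# Only read the real file where it exists (i.e. on an NFS client).
if [ -r /proc/net/rpc/nfs ]; then
    retrans_pct < /proc/net/rpc/nfs
fi
```

The counters are cumulative, so the useful comparison is a sample taken before the server-side copy starts against one taken during the stalls. "nfsstat -rc" reports the same calls and retrans figures.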

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-03-01 23:10 UTC
To: nfs

On Tue, 1 Mar 2005 15:37:32 +0100 Olaf Kirch <okir@suse.de> wrote:

> On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> > Within 10 seconds, the output of ls -R /home slows. Within 20
> > seconds, it _stops_. It then sits there for seconds, and spews out a
> > page in small jumps. Again, a ls /raid/home on the _server_ barely
> > slows, and is constant.
> >
> > I'm really scratching my head here.
>
> Well, it sounds like something's eating the network bandwidth,
> or otherwise interfering with nfsd responsiveness. Again, are
> you using UDP or TCP? If UDP, look at nfsstat output to see if
> you have a high retransmit count.

In my original post, I did mention that I can copy large files (isos) over the network at excellent speeds. That is, I get over 6M/sec transfer speed...

> If it's really a problem with scheduling, it should make a difference
> if you run the rsync job with lower priority, and/or renice the
> nfsd threads to run with higher priority.

You can't really renice the kernel nfsd threads though :((

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Olaf Kirch @ 2005-03-02 9:03 UTC
To: Brad Barnett; +Cc: nfs

On Tue, Mar 01, 2005 at 06:10:07PM -0500, Brad Barnett wrote:
> > Well, it sounds like something's eating the network bandwidth,
> > or otherwise interfering with nfsd responsiveness. Again, are
> > you using UDP or TCP? If UDP, look at nfsstat output to see if
> > you have a high retransmit count.
>
> In my original post, I did mention that I can copy large files (isos) over
> the network at excellent speeds. That is, I get over 6M/sec transfer
> speed...

Still you won't answer: UDP or TCP? :-)

And the question about retransmits referred to the situation where you see the slow-downs.

> > If it's really a problem with scheduling, it should make a difference
> > if you run the rsync job with lower priority, and/or renice the
> > nfsd threads to run with higher priority.
>
> You can't really renice the kernel nfsd threads though :((

renice -20 -p <pid of nfsd> works for me.

Olaf
-- 
Olaf Kirch   |  --- o --- Nous sommes du soleil we love when we play
okir@suse.de |    / | \   sol.dhoop.naytheet.ah kin.ir.samse.qurax
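
Since there is one nfsd thread per configured server process, Olaf's renice has to be applied to each of them. A dry-run sketch, assuming pgrep is available and that the kernel threads show up under the name "nfsd"; the helper name renice_cmds is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) one renice command per PID given.
renice_cmds() {
    for pid in "$@"; do
        printf 'renice -20 -p %s\n' "$pid"
    done
}

# Dry run against whatever nfsd threads exist on this machine.
# Prints nothing where no nfsd threads are running.
renice_cmds $(pgrep -x nfsd 2>/dev/null)
```

Piping the output to "sh" as root would apply the change. The other half of the suggestion, lowering the priority of the copy itself, is simply "nice -n 19 cp -a /raid/home /raid/hometest". Both only affect CPU scheduling, which is the hypothesis being tested here.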

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-03-02 16:41 UTC
To: Olaf Kirch; +Cc: Brad Barnett, nfs

On Wed, 2 Mar 2005 10:03:13 +0100 Olaf Kirch <okir@suse.de> wrote:

> On Tue, Mar 01, 2005 at 06:10:07PM -0500, Brad Barnett wrote:
> > > Well, it sounds like something's eating the network bandwidth,
> > > or otherwise interfering with nfsd responsiveness. Again, are
> > > you using UDP or TCP? If UDP, look at nfsstat output to see if
> > > you have a high retransmit count.
> >
> > In my original post, I did mention that I can copy large files (isos)
> > over the network at excellent speeds. That is, I get over 6M/sec
> > transfer speed...
>
> Still you won't answer: UDP or TCP? :-)

Sorry Olaf, heh. UDP.

> And the question about retransmits referred to the situation where you
> see the slow-downs.

I did check this before (it was in an NFS howto someplace), and I did not notice a large number of retransmits (there were one or two per several minutes)..

> > > If it's really a problem with scheduling, it should make a
> > > difference if you run the rsync job with lower priority, and/or
> > > renice the nfsd threads to run with higher priority.
> >
> > You can't really renice the kernel nfsd threads though :((
>
> renice -20 -p <pid of nfsd> works for me.

Right, but that doesn't affect anything in the kernel...

Anyhow, the problem is solved, please see my other message about changing i/o schedulers...

Thanks!

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Bill Rugolsky Jr. @ 2005-03-01 15:04 UTC
To: Brad Barnett; +Cc: nfs

On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> However, this is what is really irking me. This isn't a heavy I/O job.
> This is just _one_ cp. Nothing else is happening on the entire server! I
> just did, in the above test:
>
> client: ls -R /home
>
> The client is fine, for very long periods of time...
>
> Then, while the above command is still happening:
>
> server: cp -a /raid/home /raid/hometest

You say that it isn't a heavy I/O job, but a recursive copy is a very seek-intensive one, particularly when copying a large tree to the same device, which will interleave reads and writes. What filesystem are you using? With an internal journal, journal writes will cause additional seeking.

> Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it
> _stops_. It then sits there for seconds, and spews out a page in small
> jumps. Again, a ls /raid/home on the _server_ barely slows, and is
> constant.

Do you mean ls -R /raid/home here? Is it definitely the case that ls -R /raid/home on the server is quick, but on the client it is slow?

How about ls -lR /raid/home on the server? It could be that knfsd is returning file attribute information, hence reading the whole inode for each file, and not just for the directories. getdents64() returns d_type=DT_DIR for directory entries, which allows ls -R to optimize the traversal so as to only call fstat64() on directories, not regular files. So on the server, ls -R would only fstat64() the directories, while on the client ls -R can cause knfsd to do the equivalent of ls -lR.

Also, which I/O scheduler are you using?

Regards,

Bill Rugolsky

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Bill Rugolsky Jr. @ 2005-03-01 16:08 UTC
To: Brad Barnett; +Cc: nfs

On Tue, Mar 01, 2005 at 10:04:46AM -0500, Bill Rugolsky Jr. wrote:
> How about ls -lR /raid/home on the server? It could be that knfsd is
> returning file attribute information, hence reading the whole inode
> for each file, and not just for the directories. getdents64() returns
> d_type=DT_DIR for directory entries, which allows ls -R to optimize the
> traversal so as to only call fstat64() on directories, not regular files.
> So on the server, ls -R would only fstat64() the directories, while
> on the client ls -R can cause knfsd to do the equivalent of ls -lR.

Sorry for replying to myself; I've had a look at the 2.6.10 code, and I think I may understand what is going on.

In the *ideal* case, the client NFS and server NFS implementations support READDIRPLUS. Additionally, the filesystem stores the file type in the directory entry; such is the case with EXT3 with the "filetype" feature.

If all of the above is true, when a directory is read on the client, the nfs client issues a READDIRPLUS call to the server. The server, in turn, issues a readdir to the VFS, which will call down into the underlying filesystem. If the underlying filesystem stores type information in the directory, then it will populate the type field, and this can then be returned to the caller without reading the on-disk inode.

If the filesystem does not store filetype information in the directory, then filldir() on the server will return with DT_UNKNOWN, and this will get passed back to the getdents64() caller (ls), which then has to [f]stat() the file, which will translate into a GETATTR call, which will require reading the on-disk inode.

If the client or server doesn't implement READDIRPLUS, then the nfs client will be unable to receive the type information, and will either have to return DT_UNKNOWN to satisfy the getdents64 call, or will issue a GETATTR on each directory entry itself.

Note that READDIRPLUS is missing from many/most deployed 2.4 kernels, though patches are at client.linux-nfs.org.

Bill Rugolsky
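
Whether ls really ends up calling stat on every entry can be checked by counting stat-family system calls in an strace of the same listing, once on the server's local path and once on the client's NFS mount. A sketch, assuming strace is installed; the helper name count_stat_calls is made up, and the list of syscall names is a best-effort guess because they differ between architectures and libc versions.

```shell
#!/bin/sh
# Hypothetical helper: count stat-family system calls in strace output
# read from stdin. The name list is an assumption, not exhaustive.
# Note: grep -c prints 0 and returns non-zero when nothing matches.
count_stat_calls() {
    grep -cE '^((l|f)?stat(64)?|newfstatat|statx)\('
}
```

Typical use would be "strace ls -R /home 2>&1 >/dev/null | count_stat_calls" on the client and the same with /raid/home on the server. A count close to the number of directories suggests the d_type shortcut is working; a count close to the number of files suggests each entry is costing a GETATTR-style lookup.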

* Re: knfsd brought to its knees, by a simple rsync or cp operation
From: Brad Barnett @ 2005-03-01 23:38 UTC
To: nfs

On Tue, 1 Mar 2005 10:04:46 -0500 "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com> wrote:

> On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> > However, this is what is really irking me. This isn't a heavy I/O
> > job. This is just _one_ cp. Nothing else is happening on the entire
> > server! I just did, in the above test:
> >
> > client: ls -R /home
> >
> > The client is fine, for very long periods of time...
> >
> > Then, while the above command is still happening:
> >
> > server: cp -a /raid/home /raid/hometest
>
> You say that it isn't a heavy I/O job, but a recursive copy is a very
> seek-intensive one, particularly when copying a large tree to the same
> device, which will interleave reads and writes. What filesystem are you
> using? With an internal journal, journal writes will cause additional
> seeking.

ext3. I do stress, however, that I find almost zero slowdown on the local system, when I do a ls -R locally. This is what set off my spidey sense.

> > Within 10 seconds, the output of ls -R /home slows. Within 20
> > seconds, it _stops_. It then sits there for seconds, and spews out a
> > page in small jumps. Again, a ls /raid/home on the _server_ barely
> > slows, and is constant.
>
> Do you mean ls -R /raid/home here? Is it definitely the case that
> ls -R /raid/home on the server is quick, but on the client it is slow?

Yes, most definitely ls -R /raid/home. I've tested this dozens of times, and it is as you say above. Server fast, client slow.

> How about ls -lR /raid/home on the server? It could be that knfsd is
> returning file attribute information, hence reading the whole inode
> for each file, and not just for the directories. getdents64() returns
> d_type=DT_DIR for directory entries, which allows ls -R to optimize the
> traversal so as to only call fstat64() on directories, not regular
> files. So on the server, ls -R would only fstat64() the directories, while
> on the client ls -R can cause knfsd to do the equivalent of ls -lR.

ls -lR on the server is fast still. There are no 5 second stalls... no stalls at all, really.

> Also, which I/O scheduler are you using?

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
elevator: using anticipatory as default io scheduler

Should I try a different scheduler? Deadline perhaps?

> Regards,
>
> Bill Rugolsky
* Re: knfsd brought to its knees, by a simple rsync or cp operation
  2005-03-01 15:04 ` Bill Rugolsky Jr.
  2005-03-01 16:08 ` Bill Rugolsky Jr.
  2005-03-01 23:38 ` Brad Barnett
@ 2005-03-01 23:40 ` Brad Barnett
  2 siblings, 0 replies; 16+ messages in thread
From: Brad Barnett @ 2005-03-01 23:40 UTC
To: nfs; +Cc: brugolsky

Wow. CFQ instantly resolved the issue. After reading up a bit more on CFQ, I switched to deadline. Deadline makes me happy. ;)

I am currently doing FIVE cp -al operations, as well as half a dozen ls -R raid operations on the local box. I see almost zero nfs slowdown. ;)

Thanks very much guys, for all your help. I appreciate the effort everyone put into this. Hopefully this thread will help some people down the road...

Thanks!
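The message above does not say how the scheduler was changed. On kernels that expose per-device scheduler selection through sysfs, the file /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in square brackets. The sketch below, in Python, only reads and parses that file. The device name "sda" and both function names are assumptions for illustration, not details from the thread.

```python
from pathlib import Path


def parse_schedulers(text):
    """Parse a sysfs scheduler line such as 'noop anticipatory [deadline] cfq'.

    Returns (active, available): the bracketed name (or None if no
    name is bracketed) and the list of all names with brackets removed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_schedulers(device="sda"):
    """Read the scheduler line for a block device (hypothetical default 'sda')."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())
```

Where runtime switching is supported, writing a scheduler name to the same file as root (for example `echo deadline > /sys/block/sda/queue/scheduler`) selects it for that device. Whether the kernel in this thread was switched that way or through a boot parameter is not stated.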
Thread overview: 16+ messages
2005-02-26 13:28 knfsd brought to its knees, by a simple rsync or cp operation Brad Barnett
2005-02-28 10:06 ` Olaf Kirch
2005-02-28 15:23 ` Brad Barnett
2005-02-28 15:44 ` Olaf Kirch
2005-02-28 16:20 ` Brad Barnett
2005-03-01  9:55 ` Olaf Kirch
2005-03-01 11:57 ` Brad Barnett
2005-03-01 14:21 ` Roger Heflin
2005-03-01 14:37 ` Olaf Kirch
2005-03-01 23:10 ` Brad Barnett
2005-03-02  9:03 ` Olaf Kirch
2005-03-02 16:41 ` Brad Barnett
2005-03-01 15:04 ` Bill Rugolsky Jr.
2005-03-01 16:08 ` Bill Rugolsky Jr.
2005-03-01 23:38 ` Brad Barnett
2005-03-01 23:40 ` Brad Barnett