* mmap() and NFS server performance
@ 2002-12-13 21:09 Matthew Mitchell
2002-12-13 21:35 ` Brian Pawlowski
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Matthew Mitchell @ 2002-12-13 21:09 UTC (permalink / raw)
To: nfs
Hello,
We've noticed some interesting behavior regarding mmap file IO and were
wondering if anyone here had some clues as to what might be going on.
We have some applications that mmap() files into memory for the purpose
of updating some potentially scattered word-sized data values. These
apps were originally written on Solaris with Solaris NFS servers assumed
to be the data source; the Sun guys said that mmap would be much faster
than read/write and they were correct. However, now that we have a few
Linux NFS servers, we're seeing the opposite.
It takes, for example, 20-24 hours to update a 2GB file (there are on
the order of 1M words to update) via mmap, whereas reading in the whole
file, updating it in memory, and writing it out again takes about 30
minutes (limited by network speed in our case). This is when the file
resides on the Linux server. On Solaris the mmap update runs on the
order of 15-20 minutes. Client in both cases is a Solaris 8 machine.
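The two update strategies compared above can be sketched roughly as follows. This is a minimal illustration in Python, not the original application (whose source does not appear in this thread); the function names, the 4-byte little-endian word format, and the list of (offset, value) updates are all assumptions made for the example.

```python
import mmap
import struct

# Assumed word format: 4-byte little-endian unsigned integer.
WORD = struct.Struct("<I")


def update_via_mmap(path, updates):
    """Patch scattered words in place through a shared memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            for offset, value in updates:
                # Only the page containing this word is touched.
                WORD.pack_into(m, offset, value)
            m.flush()  # write modified pages back to the file


def update_via_read_write(path, updates):
    """Read the whole file, patch it in memory, and write it all back."""
    with open(path, "rb") as f:
        data = bytearray(f.read())
    for offset, value in updates:
        WORD.pack_into(data, offset, value)
    with open(path, "wb") as f:
        f.write(data)
```

On an NFS mount the first variant typically turns each touched page into its own small READ and, later, WRITE request, while the second moves the file in large sequential transfers. That makes the mmap path far more sensitive to per-request latency on the server and network.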
What we are speculating is that the Solaris NFS server is "cheating" by
keeping file state information between NFS reads and writes. (The
updates do occur in sequence.) Are there any heuristics or
optimizations done by the Linux NFS server that might help the
performance here?
All things, alas, are not equal in this comparison -- the Solaris server
in question is an 8 CPU Enterprise 4500 with 8GB of RAM, and the Linux
server is a single-processor P4 with 512MB of RAM. The Solaris server
is much busier than the Linux server, though (I know it cannot be
caching the whole file in memory). The Linux server is running Red
Hat's 2.4.18-14 kernel.
Anyone have any ideas as to why this is happening and what, if anything,
we can do to speed up the mmap IO when using the Linux server?
--
Matthew Mitchell
Systems Programmer/Administrator matthew@geodev.com
Geophysical Development Corporation phone 713 782 1234
1 Riverway Suite 2100, Houston, TX 77056 fax 713 782 1829
-------------------------------------------------------
This sf.net email is sponsored by:
With Great Power, Comes Great Responsibility
Learn to use your power at OSDN's High Performance Computing Channel
http://hpc.devchannel.org/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-13 21:09 mmap() and NFS server performance Matthew Mitchell
@ 2002-12-13 21:35 ` Brian Pawlowski
2002-12-14 11:22 ` Trond Myklebust
2002-12-14 16:51 ` David B. Ritch
2 siblings, 0 replies; 7+ messages in thread
From: Brian Pawlowski @ 2002-12-13 21:35 UTC (permalink / raw)
To: Matthew Mitchell; +Cc: nfs
I would love to see a packet trace in the absence of any known
problems...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-13 21:09 mmap() and NFS server performance Matthew Mitchell
2002-12-13 21:35 ` Brian Pawlowski
@ 2002-12-14 11:22 ` Trond Myklebust
2002-12-16 14:33 ` Matthew Mitchell
2002-12-14 16:51 ` David B. Ritch
2 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2002-12-14 11:22 UTC (permalink / raw)
To: Matthew Mitchell; +Cc: nfs
>>>>> " " == Matthew Mitchell <matthew@geodev.com> writes:
> values. These apps were originally written on Solaris with
> Solaris NFS servers assumed to be the data source; the Sun guys
> said that mmap would be much faster than read/write and they
> were correct. However, now that we have a few Linux NFS
> servers, we're seeing the opposite.
As long as the clients are still Solaris, then the only difference can
be the network, and the server performance.
Of the 2, the bigger 'generic' troublemaker tends to be the network.
Solaris clients always tend to prefer NFS over TCP since that tends to
be more reliable on poor networks than does UDP. Unfortunately, NFS
over TCP on the server side is a fairly recent addition to Linux: it
only just made it into the stable release 2 weeks ago (when 2.4.20 was
released). To the best of my knowledge, none of the RedHat kernels
support it yet.
Cheers,
Trond
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-14 11:22 ` Trond Myklebust
@ 2002-12-16 14:33 ` Matthew Mitchell
2002-12-16 14:50 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Matthew Mitchell @ 2002-12-16 14:33 UTC (permalink / raw)
To: Trond Myklebust; +Cc: nfs
Trond Myklebust wrote:
> >>>>> " " == Matthew Mitchell <matthew@geodev.com> writes:
>
> > values. These apps were originally written on Solaris with
> > Solaris NFS servers assumed to be the data source; the Sun guys
> > said that mmap would be much faster than read/write and they
> > were correct. However, now that we have a few Linux NFS
> > servers, we're seeing the opposite.
>
> As long as the clients are still Solaris, then the only difference can
> be the network, and the server performance.
>
> Of the 2, the bigger 'generic' troublemaker tends to be the network.
> Solaris clients always tend to prefer NFS over TCP since that tends to
> be more reliable on poor networks than does UDP. Unfortunately, NFS
> over TCP on the server side is a fairly recent addition to Linux: it
> only just made it into the stable release 2 weeks ago (when 2.4.20 was
> released). To the best of my knowledge, none of the RedHat kernels
> support it yet.
For some reason it had not occurred to me that the NFS server on the
Linux box might be using UDP instead of TCP; I had it in my head somehow
that it would use TCP. Obviously it wasn't. Now, we had noticed slower
speeds with UDP before (when Linux clients were using UDP to access the
Solaris servers). Could this be causing such a drastic slowdown?
I have looked at some network packet streams and I don't think we are
having any of the classic UDP problems -- a negligible number of
retransmits, and most packets arrive in order. The Linux server is
hardly loaded at all.
Three further questions:
1) What would you like to see, tcpdump/snoop wise, to verify this?
2) Could UDP service really be causing this order of magnitude slowdown?
3) Is TCP server code "ready enough" for production use? In our case we
don't mind some occasional bugs, but it needs to be able to stay working
under reasonable load for a day or so at a time for us to get anything
done ("Stale NFS file handle" is a scourge...).
Thanks again.
--
Matthew Mitchell
Systems Programmer/Administrator matthew@geodev.com
Geophysical Development Corporation phone 713 782 1234
1 Riverway Suite 2100, Houston, TX 77056 fax 713 782 1829
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-16 14:33 ` Matthew Mitchell
@ 2002-12-16 14:50 ` Trond Myklebust
2002-12-16 20:04 ` Matthew Mitchell
0 siblings, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2002-12-16 14:50 UTC (permalink / raw)
To: Matthew Mitchell; +Cc: nfs
>>>>> " " == Matthew Mitchell <matthew@geodev.com> writes:
> 1) What would you like to see, tcpdump/snoop wise, to verify
> this?
nfsstat on the client should normally tell you how often you are
seeing RPC retransmits.
> 2) Could UDP service really be causing this order of magnitude
> slowdown?
Certainly: retransmissions follow an *exponential* backoff rule. For
that reason, it doesn't take a very high percentage of retransmissions
before you see a large impact.
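To illustrate the point, here is a rough model, assuming a client that starts from a fixed initial timeout and doubles it after every lost request, with each attempt lost independently with probability `p_loss`. The 1.1 s initial timeout, the retry cap, and the loss rates are illustrative assumptions rather than measurements from this thread, and real clients may adjust their timers dynamically.

```python
def expected_retransmit_delay(p_loss, initial_timeout=1.1, max_retries=5):
    """Mean seconds spent waiting on timeouts per RPC.

    Assumes independent losses and a timeout that doubles after each
    retransmission. Requests that would need more than max_retries
    retransmissions are ignored, so this slightly underestimates.
    """
    expected = 0.0
    waited = 0.0
    timeout = initial_timeout
    for k in range(1, max_retries + 1):
        waited += timeout   # the k-th timeout has now expired
        timeout *= 2        # exponential backoff
        # Probability of exactly k losses followed by one success.
        expected += (p_loss ** k) * (1 - p_loss) * waited
    return expected


for p in (0.001, 0.01, 0.05):
    delay_ms = expected_retransmit_delay(p) * 1000
    print(f"loss {p:.1%}: about {delay_ms:.1f} ms of timeout wait per RPC")
```

Under these assumptions a 1% loss rate already adds about 11 ms of average wait per request, which would dwarf a round trip of a millisecond or so on a healthy LAN.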
> 3) Is TCP server code "ready enough" for production use? In
> our case we don't mind some occasional bugs, but it needs to
> be able to stay working under reasonable load for a day or so
> at a time for us to get anything done ("Stale NFS file handle"
> is a scourge...).
That is more of a question for Neil Brown, but I personally don't have
any particularly bad experiences to report.
Cheers,
Trond
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-16 14:50 ` Trond Myklebust
@ 2002-12-16 20:04 ` Matthew Mitchell
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Mitchell @ 2002-12-16 20:04 UTC (permalink / raw)
To: trond.myklebust; +Cc: nfs
Trond Myklebust wrote:
> >>>>> " " == Matthew Mitchell <matthew@geodev.com> writes:
>
> > 1) What would you like to see, tcpdump/snoop wise, to verify
> > this?
>
> nfsstat on the client should normally tell you how often you are
> seeing RPC retransmits.
Ran the program again. In 20 minutes, around 20k total client RPC
requests, with an assortment of retrans, time, and badxid errors. I am
starting to believe that there is a problem using UDP, and that it alone
might be enough to explain the slowness. It seems to have negotiated
an 8k transfer size. Reasonable?
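As a back-of-the-envelope check, the observed request rate can be compared with the reported run time. The sketch below uses the figures quoted in this thread (about 20k RPCs in 20 minutes, on the order of 1M words to update); the assumption that each word update costs roughly one READ plus one WRITE is a guess, since sequential updates may share pages and need fewer requests.

```python
# Figures taken from the thread.
rpcs_observed = 20_000
window_seconds = 20 * 60
words_to_update = 1_000_000

# Assumption: about one READ and one WRITE per updated word.
rpcs_per_word = 2

rate = rpcs_observed / window_seconds            # RPCs per second
est_hours = words_to_update * rpcs_per_word / rate / 3600

print(f"{rate:.1f} RPC/s, about {est_hours:.0f} hours for the full update")
```

This prints roughly 16.7 RPC/s and about 33 hours; with a single RPC per word the estimate halves to about 17 hours. Either way it is the same order of magnitude as the 20-24 hours reported for the mmap update against the Linux server.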
> > 2) Could UDP service really be causing this order of magnitude
> > slowdown?
>
> Certainly: retransmissions follow an *exponential* backoff rule. For
> that reason, it doesn't take a very high percentage of retransmissions
> before you see a large impact.
Seems to be borne out by the evidence.
> > 3) Is TCP server code "ready enough" for production use? In
> > our case we don't mind some occasional bugs, but it needs to
> > be able to stay working under reasonable load for a day or so
> > at a time for us to get anything done ("Stale NFS file handle"
> > is a scourge...).
>
> That is more of a question for Neil Brown, but I personally don't have
> any particularly bad experiences to report.
Any particularly *good* ones? :) Would you recommend the 2.4.20 set, or
some additional patches? I will probably install the new kernel when we
get a little downtime later this week (and then I will promptly go on
vacation; hope it stays up! ha!).
Thanks for the suggestions and assistance. I'll report back with TCP
info for anyone who is interested.
--
Matthew Mitchell
Systems Programmer/Administrator matthew@geodev.com
Geophysical Development Corporation phone 713 782 1234
1 Riverway Suite 2100, Houston, TX 77056 fax 713 782 1829
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: mmap() and NFS server performance
2002-12-13 21:09 mmap() and NFS server performance Matthew Mitchell
2002-12-13 21:35 ` Brian Pawlowski
2002-12-14 11:22 ` Trond Myklebust
@ 2002-12-14 16:51 ` David B. Ritch
2 siblings, 0 replies; 7+ messages in thread
From: David B. Ritch @ 2002-12-14 16:51 UTC (permalink / raw)
To: Matthew Mitchell; +Cc: NFS mailing list
When we moved some nfs clients from 2.4.18 to 2.4.20, we saw some
performance issues that initially appeared to be an nfs problem.
However, it turned out that the upgrade apparently broke autonegotiation
between the NICs in the nodes and our network switches. This did not
turn up immediately in network testing with tools such as netperf,
because we still had good one-way bandwidth with the nodes set to full
duplex and the switches set to half. However, two-way traffic
immediately caused lots of problems.
I never trust autonegotiation, and turn it off whenever I can.
You may be experiencing a similar problem.
dbr
On Fri, 2002-12-13 at 16:09, Matthew Mitchell wrote:
> Hello,
>
> We've noticed some interesting behavior regarding mmap file IO and were
> wondering if anyone here had some clues as to what might be going on.
--
David B. Ritch
High Performance Technologies, Inc.
^ permalink raw reply [flat|nested] 7+ messages in thread