Best A->B large file copy performance

Linux NFS development

From: Jim Callahan <callahan@temerity.us>
To: linux-nfs@vger.kernel.org
Subject: Best A->B large file copy performance
Date: Thu, 12 Mar 2009 17:00:59 -0400
Message-ID: <49B9780B.2020609@temerity.us>

I'm trying to determine the optimal way to have a single NFS client
copy large numbers (100-1000) of fairly large (1-50 MB) files from one
location on a file server to another location on the same file server.
There seem to be several layers which influence this:

1. Number of OS-level processes performing the copy in parallel.
2. Record size used by the C library read()/write() calls made by these
processes.
3. NFS client rsize/wsize settings.
4. Ethernet MTU size.
5. Bandwidth of the Ethernet network and switches.

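Layers 1 and 2 are the ones the copying program itself controls. The following is a minimal Python sketch, not a tuned tool, in which both are explicit parameters: record_size is the byte count passed to each os.read()/os.write() call, and workers is how many files are copied at once. The names copy_one and copy_many and the 1 MiB default are hypothetical. For brevity it uses threads rather than separate OS processes; CPython releases the GIL during os.read()/os.write(), so the copies still overlap, but running several cp or dd processes would be the closer match to layer 1.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_one(src: str, dst: str, record_size: int = 1 << 20) -> int:
    """Copy src to dst with os.read()/os.write() calls of at most record_size bytes.

    Returns the number of bytes copied.
    """
    copied = 0
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                buf = os.read(in_fd, record_size)
                if not buf:  # end of file
                    break
                view = memoryview(buf)
                while view:  # os.write() may perform a short write
                    view = view[os.write(out_fd, view):]
                copied += len(buf)
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
    return copied


def copy_many(pairs, workers: int = 4, record_size: int = 1 << 20) -> int:
    """Copy each (src, dst) pair, running up to `workers` copies concurrently.

    Returns the total number of bytes copied.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: copy_one(p[0], p[1], record_size), pairs)
        return sum(results)
```

For example, `copy_many(pairs, workers=8, record_size=256 * 1024)` copies every (src, dst) pair using 256 KiB records with eight copies in flight.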
So far we've experimented with larger MTU and rsize/wsize settings
without seeing a huge difference.  Since we have been using "cp" to
perform (1), we haven't tweaked the record size at all yet.  My
suspicion is that we should be carefully coordinating the sizes
specified for layers 2, 3 and 4.  Perhaps we should be using "dd"
instead of "cp" so we can control the record size.  Since the number of
permutations of these three settings is large, I was hoping to get some
advice from this list about a range of values we should investigate,
and any unpleasant interactions between these layers we should be aware
of, to narrow our search.  Also, if there are other major factors
outside those listed, I'd appreciate being pointed in the right
direction.
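Trying record sizes one at a time, the way dd's bs= option would, can be scripted. A minimal sketch, assuming the source and destination paths are both on the NFS mount under test; time_copy and sweep are hypothetical helpers, and MB/s here means 10^6 bytes per second. The rsize/wsize and MTU settings cannot be changed from this code, so each combination of those would be one remount followed by one sweep.

```python
import os
import shutil
import time


def time_copy(src: str, dst: str, record_size: int) -> float:
    """Copy src to dst in record_size chunks (roughly `dd bs=record_size`).

    Returns elapsed wall-clock seconds, including an fsync() of dst so that
    data still sitting in the client's cache is counted.
    """
    start = time.perf_counter()
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, length=record_size)
        fdst.flush()
        os.fsync(fdst.fileno())
    return time.perf_counter() - start


def sweep(src: str, dst: str, record_sizes) -> dict:
    """Copy src once per candidate record size; return {record_size: MB/s}."""
    size_mb = os.path.getsize(src) / 1e6
    results = {}
    for bs in record_sizes:
        elapsed = max(time_copy(src, dst, bs), 1e-9)  # guard against a zero timer reading
        results[bs] = size_mb / elapsed
    return results
```

For example, `sweep(src, dst, [4096, 32768, 262144, 1 << 20])` returns one throughput figure per record size. Because the client caches src after the first pass, later runs may be served partly from memory; a fresh source file per run, or a remount between runs, gives a fairer comparison. Also, the Linux NFS client normally buffers writes in its page cache, so the application record size need not map one-to-one onto WRITE requests of wsize bytes.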

---

While I'm on the subject, has there been any discussion about adding an
NFS request that would allow copying files from one location to another
on the same NFS server without a round trip through the client?  It's
not at all uncommon to need to move data around in this manner, and it
seems a huge waste of bandwidth to send all this data from the server to
the client just so the client can send it back unaltered to a different
location.  Such a COPY request would be high-level, along the lines of
RENAME, and each server vendor could optimize it for their particular
hardware architecture.  For our particular application, having such a
request would make a huge difference in performance.
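A request of this kind was later standardized: NFSv4.2 (RFC 7862) defines a server-side COPY operation, and Linux exposes in-kernel copying through the copy_file_range(2) system call, which an NFSv4.2 client can turn into a COPY sent to the server. The following Python sketch assumes a Linux client, a Python build that provides os.copy_file_range (3.8 or later), and a mount and server that support the operation; whether the data actually stays on the server depends on that support. server_side_copy and _copy_in_kernel are hypothetical names, and the function falls back to an ordinary read/write copy when the call is unavailable or fails.

```python
import os
import shutil


def _copy_in_kernel(src: str, dst: str) -> int:
    """Copy src to dst using only copy_file_range(2); return bytes copied."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            copied = 0
            while True:
                # No offsets are passed, so the kernel advances both file offsets.
                n = os.copy_file_range(in_fd, out_fd, 1 << 30)
                if n == 0:  # end of source file
                    return copied
                copied += n
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)


def server_side_copy(src: str, dst: str) -> int:
    """Copy src to dst, preferring copy_file_range(2); return bytes copied.

    On an NFSv4.2 mount whose client and server support COPY, the kernel
    may have the server copy the data without sending it to the client.
    """
    if hasattr(os, "copy_file_range"):
        try:
            return _copy_in_kernel(src, dst)
        except OSError:
            pass  # e.g. ENOSYS, EXDEV, EOPNOTSUPP: use the fallback below
    # Fallback: ordinary user-space copy that round-trips through the client.
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst)
        return fdst.tell()
```

The fallback reopens dst for writing, which truncates any partial output left by a copy_file_range() call that failed midway.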

-- 
Jim Callahan - President - Temerity Software <www.temerity.us>

Thread overview: 4+ messages
2009-03-12 21:00 Jim Callahan [this message]
2009-03-13  2:43 ` Best A->B large file copy performance Greg Banks
2009-03-13 19:16 ` Trond Myklebust
2009-03-13 21:40   ` Jim Callahan
