From: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
To: Frank van Maarseveen <frankvm@frankvm.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	bfields@citi.umich.edu, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 4/5] NFSD: Remove NFSD_TCP kernel build option
Date: Wed, 06 Feb 2008 10:05:39 +1100
Message-ID: <47A8EBC3.7050900@melbourne.sgi.com>
In-Reply-To: <20080205155021.GA7805@janus>

Frank van Maarseveen wrote:
> On Tue, Feb 05, 2008 at 04:49:39PM +1100, Greg Banks wrote:
>   
>> Chuck Lever wrote:
>>     
>>> On Feb 4, 2008, at 7:29 PM, Greg Banks wrote:
>>>       
>>>> Trond Myklebust wrote:
>>>>         
>>>>> On Tue, 2008-02-05 at 11:19 +1100, Greg Banks wrote:
>>>>>
>>>>>           
>>>>>> Chuck Lever wrote:
>>>>>>
>>>>>>             
>>>>>>> TCP support in the Linux NFS server is stable enough that we can
>>>>>>> leave it on always.  CONFIG_NFSD_TCP adds about 10 lines of code,
>>>>>>> and defaults to "Y" anyway.
>>>>>>>
>>>>>>> A run-time switch might be more appropriate if people feel they
>>>>>>> would like to disable NFSD's TCP support.
>>>>>>>
>>>>>>>               
>>>>>> Looks good.
>>>>>>
>>>>>> Actually, I'd be inclined to go one step further and set UDP support
>>>>>> off by default.
>>>>>>
>>>>>>             
>>>>> That will break older clients.
>>>>>
>>>>>
>>>>>           
>>>> Hence the default, rather than removing the code entirely.
>>>>         
>>> What might make sense is to remove NFSD_TCP, but add NFSD_UDP,
>>> defaulting to Y.
>>>
>>> Then in a year or two we can change the default to N.
>>>
>>>       
>> Fine by me.
>>     
>
> Last time I checked (around 2.6.22) writing large files on NFSv3 over
> UDP was 20% faster compared to TCP (Gb LAN with one switch connecting
> all machines).
>   
Did all of your file arrive at the server, and in the same order it left
the client?  NFS on UDP relies on IP fragmentation, which is known to
introduce silent data corruption at high data rates (google for "IPID aliasing").
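
A rough back-of-the-envelope sketch of why this bites at gigabit rates.
It assumes the 16-bit IPv4 Identification field, one ID consumed per UDP
datagram between a given client and server, one 32 KiB NFS write per
datagram, and a 30 second reassembly timeout (the usual Linux ipfrag_time
default).  The function names and the traffic figures are illustrative
assumptions, not measurements.

```python
IPID_SPACE = 2 ** 16  # the IPv4 Identification field is 16 bits wide


def ipid_wrap_seconds(bytes_per_sec: float, datagram_bytes: int) -> float:
    """Seconds until the IP ID counter wraps, assuming one ID per datagram."""
    datagrams_per_sec = bytes_per_sec / datagram_bytes
    return IPID_SPACE / datagrams_per_sec


def may_alias(bytes_per_sec: float, datagram_bytes: int,
              reassembly_timeout_s: float = 30.0) -> bool:
    """True if an ID can be reused while stale fragments are still queued.

    If the counter wraps faster than the receiver discards incomplete
    datagrams, a leftover fragment can be spliced into a newer datagram
    that happens to carry the same ID.
    """
    return ipid_wrap_seconds(bytes_per_sec, datagram_bytes) < reassembly_timeout_s


if __name__ == "__main__":
    gigabit = 125e6  # 1 Gb/s expressed in bytes per second, headers ignored
    print(ipid_wrap_seconds(gigabit, 32 * 1024))  # roughly 17 seconds
    print(may_alias(gigabit, 32 * 1024))
```

Under these assumptions the ID space wraps in about 17 seconds on a single
saturated gigabit link, well inside the 30 second reassembly window, and
the 16-bit UDP checksum is the only thing left to catch a mis-spliced
fragment.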

Also, last time I checked, UDP support in the server uses a single socket
for all traffic, and processes need to serialise on the svc_sock lock to
send, so aggregate UDP throughput is strictly limited compared to TCP: about
145 MB/s for UDP, versus filling twelve 1 GigE pipes for TCP.  I have a patch
to fix this, but given the inherent data corruption issues of UDP I haven't
bothered posting the most recent version.



> TCP and its timeout/retransmission behavior isn't always the best choice.
>
>   
The timeout and retransmission logic that sunrpc implements on top of UDP is
arguably worse, especially if you use the "soft" mount option.
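
As a simplified illustration of the "soft" case, the sketch below models
a client that doubles its timeout after each retransmission and gives up
after `retrans` retransmissions.  The doubling rule, the 60 second cap
and the example values (timeo of 0.7 s, retrans of 3) are assumptions
chosen for illustration; the real sunrpc code also adapts its timeouts to
measured round-trip times, which this model ignores.

```python
def soft_give_up_seconds(timeo_s: float, retrans: int,
                         max_timeout_s: float = 60.0) -> float:
    """Time until a soft mount fails a request that never gets a reply.

    Simplified model: the initial send plus `retrans` retransmissions,
    with the per-attempt timeout doubling each time up to a cap.
    """
    total = 0.0
    timeout = timeo_s
    for _ in range(retrans + 1):  # original request + retransmissions
        total += timeout
        timeout = min(timeout * 2, max_timeout_s)
    return total


if __name__ == "__main__":
    # Illustrative values only, not the defaults of any particular kernel.
    print(soft_give_up_seconds(0.7, 3))
```

In this model a server that stalls for a little over ten seconds is enough
for the request to fail, and with "soft" that failure is returned to the
application as an I/O error rather than retried indefinitely.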

-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.

