* "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Harshula @ 2012-02-28 5:22 UTC
To: Steve Dickson; +Cc: Jeff Layton, NeilBrown, linux-nfs

Hi Steve,

The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
included upstream:

https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb

Please consider including it.

Thanks,
#
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Jeff Layton @ 2012-02-28 11:52 UTC
To: Harshula; +Cc: Steve Dickson, NeilBrown, linux-nfs

On Tue, 28 Feb 2012 16:22:01 +1100
Harshula <harshula@redhat.com> wrote:

> Hi Steve,
>
> The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
> included upstream:
>
> https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
>
> Please consider including it.
>
> Thanks,
> #
>

I think that patch looks reasonable and clearly documenting the
problems with UDP is a wonderful thing.

It may be best to send it formally to steved and the list as a real
[PATCH] with a real description and SoB line.

-- 
Jeff Layton <jlayton@redhat.com>
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Harshula @ 2012-02-28 12:32 UTC
To: Jeff Layton; +Cc: Steve Dickson, NeilBrown, linux-nfs

On Tue, 2012-02-28 at 06:52 -0500, Jeff Layton wrote:
> On Tue, 28 Feb 2012 16:22:01 +1100
> Harshula <harshula@redhat.com> wrote:
>
> > Hi Steve,
> >
> > The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
> > included upstream:
> >
> > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
> >
> > Please consider including it.
> >
> > Thanks,
> > #
> >
>
> I think that patch looks reasonable and clearly documenting the
> problems with UDP is a wonderful thing.
>
> It may be best to send it formally to steved and the list as a real
> [PATCH] with a real description and SoB line.

I do not know who authored the above patch, hopefully someone will come
out and claim it now that it is on this list.

cya,
#
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Jeff Layton @ 2012-02-28 12:41 UTC
To: Harshula; +Cc: Steve Dickson, NeilBrown, linux-nfs

On Tue, 28 Feb 2012 23:32:26 +1100
Harshula <harshula@redhat.com> wrote:

> On Tue, 2012-02-28 at 06:52 -0500, Jeff Layton wrote:
> > On Tue, 28 Feb 2012 16:22:01 +1100
> > Harshula <harshula@redhat.com> wrote:
> >
> > > Hi Steve,
> > >
> > > The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
> > > included upstream:
> > >
> > > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
> > >
> > > Please consider including it.
> > >
> > > Thanks,
> > > #
> > >
> >
> > I think that patch looks reasonable and clearly documenting the
> > problems with UDP is a wonderful thing.
> >
> > It may be best to send it formally to steved and the list as a real
> > [PATCH] with a real description and SoB line.
>
> I do not know who authored the above patch, hopefully someone will come
> out and claim it now that it is on this list.
>
> cya,
> #
>

Well, presumably that patch is open-source licensed like the rest of
the nfs-utils code. Is that sufficient to simply copy it from opensuse?

Either way, if the author were to step forward that would certainly be
preferable...

-- 
Jeff Layton <jlayton@redhat.com>
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Harshula @ 2012-03-05 1:56 UTC
To: Jeff Layton; +Cc: Steve Dickson, NeilBrown, linux-nfs

On Tue, 2012-02-28 at 07:41 -0500, Jeff Layton wrote:
> On Tue, 28 Feb 2012 23:32:26 +1100
> Harshula <harshula@redhat.com> wrote:
>
> > On Tue, 2012-02-28 at 06:52 -0500, Jeff Layton wrote:
> > > On Tue, 28 Feb 2012 16:22:01 +1100
> > > Harshula <harshula@redhat.com> wrote:
> > >
> > > > Hi Steve,
> > > >
> > > > The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
> > > > included upstream:
> > > >
> > > > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
> > > >
> > > > Please consider including it.
> > > >
> > > > Thanks,
> > > > #
> > > >
> > >
> > > I think that patch looks reasonable and clearly documenting the
> > > problems with UDP is a wonderful thing.
> > >
> > > It may be best to send it formally to steved and the list as a real
> > > [PATCH] with a real description and SoB line.
> >
> > I do not know who authored the above patch, hopefully someone will come
> > out and claim it now that it is on this list.
> >
> > cya,
> > #
> >
>
> Well, presumably that patch is open-source licensed like the rest of
> the nfs-utils code. Is that sufficient to simply copy it from opensuse?
>
> Either way, if the author were to step forward that would certainly be
> preferable...

I was told the following:

----------------------------------------------------------------
Olaf wrote the patch, and Mads Martin Joergensen applied it to
util-linux in SLES 9 SP2 and/or SP3 by about mid 2005.

This appears in the util-linux changelog as:

Wed Jun 29 11:26:31 CEST 2005 - mmj@suse.de

- Document load and clearly issues about NFS over UDP [#80263]
----------------------------------------------------------------

cya,
#
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Jim Rees @ 2012-02-28 12:46 UTC
To: Jeff Layton; +Cc: Harshula, Steve Dickson, NeilBrown, linux-nfs

Jeff Layton wrote:

  On Tue, 28 Feb 2012 16:22:01 +1100
  Harshula <harshula@redhat.com> wrote:

  > Hi Steve,
  >
  > The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
  > included upstream:
  >
  > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
  >
  > Please consider including it.
  >
  > Thanks,
  > #
  >

  I think that patch looks reasonable and clearly documenting the
  problems with UDP is a wonderful thing.

  It may be best to send it formally to steved and the list as a real
  [PATCH] with a real description and SoB line.

This feels like the wrong place to document this, since it affects anything
that uses udp, not just nfs. It also seems like this should be solved in
the network layer with an adaptive frag time. But I'm not volunteering to
do that.
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Jeff Layton @ 2012-02-28 12:57 UTC
To: Jim Rees; +Cc: Harshula, Steve Dickson, NeilBrown, linux-nfs

On Tue, 28 Feb 2012 07:46:46 -0500
Jim Rees <rees@umich.edu> wrote:

> Jeff Layton wrote:
>
>   On Tue, 28 Feb 2012 16:22:01 +1100
>   Harshula <harshula@redhat.com> wrote:
>
>   > Hi Steve,
>   >
>   > The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
>   > included upstream:
>   >
>   > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
>   >
>   > Please consider including it.
>   >
>   > Thanks,
>   > #
>   >
>
>   I think that patch looks reasonable and clearly documenting the
>   problems with UDP is a wonderful thing.
>
>   It may be best to send it formally to steved and the list as a real
>   [PATCH] with a real description and SoB line.
>
> This feels like the wrong place to document this, since it affects anything
> that uses udp, not just nfs. It also seems like this should be solved in
> the network layer with an adaptive frag time. But I'm not volunteering to
> do that.

Certainly, documenting it in udp(7) or whatever would be fine too. The
problem though is that someone setting up a NFS mount isn't as likely to
see it there as they would if it were in nfs(5). I see no harm in
documenting it here too.

At the very least, if you're going to put this into udp(7) instead then
nfs(5) should refer to that manpage and chapter explicitly.

-- 
Jeff Layton <jlayton@redhat.com>
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Chuck Lever @ 2012-02-28 14:35 UTC
To: Jim Rees; +Cc: Jeff Layton, Harshula, Steve Dickson, NeilBrown, linux-nfs

On Feb 28, 2012, at 7:46 AM, Jim Rees wrote:

> Jeff Layton wrote:
>
> On Tue, 28 Feb 2012 16:22:01 +1100
> Harshula <harshula@redhat.com> wrote:
>
>> Hi Steve,
>>
>> The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
>> included upstream:
>>
>> https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
>>
>> Please consider including it.
>>
>> Thanks,
>> #
>>
>
> I think that patch looks reasonable and clearly documenting the
> problems with UDP is a wonderful thing.
>
> It may be best to send it formally to steved and the list as a real
> [PATCH] with a real description and SoB line.
>
> This feels like the wrong place to document this, since it affects anything
> that uses udp, not just nfs.

NFS has a particular sensitivity to unreliable datagram transports, and
that is a well-known problem. NetApp's retired TR-3183 and many Oracle
meta documents mention the problems with NFS over UDP. Most other uses
of UDP do not involve such large datagrams.

My comment is that if the text in the TRANSPORT METHODS section in
nfs(5) about UDP reassembly is not adequate it should be updated. I
would rather see the meat of the proposed text merged into that
section; otherwise we have two disparate sections discussing the same
topic. That section is where this kind of discussion belongs.

> It also seems like this should be solved in
> the network layer with an adaptive frag time. But I'm not volunteering to
> do that.

As above, most other uses of UDP do not involve large packets. But I
wonder if it is appropriate for us to suggest a change in the default
setting.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
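To put numbers on the point that NFS datagrams are unusually large, here is a minimal Python sketch that counts how many IPv4 fragments a single UDP datagram needs. The helper name `ipv4_fragments` is hypothetical and not part of nfs-utils. It assumes a 20-byte IPv4 header with no options and an 8-byte UDP header, and it ignores the RPC/NFS header bytes, so real NFS datagrams are slightly larger than the payload sizes shown.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram.

    Every fragment except the last must carry a multiple of 8 bytes of
    IP payload, so the usable space per fragment is rounded down to 8.
    """
    ip_payload = udp_payload + UDP_HEADER
    if ip_payload + IPV4_HEADER <= mtu:
        return 1  # fits in a single frame, no fragmentation
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


for size in (512, 4096, 8192, 32768):
    # columns: payload size, fragments at MTU 1500, fragments at MTU 9000
    print(size, ipv4_fragments(size), ipv4_fragments(size, mtu=9000))
```

Under these assumptions a 512-byte datagram fits in one standard Ethernet frame, an 8 KiB read or write payload needs 6 fragments, and a 32 KiB one needs 23. With a 9000-byte MTU the 8 KiB case is no longer fragmented at all, which is the reasoning behind jumbo frames as a mitigation.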
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Jim Rees @ 2012-02-28 15:09 UTC
To: Chuck Lever; +Cc: Jeff Layton, Harshula, Steve Dickson, NeilBrown, linux-nfs

Chuck Lever wrote:

  As above, most other uses of UDP do not involve large packets. But I
  wonder if it is appropriate for us to suggest a change in the default
  setting.

A minute is certainly too long. Just a guess, but 4 seconds seems
appropriate to me. That's good enough for gigabit ethernet. Anyone running
fragmented 10G networks gets what they deserve.

Yes, you can have more than four seconds of data in flight at moderate
speeds, but unless something is terribly wrong the fragments should come in
very close together.
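The 4-second guess can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses a hypothetical helper, `ip_id_wrap_seconds`, and assumes the worst case: a saturated link where every datagram draws a fresh value from one shared 16-bit IP ID counter, with header overhead ignored. Real stacks may allocate IDs differently (for example, per destination), so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
IP_ID_SPACE = 1 << 16  # the IPv4 Identification field is 16 bits wide


def ip_id_wrap_seconds(link_bps: float, datagram_bytes: int) -> float:
    """Seconds before a sender must reuse an IP ID value.

    Assumes a saturated link, one shared ID counter, and no header
    overhead, so this is a rough worst-case estimate.
    """
    datagrams_per_second = link_bps / 8 / datagram_bytes
    return IP_ID_SPACE / datagrams_per_second


for label, bps in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
    for size in (8192, 32768):
        print(f"{label:>10}  {size:>5} B  {ip_id_wrap_seconds(bps, size):6.1f} s")
```

With 8 KiB datagrams on a saturated gigabit link the IDs wrap roughly every 4.3 seconds, consistent with the "about 5 seconds" observation in the openSUSE text and well inside a 30-second reassembly window. The same traffic at 100 Mbit/s takes about 43 seconds to wrap.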
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Chuck Lever @ 2012-02-28 15:50 UTC
To: Harshula Jayasuriya
Cc: Jeff Layton, Jim Rees, Steve Dickson, NeilBrown, Linux NFS Mailing List

On Feb 28, 2012, at 9:35 AM, Chuck Lever wrote:

>
> On Feb 28, 2012, at 7:46 AM, Jim Rees wrote:
>
>> Jeff Layton wrote:
>>
>> On Tue, 28 Feb 2012 16:22:01 +1100
>> Harshula <harshula@redhat.com> wrote:
>>
>>> Hi Steve,
>>>
>>> The following openSUSE nfs-utils patch, warn-nfs-udp.patch, is not
>>> included upstream:
>>>
>>> https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
>>>
>>> Please consider including it.
>>>
>>> Thanks,
>>> #
>>>
>>
>> I think that patch looks reasonable and clearly documenting the
>> problems with UDP is a wonderful thing.
>>
>> It may be best to send it formally to steved and the list as a real
>> [PATCH] with a real description and SoB line.
>>
>> This feels like the wrong place to document this, since it affects anything
>> that uses udp, not just nfs.
>
> NFS has a particular sensitivity to unreliable datagram transports, and
> that is a well-known problem. NetApp's retired TR-3183 and many Oracle
> meta documents mention the problems with NFS over UDP. Most other uses
> of UDP do not involve such large datagrams.
>
> My comment is that if the text in the TRANSPORT METHODS section in
> nfs(5) about UDP reassembly is not adequate it should be updated. I
> would rather see the meat of the proposed text merged into that
> section; otherwise we have two disparate sections discussing the same
> topic. That section is where this kind of discussion belongs.

A few more comments.

Any file, including a /proc file, called out in new text should be
added to the FILES section, IMO.

If we can't resolve the provenance issue, someone could rewrite the
patch from scratch so that it addresses the review comments.

I don't agree with adding in-code warnings. Mount works silently
unless it fails, and this is not a mount failure. Would such warnings
ever be seen for NFS mounts added to /etc/fstab, or performed by
automounter? I think by and large most people type "mount -t nfs"
without options and will get our current default transport setting,
which is TCP, or UDP if the server does not support TCP. Isn't that
adequate?

We also know that the risk of using UDP is mitigated by using jumbo
frames, specifying a small r/wsize, or by reducing the fragment
reassembly timeout. If an admin does those things, she still gets the
warning.

It seems needlessly alarmist, and useless for our most common use
cases.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
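The three mitigations listed above (jumbo frames, a small r/wsize, a shorter reassembly timeout) can be folded into one rough check. The function below is a hypothetical illustration, not anything in nfs-utils or the kernel. It asks whether, on a saturated link with a single 16-bit ID counter, the IP ID space can wrap while an incomplete datagram is still waiting for its fragments. The `ipfrag_time` argument stands for the value in /proc/sys/net/ipv4/ipfrag_time, and RPC header overhead is ignored.

```python
IP_ID_SPACE = 1 << 16  # 16-bit IPv4 Identification field
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes


def misreassembly_possible(link_bps: float, udp_payload: int,
                           mtu: int = 1500, ipfrag_time: float = 30.0) -> bool:
    """Rough worst-case check for IP ID reuse inside the reassembly window."""
    if udp_payload + UDP_HEADER + IPV4_HEADER <= mtu:
        # Jumbo frames or a small r/wsize: the datagram is never fragmented,
        # so there is nothing to reassemble incorrectly.
        return False
    # Time for a saturated link to cycle through every IP ID value.
    wrap_seconds = IP_ID_SPACE * udp_payload * 8 / link_bps
    # Risky if IDs repeat before stale fragments are discarded.
    return wrap_seconds <= ipfrag_time


print(misreassembly_possible(1e9, 8192))                 # default settings -> True
print(misreassembly_possible(1e9, 8192, mtu=9000))       # jumbo frames -> False
print(misreassembly_possible(1e9, 1024))                 # small r/wsize -> False
print(misreassembly_possible(1e9, 8192, ipfrag_time=2))  # shorter timeout -> False
```

Under this model, a smaller r/wsize that still needs fragmentation actually makes things worse, because more datagrams per second means a faster ID wrap. It only helps once each datagram fits in a single frame.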
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Harshula @ 2012-03-05 2:17 UTC
To: Chuck Lever
Cc: Jeff Layton, Jim Rees, Steve Dickson, NeilBrown, Linux NFS Mailing List

On Tue, 2012-02-28 at 10:50 -0500, Chuck Lever wrote:
> On Feb 28, 2012, at 9:35 AM, Chuck Lever wrote:

> > My comment is that if the text in the TRANSPORT METHODS section in
> nfs(5) about UDP reassembly is not adequate it should be updated. I
> would rather see the meat of the proposed text merged into that
> section; otherwise we have two disparate sections discussing the same
> topic. That section is where this kind of discussion belongs.

Good point. I'll try to massage the text into that section.

> A few more comments.
>
> Any file, including a /proc file, called out in new text should be
> added to the FILES section, IMO.
>
> If we can't resolve the provenance issue, someone could rewrite the
> patch from scratch so that it addresses the review comments.

We now know who authored (Olaf Kirch) and committed (Mads Martin
Joergensen) the text at SUSE. Do we need to get a sign-off from someone
at SUSE?

> I don't agree with adding in-code warnings. Mount works silently
> unless it fails, and this is not a mount failure. Would such warnings
> ever be seen for NFS mounts added to /etc/fstab, or performed by
> automounter? I think by and large most people type "mount -t nfs"
> without options and will get our current default transport setting,
> which is TCP, or UDP if the server does not support TCP. Isn't that
> adequate?
>
> We also know that the risk of using UDP is mitigated by using jumbo
> frames, specifying a small r/wsize, or by reducing the fragment
> reassembly timeout. If an admin does those things, she still gets the
> warning.
>
> It seems needlessly alarmist, and useless for our most common use
> cases.

Sounds reasonable. Just the man page text then.

cya,
#
* Re: "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption."
From: Chuck Lever @ 2012-03-05 15:08 UTC
To: Harshula
Cc: Jeff Layton, Jim Rees, Steve Dickson, NeilBrown, Linux NFS Mailing List

On Mar 4, 2012, at 9:17 PM, Harshula wrote:

> On Tue, 2012-02-28 at 10:50 -0500, Chuck Lever wrote:
>> On Feb 28, 2012, at 9:35 AM, Chuck Lever wrote:
>
>>> My comment is that if the text in the TRANSPORT METHODS section in
>> nfs(5) about UDP reassembly is not adequate it should be updated. I
>> would rather see the meat of the proposed text merged into that
>> section; otherwise we have two disparate sections discussing the same
>> topic. That section is where this kind of discussion belongs.
>
> Good point. I'll try to massage the text into that section.

Thanks.

>> A few more comments.
>>
>> Any file, including a /proc file, called out in new text should be
>> added to the FILES section, IMO.
>>
>> If we can't resolve the provenance issue, someone could rewrite the
>> patch from scratch so that it addresses the review comments.
>
> We now know who authored (Olaf Kirch) and committed (Mads Martin
> Joergensen) the text at SUSE. Do we need to get a sign-off from someone
> at SUSE?

IMO if Olaf is still there, he can send a SOB. But IANAL.

>> I don't agree with adding in-code warnings. Mount works silently
>> unless it fails, and this is not a mount failure. Would such warnings
>> ever be seen for NFS mounts added to /etc/fstab, or performed by
>> automounter? I think by and large most people type "mount -t nfs"
>> without options and will get our current default transport setting,
>> which is TCP, or UDP if the server does not support TCP. Isn't that
>> adequate?
>>
>> We also know that the risk of using UDP is mitigated by using jumbo
>> frames, specifying a small r/wsize, or by reducing the fragment
>> reassembly timeout. If an admin does those things, she still gets the
>> warning.
>>
>> It seems needlessly alarmist, and useless for our most common use
>> cases.
>
> Sounds reasonable. Just the man page text then.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links
From: Harshula Jayasuriya @ 2012-05-09 0:59 UTC
To: Steve Dickson
Cc: Jeff Layton, Linux NFS Mailing List, Chuck Lever, Olaf Kirch

* Using NFS over UDP on high-speed links such as Gigabit can cause
  silent data corruption.
* The man page text was written by Olaf Kirch and committed to (but not
  upstream):
https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Olaf Kirch <okir@suse.com>
---
 utils/mount/nfs.man |   81 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index 0d20cf0..87e27e1 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -500,6 +500,8 @@ Specifying a netid that uses TCP forces all traffic from the
 command and the NFS client to use TCP.
 Specifying a netid that uses UDP forces all traffic types to use UDP.
 .IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
+.IP
 If the
 .B proto
 mount option is not specified, the
@@ -514,6 +516,8 @@ The
 option is an alternative to specifying
 .BR proto=udp.
 It is included for compatibility with other operating systems.
+.IP
+.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
 .TP 1.5i
 .B tcp
 The
@@ -1070,6 +1074,83 @@ or
 options are specified more than once on the same mount command line,
 then the value of the rightmost instance of each of these options
 takes effect.
+.SS "Using NFS over UDP on high-speed links"
+Using NFS over UDP on high-speed links such as Gigabit
+.BR "can cause silent data corruption" .
+.P
+The problem can be triggered at high loads, and is caused by problems in
+IP fragment reassembly. NFS read and writes typically transmit UDP packets
+of 4 Kilobytes or more, which have to be broken up into several fragments
+in order to be sent over the Ethernet link, which limits packets to 1500
+bytes by default. This process happens at the IP network layer and is
+called fragmentation.
+.P
+In order to identify fragments that belong together, IP assigns a 16bit
+.I IP ID
+value to each packet; fragments generated from the same UDP packet
+will have the same IP ID. The receiving system will collect these
+fragments and combine them to form the original UDP packet. This process
+is called reassembly. The default timeout for packet reassembly is
+30 seconds; if the network stack does not receive all fragments of
+a given packet within this interval, it assumes the missing fragment(s)
+got lost and discards those it already received.
+.P
+The problem this creates over high-speed links is that it is possible
+to send more than 65536 packets within 30 seconds. In fact, with
+heavy NFS traffic one can observe that the IP IDs repeat after about
+5 seconds.
+.P
+This has serious effects on reassembly: if one fragment gets lost,
+another fragment
+.I from a different packet
+but with the
+.I same IP ID
+will arrive within the 30 second timeout, and the network stack will
+combine these fragments to form a new packet. Most of the time, network
+layers above IP will detect this mismatched reassembly - in the case
+of UDP, the UDP checksum, which is a 16 bit checksum over the entire
+packet payload, will usually not match, and UDP will discard the
+bad packet.
+.P
+However, the UDP checksum is 16 bit only, so there is a chance of 1 in
+65536 that it will match even if the packet payload is completely
+random (which very often isn't the case). If that is the case,
+silent data corruption will occur.
+.P
+This potential should be taken seriously, at least on Gigabit
+Ethernet.
+Network speeds of 100Mbit/s should be considered less
+problematic, because with most traffic patterns IP ID wrap around
+will take much longer than 30 seconds.
+.P
+It is therefore strongly recommended to use
+.BR "NFS over TCP where possible" ,
+since TCP does not perform fragmentation.
+.P
+If you absolutely have to use NFS over UDP over Gigabit Ethernet,
+some steps can be taken to mitigate the problem and reduce the
+probability of corruption:
+.TP +1.5i
+.I Jumbo frames:
+Many Gigabit network cards are capable of transmitting
+frames bigger than the 1500 byte limit of traditional Ethernet, typically
+9000 bytes. Using jumbo frames of 9000 bytes will allow you to run NFS over
+UDP at a page size of 8K without fragmentation. Of course, this is
+only feasible if all involved stations support jumbo frames.
+.IP
+To enable a machine to send jumbo frames on cards that support it,
+it is sufficient to configure the interface for a MTU value of 9000.
+.TP +1.5i
+.I Lower reassembly timeout:
+By lowering this timeout below the time it takes the IP ID counter
+to wrap around, incorrect reassembly of fragments can be prevented
+as well. To do so, simply write the new timeout value (in seconds)
+to the file
+.BR /proc/sys/net/ipv4/ipfrag_time .
+.IP
+A value of 2 seconds will greatly reduce the probability of IPID clashes on
+a single Gigabit link, while still allowing for a reasonable timeout
+when receiving fragmented traffic from distant peers.
 .SH "DATA AND METADATA COHERENCE"
 Some modern cluster file systems provide
 perfect cache coherence among their clients.
-- 
1.7.7.6
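The point about the 16-bit UDP checksum can be illustrated with the RFC 1071 Internet checksum, a one's-complement sum of 16-bit words. The Python sketch below is simplified: the real UDP checksum also covers a pseudo-header and the UDP header, which are left out here. It shows one concrete way a different payload can pass the same check: reordering aligned 16-bit words leaves the checksum unchanged. More generally, a substituted fragment goes undetected whenever its words sum to the same 16-bit value as the lost one, and fragment offsets are multiples of 8 bytes, so substitution preserves word alignment.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 16-bit one's-complement checksum over `data`.

    Simplified: the real UDP checksum also includes a pseudo-header and
    the UDP header itself; only the payload bytes are summed here.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
reordered = bytes.fromhex("56781234")  # same words, different order
print(hex(internet_checksum(original)))   # 0x9753
print(hex(internet_checksum(reordered)))  # 0x9753
```

With random payloads such a match happens by chance about once in 65536 mis-reassembled datagrams, which is the 1-in-65536 figure quoted in the patch text.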
* Re: [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links 2012-05-09 0:59 ` [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links Harshula Jayasuriya @ 2012-05-09 18:14 ` Steve Dickson 2012-05-09 18:38 ` Peter Staubach 1 sibling, 0 replies; 16+ messages in thread From: Steve Dickson @ 2012-05-09 18:14 UTC (permalink / raw) To: Harshula Jayasuriya Cc: Jeff Layton, Linux NFS Mailing List, Chuck Lever, Olaf Kirch On 05/08/2012 08:59 PM, Harshula Jayasuriya wrote: > * Using NFS over UDP on high-speed links such as Gigabit can cause > silent data corruption. > * The man page text was written by Olaf Kirch and committed to (but not > upstream): > https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb > > Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> > Acked-by: Chuck Lever <chuck.lever@oracle.com> > Signed-off-by: Olaf Kirch <okir@suse.com> Committed... steved. > --- > utils/mount/nfs.man | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 81 insertions(+), 0 deletions(-) > > diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man > index 0d20cf0..87e27e1 100644 > --- a/utils/mount/nfs.man > +++ b/utils/mount/nfs.man > @@ -500,6 +500,8 @@ Specifying a netid that uses TCP forces all traffic from the > command and the NFS client to use TCP. > Specifying a netid that uses UDP forces all traffic types to use UDP. > .IP > +.B Before using NFS over UDP, refer to the TRANSPORT METHODS section. > +.IP > If the > .B proto > mount option is not specified, the > @@ -514,6 +516,8 @@ The > option is an alternative to specifying > .BR proto=udp. > It is included for compatibility with other operating systems. > +.IP > +.B Before using NFS over UDP, refer to the TRANSPORT METHODS section. 
> .TP 1.5i
> .B tcp
> The
> @@ -1070,6 +1074,83 @@ or
> options are specified more than once on the same mount command line,
> then the value of the rightmost instance of each of these options
> takes effect.
> +.SS "Using NFS over UDP on high-speed links"
> +Using NFS over UDP on high-speed links such as Gigabit
> +.BR "can cause silent data corruption" .
> +.P
> +The problem can be triggered at high loads and stems from the way IP
> +reassembles fragments. NFS reads and writes typically transmit UDP
> +packets of 4 kilobytes or more, which have to be broken up into several
> +fragments to be sent over an Ethernet link, since Ethernet limits
> +packets to 1500 bytes by default. This process happens at the IP
> +network layer and is called fragmentation.
> +.P
> +In order to identify fragments that belong together, IP assigns a 16-bit
> +.I IP ID
> +value to each packet; fragments generated from the same UDP packet
> +will have the same IP ID. The receiving system will collect these
> +fragments and combine them to form the original UDP packet. This process
> +is called reassembly. The default timeout for packet reassembly is
> +30 seconds; if the network stack does not receive all fragments of
> +a given packet within this interval, it assumes the missing fragment(s)
> +got lost and discards those it already received.
> +.P
> +The problem this creates over high-speed links is that it is possible
> +to send more than 65536 packets within 30 seconds. In fact, with
> +heavy NFS traffic one can observe that the IP IDs repeat after about
> +5 seconds.
> +.P
> +This has serious effects on reassembly: if one fragment gets lost,
> +another fragment
> +.I from a different packet
> +but with the
> +.I same IP ID
> +may arrive within the 30-second timeout, and the network stack will
> +combine these fragments to form a new packet. Most of the time, network
> +layers above IP will detect this mismatched reassembly. In the case
> +of UDP, the UDP checksum, which is a 16-bit checksum over the entire
> +packet payload, will usually not match, and UDP will discard the
> +bad packet.
> +.P
> +However, the UDP checksum is only 16 bits, so even for a completely
> +random payload there is a 1 in 65536 chance that it will still match;
> +real payloads are very often not random, which can make a match more
> +likely. When it does match, silent data corruption will occur.
> +.P
> +This potential should be taken seriously, at least on Gigabit
> +Ethernet.
> +Network speeds of 100Mbit/s should be considered less
> +problematic, because with most traffic patterns IP ID wraparound
> +will take much longer than 30 seconds.
> +.P
> +It is therefore strongly recommended to use
> +.BR "NFS over TCP where possible" ,
> +since TCP does not rely on IP fragmentation.
> +.P
> +If you absolutely have to use NFS over UDP over Gigabit Ethernet,
> +some steps can be taken to mitigate the problem and reduce the
> +probability of corruption:
> +.TP +1.5i
> +.I Jumbo frames:
> +Many Gigabit network cards are capable of transmitting
> +frames bigger than the 1500-byte limit of traditional Ethernet, typically
> +9000 bytes. Using jumbo frames of 9000 bytes will allow you to run NFS over
> +UDP with an rsize and wsize of 8K without fragmentation. Of course, this is
> +only feasible if all involved hosts and switches support jumbo frames.
> +.IP
> +To enable a machine to send jumbo frames on cards that support it,
> +it is sufficient to configure the interface for an MTU of 9000.
> +.TP +1.5i
> +.I Lower reassembly timeout:
> +By lowering the reassembly timeout below the time it takes the IP ID
> +counter to wrap around, incorrect reassembly of fragments can be
> +prevented as well. To do so, write the new timeout value (in seconds)
> +to the file
> +.BR /proc/sys/net/ipv4/ipfrag_time .
> +.IP
> +A value of 2 seconds will greatly reduce the probability of IP ID clashes
> +on a single Gigabit link, while still allowing for a reasonable timeout
> +when receiving fragmented traffic from distant peers.
> .SH "DATA AND METADATA COHERENCE"
> Some modern cluster file systems provide
> perfect cache coherence among their clients.
> -- 
> 1.7.7.6
> 
^ permalink raw reply [flat|nested] 16+ messages in thread
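The wraparound figures in the patch (IP IDs repeating after about 5 seconds under heavy NFS load on Gigabit, much longer at 100Mbit/s) can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is only an estimate under assumptions that are not from the thread: each NFS READ/WRITE is one UDP datagram whose 8192-byte payload stands in for the whole RPC message, the link is saturated, every datagram takes one value from a single 16-bit IP ID counter, and Ethernet framing overhead is ignored (which makes the estimate slightly pessimistic). The helper names `fragments_per_datagram` and `id_wrap_seconds` are hypothetical.

```python
"""Back-of-the-envelope IP ID wraparound estimate for NFS over UDP.

Assumptions (not from the thread): every NFS READ/WRITE is one UDP
datagram carrying PAYLOAD bytes, the link is saturated, each datagram
uses one value from a single 16-bit IP ID counter, and Ethernet framing
overhead is ignored.
"""

IP_ID_SPACE = 2 ** 16   # the IPv4 Identification field is 16 bits wide
IPV4_HEADER = 20        # bytes, no IP options
UDP_HEADER = 8          # bytes


def fragments_per_datagram(payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    udp_len = payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return -(-udp_len // per_fragment)  # ceiling division


def id_wrap_seconds(link_bps: float, payload: int, mtu: int) -> float:
    """Seconds until the IP ID counter wraps on a saturated link."""
    frags = fragments_per_datagram(payload, mtu)
    # Each fragment carries its own IPv4 header; the UDP header appears once.
    wire_bytes = payload + UDP_HEADER + frags * IPV4_HEADER
    datagrams_per_second = link_bps / 8 / wire_bytes
    return IP_ID_SPACE / datagrams_per_second


if __name__ == "__main__":
    PAYLOAD = 8192  # hypothetical 8K read/write size
    for name, bps in (("1 Gbit/s", 1e9), ("100 Mbit/s", 1e8)):
        print(f"{name}: {fragments_per_datagram(PAYLOAD, 1500)} fragments, "
              f"IP IDs wrap after ~{id_wrap_seconds(bps, PAYLOAD, 1500):.1f} s")
```

Run as a script, it prints about 4.4 s for the Gigabit case and about 43.6 s for 100 Mbit/s: the first is in the same few-second range the patch reports and well inside the default 30-second reassembly timeout, the second stays above it. Smaller payloads mean more datagrams per second and therefore faster wraparound.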
* RE: [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links 2012-05-09 0:59 ` [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links Harshula Jayasuriya 2012-05-09 18:14 ` Steve Dickson @ 2012-05-09 18:38 ` Peter Staubach 2012-05-09 22:16 ` Harshula 1 sibling, 1 reply; 16+ messages in thread From: Peter Staubach @ 2012-05-09 18:38 UTC (permalink / raw) To: Harshula Jayasuriya, Steve Dickson Cc: Jeff Layton, Linux NFS Mailing List, Chuck Lever, Olaf Kirch

Hi.

I thought that we had previously discussed whether or not to include this sort of text and had come to the conclusion to not include it because the problem is not new or unique to NFS. It is a general networking issue. Am I remembering incorrectly?

	Thanx...

		ps

^ permalink raw reply [flat|nested] 16+ messages in thread
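The checksum part of the argument is not NFS-specific: the UDP checksum is the 16-bit ones'-complement Internet checksum described in RFC 1071, and because ones'-complement addition is commutative, a substituted fragment whose 16-bit words add up to the same total goes undetected. The Python sketch below illustrates this with made-up payload bytes (`first`, `lost` and `stray` are hypothetical); it leaves out the UDP pseudo-header, which would be identical for both reassemblies in this example. Fragment boundaries always fall on 8-byte offsets, so the substituted bytes stay aligned to 16-bit words.

```python
"""Why a mismatched reassembly can slip past the UDP checksum.

Illustrative only: the payload bytes are made up, and the UDP
pseudo-header is omitted because it would be the same for both
reassembled datagrams in this example.
"""


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


# First fragment, delivered intact (hypothetical contents, 16 bytes).
first = b"READ reply hdr.."
# The fragment that was lost, and a stray fragment from another datagram
# carrying the same IP ID. Their 16-bit words are the same values in a
# different order, so they add up to the same ones'-complement total.
lost = bytes.fromhex("1234567890ab")
stray = bytes.fromhex("90ab56781234")

good = first + lost
bad = first + stray

print(good != bad)                                        # True
print(internet_checksum(good) == internet_checksum(bad))  # True
```

For a random substitution the odds of such a match are about 1 in 65536, as the patch says; data made of repeated or permuted 16-bit words makes coincidences like this one less unusual.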
* RE: [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links 2012-05-09 18:38 ` Peter Staubach @ 2012-05-09 22:16 ` Harshula 0 siblings, 0 replies; 16+ messages in thread From: Harshula @ 2012-05-09 22:16 UTC (permalink / raw) To: Peter Staubach Cc: Steve Dickson, Jeff Layton, Linux NFS Mailing List, Chuck Lever, Olaf Kirch

Hi Peter!

On Wed, 2012-05-09 at 14:38 -0400, Peter Staubach wrote:
> Hi.
> 
> I thought that we had previously discussed whether or not to include
> this sort of text and had come to the conclusion to not include it
> because the problem is not new or unique to NFS. It is a general
> networking issue. Am I remembering incorrectly?

This was from the most recent discussion:
http://article.gmane.org/gmane.linux.nfs/47349
http://article.gmane.org/gmane.linux.nfs/47350

cya,
#

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Harshula Jayasuriya
> Sent: Tuesday, May 08, 2012 8:59 PM
> To: Steve Dickson
> Cc: Jeff Layton; Linux NFS Mailing List; Chuck Lever; Olaf Kirch
> Subject: [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links
> 
> * Using NFS over UDP on high-speed links such as Gigabit can cause
> silent data corruption.
> * The man page text was written by Olaf Kirch and committed to (but not
> upstream):
> https://build.opensuse.org/package/view_file?file=warn-nfs-udp.patch&package=nfs-utils&project=openSUSE%3AFactory&rev=8e3e60c70e8270cd4afa036e13f6b2bb
> 
> Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Olaf Kirch <okir@suse.com>
> ---
> utils/mount/nfs.man | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 81 insertions(+), 0 deletions(-)
> 
> diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
> index 0d20cf0..87e27e1 100644
> --- a/utils/mount/nfs.man
> +++ b/utils/mount/nfs.man
> @@ -500,6 +500,8 @@ Specifying a netid that uses TCP forces all traffic from the
> command and the NFS client to use TCP.
> Specifying a netid that uses UDP forces all traffic types to use UDP.
> .IP
> +.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
> +.IP
> If the
> .B proto
> mount option is not specified, the
> @@ -514,6 +516,8 @@ The
> option is an alternative to specifying
> .BR proto=udp.
> It is included for compatibility with other operating systems.
> +.IP
> +.B Before using NFS over UDP, refer to the TRANSPORT METHODS section.
> .TP 1.5i
> .B tcp
> The
> @@ -1070,6 +1074,83 @@ or
> options are specified more than once on the same mount command line,
> then the value of the rightmost instance of each of these options
> takes effect.
> +.SS "Using NFS over UDP on high-speed links"
> +Using NFS over UDP on high-speed links such as Gigabit
> +.BR "can cause silent data corruption" .
> +.P
> +The problem can be triggered at high loads and stems from the way IP
> +reassembles fragments. NFS reads and writes typically transmit UDP
> +packets of 4 kilobytes or more, which have to be broken up into several
> +fragments to be sent over an Ethernet link, since Ethernet limits
> +packets to 1500 bytes by default. This process happens at the IP
> +network layer and is called fragmentation.
> +.P
> +In order to identify fragments that belong together, IP assigns a 16-bit
> +.I IP ID
> +value to each packet; fragments generated from the same UDP packet
> +will have the same IP ID. The receiving system will collect these
> +fragments and combine them to form the original UDP packet. This process
> +is called reassembly. The default timeout for packet reassembly is
> +30 seconds; if the network stack does not receive all fragments of
> +a given packet within this interval, it assumes the missing fragment(s)
> +got lost and discards those it already received.
> +.P
> +The problem this creates over high-speed links is that it is possible
> +to send more than 65536 packets within 30 seconds. In fact, with
> +heavy NFS traffic one can observe that the IP IDs repeat after about
> +5 seconds.
> +.P
> +This has serious effects on reassembly: if one fragment gets lost,
> +another fragment
> +.I from a different packet
> +but with the
> +.I same IP ID
> +may arrive within the 30-second timeout, and the network stack will
> +combine these fragments to form a new packet. Most of the time, network
> +layers above IP will detect this mismatched reassembly. In the case
> +of UDP, the UDP checksum, which is a 16-bit checksum over the entire
> +packet payload, will usually not match, and UDP will discard the
> +bad packet.
> +.P
> +However, the UDP checksum is only 16 bits, so even for a completely
> +random payload there is a 1 in 65536 chance that it will still match;
> +real payloads are very often not random, which can make a match more
> +likely. When it does match, silent data corruption will occur.
> +.P
> +This potential should be taken seriously, at least on Gigabit
> +Ethernet.
> +Network speeds of 100Mbit/s should be considered less
> +problematic, because with most traffic patterns IP ID wraparound
> +will take much longer than 30 seconds.
> +.P
> +It is therefore strongly recommended to use
> +.BR "NFS over TCP where possible" ,
> +since TCP does not rely on IP fragmentation.
> +.P
> +If you absolutely have to use NFS over UDP over Gigabit Ethernet,
> +some steps can be taken to mitigate the problem and reduce the
> +probability of corruption:
> +.TP +1.5i
> +.I Jumbo frames:
> +Many Gigabit network cards are capable of transmitting
> +frames bigger than the 1500-byte limit of traditional Ethernet, typically
> +9000 bytes. Using jumbo frames of 9000 bytes will allow you to run NFS over
> +UDP with an rsize and wsize of 8K without fragmentation. Of course, this is
> +only feasible if all involved hosts and switches support jumbo frames.
> +.IP
> +To enable a machine to send jumbo frames on cards that support it,
> +it is sufficient to configure the interface for an MTU of 9000.
> +.TP +1.5i
> +.I Lower reassembly timeout:
> +By lowering the reassembly timeout below the time it takes the IP ID
> +counter to wrap around, incorrect reassembly of fragments can be
> +prevented as well. To do so, write the new timeout value (in seconds)
> +to the file
> +.BR /proc/sys/net/ipv4/ipfrag_time .
> +.IP
> +A value of 2 seconds will greatly reduce the probability of IP ID clashes
> +on a single Gigabit link, while still allowing for a reasonable timeout
> +when receiving fragmented traffic from distant peers.
> .SH "DATA AND METADATA COHERENCE"
> Some modern cluster file systems provide
> perfect cache coherence among their clients.
> -- 
> 1.7.7.6

^ permalink raw reply [flat|nested] 16+ messages in thread
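The "Lower reassembly timeout" mitigation amounts to writing a number of seconds to /proc/sys/net/ipv4/ipfrag_time, which is the `net.ipv4.ipfrag_time` sysctl, so `sysctl -w net.ipv4.ipfrag_time=2` does the same thing. A minimal Python sketch of that step follows, assuming root privileges on a Linux host. The helper name `set_ipfrag_time` and its `path` parameter are hypothetical; the parameter exists only so the function can be tried against an ordinary file. The value does not survive a reboot unless it is also added to the sysctl configuration (for example /etc/sysctl.conf).

```python
"""Lower the IPv4 fragment reassembly timeout (Linux, needs root).

set_ipfrag_time() is a hypothetical helper; the path is a parameter
only so the function can be tried against a regular file.
"""
from pathlib import Path

IPFRAG_TIME = Path("/proc/sys/net/ipv4/ipfrag_time")


def set_ipfrag_time(seconds: int, path: Path = IPFRAG_TIME) -> int:
    """Write a new reassembly timeout in seconds and return the old one."""
    # Our own sanity check, not a rule enforced by the kernel.
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    previous = int(path.read_text().strip())
    path.write_text(f"{seconds}\n")
    return previous
```

Called as root, `set_ipfrag_time(2)` applies the 2-second value suggested in the patch and returns the previous setting (30 seconds by default, according to the patch text), which can be restored later with the same call.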
end of thread, other threads:[~2012-05-09 22:16 UTC | newest]

Thread overview: 16+ messages
2012-02-28 5:22 "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption." Harshula
2012-02-28 11:52 ` Jeff Layton
2012-02-28 12:32 ` Harshula
2012-02-28 12:41 ` Jeff Layton
2012-03-05 1:56 ` Harshula
2012-02-28 12:46 ` Jim Rees
2012-02-28 12:57 ` Jeff Layton
2012-02-28 14:35 ` Chuck Lever
2012-02-28 15:09 ` Jim Rees
2012-02-28 15:50 ` Chuck Lever
2012-03-05 2:17 ` Harshula
2012-03-05 15:08 ` Chuck Lever
2012-05-09 0:59 ` [PATCH] nfs-utils: Add a warning to the nfs manpage regarding using NFS over UDP on high-speed links Harshula Jayasuriya
2012-05-09 18:14 ` Steve Dickson
2012-05-09 18:38 ` Peter Staubach
2012-05-09 22:16 ` Harshula