netdev.vger.kernel.org archive mirror
* PMTU issues due to TOS field manipulation (for DSCP)
@ 2003-12-10 18:23 Kevin W. Rudd
  2003-12-10 19:34 ` Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Kevin W. Rudd @ 2003-12-10 18:23 UTC (permalink / raw)
  To: David Miller, Alexey Kuznetsov; +Cc: netdev


Here is a recent posting to linux-kernel related to PMTU issues when
routers use DSCP marking.

> Subject Yet another UDP pmtud iss, it's different, really
> Date Sun, 16 Nov 2003 22:25:08 -0800
> From "Johnson, Chester F" <chester.f.johnson@intel.com>
> 
> This is not the same as the pmtud issues discussed ad nauseam from 1999
> through 2001. It really is different. Trust me, please read on.
> 
> Well, it is similar, but with a twist. We are in the middle of deploying
> DiffServ compliant QoS throughout our networks and stumbled across an
> issue that occurs when we configure our routers to mark the DiffServ
> Code Points (DSCP) for UDP traffic (AFS, NFS, other full frame size UDP
> traffic).
> 
> The problem is that when the marked traffic reaches an IPsec/Ethernet
> segment with the DF bit set, an ICMP message is returned to the
> transmitting host to say basically "fix your MTU". Since we have changed
> the ToS field with DSCP information, the ICMP message no longer matches
> anything in the route cache hash. If the ToS field is not "0", it must
> match src, dst, and ToS in the cache. Well, we changed one of them and
> there can be no such match.
> 
> The net result is that the transmitting host sends another 1500 byte
> packet and the process repeats itself. Ultimately the data transfer
> fails. When we stop DSCP marking, MTU negotiation works just fine, but
> we have no QoS.
> 
> This kind of match might be great if we use a Linux platform as a
> router. It may indeed be useful for higher performance DiffServ routing.
> This kind of match requirement for an end-host is problematic. In our
> estimation it looks like a bug.
> 
> Can anyone out there help sort this out?
> 
> Chester Johnson
> Network Transport Engineering
> Intel Corporation
> 

At least in the case of a "Destination unreachable/Fragmentation
needed" ICMP message, there is an assumption that the TOS value
returned will not have changed.  The ip_rt_frag_needed() routine
will fail to find a cached route with a matching TOS (since it has
been changed by the DSCP marking) and so the MTU will not be properly
updated.  Given that the DS field definition supersedes the previous
TOS definitions, this does indeed present a problem for route modifying
ICMP messages in a network environment that is using DSCP marking.
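
The failure mode described above can be modelled in a few lines of C. This is
only a sketch under stated assumptions: struct cache_entry, cache_find() and
frag_needed() are hypothetical stand-ins for illustration, not the kernel's
route cache or ip_rt_frag_needed(), and the TOS value 0x28 is just an example
of a router-applied marking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache entry; not the kernel's route structure. */
struct cache_entry {
    uint32_t saddr, daddr;
    uint8_t  tos;    /* TOS the host used when the entry was created */
    uint16_t pmtu;
};

/* Exact-match lookup on (saddr, daddr, tos), as described in the thread. */
static struct cache_entry *cache_find(struct cache_entry *tbl, size_t n,
                                      uint32_t saddr, uint32_t daddr,
                                      uint8_t tos)
{
    assert(tbl != NULL);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].saddr == saddr && tbl[i].daddr == daddr &&
            tbl[i].tos == tos)
            return &tbl[i];
    return NULL;
}

/*
 * Model of handling "fragmentation needed": quoted_tos is the TOS byte of
 * the IP header quoted inside the ICMP message. Returns 1 if a PMTU was
 * updated, 0 if no cache entry matched.
 */
static int frag_needed(struct cache_entry *tbl, size_t n,
                       uint32_t saddr, uint32_t daddr,
                       uint8_t quoted_tos, uint16_t new_mtu)
{
    struct cache_entry *e = cache_find(tbl, n, saddr, daddr, quoted_tos);
    if (e == NULL)
        return 0;            /* remarked TOS: the update is silently lost */
    if (new_mtu < e->pmtu)
        e->pmtu = new_mtu;
    return 1;
}
```

With an entry created for TOS 0x00, a call that passes the remarked value
(for example 0x28) returns 0 and leaves the PMTU at 1500, so the host keeps
sending full-size packets.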

This particular user would like the ability to turn off the caching of
the TOS value within the routing tables.  On the surface, it looks like
simply manipulating the IPTOS_RT_MASK would accomplish what they are
looking for with minimal code changes.  This can currently be done with
something equivalent to the following in include/net/route.h:

#ifndef CONFIG_IP_ROUTE_TOS
#define IPTOS_RT_MASK   0
#endif

and then rebuilding the kernel with CONFIG_IP_ROUTE_TOS unset.  If
the approach of zeroing out the TOS routing mask seems reasonable, a
more dynamic approach would be preferable (a sysctl variable that can be
modified without having to build a custom kernel).
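
As a rough illustration of the "more dynamic approach", the mask could be a
runtime variable rather than a compile-time constant. The names rt_tos_mask,
route_key_tos() and tos_keys_match() below are hypothetical, as is the default
value 0x1C; nothing here is an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical runtime knob standing in for the proposed sysctl.
 * 0x1C is an assumed default chosen for illustration only.
 */
static uint8_t rt_tos_mask = 0x1C;

/* TOS bits that take part in the route cache key. */
static uint8_t route_key_tos(uint8_t tos)
{
    return tos & rt_tos_mask;
}

/* Would a packet sent with sent_tos and quoted back with quoted_tos match? */
static int tos_keys_match(uint8_t sent_tos, uint8_t quoted_tos)
{
    return route_key_tos(sent_tos) == route_key_tos(quoted_tos);
}

/* Usage: the same pair of TOS values before and after zeroing the mask. */
static void mask_demo(void)
{
    rt_tos_mask = 0x1C;
    assert(!tos_keys_match(0x00, 0x28));  /* remarked packet misses */
    rt_tos_mask = 0;
    assert(tos_keys_match(0x00, 0x28));   /* TOS ignored: keys agree */
}
```

Zeroing the mask makes every TOS value map to the same key, which is the
effect the compile-time change above achieves, at the cost of no longer being
able to route on TOS at all.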

Long term, does it really make sense to continue trying to make routing
decisions based on TOS when this field is obsolete?

Thoughts?  Comments?

Thanks,
       -Kevin

--
 Kevin W. Rudd
 Linux Change Team
 IBM Global Services
 1-800-426-7378,  T/L 775-4161

* RE: PMTU issues due to TOS field manipulation (for DSCP)
@ 2003-12-10 21:35 Johnson, Chester F
  0 siblings, 0 replies; 31+ messages in thread
From: Johnson, Chester F @ 2003-12-10 21:35 UTC (permalink / raw)
  To: Andi Kleen, Nivedita Singhvi; +Cc: ruddk, netdev

From a corporate network perspective there are a couple of options that
can temporarily work around the problem. As an example, support for
jumbo frames on Ethernet IP-Sec interfaces. This would require some
hardware replacement, but GigE generally supports this. On the other
hand, I'm not sure I want to comprehend IP-Sec at GigE line rates.

We are pretty early in deploying diff-serv compliant QoS at this scale.
We are also one of few companies that mandate encrypted private WAN
links (Only the paranoid survive). As we look forward, we would expect
more frequent appearance of this behavior at other companies. 

All that said, an evolutionary roll-out is OK. Knowing that there is a
target release that would resolve the issue gives us something to look
forward to. It may also be a good idea to submit some clarification in
the relevant RFCs.

chet




-----Original Message-----
From: Andi Kleen [mailto:ak@suse.de] 
Sent: Wednesday, December 10, 2003 12:55 PM
To: Nivedita Singhvi
Cc: ruddk@us.ibm.com; netdev@oss.sgi.com; Johnson, Chester F
Subject: Re: PMTU issues due to TOS field manipulation (for DSCP)


On Wed, 10 Dec 2003 12:30:39 -0800
Nivedita Singhvi <niv@us.ibm.com> wrote:
 
> > I don't think the network users will be very happy if you require 
> > changing all end hosts this way. How about the following hack? For 
> > DF=1 packets the ipid field is useless. When you rewrite the TOS to 
> > DSCP save the old TOS in the ipid field.  When you see an ICMP 
> > fragment required message with the right DSCP on the router restore 
> > the old TOS from the ipid field.
> 
> Wouldn't this require changes at both ends, router and host? How 
> would we sync?

Only the router would need to change (and rewrite ICMP messages, which
is a bit nasty, but then compatibility is not always fun)

-Andi

P.S.: I am not opposed to fixing linux for this, just I have my doubts
that fixing all end hosts is a practical solution for the problem.

* RE: PMTU issues due to TOS field manipulation (for DSCP)
@ 2003-12-10 23:36 Johnson, Chester F
  2003-12-11  0:17 ` Julian Anastasov
  0 siblings, 1 reply; 31+ messages in thread
From: Johnson, Chester F @ 2003-12-10 23:36 UTC (permalink / raw)
  To: David S. Miller, Julian Anastasov; +Cc: niv, ak, ruddk, kuznet, netdev

I'm trying to think through one potential issue with ignoring the TOS
field. Would/should a route redirect have to match the same src-dst-tos
hash?

1. Assume the network uses DSCP to make routing decisions.
2. Assume the first hop routers are configured as passive and do not
advertise routes to the supported subnets and presumes static routes
configured on end hosts.
3. The Linux host has multiple network interfaces for route diversity.

The preferred route may send the frame to the wrong router and then
ignore the icmp route redirect. Or would it?

chet

-----Original Message-----
From: David S. Miller [mailto:davem@redhat.com] 
Sent: Wednesday, December 10, 2003 3:21 PM
To: Julian Anastasov
Cc: niv@us.ibm.com; ak@suse.de; ruddk@us.ibm.com; kuznet@ms2.inr.ac.ru;
netdev@oss.sgi.com; Johnson, Chester F
Subject: Re: PMTU issues due to TOS field manipulation (for DSCP)


On Thu, 11 Dec 2003 01:15:06 +0200 (EET)
Julian Anastasov <ja@ssi.bg> wrote:

> It seems there are not many users of OIF!=0, but if TOS is used as 
> routing key we can see up to 8 entries with different TOS for same 
> SADDR,DADDR. Of course, it looks difficult to walk 8 rows just to 
> check all TOS variants, the common case is to see only one TOS value 
> used. That is why I propose to eliminate the TOS as hash key and to 
> walk one row. At first look, the risk of DoS is same, thanks to the 
> random value.

This is not my understanding at all.

Consider the case where we generate routing cache entries for all 8 TOS
values.  Currently we'll likely get a O(1) lookup for any one of those
entries.

Your proposal guarantees that all such entries will land in the same
hash chain, since TOS is not an input for the hash any longer. Therefore
the lookup in my example case will be O(8).

And instead of just eating the complexity at ICMP PMTU handling time, we
eat the complexity at every routing cache lookup.

I really don't think we can consider this.
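
The O(1) versus O(8) argument can be checked with a toy hash. The bucket()
function below is a hypothetical mixing function chosen for illustration, not
the kernel's route hash (which also folds in a random value), and the eight
TOS values are assumed to be 0x00, 0x04, ... 0x1C.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 256u  /* illustrative table size, power of two */

/* Hypothetical hash; TOS is mixed in only when use_tos is nonzero. */
static unsigned bucket(uint32_t saddr, uint32_t daddr, uint8_t tos,
                       int use_tos)
{
    uint32_t h = saddr ^ (daddr * 2654435761u);

    if (use_tos)
        h ^= tos;
    return h & (NBUCKETS - 1u);
}

/*
 * Insert one entry per TOS value for a single (saddr, daddr) pair and
 * report the longest resulting hash chain.
 */
static unsigned longest_chain(uint32_t saddr, uint32_t daddr, int use_tos)
{
    unsigned counts[NBUCKETS] = { 0 };
    unsigned longest = 0;

    for (unsigned i = 0; i < 8; i++) {
        uint8_t tos = (uint8_t)(i << 2);
        unsigned b = bucket(saddr, daddr, tos, use_tos);

        assert(b < NBUCKETS);
        if (++counts[b] > longest)
            longest = counts[b];
    }
    return longest;
}
```

With TOS in the hash the eight entries spread over eight buckets, so each
lookup walks a chain of length 1; without it they all share one bucket and
every lookup for that address pair may walk up to 8 entries.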



Thread overview: 31+ messages
2003-12-10 18:23 PMTU issues due to TOS field manipulation (for DSCP) Kevin W. Rudd
2003-12-10 19:34 ` Andi Kleen
2003-12-10 20:20   ` Julian Anastasov
2003-12-10 20:33     ` Nivedita Singhvi
2003-12-10 21:18       ` Julian Anastasov
2003-12-10 22:36         ` Nivedita Singhvi
2003-12-10 22:51           ` David S. Miller
2003-12-10 23:15             ` Julian Anastasov
2003-12-10 23:20               ` David S. Miller
2003-12-11  0:06                 ` Julian Anastasov
2003-12-11  0:09                   ` David S. Miller
2003-12-11  0:34                     ` Julian Anastasov
2003-12-12  8:31                       ` David S. Miller
2003-12-12 23:38                         ` Julian Anastasov
2003-12-18 23:17                           ` Kevin W. Rudd
2004-01-19 22:43                           ` Kevin W. Rudd
2004-01-20  4:29                             ` David S. Miller
2003-12-13  0:10                         ` Julian Anastasov
2004-03-04  9:36                           ` David S. Miller
2004-03-04 20:56                             ` Julian Anastasov
2004-03-04 22:02                               ` kuznet
2004-03-06 11:55                                 ` Julian Anastasov
2004-03-06 16:02                                 ` Julian Anastasov
2003-12-10 20:26   ` Kevin W. Rudd
2003-12-10 20:52     ` Andi Kleen
2003-12-10 20:30   ` Nivedita Singhvi
2003-12-10 20:55     ` Andi Kleen
2003-12-10 21:11       ` Nivedita Singhvi
  -- strict thread matches above, loose matches on Subject: below --
2003-12-10 21:35 Johnson, Chester F
2003-12-10 23:36 Johnson, Chester F
2003-12-11  0:17 ` Julian Anastasov
