netdev.vger.kernel.org archive mirror
From: "David S. Miller" <davem@redhat.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: niv@us.ibm.com, ak@suse.de, ruddk@us.ibm.com,
	kuznet@ms2.inr.ac.ru, netdev@oss.sgi.com,
	chester.f.johnson@intel.com
Subject: Re: PMTU issues due to TOS field manipulation (for DSCP)
Date: Fri, 12 Dec 2003 00:31:43 -0800
Message-ID: <20031212003143.062598e9.davem@redhat.com>
In-Reply-To: <Pine.LNX.4.44.0312110219210.1519-100000@u.domain.uli>

On Thu, 11 Dec 2003 02:34:51 +0200 (EET)
Julian Anastasov <ja@ssi.bg> wrote:

> On Wed, 10 Dec 2003, David S. Miller wrote:
> 
> > But regardless, let us say that your system has complexity O(16)
> > lookups as you mention, your proposal changes this to O(16+8).
> 
> 	It is ~16 :)
> 
> 	ip_rt_max_size = (rt_hash_mask + 1) * 16;
> 
> 	This is what happens on full table, of course. OK,
> some simple numbers for an ideal table:

But look at default gc_thresh setting, which is when we trim
rt cache entries:

        ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1);

The ip_rt_max_size value is meant to be a sort of buffer to absorb
the situation where many rt cache entries are unreclaimable.

But this is a separate issue, and we can discuss your further points
regardless.

> 2 cases depending on whether TOS is a hash key (path=saddr->daddr):
> 
> 1. TOS is a hash key:
> 
> 	- in each chain we have 16 paths, 1 TOS value per path
> 	- all 8 TOS values for a path are in 8 different chains
> 
> 2. TOS is not a hash key:
> 
> 	2 paths per chain (2 paths x 8 TOS values => 16 entries)
> 
> if all saddr->daddr->tos streams have same packet rate I think
> the CPU time to lookup them will be same.
> This is because 8 (number of TOS values) < 16 (chain length).
> 
> 	And I hope the users always can tune the proposed TOS
> settings if they see DoS and if they do not need TOS as a rt key.

Ok.  I agree with your analysis.  Let's propose something concrete.

1) PMTU processing applies PMTU change to all TOS'd instances of
   a route.  This behavior change is sysctl controllable, and
   on by default.

   The implementation is to just look up all 8 possible TOS values.

2) Whether TOS is a routing cache hash key is controlled by another
   sysctl.

   When CONFIG_IP_ROUTE_TOS is set this sysctl defaults to on;
   otherwise it defaults to off.

I think #2 should be very safe because fib node fn_tos values are only
ever set when that config variable is enabled, and fib rule r_tos values
are only compared on lookup when it is enabled as well.  However, there
could be a few more ifdefs added to the fib rule code to cover all the
assignment cases too but let's not worry about that right now.

Comments?

Thread overview: 31+ messages
2003-12-10 18:23 PMTU issues due to TOS field manipulation (for DSCP) Kevin W. Rudd
2003-12-10 19:34 ` Andi Kleen
2003-12-10 20:20   ` Julian Anastasov
2003-12-10 20:33     ` Nivedita Singhvi
2003-12-10 21:18       ` Julian Anastasov
2003-12-10 22:36         ` Nivedita Singhvi
2003-12-10 22:51           ` David S. Miller
2003-12-10 23:15             ` Julian Anastasov
2003-12-10 23:20               ` David S. Miller
2003-12-11  0:06                 ` Julian Anastasov
2003-12-11  0:09                   ` David S. Miller
2003-12-11  0:34                     ` Julian Anastasov
2003-12-12  8:31                       ` David S. Miller [this message]
2003-12-12 23:38                         ` Julian Anastasov
2003-12-18 23:17                           ` Kevin W. Rudd
2004-01-19 22:43                           ` Kevin W. Rudd
2004-01-20  4:29                             ` David S. Miller
2003-12-13  0:10                         ` Julian Anastasov
2004-03-04  9:36                           ` David S. Miller
2004-03-04 20:56                             ` Julian Anastasov
2004-03-04 22:02                               ` kuznet
2004-03-06 11:55                                 ` Julian Anastasov
2004-03-06 16:02                                 ` Julian Anastasov
2003-12-10 20:26   ` Kevin W. Rudd
2003-12-10 20:52     ` Andi Kleen
2003-12-10 20:30   ` Nivedita Singhvi
2003-12-10 20:55     ` Andi Kleen
2003-12-10 21:11       ` Nivedita Singhvi
  -- strict thread matches above, loose matches on Subject: below --
2003-12-10 21:35 Johnson, Chester F
2003-12-10 23:36 Johnson, Chester F
2003-12-11  0:17 ` Julian Anastasov
