From: Andi Kleen <ak@suse.de>
To: David Miller <davem@davemloft.net>
Cc: rdreier@cisco.com, tom@opengridcomputing.com,
	netdev@vger.kernel.org, akpm@osdl.org
Subject: Re: RDMA will be reverted
Date: Tue, 25 Jul 2006 02:02:02 +0200
Message-ID: <200607250202.02913.ak@suse.de>
In-Reply-To: <20060724.162250.55836503.davem@davemloft.net>

On Tuesday 25 July 2006 01:22, David Miller wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Tue, 25 Jul 2006 01:10:25 +0200
> 
> > > All the original costs of route, netfilter, TCP socket lookup all
> > > reappear as we make VJ netchannels fit all the rules of real practical
> > > systems, eliminating their gains entirely.
> > 
> > At least most of the optimizations from the early demux scheme could
> > probably be obtained more simply by adding a fast path to
> > iptables/conntrack/etc. that checks whether all rules only match SYN
> > (etc.) packets and, if so, skips walking the full rule set (or, more
> > generally, a fast TCP flag mask check similar to what TCP does). With
> > that, ESTABLISHED packets would reach TCP with relatively little overhead.
> 
> Actually, all is not lost.  Alexey has a more clever idea which
> is basically to run the netfilter hooks in the socket receive
> path.

The gain being that the target CPU does the work instead of 
the CPU running the softirq?

A combined lookup and better handling of ESTABLISHED packets
still seem like a good idea.
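A minimal sketch of the flag-mask fast path from the quoted text, assuming a hypothetical, heavily simplified rule format in which each rule only records the TCP flags it requires. `fw_rule`, `fw_table`, `fw_table_summarize()` and `fw_filter()` are illustrative names, not iptables or conntrack APIs. The point is that the ruleset summary is computed once per update, so an ESTABLISHED segment costs one mask test instead of a rule walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TCP flag bits as they appear in the TCP header's flags byte. */
#define TCPF_FIN 0x01
#define TCPF_SYN 0x02
#define TCPF_RST 0x04
#define TCPF_ACK 0x10
#define TCPF_CONTROL (TCPF_SYN | TCPF_FIN | TCPF_RST)

#define VERDICT_DROP   0
#define VERDICT_ACCEPT 1

/* Hypothetical rule: matches only packets that carry every bit in
 * required_flags (0 means it matches any TCP packet). */
struct fw_rule {
    uint8_t required_flags;
    int verdict;
};

struct fw_table {
    const struct fw_rule *rules;
    size_t nrules;
    /* Summary: true if every rule requires SYN, FIN or RST, so no rule
     * can match a plain ESTABLISHED data/ACK segment. */
    bool only_control_rules;
};

/* Recompute the summary once per ruleset change, not per packet. */
void fw_table_summarize(struct fw_table *t)
{
    t->only_control_rules = true;
    for (size_t i = 0; i < t->nrules; i++) {
        if (!(t->rules[i].required_flags & TCPF_CONTROL)) {
            t->only_control_rules = false;
            break;
        }
    }
}

/* Per-packet filter. When the summary allows it, segments without
 * SYN/FIN/RST skip the rule walk after a single mask test. */
int fw_filter(const struct fw_table *t, uint8_t tcp_flags, size_t *rules_walked)
{
    assert(rules_walked != NULL);
    *rules_walked = 0;
    if (t->only_control_rules && !(tcp_flags & TCPF_CONTROL))
        return VERDICT_ACCEPT;              /* fast path: no rule can match */

    for (size_t i = 0; i < t->nrules; i++) {
        uint8_t need = t->rules[i].required_flags;
        (*rules_walked)++;
        if ((tcp_flags & need) == need)
            return t->rules[i].verdict;
    }
    return VERDICT_ACCEPT;                  /* default policy */
}
```

Real netfilter rules match on far more than TCP flags, so a real summary would have to be conservative about every match type it cannot reason about.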

One idea I had at some point was to separate conntrack for local
connection vs routed connections and attach the local conntrack
to the socket (and use its lookup tables). Then at least for
local connections conntrack should be nearly free.

It should also solve the current problem that enabling
conntrack makes local network performance significantly worse.
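The idea of attaching local conntrack state to the socket can be sketched as follows. Everything here is hypothetical (`local_sock`, `local_rx_ct_state()`, the two-state `ct_state`), and a linear scan stands in for TCP's established-socket hash. The sketch only shows that one lookup can return both the socket and its tracking state, with non-local traffic falling back to the ordinary conntrack tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced connection-tracking state. */
enum ct_state { CT_NEW, CT_ESTABLISHED };

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Hypothetical local socket that carries its own conntrack entry, so
 * the socket lookup TCP performs anyway also yields the tracking state. */
struct local_sock {
    struct flow_key key;
    enum ct_state ct;
    int in_use;
};

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Stand-in for TCP's established-socket hash lookup. */
struct local_sock *sock_lookup(struct local_sock *socks, size_t n,
                               const struct flow_key *k)
{
    assert(k != NULL);
    for (size_t i = 0; i < n; i++)
        if (socks[i].in_use && flow_key_equal(&socks[i].key, k))
            return &socks[i];
    return NULL;
}

/* Receive path for local traffic: a single lookup returns the socket and
 * its conntrack state. Returns -1 when no local socket matches, meaning
 * the packet must go through the separate (routed) conntrack tables. */
int local_rx_ct_state(struct local_sock *socks, size_t n,
                      const struct flow_key *k)
{
    struct local_sock *sk = sock_lookup(socks, n, k);
    return sk ? (int)sk->ct : -1;
}
```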

> Where does state live in such a huge process?  Usually, it is
> scattered all over its address space.  Say that Java
> application just did a lot of churning on its own data
> structures, swapping out TCP library state objects; we will then
> prematurely swap that stuff back in just to spit out an ACK
> or similar.

TCP state usually spans multiple cache lines, so you would have
cache misses anyway. Are you worried about the TLBs? 

> > But what do you do when you have lots of different connections
> > with different target CPU hash values, or when this would require
> > you to move multiple compute-intensive processes onto a single core?
> 
> That is why we have a scheduler :)

It can't do well if it gets conflicting input.

> Even in a best-effort scenario, things 
> will generally be better than they are currently, plus there is nothing
> precluding the flow demux MSI-X selection from getting more intelligent.

Intelligent = stateful in this case.

AFAIK the only way to do it statelessly is with hashes, and the output
of a hash tends to be unpredictable by definition.


> For example, the demuxer could "notice" that TCP data transmits for
> flow X tend to happen on CPU X, and update a flow table to record that
> fact.  It could use the same flow table as the one used for LRO.

Hmm, I somewhat doubt that lower-end NICs will ever have such flow tables.
Also, the flow tables could always thrash (because on-NIC RAM is necessarily
limited), or they would require the NIC to look up state in memory, which is
probably not much faster than having the CPUs do it.
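To illustrate the thrashing concern, here is a sketch of a hypothetical direct-mapped flow table of the kind the quoted proposal implies: transmits record the sending CPU, receives are steered to it on a hit, and colliding flows evict each other. `flow_note_tx()`, `flow_steer_rx()` and the 8-slot size are illustrative assumptions, not any NIC's actual design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direct-mapped flow table in limited on-NIC memory. */
#define FLOW_SLOTS 8

struct flow_entry {
    uint32_t flow_hash;   /* full hash, kept to detect collisions */
    uint8_t cpu;          /* CPU that last transmitted on this flow */
    bool valid;
};

struct flow_table {
    struct flow_entry slot[FLOW_SLOTS];
    unsigned evictions;
};

/* Transmit side: remember which CPU sent on this flow. A colliding flow
 * simply overwrites the slot, which is where thrashing comes from. */
void flow_note_tx(struct flow_table *t, uint32_t flow_hash, uint8_t cpu)
{
    struct flow_entry *e = &t->slot[flow_hash % FLOW_SLOTS];
    if (e->valid && e->flow_hash != flow_hash)
        t->evictions++;
    e->flow_hash = flow_hash;
    e->cpu = cpu;
    e->valid = true;
}

/* Receive side: steer to the recorded CPU on a hit; on a miss, fall back
 * to the stateless choice (hash modulo the number of CPUs). */
uint8_t flow_steer_rx(const struct flow_table *t, uint32_t flow_hash,
                      uint8_t ncpus, bool *hit)
{
    assert(ncpus > 0);
    const struct flow_entry *e = &t->slot[flow_hash % FLOW_SLOTS];
    *hit = e->valid && e->flow_hash == flow_hash;
    return *hit ? e->cpu : (uint8_t)(flow_hash % ncpus);
}
```

With more active flows than slots, hits degrade into fallbacks, so the table only helps while the working set fits.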

Using hash functions in the hardware to select the MSI-X seems 
more elegant, cheaper and much more scalable to me.
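As one concrete instance of a stateless hardware hash, the sketch below implements the Toeplitz hash used by NICs that support Receive Side Scaling (RSS). The email does not name a hash function, so this choice is an assumption, as are the hypothetical `select_msix_vector()` helper and its input layout (source address, destination address, source port, destination port). Real hardware usually indexes an indirection table with the low-order hash bits instead of taking a modulus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. The key
 * must be at least len + 4 bytes long. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *in, size_t len)
{
    assert(key_len >= len + 4);
    uint32_t window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
                      (uint32_t)key[2] << 8 | key[3];
    uint32_t result = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (in[i] & (1u << bit))
                result ^= window;
            /* Slide the window one bit left, pulling in the next key bit. */
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return result;
}

/* Hypothetical helper: hash an IPv4/TCP 4-tuple and pick one of nvectors
 * MSI-X vectors. Addresses are in wire order; ports are host-order
 * integers serialized big-endian, as they appear on the wire. */
unsigned select_msix_vector(const uint8_t key[40],
                            const uint8_t saddr[4], const uint8_t daddr[4],
                            uint16_t sport, uint16_t dport, unsigned nvectors)
{
    assert(nvectors > 0);
    uint8_t tuple[12];
    for (int i = 0; i < 4; i++) {
        tuple[i] = saddr[i];
        tuple[4 + i] = daddr[i];
    }
    tuple[8] = (uint8_t)(sport >> 8);
    tuple[9] = (uint8_t)(sport & 0xff);
    tuple[10] = (uint8_t)(dport >> 8);
    tuple[11] = (uint8_t)(dport & 0xff);
    return toeplitz_hash(key, 40, tuple, sizeof tuple) % nvectors;
}
```

Because the vector depends only on the 4-tuple, two connections owned by the same process will generally land on different vectors, which is the drawback described in the following paragraph.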

The drawback of hashes is that for processes with multiple
connections you have to move some work back into the softirqs
that run on the MSI-X target CPUs.

So doing TCP fully in process context will basically require
much more complex and stateful hardware. 

Or you can keep it only as a fast path for specific situations
(single busy connection per thread) and stay with mostly-softirq
processing for the many-connection cases.

-Andi
