From: Rick Jones <rick.jones2@hp.com>
To: David Miller <davem@davemloft.net>
Cc: therbert@google.com, shemminger@vyatta.com, dada1@cosmosbay.com,
andi@firstfloor.org, netdev@vger.kernel.org
Subject: Re: [PATCH] Software receive packet steering
Date: Wed, 22 Apr 2009 11:49:50 -0700
Message-ID: <49EF66CE.10800@hp.com>
In-Reply-To: <20090422.022120.211323498.davem@davemloft.net>

David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Tue, 21 Apr 2009 11:52:07 -0700
>
>
>>That is possible and I don't think the design of our patch would
>>preclude it, but I am worried that each time the mapping from a
>>connection to a CPU changes this could cause out of order
>>packets. I suppose this is a similar problem to changing the RSS hash
>>mappings in a device.
>
>
> Yes, out of order packet processing is a serious issue.
>
> There are some things I've been brainstorming about.
>
> One thought I keep coming back to is the hack the block layer
> is using right now. It remembers which CPU a block I/O request
> comes in on, and it makes sure the completion runs on that
> cpu too.
>
> We could remember the cpu that the last socket level operation
> occurred upon, and use that as a target for packets. This requires a
> bit of work.
>
> First we'd need some kind of pre-demux at netif_receive_skb()
> time to look up the cpu target, and reference this blob from
> the socket somehow, and keep it up to date at various specific
> locations (read/write/poll, whatever...).

Does poll on the socket touch all that many cachelines, or are you thinking of it
as being a predictor of where read/write will be called?

>
> Or we could pre-demux the real socket. That could be exciting.
>
> But then we come back to the cpu number changing issue. There is a
> cool way to handle this, because it seems that we can just keep
> queueing to the previous cpu and it can check the socket cpu cookie.
> If that changes, the old target can push the rest of its queue to
> that cpu and then update the cpu target blob.
>
> Anyways, just some ideas.

For what it is worth, at the 5000-foot level that is exactly what HP-UX 11.X
does, under the name TOPS (Thread Optimized Packet Scheduling). The CPU on which
the socket was last accessed is stashed away (in the socket/stream structure) and
looked up when the driver hands the packet up the stack. It was done that way in
HP-UX 11.X because we found that simply hashing the headers (what HP-UX 10.20
called "Inbound Packet Scheduling" or IPS), while fine for discrete netperf
TCP_RR tests, wasn't really what one wanted when a single thread of execution
was servicing more than one connection/flow.
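
To make the contrast between the two steering policies concrete, here is a minimal user-space sketch in C. It is not HP-UX or Linux code: `struct sock_hint`, `struct flow`, `steer_by_hash()`, `steer_by_last_access()` and `sock_note_cpu()` are hypothetical names invented for illustration, and the hash is a toy mix rather than any real RSS function. It only shows that with last-access steering every flow serviced by one thread lands on that thread's CPU, whereas a header hash picks a CPU without regard to where the thread runs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS     4
#define CPU_UNKNOWN (-1)

/* Hypothetical per-socket hint: CPU of the last socket-level operation. */
struct sock_hint {
    int last_cpu;
};

/* The header fields a steering hash would typically look at. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Header-hash steering (the IPS-style policy). Toy hash, for illustration. */
static int steer_by_hash(const struct flow *f)
{
    uint32_t h = f->saddr ^ f->daddr ^ (((uint32_t)f->sport << 16) | f->dport);

    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return (int)(h % NR_CPUS);
}

/* Last-access steering (the TOPS-style policy): use the recorded CPU if
 * there is one, otherwise fall back to the header hash. */
static int steer_by_last_access(const struct sock_hint *sk, const struct flow *f)
{
    if (sk && sk->last_cpu != CPU_UNKNOWN)
        return sk->last_cpu;
    return steer_by_hash(f);
}

/* Would be called from the read/write paths to record where the
 * application thread is currently running. */
static void sock_note_cpu(struct sock_hint *sk, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    sk->last_cpu = cpu;
}
```

If one thread running on CPU 2 calls `sock_note_cpu()` on two different sockets, `steer_by_last_access()` returns 2 for both flows, while `steer_by_hash()` may well return two different CPUs.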

The TOPS patches were added to HP-UX 11.0 ca. 1998, and while there have been
some issues (as you surmise, plus others thanks to Streams being involved :) it
appears to have worked rather well these last ten years. So, in the abstract at
least, what is proposed above has a little pre-validation. TOPS can be
disabled/enabled via an ndd (i.e. sysctl) setting for those cases when the number
of NICs (back then they were all single-queue), or now queues, is a reasonable
fraction of the number of cores and the administrator can/wants to silo things.

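
The cpu-change handling suggested in the quoted text (keep queueing to the previous cpu, let it notice the changed cookie, push the rest of its queue over, and only then update the target) can be sketched as a single-threaded simulation in C. This is an illustration under stated assumptions, not the TOPS implementation and not kernel code: `struct sock_hint`, `steer()`, `drain()` and the fixed-size `queues` array are hypothetical, there is no locking, and the queues never wrap or reset. The point is only that packets reach the new CPU in their original order.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2
#define QLEN    16

/* Hypothetical per-socket steering state. */
struct sock_hint {
    int target_cpu;  /* where the receive path currently queues packets */
    int cookie_cpu;  /* CPU of the last socket-level operation */
};

struct pkt {
    struct sock_hint *sk;
    int seq;
};

/* Simple FIFO; a real queue would wrap, lock, and handle overflow. */
struct cpu_queue {
    struct pkt slot[QLEN];
    size_t head, tail;
};

static struct cpu_queue queues[NR_CPUS];

static void enqueue(int cpu, struct pkt p)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    assert(queues[cpu].tail < QLEN);
    queues[cpu].slot[queues[cpu].tail++] = p;
}

/* Receive path: always queue to the current target, even if the cookie
 * has already moved on. */
static void steer(struct sock_hint *sk, int seq)
{
    struct pkt p = { sk, seq };

    enqueue(sk->target_cpu, p);
}

/* Runs "on" cpu: deliver packets whose socket still belongs here and hand
 * the others, in order, to the CPU named by the socket's cookie. */
static size_t drain(int cpu, int *delivered, size_t max)
{
    struct cpu_queue *q = &queues[cpu];
    size_t first = q->head, n = 0;

    for (; q->head < q->tail; q->head++) {
        struct pkt p = q->slot[q->head];

        if (p.sk->cookie_cpu != cpu) {
            enqueue(p.sk->cookie_cpu, p);   /* hand off, keeping order */
        } else {
            assert(n < max);
            delivered[n++] = p.seq;
        }
    }
    /* Only once the backlog has been pushed is new traffic retargeted. */
    for (size_t i = first; i < q->tail; i++)
        q->slot[i].sk->target_cpu = q->slot[i].sk->cookie_cpu;
    return n;
}
```

For example, with a socket targeted at CPU 0: steer packets 1 and 2, move the cookie to CPU 1, steer packet 3 (which still goes to CPU 0), call `drain(0, ...)`, steer packet 4, then call `drain(1, ...)`. CPU 1 delivers 1, 2, 3, 4 in that order. A real implementation would additionally have to cope with the cookie changing while the drain is in progress.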
rick jones