linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
To: Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Eric Dumazet
	<eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Michael Kerrisk
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>,
	netdev <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Ying Cai <ycai-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Willem de Bruijn
	<willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Neal Cardwell <ncardwell-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH net-next] net: introduce SO_INCOMING_CPU
Date: Fri, 14 Nov 2014 16:50:31 -0800	[thread overview]
Message-ID: <CALCETrU5Og8tfkFuA-rPY5evV5CkPsx=22f6mePHJPvfcRrBdA@mail.gmail.com> (raw)
In-Reply-To: <CA+mtBx8uneXRoc5HcXED58zi44UyMZGJS6_sU3av8ST1ft7Diw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, Nov 14, 2014 at 4:40 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Fri, Nov 14, 2014 at 4:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> On Fri, Nov 14, 2014 at 4:06 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> On Fri, Nov 14, 2014 at 2:10 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>> On Fri, Nov 14, 2014 at 1:36 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>>>> On Fri, Nov 14, 2014 at 12:34 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>>>> On Fri, Nov 14, 2014 at 12:25 PM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>>> On Fri, Nov 14, 2014 at 12:16 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>>>>>>> On Fri, Nov 14, 2014 at 11:52 AM, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>>>>> On Fri, Nov 14, 2014 at 11:33 AM, Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>>>>>> On Fri, 2014-11-14 at 09:17 -0800, Andy Lutomirski wrote:
>>>>>>>>>>
>>>>>>>>>>> As a heavy user of RFS (and finder of bugs in it, too), here's my
>>>>>>>>>>> question about this API:
>>>>>>>>>>>
>>>>>>>>>>> How does an application tell whether the socket represents a
>>>>>>>>>>> non-actively-steered flow?  If the flow is subject to RFS, then moving
>>>>>>>>>>> the application handling to the socket's CPU seems problematic, as the
>>>>>>>>>>> socket's CPU might move as well.  The current implementation in this
>>>>>>>>>>> patch seems to tell me which CPU the most recent packet came in on,
>>>>>>>>>>> which is not necessarily very useful.
>>>>>>>>>>
>>>>>>>>>> Its the cpu that hit the TCP stack, bringing dozens of cache lines in
>>>>>>>>>> its cache. This is all that matters,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Some possibilities:
>>>>>>>>>>>
>>>>>>>>>>> 1. Let SO_INCOMING_CPU fail if RFS or RPS are in play.
>>>>>>>>>>
>>>>>>>>>> Well, idea is to not use RFS at all. Otherwise, it is useless.
>>>>>>>>
>>>>>>>> Sure, but how do I know that it'll be the same CPU next time?
>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Bear in mind this is only an interface to report RX CPU and in itself
>>>>>>>>> doesn't provide any functionality for changing scheduling, there is
>>>>>>>>> obviously logic needed in user space that would need to do something.
>>>>>>>>>
>>>>>>>>> If we track the interrupting CPU in skb, the interface could be easily
>>>>>>>>> extended to provide the interrupting CPU, the RPS CPU (calculated at
>>>>>>>>> reported time), and the CPU processing transport (post steering which
>>>>>>>>> is what is currently returned). That would provide the complete
>>>>>>>>> picture to control scheduling a flow from userspace, and an interface
>>>>>>>>> to selectively turn off RFS for a socket would make sense then.
>>>>>>>>
>>>>>>>> I think that a turn-off-RFS interface would also want a way to figure
>>>>>>>> out where the flow would go without RFS.  Can the network stack do
>>>>>>>> that (e.g. evaluate the rx indirection hash or whatever happens these
>>>>>>>> days)?
>>>>>>>>
>>>>>>> Yes,. We need the rxhash and the CPU that packets are received on from
>>>>>>> the device for the socket. The former we already have, the latter
>>>>>>> might be done by adding a field to skbuff to set received CPU. Given
>>>>>>> the L4 hash and interrupting CPU we can calculated the RPS CPU which
>>>>>>> is where packet would have landed with RFS off.
>>>>>>
>>>>>> Hmm.  I think this would be useful for me.  It would *definitely* be
>>>>>> useful for me if I could pin an RFS flow to a cpu of my choice.
>>>>>>
>>>>> Andy, can you elaborate a little more on your use case. I've thought
>>>>> several times about an interface to program the flow table from
>>>>> userspace, but never quite came up with a compelling use case and
>>>>> there is the security concern that a user could "steal" cycles from
>>>>> arbitrary CPUs.
>>>>
>>>> I have a bunch of threads that are pinned to various CPUs or groups of
>>>> CPUs.  Each thread is responsible for a fixed set of flows.  I'd like
>>>> those flows to go to those CPUs.
>>>>
>>>> RFS will eventually do it, but it would be nice if I could
>>>> deterministically ask for a flow to be routed to the right CPU.  Also,
>>>> if my thread bounces temporarily to another CPU, I don't really need
>>>> the flow to follow it -- I'd like it to stay put.
>>>>
>>> Okay, how about it we have a SO_RFS_LOCK_FLOW sockopt. When this is
>>> called on a socket we can lock the socket to CPU binding to the
>>> current CPU it is called from. It could be unlocked at a later point.
>>> Would this satisfy your requirements?
>>
>> Yes, I think.  Especially if it bypassed the hash table.
>
> Unfortunately we can't easily bypass the hash table. The only way I
> know of to to do that is to perform the socket lookup to do steering
> (I tried that early on, but it was pretty costly).

What happens if you just call ndo_rx_flow_steer and do something to
keep the result from expiring?

>>
>>> As I mentioned, there is no material functionality in this patch and
>>> it should be independent of RFS. It simply returns the CPU where the
>>> stack processed the packet. Whether or not this is meaningful
>>> information to the algorithm being implemented in userspace is
>>> completely up to the caller to decide.
>>
>> Agreed.
>>
>> My only concern is that writing that userspace algorithm might result
>> in surprises if RFS is on.  Having the user program notice the problem
>> early and alert the admin might help keep Murphy's Law at bay here.
>>
> By Murphy's law we'd also have to consider that the flow hash could
> change after reading the results so that the scheduling done in
> userspace is completely wrong until the CPU is read again.
> Synchronizing kernel and device state with userspace state is not
> always so easy. One way to mitigate is to use ancillary data which
> would provide real time information and obviate the need for another
> system call.

Hmm.  That would work, too.  I don't know how annoyed user code would
be at having to read ancillary data, though.

The flow hash really shouldn't change much, though, right?

--Andy

  parent reply	other threads:[~2014-11-15  0:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1415393472.13896.119.camel@edumazet-glaptop2.roam.corp.google.com>
     [not found] ` <1415393472.13896.119.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2014-11-14  8:05   ` [PATCH net-next] net: introduce SO_INCOMING_CPU Michael Kerrisk
     [not found]     ` <CAHO5Pa0OGtgbUp4R287jbK2SFSpVUoXWCJybvHTFsGyCLynqLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 17:00       ` Eric Dumazet
2014-11-14 17:17       ` Andy Lutomirski
     [not found]         ` <CALCETrXmCWSoeQLC3WCqMf_sehGVE9KTbTGmsA77ZN84B-V24Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 19:33           ` Eric Dumazet
2014-11-14 19:52             ` Tom Herbert
     [not found]               ` <CA+mtBx_DyUCxE6BqL_xXcqKPUDu7b3jLShV+AL6L+gCzJLriPg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 20:16                 ` Andy Lutomirski
2014-11-14 20:25                   ` Tom Herbert
     [not found]                     ` <CA+mtBx_tNW_Bz0hovZ90XxcLK6pL8d6RX=v5Rg8utfKkaFoJ+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 20:34                       ` Andy Lutomirski
2014-11-14 21:36                         ` Tom Herbert
2014-11-14 22:10                           ` Andy Lutomirski
     [not found]                             ` <CALCETrXf8j52nmpw-A2+bQt0yoz-fiD-4GP4Qf-afwH6UjCVnw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 22:24                               ` Eric Dumazet
     [not found]                                 ` <1416003883.17262.72.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2014-11-14 22:27                                   ` Andy Lutomirski
     [not found]                                     ` <CALCETrWhsj+byMeT=WHb9eLR_vgsUe58Li8JDLaTvsSPWp1DKQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 22:58                                       ` Eric Dumazet
     [not found]                                         ` <1416005912.17262.76.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2014-11-14 23:03                                           ` Andy Lutomirski
     [not found]                                             ` <CALCETrUAd50g2mYEOKT-5pEqMvwSstfQcgU8=7GRsO1XcKBSFA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 23:32                                               ` Eric Dumazet
     [not found]                                                 ` <1416007946.17262.84.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2014-11-14 23:40                                                   ` Andy Lutomirski
2014-11-15  0:06                               ` Tom Herbert
     [not found]                                 ` <CA+mtBx9Z33O0x4-MG=66Fb4qvBJfxV54xOyiYvWydr=4Bka2xw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-15  0:24                                   ` Andy Lutomirski
     [not found]                                     ` <CALCETrXnM5zCt651QWF0Z3c197gqzbLA29bQzwi64DnCvS48NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-15  0:40                                       ` Tom Herbert
     [not found]                                         ` <CA+mtBx8uneXRoc5HcXED58zi44UyMZGJS6_sU3av8ST1ft7Diw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-15  0:50                                           ` Andy Lutomirski [this message]
2014-11-15 18:41                                             ` Tom Herbert
2014-11-15 18:47                                               ` Andy Lutomirski
     [not found]                   ` <CALCETrUuChgxXcNrKjYzpe1ry0hx3kf19EfAbY1N1YnG7NFZUw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-14 22:16                     ` Eric Dumazet
     [not found]                       ` <1416003371.17262.66.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2014-11-14 22:18                         ` Andy Lutomirski
     [not found]                           ` <CALCETrVQqsJKqVU4ENEfi86wkQLQ=mqBif=HcQiKT2h_fj22xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-11-15  0:15                             ` Tom Herbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrU5Og8tfkFuA-rPY5evV5CkPsx=22f6mePHJPvfcRrBdA@mail.gmail.com' \
    --to=luto-klttt9wpgjjwatoyat5jvq@public.gmane.org \
    --cc=davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org \
    --cc=eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=ncardwell-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    --cc=ycai-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).