linux-fsdevel.vger.kernel.org archive mirror
From: Jason Baron <jbaron-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org>
To: Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org>
Cc: Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	davidel-AhlLAIvw+VEjIGhXcJzhZg@public.gmane.org,
	mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Peter Zijlstra
	<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>,
	Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
Subject: Re: [PATCH v2 2/2] epoll: introduce EPOLLEXCLUSIVE and EPOLLROUNDROBIN
Date: Wed, 25 Feb 2015 10:48:18 -0500
Message-ID: <54EDEEC2.2040201@akamai.com>
In-Reply-To: <20150222002432.GA9031-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>

On 02/21/2015 07:24 PM, Eric Wong wrote:
> Jason Baron <jbaron-JqFfY2XvxFXQT0dZR+AlfA@public.gmane.org> wrote:
>> On 02/18/2015 12:51 PM, Ingo Molnar wrote:
>>> * Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>>>> [...] However, I think the userspace API change is less 
>>>>> clear since epoll_wait() doesn't currently have an 
>>>>> 'input' events argument as epoll_ctl() does.
>>>> ... but the change would be a bit clearer and somewhat 
>>>> more flexible: LIFO or FIFO queueing, right?
>>>>
>>>> But having the queueing model as part of the epoll 
>>>> context is a legitimate approach as well.
>>> Btw., there's another optimization that the networking code 
>>> already does when processing incoming packets: waking up a 
>>> thread on the local CPU, where the wakeup is running.
>>>
>>> Doing the same on epoll would have real scalability 
>>> advantages where incoming events are IRQ driven and are 
>>> distributed amongst multiple CPUs.
>>>
>>> Where events are task driven the scheduler will already try 
>>> to pair up waker and wakee so it might not show up in 
>>> measurements that markedly.
>>>
>> Right, so this makes me think that we may want to potentially
>> support a variety of wakeup policies. Adding these to the
>> generic wake up code is just going to be too messy. So, perhaps
>> a better approach here would be to register a single
>> wait_queue_t with the event source queue that will always
>> be woken up, and then layer any epoll balancing/irq affinity
>> policies on top of that. So in essence we end up with sort of
>> two queue layers, but I think it provides much nicer isolation
>> between layers. Also, the bulk of the changes are going to be
>> isolated to the epoll code, and we avoid Andy's concern about
>> missing or starving out wakeups.
>>
>> So here's a stab at how this API could look:
>>
>> 1. ep1 = epoll_create1(EPOLL_POLICY);
>>
>> So EPOLL_POLICY here could be the round robin policy described
>> here, or the irq affinity or other ideas. The idea is to create
>> an fd that is local to the process, such that other processes
>> cannot subsequently attach to it and affect our policy.
> I'm not against defining more policies if needed.
> Maybe FIFO vs LIFO is a good case for this.
>
> For affinity, it could probably be done transparently based on
> epoll_wait retrievals + EPOLL_CTL_MOD operations.
>
>> 2. epoll_ctl(ep1, EPOLL_CTL_ADD, fd_source, NULL);
>>
>> This associates ep1 with the event source. ep1 can be
>> associated with or added to at most 1 wakeup source. This call
>> would largely just form the association, but not queue anything
>> to the fd_source wait queue.
> This would mean one extra FD for every fd_source, but that's
> only a handful of FDs (listen sockets), correct?

Yes, one extra epoll fd per shared wakeup source, so this should
result in very few additional fds.

>> 3. epoll_ctl(ep2, EPOLL_CTL_ADD, ep1, event);
>>     epoll_ctl(ep3, EPOLL_CTL_ADD, ep1, event);
>>     epoll_ctl(ep4, EPOLL_CTL_ADD, ep1, event);
>>      .
>>      .
>>      .
>>
>> Finally, we add the epoll sets to the event source (indirectly via
>> ep1). So the first add would actually queue the callback to the
>> fd_source. While the subsequent calls would simply queue things
>> to the 'nested' wakeup queue associated with ep1.
> I'm not sure I follow, wouldn't this increase the number of wakeups?

I agree, my text there is confusing. I've posted this idea as
v3 of this series, so hopefully that clarifies this approach.
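
In the meantime, the layering in steps 1-3 can be approximated with
stock epoll, since an epoll fd can itself be added to another epoll
set. Below is a minimal sketch under stated assumptions, not the
proposed implementation: an eventfd stands in for the shared wakeup
source, epoll_create1(0) is used because EPOLL_POLICY does not exist,
and a real event is passed where step 2 shows NULL, because
EPOLL_CTL_ADD requires one today. The name nested_epoll_demo() is
hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Builds fd_source -> ep1 -> {ep2, ep3} with stock epoll and returns how
 * many of the outer sets (ep2, ep3) report ep1 as readable after one event
 * on fd_source, or -1 on error. No wakeup policy is applied here; that is
 * the part the proposal would add on top of ep1.
 */
int nested_epoll_demo(void)
{
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out_ev;
    uint64_t one = 1;
    int woken = -1;

    /* Assumption: an eventfd stands in for the shared listen socket. */
    int fd_source = eventfd(0, EFD_NONBLOCK);
    int ep1 = epoll_create1(0); /* step 1, minus the hypothetical EPOLL_POLICY */
    int ep2 = epoll_create1(0); /* per-worker epoll sets */
    int ep3 = epoll_create1(0);

    if (fd_source < 0 || ep1 < 0 || ep2 < 0 || ep3 < 0)
        goto out;

    /* Step 2: associate ep1 with the event source. */
    ev.data.fd = fd_source;
    if (epoll_ctl(ep1, EPOLL_CTL_ADD, fd_source, &ev) < 0)
        goto out;

    /* Step 3: add ep1 to each worker's epoll set. */
    ev.data.fd = ep1;
    if (epoll_ctl(ep2, EPOLL_CTL_ADD, ep1, &ev) < 0 ||
        epoll_ctl(ep3, EPOLL_CTL_ADD, ep1, &ev) < 0)
        goto out;

    /* Generate a single event on the source. */
    if (write(fd_source, &one, sizeof(one)) != (ssize_t)sizeof(one))
        goto out;

    /* Poll each outer set without blocking and count those that see ep1. */
    woken = 0;
    if (epoll_wait(ep2, &out_ev, 1, 0) == 1 && out_ev.data.fd == ep1)
        woken++;
    if (epoll_wait(ep3, &out_ev, 1, 0) == 1 && out_ev.data.fd == ep1)
        woken++;

out:
    if (ep3 >= 0) close(ep3);
    if (ep2 >= 0) close(ep2);
    if (ep1 >= 0) close(ep1);
    if (fd_source >= 0) close(fd_source);
    return woken;
}
```

With level-triggered nesting and no policy, every outer set sees the
single event, so this only illustrates the fd topology. The intent of
the proposal is that ep1 would decide which one of ep2, ep3, ... gets
woken.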

Thanks,

-Jason
