From: Madars Vitolins <m@silodev.com>
To: Jason Baron <jbaron@akamai.com>
Cc: Eric Wong <normalperson@yhbt.net>, linux-kernel@vger.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Date: Sat, 05 Dec 2015 13:47:10 +0200
Message-ID: <9664870cea1bbe5938ac40ff2c161be6@silodev.com>
In-Reply-To: <565DFF0A.4060901@akamai.com>

Hi Jason,

I did the testing and wrote a blog article about it:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/

In summary:

Test case:
- One multi-threaded binary with 10 threads makes a total of 1'000'000
calls to 250 single-threaded processes that wait in epoll on a shared
POSIX queue.
- Each 'call' sends a message to the shared queue (served by those 250
load-balanced processes), and the process that picks it up sends the
reply back to the client thread's private queue.
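
For readers who want to see what each of the server processes does while waiting, here is a minimal sketch in C. It is not Enduro/X code: the names worker_watch and worker_wait are made up for illustration, and it assumes Linux, where a POSIX message queue descriptor is a file descriptor that epoll accepts. The fallback define for EPOLLEXCLUSIVE uses the value from the patch quoted further down in this message.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28) /* value taken from the quoted patch */
#endif

/*
 * Create a private epoll instance for this worker and register the shared
 * descriptor for input. With exclusive != 0 the registration carries
 * EPOLLEXCLUSIVE, which only has an effect on a kernel that has the patch.
 * Returns the epoll fd, or -1 with errno set.
 */
static int worker_watch(int shared_fd, int exclusive)
{
    assert(shared_fd >= 0);

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    struct epoll_event ev = { 0 };
    ev.events = EPOLLIN | (exclusive ? EPOLLEXCLUSIVE : 0);
    ev.data.fd = shared_fd;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, shared_fd, &ev) < 0) {
        int saved = errno;   /* keep the epoll_ctl error across close() */
        close(epfd);
        errno = saved;
        return -1;
    }
    return epfd;
}

/*
 * Block until the shared descriptor is readable or timeout_ms expires.
 * Returns 1 if readable, 0 on timeout, -1 on error. EINTR is retried.
 */
static int worker_wait(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

A worker would call worker_watch() once with the descriptor returned by mq_open() (opened with O_NONBLOCK), then loop on worker_wait() and mq_receive(). Even with the flag set, a worker should treat EAGAIN from mq_receive() as "another process got the message first" and simply go back to waiting.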
Tests were done on the following system:
- Host system: Linux Mint Mate 17.2 64bit, kernel: 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64 bit.
- CPU for guest: Two cores
- RAM for guest: 5GB (no swap usage, free about 4GB)
- Enduro/X version: 2.3.2
Results with the original kernel (no EPOLLEXCLUSIVE):
$ time ./bankcl
...
real 14m20.561s
user 0m21.823s
sys 10m49.821s
Results with the patched kernel, EPOLLEXCLUSIVE flag in use:
$ time ./bankcl
...
real 0m24.953s
user 0m17.497s
sys 0m4.445s
That is over 14 minutes versus under 25 seconds! So the EPOLLEXCLUSIVE
flag makes the application run roughly *35 times faster* in wall-clock
time.
Guys, this is a MUST HAVE patch!
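
To make the arithmetic behind that ratio explicit, the two time(1) outputs above can be converted to seconds and divided. The small C sketch below does that; the helper names (to_seconds, speedup, print_speedups) are illustrative only and the inputs are exactly the figures listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style "XmY.ZZZs" reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Ratio of the baseline duration to the patched duration. */
static double speedup(double baseline_s, double patched_s)
{
    assert(patched_s > 0.0);
    return baseline_s / patched_s;
}

/* Print the ratios for the real, user and sys rows reported above. */
static void print_speedups(void)
{
    double real_x = speedup(to_seconds(14, 20.561), to_seconds(0, 24.953));
    double user_x = speedup(to_seconds(0, 21.823), to_seconds(0, 17.497));
    double sys_x  = speedup(to_seconds(10, 49.821), to_seconds(0, 4.445));

    printf("real: %.1fx  user: %.1fx  sys: %.1fx\n", real_x, user_x, sys_x);
}
```

The wall-clock ratio comes out at about 34.5, while the sys time ratio is far larger (roughly 146), which is consistent with the saving coming almost entirely from kernel-side work.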
Thanks,
Madars
Jason Baron wrote on 2015-12-01 22:11:
> Hi Madars,
>
> On 11/30/2015 04:28 PM, Madars Vitolins wrote:
>> Hi Jason,
>>
>> Today I searched the mail archive and checked the patch you offered
>> in February; it basically does the same thing (a flag for
>> add_wait_queue_exclusive() + balance).
>>
>> So I plan to run some tests with your patch, with the flag on and off,
>> and will provide results. I guess if I bring up 250 or 500 processes
>> (which could be realistic for a production environment) waiting on one
>> queue, there could be a notable difference in performance with
>> EPOLLEXCLUSIVE set or not.
>>
>
> Sounds good. Below is an updated patch if you want to try it - it only
> adds the 'EPOLLEXCLUSIVE' flag.
>
>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e009ca..265fa7b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -92,7 +92,7 @@
> */
>
> /* Epoll private bits inside the event mask */
> -#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
> +#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
>
> /* Maximum number of nesting allowed inside epoll sets */
> #define EP_MAX_NESTS 4
> @@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
> unsigned long flags;
> struct epitem *epi = ep_item_from_wait(wait);
> struct eventpoll *ep = epi->ep;
> + int ewake = 0;
>
> if ((unsigned long)key & POLLFREE) {
> ep_pwq_from_wait(wait)->whead = NULL;
> @@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
> * Wake up ( if active ) both the eventpoll wait list and the ->poll()
> * wait list.
> */
> - if (waitqueue_active(&ep->wq))
> + if (waitqueue_active(&ep->wq)) {
> + ewake = 1;
> wake_up_locked(&ep->wq);
> + }
> if (waitqueue_active(&ep->poll_wait))
> pwake++;
>
> @@ -1078,6 +1081,9 @@ out_unlock:
> if (pwake)
> ep_poll_safewake(&ep->poll_wait);
>
> + if (epi->event.events & EPOLLEXCLUSIVE)
> + return ewake;
> +
> return 1;
> }
>
> @@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
> init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
> pwq->whead = whead;
> pwq->base = epi;
> - add_wait_queue(whead, &pwq->wait);
> + if (epi->event.events & EPOLLEXCLUSIVE)
> + add_wait_queue_exclusive(whead, &pwq->wait);
> + else
> + add_wait_queue(whead, &pwq->wait);
> list_add_tail(&pwq->llink, &epi->pwqlist);
> epi->nwait++;
> } else {
> @@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
> if (f.file == tf.file || !is_file_epoll(f.file))
> goto error_tgt_fput;
>
> + if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
> + (op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
> + goto error_tgt_fput;
> +
> /*
> * At this point it is safe to assume that the "private_data" contains
> * our own data structure.
> diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
> index bc81fb2..925bbfb 100644
> --- a/include/uapi/linux/eventpoll.h
> +++ b/include/uapi/linux/eventpoll.h
> @@ -26,6 +26,9 @@
> #define EPOLL_CTL_DEL 2
> #define EPOLL_CTL_MOD 3
>
> +/* Add exclusively */
> +#define EPOLLEXCLUSIVE (1 << 28)
> +
> /*
> * Request the handling of system wakeup events so as to prevent system suspends
> * from happening while those events are being processed.
>
>
>> During kernel hacking with debug prints, with 10 processes waiting on
>> one event source, I saw a lot of unneeded processing inside
>> eventpoll.c on the original kernel: a single event caused 10 calls to
>> ep_poll_callback() and related work, which resulted in a few
>> processes being woken up in user space (the count probably varies
>> randomly depending on concurrency).
>>
>>
>> Meanwhile, we are not the only ones talking about this patch; others
>> are asking too, see here:
>> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>>
>> So what is the current situation with your patch? What is blocking it
>> from getting into mainline?
>>
>
> If we can show some good test results here I will re-submit it.
>
> Thanks,
>
> -Jason
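
The effect described in the quoted text (one event, many ep_poll_callback() calls, several processes woken) can be observed from user space with a small experiment. What follows is only a sketch under stated assumptions: count_wakeups and wakeup_child are hypothetical names, a pipe stands in for the POSIX queue, and it is not the test that produced the timings in this message. Each child registers the shared read end in its own epoll instance, the parent writes a single byte, and the children report whether epoll_wait() returned an event.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28) /* value taken from the quoted patch */
#endif

/* Child body: wait on the shared read end, report 1 if woken, 0 on timeout. */
static void wakeup_child(int data_rd, int ready_wr, int result_wr, int exclusive)
{
    struct epoll_event ev = { 0 };
    char woke;
    int epfd = epoll_create1(0); /* private epoll instance per process */

    ev.events = EPOLLIN | (exclusive ? EPOLLEXCLUSIVE : 0);
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, data_rd, &ev) < 0)
        _exit(1);
    if (write(ready_wr, "r", 1) != 1) /* tell the parent we are registered */
        _exit(1);
    /* The data byte is never read, so the pipe stays readable. */
    woke = (epoll_wait(epfd, &ev, 1, 2000) == 1);
    if (write(result_wr, &woke, 1) != 1)
        _exit(1);
    _exit(0);
}

/*
 * Fork nproc children that all wait on one pipe, write a single byte and
 * return how many children saw an event. Returns -1 on any failure.
 */
static int count_wakeups(int nproc, int exclusive)
{
    int data[2], ready[2], result[2];
    int spawned = 0, woken = 0, failed = 0;
    char c;

    assert(nproc > 0);
    if (pipe(data) < 0 || pipe(ready) < 0 || pipe(result) < 0)
        return -1;

    for (; spawned < nproc; spawned++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            wakeup_child(data[0], ready[1], result[1], exclusive);
    }
    /* Drop our write ends so read() sees EOF if the children die early. */
    close(ready[1]);
    close(result[1]);

    for (int i = 0; i < spawned && !failed; i++)
        if (read(ready[0], &c, 1) != 1)
            failed = 1;

    poll(NULL, 0, 200); /* give the children time to block in epoll_wait() */
    if (!failed && write(data[1], "x", 1) != 1)
        failed = 1;

    for (int i = 0; i < spawned && !failed; i++) {
        if (read(result[0], &c, 1) != 1)
            failed = 1;
        else
            woken += c;
    }
    while (wait(NULL) > 0)
        ; /* reap every child of this process */

    close(data[0]);
    close(data[1]);
    close(ready[0]);
    close(result[0]);
    return failed ? -1 : woken;
}
```

Without the flag, count_wakeups(10, 0) should report all 10 children, because the registration is level-triggered and nobody consumes the byte. With count_wakeups(10, 1) on a kernel that carries the patch, the count should be lower, but the exact number depends on timing, as the quoted text already notes for the unpatched case.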