public inbox for linux-kernel@vger.kernel.org
From: Madars Vitolins <m@silodev.com>
To: Jason Baron <jbaron@akamai.com>
Cc: Eric Wong <normalperson@yhbt.net>, linux-kernel@vger.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Date: Sat, 05 Dec 2015 13:47:10 +0200
Message-ID: <9664870cea1bbe5938ac40ff2c161be6@silodev.com>
In-Reply-To: <565DFF0A.4060901@akamai.com>

Hi Jason,

I did the testing and wrote a blog article about it:
https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/

In summary:

Test case:
- One multi-threaded binary with 10 threads making a total of 1,000,000
calls to 250 single-threaded processes, each waiting in epoll() on a
POSIX queue
- Each 'call' sends a message to the shared queue (serviced by those
250 load-balanced processes), and the server sends the reply back to
the client thread's private queue (a sketch of the server loop is below)
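
For reference, here is a minimal sketch of what each of the 250 server
processes does. This is illustrative only, not the actual Enduro/X
code; the queue name and buffer size are made up (compile with -lrt):

	#include <fcntl.h>
	#include <mqueue.h>
	#include <sys/epoll.h>

	int main(void)
	{
		char buf[8192];             /* >= the queue's mq_msgsize */
		struct epoll_event ev, out;

		/* all servers open the same shared request queue */
		mqd_t q = mq_open("/svcq", O_RDONLY | O_CREAT | O_NONBLOCK,
				  0644, NULL);
		int epfd = epoll_create1(0);

		ev.events = EPOLLIN;  /* the patched run also sets EPOLLEXCLUSIVE */
		ev.data.fd = q;       /* on Linux a mqd_t is a plain fd */
		epoll_ctl(epfd, EPOLL_CTL_ADD, q, &ev);

		for (;;) {
			if (epoll_wait(epfd, &out, 1, -1) < 1)
				continue;
			/* non-blocking read: another woken server may have
			   consumed the message already */
			if (mq_receive(q, buf, sizeof(buf), NULL) >= 0) {
				/* handle the request and mq_send() the reply
				   to the client thread's private queue */
			}
		}
	}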

Tests were done on the following system:
- Host system: Linux Mint Mate 17.2 64bit, kernel: 3.13.0-24-generic
- CPU: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz (two cores)
- RAM: 16 GB
- Virtualization platform: Oracle VirtualBox 4.3.28
- Guest OS: Gentoo Linux 2015.03, kernel 4.3.0-gentoo, 64 bit.
- CPU for guest: Two cores
- RAM for guest: 5 GB (no swap used, about 4 GB free)
- Enduro/X version: 2.3.2


Results with the original kernel (no EPOLLEXCLUSIVE):

$ time ./bankcl
...

real 14m20.561s
user 0m21.823s
sys 10m49.821s


Patched kernel with the EPOLLEXCLUSIVE flag in use:
$ time ./bankcl
...
real 0m24.953s
user 0m17.497s
sys 0m4.445s

That is 14 minutes vs. 25 seconds (860.5 s / 24.95 s ≈ 34.5): the
EPOLLEXCLUSIVE flag makes the application run about *35 times faster*!
Without the flag a single event triggers wake-up processing for every
waiting server, and several of them are woken in user space only to
find nothing to do; with it, only one process is woken per event.

Guys, this is a must-have patch!

Thanks,
Madars



Jason Baron @ 2015-12-01 22:11 wrote:
> Hi Madars,
> 
> On 11/30/2015 04:28 PM, Madars Vitolins wrote:
>> Hi Jason,
>> 
>> Today I searched the mail archive and checked the patch you posted in
>> February; it basically does the same thing (a flag for
>> add_wait_queue_exclusive() + balancing).
>> 
>> So I plan to run some tests with your patch, flag on/off, and will
>> provide the results. I guess if I bring up 250 or 500 processes (which
>> would be realistic for a production environment) waiting on one queue,
>> there could be a notable difference in performance with EPOLLEXCLUSIVE
>> set or not.
>> 
> 
> Sounds good. Below is an updated patch if you want to try it - it only
> adds the 'EPOLLEXCLUSIVE' flag.
> 
> 
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 1e009ca..265fa7b 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -92,7 +92,7 @@
>   */
> 
>  /* Epoll private bits inside the event mask */
> -#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
> +#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
> 
>  /* Maximum number of nesting allowed inside epoll sets */
>  #define EP_MAX_NESTS 4
> @@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	unsigned long flags;
>  	struct epitem *epi = ep_item_from_wait(wait);
>  	struct eventpoll *ep = epi->ep;
> +	int ewake = 0;
> 
>  	if ((unsigned long)key & POLLFREE) {
>  		ep_pwq_from_wait(wait)->whead = NULL;
> @@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
>  	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
>  	 * wait list.
>  	 */
> -	if (waitqueue_active(&ep->wq))
> +	if (waitqueue_active(&ep->wq)) {
> +		ewake = 1;
>  		wake_up_locked(&ep->wq);
> +	}
>  	if (waitqueue_active(&ep->poll_wait))
>  		pwake++;
> 
> @@ -1078,6 +1081,9 @@ out_unlock:
>  	if (pwake)
>  		ep_poll_safewake(&ep->poll_wait);
> 
> +	if (epi->event.events & EPOLLEXCLUSIVE)
> +		return ewake;
> +
>  	return 1;
>  }
> 
> @@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
>  		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
>  		pwq->whead = whead;
>  		pwq->base = epi;
> -		add_wait_queue(whead, &pwq->wait);
> +		if (epi->event.events & EPOLLEXCLUSIVE)
> +			add_wait_queue_exclusive(whead, &pwq->wait);
> +		else
> +			add_wait_queue(whead, &pwq->wait);
>  		list_add_tail(&pwq->llink, &epi->pwqlist);
>  		epi->nwait++;
>  	} else {
> @@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
>  	if (f.file == tf.file || !is_file_epoll(f.file))
>  		goto error_tgt_fput;
> 
> +	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
> +		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
> +		goto error_tgt_fput;
> +
>  	/*
>  	 * At this point it is safe to assume that the "private_data" contains
>  	 * our own data structure.
> diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
> index bc81fb2..925bbfb 100644
> --- a/include/uapi/linux/eventpoll.h
> +++ b/include/uapi/linux/eventpoll.h
> @@ -26,6 +26,9 @@
>  #define EPOLL_CTL_DEL 2
>  #define EPOLL_CTL_MOD 3
> 
> +/* Add exclusively */
> +#define EPOLLEXCLUSIVE (1 << 28)
> +
>  /*
>   * Request the handling of system wakeup events so as to prevent system suspends
>   * from happening while those events are being processed.
> 
> 
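For anyone wanting to reproduce the test: on the application side this
is a one-bit change at registration time. A sketch (qfd stands for the
queue descriptor; the define matches the patch above, since distro
headers do not carry the flag yet):

	#ifndef EPOLLEXCLUSIVE
	#define EPOLLEXCLUSIVE (1 << 28)   /* value from the patch above */
	#endif

	struct epoll_event ev;
	ev.events = EPOLLIN | EPOLLEXCLUSIVE;   /* exclusive wakeup */
	ev.data.fd = qfd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, qfd, &ev) == -1)
		perror("epoll_ctl");

Note that on an unpatched kernel the extra bit appears to be silently
ignored rather than rejected, so as far as I can tell you quietly fall
back to the old wake-everyone behaviour instead of getting an error.
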
>> While hacking the kernel with debug prints, with 10 processes waiting
>> on one event source, I saw a lot of unneeded processing inside
>> eventpoll.c on the original kernel: ep_poll_callback() (plus other
>> work) ran 10 times for a single event, which resulted in several
>> processes being woken up in user space (the count probably varies
>> randomly depending on concurrency).
>> 
>> 
>> Meanwhile we are not the only ones talking about this patch; others
>> are asking too, see:
>> http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel
>> 
>> So what is the current status of your patch? What is blocking it from
>> getting into mainline?
>> 
> 
> If we can show some good test results here, I will re-submit it.
> 
> Thanks,
> 
> -Jason

Thread overview: 11+ messages
2015-11-28 22:54 epoll and multiple processes - eliminate unneeded process wake-ups Madars Vitolins
2015-11-30 19:45 ` Jason Baron
2015-11-30 21:28   ` Madars Vitolins
2015-12-01 20:11     ` Jason Baron
2015-12-05 11:47       ` Madars Vitolins [this message]
  -- strict thread matches above, loose matches on Subject: below --
2015-07-13 12:34 Madars Vitolins
2015-07-15 13:07 ` Madars Vitolins
2015-08-03 23:48 ` Eric Wong
2015-08-04 15:02   ` Jason Baron
2015-08-05 11:06     ` Madars Vitolins
2015-08-05 13:32       ` Jason Baron
