From: Jason Baron <jbaron@akamai.com>
To: Madars Vitolins <m@silodev.com>
Cc: Eric Wong <normalperson@yhbt.net>, linux-kernel@vger.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Date: Tue, 1 Dec 2015 15:11:54 -0500 [thread overview]
Message-ID: <565DFF0A.4060901@akamai.com> (raw)
In-Reply-To: <ad48f0339fc689b63df712969d8c17c6@silodev.com>
Hi Madars,
On 11/30/2015 04:28 PM, Madars Vitolins wrote:
> Hi Jason,
>
> I today did search the mail archive and checked your offered patch did on February, it basically does the some (flag for add_wait_queue_exclusive() + balance).
>
> So I plan to run off some tests with your patch, flag on/off and will provide results. I guess if I pull up 250 or 500 processes (which could real for production environment) waiting on one Q, then there could be a notable difference in performance with EPOLLEXCLUSIVE set or not.
>
Sounds good. Below is an updated patch if you want to try it - it only adds the 'EPOLLEXCLUSIVE' flag.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
*/
/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
unsigned long flags;
struct epitem *epi = ep_item_from_wait(wait);
struct eventpoll *ep = epi->ep;
+ int ewake = 0;
if ((unsigned long)key & POLLFREE) {
ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
* wait list.
*/
- if (waitqueue_active(&ep->wq))
+ if (waitqueue_active(&ep->wq)) {
+ ewake = 1;
wake_up_locked(&ep->wq);
+ }
if (waitqueue_active(&ep->poll_wait))
pwake++;
@@ -1078,6 +1081,9 @@ out_unlock:
if (pwake)
ep_poll_safewake(&ep->poll_wait);
+ if (epi->event.events & EPOLLEXCLUSIVE)
+ return ewake;
+
return 1;
}
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
pwq->whead = whead;
pwq->base = epi;
- add_wait_queue(whead, &pwq->wait);
+ if (epi->event.events & EPOLLEXCLUSIVE)
+ add_wait_queue_exclusive(whead, &pwq->wait);
+ else
+ add_wait_queue(whead, &pwq->wait);
list_add_tail(&pwq->llink, &epi->pwqlist);
epi->nwait++;
} else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
if (f.file == tf.file || !is_file_epoll(f.file))
goto error_tgt_fput;
+ if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+ (op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+ goto error_tgt_fput;
+
/*
* At this point it is safe to assume that the "private_data" contains
* our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3
+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+
/*
* Request the handling of system wakeup events so as to prevent system suspends
* from happening while those events are being processed.
> During kernel hacking with debug print, with 10 processes waiting on one event source, with original kernel I did see lot un-needed processing inside of eventpoll.c, it got 10x calls to ep_poll_callback() and other stuff for single event, which results with few processes waken up in user space (count probably gets randomly depending on concurrency).
>
>
> Meanwhile we are not the only ones who talk about this patch, see here: http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel others are asking too.
>
> So what is the current situation with your patch, what is the blocking for getting it into mainline?
>
If we can show some good test results here I will re-submit it.
Thanks,
-Jason
next prev parent reply other threads:[~2015-12-01 20:11 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-28 22:54 epoll and multiple processes - eliminate unneeded process wake-ups Madars Vitolins
2015-11-30 19:45 ` Jason Baron
2015-11-30 21:28 ` Madars Vitolins
2015-12-01 20:11 ` Jason Baron [this message]
2015-12-05 11:47 ` Madars Vitolins
-- strict thread matches above, loose matches on Subject: below --
2015-07-13 12:34 Madars Vitolins
2015-07-15 13:07 ` Madars Vitolins
2015-08-03 23:48 ` Eric Wong
2015-08-04 15:02 ` Jason Baron
2015-08-05 11:06 ` Madars Vitolins
2015-08-05 13:32 ` Jason Baron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=565DFF0A.4060901@akamai.com \
--to=jbaron@akamai.com \
--cc=linux-kernel@vger.kernel.org \
--cc=m@silodev.com \
--cc=normalperson@yhbt.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.