public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jason Baron <jbaron@akamai.com>
To: Madars Vitolins <m@silodev.com>
Cc: Eric Wong <normalperson@yhbt.net>, linux-kernel@vger.kernel.org
Subject: Re: epoll and multiple processes - eliminate unneeded process wake-ups
Date: Tue, 1 Dec 2015 15:11:54 -0500	[thread overview]
Message-ID: <565DFF0A.4060901@akamai.com> (raw)
In-Reply-To: <ad48f0339fc689b63df712969d8c17c6@silodev.com>

Hi Madars,

On 11/30/2015 04:28 PM, Madars Vitolins wrote:
> Hi Jason,
> 
> I today did search the mail archive and checked your offered patch did on February, it basically does the some (flag for add_wait_queue_exclusive() + balance).
> 
> So I plan to run off some tests with your patch, flag on/off and will provide results. I guess if I pull up 250 or 500 processes (which could real for production environment) waiting on one Q, then there could be a notable difference in performance with EPOLLEXCLUSIVE set or not.
> 

Sounds good. Below is an updated patch if you want to try it - it only adds the 'EPOLLEXCLUSIVE' flag.


diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	unsigned long flags;
 	struct epitem *epi = ep_item_from_wait(wait);
 	struct eventpoll *ep = epi->ep;
+	int ewake = 0;
 
 	if ((unsigned long)key & POLLFREE) {
 		ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
 	 * Wake up ( if active ) both the eventpoll wait list and the ->poll()
 	 * wait list.
 	 */
-	if (waitqueue_active(&ep->wq))
+	if (waitqueue_active(&ep->wq)) {
+		ewake = 1;
 		wake_up_locked(&ep->wq);
+	}
 	if (waitqueue_active(&ep->poll_wait))
 		pwake++;
 
@@ -1078,6 +1081,9 @@ out_unlock:
 	if (pwake)
 		ep_poll_safewake(&ep->poll_wait);
 
+	if (epi->event.events & EPOLLEXCLUSIVE)
+		return ewake;
+
 	return 1;
 }
 
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (epi->event.events & EPOLLEXCLUSIVE)
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	if (f.file == tf.file || !is_file_epoll(f.file))
 		goto error_tgt_fput;
 
+	if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+		(op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+		goto error_tgt_fput;
+
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+
 /*
  * Request the handling of system wakeup events so as to prevent system suspends
  * from happening while those events are being processed.


> During kernel hacking with debug print, with 10 processes waiting on one event source, with original kernel I did see lot un-needed processing inside of eventpoll.c, it got 10x calls to ep_poll_callback() and other stuff for single event, which results with few processes waken up in user space (count probably gets randomly depending on concurrency).
> 
> 
> Meanwhile we are not the only ones who talk about this patch, see here: http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel others are asking too.
> 
> So what is the current situation with your patch, what is the blocking for getting it into mainline?
> 

If we can show some good test results here I will re-submit it.

Thanks,

-Jason


  reply	other threads:[~2015-12-01 20:11 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-28 22:54 epoll and multiple processes - eliminate unneeded process wake-ups Madars Vitolins
2015-11-30 19:45 ` Jason Baron
2015-11-30 21:28   ` Madars Vitolins
2015-12-01 20:11     ` Jason Baron [this message]
2015-12-05 11:47       ` Madars Vitolins
  -- strict thread matches above, loose matches on Subject: below --
2015-07-13 12:34 Madars Vitolins
2015-07-15 13:07 ` Madars Vitolins
2015-08-03 23:48 ` Eric Wong
2015-08-04 15:02   ` Jason Baron
2015-08-05 11:06     ` Madars Vitolins
2015-08-05 13:32       ` Jason Baron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=565DFF0A.4060901@akamai.com \
    --to=jbaron@akamai.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m@silodev.com \
    --cc=normalperson@yhbt.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox