linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Shawn Landden <slandden@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC] EPOLL_KILLME: New flag to epoll_wait() that subscribes process to death row (new syscall)
Date: Wed, 1 Nov 2017 07:04:54 -0700
Message-ID: <20171101140454.GA28205@bombadil.infradead.org>
In-Reply-To: <20171101053244.5218-1-slandden@gmail.com>

On Tue, Oct 31, 2017 at 10:32:44PM -0700, Shawn Landden wrote:
> It is common for services to be stateless around their main event loop.
> If a process passes the EPOLL_KILLME flag to epoll_wait5() then it
> signals to the kernel that epoll_wait5() may not complete, and the kernel
> may send SIGKILL if resources get tight.
> 
> See my systemd patch: https://github.com/shawnl/systemd/tree/killme
> 
> Android uses this memory model for all programs, and having it in the
> kernel will enable integration with the page cache (not in this
> series).

I'm not taking a position on whether this is a good feature to have, but
your implementation could do with some improvement.

> +static LIST_HEAD(deathrow_q);
> +static long deathrow_len __read_mostly;

In what sense is this __read_mostly when it's modified by every call that
has EPOLL_KILLME set?  Also, why do you think this is a useful statistic
to gather in the kernel and expose to userspace?

> +/* TODO: Can this lock be removed by using atomic instructions to update
> + * queue?
> + */
> +static DEFINE_MUTEX(deathrow_mutex);

This doesn't need to be a mutex; you don't do anything that sleeps while
holding it.  It should be a spinlock instead (but see below).
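
To illustrate the point about short, non-sleeping critical sections, here
is a rough userspace sketch (not kernel code).  It uses a C11 atomic_flag
as a stand-in for a spinlock; the names queue_lock, queue_len and
queue_len_add() are made up for this example and do not come from the
patch.

```c
#include <stdatomic.h>

/* Userspace stand-in for a kernel spinlock; this is not the kernel API. */
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;
static long queue_len;	/* hypothetical counter guarded by queue_lock */

static void queue_lock_acquire(void)
{
	/* Busy-wait until the flag was clear when we set it. */
	while (atomic_flag_test_and_set_explicit(&queue_lock,
						 memory_order_acquire))
		;
}

static void queue_lock_release(void)
{
	atomic_flag_clear_explicit(&queue_lock, memory_order_release);
}

/* The critical section is one addition: nothing in it can block. */
static long queue_len_add(long delta)
{
	long now;

	queue_lock_acquire();
	queue_len += delta;
	now = queue_len;
	queue_lock_release();
	return now;
}
```

Because the protected region never sleeps, a waiter only spins for the
length of one update, which is the case a spinlock is meant for.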

> @@ -380,6 +380,9 @@ struct sched_entity {
>  	struct list_head		group_node;
>  	unsigned int			on_rq;
>  
> +	unsigned			on_deathrow:1;
> +	struct list_head		deathrow;
> +
>  	u64				exec_start;
>  	u64				sum_exec_runtime;
>  	u64				vruntime;

You're adding an extra 16 bytes to each task to implement this feature.  I
don't like that, and I think you can avoid it.

Turn 'deathrow' into a wait_queue_head_t.  Declare the wait_queue_entry
on the stack.
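
A minimal userspace model of that idea, under my own assumptions: the
queue head is the only permanent storage, and each waiter's entry lives in
the stack frame of the blocking call.  The struct and the function names
(target_enroll, target_remove, wait_enrolled) are hypothetical, and the
sketch is single-threaded with no locking; in the kernel the
wait_queue_head_t would bring its own lock.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
	struct node *prev, *next;
};

/* Shared queue head: the only permanent storage this sketch needs. */
static struct node target_q = { &target_q, &target_q };

static void target_enroll(struct node *n)
{
	/* Append n at the tail of the queue. */
	n->prev = target_q.prev;
	n->next = &target_q;
	target_q.prev->next = n;
	target_q.prev = n;
}

static void target_remove(struct node *n)
{
	/* Unlink n and leave it pointing at itself. */
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static size_t target_count(void)
{
	size_t count = 0;

	for (struct node *n = target_q.next; n != &target_q; n = n->next)
		count++;
	return count;
}

/*
 * Stand-in for the blocking call.  The entry is a local variable, so it
 * exists only while the caller is waiting.  Returns the queue length
 * observed while enrolled.
 */
static size_t wait_enrolled(void (*block)(void))
{
	struct node entry;
	size_t seen;

	target_enroll(&entry);
	if (block)
		block();	/* hypothetical: where the task would sleep */
	seen = target_count();
	target_remove(&entry);
	return seen;
}
```

Nothing is added to the per-task structure here; the cost is paid only by
tasks that are actually waiting, and only for as long as they wait.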

While you're at it, I don't think 'deathrow' is an epoll concept.
I think it's an OOM killer concept which happens to be only accessible
through epoll today (but we could consider allowing other system calls
to place tasks on it in the future).  So the central place for all this is
in oom_kill.c and epoll only calls into it.  Maybe we have 'deathrow_enroll()'
and 'deathrow_remove()' APIs in oom_kill.c.

And I don't like the name 'deathrow'.  How about oom_target?


