All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Khalid Aziz <khalid.aziz@oracle.com>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	peterz@infradead.org, akpm@linux-foundation.org,
	andi.kleen@intel.com, rob@landley.net, viro@zeniv.linux.org.uk,
	venki@google.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
Date: Tue, 4 Mar 2014 14:56:24 +0100	[thread overview]
Message-ID: <20140304135624.GA6846@redhat.com> (raw)
In-Reply-To: <1393870033-31076-1-git-send-email-khalid.aziz@oracle.com>

On 03/03, Khalid Aziz wrote:
>
> This queueing
> and subsequent CPU cycle wastage can be avoided if the locking thread
> could request to be granted an additional timeslice if its current
> timeslice runs out before it gives up the lock.

Well. I am in no position to discuss the changes in sched/fair.c. I have
to admit that I am skeptical about the whole idea/implementation, but I
leave this to sched/ maintainers.

However, at least the proc/mmap changes do not look right. I didn't read
the whole patch, just picked a "random" ->mmap function, see below.

>  kernel/sched/preempt_delay.c                    |  39 ++++++

Why? This can go into proc/ as well.

> +static void
> +close_preempt_delay_vmops(struct vm_area_struct *vma)
> +{
> +	struct preemp_delay_mmap_state *state;
> +
> +	state = (struct preemp_delay_mmap_state *) vma->vm_private_data;
> +	BUG_ON(!state || !state->task);
> +
> +	state->page->mapping = NULL;
> +	/* point delay request flag pointer back to old flag in task_struct */
> +	state->task->sched_preempt_delay.delay_req =
> +			&state->task->sched_preempt_delay.delay_flag;
> +	state->task->sched_preempt_delay.mmap_state = NULL;
> +	vfree(state->kaddr);
> +	kfree(state);
> +	vma->vm_private_data = NULL;
> +}

Suppose that state->task != current. Then this can race with do_exit()
which cleanups ->mmap_state too. OTOH do_exit() unmaps this region, it
is not clear why it can't rely in vm_ops->close().

Hmm. In fact I think do_exit() should crash after munmap? ->mmap_state
should be NULL ?? Perhaps I misread this patch completely...

> +static int
> +tid_preempt_delay_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	int retval = 0;
> +	void *kaddr = NULL;
> +	struct preemp_delay_mmap_state *state = NULL;
> +	struct inode *inode = file_inode(file);
> +	struct task_struct *task;
> +	struct page *page;
> +
> +	/*
> +	 * Validate args:
> +	 * - Only offset 0 support for now
> +	 * - size should be PAGE_SIZE
> +	 */
> +	if (vma->vm_pgoff != 0 || (vma->vm_end - vma->vm_start) != PAGE_SIZE) {
> +		retval = -EINVAL;
> +		goto error;
> +	}
> +
> +	/*
> +	 * Only one mmap allowed at a time
> +	 */
> +	if (current->sched_preempt_delay.mmap_state != NULL) {
> +		retval = -EEXIST;
> +		goto error;

This assumes that we are going to setup current->sched_preempt_delay.mmap_state,
but what if the task opens /proc/random_tid/sched_preempt_delay ?

> +	state = kzalloc(sizeof(struct preemp_delay_mmap_state), GFP_KERNEL);
> +	kaddr = vmalloc_user(PAGE_SIZE);

Why vmalloc() ? We only need a single page?

> +	task = get_proc_task(inode);

And it seems that nobody does put_task_struct(state->task);

> +	state->page = page;
> +	state->kaddr = kaddr;
> +	state->uaddr = (void *)vma->vm_start;

This is used by do_exit(). But ->vm_start can be changed by mremap() ?


Hmm. And mremap() can do vm_ops->close() too. But the new vma will
have the same vm_ops/vm_private_data, so exit_mmap() will try to do
this again... Perhaps I missed something, but I bet this all can't be
right.

> +	state->task = task;
> +
> +	/* Clear the current delay request flag */
> +	task->sched_preempt_delay.delay_flag = 0;
> +
> +	/* Point delay request flag pointer to the newly allocated memory */
> +	task->sched_preempt_delay.delay_req = (unsigned char *)kaddr;
> +
> +	task->sched_preempt_delay.mmap_state = state;
> +	vma->vm_private_data = state;
> +	vma->vm_ops = &preempt_delay_vmops;
> +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_SHARED | VM_WRITE;

This probably also needs VM_IO, to protect from madvise(MADV_DOFORK).
VM_SHARED/VM_WRITE doesn't look right.

Oleg.


  parent reply	other threads:[~2014-03-04 13:57 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-03 18:07 [RFC] [PATCH] Pre-emption control for userspace Khalid Aziz
2014-03-03 21:51 ` Davidlohr Bueso
2014-03-03 23:29   ` Khalid Aziz
2014-03-04 13:56 ` Oleg Nesterov [this message]
2014-03-04 17:44   ` Khalid Aziz
2014-03-04 18:38     ` Al Viro
2014-03-04 19:01       ` Khalid Aziz
2014-03-04 19:03     ` Oleg Nesterov
2014-03-04 20:14       ` Khalid Aziz
2014-03-05 14:38         ` Oleg Nesterov
2014-03-05 16:12           ` Oleg Nesterov
2014-03-05 17:10             ` Khalid Aziz
2014-03-04 21:12 ` H. Peter Anvin
2014-03-04 21:39   ` Khalid Aziz
2014-03-04 22:23     ` One Thousand Gnomes
2014-03-04 22:44       ` Khalid Aziz
2014-03-05  0:39         ` Thomas Gleixner
2014-03-05  0:51           ` Andi Kleen
2014-03-05 11:10             ` Peter Zijlstra
2014-03-05 17:29               ` Khalid Aziz
2014-03-05 19:58               ` Khalid Aziz
2014-03-06  9:57                 ` Peter Zijlstra
2014-03-06 16:08                   ` Khalid Aziz
2014-03-06 11:14                 ` Thomas Gleixner
2014-03-06 16:32                   ` Khalid Aziz
2014-03-05 14:54             ` Oleg Nesterov
2014-03-05 15:56               ` Andi Kleen
2014-03-05 16:36                 ` Oleg Nesterov
2014-03-05 17:22                   ` Khalid Aziz
2014-03-05 23:13                     ` David Lang
2014-03-05 23:48                       ` Khalid Aziz
2014-03-05 23:56                         ` H. Peter Anvin
2014-03-06  0:02                           ` Khalid Aziz
2014-03-06  0:13                             ` H. Peter Anvin
2014-03-05 23:59                         ` David Lang
2014-03-06  0:17                           ` Khalid Aziz
2014-03-06  0:36                             ` David Lang
2014-03-06  1:22                               ` Khalid Aziz
2014-03-06 14:23                                 ` David Lang
2014-03-06 12:13             ` Kevin Easton
2014-03-06 13:59               ` Peter Zijlstra
2014-03-06 22:41                 ` Andi Kleen
2014-03-06 14:25               ` David Lang
2014-03-06 16:12                 ` Khalid Aziz
2014-03-06 13:24   ` Rasmus Villemoes
2014-03-06 13:34     ` Peter Zijlstra
2014-03-06 13:45       ` Rasmus Villemoes
2014-03-06 14:02         ` Peter Zijlstra
2014-03-06 14:33           ` Thomas Gleixner
2014-03-06 14:34             ` H. Peter Anvin
2014-03-06 14:04         ` Thomas Gleixner
2014-03-25 17:17 ` [PATCH v2] " Khalid Aziz
2014-03-25 17:44   ` Andrew Morton
2014-03-25 17:56     ` Khalid Aziz
2014-03-25 18:14       ` Andrew Morton
2014-03-25 17:46   ` Oleg Nesterov
2014-03-25 17:59     ` Khalid Aziz
2014-03-25 18:20   ` Andi Kleen
2014-03-25 18:47     ` Khalid Aziz
2014-03-25 19:47       ` Andi Kleen
2014-03-25 18:59   ` Eric W. Biederman
2014-03-25 19:15     ` Khalid Aziz
2014-03-25 20:31       ` Eric W. Biederman
2014-03-25 21:37         ` Khalid Aziz
2014-03-26  6:03     ` Mike Galbraith
2014-03-25 23:01 ` [RFC] [PATCH] " Davidlohr Bueso
2014-03-25 23:29   ` Khalid Aziz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140304135624.GA6846@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=hpa@zytor.com \
    --cc=khalid.aziz@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rob@landley.net \
    --cc=tglx@linutronix.de \
    --cc=venki@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.