All of lore.kernel.org
 help / color / mirror / Atom feed
From: Khalid Aziz <khalid.aziz@oracle.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	peterz@infradead.org, akpm@linux-foundation.org,
	andi.kleen@intel.com, rob@landley.net, viro@zeniv.linux.org.uk,
	venki@google.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
Date: Tue, 04 Mar 2014 10:44:54 -0700	[thread overview]
Message-ID: <53161116.9050109@oracle.com> (raw)
In-Reply-To: <20140304135624.GA6846@redhat.com>

Thanks for the review. Please see my comments inline below.

On 03/04/2014 06:56 AM, Oleg Nesterov wrote:
> On 03/03, Khalid Aziz wrote:
>>   kernel/sched/preempt_delay.c                    |  39 ++++++
>
> Why? This can go into proc/ as well.
>

Sure. No strong reason to keep these functions in separate file. These 
functions can go into proc/fs/base.c.

>> +static void
>> +close_preempt_delay_vmops(struct vm_area_struct *vma)
>> +{
>> +	struct preemp_delay_mmap_state *state;
>> +
>> +	state = (struct preemp_delay_mmap_state *) vma->vm_private_data;
>> +	BUG_ON(!state || !state->task);
>> +
>> +	state->page->mapping = NULL;
>> +	/* point delay request flag pointer back to old flag in task_struct */
>> +	state->task->sched_preempt_delay.delay_req =
>> +			&state->task->sched_preempt_delay.delay_flag;
>> +	state->task->sched_preempt_delay.mmap_state = NULL;
>> +	vfree(state->kaddr);
>> +	kfree(state);
>> +	vma->vm_private_data = NULL;
>> +}
>
> Suppose that state->task != current. Then this can race with do_exit()
> which cleanups ->mmap_state too. OTOH do_exit() unmaps this region, it
> is not clear why it can't rely in vm_ops->close().
>
> Hmm. In fact I think do_exit() should crash after munmap? ->mmap_state
> should be NULL ?? Perhaps I misread this patch completely...

do_exit() unmaps mmap_state->uaddr, and frees up mmap_state->kaddr and 
mmap_state. mmap_state should not be NULL after unmap. vfree() and 
kfree() are tolerant of pointers that have already been freed. On the 
other hand mmap_state can be NULL in do_exit() if do_exit() and 
close_preempt_delay_vmops() were to race since 
close_preempt_delay_vmops() sets mmap_state to NULL just before it frees 
it up. Could they indeed race, because the thread happens to be killed 
just as it had called munmap()? I can protect against that with a 
refcount in mmap_state. Do you feel this is necessary/helpful to do?

>
>> +static int
>> +tid_preempt_delay_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +	int retval = 0;
>> +	void *kaddr = NULL;
>> +	struct preemp_delay_mmap_state *state = NULL;
>> +	struct inode *inode = file_inode(file);
>> +	struct task_struct *task;
>> +	struct page *page;
>> +
>> +	/*
>> +	 * Validate args:
>> +	 * - Only offset 0 support for now
>> +	 * - size should be PAGE_SIZE
>> +	 */
>> +	if (vma->vm_pgoff != 0 || (vma->vm_end - vma->vm_start) != PAGE_SIZE) {
>> +		retval = -EINVAL;
>> +		goto error;
>> +	}
>> +
>> +	/*
>> +	 * Only one mmap allowed at a time
>> +	 */
>> +	if (current->sched_preempt_delay.mmap_state != NULL) {
>> +		retval = -EEXIST;
>> +		goto error;
>
> This assumes that we are going to setup current->sched_preempt_delay.mmap_state,
> but what if the task opens /proc/random_tid/sched_preempt_delay ?

Good point. A thread should not be allowed to request preemption delay 
for another thread. I would recommend leaving this code alone and adding 
following code before this:

if (get_proc_task(inode) != current) {
	retval = -EPERM;
	goto error;
}

Sounds reasonable?

>
>> +	state = kzalloc(sizeof(struct preemp_delay_mmap_state), GFP_KERNEL);
>> +	kaddr = vmalloc_user(PAGE_SIZE);
>
> Why vmalloc() ? We only need a single page?
>

Makes sense. I will switch to get_zeroed_page().

>> +	task = get_proc_task(inode);
>
> And it seems that nobody does put_task_struct(state->task);

Good catch. I had caught the other two instances of get_proc_task() but 
missed this one.

>
>> +	state->page = page;
>> +	state->kaddr = kaddr;
>> +	state->uaddr = (void *)vma->vm_start;
>
> This is used by do_exit(). But ->vm_start can be changed by mremap() ?
>
>
> Hmm. And mremap() can do vm_ops->close() too. But the new vma will
> have the same vm_ops/vm_private_data, so exit_mmap() will try to do
> this again... Perhaps I missed something, but I bet this all can't be
> right.

Would you say sys_munmap() of mmap_state->uaddr is not even needed since 
exit_mm() will do this any way further down in do_exit()? If I were to 
remove this sys_munmap(), that could simplify the race issues as well.

>
>> +	state->task = task;
>> +
>> +	/* Clear the current delay request flag */
>> +	task->sched_preempt_delay.delay_flag = 0;
>> +
>> +	/* Point delay request flag pointer to the newly allocated memory */
>> +	task->sched_preempt_delay.delay_req = (unsigned char *)kaddr;
>> +
>> +	task->sched_preempt_delay.mmap_state = state;
>> +	vma->vm_private_data = state;
>> +	vma->vm_ops = &preempt_delay_vmops;
>> +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_SHARED | VM_WRITE;
>
> This probably also needs VM_IO, to protect from madvise(MADV_DOFORK).

Yes, you are right. I will add that.

> VM_SHARED/VM_WRITE doesn't look right.

VM_SHARED is wrong but VM_WRITE is needed I think since the thread will 
write to the mmap'd page to signal to request preemption delay.

>
> Oleg.
>

I appreciate your taking the time to review this code. Thank you very much.

--
Khalid

  reply	other threads:[~2014-03-04 17:45 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-03 18:07 [RFC] [PATCH] Pre-emption control for userspace Khalid Aziz
2014-03-03 21:51 ` Davidlohr Bueso
2014-03-03 23:29   ` Khalid Aziz
2014-03-04 13:56 ` Oleg Nesterov
2014-03-04 17:44   ` Khalid Aziz [this message]
2014-03-04 18:38     ` Al Viro
2014-03-04 19:01       ` Khalid Aziz
2014-03-04 19:03     ` Oleg Nesterov
2014-03-04 20:14       ` Khalid Aziz
2014-03-05 14:38         ` Oleg Nesterov
2014-03-05 16:12           ` Oleg Nesterov
2014-03-05 17:10             ` Khalid Aziz
2014-03-04 21:12 ` H. Peter Anvin
2014-03-04 21:39   ` Khalid Aziz
2014-03-04 22:23     ` One Thousand Gnomes
2014-03-04 22:44       ` Khalid Aziz
2014-03-05  0:39         ` Thomas Gleixner
2014-03-05  0:51           ` Andi Kleen
2014-03-05 11:10             ` Peter Zijlstra
2014-03-05 17:29               ` Khalid Aziz
2014-03-05 19:58               ` Khalid Aziz
2014-03-06  9:57                 ` Peter Zijlstra
2014-03-06 16:08                   ` Khalid Aziz
2014-03-06 11:14                 ` Thomas Gleixner
2014-03-06 16:32                   ` Khalid Aziz
2014-03-05 14:54             ` Oleg Nesterov
2014-03-05 15:56               ` Andi Kleen
2014-03-05 16:36                 ` Oleg Nesterov
2014-03-05 17:22                   ` Khalid Aziz
2014-03-05 23:13                     ` David Lang
2014-03-05 23:48                       ` Khalid Aziz
2014-03-05 23:56                         ` H. Peter Anvin
2014-03-06  0:02                           ` Khalid Aziz
2014-03-06  0:13                             ` H. Peter Anvin
2014-03-05 23:59                         ` David Lang
2014-03-06  0:17                           ` Khalid Aziz
2014-03-06  0:36                             ` David Lang
2014-03-06  1:22                               ` Khalid Aziz
2014-03-06 14:23                                 ` David Lang
2014-03-06 12:13             ` Kevin Easton
2014-03-06 13:59               ` Peter Zijlstra
2014-03-06 22:41                 ` Andi Kleen
2014-03-06 14:25               ` David Lang
2014-03-06 16:12                 ` Khalid Aziz
2014-03-06 13:24   ` Rasmus Villemoes
2014-03-06 13:34     ` Peter Zijlstra
2014-03-06 13:45       ` Rasmus Villemoes
2014-03-06 14:02         ` Peter Zijlstra
2014-03-06 14:33           ` Thomas Gleixner
2014-03-06 14:34             ` H. Peter Anvin
2014-03-06 14:04         ` Thomas Gleixner
2014-03-25 17:17 ` [PATCH v2] " Khalid Aziz
2014-03-25 17:44   ` Andrew Morton
2014-03-25 17:56     ` Khalid Aziz
2014-03-25 18:14       ` Andrew Morton
2014-03-25 17:46   ` Oleg Nesterov
2014-03-25 17:59     ` Khalid Aziz
2014-03-25 18:20   ` Andi Kleen
2014-03-25 18:47     ` Khalid Aziz
2014-03-25 19:47       ` Andi Kleen
2014-03-25 18:59   ` Eric W. Biederman
2014-03-25 19:15     ` Khalid Aziz
2014-03-25 20:31       ` Eric W. Biederman
2014-03-25 21:37         ` Khalid Aziz
2014-03-26  6:03     ` Mike Galbraith
2014-03-25 23:01 ` [RFC] [PATCH] " Davidlohr Bueso
2014-03-25 23:29   ` Khalid Aziz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53161116.9050109@oracle.com \
    --to=khalid.aziz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rob@landley.net \
    --cc=tglx@linutronix.de \
    --cc=venki@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.