From: Khalid Aziz <khalid.aziz@oracle.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
peterz@infradead.org, akpm@linux-foundation.org,
andi.kleen@intel.com, rob@landley.net, viro@zeniv.linux.org.uk,
venki@google.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH] Pre-emption control for userspace
Date: Tue, 04 Mar 2014 10:44:54 -0700 [thread overview]
Message-ID: <53161116.9050109@oracle.com> (raw)
In-Reply-To: <20140304135624.GA6846@redhat.com>
Thanks for the review. Please see my comments inline below.
On 03/04/2014 06:56 AM, Oleg Nesterov wrote:
> On 03/03, Khalid Aziz wrote:
>> kernel/sched/preempt_delay.c | 39 ++++++
>
> Why? This can go into proc/ as well.
>
Sure. No strong reason to keep these functions in separate file. These
functions can go into proc/fs/base.c.
>> +static void
>> +close_preempt_delay_vmops(struct vm_area_struct *vma)
>> +{
>> + struct preemp_delay_mmap_state *state;
>> +
>> + state = (struct preemp_delay_mmap_state *) vma->vm_private_data;
>> + BUG_ON(!state || !state->task);
>> +
>> + state->page->mapping = NULL;
>> + /* point delay request flag pointer back to old flag in task_struct */
>> + state->task->sched_preempt_delay.delay_req =
>> + &state->task->sched_preempt_delay.delay_flag;
>> + state->task->sched_preempt_delay.mmap_state = NULL;
>> + vfree(state->kaddr);
>> + kfree(state);
>> + vma->vm_private_data = NULL;
>> +}
>
> Suppose that state->task != current. Then this can race with do_exit()
> which cleanups ->mmap_state too. OTOH do_exit() unmaps this region, it
> is not clear why it can't rely in vm_ops->close().
>
> Hmm. In fact I think do_exit() should crash after munmap? ->mmap_state
> should be NULL ?? Perhaps I misread this patch completely...
do_exit() unmaps mmap_state->uaddr, and frees up mmap_state->kaddr and
mmap_state. mmap_state should not be NULL after unmap. vfree() and
kfree() are tolerant of pointers that have already been freed. On the
other hand mmap_state can be NULL in do_exit() if do_exit() and
close_preempt_delay_vmops() were to race since
close_preempt_delay_vmops() sets mmap_state to NULL just before it frees
it up. Could they indeed race, because the thread happens to be killed
just as it had called munmap()? I can protect against that with a
refcount in mmap_state. Do you feel this is necessary/helpful to do?
>
>> +static int
>> +tid_preempt_delay_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> + int retval = 0;
>> + void *kaddr = NULL;
>> + struct preemp_delay_mmap_state *state = NULL;
>> + struct inode *inode = file_inode(file);
>> + struct task_struct *task;
>> + struct page *page;
>> +
>> + /*
>> + * Validate args:
>> + * - Only offset 0 support for now
>> + * - size should be PAGE_SIZE
>> + */
>> + if (vma->vm_pgoff != 0 || (vma->vm_end - vma->vm_start) != PAGE_SIZE) {
>> + retval = -EINVAL;
>> + goto error;
>> + }
>> +
>> + /*
>> + * Only one mmap allowed at a time
>> + */
>> + if (current->sched_preempt_delay.mmap_state != NULL) {
>> + retval = -EEXIST;
>> + goto error;
>
> This assumes that we are going to setup current->sched_preempt_delay.mmap_state,
> but what if the task opens /proc/random_tid/sched_preempt_delay ?
Good point. A thread should not be allowed to request preemption delay
for another thread. I would recommend leaving this code alone and adding
following code before this:
if (get_proc_task(inode) != current) {
retval = -EPERM;
goto error;
}
Sounds reasonable?
>
>> + state = kzalloc(sizeof(struct preemp_delay_mmap_state), GFP_KERNEL);
>> + kaddr = vmalloc_user(PAGE_SIZE);
>
> Why vmalloc() ? We only need a single page?
>
Makes sense. I will switch to get_zeroed_page().
>> + task = get_proc_task(inode);
>
> And it seems that nobody does put_task_struct(state->task);
Good catch. I had caught the other two instances of get_proc_task() but
missed this one.
>
>> + state->page = page;
>> + state->kaddr = kaddr;
>> + state->uaddr = (void *)vma->vm_start;
>
> This is used by do_exit(). But ->vm_start can be changed by mremap() ?
>
>
> Hmm. And mremap() can do vm_ops->close() too. But the new vma will
> have the same vm_ops/vm_private_data, so exit_mmap() will try to do
> this again... Perhaps I missed something, but I bet this all can't be
> right.
Would you say sys_munmap() of mmap_state->uaddr is not even needed since
exit_mm() will do this any way further down in do_exit()? If I were to
remove this sys_munmap(), that could simplify the race issues as well.
>
>> + state->task = task;
>> +
>> + /* Clear the current delay request flag */
>> + task->sched_preempt_delay.delay_flag = 0;
>> +
>> + /* Point delay request flag pointer to the newly allocated memory */
>> + task->sched_preempt_delay.delay_req = (unsigned char *)kaddr;
>> +
>> + task->sched_preempt_delay.mmap_state = state;
>> + vma->vm_private_data = state;
>> + vma->vm_ops = &preempt_delay_vmops;
>> + vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_SHARED | VM_WRITE;
>
> This probably also needs VM_IO, to protect from madvise(MADV_DOFORK).
Yes, you are right. I will add that.
> VM_SHARED/VM_WRITE doesn't look right.
VM_SHARED is wrong but VM_WRITE is needed I think since the thread will
write to the mmap'd page to signal to request preemption delay.
>
> Oleg.
>
I appreciate your taking the time to review this code. Thank you very much.
--
Khalid
next prev parent reply other threads:[~2014-03-04 17:45 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-03 18:07 [RFC] [PATCH] Pre-emption control for userspace Khalid Aziz
2014-03-03 21:51 ` Davidlohr Bueso
2014-03-03 23:29 ` Khalid Aziz
2014-03-04 13:56 ` Oleg Nesterov
2014-03-04 17:44 ` Khalid Aziz [this message]
2014-03-04 18:38 ` Al Viro
2014-03-04 19:01 ` Khalid Aziz
2014-03-04 19:03 ` Oleg Nesterov
2014-03-04 20:14 ` Khalid Aziz
2014-03-05 14:38 ` Oleg Nesterov
2014-03-05 16:12 ` Oleg Nesterov
2014-03-05 17:10 ` Khalid Aziz
2014-03-04 21:12 ` H. Peter Anvin
2014-03-04 21:39 ` Khalid Aziz
2014-03-04 22:23 ` One Thousand Gnomes
2014-03-04 22:44 ` Khalid Aziz
2014-03-05 0:39 ` Thomas Gleixner
2014-03-05 0:51 ` Andi Kleen
2014-03-05 11:10 ` Peter Zijlstra
2014-03-05 17:29 ` Khalid Aziz
2014-03-05 19:58 ` Khalid Aziz
2014-03-06 9:57 ` Peter Zijlstra
2014-03-06 16:08 ` Khalid Aziz
2014-03-06 11:14 ` Thomas Gleixner
2014-03-06 16:32 ` Khalid Aziz
2014-03-05 14:54 ` Oleg Nesterov
2014-03-05 15:56 ` Andi Kleen
2014-03-05 16:36 ` Oleg Nesterov
2014-03-05 17:22 ` Khalid Aziz
2014-03-05 23:13 ` David Lang
2014-03-05 23:48 ` Khalid Aziz
2014-03-05 23:56 ` H. Peter Anvin
2014-03-06 0:02 ` Khalid Aziz
2014-03-06 0:13 ` H. Peter Anvin
2014-03-05 23:59 ` David Lang
2014-03-06 0:17 ` Khalid Aziz
2014-03-06 0:36 ` David Lang
2014-03-06 1:22 ` Khalid Aziz
2014-03-06 14:23 ` David Lang
2014-03-06 12:13 ` Kevin Easton
2014-03-06 13:59 ` Peter Zijlstra
2014-03-06 22:41 ` Andi Kleen
2014-03-06 14:25 ` David Lang
2014-03-06 16:12 ` Khalid Aziz
2014-03-06 13:24 ` Rasmus Villemoes
2014-03-06 13:34 ` Peter Zijlstra
2014-03-06 13:45 ` Rasmus Villemoes
2014-03-06 14:02 ` Peter Zijlstra
2014-03-06 14:33 ` Thomas Gleixner
2014-03-06 14:34 ` H. Peter Anvin
2014-03-06 14:04 ` Thomas Gleixner
2014-03-25 17:17 ` [PATCH v2] " Khalid Aziz
2014-03-25 17:44 ` Andrew Morton
2014-03-25 17:56 ` Khalid Aziz
2014-03-25 18:14 ` Andrew Morton
2014-03-25 17:46 ` Oleg Nesterov
2014-03-25 17:59 ` Khalid Aziz
2014-03-25 18:20 ` Andi Kleen
2014-03-25 18:47 ` Khalid Aziz
2014-03-25 19:47 ` Andi Kleen
2014-03-25 18:59 ` Eric W. Biederman
2014-03-25 19:15 ` Khalid Aziz
2014-03-25 20:31 ` Eric W. Biederman
2014-03-25 21:37 ` Khalid Aziz
2014-03-26 6:03 ` Mike Galbraith
2014-03-25 23:01 ` [RFC] [PATCH] " Davidlohr Bueso
2014-03-25 23:29 ` Khalid Aziz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53161116.9050109@oracle.com \
--to=khalid.aziz@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andi.kleen@intel.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rob@landley.net \
--cc=tglx@linutronix.de \
--cc=venki@google.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.