Re: [PATCH] sched/core: Create new task with twice disabled preemption

public inbox for linux-kernel@vger.kernel.org

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>,
	Peter Zijlstra <peterz@infradead.org>,
	Kirill Tkhai <ktkhai@parallels.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/core: Create new task with twice disabled preemption
Date: Mon, 17 Feb 2014 10:37:38 +0100
Message-ID: <20140217103738.7369d84b@mschwide>
In-Reply-To: <20140214105255.GA10596@arm.com>

On Fri, 14 Feb 2014 10:52:55 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Thu, Feb 13, 2014 at 09:32:22PM +0400, Kirill Tkhai wrote:
> > On 13.02.2014 20:00, Peter Zijlstra wrote:
> > > On Thu, Feb 13, 2014 at 07:51:56PM +0400, Kirill Tkhai wrote:
> > >> For archs without __ARCH_WANT_UNLOCKED_CTXSW set this means
> > >> that all newly created tasks execute finish_arch_post_lock_switch()
> > >> and post_schedule() with preemption enabled.
> > > 
> > > That's IA64 and MIPS; do they have a 'good' reason to use this?
> > 
> > It seems my description misled the reader; I'm sorry if so.
> > 
> > I mean all architectures *except* IA64 and MIPS, i.e. all those
> > which do not define __ARCH_WANT_UNLOCKED_CTXSW.
> > 
> > IA64 and MIPS already have preempt_enable() in schedule_tail():
> > 
> > #ifdef __ARCH_WANT_UNLOCKED_CTXSW
> >         /* In this case, finish_task_switch does not reenable preemption */
> >         preempt_enable();
> > #endif
> > 
> > Their initial preemption count is not decremented in finish_lock_switch().
> > 
> > So we are talking about x86, ARM64, etc.
> > 
> > Look at ARM64's finish_arch_post_lock_switch(). It looks like a task
> > must not be preempted between switch_mm() and this function, but in
> > the case of a new task this is possible.
> 
> We had a thread about this at the end of last year:
> 
> https://lkml.org/lkml/2013/11/15/82
> 
> There is indeed a problem on arm64, something like this (and I think
> s390 also needs a fix):
> 
> 1. switch_mm() via check_and_switch_context() defers the actual mm
>    switch by setting TIF_SWITCH_MM
> 2. the context switch is considered 'done' by the kernel before
>    finish_arch_post_lock_switch() and therefore we can be preempted to a
>    new thread before finish_arch_post_lock_switch()
> 3. the new thread has the same mm as the preempted thread, but we
>    actually missed the mm switch in finish_arch_post_lock_switch()
>    because TIF_SWITCH_MM is per thread rather than per mm
>
> > This is the problem I tried to solve. I don't know arm64, and I can't
> > say how serious it is.
> 
> Have you managed to reproduce this? I don't say it doesn't exist, but I
> want to make sure that any patch actually fixes it.
> 
> So we have several possible solutions, one of the first two being suitable for stable:
> 
> 1. Propagate the TIF_SWITCH_MM to the next thread (suggested by Martin)

This is what I put in place for s390, but with the name TIF_TLB_WAIT instead
of TIF_SWITCH_MM. I took the liberty of adding the code to the features branch
of the linux-s390 tree, including the necessary common code change:

https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=09ddfb4d5602095aad04eada8bc8df59e873a6ef
https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=525d65f8f66ac29136ba6d2336f5a73b038701e2

These patches will be included in a please-pull request for the next merge
window.

> 2. Get rid of TIF_SWITCH_MM and use mm_cpumask for tracking (I already
>    have the patch, it just needs a lot more testing)
> 3. Re-write the ASID allocation algorithm to no longer require IPIs and
>    therefore drop finish_arch_post_lock_switch() (this can be done, but
>    it is pretty intrusive for stable)
> 4. Replace finish_arch_post_lock_switch() with finish_mm_switch() as per
>    Martin's patch; I think this would guarantee that it is always called,
>    so we can move the mm switching from switch_mm() to finish_mm_switch()
>    with no need for flags to mark deferred mm switching
> 
> For arm64, we'll most likely go with 2 for stable and move to 3 shortly
> after, with no need for any other deferred mm switching.
> 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

Thread overview: 14+ messages
2014-02-13 15:51 [PATCH] sched/core: Create new task with twice disabled preemption Kirill Tkhai
2014-02-13 16:00 ` Peter Zijlstra
2014-02-13 17:32   ` Kirill Tkhai
2014-02-14 10:52     ` Catalin Marinas
2014-02-14 11:16       ` Kirill Tkhai
2014-02-14 12:21         ` Catalin Marinas
2014-02-14 12:33           ` Kirill Tkhai
2014-02-17  9:37       ` Martin Schwidefsky [this message]
2014-02-17 10:40         ` Catalin Marinas
2014-02-17 12:55           ` Martin Schwidefsky
2014-02-14 12:35 ` Catalin Marinas
2014-02-14 12:44   ` Kirill Tkhai
2014-02-14 15:49     ` Catalin Marinas
2014-02-17 14:43       ` Kirill Tkhai
