From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>,
	Peter Zijlstra <peterz@infradead.org>,
	Kirill Tkhai <ktkhai@parallels.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH] sched/core: Create new task with twice disabled preemption
Date: Mon, 17 Feb 2014 10:37:38 +0100
Message-ID: <20140217103738.7369d84b@mschwide>
In-Reply-To: <20140214105255.GA10596@arm.com>

On Fri, 14 Feb 2014 10:52:55 +0000
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Thu, Feb 13, 2014 at 09:32:22PM +0400, Kirill Tkhai wrote:
> > On 13.02.2014 20:00, Peter Zijlstra wrote:
> > > On Thu, Feb 13, 2014 at 07:51:56PM +0400, Kirill Tkhai wrote:
> > >> For archs without __ARCH_WANT_UNLOCKED_CTXSW set this means
> > >> that all newly created tasks execute finish_arch_post_lock_switch()
> > >> and post_schedule() with preemption enabled.
> > > 
> > > That's IA64 and MIPS; do they have a 'good' reason to use this?
> > 
> > It seems my description misled the reader; I'm sorry if so.
> > 
> > I mean all architectures *except* IA64 and MIPS, i.e. all that
> > do not define __ARCH_WANT_UNLOCKED_CTXSW.
> > 
> > IA64 and MIPS already have preempt_enable() in schedule_tail():
> > 
> > #ifdef __ARCH_WANT_UNLOCKED_CTXSW
> >         /* In this case, finish_task_switch does not reenable preemption */
> >         preempt_enable();
> > #endif
> > 
> > Their initial preemption is not decremented in finish_lock_switch().
> > 
> > So we are talking about x86, ARM64, etc.
> > 
> > Look at ARM64's finish_arch_post_lock_switch(). It looks like a task
> > must not be preempted between switch_mm() and this function.
> > But for a new task this is possible.
> 
> We had a thread about this at the end of last year:
> 
> https://lkml.org/lkml/2013/11/15/82
> 
> There is indeed a problem on arm64, something like this (and I think
> s390 also needs a fix):
> 
> 1. switch_mm() via check_and_switch_context() defers the actual mm
>    switch by setting TIF_SWITCH_MM
> 2. the context switch is considered 'done' by the kernel before
>    finish_arch_post_lock_switch() and therefore we can be preempted to a
>    new thread before finish_arch_post_lock_switch()
> 3. The new thread has the same mm as the preempted thread but we
>    actually missed the mm switching in finish_arch_post_lock_switch()
>    because TIF_SWITCH_MM is per thread rather than mm
>
> > This is the problem I tried to solve. I don't know arm64, and I can't
> > say how serious it is.
> 
> Have you managed to reproduce this? I don't say it doesn't exist, but I
> want to make sure that any patch actually fixes it.
> 
> So we have several possible solutions, one of the first two suitable
> for stable:
> 
> 1. Propagate the TIF_SWITCH_MM to the next thread (suggested by Martin)

This is what I put in place for s390, but with the name TIF_TLB_WAIT instead
of TIF_SWITCH_MM. I took the liberty of adding the code to the features
branch of the linux-s390 tree, including the necessary common code change:

https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=09ddfb4d5602095aad04eada8bc8df59e873a6ef
https://git.kernel.org/cgit/linux/kernel/git/s390/linux.git/commit/?h=features&id=525d65f8f66ac29136ba6d2336f5a73b038701e2

These patches will be included in a please-pull request in the next merge
window.

> 2. Get rid of TIF_SWITCH_MM and use mm_cpumask for tracking (I already
>    have the patch, it just needs a lot more testing)
> 3. Re-write the ASID allocation algorithm to no longer require IPIs and
>    therefore drop finish_arch_post_lock_switch() (this can be done, but
>    it is pretty intrusive for stable)
> 4. Replace finish_arch_post_lock_switch() with finish_mm_switch() as per
>    Martin's patch. I think this would guarantee that it is always called,
>    so we can move the mm switching from switch_mm() to finish_mm_switch()
>    with no need for flags to mark deferred mm switching
> 
> For arm64, we'll most likely go with 2 for stable and move to 3 shortly
> after, no need for other deferred mm switching.
> 


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

