public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: "H. Peter Anvin" <hpa@zytor.com>, serebrin@google.com
Cc: mingo@kernel.org, peterz@infradead.org, torvalds@kernel.org,
	linux-kernel@vger.kernel.org, luto@kernel.org, bp@suse.de,
	mgorman@suse.de, tglx@linutronix.de
Subject: Re: [PATCH RFC UGLY] x86,mm,sched: make lazy TLB mode even lazier
Date: Mon, 29 Aug 2016 12:08:52 -0400	[thread overview]
Message-ID: <1472486932.32433.103.camel@redhat.com> (raw)
In-Reply-To: <7E7CF02F-F0B1-493A-98B3-B078174811DA@zytor.com>

[-- Attachment #1: Type: text/plain, Size: 2709 bytes --]

On Thu, 2016-08-25 at 12:42 -0700, H. Peter Anvin wrote:

> > +static void set_lazy_tlbstate_flush(int cpu) {
> > +	if (per_cpu(cpu_tlbstate.state, cpu) == TLBSTATE_LAZY) {
> > +		raw_spin_lock(&cpu_rq(cpu)->lock);
> > +		if (per_cpu(cpu_tlbstate.state, cpu) ==
> > TLBSTATE_LAZY)
> > +			per_cpu(cpu_tlbstate.state, cpu) =
> > TLBSTATE_FLUSH;
> > +		raw_spin_unlock(&cpu_rq(cpu)->lock);
> > +	}
> > +}
> > +
> > 
> Why grabbing a lock instead of cmpxchg?

The second and third version of the patch had cmpxchg,
instead of grabbing the remote CPU's runqueue lock,
but I am no longer convinced it is safe.

At TLB invalidation time, we have this:

        int *tlbstate = &per_cpu(cpu_tlbstate.state, cpu);
        int old;

        switch (*tlbstate) {
        case TLBSTATE_LAZY:
                /*
                 * The CPU is in TLBSTATE_LAZY, which could context switch back
                 * to TLBSTATE_OK, re-using the old TLB state without a flush.
                 * If that happened, send a TLB flush IPI.
                 *
                 * Otherwise, the state is now TLBSTATE_FLUSH, and TLB will
                 * be flushed at the next context switch. Skip the IPI.
                 */ 
                old = cmpxchg(tlbstate, TLBSTATE_LAZY, TLBSTATE_FLUSH);
                return old != TLBSTATE_OK;

At context switch time, we have this:

                int oldstate = this_cpu_read(cpu_tlbstate.state);

                this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
                BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);

                if (oldstate == TLBSTATE_FLUSH ||
                                !cpumask_test_cpu(cpu, mm_cpumask(next))) {

In each case, the read will happen before the write, because
they are to the same address.

If the invalidate and context switch happen concurrently,
the writes can be ordered in two directions:

1) The cmpxchg in the TLB flush code happens after the
this_cpu_write in the context switch code. This is safe.

2) The cmpxchg in the TLB flush code happens before the
this_cpu_write in the context switch code. This is broken.

I can see two ways to fix that:
1) Change the write in the context switch code to a
   cmpxchg. I do not know how expensive this is on
   modern CPUs, or whether the overhead of doing this
   is unacceptable (or even noticeable, considering the
   cache line needs to be acquired for write anyway).
2) Acquire the runqueue lock of the remote CPU from the
   (much rarer?) TLB flush code, in order to ensure it
   does not run concurrently with the context switch
   code.

Any preferences?

-- 

All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

  parent reply	other threads:[~2016-08-29 16:08 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-25 19:04 [PATCH RFC UGLY] x86,mm,sched: make lazy TLB mode even lazier Rik van Riel
2016-08-25 19:42 ` H. Peter Anvin
2016-08-25 19:52   ` Rik van Riel
2016-08-25 20:15   ` [PATCH RFC v2] " Rik van Riel
2016-08-25 21:01   ` [PATCH RFC v3] " Rik van Riel
2016-08-27  8:03     ` Ingo Molnar
2016-08-27 23:02       ` Linus Torvalds
2016-08-30 19:53         ` [PATCH RFC v4] " Rik van Riel
2016-08-30 21:09           ` [PATCH RFC v5] " Rik van Riel
2016-08-31 16:27         ` [PATCH RFC v6] " Rik van Riel
2016-09-08  6:56           ` Ingo Molnar
2016-09-09  0:09             ` Benjamin Serebrin
2016-09-09  4:39               ` Andy Lutomirski
2016-09-09  7:44                 ` Peter Zijlstra
2017-04-25  3:30                   ` Andy Lutomirski
2016-08-29 15:24       ` [PATCH RFC v3] " Rik van Riel
2016-08-29 16:08   ` Rik van Riel [this message]
2016-08-28  8:11 ` [PATCH RFC UGLY] " Andy Lutomirski
2016-08-29 14:54   ` Rik van Riel
2016-08-29 23:55     ` Andy Lutomirski
2016-08-30  1:14       ` H. Peter Anvin
2016-08-30 18:23         ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1472486932.32433.103.camel@redhat.com \
    --to=riel@redhat.com \
    --cc=bp@suse.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=serebrin@google.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox