Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	serge.fdrv@gmail.com, bobby.prani@gmail.com, rth@twiddle.net,
	mark.burton@greensocs.com, pbonzini@redhat.com,
	jan.kiszka@siemens.com, peter.maydell@linaro.org,
	claudio.fontana@huawei.com,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
Date: Tue, 05 Jul 2016 12:14:22 +0100	[thread overview]
Message-ID: <87inwkmj5d.fsf@linaro.org> (raw)
In-Reply-To: <20160704223001.GA29624@flamenco>


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Jul 04, 2016 at 12:45:52 +0100, Alex Bennée wrote:
>>
>> Emilio G. Cota <cota@braap.org> writes:
>>
>> > On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
>> >> Lock contention in the hot path of moving between existing patched
>> >> TranslationBlocks is the main drag in multithreaded performance. This
>> >> patch pushes the tb_lock() usage down to the two places that really need
>> >> it:
>> >>
>> >>   - code generation (tb_gen_code)
>> >>   - jump patching (tb_add_jump)
>> >>
>> >> The rest of the code doesn't really need to hold a lock as it is either
>> >> using per-CPU structures, atomically updated or designed to be used in
>> >> concurrent read situations (qht_lookup).
>> >>
>> >> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
>> >> locks become NOPs anyway until the MTTCG work is completed.
>> >
>> > From a scalability point of view it would be better to have a single
>> > critical section.
>>
>> You mean merge the critical region for patching and code-generation?
>
> Yes, I'd keep the lock held and drop it (if it was held) after the patching
> is done, like IIRC we used to do:
> (snip)
>> > I propose to just extend the critical section, like we used to
>> > do with tcg_lock_reset.

Hmm, so I came up with this:

/*
 * Patch the last TB with a jump to the current TB.
 *
 * Modification of the TB has to be protected with tb_lock which can
 * either be already held or taken here.
 */
static inline void maybe_patch_last_tb(CPUState *cpu,
                                       TranslationBlock *tb,
                                       TranslationBlock **last_tb,
                                       int tb_exit,
                                       bool locked)
{
    if (cpu->tb_flushed) {
        /* Ensure that no TB jump will be modified as the
         * translation buffer has been flushed.
         */
        *last_tb = NULL;
        cpu->tb_flushed = false;
    }
#ifndef CONFIG_USER_ONLY
    /* We don't take care of direct jumps when address mapping changes in
     * system emulation. So it's not safe to make a direct jump to a TB
     * spanning two pages because the mapping for the second page can change.
     */
    if (tb->page_addr[1] != -1) {
        *last_tb = NULL;
    }
#endif
    /* See if we can patch the calling TB. */
    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
        if (!locked) {
            tb_lock();
        }
        tb_add_jump(*last_tb, tb_exit, tb);
        if (!locked) {
            tb_unlock();
        }
    }
}

/*
 * tb_find - find next TB, possibly generating it
 *
 * There is a multi-level lookup for finding the next TB which avoids
 * locks unless generation is required.
 *
 * 1. Lookup via the per-vcpu tb_jmp_cache
 * 2. Lookup via tb_find_physical (using QHT)
 *
 * If both of those fail then we need to grab the mmap_lock and
 * tb_lock and do code generation.
 *
 * As the jump patching of code also needs to be protected by locks we
 * have multiple paths into maybe_patch_last_tb taking advantage of
 * the fact we may already have locks held for code generation.
 */
static TranslationBlock *tb_find(CPUState *cpu,
                                 TranslationBlock **last_tb,
                                 int tb_exit)
{
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    unsigned int h;
    uint32_t flags;

    /* we record a subset of the CPU state. It will
       always be the same before a given translated block
       is executed. */
    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
    h = tb_jmp_cache_hash_func(pc);
    tb = atomic_read(&cpu->tb_jmp_cache[h]);

    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                 tb->flags != flags)) {

        /* Ensure that we won't find a TB in the shared hash table
         * if it is being invalidated by some other thread.
         * Otherwise we'd put it back to CPU's local cache.
         * Pairs with smp_wmb() in tb_phys_invalidate(). */
        smp_rmb();
        tb = tb_find_physical(cpu, pc, cs_base, flags);

        if (!tb) {
            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
             * taken outside tb_lock. As system emulation is currently
             * single threaded the locks are NOPs.
             */
            mmap_lock();
            tb_lock();

            /* There's a chance that our desired tb has been translated while
             * taking the locks so we check again inside the lock.
             */
            tb = tb_find_physical(cpu, pc, cs_base, flags);
            if (!tb) {
                /* if no translated code available, then translate it now */
                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
            }
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, true);

            tb_unlock();
            mmap_unlock();
        } else {
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
        }

        /* We update the TB in the virtual pc hash table for the fast lookup */
        atomic_set(&cpu->tb_jmp_cache[h], tb);
    } else {
        maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
    }

    return tb;
}

But it doesn't seem to make much difference to the microbenchmark and I
think makes the code a lot trickier to follow.

Is it worth it?

--
Alex Bennée

next prev parent reply	other threads:[~2016-07-05 11:14 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
2016-07-01 23:14   ` Richard Henderson
2016-07-02  0:17   ` Emilio G. Cota
2016-07-02  0:32     ` Richard Henderson
2016-07-04 22:33       ` Emilio G. Cota
2016-07-02  7:09     ` Alex Bennée
2016-07-04 22:31       ` Emilio G. Cota
2016-07-05 12:49         ` Paolo Bonzini
2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
2016-07-01 23:19   ` Richard Henderson
2016-07-02  0:39   ` Emilio G. Cota
2016-07-04 11:45     ` Alex Bennée
2016-07-04 22:30       ` Emilio G. Cota
2016-07-05 11:14         ` Alex Bennée [this message]
2016-07-05 12:47           ` Paolo Bonzini
2016-07-05 13:11             ` Alex Bennée
2016-07-05 13:42               ` Paolo Bonzini
2016-07-05 15:34                 ` Sergey Fedorov
2016-07-05 16:00                   ` Alex Bennée
2016-07-05 16:04                     ` Sergey Fedorov
2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
2016-07-02  7:08   ` Alex Bennée
2016-07-02 16:03     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87inwkmj5d.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=a.rigo@virtualopensystems.com \
    --cc=bobby.prani@gmail.com \
    --cc=claudio.fontana@huawei.com \
    --cc=cota@braap.org \
    --cc=crosthwaite.peter@gmail.com \
    --cc=fred.konrad@greensocs.com \
    --cc=jan.kiszka@siemens.com \
    --cc=mark.burton@greensocs.com \
    --cc=mttcg@listserver.greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).