Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	serge.fdrv@gmail.com, bobby.prani@gmail.com, rth@twiddle.net,
	mark.burton@greensocs.com, pbonzini@redhat.com,
	jan.kiszka@siemens.com, peter.maydell@linaro.org,
	claudio.fontana@huawei.com,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path
Date: Tue, 05 Jul 2016 12:14:22 +0100	[thread overview]
Message-ID: <87inwkmj5d.fsf@linaro.org> (raw)
In-Reply-To: <20160704223001.GA29624@flamenco>


Emilio G. Cota <cota@braap.org> writes:

> On Mon, Jul 04, 2016 at 12:45:52 +0100, Alex Bennée wrote:
>>
>> Emilio G. Cota <cota@braap.org> writes:
>>
>> > On Fri, Jul 01, 2016 at 17:16:10 +0100, Alex Bennée wrote:
>> >> Lock contention in the hot path of moving between existing patched
>> >> TranslationBlocks is the main drag in multithreaded performance. This
>> >> patch pushes the tb_lock() usage down to the two places that really need
>> >> it:
>> >>
>> >>   - code generation (tb_gen_code)
>> >>   - jump patching (tb_add_jump)
>> >>
>> >> The rest of the code doesn't really need to hold a lock as it is either
>> >> using per-CPU structures, atomically updated or designed to be used in
>> >> concurrent read situations (qht_lookup).
>> >>
>> >> To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
>> >> locks become NOPs anyway until the MTTCG work is completed.
>> >
>> > From a scalability point of view it would be better to have a single
>> > critical section.
>>
>> You mean merge the critical region for patching and code-generation?
>
> Yes, I'd keep the lock held and drop it (if it was held) after the patching
> is done, like IIRC we used to do:
> (snip)
>> > I propose to just extend the critical section, like we used to
>> > do with tcg_lock_reset.

Hmm, so I came up with this:

/*
 * Patch the last TB with a jump to the current TB.
 *
 * Modification of the TB has to be protected with tb_lock which can
 * either be already held or taken here.
 */
static inline void maybe_patch_last_tb(CPUState *cpu,
                                       TranslationBlock *tb,
                                       TranslationBlock **last_tb,
                                       int tb_exit,
                                       bool locked)
{
    if (cpu->tb_flushed) {
        /* Ensure that no TB jump will be modified as the
         * translation buffer has been flushed.
         */
        *last_tb = NULL;
        cpu->tb_flushed = false;
    }
#ifndef CONFIG_USER_ONLY
    /* We don't take care of direct jumps when address mapping changes in
     * system emulation. So it's not safe to make a direct jump to a TB
     * spanning two pages because the mapping for the second page can change.
     */
    if (tb->page_addr[1] != -1) {
        *last_tb = NULL;
    }
#endif
    /* See if we can patch the calling TB. */
    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
        if (!locked) {
            tb_lock();
        }
        tb_add_jump(*last_tb, tb_exit, tb);
        if (!locked) {
            tb_unlock();
        }
    }
}

/*
 * tb_find - find next TB, possibly generating it
 *
 * There is a multi-level lookup for finding the next TB which avoids
 * locks unless generation is required.
 *
 * 1. Lookup via the per-vcpu tb_jmp_cache
 * 2. Lookup via tb_find_physical (using QHT)
 *
 * If both of those fail then we need to grab the mmap_lock and
 * tb_lock and do code generation.
 *
 * As the jump patching of code also needs to be protected by locks we
 * have multiple paths into maybe_patch_last_tb taking advantage of
 * the fact we may already have locks held for code generation.
 */
static TranslationBlock *tb_find(CPUState *cpu,
                                 TranslationBlock **last_tb,
                                 int tb_exit)
{
    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
    TranslationBlock *tb;
    target_ulong cs_base, pc;
    unsigned int h;
    uint32_t flags;

    /* we record a subset of the CPU state. It will
       always be the same before a given translated block
       is executed. */
    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
    h = tb_jmp_cache_hash_func(pc);
    tb = atomic_read(&cpu->tb_jmp_cache[h]);

    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                 tb->flags != flags)) {

        /* Ensure that we won't find a TB in the shared hash table
         * if it is being invalidated by some other thread.
         * Otherwise we'd put it back to CPU's local cache.
         * Pairs with smp_wmb() in tb_phys_invalidate(). */
        smp_rmb();
        tb = tb_find_physical(cpu, pc, cs_base, flags);

        if (!tb) {
            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
             * taken outside tb_lock. As system emulation is currently
             * single threaded the locks are NOPs.
             */
            mmap_lock();
            tb_lock();

            /* There's a chance that our desired tb has been translated while
             * taking the locks so we check again inside the lock.
             */
            tb = tb_find_physical(cpu, pc, cs_base, flags);
            if (!tb) {
                /* if no translated code available, then translate it now */
                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
            }
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, true);

            tb_unlock();
            mmap_unlock();
        } else {
            maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
        }

        /* We update the TB in the virtual pc hash table for the fast lookup */
        atomic_set(&cpu->tb_jmp_cache[h], tb);
    } else {
        maybe_patch_last_tb(cpu, tb, last_tb, tb_exit, false);
    }

    return tb;
}

But it doesn't seem to make much difference to the microbenchmark and I
think makes the code a lot trickier to follow.

Is it worth it?

--
Alex Bennée

next prev parent reply	other threads:[~2016-07-05 11:14 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-01 16:16 [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Alex Bennée
2016-07-01 16:16 ` [Qemu-devel] [PATCH 1/2] tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock' Alex Bennée
2016-07-01 23:14   ` Richard Henderson
2016-07-02  0:17   ` Emilio G. Cota
2016-07-02  0:32     ` Richard Henderson
2016-07-04 22:33       ` Emilio G. Cota
2016-07-02  7:09     ` Alex Bennée
2016-07-04 22:31       ` Emilio G. Cota
2016-07-05 12:49         ` Paolo Bonzini
2016-07-01 16:16 ` [Qemu-devel] [PATCH 2/2] cpu-exec: remove tb_lock from the hot-path Alex Bennée
2016-07-01 23:19   ` Richard Henderson
2016-07-02  0:39   ` Emilio G. Cota
2016-07-04 11:45     ` Alex Bennée
2016-07-04 22:30       ` Emilio G. Cota
2016-07-05 11:14         ` Alex Bennée [this message]
2016-07-05 12:47           ` Paolo Bonzini
2016-07-05 13:11             ` Alex Bennée
2016-07-05 13:42               ` Paolo Bonzini
2016-07-05 15:34                 ` Sergey Fedorov
2016-07-05 16:00                   ` Alex Bennée
2016-07-05 16:04                     ` Sergey Fedorov
2016-07-02  0:52 ` [Qemu-devel] [PATCH 0/2] Reduce lock contention on TCG hot-path Emilio G. Cota
2016-07-02  7:08   ` Alex Bennée
2016-07-02 16:03     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87inwkmj5d.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=a.rigo@virtualopensystems.com \
    --cc=bobby.prani@gmail.com \
    --cc=claudio.fontana@huawei.com \
    --cc=cota@braap.org \
    --cc=crosthwaite.peter@gmail.com \
    --cc=fred.konrad@greensocs.com \
    --cc=jan.kiszka@siemens.com \
    --cc=mark.burton@greensocs.com \
    --cc=mttcg@listserver.greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.