From: Alex Bennée
Date: Wed, 16 Dec 2015 17:14:46 +0000
Message-ID: <87zixafh89.fsf@linaro.org>
Subject: [Qemu-devel] Rationalising exit_request, cpu->exit_request and tcg_exit_req?
To: QEMU Devel, mttcg@listserver.greensocs.com, mark.burton@greensocs.com,
    fred.konrad@greensocs.com, Paolo Bonzini, Richard Henderson,
    Peter Crosthwaite, Peter Maydell

Hi,

While looking at Fred's current MTTCG WIP branch I ran into a problem
where:

 - async_safe_work_pending() was true
 - this triggered setting cpu->exit_request
 - however we never left tcg_exec_all()
 - because the global exit_request wasn't set
 - hence qemu_tcg_wait_io_event() never drained the async work queue

While trying to understand why we have both a per-CPU and a global
exit_request I then discovered there is also cpu->tcg_exit_req, which
is the variable the TCG code actually examines. This leads to
sequences like:

void cpu_exit(CPUState *cpu)
{
    cpu->exit_request = 1;
    /* Ensure cpu_exec will see the exit request after TCG has exited. */
    smp_wmb();
    cpu->tcg_exit_req = 1;
}

which itself is, amusingly, called from:

static void qemu_cpu_kick_no_halt(void)
{
    CPUState *cpu;
    /* Ensure whatever caused the exit has reached the CPU threads
     * before writing exit_request.
     */
    atomic_mb_set(&exit_request, 1);
    cpu = atomic_mb_read(&tcg_current_cpu);
    if (cpu) {
        cpu_exit(cpu);
    }
}

This seems to me to be slightly insane, as we now have three variables
that struggle to be kept in sync. Could all this not be rationalised
into a single variable, or am I missing a subtlety in their different
semantics?

One problem I can think of once we get to the MTTCG world is a race
when signalling other CPUs to exit: we need to make sure that request
is not dropped as we clear an old exit_request. The other complication
is the main cpu_exec loop, which works hard to avoid leaving the main
loop when processing interrupts (which require an exit_request to
trigger). This means there are potentially multiple places where
exit_requests are drained.

I don't know if there is clean-up that can happen in master or if this
all needs to be done in the mttcg work, but would it make sense to
keep just cpu->exit_request, make it visible to the TCG code, and make
all exits fall out to qemu_tcg_cpu_thread_fn, which would be the only
place to clear the flag?

I did have a brief look at the KVM side of the code and it only seems
to reference cpu->exit_request, so I think the rest of this is a TCG
problem.

Thoughts?

--
Alex Bennée
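
P.S. To make the single-flag idea a bit more concrete, here is a rough
standalone sketch (explicitly not QEMU code: the names, the loop shape
and the use of C11 stdatomic/pthreads are purely illustrative) of the
pattern I mean. Any thread may set the one flag, and the CPU thread is
the single consumer, draining it with an atomic exchange so a request
that races with the clear is seen rather than dropped:

/* Standalone illustration only -- not QEMU code. Compile with
 * e.g. gcc -std=c11 -pthread. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

static atomic_bool exit_request;

/* Any thread may request an exit (analogous to cpu_exit()). */
static void request_exit(void)
{
    atomic_store(&exit_request, true);
}

/* The single consumer. atomic_exchange clears the flag and returns
 * its old value in one step, so a request that lands between "run a
 * block of guest code" and the clear is handled on this pass instead
 * of being silently wiped out by a blind store of 0. */
static void *cpu_thread_fn(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* ... execute a block of guest code here ... */
        if (atomic_exchange(&exit_request, false)) {
            printf("draining exit request at iteration %d\n", i);
            /* ... drain async work, handle I/O events ... */
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, cpu_thread_fn, NULL);
    request_exit(); /* e.g. from an I/O or signalling thread */
    pthread_join(t, NULL);
    return 0;
}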