From: Sergey Fedorov
Subject: Re: [Qemu-devel] [RFC v1 07/12] cpus: introduce async_safe_run_on_cpu.
Date: Mon, 6 Jun 2016 12:38:16 +0300
Message-ID: <57554488.1070408@gmail.com>
In-Reply-To: <87k2i2ituo.fsf@linaro.org>
References: <87k2i2ituo.fsf@linaro.org>
To: Alex Bennée
Cc: mttcg@listserver.greensocs.com, fred.konrad@greensocs.com,
 a.rigo@virtualopensystems.com, cota@braap.org, qemu-devel@nongnu.org,
 mark.burton@greensocs.com, pbonzini@redhat.com, jan.kiszka@siemens.com,
 rth@twiddle.net, peter.maydell@linaro.org, claudio.fontana@huawei.com,
 Peter Crosthwaite, Andreas Färber

On 06/06/16 11:50, Alex Bennée wrote:
> Sergey Fedorov writes:
>
>> On 15/04/16 17:23, Alex Bennée wrote:
>>> diff --git a/cpus.c b/cpus.c
>>> index 9177161..860e2a9 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -928,6 +928,19 @@ static QemuCond qemu_cpu_cond;
>>>  static QemuCond qemu_pause_cond;
>>>  static QemuCond qemu_work_cond;
>>>
>>> +/* safe work */
>>> +static int safe_work_pending;
>>> +static int tcg_scheduled_cpus;
>>> +
>>> +typedef struct {
>>> +    CPUState *cpu;         /* CPU affected */
>> Do we really need to pass CPU state to the safe work? Async safe work
>> seems to be supposed to handle cross-CPU tasks rather than CPU-specific
>> ones. TB flush, for example, doesn't really require a CPU to be passed
>> to the async safe work handler: we should be able to check for code
>> buffer overflow before scheduling the job. Because of this, it seems
>> reasonable to rename async_safe_run_on_cpu() to something like
>> request_async_cpu_safe_work().
> CPUState is common enough to be passed around and this is a shared array
> for all queued work. I wanted to ensure the second most common operation
> of CPUState + one extra bit of information was covered without having to
> resort to memory allocation.
>
> The previous iterations of this work had allocations for structures and
> the linked list, and I was working hard to make the common case
> allocation free.

The point was that this mechanism is supposed to be used for operations
requiring all CPUs to be halted, so there should be little chance it
would need a specific CPUState to be passed.
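Just to illustrate what I mean, something along these lines (a sketch
only; the helper name and callback type below are made up, not existing
API):

/* Hypothetical interface: the caller does not name any particular vCPU,
 * it only asks for the callback to run once every vCPU has been brought
 * to a quiescent point. */
typedef void (*cpu_safe_work_func)(void *data);

void request_async_cpu_safe_work(cpu_safe_work_func func, void *data);

/* Example caller: deferring a full TB flush when the code buffer is
 * close to overflowing.  do_tb_flush() is a made-up helper that would
 * do the actual flush, knowing that no vCPU is executing translated
 * code at that point. */
static void do_tb_flush(void *data)
{
    /* ... reset the code buffer and flush all TBs here ... */
}

static void request_tb_flush(void)
{
    request_async_cpu_safe_work(do_tb_flush, NULL);
}

The work item itself would then only need to carry the callback and the
data pointer.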
>
>>> +    run_on_cpu_func func;  /* Helper function */
>>> +    void *data;            /* Helper data */
>>> +} qemu_safe_work_item;
>>> +
>>> +static GArray *safe_work;  /* array of qemu_safe_work_items */
>> I think we shouldn't bother with "static" memory allocation for the
>> list of work items. We should be fine with dynamic allocation in
>> async_safe_run_on_cpu().
> Why? Under heavy load these add up fast, some TLB flush operations can
> queue CPU x ASID deferred operations. It's not exactly a large array to
> allocate at the start. We could make the initial size smaller if you
> want.

We are going to use async_run_on_cpu() for TLB invalidation, aren't we?

>
>> Halting all the CPUs doesn't seem to be cheap
>> anyhow, thus it shouldn't be used frequently. The only use so far is to
>> make tb_flush() safe. If we look at how tb_flush() is used we can see
>> that this is either code buffer exhaustion or some rare operation which
>> needs all the previous translations to be flushed. The former shouldn't
>> happen often, the latter isn't supposed to be cheap and fast anyway.
> The various flushes are more common under ARMv8, we are not just talking
> about the big tb_flush using this mechanism.

Which of them would really require a quiescent state? I suppose all the
frequent operations are to be handled in a per-CPU manner and don't
require this expensive "halt all the CPUs" mechanism.

>
>>
>>> +static QemuMutex safe_work_mutex;
>>> +
>>>  void qemu_init_cpu_loop(void)
>>>  {
>>>      qemu_init_sigbus();
(snip)
>>> @@ -997,6 +1013,81 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>>>      qemu_cpu_kick(cpu);
>>>  }
>>>
>>> +/*
>>> + * Safe work interface
>>> + *
>>> + * Safe work is defined as work that requires the system to be
>>> + * quiescent before making changes.
>>> + */
>>> +
>>> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>>> +{
>>> +    CPUState *iter;
>>> +    qemu_safe_work_item wi;
>>> +    wi.cpu = cpu;
>>> +    wi.func = func;
>>> +    wi.data = data;
>>> +
>>> +    qemu_mutex_lock(&safe_work_mutex);
>>> +    g_array_append_val(safe_work, wi);
>>> +    atomic_inc(&safe_work_pending);
>>> +    qemu_mutex_unlock(&safe_work_mutex);
>>> +
>>> +    /* Signal all vCPUs to halt */
>>> +    CPU_FOREACH(iter) {
>>> +        qemu_cpu_kick(iter);
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * flush_queued_safe_work:
>>> + *
>>> + * @scheduled_cpu_count
>>> + *
>>> + * If not 0 will signal the other vCPUs and sleep. The last vCPU to
>>> + * get to the function then drains the queue while the system is in a
>>> + * quiescent state. This allows the operations to change shared
>>> + * structures.
>>> + *
>>> + * @see async_run_safe_work_on_cpu
>>> + */
>>> +static void flush_queued_safe_work(int scheduled_cpu_count)
>>> +{
>>> +    qemu_safe_work_item *wi;
>>> +    int i;
>>> +
>>> +    /* bail out if there is nothing to do */
>>> +    if (!async_safe_work_pending()) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (scheduled_cpu_count) {
>>> +
>>> +        /* Nothing to do but sleep */
>>> +        qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
>> We'll not be able to extend this to user-mode emulation because we
>> don't have BQL there. I think we should consider something like
>> exclusive_idle()/start_exclusive()/end_exclusive()/cpu_exec_start()/cpu_exec_end().
> This is true, sacrificed on the altar of making things simpler. It would
> be nice if we had a single simple dispatch point for linux-user as well,
> so we could make things sane there too.
>
> I'll have another look at the exclusive code although at first blush it
> seemed a little more complex because it didn't have the luxury of
> knowing how many threads are running.

Actually, it does know that, making use of the user-mode-only
'CPUState::running' flag and the global counter 'pending_cpus'. There is
a rough sketch of how I understand that machinery after my signature.

Kind regards,
Sergey
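P.S. This is roughly how I remember the linux-user exclusive-region
machinery working. It is a simplified, self-contained sketch using plain
pthreads rather than the actual QEMU code, so the structure layout,
locking details and the missing "kick" of running vCPUs are illustrative
only; the point is how 'pending_cpus' tells start_exclusive() how many
threads it still has to wait for:

#include <pthread.h>
#include <stdbool.h>

/* Simplified stand-ins for the QEMU structures; illustrative only. */
typedef struct CPU {
    bool running;               /* set while the thread runs guest code */
    struct CPU *next;
} CPU;

static CPU *cpu_list;           /* list of all vCPU threads */
static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
static int pending_cpus;        /* 0 = no exclusive section pending */

/* Wait (with exclusive_lock held) for any pending exclusive section. */
static void exclusive_idle(void)
{
    while (pending_cpus) {
        pthread_cond_wait(&exclusive_resume, &exclusive_lock);
    }
}

/* Stop all running vCPUs and wait until they have really stopped. */
static void start_exclusive(void)
{
    CPU *cpu;

    pthread_mutex_lock(&exclusive_lock);
    exclusive_idle();

    pending_cpus = 1;           /* count ourselves */
    for (cpu = cpu_list; cpu; cpu = cpu->next) {
        if (cpu->running) {
            pending_cpus++;
            /* in QEMU this is where the vCPU would be kicked out of
             * its execution loop */
        }
    }
    while (pending_cpus > 1) {
        pthread_cond_wait(&exclusive_cond, &exclusive_lock);
    }
    /* here the caller can safely touch shared state, e.g. flush TBs */
}

/* Resume the other vCPUs. */
static void end_exclusive(void)
{
    pending_cpus = 0;
    pthread_cond_broadcast(&exclusive_resume);
    pthread_mutex_unlock(&exclusive_lock);
}

/* Called by a vCPU thread before and after executing guest code. */
static void cpu_exec_start(CPU *cpu)
{
    pthread_mutex_lock(&exclusive_lock);
    exclusive_idle();           /* don't start during an exclusive section */
    cpu->running = true;
    pthread_mutex_unlock(&exclusive_lock);
}

static void cpu_exec_end(CPU *cpu)
{
    pthread_mutex_lock(&exclusive_lock);
    cpu->running = false;
    if (pending_cpus > 1) {
        pending_cpus--;
        if (pending_cpus == 1) {
            pthread_cond_signal(&exclusive_cond);
        }
    }
    exclusive_idle();
    pthread_mutex_unlock(&exclusive_lock);
}

In other words, 'pending_cpus' plays roughly the role that your
tcg_scheduled_cpus/scheduled_cpu_count pair plays on the system-mode
side.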