From: Paolo Bonzini
Date: Sun, 23 Aug 2015 19:01:28 -0700
Message-ID: <55DA7AF8.6040006@redhat.com>
In-Reply-To: <1440375847-17603-1-git-send-email-cota@braap.org>
Subject: Re: [Qemu-devel] [RFC 00/38] MTTCG: i386, user+system mode
To: "Emilio G. Cota", qemu-devel@nongnu.org, mttcg@greensocs.com
Cc: guillaume.delbergue@greensocs.com, alex.bennee@linaro.org,
    mark.burton@greensocs.com, a.rigo@virtualopensystems.com,
    Frederic Konrad

On 23/08/2015 17:23, Emilio G. Cota wrote:
> Hi all,
>
> Here is MTTCG code I've been working on out-of-tree for the last few
> months.
>
> The patchset applies on top of pbonzini's mttcg branch, commit ca56de6f.
> Fetch the branch from: https://github.com/bonzini/qemu/commits/mttcg
>
> The highlights of the patchset are as follows:
>
> - The first 5 patches are direct fixes to bugs only in the mttcg
>   branch.

I'm taking this as a review of the first ten patches in the mttcg
thread, so I will send them in a pull request.  I have also squashed
patches 1, 2, 4 and 5 into the relevant mttcg branch patches.

> - Patches 6-12 fix issues in the master branch.

Applied all 7, thanks.  I hope to send the pending work before I leave
for vacation; otherwise I'll just push something to the mttcg branch.

> - The remaining patches are really the meat of this patchset.
>   The main features are:
>
>   * Support of MTTCG for both user and system mode.

Nice. :)

>   * Design: per-CPU TB jump list protected by a seqlock; if the TB is
>     not found there, check the global, RCU-protected 'hash table'
>     (i.e. fixed number of buckets); if it is not there either, grab the
>     lock, check again, and if it is still not there, generate the code
>     and add the TB to the hash table.
>
>     It makes sense that Paolo's recent work on the mttcg branch ended
>     up being almost identical to this--it's simple and it scales well.

To be honest it's really Fred's: I just extracted his patch and
modified it to avoid using "safe work" when invalidating TBs.  I think
I didn't need a seqlock, but it's possible that my branch has a race.

>   * tb_lock must be held every time code is generated.  The rationale
>     is that most of the time QEMU is executing code, not generating it.

Makes sense; this is really the same as Fred's work.

>   * tb_flush: do it once all other CPUs have been put to sleep by
>     calling rcu_synchronize().  We also instrument tb_lock to make sure
>     that only one tb_flush request can happen at a given time.

What do you think about just protecting code_gen_buffer with RCU?
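Something like the following is what I have in mind.  This is only an
untested sketch: TCGCodeBuffer, tcg_code_buffer and the code_buffer_*
functions are made-up names, g_malloc0 stands in for the real mmap'ed
executable mapping, and only the RCU primitives (atomic_rcu_read,
atomic_rcu_set, call_rcu) are the actual ones from qemu/rcu.h and
qemu/atomic.h:

/* Sketch only: TCGCodeBuffer and code_buffer_* are invented names; the
 * real code_gen_buffer is an mmap'ed executable region, g_malloc0 is
 * just a placeholder here.
 */
#include <glib.h>
#include "qemu/rcu.h"
#include "qemu/atomic.h"

typedef struct TCGCodeBuffer {
    struct rcu_head rcu;    /* must stay first for call_rcu() */
    size_t size;
    void *ptr;              /* the translated-code region itself */
} TCGCodeBuffer;

static TCGCodeBuffer *tcg_code_buffer;   /* RCU-protected pointer */

/* Readers (translation and execution) run between rcu_read_lock() and
 * rcu_read_unlock() and fetch the buffer with atomic_rcu_read().
 */
static void *code_buffer_get(void)
{
    return atomic_rcu_read(&tcg_code_buffer)->ptr;
}

static void code_buffer_reclaim(TCGCodeBuffer *old)
{
    /* No CPU can still be translating or executing out of it. */
    g_free(old->ptr);
    g_free(old);
}

/* Called with tb_lock held instead of stopping all CPUs: publish a
 * fresh buffer and reclaim the old one after a grace period.
 */
static void code_buffer_flush(size_t size)
{
    TCGCodeBuffer *new_buf = g_new0(TCGCodeBuffer, 1);
    TCGCodeBuffer *old_buf = tcg_code_buffer;

    new_buf->size = size;
    new_buf->ptr = g_malloc0(size);
    atomic_rcu_set(&tcg_code_buffer, new_buf);
    call_rcu(old_buf, code_buffer_reclaim, rcu);
}

As long as execution stays inside an RCU read-side critical section
while it uses the buffer, a flush only has to publish the new buffer
and let the grace period reclaim the old one, instead of forcing every
CPU to exit.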
>     For this a mechanism to schedule work is added to supersede
>     cpu_sched_safe_work, which cannot work in usermode.  Here I've
>     toyed with an alternative version that doesn't force the flushing
>     CPU to exit, but in order to make this work we have to save/restore
>     the RCU read lock while tb_lock is held in order to avoid
>     deadlocks.  This isn't too pretty but it's good to know that the
>     option is there.
>
>   * I focused on x86 since it is a complex ISA and we support many
>     cores via -smp.  I work on a 64-core machine so concurrency bugs
>     show up relatively easily.
>
>     Atomics are modeled using spinlocks, i.e. one host lock per guest
>     cache line.  Note that spinlocks are way better than mutexes for
>     this--perf on 64 cores is 2x with spinlocks on highly concurrent
>     workloads (synchrobench, see below).
>
>   + Works unchanged for both system and user modes.  As far as I can
>     tell the TLB-based approach that Alvise is working on couldn't be
>     used without the TLB--correct me if I'm wrong, it's been quite some
>     time since I looked at that work.

Yes, that's correct.

Paolo
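P.S. Just to check that I am reading the "one host lock per guest cache
line" part correctly, is it roughly the scheme below?  This is only my
own illustration, not your code: the lock-table size, the hashing and
all the names are invented, and I used pthread spinlocks where your
series rolls its own.

#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE_BITS   6                /* 64-byte guest cache lines */
#define ATOMIC_LOCK_COUNT (1 << 14)        /* power of two, arbitrary */

/* One host spinlock per guest cache line, aliased through a fixed table. */
static pthread_spinlock_t atomic_locks[ATOMIC_LOCK_COUNT];

static void atomic_locks_init(void)
{
    for (int i = 0; i < ATOMIC_LOCK_COUNT; i++) {
        pthread_spin_init(&atomic_locks[i], PTHREAD_PROCESS_PRIVATE);
    }
}

static pthread_spinlock_t *addr_to_lock(uint64_t guest_addr)
{
    uint64_t line = guest_addr >> CACHE_LINE_BITS;
    return &atomic_locks[line & (ATOMIC_LOCK_COUNT - 1)];
}

/* A guest locked RMW, e.g. x86 "lock cmpxchg", would then become: */
static uint32_t guest_atomic_cmpxchgl(uint32_t *haddr, uint64_t guest_addr,
                                      uint32_t old, uint32_t new)
{
    pthread_spinlock_t *lock = addr_to_lock(guest_addr);
    uint32_t prev;

    pthread_spin_lock(lock);
    prev = *haddr;
    if (prev == old) {
        *haddr = new;
    }
    pthread_spin_unlock(lock);
    return prev;
}

With a fixed-size table like this, distinct guest cache lines can of
course alias to the same lock, which should be harmless for correctness.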