From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Bennée
Date: Tue, 22 Mar 2016 11:59:02 +0000
Subject: Re: [Qemu-devel] [RFC v1 01/11] tcg: move tb_find_fast outside the tb_lock critical section
To: "Emilio G. Cota"
Cc: mttcg@listserver.greensocs.com, Peter Maydell, Peter Crosthwaite,
 Mark Burton, Alvise Rigo, QEMU Developers, Sergey Fedorov,
 Paolo Bonzini, KONRAD Frédéric, Andreas Färber, Richard Henderson
Message-ID: <87h9fysozd.fsf@linaro.org>
In-reply-to: <20160321235950.GA9356@flamenco>
References: <1458317932-1875-1-git-send-email-alex.bennee@linaro.org>
 <1458317932-1875-2-git-send-email-alex.bennee@linaro.org>
 <20160321215039.GA2466@flamenco>
 <20160321235950.GA9356@flamenco>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Emilio G. Cota writes:

> On Mon, Mar 21, 2016 at 22:08:06 +0000, Peter Maydell wrote:
>> It is not _necessary_, but it is a performance optimization to
>> speed up the "missed in the TLB" case. (A TLB flush will wipe
>> the tb_jmp_cache table.)
>> From the thread where the move-to-front-of-list
>> behaviour was added in 2010, benefits cited:
>
> (snip)
>
>> I think what's happening here is that for guest CPUs where TLB
>> invalidation happens fairly frequently (notably ARM, because
>> we don't model ASIDs in the QEMU TLB and thus have to flush
>> the TLB on any context switch) the case of "we didn't hit in
>> the TLB but we do have this TB and it was used really recently"
>> happens often enough to make it worthwhile for the
>> tb_find_physical() code to keep its hash buckets in LRU order.
>>
>> Obviously that's all five-year-old data now, so a pinch of
>> salt may be indicated, but I'd rather we didn't just remove
>> the optimisation without some benchmarking to check that it's
>> not significant. A 2x difference is huge.
>
> Good point. Most of my tests have been on x86-on-x86, and the
> difference there (for many CPU-intensive benchmarks such as SPEC)
> was negligible.
>
> Just tested the current master booting Alex's Debian ARM image,
> without LRU, and I see a 20% increase in boot time.

Also see:

  https://github.com/stsquad/kvm-unit-tests/tree/mttcg/current-tests-v5

  ./run-tests.sh -g tcg -t

The tcg tests are designed to exercise the TB find and linking logic.
The computed and paged variants of the test always exit the run loop
to look up the next TB. Granted, the tests are pathological cases, but
they are useful for comparing different approaches at the edge cases.

> I'll add per-bucket locks to keep the same behaviour without hurting
> scalability.
>
> Thanks,
>
> 		Emilio

--
Alex Bennée