From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755526AbbIAJky (ORCPT );
	Tue, 1 Sep 2015 05:40:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:56885 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751308AbbIAJkx (ORCPT );
	Tue, 1 Sep 2015 05:40:53 -0400
Date: Tue, 1 Sep 2015 12:40:46 +0300
From: "Michael S. Tsirkin"
To: Ingo Molnar
Cc: "H. Peter Anvin" , linux-kernel@vger.kernel.org,
	Thomas Gleixner , Ingo Molnar , x86@kernel.org,
	Rusty Russell , Linus Torvalds , Uros Bizjak
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit
Message-ID: <20150901094046.GA32498@redhat.com>
References: <1440776707-22016-1-git-send-email-mst@redhat.com>
	<20150831060549.GB7093@gmail.com>
	<0779C35A-141F-4019-942A-CD3F861048A3@zytor.com>
	<20150831105355-mutt-send-email-mst@redhat.com>
	<20150831075947.GA9974@gmail.com>
	<20150831111910.GA24574@redhat.com>
	<20150901092422.GA8088@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150901092422.GA8088@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 01, 2015 at 11:24:22AM +0200, Ingo Molnar wrote:
> 
> * Michael S. Tsirkin wrote:
> 
> > I applied this patch on top of mine:
> 
> Yeah, looks similar to the one I sent.
> > > -static inline int __variable_test_bit(long nr, const unsigned long *addr)
> > > -{
> > > -	int oldbit;
> > > -
> > > -	asm volatile("bt %2,%1\n\t"
> > > -		     "sbb %0,%0"
> > > -		     : "=r" (oldbit)
> > > -		     : "m" (*addr), "Ir" (nr));
> > > -
> > > -	return oldbit;
> > > -}
> 
> > And the code size went up:
> > 
> > 134836    2997    8372  146205   23b1d arch/x86/kvm/kvm-intel.ko ->
> > 134846    2997    8372  146215   23b27 arch/x86/kvm/kvm-intel.ko
> > 
> > 342690   47640     441  390771   5f673 arch/x86/kvm/kvm.ko ->
> > 342738   47640     441  390819   5f6a3 arch/x86/kvm/kvm.ko
> > 
> > I tried removing __always_inline, this had no effect.
> 
> But code size isn't the only factor.
> 
> Uros Bizjak pointed out that the reason GCC does not use the "BT reg,mem"
> instruction is that it's highly suboptimal even on recent microarchitectures;
> Sandy Bridge is listed as having a 10-cycle latency (!) for this instruction:
> 
>   http://www.agner.org/optimize/instruction_tables.pdf
> 
> This instruction has had bad latency going back to Pentium 4 CPUs.
> 
> ... so unless something changed in this area with Skylake, I think using the
> __variable_test_bit() code in the kernel is a bad choice, and looking at
> kernel size only is misleading.
> 
> It makes sense for atomics, but not for unlocked access.
> 
> Thanks,
> 
> 	Ingo

Hmm - so do you take back the ack?

I compared this:

int main(int argc, char **argv)
{
	long long int i;
	const unsigned long addr = 0;

	for (i = 0; i < 1000000000ull; ++i) {
		asm volatile("");
		if (__variable_test_bit(1, &addr))
			asm volatile("");
	}

	return 0;
}

with the __constant_test_bit variant.

__constant_test_bit code does appear to be slower on an i7 processor.

test_bit isn't atomic either. Maybe drop variable_test_bit there too?

-- 
MST