Message-ID: <55E63964.6000404@zytor.com>
Date: Tue, 01 Sep 2015 16:48:52 -0700
From: "H. Peter Anvin"
To: "Michael S. Tsirkin", Ingo Molnar
CC: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
 x86@kernel.org, Rusty Russell, Linus Torvalds, Uros Bizjak
Subject: Re: [PATCH 1/2] x86/bitops: implement __test_bit
References: <1440776707-22016-1-git-send-email-mst@redhat.com>
 <20150831060549.GB7093@gmail.com>
 <0779C35A-141F-4019-942A-CD3F861048A3@zytor.com>
 <20150831105355-mutt-send-email-mst@redhat.com>
 <20150831075947.GA9974@gmail.com>
 <20150831111910.GA24574@redhat.com>
 <20150901092422.GA8088@gmail.com>
 <20150901094046.GA32498@redhat.com>
 <20150901113942.GB11161@gmail.com>
 <20150901155824-mutt-send-email-mst@redhat.com>
In-Reply-To: <20150901155824-mutt-send-email-mst@redhat.com>

On 09/01/2015 08:03 AM, Michael S. Tsirkin wrote:
>>>
>>> Hmm - so do you take back the ack?
>>
>> I have no strong feelings either way, it simply strikes me as misguided to
>> explicitly optimize for something that is listed as a high overhead instruction.
>>
>
> [mst@robin test]$ diff a.c b.c
> 31c31
> <       if (__variable_test_bit(1, &addr))
> ---
> >       if (__constant_test_bit(1, &addr))
>
> [mst@robin test]$ gcc -Wall -O2 a.c; time ./a.out
>
> real    0m0.532s
> user    0m0.531s
> sys     0m0.000s
> [mst@robin test]$ gcc -Wall -O2 b.c; time ./a.out
>
> real    0m0.517s
> user    0m0.517s
> sys     0m0.000s
>
>
> So __constant_test_bit is faster even though it's using more
> instructions
> $ gcc -Wall -O2 a.c; -objdump -ld ./a.out
>

I think this is well understood.  The use of bts/btc in locked
operations is sometimes justified since it reports the bit status back
out, whereas in unlocked operations bts/btc has no benefit except for
code size.  bt is a read operation, and is therefore "never/always"
atomic; it cannot be locked because there is no read/write pair to lock.
So it is strictly an issue of code size versus performance.

However, your test is simply faulty:

 804843f:       50                      push   %eax
 8048440:       6a 01                   push   $0x1
 8048442:       e8 b4 ff ff ff          call   80483fb <__variable_test_bit>

You're encapsulating the __variable_test_bit() version into an expensive
function call, whereas the __constant_test_bit() seems to emit code that
is quite frankly completely bonkers insane:

 8048444:       8b 45 ec                mov    -0x14(%ebp),%eax
 8048447:       83 e0 1f                and    $0x1f,%eax
 804844a:       89 c1                   mov    %eax,%ecx
 804844c:       d3 ea                   shr    %cl,%edx
 804844e:       89 d0                   mov    %edx,%eax
 8048450:       83 e0 01                and    $0x1,%eax
 8048453:       85 c0                   test   %eax,%eax
 8048455:       0f 95 c0                setne  %al
 8048458:       0f b6 c0                movzbl %al,%eax
 804845b:       85 c0                   test   %eax,%eax
 804845d:       74 00                   je     804845f

Observe the sequence and/test/setne/movzbl/test!

	-hpa
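
For readers following the thread, a minimal sketch of how the two variants could be compared without the call overhead pointed out above, assuming GCC or Clang. The sketch_* names are hypothetical; this is neither the kernel's implementation nor the a.c/b.c from the quoted test, whose source does not appear in this message. Both helpers are forced inline so that neither one pays for a function call, and the bt variant falls back to the shift-and-mask form on non-x86 targets.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical constant for this sketch: number of bits in an unsigned long. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Shift-and-mask bit test, forced inline so that no call is emitted. */
static inline __attribute__((always_inline))
int sketch_constant_test_bit(unsigned long nr, const unsigned long *addr)
{
	return (int)((addr[nr / SKETCH_BITS_PER_LONG] >>
		      (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}

/* bt-based bit test on x86, also forced inline; portable fallback elsewhere. */
static inline __attribute__((always_inline))
int sketch_variable_test_bit(unsigned long nr, const unsigned long *addr)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	int oldbit;

	/*
	 * bt copies the selected bit into CF; sbb then turns CF into 0 or -1.
	 * The "memory" clobber is there because bt with a register offset
	 * may read words beyond addr[0].
	 */
	__asm__ volatile("bt %2,%1\n\t"
			 "sbb %0,%0"
			 : "=r" (oldbit)
			 : "m" (*addr), "r" (nr)
			 : "cc", "memory");
	return oldbit != 0;
#else
	return sketch_constant_test_bit(nr, addr);
#endif
}

/* Small self-check: both helpers must agree on every bit of a two-word map. */
static inline void sketch_self_check(void)
{
	unsigned long map[2] = { 0x2UL, 0x1UL };
	unsigned long nr;

	for (nr = 0; nr < 2 * SKETCH_BITS_PER_LONG; nr++)
		assert(sketch_constant_test_bit(nr, map) ==
		       sketch_variable_test_bit(nr, map));
}
```

With both helpers inlined, any remaining timing difference comes from the bt versus shift/mask sequences themselves rather than from a push/call pair, and the disassembly of the calling loop can be inspected with objdump -d to confirm that no call remains.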