From mboxrd@z Thu Jan  1 00:00:00 1970
From: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v2] eal: fix up bad asm in
	rte_cpu_get_features
Date: Thu, 20 Mar 2014 08:20:26 -0700
Message-ID: <532B073A.5010709@zytor.com>
References: <1395175414-25232-1-git-send-email-nhorman@tuxdriver.com>
 <1395240524-412-1-git-send-email-nhorman@tuxdriver.com>
 <5329BB6E.8080509@zytor.com>
 <20140320004010.GA20693@neilslaptop.think-freely.org>
 <532A6CEB.1070106@zytor.com>
 <20140320110323.GA7721@hmsreliant.think-freely.org>
 <20140320112734.GB7721@hmsreliant.think-freely.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: dev-VfR2kkLFssw@public.gmane.org
To: Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Return-path: <dev-bounces-VfR2kkLFssw@public.gmane.org>
In-Reply-To: <20140320112734.GB7721-B26myB8xz7F8NnZeBjwnZQMhkBWG/bsMQH7oEaQurus@public.gmane.org>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev-VfR2kkLFssw@public.gmane.org>
List-Help: <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request-VfR2kkLFssw@public.gmane.org?subject=subscribe>
Errors-To: dev-bounces-VfR2kkLFssw@public.gmane.org
Sender: "dev" <dev-bounces-VfR2kkLFssw@public.gmane.org>

On 03/20/2014 04:27 AM, Neil Horman wrote:
>>
> So, I answered my own question, sort of.  The __i386__ is clear: x86_64 uses RIP
> relative addressing, making the saving of ebx not needed - that's perfectly
> clear.
> 
> What's a bit less clear to me is why it matters.  Ideally moving ebx and
> restoring it with an xchg shouldn't change the register state at all.  It would
> clobber the lower part of rbx I think, but looking at the disassembly that
> shouldn't be used, so as long as the calling function saves its value of rbx, it
> should be ok.

I think you just hit on the real bug.

If this code were compiled on 64 bits, it would clobber the *upper* half
of %rbx, because a 32-bit operation in 64-bit mode zeroes the upper half
of the destination register.  Since the compiler isn't being told that
%rbx is being modified, it expects %rbx to be unmodified and disaster
ensues.
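
To make the zero-extension point concrete, here is a minimal sketch (not
code from the patch under discussion).  The helper name after_32bit_write
is hypothetical, and the inline asm assumes GCC or clang on x86-64; other
targets take a plain C fallback that only models the behaviour.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the full 64-bit register contents after a
 * 32-bit write to its low half.  On x86-64 the upper 32 bits come back
 * as zero, which is why an undeclared 32-bit write to %ebx corrupts %rbx.
 */
static inline uint64_t after_32bit_write(uint64_t initial, uint32_t low)
{
	uint64_t reg = initial;
#if defined(__x86_64__)
	/*
	 * %k0 is the 32-bit view of the register holding 'reg'.
	 * "+r" tells the compiler the register is both read and written,
	 * which is exactly the information the buggy code withheld.
	 */
	__asm__ ("movl %1, %k0" : "+r" (reg) : "r" (low));
#else
	/* Assumption: portable fallback that models the zero-extension. */
	reg = (uint64_t)low;
#endif
	return reg;
}
```

Calling after_32bit_write(0xFFFFFFFFFFFFFFFFULL, 0x12345678u) returns
0x12345678: the old upper half is gone, not preserved.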

It just clicked for me, though, that this function is actually a static
function in a .c file, meaning it is not an API at all.  This code can
be simplified dramatically as a result.

Let me see if I can hack up something quickly.

> The odd part is, if I look at the disassembly of
> rte_cpu_get_flag_enabled compiled with and without the mov and xchgl operations,
> I see that without those additional instructions the compiler adds a push rbx
> and pop rbx instruction at the start and end of the assembly, but not when the
> mov ebx, %0 and xchgl %ebx, %0 instructions are added.  I'm not sure what the
> compiler is sensitive to when adding those instructions, but it seems like it
> should be sensitive to the cpuid instruction, and should be adding it to both.

It's not the instruction, it is the fact that the constraints include a
"=b".
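
For illustration, a minimal sketch of a cpuid wrapper that declares %ebx
as an output, so the compiler knows the register is modified and can
save and restore it around the function if it needs to.  This is not the
DPDK code; the struct and function names (cpuid_regs, cpuid_query) are
hypothetical, and the asm is only enabled on x86-64 with GCC or clang.
On other targets the sketch returns all zeroes.

```c
#include <stdint.h>

/* Hypothetical container for the four cpuid result registers. */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

static inline struct cpuid_regs cpuid_query(uint32_t leaf, uint32_t subleaf)
{
	struct cpuid_regs r = { 0, 0, 0, 0 };
#if defined(__x86_64__)
	/*
	 * "=b" marks %ebx (and therefore %rbx) as written by the asm.
	 * The matching constraints "0" and "2" load the leaf into %eax
	 * and the subleaf into %ecx before cpuid executes.
	 */
	__asm__ volatile ("cpuid"
			  : "=a" (r.eax), "=b" (r.ebx),
			    "=c" (r.ecx), "=d" (r.edx)
			  : "0" (leaf), "2" (subleaf));
#endif
	return r;
}
```

With every clobbered register listed as an output, no manual mov/xchgl
dance around cpuid is needed on x86-64.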

This explains why your little hack happens to work... I was wondering
how it compiled at all.  The answer, of course, is that it is on x86-64
where the hack is neither necessary nor correct.

	-hpa