clang .code16 with -Os producing larger code that it needs to

* clang .code16 with -Os producing larger code that it needs to
@ 2015-02-20 14:58 Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-20 15:26 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-20 15:38 ` David Woodhouse
  0 siblings, 2 replies; 10+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-20 14:58 UTC (permalink / raw)
  To: dwmw2, llvmdev, The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1052 bytes --]

When experimenting with compiling GRUB2 with clang using the integrated
assembler, I found that it generates 16-bit code that is bigger than its
gas counterpart, and the result gets too big for the size constraints of
the boot sector. This was traced mainly to 2 problems.
32-bit access to 16-bit addresses.
source:
	movl	LOCAL(kernel_sector), %ebx
	movl	%ebx, 8(%si)
clang:
    7cbc:	67 66 8b 1d 5c 7c 00 	addr32 mov 0x7c5c,%ebx
    7cc3:	00
    7cc4:	66 89 5c 08          	mov    %ebx,0x8(%si)

gas:
    7cbc:	66 8b 1e 5c 7c       	mov    0x7c5c,%ebx
    7cc1:	66 89 5c 08          	mov    %ebx,0x8(%si)
32-bit jump.
source:
	jnb	LOCAL(floppy_probe)
clang:
+    7cb5:	66 0f 83 07 01 00 00 	jae    7dc3 <L_floppy_probe>
gas:
-    7cb5:	0f 83 0a 01          	jae    7dc3 <L_floppy_probe>
The last one is particularly problematic, as it never makes sense to
issue a 32-bit jump if %ip is only 16 bits, and it eats 3 extra bytes
per jump. Is it possible to force clang to generate 16-bit jumps?
On the bright side, if I remove the error strings the code is functional.
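
To make the cost of the jump concrete, here is a small Python sketch that decodes the two `jae` encodings listed above and checks that both reach the same target while differing by 3 bytes. The helper name `decode_jae` is made up for this illustration; it only understands the near `0F 83` form with an optional `0x66` prefix, as it would be interpreted in 16-bit mode.

```python
import struct

def decode_jae(addr, raw):
    """Decode a near `jae` (0F 83) as interpreted in 16-bit mode.

    Returns (instruction_length, target). A leading 0x66 operand-size
    prefix selects a rel32 displacement instead of the default rel16.
    """
    code = bytes.fromhex(raw)
    wide = code[0] == 0x66
    body = code[1:] if wide else code
    assert body[:2] == b"\x0f\x83", "not a near jae"
    fmt, size = ("<i", 4) if wide else ("<h", 2)
    (rel,) = struct.unpack(fmt, body[2:2 + size])
    length = len(code)
    # The displacement is relative to the address of the next instruction.
    target = addr + length + rel
    if not wide:
        target &= 0xFFFF  # with a 16-bit operand size the result wraps to 16 bits
    return length, target

clang_len, clang_target = decode_jae(0x7CB5, "66 0f 83 07 01 00 00")
gas_len, gas_target = decode_jae(0x7CB5, "0f 83 0a 01")
print(hex(clang_target), hex(gas_target), clang_len - gas_len)
```

Both encodings land on 0x7dc3; the only difference is the prefix byte and the two extra displacement bytes.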



* Re: clang .code16 with -Os producing larger code that it needs to
  2015-02-20 14:58 clang .code16 with -Os producing larger code that it needs to Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-20 15:26 ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-20 15:38 ` David Woodhouse
  1 sibling, 0 replies; 10+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-20 15:26 UTC (permalink / raw)
  To: dwmw2, llvmdev, The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On 20.02.2015 15:58, Vladimir 'φ-coder/phcoder' Serbinenko wrote:
> When experimenting with compiling GRUB2 with clang using the integrated
> assembler, I found that it generates 16-bit code that is bigger than its
> gas counterpart, and the result gets too big for the size constraints of
> the boot sector. This was traced mainly to 2 problems.
> 32-bit access to 16-bit addresses.
> source:
> 	movl	LOCAL(kernel_sector), %ebx
> 	movl	%ebx, 8(%si)
> clang:
>     7cbc:	67 66 8b 1d 5c 7c 00 	addr32 mov 0x7c5c,%ebx
>     7cc3:	00
>     7cc4:	66 89 5c 08          	mov    %ebx,0x8(%si)
> 
> gas:
>     7cbc:	66 8b 1e 5c 7c       	mov    0x7c5c,%ebx
>     7cc1:	66 89 5c 08          	mov    %ebx,0x8(%si)
> 32-bit jump.
> source:
> 	jnb	LOCAL(floppy_probe)
> clang:
> +    7cb5:	66 0f 83 07 01 00 00 	jae    7dc3 <L_floppy_probe>
> gas:
> -    7cb5:	0f 83 0a 01          	jae    7dc3 <L_floppy_probe>
A minimal example would be:
	.code16
	jmp 1f
	.space 256
1:	nop
clang:
   0:	66 e9 00 01 00 00    	jmpl   0x106
	...
 106:	90                   	nop
gcc:
   0:	e9 00 01             	jmp    0x103
	...
 103:	90                   	nop

> The last one is particularly problematic, as it never makes sense to
> issue a 32-bit jump if %ip is only 16 bits, and it eats 3 extra bytes
> per jump. Is it possible to force clang to generate 16-bit jumps?
> On the bright side, if I remove the error strings the code is functional.
> 




* Re: clang .code16 with -Os producing larger code that it needs to
  2015-02-20 14:58 clang .code16 with -Os producing larger code that it needs to Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-20 15:26 ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-20 15:38 ` David Woodhouse
  2015-02-20 15:46   ` Vladimir 'φ-coder/phcoder' Serbinenko
  1 sibling, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-02-20 15:38 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: The development of GRUB 2, llvmdev

[-- Attachment #1: Type: text/plain, Size: 1330 bytes --]

On Fri, 2015-02-20 at 15:58 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
wrote:
> When experimenting with compiling GRUB2 with clang using the integrated
> assembler, I found that it generates 16-bit code that is bigger than its
> gas counterpart, and the result gets too big for the size constraints of
> the boot sector. This was traced mainly to 2 problems.

...

> 32-bit access to 16-bit addresses.
> clang:
>     7cbc:	67 66 8b 1d 5c 7c 00 00	addr32 mov 0x7c5c,%ebx
> gas:
>     7cbc:	66 8b 1e 5c 7c       	mov    0x7c5c,%ebx

> 32-bit jump.
> clang:
> +    7cb5:	66 0f 83 07 01 00 00 	jae    7dc3 <L_floppy_probe>
> gas:
> -    7cb5:	0f 83 0a 01          	jae    7dc3 <L_floppy_probe>

To a large extent, those are the *same* problem. We don't know that it's
eventually going to fit into a 16-bit offset, so we emit it with a fixup
record which can cope with 32 bits.

Arguably, the jump is *particularly* gratuitous in many cases... but in
'big real' mode is the IP *really* limited to 16 bits?

We could make it default to 16-bit, as gas does. But then we'd be
screwed in the cases where we really *do* need 32-bit.

What we actually need to do is implement handling for the explicit
addr32 prefix. Then we can do what gas does and default to 16-bit but
*also* have a way to do 32-bit when it's needed.
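
As an illustration of what the address-size prefix changes in the `mov` case, the following Python sketch decodes the two encodings quoted above. The function name `decode_mov_abs` is hypothetical and it handles only the displacement-only operand form of opcode `8B`. It relies on the standard ModRM rule that `mod=00, rm=110` means "disp16 only" under 16-bit addressing, while `mod=00, rm=101` means "disp32 only" under 32-bit addressing.

```python
def decode_mov_abs(raw):
    """Decode `mov abs, %r32` (opcode 8B) as seen from a .code16 section."""
    code = bytes.fromhex(raw)
    i = 0
    addr32 = False
    while code[i] in (0x66, 0x67):
        # 0x67 flips the address size, 0x66 flips the operand size.
        addr32 |= code[i] == 0x67
        i += 1
    assert code[i] == 0x8B, "not mov r/m, reg"
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod=00 with rm=110 (16-bit addressing) or rm=101 (32-bit addressing)
    # means "no base register, displacement only".
    assert mod == 0 and rm == (5 if addr32 else 6), "not an absolute operand"
    size = 4 if addr32 else 2
    disp = int.from_bytes(code[i + 2:i + 2 + size], "little")
    return {"addr_bits": 32 if addr32 else 16, "reg": reg,
            "disp": disp, "length": i + 2 + size}

clang = decode_mov_abs("67 66 8b 1d 5c 7c 00 00")
gas = decode_mov_abs("66 8b 1e 5c 7c")
print(clang)
print(gas)
```

Both forms load from 0x7c5c into register 3 (%ebx); the clang form spends one byte on the 0x67 prefix and two more on the wider displacement.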

-- 
dwmw2


* Re: clang .code16 with -Os producing larger code that it needs to
  2015-02-20 15:38 ` David Woodhouse
@ 2015-02-20 15:46   ` Vladimir 'φ-coder/phcoder' Serbinenko
  2015-02-20 16:05     ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir 'φ-coder/phcoder' Serbinenko @ 2015-02-20 15:46 UTC (permalink / raw)
  To: David Woodhouse; +Cc: The development of GRUB 2, llvmdev

[-- Attachment #1: Type: text/plain, Size: 1810 bytes --]

On 20.02.2015 16:38, David Woodhouse wrote:
> On Fri, 2015-02-20 at 15:58 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
> wrote:
>> When experimenting with compiling GRUB2 with clang using the integrated
>> assembler, I found that it generates 16-bit code that is bigger than its
>> gas counterpart, and the result gets too big for the size constraints of
>> the boot sector. This was traced mainly to 2 problems.
> 
> ...
> 
>> 32-bit access to 16-bit addresses.
>> clang:
>>     7cbc:	67 66 8b 1d 5c 7c 00 00	addr32 mov 0x7c5c,%ebx
>> gas:
>>     7cbc:	66 8b 1e 5c 7c       	mov    0x7c5c,%ebx
> 
>> 32-bit jump.
>> clang:
>> +    7cb5:	66 0f 83 07 01 00 00 	jae    7dc3 <L_floppy_probe>
>> gas:
>> -    7cb5:	0f 83 0a 01          	jae    7dc3 <L_floppy_probe>
> 
> To a large extent, those are the *same* problem. We don't know that it's
> eventually going to fit into a 16-bit offset, so we emit it with a fixup
> record which can cope with 32 bits.
> 
All labels are local to the source file. If I use %eax instead of %ebx
in the first example, I get the short code. For the second example, how
does clang detect that the offset fits into one byte, so that it can
issue the EB XX sequence that appears in the resulting file in several
places? Can we use the same mechanism to detect when a 16-bit reference
suffices, and keep the 32-bit one for external references?
> Arguably, the jump is *particularly* gratuitous in many cases... but in
> 'big real' mode is the IP *really* limited to 16 bits?
> 
> We could make it default to 16-bit, as gas does. But then we'd be
> screwed in the cases where we really *do* need 32-bit.
> 
> What we actually need to do is implement handling for the explicit
> addr32 prefix. Then we can do what gas does and default to 16-bit but
> *also* have a way to do 32-bit when it's needed.
> 




* Re: clang .code16 with -Os producing larger code that it needs to
  2015-02-20 15:46   ` Vladimir 'φ-coder/phcoder' Serbinenko
@ 2015-02-20 16:05     ` David Woodhouse
  2015-02-20 16:18       ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-02-20 16:05 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: The development of GRUB 2, llvmdev

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

On Fri, 2015-02-20 at 16:46 +0100, Vladimir 'φ-coder/phcoder' Serbinenko
wrote:
> 
> All labels are local to the source file. If I use %eax instead of %ebx
> in the first example, I get the short code. For the second example, how
> does clang detect that the offset fits into one byte, so that it can
> issue the EB XX sequence that appears in the resulting file in several
> places? Can we use the same mechanism to detect when a 16-bit reference
> suffices, and keep the 32-bit one for external references?

It's been a while since I looked at this... but I think for the short
jumps we just emit the 8-bit version and there's a fixup which can go
back and re-emit the instruction in 32-bit mode if it finds it doesn't
fit?

Do we just need to support a similar fixup for promoting 16-bit to
32-bit relocations?

-- 
dwmw2


* Re: clang .code16 with -Os producing larger code that it needs to
  2015-02-20 16:05     ` David Woodhouse
@ 2015-02-20 16:18       ` David Woodhouse
  2015-02-20 18:47         ` [LLVMdev] " Rafael Espíndola
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-02-20 16:18 UTC (permalink / raw)
  To: Vladimir 'φ-coder/phcoder' Serbinenko
  Cc: The development of GRUB 2, llvmdev

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Fri, 2015-02-20 at 16:05 +0000, David Woodhouse wrote:
> 
> It's been a while since I looked at this... but I think for the short
> jumps we just emit the 8-bit version and there's a fixup which can go
> back and re-emit the instruction in 32-bit mode if it finds it doesn't
> fit?
> 
> Do we just need to support a similar fixup for promoting 16-bit to
> 32-bit relocations?

OK, the term I was looking for was 'relaxation'. Look in
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp for
X86AsmBackend::relaxInstruction() and related methods.

Observe that it will cope with 'relaxing' 8-bit PC-relative relocations
to 32-bit PC-relative, but it doesn't cope with anything else.

Your task, should you choose to accept it, is to make it cope with other
forms of relaxation where necessary.

Note that the existing cases end up emitting a new instruction with a
*new* opcode. In your case it won't be doing that. It's the *same*
opcode, but you'll have to set a flag to tell the emitter to use the
32-bit addressing mode (for data and/or addr as appropriate) this time.

And while you're doing that, you should note that that's the *same* flag
that'll be needed to support explicit addr32/data32 prefixes in the asm
source. So you might as well support those too. I might suggest doing
them *first*, in fact.
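
For readers unfamiliar with the term, relaxation can be sketched as a fixed-point loop: start every jump in its smallest form, lay out the code, widen any jump whose displacement no longer fits, and repeat until nothing changes. The Python below is a toy model under stated assumptions, not the code in X86AsmBackend.cpp; `FORMS`, `relax` and the item tuples are invented for this example, and the intermediate 16-bit step is the behaviour being asked for in this thread rather than what the backend is described as doing.

```python
# (encoded size in bytes, displacement width in bits) for `jmp` in .code16:
# EB rel8, E9 rel16, 66 E9 rel32.
FORMS = [(2, 8), (3, 16), (6, 32)]

def relax(items):
    """items: list of ("jmp", label), ("label", name) or ("space", n).

    Returns the index into FORMS chosen for each jump, in program order.
    """
    level = [0] * sum(1 for kind, _ in items if kind == "jmp")
    changed = True
    while changed:
        changed = False
        # Pass 1: lay out the code with the current choice of forms.
        labels, jumps, addr, j = {}, [], 0, 0
        for kind, arg in items:
            if kind == "label":
                labels[arg] = addr
            elif kind == "space":
                addr += arg
            else:
                addr += FORMS[level[j]][0]
                jumps.append((j, addr, arg))  # addr is the end of the jump
                j += 1
        # Pass 2: widen every jump whose displacement does not fit.
        for j, end, label in jumps:
            bits = FORMS[level[j]][1]
            disp = labels[label] - end
            if not -(1 << (bits - 1)) <= disp < (1 << (bits - 1)):
                level[j] += 1
                changed = True
    return level

# The minimal example from earlier in the thread: jmp 1f; .space 256; 1: nop
minimal = [("jmp", "1"), ("space", 256), ("label", "1"), ("space", 1)]
print(relax(minimal))  # [1], i.e. the 3-byte E9 rel16 form
```

Because forms only ever grow, the loop terminates; a jump over 100 bytes of padding instead of 256 stays in the 2-byte form.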

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


* Re: [LLVMdev] clang .code16 with -Os producing larger code that it needs to
  2015-02-20 16:18       ` David Woodhouse
@ 2015-02-20 18:47         ` Rafael Espíndola
  2015-02-23 12:07           ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Rafael Espíndola @ 2015-02-20 18:47 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Vladimir 'φ-coder/phcoder' Serbinenko,
	LLVM Developers Mailing List, The development of GRUB 2

> Your task, should you choose to accept it, is to make it cope with other
> forms of relaxation where necessary.

And if not, please open a bug :-)

There are a few other missing cases that cause MC to produce code that
is more "relaxed" than it needs to be.

Cheers,
Rafael



* Re: [LLVMdev] clang .code16 with -Os producing larger code that it needs to
  2015-02-20 18:47         ` [LLVMdev] " Rafael Espíndola
@ 2015-02-23 12:07           ` David Woodhouse
  2015-02-24  8:42             ` Craig Topper
  0 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2015-02-23 12:07 UTC (permalink / raw)
  To: Rafael Espíndola
  Cc: Vladimir 'φ-coder/phcoder' Serbinenko,
	LLVM Developers Mailing List, The development of GRUB 2

[-- Attachment #1: Type: text/plain, Size: 1037 bytes --]

On Fri, 2015-02-20 at 13:47 -0500, Rafael Espíndola wrote:
> > Your task, should you choose to accept it, is to make it cope with other
> > forms of relaxation where necessary.
> 
> And if not, please open a bug :-)

http://llvm.org/bugs/show_bug.cgi?id=22662

FWIW I could reproduce the 'movl foo, %ebx' one but a relative jump
*was* using 16 bits (although gas uses 8):

 $ cat foo.S
.code16
	jae foo
	movl (foo), %ebx
foo:
 $ gcc -c -oa.out foo.S   ; llvm-objdump -d -triple=i686-pc-linux-code16  

a.out:	file format ELF64-x86-64

Disassembly of section .text:
.text:
       0:	73 05                                        	jae	5
       2:	66 8b 1e 00 00                               	movl	0, %ebx
 $ llvm-mc -filetype=obj foo.S | llvm-objdump -d -triple=i686-pc-linux-code16 - 

<stdin>:	file format ELF64-x86-64

Disassembly of section .text:
.text:
       0:	0f 83 08 00                                  	jae	8
       4:	67 66 8b 1d 00 00 00 00                      	movl	0, %ebx


-- 
dwmw2


* Re: [LLVMdev] clang .code16 with -Os producing larger code that it needs to
  2015-02-23 12:07           ` David Woodhouse
@ 2015-02-24  8:42             ` Craig Topper
  2015-02-24  9:07               ` David Woodhouse
  0 siblings, 1 reply; 10+ messages in thread
From: Craig Topper @ 2015-02-24  8:42 UTC (permalink / raw)
  To: David Woodhouse
  Cc: The development of GRUB 2, LLVM Developers Mailing List,
	Rafael Espíndola

[-- Attachment #1: Type: text/plain, Size: 2573 bytes --]

Does gas really relax from 16-bit addresses to 32-bit addresses as necessary?
I played around briefly and it looks like gas will only emit 16-bit
addresses in 16-bit mode unless addr32 is prefixed. Even for an external
symbol it only emitted a 16-bit relocation type until I added addr32.

I wonder if we shouldn't fix the x86 encoder to use 16-bit addresses in
16-bit mode. (Actually I think we're emitting 0x67 prefix because the
displacement size check in Is16BitMemOperand doesn't like cases where
displacement isExpr instead of isImm). And maybe override the mode in
SubTargetInfo around the EmitInstruction call for any that specifies
"addr32" in 16-bit mode or "addr16" in 32-bit mode?

That doesn't help with jumps though since they do need their opcode
switched to JMP_2 instead of JMP_4. Again I can't prove that gas will
further relax 2-byte to 4-byte without addr32. I think we either need to
again change SubTargetInfo and pass it into relaxInstruction OR we could
create new 16-bit mode only 1-byte jumps that we can parse based on mode
and relax to the 2 byte form.
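
The policy discussed here (16-bit addresses by default in 16-bit mode, 32-bit only on request) can be sketched for the single instruction from this thread. This Python is only an illustration: `encode_mov_abs_to_ebx` and its `addr32` flag are hypothetical stand-ins for an explicit `addr32` prefix in the source, and have nothing to do with the actual encoder or SubTargetInfo plumbing.

```python
def encode_mov_abs_to_ebx(address, addr32=False):
    """Encode `movl address, %ebx` for a .code16 section.

    addr32 is a hypothetical flag standing in for an explicit `addr32`
    prefix in the source; without it the 16-bit address form is used.
    """
    out = bytearray()
    if addr32:
        out.append(0x67)          # address-size override
    out.append(0x66)              # operand-size override: 32-bit %ebx
    out.append(0x8B)              # mov r/m32, r32
    reg = 3                       # %ebx
    rm = 5 if addr32 else 6       # displacement-only form for each mode
    out.append((reg << 3) | rm)   # mod = 00
    width = 4 if addr32 else 2
    if address >> (8 * width):
        raise ValueError("address does not fit the selected address size")
    out += address.to_bytes(width, "little")
    return bytes(out)

short_form = encode_mov_abs_to_ebx(0x7C5C)
long_form = encode_mov_abs_to_ebx(0x7C5C, addr32=True)
print(short_form.hex(), long_form.hex())
```

With the default, the bytes match the gas listing from the first message; with the flag set, they match what clang emitted.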

On Mon, Feb 23, 2015 at 4:07 AM, David Woodhouse <dwmw2@infradead.org>
wrote:

> On Fri, 2015-02-20 at 13:47 -0500, Rafael Espíndola wrote:
> > > Your task, should you choose to accept it, is to make it cope with other
> > > forms of relaxation where necessary.
> >
> > And if not, please open a bug :-)
>
> http://llvm.org/bugs/show_bug.cgi?id=22662
>
> FWIW I could reproduce the 'movl foo, %ebx' one but a relative jump
> *was* using 16 bits (although gas uses 8):
>
>  $ cat foo.S
> .code16
>         jae foo
>         movl (foo), %ebx
> foo:
>  $ gcc -c -oa.out foo.S   ; llvm-objdump -d -triple=i686-pc-linux-code16
>
> a.out:  file format ELF64-x86-64
>
> Disassembly of section .text:
> .text:
>        0:       73 05                                           jae     5
>        2:       66 8b 1e 00 00                                  movl    0, %ebx
>  $ llvm-mc -filetype=obj foo.S | llvm-objdump -d -triple=i686-pc-linux-code16 -
>
> <stdin>:        file format ELF64-x86-64
>
> Disassembly of section .text:
> .text:
>        0:       0f 83 08 00                                     jae     8
>        4:       67 66 8b 1d 00 00 00 00                         movl    0, %ebx
>
>
> --
> dwmw2

-- 
~Craig


* Re: [LLVMdev] clang .code16 with -Os producing larger code that it needs to
  2015-02-24  8:42             ` Craig Topper
@ 2015-02-24  9:07               ` David Woodhouse
  0 siblings, 0 replies; 10+ messages in thread
From: David Woodhouse @ 2015-02-24  9:07 UTC (permalink / raw)
  To: Craig Topper
  Cc: The development of GRUB 2, LLVM Developers Mailing List,
	Rafael Espíndola

[-- Attachment #1: Type: text/plain, Size: 1848 bytes --]

On Tue, 2015-02-24 at 00:42 -0800, Craig Topper wrote:
> Does gas really relax from 16-bit addresses to 32-bit addresses as
> necessary? I played around briefly and it looks like gas will only
> emit 16-bit addresses in 16-bit mode unless addr32 is prefixed. Even
> for an external symbol it only emitted a 16-bit relocation type until
> I added addr32.

I believe you are correct. My use of 32-bit relocations in LLVM was
mostly because we didn't yet support addr32. Having code which is
correct but slightly larger than needed was better than having some
things which you just *couldn't* build, in the short term.

> I wonder if we shouldn't fix the x86 encoder to use 16-bit addresses
> in 16-bit mode.

Yes, we should. Having implemented the addr32 prefix first.

>  (Actually I think we're emitting 0x67 prefix because the displacement
> size check in Is16BitMemOperand doesn't like cases where displacement
> isExpr instead of isImm). And maybe override the mode in SubTargetInfo
> around the EmitInstruction call for any that specifies "addr32" in
> 16-bit mode or "addr16" in 32-bit mode?
> 
> 
> That doesn't help with jumps though since they do need their opcode
> switched to JMP_2 instead of JMP_4. Again I can't prove that gas will
> further relax 2-byte to 4-byte without addr32. I think we either need
> to again change SubTargetInfo and pass it into relaxInstruction OR we
> could create new 16-bit mode only 1-byte jumps that we can parse based
> on mode and relax to the 2 byte form.

I don't think we need a new opcode for a 16-bit mode 1-byte jump, do we?
The mode is already stored in the MCInst because I needed to do that to
fix PR18303.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation

