From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Given <dg@cowlark.com>
Subject: Compilers
Date: Fri, 28 May 2004 11:42:38 +0100
Sender: linux-8086-owner@vger.kernel.org
Message-ID: <200405281142.38116.dg@cowlark.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path: <linux-8086-owner@vger.kernel.org>
Content-Disposition: inline
List-Id: <linux-8086.vger.kernel.org>
Content-Type: text/plain; charset="us-ascii"
To: linux-8086@vger.kernel.org

I've been playing with Minix, and in particular looking at its C compiler. 
Minix uses a cut-down version of the Amsterdam Compiler Kit, which used to 
cost serious money but is now open source.

It's nice.

It seems to be a full ANSI C compiler generating pretty good code. ELKS' 
compiler is bcc, which is a K&R C compiler with a nasty preprocessor that 
turns ANSI C into K&R C; this means that it will compile ANSI C, but you 
don't get the proper type checking. Minix' cc is designed to run self-hosted 
(there are up to *eight* separate programs involved); on my 1MB machine 
there's a fair amount of disk churn because it has to keep reloading bits of 
code, which makes it slow. It would be way faster on a 2MB machine.

I've tried to compare the code produced by cc and bcc. I'm not entirely sure 
this is a fair comparison because I don't think I managed to invoke bcc's 
optimiser correctly, but here you go anyway. Fixed-pitch font recommended.

The source:

---snip---
extern int printf(char* format, ...);

int fnord(int i)
{
        int count = 0;

        while (i)
                count += i--;
        return count;
}

int main(int argc, char* argv[])
{
        printf("Hello, world!\n");
        printf("%d\n", fnord(argc));
        return 0;
}
---snip---

The output:

BCC				CC

_fnord:				_fnord:
push bp				push bp
mov bp,sp			mov bp,sp
dec sp				push si
dec sp				xor si,si
xor ax,ax			.4:
mov -2[bp],ax			cmp 4(bp),#0
jmp .2				je .3
.3:				mov dx,4(bp)
mov ax,4[bp]			mov cx,dx
dec ax				dec cx
mov 4[bp],ax			mov 4(bp),cx
inc ax
add ax,-2[bp]
mov -2[bp],ax			add si,dx
.2:				
mov ax,4[bp]			
test ax,ax
jne .3
.4:
.1:				.3:
mov ax,-2[bp]			mov ax,si
mov sp,bp
pop bp
ret				jmp .sret

The main() function produced by both compilers is identical apart from minor 
differences: cc likes using pop to do a stack retraction, where bcc does a 
mov sp,bp instead. This means that cc's code is smaller but slower, while 
bcc's is larger but faster. *shrug*

cc seems to have better register allocation and avoids using stack slots when 
it doesn't need to. bcc has put the loop conditional at the bottom instead of 
the top; not sure why, it makes the code larger.
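
To make the two loop shapes concrete, here is a rough C sketch of them using 
gotos. It is only an illustration, not what either compiler does internally; 
the names fnord_top and fnord_bottom are made up for this example.

```c
/* Hypothetical names; both functions compute the same sum as fnord(). */

/* Loop test at the top, the way the cc listing starts its loop:
   the condition is checked before each pass through the body. */
int fnord_top(int i)
{
        int count = 0;
top:
        if (i == 0)
                goto done;
        count += i--;
        goto top;
done:
        return count;
}

/* Loop test at the bottom, as in the bcc listing: one extra jump on
   entry to reach the test, then a single conditional branch per pass. */
int fnord_bottom(int i)
{
        int count = 0;

        goto test;
body:
        count += i--;
test:
        if (i != 0)
                goto body;
        return count;
}
```

Calling either function with 5 returns 15. The bottom-tested form needs the 
extra jump on entry (the jmp .2 in the bcc column), but after that it takes 
only one branch per iteration.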

The ELKS kernel is currently about 70kB of code, which is a little 
problematic given the 64kB limit, so anything that makes the code smaller is 
good! Plus, cc is a real ANSI compiler. When I get some spare time I'll see 
if I can have a look at building the ELKS kernel on Minix. Since the two 
compilers use the same calling conventions I don't think it would be terribly 
hard and I'd be interested in seeing what sort of differences there are in 
real code.
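
For a rough idea of how much shrinkage is needed, here is a small sketch of 
the arithmetic. It assumes "kB" means 1024 bytes and takes "about 70kB" as 
exactly 70; the function names are made up for this example.

```c
/* Assumptions: "kB" is 1024 bytes, and "about 70kB" is taken as exactly 70. */
#define SEGMENT_LIMIT 65536L    /* 2^16 bytes reachable with a 16-bit offset */

/* Bytes by which a code size exceeds one segment (0 if it already fits). */
long bytes_over_limit(long code_bytes)
{
        return code_bytes > SEGMENT_LIMIT ? code_bytes - SEGMENT_LIMIT : 0;
}

/* Shrink needed to fit, in tenths of a percent, rounded up. */
long shrink_needed_permille(long code_bytes)
{
        long over = bytes_over_limit(code_bytes);

        return (over * 1000 + code_bytes - 1) / code_bytes;
}
```

With 70L * 1024 as input, bytes_over_limit() gives 6144 and 
shrink_needed_permille() gives 86, i.e. the code would have to get roughly 
8.6% smaller under these assumptions.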

Incidentally, if you're interested in what ELKS might end up being, I strongly 
suggest you install a copy of Minix on something. It's fascinating just how 
usable it is, on such limited hardware.

-- 
+- David Given --McQ-+ 
|  dg@cowlark.com    | "All power corrupts, but we need electricity." ---
| (dg@tao-group.com) | Diana Wynne Jones, _Archer's Goon_
+- www.cowlark.com --+