From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alfonso
Subject: _main and startup code
Date: Wed, 12 Feb 2003 12:15:15 +0100
Sender: linux-8086-owner@vger.kernel.org
Message-ID: <200302121215.15299.a.martone@retepnet.it>
References: <200301301341.h0UDf5Z25323@preshak.recjai.ac.in> <200302060016.28257.a.martone@retepnet.it> <1044942743.1550.70.camel@Castle.goembel>
Reply-To: a.martone@retepnet.it
Mime-Version: 1.0
Content-Transfer-Encoding: 7BIT
Return-path:
In-Reply-To: <1044942743.1550.70.camel@Castle.goembel>
List-Id:
Content-Type: text/plain; charset="us-ascii"
To: linux-8086@vger.kernel.org

> Would the program start symbol also need to be _main,
> or is that only needed by the C runtime library?

The name "main" is internal to the C compiler and its libraries. As of ELKS 0.1.1, an executable always starts at its CS:0000, where bcc places the startup code (adjust the stack parameters, call _main and then issue an "exit" syscall).

In the Minix header there are two "Unused" fields. I found them defined somewhere as "starting address" and "length of symbol table (or DLL data) appended to the file". Since the compiler sets them to zero by default, the patch below should be backward- and forward-compatible (I'm sorry, I don't know how to use those weird things like diff, patch, cvs, etc., so I show it to you in this rough form):

in the file include/linuxmt/minix.h change line #23 from:

    unsigned long unused;

to:

    unsigned long startaddr;

in the file fs/exec.c add this line after line #348 (right after the "tregs->cs=cseg" line):

    tregs->ip = mh.startaddr;   /* guaranteed good at compile time */

This implements a "start address" that may differ from CS:0000.

But bcc won't support it. Every program compiled by bcc has a short initial startup section which rearranges the argc/argv/envp parameters for calling _main (first part) and issues an _exit when main() returns (second part). Yes, the "rearrange" step could easily be implemented in the ELKS kernel.
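To make the first startup part more concrete, here is a rough C model of it. It is only a sketch, not the actual bcc startup code (which is written in assembler). It assumes that the initial stack block holds argc, then the argv pointers, a NULL, the envp pointers and a final NULL; that layout and the names main_fn, sample_main and startup_model are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for the program's main(); illustration only. */
typedef int (*main_fn)(int argc, char **argv, char **envp);

/* Hypothetical stand-in for a user main(): counts the environment
 * strings and returns argc * 10 + that count. */
int sample_main(int argc, char **argv, char **envp)
{
    int n = 0;
    (void)argv;
    while (envp[n] != NULL)
        n++;
    return argc * 10 + n;
}

/*
 * Model of the startup section.
 * Assumed layout of the block: stack[0] carries argc (stored as a
 * pointer-sized word), then argv[0..argc-1], NULL, envp[...], NULL.
 * First part: derive argc/argv/envp and call main.
 * Second part: the returned value stands in for the status that the
 * real startup code would hand to the exit syscall.
 */
int startup_model(char **stack, main_fn entry)
{
    int argc = (int)(intptr_t)stack[0];
    char **argv = &stack[1];
    char **envp = argv + argc + 1;   /* skip the NULL that ends argv */

    return entry(argc, argv, envp);
}
```

The point of the model is that the "rearrange" step is nothing more than pointer arithmetic on the block the kernel has already built, which is why it could live on either side of the kernel boundary.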
When the program terminates by calling exit(), the second startup part is not needed. But if main() returns a value, it is needed. An ugly hack could be patching bcc so that main() gets a different stack frame; maybe something like:

    _main:
        mov bp, sp      ; opening code: no need to push bp
        [maybe: sub sp, NN -- stack space needed for local variables]
        [...rest of main()...]
        mov bx, ax      ; closing code: bp is no longer needed
        mov ax, 1
        int 0x80        ; call exit(main())

This is transparent to the programmer except in the case of a recursive call to _main (well, in sixteen years of C programming I have never found a case of a recursive _main call). But this is an ugly hack because it needs a different stack frame for _main (the compiler would always have to check the function name)...!

A decent hack should provide, when main() returns, at least the three instructions of the "closing code" above. But this means either adding a "push return-address-to-closing-code" somewhere in the kernel, or... simply placing a "call _main -- closing code follows" at the beginning of the program (which can then start at CS:0000 without any extra work).

I think I would not hack anything. The startup code often does a little more than calling _main and exiting with its return value. For example, it could save envp in a static variable so that a getenv() library call can fetch an environment string at any moment (without help from _main).

These notes demonstrate that, at least for C programs (not only bcc-compiled programs), a "start address" different from CS:0000 is not really needed. Maybe a compiler for some other programming language could take some advantage of it.

We have not yet discussed programs written in assembler. An assembler program does not always need to start at CS:0000 and/or with some startup code. The little patch above seems to suffice, but then I would add a small extra check in fs/exec.c (to verify that mh.startaddr is less than mh.tseg).
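The extra check could look roughly like the following. This is only a sketch: the structure is a reduced, hypothetical stand-in that keeps just the two header fields involved (the real header has more), and the names exec_hdr_model and startaddr_ok are invented for the example.

```c
#include <stdint.h>

/* Reduced, hypothetical model of the executable header: only the
 * fields that the check needs. */
struct exec_hdr_model {
    uint32_t tseg;       /* size of the text segment, in bytes */
    uint32_t startaddr;  /* proposed entry offset inside the text segment */
};

/* Returns 1 when the entry offset lies inside the text segment,
 * 0 otherwise. */
int startaddr_ok(const struct exec_hdr_model *mh)
{
    return mh->startaddr < mh->tseg;
}
```

An existing binary, whose field is still zero, passes this check as long as its text segment is not empty, so the check would not break programs that start at CS:0000.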