All of lore.kernel.org
 help / color / mirror / Atom feed
From: chandramouli narayanan <mouli@linux.intel.com>
To: Andi Kleen <ak@suse.de>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org
Subject: Re: [PATCH 2.6.21 1/3] x86_64: EFI64 support
Date: Fri, 25 May 2007 15:46:25 -0700	[thread overview]
Message-ID: <46576741.6020806@linux.intel.com> (raw)
In-Reply-To: <200705041501.55502.ak@suse.de>

Andi Kleen wrote:
> On Tuesday 01 May 2007 20:59:46 Chandramouli Narayanan wrote:
>   
>> General note on EFI x86_64 support
>> ----------------------------------
>>     
>
> More review. This code unfortunately has some problems.
>
> First this seems to be quite different from what the 32bit EFI 
> support does (which i suppose is pre UEFI) Are there plans to sync this 
> up eventually?
>   
Consolidation of the efi support across architectures needs to be done 
at some point and this will be a bigger task. But right now my focus is 
x86_64.
> Also your howto above should be somewhere in Documentation/
>   
Will do.
>   
>> - Create a VFAT partition on the disk
>> - Copy the following to the VFAT partition:
>> 	elilo bootloader with x86_64 support and elilo configuration file
>> 	efi64 kernel image, initrd
>>     
>
> What format is the kernel image?
>   
bzImage
>   
>> - Boot to EFI shell and invoke elilo choosing efi64 kernel image
>> - On UEFI2.0 firmware systems, pass vga=fbcon for boot messages to appear
>>   on console.
>>     
>
> They don't have a compat mode for vga anymore?
>   
I'm not sure what you mean by VGA compatibility mode. There is no 
requirement in [U]EFI for VGA.
>   
>> 2. With CALGARY_IOMMU=y in the kernel configuration, the Calgary detection fails
>> with the message "Calgary: Unable to locate Rio Grande Table in EBDA - bailing".
>> However, the legacy kernel has no such error.
>>     
>
> Getting that message when you don't have a IBM Summit system with Calgary is expected.
> It would be more worrying why the old kernel didn't give it.
>   
All right. I will double-check into the older kernel.
>   
>> +config EFI
>> +	bool "Boot from EFI support (EXPERIMENTAL)"
>> +	default n
>>     
>
> The config should be only added after the feature works -- later in the series.
>
> Drop the default n
>   
To the extent I have tested, the feature works. Should this still be 'n'?
>   
>> +	---help---
>> +
>> +	This enables the the kernel to boot on EFI platforms using
>> +	system configuration information passed to it from the firmware.
>> +	This also enables the kernel to use any EFI runtime services that are
>> +	available (such as the EFI variable services).
>> +	This option is only useful on systems that have EFI firmware
>> +	and will result in a kernel image that is ~8k larger. However,
>> +	even with this option, the resultant kernel should continue to
>> +	boot on existing non-EFI platforms.
>>     
>
> The description should probably have a reference to the Documentation describing
> how to set this up.
>
> Mention UEFI?
>   
Will add to the doc.
>   
>> +#define EFI_SYSTAB (*((unsigned long *)(PARAM+0x1b8)))
>> +#define EFI_LOADER_SIG ((unsigned char *)(PARAM+0x1c0))
>> +#define EFI_MEMDESC_SIZE (*((unsigned int *) (PARAM+0x1c4)))
>> +#define EFI_MEMDESC_VERSION (*((unsigned int *) (PARAM+0x1c8)))
>> +#define EFI_MEMMAP_SIZE (*((unsigned int *) (PARAM+0x1cc)))
>> +#define EFI_MEMMAP (*((unsigned long *)(PARAM+0x1d0)))
>>     
>
> This needs to be documented in Documentation/i386/zero-page.txt
>
> But it might be already obsolete with the early conversion to e820 change?
>   
Not sure if this becomes obsolete with e820 change. I will look into it 
and add to the doc.
>   
>> +#define EFI_ARG_NUM_GET_TIME			2
>> +#define EFI_ARG_NUM_SET_TIME			1
>> +#define EFI_ARG_NUM_GET_WAKEUP_TIME		3
>> +#define EFI_ARG_NUM_SET_WAKEUP_TIME		2
>> +#define EFI_ARG_NUM_GET_VARIABLE		5
>> +#define EFI_ARG_NUM_GET_NEXT_VARIABLE		3
>> +#define EFI_ARG_NUM_SET_VARIABLE		5
>> +#define EFI_ARG_NUM_GET_NEXT_HIGH_MONO_COUNT	1
>> +#define EFI_ARG_NUM_RESET_SYSTEM		4
>> +#define EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP	4
>> +
>> +#define EFI_ARG_NUM_MAX 10
>> +#define EFI_REG_ARG_NUM 4
>> +
>> +extern unsigned long efi_call_phys(void *fp, u64 arg_num, ...);
>> +struct efi efi;
>> +EXPORT_SYMBOL(efi);
>> +struct efi efi_phys __initdata;
>> +struct efi_memory_map memmap ;
>> +static efi_system_table_t efi_systab __initdata;
>> +
>> +static unsigned long efi_rt_eflags;
>> +static spinlock_t efi_rt_lock = SPIN_LOCK_UNLOCKED;
>>     
>
> Each lock needs a comment what it protects and if there is a locking order.
>
>   
I will add the comments. Ditto for i386.
>> +static pgd_t save_pgd;
>>     
>
> That looks dubious, more comments later.
>
>   
>> +
>> +/* Convert SysV calling convention to EFI x86_64 calling convention */
>> +
>> +static efi_status_t uefi_call_wrapper(void *fp, unsigned long va_num, ...)
>> +{
>>     
>
> Any reason you can't do something simple like (untested)
>
> /* rdi, rsi, rdx, rcx, r8, r9 -> rcx,rdx,r8,r9,stack,stack
>    arg1 function to call */
>
> call_ms:
>         mov %rsi,%rcx
>         mov %rdx,%rdx
>         mov %rcx,%r8
>         mov %r8,%r9
>         push %r9
>         call *%rdi
>         pop %r9
>         ret
>
> I assume none of the calls has more than 6 arguments.
> ndiswrapper probably has already tested variants if you're lazy.
> Then you also wouldn't need the defines for the number of arguments.
>
> Also such code is better written in pure assembly; some versions off 
> gcc don't like clobbering of too many registers.
>   
This wrapper code was tested and added to elilo with x86_64 support. 
There are some efi calls with > 6 args as used in elilo. So, the wrapper 
code is more elaborate to deal with those cases. Although the efi calls 
used here in the kernel do not exceed 6 args, for the sake of 
generality, the same code is being used here. So, is there a better way 
to handle more number of arguments? I tested with the LIN2WINxx macros 
from ndiswrapper code ( http://ndiswrapper.sourceforge.net/joomla/) and 
that works. The LIN2WINxx defines macros to call with a specified set of 
arguments. For instance, an efi runtime call with one argument should be 
called with LIN2WIN1() and so on. If this approach is acceptable for 
inclusion, that is a possible candidate for replacement.
I had an issue with the code snippet you provided for an efi call with 5 
args to get efi variables. I didn't investigate this further.
>   
>> +static efi_status_t _efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
>> +{
>> +	return uefi_call_wrapper((void*)efi.systab->runtime->get_time,
>> +					EFI_ARG_NUM_GET_TIME,
>> +					(u64)tm,
>> +					(u64)tc
>>     
>
> Are the casts really needed? 
>   
Not needed. I fixed it.
>   
>> +static void efi_call_phys_prelog(void)
>> +{
>> +	unsigned long vaddress;
>> +
>> +	spin_lock(&efi_rt_lock);
>> +	local_irq_save(efi_rt_eflags);
>> +
>> +	vaddress = (unsigned long)__va(0x0UL);	
>> +	pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
>> +	set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
>>     
>
> Who tells you the other CPUs don't use the current pgd? If it's only used in early
> boot it should be __init. If not it's broken.
>   
This code is called during early boot. This should be __init. Ditto for 
i386 version. It too does not have __init.
> Most likely you need to create an own thread with special page tables.
>
>   
>> +	local_flush_tlb();
>>     
>
> That won't flush the global mappings.
>
>   
>> +}
>> +
>> +static void efi_call_phys_epilog(void)
>> +{
>> +	/*
>> +	 * After the lock is released, the original page table is restored.
>> +	 */
>> +	set_pgd(pgd_offset_k(0x0UL), save_pgd);
>> +	local_flush_tlb();
>>     
>
> Same
>
>   
>> +	local_irq_restore(efi_rt_eflags);
>> +	spin_unlock(&efi_rt_lock);
>> +}
>> +
>> +static efi_status_t
>> +phys_efi_set_virtual_address_map(unsigned long memory_map_size,
>> +				 unsigned long descriptor_size,
>> +				 u32 descriptor_version,
>> +				 efi_memory_desc_t *virtual_map)
>> +{
>> +	efi_status_t status;
>> +
>> +	efi_call_phys_prelog();
>> +	status = efi_call_phys(efi_phys.set_virtual_address_map,
>> +					EFI_ARG_NUM_SET_VIRTUAL_ADDRESS_MAP,
>> +					(unsigned long)memory_map_size,
>> +					(unsigned long)descriptor_size,
>> +					(unsigned long)descriptor_version, 
>> +					(unsigned long)virtual_map);
>> +	efi_call_phys_epilog();
>> +	return status;
>> +}
>> +
>> +efi_status_t
>> +phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
>> +{
>>     
>
> Looks broken -- i think that is called later, so the pgds
> can be messed up.
>   
I don't think so. For instance, when efi_get_time is called later, the 
virtual mode is set up and it does not
cause the physical mode call.
> Ok there is suspend/resume -- if you're careful you might be able
> to call this before the other CPUs are put online again. But that
> is also current being changed to use frozen CPUs. You probably
> need to coordinate with Rafael.
>
> [i suspect your resume hang is somehow related to this]
>
>   
I don't have a suspend/resume hang. The issue I documented was with 
regard to the behavior of the desktop with just one of the git kernels 
prior to 2.6.21 release. I tested the suspend/resume with the patch 
applied to 2.6.21. I tested by directly writing the state to 
/sys/power/state. However, powersave -U did not seem to do anything. 
This seems to me a user-mode utility issue rather than the kernel support.
>> +inline int efi_set_rtc_mmss(unsigned long nowtime)
>>     
>
> I think that can be called any time so definitely broken
> regarding page tables.
>   
efi init code sets up efi.get_time to a virtual mode function that can 
be called later. So, I don't think there is an issue here. Correct me if 
I'm wrong.
>   
>> +inline unsigned long efi_get_time(void)
>>     
>
> inline without static usually leads to wasted code because
> the compiler has to generate an out of line copy.
>
> I don't see why all these functions need to be inline anyways.
> Best just drop that everywhere and let the compiler decide.
>
> That would probably eliminate some of your 8k.
>   
Agreed. However, the function is declared extern in include/linux/efi.h. 
So, it can not be both static and inline. Ditto for the i386 code too.
>   
>> +	if (status != EFI_SUCCESS)
>> +		printk("Oops: efitime: can't read time status: 0x%lx\n",status);
>>     
>
> This should have suitable KERN_* prefixes (missing in most printks)
>
>   
Ok, I will fix this. Ditto for i386/efi code.
>> +/* Make EFI runtime code executable */
>>     
>
> It would be better to integrate this into the standard page table setup
> instead of cut'n'pasting so much code here.
>
>   
I've finally a version that is working with standard page table setup 
code. By the way, I got rid of the duplicate code. Also, I have 
integrated EFI memory map into e820 space.
>   
>> + * We need to map the EFI memory map again after paging_init().
>> + */
>> +void __init efi_map_memmap(void)
>> +{
>> +        efi_memory_desc_t *md;
>> +        void *p;
>> +
>> +	memmap.map = __va((unsigned long) memmap.phys_map);
>> +	memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
>>     
>
> White space broken.
>   
Will fix.
>   
>> +#if EFI_DEBUG
>> +void __init print_efi_memmap(void)
>>     
>
> The memory map should be probably always printed; it's very useful
> for debugging. But if you integrate it with e820 that code can do it.
>
>   
I integrated the EFI memory map into e820 and it takes care of printing 
that. So, print_efi_memmap() can be gotten rid of.
>> +	/*
>> +	 * Show what we know for posterity
>> +	 */
>> +	c16 = (efi_char16_t *) early_ioremap(efi.systab->fw_vendor, 2);
>> +	if (c16) {
>> +		for (i = 0; i < sizeof(vendor) && *c16; ++i) 
>> +			vendor[i] = *c16++;
>>     
>
> Probably safer to use probe_kernel_address() here and bail out if it's broken.
>
>   
I will make the needed changes.
>   
>> +		vendor[i] = '\0';
>> +	} else
>> +		printk(KERN_ERR PFX "Could not map the firmware vendor!\n");
>>     
>
> EFI should be in the error string
>   
PFX is set to EFI and that should take care.
>
>   
>> +	if (status != EFI_SUCCESS) {
>> +		printk (KERN_ALERT "You are screwed! "
>> +			"Unable to switch EFI into virtual mode "
>> +			"(status=%lx)\n", status);
>> +		panic("EFI call to SetVirtualAddressMap() failed!");
>>     
>
> Is the panic really needed?
>   
Probably not needed. This is lifted from i386 efi init code.
>   
>> diff -uprN -X linux-2.6.21rc7-git2-orig/Documentation/dontdiff linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S
>> --- linux-2.6.21rc7-git2-orig/arch/x86_64/kernel/head.S	2007-04-19 12:39:39.000000000 -0700
>> +++ linux-2.6.21rc7-git2-uefi-finaltest/arch/x86_64/kernel/head.S	2007-04-19 13:01:02.000000000 -0700
>> @@ -94,12 +94,29 @@ startup_32:
>>  	 * EFER.LMA = 1). Now we want to jump in 64bit mode, to do that we use
>>  	 * the new gdt/idt that has __KERNEL_CS with CS.L = 1.
>>  	 */
>> -	ljmp	$__KERNEL_CS, $(startup_64 - __START_KERNEL_map)
>> +	ljmp	$__KERNEL_CS, $(long64 - __START_KERNEL_map)
>>     
>
>
> What is the head.S change good for?
>   
Sorry, this is remnant code that should _not_ have been part of the patch.
>   
> -Andi
>
>   

I'm testing next set of patches with fixes and post them for review.
thanks for feedback,
- mouli

  reply	other threads:[~2007-05-25 22:46 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-01 18:59 [PATCH 2.6.21 0/3] x86_64: EFI64 support Chandramouli Narayanan
2007-05-01 18:59 ` [PATCH 2.6.21 1/3] " Chandramouli Narayanan
2007-05-03  2:56   ` Randy Dunlap
2007-05-03 19:20     ` chandramouli narayanan
2007-05-03 20:15       ` Randy Dunlap
2007-06-01 16:47       ` Yinghai Lu
2007-05-04 13:01   ` Andi Kleen
2007-05-25 22:46     ` chandramouli narayanan [this message]
2007-06-01 18:39   ` Eric W. Biederman
2007-06-01 18:44   ` Eric W. Biederman
2007-05-01 18:59 ` [PATCH 2.6.21 2/3] " Chandramouli Narayanan
2007-05-02 20:55   ` Andi Kleen
2007-05-02 22:43     ` chandramouli narayanan
2007-06-01 18:50   ` Eric W. Biederman
2007-05-01 18:59 ` [PATCH 2.6.21 3/3] " Chandramouli Narayanan
2007-06-01 18:52   ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46576741.6020806@linux.intel.com \
    --to=mouli@linux.intel.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.