* fix memory corruption/crash for physical-mode EFI calls
@ 2004-07-10 4:39 David Mosberger
2004-07-10 16:54 ` Jesse Barnes
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: David Mosberger @ 2004-07-10 4:39 UTC (permalink / raw)
To: linux-ia64
Jesse,
I think you'll want to try out the patch below. If my guess is
correct and the SGI firmware doesn't support switching into virtual
mode, then this may fix the boot-problem you are seeing on SN2.
We found this problem after I noticed that the Ski simulator always
wanted to fsck its filesystem. That turned out to be because the
phys_get_time() in efi.c used __pa() to convert the address of a
stack-variable to a physical address. Only problem was that the
stack-variable was on the init-task's stack, so it was in region 5.
Effectively, this ended up writing the correct time to a bogus memory
address. In the simulator, that was harmless apart from not returning
the correct time, but in a real machine, it would likely lead to a
machine-check. We never saw this problem on tiger or zx1-based
machines because by the time efi_get_time() is called, they have
switched EFI into virtual mode, which obviates the need to do the
virtual->physical conversion.
The patch below looks bigger than what's really going on: all it does
is convert __pa() to ia64_tpa(), with some extra code to allow NULL
pointers for optional arguments.
Happy hacking,
--david
===== arch/ia64/kernel/efi.c 1.34 vs edited =====
--- 1.34/arch/ia64/kernel/efi.c 2004-05-10 11:38:38 -07:00
+++ edited/arch/ia64/kernel/efi.c 2004-07-09 21:24:38 -07:00
@@ -43,18 +43,20 @@
#define efi_call_virt(f, args...) (*(f))(args)
-#define STUB_GET_TIME(prefix, adjust_arg) \
-static efi_status_t \
-prefix##_get_time (efi_time_t *tm, efi_time_cap_t *tc) \
-{ \
- struct ia64_fpreg fr[6]; \
- efi_status_t ret; \
- \
- ia64_save_scratch_fpregs(fr); \
- ret = efi_call_##prefix((efi_get_time_t *) __va(runtime->get_time), adjust_arg(tm), \
- adjust_arg(tc)); \
- ia64_load_scratch_fpregs(fr); \
- return ret; \
+#define STUB_GET_TIME(prefix, adjust_arg) \
+static efi_status_t \
+prefix##_get_time (efi_time_t *tm, efi_time_cap_t *tc) \
+{ \
+ struct ia64_fpreg fr[6]; \
+ efi_time_cap_t *atc = 0; \
+ efi_status_t ret; \
+ \
+ if (tc) \
+ atc = adjust_arg(tc); \
+ ia64_save_scratch_fpregs(fr); \
+ ret = efi_call_##prefix((efi_get_time_t *) __va(runtime->get_time), adjust_arg(tm), atc); \
+ ia64_load_scratch_fpregs(fr); \
+ return ret; \
}
#define STUB_SET_TIME(prefix, adjust_arg) \
@@ -89,11 +91,14 @@
prefix##_set_wakeup_time (efi_bool_t enabled, efi_time_t *tm) \
{ \
struct ia64_fpreg fr[6]; \
+ efi_time_t *atm = 0; \
efi_status_t ret; \
\
+ if (tm) \
+ atm = adjust_arg(tm); \
ia64_save_scratch_fpregs(fr); \
ret = efi_call_##prefix((efi_set_wakeup_time_t *) __va(runtime->set_wakeup_time), \
- enabled, adjust_arg(tm)); \
+ enabled, atm); \
ia64_load_scratch_fpregs(fr); \
return ret; \
}
@@ -104,11 +109,14 @@
unsigned long *data_size, void *data) \
{ \
struct ia64_fpreg fr[6]; \
+ u32 *aattr = 0; \
efi_status_t ret; \
\
+ if (attr) \
+ aattr = adjust_arg(attr); \
ia64_save_scratch_fpregs(fr); \
ret = efi_call_##prefix((efi_get_variable_t *) __va(runtime->get_variable), \
- adjust_arg(name), adjust_arg(vendor), adjust_arg(attr), \
+ adjust_arg(name), adjust_arg(vendor), aattr, \
adjust_arg(data_size), adjust_arg(data)); \
ia64_load_scratch_fpregs(fr); \
return ret; \
@@ -164,33 +172,41 @@
unsigned long data_size, efi_char16_t *data) \
{ \
struct ia64_fpreg fr[6]; \
+ efi_char16_t *adata = 0; \
+ \
+ if (data) \
+ adata = adjust_arg(data); \
\
ia64_save_scratch_fpregs(fr); \
efi_call_##prefix((efi_reset_system_t *) __va(runtime->reset_system), \
- reset_type, status, data_size, adjust_arg(data)); \
+ reset_type, status, data_size, adata); \
/* should not return, but just in case... */ \
ia64_load_scratch_fpregs(fr); \
}
-STUB_GET_TIME(phys, __pa)
-STUB_SET_TIME(phys, __pa)
-STUB_GET_WAKEUP_TIME(phys, __pa)
-STUB_SET_WAKEUP_TIME(phys, __pa)
-STUB_GET_VARIABLE(phys, __pa)
-STUB_GET_NEXT_VARIABLE(phys, __pa)
-STUB_SET_VARIABLE(phys, __pa)
-STUB_GET_NEXT_HIGH_MONO_COUNT(phys, __pa)
-STUB_RESET_SYSTEM(phys, __pa)
-
-STUB_GET_TIME(virt, )
-STUB_SET_TIME(virt, )
-STUB_GET_WAKEUP_TIME(virt, )
-STUB_SET_WAKEUP_TIME(virt, )
-STUB_GET_VARIABLE(virt, )
-STUB_GET_NEXT_VARIABLE(virt, )
-STUB_SET_VARIABLE(virt, )
-STUB_GET_NEXT_HIGH_MONO_COUNT(virt, )
-STUB_RESET_SYSTEM(virt, )
+#define phys_ptr(arg) ((__typeof__(arg)) ia64_tpa(arg))
+
+STUB_GET_TIME(phys, phys_ptr)
+STUB_SET_TIME(phys, phys_ptr)
+STUB_GET_WAKEUP_TIME(phys, phys_ptr)
+STUB_SET_WAKEUP_TIME(phys, phys_ptr)
+STUB_GET_VARIABLE(phys, phys_ptr)
+STUB_GET_NEXT_VARIABLE(phys, phys_ptr)
+STUB_SET_VARIABLE(phys, phys_ptr)
+STUB_GET_NEXT_HIGH_MONO_COUNT(phys, phys_ptr)
+STUB_RESET_SYSTEM(phys, phys_ptr)
+
+#define id(arg) arg
+
+STUB_GET_TIME(virt, id)
+STUB_SET_TIME(virt, id)
+STUB_GET_WAKEUP_TIME(virt, id)
+STUB_SET_WAKEUP_TIME(virt, id)
+STUB_GET_VARIABLE(virt, id)
+STUB_GET_NEXT_VARIABLE(virt, id)
+STUB_SET_VARIABLE(virt, id)
+STUB_GET_NEXT_HIGH_MONO_COUNT(virt, id)
+STUB_RESET_SYSTEM(virt, id)
void
efi_gettimeofday (struct timespec *ts)
* Re: fix memory corruption/crash for physical-mode EFI calls
From: Jesse Barnes @ 2004-07-10 16:54 UTC (permalink / raw)
To: linux-ia64
On Friday, July 9, 2004 9:39 pm, David Mosberger wrote:
> I think you'll want to try out the patch below. If my guess is
> correct and the SGI firmware doesn't support switching into virtual
> mode, then this may fix the boot-problem you are seeing on SN2.
I believe we do. IIRC we can call into our PROM in either physical or virtual
mode.
> We found this problem after I noticed that the Ski simulator always
> wanted to fsck its filesystem. That turned out to be because the
> phys_get_time() in efi.c used __pa() to convert the address of a
> stack-variable to a physical address. Only problem was that the
> stack-variable was on the init-task's stack, so it was in region 5.
> Effectively, this ended up writing the correct time to a bogus memory
> address. In the simulator, that was harmless apart from not returning
> the correct time, but in a real machine, it would likely lead to a
> machine-check. We never saw this problem on tiger or zx1-based
> machines because by the time efi_get_time() is called, they have
> switched EFI into virtual mode, which obviates the need to do the
> virtual->physical conversion.
>
> The patch below looks bigger than what's really going on: all it does
> is convert __pa() to ia64_tpa(), with some extra code to allow NULL
> pointers for optional arguments.
That sounds like a real bug, but applying the patch doesn't help with the
MCA/hang I see on sn2:
Linux version 2.6.7 (jbarnes@tomahawk.engr.sgi.com) (gcc version 3.3.2) #17 SMP Sat Jul 10 09:48:17 PDT 2004
EFI v1.02 by SGI: SALsystab=0x30047e4ed0 ACPI 2.0=0x30047e56a0
ACPI: RSDP (v002 SGI ) @ 0x00000030047e56a0
ACPI: XSDT (v001 SGI XSDTSN2 0x00010001 0x00000001) @ 0x00000030047e56e0
ACPI: MADT (v001 SGI APICSN2 0x00010001 0x00000001) @ 0x00000030047e5740
ACPI: SRAT (v001 SGI SRATSN2 0x00010001 0x00000001) @ 0x00000030047e57a0
ACPI: SLIT (v001 SGI SLITSN2 0x00010001 0x00000001) @ 0x00000030047e5830
ACPI: FADT (v003 SGI FACPSN2 0x00030001 0x00000001) @ 0x00000030047e5900
ACPI: DSDT (v001 SGI DSDTSN2 0x00010001 0x00000001) @ 0x00000030047e58c0
ACPI: DSDT (v001 SGI DSDTSN2 0x00010001 0x00000001) @ 0x0000000000000000
ACPI: SRAT revision 0
ACPI: SLIT localities 1x1
Number of logical nodes in system = 1
Number of memory chunks in system = 1
SAL 2.9: SGI SN2 version 3.31
SAL Platform features: ITC_Drift
SAL: AP wakeup using external interrupt vector 0x12
POD entered via MCA, using Cac mode
0 000: POD SysCt Cac>
(POD is the built-in debugger that's entered when a machine check occurs.)
Thanks,
Jesse
* Re: fix memory corruption/crash for physical-mode EFI calls
From: David Mosberger @ 2004-07-12 20:41 UTC (permalink / raw)
To: linux-ia64
>>>>> On Sat, 10 Jul 2004 09:54:05 -0700, Jesse Barnes <jbarnes@engr.sgi.com> said:
Jesse> On Friday, July 9, 2004 9:39 pm, David Mosberger wrote:
>> I think you'll want to try out the patch below. If my guess is
>> correct and the SGI firmware doesn't support switching into
>> virtual mode, then this may fix the boot-problem you are seeing
>> on SN2.
Jesse> I believe we do. IIRC we can call into our PROM in either
Jesse> physical or virtual mode.
Ah, in that case the patch won't help. Too bad.
--david
* Re: fix memory corruption/crash for physical-mode EFI calls
From: Jack Steiner @ 2004-07-13 14:54 UTC (permalink / raw)
To: linux-ia64
On Fri, Jul 09, 2004 at 09:39:00PM -0700, David Mosberger wrote:
> Jesse,
>
> I think you'll want to try out the patch below. If my guess is
> correct and the SGI firmware doesn't support switching into virtual
> mode, then this may fix the boot-problem you are seeing on SN2.
FYI, I looked at the SGI boot failures that are occurring on recent 2.6 kernels
that have INIT_TASK in region 5. The boot failures are caused by a bug in our
PROM. It incorrectly assumes that the stack is identity mapped.
(Don't ask... :-)
Moving the stack to region 5 causes an oops in some PAL calls. It
goes downhill from there: sending signals to processes early
in boot, before kmem is initialized, doesn't get very far. The result is
an MCA that has nothing to do with the original failure but adds a lot
of confusion.
We'll fix our PROM....
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.
* Re: fix memory corruption/crash for physical-mode EFI calls
From: Jesse Barnes @ 2004-07-13 15:18 UTC (permalink / raw)
To: linux-ia64
On Tuesday, July 13, 2004 10:54 am, Jack Steiner wrote:
> On Fri, Jul 09, 2004 at 09:39:00PM -0700, David Mosberger wrote:
> > Jesse,
> >
> > I think you'll want to try out the patch below. If my guess is
> > correct and the SGI firmware doesn't support switching into virtual
> > mode, then this may fix the boot-problem you are seeing on SN2.
>
> FYI, I looked at the SGI boot failures that are occurring on recent 2.6
> kernels that have INIT_TASK in region 5. The boot failures are caused by a
> bug in our PROM. It incorrectly assumes that the stack is identity mapped.
> (Don't ask... :-)
Ugh, I remembered that we had code for calling in virtual or physical mode,
but I should have checked to make sure it didn't assume identity mapping!
> We'll fix our PROM....
Thanks,
Jesse
* Re: fix memory corruption/crash for physical-mode EFI calls
From: David Mosberger @ 2004-07-13 22:00 UTC (permalink / raw)
To: linux-ia64
>>>>> On Tue, 13 Jul 2004 09:54:13 -0500, Jack Steiner <steiner@sgi.com> said:
Jack> FYI, I looked at the SGI boot failures that are occurring on
Jack> recent 2.6 kernels that have INIT_TASK in region 5. The boot
Jack> failures are caused by a bug in our PROM. It incorrectly
Jack> assumes that the stack is identity mapped.
That would do it. Glad you found the culprit. One mystery we can
scratch off.
--david