* MCA Recovery for Enterprise Server
@ 2003-10-20 6:19 Hidetoshi Seto
2003-10-20 17:02 ` Luck, Tony
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Hidetoshi Seto @ 2003-10-20 6:19 UTC (permalink / raw)
To: linux-ia64
Hi.
I am now considering how to apply Linux to mission-critical enterprise
systems on IPF (Itanium Processor Family) servers. Enterprise servers generally
require high reliability and high availability, so I regard the following
features as fundamental:
- Recovery from device errors
- Recovery from intermittent corrected errors (e.g. single-bit ECC errors)
- Structured error logging
The aims of these are:
- Keep the system stable.
- Allow quick maintenance through early error detection and reporting.
The features we are working on are implemented by functions that recover the
system from hardware errors and block the affected device, judging from the
CPU/memory/chipset error severity. An outline follows:
a) Fault Location and Error Classification
Identify the affected unit and determine the error severity at the time of
the interrupt.
b) Recovery from device error
If the error is local, disable the affected devices and block operations
targeting them. Otherwise, reboot the system immediately.
c) Error Logging
A structured error log helps maintenance engineers, remote maintenance
systems, and policy-based error observers.
d) Error Prediction (from intermittent corrected errors)
To prevent an expected failure of an ailing component, check every corrected
error and alert the user once a problem is confirmed. This feature will be
implemented by a daemon in user-land.
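As a rough illustration of items b) and d), the decision could be sketched in C as below. This is only a sketch under stated assumptions: the severity levels, the action names, the `component` record, and the threshold of 3 are all hypothetical and do not come from the post or from any existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity levels; the post does not define them. */
enum err_severity { ERR_CORRECTED, ERR_RECOVERABLE, ERR_FATAL };

/* Hypothetical actions matching items b) and d) of the outline. */
enum err_action { ACT_LOG_ONLY, ACT_ALERT_USER, ACT_DISABLE_DEVICE, ACT_REBOOT };

/* Per-component record kept by the (assumed) user-land daemon. */
struct component {
    const char *name;
    unsigned corrected_count;
};

/* Arbitrary example value: alert after this many corrected errors. */
#define CORRECTED_ALERT_THRESHOLD 3

/*
 * Decide what to do with one error report.
 * is_local is nonzero when the error is confined to a single device.
 */
enum err_action decide_action(struct component *c, enum err_severity sev,
                              int is_local)
{
    assert(c != NULL);

    switch (sev) {
    case ERR_CORRECTED:
        /* d) count corrected errors, alert once they recur often enough */
        c->corrected_count++;
        return c->corrected_count >= CORRECTED_ALERT_THRESHOLD
                   ? ACT_ALERT_USER : ACT_LOG_ONLY;
    case ERR_RECOVERABLE:
        /* b) local error: fence off the device; otherwise reboot */
        return is_local ? ACT_DISABLE_DEVICE : ACT_REBOOT;
    case ERR_FATAL:
    default:
        return ACT_REBOOT;
    }
}
```

With this sketch, the first two corrected errors on a component are only logged and the third one raises an alert, while a non-local recoverable error goes straight to a reboot.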
I am planning to offer a) to c) by mid-March 2004, and d) by the end of
2005.
However, some of these features seem to depend on the platform implementation.
So I am designing a Platform-MCA (Machine Check Abort) handler for our IPF
machine.
Is there any guideline(s) to implement Platform-MCA handler?
I have found a symbol named PLATFORM_MCA_HANDLERS in /arch/ia64/kernel/mca.c,
but it seems not to work.
Also, if you know any technique for debugging MCA codes, please show me the
smart way.
Thanks.
------
H.Seto <seto.hidetoshi@jp.fujitsu.com>
* RE: MCA Recovery for Enterprise Server
From: Luck, Tony @ 2003-10-20 17:02 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]
> Is there any guideline(s) to implement Platform-MCA handler?
> I have found a symbol named PLATFORM_MCA_HANDLERS in
> /arch/ia64/kernel/mca.c, but it seems not to work.
I posted a set of three patches against 2.6.0-test5 on October 3rd.
Only the first of those parts was accepted, but since then I
have broken out some of the bug-fix components from the 3rd part
and they were accepted by David and Linus, and are part of
2.6.0-test8.
That still leaves 666 lines of patch required to get this working
for the case of MCA due to TLB fault. I've attached the remaining
part to this e-mail for reference, but without any real hope that
David will take such a large patch at this stage of 2.6.0 stabilization.
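For orientation: in the attached patch, the TLB-fault case is selected by testing bit 60 of the processor state parameter saved at hand-off (the assembly loads it from offset 56 of ia64_sal_to_os_handoff_state and uses "tbit.nz p6,p7=r18,60"). A minimal C rendering of that test might look as follows. It assumes bit 60 is the TLB-check flag, as the patch's use suggests; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position tested by "tbit.nz p6,p7=r18,60" in the attached patch. */
#define PSP_TLB_CHECK_BIT 60

/*
 * Returns 1 when the processor state parameter flags a TLB check,
 * i.e. when the patch would go on to purge and reload the TLB.
 */
static inline int psp_is_tlb_check(uint64_t proc_state_param)
{
    return (int)((proc_state_param >> PSP_TLB_CHECK_BIT) & 1u);
}
```

When this returns 0, the patch branches straight to done_tlb_purge_and_reload and skips the purge.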
> Also, if you know any technique for debugging MCA codes,
> please show me the smart way.
The "smart" way is (as always) to avoid putting bugs into the
code, especially as this is fault handler code, which has extra
challenges to debug :-) Since this approach is very hard, you'll
need either a simulator, or an ITP to allow you to set breakpoints,
examine registers and single-step. There may be pieces of the
code that could be tested by writing some surrounding support code
to debug them in a more user-friendly environment (e.g. user mode).
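As one possible example of such support code, the status decision that the attached patch adds to ia64_return_to_sal_check() can be lifted into an ordinary user-mode function and exercised with plain assertions. This is a sketch, not kernel code: `struct mock_psp` is a hypothetical stand-in for the four pal_processor_state_info_t bits the patch reads, the bit descriptions in the comments are my understanding of the names, and the value 0 for the "corrected" status is an assumption (the patch itself only shows IA64_MCA_COLD_BOOT as -2).

```c
#include <assert.h>

/* Hypothetical stand-in for the bits read from the processor state parameter. */
struct mock_psp {
    unsigned cc : 1;  /* cache check (assumed meaning) */
    unsigned bc : 1;  /* bus check (assumed meaning) */
    unsigned rc : 1;  /* register file check (assumed meaning) */
    unsigned uc : 1;  /* micro-architectural check (assumed meaning) */
};

/* -2 matches IA64_MCA_COLD_BOOT in the patch; 0 for CORRECTED is assumed. */
enum { MOCK_MCA_CORRECTED = 0, MOCK_MCA_COLD_BOOT = -2 };

/* Same condition as in the patched ia64_return_to_sal_check(), copied as posted. */
static int mock_os_status(const struct mock_psp *psp)
{
    assert(psp != 0);

    if (psp->cc == 1 && psp->bc == 1 && psp->rc == 1 && psp->uc == 1)
        return MOCK_MCA_COLD_BOOT;
    return MOCK_MCA_CORRECTED;
}
```

Running all 16 bit combinations through mock_os_status() shows that only the case with all four bits set requests a cold boot. Whether a single check such as cc alone should also force a reboot is the kind of question a harness like this makes easy to ask before going near real hardware.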
David: does "ski" have any hooks for fault injection?
-Tony Luck
[-- Attachment #2: mca-TODO-20031017.patch --]
[-- Type: application/octet-stream, Size: 18574 bytes --]
diff -ru linus-bk/arch/ia64/kernel/asm-offsets.c todo/arch/ia64/kernel/asm-offsets.c
--- linus-bk/arch/ia64/kernel/asm-offsets.c Thu Oct 16 16:18:48 2003
+++ todo/arch/ia64/kernel/asm-offsets.c Fri Oct 17 12:00:29 2003
@@ -12,6 +12,7 @@
#include <asm-ia64/ptrace.h>
#include <asm-ia64/siginfo.h>
#include <asm-ia64/sigcontext.h>
+#include <asm-ia64/mca.h>
#include "../kernel/sigframe.h"
@@ -204,4 +205,7 @@
# error "CLONE_SETTLS_BIT incorrect, please fix"
#endif
+ BLANK();
+ DEFINE(IA64_MCA_TLB_INFO_SIZE, sizeof (struct ia64_mca_tlb_info));
+
}
diff -ru linus-bk/arch/ia64/kernel/efi.c todo/arch/ia64/kernel/efi.c
--- linus-bk/arch/ia64/kernel/efi.c Thu Oct 16 16:18:48 2003
+++ todo/arch/ia64/kernel/efi.c Fri Oct 17 12:00:29 2003
@@ -30,6 +30,7 @@
#include <asm/kregs.h>
#include <asm/pgtable.h>
#include <asm/processor.h>
+#include <asm/mca.h>
#define EFI_DEBUG 0
@@ -402,6 +403,9 @@
int pal_code_count = 0;
u64 mask, psr;
u64 vaddr;
+#ifdef CONFIG_IA64_MCA
+ int cpu;
+#endif
efi_map_start = __va(ia64_boot_param->efi_memmap);
efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size;
@@ -462,6 +466,14 @@
IA64_GRANULE_SHIFT);
ia64_set_psr(psr); /* restore psr */
ia64_srlz_i();
+
+#ifdef CONFIG_IA64_MCA
+ cpu = smp_processor_id();
+
+ /* insert this TR into our list for MCA recovery purposes */
+ ia64_mca_tlb_list[cpu].pal_base=vaddr & mask;
+ ia64_mca_tlb_list[cpu].pal_paddr= pte_val(mk_pte_phys(md->phys_addr, PAGE_KERNEL));
+#endif
}
}
diff -ru linus-bk/arch/ia64/kernel/mca.c todo/arch/ia64/kernel/mca.c
--- linus-bk/arch/ia64/kernel/mca.c Thu Oct 16 16:18:48 2003
+++ todo/arch/ia64/kernel/mca.c Fri Oct 17 12:00:29 2003
@@ -78,9 +78,8 @@
u64 ia64_mca_stackframe[32];
u64 ia64_mca_bspstore[1024];
u64 ia64_init_stack[KERNEL_STACK_SIZE/8] __attribute__((aligned(16)));
-u64 ia64_mca_sal_data_area[1356];
-u64 ia64_tlb_functional;
u64 ia64_os_mca_recovery_successful;
+u64 ia64_mca_serialize;
static void ia64_mca_wakeup_ipi_wait(void);
static void ia64_mca_wakeup(int cpu);
static void ia64_mca_wakeup_all(void);
@@ -89,6 +88,8 @@
extern void ia64_slave_init_handler (void);
extern struct hw_interrupt_type irq_type_iosapic_level;
+struct ia64_mca_tlb_info ia64_mca_tlb_list[NR_CPUS];
+
static struct irqaction cmci_irqaction = {
.handler = ia64_mca_cmc_int_handler,
.flags = SA_INTERRUPT,
@@ -924,6 +925,9 @@
void
ia64_return_to_sal_check(void)
{
+ pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
+ &ia64_sal_to_os_handoff_state.proc_state_param;
+
/* Copy over some relevant stuff from the sal_to_os_mca_handoff
* so that it can be used at the time of os_mca_to_sal_handoff
*/
@@ -933,14 +937,22 @@
ia64_os_to_sal_handoff_state.imots_sal_check_ra =
ia64_sal_to_os_handoff_state.imsto_sal_check_ra;
- /* Cold Boot for uncorrectable MCA */
- ia64_os_to_sal_handoff_state.imots_os_status = IA64_MCA_COLD_BOOT;
+ /*
+ * Did we correct the error? At the moment the only error that
+ * we fix is a TLB error, if any other kind of error occurred
+ * we must reboot.
+ */
+ if (psp->cc == 1 && psp->bc == 1 && psp->rc == 1 && psp->uc == 1)
+ ia64_os_to_sal_handoff_state.imots_os_status = IA64_MCA_COLD_BOOT;
+ else
+ ia64_os_to_sal_handoff_state.imots_os_status = IA64_MCA_CORRECTED;
/* Default = tell SAL to return to same context */
ia64_os_to_sal_handoff_state.imots_context = IA64_MCA_SAME_CONTEXT;
ia64_os_to_sal_handoff_state.imots_new_min_state =
(u64 *)ia64_sal_to_os_handoff_state.pal_min_state;
+
}
/*
@@ -1318,8 +1330,8 @@
void
ia64_log_prt_guid (efi_guid_t *p_guid, prfunc_t prfunc)
{
- char out[40];
- printk(KERN_DEBUG "GUID = %s\n", efi_guid_unparse(p_guid, out));
+ //char out[40];
+ //printk(KERN_DEBUG "GUID = %s\n", efi_guid_unparse(p_guid, out));
}
static void
diff -ru linus-bk/arch/ia64/kernel/mca_asm.S todo/arch/ia64/kernel/mca_asm.S
--- linus-bk/arch/ia64/kernel/mca_asm.S Thu Oct 16 16:18:48 2003
+++ todo/arch/ia64/kernel/mca_asm.S Fri Oct 17 12:00:29 2003
@@ -13,7 +13,9 @@
// 2. Restore current thread pointer to kr6
// 3. Move stack ptr 16 bytes to conform to C calling convention
//
+//
#include <linux/config.h>
+#include <linux/threads.h>
#include <asm/asmmacro.h>
#include <asm/pgtable.h>
@@ -22,20 +24,15 @@
#include <asm/mca.h>
/*
- * When we get an machine check, the kernel stack pointer is no longer
+ * When we get a machine check, the kernel stack pointer is no longer
* valid, so we need to set a new stack pointer.
*/
#define MINSTATE_PHYS /* Make sure stack access is physical for MINSTATE */
/*
- * Needed for ia64_sal call
- */
-#define SAL_GET_STATE_INFO 0x01000001
-
-/*
* Needed for return context to SAL
*/
-#define IA64_MCA_SAME_CONTEXT 0x0
+#define IA64_MCA_SAME_CONTEXT 0
#define IA64_MCA_COLD_BOOT -2
#include "minstate.h"
@@ -71,19 +68,36 @@
* returns ptr to SAL rtn save loc in _tmp
*/
#define OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(_tmp) \
- LOAD_PHYSICAL(p6, _tmp, ia64_sal_to_os_handoff_state);; \
- LOAD_PHYSICAL(p7, _tmp, ia64_os_to_sal_handoff_state);; \
-(p6) movl r8=IA64_MCA_COLD_BOOT; \
-(p6) movl r10=IA64_MCA_SAME_CONTEXT; \
-(p6) add _tmp=0x18,_tmp;; \
-(p6) ld8 r9=[_tmp],0x10; \
-(p6) mov r22=r0;; \
-(p7) ld8 r8=[_tmp],0x08;; \
-(p7) ld8 r9=[_tmp],0x08;; \
-(p7) ld8 r10=[_tmp],0x08;; \
-(p7) ld8 r22=[_tmp],0x08;;
+ movl _tmp=ia64_os_to_sal_handoff_state;; \
+ DATA_VA_TO_PA(_tmp);; \
+ ld8 r8=[_tmp],0x08;; \
+ ld8 r9=[_tmp],0x08;; \
+ ld8 r10=[_tmp],0x08;; \
+ ld8 r22=[_tmp],0x08;;
// now _tmp is pointing to SAL rtn save location
+/*
+ * COLD_BOOT_HANDOFF_STATE() sets ia64_mca_os_to_sal_state
+ * imots_os_status=IA64_MCA_COLD_BOOT
+ * imots_sal_gp=SAL GP
+ * imots_context=IA64_MCA_SAME_CONTEXT
+ * imots_new_min_state=Min state save area pointer
+ * imots_sal_check_ra=Return address to location within SAL_CHECK
+ *
+ */
+#define COLD_BOOT_HANDOFF_STATE(sal_to_os_handoff,os_to_sal_handoff,tmp)\
+ movl tmp=IA64_MCA_COLD_BOOT; \
+ movl sal_to_os_handoff=__pa(ia64_sal_to_os_handoff_state); \
+ movl os_to_sal_handoff=__pa(ia64_os_to_sal_handoff_state);; \
+ st8 [os_to_sal_handoff]=tmp,8;; \
+ ld8 tmp=[sal_to_os_handoff],48;; \
+ st8 [os_to_sal_handoff]=tmp,8;; \
+ movl tmp=IA64_MCA_SAME_CONTEXT;; \
+ st8 [os_to_sal_handoff]=tmp,8;; \
+ ld8 tmp=[sal_to_os_handoff],-8;; \
+ st8 [os_to_sal_handoff]=tmp,8;; \
+ ld8 tmp=[sal_to_os_handoff];; \
+ st8 [os_to_sal_handoff]=tmp;;
.global ia64_os_mca_dispatch
.global ia64_os_mca_dispatch_end
@@ -94,20 +108,21 @@
.global ia64_mca_stackframe
.global ia64_mca_bspstore
.global ia64_init_stack
- .global ia64_mca_sal_data_area
- .global ia64_tlb_functional
.text
.align 16
ia64_os_mca_dispatch:
-#if defined(MCA_TEST)
- // Pretend that we are in interrupt context
- mov r2=psr
- dep r2=0, r2, PSR_IC, 2;
- mov psr.l = r2
-#endif /* #if defined(MCA_TEST) */
+ // Serialize all MCA processing
+// movl r2=ia64_mca_serialize
+ mov r3=1;;
+// DATA_VA_TO_PA(r2);;
+ LOAD_PHYSICAL(p0,r2,ia64_mca_serialize);;
+ia64_os_mca_spin:
+ xchg8 r4=[r2],r3;;
+ cmp.ne p6,p0=r4,r0
+(p6) br ia64_os_mca_spin
// Save the SAL to OS MCA handoff state as defined
// by SAL SPEC 3.0
@@ -124,6 +139,191 @@
ia64_os_mca_done_dump:
+// movl r16=__pa(ia64_sal_to_os_handoff_state)+56
+ LOAD_PHYSICAL(p0,r16,ia64_sal_to_os_handoff_state+56)
+ ;;
+ ld8 r18=[r16] // Get processor state parameter on existing PALE_CHECK.
+ ;;
+ tbit.nz p6,p7=r18,60
+(p7) br.spnt done_tlb_purge_and_reload
+
+ // The following code purges TC and TR entries. Then reload all TC entries.
+ // Purge percpu data TC entries.
+begin_tlb_purge_and_reload:
+ mov r16=cr.lid
+// movl r17=__pa(ia64_mca_tlb_list) // Physical address of ia64_mca_tlb_list
+ LOAD_PHYSICAL(p0,r17,ia64_mca_tlb_list) // Physical address of ia64_mca_tlb_list
+ mov r19=0
+ mov r20=NR_CPUS
+ ;;
+1: cmp.eq p6,p7=r19,r20
+(p6) br.spnt.few err
+ ld8 r18=[r17],IA64_MCA_TLB_INFO_SIZE
+ ;;
+ add r19=1,r19
+ cmp.eq p6,p7=r18,r16
+(p7) br.sptk.few 1b
+ ;;
+ adds r17=-IA64_MCA_TLB_INFO_SIZE,r17
+ ;;
+ mov r23=r17 // save current ia64_mca_percpu_info addr pointer.
+ adds r17=16,r17
+ ;;
+ .global aegl
+aegl:
+ ld8 r18=[r17],8 // r18=ptce_base
+ ;;
+ ld4 r19=[r17],4 // r19=ptce_count[0]
+ ;;
+ ld4 r20=[r17],4 // r20=ptce_count[1]
+ ;;
+ ld4 r21=[r17],4 // r21=ptce_stride[0]
+ mov r24=0
+ ;;
+ ld4 r22=[r17],4 // r22=ptce_stride[1]
+ adds r20=-1,r20
+ ;;
+2:
+ cmp.ltu p6,p7=r24,r19
+(p7) br.cond.dpnt.few 4f
+ mov ar.lc=r20
+3:
+ ptc.e r18
+ ;;
+ add r18=r22,r18
+ br.cloop.sptk.few 3b
+ ;;
+ add r18=r21,r18
+ add r24=1,r24
+ ;;
+ br.sptk.few 2b
+4:
+ srlz.i // srlz.i implies srlz.d
+ ;;
+
+ // Now purge addresses formerly mapped by TR registers
+ // 1. Purge ITR&DTR for kernel.
+ movl r16=KERNEL_START
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ ;;
+ ptr.i r16, r18
+ ptr.d r16, r18
+ ;;
+ srlz.i
+ ;;
+ srlz.d
+ ;;
+ // 2. Purge DTR for PERCPU data.
+ movl r16=PERCPU_ADDR
+ mov r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.d
+ ;;
+ // 3. Purge ITR for PAL code.
+ adds r17=48,r23
+ ;;
+ ld8 r16=[r17]
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.i r16,r18
+ ;;
+ srlz.i
+ ;;
+ // 4. Purge DTR for stack.
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r16=r19,r16
+ mov r18=IA64_GRANULE_SHIFT<<2
+ ;;
+ ptr.d r16,r18
+ ;;
+ srlz.i
+ ;;
+ // Finally reload the TR registers.
+ // 1. Reload DTR/ITR registers for kernel.
+ mov r18=KERNEL_TR_PAGE_SHIFT<<2
+ movl r17=KERNEL_START
+ ;;
+ mov cr.itir=r18
+ mov cr.ifa=r17
+ mov r16=IA64_TR_KERNEL
+ mov r19=ip
+ movl r18=PAGE_KERNEL
+ ;;
+ dep r17=0,r19,0, KERNEL_TR_PAGE_SHIFT
+ ;;
+ or r18=r17,r18
+ ;;
+ itr.i itr[r16]=r18
+ ;;
+ itr.d dtr[r16]=r18
+ ;;
+ srlz.i
+ srlz.d
+ ;;
+ // 2. Reload DTR register for PERCPU data.
+ adds r17=8,r23
+ movl r16=PERCPU_ADDR // vaddr
+ movl r18=PERCPU_PAGE_SHIFT<<2
+ ;;
+ mov cr.itir=r18
+ mov cr.ifa=r16
+ ;;
+ ld8 r18=[r17] // pte
+ mov r16=IA64_TR_PERCPU_DATA;
+ ;;
+ itr.d dtr[r16]=r18
+ ;;
+ srlz.d
+ ;;
+ // 3. Reload ITR for PAL code.
+ adds r17=40,r23
+ ;;
+ ld8 r18=[r17],8 // pte
+ ;;
+ ld8 r16=[r17] // vaddr
+ mov r19=IA64_GRANULE_SHIFT<<2
+ ;;
+ mov cr.itir=r19
+ mov cr.ifa=r16
+ mov r20=IA64_TR_PALCODE
+ ;;
+ itr.i itr[r20]=r18
+ ;;
+ srlz.i
+ ;;
+ // 4. Reload DTR for stack.
+ mov r16=IA64_KR(CURRENT_STACK)
+ ;;
+ shl r16=r16,IA64_GRANULE_SHIFT
+ movl r19=PAGE_OFFSET
+ ;;
+ add r18=r19,r16
+ movl r20=PAGE_KERNEL
+ ;;
+ add r16=r20,r16
+ mov r19=IA64_GRANULE_SHIFT<<2
+ ;;
+ mov cr.itir=r19
+ mov cr.ifa=r18
+ mov r20=IA64_TR_CURRENT_STACK
+ ;;
+ itr.d dtr[r20]=r16
+ ;;
+ srlz.d
+ ;;
+ br.sptk.many done_tlb_purge_and_reload
+err:
+ COLD_BOOT_HANDOFF_STATE(r20,r21,r22)
+ br.sptk.many ia64_os_mca_done_restore
+
+done_tlb_purge_and_reload:
+
// Setup new stack frame for OS_MCA handling
movl r2=ia64_mca_bspstore;; // local bspstore area location in r2
DATA_VA_TO_PA(r2);;
@@ -137,17 +337,11 @@
// (C calling convention)
DATA_VA_TO_PA(r12);;
- // Check to see if the MCA resulted from a TLB error
-begin_tlb_error_check:
- br ia64_os_mca_tlb_error_check;;
-
-done_tlb_error_check:
-
- // If TLB is functional, enter virtual mode from physical mode
+ // Enter virtual mode from physical mode
VIRTUAL_MODE_ENTER(r2, r3, ia64_os_mca_virtual_begin, r4)
ia64_os_mca_virtual_begin:
- // call our handler
+ // Call virtual mode handler
movl r2=ia64_mca_ucmc_handler;;
mov b6=r2;;
br.call.sptk.many b0=b6;;
@@ -156,13 +350,6 @@
PHYSICAL_MODE_ENTER(r2, r3, ia64_os_mca_virtual_end, r4)
ia64_os_mca_virtual_end:
-#if defined(MCA_TEST)
- // Pretend that we are in interrupt context
- mov r2=psr;;
- dep r2=0, r2, PSR_IC, 2;;
- mov psr.l = r2;;
-#endif /* #if defined(MCA_TEST) */
-
// restore the original stack frame here
movl r2=ia64_mca_stackframe // restore stack frame from memory at r2
;;
@@ -178,14 +365,16 @@
br ia64_os_mca_proc_state_restore;;
ia64_os_mca_done_restore:
- movl r3=ia64_tlb_functional;;
- DATA_VA_TO_PA(r3);;
- ld8 r3=[r3];;
- cmp.eq p6,p7=r0,r3;;
OS_MCA_TO_SAL_HANDOFF_STATE_RESTORE(r2);;
// branch back to SALE_CHECK
ld8 r3=[r2];;
mov b0=r3;; // SAL_CHECK return address
+
+ // release lock
+ movl r3=ia64_mca_serialize;;
+ DATA_VA_TO_PA(r3);;
+ st8.rel [r3]=r0
+
br b0
;;
ia64_os_mca_dispatch_end:
@@ -205,8 +394,9 @@
ia64_os_mca_proc_state_dump:
// Save bank 1 GRs 16-31 which will be used by c-language code when we switch
// to virtual addressing mode.
- movl r2=ia64_mca_proc_state_dump;; // Os state dump area
- DATA_VA_TO_PA(r2) // convert to to physical address
+// movl r2=ia64_mca_proc_state_dump;; // Os state dump area
+// DATA_VA_TO_PA(r2) // convert to to physical address
+ LOAD_PHYSICAL(p0,r2,ia64_mca_proc_state_dump)// convert OS state dump area to physical address
// save ar.NaT
mov r5=ar.unat // ar.unat
@@ -658,79 +848,6 @@
//EndStub//////////////////////////////////////////////////////////////////////
-//++
-// Name:
-// ia64_os_mca_tlb_error_check()
-//
-// Stub Description:
-//
-// This stub checks to see if the MCA resulted from a TLB error
-//
-//--
-
-ia64_os_mca_tlb_error_check:
-
- // Retrieve sal data structure for uncorrected MCA
-
- // Make the ia64_sal_get_state_info() call
- movl r4=ia64_mca_sal_data_area;;
- movl r7=ia64_sal;;
- mov r6=r1 // save gp
- DATA_VA_TO_PA(r4) // convert to physical address
- DATA_VA_TO_PA(r7);; // convert to physical address
- ld8 r7=[r7] // get addr of pdesc from ia64_sal
- movl r3=SAL_GET_STATE_INFO;;
- DATA_VA_TO_PA(r7);; // convert to physical address
- ld8 r8=[r7],8;; // get pdesc function pointer
- dep r8=0,r8,61,3;; // convert SAL VA to PA
- ld8 r1=[r7];; // set new (ia64_sal) gp
- dep r1=0,r1,61,3;; // convert SAL VA to PA
- mov b6=r8
-
- alloc r5=ar.pfs,8,0,8,0;; // allocate stack frame for SAL call
- mov out0=r3 // which SAL proc to call
- mov out1=r0 // error type == MCA
- mov out2=r0 // null arg
- mov out3=r4 // data copy area
- mov out4=r0 // null arg
- mov out5=r0 // null arg
- mov out6=r0 // null arg
- mov out7=r0;; // null arg
-
- br.call.sptk.few b0=b6;;
-
- mov r1=r6 // restore gp
- mov ar.pfs=r5;; // restore ar.pfs
-
- movl r6=ia64_tlb_functional;;
- DATA_VA_TO_PA(r6) // needed later
-
- cmp.eq p6,p7=r0,r8;; // check SAL call return address
-(p7) st8 [r6]=r0 // clear tlb_functional flag
-(p7) br tlb_failure // error; return to SAL
-
- // examine processor error log for type of error
- add r4=40+24,r4;; // parse past record header (length=40)
- // and section header (length=24)
- ld4 r4=[r4] // get valid field of processor log
- mov r5=0xf00;;
- and r5=r4,r5;; // read bits 8-11 of valid field
- // to determine if we have a TLB error
- movl r3=0x1
- cmp.eq p6,p7=r0,r5;;
- // if no TLB failure, set tlb_functional flag
-(p6) st8 [r6]=r3
- // else clear flag
-(p7) st8 [r6]=r0
-
- // if no TLB failure, continue with normal virtual mode logging
-(p6) br done_tlb_error_check
- // else no point in entering virtual mode for logging
-tlb_failure:
- br ia64_os_mca_virtual_end
-
-//EndStub//////////////////////////////////////////////////////////////////////
-
// ok, the issue here is that we need to save state information so
// it can be useable by the kernel debugger and show regs routines.
diff -ru linus-bk/arch/ia64/mm/init.c todo/arch/ia64/mm/init.c
--- linus-bk/arch/ia64/mm/init.c Thu Oct 16 16:18:48 2003
+++ todo/arch/ia64/mm/init.c Fri Oct 17 12:00:29 2003
@@ -33,6 +33,7 @@
#include <asm/tlb.h>
#include <asm/uaccess.h>
#include <asm/unistd.h>
+#include <asm/mca.h>
DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
@@ -274,6 +275,10 @@
{
unsigned long psr, pta, impl_va_bits;
extern void __init tlb_init (void);
+#ifdef CONFIG_IA64_MCA
+ int cpu;
+#endif
+
#ifdef CONFIG_DISABLE_VHPT
# define VHPT_ENABLE_BIT 0
#else
@@ -332,6 +337,23 @@
ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | VHPT_ENABLE_BIT);
ia64_tlb_init();
+
+#ifdef CONFIG_IA64_MCA
+ cpu = smp_processor_id();
+
+ /* mca handler uses cr.lid as key to pick the right entry */
+ ia64_mca_tlb_list[cpu].cr_lid = ia64_getreg(_IA64_REG_CR_LID);
+
+ /* insert this percpu data information into our list for MCA recovery purposes */
+ ia64_mca_tlb_list[cpu].percpu_paddr=pte_val(mk_pte_phys(__pa(my_cpu_data), PAGE_KERNEL));
+ /* Also save per-cpu tlb flush recipe for use in physical mode mca handler */
+ ia64_mca_tlb_list[cpu].ptce_base=local_cpu_data->ptce_base;
+ ia64_mca_tlb_list[cpu].ptce_count[0]=local_cpu_data->ptce_count[0];
+ ia64_mca_tlb_list[cpu].ptce_count[1]=local_cpu_data->ptce_count[1];
+ ia64_mca_tlb_list[cpu].ptce_stride[0]=local_cpu_data->ptce_stride[0];
+ ia64_mca_tlb_list[cpu].ptce_stride[1]=local_cpu_data->ptce_stride[1];
+#endif
+
}
#ifdef CONFIG_VIRTUAL_MEM_MAP
diff -ru linus-bk/include/asm-ia64/mca.h todo/include/asm-ia64/mca.h
--- linus-bk/include/asm-ia64/mca.h Thu Oct 16 16:19:51 2003
+++ todo/include/asm-ia64/mca.h Fri Oct 17 12:01:02 2003
@@ -18,6 +18,7 @@
#include <asm/param.h>
#include <asm/sal.h>
#include <asm/processor.h>
+#include <asm/mca_asm.h>
/* These are the return codes from all the IA64_MCA specific interfaces */
typedef int ia64_mca_return_code_t;
@@ -61,6 +62,17 @@
IA64_MCA_RENDEZ_CHECKIN_DONE = 0x1
};
+/* the following data structure is used for TLB error recovery purposes */
+extern struct ia64_mca_tlb_info {
+ u64 cr_lid;
+ u64 percpu_paddr;
+ u64 ptce_base;
+ u32 ptce_count[2];
+ u32 ptce_stride[2];
+ u64 pal_paddr;
+ u64 pal_base;
+} ia64_mca_tlb_list[NR_CPUS];
+
/* Information maintained by the MC infrastructure */
typedef struct ia64_mc_info_s {
u64 imi_mca_handler;
diff -ru linus-bk/include/asm-ia64/pgtable.h todo/include/asm-ia64/pgtable.h
--- linus-bk/include/asm-ia64/pgtable.h Thu Oct 16 16:19:51 2003
+++ todo/include/asm-ia64/pgtable.h Fri Oct 17 12:00:29 2003
@@ -229,6 +229,10 @@
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
+/* This takes a physical page address that is used by the remapping functions */
+#define mk_pte_phys(physpage, pgprot) \
+({ pte_t __pte; pte_val(__pte) = physpage + pgprot_val(pgprot); __pte; })
+
#define pte_modify(_pte, newprot) \
(__pte((pte_val(_pte) & _PAGE_CHG_MASK) | pgprot_val(newprot)))
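The physical-mode recovery path in mca_asm.S above does two things that are easier to follow in C: it searches ia64_mca_tlb_list for the entry whose cr_lid matches this CPU, then walks the saved ptc.e recipe. The sketch below models both in user mode. It is an illustration only: the struct is a copy of ia64_mca_tlb_info with u64/u32 mapped to stdint types, and `find_entry`, `walk_ptce`, `purge_log`, and `record_purge` are hypothetical names. The static assertions check the hand-coded offsets (8, 16, 40, 48) that the assembly adds to the entry pointer in r23.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-mode copy of struct ia64_mca_tlb_info from the patch. */
struct mca_tlb_info {
    uint64_t cr_lid;
    uint64_t percpu_paddr;
    uint64_t ptce_base;
    uint32_t ptce_count[2];
    uint32_t ptce_stride[2];
    uint64_t pal_paddr;
    uint64_t pal_base;
};

/* Offsets the assembly hard-codes relative to the entry pointer (r23). */
_Static_assert(offsetof(struct mca_tlb_info, percpu_paddr) == 8, "adds r17=8,r23");
_Static_assert(offsetof(struct mca_tlb_info, ptce_base) == 16, "adds r17=16,r23");
_Static_assert(offsetof(struct mca_tlb_info, pal_paddr) == 40, "adds r17=40,r23");
_Static_assert(offsetof(struct mca_tlb_info, pal_base) == 48, "adds r17=48,r23");

/* Linear search by cr.lid, like the "1:" loop; NULL maps to the "err" path. */
static const struct mca_tlb_info *
find_entry(const struct mca_tlb_info *list, size_t n, uint64_t lid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].cr_lid == lid)
            return &list[i];
    return NULL;
}

/*
 * Walk the ptc.e recipe in the same order as loops "2:" and "3:".
 * purge() stands in for the ptc.e instruction; returns the number of calls.
 */
static size_t walk_ptce(const struct mca_tlb_info *e,
                        void (*purge)(uint64_t addr, void *ctx), void *ctx)
{
    uint64_t addr = e->ptce_base;
    size_t calls = 0;

    /* The assembly loads ar.lc with count[1]-1, so it needs count[1] >= 1. */
    assert(e->ptce_count[1] >= 1);

    for (uint32_t i = 0; i < e->ptce_count[0]; i++) {
        for (uint32_t j = 0; j < e->ptce_count[1]; j++) {
            purge(addr, ctx);
            addr += e->ptce_stride[1];
            calls++;
        }
        addr += e->ptce_stride[0];
    }
    return calls;
}

/* Example callback: record each purged address in a small buffer. */
struct purge_log {
    uint64_t addr[16];
    size_t n;
};

static void record_purge(uint64_t addr, void *ctx)
{
    struct purge_log *log = ctx;

    if (log->n < 16)
        log->addr[log->n++] = addr;
}
```

Because the assembly relies on fixed offsets rather than on generated constants for most fields (only IA64_MCA_TLB_INFO_SIZE comes from asm-offsets.c), any reordering of the struct in mca.h would silently break the handler; compile-time checks like the ones above are one way to catch that.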
* RE: MCA Recovery for Enterprise Server
From: David Mosberger @ 2003-10-20 20:42 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 20 Oct 2003 10:02:21 -0700, "Luck, Tony" <tony.luck@intel.com> said:
Tony> David: does "ski" have any hooks for fault injection?
Not that I know of. However, wouldn't it be possible to extend
fw-emu.c to do such things? You could hit Ctrl-C at any point in
time, then force a branch to a special fault-injection routine. That
ought to be workable, though setting up all the info etc. is probably
non-trivial.
--david
* Re: MCA Recovery for Enterprise Server
From: Hidetoshi Seto @ 2003-10-27 6:44 UTC (permalink / raw)
To: linux-ia64
Hi Tony,
I think my description was too brief or vague to give you a concrete picture
of the implementation. What I intended was to present my basic idea for
general MCA handling on Linux.
I will explain in more detail in my next post.
Thanks.
------
H.Seto <seto.hidetoshi@jp.fujitsu.com>