Attached is a patch that adds lightweight memory scrubbing for memory
errors reported by CMCs (corrected machine checks) and CPEs (corrected
platform errors). The goal is simply to mark addresses reported by these
corrected errors as dirty so that the corrected value gets written back
to memory. For platforms that do not support hardware memory scrubbing,
this should help ensure that single-bit errors don't become multi-bit
errors, and should reduce the occurrence of multiple CMCs for the same
memory address. I'm assuming that platforms that do support hardware
scrubbing will fix single-bit errors at the chipset, eliminating the CMC
and thus making this addition extremely lightweight.

To scrub the memory, I simply issue an lfetch.excl to the faulting
address. According to the Itanium 2 optimization guide, this looks like
a write on the bus and puts the cacheline in the M(odified) state.
Thanks to David for recommending this method of scrubbing.

To determine if an address needs scrubbing, I look for the following:

  CMC - bus error w/ the eb (external bus) bit set
  CPE - memory device error

Ideally for the CMC, we could get the target address from the bus error
log. Unfortunately, the CMC hardly ever (never, in my experience) marks
the target address as valid. Therefore, if I see the signature from the
CMC but no target address, I kick the CPE poll to trigger (if we're in
polling mode for CPEs). I've also updated the CPE polling to poll on all
processors. For multi-node systems, this makes sure we get all the logs
we're after.

This patch also fixes the timestamp for MCA logs. The date had already
been correctly changed to print as BCD, but the time was still being
printed as decimal.

This patch applies cleanly against 2.4.20 + ia64-021210 (I think 2.5 is
missing the CPE polling patch, which causes failures). Feedback welcome.

Thanks,

	Alex

-- 
Alex Williamson                                  Linux Development Lab
alex_williamson@hp.com                                 Hewlett Packard
970-898-9173                                          Fort Collins, CO
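
The decision logic described in this message (which error signatures
trigger a scrub, and when the CPE poll gets kicked) can be sketched
roughly as follows. This is only an illustration, not code from the
patch: the struct, enum, and function names are all hypothetical and do
not correspond to the kernel's actual SAL/MCA log structures. It assumes
the relevant log fields have already been decoded into plain booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; the real patch works on SAL error logs. */
enum err_source { ERR_CMC, ERR_CPE };

enum scrub_action {
	ACTION_NONE,          /* signature does not match, or nothing usable */
	ACTION_SCRUB,         /* address known: mark the cacheline dirty */
	ACTION_KICK_CPE_POLL  /* CMC matched but carried no target address */
};

struct err_record {
	enum err_source source;
	bool bus_error;         /* CMC: log contains a bus error */
	bool eb;                /* CMC: external bus bit is set */
	bool mem_device_error;  /* CPE: log contains a memory device error */
	bool addr_valid;        /* target address is marked valid */
	uint64_t addr;          /* target address, if addr_valid */
};

/*
 * Decide what to do with one corrected-error record.
 * cpe_polling says whether CPEs are being polled rather than
 * delivered by interrupt.
 */
enum scrub_action classify(const struct err_record *rec, bool cpe_polling)
{
	if (rec->source == ERR_CMC) {
		if (!(rec->bus_error && rec->eb))
			return ACTION_NONE;
		if (rec->addr_valid)
			return ACTION_SCRUB;
		/* No address in the CMC log: let the CPE log supply it. */
		return cpe_polling ? ACTION_KICK_CPE_POLL : ACTION_NONE;
	}

	/* CPE: only memory device errors with a usable address are scrubbed. */
	if (rec->mem_device_error && rec->addr_valid)
		return ACTION_SCRUB;
	return ACTION_NONE;
}
```

In this sketch, ACTION_SCRUB stands in for the step where lfetch.excl is
issued on the reported address; that instruction is ia64-specific, so it
is left out to keep the example portable.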