This patch adds a lock-free, yet safe, way of storing MCA/INIT logs. Logs
from different MCAs cannot end up mixed together.

By default, there are N_MCA_INIT_LOGS log buffers for the MCA handler and
another N_MCA_INIT_LOGS log buffers for the INIT handler. The boot
command-line options "nMCAlogs=N" and "nINITlogs=N", where N is an
integer greater than N_MCA_INIT_LOGS, override the default values.

Only the first (N - 1) logs and the very last one are stored; the last
buffer gets overwritten if there are too many logs. The administrative
information is kept in a structure of type ia64_mca_init_buf_t (see mca.h).

Handling the first (N - 1) log buffers is straightforward: you increment an
atomic variable (->_b_cnt) and use it as an index into ->_buf[]. Having
completed the log, you set the corresponding validity bit.

Once those buffers are used up, you race (including with nested handlers)
for the last buffer:
- Increment the atomic generation counter (->_gen_cnt).
- You own the last log buffer as long as no one else has obtained a higher
  generation count.
- The log data is broken up into 4-byte chunks, each stamped with the
  generation count. A chunk and its stamp are written together as an
  atomic64_t into the last buffer (*->_last_buf)[], using a
  compare-and-swap primitive to make sure that no one with a higher
  generation count has passed by in the meantime.
- (*->_last_buf)[0] is a marker:
  * Before writing the log data into the rest of (*->_last_buf)[], you set
    the marker to say "not done" (MCA_INIT_LOG_VALID bit off).
  * Having finished, you set the marker to say "done" (MCA_INIT_LOG_VALID
    bit on).

This is how the code backs off if someone writes the same buffer with a
higher generation count:

	do {
		tmp = atomic64_read(p);	// p => last log buffer
		/*
		 * If you can see a higher generation count than yours,
		 * then you are not the last - bail out.
		 */
		if (GET_GEN_CNT(tmp) > gen_cnt)
			return -1;
	} while (cmpxchg_rel(p, tmp, COMPOSE_AT_VAL(gen_cnt, value)) != tmp);

The code does not assume that the rendezvous always works.
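To make the path for the first (N - 1) buffers concrete, here is a minimal
user-space sketch using C11 atomics in place of the kernel primitives. It
is not the patch's code: struct log_bufs, N_LOGS = 4, the int payload and
the single "valid" bitmask (one bit per buffer) are hypothetical stand-ins
for ia64_mca_init_buf_t, ->_b_cnt and ->_buf[], and it assumes the index
is the value of the counter before the increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N_LOGS	4	/* hypothetical N: (N_LOGS - 1) fast buffers + 1 last */

struct log_bufs {			/* hypothetical stand-in */
	atomic_int b_cnt;		/* models ->_b_cnt */
	atomic_uint valid;		/* assumed: one validity bit per buffer */
	int data[N_LOGS - 1];		/* models ->_buf[] */
};

static void log_bufs_init(struct log_bufs *b)
{
	atomic_init(&b->b_cnt, 0);
	atomic_init(&b->valid, 0u);
}

/* Claim one of the first (N_LOGS - 1) buffers; -1 means they are all
 * taken and the caller has to race for the last buffer instead. */
static int reserve_buf(struct log_bufs *b)
{
	int idx = atomic_fetch_add(&b->b_cnt, 1);

	return idx < N_LOGS - 1 ? idx : -1;
}

/* Fill the claimed buffer, then publish it by setting its validity bit. */
static void publish_buf(struct log_bufs *b, int idx, int value)
{
	assert(idx >= 0 && idx < N_LOGS - 1);
	b->data[idx] = value;
	/* Release: the data is visible before the bit is seen as set. */
	atomic_fetch_or_explicit(&b->valid, 1u << idx, memory_order_release);
}

static bool buf_valid(struct log_bufs *b, int idx)
{
	return atomic_load_explicit(&b->valid, memory_order_acquire) & (1u << idx);
}
```

Because atomic_fetch_add() hands out each index exactly once, no two
handlers (nested or on different CPUs) can claim the same fast buffer,
which is why this path needs no lock.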
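The race for the last buffer can likewise be modelled in user space. This
is a minimal sketch assuming C11 atomics instead of atomic64_read() and
cmpxchg_rel(). The bit layout behind COMPOSE_AT_VAL() and GET_GEN_CNT()
(generation count in the upper 32 bits, chunk in the lower 32 bits), the
value of MCA_INIT_LOG_VALID, LAST_WORDS and struct last_buf are all
assumptions, not the patch's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: generation count in the upper 32 bits, one 4-byte
 * chunk of log data (or the marker bits) in the lower 32 bits. */
#define GET_GEN_CNT(v)		((uint32_t)((v) >> 32))
#define COMPOSE_AT_VAL(g, x)	(((uint64_t)(g) << 32) | (uint32_t)(x))
#define MCA_INIT_LOG_VALID	1u	/* assumed marker bit */
#define LAST_WORDS		8	/* hypothetical data capacity */

struct last_buf {				/* hypothetical stand-in */
	atomic_uint gen_cnt;			/* models ->_gen_cnt */
	_Atomic uint64_t words[LAST_WORDS + 1];	/* [0] is the marker */
};

static void last_buf_init(struct last_buf *lb)
{
	atomic_init(&lb->gen_cnt, 0u);
	for (size_t i = 0; i <= LAST_WORDS; i++)
		atomic_init(&lb->words[i], 0);
}

/* Store one stamped chunk unless a higher generation got there first. */
static int write_chunk(_Atomic uint64_t *p, uint32_t gen, uint32_t value)
{
	uint64_t tmp = atomic_load_explicit(p, memory_order_relaxed);

	do {
		if (GET_GEN_CNT(tmp) > gen)
			return -1;	/* not the last writer: bail out */
		/* On failure, tmp is refreshed with the current contents. */
	} while (!atomic_compare_exchange_weak_explicit(p, &tmp,
			COMPOSE_AT_VAL(gen, value),
			memory_order_release, memory_order_relaxed));
	return 0;
}

/* Returns 0 if this writer completed the log, -1 if it was overtaken. */
static int write_last_buf(struct last_buf *lb, const uint32_t *data, size_t n)
{
	uint32_t gen = atomic_fetch_add(&lb->gen_cnt, 1u) + 1u;

	assert(n <= LAST_WORDS);
	if (write_chunk(&lb->words[0], gen, 0))	/* marker: "not done" */
		return -1;
	for (size_t i = 0; i < n; i++)
		if (write_chunk(&lb->words[i + 1], gen, data[i]))
			return -1;
	/* marker: "done" */
	return write_chunk(&lb->words[0], gen, MCA_INIT_LOG_VALID);
}
```

atomic_compare_exchange_weak_explicit() reloads tmp when it fails, so the
loop re-checks the generation count on every retry, much like re-reading
*p with atomic64_read() in the original loop.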
The salinfo side verifies that every element of the last log buffer is of
the same generation. If there is no log left to save, it clears ->_b_cnt.
There is no "shift" of the logs in the buffers on the salinfo side.

Well, the old code is not cleaned up yet...

Changes since the previous patch:
- Boot command-line options "nMCAlogs=" and "nINITlogs="
- Reusing the "struct salinfo_data" infrastructure (not the data buffers)

Notes:
- Writing "clear " does not actually clear the SAL's log record. The MCA
  handler clears the recovered events.
- When checking whether there is an MCA log from before the reboot, the
  CPU number should have been picked up from the Processor Device Error
  Info. However, a CPU causing fatal errors can be excluded after the
  reboot, the CPUs can be renumbered, etc. Therefore, this implementation
  lets any CPU pick up logs from before the reboot.
- Apply the patch http://marc.info/?l=linux-ia64&m=120418991227044&w=3
  before this one.

Thanks,

Zoltan Menyhart
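The salinfo-side consistency check on the last log buffer can be sketched
in user space as well. This assumes the same hypothetical layout as a
writer that stamps each 64-bit word with the generation count in the upper
32 bits; GET_VALUE() is an invented helper, and COMPOSE_AT_VAL() is only
needed here to build example contents. It is an illustration, not the
salinfo code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: generation count in the upper 32 bits, a 4-byte
 * chunk (or the marker bits) in the lower 32 bits. */
#define GET_GEN_CNT(v)		((uint32_t)((v) >> 32))
#define GET_VALUE(v)		((uint32_t)(v))		/* hypothetical */
#define COMPOSE_AT_VAL(g, x)	(((uint64_t)(g) << 32) | (uint32_t)(x))
#define MCA_INIT_LOG_VALID	1u			/* assumed marker bit */

/* words[0] is the marker, words[1..n] hold the data chunks.
 * Copies the chunks into out[] and returns true only if the marker says
 * "done" and every chunk carries the marker's generation count. */
static bool read_last_buf(_Atomic uint64_t *words, size_t n, uint32_t *out)
{
	uint64_t marker = atomic_load_explicit(&words[0], memory_order_acquire);
	uint32_t gen = GET_GEN_CNT(marker);

	if (!(GET_VALUE(marker) & MCA_INIT_LOG_VALID))
		return false;		/* empty, or a writer is still busy */
	for (size_t i = 0; i < n; i++) {
		uint64_t w = atomic_load_explicit(&words[i + 1],
						  memory_order_relaxed);
		if (GET_GEN_CNT(w) != gen)
			return false;	/* chunks of different generations */
		out[i] = GET_VALUE(w);
	}
	return true;
}
```

The acquire load of the marker pairs with the writer's release when it
sets the "done" bit, so chunks stamped with the marker's generation are
the final values that writer stored.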