From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keith Owens <kaos@sgi.com>
Date: Fri, 06 Aug 2004 14:32:13 +0000
Subject: Re: [PATCH&RFC 2/2] OS_MCA Recovery from poisoned memory read
Message-Id: <10156.1091802733@ocs3.ocs.com.au>
List-Id: <linux-ia64.vger.kernel.org>
References: <41121484.40804@jp.fujitsu.com>
In-Reply-To: <41121484.40804@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

On Fri, 06 Aug 2004 21:17:39 +0900, 
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> wrote:
>Thank you for your useful reply.
>
>But, there is one thing that I want to confirm.
>
>Keith Owens wrote:
>> +static isolate_status_t
>> +mca_page_isolate(unsigned long paddr)
>> +{
>> +       int i;
>> +       struct page *p;
>> +
>> +       /* whether physical address is valid or not */
>> +       if ( !ia64_phys_addr_valid(paddr) )  
>> 
>> The calls to mca_page_isolate() are racy.  That code is running in
>> normal kernel context after exiting from the MCA handler.  Other cpus
>> could be modifying the page tables at the same time, there could even
>> be two cpus running mca_handler_bh() at the same time for the same
>> page.
>
>I agree that there could be multiple cpus running handler_bh at the
>same time, so (even though it would be a rare case) I think it would be
>better if I avoid the race using something like a spinlock.
>
>OTOH, what the handler_bh should modify is not the page tables but the
>flag in the struct page that pfn_to_page returns for a physical address.
>Can the result of translating a physical address to the page that
>contains that address change? (Are you thinking of memory hotplug?)

I had a quick look through mm/page_alloc.c and mm/memory.c.  Since
these are user pages, handler_bh should be able to get
mm->page_table_lock.  But what if the MCA occurred while the process
was already holding mm->page_table_lock?  Then mca_page_isolate() would
deadlock.
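
To make the self-deadlock concrete, here is a small user-space model in
C11.  It is not kernel code: the atomic_flag only stands in for
mm->page_table_lock, and try_lock(), isolate_sketch() and
mca_while_lock_held() are hypothetical names made up for this sketch.
A blocking acquire in the handler would spin forever, because the
interrupted owner cannot release the lock until the handler returns.
The trylock here only makes the conflict visible; it is not a claim
that a trylock is acceptable in the real handler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->page_table_lock (not a kernel spinlock_t). */
static atomic_flag page_table_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool try_lock(atomic_flag *l)
{
	return !atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

typedef enum { SKETCH_ISOLATED, SKETCH_DEFERRED } sketch_status_t;

/* Models mca_page_isolate(): must never block on the external lock. */
static sketch_status_t isolate_sketch(void)
{
	if (!try_lock(&page_table_lock))
		return SKETCH_DEFERRED;	/* the interrupted context owns it */
	/* ... the page would be marked as isolated here ... */
	unlock(&page_table_lock);
	return SKETCH_ISOLATED;
}

/* Models an MCA that arrives while the same context holds the lock. */
static sketch_status_t mca_while_lock_held(void)
{
	sketch_status_t st;

	try_lock(&page_table_lock);	/* interrupted code took the lock */
	st = isolate_sketch();		/* handler runs on top of it */
	unlock(&page_table_lock);	/* interrupted code resumes */
	return st;
}
```

With the lock free, isolate_sketch() succeeds; called from
mca_while_lock_held() it reports SKETCH_DEFERRED instead of hanging.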

mca_handler_bh() is running as an extension of the MCA event, which
means that it is not irq safe.  It is not safe to get any external lock
in mca_page_isolate() or mca_handler_bh().  Even calling printk() from
mca_handler_bh() is risky: if the MCA occurred during printk handling,
then the printk call from mca_handler_bh() would deadlock on
logbuf_lock.

mca_handler_bh() can only lock against itself.  It is not safe to get
any external locks.
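
One way to read "lock against itself" is an atomic claim on the page,
so that two cpus handling the same page cannot both isolate it and no
external lock is needed.  The following is a user-space sketch in C11
under that assumption.  struct sketch_page, SKETCH_PG_ISOLATED and
claim_page() are hypothetical names, not anything from the patch or
from struct page.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_PG_ISOLATED 0x1u	/* hypothetical flag bit */

/* Hypothetical stand-in for the flags word of a struct page. */
struct sketch_page {
	atomic_uint flags;
};

/*
 * Atomically set the isolated bit.  Only the caller that actually
 * flips the bit from 0 to 1 gets true; every later caller gets false.
 */
static bool claim_page(struct sketch_page *p)
{
	unsigned int old = atomic_fetch_or(&p->flags, SKETCH_PG_ISOLATED);

	return (old & SKETCH_PG_ISOLATED) == 0;
}

/* Two handlers racing for one page: returns how many of them won. */
static int count_winners(void)
{
	struct sketch_page pg;
	int winners = 0;

	atomic_init(&pg.flags, 0);
	winners += claim_page(&pg) ? 1 : 0;	/* first handler */
	winners += claim_page(&pg) ? 1 : 0;	/* second handler, same page */
	return winners;
}
```

The loser of the race simply skips the isolation work, so nothing ever
waits on another cpu.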

I am also concerned about the code in mca_handler_bh() that calls
schedule() with SIGKILL set.  Again, that is running as an extension of
the MCA event (not irq safe), which means that the interrupted context
could still own locks, or even have interrupts disabled.

AFAICT, my concerns about the MCA event and mca_handler_bh() not being
irq safe are only a problem for the case when the MCA was triggered by
user space code but was delivered when the cpu was in kernel code.
Maybe we do not support the problem case.

*    offending process  affected process  OS MCA do
*     kernel mode        kernel mode       down system
*     kernel mode        user   mode       kill the process
*     user   mode        kernel mode       kill the process <== problem
*     user   mode        user   mode       kill the process
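
The table above can be written down as a small decision function.  This
is only a sketch of the table in C; the enum and function names are
made up for illustration and are not taken from the patch.

```c
#include <stdbool.h>

enum cpu_mode { MODE_KERNEL, MODE_USER };
enum mca_action { ACTION_DOWN_SYSTEM, ACTION_KILL_PROCESS };

/* What the OS MCA handler does for each row of the table. */
static enum mca_action os_mca_action(enum cpu_mode offending,
				     enum cpu_mode affected)
{
	if (offending == MODE_KERNEL && affected == MODE_KERNEL)
		return ACTION_DOWN_SYSTEM;
	return ACTION_KILL_PROCESS;
}

/*
 * The row marked "<== problem": user space triggered the MCA but it
 * was delivered while the cpu was in kernel code, so the interrupted
 * context may hold locks or have interrupts disabled.
 */
static bool is_problem_case(enum cpu_mode offending, enum cpu_mode affected)
{
	return offending == MODE_USER && affected == MODE_KERNEL;
}
```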