* [Question]: Avoid kernel panic when killing an application that hit a RAS page table error
From: James Morse @ 2017-12-15 18:52 UTC
To: linux-arm-kernel
Hi gengdongjiu,
On 15/12/17 02:00, gengdongjiu wrote:
> Changed the mail title and resent.
(please don't do this, we all got the first version)
> If a user-space application hits a RAS error in its page tables, the memory error handler (memory_failure()) will
> do nothing except set the poisoned page flag,
Yes, because a user-space process's page tables are kernel memory.
memory_failure() depends on the system being able to contain these faults,
giving us another RAS exception if we touch the page again.
> and the fault handler in arch/arm64/mm/fault.c
> will deliver a signal to kill this application. When the application exits, it calls unmap_vmas()
> to release its VMA resources, but this touches the erroneous page table again, which
> triggers the RAS error again, so the application cannot be killed and the system panics; the log is shown in [2].
Kernel memory is corrupt, so we panic().
You want to add a distinction to handle a user-space process's page tables:
> As the stack in [1] shows, unmap_page_range() touches the erroneous page table, so the system panics.
> There are some simple ways to avoid this panic without changing much in
> the memory management code:
> 1. Put the task into a dead state and never run it again.
> 2. Do not release the page tables for this task.
>
> Of course, the above methods may leak memory. Do you have a good suggestion for how to solve this, or do you think this panic is expected behavior? Thanks.
I don't think this is worth the effort: the page tables are small compared to
the memory they map. Even if this were fixed, you still have the chance of other
kernel memory being corrupted.
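To put numbers on the size argument: assuming (this is not stated in the thread) arm64 with a 4 KiB translation granule and 8-byte descriptors, one last-level table holds 512 entries and maps 2 MiB, so leaf page tables cost roughly 1/512 (about 0.2%) of the memory they map. The helper names below are illustrative only.

```c
#include <assert.h>

/*
 * Assumed configuration (not from the thread): arm64 with a 4 KiB
 * translation granule and 8-byte page-table descriptors.
 */
#define GRANULE_BYTES 4096UL
#define DESC_BYTES    8UL

/* Descriptors that fit in one page-sized table: 4096 / 8 = 512. */
static unsigned long entries_per_table(void)
{
	return GRANULE_BYTES / DESC_BYTES;
}

/* Memory mapped by one fully populated last-level table: 512 * 4 KiB. */
static unsigned long bytes_mapped_per_table(void)
{
	return entries_per_table() * GRANULE_BYTES;
}

/* Ratio of mapped memory to last-level page-table memory. */
static unsigned long mapped_to_table_ratio(void)
{
	unsigned long ratio = bytes_mapped_per_table() / GRANULE_BYTES;

	/* With one page per table, the ratio equals the entry count. */
	assert(ratio == entries_per_table());
	return ratio;
}
```

Upper-level tables add only a little on top of this, since each of their entries covers a whole lower-level table.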
Leaking any memory that isn't marked as poisoned isn't a good idea.
What you would need is a way to know from the struct page that this page is
a page table, and which mm_struct it belongs to. (If it's the kernel's init_mm:
panic().)
Next you need a way to find all the other page-table pages without walking
them. With these three pieces of information you can free all the unaffected
memory; with even more work you can probably regenerate the corrupted page.
It's going to be complicated to do, and I don't think it's worth the effort.
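The first two pieces of information above can be sketched as a classification step for a poisoned page. This is a user-space model, not kernel code: `struct sim_page`, `struct sim_mm`, `sim_init_mm` and `classify_poisoned_page()` are hypothetical names, and treating an unknown owner as fatal is an assumption. Finding the other page-table pages (the third piece) is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mm_struct and struct page. */
struct sim_mm {
	const char *name;
};

struct sim_page {
	bool is_pagetable;      /* piece 1: is this page a page table? */
	struct sim_mm *owner;   /* piece 2: which mm it belongs to */
	bool poisoned;
};

/* Hypothetical stand-in for the kernel's init_mm. */
static struct sim_mm sim_init_mm = { "init_mm" };

enum poison_action {
	ACTION_DEFAULT,       /* not a page table: existing handling applies */
	ACTION_PANIC,         /* kernel page table: nothing safe to do */
	ACTION_TEARDOWN_MM,   /* user page table: tear down that mm's tables */
};

/* Decide how to handle a poisoned page, following the outline above. */
static enum poison_action classify_poisoned_page(const struct sim_page *page)
{
	assert(page->poisoned);
	if (!page->is_pagetable)
		return ACTION_DEFAULT;
	/* Assumption: an unknown owner is handled as conservatively as init_mm. */
	if (page->owner == NULL || page->owner == &sim_init_mm)
		return ACTION_PANIC;
	return ACTION_TEARDOWN_MM;
}
```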
Thanks,
James
From: Matthew Wilcox @ 2017-12-15 19:35 UTC
To: linux-arm-kernel
On Fri, Dec 15, 2017 at 06:52:35PM +0000, James Morse wrote:
> Leaking any memory that isn't marked as poisoned isn't a good idea.
>
> What you would need is a way to know from the struct page that this page is
> a page table, and which mm_struct it belongs to. (If it's the kernel's init_mm:
> panic().)
> Next you need a way to find all the other page-table pages without walking
> them. With these three pieces of information you can free all the unaffected
> memory; with even more work you can probably regenerate the corrupted page.
>
> It's going to be complicated to do, and I don't think it's worth the effort.
We can find a bit in struct page that we guarantee will only be set if
this is allocated as a pagetable. Bit 1 of the third union is currently
available (compound_head is a pointer if bit 0 is set, so nothing is
using bit 1). We can put a pointer to the mm_struct in the same word.
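The bit-1 idea is a form of low-bit pointer tagging. A minimal user-space sketch, assuming the owning mm structure is at least 4-byte aligned so bits 0 and 1 of its address are always clear; `struct sim_mm` and the helper names are hypothetical, and the real `struct page` union layout is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct mm_struct; only its alignment matters. */
struct sim_mm {
	long id;
};

#define TAIL_PAGE_BIT ((uintptr_t)1)  /* bit 0: compound_head tail marker */
#define PAGETABLE_BIT ((uintptr_t)2)  /* bit 1: proposed page-table marker */
#define TAG_MASK      (TAIL_PAGE_BIT | PAGETABLE_BIT)

/* Encode "this page is a page table owned by mm" into one word. */
static uintptr_t encode_pagetable_owner(const struct sim_mm *mm)
{
	uintptr_t p = (uintptr_t)mm;

	/* The two low bits must be free, i.e. mm is at least 4-byte aligned. */
	assert((p & TAG_MASK) == 0);
	return p | PAGETABLE_BIT;
}

static bool word_is_pagetable(uintptr_t word)
{
	/* Bit 0 set means the word is a compound_head pointer, not a tag. */
	return (word & TAIL_PAGE_BIT) == 0 && (word & PAGETABLE_BIT) != 0;
}

/* Recover the owning mm, or NULL if the word is not a page-table tag. */
static struct sim_mm *word_to_mm(uintptr_t word)
{
	if (!word_is_pagetable(word))
		return NULL;
	return (struct sim_mm *)(word & ~TAG_MASK);
}
```

Checking bit 0 first keeps tail pages from ever being misread as page tables, which is why bit 1 is usable only in combination with it.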
Finding all the allocated pages will be the tricky bit. We could put a
list_head into struct page; perhaps in the same spot as page_deferred_list
for tail pages. Then we can link all the pagetables belonging to
this mm together and tear them all down if any of them get an error.
They'll repopulate on demand. It won't be quick or scalable, but when
the alternative is death, it looks relatively attractive.
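The per-mm list described above can be sketched with an intrusive doubly linked list in the style of the kernel's `list_head`. Everything here is a hypothetical user-space model: `struct list_node`, `struct sim_ptpage`, `struct sim_mm` and the helpers are illustrative names, and the link lives in a separate structure rather than in a `struct page` union. In line with the earlier point that only poisoned memory should be leaked, the teardown unlinks a poisoned table but does not free it, leaving it to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly linked list modelled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Insert node right after head. */
static void list_add(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_del(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

/* Hypothetical page-table page and mm; names are illustrative only. */
struct sim_ptpage {
	struct list_node mm_link;  /* links every table owned by one mm */
	bool poisoned;
};

struct sim_mm {
	struct list_node pagetables;
	size_t nr_pagetables;
};

static struct sim_ptpage *alloc_pagetable(struct sim_mm *mm)
{
	struct sim_ptpage *pt = calloc(1, sizeof(*pt));

	if (!pt)
		return NULL;
	list_add(&pt->mm_link, &mm->pagetables);
	mm->nr_pagetables++;
	return pt;
}

/*
 * Drop every table of the mm by following only the list links, never the
 * (possibly corrupt) table contents. Poisoned tables are unlinked but not
 * freed. Returns the number of tables freed.
 */
static size_t teardown_all_pagetables(struct sim_mm *mm)
{
	size_t freed = 0;

	while (mm->pagetables.next != &mm->pagetables) {
		struct list_node *n = mm->pagetables.next;
		struct sim_ptpage *pt = container_of(n, struct sim_ptpage, mm_link);

		list_del(n);
		if (!pt->poisoned) {
			free(pt);
			freed++;
		}
	}
	mm->nr_pagetables = 0;
	return freed;
}
```

The cost of this design is an extra list operation on every page-table allocation (and, in a fuller version, every free), in exchange for a teardown that does not depend on the corrupted tree.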
From: gengdongjiu @ 2017-12-16 7:09 UTC
To: linux-arm-kernel
On 2017/12/16 3:35, Matthew Wilcox wrote:
>> It's going to be complicated to do, and I don't think it's worth the effort.
> We can find a bit in struct page that we guarantee will only be set if
> this is allocated as a pagetable. Bit 1 of the third union is currently
> available (compound_head is a pointer if bit 0 is set, so nothing is
> using bit 1). We can put a pointer to the mm_struct in the same word.
>
> Finding all the allocated pages will be the tricky bit. We could put a
> list_head into struct page; perhaps in the same spot as page_deferred_list
> for tail pages. Then we can link all the pagetables belonging to
> this mm together and tear them all down if any of them get an error.
> They'll repopulate on demand. It won't be quick or scalable, but when
> the alternative is death, it looks relatively attractive.
Thanks for the comments. I will look at this in detail and investigate whether it is worth doing.
Thanks!