* [RFC] kcore:change kcore_read to make sure the kernel read is safe @ 2015-08-04 3:37 yalin wang 2015-08-04 21:18 ` Dave Hansen 2015-08-04 22:38 ` Andrew Morton 0 siblings, 2 replies; 5+ messages in thread From: yalin wang @ 2015-08-04 3:37 UTC (permalink / raw) To: Ingo Molnar, Andrew Morton, dave, David Rientjes, fabf, bhe, yalin.wang2010, open list This change kcore_read() to use __copy_from_user_inatomic() to copy data from kernel address, because kern_addr_valid() just make sure page table is valid during call it, whne it return, the page table may change, for example, like set_fixmap() function will change kernel page table, then maybe trigger kernel crash if encounter this unluckily. Signed-off-by: yalin wang <yalin.wang2010@gmail.com> --- fs/proc/kcore.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c index 92e6726..b085fde 100644 --- a/fs/proc/kcore.c +++ b/fs/proc/kcore.c @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) size = try; *nphdr = *nphdr + 1; } - *elf_buflen = sizeof(struct elfhdr) + - (*nphdr + 2)*sizeof(struct elf_phdr) + + *elf_buflen = sizeof(struct elfhdr) + + (*nphdr + 2)*sizeof(struct elf_phdr) + 3 * ((sizeof(struct elf_note)) + roundup(sizeof(CORE_STR), 4)) + roundup(sizeof(struct elf_prstatus), 4) + @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) size_t elf_buflen; int nphdr; unsigned long start; + unsigned long page = 0; read_lock(&kclist_lock); size = get_kcore_size(&nphdr, &elf_buflen); @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) start = kc_offset_to_vaddr(*fpos - elf_buflen); if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) tsz = buflen; - + while (buflen) { struct kcore_list *m; @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) } else { if (kern_addr_valid(start)) { unsigned long n; + mm_segment_t old_fs = get_fs(); + + if (page == 0) { + page = __get_free_page(GFP_KERNEL); + if (page == 0) + return -ENOMEM; - n = copy_to_user(buffer, (char *)start, tsz); + } + set_fs(KERNEL_DS); + pagefault_disable(); + n = __copy_from_user_inatomic((void *)page, + (__force const void __user *)start, + tsz); + pagefault_enable(); + set_fs(old_fs); + if (n) + memset((void *)page + tsz - n, 0, n); + + n = copy_to_user(buffer, (char *)page, tsz); /* * We cannot distinguish between fault on source * and fault on destination. When this happens * we clear too and hope it will trigger the * EFAULT again. */ - if (n) { + if (n) { if (clear_user(buffer + tsz - n, n)) return -EFAULT; @@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) start += tsz; tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen); } - + free_page(page); return acc; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe 2015-08-04 3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang @ 2015-08-04 21:18 ` Dave Hansen 2015-08-05 3:37 ` yalin wang 2015-08-04 22:38 ` Andrew Morton 1 sibling, 1 reply; 5+ messages in thread From: Dave Hansen @ 2015-08-04 21:18 UTC (permalink / raw) To: yalin wang, Ingo Molnar, Andrew Morton, David Rientjes, fabf, bhe, open list On 08/03/2015 08:37 PM, yalin wang wrote: > This change kcore_read() to use __copy_from_user_inatomic() to > copy data from kernel address, because kern_addr_valid() just make sure > page table is valid during call it, whne it return, the page table may > change, for example, like set_fixmap() function will change kernel page > table, then maybe trigger kernel crash if encounter this unluckily. I don't see any cases at the moment that will crash. set_fixmap() doesn't ever clear out any ptes, right? I guess the root problem here is that we don't have any good (generic) locking of kernel page tables inside the linear map. Can you come up with a case where this will _actually_ crash? > fs/proc/kcore.c | 30 ++++++++++++++++++++++++------ > 1 file changed, 24 insertions(+), 6 deletions(-) > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c > index 92e6726..b085fde 100644 > --- a/fs/proc/kcore.c > +++ b/fs/proc/kcore.c > @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) > size = try; > *nphdr = *nphdr + 1; > } > - *elf_buflen = sizeof(struct elfhdr) + > - (*nphdr + 2)*sizeof(struct elf_phdr) + > + *elf_buflen = sizeof(struct elfhdr) + > + (*nphdr + 2)*sizeof(struct elf_phdr) + I'm having a hard time spotting the change here. Whitespace? > 3 * ((sizeof(struct elf_note)) + > roundup(sizeof(CORE_STR), 4)) + > roundup(sizeof(struct elf_prstatus), 4) + > @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > size_t elf_buflen; > int nphdr; > unsigned long start; > + unsigned long page = 0; > > read_lock(&kclist_lock); > size = get_kcore_size(&nphdr, &elf_buflen); > @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > start = kc_offset_to_vaddr(*fpos - elf_buflen); > if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) > tsz = buflen; > - > + Please keep the unnecessary whitespace changes for another patch. > while (buflen) { > struct kcore_list *m; > > @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > } else { > if (kern_addr_valid(start)) { > unsigned long n; > + mm_segment_t old_fs = get_fs(); > + > + if (page == 0) { > + page = __get_free_page(GFP_KERNEL); > + if (page == 0) > + return -ENOMEM; FWIW, we usually code this as "!page" instead of "page == 0". I also wouldn't call it 'page'. Also, why is this using a raw __get_free_page() while the code above it uses a kmalloc()? > - n = copy_to_user(buffer, (char *)start, tsz); > + } > + set_fs(KERNEL_DS); > + pagefault_disable(); > + n = __copy_from_user_inatomic((void *)page, > + (__force const void __user *)start, > + tsz); > + pagefault_enable(); > + set_fs(old_fs); > + if (n) > + memset((void *)page + tsz - n, 0, n); > + > + n = copy_to_user(buffer, (char *)page, tsz); So, first of all, we are using __copy_from_user_inatomic() to copy to and from a *kernel* addresses, and it doesn't even get a comment? :) Fundamentally, we're trying to be able to safely survive faults in the kernel linear map here. I think we've got to get a better handle on when that happens rather than just paper over it when it does. (Aside: There might actually be a missing use of get_online_mems() here.) Maybe we should just be walking the kernel page tables ourselves and do a kmap(). We might have a stale pte but we don't have to worry about actual racy updates while we are doing the copy. > /* > * We cannot distinguish between fault on source > * and fault on destination. When this happens > * we clear too and hope it will trigger the > * EFAULT again. > */ This comment seems wrong after the patch. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe 2015-08-04 21:18 ` Dave Hansen @ 2015-08-05 3:37 ` yalin wang 0 siblings, 0 replies; 5+ messages in thread From: yalin wang @ 2015-08-05 3:37 UTC (permalink / raw) To: Dave Hansen Cc: Ingo Molnar, Andrew Morton, David Rientjes, fabf, bhe, open list > On Aug 5, 2015, at 05:18, Dave Hansen <dave@sr71.net> wrote: > > On 08/03/2015 08:37 PM, yalin wang wrote: >> This change kcore_read() to use __copy_from_user_inatomic() to >> copy data from kernel address, because kern_addr_valid() just make sure >> page table is valid during call it, whne it return, the page table may >> change, for example, like set_fixmap() function will change kernel page >> table, then maybe trigger kernel crash if encounter this unluckily. > > I don't see any cases at the moment that will crash. set_fixmap() > doesn't ever clear out any ptes, right? > > I guess the root problem here is that we don't have any good (generic) > locking of kernel page tables inside the linear map. Can you come up > with a case where this will _actually_ crash? > Thanks for your comments. i don’t have crash for this, but when i read code, i see this part not safe, so i make this patch :). >> fs/proc/kcore.c | 30 ++++++++++++++++++++++++------ >> 1 file changed, 24 insertions(+), 6 deletions(-) >> >> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c >> index 92e6726..b085fde 100644 >> --- a/fs/proc/kcore.c >> +++ b/fs/proc/kcore.c >> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) >> size = try; >> *nphdr = *nphdr + 1; >> } >> - *elf_buflen = sizeof(struct elfhdr) + >> - (*nphdr + 2)*sizeof(struct elf_phdr) + >> + *elf_buflen = sizeof(struct elfhdr) + >> + (*nphdr + 2)*sizeof(struct elf_phdr) + > > I'm having a hard time spotting the change here. Whitespace? i will seperate in another patch for format correctness. > >> 3 * ((sizeof(struct elf_note)) + >> roundup(sizeof(CORE_STR), 4)) + >> roundup(sizeof(struct elf_prstatus), 4) + >> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> size_t elf_buflen; >> int nphdr; >> unsigned long start; >> + unsigned long page = 0; >> >> read_lock(&kclist_lock); >> size = get_kcore_size(&nphdr, &elf_buflen); >> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> start = kc_offset_to_vaddr(*fpos - elf_buflen); >> if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) >> tsz = buflen; >> - >> + > > Please keep the unnecessary whitespace changes for another patch. > >> while (buflen) { >> struct kcore_list *m; >> >> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> } else { >> if (kern_addr_valid(start)) { >> unsigned long n; >> + mm_segment_t old_fs = get_fs(); >> + >> + if (page == 0) { >> + page = __get_free_page(GFP_KERNEL); >> + if (page == 0) >> + return -ENOMEM; > > FWIW, we usually code this as "!page" instead of "page == 0". I also > wouldn't call it 'page'. > > Also, why is this using a raw __get_free_page() while the code above it > uses a kmalloc()? > because i am using a page size buffer, more efficient to use __get_free_page() than kmalloc() here . >> - n = copy_to_user(buffer, (char *)start, tsz); >> + } >> + set_fs(KERNEL_DS); >> + pagefault_disable(); >> + n = __copy_from_user_inatomic((void *)page, >> + (__force const void __user *)start, >> + tsz); >> + pagefault_enable(); >> + set_fs(old_fs); >> + if (n) >> + memset((void *)page + tsz - n, 0, n); >> + >> + n = copy_to_user(buffer, (char *)page, tsz); > > So, first of all, we are using __copy_from_user_inatomic() to copy to > and from a *kernel* addresses, and it doesn't even get a comment? :) > i will add comment. > Fundamentally, we're trying to be able to safely survive faults in the > kernel linear map here. I think we've got to get a better handle on > when that happens rather than just paper over it when it does. (Aside: > There might actually be a missing use of get_online_mems() here.) > ok. > Maybe we should just be walking the kernel page tables ourselves and do > a kmap(). We might have a stale pte but we don't have to worry about > actual racy updates while we are doing the copy. > so if do like this, we can remove kern_addr_valid() function, and i just walk pte and use get_page_unelss_zero() to grab the valid page ? >> /* >> * We cannot distinguish between fault on source >> * and fault on destination. When this happens >> * we clear too and hope it will trigger the >> * EFAULT again. >> */ > > This comment seems wrong after the patch. Ok. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe 2015-08-04 3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang 2015-08-04 21:18 ` Dave Hansen @ 2015-08-04 22:38 ` Andrew Morton 2015-08-05 3:30 ` yalin wang 1 sibling, 1 reply; 5+ messages in thread From: Andrew Morton @ 2015-08-04 22:38 UTC (permalink / raw) To: yalin wang; +Cc: Ingo Molnar, dave, David Rientjes, fabf, bhe, open list On Tue, 4 Aug 2015 11:37:57 +0800 yalin wang <yalin.wang2010@gmail.com> wrote: > This change kcore_read() to use __copy_from_user_inatomic() to > copy data from kernel address, because kern_addr_valid() just make sure > page table is valid during call it, whne it return, the page table may > change, for example, like set_fixmap() function will change kernel page > table, then maybe trigger kernel crash if encounter this unluckily. The changelog is a bit hard to follow. How does this version look? : read_kcore() does a copy_to_user() from kernel memory. This could cause a : crash if the source (kernel) address is concurrently unmapped via, say, : set_fixmap(). The kern_addr_valid() check is racy and won't reliably : prevent this. : : Change kcore_read() to use __copy_from_user_inatomic() via a temporary : buffer to catch such situations. What actually happens when copy_to_user() gets a fault on the source address? It *could* handle it and return -EFAULT. I forget... Also... what is special about this particular copy_to_user()? Isn't every copy_to_user() in the kernel vulnerable to a concurrent set_fixmap()? Is it that only read_kcore() will read pages which are subject to set_fixmap() alteration? > --- a/fs/proc/kcore.c > +++ b/fs/proc/kcore.c > @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) > size = try; > *nphdr = *nphdr + 1; > } > - *elf_buflen = sizeof(struct elfhdr) + > - (*nphdr + 2)*sizeof(struct elf_phdr) + > + *elf_buflen = sizeof(struct elfhdr) + > + (*nphdr + 2)*sizeof(struct elf_phdr) + Unrelated whitespace fixes really shouldn't be in here. They don't bother me too much, but some people get upset ;) > 3 * ((sizeof(struct elf_note)) + > roundup(sizeof(CORE_STR), 4)) + > roundup(sizeof(struct elf_prstatus), 4) + > @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > size_t elf_buflen; > int nphdr; > unsigned long start; > + unsigned long page = 0; "page" isn't a very good name - when we see that identifier we expect it to be a `struct page *'. Maybe call it copy_buf or something. (And incoming argument "buffer" was poorly named. "buffer" implies some temporary intermediate thing, which is inappropriate here!) > read_lock(&kclist_lock); > size = get_kcore_size(&nphdr, &elf_buflen); > @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > start = kc_offset_to_vaddr(*fpos - elf_buflen); > if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) > tsz = buflen; > - > + > while (buflen) { > struct kcore_list *m; > > @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > } else { > if (kern_addr_valid(start)) { Do we still need the (racy) kern_addr_valid() test? The code should work OK if this is removed? > unsigned long n; > + mm_segment_t old_fs = get_fs(); > + > + if (page == 0) { > + page = __get_free_page(GFP_KERNEL); > + if (page == 0) > + return -ENOMEM; > > - n = copy_to_user(buffer, (char *)start, tsz); > + } > + set_fs(KERNEL_DS); > + pagefault_disable(); > + n = __copy_from_user_inatomic((void *)page, > + (__force const void __user *)start, > + tsz); > + pagefault_enable(); > + set_fs(old_fs); We should have a code comment in here telling people what's going on. A concurrent set_fixmap() on the source memory is unexpected! > + if (n) > + memset((void *)page + tsz - n, 0, n); > + > + n = copy_to_user(buffer, (char *)page, tsz); > /* > * We cannot distinguish between fault on source > * and fault on destination. When this happens > * we clear too and hope it will trigger the > * EFAULT again. > */ > - if (n) { > + if (n) { > if (clear_user(buffer + tsz - n, > n)) > return -EFAULT; > @@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) > start += tsz; > tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen); > } > - > + free_page(page); > return acc; > } ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe 2015-08-04 22:38 ` Andrew Morton @ 2015-08-05 3:30 ` yalin wang 0 siblings, 0 replies; 5+ messages in thread From: yalin wang @ 2015-08-05 3:30 UTC (permalink / raw) To: Andrew Morton; +Cc: Ingo Molnar, dave, David Rientjes, fabf, bhe, open list > On Aug 5, 2015, at 06:38, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 4 Aug 2015 11:37:57 +0800 yalin wang <yalin.wang2010@gmail.com> wrote: > >> This change kcore_read() to use __copy_from_user_inatomic() to >> copy data from kernel address, because kern_addr_valid() just make sure >> page table is valid during call it, whne it return, the page table may >> change, for example, like set_fixmap() function will change kernel page >> table, then maybe trigger kernel crash if encounter this unluckily. > > The changelog is a bit hard to follow. How does this version look? > > : read_kcore() does a copy_to_user() from kernel memory. This could cause a > : crash if the source (kernel) address is concurrently unmapped via, say, > : set_fixmap(). The kern_addr_valid() check is racy and won't reliably > : prevent this. > : > : Change kcore_read() to use __copy_from_user_inatomic() via a temporary > : buffer to catch such situations. > > What actually happens when copy_to_user() gets a fault on the source > address? It *could* handle it and return -EFAULT. I forget... > > Also... what is special about this particular copy_to_user()? Isn't > every copy_to_user() in the kernel vulnerable to a concurrent > set_fixmap()? Is it that only read_kcore() will read pages which are > subject to set_fixmap() alteration? > Thanks for your great comment . i agree with your git change log, one more question, at first i only focus on arm64 arch, it only check __user* address during copy_from{to}_user, but other architecther like X86 check both source and dest address in copy_from{to}_user, is there some special reason do like this? in my view , just need check __user* address is enough, and if also have ex_table for kernel address access maybe hide some BUG in kernel , i think kernel don’t need it, or am i miss something ? if copy_from{to}_user both check source and dest address, we don’t need this patch, it is safe . Maybe we need one more API , like: copy_data_in_user(__user *source, __user *dest, size_t size) ?? >> --- a/fs/proc/kcore.c >> +++ b/fs/proc/kcore.c >> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen) >> size = try; >> *nphdr = *nphdr + 1; >> } >> - *elf_buflen = sizeof(struct elfhdr) + >> - (*nphdr + 2)*sizeof(struct elf_phdr) + >> + *elf_buflen = sizeof(struct elfhdr) + >> + (*nphdr + 2)*sizeof(struct elf_phdr) + > > Unrelated whitespace fixes really shouldn't be in here. They don't > bother me too much, but some people get upset ;) > i will seperate in another patch for format correctness. >> 3 * ((sizeof(struct elf_note)) + >> roundup(sizeof(CORE_STR), 4)) + >> roundup(sizeof(struct elf_prstatus), 4) + >> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> size_t elf_buflen; >> int nphdr; >> unsigned long start; >> + unsigned long page = 0; > > "page" isn't a very good name - when we see that identifier we expect > it to be a `struct page *'. Maybe call it copy_buf or something. > > (And incoming argument "buffer" was poorly named. "buffer" implies some > temporary intermediate thing, which is inappropriate here!) > will change name. >> read_lock(&kclist_lock); >> size = get_kcore_size(&nphdr, &elf_buflen); >> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> start = kc_offset_to_vaddr(*fpos - elf_buflen); >> if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen) >> tsz = buflen; >> - >> + >> while (buflen) { >> struct kcore_list *m; >> >> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> } else { >> if (kern_addr_valid(start)) { > > Do we still need the (racy) kern_addr_valid() test? The code should > work OK if this is removed? > Yes, can remove >> unsigned long n; >> + mm_segment_t old_fs = get_fs(); >> + >> + if (page == 0) { >> + page = __get_free_page(GFP_KERNEL); >> + if (page == 0) >> + return -ENOMEM; >> >> - n = copy_to_user(buffer, (char *)start, tsz); >> + } >> + set_fs(KERNEL_DS); >> + pagefault_disable(); >> + n = __copy_from_user_inatomic((void *)page, >> + (__force const void __user *)start, >> + tsz); >> + pagefault_enable(); >> + set_fs(old_fs); > > We should have a code comment in here telling people what's going on. > A concurrent set_fixmap() on the source memory is unexpected! Ok. > >> + if (n) >> + memset((void *)page + tsz - n, 0, n); >> + >> + n = copy_to_user(buffer, (char *)page, tsz); >> /* >> * We cannot distinguish between fault on source >> * and fault on destination. When this happens >> * we clear too and hope it will trigger the >> * EFAULT again. >> */ >> - if (n) { >> + if (n) { >> if (clear_user(buffer + tsz - n, >> n)) >> return -EFAULT; >> @@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos) >> start += tsz; >> tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen); >> } >> - >> + free_page(page); >> return acc; >> } > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-08-05 3:37 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-08-04 3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang 2015-08-04 21:18 ` Dave Hansen 2015-08-05 3:37 ` yalin wang 2015-08-04 22:38 ` Andrew Morton 2015-08-05 3:30 ` yalin wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox