[RFC] kcore:change kcore_read to make sure the kernel read is safe

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC] kcore:change kcore_read to make sure the kernel read is safe
@ 2015-08-04  3:37 yalin wang
  2015-08-04 21:18 ` Dave Hansen
  2015-08-04 22:38 ` Andrew Morton
  0 siblings, 2 replies; 5+ messages in thread
From: yalin wang @ 2015-08-04  3:37 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton, dave, David Rientjes, fabf, bhe,
	yalin.wang2010, open list

This change kcore_read() to use __copy_from_user_inatomic() to
copy data from kernel address, because kern_addr_valid() just make sure
page table is valid during call it, whne it return, the page table may
change, for example, like set_fixmap() function will change kernel page
table, then maybe trigger kernel crash if encounter this unluckily.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
---
 fs/proc/kcore.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 92e6726..b085fde 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
 			size = try;
 		*nphdr = *nphdr + 1;
 	}
-	*elf_buflen =	sizeof(struct elfhdr) + 
-			(*nphdr + 2)*sizeof(struct elf_phdr) + 
+	*elf_buflen =	sizeof(struct elfhdr) +
+			(*nphdr + 2)*sizeof(struct elf_phdr) +
 			3 * ((sizeof(struct elf_note)) +
 			     roundup(sizeof(CORE_STR), 4)) +
 			roundup(sizeof(struct elf_prstatus), 4) +
@@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	size_t elf_buflen;
 	int nphdr;
 	unsigned long start;
+	unsigned long page = 0;
 
 	read_lock(&kclist_lock);
 	size = get_kcore_size(&nphdr, &elf_buflen);
@@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 	start = kc_offset_to_vaddr(*fpos - elf_buflen);
 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
 		tsz = buflen;
-		
+
 	while (buflen) {
 		struct kcore_list *m;
 
@@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		} else {
 			if (kern_addr_valid(start)) {
 				unsigned long n;
+				mm_segment_t old_fs = get_fs();
+
+				if (page == 0) {
+					page = __get_free_page(GFP_KERNEL);
+					if (page == 0)
+						return -ENOMEM;
 
-				n = copy_to_user(buffer, (char *)start, tsz);
+				}
+				set_fs(KERNEL_DS);
+				pagefault_disable();
+				n = __copy_from_user_inatomic((void *)page,
+					(__force const void __user *)start,
+					tsz);
+				pagefault_enable();
+				set_fs(old_fs);
+				if (n)
+					memset((void *)page + tsz - n, 0, n);
+
+				n = copy_to_user(buffer, (char *)page, tsz);
 				/*
 				 * We cannot distinguish between fault on source
 				 * and fault on destination. When this happens
 				 * we clear too and hope it will trigger the
 				 * EFAULT again.
 				 */
-				if (n) { 
+				if (n) {
 					if (clear_user(buffer + tsz - n,
 								n))
 						return -EFAULT;
@@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 		start += tsz;
 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
 	}
-
+	free_page(page);
 	return acc;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe
  2015-08-04  3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang
@ 2015-08-04 21:18 ` Dave Hansen
  2015-08-05  3:37   ` yalin wang
  2015-08-04 22:38 ` Andrew Morton
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2015-08-04 21:18 UTC (permalink / raw)
  To: yalin wang, Ingo Molnar, Andrew Morton, David Rientjes, fabf, bhe,
	open list

On 08/03/2015 08:37 PM, yalin wang wrote:
> This change kcore_read() to use __copy_from_user_inatomic() to
> copy data from kernel address, because kern_addr_valid() just make sure
> page table is valid during call it, whne it return, the page table may
> change, for example, like set_fixmap() function will change kernel page
> table, then maybe trigger kernel crash if encounter this unluckily.

I don't see any cases at the moment that will crash.  set_fixmap()
doesn't ever clear out any ptes, right?

I guess the root problem here is that we don't have any good (generic)
locking of kernel page tables inside the linear map.  Can you come up
with a case where this will _actually_ crash?

>  fs/proc/kcore.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 92e6726..b085fde 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>  			size = try;
>  		*nphdr = *nphdr + 1;
>  	}
> -	*elf_buflen =	sizeof(struct elfhdr) + 
> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
> +	*elf_buflen =	sizeof(struct elfhdr) +
> +			(*nphdr + 2)*sizeof(struct elf_phdr) +

I'm having a hard time spotting the change here.  Whitespace?

>  			3 * ((sizeof(struct elf_note)) +
>  			     roundup(sizeof(CORE_STR), 4)) +
>  			roundup(sizeof(struct elf_prstatus), 4) +
> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	size_t elf_buflen;
>  	int nphdr;
>  	unsigned long start;
> +	unsigned long page = 0;
>  
>  	read_lock(&kclist_lock);
>  	size = get_kcore_size(&nphdr, &elf_buflen);
> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>  	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>  		tsz = buflen;
> -		
> +

Please keep the unnecessary whitespace changes for another patch.

>  	while (buflen) {
>  		struct kcore_list *m;
>  
> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  		} else {
>  			if (kern_addr_valid(start)) {
>  				unsigned long n;
> +				mm_segment_t old_fs = get_fs();
> +
> +				if (page == 0) {
> +					page = __get_free_page(GFP_KERNEL);
> +					if (page == 0)
> +						return -ENOMEM;

FWIW, we usually code this as "!page" instead of "page == 0".  I also
wouldn't call it 'page'.

Also, why is this using a raw __get_free_page() while the code above it
uses a kmalloc()?

> -				n = copy_to_user(buffer, (char *)start, tsz);
> +				}
> +				set_fs(KERNEL_DS);
> +				pagefault_disable();
> +				n = __copy_from_user_inatomic((void *)page,
> +					(__force const void __user *)start,
> +					tsz);
> +				pagefault_enable();
> +				set_fs(old_fs);
> +				if (n)
> +					memset((void *)page + tsz - n, 0, n);
> +
> +				n = copy_to_user(buffer, (char *)page, tsz);

So, first of all, we are using __copy_from_user_inatomic() to copy to
and from a *kernel* addresses, and it doesn't even get a comment? :)

Fundamentally, we're trying to be able to safely survive faults in the
kernel linear map here.  I think we've got to get a better handle on
when that happens rather than just paper over it when it does.  (Aside:
There might actually be a missing use of get_online_mems() here.)

Maybe we should just be walking the kernel page tables ourselves and do
a kmap().  We might have a stale pte but we don't have to worry about
actual racy updates while we are doing the copy.

>  				/*
>  				 * We cannot distinguish between fault on source
>  				 * and fault on destination. When this happens
>  				 * we clear too and hope it will trigger the
>  				 * EFAULT again.
>  				 */

This comment seems wrong after the patch.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe
  2015-08-04  3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang
  2015-08-04 21:18 ` Dave Hansen
@ 2015-08-04 22:38 ` Andrew Morton
  2015-08-05  3:30   ` yalin wang
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2015-08-04 22:38 UTC (permalink / raw)
  To: yalin wang; +Cc: Ingo Molnar, dave, David Rientjes, fabf, bhe, open list

On Tue, 4 Aug 2015 11:37:57 +0800 yalin wang <yalin.wang2010@gmail.com> wrote:

> This change kcore_read() to use __copy_from_user_inatomic() to
> copy data from kernel address, because kern_addr_valid() just make sure
> page table is valid during call it, whne it return, the page table may
> change, for example, like set_fixmap() function will change kernel page
> table, then maybe trigger kernel crash if encounter this unluckily.

The changelog is a bit hard to follow.  How does this version look?

: read_kcore() does a copy_to_user() from kernel memory.  This could cause a
: crash if the source (kernel) address is concurrently unmapped via, say,
: set_fixmap().  The kern_addr_valid() check is racy and won't reliably
: prevent this.
: 
: Change kcore_read() to use __copy_from_user_inatomic() via a temporary
: buffer to catch such situations.

What actually happens when copy_to_user() gets a fault on the source
address?  It *could* handle it and return -EFAULT.  I forget...

Also...  what is special about this particular copy_to_user()?  Isn't
every copy_to_user() in the kernel vulnerable to a concurrent
set_fixmap()?  Is it that only read_kcore() will read pages which are
subject to set_fixmap() alteration?

> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>  			size = try;
>  		*nphdr = *nphdr + 1;
>  	}
> -	*elf_buflen =	sizeof(struct elfhdr) + 
> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
> +	*elf_buflen =	sizeof(struct elfhdr) +
> +			(*nphdr + 2)*sizeof(struct elf_phdr) +

Unrelated whitespace fixes really shouldn't be in here.  They don't
bother me too much, but some people get upset ;)

>  			3 * ((sizeof(struct elf_note)) +
>  			     roundup(sizeof(CORE_STR), 4)) +
>  			roundup(sizeof(struct elf_prstatus), 4) +
> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	size_t elf_buflen;
>  	int nphdr;
>  	unsigned long start;
> +	unsigned long page = 0;

"page" isn't a very good name - when we see that identifier we expect
it to be a `struct page *'.  Maybe call it copy_buf or something.

(And incoming argument "buffer" was poorly named.  "buffer" implies some
temporary intermediate thing, which is inappropriate here!)

>  	read_lock(&kclist_lock);
>  	size = get_kcore_size(&nphdr, &elf_buflen);
> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>  	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>  		tsz = buflen;
> -		
> +
>  	while (buflen) {
>  		struct kcore_list *m;
>  
> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  		} else {
>  			if (kern_addr_valid(start)) {

Do we still need the (racy) kern_addr_valid() test?  The code should
work OK if this is removed?

>  				unsigned long n;
> +				mm_segment_t old_fs = get_fs();
> +
> +				if (page == 0) {
> +					page = __get_free_page(GFP_KERNEL);
> +					if (page == 0)
> +						return -ENOMEM;
>  
> -				n = copy_to_user(buffer, (char *)start, tsz);
> +				}
> +				set_fs(KERNEL_DS);
> +				pagefault_disable();
> +				n = __copy_from_user_inatomic((void *)page,
> +					(__force const void __user *)start,
> +					tsz);
> +				pagefault_enable();
> +				set_fs(old_fs);

We should have a code comment in here telling people what's going on. 
A concurrent set_fixmap() on the source memory is unexpected!

> +				if (n)
> +					memset((void *)page + tsz - n, 0, n);
> +
> +				n = copy_to_user(buffer, (char *)page, tsz);
>  				/*
>  				 * We cannot distinguish between fault on source
>  				 * and fault on destination. When this happens
>  				 * we clear too and hope it will trigger the
>  				 * EFAULT again.
>  				 */
> -				if (n) { 
> +				if (n) {
>  					if (clear_user(buffer + tsz - n,
>  								n))
>  						return -EFAULT;
> @@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  		start += tsz;
>  		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
>  	}
> -
> +	free_page(page);
>  	return acc;
>  }


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe
  2015-08-04 22:38 ` Andrew Morton
@ 2015-08-05  3:30   ` yalin wang
  0 siblings, 0 replies; 5+ messages in thread
From: yalin wang @ 2015-08-05  3:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Ingo Molnar, dave, David Rientjes, fabf, bhe, open list


> On Aug 5, 2015, at 06:38, Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> On Tue, 4 Aug 2015 11:37:57 +0800 yalin wang <yalin.wang2010@gmail.com> wrote:
> 
>> This change kcore_read() to use __copy_from_user_inatomic() to
>> copy data from kernel address, because kern_addr_valid() just make sure
>> page table is valid during call it, whne it return, the page table may
>> change, for example, like set_fixmap() function will change kernel page
>> table, then maybe trigger kernel crash if encounter this unluckily.
> 
> The changelog is a bit hard to follow.  How does this version look?
> 
> : read_kcore() does a copy_to_user() from kernel memory.  This could cause a
> : crash if the source (kernel) address is concurrently unmapped via, say,
> : set_fixmap().  The kern_addr_valid() check is racy and won't reliably
> : prevent this.
> : 
> : Change kcore_read() to use __copy_from_user_inatomic() via a temporary
> : buffer to catch such situations.
> 
> What actually happens when copy_to_user() gets a fault on the source
> address?  It *could* handle it and return -EFAULT.  I forget...
> 
> Also...  what is special about this particular copy_to_user()?  Isn't
> every copy_to_user() in the kernel vulnerable to a concurrent
> set_fixmap()?  Is it that only read_kcore() will read pages which are
> subject to set_fixmap() alteration?
> 
Thanks for your great comment .
i agree with your git change log,
one more question, at first i only focus on arm64 arch,
it only check __user* address during copy_from{to}_user,
but other architecther like X86 check both source and dest address
in copy_from{to}_user, is there some special reason do like this?
in my view , just need check __user* address is enough, and if also
have ex_table for kernel address access maybe hide some BUG in kernel ,
i think kernel don’t need it, or am i miss something ?

if copy_from{to}_user both check source and dest address,
we don’t need this patch, it is safe .
Maybe we need one more API , like:
copy_data_in_user(__user *source, __user *dest, size_t size)  ??
>> --- a/fs/proc/kcore.c
>> +++ b/fs/proc/kcore.c
>> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>> 			size = try;
>> 		*nphdr = *nphdr + 1;
>> 	}
>> -	*elf_buflen =	sizeof(struct elfhdr) + 
>> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
>> +	*elf_buflen =	sizeof(struct elfhdr) +
>> +			(*nphdr + 2)*sizeof(struct elf_phdr) +
> 
> Unrelated whitespace fixes really shouldn't be in here.  They don't
> bother me too much, but some people get upset ;)
> 

i will seperate in another patch for format correctness.

>> 			3 * ((sizeof(struct elf_note)) +
>> 			     roundup(sizeof(CORE_STR), 4)) +
>> 			roundup(sizeof(struct elf_prstatus), 4) +
>> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	size_t elf_buflen;
>> 	int nphdr;
>> 	unsigned long start;
>> +	unsigned long page = 0;
> 
> "page" isn't a very good name - when we see that identifier we expect
> it to be a `struct page *'.  Maybe call it copy_buf or something.
> 
> (And incoming argument "buffer" was poorly named.  "buffer" implies some
> temporary intermediate thing, which is inappropriate here!)
> 
will change name.

>> 	read_lock(&kclist_lock);
>> 	size = get_kcore_size(&nphdr, &elf_buflen);
>> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>> 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>> 		tsz = buflen;
>> -		
>> +
>> 	while (buflen) {
>> 		struct kcore_list *m;
>> 
>> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 		} else {
>> 			if (kern_addr_valid(start)) {
> 
> Do we still need the (racy) kern_addr_valid() test?  The code should
> work OK if this is removed?
> 
Yes, can remove

>> 				unsigned long n;
>> +				mm_segment_t old_fs = get_fs();
>> +
>> +				if (page == 0) {
>> +					page = __get_free_page(GFP_KERNEL);
>> +					if (page == 0)
>> +						return -ENOMEM;
>> 
>> -				n = copy_to_user(buffer, (char *)start, tsz);
>> +				}
>> +				set_fs(KERNEL_DS);
>> +				pagefault_disable();
>> +				n = __copy_from_user_inatomic((void *)page,
>> +					(__force const void __user *)start,
>> +					tsz);
>> +				pagefault_enable();
>> +				set_fs(old_fs);
> 
> We should have a code comment in here telling people what's going on. 
> A concurrent set_fixmap() on the source memory is unexpected!

Ok.
> 
>> +				if (n)
>> +					memset((void *)page + tsz - n, 0, n);
>> +
>> +				n = copy_to_user(buffer, (char *)page, tsz);
>> 				/*
>> 				 * We cannot distinguish between fault on source
>> 				 * and fault on destination. When this happens
>> 				 * we clear too and hope it will trigger the
>> 				 * EFAULT again.
>> 				 */
>> -				if (n) { 
>> +				if (n) {
>> 					if (clear_user(buffer + tsz - n,
>> 								n))
>> 						return -EFAULT;
>> @@ -540,7 +558,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 		start += tsz;
>> 		tsz = (buflen > PAGE_SIZE ? PAGE_SIZE : buflen);
>> 	}
>> -
>> +	free_page(page);
>> 	return acc;
>> }
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe
  2015-08-04 21:18 ` Dave Hansen
@ 2015-08-05  3:37   ` yalin wang
  0 siblings, 0 replies; 5+ messages in thread
From: yalin wang @ 2015-08-05  3:37 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ingo Molnar, Andrew Morton, David Rientjes, fabf, bhe, open list


> On Aug 5, 2015, at 05:18, Dave Hansen <dave@sr71.net> wrote:
> 
> On 08/03/2015 08:37 PM, yalin wang wrote:
>> This change kcore_read() to use __copy_from_user_inatomic() to
>> copy data from kernel address, because kern_addr_valid() just make sure
>> page table is valid during call it, whne it return, the page table may
>> change, for example, like set_fixmap() function will change kernel page
>> table, then maybe trigger kernel crash if encounter this unluckily.
> 
> I don't see any cases at the moment that will crash.  set_fixmap()
> doesn't ever clear out any ptes, right?
> 
> I guess the root problem here is that we don't have any good (generic)
> locking of kernel page tables inside the linear map.  Can you come up
> with a case where this will _actually_ crash?
> 
Thanks for your comments.
i don’t have crash for this, but when i read code, i see this part not safe,
so i make this patch :).

>> fs/proc/kcore.c | 30 ++++++++++++++++++++++++------
>> 1 file changed, 24 insertions(+), 6 deletions(-)
>> 
>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>> index 92e6726..b085fde 100644
>> --- a/fs/proc/kcore.c
>> +++ b/fs/proc/kcore.c
>> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>> 			size = try;
>> 		*nphdr = *nphdr + 1;
>> 	}
>> -	*elf_buflen =	sizeof(struct elfhdr) + 
>> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
>> +	*elf_buflen =	sizeof(struct elfhdr) +
>> +			(*nphdr + 2)*sizeof(struct elf_phdr) +
> 
> I'm having a hard time spotting the change here.  Whitespace?
i  will seperate in another patch for format correctness.
> 
>> 			3 * ((sizeof(struct elf_note)) +
>> 			     roundup(sizeof(CORE_STR), 4)) +
>> 			roundup(sizeof(struct elf_prstatus), 4) +
>> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	size_t elf_buflen;
>> 	int nphdr;
>> 	unsigned long start;
>> +	unsigned long page = 0;
>> 
>> 	read_lock(&kclist_lock);
>> 	size = get_kcore_size(&nphdr, &elf_buflen);
>> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>> 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>> 		tsz = buflen;
>> -		
>> +
> 
> Please keep the unnecessary whitespace changes for another patch.
> 
>> 	while (buflen) {
>> 		struct kcore_list *m;
>> 
>> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 		} else {
>> 			if (kern_addr_valid(start)) {
>> 				unsigned long n;
>> +				mm_segment_t old_fs = get_fs();
>> +
>> +				if (page == 0) {
>> +					page = __get_free_page(GFP_KERNEL);
>> +					if (page == 0)
>> +						return -ENOMEM;
> 
> FWIW, we usually code this as "!page" instead of "page == 0".  I also
> wouldn't call it 'page'.
> 
> Also, why is this using a raw __get_free_page() while the code above it
> uses a kmalloc()?
> 
because i am using a page size buffer, more efficient  to use __get_free_page()
than  kmalloc() here .

>> -				n = copy_to_user(buffer, (char *)start, tsz);
>> +				}
>> +				set_fs(KERNEL_DS);
>> +				pagefault_disable();
>> +				n = __copy_from_user_inatomic((void *)page,
>> +					(__force const void __user *)start,
>> +					tsz);
>> +				pagefault_enable();
>> +				set_fs(old_fs);
>> +				if (n)
>> +					memset((void *)page + tsz - n, 0, n);
>> +
>> +				n = copy_to_user(buffer, (char *)page, tsz);
> 
> So, first of all, we are using __copy_from_user_inatomic() to copy to
> and from a *kernel* addresses, and it doesn't even get a comment? :)
> 
i will add comment.
> Fundamentally, we're trying to be able to safely survive faults in the
> kernel linear map here.  I think we've got to get a better handle on
> when that happens rather than just paper over it when it does.  (Aside:
> There might actually be a missing use of get_online_mems() here.)
> 
ok.

> Maybe we should just be walking the kernel page tables ourselves and do
> a kmap().  We might have a stale pte but we don't have to worry about
> actual racy updates while we are doing the copy.
> 
so if do like this, we can remove kern_addr_valid() function, and i just walk pte and use get_page_unelss_zero()
to grab the valid page  ?


>> 				/*
>> 				 * We cannot distinguish between fault on source
>> 				 * and fault on destination. When this happens
>> 				 * we clear too and hope it will trigger the
>> 				 * EFAULT again.
>> 				 */
> 
> This comment seems wrong after the patch.
Ok.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-08-05  3:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-04  3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang
2015-08-04 21:18 ` Dave Hansen
2015-08-05  3:37   ` yalin wang
2015-08-04 22:38 ` Andrew Morton
2015-08-05  3:30   ` yalin wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox