Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Hansen <dave@sr71.net>
To: yalin wang <yalin.wang2010@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	fabf@skynet.be, bhe@redhat.com,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe
Date: Tue, 04 Aug 2015 14:18:35 -0700	[thread overview]
Message-ID: <55C12C2B.90509@sr71.net> (raw)
In-Reply-To: <593904DC-7E60-4B95-B82F-B50A5024085C@gmail.com>

On 08/03/2015 08:37 PM, yalin wang wrote:
> This change kcore_read() to use __copy_from_user_inatomic() to
> copy data from kernel address, because kern_addr_valid() just make sure
> page table is valid during call it, whne it return, the page table may
> change, for example, like set_fixmap() function will change kernel page
> table, then maybe trigger kernel crash if encounter this unluckily.

I don't see any cases at the moment that will crash.  set_fixmap()
doesn't ever clear out any ptes, right?

I guess the root problem here is that we don't have any good (generic)
locking of kernel page tables inside the linear map.  Can you come up
with a case where this will _actually_ crash?

>  fs/proc/kcore.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> index 92e6726..b085fde 100644
> --- a/fs/proc/kcore.c
> +++ b/fs/proc/kcore.c
> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>  			size = try;
>  		*nphdr = *nphdr + 1;
>  	}
> -	*elf_buflen =	sizeof(struct elfhdr) + 
> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
> +	*elf_buflen =	sizeof(struct elfhdr) +
> +			(*nphdr + 2)*sizeof(struct elf_phdr) +

I'm having a hard time spotting the change here.  Whitespace?

>  			3 * ((sizeof(struct elf_note)) +
>  			     roundup(sizeof(CORE_STR), 4)) +
>  			roundup(sizeof(struct elf_prstatus), 4) +
> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	size_t elf_buflen;
>  	int nphdr;
>  	unsigned long start;
> +	unsigned long page = 0;
>  
>  	read_lock(&kclist_lock);
>  	size = get_kcore_size(&nphdr, &elf_buflen);
> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>  	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>  		tsz = buflen;
> -		
> +

Please keep the unnecessary whitespace changes for another patch.

>  	while (buflen) {
>  		struct kcore_list *m;
>  
> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>  		} else {
>  			if (kern_addr_valid(start)) {
>  				unsigned long n;
> +				mm_segment_t old_fs = get_fs();
> +
> +				if (page == 0) {
> +					page = __get_free_page(GFP_KERNEL);
> +					if (page == 0)
> +						return -ENOMEM;

FWIW, we usually code this as "!page" instead of "page == 0".  I also
wouldn't call it 'page'.

Also, why is this using a raw __get_free_page() while the code above it
uses a kmalloc()?

> -				n = copy_to_user(buffer, (char *)start, tsz);
> +				}
> +				set_fs(KERNEL_DS);
> +				pagefault_disable();
> +				n = __copy_from_user_inatomic((void *)page,
> +					(__force const void __user *)start,
> +					tsz);
> +				pagefault_enable();
> +				set_fs(old_fs);
> +				if (n)
> +					memset((void *)page + tsz - n, 0, n);
> +
> +				n = copy_to_user(buffer, (char *)page, tsz);

So, first of all, we are using __copy_from_user_inatomic() to copy to
and from a *kernel* addresses, and it doesn't even get a comment? :)

Fundamentally, we're trying to be able to safely survive faults in the
kernel linear map here.  I think we've got to get a better handle on
when that happens rather than just paper over it when it does.  (Aside:
There might actually be a missing use of get_online_mems() here.)

Maybe we should just be walking the kernel page tables ourselves and do
a kmap().  We might have a stale pte but we don't have to worry about
actual racy updates while we are doing the copy.

>  				/*
>  				 * We cannot distinguish between fault on source
>  				 * and fault on destination. When this happens
>  				 * we clear too and hope it will trigger the
>  				 * EFAULT again.
>  				 */

This comment seems wrong after the patch.

next prev parent reply	other threads:[~2015-08-04 21:18 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-04  3:37 [RFC] kcore:change kcore_read to make sure the kernel read is safe yalin wang
2015-08-04 21:18 ` Dave Hansen [this message]
2015-08-05  3:37   ` yalin wang
2015-08-04 22:38 ` Andrew Morton
2015-08-05  3:30   ` yalin wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55C12C2B.90509@sr71.net \
    --to=dave@sr71.net \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=fabf@skynet.be \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=rientjes@google.com \
    --cc=yalin.wang2010@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.