All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mmap_vmcore: skip non-ram pages reported by hypervisors
Date: Tue, 08 Jul 2014 10:08:58 +0200	[thread overview]
Message-ID: <87k37o46hx.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20140707133301.dfcc078f416efeb1ada72da9@linux-foundation.org> (Andrew Morton's message of "Mon, 7 Jul 2014 13:33:01 -0700")

Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon,  7 Jul 2014 17:05:49 +0200 Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
>> we have a special check in read_vmcore() handler to check if the page was
>> reported as ram or not by the hypervisor (pfn_is_ram()). However, when
>> vmcore is read with mmap() no such check is performed. That can lead to
>> unpredictable results, e.g. when running Xen PVHVM guest memcpy() after
>> mmap() on /proc/vmcore will hang processing HVMMEM_mmio_dm pages creating
>> enormous load in both DomU and Dom0.
>> 
>> Fix the issue by mapping each non-ram page to the zero page. Keep direct
>> path with remap_oldmem_pfn_range() to avoid looping through all pages on
>> bare metal.
>> 
>> The issue can also be solved by overriding remap_oldmem_pfn_range() in
>> xen-specific code, as remap_oldmem_pfn_range() was been designed for.
>> That, however, would involve non-obvious xen code path for all x86 builds
>> with CONFIG_XEN_PVHVM=y and would prevent all other hypervisor-specific
>> code on x86 arch from doing the same override.
>
> I'd like to get some reviewed-by's and tested-by's on this one please.
>

This patch can be tested with Xen PVHVM guest only as it is the only
platform which registers oldmem_pfn_is_ram atm.

>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -328,6 +328,46 @@ static inline char *alloc_elfnotes_buf(size_t notes_sz)
>>   * virtually contiguous user-space in ELF layout.
>>   */
>>  #ifdef CONFIG_MMU
>> +static u64 remap_oldmem_pfn_checked(struct vm_area_struct *vma, u64 len,
>> +				    unsigned long pfn, unsigned long page_count)
>> +{
>> +	unsigned long pos;
>> +	size_t size;
>> +	unsigned long vma_addr;
>> +	unsigned long emptypage_pfn = __pa(empty_zero_page) >> PAGE_SHIFT;
>
> That's old-school.  Can we use my_zero_pfn() here?
>
> Also, "zeropage_pfn" is a better name - let's not introduce the
> hitherto unknown concept of an "empty page".
>

Sure!

>> +	for (pos = pfn; (pos - pfn) <= page_count; pos++) {
>> +		if (!pfn_is_ram(pos) || (pos - pfn) == page_count) {
>> +			/* we hit a page which is not ram or reached the end */
>> +			if (pos - pfn > 0) {
>> +				/* remapping continuous region */
>> +				size = (pos - pfn) << PAGE_SHIFT;
>> +				vma_addr = vma->vm_start + len;
>> +				if (remap_oldmem_pfn_range(vma, vma_addr,
>> +							   pfn, size,
>> +							   vma->vm_page_prot))
>> +					return len;
>> +				len += size;
>> +				page_count -= (pos - pfn);
>> +			}
>> +			if (page_count > 0) {
>> +				/* we hit a page which is not ram, replacing
>> +				   with an empty one */
>
> I suggest
>
> 				/*
> 				 * We hit a page which is not ram.  Replace it
> 				 * with the zero page.
> 				 */
>

:-)

>> +				vma_addr = vma->vm_start + len;
>> +				if (remap_oldmem_pfn_range(vma, vma_addr,
>> +							   emptypage_pfn,
>> +							   PAGE_SIZE,
>> +							   vma->vm_page_prot))
>> +					return len;
>> +				len += PAGE_SIZE;
>> +				pfn = pos + 1;
>> +				page_count--;
>> +			}
>> +		}
>> +	}
>> +	return len;
>> +}
>
> Also, this loop seems unnecessarily hard to follow.  It *look* like the
> `for' statement has an off-by-one because of the "<=", but page_count
> is mofidied inside the loop!  Despite it being an incoming formal
> argument.

There is no off-by-one error here (I believe) as I'm checking two
possible conditions to remap the continuous region:
1) We hit a non-ram page
2) We reached the end
so we exclude the page pos is pointing us to in both cases.

I tried to avoid code duplication e.g. having one 'remap continuous
region' inside the loop to do the remapping when we hit a non-ram page
and having the other one outside the loop to remap the tail.

>
> None of this is made any easier by the function's lack of
> documentation.  Some description of the incoming args would help, along
> with an overall description of the function's responsibilities.
>
> That being said, can't we just do something nice and simple like
>
> 	pos = pfn;
> 	while (pos < pfn + page_count) {
> 		stuff which advances `pos'
> 	}
>
> ?
>

I completely agree it's possible to make this code easier to
understand, will do.

>>  static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
>>  {
>>  	size_t size = vma->vm_end - vma->vm_start;
>> @@ -383,17 +423,33 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
>>  
>>  	list_for_each_entry(m, &vmcore_list, list) {
>>  		if (start < m->offset + m->size) {
>> -			u64 paddr = 0;
>> +			u64 paddr = 0, original_len;
>> +			unsigned long pfn, page_count;
>>  
>>  			tsz = min_t(size_t, m->offset + m->size - start, size);
>>  			paddr = m->paddr + start - m->offset;
>> -			if (remap_oldmem_pfn_range(vma, vma->vm_start + len,
>> -						   paddr >> PAGE_SHIFT, tsz,
>> -						   vma->vm_page_prot))
>> -				goto fail;
>> +
>> +			/* check if oldmem_pfn_is_ram was registered to avoid
>> +			   looping over all pages without a reason */
>
> Please lay out the comments in the usual fashion:
>
> 	/*
> 	 * ...
> 	 * ..
> 	 */
>
> And sentences start whith upper-case letters!
>
>> +			if (oldmem_pfn_is_ram) {
>> +				pfn = paddr >> PAGE_SHIFT;
>> +				page_count = tsz >> PAGE_SHIFT;
>> +				original_len = len;
>> +				len = remap_oldmem_pfn_checked(vma, len, pfn,
>> +							       page_count);
>> +				if (len != original_len + tsz)
>> +					goto fail;
>> +			} else {
>> +				if (remap_oldmem_pfn_range(vma,
>> +							   vma->vm_start + len,
>> +							   paddr >> PAGE_SHIFT,
>> +							   tsz,
>> +							   vma->vm_page_prot))
>> +					goto fail;
>> +				len += tsz;
>> +			}
>>  			size -= tsz;
>>  			start += tsz;
>> -			len += tsz;
>>  
>>  			if (size == 0)
>>  				return 0;

Thank you very much for your review! I'm working on v2.

-- 
  Vitaly

  reply	other threads:[~2014-07-08  8:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-07 15:05 [PATCH] mmap_vmcore: skip non-ram pages reported by hypervisors Vitaly Kuznetsov
2014-07-07 20:33 ` Andrew Morton
2014-07-07 20:33 ` Andrew Morton
2014-07-08  8:08   ` Vitaly Kuznetsov [this message]
2014-07-08  8:08   ` Vitaly Kuznetsov
2014-07-08 16:25   ` [Xen-devel] " David Vrabel
2014-07-09  9:17     ` Vitaly Kuznetsov
2014-07-09  9:46       ` David Vrabel
2014-07-09  9:46       ` David Vrabel
2014-07-09  9:17     ` Vitaly Kuznetsov
2014-07-08 16:25   ` David Vrabel
2014-07-08 16:29 ` Konrad Rzeszutek Wilk
2014-07-08 16:29 ` [Xen-devel] " Konrad Rzeszutek Wilk
2014-07-09  9:23   ` Vitaly Kuznetsov
2014-07-09  9:23   ` Vitaly Kuznetsov
2014-07-08 19:11 ` Vivek Goyal
2014-07-09  9:46   ` Vitaly Kuznetsov
2014-07-09  9:46   ` Vitaly Kuznetsov
2014-07-09 10:12     ` [Xen-devel] " Olaf Hering
2014-07-09 10:12     ` Olaf Hering
2014-07-08 19:11 ` Vivek Goyal
  -- strict thread matches above, loose matches on Subject: below --
2014-07-07 15:05 Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k37o46hx.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=holzheu@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vgoyal@redhat.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.