All of lore.kernel.org
 help / color / mirror / Atom feed
From: m.smarduch@samsung.com (Mario Smarduch)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 2/4] live migration support for initial write protect of VM
Date: Fri, 16 May 2014 14:39:16 -0700	[thread overview]
Message-ID: <53768584.4060705@samsung.com> (raw)
In-Reply-To: <537544D8.2030005@samsung.com>

Hi Christoffer,
  few more comments
>>>  	struct vgic_dist	vgic;
>>> +	/* Marks start of migration, used to handle 2nd stage page faults
>>> +	 * during migration, prevent installing huge pages and split huge pages
>>> +	 * to small pages.
>>> +	 */
>>
>> commenting style
>>
>> this is a bit verbose for a field in a struct, perhaps moving the longer
>> version to where you set this?
> Will do.
>>
>>> +	int migration_in_progress;
>>>  };

I think this flag could be removed all together. Migration can be
stopped at any time (started too), through user request or other events. 
When that happens (like migrate_cancel) migrate cleanup bh runs and eventually calls 
KVM memory listener kvm_log_global_start() (cancel handler) 
that stops logging, clears KVM_MEM_LOG_DIRTY_PAGES, and region ops ioctl,
clears dirty_bitmap. In either case dirty_bitmap for memslot is set or 
unset during migration to track dirty pages, following that field seems to be 
a better way to keep track of migration. This again is QEMU view but it appears 
all these policies are driven from user space.



>>>  
>>> +/* kvm_split_pmd - splits huge pages to small pages, required to keep a dirty
>>> + *	log of smaller memory granules, otherwise huge pages would need to be
>>> + *	migrated. Practically an idle system has problems migrating with
>>
>> This seems abrupt.  Why can't we just represent a 2M huge page as 512 4K
>> bits and write protect the huge pages, if you take a write fault on a 2M
>> page, then split it then.
> 
> That's one alternative the one I put into v6 is clear the PMD
> and force user_mem_abort() to fault in 4k pages, and mark the
> dirty_bitmap[] for that page, reuse the current code. Have not
> checked the impact on performance, it takes few seconds longer
> to converge for the tests I'm running. 

I was thinking about this and if PMD attributes need to be passed
onto the PTEs then it appears what you recommend is required.
But during run time I don't see how 2nd stage attributes can
change, could the guest do anything to change them (SH, Memattr)?


Performance may also be other reason but that always depends
on the load, clearing a PMD seems easier and reuses current code.
Probably several load tests/benchmarks can help here.
Also noticed hw PMD/PTE attributes differ a little which
is not significant now, but moving forward different page size
and any new revisions to fields may require additional maintenance.

I'll be out next week and back 26'th, I'll create a link with
details on test environment and tests. The cover letter will
will go through general overview only.

Thanks,
  Mario

> 
>>
>> If your use case is HA, then you will be doing this a lot, and you don't
>> want to hurt performance of your main live system more than necessary.
> 
>>
>>> + *	huge pages.  Called during WP of entire VM address space, done
>>

WARNING: multiple messages have this Message-ID (diff)
From: Mario Smarduch <m.smarduch@samsung.com>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com,
	steve.capper@arm.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, gavin.guo@canonical.com,
	peter.maydell@linaro.org, jays.lee@samsung.com,
	sungjinn.chung@samsung.com
Subject: Re: [PATCH v5 2/4] live migration support for initial write protect of VM
Date: Fri, 16 May 2014 14:39:16 -0700	[thread overview]
Message-ID: <53768584.4060705@samsung.com> (raw)
In-Reply-To: <537544D8.2030005@samsung.com>

Hi Christoffer,
  few more comments
>>>  	struct vgic_dist	vgic;
>>> +	/* Marks start of migration, used to handle 2nd stage page faults
>>> +	 * during migration, prevent installing huge pages and split huge pages
>>> +	 * to small pages.
>>> +	 */
>>
>> commenting style
>>
>> this is a bit verbose for a field in a struct, perhaps moving the longer
>> version to where you set this?
> Will do.
>>
>>> +	int migration_in_progress;
>>>  };

I think this flag could be removed all together. Migration can be
stopped at any time (started too), through user request or other events. 
When that happens (like migrate_cancel) migrate cleanup bh runs and eventually calls 
KVM memory listener kvm_log_global_start() (cancel handler) 
that stops logging, clears KVM_MEM_LOG_DIRTY_PAGES, and region ops ioctl,
clears dirty_bitmap. In either case dirty_bitmap for memslot is set or 
unset during migration to track dirty pages, following that field seems to be 
a better way to keep track of migration. This again is QEMU view but it appears 
all these policies are driven from user space.



>>>  
>>> +/* kvm_split_pmd - splits huge pages to small pages, required to keep a dirty
>>> + *	log of smaller memory granules, otherwise huge pages would need to be
>>> + *	migrated. Practically an idle system has problems migrating with
>>
>> This seems abrupt.  Why can't we just represent a 2M huge page as 512 4K
>> bits and write protect the huge pages, if you take a write fault on a 2M
>> page, then split it then.
> 
> That's one alternative the one I put into v6 is clear the PMD
> and force user_mem_abort() to fault in 4k pages, and mark the
> dirty_bitmap[] for that page, reuse the current code. Have not
> checked the impact on performance, it takes few seconds longer
> to converge for the tests I'm running. 

I was thinking about this and if PMD attributes need to be passed
onto the PTEs then it appears what you recommend is required.
But during run time I don't see how 2nd stage attributes can
change, could the guest do anything to change them (SH, Memattr)?


Performance may also be other reason but that always depends
on the load, clearing a PMD seems easier and reuses current code.
Probably several load tests/benchmarks can help here.
Also noticed hw PMD/PTE attributes differ a little which
is not significant now, but moving forward different page size
and any new revisions to fields may require additional maintenance.

I'll be out next week and back 26'th, I'll create a link with
details on test environment and tests. The cover letter will
will go through general overview only.

Thanks,
  Mario

> 
>>
>> If your use case is HA, then you will be doing this a lot, and you don't
>> want to hurt performance of your main live system more than necessary.
> 
>>
>>> + *	huge pages.  Called during WP of entire VM address space, done
>>



  reply	other threads:[~2014-05-16 21:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-08  0:40 [PATCH v5 0/4] live migration dirty bitmap support for ARMv7 Mario Smarduch
2014-05-08  0:40 ` Mario Smarduch
2014-05-08  0:40 ` [PATCH v5 1/4] add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
2014-05-08  0:40   ` Mario Smarduch
2014-05-14 16:47   ` Christoffer Dall
2014-05-14 16:47     ` Christoffer Dall
2014-05-15  2:00     ` Mario Smarduch
2014-05-15  2:00       ` Mario Smarduch
2014-05-15 18:50       ` Christoffer Dall
2014-05-15 18:50         ` Christoffer Dall
2014-05-08  0:40 ` [PATCH v5 2/4] live migration support for initial write protect of VM Mario Smarduch
2014-05-08  0:40   ` Mario Smarduch
2014-05-15 18:53   ` Christoffer Dall
2014-05-15 18:53     ` Christoffer Dall
2014-05-15 22:51     ` Mario Smarduch
2014-05-15 22:51       ` Mario Smarduch
2014-05-16 21:39       ` Mario Smarduch [this message]
2014-05-16 21:39         ` Mario Smarduch
2014-05-19 17:56         ` Christoffer Dall
2014-05-19 17:56           ` Christoffer Dall
2014-05-30 16:48     ` Mario Smarduch
2014-05-30 16:48       ` Mario Smarduch
2014-05-08  0:40 ` [PATCH v5 3/4] live migration support for VM dirty log management Mario Smarduch
2014-05-08  0:40   ` Mario Smarduch
2014-05-08  0:40 ` [PATCH v5 4/4] add 2nd stage page fault handling during live migration Mario Smarduch
2014-05-08  0:40   ` Mario Smarduch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53768584.4060705@samsung.com \
    --to=m.smarduch@samsung.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.