From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 2/4] live migration support for initial write protect of VM
Date: Mon, 19 May 2014 18:56:48 +0100 [thread overview]
Message-ID: <20140519175648.GB5292@lvm> (raw)
In-Reply-To: <53768584.4060705@samsung.com>
On Fri, May 16, 2014 at 02:39:16PM -0700, Mario Smarduch wrote:
> Hi Christoffer,
> few more comments
> >>> struct vgic_dist vgic;
> >>> + /* Marks start of migration, used to handle 2nd stage page faults
> >>> + * during migration, prevent installing huge pages and split huge pages
> >>> + * to small pages.
> >>> + */
> >>
> >> commenting style
> >>
> >> this is a bit verbose for a field in a struct, perhaps moving the longer
> >> version to where you set this?
> > Will do.
> >>
> >>> + int migration_in_progress;
> >>> };
>
> I think this flag could be removed all together. Migration can be
> stopped at any time (started too), through user request or other events.
> When that happens (like migrate_cancel) migrate cleanup bh runs and eventually calls
> KVM memory listener kvm_log_global_start() (cancel handler)
> that stops logging, clears KVM_MEM_LOG_DIRTY_PAGES, and region ops ioctl,
> clears dirty_bitmap. In either case dirty_bitmap for memslot is set or
> unset during migration to track dirty pages, following that field seems to be
> a better way to keep track of migration. This again is QEMU view but it appears
> all these policies are driven from user space.
>
ok, I need to look more closely at the whole thing to properly comment
on this.
>
>
> >>>
> >>> +/* kvm_split_pmd - splits huge pages to small pages, required to keep a dirty
> >>> + * log of smaller memory granules, otherwise huge pages would need to be
> >>> + * migrated. Practically an idle system has problems migrating with
> >>
> >> This seems abrupt. Why can't we just represent a 2M huge page as 512 4K
> >> bits and write protect the huge pages, if you take a write fault on a 2M
> >> page, then split it then.
> >
> > That's one alternative the one I put into v6 is clear the PMD
> > and force user_mem_abort() to fault in 4k pages, and mark the
> > dirty_bitmap[] for that page, reuse the current code. Have not
> > checked the impact on performance, it takes few seconds longer
> > to converge for the tests I'm running.
>
> I was thinking about this and if PMD attributes need to be passed
> onto the PTEs then it appears what you recommend is required.
> But during run time I don't see how 2nd stage attributes can
> change, could the guest do anything to change them (SH, Memattr)?
You should be able to just grab the kvm_mmu lock, update the stage-2
page tables to remove all writable bits, flush all Stage-2 TLBs for that
VMID, and you should be all set.
>
>
> Performance may also be other reason but that always depends
> on the load, clearing a PMD seems easier and reuses current code.
> Probably several load tests/benchmarks can help here.
> Also noticed hw PMD/PTE attributes differ a little which
> is not significant now, but moving forward different page size
> and any new revisions to fields may require additional maintenance.
I think clearing out all PMD mappings will carry a significant
performance degradation on the source VM, and in the case you keep it
running, will be quite unfortunate. Hint: Page faults are expensive and
huge pages have shown to give about 10-15% performance increase on ARMv7
for CPU/memory intensive benchmarks.
>
> I'll be out next week and back 26'th, I'll create a link with
> details on test environment and tests. The cover letter will
> will go through general overview only.
>
ok, I have some time then.
-Christoffer
>
> >
> >>
> >> If your use case is HA, then you will be doing this a lot, and you don't
> >> want to hurt performance of your main live system more than necessary.
> >
> >>
> >>> + * huge pages. Called during WP of entire VM address space, done
> >>
>
>
WARNING: multiple messages have this Message-ID (diff)
From: Christoffer Dall <christoffer.dall@linaro.org>
To: Mario Smarduch <m.smarduch@samsung.com>
Cc: kvmarm@lists.cs.columbia.edu, marc.zyngier@arm.com,
steve.capper@arm.com, kvm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, gavin.guo@canonical.com,
peter.maydell@linaro.org, jays.lee@samsung.com,
sungjinn.chung@samsung.com
Subject: Re: [PATCH v5 2/4] live migration support for initial write protect of VM
Date: Mon, 19 May 2014 18:56:48 +0100 [thread overview]
Message-ID: <20140519175648.GB5292@lvm> (raw)
In-Reply-To: <53768584.4060705@samsung.com>
On Fri, May 16, 2014 at 02:39:16PM -0700, Mario Smarduch wrote:
> Hi Christoffer,
> few more comments
> >>> struct vgic_dist vgic;
> >>> + /* Marks start of migration, used to handle 2nd stage page faults
> >>> + * during migration, prevent installing huge pages and split huge pages
> >>> + * to small pages.
> >>> + */
> >>
> >> commenting style
> >>
> >> this is a bit verbose for a field in a struct, perhaps moving the longer
> >> version to where you set this?
> > Will do.
> >>
> >>> + int migration_in_progress;
> >>> };
>
> I think this flag could be removed all together. Migration can be
> stopped at any time (started too), through user request or other events.
> When that happens (like migrate_cancel) migrate cleanup bh runs and eventually calls
> KVM memory listener kvm_log_global_start() (cancel handler)
> that stops logging, clears KVM_MEM_LOG_DIRTY_PAGES, and region ops ioctl,
> clears dirty_bitmap. In either case dirty_bitmap for memslot is set or
> unset during migration to track dirty pages, following that field seems to be
> a better way to keep track of migration. This again is QEMU view but it appears
> all these policies are driven from user space.
>
ok, I need to look more closely at the whole thing to properly comment
on this.
>
>
> >>>
> >>> +/* kvm_split_pmd - splits huge pages to small pages, required to keep a dirty
> >>> + * log of smaller memory granules, otherwise huge pages would need to be
> >>> + * migrated. Practically an idle system has problems migrating with
> >>
> >> This seems abrupt. Why can't we just represent a 2M huge page as 512 4K
> >> bits and write protect the huge pages, if you take a write fault on a 2M
> >> page, then split it then.
> >
> > That's one alternative the one I put into v6 is clear the PMD
> > and force user_mem_abort() to fault in 4k pages, and mark the
> > dirty_bitmap[] for that page, reuse the current code. Have not
> > checked the impact on performance, it takes few seconds longer
> > to converge for the tests I'm running.
>
> I was thinking about this and if PMD attributes need to be passed
> onto the PTEs then it appears what you recommend is required.
> But during run time I don't see how 2nd stage attributes can
> change, could the guest do anything to change them (SH, Memattr)?
You should be able to just grab the kvm_mmu lock, update the stage-2
page tables to remove all writable bits, flush all Stage-2 TLBs for that
VMID, and you should be all set.
>
>
> Performance may also be other reason but that always depends
> on the load, clearing a PMD seems easier and reuses current code.
> Probably several load tests/benchmarks can help here.
> Also noticed hw PMD/PTE attributes differ a little which
> is not significant now, but moving forward different page size
> and any new revisions to fields may require additional maintenance.
I think clearing out all PMD mappings will carry a significant
performance degradation on the source VM, and in the case you keep it
running, will be quite unfortunate. Hint: Page faults are expensive and
huge pages have shown to give about 10-15% performance increase on ARMv7
for CPU/memory intensive benchmarks.
>
> I'll be out next week and back 26'th, I'll create a link with
> details on test environment and tests. The cover letter will
> will go through general overview only.
>
ok, I have some time then.
-Christoffer
>
> >
> >>
> >> If your use case is HA, then you will be doing this a lot, and you don't
> >> want to hurt performance of your main live system more than necessary.
> >
> >>
> >>> + * huge pages. Called during WP of entire VM address space, done
> >>
>
>
next prev parent reply other threads:[~2014-05-19 17:56 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-08 0:40 [PATCH v5 0/4] live migration dirty bitmap support for ARMv7 Mario Smarduch
2014-05-08 0:40 ` Mario Smarduch
2014-05-08 0:40 ` [PATCH v5 1/4] add ARMv7 HYP API to flush VM TLBs without address param Mario Smarduch
2014-05-08 0:40 ` Mario Smarduch
2014-05-14 16:47 ` Christoffer Dall
2014-05-14 16:47 ` Christoffer Dall
2014-05-15 2:00 ` Mario Smarduch
2014-05-15 2:00 ` Mario Smarduch
2014-05-15 18:50 ` Christoffer Dall
2014-05-15 18:50 ` Christoffer Dall
2014-05-08 0:40 ` [PATCH v5 2/4] live migration support for initial write protect of VM Mario Smarduch
2014-05-08 0:40 ` Mario Smarduch
2014-05-15 18:53 ` Christoffer Dall
2014-05-15 18:53 ` Christoffer Dall
2014-05-15 22:51 ` Mario Smarduch
2014-05-15 22:51 ` Mario Smarduch
2014-05-16 21:39 ` Mario Smarduch
2014-05-16 21:39 ` Mario Smarduch
2014-05-19 17:56 ` Christoffer Dall [this message]
2014-05-19 17:56 ` Christoffer Dall
2014-05-30 16:48 ` Mario Smarduch
2014-05-30 16:48 ` Mario Smarduch
2014-05-08 0:40 ` [PATCH v5 3/4] live migration support for VM dirty log management Mario Smarduch
2014-05-08 0:40 ` Mario Smarduch
2014-05-08 0:40 ` [PATCH v5 4/4] add 2nd stage page fault handling during live migration Mario Smarduch
2014-05-08 0:40 ` Mario Smarduch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140519175648.GB5292@lvm \
--to=christoffer.dall@linaro.org \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.