* [PATCH] (resent) NUMA node migration
@ 2007-12-21 22:39 Andre Przywara
2007-12-21 22:45 ` Daniel P. Berrange
2007-12-22 0:53 ` John Levon
0 siblings, 2 replies; 7+ messages in thread
From: Andre Przywara @ 2007-12-21 22:39 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 1340 bytes --]
Forgot my Signed-off-by, hence the resend...
The following patch adds NUMA node migration, based on live migration, to
xend. By adding another parameter to "xm migrate", the target NUMA node
number gets propagated to the target host (which can be either localhost or a
remote host). The restore function then sets the VCPU affinity
accordingly. Only changes Python code in xend. I hope that the patch
doesn't break XenAPI compatibility (adding a parameter seems fine?).
# xm migrate --live --node=<nodenr> <domid> localhost
<nodenr> is the number as shown with 'xm info' under node_to_cpu
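To illustrate the flow outside of xend, here is a minimal standalone sketch. It is not the patch itself: plain Python lists stand in for the sxp config, `find_node` and `pin_vcpus` are hypothetical helper names, the two-node topology is made up, and the affinity call is replaced by a callback that only records its arguments.

```python
# Standalone sketch; plain lists stand in for xend's sxp config.

def insert_after(config, pred, value):
    """Insert value after the first sub-list whose first element is pred."""
    for i, item in enumerate(config):
        if isinstance(item, list) and item and item[0] == pred:
            config.insert(i + 1, value)
            return True
    return False


def find_node(config):
    """Return the node number stored in config, or -1 if there is none."""
    for item in config:
        if isinstance(item, list) and len(item) >= 2 and item[0] == 'node':
            return int(item[1])
    return -1


def pin_vcpus(set_affinity, domid, vcpu_count, node_to_cpu, nodenr):
    """Pin every VCPU to the CPUs of nodenr; do nothing for an invalid node."""
    if not 0 <= nodenr < len(node_to_cpu):
        return False
    for vcpu in range(vcpu_count):
        set_affinity(domid, vcpu, node_to_cpu[nodenr])
    return True


# Sending side: tag the config with the requested node.
config = ['domain', ['name', 'guest'], ['vcpus', '2'], ['memory', '512']]
insert_after(config, 'vcpus', ['node', str(1)])

# Receiving side: read the node back and pin the VCPUs.
calls = []
node_to_cpu = [[0, 1], [2, 3]]  # made-up two-node topology
pin_vcpus(lambda d, v, cpus: calls.append((d, v, cpus)), 7, 2,
          node_to_cpu, find_node(config))
print(calls)
```

The node number travels as a string inside the config and is converted back to an integer on the receiving side, so an out-of-range value is simply ignored there.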
I am aware that using live migration isn't the best approach (takes
twice the memory and quite some time), but it's less intrusive and works
fine (given localhost migration stability...)
Feedback appreciated, especially since I have only been speaking Python since Monday...
Regards,
Andre.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy
[-- Attachment #2: numa_migrate4a.diff --]
[-- Type: text/plain, Size: 5320 bytes --]
diff -r 1f4b29eaf7f4 -r 1a9f3e26552d tools/python/xen/xend/XendAPI.py
--- a/tools/python/xen/xend/XendAPI.py Thu Dec 20 17:30:27 2007 +0000
+++ b/tools/python/xen/xend/XendAPI.py Fri Dec 21 17:34:38 2007 +0100
@@ -1761,9 +1761,10 @@ class XendAPI(object):
resource = other_config.get("resource", 0)
port = other_config.get("port", 0)
+ node = other_config.get("node", -1)  # -1 means "no node requested", as in xm
xendom.domain_migrate(xeninfo.getDomid(), destination_url,
- bool(live), resource, port)
+ bool(live), resource, port, node)
return xen_api_success_void()
def VM_save(self, _, vm_ref, dest, checkpoint):
diff -r 1f4b29eaf7f4 -r 1a9f3e26552d tools/python/xen/xend/XendCheckpoint.py
--- a/tools/python/xen/xend/XendCheckpoint.py Thu Dec 20 17:30:27 2007 +0000
+++ b/tools/python/xen/xend/XendCheckpoint.py Fri Dec 21 17:34:38 2007 +0100
@@ -22,6 +22,7 @@ from xen.xend.XendLogging import log
from xen.xend.XendLogging import log
from xen.xend.XendConfig import XendConfig
from xen.xend.XendConstants import *
+from xen.xend import XendNode
SIGNATURE = "LinuxGuestRecord"
QEMU_SIGNATURE = "QemuDeviceModelRecord"
@@ -56,10 +57,23 @@ def read_exact(fd, size, errmsg):
return buf
-def save(fd, dominfo, network, live, dst, checkpoint=False):
+def insert_after(list, pred, value):
+ for i,k in enumerate(list):
+ if type(k) == type([]):
+ if k[0] == pred:
+ list.insert (i+1, value)
+ return
+
+
+def save(fd, dominfo, network, live, dst, checkpoint=False, node=-1):
write_exact(fd, SIGNATURE, "could not write guest state file: signature")
- config = sxp.to_string(dominfo.sxpr())
+ sxprep = dominfo.sxpr()
+
+ if node > -1:
+ insert_after(sxprep,'vcpus',['node', str(node)])
+
+ config = sxp.to_string(sxprep)
domain_name = dominfo.getName()
# Rename the domain temporarily, so that we don't get a name clash if this
@@ -175,6 +189,21 @@ def restore(xd, fd, dominfo = None, paus
dominfo.resume()
else:
dominfo = xd.restore_(vmconfig)
+
+ # repin domain vcpus if a target node number was specified
+ # this is done prior to memory allocation to aide in memory
+ # distribution for NUMA systems.
+ nodenr = -1
+ for i,l in enumerate(vmconfig):
+ if type(l) == type([]):
+ if l[0] == 'node':
+ nodenr = int(l[1])
+
+ if nodenr >= 0:
+ node_to_cpu = XendNode.instance().xc.physinfo()['node_to_cpu']
+ if nodenr < len(node_to_cpu):
+ for v in range(0, dominfo.info['VCPUs_max']):
+ xc.vcpu_setaffinity(dominfo.domid, v, node_to_cpu[nodenr])
store_port = dominfo.getStorePort()
console_port = dominfo.getConsolePort()
diff -r 1f4b29eaf7f4 -r 1a9f3e26552d tools/python/xen/xend/XendDomain.py
--- a/tools/python/xen/xend/XendDomain.py Thu Dec 20 17:30:27 2007 +0000
+++ b/tools/python/xen/xend/XendDomain.py Fri Dec 21 17:34:38 2007 +0100
@@ -1255,7 +1255,7 @@ class XendDomain:
return val
- def domain_migrate(self, domid, dst, live=False, resource=0, port=0):
+ def domain_migrate(self, domid, dst, live=False, resource=0, port=0, node=-1):
"""Start domain migration.
@param domid: Domain ID or Name
@@ -1268,6 +1268,8 @@ class XendDomain:
@type live: bool
@keyword resource: not used??
@rtype: None
+ @keyword node: use node number for target
+ @type node: int
@raise XendError: Failed to migrate
@raise XendInvalidDomain: Domain is not valid
"""
@@ -1296,7 +1298,7 @@ class XendDomain:
sock.send("receive\n")
sock.recv(80)
- XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst)
+ XendCheckpoint.save(sock.fileno(), dominfo, True, live, dst, node=node)
sock.close()
def domain_save(self, domid, dst, checkpoint=False):
diff -r 1f4b29eaf7f4 -r 1a9f3e26552d tools/python/xen/xm/migrate.py
--- a/tools/python/xen/xm/migrate.py Thu Dec 20 17:30:27 2007 +0000
+++ b/tools/python/xen/xm/migrate.py Fri Dec 21 17:34:38 2007 +0100
@@ -43,6 +43,10 @@ gopts.opt('port', short='p', val='portnu
fn=set_int, default=0,
use="Use specified port for migration.")
+gopts.opt('node', short='n', val='nodenum',
+ fn=set_int, default=-1,
+ use="Use specified NUMA node on target.")
+
gopts.opt('resource', short='r', val='MBIT',
fn=set_int, default=0,
use="Set level of resource usage for migration.")
@@ -65,11 +69,13 @@ def main(argv):
vm_ref = get_single_vm(dom)
other_config = {
"port": opts.vals.port,
- "resource": opts.vals.resource
+ "resource": opts.vals.resource,
+ "node": opts.vals.node
}
server.xenapi.VM.migrate(vm_ref, dst, bool(opts.vals.live),
other_config)
else:
server.xend.domain.migrate(dom, dst, opts.vals.live,
opts.vals.resource,
- opts.vals.port)
+ opts.vals.port,
+ opts.vals.node)
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: [PATCH] (resent) NUMA node migration
2007-12-21 22:39 [PATCH] (resent) NUMA node migration Andre Przywara
@ 2007-12-21 22:45 ` Daniel P. Berrange
2007-12-22 0:53 ` John Levon
1 sibling, 0 replies; 7+ messages in thread
From: Daniel P. Berrange @ 2007-12-21 22:45 UTC (permalink / raw)
To: Andre Przywara; +Cc: xen-devel
On Fri, Dec 21, 2007 at 11:39:58PM +0100, Andre Przywara wrote:
> Forgot my Signed-off-by, hence the resend...
>
> The following patch adds NUMA node migration, based on live migration, to
> xend. By adding another parameter to "xm migrate", the target NUMA node
> number gets propagated to the target host (which can be either localhost or a
> remote host). The restore function then sets the VCPU affinity
> accordingly. Only changes Python code in xend. I hope that the patch
> doesn't break XenAPI compatibility (adding a parameter seems fine?).
>
> # xm migrate --live --node=<nodenr> <domid> localhost
> <nodenr> is the number as shown with 'xm info' under node_to_cpu
>
> I am aware that using live migration isn't the best approach (takes
> twice the memory and quite some time), but it's less intrusive and works
> fine (given localhost migration stability...)
>
> Feedback appreciated, especially since I have only been speaking Python since Monday...
Rather than using '-1' to indicate no pinning, it is more common
Python practice to use None, which indicates no value. Other than
that, it looks reasonable.
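Something along these lines, as an untested sketch only: `add_node_hint` is a made-up name rather than anything in the patch, and a plain list stands in for the sxp config.

```python
def add_node_hint(config, node=None):
    """Return a copy of config with a ['node', N] entry when node is given."""
    result = list(config)
    # Compare against None explicitly: node 0 is a valid target and a
    # plain "if node:" would silently drop it.
    if node is not None:
        result.append(['node', str(node)])
    return result


base = ['domain', ['vcpus', '2']]
print(add_node_hint(base))       # no pinning requested, config unchanged
print(add_node_hint(base, 0))    # node 0 is kept
```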
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
* Re: [PATCH] (resent) NUMA node migration
2007-12-21 22:39 [PATCH] (resent) NUMA node migration Andre Przywara
2007-12-21 22:45 ` Daniel P. Berrange
@ 2007-12-22 0:53 ` John Levon
2007-12-22 1:15 ` Ian Pratt
2008-01-09 14:35 ` Andre Przywara
1 sibling, 2 replies; 7+ messages in thread
From: John Levon @ 2007-12-22 0:53 UTC (permalink / raw)
To: Andre Przywara; +Cc: xen-devel
On Fri, Dec 21, 2007 at 11:39:58PM +0100, Andre Przywara wrote:
> accordingly. Only changes Python code in xend. I hope that the patch
> doesn't break XenAPI compatibility (adding a parameter seems fine?).
>
> # xm migrate --live --node=<nodenr> <domid> localhost
> <nodenr> is the number as shown with 'xm info' under node_to_cpu
>
> I am aware that using live migration isn't the best approach (takes
> twice the memory and quite some time), but it's less intrusive and works
> fine (given localhost migration stability...)
Is this really using localhost live migration to move a domain from one
NUMA node to another on the same host? Why isn't there a simpler way?
cheers
john
* RE: [PATCH] (resent) NUMA node migration
2007-12-22 0:53 ` John Levon
@ 2007-12-22 1:15 ` Ian Pratt
2007-12-22 9:42 ` tgh
2008-01-09 14:35 ` Andre Przywara
1 sibling, 1 reply; 7+ messages in thread
From: Ian Pratt @ 2007-12-22 1:15 UTC (permalink / raw)
To: John Levon, Andre Przywara; +Cc: Ian Pratt, xen-devel
> > I am aware that using live migration isn't the best approach (takes
> > twice the memory and quite some time), but it's less intrusive and works
> > fine (given localhost migration stability...)
>
> Is this really using localhost live migration to move a domain from one
> NUMA node to another on the same host?
Yep, it really is.
> Why isn't there a simpler way?
Well, you can't beat using live migration for code simplicity :)
Doing page migration for HVM guests is easy as you can just stop the
guest, copy a bunch of pages and update the p2m table, flush the shadow
page table cache and all VCPUs' TLBs, then resume. Doing it live is a
little tricky as you have to go from MFNs to PTEs, which currently
requires a full shadow page table scan. [Though some of the experimental
page sharing patches maintain linked lists of backpointers, and we could
switch to a shadow mode that supports this while doing page migration]
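As a rough illustration of the non-live sequence, here is a toy model in Python. It is only a sketch under heavy assumptions: `ToyGuest` and `migrate_to_node` are invented names, the p2m is a plain dict, and the shadow/TLB flush is represented by a counter. Nothing here corresponds to actual hypervisor code.

```python
class ToyGuest:
    """Toy stand-in for a guest: a p2m table plus per-frame NUMA placement."""

    def __init__(self, p2m, frame_node, free_frames):
        self.p2m = dict(p2m)                # guest PFN -> machine frame (MFN)
        self.frame_node = dict(frame_node)  # MFN -> NUMA node
        self.free_frames = {n: list(f) for n, f in free_frames.items()}
        self.memory = {mfn: "data-%d" % mfn for mfn in self.p2m.values()}
        self.running = True
        self.flushes = 0


def migrate_to_node(guest, target):
    """Move every page that is not on target; return the number moved."""
    guest.running = False                            # stop the guest
    free = guest.free_frames.setdefault(target, [])
    moved = 0
    for pfn, old in sorted(guest.p2m.items()):
        if guest.frame_node[old] == target or not free:
            continue                                 # well placed, or no room
        new = free.pop(0)
        guest.memory[new] = guest.memory.pop(old)    # copy the page
        guest.frame_node[new] = target
        guest.p2m[pfn] = new                         # update the p2m entry
        guest.free_frames.setdefault(guest.frame_node[old], []).append(old)
        moved += 1
    guest.flushes += 1        # stands in for the shadow cache and TLB flush
    guest.running = True                             # resume
    return moved


guest = ToyGuest(p2m={0: 10, 1: 11, 2: 20},
                 frame_node={10: 0, 11: 0, 20: 1},
                 free_frames={1: [21, 22]})
moved = migrate_to_node(guest, 1)
print(moved, guest.p2m)
```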
PV guests are a little more challenging as all references in the
direct-mode page tables need to be updated. We also need to make sure
that the guest isn't holding MFNs outside of pagetables, so we need to
get all the VCPUs into a known state. The best way of handling this is
to use the PV fast checkpoint support to freeze the guest, copy the
badly located pages, scan and update all pagetables, resume from
checkpoint. This would make a nice little project for someone...
Ian
* Re: [PATCH] (resent) NUMA node migration
2007-12-22 1:15 ` Ian Pratt
@ 2007-12-22 9:42 ` tgh
2007-12-23 21:12 ` Mark Williamson
0 siblings, 1 reply; 7+ messages in thread
From: tgh @ 2007-12-22 9:42 UTC (permalink / raw)
To: Ian Pratt; +Cc: Andre Przywara, xen-devel, John Levon
Hi,
For PV guests, is there a fast checkpoint mechanism available? How is it
invoked, that is, which parameters on the "xm migrate" command line select
it? And how does it work: is fast checkpointing for PV done incrementally,
as in live migration, or in some other way?
Thanks in advance
Ian Pratt wrote:
>
> Doing page migration for HVM guests is easy as you can just stop the
> guest, copy a bunch of pages and update the p2m table, flush the shadow
> page table cache and all VCPU's TLBs, then resume. Doing it live is a
> little tricky as you have to go from MFN's to PTE's which currently
> requires a full shadow page table scan. [Though some of the experimental
> page sharing patches maintain linked lists of backpointers, and we could
> switch to a shadow mode that supports this while doing page migration]
>
> PV guests are a little more challenging as all references in the
> direct-mode page tables need to be updated. We also need to make sure
> that the guest isn't holding MFNs outside of pagetables, so we need to
> get all the VCPUs into a known state. The best way of handling this is
> to use the PV fast checkpoint support to freeze the guest, copy the
> badly located pages, scan and update all pagetables, resume from
> checkpoint. This would make a nice little project for someone...
>
> Ian
* Re: [PATCH] (resent) NUMA node migration
2007-12-22 9:42 ` tgh
@ 2007-12-23 21:12 ` Mark Williamson
0 siblings, 0 replies; 7+ messages in thread
From: Mark Williamson @ 2007-12-23 21:12 UTC (permalink / raw)
To: xen-devel; +Cc: Andre Przywara, Ian Pratt, tgh, John Levon
> For PV guests, is there a fast checkpoint mechanism available? How is it
> invoked, that is, which parameters on the "xm migrate" command line select
> it?
xm save -c does a checkpoint
The "Checkpoint" operation means that the guest is resumed immediately after
the save has completed, rather than being stopped or migrated.
> And how does it work: is fast checkpointing for PV done incrementally,
> as in live migration, or in some other way?
It's not an incremental process in the case of a checkpoint: at the moment it
still stops the guest's execution for the period of the save - just like a
normal xm save does.
After the save completes, the guest is instructed to resume its own execution.
Cheers,
Mark
>
> Thanks in advance
>
> Ian Pratt wrote:
> > Doing page migration for HVM guests is easy as you can just stop the
> > guest, copy a bunch of pages and update the p2m table, flush the shadow
> > page table cache and all VCPU's TLBs, then resume. Doing it live is a
> > little tricky as you have to go from MFN's to PTE's which currently
> > requires a full shadow page table scan. [Though some of the experimental
> > page sharing patches maintain linked lists of backpointers, and we could
> > switch to a shadow mode that supports this while doing page migration]
> >
> > PV guests are a little more challenging as all references in the
> > direct-mode page tables need to be updated. We also need to make sure
> > that the guest isn't holding MFNs outside of pagetables, so we need to
> > get all the VCPUs into a known state. The best way of handling this is
> > to use the PV fast checkpoint support to freeze the guest, copy the
> > badly located pages, scan and update all pagetables, resume from
> > checkpoint. This would make a nice little project for someone...
> >
> > Ian
--
Dave: Just a question. What use is a unicyle with no seat? And no pedals!
Mark: To answer a question with a question: What use is a skateboard?
Dave: Skateboards have wheels.
Mark: My wheel has a wheel!
* Re: [PATCH] (resent) NUMA node migration
2007-12-22 0:53 ` John Levon
2007-12-22 1:15 ` Ian Pratt
@ 2008-01-09 14:35 ` Andre Przywara
1 sibling, 0 replies; 7+ messages in thread
From: Andre Przywara @ 2008-01-09 14:35 UTC (permalink / raw)
To: John Levon; +Cc: Ian.Pratt, xen-devel
John Levon wrote:
> On Fri, Dec 21, 2007 at 11:39:58PM +0100, Andre Przywara wrote:
>
>> accordingly. Only changes Python code in xend. I hope that the patch
>> doesn't break XenAPI compatibility (adding a parameter seems fine?).
>>
>> # xm migrate --live --node=<nodenr> <domid> localhost
>> <nodenr> is the number as shown with 'xm info' under node_to_cpu
>>
>> I am aware that using live migration isn't the best approach (takes
>> twice the memory and quite some time),
After thinking more about this, I came to the conclusion that this is
not entirely true. If you want to transfer the guest from one node (with
its own memory) to another node (with its own memory), then you would
need to have this amount of memory available at the target anyway.
The performance problem can perhaps be overcome by not using TCP/IP over
loopback, but a more direct solution (or by reverting to a UNIX socket).
>> but it's less intrusive and works
>> fine (given localhost migration stability...)
>
> Is this really using localhost live migration to move a domain from one
> NUMA node to another on the same host? Why isn't there a simpler way?
Well, Ian described the "simpler way" (thanks for that), I would call it
more elegant and efficient, but simpler is not the word that comes to
mind...
One advantage of this solution is that the guest stays the same and
there is no interruption. Also one could think about migrating hot pages
first and letting the rest be done in the background more slowly (this
is how VMware does this).
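To make the "hot pages first" idea a bit more concrete, here is a small sketch of how the ordering could be planned. It is purely illustrative: `plan_migration`, the per-page access counts and the `hot_fraction` cut-off are all assumptions of mine, and how hotness would actually be measured is left open.

```python
def plan_migration(access_counts, hot_fraction=0.25):
    """Split pages into a hot batch (moved first) and a background batch.

    access_counts maps a page number to a hypothetical access counter.
    """
    # Most frequently accessed pages first; ties broken by page number.
    ordered = sorted(access_counts, key=lambda p: (-access_counts[p], p))
    hot_n = max(1, int(len(ordered) * hot_fraction)) if ordered else 0
    return ordered[:hot_n], ordered[hot_n:]


counts = {0: 5, 1: 90, 2: 1, 3: 40}
hot, background = plan_migration(counts, hot_fraction=0.5)
print(hot, background)
```

The hot batch would be moved right away, while the background batch could be moved slowly without the guest noticing much.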
Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 277-84917
----to satisfy European Law for business letters:
AMD Saxony Limited Liability Company & Co. KG,
Wilschdorfer Landstr. 101, 01109 Dresden, Germany
Register Court Dresden: HRA 4896, General Partner authorized
to represent: AMD Saxony LLC (Wilmington, Delaware, US)
General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy