* txenmon: Cluster monitoring/management
@ 2004-02-10 5:25 stevegt
2004-02-10 5:41 ` stevegt
2004-02-10 8:06 ` Ian Pratt
0 siblings, 2 replies; 9+ messages in thread
From: stevegt @ 2004-02-10 5:25 UTC
To: Ian Pratt; +Cc: Christian Limpach, Keir Fraser, xen-devel
On Sun, Feb 08, 2004 at 09:19:56AM +0000, Ian Pratt wrote:
> Of course, this will all be much neater in rev 3 of the domain
> control tools that will use a db backend to maintain state about
> currently running domains across a cluster...
Ack! We might be doing duplicate work. How far have you gotten with
this?
Right now I'm running python code (distantly descended from
createlinuxdom.py) that is able to:
- monitor each domain and restart as needed
- migrate domains from one host to another
- dynamically create and assign swap vd's, and garbage collect them at
shutdown or after crash
...and a few other things. Right now migration is via reboot, not
suspend; haven't had a chance to troubleshoot resume further.
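The per-guest decision that the script below makes on each pass of its loop can be condensed into one pure function. This is only a sketch in Python 3 syntax, not code from txenmon itself: the name `decide` and the action strings are made up for illustration, and the boolean inputs stand in for the results of `isMine()`, `isRunnable()`, `isRunningHere()`, `isRunning()` and `isHung()`.

```python
def decide(mine, runnable, running_here, running_elsewhere, hung):
    """Return the actions one monitor pass would take for a single guest.

    Hypothetical condensation of the state machine in main(); the action
    names are illustrative labels, not txenmon method calls.
    """
    actions = []
    if mine:
        if running_here:
            actions.append("heartbeat")
        if runnable:
            if running_here:
                pass  # already where it should be
            elif running_elsewhere:
                actions.append("warn: still running on another host")
            else:
                actions.append("start")
        else:
            if running_here:
                actions.append("shutdown")
            if hung:
                actions.append("destroy")
    else:
        if running_here:
            actions.append("shutdown")
        elif not running_elsewhere:
            actions.append("warn: not running on its assigned host")
    return actions
```

Seen this way, migration via reboot needs no extra code path: once a guest is assigned to another host, the old host sees a guest that is not its own and shuts it down, and the new host starts it as soon as it no longer looks alive elsewhere.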
The only thing I'm using VD's for at this point is swap. This code so
far depends on NFS root partitions, all served from a central NFS
server; control and state are communicated via NFS also. I just today
started migrating the control/state comms to jabber instead, so that I
could start using VD root filesystems after the COW stuff settles down.
Haven't decided what to do for migrating filesystems between nodes in
that case though.
Right now I'm calling this 'txenmon' (TerraLuna Xen Monitor) but was
already considering renaming it 'xenmon' and posting it after I got it
cleaned up.
This is all to support a production Xen cluster rollout that I plan to
have running by the end of this month. I really don't want to go back
to UML at this point, and if I don't have this cluster running by March
I'm in deep doo-doo -- so I'm committed to working full-time on Xen
tools now. ;-}
So here's the current version, not cleaned up, way too verbose, crufty,
but running:
Steve
#!/usr/bin/python2.2

import Xc, XenoUtil, string, sys, os, time, socket, cPickle

# initialize a few variables that might come in handy
thishostname = socket.gethostname()
if not len(sys.argv) >= 2:
    print "usage: %s /path/to/base/of/users/hosts" % sys.argv[0]
    sys.exit(1)

nfsserv="10.27.2.50"

base = sys.argv[1]
if len(base) > 1 and base.endswith('/'):
    base=base[:-1]

# Obtain an instance of the Xen control interface
xc = Xc.new()

# daemonize
daemonize=0
if daemonize:
    try:
        pid = os.fork()
        if pid > 0:
            # exit first parent
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #1 failed: %d (%s)" % (e.errno, e.strerror)
        sys.exit(1)
    # decouple from parent environment
    # os.chdir("/")
    os.setsid()
    os.umask(0)
    # XXX what about stdout etc?
    # do second fork
    try:
        pid = os.fork()
        if pid > 0:
            # exit from second parent, print eventual PID before
            # print "Daemon PID %d" % pid
            sys.exit(0)
    except OSError, e:
        print >>sys.stderr, "fork #2 failed: %d (%s)" % (e.errno, e.strerror)
        sys.exit(1)

def main():
    while 1:
        guests=getGuests(base)
        # state machine
        for guest in guests:
            print guest.path,guest.activeHost,guest.isRunning()
            if guest.isMine():
                if guest.isRunningHere():
                    guest.heartbeat()
                if guest.isRunnable():
                    if guest.isRunningHere():
                        pass
                    else:
                        if guest.isRunning():
                            print "warning: %s is running on %s" % (
                                guest.name, guest.activeHost
                            )
                        else:
                            guest.start()
                else: # not guest.isRunnable()
                    if guest.isRunningHere():
                        guest.shutdown()
                    if guest.isHung():
                        guest.destroy()
            else: # not guest.isMine()
                if guest.isRunningHere():
                    guest.shutdown()
                if guest.isRunning():
                    pass
                else:
                    print "warning: %s is not running on %s" % (
                        guest.name,guest.ctl('host')
                    )
        # end state machine
        # garbage collect vd's
        usedVds=[]
        for guest in guests:
            if guest.isRunningHere():
                usedVds+=guest.vds()
                guest.pickle()
        for vd in listvdids():
            print "usedVds =",usedVds
            if vd in usedVds:
                pass
            else:
                print "deleting vd %s" % vd
                XenoUtil.vd_delete(vd)
        # garbage collect domains
        # XXX
        time.sleep(10)
    # end while

def getGuests(base):
    users=os.listdir(base)
    guests=[]
    for user in users:
        if not os.path.isdir("%s/%s" % (base,user)):
            continue
        guestnames=os.listdir("%s/%s" % (base,user))
        for name in guestnames:
            path="%s/%s/%s" % (base,user,name)
            try:
                try:
                    file=open("%s/log/pickle" % path,"r")
                    guest=cPickle.load(file)
                    file.close()
                except:
                    print "creating",path
                    guest=Guest(path)
            except:
                print "exception creating guest %s/%s: %s" % (
                    user,
                    name,
                    sys.exc_info()[1].__dict__
                )
                continue
            guests.append(guest)
    return guests

def listvdids():
    vdids=[]
    for vbd in XenoUtil.vd_list():
        vdids.append(vbd['vdisk_id'])
    print "listvdids =", vdids
    return vdids


class Guest(object):

    def __init__(self,path):
        self.reload(path)

    def reload(self,path):
        pathparts=path.split('/')
        name=pathparts.pop()
        user=pathparts.pop()
        base='/'.join(pathparts)
        self.path=path
        self.base=base
        self.user=user
        self.name=name
        self.domain_name="%s" % name
        self.ctlcache={}
        # requested domain id number
        self.domid=self.ctl("domid")
        # kernel
        self.image=self.ctl("kernel")
        # memory
        self.memory_megabytes=int(self.ctl("mem"))
        swap=self.ctl("swap")
        (swap_dev,swap_megabytes) = swap.split(",")
        self.swap_dev=swap_dev
        self.swap_megabytes=int(swap_megabytes)
        # ip
        self.ipaddr = [self.ctl("ip")]
        self.netmask = XenoUtil.get_current_ipmask()
        self.gateway = self.ctl("gw")
        # vbd's
        vbds = []
        vbdfile = open("%s/ctl/vbds" % self.path,"r")
        for line in vbdfile.readlines():
            print line
            ( uname, virt_name, rw ) = line.split(',')
            uname = uname.strip()
            virt_name = virt_name.strip()
            rw = rw.strip()
            vbds.append(( uname, virt_name, rw ))
        self.vbds=vbds
        self.vbd_expert = 0
        # build kernel command line
        ipbit = "ip="+self.ipaddr[0]
        ipbit += ":"+nfsserv
        ipbit += ":"+self.gateway+":"+self.netmask+"::eth0:off"
        rootbit = "root=/dev/nfs nfsroot=/export/%s/root" % path
        extrabit = "4 DOMID=%s " % self.domid
        self.cmdline = ipbit +" "+ rootbit +" "+ extrabit
        self.curid=None
        self.swapvdid=None
        self.shutdownTime=None
        self.activeHost=None


    def ctl(self,var):
        filename="%s/ctl/%s" % (self.path,var)
        # if not hasattr(self,'ctlcache'):
        #     print dir(self)
        #     print self.path
        #     self.ctlcache={}
        # key on the filename variable, not the literal string 'filename',
        # so a cached value is reused until the file's mtime changes
        if not self.ctlcache.has_key(filename):
            self.ctlcache[filename]={'mtime': 0, 'val': None}
        val=None
        mtime=os.path.getmtime(filename)
        if self.ctlcache[filename]['mtime'] < mtime:
            val = open(filename,"r").readline().strip()
            self.ctlcache[filename]={'mtime': mtime, 'val': val}
        else:
            val = self.ctlcache[filename]['val']
        return val

    def destroy(self):
        print "destroying %s" % self.domain_name
        # print "now curid =",self.curid
        if self.curid == 0:
            raise "attempt to kill dom0"
        xc.domain_destroy(dom=self.curid,force=True)

    def heartbeat(self):
        assert self.isRunningHere()
        # update swap expiry to one day
        try:
            XenoUtil.vd_refresh(self.swapvdid, 86400)
        except:
            print "%s missed swap expiry update: %s" % (
                self.domain_name,
                sys.exc_info()[1].__dict__
            )
        self.activeHost=thishostname
        self.pickle()

    def isHung(self):
        if not self.isRunningHere():
            return False
        if self.shutdownTime and time.time() - self.shutdownTime > 300:
            return True
        return False

    def isMine(self):
        if self.ctl("host") == thishostname:
            return True
        return False

    def isRunnable(self):
        run=int(self.ctl("run"))
        if run > 0:
            return True
        return False

    def isRunning(self):
        if self.isRunningHere():
            return True
        else:
            host=self.activeHost
            if host == None:
                return None
            if host == thishostname:
                return False
            filename="%s/log/%s" % (self.path,"pickle")
            mtime=None
            try:
                mtime=os.path.getmtime(filename)
            except:
                return False
            now=time.time()
            if now - mtime < 60:
                return True
        return False

    def isRunningHere(self):
        if not self.curid or self.curid == 0:
            return False
        domains=xc.domain_getinfo()
        domids = [ d['dom'] for d in domains ]
        if self.curid in domids:
            # print self.curid
            return True
        self.curid=None
        return False

    def XXXlog(self,var,val=None,append=False):
        filename="%s/log/%s" % (self.path,var)
        if val==None:
            out=None
            try:
                out=open(filename,"r").readlines()
            except:
                return None
            out=[l.strip() for l in out]
            return out
        mode="w"
        if append:
            mode="a"
        file=open(filename,mode)
        file.write("%s\n" % str(val))
        file.close()

    def mkswap(self):
        # create swap, 1 minute expiry
        vdid=XenoUtil.vd_create(self.swap_megabytes,60)
        # print "vdid =",vdid
        self.swapvdid=vdid
        uname="vd:%s" % vdid
        # format it
        segments = XenoUtil.lookup_disk_uname(uname)
        if XenoUtil.vd_extents_validate(segments,1) < 0:
            print "segment conflict on %s" % uname
            sys.exit(1)
        tmpdev="/dev/xenswap%s" % vdid
        cmd="mknod %s b 125 %s" % (tmpdev,vdid)
        os.system(cmd)
        virt_dev = XenoUtil.blkdev_name_to_number(tmpdev)
        xc.vbd_create(0,virt_dev,1)
        xc.vbd_setextents(0,virt_dev,segments)
        cmd="mkswap %s" % tmpdev
        os.system(cmd)
        xc.vbd_destroy(0,virt_dev)
        self.vbds.append(( uname, self.swap_dev, "w" ))
        print "mkswap:",uname, self.swap_dev, "w"
        print self.vbds

    def pickle(self):
        assert self.isRunningHere()
        # write then rename so others see an atomic operation...
        file=open("%s/log/pickle.new" % self.path,"w")
        cPickle.dump(self,file)
        file.close()
        os.rename(
            "%s/log/pickle.new" % self.path,
            "%s/log/pickle" % self.path
        )

    def shutdown(self):
        print "shutting down %s" % self.name
        # reduce swap expiry to 10 minutes (to give it time to shut down)
        if self.swapvdid:
            XenoUtil.vd_refresh(self.swapvdid, 600)
        xc.domain_destroy(dom=self.curid)
        if not self.shutdownTime:
            self.shutdownTime=time.time()

    def start(self):
        """Create, build and start the domain for this guest."""
        self.reload(self.path)
        image=self.image
        memory_megabytes=self.memory_megabytes
        domain_name=self.domain_name
        ipaddr=self.ipaddr
        netmask=self.netmask
        vbds=self.vbds
        cmdline=self.cmdline
        vbd_expert=self.vbd_expert

        print "Domain image : ", self.image
        print "Domain memory : ", self.memory_megabytes
        print "Domain IP address(es) : ", self.ipaddr
        print "Domain block devices : ", self.vbds
        print 'Domain cmdline : "%s"' % self.cmdline

        if self.isRunning():
            raise "%s already running on %s" % (self.name,self.activeHost)

        if not os.path.isfile( image ):
            print "Image file '" + image + "' does not exist"
            return None

        id = xc.domain_create( mem_kb=memory_megabytes*1024, name=domain_name )
        print "Created new domain with id = " + str(id)
        if id <= 0:
            print "Error creating domain"
            return None

        ret = xc.linux_build( dom=id, image=image, cmdline=cmdline )
        if ret < 0:
            print "Error building Linux guest OS: "
            print "Return code from linux_build = " + str(ret)
            xc.domain_destroy ( dom=id )
            return None

        # setup the virtual block devices
        # set the expertise level appropriately
        XenoUtil.VBD_EXPERT_MODE = vbd_expert

        self.mkswap()

        self.datavds=[]
        for ( uname, virt_name, rw ) in vbds:
            virt_dev = XenoUtil.blkdev_name_to_number( virt_name )
            segments = XenoUtil.lookup_disk_uname( uname )
            if not segments or segments < 0:
                print "Error looking up %s\n" % uname
                xc.domain_destroy ( dom=id )
                return None

            # check that setting up this VBD won't violate the sharing
            # allowed by the current VBD expertise level
            # print uname, virt_name, rw, segments
            if XenoUtil.vd_extents_validate(segments, rw=='w' or rw=='rw') < 0:
                xc.domain_destroy( dom = id )
                return None

            if xc.vbd_create( dom=id, vbd=virt_dev, writeable= rw=='w' or rw=='rw' ):
                # rw is a string ('w', 'rw', ...), so format it with %s, not %d
                print "Error creating VBD vbd=%d writeable=%s\n" % (virt_dev,rw)
                xc.domain_destroy ( dom=id )
                return None

            if xc.vbd_setextents(
                    dom=id,
                    vbd=virt_dev,
                    extents=segments):
                print "Error populating VBD vbd=%d\n" % virt_dev
                xc.domain_destroy ( dom=id )
                return None
            self.datavds.append(virt_dev)

        # setup virtual firewall rules for all aliases
        for ip in ipaddr:
            XenoUtil.setup_vfr_rules_for_vif( id, 0, ip )

        if xc.domain_start( dom=id ) < 0:
            print "Error starting domain"
            xc.domain_destroy ( dom=id )
            sys.exit()

        self.curid=id
        print "domain (re)started: %s (%d)" % (domain_name,id)
        self.heartbeat()
        return id

    def vds(self):
        vds=[]
        # XXX add data vbds
        vds.append(self.swapvdid)
        return vds


main()

--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org
-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
* Re: txenmon: Cluster monitoring/management
2004-02-10 5:25 txenmon: Cluster monitoring/management stevegt
@ 2004-02-10 5:41 ` stevegt
2004-02-10 8:06 ` Ian Pratt
1 sibling, 0 replies; 9+ messages in thread
From: stevegt @ 2004-02-10 5:41 UTC
To: Ian Pratt; +Cc: Christian Limpach, Keir Fraser, xen-devel
One thing I didn't mention: this code can also be killed on the
fly without killing the domains it monitors; on restart it will discover
and adopt them, as well as their swap VD's, and resume monitoring.
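A rough sketch of that adoption step, in Python 3 rather than the script's Python 2.2: saved states whose recorded domain id is still live are adopted, the rest are treated as not running. The function name `adopt_guests` and the use of plain dict pickles are assumptions made for illustration; txenmon pickles whole `Guest` objects and gets the live ids from `xc.domain_getinfo()`.

```python
import pickle


def adopt_guests(pickle_paths, live_domids):
    """Split saved guest states into adopted (still running) and stale ones.

    pickle_paths: saved-state files, assumed here to hold pickled dicts.
    live_domids:  domain ids the hypervisor currently reports.
    """
    adopted, stale = [], []
    for path in pickle_paths:
        try:
            with open(path, "rb") as f:
                state = pickle.load(f)
        except (OSError, pickle.UnpicklingError, EOFError):
            continue  # no usable saved state; treat as a brand-new guest
        curid = state.get("curid")
        # Domain 0 is never a guest, so a curid of 0 or None means "not running".
        if curid and curid in live_domids:
            adopted.append(state)
        else:
            stale.append(state)
    return adopted, stale
```

The swap VD ids recorded in the adopted states are the ones to keep out of garbage collection on the next pass.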
This feature was needed because the script dies after a few hours of
running -- I'm getting a SIGABRT from somewhere in the xc libraries
every few hours, I think. Note that I'm not only checking
xc.domain_getinfo(), but also updating the swap VD expiries on every trip
through the while loop; one of those is likely the culprit.
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org
* Re: txenmon: Cluster monitoring/management
2004-02-10 5:25 txenmon: Cluster monitoring/management stevegt
2004-02-10 5:41 ` stevegt
@ 2004-02-10 8:06 ` Ian Pratt
2004-02-10 19:47 ` stevegt
1 sibling, 1 reply; 9+ messages in thread
From: Ian Pratt @ 2004-02-10 8:06 UTC
To: stevegt; +Cc: Ian Pratt, xen-devel
> On Sun, Feb 08, 2004 at 09:19:56AM +0000, Ian Pratt wrote:
> > Of course, this will all be much neater in rev 3 of the domain
> > control tools that will use a db backend to maintain state about
> > currently running domains across a cluster...
>
> Ack! We might be doing duplicate work. How far have you gotten with
> this?
We haven't even started, but have been thinking about the design,
and what the schema for the database should be etc.
> Right now I'm running python code (distantly descended from
> createlinuxdom.py) that is able to:
>
> - monitor each domain and restart as needed
>
> - migrate domains from one host to another
>
> - dynamically create and assign swap vd's, and garbage collect them at
> shutdown or after crash
>
> ...and a few other things. Right now migration is via reboot, not
> suspend; haven't had a chance to troubleshoot resume further.
Cool! It's always a nice surprise to find out what work is
going on by people on the list.
You might want to try repulling 1.2 and trying the newer versions
of the tools which are a bit more user friendly.
> Right now I'm calling this 'txenmon' (TerraLuna Xen Monitor) but was
> already considering renaming it 'xenmon' and posting it after I got it
> cleaned up.
Great, we'd love to see stuff like this in the tree.
Thanks,
Ian
* Re: txenmon: Cluster monitoring/management
2004-02-10 8:06 ` Ian Pratt
@ 2004-02-10 19:47 ` stevegt
2004-02-11 0:43 ` Bin Ren
0 siblings, 1 reply; 9+ messages in thread
From: stevegt @ 2004-02-10 19:47 UTC
To: Ian Pratt; +Cc: xen-devel
On Tue, Feb 10, 2004 at 08:06:25AM +0000, Ian Pratt wrote:
> > On Sun, Feb 08, 2004 at 09:19:56AM +0000, Ian Pratt wrote:
> > > Of course, this will all be much neater in rev 3 of the domain
> > > control tools that will use a db backend to maintain state about
> > > currently running domains across a cluster...
> >
> > Ack! We might be doing duplicate work. How far have you gotten with
> > this?
>
> We haven't even started, but have been thinking about the design,
> and what the schema for the database should be etc.
When you say "database", do you mean "an independent sqlite running in
each dom0", or do you mean "a central SQL server running somewhere on a
dedicated machine"? (See further down for why I ask.)
As far as schema goes, the things I've needed to track so far are these
"control" items, referenced in the guest.ctl() calls in txenmon:
domid
gw
host
ip
kernel
mem
run
swap
vbds
...and I'm considering adding a 'reboot' boolean. I also track several
runtime state items as attributes of the Guest class -- the whole object
is saved as a pickle, so see __init__ for a list of them.
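For reference, the save step and the liveness check built on it amount to something like the following. This is a Python 3 sketch, not the txenmon code: `save_state` and `is_fresh` are hypothetical names, and the 60-second window mirrors the check in `isRunning()`.

```python
import os
import pickle
import time


def save_state(log_dir, state):
    """Write state to log_dir/pickle via a temporary file and a rename."""
    tmp = os.path.join(log_dir, "pickle.new")
    final = os.path.join(log_dir, "pickle")
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    # Readers see either the old file or the new one, never a partial write
    # (atomic on POSIX when both names are on the same filesystem).
    os.replace(tmp, final)


def is_fresh(log_dir, max_age=60, now=None):
    """True if the saved state was updated within the last max_age seconds."""
    try:
        mtime = os.path.getmtime(os.path.join(log_dir, "pickle"))
    except OSError:
        return False  # never saved, or not readable
    if now is None:
        now = time.time()
    return now - mtime < max_age
```

Because each heartbeat rewrites the file, its mtime doubles as an "alive on some host" signal for the other nodes.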
The NFS export directory tree looks something like this:
/export/xen/fs/stevegt
/export/xen/fs/stevegt/tcx
/export/xen/fs/stevegt/tcx/root
/export/xen/fs/stevegt/tcx/ctl
/export/xen/fs/stevegt/tcx/log
/export/xen/fs/stevegt/xentest1
/export/xen/fs/stevegt/xentest1/root
/export/xen/fs/stevegt/xentest1/log
/export/xen/fs/stevegt/xentest1/ctl
/export/xen/fs/stevegt/xentest2
/export/xen/fs/stevegt/xentest2/root
/export/xen/fs/stevegt/xentest2/log
/export/xen/fs/stevegt/xentest2/ctl
/export/xen/fs/stevegt/crashme1
/export/xen/fs/stevegt/crashme1/root
/export/xen/fs/stevegt/crashme1/ctl
/export/xen/fs/stevegt/crashme1/log
...where 'stevegt' is a user who owns one or more virtual domains, and
'xentest1' is the hostname of a virtual domain. Those control items I
mentioned above go in individual files (qmail style) under ./ctl, and
the python pickle for each virtual domain is saved as ./log/pickle. The
root partition for each domain is under ./root. Here's what the
contents of ./ctl look like for a given guest:
nfs1:/export/xen# ls -l /export/xen/fs/stevegt/tcx/ctl
total 32
-rw-r--r-- 1 root root 3 Feb 8 20:57 domid
-rw-r--r-- 1 root root 12 Feb 5 22:51 gw
-rw-r--r-- 1 root root 6 Feb 9 21:56 host
-rw-r--r-- 1 root root 13 Feb 8 20:57 ip
-rw-r--r-- 1 root root 30 Feb 5 22:52 kernel
-rw-r--r-- 1 root root 4 Feb 9 17:47 mem
-rw-r--r-- 1 root root 2 Feb 9 21:56 run
-rw-r--r-- 1 root root 14 Feb 5 22:53 swap
-rw-r--r-- 1 root root 0 Feb 5 22:52 vbds
Because these are individual files, this makes it easy to say, for
instance, 'echo 0 > run' from a shell prompt to cause a domain to shut
down, or 'echo node43 > host' to cause it to move to a different node.
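Reading those files back is just as simple. Here is a minimal Python 3 sketch of a reader that re-reads a file only when its mtime changes, which is what `ctl()` in txenmon is meant to do; the class name `CtlDir` is made up for illustration.

```python
import os


class CtlDir:
    """Read one-value-per-file control items, re-reading only changed files."""

    def __init__(self, path):
        self.path = path
        self._cache = {}  # item name -> (mtime, value)

    def get(self, name):
        filename = os.path.join(self.path, name)
        mtime = os.path.getmtime(filename)
        cached = self._cache.get(name)
        if cached is None or cached[0] < mtime:
            # First read, or the file changed since the cached copy was taken.
            with open(filename) as f:
                value = f.readline().strip()
            self._cache[name] = (mtime, value)
            return value
        return cached[1]
```

With this, `int(CtlDir("/export/xen/fs/stevegt/tcx/ctl").get("run")) > 0` is the "should this guest be running" test, and an 'echo 0 > run' is picked up on the next call.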
I considered using the sqlite db for these things; I didn't do that (1)
because this was faster to implement and easier to access from the
command line, and (2) I didn't want to cause future schema conflicts
with whatever you were going to do.
* * *
Having said all this, I'm less worried about schema and more worried
about single points of failure. Right now txenmon runs in domain 0 on
each node, and the data store is distributed as above. This gives me a
dependence on the central NFS server staying up, but an NFS server is a
relatively simple thing, it can be HA'd, backed up easily, and will tend
to have uptimes in the hundreds of days anyway as long as you leave it
alone.
If these data items were to move into a "real" database server instead,
say a central mysql or postgresql server, then I'd worry more; database
servers aren't as easy to keep available for hundreds of days without
interruption. (See http://Infrastructures.Org for more of my
perspective on this.)
I'm moving in the direction of keeping some sort of distributed data
store, like those flat files and python pickles (or perhaps sqlite on
each dom0?), which can be cached on local disk in each dom0, and then
using something like UDP broadcast (simple) or XMPP/jabber (less simple)
as a peer-to-peer communications mechanism to keep the caches synced.
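A rough sketch of the UDP broadcast variant, to make the idea concrete. Everything here is an assumption for illustration: the JSON message layout, the port number, the function names, and the "newest timestamp wins" rule (which in turn assumes reasonably synced clocks across the dom0s). It is written for a current Python 3.

```python
import json
import socket
import time

PORT = 47474  # hypothetical port, not taken from txenmon


def encode_update(guest, key, value, stamp=None):
    """Serialise one control-item change as a small JSON datagram."""
    if stamp is None:
        stamp = time.time()
    msg = {"guest": guest, "key": key, "value": value, "stamp": stamp}
    return json.dumps(msg).encode("utf-8")


def apply_update(cache, datagram):
    """Merge a received datagram into cache; the newest timestamp wins.

    cache maps (guest, key) -> (value, stamp). Returns True if it changed.
    """
    msg = json.loads(datagram.decode("utf-8"))
    slot = (msg["guest"], msg["key"])
    current = cache.get(slot)
    if current is not None and current[1] >= msg["stamp"]:
        return False  # stale or duplicate update; ignore it
    cache[slot] = (msg["value"], msg["stamp"])
    return True


def broadcast(datagram, port=PORT):
    """Send one datagram to the local broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(datagram, ("255.255.255.255", port))
    finally:
        sock.close()
```

UDP can drop datagrams, so a scheme like this would also need each node to re-announce its full state periodically; that part is left out here.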
My goal here is to be able to walk into a Xen data center and destroy
any random machine without impacting any user for more than a few
minutes. (See http://www.infrastructures.org/bootstrap/recovery.shtml).
To this end, I'm curious what people's thoughts are on backups and
real-time replication of virtual disks -- I'm only using them for swap
right now, because of these issues.
* * *
> Cool! It's always a nice surprise to find out what work is
> going on by people on the list.
As I said last night, you have me full time right now. ;-) My wife and
I are launching a commercial service based on Xen (we were evaluating
UML). I have until the end of March. If enough revenue is flowing by
then, then you get to keep me. If not, then "the boss" will tell me to
put myself back on the consulting market.
Nothing like a little pressure. ;-)
> You might want to try repulling 1.2 and trying the newer versions
> of the tools which are a bit more user friendly.
My most recent pull was a week ago; this got me xc_dom_control and
xc_vd_tool. I'll likely do another pull this week. We already have one
production customer (woo hoo!), so I am trying to limit upgrades/reboots
for them.
> Great, we'd love to see stuff like this in the tree.
Would it help if I exposed a bk repository you could pull from, or how
do you want to do this?
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org
-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
* RE: txenmon: Cluster monitoring/management
@ 2004-02-10 19:55 Williamson, Mark A
2004-02-10 20:13 ` stevegt
0 siblings, 1 reply; 9+ messages in thread
From: Williamson, Mark A @ 2004-02-10 19:55 UTC (permalink / raw)
To: stevegt; +Cc: xen-devel
> ...and a few other things. Right now migration is via reboot, not
> suspend; haven't had a chance to troubleshoot resume further.
We've improved the front-end to the resume functionality since you
highlighted the problem, so you may want to have a look at the modified
tools if you have time.
The previous version xc_dom_control.py just called linux_restore in the
Xc library in order to reload a domain's memory state. That didn't
recreate all of the VBDs, or set up the appropriate VFR (Virtual
Firewall Router) state, which was the problem you'd experienced.
I'm not sure what version of the tools you're using. We now use
'xc_dom_create.py' to start / restore domains - this can read its
configuration from a file, using the '-f' option. We use
xc_dom_control.py to control running domains.
Using the latest tools, your domains should be restored using
xc_dom_create.py, specifying the original configuration file as usual,
with the '-f' flag (which provides information for setting up the VFR /
VBDs again) but also the domain memory state file, using the '-L' flag
for 'Load domain state from file'. That way, the VBD / VFR state gets
put back before the domain is restarted.
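For anyone scripting this, a minimal sketch of driving the restore from Python. Only the tool name and the '-f' and '-L' flags come from the description above; the function names are hypothetical, and it assumes xc_dom_create.py is on the PATH. It is written for a current Python 3.

```python
import subprocess


def restore_argv(config_file, state_file, tool="xc_dom_create.py"):
    """Build the restore command: -f names the config, -L the saved state."""
    return [tool, "-f", config_file, "-L", state_file]


def restore_domain(config_file, state_file):
    """Run the restore and return the tool's exit status."""
    return subprocess.call(restore_argv(config_file, state_file))
```

Keeping the argument list in its own function makes it easy to log or inspect the command before anything is actually run.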
Also, the save option of xc_dom_control.py is now 'suspend' and it stops
and destroys the copy of the domain in memory after it has been
suspended to disk (so it can't change its persistent storage, etc.,
which would otherwise confuse the image if you resume from file later).
> This is all to support a production Xen cluster rollout that I plan to
> have running by the end of this month. I really don't want to go back
> to UML at this point, and if I don't have this cluster
> running by March
> I'm in deep doo-doo -- so I'm committed to working full-time on Xen
> tools now. ;-}
Thanks for the contribution! And good luck, too!
Mark
* Re: txenmon: Cluster monitoring/management
2004-02-10 19:55 Williamson, Mark A
@ 2004-02-10 20:13 ` stevegt
2004-02-11 8:29 ` stevegt
0 siblings, 1 reply; 9+ messages in thread
From: stevegt @ 2004-02-10 20:13 UTC (permalink / raw)
To: Williamson, Mark A; +Cc: xen-devel
I see these save/restore updates in 'bk changes -R' -- my last pull was
Monday a week ago. And yes, using xc_dom_create.py for restore sounds
like exactly the right idea; had just hit that realization myself late
last night.
Pulling 1.2 at this instant; I'll exercise it and let you know how it
goes.
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org
* Re: Re: txenmon: Cluster monitoring/management
2004-02-10 19:47 ` stevegt
@ 2004-02-11 0:43 ` Bin Ren
2004-02-11 4:17 ` stevegt
0 siblings, 1 reply; 9+ messages in thread
From: Bin Ren @ 2004-02-11 0:43 UTC (permalink / raw)
To: stevegt; +Cc: Devel Xen
On 10 Feb 2004, at 19:47, stevegt@TerraLuna.Org wrote:
> nfs1:/export/xen# ls -l /export/xen/fs/stevegt/tcx/ctl
> total 32
> -rw-r--r-- 1 root root 3 Feb 8 20:57 domid
> -rw-r--r-- 1 root root 12 Feb 5 22:51 gw
> -rw-r--r-- 1 root root 6 Feb 9 21:56 host
> -rw-r--r-- 1 root root 13 Feb 8 20:57 ip
> -rw-r--r-- 1 root root 30 Feb 5 22:52 kernel
> -rw-r--r-- 1 root root 4 Feb 9 17:47 mem
> -rw-r--r-- 1 root root 2 Feb 9 21:56 run
> -rw-r--r-- 1 root root 14 Feb 5 22:53 swap
> -rw-r--r-- 1 root root 0 Feb 5 22:52 vbds
>
> Because these are individual files, this makes it easy to say, for
> instance, 'echo 0 > run' from a shell prompt to cause a domain to shut
> down, or 'echo node43 > host' to cause it to move to a different node.
Hey, this is the very Plan9 style, isn't it?! ;-p
-- Bin
* Re: Re: txenmon: Cluster monitoring/management
2004-02-11 0:43 ` Bin Ren
@ 2004-02-11 4:17 ` stevegt
0 siblings, 0 replies; 9+ messages in thread
From: stevegt @ 2004-02-11 4:17 UTC (permalink / raw)
To: Bin Ren; +Cc: Devel Xen
On Wed, Feb 11, 2004 at 12:43:59AM +0000, Bin Ren wrote:
> On 10 Feb 2004, at 19:47, stevegt@TerraLuna.Org wrote:
>
> > nfs1:/export/xen# ls -l /export/xen/fs/stevegt/tcx/ctl
> > total 32
> > -rw-r--r-- 1 root root 3 Feb 8 20:57 domid
> > -rw-r--r-- 1 root root 12 Feb 5 22:51 gw
> > -rw-r--r-- 1 root root 6 Feb 9 21:56 host
> > -rw-r--r-- 1 root root 13 Feb 8 20:57 ip
> > -rw-r--r-- 1 root root 30 Feb 5 22:52 kernel
> > -rw-r--r-- 1 root root 4 Feb 9 17:47 mem
> > -rw-r--r-- 1 root root 2 Feb 9 21:56 run
> > -rw-r--r-- 1 root root 14 Feb 5 22:53 swap
> > -rw-r--r-- 1 root root 0 Feb 5 22:52 vbds
> >
> >Because these are individual files, this makes it easy to say, for
> >instance, 'echo 0 > run' from a shell prompt to cause a domain to shut
> >down, or 'echo node43 > host' to cause it to move to a different node.
>
> Hey, this is the very Plan9 style, isn't it?! ;-p
Is it? Never played with that. I used to live behind Murray Hill Bell
Labs and work at USL; maybe I got polluted. ;-}
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org
* Re: txenmon: Cluster monitoring/management
2004-02-10 20:13 ` stevegt
@ 2004-02-11 8:29 ` stevegt
0 siblings, 0 replies; 9+ messages in thread
From: stevegt @ 2004-02-11 8:29 UTC (permalink / raw)
To: Williamson, Mark A; +Cc: xen-devel
Though I didn't get a working xenolinux today, I did decide to try
today's tools with the 02 Feb 1.2 xen/xenolinux.
- I like the new 'list' output. Pretty! ;-)
- After dealing with the builder_fn='xc.linux_build' vs.
builder_fn='linux' change, I was able to suspend and restore a virtual
domain just fine, including its swap VD. I even ran my perl malloc
torture test to make sure the swap device actually worked.
- Amusing note: I forgot to log off of the virtual domain before
suspending it. I thought about this after I resumed, thought "aww,
gotta ssh in again", and then was in for a surprise. The socket
survived. Very cool. ;-)
- I'll go ahead and integrate the suspend/restore (should we just call
this "resume"?) machinery into txenmon, so it can migrate guests
between xenoservers without rebooting them. I haven't tested restore
to a different xenoserver yet, but am hoping that will just work.
G'night all,
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org