* KVM and Perf Counters
@ 2007-01-10 1:26 Casey Jeffery
[not found] ` <cb6aceaa0701091726h7fe94bb3q73db49e43ff226ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Casey Jeffery @ 2007-01-10 1:26 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I've tried out the last few versions of KVM and think it's great. It's
much easier to use and understand than Xen and performance is
surprisingly good.
One of the things I'd like to do is modify it to allow PMI generation
based on the Intel performance counter facilities. Specifically, I'd
like to be able to pin a guest to a CPU, program one of the IA32_PMCx
MSRs with a count, configure the LVT to deliver an NMI on overflow,
enable the corresponding IA32_PERFEVTSELx, and use the hardware-based MSR
save/restore area to disable counting in root mode and re-enable it when
entering non-root mode.
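A minimal sketch of the register values behind the setup described above, assuming Intel architectural performance monitoring. The IA32_PERFEVTSELx bit positions and the LVT delivery-mode encoding follow the Intel SDM; the helper names, the choice of the "branch instructions retired" event (event 0xC4, umask 0x00) and a 40-bit counter width are illustrative assumptions (the real width is reported by CPUID leaf 0AH). The actual WRMSR and APIC writes are left out.

```c
#include <stdint.h>

/* IA32_PERFEVTSELx bit fields (Intel SDM, architectural perfmon). */
#define PERFEVTSEL_USR (1ULL << 16) /* count while CPL > 0 */
#define PERFEVTSEL_OS  (1ULL << 17) /* count while CPL == 0 */
#define PERFEVTSEL_INT (1ULL << 20) /* interrupt via local APIC on overflow */
#define PERFEVTSEL_EN  (1ULL << 22) /* enable the counter */

/* LVT performance counter entry (APIC offset 0x340): delivery mode
 * NMI is 100b in bits 10:8; the vector field is ignored for NMI. */
#define APIC_LVTPC_DM_NMI (0x4u << 8)

/* Hypothetical helper: IA32_PERFEVTSELx value for an event/umask pair. */
static uint64_t make_perfevtsel(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8) |
           PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_INT | PERFEVTSEL_EN;
}

/* Hypothetical helper: value to preload into IA32_PMCx so that a
 * `width`-bit counter overflows after exactly n events. */
static uint64_t pmc_preload(uint64_t n, unsigned width)
{
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    return (0 - n) & mask;
}
```

Armed with make_perfevtsel(0xC4, 0x00) and pmc_preload(1000, 40), the counter wraps to zero after 1000 retired branches and, with INT set and the LVT entry in NMI mode, the local APIC raises an NMI. On many older processors WRMSR to IA32_PMCx writes only bits 31:0 and sign-extends them, so writing the low 32 bits of the preload value has the same effect.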
The questions I have now are the following:
1. It looks like the MSR save/restore area is already set up for the EFER
MSR, and I assume that adding IA32_PMCx and IA32_PERFEVTSELx to the
vmx_msr_index[] array would be enough to save/restore them. Is that
correct?
2. I don't completely follow the code from vmx.c
save_msrs(vcpu->host_msrs, vcpu->nmsrs);
load_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
...VMLAUNCH/RESUME etc. ...
....
...VMEXIT...
save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
load_msrs(vcpu->host_msrs, NR_BAD_MSRS);
Does this mean the MSR values to be loaded on VM exit are initialized
before leaving root mode while the VM_entry_load / VM_exit_store area
is not touched?
3. Is it possible to attribute counter ticks solely to a given KVM
guest process and avoid counting ticks from other processes? I assume
this isn't a problem since a VM exit should occur if the KVM process
were to be pulled off the CPU by the kernel to do a context switch.
Thanks in advance for any insight provided.
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <cb6aceaa0701091726h7fe94bb3q73db49e43ff226ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-01-10 9:37 ` Avi Kivity
[not found] ` <45A4B3E4.7090202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2007-01-10 9:37 UTC (permalink / raw)
To: Casey Jeffery; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Casey Jeffery wrote:
> I've tried out the last few versions of KVM and think it's great. It's
> much easier to use and understand than Xen and performance is
> surprisingly good.
>
> One of the things I'd like to do is modify it to allow PMI generation
> based on the Intel performance counter facilities. Specifically, I'd
> like to be able to pin a guest to a CPU, program one of the IA32_PMCx
> MSRs with a count, configure the LVT to deliver an NMI on overflow,
> enable the corresponding IA32_PERFEVTSELx, and use the hardware-based MSR
> save/restore area to disable counting in root mode and re-enable it when
> entering non-root mode.
>
Ok. Note that there are two possible mutually exclusive uses for the
performance counters:
1. Allow the guest to program the performance counters as it sees fit,
for example to run oprofile internally.
2. Let the host control the performance counters, for stuff like
deterministic record/replay.
We should by default allow (1), with the option for (2). This is
somewhat similar to the hardware breakpoints, so you'd need some control
like KVM_DEBUG_GUEST to switch modes.
> The questions I have now are the following:
>
> 1. It looks like the MSR save/restore area is already set up for the EFER
> MSR, and I assume that adding IA32_PMCx and IA32_PERFEVTSELx to the
> vmx_msr_index[] array would be enough to save/restore them. Is that
> correct?
>
Yes.
> 2. I don't completely follow the code from vmx.c
>
> save_msrs(vcpu->host_msrs, vcpu->nmsrs);
>
This saves the host msrs to the host save area.
> load_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
>
vmx has a bug where it corrupts a couple of msrs, so we load the first
NR_BAD_MSRS explicitly.
> ...VMLAUNCH/RESUME etc. ...
>
This loads vcpu->nmsrs - NR_BAD_MSRS from the guest area.
> ....
> ...VMEXIT...
>
>
This saves vcpu->nmsrs - NR_BAD_MSRS into the guest area, and then loads
vcpu->nmsrs - NR_BAD_MSRS from the host area.
> save_msrs(vcpu->guest_msrs, NR_BAD_MSRS);
> load_msrs(vcpu->host_msrs, NR_BAD_MSRS);
>
These two lines do the same for the vmx-incompatible msrs.
> Does this mean the MSR values to be loaded on VM exit are initialized
> before leaving root mode while the VM_entry_load / VM_exit_store area
> is not touched?
>
I'm not sure I understand the question, but host msrs are never saved
automatically, and there are a couple of msrs we load and save manually
for both host and guest.
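To make the split described above concrete, here is a minimal sketch of how the MSRs after the first NR_BAD_MSRS entries could be laid out for the hardware. The 16-byte entry format (32-bit MSR index, 32 reserved bits, 64-bit value) and the MSR numbers come from the Intel SDM; the list contents and order, the value of NR_BAD_MSRS, example_msr_index[] and fill_auto_area() are hypothetical and do not reproduce KVM's actual vmx_msr_index[].

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a VMX MSR-load/MSR-store area (Intel SDM layout). */
struct msr_area_entry {
    uint32_t index;    /* MSR number */
    uint32_t reserved; /* must be zero */
    uint64_t value;    /* MSR data */
};

#define MSR_IA32_PMC0        0x0c1u
#define MSR_IA32_PERFEVTSEL0 0x186u
#define MSR_EFER             0xc0000080u

/* Hypothetical ordering: the first NR_BAD_MSRS entries are switched by
 * hand with save_msrs()/load_msrs(); the rest are left to the hardware.
 * Which MSRs count as "bad" here is made up for illustration. */
#define NR_BAD_MSRS 1
static const uint32_t example_msr_index[] = {
    MSR_EFER, MSR_IA32_PERFEVTSEL0, MSR_IA32_PMC0,
};
#define NMSRS (sizeof(example_msr_index) / sizeof(example_msr_index[0]))

/* Copy the hardware-switched MSRs (index NR_BAD_MSRS onward) into `area`.
 * Returns the number of entries written, i.e. NMSRS - NR_BAD_MSRS. */
static size_t fill_auto_area(struct msr_area_entry *area, const uint64_t *values)
{
    size_t n = 0;
    for (size_t i = NR_BAD_MSRS; i < NMSRS; i++, n++) {
        area[n].index = example_msr_index[i];
        area[n].reserved = 0;
        area[n].value = values[i];
    }
    return n;
}
```

The address and entry count of such an area are what the VMCS VM-entry MSR-load, VM-exit MSR-store and VM-exit MSR-load fields point at; the first NR_BAD_MSRS entries stay with save_msrs()/load_msrs().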
> 3. Is it possible to attribute counter ticks solely to a given KVM
> guest process and avoid counting ticks from other processes? I assume
> this isn't a problem since a VM exit should occur if the KVM process
> were to be pulled off the CPU by the kernel to do a context switch.
>
Yes.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <45A4B3E4.7090202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-02-01 15:43 ` Casey Jeffery
[not found] ` <cb6aceaa0702010743v7fba298y23e825ef1e706801-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Casey Jeffery @ 2007-02-01 15:43 UTC (permalink / raw)
To: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
I wanted to give an update on a project I'm attempting with KVM. The
goal is to implement software-based lockstepping of two or more
virtual guests. There are a number of applications of this, and the
most closely related is the ReVirt project that was done with UML and Xen by
Dunlap et al.
The basic idea is to remove the non-deterministic events that occur in
the system by giving each guest the same I/O values from external
devices, TSC, etc., and only injecting interrupts at deterministic
points in the instruction stream. I'm actually trying to get multiple
guests to run simultaneously rather than storing the run of one guest
and repeating it at a later time, but it's basically the same thing.
The plan is to use the perf counters to trigger NMIs after a given
number of events (e.g. 1k branches) and to deliver all interrupts at
those VMEXITs in all guests. It is also necessary to buffer all
external input from one guest and give the same input to all the
others, as well as optionally check all external output for
consistency.
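A minimal sketch of the delivery rule described above, assuming the primary and secondary share one ordered log: the primary records each interrupt together with the guest event count (e.g. retired branches) at which it was injected, and the secondary injects the same vector only when it reaches exactly the same count. struct irq_log, log_record() and log_next_due() are hypothetical names; how the log travels between guests is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log entry: inject `vector` at guest event count `at`. */
struct irq_rec {
    uint64_t at;
    uint8_t vector;
};

struct irq_log {
    struct irq_rec rec[64];
    size_t len;  /* entries recorded by the primary */
    size_t next; /* first entry the secondary has not injected yet */
};

/* Primary: record an injection. Returns 0, or -1 if the log is full or
 * the count goes backwards (injection points must be non-decreasing). */
static int log_record(struct irq_log *lg, uint64_t at, uint8_t vector)
{
    size_t cap = sizeof(lg->rec) / sizeof(lg->rec[0]);
    if (lg->len == cap)
        return -1;
    if (lg->len > 0 && at < lg->rec[lg->len - 1].at)
        return -1;
    lg->rec[lg->len].at = at;
    lg->rec[lg->len].vector = vector;
    lg->len++;
    return 0;
}

/* Secondary: called at each sync point with its own event count `now`.
 * Returns the next vector to inject, -1 if none is due here, or -2 if
 * the secondary has already passed a recorded point (divergence). */
static int log_next_due(struct irq_log *lg, uint64_t now)
{
    if (lg->next == lg->len)
        return -1;
    const struct irq_rec *r = &lg->rec[lg->next];
    if (r->at > now)
        return -1;
    if (r->at < now)
        return -2;
    lg->next++;
    return r->vector;
}
```

The secondary calls log_next_due() repeatedly at a sync point until it returns -1, so several interrupts recorded at the same count are injected in the primary's order.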
With a couple of pointers from Avi, I have figured out how to make use of
the performance counter MSRs in KVM to trigger NMIs that exit to the
hypervisor at deterministic points in the guest. There is still an
issue of delayed NMI delivery that causes the NMI to actually occur in
root mode. This seems to occur if a perf counter overflows near
another event that causes a VMEXIT: by the time the NMI propagates
up the pipeline, the CPU has already exited. I think this can be
handled by using both perf counters and keeping a global event counter
that is used to determine the point at which to inject interrupts.
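One possible reading of the "global event counter" idea, as a minimal sketch: on every VMEXIT, whether NMI-induced or not, read the raw IA32_PMCx value and fold the delta since the previous read into a 64-bit total, masking by the counter width so a wrap through zero is handled. The struct, the function names and the counter width passed in are assumptions.

```c
#include <stdint.h>

/* Hypothetical accumulator turning raw IA32_PMCx reads into a 64-bit count. */
struct ev_counter {
    uint64_t total; /* guest events counted so far */
    uint64_t last;  /* raw counter value at the previous read */
    uint64_t mask;  /* (1 << counter width) - 1 */
};

static void ev_init(struct ev_counter *c, unsigned width, uint64_t preload)
{
    c->total = 0;
    c->mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    c->last = preload & c->mask;
}

/* Fold in a raw reading taken at a VMEXIT; returns the new total.
 * Correct as long as fewer than 2^width events occur between reads. */
static uint64_t ev_update(struct ev_counter *c, uint64_t raw)
{
    raw &= c->mask;
    c->total += (raw - c->last) & c->mask;
    c->last = raw;
    return c->total;
}

/* After rewriting IA32_PMCx (e.g. re-arming with a new preload), record the
 * new starting value. Call ev_update() with the old reading first. */
static void ev_rearm(struct ev_counter *c, uint64_t preload)
{
    c->last = preload & c->mask;
}
```

Keeping the total in software means the injection point is a count of guest events, not the moment a (possibly late) NMI happened to arrive.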
The main questions I have at the moment are the following:
1. What is the best way to start and ID multiple guests? I've just
been configuring a script to start up two of them from the
command-line and storing an ID in the kvm_vcpu structure. The first to
reach vmc_run() is designated as the primary and the other is
then the secondary that will replicate what the primary does. I'm open
to ideas on automating the creation of multiple guests (with pinning
to CPUs).
2. Where should the bulk of the buffering and synchronization be done?
I've been putting everything in the hypervisor since it can see all
the guests. It may make sense to put some things in the qemu code and
make use of other IPC mechanisms for synchronization, though.
3. What is the best way to deal with I/O string/string_down calls that
handle the I/O in kvmctl.c? I would guess it will be necessary to
buffer the memory range somewhere and pass its address to the
secondary guest so it can access the data when it gets to that point.
4. DMA? I haven't thought about this much yet and am using simple
guests that don't do it. It will need to be handled eventually,
though.
There are many others, but these are the top things that I'm still not
clear on. I'd appreciate any input from others experienced in this
area.
Thanks,
Casey
-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <cb6aceaa0702010743v7fba298y23e825ef1e706801-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2007-02-01 16:10 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160A3F2298-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-02-01 16:15 ` Avi Kivity
1 sibling, 1 reply; 7+ messages in thread
From: Dor Laor @ 2007-02-01 16:10 UTC (permalink / raw)
To: Casey Jeffery, kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
>I wanted to give an update on a project I'm attempting with KVM. The
>goal is to implement software-based lockstepping of two or more
>virtual guests. There are a number of applications of this, and the
>most closely related is the ReVirt project that was done with UML and Xen by
>Dunlap et al.
>
>The basic idea is to remove the non-deterministic events that occur in
>the system by giving each guest the same I/O values from external
>devices, TSC, etc., and only injecting interrupts at deterministic
>points in the instruction stream. I'm actually trying to get multiple
>guests to run simultaneously rather than storing the run of one guest
>and repeating it at a later time, but it's basically the same thing.
>The plan is to use the perf counters to trigger NMIs after a given
>number of events (e.g. 1k branches) and to deliver all interrupts at
>those VMEXITs in all guests. It is also necessary to buffer all
>external input from one guest and give the same input to all the
>others, as well as optionally check all external output for
>consistency.
>
>With a couple of pointers from Avi, I have figured out how to make use of
>the performance counter MSRs in KVM to trigger NMIs that exit to the
>hypervisor at deterministic points in the guest. There is still an
>issue of delayed NMI delivery that causes the NMI to actually occur in
>root mode. This seems to occur if a perf counter overflows near
>another event that causes a VMEXIT: by the time the NMI propagates
>up the pipeline, the CPU has already exited. I think this can be
>handled by using both perf counters and keeping a global event counter
>that is used to determine the point at which to inject interrupts.
>
>The main questions I have at the moment are the following:
>
>1. What is the best way to start and ID multiple guests? I've just
>been configuring a script to start up two of them from the
>command-line and storing an ID in the kvm_vcpu structure. The first to
>reach vmc_run() is designated as the primary and the other is
>then the secondary that will replicate what the primary does. I'm open
>to ideas on automating the creation of multiple guests (with pinning
>to CPUs).
It's logical to have such a unique VM ID, similar to a Xen domain ID.
Do you intend to use it as an ID for the deterministic guest to hook
into the primary executing guest?
>2. Where should the bulk of the buffering and synchronization be done?
>I've been putting everything in the hypervisor since it can see all
>the guests. It may make sense to put some things in the qemu code and
>make use of other IPC mechanisms for synchronization, though.
Except for code that must reside within KVM (the performance counter
logic), I would put everything that is not performance critical (e.g.
memory copies) in user space, with special hooks into Qemu-kvm.
This gives a cleaner separation and makes the coding easier.
>3. What is the best way to deal with I/O string/string_down calls that
>handle the I/O in kvmctl.c? I would guess it will be necessary to
>buffer the memory range somewhere and pass its address to the
>secondary guest so it can access the data when it gets to that point.
Do you assume that the deterministic guest (slave) resides on the same
host?
It's better to put the data in a stream and send it to the slave guest
over a pipe/socket.
>4. DMA? I haven't thought about this much yet and am using simple
>guests that don't do it. It will need to be handled eventually,
>though.
All network/disk I/O should be sent to the slave guest too. You might
need to hook into the Qemu disk/net devices and replace them temporarily
in slave mode.
>
>There are many others, but these are the top things that I'm still not
>clear on. I'd appreciate any input from others experienced in this
>area.
>
>Thanks,
>Casey
Good luck, it's really a fun project.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <cb6aceaa0702010743v7fba298y23e825ef1e706801-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-02-01 16:10 ` Dor Laor
@ 2007-02-01 16:15 ` Avi Kivity
[not found] ` <45C21211.3050509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
1 sibling, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2007-02-01 16:15 UTC (permalink / raw)
To: Casey Jeffery; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Casey Jeffery wrote:
> 1. What is the best way to start and ID multiple guests? I've just
> been configuring a script to start up two of them from the
> command-line and storing an ID in the kvm_vcpu structure. The first to
> reach vmc_run() is designated as the primary and the other is
> then the secondary that will replicate what the primary does.
You already have a guest ID -- the pid of the process which created the
VM. Is there any reason to have another ID?
> I'm open
> to ideas on automating the creation of multiple guests (with pinning
> to CPUs).
>
You can pin using taskset(1) or sched_setaffinity(2). I don't
understand the problem with automating guest creation.
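sched_setaffinity(2) can also serve the automation question: a single launcher can fork the primary and the secondary and pin each one, which is what `taskset -c CPU command` does from the shell. A minimal sketch; pin_to_cpu() and spawn_pinned() are hypothetical helpers, and the guest command lines are whatever the current start-up script runs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/* Pin process `pid` (0 = the caller) to a single CPU.
 * Returns 0 on success, -1 with errno set on failure. */
static int pin_to_cpu(pid_t pid, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(pid, sizeof(set), &set);
}

/* Fork, pin the child to `cpu`, then exec argv in it (like taskset -c).
 * Returns the child's pid to the parent, or -1 if fork() failed. */
static pid_t spawn_pinned(int cpu, char *const argv[])
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;          /* parent, or -1 on error */
    if (pin_to_cpu(0, cpu) != 0)
        _exit(126);          /* could not pin */
    execvp(argv[0], argv);
    _exit(127);              /* exec failed */
}
```

A launcher would call spawn_pinned() once per guest with a different CPU and then waitpid() on both children.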
> 2. Where should the bulk of the buffering and synchronization be done?
> I've been putting everything in the hypervisor since it can see all
> the guests. It may make sense to put some things in the qemu code and
> make use of other IPC mechanisms for synchronization, though.
>
Userspace is the best place.
> 3. What is the best way to deal with I/O string/string_down calls that
> handle the I/O in kvmctl.c? I would guess it will be necessary to
> buffer the memory range somewhere and pass its address to the
> secondary guest so it can access the data when it gets to that point.
>
I'd suggest serializing everything over a socket. That way you can run
the primary and backup on different machines.
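A minimal sketch of such serialization, assuming each logged event is sent as a fixed header (event type, payload length, sync point) followed by the payload, with header fields written big-endian so primary and backup can run on different machines. The record layout and function names are hypothetical; the loops handle the short reads and writes that stream sockets allow.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical wire header: type (4 bytes), length (4), sync point (8). */
#define HDR_LEN 16

static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--, v >>= 8)
        p[i] = (uint8_t)v;
}

static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Loop until all `len` bytes are written; 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loop until all `len` bytes are read; -1 on error or if the peer closed. */
static int read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int send_event(int fd, uint32_t type, uint64_t sync_point,
                      const void *payload, uint32_t len)
{
    uint8_t hdr[HDR_LEN];
    put_be(hdr, type, 4);
    put_be(hdr + 4, len, 4);
    put_be(hdr + 8, sync_point, 8);
    if (write_all(fd, hdr, HDR_LEN) != 0)
        return -1;
    return len ? write_all(fd, payload, len) : 0;
}

/* Receive one event into `payload` (room for `cap` bytes). */
static int recv_event(int fd, uint32_t *type, uint64_t *sync_point,
                      void *payload, uint32_t cap, uint32_t *len)
{
    uint8_t hdr[HDR_LEN];
    if (read_all(fd, hdr, HDR_LEN) != 0)
        return -1;
    *type = (uint32_t)get_be(hdr, 4);
    *len = (uint32_t)get_be(hdr + 4, 4);
    *sync_point = get_be(hdr + 8, 8);
    if (*len > cap)
        return -1;           /* too large for the caller's buffer */
    return *len ? read_all(fd, payload, *len) : 0;
}
```

The same code works over a local socketpair or a TCP connection, which is what makes running the backup on another machine a matter of where the socket points.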
> 4. DMA? I haven't thought about this much yet and am using simple
> guests that don't do it. It will need to be handled eventually,
> though.
>
Since DMA is handled by qemu, you already have a sync point. Just tag
the guest address and data with that sync point.
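Following that suggestion, a DMA write performed by qemu on the primary can be logged as (sync point, guest physical address, data) and applied by the backup to its own guest memory when it reaches the same sync point. A minimal sketch, with guest RAM modeled as a flat buffer; struct dma_rec and apply_dma() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical log record for one DMA transfer into guest memory. */
struct dma_rec {
    uint64_t sync_point; /* guest event count at which the DMA completed */
    uint64_t gpa;        /* guest physical address written */
    uint32_t len;        /* number of bytes */
    const uint8_t *data; /* bytes written by the device model */
};

/* Backup side: apply `rec` to guest RAM when the backup is at `now`.
 * Returns 1 if applied, 0 if not due yet, -1 if the sync point was
 * missed or the range falls outside guest RAM. */
static int apply_dma(uint8_t *ram, size_t ram_size, uint64_t now,
                     const struct dma_rec *rec)
{
    if (rec->sync_point > now)
        return 0;
    if (rec->sync_point < now)
        return -1;                                   /* diverged */
    if (rec->gpa > ram_size || rec->len > ram_size - rec->gpa)
        return -1;                                   /* outside guest RAM */
    memcpy(ram + rec->gpa, rec->data, rec->len);
    return 1;
}
```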
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <64F9B87B6B770947A9F8391472E032160A3F2298-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
@ 2007-02-01 17:44 ` Casey Jeffery
0 siblings, 0 replies; 7+ messages in thread
From: Casey Jeffery @ 2007-02-01 17:44 UTC (permalink / raw)
To: Dor Laor; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
Thanks for the very quick response. You guys at Qumranet are good. :)
On 2/1/07, Dor Laor <dor.laor-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> >The main questions I have at the moment are the following:
> >
> >1. What is the best way to start and ID multiple guests? I've just
> >been configuring a script to start up two of them from the
> >command-line and storing an ID in the kvm_vcpu structure. The first to
> >reach vmc_run() is designated as the primary and the other is
> >then the secondary that will replicate what the primary does. I'm open
> >to ideas on automating the creation of multiple guests (with pinning
> >to CPUs).
>
> It's logical to have such a unique VM ID, similar to a Xen domain ID.
> Do you intend to use it as an ID for the deterministic guest to hook
> into the primary executing guest?
I was just looking to use the ID as a way for the hypervisor to
distinguish which guest is the primary and which is the
backup/replica. If logic is put in the user-mode qemu code, it also
needs to know whether the guest is the primary or the backup.
>
> >2. Where should the bulk of the buffering and synchronization be done?
> >I've been putting everything in the hypervisor since it can see all
> >the guests. It may make sense to put some things in the qemu code and
> >make use of other IPC mechanisms for synchronization, though.
>
> Except for code that must reside within KVM (the performance counter
> logic), I would put everything that is not performance critical (e.g.
> memory copies) in user space, with special hooks into Qemu-kvm.
> This gives a cleaner separation and makes the coding easier.
>
I thought this would be the case. It would be convenient if the
guest could be completely unaware that it is being replicated, with
the hypervisor doing everything in the background, but it does make
sense to make the qemu process aware of it to make the coding easier
and to keep a lot of policy out of the kernel.
> >3. What is the best way to deal with I/O string/string_down calls that
> >handle the I/O in kvmctl.c? I would guess it will be necessary to
> >buffer the memory range somewhere and pass its address to the
> >secondary guest so it can access the data when it gets to that point.
>
> Do you assume that the deterministic guest (slave) resides on the same
> host?
> It's better to put the data in a stream and send it to the slave guest
> over a pipe/socket.
>
I was assuming all guests would run on a single, multi-core machine.
It would be an interesting extension to replicate across different
physical hosts, which would enable usage similar to Marathon Technologies'
distributed lockstepping. I will explore the socket option, although
I'm trying to trade off flexibility against just getting
something working.
> >4. DMA? I haven't thought about this much yet and am using simple
> >guests that don't do it. It will need to be handled eventually,
> >though.
>
> All network/disk I/O should be sent to the slave guest too. You might
> need to hook into the Qemu disk/net devices and replace them temporarily
> in slave mode.
>
I will have to explore QEMU some more, but it's reassuring that you
and Avi suggest it's straightforward.
> >
> >There are many others, but these are the top things that I'm still not
> >clear on. I'd appreciate any input from others experienced in this
> >area.
> >
> >Thanks,
> >Casey
>
> Good luck, it's really a fun project.
>
Thanks for the help.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: KVM and Perf Counters
[not found] ` <45C21211.3050509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
@ 2007-02-01 19:28 ` Casey Jeffery
0 siblings, 0 replies; 7+ messages in thread
From: Casey Jeffery @ 2007-02-01 19:28 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
On 2/1/07, Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org> wrote:
> Casey Jeffery wrote:
> > 1. What is the best way to start and ID multiple guests? I've just
> > been configuring a script to start up two of them from the
> > command-line and storing an ID in the kvm_vcpu structure. The first to
> > reach vmc_run() is designated as the primary and the other is
> > then the secondary that will replicate what the primary does.
>
> You already have a guest ID -- the pid of the process which created the
> VM. Is there any reason to have another ID?
>
I guess ID isn't a good term to use. I was using the ID to identify
the guest as being primary or secondary. I could map the PID to one
or the other, but I don't see a benefit over just labeling them
directly.
>
> > I'm open
> > to ideas on automating the creation of multiple guests (with pinning
> > to CPUs).
> >
>
> You can pin using taskset(1) or sched_setaffinity(2). I don't
> understand the problem with automating guest creation.
>
I'm currently using a simple shell script that starts the two processes
separately, using taskset to do the pinning. By automation, I was
thinking of passing an argument to a single qemu invocation and
having it start the multiple processes and pin them. It's a
very minor point.
>
> > 2. Where should the bulk of the buffering and synchronization be done?
> > I've been putting everything in the hypervisor since it can see all
> > the guests. It may make sense to put some things in the qemu code and
> > make use of other IPC mechanisms for synchronization, though.
> >
>
> Userspace is the best place.
ok
>
> > 3. What is the best way to deal with I/O string/string_down calls that
> > handle the I/O in kvmctl.c? I would guess it will be necessary to
> > buffer the memory range somewhere and pass its address to the
> > secondary guest so it can access the data when it gets to that point.
> >
>
> I'd suggest serializing everything over a socket. That way you can run
> the primary and backup on different machines.
>
This is a good point, and I will explore it. I hadn't been concerned
with running across machines, but it may fall out as a free feature in
such an implementation as long as performance isn't too bad.
> > 4. DMA? I haven't thought about this much yet and am using simple
> > guests that don't do it. It will need to be handled eventually,
> > though.
> >
>
> Since DMA is handled by qemu, you already have a sync point. Just tag
> the guest address and data with that sync point.
>
Thanks again for the help.
-Casey
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2007-02-01 19:28 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-01-10 1:26 KVM and Perf Counters Casey Jeffery
[not found] ` <cb6aceaa0701091726h7fe94bb3q73db49e43ff226ed-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-01-10 9:37 ` Avi Kivity
[not found] ` <45A4B3E4.7090202-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-02-01 15:43 ` Casey Jeffery
[not found] ` <cb6aceaa0702010743v7fba298y23e825ef1e706801-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2007-02-01 16:10 ` Dor Laor
[not found] ` <64F9B87B6B770947A9F8391472E032160A3F2298-yEcIvxbTEBqsx+V+t5oei8rau4O3wl8o3fe8/T/H7NteoWH0uzbU5w@public.gmane.org>
2007-02-01 17:44 ` Casey Jeffery
2007-02-01 16:15 ` Avi Kivity
[not found] ` <45C21211.3050509-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-02-01 19:28 ` Casey Jeffery