* [Qemu-devel] MTTCG Tasks (kvmforum summary)
@ 2015-09-04  7:49 Alex Bennée
  2015-09-04  8:10 ` Frederic Konrad
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread

From: Alex Bennée @ 2015-09-04  7:49 UTC (permalink / raw)
To: qemu-devel, mttcg
Cc: mark.burton, claudio.fontana, a.rigo, Emilio G. Cota,
    Alexander Spyridakis, Pbonzini, fred.konrad

Hi,

At KVM Forum I sat down with Paolo and Frederic and we came up with the
current outstanding tasks on MTTCG. This is not comprehensive but
hopefully covers the big areas. They are sorted in the rough order we'd
like to get them up-streamed.

* linux-user patches (Paolo)

Paolo has already grabbed a bunch of Fred's patch set where it makes
sense on its own. The patches are up on the list and need review to
expedite their way into the main tree now it is open for pull requests.

See thread: 1439397664-70734-1-git-send-email-pbonzini@redhat.com

* TLB_EXCL based LL/SC patches (Alvise)

I think our consensus is these provide a good basis for the solution to
modelling our atomics within TCG. I haven't had a chance to review
Emilio's series yet, which may approach this problem differently. I
think the core patches with the generic backend support make a good
basis to base development work on.

We need to iterate and review the non-MTTCG variant of the patch set
with a view to up-streaming soon.

* Signal-free qemu_cpu_kick (Paolo)

I don't know much about this patch set, but I assume it avoids the need
to catch signals and longjmp about just to wake up?

* RCU tb_flush (needs writing)

The idea has been floated to introduce an RCU-based translation buffer
so flushes can be done lazily and the buffers dumped once all threads
have stopped using them.

I have also been pondering whether translation regions would be worth
looking into, so we can have translation buffers for contiguous series
of pages. That way we don't have to throw away all translations on
these big events. Currently, every time we roll over the translation
buffer, we throw a bunch of perfectly good code away. This may or may
not be orthogonal to using RCU?

* Memory barrier support (need RFC for discussion)

I came to KVM Forum with a back-of-the-envelope idea that we could
implement one or two barrier ops (acquire/release?). Various other
types of memory-ordering behaviour have been suggested.

I'll try to pull together an RFC patch with a design outline for
discussion. It would be nice to be able to demonstrate barrier failures
in my test cases as well ;-)

* longjmp in cpu_exec

Paolo is fairly sure that if you take page faults while IRQs are
happening, problems will occur with cpu->interrupt_request. Does it
need to take the BQL?

I'd like to see if we can get a torture test to stress this code,
although it will require IPI support in the unit tests.

* tlb_flush and dmb behaviour (am I waiting for TLB flush?)

I think this means we need explicit memory barriers to sync updates to
the TLB.

* tb_find_fast outside the lock

Currently this is a big performance win, as tb_find_fast has a lot of
contention with other threads. However, there is concern it needs to be
properly protected.

* Additional review comments on Fred's branch

  - page->code_bitmap isn't protected by a lock
  - cpu_breakpoint_insert needs locks
  - check gdbstub works

* What to do about icount?

What is the impact of multi-threading on icount? Do we need to disable
it for MTTCG, or can it be correct per-CPU? Can it be updated in
lock-step?

We need some input from the guys that use icount the most.

Cheers,

--
Alex Bennée

^ permalink raw reply	[flat|nested] 11+ messages in thread
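The lazy-flush idea above can be made concrete with a quiescent-state
scheme. The sketch below is purely illustrative, with invented names
rather than QEMU's actual API: each vCPU bumps a counter at a safe point
(e.g. between translation blocks), the flusher snapshots the counters
when it retires the old buffer, and the buffer may only be freed once
every vCPU has passed a safe point since the snapshot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical quiescent-state based reclamation for a retired
 * translation buffer.  Real code would need atomic accesses to the
 * counters; plain loads/stores keep the idea visible here. */

/* Each vCPU calls this at a safe point (e.g. between TBs). */
static void vcpu_quiescent(unsigned long *qs_count, int cpu)
{
    qs_count[cpu]++;
}

/* Flusher side: snapshot every vCPU's counter when the old buffer
 * is retired and translation switches to a fresh buffer. */
static void qs_snapshot(const unsigned long *qs_count,
                        unsigned long *snap, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++) {
        snap[i] = qs_count[i];
    }
}

/* The old buffer may be freed once every vCPU has quiesced at least
 * once since the snapshot, i.e. no thread can still be executing
 * code out of it. */
static bool grace_period_elapsed(const unsigned long *qs_count,
                                 const unsigned long *snap, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++) {
        if (qs_count[i] == snap[i]) {
            return false;
        }
    }
    return true;
}
```

Until `grace_period_elapsed()` returns true, translation simply carries
on in the fresh buffer, so a tb_flush no longer has to stop the world.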
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  7:49 [Qemu-devel] MTTCG Tasks (kvmforum summary) Alex Bennée
@ 2015-09-04  8:10 ` Frederic Konrad
  2015-09-04  9:25 ` Paolo Bonzini
  2015-09-04  9:45 ` dovgaluk
  2 siblings, 0 replies; 11+ messages in thread

From: Frederic Konrad @ 2015-09-04  8:10 UTC (permalink / raw)
To: Alex Bennée, qemu-devel, mttcg
Cc: mark.burton, claudio.fontana, a.rigo, Emilio G. Cota,
    Alexander Spyridakis, Pbonzini

Hi Alex,

On 04/09/2015 09:49, Alex Bennée wrote:
> Hi,
>
> At KVM Forum I sat down with Paolo and Frederic and we came up with the
> current outstanding tasks on MTTCG. This is not comprehensive but
> hopefully covers the big areas. They are sorted in the rough order we'd
> like to get them up-streamed.
>
> * linux-user patches (Paolo)
>
> Paolo has already grabbed a bunch of Fred's patch set where it makes
> sense on its own. The patches are up on the list and need review to
> expedite their way into the main tree now it is open for pull requests.
>
> See thread: 1439397664-70734-1-git-send-email-pbonzini@redhat.com
>
> * TLB_EXCL based LL/SC patches (Alvise)
>
> I think our consensus is these provide a good basis for the solution to
> modelling our atomics within TCG. I haven't had a chance to review
> Emilio's series yet, which may approach this problem differently. I
> think the core patches with the generic backend support make a good
> basis to base development work on.
>
> We need to iterate and review the non-MTTCG variant of the patch set
> with a view to up-streaming soon.
>
> * Signal-free qemu_cpu_kick (Paolo)
>
> I don't know much about this patch set, but I assume it avoids the need
> to catch signals and longjmp about just to wake up?
>
> * RCU tb_flush (needs writing)
>
> The idea has been floated to introduce an RCU-based translation buffer
> so flushes can be done lazily and the buffers dumped once all threads
> have stopped using them.
>
> I have also been pondering whether translation regions would be worth
> looking into, so we can have translation buffers for contiguous series
> of pages. That way we don't have to throw away all translations on
> these big events. Currently, every time we roll over the translation
> buffer, we throw a bunch of perfectly good code away. This may or may
> not be orthogonal to using RCU?

I'm still not sure tb_flush needs so much effort. tb_flush happens very
rarely; just exiting everybody seems easier.

> * Memory barrier support (need RFC for discussion)
>
> I came to KVM Forum with a back-of-the-envelope idea that we could
> implement one or two barrier ops (acquire/release?). Various other
> types of memory-ordering behaviour have been suggested.
>
> I'll try to pull together an RFC patch with a design outline for
> discussion. It would be nice to be able to demonstrate barrier failures
> in my test cases as well ;-)
>
> * longjmp in cpu_exec
>
> Paolo is fairly sure that if you take page faults while IRQs are
> happening, problems will occur with cpu->interrupt_request. Does it
> need to take the BQL?
>
> I'd like to see if we can get a torture test to stress this code,
> although it will require IPI support in the unit tests.
>
> * tlb_flush and dmb behaviour (am I waiting for TLB flush?)
>
> I think this means we need explicit memory barriers to sync updates to
> the TLB.
>
> * tb_find_fast outside the lock
>
> Currently this is a big performance win, as tb_find_fast has a lot of
> contention with other threads. However, there is concern it needs to be
> properly protected.
>
> * Additional review comments on Fred's branch
>
>   - page->code_bitmap isn't protected by a lock
>   - cpu_breakpoint_insert needs locks

Those ones are OK; I just didn't send all of that yet.

>   - check gdbstub works
>
> * What to do about icount?
>
> What is the impact of multi-threading on icount? Do we need to disable
> it for MTTCG, or can it be correct per-CPU? Can it be updated in
> lock-step?
>
> We need some input from the guys that use icount the most.
>
> Cheers,

Also we might want to have everything in a branch. Rebasing the atomics
series on, e.g., 2.4.0 plus Paolo's two series, so that I can rebase
MTTCG on it and pick up the MTTCG atomic parts, would be useful.

Thanks,
Fred

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  7:49 [Qemu-devel] MTTCG Tasks (kvmforum summary) Alex Bennée
  2015-09-04  8:10 ` Frederic Konrad
@ 2015-09-04  9:25 ` Paolo Bonzini
  2015-09-04  9:41   ` Edgar E. Iglesias
  2015-09-04  9:45 ` dovgaluk
  2 siblings, 1 reply; 11+ messages in thread

From: Paolo Bonzini @ 2015-09-04  9:25 UTC (permalink / raw)
To: Alex Bennée, qemu-devel, mttcg
Cc: mark.burton, claudio.fontana, a.rigo, Emilio G. Cota,
    Alexander Spyridakis, Edgar E. Iglesias, fred.konrad

On 04/09/2015 09:49, Alex Bennée wrote:
> * Signal-free qemu_cpu_kick (Paolo)
>
> I don't know much about this patch set, but I assume it avoids the need
> to catch signals and longjmp about just to wake up?

It was part of Fred's patches, so I've extracted it into its own series.
Removing 150 lines of code can't hurt.

> * Memory barrier support (need RFC for discussion)
>
> I came to KVM Forum with a back-of-the-envelope idea that we could
> implement one or two barrier ops (acquire/release?). Various other
> types of memory-ordering behaviour have been suggested.
>
> I'll try to pull together an RFC patch with a design outline for
> discussion. It would be nice to be able to demonstrate barrier failures
> in my test cases as well ;-)

Emilio has something about it in his own MTTCG implementation.

> * longjmp in cpu_exec
>
> Paolo is fairly sure that if you take page faults while IRQs are
> happening, problems will occur with cpu->interrupt_request. Does it
> need to take the BQL?
>
> I'd like to see if we can get a torture test to stress this code,
> although it will require IPI support in the unit tests.

It's x86-specific (hardware interrupts push to the stack and can cause a
page fault or other exception), so a unit test can be written for it.

> * tlb_flush and dmb behaviour (am I waiting for TLB flush?)
>
> I think this means we need explicit memory barriers to sync updates to
> the TLB.

Yes.

> * tb_find_fast outside the lock
>
> Currently this is a big performance win, as tb_find_fast has a lot of
> contention with other threads. However, there is concern it needs to be
> properly protected.

This, BTW, can be done for user-mode emulation first, so it can go in
early. Same for the RCU-ized code_gen_buffer.

> * What to do about icount?
>
> What is the impact of multi-threading on icount? Do we need to disable
> it for MTTCG, or can it be correct per-CPU? Can it be updated in
> lock-step?
>
> We need some input from the guys that use icount the most.

That means Edgar. :)

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread
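For reference, the kind of ordering an acquire/release pair of barrier
ops would have to preserve is the classic message-passing pattern. The
sketch below uses C11 atomics with made-up names purely for
illustration; it is not QEMU code and not the proposed TCG ops
themselves.

```c
#include <assert.h>
#include <stdatomic.h>

/* Message-passing litmus pattern: the producer publishes data with a
 * release store, the consumer observes the flag with an acquire load.
 * If a guest uses such a pattern, the generated host code must keep
 * the same ordering -- which is what the barrier ops are for. */

struct mailbox {
    atomic_int data;
    atomic_int ready;
};

static void mbox_send(struct mailbox *m, int value)
{
    atomic_store_explicit(&m->data, value, memory_order_relaxed);
    /* release: data must be visible before ready is observed set */
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Returns the published value once ready, -1 otherwise. */
static int mbox_try_recv(struct mailbox *m)
{
    /* acquire: if we see ready == 1, we must also see the data */
    if (atomic_load_explicit(&m->ready, memory_order_acquire)) {
        return atomic_load_explicit(&m->data, memory_order_relaxed);
    }
    return -1;
}
```

A barrier-failure test case of the kind Alex mentions would run the two
halves on different vCPUs and check that a set flag never pairs with
stale data.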
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  9:25 ` Paolo Bonzini
@ 2015-09-04  9:41   ` Edgar E. Iglesias
  2015-09-04 10:18     ` Mark Burton
  0 siblings, 1 reply; 11+ messages in thread

From: Edgar E. Iglesias @ 2015-09-04  9:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: mttcg, claudio.fontana, mark.burton, a.rigo, qemu-devel,
    Emilio G. Cota, Alexander Spyridakis, Edgar E. Iglesias,
    Alex Bennée, fred.konrad

On Fri, Sep 04, 2015 at 11:25:33AM +0200, Paolo Bonzini wrote:
> On 04/09/2015 09:49, Alex Bennée wrote:
[...]
> > * What to do about icount?
> >
> > What is the impact of multi-threading on icount? Do we need to disable
> > it for MTTCG, or can it be correct per-CPU? Can it be updated in
> > lock-step?
> >
> > We need some input from the guys that use icount the most.
>
> That means Edgar. :)

Hi!

IMO it would be nice if we could run the cores in some kind of lock-step
with a configurable number of instructions (X) that they can run ahead.

For example, if X is 10000, every thread/core would checkpoint at
10000-instruction boundaries and wait for the other cores. Between these
checkpoints, the cores will not be in sync. We might also need to
consider synchronizing at I/O accesses to avoid weird timing issues when
reading counter registers, for example.

Of course the devil will be in the details, but an approach roughly like
that sounds useful to me.

Are there any other ideas floating around that may be better?

BTW, where can I find the latest series? Is it on a git repo/branch
somewhere?

Best regards,
Edgar

^ permalink raw reply	[flat|nested] 11+ messages in thread
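The bookkeeping behind a checkpoint scheme like the one Edgar describes
is simple arithmetic. This is a hedged sketch with invented names, not
an implementation: with quantum X, a vCPU checkpoints at the next
multiple of X, and may only run while it stays within one quantum of the
slowest vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Next instruction-count boundary at which a vCPU must checkpoint
 * and wait for the other cores. */
static uint64_t next_checkpoint(uint64_t icount, uint64_t quantum)
{
    return (icount / quantum + 1) * quantum;
}

/* Instruction count of the vCPU that is furthest behind. */
static uint64_t slowest_icount(const uint64_t *icounts, int nr_cpus)
{
    uint64_t min = icounts[0];
    for (int i = 1; i < nr_cpus; i++) {
        if (icounts[i] < min) {
            min = icounts[i];
        }
    }
    return min;
}

/* A vCPU may keep executing only while it is less than one quantum
 * ahead of the slowest vCPU; otherwise it blocks at its checkpoint. */
static bool may_run_ahead(uint64_t my_icount, const uint64_t *icounts,
                          int nr_cpus, uint64_t quantum)
{
    return my_icount < slowest_icount(icounts, nr_cpus) + quantum;
}
```

Between checkpoints the cores drift freely, exactly as in the proposal;
only the quantum boundary forces a rendezvous.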
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  9:41   ` Edgar E. Iglesias
@ 2015-09-04 10:18     ` Mark Burton
  2015-09-04 13:00       ` Lluís Vilanova
  0 siblings, 1 reply; 11+ messages in thread

From: Mark Burton @ 2015-09-04 10:18 UTC (permalink / raw)
To: Edgar E. Iglesias
Cc: mttcg, Alexander Spyridakis, Claudio Fontana, Alvise Rigo,
    qemu-devel, Emilio G. Cota, Edgar E. Iglesias, Paolo Bonzini,
    Alex Bennée, KONRAD Frédéric

> On 4 Sep 2015, at 11:41, Edgar E. Iglesias <edgar.iglesias@xilinx.com> wrote:
>
[...]
> IMO it would be nice if we could run the cores in some kind of lock-step
> with a configurable number of instructions (X) that they can run ahead.
>
> For example, if X is 10000, every thread/core would checkpoint at
> 10000-instruction boundaries and wait for the other cores. Between these
> checkpoints, the cores will not be in sync. We might also need to
> consider synchronizing at I/O accesses to avoid weird timing issues when
> reading counter registers, for example.
>
> Of course the devil will be in the details, but an approach roughly like
> that sounds useful to me.

And it "works" in other domains.

Theoretically we don't need to sync at IO (dynamic quantums); for most
systems that have 'normal' IO it is normally less efficient, I believe.
However, the trouble is that the user typically doesn't know, and
mucking about with quantum lengths, dynamic quantum switches etc. is
probably a royal pain in the butt. And if you don't set your quantum
right, the thing will run really slowly (or will break)...

The choices are a rock or a hard place. Dynamic quantums risk being slow
(you'll be forcing an expensive 'sync' - all CPUs will have to exit
etc.) on each IO access from each core... not great. Syncing with host
time (e.g. each CPU tries to sync with the host clock as best it can)
will fail when one or other CPU can't keep up... In the end you are left
handing the user a nice long bit of string and a message saying "hang
yourself here".

Cheers
Mark.

> Are there any other ideas floating around that may be better?
>
> BTW, where can I find the latest series? Is it on a git repo/branch
> somewhere?
>
> Best regards,
> Edgar

+44 (0)20 7100 3485 x 210
+33 (0)5 33 52 01 77 x 210
+33 (0)603762104
mark.burton

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04 10:18     ` Mark Burton
@ 2015-09-04 13:00       ` Lluís Vilanova
  2015-09-04 13:10         ` dovgaluk
  0 siblings, 1 reply; 11+ messages in thread

From: Lluís Vilanova @ 2015-09-04 13:00 UTC (permalink / raw)
To: Mark Burton
Cc: Edgar E. Iglesias, mttcg, Alexander Spyridakis, Claudio Fontana,
    qemu-devel, Alvise Rigo, Emilio G. Cota, Paolo Bonzini,
    Edgar E. Iglesias, Alex Bennée, KONRAD Frédéric

Mark Burton writes:
[...]
>> IMO it would be nice if we could run the cores in some kind of lock-step
>> with a configurable number of instructions (X) that they can run ahead.
[...]
> The choices are a rock or a hard place. Dynamic quantums risk being slow
> (you'll be forcing an expensive 'sync' - all CPUs will have to exit
> etc.) on each IO access from each core... not great. Syncing with host
> time (e.g. each CPU tries to sync with the host clock as best it can)
> will fail when one or other CPU can't keep up... In the end you are left
> handing the user a nice long bit of string and a message saying "hang
> yourself here".

That price would not be paid when icount is disabled. Well, the code
complexity price is always paid... I meant the runtime price :)

Then, I think this depends on what type of guarantees you require from
icount. I see two possible semantics:

* All CPUs are *exactly* synchronized at icount granularity

  This means that every icount instructions, everyone has to stop and
  synchronize.

* All CPUs are *loosely* synchronized at icount granularity

  You can implement it in a way that ensures that every CPU has *at
  least* reached a certain timestamp, so CPUs can keep on running
  nonetheless.

The downside is that the latter loses the ability to have reproducible
runs, which IMHO are useful. A more complex option is to merge both:
icount sets the "synchronization granularity" and another parameter sets
the maximum delta between CPUs (i.e., set it to 0 to have the first
option, and to infinity for the second).

Cheers,
  Lluis

--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in
   The Phantom Tollbooth

^ permalink raw reply	[flat|nested] 11+ messages in thread
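The merged option described above (a synchronization granularity plus a
maximum delta between CPUs) reduces to a single predicate. The sketch
below is illustrative only, with invented names; it is not real QEMU
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Merged scheme: max_delta bounds how far any vCPU may run ahead of
 * the slowest one.  max_delta == 0 degenerates to exact lockstep
 * (reproducible runs); max_delta == UINT64_MAX is fully loose
 * synchronization where CPUs keep running nonetheless. */
static bool cpu_may_advance(uint64_t my_icount, uint64_t slowest_icount,
                            uint64_t max_delta)
{
    /* my_icount >= slowest_icount by construction */
    return my_icount - slowest_icount <= max_delta;
}
```

A single tunable thus spans the whole spectrum between the two proposed
semantics.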
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04 13:00       ` Lluís Vilanova
@ 2015-09-04 13:10         ` dovgaluk
  2015-09-04 14:59           ` Lluís Vilanova
  0 siblings, 1 reply; 11+ messages in thread

From: dovgaluk @ 2015-09-04 13:10 UTC (permalink / raw)
To: Lluís Vilanova
Cc: Edgar E. Iglesias, mttcg, Paolo Bonzini, Claudio Fontana,
    Mark Burton, Alvise Rigo, qemu-devel, Emilio G. Cota,
    Alexander Spyridakis, Edgar E. Iglesias, mttcg-request,
    Alex Bennée, KONRAD Frédéric

Lluís Vilanova wrote on 2015-09-04 16:00:
[...]
> Then, I think this depends on what type of guarantees you require from
> icount. I see two possible semantics:
>
> * All CPUs are *exactly* synchronized at icount granularity
>
>   This means that every icount instructions, everyone has to stop and
>   synchronize.
>
> * All CPUs are *loosely* synchronized at icount granularity
>
>   You can implement it in a way that ensures that every CPU has *at
>   least* reached a certain timestamp, so CPUs can keep on running
>   nonetheless.

Does a third possibility look sane?

* All CPUs synchronize at shared memory operations

  When somebody tries to read/write shared memory, it should wait until
  all the others have reached the same icount.

> The downside is that the latter loses the ability to have reproducible
> runs, which IMHO are useful. A more complex option is to merge both:
> icount sets the "synchronization granularity" and another parameter
> sets the maximum delta between CPUs (i.e., set it to 0 to have the
> first option, and to infinity for the second).

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04 13:10         ` dovgaluk
@ 2015-09-04 14:59           ` Lluís Vilanova
  0 siblings, 0 replies; 11+ messages in thread

From: Lluís Vilanova @ 2015-09-04 14:59 UTC (permalink / raw)
To: dovgaluk
Cc: Edgar E. Iglesias, mttcg, mttcg-request, Paolo Bonzini,
    Claudio Fontana, Mark Burton, Alvise Rigo, qemu-devel,
    Emilio G. Cota, Alexander Spyridakis, Edgar E. Iglesias,
    Alex Bennée, KONRAD Frédéric

dovgaluk writes:
[...]
> Does a third possibility look sane?
>
> * All CPUs synchronize at shared memory operations
>
>   When somebody tries to read/write shared memory, it should wait
>   until all the others have reached the same icount.

I think that's too heavyweight; every memory access is a potential
shared-memory operation. You could refine it by tagging which pages are
shared across cores and limiting the number of synchronizations, but
all pages would eventually end up as shared.

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in
   The Phantom Tollbooth

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  7:49 [Qemu-devel] MTTCG Tasks (kvmforum summary) Alex Bennée
  2015-09-04  8:10 ` Frederic Konrad
  2015-09-04  9:25 ` Paolo Bonzini
@ 2015-09-04  9:45 ` dovgaluk
  2015-09-04 12:38   ` Lluís Vilanova
  2 siblings, 1 reply; 11+ messages in thread

From: dovgaluk @ 2015-09-04  9:45 UTC (permalink / raw)
To: Alex Bennée
Cc: mttcg, claudio.fontana, mark.burton, qemu-devel, a.rigo,
    Emilio G. Cota, Alexander Spyridakis, Pbonzini, mttcg-request,
    fred.konrad

Hi!

Alex Bennée wrote on 2015-09-04 10:49:
> * What to do about icount?
>
> What is the impact of multi-threading on icount? Do we need to disable
> it for MTTCG, or can it be correct per-CPU? Can it be updated in
> lock-step?

Why can't we have a separate icount for each CPU?
Then the virtual timer will be assigned to one of them.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04  9:45 ` dovgaluk
@ 2015-09-04 12:38   ` Lluís Vilanova
  2015-09-04 12:46     ` Mark Burton
  0 siblings, 1 reply; 11+ messages in thread

From: Lluís Vilanova @ 2015-09-04 12:38 UTC (permalink / raw)
To: dovgaluk
Cc: mttcg, mttcg-request, claudio.fontana, mark.burton, qemu-devel,
    a.rigo, Emilio G. Cota, Alexander Spyridakis, Pbonzini,
    Alex Bennée, fred.konrad

dovgaluk writes:
> Hi!
> Alex Bennée wrote on 2015-09-04 10:49:
>> * What to do about icount?
>>
>> What is the impact of multi-threading on icount? Do we need to disable
>> it for MTTCG, or can it be correct per-CPU? Can it be updated in
>> lock-step?

> Why can't we have a separate icount for each CPU?
> Then the virtual timer will be assigned to one of them.

My understanding is that icount means by design that time should be
synchronized between CPUs, where the number of executed instructions is
the time unit. If all elements worked under this assumption (I'm afraid
that's not the case for I/O devices), it should be possible to reproduce
executions by setting icount to 1.

Now, MTTCG faces the same icount accuracy problems that the current TCG
implementation deals with (only at a different scale). The naive
implementation is to execute 1 instruction per CPU in lockstep. TCG
currently relaxes this at the translation-block level.

The MTTCG implementation could do something similar, just at a different
(configurable?) granularity. Every N per-CPU instructions, synchronize
all CPUs until each has, at least, arrived at that time step, then
proceed with the next batch. Ideally, this synchronization delay (N)
could be adapted dynamically.

My half cent.

Lluis

--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in
   The Phantom Tollbooth

^ permalink raw reply	[flat|nested] 11+ messages in thread
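One way a per-CPU icount could still drive a single virtual timer, as
asked above, is to derive virtual time from the slowest vCPU so that no
timer fires "in the past" of any CPU. The sketch below is purely
illustrative with an assumed shift parameter; QEMU's real icount code is
different.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed conversion: 1 insn == (1 << 3) ns, as with "-icount 3". */
#define ICOUNT_SHIFT 3

static int64_t icount_to_ns(uint64_t icount)
{
    return (int64_t)(icount << ICOUNT_SHIFT);
}

/* Guest-visible virtual clock derived from per-vCPU counters: it can
 * never run ahead of the vCPU that is furthest behind, so timer
 * callbacks stay consistent for every CPU. */
static int64_t virtual_clock_ns(const uint64_t *icounts, int nr_cpus)
{
    uint64_t min = icounts[0];
    for (int i = 1; i < nr_cpus; i++) {
        if (icounts[i] < min) {
            min = icounts[i];
        }
    }
    return icount_to_ns(min);
}
```

Under this model the "virtual timer assigned to one CPU" becomes a timer
driven by whichever vCPU currently trails the pack.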
* Re: [Qemu-devel] MTTCG Tasks (kvmforum summary)
  2015-09-04 12:38   ` Lluís Vilanova
@ 2015-09-04 12:46     ` Mark Burton
  0 siblings, 0 replies; 11+ messages in thread

From: Mark Burton @ 2015-09-04 12:46 UTC (permalink / raw)
To: Lluís Vilanova
Cc: mttcg, Alexander Spyridakis, Claudio Fontana, qemu-devel,
    Alvise Rigo, Emilio G. Cota, dovgaluk, Pbonzini, Alex Bennée,
    KONRAD Frédéric

> On 4 Sep 2015, at 14:38, Lluís Vilanova <vilanova@ac.upc.edu> wrote:
>
[...]
> The MTTCG implementation could do something similar, just at a different
> (configurable?) granularity. Every N per-CPU instructions, synchronize
> all CPUs until each has, at least, arrived at that time step, then
> proceed with the next batch. Ideally, this synchronization delay (N)
> could be adapted dynamically.

This is often called a Quantum.

Cheers
Mark

+44 (0)20 7100 3485 x 210
+33 (0)5 33 52 01 77 x 210
+33 (0)603762104
mark.burton

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads: [~2015-09-04 14:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2015-09-04  7:49 [Qemu-devel] MTTCG Tasks (kvmforum summary) Alex Bennée
2015-09-04  8:10 ` Frederic Konrad
2015-09-04  9:25 ` Paolo Bonzini
2015-09-04  9:41   ` Edgar E. Iglesias
2015-09-04 10:18     ` Mark Burton
2015-09-04 13:00       ` Lluís Vilanova
2015-09-04 13:10         ` dovgaluk
2015-09-04 14:59           ` Lluís Vilanova
2015-09-04  9:45 ` dovgaluk
2015-09-04 12:38   ` Lluís Vilanova
2015-09-04 12:46     ` Mark Burton