* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? [not found] <60169742.1c69fb81.90ae8.cdc6SMTPIN_ADDED_BROKEN@mx.google.com> @ 2021-01-31 23:01 ` Richard Henderson 2021-02-01 16:59 ` Liren Wei [not found] ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com> 0 siblings, 2 replies; 3+ messages in thread From: Richard Henderson @ 2021-01-31 23:01 UTC (permalink / raw) To: Liren Wei, qemu-devel@nongnu.org Cc: Paolo Bonzini, Alex Bennée, Peter Maydell On 1/31/21 1:38 AM, Liren Wei wrote: > However, similar to the situation described in: > https://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg02529.html > > When we have 2 vCPUs with one of them writing to the code page while > the other just translated some code within that same page, the following > situation might happen: > > vCPU thread 1 - writing vCPU thread 2 - translating > ----------------------- ----------------------- > TLB check -> slow path > notdirty_write() > set dirty flag > write to RAM > tb_gen_code() > tb_page_add() > tlb_protect_code() > > TLB check -> fast path > set TLB_NOTDIRTY > write to RAM > executing unmodified code for this time > and maybe also for the next time, never > re-translate modified TBs. > > > My question is: > Should the situation described above be considered as a bug or, > an intended behavior for QEMU (, so it's the programmer's fault > for not flushing the icache after modifying shared code page)? Yes, this is a bug, because we are trying to support e.g. x86 which does not require an icache flush. I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the setting, needs to happen before the bytes are read during translation. Otherwise we don't catch the case above, nor do we catch CPU1 CPU2 ------------------ -------------------------- TLB check -> fast tb_gen_code() -> all of it write to ram Also because of x86 (and other architectures in which a single instruction can span a page boundary), I think this lock+set+sync sequence needs to happen on demand in something called from the function set defined in include/exec/translator.h That also means that any target/cpu/ which has not been converted to use that interface remains broken, and should be converted or deprecated. Are you planning to work on this? r~ ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? 2021-01-31 23:01 ` [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? Richard Henderson @ 2021-02-01 16:59 ` Liren Wei [not found] ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com> 1 sibling, 0 replies; 3+ messages in thread From: Liren Wei @ 2021-02-01 16:59 UTC (permalink / raw) To: Richard Henderson, qemu-devel@nongnu.org On 2/1/21 7:01 AM, Richard Henderson wrote: > Yes, this is a bug, because we are trying to support e.g. x86 which does not > require an icache flush. That is too bad :( I know nothing about the modern hardware, it's really hard to imagine what is done in CPU to maintain the coherence when this kind of cross-modifying scenario happens. > I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the > setting, needs to happen before the bytes are read during translation. > Otherwise we don't catch the case above, nor do we catch > > CPU1 CPU2 > ------------------ -------------------------- > TLB check -> fast > tb_gen_code() -> all of it > write to ram > > Also because of x86 (and other architectures in which a single instruction can > span a page boundary), I think this lock+set+sync sequence needs to happen on > demand in something called from the function set defined in > include/exec/translator.h > > That also means that any target/cpu/ which has not been converted to use that > interface remains broken, and should be converted or deprecated. I failed to figure out what do you mean by lock+set+sync, in particular: - What is the use of the page lock here (Is this the lock of PageDesc?) - Is the "possible sync" means some kind of wait so that TLB_NOTDIRTY is definitely able to catch further "write to ram"? > Are you planning to work on this? No, sorry for that.. Neither do I see myself qualified enough to do this job, nor do I have enough time for it. But I did considered the following: Since "TLB check" and "fast path write to ram" are separate steps, it seems to me that CPU1 can always (in the extreme case) enter the fast path before CPU2 starts doing translation, and then write to already-translated code of CPU2 without informing it. Therefore maybe we can mark the RAM backing page in QEMU's page table as non-writable at an early stage in tb_gen_code() using the ability of the underlying OS, register a signal handler to intercept the first "write to ram" happened, restore the page to be writable, and eventually inform the translating thread to do something about it. (e.g. queue_work_on_cpu() and cpu_exit() the translating vCPU so that it has chance to invalidate the TB after possibly running that TB for several times) But all these sounds very intrusive to the existing code base, and I'm not sure whether it make sense... Thanks Liren Wei ^ permalink raw reply [flat|nested] 3+ messages in thread
[parent not found: <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com>]
* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? [not found] ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com> @ 2021-02-01 18:27 ` Richard Henderson 0 siblings, 0 replies; 3+ messages in thread From: Richard Henderson @ 2021-02-01 18:27 UTC (permalink / raw) To: Liren Wei, qemu-devel@nongnu.org On 2/1/21 6:59 AM, Liren Wei wrote: > On 2/1/21 7:01 AM, Richard Henderson wrote: >> Yes, this is a bug, because we are trying to support e.g. x86 which does not >> require an icache flush. > > That is too bad :( > > I know nothing about the modern hardware, it's really hard to imagine what > is done in CPU to maintain the coherence when this kind of cross-modifying > scenario happens. Indeed. Complicated cache shootdown stuff, no doubt. It also helps that the cpu is decoding one instruction at a time, whereas qemu is decoding many instructions. >> I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the >> setting, needs to happen before the bytes are read during translation. >> Otherwise we don't catch the case above, nor do we catch >> >> CPU1 CPU2 >> ------------------ -------------------------- >> TLB check -> fast >> tb_gen_code() -> all of it >> write to ram >> >> Also because of x86 (and other architectures in which a single instruction can >> span a page boundary), I think this lock+set+sync sequence needs to happen on >> demand in something called from the function set defined in >> include/exec/translator.h >> >> That also means that any target/cpu/ which has not been converted to use that >> interface remains broken, and should be converted or deprecated. > > I failed to figure out what do you mean by lock+set+sync, I used "lock+set+sync" as shorthand for the sequence I described in the first paragraph above. > in particular: > - What is the use of the page lock here (Is this the lock of PageDesc?) Yes, this would be the PageDesc lock. It would prevent TLB_NOTDIRTY from being removed by the slow-path write until any translation is complete. > - Is the "possible sync" means some kind of wait so that TLB_NOTDIRTY is > definitely able to catch further "write to ram"? That's what the sync is for -- to make sure that no other cpu can have seen a dirty page and not be finished with the write. > Therefore maybe we can mark the RAM backing page in QEMU's page table as > non-writable at an early stage in tb_gen_code() using the ability of the > underlying OS... It's very expensive to change page tables. It's also complicated to capture and re-start with signal handlers. We do that sort of thing for qemu-linux-user, where there *is* no slow path and we have no other option. I think it would be much easier to reason with locks. r~ ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-02-01 18:28 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <60169742.1c69fb81.90ae8.cdc6SMTPIN_ADDED_BROKEN@mx.google.com> 2021-01-31 23:01 ` [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? Richard Henderson 2021-02-01 16:59 ` Liren Wei [not found] ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com> 2021-02-01 18:27 ` Richard Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).