From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LM1Cq-0005kA-KY for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:24 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LM1Co-0005i8-N5 for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:23 -0500 Received: from [199.232.76.173] (port=42627 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LM1Co-0005hw-C5 for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:22 -0500 Received: from bart.se.axis.com ([195.60.68.10]:48906) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1LM1Cn-0003mu-RX for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:22 -0500 Received: from bart.se.axis.com (bart.se.axis.com [127.0.0.1]) by bart.se.axis.com (Postfix) with ESMTP id 1DDE46403F for ; Sun, 11 Jan 2009 15:22:21 +0100 (CET) Received: from axis.com (edgar.se.axis.com [10.93.151.1]) by bart.se.axis.com (Postfix) with ESMTP id 1065D64021 for ; Sun, 11 Jan 2009 15:22:21 +0100 (CET) Date: Sun, 11 Jan 2009 15:22:21 +0100 From: "Edgar E. Iglesias" Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi Message-ID: <20090111142221.GG26952@edgar.se.axis.com> References: <4969B880.10600@juno.dti.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4969B880.10600@juno.dti.ne.jp> Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Shin-ichiro KAWASAKI Cc: qemu-devel@nongnu.org On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote: > Current sh4's "movca.l" instruction is implemented in a same way > as "mov.l". Then, write to memory cannot be canceled by "ocbi" > cache control instrunction. This makes text area broken on > cache flush by linux kernel. > > This patch delays "movca.l" execution and provide the chance to > cancel it for "ocbi". > # Thank you Edgar, for your advice! > > This patch does, > - on executing "movca.l", just records what and where movca > should store data. > - on executing "ocbi", find the corresponding record by movca, > and delete it to cancel. > - lets TCG produce "delayed_movca" instruction at the first > instruction which is neither "movca.l" nor "ocbi". > - on executing "delayed_movca", does the data store task, > according to the record. > Hello, I think your patch will catch the linux dflush sequence but I've got a few comments. There is a complication with the delayed stores and potential mmu exceptions that they might generate. I think with this approach you'll take those faults at the next instruction which might not even be a load/store. Not sure if that can cause problems for sh. Maybe you could avoid that by making movca a load + store and the ocbi a store with the original value loaded at movca. But you'll still get into trouble with exceptions that might break the movca/ocbi sequence. I don't it's worth having a dynamically sized movca delay buffer. The movca insn has a restricted addressing mode making it a bit hard to use back-to-back without alu insns in between. IIU your code correctly you flush the buffer before every non movca/ocbi insn so I guess that for most realistic use-cases you'll only need very few (maybe only 1) entries in the delay buffer. > Signed-off-by: Shin-ichiro KAWASAKI > > Index: trunk/target-sh4/op_helper.c > =================================================================== > --- trunk/target-sh4/op_helper.c (revision 6133) > +++ trunk/target-sh4/op_helper.c (working copy) > @@ -20,6 +20,7 @@ > #include > #include "exec.h" > #include "helper.h" > +void *qemu_mallocz(size_t size); > > #ifndef CONFIG_USER_ONLY > > @@ -604,3 +605,57 @@ > d.ll = t0; > return float64_to_int32_round_to_zero(d.d, &env->fp_status); > } > + > +void helper_movca(uint32_t val, uint32_t addr) > +{ > + delayed_movca_t *cur = &env->movca_list; > + delayed_movca_t *prev = NULL; > + while (cur) { > + if (!cur->valid) { Might be overkill but you can merge here, i.e: if (!cur->valid || cur->addr == addr) { > + cur->valid = 1; > + cur->value = val; > + cur->addr = addr; > + return; > + } > + prev = cur; > + cur = cur->next; > + } > + > + /* movca entry shortage. allocate it. */ > + prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t)); > + if (cur == NULL) { > + printf("out of memory for delayed movca. @%08x\n", addr); > + return; > + } > + cur->valid = 1; > + cur->value = val; > + cur->addr = addr; > +} > + > +void helper_ocbi(uint32_t addr) > +{ > + delayed_movca_t *cur = &env->movca_list; > + > + while (cur) { > + if (cur->valid && cur->addr == addr) { /* found! */ > + cur->valid = 0; > + return; > + } > + cur = cur->next; > + } I think you can quite easy catch a few more movca use cases if you dont compare line offsets, i.e for sh4 you'd mask the addr & ~(32 - 1). Also, you should not return until you've scanned the entire movca delay buffer as there might be multiple stores to the same line that need to be ignored. example: addr &= ~31; while (cur) { if (cur->valid && (cur->addr & ~31) == addr) cur->valid = 0; cur = cur->next; } Thanks