From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1LNAkx-0002V3-IU for qemu-devel@nongnu.org; Wed, 14 Jan 2009 13:46:23 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1LNAkw-0002UC-JZ for qemu-devel@nongnu.org; Wed, 14 Jan 2009 13:46:22 -0500 Received: from [199.232.76.173] (port=50197 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1LNAkv-0002Ti-VT for qemu-devel@nongnu.org; Wed, 14 Jan 2009 13:46:22 -0500 Received: from bart.se.axis.com ([195.60.68.10]:43621) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1LNAkv-00026V-95 for qemu-devel@nongnu.org; Wed, 14 Jan 2009 13:46:21 -0500 Received: from bart.se.axis.com (bart.se.axis.com [127.0.0.1]) by bart.se.axis.com (Postfix) with ESMTP id E938D6409F for ; Wed, 14 Jan 2009 19:46:16 +0100 (CET) Received: from axis.com (edgar.se.axis.com [10.93.151.1]) by bart.se.axis.com (Postfix) with ESMTP id D78CD6408D for ; Wed, 14 Jan 2009 19:46:16 +0100 (CET) Date: Wed, 14 Jan 2009 19:46:16 +0100 From: "Edgar E. Iglesias" Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi Message-ID: <20090114184616.GA641@edgar.se.axis.com> References: <4969B880.10600@juno.dti.ne.jp> <20090111142221.GG26952@edgar.se.axis.com> <496AD162.1040808@juno.dti.ne.jp> <496BDE7E.4040002@juno.dti.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <496BDE7E.4040002@juno.dti.ne.jp> Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Shin-ichiro KAWASAKI Cc: "Edgar E. Iglesias" , qemu-devel@nongnu.org On Tue, Jan 13, 2009 at 09:21:18AM +0900, Shin-ichiro KAWASAKI wrote: > I've completely passed over Vladimir-san's nice patch on movca-ocbi, > even though it is already in qemu-sh staging repository. > > http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00605.html Hah funny, that patch looks very much like yours :) I'll try to review it too. Cheers > > But now, I can review the patch. I'll do it later. > > Regards, > Shin-ichiro KAWASAKI > > > Shin-ichiro KAWASAKI wrote: >> Thank you for your commmnts! >> Edgar E. Iglesias wrote: >>> On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote: >>>> Current sh4's "movca.l" instruction is implemented in a same way >>>> as "mov.l". Then, write to memory cannot be canceled by "ocbi" >>>> cache control instrunction. This makes text area broken on >>>> cache flush by linux kernel. >>>> >>>> This patch delays "movca.l" execution and provide the chance to cancel >>>> it for "ocbi". >>>> # Thank you Edgar, for your advice! >>>> >>>> This patch does, >>>> - on executing "movca.l", just records what and where movca should >>>> store data. >>>> - on executing "ocbi", find the corresponding record by movca, >>>> and delete it to cancel. >>>> - lets TCG produce "delayed_movca" instruction at the first >>>> instruction which is neither "movca.l" nor "ocbi". >>>> - on executing "delayed_movca", does the data store task, >>>> according to the record. >>> >>> Hello, >>> >>> I think your patch will catch the linux dflush sequence but I've got a >>> few comments. >>> >>> There is a complication with the delayed stores and potential mmu >>> exceptions that they might generate. I think with this approach >>> you'll take those faults at the next instruction which might not >>> even be a load/store. Not sure if that can cause problems for sh. >>> Maybe you could avoid that by making movca a load + store and the >>> ocbi a store with the original value loaded at movca. But you'll >>> still get into trouble with exceptions that might break the >>> movca/ocbi sequence. >> That's an idea. Current implementation is "delay and do-or-cancel" way. >> But your idea is "do and commit-or-rollback" way. >> I'll try to implement it to avoid mmu fault by movca. >> The mmu fault by movca will cause invoke exception handler, and break >> movca/ocbi sequence. It will be a quite rare case, however, I'll consider >> it for next version. >>> I don't it's worth having a dynamically sized movca delay buffer. >>> The movca insn has a restricted addressing mode making it a bit hard >>> to use back-to-back without alu insns in between. IIU your code >>> correctly you flush the buffer before every non movca/ocbi insn so I >>> guess that for most realistic use-cases you'll only need very few >>> (maybe only 1) entries in the delay buffer. >> That is a point I wavered. >> I investigated linux kernel and found following lines in >> "__flush_dcache_segment_4way()" in "arch/sh/mm/cache-sh4.c". >> a0 = base_addr; >> a1 = a0 + way_incr; >> a2 = a1 + way_incr; >> a3 = a2 + way_incr; >> a0e = base_addr + extent_per_way; >> do { >> asm volatile("ldc %0, sr" : : "r" (sr_with_bl)); >> asm volatile("movca.l r0, @%0\n\t" >> "movca.l r0, @%1\n\t" >> "movca.l r0, @%2\n\t" >> "movca.l r0, @%3\n\t" >> "ocbi @%0\n\t" >> "ocbi @%1\n\t" >> "ocbi @%2\n\t" >> "ocbi @%3\n\t" : : >> "r" (a0), "r" (a1), "r" (a2), "r" (a3)); >> Multiple movca.l insns executed consequently, and they access different >> lines. >> At least, 4 entries are needed for it. This code shows kernel >> implementation >> decides how many number of movcas are invoked consequently. >> That's the reason why I introudcued dynamic size change. >>>> Signed-off-by: Shin-ichiro KAWASAKI >>>> >>>> Index: trunk/target-sh4/op_helper.c >>>> =================================================================== >>>> --- trunk/target-sh4/op_helper.c (revision 6133) >>>> +++ trunk/target-sh4/op_helper.c (working copy) >>>> @@ -20,6 +20,7 @@ >>>> #include >>>> #include "exec.h" >>>> #include "helper.h" >>>> +void *qemu_mallocz(size_t size); >>>> #ifndef CONFIG_USER_ONLY >>>> @@ -604,3 +605,57 @@ >>>> d.ll = t0; >>>> return float64_to_int32_round_to_zero(d.d, &env->fp_status); >>>> } >>>> + >>>> +void helper_movca(uint32_t val, uint32_t addr) >>>> +{ >>>> + delayed_movca_t *cur = &env->movca_list; >>>> + delayed_movca_t *prev = NULL; >>>> + while (cur) { >>>> + if (!cur->valid) { >>> >>> Might be overkill but you can merge here, i.e: >>> >>> if (!cur->valid || cur->addr == addr) { >>> >>> >>>> + cur->valid = 1; >>>> + cur->value = val; >>>> + cur->addr = addr; >>>> + return; >>>> + } >>>> + prev = cur; >>>> + cur = cur->next; >>>> + } >>>> + >>>> + /* movca entry shortage. allocate it. */ >>>> + prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t)); >>>> + if (cur == NULL) { >>>> + printf("out of memory for delayed movca. @%08x\n", addr); >>>> + return; >>>> + } >>>> + cur->valid = 1; >>>> + cur->value = val; >>>> + cur->addr = addr; >>>> +} >>>> + >>>> +void helper_ocbi(uint32_t addr) >>>> +{ >>>> + delayed_movca_t *cur = &env->movca_list; >>>> + >>>> + while (cur) { >>>> + if (cur->valid && cur->addr == addr) { /* found! */ >>>> + cur->valid = 0; >>>> + return; >>>> + } >>>> + cur = cur->next; >>>> + } >>> >>> I think you can quite easy catch a few more movca use cases if you >>> dont compare line offsets, i.e for sh4 you'd mask the addr & ~(32 - 1). >>> Also, you should not return until you've scanned the entire movca delay >>> buffer as there might be multiple stores to the same line that need to be >>> ignored. >>> >>> example: >>> addr &= ~31; >>> while (cur) { >>> if (cur->valid && (cur->addr & ~31) == addr) >>> cur->valid = 0; >>> cur = cur->next; >>> } >> That's a good advice. I'll try to implement it for next version. >> Thanks again for your review! >> Regards, >> Shin-ichiro KAWASAKI > >