Message-ID: <496BDE7E.4040002@juno.dti.ne.jp>
Date: Tue, 13 Jan 2009 09:21:18 +0900
From: Shin-ichiro KAWASAKI
Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi
References: <4969B880.10600@juno.dti.ne.jp> <20090111142221.GG26952@edgar.se.axis.com> <496AD162.1040808@juno.dti.ne.jp>
In-Reply-To: <496AD162.1040808@juno.dti.ne.jp>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Shin-ichiro KAWASAKI
Cc: qemu-devel@nongnu.org, "Edgar E. Iglesias"

I completely overlooked Vladimir-san's nice patch on movca-ocbi, even
though it is already in the qemu-sh staging repository.

  http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00605.html

I can review that patch now, and will do so later.

Regards,
Shin-ichiro KAWASAKI

Shin-ichiro KAWASAKI wrote:
> Thank you for your comments!
>
> Edgar E. Iglesias wrote:
>> On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote:
>>> Currently sh4's "movca.l" instruction is implemented in the same way
>>> as "mov.l", so its write to memory cannot be canceled by the "ocbi"
>>> cache control instruction. This breaks the text area when the linux
>>> kernel flushes the cache.
>>>
>>> This patch delays "movca.l" execution and gives "ocbi" the chance
>>> to cancel it.
>>> # Thank you Edgar, for your advice!
>>>
>>> This patch does the following:
>>> - on executing "movca.l", it just records what data movca should
>>>   store and where.
>>> - on executing "ocbi", it finds the corresponding record made by
>>>   movca and deletes it to cancel the store.
>>> - it lets TCG produce a "delayed_movca" instruction at the first
>>>   instruction which is neither "movca.l" nor "ocbi".
>>> - on executing "delayed_movca", it performs the data store
>>>   according to the record.
>>
>> Hello,
>>
>> I think your patch will catch the linux dflush sequence but I've got a
>> few comments.
>>
>> There is a complication with the delayed stores and potential mmu
>> exceptions that they might generate. I think with this approach
>> you'll take those faults at the next instruction which might not
>> even be a load/store. Not sure if that can cause problems for sh.
>> Maybe you could avoid that by making movca a load + store and the
>> ocbi a store with the original value loaded at movca. But you'll
>> still get into trouble with exceptions that might break the
>> movca/ocbi sequence.
> That's an idea. The current implementation takes a "delay and
> do-or-cancel" approach, while your idea is a "do and
> commit-or-rollback" approach.
> I'll try to implement it to avoid the mmu fault by movca.
> An mmu fault by movca will invoke the exception handler and break
> the movca/ocbi sequence. It will be quite a rare case; however, I'll
> consider it for the next version.
>
>> I don't think it's worth having a dynamically sized movca delay buffer.
>> The movca insn has a restricted addressing mode making it a bit hard
>> to use back-to-back without alu insns in between. If I understand your
>> code correctly you flush the buffer before every non movca/ocbi insn
>> so I guess that for most realistic use-cases you'll only need very few
>> (maybe only 1) entries in the delay buffer.
> That is a point I wavered on.
> I investigated the linux kernel and found the following lines in
> "__flush_dcache_segment_4way()" in "arch/sh/mm/cache-sh4.c".
>
>         a0 = base_addr;
>         a1 = a0 + way_incr;
>         a2 = a1 + way_incr;
>         a3 = a2 + way_incr;
>         a0e = base_addr + extent_per_way;
>         do {
>                 asm volatile("ldc %0, sr" : : "r" (sr_with_bl));
>                 asm volatile("movca.l r0, @%0\n\t"
>                              "movca.l r0, @%1\n\t"
>                              "movca.l r0, @%2\n\t"
>                              "movca.l r0, @%3\n\t"
>                              "ocbi @%0\n\t"
>                              "ocbi @%1\n\t"
>                              "ocbi @%2\n\t"
>                              "ocbi @%3\n\t" : :
>                              "r" (a0), "r" (a1), "r" (a2), "r" (a3));
>
> Multiple movca.l insns are executed consecutively, and they access
> different lines.
> At least 4 entries are needed for it. This code shows that the kernel
> implementation decides how many movcas are invoked consecutively.
> That's the reason why I introduced the dynamic size change.
>
>>> Signed-off-by: Shin-ichiro KAWASAKI
>>>
>>> Index: trunk/target-sh4/op_helper.c
>>> ===================================================================
>>> --- trunk/target-sh4/op_helper.c    (revision 6133)
>>> +++ trunk/target-sh4/op_helper.c    (working copy)
>>> @@ -20,6 +20,7 @@
>>>  #include
>>>  #include "exec.h"
>>>  #include "helper.h"
>>> +void *qemu_mallocz(size_t size);
>>>
>>>  #ifndef CONFIG_USER_ONLY
>>>
>>> @@ -604,3 +605,57 @@
>>>      d.ll = t0;
>>>      return float64_to_int32_round_to_zero(d.d, &env->fp_status);
>>>  }
>>> +
>>> +void helper_movca(uint32_t val, uint32_t addr)
>>> +{
>>> +    delayed_movca_t *cur = &env->movca_list;
>>> +    delayed_movca_t *prev = NULL;
>>> +    while (cur) {
>>> +        if (!cur->valid) {
>>
>> Might be overkill but you can merge here, i.e:
>>
>>     if (!cur->valid || cur->addr == addr) {
>>
>>
>>> +            cur->valid = 1;
>>> +            cur->value = val;
>>> +            cur->addr = addr;
>>> +            return;
>>> +        }
>>> +        prev = cur;
>>> +        cur = cur->next;
>>> +    }
>>> +
>>> +    /* movca entry shortage. allocate it. */
>>> +    prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t));
>>> +    if (cur == NULL) {
>>> +        printf("out of memory for delayed movca. @%08x\n", addr);
>>> +        return;
>>> +    }
>>> +    cur->valid = 1;
>>> +    cur->value = val;
>>> +    cur->addr = addr;
>>> +}
>>> +
>>> +void helper_ocbi(uint32_t addr)
>>> +{
>>> +    delayed_movca_t *cur = &env->movca_list;
>>> +
>>> +    while (cur) {
>>> +        if (cur->valid && cur->addr == addr) { /* found! */
>>> +            cur->valid = 0;
>>> +            return;
>>> +        }
>>> +        cur = cur->next;
>>> +    }
>>
>> I think you can quite easily catch a few more movca use cases if you
>> don't compare line offsets, i.e. for sh4 you'd mask the addr & ~(32 - 1).
>> Also, you should not return until you've scanned the entire movca delay
>> buffer as there might be multiple stores to the same line that need to be
>> ignored.
>>
>> example:
>> addr &= ~31;
>> while (cur) {
>>     if (cur->valid && (cur->addr & ~31) == addr)
>>         cur->valid = 0;
>>     cur = cur->next;
>> }
>
> That's good advice. I'll try to implement it for the next version.
>
> Thanks again for your review!
>
> Regards,
> Shin-ichiro KAWASAKI
>
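
For reference, here is a minimal standalone sketch of how the two review
suggestions above might combine: a small fixed-size delay buffer, and an
ocbi that cancels every pending store in the same 32-byte line. It is not
the patch itself and does not use any QEMU API. The names movca_buf,
movca_record, movca_cancel_line and movca_flush, the slot count of 4
(taken from the 4-way kernel loop quoted above), and the flat word array
standing in for guest memory are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOVCA_SLOTS   4                /* assumption: enough for the 4-way flush loop */
#define SH4_LINE_MASK (~(uint32_t)31)  /* 32-byte cache line, as in addr & ~(32 - 1) */

typedef struct {
    int      valid;
    uint32_t addr;
    uint32_t value;
} movca_slot;

typedef struct {
    movca_slot slot[MOVCA_SLOTS];
} movca_buf;

/* movca.l: remember the store instead of doing it.
 * Returns 0 on success, -1 if every slot is busy (the caller would
 * then have to flush and retry). */
static int movca_record(movca_buf *b, uint32_t addr, uint32_t value)
{
    size_t i;

    assert(b != NULL);
    /* First pass: a pending store to the same address is overwritten. */
    for (i = 0; i < MOVCA_SLOTS; i++) {
        if (b->slot[i].valid && b->slot[i].addr == addr) {
            b->slot[i].value = value;
            return 0;
        }
    }
    /* Second pass: otherwise take the first free slot. */
    for (i = 0; i < MOVCA_SLOTS; i++) {
        if (!b->slot[i].valid) {
            b->slot[i].valid = 1;
            b->slot[i].addr = addr;
            b->slot[i].value = value;
            return 0;
        }
    }
    return -1;
}

/* ocbi: drop every pending store in the cache line containing addr.
 * Scans the whole buffer; returns how many stores were cancelled. */
static int movca_cancel_line(movca_buf *b, uint32_t addr)
{
    int cancelled = 0;
    size_t i;

    assert(b != NULL);
    addr &= SH4_LINE_MASK;
    for (i = 0; i < MOVCA_SLOTS; i++) {
        if (b->slot[i].valid && (b->slot[i].addr & SH4_LINE_MASK) == addr) {
            b->slot[i].valid = 0;
            cancelled++;
        }
    }
    return cancelled;
}

/* delayed_movca: perform the surviving stores and empty the buffer.
 * mem is a toy word-addressed memory (assumption); stores outside it
 * are silently dropped. Returns the number of stores performed. */
static int movca_flush(movca_buf *b, uint32_t *mem, size_t mem_words)
{
    int stored = 0;
    size_t i;

    assert(b != NULL && mem != NULL);
    for (i = 0; i < MOVCA_SLOTS; i++) {
        if (b->slot[i].valid) {
            size_t idx = b->slot[i].addr / 4;
            if (idx < mem_words) {
                mem[idx] = b->slot[i].value;
                stored++;
            }
            b->slot[i].valid = 0;
        }
    }
    return stored;
}
```

A real helper would still have to deal with the mmu fault timing
discussed above; this sketch only covers the bookkeeping.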