From: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
To: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
Cc: qemu-devel@nongnu.org, "Edgar E. Iglesias" <edgar.iglesias@axis.com>
Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi
Date: Tue, 13 Jan 2009 09:21:18 +0900 [thread overview]
Message-ID: <496BDE7E.4040002@juno.dti.ne.jp> (raw)
In-Reply-To: <496AD162.1040808@juno.dti.ne.jp>
I've completely passed over Vladimir-san's nice patch on movca-ocbi,
even though it is already in qemu-sh staging repository.
http://lists.gnu.org/archive/html/qemu-devel/2008-12/msg00605.html
But now, I can review the patch. I'll do it later.
Regards,
Shin-ichiro KAWASAKI
Shin-ichiro KAWASAKI wrote:
> Thank you for your commmnts!
>
> Edgar E. Iglesias wrote:
>> On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote:
>>> Current sh4's "movca.l" instruction is implemented in a same way
>>> as "mov.l". Then, write to memory cannot be canceled by "ocbi"
>>> cache control instrunction. This makes text area broken on
>>> cache flush by linux kernel.
>>>
>>> This patch delays "movca.l" execution and provide the chance to
>>> cancel it for "ocbi".
>>> # Thank you Edgar, for your advice!
>>>
>>> This patch does,
>>> - on executing "movca.l", just records what and where movca should
>>> store data.
>>> - on executing "ocbi", find the corresponding record by movca,
>>> and delete it to cancel.
>>> - lets TCG produce "delayed_movca" instruction at the first
>>> instruction which is neither "movca.l" nor "ocbi".
>>> - on executing "delayed_movca", does the data store task,
>>> according to the record.
>>
>> Hello,
>>
>> I think your patch will catch the linux dflush sequence but I've got a
>> few comments.
>>
>> There is a complication with the delayed stores and potential mmu
>> exceptions that they might generate. I think with this approach
>> you'll take those faults at the next instruction which might not
>> even be a load/store. Not sure if that can cause problems for sh.
>> Maybe you could avoid that by making movca a load + store and the
>> ocbi a store with the original value loaded at movca. But you'll
>> still get into trouble with exceptions that might break the
>> movca/ocbi sequence.
> That's an idea. Current implementation is "delay and do-or-cancel" way.
> But your idea is "do and commit-or-rollback" way.
> I'll try to implement it to avoid mmu fault by movca.
> The mmu fault by movca will cause invoke exception handler, and break
> movca/ocbi sequence. It will be a quite rare case, however, I'll consider
> it for next version.
>
>> I don't it's worth having a dynamically sized movca delay buffer.
>> The movca insn has a restricted addressing mode making it a bit hard
>> to use back-to-back without alu insns in between. IIU your code
>> correctly you flush the buffer before every non movca/ocbi insn so I
>> guess that for most realistic use-cases you'll only need very few
>> (maybe only 1) entries in the delay buffer.
> That is a point I wavered.
> I investigated linux kernel and found following lines in
> "__flush_dcache_segment_4way()" in "arch/sh/mm/cache-sh4.c".
>
> a0 = base_addr;
> a1 = a0 + way_incr;
> a2 = a1 + way_incr;
> a3 = a2 + way_incr;
> a0e = base_addr + extent_per_way;
> do {
> asm volatile("ldc %0, sr" : : "r" (sr_with_bl));
> asm volatile("movca.l r0, @%0\n\t"
> "movca.l r0, @%1\n\t"
> "movca.l r0, @%2\n\t"
> "movca.l r0, @%3\n\t"
> "ocbi @%0\n\t"
> "ocbi @%1\n\t"
> "ocbi @%2\n\t"
> "ocbi @%3\n\t" : :
> "r" (a0), "r" (a1), "r" (a2), "r" (a3));
>
> Multiple movca.l insns executed consequently, and they access different
> lines.
> At least, 4 entries are needed for it. This code shows kernel
> implementation
> decides how many number of movcas are invoked consequently.
> That's the reason why I introudcued dynamic size change.
>
>>> Signed-off-by: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
>>>
>>> Index: trunk/target-sh4/op_helper.c
>>> ===================================================================
>>> --- trunk/target-sh4/op_helper.c (revision 6133)
>>> +++ trunk/target-sh4/op_helper.c (working copy)
>>> @@ -20,6 +20,7 @@
>>> #include <assert.h>
>>> #include "exec.h"
>>> #include "helper.h"
>>> +void *qemu_mallocz(size_t size);
>>>
>>> #ifndef CONFIG_USER_ONLY
>>>
>>> @@ -604,3 +605,57 @@
>>> d.ll = t0;
>>> return float64_to_int32_round_to_zero(d.d, &env->fp_status);
>>> }
>>> +
>>> +void helper_movca(uint32_t val, uint32_t addr)
>>> +{
>>> + delayed_movca_t *cur = &env->movca_list;
>>> + delayed_movca_t *prev = NULL;
>>> + while (cur) {
>>> + if (!cur->valid) {
>>
>> Might be overkill but you can merge here, i.e:
>>
>> if (!cur->valid || cur->addr == addr) {
>>
>>
>>> + cur->valid = 1;
>>> + cur->value = val;
>>> + cur->addr = addr;
>>> + return;
>>> + }
>>> + prev = cur;
>>> + cur = cur->next;
>>> + }
>>> +
>>> + /* movca entry shortage. allocate it. */
>>> + prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t));
>>> + if (cur == NULL) {
>>> + printf("out of memory for delayed movca. @%08x\n", addr);
>>> + return;
>>> + }
>>> + cur->valid = 1;
>>> + cur->value = val;
>>> + cur->addr = addr;
>>> +}
>>> +
>>> +void helper_ocbi(uint32_t addr)
>>> +{
>>> + delayed_movca_t *cur = &env->movca_list;
>>> +
>>> + while (cur) {
>>> + if (cur->valid && cur->addr == addr) { /* found! */
>>> + cur->valid = 0;
>>> + return;
>>> + }
>>> + cur = cur->next;
>>> + }
>>
>> I think you can quite easy catch a few more movca use cases if you
>> dont compare line offsets, i.e for sh4 you'd mask the addr & ~(32 - 1).
>> Also, you should not return until you've scanned the entire movca delay
>> buffer as there might be multiple stores to the same line that need to be
>> ignored.
>>
>> example:
>> addr &= ~31;
>> while (cur) {
>> if (cur->valid && (cur->addr & ~31) == addr)
>> cur->valid = 0;
>> cur = cur->next;
>> }
>
> That's a good advice. I'll try to implement it for next version.
>
> Thanks again for your review!
>
> Regards,
> Shin-ichiro KAWASAKI
>
next prev parent reply other threads:[~2009-01-13 0:21 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-11 9:14 [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi Shin-ichiro KAWASAKI
2009-01-11 9:16 ` [Qemu-devel] [PATCH 3/3] sh: SCI improvements Shin-ichiro KAWASAKI
2009-01-11 15:30 ` Jean-Christophe PLAGNIOL-VILLARD
2009-01-11 17:02 ` [Qemu-devel] [PATCH] sh/serial: allow cpu regs definition Jean-Christophe PLAGNIOL-VILLARD
2009-01-12 6:42 ` [Qemu-devel] [PATCH 3/3] sh: SCI improvements Jean-Christophe PLAGNIOL-VILLARD
2009-01-11 14:22 ` [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi Edgar E. Iglesias
2009-01-12 5:13 ` Shin-ichiro KAWASAKI
2009-01-13 0:21 ` Shin-ichiro KAWASAKI [this message]
2009-01-14 18:46 ` Edgar E. Iglesias
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=496BDE7E.4040002@juno.dti.ne.jp \
--to=kawasaki@juno.dti.ne.jp \
--cc=edgar.iglesias@axis.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.