From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LM1Cq-0005kA-KY
	for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:24 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LM1Co-0005i8-N5
	for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:23 -0500
Received: from [199.232.76.173] (port=42627 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LM1Co-0005hw-C5
	for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:22 -0500
Received: from bart.se.axis.com ([195.60.68.10]:48906)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <edgar.iglesias@axis.com>) id 1LM1Cn-0003mu-RX
	for qemu-devel@nongnu.org; Sun, 11 Jan 2009 09:22:22 -0500
Received: from bart.se.axis.com (bart.se.axis.com [127.0.0.1])
	by bart.se.axis.com (Postfix) with ESMTP id 1DDE46403F
	for <qemu-devel@nongnu.org>; Sun, 11 Jan 2009 15:22:21 +0100 (CET)
Received: from axis.com (edgar.se.axis.com [10.93.151.1])
	by bart.se.axis.com (Postfix) with ESMTP id 1065D64021
	for <qemu-devel@nongnu.org>; Sun, 11 Jan 2009 15:22:21 +0100 (CET)
Date: Sun, 11 Jan 2009 15:22:21 +0100
From: "Edgar E. Iglesias" <edgar.iglesias@axis.com>
Subject: Re: [Qemu-devel] [PATCH 2/3] sh: movca.l cancel by ocbi
Message-ID: <20090111142221.GG26952@edgar.se.axis.com>
References: <4969B880.10600@juno.dti.ne.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4969B880.10600@juno.dti.ne.jp>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
Cc: qemu-devel@nongnu.org

On Sun, Jan 11, 2009 at 06:14:40PM +0900, Shin-ichiro KAWASAKI wrote:
> Current sh4's "movca.l" instruction is implemented in a same way
> as "mov.l". Then, write to memory cannot be canceled by "ocbi"
> cache control instrunction.  This makes text area broken on
> cache flush by linux kernel.
> 
> This patch delays "movca.l" execution and provide the chance to 
> cancel it for "ocbi".
> # Thank you Edgar, for your advice!
> 
> This patch does,
> - on executing "movca.l", just records what and where movca 
>   should store data.
> - on executing "ocbi", find the corresponding record by movca,
>   and delete it to cancel.
> - lets TCG produce "delayed_movca" instruction at the first
>   instruction which is neither "movca.l" nor "ocbi".
> - on executing "delayed_movca", does the data store task,
>   according to the record. 
> 

Hello,

I think your patch will catch the linux dflush sequence but I've got a
few comments.

There is a complication with the delayed stores and potential mmu
exceptions that they might generate. I think with this approach
you'll take those faults at the next instruction which might not
even be a load/store. Not sure if that can cause problems for sh.
Maybe you could avoid that by making movca a load + store and the
ocbi a store with the original value loaded at movca. But you'll
still get into trouble with exceptions that might break the
movca/ocbi sequence.

I don't it's worth having a dynamically sized movca delay buffer.
The movca insn has a restricted addressing mode making it a bit hard
to use back-to-back without alu insns in between. IIU your code
correctly you flush the buffer before every non movca/ocbi insn so I
guess that for most realistic use-cases you'll only need very few
(maybe only 1) entries in the delay buffer.

> Signed-off-by: Shin-ichiro KAWASAKI <kawasaki@juno.dti.ne.jp>
>
> Index: trunk/target-sh4/op_helper.c
> ===================================================================
> --- trunk/target-sh4/op_helper.c	(revision 6133)
> +++ trunk/target-sh4/op_helper.c	(working copy)
> @@ -20,6 +20,7 @@
>  #include <assert.h>
>  #include "exec.h"
>  #include "helper.h"
> +void *qemu_mallocz(size_t size);
>  
>  #ifndef CONFIG_USER_ONLY
>  
> @@ -604,3 +605,57 @@
>      d.ll = t0;
>      return float64_to_int32_round_to_zero(d.d, &env->fp_status);
>  }
> +
> +void helper_movca(uint32_t val, uint32_t addr)
> +{
> +    delayed_movca_t *cur = &env->movca_list;
> +    delayed_movca_t *prev = NULL;
> +    while (cur) {
> +	if (!cur->valid) {

Might be overkill but you can merge here, i.e:

	if (!cur->valid || cur->addr == addr) {


> +	    cur->valid = 1;
> +	    cur->value = val;
> +	    cur->addr = addr;
> +	    return;
> +	}
> +	prev = cur;
> +	cur = cur->next;
> +    }
> +
> +    /* movca entry shortage. allocate it. */
> +    prev->next = cur = qemu_mallocz(sizeof(delayed_movca_t));
> +    if (cur == NULL) {
> +	printf("out of memory for delayed movca. @%08x\n", addr);
> +	return;
> +    }
> +    cur->valid = 1;
> +    cur->value = val;
> +    cur->addr = addr;
> +}
> +
> +void helper_ocbi(uint32_t addr)
> +{
> +    delayed_movca_t *cur = &env->movca_list;
> +
> +    while (cur) {
> +	if (cur->valid && cur->addr == addr) { /* found! */
> +	    cur->valid = 0;
> +	    return;
> +	}
> +	cur = cur->next;
> +    }

I think you can quite easy catch a few more movca use cases if you
dont compare line offsets, i.e for sh4 you'd mask the addr & ~(32 - 1).
Also, you should not return until you've scanned the entire movca delay
buffer as there might be multiple stores to the same line that need to be
ignored.

example:
    addr &= ~31;
    while (cur) {
       if (cur->valid && (cur->addr & ~31) == addr)
           cur->valid = 0;
       cur = cur->next;
    }

Thanks