On Wednesday 14 January 2009 23:45:11 Edgar E. Iglesias wrote: > On Thu, Dec 11, 2008 at 09:34:30PM +0300, Vladimir Prus wrote: > > > > This patch fixes the emulation of movca.l and ocbi > > instructions. movca.l is documented to allocate a cache > > like, and write into it. ocbi invalidates a cache line. > > So, given code like: > > > > asm volatile("movca.l r0, @%0\n\t" > > "movca.l r0, @%1\n\t" > > "ocbi @%0\n\t" > > "ocbi @%1" : : > > "r" (a0), "r" (a1)); > > > > Nothing is actually written to memory. Code like this can be > > found in arch/sh/mm/cache-sh4.c and is used to flush the cache. > > > > Current QEMU implements movca.l the same way as ordinary move, > > and the code above just corrupts memory. Doing full cache emulation > > is out of question, so this patch implements a hack. Stores > > done by movca.l are delayed. If we execute an instructions that is > > neither movca.l nor ocbi, we flush all pending stores. If we execute > > ocbi, we look at pending stores and delete a store to the invalidated > > address. This appears to work fairly well in practice. > > > > - Volodya > > Hello and thanks for the patch. > > If you wonder why the late sudden interest in this, see the following > thread :) > http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00460.html Hi Edgar, apologies for belated reply. > I've got a few comments on your patch. Lets start with the general ones. > > It would be good if the patch handled when the data cache gets disabled or > put into writethrough mode. There are also memory areas that are not > cache:able (TLB feature). Supporting the latter can get messy for very > little win, IMO not needed. I am afraid I might not be the best person to accurately catch all cases where cache might be enabled or disabled. KAWASAKI-san has posted a function that can be used to check if caching is enabled, in: http://article.gmane.org/gmane.comp.emulators.qemu/36348 Does that one address your suggestion? If so, I can incorporate that function, and checking for it. > In the other thread discussing this issue, I raised conerns about the > delayed stores beeing issued on insns where they normaly wouldn't. Let me > give you an example of the kind of thing I'm thinking of. > > movca > --- <-- Page boundary, broken TB. > mov <-- I TLB happens to miss. SH jumps to TLB miss handler. > xxx <- I TLB refill handlers first insn issues delayed store > and the delayed store happens to fault! > > That scenario could cause a D-TLB fault inside a I-TLB refill handler. Some > archs don't handle that. If it's a real problem for sh (which I beleive it > is) you could turn the stake and make the movca into a load + store, saving > the overwritten memory value. ocbi could then just bring back the old state. > I beleive that approach would not break the exception rules of sh. I think your approach will indeed be more reliable, so I've reworked my patch accordingly. > > +void helper_do_stores(void) > > +{ > > + store_request_t *current = env->store_requests; > > + > > + while(current) > > + { > > + uint32_t a = current->address, v = current->value; > > + store_request_t *next = current->next; > > + free (current); > > + env->store_requests = current = next; > > + if (current == 0) > > + env->store_request_tail = &(env->store_requests); > > + > > + stl_data(a, v); > > Shouldn't you remove the delayed store record _after_ stl() returns? > Otherwise you might loose delayed stores in case of TLB exceptions or? I recall that I have explicitly used this ordering, but I no longer remember exactly why. Anyway, this is not an issue after patch is revised as you suggested above. > > > > > + } > > +} > > + > > +void helper_ocbi(uint32_t address) > > +{ > > + store_request_t **current = &(env->store_requests); > > + while (*current) > > + { > > + if ((*current)->address == address) > > + { > > > Nitpick: > It may not be very significant but you can in an easy way catch a few more > movca+ocbi use cases by not comparing line offsets. I.e, just mask off > the last five bits from the compared addresses to zap all recorded movca's > made to the same cache-line ocbi is invalidating. I did so. > > static void _decode_opc(DisasContext * ctx) > > { > > + /* This code tries to make movcal emulation sufficiently > > + accurate for Linux purposes. This instruction writes > > + memory, and prior to that, always allocates a cache line. > > + It is used in two contexts: > > + - in memcpy, where data is copied in blocks, the first write > > + of to a block uses movca.l. I presume this is because writing > > + all data into cache, and then having the data sent into memory > > + later, via store buffer, is faster than, in case of write-through > > + cache configuration, to wait for memory write on each store. > > I dont think this is the reason for using movca. In fact, according to the > documentation I've got, movca behaves like a normal mov for caches in > writethrough mode. I beleive the reason is to avoid the line fetch in case > the store from a normal mov misses the cache in writeback mode. Anyway, > there's no reason to speculate, IMO we can just remove the comment on why > and just leave the note about it beeing used for fast block copies. OK. I am attaching a revised patch -- what do you think? - Volodya