Date: Wed, 10 Oct 2012 23:09:12 +0900
From: Yeongkyoon Lee
To: Aurelien Jarno
Cc: Paolo Bonzini, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
Message-id: <50758188.3000904@samsung.com>
In-reply-to: <50754F3A.5030602@samsung.com>
References: <1349786252-12343-1-git-send-email-yeongkyoon.lee@samsung.com>
 <20121009142610.GA14078@ohm.aurel32.net>
 <20121009161956.GG14078@ohm.aurel32.net>
 <5074571E.60700@redhat.com>
 <20121009170923.GF9643@ohm.aurel32.net>
 <5074F6E0.1090206@samsung.com>
 <20121010064517.GJ9643@ohm.aurel32.net>
 <50754F3A.5030602@samsung.com>

On 2012-10-10 19:34, Yeongkyoon Lee wrote:
> On 2012-10-10 15:45, Aurelien Jarno wrote:
>> On Wed, Oct 10, 2012 at 01:17:36PM +0900, Yeongkyoon Lee wrote:
>>> On 2012-10-10 02:09, Aurelien Jarno wrote:
>>>> On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
>>>>> On 09/10/2012 18:19, Aurelien Jarno wrote:
>>>>>>>> Instead of calling the MMU helper with an additional argument (7), and
>>>>>>>> then jump back (8) to the next code (4), what about pushing the address
>>>>>>>> of the next code (4) on the stack and use a jmp instead of the call. In
>>>>>>>> that case you don't need the extra argument to the helpers.
>>>>>>>>
>>>>>> Maybe it wasn't very clear. This is based on the fact that call is
>>>>>> basically push %rip + jmp. Therefore we can fake the return address by
>>>>>> putting the value we want, here the address of the next code. This mean
>>>>>> that we don't need to pass the extra argument to the helper for the
>>>>>> return address, as GET_PC() would work correctly (it basically reads the
>>>>>> return address on the stack).
>>>>>>
>>>>>> For other architectures, it might not be a push, but rather a move to
>>>>>> link register, basically put the return address where the calling
>>>>>> convention asks for.
>>>>>>
>>>>>> OTOH I just realized it only works if the end of the slow path (moving
>>>>>> the value from the return address to the correct register). It might be
>>>>>> something doable.
>>>>> Branch predictors will not oldschool tricks like this one. :)
>>>>>
>>>> Given it is only used in the slow path (ie the exception more than the
>>>> rule), branch prediction isn't that important there.
>>>>
>>> I had already considered the approach of using jmp and removing
>>> extra argument for helper call.
>>> However, the problem is that the helper needs the gen code addr used
>>> by tb_find_pc() and cpu_restore_state().
>>> That means the code addr in the helper can be actually said the addr
>>> corresponding to QEMU_ld/st IR rather than the return addr.
>>> In my LDST optimization, the helper call site is not in the code of
>>> IR but in the end of TB.
>> GETPC() uses the return address to determine the call place, and as long
>> as the code at the end of the TB set a return address corresponding to
>> the one of the fast path instructions, tb_find_pc() will be able to find
>> the correct instruction.
>>
>> That implies that at least one instruction at the end of the generated
>> code is shared between the slow path and the fast path, but in the other
>> hand it avoids having to different kind of mmu helpers.
>>
>
> How about nop instruction at the end of fast path as return address of
> helper?
> That means the change of "call helper" to "push addr of nop" and "jmp
> helper".
> Although I need to check the feasibility, it is expected to avoid
> helper fragmentation and to make performance degradation to be minimum.
>

I ran some tests to measure the performance impact of inserting a nop
instruction into the qemu_ld/st fast path.
The results are fine: I did not find any notable performance degradation.
I'll send a new version of the patch soon, without the change to the MMU
helpers' description.
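
To make the return-address point above concrete, here is a toy C model of
the lookup. It is only a sketch under stated assumptions, not QEMU code:
ToyTB, toy_example_tb(), toy_find_insn(), toy_in_slow_path() and all host
offsets are hypothetical names and numbers invented for illustration. The
lookup stands in for a tb_find_pc()-style search that maps a host return
address back to a guest instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one translation block (TB). Guest instruction i
 * owns the host code starting at host_start[i]; the out-of-line slow
 * paths of all loads/stores are emitted at the end of the TB. */
typedef struct {
    const uintptr_t *host_start; /* ascending host offsets, one per guest insn */
    size_t n_insns;              /* number of guest instructions in the TB */
    uintptr_t slow_path_start;   /* host offset where the slow paths begin */
} ToyTB;

/* Example TB (made-up numbers): guest insns at host offsets 0, 16 and 40,
 * slow paths from offset 64. Guest insn 1 is the load, and its fast path
 * ends with a nop at offset 39. */
ToyTB toy_example_tb(void)
{
    static const uintptr_t starts[] = { 0, 16, 40 };
    ToyTB tb = { starts, sizeof starts / sizeof starts[0], 64 };
    return tb;
}

/* Returns nonzero if the host address lies in the out-of-line slow paths. */
int toy_in_slow_path(const ToyTB *tb, uintptr_t addr)
{
    return addr >= tb->slow_path_start;
}

/* Toy lookup: index of the last guest instruction whose host code starts
 * before ret_addr, or -1 if there is none. */
int toy_find_insn(const ToyTB *tb, uintptr_t ret_addr)
{
    assert(tb != NULL && tb->n_insns > 0);
    int found = -1;
    for (size_t i = 0; i < tb->n_insns; i++) {
        if (tb->host_start[i] < ret_addr) {
            found = (int)i;
        }
    }
    return found;
}
```

In this model, a real call issued from the slow path leaves a return
address such as 75, which lies at the end of the TB and resolves to guest
instruction 2 instead of the load. Pushing the address of the nop (39) and
jumping to the helper makes the same lookup resolve to guest instruction 1,
so the helper would not need the extra argument.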