Message-ID: <49AD50E2.7000401@juno.dti.ne.jp>
Date: Wed, 04 Mar 2009 00:46:42 +0900
From: Shin-ichiro KAWASAKI
Subject: Re: [Qemu-devel] sh : performance problem
References: <49A6C317.1080202@juno.dti.ne.jp> <1236038327.4975.16.camel@coalu.atr>
In-Reply-To: <1236038327.4975.16.camel@coalu.atr>
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org

Lionel Landwerlin wrote:
> Shin-ichiro,
>
> Sorry, but I cannot apply your patch cleanly on the last qemu-svn.
>
> Instead, I would like to try another approach. The patch you proposed to
> find (or not) a valid TLB entry has a complexity of O(log2(n)) (or
> something like that if I remember) instead here is a patch with a
> complexity of O(1).

Good work.
I evaluated your patch in my environment, measuring the compile time of
an empty main() with gcc.

sh4 : 5.8 [seconds] O(n) utlb search.
sh4 : 4.6 [seconds] O(log2(n)) utlb search.
sh4 : 4.1 [seconds] O(1) utlb search by Lionel
arm : 0.8 [seconds] (-M versatilepb + Debian ARM)

Your patch has a nice score!

I have now also increased the number of utlb entries from 64 to 256, and
found that the score comes to around 2.4 seconds. I'm trying to increase
it to 4096. Your O(1) search will become more important as the number of
entries increases.

> +#if !defined(CONFIG_USER_ONLY)
> +    /* vpn to utlb entry caches (too much space for user emulation) */
> +    uint8_t utlbs_1k[4194304]; /* 2^22 => 4 Mb */
> +    uint8_t utlbs_4k[1048576]; /* 2^20 => 1 Mb */
> +    uint8_t utlbs_64k[65536]; /* 2^16 => 64 Kb */
> +    uint8_t utlbs_1m[4096]; /* 2^12 => 4 Kb */
> +#endif
>  } CPUSH4State;

Isn't this too extravagant? How about allocating them on demand?
I guess sh-linux uses only utlbs_4k[], in general.
If so, the 4 Mb utlbs_1k[] is a waste.

Regards,
Shin-ichiro KAWASAKI
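
For illustration, the on-demand allocation suggested above could look
roughly like the sketch below. It is not taken from the patch or from
QEMU: the names VpnCache, vpn_cache_set and vpn_cache_get are
hypothetical. It assumes a 32-bit virtual address, one byte per VPN slot,
and fewer than 255 utlb entries, so that the value 0 can stand for "no
entry". Each table is allocated only when a page of that size is first
mapped, so a guest that uses only 4k pages never pays for the 4 Mb 1k table.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch, not QEMU code. */
enum { PAGE_1K, PAGE_4K, PAGE_64K, PAGE_1M, PAGE_KINDS };

/* Number of VPN slots per page size for a 32-bit address space. */
static const size_t vpn_slots[PAGE_KINDS] = {
    (size_t)1 << 22, (size_t)1 << 20, (size_t)1 << 16, (size_t)1 << 12
};

/* Bits to shift a virtual address right to obtain its VPN. */
static const unsigned page_shift[PAGE_KINDS] = { 10, 12, 16, 20 };

typedef struct {
    uint8_t *table[PAGE_KINDS]; /* NULL until a page of that size is mapped */
} VpnCache;

/*
 * Record that utlb entry `entry` maps `vaddr`, allocating the table for
 * this page size on first use. The stored value is entry + 1, so a
 * zero-filled table means "nothing cached" (assumes entry < 255).
 * Returns 0 on success, -1 if the allocation fails.
 */
static int vpn_cache_set(VpnCache *c, int kind, uint32_t vaddr, uint8_t entry)
{
    if (c->table[kind] == NULL) {
        c->table[kind] = calloc(vpn_slots[kind], 1);
        if (c->table[kind] == NULL) {
            return -1;
        }
    }
    c->table[kind][vaddr >> page_shift[kind]] = (uint8_t)(entry + 1);
    return 0;
}

/* Return the cached utlb entry index for `vaddr`, or -1 if there is none. */
static int vpn_cache_get(const VpnCache *c, int kind, uint32_t vaddr)
{
    if (c->table[kind] == NULL) {
        return -1;
    }
    return (int)c->table[kind][vaddr >> page_shift[kind]] - 1;
}
```

The lookup stays O(1): one NULL check plus one array index. With 256 or
more utlb entries a uint8_t slot no longer leaves room for a "no entry"
value, so a wider slot type (for example uint16_t) would be needed at the
cost of larger tables.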