From: "Chen, Kenneth W"
Date: Thu, 10 Nov 2005 23:30:39 +0000
Subject: RE: [Patch 1/1] 4-level page tables v4.
Message-Id: <200511102330.jAANUdg21565@unix-os.sc.intel.com>
References: <20051110161915.GA3630@lnx-holt.americas.sgi.com>
In-Reply-To: <20051110161915.GA3630@lnx-holt.americas.sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

Robin Holt wrote on Thursday, November 10, 2005 2:39 PM
> On Thu, Nov 10, 2005 at 01:49:26PM -0800, Luck, Tony wrote:
> > Compiling with three levels, I see some differences in the scheduling
> > of instructions in the vhpt_miss handler and the nested_dtlb miss
> > handler. Side-by-side diff of a disassembly included below (original
> > sequence is on the left, new sequence is on the right). For the vhpt
> > case the new handler is 3 instructions shorter ... but shorter isn't
> > always better.
>
> I used the objdump that Jack Steiner pointed me towards to optimize the
> vhpt_miss handler and then test. This instruction order gave the best
> performance, but we are talking extremely small differences.
>
> Is the goal to make these identical? If so, it should be easy to do,
> but I was not aware that was the intent.

I was wondering earlier, too, why you changed all the register usage, etc.
You really don't need to make that big a change, since the resource
contention is around dep/cmp. The cmp instruction is an ALU type and can
be scheduled on all 6 integer units. The easiest way is to just re-order
these two instructions.

There is one change you made around tbit/dep on line 163 (dep
r23=0,r20,0,PAGE_SHIFT), but that is outside the 4-level page table walk.
And again, the easiest thing to do is to pull that instruction 2 bundles
earlier.

- Ken
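A rough way to see why cmp is not the constrained resource while dep is: the toy Python model below only counts integer issue ports. It assumes Itanium 2 port counts (2 I-ports, 4 M-ports, at most 6 instructions, i.e. two bundles, per cycle), that ALU ops such as cmp may use either port kind, and that I-type ops such as dep and tbit need an I-port. The function and constant names are hypothetical, and the model ignores bundle templates, stops and data dependencies, so it is only a sketch of the port-pressure argument, not a scheduler.

```python
# Toy port-pressure model (assumption: Itanium 2 with 2 I-ports and 4 M-ports,
# up to 6 instructions issued per cycle). Hypothetical names; bundle templates,
# stop bits and data dependencies are deliberately ignored.
I_PORTS = 2
M_PORTS = 4
MAX_ISSUE = 6  # two bundles of three instructions


def fits_one_cycle(ops):
    """Return True if every op can get an integer port in the same cycle.

    ops is a list of unit classes:
      "I" - needs an I-port (e.g. dep, tbit)
      "M" - needs an M-port (e.g. a load)
      "A" - ALU op that may use any I- or M-port (e.g. cmp)
    """
    if len(ops) > MAX_ISSUE:
        return False
    i_only = ops.count("I")
    m_only = ops.count("M")
    a_any = ops.count("A")
    if i_only > I_PORTS or m_only > M_PORTS:
        return False
    # ALU ops fill whatever I- and M-ports are left over.
    free_ports = (I_PORTS - i_only) + (M_PORTS - m_only)
    return a_any <= free_ports


# Two I-type ops plus four cmp fit: the cmps spill onto the M-ports.
print(fits_one_cycle(["I", "I", "A", "A", "A", "A"]))
# Three I-type ops (say dep, dep, tbit) do not: only two I-ports exist.
print(fits_one_cycle(["I", "I", "I"]))
```

Under these assumptions, moving a cmp never helps on its own, because it can land on any free integer port; what matters is not putting more than two I-only instructions such as dep in the same cycle, which is why re-ordering dep relative to cmp (rather than rewriting register usage) is the cheap fix.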