From mboxrd@z Thu Jan 1 00:00:00 1970
From: dave.martin@linaro.org (Dave Martin)
Date: Tue, 25 Sep 2012 13:55:19 +0100
Subject: [PATCH v3 RESEND 05/17] ARM: LPAE: support 64-bit virt_to_phys patching
In-Reply-To:
References: <1348242975-19184-1-git-send-email-cyril@ti.com>
 <1348242975-19184-6-git-send-email-cyril@ti.com>
 <20120924151305.GA14198@arm.com>
 <5060C9C6.8080900@ti.com>
 <20120924224021.GJ26454@n2100.arm.linux.org.uk>
Message-ID: <20120925125519.GA2330@linaro.org>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Mon, Sep 24, 2012 at 06:55:25PM -0400, Nicolas Pitre wrote:
> On Mon, 24 Sep 2012, Russell King - ARM Linux wrote:
>
> > On Mon, Sep 24, 2012 at 06:32:22PM -0400, Nicolas Pitre wrote:
> > > We don't want to limit lowmem. The lowmem is very precious memory and
> > > we want to maximize its size. In that case it is probably best to
> > > implement a real patchable 64-bit addition rather than artificially
> > > restricting the lowmem size.
> >
> > You don't need to. You can solve the V->P translation like this:
> >
> >	movw	%hi, #0xVWXY		@ fixup
> >	adds	%lo, %lo, #offset	@ fixup
> >	adc	%hi, %hi, #0
>
> That's exactly what I mean when I say a real 64-bit addition. Despite
> one of the arguments being a 32 bit value, the overflow needs to be
> carried to the high part.
>
> > which is probably the simplest way to do the fixup - keep the existing
> > fixup code, and add support for that movw instruction. And that will
> > work across any 4GB boundary just fine (we won't have more than 4GB of
> > virtual address space on a 32-bit CPU anyway, so we only have to worry
> > about one carry.)
>
> Exactly.
>
> > And the P->V translation is truncating anyway, so that is just:
> >
> >	sub	%lo, %lo, #offset
> >
> > and nothing more.
>
> Yes, that's what the posted code does already.

Taking a step back, this is all feeling very complex. Do we actually need
to solve the P2V patching problem for this platform?

For 32-bit platforms, especially pre-v6, small TLBs and caches, lack of
branch prediction etc. may make the performance boost from P2V patching
more significant, but I have some doubts about whether it is really worth
it for more heavyweight v7+LPAE platforms.

Would it be worth trying a simple macro implementation that reads a global
variable for this case, and investigating the performance impact?

For realistic workloads running on newer CPUs, I suspect that any costs
from this may disappear into the noise...

Cheers
---Dave
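
For illustration only, the global-variable approach suggested above might
look roughly like the sketch below. Every name in it (SKETCH_PAGE_OFFSET,
sketch_phys_offset, sketch_virt_to_phys, sketch_phys_to_virt) is
hypothetical and not taken from the kernel or from the posted patch, the
0xC0000000 base is an assumed 3G/1G split, and inline functions stand in
for the macros. It only shows the arithmetic: a 64-bit add for V->P, where
the carry into the high word is handled by the compiler, and a truncating
subtract for P->V.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: assumed virtual address where the lowmem mapping starts. */
#define SKETCH_PAGE_OFFSET 0xC0000000u

/*
 * Hypothetical global, assumed to be written once during early boot with
 * the physical address of the start of RAM. Reading it on every
 * translation replaces the boot-time instruction patching.
 */
static uint64_t sketch_phys_offset;

/* V->P: 32-bit lowmem virtual address in, 64-bit physical address out. */
static inline uint64_t sketch_virt_to_phys(uint32_t virt)
{
	assert(virt >= SKETCH_PAGE_OFFSET);	/* lowmem addresses only */

	/* The 64-bit add propagates any carry out of the low word. */
	return (uint64_t)(virt - SKETCH_PAGE_OFFSET) + sketch_phys_offset;
}

/* P->V: truncating, since the result must fit in 32 bits anyway. */
static inline uint32_t sketch_phys_to_virt(uint64_t phys)
{
	return (uint32_t)(phys - sketch_phys_offset) + SKETCH_PAGE_OFFSET;
}
```

Whether the extra load per translation is measurable on a v7+LPAE core is
exactly the open question; the sketch says nothing about that either way.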