From mboxrd@z Thu Jan 1 00:00:00 1970
From: "H. Peter Anvin"
Subject: Re: [RFC] speeding up the stat() family of system calls...
Date: Thu, 26 Dec 2013 16:45:56 -0800
Message-ID: <52BCCDC4.1090409@zytor.com>
References: <20131224204625.GB20471@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Ingo Molnar, Thomas Gleixner, Al Viro, the arch/x86 maintainers, linux-fsdevel, Linux Kernel Mailing List
To: Linus Torvalds, Ingo Molnar
Return-path:
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On 12/26/2013 11:00 AM, Linus Torvalds wrote:
>
> Interestingly, looking at the cp_new_stat() profiles, the games we
> play to get efficient range checking seem to actually hurt us. Maybe
> it's the "sbb" that is just expensive, or maybe it's turning a (very
> predictable) conditional branch into a data dependency chain instead.
> Or maybe it's just random noise in my profiles that happened to make
> those sbb's look bad.
>

I'm not at all surprised... there is a pretty serious data dependency
chain here and in the end we end up manifesting a value in a register
that has to be tested even though it is available in the flags. Inline
assembly also means the compiler can't optimize it at all.

I have to wonder if we actually have to test the upper limit, though: we
can always guarantee a guard zone between user space and kernel space,
and thus guarantee either a #PF or #GP if someone tries to overflow user
space. Testing just the lower limit would be much cheaper, especially
on 64 bits where we can simply test the sign bit.

What do you think?

	-hpa
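
For illustration only, the two kinds of check discussed above can be
sketched in plain user-space C. This is not the kernel's actual range
check: the names range_ok_full() and range_ok_sign() and the USER_LIMIT
value are hypothetical, chosen just to show the difference between a
check that computes and tests the end of the range and one that only
tests the sign bit of the start address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical top of user space; an assumed value, for illustration only. */
#define USER_LIMIT 0x00007ffffffff000ULL

/*
 * Full range check: compute the end of the range, reject it if the
 * addition wrapped around or if the end lies above the limit.
 * Written as ordinary C so the compiler is free to pick branches
 * or flag-based code itself.
 */
static bool range_ok_full(uint64_t addr, uint64_t size)
{
	uint64_t end = addr + size;

	if (end < addr)		/* addition carried out of 64 bits */
		return false;
	return end <= USER_LIMIT;
}

/*
 * Sign-bit-only check: accept any start address whose top bit is
 * clear. This deliberately ignores the size; it assumes that an
 * access running off the end of user space hits a guard zone and
 * faults instead of reaching kernel memory.
 */
static bool range_ok_sign(uint64_t addr)
{
	return (int64_t)addr >= 0;
}
```

Note that the sign-bit variant accepts some ranges the full check
rejects (for example a start address exactly at USER_LIMIT); under the
guard zone assumption those accesses would be caught by the fault
rather than by the check.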