All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@linux.intel.com>
To: linux-kernel@vger.kernel.org
Subject: [RFC] Optimising IS_ERR_OR_NULL
Date: Sat, 5 Apr 2014 10:43:37 -0400	[thread overview]
Message-ID: <20140405144337.GA5727@linux.intel.com> (raw)


(4 days too late for April Fools ... oh well :-)

I don't like the look of IS_ERR_OR_NULL.  It does two tests when (due to
the bit patterns used to represent errors and NULL pointers) it could
use just one:

#define IS_ERR_VALUE(x) unlikely((x) >= (unsigned long)-MAX_ERRNO)

static inline long __must_check IS_ERR_OR_NULL(__force const void *ptr)
{
	return !ptr || IS_ERR_VALUE((unsigned long)ptr);
}

It needs some performance testing to be sure that it's a win, but I'm
essentially suggesting this:

+++ b/include/linux/err.h
@@ -34,10 +34,8 @@ static inline long __must_check IS_ERR(__force const void *pt
        return IS_ERR_VALUE((unsigned long)ptr);
 }
 
-static inline long __must_check IS_ERR_OR_NULL(__force const void *ptr)
-{
-       return !ptr || IS_ERR_VALUE((unsigned long)ptr);
-}
+#define IS_ERR_OR_NULL(ptr) \
+       unlikely(((unsigned long)PTR_ERR(ptr) + MAX_ERRNO) <= MAX_ERRNO)
 
 /**
  * ERR_CAST - Explicitly cast an error-valued pointer to another pointer type

(deliberately whitespace-damaged to ensure it doesn't get applied
without thought).

Comments, before I go off and run some perf tests?

Here's how I got to this point:

Let's take kern_unmount() as our example caller (for no particular reason
... first one I came across.  It is blessedly short though):

void kern_unmount(struct vfsmount *mnt)
{
        /* release long term mount so mount point can be released */
        if (!IS_ERR_OR_NULL(mnt)) {
                real_mount(mnt)->mnt_ns = NULL;
                synchronize_rcu();      /* yecchhh... */
                mntput(mnt);
        }
}

Here's the assembly for it:

0000000000001180 <kern_unmount>:
    1180:       48 85 ff                test   %rdi,%rdi
    1183:       53                      push   %rbx
    1184:       48 89 fb                mov    %rdi,%rbx
    1187:       74 27                   je     11b0 <kern_unmount+0x30>
    1189:       48 81 ff 00 f0 ff ff    cmp    $0xfffffffffffff000,%rdi
    1190:       77 1e                   ja     11b0 <kern_unmount+0x30>
    1192:       48 c7 87 c0 00 00 00    movq   $0x0,0xc0(%rdi)
    1199:       00 00 00 00 
    119d:       e8 00 00 00 00          callq  11a2 <kern_unmount+0x22>
                        119e: R_X86_64_PC32     synchronize_sched-0x4
    11a2:       48 89 df                mov    %rbx,%rdi
    11a5:       5b                      pop    %rbx
    11a6:       e9 00 00 00 00          jmpq   11ab <kern_unmount+0x2b>
                        11a7: R_X86_64_PC32     mntput-0x4
    11ab:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
    11b0:       5b                      pop    %rbx
    11b1:       c3                      retq   
    11b2:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
    11b9:       1f 84 00 00 00 00 00 

So we do two tests and jumps; first at 1180 and then at 1189, in either
case we jump forward to the end.  Now here's what happens when we optimise
IS_ERR_OR_NULL to make use of the fact that 0 is actually contiguous
with the range of errors:

 static inline long __must_check IS_ERR_OR_NULL(__force const void *ptr)
 {
-       return !ptr || IS_ERR_VALUE((unsigned long)ptr);
+       return ((unsigned long)ptr + MAX_ERRNO) <= MAX_ERRNO;
 }

0000000000001180 <kern_unmount>:
    1180:       48 8d 87 ff 0f 00 00    lea    0xfff(%rdi),%rax
    1187:       48 3d ff 0f 00 00       cmp    $0xfff,%rax
    118d:       77 09                   ja     1198 <kern_unmount+0x18>
    118f:       f3 c3                   repz retq 
    1191:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    1198:       48 83 ec 08             sub    $0x8,%rsp
    119c:       48 c7 87 c0 00 00 00    movq   $0x0,0xc0(%rdi)
    11a3:       00 00 00 00 
    11a7:       48 89 3c 24             mov    %rdi,(%rsp)
    11ab:       e8 00 00 00 00          callq  11b0 <kern_unmount+0x30>
                        11ac: R_X86_64_PC32     synchronize_sched-0x4
    11b0:       48 8b 3c 24             mov    (%rsp),%rdi
    11b4:       48 83 c4 08             add    $0x8,%rsp
    11b8:       e9 00 00 00 00          jmpq   11bd <kern_unmount+0x3d>
                        11b9: R_X86_64_PC32     mntput-0x4
    11bd:       0f 1f 00                nopl   (%rax)

It's fewer instructions, and only one compare, which is nice.  But gcc
now thinks that IS_ERR_OR_NULL is the likely case and has an early return
embedded in the function (118f).  So let's quickly fix that up:

-       if (!IS_ERR_OR_NULL(mnt)) {
+       if (!unlikely(IS_ERR_OR_NULL(mnt))) {

0000000000001180 <kern_unmount>:
    1180:       48 8d 87 ff 0f 00 00    lea    0xfff(%rdi),%rax
    1187:       53                      push   %rbx
    1188:       48 89 fb                mov    %rdi,%rbx
    118b:       48 3d ff 0f 00 00       cmp    $0xfff,%rax
    1191:       76 19                   jbe    11ac <kern_unmount+0x2c>
    1193:       48 c7 87 c0 00 00 00    movq   $0x0,0xc0(%rdi)
    119a:       00 00 00 00 
    119e:       e8 00 00 00 00          callq  11a3 <kern_unmount+0x23>
                        119f: R_X86_64_PC32     synchronize_sched-0x4
    11a3:       48 89 df                mov    %rbx,%rdi
    11a6:       5b                      pop    %rbx
    11a7:       e9 00 00 00 00          jmpq   11ac <kern_unmount+0x2c>
                        11a8: R_X86_64_PC32     mntput-0x4
    11ac:       5b                      pop    %rbx
    11ad:       c3                      retq   
    11ae:       66 90                   xchg   %ax,%ax

Now that's a genuine improvement; we saved one instruction (lea, cmp,
jbe vs test, je, cmp, ja), and due to various alignment / padding / etc.
we ended up saving 4 bytes on the length of the function which turns
into 16 bytes due to function alignment.


             reply	other threads:[~2014-04-05 14:43 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-05 14:43 Matthew Wilcox [this message]
2014-04-05 15:48 ` [RFC] Optimising IS_ERR_OR_NULL Alexander Holler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140405144337.GA5727@linux.intel.com \
    --to=willy@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.