From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andi Kleen <andi@firstfloor.org>, "H. Peter Anvin" <hpa@zytor.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: Word-at-a-time dcache name accesses (was Re: .. anybody know of any filesystems that depend on the exact VFS 'namehash' implementation?)
Date: Sat, 3 Mar 2012 12:10:09 -0800
Message-ID: <CA+55aFyMn+gYh2uRPJSAS5n4UbQoY2iqe3peYjVosrP-73oQVA@mail.gmail.com>
In-Reply-To: <alpine.LFD.2.02.1203021544300.28247@i5.linux-foundation.org>
On Fri, Mar 2, 2012 at 3:46 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This *does* assume that "bsf" is a reasonably fast instruction, which is
> not necessarily the case especially on 32-bit x86. So the config option
> choice for this might want some tuning even on x86, but it would be lovely
> to get comments and have people test it out on older hardware.
Ok, so I was thinking about this. I can replace the "bsf" with a
multiply, and I just wonder which one is faster.
> +	/* Get the final path component length */
> +	len += __ffs(mask) >> 3;
> +
> +	/* The mask *below* the first high bit set */
> +	mask = (mask - 1) & ~mask;
> +	mask >>= 7;
> +	hash += a & mask;
So instead of the __ffs() on the original mask (to find the first byte
with the high bit set), I could use the "mask of bytes" and some math
to get the number of bytes set like this (so this goes at the end,
*after* we used the mask to mask off the bytes in 'a' - not where the
__ffs() is right now):
    /* Low bits set in each byte we used as a mask */
    mask &= ONEBYTES;

    /* Add up "mask + (mask<<8) + (mask<<16) +... ":
       same as a multiply */
    mask *= ONEBYTES;

    /* High byte now contains count of bits set */
    len += mask >> 8*(sizeof(unsigned long)-1);
which I find intriguing because it just continues with the whole
"bitmask tricks" thing and even happens to re-use one of the bitmasks
we already had.
On machines with slow bit scanning (and a good multiplier), that might
be faster.
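The two ways of getting the component length can be compared side by side. The following is a minimal sketch, not the patch itself: it assumes a 64-bit word, uses the GCC/Clang builtin `__builtin_ctzll()` as a stand-in for the kernel's `__ffs()`, and the names `len_by_bitscan()`, `len_by_multiply()`, `check_agreement()` and `HIGHBITS` are hypothetical. `mask` is assumed to have the high bit set in the first terminating byte, as in the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

#define ONEBYTES 0x0101010101010101ull
#define HIGHBITS 0x8080808080808080ull  /* hypothetical name, test input only */

/* Bit-scan version, as in the quoted hunk: the index of the lowest set
   bit divided by 8 is the number of bytes before the terminator. */
static unsigned len_by_bitscan(uint64_t mask)
{
    return (unsigned)__builtin_ctzll(mask) >> 3;
}

/* Multiply version: turn the mask into 0x01 per byte below the
   terminator, then sum those bytes into the top byte. */
static unsigned len_by_multiply(uint64_t mask)
{
    mask = (mask - 1) & ~mask;   /* all bits below the first high bit */
    mask >>= 7;                  /* 0xff in each byte below the terminator */
    mask &= ONEBYTES;            /* 0x01 in each of those bytes */
    mask *= ONEBYTES;            /* top byte = sum of all bytes */
    return (unsigned)(mask >> 8 * (sizeof(uint64_t) - 1));
}

/* Try every terminator position in a 64-bit word, with and without
   extra high bits set above it; returns the number of positions checked. */
static unsigned check_agreement(void)
{
    unsigned checked = 0;
    for (unsigned k = 0; k < sizeof(uint64_t); k++) {
        uint64_t first = 0x80ull << (8 * k);
        uint64_t above = HIGHBITS << (8 * k);
        assert(len_by_bitscan(first) == k);
        assert(len_by_multiply(first) == k);
        assert(len_by_multiply(above) == k);
        checked++;
    }
    return checked;
}
```

Calling `check_agreement()` from a small `main()` exercises all eight positions. It only shows that the two variants agree; which one is faster still has to be measured on the hardware in question.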
Sadly, it's a multiply with a big constant. Yes, we could make the
constant smaller by not counting the highest byte: it is never set, so
we could use "ONEBYTES>>8" and shift right by 8*(sizeof(unsigned
long)-2) instead, but it's still not as cheap as just doing adds and
masks.
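The smaller-constant variant can be sketched the same way. This assumes a 64-bit word and a hypothetical helper name `count_bytes_small_constant()`; the input is the "mask of bytes" after `mask &= ONEBYTES`, so at most the low seven bytes hold 0x01. One detail in this sketch is not spelled out in the text: with the shorter shift, the byte above the count still holds a partial sum, so the sketch masks the result with 0xff. That extra mask is an assumption of this example, not something stated in the mail.

```c
#include <stdint.h>

#define ONEBYTES 0x0101010101010101ull

/* bytemask: 0x01 in each of the low k bytes (k <= 7), zero elsewhere. */
static unsigned count_bytes_small_constant(uint64_t bytemask)
{
    /* The constant has seven 0x01 bytes instead of eight. */
    uint64_t sum = bytemask * (ONEBYTES >> 8);

    /* The count lands in the second-highest byte; the highest byte
       holds a partial sum, so keep only the low 8 bits after shifting. */
    return (unsigned)(sum >> 8 * (sizeof(uint64_t) - 2)) & 0xff;
}
```

For example, a byte mask of `0x010101` (three bytes set) gives 3, and `0x0001010101010101` (seven bytes set) gives 7.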
I can't come up with anything really cheap to calculate "number of
bytes set". But the above may be cheaper than the bsf on some older
32-bit machines that have horrible bit scanning performance (eg Atom
or P4). An integer multiply tends to be around four cycles, the bsf
performance is all over the map (2-17 cycles latency).
Linus