From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Wang, Yalin" <Yalin.Wang@sonymobile.com>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Gao, Neil" <Neil.Gao@sonymobile.com>
Subject: Re: [RFC V2] test_bit before clear files_struct bits
Date: Wed, 11 Feb 2015 00:46:01 +0200
Message-ID: <20150210224601.GA6170@node.dhcp.inet.fi>
In-Reply-To: <CA+55aFzLgPZoLKRK5rPk8hpCS=Y8CNh59K_tzEZEVKpt1VyBWg@mail.gmail.com>
On Tue, Feb 10, 2015 at 12:49:46PM -0800, Linus Torvalds wrote:
> On Tue, Feb 10, 2015 at 12:22 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > The patch is good but I'm still wondering if any CPUs can do this
> > speedup for us. The CPU has to pull in the target word to modify the
> > bit and what it *could* do is to avoid dirtying the cacheline if it
> > sees that the bit is already in the desired state.
>
> Sadly, no CPU I know of actually does this. Probably because it would
> take a bit more core resources, and conditional writes to memory are
> not normally part of an x86 core (it might be more natural for
> something like old-style ARM that has conditional writes).
>
> Also, even if the store were to be conditional, the cacheline would
> have been acquired in exclusive state, and in many cache protocols the
> state machine is from exclusive to dirty (since normally the only
> reason to get a cacheline for exclusive use is in order to write to
> it). So a "read, test, conditional write" ends up actually being more
> complicated than you'd think - because you *want* that
> exclusive->dirty state for the case where you really are going to
> change the bit, and to avoid extra cache protocol stages you don't
> generally want to read the cacheline into a shared read mode first
> (only to then have to turn it into exclusive/dirty as a second state)
That all sounds reasonable.
But I still fail to understand why my micro-benchmark is faster with a
branch before the store compared to a plain store.
http://article.gmane.org/gmane.linux.kernel.cross-arch/26254
In this case we would not have an intermediate shared state, because we don't
have anybody else to share the cache line with. So with the branch we would have
the same E->M transition and write-back to memory as without the branch. But
that doesn't explain why the branch makes the code faster.
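For readers following along, the two variants being compared can be sketched
roughly as below. This is only an illustration of the pattern, not the code
from the linked micro-benchmark: the names clear_bit_plain() and
clear_bit_checked() are made up here, and both are non-atomic user-space
helpers rather than the kernel's bitops.

```c
#include <stddef.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Plain store: always writes the word back, even if the bit was already
 * clear, so the cache line is dirtied unconditionally. */
static void clear_bit_plain(unsigned long *map, size_t nr)
{
	map[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}

/* Branch before store: read and test first, and only write when the bit
 * is actually set. Returns 1 if a store was issued, 0 otherwise. */
static int clear_bit_checked(unsigned long *map, size_t nr)
{
	unsigned long mask = 1UL << (nr % BITS_PER_WORD);
	unsigned long *word = &map[nr / BITS_PER_WORD];

	if (*word & mask) {
		*word &= ~mask;
		return 1;
	}
	return 0;
}
```

When the bit is set, both variants end up doing the same load and the same
store to the same line; the checked variant only adds a compare and a branch
in front of the store.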
Any ideas?
--
Kirill A. Shutemov
Thread overview: 8+ messages
2015-02-10 7:06 [RFC] test_bit before clear files_struct bits Wang, Yalin
2015-02-10 7:11 ` [RFC V2] " Wang, Yalin
2015-02-10 20:22 ` Andrew Morton
2015-02-10 20:49 ` Linus Torvalds
2015-02-10 22:46 ` Kirill A. Shutemov [this message]
2015-02-10 23:29 ` Linus Torvalds
2015-02-15 8:27 ` [RFC V3] test bit " Wang, Yalin
2015-02-18 21:27 ` Andrew Morton