From: Waiman Long <waiman.long@hp.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Jeff Layton <jlayton@redhat.com>,
Miklos Szeredi <mszeredi@suse.cz>, Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Andi Kleen <andi@firstfloor.org>,
"Chandramouleeswaran, Aswin" <aswin@hp.com>,
"Norton, Scott J" <scott.norton@hp.com>
Subject: Re: [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount
Date: Fri, 30 Aug 2013 15:20:48 -0400 [thread overview]
Message-ID: <5220F090.5050908@hp.com> (raw)
In-Reply-To: <CA+55aFzv9BZ2+B2NrNRRtTqnfgbH3LRSvSsUZVsy-g9OpM-8HA@mail.gmail.com>
On 08/30/2013 02:53 PM, Linus Torvalds wrote:
> So the perf data would be *much* more interesting for a more varied
> load. I know pretty much exactly what happens with my silly
> test-program, and as you can see it never really gets to the actual
> spinlock, because that test program will only ever hit the fast-path
> case. It would be much more interesting to see another load that may
> trigger the d_lock actually being taken. So:
>> For the other test cases that I am interested in, like the AIM7 benchmark,
>> your patch may not be as good as my original one. I got 1-3M JPM (varied
>> quite a lot in different runs) in the short workloads on a 80-core system.
>> My original one got 6M JPM. However, the test was done on 3.10 based kernel.
>> So I need to do more test to see if that has an effect on the JPM results.
> I'd really like to see a perf profile of that, particularly with some
> call chain data for the relevant functions (ie "what it is that causes
> us to get to spinlocks"). Because it may well be that you're hitting
> some of the cases that I didn't see, and thus didn't notice.
>
> In particular, I suspect AIM7 actually creates/deletes files and/or
> renames them too. Or maybe I screwed up the dget_parent() special case
> thing, which mattered because AIM7 did a lot of getcwd() calls or
> someting odd like that.
>
> Linus
Below is the perf data of my short workloads run in an 80-core DL980:
13.60% reaim [kernel.kallsyms] [k]
_raw_spin_lock_irqsave
|--48.79%-- tty_ldisc_try
|--48.58%-- tty_ldisc_deref
--2.63%-- [...]
11.31% swapper [kernel.kallsyms] [k] intel_idle
|--99.94%-- cpuidle_enter_state
--0.06%-- [...]
4.86% reaim [kernel.kallsyms] [k] lg_local_lock
|--59.41%-- mntput_no_expire
|--19.37%-- path_init
|--15.14%-- d_path
|--5.88%-- sys_getcwd
--0.21%-- [...]
3.00% reaim reaim [.] mul_short
2.41% reaim reaim [.] mul_long
|--87.21%-- 0xbc614e
--12.79%-- (nil)
2.29% reaim reaim [.] mul_int
2.20% reaim [kernel.kallsyms] [k] _raw_spin_lock
|--12.81%-- prepend_path
|--9.90%-- lockref_put_or_lock
|--9.62%-- __rcu_process_callbacks
|--8.77%-- load_balance
|--6.40%-- lockref_get
|--5.55%-- __mutex_lock_slowpath
|--4.85%-- __mutex_unlock_slowpath
|--4.83%-- inet_twsk_schedule
|--4.27%-- lockref_get_or_lock
|--2.19%-- task_rq_lock
|--2.13%-- sem_lock
|--2.09%-- scheduler_tick
|--1.88%-- try_to_wake_up
|--1.53%-- kmem_cache_free
|--1.30%-- unix_create1
|--1.22%-- unix_release_sock
|--1.21%-- process_backlog
|--1.11%-- unix_stream_sendmsg
|--1.03%-- enqueue_to_backlog
|--0.85%-- rcu_accelerate_cbs
|--0.79%-- unix_dgram_sendmsg
|--0.76%-- do_anonymous_page
|--0.70%-- unix_stream_recvmsg
|--0.69%-- unix_stream_connect
|--0.64%-- net_rx_action
|--0.61%-- tcp_v4_rcv
|--0.59%-- __do_fault
|--0.54%-- new_inode_pseudo
|--0.52%-- __d_lookup
--10.62%-- [...]
1.19% reaim [kernel.kallsyms] [k] mspin_lock
|--99.82%-- __mutex_lock_slowpath
--0.18%-- [...]
1.01% reaim [kernel.kallsyms] [k] lg_global_lock
|--51.62%-- __shmdt
--48.38%-- __shmctl
There are more contention in the lglock than I remember for the run in
3.10. This is an area that I need to look at. In fact, lglock is
becoming a problem for really large machine with a lot of cores. We have
a prototype 16-socket machine with 240 cores under development. The cost
of doing a lg_global_lock will be very high in that type of machine
given that it is already high in this 80-core machine. I have been
thinking about instead of per-cpu spinlocks, we could change the locking
to per-node level. While there will be more contention for
lg_local_lock, the cost of doing a lg_global_lock will be much lower and
contention within the local die should not be too bad. That will require
either a per-node variable infrastructure or simulated with the existing
per-cpu subsystem.
I will also need to look at ways reduce the need of taking d_lock in
existing code. One area that I am looking at is whether we can take out
the lock/unlock pair in prepend_path(). This function can only be called
with the rename_lock taken. So no filename change or deletion will be
allowed. It will only be a problem if somehow the dentry itself got
killed or dropped while the name is being copied out. The first dentry
referenced by the path structure should have a non-zero reference count,
so that shouldn't happen. I am not so sure about the parents of that
dentry as I am not so familiar with that part of the filesystem code.
Regards,
Longman
next prev parent reply other threads:[~2013-08-30 19:20 UTC|newest]
Thread overview: 154+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-06 3:12 [PATCH v7 0/4] Lockless update of reference count protected by spinlock Waiman Long
2013-08-06 3:12 ` [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount Waiman Long
2013-08-29 1:40 ` Linus Torvalds
2013-08-29 4:44 ` Benjamin Herrenschmidt
2013-08-29 7:00 ` Ingo Molnar
2013-08-29 16:43 ` Linus Torvalds
2013-08-29 19:25 ` Linus Torvalds
2013-08-29 23:42 ` Linus Torvalds
2013-08-30 0:26 ` Benjamin Herrenschmidt
2013-08-30 0:49 ` Linus Torvalds
2013-08-30 2:06 ` Michael Neuling
2013-08-30 2:30 ` Benjamin Herrenschmidt
2013-08-30 2:35 ` Linus Torvalds
2013-08-30 2:45 ` Benjamin Herrenschmidt
2013-08-30 2:31 ` Linus Torvalds
2013-08-30 2:43 ` Benjamin Herrenschmidt
2013-08-30 7:16 ` Ingo Molnar
2013-08-30 15:28 ` Linus Torvalds
2013-08-30 3:12 ` Waiman Long
2013-08-30 3:54 ` Linus Torvalds
2013-08-30 7:55 ` Sedat Dilek
2013-08-30 8:10 ` Sedat Dilek
2013-08-30 9:27 ` Sedat Dilek
2013-08-30 9:48 ` Ingo Molnar
2013-08-30 9:56 ` Sedat Dilek
2013-08-30 9:58 ` Sedat Dilek
2013-08-30 10:29 ` Sedat Dilek
2013-08-30 10:36 ` Peter Zijlstra
2013-08-30 10:44 ` Sedat Dilek
2013-08-30 10:46 ` Sedat Dilek
2013-08-30 10:52 ` Peter Zijlstra
2013-08-30 10:57 ` Sedat Dilek
2013-08-30 14:05 ` Sedat Dilek
2013-08-30 11:19 ` Sedat Dilek
2013-08-30 10:38 ` Sedat Dilek
2013-08-30 15:34 ` Linus Torvalds
2013-08-30 15:38 ` Sedat Dilek
2013-08-30 16:12 ` Steven Rostedt
2013-08-30 16:16 ` Sedat Dilek
2013-08-30 18:42 ` Linus Torvalds
2013-08-30 16:32 ` Linus Torvalds
2013-08-30 16:37 ` Sedat Dilek
2013-08-30 16:52 ` Linus Torvalds
2013-08-30 17:11 ` Sedat Dilek
2013-08-30 17:26 ` Linus Torvalds
2013-09-01 10:01 ` Sedat Dilek
2013-09-01 10:33 ` Sedat Dilek
2013-09-01 15:32 ` Linus Torvalds
2013-09-01 15:45 ` Sedat Dilek
2013-09-01 15:55 ` Linus Torvalds
2013-09-02 10:30 ` Sedat Dilek
2013-09-02 16:09 ` David Ahern
2013-09-02 16:09 ` David Ahern
2013-09-01 20:59 ` Linus Torvalds
2013-09-01 21:23 ` Al Viro
2013-09-01 22:16 ` Linus Torvalds
2013-09-01 22:35 ` Al Viro
2013-09-01 22:44 ` Al Viro
2013-09-01 22:58 ` Linus Torvalds
2013-09-01 22:48 ` Linus Torvalds
2013-09-01 23:30 ` Al Viro
2013-09-02 0:12 ` Linus Torvalds
2013-09-02 0:50 ` Linus Torvalds
2013-09-02 0:50 ` Linus Torvalds
2013-09-02 7:05 ` Ingo Molnar
2013-09-02 16:44 ` Linus Torvalds
2013-09-03 10:15 ` Ingo Molnar
2013-09-03 15:41 ` Linus Torvalds
2013-09-03 18:34 ` Linus Torvalds
2013-09-03 19:19 ` Ingo Molnar
2013-09-03 21:05 ` Linus Torvalds
2013-09-03 21:13 ` Linus Torvalds
2013-09-03 21:34 ` Linus Torvalds
2013-09-03 21:39 ` Linus Torvalds
2013-09-03 14:08 ` Pavel Machek
2013-09-03 22:37 ` Sedat Dilek
2013-09-03 22:55 ` Dave Jones
2013-09-03 23:05 ` Sedat Dilek
2013-09-03 23:15 ` Dave Jones
2013-09-03 23:20 ` Sedat Dilek
2013-09-03 23:45 ` Sedat Dilek
2013-08-30 18:33 ` Waiman Long
2013-08-30 18:53 ` Linus Torvalds
2013-08-30 19:20 ` Waiman Long [this message]
2013-08-30 19:33 ` Linus Torvalds
2013-08-30 20:15 ` Waiman Long
2013-08-30 20:43 ` Linus Torvalds
2013-08-30 20:54 ` Al Viro
2013-08-30 21:03 ` Linus Torvalds
2013-08-30 21:44 ` Al Viro
2013-08-30 22:30 ` Linus Torvalds
2013-08-31 21:23 ` Al Viro
2013-08-31 22:49 ` Linus Torvalds
2013-08-31 23:27 ` Al Viro
2013-09-01 0:13 ` Al Viro
2013-09-01 17:48 ` Al Viro
2013-09-09 8:30 ` Peter Zijlstra
2013-08-30 21:10 ` Waiman Long
2013-08-30 21:22 ` Linus Torvalds
2013-08-30 21:30 ` Al Viro
2013-08-30 21:42 ` Waiman Long
2013-08-30 19:40 ` Al Viro
2013-08-30 19:52 ` Waiman Long
2013-08-30 20:26 ` Al Viro
2013-08-30 20:35 ` Waiman Long
2013-08-30 20:48 ` Al Viro
2013-08-31 2:02 ` Waiman Long
2013-08-31 2:35 ` Al Viro
2013-08-31 2:42 ` Al Viro
2013-09-02 19:25 ` Waiman Long
2013-09-03 6:01 ` Ingo Molnar
2013-09-03 7:24 ` Sedat Dilek
2013-09-03 15:38 ` Linus Torvalds
2013-09-03 15:14 ` Waiman Long
2013-09-03 15:34 ` Linus Torvalds
2013-09-03 19:09 ` Linus Torvalds
2013-09-03 21:01 ` Waiman Long
2013-09-04 14:52 ` Waiman Long
2013-09-04 15:14 ` Linus Torvalds
2013-09-04 19:25 ` Waiman Long
2013-09-04 21:34 ` Linus Torvalds
2013-09-05 2:35 ` Waiman Long
2013-09-05 13:31 ` Ingo Molnar
2013-09-05 17:33 ` Waiman Long
2013-09-05 17:40 ` Ingo Molnar
2013-09-03 22:41 ` Sedat Dilek
2013-09-03 23:11 ` Sedat Dilek
2013-09-03 23:11 ` Sedat Dilek
2013-09-08 21:45 ` Linus Torvalds
2013-09-09 0:03 ` Al Viro
2013-09-09 0:25 ` Linus Torvalds
2013-09-09 0:35 ` Al Viro
2013-09-09 0:38 ` Linus Torvalds
2013-09-09 0:57 ` Al Viro
2013-09-09 2:09 ` Ramkumar Ramachandra
2013-09-09 0:30 ` Al Viro
2013-09-09 3:32 ` Linus Torvalds
2013-09-09 4:06 ` Ramkumar Ramachandra
2013-09-09 5:44 ` Al Viro
2013-08-30 17:17 ` Peter Zijlstra
2013-08-30 17:28 ` Linus Torvalds
2013-08-30 17:33 ` Linus Torvalds
2013-08-29 15:20 ` Waiman Long
2013-08-06 3:12 ` [PATCH v7 2/4] spinlock: Enable x86 architecture to do lockless refcount update Waiman Long
2013-08-06 3:12 ` [PATCH v7 3/4] dcache: replace d_lock/d_count by d_lockcnt Waiman Long
2013-08-06 3:12 ` [PATCH v7 4/4] dcache: Enable lockless update of dentry's refcount Waiman Long
2013-08-13 18:03 ` [PATCH v7 0/4] Lockless update of reference count protected by spinlock Waiman Long
-- strict thread matches above, loose matches on Subject: below --
2013-08-31 3:06 [PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount George Spelvin
2013-08-31 17:16 ` Linus Torvalds
2013-09-01 8:50 ` George Spelvin
2013-09-01 11:10 ` Theodore Ts'o
2013-09-01 15:49 ` Linus Torvalds
2013-09-01 18:11 ` Steven Rostedt
2013-09-01 20:03 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5220F090.5050908@hp.com \
--to=waiman.long@hp.com \
--cc=andi@firstfloor.org \
--cc=aswin@hp.com \
--cc=benh@kernel.crashing.org \
--cc=jlayton@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=mszeredi@suse.cz \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=scott.norton@hp.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.