From: Waiman Long <waiman.long@hpe.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kara <jack@suse.cz>, Alexander Viro <viro@zeniv.linux.org.uk>,
Jan Kara <jack@suse.com>, Jeff Layton <jlayton@poochiereds.net>,
"J. Bruce Fields" <bfields@fieldses.org>,
Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@linux-foundation.org>,
<linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andi Kleen <andi@firstfloor.org>,
Dave Chinner <dchinner@redhat.com>,
Scott J Norton <scott.norton@hp.com>,
Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list
Date: Thu, 25 Feb 2016 09:43:46 -0500 [thread overview]
Message-ID: <56CF1322.2040609@hpe.com> (raw)
In-Reply-To: <20160225080635.GB10611@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3052 bytes --]
On 02/25/2016 03:06 AM, Ingo Molnar wrote:
> * Jan Kara<jack@suse.cz> wrote:
>
>>>>> With an exit microbenchmark that creates a large number of threads,
>>>>> attaches many inodes to them and then exits. The runtimes of that
>>>>> microbenchmark with 1000 threads before and after the patch on a 4-socket
>>>>> Intel E7-4820 v3 system (40 cores, 80 threads) were as follows:
>>>>>
>>>>> Kernel Elapsed Time System Time
>>>>> ------ ------------ -----------
>>>>> Vanilla 4.5-rc4 65.29s 82m14s
>>>>> Patched 4.5-rc4 22.81s 23m03s
>>>>>
>>>>> Before the patch, spinlock contention at the inode_sb_list_add() function
>>>>> at the startup phase and the inode_sb_list_del() function at the exit
>>>>> phase were about 79% and 93% of total CPU time respectively (as measured
>>>>> by perf). After the patch, the percpu_list_add() function consumed only
>>>>> about 0.04% of CPU time at startup phase. The percpu_list_del() function
>>>>> consumed about 0.4% of CPU time at exit phase. There was still some
>>>>> spinlock contention, but it happened elsewhere.
>>>> While looking through this patch, I have noticed that the
>>>> list_for_each_entry_safe() iterations in evict_inodes() and
>>>> invalidate_inodes() are actually unnecessary. So if you first apply the
>>>> attached patch, you don't have to implement safe iteration variants at all.
>>>>
>>>> As a second comment, I'd note that this patch grows struct inode by 1
>>>> pointer. It is probably acceptable for large machines given the speedup but
>>>> it should be noted in the changelog. Furthermore for UP or even small SMP
>>>> systems this is IMHO undesired bloat since the speedup won't be noticeable.
>>>>
>>>> So for these small systems it would be good if per-cpu list magic would just
>>>> fall back to single linked list with a spinlock. Do you think that is
>>>> reasonably doable?
>>> Even many 'small' systems tend to be SMP these days.
>> Yes, I know. But my tablet with 4 ARM cores is unlikely to benefit from this
>> change either. [...]
> I'm not sure about that at all, the above numbers are showing a 3x-4x speedup in
> system time, which ought to be noticeable on smaller SMP systems as well.
>
> Waiman, could you please post the microbenchmark?
>
> Thanks,
>
> Ingo
The microbenchmark that I used is attached.

I do agree that the performance benefit will decrease as the number of
CPUs gets smaller. The system that I used for testing has 4 sockets with
40 cores (80 threads). Dave Chinner ran his fstests on a 16-core system
(probably 2-socket), which showed a modest improvement in performance
(~4m40s vs 4m30s in runtime).
This patch enables parallel insertion into and deletion from the inode
list, which used to be serialized operations. So if that list operation
is a bottleneck, you will see a significant improvement. If it is not,
you may not notice much of a difference. For a single-socket 4-core
system, I agree that the performance benefit, if any, will be limited.
Cheers,
Longman
[-- Attachment #2: exit_test.c --]
[-- Type: text/plain, Size: 2665 bytes --]
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* Authors: Waiman Long <waiman.long@hp.com>
*/
/*
* This is an exit test
*/
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/syscall.h>
#define do_exit() syscall(SYS_exit)
#define gettid() syscall(SYS_gettid)
#define MAX_THREADS 2048
static inline void cpu_relax(void)
{
__asm__ __volatile__("rep;nop": : :"memory");
}
static inline void atomic_inc(volatile int *v)
{
__asm__ __volatile__("lock incl %0": "+m" (*v));
}
static volatile int exit_now = 0;
static volatile int threadcnt = 0;
/*
 * Walk this thread's /proc/<tid> directory to fill the dentry and
 * inode caches
 */
static void walk_procfs(void)
{
char cmdbuf[256];
pid_t tid = gettid();
snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
if (system(cmdbuf) < 0)
perror("system() failed!");
}
static void *exit_thread(void *dummy)
{
	(void)dummy;
	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Busy wait until the exit_now flag is set, then call exit
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* Not reached */
}
static void exit_test(int threads)
{
pthread_t thread[threads];
long i = 0, finish;
time_t start = time(NULL);
while (i++ < threads) {
if (pthread_create(thread + i - 1, NULL, exit_thread,
(void *)i)) {
perror("pthread_create");
exit(1);
}
#if 0
/*
* Pipelining to reduce contention & improve speed
*/
if ((i & 0xf) == 0)
while (i - threadcnt > 12)
usleep(1);
#endif
}
while (threadcnt != threads)
usleep(1);
walk_procfs();
printf("Setup time = %lus\n", time(NULL) - start);
printf("Process ready to exit!\n");
kill(0, SIGKILL);
exit(0);
}
int main(int argc, char *argv[])
{
int tcnt; /* Thread counts */
char *cmd = argv[0];
if ((argc != 2) || !isdigit(argv[1][0])) {
fprintf(stderr, "Usage: %s <thread count>\n", cmd);
exit(1);
}
tcnt = strtoul(argv[1], NULL, 10);
if (tcnt > MAX_THREADS) {
fprintf(stderr, "Error: thread count should be <= %d\n",
MAX_THREADS);
exit(1);
}
exit_test(tcnt);
return 0; /* Not reachable */
}
Thread overview: 15+ messages (newest: 2016-02-25 14:44 UTC)
2016-02-23 19:04 [PATCH v3 0/3] vfs: Use per-cpu list for SB's s_inodes list Waiman Long
2016-02-23 19:04 ` [PATCH v3 1/3] lib/percpu-list: Per-cpu list with associated per-cpu locks Waiman Long
2016-02-24 2:00 ` Boqun Feng
2016-02-24 4:01 ` Waiman Long
2016-02-24 7:56 ` Jan Kara
2016-02-24 19:51 ` Waiman Long
2016-02-23 19:04 ` [PATCH v3 2/3] fsnotify: Simplify inode iteration on umount Waiman Long
2016-02-23 19:04 ` [PATCH v3 3/3] vfs: Use per-cpu list for superblock's inode list Waiman Long
2016-02-24 8:28 ` Jan Kara
2016-02-24 8:36 ` Ingo Molnar
2016-02-24 8:58 ` Jan Kara
2016-02-25 8:06 ` Ingo Molnar
2016-02-25 14:43 ` Waiman Long [this message]
2016-02-24 20:23 ` Waiman Long
2016-02-25 14:50 ` Waiman Long