From: Waiman Long <waiman.long@hpe.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Andi Kleen <andi@firstfloor.org>,
Scott J Norton <scott.norton@hp.com>,
Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list
Date: Mon, 01 Feb 2016 16:44:02 -0500 [thread overview]
Message-ID: <56AFD1A2.1060902@hpe.com> (raw)
In-Reply-To: <20160130083557.GA31749@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]
On 01/30/2016 03:35 AM, Ingo Molnar wrote:
> * Waiman Long<Waiman.Long@hpe.com> wrote:
>
>> The inode_sb_list_add() and inode_sb_list_del() functions in the vfs
>> layer just perform list addition and deletion under lock. So they can
>> use the new list batching facility to speed up the list operations
>> when many CPUs are trying to do it simultaneously.
>>
>> In particular, the inode_sb_list_del() function can be a performance
>> bottleneck when large applications with many threads and associated
>> inodes exit. An exit microbenchmark that creates a large number of
>> threads, attaches many inodes to them, and then exits was used to
>> measure this. The runtimes of that microbenchmark with 1000 threads
>> before and after the patch on a 4-socket Intel E7-4820 v3 system
>> (48 cores, 96 threads) were as follows:
>>
>> Kernel        Elapsed Time    System Time
>> ------        ------------    -----------
>> Vanilla 4.4      65.29s         82m14s
>> Patched 4.4      45.69s         49m44s
>>
>> The elapsed time and the reported system time were reduced by 30%
>> and 40% respectively.
> That's pretty impressive!
>
> I'm wondering, why are inode_sb_list_add()/del() even called for a presumably
> reasonably well cached benchmark running on a system with enough RAM? Are these
> perhaps thousands of temporary files, already deleted, and released when all the
> file descriptors are closed as part of sys_exit()?
The inodes that need to be deleted were actually procfs files, which have
to go away when the processes/threads exit. I encountered this problem
when running the SPECjbb2013 benchmark on a large machine, where it
sometimes seemed to hang for 30 minutes or so after the benchmark
completed. I wrote a simple microbenchmark to simulate this situation;
it is in the attachment.
> If that's the case then I suspect an even bigger win would be not just to batch
> the (sb-)global list fiddling, but to potentially turn the sb list into a
> percpu_alloc() managed set of per CPU lists? It's a bigger change, but it could
> speed up a lot of other temporary file intensive usecases as well, not just
> batched delete.
>
> Thanks,
>
> Ingo
Yes, that is another possibility. I will investigate it further.
Thanks for the suggestion.
Cheers,
Longman
[-- Attachment #2: exit_test.c --]
[-- Type: text/plain, Size: 2665 bytes --]
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* Authors: Waiman Long <waiman.long@hp.com>
*/
/*
* This is an exit test
*/
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/syscall.h>
#define do_exit() syscall(SYS_exit)
#define gettid() syscall(SYS_gettid)
#define MAX_THREADS 2048
static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now = 0;
static volatile int threadcnt = 0;
/*
 * Walk the /proc/<tid> tree of the current thread to populate the
 * dentry and inode caches
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}
static void *exit_thread(void *dummy)
{
	(void)dummy;	/* thread index, unused */

	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Wait until the exit_now flag is set and then call the exit
	 * syscall directly
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* not reached */
}
static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i = 0;
	time_t start = time(NULL);

	while (i++ < threads) {
		if (pthread_create(thread + i - 1, NULL, exit_thread,
				   (void *)i)) {
			perror("pthread_create");
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lus\n", (unsigned long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}
int main(int argc, char *argv[])
{
	int tcnt;	/* Thread count */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit(argv[1][0])) {
		fprintf(stderr, "Usage: %s <thread count>\n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if (tcnt > MAX_THREADS) {
		fprintf(stderr, "Error: thread count should be <= %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}
Thread overview: 16+ messages
2016-01-29 19:30 [PATCH v2 0/3] lib/list_batch: A simple list insertion/deletion batching facility Waiman Long
2016-01-29 19:30 ` [PATCH v2 1/3] " Waiman Long
2016-02-01 0:47 ` Dave Chinner
2016-02-03 23:11 ` Waiman Long
2016-02-06 23:57 ` Dave Chinner
2016-02-17 1:37 ` Waiman Long
2016-01-29 19:30 ` [PATCH v2 2/3] lib/list_batch, x86: Enable list insertion/deletion batching for x86 Waiman Long
2016-01-29 19:30 ` [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list Waiman Long
2016-01-30 8:35 ` Ingo Molnar
2016-02-01 17:45 ` Andi Kleen
2016-02-01 22:03 ` Waiman Long
2016-02-03 22:59 ` Waiman Long
2016-02-06 23:51 ` Dave Chinner
2016-02-01 21:44 ` Waiman Long [this message]
2016-02-01 0:04 ` Dave Chinner
2016-02-03 23:01 ` Waiman Long