From: Waiman Long <waiman.long@hpe.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Andi Kleen <andi@firstfloor.org>,
Scott J Norton <scott.norton@hp.com>,
Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list
Date: Mon, 01 Feb 2016 16:44:02 -0500 [thread overview]
Message-ID: <56AFD1A2.1060902@hpe.com> (raw)
In-Reply-To: <20160130083557.GA31749@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]
On 01/30/2016 03:35 AM, Ingo Molnar wrote:
> * Waiman Long<Waiman.Long@hpe.com> wrote:
>
>> The inode_sb_list_add() and inode_sb_list_del() functions in the vfs
>> layer just perform list addition and deletion under lock. So they can
>> use the new list batching facility to speed up the list operations
>> when many CPUs are trying to do it simultaneously.
>>
>> In particular, the inode_sb_list_del() function can be a performance
>> bottleneck when large applications with many threads and associated
>> inodes exit. An exit microbenchmark that creates a large number of
>> threads, attaches many inodes to them, and then exits was used to
>> measure this. The runtimes of that microbenchmark with 1000 threads
>> before and after the patch on a 4-socket Intel E7-4820 v3 system
>> (48 cores, 96 threads) were as follows:
>>
>> Kernel        Elapsed Time    System Time
>> ------        ------------    -----------
>> Vanilla 4.4      65.29s         82m14s
>> Patched 4.4      45.69s         49m44s
>>
>> The elapsed time and the reported system time were reduced by 30%
>> and 40% respectively.
> That's pretty impressive!
>
> I'm wondering, why are inode_sb_list_add()/del() even called for a presumably
> reasonably well cached benchmark running on a system with enough RAM? Are these
> perhaps thousands of temporary files, already deleted, and released when all the
> file descriptors are closed as part of sys_exit()?
The inodes that need to be deleted were actually procfs files, which have
to go away when the processes/threads exit. I encountered this problem
when running the SPECjbb2013 benchmark on a large machine, where it
sometimes seemed to hang for 30 minutes or so after the benchmark
completed. I wrote a simple microbenchmark to simulate this situation;
it is in the attachment.
> If that's the case then I suspect an even bigger win would be not just to batch
> the (sb-)global list fiddling, but to potentially turn the sb list into a
> percpu_alloc() managed set of per CPU lists? It's a bigger change, but it could
> speed up a lot of other temporary file intensive usecases as well, not just
> batched delete.
>
> Thanks,
>
> Ingo
Yes, that is another possibility. I will investigate it further.
Thanks for the suggestion.
Cheers,
Longman
[-- Attachment #2: exit_test.c --]
[-- Type: text/plain, Size: 2665 bytes --]
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* Authors: Waiman Long <waiman.long@hp.com>
*/
/*
* This is an exit test
*/
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/syscall.h>
#define do_exit() syscall(SYS_exit)
#define gettid() syscall(SYS_gettid)
#define MAX_THREADS 2048
static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now = 0;
static volatile int threadcnt = 0;
/*
 * Walk the /proc/<tid> tree of the current thread to populate the
 * dentry and inode caches
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}
static void *exit_thread(void *dummy)
{
	(void)dummy;	/* thread index, unused */

	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Wait until the exit_now flag is set and then call the exit
	 * syscall directly
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* not reached */
}
static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i = 0;
	time_t start = time(NULL);

	while (i++ < threads) {
		if (pthread_create(thread + i - 1, NULL, exit_thread,
				   (void *)i)) {
			perror("pthread_create");
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lus\n", (unsigned long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}
int main(int argc, char *argv[])
{
	int tcnt;	/* Thread count */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit(argv[1][0])) {
		fprintf(stderr, "Usage: %s <thread count>\n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if (tcnt > MAX_THREADS) {
		fprintf(stderr, "Error: thread count should be <= %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}
Thread overview: 16+ messages
2016-01-29 19:30 [PATCH v2 0/3] lib/list_batch: A simple list insertion/deletion batching facility Waiman Long
2016-01-29 19:30 ` [PATCH v2 1/3] " Waiman Long
2016-02-01 0:47 ` Dave Chinner
2016-02-03 23:11 ` Waiman Long
2016-02-06 23:57 ` Dave Chinner
2016-02-17 1:37 ` Waiman Long
2016-01-29 19:30 ` [PATCH v2 2/3] lib/list_batch, x86: Enable list insertion/deletion batching for x86 Waiman Long
2016-01-29 19:30 ` [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list Waiman Long
2016-01-30 8:35 ` Ingo Molnar
2016-02-01 17:45 ` Andi Kleen
2016-02-01 22:03 ` Waiman Long
2016-02-03 22:59 ` Waiman Long
2016-02-06 23:51 ` Dave Chinner
2016-02-01 21:44 ` Waiman Long [this message]
2016-02-01 0:04 ` Dave Chinner
2016-02-03 23:01 ` Waiman Long