Re: [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list

From: Waiman Long <waiman.long@hpe.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Andi Kleen <andi@firstfloor.org>,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>
Subject: Re: [PATCH v2 3/3] vfs: Enable list batching for the superblock's inode list
Date: Mon, 01 Feb 2016 16:44:02 -0500
Message-ID: <56AFD1A2.1060902@hpe.com>
In-Reply-To: <20160130083557.GA31749@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2280 bytes --]

On 01/30/2016 03:35 AM, Ingo Molnar wrote:
> * Waiman Long <Waiman.Long@hpe.com> wrote:
>
>> The inode_sb_list_add() and inode_sb_list_del() functions in the vfs
>> layer just perform list addition and deletion under lock. So they can
>> use the new list batching facility to speed up the list operations
>> when many CPUs are trying to do it simultaneously.
>>
>> In particular, the inode_sb_list_del() function can be a performance
>> bottleneck when large applications with many threads and associated
>> inodes exit. This was measured with an exit microbenchmark that creates
>> a large number of threads, attaches many inodes to them and then exits.
>> The runtimes
>> of that microbenchmark with 1000 threads before and after the patch
>> on a 4-socket Intel E7-4820 v3 system (48 cores, 96 threads) were
>> as follows:
>>
>>    Kernel        Elapsed Time    System Time
>>    ------        ------------    -----------
>>    Vanilla 4.4      65.29s         82m14s
>>    Patched 4.4      45.69s         49m44s
>>
>> The elapsed time and the reported system time were reduced by 30%
>> and 40% respectively.
> That's pretty impressive!
>
> I'm wondering, why are inode_sb_list_add()/del() even called for a presumably
> reasonably well cached benchmark running on a system with enough RAM? Are these
> perhaps thousands of temporary files, already deleted, and released when all the
> file descriptors are closed as part of sys_exit()?

The inodes that need to be deleted were actually procfs files, which have
to go away when the processes/threads exit. I encountered this problem
when running the SPECjbb2013 benchmark on a large machine, where it
sometimes seemed to hang for 30 minutes or so after the benchmark
completed. I wrote a simple microbenchmark to simulate this situation; it
is in the attachment.


> If that's the case then I suspect an even bigger win would be not just to batch
> the (sb-)global list fiddling, but to potentially turn the sb list into a
> percpu_alloc() managed set of per CPU lists? It's a bigger change, but it could
> speed up a lot of other temporary file intensive usecases as well, not just
> batched delete.
>
> Thanks,
>
> 	Ingo

Yes, that is another possibility. I will investigate that further.
Thanks for the suggestion.
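
As a rough illustration of the per-CPU list idea only (this is not kernel
code and not part of this patch series), a user-space sketch might look
like the following. All names here (pcpu_list, NR_SHARDS, pcpu_list_add,
pcpu_list_del, pcpu_list_count) are hypothetical, pthread mutexes stand in
for per-CPU spinlocks, and the caller passes in a "cpu" number instead of
the code asking the scheduler for one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4	/* hypothetical stand-in for the number of CPUs */

struct node {
	struct node *prev, *next;
	unsigned int shard;	/* sub-list this node was added to */
};

struct pcpu_list {
	struct {
		pthread_mutex_t lock;	/* protects this sub-list only */
		struct node head;	/* circular list head */
	} shard[NR_SHARDS];
};

static void pcpu_list_init(struct pcpu_list *l)
{
	for (unsigned int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&l->shard[i].lock, NULL);
		l->shard[i].head.prev = &l->shard[i].head;
		l->shard[i].head.next = &l->shard[i].head;
	}
}

/* Add to the sub-list of the caller's "CPU"; only that lock is taken. */
static void pcpu_list_add(struct pcpu_list *l, struct node *n,
			  unsigned int cpu)
{
	unsigned int s = cpu % NR_SHARDS;
	struct node *head = &l->shard[s].head;

	pthread_mutex_lock(&l->shard[s].lock);
	n->shard = s;
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
	pthread_mutex_unlock(&l->shard[s].lock);
}

/* Delete from whichever sub-list the node was originally added to. */
static void pcpu_list_del(struct pcpu_list *l, struct node *n)
{
	unsigned int s = n->shard;

	assert(s < NR_SHARDS);
	pthread_mutex_lock(&l->shard[s].lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
	pthread_mutex_unlock(&l->shard[s].lock);
}

/* Walking the whole list means visiting every sub-list in turn. */
static size_t pcpu_list_count(struct pcpu_list *l)
{
	size_t cnt = 0;

	for (unsigned int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_lock(&l->shard[i].lock);
		for (struct node *n = l->shard[i].head.next;
		     n != &l->shard[i].head; n = n->next)
			cnt++;
		pthread_mutex_unlock(&l->shard[i].lock);
	}
	return cnt;
}
```

Additions and deletions touching different sub-lists no longer contend on
one lock. The cost in this sketch is that anything that iterates over the
whole list has to visit every sub-list.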

Cheers,
Longman


[-- Attachment #2: exit_test.c --]
[-- Type: text/plain, Size: 2665 bytes --]

/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * Authors: Waiman Long <waiman.long@hp.com>
 */
/*
 * This is an exit test
 */
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/syscall.h>


#define do_exit()	syscall(SYS_exit, 0)	/* exit this thread, status 0 */
#define	gettid()	syscall(SYS_gettid)
#define	MAX_THREADS	2048

static inline void cpu_relax(void)
{
	__asm__ __volatile__("rep;nop": : :"memory");
}

static inline void atomic_inc(volatile int *v)
{
	__asm__ __volatile__("lock incl %0": "+m" (*v));
}

static volatile int exit_now  = 0;
static volatile int threadcnt = 0;

/*
 * Walk /proc/<tid> so that dentries and inodes are created for the
 * calling thread's procfs entries
 */
static void walk_procfs(void)
{
	char cmdbuf[256];
	pid_t tid = gettid();

	snprintf(cmdbuf, sizeof(cmdbuf), "find /proc/%d > /dev/null 2>&1", tid);
	if (system(cmdbuf) < 0)
		perror("system() failed!");
}

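static void *exit_thread(void *dummy)
{
	(void)dummy;	/* thread index is passed in but not used */

	walk_procfs();
	atomic_inc(&threadcnt);
	/*
	 * Poll once a second until the exit_now flag is set, then exit.
	 * Nothing in this program sets exit_now; the threads are killed
	 * by the SIGKILL sent from exit_test().
	 */
	while (!exit_now)
		sleep(1);
	do_exit();
	return NULL;	/* Not reachable */
}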

static void exit_test(int threads)
{
	pthread_t thread[threads];
	long i = 0;
	time_t start = time(NULL);

	while (i++ < threads) {
		/* pthread_create() returns the error number; it does not set errno */
		errno = pthread_create(thread + i - 1, NULL, exit_thread,
				       (void *)i);
		if (errno) {
			perror("pthread_create");
			exit(1);
		}
#if 0
		/*
		 * Pipelining to reduce contention & improve speed
		 */
		if ((i & 0xf) == 0)
			 while (i - threadcnt > 12)
				usleep(1);
#endif
	}
	while (threadcnt != threads)
		usleep(1);
	walk_procfs();
	printf("Setup time = %lds\n", (long)(time(NULL) - start));
	printf("Process ready to exit!\n");
	kill(0, SIGKILL);
	exit(0);
}

int main(int argc, char *argv[])
{
	int   tcnt;	/* Thread count */
	char *cmd = argv[0];

	if ((argc != 2) || !isdigit(argv[1][0])) {
		fprintf(stderr, "Usage: %s <thread count>\n", cmd);
		exit(1);
	}
	tcnt = strtoul(argv[1], NULL, 10);
	if (tcnt > MAX_THREADS) {
		fprintf(stderr, "Error: thread count should be <= %d\n",
			MAX_THREADS);
		exit(1);
	}
	exit_test(tcnt);
	return 0;	/* Not reachable */
}
