From: Junxiao Bi <junxiao.bi@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <matthew.wilcox@oracle.com>,
	Srinivas Eeda <SRINIVAS.EEDA@oracle.com>,
	"joe.jin@oracle.com" <joe.jin@oracle.com>,
	Wengang Wang <wen.gang.wang@oracle.com>
Subject: Re: [PATCH] proc: Avoid a thundering herd of threads freeing proc dentries
Date: Thu, 25 Jun 2020 15:11:17 -0700
Message-ID: <5421d6d6-1b00-865a-a992-e2337f044188@oracle.com>
In-Reply-To: <20200623004756.GE21350@casper.infradead.org>

On 6/22/20 5:47 PM, Matthew Wilcox wrote:

> On Sun, Jun 21, 2020 at 10:15:39PM -0700, Junxiao Bi wrote:
>> On 6/20/20 9:27 AM, Matthew Wilcox wrote:
>>> On Fri, Jun 19, 2020 at 05:42:45PM -0500, Eric W. Biederman wrote:
>>>> Junxiao Bi <junxiao.bi@oracle.com> writes:
>>>>> Still high lock contention. I collected the following hot path.
>>>> A different location this time.
>>>>
>>>> I know of at least exit_signal and exit_notify that take thread wide
>>>> locks, and it looks like exit_mm is another.  Those don't use the same
>>>> locks as flushing proc.
>>>>
>>>>
>>>> So I think you are simply seeing a result of the thundering herd of
>>>> threads shutting down at once.  Given that thread shutdown is fundamentally
>>>> a slow path there is only so much that can be done.
>>>>
>>>> If you are up for a project to work through this thundering herd I
>>>> expect I can help some.  It will be a long process of cleaning up
>>>> the entire thread exit process with an eye to performance.
>>> Wengang had some tests which produced wall-clock values for this problem,
>>> which I agree is more informative.
>>>
>>> I'm not entirely sure what the customer workload is that requires a
>>> highly threaded workload to also shut down quickly.  To my mind, an
>>> overall workload is normally composed of highly-threaded tasks that run
>>> for a long time and only shut down rarely (thus performance of shutdown
>>> is not important) and single-threaded tasks that run for a short time.
>> The real workload is a Java application working in server-agent mode; the
>> issue happened on the agent side. All the agent does is wait for work to be
>> dispatched from the server and execute it. To execute one work item, the
>> agent starts lots of short-lived threads, so a large number of threads can
>> exit at the same time when there is a lot of work to execute, and the
>> contention on the exit path caused a high %sys time which impacted other
>> workloads.
> How about this for a micro?  Executes in about ten seconds on my laptop.
> You might need to tweak it a bit to get better timing on a server.
>
> // gcc -pthread -O2 -g -W -Wall
> #include <pthread.h>
> #include <unistd.h>
>
> void *worker(void *arg)
> {
> 	int i = 0;
> 	int *p = arg;
>
> 	/* Burn a little CPU, then park in sleep(); sleep() is a
> 	 * cancellation point, so pthread_cancel() takes effect there. */
> 	for (;;) {
> 		while (i < 1000 * 1000) {
> 			i += *p;
> 		}
> 		sleep(1);
> 	}
> }
>
> int main(int argc, char **argv)
> {
> 	pthread_t threads[20][100];

I tuned 100 up to 1000 here and in the two loops below.
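Concretely, the tuned variant (only the sizes change from Willy's test):

	pthread_t threads[20][1000];
	...
		for (j = 0; j < 1000; j++)
			pthread_create(&threads[i % 20][j], NULL, worker, &one);
	...
		for (j = 0; j < 1000; j++)
			pthread_cancel(threads[(i - 5) % 20][j]);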

I tested it on a 2-socket server with 104 CPUs. The perf profile is similar
on v5.7 and on v5.7 with Eric's fix: the spinlock contention moved to the
spinlock taken inside futex (presumably the futex hash-bucket lock taken by
futex_wait_setup() and futex_wake()), so the fix didn't help.
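(For reproduction: a call graph of this shape can be collected with e.g.
"perf record -a -g -- ./perf_test" followed by "perf report --stdio"; the
exact perf invocation here is an assumption, not a record of what was run.)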


    46.41%     0.11%  perf_test  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
            |
             --46.30%--entry_SYSCALL_64_after_hwframe
                       |
                        --46.12%--do_syscall_64
                                  |
                                  |--30.47%--__x64_sys_futex
                                  |          |
                                  |           --30.45%--do_futex
                                  |                     |
                                  |                     |--18.04%--futex_wait
                                  |                     |          |
                                  |                     |          |--16.94%--futex_wait_setup
                                  |                     |          |          |
                                  |                     |          |           --16.61%--_raw_spin_lock
                                  |                     |          |                     |
                                  |                     |          |                      --16.30%--native_queued_spin_lock_slowpath
                                  |                     |          |                                |
                                  |                     |          |                                 --0.81%--call_function_interrupt
                                  |                     |          |                                           |
                                  |                     |          |                                            --0.79%--smp_call_function_interrupt
                                  |                     |          |                                                      |
                                  |                     |          |                                                       --0.62%--generic_smp_call_function_single_interrupt
                                  |                     |          |
                                  |                     |           --1.04%--futex_wait_queue_me
                                  |                     |                     |
                                  |                     |                      --0.96%--schedule
                                  |                     |                                |
                                  |                     |                                 --0.94%--__schedule
                                  |                     |                                           |
                                  |                     |                                            --0.51%--pick_next_task_fair
                                  |                     |
                                  |                      --12.38%--futex_wake
                                  |                                |
                                  |                                |--11.00%--_raw_spin_lock
                                  |                                |          |
                                  |                                |           --10.76%--native_queued_spin_lock_slowpath
                                  |                                |                     |
                                  |                                |                      --0.55%--call_function_interrupt
                                  |                                |                                |
                                  |                                |                                 --0.53%--smp_call_function_interrupt
                                  |                                |
                                  |                                 --1.11%--wake_up_q
                                  |                                           |
                                  |                                            --1.10%--try_to_wake_up
                                  |

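For readers not familiar with these paths: futex_wait and futex_wake are the
two halves of the futex(2) syscall, and both take the per-hash-bucket
spinlock before touching the wait queue, which is where the _raw_spin_lock /
native_queued_spin_lock_slowpath time above goes. A minimal sketch of the
userspace pattern that drives those two kernel paths (an assumed demo for
illustration, not part of the benchmark):

	/* Build: gcc -pthread -O2 -o futex_demo futex_demo.c */
	#include <linux/futex.h>
	#include <sys/syscall.h>
	#include <pthread.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	static uint32_t futex_word;

	static long futex(uint32_t *uaddr, int op, uint32_t val)
	{
		return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
	}

	static void *waiter(void *arg)
	{
		(void)arg;
		/* Kernel side: futex_wait() -> futex_wait_setup(), which
		 * takes the hash-bucket spinlock before queueing us. */
		futex(&futex_word, FUTEX_WAIT, 0);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, waiter, NULL);
		sleep(1);	/* let the waiter block */
		futex_word = 1;	/* a real program would use an atomic store */
		/* Kernel side: futex_wake() takes the same bucket lock. */
		futex(&futex_word, FUTEX_WAKE, 1);
		pthread_join(t, NULL);
		printf("waiter woken\n");
		return 0;
	}

In the micro-benchmark the futex traffic presumably comes from glibc
internals and from the kernel's CLONE_CHILD_CLEARTID wakeup when each thread
exits, rather than from explicit futex() calls, but the kernel paths are the
same.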

Results on v5.7

=========

[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.850s
user    0m14.499s
sys    0m12.116s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.949s
user    0m14.285s
sys    0m18.408s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.885s
user    0m14.193s
sys    0m17.888s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.872s
user    0m14.451s
sys    0m18.717s
[root@jubi-bm-ol8 upstream]# uname -a
Linux jubi-bm-ol8 5.7.0-1700.20200601.el8uek.base.x86_64 #1 SMP Fri Jun 19 07:41:06 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux


Results on v5.7 with Eric's fix

=================

[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.889s
user    0m14.215s
sys    0m16.203s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.872s
user    0m14.431s
sys    0m17.737s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.908s
user    0m14.274s
sys    0m15.377s
[root@jubi-bm-ol8 upstream]# time ./perf_test

real    0m4.937s
user    0m14.632s
sys    0m16.255s
[root@jubi-bm-ol8 upstream]# uname -a
Linux jubi-bm-ol8 5.7.0-1700.20200601.el8uek.procfix.x86_64 #1 SMP Fri Jun 19 07:42:16 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Thanks,

Junxiao.

> 	int i, j, one = 1;
>
> 	for (i = 0; i < 1000; i++) {
> 		/* Start a batch of 100 workers... */
> 		for (j = 0; j < 100; j++)
> 			pthread_create(&threads[i % 20][j], NULL, worker, &one);
> 		if (i < 5)
> 			continue;
> 		/* ...and cancel the batch started five iterations ago, so
> 		 * a whole batch of threads exits at the same time. */
> 		for (j = 0; j < 100; j++)
> 			pthread_cancel(threads[(i - 5) % 20][j]);
> 	}
>
> 	return 0;
> }
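One aside on the micro, in case anyone reuses it: the cancelled threads are
created joinable and never joined, so their stacks are not reclaimed until
the process exits. That is fine for a throwaway test; a variant that wanted
to time exit plus reclamation could add, after the cancel loop, something
like:

	for (j = 0; j < 100; j++)
		pthread_join(threads[(i - 5) % 20][j], NULL);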

