From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
Yunlong Song <yunlong.song@huawei.com>,
Paul Mackerras <paulus@samba.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Wang Nan <wangnan0@huawei.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 07/19] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
Date: Wed, 8 Apr 2015 11:23:27 -0300
Message-ID: <1428503019-23820-8-git-send-email-acme@kernel.org>
In-Reply-To: <1428503019-23820-1-git-send-email-acme@kernel.org>
From: Yunlong Song <yunlong.song@huawei.com>
The current allocation of struct task_desc *pid_to_task[MAX_PID] is fixed
at compile time, as an array embedded in struct perf_sched, and this
causes two problems:
Problem 1: If pid_max, the upper bound on pid values in the system, is
much smaller than MAX_PID (1024*1000), most of the array is wasted stack
memory. This is the common case on machines with far fewer than 1000
CPUs, because the default pid_max scales at 1024 pids per possible CPU
(with a floor of 32768).
Problem 2: If pid_max is raised above MAX_PID, perf sched replay hits an
assertion failure as soon as it sees a pid >= MAX_PID. Sizing the static
array for the worst case is not an option either: pid_max can be set as
high as pid_max_max (see pidmap_init() in kernel/pid.c), which equals
PID_MAX_LIMIT. On x86_64, PID_MAX_LIMIT is 4*1024*1024 (defined in
include/linux/threads.h), so an array of that many 8-byte pointers would
take 32768 Kbytes (4*1024*1024*8/1024). That is far larger than the
common default stack size limit of 8192 Kbytes for the calling process.
To fix both problems, allocate pid_to_task dynamically with calloc(),
sized from the current value of kernel/pid_max (falling back to MAX_PID
if it cannot be read).
Example:
Test environment: x86_64 with 160 cores
$ cat /proc/sys/kernel/pid_max
163840
$ echo 1025000 > /proc/sys/kernel/pid_max
$ cat /proc/sys/kernel/pid_max
1025000
Run some applications until a process gets a pid greater than or equal
to MAX_PID (1024*1000).
Before this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55480 nsecs
the run test took 1000008 nsecs
the sleep test took 1063151 nsecs
perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
failed.
Aborted
After this patch:
$ perf sched replay
run measurement overhead: 221 nsecs
sleep measurement overhead: 55435 nsecs
the run test took 1000004 nsecs
the sleep test took 1059312 nsecs
nr_run_events: 10
nr_sleep_events: 1562
nr_wakeup_events: 5
task 0 ( :1: 1), nr_events: 1
task 1 ( :2: 2), nr_events: 1
task 2 ( :3: 3), nr_events: 1
task 3 ( :5: 5), nr_events: 1
...
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-sched.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c46610447ede..20d887b222e4 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -23,6 +23,7 @@
 #include <semaphore.h>
 #include <pthread.h>
 #include <math.h>
+#include <api/fs/fs.h>
 
 #define PR_SET_NAME 15 /* Set process name */
 #define MAX_CPUS 4096
@@ -124,7 +125,7 @@ struct perf_sched {
 	struct perf_tool tool;
 	const char *sort_order;
 	unsigned long nr_tasks;
-	struct task_desc *pid_to_task[MAX_PID];
+	struct task_desc **pid_to_task;
 	struct task_desc **tasks;
 	const struct trace_sched_handler *tp_handler;
 	pthread_mutex_t start_work_mutex;
@@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 				      unsigned long pid, const char *comm)
 {
 	struct task_desc *task;
+	static int pid_max;
 
-	BUG_ON(pid >= MAX_PID);
+	if (sched->pid_to_task == NULL) {
+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
+			pid_max = MAX_PID;
+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
+	}
+	BUG_ON(pid >= (unsigned long)pid_max);
 
 	task = sched->pid_to_task[pid];
 
--
1.9.3