All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Yunlong Song <yunlong.song@huawei.com>,
	Paul Mackerras <paulus@samba.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Wang Nan <wangnan0@huawei.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 07/19] perf sched replay: Alloc the memory of pid_to_task dynamically to adapt to the unexpected change of pid_max
Date: Wed,  8 Apr 2015 11:23:27 -0300	[thread overview]
Message-ID: <1428503019-23820-8-git-send-email-acme@kernel.org> (raw)
In-Reply-To: <1428503019-23820-1-git-send-email-acme@kernel.org>

From: Yunlong Song <yunlong.song@huawei.com>

The current memory allocation of struct task_desc *pid_to_task[MAX_PID]
is in a permanent and preset way, and it has two problems:

Problem 1: If the pid_max, which is the max number of pids in the
system, is much smaller than MAX_PID (1024*1000), then it causes a waste
of stack memory. This may happen in the case where the number of cpu
cores is much smaller than 1000.

Problem 2: If the pid_max is changed from the default value to a value
larger than MAX_PID, then it will cause assertion failure problem. The
maximum value of pid_max can be set to pid_max_max (see pidmap_init
defined in kernel/pid.c), which equals to PID_MAX_LIMIT. In x86_64,
PID_MAX_LIMIT is 4*1024*1024 (defined in include/linux/threads.h). This
value is much larger than MAX_PID, and will take up 32768 Kbytes
(4*1024*1024*8/1024) for memory allocation of pid_to_task, which is much
larger than the default 8192 Kbytes of the stack size of calling
process.

Due to these two problems, we use calloc to allocate the memory of
pid_to_task dynamically.

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ echo 1025000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 1025000

Run some applications until the pid of some process is greater than
the value of MAX_PID (1024*1000).

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55480 nsecs
 the run test took 1000008 nsecs
 the sleep test took 1063151 nsecs
 perf: builtin-sched.c:330: register_pid: Assertion `!(pid >= 1024000)'
 failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55435 nsecs
 the run test took 1000004 nsecs
 the sleep test took 1059312 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-4-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c46610447ede..20d887b222e4 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -23,6 +23,7 @@
 #include <semaphore.h>
 #include <pthread.h>
 #include <math.h>
+#include <api/fs/fs.h>
 
 #define PR_SET_NAME		15               /* Set process name */
 #define MAX_CPUS		4096
@@ -124,7 +125,7 @@ struct perf_sched {
 	struct perf_tool tool;
 	const char	 *sort_order;
 	unsigned long	 nr_tasks;
-	struct task_desc *pid_to_task[MAX_PID];
+	struct task_desc **pid_to_task;
 	struct task_desc **tasks;
 	const struct trace_sched_handler *tp_handler;
 	pthread_mutex_t	 start_work_mutex;
@@ -326,8 +327,14 @@ static struct task_desc *register_pid(struct perf_sched *sched,
 				      unsigned long pid, const char *comm)
 {
 	struct task_desc *task;
+	static int pid_max;
 
-	BUG_ON(pid >= MAX_PID);
+	if (sched->pid_to_task == NULL) {
+		if (sysctl__read_int("kernel/pid_max", &pid_max) < 0)
+			pid_max = MAX_PID;
+		BUG_ON((sched->pid_to_task = calloc(pid_max, sizeof(struct task_desc *))) == NULL);
+	}
+	BUG_ON(pid >= (unsigned long)pid_max);
 
 	task = sched->pid_to_task[pid];
 
-- 
1.9.3


  parent reply	other threads:[~2015-04-08 14:28 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-08 14:23 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
2015-04-08 14:23 ` Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 01/19] perf evlist: Fix inverted logic in perf_mmap__empty Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 02/19] perf kmaps: Check kmaps to make code more robust Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 03/19] tools lib traceevent: Honor operator priority Arnaldo Carvalho de Melo
2015-04-08 14:23   ` Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 04/19] perf kmem: Respect -i option Arnaldo Carvalho de Melo
2015-04-08 14:23   ` Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 05/19] perf sched replay: Use struct task_desc instead of struct task_task for correct meaning Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 06/19] perf sched replay: Increase the MAX_PID value to fix assertion failure problem Arnaldo Carvalho de Melo
2015-04-08 14:23 ` Arnaldo Carvalho de Melo [this message]
2015-04-08 14:23 ` [PATCH 08/19] perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 09/19] perf sched replay: Fix the segmentation fault problem caused by pr_err in threads Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 10/19] perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 11/19] perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 12/19] perf sched replay: Support using -f to override perf.data file ownership Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 13/19] perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 14/19] perf record: Add clockid parameter Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 15/19] perf tools: Merge all perf_event_attr print functions Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 16/19] perf probe: Fix ARM 32 building error Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 17/19] perf tests: Fix attr tests Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 18/19] perf report: Don't call map__kmap if map is NULL Arnaldo Carvalho de Melo
2015-04-08 14:23 ` [PATCH 19/19] perf tools: Add 'I' event modifier for exclude_idle bit Arnaldo Carvalho de Melo
2015-04-08 15:05 ` [GIT PULL 00/19] perf/core improvements and fixes Ingo Molnar
2015-04-08 15:05   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1428503019-23820-8-git-send-email-acme@kernel.org \
    --to=acme@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulus@samba.org \
    --cc=wangnan0@huawei.com \
    --cc=yunlong.song@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.