Commit 34d76c41 causes linker errors on ia64 with NR

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
@ 2009-10-18 22:28 Jeff Mahoney
  2009-10-20  2:02 ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Mahoney @ 2009-10-18 22:28 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Linux Kernel Mailing List

Hi Peter -

Commit 34d76c41 introduced the update_shares_data percpu, but it ends up
causing problems on large ia64 machines. Specifically, ia64 is limited
to 64k in percpu vars and with NR_CPUS=4096, that ends up being 32k by
itself. It ends up causing link errors since that is how ia64 enforces
the 64k limit.

I can take a deeper look at finding a workable solution but thought I'd
mention it in case you had ideas already.

-Jeff

-- 
Jeff Mahoney
SuSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-18 22:28 Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096 Jeff Mahoney
@ 2009-10-20  2:02 ` Jiri Kosina
  2009-10-20  4:57   ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  2:02 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Tony Luck, Fenghua Yu,
	linux-ia64

On Sun, 18 Oct 2009, Jeff Mahoney wrote:

> Hi Peter -
> 
> Commit 34d76c41 introduced the update_shares_data percpu, but it ends up
> causing problems on large ia64 machines. Specifically, ia64 is limited
> to 64k in percpu vars and with NR_CPUS=4096, that ends up being 32k by
> itself. It ends up causing link errors since that is how ia64 enforces
> the 64k limit.
> 
> I can take a deeper look at finding a workable solution but thought I'd
> mention it in case you had ideas already.

I am adding some IA64 CCs, as the failure is solely caused by the ia64 
percpu implementation/pagefault handler optimization which requires the 
.percpu section area not be larger than 64k, which blows up with 34d76c41 
and NR_CPUS high enoufh (due to introduction of percpu array being 
size-dependent on NR_CPUS).

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  2:02 ` Jiri Kosina
@ 2009-10-20  4:57   ` Jiri Kosina
  2009-10-20  5:21     ` Tejun Heo
  2009-10-20  6:15     ` Ingo Molnar
  0 siblings, 2 replies; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  4:57 UTC (permalink / raw)
  To: Jeff Mahoney, Peter Zijlstra
  Cc: Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Jiri Kosina wrote:

> > Commit 34d76c41 introduced the update_shares_data percpu, but it ends up
> > causing problems on large ia64 machines. Specifically, ia64 is limited
> > to 64k in percpu vars and with NR_CPUS=4096, that ends up being 32k by
> > itself. It ends up causing link errors since that is how ia64 enforces
> > the 64k limit.
> > 
> > I can take a deeper look at finding a workable solution but thought I'd
> > mention it in case you had ideas already.
> 
> I am adding some IA64 CCs, as the failure is solely caused by the ia64 
> percpu implementation/pagefault handler optimization which requires the 
> .percpu section area not be larger than 64k, which blows up with 34d76c41 
> and NR_CPUS high enoufh (due to introduction of percpu array being 
> size-dependent on NR_CPUS).

How about this one? (untested)



From: Jiri Kosina <jkosina@suse.cz>
Subject: sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced percpu array update_shares_data, size of which 
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it 
percpu.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
--- 
 kernel/sched.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e886895..21337da 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1564,11 +1564,9 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-struct update_shares_data {
-	unsigned long rq_weight[NR_CPUS];
-};
+unsigned long *update_shares_data;
 
-static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
+static DEFINE_PER_CPU(unsigned long *, update_shares_data);
 
 static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 
@@ -1578,12 +1576,12 @@ static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 static void update_group_shares_cpu(struct task_group *tg, int cpu,
 				    unsigned long sd_shares,
 				    unsigned long sd_rq_weight,
-				    struct update_shares_data *usd)
+				    unsigned long *usd)
 {
 	unsigned long shares, rq_weight;
 	int boost = 0;
 
-	rq_weight = usd->rq_weight[cpu];
+	rq_weight = usd[cpu];
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
@@ -1618,7 +1616,7 @@ static void update_group_shares_cpu(struct task_group *tg, int cpu,
 static int tg_shares_up(struct task_group *tg, void *data)
 {
 	unsigned long weight, rq_weight = 0, shares = 0;
-	struct update_shares_data *usd;
+	unsigned long *usd;
 	struct sched_domain *sd = data;
 	unsigned long flags;
 	int i;
@@ -1631,7 +1629,7 @@ static int tg_shares_up(struct task_group *tg, void *data)
 
 	for_each_cpu(i, sched_domain_span(sd)) {
 		weight = tg->cfs_rq[i]->load.weight;
-		usd->rq_weight[i] = weight;
+		usd[i] = weight;
 
 		/*
 		 * If there are currently no tasks on the cpu pretend there
@@ -9420,6 +9418,7 @@ void __init sched_init(void)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 		init_task_group.shares = init_task_group_load;
 		INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
+		__get_cpu_var(update_shares_data) = kmalloc(NR_CPUS * sizeof(unsigned long));
 #ifdef CONFIG_CGROUP_SCHED
 		/*
 		 * How much cpu bandwidth does init_task_group get?

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  4:57   ` Jiri Kosina
@ 2009-10-20  5:21     ` Tejun Heo
  2009-10-20  5:58       ` Jiri Kosina
  2009-10-27 10:52       ` Jiri Kosina
  2009-10-20  6:15     ` Ingo Molnar
  1 sibling, 2 replies; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  5:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

Hello,

Jiri Kosina wrote:
> How about this one? (untested)
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: sched: move rq_weight data array out of .percpu
> 
> Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> 
> Fix this by allocating this array dynamically and keep only pointer to it 
> percpu.

If the structure can be dynamically allocated at all, just using
percpu_alloc() to allocate the whole thing should solve it with the
correct numa locality.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  5:21     ` Tejun Heo
@ 2009-10-20  5:58       ` Jiri Kosina
  2009-10-20  6:12         ` Tejun Heo
  2009-10-27 10:52       ` Jiri Kosina
  1 sibling, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  5:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Tejun Heo wrote:

> If the structure can be dynamically allocated at all, just using
> percpu_alloc() to allocate the whole thing should solve it with the
> correct numa locality.
> 
> Thanks.

So how about something along the lines below? (completely untested as 
well)



From: Jiri Kosina <jkosina@suse.cz>
Subject: sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.
                                                                                                     
Signed-off-by: Jiri Kosina <jkosina@suse.cz>   
--- 
 kernel/sched.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e886895..071cc66 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1564,11 +1564,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-struct update_shares_data {
-	unsigned long rq_weight[NR_CPUS];
-};
-
-static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
+unsigned long *update_shares_data;
 
 static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 
@@ -1578,12 +1574,12 @@ static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 static void update_group_shares_cpu(struct task_group *tg, int cpu,
 				    unsigned long sd_shares,
 				    unsigned long sd_rq_weight,
-				    struct update_shares_data *usd)
+				    unsigned long *usd)
 {
 	unsigned long shares, rq_weight;
 	int boost = 0;
 
-	rq_weight = usd->rq_weight[cpu];
+	rq_weight = usd[cpu];
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
@@ -1618,7 +1614,7 @@ static void update_group_shares_cpu(struct task_group *tg, int cpu,
 static int tg_shares_up(struct task_group *tg, void *data)
 {
 	unsigned long weight, rq_weight = 0, shares = 0;
-	struct update_shares_data *usd;
+	unsigned long *usd;
 	struct sched_domain *sd = data;
 	unsigned long flags;
 	int i;
@@ -1627,11 +1623,10 @@ static int tg_shares_up(struct task_group *tg, void *data)
 		return 0;
 
 	local_irq_save(flags);
-	usd = &__get_cpu_var(update_shares_data);
 
 	for_each_cpu(i, sched_domain_span(sd)) {
 		weight = tg->cfs_rq[i]->load.weight;
-		usd->rq_weight[i] = weight;
+		usd = *per_cpu_ptr(update_shares_data, i) = weight;
 
 		/*
 		 * If there are currently no tasks on the cpu pretend there
@@ -9407,6 +9402,10 @@ void __init sched_init(void)
 #endif /* CONFIG_USER_SCHED */
 #endif /* CONFIG_GROUP_SCHED */
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	update_shares_data = __alloc_percpu(NR_CPUS * sizeof(unsigned long),
+			unsigned long);
+#endif
 	for_each_possible_cpu(i) {
 		struct rq *rq;
 

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  5:58       ` Jiri Kosina
@ 2009-10-20  6:12         ` Tejun Heo
  2009-10-20  6:14           ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  6:12 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

Jiri Kosina wrote:
> So how about something along the lines below? (completely untested as 
> well)
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: sched: move rq_weight data array out of .percpu
> 
> Commit 34d76c41 introduced percpu array update_shares_data, size of which
> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> 
> Fix this by allocating this array dynamically and keep only pointer to it
> percpu.

Looks good to me from percpu POV.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:12         ` Tejun Heo
@ 2009-10-20  6:14           ` Tejun Heo
  2009-10-20  6:27             ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  6:14 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

Tejun Heo wrote:
> Jiri Kosina wrote:
>> So how about something along the lines below? (completely untested as 
>> well)
>>
>> From: Jiri Kosina <jkosina@suse.cz>
>> Subject: sched: move rq_weight data array out of .percpu
>>
>> Commit 34d76c41 introduced percpu array update_shares_data, size of which
>> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
>> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
>>
>> Fix this by allocating this array dynamically and keep only pointer to it
>> percpu.
> 
> Looks good to me from percpu POV.

Oh... one thing.  If you're doing dynamic allocation you can use
nr_cpu_ids instead of NR_CPUS.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:14           ` Tejun Heo
@ 2009-10-20  6:27             ` Jiri Kosina
  2009-10-20 14:18               ` Jeff Mahoney
  0 siblings, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  6:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Tejun Heo wrote:

> Oh... one thing.  If you're doing dynamic allocation you can use
> nr_cpu_ids instead of NR_CPUS.

Right, that's proper optimization.



From: Jiri Kosina <jkosina@suse.cz>
Subject: sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 kernel/sched.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index e886895..e70b526 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1564,11 +1564,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-struct update_shares_data {
-	unsigned long rq_weight[NR_CPUS];
-};
-
-static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
+unsigned long *update_shares_data;
 
 static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 
@@ -1578,12 +1574,12 @@ static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 static void update_group_shares_cpu(struct task_group *tg, int cpu,
 				    unsigned long sd_shares,
 				    unsigned long sd_rq_weight,
-				    struct update_shares_data *usd)
+				    unsigned long *usd)
 {
 	unsigned long shares, rq_weight;
 	int boost = 0;
 
-	rq_weight = usd->rq_weight[cpu];
+	rq_weight = usd[cpu];
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
@@ -1618,7 +1614,7 @@ static void update_group_shares_cpu(struct task_group *tg, int cpu,
 static int tg_shares_up(struct task_group *tg, void *data)
 {
 	unsigned long weight, rq_weight = 0, shares = 0;
-	struct update_shares_data *usd;
+	unsigned long *usd;
 	struct sched_domain *sd = data;
 	unsigned long flags;
 	int i;
@@ -1627,11 +1623,10 @@ static int tg_shares_up(struct task_group *tg, void *data)
 		return 0;
 
 	local_irq_save(flags);
-	usd = &__get_cpu_var(update_shares_data);
 
 	for_each_cpu(i, sched_domain_span(sd)) {
 		weight = tg->cfs_rq[i]->load.weight;
-		usd->rq_weight[i] = weight;
+		usd = *per_cpu_ptr(update_shares_data, i) = weight;
 
 		/*
 		 * If there are currently no tasks on the cpu pretend there
@@ -9407,6 +9402,10 @@ void __init sched_init(void)
 #endif /* CONFIG_USER_SCHED */
 #endif /* CONFIG_GROUP_SCHED */
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	update_shares_data = __alloc_percpu(nr_cpu_ids * sizeof(unsigned long),
+			unsigned long);
+#endif
 	for_each_possible_cpu(i) {
 		struct rq *rq;
 

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:27             ` Jiri Kosina
@ 2009-10-20 14:18               ` Jeff Mahoney
  2009-10-20 14:49                 ` Jeff Mahoney
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Mahoney @ 2009-10-20 14:18 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Tejun Heo, Peter Zijlstra, Linux Kernel Mailing List, Tony Luck,
	Fenghua Yu, linux-ia64

On 10/20/2009 02:27 AM, Jiri Kosina wrote:
> @@ -1627,11 +1623,10 @@ static int tg_shares_up(struct task_group *tg, void *data)
>  		return 0;
>  
>  	local_irq_save(flags);
> -	usd = &__get_cpu_var(update_shares_data);
>  
>  	for_each_cpu(i, sched_domain_span(sd)) {
>  		weight = tg->cfs_rq[i]->load.weight;
> -		usd->rq_weight[i] = weight;
> +		usd = *per_cpu_ptr(update_shares_data, i) = weight;
>  
>  		/*
>  		 * If there are currently no tasks on the cpu pretend there

I don't think this is what you want here.

In the original version, usd is the percpu var using the current cpu. In
your version, usd is the percpu var using i instead of the current cpu.

I'll post my version of the patch shortly. I don't think keeping most of
the original version is a bad thing. We can just allocate it dynamically
instead.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 14:18               ` Jeff Mahoney
@ 2009-10-20 14:49                 ` Jeff Mahoney
  2009-10-21  6:11                   ` Ingo Molnar
  2009-10-26 20:16                   ` Ingo Molnar
  0 siblings, 2 replies; 36+ messages in thread
From: Jeff Mahoney @ 2009-10-20 14:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Tejun Heo, Peter Zijlstra, Linux Kernel Mailing List, Tony Luck,
	Fenghua Yu, linux-ia64

On 10/20/2009 10:18 AM, Jeff Mahoney wrote:
> On 10/20/2009 02:27 AM, Jiri Kosina wrote:
>> @@ -1627,11 +1623,10 @@ static int tg_shares_up(struct task_group *tg, void *data)
>>  		return 0;
>>  
>>  	local_irq_save(flags);
>> -	usd = &__get_cpu_var(update_shares_data);
>>  
>>  	for_each_cpu(i, sched_domain_span(sd)) {
>>  		weight = tg->cfs_rq[i]->load.weight;
>> -		usd->rq_weight[i] = weight;
>> +		usd = *per_cpu_ptr(update_shares_data, i) = weight;
>>  
>>  		/*
>>  		 * If there are currently no tasks on the cpu pretend there
> 
> I don't think this is what you want here.
> 
> In the original version, usd is the percpu var using the current cpu. In
> your version, usd is the percpu var using i instead of the current cpu.
> 
> I'll post my version of the patch shortly. I don't think keeping most of
> the original version is a bad thing. We can just allocate it dynamically
> instead.

This version fixes a build issue (__alignof__(unsigned long)) and fixes the
percpu lookup to be usd = percpu array pointer, usd[i] = actual variable.

-Jeff

From: Jiri Kosina <jkosina@suse.cz>
Subject: sched: move rq_weight data array out of .percpu

Commit 34d76c41 introduced percpu array update_shares_data, size of which
being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
NR_CPUS configuration, as ia64 allows only 64k for .percpu section.

Fix this by allocating this array dynamically and keep only pointer to it
percpu.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
--- 
 kernel/sched.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1564,11 +1564,7 @@ static unsigned long cpu_avg_load_per_ta
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
-struct update_shares_data {
-	unsigned long rq_weight[NR_CPUS];
-};
-
-static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
+unsigned long *update_shares_data;
 
 static void __set_se_shares(struct sched_entity *se, unsigned long shares);
 
@@ -1578,12 +1574,12 @@ static void __set_se_shares(struct sched
 static void update_group_shares_cpu(struct task_group *tg, int cpu,
 				    unsigned long sd_shares,
 				    unsigned long sd_rq_weight,
-				    struct update_shares_data *usd)
+				    unsigned long *usd)
 {
 	unsigned long shares, rq_weight;
 	int boost = 0;
 
-	rq_weight = usd->rq_weight[cpu];
+	rq_weight = usd[cpu];
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
@@ -1618,7 +1614,7 @@ static void update_group_shares_cpu(stru
 static int tg_shares_up(struct task_group *tg, void *data)
 {
 	unsigned long weight, rq_weight = 0, shares = 0;
-	struct update_shares_data *usd;
+	unsigned long *usd;
 	struct sched_domain *sd = data;
 	unsigned long flags;
 	int i;
@@ -1627,11 +1623,11 @@ static int tg_shares_up(struct task_grou
 		return 0;
 
 	local_irq_save(flags);
-	usd = &__get_cpu_var(update_shares_data);
+	usd = per_cpu_ptr(update_shares_data, smp_processor_id());
 
 	for_each_cpu(i, sched_domain_span(sd)) {
 		weight = tg->cfs_rq[i]->load.weight;
-		usd->rq_weight[i] = weight;
+		usd[i] = weight;
 
 		/*
 		 * If there are currently no tasks on the cpu pretend there
@@ -9407,6 +9403,10 @@ void __init sched_init(void)
 #endif /* CONFIG_USER_SCHED */
 #endif /* CONFIG_GROUP_SCHED */
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	update_shares_data = __alloc_percpu(nr_cpu_ids * sizeof(unsigned long),
+					    __alignof__(unsigned long));
+#endif
 	for_each_possible_cpu(i) {
 		struct rq *rq;
 
 

-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 14:49                 ` Jeff Mahoney
@ 2009-10-21  6:11                   ` Ingo Molnar
  2009-10-21 15:19                     ` Tejun Heo
  2009-10-26 20:16                   ` Ingo Molnar
  1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2009-10-21  6:11 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Jiri Kosina, Tejun Heo, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64


* Jeff Mahoney <jeffm@suse.com> wrote:

> This version fixes a build issue (__alignof__(unsigned long)) and 
> fixes the percpu lookup to be usd = percpu array pointer, usd[i] = 
> actual variable.

Before applying this part-workaround i'd like to see confirmation that 
the IA64 limitation is being fixed. It would be bad for such bugs to pop 
up again and again in the future, with a 2 months bug fixing latency.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-21  6:11                   ` Ingo Molnar
@ 2009-10-21 15:19                     ` Tejun Heo
  2009-10-21 22:11                       ` Luck, Tony
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-21 15:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Mahoney, Jiri Kosina, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

Ingo Molnar wrote:
> * Jeff Mahoney <jeffm@suse.com> wrote:
> 
>> This version fixes a build issue (__alignof__(unsigned long)) and 
>> fixes the percpu lookup to be usd = percpu array pointer, usd[i] = 
>> actual variable.
> 
> Before applying this part-workaround i'd like to see confirmation that 
> the IA64 limitation is being fixed. It would be bad for such bugs to pop 
> up again and again in the future, with a 2 months bug fixing latency.

I talked with Christoph earlier and he basically said that the
constant can just be bumped up without any ill effect (it would be
great if Tony Luck can confirm this).  So, it's an annoying thing and
in the long term removing would be a good idea but I don't think it's
anything major as long as it blows up during build not runtime.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-21 15:19                     ` Tejun Heo
@ 2009-10-21 22:11                       ` Luck, Tony
  2009-10-22 14:49                         ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2009-10-21 22:11 UTC (permalink / raw)
  To: Tejun Heo, Ingo Molnar
  Cc: Jeff Mahoney, Jiri Kosina, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

> I talked with Christoph earlier and he basically said that the
> constant can just be bumped up without any ill effect (it would be
> great if Tony Luck can confirm this).  So, it's an annoying thing and
> in the long term removing would be a good idea but I don't think it's
> anything major as long as it blows up during build not runtime.

Yes, ia64 can use any supported page size for the percpu area (I
tried a couple of different options, and saw no obvious problems).

But ... the architecturally supported page sizes go up by powers
of 4, so next choice from 64K is 256K then 1M, 4M, etc.  This is
also requires an edit of source code and re-compile.  We could easily
make it a config option ... but that is still inconvenient.

The bloat introduced by adding percpu variables is multiplied by
NR_CPUS ... and in my case that is 4096.  It is easy to just shrug
this off and say that such big systems have plenty of memory anyway,
but the case that led to this issue (adding a percpu object that
included a [NR_CPUS] array) shows that, IMHO, people are do not
care enough about the bloat.

I suspect that if I just increase the percpu area to 256K or 1M,
I'll see this same issue when someone adds:

struct foo {
       char buf[NR_CPUS][PAGE_SIZE];
};
DECLARE_PER_CPU(struct foo, bar);

which needs 4k * 64k = 256M of per-cpu space ... i.e. 1T total.

If such code is going to be deemed acceptable, then we do need
to move away from the ia64 TLB mapped percpu area.

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-21 22:11                       ` Luck, Tony
@ 2009-10-22 14:49                         ` Jiri Kosina
  2009-10-22 14:53                           ` Jeff Mahoney
  2009-10-22 22:24                           ` Luck, Tony
  0 siblings, 2 replies; 36+ messages in thread
From: Jiri Kosina @ 2009-10-22 14:49 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Tejun Heo, Ingo Molnar, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

On Wed, 21 Oct 2009, Luck, Tony wrote:

> But ... the architecturally supported page sizes go up by powers of 4, 
> so next choice from 64K is 256K then 1M, 4M, etc.  This is also requires 
> an edit of source code and re-compile.  We could easily make it a config 
> option ... but that is still inconvenient.
> 
> The bloat introduced by adding percpu variables is multiplied by NR_CPUS 
> ... and in my case that is 4096.  It is easy to just shrug this off and 
> say that such big systems have plenty of memory anyway, but the case 
> that led to this issue (adding a percpu object that included a [NR_CPUS] 
> array) shows that, IMHO, people are do not care enough about the bloat.
> 
> I suspect that if I just increase the percpu area to 256K or 1M, I'll 
> see this same issue when someone adds:
> 
> struct foo {
>        char buf[NR_CPUS][PAGE_SIZE];
> };
> DECLARE_PER_CPU(struct foo, bar);
> 
> which needs 4k * 64k = 256M of per-cpu space ... i.e. 1T total.
> 
> If such code is going to be deemed acceptable, then we do need
> to move away from the ia64 TLB mapped percpu area.

Well, I must say I slightly agree that my gut feeling is that we should 
try to avoid arrays which size depends on NR_CPUS as much as possible.

Now, what to do for 2.6.32? We definitely need some kind of fix for this, 
otherwise the Altix guys would kill us.

Tony, is the change that will eventually have to be made to ia64 pagefault 
handler too intrusive for -rc6, and should we rather go with my workaround 
instead, and try to find something proper for 2.6.33?

Pure revert of 34d76c41 isn't possible as well, as it fixes a real bug.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-22 14:49                         ` Jiri Kosina
@ 2009-10-22 14:53                           ` Jeff Mahoney
  2009-10-22 22:24                           ` Luck, Tony
  1 sibling, 0 replies; 36+ messages in thread
From: Jeff Mahoney @ 2009-10-22 14:53 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Luck, Tony, Tejun Heo, Ingo Molnar, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

On 10/22/2009 10:49 AM, Jiri Kosina wrote:
> On Wed, 21 Oct 2009, Luck, Tony wrote:
> 
>> But ... the architecturally supported page sizes go up by powers of 4, 
>> so next choice from 64K is 256K then 1M, 4M, etc.  This is also requires 
>> an edit of source code and re-compile.  We could easily make it a config 
>> option ... but that is still inconvenient.
>>
>> The bloat introduced by adding percpu variables is multiplied by NR_CPUS 
>> ... and in my case that is 4096.  It is easy to just shrug this off and 
>> say that such big systems have plenty of memory anyway, but the case 
>> that led to this issue (adding a percpu object that included a [NR_CPUS] 
>> array) shows that, IMHO, people are do not care enough about the bloat.
>>
>> I suspect that if I just increase the percpu area to 256K or 1M, I'll 
>> see this same issue when someone adds:
>>
>> struct foo {
>>        char buf[NR_CPUS][PAGE_SIZE];
>> };
>> DECLARE_PER_CPU(struct foo, bar);
>>
>> which needs 4k * 64k = 256M of per-cpu space ... i.e. 1T total.
>>
>> If such code is going to be deemed acceptable, then we do need
>> to move away from the ia64 TLB mapped percpu area.
> 
> Well, I must say I slightly agree that my gut feeling is that we should 
> try to avoid arrays which size depends on NR_CPUS as much as possible.

Agreed.

> Now, what to do for 2.6.32? We definitely need some kind of fix for this, 
> otherwise the Altix guys would kill us.
> 
> Tony, is the change that will eventually have to be made to ia64 pagefault 
> handler too intrusive for -rc6, and should we rather go with my workaround 
> instead, and try to find something proper for 2.6.33?

It's not an either-or situation. It's not really a workaround. Distros
configure NR_CPUS for the maximum possible for a given architecture even
though the majority of systems don't come close to it. We're just
wasting memory. I've seen other fixes that are just "make percpu xyz
dynamic" without much debate. I don't see why this one should be much
different other than the fact that the original report raised a red flag
about an implementation limitation.

Yes, the ia64 limitation should be revisited, but that is a separate issue.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-22 14:49                         ` Jiri Kosina
  2009-10-22 14:53                           ` Jeff Mahoney
@ 2009-10-22 22:24                           ` Luck, Tony
  2009-10-23  7:51                             ` Ingo Molnar
  1 sibling, 1 reply; 36+ messages in thread
From: Luck, Tony @ 2009-10-22 22:24 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Tejun Heo, Ingo Molnar, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

> Tony, is the change that will eventually have to be made to ia64 pagefault 
> handler too intrusive for -rc6, and should we rather go with my workaround 
> instead, and try to find something proper for 2.6.33?

Using __alloc_percpu() rather than static declaration looks to be
the right fix here.  Not a "workaround".

-Tony

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-22 22:24                           ` Luck, Tony
@ 2009-10-23  7:51                             ` Ingo Molnar
  2009-10-23 12:30                               ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2009-10-23  7:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Jiri Kosina, Tejun Heo, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org


* Luck, Tony <tony.luck@intel.com> wrote:

> > Tony, is the change that will eventually have to be made to ia64 
> > pagefault handler too intrusive for -rc6, and should we rather go 
> > with my workaround instead, and try to find something proper for 
> > 2.6.33?
> 
> Using __alloc_percpu() rather than static declaration looks to be the 
> right fix here.  Not a "workaround".

It is a workaround for the IA64 build failure.

It's also an improvement of the scheduler code (we generally try to 
eliminate NR_CPUs scaling of allocations) - but code improvements need 
to happen much sooner than -rc6/-rc7.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-23  7:51                             ` Ingo Molnar
@ 2009-10-23 12:30                               ` Jiri Kosina
  2009-10-26 16:38                                 ` Jiri Kosina
  0 siblings, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-23 12:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Luck, Tony, Tejun Heo, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

On Fri, 23 Oct 2009, Ingo Molnar wrote:

> > > Tony, is the change that will eventually have to be made to ia64 
> > > pagefault handler too intrusive for -rc6, and should we rather go 
> > > with my workaround instead, and try to find something proper for 
> > > 2.6.33?
> > 
> > Using __alloc_percpu() rather than static declaration looks to be the 
> > right fix here.  Not a "workaround".
> 
> It is a workaround for the IA64 build failure.
> 
> It's also an improvement of the scheduler code (we generally try to 
> eliminate NR_CPUs scaling of allocations) - but code improvements need 
> to happen much sooner than -rc6/-rc7.

Unfortunately refactoring the ia4 pagefault handler would require some 
refactoring in order to eliminate the 64k limitation, as far as I know, 
which doesn't seem to be feasible for this timeframe either.

So if I personally have to chose from these two options, I'd prefer my 
patch for 2.6.32 and it'd be helpful if ia64 people could look into doing 
something with the 64k limit for 2.6.33.

Acceptable for everyone? Ingo/Peter, will you merge this asap please, so 
that we don't miss 2.6.32 with it?

Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-23 12:30                               ` Jiri Kosina
@ 2009-10-26 16:38                                 ` Jiri Kosina
  0 siblings, 0 replies; 36+ messages in thread
From: Jiri Kosina @ 2009-10-26 16:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Luck, Tony, Tejun Heo, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Yu, Fenghua,
	linux-ia64@vger.kernel.org

On Fri, 23 Oct 2009, Jiri Kosina wrote:

> > > > Tony, is the change that will eventually have to be made to ia64 
> > > > pagefault handler too intrusive for -rc6, and should we rather go 
> > > > with my workaround instead, and try to find something proper for 
> > > > 2.6.33?
> > > 
> > > Using __alloc_percpu() rather than static declaration looks to be the 
> > > right fix here.  Not a "workaround".
> > 
> > It is a workaround for the IA64 build failure.
> > 
> > It's also an improvement of the scheduler code (we generally try to 
> > eliminate NR_CPUs scaling of allocations) - but code improvements need 
> > to happen much sooner than -rc6/-rc7.
> 
> Unfortunately refactoring the ia4 pagefault handler would require some 
> refactoring in order to eliminate the 64k limitation, as far as I know, 
> which doesn't seem to be feasible for this timeframe either.
> 
> So if I personally have to chose from these two options, I'd prefer my 
> patch for 2.6.32 and it'd be helpful if ia64 people could look into doing 
> something with the 64k limit for 2.6.33.
> 
> Acceptable for everyone? Ingo/Peter, will you merge this asap please, so 
> that we don't miss 2.6.32 with it?

This stil seems unfixed in the relevant trees I have checked ... anyone 
has strong objections to this, which cause the patch to be delayed?

Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 14:49                 ` Jeff Mahoney
  2009-10-21  6:11                   ` Ingo Molnar
@ 2009-10-26 20:16                   ` Ingo Molnar
  2009-10-27 10:03                     ` Jiri Kosina
  1 sibling, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2009-10-26 20:16 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Jiri Kosina, Tejun Heo, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64


* Jeff Mahoney <jeffm@suse.com> wrote:

> On 10/20/2009 10:18 AM, Jeff Mahoney wrote:
> > On 10/20/2009 02:27 AM, Jiri Kosina wrote:
> >> @@ -1627,11 +1623,10 @@ static int tg_shares_up(struct task_group *tg, void *data)
> >>  		return 0;
> >>  
> >>  	local_irq_save(flags);
> >> -	usd = &__get_cpu_var(update_shares_data);
> >>  
> >>  	for_each_cpu(i, sched_domain_span(sd)) {
> >>  		weight = tg->cfs_rq[i]->load.weight;
> >> -		usd->rq_weight[i] = weight;
> >> +		usd = *per_cpu_ptr(update_shares_data, i) = weight;
> >>  
> >>  		/*
> >>  		 * If there are currently no tasks on the cpu pretend there
> > 
> > I don't think this is what you want here.
> > 
> > In the original version, usd is the percpu var using the current cpu. In
> > your version, usd is the percpu var using i instead of the current cpu.
> > 
> > I'll post my version of the patch shortly. I don't think keeping most of
> > the original version is a bad thing. We can just allocate it dynamically
> > instead.
> 
> This version fixes a build issue (__alignof__(unsigned long)) and fixes the
> percpu lookup to be usd = percpu array pointer, usd[i] = actual variable.
> 
> -Jeff
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: sched: move rq_weight data array out of .percpu
> 
> Commit 34d76c41 introduced percpu array update_shares_data, size of which
> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> 
> Fix this by allocating this array dynamically and keep only pointer to it
> percpu.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> --- 
>  kernel/sched.c |   20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -1564,11 +1564,7 @@ static unsigned long cpu_avg_load_per_ta
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  
> -struct update_shares_data {
> -	unsigned long rq_weight[NR_CPUS];
> -};
> -
> -static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
> +unsigned long *update_shares_data;

That should be __read_mostly - this can be a hotly accessed data 
structure of the scheduler - if it happens to go next to a frequently 
bouncing variable that can be bad for performance.

> -				    struct update_shares_data *usd)
> +				    unsigned long *usd)

I dont think using usd[cpu] is clearer than usd->rq_weight[cpu]. At 
minimum it should be renamed to usd_rq_weight not usd.

>  	local_irq_save(flags);
> -	usd = &__get_cpu_var(update_shares_data);
> +	usd = per_cpu_ptr(update_shares_data, smp_processor_id());

Could we please have a look at the before/after assembly of this 
sequence on x86, to make sure the claims in this thread are true and we 
dont lose performance? (and included it in the changelog with a 
resubmission - with a new, changed '[PATCH] ...' subject line, not 
hidden inside a discussion thread.)

>From a merge POV i'm quite nervous about such a change to the scheduler 
this late in the .32 cycle - to offset that risk i'd really like to see 
that this change has been pursued carefully to the edge of possibilities 
- currently it does not give that impression.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-26 20:16                   ` Ingo Molnar
@ 2009-10-27 10:03                     ` Jiri Kosina
  0 siblings, 0 replies; 36+ messages in thread
From: Jiri Kosina @ 2009-10-27 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Mahoney, Tejun Heo, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

On Mon, 26 Oct 2009, Ingo Molnar wrote:

> > -struct update_shares_data {
> > -	unsigned long rq_weight[NR_CPUS];
> > -};
> > -
> > -static DEFINE_PER_CPU(struct update_shares_data, update_shares_data);
> > +unsigned long *update_shares_data;
> 
> That should be __read_mostly - this can be a hotly accessed data 
> structure of the scheduler - if it happens to go next to a frequently 
> bouncing variable that can be bad for performance.

Good point, will resend with this annotation added.

> > -				    struct update_shares_data *usd)
> > +				    unsigned long *usd)
> 
> I dont think using usd[cpu] is clearer than usd->rq_weight[cpu]. At 
> minimum it should be renamed to usd_rq_weight not usd.

OK.

> >  	local_irq_save(flags);
> > -	usd = &__get_cpu_var(update_shares_data);
> > +	usd = per_cpu_ptr(update_shares_data, smp_processor_id());
> 
> Could we please have a look at the before/after assembly of this 
> sequence on x86, to make sure the claims in this thread are true and we 
> dont lose performance? (and included it in the changelog with a 
> resubmission - with a new, changed '[PATCH] ...' subject line, not 
> hidden inside a discussion thread.)

Just for completness (I will include it in the changelog with the next 
resubmission shortly):


Before:

...
ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
ffffffff81043383:       00 00
ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff81043393:       00
ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff8104339b:       00
ffffffff8104339c:       48 01 d0                add    %rdx,%rax
ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
ffffffff810433a6:       00
ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
...

After:

...
ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
ffffffff81043383:       00
ffffffff81043384:       48 98                   cltq
ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
ffffffff8104338d:       00
ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
ffffffff8104339c:       81
ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
ffffffff810433a4:       00
ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
ffffffff810433b5:       00
ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
...

> From a merge POV i'm quite nervous about such a change to the scheduler 
> this late in the .32 cycle - to offset that risk i'd really like to see 
> that this change has been pursued carefully to the edge of possibilities 
> - currently it does not give that impression.

So what's your other proposal?

Refactoring the ia64 pagefault handler is no-go this late in the cycle 
IMHO.

So either this, or reverting 34d76c41 (which is also no-go, I believe).

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  5:21     ` Tejun Heo
  2009-10-20  5:58       ` Jiri Kosina
@ 2009-10-27 10:52       ` Jiri Kosina
  1 sibling, 0 replies; 36+ messages in thread
From: Jiri Kosina @ 2009-10-27 10:52 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Kosina, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Tejun Heo wrote:

> > How about this one? (untested)
> > 
> > From: Jiri Kosina <jkosina@suse.cz>
> > Subject: sched: move rq_weight data array out of .percpu
> > 
> > Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> > being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> > NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> > 
> > Fix this by allocating this array dynamically and keep only pointer to it 
> > percpu.
> 
> If the structure can be dynamically allocated at all, just using
> percpu_alloc() to allocate the whole thing should solve it with the
> correct numa locality.

Unfortunately it's not that straightforward and easy, as we can't simply 
call __alloc_percpu from sched_init().

The reason is that does spin_unlock_irq(), and in sched_init() time it's 
too early for that, so we end up with

WARNING: at kernel/lockdep.c:2311 trace_hardirqs_on_caller+0xd2/0x1a0()
Hardware name: Kfisher   HT2100 HT1000
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.32-rc5-default #66
Call Trace:
 [<ffffffff81086a62>] ? trace_hardirqs_on_caller+0xd2/0x1a0
 [<ffffffff81054528>] warn_slowpath_common+0x78/0xd0
 [<ffffffff81356b8b>] ? _spin_unlock_irq+0x2b/0x40
 [<ffffffff8105458f>] warn_slowpath_null+0xf/0x20
 [<ffffffff81086a62>] trace_hardirqs_on_caller+0xd2/0x1a0
 [<ffffffff81086b3d>] trace_hardirqs_on+0xd/0x10
 [<ffffffff81356b8b>] _spin_unlock_irq+0x2b/0x40
 [<ffffffff811154c1>] pcpu_extend_area_map+0x41/0x110
 [<ffffffff81115ce0>] pcpu_alloc+0x120/0x890
 [<ffffffff81084577>] ? trace_hardirqs_off_caller+0x67/0xa0
 [<ffffffff8108578d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff8111646b>] __alloc_percpu+0xb/0x10
 [<ffffffff81712215>] sched_init+0x147/0x5d2
 [<ffffffff816fabca>] start_kernel+0x1fc/0x40c
 [<ffffffff816fa299>] x86_64_start_reservations+0x99/0xb9
 [<ffffffff816fa3bf>] x86_64_start_kernel+0x106/0x121
 [<ffffffff816fa140>] ? early_idt_handler+0x0/0x71
---[ end trace 4eaa2a86a8e2da22 ]---
start_kernel(): bug: interrupts were enabled *very* early, fixing it

-- 
Jiri Kosina
SUSE Labs, Novell Inc.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  4:57   ` Jiri Kosina
  2009-10-20  5:21     ` Tejun Heo
@ 2009-10-20  6:15     ` Ingo Molnar
  2009-10-20  6:26       ` Jiri Kosina
  2009-10-20  9:21       ` Peter Zijlstra
  1 sibling, 2 replies; 36+ messages in thread
From: Ingo Molnar @ 2009-10-20  6:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64


* Jiri Kosina <jkosina@suse.cz> wrote:

> On Tue, 20 Oct 2009, Jiri Kosina wrote:
> 
> > > Commit 34d76c41 introduced the update_shares_data percpu, but it ends up
> > > causing problems on large ia64 machines. Specifically, ia64 is limited
> > > to 64k in percpu vars and with NR_CPUS=4096, that ends up being 32k by
> > > itself. It ends up causing link errors since that is how ia64 enforces
> > > the 64k limit.
> > > 
> > > I can take a deeper look at finding a workable solution but thought I'd
> > > mention it in case you had ideas already.
> > 
> > I am adding some IA64 CCs, as the failure is solely caused by the ia64 
> > percpu implementation/pagefault handler optimization which requires the 
> > .percpu section area not be larger than 64k, which blows up with 34d76c41 
> > and NR_CPUS high enoufh (due to introduction of percpu array being 
> > size-dependent on NR_CPUS).
> 
> How about this one? (untested)
> 
> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: sched: move rq_weight data array out of .percpu
> 
> Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> 
> Fix this by allocating this array dynamically and keep only pointer to it 
> percpu.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> --- 
>  kernel/sched.c |   15 +++++++--------
>  1 files changed, 7 insertions(+), 8 deletions(-)

Seems like an IA64 bug to me. It's _very_ easy to get more than 64K of 
percpu data - i'm wondering why IA64 only triggered it now? Sure 
lockdep/lockstat must have triggered it before.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:15     ` Ingo Molnar
@ 2009-10-20  6:26       ` Jiri Kosina
  2009-10-20  6:35         ` Ingo Molnar
  2009-10-20  9:21       ` Peter Zijlstra
  1 sibling, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  6:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Ingo Molnar wrote:

> > Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> > being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> > NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> > 
> > Fix this by allocating this array dynamically and keep only pointer to it 
> > percpu.
> > 
> > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> > --- 
> >  kernel/sched.c |   15 +++++++--------
> >  1 files changed, 7 insertions(+), 8 deletions(-)
> 
> Seems like an IA64 bug to me. 

IA64 guys actually use that as some kind of optimization for fast access 
to the percpu data in their pagefault handler, as far as I know.

> It's _very_ easy to get more than 64K of percpu data - i'm wondering why 
> IA64 only triggered it now? Sure lockdep/lockstat must have triggered it 
> before.

Do we actually have lockdep on ia64?

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:26       ` Jiri Kosina
@ 2009-10-20  6:35         ` Ingo Molnar
  2009-10-20  7:11           ` Eric Dumazet
                             ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ingo Molnar @ 2009-10-20  6:35 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Jeff Mahoney, Peter Zijlstra, Linux Kernel Mailing List,
	Tony Luck, Fenghua Yu, linux-ia64


* Jiri Kosina <jkosina@suse.cz> wrote:

> On Tue, 20 Oct 2009, Ingo Molnar wrote:
> 
> > > Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> > > being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> > > NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> > > 
> > > Fix this by allocating this array dynamically and keep only pointer to it 
> > > percpu.
> > > 
> > > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> > > --- 
> > >  kernel/sched.c |   15 +++++++--------
> > >  1 files changed, 7 insertions(+), 8 deletions(-)
> > 
> > Seems like an IA64 bug to me. 
> 
> IA64 guys actually use that as some kind of optimization for fast 
> access to the percpu data in their pagefault handler, as far as I 
> know.

Still looks like a bug if it causes a breakage (linker error) on IA64, 
and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
performance regression on other architectures.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:35         ` Ingo Molnar
@ 2009-10-20  7:11           ` Eric Dumazet
  2009-10-20  7:39             ` Tejun Heo
  2009-10-20  7:12           ` Tejun Heo
  2009-10-20 13:08           ` Jeff Mahoney
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2009-10-20  7:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

Ingo Molnar a écrit :
 
> Still looks like a bug if it causes a breakage (linker error) on IA64, 
> and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
> performance regression on other architectures.
> 

True, but this also save some amount of ram for some distro kernels.

If we keep this static NR_CPUS thing, we might be able to free
the end of table, for other per_cpu users ?

if (nr_cpus_ids < NR_CPUS) {
	per_cpu_free_static_zone(&update_shares_data[nr_cpus_ids],
				 sizeof(long)*(NR_CPUS - nr_cpus_ids));
}



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  7:11           ` Eric Dumazet
@ 2009-10-20  7:39             ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  7:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ingo Molnar, Jiri Kosina, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

Eric Dumazet wrote:
> Ingo Molnar a écrit :
>  
>> Still looks like a bug if it causes a breakage (linker error) on IA64, 
>> and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
>> performance regression on other architectures.
>>
> 
> True, but this also save some amount of ram for some distro kernels.
> 
> If we keep this static NR_CPUS thing, we might be able to free
> the end of table, for other per_cpu users ?
> 
> if (nr_cpus_ids < NR_CPUS) {
> 	per_cpu_free_static_zone(&update_shares_data[nr_cpus_ids],
> 				 sizeof(long)*(NR_CPUS - nr_cpus_ids));
> }

That's doable but Considering that the users of NR_CPUS are pretty few
(and should be kept that way), I think it would be better to just use
dynamic allocation, which no longer incurs any major performance
difference (the only difference is constant offset vs. pointer in a
variable), for those cases.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:35         ` Ingo Molnar
  2009-10-20  7:11           ` Eric Dumazet
@ 2009-10-20  7:12           ` Tejun Heo
  2009-10-20  7:17             ` Jiri Kosina
  2009-10-20 13:08           ` Jeff Mahoney
  2 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  7:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

Hello,

Ingo Molnar wrote:
> Still looks like a bug if it causes a breakage (linker error) on IA64, 

In a sense, yes.

> and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
> performance regression on other architectures.

Yeah, by small amount but also reduces memory usage quite a bit.  For
this case, I think dynamic allocation is actually the right thing to
do.

Extending the first chunk size shouldn't be difficult on ia64 but
other than this, there still is a lot of room left with the current
64k limit, so I think we should be safe for a while.  BTW, I also am
curious how lockdep has been coping with the limit.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  7:12           ` Tejun Heo
@ 2009-10-20  7:17             ` Jiri Kosina
  2009-10-20  7:36               ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Jiri Kosina @ 2009-10-20  7:17 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

On Tue, 20 Oct 2009, Tejun Heo wrote:

> Extending the first chunk size shouldn't be difficult on ia64 but other 
> than this, there still is a lot of room left with the current 64k limit, 
> so I think we should be safe for a while.  BTW, I also am curious how 
> lockdep has been coping with the limit.

There is no lockdep on ia64, and we don't have such restriction on .percpu 
section on any other architecture which supports lockdep, right?

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  7:17             ` Jiri Kosina
@ 2009-10-20  7:36               ` Tejun Heo
  0 siblings, 0 replies; 36+ messages in thread
From: Tejun Heo @ 2009-10-20  7:36 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ingo Molnar, Jeff Mahoney, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64

Jiri Kosina wrote:
> On Tue, 20 Oct 2009, Tejun Heo wrote:
> 
>> Extending the first chunk size shouldn't be difficult on ia64 but other 
>> than this, there still is a lot of room left with the current 64k limit, 
>> so I think we should be safe for a while.  BTW, I also am curious how 
>> lockdep has been coping with the limit.
> 
> There is no lockdep on ia64, and we don't have such restriction on .percpu 
> section on any other architecture which supports lockdep, right?

Yeap, the restriction is only on ia64.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:35         ` Ingo Molnar
  2009-10-20  7:11           ` Eric Dumazet
  2009-10-20  7:12           ` Tejun Heo
@ 2009-10-20 13:08           ` Jeff Mahoney
  2009-10-20 13:43             ` Ingo Molnar
  2 siblings, 1 reply; 36+ messages in thread
From: Jeff Mahoney @ 2009-10-20 13:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Peter Zijlstra, Linux Kernel Mailing List, Tony Luck,
	Fenghua Yu, linux-ia64

On 10/20/2009 02:35 AM, Ingo Molnar wrote:
> 
> * Jiri Kosina <jkosina@suse.cz> wrote:
> 
>> On Tue, 20 Oct 2009, Ingo Molnar wrote:
>>
>>>> Commit 34d76c41 introduced percpu array update_shares_data, size of which 
>>>> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
>>>> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
>>>>
>>>> Fix this by allocating this array dynamically and keep only pointer to it 
>>>> percpu.
>>>>
>>>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>>>> --- 
>>>>  kernel/sched.c |   15 +++++++--------
>>>>  1 files changed, 7 insertions(+), 8 deletions(-)
>>>
>>> Seems like an IA64 bug to me. 
>>
>> IA64 guys actually use that as some kind of optimization for fast 
>> access to the percpu data in their pagefault handler, as far as I 
>> know.
> 
> Still looks like a bug if it causes a breakage (linker error) on IA64, 
> and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
> performance regression on other architectures.

The linker error isn't a bug, it's enforcement. The ia64 linker script
explicitly rewinds the location pointer back to the start of
.data.percpu + 64k to start the .data section to cause the error if
.data.percpu is larger than 64k.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 13:08           ` Jeff Mahoney
@ 2009-10-20 13:43             ` Ingo Molnar
  2009-10-20 13:57               ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Ingo Molnar @ 2009-10-20 13:43 UTC (permalink / raw)
  To: Jeff Mahoney
  Cc: Jiri Kosina, Peter Zijlstra, Linux Kernel Mailing List, Tony Luck,
	Fenghua Yu, linux-ia64, Linus Torvalds

* Jeff Mahoney <jeffm@suse.com> wrote:

> On 10/20/2009 02:35 AM, Ingo Molnar wrote:
> > 
> > * Jiri Kosina <jkosina@suse.cz> wrote:
> > 
> >> On Tue, 20 Oct 2009, Ingo Molnar wrote:
> >>
> >>>> Commit 34d76c41 introduced percpu array update_shares_data, size of which 
> >>>> being proportional to NR_CPUS. Unfortunately this blows up ia64 for large 
> >>>> NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
> >>>>
> >>>> Fix this by allocating this array dynamically and keep only pointer to it 
> >>>> percpu.
> >>>>
> >>>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> >>>> --- 
> >>>>  kernel/sched.c |   15 +++++++--------
> >>>>  1 files changed, 7 insertions(+), 8 deletions(-)
> >>>
> >>> Seems like an IA64 bug to me. 
> >>
> >> IA64 guys actually use that as some kind of optimization for fast 
> >> access to the percpu data in their pagefault handler, as far as I 
> >> know.
> > 
> > Still looks like a bug if it causes a breakage (linker error) on IA64, 
> > and if the 'fix' (i'd call it a workaround) causes a (small but nonzero) 
> > performance regression on other architectures.
> 
> The linker error isn't a bug, it's enforcement. The ia64 linker script 
> explicitly rewinds the location pointer back to the start of 
> .data.percpu + 64k to start the .data section to cause the error if 
> .data.percpu is larger than 64k.

Since every other SMP architecture manages to support more than 64K of 
pecpu data, this is clearly an ugly, self-inflicted limitation of IA64 
that has now escallated into a link failure.

Now, 34d76c41 could certainly be improved in a way that works around the 
IA64 problem too: we can allocate the data dynamically as long as the 
proper percpu allocator is used (not kmalloc as in the patch in this 
thread). But arguing that the current IA64 64K limit behavior is 
anything but very broken is rather shortsighted.

IA64 should be fixed really - we can get past the 64K of percpu data 
limit anytime we add a few more pages of per-cpu data to the kernel - 
the scheduler just happened to be the one to cross it this time.

The scheduler change in 34d76c41 has been done two months ago and has 
been upstream for a month, so this compaint is rather late and at 
minimum a certain degree of honesty about the situation is warranted.

Saying that all static percpu data must be below 64K, which will only be 
noticed once IA64 gets its testing act together months after it's been 
created is silly. If you want to enforce such a limit make it testable 
in a _timely_ fashion. Or fix the limit really.

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 13:43             ` Ingo Molnar
@ 2009-10-20 13:57               ` Tejun Heo
  2009-10-20 13:58                 ` Tejun Heo
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-20 13:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Mahoney, Jiri Kosina, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64,
	Linus Torvalds

Ingo Molnar wrote:
> IA64 should be fixed really - we can get past the 64K of percpu data 
> limit anytime we add a few more pages of per-cpu data to the kernel - 
> the scheduler just happened to be the one to cross it this time.

Christoph (cc'd hi!) was talking about re-purposing one of reserved
generic registers (for current or something, I don't remember the
details) for percpu base and just getting the original one via percpu
access.  With that we should be able to lift the limitation without
much performance penalty.

> Saying that all static percpu data must be below 64K, which will only be 
> noticed once IA64 gets its testing act together months after it's been 
> created is silly. If you want to enforce such a limit make it testable 
> in a _timely_ fashion. Or fix the limit really.

Yeah, the problem probably was that it only pushes the perpcu area
very slightly over the limit depending on configuration so it's not
too surprising that it didn't get reported for some time.  Also, the
linker script thing is a sanity check which is intentionally put there
to trigger when this happens so that it can be found out clearly
during build time.

In the long term, it will be a good idea to lift that restriction.
That said, I really don't think we should be adding NR_CPUS sized
array to percpu area either.  Things like that quickly become very
scary with N*1024 cpu configurations.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 13:57               ` Tejun Heo
@ 2009-10-20 13:58                 ` Tejun Heo
  2009-10-21  6:43                   ` Christoph Lameter
  0 siblings, 1 reply; 36+ messages in thread
From: Tejun Heo @ 2009-10-20 13:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jeff Mahoney, Jiri Kosina, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64,
	Linus Torvalds, Christoph Lameter

Of course I forgot to actually cc.  Cc'ing and quoting whole body.
Sorry.

Tejun Heo wrote:
> Ingo Molnar wrote:
>> IA64 should be fixed really - we can get past the 64K of percpu data 
>> limit anytime we add a few more pages of per-cpu data to the kernel - 
>> the scheduler just happened to be the one to cross it this time.
> 
> Christoph (cc'd hi!) was talking about re-purposing one of reserved
> generic registers (for current or something, I don't remember the
> details) for percpu base and just getting the original one via percpu
> access.  With that we should be able to lift the limitation without
> much performance penalty.
> 
>> Saying that all static percpu data must be below 64K, which will only be 
>> noticed once IA64 gets its testing act together months after it's been 
>> created is silly. If you want to enforce such a limit make it testable 
>> in a _timely_ fashion. Or fix the limit really.
> 
> Yeah, the problem probably was that it only pushes the perpcu area
> very slightly over the limit depending on configuration so it's not
> too surprising that it didn't get reported for some time.  Also, the
> linker script thing is a sanity check which is intentionally put there
> to trigger when this happens so that it can be found out clearly
> during build time.
> 
> In the long term, it will be a good idea to lift that restriction.
> That said, I really don't think we should be adding NR_CPUS sized
> array to percpu area either.  Things like that quickly become very
> scary with N*1024 cpu configurations.


-- 
tejun

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20 13:58                 ` Tejun Heo
@ 2009-10-21  6:43                   ` Christoph Lameter
  0 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2009-10-21  6:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ingo Molnar, Jeff Mahoney, Jiri Kosina, Peter Zijlstra,
	Linux Kernel Mailing List, Tony Luck, Fenghua Yu, linux-ia64,
	Linus Torvalds, peterc

The suggestion to Tony was to stop using the fixed TLB address for the per
cpu areas and instead use a global register like on x86 and sparc. Peter
Chubb then said that we had available global registers (cced him) that
could be used for this purpose. Doing so would make percpu support conform
to the schemes used by other architectures.

However, IA64 supports TLBs for larger pages (128MB, 256MB ...). So we
could change the maximum size of the static percpu data by using a
different sized TLB entry.

Tony?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096
  2009-10-20  6:15     ` Ingo Molnar
  2009-10-20  6:26       ` Jiri Kosina
@ 2009-10-20  9:21       ` Peter Zijlstra
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2009-10-20  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Jiri Kosina, Jeff Mahoney, Linux Kernel Mailing List, Tony Luck,
	Fenghua Yu, linux-ia64

On Tue, 2009-10-20 at 08:15 +0200, Ingo Molnar wrote:
> 
> Seems like an IA64 bug to me. It's _very_ easy to get more than 64K of 
> percpu data - i'm wondering why IA64 only triggered it now? Sure 
> lockdep/lockstat must have triggered it before.

I seem to remember people wanting to make lockstat data dynamically
allocated because of this.

But then, I'm not sure ia64 actually has lockdep support.


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2009-10-27 10:52 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-18 22:28 Commit 34d76c41 causes linker errors on ia64 with NR_CPUS=4096 Jeff Mahoney
2009-10-20  2:02 ` Jiri Kosina
2009-10-20  4:57   ` Jiri Kosina
2009-10-20  5:21     ` Tejun Heo
2009-10-20  5:58       ` Jiri Kosina
2009-10-20  6:12         ` Tejun Heo
2009-10-20  6:14           ` Tejun Heo
2009-10-20  6:27             ` Jiri Kosina
2009-10-20 14:18               ` Jeff Mahoney
2009-10-20 14:49                 ` Jeff Mahoney
2009-10-21  6:11                   ` Ingo Molnar
2009-10-21 15:19                     ` Tejun Heo
2009-10-21 22:11                       ` Luck, Tony
2009-10-22 14:49                         ` Jiri Kosina
2009-10-22 14:53                           ` Jeff Mahoney
2009-10-22 22:24                           ` Luck, Tony
2009-10-23  7:51                             ` Ingo Molnar
2009-10-23 12:30                               ` Jiri Kosina
2009-10-26 16:38                                 ` Jiri Kosina
2009-10-26 20:16                   ` Ingo Molnar
2009-10-27 10:03                     ` Jiri Kosina
2009-10-27 10:52       ` Jiri Kosina
2009-10-20  6:15     ` Ingo Molnar
2009-10-20  6:26       ` Jiri Kosina
2009-10-20  6:35         ` Ingo Molnar
2009-10-20  7:11           ` Eric Dumazet
2009-10-20  7:39             ` Tejun Heo
2009-10-20  7:12           ` Tejun Heo
2009-10-20  7:17             ` Jiri Kosina
2009-10-20  7:36               ` Tejun Heo
2009-10-20 13:08           ` Jeff Mahoney
2009-10-20 13:43             ` Ingo Molnar
2009-10-20 13:57               ` Tejun Heo
2009-10-20 13:58                 ` Tejun Heo
2009-10-21  6:43                   ` Christoph Lameter
2009-10-20  9:21       ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox