public inbox for linux-kernel@vger.kernel.org
* [PATCH] Fixed division by zero bug in kernel/padata.c
@ 2010-07-02 11:59 Dan Kruchinin
  2010-07-02 12:56 ` Steffen Klassert
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Kruchinin @ 2010-07-02 11:59 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: LKML, Herbert Xu

 When the boot CPU (typically CPU #0) is excluded from the padata cpumask and
 the user enters the halt command from the console, the kernel faults on a
 division by zero. This occurs because during halt the kernel shuts down each
 non-boot CPU one by one. After it shuts down the last CPU that is set in the
 padata cpumask, the only working CPU in the system is the boot CPU (#0), and
 it is the only CPU set in cpu_active_mask. Hence, when padata_cpu_callback
 calls __padata_remove_cpu (which calls padata_alloc_pd), the padata cpumask
 and cpu_active_mask do not intersect, and the following code in
 padata_alloc_pd causes a divide-by-zero (DZ) exception:
  cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
  ...
  num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
  pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // DZ!


Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
---
 kernel/padata.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index fdd8ae6..dbe6d26 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -434,7 +434,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 		atomic_set(&queue->num_obj, 0);
 	}

-	num_cpus = cpumask_weight(pd->cpumask);
+	num_cpus = cpumask_weight(pd->cpumask) + 1;
 	pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1;

 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
-- 
1.7.1


-- 
W.B.R.
Dan Kruchinin
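
The failure described in this message can be modelled in user space. The sketch below is a minimal illustration, not kernel code: a plain `unsigned long` bitmask stands in for `struct cpumask`, and the helper names `mask_weight` and `compute_max_seq_nr`, as well as the value chosen for `MAX_SEQ_NR`, are assumptions made for the example. It shows that once the padata mask and the active mask no longer intersect, the weight is 0 and the division has to be guarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; the kernel's real MAX_SEQ_NR differs. */
#define MAX_SEQ_NR 1000

/* Count set bits, playing the role of cpumask_weight(). */
static int mask_weight(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Intersect the configured mask with the active mask and derive
 * max_seq_nr. Returns false instead of dividing when the
 * intersection is empty; *max_seq_nr is left untouched in that case.
 */
static bool compute_max_seq_nr(unsigned long padata_mask,
			       unsigned long active_mask,
			       int *max_seq_nr)
{
	int num_cpus;

	assert(max_seq_nr != NULL);
	num_cpus = mask_weight(padata_mask & active_mask);
	if (num_cpus == 0)
		return false;	/* this is where the kernel would divide by zero */

	*max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1;
	return true;
}
```

For instance, a padata mask of CPUs 1 and 2 (`0x6`) against an active mask holding only CPU 0 (`0x1`) makes `compute_max_seq_nr` return false, which corresponds to the halt scenario reported here.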


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-02 11:59 [PATCH] Fixed division by zero bug in kernel/padata.c Dan Kruchinin
@ 2010-07-02 12:56 ` Steffen Klassert
  2010-07-02 13:24   ` Dan Kruchinin
  0 siblings, 1 reply; 7+ messages in thread
From: Steffen Klassert @ 2010-07-02 12:56 UTC (permalink / raw)
  To: Dan Kruchinin; +Cc: LKML, Herbert Xu

On Fri, Jul 02, 2010 at 03:59:54PM +0400, Dan Kruchinin wrote:
>  When the boot CPU (typically CPU #0) is excluded from the padata cpumask and
>  the user enters the halt command from the console, the kernel faults on a
>  division by zero. This occurs because during halt the kernel shuts down each
>  non-boot CPU one by one. After it shuts down the last CPU that is set in the
>  padata cpumask, the only working CPU in the system is the boot CPU (#0), and
>  it is the only CPU set in cpu_active_mask. Hence, when padata_cpu_callback
>  calls __padata_remove_cpu (which calls padata_alloc_pd), the padata cpumask
>  and cpu_active_mask do not intersect, and the following code in
>  padata_alloc_pd causes a divide-by-zero (DZ) exception:
>   cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
>   ...
>   num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
>   pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // DZ!
> 

Good catch!

> 
> Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
> ---
>  kernel/padata.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/padata.c b/kernel/padata.c
> index fdd8ae6..dbe6d26 100644
> --- a/kernel/padata.c
> +++ b/kernel/padata.c
> @@ -434,7 +434,7 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
>  		atomic_set(&queue->num_obj, 0);
>  	}
> 
> -	num_cpus = cpumask_weight(pd->cpumask);
> +	num_cpus = cpumask_weight(pd->cpumask) + 1;
>  	pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1;
> 

num_cpus should remain the number of cpus in this cpumask; this is required
to handle a smooth overrun of the sequence numbers.
I think it's better to return with an error and to stop the instance
if somebody takes away the last cpu in our cpumask. We can't run with an
empty cpumask anyway.

Let us look at this again on Monday.

Thanks again for catching this,

Steffen
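
Why the `+ 1` is a problem can be seen with small numbers. The sketch below assumes that sequence numbers are spread over the CPUs round-robin (the object with number `seq_nr` belongs to CPU index `seq_nr % num_cpus`) and uses a made-up `MAX_SEQ_NR` of 1000; both are assumptions for illustration, as are the helper names. With the exact CPU count, `max_seq_nr + 1` is a multiple of the count, so the wrap back to 0 continues the rotation. With the count inflated by one, it generally is not.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEQ_NR 1000	/* hypothetical value, chosen for readable numbers */

/* Same formula as in padata_alloc_pd(), with the divisor passed in. */
static int max_seq_nr_for(int divisor)
{
	assert(divisor > 0);
	return (MAX_SEQ_NR / divisor) * divisor - 1;
}

/*
 * The wrap from max_seq_nr to 0 keeps the round-robin order only if
 * max_seq_nr + 1 is a multiple of the real number of CPUs, so that
 * sequence number 0 lands on CPU index 0 right after the last index.
 */
static bool wrap_is_smooth(int max_seq_nr, int actual_cpus)
{
	assert(actual_cpus > 0);
	return (max_seq_nr + 1) % actual_cpus == 0;
}
```

With 3 CPUs, `max_seq_nr_for(3)` gives 998 and the wrap is smooth. Using `max_seq_nr_for(3 + 1)` gives 999, and `wrap_is_smooth(999, 3)` is false because 1000 is not a multiple of 3.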


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-02 12:56 ` Steffen Klassert
@ 2010-07-02 13:24   ` Dan Kruchinin
  2010-07-05 13:18     ` Steffen Klassert
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Kruchinin @ 2010-07-02 13:24 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: LKML, Herbert Xu

No problem. Here is the fixed patch:
--
 When the boot CPU (typically CPU #0) is excluded from the padata cpumask and
 the user enters the halt command from the console, the kernel faults on a
 division by zero. This occurs because during halt the kernel shuts down each
 non-boot CPU one by one. After it shuts down the last CPU that is set in the
 padata cpumask, the only working CPU in the system is the boot CPU (#0), and
 it is the only CPU set in cpu_active_mask. Hence, when padata_cpu_callback
 calls __padata_remove_cpu (and hence padata_alloc_pd), the padata cpumask
 and cpu_active_mask do not intersect, and the following code in
 padata_alloc_pd causes a divide-by-zero (DZ) exception:
  cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
  ...
  num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
  pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // DZ!


Signed-off-by: Dan Kruchinin <dkruchinin@acm.org>
---
 kernel/padata.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/padata.c b/kernel/padata.c
index fdd8ae6..dcddac0 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -435,6 +435,9 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,
 	}

 	num_cpus = cpumask_weight(pd->cpumask);
+	if (!num_cpus)
+		goto err_free_cpumask;
+
 	pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1;

 	setup_timer(&pd->timer, padata_reorder_timer, (unsigned long)pd);
@@ -446,6 +449,8 @@ static struct parallel_data *padata_alloc_pd(struct padata_instance *pinst,

 	return pd;

+err_free_cpumask:
+	free_cpumask_var(pd->cpumask);
 err_free_queue:
 	free_percpu(pd->queue);
 err_free_pd:
-- 
1.7.1





-- 
W.B.R.
Dan Kruchinin


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-02 13:24   ` Dan Kruchinin
@ 2010-07-05 13:18     ` Steffen Klassert
  2010-07-05 13:35       ` Dan Kruchinin
  0 siblings, 1 reply; 7+ messages in thread
From: Steffen Klassert @ 2010-07-05 13:18 UTC (permalink / raw)
  To: Dan Kruchinin; +Cc: LKML, Herbert Xu

On Fri, Jul 02, 2010 at 05:24:13PM +0400, Dan Kruchinin wrote:
> No problem. Here is the fixed patch:
> --
>  When the boot CPU (typically CPU #0) is excluded from the padata cpumask and
>  the user enters the halt command from the console, the kernel faults on a
>  division by zero. This occurs because during halt the kernel shuts down each
>  non-boot CPU one by one. After it shuts down the last CPU that is set in the
>  padata cpumask, the only working CPU in the system is the boot CPU (#0), and
>  it is the only CPU set in cpu_active_mask. Hence, when padata_cpu_callback
>  calls __padata_remove_cpu (and hence padata_alloc_pd), the padata cpumask
>  and cpu_active_mask do not intersect, and the following code in
>  padata_alloc_pd causes a divide-by-zero (DZ) exception:
>   cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
>   ...
>   num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
>   pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // DZ!
> 

I'm still thinking about how to handle an empty cpumask here.
While your patch would be ok to handle the shutdown case you
noticed, the problem is a bit more complex as soon as we are
able to change the cpumasks from userspace with your patches.

Essentially, we can end up with an empty cpumask here for
two reasons:

1. A user removed the last cpu that belongs to the padata
cpumask and the active cpumask.

2. The last cpu that belongs to the padata cpumask and the
active cpumask goes offline.

In the first case it would be ok to tell the user that this is
an invalid operation by returning an error. In the second case
we can't just return an error to the cpu hotplug callback function,
because it returns NOTIFY_BAD on error. This means that whether a
cpu can go offline would depend on the padata user configuration.
This is certainly not what we want to have.

Both cases should be handled in the same way. So we could just
stop the instance if the cpumasks do not intersect, and enable
it as soon as they do intersect again. The padata instance would
refuse to do anything as long as the cpumasks do not intersect,
but it is still in a consistent state. Let me add the infrastructure
to handle this, then you can use it with your patches.

Thanks,

Steffen
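
The approach outlined in this message could look roughly like the following user-space sketch. Everything in it is hypothetical (the `struct instance` type, the `INST_STOPPED` flag, the function names and the bitmask stand-in for `struct cpumask`); it is not the infrastructure that was later added to padata. The point it illustrates is that a CPU going offline never fails on padata's account: the instance is merely marked stopped while the masks do not intersect, and is enabled again once they do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INST_STOPPED 0x1u	/* hypothetical flag */

struct instance {
	unsigned long cpumask;	/* CPUs the user configured */
	unsigned int flags;
};

/* Re-evaluate the instance against the currently active CPUs. */
static void instance_update(struct instance *inst, unsigned long active)
{
	assert(inst != NULL);
	if (inst->cpumask & active)
		inst->flags &= ~INST_STOPPED;
	else
		inst->flags |= INST_STOPPED;
}

/* Hotplug path: always returns 0, so the offline is never vetoed. */
static int instance_cpu_offline(struct instance *inst,
				unsigned long *active, int cpu)
{
	assert(inst != NULL && active != NULL);
	*active &= ~(1UL << cpu);
	instance_update(inst, *active);
	return 0;
}

/* Submission path: refuse work while the instance is stopped. */
static bool instance_may_run(const struct instance *inst)
{
	assert(inst != NULL);
	return !(inst->flags & INST_STOPPED);
}
```

A user-initiated mask change could go through the same `instance_update` call, which is one way to treat both cases identically as suggested above.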


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-05 13:18     ` Steffen Klassert
@ 2010-07-05 13:35       ` Dan Kruchinin
  2010-07-05 13:43         ` Herbert Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Kruchinin @ 2010-07-05 13:35 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: LKML, Herbert Xu

On Mon, Jul 5, 2010 at 5:18 PM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Fri, Jul 02, 2010 at 05:24:13PM +0400, Dan Kruchinin wrote:
>> No problem. Here is the fixed patch:
>> --
>>  When the boot CPU (typically CPU #0) is excluded from the padata cpumask and
>>  the user enters the halt command from the console, the kernel faults on a
>>  division by zero. This occurs because during halt the kernel shuts down each
>>  non-boot CPU one by one. After it shuts down the last CPU that is set in the
>>  padata cpumask, the only working CPU in the system is the boot CPU (#0), and
>>  it is the only CPU set in cpu_active_mask. Hence, when padata_cpu_callback
>>  calls __padata_remove_cpu (and hence padata_alloc_pd), the padata cpumask
>>  and cpu_active_mask do not intersect, and the following code in
>>  padata_alloc_pd causes a divide-by-zero (DZ) exception:
>>   cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
>>   ...
>>   num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
>>   pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // DZ!
>>
>
> I'm still thinking about how to handle an empty cpumask here.
> While your patch would be ok to handle the shutdown case you
> noticed, the problem is a bit more complex as soon as we are
> able to change the cpumasks from userspace with your patches.
>
> Essentially, we can end up with an empty cpumask here for
> two reasons:
>
> 1. A user removed the last cpu that belongs to the padata
> cpumask and the active cpumask.
>
> 2. The last cpu that belongs to the padata cpumask and the
> active cpumask goes offline.
>
> In the first case it would be ok to tell the user that this is
> an invalid operation by returning an error. In the second case
> we can't just return an error to the cpu hotplug callback function,
> because it returns NOTIFY_BAD on error. This means that whether a
> cpu can go offline would depend on the padata user configuration.
> This is certainly not what we want to have.
>
> Both cases should be handled in the same way. So we could just
> stop the instance if the cpumasks do not intersect, and enable
> it as soon as they do intersect again. The padata instance would
> refuse to do anything as long as the cpumasks do not intersect,
> but it is still in a consistent state. Let me add the infrastructure
> to handle this, then you can use it with your patches.

OK, got it.

>
> Thanks,
>
> Steffen
>



-- 
W.B.R.
Dan Kruchinin


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-05 13:35       ` Dan Kruchinin
@ 2010-07-05 13:43         ` Herbert Xu
  2010-07-05 13:53           ` Steffen Klassert
  0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2010-07-05 13:43 UTC (permalink / raw)
  To: Dan Kruchinin; +Cc: Steffen Klassert, LKML

On Mon, Jul 05, 2010 at 05:35:32PM +0400, Dan Kruchinin wrote:
>
> > Both cases should be handled in the same way. So we could just
> > stop the instance if the cpumasks do not intersect, and enable
> > it as soon as they do intersect again. The padata instance would
> > refuse to do anything as long as the cpumasks do not intersect,
> > but it is still in a consistent state. Let me add the infrastructure
> > to handle this, then you can use it with your patches.
> 
> OK, got it.

So I should wait for another patch, right?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH] Fixed division by zero bug in kernel/padata.c
  2010-07-05 13:43         ` Herbert Xu
@ 2010-07-05 13:53           ` Steffen Klassert
  0 siblings, 0 replies; 7+ messages in thread
From: Steffen Klassert @ 2010-07-05 13:53 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Dan Kruchinin, LKML

On Mon, Jul 05, 2010 at 09:43:57PM +0800, Herbert Xu wrote:
> On Mon, Jul 05, 2010 at 05:35:32PM +0400, Dan Kruchinin wrote:
> >
> > > Both cases should be handled in the same way. So we could just
> > > stop the instance if the cpumasks do not intersect, and enable
> > > it as soon as they do intersect again. The padata instance would
> > > refuse to do anything as long as the cpumasks do not intersect,
> > > but it is still in a consistent state. Let me add the infrastructure
> > > to handle this, then you can use it with your patches.
> > 
> > OK, got it.
> 
> So I should wait for another patch, right?
> 

Right.

