* Problem with percpu values when bringing up second CPU?
From: Jeremy Fitzhardinge @ 2009-08-04 22:58 UTC (permalink / raw)
To: Tejun Heo; +Cc: Rusty Russell, Ingo Molnar, Linux Kernel Mailing List
Hi,
I just tracked down a bug I was having to a change where I changed one
of my Xen event channel variables to a percpu variable, relating to
masking an event channel.
The symptom was that shortly after bringing up the second CPU, the first
CPU's timer events stopped arriving, apparently because they had become
masked.
The event channel masks are declared as:

    #define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
    static DEFINE_PER_CPU(unsigned long,
                          cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
        {[0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul };  /* everything masked by default */

My theory about what's happening is that when the second CPU comes up,
it allocates separate percpu areas for each CPU, but it is somehow
failing to accurately copy CPU 0's percpu data over; either it isn't
copying it all (ie, using the initialized values rather than the current
values), or failing to copy the values in an interrupt-atomic way.
Does this sound plausible?
When I convert this back to an ad-hoc percpu variable (an array indexed
by cpu number), it goes back to working. Also, if I boot with maxcpus=1
it also works with percpu data.
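
For readers following along, the ad-hoc alternative mentioned above (an array indexed by CPU number) might look roughly like the user-space sketch below. It is not the code from the Xen tree: the values of NR_CPUS and NR_EVENT_CHANNELS, and the helpers init_evtchn_masks(), set_evtchn_masked() and evtchn_is_masked(), are assumptions made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical sizes, chosen only so the sketch compiles on its own. */
#define NR_CPUS                 2
#define NR_EVENT_CHANNELS       1024
#define BITS_PER_LONG           (CHAR_BIT * sizeof(unsigned long))
#define NR_EVENT_CHANNEL_LONGS  (NR_EVENT_CHANNELS / BITS_PER_LONG)

/* One row of mask words per CPU, indexed directly by CPU number. */
static unsigned long cpu_evtchn_mask[NR_CPUS][NR_EVENT_CHANNEL_LONGS];

/* Set every bit for every CPU: everything masked by default. */
static void init_evtchn_masks(void)
{
	memset(cpu_evtchn_mask, 0xff, sizeof(cpu_evtchn_mask));
}

/* Mask (masked != 0) or unmask (masked == 0) one channel on one CPU. */
static void set_evtchn_masked(int cpu, unsigned int chn, int masked)
{
	assert(cpu >= 0 && cpu < NR_CPUS && chn < NR_EVENT_CHANNELS);

	unsigned long bit = 1ul << (chn % BITS_PER_LONG);
	unsigned long *word = &cpu_evtchn_mask[cpu][chn / BITS_PER_LONG];

	if (masked)
		*word |= bit;
	else
		*word &= ~bit;
}

/* Return 1 if the channel is masked on the given CPU, else 0. */
static int evtchn_is_masked(int cpu, unsigned int chn)
{
	assert(cpu >= 0 && cpu < NR_CPUS && chn < NR_EVENT_CHANNELS);

	return (cpu_evtchn_mask[cpu][chn / BITS_PER_LONG]
		>> (chn % BITS_PER_LONG)) & 1;
}
```

With this layout each CPU's row exists from the start, so there is no separate initialized image that has to be copied into per-CPU storage later.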
Also, because we don't have large pages under Xen, it always allocates
percpu as 4k pages:
PERCPU: Allocated 21 4k pages, static data 82080 bytes
Thanks,
J
* Re: Problem with percpu values when bringing up second CPU?
From: Rusty Russell @ 2009-08-05 1:31 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Tejun Heo, Ingo Molnar, Linux Kernel Mailing List
On Wed, 5 Aug 2009 08:28:34 am Jeremy Fitzhardinge wrote:
> Hi,
>
> I just tracked down a bug I was having to a change where I changed one
> of my Xen event channel variables to a percpu variable, relating to
> masking an event channel.
>
> The symptom was that shortly after bringing up the second CPU, the first
> CPU's timer events stopped arriving, apparently because they had become
> masked.
>
> The event channel masks are declared as:
>
>     #define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
>     static DEFINE_PER_CPU(unsigned long,
>                           cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
>         {[0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul };  /* everything masked by default */
>
>
> My theory about what's happening is that when the second CPU comes up,
> it allocates separate percpu areas for each CPU, but it is somehow
> failing to accurately copy CPU 0's percpu data over
If you touch the per-cpu vars before setup_per_cpu_areas(), you will hit the
master copy.
Is that possible?
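
The effect described here can be modelled in user space. The sketch below is only an illustration under stated assumptions: master_mask, cpu_area, mask_for(), model_setup_per_cpu_areas() and unmask() are hypothetical stand-ins, and MASK_WORDS is an arbitrary size. It shows that a write made before the per-cpu areas exist lands in the master copy and is then inherited by every CPU's copy, while a later write affects only the CPU that made it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_CPUS     2   /* hypothetical: two CPUs, as in the report */
#define MASK_WORDS  4   /* hypothetical stand-in for NR_EVENT_CHANNEL_LONGS */

/* Stand-in for the initialized per-cpu image (the "master copy"). */
static unsigned long master_mask[MASK_WORDS] = { ~0ul, ~0ul, ~0ul, ~0ul };

/* Stand-ins for the per-CPU areas that early init allocates. */
static unsigned long cpu_area[NR_CPUS][MASK_WORDS];
static int areas_ready;

/* Models a per-cpu access: the master copy is used until the areas exist. */
static unsigned long *mask_for(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return areas_ready ? cpu_area[cpu] : master_mask;
}

/* Models setup_per_cpu_areas(): each CPU gets a copy of the master image. */
static void model_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		memcpy(cpu_area[cpu], master_mask, sizeof(master_mask));
	areas_ready = 1;
}

/* Clear the bit for channel chn in the given mask. */
static void unmask(unsigned long *mask, unsigned int chn)
{
	const unsigned int bits = CHAR_BIT * sizeof(unsigned long);

	assert(chn < MASK_WORDS * bits);
	mask[chn / bits] &= ~(1ul << (chn % bits));
}
```

Calling unmask(mask_for(0), 3) before model_setup_per_cpu_areas() clears bit 3 in the master copy, so both CPUs start with channel 3 unmasked. The same call made afterwards touches only CPU 0's area.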
Rusty.
* Re: Problem with percpu values when bringing up second CPU?
From: Jeremy Fitzhardinge @ 2009-08-06 20:32 UTC (permalink / raw)
To: Rusty Russell; +Cc: Tejun Heo, Ingo Molnar, Linux Kernel Mailing List
On 08/04/09 18:31, Rusty Russell wrote:
> On Wed, 5 Aug 2009 08:28:34 am Jeremy Fitzhardinge wrote:
>
>> Hi,
>>
>> I just tracked down a bug I was having to a change where I changed one
>> of my Xen event channel variables to a percpu variable, relating to
>> masking an event channel.
>>
>> The symptom was that shortly after bringing up the second CPU, the first
>> CPU's timer events stopped arriving, apparently because they had become
>> masked.
>>
>> The event channel masks are declared as:
>>
>>     #define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
>>     static DEFINE_PER_CPU(unsigned long,
>>                           cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
>>         {[0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul };  /* everything masked by default */
>>
>>
>> My theory about what's happening is that when the second CPU comes up,
>> it allocates separate percpu areas for each CPU, but it is somehow
>> failing to accurately copy CPU 0's percpu data over
>>
>
> If you touch the per-cpu vars before setup_per_cpu_areas(), you will hit the
> master copy.
>
> Is that possible?
>
Likely, I think. It depends on whether interrupts can happen before
that point. But hitting the master copy should be OK.
However, the problem I'm seeing happens when the second CPU starts. I
was working on the assumption that that's when the transfer from master
to allocated copy happens, but it looks like I'm mistaken.
Maybe I'm barking up the wrong tree, but the problem does bisect to a
simple conversion to percpu...
J
* Re: Problem with percpu values when bringing up second CPU?
From: Tejun Heo @ 2009-08-05 1:44 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Rusty Russell, Ingo Molnar, Linux Kernel Mailing List
Hello,
Jeremy Fitzhardinge wrote:
> I just tracked down a bug I was having to a change where I changed one
> of my Xen event channel variables to a percpu variable, relating to
> masking an event channel.
>
> The symptom was that shortly after bringing up the second CPU, the first
> CPU's timer events stopped arriving, apparently because they had become
> masked.
Hmmmm...
> The event channel masks are declared as:
>
>     #define NR_EVENT_CHANNEL_LONGS (NR_EVENT_CHANNELS/BITS_PER_LONG)
>     static DEFINE_PER_CPU(unsigned long,
>                           cpu_evtchn_mask[NR_EVENT_CHANNEL_LONGS]) =
>         {[0 ... NR_EVENT_CHANNEL_LONGS-1] = ~0ul };  /* everything masked by default */
>
> My theory about what's happening is that when the second CPU comes up,
> it allocates separate percpu areas for each CPU, but it is somehow
> failing to accurately copy CPU 0's percpu data over; either it isn't
> copying it all (ie, using the initialized values rather than the current
> values), or failing to copy the values in an interrupt-atomic way.
>
> Does this sound plausible?
Percpu areas aren't set up when the first cpu comes up. They're
allocated and copied from the master copy during early init when only
the boot cpu is running.
> When I convert this back to an ad-hoc percpu variable (an array indexed
> by cpu number), it goes back to working. Also, if I boot with maxcpus=1
> it also works with percpu data.
Hmmm... strange. Can you try to print out the values along the boot
process and see when things go wrong?
> Also, because we don't have large pages under Xen, it always allocates
> percpu as 4k pages:
>
> PERCPU: Allocated 21 4k pages, static data 82080 bytes
I don't think the choice of first chunk allocator would cause any
difference.
Thanks.
--
tejun