public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 0/4] clockevents: fix clockevent_devices list corruption after cpu hotplug
@ 2009-12-10 13:07 Xiaotian Feng
From: Xiaotian Feng @ 2009-12-10 13:07 UTC
  To: tglx, damm, hsweeten, akpm, venkatesh.pallipadi; +Cc: linux-kernel

I hit a list_del corruption, which was reported in
http://lkml.org/lkml/2009/11/27/45. There was no response, so I tried to
debug it myself.

After I added some printks to show all elements in clockevent_devices, I
found that the kernel hangs when I try to resume from s2ram.

In clockevents_register_device, clockevents_do_notify ADD is always followed
by clockevents_notify_released. Although clockevents_do_notify ADD uses
tick_check_new_device to add new devices and to move the replaced old devices
to the clockevents_released list, clockevents_notify_released then adds them
back to the clockevent_devices list.

My system is a quad-core x86_64 with apic and hpet enabled. After boot,
the elements in the clockevent_devices list are:
clockevent_device->lapic(3)->hpet5(3)->lapic(2)->hpet4(2)->lapic(1)->hpet3(1)-
  ->lapic(0)->hpet2(0)->hpet(0)
* () means cpu id

But the active clock_event_devices are hpet2, hpet3, hpet4 and hpet5. At the
s2ram stage, cpus 1, 2 and 3 go down, and the CLOCK_EVT_NOTIFY_CPU_DEAD
notification calls tick_shutdown, which deletes the active devices of the dead
cpus (hpet3, hpet4 and hpet5) from the clockevent_devices list.
So after s2ram, the elements in the clockevent_devices list are:
clockevent_device->lapic(3)->lapic(2)->lapic(1)->lapic(0)->hpet2(0)->hpet(0)

Then at the resume stage, cpus 1, 2 and 3 come up and register their lapic
again, which performs a list_add of the lapic on the clockevent_devices list.
For example, list_add of lapic(1) on the above list moves lapic(1) to
clockevent_device->next, but lapic(2)->next still points to lapic(1), so the
list becomes circular and corrupted.
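
The resume-stage sequence above can be replayed in user space. The following is
only a sketch, not kernel code: the list helpers are a minimal re-implementation
of the kernel's list_add() pointer updates, and demo_list,
build_post_suspend_list, list_len_bounded and replay_resume are hypothetical
names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same pointer updates as the kernel's list_add(): insert right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/*
 * Walk forward from head and return the number of entries, or -1 if head
 * is not reached again within `limit` steps (the walk is stuck in a loop).
 */
static int list_len_bounded(const struct list_head *head, int limit)
{
	const struct list_head *pos = head->next;
	int n = 0;

	while (pos != head) {
		if (++n > limit)
			return -1;
		pos = pos->next;
	}
	return n;
}

/* Hypothetical container for the devices left on the list after s2ram. */
struct demo_list {
	struct list_head head, lapic[4], hpet2, hpet;
};

/* Builds head->lapic(3)->lapic(2)->lapic(1)->lapic(0)->hpet2(0)->hpet(0). */
static void build_post_suspend_list(struct demo_list *d)
{
	init_list_head(&d->head);
	list_add(&d->hpet, &d->head);
	list_add(&d->hpet2, &d->head);
	for (int cpu = 0; cpu < 4; cpu++)
		list_add(&d->lapic[cpu], &d->head);
}

/* Replays the resume step: lapic(1) is added again without a list_del. */
static int replay_resume(void)
{
	struct demo_list d;

	build_post_suspend_list(&d);
	assert(list_len_bounded(&d.head, 100) == 6);

	list_add(&d.lapic[1], &d.head);

	/* Stale link: lapic(2) still points at lapic(1), closing a loop. */
	assert(d.lapic[2].next == &d.lapic[1]);
	return list_len_bounded(&d.head, 100);
}
```

Calling replay_resume() from a main() returns -1 in this model: the forward walk
cycles through lapic(1), lapic(3) and lapic(2) and never gets back to the list
head, which is consistent with the hang seen when printing the list on resume.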

This patchset aims to fix the above behaviour as follows:
       - in clockevents_register_device, if notify ADD succeeds, move the new
         device to the clockevent_devices list, otherwise move it to the
         clockevents_released list.
       - in clockevents_notify_released, same behaviour as above.
       - in clockevents_notify CPU_DEAD, remove the devices that belong to the
         dead cpu from the clockevents_released list.
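
The three rules above can be modelled in user space as follows. This is only a
sketch of the intended list bookkeeping, not the patches themselves: toy_dev,
toy_notify_add, toy_register, toy_cpu_dead and the rating comparison are
hypothetical stand-ins, and the real decision is made by tick_check_new_device.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

static int list_len(const struct list_head *head)
{
	int n = 0;

	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Toy stand-in for struct clock_event_device (hypothetical fields). */
struct toy_dev {
	struct list_head list;
	int cpu;
	int rating;
};

#define toy_entry(ptr) \
	((struct toy_dev *)((char *)(ptr) - offsetof(struct toy_dev, list)))

#define TOY_NR_CPUS 4

static struct list_head active_devices, released_devices;
static struct toy_dev *cpu_dev[TOY_NR_CPUS];	/* active device per cpu */

static void toy_init(void)
{
	init_list_head(&active_devices);
	init_list_head(&released_devices);
	for (int i = 0; i < TOY_NR_CPUS; i++)
		cpu_dev[i] = NULL;
}

/*
 * Assumed model of notify ADD: returns 1 if dev became the active device
 * of its cpu. A replaced device is moved to the released list.
 */
static int toy_notify_add(struct toy_dev *dev)
{
	struct toy_dev *old = cpu_dev[dev->cpu];

	if (old && old->rating >= dev->rating)
		return 0;
	if (old) {
		list_del(&old->list);
		list_add(&old->list, &released_devices);
	}
	cpu_dev[dev->cpu] = dev;
	return 1;
}

/* Rule 1: only a device accepted by notify ADD goes on the active list. */
static void toy_register(struct toy_dev *dev)
{
	if (toy_notify_add(dev))
		list_add(&dev->list, &active_devices);
	else
		list_add(&dev->list, &released_devices);
}

/* Rule 3: on CPU_DEAD, also drop the dead cpu's released devices. */
static void toy_cpu_dead(int cpu)
{
	struct list_head *pos = released_devices.next, *tmp;

	for (tmp = pos->next; pos != &released_devices; pos = tmp, tmp = pos->next)
		if (toy_entry(pos)->cpu == cpu)
			list_del(pos);

	/* Models the existing tick_shutdown removal of the active device. */
	if (cpu_dev[cpu]) {
		list_del(&cpu_dev[cpu]->list);
		cpu_dev[cpu] = NULL;
	}
}
```

In this model a cpu that goes down leaves no entries behind on either list, so
registering its lapic again on resume is an ordinary list_add of an unlinked
entry. The loop in toy_cpu_dead saves the next pointer before deleting, which
is the same reason patch 1/4 switches to list_for_each_entry_safe.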

This makes sure that only the active device of each cpu is on the
clockevent_devices list. With this patchset, the list_del corruption
disappears, and suspend/resume and cpu hotplug work fine on my system.



Thread overview: 10+ messages
2009-12-10 13:07 [RFC PATCH 0/4] clockevents: fix clockevent_devices list corruption after cpu hotplug Xiaotian Feng
2009-12-10 13:07 ` [PATCH 1/4] clockevents: use list_for_each_entry_safe Xiaotian Feng
2009-12-10 13:07   ` [PATCH 2/4] clockevents: convert clockevents_do_notify to int Xiaotian Feng
2009-12-10 13:07     ` [PATCH 3/4] clockevents: add device to clockevent_devices list if notify ADD success Xiaotian Feng
2009-12-10 13:07       ` [PATCH 4/4] clockevents: remove related device from clockevents_released list when cpu is DEAD Xiaotian Feng
2009-12-10 14:35 ` [RFC PATCH 0/4] clockevents: fix clockevent_devices list corruption after cpu hotplug Thomas Gleixner
2009-12-11  2:29   ` Xiaotian Feng
2010-01-17  9:28   ` Ozan Çağlayan
2010-01-18  2:30     ` Xiaotian Feng
2010-01-18 13:51       ` Thomas Gleixner
