* [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct [not found] ` <20110110114913.GA22298@redhat.com> @ 2011-01-10 12:36 ` Oleg Nesterov 2011-01-10 12:46 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2011-01-10 12:36 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Kay Sievers, Dhaval Giani, Greg Kroah-Hartman Cc: linux-kernel On 01/10, Oleg Nesterov wrote: > > On 01/10, Stefan Priebe - Profihost AG wrote: > > > > i've seen your patch and i've seen that we've a lot of crashes in the > > process cleanup since upgrading from 2.6.32.19 to 2.6.32.27 and i would > > like to know if you can tell me if your patch will solve them. > > > > Log: (ATTENTION Log is in reverse order) > > http://pastebin.com/WiyEKScs > > No, that patch has nothing to do with this crash. > > Looks like, this is CONFIG_USER_SCHED bug. Probably something like > double-free but I know nothing about this code and USER_SCHED is > deprecated anyway. > > I'd suggest you to disable this option. > > > Perhaps it makes sense to report this bug to lkml, though. > Probably 3959214f971417f4162926ac52ad4cd042958caa is the offending > commit. Yes, at first glance "sched: delayed cleanup of user_struct" looks buggy... uid_hash_find: hlist_for_each_entry(user, h, hashent, uidhash_node) { if (user->uid == uid) { /* possibly resurrect an "almost deleted" object */ if (atomic_inc_return(&user->__count) == 1) cancel_delayed_work(&user->work); return user; cancel_delayed_work() can only cancel the timer. If the timer has already expired, it can't cancel the pending work, and cleanup_user_struct() can run after uid_hash_find() returns. This _looks_ OK, cleanup_user_struct() should notice ->__count == 0 and do nothing. But it is not. Suppose that the new "owner" of this user_struct (the caller of uid_hash_find) in turn does free_uid() before up->work->func() completes. In this case INIT_DELAYED_WORK() can corrupt the pending work, or 2 instances of work->func() can race with each other on different CPUs. In particular, this can lead to double free. Kay? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 12:36 ` [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct Oleg Nesterov @ 2011-01-10 12:46 ` Stefan Priebe - Profihost AG 2011-01-10 13:35 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Stefan Priebe - Profihost AG @ 2011-01-10 12:46 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel Hi, is it a workaround to disable CONFIG_GROUP_SCHED? Stefan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 12:46 ` Stefan Priebe - Profihost AG @ 2011-01-10 13:35 ` Oleg Nesterov 2011-01-10 14:01 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2011-01-10 13:35 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel On 01/10, Stefan Priebe - Profihost AG wrote: > > Hi, > > is it a workaround to disable CONFIG_GROUP_SCHED? ^^^^^^^^^^^^^^^^^^ CONFIG_USER_SCHED, not CONFIG_GROUP_SCHED. I think that disabling it should obviously help. But it is possible that there is something else. Given that you can reproduce the crash, it would be nice if you can test this and report ;) Or. You can revert b00bc0b237055b4c45816325ee14f0bd83e6f590 uids: Prevent tear down race 3959214f971417f4162926ac52ad4cd042958caa sched: delayed cleanup of user_struct patches for testing. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 13:35 ` Oleg Nesterov @ 2011-01-10 14:01 ` Stefan Priebe - Profihost AG 2011-01-10 14:08 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Stefan Priebe - Profihost AG @ 2011-01-10 14:01 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel >> is it a workaround to disable CONFIG_GROUP_SCHED? > ^^^^^^^^^^^^^^^^^^ > CONFIG_USER_SCHED, not CONFIG_GROUP_SCHED. correct but CONFIG_USER_SCHED depends on CONFIG_GROUP_SCHED and as we don't use CONFIG_GROUP_SCHED at all my idea was to completely remove this feature. > I think that disabling it should obviously help. But it is > possible that there is something else. Given that you can > reproduce the crash, it would be nice if you can test this > and report ;) Main problem - right i can't reproduce it. But the system was crashing two times more in the past. You can find the logs here: http://pastebin.com/hNd15E8K > Or. You can revert > b00bc0b237055b4c45816325ee14f0bd83e6f590 > uids: Prevent tear down race > > 3959214f971417f4162926ac52ad4cd042958caa > sched: delayed cleanup of user_struct Not sure if it is a good idea to introduce another race again. Stefan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 14:01 ` Stefan Priebe - Profihost AG @ 2011-01-10 14:08 ` Oleg Nesterov 2011-01-10 14:23 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2011-01-10 14:08 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel On 01/10, Stefan Priebe - Profihost AG wrote: > > Main problem - right i can't reproduce it. But the system was crashing > two times more in the past. > > You can find the logs here: > http://pastebin.com/hNd15E8K Good. Sadly, I didn't see this trace before... kernel BUG at kernel/workqueue.c:287! this matches "INIT_DELAYED_WORK() can corrupt the pending work" I mentioned. >> Or. You can revert >> b00bc0b237055b4c45816325ee14f0bd83e6f590 >> uids: Prevent tear down race >> >> 3959214f971417f4162926ac52ad4cd042958caa >> sched: delayed cleanup of user_struct > Not sure if it is a good idea to introduce another race again. Which race? b00bc0b237055b4c45816325ee14f0bd83e6f590 fixes the race, yes. But this race was introduced by 3959214f971417f4162926ac52ad4cd042958caa. And afaics the latter commit is optimization... Anyway, this is up to you, of course. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 14:08 ` Oleg Nesterov @ 2011-01-10 14:23 ` Stefan Priebe - Profihost AG 2011-01-10 14:19 ` Oleg Nesterov 2011-01-10 18:23 ` Greg KH 0 siblings, 2 replies; 10+ messages in thread From: Stefan Priebe - Profihost AG @ 2011-01-10 14:23 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel >> You can find the logs here: >> http://pastebin.com/hNd15E8K > > Good. Sadly, I didn't see this trace before... > > kernel BUG at kernel/workqueue.c:287! > > this matches "INIT_DELAYED_WORK() can corrupt the pending work" I > mentioned. Sorry to ask this but this could also be "workarounded" by disabled GROUP_SCHED? Or is this another thing? >>> Or. You can revert >>> b00bc0b237055b4c45816325ee14f0bd83e6f590 >>> uids: Prevent tear down race >>> >>> 3959214f971417f4162926ac52ad4cd042958caa >>> sched: delayed cleanup of user_struct >> Not sure if it is a good idea to introduce another race again. > > Which race? b00bc0b237055b4c45816325ee14f0bd83e6f590 fixes the race, yes. > But this race was introduced by 3959214f971417f4162926ac52ad4cd042958caa. > And afaics the latter commit is optimization... > > Anyway, this is up to you, of course. ah OK sorry didn't get this. Will wait if greg or anybody else can tell us something about it. Thanks for your findings and help! ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 14:23 ` Stefan Priebe - Profihost AG @ 2011-01-10 14:19 ` Oleg Nesterov 2011-01-10 17:07 ` Mike Galbraith 2011-01-10 18:23 ` Greg KH 1 sibling, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2011-01-10 14:19 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel On 01/10, Stefan Priebe - Profihost AG wrote: > >>> You can find the logs here: >>> http://pastebin.com/hNd15E8K >> >> Good. Sadly, I didn't see this trace before... >> >> kernel BUG at kernel/workqueue.c:287! >> >> this matches "INIT_DELAYED_WORK() can corrupt the pending work" I >> mentioned. > Sorry to ask this but this could also be "workarounded" by disabled > GROUP_SCHED? Or is this another thing? Yes, this looks like the same thing. But let me repeat, today I looked at this code for the first time ;) > Will wait if greg or anybody else can tell > us something about it. Indeed. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 14:19 ` Oleg Nesterov @ 2011-01-10 17:07 ` Mike Galbraith 0 siblings, 0 replies; 10+ messages in thread From: Mike Galbraith @ 2011-01-10 17:07 UTC (permalink / raw) To: Oleg Nesterov Cc: Stefan Priebe - Profihost AG, Kay Sievers, Dhaval Giani, Greg Kroah-Hartman, linux-kernel, Peter Zijlstra On Mon, 2011-01-10 at 15:19 +0100, Oleg Nesterov wrote: > On 01/10, Stefan Priebe - Profihost AG wrote: > > > >>> You can find the logs here: > >>> http://pastebin.com/hNd15E8K > >> > >> Good. Sadly, I didn't see this trace before... > >> > >> kernel BUG at kernel/workqueue.c:287! > >> > >> this matches "INIT_DELAYED_WORK() can corrupt the pending work" I > >> mentioned. > > Sorry to ask this but this could also be "workarounded" by disabled > > GROUP_SCHED? Or is this another thing? > > Yes, this looks like the same thing. > > But let me repeat, today I looked at this code for the first time ;) > > > Will wait if greg or anybody else can tell > > us something about it. > > Indeed. I'd be inclined to just shoot USER_SCHED, since it's dead upstream. I'm putting a small backport series together, and could add that to the mix. -Mike ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 14:23 ` Stefan Priebe - Profihost AG 2011-01-10 14:19 ` Oleg Nesterov @ 2011-01-10 18:23 ` Greg KH 2011-01-10 18:43 ` Oleg Nesterov 1 sibling, 1 reply; 10+ messages in thread From: Greg KH @ 2011-01-10 18:23 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Oleg Nesterov, Kay Sievers, Dhaval Giani, linux-kernel On Mon, Jan 10, 2011 at 03:23:20PM +0100, Stefan Priebe - Profihost AG wrote: > >>You can find the logs here: > >>http://pastebin.com/hNd15E8K > > > >Good. Sadly, I didn't see this trace before... > > > > kernel BUG at kernel/workqueue.c:287! > > > >this matches "INIT_DELAYED_WORK() can corrupt the pending work" I > >mentioned. > Sorry to ask this but this could also be "workarounded" by disabled > GROUP_SCHED? Or is this another thing? > > >>>Or. You can revert > >>> b00bc0b237055b4c45816325ee14f0bd83e6f590 > >>> uids: Prevent tear down race > >>> > >>> 3959214f971417f4162926ac52ad4cd042958caa > >>> sched: delayed cleanup of user_struct > >>Not sure if it is a good idea to introduce another race again. > > > >Which race? b00bc0b237055b4c45816325ee14f0bd83e6f590 fixes the race, yes. > >But this race was introduced by 3959214f971417f4162926ac52ad4cd042958caa. > >And afaics the latter commit is optimization... > > > >Anyway, this is up to you, of course. > ah OK sorry didn't get this. Will wait if greg or anybody else can > tell us something about it. Tell us about what? I'm totally confused here, what is the problem? thanks, greg k-h ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct 2011-01-10 18:23 ` Greg KH @ 2011-01-10 18:43 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2011-01-10 18:43 UTC (permalink / raw) To: Greg KH Cc: Stefan Priebe - Profihost AG, Kay Sievers, Dhaval Giani, linux-kernel, Mike Galbraith On 01/10, Greg KH wrote: > > Tell us about what? I'm totally confused here, what is the problem? Please look at http://marc.info/?l=linux-kernel&m=129466345327931 I think that free_user's delayed logic is buggy, and the traces from Stefan seems to confirm the theory. As Mike suggests, we can just remove the deprecated USER_SCHED code and forget about this problem. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-01-10 18:50 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4D2AB8F7.7030506@profihost.ag>
[not found] ` <20110110114913.GA22298@redhat.com>
2011-01-10 12:36 ` [BUG stable, 2.6.32.27] sched: delayed cleanup of user_struct Oleg Nesterov
2011-01-10 12:46 ` Stefan Priebe - Profihost AG
2011-01-10 13:35 ` Oleg Nesterov
2011-01-10 14:01 ` Stefan Priebe - Profihost AG
2011-01-10 14:08 ` Oleg Nesterov
2011-01-10 14:23 ` Stefan Priebe - Profihost AG
2011-01-10 14:19 ` Oleg Nesterov
2011-01-10 17:07 ` Mike Galbraith
2011-01-10 18:23 ` Greg KH
2011-01-10 18:43 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox