* [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key()
@ 2008-07-28 20:41 Dmitry Adamushko
2008-07-29 11:02 ` Oleg Nesterov
2008-07-29 16:22 ` [PATCH] workqueues: add comments to __create_workqueue_key() Oleg Nesterov
0 siblings, 2 replies; 9+ messages in thread
From: Dmitry Adamushko @ 2008-07-28 20:41 UTC (permalink / raw)
To: linux-kernel; +Cc: Ingo Molnar, Oleg Nesterov
I guess error handling is a bit illogical in __create_workqueue_key().
Although, it won't cause any real problems.
---
From: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Subject: workqueue: consistently use 'err' in __create_workqueue_key()
Fix the logic behind the use of 'err' in the 'for_each_posible_cpu()' loop
of __create_workqueue_key().
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ec7e4f6..738bf05 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -827,7 +827,8 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
if (singlethread) {
cwq = init_cpu_workqueue(wq, singlethread_cpu);
err = create_workqueue_thread(cwq, singlethread_cpu);
- start_workqueue_thread(cwq, -1);
+ if (!err)
+ start_workqueue_thread(cwq, -1);
} else {
cpu_maps_update_begin();
spin_lock(&workqueue_lock);
@@ -836,9 +837,11 @@ struct workqueue_struct *__create_workqueue_key(const char *name,
for_each_possible_cpu(cpu) {
cwq = init_cpu_workqueue(wq, cpu);
- if (err || !cpu_online(cpu))
+ if (!cpu_online(cpu))
continue;
err = create_workqueue_thread(cwq, cpu);
+ if (err)
+ break;
start_workqueue_thread(cwq, cpu);
}
cpu_maps_update_done();
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-28 20:41 [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() Dmitry Adamushko @ 2008-07-29 11:02 ` Oleg Nesterov 2008-07-29 11:58 ` Dmitry Adamushko 2008-07-29 16:22 ` [PATCH] workqueues: add comments to __create_workqueue_key() Oleg Nesterov 1 sibling, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2008-07-29 11:02 UTC (permalink / raw) To: Dmitry Adamushko; +Cc: linux-kernel, Ingo Molnar On 07/28, Dmitry Adamushko wrote: > > I guess error handling is a bit illogical in __create_workqueue_key() Please see below, > for_each_possible_cpu(cpu) { > cwq = init_cpu_workqueue(wq, cpu); > - if (err || !cpu_online(cpu)) > + if (!cpu_online(cpu)) > continue; > err = create_workqueue_thread(cwq, cpu); > + if (err) > + break; This was done on purpose. The code above does init_cpu_workqueue(cpu) for each possible cpu, even if we fail to create cwq->thread for some cpu. This way destroy_workqueue() (called below) shouldn't worry about the partially initialized workqueues. The patch above should work, but it assumes that destroy_workqueue() must do nothing with cwq if cwq->thread == NULL, this is not very robust. And, more importantly. Let's suppose __create_workqueue_key() does "break" and drops cpu_add_remove_lock. Then we race with cpu-hotplug which can hit the uninitialized cwq. This is fixable, but needs other complication. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 11:02 ` Oleg Nesterov @ 2008-07-29 11:58 ` Dmitry Adamushko 2008-07-29 12:52 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Dmitry Adamushko @ 2008-07-29 11:58 UTC (permalink / raw) To: Oleg Nesterov; +Cc: linux-kernel, Ingo Molnar 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: > On 07/28, Dmitry Adamushko wrote: >> >> I guess error handling is a bit illogical in __create_workqueue_key() > > Please see below, > >> for_each_possible_cpu(cpu) { >> cwq = init_cpu_workqueue(wq, cpu); >> - if (err || !cpu_online(cpu)) >> + if (!cpu_online(cpu)) >> continue; >> err = create_workqueue_thread(cwq, cpu); >> + if (err) >> + break; > > This was done on purpose. The code above does init_cpu_workqueue(cpu) > for each possible cpu, even if we fail to create cwq->thread for some > cpu. This way destroy_workqueue() (called below) shouldn't worry about > the partially initialized workqueues. > > The patch above should work, but it assumes that destroy_workqueue() > must do nothing with cwq if cwq->thread == NULL, this is not very > robust. Yes, I saw this test and that's why I decided that destroy_workqueue() is able (designed) to deal with partially-initialized objects. Note, for the race scenario with cpu-hotplug (which I've overlooked indeed) which you describe below, we also seem to depend on the same "cwq->thread == NULL" test in cleanup_workqueue_thread() as follows: assume, cpu_down(cpu) -> CPU_POST_DEAD -> cleanup_workqueue_thread() gets called for a partially initialized workqueue for 'cpu' for which create_workqueue_thread() has previously failed in create_worqueue_key(). > > And, more importantly. Let's suppose __create_workqueue_key() does > "break" and drops cpu_add_remove_lock. Then we race with cpu-hotplug > which can hit the uninitialized cwq. This is fixable, but needs other > complication. And I'd say this behavior (of having a partially-created object visible to the outside world) is not that robust. e.g. the aforementioned race would be eliminated if we place a wq on the global list only when it's been successfully initialized. For this goal, the cleanup path in __create_workqueue_key() would need to be altered but overall, I think it'd make the code a bit more straightforward. [ just my 0.02, maybe I'm missing something again ;-) ] > > Oleg. > -- Best regards, Dmitry Adamushko ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 11:58 ` Dmitry Adamushko @ 2008-07-29 12:52 ` Oleg Nesterov 2008-07-29 13:44 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2008-07-29 12:52 UTC (permalink / raw) To: Dmitry Adamushko; +Cc: linux-kernel, Ingo Molnar On 07/29, Dmitry Adamushko wrote: > > 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: > > On 07/28, Dmitry Adamushko wrote: > >> > >> I guess error handling is a bit illogical in __create_workqueue_key() > > > > Please see below, > > > >> for_each_possible_cpu(cpu) { > >> cwq = init_cpu_workqueue(wq, cpu); > >> - if (err || !cpu_online(cpu)) > >> + if (!cpu_online(cpu)) > >> continue; > >> err = create_workqueue_thread(cwq, cpu); > >> + if (err) > >> + break; > > > > This was done on purpose. The code above does init_cpu_workqueue(cpu) > > for each possible cpu, even if we fail to create cwq->thread for some > > cpu. This way destroy_workqueue() (called below) shouldn't worry about > > the partially initialized workqueues. > > > > The patch above should work, but it assumes that destroy_workqueue() > > must do nothing with cwq if cwq->thread == NULL, this is not very > > robust. > > Yes, I saw this test and that's why I decided that destroy_workqueue() > is able (designed) to deal with partially-initialized objects. No, no. cwq->thread == NULL just means that it has no ->thread and nothing more, it does not mean cwq was not initialized, see below. > Note, for the race scenario with cpu-hotplug (which I've overlooked > indeed) which you describe below, we also seem to depend on the same > "cwq->thread == NULL" test in cleanup_workqueue_thread() as follows: > > assume, cpu_down(cpu) -> CPU_POST_DEAD -> cleanup_workqueue_thread() > gets called for a partially initialized workqueue for 'cpu' for which > create_workqueue_thread() has previously failed in > create_worqueue_key(). Well, it _is_ initialized, but yes cwq->thread can be NULL, > > > > And, more importantly. Let's suppose __create_workqueue_key() does > > "break" and drops cpu_add_remove_lock. Then we race with cpu-hotplug > > which can hit the uninitialized cwq. This is fixable, but needs other > > complication. > > And I'd say this behavior (of having a partially-created object > visible to the outside world) is not that robust. e.g. the > aforementioned race would be eliminated if we place a wq on the global > list only when it's been successfully initialized. Note that start_workqueue_thread() and cleanup_workqueue_thread() has to check cwq->thread != NULL anyway, suppose that CPU_UP_PREPARE fails. Yes, we can change __create_workqueue_key() to check err == 0 before list_add(), but this just adds more checks without any gain. Note also that in fact it is better to do start_workqueue_thread() even if create_workqueue_thread(). This doesn't matter with the current implementation, but start_workqueue_thread() ensures that cwq->thread can be kthread_stop()'ed, and start_workqueue_thread() can be changed so it can fail even if kthread_create() succeeds. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 12:52 ` Oleg Nesterov @ 2008-07-29 13:44 ` Oleg Nesterov 2008-07-29 14:20 ` Dmitry Adamushko 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2008-07-29 13:44 UTC (permalink / raw) To: Dmitry Adamushko; +Cc: linux-kernel, Ingo Molnar On 07/29, Oleg Nesterov wrote: > > On 07/29, Dmitry Adamushko wrote: > > > > And I'd say this behavior (of having a partially-created object > > visible to the outside world) is not that robust. e.g. the > > aforementioned race would be eliminated if we place a wq on the global > > list only when it's been successfully initialized. > > Yes, we can change __create_workqueue_key() to check err == 0 before > list_add(), Well no, we can't do even this. Then we have another race with cpu-hotplug. Suppose we have CPUs 0, 1, 2. create_workqueue() fails to create cwq->thread for CPU 2 and calls destroy_workqueue(). Before it takes the cpu_add_remove_lock, _cpu_down() removes CPU 1 from cpu_populated_map, but since we didn't add this wq on the global list, cwq[1]->thread remains alive. destroy_workqueue() takes cpu_add_remove_lock, and calls cleanup_workqueue_thread() for CPUs 0 and 2. cwq[1]->thread is lost. Damn. I had this in mind when I wrote the code, but forgot. We need comments, I'll send the patch. Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 13:44 ` Oleg Nesterov @ 2008-07-29 14:20 ` Dmitry Adamushko 2008-07-29 16:13 ` Oleg Nesterov 0 siblings, 1 reply; 9+ messages in thread From: Dmitry Adamushko @ 2008-07-29 14:20 UTC (permalink / raw) To: Oleg Nesterov; +Cc: linux-kernel, Ingo Molnar 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: > On 07/29, Oleg Nesterov wrote: >> >> On 07/29, Dmitry Adamushko wrote: >> > >> > And I'd say this behavior (of having a partially-created object >> > visible to the outside world) is not that robust. e.g. the >> > aforementioned race would be eliminated if we place a wq on the global >> > list only when it's been successfully initialized. >> >> Yes, we can change __create_workqueue_key() to check err == 0 before >> list_add(), > > Well no, we can't do even this. > > Then we have another race with cpu-hotplug. Suppose we have CPUs 0, 1, 2. > create_workqueue() fails to create cwq->thread for CPU 2 and calls > destroy_workqueue(). Before it takes the cpu_add_remove_lock, _cpu_down() > removes CPU 1 from cpu_populated_map, but since we didn't add this wq > on the global list, cwq[1]->thread remains alive. > > destroy_workqueue() takes cpu_add_remove_lock, and calls > cleanup_workqueue_thread() for CPUs 0 and 2. cwq[1]->thread is lost. Yes, I've actually seen this case and that's why I said "the cleanup path in __create_workqueue_key() would need to be altered" :-) likely, to the extent that it would not be a call to destroy_workqueue() anymore. either something that only does for_each_cpu_mask_nr(cpu, *cpu_map) cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); and from the _same_ 'cpu_add_remove_lock' section which is used to create a wq (so we don't drop a lock); _or_ do it outside of the locked section _but_ don't rely on for_each_cpu_mask_nr(cpu, *cpu_map)... e.g. just delete all per-cpu wq->cpu_wq structures that have been initialized (that's no matter if their respective cpus are online/offline now). yes, maybe this cleanup path would not look all that fancy (but I didn't try) but I do think that by not exposing "partially-initialized object to the outside world" (e.g. cpu-hotplug events won't see them) this code would become more straightforward and less prone to possible errors/races. e.g. all these "create_workqueue_key() may race with cpu-hotplug" would be gone. -- Best regards, Dmitry Adamushko ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 14:20 ` Dmitry Adamushko @ 2008-07-29 16:13 ` Oleg Nesterov 2008-07-29 16:52 ` Dmitry Adamushko 0 siblings, 1 reply; 9+ messages in thread From: Oleg Nesterov @ 2008-07-29 16:13 UTC (permalink / raw) To: Dmitry Adamushko; +Cc: linux-kernel, Ingo Molnar On 07/29, Dmitry Adamushko wrote: > > 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: > > On 07/29, Oleg Nesterov wrote: > >> > >> On 07/29, Dmitry Adamushko wrote: > >> > > >> > And I'd say this behavior (of having a partially-created object > >> > visible to the outside world) is not that robust. e.g. the > >> > aforementioned race would be eliminated if we place a wq on the global > >> > list only when it's been successfully initialized. > >> > >> Yes, we can change __create_workqueue_key() to check err == 0 before > >> list_add(), > > > > Well no, we can't do even this. > > > > Then we have another race with cpu-hotplug. Suppose we have CPUs 0, 1, 2. > > create_workqueue() fails to create cwq->thread for CPU 2 and calls > > destroy_workqueue(). Before it takes the cpu_add_remove_lock, _cpu_down() > > removes CPU 1 from cpu_populated_map, but since we didn't add this wq > > on the global list, cwq[1]->thread remains alive. > > > > destroy_workqueue() takes cpu_add_remove_lock, and calls > > cleanup_workqueue_thread() for CPUs 0 and 2. cwq[1]->thread is lost. > > Yes, I've actually seen this case and that's why I said "the cleanup > path in __create_workqueue_key() would need > to be altered" :-) likely, to the extent that it would not be a call > to destroy_workqueue() anymore. > > either something that only does > > for_each_cpu_mask_nr(cpu, *cpu_map) > cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); > > > and from the _same_ 'cpu_add_remove_lock' section which is used to > create a wq (so we don't drop a lock); Why should we duplicate the code? > _or_ do it outside of the locked section _but_ don't rely on > for_each_cpu_mask_nr(cpu, *cpu_map)... e.g. just delete all per-cpu > wq->cpu_wq structures that have been initialized (that's no matter if > their respective cpus are online/offline now). Yes. And this means we change the code to handle another special case: destroy() is called by create(). Why? > yes, maybe this cleanup path would not look all that fancy (but I > didn't try) but I do think that by not exposing "partially-initialized > object to the outside world" (e.g. cpu-hotplug events won't see them) > this code would become more straightforward and less prone to possible > errors/races. > > e.g. all these "create_workqueue_key() may race with cpu-hotplug" would be gone. Once again, from my pov wq is fully initialized. Yes, cwq->thread can be NULL or not, and this doesn't necessary match cpu_online_map. This is normal, for example CPU_POST_DEAD runs when CPU doesn't exists, but cwq[CPU]->thread is alive. With the current code we just have no special cases. I do not see why create_workqueue_key()->destroy_workqueue() should be special. However, I don't claim you are wrong. I think this all is a matter of taste. And yes I agree, without the comments the current code is not immediately obvious, this probably indicates that my taste is not good and you are right ;) Oleg. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() 2008-07-29 16:13 ` Oleg Nesterov @ 2008-07-29 16:52 ` Dmitry Adamushko 0 siblings, 0 replies; 9+ messages in thread From: Dmitry Adamushko @ 2008-07-29 16:52 UTC (permalink / raw) To: Oleg Nesterov; +Cc: linux-kernel, Ingo Molnar 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: > On 07/29, Dmitry Adamushko wrote: >> >> 2008/7/29 Oleg Nesterov <oleg@tv-sign.ru>: >> > On 07/29, Oleg Nesterov wrote: >> >> >> >> On 07/29, Dmitry Adamushko wrote: >> >> > >> >> > And I'd say this behavior (of having a partially-created object >> >> > visible to the outside world) is not that robust. e.g. the >> >> > aforementioned race would be eliminated if we place a wq on the global >> >> > list only when it's been successfully initialized. >> >> >> >> Yes, we can change __create_workqueue_key() to check err == 0 before >> >> list_add(), >> > >> > Well no, we can't do even this. >> > >> > Then we have another race with cpu-hotplug. Suppose we have CPUs 0, 1, 2. >> > create_workqueue() fails to create cwq->thread for CPU 2 and calls >> > destroy_workqueue(). Before it takes the cpu_add_remove_lock, _cpu_down() >> > removes CPU 1 from cpu_populated_map, but since we didn't add this wq >> > on the global list, cwq[1]->thread remains alive. >> > >> > destroy_workqueue() takes cpu_add_remove_lock, and calls >> > cleanup_workqueue_thread() for CPUs 0 and 2. cwq[1]->thread is lost. >> >> Yes, I've actually seen this case and that's why I said "the cleanup >> path in __create_workqueue_key() would need >> to be altered" :-) likely, to the extent that it would not be a call >> to destroy_workqueue() anymore. >> >> either something that only does >> >> for_each_cpu_mask_nr(cpu, *cpu_map) >> cleanup_workqueue_thread(per_cpu_ptr(wq->cpu_wq, cpu)); >> >> >> and from the _same_ 'cpu_add_remove_lock' section which is used to >> create a wq (so we don't drop a lock); > > Why should we duplicate the code? > >> _or_ do it outside of the locked section _but_ don't rely on >> for_each_cpu_mask_nr(cpu, *cpu_map)... e.g. just delete all per-cpu >> wq->cpu_wq structures that have been initialized (that's no matter if >> their respective cpus are online/offline now). > > Yes. And this means we change the code to handle another special case: > destroy() is called by create(). Why? > >> yes, maybe this cleanup path would not look all that fancy (but I >> didn't try) but I do think that by not exposing "partially-initialized >> object to the outside world" (e.g. cpu-hotplug events won't see them) >> this code would become more straightforward and less prone to possible >> errors/races. >> >> e.g. all these "create_workqueue_key() may race with cpu-hotplug" would be gone. > > Once again, from my pov wq is fully initialized. Yes, cwq->thread can > be NULL or not, and this doesn't necessary match cpu_online_map. This > is normal, for example CPU_POST_DEAD runs when CPU doesn't exists, but > cwq[CPU]->thread is alive. > > With the current code we just have no special cases. > I do not see > why create_workqueue_key()->destroy_workqueue() should be special. heh, any particular implementation is secondary. My point was about logic/easy-to-analyse/less-prone-to-bugs-races thing (which can be quite subjective indeed). (ok, I'm repeating myself last time and then do go away and shut up [chorus] yeah, go-go-go!!! [/chorus] :^)) >From my pov, if we fail in create_workqueue_key(), this wq is _not_ fully initialized, after all we are about to destroy it and report a failure to user (yeah, and here out tastes seem to be incompatible ;-) >From my pov, create_workqueue_key() is a bit illogical. (we did fail to create a wq from the pov of user, so why cpu-hotplug is allowed to mess with it while we are in-between 2 locked-sections in create_workqueue_key()? -- that's just a door for possible fancy bug/races, imho). All in all, I think that "as is" now it's more _prone_ to potential bugs/races and it's harder to analyse (hey, you've said "Damn. I had this in mind when I wrote the code, but forgot" ;-) anyway, by no means I say my taste is better or whatever. Just an opinion (moreover, of a person who has nothing to do with workqueue's code so it can be happily ignored ;-) > > Oleg. > -- Best regards, Dmitry Adamushko ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] workqueues: add comments to __create_workqueue_key() 2008-07-28 20:41 [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() Dmitry Adamushko 2008-07-29 11:02 ` Oleg Nesterov @ 2008-07-29 16:22 ` Oleg Nesterov 1 sibling, 0 replies; 9+ messages in thread From: Oleg Nesterov @ 2008-07-29 16:22 UTC (permalink / raw) To: Andrew Morton, Dmitry Adamushko; +Cc: linux-kernel, Ingo Molnar Dmitry Adamushko pointed out that the error handling in __create_workqueue_key() is not clear, add the comment. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> --- LINUS/kernel/workqueue.c~ 2008-07-29 17:59:48.000000000 +0400 +++ LINUS/kernel/workqueue.c 2008-07-29 18:41:06.000000000 +0400 @@ -830,10 +830,21 @@ struct workqueue_struct *__create_workqu start_workqueue_thread(cwq, -1); } else { cpu_maps_update_begin(); + /* + * We must place this wq on list even if the code below fails. + * cpu_down(cpu) can remove cpu from cpu_populated_map before + * destroy_workqueue() takes the lock, in that case we leak + * cwq[cpu]->thread. + */ spin_lock(&workqueue_lock); list_add(&wq->list, &workqueues); spin_unlock(&workqueue_lock); - + /* + * We must initialize cwqs for each possible cpu even if we + * are going to call destroy_workqueue() finally. Otherwise + * cpu_up() can hit the uninitialized cwq once we drop the + * lock. + */ for_each_possible_cpu(cpu) { cwq = init_cpu_workqueue(wq, cpu); if (err || !cpu_online(cpu)) ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2008-07-29 16:53 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-28 20:41 [patch, minor] workqueue: consistently use 'err' in __create_workqueue_key() Dmitry Adamushko 2008-07-29 11:02 ` Oleg Nesterov 2008-07-29 11:58 ` Dmitry Adamushko 2008-07-29 12:52 ` Oleg Nesterov 2008-07-29 13:44 ` Oleg Nesterov 2008-07-29 14:20 ` Dmitry Adamushko 2008-07-29 16:13 ` Oleg Nesterov 2008-07-29 16:52 ` Dmitry Adamushko 2008-07-29 16:22 ` [PATCH] workqueues: add comments to __create_workqueue_key() Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox