From: Tejun Heo <tj@kernel.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Greg KH <gregkh@suse.de>, Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
Andi Kleen <ak@linux.intel.com>,
pm list <linux-pm@lists.linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@elte.hu>
Subject: Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume
Date: Fri, 13 Nov 2009 04:14:30 +0900 [thread overview]
Message-ID: <4AFC5E96.1020609@kernel.org> (raw)
In-Reply-To: <20091112183508.GA14661@redhat.com>
Hello, Oleg.
11/13/2009 03:35 AM, Oleg Nesterov wrote:
>> That looks like an excellent idea and I don't think it will add
>> noticeable overhead even done by default and it will magically make
>> the "how to implement single-threaded wq semantics in conccurrency
>> managed wq" problem go away. I'll work on it.
>
> I am still not sure all work_structs should single-threaded by default.
>
> To clarify, I am not arguing. Just I don't know. I mean, this change can
> break the existing code, and it is not easy to notice the problem.
Hmm... I can't think of something which will break if single thread
(or execution instance) is enforced. Are you worrying about ignoring
cpu designation? I'm still working on it but it seems possible to
honor cpu parameter and enforce single thread.
>> making flush_work() behave as
>> flush_work_sync() by default should be doable without too much
>> overhead. I'll give it a shot.
>
> Well, I disagree. Imho it is better to have both flush_work() and
> flush_work_sync(). flush_work() is "special" and should be used with
> care. But this is minor, and if the work_stuct is single-threaded then
> flush_work() == flush_work_sync().
>
> (Perhaps this is what you meant)
Yeap, that was what I meant. :-)
> My only point was, it is not that workqueues are buggy, they were
> designed (and even documented) to work this way. I can't judge if it
> was right or not, but personally I think everything is "logical".
Yes, it's very focused on lowering cross-cpu overhead whereever
possible. I think the one design decision which added most to
complexity and subtlety was allowing work functions to free or reuse
work structs leaving the workqueue code only the pointer value to work
with for synchronization. Maybe it's worth it. I don't know. At any
rate changing it would be too costly at this point.
> That said, I agree that we have too many buggy users, perhaps we
> should simplify the rules.
>
> I just noticed that schedule_on_each_cpu() was recently changed by
>
> HWPOISON: Allow schedule_on_each_cpu() from keventd
> commit: 65a64464349883891e21e74af16c05d6e1eeb4e9
>
> Surprisingly, even this simple change is not exactly right.
>
> /*
> * when running in keventd don't schedule a work item on itself.
> * Can just call directly because the work queue is already bound.
> * This also is faster.
> * Make this a generic parameter for other workqueues?
> */
> if (current_is_keventd()) {
> orig = raw_smp_processor_id();
> INIT_WORK(per_cpu_ptr(works, orig), func);
> func(per_cpu_ptr(works, orig));
> }
>
> OK, but this code should be moved down, under get_online_cpus().
Yeap, I already have the patch in my queue. It will be going out in a
few days.
> Perhaps you and Linus are right, and we should simplify the rules
> unconditionally. But note that the problem above has nothing to do with
> single-threaded behaviour, and I do not think it is possible to guarantee
> work->func() can't be moved to another CPU.
I don't know about Linus but I definitely would like default single
threaded behavior. Currently, singlethread workqueue is used for two
things - to limit the number of workers and to avoid works executing
simultaneously on different cpus but I don't think all users are
careful enough about the second point.
Thanks.
--
tejun
next prev parent reply other threads:[~2009-11-12 19:14 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-09 11:50 Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd Rafael J. Wysocki
2009-11-09 12:02 ` Ingo Molnar
[not found] ` <20091109120217.GB18399@elte.hu>
2009-11-09 12:24 ` Rafael J. Wysocki
[not found] ` <200911091324.37955.rjw@sisk.pl>
2009-11-09 12:49 ` Ingo Molnar
2009-11-09 14:02 ` Thomas Gleixner
[not found] ` <alpine.LFD.2.00.0911091452300.2725@localhost.localdomain>
2009-11-09 14:16 ` Mike Galbraith
2009-11-09 14:26 ` Rafael J. Wysocki
[not found] ` <1257776176.6365.8.camel@marge.simson.net>
2009-11-09 14:27 ` Rafael J. Wysocki
2009-11-09 14:30 ` Mike Galbraith
[not found] ` <1257777040.6365.15.camel@marge.simson.net>
2009-11-09 15:47 ` Rafael J. Wysocki
[not found] ` <200911091647.54171.rjw@sisk.pl>
2009-11-09 16:19 ` Mike Galbraith
[not found] ` <1257783553.6408.6.camel@marge.simson.net>
2009-11-09 17:36 ` Rafael J. Wysocki
[not found] ` <200911091836.30349.rjw@sisk.pl>
2009-11-09 18:50 ` Thomas Gleixner
2009-11-09 19:13 ` Thomas Gleixner
[not found] ` <alpine.LFD.2.00.0911091949500.2725@localhost.localdomain>
2009-11-09 20:00 ` Rafael J. Wysocki
[not found] ` <200911092100.58187.rjw@sisk.pl>
2009-11-09 20:31 ` Alan Stern
2009-11-09 20:45 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
[not found] ` <200911092145.27485.rjw@sisk.pl>
2009-11-09 21:42 ` Linus Torvalds
2009-11-10 0:19 ` Rafael J. Wysocki
[not found] ` <200911100119.38019.rjw@sisk.pl>
2009-11-10 22:02 ` Linus Torvalds
2009-11-11 8:08 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Tejun Heo
2009-11-11 11:52 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
2009-11-11 16:13 ` Oleg Nesterov
2009-11-11 17:17 ` Oleg Nesterov
[not found] ` <4AFA70FB.60706@kernel.org>
2009-11-11 18:13 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Oleg Nesterov
[not found] ` <20091111181349.GA30638@redhat.com>
2009-11-12 4:56 ` Tejun Heo
[not found] ` <4AFB9595.1030002@kernel.org>
2009-11-12 18:35 ` Oleg Nesterov
[not found] ` <20091112183508.GA14661@redhat.com>
2009-11-12 19:14 ` Tejun Heo [this message]
2009-11-16 11:01 ` Tejun Heo
[not found] ` <200911111252.48214.rjw@sisk.pl>
2009-11-11 19:52 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Linus Torvalds
2009-11-11 20:18 ` Marcel Holtmann
[not found] ` <1257970719.21467.4.camel@violet>
2009-11-11 20:25 ` Linus Torvalds
2009-11-11 21:18 ` Rafael J. Wysocki
2009-11-11 21:13 ` Oliver Neukum
[not found] ` <200911112213.30673.oliver@neukum.org>
2009-11-11 21:38 ` Linus Torvalds
2009-11-11 21:44 ` Oliver Neukum
[not found] ` <20091111161348.GA27394@redhat.com>
2009-11-11 20:00 ` Rafael J. Wysocki
[not found] ` <200911112100.16561.rjw@sisk.pl>
2009-11-11 20:11 ` Linus Torvalds
[not found] ` <alpine.LFD.2.01.0911111202490.31845@localhost.localdomain>
2009-11-11 20:20 ` Marcel Holtmann
2009-11-11 20:24 ` Oleg Nesterov
[not found] ` <20091111202433.GA5714@redhat.com>
2009-11-11 21:15 ` Oliver Neukum
[not found] ` <20091111171703.GA28506@redhat.com>
2009-11-12 17:33 ` Thomas Gleixner
[not found] ` <alpine.LFD.2.00.0911121821370.24119@localhost.localdomain>
2009-11-12 19:17 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Tejun Heo
2009-11-12 20:53 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Rafael J. Wysocki
[not found] ` <4AFC5F2E.9040709@kernel.org>
2009-11-12 20:53 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume Thomas Gleixner
[not found] ` <200911122153.12419.rjw@sisk.pl>
2009-11-12 20:55 ` GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd) Thomas Gleixner
[not found] ` <alpine.LFD.2.00.0911122153320.24119@localhost.localdomain>
2009-11-12 22:55 ` Rafael J. Wysocki
[not found] ` <200911122355.11675.rjw@sisk.pl>
2009-11-12 23:08 ` Thomas Gleixner
2009-11-15 23:37 ` Frederic Weisbecker
[not found] ` <20091115233703.GA6090@nowhere>
2009-11-15 23:40 ` Frederic Weisbecker
[not found] ` <alpine.LFD.2.00.0911092012340.2725@localhost.localdomain>
2009-11-09 20:03 ` Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd Rafael J. Wysocki
[not found] ` <200911091526.02147.rjw@sisk.pl>
2009-11-09 14:44 ` Mike Galbraith
[not found] ` <1257777896.6365.21.camel@marge.simson.net>
2009-11-09 15:47 ` Rafael J. Wysocki
2009-11-09 15:57 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AFC5E96.1020609@kernel.org \
--to=tj@kernel.org \
--cc=ak@linux.intel.com \
--cc=efault@gmx.de \
--cc=gregkh@suse.de \
--cc=jbarnes@virtuousgeek.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox