* [PATCHSET] workqueue, writeback: better worker information in task dumps
@ 2013-03-30 3:00 Tejun Heo
2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Tejun Heo @ 2013-03-30 3:00 UTC (permalink / raw)
To: axboe, akpm; +Cc: jack, david, linux-kernel
Hello,
One of the problems that arise when converting dedicated custom
threadpool to workqueue is that the shared worker pool used by
workqueue anonimizes each worker making it more difficult to identify
what the worker was doing on which target from the output of sysrq-t
or debug dump from oops, BUG() and friends.
For example, after writeback is converted to use workqueue instead of
priviate thread pool, there's no easy to tell which backing device a
writeback work item was working on at the time of task dump, which,
according to our writeback brethren, is important in tracking down
issues with a lot of mounted file systems on a lot of different
devices.
This patchset implements a way for a work function to mark its
execution instance so that task dump of the worker task includes
information to indicate what the work item was doing.
An example WARN dump would look like the following.
WARNING: at /work/os/work/fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
Modules linked in:
Pid: 28, comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 empty empty/S3992
Workqueue: writeback bdi_writeback_workfn (flush-8:16)
ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
Call Trace:
[<ffffffff81c61855>] dump_stack+0x19/0x1b
[<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
...
This patchset contains the following three patches.
0001-kthread-implement-probe_kthread_data.patch
0002-workqueue-include-workqueue-info-when-printing-debug.patch
0003-writeback-set-worker-desc-to-identify-writeback-work.patch
0001 adds a speculative accessor for kthread_data.
0002 implements worker description.
0003 updates writeback to set its flusher name as worker description.
This patchset is on top of
[1] workqueue: NUMA affinity for unbound workqueues
+ [2] writeback: convert writeback to unbound workqueue
+ [3] arch: unify task dump debug info
and available in the following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-better-dbg
Due to the dependencies, routing this set would be a bit complicated.
I think it would be best to first pull wq/for-3.10 to block tree and
apply workqueue conversion patchset. The patches in this series then
can go through -mm on top of all three pieces - workqueue, block and
archs.
diffstat follows. Thanks.
fs/fs-writeback.c | 1
include/linux/kthread.h | 1
include/linux/workqueue.h | 5 ++
kernel/kthread.c | 19 ++++++++++
kernel/sched/core.c | 1
kernel/workqueue.c | 79 ++++++++++++++++++++++++++++++++++++++++++++
kernel/workqueue_internal.h | 12 ++++++
lib/dump_stack.c | 2 +
8 files changed, 119 insertions(+), 1 deletion(-)
--
tejun
[1] http://thread.gmane.org/gmane.linux.kernel/1465626
[2] http://thread.gmane.org/gmane.linux.kernel/1453006
[3] https://lkml.org/lkml/2013/3/29/354
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH 1/3] kthread: implement probe_kthread_data() 2013-03-30 3:00 [PATCHSET] workqueue, writeback: better worker information in task dumps Tejun Heo @ 2013-03-30 3:00 ` Tejun Heo 2013-03-30 14:36 ` Oleg Nesterov 2013-04-03 9:26 ` Jan Kara 2013-03-30 3:00 ` [PATCH 2/3] workqueue: include workqueue info when printing debug dump of a worker task Tejun Heo 2013-03-30 3:00 ` [PATCH 3/3] writeback: set worker desc to identify writeback workers in task dumps Tejun Heo 2 siblings, 2 replies; 10+ messages in thread From: Tejun Heo @ 2013-03-30 3:00 UTC (permalink / raw) To: axboe, akpm; +Cc: jack, david, linux-kernel, Tejun Heo, Oleg Nesterov Implement probe_kthread_data() which returns kthread_data if accessible. The function is equivalent to kthread_data() except that the specified @task may not be a kthread or its vfork_done is already cleared rendering struct kthread inaccessible. In the former case, probe_kthread_data() may return any value. In the latter, NULL. This will be used to safely print debug information without affecting synchronization in the normal paths. Workqueue debug info printing on dump_stack() and friends will make use of it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> --- include/linux/kthread.h | 1 + kernel/kthread.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 8d81664..7dcef33 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -43,6 +43,7 @@ bool kthread_should_stop(void); bool kthread_should_park(void); bool kthread_freezable_should_stop(bool *was_frozen); void *kthread_data(struct task_struct *k); +void *probe_kthread_data(struct task_struct *k); int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); diff --git a/kernel/kthread.c b/kernel/kthread.c index a2fbbb7..31696c8 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -17,6 +17,7 @@ #include <linux/slab.h> #include <linux/freezer.h> #include <linux/ptrace.h> +#include <linux/uaccess.h> #include <trace/events/sched.h> static DEFINE_SPINLOCK(kthread_create_lock); @@ -122,6 +123,24 @@ void *kthread_data(struct task_struct *task) return to_kthread(task)->data; } +/** + * probe_kthread_data - speculatively version of kthread_data() + * @task: possible kthread task in question + * + * @task could be a kthread task. Return the data value specified when it + * was created if accessible. If @task isn't a kthread task or its data is + * inaccessible for any reason, %NULL is returned. This function requires + * that @task itself is safe to dereference. + */ +void *probe_kthread_data(struct task_struct *task) +{ + struct kthread *kthread = to_kthread(task); + void *data = NULL; + + probe_kernel_read(&data, &kthread->data, sizeof(data)); + return data; +} + static void __kthread_parkme(struct kthread *self) { __set_current_state(TASK_INTERRUPTIBLE); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kthread: implement probe_kthread_data() 2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo @ 2013-03-30 14:36 ` Oleg Nesterov 2013-03-30 16:17 ` Tejun Heo 2013-04-03 9:26 ` Jan Kara 1 sibling, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2013-03-30 14:36 UTC (permalink / raw) To: Tejun Heo; +Cc: axboe, akpm, jack, david, linux-kernel On 03/29, Tejun Heo wrote: > > Implement probe_kthread_data() which returns kthread_data if > accessible. The function is equivalent to kthread_data() except that > the specified @task may not be a kthread ^^^^^^^^^^^^^^^^^^^^^^^^^ This doesn't necessarily mean its ->vfork_done != NULL... but I guess the "false positive" is fine for your purpose. > or its vfork_done is already > cleared rendering struct kthread inaccessible. > ... > +/** > + * probe_kthread_data - speculatively version of kthread_data() > + * @task: possible kthread task in question > + * > + * @task could be a kthread task. Return the data value specified when it > + * was created if accessible. If @task isn't a kthread task or its data is > + * inaccessible for any reason, %NULL is returned. This function requires > + * that @task itself is safe to dereference. > + */ > +void *probe_kthread_data(struct task_struct *task) > +{ > + struct kthread *kthread = to_kthread(task); > + void *data = NULL; > + > + probe_kernel_read(&data, &kthread->data, sizeof(data)); > + return data; > +} OK, but we can simply check ->vfork_done != NULL ? We do not care if we race with the exiting thread which can clear its ->vfork_done. If it is safe to dereference this @task then kthread is also safe. kthread-introduce-to_live_kthread.patch in -mm adds the trivial helper, static inline struct kthread *to_kthread(struct task_struct *k) { return __to_kthread(k->vfork_done); } static struct kthread *to_live_kthread(struct task_struct *k) { struct completion *vfork = ACCESS_ONCE(k->vfork_done); if (likely(vfork)) return __to_kthread(vfork); return NULL; } So, perhaps, voif *kthread_data_safe(struct task_struct *task) { struct kthread *kthread = to_live_kthread(task); return kthread ? kthread->data : NULL; } ? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kthread: implement probe_kthread_data() 2013-03-30 14:36 ` Oleg Nesterov @ 2013-03-30 16:17 ` Tejun Heo 2013-03-30 17:00 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Tejun Heo @ 2013-03-30 16:17 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Jens Axboe, Andrew Morton, Jan Kara, david, lkml Hello, Oleg. On Sat, Mar 30, 2013 at 7:36 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > +void *probe_kthread_data(struct task_struct *task) > > +{ > > + struct kthread *kthread = to_kthread(task); > > + void *data = NULL; > > + > > + probe_kernel_read(&data, &kthread->data, sizeof(data)); > > + return data; > > +} > > OK, but we can simply check ->vfork_done != NULL ? Hmm... what if ->vfork_done is pointing to some weird place? Thanks. -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kthread: implement probe_kthread_data() 2013-03-30 16:17 ` Tejun Heo @ 2013-03-30 17:00 ` Oleg Nesterov [not found] ` <CAOS58YNMbTY2fZzUXeyLwuA+nowW2AhgLvQwKf_0jWXDen6HzQ@mail.gmail.com> 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2013-03-30 17:00 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, Andrew Morton, Jan Kara, david, lkml Hi Tejun, On 03/30, Tejun Heo wrote: > > On Sat, Mar 30, 2013 at 7:36 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > > +void *probe_kthread_data(struct task_struct *task) > > > +{ > > > + struct kthread *kthread = to_kthread(task); > > > + void *data = NULL; > > > + > > > + probe_kernel_read(&data, &kthread->data, sizeof(data)); > > > + return data; > > > +} > > > > OK, but we can simply check ->vfork_done != NULL ? > > Hmm... what if ->vfork_done is pointing to some weird place? Aah... "weird place" is not possible if we know that @task is kthread, it is either NULL or it points into tsk->stack which can only go away along with task. Note that kthread_stop() already relies on this. But I guess I missed the fact that this helper should be safe even this @task can be the a vfork'ed user-space process, yes? In this case, yes, ->vfork_done can point to task->parent->stack, not good... Perhaps, voif *kthread_data_safe(struct task_struct *task) { if (task->parent == kthreadd_task) { struct kthread *kthread = to_live_kthread(task); if (kthread) return kthread->data; } return NULL; } ? Or we can add to_live_kthread_safe() which checks "parent == kthreadd_task" instead. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <CAOS58YNMbTY2fZzUXeyLwuA+nowW2AhgLvQwKf_0jWXDen6HzQ@mail.gmail.com>]
* Re: [PATCH 1/3] kthread: implement probe_kthread_data() [not found] ` <CAOS58YNMbTY2fZzUXeyLwuA+nowW2AhgLvQwKf_0jWXDen6HzQ@mail.gmail.com> @ 2013-03-30 19:12 ` Oleg Nesterov 0 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2013-03-30 19:12 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, Andrew Morton, Jan Kara, david, lkml On 03/30, Tejun Heo wrote: > > On Sat, Mar 30, 2013 at 10:00 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > > But I guess I missed the fact that this helper should be safe even this > > @task can be the a vfork'ed user-space process, yes? > > > > Yeap, we really don't know what to expect. OK, > > Or we can add to_live_kthread_safe() which checks "parent == kthreadd_task" > > instead. > > But that would be more fragile than probe_read. I mean, this thing will be > called on BUGs and oopses. OK, I agree. Sorry for noise and thanks for your explanations, I really missed the intent. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] kthread: implement probe_kthread_data() 2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo 2013-03-30 14:36 ` Oleg Nesterov @ 2013-04-03 9:26 ` Jan Kara 1 sibling, 0 replies; 10+ messages in thread From: Jan Kara @ 2013-04-03 9:26 UTC (permalink / raw) To: Tejun Heo; +Cc: axboe, akpm, jack, david, linux-kernel, Oleg Nesterov On Fri 29-03-13 20:00:45, Tejun Heo wrote: > Implement probe_kthread_data() which returns kthread_data if > accessible. The function is equivalent to kthread_data() except that > the specified @task may not be a kthread or its vfork_done is already > cleared rendering struct kthread inaccessible. In the former case, > probe_kthread_data() may return any value. In the latter, NULL. > > This will be used to safely print debug information without affecting > synchronization in the normal paths. Workqueue debug info printing on > dump_stack() and friends will make use of it. Thanks for looking into this. Just one typo correction below: > @@ -122,6 +123,24 @@ void *kthread_data(struct task_struct *task) > return to_kthread(task)->data; > } > > +/** > + * probe_kthread_data - speculatively version of kthread_data() ^^^^ speculative Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] workqueue: include workqueue info when printing debug dump of a worker task 2013-03-30 3:00 [PATCHSET] workqueue, writeback: better worker information in task dumps Tejun Heo 2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo @ 2013-03-30 3:00 ` Tejun Heo 2013-03-30 3:00 ` [PATCH 3/3] writeback: set worker desc to identify writeback workers in task dumps Tejun Heo 2 siblings, 0 replies; 10+ messages in thread From: Tejun Heo @ 2013-03-30 3:00 UTC (permalink / raw) To: axboe, akpm Cc: jack, david, linux-kernel, Tejun Heo, Peter Zijlstra, Ingo Molnar One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. This patch implements set_worker_desc() which can be called from any workqueue work function to set its description. When the worker task is dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion and so on - the description will be printed out together with the workqueue name and the worker function pointer. The printing side is implemented by print_worker_info() which is called from functions in task dump paths - sched_show_task() and dump_stack_print_info(). print_worker_info() can be safely called on any task in any state as long as the task struct itself is accessible. It uses probe_*() functions to access worker fields. It may print garbage if something went very wrong, but it wouldn't cause (another) oops. The description is currently limited to 24bytes including the terminating \0. worker->desc_valid and workder->desc[] are added and the 64 bytes marker which was already incorrect before adding the new fields is moved to the correct position. Here's an example dump with writeback updated to set the bdi name as worker desc. Hardware name: Bochs Modules linked in: Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1 Workqueue: writeback bdi_writeback_workfn (flush-8:0) ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8 ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0 ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08 Call Trace: [<ffffffff81c61845>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81200150>] bdi_writeback_workfn+0x2a0/0x3b0 ... Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> --- include/linux/workqueue.h | 5 +++ kernel/sched/core.c | 1 + kernel/workqueue.c | 79 +++++++++++++++++++++++++++++++++++++++++++++ kernel/workqueue_internal.h | 12 ++++++- lib/dump_stack.c | 2 ++ 5 files changed, 98 insertions(+), 1 deletion(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7179756..623488f 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -92,6 +92,9 @@ enum { /* bit mask for work_busy() return values */ WORK_BUSY_PENDING = 1 << 0, WORK_BUSY_RUNNING = 1 << 1, + + /* maximum string length for set_worker_desc() */ + WORKER_DESC_LEN = 24, }; struct work_struct { @@ -447,6 +450,8 @@ extern void workqueue_set_max_active(struct workqueue_struct *wq, extern bool current_is_workqueue_rescuer(void); extern bool workqueue_congested(int cpu, struct workqueue_struct *wq); extern unsigned int work_busy(struct work_struct *work); +extern __printf(1, 2) void set_worker_desc(const char *fmt, ...); +extern void print_worker_info(const char *log_lvl, struct task_struct *task); /** * queue_work - queue work on a workqueue diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 23606ee..aba4ccd 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4630,6 +4630,7 @@ void sched_show_task(struct task_struct *p) task_pid_nr(p), ppid, (unsigned long)task_thread_info(p)->flags); + print_worker_info(KERN_INFO, p); show_stack(p, NULL); } diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 80432c7..3181e9b 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -46,6 +46,7 @@ #include <linux/rculist.h> #include <linux/nodemask.h> #include <linux/moduleparam.h> +#include <linux/uaccess.h> #include "workqueue_internal.h" @@ -2197,6 +2198,7 @@ __acquires(&pool->lock) worker->current_work = NULL; worker->current_func = NULL; worker->current_pwq = NULL; + worker->desc_valid = false; pwq_dec_nr_in_flight(pwq, work_color); } @@ -4364,6 +4366,83 @@ unsigned int work_busy(struct work_struct *work) } EXPORT_SYMBOL_GPL(work_busy); +/** + * set_worker_desc - set description for the current work item + * @fmt: printf-style format string + * @...: arguments for the format string + * + * This function can be called by a running work function to describe what + * the work item is about. If the worker task gets dumped, this + * information will be printed out together to help debugging. The + * description can be at most WORKER_DESC_LEN including the trailing '\0'. + */ +void set_worker_desc(const char *fmt, ...) +{ + struct worker *worker = current_wq_worker(); + va_list args; + + if (worker) { + va_start(args, fmt); + vsnprintf(worker->desc, sizeof(worker->desc), fmt, args); + va_end(args); + worker->desc_valid = true; + } +} + +/** + * print_worker_info - print out worker information and description + * @log_lvl: the log level to use when printing + * @task: target task + * + * If @task is a worker and currently executing a work item, print out the + * name of the workqueue being serviced and worker description set with + * set_worker_desc() by the currently executing work item. + * + * This function can be safely called on any task as long as the + * task_struct itself is accessible. While safe, this function isn't + * synchronized and may print out mixups or garbages of limited length. + */ +void print_worker_info(const char *log_lvl, struct task_struct *task) +{ + work_func_t *fn = NULL; + char name[WQ_NAME_LEN] = { }; + char desc[WORKER_DESC_LEN] = { }; + struct pool_workqueue *pwq = NULL; + struct workqueue_struct *wq = NULL; + bool desc_valid = false; + struct worker *worker; + + if (!(task->flags & PF_WQ_WORKER)) + return; + + /* + * This function is called without any synchronization and @task + * could be in any state. Be careful with dereferences. + */ + worker = probe_kthread_data(task); + + /* + * Carefully copy the associated workqueue's workfn and name. Keep + * the original last '\0' in case the original contains garbage. + */ + probe_kernel_read(&fn, &worker->current_func, sizeof(fn)); + probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq)); + probe_kernel_read(&wq, &pwq->wq, sizeof(wq)); + probe_kernel_read(name, wq->name, sizeof(name) - 1); + + /* copy worker description */ + probe_kernel_read(&desc_valid, &worker->desc_valid, sizeof(desc_valid)); + if (desc_valid) + probe_kernel_read(desc, worker->desc, sizeof(desc) - 1); + + if (fn || name[0] || desc[0]) { + printk("%sWorkqueue: %s %pf", log_lvl, name, fn); + if (desc[0]) + pr_cont(" (%s)", desc); + pr_cont("\n"); + } +} + /* * CPU hotplug. * diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h index 84ab6e1..ad83c96 100644 --- a/kernel/workqueue_internal.h +++ b/kernel/workqueue_internal.h @@ -29,15 +29,25 @@ struct worker { struct work_struct *current_work; /* L: work being processed */ work_func_t current_func; /* L: current_work's fn */ struct pool_workqueue *current_pwq; /* L: current_work's pwq */ + bool desc_valid; /* ->desc is valid */ struct list_head scheduled; /* L: scheduled works */ + + /* 64 bytes boundary on 64bit, 32 on 32bit */ + struct task_struct *task; /* I: worker task */ struct worker_pool *pool; /* I: the associated pool */ /* L: for rescuers */ - /* 64 bytes boundary on 64bit, 32 on 32bit */ + unsigned long last_active; /* L: last active timestamp */ unsigned int flags; /* X: flags */ int id; /* I: worker id */ + /* + * Opaque string set with work_set_desc(). Printed out with task + * dump for debugging - WARN, BUG, panic or sysrq. + */ + char desc[WORKER_DESC_LEN]; + /* used only by rescuers to point to the target workqueue */ struct workqueue_struct *rescue_wq; /* I: the workqueue to rescue */ }; diff --git a/lib/dump_stack.c b/lib/dump_stack.c index 520a878..776af7b 100644 --- a/lib/dump_stack.c +++ b/lib/dump_stack.c @@ -44,6 +44,8 @@ void dump_stack_print_info(const char *log_lvl) init_utsname()->release, (int)strcspn(init_utsname()->version, " "), init_utsname()->version, dump_stack_arch_desc_str); + + print_worker_info(log_lvl, current); } /** -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] writeback: set worker desc to identify writeback workers in task dumps 2013-03-30 3:00 [PATCHSET] workqueue, writeback: better worker information in task dumps Tejun Heo 2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo 2013-03-30 3:00 ` [PATCH 2/3] workqueue: include workqueue info when printing debug dump of a worker task Tejun Heo @ 2013-03-30 3:00 ` Tejun Heo 2 siblings, 0 replies; 10+ messages in thread From: Tejun Heo @ 2013-03-30 3:00 UTC (permalink / raw) To: axboe, akpm; +Cc: jack, david, linux-kernel, Tejun Heo Writeback has been recently converted to use workqueue instead of its private thread pool implementation. One negative side effect of this conversion is that there's no easy to tell which backing device a writeback work item was working on at the time of task dump, be it sysrq-t, BUG, WARN or whatever, which, according to our writeback brethren, is important in tracking down issues with a lot of mounted file systems on a lot of different devices. This patch restores that information using the new worker description facility. bdi_writeback_workfn() calls set_work_desc() to identify which bdi it's working on. The description is printed out together with the worqueue name and worker function as in the following example dump. WARNING: at /work/os/work/fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0() Modules linked in: Pid: 28, comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 empty empty/S3992 Workqueue: writeback bdi_writeback_workfn (flush-8:16) ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8 ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0 ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08 Call Trace: [<ffffffff81c61855>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81200144>] bdi_writeback_workfn+0x2b4/0x3c0 [<ffffffff810b4c87>] process_one_work+0x1d7/0x660 [<ffffffff810b5c72>] worker_thread+0x122/0x380 [<ffffffff810bdfea>] kthread+0xea/0xf0 [<ffffffff81c6cedc>] ret_from_fork+0x7c/0xb0 Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> --- fs/fs-writeback.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 8067d37..3be5718 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -1009,6 +1009,7 @@ void bdi_writeback_workfn(struct work_struct *work) struct backing_dev_info *bdi = wb->bdi; long pages_written; + set_worker_desc("flush-%s", dev_name(bdi->dev)); current->flags |= PF_SWAPWRITE; if (likely(!current_is_workqueue_rescuer() || -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHSET v2] workqueue, writeback: better worker information in task dumps @ 2013-04-03 19:24 Tejun Heo 2013-04-03 19:24 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo 0 siblings, 1 reply; 10+ messages in thread From: Tejun Heo @ 2013-04-03 19:24 UTC (permalink / raw) To: axboe, akpm; +Cc: jack, david, linux-kernel Hello, Andrew, this one depends on the debug info unification patch and should be routed through -mm too. The writeback folks seem to be okay with the proposed changes although nobody explicitly acked it yet (please do so). Please route these together with the previous set through -mm. Thanks. This is v2. Changes from v1 are, * Rebased on top of the v2 of "arch: unify task dump debug info" patchset. One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. For example, after writeback is converted to use workqueue instead of priviate thread pool, there's no easy to tell which backing device a writeback work item was working on at the time of task dump, which, according to our writeback brethren, is important in tracking down issues with a lot of mounted file systems on a lot of different devices. This patchset implements a way for a work function to mark its execution instance so that task dump of the worker task includes information to indicate what the work item was doing. An example WARN dump would look like the following. WARNING: at /work/os/work/fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0() Modules linked in: CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 Workqueue: writeback bdi_writeback_workfn (flush-8:16) ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8 ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0 ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08 Call Trace: [<ffffffff81c61855>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 ... This patchset contains the following three patches. 0001-kthread-implement-probe_kthread_data.patch 0002-workqueue-include-workqueue-info-when-printing-debug.patch 0003-writeback-set-worker-desc-to-identify-writeback-work.patch 0001 adds a speculative accessor for kthread_data. 0002 implements worker description. 0003 updates writeback to set its flusher name as worker description. This patchset is on top of akpm as of 2013/04/03 + [PATCHSET v2] arch: unify task dump debug info and available in the following git branch. git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-wb-better-dbg diffstat follows. Thanks. fs/fs-writeback.c | 1 include/linux/kthread.h | 1 include/linux/workqueue.h | 5 ++ kernel/kthread.c | 19 ++++++++++ kernel/printk.c | 2 + kernel/sched/core.c | 1 kernel/workqueue.c | 79 ++++++++++++++++++++++++++++++++++++++++++++ kernel/workqueue_internal.h | 12 ++++++ 8 files changed, 119 insertions(+), 1 deletion(-) -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] kthread: implement probe_kthread_data() 2013-04-03 19:24 [PATCHSET v2] workqueue, writeback: better worker information " Tejun Heo @ 2013-04-03 19:24 ` Tejun Heo 0 siblings, 0 replies; 10+ messages in thread From: Tejun Heo @ 2013-04-03 19:24 UTC (permalink / raw) To: axboe, akpm; +Cc: jack, david, linux-kernel, Tejun Heo, Oleg Nesterov Implement probe_kthread_data() which returns kthread_data if accessible. The function is equivalent to kthread_data() except that the specified @task may not be a kthread or its vfork_done is already cleared rendering struct kthread inaccessible. In the former case, probe_kthread_data() may return any value. In the latter, NULL. This will be used to safely print debug information without affecting synchronization in the normal paths. Workqueue debug info printing on dump_stack() and friends will make use of it. v2: Spelling error in comment fixed as suggested by Jan Kara. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> --- include/linux/kthread.h | 1 + kernel/kthread.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 8d81664..7dcef33 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -43,6 +43,7 @@ bool kthread_should_stop(void); bool kthread_should_park(void); bool kthread_freezable_should_stop(bool *was_frozen); void *kthread_data(struct task_struct *k); +void *probe_kthread_data(struct task_struct *k); int kthread_park(struct task_struct *k); void kthread_unpark(struct task_struct *k); void kthread_parkme(void); diff --git a/kernel/kthread.c b/kernel/kthread.c index b9db231..a279c55 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -17,6 +17,7 @@ #include <linux/slab.h> #include <linux/freezer.h> #include <linux/ptrace.h> +#include <linux/uaccess.h> #include <trace/events/sched.h> static DEFINE_SPINLOCK(kthread_create_lock); @@ -135,6 +136,24 @@ void *kthread_data(struct task_struct *task) return to_kthread(task)->data; } +/** + * probe_kthread_data - speculative version of kthread_data() + * @task: possible kthread task in question + * + * @task could be a kthread task. Return the data value specified when it + * was created if accessible. If @task isn't a kthread task or its data is + * inaccessible for any reason, %NULL is returned. This function requires + * that @task itself is safe to dereference. + */ +void *probe_kthread_data(struct task_struct *task) +{ + struct kthread *kthread = to_kthread(task); + void *data = NULL; + + probe_kernel_read(&data, &kthread->data, sizeof(data)); + return data; +} + static void __kthread_parkme(struct kthread *self) { __set_current_state(TASK_INTERRUPTIBLE); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-04-03 19:24 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-30 3:00 [PATCHSET] workqueue, writeback: better worker information in task dumps Tejun Heo
2013-03-30 3:00 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo
2013-03-30 14:36 ` Oleg Nesterov
2013-03-30 16:17 ` Tejun Heo
2013-03-30 17:00 ` Oleg Nesterov
[not found] ` <CAOS58YNMbTY2fZzUXeyLwuA+nowW2AhgLvQwKf_0jWXDen6HzQ@mail.gmail.com>
2013-03-30 19:12 ` Oleg Nesterov
2013-04-03 9:26 ` Jan Kara
2013-03-30 3:00 ` [PATCH 2/3] workqueue: include workqueue info when printing debug dump of a worker task Tejun Heo
2013-03-30 3:00 ` [PATCH 3/3] writeback: set worker desc to identify writeback workers in task dumps Tejun Heo
-- strict thread matches above, loose matches on Subject: below --
2013-04-03 19:24 [PATCHSET v2] workqueue, writeback: better worker information " Tejun Heo
2013-04-03 19:24 ` [PATCH 1/3] kthread: implement probe_kthread_data() Tejun Heo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox