* [Question] null pointer risk of kernel workqueue @ 2017-09-23 9:04 tanxiaofei 2017-09-25 15:25 ` Tejun Heo 0 siblings, 1 reply; 10+ messages in thread From: tanxiaofei @ 2017-09-23 9:04 UTC (permalink / raw) To: tj, jiangshanlai, linux-kernel; +Cc: Linuxarm Hi Tejun & Jiangshan, I find an null pointer risk in the code of workqueue. Here is description: If draining, __queue_work() will call the function is_chained_work() to do some checks. In is_chained_work(), worker->current_pwq is used directly. It should be not safe. http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384 If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq is null when one work is done or ready to be processed. This issue may happen only if we queue work during executing drain_workqueue(). http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173 There are very few places to call drain_workqueue() in the whole linux kernel. I think that's why no one noticed this risk. Xiaofei Tan _______________________________________________ linuxarm mailing list linuxarm@huawei.com http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm . ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-09-23 9:04 [Question] null pointer risk of kernel workqueue tanxiaofei @ 2017-09-25 15:25 ` Tejun Heo 2017-09-27 9:17 ` tanxiaofei 0 siblings, 1 reply; 10+ messages in thread From: Tejun Heo @ 2017-09-25 15:25 UTC (permalink / raw) To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm Hello, On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote: > Hi Tejun & Jiangshan, > > I find an null pointer risk in the code of workqueue. Here is description: > > If draining, __queue_work() will call the function is_chained_work() to do some checks. > In is_chained_work(), worker->current_pwq is used directly. It should be not safe. > http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384 > > If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq > is null when one work is done or ready to be processed. > This issue may happen only if we queue work during executing drain_workqueue(). > http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173 Hmmm? I don't get it. worker->current_pwq is guaranteed to be set while a work function is being executed and the chained check can only get there iff the the worker is executing a work function. Thanks. -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-09-25 15:25 ` Tejun Heo @ 2017-09-27 9:17 ` tanxiaofei 2017-10-20 6:57 ` tanxiaofei 0 siblings, 1 reply; 10+ messages in thread From: tanxiaofei @ 2017-09-27 9:17 UTC (permalink / raw) To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin On 2017/9/25 23:25, Tejun Heo wrote: > Hello, > > On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote: >> Hi Tejun & Jiangshan, >> >> I find an null pointer risk in the code of workqueue. Here is description: >> >> If draining, __queue_work() will call the function is_chained_work() to do some checks. >> In is_chained_work(), worker->current_pwq is used directly. It should be not safe. >> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384 >> >> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq >> is null when one work is done or ready to be processed. >> This issue may happen only if we queue work during executing drain_workqueue(). >> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173 > > Hmmm? I don't get it. worker->current_pwq is guaranteed to be set > while a work function is being executed and the chained check can only > get there iff the the worker is executing a work function. > Hi Tejun, Thanks for your quick reply. Hmm, but i think interrupt preemption could make this happen. For example, the scenario that all following conditions are met. 1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining. 2.An worker thread is interrupted by this interrupt. 3.The worker thread is being executed, and at the very moment,worker->current_pwq is set to null.(This is possible and please check function worker_thread() carefully). Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-09-27 9:17 ` tanxiaofei @ 2017-10-20 6:57 ` tanxiaofei 2017-10-21 15:35 ` Tejun Heo 0 siblings, 1 reply; 10+ messages in thread From: tanxiaofei @ 2017-10-20 6:57 UTC (permalink / raw) To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin Hi Tejun, Any comments about this? Thanks On 2017/9/27 17:17, tanxiaofei wrote: > > > On 2017/9/25 23:25, Tejun Heo wrote: >> Hello, >> >> On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote: >>> Hi Tejun & Jiangshan, >>> >>> I find an null pointer risk in the code of workqueue. Here is description: >>> >>> If draining, __queue_work() will call the function is_chained_work() to do some checks. >>> In is_chained_work(), worker->current_pwq is used directly. It should be not safe. >>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384 >>> >>> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq >>> is null when one work is done or ready to be processed. >>> This issue may happen only if we queue work during executing drain_workqueue(). >>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173 >> >> Hmmm? I don't get it. worker->current_pwq is guaranteed to be set >> while a work function is being executed and the chained check can only >> get there iff the the worker is executing a work function. >> > > Hi Tejun, > > Thanks for your quick reply. > Hmm, but i think interrupt preemption could make this happen. > For example, the scenario that all following conditions are met. > 1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining. > 2.An worker thread is interrupted by this interrupt. > 3.The worker thread is being executed, and at the very moment,worker->current_pwq > is set to null.(This is possible and please check function worker_thread() carefully). > > Thanks. > > > > > > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-20 6:57 ` tanxiaofei @ 2017-10-21 15:35 ` Tejun Heo 2017-10-21 15:35 ` Tejun Heo 2017-10-23 1:34 ` Li Bin 0 siblings, 2 replies; 10+ messages in thread From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw) To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote: > Hi Tejun, > > Any comments about this? I think you're confused, or at least can't understand what you're trying to say. Can you create a rero? Thanks. -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-21 15:35 ` Tejun Heo @ 2017-10-21 15:35 ` Tejun Heo 2017-10-23 1:34 ` Li Bin 1 sibling, 0 replies; 10+ messages in thread From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw) To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin On Sat, Oct 21, 2017 at 08:35:22AM -0700, Tejun Heo wrote: > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote: > > Hi Tejun, > > > > Any comments about this? > > I think you're confused, or at least can't understand what you're > trying to say. Can you create a rero? Sorry, I meant to write repro - a reproduction case. -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-21 15:35 ` Tejun Heo 2017-10-21 15:35 ` Tejun Heo @ 2017-10-23 1:34 ` Li Bin 2017-10-23 6:23 ` tanxiaofei 2017-10-23 14:03 ` Tejun Heo 1 sibling, 2 replies; 10+ messages in thread From: Li Bin @ 2017-10-23 1:34 UTC (permalink / raw) To: Tejun Heo, tanxiaofei; +Cc: jiangshanlai, linux-kernel, John Garry on 2017/10/21 23:35, Tejun Heo wrote: > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote: >> Hi Tejun, >> >> Any comments about this? > > I think you're confused, or at least can't understand what you're > trying to say. Can you create a rero? > Hi Tejun, The case is as following: worker_thread() |-spin_lock_irq() |-process_one_work() |-worker->current_pwq = pwq |-spin_unlock_irq() |-worker->current_func(work) |-spin_lock_irq() |-worker->current_pwq = NULL |-spin_unlock_irq() //interrupt here |-irq_handler |-__queue_work() //assuming that the wq is draining |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq))) |-is_chained_work(wq) |-current_wq_worker() // Here, 'current' is the interrupted worker! |-current->current_pwq is NULL here! |-schedule() And I think the following patch can solve the bug, right? diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h index 8635417..650680c 100644 --- a/kernel/workqueue_internal.h +++ b/kernel/workqueue_internal.h @@ -59,7 +59,7 @@ struct worker { */ static inline struct worker *current_wq_worker(void) { - if (current->flags & PF_WQ_WORKER) + if (!in_irq() && (current->flags & PF_WQ_WORKER)) return kthread_data(current); return NULL; } Thanks, Li Bin > Thanks. > ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-23 1:34 ` Li Bin @ 2017-10-23 6:23 ` tanxiaofei 2017-10-23 14:03 ` Tejun Heo 1 sibling, 0 replies; 10+ messages in thread From: tanxiaofei @ 2017-10-23 6:23 UTC (permalink / raw) To: Li Bin, Tejun Heo; +Cc: jiangshanlai, linux-kernel, John Garry Hi Bin, Yes, that's it. thanks. -- tanxiaofei On 2017/10/23 9:34, Li Bin wrote: > > > on 2017/10/21 23:35, Tejun Heo wrote: >> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote: >>> Hi Tejun, >>> >>> Any comments about this? >> >> I think you're confused, or at least can't understand what you're >> trying to say. Can you create a rero? >> > > Hi Tejun, > The case is as following: > > worker_thread() > |-spin_lock_irq() > |-process_one_work() > |-worker->current_pwq = pwq > |-spin_unlock_irq() > |-worker->current_func(work) > |-spin_lock_irq() > |-worker->current_pwq = NULL > |-spin_unlock_irq() > //interrupt here > |-irq_handler > |-__queue_work() > //assuming that the wq is draining > |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq))) > |-is_chained_work(wq) > |-current_wq_worker() // Here, 'current' is the interrupted worker! > |-current->current_pwq is NULL here! > |-schedule() > > And I think the following patch can solve the bug, right? > > diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h > index 8635417..650680c 100644 > --- a/kernel/workqueue_internal.h > +++ b/kernel/workqueue_internal.h > @@ -59,7 +59,7 @@ struct worker { > */ > static inline struct worker *current_wq_worker(void) > { > - if (current->flags & PF_WQ_WORKER) > + if (!in_irq() && (current->flags & PF_WQ_WORKER)) > return kthread_data(current); > return NULL; > } > > > Thanks, > Li Bin > >> Thanks. >> > > > . > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-23 1:34 ` Li Bin 2017-10-23 6:23 ` tanxiaofei @ 2017-10-23 14:03 ` Tejun Heo 2017-10-24 0:33 ` Li Bin 1 sibling, 1 reply; 10+ messages in thread From: Tejun Heo @ 2017-10-23 14:03 UTC (permalink / raw) To: Li Bin; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry Hello, On Mon, Oct 23, 2017 at 09:34:11AM +0800, Li Bin wrote: > > > on 2017/10/21 23:35, Tejun Heo wrote: > > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote: > >> Hi Tejun, > >> > >> Any comments about this? > > > > I think you're confused, or at least can't understand what you're > > trying to say. Can you create a rero? > > > > Hi Tejun, > The case is as following: > > worker_thread() > |-spin_lock_irq() > |-process_one_work() > |-worker->current_pwq = pwq > |-spin_unlock_irq() > |-worker->current_func(work) > |-spin_lock_irq() > |-worker->current_pwq = NULL > |-spin_unlock_irq() > //interrupt here > |-irq_handler > |-__queue_work() > //assuming that the wq is draining > |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq))) > |-is_chained_work(wq) > |-current_wq_worker() // Here, 'current' is the interrupted worker! > |-current->current_pwq is NULL here! > |-schedule() > > And I think the following patch can solve the bug, right? > > diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h > index 8635417..650680c 100644 > --- a/kernel/workqueue_internal.h > +++ b/kernel/workqueue_internal.h > @@ -59,7 +59,7 @@ struct worker { > */ > static inline struct worker *current_wq_worker(void) > { > - if (current->flags & PF_WQ_WORKER) > + if (!in_irq() && (current->flags & PF_WQ_WORKER)) > return kthread_data(current); > return NULL; > } Yeah, that makes sense to me. Can you please resend the patch with patch description and SOB? Thanks. -- tejun ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue 2017-10-23 14:03 ` Tejun Heo @ 2017-10-24 0:33 ` Li Bin 0 siblings, 0 replies; 10+ messages in thread From: Li Bin @ 2017-10-24 0:33 UTC (permalink / raw) To: Tejun Heo; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry on 2017/10/23 22:03, Tejun Heo wrote: >> >> And I think the following patch can solve the bug, right? >> >> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h >> index 8635417..650680c 100644 >> --- a/kernel/workqueue_internal.h >> +++ b/kernel/workqueue_internal.h >> @@ -59,7 +59,7 @@ struct worker { >> */ >> static inline struct worker *current_wq_worker(void) >> { >> - if (current->flags & PF_WQ_WORKER) >> + if (!in_irq() && (current->flags & PF_WQ_WORKER)) >> return kthread_data(current); >> return NULL; >> } > > Yeah, that makes sense to me. Can you please resend the patch with > patch description and SOB? Ok, I will resend the patch soon. Thanks, Li Bin > > Thanks. > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-24 0:49 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-23 9:04 [Question] null pointer risk of kernel workqueue tanxiaofei 2017-09-25 15:25 ` Tejun Heo 2017-09-27 9:17 ` tanxiaofei 2017-10-20 6:57 ` tanxiaofei 2017-10-21 15:35 ` Tejun Heo 2017-10-21 15:35 ` Tejun Heo 2017-10-23 1:34 ` Li Bin 2017-10-23 6:23 ` tanxiaofei 2017-10-23 14:03 ` Tejun Heo 2017-10-24 0:33 ` Li Bin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox