* [Question] null pointer risk of kernel workqueue
@ 2017-09-23 9:04 tanxiaofei
2017-09-25 15:25 ` Tejun Heo
0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-09-23 9:04 UTC (permalink / raw)
To: tj, jiangshanlai, linux-kernel; +Cc: Linuxarm
Hi Tejun & Jiangshan,
I find an null pointer risk in the code of workqueue. Here is description:
If draining, __queue_work() will call the function is_chained_work() to do some checks.
In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
is null when one work is done or ready to be processed.
This issue may happen only if we queue work during executing drain_workqueue().
http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
There are very few places to call drain_workqueue() in the whole linux kernel.
I think that's why no one noticed this risk.
Xiaofei Tan
_______________________________________________
linuxarm mailing list
linuxarm@huawei.com
http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm
.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-09-23 9:04 [Question] null pointer risk of kernel workqueue tanxiaofei
@ 2017-09-25 15:25 ` Tejun Heo
2017-09-27 9:17 ` tanxiaofei
0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2017-09-25 15:25 UTC (permalink / raw)
To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm
Hello,
On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
> Hi Tejun & Jiangshan,
>
> I find an null pointer risk in the code of workqueue. Here is description:
>
> If draining, __queue_work() will call the function is_chained_work() to do some checks.
> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
>
> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
> is null when one work is done or ready to be processed.
> This issue may happen only if we queue work during executing drain_workqueue().
> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
Hmmm? I don't get it. worker->current_pwq is guaranteed to be set
while a work function is being executed and the chained check can only
get there iff the the worker is executing a work function.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-09-25 15:25 ` Tejun Heo
@ 2017-09-27 9:17 ` tanxiaofei
2017-10-20 6:57 ` tanxiaofei
0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-09-27 9:17 UTC (permalink / raw)
To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin
On 2017/9/25 23:25, Tejun Heo wrote:
> Hello,
>
> On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
>> Hi Tejun & Jiangshan,
>>
>> I find an null pointer risk in the code of workqueue. Here is description:
>>
>> If draining, __queue_work() will call the function is_chained_work() to do some checks.
>> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
>>
>> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
>> is null when one work is done or ready to be processed.
>> This issue may happen only if we queue work during executing drain_workqueue().
>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
>
> Hmmm? I don't get it. worker->current_pwq is guaranteed to be set
> while a work function is being executed and the chained check can only
> get there iff the the worker is executing a work function.
>
Hi Tejun,
Thanks for your quick reply.
Hmm, but i think interrupt preemption could make this happen.
For example, the scenario that all following conditions are met.
1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining.
2.An worker thread is interrupted by this interrupt.
3.The worker thread is being executed, and at the very moment,worker->current_pwq
is set to null.(This is possible and please check function worker_thread() carefully).
Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-09-27 9:17 ` tanxiaofei
@ 2017-10-20 6:57 ` tanxiaofei
2017-10-21 15:35 ` Tejun Heo
0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-10-20 6:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin
Hi Tejun,
Any comments about this?
Thanks
On 2017/9/27 17:17, tanxiaofei wrote:
>
>
> On 2017/9/25 23:25, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
>>> Hi Tejun & Jiangshan,
>>>
>>> I find an null pointer risk in the code of workqueue. Here is description:
>>>
>>> If draining, __queue_work() will call the function is_chained_work() to do some checks.
>>> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
>>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
>>>
>>> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
>>> is null when one work is done or ready to be processed.
>>> This issue may happen only if we queue work during executing drain_workqueue().
>>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
>>
>> Hmmm? I don't get it. worker->current_pwq is guaranteed to be set
>> while a work function is being executed and the chained check can only
>> get there iff the the worker is executing a work function.
>>
>
> Hi Tejun,
>
> Thanks for your quick reply.
> Hmm, but i think interrupt preemption could make this happen.
> For example, the scenario that all following conditions are met.
> 1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining.
> 2.An worker thread is interrupted by this interrupt.
> 3.The worker thread is being executed, and at the very moment,worker->current_pwq
> is set to null.(This is possible and please check function worker_thread() carefully).
>
> Thanks.
>
>
>
>
>
> .
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-20 6:57 ` tanxiaofei
@ 2017-10-21 15:35 ` Tejun Heo
2017-10-21 15:35 ` Tejun Heo
2017-10-23 1:34 ` Li Bin
0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw)
To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin
On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> Hi Tejun,
>
> Any comments about this?
I think you're confused, or at least can't understand what you're
trying to say. Can you create a rero?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-21 15:35 ` Tejun Heo
@ 2017-10-21 15:35 ` Tejun Heo
2017-10-23 1:34 ` Li Bin
1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw)
To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin
On Sat, Oct 21, 2017 at 08:35:22AM -0700, Tejun Heo wrote:
> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> > Hi Tejun,
> >
> > Any comments about this?
>
> I think you're confused, or at least can't understand what you're
> trying to say. Can you create a rero?
Sorry, I meant to write repro - a reproduction case.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-21 15:35 ` Tejun Heo
2017-10-21 15:35 ` Tejun Heo
@ 2017-10-23 1:34 ` Li Bin
2017-10-23 6:23 ` tanxiaofei
2017-10-23 14:03 ` Tejun Heo
1 sibling, 2 replies; 10+ messages in thread
From: Li Bin @ 2017-10-23 1:34 UTC (permalink / raw)
To: Tejun Heo, tanxiaofei; +Cc: jiangshanlai, linux-kernel, John Garry
on 2017/10/21 23:35, Tejun Heo wrote:
> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
>> Hi Tejun,
>>
>> Any comments about this?
>
> I think you're confused, or at least can't understand what you're
> trying to say. Can you create a rero?
>
Hi Tejun,
The case is as following:
worker_thread()
|-spin_lock_irq()
|-process_one_work()
|-worker->current_pwq = pwq
|-spin_unlock_irq()
|-worker->current_func(work)
|-spin_lock_irq()
|-worker->current_pwq = NULL
|-spin_unlock_irq()
//interrupt here
|-irq_handler
|-__queue_work()
//assuming that the wq is draining
|-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
|-is_chained_work(wq)
|-current_wq_worker() // Here, 'current' is the interrupted worker!
|-current->current_pwq is NULL here!
|-schedule()
And I think the following patch can solve the bug, right?
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 8635417..650680c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -59,7 +59,7 @@ struct worker {
*/
static inline struct worker *current_wq_worker(void)
{
- if (current->flags & PF_WQ_WORKER)
+ if (!in_irq() && (current->flags & PF_WQ_WORKER))
return kthread_data(current);
return NULL;
}
Thanks,
Li Bin
> Thanks.
>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-23 1:34 ` Li Bin
@ 2017-10-23 6:23 ` tanxiaofei
2017-10-23 14:03 ` Tejun Heo
1 sibling, 0 replies; 10+ messages in thread
From: tanxiaofei @ 2017-10-23 6:23 UTC (permalink / raw)
To: Li Bin, Tejun Heo; +Cc: jiangshanlai, linux-kernel, John Garry
Hi Bin,
Yes, that's it. thanks.
--
tanxiaofei
On 2017/10/23 9:34, Li Bin wrote:
>
>
> on 2017/10/21 23:35, Tejun Heo wrote:
>> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
>>> Hi Tejun,
>>>
>>> Any comments about this?
>>
>> I think you're confused, or at least can't understand what you're
>> trying to say. Can you create a rero?
>>
>
> Hi Tejun,
> The case is as following:
>
> worker_thread()
> |-spin_lock_irq()
> |-process_one_work()
> |-worker->current_pwq = pwq
> |-spin_unlock_irq()
> |-worker->current_func(work)
> |-spin_lock_irq()
> |-worker->current_pwq = NULL
> |-spin_unlock_irq()
> //interrupt here
> |-irq_handler
> |-__queue_work()
> //assuming that the wq is draining
> |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
> |-is_chained_work(wq)
> |-current_wq_worker() // Here, 'current' is the interrupted worker!
> |-current->current_pwq is NULL here!
> |-schedule()
>
> And I think the following patch can solve the bug, right?
>
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 8635417..650680c 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -59,7 +59,7 @@ struct worker {
> */
> static inline struct worker *current_wq_worker(void)
> {
> - if (current->flags & PF_WQ_WORKER)
> + if (!in_irq() && (current->flags & PF_WQ_WORKER))
> return kthread_data(current);
> return NULL;
> }
>
>
> Thanks,
> Li Bin
>
>> Thanks.
>>
>
>
> .
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-23 1:34 ` Li Bin
2017-10-23 6:23 ` tanxiaofei
@ 2017-10-23 14:03 ` Tejun Heo
2017-10-24 0:33 ` Li Bin
1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2017-10-23 14:03 UTC (permalink / raw)
To: Li Bin; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry
Hello,
On Mon, Oct 23, 2017 at 09:34:11AM +0800, Li Bin wrote:
>
>
> on 2017/10/21 23:35, Tejun Heo wrote:
> > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> >> Hi Tejun,
> >>
> >> Any comments about this?
> >
> > I think you're confused, or at least can't understand what you're
> > trying to say. Can you create a rero?
> >
>
> Hi Tejun,
> The case is as following:
>
> worker_thread()
> |-spin_lock_irq()
> |-process_one_work()
> |-worker->current_pwq = pwq
> |-spin_unlock_irq()
> |-worker->current_func(work)
> |-spin_lock_irq()
> |-worker->current_pwq = NULL
> |-spin_unlock_irq()
> //interrupt here
> |-irq_handler
> |-__queue_work()
> //assuming that the wq is draining
> |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
> |-is_chained_work(wq)
> |-current_wq_worker() // Here, 'current' is the interrupted worker!
> |-current->current_pwq is NULL here!
> |-schedule()
>
> And I think the following patch can solve the bug, right?
>
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 8635417..650680c 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -59,7 +59,7 @@ struct worker {
> */
> static inline struct worker *current_wq_worker(void)
> {
> - if (current->flags & PF_WQ_WORKER)
> + if (!in_irq() && (current->flags & PF_WQ_WORKER))
> return kthread_data(current);
> return NULL;
> }
Yeah, that makes sense to me. Can you please resend the patch with
patch description and SOB?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Question] null pointer risk of kernel workqueue
2017-10-23 14:03 ` Tejun Heo
@ 2017-10-24 0:33 ` Li Bin
0 siblings, 0 replies; 10+ messages in thread
From: Li Bin @ 2017-10-24 0:33 UTC (permalink / raw)
To: Tejun Heo; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry
on 2017/10/23 22:03, Tejun Heo wrote:
>>
>> And I think the following patch can solve the bug, right?
>>
>> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
>> index 8635417..650680c 100644
>> --- a/kernel/workqueue_internal.h
>> +++ b/kernel/workqueue_internal.h
>> @@ -59,7 +59,7 @@ struct worker {
>> */
>> static inline struct worker *current_wq_worker(void)
>> {
>> - if (current->flags & PF_WQ_WORKER)
>> + if (!in_irq() && (current->flags & PF_WQ_WORKER))
>> return kthread_data(current);
>> return NULL;
>> }
>
> Yeah, that makes sense to me. Can you please resend the patch with
> patch description and SOB?
Ok, I will resend the patch soon.
Thanks,
Li Bin
>
> Thanks.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-24 0:49 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-23 9:04 [Question] null pointer risk of kernel workqueue tanxiaofei
2017-09-25 15:25 ` Tejun Heo
2017-09-27 9:17 ` tanxiaofei
2017-10-20 6:57 ` tanxiaofei
2017-10-21 15:35 ` Tejun Heo
2017-10-21 15:35 ` Tejun Heo
2017-10-23 1:34 ` Li Bin
2017-10-23 6:23 ` tanxiaofei
2017-10-23 14:03 ` Tejun Heo
2017-10-24 0:33 ` Li Bin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox