public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [Question] null pointer risk of kernel workqueue
@ 2017-09-23  9:04 tanxiaofei
  2017-09-25 15:25 ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-09-23  9:04 UTC (permalink / raw)
  To: tj, jiangshanlai, linux-kernel; +Cc: Linuxarm

Hi Tejun & Jiangshan,

I find an null pointer risk in the code of workqueue. Here is description:

If draining, __queue_work() will call the function is_chained_work() to do some checks.
In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384

If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
is null when one work is done or ready to be processed.
This issue may happen only if we queue work during executing drain_workqueue().
http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173

There are very few places to call drain_workqueue() in the whole linux kernel.
I think that's why no one noticed this risk.

Xiaofei Tan
_______________________________________________
linuxarm mailing list
linuxarm@huawei.com
http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-09-23  9:04 [Question] null pointer risk of kernel workqueue tanxiaofei
@ 2017-09-25 15:25 ` Tejun Heo
  2017-09-27  9:17   ` tanxiaofei
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2017-09-25 15:25 UTC (permalink / raw)
  To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm

Hello,

On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
> Hi Tejun & Jiangshan,
> 
> I find an null pointer risk in the code of workqueue. Here is description:
> 
> If draining, __queue_work() will call the function is_chained_work() to do some checks.
> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
> 
> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
> is null when one work is done or ready to be processed.
> This issue may happen only if we queue work during executing drain_workqueue().
> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173

Hmmm? I don't get it.  worker->current_pwq is guaranteed to be set
while a work function is being executed and the chained check can only
get there iff the the worker is executing a work function.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-09-25 15:25 ` Tejun Heo
@ 2017-09-27  9:17   ` tanxiaofei
  2017-10-20  6:57     ` tanxiaofei
  0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-09-27  9:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin



On 2017/9/25 23:25, Tejun Heo wrote:
> Hello,
> 
> On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
>> Hi Tejun & Jiangshan,
>>
>> I find an null pointer risk in the code of workqueue. Here is description:
>>
>> If draining, __queue_work() will call the function is_chained_work() to do some checks.
>> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
>>
>> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
>> is null when one work is done or ready to be processed.
>> This issue may happen only if we queue work during executing drain_workqueue().
>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
> 
> Hmmm? I don't get it.  worker->current_pwq is guaranteed to be set
> while a work function is being executed and the chained check can only
> get there iff the the worker is executing a work function.
> 

Hi Tejun,

Thanks for your quick reply.
Hmm, but i think interrupt preemption could make this happen.
For example, the scenario that all following conditions are met.
1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining.
2.An worker thread is interrupted by this interrupt.
3.The worker thread is being executed, and at the very moment,worker->current_pwq
is set to null.(This is possible and please check function worker_thread() carefully).

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-09-27  9:17   ` tanxiaofei
@ 2017-10-20  6:57     ` tanxiaofei
  2017-10-21 15:35       ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: tanxiaofei @ 2017-10-20  6:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin

Hi Tejun,

Any comments about this?

Thanks

On 2017/9/27 17:17, tanxiaofei wrote:
> 
> 
> On 2017/9/25 23:25, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Sep 23, 2017 at 05:04:24PM +0800, tanxiaofei wrote:
>>> Hi Tejun & Jiangshan,
>>>
>>> I find an null pointer risk in the code of workqueue. Here is description:
>>>
>>> If draining, __queue_work() will call the function is_chained_work() to do some checks.
>>> In is_chained_work(), worker->current_pwq is used directly. It should be not safe.
>>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L1384
>>>
>>> If you check the thread function of this worker, worker_thread(), you will find worker->current_pwq
>>> is null when one work is done or ready to be processed.
>>> This issue may happen only if we queue work during executing drain_workqueue().
>>> http://elixir.free-electrons.com/linux/latest/source/kernel/workqueue.c#L2173
>>
>> Hmmm? I don't get it.  worker->current_pwq is guaranteed to be set
>> while a work function is being executed and the chained check can only
>> get there iff the the worker is executing a work function.
>>
> 
> Hi Tejun,
> 
> Thanks for your quick reply.
> Hmm, but i think interrupt preemption could make this happen.
> For example, the scenario that all following conditions are met.
> 1.The handler of an interrupt call queue_work() to queue some works, and the workqueue is draining.
> 2.An worker thread is interrupted by this interrupt.
> 3.The worker thread is being executed, and at the very moment,worker->current_pwq
> is set to null.(This is possible and please check function worker_thread() carefully).
> 
> Thanks.
> 
> 
> 
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-20  6:57     ` tanxiaofei
@ 2017-10-21 15:35       ` Tejun Heo
  2017-10-21 15:35         ` Tejun Heo
  2017-10-23  1:34         ` Li Bin
  0 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw)
  To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin

On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> Hi Tejun,
> 
> Any comments about this?

I think you're confused, or at least can't understand what you're
trying to say.  Can you create a rero?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-21 15:35       ` Tejun Heo
@ 2017-10-21 15:35         ` Tejun Heo
  2017-10-23  1:34         ` Li Bin
  1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2017-10-21 15:35 UTC (permalink / raw)
  To: tanxiaofei; +Cc: jiangshanlai, linux-kernel, Linuxarm, John Garry, huawei.libin

On Sat, Oct 21, 2017 at 08:35:22AM -0700, Tejun Heo wrote:
> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> > Hi Tejun,
> > 
> > Any comments about this?
> 
> I think you're confused, or at least can't understand what you're
> trying to say.  Can you create a rero?

Sorry, I meant to write repro - a reproduction case.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-21 15:35       ` Tejun Heo
  2017-10-21 15:35         ` Tejun Heo
@ 2017-10-23  1:34         ` Li Bin
  2017-10-23  6:23           ` tanxiaofei
  2017-10-23 14:03           ` Tejun Heo
  1 sibling, 2 replies; 10+ messages in thread
From: Li Bin @ 2017-10-23  1:34 UTC (permalink / raw)
  To: Tejun Heo, tanxiaofei; +Cc: jiangshanlai, linux-kernel, John Garry



on 2017/10/21 23:35, Tejun Heo wrote:
> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
>> Hi Tejun,
>>
>> Any comments about this?
> 
> I think you're confused, or at least can't understand what you're
> trying to say.  Can you create a rero?
> 

Hi Tejun,
The case is as following:

worker_thread()
|-spin_lock_irq()
|-process_one_work()
    |-worker->current_pwq = pwq
    |-spin_unlock_irq()
    |-worker->current_func(work)
    |-spin_lock_irq()
    |-worker->current_pwq = NULL
|-spin_unlock_irq()
                                                                    //interrupt here
                                                                    |-irq_handler
                                                                        |-__queue_work()
                                                                            //assuming that the wq is draining
                                                                            |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
                                                                                |-is_chained_work(wq)
                                                                                    |-current_wq_worker() // Here, 'current' is the interrupted worker!
                                                                                        |-current->current_pwq is NULL here!
|-schedule()

And I think the following patch can solve the bug, right?

diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 8635417..650680c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -59,7 +59,7 @@ struct worker {
  */
 static inline struct worker *current_wq_worker(void)
 {
-       if (current->flags & PF_WQ_WORKER)
+       if (!in_irq() && (current->flags & PF_WQ_WORKER))
                return kthread_data(current);
        return NULL;
 }


Thanks,
Li Bin

> Thanks.
> 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-23  1:34         ` Li Bin
@ 2017-10-23  6:23           ` tanxiaofei
  2017-10-23 14:03           ` Tejun Heo
  1 sibling, 0 replies; 10+ messages in thread
From: tanxiaofei @ 2017-10-23  6:23 UTC (permalink / raw)
  To: Li Bin, Tejun Heo; +Cc: jiangshanlai, linux-kernel, John Garry

Hi Bin,

Yes, that's it. thanks.

--
tanxiaofei

On 2017/10/23 9:34, Li Bin wrote:
> 
> 
> on 2017/10/21 23:35, Tejun Heo wrote:
>> On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
>>> Hi Tejun,
>>>
>>> Any comments about this?
>>
>> I think you're confused, or at least can't understand what you're
>> trying to say.  Can you create a rero?
>>
> 
> Hi Tejun,
> The case is as following:
> 
> worker_thread()
> |-spin_lock_irq()
> |-process_one_work()
>     |-worker->current_pwq = pwq
>     |-spin_unlock_irq()
>     |-worker->current_func(work)
>     |-spin_lock_irq()
>     |-worker->current_pwq = NULL
> |-spin_unlock_irq()
>                                                                     //interrupt here
>                                                                     |-irq_handler
>                                                                         |-__queue_work()
>                                                                             //assuming that the wq is draining
>                                                                             |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
>                                                                                 |-is_chained_work(wq)
>                                                                                     |-current_wq_worker() // Here, 'current' is the interrupted worker!
>                                                                                         |-current->current_pwq is NULL here!
> |-schedule()
> 
> And I think the following patch can solve the bug, right?
> 
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 8635417..650680c 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -59,7 +59,7 @@ struct worker {
>   */
>  static inline struct worker *current_wq_worker(void)
>  {
> -       if (current->flags & PF_WQ_WORKER)
> +       if (!in_irq() && (current->flags & PF_WQ_WORKER))
>                 return kthread_data(current);
>         return NULL;
>  }
> 
> 
> Thanks,
> Li Bin
> 
>> Thanks.
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-23  1:34         ` Li Bin
  2017-10-23  6:23           ` tanxiaofei
@ 2017-10-23 14:03           ` Tejun Heo
  2017-10-24  0:33             ` Li Bin
  1 sibling, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2017-10-23 14:03 UTC (permalink / raw)
  To: Li Bin; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry

Hello,

On Mon, Oct 23, 2017 at 09:34:11AM +0800, Li Bin wrote:
> 
> 
> on 2017/10/21 23:35, Tejun Heo wrote:
> > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> >> Hi Tejun,
> >>
> >> Any comments about this?
> > 
> > I think you're confused, or at least can't understand what you're
> > trying to say.  Can you create a rero?
> > 
> 
> Hi Tejun,
> The case is as following:
> 
> worker_thread()
> |-spin_lock_irq()
> |-process_one_work()
>     |-worker->current_pwq = pwq
>     |-spin_unlock_irq()
>     |-worker->current_func(work)
>     |-spin_lock_irq()
>     |-worker->current_pwq = NULL
> |-spin_unlock_irq()
>                                                                     //interrupt here
>                                                                     |-irq_handler
>                                                                         |-__queue_work()
>                                                                             //assuming that the wq is draining
>                                                                             |-if (unlikely(wq->flags & __WQ_DRAINING) &&WARN_ON_ONCE(!is_chained_work(wq)))
>                                                                                 |-is_chained_work(wq)
>                                                                                     |-current_wq_worker() // Here, 'current' is the interrupted worker!
>                                                                                         |-current->current_pwq is NULL here!
> |-schedule()
> 
> And I think the following patch can solve the bug, right?
> 
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 8635417..650680c 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -59,7 +59,7 @@ struct worker {
>   */
>  static inline struct worker *current_wq_worker(void)
>  {
> -       if (current->flags & PF_WQ_WORKER)
> +       if (!in_irq() && (current->flags & PF_WQ_WORKER))
>                 return kthread_data(current);
>         return NULL;
>  }

Yeah, that makes sense to me.  Can you please resend the patch with
patch description and SOB?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Question] null pointer risk of kernel workqueue
  2017-10-23 14:03           ` Tejun Heo
@ 2017-10-24  0:33             ` Li Bin
  0 siblings, 0 replies; 10+ messages in thread
From: Li Bin @ 2017-10-24  0:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: tanxiaofei, jiangshanlai, linux-kernel, John Garry



on 2017/10/23 22:03, Tejun Heo wrote:
>>
>> And I think the following patch can solve the bug, right?
>>
>> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
>> index 8635417..650680c 100644
>> --- a/kernel/workqueue_internal.h
>> +++ b/kernel/workqueue_internal.h
>> @@ -59,7 +59,7 @@ struct worker {
>>   */
>>  static inline struct worker *current_wq_worker(void)
>>  {
>> -       if (current->flags & PF_WQ_WORKER)
>> +       if (!in_irq() && (current->flags & PF_WQ_WORKER))
>>                 return kthread_data(current);
>>         return NULL;
>>  }
> 
> Yeah, that makes sense to me.  Can you please resend the patch with
> patch description and SOB?

Ok, I will resend the patch soon.

Thanks,
Li Bin

> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-24  0:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-23  9:04 [Question] null pointer risk of kernel workqueue tanxiaofei
2017-09-25 15:25 ` Tejun Heo
2017-09-27  9:17   ` tanxiaofei
2017-10-20  6:57     ` tanxiaofei
2017-10-21 15:35       ` Tejun Heo
2017-10-21 15:35         ` Tejun Heo
2017-10-23  1:34         ` Li Bin
2017-10-23  6:23           ` tanxiaofei
2017-10-23 14:03           ` Tejun Heo
2017-10-24  0:33             ` Li Bin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox