From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932306AbdJWOD1 (ORCPT <rfc822;w@1wt.eu>);
        Mon, 23 Oct 2017 10:03:27 -0400
Received: from mail-qt0-f193.google.com ([209.85.216.193]:50070 "EHLO
        mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932191AbdJWODZ (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 23 Oct 2017 10:03:25 -0400
X-Google-Smtp-Source: ABhQp+QJnEIBm6A5tHMmaTj418cWVae5k7asqG7TEBafrycLwEv5vIv5CU6wXo2P0xTmGiw7FsrCkw==
Date: Mon, 23 Oct 2017 07:03:21 -0700
From: Tejun Heo <tj@kernel.org>
To: Li Bin <huawei.libin@huawei.com>
Cc: tanxiaofei <tanxiaofei@huawei.com>, jiangshanlai@gmail.com,
        linux-kernel@vger.kernel.org, John Garry <john.garry@huawei.com>
Subject: Re: [Question] null pointer risk of kernel workqueue
Message-ID: <20171023140321.GY1302522@devbig577.frc2.facebook.com>
References: <59C62398.6040101@huawei.com>
 <20170925152536.GL828415@devbig577.frc2.facebook.com>
 <59CB6C9C.7000205@huawei.com>
 <59E99E4E.5090305@huawei.com>
 <20171021153522.GH1302522@devbig577.frc2.facebook.com>
 <2f56ab49-4a65-8e35-07ba-6577af8843b6@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2f56ab49-4a65-8e35-07ba-6577af8843b6@huawei.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Mon, Oct 23, 2017 at 09:34:11AM +0800, Li Bin wrote:
> 
> 
> on 2017/10/21 23:35, Tejun Heo wrote:
> > On Fri, Oct 20, 2017 at 02:57:18PM +0800, tanxiaofei wrote:
> >> Hi Tejun,
> >>
> >> Any comments about this?
> > 
> > I think you're confused, or at least I can't understand what you're
> > trying to say.  Can you create a repro?
> > 
> 
> Hi Tejun,
> The case is as follows:
> 
> worker_thread()
> |-spin_lock_irq()
> |-process_one_work()
>     |-worker->current_pwq = pwq
>     |-spin_unlock_irq()
>     |-worker->current_func(work)
>     |-spin_lock_irq()
>     |-worker->current_pwq = NULL
> |-spin_unlock_irq()
>                                                                     //interrupt here
>                                                                     |-irq_handler
>                                                                         |-__queue_work()
>                                                                             //assuming that the wq is draining
>                                                                             |-if (unlikely(wq->flags & __WQ_DRAINING) && WARN_ON_ONCE(!is_chained_work(wq)))
>                                                                                 |-is_chained_work(wq)
>                                                                                     |-current_wq_worker() // Here, 'current' is the interrupted worker!
>                                                                                         |-worker->current_pwq is NULL here and gets dereferenced!
> |-schedule()
> 
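
To make the window in the trace concrete, here is a minimal userspace
sketch.  It is not kernel code: the *_model types and the phase enum are
hypothetical stand-ins, and it assumes is_chained_work() evaluates
something like "worker && worker->current_pwq->wq == wq".  It only
reports when that check would hit a NULL current_pwq; it does not
perform the dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * that appear in the trace are modelled. */
struct wq_model { int id; };
struct pwq_model { struct wq_model *wq; };
struct worker_model { struct pwq_model *current_pwq; };

/* Where the worker is when the interrupt arrives. */
enum phase { RUNNING_WORK, BETWEEN_WORK_ITEMS };

/* Mirror the assignments in the trace: current_pwq is non-NULL only
 * while the work function runs, and is reset to NULL afterwards. */
static void enter_phase(struct worker_model *w, struct pwq_model *pwq,
			enum phase p)
{
	w->current_pwq = (p == RUNNING_WORK) ? pwq : NULL;
}

/* Would an unguarded "worker && worker->current_pwq->wq == wq" check
 * (assumed shape) dereference NULL for this worker right now? */
static bool chained_check_would_fault(const struct worker_model *w)
{
	return w != NULL && w->current_pwq == NULL;
}

/* Put a worker into the given phase and ask whether an interrupt that
 * queues work at that point would trip over a NULL current_pwq. */
static bool irq_would_fault_in(enum phase p)
{
	struct wq_model wq = { .id = 1 };
	struct pwq_model pwq = { .wq = &wq };
	struct worker_model w = { .current_pwq = NULL };

	enter_phase(&w, &pwq, p);
	return chained_check_would_fault(&w);
}
```

In this model only BETWEEN_WORK_ITEMS is hazardous, which matches the
point in the trace after the second spin_unlock_irq() and before
schedule().
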
> And I think the following patch can solve the bug, right?
> 
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 8635417..650680c 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -59,7 +59,7 @@ struct worker {
>   */
>  static inline struct worker *current_wq_worker(void)
>  {
> -       if (current->flags & PF_WQ_WORKER)
> +       if (!in_irq() && (current->flags & PF_WQ_WORKER))
>                 return kthread_data(current);
>         return NULL;
>  }
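
For illustration, a rough userspace sketch of what the guard changes.
Everything suffixed _model is hypothetical: the in_hardirq field stands
in for in_irq(), the worker pointer stands in for kthread_data(current),
the flag value is illustrative, and the body of is_chained_work_model()
is an assumption about how the result of current_wq_worker() is
consumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_WQ_WORKER_MODEL 0x20u	/* illustrative flag bit */

struct wq_model { int id; };
struct pwq_model { struct wq_model *wq; };
struct worker_model { struct pwq_model *current_pwq; };

struct task_model {
	unsigned int flags;
	struct worker_model *worker;	/* stands in for kthread_data(current) */
	bool in_hardirq;		/* stands in for in_irq() */
};

/* Patched lookup: in hard interrupt context "current" is merely the
 * interrupted task, so do not treat it as the running worker. */
static struct worker_model *current_wq_worker_model(const struct task_model *cur)
{
	if (!cur->in_hardirq && (cur->flags & PF_WQ_WORKER_MODEL))
		return cur->worker;
	return NULL;
}

/* Assumed shape of the caller: dereferences current_pwq whenever a
 * worker is returned. */
static bool is_chained_work_model(const struct task_model *cur,
				  const struct wq_model *wq)
{
	struct worker_model *worker = current_wq_worker_model(cur);

	return worker && worker->current_pwq->wq == wq;
}

/* Scenario from the trace: a hard interrupt lands on a worker whose
 * current_pwq has already been reset to NULL. */
static bool chained_in_irq_between_items(void)
{
	struct wq_model wq = { .id = 1 };
	struct worker_model w = { .current_pwq = NULL };
	struct task_model cur = {
		.flags = PF_WQ_WORKER_MODEL, .worker = &w, .in_hardirq = true,
	};

	return is_chained_work_model(&cur, &wq);
}

/* Ordinary case: a worker in task context, executing an item on wq. */
static bool chained_in_task_during_work(void)
{
	struct wq_model wq = { .id = 1 };
	struct pwq_model pwq = { .wq = &wq };
	struct worker_model w = { .current_pwq = &pwq };
	struct task_model cur = {
		.flags = PF_WQ_WORKER_MODEL, .worker = &w, .in_hardirq = false,
	};

	return is_chained_work_model(&cur, &wq);
}
```

With the guard, the interrupt case returns false without ever touching
current_pwq, and the task-context case still reports chained work.
Note that in_irq() tests hard interrupt context only.
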

Yeah, that makes sense to me.  Can you please resend the patch with a
patch description and a Signed-off-by (SOB) line?

Thanks.

-- 
tejun