From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH net-next v2] net: sched: fix skb leak in dev_requeue_skb()
Date: Tue, 26 Dec 2017 21:24:07 -0800
Message-ID: <42a543e3-82ff-2421-186d-cc0957ea45ed@gmail.com>
References: <1514173746-165282-1-git-send-email-weiyongjun1@huawei.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: Wei Yongjun <weiyongjun1@huawei.com>,
        Jamal Hadi Salim <jhs@mojatatu.com>,
        Cong Wang <xiyou.wangcong@gmail.com>,
        Jiri Pirko <jiri@resnulli.us>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-pg0-f67.google.com ([74.125.83.67]:43052 "EHLO
        mail-pg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750762AbdL0FYW (ORCPT
        <rfc822;netdev@vger.kernel.org>); Wed, 27 Dec 2017 00:24:22 -0500
Received: by mail-pg0-f67.google.com with SMTP id f18so1817738pga.10
        for <netdev@vger.kernel.org>; Tue, 26 Dec 2017 21:24:21 -0800 (PST)
In-Reply-To: <1514173746-165282-1-git-send-email-weiyongjun1@huawei.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> When dev_requeue_skb() is called with bluked skb list, only the
> first skb of the list will be requeued to qdisc layer, and leak
> the others without free them.
> 
> TCP is broken due to skb leak since no free skb will be considered
> as still in the host queue and never be retransmitted. This happend
> when dev_requeue_skb() called from qdisc_restart().
>   qdisc_restart
>   |-- dequeue_skb
>   |-- sch_direct_xmit()
>       |-- dev_requeue_skb() <-- skb may bluked
> 
> Fix dev_requeue_skb() to requeue the full bluked list.
> 
> Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
> v1 -> v2: add net-next prefix
> ---

First, thanks for tracking this down.

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 981c08f..0df2dbf 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
>  
>  static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {
> -	__skb_queue_head(&q->gso_skb, skb);
> -	q->qstats.requeues++;
> -	qdisc_qstats_backlog_inc(q, skb);
> -	q->q.qlen++;	/* it's still part of the queue */
> +	while (skb) {
> +		struct sk_buff *next = skb->next;
> +
> +		__skb_queue_tail(&q->gso_skb, skb);

Was the change from __skb_queue_head to __skb_queue_tail here
intentional? We should re-queue packets to the head of the list.

> +		q->qstats.requeues++;
> +		qdisc_qstats_backlog_inc(q, skb);
> +		q->q.qlen++;	/* it's still part of the queue */
> +
> +		skb = next;
> +	}
>  	__netif_schedule(q);
>  
>  	return 0;
> @@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
>  {
>  	spinlock_t *lock = qdisc_lock(q);
>  
> -	spin_lock(lock);
> -	__skb_queue_tail(&q->gso_skb, skb);
> -	spin_unlock(lock);
> +	while (skb) {
> +		struct sk_buff *next = skb->next;
> +
> +		spin_lock(lock);

In this case I suspect its better to move the lock to be around the
while loop rather than grab and drop it repeatedly. I don't have
any data at this point so OK either way. Assuming other head/tail
comment is addressed.

> +		__skb_queue_tail(&q->gso_skb, skb);

Same here *_tail should be *_head?

> +		spin_unlock(lock);
> +
> +		qdisc_qstats_cpu_requeues_inc(q);
> +		qdisc_qstats_cpu_backlog_inc(q, skb);
> +		qdisc_qstats_cpu_qlen_inc(q);
> +
> +		skb = next;
> +	}
>  
> -	qdisc_qstats_cpu_requeues_inc(q);
> -	qdisc_qstats_cpu_backlog_inc(q, skb);
> -	qdisc_qstats_cpu_qlen_inc(q);
>  	__netif_schedule(q);
>  
>  	return 0;
>