Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 net-next] net/sched: add skbprio scheduler
From: Cong Wang @ 2018-06-23 21:43 UTC (permalink / raw)
  To: Nishanth Devarajan
  Cc: Jamal Hadi Salim, Jiri Pirko, David Miller,
	Linux Kernel Network Developers, Cody Doucette, Michel Machado
In-Reply-To: <20180623204745.GA4337@gmail.com>

On Sat, Jun 23, 2018 at 1:47 PM, Nishanth Devarajan <ndev2021@gmail.com> wrote:
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 37b5096..6fd07e8 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
...
> +#define SKBPRIO_MAX_PRIORITY 64
> +
> +struct tc_skbprio_qopt {
> +       __u32   limit;          /* Queue length in packets. */
> +};


Since this is just an integer, you can just make it NLA_U32 instead
of a struct?


> +static int skbprio_change(struct Qdisc *sch, struct nlattr *opt,
> +                       struct netlink_ext_ack *extack)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       struct tc_skbprio_qopt *ctl = nla_data(opt);
> +       const unsigned int min_limit = 1;
> +
> +       if (ctl->limit == (typeof(ctl->limit))-1)
> +               q->max_limit = max(qdisc_dev(sch)->tx_queue_len, min_limit);
> +       else if (ctl->limit < min_limit ||
> +               ctl->limit > qdisc_dev(sch)->tx_queue_len)
> +               return -EINVAL;
> +       else
> +               q->max_limit = ctl->limit;
> +
> +       return 0;
> +}

Isn't q->max_limit same with sch->limit?

Also, please avoid dev->tx_queue_len here, it may change
independently of your qdisc change, unless you want to implement
ops->change_tx_queue_len().

^ permalink raw reply

* Re: [PATCH] qmi_wwan: add support for the Dell Wireless 5821e module
From: Bjørn Mork @ 2018-06-23 21:56 UTC (permalink / raw)
  To: Aleksander Morgado; +Cc: netdev, linux-usb
In-Reply-To: <20180623212252.15026-1-aleksander@aleksander.es>

Aleksander Morgado <aleksander@aleksander.es> writes:

> This module exposes two USB configurations: a QMI+AT capable setup on
> USB config #1 and a MBIM capable setup on USB config #2.
>
> By default the kernel will choose the MBIM capable configuration as
> long as the cdc_mbim driver is available. This patch adds support for
> the QMI port in the secondary configuration.
>
> Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>

Acked-by: Bjørn Mork <bjorn@mork.no>

Please queue this up for stable too.  Thanks.

^ permalink raw reply

* Re: [PATCH v2 net-next] net/sched: add skbprio scheduler
From: Alexander Duyck @ 2018-06-23 22:10 UTC (permalink / raw)
  To: Nishanth Devarajan
  Cc: Cong Wang, Jamal Hadi Salim, Jiri Pirko, David Miller, Netdev,
	doucette, michel
In-Reply-To: <20180623204745.GA4337@gmail.com>

On Sat, Jun 23, 2018 at 1:47 PM, Nishanth Devarajan <ndev2021@gmail.com> wrote:
> net/sched: add skbprio scheduler
>
> Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets
> according to their skb->priority field. Although Skbprio can be employed in any
> scenario in which a higher skb->priority field means a higher priority packet,
> Skbprio was concieved as a solution for denial-of-service defenses that need to
> route packets with different priorities.

Really this description is not very good. Reading it I was thinking to
myself "why do we need this, prio already does this". It wasn't until
I read through the code that I figured out that you are basically
adding dropping of lower priority frames.

>
> v2
> *Use skb->priority field rather than DS field. Rename queueing discipline as
> SKB Priority Queue (previously Gatekeeper Priority Queue).
>
> *Queueing discipline is made classful to expose Skbprio's internal priority
> queues.
>
> Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
> Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com>
> Reviewed-by: Cody Doucette <doucette@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
>  include/uapi/linux/pkt_sched.h |  15 ++
>  net/sched/Kconfig              |  13 ++
>  net/sched/Makefile             |   1 +
>  net/sched/sch_skbprio.c        | 347 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 376 insertions(+)
>  create mode 100644 net/sched/sch_skbprio.c
>
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 37b5096..6fd07e8 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -124,6 +124,21 @@ struct tc_fifo_qopt {
>         __u32   limit;  /* Queue length: bytes for bfifo, packets for pfifo */
>  };
>
> +/* SKBPRIO section */
> +
> +/*
> + * Priorities go from zero to (SKBPRIO_MAX_PRIORITY - 1).
> + * SKBPRIO_MAX_PRIORITY should be at least 64 in order for skbprio to be able
> + * to map one to one the DS field of IPV4 and IPV6 headers.
> + * Memory allocation grows linearly with SKBPRIO_MAX_PRIORITY.
> + */
> +
> +#define SKBPRIO_MAX_PRIORITY 64
> +
> +struct tc_skbprio_qopt {
> +       __u32   limit;          /* Queue length in packets. */
> +};
> +
>  /* PRIO section */
>
>  #define TCQ_PRIO_BANDS 16
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index a01169f..9ac4b53 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -240,6 +240,19 @@ config NET_SCH_MQPRIO
>
>           If unsure, say N.
>
> +config NET_SCH_SKBPRIO
> +       tristate "SKB priority queue scheduler (SKBPRIO)"
> +       help
> +         Say Y here if you want to use the SKB priority queue
> +         scheduler. This schedules packets according to skb->priority,
> +         which is useful for request packets in DoS mitigation systems such
> +         as Gatekeeper.
> +
> +         To compile this driver as a module, choose M here: the module will
> +         be called sch_skbprio.
> +
> +         If unsure, say N.
> +
>  config NET_SCH_CHOKE
>         tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
>         help
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 8811d38..a4d8893 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -46,6 +46,7 @@ obj-$(CONFIG_NET_SCH_NETEM)   += sch_netem.o
>  obj-$(CONFIG_NET_SCH_DRR)      += sch_drr.o
>  obj-$(CONFIG_NET_SCH_PLUG)     += sch_plug.o
>  obj-$(CONFIG_NET_SCH_MQPRIO)   += sch_mqprio.o
> +obj-$(CONFIG_NET_SCH_SKBPRIO)  += sch_skbprio.o
>  obj-$(CONFIG_NET_SCH_CHOKE)    += sch_choke.o
>  obj-$(CONFIG_NET_SCH_QFQ)      += sch_qfq.o
>  obj-$(CONFIG_NET_SCH_CODEL)    += sch_codel.o
> diff --git a/net/sched/sch_skbprio.c b/net/sched/sch_skbprio.c
> new file mode 100644
> index 0000000..5e89446
> --- /dev/null
> +++ b/net/sched/sch_skbprio.c
> @@ -0,0 +1,347 @@
> +/*
> + * net/sched/sch_skbprio.c  SKB Priority Queue.
> + *
> + *             This program is free software; you can redistribute it and/or
> + *             modify it under the terms of the GNU General Public License
> + *             as published by the Free Software Foundation; either version
> + *             2 of the License, or (at your option) any later version.
> + *
> + * Authors:    Nishanth Devarajan, <ndev2021@gmail.com>
> + *             Cody Doucette, <doucette@bu.edu>
> + *             original idea by Michel Machado, Cody Doucette, and Qiaobin Fu
> + */
> +
> +#include <linux/string.h>
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <net/pkt_sched.h>
> +#include <net/sch_generic.h>
> +#include <net/inet_ecn.h>
> +
> +
> +/*       SKB Priority Queue
> + *     =================================
> + *
> + * This qdisc schedules a packet according to skb->priority, where a higher
> + * value places the packet closer to the exit of the queue. When the queue is
> + * full, the lowest priority packet in the queue is dropped to make room for
> + * the packet to be added if it has higher priority. If the packet to be added
> + * has lower priority than all packets in the queue, it is dropped.
> + *
> + * Without the SKB priority queue, queue length limits must be imposed
> + * for individual queues, and there is no easy way to enforce a global queue
> + * length limit across all priorities. With the SKBprio queue, a global
> + * queue length limit can be enforced while not restricting the queue lengths
> + * of individual priorities.
> + *
> + * This is especially useful for a denial-of-service defense system like
> + * Gatekeeper, which prioritizes packets in flows that demonstrate expected
> + * behavior of legitimate users. The queue is flexible to allow any number
> + * of packets of any priority up to the global limit of the scheduler
> + * without risking resource overconsumption by a flood of low priority packets.
> + *
> + * The Gatekeeper codebase is found here:
> + *
> + *             https://github.com/AltraMayor/gatekeeper

Do we really need to be linking to third party projects in the code? I
am not even really sure it is related here since I cannot find
anything that references the queuing discipline.

Also this seems like it should be in the Documentation/networking
folder rather than being stored in the code itself.

> + */
> +
> +struct skbprio_sched_data {
> +       /* Parameters. */
> +       u32 max_limit;
> +
> +       /* Queue state. */
> +       struct sk_buff_head qdiscs[SKBPRIO_MAX_PRIORITY];
> +       struct gnet_stats_queue qstats[SKBPRIO_MAX_PRIORITY];
> +       u16 highest_prio;
> +       u16 lowest_prio;
> +};
> +
> +static u16 calc_new_high_prio(const struct skbprio_sched_data *q)
> +{
> +       int prio;
> +
> +       for (prio = q->highest_prio - 1; prio >= q->lowest_prio; prio--) {
> +               if (!skb_queue_empty(&q->qdiscs[prio]))
> +                       return prio;
> +       }
> +
> +       /* SKB queue is empty, return 0 (default highest priority setting). */
> +       return 0;
> +}
> +
> +static u16 calc_new_low_prio(const struct skbprio_sched_data *q)
> +{
> +       int prio;
> +
> +       for (prio = q->lowest_prio + 1; prio <= q->highest_prio; prio++) {
> +               if (!skb_queue_empty(&q->qdiscs[prio]))
> +                       return prio;
> +       }
> +
> +       /* SKB queue is empty, return SKBPRIO_MAX_PRIORITY - 1
> +        * (default lowest priority setting).
> +        */
> +       return SKBPRIO_MAX_PRIORITY - 1;
> +}
> +
> +static int skbprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> +                         struct sk_buff **to_free)
> +{
> +       const unsigned int max_priority = SKBPRIO_MAX_PRIORITY - 1;
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       struct sk_buff_head *qdisc;
> +       struct sk_buff_head *lp_qdisc;
> +       struct sk_buff *to_drop;
> +       u16 prio, lp;
> +
> +       /* Obtain the priority of @skb. */
> +       prio = min(skb->priority, max_priority);
> +
> +       qdisc = &q->qdiscs[prio];
> +       if (sch->q.qlen < q->max_limit) {
> +               __skb_queue_tail(qdisc, skb);
> +               qdisc_qstats_backlog_inc(sch, skb);
> +               q->qstats[prio].backlog += qdisc_pkt_len(skb);
> +
> +               /* Check to update highest and lowest priorities. */
> +               if (prio > q->highest_prio)
> +                       q->highest_prio = prio;
> +
> +               if (prio < q->lowest_prio)
> +                       q->lowest_prio = prio;
> +
> +               sch->q.qlen++;
> +               return NET_XMIT_SUCCESS;
> +       }
> +
> +       /* If this packet has the lowest priority, drop it. */
> +       lp = q->lowest_prio;
> +       if (prio <= lp) {
> +               q->qstats[prio].drops++;
> +               return qdisc_drop(skb, sch, to_free);
> +       }
> +
> +       /* Drop the packet at the tail of the lowest priority qdisc. */
> +       lp_qdisc = &q->qdiscs[lp];
> +       to_drop = __skb_dequeue_tail(lp_qdisc);

Is there a specific reason for dropping tail instead of head? I would
think you might see better latency numbers with this if you were to
focus on dropping older packets from the lowest priority instead of
the newest ones. Then things like TCP would likely be able to respond
more quickly to the congestion effects and would be able to more
accurately determine actual round trip time versus buffer delay.

> +       BUG_ON(!to_drop);
> +       qdisc_qstats_backlog_dec(sch, to_drop);
> +       qdisc_drop(to_drop, sch, to_free);
> +
> +       q->qstats[lp].backlog -= qdisc_pkt_len(to_drop);
> +       q->qstats[lp].drops++;
> +
> +       __skb_queue_tail(qdisc, skb);
> +       qdisc_qstats_backlog_inc(sch, skb);
> +       q->qstats[prio].backlog += qdisc_pkt_len(skb);
> +

You could probably just take care of enqueueing first, and then take
care of the dequeue. That way you can drop the references to skb
earlier from the stack or free up whatever register is storing it.

> +       /* Check to update highest and lowest priorities. */
> +       if (skb_queue_empty(lp_qdisc)) {
> +               if (q->lowest_prio == q->highest_prio) {
> +                       BUG_ON(sch->q.qlen);
> +                       q->lowest_prio = prio;
> +                       q->highest_prio = prio;
> +               } else {
> +                       q->lowest_prio = calc_new_low_prio(q);
> +               }
> +       }
> +
> +       if (prio > q->highest_prio)
> +               q->highest_prio = prio;
> +
> +       return NET_XMIT_CN;
> +}
> +
> +static struct sk_buff *skbprio_dequeue(struct Qdisc *sch)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       struct sk_buff_head *hpq = &q->qdiscs[q->highest_prio];
> +       struct sk_buff *skb = __skb_dequeue(hpq);
> +
> +       if (unlikely(!skb))
> +               return NULL;
> +
> +       sch->q.qlen--;
> +       qdisc_qstats_backlog_dec(sch, skb);
> +       qdisc_bstats_update(sch, skb);
> +
> +       q->qstats[q->highest_prio].backlog -= qdisc_pkt_len(skb);
> +
> +       /* Update highest priority field. */
> +       if (skb_queue_empty(hpq)) {
> +               if (q->lowest_prio == q->highest_prio) {
> +                       BUG_ON(sch->q.qlen);
> +                       q->highest_prio = 0;
> +                       q->lowest_prio = SKBPRIO_MAX_PRIORITY - 1;
> +               } else {
> +                       q->highest_prio = calc_new_high_prio(q);
> +               }
> +       }
> +       return skb;
> +}
> +
> +static int skbprio_change(struct Qdisc *sch, struct nlattr *opt,
> +                       struct netlink_ext_ack *extack)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       struct tc_skbprio_qopt *ctl = nla_data(opt);
> +       const unsigned int min_limit = 1;
> +
> +       if (ctl->limit == (typeof(ctl->limit))-1)
> +               q->max_limit = max(qdisc_dev(sch)->tx_queue_len, min_limit);
> +       else if (ctl->limit < min_limit ||
> +               ctl->limit > qdisc_dev(sch)->tx_queue_len)
> +               return -EINVAL;
> +       else
> +               q->max_limit = ctl->limit;
> +
> +       return 0;
> +}
> +
> +static int skbprio_init(struct Qdisc *sch, struct nlattr *opt,
> +                       struct netlink_ext_ack *extack)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       const unsigned int min_limit = 1;
> +       int prio;
> +
> +       /* Initialise all queues, one for each possible priority. */
> +       for (prio = 0; prio < SKBPRIO_MAX_PRIORITY; prio++)
> +               __skb_queue_head_init(&q->qdiscs[prio]);
> +
> +       memset(&q->qstats, 0, sizeof(q->qstats));
> +       q->highest_prio = 0;
> +       q->lowest_prio = SKBPRIO_MAX_PRIORITY - 1;
> +       if (!opt) {
> +               q->max_limit = max(qdisc_dev(sch)->tx_queue_len, min_limit);
> +               return 0;
> +       }
> +       return skbprio_change(sch, opt, extack);
> +}
> +
> +static int skbprio_dump(struct Qdisc *sch, struct sk_buff *skb)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       struct tc_skbprio_qopt opt;
> +
> +       opt.limit = q->max_limit;
> +
> +       if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
> +               return -1;
> +
> +       return skb->len;
> +}
> +
> +static void skbprio_reset(struct Qdisc *sch)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       int prio;
> +
> +       sch->qstats.backlog = 0;
> +       sch->q.qlen = 0;
> +
> +       for (prio = 0; prio < SKBPRIO_MAX_PRIORITY; prio++)
> +               __skb_queue_purge(&q->qdiscs[prio]);
> +
> +       memset(&q->qstats, 0, sizeof(q->qstats));
> +       q->highest_prio = 0;
> +       q->lowest_prio = SKBPRIO_MAX_PRIORITY - 1;
> +}
> +
> +static void skbprio_destroy(struct Qdisc *sch)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       int prio;
> +
> +       for (prio = 0; prio < SKBPRIO_MAX_PRIORITY; prio++)
> +               __skb_queue_purge(&q->qdiscs[prio]);
> +}
> +
> +static struct Qdisc *skbprio_leaf(struct Qdisc *sch, unsigned long arg)
> +{
> +       return NULL;
> +}
> +
> +static unsigned long skbprio_find(struct Qdisc *sch, u32 classid)
> +{
> +       return 0;
> +}
> +
> +static int skbprio_dump_class(struct Qdisc *sch, unsigned long cl,
> +                            struct sk_buff *skb, struct tcmsg *tcm)
> +{
> +       tcm->tcm_handle |= TC_H_MIN(cl);
> +       return 0;
> +}
> +
> +static int skbprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
> +                                  struct gnet_dump *d)
> +{
> +       struct skbprio_sched_data *q = qdisc_priv(sch);
> +       if (gnet_stats_copy_queue(d, NULL, &q->qstats[cl - 1],
> +               q->qstats[cl - 1].qlen) < 0)
> +               return -1;
> +       return 0;
> +}
> +
> +static void skbprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
> +{
> +       unsigned int i;
> +
> +       if (arg->stop)
> +               return;
> +
> +       for (i = 0; i < SKBPRIO_MAX_PRIORITY; i++) {
> +               if (arg->count < arg->skip) {
> +                       arg->count++;
> +                       continue;
> +               }
> +               if (arg->fn(sch, i + 1, arg) < 0) {
> +                       arg->stop = 1;
> +                       break;
> +               }
> +               arg->count++;
> +       }
> +}
> +
> +static const struct Qdisc_class_ops skbprio_class_ops = {
> +       .leaf           =       skbprio_leaf,
> +       .find           =       skbprio_find,
> +       .dump           =       skbprio_dump_class,
> +       .dump_stats     =       skbprio_dump_class_stats,
> +       .walk           =       skbprio_walk,
> +};
> +
> +static struct Qdisc_ops skbprio_qdisc_ops __read_mostly = {
> +       .cl_ops         =       &skbprio_class_ops,
> +       .id             =       "skbprio",
> +       .priv_size      =       sizeof(struct skbprio_sched_data),
> +       .enqueue        =       skbprio_enqueue,
> +       .dequeue        =       skbprio_dequeue,
> +       .peek           =       qdisc_peek_dequeued,
> +       .init           =       skbprio_init,
> +       .reset          =       skbprio_reset,
> +       .change         =       skbprio_change,
> +       .dump           =       skbprio_dump,
> +       .destroy        =       skbprio_destroy,
> +       .owner          =       THIS_MODULE,
> +};
> +
> +static int __init skbprio_module_init(void)
> +{
> +       return register_qdisc(&skbprio_qdisc_ops);
> +}
> +
> +static void __exit skbprio_module_exit(void)
> +{
> +       unregister_qdisc(&skbprio_qdisc_ops);
> +}
> +
> +module_init(skbprio_module_init)
> +module_exit(skbprio_module_exit)
> +
> +MODULE_LICENSE("GPL");
> --
> 1.9.1
>

^ permalink raw reply

* [PATCH] brcmsmac: make function wlc_phy_workarounds_nphy_rev1 static
From: Colin King @ 2018-06-23 22:15 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S . Miller, linux-wireless,
	brcm80211-dev-list.pdl, brcm80211-dev-list, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The function wlc_phy_workarounds_nphy_rev1 is local to the source and
does not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol 'wlc_phy_workarounds_nphy_rev1' was not declared. Should it
be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index 1a187557982e..bedec1606caa 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -16904,7 +16904,7 @@ static void wlc_phy_workarounds_nphy_rev3(struct brcms_phy *pi)
 	}
 }
 
-void wlc_phy_workarounds_nphy_rev1(struct brcms_phy *pi)
+static void wlc_phy_workarounds_nphy_rev1(struct brcms_phy *pi)
 {
 	static const u8 rfseq_rx2tx_events[] = {
 		NPHY_RFSEQ_CMD_NOP,
-- 
2.17.0

^ permalink raw reply related

* [PATCH] brcmsmac: make function wlc_phy_workarounds_nphy_rev1 static
From: Colin King @ 2018-06-23 22:15 UTC (permalink / raw)
  To: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Kalle Valo, David S . Miller, linux-wireless,
	brcm80211-dev-list.pdl, brcm80211-dev-list, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180623221531.6396-1-colin.king@canonical.com>

From: Colin Ian King <colin.king@canonical.com>

The function wlc_phy_workarounds_nphy_rev1 is local to the source and
does not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol 'wlc_phy_workarounds_nphy_rev1' was not declared. Should it
be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index 1a187557982e..bedec1606caa 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -16904,7 +16904,7 @@ static void wlc_phy_workarounds_nphy_rev3(struct brcms_phy *pi)
 	}
 }
 
-void wlc_phy_workarounds_nphy_rev1(struct brcms_phy *pi)
+static void wlc_phy_workarounds_nphy_rev1(struct brcms_phy *pi)
 {
 	static const u8 rfseq_rx2tx_events[] = {
 		NPHY_RFSEQ_CMD_NOP,
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: Al Viro @ 2018-06-23 22:21 UTC (permalink / raw)
  To: David Miller; +Cc: pmoore, netdev, selinux, linux-security-module
In-Reply-To: <20180623212626.GD30522@ZenIV.linux.org.uk>

On Sat, Jun 23, 2018 at 10:26:27PM +0100, Al Viro wrote:
> On Sat, Jun 23, 2018 at 10:57:06AM +0900, David Miller wrote:
> > From: Paul Moore <pmoore@redhat.com>
> > Date: Fri, 22 Jun 2018 17:18:20 -0400
> > 
> > > -	const mm_segment_t old_fs = get_fs();
> > > -
> > > -	set_fs(KERNEL_DS);
> > > -	ret_val = ipv6_renew_options(sk, opt, newtype,
> > > -				     (struct ipv6_opt_hdr __user *)newopt,
> > > -				     newoptlen);
> > > -	set_fs(old_fs);
> > 
> > So is it really the case that the traditional construct:
> > 
> > 	set_fs(KERNEL_DS);
> > 	... copy_{from,to}_user(...);
> > 	set_fs(old_fs);
> > 
> > is no longer allowed?
> 
> s/no longer allowed/best avoided/, but IMO in this case the replacement is
> too ugly to live ;-/

BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
ipv6_renew_options_kern()...

Incidentally, is copying the entire value in case newoptlen > ipv6_optlen(...)
the right thing?  After all, the next update in any of those options will
quietly lose everything past ipv6_optlen(...), wouldn't it?

IOW, how about the following (completely untested):

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 16475c269749..d02881e4ad1f 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -355,14 +355,7 @@ struct ipv6_txoptions *ipv6_dup_options(struct sock *sk,
 struct ipv6_txoptions *ipv6_renew_options(struct sock *sk,
 					  struct ipv6_txoptions *opt,
 					  int newtype,
-					  struct ipv6_opt_hdr __user *newopt,
-					  int newoptlen);
-struct ipv6_txoptions *
-ipv6_renew_options_kern(struct sock *sk,
-			struct ipv6_txoptions *opt,
-			int newtype,
-			struct ipv6_opt_hdr *newopt,
-			int newoptlen);
+					  struct ipv6_opt_hdr *newopt);
 struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 					  struct ipv6_txoptions *opt);
 
diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index 1323b9679cf7..1c0bb9fb76e6 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -799,8 +799,7 @@ static int calipso_opt_update(struct sock *sk, struct ipv6_opt_hdr *hop)
 {
 	struct ipv6_txoptions *old = txopt_get(inet6_sk(sk)), *txopts;
 
-	txopts = ipv6_renew_options_kern(sk, old, IPV6_HOPOPTS,
-					 hop, hop ? ipv6_optlen(hop) : 0);
+	txopts = ipv6_renew_options(sk, old, IPV6_HOPOPTS, hop);
 	txopt_put(old);
 	if (IS_ERR(txopts))
 		return PTR_ERR(txopts);
@@ -1222,8 +1221,7 @@ static int calipso_req_setattr(struct request_sock *req,
 	if (IS_ERR(new))
 		return PTR_ERR(new);
 
-	txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
-					 new, new ? ipv6_optlen(new) : 0);
+	txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
 
 	kfree(new);
 
@@ -1260,8 +1258,7 @@ static void calipso_req_delattr(struct request_sock *req)
 	if (calipso_opt_del(req_inet->ipv6_opt->hopopt, &new))
 		return; /* Nothing to do */
 
-	txopts = ipv6_renew_options_kern(sk, req_inet->ipv6_opt, IPV6_HOPOPTS,
-					 new, new ? ipv6_optlen(new) : 0);
+	txopts = ipv6_renew_options(sk, req_inet->ipv6_opt, IPV6_HOPOPTS, new);
 
 	if (!IS_ERR(txopts)) {
 		txopts = xchg(&req_inet->ipv6_opt, txopts);
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index 5bc2bf3733ab..4b6915eedfc3 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -1015,29 +1015,15 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
 }
 EXPORT_SYMBOL_GPL(ipv6_dup_options);
 
-static int ipv6_renew_option(void *ohdr,
-			     struct ipv6_opt_hdr __user *newopt, int newoptlen,
-			     int inherit,
+static void ipv6_renew_option(struct ipv6_opt_hdr *opt,
 			     struct ipv6_opt_hdr **hdr,
 			     char **p)
 {
-	if (inherit) {
-		if (ohdr) {
-			memcpy(*p, ohdr, ipv6_optlen((struct ipv6_opt_hdr *)ohdr));
-			*hdr = (struct ipv6_opt_hdr *)*p;
-			*p += CMSG_ALIGN(ipv6_optlen(*hdr));
-		}
-	} else {
-		if (newopt) {
-			if (copy_from_user(*p, newopt, newoptlen))
-				return -EFAULT;
-			*hdr = (struct ipv6_opt_hdr *)*p;
-			if (ipv6_optlen(*hdr) > newoptlen)
-				return -EINVAL;
-			*p += CMSG_ALIGN(newoptlen);
-		}
+	if (opt) {
+		memcpy(*p, opt, ipv6_optlen(opt));
+		*hdr = (struct ipv6_opt_hdr *)*p;
+		*p += CMSG_ALIGN(ipv6_optlen(*hdr));
 	}
-	return 0;
 }
 
 /**
@@ -1063,13 +1049,11 @@ static int ipv6_renew_option(void *ohdr,
  */
 struct ipv6_txoptions *
 ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
-		   int newtype,
-		   struct ipv6_opt_hdr __user *newopt, int newoptlen)
+		   int newtype, struct ipv6_opt_hdr *newopt)
 {
 	int tot_len = 0;
 	char *p;
 	struct ipv6_txoptions *opt2;
-	int err;
 
 	if (opt) {
 		if (newtype != IPV6_HOPOPTS && opt->hopopt)
@@ -1082,8 +1066,8 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
 			tot_len += CMSG_ALIGN(ipv6_optlen(opt->dst1opt));
 	}
 
-	if (newopt && newoptlen)
-		tot_len += CMSG_ALIGN(newoptlen);
+	if (newopt)
+		tot_len += CMSG_ALIGN(ipv6_optlen(newopt));
 
 	if (!tot_len)
 		return NULL;
@@ -1098,67 +1082,25 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
 	opt2->tot_len = tot_len;
 	p = (char *)(opt2 + 1);
 
-	err = ipv6_renew_option(opt ? opt->hopopt : NULL, newopt, newoptlen,
-				newtype != IPV6_HOPOPTS,
-				&opt2->hopopt, &p);
-	if (err)
-		goto out;
-
-	err = ipv6_renew_option(opt ? opt->dst0opt : NULL, newopt, newoptlen,
-				newtype != IPV6_RTHDRDSTOPTS,
-				&opt2->dst0opt, &p);
-	if (err)
-		goto out;
-
-	err = ipv6_renew_option(opt ? opt->srcrt : NULL, newopt, newoptlen,
-				newtype != IPV6_RTHDR,
-				(struct ipv6_opt_hdr **)&opt2->srcrt, &p);
-	if (err)
-		goto out;
-
-	err = ipv6_renew_option(opt ? opt->dst1opt : NULL, newopt, newoptlen,
-				newtype != IPV6_DSTOPTS,
-				&opt2->dst1opt, &p);
-	if (err)
-		goto out;
-
+	ipv6_renew_option(newtype == IPV6_HOPOPTS ? newopt :
+				opt ? opt->hopopt : NULL,
+			  &opt2->hopopt, &p);
+
+	ipv6_renew_option(newtype == IPV6_RTHDRDSTOPTS ? newopt :
+				opt ? opt->dst0opt : NULL,
+			&opt2->dst0opt, &p);
+	ipv6_renew_option(newtype == IPV6_RTHDR ? newopt :
+				opt ? (struct ipv6_opt_hdr *)opt->srcrt : NULL,
+			(struct ipv6_opt_hdr **)&opt2->srcrt, &p);
+	ipv6_renew_option(newtype == IPV6_DSTOPTS ? newopt :
+				opt ? opt->dst1opt : NULL,
+			&opt2->dst1opt, &p);
 	opt2->opt_nflen = (opt2->hopopt ? ipv6_optlen(opt2->hopopt) : 0) +
 			  (opt2->dst0opt ? ipv6_optlen(opt2->dst0opt) : 0) +
 			  (opt2->srcrt ? ipv6_optlen(opt2->srcrt) : 0);
 	opt2->opt_flen = (opt2->dst1opt ? ipv6_optlen(opt2->dst1opt) : 0);
 
 	return opt2;
-out:
-	sock_kfree_s(sk, opt2, opt2->tot_len);
-	return ERR_PTR(err);
-}
-
-/**
- * ipv6_renew_options_kern - replace a specific ext hdr with a new one.
- *
- * @sk: sock from which to allocate memory
- * @opt: original options
- * @newtype: option type to replace in @opt
- * @newopt: new option of type @newtype to replace (kernel-mem)
- * @newoptlen: length of @newopt
- *
- * See ipv6_renew_options().  The difference is that @newopt is
- * kernel memory, rather than user memory.
- */
-struct ipv6_txoptions *
-ipv6_renew_options_kern(struct sock *sk, struct ipv6_txoptions *opt,
-			int newtype, struct ipv6_opt_hdr *newopt,
-			int newoptlen)
-{
-	struct ipv6_txoptions *ret_val;
-	const mm_segment_t old_fs = get_fs();
-
-	set_fs(KERNEL_DS);
-	ret_val = ipv6_renew_options(sk, opt, newtype,
-				     (struct ipv6_opt_hdr __user *)newopt,
-				     newoptlen);
-	set_fs(old_fs);
-	return ret_val;
 }
 
 struct ipv6_txoptions *ipv6_fixup_options(struct ipv6_txoptions *opt_space,
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 4d780c7f0130..b69ba18e0138 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -398,6 +398,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 	case IPV6_DSTOPTS:
 	{
 		struct ipv6_txoptions *opt;
+		struct ipv6_opt_hdr *new = NULL;
+
+		/* hop-by-hop / destination options are privileged option */
+		retv = -EPERM;
+		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+			break;
 
 		/* remove any sticky options header with a zero option
 		 * length, per RFC3542.
@@ -409,17 +415,23 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		else if (optlen < sizeof(struct ipv6_opt_hdr) ||
 			 optlen & 0x7 || optlen > 8 * 255)
 			goto e_inval;
-
-		/* hop-by-hop / destination options are privileged option */
-		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
-			break;
+		else {
+			new = kmemdup(optval, optlen, GFP_USER);
+			if (IS_ERR(new)) {
+				retv = PTR_ERR(new);
+				break;
+			}
+			if (unlikely(ipv6_optlen(new) > optlen)) {
+				kfree(new);
+				retv = -EINVAL;
+				break;
+			}
+		}
 
 		opt = rcu_dereference_protected(np->opt,
 						lockdep_sock_is_held(sk));
-		opt = ipv6_renew_options(sk, opt, optname,
-					 (struct ipv6_opt_hdr __user *)optval,
-					 optlen);
+		opt = ipv6_renew_options(sk, opt, optname, new);
+		kfree(new);
 		if (IS_ERR(opt)) {
 			retv = PTR_ERR(opt);
 			break;

^ permalink raw reply related

* [PATCH net-next] net: phy: fixed-phy: Make the error path simpler
From: Fabio Estevam @ 2018-06-24  0:28 UTC (permalink / raw)
  To: davem; +Cc: andrew, f.fainelli, netdev, Fabio Estevam

From: Fabio Estevam <fabio.estevam@nxp.com>

When platform_device_register_simple() fails we can return
the error immediately instead of jumping to the 'err_pdev'
label.

This makes the error path a bit simpler.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
---
 drivers/net/phy/fixed_phy.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
index 001fe1d..67b2608 100644
--- a/drivers/net/phy/fixed_phy.c
+++ b/drivers/net/phy/fixed_phy.c
@@ -259,10 +259,8 @@ static int __init fixed_mdio_bus_init(void)
 	int ret;
 
 	pdev = platform_device_register_simple("Fixed MDIO bus", 0, NULL, 0);
-	if (IS_ERR(pdev)) {
-		ret = PTR_ERR(pdev);
-		goto err_pdev;
-	}
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
 
 	fmb->mii_bus = mdiobus_alloc();
 	if (fmb->mii_bus == NULL) {
@@ -287,7 +285,6 @@ static int __init fixed_mdio_bus_init(void)
 	mdiobus_free(fmb->mii_bus);
 err_mdiobus_reg:
 	platform_device_unregister(pdev);
-err_pdev:
 	return ret;
 }
 module_init(fixed_mdio_bus_init);
-- 
2.7.4

^ permalink raw reply related

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-24  1:45 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, konrad.wilk,
	Jakub Kicinski, Samudrala, Sridhar, Cornelia Huck, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ22DX8=rNPY+gNS_q=LCYLR5ieoE7oD8p4+8AnpzQsWSCg@mail.gmail.com>

On Fri, Jun 22, 2018 at 05:17:10PM -0700, Siwei Liu wrote:
> I forgot to mention the above has the assumption that we expose both
> STANDBY and UUID feature to QEMU user. In fact, as we're going towards
> not exposing the STANDBY feature directly to user, UUID may be always
> required to enable STANDBY.

Sounds good.

> If not, how do we make sure QEMU can
> control the visibility of primary device?

Hypervisors fundamentally always can control visibility of all
virtual devices.

> Something to be confirmed
> before implementing the code.
> 

^ permalink raw reply

* Re: [PATCH] fib_rules: match rules based on suppress_* properties too
From: David Miller @ 2018-06-24  2:25 UTC (permalink / raw)
  To: Jason; +Cc: roopa, netdev
In-Reply-To: <20180623155930.25983-1-Jason@zx2c4.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Sat, 23 Jun 2018 17:59:30 +0200

> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
> 
>    $ ip -4 rule add table main suppress_prefixlength 0
> 
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")

But the old rule_find() code didn't check this key either, so I can't
see how the behavior in this area changed.

I think the behavior changed for a different reason.

The commit mentioned in your Fixes: tag changed newrule semantics
wrt. defaults or "any" values.

The original code matched on pure values of the keys, whereas the new
code only compares the keys when the new rule is not specifying an
"any" value.

-		if (r->table != rule->table)
+		if (rule->table && r->table != rule->table)
 			continue;

And I think these changes are what makes your test case fail after the
commit.  Some other key didn't match previous due to the handling of
"any" values.

^ permalink raw reply

* Re: [PATCH net] cxgb4: when disabling dcb set txq dcb priority to 0
From: David Miller @ 2018-06-24  2:43 UTC (permalink / raw)
  To: ganeshgr; +Cc: netdev, nirranjan, indranil, robert, dsa, leedom
In-Reply-To: <1529765906-7499-1-git-send-email-ganeshgr@chelsio.com>

From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Sat, 23 Jun 2018 20:28:26 +0530

> When we are disabling DCB, store "0" in txq->dcb_prio
> since that's used for future TX Work Request "OVLAN_IDX"
> values. Setting non zero priority upon disabling DCB
> would halt the traffic.
> 
> Reported-by: AMG Zollner Robert <robert@cloudmedia.eu>
> CC: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Casey Leedom <leedom@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] qmi_wwan: add support for the Dell Wireless 5821e module
From: David Miller @ 2018-06-24  2:44 UTC (permalink / raw)
  To: aleksander; +Cc: bjorn, netdev, linux-usb
In-Reply-To: <20180623212252.15026-1-aleksander@aleksander.es>

From: Aleksander Morgado <aleksander@aleksander.es>
Date: Sat, 23 Jun 2018 23:22:52 +0200

> This module exposes two USB configurations: a QMI+AT capable setup on
> USB config #1 and a MBIM capable setup on USB config #2.
> 
> By default the kernel will choose the MBIM capable configuration as
> long as the cdc_mbim driver is available. This patch adds support for
> the QMI port in the secondary configuration.
> 
> Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>

Applied and queued up for -stable.

^ permalink raw reply

* BUG: unable to handle kernel paging request in bpf_int_jit_compile
From: syzbot @ 2018-06-24  4:09 UTC (permalink / raw)
  To: ast, daniel, davem, hpa, kuznet, linux-kernel, mingo, netdev,
	syzkaller-bugs, tglx, x86, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    5e2204832b20 Merge tag 'powerpc-4.18-2' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148b5a90400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=befbcd7305e41bb0
dashboard link: https://syzkaller.appspot.com/bug?extid=a4eb8c7766952a1ca872
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=10ee22d4400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com

RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
BUG: unable to handle kernel paging request at ffffffffa0008002
PGD 8e6d067 P4D 8e6d067 PUD 8e6e063 PMD 1b4528067 PTE 1d433d161
Oops: 0003 [#1] SMP KASAN
CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #114
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168
Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 4c 89 f2 83 e2 07  
38 d0 7f 08 84 c0 0f 85 a0 02 00 00 48 8b 85 00 ff ff ff <80> 60 02 fe e9  
c7 fb ff ff e8 ac 00 36 00 48 8b 8d 30 ff ff ff 48
RSP: 0018:ffff8801cfca7998 EFLAGS: 00010246
RAX: ffffffffa0008000 RBX: 0000000000000046 RCX: ffffffff81460e4a
RDX: 0000000000000002 RSI: ffffffff81460e58 RDI: 0000000000000005
RBP: ffff8801cfca7ab8 R08: ffff8801aa2121c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90001938002
R13: ffff8801cfca7a90 R14: ffffffffa0008002 R15: 00000000fffffff4
FS:  0000000001429940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0008002 CR3: 00000001d2c40000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  bpf_prog_select_runtime+0x7db/0xa60 kernel/bpf/core.c:1505
  bpf_prog_load+0x1194/0x1c60 kernel/bpf/syscall.c:1356
  __do_sys_bpf kernel/bpf/syscall.c:2360 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2322 [inline]
  __x64_sys_bpf+0x36c/0x510 kernel/bpf/syscall.c:2322
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a99
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd396676f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
Modules linked in:
Dumping ftrace buffer:
    (ftrace buffer empty)
CR2: ffffffffa0008002
---[ end trace fa548fc30dca8c15 ]---
RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168
Code: b8 00 00 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 4c 89 f2 83 e2 07  
38 d0 7f 08 84 c0 0f 85 a0 02 00 00 48 8b 85 00 ff ff ff <80> 60 02 fe e9  
c7 fb ff ff e8 ac 00 36 00 48 8b 8d 30 ff ff ff 48
RSP: 0018:ffff8801cfca7998 EFLAGS: 00010246
RAX: ffffffffa0008000 RBX: 0000000000000046 RCX: ffffffff81460e4a
RDX: 0000000000000002 RSI: ffffffff81460e58 RDI: 0000000000000005
RBP: ffff8801cfca7ab8 R08: ffff8801aa2121c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90001938002
R13: ffff8801cfca7a90 R14: ffffffffa0008002 R15: 00000000fffffff4
FS:  0000000001429940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0008002 CR3: 00000001d2c40000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: Donation
From: M.M fridman @ 2018-06-24  3:07 UTC (permalink / raw)




I Mikhail Fridman. has selected you specially as one of my beneficiaries
for my Charitable Donation, Just as I have declared on May 23, 2016 to give
my fortune as charity.

Check the link below for confirmation:

http://www.ibtimes.co.uk/russias-second-wealthiest-man-mikhail-fridman-plans-leaving-14-2bn-fortune-charity-1561604

Reply as soon as possible with further directives.

Best Regards,
Mikhail Fridman.

^ permalink raw reply

* [PATCH RFC 0/2] Convert GRO receive over to hash table.
From: David Miller @ 2018-06-24  5:13 UTC (permalink / raw)
  To: netdev; +Cc: edumazet


When many parallel flows are present and being received on the same
RX queue, GRO processing can become expensive because each incoming
frame must traverse the per-NAPI GRO list at each protocol layer
of GRO receive (eth --> ipv{4,6} --> tcp).

Use the already computed hash to chain these SKBs in a hash table
instead of a simple list.

The first patch makes the GRO list a true list_head.

The second patch implements the hash table.

This series patches basic testing and I added some diagnostics
to make sure we really were aggregating GRO frames :-)

Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH RFC 1/2] net: Convert GRO SKB handling to list_head.
From: David Miller @ 2018-06-24  5:13 UTC (permalink / raw)
  To: netdev; +Cc: edumazet


Manage pending per-NAPI GRO packets via list_head.

Return an SKB pointer from the GRO receive handlers.  When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.

Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/geneve.c        | 11 +++---
 drivers/net/vxlan.c         | 11 +++---
 include/linux/etherdevice.h |  3 +-
 include/linux/netdevice.h   | 32 ++++++++---------
 include/linux/skbuff.h      |  3 +-
 include/linux/udp.h         |  4 +--
 include/net/inet_common.h   |  2 +-
 include/net/tcp.h           |  2 +-
 include/net/udp.h           |  4 +--
 include/net/udp_tunnel.h    |  6 ++--
 net/8021q/vlan.c            | 13 +++----
 net/core/dev.c              | 68 +++++++++++++++----------------------
 net/core/skbuff.c           |  4 +--
 net/ethernet/eth.c          | 12 +++----
 net/ipv4/af_inet.c          | 12 +++----
 net/ipv4/esp4_offload.c     |  4 +--
 net/ipv4/fou.c              | 20 +++++------
 net/ipv4/gre_offload.c      |  8 ++---
 net/ipv4/tcp_offload.c      | 14 ++++----
 net/ipv4/udp_offload.c      | 13 +++----
 net/ipv6/esp6_offload.c     |  4 +--
 net/ipv6/ip6_offload.c      | 16 ++++-----
 net/ipv6/tcpv6_offload.c    |  4 +--
 net/ipv6/udp_offload.c      |  4 +--
 24 files changed, 133 insertions(+), 141 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 750eaa53bf0c..3e94375b9b01 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -418,11 +418,12 @@ static int geneve_hlen(struct genevehdr *gh)
 	return sizeof(*gh) + gh->opt_len * 4;
 }
 
-static struct sk_buff **geneve_gro_receive(struct sock *sk,
-					   struct sk_buff **head,
-					   struct sk_buff *skb)
+static struct sk_buff *geneve_gro_receive(struct sock *sk,
+					  struct list_head *head,
+					  struct sk_buff *skb)
 {
-	struct sk_buff *p, **pp = NULL;
+	struct sk_buff *pp = NULL;
+	struct sk_buff *p;
 	struct genevehdr *gh, *gh2;
 	unsigned int hlen, gh_len, off_gnv;
 	const struct packet_offload *ptype;
@@ -449,7 +450,7 @@ static struct sk_buff **geneve_gro_receive(struct sock *sk,
 			goto out;
 	}
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
 
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index aee0e60471f1..cc14e0cd5647 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -568,11 +568,12 @@ static struct vxlanhdr *vxlan_gro_remcsum(struct sk_buff *skb,
 	return vh;
 }
 
-static struct sk_buff **vxlan_gro_receive(struct sock *sk,
-					  struct sk_buff **head,
-					  struct sk_buff *skb)
+static struct sk_buff *vxlan_gro_receive(struct sock *sk,
+					 struct list_head *head,
+					 struct sk_buff *skb)
 {
-	struct sk_buff *p, **pp = NULL;
+	struct sk_buff *pp = NULL;
+	struct sk_buff *p;
 	struct vxlanhdr *vh, *vh2;
 	unsigned int hlen, off_vx;
 	int flush = 1;
@@ -607,7 +608,7 @@ static struct sk_buff **vxlan_gro_receive(struct sock *sk,
 
 	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
 
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 79563840c295..572e11bb8696 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -59,8 +59,7 @@ struct net_device *devm_alloc_etherdev_mqs(struct device *dev, int sizeof_priv,
 					   unsigned int rxqs);
 #define devm_alloc_etherdev(dev, sizeof_priv) devm_alloc_etherdev_mqs(dev, sizeof_priv, 1, 1)
 
-struct sk_buff **eth_gro_receive(struct sk_buff **head,
-				 struct sk_buff *skb);
+struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb);
 int eth_gro_complete(struct sk_buff *skb, int nhoff);
 
 /* Reserved Ethernet Addresses per IEEE 802.1Q */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec9850c7936..f176d9873910 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -322,7 +322,7 @@ struct napi_struct {
 	int			poll_owner;
 #endif
 	struct net_device	*dev;
-	struct sk_buff		*gro_list;
+	struct list_head	gro_list;
 	struct sk_buff		*skb;
 	struct hrtimer		timer;
 	struct list_head	dev_list;
@@ -2255,10 +2255,10 @@ static inline int gro_recursion_inc_test(struct sk_buff *skb)
 	return ++NAPI_GRO_CB(skb)->recursion_counter == GRO_RECURSION_LIMIT;
 }
 
-typedef struct sk_buff **(*gro_receive_t)(struct sk_buff **, struct sk_buff *);
-static inline struct sk_buff **call_gro_receive(gro_receive_t cb,
-						struct sk_buff **head,
-						struct sk_buff *skb)
+typedef struct sk_buff *(*gro_receive_t)(struct list_head *, struct sk_buff *);
+static inline struct sk_buff *call_gro_receive(gro_receive_t cb,
+					       struct list_head *head,
+					       struct sk_buff *skb)
 {
 	if (unlikely(gro_recursion_inc_test(skb))) {
 		NAPI_GRO_CB(skb)->flush |= 1;
@@ -2268,12 +2268,12 @@ static inline struct sk_buff **call_gro_receive(gro_receive_t cb,
 	return cb(head, skb);
 }
 
-typedef struct sk_buff **(*gro_receive_sk_t)(struct sock *, struct sk_buff **,
-					     struct sk_buff *);
-static inline struct sk_buff **call_gro_receive_sk(gro_receive_sk_t cb,
-						   struct sock *sk,
-						   struct sk_buff **head,
-						   struct sk_buff *skb)
+typedef struct sk_buff *(*gro_receive_sk_t)(struct sock *, struct list_head *,
+					    struct sk_buff *);
+static inline struct sk_buff *call_gro_receive_sk(gro_receive_sk_t cb,
+						  struct sock *sk,
+						  struct list_head *head,
+						  struct sk_buff *skb)
 {
 	if (unlikely(gro_recursion_inc_test(skb))) {
 		NAPI_GRO_CB(skb)->flush |= 1;
@@ -2299,8 +2299,8 @@ struct packet_type {
 struct offload_callbacks {
 	struct sk_buff		*(*gso_segment)(struct sk_buff *skb,
 						netdev_features_t features);
-	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
-						 struct sk_buff *skb);
+	struct sk_buff		*(*gro_receive)(struct list_head *head,
+						struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
 };
 
@@ -2568,7 +2568,7 @@ struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex);
 struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 int netdev_get_name(struct net *net, char *name, int ifindex);
 int dev_restart(struct net_device *dev);
-int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb);
+int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
 
 static inline unsigned int skb_gro_offset(const struct sk_buff *skb)
 {
@@ -2784,13 +2784,13 @@ static inline void skb_gro_remcsum_cleanup(struct sk_buff *skb,
 }
 
 #ifdef CONFIG_XFRM_OFFLOAD
-static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp, int flush)
+static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff *pp, int flush)
 {
 	if (PTR_ERR(pp) != -EINPROGRESS)
 		NAPI_GRO_CB(skb)->flush |= flush;
 }
 #else
-static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp, int flush)
+static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff *pp, int flush)
 {
 	NAPI_GRO_CB(skb)->flush |= flush;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c86885954994..7ccc601b55d9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -677,7 +677,8 @@ struct sk_buff {
 				int			ip_defrag_offset;
 			};
 		};
-		struct rb_node	rbnode; /* used in netem & tcp stack */
+		struct rb_node		rbnode; /* used in netem & tcp stack */
+		struct list_head	list;
 	};
 	struct sock		*sk;
 
diff --git a/include/linux/udp.h b/include/linux/udp.h
index ca840345571b..320d49d85484 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -74,8 +74,8 @@ struct udp_sock {
 	void (*encap_destroy)(struct sock *sk);
 
 	/* GRO functions for UDP socket */
-	struct sk_buff **	(*gro_receive)(struct sock *sk,
-					       struct sk_buff **head,
+	struct sk_buff *	(*gro_receive)(struct sock *sk,
+					       struct list_head *head,
 					       struct sk_buff *skb);
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 384b90c62c0b..3ca969cbd161 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -43,7 +43,7 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
 int inet_recv_error(struct sock *sk, struct msghdr *msg, int len,
 		    int *addr_len);
 
-struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb);
+struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb);
 int inet_gro_complete(struct sk_buff *skb, int nhoff);
 struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 				 netdev_features_t features);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 822ee49ed0f9..402a88b0e8a8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1788,7 +1788,7 @@ void tcp_v4_destroy_sock(struct sock *sk);
 
 struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features);
-struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb);
+struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb);
 int tcp_gro_complete(struct sk_buff *skb);
 
 void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr);
diff --git a/include/net/udp.h b/include/net/udp.h
index b1ea8b0f5e6a..5723c6128ae4 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -170,8 +170,8 @@ static inline void udp_csum_pull_header(struct sk_buff *skb)
 typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
 				     __be16 dport);
 
-struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
-				 struct udphdr *uh, udp_lookup_t lookup);
+struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
+				struct udphdr *uh, udp_lookup_t lookup);
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index b95a6927c718..fe680ab6b15a 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -65,9 +65,9 @@ static inline int udp_sock_create(struct net *net,
 
 typedef int (*udp_tunnel_encap_rcv_t)(struct sock *sk, struct sk_buff *skb);
 typedef void (*udp_tunnel_encap_destroy_t)(struct sock *sk);
-typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
-						     struct sk_buff **head,
-						     struct sk_buff *skb);
+typedef struct sk_buff *(*udp_tunnel_gro_receive_t)(struct sock *sk,
+						    struct list_head *head,
+						    struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 					 int nhoff);
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 73a65789271b..99141986efa0 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -647,13 +647,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	return err;
 }
 
-static struct sk_buff **vlan_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *vlan_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
-	struct sk_buff *p, **pp = NULL;
-	struct vlan_hdr *vhdr;
-	unsigned int hlen, off_vlan;
 	const struct packet_offload *ptype;
+	unsigned int hlen, off_vlan;
+	struct sk_buff *pp = NULL;
+	struct vlan_hdr *vhdr;
+	struct sk_buff *p;
 	__be16 type;
 	int flush = 1;
 
@@ -675,7 +676,7 @@ static struct sk_buff **vlan_gro_receive(struct sk_buff **head,
 
 	flush = 0;
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		struct vlan_hdr *vhdr2;
 
 		if (!NAPI_GRO_CB(p)->same_flow)
diff --git a/net/core/dev.c b/net/core/dev.c
index a5aa1c7444e6..aa61b9344b46 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4881,36 +4881,25 @@ static int napi_gro_complete(struct sk_buff *skb)
  */
 void napi_gro_flush(struct napi_struct *napi, bool flush_old)
 {
-	struct sk_buff *skb, *prev = NULL;
-
-	/* scan list and build reverse chain */
-	for (skb = napi->gro_list; skb != NULL; skb = skb->next) {
-		skb->prev = prev;
-		prev = skb;
-	}
-
-	for (skb = prev; skb; skb = prev) {
-		skb->next = NULL;
+	struct sk_buff *skb, *p;
 
+	list_for_each_entry_safe_reverse(skb, p, &napi->gro_list, list) {
 		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
 			return;
-
-		prev = skb->prev;
+		list_del_init(&skb->list);
 		napi_gro_complete(skb);
 		napi->gro_count--;
 	}
-
-	napi->gro_list = NULL;
 }
 EXPORT_SYMBOL(napi_gro_flush);
 
 static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 {
-	struct sk_buff *p;
 	unsigned int maclen = skb->dev->hard_header_len;
 	u32 hash = skb_get_hash_raw(skb);
+	struct sk_buff *p;
 
-	for (p = napi->gro_list; p; p = p->next) {
+	list_for_each_entry(p, &napi->gro_list, list) {
 		unsigned long diffs;
 
 		NAPI_GRO_CB(p)->flush = 0;
@@ -4977,12 +4966,12 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow)
 
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
-	struct sk_buff **pp = NULL;
+	struct list_head *head = &offload_base;
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
-	struct list_head *head = &offload_base;
-	int same_flow;
+	struct sk_buff *pp = NULL;
 	enum gro_result ret;
+	int same_flow;
 	int grow;
 
 	if (netif_elide_gro(skb->dev))
@@ -5039,11 +5028,8 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	ret = NAPI_GRO_CB(skb)->free ? GRO_MERGED_FREE : GRO_MERGED;
 
 	if (pp) {
-		struct sk_buff *nskb = *pp;
-
-		*pp = nskb->next;
-		nskb->next = NULL;
-		napi_gro_complete(nskb);
+		list_del_init(&pp->list);
+		napi_gro_complete(pp);
 		napi->gro_count--;
 	}
 
@@ -5054,15 +5040,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		goto normal;
 
 	if (unlikely(napi->gro_count >= MAX_GRO_SKBS)) {
-		struct sk_buff *nskb = napi->gro_list;
+		struct sk_buff *nskb;
 
-		/* locate the end of the list to select the 'oldest' flow */
-		while (nskb->next) {
-			pp = &nskb->next;
-			nskb = *pp;
-		}
-		*pp = NULL;
-		nskb->next = NULL;
+		nskb = list_last_entry(&napi->gro_list, struct sk_buff, list);
+		list_del(&nskb->list);
 		napi_gro_complete(nskb);
 	} else {
 		napi->gro_count++;
@@ -5071,8 +5052,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	NAPI_GRO_CB(skb)->age = jiffies;
 	NAPI_GRO_CB(skb)->last = skb;
 	skb_shinfo(skb)->gso_size = skb_gro_len(skb);
-	skb->next = napi->gro_list;
-	napi->gro_list = skb;
+	list_add(&skb->list, &napi->gro_list);
 	ret = GRO_HELD;
 
 pull:
@@ -5478,7 +5458,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 				 NAPIF_STATE_IN_BUSY_POLL)))
 		return false;
 
-	if (n->gro_list) {
+	if (!list_empty(&n->gro_list)) {
 		unsigned long timeout = 0;
 
 		if (work_done)
@@ -5687,7 +5667,7 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	/* Note : we use a relaxed variant of napi_schedule_prep() not setting
 	 * NAPI_STATE_MISSED, since we do not react to a device IRQ.
 	 */
-	if (napi->gro_list && !napi_disable_pending(napi) &&
+	if (!list_empty(&napi->gro_list) && !napi_disable_pending(napi) &&
 	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
 		__napi_schedule_irqoff(napi);
 
@@ -5701,7 +5681,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
 	napi->gro_count = 0;
-	napi->gro_list = NULL;
+	INIT_LIST_HEAD(&napi->gro_list);
 	napi->skb = NULL;
 	napi->poll = poll;
 	if (weight > NAPI_POLL_WEIGHT)
@@ -5734,6 +5714,14 @@ void napi_disable(struct napi_struct *n)
 }
 EXPORT_SYMBOL(napi_disable);
 
+static void gro_list_free(struct list_head *head)
+{
+	struct sk_buff *skb, *p;
+
+	list_for_each_entry_safe(skb, p, head, list)
+		kfree_skb(skb);
+}
+
 /* Must be called in process context */
 void netif_napi_del(struct napi_struct *napi)
 {
@@ -5743,8 +5731,8 @@ void netif_napi_del(struct napi_struct *napi)
 	list_del_init(&napi->dev_list);
 	napi_free_frags(napi);
 
-	kfree_skb_list(napi->gro_list);
-	napi->gro_list = NULL;
+	gro_list_free(&napi->gro_list);
+	INIT_LIST_HEAD(&napi->gro_list);
 	napi->gro_count = 0;
 }
 EXPORT_SYMBOL(netif_napi_del);
@@ -5787,7 +5775,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 		goto out_unlock;
 	}
 
-	if (n->gro_list) {
+	if (!list_empty(&n->gro_list)) {
 		/* flush too old packets
 		 * If HZ < 1000, flush all packets.
 		 */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178c..b1f274f22d85 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3815,14 +3815,14 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 }
 EXPORT_SYMBOL_GPL(skb_segment);
 
-int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 {
 	struct skb_shared_info *pinfo, *skbinfo = skb_shinfo(skb);
 	unsigned int offset = skb_gro_offset(skb);
 	unsigned int headlen = skb_headlen(skb);
 	unsigned int len = skb_gro_len(skb);
-	struct sk_buff *lp, *p = *head;
 	unsigned int delta_truesize;
+	struct sk_buff *lp;
 
 	if (unlikely(p->len + len >= 65536))
 		return -E2BIG;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index ee28440f57c5..fd8faa0dfa61 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -427,13 +427,13 @@ ssize_t sysfs_format_mac(char *buf, const unsigned char *addr, int len)
 }
 EXPORT_SYMBOL(sysfs_format_mac);
 
-struct sk_buff **eth_gro_receive(struct sk_buff **head,
-				 struct sk_buff *skb)
+struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
-	struct sk_buff *p, **pp = NULL;
-	struct ethhdr *eh, *eh2;
-	unsigned int hlen, off_eth;
 	const struct packet_offload *ptype;
+	unsigned int hlen, off_eth;
+	struct sk_buff *pp = NULL;
+	struct ethhdr *eh, *eh2;
+	struct sk_buff *p;
 	__be16 type;
 	int flush = 1;
 
@@ -448,7 +448,7 @@ struct sk_buff **eth_gro_receive(struct sk_buff **head,
 
 	flush = 0;
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 15e125558c76..06b218a2870f 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1384,12 +1384,12 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(inet_gso_segment);
 
-struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	const struct net_offload *ops;
-	struct sk_buff **pp = NULL;
-	struct sk_buff *p;
+	struct sk_buff *pp = NULL;
 	const struct iphdr *iph;
+	struct sk_buff *p;
 	unsigned int hlen;
 	unsigned int off;
 	unsigned int id;
@@ -1425,7 +1425,7 @@ struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
 	id >>= 16;
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		struct iphdr *iph2;
 		u16 flush_id;
 
@@ -1505,8 +1505,8 @@ struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(inet_gro_receive);
 
-static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *ipip_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	if (NAPI_GRO_CB(skb)->encap_mark) {
 		NAPI_GRO_CB(skb)->flush = 1;
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index 7cf755ef9efb..bbeecd13e534 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -28,8 +28,8 @@
 #include <linux/spinlock.h>
 #include <net/udp.h>
 
-static struct sk_buff **esp4_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *esp4_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	int offset = skb_gro_offset(skb);
 	struct xfrm_offload *xo;
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1540db65241a..efdc9e1f741e 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -224,14 +224,14 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-static struct sk_buff **fou_gro_receive(struct sock *sk,
-					struct sk_buff **head,
-					struct sk_buff *skb)
+static struct sk_buff *fou_gro_receive(struct sock *sk,
+				       struct list_head *head,
+				       struct sk_buff *skb)
 {
-	const struct net_offload *ops;
-	struct sk_buff **pp = NULL;
 	u8 proto = fou_from_sock(sk)->protocol;
 	const struct net_offload **offloads;
+	const struct net_offload *ops;
+	struct sk_buff *pp = NULL;
 
 	/* We can clear the encap_mark for FOU as we are essentially doing
 	 * one of two possible things.  We are either adding an L4 tunnel
@@ -305,13 +305,13 @@ static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 	return guehdr;
 }
 
-static struct sk_buff **gue_gro_receive(struct sock *sk,
-					struct sk_buff **head,
-					struct sk_buff *skb)
+static struct sk_buff *gue_gro_receive(struct sock *sk,
+				       struct list_head *head,
+				       struct sk_buff *skb)
 {
 	const struct net_offload **offloads;
 	const struct net_offload *ops;
-	struct sk_buff **pp = NULL;
+	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	struct guehdr *guehdr;
 	size_t len, optlen, hdrlen, off;
@@ -397,7 +397,7 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
 
 	skb_gro_pull(skb, hdrlen);
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		const struct guehdr *guehdr2;
 
 		if (!NAPI_GRO_CB(p)->same_flow)
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 1859c473b21a..b9673c21be45 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -108,10 +108,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 	return segs;
 }
 
-static struct sk_buff **gre_gro_receive(struct sk_buff **head,
-					struct sk_buff *skb)
+static struct sk_buff *gre_gro_receive(struct list_head *head,
+				       struct sk_buff *skb)
 {
-	struct sk_buff **pp = NULL;
+	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	const struct gre_base_hdr *greh;
 	unsigned int hlen, grehlen;
@@ -182,7 +182,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
 					     null_compute_pseudo);
 	}
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		const struct gre_base_hdr *greh2;
 
 		if (!NAPI_GRO_CB(p)->same_flow)
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 8cc7c3487330..f5aee641f825 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -180,9 +180,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	return segs;
 }
 
-struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
-	struct sk_buff **pp = NULL;
+	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	struct tcphdr *th;
 	struct tcphdr *th2;
@@ -220,7 +220,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	len = skb_gro_len(skb);
 	flags = tcp_flag_word(th);
 
-	for (; (p = *head); head = &p->next) {
+	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
 
@@ -233,7 +233,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 		goto found;
 	}
-
+	p = NULL;
 	goto out_check_final;
 
 found:
@@ -263,7 +263,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	flush |= (len - 1) >= mss;
 	flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
 
-	if (flush || skb_gro_receive(head, skb)) {
+	if (flush || skb_gro_receive(p, skb)) {
 		mss = 1;
 		goto out_check_final;
 	}
@@ -277,7 +277,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 					TCP_FLAG_FIN));
 
 	if (p && (!NAPI_GRO_CB(skb)->same_flow || flush))
-		pp = head;
+		pp = p;
 
 out:
 	NAPI_GRO_CB(skb)->flush |= (flush != 0);
@@ -302,7 +302,7 @@ int tcp_gro_complete(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(tcp_gro_complete);
 
-static struct sk_buff **tcp4_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+static struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	/* Don't bother verifying checksum if we're going to flush anyway. */
 	if (!NAPI_GRO_CB(skb)->flush &&
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 92dc9e5a7ff3..ac46c1c55c99 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -343,10 +343,11 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	return segs;
 }
 
-struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
-				 struct udphdr *uh, udp_lookup_t lookup)
+struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
+				struct udphdr *uh, udp_lookup_t lookup)
 {
-	struct sk_buff *p, **pp = NULL;
+	struct sk_buff *pp = NULL;
+	struct sk_buff *p;
 	struct udphdr *uh2;
 	unsigned int off = skb_gro_offset(skb);
 	int flush = 1;
@@ -371,7 +372,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 unflush:
 	flush = 0;
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
 			continue;
 
@@ -399,8 +400,8 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(udp_gro_receive);
 
-static struct sk_buff **udp4_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *udp4_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index 27f59b61f70f..ddfa533a84e5 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -49,8 +49,8 @@ static __u16 esp6_nexthdr_esp_offset(struct ipv6hdr *ipv6_hdr, int nhlen)
 	return 0;
 }
 
-static struct sk_buff **esp6_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *esp6_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	int offset = skb_gro_offset(skb);
 	struct xfrm_offload *xo;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 5b3f2f89ef41..37ff4805b20c 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -163,11 +163,11 @@ static int ipv6_exthdrs_len(struct ipv6hdr *iph,
 	return len;
 }
 
-static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *ipv6_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	const struct net_offload *ops;
-	struct sk_buff **pp = NULL;
+	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	struct ipv6hdr *iph;
 	unsigned int nlen;
@@ -214,7 +214,7 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 	flush--;
 	nlen = skb_network_header_len(skb);
 
-	for (p = *head; p; p = p->next) {
+	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
 		__be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
 
@@ -263,8 +263,8 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 	return pp;
 }
 
-static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head,
-					       struct sk_buff *skb)
+static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
+					      struct sk_buff *skb)
 {
 	/* Common GRO receive for SIT and IP6IP6 */
 
@@ -278,8 +278,8 @@ static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head,
 	return ipv6_gro_receive(head, skb);
 }
 
-static struct sk_buff **ip4ip6_gro_receive(struct sk_buff **head,
-					   struct sk_buff *skb)
+static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
+					  struct sk_buff *skb)
 {
 	/* Common GRO receive for SIT and IP6IP6 */
 
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 278e49cd67d4..e72947c99454 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -15,8 +15,8 @@
 #include <net/ip6_checksum.h>
 #include "ip6_offload.h"
 
-static struct sk_buff **tcp6_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *tcp6_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	/* Don't bother verifying checksum if we're going to flush anyway. */
 	if (!NAPI_GRO_CB(skb)->flush &&
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 03a2ff3fe1e6..95dee9ca8d22 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -114,8 +114,8 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 	return segs;
 }
 
-static struct sk_buff **udp6_gro_receive(struct sk_buff **head,
-					 struct sk_buff *skb)
+static struct sk_buff *udp6_gro_receive(struct list_head *head,
+					struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH RFC 2/2] net: Convert NAPI gro list into a small hash table.
From: David Miller @ 2018-06-24  5:14 UTC (permalink / raw)
  To: netdev; +Cc: edumazet


Improve the performance of GRO receive by splitting flows into
multiple hash chains.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |   3 +-
 net/core/dev.c            | 105 ++++++++++++++++++++++++++++----------
 2 files changed, 81 insertions(+), 27 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f176d9873910..c6b377a15869 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -305,6 +305,7 @@ int __init netdev_boot_setup(char *str);
 /*
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
+#define GRO_HASH_BUCKETS	8
 struct napi_struct {
 	/* The poll_list must only be managed by the entity which
 	 * changes the state of the NAPI_STATE_SCHED bit.  This means
@@ -322,7 +323,7 @@ struct napi_struct {
 	int			poll_owner;
 #endif
 	struct net_device	*dev;
-	struct list_head	gro_list;
+	struct list_head	gro_hash[GRO_HASH_BUCKETS];
 	struct sk_buff		*skb;
 	struct hrtimer		timer;
 	struct list_head	dev_list;
diff --git a/net/core/dev.c b/net/core/dev.c
index aa61b9344b46..dffed642e686 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4875,15 +4875,12 @@ static int napi_gro_complete(struct sk_buff *skb)
 	return netif_receive_skb_internal(skb);
 }
 
-/* napi->gro_list contains packets ordered by age.
- * youngest packets at the head of it.
- * Complete skbs in reverse order to reduce latencies.
- */
-void napi_gro_flush(struct napi_struct *napi, bool flush_old)
+static void __napi_gro_flush_chain(struct napi_struct *napi, struct list_head *head,
+				   bool flush_old)
 {
 	struct sk_buff *skb, *p;
 
-	list_for_each_entry_safe_reverse(skb, p, &napi->gro_list, list) {
+	list_for_each_entry_safe_reverse(skb, p, head, list) {
 		if (flush_old && NAPI_GRO_CB(skb)->age == jiffies)
 			return;
 		list_del_init(&skb->list);
@@ -4891,15 +4888,33 @@ void napi_gro_flush(struct napi_struct *napi, bool flush_old)
 		napi->gro_count--;
 	}
 }
+
+/* napi->gro_hash contains packets ordered by age.
+ * youngest packets at the head of it.
+ * Complete skbs in reverse order to reduce latencies.
+ */
+void napi_gro_flush(struct napi_struct *napi, bool flush_old)
+{
+	int i;
+
+	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
+		struct list_head *head = &napi->gro_hash[i];
+
+		__napi_gro_flush_chain(napi, head, flush_old);
+	}
+}
 EXPORT_SYMBOL(napi_gro_flush);
 
-static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
+static struct list_head *gro_list_prepare(struct napi_struct *napi,
+					  struct sk_buff *skb)
 {
 	unsigned int maclen = skb->dev->hard_header_len;
 	u32 hash = skb_get_hash_raw(skb);
+	struct list_head *head;
 	struct sk_buff *p;
 
-	list_for_each_entry(p, &napi->gro_list, list) {
+	head = &napi->gro_hash[hash & (GRO_HASH_BUCKETS - 1)];
+	list_for_each_entry(p, head, list) {
 		unsigned long diffs;
 
 		NAPI_GRO_CB(p)->flush = 0;
@@ -4922,6 +4937,8 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 				       maclen);
 		NAPI_GRO_CB(p)->same_flow = !diffs;
 	}
+
+	return head;
 }
 
 static void skb_gro_reset_offset(struct sk_buff *skb)
@@ -4964,11 +4981,45 @@ static void gro_pull_from_frag0(struct sk_buff *skb, int grow)
 	}
 }
 
+static void gro_flush_oldest(struct napi_struct *napi)
+{
+	struct sk_buff *oldest = NULL;
+	unsigned long age = jiffies;
+	int i;
+
+	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
+		struct list_head *head = &napi->gro_hash[i];
+		struct sk_buff *skb;
+
+		if (list_empty(head))
+			continue;
+
+		skb = list_last_entry(head, struct sk_buff, list);
+		if (!oldest || time_before(NAPI_GRO_CB(skb)->age, age)) {
+			oldest = skb;
+			age = NAPI_GRO_CB(skb)->age;
+		}
+	}
+
+	/* We are called with napi->gro_count >= MAX_GRO_SKBS, so this is
+	 * impossible.
+	 */
+	if (WARN_ON_ONCE(!oldest))
+		return;
+
+	/* Do not adjust napi->gro_count, caller is adding a new SKB to
+	 * the chain.
+	 */
+	list_del(&oldest->list);
+	napi_gro_complete(oldest);
+}
+
 static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct list_head *head = &offload_base;
 	struct packet_offload *ptype;
 	__be16 type = skb->protocol;
+	struct list_head *gro_head;
 	struct sk_buff *pp = NULL;
 	enum gro_result ret;
 	int same_flow;
@@ -4977,7 +5028,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	if (netif_elide_gro(skb->dev))
 		goto normal;
 
-	gro_list_prepare(napi, skb);
+	gro_head = gro_list_prepare(napi, skb);
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(ptype, head, list) {
@@ -5011,7 +5062,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 			NAPI_GRO_CB(skb)->csum_valid = 0;
 		}
 
-		pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
+		pp = ptype->callbacks.gro_receive(gro_head, skb);
 		break;
 	}
 	rcu_read_unlock();
@@ -5040,11 +5091,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		goto normal;
 
 	if (unlikely(napi->gro_count >= MAX_GRO_SKBS)) {
-		struct sk_buff *nskb;
-
-		nskb = list_last_entry(&napi->gro_list, struct sk_buff, list);
-		list_del(&nskb->list);
-		napi_gro_complete(nskb);
+		gro_flush_oldest(napi);
 	} else {
 		napi->gro_count++;
 	}
@@ -5052,7 +5099,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	NAPI_GRO_CB(skb)->age = jiffies;
 	NAPI_GRO_CB(skb)->last = skb;
 	skb_shinfo(skb)->gso_size = skb_gro_len(skb);
-	list_add(&skb->list, &napi->gro_list);
+	list_add(&skb->list, gro_head);
 	ret = GRO_HELD;
 
 pull:
@@ -5458,7 +5505,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 				 NAPIF_STATE_IN_BUSY_POLL)))
 		return false;
 
-	if (!list_empty(&n->gro_list)) {
+	if (n->gro_count) {
 		unsigned long timeout = 0;
 
 		if (work_done)
@@ -5667,7 +5714,7 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 	/* Note : we use a relaxed variant of napi_schedule_prep() not setting
 	 * NAPI_STATE_MISSED, since we do not react to a device IRQ.
 	 */
-	if (!list_empty(&napi->gro_list) && !napi_disable_pending(napi) &&
+	if (napi->gro_count && !napi_disable_pending(napi) &&
 	    !test_and_set_bit(NAPI_STATE_SCHED, &napi->state))
 		__napi_schedule_irqoff(napi);
 
@@ -5677,11 +5724,14 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
+	int i;
+
 	INIT_LIST_HEAD(&napi->poll_list);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
 	napi->gro_count = 0;
-	INIT_LIST_HEAD(&napi->gro_list);
+	for (i = 0; i < GRO_HASH_BUCKETS; i++)
+		INIT_LIST_HEAD(&napi->gro_hash[i]);
 	napi->skb = NULL;
 	napi->poll = poll;
 	if (weight > NAPI_POLL_WEIGHT)
@@ -5714,12 +5764,16 @@ void napi_disable(struct napi_struct *n)
 }
 EXPORT_SYMBOL(napi_disable);
 
-static void gro_list_free(struct list_head *head)
+static void flush_gro_hash(struct napi_struct *napi)
 {
-	struct sk_buff *skb, *p;
+	int i;
 
-	list_for_each_entry_safe(skb, p, head, list)
-		kfree_skb(skb);
+	for (i = 0; i < GRO_HASH_BUCKETS; i++) {
+		struct sk_buff *skb, *n;
+
+		list_for_each_entry_safe(skb, n, &napi->gro_hash[i], list)
+			kfree_skb(skb);
+	}
 }
 
 /* Must be called in process context */
@@ -5731,8 +5785,7 @@ void netif_napi_del(struct napi_struct *napi)
 	list_del_init(&napi->dev_list);
 	napi_free_frags(napi);
 
-	gro_list_free(&napi->gro_list);
-	INIT_LIST_HEAD(&napi->gro_list);
+	flush_gro_hash(napi);
 	napi->gro_count = 0;
 }
 EXPORT_SYMBOL(netif_napi_del);
@@ -5775,7 +5828,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 		goto out_unlock;
 	}
 
-	if (!list_empty(&n->gro_list)) {
+	if (n->gro_count) {
 		/* flush too old packets
 		 * If HZ < 1000, flush all packets.
 		 */
-- 
2.17.1

^ permalink raw reply related

* Re: BUG: unable to handle kernel paging request in bpf_int_jit_compile
From: Thomas Gleixner @ 2018-06-24  7:09 UTC (permalink / raw)
  To: syzbot
  Cc: ast, daniel, David Miller, H. Peter Anvin, kuznet, LKML, mingo,
	netdev, syzkaller-bugs, x86, yoshfuji, Peter Zijlstra
In-Reply-To: <000000000000d48c8e056f5b6c67@google.com>

On Sat, 23 Jun 2018, syzbot wrote:
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
> 
> RAX: ffffffffffffffda RBX: 0000000001429914 RCX: 0000000000455a99
> RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
> RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
> R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
> BUG: unable to handle kernel paging request at ffffffffa0008002
> PGD 8e6d067 P4D 8e6d067 PUD 8e6e063 PMD 1b4528067 PTE 1d433d161
> Oops: 0003 [#1] SMP KASAN
> CPU: 1 PID: 4811 Comm: syz-executor0 Not tainted 4.18.0-rc1+ #114
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> RIP: 0010:bpf_jit_binary_lock_ro include/linux/filter.h:703 [inline]
> RIP: 0010:bpf_int_jit_compile+0xc36/0xf30 arch/x86/net/bpf_jit_comp.c:1168

static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
{
        WARN_ON_ONCE(set_memory_ro((unsigned long)hdr, hdr->pages));
}

Qualitee. set_memory_ro() has legitimate reasons to fail, but sure it does
not most of the time.

So instead of implementing proper error handling, this adds complete bogus
wrappers. Hell, set_memory_*() have stub functions which return 0 for the
CONFIG_ARCH_HAS_SET_MEMORY=n case.

The unlock function is even more hilarious:

static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
{
        if (fp->locked) {
                WARN_ON_ONCE(set_memory_rw((unsigned long)fp, fp->pages));
                /* In case set_memory_rw() fails, we want to be the first
                 * to crash here instead of some random place later on.
                 */
                fp->locked = 0;
        }
}

Great approach for a facility, which deals with untrusted user space
stuff. Yeah. I know. The BPF mantra is: "Performance first"

I'm really tempted to make the BPF config switch depend on BROKEN. 

Thanks,

	tglx

^ permalink raw reply

* Re: BUG: unable to handle kernel paging request in bpf_int_jit_compile
From: David Miller @ 2018-06-24  7:14 UTC (permalink / raw)
  To: tglx
  Cc: syzbot+a4eb8c7766952a1ca872, ast, daniel, hpa, kuznet,
	linux-kernel, mingo, netdev, syzkaller-bugs, x86, yoshfuji,
	peterz
In-Reply-To: <alpine.DEB.2.21.1806240849150.8650@nanos.tec.linutronix.de>

From: Thomas Gleixner <tglx@linutronix.de>
Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)

> I'm really tempted to make the BPF config switch depend on BROKEN. 

This really isn't necessary Thomas.

Whoever wrote the code didn't understand that set ro can legitimately
fail.

So let's correct that instead of flaming a feature.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] strparser: Corrected typo in documentation.
From: David Miller @ 2018-06-24  7:18 UTC (permalink / raw)
  To: vakul.garg; +Cc: netdev, linux-kernel, linux-doc, corbet
In-Reply-To: <20180624123721.24287-1-vakul.garg@nxp.com>

From: Vakul Garg <vakul.garg@nxp.com>
Date: Sun, 24 Jun 2018 18:07:21 +0530

> Replaced strp_pause() with strp_unpause() to correct a seemingly copy
> paste documentation mistake.
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>

As a bug fix this should target 'net'.

This is even more true since this fixes documentation.

^ permalink raw reply

* BUG: unable to handle kernel paging request in bpf_prog_select_runtime
From: syzbot @ 2018-06-24  7:32 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    5e2204832b20 Merge tag 'powerpc-4.18-2' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11e79814400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=befbcd7305e41bb0
dashboard link: https://syzkaller.appspot.com/bug?extid=d866d1925855328eac3b
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com

BUG: unable to handle kernel paging request at ffffc90001952002
PGD 1da947067 P4D 1da947067 PUD 1da948067 PMD 1d3a61067 PTE 800000016ba62161
Oops: 0003 [#1] SMP KASAN
CPU: 0 PID: 17593 Comm: syz-executor3 Not tainted 4.18.0-rc1+ #114
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:bpf_prog_lock_ro include/linux/filter.h:681 [inline]
RIP: 0010:bpf_prog_select_runtime+0xf5/0xa60 kernel/bpf/core.c:1519
Code: 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 0f b6 14 02 48 89  
c8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 de 08 00 00 <41> 80 66 02 fb  
e8 f1 67 f3 ff 49 8d 7e 20 48 b8 00 00 00 00 00 fc
RSP: 0018:ffff8801b4d6fac8 EFLAGS: 00010246
RAX: 0000000000000003 RBX: 00000000fffffff4 RCX: ffffc90001952002
RDX: 0000000000000000 RSI: ffffffff8188a717 RDI: 0000000000000005
RBP: ffff8801b4d6fb30 R08: ffff8801d778e400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90001952030 R14: ffffc90001952000 R15: ffff8801b4d6fb98
FS:  00007f59edc0c700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90001952002 CR3: 00000001d20b4000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  bpf_prog_load+0x1194/0x1c60 kernel/bpf/syscall.c:1356
  __do_sys_bpf kernel/bpf/syscall.c:2360 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2322 [inline]
  __x64_sys_bpf+0x36c/0x510 kernel/bpf/syscall.c:2322
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455a99
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f59edc0bc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 00007f59edc0c6d4 RCX: 0000000000455a99
RDX: 0000000000000048 RSI: 0000000020000240 RDI: 0000000000000005
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000015
R13: 00000000004bb7d5 R14: 00000000004c8508 R15: 0000000000000023
Modules linked in:
Dumping ftrace buffer:
    (ftrace buffer empty)
CR2: ffffc90001952002
---[ end trace f4bd75b0437a2cce ]---
RIP: 0010:bpf_prog_lock_ro include/linux/filter.h:681 [inline]
RIP: 0010:bpf_prog_select_runtime+0xf5/0xa60 kernel/bpf/core.c:1519
Code: 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 0f b6 14 02 48 89  
c8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 de 08 00 00 <41> 80 66 02 fb  
e8 f1 67 f3 ff 49 8d 7e 20 48 b8 00 00 00 00 00 fc
RSP: 0018:ffff8801b4d6fac8 EFLAGS: 00010246
RAX: 0000000000000003 RBX: 00000000fffffff4 RCX: ffffc90001952002
RDX: 0000000000000000 RSI: ffffffff8188a717 RDI: 0000000000000005
RBP: ffff8801b4d6fb30 R08: ffff8801d778e400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90001952030 R14: ffffc90001952000 R15: ffff8801b4d6fb98
FS:  00007f59edc0c700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90001952002 CR3: 00000001d20b4000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.

^ permalink raw reply

* Re: [PATCH net] strparser: Corrected typo in documentation.
From: David Miller @ 2018-06-24  7:40 UTC (permalink / raw)
  To: vakul.garg; +Cc: netdev, linux-kernel, linux-doc, corbet
In-Reply-To: <20180624124401.24584-1-vakul.garg@nxp.com>

From: Vakul Garg <vakul.garg@nxp.com>
Date: Sun, 24 Jun 2018 18:14:01 +0530

> Replaced strp_pause() with strp_unpause() to correct a seemingly copy
> paste documentation mistake.
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
> ---
> Resending for 'net' as advised.

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: phy: fixed-phy: Make the error path simpler
From: David Miller @ 2018-06-24  7:42 UTC (permalink / raw)
  To: festevam; +Cc: andrew, f.fainelli, netdev, fabio.estevam
In-Reply-To: <1529800102-18287-1-git-send-email-festevam@gmail.com>

From: Fabio Estevam <festevam@gmail.com>
Date: Sat, 23 Jun 2018 21:28:22 -0300

> From: Fabio Estevam <fabio.estevam@nxp.com>
> 
> When platform_device_register_simple() fails we can return
> the error immediately instead of jumping to the 'err_pdev'
> label.
> 
> This makes the error path a bit simpler.
> 
> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>

Applied, thank you.

^ permalink raw reply

* Re: [Patch net-next] net_sched: remove unused htb drop_list
From: David Miller @ 2018-06-24  7:43 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, fw
In-Reply-To: <20180623204639.29933-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Sat, 23 Jun 2018 13:46:39 -0700

> After commit a09ceb0e0814 ("sched: remove qdisc->drop"),
> it is no longer used.
> 
> Cc: Florian Westphal <fw@strlen.de>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks Cong.

^ permalink raw reply

* Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: David Miller @ 2018-06-24  7:48 UTC (permalink / raw)
  To: viro; +Cc: pmoore, netdev, selinux, linux-security-module
In-Reply-To: <20180623222106.GE30522@ZenIV.linux.org.uk>

From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Sat, 23 Jun 2018 23:21:07 +0100

> BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
> the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
> simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
> ipv6_renew_options_kern()...

I agree that this makes things a lot simpler.

One thing that drives me crazy though is this inherit stuff:

> +	ipv6_renew_option(newtype == IPV6_HOPOPTS ? newopt :
> +				opt ? opt->hopopt : NULL,

Why don't we pass the type into ipv6_renew_option() and have it
do this pointer dance instead?

That's going to definitely be easier to read.

I don't know enough about this code to give feedback about the
option length handling wrt. copies, sorry.

^ permalink raw reply

* [PATCH rdma-next 08/12] overflow.h: Add arithmetic shift helper
From: Leon Romanovsky @ 2018-06-24  8:23 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe, Kees Cook, Rasmus Villemoes
  Cc: Leon Romanovsky, RDMA mailing list, Hadar Hen Zion, Matan Barak,
	Michael J Ruhl, Noa Osherovich, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev, linux-kernel
In-Reply-To: <20180624082353.16138-1-leon@kernel.org>

From: Leon Romanovsky <leonro@mellanox.com>

Add shift_overflow() helper to help driver authors to ensure that
shift operand doesn't cause to overflow, which is very common pattern
for RDMA drivers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 include/linux/overflow.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/overflow.h b/include/linux/overflow.h
index 8712ff70995f..2a3395248e94 100644
--- a/include/linux/overflow.h
+++ b/include/linux/overflow.h
@@ -202,6 +202,29 @@

 #endif /* COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW */

+/**
+ * shift_overflow() - Peform shift operation with overflow check
+ * @a: value to be shifted
+ * @b: shift operand
+ *
+ * Checks if a << b will overflow
+ *
+ * Returns: result of shift for no overflow or SIZE_MAX for overflow
+ */
+static inline __must_check size_t shift_overflow(size_t a, size_t b)
+{
+	size_t c, res;
+
+	if (b >= sizeof(size_t) * BITS_PER_BYTE)
+		return SIZE_MAX;
+
+	c = (size_t)1 << b;
+	if (check_mul_overflow(a, c, &res))
+		return SIZE_MAX;
+
+	return res;
+}
+
 /**
  * array_size() - Calculate size of 2-dimensional array.
  *
--
2.14.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox