* [PATCH net] net: vrf: Add missing Rx counters
From: David Ahern @ 2017-01-03 17:37 UTC (permalink / raw)
To: netdev; +Cc: David Ahern
The move from rx-handler to L3 receive handler inadvertently dropped the
rx counters. Restore them.
Fixes: 74b20582ac38 ("net: l3mdev: Add hook in ip and ipv6")
Reported-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
drivers/net/vrf.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7532646c3b7b..23dfb0eac098 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -967,6 +967,7 @@ static struct sk_buff *vrf_ip6_rcv(struct net_device *vrf_dev,
*/
need_strict = rt6_need_strict(&ipv6_hdr(skb)->daddr);
if (!ipv6_ndisc_frame(skb) && !need_strict) {
+ vrf_rx_stats(vrf_dev, skb->len);
skb->dev = vrf_dev;
skb->skb_iif = vrf_dev->ifindex;
@@ -1011,6 +1012,8 @@ static struct sk_buff *vrf_ip_rcv(struct net_device *vrf_dev,
goto out;
}
+ vrf_rx_stats(vrf_dev, skb->len);
+
skb_push(skb, skb->mac_len);
dev_queue_xmit_nit(skb, vrf_dev);
skb_pull(skb, skb->mac_len);
--
2.1.4
^ permalink raw reply related
* [PATCH net-next v2] net/sched: cls_matchall: Fix error path
From: Yotam Gigi @ 2017-01-03 17:20 UTC (permalink / raw)
To: jhs, davem, eladr, jiri, netdev; +Cc: Yotam Gigi
Fix several error paths in matchall:
- Release reference to actions in case the hardware fails offloading
(relevant to skip_sw only)
- Fix error path in case tcf_exts initialization/validation fail
Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
---
v1->v2:
- Add check for tcf_exts_init return code and fix error path for it too
---
net/sched/cls_matchall.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index f935429..fcecf5a 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -141,10 +141,12 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
struct tcf_exts e;
int err;
- tcf_exts_init(&e, TCA_MATCHALL_ACT, 0);
+ err = tcf_exts_init(&e, TCA_MATCHALL_ACT, 0);
+ if (err)
+ return err;
err = tcf_exts_validate(net, tp, tb, est, &e, ovr);
if (err < 0)
- return err;
+ goto errout;
if (tb[TCA_MATCHALL_CLASSID]) {
f->res.classid = nla_get_u32(tb[TCA_MATCHALL_CLASSID]);
@@ -154,6 +156,9 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
tcf_exts_change(tp, &f->exts, &e);
return 0;
+errout:
+ tcf_exts_destroy(&e);
+ return err;
}
static int mall_change(struct net *net, struct sk_buff *in_skb,
@@ -193,7 +198,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
if (!f)
return -ENOBUFS;
- tcf_exts_init(&f->exts, TCA_MATCHALL_ACT, 0);
+ err = tcf_exts_init(&f->exts, TCA_MATCHALL_ACT, 0);
+ if (err)
+ goto err_exts_init;
if (!handle)
handle = 1;
@@ -202,13 +209,13 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
err = mall_set_parms(net, tp, f, base, tb, tca[TCA_RATE], ovr);
if (err)
- goto errout;
+ goto err_set_parms;
if (tc_should_offload(dev, tp, flags)) {
err = mall_replace_hw_filter(tp, f, (unsigned long) f);
if (err) {
if (tc_skip_sw(flags))
- goto errout;
+ goto err_replace_hw_filter;
else
err = 0;
}
@@ -219,7 +226,10 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
return 0;
-errout:
+err_replace_hw_filter:
+err_set_parms:
+ tcf_exts_destroy(&f->exts);
+err_exts_init:
kfree(f);
return err;
}
--
2.4.11
^ permalink raw reply related
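For readers following the error-path fix above, the unwind order can be
sketched in userspace C. This is only an illustration under stated
assumptions: struct exts, struct filter, exts_init(), exts_destroy(),
filter_create() and the fail_at parameter are hypothetical stand-ins, not
the tcf_exts API. The labels mirror the ones the patch introduces so that
each failure point releases exactly what was acquired before it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the kernel's tcf_exts API. */
struct exts { int *actions; };
struct filter { struct exts exts; };

static int live_allocs;	/* outstanding allocations, tracked for the demo */

static int exts_init(struct exts *e)
{
	e->actions = malloc(sizeof(*e->actions));
	if (!e->actions)
		return -ENOMEM;
	live_allocs++;
	return 0;
}

static void exts_destroy(struct exts *e)
{
	free(e->actions);
	e->actions = NULL;
	live_allocs--;
}

/* fail_at: 0 = succeed, 1 = fail while setting parms, 2 = fail in offload */
static int filter_create(struct filter **out, int fail_at)
{
	struct filter *f;
	int err;

	f = calloc(1, sizeof(*f));
	if (!f)
		return -ENOBUFS;
	live_allocs++;

	err = exts_init(&f->exts);
	if (err)
		goto err_exts_init;

	if (fail_at == 1) {
		err = -EINVAL;
		goto err_set_parms;
	}
	if (fail_at == 2) {
		err = -EOPNOTSUPP;
		goto err_replace_hw_filter;
	}

	*out = f;
	return 0;

	/* Unwind in reverse order of acquisition. */
err_replace_hw_filter:
err_set_parms:
	exts_destroy(&f->exts);
err_exts_init:
	free(f);
	live_allocs--;
	return err;
}

static void filter_destroy(struct filter *f)
{
	assert(f != NULL);
	exts_destroy(&f->exts);
	free(f);
	live_allocs--;
}
```

Every failing call leaves live_allocs at zero, which is the property the
patch restores for the exts reference.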
* Re: [PATCH net-next rfc 0/6] convert tc_verd to integer bitfields
From: Willem de Bruijn @ 2017-01-03 17:05 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Network Development, David Miller, Florian Westphal, dborkman,
Alexei Starovoitov, Willem de Bruijn, Roman Mashak,
Hannes Frederic Sowa, Shmulik Ladkani
In-Reply-To: <c32d7b7c-3dd3-2d75-5f9a-e356f1fc7732@mojatatu.com>
> No objections to new year resolution of slimming the skb.
> But: i am still concerned about the recursion that getting rid of
> some of these bits could embolden, i.e. my suggestion was in fact to
> restore some of those bits taken away by Florian after the ingress
> redirect patches from Shmulik.
>
> The possibilities are: egress->egress, egress->ingress,
> ingress->egress, ingress->ingress. The suggestion was
> xmit_recursion with some skb magic would suffice.
> Hannes promised around last netdevconf that he has a scheme to solve
> it without using any extra skb state.
Are you referring to
"
Personally, I would only try to fix and warn against the easy to detect
cases. It is easy enough to just create a loop with your local attached
L2 which brings your box into an endless loop processing the same packet
again and again. Because it is out of control of the kernel you cannot
do anything at all.
I would just care that we sometimes reschedule and don't do everything
in one stack so we don't corrupt the machine and an admin has still a
chance to solve the problem.
"
https://www.spinics.net/lists/netdev/msg397498.html
That can be solved by extending act_mirred in the same way as
Daniel did for bpf_redirect in a70b506efe89 ("bpf: enforce recursion
limit on redirects").
^ permalink raw reply
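The idea of bounding redirect recursion with a depth counter, as suggested
above for act_mirred, can be sketched as follows. This is a minimal
userspace sketch, not the kernel implementation: REDIRECT_LIMIT, struct
pkt, redirect() and deliver() are hypothetical names, the limit value is
arbitrary, the -ELOOP return code is this sketch's own choice, and a plain
static counter stands in for what would be per-CPU state in the kernel.

```c
#include <assert.h>
#include <errno.h>

#define REDIRECT_LIMIT 4	/* illustrative value only */

/* Stand-in for a per-CPU recursion counter. */
static int redirect_depth;

/* Hypothetical packet: asks to be redirected hops_wanted times. */
struct pkt {
	int hops_wanted;
	int hops_done;
};

static int deliver(struct pkt *p);

/* Refuse to nest deeper than REDIRECT_LIMIT instead of growing the stack. */
static int redirect(struct pkt *p)
{
	int ret;

	if (redirect_depth >= REDIRECT_LIMIT)
		return -ELOOP;

	redirect_depth++;
	ret = deliver(p);
	redirect_depth--;
	return ret;
}

static int deliver(struct pkt *p)
{
	if (p->hops_done == p->hops_wanted)
		return 0;
	p->hops_done++;
	return redirect(p);
}
```

A well-behaved packet passes through untouched; a looping one is dropped
once the counter reaches the limit, and the counter is back at zero
afterwards in both cases.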
* [PATCH net] libcxgb: fix error check for ip6_route_output()
From: Varun Prakash @ 2017-01-03 15:55 UTC (permalink / raw)
To: davem; +Cc: netdev, swise, indranil, varun
ip6_route_output() never returns NULL so
check dst->error instead of !dst.
Signed-off-by: Varun Prakash <varun@chelsio.com>
---
drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
index 0f0de5b..d04a6c1 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_cm.c
@@ -133,17 +133,15 @@ cxgb_find_route6(struct cxgb4_lld_info *lldi,
if (ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LINKLOCAL)
fl6.flowi6_oif = sin6_scope_id;
dst = ip6_route_output(&init_net, NULL, &fl6);
- if (!dst)
- goto out;
- if (!cxgb_our_interface(lldi, get_real_dev,
- ip6_dst_idev(dst)->dev) &&
- !(ip6_dst_idev(dst)->dev->flags & IFF_LOOPBACK)) {
+ if (dst->error ||
+ (!cxgb_our_interface(lldi, get_real_dev,
+ ip6_dst_idev(dst)->dev) &&
+ !(ip6_dst_idev(dst)->dev->flags & IFF_LOOPBACK))) {
dst_release(dst);
- dst = NULL;
+ return NULL;
}
}
-out:
return dst;
}
EXPORT_SYMBOL(cxgb_find_route6);
--
2.0.2
^ permalink raw reply related
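The pattern behind the fix above is a lookup that always returns an object
and reports failure through an error field, so a NULL test can never fire.
A minimal userspace sketch, assuming hypothetical names (struct route,
route_lookup(), route_release(), find_route()) that only imitate the shape
of the dst API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical route object; failure is carried in ->error, not as NULL. */
struct route {
	int error;
	int refcnt;
	int ifindex;
};

static struct route ok_route = { 0, 0, 2 };
static struct route unreachable = { -ENETUNREACH, 0, 0 };

/* Never returns NULL; always hands back a referenced object. */
static struct route *route_lookup(int daddr)
{
	struct route *rt = (daddr == 1) ? &ok_route : &unreachable;

	rt->refcnt++;
	return rt;
}

static void route_release(struct route *rt)
{
	rt->refcnt--;
}

/* Caller-side check: test ->error, and drop the reference on any reject. */
static struct route *find_route(int daddr, int our_ifindex)
{
	struct route *rt = route_lookup(daddr);

	if (rt->error || rt->ifindex != our_ifindex) {
		route_release(rt);
		return NULL;
	}
	return rt;
}
```

Checking !rt here would let the error object through and leak its
reference, which is the bug class the patch addresses.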
* Re: [RFC PATCH] virtio_net: XDP support for adjust_head
From: John Fastabend @ 2017-01-03 16:57 UTC (permalink / raw)
To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel
In-Reply-To: <586BD734.7020105@gmail.com>
On 17-01-03 08:54 AM, John Fastabend wrote:
> On 17-01-02 10:01 PM, Jason Wang wrote:
>>
>>
>> On 2017年01月03日 03:44, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when a XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom push
>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>> unwind vs another queue setup call.
>>>
>>> : There is a problem with this patch as is. When xdp prog is loaded
>>> the old buffers without the 256B headers need to be flushed so that
>>> the bpf prog has the necessary headroom. This patch does this by
>>> calling the virtqueue_detach_unused_buf() and followed by the
>>> virtnet_set_queues() call to reinitialize the buffers. However I
>>> don't believe this is safe per comment in virtio_ring this API
>>> is not valid on an active queue and the only thing we have done
>>> here is napi_disable/napi_enable wrappers which doesn't do anything
>>> to the emulation layer.
>>>
>>> So the RFC is really to find the best solution to this problem.
>>> A couple things come to mind, (a) always allocate the necessary
>>> headroom but this is a bit of a waste (b) add some bit somewhere
>>> to check if the buffer has headroom but this would mean XDP programs
>>> would be broke for a cycle through the ring, (c) figure out how
>>> to deactivate a queue, free the buffers and finally reallocate.
>>> I think (c) is the best choice for now but I'm not seeing the
>>> API to do this so virtio/qemu experts anyone know off-hand
>>> how to make this work? I started looking into the PCI callbacks
>>> reset() and virtio_device_ready() or possibly hitting the right
>>> set of bits with vp_set_status() but my first attempt just hung
>>> the device.
>>
>> Hi John:
>>
>> AFAIK, disabling a specific queue was supported only by virtio 1.0 through
>> queue_enable field in pci common cfg. But unfortunately, qemu does not emulate
>> this at all and legacy device does not even support this. So the safe way is
>> probably reset the device and redo the initialization here.
>>
>
> OK, I'll draft up a fix with a full reset unless Michael has some idea in
> the meantime.
>
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>> drivers/net/virtio_net.c | 106 +++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 80 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 5deeda6..fcc5bd7 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -159,6 +159,9 @@ struct virtnet_info {
>>> /* Ethtool settings */
>>> u8 duplex;
>>> u32 speed;
>>> +
>>> + /* Headroom allocated in RX Queue */
>>> + unsigned int headroom;
>>> };
>>> struct padded_vnet_hdr {
>>> @@ -355,6 +358,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>>> }
>>> if (vi->mergeable_rx_bufs) {
>>> + xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
>>> /* Zero header and leave csum up to XDP layers */
>>> hdr = xdp->data;
>>> memset(hdr, 0, vi->hdr_len);
>>> @@ -371,7 +375,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>>> num_sg = 2;
>>> sg_init_table(sq->sg, 2);
>>> sg_set_buf(sq->sg, hdr, vi->hdr_len);
>>> - skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
>>> + skb_to_sgvec(skb, sq->sg + 1, vi->headroom, xdp->data_end - xdp->data);
>>
>> vi->headroom look suspicious, should it be xdp->data - xdp->data_hard_start?
>>
>
> Yep found this as well while testing small packet receive.
>
>>> }
>>> err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>>> data, GFP_ATOMIC);
>>> @@ -393,34 +397,39 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
>>> struct bpf_prog *xdp_prog,
>>> void *data, int len)
>>> {
>>> - int hdr_padded_len;
>>> struct xdp_buff xdp;
>>> - void *buf;
>>> unsigned int qp;
>>> u32 act;
>>> +
>>> if (vi->mergeable_rx_bufs) {
>>> - hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>>> - xdp.data = data + hdr_padded_len;
>>> + int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>>> +
>>> + /* Allow consuming headroom but reserve enough space to push
>>> + * the descriptor on if we get an XDP_TX return code.
>>> + */
>>> + xdp.data_hard_start = data - vi->headroom + desc_room;
>>> + xdp.data = data + desc_room;
>>> xdp.data_end = xdp.data + (len - vi->hdr_len);
>>> - buf = data;
>>> } else { /* small buffers */
>>> struct sk_buff *skb = data;
>>> - xdp.data = skb->data;
>>> + xdp.data_hard_start = skb->data;
>>> + xdp.data = skb->data + vi->headroom;
>>> xdp.data_end = xdp.data + len;
>>> - buf = skb->data;
>>> }
>>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>> switch (act) {
>>> case XDP_PASS:
>>> + if (!vi->mergeable_rx_bufs)
>>> + __skb_pull((struct sk_buff *) data,
>>> + xdp.data - xdp.data_hard_start);
>>
>> Instead of doing things here and virtnet_xdp_xmit(). How about always making
>> skb->data point to the buffer head like:
>>
>> 1) reserve headroom in add_recvbuf_small()
>> 2) skb_push(xdp->data - xdp_data_hard_start, skb) if we detect xdp->data was
>> modified after bpf_prog_run_xdp()
>>
>> Then there's no special code in either XDP_PASS or XDP_TX?
>>
>
> Sure. Works for me and avoids if/else code.
>
>>> return XDP_PASS;
>>> case XDP_TX:
>>> qp = vi->curr_queue_pairs -
>>> vi->xdp_queue_pairs +
>>> smp_processor_id();
>>
>> [...]
>>
>>> +#define VIRTIO_XDP_HEADROOM 256
>>> +
>>> static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>>> {
>>> unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>>> struct virtnet_info *vi = netdev_priv(dev);
>>> struct bpf_prog *old_prog;
>>> u16 xdp_qp = 0, curr_qp;
>>> + unsigned int old_hr;
>>> int i, err;
>>> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>>> @@ -1736,19 +1751,58 @@ static int virtnet_xdp_set(struct net_device *dev,
>>> struct bpf_prog *prog)
>>> return -ENOMEM;
>>> }
>>> + old_hr = vi->headroom;
>>> + if (prog) {
>>> + prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>>> + if (IS_ERR(prog))
>>> + return PTR_ERR(prog);
>>> + vi->headroom = VIRTIO_XDP_HEADROOM;
>>> + } else {
>>> + vi->headroom = 0;
>>> + }
>>> +
>>> + /* Changing the headroom in buffers is a disruptive operation because
>>> + * existing buffers must be flushed and reallocated. This will happen
>>> + * when a xdp program is initially added or xdp is disabled by removing
>>> + * the xdp program.
>>> + */
>>
>> We probably need to reset the device here, but maybe Michael has more ideas. And if
>> we do this, another interesting thing to do is to disable EWMA and always use a
>> single page for each packet, this could almost eliminate linearizing.
>
> Well with normal MTU 1500 size we should not hit the linearizing case right? The
> question is should we cap the MTU at GOOD_PACKET_LEN vs the current cap of
> (PAGE_SIZE - overhead).
Sorry responding to my own post with a bit more detail. I don't really like
going to a page for each packet because we end up with double the pages in use
for the "normal" 1500 MTU case. We could make the xdp allocation scheme smarter
and allocate a page per packet when MTU is greater than 2k instead of using the
EWMA but I would push those types of things at net-next and live with the
linearizing behavior for now or capping the MTU.
>
>>
>> Thanks
>
^ permalink raw reply
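The headroom arithmetic discussed in this thread can be sketched in plain
C. This is a simplified model, not the driver code: struct xdp_buf,
xdp_buf_init(), adjust_head(), tx_offset() and tx_len() are hypothetical
helpers, and the bounds rule in adjust_head() is this sketch's own
assumption. It shows why the transmit offset should be derived from
data - data_hard_start rather than from a fixed headroom once a program
has moved the data pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified buffer view with the three pointers used in the patch. */
struct xdp_buf {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

static void xdp_buf_init(struct xdp_buf *x, unsigned char *buf,
			 size_t headroom, size_t len)
{
	x->data_hard_start = buf;
	x->data = buf + headroom;
	x->data_end = x->data + len;
}

/* Negative delta grows the packet into the headroom; positive trims it. */
static int adjust_head(struct xdp_buf *x, long delta)
{
	long headroom = x->data - x->data_hard_start;
	long len = x->data_end - x->data;

	if (delta < -headroom || delta >= len)
		return -EINVAL;

	x->data += delta;
	return 0;
}

/* Offset and length to hand to the transmit path after the program ran. */
static size_t tx_offset(const struct xdp_buf *x)
{
	return (size_t)(x->data - x->data_hard_start);
}

static size_t tx_len(const struct xdp_buf *x)
{
	return (size_t)(x->data_end - x->data);
}
```

With 256 bytes of headroom and a 64-byte packet, pushing a 14-byte header
moves the offset to 242; a fixed offset of 256 would skip the new header.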
* Re: [RFC PATCH] virtio_net: XDP support for adjust_head
From: John Fastabend @ 2017-01-03 16:54 UTC (permalink / raw)
To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel
In-Reply-To: <73715f7a-eeeb-679f-a7b8-7b1fefe1757e@redhat.com>
On 17-01-02 10:01 PM, Jason Wang wrote:
>
>
> On 2017年01月03日 03:44, John Fastabend wrote:
>> Add support for XDP adjust head by allocating a 256B header region
>> that XDP programs can grow into. This is only enabled when a XDP
>> program is loaded.
>>
>> In order to ensure that we do not have to unwind queue headroom push
>> queue setup below bpf_prog_add. It reads better to do a prog ref
>> unwind vs another queue setup call.
>>
>> : There is a problem with this patch as is. When xdp prog is loaded
>> the old buffers without the 256B headers need to be flushed so that
>> the bpf prog has the necessary headroom. This patch does this by
>> calling the virtqueue_detach_unused_buf() and followed by the
>> virtnet_set_queues() call to reinitialize the buffers. However I
>> don't believe this is safe per comment in virtio_ring this API
>> is not valid on an active queue and the only thing we have done
>> here is napi_disable/napi_enable wrappers which doesn't do anything
>> to the emulation layer.
>>
>> So the RFC is really to find the best solution to this problem.
>> A couple things come to mind, (a) always allocate the necessary
>> headroom but this is a bit of a waste (b) add some bit somewhere
>> to check if the buffer has headroom but this would mean XDP programs
>> would be broke for a cycle through the ring, (c) figure out how
>> to deactivate a queue, free the buffers and finally reallocate.
>> I think (c) is the best choice for now but I'm not seeing the
>> API to do this so virtio/qemu experts anyone know off-hand
>> how to make this work? I started looking into the PCI callbacks
>> reset() and virtio_device_ready() or possibly hitting the right
>> set of bits with vp_set_status() but my first attempt just hung
>> the device.
>
> Hi John:
>
> AFAIK, disabling a specific queue was supported only by virtio 1.0 through
> queue_enable field in pci common cfg. But unfortunately, qemu does not emulate
> this at all and legacy device does not even support this. So the safe way is
> probably reset the device and redo the initialization here.
>
OK, I'll draft up a fix with a full reset unless Michael has some idea in
the meantime.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>> drivers/net/virtio_net.c | 106 +++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 80 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 5deeda6..fcc5bd7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -159,6 +159,9 @@ struct virtnet_info {
>> /* Ethtool settings */
>> u8 duplex;
>> u32 speed;
>> +
>> + /* Headroom allocated in RX Queue */
>> + unsigned int headroom;
>> };
>> struct padded_vnet_hdr {
>> @@ -355,6 +358,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>> }
>> if (vi->mergeable_rx_bufs) {
>> + xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
>> /* Zero header and leave csum up to XDP layers */
>> hdr = xdp->data;
>> memset(hdr, 0, vi->hdr_len);
>> @@ -371,7 +375,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>> num_sg = 2;
>> sg_init_table(sq->sg, 2);
>> sg_set_buf(sq->sg, hdr, vi->hdr_len);
>> - skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
>> + skb_to_sgvec(skb, sq->sg + 1, vi->headroom, xdp->data_end - xdp->data);
>
> vi->headroom look suspicious, should it be xdp->data - xdp->data_hard_start?
>
Yep found this as well while testing small packet receive.
>> }
>> err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>> data, GFP_ATOMIC);
>> @@ -393,34 +397,39 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
>> struct bpf_prog *xdp_prog,
>> void *data, int len)
>> {
>> - int hdr_padded_len;
>> struct xdp_buff xdp;
>> - void *buf;
>> unsigned int qp;
>> u32 act;
>> +
>> if (vi->mergeable_rx_bufs) {
>> - hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>> - xdp.data = data + hdr_padded_len;
>> + int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>> +
>> + /* Allow consuming headroom but reserve enough space to push
>> + * the descriptor on if we get an XDP_TX return code.
>> + */
>> + xdp.data_hard_start = data - vi->headroom + desc_room;
>> + xdp.data = data + desc_room;
>> xdp.data_end = xdp.data + (len - vi->hdr_len);
>> - buf = data;
>> } else { /* small buffers */
>> struct sk_buff *skb = data;
>> - xdp.data = skb->data;
>> + xdp.data_hard_start = skb->data;
>> + xdp.data = skb->data + vi->headroom;
>> xdp.data_end = xdp.data + len;
>> - buf = skb->data;
>> }
>> act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> switch (act) {
>> case XDP_PASS:
>> + if (!vi->mergeable_rx_bufs)
>> + __skb_pull((struct sk_buff *) data,
>> + xdp.data - xdp.data_hard_start);
>
> Instead of doing things here and virtnet_xdp_xmit(). How about always making
> skb->data point to the buffer head like:
>
> 1) reserve headroom in add_recvbuf_small()
> 2) skb_push(xdp->data - xdp_data_hard_start, skb) if we detect xdp->data was
> modified after bpf_prog_run_xdp()
>
> Then there's no special code in either XDP_PASS or XDP_TX?
>
Sure. Works for me and avoids if/else code.
>> return XDP_PASS;
>> case XDP_TX:
>> qp = vi->curr_queue_pairs -
>> vi->xdp_queue_pairs +
>> smp_processor_id();
>
> [...]
>
>> +#define VIRTIO_XDP_HEADROOM 256
>> +
>> static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>> {
>> unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>> struct virtnet_info *vi = netdev_priv(dev);
>> struct bpf_prog *old_prog;
>> u16 xdp_qp = 0, curr_qp;
>> + unsigned int old_hr;
>> int i, err;
>> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>> @@ -1736,19 +1751,58 @@ static int virtnet_xdp_set(struct net_device *dev,
>> struct bpf_prog *prog)
>> return -ENOMEM;
>> }
>> + old_hr = vi->headroom;
>> + if (prog) {
>> + prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>> + if (IS_ERR(prog))
>> + return PTR_ERR(prog);
>> + vi->headroom = VIRTIO_XDP_HEADROOM;
>> + } else {
>> + vi->headroom = 0;
>> + }
>> +
>> + /* Changing the headroom in buffers is a disruptive operation because
>> + * existing buffers must be flushed and reallocated. This will happen
>> + * when a xdp program is initially added or xdp is disabled by removing
>> + * the xdp program.
>> + */
>
> We probably need to reset the device here, but maybe Michael has more ideas. And if
> we do this, another interesting thing to do is to disable EWMA and always use a
> single page for each packet, this could almost eliminate linearizing.
Well with normal MTU 1500 size we should not hit the linearizing case right? The
question is should we cap the MTU at GOOD_PACKET_LEN vs the current cap of
(PAGE_SIZE - overhead).
>
> Thanks
^ permalink raw reply
* Re: [net PATCH] net: virtio: cap mtu when XDP programs are running
From: John Fastabend @ 2017-01-03 16:48 UTC (permalink / raw)
To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel
In-Reply-To: <402027b4-58c7-aa1b-5079-74e31448f544@redhat.com>
On 17-01-02 10:14 PM, Jason Wang wrote:
>
>
> On 2017年01月03日 06:30, John Fastabend wrote:
>> XDP programs can not consume multiple pages so we cap the MTU to
>> avoid this case. Virtio-net however only checks the MTU at XDP
>> program load and does not block MTU changes after the program
>> has loaded.
>>
>> This patch sets/clears the max_mtu value at XDP load/unload time.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>> drivers/net/virtio_net.c | 9 ++++++---
>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 5deeda6..783e842 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1699,6 +1699,9 @@ static void virtnet_init_settings(struct net_device *dev)
>> .set_settings = virtnet_set_settings,
>> };
>> +#define MIN_MTU ETH_MIN_MTU
>> +#define MAX_MTU ETH_MAX_MTU
>> +
>> static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>> {
>> unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>> @@ -1748,6 +1751,9 @@ static int virtnet_xdp_set(struct net_device *dev,
>> struct bpf_prog *prog)
>> virtnet_set_queues(vi, curr_qp);
>> return PTR_ERR(prog);
>> }
>> + dev->max_mtu = max_sz;
>> + } else {
>> + dev->max_mtu = ETH_MAX_MTU;
>
> Or use ETH_DATA_LEN here consider we only allocate a size of GOOD_PACKET_LEN for
> each small buffer?
>
> Thanks
OK so this logic is a bit too simple. When it resets the max_mtu I guess it
needs to read the mtu via
virtio_cread16(vdev, ...)
or we may break the negotiated mtu.
As for capping it at GOOD_PACKET_LEN this has the nice benefit of avoiding any
underestimates in EWMA predictions because it appears min estimates are capped
at GOOD_PACKET_LEN via get_mergeable_buf_len().
Thanks,
John
^ permalink raw reply
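The point raised in the reply above, that restoring max_mtu on unload
should respect an MTU advertised by the device instead of jumping to
ETH_MAX_MTU, can be sketched as a small selection function. This is a
sketch under assumptions: pick_max_mtu() and its parameters are
hypothetical, and the caller is assumed to have already read any device
MTU from config space. The two constants are redefined locally with the
values from linux/if_ether.h so the example compiles in userspace.

```c
#include <assert.h>

/* Local copies of the if_ether.h values, for a userspace build. */
#define ETH_DATA_LEN	1500U
#define ETH_MAX_MTU	0xFFFFU

/*
 * Hypothetical helper: choose the MTU ceiling.
 *   xdp_loaded  - nonzero while an XDP program is attached
 *   xdp_cap     - largest MTU the XDP receive path can handle
 *   has_dev_mtu - nonzero if the device advertised an MTU
 *   dev_mtu     - that advertised MTU, ignored otherwise
 */
static unsigned int pick_max_mtu(int xdp_loaded, unsigned int xdp_cap,
				 int has_dev_mtu, unsigned int dev_mtu)
{
	unsigned int max = has_dev_mtu ? dev_mtu : ETH_MAX_MTU;

	/* XDP can only lower the ceiling, never raise it. */
	if (xdp_loaded && xdp_cap < max)
		max = xdp_cap;

	return max;
}
```

Whether xdp_cap should be PAGE_SIZE minus overhead or something closer to
ETH_DATA_LEN is exactly the open question in the thread; the sketch takes
it as an input rather than deciding.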
* Re: [PATCH net 9/9] virtio-net: XDP support for small buffers
From: John Fastabend @ 2017-01-03 16:40 UTC (permalink / raw)
To: Jason Wang, mst, virtualization, netdev, linux-kernel; +Cc: john.r.fastabend
In-Reply-To: <8c896c40-fd25-4b92-fe80-5be18c13dd48@redhat.com>
On 17-01-02 10:16 PM, Jason Wang wrote:
>
>
> On 2017年01月03日 06:43, John Fastabend wrote:
>> On 16-12-23 06:37 AM, Jason Wang wrote:
>>> Commit f600b6905015 ("virtio_net: Add XDP support") leaves the case of
>>> small receive buffer untouched. This will confuse the user who want to
>>> set XDP but use small buffers. Other than forbid XDP in small buffer
>>> mode, let's make it work. XDP then can only work at skb->data since
>>> virtio-net create skbs during refill, this is sub optimal which could
>>> be optimized in the future.
>>>
>>> Cc: John Fastabend <john.r.fastabend@intel.com>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>> drivers/net/virtio_net.c | 112 ++++++++++++++++++++++++++++++++++++-----------
>>> 1 file changed, 87 insertions(+), 25 deletions(-)
>>>
>> Hi Jason,
>>
>> I was doing some more testing on this what do you think about doing this
>> so that free_unused_bufs() handles the buffer free with dev_kfree_skb()
>> instead of put_page in small receive mode. Seems more correct to me.
>>
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 783e842..27ff76c 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1898,6 +1898,10 @@ static void free_receive_page_frags(struct virtnet_info
>> *vi)
>>
>> static bool is_xdp_queue(struct virtnet_info *vi, int q)
>> {
>> + /* For small receive mode always use kfree_skb variants */
>> + if (!vi->mergeable_rx_bufs)
>> + return false;
>> +
>> if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
>> return false;
>> else if (q < vi->curr_queue_pairs)
>>
>>
>> patch is untested just spotted doing code review.
>>
>> Thanks,
>> John
>
> We probably need a better name for this function.
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
How about is_xdp_raw_buffer_queue()?
I'll submit a proper patch today.
^ permalink raw reply
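The queue classification proposed above can be sketched as a standalone
predicate. This is a sketch under assumptions: struct vinfo is a
hypothetical cut-down stand-in for the driver state, the function uses the
name suggested in the thread, and the final branch (queues in the XDP
range return true, everything else false) is assumed here because the
quoted diff context stops before it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down view of the driver state. */
struct vinfo {
	bool mergeable_rx_bufs;
	int curr_queue_pairs;
	int xdp_queue_pairs;
};

/*
 * True when buffers on queue q are raw pages rather than skbs.
 * Assumption: the last xdp_queue_pairs of the active queues are the
 * XDP queues.
 */
static bool is_xdp_raw_buffer_queue(const struct vinfo *vi, int q)
{
	/* Small receive mode always uses skbs, so never treat it as raw. */
	if (!vi->mergeable_rx_bufs)
		return false;

	if (q < vi->curr_queue_pairs - vi->xdp_queue_pairs)
		return false;

	return q < vi->curr_queue_pairs;
}
```

With four active queue pairs of which two are XDP queues, queues 2 and 3
are classified as raw-buffer queues in mergeable mode and none are in
small-buffer mode.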
* Re: [PATCH v3 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel
From: Mark Greer @ 2017-01-03 16:33 UTC (permalink / raw)
To: Geoff Lansberry
Cc: linux-wireless, Lauro Ramos Venancio, Aloisio Almeida Jr,
Samuel Ortiz, robh+dt, mark.rutland, netdev, devicetree,
linux-kernel, Justin Bronder, Jaret Cantu
In-Reply-To: <CAO7Z3WJa0goJ-VXc7dvyz8imZtqby6QsC0QNH+uRAE8LhxqU2w@mail.gmail.com>
[Please stop top-posting. Bottom-post only to these lists.]
Hi Geoff & happy new year.
On Tue, Dec 27, 2016 at 09:18:32AM -0500, Geoff Lansberry wrote:
> Mark - I will split this off soon.
OK
> In the meantime - here is some more info about how we use it.
>
> We do use NFC structures. I did find an interesting clue in that
> there are certain bottles that cause neard to segfault, I'm not sure
> what is different about them. We write a string, like
> "coppola_chardonnay_2015" to the bottles.
Off the top of my head, it could be the length of the text.
It would be useful to compare the data that works to the data
that doesn't work. Can you install NXP's 'TagInfo' app on a
smartphone and scan tags with working & non-working data?
You can email the data from the app to yourself, edit out
the cruft, and share here.
> Come to think of it, I
> haven't done anything special to make that an ndef record, just
> assumed that it would happen by default, I'll look into this further.
If you wrote the data using neard, it will be NDEF formatted.
Since it is working this well, it is virtually guaranteed that
the data is NDEF formatted.
> Also, I've been running neard with --plugin nfctype2. Just in case
> the problem was happening due to cycling through other tag types. It
> didn't seem to make any difference, but I have not gone back to
> default.
Good to know, thanks.
Mark
--
^ permalink raw reply
* Re: [PATCH v2 net-next] net:mv88e6xxx: use g2 interrupt for 6097 chip
From: Andrew Lunn @ 2017-01-03 16:28 UTC (permalink / raw)
To: Volodymyr Bendiuga; +Cc: vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
In-Reply-To: <1483456720-9929-1-git-send-email-volodymyr.bendiuga@gmail.com>
On Tue, Jan 03, 2017 at 04:18:40PM +0100, Volodymyr Bendiuga wrote:
> From: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
>
> This chip needs MV88E6XXX_FLAG_G2_INT
>
> Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> ---
> drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> index af54bae..431e954 100644
> --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> @@ -566,6 +566,7 @@ enum mv88e6xxx_cap {
> (MV88E6XXX_FLAG_G1_ATU_FID | \
> MV88E6XXX_FLAG_G1_VTU_FID | \
> MV88E6XXX_FLAG_GLOBAL2 | \
> + MV88E6XXX_FLAG_G2_INT | \
checkpatch is your friend:
$ ./scripts/checkpatch.pl volodymyr.bendiuga
ERROR: code indent should use tabs where possible
#76: FILE: drivers/net/dsa/mv88e6xxx/mv88e6xxx.h:569:
+ ^I MV88E6XXX_FLAG_G2_INT |^I\$
WARNING: please, no space before tabs
#76: FILE: drivers/net/dsa/mv88e6xxx/mv88e6xxx.h:569:
+ ^I MV88E6XXX_FLAG_G2_INT |^I\$
WARNING: please, no spaces at the start of a line
#76: FILE: drivers/net/dsa/mv88e6xxx/mv88e6xxx.h:569:
+ ^I MV88E6XXX_FLAG_G2_INT |^I\$
total: 1 errors, 2 warnings, 0 checks, 7 lines checked
Andrew
^ permalink raw reply
* [PATCH ipsec] xfrm: trivial typos
From: Alexander Alemayhu @ 2017-01-03 16:13 UTC (permalink / raw)
To: netdev; +Cc: steffen.klassert, Alexander Alemayhu
o s/descentant/descendant
o s/workarbound/workaround
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
---
net/xfrm/xfrm_policy.c | 2 +-
net/xfrm/xfrm_state.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 177e208e8ff5..99ad1af2927f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -330,7 +330,7 @@ void xfrm_policy_destroy(struct xfrm_policy *policy)
}
EXPORT_SYMBOL(xfrm_policy_destroy);
-/* Rule must be locked. Release descentant resources, announce
+/* Rule must be locked. Release descendant resources, announce
* entry dead. The rule must be unlinked from lists to the moment.
*/
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 64e3c82eedf6..c5cf4d611aab 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -409,7 +409,7 @@ static enum hrtimer_restart xfrm_timer_handler(struct hrtimer *me)
if (x->xflags & XFRM_SOFT_EXPIRE) {
/* enter hard expire without soft expire first?!
* setting a new date could trigger this.
- * workarbound: fix x->curflt.add_time by below:
+ * workaround: fix x->curflt.add_time by below:
*/
x->curlft.add_time = now - x->saved_tmo - 1;
tmo = x->lft.hard_add_expires_seconds - x->saved_tmo;
--
2.11.0
^ permalink raw reply related
* Re: [PATCH v2 net-next] net:mv88e6xxx: use g2 interrupt for 6097 chip
From: David Miller @ 2017-01-03 16:18 UTC (permalink / raw)
To: volodymyr.bendiuga
Cc: andrew, vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
In-Reply-To: <1483456720-9929-1-git-send-email-volodymyr.bendiuga@gmail.com>
From: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com>
Date: Tue, 3 Jan 2017 16:18:40 +0100
> + MV88E6XXX_FLAG_G2_INT | \
Space before TAB character is still there on this line, right after
the "+".
^ permalink raw reply
* [PATCH v2 net-next] net:mv88e6xxx: use g2 interrupt for 6097 chip
From: Volodymyr Bendiuga @ 2017-01-03 15:18 UTC (permalink / raw)
To: andrew, vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
From: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
This chip needs MV88E6XXX_FLAG_G2_INT
Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index af54bae..431e954 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -566,6 +566,7 @@ enum mv88e6xxx_cap {
(MV88E6XXX_FLAG_G1_ATU_FID | \
MV88E6XXX_FLAG_G1_VTU_FID | \
MV88E6XXX_FLAG_GLOBAL2 | \
+ MV88E6XXX_FLAG_G2_INT | \
MV88E6XXX_FLAG_G2_MGMT_EN_2X | \
MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
MV88E6XXX_FLAG_G2_POT | \
--
2.7.4
^ permalink raw reply related
* RE: [PATCH net-next] net/sched: cls_matchall: Fix error path
From: Yotam Gigi @ 2017-01-03 16:13 UTC (permalink / raw)
To: David Miller
Cc: jhs@mojatatu.com, Elad Raz, Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <20170103.110807.1415814934069793893.davem@davemloft.net>
>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Tuesday, January 03, 2017 6:08 PM
>To: Yotam Gigi <yotamg@mellanox.com>
>Cc: jhs@mojatatu.com; Elad Raz <eladr@mellanox.com>; Jiri Pirko
><jiri@mellanox.com>; netdev@vger.kernel.org
>Subject: Re: [PATCH net-next] net/sched: cls_matchall: Fix error path
>
>From: Yotam Gigi <yotamg@mellanox.com>
>Date: Tue, 3 Jan 2017 17:47:02 +0200
>
>> Fix several error paths in matchall:
>> - Release reference to actions in case the hardware fails offloading
>> (relevant to skip_sw only)
>> - Fix error path in case tcf_exts initialization fails
>>
>> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
>> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
>
>Nothing is checking the tcf_exts_init() return value for errors either,
>and I think you should fix this alongside these release problems at the
>same time.
Ok. Will send v2 soon.
Thanks!
>
>Thanks.
^ permalink raw reply
* Re: [net-next 0/3] tipc: improve interaction socket-link
From: David Miller @ 2017-01-03 16:13 UTC (permalink / raw)
To: jon.maloy; +Cc: netdev, tipc-discussion
In-Reply-To: <1483458911-32549-1-git-send-email-jon.maloy@ericsson.com>
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Tue, 3 Jan 2017 10:55:08 -0500
> We fix a very real starvation problem that may occur when a link
> encounters send buffer congestion. At the same time we make the
> interaction between the socket and link layer simpler and more
> consistent.
Series applied, thanks Jon.
^ permalink raw reply
* Re: [PATCH] staging: octeon: Call SET_NETDEV_DEV()
From: Greg KH @ 2017-01-03 16:11 UTC (permalink / raw)
To: Florian Fainelli
Cc: devel, asbjorn, aaro.koskinen, netdev, nevola, linux-kernel,
jarod, bhaktipriya96, David Miller, tremyfr
In-Reply-To: <748b758b-7a9c-d58e-2fa5-52b6fa031ae3@gmail.com>
On Tue, Dec 27, 2016 at 02:15:57PM -0800, Florian Fainelli wrote:
> On 12/20/2016 07:20 PM, David Miller wrote:
> > From: Florian Fainelli <f.fainelli@gmail.com>
> > Date: Tue, 20 Dec 2016 17:02:37 -0800
> >
> >> On 12/14/2016 05:13 PM, Florian Fainelli wrote:
> >>> The Octeon driver calls into PHYLIB which now checks for
> >>> net_device->dev.parent, so make sure we do set it before calling into
> >>> any MDIO/PHYLIB related function.
> >>>
> >>> Fixes: ec988ad78ed6 ("phy: Don't increment MDIO bus refcount unless it's a different owner")
> >>> Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
> >>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >>
> >> Greg, David, since this is a fix for a regression introduced in the net
> >> tree, it may make sense that David take it via his tree.
> >
> > Since the change in question is in Linus's tree, it's equally valid
> > for Greg to take it as well.
>
> Sure, Greg, can you take this change? Thank you!
Will do so now, thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: David Miller @ 2017-01-03 16:10 UTC (permalink / raw)
To: nhorman; +Cc: wr0112358, netdev, linux-kernel
In-Reply-To: <20170103160443.GC11735@hmsreliant.think-freely.org>
From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 3 Jan 2017 11:04:43 -0500
> On Tue, Jan 03, 2017 at 09:54:19AM -0500, David Miller wrote:
>> From: Reiter Wolfgang <wr0112358@gmail.com>
>> Date: Tue, 3 Jan 2017 01:39:10 +0100
>>
>> > Final nlmsg_len field update must reflect inserted net_dm_drop_point
>> > data.
>> >
>> > This patch depends on previous patch:
>> > "drop_monitor: add missing call to genlmsg_end"
>> >
>> > Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
>>
>> I don't understand why the current code doesn't work properly.
>>
>> All over the tree, the pattern is:
>>
>> x = genlmsg_put(skb, ...);
>> ...
>> genlmsg_end(skb, x);
>>
>> And that is exactly what the code is doing right now.
>>
>
> Because reset_per_cpu_data should close the use of the established skb
> that was being written to. Without this patch we add the END tlv to the skb
> that is just getting started for use in the drop monitor, rather than for the
> skb that is getting returned for use in sending up to user space listeners.
>
> Or am I missing something?
That's the critical part I didn't see, thanks for explaining.
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH net-next] net/sched: cls_matchall: Fix error path
From: David Miller @ 2017-01-03 16:08 UTC (permalink / raw)
To: yotamg; +Cc: jhs, eladr, jiri, netdev
In-Reply-To: <1483458422-13607-1-git-send-email-yotamg@mellanox.com>
From: Yotam Gigi <yotamg@mellanox.com>
Date: Tue, 3 Jan 2017 17:47:02 +0200
> Fix several error paths in matchall:
> - Release reference to actions in case the hardware fails offloading
> (relevant to skip_sw only)
> - Fix error path in case tcf_exts initialization fails
>
> Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier")
> Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Nothing is checking the tcf_exts_init() return value for errors either,
and I think you should fix this alongside these release problems at the
same time.
Thanks.
^ permalink raw reply
* Re: [RFC PATCH 2/4] page_pool: basic implementation of page_pool
From: Vlastimil Babka @ 2017-01-03 16:07 UTC (permalink / raw)
To: Jesper Dangaard Brouer, linux-mm, Alexander Duyck
Cc: willemdebruijn.kernel, netdev, john.fastabend, Saeed Mahameed,
bjorn.topel, Alexei Starovoitov, Tariq Toukan
In-Reply-To: <20161220132817.18788.64726.stgit@firesoul>
On 12/20/2016 02:28 PM, Jesper Dangaard Brouer wrote:
> The focus in this patch is getting the API around page_pool figured out.
>
> The internal data structures for returning page_pool pages are not optimal.
> This implementation uses ptr_ring for recycling, which is known not to scale
> in case of multiple remote CPUs releasing/returning pages.
Just a few very quick impressions...
> A bulking interface into the page allocator is also left for later. (This
> requires cooperation with Mel Gorman, who just sent me some PoC patches for this).
> ---
> include/linux/mm.h | 6 +
> include/linux/mm_types.h | 11 +
> include/linux/page-flags.h | 13 +
> include/linux/page_pool.h | 158 +++++++++++++++
> include/linux/skbuff.h | 2
> include/trace/events/mmflags.h | 3
> mm/Makefile | 3
> mm/page_alloc.c | 10 +
> mm/page_pool.c | 423 ++++++++++++++++++++++++++++++++++++++++
> mm/slub.c | 4
> 10 files changed, 627 insertions(+), 6 deletions(-)
> create mode 100644 include/linux/page_pool.h
> create mode 100644 mm/page_pool.c
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4424784ac374..11b4d8fb280b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -23,6 +23,7 @@
> #include <linux/page_ext.h>
> #include <linux/err.h>
> #include <linux/page_ref.h>
> +#include <linux/page_pool.h>
>
> struct mempolicy;
> struct anon_vma;
> @@ -765,6 +766,11 @@ static inline void put_page(struct page *page)
> {
> page = compound_head(page);
>
> + if (PagePool(page)) {
> + page_pool_put_page(page);
> + return;
> + }
Can't say I'm thrilled about a new page flag and a test in put_page(). I don't
know the full life cycle here, but isn't it that these pages will be
specifically allocated and used in page pool aware drivers, so maybe they can be
also specifically freed there without hooking to the generic page refcount
mechanism?
> +
> if (put_page_testzero(page))
> __put_page(page);
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 08d947fc4c59..c74dea967f99 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -47,6 +47,12 @@ struct page {
> unsigned long flags; /* Atomic flags, some possibly
> * updated asynchronously */
> union {
> + /* DISCUSS: Considered moving page_pool pointer here,
> + * but I'm unsure if 'mapping' is needed for userspace
> + * mapping the page, as this is a use-case the
> + * page_pool need to support in the future. (Basically
> + * mapping a NIC RX ring into userspace).
I think so, but might be wrong here. In any case mapping usually goes with
index, and you put dma_addr in union with index below...
> + */
> struct address_space *mapping; /* If low bit clear, points to
> * inode address_space, or NULL.
> * If page mapped as anonymous
> @@ -63,6 +69,7 @@ struct page {
> union {
> pgoff_t index; /* Our offset within mapping. */
> void *freelist; /* sl[aou]b first free object */
> + dma_addr_t dma_addr; /* used by page_pool */
> /* page_deferred_list().prev -- second tail page */
> };
>
> @@ -117,6 +124,8 @@ struct page {
> * avoid collision and false-positive PageTail().
> */
> union {
> + /* XXX: Idea reuse lru list, in page_pool to align with PCP */
> +
> struct list_head lru; /* Pageout list, eg. active_list
> * protected by zone_lru_lock !
> * Can be used as a generic list
> @@ -189,6 +198,8 @@ struct page {
> #endif
> #endif
> struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
> + /* XXX: Sure page_pool will have no users of "private"? */
> + struct page_pool *pool;
> };
>
> #ifdef CONFIG_MEMCG
^ permalink raw reply
* Re: [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: Neil Horman @ 2017-01-03 16:04 UTC (permalink / raw)
To: David Miller; +Cc: wr0112358, netdev, linux-kernel
In-Reply-To: <20170103.095419.261470619535526723.davem@davemloft.net>
On Tue, Jan 03, 2017 at 09:54:19AM -0500, David Miller wrote:
> From: Reiter Wolfgang <wr0112358@gmail.com>
> Date: Tue, 3 Jan 2017 01:39:10 +0100
>
> > Final nlmsg_len field update must reflect inserted net_dm_drop_point
> > data.
> >
> > This patch depends on previous patch:
> > "drop_monitor: add missing call to genlmsg_end"
> >
> > Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
>
> I don't understand why the current code doesn't work properly.
>
> All over the tree, the pattern is:
>
> x = genlmsg_put(skb, ...);
> ...
> genlmsg_end(skb, x);
>
> And that is exactly what the code is doing right now.
>
Because reset_per_cpu_data should close the use of the established skb
that was being written to. Without this patch we add the END tlv to the skb
that is just getting started for use in the drop monitor, rather than for the
skb that is getting returned for use in sending up to user space listeners.
Or am I missing something?
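To make the point concrete, here is a minimal userspace sketch of the
put/append/end pattern and of why the buffer being handed off is the one
that needs finalizing. Everything named toy_* is hypothetical and only
stands in for skb, genlmsg_put() and genlmsg_end(); it is not the
drop_monitor code, and it assumes each buffer holds a single message whose
header sits at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb carrying one length-prefixed message. */
struct toy_buf {
	unsigned char data[256];
	size_t tail;			/* next free byte */
};

/* Like genlmsg_put(): reserve a header, return where it lives. */
static size_t toy_put(struct toy_buf *b)
{
	size_t hdr_off = b->tail;
	uint32_t len = sizeof(uint32_t);	/* header only, for now */

	memcpy(b->data + hdr_off, &len, sizeof(len));
	b->tail += sizeof(len);
	return hdr_off;
}

/* Append payload after the header; the length field is NOT updated here. */
static int toy_append(struct toy_buf *b, const void *p, size_t n)
{
	if (n > sizeof(b->data) - b->tail)
		return -1;
	memcpy(b->data + b->tail, p, n);
	b->tail += n;
	return 0;
}

/* Like genlmsg_end(): rewrite the length to cover everything appended. */
static void toy_end(struct toy_buf *b, size_t hdr_off)
{
	uint32_t len = (uint32_t)(b->tail - hdr_off);

	memcpy(b->data + hdr_off, &len, sizeof(len));
}

static uint32_t toy_len(const struct toy_buf *b, size_t hdr_off)
{
	uint32_t len;

	memcpy(&len, b->data + hdr_off, sizeof(len));
	return len;
}

/*
 * Swap in a fresh buffer and return the old one. The old buffer is the
 * one that accumulated payload, so it is the one that gets toy_end().
 */
static struct toy_buf *toy_swap(struct toy_buf **cur, struct toy_buf *fresh)
{
	struct toy_buf *old = *cur;

	fresh->tail = 0;
	toy_put(fresh);		/* start the new message */
	toy_end(old, 0);	/* finalize the message being handed off */
	*cur = fresh;
	return old;
}
```

If toy_end() were called on the fresh buffer instead, the old buffer would
go out with a length that covers only its header, and the appended data
would be invisible to whoever parses it.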
^ permalink raw reply
* Re: [PATCH] Ipvlan should return an error when an address is already in use.
From: David Miller @ 2017-01-03 15:55 UTC (permalink / raw)
To: aconole; +Cc: kjlx, maheshb, netdev
In-Reply-To: <f7td1g49m8n.fsf@redhat.com>
From: Aaron Conole <aconole@redhat.com>
Date: Tue, 03 Jan 2017 10:50:00 -0500
>> @@ -489,7 +490,12 @@ static int __inet_insert_ifa(struct in_ifaddr *ifa, struct nlmsghdr *nlh,
>> Notifier will trigger FIB update, so that
>> listeners of netlink will know about new ifaddr */
>> rtmsg_ifa(RTM_NEWADDR, ifa, nlh, portid);
>> - blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
>> + ret = blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
>
> Why are you doing this assignment if you aren't using the result?
>
>> + ret = notifier_to_errno(ret);
>> + if (ret) {
>> + __inet_del_ifa(in_dev, ifap, 1, NULL, portid);
>> + return ret;
>> + }
'ret' assignment is being used, via notifier_to_errno().
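For anyone following along, a rough userspace sketch of the round trip
between an errno and a notifier return value. The constants and the two
helpers are modeled on include/linux/notifier.h as I remember it, so treat
the exact values as assumptions; the toy_ prefix marks them as illustrative
copies, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

/* Assumed to match the kernel's values; check notifier.h before relying on them. */
#define TOY_NOTIFY_OK		0x0001
#define TOY_NOTIFY_STOP_MASK	0x8000

/* Encode a negative errno so the chain stops and the error can be recovered. */
static int toy_notifier_from_errno(int err)
{
	if (err)
		return TOY_NOTIFY_STOP_MASK | (TOY_NOTIFY_OK - err);
	return TOY_NOTIFY_OK;
}

/* Decode the chain's return value back into 0 or a negative errno. */
static int toy_notifier_to_errno(int ret)
{
	ret &= ~TOY_NOTIFY_STOP_MASK;
	return ret > TOY_NOTIFY_OK ? TOY_NOTIFY_OK - ret : 0;
}
```

So the raw value coming back from the call chain is not itself an errno;
it only becomes one after the conversion, which is why the first
assignment is needed.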
^ permalink raw reply
* Re: [PATCH v3 net-next 0/2] TPACKET_V3 TX_RING support
From: David Miller @ 2017-01-03 16:01 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev, daniel, willemb
In-Reply-To: <cover.1483452545.git.sowmini.varadhan@oracle.com>
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 3 Jan 2017 06:31:46 -0800
> This patch series allows an application to use a single PF_PACKET
> descriptor and leverage the best implementations of TX_RING
> and RX_RING that exist today.
>
> Patch 1 adds the kernel/Documentation changes for TX_RING
> support and patch2 adds the associated test case in selftests.
>
> Changes since v2: additional sanity checks for setsockopt
> input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
> test code to avoid code duplication from V2.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net] benet: stricter vxlan offloading check in be_features_check
From: David Miller @ 2017-01-03 15:59 UTC (permalink / raw)
To: sd
Cc: netdev, sathya.perla, ajit.khaparde, sriharsha.basavapatna,
somnath.kotur
In-Reply-To: <59e720bde70fb5226313c62d89b62cbbef25b3e2.1483455910.git.sd@queasysnail.net>
From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 3 Jan 2017 16:26:04 +0100
> When VXLAN offloading is enabled, be_features_check() tries to check if
> an encapsulated packet is indeed a VXLAN packet. The check is not strict
> enough, and considers any UDP-encapsulated ethernet frame with an 8-byte
> tunnel header as being VXLAN. Unfortunately, both GENEVE and VXLAN-GPE
> have an 8-byte header, so they get through this check.
>
> Force the UDP destination port to be the one that has been offloaded to
> hardware.
>
> Without this, GENEVE-encapsulated packets can end up having an incorrect
> checksum when both a GENEVE and a VXLAN (offloaded) tunnel are
> configured.
>
> This is similar to commit a547224dceed ("mlx4e: Do not attempt to
> offload VXLAN ports that are unrecognized").
>
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Applied, thanks.
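A small userspace sketch of the difference between the loose and the
strict check described above. This is not the be2net code: struct
toy_encap and both helpers are hypothetical, ports are kept in host byte
order for simplicity, and the port numbers in the comments are the
IANA-assigned defaults (4789 for VXLAN, 6081 for GENEVE).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TUNNEL_HDR_LEN 8	/* VXLAN, but also base GENEVE and VXLAN-GPE */

/* Hypothetical summary of an encapsulated packet's outer headers. */
struct toy_encap {
	uint16_t udp_dport;	/* outer UDP destination port, host order */
	size_t tunnel_hdr_len;	/* bytes between UDP header and inner frame */
};

/* Loose check: header length alone, so GENEVE passes as VXLAN. */
static bool toy_loose_check(const struct toy_encap *e)
{
	return e->tunnel_hdr_len == TOY_TUNNEL_HDR_LEN;
}

/* Strict check: also require the port that was offloaded to hardware. */
static bool toy_strict_check(const struct toy_encap *e, uint16_t offloaded_port)
{
	return e->tunnel_hdr_len == TOY_TUNNEL_HDR_LEN &&
	       e->udp_dport == offloaded_port;
}
```

With a VXLAN port of 4789 offloaded, a GENEVE packet to port 6081 passes
the loose check but is rejected by the strict one.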
^ permalink raw reply
* Re: [PATCH v3 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Willem de Bruijn @ 2017-01-03 15:57 UTC (permalink / raw)
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <dcf7fe19d3248b4f523b7f7d22937c61d92a152f.1483452545.git.sowmini.varadhan@oracle.com>
On Tue, Jan 3, 2017 at 9:31 AM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Thanks!
^ permalink raw reply