* RE: RTL8723BE performance regression
From: Pkshih @ 2018-05-02 5:44 UTC (permalink / raw)
To: João Paulo Rechi Vita, Larry Finger
Cc: Steve deRosier, 莊彥宣, Birming Chiu, Shaofu,
Steven Ting, Chaoming_Li, Kalle Valo, linux-wireless,
Network Development, LKML, Daniel Drake,
João Paulo Rechi Vita, linux@endlessm.com
In-Reply-To: <CA+A7VXUUsZHD2gr9TBcV6jZBOPGZv7_vK-wWZ0MvGgiCnAiUgQ@mail.gmail.com>
> -----Original Message-----
> From: João Paulo Rechi Vita [mailto:jprvita@gmail.com]
> Sent: Wednesday, May 02, 2018 6:41 AM
> To: Larry Finger
> Cc: Steve deRosier; 莊彥宣; Pkshih; Birming Chiu; Shaofu; Steven Ting; Chaoming_Li; Kalle Valo;
> linux-wireless; Network Development; LKML; Daniel Drake; João Paulo Rechi Vita; linux@endlessm.com
> Subject: Re: RTL8723BE performance regression
>
> On Tue, Apr 3, 2018 at 7:51 PM, Larry Finger <Larry.Finger@lwfinger.net> wrote:
> > On 04/03/2018 09:37 PM, João Paulo Rechi Vita wrote:
> >>
> >> On Tue, Apr 3, 2018 at 7:28 PM, Larry Finger <Larry.Finger@lwfinger.net>
> >> wrote:
> >>
> >> (...)
> >>
> >>> As the antenna selection code changes affected your first bisection, do
> >>> you
> >>> have one of those HP laptops with only one antenna and the incorrect
> >>> coding
> >>> in the FUSE?
> >>
> >>
> >> Yes, that is why I've been passing ant_sel=1 during my tests -- this
> >> was needed to achieve a good performance in the past, before this
> >> regression. I've also opened the laptop chassis and confirmed the
> >> antenna cable is plugged to the connector labeled with "1" on the
> >> card.
> >>
> >>> If so, please make sure that you still have the same signal
> >>> strength for good and bad cases. I have tried to keep the driver and the
> >>> btcoex code in sync, but there may be some combinations of antenna
> >>> configuration and FUSE contents that cause the code to fail.
> >>>
> >>
> >> What is the recommended way to monitor the signal strength?
> >
> >
> > The btcoex code is developed for multiple platforms by a different group
> > than the Linux driver. I think they made a change that caused ant_sel to
> > switch from 1 to 2. At least numerous comments at
> > github.com/lwfinger/rtlwifi_new claimed they needed to make that change.
> >
> > Mhy recommended method is to verify the wifi device name with "iw dev". Then
> > using that device
> >
> > sudo iw dev <dev_name> scan | egrep "SSID|signal"
> >
>
> I have confirmed that the performance regression is indeed tied to
> signal strength: on the good cases signal was between -16 and -8 dBm,
> whereas in bad cases signal was always between -50 to - 40 dBm. I've
> also switched to testing bandwidth in controlled LAN environment using
> iperf3, as suggested by Steve deRosier, with the DUT being the only
> machine connected to the 2.4 GHz radio and the machine running the
> iperf3 server connected via ethernet.
>
We have new experimental results in commit af8a41cccf8f46 ("rtlwifi: cleanup
8723be ant_sel definition"). You can use the above commit and do the same
experiments (with ant_sel=0, 1 and 2) in your side, and then share your results.
Since performance is tied to signal strength, you can only share signal strength.
Regards
PK
^ permalink raw reply
* [PATCH net] sctp: use the old asoc when making the cookie-ack chunk in dupcook_d
From: Xin Long @ 2018-05-02 5:39 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman
When processing a duplicate cookie-echo chunk, for case 'D', sctp will
not process the param from this chunk. It means old asoc has nothing
to be updated, and the new temp asoc doesn't have the complete info.
So there's no reason to use the new asoc when creating the cookie-ack
chunk. Otherwise, like when auth is enabled for cookie-ack, the chunk
can not be set with auth, and it will definitely be dropped by peer.
This issue is there since very beginning, and we fix it by using the
old asoc instead.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/sctp/sm_statefuns.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 98acfed..28c070e 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -2056,7 +2056,7 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
}
}
- repl = sctp_make_cookie_ack(new_asoc, chunk);
+ repl = sctp_make_cookie_ack(asoc, chunk);
if (!repl)
goto nomem;
--
2.1.0
^ permalink raw reply related
* [PATCH net] sctp: init active key for the new asoc in dupcook_a and dupcook_b
From: Xin Long @ 2018-05-02 5:37 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman
When processing a duplicate cookie-echo chunk, for case 'A' and 'B',
after sctp_process_init for the new asoc, if auth is enabled for the
cookie-ack chunk, the active key should also be initialized.
Otherwise, the cookie-ack chunk made later can not be set with auth
shkey properly, and a crash can even be caused by this, as after
Commit 1b1e0bc99474 ("sctp: add refcnt support for sh_key"), sctp
needs to hold the shkey when making control chunks.
Fixes: 1b1e0bc99474 ("sctp: add refcnt support for sh_key")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/sctp/sm_statefuns.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index dd0594a..98acfed 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1794,6 +1794,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_a(
GFP_ATOMIC))
goto nomem;
+ if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
+ goto nomem;
+
/* Make sure no new addresses are being added during the
* restart. Though this is a pretty complicated attack
* since you'd have to get inside the cookie.
@@ -1906,6 +1909,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
GFP_ATOMIC))
goto nomem;
+ if (sctp_auth_asoc_init_active_key(new_asoc, GFP_ATOMIC))
+ goto nomem;
+
/* Update the content of current association. */
sctp_add_cmd_sf(commands, SCTP_CMD_UPDATE_ASSOC, SCTP_ASOC(new_asoc));
sctp_add_cmd_sf(commands, SCTP_CMD_NEW_STATE,
--
2.1.0
^ permalink raw reply related
* Re: [Xen-devel] [PATCH 2/6] xen-netfront: copy response out of shared buffer before accessing it
From: Oleksandr Andrushchenko @ 2018-05-02 5:20 UTC (permalink / raw)
To: Marek Marczykowski-Górecki, xen-devel
Cc: Juergen Gross, open list:NETWORKING DRIVERS, stable, open list,
Boris Ostrovsky
In-Reply-To: <98a855dceb47dbebd9c87e024084f14a5cb127f7.1525122026.git-series.marmarek@invisiblethingslab.com>
On 05/01/2018 12:01 AM, Marek Marczykowski-Górecki wrote:
> Make local copy of the response, otherwise backend might modify it while
> frontend is already processing it - leading to time of check / time of
> use issue.
>
> This is complementary to XSA155.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
> drivers/net/xen-netfront.c | 51 +++++++++++++++++++--------------------
> 1 file changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 4dd0668..dc99763 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -387,13 +387,13 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
> rmb(); /* Ensure we see responses up to 'rp'. */
>
> for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
Side comment: the original concern was expressed on the above counters,
will those be addressed as a dedicated series?
> - struct xen_netif_tx_response *txrsp;
> + struct xen_netif_tx_response txrsp;
>
> - txrsp = RING_GET_RESPONSE(&queue->tx, cons);
> - if (txrsp->status == XEN_NETIF_RSP_NULL)
> + RING_COPY_RESPONSE(&queue->tx, cons, &txrsp);
> + if (txrsp.status == XEN_NETIF_RSP_NULL)
> continue;
>
IMO, there is still no guarantee you access consistent data after this
change.
What if part of the response was ok when you started copying and
then, in the middle, backend poisons the end of the response?
This seems to be just like minimizing(?) chances to work with inconsistent
data rather than removing the possibility of such completely
> - id = txrsp->id;
> + id = txrsp.id;
> skb = queue->tx_skbs[id].skb;
> if (unlikely(gnttab_query_foreign_access(
> queue->grant_tx_ref[id]) != 0)) {
> @@ -741,7 +741,7 @@ static int xennet_get_extras(struct netfront_queue *queue,
> RING_IDX rp)
>
> {
> - struct xen_netif_extra_info *extra;
> + struct xen_netif_extra_info extra;
> struct device *dev = &queue->info->netdev->dev;
> RING_IDX cons = queue->rx.rsp_cons;
> int err = 0;
> @@ -757,24 +757,23 @@ static int xennet_get_extras(struct netfront_queue *queue,
> break;
> }
>
> - extra = (struct xen_netif_extra_info *)
> - RING_GET_RESPONSE(&queue->rx, ++cons);
> + RING_COPY_RESPONSE(&queue->rx, ++cons, &extra);
>
> - if (unlikely(!extra->type ||
> - extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
> + if (unlikely(!extra.type ||
> + extra.type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
> if (net_ratelimit())
> dev_warn(dev, "Invalid extra type: %d\n",
> - extra->type);
> + extra.type);
> err = -EINVAL;
> } else {
> - memcpy(&extras[extra->type - 1], extra,
> - sizeof(*extra));
> + memcpy(&extras[extra.type - 1], &extra,
> + sizeof(extra));
> }
>
> skb = xennet_get_rx_skb(queue, cons);
> ref = xennet_get_rx_ref(queue, cons);
> xennet_move_rx_slot(queue, skb, ref);
> - } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
> + } while (extra.flags & XEN_NETIF_EXTRA_FLAG_MORE);
>
> queue->rx.rsp_cons = cons;
> return err;
> @@ -784,28 +783,28 @@ static int xennet_get_responses(struct netfront_queue *queue,
> struct netfront_rx_info *rinfo, RING_IDX rp,
> struct sk_buff_head *list)
> {
> - struct xen_netif_rx_response *rx = &rinfo->rx;
> + struct xen_netif_rx_response rx = rinfo->rx;
> struct xen_netif_extra_info *extras = rinfo->extras;
> struct device *dev = &queue->info->netdev->dev;
> RING_IDX cons = queue->rx.rsp_cons;
> struct sk_buff *skb = xennet_get_rx_skb(queue, cons);
> grant_ref_t ref = xennet_get_rx_ref(queue, cons);
> - int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
> + int max = MAX_SKB_FRAGS + (rx.status <= RX_COPY_THRESHOLD);
> int slots = 1;
> int err = 0;
> unsigned long ret;
>
> - if (rx->flags & XEN_NETRXF_extra_info) {
> + if (rx.flags & XEN_NETRXF_extra_info) {
> err = xennet_get_extras(queue, extras, rp);
> cons = queue->rx.rsp_cons;
> }
>
> for (;;) {
> - if (unlikely(rx->status < 0 ||
> - rx->offset + rx->status > XEN_PAGE_SIZE)) {
> + if (unlikely(rx.status < 0 ||
> + rx.offset + rx.status > XEN_PAGE_SIZE)) {
> if (net_ratelimit())
> dev_warn(dev, "rx->offset: %u, size: %d\n",
> - rx->offset, rx->status);
> + rx.offset, rx.status);
> xennet_move_rx_slot(queue, skb, ref);
> err = -EINVAL;
> goto next;
> @@ -819,7 +818,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
> if (ref == GRANT_INVALID_REF) {
> if (net_ratelimit())
> dev_warn(dev, "Bad rx response id %d.\n",
> - rx->id);
> + rx.id);
> err = -EINVAL;
> goto next;
> }
> @@ -832,7 +831,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
> __skb_queue_tail(list, skb);
>
> next:
> - if (!(rx->flags & XEN_NETRXF_more_data))
> + if (!(rx.flags & XEN_NETRXF_more_data))
> break;
>
> if (cons + slots == rp) {
> @@ -842,7 +841,7 @@ static int xennet_get_responses(struct netfront_queue *queue,
> break;
> }
>
> - rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
> + RING_COPY_RESPONSE(&queue->rx, cons + slots, &rx);
> skb = xennet_get_rx_skb(queue, cons + slots);
> ref = xennet_get_rx_ref(queue, cons + slots);
> slots++;
> @@ -898,9 +897,9 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
> struct sk_buff *nskb;
>
> while ((nskb = __skb_dequeue(list))) {
> - struct xen_netif_rx_response *rx =
> - RING_GET_RESPONSE(&queue->rx, ++cons);
> + struct xen_netif_rx_response rx;
> skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
> + RING_COPY_RESPONSE(&queue->rx, ++cons, &rx);
>
> if (shinfo->nr_frags == MAX_SKB_FRAGS) {
> unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
> @@ -911,7 +910,7 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
> BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
>
> skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag),
> - rx->offset, rx->status, PAGE_SIZE);
> + rx.offset, rx.status, PAGE_SIZE);
>
> skb_shinfo(nskb)->nr_frags = 0;
> kfree_skb(nskb);
> @@ -1007,7 +1006,7 @@ static int xennet_poll(struct napi_struct *napi, int budget)
> i = queue->rx.rsp_cons;
> work_done = 0;
> while ((i != rp) && (work_done < budget)) {
> - memcpy(rx, RING_GET_RESPONSE(&queue->rx, i), sizeof(*rx));
> + RING_COPY_RESPONSE(&queue->rx, i, rx);
> memset(extras, 0, sizeof(rinfo.extras));
>
> err = xennet_get_responses(queue, &rinfo, rp, &tmpq);
^ permalink raw reply
* Re: [v2 PATCH 1/1] tg3: fix meaningless hw_stats reading after tg3_halt memset 0 hw_stats
From: Michael Chan @ 2018-05-02 5:12 UTC (permalink / raw)
To: Zumeng Chen
Cc: Netdev, open list, Siva Reddy Kallam,
prashant.sreedharan@broadcom.com, David Miller, Zumeng Chen
In-Reply-To: <20180502004234.230662-1-zumeng.chen@gmail.com>
On Tue, May 1, 2018 at 5:42 PM, Zumeng Chen <zumeng.chen@gmail.com> wrote:
> diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
> index 3b5e98e..c61d83c 100644
> --- a/drivers/net/ethernet/broadcom/tg3.h
> +++ b/drivers/net/ethernet/broadcom/tg3.h
> @@ -3102,6 +3102,7 @@ enum TG3_FLAGS {
> TG3_FLAG_ROBOSWITCH,
> TG3_FLAG_ONE_DMA_AT_ONCE,
> TG3_FLAG_RGMII_MODE,
> + TG3_FLAG_HALT,
I think you should be able to use the existing INIT_COMPLETE flag and
not have to add a new flag.
>
> /* Add new flags before this comment and TG3_FLAG_NUMBER_OF_FLAGS */
> TG3_FLAG_NUMBER_OF_FLAGS, /* Last entry in enum TG3_FLAGS */
> --
> 2.9.3
>
^ permalink raw reply
* Re: [PATCH bpf-next] bpf/verifier: enable ctx + const + 0.
From: Alexei Starovoitov @ 2018-05-02 4:52 UTC (permalink / raw)
To: William Tu; +Cc: Linux Kernel Network Developers, Yonghong Song, Yifeng Sun
In-Reply-To: <CALDO+SbDZ5BDz2Pt_66O96Y0ZtvpB78oKaaxtL-Toy_npr0_Zw@mail.gmail.com>
On Tue, May 01, 2018 at 09:35:29PM -0700, William Tu wrote:
>
> > How did you test this patch?
> >
> Without the patch, the test case will fail.
> With the patch, the test case passes.
Please test it with real program and you'll see crashes and garbage returned.
^ permalink raw reply
* Re: [PATCH iproute2-next] Add tc-BPF example for a TCP pure ack recognizer
From: Dave Taht @ 2018-05-02 4:51 UTC (permalink / raw)
To: David Ahern; +Cc: Linux Kernel Network Developers
In-Reply-To: <96fb372d-4fc8-77db-e971-6b9540d8ad26@gmail.com>
On Tue, May 1, 2018 at 9:12 PM, David Ahern <dsahern@gmail.com> wrote:
> On 5/1/18 12:32 PM, Dave Taht wrote:
>> ack_recognize can shift pure tcp acks into another flowid.
>> ---
>> examples/bpf/ack_recognize.c | 98 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 98 insertions(+)
>> create mode 100644 examples/bpf/ack_recognize.c
>>
>> diff --git a/examples/bpf/ack_recognize.c b/examples/bpf/ack_recognize.c
>> new file mode 100644
>> index 0000000..5da620c
>> --- /dev/null
>> +++ b/examples/bpf/ack_recognize.c
>> @@ -0,0 +1,98 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright 2017 Google Inc.
>
> 2017?? it's 2018. Did you write this last year and just now posting?
November, 2017? Shortly prior to you taking iproute2-next off of steven's hands.
It was the first stage of a proof of concept showing (eventually) that
a simple ack decimator/filter (like "drop every other ack", or simple
"drop head" in the supplied example) had a ton of problems, and that
to filter out acks correctly to any extent, it needed to peer back
into the queue. (see the sch_cake ack-filter controversy on-going on
this list)
While it's better than what is in wondershaper, I wouldn't recommend
anyone use it. It was, however, useful in acquiring a gut feel for
what several other broken ack filters might be doing in the field. I
submitted it as a possibly useful example for showing off tc-bpf and
to add fuel to the ack-filter fire under cake.
I can clean it up, SPF it, etc, if you want it. Otherwise, sorry for the noise.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> + * 02110-1301, USA.
>> + */
>> +
>> +/*
>> + * Author: dave.taht@gmail.com (Dave Taht)
>> + *
>> + * ack_recognizer: An eBPF program that correctly recognizes modern TCP ACKs,
>> + * with tcp option fields like SACK and timestamps, and no additional data.
>> + *
>> + * ack_match call: Recognize "pure acks" with no data payload
>> + *
>> + */
>> +
>> +#include "bpf_api.h"
>> +#include "linux/if_ether.h"
>> +#include "linux/ip.h"
>> +#include "linux/in.h"
>> +#include "linux/ipv6.h"
>> +#include "linux/tcp.h"
>> +
>> +/*
>> + * A pure ack contains the ip header, the tcp header + options, flags with the
>> + * ack field set, and no additional payload. That last bit is what every prior
>> + * ack filter gets wrong, they typically assume an obsolete 64 bytes, and don't
>> + * calculate the options (like sack or timestamps) to subtract from the payload.
>> + */
>> +
>> +__section_cls_entry
>> +int ack_match(struct __sk_buff *skb)
>> +{
>> + void *data = (void *)(long)skb->data;
>> + void *data_end = (void *)(long)skb->data_end;
>> + struct ethhdr *eth = data;
>> + struct iphdr *iph = data + sizeof(*eth);
>> + struct tcphdr *tcp;
>> +
>> + if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*tcp) > data_end)
>> + return 0;
>> +
>> + if (eth->h_proto == htons(ETH_P_IP) &&
>> + iph->version == 4) {
>
> Why not make the version == 4 under the proto check?
> if (eth->h_proto == htons(ETH_P_IP) {
> if (iph->version != 4)
> return 0;
> if the eth proto is ETH_P_IP, and version != 4 something is really wrong
>
>> + if(iph->protocol == IPPROTO_TCP &&
>> + iph->ihl == 5 &&
>> + data + sizeof(*eth) + 20 + sizeof(*tcp) <= data_end) {
>> + tcp = data + sizeof(*eth) + 20;
>> + if (tcp->ack &&
>> + htons(iph->tot_len) == 20 + tcp->doff*4)
>> + return -1;
>
> And then here why the magic '20' and limits on ihl = 5? both are trivial
> to accommodate programmatically.
>
>> + }
>> + } else if (eth->h_proto == htons(ETH_P_IPV6) &&
>> + iph->version == 6) {
>> + struct ipv6hdr *iph6 = (struct ipv6hdr *) iph;
>> + if(iph6->nexthdr == IPPROTO_TCP &&
>> + data + sizeof(*eth) + 40 + sizeof(*tcp) <= data_end ) {
>> + tcp = data + sizeof(*eth) + 40;
>> + if (tcp->ack &&
>> + tcp->doff*4 == htons(iph6->payload_len))
>> + return -1;
>
> ditto here.
>
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Example: Move acks into a priority queue:
>> +
>> +IFACE=eth0
>> +tc qdisc del dev $IFACE root 2> /dev/null
>> +tc qdisc add dev $IFACE root handle 1: prio bands 3 \
>> + priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>> +tc qdisc add dev $IFACE parent 1:1 handle 10:1 sfq headdrop # acks only
>> +tc qdisc add dev $IFACE parent 1:2 handle 20:1 fq_codel # all other traffic
>> +tc qdisc add dev $IFACE parent 1:3 handle 30:1 fq_codel # unused
>> +tc filter add dev $IFACE parent 1: prio 1 bpf \
>> + object-file ack_recognize.o flowid 1:1
>> +
>> +Please note that a strict priority queue is not a good idea (drr would be
>> +better), nor is doing any level of prioritization on acks at all....
>> +*/
>> +
>> +BPF_LICENSE("GPL");
>>
>
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
^ permalink raw reply
* Re: linux-next: manual merge of the bpf-next tree with the bpf tree
From: Song Liu @ 2018-05-02 4:40 UTC (permalink / raw)
To: Stephen Rothwell
Cc: Daniel Borkmann, Alexei Starovoitov, Networking,
Linux-Next Mailing List, Linux Kernel Mailing List, Yonghong Song
In-Reply-To: <20180502120921.654cc338@canb.auug.org.au>
> On May 1, 2018, at 7:09 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi all,
>
> Today's linux-next merge of the bpf-next tree got a conflict in:
>
> tools/testing/selftests/bpf/test_progs.c
>
> between commit:
>
> a4e21ff8d9a3 ("bpf: minor fix to selftest test_stacktrace_build_id()")
>
> from the bpf tree and commit:
>
> 79b453501310 ("tools/bpf: add a test for bpf_get_stack with tracepoint prog")
>
> from the bpf-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging. You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc tools/testing/selftests/bpf/test_progs.c
> index fac581f1c57f,aa336f0abebc..000000000000
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@@ -1137,9 -1193,14 +1193,14 @@@ static void test_stacktrace_build_id(vo
> err, errno))
> goto disable_pmu;
>
> + stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
> + if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
> + "err %d errno %d\n", err, errno))
> + goto disable_pmu;
> +
> assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
> == 0);
> - assert(system("./urandom_read if=/dev/urandom of=/dev/zero count=4 2> /dev/null") == 0);
> + assert(system("./urandom_read") == 0);
> /* disable stack trace collection */
> key = 0;
> val = 1;
> @@@ -1188,8 -1249,15 +1249,15 @@@
> previous_key = key;
> } while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
>
> - CHECK(build_id_matches < 1, "build id match",
> - "Didn't find expected build ID from the map\n");
> + if (CHECK(build_id_matches < 1, "build id match",
> - "Didn't find expected build ID from the map"))
> ++ "Didn't find expected build ID from the map\n"))
^^^ Is there a "+" at the beginning of this line?
Thanks,
Song
> + goto disable_pmu;
> +
> + stack_trace_len = PERF_MAX_STACK_DEPTH
> + * sizeof(struct bpf_stack_build_id);
> + err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
> + CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
> + "err %d errno %d\n", err, errno);
>
> disable_pmu:
> ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
^ permalink raw reply
* Re: [PATCH bpf-next] bpf/verifier: enable ctx + const + 0.
From: William Tu @ 2018-05-02 4:35 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Linux Kernel Network Developers, Yonghong Song, Yifeng Sun
In-Reply-To: <20180501231652.ec563qza4c2nayhx@ast-mbp>
On Tue, May 1, 2018 at 4:16 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Apr 30, 2018 at 10:15:05AM -0700, William Tu wrote:
>> Existing verifier does not allow 'ctx + const + const'. However, due to
>> compiler optimization, there is a case where BPF compilerit generates
>> 'ctx + const + 0', as shown below:
>>
>> 599: (1d) if r2 == r4 goto pc+2
>> R0=inv(id=0) R1=ctx(id=0,off=40,imm=0)
>> R2=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
>> R3=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)) R4=inv0
>> R6=ctx(id=0,off=0,imm=0) R7=inv2
>> 600: (bf) r1 = r6 // r1 is ctx
>> 601: (07) r1 += 36 // r1 has offset 36
>> 602: (61) r4 = *(u32 *)(r1 +0) // r1 + 0
>> dereference of modified ctx ptr R1 off=36+0, ctx+const is allowed,
>> ctx+const+const is not
>>
>> The reason for BPF backend generating this code is due optimization
>> likes this, explained from Yonghong:
>> if (...)
>> *(ctx + 60)
>> else
>> *(ctx + 56)
>>
>> The compiler translates it to
>> if (...)
>> ptr = ctx + 60
>> else
>> ptr = ctx + 56
>> *(ptr + 0)
>>
>> So load ptr memory become an example of 'ctx + const + 0'. This patch
>> enables support for this case.
>>
>> Fixes: f8ddadc4db6c7 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
>> Cc: Yonghong Song <yhs@fb.com>
>> Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
>> Signed-off-by: William Tu <u9012063@gmail.com>
>> ---
>> kernel/bpf/verifier.c | 2 +-
>> tools/testing/selftests/bpf/test_verifier.c | 13 +++++++++++++
>> 2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 712d8655e916..c9a791b9cf2a 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -1638,7 +1638,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>> /* ctx accesses must be at a fixed offset, so that we can
>> * determine what type of data were returned.
>> */
>> - if (reg->off) {
>> + if (reg->off && off != reg->off) {
>> verbose(env,
>> "dereference of modified ctx ptr R%d off=%d+%d, ctx+const is allowed, ctx+const+const is not\n",
>> regno, reg->off, off - reg->off);
>> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
>> index 1acafe26498b..95ad5d5723ae 100644
>> --- a/tools/testing/selftests/bpf/test_verifier.c
>> +++ b/tools/testing/selftests/bpf/test_verifier.c
>> @@ -8452,6 +8452,19 @@ static struct bpf_test tests[] = {
>> .prog_type = BPF_PROG_TYPE_SCHED_CLS,
>> },
>> {
>> + "arithmetic ops make PTR_TO_CTX + const + 0 valid",
>> + .insns = {
>> + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
>> + offsetof(struct __sk_buff, data) -
>> + offsetof(struct __sk_buff, mark)),
This is:
r1 += N // r1 has offset N: the offset between data and mark)
>> + BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 0),
This is:
r0 = *(u32 *)(r1 +0) // r1 + 0
So the above two lines create similar case I hit
601: (07) r1 += 36 // r1 has offset 36
602: (61) r4 = *(u32 *)(r1 +0) // r1 + 0
>
> How rewritten code looks here?
>
> The patch is allowing check_ctx_access() to proceed with sort-of
> correct 'off' and remember ctx_field_size,
> but in convert_ctx_accesses() it's using insn->off to do conversion.
> Which is zero in this case, so it will convert
> struct __sk_buff {
> __u32 len; // offset 0
>
> into access of 'struct sk_buff'->len
> and then will add __sk_buff's &data - &mark delta to in-kernel len field.
> Which will point to some random field further down in struct sk_buff.
> Doesn't look correct at all.
why?
So it points to ctx + "offsetof(struct __sk_buff, data) -
offsetof(struct __sk_buff, mark)",
which is ctx + const
then I tested that 'ctx + const + 0' should pass the verifier
> How did you test this patch?
>
Without the patch, the test case will fail.
With the patch, the test case passes.
William
^ permalink raw reply
* Re: [PATCH iproute2-next] Add tc-BPF example for a TCP pure ack recognizer
From: David Ahern @ 2018-05-02 4:12 UTC (permalink / raw)
To: Dave Taht, netdev
In-Reply-To: <1525199561-9302-1-git-send-email-dave.taht@gmail.com>
On 5/1/18 12:32 PM, Dave Taht wrote:
> ack_recognize can shift pure tcp acks into another flowid.
> ---
> examples/bpf/ack_recognize.c | 98 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 98 insertions(+)
> create mode 100644 examples/bpf/ack_recognize.c
>
> diff --git a/examples/bpf/ack_recognize.c b/examples/bpf/ack_recognize.c
> new file mode 100644
> index 0000000..5da620c
> --- /dev/null
> +++ b/examples/bpf/ack_recognize.c
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2017 Google Inc.
2017?? it's 2018. Did you write this last year and just now posting?
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA.
> + */
> +
> +/*
> + * Author: dave.taht@gmail.com (Dave Taht)
> + *
> + * ack_recognizer: An eBPF program that correctly recognizes modern TCP ACKs,
> + * with tcp option fields like SACK and timestamps, and no additional data.
> + *
> + * ack_match call: Recognize "pure acks" with no data payload
> + *
> + */
> +
> +#include "bpf_api.h"
> +#include "linux/if_ether.h"
> +#include "linux/ip.h"
> +#include "linux/in.h"
> +#include "linux/ipv6.h"
> +#include "linux/tcp.h"
> +
> +/*
> + * A pure ack contains the ip header, the tcp header + options, flags with the
> + * ack field set, and no additional payload. That last bit is what every prior
> + * ack filter gets wrong, they typically assume an obsolete 64 bytes, and don't
> + * calculate the options (like sack or timestamps) to subtract from the payload.
> + */
> +
> +__section_cls_entry
> +int ack_match(struct __sk_buff *skb)
> +{
> + void *data = (void *)(long)skb->data;
> + void *data_end = (void *)(long)skb->data_end;
> + struct ethhdr *eth = data;
> + struct iphdr *iph = data + sizeof(*eth);
> + struct tcphdr *tcp;
> +
> + if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*tcp) > data_end)
> + return 0;
> +
> + if (eth->h_proto == htons(ETH_P_IP) &&
> + iph->version == 4) {
Why not make the version == 4 under the proto check?
if (eth->h_proto == htons(ETH_P_IP) {
if (iph->version != 4)
return 0;
if the eth proto is ETH_P_IP, and version != 4 something is really wrong
> + if(iph->protocol == IPPROTO_TCP &&
> + iph->ihl == 5 &&
> + data + sizeof(*eth) + 20 + sizeof(*tcp) <= data_end) {
> + tcp = data + sizeof(*eth) + 20;
> + if (tcp->ack &&
> + htons(iph->tot_len) == 20 + tcp->doff*4)
> + return -1;
And then here why the magic '20' and limits on ihl = 5? both are trivial
to accommodate programmatically.
> + }
> + } else if (eth->h_proto == htons(ETH_P_IPV6) &&
> + iph->version == 6) {
> + struct ipv6hdr *iph6 = (struct ipv6hdr *) iph;
> + if(iph6->nexthdr == IPPROTO_TCP &&
> + data + sizeof(*eth) + 40 + sizeof(*tcp) <= data_end ) {
> + tcp = data + sizeof(*eth) + 40;
> + if (tcp->ack &&
> + tcp->doff*4 == htons(iph6->payload_len))
> + return -1;
ditto here.
> + }
> + }
> +
> + return 0;
> +}
> +
> +/* Example: Move acks into a priority queue:
> +
> +IFACE=eth0
> +tc qdisc del dev $IFACE root 2> /dev/null
> +tc qdisc add dev $IFACE root handle 1: prio bands 3 \
> + priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
> +tc qdisc add dev $IFACE parent 1:1 handle 10:1 sfq headdrop # acks only
> +tc qdisc add dev $IFACE parent 1:2 handle 20:1 fq_codel # all other traffic
> +tc qdisc add dev $IFACE parent 1:3 handle 30:1 fq_codel # unused
> +tc filter add dev $IFACE parent 1: prio 1 bpf \
> + object-file ack_recognize.o flowid 1:1
> +
> +Please note that a strict priority queue is not a good idea (drr would be
> +better), nor is doing any level of prioritization on acks at all....
> +*/
> +
> +BPF_LICENSE("GPL");
>
^ permalink raw reply
* Re: [RFC v3 0/5] virtio: support packed ring
From: Jason Wang @ 2018-05-02 3:49 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann
In-Reply-To: <20180425051550.24342-1-tiwei.bie@intel.com>
On 2018年04月25日 13:15, Tiwei Bie wrote:
> Hello everyone,
>
> This RFC implements packed ring support in virtio driver.
>
> Some simple functional tests have been done with Jason's
> packed ring implementation in vhost:
>
> https://lkml.org/lkml/2018/4/23/12
>
> Both of ping and netperf worked as expected (with EVENT_IDX
> disabled). But there are below known issues:
>
> 1. Reloading the guest driver will break the Tx/Rx;
It looks like the reason is we don't sync wrap counter information
between host and qemu through VHOST_SET/GET_VRING_BASE. And both vhost
and qemu need to do this through encoding warp counters to higher bits
of vhost_vring_state.num,
Thanks
^ permalink raw reply
* Re: [PATCH RFC 4/9] veth: Use NAPI for XDP
From: Toshiaki Makita @ 2018-05-02 3:37 UTC (permalink / raw)
To: Jesper Dangaard Brouer; +Cc: Toshiaki Makita, netdev
In-Reply-To: <20180501104306.33c65bd8@redhat.com>
On 18/05/01 (火) 17:43, Jesper Dangaard Brouer wrote:
> On Tue, 1 May 2018 17:02:34 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
>
>> On 2018/05/01 16:50, Jesper Dangaard Brouer wrote:
>>> On Tue, 24 Apr 2018 23:39:18 +0900
>>> Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:
>>>
>>>> +static int veth_xdp_enqueue(struct veth_priv *priv, void *ptr)
>>>> +{
>>>> + if (unlikely(ptr_ring_produce(&priv->xdp_ring, ptr)))
>>>> + return -ENOSPC;
>>>> +
>>>> + return 0;
>>>> +}
>>>
>>> Here we have a lock per (enqueued) packet. I'm working on changing the
>>> ndo_xdp_xmit API to allow bulking. And the tun driver have exact same
>>> issue/need.
>>
>> Probably I should have noted in commitlog, but this per-packet lock is
>> removed in patch 9.
>> I'm curious about if any change is needed by your new API.
>
> Again, I'm just moving this into the generic code, to avoid having to
> implement this for every driver.
>
> Plus, with CONFIG_RETPOLINE we have the advantage of only calling
> ndo_xdp_xmit once (indirect function pointer call).
Nice. Once it becomes available I'll use it instead.
Toshiaki Makita
^ permalink raw reply
* [RFC PATCH] bpf: __pcpu_scope_irq_work can be static
From: kbuild test robot @ 2018-05-02 3:34 UTC (permalink / raw)
To: Song Liu
Cc: kbuild-all, netdev, Song Liu, kernel-team, Alexei Starovoitov,
Daniel Borkmann, Peter Zijlstra
In-Reply-To: <20180502000220.2585320-1-songliubraving@fb.com>
Fixes: 1bd02fb7c12e ("bpf: enable stackmap with build_id in nmi context")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
stackmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index c33fec1..5d0a951 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -48,7 +48,7 @@ static void up_read_work(struct irq_work *entry)
work->sem = NULL;
}
-DEFINE_PER_CPU(struct stack_map_irq_work, irq_work);
+static DEFINE_PER_CPU(struct stack_map_irq_work, irq_work);
static inline bool stack_map_use_build_id(struct bpf_map *map)
{
^ permalink raw reply related
* Re: [PATCH bpf-next 1/2] bpf: enable stackmap with build_id in nmi context
From: kbuild test robot @ 2018-05-02 3:34 UTC (permalink / raw)
To: Song Liu
Cc: kbuild-all, netdev, Song Liu, kernel-team, Alexei Starovoitov,
Daniel Borkmann, Peter Zijlstra
In-Reply-To: <20180502000220.2585320-1-songliubraving@fb.com>
Hi Song,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on bpf-next/master]
url: https://github.com/0day-ci/linux/commits/Song-Liu/bpf-enable-stackmap-with-build_id-in-nmi-context/20180502-081727
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
>> kernel/bpf/stackmap.c:51:1: sparse: symbol '__pcpu_scope_irq_work' was not declared. Should it be static?
Please review and possibly fold the followup patch.
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [PATCH RFC 6/9] veth: Add ndo_xdp_xmit
From: Toshiaki Makita @ 2018-05-02 3:33 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Toshiaki Makita, netdev, Tariq Toukan, Daniel Borkmann,
Alexei Starovoitov, Eran Ben Elisha
In-Reply-To: <20180501101420.544ff225@redhat.com>
On 18/05/01 (火) 17:14, Jesper Dangaard Brouer wrote:
> On Tue, 1 May 2018 10:02:01 +0900 Toshiaki Makita
> <makita.toshiaki@lab.ntt.co.jp> wrote:
>
>> On 2018/05/01 2:27, Jesper Dangaard Brouer wrote:
>>> On Thu, 26 Apr 2018 19:52:40 +0900 Toshiaki Makita
>>> <makita.toshiaki@lab.ntt.co.jp> wrote:
>>>
>>>> On 2018/04/26 5:24, Jesper Dangaard Brouer wrote:
>>>>> On Tue, 24 Apr 2018 23:39:20 +0900 Toshiaki Makita
>>>>> <toshiaki.makita1@gmail.com> wrote:
>>>>>
>>>>>> +static int veth_xdp_xmit(struct net_device *dev, struct
>>>>>> xdp_frame *frame) +{ + struct veth_priv *rcv_priv, *priv =
>>>>>> netdev_priv(dev); + int headroom = frame->data - (void
>>>>>> *)frame; + struct net_device *rcv; + int err = 0; + + rcv
>>>>>> = rcu_dereference(priv->peer); + if (unlikely(!rcv)) +
>>>>>> return -ENXIO; + + rcv_priv = netdev_priv(rcv); + /*
>>>>>> xdp_ring is initialized on receive side? */ + if
>>>>>> (rcu_access_pointer(rcv_priv->xdp_prog)) { + err =
>>>>>> xdp_ok_fwd_dev(rcv, frame->len); + if (unlikely(err)) +
>>>>>> return err; + + err = veth_xdp_enqueue(rcv_priv,
>>>>>> veth_xdp_to_ptr(frame)); + } else { + struct sk_buff *skb;
>>>>>> + + skb = veth_build_skb(frame, headroom, frame->len, 0);
>>>>>> + if (unlikely(!skb)) + return -ENOMEM; + + /* Get page
>>>>>> ref in case skb is dropped in netif_rx. + * The caller is
>>>>>> responsible for freeing the page on error. + */ +
>>>>>> get_page(virt_to_page(frame->data));
>>>>>
>>>>> I'm not sure you can make this assumption, that xdp_frames
>>>>> coming from another device driver uses a refcnt based memory
>>>>> model. But maybe I'm confused, as this looks like an SKB
>>>>> receive path, but in the ndo_xdp_xmit().
>>>>
>>>> I find this path similar to cpumap, which creates skb from
>>>> redirected xdp frame. Once it is converted to skb, skb head is
>>>> freed by page_frag_free, so anyway I needed to get the
>>>> refcount here regardless of memory model.
>>>
>>> Yes I know, I wrote cpumap ;-)
>>>
>>> First of all, I don't want to see such xdp_frame to SKB
>>> conversion code in every driver. Because that increase the
>>> chances of errors. And when looking at the details, then it
>>> seems that you have made the mistake of making it possible to
>>> leak xdp_frame info to the SKB (which cpumap takes into
>>> account).
>>
>> Do you mean leaving xdp_frame in skb->head is leaking something?
>> how?
>
> Like commit 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp
> headroom") and commit 6dfb970d3dbd ("xdp: avoid leaking info stored
> in frame data on page reuse").
Thanks for sharing the info.
> But this time, the concern is a bpf_prog attached at TC/bpf_cls
> level, that can that can adjust head via bpf_skb_change_head (for
> XDP it is bpf_xdp_adjust_head) into the area used by xdp_frame. As
> described in https://git.kernel.org/davem/net-next/c/6dfb970d3dbd in
> is not super critical at the moment, as this _currently_ runs as
> CAP_SYS_ADMIN, but we would like to move towards CAP_NET_ADMIN.
What I don't get is why special casing xdp_frame info. My assumption is
that the area above skb->mac_header is uninit kernel memory and should
not be readable by unprivileged users anyway. So I didn't clear the area
at this point.
>>> Second, I think the refcnt scheme here is wrong. The xdp_frame
>>> should be "owned" by XDP and have the proper refcnt to deliver
>>> it directly to the network stack.
>>>
>>> Third, if we choose that we want a fallback, in-case XDP is not
>>> enabled on egress dev (but it have an ndo_xdp_xmit), then it
>>> should be placed in the generic/core code. E.g.
>>> __bpf_tx_xdp_map() could look at the return code from
>>> dev->netdev_ops->ndo_xdp() and create an SKB. (Hint, this would
>>> make it easy to implement TX bulking towards the dev).
>>
>> Right, this is a much cleaner way.
>>
>> Although I feel like we should add this fallback for veth because
>> it requires something which is different from other drivers
>> (enabling XDP on the peer device of the egress device),
>
> (This is why I Cc'ed Tariq...)
>
> This is actually a general problem with the xdp "xmit" side (and not
> specific to veth driver). The problem exists for other drivers as
> well.
>
> The problem is that a driver can implement ndo_xdp_xmit(), but the
> driver might not have allocated the necessary XDP TX-queue resources
> yet (or it might not be possible due to system resource limits).
>
> The current "hack" is to load an XDP prog on the egress device, and
> then assume that this cause the driver to also allocate the XDP
> ndo_xdo_xmit side HW resources. This is IMHO a wrong assumption.
>
> We need a more generic way to test if a net_device is "ready/enabled"
> for handling xdp_frames via ndo_xdp_xmit. And Tariq had some ideas
> on how to implement this...
My assumption on REDIRECT requirement came from this.
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=306da4e685b
I guess you are saying thing are changing, and having an XDP program
attached on the egress device is no longer generally sufficient. Looking
forward to Tariq's solution.
Toshiaki Makita
>
> My opinion is that it is a waste of (HW/mem) resources to always
> allocate resources for ndo_xdp_xmit when loading an XDP program.
> Because what if my use-cases are XDP_DROP DDoS filter, or CPUMAP
> redirect load-balancer, then I don't want/need ndo_xdp_xmit. E.g.
> today using ixgbe on machines with more than 96 CPUs, will fail due
> to limited TX queue resources. Thus, blocking the mentioned
> use-cases.
>
>
>> I'll drop the part for now. It should not be resolved in the driver
>> code.
>
> Thank you.
>
^ permalink raw reply
* Re: [PATCH V2 net-next 5/6] macvlan/macvtap: Add support for SCTP checksum offload.
From: Michael S. Tsirkin @ 2018-05-02 3:24 UTC (permalink / raw)
To: Vladislav Yasevich
Cc: netdev, linux-sctp, virtualization, virtio-dev, jasowang, nhorman,
marcelo.leitner, Vladislav Yasevich
In-Reply-To: <20180502020739.19239-6-vyasevic@redhat.com>
On Tue, May 01, 2018 at 10:07:38PM -0400, Vladislav Yasevich wrote:
> Since we now have support for software CRC32c offload, turn it on
> for macvlan and macvtap devices so that guests can take advantage
> of offload SCTP checksums to the host or host hardware.
>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
> drivers/net/macvlan.c | 5 +++--
> drivers/net/tap.c | 8 +++++---
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index 725f4b4..646b730 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -834,7 +834,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
>
> #define ALWAYS_ON_OFFLOADS \
> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \
> - NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL)
> + NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL | NETIF_F_SCTP_CRC)
>
> #define ALWAYS_ON_FEATURES (ALWAYS_ON_OFFLOADS | NETIF_F_LLTX)
>
> @@ -842,7 +842,8 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
> (NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
> NETIF_F_GSO | NETIF_F_TSO | NETIF_F_LRO | \
> NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
> - NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
> + NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
> + NETIF_F_SCTP_CRC)
>
> #define MACVLAN_STATE_MASK \
> ((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index 9b6cb78..2c8512b 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -369,8 +369,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
> * check, we either support them all or none.
> */
> if (skb->ip_summed == CHECKSUM_PARTIAL &&
> - !(features & NETIF_F_CSUM_MASK) &&
> - skb_checksum_help(skb))
> + skb_csum_hwoffload_help(skb, features))
> goto drop;
> if (ptr_ring_produce(&q->ring, skb))
> goto drop;
> @@ -945,6 +944,9 @@ static int set_offload(struct tap_queue *q, unsigned long arg)
> }
> }
>
> + if (arg & TUN_F_SCTP_CSUM)
> + feature_mask |= NETIF_F_SCTP_CRC;
> +
so this still affects TX, shouldn't this affect RX instead?
> /* tun/tap driver inverts the usage for TSO offloads, where
> * setting the TSO bit means that the userspace wants to
> * accept TSO frames and turning it off means that user space
> @@ -1077,7 +1079,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
> case TUNSETOFFLOAD:
> /* let the user check for future flags */
> if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> - TUN_F_TSO_ECN | TUN_F_UFO))
> + TUN_F_TSO_ECN | TUN_F_UFO | TUN_F_SCTP_CSUM))
> return -EINVAL;
>
> rtnl_lock();
> --
> 2.9.5
^ permalink raw reply
* Re: [PATCH V2 net-next 1/6] virtio: Add support for SCTP checksum offloading
From: Michael S. Tsirkin @ 2018-05-02 3:16 UTC (permalink / raw)
To: Vladislav Yasevich
Cc: virtio-dev, marcelo.leitner, nhorman, netdev, virtualization,
linux-sctp
In-Reply-To: <20180502020739.19239-2-vyasevic@redhat.com>
On Tue, May 01, 2018 at 10:07:34PM -0400, Vladislav Yasevich wrote:
> To support SCTP checksum offloading, we need to add a new feature
> to virtio_net, so we can negotiate support between the hypervisor
> and the guest.
> The HOST feature bit signifies offloading support for transmit and
> enables device offload features.
> The GUEST feature bit signifies offloading support of recieve and
> is currently only used by the driver in case of xdp.
>
> That patch also adds an addition virtio_net header flag which
> mirrors the skb->csum_not_inet flag. This flags is used to indicate
> that is this an SCTP packet that needs its checksum computed by the
> lower layer. In this case, the lower layer is the host hypervisor or
> possibly HW nic that supporst CRC32c offload.
>
> In the case that GUEST feature bit is flag, it will be possible to
> receive a virtio_net header with this bit set, which will set the
> corresponding skb bit. SCTP protocol will be updated to correctly
> deal with it.
>
> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
> ---
> drivers/net/virtio_net.c | 14 +++++++++++++-
> include/linux/virtio_net.h | 6 ++++++
> include/uapi/linux/virtio_net.h | 5 +++++
> 3 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7b187ec..34af280 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2148,6 +2148,8 @@ static int virtnet_clear_guest_offloads(struct virtnet_info *vi)
>
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_SCTP_CSUM))
> + offloads |= 1ULL << VIRTIO_NET_F_GUEST_SCTP_CSUM;
>
> return virtnet_set_guest_offloads(vi, offloads);
> }
> @@ -2160,6 +2162,8 @@ static int virtnet_restore_guest_offloads(struct virtnet_info *vi)
> return 0;
> if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))
> offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
> + if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_SCTP_CSUM))
> + offloads |= 1ULL << VIRTIO_NET_F_GUEST_SCTP_CSUM;
>
> return virtnet_set_guest_offloads(vi, offloads);
> }
> @@ -2724,6 +2728,7 @@ static int virtnet_probe(struct virtio_device *vdev)
> /* Do we support "hardware" checksums? */
> if (virtio_has_feature(vdev, VIRTIO_NET_F_CSUM)) {
> /* This opens up the world of extra features. */
> +
> dev->hw_features |= NETIF_F_HW_CSUM | NETIF_F_SG;
> if (csum)
> dev->features |= NETIF_F_HW_CSUM | NETIF_F_SG;
> @@ -2746,9 +2751,15 @@ static int virtnet_probe(struct virtio_device *vdev)
> dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
> /* (!csum && gso) case will be fixed by register_netdev() */
> }
> +
> if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> dev->features |= NETIF_F_RXCSUM;
>
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_SCTP_CSUM)) {
> + dev->hw_features |= NETIF_F_SCTP_CRC;
> + dev->features |= NETIF_F_SCTP_CRC;
> + }
> +
> dev->vlan_features = dev->features;
>
> /* MTU range: 68 - 65535 */
> @@ -2962,7 +2973,8 @@ static struct virtio_device_id id_table[] = {
> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> VIRTIO_NET_F_CTRL_MAC_ADDR, \
> VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> - VIRTIO_NET_F_SPEED_DUPLEX
> + VIRTIO_NET_F_SPEED_DUPLEX, \
> + VIRTIO_NET_F_HOST_SCTP_CSUM, VIRTIO_NET_F_GUEST_SCTP_CSUM
>
> static unsigned int features[] = {
> VIRTNET_FEATURES,
> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
> index f144216..28fffdc 100644
> --- a/include/linux/virtio_net.h
> +++ b/include/linux/virtio_net.h
> @@ -39,6 +39,9 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
>
> if (!skb_partial_csum_set(skb, start, off))
> return -EINVAL;
> +
> + if (hdr->flags & VIRTIO_NET_HDR_F_CSUM_NOT_INET)
> + skb->csum_not_inet = 1;
> }
>
> if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> @@ -91,6 +94,9 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
> skb_checksum_start_offset(skb));
> hdr->csum_offset = __cpu_to_virtio16(little_endian,
> skb->csum_offset);
> +
> + if (skb->csum_not_inet)
> + hdr->flags |= VIRTIO_NET_HDR_F_CSUM_NOT_INET;
> } else if (has_data_valid &&
> skb->ip_summed == CHECKSUM_UNNECESSARY) {
> hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 5de6ed3..9dfca1a 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,10 @@
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>
> +#define VIRTIO_NET_F_GUEST_SCTP_CSUM 61 /* Guest handles SCTP pks w/ partial
> + * csum */
> +#define VIRTIO_NET_F_HOST_SCTP_CSUM 62 /* HOST handles SCTP pkts w/ partial
> + * csum */
> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>
> #ifndef VIRTIO_NET_NO_LEGACY
> @@ -101,6 +105,7 @@ struct virtio_net_config {
> struct virtio_net_hdr_v1 {
> #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
> #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */
> +#define VIRTIO_NET_HDR_F_CSUM_NOT_INET 4 /* Checksum is not inet */
Both comment and name are not very informative.
How about just saying CRC32c ?
> __u8 flags;
> #define VIRTIO_NET_HDR_GSO_NONE 0 /* Not a GSO frame */
> #define VIRTIO_NET_HDR_GSO_TCPV4 1 /* GSO frame, IPv4 TCP (TSO) */
> --
> 2.9.5
^ permalink raw reply
* Re: [RFC v3 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-05-02 2:51 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180425051550.24342-5-tiwei.bie@intel.com>
On 2018年04月25日 13:15, Tiwei Bie wrote:
> This commit introduces the event idx support in packed
> ring. This feature is temporarily disabled, because the
> implementation in this patch may not work as expected,
> and some further discussions on the implementation are
> needed, e.g. do we have to check the wrap counter when
> checking whether a kick is needed?
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> drivers/virtio/virtio_ring.c | 53 ++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 0181e93897be..b1039c2985b9 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -986,7 +986,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 flags;
> + u16 new, old, off_wrap, flags;
> bool needs_kick;
> u32 snapshot;
>
> @@ -995,7 +995,12 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> * suppressions. */
> virtio_mb(vq->weak_barriers);
>
> + old = vq->next_avail_idx - vq->num_added;
> + new = vq->next_avail_idx;
> + vq->num_added = 0;
> +
> snapshot = *(u32 *)vq->vring_packed.device;
> + off_wrap = virtio16_to_cpu(_vq->vdev, snapshot & 0xffff);
> flags = cpu_to_virtio16(_vq->vdev, snapshot >> 16) & 0x3;
>
> #ifdef DEBUG
> @@ -1006,7 +1011,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> vq->last_add_time_valid = false;
> #endif
>
> - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> + if (flags == VRING_EVENT_F_DESC)
> + needs_kick = vring_need_event(off_wrap & ~(1<<15), new, old);
I wonder whether or not the math is correct. Both new and event are in
the unit of descriptor ring size, but old looks not.
Thanks
> + else
> + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> END_USE(vq);
> return needs_kick;
> }
> @@ -1116,6 +1124,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> if (vq->last_used_idx >= vq->vring_packed.num)
> vq->last_used_idx -= vq->vring_packed.num;
>
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> + virtio_store_mb(vq->weak_barriers,
> + &vq->vring_packed.driver->off_wrap,
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> + (vq->wrap_counter << 15)));
> +
> #ifdef DEBUG
> vq->last_add_time_valid = false;
> #endif
> @@ -1143,10 +1160,17 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + vq->last_used_idx | (vq->wrap_counter << 15));
>
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> virtio_wmb(vq->weak_barriers);
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> }
> @@ -1172,15 +1196,34 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs, used_idx, wrap_counter;
>
> START_USE(vq);
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> +
> + /* TODO: tune this threshold */
> + bufs = (u16)(vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
> +
> + used_idx = vq->last_used_idx + bufs;
> + wrap_counter = vq->wrap_counter;
> +
> + if (used_idx >= vq->vring_packed.num) {
> + used_idx -= vq->vring_packed.num;
> + wrap_counter ^= 1;
> + }
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + used_idx | (wrap_counter << 15));
>
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> virtio_wmb(vq->weak_barriers);
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> }
> @@ -1822,8 +1865,10 @@ void vring_transport_features(struct virtio_device *vdev)
> switch (i) {
> case VIRTIO_RING_F_INDIRECT_DESC:
> break;
> +#if 0
> case VIRTIO_RING_F_EVENT_IDX:
> break;
> +#endif
> case VIRTIO_F_VERSION_1:
> break;
> case VIRTIO_F_IOMMU_PLATFORM:
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH iproute2 1/2] README: update libdb build dependency information
From: Stephen Hemminger @ 2018-05-02 2:37 UTC (permalink / raw)
To: Baruch Siach; +Cc: netdev
In-Reply-To: <1918c6139ac3ee9affe8e62817f30b110ab235c7.1525178588.git.baruch@tkos.co.il>
On Tue, 1 May 2018 15:43:07 +0300
Baruch Siach <baruch@tkos.co.il> wrote:
> Debian does not distribute libdb4.x-dev for quite some time now. Current
> stable carries libdb5.3-dev. Update the wording accordingly.
>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Applied both patches, not sure how many people still use arpd anyway...
^ permalink raw reply
* Re: [PATCH iproute2-master] iproute: Parse last nexthop in a multipath route
From: Stephen Hemminger @ 2018-05-02 2:37 UTC (permalink / raw)
To: Ido Schimmel; +Cc: netdev, dsahern, mlxsw
In-Reply-To: <20180501131635.14981-1-idosch@mellanox.com>
On Tue, 1 May 2018 16:16:35 +0300
Ido Schimmel <idosch@mellanox.com> wrote:
> Continue parsing a multipath payload as long as another nexthop can fit
> in the payload.
>
> # ip route add 192.0.2.0/24 nexthop dev dummy0 nexthop dev dummy1
>
> Before:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
> nexthop dev dummy0 weight 1
>
> After:
> # ip route show 192.0.2.0/24
> 192.0.2.0/24
> nexthop dev dummy0 weight 1
> nexthop dev dummy1 weight 1
>
> Fixes: f48e14880a0e ("iproute: refactor multipath print")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Applied, thanks.
There really ought to be test cases for this.
^ permalink raw reply
* Re: Suggestions on iterating eBPF maps
From: Alexei Starovoitov @ 2018-05-02 2:33 UTC (permalink / raw)
To: Lorenzo Colitti; +Cc: Chenbo Feng, netdev, Daniel Borkmann, Joel Fernandes
In-Reply-To: <CAKD1Yr0QQJwhZDH4uDevcZ8QwzhrVjRQZGDvPg+Q2ONJGAqqiQ@mail.gmail.com>
On Wed, May 02, 2018 at 11:05:19AM +0900, Lorenzo Colitti wrote:
> On Sat, Apr 28, 2018 at 10:04 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > Another approach could be to use map-in-map and have almost atomic
> > replace of the whole map with new potentially empty map. The prog
> > can continue using the new map, while user space walks no longer
> > accessed old map.
>
> That sounds like a promising approach. I assume this would be
> functionally equivalent to an approach where there is a map containing
> a boolean that says whether to write to map A or map B? We'd then do
> the following:
>
> 0. Kernel program is writing to map A.
> 1. Userspace pushes config that says to write to map B.
> 2. Kernel program starts to write to map B.
> 3. Userspace scans map A, collecting stats and deleting everything it finds.
>
> One problem with this is: if the effects of #1 are not immediately
> visible to the programs running on all cores, the program could still
> be writing to map A and the deletes in #3 would result in loss of
> data. Are there any guarantees around this? I know that hash map
> writes are atomic, but I'm not aware of any other guarantees here. Are
> there memory barriers around map writes and reads?
>
> In the absence of guarantees, userspace could put a sleep between #1
> and #3 and things would be correct Most Of The Time(TM), but if the
> kernel is busy doing other things that might not be sufficient.
> Thoughts?
if you use map-in-map you don't need extra boolean map.
0. bpf prog can do
inner_map = lookup(map_in_map, key=0);
lookup(inner_map, your_real_key);
1. user space writes into map_in_map[0] <- FD of new map
2. some cpus are using old inner map and some a new
3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
which will guarantee that progs finished.
4. scan old inner map
^ permalink raw reply
* linux-next: manual merge of the bpf-next tree with the bpf tree
From: Stephen Rothwell @ 2018-05-02 2:09 UTC (permalink / raw)
To: Daniel Borkmann, Alexei Starovoitov, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Song Liu,
Yonghong Song
[-- Attachment #1: Type: text/plain, Size: 2342 bytes --]
Hi all,
Today's linux-next merge of the bpf-next tree got a conflict in:
tools/testing/selftests/bpf/test_progs.c
between commit:
a4e21ff8d9a3 ("bpf: minor fix to selftest test_stacktrace_build_id()")
from the bpf tree and commit:
79b453501310 ("tools/bpf: add a test for bpf_get_stack with tracepoint prog")
from the bpf-next tree.
I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging. You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.
--
Cheers,
Stephen Rothwell
diff --cc tools/testing/selftests/bpf/test_progs.c
index fac581f1c57f,aa336f0abebc..000000000000
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@@ -1137,9 -1193,14 +1193,14 @@@ static void test_stacktrace_build_id(vo
err, errno))
goto disable_pmu;
+ stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
+ if (CHECK(stack_amap_fd < 0, "bpf_find_map stack_amap",
+ "err %d errno %d\n", err, errno))
+ goto disable_pmu;
+
assert(system("dd if=/dev/urandom of=/dev/zero count=4 2> /dev/null")
== 0);
- assert(system("./urandom_read if=/dev/urandom of=/dev/zero count=4 2> /dev/null") == 0);
+ assert(system("./urandom_read") == 0);
/* disable stack trace collection */
key = 0;
val = 1;
@@@ -1188,8 -1249,15 +1249,15 @@@
previous_key = key;
} while (bpf_map_get_next_key(stackmap_fd, &previous_key, &key) == 0);
- CHECK(build_id_matches < 1, "build id match",
- "Didn't find expected build ID from the map\n");
+ if (CHECK(build_id_matches < 1, "build id match",
- "Didn't find expected build ID from the map"))
++ "Didn't find expected build ID from the map\n"))
+ goto disable_pmu;
+
+ stack_trace_len = PERF_MAX_STACK_DEPTH
+ * sizeof(struct bpf_stack_build_id);
+ err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
+ CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
+ "err %d errno %d\n", err, errno);
disable_pmu:
ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE);
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* [PATCH V2 net-next 6/6] ipvlan: Add support for SCTP checksum offload
From: Vladislav Yasevich @ 2018-05-02 2:07 UTC (permalink / raw)
To: netdev
Cc: linux-sctp, virtualization, virtio-dev, mst, jasowang, nhorman,
marcelo.leitner, Vladislav Yasevich
In-Reply-To: <20180502020739.19239-1-vyasevic@redhat.com>
Since ipvlan is a software device, we can turn on SCTP checksum
offload and let the checksum be computed at the lower layer.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/ipvlan/ipvlan_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 450eec2..ddb1590 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -161,7 +161,8 @@ static void ipvlan_port_destroy(struct net_device *dev)
(NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
NETIF_F_GSO | NETIF_F_TSO | NETIF_F_GSO_ROBUST | \
NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
- NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
+ NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
+ NETIF_F_SCTP_CRC)
#define IPVLAN_STATE_MASK \
((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
--
2.9.5
^ permalink raw reply related
* [PATCH V2 net-next 5/6] macvlan/macvtap: Add support for SCTP checksum offload.
From: Vladislav Yasevich @ 2018-05-02 2:07 UTC (permalink / raw)
To: netdev
Cc: linux-sctp, virtualization, virtio-dev, mst, jasowang, nhorman,
marcelo.leitner, Vladislav Yasevich
In-Reply-To: <20180502020739.19239-1-vyasevic@redhat.com>
Since we now have support for software CRC32c offload, turn it on
for macvlan and macvtap devices so that guests can take advantage
of offload SCTP checksums to the host or host hardware.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/macvlan.c | 5 +++--
drivers/net/tap.c | 8 +++++---
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 725f4b4..646b730 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -834,7 +834,7 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
#define ALWAYS_ON_OFFLOADS \
(NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE | \
- NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL)
+ NETIF_F_GSO_ROBUST | NETIF_F_GSO_ENCAP_ALL | NETIF_F_SCTP_CRC)
#define ALWAYS_ON_FEATURES (ALWAYS_ON_OFFLOADS | NETIF_F_LLTX)
@@ -842,7 +842,8 @@ static struct lock_class_key macvlan_netdev_addr_lock_key;
(NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_FRAGLIST | \
NETIF_F_GSO | NETIF_F_TSO | NETIF_F_LRO | \
NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_GRO | NETIF_F_RXCSUM | \
- NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER)
+ NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_STAG_FILTER | \
+ NETIF_F_SCTP_CRC)
#define MACVLAN_STATE_MASK \
((1<<__LINK_STATE_NOCARRIER) | (1<<__LINK_STATE_DORMANT))
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 9b6cb78..2c8512b 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -369,8 +369,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
* check, we either support them all or none.
*/
if (skb->ip_summed == CHECKSUM_PARTIAL &&
- !(features & NETIF_F_CSUM_MASK) &&
- skb_checksum_help(skb))
+ skb_csum_hwoffload_help(skb, features))
goto drop;
if (ptr_ring_produce(&q->ring, skb))
goto drop;
@@ -945,6 +944,9 @@ static int set_offload(struct tap_queue *q, unsigned long arg)
}
}
+ if (arg & TUN_F_SCTP_CSUM)
+ feature_mask |= NETIF_F_SCTP_CRC;
+
/* tun/tap driver inverts the usage for TSO offloads, where
* setting the TSO bit means that the userspace wants to
* accept TSO frames and turning it off means that user space
@@ -1077,7 +1079,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
case TUNSETOFFLOAD:
/* let the user check for future flags */
if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
- TUN_F_TSO_ECN | TUN_F_UFO))
+ TUN_F_TSO_ECN | TUN_F_UFO | TUN_F_SCTP_CSUM))
return -EINVAL;
rtnl_lock();
--
2.9.5
^ permalink raw reply related
* [PATCH V2 net-next 4/6] tun: Add support for SCTP checksum offload
From: Vladislav Yasevich @ 2018-05-02 2:07 UTC (permalink / raw)
To: netdev
Cc: linux-sctp, virtualization, virtio-dev, mst, jasowang, nhorman,
marcelo.leitner, Vladislav Yasevich
In-Reply-To: <20180502020739.19239-1-vyasevic@redhat.com>
Adds a new tun offload flag to allow for SCTP checksum offload.
The flag has to be set by the user and defaults to "no offload".
Add SCTP checksum support to the supported tun features.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
drivers/net/tun.c | 7 ++++++-
include/uapi/linux/if_tun.h | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a1ba262..4f314a6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -216,7 +216,7 @@ struct tun_struct {
struct net_device *dev;
netdev_features_t set_features;
#define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6)
+ NETIF_F_TSO6|NETIF_F_SCTP_CRC)
int align;
int vnet_hdr_sz;
@@ -2719,6 +2719,11 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
arg &= ~TUN_F_UFO;
}
+ if (arg & TUN_F_SCTP_CSUM) {
+ features |= NETIF_F_SCTP_CRC;
+ arg &= ~TUN_F_SCTP_CSUM;
+ }
+
/* This gives the user a way to test for new features in future by
* trying to set them. */
if (arg)
diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h
index ee432cd..061aea8 100644
--- a/include/uapi/linux/if_tun.h
+++ b/include/uapi/linux/if_tun.h
@@ -86,6 +86,7 @@
#define TUN_F_TSO6 0x04 /* I can handle TSO for IPv6 packets */
#define TUN_F_TSO_ECN 0x08 /* I can handle TSO with ECN bits. */
#define TUN_F_UFO 0x10 /* I can handle UFO packets */
+#define TUN_F_SCTP_CSUM 0x20 /* I can handle SCTP checksums offloads */
/* Protocol info prepended to the packets (when IFF_NO_PI is not set) */
#define TUN_PKT_STRIP 0x0001
--
2.9.5
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox