* [PATCH bpf-next 1/2] selftests/bpf: test_progs: fix verbose mode garbage
From: Stanislav Fomichev @ 2019-08-31 2:34 UTC (permalink / raw)
To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
fseeko(.., 0, SEEK_SET) on a memstream just puts the buffer pointer
to the beginning so when we call fflush on it we get some garbage
log data from the previous test. Let's manually set the terminating
byte to zero at the reported buffer size.
To show the issue consider the following snippet:
stream = open_memstream (&buf, &len);
fprintf(stream, "aaa");
fflush(stream);
printf("buf=%s, len=%zu\n", buf, len);
fseeko(stream, 0, SEEK_SET);
fprintf(stream, "b");
fflush(stream);
printf("buf=%s, len=%zu\n", buf, len);
Output:
buf=aaa, len=3
buf=baa, len=1
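A self-contained variant of the snippet with the terminating byte applied after the flush could look like the sketch below. The helper name rewind_and_write() is made up for illustration and is not part of test_progs.c. The stale-byte behaviour is the one reported above and may differ on other C libraries.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from test_progs.c): rewind a memstream,
 * write new text, flush, then terminate the buffer at the length
 * reported by the stream so stale bytes from the previous write
 * are not printed.
 *
 * buf and len must be the same variables that were passed to
 * open_memstream(); fflush() updates them.
 */
static int rewind_and_write(FILE *stream, char **buf, size_t *len,
			    const char *text)
{
	if (fseeko(stream, 0, SEEK_SET))
		return -1;
	if (fputs(text, stream) == EOF)
		return -1;
	if (fflush(stream))
		return -1;

	/* Same idea as the patch: cut the string at the reported size. */
	(*buf)[*len] = '\0';
	return 0;
}
```

With this helper, printing buf with "%s" after the second write shows only the bytes counted in len.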
Fixes: 946152b3c5d6 ("selftests/bpf: test_progs: switch to open_memstream")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/testing/selftests/bpf/test_progs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index e5892cb60eca..e8616e778cb5 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -45,6 +45,7 @@ static void dump_test_log(const struct prog_test_def *test, bool failed)
if (env.verbose || test->force_log || failed) {
if (env.log_cnt) {
+ env.log_buf[env.log_cnt] = '\0';
fprintf(env.stdout, "%s", env.log_buf);
if (env.log_buf[env.log_cnt - 1] != '\n')
fprintf(env.stdout, "\n");
--
2.23.0.187.g17f5b7556c-goog
^ permalink raw reply related
* Re: [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: David Miller @ 2019-08-31 2:20 UTC (permalink / raw)
To: cpaasch
Cc: jonathan.lemon, stable, gregkh, tim.froidcoeur, matthieu.baerts,
aprout, edumazet, jtl, linux-kernel, mkubecek, ncardwell, sashal,
ycheng, netdev
In-Reply-To: <20190830232657.GL45416@MacBook-Pro-64.local>
From: Christoph Paasch <cpaasch@apple.com>
Date: Fri, 30 Aug 2019 16:26:57 -0700
> (I don't see it in the stable-queue)
I don't handle any stable branch before the most recent two, so this isn't
my territory.
^ permalink raw reply
* [PATCH] net: stmmac: dwmac-sun8i: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized
From: Yizhuo @ 2019-08-31 2:00 UTC (permalink / raw)
Cc: csong, zhiyunq, Yizhuo, Giuseppe Cavallaro, Alexandre Torgue,
Jose Abreu, David S. Miller, Maxime Ripard, Chen-Yu Tsai,
Maxime Coquelin, netdev, linux-arm-kernel, linux-stm32,
linux-kernel
In function sun8i_dwmac_set_syscon(), local variable "val" could
be uninitialized if function regmap_field_read() returns -EINVAL.
However, it will be used directly in the if statement, which
is potentially unsafe.
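The pattern can be modelled in userspace as in the sketch below. mock_field_read() and check_default() are hypothetical stand-ins, not regmap or stmmac functions, and the value 0x50000 is arbitrary. The point is only that the out-parameter is read after the return code has been checked.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in for a register read that can fail. On failure *val is
 * left untouched, which is what makes an unchecked caller unsafe.
 */
static int mock_field_read(int simulate_error, uint32_t *val)
{
	if (simulate_error)
		return -EINVAL;
	*val = 0x50000;	/* arbitrary illustrative register value */
	return 0;
}

/*
 * Compare the register against an expected default. Returns 0 and
 * sets *matches on success, or the negative errno from the read.
 */
static int check_default(int simulate_error, uint32_t expected, int *matches)
{
	uint32_t val;
	int ret;

	ret = mock_field_read(simulate_error, &val);
	if (ret) {
		fprintf(stderr, "failed to read field: %d\n", ret);
		return ret;	/* val is never read on this path */
	}

	*matches = (val == expected);
	return 0;
}
```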
Signed-off-by: Yizhuo <yzhai003@ucr.edu>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
index 4083019c547a..f97a4096f8fc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
@@ -873,7 +873,12 @@ static int sun8i_dwmac_set_syscon(struct stmmac_priv *priv)
int ret;
u32 reg, val;
- regmap_field_read(gmac->regmap_field, &val);
+ ret = regmap_field_read(gmac->regmap_field, &val);
+ if (ret) {
+ dev_err(priv->device, "Fail to read from regmap field.\n");
+ return ret;
+ }
+
reg = gmac->variant->default_syscon_value;
if (reg != val)
dev_warn(priv->device,
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 4.14] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue
From: Christoph Paasch @ 2019-08-30 23:26 UTC (permalink / raw)
To: Jonathan Lemon, stable, gregkh
Cc: Tim Froidcoeur, matthieu.baerts, aprout, davem, edumazet, jtl,
linux-kernel, mkubecek, ncardwell, sashal, ycheng, netdev
In-Reply-To: <400C4757-E7AD-4CCF-8077-79563EA869B1@gmail.com>
Hello,
On 24/08/19 - 15:05:20, Jonathan Lemon wrote:
>
>
> On 23 Aug 2019, at 23:03, Tim Froidcoeur wrote:
>
> > Commit 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> > triggers following stack trace:
> >
> > [25244.848046] kernel BUG at ./include/linux/skbuff.h:1406!
> > [25244.859335] RIP: 0010:skb_queue_prev+0x9/0xc
> > [25244.888167] Call Trace:
> > [25244.889182] <IRQ>
> > [25244.890001] tcp_fragment+0x9c/0x2cf
> > [25244.891295] tcp_write_xmit+0x68f/0x988
> > [25244.892732] __tcp_push_pending_frames+0x3b/0xa0
> > [25244.894347] tcp_data_snd_check+0x2a/0xc8
> > [25244.895775] tcp_rcv_established+0x2a8/0x30d
> > [25244.897282] tcp_v4_do_rcv+0xb2/0x158
> > [25244.898666] tcp_v4_rcv+0x692/0x956
> > [25244.899959] ip_local_deliver_finish+0xeb/0x169
> > [25244.901547] __netif_receive_skb_core+0x51c/0x582
> > [25244.903193] ? inet_gro_receive+0x239/0x247
> > [25244.904756] netif_receive_skb_internal+0xab/0xc6
> > [25244.906395] napi_gro_receive+0x8a/0xc0
> > [25244.907760] receive_buf+0x9a1/0x9cd
> > [25244.909160] ? load_balance+0x17a/0x7b7
> > [25244.910536] ? vring_unmap_one+0x18/0x61
> > [25244.911932] ? detach_buf+0x60/0xfa
> > [25244.913234] virtnet_poll+0x128/0x1e1
> > [25244.914607] net_rx_action+0x12a/0x2b1
> > [25244.915953] __do_softirq+0x11c/0x26b
> > [25244.917269] ? handle_irq_event+0x44/0x56
> > [25244.918695] irq_exit+0x61/0xa0
> > [25244.919947] do_IRQ+0x9d/0xbb
> > [25244.921065] common_interrupt+0x85/0x85
> > [25244.922479] </IRQ>
> >
> > tcp_rtx_queue_tail() (called by tcp_fragment()) can call
> > tcp_write_queue_prev() on the first packet in the queue, which will trigger
> > the BUG in tcp_write_queue_prev(), because there is no previous packet.
> >
> > This happens when the retransmit queue is empty, for example in case of a
> > zero window.
> >
> > Patch is needed for 4.4, 4.9 and 4.14 stable branches.
> >
> > Fixes: 8c3088f895a0 ("tcp: be more careful in tcp_fragment()")
> > Change-Id: I839bde7167ae59e2f7d916c913507372445765c5
> > Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
> > Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
> > Reviewed-by: Christoph Paasch <cpaasch@apple.com>
>
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Just checking in: is the patch getting picked up for -stable?
(I don't see it in the stable-queue)
Thanks,
Christoph
^ permalink raw reply
* Re: [PATCH net] net: Properly update v4 routes with v6 nexthop
From: David Ahern @ 2019-08-31 1:48 UTC (permalink / raw)
To: Donald Sharp, netdev, dsahern, sworley
In-Reply-To: <20190830181446.25262-1-sharpd@cumulusnetworks.com>
On 8/30/19 12:14 PM, Donald Sharp wrote:
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 4c81846ccce8..c7e94edae482 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -513,7 +513,7 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
> struct netlink_callback *cb);
>
> int fib_nexthop_info(struct sk_buff *skb, const struct fib_nh_common *nh,
> - unsigned char *flags, bool skip_oif);
> + u8 family, unsigned char *flags, bool skip_oif);
> int fib_add_nexthop(struct sk_buff *skb, const struct fib_nh_common *nh,
> - int nh_weight);
> + int nh_weight, u8 family);
Call this rt_family in both for 'route family' to make it clear.
> #endif /* _NET_FIB_H */
> diff --git a/include/net/nexthop.h b/include/net/nexthop.h
> index 95f766c31c90..f13c61806abf 100644
> --- a/include/net/nexthop.h
> +++ b/include/net/nexthop.h
> @@ -172,7 +172,7 @@ int nexthop_mpath_fill_node(struct sk_buff *skb, struct nexthop *nh)
nexthop_mpath_fill_node should take the family as input argument and
then ...
> struct fib_nh_common *nhc = &nhi->fib_nhc;
> int weight = nhg->nh_entries[i].weight;
>
> - if (fib_add_nexthop(skb, nhc, weight) < 0)
> + if (fib_add_nexthop(skb, nhc, weight, nhc->nhc_family) < 0)
pass it to fib_add_nexthop.
> return -EMSGSIZE;
> }
>
The rest looks ok to me.
as an FYI for the archives, the fib_nexthops.sh script does show the
unexpected gw for IPv6 but it does not flag it as an error. I need to
fix that so this should have been caught in the original submission.
^ permalink raw reply
* Re: Proposal: r8152 firmware patching framework
From: Amber Chen @ 2019-08-31 0:53 UTC (permalink / raw)
To: Prashant Malani
Cc: Hayes Wang, David Miller, netdev@vger.kernel.org, Bambi Yeh,
Ryankao, Jackc, Albertk, marcochen@google.com, nic_swsd,
Grant Grundler
In-Reply-To: <CACeCKacjCkS5UmzS9irm0JjGmk98uBBBsTLSzrXoDUJ60Be9Vw@mail.gmail.com>
+ acct mgr, Stephen
> On Aug 31, 2019, at 6:24 AM, Prashant Malani <pmalani@chromium.org> wrote:
>
> (Adding a few more Realtek folks)
>
> Friendly ping. Any thoughts / feedback, Realtek folks (and others) ?
>
>> On Thu, Aug 29, 2019 at 11:40 AM Prashant Malani <pmalani@chromium.org> wrote:
>>
>> Hi,
>>
>> The r8152 driver source code distributed by Realtek (on
>> www.realtek.com) contains firmware patches. This involves binary
>> byte-arrays being written byte/word-wise to the hardware memory.
>> Example: grundler@chromium.org (cc-ed) has an experimental patch which
>> includes the firmware patching code which was distributed with the
>> Realtek source :
>> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1417953
>>
>> It would be nice to have a way to incorporate these firmware fixes
>> into the upstream code. Since having indecipherable byte-arrays is not
>> possible upstream, I propose the following:
>> - We use the assistance of Realtek to come up with a format which the
>> firmware patch files can follow (this can be documented in the
>> comments).
>> - A real simple format could look like this:
>> +
>> <section1><size_in_bytes><address1><data1><address2><data2>...<addressN><dataN><section2>...
>> + The driver would be able to understand how to parse
>> each section (e.g is each data entry a byte or a word?)
>>
>> - We use request_firmware() to load the firmware, parse it and write
>> the data to the relevant registers.
>>
>> I'm unfamiliar with what the preferred method of firmware patching is,
>> so I hope the maintainers can help suggest the best path forward.
>>
>> As an aside: It would be great if Realtek could publish a list of
>> fixes that the firmware patches implement (I think a list on the
>> driver download page on the Realtek website would be an excellent
>> starting point).
>>
>> Thanks and Best regards,
>>
>> -Prashant
>
^ permalink raw reply
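To make the proposed <section><size_in_bytes><address><data>... layout concrete, here is a rough parser sketch. Every field width is an assumption, since the proposal leaves the format open: a 16-bit section id, a 16-bit payload size in bytes, and 16-bit address/data pairs, all little-endian. fw_parse() and fw_write_fn are hypothetical names, and nothing here is taken from the r8152 driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Callback invoked once per (address, data) pair; hypothetical. */
typedef void (*fw_write_fn)(uint16_t section, uint16_t addr,
			    uint16_t data, void *ctx);

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk a blob laid out as repeated
 *   <section:u16><size_in_bytes:u16><addr:u16><data:u16>...
 * (assumed widths, see above). Returns the number of entries seen,
 * or -1 if the blob is truncated or a section size is inconsistent.
 * Passing NULL for write only validates the blob.
 */
static int fw_parse(const uint8_t *blob, size_t len,
		    fw_write_fn write, void *ctx)
{
	size_t pos = 0;
	int entries = 0;

	while (pos < len) {
		uint16_t section, size;
		size_t end;

		if (len - pos < 4)
			return -1;	/* truncated section header */
		section = get_le16(blob + pos);
		size = get_le16(blob + pos + 2);
		pos += 4;

		if (size % 4 || size > len - pos)
			return -1;	/* partial entry or overrun */

		end = pos + size;
		for (; pos < end; pos += 4) {
			if (write)
				write(section, get_le16(blob + pos),
				      get_le16(blob + pos + 2), ctx);
			entries++;
		}
	}
	return entries;
}
```

Validating the whole blob with a NULL callback before touching any register keeps a malformed file from being applied halfway.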
* Re: [PATCH 0/5] Netfilter fixes for net
From: David Miller @ 2019-08-31 0:52 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20190830120704.6147-1-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 30 Aug 2019 14:06:59 +0200
> The following patchset contains Netfilter fixes for net:
>
> 1) Spurious warning when loading rules using the physdev match,
> from Todd Seidelmann.
>
> 2) Fix FTP conntrack helper debugging output, from Thomas Jarosch.
>
> 3) Restore per-netns nf_conntrack_{acct,helper,timeout} sysctl knobs,
> from Florian Westphal.
>
> 4) Clear skbuff timestamp from the flowtable datapath, also from Florian.
>
> 5) Fix incorrect byteorder of NFT_META_BRI_IIFVPROTO, from wenxu.
Pulled, thanks.
^ permalink raw reply
* Re: pull-request: bpf 2019-08-31
From: David Miller @ 2019-08-31 0:39 UTC (permalink / raw)
To: daniel; +Cc: ast, netdev, bpf
In-Reply-To: <20190830234006.31988-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 31 Aug 2019 01:40:06 +0200
> The following pull-request contains BPF updates for your *net* tree.
>
> The main changes are:
>
> 1) Fix 32-bit zero-extension during constant blinding which
> has been causing a regression on ppc64, from Naveen.
>
> 2) Fix a latency bug in nfp driver when updating stack index
> register, from Jiong.
Pulled, thanks.
^ permalink raw reply
* Re: [PATCH net-next] bnxt_en: Fix compile error regression with CONFIG_BNXT_SRIOV not set.
From: David Miller @ 2019-08-31 0:38 UTC (permalink / raw)
To: michael.chan; +Cc: netdev, ray.jui
In-Reply-To: <1567206638-22674-1-git-send-email-michael.chan@broadcom.com>
From: Michael Chan <michael.chan@broadcom.com>
Date: Fri, 30 Aug 2019 19:10:38 -0400
> Add a new function bnxt_get_registered_vfs() to handle the work
> of getting the number of registered VFs under #ifdef CONFIG_BNXT_SRIOV.
> The main code will call this function and will always work correctly
> whether CONFIG_BNXT_SRIOV is set or not.
>
> Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.")
> Reported-by: kbuild test robot <lkp@intel.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] net: stmmac: Variable "val" in function sun8i_dwmac_set_syscon() could be uninitialized
From: David Miller @ 2019-08-31 0:37 UTC (permalink / raw)
To: yzhai003
Cc: csong, zhiyunq, peppe.cavallaro, alexandre.torgue, maxime.ripard,
wens, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <CABvMjLRzuUVh7FxVQj2O40Sbr+VygwSG8spMv0fW2RZVvaJ8rQ@mail.gmail.com>
From: Yizhuo Zhai <yzhai003@ucr.edu>
Date: Fri, 30 Aug 2019 15:29:07 -0700
> Thanks for your feedback, this patch should work for v4.14.
You must always submit patches against the current tree.
^ permalink raw reply
* Re: [v2] net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate
From: David Z. Dai @ 2019-08-31 0:30 UTC (permalink / raw)
To: David Miller; +Cc: xiyou.wangcong, jhs, jiri, netdev, linux-kernel, zdai
In-Reply-To: <20190830.133335.323827182628557013.davem@davemloft.net>
On Fri, 2019-08-30 at 13:33 -0700, David Miller wrote:
> From: "David Z. Dai" <zdai@linux.vnet.ibm.com>
> Date: Fri, 30 Aug 2019 15:03:52 -0500
>
> > I have the impression that last parameter num value should be larger
> > than the attribute num value in 2nd parameter (TC_POLICE_RATE64 in this
> > case).
>
> The argument in question is explicitly the "padding" value.
>
> Please explain in detail where you got the impression that the
> argument has to be larger?
In include/uapi/linux/pkt_sched.h header:
For HTB:
enum {
TCA_HTB_UNSPEC,
TCA_HTB_PARMS,
TCA_HTB_INIT,
TCA_HTB_CTAB,
TCA_HTB_RTAB,
TCA_HTB_DIRECT_QLEN,
TCA_HTB_RATE64, /* <--- */
TCA_HTB_CEIL64, /* <--- */
TCA_HTB_PAD, /* <--- */
__TCA_HTB_MAX,
};
/* TCA_HTB_RATE64,TCA_HTB_CEIL64 are declared *before* TCA_HTB_PAD */
For TBF:
enum {
TCA_TBF_UNSPEC,
TCA_TBF_PARMS,
TCA_TBF_RTAB,
TCA_TBF_PTAB,
TCA_TBF_RATE64, /* <--- */
TCA_TBF_PRATE64, /* <--- */
TCA_TBF_BURST,
TCA_TBF_PBURST,
TCA_TBF_PAD, /* <--- */
__TCA_TBF_MAX,
};
/* TCA_TBF_RATE64, TCA_TBF_PRATE64 are declared *before* TCA_TBF_PAD */
For HTB, in net/sched/sch_htb.c file, htb_dump_class() routine:
if ((cl->rate.rate_bytes_ps >= (1ULL << 32)) &&
nla_put_u64_64bit(skb, TCA_HTB_RATE64,
cl->rate.rate_bytes_ps,
TCA_HTB_PAD))
goto nla_put_failure;
if ((cl->ceil.rate_bytes_ps >= (1ULL << 32)) &&
nla_put_u64_64bit(skb, TCA_HTB_CEIL64,
cl->ceil.rate_bytes_ps,
TCA_HTB_PAD))
goto nla_put_failure;
For TBF, in net/sched/sch_tbf.c file, tbf_dump() routine:
if (q->rate.rate_bytes_ps >= (1ULL << 32) &&
nla_put_u64_64bit(skb, TCA_TBF_RATE64,
q->rate.rate_bytes_ps,
TCA_TBF_PAD))
goto nla_put_failure;
if (tbf_peak_present(q) &&
q->peak.rate_bytes_ps >= (1ULL << 32) &&
nla_put_u64_64bit(skb, TCA_TBF_PRATE64,
q->peak.rate_bytes_ps,
TCA_TBF_PAD))
goto nla_put_failure;
The last parameters used, TCA_HTB_PAD and TCA_TBF_PAD, are both declared
*after* those attributes.
I am trying to keep it consistent in police part. That's where my
impression is coming from.
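For context, a rough userspace model of what the pad argument is used for is sketched below. It assumes an architecture where 64-bit values must be 8-byte aligned and a message offset that is already 4-byte aligned. The names pad_needed() and put_u64_len() are illustrative only and are not kernel functions. In this model the pad attribute type only labels a zero-length filler attribute, so its position in the enum does not enter the calculation.

```c
#include <stddef.h>

#define ATTR_HDRLEN 4	/* size of a netlink attribute header */

/*
 * A 64-bit payload follows its 4-byte attribute header. If the
 * header would start on an 8-byte boundary, the payload would land
 * at offset 4 mod 8, so a zero-length pad attribute is emitted first.
 */
static int pad_needed(size_t off)
{
	return off % 8 == 0;
}

/* Bytes appended to the message for one u64 attribute at offset off. */
static size_t put_u64_len(size_t off)
{
	size_t total = 0;

	if (pad_needed(off))
		total += ATTR_HDRLEN;	/* header-only pad attribute */
	total += ATTR_HDRLEN + 8;	/* real header plus u64 payload */
	return total;
}
```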
As a suggestion: do you think it would be better to add a new PAD
attribute in include/uapi/linux/pkt_cls.h like this:
enum {
TCA_POLICE_UNSPEC,
TCA_POLICE_TBF,
TCA_POLICE_RATE,
TCA_POLICE_PEAKRATE,
TCA_POLICE_AVRATE,
TCA_POLICE_RESULT,
TCA_POLICE_TM,
TCA_POLICE_PAD,
TCA_POLICE_RATE64, /* <--- */
TCA_POLICE_PEAKRATE64, /* <--- */
TCA_POLICE_PAD2, /* <--- new PAD */
__TCA_POLICE_MAX
#define TCA_POLICE_RESULT TCA_POLICE_RESULT
};
Then use this TCA_POLICE_PAD2 as the last parameter in
nla_put_u64_64bit()?
Thanks!
^ permalink raw reply
* Re: [PATCH v6 net-next 15/19] ionic: Add Tx and Rx handling
From: Shannon Nelson @ 2019-08-30 23:57 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, davem
In-Reply-To: <20190829161852.1705d770@cakuba.netronome.com>
On 8/29/19 4:18 PM, Jakub Kicinski wrote:
> On Thu, 29 Aug 2019 11:27:16 -0700, Shannon Nelson wrote:
>> +netdev_tx_t ionic_start_xmit(struct sk_buff *skb, struct net_device *netdev)
>> +{
>> + u16 queue_index = skb_get_queue_mapping(skb);
>> + struct ionic_lif *lif = netdev_priv(netdev);
>> + struct ionic_queue *q;
>> + int ndescs;
>> + int err;
>> +
>> + if (unlikely(!test_bit(IONIC_LIF_UP, lif->state))) {
>> + dev_kfree_skb(skb);
>> + return NETDEV_TX_OK;
>> + }
>> +
>> + if (likely(lif_to_txqcq(lif, queue_index)))
>> + q = lif_to_txq(lif, queue_index);
>> + else
>> + q = lif_to_txq(lif, 0);
>> +
>> + ndescs = ionic_tx_descs_needed(q, skb);
>> + if (ndescs < 0)
>> + goto err_out_drop;
>> +
>> + if (!ionic_q_has_space(q, ndescs)) {
> You should stop the queue in advance, whenever you can't ensure that a
> max size frame can be placed on the ring. Requeuing is very expensive
> so modern drivers should try to never return NETDEV_TX_BUSY
Yes, I see how that's been done in nfp - good idea.
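A minimal userspace model of the stop-in-advance idea is sketched below. struct tx_ring, MAX_DESCS_PER_FRAME and the helper names are invented for illustration and do not come from ionic or nfp. The stopped flag stands in for the netif_stop_subqueue() / netif_wake_subqueue() state.

```c
#include <stdbool.h>

#define MAX_DESCS_PER_FRAME 17	/* hypothetical worst case for one frame */

struct tx_ring {
	unsigned int size;	/* total descriptors in the ring */
	unsigned int used;	/* descriptors currently in flight */
	bool stopped;		/* models the stopped state of the subqueue */
};

static unsigned int ring_space(const struct tx_ring *r)
{
	return r->size - r->used;
}

/*
 * Transmit path. The queue is only ever running when a worst-case
 * frame fits, so there is no "ring full" branch and nothing that
 * would return a busy status.
 */
static void ring_xmit(struct tx_ring *r, unsigned int ndescs)
{
	r->used += ndescs;

	/* Stop now if the next worst-case frame might not fit. */
	if (ring_space(r) < MAX_DESCS_PER_FRAME)
		r->stopped = true;
}

/* Completion path: wake the queue once a worst-case frame fits again. */
static void ring_clean(struct tx_ring *r, unsigned int ndescs)
{
	r->used -= ndescs;

	if (r->stopped && ring_space(r) >= MAX_DESCS_PER_FRAME)
		r->stopped = false;
}
```

A real driver would still need the memory barrier and re-check against a concurrent clean that the quoted code performs; the model leaves that out.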
>
>> + netif_stop_subqueue(netdev, queue_index);
>> + q->stop++;
>> +
>> + /* Might race with ionic_tx_clean, check again */
>> + smp_rmb();
>> + if (ionic_q_has_space(q, ndescs)) {
>> + netif_wake_subqueue(netdev, queue_index);
>> + q->wake++;
>> + } else {
>> + return NETDEV_TX_BUSY;
>> + }
>> + }
>> +
>> + if (skb_is_gso(skb))
>> + err = ionic_tx_tso(q, skb);
>> + else
>> + err = ionic_tx(q, skb);
>> +
>> + if (err)
>> + goto err_out_drop;
>> +
>> + return NETDEV_TX_OK;
>> +
>> +err_out_drop:
>> + netif_stop_subqueue(netdev, queue_index);
> This stopping of the queue is suspicious, if ionic_tx() fails there's
> no guarantee the queue will ever be woken up, no?
Yes, that does look odd. If there isn't a new descriptor with an skb in
the queue, it won't get cleaned and reenabled in the Tx clean.
sln
>
>> + q->stop++;
>> + q->drop++;
>> + dev_kfree_skb(skb);
>> + return NETDEV_TX_OK;
>> +}
^ permalink raw reply
* Re: [PATCH] net: bcmgenet: use ethtool_op_get_ts_info()
From: Doug Berger @ 2019-08-30 23:52 UTC (permalink / raw)
To: Florian Fainelli, Ryan M. Collins, David S. Miller
Cc: bcm-kernel-feedback-list, netdev, linux-kernel
In-Reply-To: <a7003b3c-4035-5d4f-43e7-a8a76dcea0fb@gmail.com>
On 8/30/19 11:51 AM, Florian Fainelli wrote:
> On 8/30/19 11:49 AM, Ryan M. Collins wrote:
>> This change enables the use of SW timestamping on the Raspberry Pi 4.
>
> Finally the first bcmgenet patch that was tested on the Pi 4!
>
>>
>> bcmgenet's transmit function bcmgenet_xmit() implements software
>> timestamping. However the SOF_TIMESTAMPING_TX_SOFTWARE capability was
>> missing and only SOF_TIMESTAMPING_RX_SOFTWARE was announced. By using
>> ethtool_op_get_ts_info() as get_ts_info() in bcmgenet_ethtool_ops, the
>> SOF_TIMESTAMPING_TX_SOFTWARE capability is announced.
>>
>> Similar to commit a8f5cb9e7991 ("smsc95xx: use ethtool_op_get_ts_info()")
>>
>> Signed-off-by: Ryan M. Collins <rmc032@bucknell.edu>
>
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
>
Thanks Ryan!
Acked-by: Doug Berger <opendmb@gmail.com>
^ permalink raw reply
* pull-request: bpf 2019-08-31
From: Daniel Borkmann @ 2019-08-30 23:40 UTC (permalink / raw)
To: davem; +Cc: daniel, ast, netdev, bpf
Hi David,
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix 32-bit zero-extension during constant blinding which
has been causing a regression on ppc64, from Naveen.
2) Fix a latency bug in nfp driver when updating stack index
register, from Jiong.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
----------------------------------------------------------------
The following changes since commit f53a7ad189594a112167efaf17ea8d0242b5ac00:
r8152: Set memory to all 0xFFs on failed reg reads (2019-08-25 19:52:59 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
for you to fetch changes up to ede7c460b1da5be7b8ef4efe47f1687babf06408:
bpf: handle 32-bit zext during constant blinding (2019-08-26 23:05:01 +0200)
----------------------------------------------------------------
Jiong Wang (1):
nfp: bpf: fix latency bug when updating stack index register
Naveen N. Rao (1):
bpf: handle 32-bit zext during constant blinding
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 17 +++++++++++++----
kernel/bpf/core.c | 8 ++++++--
2 files changed, 19 insertions(+), 6 deletions(-)
^ permalink raw reply
* Re: [PATCH bpf-next v6 00/12] XDP unaligned chunk placement support
From: Daniel Borkmann @ 2019-08-30 23:29 UTC (permalink / raw)
To: Kevin Laatz, netdev, ast, bjorn.topel, magnus.karlsson,
jakub.kicinski, jonathan.lemon, saeedm, maximmi, stephen
Cc: bruce.richardson, ciara.loftus, bpf, intel-wired-lan
In-Reply-To: <20190827022531.15060-1-kevin.laatz@intel.com>
On 8/27/19 4:25 AM, Kevin Laatz wrote:
> This patch set adds the ability to use unaligned chunks in the XDP umem.
>
> Currently, all chunk addresses passed to the umem are masked to be chunk
> size aligned (max is PAGE_SIZE). This limits where we can place chunks
> within the umem as well as limiting the packet sizes that are supported.
>
> The changes in this patch set remove these restrictions, allowing XDP to
> be more flexible in where it can place a chunk within a umem. By relaxing
> where the chunks can be placed, it allows us to use an arbitrary buffer
> size and place that wherever we have a free address in the umem. These
> changes add the ability to support arbitrary frame sizes up to 4k
> (PAGE_SIZE) and make it easy to integrate with other existing frameworks
> that have their own memory management systems, such as DPDK.
> In DPDK, for example, there is already support for AF_XDP with zero-copy.
> However, with this patch set the integration will be much more seamless.
> You can find the DPDK AF_XDP driver at:
> https://git.dpdk.org/dpdk/tree/drivers/net/af_xdp
>
> Since we are now dealing with arbitrary frame sizes, we also need to
> update how we pass around addresses. Currently, the addresses can simply be
> masked to 2k to get back to the original address. This becomes less trivial
> when using frame sizes that are not a 'power of 2' size. This patch set
> modifies the Rx/Tx descriptor format to use the upper 16-bits of the addr
> field for an offset value, leaving the lower 48-bits for the address (this
> leaves us with 256 Terabytes, which should be enough!). We only need to use
> the upper 16-bits to store the offset when running in unaligned mode.
> Rather than adding the offset (headroom etc) to the address, we will store
> it in the upper 16-bits of the address field. This way, we can easily add
> the offset to the address where we need it, using some bit manipulation and
> addition, and we can also easily get the original address wherever we need
> it (for example in i40e_zca_free) by simply masking to get the lower
> 48-bits of the address field.
>
> The patch set was tested with the following set up:
> - Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
> - Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
> - Driver: i40e
> - Application: xdpsock with l2fwd (single interface)
> - Turbo disabled in BIOS
>
> There are no changes to performance before and after these patches for SKB
> mode and Copy mode. Zero-copy mode saw a performance degradation of ~1.5%.
>
> This patch set has been applied against
> commit 0bb52b0dfc88 ("tools: bpftool: add 'bpftool map freeze' subcommand")
>
> Structure of the patch set:
> Patch 1:
> - Remove unnecessary masking and headroom addition during zero-copy Rx
> buffer recycling in i40e. This change is required in order for the
> buffer recycling to work in the unaligned chunk mode.
>
> Patch 2:
> - Remove unnecessary masking and headroom addition during
> zero-copy Rx buffer recycling in ixgbe. This change is required in
> order for the buffer recycling to work in the unaligned chunk mode.
>
> Patch 3:
> - Add infrastructure for unaligned chunks. Since we are dealing with
> unaligned chunks that could potentially cross a physical page boundary,
> we add checks to keep track of that information. We can later use this
> information to correctly handle buffers that are placed at an address
> where they cross a page boundary. This patch also modifies the
> existing Rx and Tx functions to use the new descriptor format. To
> handle addresses correctly, we need to mask appropriately based on
> whether we are in aligned or unaligned mode.
>
> Patch 4:
> - This patch updates the i40e driver to make use of the new descriptor
> format.
>
> Patch 5:
> - This patch updates the ixgbe driver to make use of the new descriptor
> format.
>
> Patch 6:
> - This patch updates the mlx5e driver to make use of the new descriptor
> format. These changes are required to handle the new descriptor format
> and for unaligned chunks support.
>
> Patch 7:
> - This patch allows XSK frames smaller than page size in the mlx5e
> driver. Relax the requirements to the XSK frame size to allow it to be
> smaller than a page and even not a power of two. The current
> implementation can work in this mode, both with Striding RQ and without
> it.
>
> Patch 8:
> - Add flags for umem configuration to libbpf. Since we increase the size
> of the struct by adding flags, we also need to add the ABI versioning
> in this patch.
>
> Patch 9:
> - Modify xdpsock application to add a command line option for
> unaligned chunks
>
> Patch 10:
> - Since we can now run the application in unaligned chunk mode, we need
> to make sure we recycle the buffers appropriately.
>
> Patch 11:
> - Adds hugepage support to the xdpsock application
>
> Patch 12:
> - Documentation update to include the unaligned chunk scenario. We need
> to explicitly state that the incoming addresses are only masked in the
> aligned chunk mode and not the unaligned chunk mode.
>
> ---
> v2:
> - fixed checkpatch issues
> - fixed Rx buffer recycling for unaligned chunks in xdpsock
> - removed unused defines
> - fixed how chunk_size is calculated in xsk_diag.c
> - added some performance numbers to cover letter
> - modified descriptor format to make it easier to retrieve original
> address
> - removed patch adding off_t off to the zero copy allocator. This is no
> longer needed with the new descriptor format.
>
> v3:
> - added patch for mlx5 driver changes needed for unaligned chunks
> - moved offset handling to new helper function
> - changed value used for the umem chunk_mask. Now using the new
> descriptor format to save us doing the calculations in a number of
> places meaning more of the code is left unchanged while adding
> unaligned chunk support.
>
> v4:
> - reworked the next_pg_contig field in the xdp_umem_page struct. We now
> use the low 12 bits of the addr for flags rather than adding an extra
> field in the struct.
> - modified unaligned chunks flag define
> - fixed page_start calculation in __xsk_rcv_memcpy().
> - move offset handling to the xdp_umem_get_* functions
> - modified the len field in xdp_umem_reg struct. We now use 16 bits from
> this for the flags field.
> - fixed headroom addition to handle in the mlx5e driver
> - other minor changes based on review comments
>
> v5:
> - Added ABI versioning in the libbpf patch
> - Removed bitfields in the xdp_umem_reg struct. Adding new flags field.
> - Added accessors for getting addr and offset.
> - Added helper function for adding the offset to the addr.
> - Fixed conflicts with 'bpf-af-xdp-wakeup' which was merged recently.
> - Fixed typo in mlx driver patch.
> - Moved libbpf patch to later in the set (7/11, just before the sample
> app changes)
>
> v6:
> - Added support for XSK frames smaller than page in mlx5e driver (Maxim
>   Mikityanskiy <maximmi@mellanox.com>).
> - Fixed offset handling in xsk_generic_rcv.
> - Added check for base address in xskq_is_valid_addr_unaligned.
>
> Kevin Laatz (11):
> i40e: simplify Rx buffer recycle
> ixgbe: simplify Rx buffer recycle
> xsk: add support to allow unaligned chunk placement
> i40e: modify driver for handling offsets
> ixgbe: modify driver for handling offsets
> mlx5e: modify driver for handling offsets
> libbpf: add flags to umem config
> samples/bpf: add unaligned chunks mode support to xdpsock
> samples/bpf: add buffer recycling for unaligned chunks to xdpsock
> samples/bpf: use hugepages in xdpsock app
> doc/af_xdp: include unaligned chunk case
>
> Maxim Mikityanskiy (1):
> net/mlx5e: Allow XSK frames smaller than a page
>
> Documentation/networking/af_xdp.rst | 10 +-
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 26 +++--
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 26 +++--
> .../ethernet/mellanox/mlx5/core/en/params.c | 23 ++++-
> .../ethernet/mellanox/mlx5/core/en/params.h | 2 +
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 8 +-
> .../ethernet/mellanox/mlx5/core/en/xsk/rx.c | 5 +-
> .../mellanox/mlx5/core/en/xsk/setup.c | 15 ++-
> include/net/xdp_sock.h | 75 ++++++++++++++-
> include/uapi/linux/if_xdp.h | 9 ++
> net/xdp/xdp_umem.c | 19 +++-
> net/xdp/xsk.c | 94 +++++++++++++++----
> net/xdp/xsk_diag.c | 2 +-
> net/xdp/xsk_queue.h | 70 ++++++++++++--
> samples/bpf/xdpsock_user.c | 61 ++++++++----
> tools/include/uapi/linux/if_xdp.h | 9 ++
> tools/lib/bpf/Makefile | 5 +-
> tools/lib/bpf/libbpf.map | 1 +
> tools/lib/bpf/xsk.c | 33 ++++++-
> tools/lib/bpf/xsk.h | 27 ++++++
> 20 files changed, 417 insertions(+), 103 deletions(-)
>
Applied, thanks!
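The descriptor layout described in the cover letter above (upper 16 bits for the offset, lower 48 bits for the base address) can be sketched in plain C as follows. The macro and function names are chosen for illustration and are not necessarily the ones used in the patch set.

```c
#include <stdint.h>

#define OFFSET_SHIFT	48
#define ADDR_MASK	((1ULL << OFFSET_SHIFT) - 1)	/* lower 48 bits */

/* Pack a base address and an offset into one 64-bit addr field. */
static uint64_t desc_pack(uint64_t base, uint16_t offset)
{
	return (base & ADDR_MASK) | ((uint64_t)offset << OFFSET_SHIFT);
}

/* Original (base) address: mask off the upper 16 bits. */
static uint64_t desc_base(uint64_t desc)
{
	return desc & ADDR_MASK;
}

/* Offset stored in the upper 16 bits. */
static uint16_t desc_offset(uint64_t desc)
{
	return (uint16_t)(desc >> OFFSET_SHIFT);
}

/* Address to actually use: base plus offset. */
static uint64_t desc_addr(uint64_t desc)
{
	return desc_base(desc) + desc_offset(desc);
}
```

Because the base survives untouched in the low bits, a consumer can recover it with a single mask even when the frame size is not a power of two.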
^ permalink raw reply
* Re: [bpf-next] bpf: fix error check in bpf_tcp_gen_syncookie
From: Daniel Borkmann @ 2019-08-30 23:23 UTC (permalink / raw)
To: Petar Penkov, netdev, bpf; +Cc: davem, ast, Petar Penkov, Stanislav Fomichev
In-Reply-To: <20190827234622.76209-1-ppenkov.kernel@gmail.com>
On 8/28/19 1:46 AM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
>
> If a SYN cookie is not issued by tcp_v#_gen_syncookie, then the return
> value will be exactly 0, rather than <= 0. Let's change the check to
> reflect that, especially since mss is an unsigned value and cannot be
> negative.
>
> Fixes: 70d66244317e ("bpf: add bpf_tcp_gen_syncookie helper")
> Reported-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Petar Penkov <ppenkov@google.com>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH bpf-next 0/2] nfp: bpf: add simple map op cache
From: Daniel Borkmann @ 2019-08-30 23:23 UTC (permalink / raw)
To: Jakub Kicinski, alexei.starovoitov; +Cc: netdev, oss-drivers, jaco.gericke
In-Reply-To: <20190828053629.28658-1-jakub.kicinski@netronome.com>
On 8/28/19 7:36 AM, Jakub Kicinski wrote:
> Hi!
>
> This set adds a small batching and cache mechanism to the driver.
> Map dumps require two operations per element - get next, and
> lookup. Each of those needs a round trip to the device, and on
> a loaded system scheduling out and in of the dumping process.
> This set makes the driver request a number of entries at the same
> time, and if no operation which would modify the map happens
> from the host side those entries are used to serve lookup
> requests for up to 250us, at which point they are considered
> stale.
>
> This set has been measured to provide almost 4x dumping speed
> improvement, Jaco says:
>
> OLD dump times
> 500 000 elements: 26.1s
> 1 000 000 elements: 54.5s
>
> NEW dump times
> 500 000 elements: 7.6s
> 1 000 000 elements: 16.5s
>
> Jakub Kicinski (2):
> nfp: bpf: rework MTU checking
> nfp: bpf: add simple map op cache
>
> drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 187 ++++++++++++++++--
> drivers/net/ethernet/netronome/nfp/bpf/fw.h | 1 +
> drivers/net/ethernet/netronome/nfp/bpf/main.c | 33 ++++
> drivers/net/ethernet/netronome/nfp/bpf/main.h | 24 +++
> .../net/ethernet/netronome/nfp/bpf/offload.c | 3 +
> drivers/net/ethernet/netronome/nfp/nfp_net.h | 2 +-
> .../ethernet/netronome/nfp/nfp_net_common.c | 9 +-
> 7 files changed, 239 insertions(+), 20 deletions(-)
Applied, thanks!
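
The staleness rule described in the cover letter (cached entries serve
lookups for up to 250us unless the host modifies the map) could be
sketched roughly as below. This is a user-space illustration only: the
struct layout, the generation counter and the function names are
assumptions, not the nfp driver's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_TTL_NS 250000ULL	/* 250us, the bound quoted in the cover letter */

/* Hypothetical cache state; not the driver's real data structure. */
struct op_cache {
	uint64_t filled_ns;	/* time at which the batch was fetched */
	uint64_t map_gen;	/* host-side modification counter at fill time */
	bool valid;		/* false until the first fill */
};

/* Record a freshly fetched batch of entries. */
static void cache_fill(struct op_cache *c, uint64_t now_ns, uint64_t map_gen)
{
	assert(c);
	c->filled_ns = now_ns;
	c->map_gen = map_gen;
	c->valid = true;
}

/*
 * A cached batch may serve a lookup only if the map has not been
 * modified from the host since the fill and the batch is no older
 * than CACHE_TTL_NS.
 */
static bool cache_usable(const struct op_cache *c, uint64_t now_ns,
			 uint64_t map_gen)
{
	assert(c);
	return c->valid && c->map_gen == map_gen &&
	       now_ns - c->filled_ns <= CACHE_TTL_NS;
}
```

A caller would bump the generation counter on every host-side update or
delete, so a single comparison invalidates the whole batch.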
^ permalink raw reply
* Re: [PATCH bpf-next v2 0/4] tools: bpftool: improve bpftool build experience
From: Daniel Borkmann @ 2019-08-30 23:20 UTC (permalink / raw)
To: Quentin Monnet, Alexei Starovoitov
Cc: bpf, netdev, oss-drivers, Lorenz Bauer, Ilya Leoshkevich,
Jakub Kicinski
In-Reply-To: <20190830110040.31257-1-quentin.monnet@netronome.com>
On 8/30/19 1:00 PM, Quentin Monnet wrote:
> Hi,
> This set attempts to make it easier to build bpftool, in particular when
> passing a specific output directory. This is a follow-up to the
> conversation held last month by Lorenz, Ilya and Jakub [0].
>
> The first patch is a minor fix to bpftool's Makefile, regarding the
> retrieval of kernel version (which currently prints a non-relevant make
> warning on some invocations).
>
> Second patch improves the Makefile commands to support more "make"
> invocations, or to fix building with custom output directory. On Jakub's
> suggestion, a script is also added to BPF selftests in order to keep track
> of the supported build variants.
>
> Building bpftool with "make tools/bpf" from the top of the repository
> generates files in "libbpf/" and "feature/" directories under tools/bpf/
> and tools/bpf/bpftool/. The third patch ensures such directories are taken
> care of on "make clean", and adds them to the relevant .gitignore files.
>
> Finally, the fourth patch is a slightly modified version of Ilya's fix
> regarding libbpf.a appearing twice on the linking command for bpftool.
>
> [0] https://lore.kernel.org/bpf/CACAyw9-CWRHVH3TJ=Tke2x8YiLsH47sLCijdp=V+5M836R9aAA@mail.gmail.com/
>
> v2:
> - Return error from check script if one of the make invocations returns
> non-zero (even if binary is successfully produced).
> - Run "make clean" from bpf/ and not only bpf/bpftool/ in that same script,
> when relevant.
> - Add a patch to clean up generated "feature/" and "libbpf/" directories.
>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Cc: Ilya Leoshkevich <iii@linux.ibm.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
>
> Quentin Monnet (4):
> tools: bpftool: ignore make built-in rules for getting kernel version
> tools: bpftool: improve and check builds for different make
> invocations
> tools: bpf: account for generated feature/ and libbpf/ directories
> tools: bpftool: do not link twice against libbpf.a in Makefile
>
> tools/bpf/.gitignore | 1 +
> tools/bpf/Makefile | 5 +-
> tools/bpf/bpftool/.gitignore | 2 +
> tools/bpf/bpftool/Makefile | 28 ++--
> tools/testing/selftests/bpf/Makefile | 3 +-
> .../selftests/bpf/test_bpftool_build.sh | 143 ++++++++++++++++++
> 6 files changed, 167 insertions(+), 15 deletions(-)
> create mode 100755 tools/testing/selftests/bpf/test_bpftool_build.sh
>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH v6 net-next 07/19] ionic: Add basic adminq support
From: Shannon Nelson @ 2019-08-30 23:18 UTC (permalink / raw)
To: David Miller, jakub.kicinski; +Cc: netdev
In-Reply-To: <20190830.151711.704306282464276122.davem@davemloft.net>
On 8/30/19 3:17 PM, David Miller wrote:
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date: Fri, 30 Aug 2019 15:16:04 -0700
>
>> On Fri, 30 Aug 2019 12:31:07 -0700, Shannon Nelson wrote:
>>> On 8/29/19 3:52 PM, Jakub Kicinski wrote:
>>>> On Thu, 29 Aug 2019 11:27:08 -0700, Shannon Nelson wrote:
>>>>> +static void ionic_lif_qcq_deinit(struct ionic_lif *lif, struct ionic_qcq *qcq)
>>>>> +{
>>>>> + struct ionic_dev *idev = &lif->ionic->idev;
>>>>> + struct device *dev = lif->ionic->dev;
>>>>> +
>>>>> + if (!qcq)
>>>>> + return;
>>>>> +
>>>>> + ionic_debugfs_del_qcq(qcq);
>>>>> +
>>>>> + if (!(qcq->flags & IONIC_QCQ_F_INITED))
>>>>> + return;
>>>>> +
>>>>> + if (qcq->flags & IONIC_QCQ_F_INTR) {
>>>>> + ionic_intr_mask(idev->intr_ctrl, qcq->intr.index,
>>>>> + IONIC_INTR_MASK_SET);
>>>>> + synchronize_irq(qcq->intr.vector);
>>>>> + devm_free_irq(dev, qcq->intr.vector, &qcq->napi);
>>>> Doesn't free_irq() basically imply synchronize_irq()?
>>> The synchronize_irq() waits for any threaded handlers to finish, while
>>> free_irq() only waits for HW handling. This helps make sure we don't
>>> have anything still running before we remove resources.
>> mm.. I'm no IRQ expert but it strikes me as surprising as that'd mean
>> every single driver would always have to run synchronize_irq() on
>> module exit, no?
>>
>> I see there is a kthread_stop() in __free_irq(), you sure it doesn't
>> wait for threaded IRQs?
> I'm pretty sure it does.
Yes, deeper in there are the kthread_stop() calls that make the
synchronize_irq() unnecessary. I'll pull it out.
Thanks,
sln
^ permalink raw reply
* RE: [PATCH net-next, 2/2] hv_netvsc: Sync offloading features to VF NIC
From: Haiyang Zhang @ 2019-08-30 23:12 UTC (permalink / raw)
To: Jakub Kicinski
Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
olaf@aepfle.de, vkuznets, davem@davemloft.net,
linux-kernel@vger.kernel.org, Mark Bloch
In-Reply-To: <20190830160451.43a61cf9@cakuba.netronome.com>
> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Friday, August 30, 2019 4:05 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; linux-
> kernel@vger.kernel.org; Mark Bloch <markb@mellanox.com>
> Subject: Re: [PATCH net-next, 2/2] hv_netvsc: Sync offloading features to VF
> NIC
>
> On Fri, 30 Aug 2019 03:45:38 +0000, Haiyang Zhang wrote:
> > VF NIC may go down then come up during host servicing events. This
> > causes the VF NIC offloading feature settings to roll back to the
> > defaults. This patch can synchronize features from synthetic NIC to
> > the VF NIC during ndo_set_features (ethtool -K), and
> > netvsc_register_vf when VF comes back after host events.
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: Mark Bloch <markb@mellanox.com>
>
> If we want to make this change in behaviour we should change net_failover
> at the same time.
I will check net_failover. Thanks.
^ permalink raw reply
* [PATCH net-next] bnxt_en: Fix compile error regression with CONFIG_BNXT_SRIOV not set.
From: Michael Chan @ 2019-08-30 23:10 UTC (permalink / raw)
To: davem; +Cc: netdev, ray.jui
Add a new function bnxt_get_registered_vfs() to handle the work
of getting the number of registered VFs under #ifdef CONFIG_BNXT_SRIOV.
The main code will call this function and will always work correctly
whether CONFIG_BNXT_SRIOV is set or not.
Fixes: 230d1f0de754 ("bnxt_en: Handle firmware reset.")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 82 ++++++++++++++++++++-----------
1 file changed, 52 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f8a834f..402d9f5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10107,34 +10107,56 @@ void bnxt_fw_exception(struct bnxt *bp)
bnxt_rtnl_unlock_sp(bp);
}
-void bnxt_fw_reset(struct bnxt *bp)
+/* Returns the number of registered VFs, or 1 if VF configuration is pending, or
+ * < 0 on error.
+ */
+static int bnxt_get_registered_vfs(struct bnxt *bp)
{
+#ifdef CONFIG_BNXT_SRIOV
int rc;
+ if (!BNXT_PF(bp))
+ return 0;
+
+ rc = bnxt_hwrm_func_qcfg(bp);
+ if (rc) {
+ netdev_err(bp->dev, "func_qcfg cmd failed, rc = %d\n", rc);
+ return rc;
+ }
+ if (bp->pf.registered_vfs)
+ return bp->pf.registered_vfs;
+ if (bp->sriov_cfg)
+ return 1;
+#endif
+ return 0;
+}
+
+void bnxt_fw_reset(struct bnxt *bp)
+{
bnxt_rtnl_lock_sp(bp);
if (test_bit(BNXT_STATE_OPEN, &bp->state) &&
!test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
+ int n = 0;
+
set_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
- if (BNXT_PF(bp) && bp->pf.active_vfs &&
- !test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state)) {
- rc = bnxt_hwrm_func_qcfg(bp);
- if (rc) {
- netdev_err(bp->dev, "Firmware reset aborted, first func_qcfg cmd failed, rc = %d\n",
- rc);
- clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
- dev_close(bp->dev);
- goto fw_reset_exit;
- }
- if (bp->pf.registered_vfs || bp->sriov_cfg) {
- u16 vf_tmo_dsecs = bp->pf.registered_vfs * 10;
-
- if (bp->fw_reset_max_dsecs < vf_tmo_dsecs)
- bp->fw_reset_max_dsecs = vf_tmo_dsecs;
- bp->fw_reset_state =
- BNXT_FW_RESET_STATE_POLL_VF;
- bnxt_queue_fw_reset_work(bp, HZ / 10);
- goto fw_reset_exit;
- }
+ if (bp->pf.active_vfs &&
+ !test_bit(BNXT_STATE_FW_FATAL_COND, &bp->state))
+ n = bnxt_get_registered_vfs(bp);
+ if (n < 0) {
+ netdev_err(bp->dev, "Firmware reset aborted, rc = %d\n",
+ n);
+ clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
+ dev_close(bp->dev);
+ goto fw_reset_exit;
+ } else if (n > 0) {
+ u16 vf_tmo_dsecs = n * 10;
+
+ if (bp->fw_reset_max_dsecs < vf_tmo_dsecs)
+ bp->fw_reset_max_dsecs = vf_tmo_dsecs;
+ bp->fw_reset_state =
+ BNXT_FW_RESET_STATE_POLL_VF;
+ bnxt_queue_fw_reset_work(bp, HZ / 10);
+ goto fw_reset_exit;
}
bnxt_fw_reset_close(bp);
bp->fw_reset_state = BNXT_FW_RESET_STATE_ENABLE_DEV;
@@ -10579,22 +10601,21 @@ static void bnxt_fw_reset_task(struct work_struct *work)
}
switch (bp->fw_reset_state) {
- case BNXT_FW_RESET_STATE_POLL_VF:
- rc = bnxt_hwrm_func_qcfg(bp);
- if (rc) {
+ case BNXT_FW_RESET_STATE_POLL_VF: {
+ int n = bnxt_get_registered_vfs(bp);
+
+ if (n < 0) {
netdev_err(bp->dev, "Firmware reset aborted, subsequent func_qcfg cmd failed, rc = %d, %d msecs since reset timestamp\n",
- rc, jiffies_to_msecs(jiffies -
+ n, jiffies_to_msecs(jiffies -
bp->fw_reset_timestamp));
goto fw_reset_abort;
- }
- if (bp->pf.registered_vfs || bp->sriov_cfg) {
+ } else if (n > 0) {
if (time_after(jiffies, bp->fw_reset_timestamp +
(bp->fw_reset_max_dsecs * HZ / 10))) {
clear_bit(BNXT_STATE_IN_FW_RESET, &bp->state);
bp->fw_reset_state = 0;
- netdev_err(bp->dev, "Firmware reset aborted, %d VFs still registered, sriov_cfg %d\n",
- bp->pf.registered_vfs,
- bp->sriov_cfg);
+ netdev_err(bp->dev, "Firmware reset aborted, bnxt_get_registered_vfs() returns %d\n",
+ n);
return;
}
bnxt_queue_fw_reset_work(bp, HZ / 10);
@@ -10607,6 +10628,7 @@ static void bnxt_fw_reset_task(struct work_struct *work)
rtnl_unlock();
bnxt_queue_fw_reset_work(bp, bp->fw_reset_min_dsecs * HZ / 10);
return;
+ }
case BNXT_FW_RESET_STATE_RESET_FW: {
u32 wait_dsecs = bp->fw_health->post_reset_wait_dsecs;
--
2.5.1
^ permalink raw reply related
* Re: [PATCH net-next, 1/2] hv_netvsc: Allow scatter-gather feature to be tunable
From: Jakub Kicinski @ 2019-08-30 23:05 UTC (permalink / raw)
To: Haiyang Zhang
Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
olaf@aepfle.de, vkuznets, davem@davemloft.net,
linux-kernel@vger.kernel.org
In-Reply-To: <1567136656-49288-2-git-send-email-haiyangz@microsoft.com>
On Fri, 30 Aug 2019 03:45:24 +0000, Haiyang Zhang wrote:
> In a previous patch, the NETIF_F_SG was missing after the code changes.
> That caused the SG feature to be "fixed". This patch includes it into
> hw_features, so it is tunable again.
>
> Fixes: 23312a3be999 ("netvsc: negotiate checksum and segmentation parameters")
^
Looks like a tab sneaked in there.
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
^ permalink raw reply
* Re: [PATCH net-next, 2/2] hv_netvsc: Sync offloading features to VF NIC
From: Jakub Kicinski @ 2019-08-30 23:04 UTC (permalink / raw)
To: Haiyang Zhang
Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
olaf@aepfle.de, vkuznets, davem@davemloft.net,
linux-kernel@vger.kernel.org, Mark Bloch
In-Reply-To: <1567136656-49288-3-git-send-email-haiyangz@microsoft.com>
On Fri, 30 Aug 2019 03:45:38 +0000, Haiyang Zhang wrote:
> VF NIC may go down then come up during host servicing events. This
> causes the VF NIC offloading feature settings to roll back to the
> defaults. This patch can synchronize features from synthetic NIC to
> the VF NIC during ndo_set_features (ethtool -K),
> and netvsc_register_vf when VF comes back after host events.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Mark Bloch <markb@mellanox.com>
If we want to make this change in behaviour we should change
net_failover at the same time.
^ permalink raw reply
* Re: [PATCH net-next v3 3/3] net: tls: export protocol version, cipher, tx_conf/rx_conf to socket diag
From: Jakub Kicinski @ 2019-08-30 22:45 UTC (permalink / raw)
To: Davide Caratti
Cc: borisp, Eric Dumazet, aviadye, davejwatson, davem, john.fastabend,
Matthieu Baerts, netdev
In-Reply-To: <39ad297f2b1f129b26c4a3461a1ae443d836da52.1567158431.git.dcaratti@redhat.com>
On Fri, 30 Aug 2019 12:25:49 +0200, Davide Caratti wrote:
> When an application configures kernel TLS on top of a TCP socket, it's
> now possible for inet_diag_handler() to collect information regarding the
> protocol version, the cipher type and TX / RX configuration, in case
> INET_DIAG_INFO is requested.
>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Thank you!
^ permalink raw reply
* Re: [PATCH bpf-next 00/13] bpf: adding map batch processing support
From: Yonghong Song @ 2019-08-30 22:38 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexei Starovoitov, bpf@vger.kernel.org, netdev@vger.kernel.org,
Brian Vazquez, Daniel Borkmann, Kernel Team, Quentin Monnet
In-Reply-To: <20190830143508.73c30631@cakuba.netronome.com>
On 8/30/19 2:35 PM, Jakub Kicinski wrote:
> On Fri, 30 Aug 2019 07:25:54 +0000, Yonghong Song wrote:
>> On 8/29/19 11:39 AM, Jakub Kicinski wrote:
>>> On Wed, 28 Aug 2019 23:45:02 -0700, Yonghong Song wrote:
>>>> Brian Vazquez has proposed BPF_MAP_DUMP command to look up more than one
>>>> map entries per syscall.
>>>> https://lore.kernel.org/bpf/CABCgpaU3xxX6CMMxD+1knApivtc2jLBHysDXw-0E9bQEL0qC3A@mail.gmail.com/T/#t
>>>>
>>>> During discussion, we found more use cases can be supported in a similar
>>>> map operation batching framework. For example, batched map lookup and delete,
>>>> which can be really helpful for bcc.
>>>> https://github.com/iovisor/bcc/blob/master/tools/tcptop.py#L233-L243
>>>> https://github.com/iovisor/bcc/blob/master/tools/slabratetop.py#L129-L138
>>>>
>>>> Also, in bcc, we have API to delete all entries in a map.
>>>> https://github.com/iovisor/bcc/blob/master/src/cc/api/BPFTable.h#L257-L264
>>>>
>>>> For map update, batched operations are also useful as sometimes applications need
>>>> to populate initial maps with more than one entry. For example, the below
>>>> example is from kernel/samples/bpf/xdp_redirect_cpu_user.c:
>>>> https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_redirect_cpu_user.c#L543-L550
>>>>
>>>> This patch addresses all the above use cases. To make uapi stable, it also
>>>> covers other potential use cases. Four bpf syscall subcommands are introduced:
>>>> BPF_MAP_LOOKUP_BATCH
>>>> BPF_MAP_LOOKUP_AND_DELETE_BATCH
>>>> BPF_MAP_UPDATE_BATCH
>>>> BPF_MAP_DELETE_BATCH
>>>>
>>>> In userspace, an application can iterate through the whole map one batch
>>>> at a time, e.g., bpf_map_lookup_batch() in the below:
>>>> p_key = NULL;
>>>> p_next_key = &key;
>>>> while (true) {
>>>> err = bpf_map_lookup_batch(fd, p_key, &p_next_key, keys, values,
>>>> &batch_size, elem_flags, flags);
>>>> if (err) ...
>>>> if (p_next_key) break; // done
>>>> if (!p_key) p_key = p_next_key;
>>>> }
>>>> Please look at individual patches for details of new syscall subcommands
>>>> and examples of user codes.
>>>>
>>>> The testing is also done in a qemu VM environment:
>>>> measure_lookup: max_entries 1000000, batch 10, time 342ms
>>>> measure_lookup: max_entries 1000000, batch 1000, time 295ms
>>>> measure_lookup: max_entries 1000000, batch 1000000, time 270ms
>>>> measure_lookup: max_entries 1000000, no batching, time 1346ms
>>>> measure_lookup_delete: max_entries 1000000, batch 10, time 433ms
>>>> measure_lookup_delete: max_entries 1000000, batch 1000, time 363ms
>>>> measure_lookup_delete: max_entries 1000000, batch 1000000, time 357ms
>>>> measure_lookup_delete: max_entries 1000000, not batch, time 1894ms
>>>> measure_delete: max_entries 1000000, batch, time 220ms
>>>> measure_delete: max_entries 1000000, not batch, time 1289ms
>>>> For a 1M entry hash table, batch size of 10 can reduce cpu time
>>>> by 70%. Please see patch "tools/bpf: measure map batching perf"
>>>> for details of test codes.
>>>
>>> Hi Yonghong!
>>>
>>> great to see this, we have been looking at implementing some way to
>>> speed up map walks as well.
>>>
>>> The direction we were looking in, after previous discussions [1],
>>> however, was to provide a BPF program which can run the logic entirely
>>> within the kernel.
>>>
>>> We have a rough PoC on the FW side (we can offload the program which
>>> walks the map, which is pretty neat), but the kernel verifier side
>>> hasn't really progressed. It will soon.
>>>
>>> The rough idea is that the user space provides two programs, "filter"
>>> and "dumper":
>>>
>>> bpftool map exec id XYZ filter pinned /some/prog \
>>> dumper pinned /some/other_prog
>>>
>>> Both programs get this context:
>>>
>>> struct map_op_ctx {
>>> u64 key;
>>> u64 value;
>>> }
>>>
>>> We need a per-map implementation of the exec side, but roughly maps
>>> would do:
>>>
>>> LIST_HEAD(deleted);
>>>
>>> for entry in map {
>>> struct map_op_ctx {
>>> .key = entry->key,
>>> .value = entry->value,
>>> };
>>>
>>> act = BPF_PROG_RUN(filter, &map_op_ctx);
>>> if (act & ~ACT_BITS)
>>> return -EINVAL;
>>>
>>> if (act & DELETE) {
>>> map_unlink(entry);
>>> list_add(entry, &deleted);
>>> }
>>> if (act & STOP)
>>> break;
>>> }
>>>
>>> synchronize_rcu();
>>>
>>> for entry in deleted {
>>> struct map_op_ctx {
>>> .key = entry->key,
>>> .value = entry->value,
>>> };
>>>
>>> BPF_PROG_RUN(dumper, &map_op_ctx);
>>> map_free(entry);
>>> }
>>>
>>> The filter program can't perform any map operations other than lookup,
>>> otherwise we won't be able to guarantee that we'll walk the entire map
>>> (if the filter program deletes some entries in an unfortunate order).
>>
>> Looks like you will provide a new program type and per-map
>> implementation of above code. My patch set indeed avoided per-map
>> implementation for all of lookup/delete/get-next-key...
>
> Indeed, the simple batched ops are undeniably lower LoC.
>
>>> If user space just wants a pure dump it can simply load a program which
>>> dumps the entries into a perf ring.
>>
>> percpu perf ring is not really ideal for user space which simply just
>> want to get some key/value pairs back. Some kind of generic non-per-cpu
>> ring buffer might be better for such cases.
>
> I don't think it had to be per-cpu, but I may be blissfully ignorant
> about the perf ring details :) bpf_perf_event_output() takes flags,
> which are effectively selecting the "output CPU", no?
Right, it does not need to be per-cpu; one particular cpu
can be selected. But which cpu to bind to will always be
open to debate: why this cpu and not another?
This works, but I think a ring buffer that is not bound to
any cpu would be better and less confusing to users.
That may need yet another ring buffer implementation
in the kernel, though, and people might not like it.
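
To illustrate the batching idea from the cover letter quoted above, here
is a small user-space sketch that walks a plain array one batch at a
time and counts how many calls the walk takes. It does not use the
proposed bpf syscall subcommands; lookup_batch(), walk_all() and the
cursor argument are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 64

/*
 * Hypothetical batched reader over a plain array standing in for a map.
 * Copies up to batch_size elements starting at *cursor, advances the
 * cursor, and returns the number of elements copied (0 when done).
 */
static size_t lookup_batch(const int *map, size_t map_len, size_t *cursor,
			   int *out, size_t batch_size)
{
	size_t n = 0;

	assert(map && cursor && out);
	while (n < batch_size && *cursor < map_len)
		out[n++] = map[(*cursor)++];
	return n;
}

/*
 * Walk the whole array one batch at a time. Stores the sum of all
 * elements in *sum and returns the number of lookup_batch() calls,
 * including the final call that reports the end of the walk.
 */
static size_t walk_all(const int *map, size_t map_len, size_t batch_size,
		       long *sum)
{
	int buf[MAX_BATCH];
	size_t cursor = 0, calls = 0, n, i;

	assert(sum && batch_size > 0 && batch_size <= MAX_BATCH);
	*sum = 0;
	for (;;) {
		n = lookup_batch(map, map_len, &cursor, buf, batch_size);
		calls++;
		if (n == 0)
			break;
		for (i = 0; i < n; i++)
			*sum += buf[i];
	}
	return calls;
}
```

With ten elements, a batch size of 1 needs eleven calls while a batch
size of 4 needs four, which is the per-call overhead the batched
subcommands aim to reduce.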
^ permalink raw reply