* Re: pull-request: can-next 2020-11-20
From: Jakub Kicinski @ 2020-11-21 23:09 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: netdev, davem, linux-can, kernel
In-Reply-To: <20201120133318.3428231-1-mkl@pengutronix.de>
On Fri, 20 Nov 2020 14:32:53 +0100 Marc Kleine-Budde wrote:
> The first patch is by Yegor Yefremov and he improves the j1939 documentaton by
> adding tables for the CAN identifier and its fields.
>
> Then there are 8 patches by Oliver Hartkopp targeting the CAN driver
> infrastructure and drivers. These add support for optional DLC element to the
> Classical CAN frame structure. See patch ea7800565a12 ("can: add optional DLC
> element to Classical CAN frame structure") for details. Oliver's last patch
> adds len8_dlc support to several drivers. Stefan Mätje provides a patch to add
> len8_dlc support to the esd_usb2 driver.
>
> The next patch is by Oliver Hartkopp, too and adds support for modification of
> Classical CAN DLCs to CAN GW sockets.
>
> The next 3 patches target the nxp,flexcan DT bindings. One patch by my adds the
> missing uint32 reference to the clock-frequency property. Joakim Zhang's
> patches fix the fsl,clk-source property and add the IMX_SC_R_CAN() macro to the
> imx firmware header file, which will be used in the flexcan driver later.
>
> Another patch by Joakim Zhang prepares the flexcan driver for SCU based
> stop-mode, by giving the existing, GPR based stop-mode, a _GPR postfix.
>
> The next 5 patches are by me, target the flexcan driver, and clean up the
> .ndo_open and .ndo_stop callbacks. These patches try to fix a sporadically
> hanging flexcan_close() during simultanious ifdown, sending of CAN messages and
> probably open CAN bus. I was never aber to reproduce, but these seem to fix the
> problem at the reporting user. As these changes are rather big, I'd like to
> mainline them via net-next/master.
>
> The next patches are by Jimmy Assarsson and Christer Beskow, they add support
> for new USB devices to the existing kvaser_usb driver.
>
> The last patch is by Kaixu Xia and simplifies the return in the
> mcp251xfd_chip_softreset() function in the mcp251xfd driver.
Great, this one finally got into patchwork correctly!
Pulled, thank you!
^ permalink raw reply
* Re: [PATCH net 01/15] ibmvnic: handle inconsistent login with reset
From: Jakub Kicinski @ 2020-11-21 23:36 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201120224049.46933-2-ljp@linux.ibm.com>
On Fri, 20 Nov 2020 16:40:35 -0600 Lijun Pan wrote:
> From: Dany Madden <drt@linux.ibm.com>
>
> Inconsistent login with the vnicserver is causing the device to be
> removed. This does not give the device a chance to recover from error
> state. This patch schedules a FATAL reset instead to bring the adapter
> up.
>
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Please provide fixes tags for all the patches.
^ permalink raw reply
* Re: [PATCH] tun: honor IOCB_NOWAIT flag
From: Jakub Kicinski @ 2020-11-21 23:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Networking
In-Reply-To: <e9451860-96cc-c7c7-47b8-fe42cadd5f4c@kernel.dk>
On Fri, 20 Nov 2020 07:59:54 -0700 Jens Axboe wrote:
> tun only checks the file O_NONBLOCK flag, but it should also be checking
> the iocb IOCB_NOWAIT flag. Any fops using ->read/write_iter() should check
> both, otherwise it breaks users that correctly expect O_NONBLOCK semantics
> if IOCB_NOWAIT is set.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Applied, thanks!
^ permalink raw reply
* Re: [PATCH net 04/15] ibmvnic: remove free_all_rwi function
From: Jakub Kicinski @ 2020-11-21 23:39 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201120224049.46933-5-ljp@linux.ibm.com>
On Fri, 20 Nov 2020 16:40:38 -0600 Lijun Pan wrote:
> From: Dany Madden <drt@linux.ibm.com>
>
> Remove free_all_rwi() since it is no longer used. (__ibmvnic_remove() was
> the last user of free_all_rwi()).
Squash this with the appropriate change, please.
^ permalink raw reply
* Re: [PATCH net 02/15] ibmvnic: process HMC disable command
From: Jakub Kicinski @ 2020-11-21 23:36 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201120224049.46933-3-ljp@linux.ibm.com>
On Fri, 20 Nov 2020 16:40:36 -0600 Lijun Pan wrote:
> From: Dany Madden <drt@linux.ibm.com>
>
> Currently ibmvnic does not support the disable vnic command from the
> Hardware Management Console. This patch enables ibmvnic to process
> CRQ message 0x07, disable vnic adapter.
What user-visible problem does this one solve?
^ permalink raw reply
* Re: [PATCH net 02/15] ibmvnic: process HMC disable command
From: Jakub Kicinski @ 2020-11-21 23:38 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201121153637.17a91ac4@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
On Sat, 21 Nov 2020 15:36:37 -0800 Jakub Kicinski wrote:
> On Fri, 20 Nov 2020 16:40:36 -0600 Lijun Pan wrote:
> > From: Dany Madden <drt@linux.ibm.com>
> >
> > Currently ibmvnic does not support the disable vnic command from the
> > Hardware Management Console. This patch enables ibmvnic to process
> > CRQ message 0x07, disable vnic adapter.
>
> What user-visible problem does this one solve?
Re-reading the commit message - is Hardware Management Console operated
by a human? So this is basically adding a missing feature, not fixes a
bug? Unless not being able to disable vnic is causing other things to
break.
^ permalink raw reply
* Re: [PATCH net 12/15] ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues
From: Jakub Kicinski @ 2020-11-21 23:44 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201120224049.46933-13-ljp@linux.ibm.com>
On Fri, 20 Nov 2020 16:40:46 -0600 Lijun Pan wrote:
> adapter->tx_scrq and adapter->rx_scrq could be NULL if the previous reset
> did not complete after freeing sub crqs. Check for NULL before
> dereferencing them.
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 47446e5f8ec5..a0dbd963a1ab 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2930,6 +2930,13 @@ static int reset_sub_crq_queues(struct ibmvnic_adapter *adapter)
> {
> int i, rc;
>
> + if (!adapter->tx_scrq || !adapter->rx_scrq) {
> + netdev_err(adapter->netdev,
> + "tx_scrq (%p) or rx_scrq (%p) does not exist\n",
> + adapter->tx_scrq, adapter->rx_scrq);
This is expected to happen for the condition you describe in the commit
message. Either prevent it from happening or silently ignore.
What's the impact to the user when this happens? Why would they want to
know that some pointer is NULL? Presumably there is already a message
printed when reset does not complete or such?
> + return -EINVAL;
> + }
> +
> for (i = 0; i < adapter->req_tx_queues; i++) {
> netdev_dbg(adapter->netdev, "Re-setting tx_scrq[%d]\n", i);
> rc = reset_one_sub_crq_queue(adapter, adapter->tx_scrq[i]);
^ permalink raw reply
* Re: [PATCH net 15/15] ibmvnic: add some debugs
From: Jakub Kicinski @ 2020-11-21 23:45 UTC (permalink / raw)
To: Lijun Pan; +Cc: netdev, sukadev, drt
In-Reply-To: <20201120224049.46933-16-ljp@linux.ibm.com>
On Fri, 20 Nov 2020 16:40:49 -0600 Lijun Pan wrote:
> From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
>
> We sometimes run into situations where a soft/hard reset of the adapter
> takes a long time or fails to complete. Having additional messages that
> include important adapter state info will hopefully help understand what
> is happening, reduce the guess work and minimize requests to reproduce
> problems with debug patches.
This doesn't qualify as a bug fix, please send it to net-next.
> + netdev_err(adapter->netdev,
> + "[S:%d FRR:%d WFR:%d] Done processing resets\n",
> + adapter->state, adapter->force_reset_recovery,
> + adapter->wait_for_reset);
Does reset only happen as a result of an error? Should this be a
netdev_info() instead?
^ permalink raw reply
* Re: [PATCH V2 net 4/4] net: ena: return error code from ena_xdp_xmit_buff
From: Jakub Kicinski @ 2020-11-21 23:53 UTC (permalink / raw)
To: Shay Agroskin
Cc: netdev, dwmw, zorik, matua, saeedb, msw, aliguori, nafea, gtzalik,
netanel, alisaidi, benh, akiyano, sameehj, ndagan
In-Reply-To: <20201119202851.28077-5-shayagr@amazon.com>
On Thu, 19 Nov 2020 22:28:51 +0200 Shay Agroskin wrote:
> The function mistakenly returns NETDEV_TX_OK regardless of the
> transmission success. This patch fixes this behavior by returning the
> error code from the function.
>
> Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
> Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Doesn't seem like a legitimate bug fix, since the only caller of this
function ignores its return value.
^ permalink raw reply
* Re: [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits
From: Jakub Kicinski @ 2020-11-21 23:58 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Leon Romanovsky, netdev, linux-rdma, Aya Levin, Moshe Shemesh
In-Reply-To: <20201120230339.651609-10-saeedm@nvidia.com>
On Fri, 20 Nov 2020 15:03:32 -0800 Saeed Mahameed wrote:
> From: Aya Levin <ayal@nvidia.com>
>
> Expose FW indication that it supports stateless offloads for IP over IP
> tunneled packets per direction. In some HW like ConnectX-4 IP-in-IP
> support is not symmetric, it supports steering on the inner header but
> it doesn't TX-Checksum and TSO. Add IP-in-IP capability per direction to
> cover this case as well.
What's the use for the rx capability in Linux? We don't have an API to
configure that AFAIK.
^ permalink raw reply
* Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
From: Jakub Kicinski @ 2020-11-22 0:01 UTC (permalink / raw)
To: Saeed Mahameed
Cc: Leon Romanovsky, netdev, linux-rdma, Eli Cohen, Mark Bloch,
Maor Gottlieb
In-Reply-To: <20201120230339.651609-12-saeedm@nvidia.com>
On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:
> From: Eli Cohen <eli@mellanox.com>
>
> Add a new namespace type to the NIC RX root namespace to allow for
> inserting VDPA rules before regular NIC but after bypass, thus allowing
> DPDK to have precedence in packet processing.
How does DPDK and VDPA relate in this context?
^ permalink raw reply
* Re: [PATCH] octeontx2-af: Add support for RSS hashing based on Transport protocol field
From: patchwork-bot+netdevbpf @ 2020-11-22 0:10 UTC (permalink / raw)
To: George Cherian
Cc: netdev, linux-kernel, kuba, davem, sgoutham, lcherian, gakula
In-Reply-To: <20201120093906.2873616-1-george.cherian@marvell.com>
Hello:
This patch was applied to netdev/net-next.git (refs/heads/master):
On Fri, 20 Nov 2020 15:09:06 +0530 you wrote:
> Add support to choose RSS flow key algorithm with IPv4 transport protocol
> field included in hashing input data. This will be enabled by default.
> There-by enabling 3/5 tuple hash
>
> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
> Signed-off-by: George Cherian <george.cherian@marvell.com>
>
> [...]
Here is the summary with links:
- octeontx2-af: Add support for RSS hashing based on Transport protocol field
https://git.kernel.org/netdev/net-next/c/f9e425e99b07
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* [PATCH] dpaa2-eth: Fix compile error due to missing devlink support
From: Ezequiel Garcia @ 2020-11-22 0:23 UTC (permalink / raw)
To: netdev
Cc: Jakub Kicinski, David S . Miller, Ioana Radulescu, Ioana Ciornei,
Ezequiel Garcia, kernel
The dpaa2 driver depends on devlink, so it should select
NET_DEVLINK in order to fix compile errors, such as:
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.o: in function `dpaa2_eth_rx_err':
dpaa2-eth.c:(.text+0x3cec): undefined reference to `devlink_trap_report'
drivers/net/ethernet/freescale/dpaa2/dpaa2-eth-devlink.o: in function `dpaa2_eth_dl_info_get':
dpaa2-eth-devlink.c:(.text+0x160): undefined reference to `devlink_info_driver_name_put'
Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
---
drivers/net/ethernet/freescale/dpaa2/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/freescale/dpaa2/Kconfig b/drivers/net/ethernet/freescale/dpaa2/Kconfig
index cfd369cf4c8c..aee59ead7250 100644
--- a/drivers/net/ethernet/freescale/dpaa2/Kconfig
+++ b/drivers/net/ethernet/freescale/dpaa2/Kconfig
@@ -2,6 +2,7 @@
config FSL_DPAA2_ETH
tristate "Freescale DPAA2 Ethernet"
depends on FSL_MC_BUS && FSL_MC_DPIO
+ select NET_DEVLINK
select PHYLINK
select PCS_LYNX
help
--
2.27.0
^ permalink raw reply related
* Re: [PATCH net-next v3 1/5] net: implement threaded-able napi poll loop support
From: Jakub Kicinski @ 2020-11-22 0:31 UTC (permalink / raw)
To: Wei Wang
Cc: David Miller, netdev, Eric Dumazet, Felix Fietkau, Paolo Abeni,
Hannes Frederic Sowa, Hillf Danton
In-Reply-To: <20201118191009.3406652-2-weiwan@google.com>
On Wed, 18 Nov 2020 11:10:05 -0800 Wei Wang wrote:
> +int napi_set_threaded(struct napi_struct *n, bool threaded)
> +{
> + ASSERT_RTNL();
> +
> + if (n->dev->flags & IFF_UP)
> + return -EBUSY;
> +
> + if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
> + return 0;
> + if (threaded)
> + set_bit(NAPI_STATE_THREADED, &n->state);
> + else
> + clear_bit(NAPI_STATE_THREADED, &n->state);
Do we really need the per-NAPI control here? Does anyone have use cases
where that makes sense? The user would be guessing which NAPI means
which queue and which bit, currently.
> + /* if the device is initializing, nothing todo */
> + if (test_bit(__LINK_STATE_START, &n->dev->state))
> + return 0;
> +
> + napi_thread_stop(n);
> + napi_thread_start(n);
> + return 0;
> +}
> +EXPORT_SYMBOL(napi_set_threaded);
Why was this exported? Do we still need that?
Please rejig the patches into a reviewable form. You can use the
Co-developed-by tag to give people credit on individual patches.
^ permalink raw reply
* Re: [PATCH net 1/4] netfilter: nftables_offload: set address type in control dissector
From: Jakub Kicinski @ 2020-11-22 0:44 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev
In-Reply-To: <20201121123601.21733-2-pablo@netfilter.org>
On Sat, 21 Nov 2020 13:35:58 +0100 Pablo Neira Ayuso wrote:
> If the address type is missing through the control dissector, then
> matching on IPv4 and IPv6 addresses does not work.
Doesn't work where? Are you talking about a specific driver?
> Set it accordingly so
> rules that specify an IP address succesfully match on packets.
>
> Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> include/net/netfilter/nf_tables_offload.h | 4 ++++
> net/netfilter/nf_tables_offload.c | 18 ++++++++++++++++++
> net/netfilter/nft_payload.c | 4 ++++
> 3 files changed, 26 insertions(+)
>
> diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
> index ea7d1d78b92d..bddd34c5bd79 100644
> --- a/include/net/netfilter/nf_tables_offload.h
> +++ b/include/net/netfilter/nf_tables_offload.h
> @@ -37,6 +37,7 @@ void nft_offload_update_dependency(struct nft_offload_ctx *ctx,
>
> struct nft_flow_key {
> struct flow_dissector_key_basic basic;
> + struct flow_dissector_key_control control;
> union {
> struct flow_dissector_key_ipv4_addrs ipv4;
> struct flow_dissector_key_ipv6_addrs ipv6;
> @@ -62,6 +63,9 @@ struct nft_flow_rule {
>
> #define NFT_OFFLOAD_F_ACTION (1 << 0)
>
> +void nft_flow_rule_set_addr_type(struct nft_flow_rule *flow,
> + enum flow_dissector_key_id addr_type);
> +
> struct nft_rule;
> struct nft_flow_rule *nft_flow_rule_create(struct net *net, const struct nft_rule *rule);
> void nft_flow_rule_destroy(struct nft_flow_rule *flow);
> diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
> index 9f625724a20f..9a3c5ac057b6 100644
> --- a/net/netfilter/nf_tables_offload.c
> +++ b/net/netfilter/nf_tables_offload.c
> @@ -28,6 +28,24 @@ static struct nft_flow_rule *nft_flow_rule_alloc(int num_actions)
> return flow;
> }
>
> +void nft_flow_rule_set_addr_type(struct nft_flow_rule *flow,
> + enum flow_dissector_key_id addr_type)
> +{
> + struct nft_flow_match *match = &flow->match;
> + struct nft_flow_key *mask = &match->mask;
> + struct nft_flow_key *key = &match->key;
> +
> + if (match->dissector.used_keys & BIT(FLOW_DISSECTOR_KEY_CONTROL))
> + return;
> +
> + key->control.addr_type = addr_type;
> + mask->control.addr_type = 0xffff;
> + match->dissector.used_keys |= BIT(FLOW_DISSECTOR_KEY_CONTROL);
> + match->dissector.offset[FLOW_DISSECTOR_KEY_CONTROL] =
> + offsetof(struct nft_flow_key, control);
Why is this injecting the match conditionally?
> +}
> +EXPORT_SYMBOL_GPL(nft_flow_rule_set_addr_type);
And why is this exported?
nf_tables-objs := nf_tables_core.o nf_tables_api.o nft_chain_filter.o \
nf_tables_trace.o nft_immediate.o nft_cmp.o nft_range.o \
nft_bitwise.o nft_byteorder.o nft_payload.o nft_lookup.o \
^^^^^^^^^^^^^
nft_dynset.o nft_meta.o nft_rt.o nft_exthdr.o \
nft_chain_route.o nf_tables_offload.o \
^^^^^^^^^^^^^^^^^^^
nft_set_hash.o nft_set_bitmap.o nft_set_rbtree.o \
nft_set_pipapo.o
These are linked together.
^ permalink raw reply
* Re: [PATCH net-next v3 00/12] net: dsa: microchip: PTP support for KSZ956x
From: Richard Cochran @ 2020-11-22 1:36 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Tristram.Ha, ceggers, kuba, andrew, robh+dt, vivien.didelot,
davem, kurt.kanzenbach, george.mccollister, marex, helmut.grohne,
pbarker, Codrin.Ciubotariu, Woojung.Huh, UNGLinuxDriver, netdev,
devicetree, linux-kernel
In-Reply-To: <20201121012611.r6h5zpd32pypczg3@skbuf>
On Sat, Nov 21, 2020 at 03:26:11AM +0200, Vladimir Oltean wrote:
> On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@microchip.com wrote:
> > The receive and transmit latencies are different for different connected speed. So the
> > driver needs to change them when the link changes. For that reason the PTP stack
> > should not use its own latency values as generally the application does not care about
> > the linked speed.
>
> The thing is, ptp4l already has ingressLatency and egressLatency
> settings, and I would not be surprised if those config options would get
> extended to cover values at multiple link speeds.
>
> In the general case, the ksz9477 MAC could be attached to any external
> PHY, having its own propagation delay characteristics, or any number of
> other things that cause clock domain crossings. I'm not sure how feasible
> it is for the kernel to abstract this away completely, and adjust
> timestamps automatically based on any and all combinations of MAC and
> PHY. Maybe this is just wishful thinking.
The idea that the driver will correctly adjust time stamps according
to link speed sounds nice in theory, but in practice it fails. There
is a at least one other driver that attempted this, but, surprise,
surprise, the hard coded correction values turned out to be wrong.
I think the best way would be to let user space monitor the link speed
and apply the matching correction value. That way, we avoid bogus,
hard coded values in kernel space. (This isn't implemented in
linuxptp, but it certainly could be.)
Thanks,
Richard
^ permalink raw reply
* Re: [PATCH net-next v3 00/12] net: dsa: microchip: PTP support for KSZ956x
From: Richard Cochran @ 2020-11-22 1:53 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Tristram.Ha, ceggers, kuba, andrew, robh+dt, vivien.didelot,
davem, kurt.kanzenbach, george.mccollister, marex, helmut.grohne,
pbarker, Codrin.Ciubotariu, Woojung.Huh, UNGLinuxDriver, netdev,
devicetree, linux-kernel
In-Reply-To: <20201121012611.r6h5zpd32pypczg3@skbuf>
On Sat, Nov 21, 2020 at 03:26:11AM +0200, Vladimir Oltean wrote:
> On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@microchip.com wrote:
> > I think the right implementation is for the driver to remember this receive timestamp
> > of Pdelay_Req and puts it in the tail tag when it sees a 1-step Pdelay_Resp is sent.
As long as this is transparent to user space, it could work. Remember
that user space simply copies the correction field from the Request
into the Response. If the driver correctly accumulates the turnaround
time into the correction field of the response, then all is well.
> I have mixed feelings about this. IIUC, you're saying "let's implement a
> fixed-size FIFO of RX timestamps of Pdelay_Req messages, and let's match
> them on TX to Pdelay_Resp messages, by {sequenceId, domainNumber}."
>
> But how deep should we make that FIFO? I.e. how many Pdelay_Req messages
> should we expect before the user space will inject back a Pdelay_Resp
> for transmission?
Good question. Normally you would expect just one Request pending at
any one time, but nothing guarantees that, and so the driver would
have to match the Req/Resp exactly and deal with rogue/buggy requests
and responses.
Thanks,
Richard
^ permalink raw reply
* [net-next 0/5] Add CHACHA20-POLY1305 cipher to Kernel TLS
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
RFC 7905 defines usage of ChaCha20-Poly1305 in TLS connections. This
cipher is widely used nowadays and it's good to have a support for it
in TLS connections in kernel
Vadim Fedorenko (5):
net/tls: make inline helpers protocol-aware
net/tls: add CHACHA20-POLY1305 specific defines and structures
net/tls: add CHACHA20-POLY1305 specific behavior
net/tls: add CHACHA20-POLY1305 configuration
selftests/tls: add CHACHA20-POLY1305 to tls selftests
include/net/tls.h | 32 ++++++++++++++++---------------
include/uapi/linux/tls.h | 15 +++++++++++++++
net/tls/tls_device.c | 2 +-
net/tls/tls_device_fallback.c | 13 +++++++------
net/tls/tls_main.c | 3 +++
net/tls/tls_sw.c | 36 ++++++++++++++++++++++++++---------
tools/testing/selftests/net/tls.c | 40 ++++++++++++++++++++++++++++++++-------
7 files changed, 103 insertions(+), 38 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [net-next 1/5] net/tls: make inline helpers protocol-aware
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
In-Reply-To: <1606010265-30471-1-git-send-email-vfedorenko@novek.ru>
Inline functions defined in tls.h have a lot of AES-specific
constants. Remove these constants and change argument to struct
tls_prot_info to have an access to cipher type in later patches
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
include/net/tls.h | 26 ++++++++++++--------------
net/tls/tls_device.c | 2 +-
net/tls/tls_device_fallback.c | 13 +++++++------
net/tls/tls_sw.c | 12 +++++-------
4 files changed, 25 insertions(+), 28 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index cf14730..d04ce73 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -502,31 +502,30 @@ static inline void tls_advance_record_sn(struct sock *sk,
tls_err_abort(sk, EBADMSG);
if (prot->version != TLS_1_3_VERSION)
- tls_bigint_increment(ctx->iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE,
+ tls_bigint_increment(ctx->iv + prot->salt_size,
prot->iv_size);
}
static inline void tls_fill_prepend(struct tls_context *ctx,
char *buf,
size_t plaintext_len,
- unsigned char record_type,
- int version)
+ unsigned char record_type)
{
struct tls_prot_info *prot = &ctx->prot_info;
size_t pkt_len, iv_size = prot->iv_size;
pkt_len = plaintext_len + prot->tag_size;
- if (version != TLS_1_3_VERSION) {
+ if (prot->version != TLS_1_3_VERSION) {
pkt_len += iv_size;
memcpy(buf + TLS_NONCE_OFFSET,
- ctx->tx.iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, iv_size);
+ ctx->tx.iv + prot->salt_size, iv_size);
}
/* we cover nonce explicit here as well, so buf should be of
* size KTLS_DTLS_HEADER_SIZE + KTLS_DTLS_NONCE_EXPLICIT_SIZE
*/
- buf[0] = version == TLS_1_3_VERSION ?
+ buf[0] = prot->version == TLS_1_3_VERSION ?
TLS_RECORD_TYPE_DATA : record_type;
/* Note that VERSION must be TLS_1_2 for both TLS1.2 and TLS1.3 */
buf[1] = TLS_1_2_VERSION_MINOR;
@@ -539,18 +538,17 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
static inline void tls_make_aad(char *buf,
size_t size,
char *record_sequence,
- int record_sequence_size,
unsigned char record_type,
- int version)
+ struct tls_prot_info *prot)
{
- if (version != TLS_1_3_VERSION) {
- memcpy(buf, record_sequence, record_sequence_size);
+ if (prot->version != TLS_1_3_VERSION) {
+ memcpy(buf, record_sequence, prot->rec_seq_size);
buf += 8;
} else {
- size += TLS_CIPHER_AES_GCM_128_TAG_SIZE;
+ size += prot->tag_size;
}
- buf[0] = version == TLS_1_3_VERSION ?
+ buf[0] = prot->version == TLS_1_3_VERSION ?
TLS_RECORD_TYPE_DATA : record_type;
buf[1] = TLS_1_2_VERSION_MAJOR;
buf[2] = TLS_1_2_VERSION_MINOR;
@@ -558,11 +556,11 @@ static inline void tls_make_aad(char *buf,
buf[4] = size & 0xFF;
}
-static inline void xor_iv_with_seq(int version, char *iv, char *seq)
+static inline void xor_iv_with_seq(struct tls_prot_info *prot, char *iv, char *seq)
{
int i;
- if (version == TLS_1_3_VERSION) {
+ if (prot->version == TLS_1_3_VERSION) {
for (i = 0; i < 8; i++)
iv[i + 4] ^= seq[i];
}
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 54d3e16..6f93ad5 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -327,7 +327,7 @@ static int tls_device_record_close(struct sock *sk,
/* fill prepend */
tls_fill_prepend(ctx, skb_frag_address(&record->frags[0]),
record->len - prot->overhead_size,
- record_type, prot->version);
+ record_type);
return ret;
}
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 2889533..d946817 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -49,7 +49,8 @@ static int tls_enc_record(struct aead_request *aead_req,
struct crypto_aead *aead, char *aad,
char *iv, __be64 rcd_sn,
struct scatter_walk *in,
- struct scatter_walk *out, int *in_len)
+ struct scatter_walk *out, int *in_len,
+ struct tls_prot_info *prot)
{
unsigned char buf[TLS_HEADER_SIZE + TLS_CIPHER_AES_GCM_128_IV_SIZE];
struct scatterlist sg_in[3];
@@ -73,8 +74,7 @@ static int tls_enc_record(struct aead_request *aead_req,
len -= TLS_CIPHER_AES_GCM_128_IV_SIZE;
tls_make_aad(aad, len - TLS_CIPHER_AES_GCM_128_TAG_SIZE,
- (char *)&rcd_sn, sizeof(rcd_sn), buf[0],
- TLS_1_2_VERSION);
+ (char *)&rcd_sn, buf[0], prot);
memcpy(iv + TLS_CIPHER_AES_GCM_128_SALT_SIZE, buf + TLS_HEADER_SIZE,
TLS_CIPHER_AES_GCM_128_IV_SIZE);
@@ -140,7 +140,7 @@ static struct aead_request *tls_alloc_aead_request(struct crypto_aead *aead,
static int tls_enc_records(struct aead_request *aead_req,
struct crypto_aead *aead, struct scatterlist *sg_in,
struct scatterlist *sg_out, char *aad, char *iv,
- u64 rcd_sn, int len)
+ u64 rcd_sn, int len, struct tls_prot_info *prot)
{
struct scatter_walk out, in;
int rc;
@@ -150,7 +150,7 @@ static int tls_enc_records(struct aead_request *aead_req,
do {
rc = tls_enc_record(aead_req, aead, aad, iv,
- cpu_to_be64(rcd_sn), &in, &out, &len);
+ cpu_to_be64(rcd_sn), &in, &out, &len, prot);
rcd_sn++;
} while (rc == 0 && len);
@@ -348,7 +348,8 @@ static struct sk_buff *tls_enc_skb(struct tls_context *tls_ctx,
payload_len, sync_size, dummy_buf);
if (tls_enc_records(aead_req, ctx->aead_send, sg_in, sg_out, aad, iv,
- rcd_sn, sync_size + payload_len) < 0)
+ rcd_sn, sync_size + payload_len,
+ &tls_ctx->prot_info) < 0)
goto free_nskb;
complete_skb(nskb, skb, tcp_payload_offset);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2fe9e2c..6bc757a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -505,7 +505,7 @@ static int tls_do_encryption(struct sock *sk,
memcpy(&rec->iv_data[iv_offset], tls_ctx->tx.iv,
prot->iv_size + prot->salt_size);
- xor_iv_with_seq(prot->version, rec->iv_data, tls_ctx->tx.rec_seq);
+ xor_iv_with_seq(prot, rec->iv_data, tls_ctx->tx.rec_seq);
sge->offset += prot->prepend_size;
sge->length -= prot->prepend_size;
@@ -748,14 +748,13 @@ static int tls_push_record(struct sock *sk, int flags,
sg_chain(rec->sg_aead_out, 2, &msg_en->sg.data[i]);
tls_make_aad(rec->aad_space, msg_pl->sg.size + prot->tail_size,
- tls_ctx->tx.rec_seq, prot->rec_seq_size,
- record_type, prot->version);
+ tls_ctx->tx.rec_seq, record_type, prot);
tls_fill_prepend(tls_ctx,
page_address(sg_page(&msg_en->sg.data[i])) +
msg_en->sg.data[i].offset,
msg_pl->sg.size + prot->tail_size,
- record_type, prot->version);
+ record_type);
tls_ctx->pending_open_record_frags = false;
@@ -1471,13 +1470,12 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
else
memcpy(iv + iv_offset, tls_ctx->rx.iv, prot->salt_size);
- xor_iv_with_seq(prot->version, iv, tls_ctx->rx.rec_seq);
+ xor_iv_with_seq(prot, iv, tls_ctx->rx.rec_seq);
/* Prepare AAD */
tls_make_aad(aad, rxm->full_len - prot->overhead_size +
prot->tail_size,
- tls_ctx->rx.rec_seq, prot->rec_seq_size,
- ctx->control, prot->version);
+ tls_ctx->rx.rec_seq, ctx->control, prot);
/* Prepare sgin */
sg_init_table(sgin, n_sgin);
--
1.8.3.1
^ permalink raw reply related
* [net-next 2/5] net/tls: add CHACHA20-POLY1305 specific defines and structures
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
In-Reply-To: <1606010265-30471-1-git-send-email-vfedorenko@novek.ru>
To provide support for ChaCha-Poly cipher we need to define
specific constants and structures.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
include/net/tls.h | 1 +
include/uapi/linux/tls.h | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/include/net/tls.h b/include/net/tls.h
index d04ce73..e4e9c2a 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -211,6 +211,7 @@ struct cipher_context {
union {
struct tls12_crypto_info_aes_gcm_128 aes_gcm_128;
struct tls12_crypto_info_aes_gcm_256 aes_gcm_256;
+ struct tls12_crypto_info_chacha20_poly1305 chacha20_poly1305;
};
};
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index bcd2869..0d54bae 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -77,6 +77,13 @@
#define TLS_CIPHER_AES_CCM_128_TAG_SIZE 16
#define TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE 8
+#define TLS_CIPHER_CHACHA20_POLY1305 54
+#define TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE 12
+#define TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE 32
+#define TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE 0
+#define TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE 16
+#define TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE 8
+
#define TLS_SET_RECORD_TYPE 1
#define TLS_GET_RECORD_TYPE 2
@@ -109,6 +116,14 @@ struct tls12_crypto_info_aes_ccm_128 {
unsigned char rec_seq[TLS_CIPHER_AES_CCM_128_REC_SEQ_SIZE];
};
+struct tls12_crypto_info_chacha20_poly1305 {
+ struct tls_crypto_info info;
+ unsigned char iv[TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE];
+ unsigned char key[TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE];
+ unsigned char salt[TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE];
+ unsigned char rec_seq[TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE];
+};
+
enum {
TLS_INFO_UNSPEC,
TLS_INFO_VERSION,
--
1.8.3.1
^ permalink raw reply related
* [net-next 4/5] net/tls: add CHACHA20-POLY1305 configuration
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
In-Reply-To: <1606010265-30471-1-git-send-email-vfedorenko@novek.ru>
Add ChaCha-Poly specific configuration code.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
net/tls/tls_main.c | 3 +++
net/tls/tls_sw.c | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 8d93cea..47b7c53 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -521,6 +521,9 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
case TLS_CIPHER_AES_CCM_128:
optsize = sizeof(struct tls12_crypto_info_aes_ccm_128);
break;
+ case TLS_CIPHER_CHACHA20_POLY1305:
+ optsize = sizeof(struct tls12_crypto_info_chacha20_poly1305);
+ break;
default:
rc = -EINVAL;
goto err_crypto_info;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index b4eefdb..6bb33d5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2290,6 +2290,7 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
struct tls12_crypto_info_aes_gcm_128 *gcm_128_info;
struct tls12_crypto_info_aes_gcm_256 *gcm_256_info;
struct tls12_crypto_info_aes_ccm_128 *ccm_128_info;
+ struct tls12_crypto_info_chacha20_poly1305 *chacha20_poly1305_info;
struct tls_sw_context_tx *sw_ctx_tx = NULL;
struct tls_sw_context_rx *sw_ctx_rx = NULL;
struct cipher_context *cctx;
@@ -2402,6 +2403,23 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
cipher_name = "ccm(aes)";
break;
}
+ case TLS_CIPHER_CHACHA20_POLY1305: {
+ nonce_size = 0;
+ tag_size = TLS_CIPHER_CHACHA20_POLY1305_TAG_SIZE;
+ iv_size = TLS_CIPHER_CHACHA20_POLY1305_IV_SIZE;
+ iv = ((struct tls12_crypto_info_chacha20_poly1305 *)crypto_info)->iv;
+ rec_seq_size = TLS_CIPHER_CHACHA20_POLY1305_REC_SEQ_SIZE;
+ rec_seq =
+ ((struct tls12_crypto_info_chacha20_poly1305 *)crypto_info)->rec_seq;
+ chacha20_poly1305_info =
+ (struct tls12_crypto_info_chacha20_poly1305 *)crypto_info;
+ keysize = TLS_CIPHER_CHACHA20_POLY1305_KEY_SIZE;
+ key = chacha20_poly1305_info->key;
+ salt = chacha20_poly1305_info->salt;
+ salt_size = TLS_CIPHER_CHACHA20_POLY1305_SALT_SIZE;
+ cipher_name = "rfc7539(chacha20,poly1305)";
+ break;
+ }
default:
rc = -EINVAL;
goto free_priv;
--
1.8.3.1
^ permalink raw reply related
* [net-next 3/5] net/tls: add CHACHA20-POLY1305 specific behavior
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
In-Reply-To: <1606010265-30471-1-git-send-email-vfedorenko@novek.ru>
RFC 7905 defines special behavior for ChaCha-Poly TLS sessions.
The differences are in the calculation of nonce and the absence
of explicit IV. This behavior is like TLSv1.3 partly.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
include/net/tls.h | 9 ++++++---
net/tls/tls_sw.c | 6 ++++--
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index e4e9c2a..b2637ed 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -502,7 +502,8 @@ static inline void tls_advance_record_sn(struct sock *sk,
if (tls_bigint_increment(ctx->rec_seq, prot->rec_seq_size))
tls_err_abort(sk, EBADMSG);
- if (prot->version != TLS_1_3_VERSION)
+ if (prot->version != TLS_1_3_VERSION &&
+ prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305)
tls_bigint_increment(ctx->iv + prot->salt_size,
prot->iv_size);
}
@@ -516,7 +517,8 @@ static inline void tls_fill_prepend(struct tls_context *ctx,
size_t pkt_len, iv_size = prot->iv_size;
pkt_len = plaintext_len + prot->tag_size;
- if (prot->version != TLS_1_3_VERSION) {
+ if (prot->version != TLS_1_3_VERSION &&
+ prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305) {
pkt_len += iv_size;
memcpy(buf + TLS_NONCE_OFFSET,
@@ -561,7 +563,8 @@ static inline void xor_iv_with_seq(struct tls_prot_info *prot, char *iv, char *s
{
int i;
- if (prot->version == TLS_1_3_VERSION) {
+ if (prot->version == TLS_1_3_VERSION ||
+ prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305) {
for (i = 0; i < 8; i++)
iv[i + 4] ^= seq[i];
}
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6bc757a..b4eefdb 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1464,7 +1464,8 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
kfree(mem);
return err;
}
- if (prot->version == TLS_1_3_VERSION)
+ if (prot->version == TLS_1_3_VERSION ||
+ prot->cipher_type == TLS_CIPHER_CHACHA20_POLY1305)
memcpy(iv + iv_offset, tls_ctx->rx.iv,
crypto_aead_ivsize(ctx->aead_recv));
else
@@ -2068,7 +2069,8 @@ static int tls_read_size(struct strparser *strp, struct sk_buff *skb)
data_len = ((header[4] & 0xFF) | (header[3] << 8));
cipher_overhead = prot->tag_size;
- if (prot->version != TLS_1_3_VERSION)
+ if (prot->version != TLS_1_3_VERSION &&
+ prot->cipher_type != TLS_CIPHER_CHACHA20_POLY1305)
cipher_overhead += prot->iv_size;
if (data_len > TLS_MAX_PAYLOAD_SIZE + cipher_overhead +
--
1.8.3.1
^ permalink raw reply related
* [net-next 5/5] selftests/tls: add CHACHA20-POLY1305 to tls selftests
From: Vadim Fedorenko @ 2020-11-22 1:57 UTC (permalink / raw)
To: Jakub Kicinski, Boris Pismenny, Aviad Yehezkel; +Cc: Vadim Fedorenko, netdev
In-Reply-To: <1606010265-30471-1-git-send-email-vfedorenko@novek.ru>
Add new cipher as a variant of standart tls selftests
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
---
tools/testing/selftests/net/tls.c | 40 ++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index b599f1f..49d8ade 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -103,32 +103,58 @@
FIXTURE_VARIANT(tls)
{
- unsigned int tls_version;
+ u16 tls_version;
+ u16 cipher_type;
};
-FIXTURE_VARIANT_ADD(tls, 12)
+FIXTURE_VARIANT_ADD(tls, 12_gcm)
{
.tls_version = TLS_1_2_VERSION,
+ .cipher_type = TLS_CIPHER_AES_GCM_128,
};
-FIXTURE_VARIANT_ADD(tls, 13)
+FIXTURE_VARIANT_ADD(tls, 13_gcm)
{
.tls_version = TLS_1_3_VERSION,
+ .cipher_type = TLS_CIPHER_AES_GCM_128,
+};
+
+FIXTURE_VARIANT_ADD(tls, 12_chacha)
+{
+ .tls_version = TLS_1_2_VERSION,
+ .cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
+};
+
+FIXTURE_VARIANT_ADD(tls, 13_chacha)
+{
+ .tls_version = TLS_1_3_VERSION,
+ .cipher_type = TLS_CIPHER_CHACHA20_POLY1305,
};
FIXTURE_SETUP(tls)
{
- struct tls12_crypto_info_aes_gcm_128 tls12;
+ union tls_crypto_context tls12;
struct sockaddr_in addr;
socklen_t len;
int sfd, ret;
+ size_t tls12_sz;
self->notls = false;
len = sizeof(addr);
memset(&tls12, 0, sizeof(tls12));
tls12.info.version = variant->tls_version;
- tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+ tls12.info.cipher_type = variant->cipher_type;
+ switch(variant->cipher_type) {
+ case TLS_CIPHER_CHACHA20_POLY1305:
+ tls12_sz = sizeof(tls12_crypto_info_chacha20_poly1305);
+ break;
+ case TLS_CIPHER_AES_GCM_128:
+ tls12_sz = sizeof(tls12_crypto_info_aes_gcm_128);
+ break;
+ default:
+ tls12_sz = 0;
+ }
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);
@@ -156,7 +182,7 @@
if (!self->notls) {
ret = setsockopt(self->fd, SOL_TLS, TLS_TX, &tls12,
- sizeof(tls12));
+ tls12_sz);
ASSERT_EQ(ret, 0);
}
@@ -169,7 +195,7 @@
ASSERT_EQ(ret, 0);
ret = setsockopt(self->cfd, SOL_TLS, TLS_RX, &tls12,
- sizeof(tls12));
+ tls12_sz);
ASSERT_EQ(ret, 0);
}
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH net-next v3 1/5] net: implement threaded-able napi poll loop support
From: Wei Wang @ 2020-11-22 2:23 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David Miller, Linux Kernel Network Developers, Eric Dumazet,
Felix Fietkau, Paolo Abeni, Hannes Frederic Sowa, Hillf Danton
In-Reply-To: <20201121163136.024d636c@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
On Sat, Nov 21, 2020 at 4:31 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 18 Nov 2020 11:10:05 -0800 Wei Wang wrote:
> > +int napi_set_threaded(struct napi_struct *n, bool threaded)
> > +{
> > + ASSERT_RTNL();
> > +
> > + if (n->dev->flags & IFF_UP)
> > + return -EBUSY;
> > +
> > + if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
> > + return 0;
> > + if (threaded)
> > + set_bit(NAPI_STATE_THREADED, &n->state);
> > + else
> > + clear_bit(NAPI_STATE_THREADED, &n->state);
>
> Do we really need the per-NAPI control here? Does anyone have use cases
> where that makes sense? The user would be guessing which NAPI means
> which queue and which bit, currently.
Thanks for reviewing this.
I think one use case might be that if the driver uses separate napi
for tx and rx, one might want to only enable threaded mode for rx, and
leave tx completion in interrupt mode.
>
> > + /* if the device is initializing, nothing todo */
> > + if (test_bit(__LINK_STATE_START, &n->dev->state))
> > + return 0;
> > +
> > + napi_thread_stop(n);
> > + napi_thread_start(n);
> > + return 0;
> > +}
> > +EXPORT_SYMBOL(napi_set_threaded);
>
> Why was this exported? Do we still need that?
>
Yea. I guess it is not needed.
> Please rejig the patches into a reviewable form. You can use the
> Co-developed-by tag to give people credit on individual patches.
Will do. Thanks!
^ permalink raw reply
* [PATCH net] ipv6: addrlabel: fix possible memory leak in ip6addrlbl_net_init
From: Wang Hai @ 2020-11-22 2:34 UTC (permalink / raw)
To: davem, kuznet, yoshfuji, kuba; +Cc: netdev, linux-kernel
kmemleak report a memory leak as follows:
unreferenced object 0xffff8880059c6a00 (size 64):
comm "ip", pid 23696, jiffies 4296590183 (age 1755.384s)
hex dump (first 32 bytes):
20 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00 ...............
1c 00 00 00 00 00 00 00 00 00 00 00 07 00 00 00 ................
backtrace:
[<00000000aa4e7a87>] ip6addrlbl_add+0x90/0xbb0
[<0000000070b8d7f1>] ip6addrlbl_net_init+0x109/0x170
[<000000006a9ca9d4>] ops_init+0xa8/0x3c0
[<000000002da57bf2>] setup_net+0x2de/0x7e0
[<000000004e52d573>] copy_net_ns+0x27d/0x530
[<00000000b07ae2b4>] create_new_namespaces+0x382/0xa30
[<000000003b76d36f>] unshare_nsproxy_namespaces+0xa1/0x1d0
[<0000000030653721>] ksys_unshare+0x3a4/0x780
[<0000000007e82e40>] __x64_sys_unshare+0x2d/0x40
[<0000000031a10c08>] do_syscall_64+0x33/0x40
[<0000000099df30e7>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
We should free all rules when we catch an error in ip6addrlbl_net_init().
otherwise a memory leak will occur.
Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
net/ipv6/addrlabel.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 642fc6ac13d2..637e323a0224 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -306,6 +306,8 @@ static int ip6addrlbl_del(struct net *net,
/* add default label */
static int __net_init ip6addrlbl_net_init(struct net *net)
{
+ struct ip6addrlbl_entry *p = NULL;
+ struct hlist_node *n;
int err = 0;
int i;
@@ -320,9 +322,17 @@ static int __net_init ip6addrlbl_net_init(struct net *net)
ip6addrlbl_init_table[i].prefixlen,
0,
ip6addrlbl_init_table[i].label, 0);
- /* XXX: should we free all rules when we catch an error? */
- if (ret && (!err || err != -ENOMEM))
+ if (ret && (!err || err != -ENOMEM)) {
err = ret;
+ goto err_ip6addrlbl_add;
+ }
+ }
+ return err;
+
+err_ip6addrlbl_add:
+ hlist_for_each_entry_safe(p, n, &net->ipv6.ip6addrlbl_table.head, list) {
+ hlist_del_rcu(&p->list);
+ kfree_rcu(p, rcu);
}
return err;
}
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox