* Re: [PATCH net-next 2/2 v9] net: ethernet: Add a driver for Gemini gigabit ethernet
From: Linus Walleij @ 2017-12-18 20:55 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Michał Mirosław, Tobias Waldvogel, Florian Fainelli,
Paulius Zaleckas, netdev, Hans Ulli Kroll, Janos Laube,
David S . Miller, Linux ARM
In-Reply-To: <20171218145403.GE10595@n2100.armlinux.org.uk>
On Mon, Dec 18, 2017 at 3:54 PM, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Mon, Dec 18, 2017 at 03:48:17PM +0100, Michał Mirosław wrote:
>> On Mon, Dec 18, 2017 at 02:57:37PM +0100, Linus Walleij wrote:
>> > On Sat, Dec 16, 2017 at 8:39 PM, Linus Walleij <linus.walleij@linaro.org> wrote:
>> >
>> > > The Gemini ethernet has been around for years as an out-of-tree
>> > > patch used with the NAS boxen and routers built on StorLink
>> > > SL3512 and SL3516, later Storm Semiconductor, later Cortina
>> > > Systems. These ASICs are still being deployed and brand new
>> > > off-the-shelf systems using it can easily be acquired.
>> [...]
>> > > ---
>> > > Changes from v8:
>> > > - Remove dependency guards in Kconfig to get a wider compile
>> > > coverage for the driver to detect broken APIs etc.
>> >
>> > I guess we need to hold this off for a while, the code does
>> > some weird stuff using the ARM-internal page DMA mapping
>> > API.
>> >
>> > I *think* what happens is that the driver allocates a global queue
>> > used for RX and TX on both interfaces, then initializes that with
>> > page pointers and gives that to the hardware to play with.
>> >
>> > When an RX packet comes in, the RX routine needs to figure
>> > out from the DMA (physical) address which remapped
>> > page/address this random physical address pointer
>> > corresponds to.
>> >
>> > The Linux DMA API assumption is that the driver keeps track
>> > of this mapping, not the hardware. So we need to figure out
>> > a way to reverse-map this. Preferably quickly, and without
>> > using any ARM-internal mapping APIs.
>>
>> IIRC, the hardware copies descriptors from free queue (FREEQ)
>> to RX queues. FREEQ is shared among the two ethernet ports.
Seems like that to me too. I will try to refactor and break it
apart a bit.
The way freeq works is undocumented, even in the official
datasheet for CS3516 (the memory area is just "reserved"),
so the code is the only documentation of it.
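To make the reverse-mapping idea concrete: one option that stays within the public DMA API is to record each buffer's DMA address at the moment it is queued, and to translate RX descriptor addresses through that record. The following is only a hypothetical userspace model, not code from the Gemini driver. The names freeq_entry, freeq_map and freeq_lookup are invented, uint64_t stands in for dma_addr_t, one page-aligned 4 KiB buffer per entry is assumed, and the linear scan is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FREEQ_ENTRIES 8   /* hypothetical queue depth */
#define BUF_SHIFT     12  /* assumption: one page-aligned 4 KiB buffer per entry */

struct freeq_entry {
	uint64_t dma;     /* bus address handed to the hardware */
	void *cpu;        /* CPU-side buffer owned by the driver */
};

static struct freeq_entry freeq[FREEQ_ENTRIES];

/* Record the mapping when the buffer is queued to the hardware. */
static void freeq_map(unsigned int idx, uint64_t dma, void *cpu)
{
	freeq[idx].dma = dma;
	freeq[idx].cpu = cpu;
}

/*
 * Translate a DMA address found in an RX descriptor back to a CPU
 * pointer. Returns NULL if the address belongs to no known buffer.
 */
static void *freeq_lookup(uint64_t dma)
{
	uint64_t base = dma & ~((UINT64_C(1) << BUF_SHIFT) - 1);
	uint64_t off = dma - base;
	size_t i;

	for (i = 0; i < FREEQ_ENTRIES; i++) {
		if (freeq[i].cpu && freeq[i].dma == base)
			return (char *)freeq[i].cpu + off;
	}
	return NULL;
}
```

If the buffers happened to be mapped at consecutive DMA addresses, the scan could become a subtraction and a shift, but that is an additional assumption the hardware documentation does not confirm.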
>> This platform is CPU bound, so every additional lookup will
>> hit performance here. In my version I had an #ifdef for
>> COMPILE_TEST that replaced ARM-specific calls with stubs.
>> Since the driver is not expected to work on other platforms,
>> this seemed like the best workaround to make it compile
>> on other arches.
>
> Really. No. Stop going beneath the covers and using ARM private
> implementation APIs in drivers.
>
> Take that as a big NAK to that.
Don't worry, it won't happen. I am already thinking about better
approaches that stay with the public DMA-API.
> (I don't seem have the patch in question here to look at though.)
I'll put you on CC in future postings.
Yours,
Linus Walleij
* linux-next: Signed-off-by missing for commits in the net-next tree
From: Stephen Rothwell @ 2017-12-18 20:41 UTC (permalink / raw)
To: David Miller, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List, Bert Kenward,
Edward Cree
Hi all,
Commits
d8d8ccf27741 ("sfc: update EF10 register definitions")
0bc959a95e8c ("sfc: populate the timer reload field")
are missing a Signed-off-by from their author.
--
Cheers,
Stephen Rothwell
* Re: [PATCH] qed: Remove unused QED_RDMA_DEV_CAP_* symbols and dev->dev_caps
From: David Miller @ 2017-12-18 20:25 UTC (permalink / raw)
To: helgaas; +Cc: netdev, linux-pci, Ariel.Elior, everest-linux-l2
In-Reply-To: <20171218.151354.978438255508508144.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Mon, 18 Dec 2017 15:13:54 -0500 (EST)
> From: Bjorn Helgaas <helgaas@kernel.org>
> Date: Fri, 15 Dec 2017 17:03:01 -0600
>
>> From: Bjorn Helgaas <bhelgaas@google.com>
>>
>> The QED_RDMA_DEV_CAP_* symbols are only used to set bits in dev->dev_caps.
>> Nobody ever looks at those bits. Remove the symbols and dev_caps itself.
>>
>> Note that if these are ever used and added back, it looks incorrect to set
>> QED_RDMA_DEV_CAP_ATOMIC_OP based on PCI_EXP_DEVCTL2_LTR_EN. LTR is the
>> Latency Tolerance Reporting mechanism, which has nothing to do with Atomic
>> Ops.
>>
>> No functional change intended.
>>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Applied to net-next.
Actually, this doesn't build, reverted:
drivers/infiniband/hw/qedr/main.c: In function ‘qedr_set_device_attr’:
drivers/infiniband/hw/qedr/main.c:682:27: error: ‘struct qed_rdma_device’ has no member named ‘dev_caps’
attr->dev_caps = qed_attr->dev_caps;
* Re: [bpf-next V1-RFC PATCH 08/14] nfp: setup xdp_rxq_info
From: Jesper Dangaard Brouer @ 2017-12-18 20:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Daniel Borkmann, Alexei Starovoitov, dsahern, oss-drivers,
Simon Horman, netdev, gospo, bjorn.topel, michael.chan, brouer
In-Reply-To: <20171213183427.213f6206@cakuba.netronome.com>
On Wed, 13 Dec 2017 18:34:27 -0800
Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Wed, 13 Dec 2017 12:20:01 +0100, Jesper Dangaard Brouer wrote:
> > Driver hook points for xdp_rxq_info:
> > * init+reg: nfp_net_rx_ring_alloc
> > * unreg : nfp_net_rx_ring_free
> >
> > In struct nfp_net_rx_ring moved member @size into a hole on 64-bit.
> > Thus, the size remaines the same after adding member @xdp_rxq.
> >
> > Cc: oss-drivers@netronome.com
> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Cc: Simon Horman <simon.horman@netronome.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> > index 3801c52098d5..0e564cfabe7e 100644
> > --- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
> > +++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
> > @@ -47,6 +47,7 @@
> > #include <linux/netdevice.h>
> > #include <linux/pci.h>
> > #include <linux/io-64-nonatomic-hi-lo.h>
> > +#include <net/xdp.h>
> >
> > #include "nfp_net_ctrl.h"
> >
> > @@ -350,6 +351,7 @@ struct nfp_net_rx_buf {
> > * @rxds: Virtual address of FL/RX ring in host memory
> > * @dma: DMA address of the FL/RX ring
> > * @size: Size, in bytes, of the FL/RX ring (needed to free)
> > + * @xdp_rxq: RX-ring info avail for XDP
> > */
> > struct nfp_net_rx_ring {
> > struct nfp_net_r_vector *r_vec;
> > @@ -361,13 +363,14 @@ struct nfp_net_rx_ring {
> > u32 idx;
> >
> > int fl_qcidx;
> > + unsigned int size;
> > u8 __iomem *qcp_fl;
> >
> > struct nfp_net_rx_buf *rxbufs;
> > struct nfp_net_rx_desc *rxds;
> >
> > dma_addr_t dma;
> > - unsigned int size;
> > + struct xdp_rxq_info xdp_rxq;
> > } ____cacheline_aligned;
>
> The @size member is not in the hole on purpose. IIRC all the members
> up to @dma are in the first cacheline. All things which are not
> needed on the fast path are after @dma. IOW @size is not used on the
> fast path and the hole is for fast path stuff :)
Yes, I did notice @size was not used on the fast path, but it didn't hurt
to move it up. I was just excited to see I could add this without
increasing the rx_ring struct size.
I'm leaning more and more towards Ahern's suggestion of returning an error,
and if I do so, I also want to do a proper allocation of xdp_rxq_info,
which means this will be converted into a pointer instead (and thus
have a much smaller effect on the rx_ring size).
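The size effect discussed here can be checked with offsetof() and sizeof(). The structs below are hypothetical, trimmed-down stand-ins and not the real nfp_net_rx_ring. They only show how moving a 4-byte member into an alignment hole leaves room for one new pointer-sized member without growing the struct, assuming an LP64 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the change (offsets are for LP64). */
struct ring_before {
	uint32_t cnt;       /* offset 0 */
	uint32_t idx;       /* offset 4 */
	int32_t fl_qcidx;   /* offset 8, then a 4-byte hole */
	void *qcp_fl;       /* offset 16 */
	uint64_t dma;       /* offset 24 */
	uint32_t size;      /* offset 32, then 4 bytes of tail padding */
};

/* Same members with size moved into the hole and a pointer appended. */
struct ring_after {
	uint32_t cnt;       /* offset 0 */
	uint32_t idx;       /* offset 4 */
	int32_t fl_qcidx;   /* offset 8 */
	uint32_t size;      /* offset 12, fills the former hole */
	void *qcp_fl;       /* offset 16 */
	uint64_t dma;       /* offset 24 */
	void *xdp_rxq;      /* offset 32, stand-in for a pointer to xdp_rxq_info */
};

/* Number of padding bytes between fl_qcidx and qcp_fl in the old layout. */
static size_t hole_after_fl_qcidx(void)
{
	return offsetof(struct ring_before, qcp_fl) -
	       (offsetof(struct ring_before, fl_qcidx) + sizeof(int32_t));
}
```

Whether a hole should be spent on a slow-path member is a separate question from whether it can be; the sketch only covers the latter.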
> > /**
> > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > index ad3e9f6a61e5..6474aecd0451 100644
> > --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> > @@ -2252,6 +2253,7 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring *rx_ring)
> > struct nfp_net_r_vector *r_vec = rx_ring->r_vec;
> > struct nfp_net_dp *dp = &r_vec->nfp_net->dp;
> >
> > + xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
> > kfree(rx_ring->rxbufs);
> >
> > if (rx_ring->rxds)
> > @@ -2277,6 +2279,12 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
> > {
> > int sz;
> >
> > + /* XDP RX-queue info */
> > + xdp_rxq_info_init(&rx_ring->xdp_rxq);
> > + rx_ring->xdp_rxq.dev = dp->netdev;
> > + rx_ring->xdp_rxq.queue_index = rx_ring->idx;
> > + xdp_rxq_info_reg(&rx_ring->xdp_rxq);
> > +
> > rx_ring->cnt = dp->rxd_cnt;
> > rx_ring->size = sizeof(*rx_ring->rxds) * rx_ring->cnt;
> > rx_ring->rxds = dma_zalloc_coherent(dp->dev, rx_ring->size,
>
> The nfp driver implements the prepare/commit for reallocating rings.
> I don't think it matters now, but there can be 2 sets of rings with the
> same ID allocated during reconfiguration (see nfp_net_ring_reconfig()).
> Maybe place the register/unregister in nfp_net_open_stack() and
> nfp_net_close_stack() respectively?
Going over your driver code again, I do think I handle this
correctly in nfp_net_rx_ring_free() / nfp_net_rx_ring_alloc().
Your nfp_net_open_stack() / nfp_net_close_stack() calls don't
support failing, which conflicts with Ahern's suggestion.
As I explained in another reply, I do want to support having 2 sets
of rings during reconfiguration, as many drivers do this. This is also
the reason I cannot use the net_device->_rx[] area.
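As a rough illustration of why registration fits in the alloc/free pair: with a prepare/commit style reconfiguration, the new ring set is allocated and registered while the old one is still live, and a registration failure has to leave the old set untouched. This is a hypothetical userspace sketch, not nfp code. ring_set, ring_set_alloc, ring_set_free and ring_reconfig are invented names, and fail_next_reg is only a knob to simulate a failing registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ring_set {                 /* hypothetical: one generation of RX rings */
	int registered;
	int *bufs;
};

static int live_registrations;    /* ring sets registered right now */
static int peak_registrations;    /* highest value seen so far */
static int fail_next_reg;         /* knob: make the next registration fail */

static int ring_set_alloc(struct ring_set *rs, size_t n)
{
	rs->bufs = calloc(n, sizeof(*rs->bufs));
	if (!rs->bufs)
		return -ENOMEM;

	if (fail_next_reg) {      /* registration can fail, so alloc can too */
		fail_next_reg = 0;
		free(rs->bufs);
		rs->bufs = NULL;
		return -EINVAL;
	}

	rs->registered = 1;
	if (++live_registrations > peak_registrations)
		peak_registrations = live_registrations;
	return 0;
}

static void ring_set_free(struct ring_set *rs)
{
	if (rs->registered) {
		rs->registered = 0;
		live_registrations--;
	}
	free(rs->bufs);
	rs->bufs = NULL;
}

/* Prepare the new set while the old one is live, then commit by swapping. */
static int ring_reconfig(struct ring_set *cur, size_t new_n)
{
	struct ring_set next = { 0 };
	struct ring_set old;
	int err;

	err = ring_set_alloc(&next, new_n);
	if (err)
		return err;       /* old set is untouched on failure */

	old = *cur;               /* both sets are registered at this point */
	*cur = next;
	ring_set_free(&old);
	return 0;
}
```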
> Perhaps that won't be necessary, only cleaner :) I'm not sure how is
> the redirect between drivers intended to work WRT freeing rings and
> unloading drivers while packets fly...
I do have a plan for handling in-flight packets when a driver is being
unloaded... that is the reason for having the unreg call. (Sorry, I
should have included you in that off-list discussion.)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
* Re: [PATCH] qed: Remove unused QED_RDMA_DEV_CAP_* symbols and dev->dev_caps
From: David Miller @ 2017-12-18 20:13 UTC (permalink / raw)
To: helgaas; +Cc: netdev, linux-pci, Ariel.Elior, everest-linux-l2
In-Reply-To: <20171215230301.177993.80284.stgit@bhelgaas-glaptop.roam.corp.google.com>
From: Bjorn Helgaas <helgaas@kernel.org>
Date: Fri, 15 Dec 2017 17:03:01 -0600
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> The QED_RDMA_DEV_CAP_* symbols are only used to set bits in dev->dev_caps.
> Nobody ever looks at those bits. Remove the symbols and dev_caps itself.
>
> Note that if these are ever used and added back, it looks incorrect to set
> QED_RDMA_DEV_CAP_ATOMIC_OP based on PCI_EXP_DEVCTL2_LTR_EN. LTR is the
> Latency Tolerance Reporting mechanism, which has nothing to do with Atomic
> Ops.
>
> No functional change intended.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Applied to net-next.
* Re: [PATCH] cxgb4: Simplify PCIe Completion Timeout setting
From: David Miller @ 2017-12-18 20:13 UTC (permalink / raw)
To: helgaas; +Cc: netdev, linux-pci, ganeshgr
In-Reply-To: <20171215230150.177674.9821.stgit@bhelgaas-glaptop.roam.corp.google.com>
From: Bjorn Helgaas <helgaas@kernel.org>
Date: Fri, 15 Dec 2017 17:01:50 -0600
> From: Bjorn Helgaas <bhelgaas@google.com>
>
> Simplify PCIe Completion Timeout setting by using the
> pcie_capability_clear_and_set_word() interface. No functional change
> intended.
>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Applied to net-next.
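For readers unfamiliar with the interface: a clear-and-set accessor replaces an open-coded read, mask, OR and write sequence with one call. The sketch below is a userspace model of just the bit arithmetic, not the kernel helper itself. clear_and_set_word is a hypothetical name, and COMP_TIMEOUT_MASK is a local stand-in for the Completion Timeout Value field, which occupies the low four bits of Device Control 2.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: Completion Timeout Value field, bits 3:0 of Device Control 2. */
#define COMP_TIMEOUT_MASK 0x000f

/*
 * Hypothetical model of a clear-and-set update on a 16-bit register
 * image: drop the bits in 'clear', then OR in the bits in 'set'.
 */
static uint16_t clear_and_set_word(uint16_t old, uint16_t clear, uint16_t set)
{
	return (uint16_t)((old & ~clear) | set);
}
```

For example, clear_and_set_word(0x0415, COMP_TIMEOUT_MASK, 0x000d) replaces only the low nibble and leaves the other bits of the word as they were.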
* Re: [PATCH net-next 0/2] net: erspan: a couple fixes
From: David Miller @ 2017-12-18 20:12 UTC (permalink / raw)
To: u9012063; +Cc: netdev
In-Reply-To: <1513376864-33777-1-git-send-email-u9012063@gmail.com>
From: William Tu <u9012063@gmail.com>
Date: Fri, 15 Dec 2017 14:27:42 -0800
> Haishuang Yan reports a couple of issues (wrong return value,
> pskb_may_pull) on erspan V1. Since erspan V2 is in net-next,
> this series fix the similar issues on v2.
Series applied, thank you.
* Re: [PATCH] net: phy: xgene: disable clk on error paths
From: David Miller @ 2017-12-18 20:10 UTC (permalink / raw)
To: khoroshilov
Cc: isubramanian, kchudgar, qnguyen, netdev, linux-kernel,
ldv-project
In-Reply-To: <1513374759-21384-1-git-send-email-khoroshilov@ispras.ru>
From: Alexey Khoroshilov <khoroshilov@ispras.ru>
Date: Sat, 16 Dec 2017 00:52:39 +0300
> There are several error paths in xgene_mdio_probe(),
> where clk is left undisabled. The patch fixes them.
>
> Found by Linux Driver Verification project (linuxtesting.org).
>
> Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Applied, thank you.
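The class of bug fixed here is easy to model: a probe routine enables a clock and then returns early on a later failure without disabling it. Below is a minimal userspace sketch of the usual goto-based unwinding, assuming mock clock helpers. mock_clk_enable, mock_clk_disable and mock_probe are invented for illustration and are not the xgene_mdio_probe() code.

```c
#include <assert.h>
#include <errno.h>

static int clk_enabled;           /* mock clock enable count */

static int mock_clk_enable(void)
{
	clk_enabled++;
	return 0;
}

static void mock_clk_disable(void)
{
	clk_enabled--;
}

/*
 * fail_step: 0 = success, 1 = fail while mapping resources,
 * 2 = fail while registering the bus.
 */
static int mock_probe(int fail_step)
{
	int ret;

	ret = mock_clk_enable();
	if (ret)
		return ret;

	if (fail_step == 1) {
		ret = -ENOMEM;
		goto out_clk;     /* not a bare return: the clock is on */
	}

	if (fail_step == 2) {
		ret = -ENODEV;
		goto out_clk;
	}

	return 0;                 /* clock stays enabled for a bound device */

out_clk:
	mock_clk_disable();
	return ret;
}
```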
* Re: [PATCH v16 3/4] hinic: Replace PCI pool old API
From: David Miller @ 2017-12-18 20:07 UTC (permalink / raw)
To: romain.perier
Cc: axboe, akpm, dan.j.williams, vinod.koul, jeffrey.t.kirsher,
aviad.krawczyk, jejb, martin.petersen, linux-scsi, bhelgaas,
linux-pci, dmaengine, netdev, linux-kernel, gregkh, romain.perier
In-Reply-To: <20171215193123.13395-4-romain.perier@gmail.com>
From: Romain Perier <romain.perier@gmail.com>
Date: Fri, 15 Dec 2017 20:31:22 +0100
> From: Romain Perier <romain.perier@collabora.com>
>
> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.
>
> Signed-off-by: Romain Perier <romain.perier@collabora.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [PATCH v16 2/4] net: e100: Replace PCI pool old API
From: David Miller @ 2017-12-18 20:06 UTC (permalink / raw)
To: romain.perier
Cc: axboe, akpm, dan.j.williams, vinod.koul, jeffrey.t.kirsher,
aviad.krawczyk, jejb, martin.petersen, linux-scsi, bhelgaas,
linux-pci, dmaengine, netdev, linux-kernel, gregkh, romain.perier
In-Reply-To: <20171215193123.13395-3-romain.perier@gmail.com>
From: Romain Perier <romain.perier@gmail.com>
Date: Fri, 15 Dec 2017 20:31:21 +0100
> From: Romain Perier <romain.perier@collabora.com>
>
> The PCI pool API is deprecated. This commit replaces the PCI pool old
> API by the appropriate function with the DMA pool API.
>
> Signed-off-by: Romain Perier <romain.perier@collabora.com>
> Acked-by: Peter Senna Tschudin <peter.senna@collabora.com>
> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Tested-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: David S. Miller <davem@davemloft.net>
* Re: [PATCH] net: phy: marvell: avoid pause mode on SGMII-to-Copper for 88e151x
From: David Miller @ 2017-12-18 20:04 UTC (permalink / raw)
To: rmk+kernel; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <E1ePsZ2-0007rr-F3@rmk-PC.armlinux.org.uk>
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Fri, 15 Dec 2017 16:10:20 +0000
> Observed on the 88e1512 in SGMII-to-Copper mode, negotiating pause
> is unreliable. While the pause bits can be set in the advertisment
> register, they clear shortly after negotiation with a link partner
> commences irrespective of the cause of the negotiation.
>
> While these bits may be correctly conveyed to the link partner on the
> first negotiation, a subsequent negotiation (eg, due to negotiation
> restart by the link partner, or reconnection of the cable) will result
> in the link partner seeing these bits as zero, while the kernel
> believes that it has advertised pause modes.
>
> This leads to the local kernel evaluating (eg) symmetric pause mode,
> while the remote end evaluates that we have no pause mode capability.
>
> Since we can't guarantee the advertisment, disable pause mode support
> with this PHY when used in SGMII-to-Copper mode.
>
> The 88e1510 in RGMII-to-Copper mode appears to behave correctly.
>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Applied.
* Re: [PATCH 0/3] More SFP/phylink fixes
From: David Miller @ 2017-12-18 19:58 UTC (permalink / raw)
To: linux; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <20171215160344.GU10595@n2100.armlinux.org.uk>
From: Russell King - ARM Linux <linux@armlinux.org.uk>
Date: Fri, 15 Dec 2017 16:03:44 +0000
> This series fixes a few more bits with sfp/phylink, particularly
> confusion with the right way to test for the RTNL mutex being
> held, a change in 2016 to the mdiobus_scan() behaviour that wasn't
> noticed, and a fix for reading module EEPROMs.
Series applied to net-next, because that's the tree this actually
applies cleanly to.
Please be explicit about which of my trees you are targeting
in the future.
Thank you.
* Re: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure.
From: Marcelo Ricardo Leitner @ 2017-12-18 19:53 UTC (permalink / raw)
To: Ilya Lesokhin
Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl
In-Reply-To: <20171218111033.13256-7-ilyal@mellanox.com>
On Mon, Dec 18, 2017 at 01:10:33PM +0200, Ilya Lesokhin wrote:
> This patch adds a generic infrastructure to offload TLS crypto to a
> network devices. It enables the kernel TLS socket to skip encryption
> and authentication operations on the transmit side of the data path.
> Leaving those computationally expensive operations to the NIC.
I have a hard time understanding why this was named 'tls_device' if no
net_devices are registered.
>
> The NIC offload infrastructure builds TLS records and pushes them to
> the TCP layer just like the SW KTLS implementation and using the same API.
> TCP segmentation is mostly unaffected. Currently the only exception is
> that we prevent mixed SKBs where only part of the payload requires
> offload. In the future we are likely to add a similar restriction
> following a change cipher spec record.
>
> The notable differences between SW KTLS and NIC offloaded TLS
> implementations are as follows:
> 1. The offloaded implementation builds "plaintext TLS record", those
> records contain plaintext instead of ciphertext and place holder bytes
> instead of authentication tags.
> 2. The offloaded implementation maintains a mapping from TCP sequence
> number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
> TLS socket, we can use the tls NIC offload infrastructure to obtain
> enough context to encrypt the payload of the SKB.
> A TLS record is released when the last byte of the record is ack'ed,
> this is done through the new icsk_clean_acked callback.
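To illustrate item 2, here is a sketch of looking up the record that covers a given TCP sequence number, with wraparound-safe comparisons. It is a hypothetical userspace model and not the patch's tls_get_record(): it uses a plain array in send order instead of a list, and the names record, seq_before, record_start_seq and find_record are invented. The start-of-record arithmetic mirrors tls_record_start_seq() from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct record {                   /* array-based stand-in for tls_record_info */
	uint32_t end_seq;         /* TCP sequence number just past the record */
	uint32_t len;
};

/* Wraparound-safe "a comes before b" on 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static uint32_t record_start_seq(const struct record *r)
{
	return r->end_seq - r->len;
}

/*
 * Return the record containing seq, or NULL if seq lies before the
 * oldest record or past the newest. Records must be in send order.
 */
static const struct record *find_record(const struct record *recs, size_t n,
					uint32_t seq)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!seq_before(seq, recs[i].end_seq))
			continue; /* seq is at or past the end of this record */
		if (seq_before(seq, record_start_seq(&recs[i])))
			return NULL; /* seq precedes the first candidate */
		return &recs[i];
	}
	return NULL;
}
```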
>
> The infrastructure should be extendable to support various NIC offload
> implementations. However it is currently written with the
> implementation below in mind:
> The NIC assumes that packets from each offloaded stream are sent as
> plaintext and in-order. It keeps track of the TLS records in the TCP
> stream. When a packet marked for offload is transmitted, the NIC
> encrypts the payload in-place and puts authentication tags in the
> relevant place holders.
>
> The responsibility for handling out-of-order packets (i.e. TCP
> retransmission, qdisc drops) falls on the netdev driver.
>
> The netdev driver keeps track of the expected TCP SN from the NIC's
> perspective. If the next packet to transmit matches the expected TCP
> SN, the driver advances the expected TCP SN, and transmits the packet
> with TLS offload indication.
>
> If the next packet to transmit does not match the expected TCP SN. The
> driver calls the TLS layer to obtain the TLS record that includes the
> TCP of the packet for transmission. Using this TLS record, the driver
> posts a work entry on the transmit queue to reconstruct the NIC TLS
> state required for the offload of the out-of-order packet. It updates
> the expected TCP SN accordingly and transmit the now in-order packet.
> The same queue is used for packet transmission and TLS context
> reconstruction to avoid the need for flushing the transmit queue before
> issuing the context reconstruction request.
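The expected-SN bookkeeping described above boils down to a small state machine. The following is a hypothetical userspace sketch of that logic only. tx_state and tx_packet are invented names, and the resync counter stands in for posting a context-reconstruction work entry on the transmit queue.

```c
#include <assert.h>
#include <stdint.h>

struct tx_state {
	uint32_t expected_seq;    /* next TCP SN the NIC expects */
	unsigned int resyncs;     /* context reconstructions requested */
};

/*
 * Account for one offloaded packet. Returns 1 if the packet was out of
 * order from the NIC's point of view and needed a resync first, else 0.
 */
static int tx_packet(struct tx_state *st, uint32_t seq, uint32_t len)
{
	int resync = (seq != st->expected_seq);

	if (resync)
		st->resyncs++;    /* a real driver would queue a resync entry here */

	/* Either way the packet is now in order, so advance past it. */
	st->expected_seq = seq + len;
	return resync;
}
```

Note that after a retransmission the next new-data packet also mismatches the expected SN, so a single retransmit costs two reconstructions in this model.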
>
> Signed-off-by: Boris Pismenny <borisp@mellanox.com>
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
> ---
> include/net/tls.h | 62 +++-
> net/tls/Kconfig | 9 +
> net/tls/Makefile | 3 +
> net/tls/tls_device.c | 800 ++++++++++++++++++++++++++++++++++++++++++
> net/tls/tls_device_fallback.c | 405 +++++++++++++++++++++
> net/tls/tls_main.c | 33 +-
> 6 files changed, 1305 insertions(+), 7 deletions(-)
> create mode 100644 net/tls/tls_device.c
> create mode 100644 net/tls/tls_device_fallback.c
>
> diff --git a/include/net/tls.h b/include/net/tls.h
> index 936cfc5cab7d..9c1b5d13d9a7 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
> @@ -75,6 +75,29 @@ struct tls_sw_context {
> struct scatterlist sg_aead_out[2];
> };
>
> +struct tls_record_info {
> + struct list_head list;
> + u32 end_seq;
> + int len;
> + int num_frags;
> + skb_frag_t frags[MAX_SKB_FRAGS];
> +};
> +
> +struct tls_offload_context {
> + struct crypto_aead *aead_send;
> +
> + struct list_head records_list;
> + struct scatterlist sg_tx_data[MAX_SKB_FRAGS];
> + void (*sk_destruct)(struct sock *sk);
> + struct tls_record_info *open_record;
> + struct tls_record_info *retransmit_hint;
> + u64 hint_record_sn;
> + u64 unacked_record_sn;
> +
> + u32 expected_seq;
> + spinlock_t lock; /* protects records list */
> +};
> +
> enum {
> TLS_PENDING_CLOSED_RECORD
> };
> @@ -85,6 +108,10 @@ struct tls_context {
> struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128;
> };
>
> + struct list_head list;
> + struct net_device *netdev;
> + refcount_t refcount;
> +
> void *priv_ctx;
>
> u8 tx_conf:2;
> @@ -129,9 +156,29 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
> void tls_sw_close(struct sock *sk, long timeout);
> void tls_sw_free_tx_resources(struct sock *sk);
>
> -void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
> -void tls_icsk_clean_acked(struct sock *sk);
> +void tls_clear_device_offload(struct sock *sk, struct tls_context *ctx);
> +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx);
> +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
> +int tls_device_sendpage(struct sock *sk, struct page *page,
> + int offset, size_t size, int flags);
> +void tls_device_sk_destruct(struct sock *sk);
> +void tls_device_init(void);
> +void tls_device_cleanup(void);
>
> +struct tls_record_info *tls_get_record(struct tls_offload_context *context,
> + u32 seq, u64 *p_record_sn);
> +
> +static inline bool tls_record_is_start_marker(struct tls_record_info *rec)
> +{
> + return rec->len == 0;
> +}
> +
> +static inline u32 tls_record_start_seq(struct tls_record_info *rec)
> +{
> + return rec->end_seq - rec->len;
> +}
> +
> +void tls_sk_destruct(struct sock *sk, struct tls_context *ctx);
> int tls_push_sg(struct sock *sk, struct tls_context *ctx,
> struct scatterlist *sg, u16 first_offset,
> int flags);
> @@ -168,6 +215,13 @@ static inline bool tls_is_pending_open_record(struct tls_context *tls_ctx)
> return tls_ctx->pending_open_record_frags;
> }
>
> +static inline bool tls_is_sk_tx_device_offloaded(struct sock *sk)
> +{
> + return sk_fullsock(sk) &&
> + /* matches smp_store_release in tls_set_device_offload */
> + smp_load_acquire(&sk->sk_destruct) == &tls_device_sk_destruct;
> +}
> +
> static inline void tls_err_abort(struct sock *sk)
> {
> sk->sk_err = -EBADMSG;
> @@ -255,4 +309,8 @@ static inline struct tls_offload_context *tls_offload_ctx(
> int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
> unsigned char *record_type);
>
> +int tls_sw_fallback_init(struct sock *sk,
> + struct tls_offload_context *offload_ctx,
> + struct tls_crypto_info *crypto_info);
> +
> #endif /* _TLS_OFFLOAD_H */
> diff --git a/net/tls/Kconfig b/net/tls/Kconfig
> index eb583038c67e..1a4ea55c2f09 100644
> --- a/net/tls/Kconfig
> +++ b/net/tls/Kconfig
> @@ -13,3 +13,12 @@ config TLS
> encryption handling of the TLS protocol to be done in-kernel.
>
> If unsure, say N.
> +
> +config TLS_DEVICE
> + bool "Transport Layer Security HW offload"
> + depends on TLS
> + default n
> + ---help---
> + Enable kernel support for HW offload of the TLS protocol.
> +
> + If unsure, say N.
> diff --git a/net/tls/Makefile b/net/tls/Makefile
> index a930fd1c4f7b..44483cd47b3a 100644
> --- a/net/tls/Makefile
> +++ b/net/tls/Makefile
> @@ -5,3 +5,6 @@
> obj-$(CONFIG_TLS) += tls.o
>
> tls-y := tls_main.o tls_sw.o
> +
> +tls-$(CONFIG_TLS_DEVICE) += tls_device.o tls_device_fallback.o
> +
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> new file mode 100644
> index 000000000000..5082d693a503
> --- /dev/null
> +++ b/net/tls/tls_device.c
> @@ -0,0 +1,800 @@
> +/* Copyright (c) 2016-2017, Mellanox Technologies All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * - Neither the name of the Mellanox Technologies nor the
> + * names of its contributors may be used to endorse or promote
> + * products derived from this software without specific prior written
> + * permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
> + * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED.
> + * IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
> + * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> + * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> + * POSSIBILITY OF SUCH DAMAGE
> + */
> +
> +#include <linux/module.h>
> +#include <net/tcp.h>
> +#include <net/inet_common.h>
> +#include <linux/highmem.h>
> +#include <linux/netdevice.h>
> +
> +#include <net/tls.h>
> +#include <crypto/aead.h>
> +
> +/* device_offload_lock is used to synchronize tls_dev_add
> + * against NETDEV_DOWN notifications.
> + */
> +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock);
> +
> +static void tls_device_gc_task(struct work_struct *work);
> +
> +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task);
> +static LIST_HEAD(tls_device_gc_list);
> +static LIST_HEAD(tls_device_list);
> +static DEFINE_SPINLOCK(tls_device_lock);
> +
> +static void tls_device_free_ctx(struct tls_context *ctx)
> +{
> + struct tls_offload_context *offlad_ctx = tls_offload_ctx(ctx);
> +
> + kfree(offlad_ctx);
> + kfree(ctx);
> +}
> +
> +static void tls_device_gc_task(struct work_struct *work)
> +{
> + struct tls_context *ctx, *tmp;
> + struct list_head gc_list;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&tls_device_lock, flags);
> + INIT_LIST_HEAD(&gc_list);
> + list_splice_init(&tls_device_gc_list, &gc_list);
> + spin_unlock_irqrestore(&tls_device_lock, flags);
> +
> + list_for_each_entry_safe(ctx, tmp, &gc_list, list) {
> + struct net_device *netdev = ctx->netdev;
> +
> + if (netdev) {
> + netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
> + TLS_OFFLOAD_CTX_DIR_TX);
> + dev_put(netdev);
> + }
> +
> + list_del(&ctx->list);
> + tls_device_free_ctx(ctx);
> + }
> +}
> +
> +static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&tls_device_lock, flags);
> + list_move_tail(&ctx->list, &tls_device_gc_list);
> +
> + /* schedule_work inside the spinlock
> + * to make sure tls_device_down waits for that work.
> + */
> + schedule_work(&tls_device_gc_work);
> +
> + spin_unlock_irqrestore(&tls_device_lock, flags);
> +}
> +
> +/* We assume that the socket is already connected */
> +static struct net_device *get_netdev_for_sock(struct sock *sk)
> +{
> + struct inet_sock *inet = inet_sk(sk);
> + struct net_device *netdev = NULL;
> +
> + netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif);
> +
> + return netdev;
> +}
> +
> +static int attach_sock_to_netdev(struct sock *sk, struct net_device *netdev,
> + struct tls_context *ctx)
> +{
> + int rc;
> +
> + rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, TLS_OFFLOAD_CTX_DIR_TX,
> + &ctx->crypto_send);
> + if (rc) {
> + pr_err("The netdev has refused to offload this socket\n");
A _ratelimited variant would probably be welcome here, as this could be
triggered by users at will.
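In the kernel this would normally just mean switching to pr_err_ratelimited(). To show what rate limiting buys, here is a hypothetical userspace model of an interval/burst limiter. The ratelimit struct and ratelimit_ok() are invented for illustration and are not the kernel's implementation.

```c
#include <assert.h>

struct ratelimit {
	long interval;            /* window length, in arbitrary ticks */
	int burst;                /* messages allowed per window */
	long begin;               /* start of the current window */
	int printed;              /* messages let through in this window */
	int missed;               /* messages suppressed so far */
};

/* Return 1 if a message may be emitted at time 'now', 0 if suppressed. */
static int ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;  /* open a new window */
		rs->printed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}

	rs->missed++;
	return 0;
}
```

With a limiter like this, a user who repeatedly provokes the refusal path gets at most burst log lines per interval instead of one per attempt.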
> + goto out;
> + }
> +
> + rc = 0;
> +out:
> + return rc;
> +}
> +
> +static void destroy_record(struct tls_record_info *record)
> +{
> + skb_frag_t *frag;
> + int nr_frags = record->num_frags;
> +
> + while (nr_frags > 0) {
> + frag = &record->frags[nr_frags - 1];
> + __skb_frag_unref(frag);
> + --nr_frags;
> + }
> + kfree(record);
> +}
> +
> +static void delete_all_records(struct tls_offload_context *offload_ctx)
> +{
> + struct tls_record_info *info, *temp;
> +
> + list_for_each_entry_safe(info, temp, &offload_ctx->records_list, list) {
> + list_del(&info->list);
> + destroy_record(info);
> + }
> +
> + offload_ctx->retransmit_hint = NULL;
> +}
> +
> +static void tls_icsk_clean_acked(struct sock *sk)
> +{
> + struct tls_context *tls_ctx = tls_get_ctx(sk);
> + struct tls_offload_context *ctx;
> + struct tcp_sock *tp = tcp_sk(sk);
> + struct tls_record_info *info, *temp;
> + unsigned long flags;
> + u64 deleted_records = 0;
> +
> + if (!tls_ctx)
> + return;
> +
> + ctx = tls_offload_ctx(tls_ctx);
> +
> + spin_lock_irqsave(&ctx->lock, flags);
It would be nice if this spinlock could be avoided somehow, as this
callback is invoked right from tcp_ack().
> + info = ctx->retransmit_hint;
> + if (info && !before(tp->snd_una, info->end_seq)) {
> + ctx->retransmit_hint = NULL;
> + list_del(&info->list);
> + destroy_record(info);
> + deleted_records++;
> + }
> +
> + list_for_each_entry_safe(info, temp, &ctx->records_list, list) {
> + if (before(tp->snd_una, info->end_seq))
> + break;
> + list_del(&info->list);
> +
> + destroy_record(info);
> + deleted_records++;
> + }
> +
> + ctx->unacked_record_sn += deleted_records;
> + spin_unlock_irqrestore(&ctx->lock, flags);
> +}
> +
> +/* At this point, there should be no references on this
> + * socket and no in-flight SKBs associated with this
> + * socket, so it is safe to free all the resources.
> + */
> +void tls_device_sk_destruct(struct sock *sk)
> +{
> + struct tls_context *tls_ctx = tls_get_ctx(sk);
> + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
> +
> + if (ctx->open_record)
> + destroy_record(ctx->open_record);
> +
> + delete_all_records(ctx);
> + crypto_free_aead(ctx->aead_send);
> + ctx->sk_destruct(sk);
> +
> + if (refcount_dec_and_test(&tls_ctx->refcount))
> + tls_device_queue_ctx_destruction(tls_ctx);
> +}
> +EXPORT_SYMBOL(tls_device_sk_destruct);
> +
> +static inline void tls_append_frag(struct tls_record_info *record,
> + struct page_frag *pfrag,
> + int size)
> +{
> + skb_frag_t *frag;
> +
> + frag = &record->frags[record->num_frags - 1];
> + if (frag->page.p == pfrag->page &&
> + frag->page_offset + frag->size == pfrag->offset) {
> + frag->size += size;
> + } else {
> + ++frag;
> + frag->page.p = pfrag->page;
> + frag->page_offset = pfrag->offset;
> + frag->size = size;
> + ++record->num_frags;
> + get_page(pfrag->page);
> + }
> +
> + pfrag->offset += size;
> + record->len += size;
> +}
> +
> +static inline int tls_push_record(struct sock *sk,
> + struct tls_context *ctx,
> + struct tls_offload_context *offload_ctx,
> + struct tls_record_info *record,
> + struct page_frag *pfrag,
> + int flags,
> + unsigned char record_type)
> +{
> + skb_frag_t *frag;
> + struct tcp_sock *tp = tcp_sk(sk);
> + struct page_frag fallback_frag;
> + struct page_frag *tag_pfrag = pfrag;
> + int i;
> +
> + /* fill prepend */
> + frag = &record->frags[0];
> + tls_fill_prepend(ctx,
> + skb_frag_address(frag),
> + record->len - ctx->prepend_size,
> + record_type);
> +
> + if (unlikely(!skb_page_frag_refill(ctx->tag_size, pfrag, GFP_KERNEL))) {
> + /* HW doesn't care about the data in the tag
> + * so in case pfrag has no room
> + * for a tag and we can't allocate a new pfrag
> + * just use the page in the first frag
> + * rather than write complicated fallback code.
> + */
> + tag_pfrag = &fallback_frag;
> + tag_pfrag->page = skb_frag_page(frag);
> + tag_pfrag->offset = 0;
> + }
> +
> + tls_append_frag(record, tag_pfrag, ctx->tag_size);
> + record->end_seq = tp->write_seq + record->len;
> + spin_lock_irq(&offload_ctx->lock);
> + list_add_tail(&record->list, &offload_ctx->records_list);
> + spin_unlock_irq(&offload_ctx->lock);
> + offload_ctx->open_record = NULL;
> + set_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
> + tls_advance_record_sn(sk, ctx);
> +
> + for (i = 0; i < record->num_frags; i++) {
> + frag = &record->frags[i];
> + sg_unmark_end(&offload_ctx->sg_tx_data[i]);
> + sg_set_page(&offload_ctx->sg_tx_data[i], skb_frag_page(frag),
> + frag->size, frag->page_offset);
> + sk_mem_charge(sk, frag->size);
> + get_page(skb_frag_page(frag));
> + }
> + sg_mark_end(&offload_ctx->sg_tx_data[record->num_frags - 1]);
> +
> + /* all ready, send */
> + return tls_push_sg(sk, ctx, offload_ctx->sg_tx_data, 0, flags);
> +}
> +
> +static inline int tls_create_new_record(struct tls_offload_context *offload_ctx,
> + struct page_frag *pfrag,
> + size_t prepend_size)
> +{
> + skb_frag_t *frag;
> + struct tls_record_info *record;
> +
> + record = kmalloc(sizeof(*record), GFP_KERNEL);
> + if (!record)
> + return -ENOMEM;
> +
> + frag = &record->frags[0];
> + __skb_frag_set_page(frag, pfrag->page);
> + frag->page_offset = pfrag->offset;
> + skb_frag_size_set(frag, prepend_size);
> +
> + get_page(pfrag->page);
> + pfrag->offset += prepend_size;
> +
> + record->num_frags = 1;
> + record->len = prepend_size;
> + offload_ctx->open_record = record;
> + return 0;
> +}
> +
> +static inline int tls_do_allocation(struct sock *sk,
> + struct tls_offload_context *offload_ctx,
> + struct page_frag *pfrag,
> + size_t prepend_size)
> +{
> + int ret;
> +
> + if (!offload_ctx->open_record) {
> + if (unlikely(!skb_page_frag_refill(prepend_size, pfrag,
> + sk->sk_allocation))) {
> + sk->sk_prot->enter_memory_pressure(sk);
> + sk_stream_moderate_sndbuf(sk);
> + return -ENOMEM;
> + }
> +
> + ret = tls_create_new_record(offload_ctx, pfrag, prepend_size);
> + if (ret)
> + return ret;
> +
> + if (pfrag->size > pfrag->offset)
> + return 0;
> + }
> +
> + if (!sk_page_frag_refill(sk, pfrag))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static int tls_push_data(struct sock *sk,
> + struct iov_iter *msg_iter,
> + size_t size, int flags,
> + unsigned char record_type)
> +{
> + struct tls_context *tls_ctx = tls_get_ctx(sk);
> + struct tls_offload_context *ctx = tls_offload_ctx(tls_ctx);
> + struct tls_record_info *record = ctx->open_record;
> + struct page_frag *pfrag;
> + int copy, rc = 0;
> + size_t orig_size = size;
> + u32 max_open_record_len;
> + long timeo;
> + int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
> + int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> + bool done = false;
> +
> + if (flags &
> + ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
> + return -ENOTSUPP;
> +
> + if (sk->sk_err)
> + return -sk->sk_err;
> +
> + timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
> + rc = tls_complete_pending_work(sk, tls_ctx, flags, &timeo);
> + if (rc < 0)
> + return rc;
> +
> + pfrag = sk_page_frag(sk);
> +
> + /* KTLS_TLS_HEADER_SIZE is not counted as part of the TLS record, and
> + * we need to leave room for an authentication tag.
> + */
> + max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
> + tls_ctx->prepend_size;
> + do {
> + if (tls_do_allocation(sk, ctx, pfrag,
> + tls_ctx->prepend_size)) {
> + rc = sk_stream_wait_memory(sk, &timeo);
> + if (!rc)
> + continue;
> +
> + record = ctx->open_record;
> + if (!record)
> + break;
> +handle_error:
> + if (record_type != TLS_RECORD_TYPE_DATA) {
> + /* avoid sending partial
> + * record with type !=
> + * application_data
> + */
> + size = orig_size;
> + destroy_record(record);
> + ctx->open_record = NULL;
> + } else if (record->len > tls_ctx->prepend_size) {
> + goto last_record;
> + }
> +
> + break;
> + }
> +
> + record = ctx->open_record;
> + copy = min_t(size_t, size, (pfrag->size - pfrag->offset));
> + copy = min_t(size_t, copy, (max_open_record_len - record->len));
> +
> + if (copy_from_iter_nocache(page_address(pfrag->page) +
> + pfrag->offset,
> + copy, msg_iter) != copy) {
> + rc = -EFAULT;
> + goto handle_error;
> + }
> + tls_append_frag(record, pfrag, copy);
> +
> + size -= copy;
> + if (!size) {
> +last_record:
> + tls_push_record_flags = flags;
> + if (more) {
> + tls_ctx->pending_open_record_frags =
> + record->num_frags;
> + break;
> + }
> +
> + done = true;
> + }
> +
> + if ((done) || record->len >= max_open_record_len ||
> + (record->num_frags >= MAX_SKB_FRAGS - 1)) {
> + rc = tls_push_record(sk,
> + tls_ctx,
> + ctx,
> + record,
> + pfrag,
> + tls_push_record_flags,
> + record_type);
> + if (rc < 0)
> + break;
> + }
> + } while (!done);
> +
> + if (orig_size - size > 0)
> + rc = orig_size - size;
> +
> + return rc;
> +}
> +
> +int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> +{
> + unsigned char record_type = TLS_RECORD_TYPE_DATA;
> + int rc = 0;
> +
> + lock_sock(sk);
> +
> + if (unlikely(msg->msg_controllen)) {
> + rc = tls_proccess_cmsg(sk, msg, &record_type);
> + if (rc)
> + goto out;
> + }
> +
> + rc = tls_push_data(sk, &msg->msg_iter, size,
> + msg->msg_flags, record_type);
> +
> +out:
> + release_sock(sk);
> + return rc;
> +}
> +
> +int tls_device_sendpage(struct sock *sk, struct page *page,
> + int offset, size_t size, int flags)
> +{
> + struct iov_iter msg_iter;
> + struct kvec iov;
> + char *kaddr = kmap(page);
> + int rc = 0;
> +
> + if (flags & MSG_SENDPAGE_NOTLAST)
> + flags |= MSG_MORE;
> +
> + lock_sock(sk);
> +
> + if (flags & MSG_OOB) {
> + rc = -ENOTSUPP;
> + goto out;
> + }
> +
> + iov.iov_base = kaddr + offset;
> + iov.iov_len = size;
> + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, &iov, 1, size);
> + rc = tls_push_data(sk, &msg_iter, size,
> + flags, TLS_RECORD_TYPE_DATA);
> + kunmap(page);
> +
> +out:
> + release_sock(sk);
> + return rc;
> +}
> +
> +struct tls_record_info *tls_get_record(struct tls_offload_context *context,
> + u32 seq, u64 *p_record_sn)
> +{
> + struct tls_record_info *info;
> + u64 record_sn = context->hint_record_sn;
> +
> + info = context->retransmit_hint;
> + if (!info ||
> + before(seq, info->end_seq - info->len)) {
> + /* if retransmit_hint is irrelevant start
> + * from the beginning of the list
> + */
> + info = list_first_entry(&context->records_list,
> + struct tls_record_info, list);
> + record_sn = context->unacked_record_sn;
> + }
> +
> + list_for_each_entry_from(info, &context->records_list, list) {
> + if (before(seq, info->end_seq)) {
> + if (!context->retransmit_hint ||
> + after(info->end_seq,
> + context->retransmit_hint->end_seq)) {
> + context->hint_record_sn = record_sn;
> + context->retransmit_hint = info;
> + }
> + *p_record_sn = record_sn;
> + return info;
> + }
> + record_sn++;
> + }
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL(tls_get_record);
> +
> +static int tls_device_push_pending_record(struct sock *sk, int flags)
> +{
> + struct iov_iter msg_iter;
> +
> + iov_iter_kvec(&msg_iter, WRITE | ITER_KVEC, NULL, 0, 0);
> + return tls_push_data(sk, &msg_iter, 0, flags, TLS_RECORD_TYPE_DATA);
> +}
> +
> +int tls_set_device_offload(struct sock *sk, struct tls_context *ctx)
> +{
> + struct tls_crypto_info *crypto_info;
> + struct tls_offload_context *offload_ctx;
> + struct tls_record_info *start_marker_record;
> + u16 nonece_size, tag_size, iv_size, rec_seq_size;
> + char *iv, *rec_seq;
> + int rc;
> + struct net_device *netdev;
> + struct sk_buff *skb;
> +
> + if (!ctx) {
> + rc = -EINVAL;
> + goto out;
> + }
> +
> + if (ctx->priv_ctx) {
> + rc = -EEXIST;
> + goto out;
> + }
> +
> + /* We support starting offload on multiple sockets
> + * concurrently, so we only need a read lock here.
> + */
> + percpu_down_read(&device_offload_lock);
> + netdev = get_netdev_for_sock(sk);
> + if (!netdev) {
> + pr_err("%s: netdev not found\n", __func__);
pr_err_ratelimited() here?
> + rc = -EINVAL;
> + goto release_lock;
> + }
> +
Marcelo
* Re: [PATCH] net: arc_emac: restart stalled EMAC
From: David Miller @ 2017-12-18 19:53 UTC (permalink / raw)
To: al.kochet; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <1513336371-21325-1-git-send-email-al.kochet@gmail.com>
From: Alexander Kochetkov <al.kochet@gmail.com>
Date: Fri, 15 Dec 2017 14:12:51 +0300
> Under certain conditions the EMAC stops receiving incoming packets and
> continuously increments the R_MISS register instead of saving data into
> the provided buffer. This commit implements a workaround for that
> situation: when a stall is detected, the EMAC is restarted.
>
> On the device, the stall looks as if the device lost its dynamic IP address.
> ifconfig shows the interface error counter incrementing rapidly.
> At the same time, the DHCP server sees continuous DHCP requests
> from the device.
>
> On a real network stalls are rare. To make them frequent, a
> broadcast storm[1] should be simulated. For the simulation, make
> the following connections:
> 1. connect radxarock to 1st port of switch
> 2. connect some PC to 2nd port of switch
> 3. connect two other free ports together using standard ethernet cable,
> in order to make a switching loop.
>
> After that, a broadcast storm has to be triggered. For example, running
> 'ping' to some IP address on the PC triggers an ARP-request storm. After
> some time (~10 sec), the EMAC on the rk3188 will stall.
>
> Observed and tested on rk3188 radxarock.
>
> [1] https://en.wikipedia.org/wiki/Broadcast_radiation
>
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
This patch doesn't apply cleanly to any of my trees.
* Re: [PATCH net-next] net/ncsi: Don't take any action on HNCDSC AEN
From: David Miller @ 2017-12-18 19:50 UTC (permalink / raw)
To: sam; +Cc: netdev, linux-kernel, openbmc
In-Reply-To: <20171215051640.10926-1-sam@mendozajonas.com>
From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Date: Fri, 15 Dec 2017 16:16:40 +1100
> The current HNCDSC handler takes the status flag from the AEN packet and
> will update or change the current channel based on this flag and the
> current channel status.
>
> However the flag from the HNCDSC packet merely represents the host link
> state. While the state of the host interface is potentially interesting
> information it should not affect the state of the NCSI link. Indeed the
> NCSI specification makes no mention of any recommended action related to
> the host network controller driver state.
>
> Update the HNCDSC handler to record the host network driver status but
> take no other action.
>
> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Applied, thanks.
* Re: [RFC PATCH 0/9] ethtool netlink interface (WiP)
From: David Miller @ 2017-12-18 19:39 UTC (permalink / raw)
To: linville; +Cc: mkubecek, netdev, linux-kernel
In-Reply-To: <20171214210755.GG19705@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Thu, 14 Dec 2017 16:07:56 -0500
> Even without considering the ioctl problems, the current ethtool
> API seems a bit crufty. It has been a catch-all, "where else would it
> go?" dumping ground for a long time, and it has accrued a number of
> not-entirely-related bits of functionality. In my mind, what needs
> to happen is that these various bits of functionality need to be
> reorganized into a handful of groupings. Then, each group needs an
> API designed around semantics that are natural to the functionality
> being addressed. I believe this is essentially the idea that others
> have expressed with the "move some of the ethtool bits to devlink"
> comments. I think that probably makes sense, although trying to shove
> everything into devlink probably makes no more sense than keeping
> the entire ethtool API intact on top of a netlink transport. Anyway,
> I think that with a reasonable set of groupings, the semantics would
> fall-out naturally and implementing them on netlink or any other
> suitable transport would be reasonably trivial.
Thanks for your valuable feedback John.
Let's keep in mind that really the core impetus to move ethtool stuff
to netlink is visibility.
Someone trying to monitor network config events in the system can't
see anything that happens with ethtool currently. It's completely
invisible.
Even ancient ifconfig ioctls generate proper netlink events.
Ethtool is one of the few, if not the only, network config mechanism
that elides netlink event visibility.
And I think fixing that core issue is what is driving the focus onto a
pure 1-to-1 conversion, be it to a separate netlink/genetlink family
or to devlink.
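To illustrate what "visibility" means here, a monitor typically binds an
rtnetlink socket to a few multicast groups and then reads change
notifications from it. The following is only a Linux-specific sketch; the
function names link_monitor_addr() and open_link_monitor() are made up for
this example. Per the point above, changes made through the ethtool ioctl
would not show up on such a socket.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: fill in the address a monitor would bind to in
 * order to receive link and IPv4 address change notifications.
 */
void link_monitor_addr(struct sockaddr_nl *addr)
{
	memset(addr, 0, sizeof(*addr));
	addr->nl_family = AF_NETLINK;
	addr->nl_groups = RTMGRP_LINK | RTMGRP_IPV4_IFADDR;
}

/* Hypothetical helper: open and bind the monitoring socket.
 * Returns the socket fd, or -1 on failure.
 */
int open_link_monitor(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	link_monitor_addr(&addr);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

A caller would recv() on the returned fd and walk the netlink messages;
that part is omitted to keep the sketch short.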
* Re: [PATCH v7 2/3] sock: Move the socket inuse to namespace.
From: David Miller @ 2017-12-18 19:30 UTC (permalink / raw)
To: xiangxia.m.yue; +Cc: xiyou.wangcong, netdev
In-Reply-To: <1513259519-32332-2-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Date: Thu, 14 Dec 2017 05:51:58 -0800
> In some cases, we want to know how many sockets are in use in
> different _net_ namespaces. It's a key resource metric.
Useful or not, you're not exporting this value.
All this patch series does is convert the existing export of the
global tally to add up the per-net values.
So if you're not exporting the per-net value on its own in any way,
this patch series isn't achieving the stated goal.
I'm not applying this series, sorry.
* Re: [PATCH v2 4/5] rds: Add runchecks.cfg for net/rds
From: Santosh Shilimkar @ 2017-12-18 19:28 UTC (permalink / raw)
To: Knut Omang
Cc: Joe Perches, Stephen Hemminger,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
rds-devel-N0ozoZBvEnrZJqsBc5GL+g
In-Reply-To: <1513476136.31439.96.camel-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On 12/16/2017 6:02 PM, Knut Omang wrote:
> On Sat, 2017-12-16 at 12:00 -0800, santosh.shilimkar-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org wrote:
>> On 12/16/17 10:24 AM, Joe Perches wrote:
[...]
>>> Most of these existing messages from checkpatch should
>>> probably be inspected and corrected where possible to
>>> minimize the style differences between this subsystem
>>> and the rest of the kernel.
>>>
>>> For instance, here's a trivial patch to substitute
>>> pr_<level> for printks and a couple braces next to
>>> these substitutions.
>>>
>> Thanks Joe. I actually had a similar patch a while back but
>> since it was lot of churn, and code was already merged,
>> never submitted it and then later forgot about it.
>>
>> Will look into it.
>
> Please look at my set here first - I have already spent considerable time cleaning up
> stuff while working on this:
>
Just closing the loop. As discussed, I can use your patches without
any new tool dependency, since the existing checkpatch.pl already gives
those warnings. I started picking up Joe's patch, but since you have
changes, I can use them instead once you untie them from runchecks.
Regarding the $subject, just reiterating that I don't want any custom
script for RDS and want to just follow the generic guidelines followed by
netdev for all net/* code.
Regards,
Santosh
* Re: [PATCH] Staging: irda: Do not check for NOT NULL before kfree()
From: Shreeya Patel @ 2017-12-18 19:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: devel, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <20171218112038.15626e20@xeon-e3>
On Mon, 2017-12-18 at 11:20 -0800, Stephen Hemminger wrote:
> On Tue, 19 Dec 2017 00:41:30 +0530
> Shreeya Patel <shreeya.patel23498@gmail.com> wrote:
>
> >
> > Do not check for NOT NULL before calling kfree because if the
> > pointer is NULL, no action occurs.
> > Done using the following semantic patch by coccinelle.
> >
> > @@
> > expression ptr;
> > @@
> >
> > - if (ptr != NULL) {
> > kfree(ptr);
> > ptr = NULL;
> > - }
> >
> > The semantic patch has the effect of adding an assignment
> > of ptr to NULL in the case where ptr is NULL already.
> >
> > Signed-off-by: Shreeya Patel <shreeya.patel23498@gmail.com>
> Please read drivers/staging/irda/TODO
Oh, I did not know about it.
Thank you
>
* Re: [PATCH v2 0/5] Support for generalized use of make C={1,2} via a wrapper program
From: Leon Romanovsky @ 2017-12-18 19:24 UTC (permalink / raw)
To: Knut Omang
Cc: Bart Van Assche, joe@perches.com, jgg@ziepe.ca, corbet@lwn.net,
linux-kernel@vger.kernel.org, keescook@chromium.org,
linux-rdma@vger.kernel.org, linux-doc@vger.kernel.org,
willy@infradead.org, nicolas.palix@imag.fr,
asmund.ostvold@oracle.com, john.haxby@oracle.com,
alexander.levin@verizon.com, mchehab@kernel.org,
haakon.bugge@oracle.com, michal.lkml@markovi.net
In-Reply-To: <1513622390.31439.239.camel@oracle.com>
On Mon, Dec 18, 2017 at 07:39:50PM +0100, Knut Omang wrote:
> On Mon, 2017-12-18 at 17:56 +0000, Bart Van Assche wrote:
> > On Mon, 2017-12-18 at 10:46 -0700, Jason Gunthorpe wrote:
> > > On Sun, Dec 17, 2017 at 10:00:17PM -0800, Joe Perches wrote:
> > >
> > > > > Today when we run checkers we get so many warnings it is too hard to
> > > > > make any sense of it.
> > > >
> > > > Here is a list of the checkpatch messages for drivers/infiniband
> > > > sorted by type.
> > > >
> > > > Many of these might be corrected by using
> > > >
> > > > $ ./scripts/checkpatch.pl -f --fix-inplace --types=<TYPE> \
> > > > $(git ls-files drivers/infiniband/)
> > >
> > > How many of these do you think it is worth to fix?
> > >
> > > We do get a steady trickle of changes in this topic every cycle.
> > >
> > > Is it better to just do a big number of them all at once? Do you have
> > > an idea how disruptive this kind of work is to the whole patch flow
> > > eg new patches no longer applying to for-next, backports no longer
> > > applying, merge conflicts?
> >
> > In my opinion patches that only change the coding style and do not change any
> > functionality are annoying. Before posting a patch that fixes a bug the change
> > history (git log -p) has to be checked to figure out which patch introduced
> > the bug. Patches that only change coding style pollute the change history.
>
> I agree with you - the problem is that style issues should not have existed.
> But when they do, it becomes a problem to remove them and a problem to
> keep them. For instance, those of us who try to be compliant by having style
> helpers in our editors end up having to manually revert old style mistakes
> back in to avoid making unrelated whitespace changes or similar.
If checkpatch.pl complains about coding style in the newly added code of
a new patch, I ask the author to prepare a cleanup patch to be
applied before the actual patch.
If the complaints are about code the patch does not touch, I
submit it as is.
Thanks
* Re: [PATCH iproute2 1/3] iplink: Improve index parameter handling
From: Stephen Hemminger @ 2017-12-18 19:23 UTC (permalink / raw)
To: Serhey Popovych; +Cc: netdev
In-Reply-To: <1513623248-7689-2-git-send-email-serhe.popovych@gmail.com>
On Mon, 18 Dec 2017 20:54:06 +0200
Serhey Popovych <serhe.popovych@gmail.com> wrote:
> diff --git a/ip/iplink.c b/ip/iplink.c
> index 1e685cc..4f9c169 100644
> --- a/ip/iplink.c
> +++ b/ip/iplink.c
> @@ -586,8 +586,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
> *name = *argv;
> } else if (strcmp(*argv, "index") == 0) {
> NEXT_ARG();
> + if (*index)
> + duparg("index", *argv);
> *index = atoi(*argv);
> - if (*index < 0)
> + if (*index <= 0)
Why not use strtoul instead of atoi?
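A minimal sketch of what strtoul-based parsing could look like, for
illustration only. The helper name parse_ifindex() is hypothetical and not
an iproute2 function; the point is that strtoul() exposes trailing garbage
and overflow, which atoi() silently ignores.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of iproute2: parse a strictly positive
 * interface index. Returns 0 on success and -1 on any error.
 */
int parse_ifindex(const char *arg, int *index)
{
	unsigned long val;
	char *end;

	if (!arg || *arg == '\0')
		return -1;

	/* strtoul() accepts a leading '-' and negates the result,
	 * so reject it explicitly.
	 */
	if (*arg == '-')
		return -1;

	errno = 0;
	val = strtoul(arg, &end, 0);

	/* overflow, no digits consumed, or trailing garbage */
	if (errno != 0 || end == arg || *end != '\0')
		return -1;

	/* index 0 is not valid, and the result must fit in an int */
	if (val == 0 || val > INT_MAX)
		return -1;

	*index = (int)val;
	return 0;
}
```

With base 0, strtoul() also accepts hex ("0x10") and octal ("010") input;
passing 10 instead would restrict it to decimal.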
* Re: [net-next] phylib: Add device reset GPIO support causes DSA MT7530 acquires reset-gpios fails
From: Florian Fainelli @ 2017-12-18 19:21 UTC (permalink / raw)
To: Andrew Lunn, Sean Wang
Cc: sergei.shtylyov, vivien.didelot, davem, netdev, linux-kernel,
linux-mediatek, richard.leitner, geert+renesas
In-Reply-To: <20171218080120.GD30815@lunn.ch>
On 12/18/2017 12:01 AM, Andrew Lunn wrote:
> Hi Sean
>
>> It probably can't. Because before the GPIO line is manipulated to reset,
>> certain power control should be handled such as power sources from
>> external PMIC to let devices actually enter the proper state.
>>
>> So, I thought the kind of reset should be better controlled by the
>> specific driver, not by generic core.
>
> Yes, the driver should do it in that case.
>
> So we have a few choices:
>
> 1) Change the name of one of the properties
>
> 2) Make the new code look at the compatible string, and only apply a
> reset if it is a PHY.
>
> 3) Make the new code only hold the gpio when it needs it. Same for the
> driver, so that they both can reset the device.
>
> Any other ideas? Any preferences? 2) and 3) are probably simpler to
> do, less backwards compatibility issues. 3) potentially could cause
> issues when a device is reset in the wrong context, because of
> external PMIC etc. So i'm thinking 2).
We could also add some sort of flag that indicates whether the reset
should be managed by the core, or the driver, I would have to double
check there is not a chicken and egg problem and that the driver probe
is early enough this can happen...
--
Florian
* Re: [PATCH] Staging: irda: Do not check for NOT NULL before kfree()
From: Stephen Hemminger @ 2017-12-18 19:20 UTC (permalink / raw)
To: Shreeya Patel; +Cc: devel, gregkh, samuel, linux-kernel, netdev
In-Reply-To: <1513624290-2965-1-git-send-email-shreeya.patel23498@gmail.com>
On Tue, 19 Dec 2017 00:41:30 +0530
Shreeya Patel <shreeya.patel23498@gmail.com> wrote:
> Do not check for NOT NULL before calling kfree because if the
> pointer is NULL, no action occurs.
> Done using the following semantic patch by coccinelle.
>
> @@
> expression ptr;
> @@
>
> - if (ptr != NULL) {
> kfree(ptr);
> ptr = NULL;
> - }
>
> The semantic patch has the effect of adding an assignment
> of ptr to NULL in the case where ptr is NULL already.
>
> Signed-off-by: Shreeya Patel <shreeya.patel23498@gmail.com>
Please read drivers/staging/irda/TODO
* Re: [PATCH v3 net-next 3/6] net: Add SW fallback infrastructure for offloaded sockets
From: Marcelo Ricardo Leitner @ 2017-12-18 19:18 UTC (permalink / raw)
To: Ilya Lesokhin
Cc: netdev, davem, davejwatson, tom, hannes, borisp, aviadye, liranl
In-Reply-To: <20171218111033.13256-4-ilyal@mellanox.com>
On Mon, Dec 18, 2017 at 01:10:30PM +0200, Ilya Lesokhin wrote:
> Offloaded sockets rely on the netdev to transform the transmitted
> packets before sending them over the network.
> When a packet from an offloaded socket is looped back or
> rerouted to a different device we need to detect it and
> do the transformation in software
>
> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
> Signed-off-by: Boris Pismenny <borisp@mellanox.com>
> ---
> include/net/sock.h | 17 +++++++++++++++++
> net/core/dev.c | 4 ++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 9a9047268d37..5397307603ec 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -479,6 +479,9 @@ struct sock {
> void (*sk_error_report)(struct sock *sk);
> int (*sk_backlog_rcv)(struct sock *sk,
> struct sk_buff *skb);
> + struct sk_buff* (*sk_offload_check)(struct sock *sk,
> + struct net_device *dev,
> + struct sk_buff *skb);
> void (*sk_destruct)(struct sock *sk);
> struct sock_reuseport __rcu *sk_reuseport_cb;
> struct rcu_head sk_rcu;
> @@ -2324,6 +2327,20 @@ static inline bool sk_fullsock(const struct sock *sk)
> return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> }
>
> +/* Checks if this SKB belongs to an HW offloaded socket
> + * and whether any SW fallbacks are required based on dev.
> + */
> +static inline struct sk_buff *skb_offload_check(struct sk_buff *skb,
> + struct net_device *dev)
> +{
> + struct sock *sk = skb->sk;
> +
> + if (sk && sk_fullsock(sk) && sk->sk_offload_check)
Isn't this going to hurt the fast path, checking for sk fields here?
> + skb = sk->sk_offload_check(sk, dev, skb);
> +
> + return skb;
> +}
> +
> /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV
> * SYNACK messages can be attached to either ones (depending on SYNCOOKIE)
> */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b0eee49a2489..6a78d9046674 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3051,6 +3051,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
> if (unlikely(!skb))
> goto out_null;
>
> + skb = skb_offload_check(skb, dev);
> + if (!skb)
> + goto out_null;
> +
> if (netif_needs_gso(skb, features)) {
> struct sk_buff *segs;
>
> --
> 2.15.0.317.g14c63a9
>
* [PATCH] utils: fix makeargs stack overflow
From: Stephen Hemminger @ 2017-12-18 19:15 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger
The makeargs() function did not handle end of string correctly
and would reference past end of string.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/utils.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/lib/utils.c b/lib/utils.c
index 7ced8c061cb0..df1f3b1238c0 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -1206,10 +1206,16 @@ ssize_t getcmdline(char **linep, size_t *lenp, FILE *in)
int makeargs(char *line, char *argv[], int maxargs)
{
static const char ws[] = " \t\r\n";
- char *cp;
+ char *cp = line;
int argc = 0;
- for (cp = line + strspn(line, ws); *cp; cp += strspn(cp, ws)) {
+ while (*cp) {
+ /* skip leading whitespace */
+ cp += strspn(cp, ws);
+
+ if (*cp == '\0')
+ break;
+
if (argc >= (maxargs - 1)) {
fprintf(stderr, "Too many arguments to command\n");
exit(1);
@@ -1226,13 +1232,16 @@ int makeargs(char *line, char *argv[], int maxargs)
fprintf(stderr, "Unterminated quoted string\n");
exit(1);
}
- *cp++ = 0;
- continue;
+ } else {
+ argv[argc++] = cp;
+
+ /* find end of word */
+ cp += strcspn(cp, ws);
+ if (*cp == '\0')
+ break;
}
- argv[argc++] = cp;
- /* find end of word */
- cp += strcspn(cp, ws);
+ /* separate words */
*cp++ = 0;
}
argv[argc] = NULL;
--
2.11.0
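
For readers who want the shape of the fixed loop without the diff noise, here
is a simplified standalone sketch. It is not the iproute2 function: the name
split_words() is made up, quote handling is left out, and it returns -1 on
too many words instead of calling exit(). The part it is meant to show is the
check for '\0' before writing the separator, so the pointer never steps past
the end of the string.

```c
#include <string.h>

/* Simplified illustration only: split 'line' in place on whitespace.
 * Returns the number of words, or -1 if they do not fit in argv
 * (which must have room for maxargs entries, maxargs >= 1).
 */
int split_words(char *line, char *argv[], int maxargs)
{
	static const char ws[] = " \t\r\n";
	char *cp = line;
	int argc = 0;

	while (*cp) {
		/* skip leading whitespace */
		cp += strspn(cp, ws);
		if (*cp == '\0')
			break;

		/* keep one slot free for the NULL terminator */
		if (argc >= maxargs - 1)
			return -1;

		argv[argc++] = cp;

		/* find end of word */
		cp += strcspn(cp, ws);
		if (*cp == '\0')
			break;	/* end of string: nothing to overwrite */

		/* separate words */
		*cp++ = '\0';
	}
	argv[argc] = NULL;

	return argc;
}
```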