* Re: [PATCH] ipv6:ip6_xmit remove unnecessary np NULL check
From: Eric Dumazet @ 2016-11-29 15:26 UTC (permalink / raw)
To: Manjeet Pawar
Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel,
pankaj.m, ajeet.y, Rohit Thapliyal
In-Reply-To: <1480401179-9220-1-git-send-email-manjeet.p@samsung.com>
On Tue, 2016-11-29 at 12:02 +0530, Manjeet Pawar wrote:
> From: Rohit Thapliyal <r.thapliyal@samsung.com>
>
> np NULL check doesn't seem required here as it shall never
> be NULL anyways in inet6_sk(sk).
>
> Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
> Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
> Signed-off-by: David Miller <davem@davemloft.net>
> Reviewed-by: Akhilesh Kumar <akhilesh.k@samsung.com>
>
> ---
> v2->v3: Modified as per the suggestion from David Miller
> ip6_xmit calls are made without checking NULL np
> pointer, so no need to explicitly check NULL np in
> ip6_xmit.
>
> include/linux/ipv6.h | 2 +-
> net/ipv6/ip6_output.c | 3 +--
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index a064997..6c9c604 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -299,7 +299,7 @@ struct tcp6_timewait_sock {
>
> static inline struct ipv6_pinfo *inet6_sk(const struct sock *__sk)
> {
> - return sk_fullsock(__sk) ? inet_sk(__sk)->pinet6 : NULL;
> + return inet_sk(__sk)->pinet6;
David suggestion was about np being NULL or not in ip6_xmit()
But have you checked inet6_sk() was never called for a TCPv6 TIMEWAIT or
SYN_RECV request ?
> }
>
> static inline struct raw6_sock *raw6_sk(const struct sock *sk)
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 59eb4ed..f8c63ec 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -213,8 +213,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
> /*
> * Fill in the IPv6 header
> */
> - if (np)
> - hlimit = np->hop_limit;
> + hlimit = np->hop_limit;
> if (hlimit < 0)
> hlimit = ip6_dst_hoplimit(dst);
>
This part is fine.
^ permalink raw reply
* [PATCH] tun: Use netif_receive_skb instead of netif_rx
From: Andrey Konovalov @ 2016-11-29 15:25 UTC (permalink / raw)
To: Herbert Xu, David S . Miller, Jason Wang, Eric Dumazet,
Paolo Abeni, Michael S . Tsirkin, Soheil Hassas Yeganeh,
Markus Elfring, Mike Rapoport, netdev, linux-kernel
Cc: Dmitry Vyukov, Kostya Serebryany, syzkaller, Andrey Konovalov
This patch changes tun.c to call netif_receive_skb instead of netif_rx
when a packet is received. The difference between the two is that netif_rx
queues the packet into the backlog, and netif_receive_skb proccesses the
packet in the current context.
This patch is required for syzkaller [1] to collect coverage from packet
receive paths, when a packet being received through tun (syzkaller collects
coverage per process in the process context).
A similar patch was introduced back in 2010 [2, 3], but the author found
out that the patch doesn't help with the task he had in mind (for cgroups
to shape network traffic based on the original process) and decided not to
go further with it. The main concern back then was about possible stack
exhaustion with 4K stacks, but CONFIG_4KSTACKS was removed and stacks are
8K now.
[1] https://github.com/google/syzkaller
[2] https://www.spinics.net/lists/netdev/thrd440.html#130570
[3] https://www.spinics.net/lists/netdev/msg130570.html
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
---
drivers/net/tun.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8093e39..4b56e91 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1304,7 +1304,9 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_probe_transport_header(skb, 0);
rxhash = skb_get_hash(skb);
- netif_rx_ni(skb);
+ local_bh_disable();
+ netif_receive_skb(skb);
+ local_bh_enable();
stats = get_cpu_ptr(tun->pcpu_stats);
u64_stats_update_begin(&stats->syncp);
--
2.8.0.rc3.226.g39d4020
^ permalink raw reply related
* Re: [PATCH RFC v2] ethtool: implement helper to get flow_type value
From: Jakub Kicinski @ 2016-11-29 15:21 UTC (permalink / raw)
To: Jacob Keller; +Cc: netdev, Intel Wired LAN, David Miller
In-Reply-To: <20161128230343.19110-1-jacob.e.keller@intel.com>
On Mon, 28 Nov 2016 15:03:43 -0800, Jacob Keller wrote:
> Often a driver wants to store the flow type and thus it must mask the
> extra fields. This is a task that could grow more complex as more flags
> are added in the future. Add a helper function that masks the flags for
> marking additional fields.
>
> Modify drivers in drivers/net/ethernet that currently check for FLOW_EXT
> and FLOW_MAC_EXT to use the helper. Currently this is only the mellanox
> drivers.
>
> I chose not to modify other drivers as I'm actually unsure whether we
> should always mask the flow type even for drivers which don't recognize
> the newer flags. On the one hand, today's drivers (generally)
> automatically fail when a new flag is used because they won't mask it
> and their checks against flow_type will not match. On the other hand, it
> means another place that you have to update when you begin implementing
> a flag.
>
> An alternative is to have the driver store a set of flags that it knows
> about, and then have ethtool core do the check for us to discard frames.
> I haven't implemented this quite yet.
>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 4 ++--
> drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 6 +++---
> include/uapi/linux/ethtool.h | 5 +++++
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 487a58f9c192..d8f9839ce2a3 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -1270,7 +1270,7 @@ static int mlx4_en_validate_flow(struct net_device *dev,
> return -EINVAL;
> }
>
> - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
> + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
> case TCP_V4_FLOW:
> case UDP_V4_FLOW:
> if (cmd->fs.m_u.tcp_ip4_spec.tos)
> @@ -1493,7 +1493,7 @@ static int mlx4_en_ethtool_to_net_trans_rule(struct net_device *dev,
> if (err)
> return err;
>
> - switch (cmd->fs.flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
> + switch (ethtool_get_flow_spec_type(cmd->fs.flow_type)) {
> case ETHER_FLOW:
> spec_l2 = kzalloc(sizeof(*spec_l2), GFP_KERNEL);
> if (!spec_l2)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
> index 3691451c728c..066e6c5cf38b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c
> @@ -63,7 +63,7 @@ static struct mlx5e_ethtool_table *get_flow_table(struct mlx5e_priv *priv,
> int table_size;
> int prio;
>
> - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
> + switch (ethtool_get_flow_spec_type(fs->flow_type)) {
> case TCP_V4_FLOW:
> case UDP_V4_FLOW:
> max_tuples = ETHTOOL_NUM_L3_L4_FTS;
> @@ -147,7 +147,7 @@ static int set_flow_attrs(u32 *match_c, u32 *match_v,
> outer_headers);
> void *outer_headers_v = MLX5_ADDR_OF(fte_match_param, match_v,
> outer_headers);
> - u32 flow_type = fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT);
> + u32 flow_type = ethtool_get_flow_spec_type(fs->flow_type);
> struct ethtool_tcpip4_spec *l4_mask;
> struct ethtool_tcpip4_spec *l4_val;
> struct ethtool_usrip4_spec *l3_mask;
> @@ -393,7 +393,7 @@ static int validate_flow(struct mlx5e_priv *priv,
> fs->ring_cookie != RX_CLS_FLOW_DISC)
> return -EINVAL;
>
> - switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT)) {
> + switch (ethtool_get_flow_spec_type(fs->flow_type)) {
> case ETHER_FLOW:
> eth_mask = &fs->m_u.ether_spec;
> if (!is_zero_ether_addr(eth_mask->h_dest))
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index f0db7788f887..e92ad725c9d0 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -1583,6 +1583,11 @@ static inline int ethtool_validate_duplex(__u8 duplex)
> #define FLOW_EXT 0x80000000
> #define FLOW_MAC_EXT 0x40000000
>
> +static inline __u32 ethtool_get_flow_spec_type(__u32 flow_type)
> +{
> + return flow_type & (FLOW_EXT | FLOW_MAC_EXT);
I don't have anything of substance to say but I think you are missing a
negation (~) here compared to the code you are replacing ;)
^ permalink raw reply
* Re: bpf debug info
From: Daniel Borkmann @ 2016-11-29 15:11 UTC (permalink / raw)
To: Alexei Starovoitov, netdev
Cc: Brenden Blanco, Thomas Graf, Wangnan, He Kuang, kernel-team
In-Reply-To: <20161129064217.GA20124@ast-mbp.thefacebook.com>
On 11/29/2016 07:42 AM, Alexei Starovoitov wrote:
[...]
> The support for debug information in BPF was recently added to llvm.
>
> In order to use it recompile bpf programs with the following patch
> in samples/bpf/Makefile
> @@ -155,4 +155,4 @@ $(obj)/%.o: $(src)/%.c
> $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
> -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> -Wno-compare-distinct-pointer-types \
> - -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> + -O2 -emit-llvm -g -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
>
> and compiled .o files can be consumed by standard llvm-objdump utility.
>
> $ llvm-objdump -S -no-show-raw-insn samples/bpf/xdp1_kern.o
> xdp1_kern.o: file format ELF64-BPF
>
> Disassembly of section xdp1:
> xdp_prog1:
> ; {
> 0: r2 = *(u32 *)(r1 + 4)
> ; void *data = (void *)(long)ctx->data;
> 8: r1 = *(u32 *)(r1 + 0)
> ; if (data + nh_off > data_end)
> 10: r3 = r1
> 18: r3 += 14
> 20: if r3 > r2 goto 55
> ; h_proto = eth->h_proto;
> 28: r3 = *(u8 *)(r1 + 12)
> 30: r4 = *(u8 *)(r1 + 13)
> 38: r4 <<= 8
> 40: r4 |= r3
> ; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
> 48: if r4 == 43144 goto 2
> 50: r3 = 14
> 58: if r4 != 129 goto 5
>
> LBB0_3:
> ; if (data + nh_off > data_end)
> 60: r3 = r1
> 68: r3 += 18
> 70: if r3 > r2 goto 45
> 78: r3 = 18
> ; h_proto = vhdr->h_vlan_encapsulated_proto;
> 80: r4 = *(u16 *)(r1 + 16)
>
> LBB0_5:
> 88: r5 = r4
> 90: r5 &= 65535
> ; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
> 98: if r5 == 43144 goto 1
> a0: if r5 != 129 goto 9
>
> Notice that 'clang -S -o a.s' output and llvm-objdump disassembler
> were changed to use kernel verifier style, so now it should be easier
> to see what's going on.
Sounds really useful, is that scheduled for llvm 3.10 release?
That debugging info is stored in dwarf format into the obj, right?
Would be nice if also pahole could work on bpf object files.
> The main advantage of debug info is that verifier error messages
> are now easier to correlate to original C code.
Does that mean that the old -S output format is not available anymore?
Personally, I liked that one better tbh, was hoping someone would have
added asm parsing for it, so compilation from .S to .o also works.
> For example, say, in samples/bpf/parse_varlen.c I forgot
> to compare pointer into packet with data_end:
> --- a/samples/bpf/parse_varlen.c
> +++ b/samples/bpf/parse_varlen.c
> @@ -33,8 +33,8 @@ static int udp(void *data, uint64_t tp_off, void *data_end)
> {
> struct udphdr *udp = data + tp_off;
>
> - if (udp + 1 > data_end)
> - return 0;
> +// if (udp + 1 > data_end)
> +// return 0;
> if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
> udp->source == htons(DEFAULT_PKTGEN_UDP_PORT)) {
>
> If I try to run samples/bpf/test_cls_bpf.sh the verifier will complain:
> R0=imm0,min_value=0,max_value=0 R1=pkt(id=0,off=0,r=42) R2=pkt_end
> 112: (0f) r4 += r3
> 113: (0f) r1 += r4
> 114: (b7) r0 = 2
> 115: (69) r2 = *(u16 *)(r1 +2)
> invalid access to packet, off=2 size=2, R1(id=3,off=0,r=0)
>
> Now multiply 115 * 8 and convert to hex. This is address 0x398 in llvm-objdump:
> ; struct udphdr *udp = data + tp_off;
> 388: r1 += r4
> 390: r0 = 2
> ; if (udp->dest == htons(DEFAULT_PKTGEN_UDP_PORT) ||
> 398: r2 = *(u16 *)(r1 + 2)
> 3a0: if r2 == 2304 goto 16
>
> Now it's clear which line of C code is causing the verifier to reject.
[...]
Could llvm-objdump switch line numbering for bpf same way as verifier
output, so mapping step is not really needed? Alternatively, on each
verifier error, there could be a hint with cmd to use regarding llvm-objdump.
> So next step is to improve verifier messages to be more human friendly.
> The step after is to introduce BPF_COMMENT pseudo instruction
> that will be ignored by the interpreter yet it will contain the text
> of original source code. Then llvm-objdump step won't be necessary.
> The bpf loader will load both instructions and pieces of C sources.
> Then verifier errors should be even easier to read and humans
> can easily understand the purpose of the program.
So the BPF_COMMENT pseudo insn will get stripped away from the insn array
after verification step, so we don't need to hold/account for this mem? I
assume in it's ->imm member it will just hold offset into text blob?
Given that the generated verifier log can already become huge nowadays up
to a point where less than 4k insns prog fails to load due to reaching max
kernel allowed verifier buffer size, is the plan to only dump C code for
last few lines or for everything?
> PS
> A year ago He Kuang reported that dwarf emitted by bpf llvm backend is broken.
> Sorry it took so long to fix. It's probably still broken on big endian,
> since I've only tested on x86.
>
^ permalink raw reply
* Re: AF_VSOCK network namespace support
From: Stefan Hajnoczi @ 2016-11-29 15:09 UTC (permalink / raw)
To: Jorgen S. Hansen; +Cc: netdev@vger.kernel.org, imbrenda@linux.vnet.ibm.com
In-Reply-To: <4DA0F029-EE07-4E51-879C-2033D1F8AFCD@vmware.com>
[-- Attachment #1: Type: text/plain, Size: 2894 bytes --]
On Mon, Nov 28, 2016 at 03:24:36PM +0000, Jorgen S. Hansen wrote:
> > On Nov 23, 2016, at 3:55 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Hi Jorgen,
> > There are two use cases where network namespace support in AF_VSOCK
> > could be useful:
> >
> > 1. Claudio Imbrenda pointed out that a machine cannot act as both host
> > and guest at the same time. This is necessary for nested
> > virtualization. Currently only one transport (the host side or the
> > guest side) can be registered at a time.
>
> VMCI based AF_VSOCK relies on the VMCI driver for nested virtualization support. The VMCI driver is a combined host/guest driver with a routing component, that will either direct traffic to VMs managed by the host “personality” of the driver, or to the outer host. So any VMCI driver driver is able to function simultaneously as both a guest and a host driver - exactly to be able to support nested virtualization.
>
> Since, for VMCI based vSocket, the host has a fixed CID (2), any traffic generated by an application inside a VM destined for CID 2 will be routed out of the VM (to the host - either a virtual or physical one). Any traffic for a CID > 2 will be directed towards VMs managed by the host personality of the VMCI driver.
>
> Since VMCI predates nested virtualization, the solution above was partly a result of having to support existing configurations in a transparent way.
Thanks for your reply.
It seems like a reasonable solution. We could do something similar for
virtio.
> > 2. Users may wish to isolate the AF_VSOCK address namespace so that two
> > VMs have completely independent CID and ports (they could even use
> > the same CID and ports because they're in separate namespaces). This
> > ensures that a host service visible to VM1 is not automatically
> > visible to VM2.
>
> If the goal is to provide fine grained service access control, won’t this end up requiring a namespace per VM? For ESX, we have a mechanism to tag VMs that allows them to be granted access to a service offered through AF_VSOCK, but this is not part of the Linux hypervisor.
Yes it would and using one network namespace per VM is a bit clumsy
because it means IP networking (e.g. for libiscsi or GlusterFS) needs to
be configured carefully so that external network connectivity works
(using a veth interface).
I did think about netfilter support but one of the main reasons for
using AF_VSOCK vs Ethernet is to avoid user firewall rules from
disrupting hypervisor services :).
> If the intent is to be able to support multi tenancy, then this sounds like a better fit. Also, in the multi tenancy case, isolating the other AFs is probably what you want as well.
Eventually multi-tenancy might be interesting but just fine-grained
service access control would be enough for most cases.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply
* [PATCH 4/5] net: ethernet: ti: cpsw: optimize end of poll cycle
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
In-Reply-To: <1480431651-6081-1-git-send-email-ivan.khoronzhuk@linaro.org>
Check budget fullness only after it's updated and update
channel mask only once to keep budget balance between channels.
It's also needed for farther changes.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 24 ++++++------------------
1 file changed, 6 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c102675..8a70298 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -788,19 +788,13 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
/* process every unprocessed channel */
ch_map = cpdma_ctrl_txchs_state(cpsw->dma);
- for (ch = 0, num_tx = 0; num_tx < budget; ch_map >>= 1, ch++) {
- if (!ch_map) {
- ch_map = cpdma_ctrl_txchs_state(cpsw->dma);
- if (!ch_map)
- break;
-
- ch = 0;
- }
-
+ for (ch = 0, num_tx = 0; ch_map; ch_map >>= 1, ch++) {
if (!(ch_map & 0x01))
continue;
num_tx += cpdma_chan_process(cpsw->txch[ch], budget - num_tx);
+ if (num_tx >= budget)
+ break;
}
if (num_tx < budget) {
@@ -823,19 +817,13 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
/* process every unprocessed channel */
ch_map = cpdma_ctrl_rxchs_state(cpsw->dma);
- for (ch = 0, num_rx = 0; num_rx < budget; ch_map >>= 1, ch++) {
- if (!ch_map) {
- ch_map = cpdma_ctrl_rxchs_state(cpsw->dma);
- if (!ch_map)
- break;
-
- ch = 0;
- }
-
+ for (ch = 0, num_rx = 0; ch_map; ch_map >>= 1, ch++) {
if (!(ch_map & 0x01))
continue;
num_rx += cpdma_chan_process(cpsw->rxch[ch], budget - num_rx);
+ if (num_rx >= budget)
+ break;
}
if (num_rx < budget) {
--
2.7.4
^ permalink raw reply related
* [PATCH 3/5] net: ethernet: ti: cpsw: add .ndo to set per-queue rate
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
In-Reply-To: <1480431651-6081-1-git-send-email-ivan.khoronzhuk@linaro.org>
This patch allows to rate limit queues tx queues for cpsw interface.
The rate is set in absolute Mb/s units and cannot be more a speed
an interface is connected with.
The rate for a tx queue can be tested with:
ethtool -L eth0 rx 4 tx 4
echo 100 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
echo 200 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
echo 50 > /sys/class/net/eth0/queues/tx-2/tx_maxrate
echo 30 > /sys/class/net/eth0/queues/tx-3/tx_maxrate
tc qdisc add dev eth0 root handle 1: multiq
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip\
dport 5001 0xffff action skbedit queue_mapping 0
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip\
dport 5002 0xffff action skbedit queue_mapping 1
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip\
dport 5003 0xffff action skbedit queue_mapping 2
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip\
dport 5004 0xffff action skbedit queue_mapping 3
iperf -c 192.168.2.1 -b 110M -p 5001 -f m -t 60
iperf -c 192.168.2.1 -b 215M -p 5002 -f m -t 60
iperf -c 192.168.2.1 -b 55M -p 5003 -f m -t 60
iperf -c 192.168.2.1 -b 32M -p 5004 -f m -t 60
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 87 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index da40ea5..c102675 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1872,6 +1872,88 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
return ret;
}
+static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
+{
+ struct cpsw_priv *priv = netdev_priv(ndev);
+ int tx_ch_num = ndev->real_num_tx_queues;
+ u32 consumed_rate, min_rate, max_rate;
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_slave *slave;
+ int ret, i, weight;
+ int rlim_num = 0;
+ u32 ch_rate;
+
+ ch_rate = netdev_get_tx_queue(ndev, queue)->tx_maxrate;
+ if (ch_rate == rate)
+ return 0;
+
+ if (cpsw->data.dual_emac)
+ slave = &cpsw->slaves[priv->emac_port];
+ else
+ slave = &cpsw->slaves[cpsw->data.active_slave];
+ max_rate = slave->phy->speed;
+
+ consumed_rate = 0;
+ for (i = 0; i < tx_ch_num; i++) {
+ if (i == queue)
+ ch_rate = rate;
+ else
+ ch_rate = netdev_get_tx_queue(ndev, i)->tx_maxrate;
+
+ if (!ch_rate)
+ continue;
+
+ rlim_num++;
+ consumed_rate += ch_rate;
+ }
+
+ if (consumed_rate > max_rate)
+ dev_info(priv->dev, "The common rate shouldn't be more than %dMbps",
+ max_rate);
+
+ if (consumed_rate > max_rate) {
+ if (max_rate == 10 && consumed_rate <= 100) {
+ max_rate = 100;
+ } else if (max_rate <= 100 && consumed_rate <= 1000) {
+ max_rate = 1000;
+ } else {
+ dev_err(priv->dev, "The common rate cannot be more than %dMbps",
+ max_rate);
+ return -EINVAL;
+ }
+ }
+
+ if (consumed_rate > max_rate) {
+ dev_err(priv->dev, "The common rate cannot be more than %dMbps",
+ max_rate);
+ return -EINVAL;
+ }
+
+ rate *= 1000;
+ min_rate = cpdma_chan_get_min_rate(cpsw->dma);
+ if ((rate < min_rate && rate)) {
+ dev_err(priv->dev, "The common rate cannot be less than %dMbps",
+ min_rate);
+ return -EINVAL;
+ }
+
+ ret = pm_runtime_get_sync(cpsw->dev);
+ if (ret < 0) {
+ pm_runtime_put_noidle(cpsw->dev);
+ return ret;
+ }
+
+ if (rlim_num == tx_ch_num)
+ max_rate = consumed_rate;
+
+ weight = (rate * 100) / (max_rate * 1000);
+ cpdma_chan_set_weight(cpsw->txch[queue], weight);
+
+ ret = cpdma_chan_set_rate(cpsw->txch[queue], rate);
+ pm_runtime_put(cpsw->dev);
+ return ret;
+}
+
static const struct net_device_ops cpsw_netdev_ops = {
.ndo_open = cpsw_ndo_open,
.ndo_stop = cpsw_ndo_stop,
@@ -1881,6 +1963,7 @@ static const struct net_device_ops cpsw_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
.ndo_tx_timeout = cpsw_ndo_tx_timeout,
.ndo_set_rx_mode = cpsw_ndo_set_rx_mode,
+ .ndo_set_tx_maxrate = cpsw_ndo_set_tx_maxrate,
#ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller = cpsw_ndo_poll_controller,
#endif
@@ -2100,6 +2183,7 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
int (*poll)(struct napi_struct *, int);
struct cpsw_common *cpsw = priv->cpsw;
void (*handler)(void *, int, int);
+ struct netdev_queue *queue;
struct cpdma_chan **chan;
int ret, *ch;
@@ -2117,6 +2201,8 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
while (*ch < ch_num) {
chan[*ch] = cpdma_chan_create(cpsw->dma, *ch, handler, rx);
+ queue = netdev_get_tx_queue(priv->ndev, *ch);
+ queue->tx_maxrate = 0;
if (IS_ERR(chan[*ch]))
return PTR_ERR(chan[*ch]);
@@ -2752,6 +2838,7 @@ static int cpsw_probe(struct platform_device *pdev)
dma_params.desc_align = 16;
dma_params.has_ext_regs = true;
dma_params.desc_hw_addr = dma_params.desc_mem_phys;
+ dma_params.bus_freq_mhz = cpsw->bus_freq_mhz;
cpsw->dma = cpdma_ctlr_create(&dma_params);
if (!cpsw->dma) {
--
2.7.4
^ permalink raw reply related
* [PATCH 5/5] net: ethernet: ti: cpsw: split tx budget according between channels
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
In-Reply-To: <1480431651-6081-1-git-send-email-ivan.khoronzhuk@linaro.org>
Split device budget between channels according to channel rate.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/cpsw.c | 159 +++++++++++++++++++++++++++++++++--------
1 file changed, 130 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8a70298..ec4873f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -365,6 +365,11 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
__raw_writel(val, slave->regs + offset);
}
+struct cpsw_vector {
+ struct cpdma_chan *ch;
+ int budget;
+};
+
struct cpsw_common {
struct device *dev;
struct cpsw_platform_data data;
@@ -380,8 +385,8 @@ struct cpsw_common {
int rx_packet_max;
struct cpsw_slave *slaves;
struct cpdma_ctlr *dma;
- struct cpdma_chan *txch[CPSW_MAX_QUEUES];
- struct cpdma_chan *rxch[CPSW_MAX_QUEUES];
+ struct cpsw_vector txv[CPSW_MAX_QUEUES];
+ struct cpsw_vector rxv[CPSW_MAX_QUEUES];
struct cpsw_ale *ale;
bool quirk_irq;
bool rx_irq_disabled;
@@ -741,7 +746,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
return;
}
- ch = cpsw->rxch[skb_get_queue_mapping(new_skb)];
+ ch = cpsw->rxv[skb_get_queue_mapping(new_skb)].ch;
ret = cpdma_chan_submit(ch, new_skb, new_skb->data,
skb_tailroom(new_skb), 0);
if (WARN_ON(ret < 0))
@@ -783,8 +788,9 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
{
u32 ch_map;
- int num_tx, ch;
+ int num_tx, cur_budget, ch;
struct cpsw_common *cpsw = napi_to_cpsw(napi_tx);
+ struct cpsw_vector *txv;
/* process every unprocessed channel */
ch_map = cpdma_ctrl_txchs_state(cpsw->dma);
@@ -792,7 +798,13 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
if (!(ch_map & 0x01))
continue;
- num_tx += cpdma_chan_process(cpsw->txch[ch], budget - num_tx);
+ txv = &cpsw->txv[ch];
+ if (unlikely(txv->budget > budget - num_tx))
+ cur_budget = budget - num_tx;
+ else
+ cur_budget = txv->budget;
+
+ num_tx += cpdma_chan_process(txv->ch, cur_budget);
if (num_tx >= budget)
break;
}
@@ -812,8 +824,9 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
{
u32 ch_map;
- int num_rx, ch;
+ int num_rx, cur_budget, ch;
struct cpsw_common *cpsw = napi_to_cpsw(napi_rx);
+ struct cpsw_vector *rxv;
/* process every unprocessed channel */
ch_map = cpdma_ctrl_rxchs_state(cpsw->dma);
@@ -821,7 +834,13 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
if (!(ch_map & 0x01))
continue;
- num_rx += cpdma_chan_process(cpsw->rxch[ch], budget - num_rx);
+ rxv = &cpsw->rxv[ch];
+ if (unlikely(rxv->budget > budget - num_rx))
+ cur_budget = budget - num_rx;
+ else
+ cur_budget = rxv->budget;
+
+ num_rx += cpdma_chan_process(rxv->ch, cur_budget);
if (num_rx >= budget)
break;
}
@@ -1063,7 +1082,7 @@ static void cpsw_get_ethtool_stats(struct net_device *ndev,
cpsw_gstrings_stats[l].stat_offset);
for (ch = 0; ch < cpsw->rx_ch_num; ch++) {
- cpdma_chan_get_stats(cpsw->rxch[ch], &ch_stats);
+ cpdma_chan_get_stats(cpsw->rxv[ch].ch, &ch_stats);
for (i = 0; i < CPSW_STATS_CH_LEN; i++, l++) {
p = (u8 *)&ch_stats +
cpsw_gstrings_ch_stats[i].stat_offset;
@@ -1072,7 +1091,7 @@ static void cpsw_get_ethtool_stats(struct net_device *ndev,
}
for (ch = 0; ch < cpsw->tx_ch_num; ch++) {
- cpdma_chan_get_stats(cpsw->txch[ch], &ch_stats);
+ cpdma_chan_get_stats(cpsw->txv[ch].ch, &ch_stats);
for (i = 0; i < CPSW_STATS_CH_LEN; i++, l++) {
p = (u8 *)&ch_stats +
cpsw_gstrings_ch_stats[i].stat_offset;
@@ -1261,6 +1280,82 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
}
}
+/* split budget depending on channel rates */
+static void cpsw_split_budget(struct net_device *ndev)
+{
+ struct cpsw_priv *priv = netdev_priv(ndev);
+ struct cpsw_common *cpsw = priv->cpsw;
+ struct cpsw_vector *txv = cpsw->txv;
+ u32 consumed_rate, bigest_rate = 0;
+ int budget, bigest_rate_ch = 0;
+ struct cpsw_slave *slave;
+ int i, rlim_ch_num = 0;
+ u32 ch_rate, max_rate;
+ int ch_budget = 0;
+
+ if (cpsw->data.dual_emac)
+ slave = &cpsw->slaves[priv->emac_port];
+ else
+ slave = &cpsw->slaves[cpsw->data.active_slave];
+
+ max_rate = slave->phy->speed * 1000;
+
+ consumed_rate = 0;
+ for (i = 0; i < cpsw->tx_ch_num; i++) {
+ ch_rate = cpdma_chan_get_rate(txv[i].ch);
+ if (!ch_rate)
+ continue;
+
+ rlim_ch_num++;
+ consumed_rate += ch_rate;
+ }
+
+ if (cpsw->tx_ch_num == rlim_ch_num) {
+ max_rate = consumed_rate;
+ } else {
+ ch_budget = (consumed_rate * CPSW_POLL_WEIGHT) / max_rate;
+ ch_budget = (CPSW_POLL_WEIGHT - ch_budget) /
+ (cpsw->tx_ch_num - rlim_ch_num);
+ bigest_rate = (max_rate - consumed_rate) /
+ (cpsw->tx_ch_num - rlim_ch_num);
+ }
+
+ /* split tx budget */
+ budget = CPSW_POLL_WEIGHT;
+ for (i = 0; i < cpsw->tx_ch_num; i++) {
+ ch_rate = cpdma_chan_get_rate(txv[i].ch);
+ if (ch_rate) {
+ txv[i].budget = (ch_rate * CPSW_POLL_WEIGHT) / max_rate;
+ if (!txv[i].budget)
+ txv[i].budget = 1;
+ if (ch_rate > bigest_rate) {
+ bigest_rate_ch = i;
+ bigest_rate = ch_rate;
+ }
+ } else {
+ txv[i].budget = ch_budget;
+ if (!bigest_rate_ch)
+ bigest_rate_ch = i;
+ }
+
+ budget -= txv[i].budget;
+ }
+
+ if (budget)
+ txv[bigest_rate_ch].budget += budget;
+
+ /* split rx budget */
+ budget = CPSW_POLL_WEIGHT;
+ ch_budget = budget / cpsw->rx_ch_num;
+ for (i = 0; i < cpsw->rx_ch_num; i++) {
+ cpsw->rxv[i].budget = ch_budget;
+ budget -= ch_budget;
+ }
+
+ if (budget)
+ cpsw->rxv[0].budget += budget;
+}
+
static int cpsw_fill_rx_channels(struct cpsw_priv *priv)
{
struct cpsw_common *cpsw = priv->cpsw;
@@ -1269,7 +1364,7 @@ static int cpsw_fill_rx_channels(struct cpsw_priv *priv)
int ch, i, ret;
for (ch = 0; ch < cpsw->rx_ch_num; ch++) {
- ch_buf_num = cpdma_chan_get_rx_buf_num(cpsw->rxch[ch]);
+ ch_buf_num = cpdma_chan_get_rx_buf_num(cpsw->rxv[ch].ch);
for (i = 0; i < ch_buf_num; i++) {
skb = __netdev_alloc_skb_ip_align(priv->ndev,
cpsw->rx_packet_max,
@@ -1280,8 +1375,9 @@ static int cpsw_fill_rx_channels(struct cpsw_priv *priv)
}
skb_set_queue_mapping(skb, ch);
- ret = cpdma_chan_submit(cpsw->rxch[ch], skb, skb->data,
- skb_tailroom(skb), 0);
+ ret = cpdma_chan_submit(cpsw->rxv[ch].ch, skb,
+ skb->data, skb_tailroom(skb),
+ 0);
if (ret < 0) {
cpsw_err(priv, ifup,
"cannot submit skb to channel %d rx, error %d\n",
@@ -1405,6 +1501,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
cpsw_set_coalesce(ndev, &coal);
}
+ cpsw_split_budget(ndev);
cpdma_ctlr_start(cpsw->dma);
cpsw_intr_enable(cpsw);
@@ -1474,7 +1571,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
if (q_idx >= cpsw->tx_ch_num)
q_idx = q_idx % cpsw->tx_ch_num;
- txch = cpsw->txch[q_idx];
+ txch = cpsw->txv[q_idx].ch;
ret = cpsw_tx_packet_submit(priv, skb, txch);
if (unlikely(ret != 0)) {
cpsw_err(priv, tx_err, "desc submit failed\n");
@@ -1681,8 +1778,8 @@ static void cpsw_ndo_tx_timeout(struct net_device *ndev)
ndev->stats.tx_errors++;
cpsw_intr_disable(cpsw);
for (ch = 0; ch < cpsw->tx_ch_num; ch++) {
- cpdma_chan_stop(cpsw->txch[ch]);
- cpdma_chan_start(cpsw->txch[ch]);
+ cpdma_chan_stop(cpsw->txv[ch].ch);
+ cpdma_chan_start(cpsw->txv[ch].ch);
}
cpsw_intr_enable(cpsw);
@@ -1887,7 +1984,6 @@ static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
ch_rate = rate;
else
ch_rate = netdev_get_tx_queue(ndev, i)->tx_maxrate;
-
if (!ch_rate)
continue;
@@ -1935,9 +2031,12 @@ static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
max_rate = consumed_rate;
weight = (rate * 100) / (max_rate * 1000);
- cpdma_chan_set_weight(cpsw->txch[queue], weight);
+ cpdma_chan_set_weight(cpsw->txv[queue].ch, weight);
+ ret = cpdma_chan_set_rate(cpsw->txv[queue].ch, rate);
- ret = cpdma_chan_set_rate(cpsw->txch[queue], rate);
+ /* re-split budget between channels */
+ if (!rate)
+ cpsw_split_budget(ndev);
pm_runtime_put(cpsw->dev);
return ret;
}
@@ -2172,30 +2271,30 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
struct cpsw_common *cpsw = priv->cpsw;
void (*handler)(void *, int, int);
struct netdev_queue *queue;
- struct cpdma_chan **chan;
+ struct cpsw_vector *vec;
int ret, *ch;
if (rx) {
ch = &cpsw->rx_ch_num;
- chan = cpsw->rxch;
+ vec = cpsw->rxv;
handler = cpsw_rx_handler;
poll = cpsw_rx_poll;
} else {
ch = &cpsw->tx_ch_num;
- chan = cpsw->txch;
+ vec = cpsw->txv;
handler = cpsw_tx_handler;
poll = cpsw_tx_poll;
}
while (*ch < ch_num) {
- chan[*ch] = cpdma_chan_create(cpsw->dma, *ch, handler, rx);
+ vec[*ch].ch = cpdma_chan_create(cpsw->dma, *ch, handler, rx);
queue = netdev_get_tx_queue(priv->ndev, *ch);
queue->tx_maxrate = 0;
- if (IS_ERR(chan[*ch]))
- return PTR_ERR(chan[*ch]);
+ if (IS_ERR(vec[*ch].ch))
+ return PTR_ERR(vec[*ch].ch);
- if (!chan[*ch])
+ if (!vec[*ch].ch)
return -EINVAL;
cpsw_info(priv, ifup, "created new %d %s channel\n", *ch,
@@ -2206,7 +2305,7 @@ static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
while (*ch > ch_num) {
(*ch)--;
- ret = cpdma_chan_destroy(chan[*ch]);
+ ret = cpdma_chan_destroy(vec[*ch].ch);
if (ret)
return ret;
@@ -2293,6 +2392,8 @@ static int cpsw_set_channels(struct net_device *ndev,
if (ret)
goto err;
+ cpsw_split_budget(ndev);
+
/* After this receive is started */
cpdma_ctlr_start(cpsw->dma);
cpsw_intr_enable(cpsw);
@@ -2835,9 +2936,9 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_dt_ret;
}
- cpsw->txch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
- cpsw->rxch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
- if (WARN_ON(!cpsw->rxch[0] || !cpsw->txch[0])) {
+ cpsw->txv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
+ cpsw->rxv[0].ch = cpdma_chan_create(cpsw->dma, 0, cpsw_rx_handler, 1);
+ if (WARN_ON(!cpsw->rxv[0].ch || !cpsw->txv[0].ch)) {
dev_err(priv->dev, "error initializing dma channels\n");
ret = -ENOMEM;
goto clean_dma_ret;
--
2.7.4
^ permalink raw reply related
* [PATCH 2/5] net: ethernet: ti: davinci_cpdma: add set rate for a channel
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
In-Reply-To: <1480431651-6081-1-git-send-email-ivan.khoronzhuk@linaro.org>
The cpdma has 8 rate limited tx channels. This patch adds
ability for cpdma driver to use 8 tx h/w shapers. If at least one
channel is not rate limited then it must have higher number, this
is because the rate limited channels have to have higher priority
then not rate limited channels. The channel priority is set in low-hi
direction already, so that when a new channel is added with ethtool
and it doesn't have rate yet, it cannot affect on rate limited
channels. It can be useful for TSN streams and just in cases when
h/w rate limited channels are needed.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/davinci_cpdma.c | 329 +++++++++++++++++++++++++++-----
drivers/net/ethernet/ti/davinci_cpdma.h | 5 +
2 files changed, 289 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 87456a9..c776e45 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -32,6 +32,7 @@
#define CPDMA_RXCONTROL 0x14
#define CPDMA_SOFTRESET 0x1c
#define CPDMA_RXTEARDOWN 0x18
+#define CPDMA_TX_PRI0_RATE 0x30
#define CPDMA_TXINTSTATRAW 0x80
#define CPDMA_TXINTSTATMASKED 0x84
#define CPDMA_TXINTMASKSET 0x88
@@ -68,6 +69,8 @@
#define CPDMA_TEARDOWN_VALUE 0xfffffffc
+#define CPDMA_MAX_RLIM_CNT 16384
+
struct cpdma_desc {
/* hardware fields */
u32 hw_next;
@@ -123,6 +126,8 @@ struct cpdma_chan {
/* offsets into dmaregs */
int int_set, int_clear, td;
int weight;
+ u32 rate_factor;
+ u32 rate;
};
struct cpdma_control_info {
@@ -135,6 +140,7 @@ struct cpdma_control_info {
};
static struct cpdma_control_info controls[] = {
+ [CPDMA_TX_RLIM] = {CPDMA_DMACONTROL, 8, 0xffff, ACCESS_RW},
[CPDMA_CMD_IDLE] = {CPDMA_DMACONTROL, 3, 1, ACCESS_WO},
[CPDMA_COPY_ERROR_FRAMES] = {CPDMA_DMACONTROL, 4, 1, ACCESS_RW},
[CPDMA_RX_OFF_LEN_UPDATE] = {CPDMA_DMACONTROL, 2, 1, ACCESS_RW},
@@ -302,6 +308,186 @@ static int _cpdma_control_set(struct cpdma_ctlr *ctlr, int control, int value)
return 0;
}
+static int _cpdma_control_get(struct cpdma_ctlr *ctlr, int control)
+{
+ struct cpdma_control_info *info = &controls[control];
+ int ret;
+
+ if (!ctlr->params.has_ext_regs)
+ return -ENOTSUPP;
+
+ if (ctlr->state != CPDMA_STATE_ACTIVE)
+ return -EINVAL;
+
+ if (control < 0 || control >= ARRAY_SIZE(controls))
+ return -ENOENT;
+
+ if ((info->access & ACCESS_RO) != ACCESS_RO)
+ return -EPERM;
+
+ ret = (dma_reg_read(ctlr, info->reg) >> info->shift) & info->mask;
+ return ret;
+}
+
+/* cpdma_chan_set_chan_shaper - set shaper for a channel
+ * Has to be called under ctlr lock
+ */
+static int cpdma_chan_set_chan_shaper(struct cpdma_chan *chan)
+{
+ struct cpdma_ctlr *ctlr = chan->ctlr;
+ u32 rate_reg;
+ u32 rmask;
+ int ret;
+
+ if (!chan->rate)
+ return 0;
+
+ rate_reg = CPDMA_TX_PRI0_RATE + 4 * chan->chan_num;
+ dma_reg_write(ctlr, rate_reg, chan->rate_factor);
+
+ rmask = _cpdma_control_get(ctlr, CPDMA_TX_RLIM);
+ rmask |= chan->mask;
+
+ ret = _cpdma_control_set(ctlr, CPDMA_TX_RLIM, rmask);
+ return ret;
+}
+
+static int cpdma_chan_on(struct cpdma_chan *chan)
+{
+ struct cpdma_ctlr *ctlr = chan->ctlr;
+ struct cpdma_desc_pool *pool = ctlr->pool;
+ unsigned long flags;
+
+ spin_lock_irqsave(&chan->lock, flags);
+ if (chan->state != CPDMA_STATE_IDLE) {
+ spin_unlock_irqrestore(&chan->lock, flags);
+ return -EBUSY;
+ }
+ if (ctlr->state != CPDMA_STATE_ACTIVE) {
+ spin_unlock_irqrestore(&chan->lock, flags);
+ return -EINVAL;
+ }
+ dma_reg_write(ctlr, chan->int_set, chan->mask);
+ chan->state = CPDMA_STATE_ACTIVE;
+ if (chan->head) {
+ chan_write(chan, hdp, desc_phys(pool, chan->head));
+ if (chan->rxfree)
+ chan_write(chan, rxfree, chan->count);
+ }
+
+ spin_unlock_irqrestore(&chan->lock, flags);
+ return 0;
+}
+
+/* cpdma_chan_fit_rate - set rate for a channel and check if it's possible.
+ * rmask - mask of rate limited channels
+ * Returns min rate in Kb/s
+ */
+static int cpdma_chan_fit_rate(struct cpdma_chan *ch, u32 rate,
+ u32 *rmask, int *prio_mode)
+{
+ struct cpdma_ctlr *ctlr = ch->ctlr;
+ struct cpdma_chan *chan;
+ u32 old_rate = ch->rate;
+ u32 new_rmask = 0;
+ int rlim = 1;
+ int i;
+
+ *prio_mode = 0;
+ for (i = tx_chan_num(0); i < tx_chan_num(CPDMA_MAX_CHANNELS); i++) {
+ chan = ctlr->channels[i];
+ if (!chan) {
+ rlim = 0;
+ continue;
+ }
+
+ if (chan == ch)
+ chan->rate = rate;
+
+ if (chan->rate) {
+ if (rlim) {
+ new_rmask |= chan->mask;
+ } else {
+ ch->rate = old_rate;
+ dev_err(ctlr->dev, "Prev channel of %dch is not rate limited\n",
+ chan->chan_num);
+ return -EINVAL;
+ }
+ } else {
+ *prio_mode = 1;
+ rlim = 0;
+ }
+ }
+
+ *rmask = new_rmask;
+ return 0;
+}
+
+static u32 cpdma_chan_set_factors(struct cpdma_ctlr *ctlr,
+ struct cpdma_chan *ch)
+{
+ u32 delta = UINT_MAX, prev_delta = UINT_MAX, best_delta = UINT_MAX;
+ u32 best_send_cnt = 0, best_idle_cnt = 0;
+ u32 new_rate, best_rate = 0, rate_reg;
+ u64 send_cnt, idle_cnt;
+ u32 min_send_cnt, freq;
+ u64 divident, divisor;
+
+ if (!ch->rate) {
+ ch->rate_factor = 0;
+ goto set_factor;
+ }
+
+ freq = ctlr->params.bus_freq_mhz * 1000 * 32;
+ if (!freq) {
+ dev_err(ctlr->dev, "The bus frequency is not set\n");
+ return -EINVAL;
+ }
+
+ min_send_cnt = freq - ch->rate;
+ send_cnt = DIV_ROUND_UP(min_send_cnt, ch->rate);
+ while (send_cnt <= CPDMA_MAX_RLIM_CNT) {
+ divident = ch->rate * send_cnt;
+ divisor = min_send_cnt;
+ idle_cnt = DIV_ROUND_CLOSEST_ULL(divident, divisor);
+
+ divident = freq * idle_cnt;
+ divisor = idle_cnt + send_cnt;
+ new_rate = DIV_ROUND_CLOSEST_ULL(divident, divisor);
+
+ delta = new_rate >= ch->rate ? new_rate - ch->rate : delta;
+ if (delta < best_delta) {
+ best_delta = delta;
+ best_send_cnt = send_cnt;
+ best_idle_cnt = idle_cnt;
+ best_rate = new_rate;
+
+ if (!delta)
+ break;
+ }
+
+ if (prev_delta >= delta) {
+ prev_delta = delta;
+ send_cnt++;
+ continue;
+ }
+
+ idle_cnt++;
+ divident = freq * idle_cnt;
+ send_cnt = DIV_ROUND_CLOSEST_ULL(divident, ch->rate);
+ send_cnt -= idle_cnt;
+ prev_delta = UINT_MAX;
+ }
+
+ ch->rate = best_rate;
+ ch->rate_factor = best_send_cnt | (best_idle_cnt << 16);
+
+set_factor:
+ rate_reg = CPDMA_TX_PRI0_RATE + 4 * ch->chan_num;
+ dma_reg_write(ctlr, rate_reg, ch->rate_factor);
+ return 0;
+}
+
struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params *params)
{
struct cpdma_ctlr *ctlr;
@@ -332,8 +518,9 @@ EXPORT_SYMBOL_GPL(cpdma_ctlr_create);
int cpdma_ctlr_start(struct cpdma_ctlr *ctlr)
{
+ struct cpdma_chan *chan;
unsigned long flags;
- int i;
+ int i, prio_mode;
spin_lock_irqsave(&ctlr->lock, flags);
if (ctlr->state != CPDMA_STATE_IDLE) {
@@ -369,12 +556,20 @@ int cpdma_ctlr_start(struct cpdma_ctlr *ctlr)
ctlr->state = CPDMA_STATE_ACTIVE;
+ prio_mode = 0;
for (i = 0; i < ARRAY_SIZE(ctlr->channels); i++) {
- if (ctlr->channels[i])
- cpdma_chan_start(ctlr->channels[i]);
+ chan = ctlr->channels[i];
+ if (chan) {
+ cpdma_chan_set_chan_shaper(chan);
+ cpdma_chan_on(chan);
+
+ /* off prio mode if all tx channels are rate limited */
+ if (is_tx_chan(chan) && !chan->rate)
+ prio_mode = 1;
+ }
}
- _cpdma_control_set(ctlr, CPDMA_TX_PRIO_FIXED, 1);
+ _cpdma_control_set(ctlr, CPDMA_TX_PRIO_FIXED, prio_mode);
_cpdma_control_set(ctlr, CPDMA_RX_BUFFER_OFFSET, 0);
spin_unlock_irqrestore(&ctlr->lock, flags);
@@ -602,6 +797,75 @@ int cpdma_chan_set_weight(struct cpdma_chan *ch, int weight)
return ret;
}
+/* cpdma_chan_get_min_rate - get minimum allowed rate for channel
+ * Should be called before cpdma_chan_set_rate.
+ * Returns min rate in Kb/s
+ */
+u32 cpdma_chan_get_min_rate(struct cpdma_ctlr *ctlr)
+{
+ unsigned int divident, divisor;
+
+ divident = ctlr->params.bus_freq_mhz * 32 * 1000;
+ divisor = 1 + CPDMA_MAX_RLIM_CNT;
+
+ return DIV_ROUND_UP(divident, divisor);
+}
+
+/* cpdma_chan_set_rate - limits bandwidth for transmit channel.
+ * The bandwidth * limited channels have to be in order beginning from lowest.
+ * ch - transmit channel the bandwidth is configured for
+ * rate - bandwidth in Kb/s, if 0 - then off shaper
+ */
+int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
+{
+ struct cpdma_ctlr *ctlr = ch->ctlr;
+ unsigned long flags, ch_flags;
+ int ret, prio_mode;
+ u32 rmask;
+
+ if (!ch || !is_tx_chan(ch))
+ return -EINVAL;
+
+ if (ch->rate == rate)
+ return rate;
+
+ spin_lock_irqsave(&ctlr->lock, flags);
+ spin_lock_irqsave(&ch->lock, ch_flags);
+
+ ret = cpdma_chan_fit_rate(ch, rate, &rmask, &prio_mode);
+ if (ret)
+ goto err;
+
+ ret = cpdma_chan_set_factors(ctlr, ch);
+ if (ret)
+ goto err;
+
+ spin_unlock_irqrestore(&ch->lock, ch_flags);
+
+ /* on shapers */
+ _cpdma_control_set(ctlr, CPDMA_TX_RLIM, rmask);
+ _cpdma_control_set(ctlr, CPDMA_TX_PRIO_FIXED, prio_mode);
+ spin_unlock_irqrestore(&ctlr->lock, flags);
+ return ret;
+
+err:
+ spin_unlock_irqrestore(&ch->lock, ch_flags);
+ spin_unlock_irqrestore(&ctlr->lock, flags);
+ return ret;
+}
+
+u32 cpdma_chan_get_rate(struct cpdma_chan *ch)
+{
+ unsigned long flags;
+ u32 rate;
+
+ spin_lock_irqsave(&ch->lock, flags);
+ rate = ch->rate;
+ spin_unlock_irqrestore(&ch->lock, flags);
+
+ return rate;
+}
+
struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
cpdma_handler_fn handler, int rx_type)
{
@@ -629,6 +893,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
chan->state = CPDMA_STATE_IDLE;
chan->chan_num = chan_num;
chan->handler = handler;
+ chan->rate = 0;
chan->desc_num = ctlr->pool->num_desc / 2;
chan->weight = 0;
@@ -924,28 +1189,20 @@ EXPORT_SYMBOL_GPL(cpdma_chan_process);
int cpdma_chan_start(struct cpdma_chan *chan)
{
- struct cpdma_ctlr *ctlr = chan->ctlr;
- struct cpdma_desc_pool *pool = ctlr->pool;
- unsigned long flags;
+ struct cpdma_ctlr *ctlr = chan->ctlr;
+ unsigned long flags;
+ int ret;
- spin_lock_irqsave(&chan->lock, flags);
- if (chan->state != CPDMA_STATE_IDLE) {
- spin_unlock_irqrestore(&chan->lock, flags);
- return -EBUSY;
- }
- if (ctlr->state != CPDMA_STATE_ACTIVE) {
- spin_unlock_irqrestore(&chan->lock, flags);
- return -EINVAL;
- }
- dma_reg_write(ctlr, chan->int_set, chan->mask);
- chan->state = CPDMA_STATE_ACTIVE;
- if (chan->head) {
- chan_write(chan, hdp, desc_phys(pool, chan->head));
- if (chan->rxfree)
- chan_write(chan, rxfree, chan->count);
- }
+ spin_lock_irqsave(&ctlr->lock, flags);
+ ret = cpdma_chan_set_chan_shaper(chan);
+ spin_unlock_irqrestore(&ctlr->lock, flags);
+ if (ret)
+ return ret;
+
+ ret = cpdma_chan_on(chan);
+ if (ret)
+ return ret;
- spin_unlock_irqrestore(&chan->lock, flags);
return 0;
}
EXPORT_SYMBOL_GPL(cpdma_chan_start);
@@ -1033,31 +1290,12 @@ int cpdma_chan_int_ctrl(struct cpdma_chan *chan, bool enable)
int cpdma_control_get(struct cpdma_ctlr *ctlr, int control)
{
unsigned long flags;
- struct cpdma_control_info *info = &controls[control];
int ret;
spin_lock_irqsave(&ctlr->lock, flags);
-
- ret = -ENOTSUPP;
- if (!ctlr->params.has_ext_regs)
- goto unlock_ret;
-
- ret = -EINVAL;
- if (ctlr->state != CPDMA_STATE_ACTIVE)
- goto unlock_ret;
-
- ret = -ENOENT;
- if (control < 0 || control >= ARRAY_SIZE(controls))
- goto unlock_ret;
-
- ret = -EPERM;
- if ((info->access & ACCESS_RO) != ACCESS_RO)
- goto unlock_ret;
-
- ret = (dma_reg_read(ctlr, info->reg) >> info->shift) & info->mask;
-
-unlock_ret:
+ ret = _cpdma_control_get(ctlr, control);
spin_unlock_irqrestore(&ctlr->lock, flags);
+
return ret;
}
@@ -1069,6 +1307,7 @@ int cpdma_control_set(struct cpdma_ctlr *ctlr, int control, int value)
spin_lock_irqsave(&ctlr->lock, flags);
ret = _cpdma_control_set(ctlr, control, value);
spin_unlock_irqrestore(&ctlr->lock, flags);
+
return ret;
}
EXPORT_SYMBOL_GPL(cpdma_control_set);
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index 629020c..4a167db 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -36,6 +36,7 @@ struct cpdma_params {
u32 desc_hw_addr;
int desc_mem_size;
int desc_align;
+ u32 bus_freq_mhz;
/*
* Some instances of embedded cpdma controllers have extra control and
@@ -91,8 +92,12 @@ u32 cpdma_ctrl_rxchs_state(struct cpdma_ctlr *ctlr);
u32 cpdma_ctrl_txchs_state(struct cpdma_ctlr *ctlr);
bool cpdma_check_free_tx_desc(struct cpdma_chan *chan);
int cpdma_chan_set_weight(struct cpdma_chan *ch, int weight);
+int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate);
+u32 cpdma_chan_get_rate(struct cpdma_chan *ch);
+u32 cpdma_chan_get_min_rate(struct cpdma_ctlr *ctlr);
enum cpdma_control {
+ CPDMA_TX_RLIM, /* read-write */
CPDMA_CMD_IDLE, /* write-only */
CPDMA_COPY_ERROR_FRAMES, /* read-write */
CPDMA_RX_OFF_LEN_UPDATE, /* read-write */
--
2.7.4
^ permalink raw reply related
* [PATCH 1/5] net: ethernet: ti: davinci_cpdma: add weight function for channels
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
In-Reply-To: <1480431651-6081-1-git-send-email-ivan.khoronzhuk@linaro.org>
The weight of a channel is needed to split descriptors between
channels. The weight can depend on maximum rate of channels, maximum
rate of an interface or other reasons. The channel weight is in
percentage and is independent for rx and tx channels.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
drivers/net/ethernet/ti/davinci_cpdma.c | 124 +++++++++++++++++++++++++++++---
drivers/net/ethernet/ti/davinci_cpdma.h | 1 +
2 files changed, 115 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 56708a7..87456a9 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -122,6 +122,7 @@ struct cpdma_chan {
struct cpdma_chan_stats stats;
/* offsets into dmaregs */
int int_set, int_clear, td;
+ int weight;
};
struct cpdma_control_info {
@@ -474,29 +475,131 @@ u32 cpdma_ctrl_txchs_state(struct cpdma_ctlr *ctlr)
}
EXPORT_SYMBOL_GPL(cpdma_ctrl_txchs_state);
+static void cpdma_chan_set_descs(struct cpdma_ctlr *ctlr,
+ int rx, int desc_num,
+ int per_ch_desc)
+{
+ struct cpdma_chan *chan, *most_chan = NULL;
+ int desc_cnt = desc_num;
+ int most_dnum = 0;
+ int min, max, i;
+
+ if (!desc_num)
+ return;
+
+ if (rx) {
+ min = rx_chan_num(0);
+ max = rx_chan_num(CPDMA_MAX_CHANNELS);
+ } else {
+ min = tx_chan_num(0);
+ max = tx_chan_num(CPDMA_MAX_CHANNELS);
+ }
+
+ for (i = min; i < max; i++) {
+ chan = ctlr->channels[i];
+ if (!chan)
+ continue;
+
+ if (chan->weight)
+ chan->desc_num = (chan->weight * desc_num) / 100;
+ else
+ chan->desc_num = per_ch_desc;
+
+ desc_cnt -= chan->desc_num;
+
+ if (most_dnum < chan->desc_num) {
+ most_dnum = chan->desc_num;
+ most_chan = chan;
+ }
+ }
+ /* use remains */
+ most_chan->desc_num += desc_cnt;
+}
+
/**
* cpdma_chan_split_pool - Splits ctrl pool between all channels.
* Has to be called under ctlr lock
*/
-static void cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
+static int cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
{
+ int tx_per_ch_desc = 0, rx_per_ch_desc = 0;
struct cpdma_desc_pool *pool = ctlr->pool;
+ int free_rx_num = 0, free_tx_num = 0;
+ int rx_weight = 0, tx_weight = 0;
+ int tx_desc_num, rx_desc_num;
struct cpdma_chan *chan;
- int ch_desc_num;
- int i;
+ int i, tx_num = 0;
if (!ctlr->chan_num)
- return;
-
- /* calculate average size of pool slice */
- ch_desc_num = pool->num_desc / ctlr->chan_num;
+ return 0;
- /* split ctlr pool */
for (i = 0; i < ARRAY_SIZE(ctlr->channels); i++) {
chan = ctlr->channels[i];
- if (chan)
- chan->desc_num = ch_desc_num;
+ if (!chan)
+ continue;
+
+ if (is_rx_chan(chan)) {
+ if (!chan->weight)
+ free_rx_num++;
+ rx_weight += chan->weight;
+ } else {
+ if (!chan->weight)
+ free_tx_num++;
+ tx_weight += chan->weight;
+ tx_num++;
+ }
+ }
+
+ if (rx_weight > 100 || tx_weight > 100)
+ return -EINVAL;
+
+ tx_desc_num = (tx_num * pool->num_desc) / ctlr->chan_num;
+ rx_desc_num = pool->num_desc - tx_desc_num;
+
+ if (free_tx_num) {
+ tx_per_ch_desc = tx_desc_num - (tx_weight * tx_desc_num) / 100;
+ tx_per_ch_desc /= free_tx_num;
+ }
+ if (free_rx_num) {
+ rx_per_ch_desc = rx_desc_num - (rx_weight * rx_desc_num) / 100;
+ rx_per_ch_desc /= free_rx_num;
}
+
+ cpdma_chan_set_descs(ctlr, 0, tx_desc_num, tx_per_ch_desc);
+ cpdma_chan_set_descs(ctlr, 1, rx_desc_num, rx_per_ch_desc);
+
+ return 0;
+}
+
+/* cpdma_chan_set_weight - set weight of a channel in percentage.
+ * Tx and Rx channels have separate weights. That is 100% for RX
+ * and 100% for Tx. The weight is used to split cpdma resources
+ * in correct proportion required by the channels, including number
+ * of descriptors. The channel rate is not enough to know the
+ * weight of a channel as the maximum rate of an interface is needed.
+ * If weight = 0, then channel uses rest of descriptors leaved by
+ * weighted channels.
+ */
+int cpdma_chan_set_weight(struct cpdma_chan *ch, int weight)
+{
+ struct cpdma_ctlr *ctlr = ch->ctlr;
+ unsigned long flags, ch_flags;
+ int ret;
+
+ spin_lock_irqsave(&ctlr->lock, flags);
+ spin_lock_irqsave(&ch->lock, ch_flags);
+ if (ch->weight == weight) {
+ spin_unlock_irqrestore(&ch->lock, ch_flags);
+ spin_unlock_irqrestore(&ctlr->lock, flags);
+ return 0;
+ }
+ ch->weight = weight;
+ spin_unlock_irqrestore(&ch->lock, ch_flags);
+
+ /* re-split pool using new channel weight */
+ ret = cpdma_chan_split_pool(ctlr);
+ spin_unlock_irqrestore(&ctlr->lock, flags);
+ return ret;
}
struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
@@ -527,6 +630,7 @@ struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
chan->chan_num = chan_num;
chan->handler = handler;
chan->desc_num = ctlr->pool->num_desc / 2;
+ chan->weight = 0;
if (is_rx_chan(chan)) {
chan->hdp = ctlr->params.rxhdp + offset;
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index a07b22b..629020c 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -90,6 +90,7 @@ int cpdma_chan_int_ctrl(struct cpdma_chan *chan, bool enable);
u32 cpdma_ctrl_rxchs_state(struct cpdma_ctlr *ctlr);
u32 cpdma_ctrl_txchs_state(struct cpdma_ctlr *ctlr);
bool cpdma_check_free_tx_desc(struct cpdma_chan *chan);
+int cpdma_chan_set_weight(struct cpdma_chan *ch, int weight);
enum cpdma_control {
CPDMA_CMD_IDLE, /* write-only */
--
2.7.4
^ permalink raw reply related
* [PATCH 0/5] cpsw: add per channel shaper configuration
From: Ivan Khoronzhuk @ 2016-11-29 15:00 UTC (permalink / raw)
To: netdev, linux-kernel, mugunthanvnm, grygorii.strashko
Cc: linux-omap, Ivan Khoronzhuk
This series is intended to allow user to set rate for per channel
shapers at cpdma level. This patchset doesn't have impact on performance.
The rate can be set with:
echo 100 > /sys/class/net/ethX/queues/tx-0/tx_maxrate
Tested on am572xx
Based on net-next/master
Ivan Khoronzhuk (5):
net: ethernet: ti: davinci_cpdma: add weight function for channels
net: ethernet: ti: davinci_cpdma: add set rate for a channel
net: ethernet: ti: cpsw: add .ndo to set per-queue rate
net: ethernet: ti: cpsw: optimize end of poll cycle
net: ethernet: ti: cpsw: split tx budget according between channels
drivers/net/ethernet/ti/cpsw.c | 264 +++++++++++++++----
drivers/net/ethernet/ti/davinci_cpdma.c | 453 ++++++++++++++++++++++++++++----
drivers/net/ethernet/ti/davinci_cpdma.h | 6 +
3 files changed, 624 insertions(+), 99 deletions(-)
--
2.7.4
^ permalink raw reply
* pull-request: wireless-drivers 2016-11-29
From: Kalle Valo @ 2016-11-29 14:59 UTC (permalink / raw)
To: David Miller
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hi Dave,
if there's still time here's one more patch to 3.9. I think this is good
to have in 3.9 as it fixes an issue where we were printing uninitialised
memory in mwifiex. I had this in wireless-drivers already for some time
as I was waiting for other fixes and nothing serious actually came up.
If this doesn't make it to 3.9 that's not a problem, I'll just merge
this to wireless-drivers-next. Let me know what you prefer.
Kalle
The following changes since commit d3532ea6ce4ea501e421d130555e59edc2945f99:
brcmfmac: avoid maybe-uninitialized warning in brcmf_cfg80211_start_ap (2016-10-27 18:04:54 +0300)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2016-11-29
for you to fetch changes up to fcd2042e8d36cf644bd2d69c26378d17158b17df:
mwifiex: printk() overflow with 32-byte SSIDs (2016-11-17 13:16:52 +0200)
----------------------------------------------------------------
wireless-drivers fixes for 4.9
mwifiex
* properly terminate SSIDs so that uninitalised memory is not printed
----------------------------------------------------------------
Brian Norris (1):
mwifiex: printk() overflow with 32-byte SSIDs
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
^ permalink raw reply
* Re: net: GPF in eth_header
From: Eric Dumazet @ 2016-11-29 14:58 UTC (permalink / raw)
To: Andrey Konovalov
Cc: syzkaller, Dmitry Vyukov, David Miller, Tom Herbert,
Alexander Duyck, Hannes Frederic Sowa, Jiri Benc, Sabrina Dubroca,
netdev, LKML
In-Reply-To: <CAAeHK+xr+GdLbLch7XatW5rifOz+=cjRCGf1ro8WxEZWziyU4Q@mail.gmail.com>
On Tue, 2016-11-29 at 11:26 +0100, Andrey Konovalov wrote:
> On Sat, Nov 26, 2016 at 9:05 PM, Eric Dumazet <erdlkml@gmail.com> wrote:
> >> I actually see multiple places where skb_network_offset() is used as
> >> an argument to skb_pull().
> >> So I guess every place can potentially be buggy.
> >
> > Well, I think the intent is to accept a negative number.
>
> I'm not sure that was the intent since it results in a signedness
> issue which leads to an out-of-bounds.
>
Hey, I already mentioned where was the bug.
You missed the investigation where I pointed it to FLorian ?
> A quick grep shows that the same issue can potentially happen in
> multiple places across the kernel:
>
> net/ipv6/ip6_output.c:1655: __skb_pull(skb, skb_network_offset(skb));
> net/packet/af_packet.c:2043: skb_pull(skb, skb_network_offset(skb));
> net/packet/af_packet.c:2165: skb_pull(skb, skb_network_offset(skb));
> net/core/neighbour.c:1301: __skb_pull(skb, skb_network_offset(skb));
> net/core/neighbour.c:1331: __skb_pull(skb, skb_network_offset(skb));
> net/core/dev.c:3157: __skb_pull(skb, skb_network_offset(skb));
> net/sched/sch_teql.c:337: __skb_pull(skb, skb_network_offset(skb));
> net/sched/sch_atm.c:479: skb_pull(skb, skb_network_offset(skb));
> net/ipv4/ip_output.c:1385: __skb_pull(skb, skb_network_offset(skb));
> net/ipv4/ip_fragment.c:391: if (!pskb_pull(skb, skb_network_offset(skb) + ihl))
> drivers/net/vxlan.c:1440: __skb_pull(reply, skb_network_offset(reply));
> drivers/net/vxlan.c:1902: __skb_pull(skb, skb_network_offset(skb));
> drivers/net/vrf.c:220: __skb_pull(skb, skb_network_offset(skb));
> drivers/net/vrf.c:314: __skb_pull(skb, skb_network_offset(skb));
>
> A similar thing also happened to somebody else (on a receive path!):
> https://forums.grsecurity.net/viewtopic.php?f=3&t=4550
>
> Does it make sense to check skb_network_offset() before passing it to
> skb_pull() everywhere?
Well, sure, we could add safety checks everywhere and slow the kernel
when debugging is requested.
But skb_network_offset() is not the problem here. Why are you focusing
on it ?
The real problem is in __skb_pull() or __skb_push() and all similar
helpers. Lots of added checks and slowdowns.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c535fbccf2c7dbfae04cee393460e86d588c26b..d6116e37d054fc1536114347ed3c41fc7dc7a882 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1923,6 +1923,7 @@ static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
+ BUG_ON((int)len < 0);
skb->data -= len;
skb->len += len;
return skb->data;
@@ -1931,6 +1932,7 @@ static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
+ BUG_ON((int)len < 0);
skb->len -= len;
BUG_ON(skb->len < skb->data_len);
return skb->data += len;
@@ -1938,6 +1940,7 @@ static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
static inline unsigned char *skb_pull_inline(struct sk_buff *skb, unsigned int len)
{
+ BUG_ON((int)len < 0);
return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len);
}
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d1d1a5a5ad24ded523fc12ffba8c602b03bd0830..7bf098c848fd857ba5d287fc91d43f62f381bd55 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1450,6 +1450,7 @@ EXPORT_SYMBOL(skb_put);
*/
unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
{
+ BUG_ON((int)len < 0);
skb->data -= len;
skb->len += len;
if (unlikely(skb->data<skb->head))
^ permalink raw reply related
* Re: [PATCH v3 net-next 0/4] bpf: BPF for lightweight tunnel encapsulation
From: Thomas Graf @ 2016-11-29 14:58 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: davem, netdev, alexei.starovoitov, daniel, tom, roopa
In-Reply-To: <50764658-4bf7-966b-bc61-d6840d7c03f2@stressinduktion.org>
Hi Hannes,
On 11/29/16 at 03:15pm, Hannes Frederic Sowa wrote:
> Did you look at the cgroup based hooks which were added recently in
> ip_finish_output for cgroup ebpf support and in general the cgroup bpf
> subsystem. Does some of this solve the problem for you already? Would be
> interesting to hear your opinion on that.
What I'm looking for is the ability to collect statistics and generate
samples for a subset of the traffic, e.g. all intra data center
traffic, all packets hitting the default route in a network namespace,
all packets which use a dst tying a certain endpoint to particular TCP
metrics. For the examples above, LWT provides a very intuitive and
natural way to do so while amortizing the cost of the route lookup
which is required anyway.
The cgroup hook provides similar semantics but if the application
context is of interest. Obviously, tasks in a cgroup may be sharing
routes so I can't use it as a replacement. However, using the two in
combination will become highly useful as it allows to gather statistics
individually for both application context and routing context and then
aggregate them to see how applications are using different network
segments.
Aside from the different context matching, the cgroup hook will not
allow to modify the packet as the lwtunnel_xmit() post
ip_finish_output does.
^ permalink raw reply
* [PATCH v4 net-next 7/7] ARM64: dts: marvell: Add network support for Armada 3700
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
Add neta nodes for network support both in device tree for the SoC and
the board.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
arch/arm64/boot/dts/marvell/armada-3720-db.dts | 23 +++++++++++++++++++-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 23 +++++++++++++++++++-
2 files changed, 46 insertions(+), 0 deletions(-)
diff --git a/arch/arm64/boot/dts/marvell/armada-3720-db.dts b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
index 1372e9a6aaa4..c8b82e4145de 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
@@ -81,3 +81,26 @@
&pcie0 {
status = "okay";
};
+
+&mdio {
+ status = "okay";
+ phy0: ethernet-phy@0 {
+ reg = <0>;
+ };
+
+ phy1: ethernet-phy@1 {
+ reg = <1>;
+ };
+};
+
+ð0 {
+ phy-mode = "rgmii-id";
+ phy = <&phy0>;
+ status = "okay";
+};
+
+ð1 {
+ phy-mode = "rgmii-id";
+ phy = <&phy1>;
+ status = "okay";
+};
diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index e9bd58793464..3b8eb45bdc76 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -140,6 +140,29 @@
};
};
+ eth0: ethernet@30000 {
+ compatible = "marvell,armada-3700-neta";
+ reg = <0x30000 0x4000>;
+ interrupts = <GIC_SPI 42 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&sb_periph_clk 8>;
+ status = "disabled";
+ };
+
+ mdio: mdio@32004 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "marvell,orion-mdio";
+ reg = <0x32004 0x4>;
+ };
+
+ eth1: ethernet@40000 {
+ compatible = "marvell,armada-3700-neta";
+ reg = <0x40000 0x4000>;
+ interrupts = <GIC_SPI 45 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&sb_periph_clk 7>;
+ status = "disabled";
+ };
+
usb3: usb@58000 {
compatible = "marvell,armada3700-xhci",
"generic-xhci";
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 6/7] net: mvneta: Add network support for Armada 3700 SoC
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
From: Marcin Wojtas <mw@semihalf.com>
Armada 3700 is a new ARMv8 SoC from Marvell using same network controller
as older Armada 370/38x/XP. There are however some differences that
needed taking into account when adding support for it:
* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
configuration for network controller has to be done on two levels:
global and per-port. The first one is inherited from the
bootloader. The latter can be opened in a default way, leaving
arbitration to the bus controller. Hence filled mbus_dram_target_info
structure is not needed
* make per-CPU operation optional - Recent patches adding RSS and XPS
support for Armada 38x/XP enabled per-CPU operation of the controller
by default. Contrary to older SoC's Armada 3700 SoC's network
controller is not capable of per-CPU processing due to interrupt lines'
connectivity. This patch restores non-per-CPU operation, which is now
optional and depends on neta_armada3700 flag value in mvneta_port
structure. In order not to complicate the code, separate interrupt
subroutine is implemented.
For now, on the Armada 3700, RSS is disabled as the current
implementation depend on the per cpu interrupts.
[gregory.clement@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 7 +-
drivers/net/ethernet/marvell/Kconfig | 7 +-
drivers/net/ethernet/marvell/mvneta.c | 287 +++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
3 files changed, 214 insertions(+), 87 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
index 73be8970815e..7aa840c8768d 100644
--- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
+++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt
@@ -1,7 +1,10 @@
-* Marvell Armada 370 / Armada XP Ethernet Controller (NETA)
+* Marvell Armada 370 / Armada XP / Armada 3700 Ethernet Controller (NETA)
Required properties:
-- compatible: "marvell,armada-370-neta" or "marvell,armada-xp-neta".
+- compatible: could be one of the followings
+ "marvell,armada-370-neta"
+ "marvell,armada-xp-neta"
+ "marvell,armada-3700-neta"
- reg: address and length of the register set for the device.
- interrupts: interrupt for the device
- phy: See ethernet.txt file in the same directory.
diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 2ccea9dd9248..3b8f11fe5e13 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -56,14 +56,15 @@ config MVNETA_BM_ENABLE
buffer management.
config MVNETA
- tristate "Marvell Armada 370/38x/XP network interface support"
- depends on PLAT_ORION || COMPILE_TEST
+ tristate "Marvell Armada 370/38x/XP/37xx network interface support"
+ depends on ARCH_MVEBU || COMPILE_TEST
depends on HAS_DMA
select MVMDIO
select FIXED_PHY
---help---
This driver supports the network interface units in the
- Marvell ARMADA XP, ARMADA 370 and ARMADA 38x SoC family.
+ Marvell ARMADA XP, ARMADA 370, ARMADA 38x and
+ ARMADA 37xx SoC family.
Note that this driver is distinct from the mv643xx_eth
driver, which should be used for the older Marvell SoCs
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 6421402ef394..c12859201f5d 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -397,6 +397,9 @@ struct mvneta_port {
spinlock_t lock;
bool is_stopped;
+ u32 cause_rx_tx;
+ struct napi_struct napi;
+
/* Core clock */
struct clk *clk;
/* AXI clock */
@@ -422,6 +425,9 @@ struct mvneta_port {
u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
u32 indir[MVNETA_RSS_LU_TABLE_SIZE];
+
+ /* Flags for special SoC configurations */
+ bool neta_armada3700;
u16 rx_offset_correction;
};
@@ -965,14 +971,9 @@ static int mvneta_mbus_io_win_set(struct mvneta_port *pp, u32 base, u32 wsize,
return 0;
}
-/* Assign and initialize pools for port. In case of fail
- * buffer manager will remain disabled for current port.
- */
-static int mvneta_bm_port_init(struct platform_device *pdev,
- struct mvneta_port *pp)
+static int mvneta_bm_port_mbus_init(struct mvneta_port *pp)
{
- struct device_node *dn = pdev->dev.of_node;
- u32 long_pool_id, short_pool_id, wsize;
+ u32 wsize;
u8 target, attr;
int err;
@@ -991,6 +992,25 @@ static int mvneta_bm_port_init(struct platform_device *pdev,
netdev_info(pp->dev, "fail to configure mbus window to BM\n");
return err;
}
+ return 0;
+}
+
+/* Assign and initialize pools for port. In case of fail
+ * buffer manager will remain disabled for current port.
+ */
+static int mvneta_bm_port_init(struct platform_device *pdev,
+ struct mvneta_port *pp)
+{
+ struct device_node *dn = pdev->dev.of_node;
+ u32 long_pool_id, short_pool_id;
+
+ if (!pp->neta_armada3700) {
+ int ret;
+
+ ret = mvneta_bm_port_mbus_init(pp);
+ if (ret)
+ return ret;
+ }
if (of_property_read_u32(dn, "bm,pool-long", &long_pool_id)) {
netdev_info(pp->dev, "missing long pool id\n");
@@ -1359,22 +1379,27 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
for_each_present_cpu(cpu) {
int rxq_map = 0, txq_map = 0;
int rxq, txq;
+ if (!pp->neta_armada3700) {
+ for (rxq = 0; rxq < rxq_number; rxq++)
+ if ((rxq % max_cpu) == cpu)
+ rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
+
+ for (txq = 0; txq < txq_number; txq++)
+ if ((txq % max_cpu) == cpu)
+ txq_map |= MVNETA_CPU_TXQ_ACCESS(txq);
+
+ /* With only one TX queue we configure a special case
+ * which will allow to get all the irq on a single
+ * CPU
+ */
+ if (txq_number == 1)
+ txq_map = (cpu == pp->rxq_def) ?
+ MVNETA_CPU_TXQ_ACCESS(1) : 0;
- for (rxq = 0; rxq < rxq_number; rxq++)
- if ((rxq % max_cpu) == cpu)
- rxq_map |= MVNETA_CPU_RXQ_ACCESS(rxq);
-
- for (txq = 0; txq < txq_number; txq++)
- if ((txq % max_cpu) == cpu)
- txq_map |= MVNETA_CPU_TXQ_ACCESS(txq);
-
- /* With only one TX queue we configure a special case
- * which will allow to get all the irq on a single
- * CPU
- */
- if (txq_number == 1)
- txq_map = (cpu == pp->rxq_def) ?
- MVNETA_CPU_TXQ_ACCESS(1) : 0;
+ } else {
+ txq_map = MVNETA_CPU_TXQ_ACCESS_ALL_MASK;
+ rxq_map = MVNETA_CPU_RXQ_ACCESS_ALL_MASK;
+ }
mvreg_write(pp, MVNETA_CPU_MAP(cpu), rxq_map | txq_map);
}
@@ -2627,6 +2652,17 @@ static void mvneta_set_rx_mode(struct net_device *dev)
/* Interrupt handling - the callback for request_irq() */
static irqreturn_t mvneta_isr(int irq, void *dev_id)
{
+ struct mvneta_port *pp = (struct mvneta_port *)dev_id;
+
+ mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
+ napi_schedule(&pp->napi);
+
+ return IRQ_HANDLED;
+}
+
+/* Interrupt handling - the callback for request_percpu_irq() */
+static irqreturn_t mvneta_percpu_isr(int irq, void *dev_id)
+{
struct mvneta_pcpu_port *port = (struct mvneta_pcpu_port *)dev_id;
disable_percpu_irq(port->pp->dev->irq);
@@ -2674,7 +2710,7 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
struct mvneta_pcpu_port *port = this_cpu_ptr(pp->ports);
if (!netif_running(pp->dev)) {
- napi_complete(&port->napi);
+ napi_complete(napi);
return rx_done;
}
@@ -2703,7 +2739,8 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
*/
rx_queue = fls(((cause_rx_tx >> 8) & 0xff));
- cause_rx_tx |= port->cause_rx_tx;
+ cause_rx_tx |= pp->neta_armada3700 ? pp->cause_rx_tx :
+ port->cause_rx_tx;
if (rx_queue) {
rx_queue = rx_queue - 1;
@@ -2717,11 +2754,27 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
if (budget > 0) {
cause_rx_tx = 0;
- napi_complete(&port->napi);
- enable_percpu_irq(pp->dev->irq, 0);
+ napi_complete(napi);
+
+ if (pp->neta_armada3700) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+ mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+ MVNETA_RX_INTR_MASK(rxq_number) |
+ MVNETA_TX_INTR_MASK(txq_number) |
+ MVNETA_MISCINTR_INTR_MASK);
+ local_irq_restore(flags);
+ } else {
+ enable_percpu_irq(pp->dev->irq, 0);
+ }
}
- port->cause_rx_tx = cause_rx_tx;
+ if (pp->neta_armada3700)
+ pp->cause_rx_tx = cause_rx_tx;
+ else
+ port->cause_rx_tx = cause_rx_tx;
+
return rx_done;
}
@@ -2991,11 +3044,16 @@ static void mvneta_start_dev(struct mvneta_port *pp)
/* start the Rx/Tx activity */
mvneta_port_enable(pp);
- /* Enable polling on the port */
- for_each_online_cpu(cpu) {
- struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
+ if (!pp->neta_armada3700) {
+ /* Enable polling on the port */
+ for_each_online_cpu(cpu) {
+ struct mvneta_pcpu_port *port =
+ per_cpu_ptr(pp->ports, cpu);
- napi_enable(&port->napi);
+ napi_enable(&port->napi);
+ }
+ } else {
+ napi_enable(&pp->napi);
}
/* Unmask interrupts. It has to be done from each CPU */
@@ -3017,10 +3075,15 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
phy_stop(ndev->phydev);
- for_each_online_cpu(cpu) {
- struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
+ if (!pp->neta_armada3700) {
+ for_each_online_cpu(cpu) {
+ struct mvneta_pcpu_port *port =
+ per_cpu_ptr(pp->ports, cpu);
- napi_disable(&port->napi);
+ napi_disable(&port->napi);
+ }
+ } else {
+ napi_disable(&pp->napi);
}
netif_carrier_off(pp->dev);
@@ -3430,31 +3493,37 @@ static int mvneta_open(struct net_device *dev)
goto err_cleanup_rxqs;
/* Connect to port interrupt line */
- ret = request_percpu_irq(pp->dev->irq, mvneta_isr,
- MVNETA_DRIVER_NAME, pp->ports);
+ if (pp->neta_armada3700)
+ ret = request_irq(pp->dev->irq, mvneta_isr, 0,
+ dev->name, pp);
+ else
+ ret = request_percpu_irq(pp->dev->irq, mvneta_percpu_isr,
+ dev->name, pp->ports);
if (ret) {
netdev_err(pp->dev, "cannot request irq %d\n", pp->dev->irq);
goto err_cleanup_txqs;
}
- /* Enable per-CPU interrupt on all the CPU to handle our RX
- * queue interrupts
- */
- on_each_cpu(mvneta_percpu_enable, pp, true);
+ if (!pp->neta_armada3700) {
+ /* Enable per-CPU interrupt on all the CPU to handle our RX
+ * queue interrupts
+ */
+ on_each_cpu(mvneta_percpu_enable, pp, true);
- pp->is_stopped = false;
- /* Register a CPU notifier to handle the case where our CPU
- * might be taken offline.
- */
- ret = cpuhp_state_add_instance_nocalls(online_hpstate,
- &pp->node_online);
- if (ret)
- goto err_free_irq;
+ pp->is_stopped = false;
+ /* Register a CPU notifier to handle the case where our CPU
+ * might be taken offline.
+ */
+ ret = cpuhp_state_add_instance_nocalls(online_hpstate,
+ &pp->node_online);
+ if (ret)
+ goto err_free_irq;
- ret = cpuhp_state_add_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
- &pp->node_dead);
- if (ret)
- goto err_free_online_hp;
+ ret = cpuhp_state_add_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
+ &pp->node_dead);
+ if (ret)
+ goto err_free_online_hp;
+ }
/* In default link is down */
netif_carrier_off(pp->dev);
@@ -3470,13 +3539,20 @@ static int mvneta_open(struct net_device *dev)
return 0;
err_free_dead_hp:
- cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
- &pp->node_dead);
+ if (!pp->neta_armada3700)
+ cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
+ &pp->node_dead);
err_free_online_hp:
- cpuhp_state_remove_instance_nocalls(online_hpstate, &pp->node_online);
+ if (!pp->neta_armada3700)
+ cpuhp_state_remove_instance_nocalls(online_hpstate,
+ &pp->node_online);
err_free_irq:
- on_each_cpu(mvneta_percpu_disable, pp, true);
- free_percpu_irq(pp->dev->irq, pp->ports);
+ if (pp->neta_armada3700) {
+ free_irq(pp->dev->irq, pp);
+ } else {
+ on_each_cpu(mvneta_percpu_disable, pp, true);
+ free_percpu_irq(pp->dev->irq, pp->ports);
+ }
err_cleanup_txqs:
mvneta_cleanup_txqs(pp);
err_cleanup_rxqs:
@@ -3489,23 +3565,30 @@ static int mvneta_stop(struct net_device *dev)
{
struct mvneta_port *pp = netdev_priv(dev);
- /* Inform that we are stopping so we don't want to setup the
- * driver for new CPUs in the notifiers. The code of the
- * notifier for CPU online is protected by the same spinlock,
- * so when we get the lock, the notifer work is done.
- */
- spin_lock(&pp->lock);
- pp->is_stopped = true;
- spin_unlock(&pp->lock);
+ if (!pp->neta_armada3700) {
+ /* Inform that we are stopping so we don't want to setup the
+ * driver for new CPUs in the notifiers. The code of the
+ * notifier for CPU online is protected by the same spinlock,
+ * so when we get the lock, the notifer work is done.
+ */
+ spin_lock(&pp->lock);
+ pp->is_stopped = true;
+ spin_unlock(&pp->lock);
- mvneta_stop_dev(pp);
- mvneta_mdio_remove(pp);
+ mvneta_stop_dev(pp);
+ mvneta_mdio_remove(pp);
cpuhp_state_remove_instance_nocalls(online_hpstate, &pp->node_online);
cpuhp_state_remove_instance_nocalls(CPUHP_NET_MVNETA_DEAD,
&pp->node_dead);
- on_each_cpu(mvneta_percpu_disable, pp, true);
- free_percpu_irq(dev->irq, pp->ports);
+ on_each_cpu(mvneta_percpu_disable, pp, true);
+ free_percpu_irq(dev->irq, pp->ports);
+ } else {
+ mvneta_stop_dev(pp);
+ mvneta_mdio_remove(pp);
+ free_irq(dev->irq, pp);
+ }
+
mvneta_cleanup_rxqs(pp);
mvneta_cleanup_txqs(pp);
@@ -3784,6 +3867,11 @@ static int mvneta_ethtool_set_rxfh(struct net_device *dev, const u32 *indir,
const u8 *key, const u8 hfunc)
{
struct mvneta_port *pp = netdev_priv(dev);
+
+ /* Current code for Armada 3700 doesn't support RSS features yet */
+ if (pp->neta_armada3700)
+ return -EOPNOTSUPP;
+
/* We require at least one supported parameter to be changed
* and no change in any of the unsupported parameters
*/
@@ -3804,6 +3892,10 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
{
struct mvneta_port *pp = netdev_priv(dev);
+ /* Current code for Armada 3700 doesn't support RSS features yet */
+ if (pp->neta_armada3700)
+ return -EOPNOTSUPP;
+
if (hfunc)
*hfunc = ETH_RSS_HASH_TOP;
@@ -3911,16 +4003,29 @@ static void mvneta_conf_mbus_windows(struct mvneta_port *pp,
win_enable = 0x3f;
win_protect = 0;
- for (i = 0; i < dram->num_cs; i++) {
- const struct mbus_dram_window *cs = dram->cs + i;
- mvreg_write(pp, MVNETA_WIN_BASE(i), (cs->base & 0xffff0000) |
- (cs->mbus_attr << 8) | dram->mbus_dram_target_id);
+ if (dram) {
+ for (i = 0; i < dram->num_cs; i++) {
+ const struct mbus_dram_window *cs = dram->cs + i;
+
+ mvreg_write(pp, MVNETA_WIN_BASE(i),
+ (cs->base & 0xffff0000) |
+ (cs->mbus_attr << 8) |
+ dram->mbus_dram_target_id);
- mvreg_write(pp, MVNETA_WIN_SIZE(i),
- (cs->size - 1) & 0xffff0000);
+ mvreg_write(pp, MVNETA_WIN_SIZE(i),
+ (cs->size - 1) & 0xffff0000);
- win_enable &= ~(1 << i);
- win_protect |= 3 << (2 * i);
+ win_enable &= ~(1 << i);
+ win_protect |= 3 << (2 * i);
+ }
+ } else {
+ /* For Armada3700 open default 4GB Mbus window, leaving
+ * arbitration of target/attribute to a different layer
+ * of configuration.
+ */
+ mvreg_write(pp, MVNETA_WIN_SIZE(0), 0xffff0000);
+ win_enable &= ~BIT(0);
+ win_protect = 3;
}
mvreg_write(pp, MVNETA_BASE_ADDR_ENABLE, win_enable);
@@ -4050,6 +4155,10 @@ static int mvneta_probe(struct platform_device *pdev)
pp->indir[0] = rxq_def;
+ /* Get special SoC configurations */
+ if (of_device_is_compatible(dn, "marvell,armada-3700-neta"))
+ pp->neta_armada3700 = true;
+
pp->clk = devm_clk_get(&pdev->dev, "core");
if (IS_ERR(pp->clk))
pp->clk = devm_clk_get(&pdev->dev, NULL);
@@ -4117,7 +4226,11 @@ static int mvneta_probe(struct platform_device *pdev)
pp->tx_csum_limit = tx_csum_limit;
dram_target_info = mv_mbus_dram_info();
- if (dram_target_info)
+ /* Armada3700 requires setting default configuration of Mbus
+ * windows, however without using filled mbus_dram_target_info
+ * structure.
+ */
+ if (dram_target_info || pp->neta_armada3700)
mvneta_conf_mbus_windows(pp, dram_target_info);
pp->tx_ring_size = MVNETA_MAX_TXD;
@@ -4150,11 +4263,20 @@ static int mvneta_probe(struct platform_device *pdev)
goto err_netdev;
}
- for_each_present_cpu(cpu) {
- struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
+ /* Armada3700 network controller does not support per-cpu
+ * operation, so only single NAPI should be initialized.
+ */
+ if (pp->neta_armada3700) {
+ netif_napi_add(dev, &pp->napi, mvneta_poll, NAPI_POLL_WEIGHT);
+ } else {
+ for_each_present_cpu(cpu) {
+ struct mvneta_pcpu_port *port =
+ per_cpu_ptr(pp->ports, cpu);
- netif_napi_add(dev, &port->napi, mvneta_poll, NAPI_POLL_WEIGHT);
- port->pp = pp;
+ netif_napi_add(dev, &port->napi, mvneta_poll,
+ NAPI_POLL_WEIGHT);
+ port->pp = pp;
+ }
}
dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
@@ -4239,6 +4361,7 @@ static int mvneta_remove(struct platform_device *pdev)
static const struct of_device_id mvneta_match[] = {
{ .compatible = "marvell,armada-370-neta" },
{ .compatible = "marvell,armada-xp-neta" },
+ { .compatible = "marvell,armada-3700-neta" },
{ }
};
MODULE_DEVICE_TABLE(of, mvneta_match);
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 5/7] net: mvneta: Only disable mvneta_bm for 64-bits
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 66fd9dbb2ca7..2ccea9dd9248 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -44,6 +44,7 @@ config MVMDIO
config MVNETA_BM_ENABLE
tristate "Marvell Armada 38x/XP network interface BM support"
depends on MVNETA
+ depends on !64BIT
---help---
This driver supports auxiliary block of the network
interface units in the Marvell ARMADA XP and ARMADA 38x SoC
@@ -58,7 +59,6 @@ config MVNETA
tristate "Marvell Armada 370/38x/XP network interface support"
depends on PLAT_ORION || COMPILE_TEST
depends on HAS_DMA
- depends on !64BIT
select MVMDIO
select FIXED_PHY
---help---
@@ -71,6 +71,7 @@ config MVNETA
config MVNETA_BM
tristate
+ depends on !64BIT
default y if MVNETA=y && MVNETA_BM_ENABLE!=n
default MVNETA_BM_ENABLE
select HWBM
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 3/7] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
Until now the virtual address of the received buffer were stored in the
cookie field of the rx descriptor. However, this field is 32-bits only
which prevents to use the driver on a 64-bits architecture.
With this patch the virtual address is stored in an array not shared with
the hardware (no more need to use the DMA API). Thanks to this, it is
possible to use cache contrary to the access of the rx descriptor member.
The change is done in the swbm path only because the hwbm uses the cookie
field, this also means that currently the hwbm is not usable in 64-bits.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 34 +++++++++++++++++++---------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index f5319c50f8d9..b4810b0c0ffc 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -561,6 +561,9 @@ struct mvneta_rx_queue {
u32 pkts_coal;
u32 time_coal;
+ /* Virtual address of the RX buffer */
+ void **buf_virt_addr;
+
/* Virtual address of the RX DMA descriptors array */
struct mvneta_rx_desc *descs;
@@ -1573,10 +1576,14 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
/* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
- u32 phys_addr, u32 cookie)
+ u32 phys_addr, void *virt_addr,
+ struct mvneta_rx_queue *rxq)
{
- rx_desc->buf_cookie = cookie;
+ int i;
+
rx_desc->buf_phys_addr = phys_addr;
+ i = rx_desc - rxq->descs;
+ rxq->buf_virt_addr[i] = virt_addr;
}
/* Decrement sent descriptors counter */
@@ -1781,7 +1788,8 @@ EXPORT_SYMBOL_GPL(mvneta_frag_free);
/* Refill processing for SW buffer management */
static int mvneta_rx_refill(struct mvneta_port *pp,
- struct mvneta_rx_desc *rx_desc)
+ struct mvneta_rx_desc *rx_desc,
+ struct mvneta_rx_queue *rxq)
{
dma_addr_t phys_addr;
@@ -1799,7 +1807,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
return -ENOMEM;
}
- mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
+ mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
return 0;
}
@@ -1861,7 +1869,7 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
for (i = 0; i < rxq->size; i++) {
struct mvneta_rx_desc *rx_desc = rxq->descs + i;
- void *data = (void *)rx_desc->buf_cookie;
+ void *data = rxq->buf_virt_addr[i];
dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
@@ -1894,12 +1902,13 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
unsigned char *data;
dma_addr_t phys_addr;
u32 rx_status, frag_size;
- int rx_bytes, err;
+ int rx_bytes, err, index;
rx_done++;
rx_status = rx_desc->status;
rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
- data = (unsigned char *)rx_desc->buf_cookie;
+ index = rx_desc - rxq->descs;
+ data = (unsigned char *)rxq->buf_virt_addr[index];
phys_addr = rx_desc->buf_phys_addr;
if (!mvneta_rxq_desc_is_first_last(rx_status) ||
@@ -1938,7 +1947,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
}
/* Refill processing */
- err = mvneta_rx_refill(pp, rx_desc);
+ err = mvneta_rx_refill(pp, rx_desc, rxq);
if (err) {
netdev_err(dev, "Linux processing - Can't refill\n");
rxq->missed++;
@@ -2020,7 +2029,7 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
rx_done++;
rx_status = rx_desc->status;
rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
- data = (unsigned char *)rx_desc->buf_cookie;
+ data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
phys_addr = rx_desc->buf_phys_addr;
pool_id = MVNETA_RX_GET_BM_POOL_ID(rx_desc);
bm_pool = &pp->bm_priv->bm_pools[pool_id];
@@ -2716,7 +2725,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
for (i = 0; i < num; i++) {
memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
- if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
+ if (mvneta_rx_refill(pp, rxq->descs + i, rxq) != 0) {
netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs filled\n",
__func__, rxq->id, i, num);
break;
@@ -3865,6 +3874,11 @@ static int mvneta_init(struct device *dev, struct mvneta_port *pp)
rxq->size = pp->rx_ring_size;
rxq->pkts_coal = MVNETA_RX_COAL_PKTS;
rxq->time_coal = MVNETA_RX_COAL_USEC;
+ rxq->buf_virt_addr = devm_kmalloc(pp->dev->dev.parent,
+ rxq->size * sizeof(void *),
+ GFP_KERNEL);
+ if (!rxq->buf_virt_addr)
+ return -ENOMEM;
}
return 0;
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 2/7] net: mvneta: Do not allocate buffer in rxq init with HWBM
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime
they are put into descriptors by hardware. There is no need to fill them
at this point.
Suggested-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 1b84f746d748..f5319c50f8d9 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2784,14 +2784,14 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
mvneta_rxq_buf_size_set(pp, rxq,
MVNETA_RX_BUF_SIZE(pp->pkt_size));
mvneta_rxq_bm_disable(pp, rxq);
+ mvneta_rxq_fill(pp, rxq, rxq->size);
} else {
mvneta_rxq_bm_enable(pp, rxq);
mvneta_rxq_long_pool_set(pp, rxq);
mvneta_rxq_short_pool_set(pp, rxq);
+ mvneta_rxq_non_occup_desc_add(pp, rxq, rxq->size);
}
- mvneta_rxq_fill(pp, rxq, rxq->size);
-
return 0;
}
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 0/7] Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
Hi,
The Armada 37xx is a new ARMv8 SoC from Marvell using same network
controller as the older Armada 370/38x/XP SoCs. This series adapts the
driver in order to be able to use it on this new SoC. The main changes
are:
- 64-bits support: the first patches allow using the driver on a 64-bit
architecture.
- MBUS support: the mbus configuration is different on Armada 37xx
from the older SoCs.
- per cpu interrupt: Armada 37xx do not support per cpu interrupt for
the NETA IP, the non-per-CPU behavior was added back.
The first patch is an optimization in the rx path in swbm mode.
The second patch remove unnecessary allocation for HWBM.
The first item is solved by patches 4 and 5.
The 2 last items are solved by patch 6.
In patch 7 the dt support is added.
Beside Armada 37xx, this series have been again tested on Armada XP
and Armada 38x (with Hardware Buffer Management and with Software
Buffer Management).
This is the 4th version of the series:
- 1st version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html
- 2nd version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470476.html
-3rd version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470901.html
Changelog:
v3 -> v4:
- Adding new patch: "net: mvneta: do not allocate buffer in rxq init
with HWBM"
- Simplify the HWBM case in patch 3 as suggested by Marcin
v2 -> v3:
- Adding patch 1 "Optimize rx path for small frame"
- Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
line from patch 2 to patch 3 where rx_offset_correction is introduced.
- Move the memory allocation of the buf_virt_addr of the rxq to be
called by the probe function in order to avoid a memory leak.
Thanks,
Gregory
Gregory CLEMENT (5):
net: mvneta: Optimize rx path for small frame
net: mvneta: Do not allocate buffer in rxq init with HWBM
net: mvneta: Use cacheable memory to store the rx buffer virtual address
net: mvneta: Only disable mvneta_bm for 64-bits
ARM64: dts: marvell: Add network support for Armada 3700
Marcin Wojtas (2):
net: mvneta: Convert to be 64 bits compatible
net: mvneta: Add network support for Armada 3700 SoC
Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt | 7 +-
arch/arm64/boot/dts/marvell/armada-3720-db.dts | 23 +++++-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 23 +++++-
drivers/net/ethernet/marvell/Kconfig | 10 +-
drivers/net/ethernet/marvell/mvneta.c | 344 +++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
5 files changed, 305 insertions(+), 102 deletions(-)
base-commit: 436accebb53021ef7c63535f60bda410aa87c136
--
git-series 0.8.10
^ permalink raw reply
* [PATCH v4 net-next 1/7] net: mvneta: Optimize rx path for small frame
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
For small frame reuse the phys_addr variable instead of accessing the
uncacheable value in the rx descriptor.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 87274d4ab102..1b84f746d748 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1918,7 +1918,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
goto err_drop_frame;
dma_sync_single_range_for_cpu(dev->dev.parent,
- rx_desc->buf_phys_addr,
+ phys_addr,
MVNETA_MH_SIZE + NET_SKB_PAD,
rx_bytes,
DMA_FROM_DEVICE);
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v4 net-next 4/7] net: mvneta: Convert to be 64 bits compatible
From: Gregory CLEMENT @ 2016-11-29 14:55 UTC (permalink / raw)
To: David S. Miller, linux-kernel, netdev
Cc: Jisheng Zhang, Arnd Bergmann, Jason Cooper, Andrew Lunn,
Sebastian Hesselbarth, Gregory CLEMENT, Thomas Petazzoni,
linux-arm-kernel, Nadav Haklai, Marcin Wojtas, Dmitri Epshtein,
Yelena Krivosheev
In-Reply-To: <cover.654978388d6811b943dd58ecb9a9d9b6cceeaee3.1480431285.git-series.gregory.clement@free-electrons.com>
From: Marcin Wojtas <mw@semihalf.com>
Prepare the mvneta driver in order to be usable on the 64 bits platform
such as the Armada 3700.
[gregory.clement@free-electrons.com]: this patch was extract from a larger
one to ease review and maintenance.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
drivers/net/ethernet/marvell/mvneta.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index b4810b0c0ffc..6421402ef394 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -296,6 +296,12 @@
/* descriptor aligned size */
#define MVNETA_DESC_ALIGNED_SIZE 32
+/* Number of bytes to be taken into account by HW when putting incoming data
+ * to the buffers. It is needed in case NET_SKB_PAD exceeds maximum packet
+ * offset supported in MVNETA_RXQ_CONFIG_REG(q) registers.
+ */
+#define MVNETA_RX_PKT_OFFSET_CORRECTION 64
+
#define MVNETA_RX_PKT_SIZE(mtu) \
ALIGN((mtu) + MVNETA_MH_SIZE + MVNETA_VLAN_TAG_LEN + \
ETH_HLEN + ETH_FCS_LEN, \
@@ -416,6 +422,7 @@ struct mvneta_port {
u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
u32 indir[MVNETA_RSS_LU_TABLE_SIZE];
+ u16 rx_offset_correction;
};
/* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -1807,6 +1814,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
return -ENOMEM;
}
+ phys_addr += pp->rx_offset_correction;
mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
return 0;
}
@@ -2782,7 +2790,7 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
mvreg_write(pp, MVNETA_RXQ_SIZE_REG(rxq->id), rxq->size);
/* Set Offset */
- mvneta_rxq_offset_set(pp, rxq, NET_SKB_PAD);
+ mvneta_rxq_offset_set(pp, rxq, NET_SKB_PAD - pp->rx_offset_correction);
/* Set coalescing pkts and time */
mvneta_rx_pkts_coal_set(pp, rxq, rxq->pkts_coal);
@@ -4033,6 +4041,13 @@ static int mvneta_probe(struct platform_device *pdev)
pp->rxq_def = rxq_def;
+ /* Set RX packet offset correction for platforms, whose
+ * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
+ * platforms and 0B for 32-bit ones.
+ */
+ pp->rx_offset_correction =
+ max(0, NET_SKB_PAD - MVNETA_RX_PKT_OFFSET_CORRECTION);
+
pp->indir[0] = rxq_def;
pp->clk = devm_clk_get(&pdev->dev, "core");
--
git-series 0.8.10
^ permalink raw reply related
* [PATCH v2 net-next 11/11] qede: Add support for XDP_TX
From: Yuval Mintz @ 2016-11-29 14:47 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz
In-Reply-To: <1480430830-17671-1-git-send-email-Yuval.Mintz@cavium.com>
Add support for forwarding via XDP. Once the eBPF is attached,
driver would allocate & configure a designated transmission queue
meant solely for forwarding packets. Said queue would share the
receive-queue's interrupt line, and would have it's own Tx statistics.
Infrastructure changes required for this [spread-out through the code]:
- Determine the DMA direction of the receive buffers based on the presence
of the eBPF program.
- Turn the sw Tx ring into a union, as regular/XDP queues have different
needs for releasing resources after completion [regular requires the SKB,
XDP requires the transmitted page].
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
---
drivers/net/ethernet/qlogic/qede/qede.h | 16 +-
drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 22 ++-
drivers/net/ethernet/qlogic/qede/qede_main.c | 213 ++++++++++++++++++++----
3 files changed, 216 insertions(+), 35 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 56ab446..fcd9741 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -258,6 +258,7 @@ struct qede_rx_queue {
u16 sw_rx_prod;
u16 num_rx_buffers; /* Slowpath */
+ u8 data_direction;
u8 rxq_id;
u32 rx_buf_size;
@@ -294,6 +295,7 @@ struct sw_tx_bd {
};
struct qede_tx_queue {
+ u8 is_xdp;
bool is_legacy;
u16 sw_tx_cons;
u16 sw_tx_prod;
@@ -310,8 +312,18 @@ struct qede_tx_queue {
void __iomem *doorbell_addr;
union db_prod tx_db;
int index; /* Slowpath only */
+#define QEDE_TXQ_XDP_TO_IDX(edev, txq) ((txq)->index - \
+ QEDE_MAX_TSS_CNT(edev))
+#define QEDE_TXQ_IDX_TO_XDP(edev, idx) ((idx) + QEDE_MAX_TSS_CNT(edev))
+
+ /* Regular Tx requires skb + metadata for release purpose,
+ * while XDP requires only the pages themselves.
+ */
+ union {
+ struct sw_tx_bd *skbs;
+ struct page **pages;
+ } sw_tx_ring;
- struct sw_tx_bd *sw_tx_ring;
struct qed_chain tx_pbl;
/* Slowpath; Should be kept in end [unless missing padding] */
@@ -336,10 +348,12 @@ struct qede_fastpath {
#define QEDE_FASTPATH_COMBINED (QEDE_FASTPATH_TX | QEDE_FASTPATH_RX)
u8 type;
u8 id;
+ u8 xdp_xmit;
struct napi_struct napi;
struct qed_sb_info *sb_info;
struct qede_rx_queue *rxq;
struct qede_tx_queue *txq;
+ struct qede_tx_queue *xdp_tx;
#define VEC_NAME_SIZE (sizeof(((struct net_device *)0)->name) + 8)
char name[VEC_NAME_SIZE];
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 6c70e29..1c48f44 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -165,8 +165,13 @@ static void qede_get_strings_stats_txq(struct qede_dev *edev,
int i;
for (i = 0; i < QEDE_NUM_TQSTATS; i++) {
- sprintf(*buf, "%d: %s", txq->index,
- qede_tqstats_arr[i].string);
+ if (txq->is_xdp)
+ sprintf(*buf, "%d [XDP]: %s",
+ QEDE_TXQ_XDP_TO_IDX(edev, txq),
+ qede_tqstats_arr[i].string);
+ else
+ sprintf(*buf, "%d: %s", txq->index,
+ qede_tqstats_arr[i].string);
*buf += ETH_GSTRING_LEN;
}
}
@@ -195,6 +200,9 @@ static void qede_get_strings_stats(struct qede_dev *edev, u8 *buf)
if (fp->type & QEDE_FASTPATH_RX)
qede_get_strings_stats_rxq(edev, fp->rxq, &buf);
+ if (fp->type & QEDE_FASTPATH_XDP)
+ qede_get_strings_stats_txq(edev, fp->xdp_tx, &buf);
+
if (fp->type & QEDE_FASTPATH_TX)
qede_get_strings_stats_txq(edev, fp->txq, &buf);
}
@@ -268,6 +276,9 @@ static void qede_get_ethtool_stats(struct net_device *dev,
if (fp->type & QEDE_FASTPATH_RX)
qede_get_ethtool_stats_rxq(fp->rxq, &buf);
+ if (fp->type & QEDE_FASTPATH_XDP)
+ qede_get_ethtool_stats_txq(fp->xdp_tx, &buf);
+
if (fp->type & QEDE_FASTPATH_TX)
qede_get_ethtool_stats_txq(fp->txq, &buf);
}
@@ -305,6 +316,9 @@ static int qede_get_sset_count(struct net_device *dev, int stringset)
/* Account for the Regular Rx statistics */
num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_RQSTATS;
+ /* Account for XDP statistics [if needed] */
+ if (edev->xdp_prog)
+ num_stats += QEDE_RSS_COUNT(edev) * QEDE_NUM_TQSTATS;
return num_stats;
case ETH_SS_PRIV_FLAGS:
@@ -1216,7 +1230,7 @@ static int qede_selftest_transmit_traffic(struct qede_dev *edev,
/* Fill the entry in the SW ring and the BDs in the FW ring */
idx = txq->sw_tx_prod & NUM_TX_BDS_MAX;
- txq->sw_tx_ring[idx].skb = skb;
+ txq->sw_tx_ring.skbs[idx].skb = skb;
first_bd = qed_chain_produce(&txq->tx_pbl);
memset(first_bd, 0, sizeof(*first_bd));
val = 1 << ETH_TX_1ST_BD_FLAGS_START_BD_SHIFT;
@@ -1270,7 +1284,7 @@ static int qede_selftest_transmit_traffic(struct qede_dev *edev,
dma_unmap_single(&edev->pdev->dev, BD_UNMAP_ADDR(first_bd),
BD_UNMAP_LEN(first_bd), DMA_TO_DEVICE);
txq->sw_tx_cons++;
- txq->sw_tx_ring[idx].skb = NULL;
+ txq->sw_tx_ring.skbs[idx].skb = NULL;
return 0;
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 67e08c6..f787f2f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -94,6 +94,9 @@ enum qede_pci_private {
#define TX_TIMEOUT (5 * HZ)
+/* Utilize last protocol index for XDP */
+#define XDP_PI 11
+
static void qede_remove(struct pci_dev *pdev);
static void qede_shutdown(struct pci_dev *pdev);
static void qede_link_update(void *dev, struct qed_link_output *link);
@@ -301,12 +304,12 @@ static int qede_free_tx_pkt(struct qede_dev *edev,
struct qede_tx_queue *txq, int *len)
{
u16 idx = txq->sw_tx_cons & NUM_TX_BDS_MAX;
- struct sk_buff *skb = txq->sw_tx_ring[idx].skb;
+ struct sk_buff *skb = txq->sw_tx_ring.skbs[idx].skb;
struct eth_tx_1st_bd *first_bd;
struct eth_tx_bd *tx_data_bd;
int bds_consumed = 0;
int nbds;
- bool data_split = txq->sw_tx_ring[idx].flags & QEDE_TSO_SPLIT_BD;
+ bool data_split = txq->sw_tx_ring.skbs[idx].flags & QEDE_TSO_SPLIT_BD;
int i, split_bd_len = 0;
if (unlikely(!skb)) {
@@ -346,8 +349,8 @@ static int qede_free_tx_pkt(struct qede_dev *edev,
/* Free skb */
dev_kfree_skb_any(skb);
- txq->sw_tx_ring[idx].skb = NULL;
- txq->sw_tx_ring[idx].flags = 0;
+ txq->sw_tx_ring.skbs[idx].skb = NULL;
+ txq->sw_tx_ring.skbs[idx].flags = 0;
return 0;
}
@@ -358,7 +361,7 @@ static void qede_free_failed_tx_pkt(struct qede_tx_queue *txq,
int nbd, bool data_split)
{
u16 idx = txq->sw_tx_prod & NUM_TX_BDS_MAX;
- struct sk_buff *skb = txq->sw_tx_ring[idx].skb;
+ struct sk_buff *skb = txq->sw_tx_ring.skbs[idx].skb;
struct eth_tx_bd *tx_data_bd;
int i, split_bd_len = 0;
@@ -394,8 +397,8 @@ static void qede_free_failed_tx_pkt(struct qede_tx_queue *txq,
/* Free skb */
dev_kfree_skb_any(skb);
- txq->sw_tx_ring[idx].skb = NULL;
- txq->sw_tx_ring[idx].flags = 0;
+ txq->sw_tx_ring.skbs[idx].skb = NULL;
+ txq->sw_tx_ring.skbs[idx].flags = 0;
}
static u32 qede_xmit_type(struct sk_buff *skb, int *ipv6_ext)
@@ -530,6 +533,47 @@ static inline void qede_update_tx_producer(struct qede_tx_queue *txq)
mmiowb();
}
+static int qede_xdp_xmit(struct qede_dev *edev, struct qede_fastpath *fp,
+ struct sw_rx_data *metadata, u16 padding, u16 length)
+{
+ struct qede_tx_queue *txq = fp->xdp_tx;
+ u16 idx = txq->sw_tx_prod & NUM_TX_BDS_MAX;
+ struct eth_tx_1st_bd *first_bd;
+
+ if (!qed_chain_get_elem_left(&txq->tx_pbl)) {
+ txq->stopped_cnt++;
+ return -ENOMEM;
+ }
+
+ first_bd = (struct eth_tx_1st_bd *)qed_chain_produce(&txq->tx_pbl);
+
+ memset(first_bd, 0, sizeof(*first_bd));
+ first_bd->data.bd_flags.bitfields =
+ BIT(ETH_TX_1ST_BD_FLAGS_START_BD_SHIFT);
+ first_bd->data.bitfields |=
+ (length & ETH_TX_DATA_1ST_BD_PKT_LEN_MASK) <<
+ ETH_TX_DATA_1ST_BD_PKT_LEN_SHIFT;
+ first_bd->data.nbds = 1;
+
+ /* We can safely ignore the offset, as it's 0 for XDP */
+ BD_SET_UNMAP_ADDR_LEN(first_bd, metadata->mapping + padding, length);
+
+ /* Synchronize the buffer back to device, as program [probably]
+ * has changed it.
+ */
+ dma_sync_single_for_device(&edev->pdev->dev,
+ metadata->mapping + padding,
+ length, PCI_DMA_TODEVICE);
+
+ txq->sw_tx_ring.pages[idx] = metadata->data;
+ txq->sw_tx_prod++;
+
+ /* Mark the fastpath for future XDP doorbell */
+ fp->xdp_xmit = 1;
+
+ return 0;
+}
+
/* Main transmit function */
static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
@@ -573,7 +617,7 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
/* Fill the entry in the SW ring and the BDs in the FW ring */
idx = txq->sw_tx_prod & NUM_TX_BDS_MAX;
- txq->sw_tx_ring[idx].skb = skb;
+ txq->sw_tx_ring.skbs[idx].skb = skb;
first_bd = (struct eth_tx_1st_bd *)
qed_chain_produce(&txq->tx_pbl);
memset(first_bd, 0, sizeof(*first_bd));
@@ -693,7 +737,7 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
/* this marks the BD as one that has no
* individual mapping
*/
- txq->sw_tx_ring[idx].flags |= QEDE_TSO_SPLIT_BD;
+ txq->sw_tx_ring.skbs[idx].flags |= QEDE_TSO_SPLIT_BD;
first_bd->nbytes = cpu_to_le16(hlen);
@@ -802,6 +846,27 @@ int qede_txq_has_work(struct qede_tx_queue *txq)
return hw_bd_cons != qed_chain_get_cons_idx(&txq->tx_pbl);
}
+static void qede_xdp_tx_int(struct qede_dev *edev, struct qede_tx_queue *txq)
+{
+ struct eth_tx_1st_bd *bd;
+ u16 hw_bd_cons;
+
+ hw_bd_cons = le16_to_cpu(*txq->hw_cons_ptr);
+ barrier();
+
+ while (hw_bd_cons != qed_chain_get_cons_idx(&txq->tx_pbl)) {
+ bd = (struct eth_tx_1st_bd *)qed_chain_consume(&txq->tx_pbl);
+
+ dma_unmap_single(&edev->pdev->dev, BD_UNMAP_ADDR(bd),
+ PAGE_SIZE, DMA_BIDIRECTIONAL);
+ __free_page(txq->sw_tx_ring.pages[txq->sw_tx_cons &
+ NUM_TX_BDS_MAX]);
+
+ txq->sw_tx_cons++;
+ txq->xmit_pkts++;
+ }
+}
+
static int qede_tx_int(struct qede_dev *edev, struct qede_tx_queue *txq)
{
struct netdev_queue *netdev_txq;
@@ -942,7 +1007,7 @@ static int qede_alloc_rx_buffer(struct qede_rx_queue *rxq)
* for multiple RX buffer segment size mapping.
*/
mapping = dma_map_page(rxq->dev, data, 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
+ PAGE_SIZE, rxq->data_direction);
if (unlikely(dma_mapping_error(rxq->dev, mapping))) {
__free_page(data);
return -ENOMEM;
@@ -981,7 +1046,7 @@ static inline int qede_realloc_rx_buffer(struct qede_rx_queue *rxq,
}
dma_unmap_page(rxq->dev, curr_cons->mapping,
- PAGE_SIZE, DMA_FROM_DEVICE);
+ PAGE_SIZE, rxq->data_direction);
} else {
/* Increment refcount of the page as we don't want
* network stack to take the ownership of the page
@@ -1441,6 +1506,26 @@ static bool qede_rx_xdp(struct qede_dev *edev,
rxq->xdp_no_pass++;
switch (act) {
+ case XDP_TX:
+ /* We need the replacement buffer before transmit. */
+ if (qede_alloc_rx_buffer(rxq)) {
+ qede_recycle_rx_bd_ring(rxq, 1);
+ return false;
+ }
+
+ /* Now if there's a transmission problem, we'd still have to
+ * throw current buffer, as replacement was already allocated.
+ */
+ if (qede_xdp_xmit(edev, fp, bd, cqe->placement_offset, len)) {
+ dma_unmap_page(rxq->dev, bd->mapping,
+ PAGE_SIZE, DMA_BIDIRECTIONAL);
+ __free_page(bd->data);
+ }
+
+ /* Regardless, we've consumed an Rx BD */
+ qede_rx_bd_ring_consume(rxq);
+ return false;
+
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -1740,6 +1825,10 @@ static bool qede_poll_is_more_work(struct qede_fastpath *fp)
if (qede_has_rx_work(fp->rxq))
return true;
+ if (fp->type & QEDE_FASTPATH_XDP)
+ if (qede_txq_has_work(fp->xdp_tx))
+ return true;
+
if (likely(fp->type & QEDE_FASTPATH_TX))
if (qede_txq_has_work(fp->txq))
return true;
@@ -1757,6 +1846,9 @@ static int qede_poll(struct napi_struct *napi, int budget)
if (likely(fp->type & QEDE_FASTPATH_TX) && qede_txq_has_work(fp->txq))
qede_tx_int(edev, fp->txq);
+ if ((fp->type & QEDE_FASTPATH_XDP) && qede_txq_has_work(fp->xdp_tx))
+ qede_xdp_tx_int(edev, fp->xdp_tx);
+
rx_work_done = (likely(fp->type & QEDE_FASTPATH_RX) &&
qede_has_rx_work(fp->rxq)) ?
qede_rx_int(fp, budget) : 0;
@@ -1771,6 +1863,14 @@ static int qede_poll(struct napi_struct *napi, int budget)
}
}
+ if (fp->xdp_xmit) {
+ u16 xdp_prod = qed_chain_get_prod_idx(&fp->xdp_tx->tx_pbl);
+
+ fp->xdp_xmit = 0;
+ fp->xdp_tx->tx_db.data.bd_prod = cpu_to_le16(xdp_prod);
+ qede_update_tx_producer(fp->xdp_tx);
+ }
+
return rx_work_done;
}
@@ -2586,6 +2686,7 @@ static void qede_free_fp_array(struct qede_dev *edev)
kfree(fp->sb_info);
kfree(fp->rxq);
+ kfree(fp->xdp_tx);
kfree(fp->txq);
}
kfree(edev->fp_array);
@@ -2646,8 +2747,13 @@ static int qede_alloc_fp_array(struct qede_dev *edev)
if (!fp->rxq)
goto err;
- if (edev->xdp_prog)
+ if (edev->xdp_prog) {
+ fp->xdp_tx = kzalloc(sizeof(*fp->xdp_tx),
+ GFP_KERNEL);
+ if (!fp->xdp_tx)
+ goto err;
fp->type |= QEDE_FASTPATH_XDP;
+ }
}
}
@@ -2694,9 +2800,9 @@ static void qede_update_pf_params(struct qed_dev *cdev)
{
struct qed_pf_params pf_params;
- /* 64 rx + 64 tx */
+ /* 64 rx + 64 tx + 64 XDP */
memset(&pf_params, 0, sizeof(struct qed_pf_params));
- pf_params.eth_pf_params.num_cons = 128;
+ pf_params.eth_pf_params.num_cons = 192;
qed_ops->common->update_pf_params(cdev, &pf_params);
}
@@ -2953,7 +3059,7 @@ static void qede_free_rx_buffers(struct qede_dev *edev,
data = rx_buf->data;
dma_unmap_page(&edev->pdev->dev,
- rx_buf->mapping, PAGE_SIZE, DMA_FROM_DEVICE);
+ rx_buf->mapping, PAGE_SIZE, rxq->data_direction);
rx_buf->data = NULL;
__free_page(data);
@@ -3114,7 +3220,10 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
static void qede_free_mem_txq(struct qede_dev *edev, struct qede_tx_queue *txq)
{
/* Free the parallel SW ring */
- kfree(txq->sw_tx_ring);
+ if (txq->is_xdp)
+ kfree(txq->sw_tx_ring.pages);
+ else
+ kfree(txq->sw_tx_ring.skbs);
/* Free the real RQ ring used by FW */
edev->ops->common->chain_free(edev->cdev, &txq->tx_pbl);
@@ -3123,17 +3232,22 @@ static void qede_free_mem_txq(struct qede_dev *edev, struct qede_tx_queue *txq)
/* This function allocates all memory needed per Tx queue */
static int qede_alloc_mem_txq(struct qede_dev *edev, struct qede_tx_queue *txq)
{
- int size, rc;
union eth_tx_bd_types *p_virt;
+ int size, rc;
txq->num_tx_buffers = edev->q_num_tx_buffers;
/* Allocate the parallel driver ring for Tx buffers */
- size = sizeof(*txq->sw_tx_ring) * TX_RING_SIZE;
- txq->sw_tx_ring = kzalloc(size, GFP_KERNEL);
- if (!txq->sw_tx_ring) {
- DP_NOTICE(edev, "Tx buffers ring allocation failed\n");
- goto err;
+ if (txq->is_xdp) {
+ size = sizeof(*txq->sw_tx_ring.pages) * TX_RING_SIZE;
+ txq->sw_tx_ring.pages = kzalloc(size, GFP_KERNEL);
+ if (!txq->sw_tx_ring.pages)
+ goto err;
+ } else {
+ size = sizeof(*txq->sw_tx_ring.skbs) * TX_RING_SIZE;
+ txq->sw_tx_ring.skbs = kzalloc(size, GFP_KERNEL);
+ if (!txq->sw_tx_ring.skbs)
+ goto err;
}
rc = edev->ops->common->chain_alloc(edev->cdev,
@@ -3169,26 +3283,31 @@ static void qede_free_mem_fp(struct qede_dev *edev, struct qede_fastpath *fp)
*/
static int qede_alloc_mem_fp(struct qede_dev *edev, struct qede_fastpath *fp)
{
- int rc;
+ int rc = 0;
rc = qede_alloc_mem_sb(edev, fp->sb_info, fp->id);
if (rc)
- goto err;
+ goto out;
if (fp->type & QEDE_FASTPATH_RX) {
rc = qede_alloc_mem_rxq(edev, fp->rxq);
if (rc)
- goto err;
+ goto out;
+ }
+
+ if (fp->type & QEDE_FASTPATH_XDP) {
+ rc = qede_alloc_mem_txq(edev, fp->xdp_tx);
+ if (rc)
+ goto out;
}
if (fp->type & QEDE_FASTPATH_TX) {
rc = qede_alloc_mem_txq(edev, fp->txq);
if (rc)
- goto err;
+ goto out;
}
- return 0;
-err:
+out:
return rc;
}
@@ -3236,9 +3355,20 @@ static void qede_init_fp(struct qede_dev *edev)
fp->edev = edev;
fp->id = queue_id;
+ if (fp->type & QEDE_FASTPATH_XDP) {
+ fp->xdp_tx->index = QEDE_TXQ_IDX_TO_XDP(edev,
+ rxq_index);
+ fp->xdp_tx->is_xdp = 1;
+ }
if (fp->type & QEDE_FASTPATH_RX) {
fp->rxq->rxq_id = rxq_index++;
+
+ /* Determine how to map buffers for this queue */
+ if (fp->type & QEDE_FASTPATH_XDP)
+ fp->rxq->data_direction = DMA_BIDIRECTIONAL;
+ else
+ fp->rxq->data_direction = DMA_FROM_DEVICE;
fp->rxq->dev = &edev->pdev->dev;
}
@@ -3449,6 +3579,12 @@ static int qede_stop_queues(struct qede_dev *edev)
if (rc)
return rc;
}
+
+ if (fp->type & QEDE_FASTPATH_XDP) {
+ rc = qede_drain_txq(edev, fp->xdp_tx, true);
+ if (rc)
+ return rc;
+ }
}
/* Stop all Queues in reverse order */
@@ -3471,8 +3607,14 @@ static int qede_stop_queues(struct qede_dev *edev)
}
}
- if (fp->type & QEDE_FASTPATH_XDP)
+ /* Stop the XDP forwarding queue */
+ if (fp->type & QEDE_FASTPATH_XDP) {
+ rc = qede_stop_txq(edev, fp->xdp_tx, i);
+ if (rc)
+ return rc;
+
bpf_prog_put(fp->rxq->xdp_prog);
+ }
}
/* Stop the vport */
@@ -3496,7 +3638,14 @@ static int qede_start_txq(struct qede_dev *edev,
memset(¶ms, 0, sizeof(params));
memset(&ret_params, 0, sizeof(ret_params));
- params.queue_id = txq->index;
+ /* Let the XDP queue share the queue-zone with one of the regular txq.
+ * We don't really care about its coalescing.
+ */
+ if (txq->is_xdp)
+ params.queue_id = QEDE_TXQ_XDP_TO_IDX(edev, txq);
+ else
+ params.queue_id = txq->index;
+
params.sb = fp->sb_info->igu_sb_id;
params.sb_idx = sb_idx;
@@ -3601,6 +3750,10 @@ static int qede_start_queues(struct qede_dev *edev, bool clear_stats)
}
if (fp->type & QEDE_FASTPATH_XDP) {
+ rc = qede_start_txq(edev, fp, fp->xdp_tx, i, XDP_PI);
+ if (rc)
+ return rc;
+
fp->rxq->xdp_prog = bpf_prog_add(edev->xdp_prog, 1);
if (IS_ERR(fp->rxq->xdp_prog)) {
rc = PTR_ERR(fp->rxq->xdp_prog);
--
1.9.3
^ permalink raw reply related
* [PATCH v2 net-next 10/11] qede: Add basic XDP support
From: Yuval Mintz @ 2016-11-29 14:47 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz
In-Reply-To: <1480430830-17671-1-git-send-email-Yuval.Mintz@cavium.com>
Add support for the ndo_xdp callback. This patch would support XDP_PASS,
XDP_DROP and XDP_ABORTED commands.
This also adds a per Rx queue statistic which counts number of packets
which didn't reach the stack [due to XDP].
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
---
drivers/net/ethernet/qlogic/qede/qede.h | 9 ++
drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 1 +
drivers/net/ethernet/qlogic/qede/qede_main.c | 120 +++++++++++++++++++++++-
3 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 21e8533..56ab446 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -16,6 +16,7 @@
#include <linux/bitmap.h>
#include <linux/kernel.h>
#include <linux/mutex.h>
+#include <linux/bpf.h>
#include <linux/io.h>
#include <linux/qed/common_hsi.h>
#include <linux/qed/eth_common.h>
@@ -187,6 +188,8 @@ struct qede_dev {
bool wol_enabled;
struct qede_rdma_dev rdma_info;
+
+ struct bpf_prog *xdp_prog;
};
enum QEDE_STATE {
@@ -249,6 +252,8 @@ struct qede_rx_queue {
/* Required for the allocation of replacement buffers */
struct device *dev;
+ struct bpf_prog *xdp_prog;
+
u16 sw_rx_cons;
u16 sw_rx_prod;
@@ -271,6 +276,8 @@ struct qede_rx_queue {
u64 rx_alloc_errors;
u64 rx_ip_frags;
+ u64 xdp_no_pass;
+
void *handle;
};
@@ -325,6 +332,7 @@ struct qede_fastpath {
struct qede_dev *edev;
#define QEDE_FASTPATH_TX BIT(0)
#define QEDE_FASTPATH_RX BIT(1)
+#define QEDE_FASTPATH_XDP BIT(2)
#define QEDE_FASTPATH_COMBINED (QEDE_FASTPATH_TX | QEDE_FASTPATH_RX)
u8 type;
u8 id;
@@ -358,6 +366,7 @@ struct qede_reload_args {
void (*func)(struct qede_dev *edev, struct qede_reload_args *args);
union {
netdev_features_t features;
+ struct bpf_prog *new_prog;
u16 mtu;
} u;
};
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index 60a2e58..6c70e29 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -32,6 +32,7 @@
QEDE_RQSTAT(rx_hw_errors),
QEDE_RQSTAT(rx_alloc_errors),
QEDE_RQSTAT(rx_ip_frags),
+ QEDE_RQSTAT(xdp_no_pass),
};
#define QEDE_NUM_RQSTATS ARRAY_SIZE(qede_rqstats_arr)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 15520bf..67e08c6 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1418,6 +1418,39 @@ static bool qede_pkt_is_ip_fragmented(struct eth_fast_path_rx_reg_cqe *cqe,
return false;
}
+/* Return true iff packet is to be passed to stack */
+static bool qede_rx_xdp(struct qede_dev *edev,
+ struct qede_fastpath *fp,
+ struct qede_rx_queue *rxq,
+ struct bpf_prog *prog,
+ struct sw_rx_data *bd,
+ struct eth_fast_path_rx_reg_cqe *cqe)
+{
+ u16 len = le16_to_cpu(cqe->len_on_first_bd);
+ struct xdp_buff xdp;
+ enum xdp_action act;
+
+ xdp.data = page_address(bd->data) + cqe->placement_offset;
+ xdp.data_end = xdp.data + len;
+ act = bpf_prog_run_xdp(prog, &xdp);
+
+ if (act == XDP_PASS)
+ return true;
+
+ /* Count number of packets not to be passed to stack */
+ rxq->xdp_no_pass++;
+
+ switch (act) {
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ case XDP_ABORTED:
+ case XDP_DROP:
+ qede_recycle_rx_bd_ring(rxq, cqe->bd_num);
+ }
+
+ return false;
+}
+
static struct sk_buff *qede_rx_allocate_skb(struct qede_dev *edev,
struct qede_rx_queue *rxq,
struct sw_rx_data *bd, u16 len,
@@ -1560,6 +1593,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
struct qede_fastpath *fp,
struct qede_rx_queue *rxq)
{
+ struct bpf_prog *xdp_prog = READ_ONCE(rxq->xdp_prog);
struct eth_fast_path_rx_reg_cqe *fp_cqe;
u16 len, pad, bd_cons_idx, parse_flag;
enum eth_rx_cqe_type cqe_type;
@@ -1596,6 +1630,11 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
len = le16_to_cpu(fp_cqe->len_on_first_bd);
pad = fp_cqe->placement_offset;
+ /* Run eBPF program if one is attached */
+ if (xdp_prog)
+ if (!qede_rx_xdp(edev, fp, rxq, xdp_prog, bd, fp_cqe))
+ return 1;
+
/* If this is an error packet then drop it */
flags = cqe->fast_path_regular.pars_flags.flags;
parse_flag = le16_to_cpu(flags);
@@ -2226,7 +2265,16 @@ int qede_set_features(struct net_device *dev, netdev_features_t features)
args.u.features = features;
args.func = &qede_set_features_reload;
- qede_reload(edev, &args, false);
+ /* Make sure that we definitely need to reload.
+ * In case of an eBPF attached program, there will be no FW
+ * aggregations, so no need to actually reload.
+ */
+ __qede_lock(edev);
+ if (edev->xdp_prog)
+ args.func(edev, &args);
+ else
+ qede_reload(edev, &args, true);
+ __qede_unlock(edev);
return 1;
}
@@ -2338,6 +2386,43 @@ static netdev_features_t qede_features_check(struct sk_buff *skb,
return features;
}
+static void qede_xdp_reload_func(struct qede_dev *edev,
+ struct qede_reload_args *args)
+{
+ struct bpf_prog *old;
+
+ old = xchg(&edev->xdp_prog, args->u.new_prog);
+ if (old)
+ bpf_prog_put(old);
+}
+
+static int qede_xdp_set(struct qede_dev *edev, struct bpf_prog *prog)
+{
+ struct qede_reload_args args;
+
+ /* If we're called, there was already a bpf reference increment */
+ args.func = &qede_xdp_reload_func;
+ args.u.new_prog = prog;
+ qede_reload(edev, &args, false);
+
+ return 0;
+}
+
+static int qede_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+ struct qede_dev *edev = netdev_priv(dev);
+
+ switch (xdp->command) {
+ case XDP_SETUP_PROG:
+ return qede_xdp_set(edev, xdp->prog);
+ case XDP_QUERY_PROG:
+ xdp->prog_attached = !!edev->xdp_prog;
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
static const struct net_device_ops qede_netdev_ops = {
.ndo_open = qede_open,
.ndo_stop = qede_close,
@@ -2363,6 +2448,7 @@ static netdev_features_t qede_features_check(struct sk_buff *skb,
.ndo_udp_tunnel_add = qede_udp_tunnel_add,
.ndo_udp_tunnel_del = qede_udp_tunnel_del,
.ndo_features_check = qede_features_check,
+ .ndo_xdp = qede_xdp,
};
/* -------------------------------------------------------------------------
@@ -2559,6 +2645,9 @@ static int qede_alloc_fp_array(struct qede_dev *edev)
fp->rxq = kzalloc(sizeof(*fp->rxq), GFP_KERNEL);
if (!fp->rxq)
goto err;
+
+ if (edev->xdp_prog)
+ fp->type |= QEDE_FASTPATH_XDP;
}
}
@@ -2756,6 +2845,10 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
pci_set_drvdata(pdev, NULL);
+ /* Release edev's reference to XDP's bpf if such exist */
+ if (edev->xdp_prog)
+ bpf_prog_put(edev->xdp_prog);
+
free_netdev(ndev);
/* Use global ops since we've freed edev */
@@ -2907,6 +3000,10 @@ static int qede_alloc_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
dma_addr_t mapping;
int i;
+ /* Don't perform FW aggregations in case of XDP */
+ if (edev->xdp_prog)
+ edev->gro_disable = 1;
+
if (edev->gro_disable)
return 0;
@@ -2959,8 +3056,13 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
if (rxq->rx_buf_size > PAGE_SIZE)
rxq->rx_buf_size = PAGE_SIZE;
- /* Segment size to spilt a page in multiple equal parts */
- rxq->rx_buf_seg_size = roundup_pow_of_two(rxq->rx_buf_size);
+ /* Segment size to spilt a page in multiple equal parts,
+ * unless XDP is used in which case we'd use the entire page.
+ */
+ if (!edev->xdp_prog)
+ rxq->rx_buf_seg_size = roundup_pow_of_two(rxq->rx_buf_size);
+ else
+ rxq->rx_buf_seg_size = PAGE_SIZE;
/* Allocate the parallel driver ring for Rx buffers */
size = sizeof(*rxq->sw_rx_ring) * RX_RING_SIZE;
@@ -3368,6 +3470,9 @@ static int qede_stop_queues(struct qede_dev *edev)
return rc;
}
}
+
+ if (fp->type & QEDE_FASTPATH_XDP)
+ bpf_prog_put(fp->rxq->xdp_prog);
}
/* Stop the vport */
@@ -3495,6 +3600,15 @@ static int qede_start_queues(struct qede_dev *edev, bool clear_stats)
qede_update_rx_prod(edev, rxq);
}
+ if (fp->type & QEDE_FASTPATH_XDP) {
+ fp->rxq->xdp_prog = bpf_prog_add(edev->xdp_prog, 1);
+ if (IS_ERR(fp->rxq->xdp_prog)) {
+ rc = PTR_ERR(fp->rxq->xdp_prog);
+ fp->rxq->xdp_prog = NULL;
+ return rc;
+ }
+ }
+
if (fp->type & QEDE_FASTPATH_TX) {
rc = qede_start_txq(edev, fp, fp->txq, i, TX_PI(0));
if (rc)
--
1.9.3
^ permalink raw reply related
* [PATCH v2 net-next 09/11] qede: Better utilize the qede_[rt]x_queue
From: Yuval Mintz @ 2016-11-29 14:47 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz
In-Reply-To: <1480430830-17671-1-git-send-email-Yuval.Mintz@cavium.com>
Improve the cacheline usage of both queues by reordering -
This reduces the cachelines required for egress datapath processing
from 3 to 2 and those required by ingress datapath processing by 2.
It also changes a couple of datapath related functions that currently
require either the fastpath or the qede_dev, changing them to be based
on the tx/rx queue instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
---
drivers/net/ethernet/qlogic/qede/qede.h | 79 ++++++-----
drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 4 +-
drivers/net/ethernet/qlogic/qede/qede_main.c | 166 +++++++++++-------------
3 files changed, 124 insertions(+), 125 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 4ebc638..21e8533 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -243,27 +243,33 @@ struct qede_agg_info {
};
struct qede_rx_queue {
- __le16 *hw_cons_ptr;
- struct sw_rx_data *sw_rx_ring;
- u16 sw_rx_cons;
- u16 sw_rx_prod;
- struct qed_chain rx_bd_ring;
- struct qed_chain rx_comp_ring;
- void __iomem *hw_rxq_prod_addr;
+ __le16 *hw_cons_ptr;
+ void __iomem *hw_rxq_prod_addr;
- /* GRO */
- struct qede_agg_info tpa_info[ETH_TPA_MAX_AGGS_NUM];
+ /* Required for the allocation of replacement buffers */
+ struct device *dev;
+
+ u16 sw_rx_cons;
+ u16 sw_rx_prod;
- int rx_buf_size;
- unsigned int rx_buf_seg_size;
+ u16 num_rx_buffers; /* Slowpath */
+ u8 rxq_id;
- u16 num_rx_buffers;
- u16 rxq_id;
+ u32 rx_buf_size;
+ u32 rx_buf_seg_size;
- u64 rcv_pkts;
- u64 rx_hw_errors;
- u64 rx_alloc_errors;
- u64 rx_ip_frags;
+ u64 rcv_pkts;
+
+ struct sw_rx_data *sw_rx_ring;
+ struct qed_chain rx_bd_ring;
+ struct qed_chain rx_comp_ring ____cacheline_aligned;
+
+ /* GRO */
+ struct qede_agg_info tpa_info[ETH_TPA_MAX_AGGS_NUM];
+
+ u64 rx_hw_errors;
+ u64 rx_alloc_errors;
+ u64 rx_ip_frags;
void *handle;
};
@@ -281,22 +287,28 @@ struct sw_tx_bd {
};
struct qede_tx_queue {
- int index; /* Queue index */
- __le16 *hw_cons_ptr;
- struct sw_tx_bd *sw_tx_ring;
- u16 sw_tx_cons;
- u16 sw_tx_prod;
- struct qed_chain tx_pbl;
- void __iomem *doorbell_addr;
- union db_prod tx_db;
-
- u16 num_tx_buffers;
- u64 xmit_pkts;
- u64 stopped_cnt;
-
- bool is_legacy;
- void *handle;
+ bool is_legacy;
+ u16 sw_tx_cons;
+ u16 sw_tx_prod;
+ u16 num_tx_buffers; /* Slowpath only */
+ u64 xmit_pkts;
+ u64 stopped_cnt;
+
+ __le16 *hw_cons_ptr;
+
+ /* Needed for the mapping of packets */
+ struct device *dev;
+
+ void __iomem *doorbell_addr;
+ union db_prod tx_db;
+ int index; /* Slowpath only */
+
+ struct sw_tx_bd *sw_tx_ring;
+ struct qed_chain tx_pbl;
+
+ /* Slowpath; Should be kept in end [unless missing padding] */
+ void *handle;
};
#define BD_UNMAP_ADDR(bd) HILO_U64(le32_to_cpu((bd)->addr.hi), \
@@ -363,8 +375,7 @@ void qede_reload(struct qede_dev *edev,
void __qede_unlock(struct qede_dev *edev);
bool qede_has_rx_work(struct qede_rx_queue *rxq);
int qede_txq_has_work(struct qede_tx_queue *txq);
-void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq, struct qede_dev *edev,
- u8 count);
+void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq, u8 count);
void qede_update_rx_prod(struct qede_dev *edev, struct qede_rx_queue *rxq);
#define RX_RING_SIZE_POW 13
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index ef8c327..60a2e58 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -1337,13 +1337,13 @@ static int qede_selftest_receive_traffic(struct qede_dev *edev)
break;
}
- qede_recycle_rx_bd_ring(rxq, edev, 1);
+ qede_recycle_rx_bd_ring(rxq, 1);
qed_chain_recycle_consumed(&rxq->rx_comp_ring);
break;
}
DP_INFO(edev, "Not the transmitted packet\n");
- qede_recycle_rx_bd_ring(rxq, edev, 1);
+ qede_recycle_rx_bd_ring(rxq, 1);
qed_chain_recycle_consumed(&rxq->rx_comp_ring);
}
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index ec01de2..15520bf 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -96,8 +96,6 @@ enum qede_pci_private {
static void qede_remove(struct pci_dev *pdev);
static void qede_shutdown(struct pci_dev *pdev);
-static int qede_alloc_rx_buffer(struct qede_dev *edev,
- struct qede_rx_queue *rxq);
static void qede_link_update(void *dev, struct qed_link_output *link);
/* The qede lock is used to protect driver state change and driver flows that
@@ -355,8 +353,7 @@ static int qede_free_tx_pkt(struct qede_dev *edev,
}
/* Unmap the data and free skb when mapping failed during start_xmit */
-static void qede_free_failed_tx_pkt(struct qede_dev *edev,
- struct qede_tx_queue *txq,
+static void qede_free_failed_tx_pkt(struct qede_tx_queue *txq,
struct eth_tx_1st_bd *first_bd,
int nbd, bool data_split)
{
@@ -378,7 +375,7 @@ static void qede_free_failed_tx_pkt(struct qede_dev *edev,
nbd--;
}
- dma_unmap_single(&edev->pdev->dev, BD_UNMAP_ADDR(first_bd),
+ dma_unmap_single(txq->dev, BD_UNMAP_ADDR(first_bd),
BD_UNMAP_LEN(first_bd) + split_bd_len, DMA_TO_DEVICE);
/* Unmap the data of the skb frags */
@@ -386,7 +383,7 @@ static void qede_free_failed_tx_pkt(struct qede_dev *edev,
tx_data_bd = (struct eth_tx_bd *)
qed_chain_produce(&txq->tx_pbl);
if (tx_data_bd->nbytes)
- dma_unmap_page(&edev->pdev->dev,
+ dma_unmap_page(txq->dev,
BD_UNMAP_ADDR(tx_data_bd),
BD_UNMAP_LEN(tx_data_bd), DMA_TO_DEVICE);
}
@@ -401,8 +398,7 @@ static void qede_free_failed_tx_pkt(struct qede_dev *edev,
txq->sw_tx_ring[idx].flags = 0;
}
-static u32 qede_xmit_type(struct qede_dev *edev,
- struct sk_buff *skb, int *ipv6_ext)
+static u32 qede_xmit_type(struct sk_buff *skb, int *ipv6_ext)
{
u32 rc = XMIT_L4_CSUM;
__be16 l3_proto;
@@ -469,18 +465,16 @@ static void qede_set_params_for_ipv6_ext(struct sk_buff *skb,
second_bd->data.bitfields2 = cpu_to_le16(bd2_bits2);
}
-static int map_frag_to_bd(struct qede_dev *edev,
+static int map_frag_to_bd(struct qede_tx_queue *txq,
skb_frag_t *frag, struct eth_tx_bd *bd)
{
dma_addr_t mapping;
/* Map skb non-linear frag data for DMA */
- mapping = skb_frag_dma_map(&edev->pdev->dev, frag, 0,
+ mapping = skb_frag_dma_map(txq->dev, frag, 0,
skb_frag_size(frag), DMA_TO_DEVICE);
- if (unlikely(dma_mapping_error(&edev->pdev->dev, mapping))) {
- DP_NOTICE(edev, "Unable to map frag - dropping packet\n");
+ if (unlikely(dma_mapping_error(txq->dev, mapping)))
return -ENOMEM;
- }
/* Setup the data pointer of the frag data */
BD_SET_UNMAP_ADDR_LEN(bd, mapping, skb_frag_size(frag));
@@ -500,8 +494,7 @@ static u16 qede_get_skb_hlen(struct sk_buff *skb, bool is_encap_pkt)
/* +2 for 1st BD for headers and 2nd BD for headlen (if required) */
#if ((MAX_SKB_FRAGS + 2) > ETH_TX_MAX_BDS_PER_NON_LSO_PACKET)
-static bool qede_pkt_req_lin(struct qede_dev *edev, struct sk_buff *skb,
- u8 xmit_type)
+static bool qede_pkt_req_lin(struct sk_buff *skb, u8 xmit_type)
{
int allowed_frags = ETH_TX_MAX_BDS_PER_NON_LSO_PACKET - 1;
@@ -565,10 +558,10 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
WARN_ON(qed_chain_get_elem_left(&txq->tx_pbl) < (MAX_SKB_FRAGS + 1));
- xmit_type = qede_xmit_type(edev, skb, &ipv6_ext);
+ xmit_type = qede_xmit_type(skb, &ipv6_ext);
#if ((MAX_SKB_FRAGS + 2) > ETH_TX_MAX_BDS_PER_NON_LSO_PACKET)
- if (qede_pkt_req_lin(edev, skb, xmit_type)) {
+ if (qede_pkt_req_lin(skb, xmit_type)) {
if (skb_linearize(skb)) {
DP_NOTICE(edev,
"SKB linearization failed - silently dropping this SKB\n");
@@ -588,11 +581,11 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
1 << ETH_TX_1ST_BD_FLAGS_START_BD_SHIFT;
/* Map skb linear data for DMA and set in the first BD */
- mapping = dma_map_single(&edev->pdev->dev, skb->data,
+ mapping = dma_map_single(txq->dev, skb->data,
skb_headlen(skb), DMA_TO_DEVICE);
- if (unlikely(dma_mapping_error(&edev->pdev->dev, mapping))) {
+ if (unlikely(dma_mapping_error(txq->dev, mapping))) {
DP_NOTICE(edev, "SKB mapping failed\n");
- qede_free_failed_tx_pkt(edev, txq, first_bd, 0, false);
+ qede_free_failed_tx_pkt(txq, first_bd, 0, false);
qede_update_tx_producer(txq);
return NETDEV_TX_OK;
}
@@ -716,12 +709,11 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
/* Handle fragmented skb */
/* special handle for frags inside 2nd and 3rd bds.. */
while (tx_data_bd && frag_idx < skb_shinfo(skb)->nr_frags) {
- rc = map_frag_to_bd(edev,
+ rc = map_frag_to_bd(txq,
&skb_shinfo(skb)->frags[frag_idx],
tx_data_bd);
if (rc) {
- qede_free_failed_tx_pkt(edev, txq, first_bd, nbd,
- data_split);
+ qede_free_failed_tx_pkt(txq, first_bd, nbd, data_split);
qede_update_tx_producer(txq);
return NETDEV_TX_OK;
}
@@ -741,12 +733,11 @@ static netdev_tx_t qede_start_xmit(struct sk_buff *skb,
memset(tx_data_bd, 0, sizeof(*tx_data_bd));
- rc = map_frag_to_bd(edev,
+ rc = map_frag_to_bd(txq,
&skb_shinfo(skb)->frags[frag_idx],
tx_data_bd);
if (rc) {
- qede_free_failed_tx_pkt(edev, txq, first_bd, nbd,
- data_split);
+ qede_free_failed_tx_pkt(txq, first_bd, nbd, data_split);
qede_update_tx_producer(txq);
return NETDEV_TX_OK;
}
@@ -903,8 +894,7 @@ static inline void qede_rx_bd_ring_consume(struct qede_rx_queue *rxq)
/* This function reuses the buffer(from an offset) from
* consumer index to producer index in the bd ring
*/
-static inline void qede_reuse_page(struct qede_dev *edev,
- struct qede_rx_queue *rxq,
+static inline void qede_reuse_page(struct qede_rx_queue *rxq,
struct sw_rx_data *curr_cons)
{
struct eth_rx_bd *rx_bd_prod = qed_chain_produce(&rxq->rx_bd_ring);
@@ -926,27 +916,62 @@ static inline void qede_reuse_page(struct qede_dev *edev,
/* In case of allocation failures reuse buffers
* from consumer index to produce buffers for firmware
*/
-void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq,
- struct qede_dev *edev, u8 count)
+void qede_recycle_rx_bd_ring(struct qede_rx_queue *rxq, u8 count)
{
struct sw_rx_data *curr_cons;
for (; count > 0; count--) {
curr_cons = &rxq->sw_rx_ring[rxq->sw_rx_cons & NUM_RX_BDS_MAX];
- qede_reuse_page(edev, rxq, curr_cons);
+ qede_reuse_page(rxq, curr_cons);
qede_rx_bd_ring_consume(rxq);
}
}
-static inline int qede_realloc_rx_buffer(struct qede_dev *edev,
- struct qede_rx_queue *rxq,
+static int qede_alloc_rx_buffer(struct qede_rx_queue *rxq)
+{
+ struct sw_rx_data *sw_rx_data;
+ struct eth_rx_bd *rx_bd;
+ dma_addr_t mapping;
+ struct page *data;
+
+ data = alloc_pages(GFP_ATOMIC, 0);
+ if (unlikely(!data))
+ return -ENOMEM;
+
+ /* Map the entire page as it would be used
+ * for multiple RX buffer segment size mapping.
+ */
+ mapping = dma_map_page(rxq->dev, data, 0,
+ PAGE_SIZE, DMA_FROM_DEVICE);
+ if (unlikely(dma_mapping_error(rxq->dev, mapping))) {
+ __free_page(data);
+ return -ENOMEM;
+ }
+
+ sw_rx_data = &rxq->sw_rx_ring[rxq->sw_rx_prod & NUM_RX_BDS_MAX];
+ sw_rx_data->page_offset = 0;
+ sw_rx_data->data = data;
+ sw_rx_data->mapping = mapping;
+
+ /* Advance PROD and get BD pointer */
+ rx_bd = (struct eth_rx_bd *)qed_chain_produce(&rxq->rx_bd_ring);
+ WARN_ON(!rx_bd);
+ rx_bd->addr.hi = cpu_to_le32(upper_32_bits(mapping));
+ rx_bd->addr.lo = cpu_to_le32(lower_32_bits(mapping));
+
+ rxq->sw_rx_prod++;
+
+ return 0;
+}
+
+static inline int qede_realloc_rx_buffer(struct qede_rx_queue *rxq,
struct sw_rx_data *curr_cons)
{
/* Move to the next segment in the page */
curr_cons->page_offset += rxq->rx_buf_seg_size;
if (curr_cons->page_offset == PAGE_SIZE) {
- if (unlikely(qede_alloc_rx_buffer(edev, rxq))) {
+ if (unlikely(qede_alloc_rx_buffer(rxq))) {
/* Since we failed to allocate new buffer
* current buffer can be used again.
*/
@@ -955,7 +980,7 @@ static inline int qede_realloc_rx_buffer(struct qede_dev *edev,
return -ENOMEM;
}
- dma_unmap_page(&edev->pdev->dev, curr_cons->mapping,
+ dma_unmap_page(rxq->dev, curr_cons->mapping,
PAGE_SIZE, DMA_FROM_DEVICE);
} else {
/* Increment refcount of the page as we don't want
@@ -963,7 +988,7 @@ static inline int qede_realloc_rx_buffer(struct qede_dev *edev,
* which can be recycled multiple times by the driver.
*/
page_ref_inc(curr_cons->data);
- qede_reuse_page(edev, rxq, curr_cons);
+ qede_reuse_page(rxq, curr_cons);
}
return 0;
@@ -1026,6 +1051,7 @@ static void qede_set_skb_csum(struct sk_buff *skb, u8 csum_flag)
static inline void qede_skb_receive(struct qede_dev *edev,
struct qede_fastpath *fp,
+ struct qede_rx_queue *rxq,
struct sk_buff *skb, u16 vlan_tag)
{
if (vlan_tag)
@@ -1068,7 +1094,7 @@ static int qede_fill_frag_skb(struct qede_dev *edev,
current_bd->data, current_bd->page_offset,
len_on_bd);
- if (unlikely(qede_realloc_rx_buffer(edev, rxq, current_bd))) {
+ if (unlikely(qede_realloc_rx_buffer(rxq, current_bd))) {
/* Incr page ref count to reuse on allocation failure
* so that it doesn't get freed while freeing SKB.
*/
@@ -1087,7 +1113,8 @@ static int qede_fill_frag_skb(struct qede_dev *edev,
out:
tpa_info->state = QEDE_AGG_STATE_ERROR;
- qede_recycle_rx_bd_ring(rxq, edev, 1);
+ qede_recycle_rx_bd_ring(rxq, 1);
+
return -ENOMEM;
}
@@ -1239,7 +1266,7 @@ static void qede_gro_receive(struct qede_dev *edev,
send_skb:
skb_record_rx_queue(skb, fp->rxq->rxq_id);
- qede_skb_receive(edev, fp, skb, vlan_tag);
+ qede_skb_receive(edev, fp, fp->rxq, skb, vlan_tag);
}
static inline void qede_tpa_cont(struct qede_dev *edev,
@@ -1414,7 +1441,7 @@ static struct sk_buff *qede_rx_allocate_skb(struct qede_dev *edev,
if (len + pad <= edev->rx_copybreak) {
memcpy(skb_put(skb, len),
page_address(page) + pad + offset, len);
- qede_reuse_page(edev, rxq, bd);
+ qede_reuse_page(rxq, bd);
goto out;
}
@@ -1435,7 +1462,7 @@ static struct sk_buff *qede_rx_allocate_skb(struct qede_dev *edev,
skb->data_len -= pull_len;
skb->tail += pull_len;
- if (unlikely(qede_realloc_rx_buffer(edev, rxq, bd))) {
+ if (unlikely(qede_realloc_rx_buffer(rxq, bd))) {
/* Incr page ref count to reuse on allocation failure so
* that it doesn't get freed while freeing SKB [as its
* already mapped there].
@@ -1477,7 +1504,7 @@ static int qede_rx_build_jumbo(struct qede_dev *edev,
}
/* We need a replacement buffer for each BD */
- if (unlikely(qede_alloc_rx_buffer(edev, rxq)))
+ if (unlikely(qede_alloc_rx_buffer(rxq)))
goto out;
/* Now that we've allocated the replacement buffer,
@@ -1487,7 +1514,7 @@ static int qede_rx_build_jumbo(struct qede_dev *edev,
bd = &rxq->sw_rx_ring[bd_cons_idx];
qede_rx_bd_ring_consume(rxq);
- dma_unmap_page(&edev->pdev->dev, bd->mapping,
+ dma_unmap_page(rxq->dev, bd->mapping,
PAGE_SIZE, DMA_FROM_DEVICE);
skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags++,
@@ -1582,7 +1609,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
"CQE has error, flags = %x, dropping incoming packet\n",
parse_flag);
rxq->rx_hw_errors++;
- qede_recycle_rx_bd_ring(rxq, edev, fp_cqe->bd_num);
+ qede_recycle_rx_bd_ring(rxq, fp_cqe->bd_num);
return 0;
}
}
@@ -1593,7 +1620,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
skb = qede_rx_allocate_skb(edev, rxq, bd, len, pad);
if (!skb) {
rxq->rx_alloc_errors++;
- qede_recycle_rx_bd_ring(rxq, edev, fp_cqe->bd_num);
+ qede_recycle_rx_bd_ring(rxq, fp_cqe->bd_num);
return 0;
}
@@ -1605,7 +1632,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
fp_cqe, len);
if (unlikely(unmapped_frags > 0)) {
- qede_recycle_rx_bd_ring(rxq, edev, unmapped_frags);
+ qede_recycle_rx_bd_ring(rxq, unmapped_frags);
dev_kfree_skb_any(skb);
return 0;
}
@@ -1618,7 +1645,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
skb_record_rx_queue(skb, rxq->rxq_id);
/* SKB is prepared - pass it to stack */
- qede_skb_receive(edev, fp, skb, le16_to_cpu(fp_cqe->vlan_tag));
+ qede_skb_receive(edev, fp, rxq, skb, le16_to_cpu(fp_cqe->vlan_tag));
return 1;
}
@@ -2875,47 +2902,6 @@ static void qede_free_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
edev->ops->common->chain_free(edev->cdev, &rxq->rx_comp_ring);
}
-static int qede_alloc_rx_buffer(struct qede_dev *edev,
- struct qede_rx_queue *rxq)
-{
- struct sw_rx_data *sw_rx_data;
- struct eth_rx_bd *rx_bd;
- dma_addr_t mapping;
- struct page *data;
-
- data = alloc_pages(GFP_ATOMIC, 0);
- if (unlikely(!data)) {
- DP_NOTICE(edev, "Failed to allocate Rx data [page]\n");
- return -ENOMEM;
- }
-
- /* Map the entire page as it would be used
- * for multiple RX buffer segment size mapping.
- */
- mapping = dma_map_page(&edev->pdev->dev, data, 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
- if (unlikely(dma_mapping_error(&edev->pdev->dev, mapping))) {
- __free_page(data);
- DP_NOTICE(edev, "Failed to map Rx buffer\n");
- return -ENOMEM;
- }
-
- sw_rx_data = &rxq->sw_rx_ring[rxq->sw_rx_prod & NUM_RX_BDS_MAX];
- sw_rx_data->page_offset = 0;
- sw_rx_data->data = data;
- sw_rx_data->mapping = mapping;
-
- /* Advance PROD and get BD pointer */
- rx_bd = (struct eth_rx_bd *)qed_chain_produce(&rxq->rx_bd_ring);
- WARN_ON(!rx_bd);
- rx_bd->addr.hi = cpu_to_le32(upper_32_bits(mapping));
- rx_bd->addr.lo = cpu_to_le32(lower_32_bits(mapping));
-
- rxq->sw_rx_prod++;
-
- return 0;
-}
-
static int qede_alloc_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
{
dma_addr_t mapping;
@@ -3010,7 +2996,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
/* Allocate buffers for the Rx ring */
for (i = 0; i < rxq->num_rx_buffers; i++) {
- rc = qede_alloc_rx_buffer(edev, rxq);
+ rc = qede_alloc_rx_buffer(rxq);
if (rc) {
DP_ERR(edev,
"Rx buffers allocation failed at index %d\n", i);
@@ -3151,12 +3137,14 @@ static void qede_init_fp(struct qede_dev *edev)
if (fp->type & QEDE_FASTPATH_RX) {
fp->rxq->rxq_id = rxq_index++;
+ fp->rxq->dev = &edev->pdev->dev;
}
if (fp->type & QEDE_FASTPATH_TX) {
fp->txq->index = txq_index++;
if (edev->dev_info.is_legacy)
fp->txq->is_legacy = 1;
+ fp->txq->dev = &edev->pdev->dev;
}
snprintf(fp->name, sizeof(fp->name), "%s-fp-%d",
--
1.9.3
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox