* Re: [PATCH net-next v2 0/2] FDDI: DEC FDDIcontroller 700 TURBOchannel adapter support
From: David Miller @ 2018-10-16 4:46 UTC (permalink / raw)
To: macro; +Cc: netdev
In-Reply-To: <alpine.LFD.2.21.1810090133440.15789@eddie.linux-mips.org>
From: "Maciej W. Rozycki" <macro@linux-mips.org>
Date: Tue, 9 Oct 2018 23:57:36 +0100 (BST)
> Questions, comments? Otherwise, please apply.
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH net-next 0/3] ip_tunnel: specify tunnel type via template
From: David Miller @ 2018-10-16 4:43 UTC (permalink / raw)
To: pablo
Cc: netfilter-devel, netdev, roopa, amir, pshelar, u9012063, daniel,
jakub.kicinski
In-Reply-To: <20181009222439.29399-1-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 10 Oct 2018 00:24:36 +0200
> The following patchset adds a new field to the tunnel metadata template.
> This new field allows us to restrict the configuration to a given tunnel
> driver in order to catch incorrect configuration that may result in
> packets going to the wrong tunnel driver.
>
> Changes with regards to initial RFC [1] are:
>
> 1) Explicit tunnel type initialization to TUNNEL_TYPE_UNSPEC in existing
> clients for this code, as requested by Daniel.
>
> 2) Add TUNNEL_TYPE_* definition through enum tunnel_type in
> uapi/linux/if_tunnel.h, so we don't need to redefine this in every
> client of this infrastructure.
>
> 3) Add TUNNEL_TYPE_IPIP, TUNNEL_TYPE_IPIP6 and TUNNEL_TYPE_IP6IP6, which
> were missing in the original RFC.
>
> Let me know if you any more comments, thanks.
>
> [1] https://marc.info/?l=netfilter-devel&m=153861145204094&w=2
People don't need to update a core common UAPI header to add a new
ethernet driver.
They shouldn't have to do so to add a new tunneling driver either.
But that requirement is created by this patch set.
^ permalink raw reply
* Re: [PATCH net-next] tun: Consistently configure generic netdev params via rtnetlink
From: David Miller @ 2018-10-16 4:41 UTC (permalink / raw)
To: serhe.popovych; +Cc: netdev, ebiederm
In-Reply-To: <1539109261-8414-1-git-send-email-serhe.popovych@gmail.com>
From: Serhey Popovych <serhe.popovych@gmail.com>
Date: Tue, 9 Oct 2018 21:21:01 +0300
> Configuring generic network device parameters on tun will fail in
> presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
> since tun_validate() always return failure.
>
> This can be visualized with following ip-link(8) command sequences:
>
> # ip link set dev tun0 group 100
> # ip link set dev tun0 group 100 type tun
> RTNETLINK answers: Invalid argument
>
> with contrast to dummy and veth drivers:
>
> # ip link set dev dummy0 group 100
> # ip link set dev dummy0 type dummy
>
> # ip link set dev veth0 group 100
> # ip link set dev veth0 group 100 type veth
>
> Fix by returning zero in tun_validate() when @data is NULL that is
> always in case since rtnl_link_ops->maxtype is zero in tun driver.
>
> Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX")
> Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH] ethtool: fix a privilege escalation bug
From: David Miller @ 2018-10-16 4:39 UTC (permalink / raw)
To: wang6495
Cc: kjlu, f.fainelli, keescook, andrew, ecree, ilyal, ynorov,
alan.brady, stephen, netdev, linux-kernel
In-Reply-To: <1539013777-1625-1-git-send-email-wang6495@umn.edu>
From: Wenwen Wang <wang6495@umn.edu>
Date: Mon, 8 Oct 2018 10:49:35 -0500
> In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the
> use-space buffer 'useraddr' and checked to see whether it is
> ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from
> the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next,
> according to 'sub_cmd', a permission check is enforced through the function
> ns_capable(). For example, the permission check is required if 'sub_cmd' is
> ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is
> ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be
> done by anyone". The following execution invokes different handlers
> according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE,
> ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel
> object 'per_queue_opt' is copied again from the user-space buffer
> 'useraddr' and 'per_queue_opt.sub_command' is used to determine which
> operation should be performed. Given that the buffer 'useraddr' is in the
> user space, a malicious user can race to change the sub-command between the
> two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and
> ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then
> before ethtool_set_per_queue() is called, the attacker changes
> ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can
> bypass the permission check and execute ETHTOOL_SCOALESCE.
>
> This patch enforces a check in ethtool_set_per_queue() after the second
> copy from 'useraddr'. If the sub-command is different from the one obtained
> in the first copy in dev_ethtool(), an error code EINVAL will be returned.
>
> Signed-off-by: Wenwen Wang <wang6495@umn.edu>
Applied and queued up for -stable.
^ permalink raw reply
* crash in xt_policy due to skb_dst_drop() in nf_ct_frag6_gather()
From: Maciej Żenczykowski @ 2018-10-16 4:13 UTC (permalink / raw)
To: Lorenzo Colitti, Eric Dumazet, Florian Westphal, Linux NetDev,
Maciej Zenczykowski, Maciej Żenczykowski
I believe that:
commit ad8b1ffc3efae2f65080bdb11145c87d299b8f9a
Author: Florian Westphal <fw@strlen.de>
netfilter: ipv6: nf_defrag: drop skb dst before queueing
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -618,6 +618,8 @@ int nf_ct_frag6_gather(struct net *net, struct
sk_buff *skb, u32 user)
fq->q.meat == fq->q.len &&
nf_ct_frag6_reasm(fq, skb, dev))
ret = 0;
+ else
+ skb_dst_drop(skb);
out_unlock:
spin_unlock_bh(&fq->q.lock);
Is causing a crash on android after upgrading from 4.9.96 to 4.9.119
This is because clatd ipv4 to ipv6 translation user space daemon is
functionally equivalent to the syzkaller reproducer.
It will convert ipv4 frags it receives via tap into ipv6 frags which
it will write out via rawv6 sendmsg.
However we are also using xt_policy, after stripping cruft this is basically:
ip6tables -A OUTPUT -m policy --dir out --pol ipsec
Crash is:
match_policy_out()
const struct dst_entry *dst = skb_dst(skb); // returns NULL
if (dst->xfrm == NULL) <-- dst == NULL -> panic
[ 1136.606948] c1 2675 [<ffffff9ec38b4098>] policy_mt+0x34/0x18c
[ 1136.606954] c1 2675 [<ffffff9ec39e6af8>] ip6t_do_table+0x280/0x684
[ 1136.606961] c1 2675 [<ffffff9ec39e7250>] ip6table_filter_hook+0x20/0x28
[ 1136.606969] c1 2675 [<ffffff9ec386ecc8>] nf_hook_slow+0x98/0x154
[ 1136.606977] c1 2675 [<ffffff9ec39b9b10>] rawv6_sendmsg+0xd14/0x1520
[ 1136.606985] c1 2675 [<ffffff9ec39191fc>] inet_sendmsg+0x100/0x1b0
[ 1136.606993] c1 2675 [<ffffff9ec37d3720>] ___sys_sendmsg+0x2a0/0x414
[ 1136.606999] c1 2675 [<ffffff9ec37d3d48>] SyS_sendmsg+0x94/0xe4
Just checking for NULL in xt_policy.c:match_policy_out() and returning
0 or 1 unconditionally seems to be the wrong thing to do,
since after all prior to skb_dst_drop() the skb->dst->xfrm might not
have been NULL.
Maciej Żenczykowski, Kernel Networking Developer @ Google
^ permalink raw reply
* [PATCH net-next 5/5] net: hns3: fix for multiple unmapping DMA problem
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
In-Reply-To: <1539691132-158455-1-git-send-email-linyunsheng@huawei.com>
From: Fuyun Liang <liangfuyun1@huawei.com>
When sending a big fragment using multiple buffer descriptor,
hns3 does one maping, but do multiple unmapping when tx is done,
which may cause unmapping problem.
To fix it, this patch makes sure the value of desc_cb.length of
the non-first bd is zero. If desc_cb.length is zero, we do not
unmap the buffer.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 74e592d..76ce2f2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1051,6 +1051,8 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
return -ENOMEM;
}
+ desc_cb->length = size;
+
frag_buf_num = (size + HNS3_MAX_BD_SIZE - 1) / HNS3_MAX_BD_SIZE;
sizeoflast = size % HNS3_MAX_BD_SIZE;
sizeoflast = sizeoflast ? sizeoflast : HNS3_MAX_BD_SIZE;
@@ -1059,15 +1061,14 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
for (k = 0; k < frag_buf_num; k++) {
/* The txbd's baseinfo of DESC_TYPE_PAGE & DESC_TYPE_SKB */
desc_cb->priv = priv;
- desc_cb->length = (k == frag_buf_num - 1) ?
- sizeoflast : HNS3_MAX_BD_SIZE;
desc_cb->dma = dma + HNS3_MAX_BD_SIZE * k;
desc_cb->type = (type == DESC_TYPE_SKB && !k) ?
DESC_TYPE_SKB : DESC_TYPE_PAGE;
/* now, fill the descriptor */
desc->addr = cpu_to_le64(dma + HNS3_MAX_BD_SIZE * k);
- desc->tx.send_size = cpu_to_le16((u16)desc_cb->length);
+ desc->tx.send_size = cpu_to_le16((k == frag_buf_num - 1) ?
+ (u16)sizeoflast : (u16)HNS3_MAX_BD_SIZE);
hns3_set_txbd_baseinfo(&bdtp_fe_sc_vld_ra_ri,
frag_end && (k == frag_buf_num - 1) ?
1 : 0);
@@ -1150,12 +1151,14 @@ static void hns3_clear_desc(struct hns3_enet_ring *ring, int next_to_use_orig)
ring->desc_cb[ring->next_to_use].dma,
ring->desc_cb[ring->next_to_use].length,
DMA_TO_DEVICE);
- else
+ else if (ring->desc_cb[ring->next_to_use].length)
dma_unmap_page(dev,
ring->desc_cb[ring->next_to_use].dma,
ring->desc_cb[ring->next_to_use].length,
DMA_TO_DEVICE);
+ ring->desc_cb[ring->next_to_use].length = 0;
+
/* rollback one */
ring_ptr_move_bw(ring, next_to_use);
}
@@ -1874,7 +1877,7 @@ static void hns3_unmap_buffer(struct hns3_enet_ring *ring,
if (cb->type == DESC_TYPE_SKB)
dma_unmap_single(ring_to_dev(ring), cb->dma, cb->length,
ring_to_dma_dir(ring));
- else
+ else if (cb->length)
dma_unmap_page(ring_to_dev(ring), cb->dma, cb->length,
ring_to_dma_dir(ring));
}
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 4/5] net: hns3: rename hns_nic_dma_unmap
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
In-Reply-To: <1539691132-158455-1-git-send-email-linyunsheng@huawei.com>
From: Fuyun Liang <liangfuyun1@huawei.com>
To keep symmetrical, this patch renames hns_nic_dma_unmap to
hns3_clear_desc.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 3fb8e48..74e592d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1134,7 +1134,7 @@ static int hns3_nic_maybe_stop_tx(struct sk_buff **out_skb, int *bnum,
return 0;
}
-static void hns_nic_dma_unmap(struct hns3_enet_ring *ring, int next_to_use_orig)
+static void hns3_clear_desc(struct hns3_enet_ring *ring, int next_to_use_orig)
{
struct device *dev = ring_to_dev(ring);
unsigned int i;
@@ -1208,7 +1208,7 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
ret = priv->ops.fill_desc(ring, skb, size, seg_num == 1 ? 1 : 0,
DESC_TYPE_SKB);
if (ret)
- goto head_dma_map_err;
+ goto head_fill_err;
next_to_use_frag = ring->next_to_use;
/* Fill the fragments */
@@ -1221,7 +1221,7 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
DESC_TYPE_PAGE);
if (ret)
- goto frag_dma_map_err;
+ goto frag_fill_err;
}
/* Complete translate all packets */
@@ -1234,11 +1234,11 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
return NETDEV_TX_OK;
-frag_dma_map_err:
- hns_nic_dma_unmap(ring, next_to_use_frag);
+frag_fill_err:
+ hns3_clear_desc(ring, next_to_use_frag);
-head_dma_map_err:
- hns_nic_dma_unmap(ring, next_to_use_head);
+head_fill_err:
+ hns3_clear_desc(ring, next_to_use_head);
out_err_tx_ok:
dev_kfree_skb_any(skb);
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 3/5] net: hns3: add handling for big TX fragment
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
In-Reply-To: <1539691132-158455-1-git-send-email-linyunsheng@huawei.com>
From: Fuyun Liang <liangfuyun1@huawei.com>
This patch unifies big tx fragment handling for tso and non-tso
case.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 45 +++++++++++++++++--------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 288d527..3fb8e48 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -985,10 +985,13 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
u32 ol_type_vlan_len_msec = 0;
u16 bdtp_fe_sc_vld_ra_ri = 0;
struct skb_frag_struct *frag;
+ unsigned int frag_buf_num;
u32 type_cs_vlan_tso = 0;
struct sk_buff *skb;
u16 inner_vtag = 0;
u16 out_vtag = 0;
+ unsigned int k;
+ int sizeoflast;
u32 paylen = 0;
dma_addr_t dma;
u16 mss = 0;
@@ -996,16 +999,6 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
u8 il4_proto;
int ret;
- /* The txbd's baseinfo of DESC_TYPE_PAGE & DESC_TYPE_SKB */
- desc_cb->priv = priv;
- desc_cb->length = size;
- desc_cb->type = type;
-
- /* now, fill the descriptor */
- desc->tx.send_size = cpu_to_le16((u16)size);
- hns3_set_txbd_baseinfo(&bdtp_fe_sc_vld_ra_ri, frag_end);
- desc->tx.bdtp_fe_sc_vld_ra_ri = cpu_to_le16(bdtp_fe_sc_vld_ra_ri);
-
if (type == DESC_TYPE_SKB) {
skb = (struct sk_buff *)priv;
paylen = skb->len;
@@ -1058,11 +1051,35 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
return -ENOMEM;
}
- desc_cb->dma = dma;
- desc->addr = cpu_to_le64(dma);
+ frag_buf_num = (size + HNS3_MAX_BD_SIZE - 1) / HNS3_MAX_BD_SIZE;
+ sizeoflast = size % HNS3_MAX_BD_SIZE;
+ sizeoflast = sizeoflast ? sizeoflast : HNS3_MAX_BD_SIZE;
+
+ /* When frag size is bigger than hardware limit, split this frag */
+ for (k = 0; k < frag_buf_num; k++) {
+ /* The txbd's baseinfo of DESC_TYPE_PAGE & DESC_TYPE_SKB */
+ desc_cb->priv = priv;
+ desc_cb->length = (k == frag_buf_num - 1) ?
+ sizeoflast : HNS3_MAX_BD_SIZE;
+ desc_cb->dma = dma + HNS3_MAX_BD_SIZE * k;
+ desc_cb->type = (type == DESC_TYPE_SKB && !k) ?
+ DESC_TYPE_SKB : DESC_TYPE_PAGE;
+
+ /* now, fill the descriptor */
+ desc->addr = cpu_to_le64(dma + HNS3_MAX_BD_SIZE * k);
+ desc->tx.send_size = cpu_to_le16((u16)desc_cb->length);
+ hns3_set_txbd_baseinfo(&bdtp_fe_sc_vld_ra_ri,
+ frag_end && (k == frag_buf_num - 1) ?
+ 1 : 0);
+ desc->tx.bdtp_fe_sc_vld_ra_ri =
+ cpu_to_le16(bdtp_fe_sc_vld_ra_ri);
+
+ /* move ring pointer to next.*/
+ ring_ptr_move_fw(ring, next_to_use);
- /* move ring pointer to next.*/
- ring_ptr_move_fw(ring, next_to_use);
+ desc_cb = &ring->desc_cb[ring->next_to_use];
+ desc = &ring->desc[ring->next_to_use];
+ }
return 0;
}
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 2/5] net: hns3: move DMA map into hns3_fill_desc
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
In-Reply-To: <1539691132-158455-1-git-send-email-linyunsheng@huawei.com>
From: Peng Li <lipeng321@huawei.com>
To solve the L3 checksum error problem which happens when driver
does not clear L3 checksum, DMA map should be done after calling
skb_cow_head.
This patch moves DMA map into hns3_fill_desc to ensure that DMA
map is done after calling skb_cow_head.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 47 ++++++++++++-------------
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 3 +-
2 files changed, 24 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 953568a..288d527 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -977,18 +977,20 @@ static int hns3_fill_desc_vtags(struct sk_buff *skb,
}
static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
- int size, dma_addr_t dma, int frag_end,
- enum hns_desc_type type)
+ int size, int frag_end, enum hns_desc_type type)
{
struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
struct hns3_desc *desc = &ring->desc[ring->next_to_use];
+ struct device *dev = ring_to_dev(ring);
u32 ol_type_vlan_len_msec = 0;
u16 bdtp_fe_sc_vld_ra_ri = 0;
+ struct skb_frag_struct *frag;
u32 type_cs_vlan_tso = 0;
struct sk_buff *skb;
u16 inner_vtag = 0;
u16 out_vtag = 0;
u32 paylen = 0;
+ dma_addr_t dma;
u16 mss = 0;
u8 ol4_proto;
u8 il4_proto;
@@ -997,11 +999,9 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
/* The txbd's baseinfo of DESC_TYPE_PAGE & DESC_TYPE_SKB */
desc_cb->priv = priv;
desc_cb->length = size;
- desc_cb->dma = dma;
desc_cb->type = type;
/* now, fill the descriptor */
- desc->addr = cpu_to_le64(dma);
desc->tx.send_size = cpu_to_le16((u16)size);
hns3_set_txbd_baseinfo(&bdtp_fe_sc_vld_ra_ri, frag_end);
desc->tx.bdtp_fe_sc_vld_ra_ri = cpu_to_le16(bdtp_fe_sc_vld_ra_ri);
@@ -1046,8 +1046,21 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
desc->tx.mss = cpu_to_le16(mss);
desc->tx.vlan_tag = cpu_to_le16(inner_vtag);
desc->tx.outer_vlan_tag = cpu_to_le16(out_vtag);
+
+ dma = dma_map_single(dev, skb->data, size, DMA_TO_DEVICE);
+ } else {
+ frag = (struct skb_frag_struct *)priv;
+ dma = skb_frag_dma_map(dev, frag, 0, size, DMA_TO_DEVICE);
+ }
+
+ if (dma_mapping_error(ring->dev, dma)) {
+ ring->stats.sw_err_cnt++;
+ return -ENOMEM;
}
+ desc_cb->dma = dma;
+ desc->addr = cpu_to_le64(dma);
+
/* move ring pointer to next.*/
ring_ptr_move_fw(ring, next_to_use);
@@ -1137,12 +1150,10 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
struct hns3_nic_ring_data *ring_data =
&tx_ring_data(priv, skb->queue_mapping);
struct hns3_enet_ring *ring = ring_data->ring;
- struct device *dev = priv->dev;
struct netdev_queue *dev_queue;
struct skb_frag_struct *frag;
int next_to_use_head;
int next_to_use_frag;
- dma_addr_t dma;
int buf_num;
int seg_num;
int size;
@@ -1177,15 +1188,8 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
next_to_use_head = ring->next_to_use;
- dma = dma_map_single(dev, skb->data, size, DMA_TO_DEVICE);
- if (dma_mapping_error(dev, dma)) {
- netdev_err(netdev, "TX head DMA map failed\n");
- ring->stats.sw_err_cnt++;
- goto out_err_tx_ok;
- }
-
- ret = priv->ops.fill_desc(ring, skb, size, dma, seg_num == 1 ? 1 : 0,
- DESC_TYPE_SKB);
+ ret = priv->ops.fill_desc(ring, skb, size, seg_num == 1 ? 1 : 0,
+ DESC_TYPE_SKB);
if (ret)
goto head_dma_map_err;
@@ -1194,15 +1198,10 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev)
for (i = 1; i < seg_num; i++) {
frag = &skb_shinfo(skb)->frags[i - 1];
size = skb_frag_size(frag);
- dma = skb_frag_dma_map(dev, frag, 0, size, DMA_TO_DEVICE);
- if (dma_mapping_error(dev, dma)) {
- netdev_err(netdev, "TX frag(%d) DMA map failed\n", i);
- ring->stats.sw_err_cnt++;
- goto frag_dma_map_err;
- }
- ret = priv->ops.fill_desc(ring, skb_frag_page(frag), size, dma,
- seg_num - 1 == i ? 1 : 0,
- DESC_TYPE_PAGE);
+
+ ret = priv->ops.fill_desc(ring, frag, size,
+ seg_num - 1 == i ? 1 : 0,
+ DESC_TYPE_PAGE);
if (ret)
goto frag_dma_map_err;
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index f25b281..71cfca1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -419,8 +419,7 @@ struct hns3_nic_ring_data {
struct hns3_nic_ops {
int (*fill_desc)(struct hns3_enet_ring *ring, void *priv,
- int size, dma_addr_t dma, int frag_end,
- enum hns_desc_type type);
+ int size, int frag_end, enum hns_desc_type type);
int (*maybe_stop_tx)(struct sk_buff **out_skb,
int *bnum, struct hns3_enet_ring *ring);
void (*get_rxd_bnum)(u32 bnum_flag, int *out_bnum);
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 1/5] net: hns3: remove hns3_fill_desc_tso
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
In-Reply-To: <1539691132-158455-1-git-send-email-linyunsheng@huawei.com>
From: Peng Li <lipeng321@huawei.com>
This patch removes hns3_fill_desc_tso in preparation for
fixing some desc filling bug, because for tso or non-tso
case, we will use the unified hns3_fill_desc.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 44 +++----------------------
1 file changed, 5 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index ce93fcc..953568a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1054,35 +1054,6 @@ static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv,
return 0;
}
-static int hns3_fill_desc_tso(struct hns3_enet_ring *ring, void *priv,
- int size, dma_addr_t dma, int frag_end,
- enum hns_desc_type type)
-{
- unsigned int frag_buf_num;
- unsigned int k;
- int sizeoflast;
- int ret;
-
- frag_buf_num = (size + HNS3_MAX_BD_SIZE - 1) / HNS3_MAX_BD_SIZE;
- sizeoflast = size % HNS3_MAX_BD_SIZE;
- sizeoflast = sizeoflast ? sizeoflast : HNS3_MAX_BD_SIZE;
-
- /* When the frag size is bigger than hardware, split this frag */
- for (k = 0; k < frag_buf_num; k++) {
- ret = hns3_fill_desc(ring, priv,
- (k == frag_buf_num - 1) ?
- sizeoflast : HNS3_MAX_BD_SIZE,
- dma + HNS3_MAX_BD_SIZE * k,
- frag_end && (k == frag_buf_num - 1) ? 1 : 0,
- (type == DESC_TYPE_SKB && !k) ?
- DESC_TYPE_SKB : DESC_TYPE_PAGE);
- if (ret)
- return ret;
- }
-
- return 0;
-}
-
static int hns3_nic_maybe_stop_tso(struct sk_buff **out_skb, int *bnum,
struct hns3_enet_ring *ring)
{
@@ -1313,13 +1284,10 @@ static int hns3_nic_set_features(struct net_device *netdev,
int ret;
if (changed & (NETIF_F_TSO | NETIF_F_TSO6)) {
- if (features & (NETIF_F_TSO | NETIF_F_TSO6)) {
- priv->ops.fill_desc = hns3_fill_desc_tso;
+ if (features & (NETIF_F_TSO | NETIF_F_TSO6))
priv->ops.maybe_stop_tx = hns3_nic_maybe_stop_tso;
- } else {
- priv->ops.fill_desc = hns3_fill_desc;
+ else
priv->ops.maybe_stop_tx = hns3_nic_maybe_stop_tx;
- }
}
if ((changed & NETIF_F_HW_VLAN_CTAG_FILTER) &&
@@ -3247,14 +3215,12 @@ static void hns3_nic_set_priv_ops(struct net_device *netdev)
{
struct hns3_nic_priv *priv = netdev_priv(netdev);
+ priv->ops.fill_desc = hns3_fill_desc;
if ((netdev->features & NETIF_F_TSO) ||
- (netdev->features & NETIF_F_TSO6)) {
- priv->ops.fill_desc = hns3_fill_desc_tso;
+ (netdev->features & NETIF_F_TSO6))
priv->ops.maybe_stop_tx = hns3_nic_maybe_stop_tso;
- } else {
- priv->ops.fill_desc = hns3_fill_desc;
+ else
priv->ops.maybe_stop_tx = hns3_nic_maybe_stop_tx;
- }
}
static int hns3_client_init(struct hnae3_handle *handle)
--
1.9.1
^ permalink raw reply related
* [PATCH net-next 0/5] Some cleanup and bugfix for desc filling
From: Yunsheng Lin @ 2018-10-16 11:58 UTC (permalink / raw)
To: davem
Cc: huangdaode, xuwei5, liguozhu, Yisen.Zhuang, john.garry, linuxarm,
yisen.zhuang, salil.mehta, lipeng321, netdev, linux-kernel
When retransmiting packets, skb_cow_head which is called in
hns3_set_tso may clone a new header. And driver will clear the
checksum of the header after doing DMA map, so HW will read the
old header whose L3 checksum is not cleared and calculate a
wrong L3 checksum.
Also When sending a big fragment using multiple buffer descriptor,
hns3 does one maping, but do multiple unmapping when tx is done,
which may cause unmapping problem.
This patchset does some cleanup before fixing the above problem.
Fuyun Liang (3):
net: hns3: add handling for big TX fragment
net: hns3: rename hns_nic_dma_unmap
net: hns3: fix for multiple unmapping DMA problem
Peng Li (2):
net: hns3: remove hns3_fill_desc_tso
net: hns3: move DMA map into hns3_fill_desc
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 137 +++++++++++-------------
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 3 +-
2 files changed, 62 insertions(+), 78 deletions(-)
--
1.9.1
^ permalink raw reply
* Re: patch "staging: MAINTAINERS: remove obsolete IPX staging directory" added to staging-testing
From: Greg KH @ 2018-10-16 11:51 UTC (permalink / raw)
To: Joe Perches; +Cc: LKML, netdev
In-Reply-To: <249981aa5af26de07565946fbe806e3c01f26c07.camel@perches.com>
On Tue, Oct 16, 2018 at 04:43:30AM -0700, Joe Perches wrote:
> On Tue, 2018-10-16 at 12:46 +0200, gregkh@linuxfoundation.org wrote:
> > This is a note to let you know that I've just added the patch titled
> >
> > staging: MAINTAINERS: remove obsolete IPX staging directoryts
> []
> > From dd71c89b2c1ae049d2018107e9d7a41261db3f7b Mon Sep 17 00:00:00 2001
> > From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Date: Tue, 16 Oct 2018 12:45:13 +0200
> > Subject: staging: MAINTAINERS: remove obsolete IPX staging directory
> >
> > The IPX code was removed from staging back in November 2017, but the
> > MAINTAINERS entry stuck around. Remove the invalid directory from the
> > file as it does not actually point to anything anymore.
>
> As the uapi ipx file still exists but the functionality
> has been removed, perhaps some compiler message like
>
> #pragma message "IPX functionality removed as of linux kernel v4.18"
>
> should be added to the uapi file.
Eventually those files should be removed by the networking developers.
It will take a bit of work for them to do it, it's not a straight
"delete the file" work.
thanks,
greg k-h
^ permalink raw reply
* Re: patch "staging: MAINTAINERS: remove obsolete IPX staging directory" added to staging-testing
From: Joe Perches @ 2018-10-16 11:43 UTC (permalink / raw)
To: gregkh; +Cc: LKML, netdev
In-Reply-To: <15396867728666@kroah.com>
On Tue, 2018-10-16 at 12:46 +0200, gregkh@linuxfoundation.org wrote:
> This is a note to let you know that I've just added the patch titled
>
> staging: MAINTAINERS: remove obsolete IPX staging directoryts
[]
> From dd71c89b2c1ae049d2018107e9d7a41261db3f7b Mon Sep 17 00:00:00 2001
> From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Date: Tue, 16 Oct 2018 12:45:13 +0200
> Subject: staging: MAINTAINERS: remove obsolete IPX staging directory
>
> The IPX code was removed from staging back in November 2017, but the
> MAINTAINERS entry stuck around. Remove the invalid directory from the
> file as it does not actually point to anything anymore.
As the uapi ipx file still exists but the functionality
has been removed, perhaps some compiler message like
#pragma message "IPX functionality removed as of linux kernel v4.18"
should be added to the uapi file.
> diff --git a/MAINTAINERS b/MAINTAINERS
> []
> @@ -7672,7 +7672,6 @@ IPX NETWORK LAYER
> L: netdev@vger.kernel.org
> S: Obsolete
> F: include/uapi/linux/ipx.h
> -F: drivers/staging/ipx/
>
> IRQ DOMAINS (IRQ NUMBER MAPPING LIBRARY)
> M: Marc Zyngier <marc.zyngier@arm.com>
^ permalink raw reply
* Re: KASAN: use-after-free Read in sctp_id2assoc
From: Neil Horman @ 2018-10-16 11:28 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: Dmitry Vyukov, syzbot, David Miller, LKML, linux-sctp, netdev,
syzkaller-bugs, Vladislav Yasevich
In-Reply-To: <20181010184011.GE6761@localhost.localdomain>
On Wed, Oct 10, 2018 at 03:40:11PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Oct 10, 2018 at 08:28:22PM +0200, Dmitry Vyukov wrote:
> > On Wed, Oct 10, 2018 at 8:13 PM, Marcelo Ricardo Leitner
> > <marcelo.leitner@gmail.com> wrote:
> > > On Wed, Oct 10, 2018 at 05:28:12PM +0200, Dmitry Vyukov wrote:
> > >> On Fri, Oct 5, 2018 at 4:58 PM, Marcelo Ricardo Leitner
> > >> <marcelo.leitner@gmail.com> wrote:
> > >> > On Thu, Oct 04, 2018 at 01:48:03AM -0700, syzbot wrote:
> > >> >> Hello,
> > >> >>
> > >> >> syzbot found the following crash on:
> > >> >>
> > >> >> HEAD commit: 4e6d47206c32 tls: Add support for inplace records encryption
> > >> >> git tree: net-next
> > >> >> console output: https://syzkaller.appspot.com/x/log.txt?x=13834b81400000
> > >> >> kernel config: https://syzkaller.appspot.com/x/.config?x=e569aa5632ebd436
> > >> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c7dd55d7aec49d48e49a
> > >> >> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> > >> >>
> > >> >> Unfortunately, I don't have any reproducer for this crash yet.
> > >> >>
> > >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > >> >> Reported-by: syzbot+c7dd55d7aec49d48e49a@syzkaller.appspotmail.com
> > >> >>
> > >> >> netlink: 'syz-executor1': attribute type 1 has an invalid length.
> > >> >> ==================================================================
> > >> >> BUG: KASAN: use-after-free in sctp_id2assoc+0x3a7/0x3e0
> > >> >> net/sctp/socket.c:276
> > >> >> Read of size 8 at addr ffff880195b3eb20 by task syz-executor2/15454
> > >> >>
> > >> >> CPU: 1 PID: 15454 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #242
> > >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > >> >> Google 01/01/2011
> > >> >> Call Trace:
> > >> >> __dump_stack lib/dump_stack.c:77 [inline]
> > >> >> dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
> > >> >> print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
> > >> >> kasan_report_error mm/kasan/report.c:354 [inline]
> > >> >> kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
> > >> >> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
> > >> >> sctp_id2assoc+0x3a7/0x3e0 net/sctp/socket.c:276
> > >> >
> > >> > I'm not seeing yet how this could happen.
> > >> > All sockopts here are serialized by sock_lock.
> > >> > do_peeloff here would create another socket, but the issue was
> > >> > triggered before that.
> > >> > The same function that freed this memory, also removes the entry from
> > >> > idr mapping, so this entry shouldn't be there anymore.
> > >> >
> > >> > I have only two theories so far:
> > >> > - an issue with IDR/RCU.
> > >> > - something else happened that just the call stacks are not revealing.
> > >>
> > >> The "asoc->base.sk != sk" check after idr_find suggests that we don't
> > >> actually know what sock it belongs to. And if we don't know then
> > >
> > > Right. The check is more because the IDR is global and not per socket
> > > (and we don't want sockets accessing asocs from other sockets), and not
> > > that the asoc may move to another socket in between, but it also
> > > protects from such cases, yes.
> > >
> > >> locking this sock can't help keeping another sock association alive.
> > >> Am I missing something obvious here? Should we take assoc ref while we
> > >
> > > Not sure. Maybe I am. Thanks for looking into this, btw.
> > >
> > >> are still holding sctp_assocs_id_lock?
> > >
> > > Shouldn't be needed.
> > >
> > > Solely by the call stacks:
> > > - we tried to establish a new asoc from a sctp_connect() call,
> > > blocking one.
> > > - it slept waiting for the connect
> > > - (something closed the asoc in between the sleeps, because it freed
> > > the asoc right when waking up on sctp_wait_for_connect())
> > > - it freed the asoc after sleeping on it on sctp_wait_for_connect [A]
> > > - another thread tried to peeloff that asoc [B]
> > >
> > > For [B] to access the asoc in question, it had to take the same sock
> > > lock [A] had taken, and then the idr should not return an asoc in
> > > sctp_i2asoc(). Note that we can't peeloff an asoc twice, thus why
> > > the certainty here.
> > >
> > > If [B] actually kicked in before the sleep resumed, that should have
> > > been fine because it took the same sock lock [A] would have to
> > > re-take. In this case an asoc would have been returned by
> > > sctp_id2asoc(), the asoc would have been moved to a new socket, but
> > > all while holding the original socket sock lock.
> >
> > But why A and B use the same lock?
> >
> > sctp_assocs_id is global, so it contains asocs from all sockets, right?
> > assoc id comes straight from userspaces.
> > So isn't it possible that B uses completely different sock but passes
> > assoc id from the A sock? Then B should find assoc in sctp_assocs_id,
> > and at the point of "asoc->base.sk != sk" check the assoc can be
> > already freed.
>
> That explains it. Somehow I was thinking the issue was for reading
> ->dead instead. Now it's pretty clear. Then this should be the patch
> we want. Can you please give it a spin? (only compile tested)
>
> While holding the spinlock, an entry cannot be removed from the idr
> and thus it cannot be freed. So even if the app uses an id from
> another socket, it will still be there.
>
> ---8<---
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index f73e9d38d5ba..a7722f43aa69 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -271,11 +271,10 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
>
> spin_lock_bh(&sctp_assocs_id_lock);
> asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
> + if (asoc && (asoc->base.sk != sk || asoc->base.dead))
> + asoc = NULL;
> spin_unlock_bh(&sctp_assocs_id_lock);
>
> - if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
> - return NULL;
> -
> return asoc;
> }
>
>
Marcello, can you post this with a proper changelog commit please? Based on the
bug report, and description of the problem, I think we can all agree this is a
sane fix
Thanks
Neil
^ permalink raw reply
* [PATCH 1/1] net-next/hinic: add checksum offload and TSO support
From: Xue Chaojing @ 2018-10-16 11:12 UTC (permalink / raw)
To: davem
Cc: linux-kernel, xuechaojing, netdev, zhaochen6, tony.qu, yin.yinshi,
luoshaokai, fy.wang, luoxianjun
From: Zhao Chen <zhaochen6@huawei.com>
This patch adds checksum offload and TSO support for the HiNIC
driver. Perfomance test (Iperf) shows more than 100% improvement
in TCP streams.
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
---
.../net/ethernet/huawei/hinic/hinic_hw_dev.h | 2 +
.../net/ethernet/huawei/hinic/hinic_hw_qp.c | 121 +++++--
.../net/ethernet/huawei/hinic/hinic_hw_qp.h | 27 ++
.../net/ethernet/huawei/hinic/hinic_hw_wq.c | 14 +
.../net/ethernet/huawei/hinic/hinic_hw_wq.h | 2 +
.../net/ethernet/huawei/hinic/hinic_hw_wqe.h | 97 ++++--
.../net/ethernet/huawei/hinic/hinic_main.c | 23 +-
.../net/ethernet/huawei/hinic/hinic_port.c | 32 ++
.../net/ethernet/huawei/hinic/hinic_port.h | 18 ++
drivers/net/ethernet/huawei/hinic/hinic_tx.c | 295 +++++++++++++++++-
10 files changed, 571 insertions(+), 60 deletions(-)
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
index 0f5563f3b779..097b5502603f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_dev.h
@@ -58,6 +58,8 @@ enum hinic_port_cmd {
HINIC_PORT_CMD_GET_GLOBAL_QPN = 102,
+ HINIC_PORT_CMD_SET_TSO = 112,
+
HINIC_PORT_CMD_GET_CAP = 170,
};
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c
index cb239627770f..967c993d5303 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.c
@@ -70,8 +70,6 @@
#define SQ_MASKED_IDX(sq, idx) ((idx) & (sq)->wq->mask)
#define RQ_MASKED_IDX(rq, idx) ((idx) & (rq)->wq->mask)
-#define TX_MAX_MSS_DEFAULT 0x3E00
-
enum sq_wqe_type {
SQ_NORMAL_WQE = 0,
};
@@ -494,33 +492,16 @@ static void sq_prepare_ctrl(struct hinic_sq_ctrl *ctrl, u16 prod_idx,
HINIC_SQ_CTRL_SET(SQ_NORMAL_WQE, DATA_FORMAT) |
HINIC_SQ_CTRL_SET(ctrl_size, LEN);
- ctrl->queue_info = HINIC_SQ_CTRL_SET(TX_MAX_MSS_DEFAULT,
- QUEUE_INFO_MSS);
+ ctrl->queue_info = HINIC_SQ_CTRL_SET(HINIC_MSS_DEFAULT,
+ QUEUE_INFO_MSS) |
+ HINIC_SQ_CTRL_SET(1, QUEUE_INFO_UC);
}
static void sq_prepare_task(struct hinic_sq_task *task)
{
- task->pkt_info0 =
- HINIC_SQ_TASK_INFO0_SET(0, L2HDR_LEN) |
- HINIC_SQ_TASK_INFO0_SET(HINIC_L4_OFF_DISABLE, L4_OFFLOAD) |
- HINIC_SQ_TASK_INFO0_SET(HINIC_OUTER_L3TYPE_UNKNOWN,
- INNER_L3TYPE) |
- HINIC_SQ_TASK_INFO0_SET(HINIC_VLAN_OFF_DISABLE,
- VLAN_OFFLOAD) |
- HINIC_SQ_TASK_INFO0_SET(HINIC_PKT_NOT_PARSED, PARSE_FLAG);
-
- task->pkt_info1 =
- HINIC_SQ_TASK_INFO1_SET(HINIC_MEDIA_UNKNOWN, MEDIA_TYPE) |
- HINIC_SQ_TASK_INFO1_SET(0, INNER_L4_LEN) |
- HINIC_SQ_TASK_INFO1_SET(0, INNER_L3_LEN);
-
- task->pkt_info2 =
- HINIC_SQ_TASK_INFO2_SET(0, TUNNEL_L4_LEN) |
- HINIC_SQ_TASK_INFO2_SET(0, OUTER_L3_LEN) |
- HINIC_SQ_TASK_INFO2_SET(HINIC_TUNNEL_L4TYPE_UNKNOWN,
- TUNNEL_L4TYPE) |
- HINIC_SQ_TASK_INFO2_SET(HINIC_OUTER_L3TYPE_UNKNOWN,
- OUTER_L3TYPE);
+ task->pkt_info0 = 0;
+ task->pkt_info1 = 0;
+ task->pkt_info2 = 0;
task->ufo_v6_identify = 0;
@@ -529,6 +510,86 @@ static void sq_prepare_task(struct hinic_sq_task *task)
task->zero_pad = 0;
}
+void hinic_task_set_l2hdr(struct hinic_sq_task *task, u32 len)
+{
+ task->pkt_info0 |= HINIC_SQ_TASK_INFO0_SET(len, L2HDR_LEN);
+}
+
+void hinic_task_set_outter_l3(struct hinic_sq_task *task,
+ enum hinic_l3_offload_type l3_type,
+ u32 network_len)
+{
+ task->pkt_info2 |= HINIC_SQ_TASK_INFO2_SET(l3_type, OUTER_L3TYPE) |
+ HINIC_SQ_TASK_INFO2_SET(network_len, OUTER_L3LEN);
+}
+
+void hinic_task_set_inner_l3(struct hinic_sq_task *task,
+ enum hinic_l3_offload_type l3_type,
+ u32 network_len)
+{
+ task->pkt_info0 |= HINIC_SQ_TASK_INFO0_SET(l3_type, INNER_L3TYPE);
+ task->pkt_info1 |= HINIC_SQ_TASK_INFO1_SET(network_len, INNER_L3LEN);
+}
+
+void hinic_task_set_tunnel_l4(struct hinic_sq_task *task,
+ enum hinic_l4_offload_type l4_type,
+ u32 tunnel_len)
+{
+ task->pkt_info2 |= HINIC_SQ_TASK_INFO2_SET(l4_type, TUNNEL_L4TYPE) |
+ HINIC_SQ_TASK_INFO2_SET(tunnel_len, TUNNEL_L4LEN);
+}
+
+void hinic_set_cs_inner_l4(struct hinic_sq_task *task, u32 *queue_info,
+ enum hinic_l4_offload_type l4_offload,
+ u32 l4_len, u32 offset)
+{
+ u32 tcp_udp_cs = 0, sctp = 0;
+ u32 mss = HINIC_MSS_DEFAULT;
+
+ if (l4_offload == TCP_OFFLOAD_ENABLE ||
+ l4_offload == UDP_OFFLOAD_ENABLE)
+ tcp_udp_cs = 1;
+ else if (l4_offload == SCTP_OFFLOAD_ENABLE)
+ sctp = 1;
+
+ task->pkt_info0 |= HINIC_SQ_TASK_INFO0_SET(l4_offload, L4_OFFLOAD);
+ task->pkt_info1 |= HINIC_SQ_TASK_INFO1_SET(l4_len, INNER_L4LEN);
+
+ *queue_info |= HINIC_SQ_CTRL_SET(offset, QUEUE_INFO_PLDOFF) |
+ HINIC_SQ_CTRL_SET(tcp_udp_cs, QUEUE_INFO_TCPUDP_CS) |
+ HINIC_SQ_CTRL_SET(sctp, QUEUE_INFO_SCTP);
+
+ *queue_info = HINIC_SQ_CTRL_CLEAR(*queue_info, QUEUE_INFO_MSS);
+ *queue_info |= HINIC_SQ_CTRL_SET(mss, QUEUE_INFO_MSS);
+}
+
+void hinic_set_tso_inner_l4(struct hinic_sq_task *task, u32 *queue_info,
+ enum hinic_l4_offload_type l4_offload,
+ u32 l4_len, u32 offset, u32 ip_ident, u32 mss)
+{
+ u32 tso = 0, ufo = 0;
+
+ if (l4_offload == TCP_OFFLOAD_ENABLE)
+ tso = 1;
+ else if (l4_offload == UDP_OFFLOAD_ENABLE)
+ ufo = 1;
+
+ task->ufo_v6_identify = ip_ident;
+
+ task->pkt_info0 |= HINIC_SQ_TASK_INFO0_SET(l4_offload, L4_OFFLOAD);
+ task->pkt_info0 |= HINIC_SQ_TASK_INFO0_SET(tso || ufo, TSO_FLAG);
+ task->pkt_info1 |= HINIC_SQ_TASK_INFO1_SET(l4_len, INNER_L4LEN);
+
+ *queue_info |= HINIC_SQ_CTRL_SET(offset, QUEUE_INFO_PLDOFF) |
+ HINIC_SQ_CTRL_SET(tso, QUEUE_INFO_TSO) |
+ HINIC_SQ_CTRL_SET(ufo, QUEUE_INFO_UFO) |
+ HINIC_SQ_CTRL_SET(!!l4_offload, QUEUE_INFO_TCPUDP_CS);
+
+ /* set MSS value */
+ *queue_info = HINIC_SQ_CTRL_CLEAR(*queue_info, QUEUE_INFO_MSS);
+ *queue_info |= HINIC_SQ_CTRL_SET(mss, QUEUE_INFO_MSS);
+}
+
/**
* hinic_sq_prepare_wqe - prepare wqe before insert to the queue
* @sq: send queue
@@ -612,6 +673,16 @@ struct hinic_sq_wqe *hinic_sq_get_wqe(struct hinic_sq *sq,
return &hw_wqe->sq_wqe;
}
+/**
+ * hinic_sq_return_wqe - return the wqe to the sq
+ * @sq: send queue
+ * @wqe_size: the size of the wqe
+ **/
+void hinic_sq_return_wqe(struct hinic_sq *sq, unsigned int wqe_size)
+{
+ hinic_return_wqe(sq->wq, wqe_size);
+}
+
/**
* hinic_sq_write_wqe - write the wqe to the sq
* @sq: send queue
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
index 6c84f83ec283..a0dc63a4bfc7 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_qp.h
@@ -149,6 +149,31 @@ int hinic_get_sq_free_wqebbs(struct hinic_sq *sq);
int hinic_get_rq_free_wqebbs(struct hinic_rq *rq);
+void hinic_task_set_l2hdr(struct hinic_sq_task *task, u32 len);
+
+void hinic_task_set_outter_l3(struct hinic_sq_task *task,
+ enum hinic_l3_offload_type l3_type,
+ u32 network_len);
+
+void hinic_task_set_inner_l3(struct hinic_sq_task *task,
+ enum hinic_l3_offload_type l3_type,
+ u32 network_len);
+
+void hinic_task_set_tunnel_l4(struct hinic_sq_task *task,
+ enum hinic_l4_offload_type l4_type,
+ u32 tunnel_len);
+
+void hinic_set_cs_inner_l4(struct hinic_sq_task *task,
+ u32 *queue_info,
+ enum hinic_l4_offload_type l4_offload,
+ u32 l4_len, u32 offset);
+
+void hinic_set_tso_inner_l4(struct hinic_sq_task *task,
+ u32 *queue_info,
+ enum hinic_l4_offload_type l4_offload,
+ u32 l4_len,
+ u32 offset, u32 ip_ident, u32 mss);
+
void hinic_sq_prepare_wqe(struct hinic_sq *sq, u16 prod_idx,
struct hinic_sq_wqe *wqe, struct hinic_sge *sges,
int nr_sges);
@@ -159,6 +184,8 @@ void hinic_sq_write_db(struct hinic_sq *sq, u16 prod_idx, unsigned int wqe_size,
struct hinic_sq_wqe *hinic_sq_get_wqe(struct hinic_sq *sq,
unsigned int wqe_size, u16 *prod_idx);
+void hinic_sq_return_wqe(struct hinic_sq *sq, unsigned int wqe_size);
+
void hinic_sq_write_wqe(struct hinic_sq *sq, u16 prod_idx,
struct hinic_sq_wqe *wqe, struct sk_buff *skb,
unsigned int wqe_size);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c b/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
index 3e3181c089bd..f92f1bf3901a 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.c
@@ -774,6 +774,20 @@ struct hinic_hw_wqe *hinic_get_wqe(struct hinic_wq *wq, unsigned int wqe_size,
return WQ_PAGE_ADDR(wq, *prod_idx) + WQE_PAGE_OFF(wq, *prod_idx);
}
+/**
+ * hinic_return_wqe - return the wqe when transmit failed
+ * @wq: wq to return wqe
+ * @wqe_size: wqe size
+ **/
+void hinic_return_wqe(struct hinic_wq *wq, unsigned int wqe_size)
+{
+ int num_wqebbs = ALIGN(wqe_size, wq->wqebb_size) / wq->wqebb_size;
+
+ atomic_sub(num_wqebbs, &wq->prod_idx);
+
+ atomic_add(num_wqebbs, &wq->delta);
+}
+
/**
* hinic_put_wqe - return the wqe place to use for a new wqe
* @wq: wq to return wqe
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h
index 9c030a0f035e..9b66545ba563 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_wq.h
@@ -104,6 +104,8 @@ void hinic_wq_free(struct hinic_wqs *wqs, struct hinic_wq *wq);
struct hinic_hw_wqe *hinic_get_wqe(struct hinic_wq *wq, unsigned int wqe_size,
u16 *prod_idx);
+void hinic_return_wqe(struct hinic_wq *wq, unsigned int wqe_size);
+
void hinic_put_wqe(struct hinic_wq *wq, unsigned int wqe_size);
struct hinic_hw_wqe *hinic_read_wqe(struct hinic_wq *wq, unsigned int wqe_size,
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_hw_wqe.h b/drivers/net/ethernet/huawei/hinic/hinic_hw_wqe.h
index bc73485483c5..9754d6ed5f4a 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_hw_wqe.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_hw_wqe.h
@@ -62,19 +62,33 @@
(((val) >> HINIC_CMDQ_WQE_HEADER_##member##_SHIFT) \
& HINIC_CMDQ_WQE_HEADER_##member##_MASK)
-#define HINIC_SQ_CTRL_BUFDESC_SECT_LEN_SHIFT 0
-#define HINIC_SQ_CTRL_TASKSECT_LEN_SHIFT 16
-#define HINIC_SQ_CTRL_DATA_FORMAT_SHIFT 22
-#define HINIC_SQ_CTRL_LEN_SHIFT 29
-
-#define HINIC_SQ_CTRL_BUFDESC_SECT_LEN_MASK 0xFF
-#define HINIC_SQ_CTRL_TASKSECT_LEN_MASK 0x1F
-#define HINIC_SQ_CTRL_DATA_FORMAT_MASK 0x1
-#define HINIC_SQ_CTRL_LEN_MASK 0x3
-
-#define HINIC_SQ_CTRL_QUEUE_INFO_MSS_SHIFT 13
-
-#define HINIC_SQ_CTRL_QUEUE_INFO_MSS_MASK 0x3FFF
+#define HINIC_SQ_CTRL_BUFDESC_SECT_LEN_SHIFT 0
+#define HINIC_SQ_CTRL_TASKSECT_LEN_SHIFT 16
+#define HINIC_SQ_CTRL_DATA_FORMAT_SHIFT 22
+#define HINIC_SQ_CTRL_LEN_SHIFT 29
+
+#define HINIC_SQ_CTRL_BUFDESC_SECT_LEN_MASK 0xFF
+#define HINIC_SQ_CTRL_TASKSECT_LEN_MASK 0x1F
+#define HINIC_SQ_CTRL_DATA_FORMAT_MASK 0x1
+#define HINIC_SQ_CTRL_LEN_MASK 0x3
+
+#define HINIC_SQ_CTRL_QUEUE_INFO_PLDOFF_SHIFT 2
+#define HINIC_SQ_CTRL_QUEUE_INFO_UFO_SHIFT 10
+#define HINIC_SQ_CTRL_QUEUE_INFO_TSO_SHIFT 11
+#define HINIC_SQ_CTRL_QUEUE_INFO_TCPUDP_CS_SHIFT 12
+#define HINIC_SQ_CTRL_QUEUE_INFO_MSS_SHIFT 13
+#define HINIC_SQ_CTRL_QUEUE_INFO_SCTP_SHIFT 27
+#define HINIC_SQ_CTRL_QUEUE_INFO_UC_SHIFT 28
+#define HINIC_SQ_CTRL_QUEUE_INFO_PRI_SHIFT 29
+
+#define HINIC_SQ_CTRL_QUEUE_INFO_PLDOFF_MASK 0xFF
+#define HINIC_SQ_CTRL_QUEUE_INFO_UFO_MASK 0x1
+#define HINIC_SQ_CTRL_QUEUE_INFO_TSO_MASK 0x1
+#define HINIC_SQ_CTRL_QUEUE_INFO_TCPUDP_CS_MASK 0x1
+#define HINIC_SQ_CTRL_QUEUE_INFO_MSS_MASK 0x3FFF
+#define HINIC_SQ_CTRL_QUEUE_INFO_SCTP_MASK 0x1
+#define HINIC_SQ_CTRL_QUEUE_INFO_UC_MASK 0x1
+#define HINIC_SQ_CTRL_QUEUE_INFO_PRI_MASK 0x7
#define HINIC_SQ_CTRL_SET(val, member) \
(((u32)(val) & HINIC_SQ_CTRL_##member##_MASK) \
@@ -84,6 +98,10 @@
(((val) >> HINIC_SQ_CTRL_##member##_SHIFT) \
& HINIC_SQ_CTRL_##member##_MASK)
+#define HINIC_SQ_CTRL_CLEAR(val, member) \
+ ((u32)(val) & (~(HINIC_SQ_CTRL_##member##_MASK \
+ << HINIC_SQ_CTRL_##member##_SHIFT)))
+
#define HINIC_SQ_TASK_INFO0_L2HDR_LEN_SHIFT 0
#define HINIC_SQ_TASK_INFO0_L4_OFFLOAD_SHIFT 8
#define HINIC_SQ_TASK_INFO0_INNER_L3TYPE_SHIFT 10
@@ -108,28 +126,28 @@
/* 8 bits reserved */
#define HINIC_SQ_TASK_INFO1_MEDIA_TYPE_SHIFT 8
-#define HINIC_SQ_TASK_INFO1_INNER_L4_LEN_SHIFT 16
-#define HINIC_SQ_TASK_INFO1_INNER_L3_LEN_SHIFT 24
+#define HINIC_SQ_TASK_INFO1_INNER_L4LEN_SHIFT 16
+#define HINIC_SQ_TASK_INFO1_INNER_L3LEN_SHIFT 24
/* 8 bits reserved */
#define HINIC_SQ_TASK_INFO1_MEDIA_TYPE_MASK 0xFF
-#define HINIC_SQ_TASK_INFO1_INNER_L4_LEN_MASK 0xFF
-#define HINIC_SQ_TASK_INFO1_INNER_L3_LEN_MASK 0xFF
+#define HINIC_SQ_TASK_INFO1_INNER_L4LEN_MASK 0xFF
+#define HINIC_SQ_TASK_INFO1_INNER_L3LEN_MASK 0xFF
#define HINIC_SQ_TASK_INFO1_SET(val, member) \
(((u32)(val) & HINIC_SQ_TASK_INFO1_##member##_MASK) << \
HINIC_SQ_TASK_INFO1_##member##_SHIFT)
-#define HINIC_SQ_TASK_INFO2_TUNNEL_L4_LEN_SHIFT 0
-#define HINIC_SQ_TASK_INFO2_OUTER_L3_LEN_SHIFT 12
-#define HINIC_SQ_TASK_INFO2_TUNNEL_L4TYPE_SHIFT 19
+#define HINIC_SQ_TASK_INFO2_TUNNEL_L4LEN_SHIFT 0
+#define HINIC_SQ_TASK_INFO2_OUTER_L3LEN_SHIFT 8
+#define HINIC_SQ_TASK_INFO2_TUNNEL_L4TYPE_SHIFT 16
/* 1 bit reserved */
-#define HINIC_SQ_TASK_INFO2_OUTER_L3TYPE_SHIFT 22
+#define HINIC_SQ_TASK_INFO2_OUTER_L3TYPE_SHIFT 24
/* 8 bits reserved */
-#define HINIC_SQ_TASK_INFO2_TUNNEL_L4_LEN_MASK 0xFFF
-#define HINIC_SQ_TASK_INFO2_OUTER_L3_LEN_MASK 0x7F
-#define HINIC_SQ_TASK_INFO2_TUNNEL_L4TYPE_MASK 0x3
+#define HINIC_SQ_TASK_INFO2_TUNNEL_L4LEN_MASK 0xFF
+#define HINIC_SQ_TASK_INFO2_OUTER_L3LEN_MASK 0xFF
+#define HINIC_SQ_TASK_INFO2_TUNNEL_L4TYPE_MASK 0x7
/* 1 bit reserved */
#define HINIC_SQ_TASK_INFO2_OUTER_L3TYPE_MASK 0x3
/* 8 bits reserved */
@@ -187,12 +205,15 @@
sizeof(struct hinic_sq_task) + \
(nr_sges) * sizeof(struct hinic_sq_bufdesc))
-#define HINIC_SCMD_DATA_LEN 16
+#define HINIC_SCMD_DATA_LEN 16
+
+#define HINIC_MAX_SQ_BUFDESCS 17
-#define HINIC_MAX_SQ_BUFDESCS 17
+#define HINIC_SQ_WQE_MAX_SIZE 320
+#define HINIC_RQ_WQE_SIZE 32
-#define HINIC_SQ_WQE_MAX_SIZE 320
-#define HINIC_RQ_WQE_SIZE 32
+#define HINIC_MSS_DEFAULT 0x3E00
+#define HINIC_MSS_MIN 0x50
enum hinic_l4offload_type {
HINIC_L4_OFF_DISABLE = 0,
@@ -211,6 +232,26 @@ enum hinic_pkt_parsed {
HINIC_PKT_PARSED = 1,
};
+enum hinic_l3_offload_type {
+ L3TYPE_UNKNOWN = 0,
+ IPV6_PKT = 1,
+ IPV4_PKT_NO_CHKSUM_OFFLOAD = 2,
+ IPV4_PKT_WITH_CHKSUM_OFFLOAD = 3,
+};
+
+enum hinic_l4_offload_type {
+ OFFLOAD_DISABLE = 0,
+ TCP_OFFLOAD_ENABLE = 1,
+ SCTP_OFFLOAD_ENABLE = 2,
+ UDP_OFFLOAD_ENABLE = 3,
+};
+
+enum hinic_l4_tunnel_type {
+ NOT_TUNNEL,
+ TUNNEL_UDP_NO_CSUM,
+ TUNNEL_UDP_CSUM,
+};
+
enum hinic_outer_l3type {
HINIC_OUTER_L3TYPE_UNKNOWN = 0,
HINIC_OUTER_L3TYPE_IPV6 = 1,
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index 4a8f82938ed5..fdf2bdb6b0d0 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -805,7 +805,8 @@ static const struct net_device_ops hinic_netdev_ops = {
static void netdev_features_init(struct net_device *netdev)
{
- netdev->hw_features = NETIF_F_SG | NETIF_F_HIGHDMA;
+ netdev->hw_features = NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_IP_CSUM |
+ NETIF_F_IPV6_CSUM | NETIF_F_TSO | NETIF_F_TSO6;
netdev->vlan_features = netdev->hw_features;
@@ -863,6 +864,20 @@ static void link_status_event_handler(void *handle, void *buf_in, u16 in_size,
*out_size = sizeof(*ret_link_status);
}
+static int set_features(struct hinic_dev *nic_dev,
+ netdev_features_t pre_features,
+ netdev_features_t features, bool force_change)
+{
+ netdev_features_t changed = force_change ? ~0 : pre_features ^ features;
+ int err = 0;
+
+ if (changed & NETIF_F_TSO)
+ err = hinic_port_set_tso(nic_dev, (features & NETIF_F_TSO) ?
+ HINIC_TSO_ENABLE : HINIC_TSO_DISABLE);
+
+ return err;
+}
+
/**
* nic_dev_init - Initialize the NIC device
* @pdev: the NIC pci device
@@ -963,7 +978,12 @@ static int nic_dev_init(struct pci_dev *pdev)
hinic_hwdev_cb_register(nic_dev->hwdev, HINIC_MGMT_MSG_CMD_LINK_STATUS,
nic_dev, link_status_event_handler);
+ err = set_features(nic_dev, 0, nic_dev->netdev->features, true);
+ if (err)
+ goto err_set_features;
+
SET_NETDEV_DEV(netdev, &pdev->dev);
+
err = register_netdev(netdev);
if (err) {
dev_err(&pdev->dev, "Failed to register netdev\n");
@@ -973,6 +993,7 @@ static int nic_dev_init(struct pci_dev *pdev)
return 0;
err_reg_netdev:
+err_set_features:
hinic_hwdev_cb_unregister(nic_dev->hwdev,
HINIC_MGMT_MSG_CMD_LINK_STATUS);
cancel_work_sync(&rx_mode_work->work);
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_port.c b/drivers/net/ethernet/huawei/hinic/hinic_port.c
index 4d4e3f05fb5f..7575a7d3bd9f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_port.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_port.c
@@ -377,3 +377,35 @@ int hinic_port_get_cap(struct hinic_dev *nic_dev,
return 0;
}
+
+/**
+ * hinic_port_set_tso - set port tso configuration
+ * @nic_dev: nic device
+ * @state: the tso state to set
+ *
+ * Return 0 - Success, negative - Failure
+ **/
+int hinic_port_set_tso(struct hinic_dev *nic_dev, enum hinic_tso_state state)
+{
+ struct hinic_hwdev *hwdev = nic_dev->hwdev;
+ struct hinic_hwif *hwif = hwdev->hwif;
+ struct hinic_tso_config tso_cfg = {0};
+ struct pci_dev *pdev = hwif->pdev;
+ u16 out_size;
+ int err;
+
+ tso_cfg.func_id = HINIC_HWIF_FUNC_IDX(hwif);
+ tso_cfg.tso_en = state;
+
+ err = hinic_port_msg_cmd(hwdev, HINIC_PORT_CMD_SET_TSO,
+ &tso_cfg, sizeof(tso_cfg),
+ &tso_cfg, &out_size);
+ if (err || out_size != sizeof(tso_cfg) || tso_cfg.status) {
+ dev_err(&pdev->dev,
+ "Failed to set port tso, ret = %d\n",
+ tso_cfg.status);
+ return -EINVAL;
+ }
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_port.h b/drivers/net/ethernet/huawei/hinic/hinic_port.h
index 9404365195dd..f6e3220fe28f 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_port.h
+++ b/drivers/net/ethernet/huawei/hinic/hinic_port.h
@@ -72,6 +72,11 @@ enum hinic_speed {
HINIC_SPEED_UNKNOWN = 0xFF,
};
+enum hinic_tso_state {
+ HINIC_TSO_DISABLE = 0,
+ HINIC_TSO_ENABLE = 1,
+};
+
struct hinic_port_mac_cmd {
u8 status;
u8 version;
@@ -167,6 +172,17 @@ struct hinic_port_cap {
u8 rsvd2[3];
};
+struct hinic_tso_config {
+ u8 status;
+ u8 version;
+ u8 rsvd0[6];
+
+ u16 func_id;
+ u16 rsvd1;
+ u8 tso_en;
+ u8 resv2[3];
+};
+
int hinic_port_add_mac(struct hinic_dev *nic_dev, const u8 *addr,
u16 vlan_id);
@@ -195,4 +211,6 @@ int hinic_port_set_func_state(struct hinic_dev *nic_dev,
int hinic_port_get_cap(struct hinic_dev *nic_dev,
struct hinic_port_cap *port_cap);
+int hinic_port_set_tso(struct hinic_dev *nic_dev, enum hinic_tso_state state);
+
#endif
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
index c5fca0356c9c..c98d968d94df 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
@@ -26,6 +26,13 @@
#include <linux/skbuff.h>
#include <linux/smp.h>
#include <asm/byteorder.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <linux/sctp.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+#include <net/checksum.h>
+#include <net/ip6_checksum.h>
#include "hinic_common.h"
#include "hinic_hw_if.h"
@@ -45,9 +52,31 @@
#define CI_UPDATE_NO_PENDING 0
#define CI_UPDATE_NO_COALESC 0
-#define HW_CONS_IDX(sq) be16_to_cpu(*(u16 *)((sq)->hw_ci_addr))
+#define HW_CONS_IDX(sq) be16_to_cpu(*(u16 *)((sq)->hw_ci_addr))
-#define MIN_SKB_LEN 64
+#define MIN_SKB_LEN 17
+
+#define MAX_PAYLOAD_OFFSET 221
+#define TRANSPORT_OFFSET(l4_hdr, skb) ((u32)((l4_hdr) - (skb)->data))
+
+union hinic_l3 {
+ struct iphdr *v4;
+ struct ipv6hdr *v6;
+ unsigned char *hdr;
+};
+
+union hinic_l4 {
+ struct tcphdr *tcp;
+ struct udphdr *udp;
+ unsigned char *hdr;
+};
+
+enum hinic_offload_type {
+ TX_OFFLOAD_TSO = BIT(0),
+ TX_OFFLOAD_CSUM = BIT(1),
+ TX_OFFLOAD_VLAN = BIT(2),
+ TX_OFFLOAD_INVALID = BIT(3),
+};
/**
* hinic_txq_clean_stats - Clean the statistics of specific queue
@@ -175,6 +204,251 @@ static void tx_unmap_skb(struct hinic_dev *nic_dev, struct sk_buff *skb,
DMA_TO_DEVICE);
}
+static void get_inner_l3_l4_type(struct sk_buff *skb, union hinic_l3 *ip,
+ union hinic_l4 *l4,
+ enum hinic_offload_type offload_type,
+ enum hinic_l3_offload_type *l3_type,
+ u8 *l4_proto)
+{
+ u8 *exthdr;
+
+ if (ip->v4->version == 4) {
+ *l3_type = (offload_type == TX_OFFLOAD_CSUM) ?
+ IPV4_PKT_NO_CHKSUM_OFFLOAD :
+ IPV4_PKT_WITH_CHKSUM_OFFLOAD;
+ *l4_proto = ip->v4->protocol;
+ } else if (ip->v4->version == 6) {
+ *l3_type = IPV6_PKT;
+ exthdr = ip->hdr + sizeof(*ip->v6);
+ *l4_proto = ip->v6->nexthdr;
+ if (exthdr != l4->hdr) {
+ int start = exthdr - skb->data;
+ __be16 frag_off;
+
+ ipv6_skip_exthdr(skb, start, l4_proto, &frag_off);
+ }
+ } else {
+ *l3_type = L3TYPE_UNKNOWN;
+ *l4_proto = 0;
+ }
+}
+
+static void get_inner_l4_info(struct sk_buff *skb, union hinic_l4 *l4,
+ enum hinic_offload_type offload_type, u8 l4_proto,
+ enum hinic_l4_offload_type *l4_offload,
+ u32 *l4_len, u32 *offset)
+{
+ *offset = 0;
+ *l4_len = 0;
+ *l4_offload = OFFLOAD_DISABLE;
+
+ switch (l4_proto) {
+ case IPPROTO_TCP:
+ *l4_offload = TCP_OFFLOAD_ENABLE;
+ /* doff in unit of 4B */
+ *l4_len = l4->tcp->doff * 4;
+ *offset = *l4_len + TRANSPORT_OFFSET(l4->hdr, skb);
+ break;
+
+ case IPPROTO_UDP:
+ *l4_offload = UDP_OFFLOAD_ENABLE;
+ *l4_len = sizeof(struct udphdr);
+ *offset = TRANSPORT_OFFSET(l4->hdr, skb);
+ break;
+
+ case IPPROTO_SCTP:
+ /* only csum offload support sctp */
+ if (offload_type != TX_OFFLOAD_CSUM)
+ break;
+
+ *l4_offload = SCTP_OFFLOAD_ENABLE;
+ *l4_len = sizeof(struct sctphdr);
+ *offset = TRANSPORT_OFFSET(l4->hdr, skb);
+ break;
+
+ default:
+ break;
+ }
+}
+
+static __sum16 csum_magic(union hinic_l3 *ip, unsigned short proto)
+{
+ return (ip->v4->version == 4) ?
+ csum_tcpudp_magic(ip->v4->saddr, ip->v4->daddr, 0, proto, 0) :
+ csum_ipv6_magic(&ip->v6->saddr, &ip->v6->daddr, 0, proto, 0);
+}
+
+static int offload_tso(struct hinic_sq_task *task, u32 *queue_info,
+ struct sk_buff *skb)
+{
+ u32 offset, l4_len, ip_identify, network_hdr_len;
+ enum hinic_l3_offload_type l3_offload;
+ enum hinic_l4_offload_type l4_offload;
+ union hinic_l3 ip;
+ union hinic_l4 l4;
+ u8 l4_proto;
+
+ if (!skb_is_gso(skb))
+ return 0;
+
+ if (skb_cow_head(skb, 0) < 0)
+ return -EPROTONOSUPPORT;
+
+ if (skb->encapsulation) {
+ u32 gso_type = skb_shinfo(skb)->gso_type;
+ u32 tunnel_type = 0;
+ u32 l4_tunnel_len;
+
+ ip.hdr = skb_network_header(skb);
+ l4.hdr = skb_transport_header(skb);
+ network_hdr_len = skb_inner_network_header_len(skb);
+
+ if (ip.v4->version == 4) {
+ ip.v4->tot_len = 0;
+ l3_offload = IPV4_PKT_WITH_CHKSUM_OFFLOAD;
+ } else if (ip.v4->version == 6) {
+ l3_offload = IPV6_PKT;
+ } else {
+ l3_offload = 0;
+ }
+
+ hinic_task_set_outter_l3(task, l3_offload,
+ skb_network_header_len(skb));
+
+ if (gso_type & SKB_GSO_UDP_TUNNEL_CSUM) {
+ l4.udp->check = ~csum_magic(&ip, IPPROTO_UDP);
+ tunnel_type = TUNNEL_UDP_CSUM;
+ } else if (gso_type & SKB_GSO_UDP_TUNNEL) {
+ tunnel_type = TUNNEL_UDP_NO_CSUM;
+ }
+
+ l4_tunnel_len = skb_inner_network_offset(skb) -
+ skb_transport_offset(skb);
+ hinic_task_set_tunnel_l4(task, tunnel_type, l4_tunnel_len);
+
+ ip.hdr = skb_inner_network_header(skb);
+ l4.hdr = skb_inner_transport_header(skb);
+ } else {
+ ip.hdr = skb_network_header(skb);
+ l4.hdr = skb_transport_header(skb);
+ network_hdr_len = skb_network_header_len(skb);
+ }
+
+ /* initialize inner IP header fields */
+ if (ip.v4->version == 4)
+ ip.v4->tot_len = 0;
+ else
+ ip.v6->payload_len = 0;
+
+ get_inner_l3_l4_type(skb, &ip, &l4, TX_OFFLOAD_TSO, &l3_offload,
+ &l4_proto);
+
+ hinic_task_set_inner_l3(task, l3_offload, network_hdr_len);
+
+ ip_identify = 0;
+ if (l4_proto == IPPROTO_TCP)
+ l4.tcp->check = ~csum_magic(&ip, IPPROTO_TCP);
+
+ get_inner_l4_info(skb, &l4, TX_OFFLOAD_TSO, l4_proto, &l4_offload,
+ &l4_len, &offset);
+
+ hinic_set_tso_inner_l4(task, queue_info, l4_offload, l4_len, offset,
+ ip_identify, skb_shinfo(skb)->gso_size);
+
+ return 1;
+}
+
+static int offload_csum(struct hinic_sq_task *task, u32 *queue_info,
+ struct sk_buff *skb)
+{
+ union hinic_l3 ip;
+ union hinic_l4 l4;
+ enum hinic_l3_offload_type l3_type;
+ enum hinic_l4_offload_type l4_offload;
+ u32 offset, l4_len, network_hdr_len;
+ u8 l4_proto;
+
+ if (skb->ip_summed != CHECKSUM_PARTIAL)
+ return 0;
+
+ if (skb->encapsulation) {
+ u32 l4_tunnel_len;
+
+ ip.hdr = skb_network_header(skb);
+
+ if (ip.v4->version == 4)
+ l3_type = IPV4_PKT_NO_CHKSUM_OFFLOAD;
+ else if (ip.v4->version == 6)
+ l3_type = IPV6_PKT;
+ else
+ l3_type = L3TYPE_UNKNOWN;
+
+ hinic_task_set_outter_l3(task, l3_type,
+ skb_network_header_len(skb));
+
+ l4_tunnel_len = skb_inner_network_offset(skb) -
+ skb_transport_offset(skb);
+
+ hinic_task_set_tunnel_l4(task, TUNNEL_UDP_NO_CSUM,
+ l4_tunnel_len);
+
+ ip.hdr = skb_inner_network_header(skb);
+ l4.hdr = skb_inner_transport_header(skb);
+ network_hdr_len = skb_inner_network_header_len(skb);
+ } else {
+ ip.hdr = skb_network_header(skb);
+ l4.hdr = skb_transport_header(skb);
+ network_hdr_len = skb_network_header_len(skb);
+ }
+
+ get_inner_l3_l4_type(skb, &ip, &l4, TX_OFFLOAD_CSUM, &l3_type,
+ &l4_proto);
+
+ hinic_task_set_inner_l3(task, l3_type, network_hdr_len);
+
+ get_inner_l4_info(skb, &l4, TX_OFFLOAD_CSUM, l4_proto, &l4_offload,
+ &l4_len, &offset);
+
+ hinic_set_cs_inner_l4(task, queue_info, l4_offload, l4_len, offset);
+
+ return 1;
+}
+
+static int hinic_tx_offload(struct sk_buff *skb, struct hinic_sq_task *task,
+ u32 *queue_info)
+{
+ enum hinic_offload_type offload = 0;
+ int enabled;
+
+ enabled = offload_tso(task, queue_info, skb);
+ if (enabled > 0) {
+ offload |= TX_OFFLOAD_TSO;
+ } else if (enabled == 0) {
+ enabled = offload_csum(task, queue_info, skb);
+ if (enabled)
+ offload |= TX_OFFLOAD_CSUM;
+ } else {
+ return -EPROTONOSUPPORT;
+ }
+
+ if (offload)
+ hinic_task_set_l2hdr(task, skb_network_offset(skb));
+
+ /* payload offset should not more than 221 */
+ if (HINIC_SQ_CTRL_GET(*queue_info, QUEUE_INFO_PLDOFF) >
+ MAX_PAYLOAD_OFFSET) {
+ return -EPROTONOSUPPORT;
+ }
+
+ /* mss should not less than 80 */
+ if (HINIC_SQ_CTRL_GET(*queue_info, QUEUE_INFO_MSS) < HINIC_MSS_MIN) {
+ *queue_info = HINIC_SQ_CTRL_CLEAR(*queue_info, QUEUE_INFO_MSS);
+ *queue_info |= HINIC_SQ_CTRL_SET(HINIC_MSS_MIN, QUEUE_INFO_MSS);
+ }
+
+ return 0;
+}
+
netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
{
struct hinic_dev *nic_dev = netdev_priv(netdev);
@@ -184,9 +458,9 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
unsigned int wqe_size;
struct hinic_txq *txq;
struct hinic_qp *qp;
- u16 prod_idx;
+ u16 prod_idx, q_id = skb->queue_mapping;
- txq = &nic_dev->txqs[skb->queue_mapping];
+ txq = &nic_dev->txqs[q_id];
qp = container_of(txq->sq, struct hinic_qp, sq);
if (skb->len < MIN_SKB_LEN) {
@@ -236,15 +510,23 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
process_sq_wqe:
hinic_sq_prepare_wqe(txq->sq, prod_idx, sq_wqe, txq->sges, nr_sges);
+ err = hinic_tx_offload(skb, &sq_wqe->task, &sq_wqe->ctrl.queue_info);
+ if (err)
+ goto offload_error;
+
hinic_sq_write_wqe(txq->sq, prod_idx, sq_wqe, skb, wqe_size);
flush_skbs:
- netdev_txq = netdev_get_tx_queue(netdev, skb->queue_mapping);
+ netdev_txq = netdev_get_tx_queue(netdev, q_id);
if ((!skb->xmit_more) || (netif_xmit_stopped(netdev_txq)))
hinic_sq_write_db(txq->sq, prod_idx, wqe_size, 0);
return err;
+offload_error:
+ hinic_sq_return_wqe(txq->sq, wqe_size);
+ tx_unmap_skb(nic_dev, skb, txq->sges);
+
skb_error:
dev_kfree_skb_any(skb);
@@ -252,7 +534,8 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
u64_stats_update_begin(&txq->txq_stats.syncp);
txq->txq_stats.tx_dropped++;
u64_stats_update_end(&txq->txq_stats.syncp);
- return err;
+
+ return NETDEV_TX_OK;
}
/**
--
2.17.1
^ permalink raw reply related
* [PATCH net-next] net: aquantia: make function aq_fw2x_update_stats static
From: YueHaibing @ 2018-10-16 10:50 UTC (permalink / raw)
To: davem, igor.russkikh, nikita.danilov
Cc: linux-kernel, netdev, yana.esina, andrew, YueHaibing
Fixes the following sparse warning:
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c:282:5: warning:
symbol 'aq_fw2x_update_stats' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
index c056846..096ca57 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
@@ -279,7 +279,7 @@ static int aq_fw2x_get_mac_permanent(struct aq_hw_s *self, u8 *mac)
return err;
}
-int aq_fw2x_update_stats(struct aq_hw_s *self)
+static int aq_fw2x_update_stats(struct aq_hw_s *self)
{
int err = 0;
u32 mpi_opts = aq_hw_read_reg(self, HW_ATL_FW2X_MPI_CONTROL2_ADDR);
--
2.7.0
^ permalink raw reply related
* [PATCH net-next] ixgbe: fix XFRM_ALGO dependency
From: Arnd Bergmann @ 2018-10-16 10:03 UTC (permalink / raw)
To: Jeff Kirsher, David S. Miller, Steffen Klassert, Herbert Xu
Cc: Arnd Bergmann, Jesse Brandeburg, Björn Töpel,
Alexander Duyck, intel-wired-lan, netdev, linux-kernel
When XFRM_ALGO is not enabled, the new igxge ipsec code produces a
link error:
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa':
ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname'
Simply selectin XFRM_ALGO from here causes circular dependencies, so
to fix it, we probably want this slightly more complex solution that is
similar to what other drivers with XFRM offload do:
A separate Kconfig symbol now controls whether we include the ipsec
offload code. To keep the old behavior, this is left as 'default y'. The
dependency in XFRM_OFFLOAD still causes a circular dependency but is
not actually needed because this symbol is not user visible, so removing
that dependency on top makes it all work.
Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/intel/Kconfig | 10 ++++++++++
drivers/net/ethernet/intel/ixgbe/Makefile | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 8 ++++----
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
net/xfrm/Kconfig | 1 -
5 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index b542aba6f0e8..d1db382d8299 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -194,6 +194,16 @@ config IXGBE_DCB
If unsure, say N.
+config IXGBE_IPSEC
+ bool "IPSec XFRM cryptography-offload accelaration"
+ default n
+ depends on IXGBE
+ depends on XFRM_OFFLOAD
+ default y
+ select XFRM_ALGO
+ ---help---
+ Enable support for IPSec offload in ixgbe.ko
+
config IXGBEVF
tristate "Intel(R) 10GbE PCI Express Virtual Function Ethernet support"
depends on PCI_MSI
diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index ca6b0c458e4a..4fb0d9e3f2da 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -17,4 +17,4 @@ ixgbe-$(CONFIG_IXGBE_DCB) += ixgbe_dcb.o ixgbe_dcb_82598.o \
ixgbe-$(CONFIG_IXGBE_HWMON) += ixgbe_sysfs.o
ixgbe-$(CONFIG_DEBUG_FS) += ixgbe_debugfs.o
ixgbe-$(CONFIG_FCOE:m=y) += ixgbe_fcoe.o
-ixgbe-$(CONFIG_XFRM_OFFLOAD) += ixgbe_ipsec.o
+ixgbe-$(CONFIG_IXGBE_IPSEC) += ixgbe_ipsec.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 7a7679e7be84..1d5d66436eac 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -770,9 +770,9 @@ struct ixgbe_adapter {
#define IXGBE_RSS_KEY_SIZE 40 /* size of RSS Hash Key in bytes */
u32 *rss_key;
-#ifdef CONFIG_XFRM_OFFLOAD
+#ifdef CONFIG_IXGBE_IPSEC
struct ixgbe_ipsec *ipsec;
-#endif /* CONFIG_XFRM_OFFLOAD */
+#endif /* CONFIG_IXGBE_IPSEC */
/* AF_XDP zero-copy */
struct xdp_umem **xsk_umems;
@@ -1009,7 +1009,7 @@ void ixgbe_store_key(struct ixgbe_adapter *adapter);
void ixgbe_store_reta(struct ixgbe_adapter *adapter);
s32 ixgbe_negotiate_fc(struct ixgbe_hw *hw, u32 adv_reg, u32 lp_reg,
u32 adv_sym, u32 adv_asm, u32 lp_sym, u32 lp_asm);
-#ifdef CONFIG_XFRM_OFFLOAD
+#ifdef CONFIG_IXGBE_IPSEC
void ixgbe_init_ipsec_offload(struct ixgbe_adapter *adapter);
void ixgbe_stop_ipsec_offload(struct ixgbe_adapter *adapter);
void ixgbe_ipsec_restore(struct ixgbe_adapter *adapter);
@@ -1037,5 +1037,5 @@ static inline int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter,
u32 *mbuf, u32 vf) { return -EACCES; }
static inline int ixgbe_ipsec_vf_del_sa(struct ixgbe_adapter *adapter,
u32 *mbuf, u32 vf) { return -EACCES; }
-#endif /* CONFIG_XFRM_OFFLOAD */
+#endif /* CONFIG_IXGBE_IPSEC */
#endif /* _IXGBE_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0049a2becd7e..113b38e0defb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8694,7 +8694,7 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
#endif /* IXGBE_FCOE */
-#ifdef CONFIG_XFRM_OFFLOAD
+#ifdef CONFIG_IXGBE_IPSEC
if (skb->sp && !ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
goto out_drop;
#endif
@@ -10190,7 +10190,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
* the TSO, so it's the exception.
*/
if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID)) {
-#ifdef CONFIG_XFRM_OFFLOAD
+#ifdef CONFIG_IXGBE_IPSEC
if (!skb->sp)
#endif
features &= ~NETIF_F_TSO;
@@ -10883,7 +10883,7 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (hw->mac.type >= ixgbe_mac_82599EB)
netdev->features |= NETIF_F_SCTP_CRC;
-#ifdef CONFIG_XFRM_OFFLOAD
+#ifdef CONFIG_IXGBE_IPSEC
#define IXGBE_ESP_FEATURES (NETIF_F_HW_ESP | \
NETIF_F_HW_ESP_TX_CSUM | \
NETIF_F_GSO_ESP)
diff --git a/net/xfrm/Kconfig b/net/xfrm/Kconfig
index 4a9ee2d83158..140270a13d54 100644
--- a/net/xfrm/Kconfig
+++ b/net/xfrm/Kconfig
@@ -8,7 +8,6 @@ config XFRM
config XFRM_OFFLOAD
bool
- depends on XFRM
config XFRM_ALGO
tristate
--
2.18.0
^ permalink raw reply related
* [PATCH v2 net-next 11/11] net/ipv4: Bail early if user only wants prefix entries
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Unlike IPv6, IPv4 does not have routes marked with RTF_PREFIX_RT. If the
flag is set in the dump request, just return.
In the process of this change, move the CLONE check to use the new
filter flags.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
net/ipv4/fib_frontend.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index e86ca2255181..5bf653f36911 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -886,10 +886,14 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
err = ip_valid_fib_dump_req(net, nlh, &filter, cb);
if (err < 0)
return err;
+ } else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
+ struct rtmsg *rtm = nlmsg_data(nlh);
+
+ filter.flags = rtm->rtm_flags & (RTM_F_PREFIX | RTM_F_CLONED);
}
- if (nlmsg_len(nlh) >= sizeof(struct rtmsg) &&
- ((struct rtmsg *)nlmsg_data(nlh))->rtm_flags & RTM_F_CLONED)
+ /* fib entries are never clones and ipv4 does not use prefix flag */
+ if (filter.flags & (RTM_F_PREFIX | RTM_F_CLONED))
return skb->len;
if (filter.table_id) {
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 08/11] net: Enable kernel side filtering of route dumps
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Update parsing of route dump request to enable kernel side filtering.
Allow filtering results by protocol (e.g., which routing daemon installed
the route), route type (e.g., unicast), table id and nexthop device. These
amount to the low hanging fruit, yet a huge improvement, for dumping
routes.
ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
be used to look up the device index without taking a reference. From
there filter->dev is only used during dump loops with the lock still held.
Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
have been filtered should no entries be returned.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/net/ip_fib.h | 2 +-
net/ipv4/fib_frontend.c | 51 ++++++++++++++++++++++++++++++++++++++++++-------
net/ipv4/ipmr.c | 2 +-
net/ipv6/ip6_fib.c | 2 +-
net/ipv6/ip6mr.c | 2 +-
net/mpls/af_mpls.c | 9 +++++----
6 files changed, 53 insertions(+), 15 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 1eabc9edd2b9..e8d9456bf36e 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -465,5 +465,5 @@ u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct fib_dump_filter *filter,
- struct netlink_ext_ack *extack);
+ struct netlink_callback *cb);
#endif /* _NET_FIB_H */
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 37dc8ac366fd..e86ca2255181 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -804,9 +804,14 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh,
int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct fib_dump_filter *filter,
- struct netlink_ext_ack *extack)
+ struct netlink_callback *cb)
{
+ struct netlink_ext_ack *extack = cb->extack;
+ struct nlattr *tb[RTA_MAX + 1];
struct rtmsg *rtm;
+ int err, i;
+
+ ASSERT_RTNL();
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
NL_SET_ERR_MSG(extack, "Invalid header for FIB dump request");
@@ -815,8 +820,7 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
rtm = nlmsg_data(nlh);
if (rtm->rtm_dst_len || rtm->rtm_src_len || rtm->rtm_tos ||
- rtm->rtm_table || rtm->rtm_protocol || rtm->rtm_scope ||
- rtm->rtm_type) {
+ rtm->rtm_scope) {
NL_SET_ERR_MSG(extack, "Invalid values in header for FIB dump request");
return -EINVAL;
}
@@ -825,9 +829,42 @@ int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
return -EINVAL;
}
- if (nlmsg_attrlen(nlh, sizeof(*rtm))) {
- NL_SET_ERR_MSG(extack, "Invalid data after header in FIB dump request");
- return -EINVAL;
+ filter->flags = rtm->rtm_flags;
+ filter->protocol = rtm->rtm_protocol;
+ filter->rt_type = rtm->rtm_type;
+ filter->table_id = rtm->rtm_table;
+
+ err = nlmsg_parse_strict(nlh, sizeof(*rtm), tb, RTA_MAX,
+ rtm_ipv4_policy, extack);
+ if (err < 0)
+ return err;
+
+ for (i = 0; i <= RTA_MAX; ++i) {
+ int ifindex;
+
+ if (!tb[i])
+ continue;
+
+ switch (i) {
+ case RTA_TABLE:
+ filter->table_id = nla_get_u32(tb[i]);
+ break;
+ case RTA_OIF:
+ ifindex = nla_get_u32(tb[i]);
+ filter->dev = __dev_get_by_index(net, ifindex);
+ if (!filter->dev)
+ return -ENODEV;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Unsupported attribute in dump request");
+ return -EINVAL;
+ }
+ }
+
+ if (filter->flags || filter->protocol || filter->rt_type ||
+ filter->table_id || filter->dev) {
+ filter->filter_set = 1;
+ cb->answer_flags = NLM_F_DUMP_FILTERED;
}
return 0;
@@ -846,7 +883,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
int dumped = 0, err;
if (cb->strict_check) {
- err = ip_valid_fib_dump_req(net, nlh, &filter, cb->extack);
+ err = ip_valid_fib_dump_req(net, nlh, &filter, cb);
if (err < 0)
return err;
}
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 3fa988e6a3df..7a3e2acda94c 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2532,7 +2532,7 @@ static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
if (cb->strict_check) {
err = ip_valid_fib_dump_req(sock_net(skb->sk), cb->nlh,
- &filter, cb->extack);
+ &filter, cb);
if (err < 0)
return err;
}
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index a51fc357a05c..5562c77022c6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -580,7 +580,7 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
if (cb->strict_check) {
int err;
- err = ip_valid_fib_dump_req(net, nlh, &arg.filter, cb->extack);
+ err = ip_valid_fib_dump_req(net, nlh, &arg.filter, cb);
if (err < 0)
return err;
} else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 9759b0aecdd6..c3317ffb09eb 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2463,7 +2463,7 @@ static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
if (cb->strict_check) {
err = ip_valid_fib_dump_req(sock_net(skb->sk), nlh,
- &filter, cb->extack);
+ &filter, cb);
if (err < 0)
return err;
}
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 48f4cbd9fb38..24381696932a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2034,15 +2034,16 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
#if IS_ENABLED(CONFIG_INET)
static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct fib_dump_filter *filter,
- struct netlink_ext_ack *extack)
+ struct netlink_callback *cb)
{
- return ip_valid_fib_dump_req(net, nlh, filter, extack);
+ return ip_valid_fib_dump_req(net, nlh, filter, cb);
}
#else
static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct fib_dump_filter *filter,
- struct netlink_ext_ack *extack)
+ struct netlink_callback *cb)
{
+ struct netlink_ext_ack *extack = cb->extack;
struct rtmsg *rtm;
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
@@ -2104,7 +2105,7 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
if (cb->strict_check) {
int err;
- err = mpls_valid_fib_dump_req(net, nlh, &filter, cb->extack);
+ err = mpls_valid_fib_dump_req(net, nlh, &filter, cb);
if (err < 0)
return err;
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 06/11] ipmr: Refactor mr_rtm_dumproute
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Move per-table loops from mr_rtm_dumproute to mr_table_dump and export
mr_table_dump for dumps by specific table id.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/linux/mroute_base.h | 6 ++++
net/ipv4/ipmr_base.c | 88 ++++++++++++++++++++++++++++-----------------
2 files changed, 61 insertions(+), 33 deletions(-)
diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index 6675b9f81979..db85373c8d15 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -283,6 +283,12 @@ void *mr_mfc_find_any(struct mr_table *mrt, int vifi, void *hasharg);
int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
struct mr_mfc *c, struct rtmsg *rtm);
+int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
+ struct netlink_callback *cb,
+ int (*fill)(struct mr_table *mrt, struct sk_buff *skb,
+ u32 portid, u32 seq, struct mr_mfc *c,
+ int cmd, int flags),
+ spinlock_t *lock);
int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct mr_table *(*iter)(struct net *net,
struct mr_table *mrt),
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 1ad9aa62a97b..132dd2613ca5 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -268,6 +268,55 @@ int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
}
EXPORT_SYMBOL(mr_fill_mroute);
+int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
+ struct netlink_callback *cb,
+ int (*fill)(struct mr_table *mrt, struct sk_buff *skb,
+ u32 portid, u32 seq, struct mr_mfc *c,
+ int cmd, int flags),
+ spinlock_t *lock)
+{
+ unsigned int e = 0, s_e = cb->args[1];
+ unsigned int flags = NLM_F_MULTI;
+ struct mr_mfc *mfc;
+ int err;
+
+ list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
+ if (e < s_e)
+ goto next_entry;
+
+ err = fill(mrt, skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, mfc, RTM_NEWROUTE, flags);
+ if (err < 0)
+ goto out;
+next_entry:
+ e++;
+ }
+ e = 0;
+ s_e = 0;
+
+ spin_lock_bh(lock);
+ list_for_each_entry(mfc, &mrt->mfc_unres_queue, list) {
+ if (e < s_e)
+ goto next_entry2;
+
+ err = fill(mrt, skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, mfc, RTM_NEWROUTE, flags);
+ if (err < 0) {
+ spin_unlock_bh(lock);
+ goto out;
+ }
+next_entry2:
+ e++;
+ }
+ spin_unlock_bh(lock);
+ err = 0;
+ e = 0;
+
+out:
+ cb->args[1] = e;
+ return err;
+}
+
int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct mr_table *(*iter)(struct net *net,
struct mr_table *mrt),
@@ -277,51 +326,24 @@ int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
int cmd, int flags),
spinlock_t *lock)
{
- unsigned int t = 0, e = 0, s_t = cb->args[0], s_e = cb->args[1];
+ unsigned int t = 0, s_t = cb->args[0];
struct net *net = sock_net(skb->sk);
struct mr_table *mrt;
- struct mr_mfc *mfc;
+ int err;
rcu_read_lock();
for (mrt = iter(net, NULL); mrt; mrt = iter(net, mrt)) {
if (t < s_t)
goto next_table;
- list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
- if (e < s_e)
- goto next_entry;
- if (fill(mrt, skb, NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq, mfc,
- RTM_NEWROUTE, NLM_F_MULTI) < 0)
- goto done;
-next_entry:
- e++;
- }
- e = 0;
- s_e = 0;
-
- spin_lock_bh(lock);
- list_for_each_entry(mfc, &mrt->mfc_unres_queue, list) {
- if (e < s_e)
- goto next_entry2;
- if (fill(mrt, skb, NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq, mfc,
- RTM_NEWROUTE, NLM_F_MULTI) < 0) {
- spin_unlock_bh(lock);
- goto done;
- }
-next_entry2:
- e++;
- }
- spin_unlock_bh(lock);
- e = 0;
- s_e = 0;
+
+ err = mr_table_dump(mrt, skb, cb, fill, lock);
+ if (err < 0)
+ break;
next_table:
t++;
}
-done:
rcu_read_unlock();
- cb->args[1] = e;
cb->args[0] = t;
return skb->len;
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 02/11] net: Add struct for fib dump filter
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Add struct fib_dump_filter for options on limiting which routes are
returned in a dump request. The current list is table id, protocol,
route type, rtm_flags and nexthop device index. struct net is needed
to lookup the net_device from the index.
Declare the filter for each route dump handler and plumb the new
arguments from dump handlers to ip_valid_fib_dump_req.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/net/ip6_route.h | 1 +
include/net/ip_fib.h | 13 ++++++++++++-
net/ipv4/fib_frontend.c | 6 ++++--
net/ipv4/ipmr.c | 6 +++++-
net/ipv6/ip6_fib.c | 5 +++--
net/ipv6/ip6mr.c | 5 ++++-
net/mpls/af_mpls.c | 12 ++++++++----
7 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index cef186dbd2ce..7ab119936e69 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -174,6 +174,7 @@ struct rt6_rtnl_dump_arg {
struct sk_buff *skb;
struct netlink_callback *cb;
struct net *net;
+ struct fib_dump_filter filter;
};
int rt6_dump_route(struct fib6_info *f6i, void *p_arg);
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 852e4ebf2209..667013bf4266 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -222,6 +222,16 @@ struct fib_table {
unsigned long __data[0];
};
+struct fib_dump_filter {
+ u32 table_id;
+ /* filter_set is an optimization that an entry is set */
+ bool filter_set;
+ unsigned char protocol;
+ unsigned char rt_type;
+ unsigned int flags;
+ struct net_device *dev;
+};
+
int fib_table_lookup(struct fib_table *tb, const struct flowi4 *flp,
struct fib_result *res, int fib_flags);
int fib_table_insert(struct net *, struct fib_table *, struct fib_config *,
@@ -453,6 +463,7 @@ static inline void fib_proc_exit(struct net *net)
u32 ip_mtu_from_fib_result(struct fib_result *res, __be32 daddr);
-int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
+int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
+ struct fib_dump_filter *filter,
struct netlink_ext_ack *extack);
#endif /* _NET_FIB_H */
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 0f1beceb47d5..850850dd80e1 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -802,7 +802,8 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh,
return err;
}
-int ip_valid_fib_dump_req(const struct nlmsghdr *nlh,
+int ip_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
+ struct fib_dump_filter *filter,
struct netlink_ext_ack *extack)
{
struct rtmsg *rtm;
@@ -837,6 +838,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
+ struct fib_dump_filter filter = {};
unsigned int h, s_h;
unsigned int e = 0, s_e;
struct fib_table *tb;
@@ -844,7 +846,7 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
int dumped = 0, err;
if (cb->strict_check) {
- err = ip_valid_fib_dump_req(nlh, cb->extack);
+ err = ip_valid_fib_dump_req(net, nlh, &filter, cb->extack);
if (err < 0)
return err;
}
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 91b0d5671649..44d777058960 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2527,9 +2527,13 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
+ struct fib_dump_filter filter = {};
+
if (cb->strict_check) {
- int err = ip_valid_fib_dump_req(cb->nlh, cb->extack);
+ int err;
+ err = ip_valid_fib_dump_req(sock_net(skb->sk), cb->nlh,
+ &filter, cb->extack);
if (err < 0)
return err;
}
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 0783af11b0b7..94e61fe47ff8 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -569,17 +569,18 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
+ struct rt6_rtnl_dump_arg arg = {};
unsigned int h, s_h;
unsigned int e = 0, s_e;
- struct rt6_rtnl_dump_arg arg;
struct fib6_walker *w;
struct fib6_table *tb;
struct hlist_head *head;
int res = 0;
if (cb->strict_check) {
- int err = ip_valid_fib_dump_req(nlh, cb->extack);
+ int err;
+ err = ip_valid_fib_dump_req(net, nlh, &arg.filter, cb->extack);
if (err < 0)
return err;
}
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index d7563ef76518..dbd5166c5599 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2458,10 +2458,13 @@ static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt)
static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
+ struct fib_dump_filter filter = {};
if (cb->strict_check) {
- int err = ip_valid_fib_dump_req(nlh, cb->extack);
+ int err;
+ err = ip_valid_fib_dump_req(sock_net(skb->sk), nlh,
+ &filter, cb->extack);
if (err < 0)
return err;
}
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 5fe274c47c41..bfcb4759c9ee 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2032,13 +2032,15 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
}
#if IS_ENABLED(CONFIG_INET)
-static int mpls_valid_fib_dump_req(const struct nlmsghdr *nlh,
+static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
+ struct fib_dump_filter *filter,
struct netlink_ext_ack *extack)
{
- return ip_valid_fib_dump_req(nlh, extack);
+ return ip_valid_fib_dump_req(net, nlh, filter, extack);
}
#else
-static int mpls_valid_fib_dump_req(const struct nlmsghdr *nlh,
+static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
+ struct fib_dump_filter *filter,
struct netlink_ext_ack *extack)
{
struct rtmsg *rtm;
@@ -2070,14 +2072,16 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
struct mpls_route __rcu **platform_label;
+ struct fib_dump_filter filter = {};
size_t platform_labels;
unsigned int index;
ASSERT_RTNL();
if (cb->strict_check) {
- int err = mpls_valid_fib_dump_req(nlh, cb->extack);
+ int err;
+ err = mpls_valid_fib_dump_req(net, nlh, &filter, cb->extack);
if (err < 0)
return err;
}
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 09/11] net/mpls: Handle kernel side filtering of route dumps
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Update the dump request parsing in MPLS for the non-INET case to
enable kernel side filtering. If INET is disabled the only filters
that make sense for MPLS are protocol and nexthop device.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
net/mpls/af_mpls.c | 33 ++++++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 5 deletions(-)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 24381696932a..7d55d4c04088 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -2044,7 +2044,9 @@ static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
struct netlink_callback *cb)
{
struct netlink_ext_ack *extack = cb->extack;
+ struct nlattr *tb[RTA_MAX + 1];
struct rtmsg *rtm;
+ int err, i;
if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*rtm))) {
NL_SET_ERR_MSG_MOD(extack, "Invalid header for FIB dump request");
@@ -2053,15 +2055,36 @@ static int mpls_valid_fib_dump_req(struct net *net, const struct nlmsghdr *nlh,
rtm = nlmsg_data(nlh);
if (rtm->rtm_dst_len || rtm->rtm_src_len || rtm->rtm_tos ||
- rtm->rtm_table || rtm->rtm_protocol || rtm->rtm_scope ||
- rtm->rtm_type || rtm->rtm_flags) {
+ rtm->rtm_table || rtm->rtm_scope || rtm->rtm_type ||
+ rtm->rtm_flags) {
NL_SET_ERR_MSG_MOD(extack, "Invalid values in header for FIB dump request");
return -EINVAL;
}
- if (nlmsg_attrlen(nlh, sizeof(*rtm))) {
- NL_SET_ERR_MSG_MOD(extack, "Invalid data after header in FIB dump request");
- return -EINVAL;
+ if (rtm->rtm_protocol) {
+ filter->protocol = rtm->rtm_protocol;
+ filter->filter_set = 1;
+ cb->answer_flags = NLM_F_DUMP_FILTERED;
+ }
+
+ err = nlmsg_parse_strict(nlh, sizeof(*rtm), tb, RTA_MAX,
+ rtm_mpls_policy, extack);
+ if (err < 0)
+ return err;
+
+ for (i = 0; i <= RTA_MAX; ++i) {
+ int ifindex;
+
+ if (i == RTA_OIF) {
+ ifindex = nla_get_u32(tb[i]);
+ filter->dev = __dev_get_by_index(net, ifindex);
+ if (!filter->dev)
+ return -ENODEV;
+ filter->filter_set = 1;
+ } else if (tb[i]) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported attribute in dump request");
+ return -EINVAL;
+ }
}
return 0;
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 07/11] net: Plumb support for filtering ipv4 and ipv6 multicast route dumps
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Implement kernel side filtering of routes by egress device index and
table id. If the table id is given in the filter, lookup table and
call mr_table_dump directly for it.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/linux/mroute_base.h | 7 ++++---
net/ipv4/ipmr.c | 18 +++++++++++++++---
net/ipv4/ipmr_base.c | 42 +++++++++++++++++++++++++++++++++++++++---
net/ipv6/ip6mr.c | 18 +++++++++++++++---
4 files changed, 73 insertions(+), 12 deletions(-)
diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h
index db85373c8d15..34de06b426ef 100644
--- a/include/linux/mroute_base.h
+++ b/include/linux/mroute_base.h
@@ -7,6 +7,7 @@
#include <net/net_namespace.h>
#include <net/sock.h>
#include <net/fib_notifier.h>
+#include <net/ip_fib.h>
/**
* struct vif_device - interface representor for multicast routing
@@ -288,7 +289,7 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
int (*fill)(struct mr_table *mrt, struct sk_buff *skb,
u32 portid, u32 seq, struct mr_mfc *c,
int cmd, int flags),
- spinlock_t *lock);
+ spinlock_t *lock, struct fib_dump_filter *filter);
int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct mr_table *(*iter)(struct net *net,
struct mr_table *mrt),
@@ -296,7 +297,7 @@ int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct sk_buff *skb,
u32 portid, u32 seq, struct mr_mfc *c,
int cmd, int flags),
- spinlock_t *lock);
+ spinlock_t *lock, struct fib_dump_filter *filter);
int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family,
int (*rules_dump)(struct net *net,
@@ -346,7 +347,7 @@ mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct sk_buff *skb,
u32 portid, u32 seq, struct mr_mfc *c,
int cmd, int flags),
- spinlock_t *lock)
+ spinlock_t *lock, struct fib_dump_filter *filter)
{
return -EINVAL;
}
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 44d777058960..3fa988e6a3df 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2528,18 +2528,30 @@ static int ipmr_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
static int ipmr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
struct fib_dump_filter filter = {};
+ int err;
if (cb->strict_check) {
- int err;
-
err = ip_valid_fib_dump_req(sock_net(skb->sk), cb->nlh,
&filter, cb->extack);
if (err < 0)
return err;
}
+ if (filter.table_id) {
+ struct mr_table *mrt;
+
+ mrt = ipmr_get_table(sock_net(skb->sk), filter.table_id);
+ if (!mrt) {
+ NL_SET_ERR_MSG(cb->extack, "ipv4: MR table does not exist");
+ return -ENOENT;
+ }
+ err = mr_table_dump(mrt, skb, cb, _ipmr_fill_mroute,
+ &mfc_unres_lock, &filter);
+ return skb->len ? : err;
+ }
+
return mr_rtm_dumproute(skb, cb, ipmr_mr_table_iter,
- _ipmr_fill_mroute, &mfc_unres_lock);
+ _ipmr_fill_mroute, &mfc_unres_lock, &filter);
}
static const struct nla_policy rtm_ipmr_policy[RTA_MAX + 1] = {
diff --git a/net/ipv4/ipmr_base.c b/net/ipv4/ipmr_base.c
index 132dd2613ca5..bfe8fd04afa0 100644
--- a/net/ipv4/ipmr_base.c
+++ b/net/ipv4/ipmr_base.c
@@ -268,21 +268,45 @@ int mr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
}
EXPORT_SYMBOL(mr_fill_mroute);
+static bool mr_mfc_uses_dev(const struct mr_table *mrt,
+ const struct mr_mfc *c,
+ const struct net_device *dev)
+{
+ int ct;
+
+ for (ct = c->mfc_un.res.minvif; ct < c->mfc_un.res.maxvif; ct++) {
+ if (VIF_EXISTS(mrt, ct) && c->mfc_un.res.ttls[ct] < 255) {
+ const struct vif_device *vif;
+
+ vif = &mrt->vif_table[ct];
+ if (vif->dev == dev)
+ return true;
+ }
+ }
+ return false;
+}
+
int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
struct netlink_callback *cb,
int (*fill)(struct mr_table *mrt, struct sk_buff *skb,
u32 portid, u32 seq, struct mr_mfc *c,
int cmd, int flags),
- spinlock_t *lock)
+ spinlock_t *lock, struct fib_dump_filter *filter)
{
unsigned int e = 0, s_e = cb->args[1];
unsigned int flags = NLM_F_MULTI;
struct mr_mfc *mfc;
int err;
+ if (filter->filter_set)
+ flags |= NLM_F_DUMP_FILTERED;
+
list_for_each_entry_rcu(mfc, &mrt->mfc_cache_list, list) {
if (e < s_e)
goto next_entry;
+ if (filter->dev &&
+ !mr_mfc_uses_dev(mrt, mfc, filter->dev))
+ goto next_entry;
err = fill(mrt, skb, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq, mfc, RTM_NEWROUTE, flags);
@@ -298,6 +322,9 @@ int mr_table_dump(struct mr_table *mrt, struct sk_buff *skb,
list_for_each_entry(mfc, &mrt->mfc_unres_queue, list) {
if (e < s_e)
goto next_entry2;
+ if (filter->dev &&
+ !mr_mfc_uses_dev(mrt, mfc, filter->dev))
+ goto next_entry2;
err = fill(mrt, skb, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq, mfc, RTM_NEWROUTE, flags);
@@ -324,19 +351,28 @@ int mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb,
struct sk_buff *skb,
u32 portid, u32 seq, struct mr_mfc *c,
int cmd, int flags),
- spinlock_t *lock)
+ spinlock_t *lock, struct fib_dump_filter *filter)
{
unsigned int t = 0, s_t = cb->args[0];
struct net *net = sock_net(skb->sk);
struct mr_table *mrt;
int err;
+ /* multicast does not track protocol or have route type other
+ * than RTN_MULTICAST
+ */
+ if (filter->filter_set) {
+ if (filter->protocol || filter->flags ||
+ (filter->rt_type && filter->rt_type != RTN_MULTICAST))
+ return skb->len;
+ }
+
rcu_read_lock();
for (mrt = iter(net, NULL); mrt; mrt = iter(net, mrt)) {
if (t < s_t)
goto next_table;
- err = mr_table_dump(mrt, skb, cb, fill, lock);
+ err = mr_table_dump(mrt, skb, cb, fill, lock, filter);
if (err < 0)
break;
next_table:
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index dbd5166c5599..9759b0aecdd6 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2459,16 +2459,28 @@ static int ip6mr_rtm_dumproute(struct sk_buff *skb, struct netlink_callback *cb)
{
const struct nlmsghdr *nlh = cb->nlh;
struct fib_dump_filter filter = {};
+ int err;
if (cb->strict_check) {
- int err;
-
err = ip_valid_fib_dump_req(sock_net(skb->sk), nlh,
&filter, cb->extack);
if (err < 0)
return err;
}
+ if (filter.table_id) {
+ struct mr_table *mrt;
+
+ mrt = ip6mr_get_table(sock_net(skb->sk), filter.table_id);
+ if (!mrt) {
+ NL_SET_ERR_MSG_MOD(cb->extack, "MR table does not exist");
+ return -ENOENT;
+ }
+ err = mr_table_dump(mrt, skb, cb, _ip6mr_fill_mroute,
+ &mfc_unres_lock, &filter);
+ return skb->len ? : err;
+ }
+
return mr_rtm_dumproute(skb, cb, ip6mr_mr_table_iter,
- _ip6mr_fill_mroute, &mfc_unres_lock);
+ _ip6mr_fill_mroute, &mfc_unres_lock, &filter);
}
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 10/11] net/ipv6: Bail early if user only wants cloned entries
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Similar to IPv4, IPv6 fib no longer contains cloned routes. If a user
requests a route dump for only cloned entries, no sense walking the FIB
and returning everything.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
net/ipv6/ip6_fib.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 5562c77022c6..2a058b408a6a 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -586,10 +586,13 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
} else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
struct rtmsg *rtm = nlmsg_data(nlh);
- if (rtm->rtm_flags & RTM_F_PREFIX)
- arg.filter.flags = RTM_F_PREFIX;
+ arg.filter.flags = rtm->rtm_flags & (RTM_F_PREFIX|RTM_F_CLONED);
}
+ /* fib entries are never clones */
+ if (arg.filter.flags & RTM_F_CLONED)
+ return skb->len;
+
w = (void *)cb->args[2];
if (!w) {
/* New dump:
--
2.11.0
^ permalink raw reply related
* [PATCH v2 net-next 04/11] net/ipv6: Plumb support for filtering route dumps
From: David Ahern @ 2018-10-16 1:56 UTC (permalink / raw)
To: netdev, davem; +Cc: David Ahern
In-Reply-To: <20181016015651.22696-1-dsahern@kernel.org>
From: David Ahern <dsahern@gmail.com>
Implement kernel side filtering of routes by table id, egress device
index, protocol, and route type. If the table id is given in the filter,
lookup the table and call fib6_dump_table directly for it.
Move the existing route flags check for prefix only routes to the new
filter.
Signed-off-by: David Ahern <dsahern@gmail.com>
---
net/ipv6/ip6_fib.c | 28 ++++++++++++++++++++++------
net/ipv6/route.c | 40 ++++++++++++++++++++++++++++++++--------
2 files changed, 54 insertions(+), 14 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 94e61fe47ff8..a51fc357a05c 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -583,10 +583,12 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
err = ip_valid_fib_dump_req(net, nlh, &arg.filter, cb->extack);
if (err < 0)
return err;
- }
+ } else if (nlmsg_len(nlh) >= sizeof(struct rtmsg)) {
+ struct rtmsg *rtm = nlmsg_data(nlh);
- s_h = cb->args[0];
- s_e = cb->args[1];
+ if (rtm->rtm_flags & RTM_F_PREFIX)
+ arg.filter.flags = RTM_F_PREFIX;
+ }
w = (void *)cb->args[2];
if (!w) {
@@ -612,6 +614,20 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
arg.net = net;
w->args = &arg;
+ if (arg.filter.table_id) {
+ tb = fib6_get_table(net, arg.filter.table_id);
+ if (!tb) {
+ NL_SET_ERR_MSG_MOD(cb->extack, "FIB table does not exist");
+ return -ENOENT;
+ }
+
+ res = fib6_dump_table(tb, skb, cb);
+ goto out;
+ }
+
+ s_h = cb->args[0];
+ s_e = cb->args[1];
+
rcu_read_lock();
for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
e = 0;
@@ -621,16 +637,16 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
goto next;
res = fib6_dump_table(tb, skb, cb);
if (res != 0)
- goto out;
+ goto out_unlock;
next:
e++;
}
}
-out:
+out_unlock:
rcu_read_unlock();
cb->args[1] = e;
cb->args[0] = h;
-
+out:
res = res < 0 ? res : skb->len;
if (res <= 0)
fib6_dump_end(cb);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4e08b0689a8..9fd600e42f9d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4767,28 +4767,52 @@ static int rt6_fill_node(struct net *net, struct sk_buff *skb,
return -EMSGSIZE;
}
+static bool fib6_info_uses_dev(const struct fib6_info *f6i,
+ const struct net_device *dev)
+{
+ if (f6i->fib6_nh.nh_dev == dev)
+ return true;
+
+ if (f6i->fib6_nsiblings) {
+ struct fib6_info *sibling, *next_sibling;
+
+ list_for_each_entry_safe(sibling, next_sibling,
+ &f6i->fib6_siblings, fib6_siblings) {
+ if (sibling->fib6_nh.nh_dev == dev)
+ return true;
+ }
+ }
+
+ return false;
+}
+
int rt6_dump_route(struct fib6_info *rt, void *p_arg)
{
struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
+ struct fib_dump_filter *filter = &arg->filter;
+ unsigned int flags = NLM_F_MULTI;
struct net *net = arg->net;
if (rt == net->ipv6.fib6_null_entry)
return 0;
- if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) {
- struct rtmsg *rtm = nlmsg_data(arg->cb->nlh);
-
- /* user wants prefix routes only */
- if (rtm->rtm_flags & RTM_F_PREFIX &&
- !(rt->fib6_flags & RTF_PREFIX_RT)) {
- /* success since this is not a prefix route */
+ if ((filter->flags & RTM_F_PREFIX) &&
+ !(rt->fib6_flags & RTF_PREFIX_RT)) {
+ /* success since this is not a prefix route */
+ return 1;
+ }
+ if (filter->filter_set) {
+ if ((filter->rt_type && rt->fib6_type != filter->rt_type) ||
+ (filter->dev && !fib6_info_uses_dev(rt, filter->dev)) ||
+ (filter->protocol && rt->fib6_protocol != filter->protocol)) {
return 1;
}
+ flags |= NLM_F_DUMP_FILTERED;
}
return rt6_fill_node(net, arg->skb, rt, NULL, NULL, NULL, 0,
RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid,
- arg->cb->nlh->nlmsg_seq, NLM_F_MULTI);
+ arg->cb->nlh->nlmsg_seq, flags);
}
static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
--
2.11.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox