* Re: [net,v2] net/packet: fix packet receive on L3 devices without visible hard header
From: Willem de Bruijn @ 2020-11-21 13:23 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Eyal Birger, Willem de Bruijn, David Miller, Jakub Kicinski,
Network Development, Xie He
In-Reply-To: <CAHmME9rYRrWOs247vFJX-MAY+Zn3yUudOxVhqL13mWp8E+i0-A@mail.gmail.com>
On Sat, Nov 21, 2020 at 2:56 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On 11/21/20, Eyal Birger <eyal.birger@gmail.com> wrote:
> > In the patchset merged by commit b9fcf0a0d826
> > ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
> > did not have header_ops were given one for the purpose of protocol parsing
> > on af_packet transmit path.
> >
> > That change made af_packet receive path regard these devices as having a
> > visible L3 header and therefore aligned incoming skb->data to point to the
> > skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
> > reset their mac_header prior to ingress and therefore their incoming
> > packets became malformed.
> >
> > Ideally these devices would reset their mac headers, or af_packet would be
> > able to rely on dev->hard_header_len being 0 for such cases, but it seems
> > this is not the case.
> >
> > Fix by changing af_packet RX ll visibility criteria to include the
> > existence of a '.create()' header operation, which is used when creating
> > a device hard header - via dev_hard_header() - by upper layers, and does
> > not exist in these L3 devices.
> >
> > As this predicate may be useful in other situations, add it as a common
> > dev_has_header() helper in netdevice.h.
> >
> > Fixes: b9fcf0a0d826 ("Merge branch
> > 'support-AF_PACKET-for-layer-3-devices'")
> > Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply
* SO_REUSEADDR compatibility problems
From: James Courtier-Dutton @ 2020-11-21 13:36 UTC (permalink / raw)
To: netdev
Hi,
The use case I am struggling with is the use of a Windows program
running in wine that is sending and receiving UDP packets.
This particular windows program uses SO_REUSEADDR socket option and
opens two sockets. Lets call the first one socket A, and the second
one Socket B.
The SO_REUSEADDR from the Windows application is translated by "wine" into a
SO_REUSEADDR in Linux.
Unfortunately the behaviour of these is different between Windows and
Linux so the Windows application fails to run on Linux under wine.
1 ) On windows:
All received unicast UDP packets will arrive on the first opened
socket. Thus on socket A.
2) On Linux:
All received unicast UDP packets will arrive on the last opened
socket. Thus on socket B.
The problem is that this windows program only expects to receive
unicast UDP packets on socket A, and thus it sees no packets.
There are no currently existing socket options in Linux that would
permit wine to simulate the Windows behaviour.
And thus, the reason I am asking the question here.
Please can we add an extra socket option to the Linux socket options
such that we can get wine to simulate Windows correctly. I.e. behave
like (1) above.
Now wine is pretty good at simulating most things Windows throws at
it, but socket options is not one of them yet.
Also note, that (1) is actually more secure than (2) because it
prevents other applications with the same UserID from hijacking the
socket.
Although (2) is more helpful in more gracefully handling some error edge cases.
Suggested new option name: SO_REUSEADDR_WS
Kind Regards
James
^ permalink raw reply
* Re: [PATCHv4 net-next 2/3] octeontx2-af: Add devlink health reporters for NPA
From: Jiri Pirko @ 2020-11-21 14:13 UTC (permalink / raw)
To: George Cherian
Cc: netdev, linux-kernel, kuba, davem, sgoutham, lcherian, gakula,
masahiroy, willemdebruijn.kernel, saeed
In-Reply-To: <20201121040201.3171542-3-george.cherian@marvell.com>
Sat, Nov 21, 2020 at 05:02:00AM CET, george.cherian@marvell.com wrote:
>Add health reporters for RVU NPA block.
>NPA Health reporters handle following HW event groups
> - GENERAL events
> - ERROR events
> - RAS events
> - RVU event
>An event counter per event is maintained in SW.
>
>Output:
> # devlink health
> pci/0002:01:00.0:
> reporter npa
> state healthy error 0 recover 0
> # devlink health dump show pci/0002:01:00.0 reporter npa
> NPA_AF_GENERAL:
> Unmap PF Error: 0
> Free Disabled for NIX0 RX: 0
> Free Disabled for NIX0 TX: 0
> Free Disabled for NIX1 RX: 0
> Free Disabled for NIX1 TX: 0
This is for 2 ports if I'm not mistaken. Then you need to have this
reporter per-port. Register ports and have reporter for each.
NAK.
^ permalink raw reply
* [PATCH ipsec-next] xfrm: interface: support collect metadata mode
From: Eyal Birger @ 2020-11-21 14:28 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, kuba; +Cc: netdev, Eyal Birger
This commit adds support for 'collect_md' mode on xfrm interfaces.
Each net can have one collect_md device, created by providing the
IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be
altered and has no if_id or link device attributes.
On transmit to this device, the if_id is fetched from the attached dst
metadata on the skb. The dst metadata type used is METADATA_IP_TUNNEL
since the only needed property is the if_id stored in the tun_id member
of the ip_tunnel_info->key.
On the receive side, xfrmi_rcv_cb() populates a dst metadata for each
packet received and attaches it to the skb. The if_id used in this case is
fetched from the xfrm state. This can later be used by upper layers such
as tc, ebpf, and ip rules.
Because the skb is scrubed in xfrmi_rcv_cb(), the attachment of the dst
metadata is postponed until after scrubing. Similarly, xfrm_input() is
adapted to avoid dropping metadata dsts by only dropping 'valid'
(skb_valid_dst(skb) == true) dsts.
Policy matching on packets arriving from collect_md xfrmi devices is
done by using the xfrm state existing in the skb's sec_path.
The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session()
is changed to keep the details of the if_id extraction tucked away
in xfrm_interface.c.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
---
include/net/xfrm.h | 11 +++-
include/uapi/linux/if_link.h | 1 +
net/xfrm/xfrm_input.c | 7 ++-
net/xfrm/xfrm_interface.c | 114 ++++++++++++++++++++++++++++++-----
net/xfrm/xfrm_policy.c | 10 +--
5 files changed, 119 insertions(+), 24 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index b2a06f10b62c..925f8dcdd0db 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -308,9 +308,15 @@ struct xfrm_replay {
int (*overflow)(struct xfrm_state *x, struct sk_buff *skb);
};
+struct xfrm_if_decode_session_params {
+ struct net *net;
+ u32 if_id;
+};
+
struct xfrm_if_cb {
- struct xfrm_if *(*decode_session)(struct sk_buff *skb,
- unsigned short family);
+ bool (*decode_session)(struct sk_buff *skb,
+ unsigned short family,
+ struct xfrm_if_decode_session_params *params);
};
void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb);
@@ -984,6 +990,7 @@ void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev);
struct xfrm_if_parms {
int link; /* ifindex of underlying L2 interface */
u32 if_id; /* interface identifyer */
+ bool collect_md;
};
struct xfrm_if {
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c4b23f06f69e..ff04a06c2b69 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -655,6 +655,7 @@ enum {
IFLA_XFRM_UNSPEC,
IFLA_XFRM_LINK,
IFLA_XFRM_IF_ID,
+ IFLA_XFRM_COLLECT_METADATA,
__IFLA_XFRM_MAX
};
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index be6351e3f3cd..c7de46d30697 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -20,6 +20,7 @@
#include <net/xfrm.h>
#include <net/ip_tunnels.h>
#include <net/ip6_tunnel.h>
+#include <net/dst_metadata.h>
#include "xfrm_inout.h"
@@ -719,7 +720,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
sp = skb_sec_path(skb);
if (sp)
sp->olen = 0;
- skb_dst_drop(skb);
+ if (skb_valid_dst(skb))
+ skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return 0;
} else {
@@ -737,7 +739,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
sp = skb_sec_path(skb);
if (sp)
sp->olen = 0;
- skb_dst_drop(skb);
+ if (skb_valid_dst(skb))
+ skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return err;
}
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index 9b8e292a7c6a..10c14967d305 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -41,6 +41,7 @@
#include <net/addrconf.h>
#include <net/xfrm.h>
#include <net/net_namespace.h>
+#include <net/dst_metadata.h>
#include <net/netns/generic.h>
#include <linux/etherdevice.h>
@@ -56,11 +57,22 @@ static const struct net_device_ops xfrmi_netdev_ops;
struct xfrmi_net {
/* lists for storing interfaces in use */
struct xfrm_if __rcu *xfrmi[XFRMI_HASH_SIZE];
+ struct xfrm_if __rcu *collect_md_xfrmi;
};
#define for_each_xfrmi_rcu(start, xi) \
for (xi = rcu_dereference(start); xi; xi = rcu_dereference(xi->next))
+static u32 tunnel_id_to_if_id(__be64 tun_id)
+{
+ return ntohl(tunnel_id_to_key32(tun_id));
+}
+
+static __be64 if_id_to_tunnel_id(u32 if_id)
+{
+ return key32_to_tunnel_id(htonl(if_id));
+}
+
static u32 xfrmi_hash(u32 if_id)
{
return hash_32(if_id, XFRMI_HASH_BITS);
@@ -77,17 +89,23 @@ static struct xfrm_if *xfrmi_lookup(struct net *net, struct xfrm_state *x)
return xi;
}
+ xi = rcu_dereference(xfrmn->collect_md_xfrmi);
+ if (xi && xi->dev->flags & IFF_UP)
+ return xi;
+
return NULL;
}
-static struct xfrm_if *xfrmi_decode_session(struct sk_buff *skb,
- unsigned short family)
+static bool xfrmi_decode_session(struct sk_buff *skb,
+ unsigned short family,
+ struct xfrm_if_decode_session_params *params)
{
struct net_device *dev;
+ struct xfrm_if *xi;
int ifindex = 0;
if (!secpath_exists(skb) || !skb->dev)
- return NULL;
+ return false;
switch (family) {
case AF_INET6:
@@ -107,11 +125,18 @@ static struct xfrm_if *xfrmi_decode_session(struct sk_buff *skb,
}
if (!dev || !(dev->flags & IFF_UP))
- return NULL;
+ return false;
if (dev->netdev_ops != &xfrmi_netdev_ops)
- return NULL;
+ return false;
+
+ xi = netdev_priv(dev);
+ params->net = xi->net;
- return netdev_priv(dev);
+ if (xi->p.collect_md)
+ params->if_id = xfrm_input_state(skb)->if_id;
+ else
+ params->if_id = xi->p.if_id;
+ return true;
}
static void xfrmi_link(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
@@ -157,7 +182,10 @@ static int xfrmi_create(struct net_device *dev)
if (err < 0)
goto out;
- xfrmi_link(xfrmn, xi);
+ if (xi->p.collect_md)
+ rcu_assign_pointer(xfrmn->collect_md_xfrmi, xi);
+ else
+ xfrmi_link(xfrmn, xi);
return 0;
@@ -185,7 +213,10 @@ static void xfrmi_dev_uninit(struct net_device *dev)
struct xfrm_if *xi = netdev_priv(dev);
struct xfrmi_net *xfrmn = net_generic(xi->net, xfrmi_net_id);
- xfrmi_unlink(xfrmn, xi);
+ if (xi->p.collect_md)
+ rcu_assign_pointer(xfrmn->collect_md_xfrmi, NULL);
+ else
+ xfrmi_unlink(xfrmn, xi);
}
static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
@@ -254,6 +285,16 @@ static int xfrmi_rcv_cb(struct sk_buff *skb, int err)
}
xfrmi_scrub_packet(skb, xnet);
+ if (xi->p.collect_md) {
+ struct metadata_dst *tun_dst;
+
+ tun_dst = tun_rx_dst(0);
+ if (!tun_dst)
+ return -ENOMEM;
+
+ tun_dst->u.tun_info.key.tun_id = if_id_to_tunnel_id(x->if_id);
+ skb_dst_set(skb, (struct dst_entry *)tun_dst);
+ }
dev_sw_netstats_rx_add(dev, skb->len);
return 0;
@@ -269,10 +310,24 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
struct net_device *tdev;
struct xfrm_state *x;
int err = -1;
+ u32 if_id;
int mtu;
+ if (xi->p.collect_md) {
+ struct ip_tunnel_info *tun_info = skb_tunnel_info(skb);
+
+ if (unlikely(!tun_info ||
+ !(tun_info->mode & IP_TUNNEL_INFO_TX))) {
+ return -EINVAL;
+ }
+
+ if_id = tunnel_id_to_if_id(tun_info->key.tun_id);
+ } else {
+ if_id = xi->p.if_id;
+ }
+
dst_hold(dst);
- dst = xfrm_lookup_with_ifid(xi->net, dst, fl, NULL, 0, xi->p.if_id);
+ dst = xfrm_lookup_with_ifid(xi->net, dst, fl, NULL, 0, if_id);
if (IS_ERR(dst)) {
err = PTR_ERR(dst);
dst = NULL;
@@ -283,7 +338,7 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
if (!x)
goto tx_err_link_failure;
- if (x->if_id != xi->p.if_id)
+ if (x->if_id != if_id)
goto tx_err_link_failure;
tdev = dst->dev;
@@ -633,6 +688,9 @@ static void xfrmi_netlink_parms(struct nlattr *data[],
if (data[IFLA_XFRM_IF_ID])
parms->if_id = nla_get_u32(data[IFLA_XFRM_IF_ID]);
+
+ if (data[IFLA_XFRM_COLLECT_METADATA])
+ parms->collect_md = true;
}
static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
@@ -645,9 +703,20 @@ static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
int err;
xfrmi_netlink_parms(data, &p);
- xi = xfrmi_locate(net, &p);
- if (xi)
- return -EEXIST;
+ if (p.collect_md) {
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+ if (p.link || p.if_id)
+ return -EINVAL;
+
+ if (rtnl_dereference(xfrmn->collect_md_xfrmi))
+ return -EEXIST;
+
+ } else {
+ xi = xfrmi_locate(net, &p);
+ if (xi)
+ return -EEXIST;
+ }
xi = netdev_priv(dev);
xi->p = p;
@@ -672,12 +741,17 @@ static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
struct xfrm_if_parms p;
xfrmi_netlink_parms(data, &p);
+ if (p.collect_md)
+ return -EINVAL;
+
xi = xfrmi_locate(net, &p);
if (!xi) {
xi = netdev_priv(dev);
} else {
if (xi->dev != dev)
return -EEXIST;
+ if (xi->p.collect_md)
+ return -EINVAL;
}
return xfrmi_update(xi, &p);
@@ -690,6 +764,8 @@ static size_t xfrmi_get_size(const struct net_device *dev)
nla_total_size(4) +
/* IFLA_XFRM_IF_ID */
nla_total_size(4) +
+ /* IFLA_XFRM_COLLECT_METADATA */
+ nla_total_size(0) +
0;
}
@@ -701,6 +777,10 @@ static int xfrmi_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (nla_put_u32(skb, IFLA_XFRM_LINK, parm->link) ||
nla_put_u32(skb, IFLA_XFRM_IF_ID, parm->if_id))
goto nla_put_failure;
+ if (xi->p.collect_md) {
+ if (nla_put_flag(skb, IFLA_XFRM_COLLECT_METADATA))
+ goto nla_put_failure;
+ }
return 0;
nla_put_failure:
@@ -715,8 +795,9 @@ static struct net *xfrmi_get_link_net(const struct net_device *dev)
}
static const struct nla_policy xfrmi_policy[IFLA_XFRM_MAX + 1] = {
- [IFLA_XFRM_LINK] = { .type = NLA_U32 },
- [IFLA_XFRM_IF_ID] = { .type = NLA_U32 },
+ [IFLA_XFRM_LINK] = { .type = NLA_U32 },
+ [IFLA_XFRM_IF_ID] = { .type = NLA_U32 },
+ [IFLA_XFRM_COLLECT_METADATA] = { .type = NLA_FLAG },
};
static struct rtnl_link_ops xfrmi_link_ops __read_mostly = {
@@ -752,6 +833,9 @@ static void __net_exit xfrmi_exit_batch_net(struct list_head *net_exit_list)
xip = &xi->next)
unregister_netdevice_queue(xi->dev, &list);
}
+ xi = rcu_dereference(xfrmn->collect_md_xfrmi);
+ if (xi)
+ unregister_netdevice_queue(xi->dev, &list);
}
unregister_netdevice_many(&list);
rtnl_unlock();
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index d622c2548d22..4da5ea277500 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -3519,17 +3519,17 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb,
int xerr_idx = -1;
const struct xfrm_if_cb *ifcb;
struct sec_path *sp;
- struct xfrm_if *xi;
u32 if_id = 0;
rcu_read_lock();
ifcb = xfrm_if_get_cb();
if (ifcb) {
- xi = ifcb->decode_session(skb, family);
- if (xi) {
- if_id = xi->p.if_id;
- net = xi->net;
+ struct xfrm_if_decode_session_params p;
+
+ if (ifcb->decode_session(skb, family, &p)) {
+ if_id = p.if_id;
+ net = p.net;
}
}
rcu_read_unlock();
--
2.25.1
^ permalink raw reply related
* Re: [PATCH v3 net-next 0/3] net/sched: fix over mtu packet of defrag in
From: Jamal Hadi Salim @ 2020-11-21 14:39 UTC (permalink / raw)
To: Cong Wang, wenxu
Cc: Jakub Kicinski, Marcelo Ricardo Leitner, Vlad Buslov,
Linux Kernel Network Developers
In-Reply-To: <CAM_iQpV1Lyw4yNUEof1kERA1vWLediDGAsfHf_UVxuS2HMNHYg@mail.gmail.com>
On 2020-11-20 2:28 p.m., Cong Wang wrote:
> On Thu, Nov 19, 2020 at 3:39 PM <wenxu@ucloud.cn> wrote:
>>
>> From: wenxu <wenxu@ucloud.cn>
>>
>> Currently kernel tc subsystem can do conntrack in act_ct. But when several
>> fragment packets go through the act_ct, function tcf_ct_handle_fragments
>> will defrag the packets to a big one. But the last action will redirect
>> mirred to a device which maybe lead the reassembly big packet over the mtu
>> of target device.
>>
>> The first patch fix miss init the qdisc_skb_cb->mru
>> The send one refactor the hanle of xmit in act_mirred and prepare for the
>> third one
>> The last one add implict packet fragment support to fix the over mtu for
>> defrag in act_ct.
>
> Overall it looks much better to me now, so:
>
> Acked-by: Cong Wang <cong.wang@bytedance.com>
LGTM as well.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
cheers,
jamal
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: dsa: tag_hellcreek: Cleanup includes
From: Florian Fainelli @ 2020-11-21 15:00 UTC (permalink / raw)
To: Kurt Kanzenbach, Andrew Lunn, Vivien Didelot, Vladimir Oltean
Cc: David S. Miller, Jakub Kicinski, netdev
In-Reply-To: <20201121114455.22422-2-kurt@linutronix.de>
On 11/21/2020 3:44 AM, Kurt Kanzenbach wrote:
> Remove unused and add needed includes. No functional change.
>
> Suggested-by: Vladimir Oltean <olteanv@gmail.com>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: dsa: hellcreek: Don't print error message on defer
From: Florian Fainelli @ 2020-11-21 15:01 UTC (permalink / raw)
To: Kurt Kanzenbach, Andrew Lunn, Vivien Didelot, Vladimir Oltean
Cc: David S. Miller, Jakub Kicinski, netdev
In-Reply-To: <20201121114455.22422-3-kurt@linutronix.de>
On 11/21/2020 3:44 AM, Kurt Kanzenbach wrote:
> When DSA is not loaded when the driver is probed an error message is
> printed. But, that's not really an error, just a defer. Use dev_err_probe()
> instead.
>
> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* [PATCH net-next] net: sched: alias action flags with TCA_ACT_ prefix
From: Vlad Buslov @ 2020-11-21 16:09 UTC (permalink / raw)
To: netdev, davem, kuba; +Cc: jhs, xiyou.wangcong, jiri, Vlad Buslov
Currently both filter and action flags use same "TCA_" prefix which makes
them hard to distinguish to code and confusing for users. Create aliases
for existing action flags constants with "TCA_ACT_" prefix.
Signed-off-by: Vlad Buslov <vlad@buslov.dev>
---
include/uapi/linux/rtnetlink.h | 11 +++++++----
net/sched/act_api.c | 10 +++++-----
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 2ffbef5da6c1..91db081e5360 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -768,16 +768,19 @@ enum {
#define TA_PAYLOAD(n) NLMSG_PAYLOAD(n,sizeof(struct tcamsg))
/* tcamsg flags stored in attribute TCA_ROOT_FLAGS
*
- * TCA_FLAG_LARGE_DUMP_ON user->kernel to request for larger than TCA_ACT_MAX_PRIO
- * actions in a dump. All dump responses will contain the number of actions
- * being dumped stored in for user app's consumption in TCA_ROOT_COUNT
+ * TCA_ACT_FLAG_LARGE_DUMP_ON user->kernel to request for larger than
+ * TCA_ACT_MAX_PRIO actions in a dump. All dump responses will contain the
+ * number of actions being dumped stored in for user app's consumption in
+ * TCA_ROOT_COUNT
*
- * TCA_FLAG_TERSE_DUMP user->kernel to request terse (brief) dump that only
+ * TCA_ACT_FLAG_TERSE_DUMP user->kernel to request terse (brief) dump that only
* includes essential action info (kind, index, etc.)
*
*/
#define TCA_FLAG_LARGE_DUMP_ON (1 << 0)
+#define TCA_ACT_FLAG_LARGE_DUMP_ON TCA_FLAG_LARGE_DUMP_ON
#define TCA_FLAG_TERSE_DUMP (1 << 1)
+#define TCA_ACT_FLAG_TERSE_DUMP TCA_FLAG_TERSE_DUMP
/* New extended info filters for IFLA_EXT_MASK */
#define RTEXT_FILTER_VF (1 << 0)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index fc23f46a315c..99db1c77426b 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -278,7 +278,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
index--;
goto nla_put_failure;
}
- err = (act_flags & TCA_FLAG_TERSE_DUMP) ?
+ err = (act_flags & TCA_ACT_FLAG_TERSE_DUMP) ?
tcf_action_dump_terse(skb, p, true) :
tcf_action_dump_1(skb, p, 0, 0);
if (err < 0) {
@@ -288,7 +288,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
}
nla_nest_end(skb, nest);
n_i++;
- if (!(act_flags & TCA_FLAG_LARGE_DUMP_ON) &&
+ if (!(act_flags & TCA_ACT_FLAG_LARGE_DUMP_ON) &&
n_i >= TCA_ACT_MAX_PRIO)
goto done;
}
@@ -298,7 +298,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
mutex_unlock(&idrinfo->lock);
if (n_i) {
- if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
+ if (act_flags & TCA_ACT_FLAG_LARGE_DUMP_ON)
cb->args[1] = n_i;
}
return n_i;
@@ -1473,8 +1473,8 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
}
static const struct nla_policy tcaa_policy[TCA_ROOT_MAX + 1] = {
- [TCA_ROOT_FLAGS] = NLA_POLICY_BITFIELD32(TCA_FLAG_LARGE_DUMP_ON |
- TCA_FLAG_TERSE_DUMP),
+ [TCA_ROOT_FLAGS] = NLA_POLICY_BITFIELD32(TCA_ACT_FLAG_LARGE_DUMP_ON |
+ TCA_ACT_FLAG_TERSE_DUMP),
[TCA_ROOT_TIME_DELTA] = { .type = NLA_U32 },
};
--
2.29.1
^ permalink raw reply related
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Ido Schimmel @ 2020-11-21 16:09 UTC (permalink / raw)
To: Aleksandr Nogikh, fw
Cc: davem, kuba, johannes, edumazet, andreyknvl, dvyukov, elver,
linux-kernel, netdev, linux-wireless, willemdebruijn.kernel,
Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201029173620.2121359-3-aleksandrnogikh@gmail.com>
+ Florian
On Thu, Oct 29, 2020 at 05:36:19PM +0000, Aleksandr Nogikh wrote:
> From: Aleksandr Nogikh <nogikh@google.com>
>
> Remote KCOV coverage collection enables coverage-guided fuzzing of the
> code that is not reachable during normal system call execution. It is
> especially helpful for fuzzing networking subsystems, where it is
> common to perform packet handling in separate work queues even for the
> packets that originated directly from the user space.
>
> Enable coverage-guided frame injection by adding kcov remote handle to
> skb extensions. Default initialization in __alloc_skb and
> __build_skb_around ensures that no socket buffer that was generated
> during a system call will be missed.
>
> Code that is of interest and that performs packet processing should be
> annotated with kcov_remote_start()/kcov_remote_stop().
>
> An alternative approach is to determine kcov_handle solely on the
> basis of the device/interface that received the specific socket
> buffer. However, in this case it would be impossible to distinguish
> between packets that originated during normal background network
> processes or were intentionally injected from the user space.
>
> Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
[...]
> @@ -249,6 +249,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
>
> fclones->skb2.fclone = SKB_FCLONE_CLONE;
> }
> +
> + skb_set_kcov_handle(skb, kcov_common_handle());
Hi,
This causes skb extensions to be allocated for the allocated skb, but
there are instances that blindly overwrite 'skb->extensions' by invoking
skb_copy_header() after __alloc_skb(). For example, skb_copy(),
__pskb_copy_fclone() and skb_copy_expand(). This results in the skb
extensions being leaked [1].
One possible solution is to try to patch all these instances with
skb_ext_put() before skb_copy_header().
Another possible solution is to convert skb_copy_header() to use
skb_ext_copy() instead of __skb_ext_copy(). It will first drop the
reference on the skb extensions of the new skb, but it assumes that
'skb->active_extensions' is valid. This is not the case in the
skb_clone() path so we should probably zero this field in __skb_clone().
Other suggestions?
Thanks
[1]
BUG: memory leak
unreferenced object 0xffff888027f9a490 (size 16):
comm "syz-executor.0", pid 1155, jiffies 4295996826 (age 66.927s)
hex dump (first 16 bytes):
01 00 00 00 01 02 6b 6b 01 00 00 00 00 00 00 00 ......kk........
backtrace:
[<0000000005a5f2c4>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<0000000005a5f2c4>] slab_post_alloc_hook mm/slab.h:528 [inline]
[<0000000005a5f2c4>] slab_alloc_node mm/slub.c:2891 [inline]
[<0000000005a5f2c4>] slab_alloc mm/slub.c:2899 [inline]
[<0000000005a5f2c4>] kmem_cache_alloc+0x173/0x800 mm/slub.c:2904
[<00000000c5e43ea9>] __skb_ext_alloc+0x22/0x90 net/core/skbuff.c:6173
[<000000000de35e81>] skb_ext_add+0x230/0x4a0 net/core/skbuff.c:6268
[<000000003b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4622 [inline]
[<000000003b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4612 [inline]
[<000000003b7efba4>] __alloc_skb+0x47f/0x6a0 net/core/skbuff.c:253
[<000000007f789b23>] skb_copy+0x151/0x310 net/core/skbuff.c:1512
[<000000001ce26864>] mlxsw_emad_transmit+0x4e/0x620 drivers/net/ethernet/mellanox/mlxsw/core.c:585
[<000000005c732123>] mlxsw_emad_reg_access drivers/net/ethernet/mellanox/mlxsw/core.c:829 [inline]
[<000000005c732123>] mlxsw_core_reg_access_emad+0xda8/0x1770 drivers/net/ethernet/mellanox/mlxsw/core.c:2408
[<00000000c07840b3>] mlxsw_core_reg_access+0x101/0x7f0 drivers/net/ethernet/mellanox/mlxsw/core.c:2583
[<000000007c47f30f>] mlxsw_reg_write+0x30/0x40 drivers/net/ethernet/mellanox/mlxsw/core.c:2603
[<00000000675e3fc7>] mlxsw_sp_port_admin_status_set+0x8a7/0x980 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:300
[<00000000fefe35a4>] mlxsw_sp_port_stop+0x63/0x70 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:537
[<00000000c41390e8>] __dev_close_many+0x1c7/0x300 net/core/dev.c:1607
[<00000000628c5987>] __dev_close net/core/dev.c:1619 [inline]
[<00000000628c5987>] __dev_change_flags+0x2b9/0x710 net/core/dev.c:8421
[<000000008cc810c6>] dev_change_flags+0x97/0x170 net/core/dev.c:8494
[<0000000053274a78>] do_setlink+0xa5b/0x3b80 net/core/rtnetlink.c:2706
[<00000000e4085785>] rtnl_group_changelink net/core/rtnetlink.c:3225 [inline]
[<00000000e4085785>] __rtnl_newlink+0xe06/0x17d0 net/core/rtnetlink.c:3379
^ permalink raw reply
* [RFC] MAINTAINERS tag for cleanup robot
From: trix @ 2020-11-21 16:50 UTC (permalink / raw)
To: trix, joe, clang-built-linux
Cc: linux-hyperv, linux-kernel, xen-devel, tboot-devel, kvm,
linux-crypto, linux-acpi, devel, amd-gfx, dri-devel, intel-gfx,
netdev, linux-media, MPT-FusionLinux.pdl, linux-scsi,
linux-wireless, ibm-acpi-devel, platform-driver-x86, linux-usb,
linux-omap, linux-fbdev, ecryptfs, linux-fsdevel, cluster-devel,
linux-mtd, keyrings, netfilter-devel, coreteam, alsa-devel, bpf,
linux-bluetooth, linux-nfs, patches
A difficult part of automating commits is composing the subsystem
preamble in the commit log. For the ongoing effort of a fixer producing
one or two fixes a release the use of 'treewide:' does not seem appropriate.
It would be better if the normal prefix was used. Unfortunately normal is
not consistent across the tree.
So I am looking for comments for adding a new tag to the MAINTAINERS file
D: Commit subsystem prefix
ex/ for FPGA DFL DRIVERS
D: fpga: dfl:
Continuing with cleaning up clang's -Wextra-semi-stmt
A significant number of warnings are caused by function like macros with
a trailing semicolon. For example.
#define FOO(a) a++; <-- extra, unneeded semicolon
void bar() {
int v = 0;
FOO(a);
}
Clang will warn at the FOO(a); expansion location. Instead of removing
the semicolon there, the fixer removes semicolon from the macro
definition. After the fixer, the code will be:
#define FOO(a) a++
void bar() {
int v = 0;
FOO(a);
}
The fixer review is
https://reviews.llvm.org/D91789
A run over allyesconfig for x86_64 finds 62 issues, 5 are false positives.
The false positives are caused by macros passed to other macros and by
some macro expansions that did not have an extra semicolon.
This cleans up about 1,000 of the current 10,000 -Wextra-semi-stmt
warnings in linux-next.
An update to [RFC] clang tooling cleanup
This change adds the clang-tidy-fix as a top level target and
uses it to do the cleaning. The next iteration will do a loop of
cleaners. This will mean breaking clang-tidy-fix out into its own
processing function 'run_fixers'.
Makefile: Add toplevel target clang-tidy-fix to makefile
Calls clang-tidy with -fix option for a set of checkers that
programatically fixes the kernel source in place, treewide.
Signed-off-by: Tom Rix <trix@redhat.com>
---
Makefile | 7 ++++---
scripts/clang-tools/run-clang-tools.py | 20 +++++++++++++++++---
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/Makefile b/Makefile
index 47a8add4dd28..57756dbb767b 100644
--- a/Makefile
+++ b/Makefile
@@ -1567,20 +1567,21 @@ help:
echo ''
@echo 'Static analysers:'
@echo ' checkstack - Generate a list of stack hogs'
@echo ' versioncheck - Sanity check on version.h usage'
@echo ' includecheck - Check for duplicate included header files'
@echo ' export_report - List the usages of all exported symbols'
@echo ' headerdep - Detect inclusion cycles in headers'
@echo ' coccicheck - Check with Coccinelle'
@echo ' clang-analyzer - Check with clang static analyzer'
@echo ' clang-tidy - Check with clang-tidy'
+ @echo ' clang-tidy-fix - Check and fix with clang-tidy'
@echo ''
@echo 'Tools:'
@echo ' nsdeps - Generate missing symbol namespace dependencies'
@echo ''
@echo 'Kernel selftest:'
@echo ' kselftest - Build and run kernel selftest'
@echo ' Build, install, and boot kernel before'
@echo ' running kselftest on it'
@echo ' Run as root for full coverage'
@echo ' kselftest-all - Build kernel selftest'
@@ -1842,30 +1843,30 @@ nsdeps: modules
quiet_cmd_gen_compile_commands = GEN $@
cmd_gen_compile_commands = $(PYTHON3) $< -a $(AR) -o $@ $(filter-out $<, $(real-prereqs))
$(extmod-prefix)compile_commands.json: scripts/clang-tools/gen_compile_commands.py \
$(if $(KBUILD_EXTMOD),,$(KBUILD_VMLINUX_OBJS) $(KBUILD_VMLINUX_LIBS)) \
$(if $(CONFIG_MODULES), $(MODORDER)) FORCE
$(call if_changed,gen_compile_commands)
targets += $(extmod-prefix)compile_commands.json
-PHONY += clang-tidy clang-analyzer
+PHONY += clang-tidy-fix clang-tidy clang-analyzer
ifdef CONFIG_CC_IS_CLANG
quiet_cmd_clang_tools = CHECK $<
cmd_clang_tools = $(PYTHON3) $(srctree)/scripts/clang-tools/run-clang-tools.py $@ $<
-clang-tidy clang-analyzer: $(extmod-prefix)compile_commands.json
+clang-tidy-fix clang-tidy clang-analyzer: $(extmod-prefix)compile_commands.json
$(call cmd,clang_tools)
else
-clang-tidy clang-analyzer:
+clang-tidy-fix clang-tidy clang-analyzer:
@echo "$@ requires CC=clang" >&2
@false
endif
# Scripts to check various things for consistency
# ---------------------------------------------------------------------------
PHONY += includecheck versioncheck coccicheck export_report
includecheck:
diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
index fa7655c7cec0..c177ca822c56 100755
--- a/scripts/clang-tools/run-clang-tools.py
+++ b/scripts/clang-tools/run-clang-tools.py
@@ -22,43 +22,57 @@ def parse_arguments():
Returns:
args: Dict of parsed args
Has keys: [path, type]
"""
usage = """Run clang-tidy or the clang static-analyzer on a
compilation database."""
parser = argparse.ArgumentParser(description=usage)
type_help = "Type of analysis to be performed"
parser.add_argument("type",
- choices=["clang-tidy", "clang-analyzer"],
+ choices=["clang-tidy-fix", "clang-tidy", "clang-analyzer"],
help=type_help)
path_help = "Path to the compilation database to parse"
parser.add_argument("path", type=str, help=path_help)
return parser.parse_args()
def init(l, a):
global lock
global args
lock = l
args = a
def run_analysis(entry):
# Disable all checks, then re-enable the ones we want
checks = "-checks=-*,"
- if args.type == "clang-tidy":
+ fix = ""
+ header_filter = ""
+ if args.type == "clang-tidy-fix":
+ checks += "linuxkernel-macro-trailing-semi"
+ #
+ # Fix this
+ # #define M(a) a++; <-- clang-tidy fixes the problem here
+ # int f() {
+ # int v = 0;
+ # M(v); <-- clang reports problem here
+ # return v;
+ # }
+ fix += "-fix"
+ header_filter += "-header-filter=.*"
+ elif args.type == "clang-tidy":
checks += "linuxkernel-*"
else:
checks += "clang-analyzer-*"
- p = subprocess.run(["clang-tidy", "-p", args.path, checks, entry["file"]],
+ p = subprocess.run(["clang-tidy", "-p", args.path, checks, header_filter, fix, entry["file"]],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
cwd=entry["directory"])
with lock:
sys.stderr.buffer.write(p.stdout)
def main():
args = parse_arguments()
--
2.18.4
^ permalink raw reply related
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Florian Westphal @ 2020-11-21 16:52 UTC (permalink / raw)
To: Ido Schimmel
Cc: Aleksandr Nogikh, fw, davem, kuba, johannes, edumazet, andreyknvl,
dvyukov, elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201121160941.GA485907@shredder.lan>
Ido Schimmel <idosch@idosch.org> wrote:
> On Thu, Oct 29, 2020 at 05:36:19PM +0000, Aleksandr Nogikh wrote:
> > From: Aleksandr Nogikh <nogikh@google.com>
> >
> > Remote KCOV coverage collection enables coverage-guided fuzzing of the
> > code that is not reachable during normal system call execution. It is
> > especially helpful for fuzzing networking subsystems, where it is
> > common to perform packet handling in separate work queues even for the
> > packets that originated directly from the user space.
> >
> > Enable coverage-guided frame injection by adding kcov remote handle to
> > skb extensions. Default initialization in __alloc_skb and
> > __build_skb_around ensures that no socket buffer that was generated
> > during a system call will be missed.
> >
> > Code that is of interest and that performs packet processing should be
> > annotated with kcov_remote_start()/kcov_remote_stop().
> >
> > An alternative approach is to determine kcov_handle solely on the
> > basis of the device/interface that received the specific socket
> > buffer. However, in this case it would be impossible to distinguish
> > between packets that originated during normal background network
> > processes or were intentionally injected from the user space.
> >
> > Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
> > Acked-by: Willem de Bruijn <willemb@google.com>
>
> [...]
>
> > @@ -249,6 +249,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
> >
> > fclones->skb2.fclone = SKB_FCLONE_CLONE;
> > }
> > +
> > + skb_set_kcov_handle(skb, kcov_common_handle());
>
> Hi,
>
> This causes skb extensions to be allocated for the allocated skb, but
> there are instances that blindly overwrite 'skb->extensions' by invoking
> skb_copy_header() after __alloc_skb(). For example, skb_copy(),
> __pskb_copy_fclone() and skb_copy_expand(). This results in the skb
> extensions being leaked [1].
[..]
> Other suggestions?
Aleksandr, why was this made into an skb extension in the first place?
AFAIU this feature is usually always disabled at build time.
For debug builds (test farm /debug kernel etc) its always needed.
If thats the case this u64 should be an sk_buff member, not an
extension.
^ permalink raw reply
* Re: [RFC] MAINTAINERS tag for cleanup robot
From: Joe Perches @ 2020-11-21 17:10 UTC (permalink / raw)
To: trix, clang-built-linux
Cc: linux-hyperv, linux-kernel, xen-devel, tboot-devel, kvm,
linux-crypto, linux-acpi, devel, amd-gfx, dri-devel, intel-gfx,
netdev, linux-media, MPT-FusionLinux.pdl, linux-scsi,
linux-wireless, ibm-acpi-devel, platform-driver-x86, linux-usb,
linux-omap, linux-fbdev, ecryptfs, linux-fsdevel, cluster-devel,
linux-mtd, keyrings, netfilter-devel, coreteam, alsa-devel, bpf,
linux-bluetooth, linux-nfs, patches
In-Reply-To: <20201121165058.1644182-1-trix@redhat.com>
On Sat, 2020-11-21 at 08:50 -0800, trix@redhat.com wrote:
> A difficult part of automating commits is composing the subsystem
> preamble in the commit log. For the ongoing effort of a fixer producing
> one or two fixes a release the use of 'treewide:' does not seem appropriate.
>
> It would be better if the normal prefix was used. Unfortunately normal is
> not consistent across the tree.
>
> So I am looking for comments for adding a new tag to the MAINTAINERS file
>
> D: Commit subsystem prefix
>
> ex/ for FPGA DFL DRIVERS
>
> D: fpga: dfl:
I'm all for it. Good luck with the effort. It's not completely trivial.
From a decade ago:
https://lore.kernel.org/lkml/1289919077.28741.50.camel@Joe-Laptop/
(and that thread started with extra semicolon patches too)
> Continuing with cleaning up clang's -Wextra-semi-stmt
> diff --git a/Makefile b/Makefile
[]
> @@ -1567,20 +1567,21 @@ help:
> echo ''
> @echo 'Static analysers:'
> @echo ' checkstack - Generate a list of stack hogs'
> @echo ' versioncheck - Sanity check on version.h usage'
> @echo ' includecheck - Check for duplicate included header files'
> @echo ' export_report - List the usages of all exported symbols'
> @echo ' headerdep - Detect inclusion cycles in headers'
> @echo ' coccicheck - Check with Coccinelle'
> @echo ' clang-analyzer - Check with clang static analyzer'
> @echo ' clang-tidy - Check with clang-tidy'
> + @echo ' clang-tidy-fix - Check and fix with clang-tidy'
A pity the ordering of the code below isn't the same as the above.
> -PHONY += clang-tidy clang-analyzer
> +PHONY += clang-tidy-fix clang-tidy clang-analyzer
[]
> -clang-tidy clang-analyzer: $(extmod-prefix)compile_commands.json
> +clang-tidy-fix clang-tidy clang-analyzer: $(extmod-prefix)compile_commands.json
> $(call cmd,clang_tools)
> else
> -clang-tidy clang-analyzer:
> +clang-tidy-fix clang-tidy clang-analyzer:
[]
> diff --git a/scripts/clang-tools/run-clang-tools.py b/scripts/clang-tools/run-clang-tools.py
[]
> @@ -22,43 +22,57 @@ def parse_arguments():
[]
> parser.add_argument("type",
> - choices=["clang-tidy", "clang-analyzer"],
> + choices=["clang-tidy-fix", "clang-tidy", "clang-analyzer"],
etc...
^ permalink raw reply
* Re: [RFC] MAINTAINERS tag for cleanup robot
From: James Bottomley @ 2020-11-21 17:18 UTC (permalink / raw)
To: trix, joe, clang-built-linux
Cc: linux-hyperv, linux-kernel, xen-devel, tboot-devel, kvm,
linux-crypto, linux-acpi, devel, amd-gfx, dri-devel, intel-gfx,
netdev, linux-media, MPT-FusionLinux.pdl, linux-scsi,
linux-wireless, ibm-acpi-devel, platform-driver-x86, linux-usb,
linux-omap, linux-fbdev, ecryptfs, linux-fsdevel, cluster-devel,
linux-mtd, keyrings, netfilter-devel, coreteam, alsa-devel, bpf,
linux-bluetooth, linux-nfs, patches
In-Reply-To: <20201121165058.1644182-1-trix@redhat.com>
On Sat, 2020-11-21 at 08:50 -0800, trix@redhat.com wrote:
> A difficult part of automating commits is composing the subsystem
> preamble in the commit log. For the ongoing effort of a fixer
> producing
> one or two fixes a release the use of 'treewide:' does not seem
> appropriate.
>
> It would be better if the normal prefix was used. Unfortunately
> normal is
> not consistent across the tree.
>
>
> D: Commit subsystem prefix
>
> ex/ for FPGA DFL DRIVERS
>
> D: fpga: dfl:
>
I've got to bet this is going to cause more issues than it solves.
SCSI uses scsi: <driver>: for drivers but not every driver has a
MAINTAINERS entry. We use either scsi: or scsi: core: for mid layer
things, but we're not consistent. Block uses blk-<something>: for all
of it's stuff but almost no <somtehing>s have a MAINTAINERS entry. So
the next thing you're going to cause is an explosion of suggested
MAINTAINERs entries.
Has anyone actually complained about treewide:?
James
^ permalink raw reply
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Johannes Berg @ 2020-11-21 17:39 UTC (permalink / raw)
To: Florian Westphal, Ido Schimmel
Cc: Aleksandr Nogikh, davem, kuba, edumazet, andreyknvl, dvyukov,
elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201121165227.GT15137@breakpoint.cc>
On Sat, 2020-11-21 at 17:52 +0100, Florian Westphal wrote:
>
> Aleksandr, why was this made into an skb extension in the first place?
>
> AFAIU this feature is usually always disabled at build time.
> For debug builds (test farm /debug kernel etc) its always needed.
>
> If thats the case this u64 should be an sk_buff member, not an
> extension.
Because it was requested :-)
https://lore.kernel.org/netdev/20201009161558.57792e1a@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
johannes
^ permalink raw reply
* [PATCH net-next] compat: always include linux/compat.h from net/compat.h
From: Jakub Kicinski @ 2020-11-21 17:52 UTC (permalink / raw)
To: davem; +Cc: netdev, hch, arnd, Jakub Kicinski
We're about to do some reshuffling in networking headers and make
some of the file lose the implicit includes. This results in:
In file included from net/ipv4/netfilter/arp_tables.c:26:
include/net/compat.h:57:23: error: conflicting types for ‘uintptr_t’
#define compat_uptr_t uintptr_t
^~~~~~~~~
include/asm-generic/compat.h:22:13: note: in expansion of macro ‘compat_uptr_t’
typedef u32 compat_uptr_t;
^~~~~~~~~~~~~
In file included from include/linux/limits.h:6,
from include/linux/kernel.h:7,
from net/ipv4/netfilter/arp_tables.c:14:
include/linux/types.h:37:24: note: previous declaration of ‘uintptr_t’ was here
typedef unsigned long uintptr_t;
^~~~~~~~~
Currently net/compat.h depends on linux/compat.h being included
first. After the upcoming changes this would break the 32bit build.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
Not sure who officially maintains this. Arnd, Christoph any objections?
include/net/compat.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/compat.h b/include/net/compat.h
index 745db0d605b6..08a089bbaecc 100644
--- a/include/net/compat.h
+++ b/include/net/compat.h
@@ -5,10 +5,10 @@
struct sock;
-#if defined(CONFIG_COMPAT)
-
#include <linux/compat.h>
+#if defined(CONFIG_COMPAT)
+
struct compat_msghdr {
compat_uptr_t msg_name; /* void * */
compat_int_t msg_namelen;
--
2.24.1
^ permalink raw reply related
* Re: [RFC] MAINTAINERS tag for cleanup robot
From: Joe Perches @ 2020-11-21 18:02 UTC (permalink / raw)
To: James Bottomley, trix, clang-built-linux
Cc: linux-hyperv, linux-kernel, xen-devel, tboot-devel, kvm,
linux-crypto, linux-acpi, devel, amd-gfx, dri-devel, intel-gfx,
netdev, linux-media, MPT-FusionLinux.pdl, linux-scsi,
linux-wireless, ibm-acpi-devel, platform-driver-x86, linux-usb,
linux-omap, linux-fbdev, ecryptfs, linux-fsdevel, cluster-devel,
linux-mtd, keyrings, netfilter-devel, coreteam, alsa-devel, bpf,
linux-bluetooth, linux-nfs, patches
In-Reply-To: <5843ef910b0e86c00d9c0143dec20f93823b016b.camel@HansenPartnership.com>
On Sat, 2020-11-21 at 09:18 -0800, James Bottomley wrote:
> On Sat, 2020-11-21 at 08:50 -0800, trix@redhat.com wrote:
> > A difficult part of automating commits is composing the subsystem
> > preamble in the commit log. For the ongoing effort of a fixer
> > producing one or two fixes a release the use of 'treewide:' does
> > not seem appropriate.
> >
> > It would be better if the normal prefix was used. Unfortunately
> > normal is not consistent across the tree.
> >
> > D: Commit subsystem prefix
> >
> > ex/ for FPGA DFL DRIVERS
> >
> > D: fpga: dfl:
>
> I've got to bet this is going to cause more issues than it solves.
> SCSI uses scsi: <driver>: for drivers but not every driver has a
> MAINTAINERS entry. We use either scsi: or scsi: core: for mid layer
> things, but we're not consistent. Block uses blk-<something>: for all
> of it's stuff but almost no <somtehing>s have a MAINTAINERS entry. So
> the next thing you're going to cause is an explosion of suggested
> MAINTAINERs entries.
As well as some changes require simultaneous changes across
multiple subsystems.
> Has anyone actually complained about treewide:?
It depends on what you mean by treewide:
If a treewide: patch is applied by some "higher level" maintainer,
then generally, no.
If the treewide patch is also cc'd to many individual maintainers,
then yes, many many times.
Mostly because patches cause what is in their view churn or that
changes are not specific to their subsystem grounds.
The treewide patch is sometimes dropped, sometimes broken up and
generally not completely applied.
What would be useful in many cases like this is for a pre and post
application of the treewide patch to be compiled and the object
code verified for lack of any logic change.
Unfortunately, gcc does not guarantee deterministic compilation so
this isn't feasible with at least gcc. Does clang guarantee this?
I'm not sure it's possible:
https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html
^ permalink raw reply
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Jakub Kicinski @ 2020-11-21 18:06 UTC (permalink / raw)
To: Florian Westphal
Cc: Ido Schimmel, Aleksandr Nogikh, davem, johannes, edumazet,
andreyknvl, dvyukov, elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201121165227.GT15137@breakpoint.cc>
On Sat, 21 Nov 2020 17:52:27 +0100 Florian Westphal wrote:
> Ido Schimmel <idosch@idosch.org> wrote:
> > Other suggestions?
>
> Aleksandr, why was this made into an skb extension in the first place?
>
> AFAIU this feature is usually always disabled at build time.
> For debug builds (test farm /debug kernel etc) its always needed.
>
> If thats the case this u64 should be an sk_buff member, not an
> extension.
Yeah, in hindsight I should have looked at how it's used. Not a great
fit for extensions. We can go back, but...
In general I'm not very happy at how this is going. First of all just
setting the handle in a couple of allocs seems to not be enough, skbs
get cloned, reused etc. There were also build problems caused by this
patch and Aleksandr & co where nowhere to be found. Now we find out
this causes leaks, how was that not caught by the syzbot it's supposed
to serve?!
So I'm leaning towards reverting the whole thing. You can attach
kretprobes and record the information you need in BPF maps.
^ permalink raw reply
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Johannes Berg @ 2020-11-21 18:12 UTC (permalink / raw)
To: Jakub Kicinski, Florian Westphal
Cc: Ido Schimmel, Aleksandr Nogikh, davem, edumazet, andreyknvl,
dvyukov, elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201121100636.26aaaf8a@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
On Sat, 2020-11-21 at 10:06 -0800, Jakub Kicinski wrote:
> On Sat, 21 Nov 2020 17:52:27 +0100 Florian Westphal wrote:
> > Ido Schimmel <idosch@idosch.org> wrote:
> > > Other suggestions?
> >
> > Aleksandr, why was this made into an skb extension in the first place?
> >
> > AFAIU this feature is usually always disabled at build time.
> > For debug builds (test farm /debug kernel etc) its always needed.
> >
> > If thats the case this u64 should be an sk_buff member, not an
> > extension.
>
> Yeah, in hindsight I should have looked at how it's used. Not a great
> fit for extensions. We can go back, but...
>
> In general I'm not very happy at how this is going. First of all just
> setting the handle in a couple of allocs seems to not be enough, skbs
> get cloned, reused etc. There were also build problems caused by this
> patch and Aleksandr & co where nowhere to be found. Now we find out
> this causes leaks, how was that not caught by the syzbot it's supposed
> to serve?!
Heh.
> So I'm leaning towards reverting the whole thing. You can attach
> kretprobes and record the information you need in BPF maps.
I'm not going to object to reverting it (and perhaps redoing it better
later), but I will point out that kretprobe isn't going to work, you
eventually need kcov_remote_start() to be called in strategic points
before processing the skb after it bounced through the system.
IOW, it's not really about serving userland, it's about enabling (and
later disabling) coverage collection for the bits of code it cares
about, mostly because collecting it for _everything_ is going to be too
slow and will mess up the data since for coverage guided fuzzing you
really need the reported coverage data to be only about the injected
fuzz data...
johannes
^ permalink raw reply
* Re: [PATCH net-next,v3 0/9] netfilter: flowtable bridge and vlan enhancements
From: Jakub Kicinski @ 2020-11-21 18:15 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Tobias Waldekranz, netfilter-devel, davem, netdev, razor, jeremy
In-Reply-To: <20201121123138.GA21560@salvia>
On Sat, 21 Nov 2020 13:31:38 +0100 Pablo Neira Ayuso wrote:
> On Mon, Nov 16, 2020 at 11:56:58PM +0100, Pablo Neira Ayuso wrote:
> > On Mon, Nov 16, 2020 at 02:45:21PM -0800, Jakub Kicinski wrote:
> > > On Mon, 16 Nov 2020 23:36:15 +0100 Pablo Neira Ayuso wrote:
> > > > > Are you saying A -> B traffic won't match so it will update the cache,
> > > > > since conntrack flows are bi-directional?
> > > >
> > > > Yes, Traffic for A -> B won't match the flowtable entry, this will
> > > > update the cache.
> > >
> > > That's assuming there will be A -> B traffic without B sending a
> > > request which reaches A, first.
> >
> > B might send packets to A but this will not get anywhere. Assuming
> > TCP, this will trigger retransmissions so B -> A will kick in to
> > refresh the entry.
> >
> > Is this scenario that you describe a showstopper?
Sorry I got distracted.
> I have been discussing the topology update by tracking fdb updates
> with the bridge maintainer, I'll be exploring extensions to the
> existing fdb_notify() infrastructure to deal with this scenario you
> describe. On my side this topology update scenario is not a priority
> to be supported in this patchset, but it's feasible to support it
> later on.
My concern is that invalidation is _the_ hard part of creating caches.
And I feel like merging this as is would be setting our standards pretty
low.
Please gather some review tags from senior netdev developers. I don't
feel confident enough to apply this as 100% my own decision.
^ permalink raw reply
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Jakub Kicinski @ 2020-11-21 18:35 UTC (permalink / raw)
To: Johannes Berg
Cc: Florian Westphal, Ido Schimmel, Aleksandr Nogikh, davem, edumazet,
andreyknvl, dvyukov, elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <bcfb0fe1b207d2f4bb52f0d1ef51207f9b5587de.camel@sipsolutions.net>
On Sat, 21 Nov 2020 19:12:21 +0100 Johannes Berg wrote:
> > So I'm leaning towards reverting the whole thing. You can attach
> > kretprobes and record the information you need in BPF maps.
>
> I'm not going to object to reverting it (and perhaps redoing it better
> later), but I will point out that kretprobe isn't going to work, you
> eventually need kcov_remote_start() to be called in strategic points
> before processing the skb after it bounced through the system.
>
> IOW, it's not really about serving userland, it's about enabling (and
> later disabling) coverage collection for the bits of code it cares
> about, mostly because collecting it for _everything_ is going to be too
> slow and will mess up the data since for coverage guided fuzzing you
> really need the reported coverage data to be only about the injected
> fuzz data...
All you need is make kcov_remote_start_common() be BPF-able, like
the LSM hooks are now, right? And then BPF can return whatever handle
it pleases.
Or if you don't like BPF or what to KCOV BPF itself in the future you
can roll your own mechanism. The point is - this should be relatively
easily doable out of line...
^ permalink raw reply
* Re: [PATCH net-next] compat: always include linux/compat.h from net/compat.h
From: Jakub Kicinski @ 2020-11-21 18:40 UTC (permalink / raw)
To: davem; +Cc: netdev, hch, arnd
In-Reply-To: <20201121175224.1465831-1-kuba@kernel.org>
On Sat, 21 Nov 2020 09:52:24 -0800 Jakub Kicinski wrote:
> In file included from net/ipv4/netfilter/arp_tables.c:26:
> include/net/compat.h:57:23: error: conflicting types for ‘uintptr_t’
> #define compat_uptr_t uintptr_t
> ^~~~~~~~~
> include/asm-generic/compat.h:22:13: note: in expansion of macro ‘compat_uptr_t’
> typedef u32 compat_uptr_t;
> ^~~~~~~~~~~~~
> In file included from include/linux/limits.h:6,
> from include/linux/kernel.h:7,
> from net/ipv4/netfilter/arp_tables.c:14:
> include/linux/types.h:37:24: note: previous declaration of ‘uintptr_t’ was here
> typedef unsigned long uintptr_t;
> ^~~~~~~~~
Ah, damn it, I obviously copied the wrong error into the commit
message. This is the correct one (after removing include of ethtool.h
from netdevice.h):
In file included from ../net/ipv4/netfilter/arp_tables.c:26:
include/net/compat.h:60:40: error: unknown type name ‘compat_uptr_t’; did you mean ‘compat_ptr_ioctl’?
struct sockaddr __user **save_addr, compat_uptr_t *ptr,
^~~~~~~~~~~~~
compat_ptr_ioctl
include/net/compat.h:61:4: error: unknown type name ‘compat_size_t’; did you mean ‘compat_sigset_t’?
compat_size_t *len);
^~~~~~~~~~~~~
compat_sigset_t
^ permalink raw reply
* Re: [PATCH net-next,v3 0/9] netfilter: flowtable bridge and vlan enhancements
From: Pablo Neira Ayuso @ 2020-11-21 18:56 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tobias Waldekranz, netfilter-devel, davem, netdev, razor, jeremy
In-Reply-To: <20201121101551.3264c5fd@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
On Sat, Nov 21, 2020 at 10:15:51AM -0800, Jakub Kicinski wrote:
> On Sat, 21 Nov 2020 13:31:38 +0100 Pablo Neira Ayuso wrote:
> > On Mon, Nov 16, 2020 at 11:56:58PM +0100, Pablo Neira Ayuso wrote:
> > > On Mon, Nov 16, 2020 at 02:45:21PM -0800, Jakub Kicinski wrote:
> > > > On Mon, 16 Nov 2020 23:36:15 +0100 Pablo Neira Ayuso wrote:
[...]
> > I have been discussing the topology update by tracking fdb updates
> > with the bridge maintainer, I'll be exploring extensions to the
> > existing fdb_notify() infrastructure to deal with this scenario you
> > describe. On my side this topology update scenario is not a priority
> > to be supported in this patchset, but it's feasible to support it
> > later on.
>
> My concern is that invalidation is _the_ hard part of creating caches.
> And I feel like merging this as is would be setting our standards pretty
> low.
Interesting, let's summarize a bit to make sure we're on the same
page:
- This "cache" is optional, you enable it on demand through ruleset.
- This "cache" is configurable, you can specify through ruleset policy
what policies get into the cache and _when_ they are placed in the
cache.
- This is not affecting any existing default configuration, neither
Linux networking not even classic path Netfilter configurations,
it's a rather new thing.
- This is showing performance improvement of ~50% with a very simple
testbed. With pktgen, back few years ago I was reaching x2.5
performance boost in software in a pktgen testbed.
- This is adding minimal changes to netdev_ops, just a single
callback.
For the live VM migration you describe, connections might time out,
but there are many use-cases where this is still valid, some of them
has been described already here.
> Please gather some review tags from senior netdev developers. I don't
> feel confident enough to apply this as 100% my own decision.
Fair enough.
This requirement for very specific Netfilter infrastructure which does
not affect any other Networking subsystem sounds new to me.
What senior developers specifically you would like I should poke to
get an acknowledgement on this to get this accepted of your
preference?
Thank you.
^ permalink raw reply
* Re: [PATCH nf-next v3 3/3] netfilter: Introduce egress hook
From: Pablo Neira Ayuso @ 2020-11-21 18:59 UTC (permalink / raw)
To: Lukas Wunner
Cc: Daniel Borkmann, Laura García Liébana, John Fastabend,
Jozsef Kadlecsik, Florian Westphal,
Netfilter Development Mailing list, coreteam, netdev,
Alexei Starovoitov, Eric Dumazet, Thomas Graf, David Miller
In-Reply-To: <20201011082657.GB15225@wunner.de>
Hi Lukas,
On Sun, Oct 11, 2020 at 10:26:57AM +0200, Lukas Wunner wrote:
> On Tue, Sep 15, 2020 at 12:02:03AM +0200, Daniel Borkmann wrote:
> > today it is possible and
> > perfectly fine to e.g. redirect to a host-facing veth from tc ingress which
> > then goes into container. Only traffic that goes up the host stack is seen
> > by nf ingress hook in that case. Likewise, reply traffic can be redirected
> > from host-facing veth to phys dev for xmit w/o any netfilter interference.
> > This means netfilter in host ns really only sees traffic to/from host as
> > intended. This is fine today, however, if 3rd party entities (e.g. distro
> > side) start pushing down rules on the two nf hooks, then these use cases will
> > break on the egress one due to this asymmetric layering violation. Hence my
> > ask that this needs to be configurable from a control plane perspective so
> > that both use cases can live next to each other w/o breakage. Most trivial
> > one I can think of is (aside from the fact to refactor the hooks and improve
> > their performance) a flag e.g. for skb that can be set from tc/BPF layer to
> > bypass the nf hooks. Basically a flexible opt-in so that existing use-cases
> > can be retained w/o breakage.
>
> I argue that being able to filter traffic coming out of the container
> is desirable because why should the host trust the software running
> in the container to never send malicious packets.
>
> As for the flag you're asking for, it already exists in the form of
> skb->mark. Just let tc set the mark when the packet exits the container
> and add a netfilter rule to accept packets carrying that mark. Or do
> not set any netfilter egress rules at all to disable the egress
> filtering and avoid the performance impact it would imply.
Would you follow up on this? I'd really appreciate if you do.
We're lately discussing more and more usecases in the NFWS meetings
where the egress can get really useful.
Thank you.
^ permalink raw reply
* Re: [PATCH net-next,v3 0/9] netfilter: flowtable bridge and vlan enhancements
From: Jakub Kicinski @ 2020-11-21 19:23 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Tobias Waldekranz, netfilter-devel, davem, netdev, razor, jeremy
In-Reply-To: <20201121185621.GA23017@salvia>
On Sat, 21 Nov 2020 19:56:21 +0100 Pablo Neira Ayuso wrote:
> > Please gather some review tags from senior netdev developers. I don't
> > feel confident enough to apply this as 100% my own decision.
>
> Fair enough.
>
> This requirement for very specific Netfilter infrastructure which does
> not affect any other Networking subsystem sounds new to me.
You mean me asking for reviews from other senior folks when I don't
feel good about some code? I've asked others the same thing in the
past, e.g. Paolo for his RPS thing.
> What senior developers specifically you would like I should poke to
> get an acknowledgement on this to get this accepted of your
> preference?
I don't want to make a list. Maybe netconf attendees are a safe bet?
^ permalink raw reply
* Re: [PATCH v5 2/3] net: add kcov handle to skb extensions
From: Johannes Berg @ 2020-11-21 19:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Florian Westphal, Ido Schimmel, Aleksandr Nogikh, davem, edumazet,
andreyknvl, dvyukov, elver, linux-kernel, netdev, linux-wireless,
willemdebruijn.kernel, Aleksandr Nogikh, Willem de Bruijn
In-Reply-To: <20201121103529.4b4acbff@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
On Sat, 2020-11-21 at 10:35 -0800, Jakub Kicinski wrote:
> On Sat, 21 Nov 2020 19:12:21 +0100 Johannes Berg wrote:
> > > So I'm leaning towards reverting the whole thing. You can attach
> > > kretprobes and record the information you need in BPF maps.
> >
> > I'm not going to object to reverting it (and perhaps redoing it better
> > later), but I will point out that kretprobe isn't going to work, you
> > eventually need kcov_remote_start() to be called in strategic points
> > before processing the skb after it bounced through the system.
> >
> > IOW, it's not really about serving userland, it's about enabling (and
> > later disabling) coverage collection for the bits of code it cares
> > about, mostly because collecting it for _everything_ is going to be too
> > slow and will mess up the data since for coverage guided fuzzing you
> > really need the reported coverage data to be only about the injected
> > fuzz data...
>
> All you need is make kcov_remote_start_common() be BPF-able, like
> the LSM hooks are now, right? And then BPF can return whatever handle
> it pleases.
Not sure I understand. Are you saying something should call
"kcov_remote_start_common()" with, say, the SKB, and leave it to a mass
of bpf hooks to figure out where the SKB got cloned or copied or
whatnot, track that in a map, and then ... no, wait, I don't really see
what you mean, sorry.
IIUC, fundamentally, you have this:
- at the beginning, a task is tagged with "please collect coverage
data for this handle"
- this task creates an SKB, etc, and all of the code that this task
executes is captured and the coverage data is reported
- However, the SKB traverses lots of things, gets copied, cloned, or
whatnot, and eventually leaves the annotated task, say for further
processing in softirq context or elsewhere.
Now since the whole point is to see what chaos this SKB created from
beginning (allocation) to end (free), since it was filled with fuzzed
data, you now have to figure out where to pick back up when the SKB is
processed further.
This is what the infrastructure was meant to solve. But note that the
SKB might be further cloned etc, so in order to track it you'd have to
(out-of-band) figure out all the possible places where it could
be reallocated, any time the skb pointer could change.
Then, when you know you've got interesting code on your hands, like in
mac80211 that was annotated in patch 3 here, you basically say
"oohhh, this SKB was annotated before, let's continue capturing
coverage data here"
(and turn it off again later by the corresponding kcov_remote_stop().
So the only way I could _possibly_ see how to do this would be to
* capture all possible places where the skb pointer can change
* still call something like skb_get_kcov_handle() but let it call out
to a BPF program to query a map or something to figure out if this
SKB has a handle attached to it
> Or if you don't like BPF or what to KCOV BPF itself in the future you
> can roll your own mechanism. The point is - this should be relatively
> easily doable out of line...
Seems pretty complicated to me though ...
johannes
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox