* [PATCH net 0/6] Netfilter fixes for net
@ 2024-01-31 22:59 Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN Pablo Neira Ayuso
` (5 more replies)
0 siblings, 6 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
Hi,
The following patchset contains Netfilter fixes for net:
1) TCP conntrack now only evaluates window negotiation for packets in
the REPLY direction, from Ryan Schaefer. Otherwise SYN retransmissions
trigger incorrect window scale negotiation. From Ryan Schaefer.
2) Restrict tunnel objects to NFPROTO_NETDEV which is where it makes sense
to use this object type.
3) Fix conntrack pick up from the middle of SCTP_CID_SHUTDOWN_ACK packets.
From Xin Long.
4) Another attempt from Jozsef Kadlecsik to address the slow down of the
swap command in ipset.
5) Replace a BUG_ON by WARN_ON_ONCE in nf_log, and consolidate check for
the case that the logger is NULL from the read side lock section.
6) Address lack of sanitization for custom expectations. Restrict layer 3
and 4 families to what it is supported by userspace.
Please, pull these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-01-31
Thanks.
----------------------------------------------------------------
The following changes since commit a2933a8759a62269754e54733d993b19de870e84:
selftests: bonding: do not test arp/ns target with mode balance-alb/tlb (2024-01-25 09:50:54 +0100)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-01-31
for you to fetch changes up to 8059918a1377f2f1fff06af4f5a4ed3d5acd6bc4:
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations (2024-01-31 23:14:14 +0100)
----------------------------------------------------------------
netfilter pull request 24-01-31
----------------------------------------------------------------
Jozsef Kadlecsik (1):
netfilter: ipset: fix performance regression in swap operation
Pablo Neira Ayuso (3):
netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
Ryan Schaefer (1):
netfilter: conntrack: correct window scaling with retransmitted SYN
Xin Long (1):
netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
include/linux/netfilter/ipset/ip_set.h | 4 ++++
include/net/netfilter/nf_tables.h | 2 ++
net/netfilter/ipset/ip_set_bitmap_gen.h | 14 ++++++++++---
net/netfilter/ipset/ip_set_core.c | 37 +++++++++++++++++++++++++--------
net/netfilter/ipset/ip_set_hash_gen.h | 15 ++++++++++---
net/netfilter/ipset/ip_set_list_set.c | 13 +++++++++---
net/netfilter/nf_conntrack_proto_sctp.c | 2 +-
net/netfilter/nf_conntrack_proto_tcp.c | 10 +++++----
net/netfilter/nf_log.c | 7 ++++---
net/netfilter/nf_tables_api.c | 14 ++++++++-----
net/netfilter/nft_ct.c | 24 +++++++++++++++++++++
net/netfilter/nft_tunnel.c | 1 +
12 files changed, 112 insertions(+), 31 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
2024-02-01 17:20 ` patchwork-bot+netdevbpf
2024-01-31 22:59 ` [PATCH net 2/6] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV Pablo Neira Ayuso
` (4 subsequent siblings)
5 siblings, 1 reply; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
From: Ryan Schaefer <ryanschf@amazon.com>
commit c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets
only") introduces a bug where SYNs in ORIGINAL direction on reused 5-tuple
result in incorrect window scale negotiation. This commit merged the SYN
re-initialization and simultaneous open or SYN retransmits cases. Merging
this block added the logic in tcp_init_sender() that performed window scale
negotiation to the retransmitted syn case. Previously. this would only
result in updating the sender's scale and flags. After the merge the
additional logic results in improperly clearing the scale in ORIGINAL
direction before any packets in the REPLY direction are received. This
results in packets incorrectly being marked invalid for being
out-of-window.
This can be reproduced with the following trace:
Packet Sequence:
> Flags [S], seq 1687765604, win 62727, options [.. wscale 7], length 0
> Flags [S], seq 1944817196, win 62727, options [.. wscale 7], length 0
In order to fix the issue, only evaluate window negotiation for packets
in the REPLY direction. This was tested with simultaneous open, fast
open, and the above reproduction.
Fixes: c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets only")
Signed-off-by: Ryan Schaefer <ryanschf@amazon.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_proto_tcp.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index e573be5afde7..ae493599a3ef 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -457,7 +457,8 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender,
const struct sk_buff *skb,
unsigned int dataoff,
const struct tcphdr *tcph,
- u32 end, u32 win)
+ u32 end, u32 win,
+ enum ip_conntrack_dir dir)
{
/* SYN-ACK in reply to a SYN
* or SYN from reply direction in simultaneous open.
@@ -471,7 +472,8 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender,
* Both sides must send the Window Scale option
* to enable window scaling in either direction.
*/
- if (!(sender->flags & IP_CT_TCP_FLAG_WINDOW_SCALE &&
+ if (dir == IP_CT_DIR_REPLY &&
+ !(sender->flags & IP_CT_TCP_FLAG_WINDOW_SCALE &&
receiver->flags & IP_CT_TCP_FLAG_WINDOW_SCALE)) {
sender->td_scale = 0;
receiver->td_scale = 0;
@@ -542,7 +544,7 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
if (tcph->syn) {
tcp_init_sender(sender, receiver,
skb, dataoff, tcph,
- end, win);
+ end, win, dir);
if (!tcph->ack)
/* Simultaneous open */
return NFCT_TCP_ACCEPT;
@@ -585,7 +587,7 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
*/
tcp_init_sender(sender, receiver,
skb, dataoff, tcph,
- end, win);
+ end, win, dir);
if (dir == IP_CT_DIR_REPLY && !tcph->ack)
return NFCT_TCP_ACCEPT;
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 2/6] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 3/6] netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new Pablo Neira Ayuso
` (3 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
Bail out on using the tunnel dst template from other than netdev family.
Add the infrastructure to check for the family in objects.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_tables.h | 2 ++
net/netfilter/nf_tables_api.c | 14 +++++++++-----
net/netfilter/nft_tunnel.c | 1 +
3 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 4e1ea18eb5f0..001226c34621 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1351,6 +1351,7 @@ void nft_obj_notify(struct net *net, const struct nft_table *table,
* @type: stateful object numeric type
* @owner: module owner
* @maxattr: maximum netlink attribute
+ * @family: address family for AF-specific object types
* @policy: netlink attribute policy
*/
struct nft_object_type {
@@ -1360,6 +1361,7 @@ struct nft_object_type {
struct list_head list;
u32 type;
unsigned int maxattr;
+ u8 family;
struct module *owner;
const struct nla_policy *policy;
};
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c537104411e7..fc016befb46f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7551,11 +7551,15 @@ static int nft_object_dump(struct sk_buff *skb, unsigned int attr,
return -1;
}
-static const struct nft_object_type *__nft_obj_type_get(u32 objtype)
+static const struct nft_object_type *__nft_obj_type_get(u32 objtype, u8 family)
{
const struct nft_object_type *type;
list_for_each_entry(type, &nf_tables_objects, list) {
+ if (type->family != NFPROTO_UNSPEC &&
+ type->family != family)
+ continue;
+
if (objtype == type->type)
return type;
}
@@ -7563,11 +7567,11 @@ static const struct nft_object_type *__nft_obj_type_get(u32 objtype)
}
static const struct nft_object_type *
-nft_obj_type_get(struct net *net, u32 objtype)
+nft_obj_type_get(struct net *net, u32 objtype, u8 family)
{
const struct nft_object_type *type;
- type = __nft_obj_type_get(objtype);
+ type = __nft_obj_type_get(objtype, family);
if (type != NULL && try_module_get(type->owner))
return type;
@@ -7660,7 +7664,7 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
if (info->nlh->nlmsg_flags & NLM_F_REPLACE)
return -EOPNOTSUPP;
- type = __nft_obj_type_get(objtype);
+ type = __nft_obj_type_get(objtype, family);
if (WARN_ON_ONCE(!type))
return -ENOENT;
@@ -7674,7 +7678,7 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
if (!nft_use_inc(&table->use))
return -EMFILE;
- type = nft_obj_type_get(net, objtype);
+ type = nft_obj_type_get(net, objtype, family);
if (IS_ERR(type)) {
err = PTR_ERR(type);
goto err_type;
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c
index 9f21953c7433..f735d79d8be5 100644
--- a/net/netfilter/nft_tunnel.c
+++ b/net/netfilter/nft_tunnel.c
@@ -713,6 +713,7 @@ static const struct nft_object_ops nft_tunnel_obj_ops = {
static struct nft_object_type nft_tunnel_obj_type __read_mostly = {
.type = NFT_OBJECT_TUNNEL,
+ .family = NFPROTO_NETDEV,
.ops = &nft_tunnel_obj_ops,
.maxattr = NFTA_TUNNEL_KEY_MAX,
.policy = nft_tunnel_key_policy,
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 3/6] netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 2/6] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 4/6] netfilter: ipset: fix performance regression in swap operation Pablo Neira Ayuso
` (2 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
From: Xin Long <lucien.xin@gmail.com>
The annotation says in sctp_new(): "If it is a shutdown ack OOTB packet, we
expect a return shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8)".
However, it does not check SCTP_CID_SHUTDOWN_ACK before setting vtag[REPLY]
in the conntrack entry(ct).
Because of that, if the ct in Router disappears for some reason in [1]
with the packet sequence like below:
Client > Server: sctp (1) [INIT] [init tag: 3201533963]
Server > Client: sctp (1) [INIT ACK] [init tag: 972498433]
Client > Server: sctp (1) [COOKIE ECHO]
Server > Client: sctp (1) [COOKIE ACK]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057809]
Server > Client: sctp (1) [SACK] [cum ack 3075057809]
Server > Client: sctp (1) [HB REQ]
(the ct in Router disappears somehow) <-------- [1]
Client > Server: sctp (1) [HB ACK]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [HB REQ]
Client > Server: sctp (1) [DATA] (B)(E) [TSN: 3075057810]
Client > Server: sctp (1) [HB REQ]
Client > Server: sctp (1) [ABORT]
when processing HB ACK packet in Router it calls sctp_new() to initialize
the new ct with vtag[REPLY] set to HB_ACK packet's vtag.
Later when sending DATA from Client, all the SACKs from Server will get
dropped in Router, as the SACK packet's vtag does not match vtag[REPLY]
in the ct. The worst thing is the vtag in this ct will never get fixed
by the upcoming packets from Server.
This patch fixes it by checking SCTP_CID_SHUTDOWN_ACK before setting
vtag[REPLY] in the ct in sctp_new() as the annotation says. With this
fix, it will leave vtag[REPLY] in ct to 0 in the case above, and the
next HB REQ/ACK from Server is able to fix the vtag as its value is 0
in nf_conntrack_sctp_packet().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_proto_sctp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index c6bd533983c1..4cc97f971264 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -283,7 +283,7 @@ sctp_new(struct nf_conn *ct, const struct sk_buff *skb,
pr_debug("Setting vtag %x for secondary conntrack\n",
sh->vtag);
ct->proto.sctp.vtag[IP_CT_DIR_ORIGINAL] = sh->vtag;
- } else {
+ } else if (sch->type == SCTP_CID_SHUTDOWN_ACK) {
/* If it is a shutdown ack OOTB packet, we expect a return
shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8) */
pr_debug("Setting vtag %x for new conn OOTB\n",
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 4/6] netfilter: ipset: fix performance regression in swap operation
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (2 preceding siblings ...)
2024-01-31 22:59 ` [PATCH net 3/6] netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 5/6] netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 6/6] netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations Pablo Neira Ayuso
5 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
From: Jozsef Kadlecsik <kadlec@netfilter.org>
The patch "netfilter: ipset: fix race condition between swap/destroy
and kernel side add/del/test", commit 28628fa9 fixes a race condition.
But the synchronize_rcu() added to the swap function unnecessarily slows
it down: it can safely be moved to destroy and use call_rcu() instead.
Eric Dumazet pointed out that simply calling the destroy functions as
rcu callback does not work: sets with timeout use garbage collectors
which need cancelling at destroy which can wait. Therefore the destroy
functions are split into two: cancelling garbage collectors safely at
executing the command received by netlink and moving the remaining
part only into the rcu callback.
Link: https://lore.kernel.org/lkml/C0829B10-EAA6-4809-874E-E1E9C05A8D84@automattic.com/
Fixes: 28628fa952fe ("netfilter: ipset: fix race condition between swap/destroy and kernel side add/del/test")
Reported-by: Ale Crismani <ale.crismani@automattic.com>
Reported-by: David Wang <00107082@163.com>
Tested-by: David Wang <00107082@163.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/linux/netfilter/ipset/ip_set.h | 4 +++
net/netfilter/ipset/ip_set_bitmap_gen.h | 14 ++++++++--
net/netfilter/ipset/ip_set_core.c | 37 +++++++++++++++++++------
net/netfilter/ipset/ip_set_hash_gen.h | 15 ++++++++--
net/netfilter/ipset/ip_set_list_set.c | 13 +++++++--
5 files changed, 65 insertions(+), 18 deletions(-)
diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index e8c350a3ade1..e9f4f845d760 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -186,6 +186,8 @@ struct ip_set_type_variant {
/* Return true if "b" set is the same as "a"
* according to the create set parameters */
bool (*same_set)(const struct ip_set *a, const struct ip_set *b);
+ /* Cancel ongoing garbage collectors before destroying the set*/
+ void (*cancel_gc)(struct ip_set *set);
/* Region-locking is used */
bool region_lock;
};
@@ -242,6 +244,8 @@ extern void ip_set_type_unregister(struct ip_set_type *set_type);
/* A generic IP set */
struct ip_set {
+ /* For call_cru in destroy */
+ struct rcu_head rcu;
/* The name of the set */
char name[IPSET_MAXNAMELEN];
/* Lock protecting the set data */
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 21f7860e8fa1..cb48a2b9cb9f 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -30,6 +30,7 @@
#define mtype_del IPSET_TOKEN(MTYPE, _del)
#define mtype_list IPSET_TOKEN(MTYPE, _list)
#define mtype_gc IPSET_TOKEN(MTYPE, _gc)
+#define mtype_cancel_gc IPSET_TOKEN(MTYPE, _cancel_gc)
#define mtype MTYPE
#define get_ext(set, map, id) ((map)->extensions + ((set)->dsize * (id)))
@@ -59,9 +60,6 @@ mtype_destroy(struct ip_set *set)
{
struct mtype *map = set->data;
- if (SET_WITH_TIMEOUT(set))
- del_timer_sync(&map->gc);
-
if (set->dsize && set->extensions & IPSET_EXT_DESTROY)
mtype_ext_cleanup(set);
ip_set_free(map->members);
@@ -290,6 +288,15 @@ mtype_gc(struct timer_list *t)
add_timer(&map->gc);
}
+static void
+mtype_cancel_gc(struct ip_set *set)
+{
+ struct mtype *map = set->data;
+
+ if (SET_WITH_TIMEOUT(set))
+ del_timer_sync(&map->gc);
+}
+
static const struct ip_set_type_variant mtype = {
.kadt = mtype_kadt,
.uadt = mtype_uadt,
@@ -303,6 +310,7 @@ static const struct ip_set_type_variant mtype = {
.head = mtype_head,
.list = mtype_list,
.same_set = mtype_same_set,
+ .cancel_gc = mtype_cancel_gc,
};
#endif /* __IP_SET_BITMAP_IP_GEN_H */
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 4c133e06be1d..bcaad9c009fe 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1182,6 +1182,14 @@ ip_set_destroy_set(struct ip_set *set)
kfree(set);
}
+static void
+ip_set_destroy_set_rcu(struct rcu_head *head)
+{
+ struct ip_set *set = container_of(head, struct ip_set, rcu);
+
+ ip_set_destroy_set(set);
+}
+
static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
const struct nlattr * const attr[])
{
@@ -1193,8 +1201,6 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
if (unlikely(protocol_min_failed(attr)))
return -IPSET_ERR_PROTOCOL;
- /* Must wait for flush to be really finished in list:set */
- rcu_barrier();
/* Commands are serialized and references are
* protected by the ip_set_ref_lock.
@@ -1206,8 +1212,10 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
* counter, so if it's already zero, we can proceed
* without holding the lock.
*/
- read_lock_bh(&ip_set_ref_lock);
if (!attr[IPSET_ATTR_SETNAME]) {
+ /* Must wait for flush to be really finished in list:set */
+ rcu_barrier();
+ read_lock_bh(&ip_set_ref_lock);
for (i = 0; i < inst->ip_set_max; i++) {
s = ip_set(inst, i);
if (s && (s->ref || s->ref_netlink)) {
@@ -1221,6 +1229,8 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
s = ip_set(inst, i);
if (s) {
ip_set(inst, i) = NULL;
+ /* Must cancel garbage collectors */
+ s->variant->cancel_gc(s);
ip_set_destroy_set(s);
}
}
@@ -1228,6 +1238,9 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
inst->is_destroyed = false;
} else {
u32 flags = flag_exist(info->nlh);
+ u16 features = 0;
+
+ read_lock_bh(&ip_set_ref_lock);
s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
&i);
if (!s) {
@@ -1238,10 +1251,16 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
ret = -IPSET_ERR_BUSY;
goto out;
}
+ features = s->type->features;
ip_set(inst, i) = NULL;
read_unlock_bh(&ip_set_ref_lock);
-
- ip_set_destroy_set(s);
+ if (features & IPSET_TYPE_NAME) {
+ /* Must wait for flush to be really finished */
+ rcu_barrier();
+ }
+ /* Must cancel garbage collectors */
+ s->variant->cancel_gc(s);
+ call_rcu(&s->rcu, ip_set_destroy_set_rcu);
}
return 0;
out:
@@ -1394,9 +1413,6 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
ip_set(inst, to_id) = from;
write_unlock_bh(&ip_set_ref_lock);
- /* Make sure all readers of the old set pointers are completed. */
- synchronize_rcu();
-
return 0;
}
@@ -2409,8 +2425,11 @@ ip_set_fini(void)
{
nf_unregister_sockopt(&so_set);
nfnetlink_subsys_unregister(&ip_set_netlink_subsys);
-
unregister_pernet_subsys(&ip_set_net_ops);
+
+ /* Wait for call_rcu() in destroy */
+ rcu_barrier();
+
pr_debug("these are the famous last words\n");
}
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index cbf80da9a01c..1136510521a8 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -222,6 +222,7 @@ static const union nf_inet_addr zeromask = {};
#undef mtype_gc_do
#undef mtype_gc
#undef mtype_gc_init
+#undef mtype_cancel_gc
#undef mtype_variant
#undef mtype_data_match
@@ -266,6 +267,7 @@ static const union nf_inet_addr zeromask = {};
#define mtype_gc_do IPSET_TOKEN(MTYPE, _gc_do)
#define mtype_gc IPSET_TOKEN(MTYPE, _gc)
#define mtype_gc_init IPSET_TOKEN(MTYPE, _gc_init)
+#define mtype_cancel_gc IPSET_TOKEN(MTYPE, _cancel_gc)
#define mtype_variant IPSET_TOKEN(MTYPE, _variant)
#define mtype_data_match IPSET_TOKEN(MTYPE, _data_match)
@@ -450,9 +452,6 @@ mtype_destroy(struct ip_set *set)
struct htype *h = set->data;
struct list_head *l, *lt;
- if (SET_WITH_TIMEOUT(set))
- cancel_delayed_work_sync(&h->gc.dwork);
-
mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
list_for_each_safe(l, lt, &h->ad) {
list_del(l);
@@ -599,6 +598,15 @@ mtype_gc_init(struct htable_gc *gc)
queue_delayed_work(system_power_efficient_wq, &gc->dwork, HZ);
}
+static void
+mtype_cancel_gc(struct ip_set *set)
+{
+ struct htype *h = set->data;
+
+ if (SET_WITH_TIMEOUT(set))
+ cancel_delayed_work_sync(&h->gc.dwork);
+}
+
static int
mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
struct ip_set_ext *mext, u32 flags);
@@ -1441,6 +1449,7 @@ static const struct ip_set_type_variant mtype_variant = {
.uref = mtype_uref,
.resize = mtype_resize,
.same_set = mtype_same_set,
+ .cancel_gc = mtype_cancel_gc,
.region_lock = true,
};
diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index e162636525cf..6c3f28bc59b3 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -426,9 +426,6 @@ list_set_destroy(struct ip_set *set)
struct list_set *map = set->data;
struct set_elem *e, *n;
- if (SET_WITH_TIMEOUT(set))
- timer_shutdown_sync(&map->gc);
-
list_for_each_entry_safe(e, n, &map->members, list) {
list_del(&e->list);
ip_set_put_byindex(map->net, e->id);
@@ -545,6 +542,15 @@ list_set_same_set(const struct ip_set *a, const struct ip_set *b)
a->extensions == b->extensions;
}
+static void
+list_set_cancel_gc(struct ip_set *set)
+{
+ struct list_set *map = set->data;
+
+ if (SET_WITH_TIMEOUT(set))
+ timer_shutdown_sync(&map->gc);
+}
+
static const struct ip_set_type_variant set_variant = {
.kadt = list_set_kadt,
.uadt = list_set_uadt,
@@ -558,6 +564,7 @@ static const struct ip_set_type_variant set_variant = {
.head = list_set_head,
.list = list_set_list,
.same_set = list_set_same_set,
+ .cancel_gc = list_set_cancel_gc,
};
static void
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 5/6] netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (3 preceding siblings ...)
2024-01-31 22:59 ` [PATCH net 4/6] netfilter: ipset: fix performance regression in swap operation Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 6/6] netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations Pablo Neira Ayuso
5 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
Module reference is bumped for each user, this should not ever happen.
But BUG_ON check should use rcu_access_pointer() instead.
If this ever happens, do WARN_ON_ONCE() instead of BUG_ON() and
consolidate pointer check under the rcu read side lock section.
Fixes: fab4085f4e24 ("netfilter: log: nf_log_packet() as real unified interface")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_log.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 8cc52d2bd31b..e16f158388bb 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -193,11 +193,12 @@ void nf_logger_put(int pf, enum nf_log_type type)
return;
}
- BUG_ON(loggers[pf][type] == NULL);
-
rcu_read_lock();
logger = rcu_dereference(loggers[pf][type]);
- module_put(logger->me);
+ if (!logger)
+ WARN_ON_ONCE(1);
+ else
+ module_put(logger->me);
rcu_read_unlock();
}
EXPORT_SYMBOL_GPL(nf_logger_put);
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH net 6/6] netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (4 preceding siblings ...)
2024-01-31 22:59 ` [PATCH net 5/6] netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger Pablo Neira Ayuso
@ 2024-01-31 22:59 ` Pablo Neira Ayuso
5 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2024-01-31 22:59 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw
- Disallow families other than NFPROTO_{IPV4,IPV6,INET}.
- Disallow layer 4 protocol with no ports, since destination port is a
mandatory attribute for this object.
Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_ct.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 86bb9d7797d9..aac98a3c966e 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1250,7 +1250,31 @@ static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
if (tb[NFTA_CT_EXPECT_L3PROTO])
priv->l3num = ntohs(nla_get_be16(tb[NFTA_CT_EXPECT_L3PROTO]));
+ switch (priv->l3num) {
+ case NFPROTO_IPV4:
+ case NFPROTO_IPV6:
+ if (priv->l3num != ctx->family)
+ return -EINVAL;
+
+ fallthrough;
+ case NFPROTO_INET:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
priv->l4proto = nla_get_u8(tb[NFTA_CT_EXPECT_L4PROTO]);
+ switch (priv->l4proto) {
+ case IPPROTO_TCP:
+ case IPPROTO_UDP:
+ case IPPROTO_UDPLITE:
+ case IPPROTO_DCCP:
+ case IPPROTO_SCTP:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
priv->dport = nla_get_be16(tb[NFTA_CT_EXPECT_DPORT]);
priv->timeout = nla_get_u32(tb[NFTA_CT_EXPECT_TIMEOUT]);
priv->size = nla_get_u8(tb[NFTA_CT_EXPECT_SIZE]);
--
2.30.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN
2024-01-31 22:59 ` [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN Pablo Neira Ayuso
@ 2024-02-01 17:20 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-02-01 17:20 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw
Hello:
This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:
On Wed, 31 Jan 2024 23:59:38 +0100 you wrote:
> From: Ryan Schaefer <ryanschf@amazon.com>
>
> commit c7aab4f17021 ("netfilter: nf_conntrack_tcp: re-init for syn packets
> only") introduces a bug where SYNs in ORIGINAL direction on reused 5-tuple
> result in incorrect window scale negotiation. This commit merged the SYN
> re-initialization and simultaneous open or SYN retransmits cases. Merging
> this block added the logic in tcp_init_sender() that performed window scale
> negotiation to the retransmitted syn case. Previously. this would only
> result in updating the sender's scale and flags. After the merge the
> additional logic results in improperly clearing the scale in ORIGINAL
> direction before any packets in the REPLY direction are received. This
> results in packets incorrectly being marked invalid for being
> out-of-window.
>
> [...]
Here is the summary with links:
- [net,1/6] netfilter: conntrack: correct window scaling with retransmitted SYN
https://git.kernel.org/netdev/net/c/fb366fc7541a
- [net,2/6] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV
https://git.kernel.org/netdev/net/c/776d45164844
- [net,3/6] netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new
https://git.kernel.org/netdev/net/c/6e348067ee4b
- [net,4/6] netfilter: ipset: fix performance regression in swap operation
https://git.kernel.org/netdev/net/c/97f7cf1cd80e
- [net,5/6] netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger
https://git.kernel.org/netdev/net/c/259eb32971e9
- [net,6/6] netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations
https://git.kernel.org/netdev/net/c/8059918a1377
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-02-01 17:20 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-31 22:59 [PATCH net 0/6] Netfilter fixes for net Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 1/6] netfilter: conntrack: correct window scaling with retransmitted SYN Pablo Neira Ayuso
2024-02-01 17:20 ` patchwork-bot+netdevbpf
2024-01-31 22:59 ` [PATCH net 2/6] netfilter: nf_tables: restrict tunnel object to NFPROTO_NETDEV Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 3/6] netfilter: conntrack: check SCTP_CID_SHUTDOWN_ACK for vtag setting in sctp_new Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 4/6] netfilter: ipset: fix performance regression in swap operation Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 5/6] netfilter: nf_log: replace BUG_ON by WARN_ON_ONCE when putting logger Pablo Neira Ayuso
2024-01-31 22:59 ` [PATCH net 6/6] netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations Pablo Neira Ayuso
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).