* [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
@ 2017-11-04 2:28 Subash Abhinov Kasiviswanathan
2017-11-04 9:07 ` Steffen Klassert
2017-11-07 10:30 ` Florian Westphal
0 siblings, 2 replies; 6+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2017-11-04 2:28 UTC (permalink / raw)
To: netfilter-devel, steffen.klassert, pablo; +Cc: Subash Abhinov Kasiviswanathan
Add a per-interface sysctl, nf_ipv4_defrag_skip, that skips
defragmentation for packets received on that interface. It defaults
to 0, which preserves the existing behavior (always defragment).
This is useful for pure IPv4 forwarding scenarios (without NAT)
in conjunction with xfrm. It appears that the network stack
defragments the packets and forwards them to xfrm, which encrypts
them and then fragments them again on a different boundary than
the one the source used.
One example is fixing wifi calling on networks where certain
routers are configured to drop fragments explicitly. Wifi calling
was failing because fragmented data from rmnet interfaces was
reassembled, forwarded, encrypted and then transmitted over wifi.
After encryption the reassembled packets were larger than the wifi
MTU, so they were fragmented again before being sent to the network.
With this change the original fragments are forwarded as they are.
Each of them is smaller than the wifi MTU, so none needs to be
fragmented after encryption.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
Documentation/networking/ip-sysctl.txt | 3 +++
include/linux/inetdevice.h | 2 ++
include/uapi/linux/ip.h | 1 +
include/uapi/linux/sysctl.h | 1 +
kernel/sysctl_binary.c | 1 +
net/ipv4/devinet.c | 2 ++
net/ipv4/netfilter/nf_defrag_ipv4.c | 10 ++++++++--
7 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 77f4de5..d846607 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1281,6 +1281,9 @@ drop_gratuitous_arp - BOOLEAN
(or in the case of 802.11, must not be used to prevent attacks.)
Default: off (0)
+nf_ipv4_defrag_skip - BOOLEAN
+ Skip defragmentation per interface if set.
+ Default : 0 (always defrag)
tag - INTEGER
Allows you to write a number, which can be used as required.
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 681dff3..638b681 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -129,6 +129,8 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev)
#define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE)
#define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE)
#define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY)
+#define IN_DEV_NF_IPV4_DEFRAG_SKIP(in_dev) \
+ IN_DEV_ORCONF((in_dev), NF_IPV4_DEFRAG_SKIP)
struct in_ifaddr {
struct hlist_node hash;
diff --git a/include/uapi/linux/ip.h b/include/uapi/linux/ip.h
index f291569..739a4f3 100644
--- a/include/uapi/linux/ip.h
+++ b/include/uapi/linux/ip.h
@@ -167,6 +167,7 @@ enum
IPV4_DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
IPV4_DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
IPV4_DEVCONF_DROP_GRATUITOUS_ARP,
+ IPV4_DEVCONF_NF_IPV4_DEFRAG_SKIP,
__IPV4_DEVCONF_MAX
};
diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
index e13d480..dd3f439 100644
--- a/include/uapi/linux/sysctl.h
+++ b/include/uapi/linux/sysctl.h
@@ -480,6 +480,7 @@ enum
NET_IPV4_CONF_PROMOTE_SECONDARIES=20,
NET_IPV4_CONF_ARP_ACCEPT=21,
NET_IPV4_CONF_ARP_NOTIFY=22,
+ NET_IPV4_CONF_NF_IPV4_DEFRAG_SKIP = 23,
};
/* /proc/sys/net/ipv4/netfilter */
diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
index 58ea8c0..8acb0c6 100644
--- a/kernel/sysctl_binary.c
+++ b/kernel/sysctl_binary.c
@@ -254,6 +254,7 @@ struct bin_table {
{ CTL_INT, NET_IPV4_CONF_NOPOLICY, "disable_policy" },
{ CTL_INT, NET_IPV4_CONF_FORCE_IGMP_VERSION, "force_igmp_version" },
{ CTL_INT, NET_IPV4_CONF_PROMOTE_SECONDARIES, "promote_secondaries" },
+ { CTL_INT, NET_IPV4_CONF_NF_IPV4_DEFRAG_SKIP, "nf_ipv4_defrag_skip" },
{}
};
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a4573bc..2713a2f 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -2280,6 +2280,8 @@ static int ipv4_doint_and_flush(struct ctl_table *ctl, int write,
"ignore_routes_with_linkdown"),
DEVINET_SYSCTL_RW_ENTRY(DROP_GRATUITOUS_ARP,
"drop_gratuitous_arp"),
+ DEVINET_SYSCTL_RW_ENTRY(NF_IPV4_DEFRAG_SKIP,
+ "nf_ipv4_defrag_skip"),
DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 37fe1616..9e6f9d2 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -11,6 +11,7 @@
#include <linux/netfilter.h>
#include <linux/module.h>
#include <linux/skbuff.h>
+#include <linux/inetdevice.h>
#include <net/netns/generic.h>
#include <net/route.h>
#include <net/ip.h>
@@ -81,8 +82,13 @@ static unsigned int ipv4_conntrack_defrag(void *priv,
#endif
/* Gather fragments. */
if (ip_is_fragment(ip_hdr(skb))) {
- enum ip_defrag_users user =
- nf_ct_defrag_user(state->hook, skb);
+ enum ip_defrag_users user;
+
+ if (skb->dev &&
+ IN_DEV_NF_IPV4_DEFRAG_SKIP(__in_dev_get_rcu(skb->dev)))
+ return NF_ACCEPT;
+
+ user = nf_ct_defrag_user(state->hook, skb);
if (nf_ct_ipv4_gather_frags(state->net, skb, user))
return NF_STOLEN;
--
1.9.1
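The decision added by the nf_defrag_ipv4.c hunk can be modelled in
user space. The sketch below is only an illustration: the model_*
types and model_defrag_skip() are hypothetical stand-ins, not kernel
API. It assumes the OR semantics of IN_DEV_ORCONF (the "all" value or
the per-interface value enables the skip). It also guards against a
missing in_device, since __in_dev_get_rcu() can return NULL for a
device without IPv4 configuration and the macro dereferences its
argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-netns "all" setting and for
 * struct in_device; only the fields this sketch needs. */
struct model_net {
	bool all_defrag_skip;
};

struct model_in_device {
	const struct model_net *net;
	bool defrag_skip;
};

struct model_net_device {
	const struct model_in_device *ip_ptr; /* NULL: no IPv4 config */
};

/* Return true when fragments arriving on dev should bypass defrag.
 * OR semantics: either the "all" value or the per-interface value
 * enables the skip. A missing device or in_device keeps the default
 * behaviour (defragment). */
static bool model_defrag_skip(const struct model_net_device *dev)
{
	const struct model_in_device *in_dev;

	if (!dev)
		return false;

	in_dev = dev->ip_ptr;
	if (!in_dev)
		return false;

	return in_dev->net->all_defrag_skip || in_dev->defrag_skip;
}
```

In the patch itself, the equivalent guard would be a NULL check on the
result of __in_dev_get_rcu() before IN_DEV_NF_IPV4_DEFRAG_SKIP() is
evaluated.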
* Re: [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
2017-11-04 2:28 [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface Subash Abhinov Kasiviswanathan
@ 2017-11-04 9:07 ` Steffen Klassert
2017-11-07 10:30 ` Florian Westphal
1 sibling, 0 replies; 6+ messages in thread
From: Steffen Klassert @ 2017-11-04 9:07 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan; +Cc: netfilter-devel, pablo
On Fri, Nov 03, 2017 at 08:28:40PM -0600, Subash Abhinov Kasiviswanathan wrote:
> Add a sysctl nf_ipv4_defrag_skip to skip defragmentation per
> interface. This is set 0 to preserve existing behavior (always
> defrag per interface).
>
> This is useful for pure ipv4 forwarding scenarios (without NAT)
> in conjunction with xfrm. It appears that network stack defrags
> the packets and then forwards them to xfrm which then encrypts
> and then later fragments them on a different boundary compared
> to the source.
The reassembly happens because of conntrack, right?
In this case, I'd recommend doing it the way IPv6 does,
i.e. reassemble the fragments, inspect the reassembled
packet and, if it is OK, send the chain of fragments back
to the stack instead of the reassembled packet.
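Why forwarding the original fragment chain helps can be shown with a
little size arithmetic. This is a user-space sketch under stated
assumptions: the struct, the function names and every byte count used
with it are hypothetical, and it assumes the encapsulation adds a
fixed per-packet overhead to whatever it is handed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All sizes are in bytes and purely illustrative. */
struct frag_chain {
	const size_t *frag_len; /* length of each fragment, IP header included */
	size_t nfrags;
	size_t ip_hdr_len;      /* header length carried by every fragment */
};

/* Length of the rebuilt packet: one IP header plus the payload of
 * every fragment in the chain. */
static size_t reassembled_len(const struct frag_chain *c)
{
	size_t total = c->ip_hdr_len;

	for (size_t i = 0; i < c->nfrags; i++)
		total += c->frag_len[i] - c->ip_hdr_len;
	return total;
}

/* True if a packet of len bytes plus the encapsulation overhead
 * still fits within the outgoing MTU. */
static bool fits_after_encap(size_t len, size_t overhead, size_t mtu)
{
	return len + overhead <= mtu;
}

/* True if every fragment of the chain can be encapsulated on its
 * own without exceeding the outgoing MTU. */
static bool chain_fits_after_encap(const struct frag_chain *c,
				   size_t overhead, size_t mtu)
{
	for (size_t i = 0; i < c->nfrags; i++)
		if (!fits_after_encap(c->frag_len[i], overhead, mtu))
			return false;
	return true;
}
```

With two fragments of 1300 and 320 bytes, a 20-byte header, 60 bytes
of overhead and an MTU of 1500 (all made-up numbers), the reassembled
packet is 1600 bytes and no longer fits once encapsulated, while each
original fragment does.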
* Re: [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
2017-11-04 2:28 [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface Subash Abhinov Kasiviswanathan
2017-11-04 9:07 ` Steffen Klassert
@ 2017-11-07 10:30 ` Florian Westphal
2017-11-07 18:58 ` Subash Abhinov Kasiviswanathan
1 sibling, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2017-11-07 10:30 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan; +Cc: netfilter-devel, steffen.klassert, pablo
Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> wrote:
> Add a sysctl nf_ipv4_defrag_skip to skip defragmentation per
> interface. This is set 0 to preserve existing behavior (always
> defrag per interface).
>
> This is useful for pure ipv4 forwarding scenarios (without NAT)
> in conjunction with xfrm. It appears that network stack defrags
> the packets and then forwards them to xfrm which then encrypts
> and then later fragments them on a different boundary compared
> to the source.
This breaks connection tracking for packets coming in via such
interfaces.
Nowadays we only enable defrag in a network namespace if the ip/nftables
ruleset requires it, so this setting would be counter-productive.
> An example of this usage is for fixing wifi calling on networks
> where certain routers are configured to drop fragments explicitly.
Yay... does that happen for all frags or is this related to df bit
somehow?
* Re: [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
2017-11-07 10:30 ` Florian Westphal
@ 2017-11-07 18:58 ` Subash Abhinov Kasiviswanathan
2017-11-08 4:34 ` Pablo Neira Ayuso
0 siblings, 1 reply; 6+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2017-11-07 18:58 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, steffen.klassert, pablo
> This breaks connection tracking for packets coming in via such
> interfaces.
>
> Nowadays we only enable defrag in a network namespace if the ip/nftables
> ruleset requires it, so this setting would be counter-productive.
Hi Florian
This use case runs on an Android-based device, so there will be only
the init namespace. While the specific rmnet interfaces for wifi calling
do not require conntrack / iptables, other scenarios such as NAT on other
interfaces may trigger the load of the defrag module. Hence, we needed
this interface-specific way of preventing defrag.
>> An example of this usage is for fixing wifi calling on networks
>> where certain routers are configured to drop fragments explicitly.
>
> Yay... does that happen for all frags or is this related to df bit
> somehow?
Based on our observations, the routers usually drop all fragmented
packets, possibly for security reasons.
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
* Re: [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
2017-11-07 18:58 ` Subash Abhinov Kasiviswanathan
@ 2017-11-08 4:34 ` Pablo Neira Ayuso
2017-11-08 20:46 ` Subash Abhinov Kasiviswanathan
0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2017-11-08 4:34 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan
Cc: Florian Westphal, netfilter-devel, steffen.klassert
On Tue, Nov 07, 2017 at 11:58:40AM -0700, Subash Abhinov Kasiviswanathan wrote:
> >This breaks connection tracking for packets coming in via such
> >interfaces.
> >
> >Nowadays we only enable defrag in a network namespace if the ip/nftables
> >ruleset requires it, so this setting would be counter-productive.
[...]
> This usecase is run on an Android based device, so there will be only
> the init namespace. While the specific rmnet interfaces for wifi calling do
> not require conntrack / iptables, some other scenarios like NAT on other
> interfaces may trigger the load of the defrag module. Hence, we needed
> this interface specific way of preventing defrag.
We can probably skip defrag if explicit notrack is requested via rule.
* Re: [PATCH nf-next] netfilter: nf_defrag_ipv4: Add sysctl to disable per interface
2017-11-08 4:34 ` Pablo Neira Ayuso
@ 2017-11-08 20:46 ` Subash Abhinov Kasiviswanathan
0 siblings, 0 replies; 6+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2017-11-08 20:46 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, steffen.klassert
> We can probably skip defrag if explicit notrack is requested via rule.
Hi Pablo
Thanks for the suggestion. I tried this, and it appears that defrag
occurs before NOTRACK is hit in the raw table in PREROUTING. This is
because the defrag hook is registered at a numerically lower priority
than RAW, so it runs first.
[include/uapi/linux/netfilter_ipv4.h]
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
NF_IP_PRI_CONNTRACK_DEFRAG = -400,
NF_IP_PRI_RAW = -300,
NF_IP_PRI_SELINUX_FIRST = -225,
NF_IP_PRI_CONNTRACK = -200,
By changing NF_IP_PRI_CONNTRACK_DEFRAG from -400 to -210 (after RAW
but still before CONNTRACK), I was able to skip the defrag when
NOTRACK was set. Do you think this is a possible solution?
diff --git a/include/uapi/linux/netfilter_ipv4.h b/include/uapi/linux/netfilter_ipv4.h
index 91ddd1f..13dc767 100644
--- a/include/uapi/linux/netfilter_ipv4.h
+++ b/include/uapi/linux/netfilter_ipv4.h
@@ -56,9 +56,9 @@
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
- NF_IP_PRI_CONNTRACK_DEFRAG = -400,
NF_IP_PRI_RAW = -300,
NF_IP_PRI_SELINUX_FIRST = -225,
+ NF_IP_PRI_CONNTRACK_DEFRAG = -210,
NF_IP_PRI_CONNTRACK = -200,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
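The effect of the priority change can be checked with a small
user-space model. Netfilter calls the hooks registered at a hook point
in ascending priority order, so a lower number runs earlier. The
struct and helper names below are hypothetical; only the priority
values come from the enum quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a registered hook: a label and a priority. */
struct hook {
	const char *name;
	int priority;
};

/* qsort comparator: ascending priority, i.e. execution order. */
static int cmp_priority(const void *a, const void *b)
{
	const struct hook *ha = a;
	const struct hook *hb = b;

	return (ha->priority > hb->priority) - (ha->priority < hb->priority);
}

/* Position of the named hook in execution order, or -1 if absent.
 * The caller must already have sorted the array. */
static int position_of(const struct hook *hooks, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(hooks[i].name, name) == 0)
			return (int)i;
	return -1;
}

/* True if hook a runs before hook b once the array is sorted by
 * priority. Sorts hooks in place. */
static bool runs_before(struct hook *hooks, size_t n,
			const char *a, const char *b)
{
	int pa, pb;

	qsort(hooks, n, sizeof(*hooks), cmp_priority);
	pa = position_of(hooks, n, a);
	pb = position_of(hooks, n, b);
	return pa >= 0 && pb >= 0 && pa < pb;
}
```

With the stock values (defrag -400, raw -300, conntrack -200), defrag
runs before raw, so a NOTRACK rule is evaluated too late. With defrag
at -210 it runs after raw and still before conntrack.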