* [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming @ 2013-04-16 15:32 Florian Westphal 2013-04-16 15:32 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal ` (4 more replies) 0 siblings, 5 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet Hi Pablo, please consider pulling from git://git.breakpoint.cc/fw/nf-next.git nfqueue_gso_avoidance_04 to retrieve the following changes since commit aaa795ad25e18488b026572c7ba2ca8f99ced0b7: netfilter: nat: propagate errors from xfrm_me_harder() (2013-04-08 12:34:01 +0200) Florian Westphal (5): netfilter: nf_queue: move device refcount bump to extra function netfilter: nfnetlink_queue: avoid peer_portid test netfilter: move skb_gso_segment into nfnetlink_queue module netfilter: nfnetlink_queue: add skb info attribute netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup With these patches, userspace can now instruct the kernel that it is gso/gro aware and can handle "invalid" checksums that appear in packet headers. For old userspace, nothing is changed: the kernel segments gso skbs and adjusts checksums. To avoid gso/checksum fixup overhead, userspace applications must set the new NFQA_CFG_F_GSO config flag via NFQA_CFG_FLAGS attribute AND check the new NFQA_SKB_INFO attribute when processing a packet. This new attribute currently contains two bits: - NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. - NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. I've done a few tests with old userspace and did not notice any issues. Feedback welcome. Update for libnetfilter_queue (including example program/documentation) will follow later. diffstat: include/net/netfilter/nf_queue.h | 6 + include/uapi/linux/netfilter/nfnetlink_queue.h | 10 ++- net/netfilter/nf_queue.c | 143 +++++--------------- net/netfilter/nfnetlink_queue_core.c | 169 ++++++++++++++++++++--- 4 files changed, 198 insertions(+), 130 deletions(-) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal @ 2013-04-16 15:32 ` Florian Westphal 2013-04-16 15:32 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal ` (3 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal required by future patch that will need to duplicate the nf_queue_entry, bumping refcounts of the copy. Signed-off-by: Florian Westphal <fw@strlen.de> --- net/netfilter/nf_queue.c | 49 ++++++++++++++++++++++++++------------------- 1 files changed, 28 insertions(+), 21 deletions(-) diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c index d812c12..40a5001 100644 --- a/net/netfilter/nf_queue.c +++ b/net/netfilter/nf_queue.c @@ -61,6 +61,33 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry) module_put(entry->elem->owner); } +/* Bump dev refs so they don't vanish while packet is out */ +static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry) +{ + if (!try_module_get(entry->elem->owner)) + return false; + + if (entry->indev) + dev_hold(entry->indev); + if (entry->outdev) + dev_hold(entry->outdev); +#ifdef CONFIG_BRIDGE_NETFILTER + if (entry->skb->nf_bridge) { + struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge; + struct net_device *physdev; + + physdev = nf_bridge->physindev; + if (physdev) + dev_hold(physdev); + physdev = nf_bridge->physoutdev; + if (physdev) + dev_hold(physdev); + } +#endif + + return true; +} + /* * Any packet that leaves via this function must come back * through nf_reinject(). @@ -75,10 +102,6 @@ static int __nf_queue(struct sk_buff *skb, { int status = -ENOENT; struct nf_queue_entry *entry = NULL; -#ifdef CONFIG_BRIDGE_NETFILTER - struct net_device *physindev; - struct net_device *physoutdev; -#endif const struct nf_afinfo *afinfo; const struct nf_queue_handler *qh; @@ -111,26 +134,10 @@ static int __nf_queue(struct sk_buff *skb, .okfn = okfn, }; - /* If it's going away, ignore hook. */ - if (!try_module_get(entry->elem->owner)) { + if (!nf_queue_entry_get_refs(entry)) { status = -ECANCELED; goto err_unlock; } - /* Bump dev refs so they don't vanish while packet is out */ - if (indev) - dev_hold(indev); - if (outdev) - dev_hold(outdev); -#ifdef CONFIG_BRIDGE_NETFILTER - if (skb->nf_bridge) { - physindev = skb->nf_bridge->physindev; - if (physindev) - dev_hold(physindev); - physoutdev = skb->nf_bridge->physoutdev; - if (physoutdev) - dev_hold(physoutdev); - } -#endif skb_dst_force(skb); afinfo->saveroute(skb, entry); status = qh->outfn(entry, queuenum); -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal 2013-04-16 15:32 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal @ 2013-04-16 15:32 ` Florian Westphal 2013-04-16 15:32 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal ` (2 subsequent siblings) 4 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal The portid is the netlink port id of the skb that created the queue. Add test to ensure the portid cannot be 0 at create time, and the check at enqueue time will always be false. Signed-off-by: Florian Westphal <fw@strlen.de> --- net/netfilter/nfnetlink_queue_core.c | 7 +++---- 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 5e280b3..94e2e4f 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, unsigned int h; int err; + if (portid == 0) + return ERR_PTR(-EINVAL); + spin_lock(&q->instances_lock); if (instance_lookup(q, queue_num)) { err = -EEXIST; @@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) } spin_lock_bh(&queue->lock); - if (!queue->peer_portid) { - err = -EINVAL; - goto err_out_free_nskb; - } if (queue->queue_total >= queue->queue_maxlen) { if (queue->flags & NFQA_CFG_F_FAIL_OPEN) { failopen = 1; -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal 2013-04-16 15:32 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal 2013-04-16 15:32 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal @ 2013-04-16 15:32 ` Florian Westphal 2013-04-16 15:32 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal 2013-04-16 15:32 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal 4 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal There is nothing wrong with the current code. However, skb_gso_segment is expensive, so it would be nice if we could avoid it in the future. Since userspace must be prepared to receive larger-than-mtu-packets (which will also have incorrect l3/l4 checksums), we cannot simply remove it. The plan is to add a per-queue feature flag that userspace can set when binding the queue. The problem is that in nf_queue, we only have a queue number, not the queue context/configuration settings. This patch should have no impact other than the skb_gso_segment call now being in a function that has access to the queue config data. The size new size attribute in nf_queue_entry is needed so nfnetlink_queue can duplicate the entry of the gso skb when segmenting the skb while also copying the route key. Next step: avoid skb_gso_segment when queue config says so. Signed-off-by: Florian Westphal <fw@strlen.de> --- include/net/netfilter/nf_queue.h | 6 ++ net/netfilter/nf_queue.c | 96 ++-------------------- net/netfilter/nfnetlink_queue_core.c | 143 ++++++++++++++++++++++++++++++---- 3 files changed, 141 insertions(+), 104 deletions(-) diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h index fb1c0be..aaba4bb 100644 --- a/include/net/netfilter/nf_queue.h +++ b/include/net/netfilter/nf_queue.h @@ -9,10 +9,13 @@ struct nf_queue_entry { struct nf_hook_ops *elem; u_int8_t pf; + u16 size; /* sizeof(entry) + saved route keys */ unsigned int hook; struct net_device *indev; struct net_device *outdev; int (*okfn)(struct sk_buff *); + + /* extra space to store route keys */ }; #define nf_queue_entry_reroute(x) ((void *)x + sizeof(struct nf_queue_entry)) @@ -27,4 +30,7 @@ void nf_register_queue_handler(const struct nf_queue_handler *qh); void nf_unregister_queue_handler(void); extern void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict); +bool nf_queue_entry_get_refs(struct nf_queue_entry *entry); +void nf_queue_entry_release_refs(struct nf_queue_entry *entry); + #endif /* _NF_QUEUE_H */ diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c index 40a5001..36dfc45 100644 --- a/net/netfilter/nf_queue.c +++ b/net/netfilter/nf_queue.c @@ -40,7 +40,7 @@ void nf_unregister_queue_handler(void) } EXPORT_SYMBOL(nf_unregister_queue_handler); -static void nf_queue_entry_release_refs(struct nf_queue_entry *entry) +void nf_queue_entry_release_refs(struct nf_queue_entry *entry) { /* Release those devices we held, or Alexey will kill me. */ if (entry->indev) @@ -60,9 +60,10 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry) /* Drop reference to owner of hook which queued us. */ module_put(entry->elem->owner); } +EXPORT_SYMBOL_GPL(nf_queue_entry_release_refs); /* Bump dev refs so they don't vanish while packet is out */ -static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry) +bool nf_queue_entry_get_refs(struct nf_queue_entry *entry) { if (!try_module_get(entry->elem->owner)) return false; @@ -87,12 +88,13 @@ static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry) return true; } +EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs); /* * Any packet that leaves via this function must come back * through nf_reinject(). */ -static int __nf_queue(struct sk_buff *skb, +int nf_queue(struct sk_buff *skb, struct nf_hook_ops *elem, u_int8_t pf, unsigned int hook, struct net_device *indev, @@ -132,6 +134,7 @@ static int __nf_queue(struct sk_buff *skb, .indev = indev, .outdev = outdev, .okfn = okfn, + .size = sizeof(*entry) + afinfo->route_key_size, }; if (!nf_queue_entry_get_refs(entry)) { @@ -158,87 +161,6 @@ err: return status; } -#ifdef CONFIG_BRIDGE_NETFILTER -/* When called from bridge netfilter, skb->data must point to MAC header - * before calling skb_gso_segment(). Else, original MAC header is lost - * and segmented skbs will be sent to wrong destination. - */ -static void nf_bridge_adjust_skb_data(struct sk_buff *skb) -{ - if (skb->nf_bridge) - __skb_push(skb, skb->network_header - skb->mac_header); -} - -static void nf_bridge_adjust_segmented_data(struct sk_buff *skb) -{ - if (skb->nf_bridge) - __skb_pull(skb, skb->network_header - skb->mac_header); -} -#else -#define nf_bridge_adjust_skb_data(s) do {} while (0) -#define nf_bridge_adjust_segmented_data(s) do {} while (0) -#endif - -int nf_queue(struct sk_buff *skb, - struct nf_hook_ops *elem, - u_int8_t pf, unsigned int hook, - struct net_device *indev, - struct net_device *outdev, - int (*okfn)(struct sk_buff *), - unsigned int queuenum) -{ - struct sk_buff *segs; - int err = -EINVAL; - unsigned int queued; - - if (!skb_is_gso(skb)) - return __nf_queue(skb, elem, pf, hook, indev, outdev, okfn, - queuenum); - - switch (pf) { - case NFPROTO_IPV4: - skb->protocol = htons(ETH_P_IP); - break; - case NFPROTO_IPV6: - skb->protocol = htons(ETH_P_IPV6); - break; - } - - nf_bridge_adjust_skb_data(skb); - segs = skb_gso_segment(skb, 0); - /* Does not use PTR_ERR to limit the number of error codes that can be - * returned by nf_queue. For instance, callers rely on -ECANCELED to mean - * 'ignore this hook'. - */ - if (IS_ERR(segs)) - goto out_err; - queued = 0; - err = 0; - do { - struct sk_buff *nskb = segs->next; - - segs->next = NULL; - if (err == 0) { - nf_bridge_adjust_segmented_data(segs); - err = __nf_queue(segs, elem, pf, hook, indev, - outdev, okfn, queuenum); - } - if (err == 0) - queued++; - else - kfree_skb(segs); - segs = nskb; - } while (segs); - - if (queued) { - kfree_skb(skb); - return 0; - } - out_err: - nf_bridge_adjust_segmented_data(skb); - return err; -} - void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict) { struct sk_buff *skb = entry->skb; @@ -278,9 +200,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict) local_bh_enable(); break; case NF_QUEUE: - err = __nf_queue(skb, elem, entry->pf, entry->hook, - entry->indev, entry->outdev, entry->okfn, - verdict >> NF_VERDICT_QBITS); + err = nf_queue(skb, elem, entry->pf, entry->hook, + entry->indev, entry->outdev, entry->okfn, + verdict >> NF_VERDICT_QBITS); if (err < 0) { if (err == -ECANCELED) goto next_hook; diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 94e2e4f..8ccaece 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -479,28 +479,13 @@ nla_put_failure: } static int -nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) +__nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue, + struct nf_queue_entry *entry) { struct sk_buff *nskb; - struct nfqnl_instance *queue; int err = -ENOBUFS; __be32 *packet_id_ptr; int failopen = 0; - struct net *net = dev_net(entry->indev ? - entry->indev : entry->outdev); - struct nfnl_queue_net *q = nfnl_queue_pernet(net); - - /* rcu_read_lock()ed by nf_hook_slow() */ - queue = instance_lookup(q, queuenum); - if (!queue) { - err = -ESRCH; - goto err_out; - } - - if (queue->copy_mode == NFQNL_COPY_NONE) { - err = -EINVAL; - goto err_out; - } nskb = nfqnl_build_packet_message(queue, entry, &packet_id_ptr); if (nskb == NULL) { @@ -545,6 +530,130 @@ err_out: return err; } +static struct nf_queue_entry * +nf_queue_entry_dup(struct nf_queue_entry *e) +{ + struct nf_queue_entry *entry = kmemdup(e, e->size, GFP_ATOMIC); + if (entry) { + if (nf_queue_entry_get_refs(entry)) + return entry; + kfree(entry); + } + return NULL; +} + +#ifdef CONFIG_BRIDGE_NETFILTER +/* When called from bridge netfilter, skb->data must point to MAC header + * before calling skb_gso_segment(). Else, original MAC header is lost + * and segmented skbs will be sent to wrong destination. + */ +static void nf_bridge_adjust_skb_data(struct sk_buff *skb) +{ + if (skb->nf_bridge) + __skb_push(skb, skb->network_header - skb->mac_header); +} + +static void nf_bridge_adjust_segmented_data(struct sk_buff *skb) +{ + if (skb->nf_bridge) + __skb_pull(skb, skb->network_header - skb->mac_header); +} +#else +#define nf_bridge_adjust_skb_data(s) do {} while (0) +#define nf_bridge_adjust_segmented_data(s) do {} while (0) +#endif + +static int +__nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue, + struct nf_queue_entry *entry) +{ + struct sk_buff *skb = entry->skb; + int ret = -ENOMEM; + + nf_bridge_adjust_segmented_data(skb); + + if (skb->next == NULL) /* last packet, no need to copy entry */ + return __nfqnl_enqueue_packet(net, queue, entry); + + skb->next = NULL; + + entry = nf_queue_entry_dup(entry); + if (entry) { + ret = __nfqnl_enqueue_packet(net, queue, entry); + if (ret) { + nf_queue_entry_release_refs(entry); + kfree(entry); + } + } + return ret; +} + +static int +nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) +{ + unsigned int queued; + struct nfqnl_instance *queue; + struct sk_buff *skb, *segs; + int err = -ENOBUFS; + struct net *net = dev_net(entry->indev ? + entry->indev : entry->outdev); + struct nfnl_queue_net *q = nfnl_queue_pernet(net); + + /* rcu_read_lock()ed by nf_hook_slow() */ + queue = instance_lookup(q, queuenum); + if (!queue) + return -ESRCH; + + if (queue->copy_mode == NFQNL_COPY_NONE) + return -EINVAL; + + if (!skb_is_gso(entry->skb)) + return __nfqnl_enqueue_packet(net, queue, entry); + + skb = entry->skb; + + switch (entry->pf) { + case NFPROTO_IPV4: + skb->protocol = htons(ETH_P_IP); + break; + case NFPROTO_IPV6: + skb->protocol = htons(ETH_P_IPV6); + break; + } + + nf_bridge_adjust_skb_data(skb); + segs = skb_gso_segment(skb, 0); + /* Does not use PTR_ERR to limit the number of error codes that can be + * returned by nf_queue. For instance, callers rely on -ECANCELED to + * mean 'ignore this hook'. + */ + if (IS_ERR(segs)) + goto out_err; + queued = 0; + err = 0; + do { + struct sk_buff *nskb = segs->next; + + if (err == 0) { + entry->skb = segs; + err = __nfqnl_enqueue_packet_gso(net, queue, entry); + } + if (err == 0) + queued++; + else + kfree_skb(segs); + segs = nskb; + } while (segs); + + if (queued) { + kfree_skb(skb); + return 0; + } + out_err: + nf_bridge_adjust_segmented_data(skb); + return err; +} + static int nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff) { -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal ` (2 preceding siblings ...) 2013-04-16 15:32 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal @ 2013-04-16 15:32 ` Florian Westphal 2013-04-16 15:55 ` Eric Dumazet 2013-04-16 15:32 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal 4 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal Once we allow userspace to receive gso/gro packets, userspace needs to be able to determine when checksums appear to be broken, but are not. NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> --- include/uapi/linux/netfilter/nfnetlink_queue.h | 7 +++++++ net/netfilter/nfnetlink_queue_core.c | 16 ++++++++++++++++ 2 files changed, 23 insertions(+), 0 deletions(-) diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h index 70ec8c2..0069da3 100644 --- a/include/uapi/linux/netfilter/nfnetlink_queue.h +++ b/include/uapi/linux/netfilter/nfnetlink_queue.h @@ -45,6 +45,7 @@ enum nfqnl_attr_type { NFQA_CT, /* nf_conntrack_netlink.h */ NFQA_CT_INFO, /* enum ip_conntrack_info */ NFQA_CAP_LEN, /* __u32 length of captured packet */ + NFQA_SKB_INFO, /* __u32 skb meta information */ __NFQA_MAX }; @@ -98,4 +99,10 @@ enum nfqnl_attr_config { #define NFQA_CFG_F_CONNTRACK (1 << 1) #define NFQA_CFG_F_MAX (1 << 2) +/* flags for NFQA_SKB_INFO */ +/* packet appears to have wrong checksums, but they are ok */ +#define NFQA_SKB_CSUMNOTREADY (1 << 0) +/* packet is GSO (i.e., exceeds device mtu) */ +#define NFQA_SKB_GSO (1 << 1) + #endif /* _NFNETLINK_QUEUE_H */ diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 8ccaece..839c65b 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -275,6 +275,18 @@ nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen) skb_shinfo(to)->nr_frags = j; } +static int nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet) +{ + __u32 flags = 0; + + if (skb_is_gso(packet)) + flags = NFQA_SKB_GSO; + if (packet->ip_summed == CHECKSUM_PARTIAL) + flags |= NFQA_SKB_CSUMNOTREADY; + + return nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)); +} + static struct sk_buff * nfqnl_build_packet_message(struct nfqnl_instance *queue, struct nf_queue_entry *entry, @@ -304,6 +316,7 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue, #endif + nla_total_size(sizeof(u_int32_t)) /* mark */ + nla_total_size(sizeof(struct nfqnl_msg_packet_hw)) + + nla_total_size(sizeof(u_int32_t)) /* skbinfo */ + nla_total_size(sizeof(u_int32_t)); /* cap_len */ if (entskb->tstamp.tv64) @@ -456,6 +469,9 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue, if (cap_len > 0 && nla_put_be32(skb, NFQA_CAP_LEN, htonl(cap_len))) goto nla_put_failure; + if (nfqnl_put_packet_info(skb, entskb)) + goto nla_put_failure; + if (data_len) { struct nlattr *nla; -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute 2013-04-16 15:32 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal @ 2013-04-16 15:55 ` Eric Dumazet 2013-04-16 17:47 ` Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Eric Dumazet @ 2013-04-16 15:55 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On Tue, 2013-04-16 at 17:32 +0200, Florian Westphal wrote: > Once we allow userspace to receive gso/gro packets, userspace > needs to be able to determine when checksums appear to be > broken, but are not. > > NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel > later, pretend they are ok'. > > NFQA_SKB_GSO could be used for statistics, or to determine when > packet size exceeds mtu. > > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > > +static int nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet) > +{ > + __u32 flags = 0; > + > + if (skb_is_gso(packet)) > + flags = NFQA_SKB_GSO; > + if (packet->ip_summed == CHECKSUM_PARTIAL) > + flags |= NFQA_SKB_CSUMNOTREADY; > + > + return nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)); Maybe you could avoid sending NFQA_SKB_INFO if flags == 0 ? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute 2013-04-16 15:55 ` Eric Dumazet @ 2013-04-16 17:47 ` Florian Westphal 0 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 17:47 UTC (permalink / raw) To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Tue, 2013-04-16 at 17:32 +0200, Florian Westphal wrote: > > Once we allow userspace to receive gso/gro packets, userspace > > needs to be able to determine when checksums appear to be > > broken, but are not. > > > > NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel > > later, pretend they are ok'. > > > > NFQA_SKB_GSO could be used for statistics, or to determine when > > packet size exceeds mtu. [..] > > +static int nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet) > > +{ > > + __u32 flags = 0; > > + > > + if (skb_is_gso(packet)) > > + flags = NFQA_SKB_GSO; > > > + if (packet->ip_summed == CHECKSUM_PARTIAL) > > + flags |= NFQA_SKB_CSUMNOTREADY; > > + > > + return nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)); > > Maybe you could avoid sending NFQA_SKB_INFO if flags == 0 ? Sure, will change it in the next round. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal ` (3 preceding siblings ...) 2013-04-16 15:32 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal @ 2013-04-16 15:32 ` Florian Westphal 4 siblings, 0 replies; 10+ messages in thread From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw) To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal Userspace can now indicate that it can cope with larger-than-mtu sized packets and packets that have invalid ipv4/tcp checksums. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> --- include/uapi/linux/netfilter/nfnetlink_queue.h | 3 ++- net/netfilter/nfnetlink_queue_core.c | 5 +++-- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h index 0069da3..a2308ae 100644 --- a/include/uapi/linux/netfilter/nfnetlink_queue.h +++ b/include/uapi/linux/netfilter/nfnetlink_queue.h @@ -97,7 +97,8 @@ enum nfqnl_attr_config { /* Flags for NFQA_CFG_FLAGS */ #define NFQA_CFG_F_FAIL_OPEN (1 << 0) #define NFQA_CFG_F_CONNTRACK (1 << 1) -#define NFQA_CFG_F_MAX (1 << 2) +#define NFQA_CFG_F_GSO (1 << 2) +#define NFQA_CFG_F_MAX (1 << 3) /* flags for NFQA_SKB_INFO */ /* packet appears to have wrong checksums, but they are ok */ diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 839c65b..07ed239 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -330,7 +330,8 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue, break; case NFQNL_COPY_PACKET: - if (entskb->ip_summed == CHECKSUM_PARTIAL && + if (!(queue->flags & NFQA_CFG_F_GSO) && + entskb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(entskb)) return NULL; @@ -623,7 +624,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) if (queue->copy_mode == NFQNL_COPY_NONE) return -EINVAL; - if (!skb_is_gso(entry->skb)) + if ((queue->flags & NFQA_CFG_F_GSO) || !skb_is_gso(entry->skb)) return __nfqnl_enqueue_packet(net, queue, entry); skb = entry->skb; -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums @ 2013-04-19 14:58 Florian Westphal 2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw) To: netfilter-devel Hello Pablo, here is V2 of the gso avoidance patchset for nfnetlink_queue. With these patches, userspace can now instruct the kernel that it is gso/gro aware and can handle "invalid" checksums that appear in packet headers. For old userspace, nothing is changed: the kernel segments gso skbs and adjusts checksums. To avoid gso/checksum fixup overhead, userspace applications must set the new NFQA_CFG_F_GSO config flag via NFQA_CFG_FLAGS attribute. Then, for every packet received, userspace needs to check for the presence of the new NFQA_SKB_INFO attribute. If it exists, userspace needs to test NFQA_SKB_CSUMNOTREADY bit. If set, this means that userspace must NOT very packet checksums, since they will be fixed later on by the kernel. The other bit is NFQA_SKB_GSO, which could be used for statistics, or to determine when packet size exceeds mtu. Feedback welcome. Update for libnetfilter_queue (including example program/documentation) will follow later. The following changes since commit d37d696804a83479f240b397670a07ccb53a7417: netfilter: xt_rpfilter: depend on raw or mangle table (2013-04-19 00:22:55 +0200) are available in the git repository at: git://git.breakpoint.cc/fw/nf-next.git nfqueue_gso_avoidance_06 Florian Westphal (5): netfilter: nf_queue: move device refcount bump to extra function netfilter: nfnetlink_queue: avoid peer_portid test netfilter: move skb_gso_segment into nfnetlink_queue module netfilter: nfnetlink_queue: add skb info attribute netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Changes since V1: - fix OOPS if CONFIG_BRIDGE_NETFILTER=y and old non-gso userspace listener is killed - only add NFQA_SKB_INFO when skb is gso or CHECKSUM_PARTIAL include/net/netfilter/nf_queue.h | 6 + include/uapi/linux/netfilter/nfnetlink_queue.h | 10 ++- net/netfilter/nf_queue.c | 143 +++++-------------- net/netfilter/nfnetlink_queue_core.c | 180 +++++++++++++++++++++--- 4 files changed, 209 insertions(+), 130 deletions(-) ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test 2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal @ 2013-04-19 14:58 ` Florian Westphal 2013-04-26 1:19 ` Pablo Neira Ayuso 0 siblings, 1 reply; 10+ messages in thread From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw) To: netfilter-devel; +Cc: Florian Westphal The portid is the netlink port id of the skb that created the queue. Add test to ensure the portid cannot be 0 at create time, and the check at enqueue time will always be false. Signed-off-by: Florian Westphal <fw@strlen.de> --- net/netfilter/nfnetlink_queue_core.c | 7 +++---- 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c index 5e280b3..94e2e4f 100644 --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, unsigned int h; int err; + if (portid == 0) + return ERR_PTR(-EINVAL); + spin_lock(&q->instances_lock); if (instance_lookup(q, queue_num)) { err = -EEXIST; @@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) } spin_lock_bh(&queue->lock); - if (!queue->peer_portid) { - err = -EINVAL; - goto err_out_free_nskb; - } if (queue->queue_total >= queue->queue_maxlen) { if (queue->flags & NFQA_CFG_F_FAIL_OPEN) { failopen = 1; -- 1.7.8.6 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test 2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal @ 2013-04-26 1:19 ` Pablo Neira Ayuso 0 siblings, 0 replies; 10+ messages in thread From: Pablo Neira Ayuso @ 2013-04-26 1:19 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel Hi Florian, On Fri, Apr 19, 2013 at 04:58:24PM +0200, Florian Westphal wrote: > The portid is the netlink port id of the skb that created the queue. > > Add test to ensure the portid cannot be 0 at create time, and > the check at enqueue time will always be false. > > Signed-off-by: Florian Westphal <fw@strlen.de> > --- > net/netfilter/nfnetlink_queue_core.c | 7 +++---- > 1 files changed, 3 insertions(+), 4 deletions(-) > > diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c > index 5e280b3..94e2e4f 100644 > --- a/net/netfilter/nfnetlink_queue_core.c > +++ b/net/netfilter/nfnetlink_queue_core.c > @@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num, > unsigned int h; > int err; > > + if (portid == 0) > + return ERR_PTR(-EINVAL); The instance_create function takes NETLINK_CB(skb).portid. IIRC, netlink always sets that for us to non zero, so I think we would never hit that error. > + > spin_lock(&q->instances_lock); > if (instance_lookup(q, queue_num)) { > err = -EEXIST; > @@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum) > } > spin_lock_bh(&queue->lock); > > - if (!queue->peer_portid) { > - err = -EINVAL; > - goto err_out_free_nskb; > - } I'm trying to remember under what circunstances the queue portid can be left unset, but I don't find any. Will check again this tomorrow with fresh mind. > if (queue->queue_total >= queue->queue_maxlen) { > if (queue->flags & NFQA_CFG_F_FAIL_OPEN) { > failopen = 1; > -- > 1.7.8.6 > > -- > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-04-26 1:19 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal 2013-04-16 15:32 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal 2013-04-16 15:32 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal 2013-04-16 15:32 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal 2013-04-16 15:32 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal 2013-04-16 15:55 ` Eric Dumazet 2013-04-16 17:47 ` Florian Westphal 2013-04-16 15:32 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal -- strict thread matches above, loose matches on Subject: below -- 2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal 2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal 2013-04-26 1:19 ` Pablo Neira Ayuso
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).