* [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test
2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal
@ 2013-04-16 15:32 ` Florian Westphal
0 siblings, 0 replies; 12+ messages in thread
From: Florian Westphal @ 2013-04-16 15:32 UTC (permalink / raw)
To: netfilter-devel; +Cc: Eric Dumazet, Florian Westphal
The portid is the netlink port id of the skb that created the queue.
Add test to ensure the portid cannot be 0 at create time, and
the check at enqueue time will always be false.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nfnetlink_queue_core.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 5e280b3..94e2e4f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
unsigned int h;
int err;
+ if (portid == 0)
+ return ERR_PTR(-EINVAL);
+
spin_lock(&q->instances_lock);
if (instance_lookup(q, queue_num)) {
err = -EEXIST;
@@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
}
spin_lock_bh(&queue->lock);
- if (!queue->peer_portid) {
- err = -EINVAL;
- goto err_out_free_nskb;
- }
if (queue->queue_total >= queue->queue_maxlen) {
if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
failopen = 1;
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums
@ 2013-04-19 14:58 Florian Westphal
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel
Hello Pablo,
here is V2 of the gso avoidance patchset for nfnetlink_queue.
With these patches, userspace can now instruct the kernel that it is gso/gro
aware and can handle "invalid" checksums that appear in packet headers.
For old userspace, nothing is changed: the kernel segments gso skbs
and adjusts checksums.
To avoid gso/checksum fixup overhead, userspace applications must set the
new NFQA_CFG_F_GSO config flag via NFQA_CFG_FLAGS attribute.
Then, for every packet received, userspace needs to check for the presence
of the new NFQA_SKB_INFO attribute. If it exists, userspace needs to test
NFQA_SKB_CSUMNOTREADY bit. If set, this means that userspace
must NOT very packet checksums, since they will be fixed later on
by the kernel.
The other bit is
NFQA_SKB_GSO, which could be used for statistics, or to determine when
packet size exceeds mtu.
Feedback welcome.
Update for libnetfilter_queue (including example program/documentation)
will follow later.
The following changes since commit d37d696804a83479f240b397670a07ccb53a7417:
netfilter: xt_rpfilter: depend on raw or mangle table (2013-04-19 00:22:55 +0200)
are available in the git repository at:
git://git.breakpoint.cc/fw/nf-next.git nfqueue_gso_avoidance_06
Florian Westphal (5):
netfilter: nf_queue: move device refcount bump to extra function
netfilter: nfnetlink_queue: avoid peer_portid test
netfilter: move skb_gso_segment into nfnetlink_queue module
netfilter: nfnetlink_queue: add skb info attribute
netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup
Changes since V1:
- fix OOPS if CONFIG_BRIDGE_NETFILTER=y and old non-gso userspace listener is killed
- only add NFQA_SKB_INFO when skb is gso or CHECKSUM_PARTIAL
include/net/netfilter/nf_queue.h | 6 +
include/uapi/linux/netfilter/nfnetlink_queue.h | 10 ++-
net/netfilter/nf_queue.c | 143 +++++--------------
net/netfilter/nfnetlink_queue_core.c | 180 +++++++++++++++++++++---
4 files changed, 209 insertions(+), 130 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
required by future patch that will need to duplicate the
nf_queue_entry, bumping refcounts of the copy.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nf_queue.c | 49 ++++++++++++++++++++++++++-------------------
1 files changed, 28 insertions(+), 21 deletions(-)
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 5ccf01e..1d91e77 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -66,6 +66,33 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
module_put(entry->elem->owner);
}
+/* Bump dev refs so they don't vanish while packet is out */
+static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
+{
+ if (!try_module_get(entry->elem->owner))
+ return false;
+
+ if (entry->indev)
+ dev_hold(entry->indev);
+ if (entry->outdev)
+ dev_hold(entry->outdev);
+#ifdef CONFIG_BRIDGE_NETFILTER
+ if (entry->skb->nf_bridge) {
+ struct nf_bridge_info *nf_bridge = entry->skb->nf_bridge;
+ struct net_device *physdev;
+
+ physdev = nf_bridge->physindev;
+ if (physdev)
+ dev_hold(physdev);
+ physdev = nf_bridge->physoutdev;
+ if (physdev)
+ dev_hold(physdev);
+ }
+#endif
+
+ return true;
+}
+
/*
* Any packet that leaves via this function must come back
* through nf_reinject().
@@ -80,10 +107,6 @@ static int __nf_queue(struct sk_buff *skb,
{
int status = -ENOENT;
struct nf_queue_entry *entry = NULL;
-#ifdef CONFIG_BRIDGE_NETFILTER
- struct net_device *physindev;
- struct net_device *physoutdev;
-#endif
const struct nf_afinfo *afinfo;
const struct nf_queue_handler *qh;
@@ -116,26 +139,10 @@ static int __nf_queue(struct sk_buff *skb,
.okfn = okfn,
};
- /* If it's going away, ignore hook. */
- if (!try_module_get(entry->elem->owner)) {
+ if (!nf_queue_entry_get_refs(entry)) {
status = -ECANCELED;
goto err_unlock;
}
- /* Bump dev refs so they don't vanish while packet is out */
- if (indev)
- dev_hold(indev);
- if (outdev)
- dev_hold(outdev);
-#ifdef CONFIG_BRIDGE_NETFILTER
- if (skb->nf_bridge) {
- physindev = skb->nf_bridge->physindev;
- if (physindev)
- dev_hold(physindev);
- physoutdev = skb->nf_bridge->physoutdev;
- if (physoutdev)
- dev_hold(physoutdev);
- }
-#endif
skb_dst_force(skb);
afinfo->saveroute(skb, entry);
status = qh->outfn(entry, queuenum);
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
2013-04-26 1:19 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
The portid is the netlink port id of the skb that created the queue.
Add test to ensure the portid cannot be 0 at create time, and
the check at enqueue time will always be false.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nfnetlink_queue_core.c | 7 +++----
1 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 5e280b3..94e2e4f 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
unsigned int h;
int err;
+ if (portid == 0)
+ return ERR_PTR(-EINVAL);
+
spin_lock(&q->instances_lock);
if (instance_lookup(q, queue_num)) {
err = -EEXIST;
@@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
}
spin_lock_bh(&queue->lock);
- if (!queue->peer_portid) {
- err = -EINVAL;
- goto err_out_free_nskb;
- }
if (queue->queue_total >= queue->queue_maxlen) {
if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
failopen = 1;
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
4 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
There is nothing wrong with the current code.
However, skb_gso_segment is expensive, so it would be nice
if we could avoid it in the future.
Since userspace must be prepared to receive larger-than-mtu-packets
(which will also have incorrect l3/l4 checksums), we cannot simply
remove it.
The plan is to add a per-queue feature flag that userspace can
set when binding the queue.
The problem is that in nf_queue, we only have a queue number,
not the queue context/configuration settings.
This patch should have no impact other than the skb_gso_segment
call now being in a function that has access to the queue config data.
The size new size attribute in nf_queue_entry is needed so
nfnetlink_queue can duplicate the entry of the gso skb
when segmenting the skb while also copying the route key.
Next step: avoid skb_gso_segment when queue config says so.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Change since V2:
- don't mess up entry->skb assignment when duplicating
nf_queue_entry to avoid OOPS with CONFIG_BRIDGE_NETFILTER=y
include/net/netfilter/nf_queue.h | 6 ++
net/netfilter/nf_queue.c | 96 ++-------------------
net/netfilter/nfnetlink_queue_core.c | 154 ++++++++++++++++++++++++++++++----
3 files changed, 152 insertions(+), 104 deletions(-)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index fb1c0be..aaba4bb 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -9,10 +9,13 @@ struct nf_queue_entry {
struct nf_hook_ops *elem;
u_int8_t pf;
+ u16 size; /* sizeof(entry) + saved route keys */
unsigned int hook;
struct net_device *indev;
struct net_device *outdev;
int (*okfn)(struct sk_buff *);
+
+ /* extra space to store route keys */
};
#define nf_queue_entry_reroute(x) ((void *)x + sizeof(struct nf_queue_entry))
@@ -27,4 +30,7 @@ void nf_register_queue_handler(const struct nf_queue_handler *qh);
void nf_unregister_queue_handler(void);
extern void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict);
+bool nf_queue_entry_get_refs(struct nf_queue_entry *entry);
+void nf_queue_entry_release_refs(struct nf_queue_entry *entry);
+
#endif /* _NF_QUEUE_H */
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 1d91e77..5d24b1f 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -45,7 +45,7 @@ void nf_unregister_queue_handler(void)
}
EXPORT_SYMBOL(nf_unregister_queue_handler);
-static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
+void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
{
/* Release those devices we held, or Alexey will kill me. */
if (entry->indev)
@@ -65,9 +65,10 @@ static void nf_queue_entry_release_refs(struct nf_queue_entry *entry)
/* Drop reference to owner of hook which queued us. */
module_put(entry->elem->owner);
}
+EXPORT_SYMBOL_GPL(nf_queue_entry_release_refs);
/* Bump dev refs so they don't vanish while packet is out */
-static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
+bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
{
if (!try_module_get(entry->elem->owner))
return false;
@@ -92,12 +93,13 @@ static bool nf_queue_entry_get_refs(struct nf_queue_entry *entry)
return true;
}
+EXPORT_SYMBOL_GPL(nf_queue_entry_get_refs);
/*
* Any packet that leaves via this function must come back
* through nf_reinject().
*/
-static int __nf_queue(struct sk_buff *skb,
+int nf_queue(struct sk_buff *skb,
struct nf_hook_ops *elem,
u_int8_t pf, unsigned int hook,
struct net_device *indev,
@@ -137,6 +139,7 @@ static int __nf_queue(struct sk_buff *skb,
.indev = indev,
.outdev = outdev,
.okfn = okfn,
+ .size = sizeof(*entry) + afinfo->route_key_size,
};
if (!nf_queue_entry_get_refs(entry)) {
@@ -163,87 +166,6 @@ err:
return status;
}
-#ifdef CONFIG_BRIDGE_NETFILTER
-/* When called from bridge netfilter, skb->data must point to MAC header
- * before calling skb_gso_segment(). Else, original MAC header is lost
- * and segmented skbs will be sent to wrong destination.
- */
-static void nf_bridge_adjust_skb_data(struct sk_buff *skb)
-{
- if (skb->nf_bridge)
- __skb_push(skb, skb->network_header - skb->mac_header);
-}
-
-static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
-{
- if (skb->nf_bridge)
- __skb_pull(skb, skb->network_header - skb->mac_header);
-}
-#else
-#define nf_bridge_adjust_skb_data(s) do {} while (0)
-#define nf_bridge_adjust_segmented_data(s) do {} while (0)
-#endif
-
-int nf_queue(struct sk_buff *skb,
- struct nf_hook_ops *elem,
- u_int8_t pf, unsigned int hook,
- struct net_device *indev,
- struct net_device *outdev,
- int (*okfn)(struct sk_buff *),
- unsigned int queuenum)
-{
- struct sk_buff *segs;
- int err = -EINVAL;
- unsigned int queued;
-
- if (!skb_is_gso(skb))
- return __nf_queue(skb, elem, pf, hook, indev, outdev, okfn,
- queuenum);
-
- switch (pf) {
- case NFPROTO_IPV4:
- skb->protocol = htons(ETH_P_IP);
- break;
- case NFPROTO_IPV6:
- skb->protocol = htons(ETH_P_IPV6);
- break;
- }
-
- nf_bridge_adjust_skb_data(skb);
- segs = skb_gso_segment(skb, 0);
- /* Does not use PTR_ERR to limit the number of error codes that can be
- * returned by nf_queue. For instance, callers rely on -ECANCELED to mean
- * 'ignore this hook'.
- */
- if (IS_ERR(segs))
- goto out_err;
- queued = 0;
- err = 0;
- do {
- struct sk_buff *nskb = segs->next;
-
- segs->next = NULL;
- if (err == 0) {
- nf_bridge_adjust_segmented_data(segs);
- err = __nf_queue(segs, elem, pf, hook, indev,
- outdev, okfn, queuenum);
- }
- if (err == 0)
- queued++;
- else
- kfree_skb(segs);
- segs = nskb;
- } while (segs);
-
- if (queued) {
- kfree_skb(skb);
- return 0;
- }
- out_err:
- nf_bridge_adjust_segmented_data(skb);
- return err;
-}
-
void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
{
struct sk_buff *skb = entry->skb;
@@ -283,9 +205,9 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
local_bh_enable();
break;
case NF_QUEUE:
- err = __nf_queue(skb, elem, entry->pf, entry->hook,
- entry->indev, entry->outdev, entry->okfn,
- verdict >> NF_VERDICT_QBITS);
+ err = nf_queue(skb, elem, entry->pf, entry->hook,
+ entry->indev, entry->outdev, entry->okfn,
+ verdict >> NF_VERDICT_QBITS);
if (err < 0) {
if (err == -ECANCELED)
goto next_hook;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 94e2e4f..38c8a7a 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -479,28 +479,13 @@ nla_put_failure:
}
static int
-nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
+__nfqnl_enqueue_packet(struct net *net, struct nfqnl_instance *queue,
+ struct nf_queue_entry *entry)
{
struct sk_buff *nskb;
- struct nfqnl_instance *queue;
int err = -ENOBUFS;
__be32 *packet_id_ptr;
int failopen = 0;
- struct net *net = dev_net(entry->indev ?
- entry->indev : entry->outdev);
- struct nfnl_queue_net *q = nfnl_queue_pernet(net);
-
- /* rcu_read_lock()ed by nf_hook_slow() */
- queue = instance_lookup(q, queuenum);
- if (!queue) {
- err = -ESRCH;
- goto err_out;
- }
-
- if (queue->copy_mode == NFQNL_COPY_NONE) {
- err = -EINVAL;
- goto err_out;
- }
nskb = nfqnl_build_packet_message(queue, entry, &packet_id_ptr);
if (nskb == NULL) {
@@ -545,6 +530,141 @@ err_out:
return err;
}
+static struct nf_queue_entry *
+nf_queue_entry_dup(struct nf_queue_entry *e)
+{
+ struct nf_queue_entry *entry = kmemdup(e, e->size, GFP_ATOMIC);
+ if (entry) {
+ if (nf_queue_entry_get_refs(entry))
+ return entry;
+ kfree(entry);
+ }
+ return NULL;
+}
+
+#ifdef CONFIG_BRIDGE_NETFILTER
+/* When called from bridge netfilter, skb->data must point to MAC header
+ * before calling skb_gso_segment(). Else, original MAC header is lost
+ * and segmented skbs will be sent to wrong destination.
+ */
+static void nf_bridge_adjust_skb_data(struct sk_buff *skb)
+{
+ if (skb->nf_bridge)
+ __skb_push(skb, skb->network_header - skb->mac_header);
+}
+
+static void nf_bridge_adjust_segmented_data(struct sk_buff *skb)
+{
+ if (skb->nf_bridge)
+ __skb_pull(skb, skb->network_header - skb->mac_header);
+}
+#else
+#define nf_bridge_adjust_skb_data(s) do {} while (0)
+#define nf_bridge_adjust_segmented_data(s) do {} while (0)
+#endif
+
+static void free_entry(struct nf_queue_entry *entry)
+{
+ nf_queue_entry_release_refs(entry);
+ kfree(entry);
+}
+
+static int
+__nfqnl_enqueue_packet_gso(struct net *net, struct nfqnl_instance *queue,
+ struct sk_buff *skb, struct nf_queue_entry *entry)
+{
+ int ret = -ENOMEM;
+ struct nf_queue_entry *entry_seg;
+
+ nf_bridge_adjust_segmented_data(skb);
+
+ if (skb->next == NULL) { /* last packet, no need to copy entry */
+ struct sk_buff *gso_skb = entry->skb;
+ entry->skb = skb;
+ ret = __nfqnl_enqueue_packet(net, queue, entry);
+ if (ret)
+ entry->skb = gso_skb;
+ return ret;
+ }
+
+ skb->next = NULL;
+
+ entry_seg = nf_queue_entry_dup(entry);
+ if (entry_seg) {
+ entry_seg->skb = skb;
+ ret = __nfqnl_enqueue_packet(net, queue, entry_seg);
+ if (ret)
+ free_entry(entry_seg);
+ }
+ return ret;
+}
+
+static int
+nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
+{
+ unsigned int queued;
+ struct nfqnl_instance *queue;
+ struct sk_buff *skb, *segs;
+ int err = -ENOBUFS;
+ struct net *net = dev_net(entry->indev ?
+ entry->indev : entry->outdev);
+ struct nfnl_queue_net *q = nfnl_queue_pernet(net);
+
+ /* rcu_read_lock()ed by nf_hook_slow() */
+ queue = instance_lookup(q, queuenum);
+ if (!queue)
+ return -ESRCH;
+
+ if (queue->copy_mode == NFQNL_COPY_NONE)
+ return -EINVAL;
+
+ if (!skb_is_gso(entry->skb))
+ return __nfqnl_enqueue_packet(net, queue, entry);
+
+ skb = entry->skb;
+
+ switch (entry->pf) {
+ case NFPROTO_IPV4:
+ skb->protocol = htons(ETH_P_IP);
+ break;
+ case NFPROTO_IPV6:
+ skb->protocol = htons(ETH_P_IPV6);
+ break;
+ }
+
+ nf_bridge_adjust_skb_data(skb);
+ segs = skb_gso_segment(skb, 0);
+ /* Does not use PTR_ERR to limit the number of error codes that can be
+ * returned by nf_queue. For instance, callers rely on -ECANCELED to
+ * mean 'ignore this hook'.
+ */
+ if (IS_ERR(segs))
+ goto out_err;
+ queued = 0;
+ err = 0;
+ do {
+ struct sk_buff *nskb = segs->next;
+ if (err == 0)
+ err = __nfqnl_enqueue_packet_gso(net, queue,
+ segs, entry);
+ if (err == 0)
+ queued++;
+ else
+ kfree_skb(segs);
+ segs = nskb;
+ } while (segs);
+
+ if (queued) {
+ if (err) /* some segments are already queued */
+ free_entry(entry);
+ kfree_skb(skb);
+ return 0;
+ }
+ out_err:
+ nf_bridge_adjust_segmented_data(skb);
+ return err;
+}
+
static int
nfqnl_mangle(void *data, int data_len, struct nf_queue_entry *e, int diff)
{
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
` (2 preceding siblings ...)
2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
4 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
Once we allow userspace to receive gso/gro packets, userspace
needs to be able to determine when checksums appear to be
broken, but are not.
NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
later, pretend they are ok'.
NFQA_SKB_GSO could be used for statistics, or to determine when
packet size exceeds mtu.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Change since v2:
add NFQA_SKB_INFO only if value is nonzero.
include/uapi/linux/netfilter/nfnetlink_queue.h | 7 +++++++
net/netfilter/nfnetlink_queue_core.c | 16 ++++++++++++++++
2 files changed, 23 insertions(+), 0 deletions(-)
diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 70ec8c2..0069da3 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -45,6 +45,7 @@ enum nfqnl_attr_type {
NFQA_CT, /* nf_conntrack_netlink.h */
NFQA_CT_INFO, /* enum ip_conntrack_info */
NFQA_CAP_LEN, /* __u32 length of captured packet */
+ NFQA_SKB_INFO, /* __u32 skb meta information */
__NFQA_MAX
};
@@ -98,4 +99,10 @@ enum nfqnl_attr_config {
#define NFQA_CFG_F_CONNTRACK (1 << 1)
#define NFQA_CFG_F_MAX (1 << 2)
+/* flags for NFQA_SKB_INFO */
+/* packet appears to have wrong checksums, but they are ok */
+#define NFQA_SKB_CSUMNOTREADY (1 << 0)
+/* packet is GSO (i.e., exceeds device mtu) */
+#define NFQA_SKB_GSO (1 << 1)
+
#endif /* _NFNETLINK_QUEUE_H */
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 38c8a7a..c927a7b 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -275,6 +275,18 @@ nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
skb_shinfo(to)->nr_frags = j;
}
+static int nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet)
+{
+ __u32 flags = 0;
+
+ if (packet->ip_summed == CHECKSUM_PARTIAL)
+ flags = NFQA_SKB_CSUMNOTREADY;
+ if (skb_is_gso(packet))
+ flags |= NFQA_SKB_GSO;
+
+ return flags ? nla_put_be32(nlskb, NFQA_SKB_INFO, htonl(flags)) : 0;
+}
+
static struct sk_buff *
nfqnl_build_packet_message(struct nfqnl_instance *queue,
struct nf_queue_entry *entry,
@@ -304,6 +316,7 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
#endif
+ nla_total_size(sizeof(u_int32_t)) /* mark */
+ nla_total_size(sizeof(struct nfqnl_msg_packet_hw))
+ + nla_total_size(sizeof(u_int32_t)) /* skbinfo */
+ nla_total_size(sizeof(u_int32_t)); /* cap_len */
if (entskb->tstamp.tv64)
@@ -456,6 +469,9 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
if (cap_len > 0 && nla_put_be32(skb, NFQA_CAP_LEN, htonl(cap_len)))
goto nla_put_failure;
+ if (nfqnl_put_packet_info(skb, entskb))
+ goto nla_put_failure;
+
if (data_len) {
struct nlattr *nla;
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
` (3 preceding siblings ...)
2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
@ 2013-04-19 14:58 ` Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
4 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2013-04-19 14:58 UTC (permalink / raw)
To: netfilter-devel; +Cc: Florian Westphal
Userspace can now indicate that it can cope with larger-than-mtu sized
packets and packets that have invalid ipv4/tcp checksums.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
include/uapi/linux/netfilter/nfnetlink_queue.h | 3 ++-
net/netfilter/nfnetlink_queue_core.c | 5 +++--
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/netfilter/nfnetlink_queue.h b/include/uapi/linux/netfilter/nfnetlink_queue.h
index 0069da3..a2308ae 100644
--- a/include/uapi/linux/netfilter/nfnetlink_queue.h
+++ b/include/uapi/linux/netfilter/nfnetlink_queue.h
@@ -97,7 +97,8 @@ enum nfqnl_attr_config {
/* Flags for NFQA_CFG_FLAGS */
#define NFQA_CFG_F_FAIL_OPEN (1 << 0)
#define NFQA_CFG_F_CONNTRACK (1 << 1)
-#define NFQA_CFG_F_MAX (1 << 2)
+#define NFQA_CFG_F_GSO (1 << 2)
+#define NFQA_CFG_F_MAX (1 << 3)
/* flags for NFQA_SKB_INFO */
/* packet appears to have wrong checksums, but they are ok */
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index c927a7b..f1e3c90c 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -330,7 +330,8 @@ nfqnl_build_packet_message(struct nfqnl_instance *queue,
break;
case NFQNL_COPY_PACKET:
- if (entskb->ip_summed == CHECKSUM_PARTIAL &&
+ if (!(queue->flags & NFQA_CFG_F_GSO) &&
+ entskb->ip_summed == CHECKSUM_PARTIAL &&
skb_checksum_help(entskb))
return NULL;
@@ -634,7 +635,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
if (queue->copy_mode == NFQNL_COPY_NONE)
return -EINVAL;
- if (!skb_is_gso(entry->skb))
+ if ((queue->flags & NFQA_CFG_F_GSO) || !skb_is_gso(entry->skb))
return __nfqnl_enqueue_packet(net, queue, entry);
skb = entry->skb;
--
1.7.8.6
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test
2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
@ 2013-04-26 1:19 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-26 1:19 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
Hi Florian,
On Fri, Apr 19, 2013 at 04:58:24PM +0200, Florian Westphal wrote:
> The portid is the netlink port id of the skb that created the queue.
>
> Add test to ensure the portid cannot be 0 at create time, and
> the check at enqueue time will always be false.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/netfilter/nfnetlink_queue_core.c | 7 +++----
> 1 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
> index 5e280b3..94e2e4f 100644
> --- a/net/netfilter/nfnetlink_queue_core.c
> +++ b/net/netfilter/nfnetlink_queue_core.c
> @@ -107,6 +107,9 @@ instance_create(struct nfnl_queue_net *q, u_int16_t queue_num,
> unsigned int h;
> int err;
>
> + if (portid == 0)
> + return ERR_PTR(-EINVAL);
The instance_create function takes NETLINK_CB(skb).portid. IIRC,
netlink always sets that for us to non zero, so I think we would never
hit that error.
> +
> spin_lock(&q->instances_lock);
> if (instance_lookup(q, queue_num)) {
> err = -EEXIST;
> @@ -506,10 +509,6 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
> }
> spin_lock_bh(&queue->lock);
>
> - if (!queue->peer_portid) {
> - err = -EINVAL;
> - goto err_out_free_nskb;
> - }
I'm trying to remember under what circunstances the queue portid can
be left unset, but I don't find any. Will check again this tomorrow
with fresh mind.
> if (queue->queue_total >= queue->queue_maxlen) {
> if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
> failopen = 1;
> --
> 1.7.8.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
@ 2013-04-27 17:46 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Apr 19, 2013 at 04:58:23PM +0200, Florian Westphal wrote:
> required by future patch that will need to duplicate the
> nf_queue_entry, bumping refcounts of the copy.
Applied, thanks Florian.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module
2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
@ 2013-04-27 17:46 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Apr 19, 2013 at 04:58:25PM +0200, Florian Westphal wrote:
> There is nothing wrong with the current code.
>
> However, skb_gso_segment is expensive, so it would be nice
> if we could avoid it in the future.
>
> Since userspace must be prepared to receive larger-than-mtu-packets
> (which will also have incorrect l3/l4 checksums), we cannot simply
> remove it.
>
> The plan is to add a per-queue feature flag that userspace can
> set when binding the queue.
>
> The problem is that in nf_queue, we only have a queue number,
> not the queue context/configuration settings.
>
> This patch should have no impact other than the skb_gso_segment
> call now being in a function that has access to the queue config data.
>
> The size new size attribute in nf_queue_entry is needed so
> nfnetlink_queue can duplicate the entry of the gso skb
> when segmenting the skb while also copying the route key.
>
> Next step: avoid skb_gso_segment when queue config says so.
Applied, thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute
2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
@ 2013-04-27 17:46 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Apr 19, 2013 at 04:58:26PM +0200, Florian Westphal wrote:
> Once we allow userspace to receive gso/gro packets, userspace
> needs to be able to determine when checksums appear to be
> broken, but are not.
>
> NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel
> later, pretend they are ok'.
>
> NFQA_SKB_GSO could be used for statistics, or to determine when
> packet size exceeds mtu.
Applied, thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup
2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
@ 2013-04-27 17:46 ` Pablo Neira Ayuso
0 siblings, 0 replies; 12+ messages in thread
From: Pablo Neira Ayuso @ 2013-04-27 17:46 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel
On Fri, Apr 19, 2013 at 04:58:27PM +0200, Florian Westphal wrote:
> Userspace can now indicate that it can cope with larger-than-mtu sized
> packets and packets that have invalid ipv4/tcp checksums.
And also applied, thanks Florian.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-04-27 17:46 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-19 14:58 [PATCH -next v2 0/5] netfilter: nf_queue: avoid expensive gso/checksums Florian Westphal
2013-04-19 14:58 ` [PATCH 1/5] netfilter: nf_queue: move device refcount bump to extra function Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
2013-04-26 1:19 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 3/5] netfilter: move skb_gso_segment into nfnetlink_queue module Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 4/5] netfilter: nfnetlink_queue: add skb info attribute Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
2013-04-19 14:58 ` [PATCH 5/5] netfilter: nfqueue: avoid expensive gso segmentation and checksum fixup Florian Westphal
2013-04-27 17:46 ` Pablo Neira Ayuso
-- strict thread matches above, loose matches on Subject: below --
2013-04-16 15:32 [PATCH -next 0/5] netfilter: nf_queue: avoid expensive gso/checksumming Florian Westphal
2013-04-16 15:32 ` [PATCH 2/5] netfilter: nfnetlink_queue: avoid peer_portid test Florian Westphal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).