* [PATCH nft 1/4] segtree: set expr->len for prefix expression from interval_map_decompose()
2016-04-23 16:08 [PATCH nft 0/4] Interval overlap detection for named sets Pablo Neira Ayuso
@ 2016-04-23 16:08 ` Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 2/4] segtree: add expr_to_intervals() Pablo Neira Ayuso
` (3 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-23 16:08 UTC (permalink / raw)
To: netfilter-devel
This field needs to be set for the new interval overlap detection.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
src/segtree.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/segtree.c b/src/segtree.c
index 668c085..f544704 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -624,6 +624,7 @@ void interval_map_decompose(struct expr *set)
prefix_len = expr_value(i)->len - mpz_scan0(range, 0);
prefix = prefix_expr_alloc(&low->location, expr_value(low),
prefix_len);
+ prefix->len = low->len;
prefix = set_elem_expr_alloc(&low->location, prefix);
if (low->ops->type == EXPR_MAPPING)
prefix = mapping_expr_alloc(&low->location, prefix,
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH nft 2/4] segtree: add expr_to_intervals()
2016-04-23 16:08 [PATCH nft 0/4] Interval overlap detection for named sets Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 1/4] segtree: set expr->len for prefix expression from interval_map_decompose() Pablo Neira Ayuso
@ 2016-04-23 16:08 ` Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 3/4] segtree: rename set expression set_to_segtree() Pablo Neira Ayuso
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-23 16:08 UTC (permalink / raw)
To: netfilter-devel
Refactor code to add the new expr_to_intervals(). This function takes
the list of set element expressions and convert them to a list of
half-closed intervals.
This is useful for different purposes, such as interval overlap
and conflicts detection.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
src/segtree.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/src/segtree.c b/src/segtree.c
index f544704..c8d63b8 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -299,20 +299,20 @@ static bool interval_conflict(const struct elementary_interval *e1,
return false;
}
-static int set_to_segtree(struct list_head *msgs, struct expr *set,
- struct seg_tree *tree)
+static unsigned int expr_to_intervals(const struct expr *set,
+ unsigned int keylen,
+ struct elementary_interval **intervals)
{
- struct elementary_interval *intervals[set->size];
struct elementary_interval *ei;
struct expr *i, *next;
unsigned int n;
mpz_t low, high;
- mpz_init2(low, tree->keylen);
- mpz_init2(high, tree->keylen);
+ mpz_init2(low, keylen);
+ mpz_init2(high, keylen);
/*
- * Convert elements to intervals and sort by priority.
+ * Convert elements to intervals
*/
n = 0;
list_for_each_entry_safe(i, next, &set->expressions, list) {
@@ -320,10 +320,30 @@ static int set_to_segtree(struct list_head *msgs, struct expr *set,
range_expr_value_high(high, i);
ei = ei_alloc(low, high, i, 0);
intervals[n++] = ei;
+ }
+ mpz_clear(high);
+ mpz_clear(low);
+
+ return n;
+}
+
+static int set_to_segtree(struct list_head *msgs, struct expr *set,
+ struct seg_tree *tree)
+{
+ struct elementary_interval *intervals[set->size];
+ struct expr *i, *next;
+ unsigned int n;
+
+ n = expr_to_intervals(set, tree->keylen, intervals);
+ list_for_each_entry_safe(i, next, &set->expressions, list) {
list_del(&i->list);
expr_free(i);
}
+
+ /*
+ * Sort intervals by priority.
+ */
qsort(intervals, n, sizeof(intervals[0]), interval_cmp);
/*
@@ -340,8 +360,6 @@ static int set_to_segtree(struct list_head *msgs, struct expr *set,
ei_insert(tree, intervals[n]);
}
- mpz_clear(high);
- mpz_clear(low);
return 0;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH nft 3/4] segtree: rename set expression set_to_segtree()
2016-04-23 16:08 [PATCH nft 0/4] Interval overlap detection for named sets Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 1/4] segtree: set expr->len for prefix expression from interval_map_decompose() Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 2/4] segtree: add expr_to_intervals() Pablo Neira Ayuso
@ 2016-04-23 16:08 ` Pablo Neira Ayuso
2016-04-23 16:08 ` [PATCH nft 4/4] segtree: add interval overlap detection for dynamic updates Pablo Neira Ayuso
2016-04-25 10:38 ` [PATCH nft 0/4] Interval overlap detection for named sets Patrick McHardy
4 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-23 16:08 UTC (permalink / raw)
To: netfilter-devel
This function is modified by a follow up patch to take the set object,
so rename it to init.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
src/segtree.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/src/segtree.c b/src/segtree.c
index c8d63b8..879264b 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -327,16 +327,16 @@ static unsigned int expr_to_intervals(const struct expr *set,
return n;
}
-static int set_to_segtree(struct list_head *msgs, struct expr *set,
+static int set_to_segtree(struct list_head *msgs, struct expr *init,
struct seg_tree *tree)
{
- struct elementary_interval *intervals[set->size];
+ struct elementary_interval *intervals[init->size];
struct expr *i, *next;
unsigned int n;
- n = expr_to_intervals(set, tree->keylen, intervals);
+ n = expr_to_intervals(init, tree->keylen, intervals);
- list_for_each_entry_safe(i, next, &set->expressions, list) {
+ list_for_each_entry_safe(i, next, &init->expressions, list) {
list_del(&i->list);
expr_free(i);
}
@@ -349,9 +349,9 @@ static int set_to_segtree(struct list_head *msgs, struct expr *set,
/*
* Insert elements into tree
*/
- for (n = 0; n < set->size; n++) {
- if (set->set_flags & SET_F_MAP &&
- n < set->size - 1 &&
+ for (n = 0; n < init->size; n++) {
+ if (init->set_flags & SET_F_MAP &&
+ n < init->size - 1 &&
interval_conflict(intervals[n], intervals[n+1]))
return expr_binary_error(msgs,
intervals[n]->expr,
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH nft 4/4] segtree: add interval overlap detection for dynamic updates
2016-04-23 16:08 [PATCH nft 0/4] Interval overlap detection for named sets Pablo Neira Ayuso
` (2 preceding siblings ...)
2016-04-23 16:08 ` [PATCH nft 3/4] segtree: rename set expression set_to_segtree() Pablo Neira Ayuso
@ 2016-04-23 16:08 ` Pablo Neira Ayuso
2016-04-25 10:38 ` [PATCH nft 0/4] Interval overlap detection for named sets Patrick McHardy
4 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-23 16:08 UTC (permalink / raw)
To: netfilter-devel
Make sure the new intervals that we want to add are not overlapping with
any of the existing ones.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
src/segtree.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 52 insertions(+), 3 deletions(-)
diff --git a/src/segtree.c b/src/segtree.c
index 879264b..1d505a5 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -327,12 +327,61 @@ static unsigned int expr_to_intervals(const struct expr *set,
return n;
}
-static int set_to_segtree(struct list_head *msgs, struct expr *init,
- struct seg_tree *tree)
+/* This function checks for overlaps in two ways:
+ *
+ * 1) A new interval end intersects an existing interval.
+ * 2) New intervals that are larger than existing ones, that don't intersect
+ * at all, but that wrap the existing ones.
+ */
+static bool interval_overlap(const struct elementary_interval *e1,
+ const struct elementary_interval *e2)
+{
+ return (mpz_cmp(e1->left, e2->left) >= 0 &&
+ mpz_cmp(e1->left, e2->right) <= 0) ||
+ (mpz_cmp(e1->right, e2->left) >= 0 &&
+ mpz_cmp(e1->right, e2->right) <= 0) ||
+ (mpz_cmp(e1->left, e2->left) <= 0 &&
+ mpz_cmp(e1->right, e2->right) >= 0);
+}
+
+static int set_overlap(struct list_head *msgs, const struct set *set,
+ struct expr *init, unsigned int keylen)
+{
+ struct elementary_interval *new_intervals[init->size];
+ struct elementary_interval *intervals[set->init->size];
+ unsigned int n, m, i, j;
+
+ n = expr_to_intervals(init, keylen, new_intervals);
+ m = expr_to_intervals(set->init, keylen, intervals);
+
+ for (i = 0; i < n; i++) {
+ for (j = 0; j < m; j++) {
+ if (interval_overlap(new_intervals[i], intervals[j]))
+ return expr_error(msgs,
+ new_intervals[i]->expr,
+ "interval overlaps with an existing one");
+ }
+ }
+
+ return 0;
+}
+
+static int set_to_segtree(struct list_head *msgs, struct set *set,
+ struct expr *init, struct seg_tree *tree, bool add)
{
struct elementary_interval *intervals[init->size];
struct expr *i, *next;
unsigned int n;
+ int err;
+
+ /* We are updating an existing set with new elements, check if the new
+ * interval overlaps with any of the existing ones.
+ */
+ if (add && set->init != init) {
+ err = set_overlap(msgs, set, init, tree->keylen);
+ if (err < 0)
+ return err;
+ }
n = expr_to_intervals(init, tree->keylen, intervals);
@@ -491,7 +540,7 @@ int set_to_intervals(struct list_head *errs, struct set *set,
LIST_HEAD(list);
seg_tree_init(&tree, set, init);
- if (set_to_segtree(errs, init, &tree) < 0)
+ if (set_to_segtree(errs, set, init, &tree, add) < 0)
return -1;
segtree_linearize(&list, set, init, &tree, add);
--
2.1.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH nft 0/4] Interval overlap detection for named sets
2016-04-23 16:08 [PATCH nft 0/4] Interval overlap detection for named sets Pablo Neira Ayuso
` (3 preceding siblings ...)
2016-04-23 16:08 ` [PATCH nft 4/4] segtree: add interval overlap detection for dynamic updates Pablo Neira Ayuso
@ 2016-04-25 10:38 ` Patrick McHardy
2016-04-25 11:57 ` Pablo Neira Ayuso
4 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2016-04-25 10:38 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
On 23.04, Pablo Neira Ayuso wrote:
> Hi,
>
> This patchset adds the missing code to reject overlapping intervals.
>
> # nft add table ip filter
> # nft add set ip filter myset { type ipv4_addr\; flags interval\; }
> # nft add chain ip filter output { type filter hook output priority 0\; }
> # nft add rule ip daddr @myset counter packets 0 bytes 0
> # nft add element ip filter myset { 127.0.0.0/16 }
>
> Then, if you add an overlapping element:
>
> # nft add element ip filter myset { 127.0.0.0/24 }
> <cmdline>:1:31-42: Error: interval overlaps with an existing one
> add element ip filter myset { 127.0.0.0/24 }
> ^^^^^^^^^^^^
>
> The new validation code from userspace rejects this to avoid shadowing
> issues.
This is actually intended. There is no issue with shadowing since sets only
contain one statement, present or not present. For maps something like this
does make sense, but for sets it only makes it harder to use.
Generally, we have a conflict resolution based on size, the more specific
element wins. The assumption being that if you add something generic and
something more specific, the more specific item is an exception to the
more generic one.
> Pablo Neira Ayuso (4):
> segtree: set expr->len for prefix expression from interval_map_decompose()
> segtree: add expr_to_intervals()
> segtree: rename set expression set_to_segtree()
> segtree: add interval overlap detection for dynamic updates
>
> src/segtree.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 80 insertions(+), 12 deletions(-)
>
> --
> 2.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nft 0/4] Interval overlap detection for named sets
2016-04-25 10:38 ` [PATCH nft 0/4] Interval overlap detection for named sets Patrick McHardy
@ 2016-04-25 11:57 ` Pablo Neira Ayuso
2016-04-25 16:59 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-25 11:57 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel
On Mon, Apr 25, 2016 at 11:38:32AM +0100, Patrick McHardy wrote:
> On 23.04, Pablo Neira Ayuso wrote:
> > Hi,
> >
> > This patchset adds the missing code to reject overlapping intervals.
> >
> > # nft add table ip filter
> > # nft add set ip filter myset { type ipv4_addr\; flags interval\; }
> > # nft add chain ip filter output { type filter hook output priority 0\; }
> > # nft add rule ip daddr @myset counter packets 0 bytes 0
> > # nft add element ip filter myset { 127.0.0.0/16 }
> >
> > Then, if you add an overlapping element:
> >
> > # nft add element ip filter myset { 127.0.0.0/24 }
> > <cmdline>:1:31-42: Error: interval overlaps with an existing one
> > add element ip filter myset { 127.0.0.0/24 }
> > ^^^^^^^^^^^^
> >
> > The new validation code from userspace rejects this to avoid shadowing
> > issues.
>
> This is actually intended. There is no issue with shadowing since sets only
> contain one statement, present or not present.
There is issue, eg:
# nft add element ip x myset { 1.1.1.0/24 }
# nft add element ip x myset { 1.1.1.1 }
# nft list ruleset
table ip x {
set myset {
type ipv4_addr
flags interval
elements = { 1.1.1.0, 1.1.1.1}
}
...
}
If we don't reject the above by now, the listing back to userspace is
not consistent. This is not trivial to resolve since the
representation in the kernel (rbtree) are individual nodes. For the
example above we get a representation like:
{ 0.0.0.0/end, 1.1.1.0, 1.1.1.1, 1.1.1.2/end, 1.1.1.255/end }
This is hard to reconstruct from userspace given that intervals are
not easy to identify anymore after this situation.
> For maps something like this does make sense, but for sets it only
> makes it harder to use.
>
> Generally, we have a conflict resolution based on size, the more specific
> element wins. The assumption being that if you add something generic and
> something more specific, the more specific item is an exception to the
> more generic one.
Conflicts can be resolved through merges from userspace at some point.
Basically the idea is that, when maps are not in place, the smaller
interval will get removed and the new larger one is added in the same
batch, so this happens in the same go. nft can deal with these
overlaps so this is easier to users.
This requires a bit more work on the caching side, but I would like to
provide a more conservative solution by now by rejecting overlaps.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nft 0/4] Interval overlap detection for named sets
2016-04-25 11:57 ` Pablo Neira Ayuso
@ 2016-04-25 16:59 ` Patrick McHardy
2016-04-25 21:32 ` Pablo Neira Ayuso
0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2016-04-25 16:59 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
On 25.04, Pablo Neira Ayuso wrote:
> On Mon, Apr 25, 2016 at 11:38:32AM +0100, Patrick McHardy wrote:
> > On 23.04, Pablo Neira Ayuso wrote:
> > > This patchset adds the missing code to reject overlapping intervals.
> > >
> > > # nft add table ip filter
> > > # nft add set ip filter myset { type ipv4_addr\; flags interval\; }
> > > # nft add chain ip filter output { type filter hook output priority 0\; }
> > > # nft add rule ip daddr @myset counter packets 0 bytes 0
> > > # nft add element ip filter myset { 127.0.0.0/16 }
> > >
> > > Then, if you add an overlapping element:
> > >
> > > # nft add element ip filter myset { 127.0.0.0/24 }
> > > <cmdline>:1:31-42: Error: interval overlaps with an existing one
> > > add element ip filter myset { 127.0.0.0/24 }
> > > ^^^^^^^^^^^^
> > >
> > > The new validation code from userspace rejects this to avoid shadowing
> > > issues.
> >
> > This is actually intended. There is no issue with shadowing since sets only
> > contain one statement, present or not present.
>
> There is issue, eg:
>
> # nft add element ip x myset { 1.1.1.0/24 }
> # nft add element ip x myset { 1.1.1.1 }
> # nft list ruleset
> table ip x {
> set myset {
> type ipv4_addr
> flags interval
> elements = { 1.1.1.0, 1.1.1.1}
> }
> ...
> }
>
> If we don't reject the above by now, the listing back to userspace is
> not consistent. This is not trivial to resolve since the
> representation in the kernel (rbtree) are individual nodes. For the
> example above we get a representation like:
>
> { 0.0.0.0/end, 1.1.1.0, 1.1.1.1, 1.1.1.2/end, 1.1.1.255/end }
>
> This is hard to reconstruct from userspace given that intervals are
> not easy to identify anymore after this situation.
Yes, the 1.1.1.2/end is also incorrect, which is what messes up the
reconstruction.
For sets its quite easy. If we insert something that is completely overlapped,
there's nothing to do anyways. We simply do nothing. If we insert somsething
that extends an interval in one of both directions, we adjust it appropriately.
We already do exactly this when adding multiple elements to a set in a single
step:
# nft --debug=segtree,netlink filter output ip saddr { 192.168.1.0/24, 192.168.1.100 } counter
__set%d filter 7
insert: [c0a80100 c0a801ff]
insert: [c0a80164 c0a80164]
split [c0a80100 c0a801ff]
iter: [c0a80100 c0a80163]
iter: [c0a80164 c0a80164]
iter: [c0a80165 c0a801ff]
list: [00000000 c0a800ff]
list: [c0a80100 c0a801ff]
list: [c0a80200 ffffffff]
__set%d filter 0
element 00000000 : 1 [end] element 0001a8c0 : 0 [end] element 0002a8c0 : 1 [end]
Simply keeping track of the seqtree changes and applying those to the kernel
should handle this fine.
> > For maps something like this does make sense, but for sets it only
> > makes it harder to use.
> >
> > Generally, we have a conflict resolution based on size, the more specific
> > element wins. The assumption being that if you add something generic and
> > something more specific, the more specific item is an exception to the
> > more generic one.
>
> Conflicts can be resolved through merges from userspace at some point.
> Basically the idea is that, when maps are not in place, the smaller
> interval will get removed and the new larger one is added in the same
> batch, so this happens in the same go. nft can deal with these
> overlaps so this is easier to users.
The conflict resolution policy needs to be discussed. As I explained, we
already do this, but use a different policy (try adding a map with
two overlapping ranges). This of course so far only applies when the
overlapping elements are added in a single step, so using a different
policy for successive additions might make sense, however its something
I'd like to discuss first.
> This requires a bit more work on the caching side, but I would like to
> provide a more conservative solution by now by rejecting overlaps.
I fully agree for maps for now, but for sets it seems rather simple to do
a proper adjustment. I can give it a try if you want.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nft 0/4] Interval overlap detection for named sets
2016-04-25 16:59 ` Patrick McHardy
@ 2016-04-25 21:32 ` Pablo Neira Ayuso
2016-04-25 21:49 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2016-04-25 21:32 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel
On Mon, Apr 25, 2016 at 05:59:56PM +0100, Patrick McHardy wrote:
> On 25.04, Pablo Neira Ayuso wrote:
> > On Mon, Apr 25, 2016 at 11:38:32AM +0100, Patrick McHardy wrote:
> > > On 23.04, Pablo Neira Ayuso wrote:
> > > > This patchset adds the missing code to reject overlapping intervals.
> > > >
> > > > # nft add table ip filter
> > > > # nft add set ip filter myset { type ipv4_addr\; flags interval\; }
> > > > # nft add chain ip filter output { type filter hook output priority 0\; }
> > > > # nft add rule ip daddr @myset counter packets 0 bytes 0
> > > > # nft add element ip filter myset { 127.0.0.0/16 }
> > > >
> > > > Then, if you add an overlapping element:
> > > >
> > > > # nft add element ip filter myset { 127.0.0.0/24 }
> > > > <cmdline>:1:31-42: Error: interval overlaps with an existing one
> > > > add element ip filter myset { 127.0.0.0/24 }
> > > > ^^^^^^^^^^^^
> > > >
> > > > The new validation code from userspace rejects this to avoid shadowing
> > > > issues.
> > >
> > > This is actually intended. There is no issue with shadowing since sets only
> > > contain one statement, present or not present.
> >
> > There is issue, eg:
> >
> > # nft add element ip x myset { 1.1.1.0/24 }
> > # nft add element ip x myset { 1.1.1.1 }
> > # nft list ruleset
> > table ip x {
> > set myset {
> > type ipv4_addr
> > flags interval
> > elements = { 1.1.1.0, 1.1.1.1}
> > }
> > ...
> > }
> >
> > If we don't reject the above by now, the listing back to userspace is
> > not consistent. This is not trivial to resolve since the
> > representation in the kernel (rbtree) are individual nodes. For the
> > example above we get a representation like:
> >
> > { 0.0.0.0/end, 1.1.1.0, 1.1.1.1, 1.1.1.2/end, 1.1.1.255/end }
> >
> > This is hard to reconstruct from userspace given that intervals are
> > not easy to identify anymore after this situation.
>
> Yes, the 1.1.1.2/end is also incorrect, which is what messes up the
> reconstruction.
It is not only this, if you add this:
192.168.0.100-192.168.0.200
and then later on:
192.168.100.0/24
You get:
{ 0.0.0.0/end, 192.168.0.0, 192.168.0.100, 192.168.0.201/end, 192.168.1.0/end }
and the interval reconstruction back on the listing becomes inconsistent.
On top of that, deletion breaks this even further since the user would
use what nft reconstructs from those overlapping interval (which is
not what explicit inserted). Then, trying to delete element from what
we get back via listing makes this even worse as we end up leaving the
set in more inconsistent state and triggering bogus mismatches.
> For sets its quite easy. If we insert something that is completely overlapped,
> there's nothing to do anyways. We simply do nothing. If we insert somsething
> that extends an interval in one of both directions, we adjust it appropriately.
As I said, this adjustment can be done from userspace by deleting the
previous interval and by adding the new one in the same batch.
> We already do exactly this when adding multiple elements to a set in a single
> step:
>
> # nft --debug=segtree,netlink filter output ip saddr { 192.168.1.0/24, 192.168.1.100 } counter
Just a side note to be clear: I'm not modifying the behaviour for
anonymous sets in my patches.
> __set%d filter 7
> insert: [c0a80100 c0a801ff]
> insert: [c0a80164 c0a80164]
> split [c0a80100 c0a801ff]
> iter: [c0a80100 c0a80163]
> iter: [c0a80164 c0a80164]
> iter: [c0a80165 c0a801ff]
> list: [00000000 c0a800ff]
> list: [c0a80100 c0a801ff]
> list: [c0a80200 ffffffff]
>
> __set%d filter 0
> element 00000000 : 1 [end] element 0001a8c0 : 0 [end] element 0002a8c0 : 1 [end]
>
> Simply keeping track of the seqtree changes and applying those to the kernel
> should handle this fine.
Assuming the shadowed interval is in the kernel, we can delete it and
add the new one in the same go from the batch.
Assuming the new interval is shadowed by what we have in the kernel,
we can just send no update.
> > > For maps something like this does make sense, but for sets it only
> > > makes it harder to use.
> > >
> > > Generally, we have a conflict resolution based on size, the more specific
> > > element wins. The assumption being that if you add something generic and
> > > something more specific, the more specific item is an exception to the
> > > more generic one.
> >
> > Conflicts can be resolved through merges from userspace at some point.
> > Basically the idea is that, when maps are not in place, the smaller
> > interval will get removed and the new larger one is added in the same
> > batch, so this happens in the same go. nft can deal with these
> > overlaps so this is easier to users.
>
> The conflict resolution policy needs to be discussed. As I explained, we
> already do this, but use a different policy (try adding a map with
> two overlapping ranges). This of course so far only applies when the
> overlapping elements are added in a single step, so using a different
> policy for successive additions might make sense, however its something
> I'd like to discuss first.
>
> > This requires a bit more work on the caching side, but I would like to
> > provide a more conservative solution by now by rejecting overlaps.
>
> I fully agree for maps for now, but for sets it seems rather simple to do
> a proper adjustment. I can give it a try if you want.
Frankly, I don't see any reason not to push this update.
You can follow up with an incremental update if you want, but I would
like to see this is working in the next release. The current approach
is conservative in the sense that it rejects what can take us to
problems, we can enhance this by making nft smarter in the sense of
allowing this automatic adjustments.
Moreover, note that the rejection of overlapping intervals is
consistent with what we have with the other hashtable set.
# nft add element x y { 1.1.1.1 }
# nft add element x y { 1.1.1.1 }
Error: Element already exists
We can probably introduce a similar behaviour to what we have with
tables, something like:
# nft create element x y { 1.1.1.1 }
# nft create element x y { 1.1.1.1 }
Error: element already exists
Then change the current behaviour to ignore readding of existing
elements:
# nft add element x y { 1.1.1.1 }
# nft add element x y { 1.1.1.1 }
So feel free to follow up on this after my patchset. We'll soon have
tests for this so any improvement will be covered by them.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH nft 0/4] Interval overlap detection for named sets
2016-04-25 21:32 ` Pablo Neira Ayuso
@ 2016-04-25 21:49 ` Patrick McHardy
0 siblings, 0 replies; 10+ messages in thread
From: Patrick McHardy @ 2016-04-25 21:49 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
On 25.04, Pablo Neira Ayuso wrote:
> On Mon, Apr 25, 2016 at 05:59:56PM +0100, Patrick McHardy wrote:
> > On 25.04, Pablo Neira Ayuso wrote:
> > > There is issue, eg:
> > >
> > > # nft add element ip x myset { 1.1.1.0/24 }
> > > # nft add element ip x myset { 1.1.1.1 }
> > > # nft list ruleset
> > > table ip x {
> > > set myset {
> > > type ipv4_addr
> > > flags interval
> > > elements = { 1.1.1.0, 1.1.1.1}
> > > }
> > > ...
> > > }
> > >
> > > If we don't reject the above by now, the listing back to userspace is
> > > not consistent. This is not trivial to resolve since the
> > > representation in the kernel (rbtree) are individual nodes. For the
> > > example above we get a representation like:
> > >
> > > { 0.0.0.0/end, 1.1.1.0, 1.1.1.1, 1.1.1.2/end, 1.1.1.255/end }
> > >
> > > This is hard to reconstruct from userspace given that intervals are
> > > not easy to identify anymore after this situation.
> >
> > Yes, the 1.1.1.2/end is also incorrect, which is what messes up the
> > reconstruction.
>
> It is not only this, if you add this:
>
> 192.168.0.100-192.168.0.200
>
> and then later on:
>
> 192.168.100.0/24
>
> You get:
>
> { 0.0.0.0/end, 192.168.0.0, 192.168.0.100, 192.168.0.201/end, 192.168.1.0/end }
>
> and the interval reconstruction back on the listing becomes inconsistent.
>
> On top of that, deletion breaks this even further since the user would
> use what nft reconstructs from those overlapping interval (which is
> not what explicit inserted). Then, trying to delete element from what
> we get back via listing makes this even worse as we end up leaving the
> set in more inconsistent state and triggering bogus mismatches.
That's a good point. For consistency reasons, deletions would have to punch
a hole in existing ranges. I guess its a question of how to think of set
elements, do we think of them as containing ranges or every element within a
range? The later appears to more practical and also solve these problems, IOW
if we have 192.168.0.0/24 and delete 192.168.0.255 the resulting set contains
a range .0-.254.
> > For sets its quite easy. If we insert something that is completely overlapped,
> > there's nothing to do anyways. We simply do nothing. If we insert somsething
> > that extends an interval in one of both directions, we adjust it appropriately.
>
> As I said, this adjustment can be done from userspace by deleting the
> previous interval and by adding the new one in the same batch.
>
> > We already do exactly this when adding multiple elements to a set in a single
> > step:
> >
> > # nft --debug=segtree,netlink filter output ip saddr { 192.168.1.0/24, 192.168.1.100 } counter
>
> Just a side note to be clear: I'm not modifying the behaviour for
> anonymous sets in my patches.
Yeah, this was just for demonstration of what the seqtree does in this case.
> > __set%d filter 7
> > insert: [c0a80100 c0a801ff]
> > insert: [c0a80164 c0a80164]
> > split [c0a80100 c0a801ff]
> > iter: [c0a80100 c0a80163]
> > iter: [c0a80164 c0a80164]
> > iter: [c0a80165 c0a801ff]
> > list: [00000000 c0a800ff]
> > list: [c0a80100 c0a801ff]
> > list: [c0a80200 ffffffff]
> >
> > __set%d filter 0
> > element 00000000 : 1 [end] element 0001a8c0 : 0 [end] element 0002a8c0 : 1 [end]
> >
> > Simply keeping track of the seqtree changes and applying those to the kernel
> > should handle this fine.
>
> Assuming the shadowed interval is in the kernel, we can delete it and
> add the new one in the same go from the batch.
>
> Assuming the new interval is shadowed by what we have in the kernel,
> we can just send no update.
Yeah, that's even easier, the tracking might just reduce a few deletions that
we will immediately re-add.
> > > > For maps something like this does make sense, but for sets it only
> > > > makes it harder to use.
> > > >
> > > > Generally, we have a conflict resolution based on size, the more specific
> > > > element wins. The assumption being that if you add something generic and
> > > > something more specific, the more specific item is an exception to the
> > > > more generic one.
> > >
> > > Conflicts can be resolved through merges from userspace at some point.
> > > Basically the idea is that, when maps are not in place, the smaller
> > > interval will get removed and the new larger one is added in the same
> > > batch, so this happens in the same go. nft can deal with these
> > > overlaps so this is easier to users.
> >
> > The conflict resolution policy needs to be discussed. As I explained, we
> > already do this, but use a different policy (try adding a map with
> > two overlapping ranges). This of course so far only applies when the
> > overlapping elements are added in a single step, so using a different
> > policy for successive additions might make sense, however its something
> > I'd like to discuss first.
> >
> > > This requires a bit more work on the caching side, but I would like to
> > > provide a more conservative solution by now by rejecting overlaps.
> >
> > I fully agree for maps for now, but for sets it seems rather simple to do
> > a proper adjustment. I can give it a try if you want.
>
> Frankly, I don't see any reason not to push this update.
>
> You can follow up with an incremental update if you want, but I would
> like to see this is working in the next release. The current approach
> is conservative in the sense that it rejects what can take us to
> problems, we can enhance this by making nft smarter in the sense of
> allowing this automatic adjustments.
I agree. Let me have a closer look at the details, but no objections from me.
^ permalink raw reply [flat|nested] 10+ messages in thread