[PATCH v5 0/3] selinux: avtab arrays and refactors

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v5 0/3] selinux: avtab arrays and refactors
@ 2023-11-03 17:29 Jacob Satterfield
  2023-11-03 17:29 ` [PATCH v5 1/3] selinux: refactor avtab_node comparisons Jacob Satterfield
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-03 17:29 UTC (permalink / raw)
  To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace

As the refpolicy and the default Fedora policy continue to grow in
size, especially with regard to rules / access vectors, the memory
usage of the policydb and runtime to search through it increases.
Looking at /proc/slabinfo indicates that the avtab_node_cachep
kmem_cache is significantly responsible for overall memory usage and
was a good target for optimizations. Running "perf stat" on the
"load_policy" command shows that a majority of time is spent adding
rules into the avtab.

This patch series is an attempt at optimizing these hot spots within
the security server implementation to help it scale with additional
rules in the future.

Patch 1 refactors the internal avtab.c to remove duplicative code
and homogenize comparisons between data structures.

Patch 2 computes the number of conditional avtab rules instead of
using the regular avtab table size as a hint to avtab_alloc() to get
a properly sized hashtable. 

Patch 3 changes avtab to use arrays instead of a kmem_cache for the
individual nodes of the hashtable and depends on patch 3 due to the
pre-allocation of the array to prevent growing or shrinking it.

Changelog:

v5:
- No code changes, only patch hash updates

v4:
- selinux: avtab iteration macros
  - dropped from the patch series until the use-case for it becomes
    more apparent in future patches
- selinux: refactor avtab_node comparisons
  - unchanged
- selinux: fix conditional avtab slot hint
  - created from "use arrays for avtab hashtable nodes" patch
  - added comments in avtab.c explaining the use of the nrules param
- selinux: use arrays for avtab hashtable nodes
  - split independent code to "fix conditional avtab slot hint" patch
    to ease reviewing and allow independent merging
  - fixed an edge case exercised by this patch that causes a NULL
    deref in avtab_destroy(); discovered by clang-analyzer

v3:
- selinux: simplify avtab_insert_node() prototype
  - dropped, already merged
- selinux: avtab iteration macros
  - reverted changes to avtab_destroy() which created a UAF bug
- selinux: refactor avtab_node comparisons
  - fixed indent issues in avtab_node_cmp()
  - removed unlikely() macro in avtab_insert()

v2:
- selinux: use arrays for avtab hashtable nodes
  - rewritten to use pointers instead of indices
  - NULL_NODE_IDX and NODES_ARRAY_IDX macros removed
  - getter functions removed
  - two-pass algorithm used to compute cond. table size prior to
    allocating the array
- selinux: hweight optimization in avtab_read_item
  - dropped, already merged
- selinux: shrink conditional avtab node array
  - dropped, not needed due to new allocation approach
- selinux: refactor avtab_node comparisons
  - newly added
- selinux: avtab iteration macros
  - newly added
- selinux: simplify avtab_insert_node() prototype
  - newly added

Jacob Satterfield (3):
  selinux: refactor avtab_node comparisons
  selinux: fix conditional avtab slot hint
  selinux: use arrays for avtab hashtable nodes

 security/selinux/ss/avtab.c       | 152 +++++++++++++++---------------
 security/selinux/ss/avtab.h       |   4 +-
 security/selinux/ss/conditional.c |  38 +++++---
 security/selinux/ss/conditional.h |   2 +-
 4 files changed, 103 insertions(+), 93 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 1/3] selinux: refactor avtab_node comparisons
  2023-11-03 17:29 [PATCH v5 0/3] selinux: avtab arrays and refactors Jacob Satterfield
@ 2023-11-03 17:29 ` Jacob Satterfield
  2023-11-21  1:29   ` Paul Moore
  2023-11-03 17:29 ` [PATCH v5 2/3] selinux: fix conditional avtab slot hint Jacob Satterfield
  2023-11-03 17:29 ` [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
  2 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-03 17:29 UTC (permalink / raw)
  To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace

In four separate functions within avtab, the same comparison logic is
used. The only difference is how the result is handled or whether there
is a unique specifier value to be checked for or used.

Extracting this functionality into the avtab_node_cmp() function unifies
the comparison logic between searching and insertion and gets rid of
duplicative code so that the implementation is easier to maintain.

Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
---
 security/selinux/ss/avtab.c | 101 +++++++++++++++---------------------
 1 file changed, 41 insertions(+), 60 deletions(-)

diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 8751a602ead2..697eb4352439 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -96,12 +96,34 @@ avtab_insert_node(struct avtab *h, struct avtab_node **dst,
 	return newnode;
 }
 
+static int avtab_node_cmp(const struct avtab_key *key1,
+			  const struct avtab_key *key2)
+{
+	u16 specified = key1->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
+
+	if (key1->source_type == key2->source_type &&
+	    key1->target_type == key2->target_type &&
+	    key1->target_class == key2->target_class &&
+	    (specified & key2->specified))
+		return 0;
+	if (key1->source_type < key2->source_type)
+		return -1;
+	if (key1->source_type == key2->source_type &&
+	    key1->target_type < key2->target_type)
+		return -1;
+	if (key1->source_type == key2->source_type &&
+	    key1->target_type == key2->target_type &&
+	    key1->target_class < key2->target_class)
+		return -1;
+	return 1;
+}
+
 static int avtab_insert(struct avtab *h, const struct avtab_key *key,
 			const struct avtab_datum *datum)
 {
 	u32 hvalue;
 	struct avtab_node *prev, *cur, *newnode;
-	u16 specified = key->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
+	int cmp;
 
 	if (!h || !h->nslot || h->nel == U32_MAX)
 		return -EINVAL;
@@ -110,23 +132,11 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
 	for (prev = NULL, cur = h->htable[hvalue];
 	     cur;
 	     prev = cur, cur = cur->next) {
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class == cur->key.target_class &&
-		    (specified & cur->key.specified)) {
-			/* extended perms may not be unique */
-			if (specified & AVTAB_XPERMS)
-				break;
+		cmp = avtab_node_cmp(key, &cur->key);
+		/* extended perms may not be unique */
+		if (cmp == 0 && !(key->specified & AVTAB_XPERMS))
 			return -EEXIST;
-		}
-		if (key->source_type < cur->key.source_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type < cur->key.target_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class < cur->key.target_class)
+		if (cmp <= 0)
 			break;
 	}
 
@@ -148,7 +158,7 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
 {
 	u32 hvalue;
 	struct avtab_node *prev, *cur;
-	u16 specified = key->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
+	int cmp;
 
 	if (!h || !h->nslot || h->nel == U32_MAX)
 		return NULL;
@@ -156,19 +166,8 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
 	for (prev = NULL, cur = h->htable[hvalue];
 	     cur;
 	     prev = cur, cur = cur->next) {
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class == cur->key.target_class &&
-		    (specified & cur->key.specified))
-			break;
-		if (key->source_type < cur->key.source_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type < cur->key.target_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class < cur->key.target_class)
+		cmp = avtab_node_cmp(key, &cur->key);
+		if (cmp <= 0)
 			break;
 	}
 	return avtab_insert_node(h, prev ? &prev->next : &h->htable[hvalue],
@@ -183,7 +182,7 @@ struct avtab_node *avtab_search_node(struct avtab *h,
 {
 	u32 hvalue;
 	struct avtab_node *cur;
-	u16 specified = key->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
+	int cmp;
 
 	if (!h || !h->nslot)
 		return NULL;
@@ -191,20 +190,10 @@ struct avtab_node *avtab_search_node(struct avtab *h,
 	hvalue = avtab_hash(key, h->mask);
 	for (cur = h->htable[hvalue]; cur;
 	     cur = cur->next) {
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class == cur->key.target_class &&
-		    (specified & cur->key.specified))
+		cmp = avtab_node_cmp(key, &cur->key);
+		if (cmp == 0)
 			return cur;
-
-		if (key->source_type < cur->key.source_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type < cur->key.target_type)
-			break;
-		if (key->source_type == cur->key.source_type &&
-		    key->target_type == cur->key.target_type &&
-		    key->target_class < cur->key.target_class)
+		if (cmp < 0)
 			break;
 	}
 	return NULL;
@@ -213,27 +202,19 @@ struct avtab_node *avtab_search_node(struct avtab *h,
 struct avtab_node*
 avtab_search_node_next(struct avtab_node *node, u16 specified)
 {
+	struct avtab_key tmp_key;
 	struct avtab_node *cur;
+	int cmp;
 
 	if (!node)
 		return NULL;
-
-	specified &= ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
+	tmp_key = node->key;
+	tmp_key.specified = specified;
 	for (cur = node->next; cur; cur = cur->next) {
-		if (node->key.source_type == cur->key.source_type &&
-		    node->key.target_type == cur->key.target_type &&
-		    node->key.target_class == cur->key.target_class &&
-		    (specified & cur->key.specified))
+		cmp = avtab_node_cmp(&tmp_key, &cur->key);
+		if (cmp == 0)
 			return cur;
-
-		if (node->key.source_type < cur->key.source_type)
-			break;
-		if (node->key.source_type == cur->key.source_type &&
-		    node->key.target_type < cur->key.target_type)
-			break;
-		if (node->key.source_type == cur->key.source_type &&
-		    node->key.target_type == cur->key.target_type &&
-		    node->key.target_class < cur->key.target_class)
+		if (cmp < 0)
 			break;
 	}
 	return NULL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-03 17:29 [PATCH v5 0/3] selinux: avtab arrays and refactors Jacob Satterfield
  2023-11-03 17:29 ` [PATCH v5 1/3] selinux: refactor avtab_node comparisons Jacob Satterfield
@ 2023-11-03 17:29 ` Jacob Satterfield
  2023-11-03 18:22   ` Stephen Smalley
  2023-11-21  1:29   ` Paul Moore
  2023-11-03 17:29 ` [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
  2 siblings, 2 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-03 17:29 UTC (permalink / raw)
  To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace

Due to how conditional rules are written in the binary policy, the
code responsible for loading does not know how many conditional rules
there are before creating the avtab structure. Instead, it uses the
number of elements in the non-conditional avtab as a hint and allocates
the hash table based on it. In the refpolicy and default Fedora policy,
the actual sizes of these tables are not similar (~85k vs ~10k) thereby
creating more slots than needed and resulting in wasted memory.

This patch introduces a two-pass algorithm to calculate the conditional
rule count before allocating the avtab nodes array. Albeit with a slight
performance penalty in reading a portion of the binary policy twice,
this causes the number of hash slots for the conditional array to become
4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
results in a net savings of 224 KB of heap memory.

Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
 security/selinux/ss/avtab.c       | 15 ++++++++++--
 security/selinux/ss/avtab.h       |  2 +-
 security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
 security/selinux/ss/conditional.h |  2 +-
 4 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 697eb4352439..a9227674899b 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -340,7 +340,7 @@ static const uint16_t spec_order[] = {
 int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		    int (*insertf)(struct avtab *a, const struct avtab_key *k,
 				   const struct avtab_datum *d, void *p),
-		    void *p)
+		    void *p, u32 *nrules)
 {
 	__le16 buf16[4];
 	u16 enabled;
@@ -414,6 +414,11 @@ int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 			if (val & spec_order[i]) {
 				key.specified = spec_order[i] | enabled;
 				datum.u.data = le32_to_cpu(buf32[items++]);
+				/* first pass of conditional table read */
+				if (nrules) {
+					(*nrules)++;
+					continue;
+				}
 				rc = insertf(a, &key, &datum, p);
 				if (rc)
 					return rc;
@@ -492,6 +497,12 @@ int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		pr_err("SELinux: avtab: invalid type\n");
 		return -EINVAL;
 	}
+
+	/* first pass of conditional table read */
+	if (nrules) {
+		(*nrules)++;
+		return 0;
+	}
 	return insertf(a, &key, &datum, p);
 }
 
@@ -525,7 +536,7 @@ int avtab_read(struct avtab *a, void *fp, struct policydb *pol)
 		goto bad;
 
 	for (i = 0; i < nel; i++) {
-		rc = avtab_read_item(a, fp, pol, avtab_insertf, NULL);
+		rc = avtab_read_item(a, fp, pol, avtab_insertf, NULL, NULL);
 		if (rc) {
 			if (rc == -ENOMEM)
 				pr_err("SELinux: avtab: out of memory\n");
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 3c3904bf02b0..86fb6f793eec 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -104,7 +104,7 @@ struct policydb;
 int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
 		    int (*insert)(struct avtab *a, const struct avtab_key *k,
 				  const struct avtab_datum *d, void *p),
-		    void *p);
+		    void *p, u32 *nrules);
 
 int avtab_read(struct avtab *a, void *fp, struct policydb *pol);
 int avtab_write_item(struct policydb *p, const struct avtab_node *cur, void *fp);
diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
index 81ff676f209a..810319bf0e60 100644
--- a/security/selinux/ss/conditional.c
+++ b/security/selinux/ss/conditional.c
@@ -321,9 +321,9 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
 	return 0;
 }
 
-static int cond_read_av_list(struct policydb *p, void *fp,
+static int cond_read_av_list(struct policydb *p, struct policy_file *fp,
 			     struct cond_av_list *list,
-			     struct cond_av_list *other)
+			     struct cond_av_list *other, u32 *nrules)
 {
 	int rc;
 	__le32 buf[1];
@@ -347,7 +347,7 @@ static int cond_read_av_list(struct policydb *p, void *fp,
 	for (i = 0; i < len; i++) {
 		data.dst = &list->nodes[i];
 		rc = avtab_read_item(&p->te_cond_avtab, fp, p, cond_insertf,
-				     &data);
+				     &data, nrules);
 		if (rc) {
 			kfree(list->nodes);
 			list->nodes = NULL;
@@ -373,7 +373,8 @@ static int expr_node_isvalid(struct policydb *p, struct cond_expr_node *expr)
 	return 1;
 }
 
-static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
+static int cond_read_node(struct policydb *p, struct cond_node *node,
+			  struct policy_file *fp, u32 *nrules)
 {
 	__le32 buf[2];
 	u32 i, len;
@@ -407,16 +408,17 @@ static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
 			return -EINVAL;
 	}
 
-	rc = cond_read_av_list(p, fp, &node->true_list, NULL);
+	rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
 	if (rc)
 		return rc;
-	return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
+	return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
 }
 
-int cond_read_list(struct policydb *p, void *fp)
+int cond_read_list(struct policydb *p, struct policy_file *fp)
 {
 	__le32 buf[1];
-	u32 i, len;
+	struct policy_file tmp_fp;
+	u32 i, len, nrules;
 	int rc;
 
 	rc = next_entry(buf, fp, sizeof(buf));
@@ -428,15 +430,25 @@ int cond_read_list(struct policydb *p, void *fp)
 	p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
 	if (!p->cond_list)
 		return -ENOMEM;
+	p->cond_list_len = len;
+
+	/* first pass to only calculate the avrule count */
+	tmp_fp = *fp;
+	nrules = 0;
+	for (i = 0; i < p->cond_list_len; i++) {
+		rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
+		if (rc)
+			goto err;
+		cond_node_destroy(&p->cond_list[i]);
+	}
 
-	rc = avtab_alloc(&(p->te_cond_avtab), p->te_avtab.nel);
+	rc = avtab_alloc(&(p->te_cond_avtab), nrules);
 	if (rc)
 		goto err;
 
-	p->cond_list_len = len;
-
-	for (i = 0; i < len; i++) {
-		rc = cond_read_node(p, &p->cond_list[i], fp);
+	/* second pass to read in the conditional nodes */
+	for (i = 0; i < p->cond_list_len; i++) {
+		rc = cond_read_node(p, &p->cond_list[i], fp, NULL);
 		if (rc)
 			goto err;
 	}
diff --git a/security/selinux/ss/conditional.h b/security/selinux/ss/conditional.h
index 5a7b51278dc6..62a12d00cac9 100644
--- a/security/selinux/ss/conditional.h
+++ b/security/selinux/ss/conditional.h
@@ -70,7 +70,7 @@ int cond_destroy_bool(void *key, void *datum, void *p);
 int cond_index_bool(void *key, void *datum, void *datap);
 
 int cond_read_bool(struct policydb *p, struct symtab *s, void *fp);
-int cond_read_list(struct policydb *p, void *fp);
+int cond_read_list(struct policydb *p, struct policy_file *fp);
 int cond_write_bool(void *key, void *datum, void *ptr);
 int cond_write_list(struct policydb *p, void *fp);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes
  2023-11-03 17:29 [PATCH v5 0/3] selinux: avtab arrays and refactors Jacob Satterfield
  2023-11-03 17:29 ` [PATCH v5 1/3] selinux: refactor avtab_node comparisons Jacob Satterfield
  2023-11-03 17:29 ` [PATCH v5 2/3] selinux: fix conditional avtab slot hint Jacob Satterfield
@ 2023-11-03 17:29 ` Jacob Satterfield
  2023-11-03 18:23   ` Stephen Smalley
  2 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-03 17:29 UTC (permalink / raw)
  To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace

The current avtab hashtable employs a separate chaining collision
resolution strategy where each bucket/chain holds an ordered linked list
of pointers to kmem_cache allocated avtab_node elements.

On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
A call to kmem_cache_zalloc() is required for every single rule, which
in the default policy is currently 96,730 and continually rising.

When both sets of avtab_node (regular and cond.) are turned into arrays
with the hash table chain heads pointing into it, this results in only
two additional kvcalloc() calls and the complete removal of the
kmem_cache itself and its memory and runtime overheads.

Running "perf stat -r 100 -d load_policy" has shown a runtime reduction
of around 10% on a Fedora 38 x86_64 VM with this single patch. Future
patches focused on improving the hash table's collision resolution
strategy and array layout (struct-of-arrays vs. array-of-structs) may
elicit even more caching and therefore runtime performance improvements.

To prevent the conditional table from under-allocating the avtab_node
array, which creates a heap-overflow bug, the two-pass algorithm in the
patch "selinux: fix conditional avtab slot hint" is required.

Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
 security/selinux/ss/avtab.c | 36 ++++++++++++++++++++----------------
 security/selinux/ss/avtab.h |  2 ++
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index a9227674899b..273490783f14 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -24,7 +24,6 @@
 #include "avtab.h"
 #include "policydb.h"
 
-static struct kmem_cache *avtab_node_cachep __ro_after_init;
 static struct kmem_cache *avtab_xperms_cachep __ro_after_init;
 
 /* Based on MurmurHash3, written by Austin Appleby and placed in the
@@ -72,17 +71,15 @@ avtab_insert_node(struct avtab *h, struct avtab_node **dst,
 {
 	struct avtab_node *newnode;
 	struct avtab_extended_perms *xperms;
-	newnode = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
-	if (newnode == NULL)
+	if (h->nel == h->nnodes)
 		return NULL;
+	newnode = &h->nodes[h->nel];
 	newnode->key = *key;
 
 	if (key->specified & AVTAB_XPERMS) {
 		xperms = kmem_cache_zalloc(avtab_xperms_cachep, GFP_KERNEL);
-		if (xperms == NULL) {
-			kmem_cache_free(avtab_node_cachep, newnode);
+		if (xperms == NULL)
 			return NULL;
-		}
 		*xperms = *(datum->u.xperms);
 		newnode->datum.u.xperms = xperms;
 	} else {
@@ -225,7 +222,7 @@ void avtab_destroy(struct avtab *h)
 	u32 i;
 	struct avtab_node *cur, *temp;
 
-	if (!h)
+	if (!h || !h->htable)
 		return;
 
 	for (i = 0; i < h->nslot; i++) {
@@ -236,11 +233,13 @@ void avtab_destroy(struct avtab *h)
 			if (temp->key.specified & AVTAB_XPERMS)
 				kmem_cache_free(avtab_xperms_cachep,
 						temp->datum.u.xperms);
-			kmem_cache_free(avtab_node_cachep, temp);
 		}
 	}
 	kvfree(h->htable);
+	kvfree(h->nodes);
 	h->htable = NULL;
+	h->nodes = NULL;
+	h->nnodes = 0;
 	h->nel = 0;
 	h->nslot = 0;
 	h->mask = 0;
@@ -249,20 +248,28 @@ void avtab_destroy(struct avtab *h)
 void avtab_init(struct avtab *h)
 {
 	h->htable = NULL;
+	h->nodes = NULL;
+	h->nnodes = 0;
 	h->nel = 0;
 	h->nslot = 0;
 	h->mask = 0;
 }
 
-static int avtab_alloc_common(struct avtab *h, u32 nslot)
+static int avtab_alloc_common(struct avtab *h, u32 nslot, u32 nrules)
 {
 	if (!nslot)
 		return 0;
 
-	h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
+	h->htable = kvcalloc(nslot, sizeof(*h->htable), GFP_KERNEL);
 	if (!h->htable)
 		return -ENOMEM;
-
+	h->nodes = kvcalloc(nrules, sizeof(*h->nodes), GFP_KERNEL);
+	if (!h->nodes) {
+		kvfree(h->htable);
+		h->htable = NULL;
+		return -ENOMEM;
+	}
+	h->nnodes = nrules;
 	h->nslot = nslot;
 	h->mask = nslot - 1;
 	return 0;
@@ -278,7 +285,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
 		if (nslot > MAX_AVTAB_HASH_BUCKETS)
 			nslot = MAX_AVTAB_HASH_BUCKETS;
 
-		rc = avtab_alloc_common(h, nslot);
+		rc = avtab_alloc_common(h, nslot, nrules);
 		if (rc)
 			return rc;
 	}
@@ -289,7 +296,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
 
 int avtab_alloc_dup(struct avtab *new, const struct avtab *orig)
 {
-	return avtab_alloc_common(new, orig->nslot);
+	return avtab_alloc_common(new, orig->nslot, orig->nel);
 }
 
 #ifdef CONFIG_SECURITY_SELINUX_DEBUG
@@ -617,9 +624,6 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
 
 void __init avtab_cache_init(void)
 {
-	avtab_node_cachep = kmem_cache_create("avtab_node",
-					      sizeof(struct avtab_node),
-					      0, SLAB_PANIC, NULL);
 	avtab_xperms_cachep = kmem_cache_create("avtab_extended_perms",
 						sizeof(struct avtab_extended_perms),
 						0, SLAB_PANIC, NULL);
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 86fb6f793eec..5e465be6f057 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -82,6 +82,8 @@ struct avtab_node {
 
 struct avtab {
 	struct avtab_node **htable;
+	struct avtab_node *nodes;
+	u32 nnodes;	/* number of nodes */
 	u32 nel;	/* number of elements */
 	u32 nslot;      /* number of hash slots */
 	u32 mask;       /* mask to compute hash func */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-03 17:29 ` [PATCH v5 2/3] selinux: fix conditional avtab slot hint Jacob Satterfield
@ 2023-11-03 18:22   ` Stephen Smalley
  2023-11-21  1:29   ` Paul Moore
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Smalley @ 2023-11-03 18:22 UTC (permalink / raw)
  To: Jacob Satterfield; +Cc: selinux, paul, omosnace

On Fri, Nov 3, 2023 at 1:30 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> Due to how conditional rules are written in the binary policy, the
> code responsible for loading does not know how many conditional rules
> there are before creating the avtab structure. Instead, it uses the
> number of elements in the non-conditional avtab as a hint and allocates
> the hash table based on it. In the refpolicy and default Fedora policy,
> the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> creating more slots than needed and resulting in wasted memory.
>
> This patch introduces a two-pass algorithm to calculate the conditional
> rule count before allocating the avtab nodes array. Albeit with a slight
> performance penalty in reading a portion of the binary policy twice,
> this causes the number of hash slots for the conditional array to become
> 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> results in a net savings of 224 KB of heap memory.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes
  2023-11-03 17:29 ` [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
@ 2023-11-03 18:23   ` Stephen Smalley
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Smalley @ 2023-11-03 18:23 UTC (permalink / raw)
  To: Jacob Satterfield; +Cc: selinux, paul, omosnace

On Fri, Nov 3, 2023 at 1:30 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> The current avtab hashtable employs a separate chaining collision
> resolution strategy where each bucket/chain holds an ordered linked list
> of pointers to kmem_cache allocated avtab_node elements.
>
> On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> A call to kmem_cache_zalloc() is required for every single rule, which
> in the default policy is currently 96,730 and continually rising.
>
> When both sets of avtab_node (regular and cond.) are turned into arrays
> with the hash table chain heads pointing into it, this results in only
> two additional kvcalloc() calls and the complete removal of the
> kmem_cache itself and its memory and runtime overheads.
>
> Running "perf stat -r 100 -d load_policy" has shown a runtime reduction
> of around 10% on a Fedora 38 x86_64 VM with this single patch. Future
> patches focused on improving the hash table's collision resolution
> strategy and array layout (struct-of-arrays vs. array-of-structs) may
> elicit even more caching and therefore runtime performance improvements.
>
> To prevent the conditional table from under-allocating the avtab_node
> array, which creates a heap-overflow bug, the two-pass algorithm in the
> patch "selinux: fix conditional avtab slot hint" is required.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 1/3] selinux: refactor avtab_node comparisons
  2023-11-03 17:29 ` [PATCH v5 1/3] selinux: refactor avtab_node comparisons Jacob Satterfield
@ 2023-11-21  1:29   ` Paul Moore
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Moore @ 2023-11-21  1:29 UTC (permalink / raw)
  To: Jacob Satterfield, selinux
  Cc: Jacob Satterfield, stephen.smalley.work, omosnace

On Nov  3, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
> 
> In four separate functions within avtab, the same comparison logic is
> used. The only difference is how the result is handled or whether there
> is a unique specifier value to be checked for or used.
> 
> Extracting this functionality into the avtab_node_cmp() function unifies
> the comparison logic between searching and insertion and gets rid of
> duplicative code so that the implementation is easier to maintain.
> 
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
>  security/selinux/ss/avtab.c | 101 +++++++++++++++---------------------
>  1 file changed, 41 insertions(+), 60 deletions(-)

Looks good to me, merged into selinux/dev - thanks!

--
paul-moore.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-03 17:29 ` [PATCH v5 2/3] selinux: fix conditional avtab slot hint Jacob Satterfield
  2023-11-03 18:22   ` Stephen Smalley
@ 2023-11-21  1:29   ` Paul Moore
  2023-11-22 17:35     ` Jacob Satterfield
  1 sibling, 1 reply; 12+ messages in thread
From: Paul Moore @ 2023-11-21  1:29 UTC (permalink / raw)
  To: Jacob Satterfield, selinux
  Cc: Jacob Satterfield, stephen.smalley.work, omosnace

On Nov  3, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
> 
> Due to how conditional rules are written in the binary policy, the
> code responsible for loading does not know how many conditional rules
> there are before creating the avtab structure. Instead, it uses the
> number of elements in the non-conditional avtab as a hint and allocates
> the hash table based on it. In the refpolicy and default Fedora policy,
> the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> creating more slots than needed and resulting in wasted memory.
> 
> This patch introduces a two-pass algorithm to calculate the conditional
> rule count before allocating the avtab nodes array. Albeit with a slight
> performance penalty in reading a portion of the binary policy twice,
> this causes the number of hash slots for the conditional array to become
> 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> results in a net savings of 224 KB of heap memory.
> 
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
>  security/selinux/ss/avtab.c       | 15 ++++++++++--
>  security/selinux/ss/avtab.h       |  2 +-
>  security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
>  security/selinux/ss/conditional.h |  2 +-
>  4 files changed, 40 insertions(+), 17 deletions(-)

...

> diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> index 81ff676f209a..810319bf0e60 100644
> --- a/security/selinux/ss/conditional.c
> +++ b/security/selinux/ss/conditional.c
> @@ -407,16 +408,17 @@ static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
>  			return -EINVAL;
>  	}
>  
> -	rc = cond_read_av_list(p, fp, &node->true_list, NULL);
> +	rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
>  	if (rc)
>  		return rc;
> -	return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
> +	return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
>  }
>  
> -int cond_read_list(struct policydb *p, void *fp)
> +int cond_read_list(struct policydb *p, struct policy_file *fp)
>  {
>  	__le32 buf[1];
> -	u32 i, len;
> +	struct policy_file tmp_fp;
> +	u32 i, len, nrules;
>  	int rc;
>  
>  	rc = next_entry(buf, fp, sizeof(buf));
> @@ -428,15 +430,25 @@ int cond_read_list(struct policydb *p, void *fp)
>  	p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
>  	if (!p->cond_list)
>  		return -ENOMEM;
> +	p->cond_list_len = len;
> +
> +	/* first pass to only calculate the avrule count */
> +	tmp_fp = *fp;
> +	nrules = 0;
> +	for (i = 0; i < p->cond_list_len; i++) {
> +		rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
> +		if (rc)
> +			goto err;
> +		cond_node_destroy(&p->cond_list[i]);
> +	}

I'm a concerned about all the work we have to do just to count the
conditional rules.  Other than not working with existing binary
policies, have you looked at bumping the policy version and introducing
a binary format change that would include the number of conditional
rules?

--
paul-moore.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-21  1:29   ` Paul Moore
@ 2023-11-22 17:35     ` Jacob Satterfield
  2023-11-22 18:14       ` Paul Moore
  0 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-22 17:35 UTC (permalink / raw)
  To: Paul Moore; +Cc: selinux, stephen.smalley.work, omosnace

On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Nov  3, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
> >
> > Due to how conditional rules are written in the binary policy, the
> > code responsible for loading does not know how many conditional rules
> > there are before creating the avtab structure. Instead, it uses the
> > number of elements in the non-conditional avtab as a hint and allocates
> > the hash table based on it. In the refpolicy and default Fedora policy,
> > the actual sizes of these tables are not similar (~85k vs ~10k) thereby
> > creating more slots than needed and resulting in wasted memory.
> >
> > This patch introduces a two-pass algorithm to calculate the conditional
> > rule count before allocating the avtab nodes array. Albeit with a slight
> > performance penalty in reading a portion of the binary policy twice,
> > this causes the number of hash slots for the conditional array to become
> > 4096 instead of 32768. At 8-bytes per slot on 64-bit architectures, this
> > results in a net savings of 224 KB of heap memory.
> >
> > Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > ---
> >  security/selinux/ss/avtab.c       | 15 ++++++++++--
> >  security/selinux/ss/avtab.h       |  2 +-
> >  security/selinux/ss/conditional.c | 38 ++++++++++++++++++++-----------
> >  security/selinux/ss/conditional.h |  2 +-
> >  4 files changed, 40 insertions(+), 17 deletions(-)
>
> ...
>
> > diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> > index 81ff676f209a..810319bf0e60 100644
> > --- a/security/selinux/ss/conditional.c
> > +++ b/security/selinux/ss/conditional.c
> > @@ -407,16 +408,17 @@ static int cond_read_node(struct policydb *p, struct cond_node *node, void *fp)
> >                       return -EINVAL;
> >       }
> >
> > -     rc = cond_read_av_list(p, fp, &node->true_list, NULL);
> > +     rc = cond_read_av_list(p, fp, &node->true_list, NULL, nrules);
> >       if (rc)
> >               return rc;
> > -     return cond_read_av_list(p, fp, &node->false_list, &node->true_list);
> > +     return cond_read_av_list(p, fp, &node->false_list, &node->true_list, nrules);
> >  }
> >
> > -int cond_read_list(struct policydb *p, void *fp)
> > +int cond_read_list(struct policydb *p, struct policy_file *fp)
> >  {
> >       __le32 buf[1];
> > -     u32 i, len;
> > +     struct policy_file tmp_fp;
> > +     u32 i, len, nrules;
> >       int rc;
> >
> >       rc = next_entry(buf, fp, sizeof(buf));
> > @@ -428,15 +430,25 @@ int cond_read_list(struct policydb *p, void *fp)
> >       p->cond_list = kcalloc(len, sizeof(*p->cond_list), GFP_KERNEL);
> >       if (!p->cond_list)
> >               return -ENOMEM;
> > +     p->cond_list_len = len;
> > +
> > +     /* first pass to only calculate the avrule count */
> > +     tmp_fp = *fp;
> > +     nrules = 0;
> > +     for (i = 0; i < p->cond_list_len; i++) {
> > +             rc = cond_read_node(p, &p->cond_list[i], &tmp_fp, &nrules);
> > +             if (rc)
> > +                     goto err;
> > +             cond_node_destroy(&p->cond_list[i]);
> > +     }
>
> I'm a concerned about all the work we have to do just to count the
> conditional rules.  Other than not working with existing binary
> policies, have you looked at bumping the policy version and introducing
> a binary format change that would include the number of conditional
> rules?
>
> --
> paul-moore.com

Thanks for raising the issue. I had considered adding the total size
of the conditional table to the binary policy, but I wasn't sure if it
would be substantive enough to warrant bumping the policy version. As
you point out, the counting work will be needed for existing binary
policies making this patch necessary for the default case, but if you
are concerned about the performance penalty this patch brings (which
is less than the gain provided by the avtab array patch), then there
are two threads to possibly be worked on.

One is to rework this patch to include more invasive changes to count
rules without actually reading and destroying nodes thus saving cycles
but requiring more lines of code. Because policy parsing is not
handled separately from the construction of the policydb structure
(they are deeply intertwined), I was reluctant to add more complexity
just to have a parse-only code path. Would you prefer speed or simpler
logic for older policies?

The other is to obviously sum and add the total size of the
conditional rules table to the binary policy in cond_write_list() of
libsepol/src/write.c. If you think just adding the total conditional
rules table size is worth bumping the kernel policy version, then I
can prioritize sending a patch to libsepol for inclusion as soon as I
can.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-22 17:35     ` Jacob Satterfield
@ 2023-11-22 18:14       ` Paul Moore
  2023-11-29 17:39         ` Jacob Satterfield
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Moore @ 2023-11-22 18:14 UTC (permalink / raw)
  To: Jacob Satterfield; +Cc: selinux, stephen.smalley.work, omosnace

On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
> On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > I'm a concerned about all the work we have to do just to count the
> > conditional rules.  Other than not working with existing binary
> > policies, have you looked at bumping the policy version and introducing
> > a binary format change that would include the number of conditional
> > rules?
>
> Thanks for raising the issue. I had considered adding the total size
> of the conditional table to the binary policy, but I wasn't sure if it
> would be substantive enough to warrant bumping the policy version.

I appreciate the thoughtfulness, but the version field in the policy
is 32-bits and we are currently up to policy format v33 so we've still
got a few version numbers to play with :)

> As
> you point out, the counting work will be needed for existing binary
> policies making this patch necessary for the default case, but if you
> are concerned about the performance penalty this patch brings (which
> is less than the gain provided by the avtab array patch), then there
> are two threads to possibly be worked on.

To be clear, I'm not thinking about supporting this for existing
policies, just the new policy format; the existing policy versions
would behave as they do now ... although if we do the array conversion
we will likely need to do some type of realloc()/retry or something, I
dunno, I'll leave that to you to brainstorm ;)

> One is to rework this patch to include more invasive changes to count
> rules without actually reading and destroying nodes thus saving cycles
> but requiring more lines of code. Because policy parsing is not
> handled separately from the construction of the policydb structure
> (they are deeply intertwined), I was reluctant to add more complexity
> just to have a parse-only code path. Would you prefer speed or simpler
> logic for older policies?

That's the problem we have right now.  We have to do a lot of work
(allocations, etc.) that we throw away in the case where we are
counting, not to mention that bolting on the count-only functionality
is kinda hacky/ugly (not your fault, that is just the way the code is
right now).  As you mention, the alternative is to significantly
rework how we parse/load the policy, and that isn't a very exciting
prospect as far as I'm concerned.

I'm not sure if moving over to flex array is a win, I suspect that
whatever we gain in memory savings we lose in not having direct
access.  I dunno, maybe it wouldn't be too bad.

I'm open to ideas here ... I'm looking for something that would
support the improvements for a new policy with an explicit count,
while still falling back to something that works "reasonably well" in
the current case where we have to guess.  In this case "reasonably
well" means no worse than current in terms of performance and memory
use while not over complicating the code.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-22 18:14       ` Paul Moore
@ 2023-11-29 17:39         ` Jacob Satterfield
  2023-12-07  0:11           ` Paul Moore
  0 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-11-29 17:39 UTC (permalink / raw)
  To: Paul Moore; +Cc: selinux, stephen.smalley.work, omosnace

On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
> <jsatterfield.linux@gmail.com> wrote:
> > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > I'm a concerned about all the work we have to do just to count the
> > > conditional rules.  Other than not working with existing binary
> > > policies, have you looked at bumping the policy version and introducing
> > > a binary format change that would include the number of conditional
> > > rules?
> >
> > Thanks for raising the issue. I had considered adding the total size
> > of the conditional table to the binary policy, but I wasn't sure if it
> > would be substantive enough to warrant bumping the policy version.
>
> I appreciate the thoughtfulness, but the version field in the policy
> is 32-bits and we are currently up to policy format v33 so we've still
> got a few version numbers to play with :)
>

Sounds good, I will implement the policy version bump.

> > As
> > you point out, the counting work will be needed for existing binary
> > policies making this patch necessary for the default case, but if you
> > are concerned about the performance penalty this patch brings (which
> > is less than the gain provided by the avtab array patch), then there
> > are two threads to possibly be worked on.
>
> To be clear, I'm not thinking about supporting this for existing
> policies, just the new policy format; the existing policy versions
> would behave as they do now ... although if we do the array conversion
> we will likely need to do some type of realloc()/retry or something, I
> dunno, I'll leave that to you to brainstorm ;)
>

I still believe the performance gained by converting to arrays
outweighs the cost of supporting older versions. See the discussion
below.

> > One is to rework this patch to include more invasive changes to count
> > rules without actually reading and destroying nodes thus saving cycles
> > but requiring more lines of code. Because policy parsing is not
> > handled separately from the construction of the policydb structure
> > (they are deeply intertwined), I was reluctant to add more complexity
> > just to have a parse-only code path. Would you prefer speed or simpler
> > logic for older policies?
>
> That's the problem we have right now.  We have to do a lot of work
> (allocations, etc.) that we throw away in the case where we are
> counting, not to mention that bolting on the count-only functionality
> is kinda hacky/ugly (not your fault, that is just the way the code is
> right now).  As you mention, the alternative is to significantly
> rework how we parse/load the policy, and that isn't a very exciting
> prospect as far as I'm concerned.
>
> I'm not sure if moving over to flex array is a win, I suspect that
> whatever we gain in memory savings we lose in not having direct
> access.  I dunno, maybe it wouldn't be too bad.
>

Unless I misunderstand, I believe I addressed your concerns from v1
over using indices vs direct access to the arrays. From v2 --> v5, the
array conversion patch only changes how avtab_node elements are
allocated, access is the same as it is now.

> I'm open to ideas here ... I'm looking for something that would
> support the improvements for a new policy with an explicit count,
> while still falling back to something that works "reasonably well" in
> the current case where we have to guess.  In this case "reasonably
> well" means no worse than current in terms of performance and memory
> use while not over complicating the code.
>

For existing policies, I have given this a bit of thought and did some
performance testing. There are memory vs. runtime tradeoffs to
consider between counting the rules (like v5 does) and doing a realloc
(like v1 does). I'm of the opinion that the counting approach is
better overall, but I'll (r)enumerate the tradeoffs here for the
benefit of discussion.

For the realloc approach, a decent chunk of memory (currently 256 KB)
is wasted over-allocating the hash table slots because of the estimate
derived from the regular rules table. Extra memory for avtab_nodes are
allocated (either from overestimation or the dynamic growth factor)
during parsing, sometimes up to 1 MB, requiring a final realloc to
remove them. The growth factor chosen could have additional runtime
implications with regard to multiple reallocs. Regardless, this
approach has runtime costs and memory lost to overestimating the
hashtable slots needed (which in this case, can't be recovered except
through a rehashing of the whole table). 86 lines of code are added,
but it is fairly clean code.

The counting approach incurs a runtime penalty processing the
conditional rules twice. Memory is wasted constructing the struct
cond_node lists and corresponding struct cond_av_list elements twice,
but nowhere close to 1 MB. 23 lines of code are added, but it is as
you said hacky/ugly. The runtime penalty could be further optimized at
the cost of parsing code complexity and additional lines of code as
discussed earlier.

Some performance numbers from Fedora 38 running "perf stat -r 1000 -d
load_policy" (realloc is the v5 array conversion + v1 realloc
patches):

dev:       0.261174 seconds time elapsed ( +- 0.31% ) baseline
v5:         0.233972 seconds time elapsed ( +- 0.31% ) -10%
realloc:  0.226636 seconds time elapsed ( +- 0.34% ) -13.3%

Therefore, to your criteria stated above, both approaches are
empirically no worse than current in terms of performance or memory
used after load. But I believe the current counting approach is the
most reasonable given the lines of code added and the least amount of
memory used during parsing. For what it's worth, I have been
interested in designing a set of patches that would ultimately
refactor policy parsing from policy loading and would, as a side
effect, enable a simpler and more optimized counting approach for
older policies. If this is something you would entertain (even if it's
not exciting), then I will consider it for a future patch.

If you agree, I will introduce the policy version bump in the next
spin and also submit a patch to sepol. Otherwise, I can go back to the
drawing board.

Thanks for reviewing as always!

> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint
  2023-11-29 17:39         ` Jacob Satterfield
@ 2023-12-07  0:11           ` Paul Moore
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Moore @ 2023-12-07  0:11 UTC (permalink / raw)
  To: Jacob Satterfield; +Cc: selinux, stephen.smalley.work, omosnace

On Wed, Nov 29, 2023 at 12:39 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
> > <jsatterfield.linux@gmail.com> wrote:
> > > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@paul-moore.com> wrote:
> > > >
> > > > I'm a concerned about all the work we have to do just to count the
> > > > conditional rules.  Other than not working with existing binary
> > > > policies, have you looked at bumping the policy version and introducing
> > > > a binary format change that would include the number of conditional
> > > > rules?
> > >
> > > Thanks for raising the issue. I had considered adding the total size
> > > of the conditional table to the binary policy, but I wasn't sure if it
> > > would be substantive enough to warrant bumping the policy version.
> >
> > I appreciate the thoughtfulness, but the version field in the policy
> > is 32-bits and we are currently up to policy format v33 so we've still
> > got a few version numbers to play with :)
>
> Sounds good, I will implement the policy version bump.
>
> > > As
> > > you point out, the counting work will be needed for existing binary
> > > policies making this patch necessary for the default case, but if you
> > > are concerned about the performance penalty this patch brings (which
> > > is less than the gain provided by the avtab array patch), then there
> > > are two threads to possibly be worked on.
> >
> > To be clear, I'm not thinking about supporting this for existing
> > policies, just the new policy format; the existing policy versions
> > would behave as they do now ... although if we do the array conversion
> > we will likely need to do some type of realloc()/retry or something, I
> > dunno, I'll leave that to you to brainstorm ;)
>
> I still believe the performance gained by converting to arrays
> outweighs the cost of supporting older versions. See the discussion
> below.

Annoying as it may be at times, we have an obligation to continue to
support older policy versions in the kernel.  We can't break
existing/old userspace applications or SELinux policies, period.

This doesn't mean that we have to ensure that every performance
improvement applies to the existing/old userspace; if we have an
improvement that only applies on newer policy versions, that's okay,
we just need to make sure we don't significantly penalize the existing
policies in use today.

> > > One is to rework this patch to include more invasive changes to count
> > > rules without actually reading and destroying nodes thus saving cycles
> > > but requiring more lines of code. Because policy parsing is not
> > > handled separately from the construction of the policydb structure
> > > (they are deeply intertwined), I was reluctant to add more complexity
> > > just to have a parse-only code path. Would you prefer speed or simpler
> > > logic for older policies?
> >
> > That's the problem we have right now.  We have to do a lot of work
> > (allocations, etc.) that we throw away in the case where we are
> > counting, not to mention that bolting on the count-only functionality
> > is kinda hacky/ugly (not your fault, that is just the way the code is
> > right now).  As you mention, the alternative is to significantly
> > rework how we parse/load the policy, and that isn't a very exciting
> > prospect as far as I'm concerned.
> >
> > I'm not sure if moving over to flex array is a win, I suspect that
> > whatever we gain in memory savings we lose in not having direct
> > access.  I dunno, maybe it wouldn't be too bad.
>
> Unless I misunderstand, I believe I addressed your concerns from v1
> over using indices vs direct access to the arrays. From v2 --> v5, the
> array conversion patch only changes how avtab_node elements are
> allocated, access is the same as it is now.

I probably wasn't as clear as I should have been, I'm sorry about
that.  My comment about flex arrays was in reference to a dynamic
array mechanism in the kernel known as "flex arrays", my thinking was
that we could use flex arrays to manage the size of the conditional
rules arrays with just a single pass.  While flex arrays were dynamic
and resizable, there was an obvious limitation in that direct access,
similar to conventional fixed arrays, wasn't always possible.
However, while looking up a pointer to the flex array documentation to
share in this email I now realize that flex arrays were removed from
the kernel some time ago and are no longer available ... oof :/

Looking around quickly, I see we now have xarrays, which are also
resizable, but I have zero experience with them and they may be a bit
too heavyweight for what we need.

There is also kvrealloc(), but that doesn't seem to be especially
clever, using the traditional alloc-n-copy based approach.  Who knows,
maybe that is okay since we would only be doing this on policy loads,
which should hopefully be relatively infrequent.

> > I'm open to ideas here ... I'm looking for something that would
> > support the improvements for a new policy with an explicit count,
> > while still falling back to something that works "reasonably well" in
> > the current case where we have to guess.  In this case "reasonably
> > well" means no worse than current in terms of performance and memory
> > use while not over complicating the code.
>
> For existing policies, I have given this a bit of thought and did some
> performance testing. There are memory vs. runtime tradeoffs to
> consider between counting the rules (like v5 does) and doing a realloc
> (like v1 does). I'm of the opinion that the counting approach is
> better overall, but I'll (r)enumerate the tradeoffs here for the
> benefit of discussion.

Thanks, I have to jump around enough that a refresher is always welcome :)

> For the realloc approach, a decent chunk of memory (currently 256 KB)
> is wasted over-allocating the hash table slots because of the estimate
> derived from the regular rules table. Extra memory for avtab_nodes are
> allocated (either from overestimation or the dynamic growth factor)
> during parsing, sometimes up to 1 MB, requiring a final realloc to
> remove them. The growth factor chosen could have additional runtime
> implications with regard to multiple reallocs. Regardless, this
> approach has runtime costs and memory lost to overestimating the
> hashtable slots needed (which in this case, can't be recovered except
> through a rehashing of the whole table). 86 lines of code are added,
> but it is fairly clean code.

I went back and looked at the v1 patches very quickly, I didn't want
to spend too much time there as there were a number of changes,
unrelated to the array/alloc code, that meant this early patchset
wasn't very applicable to our discussion.  However, one thing did jump
out as I was looking for the realloc code: you might want to consider
rewriting avtab_change_nodes_size() to use kvrealloc().  The change
should be trivial and it is SELinux specific code; the only drawback
is that the "shrink" operation is effectively a no-op.

I'm okay if we need to "waste" a reasonably small amount of memory due
to dynamic reallocs; we just need to be smart about balancing the
incremental size increases with the desire to moderate the number of
times we call kvrealloc().

> The counting approach incurs a runtime penalty processing the
> conditional rules twice. Memory is wasted constructing the struct
> cond_node lists and corresponding struct cond_av_list elements twice,
> but nowhere close to 1 MB. 23 lines of code are added, but it is as
> you said hacky/ugly. The runtime penalty could be further optimized at
> the cost of parsing code complexity and additional lines of code as
> discussed earlier.

I can't bring myself to merge something that takes the two-pass
approach to counting, I have this gut feeling that it is something we
are going to regret in the future.  I'm sorry.  It isn't the number of
lines of code that concern me, it's how the code is put together and
architected.  A kvrealloc() based approach is simply going to be
cleaner than a two-pass, alloc-free-alloc approach.  The kvrealloc()
solution may not be optimal in terms of memory allocated(), and policy
loads might be a bit slower due to mm issues, but it is *much* easier
to look at the code and convince yourself (and others, including
future you) that it is correct ... which is worth a *lot* in my mind.

> Some performance numbers from Fedora 38 running "perf stat -r 1000 -d
> load_policy" (realloc is the v5 array conversion + v1 realloc
> patches):
>
> dev:       0.261174 seconds time elapsed ( +- 0.31% ) baseline
> v5:         0.233972 seconds time elapsed ( +- 0.31% ) -10%
> realloc:  0.226636 seconds time elapsed ( +- 0.34% ) -13.3%
>
> Therefore, to your criteria stated above, both approaches are
> empirically no worse than current in terms of performance or memory
> used after load.

Sweet, so even the realloc approach is faster on policy load?  Awesome :)

> But I believe the current counting approach is the
> most reasonable given the lines of code added and the least amount of
> memory used during parsing.

I've tried, perhaps poorly, to explain why the two-pass approach isn't
appealing from my perspective.

> For what it's worth, I have been
> interested in designing a set of patches that would ultimately
> refactor policy parsing from policy loading and would, as a side
> effect, enable a simpler and more optimized counting approach for
> older policies. If this is something you would entertain (even if it's
> not exciting), then I will consider it for a future patch.

Possibly.  Let's get past this first and then you can explain a bit
more about what you had in mind ;)

> If you agree, I will introduce the policy version bump in the next
> spin and also submit a patch to sepol. Otherwise, I can go back to the
> drawing board.

I think I would like to see two things, either in one patchset or two,
that's up to you:

* A kvrealloc/xa based approach that works with existing policy
versions and offers improvements over what we have now in terms or
either performance or memory usage while not regressing in both.

* A new binary policy version that adds an explicit count that can be
used to properly allocate the necessary internal policy structs up
front and avoid reallocation.

Does that make sense and sound reasonable to you?

> Thanks for reviewing as always!

Thanks for your continued patience and enthusiasm :)

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-12-07  0:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-03 17:29 [PATCH v5 0/3] selinux: avtab arrays and refactors Jacob Satterfield
2023-11-03 17:29 ` [PATCH v5 1/3] selinux: refactor avtab_node comparisons Jacob Satterfield
2023-11-21  1:29   ` Paul Moore
2023-11-03 17:29 ` [PATCH v5 2/3] selinux: fix conditional avtab slot hint Jacob Satterfield
2023-11-03 18:22   ` Stephen Smalley
2023-11-21  1:29   ` Paul Moore
2023-11-22 17:35     ` Jacob Satterfield
2023-11-22 18:14       ` Paul Moore
2023-11-29 17:39         ` Jacob Satterfield
2023-12-07  0:11           ` Paul Moore
2023-11-03 17:29 ` [PATCH v5 3/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
2023-11-03 18:23   ` Stephen Smalley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.