* [PATCH 0/3] avtab hotspot optimizations
@ 2023-09-06 15:46 Jacob Satterfield
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-06 15:46 UTC (permalink / raw)
To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace
As the refpolicy and the default Fedora policy continue to grow in
size, especially with regard to rules / access vectors, the memory
usage of the policydb and runtime to search through it increases.
Looking at /proc/slabinfo indicates that the avtab_node_cachep
kmem_cache is significantly responsible for overall memory usage and
was a good target for optimizations. Running "perf record" on the
"load_policy" command shows that a majority of time is spent adding
rules into the avtab.
This patch series is a first attempt at optimizing these hot spots
within the security server implementation to help it scale with
additional rules in the future.
Patches 1-2 deal specifically with the hashtable implementation within
avtab and how memory is allocated for individual nodes.
Patch 3 is a runtime optimization discovered through profiling the
"load_policy".
Jacob Satterfield (3):
selinux: use arrays for avtab hashtable nodes
selinux: shrink conditional avtab node array
selinux: hweight optimization in avtab_read_item
security/selinux/ss/avtab.c | 143 ++++++++++++++++++------------
security/selinux/ss/avtab.h | 36 ++++++--
security/selinux/ss/conditional.c | 57 +++++++-----
security/selinux/ss/conditional.h | 2 +-
security/selinux/ss/services.c | 20 +++--
5 files changed, 166 insertions(+), 92 deletions(-)
--
2.41.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-06 15:46 [PATCH 0/3] avtab hotspot optimizations Jacob Satterfield
@ 2023-09-06 15:46 ` Jacob Satterfield
2023-09-06 17:16 ` Stephen Smalley
2023-09-13 3:23 ` Paul Moore
2023-09-06 15:46 ` [PATCH 2/3] selinux: shrink conditional avtab node array Jacob Satterfield
2023-09-06 15:46 ` [PATCH 3/3] selinux: hweight optimization in avtab_read_item Jacob Satterfield
2 siblings, 2 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-06 15:46 UTC (permalink / raw)
To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace
The current avtab hashtable employs a separate chaining collision
resolution strategy where each bucket/chain holds an ordered linked list
of pointers to kmem_cache allocated avtab_node elements.
On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
A call to kmem_cache_zalloc() is required for every single rule, which
in the default policy is currently 96,730 and continually rising.
When both sets of avtab_node (regular and cond.) are turned into arrays
with the hash table slot indexing into it rather than a pointer, then
this results in only 2 additional kvcalloc() calls and the complete
removal of the kmem_cache itself.
The caching characteristics of iterating a single array are better due
to locality of reference. Running "perf stat -r 100 -d load_policy"
has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
with this single patch. Future patches focused on improving the hash
table's collision resolution strategy and array layout (struct-of-arrays
vs. array-of-structs) may elicit even more caching and therefore runtime
performance improvements.
On 64-bit machines, the memory usage of the hash table slots is cut in
half due to the use of u32 indices instead of memory pointers. Since
~65k hash slots are used between the regular and cond. tables with the
default Fedora 38 policy, this equates to around 256KB of memory saved.
Notes:
A couple helper functions avtab_get_chain() and avtab_get_node() are
introduced to provide more standardized interfaces for use in functions
that need to search through the hash table.
NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
in the hash table or the "next" member of an avtab_node are valid.
This patch causes the cond. rules table to waste memory as the size
requested for the kvcalloc() is the size of the regular rules table.
These tables rarely, if ever, have the same number of rules in practice.
The next patch addresses this shortcoming.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
security/selinux/ss/avtab.h | 28 ++++++++++--
security/selinux/ss/conditional.c | 13 +++---
security/selinux/ss/services.c | 20 +++++----
4 files changed, 81 insertions(+), 55 deletions(-)
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 86d98a8e291b..620ea0a03e41 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -23,7 +23,6 @@
#include "avtab.h"
#include "policydb.h"
-static struct kmem_cache *avtab_node_cachep __ro_after_init;
static struct kmem_cache *avtab_xperms_cachep __ro_after_init;
/* Based on MurmurHash3, written by Austin Appleby and placed in the
@@ -70,17 +69,17 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
struct avtab_node *prev,
const struct avtab_key *key, const struct avtab_datum *datum)
{
+ u32 newnodei;
struct avtab_node *newnode;
struct avtab_extended_perms *xperms;
- newnode = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
- if (newnode == NULL)
- return NULL;
+ newnodei = ++h->nel;
+ newnode = avtab_get_node(h, newnodei);
newnode->key = *key;
if (key->specified & AVTAB_XPERMS) {
xperms = kmem_cache_zalloc(avtab_xperms_cachep, GFP_KERNEL);
if (xperms == NULL) {
- kmem_cache_free(avtab_node_cachep, newnode);
+ --h->nel;
return NULL;
}
*xperms = *(datum->u.xperms);
@@ -91,15 +90,12 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
if (prev) {
newnode->next = prev->next;
- prev->next = newnode;
+ prev->next = newnodei;
} else {
- struct avtab_node **n = &h->htable[hvalue];
-
- newnode->next = *n;
- *n = newnode;
+ newnode->next = h->htable[hvalue];
+ h->htable[hvalue] = newnodei;
}
- h->nel++;
return newnode;
}
@@ -114,9 +110,9 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
return -EINVAL;
hvalue = avtab_hash(key, h->mask);
- for (prev = NULL, cur = h->htable[hvalue];
+ for (prev = NULL, cur = avtab_get_chain(h, hvalue);
cur;
- prev = cur, cur = cur->next) {
+ prev = cur, cur = avtab_get_node(h, cur->next)) {
if (key->source_type == cur->key.source_type &&
key->target_type == cur->key.target_type &&
key->target_class == cur->key.target_class &&
@@ -159,9 +155,9 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
if (!h || !h->nslot || h->nel == U32_MAX)
return NULL;
hvalue = avtab_hash(key, h->mask);
- for (prev = NULL, cur = h->htable[hvalue];
+ for (prev = NULL, cur = avtab_get_chain(h, hvalue);
cur;
- prev = cur, cur = cur->next) {
+ prev = cur, cur = avtab_get_node(h, cur->next)) {
if (key->source_type == cur->key.source_type &&
key->target_type == cur->key.target_type &&
key->target_class == cur->key.target_class &&
@@ -194,8 +190,8 @@ struct avtab_node *avtab_search_node(struct avtab *h,
return NULL;
hvalue = avtab_hash(key, h->mask);
- for (cur = h->htable[hvalue]; cur;
- cur = cur->next) {
+ for (cur = avtab_get_chain(h, hvalue); cur;
+ cur = avtab_get_node(h, cur->next)) {
if (key->source_type == cur->key.source_type &&
key->target_type == cur->key.target_type &&
key->target_class == cur->key.target_class &&
@@ -216,7 +212,7 @@ struct avtab_node *avtab_search_node(struct avtab *h,
}
struct avtab_node*
-avtab_search_node_next(struct avtab_node *node, u16 specified)
+avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified)
{
struct avtab_node *cur;
@@ -224,7 +220,7 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
return NULL;
specified &= ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
- for (cur = node->next; cur; cur = cur->next) {
+ for (cur = avtab_get_node(h, node->next); cur; cur = avtab_get_node(h, cur->next)) {
if (node->key.source_type == cur->key.source_type &&
node->key.target_type == cur->key.target_type &&
node->key.target_class == cur->key.target_class &&
@@ -247,24 +243,24 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
void avtab_destroy(struct avtab *h)
{
u32 i;
- struct avtab_node *cur, *temp;
+ struct avtab_node *cur;
if (!h)
return;
for (i = 0; i < h->nslot; i++) {
- cur = h->htable[i];
+ cur = avtab_get_chain(h, i);
while (cur) {
- temp = cur;
- cur = cur->next;
- if (temp->key.specified & AVTAB_XPERMS)
+ if (cur->key.specified & AVTAB_XPERMS)
kmem_cache_free(avtab_xperms_cachep,
- temp->datum.u.xperms);
- kmem_cache_free(avtab_node_cachep, temp);
+ cur->datum.u.xperms);
+ cur = avtab_get_node(h, cur->next);
}
}
kvfree(h->htable);
+ kvfree(h->nodes);
h->htable = NULL;
+ h->nodes = NULL;
h->nel = 0;
h->nslot = 0;
h->mask = 0;
@@ -273,20 +269,26 @@ void avtab_destroy(struct avtab *h)
void avtab_init(struct avtab *h)
{
h->htable = NULL;
+ h->nodes = NULL;
h->nel = 0;
h->nslot = 0;
h->mask = 0;
}
-static int avtab_alloc_common(struct avtab *h, u32 nslot)
+static int avtab_alloc_common(struct avtab *h, u32 nslot, u32 nrules)
{
if (!nslot)
return 0;
- h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
+ h->htable = kvcalloc(nslot, sizeof(u32), GFP_KERNEL);
if (!h->htable)
return -ENOMEM;
-
+ h->nodes = kvcalloc(nrules, sizeof(struct avtab_node), GFP_KERNEL);
+ if (!h->nodes) {
+ kvfree(h->htable);
+ h->htable = NULL;
+ return -ENOMEM;
+ }
h->nslot = nslot;
h->mask = nslot - 1;
return 0;
@@ -308,7 +310,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
if (nslot > MAX_AVTAB_HASH_BUCKETS)
nslot = MAX_AVTAB_HASH_BUCKETS;
- rc = avtab_alloc_common(h, nslot);
+ rc = avtab_alloc_common(h, nslot, nrules);
if (rc)
return rc;
}
@@ -319,7 +321,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
int avtab_alloc_dup(struct avtab *new, const struct avtab *orig)
{
- return avtab_alloc_common(new, orig->nslot);
+ return avtab_alloc_common(new, orig->nslot, orig->nel);
}
#ifdef CONFIG_SECURITY_SELINUX_DEBUG
@@ -333,13 +335,13 @@ void avtab_hash_eval(struct avtab *h, const char *tag)
max_chain_len = 0;
chain2_len_sum = 0;
for (i = 0; i < h->nslot; i++) {
- cur = h->htable[i];
+ cur = avtab_get_chain(h, i);
if (cur) {
slots_used++;
chain_len = 0;
while (cur) {
chain_len++;
- cur = cur->next;
+ cur = avtab_get_node(h, cur->next);
}
if (chain_len > max_chain_len)
@@ -627,8 +629,8 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
return rc;
for (i = 0; i < a->nslot; i++) {
- for (cur = a->htable[i]; cur;
- cur = cur->next) {
+ for (cur = avtab_get_chain(a, i); cur;
+ cur = avtab_get_node(a, cur->next)) {
rc = avtab_write_item(p, cur, fp);
if (rc)
return rc;
@@ -640,9 +642,6 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
void __init avtab_cache_init(void)
{
- avtab_node_cachep = kmem_cache_create("avtab_node",
- sizeof(struct avtab_node),
- 0, SLAB_PANIC, NULL);
avtab_xperms_cachep = kmem_cache_create("avtab_extended_perms",
sizeof(struct avtab_extended_perms),
0, SLAB_PANIC, NULL);
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 3c3904bf02b0..72708aed5fff 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -77,16 +77,38 @@ struct avtab_datum {
struct avtab_node {
struct avtab_key key;
struct avtab_datum datum;
- struct avtab_node *next;
+ u32 next;
};
struct avtab {
- struct avtab_node **htable;
+ u32 *htable;
+ struct avtab_node *nodes;
u32 nel; /* number of elements */
u32 nslot; /* number of hash slots */
u32 mask; /* mask to compute hash func */
};
+/* sentinel value to signal an empty node */
+#define NULL_NODE_IDX (0)
+/* compute the actual index into the nodes array */
+#define NODES_ARRAY_IDX(idx) ((idx) - 1)
+
+static inline struct avtab_node *avtab_get_chain(struct avtab *h, u32 slot)
+{
+ u32 chain_start = h->htable[slot];
+
+ if (chain_start != NULL_NODE_IDX)
+ return &h->nodes[NODES_ARRAY_IDX(chain_start)];
+ return NULL;
+}
+
+static inline struct avtab_node *avtab_get_node(struct avtab *h, u32 idx)
+{
+ if (idx != NULL_NODE_IDX)
+ return &h->nodes[NODES_ARRAY_IDX(idx)];
+ return NULL;
+}
+
void avtab_init(struct avtab *h);
int avtab_alloc(struct avtab *, u32);
int avtab_alloc_dup(struct avtab *new, const struct avtab *orig);
@@ -117,7 +139,7 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
struct avtab_node *avtab_search_node(struct avtab *h,
const struct avtab_key *key);
-struct avtab_node *avtab_search_node_next(struct avtab_node *node, u16 specified);
+struct avtab_node *avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified);
#define MAX_AVTAB_HASH_BITS 16
#define MAX_AVTAB_HASH_BUCKETS (1 << MAX_AVTAB_HASH_BITS)
diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
index 81ff676f209a..448bdacb1769 100644
--- a/security/selinux/ss/conditional.c
+++ b/security/selinux/ss/conditional.c
@@ -262,6 +262,7 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
struct cond_insertf_data *data = ptr;
struct policydb *p = data->p;
struct cond_av_list *other = data->other;
+ struct avtab *cond_avtab = &p->te_cond_avtab;
struct avtab_node *node_ptr;
u32 i;
bool found;
@@ -285,9 +286,9 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
* be any other entries.
*/
if (other) {
- node_ptr = avtab_search_node(&p->te_cond_avtab, k);
+ node_ptr = avtab_search_node(cond_avtab, k);
if (node_ptr) {
- if (avtab_search_node_next(node_ptr, k->specified)) {
+ if (avtab_search_node_next(cond_avtab, node_ptr, k->specified)) {
pr_err("SELinux: too many conflicting type rules.\n");
return -EINVAL;
}
@@ -304,14 +305,14 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
}
}
} else {
- if (avtab_search_node(&p->te_cond_avtab, k)) {
+ if (avtab_search_node(cond_avtab, k)) {
pr_err("SELinux: conflicting type rules when adding type rule for true.\n");
return -EINVAL;
}
}
}
- node_ptr = avtab_insert_nonunique(&p->te_cond_avtab, k, d);
+ node_ptr = avtab_insert_nonunique(cond_avtab, k, d);
if (!node_ptr) {
pr_err("SELinux: could not insert rule.\n");
return -ENOMEM;
@@ -563,7 +564,7 @@ void cond_compute_xperms(struct avtab *ctab, struct avtab_key *key,
return;
for (node = avtab_search_node(ctab, key); node;
- node = avtab_search_node_next(node, key->specified)) {
+ node = avtab_search_node_next(ctab, node, key->specified)) {
if (node->key.specified & AVTAB_ENABLED)
services_compute_xperms_decision(xpermd, node);
}
@@ -580,7 +581,7 @@ void cond_compute_av(struct avtab *ctab, struct avtab_key *key,
return;
for (node = avtab_search_node(ctab, key); node;
- node = avtab_search_node_next(node, key->specified)) {
+ node = avtab_search_node_next(ctab, node, key->specified)) {
if ((u16)(AVTAB_ALLOWED|AVTAB_ENABLED) ==
(node->key.specified & (AVTAB_ALLOWED|AVTAB_ENABLED)))
avd->allowed |= node->datum.u.data;
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 1eeffc66ea7d..89cb4fc2d4aa 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -617,6 +617,7 @@ static void context_struct_compute_av(struct policydb *policydb,
{
struct constraint_node *constraint;
struct role_allow *ra;
+ struct avtab *avtab;
struct avtab_key avkey;
struct avtab_node *node;
struct class_datum *tclass_datum;
@@ -644,6 +645,7 @@ static void context_struct_compute_av(struct policydb *policydb,
* If a specific type enforcement rule was defined for
* this permission check, then use it.
*/
+ avtab = &policydb->te_avtab;
avkey.target_class = tclass;
avkey.specified = AVTAB_AV | AVTAB_XPERMS;
sattr = &policydb->type_attr_map_array[scontext->type - 1];
@@ -652,10 +654,9 @@ static void context_struct_compute_av(struct policydb *policydb,
ebitmap_for_each_positive_bit(tattr, tnode, j) {
avkey.source_type = i + 1;
avkey.target_type = j + 1;
- for (node = avtab_search_node(&policydb->te_avtab,
- &avkey);
+ for (node = avtab_search_node(avtab, &avkey);
node;
- node = avtab_search_node_next(node, avkey.specified)) {
+ node = avtab_search_node_next(avtab, node, avkey.specified)) {
if (node->key.specified == AVTAB_ALLOWED)
avd->allowed |= node->datum.u.data;
else if (node->key.specified == AVTAB_AUDITALLOW)
@@ -1008,6 +1009,7 @@ void security_compute_xperms_decision(u32 ssid,
struct sidtab *sidtab;
u16 tclass;
struct context *scontext, *tcontext;
+ struct avtab *avtab;
struct avtab_key avkey;
struct avtab_node *node;
struct ebitmap *sattr, *tattr;
@@ -1055,6 +1057,7 @@ void security_compute_xperms_decision(u32 ssid,
goto out;
}
+ avtab = &policydb->te_avtab;
avkey.target_class = tclass;
avkey.specified = AVTAB_XPERMS;
sattr = &policydb->type_attr_map_array[scontext->type - 1];
@@ -1063,10 +1066,9 @@ void security_compute_xperms_decision(u32 ssid,
ebitmap_for_each_positive_bit(tattr, tnode, j) {
avkey.source_type = i + 1;
avkey.target_type = j + 1;
- for (node = avtab_search_node(&policydb->te_avtab,
- &avkey);
+ for (node = avtab_search_node(avtab, &avkey);
node;
- node = avtab_search_node_next(node, avkey.specified))
+ node = avtab_search_node_next(avtab, node, avkey.specified))
services_compute_xperms_decision(xpermd, node);
cond_compute_xperms(&policydb->te_cond_avtab,
@@ -1705,6 +1707,7 @@ static int security_compute_sid(u32 ssid,
struct class_datum *cladatum;
struct context *scontext, *tcontext, newcontext;
struct sidtab_entry *sentry, *tentry;
+ struct avtab *cond_avtab;
struct avtab_key avkey;
struct avtab_node *avnode, *node;
u16 tclass;
@@ -1810,6 +1813,7 @@ static int security_compute_sid(u32 ssid,
}
/* Look for a type transition/member/change rule. */
+ cond_avtab = &policydb->te_cond_avtab;
avkey.source_type = scontext->type;
avkey.target_type = tcontext->type;
avkey.target_class = tclass;
@@ -1818,8 +1822,8 @@ static int security_compute_sid(u32 ssid,
/* If no permanent rule, also check for enabled conditional rules */
if (!avnode) {
- node = avtab_search_node(&policydb->te_cond_avtab, &avkey);
- for (; node; node = avtab_search_node_next(node, specified)) {
+ node = avtab_search_node(cond_avtab, &avkey);
+ for (; node; node = avtab_search_node_next(cond_avtab, node, specified)) {
if (node->key.specified & AVTAB_ENABLED) {
avnode = node;
break;
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] selinux: shrink conditional avtab node array
2023-09-06 15:46 [PATCH 0/3] avtab hotspot optimizations Jacob Satterfield
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
@ 2023-09-06 15:46 ` Jacob Satterfield
2023-09-06 17:17 ` Stephen Smalley
2023-09-06 15:46 ` [PATCH 3/3] selinux: hweight optimization in avtab_read_item Jacob Satterfield
2 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-06 15:46 UTC (permalink / raw)
To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace
Due to how conditional rules are written in the binary policy, the
code responsible for loading does not know how many conditional rules
there are before creating the avtab structure. Instead, it uses the
number of elements in the non-conditional avtab as a hint and allocates
the hash table based on it. Therefore, pre-allocating an array of nodes
based on this hint over-allocates at best and under-allocates at worst.
This patch includes two functions, avtab_grow_nodes and
avtab_shrink_nodes, which help manage the size of the nodes array in the
unlikely case when there are more conditional rules than non-conditional
and in the likely case when there are more non-conditional rules than
conditional rules respectively.
This patch required struct cond_av_list to become an array of indices
instead of pointers so that the nodes array can be copied and moved.
This coveniently results in a reduction of memory usage on 64-bit archs
as pointers become u32 integers.
Future improvements to the binary policy to provide the correct hint to
the loader code will make these functions obsolete. But as this would be
a breaking change to the format, it is not a part of this patch series.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
security/selinux/ss/avtab.c | 61 +++++++++++++++++++++++++------
security/selinux/ss/avtab.h | 8 ++--
security/selinux/ss/conditional.c | 46 ++++++++++++++---------
security/selinux/ss/conditional.h | 2 +-
4 files changed, 84 insertions(+), 33 deletions(-)
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index 620ea0a03e41..b7a11f417f0a 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -64,7 +64,9 @@ static inline u32 avtab_hash(const struct avtab_key *keyp, u32 mask)
return hash & mask;
}
-static struct avtab_node*
+static int avtab_grow_nodes(struct avtab *h);
+
+static u32
avtab_insert_node(struct avtab *h, u32 hvalue,
struct avtab_node *prev,
const struct avtab_key *key, const struct avtab_datum *datum)
@@ -72,6 +74,8 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
u32 newnodei;
struct avtab_node *newnode;
struct avtab_extended_perms *xperms;
+ if (unlikely(h->nel == h->nnodes) && avtab_grow_nodes(h) != 0)
+ return NULL_NODE_IDX;
newnodei = ++h->nel;
newnode = avtab_get_node(h, newnodei);
newnode->key = *key;
@@ -80,7 +84,7 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
xperms = kmem_cache_zalloc(avtab_xperms_cachep, GFP_KERNEL);
if (xperms == NULL) {
--h->nel;
- return NULL;
+ return NULL_NODE_IDX;
}
*xperms = *(datum->u.xperms);
newnode->datum.u.xperms = xperms;
@@ -96,14 +100,14 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
h->htable[hvalue] = newnodei;
}
- return newnode;
+ return newnodei;
}
static int avtab_insert(struct avtab *h, const struct avtab_key *key,
const struct avtab_datum *datum)
{
- u32 hvalue;
- struct avtab_node *prev, *cur, *newnode;
+ u32 hvalue, newnodei;
+ struct avtab_node *prev, *cur;
u16 specified = key->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
if (!h || !h->nslot || h->nel == U32_MAX)
@@ -133,8 +137,8 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
break;
}
- newnode = avtab_insert_node(h, hvalue, prev, key, datum);
- if (!newnode)
+ newnodei = avtab_insert_node(h, hvalue, prev, key, datum);
+ if (newnodei == NULL_NODE_IDX)
return -ENOMEM;
return 0;
@@ -142,18 +146,18 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
/* Unlike avtab_insert(), this function allow multiple insertions of the same
* key/specified mask into the table, as needed by the conditional avtab.
- * It also returns a pointer to the node inserted.
+ * It returns the index of the node inserted.
*/
-struct avtab_node *avtab_insert_nonunique(struct avtab *h,
- const struct avtab_key *key,
- const struct avtab_datum *datum)
+u32 avtab_insert_nonunique(struct avtab *h,
+ const struct avtab_key *key,
+ const struct avtab_datum *datum)
{
u32 hvalue;
struct avtab_node *prev, *cur;
u16 specified = key->specified & ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
if (!h || !h->nslot || h->nel == U32_MAX)
- return NULL;
+ return NULL_NODE_IDX;
hvalue = avtab_hash(key, h->mask);
for (prev = NULL, cur = avtab_get_chain(h, hvalue);
cur;
@@ -261,6 +265,7 @@ void avtab_destroy(struct avtab *h)
kvfree(h->nodes);
h->htable = NULL;
h->nodes = NULL;
+ h->nnodes = 0;
h->nel = 0;
h->nslot = 0;
h->mask = 0;
@@ -270,6 +275,7 @@ void avtab_init(struct avtab *h)
{
h->htable = NULL;
h->nodes = NULL;
+ h->nnodes = 0;
h->nel = 0;
h->nslot = 0;
h->mask = 0;
@@ -289,6 +295,7 @@ static int avtab_alloc_common(struct avtab *h, u32 nslot, u32 nrules)
h->htable = NULL;
return -ENOMEM;
}
+ h->nnodes = nrules;
h->nslot = nslot;
h->mask = nslot - 1;
return 0;
@@ -324,6 +331,36 @@ int avtab_alloc_dup(struct avtab *new, const struct avtab *orig)
return avtab_alloc_common(new, orig->nslot, orig->nel);
}
+static int avtab_change_nodes_size(struct avtab *h, u32 nnodes)
+{
+ struct avtab_node *new_nodes;
+
+ if (!h->nodes)
+ return -EINVAL;
+
+ new_nodes = kvcalloc(nnodes, sizeof(*h->nodes), GFP_KERNEL);
+ if (!new_nodes)
+ return -ENOMEM;
+
+ if (h->nel)
+ memcpy(new_nodes, h->nodes, sizeof(*h->nodes) * h->nel);
+
+ kvfree(h->nodes);
+ h->nodes = new_nodes;
+ h->nnodes = nnodes;
+ return 0;
+}
+
+static int avtab_grow_nodes(struct avtab *h)
+{
+ return avtab_change_nodes_size(h, h->nnodes + 1024);
+}
+
+int avtab_shrink_nodes(struct avtab *h)
+{
+ return avtab_change_nodes_size(h, h->nel);
+}
+
#ifdef CONFIG_SECURITY_SELINUX_DEBUG
void avtab_hash_eval(struct avtab *h, const char *tag)
{
diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
index 72708aed5fff..009734c14d05 100644
--- a/security/selinux/ss/avtab.h
+++ b/security/selinux/ss/avtab.h
@@ -83,6 +83,7 @@ struct avtab_node {
struct avtab {
u32 *htable;
struct avtab_node *nodes;
+ u32 nnodes; /* number of nodes */
u32 nel; /* number of elements */
u32 nslot; /* number of hash slots */
u32 mask; /* mask to compute hash func */
@@ -112,6 +113,7 @@ static inline struct avtab_node *avtab_get_node(struct avtab *h, u32 idx)
void avtab_init(struct avtab *h);
int avtab_alloc(struct avtab *, u32);
int avtab_alloc_dup(struct avtab *new, const struct avtab *orig);
+int avtab_shrink_nodes(struct avtab *h);
void avtab_destroy(struct avtab *h);
#ifdef CONFIG_SECURITY_SELINUX_DEBUG
@@ -132,9 +134,9 @@ int avtab_read(struct avtab *a, void *fp, struct policydb *pol);
int avtab_write_item(struct policydb *p, const struct avtab_node *cur, void *fp);
int avtab_write(struct policydb *p, struct avtab *a, void *fp);
-struct avtab_node *avtab_insert_nonunique(struct avtab *h,
- const struct avtab_key *key,
- const struct avtab_datum *datum);
+u32 avtab_insert_nonunique(struct avtab *h,
+ const struct avtab_key *key,
+ const struct avtab_datum *datum);
struct avtab_node *avtab_search_node(struct avtab *h,
const struct avtab_key *key);
diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
index 448bdacb1769..b7da47d00158 100644
--- a/security/selinux/ss/conditional.c
+++ b/security/selinux/ss/conditional.c
@@ -91,10 +91,12 @@ static int cond_evaluate_expr(struct policydb *p, struct cond_expr *expr)
*/
static void evaluate_cond_node(struct policydb *p, struct cond_node *node)
{
+ struct avtab *cond_avtab;
struct avtab_node *avnode;
int new_state;
u32 i;
+ cond_avtab = &p->te_cond_avtab;
new_state = cond_evaluate_expr(p, &node->expr);
if (new_state != node->cur_state) {
node->cur_state = new_state;
@@ -102,7 +104,7 @@ static void evaluate_cond_node(struct policydb *p, struct cond_node *node)
pr_err("SELinux: expression result was undefined - disabling all rules.\n");
/* turn the rules on or off */
for (i = 0; i < node->true_list.len; i++) {
- avnode = node->true_list.nodes[i];
+ avnode = avtab_get_node(cond_avtab, node->true_list.nodes[i]);
if (new_state <= 0)
avnode->key.specified &= ~AVTAB_ENABLED;
else
@@ -110,7 +112,7 @@ static void evaluate_cond_node(struct policydb *p, struct cond_node *node)
}
for (i = 0; i < node->false_list.len; i++) {
- avnode = node->false_list.nodes[i];
+ avnode = avtab_get_node(cond_avtab, node->false_list.nodes[i]);
/* -1 or 1 */
if (new_state)
avnode->key.specified &= ~AVTAB_ENABLED;
@@ -140,7 +142,7 @@ void cond_policydb_init(struct policydb *p)
static void cond_node_destroy(struct cond_node *node)
{
kfree(node->expr.nodes);
- /* the avtab_ptr_t nodes are destroyed by the avtab */
+ /* the actual nodes were destroyed by avtab_destroy() */
kfree(node->true_list.nodes);
kfree(node->false_list.nodes);
}
@@ -252,7 +254,7 @@ int cond_read_bool(struct policydb *p, struct symtab *s, void *fp)
struct cond_insertf_data {
struct policydb *p;
- struct avtab_node **dst;
+ u32 *dst;
struct cond_av_list *other;
};
@@ -263,8 +265,8 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
struct policydb *p = data->p;
struct cond_av_list *other = data->other;
struct avtab *cond_avtab = &p->te_cond_avtab;
- struct avtab_node *node_ptr;
- u32 i;
+ struct avtab_node *node_ptr, *other_node_ptr;
+ u32 i, node_idx;
bool found;
/*
@@ -294,7 +296,9 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
}
found = false;
for (i = 0; i < other->len; i++) {
- if (other->nodes[i] == node_ptr) {
+ other_node_ptr = avtab_get_node(cond_avtab,
+ other->nodes[i]);
+ if (other_node_ptr == node_ptr) {
found = true;
break;
}
@@ -312,13 +316,13 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
}
}
- node_ptr = avtab_insert_nonunique(cond_avtab, k, d);
- if (!node_ptr) {
+ node_idx = avtab_insert_nonunique(cond_avtab, k, d);
+ if (node_idx == NULL_NODE_IDX) {
pr_err("SELinux: could not insert rule.\n");
return -ENOMEM;
}
- *data->dst = node_ptr;
+ *data->dst = node_idx;
return 0;
}
@@ -441,6 +445,9 @@ int cond_read_list(struct policydb *p, void *fp)
if (rc)
goto err;
}
+ rc = avtab_shrink_nodes(&p->te_cond_avtab);
+ if (rc)
+ goto err;
return 0;
err:
cond_list_destroy(p);
@@ -483,6 +490,7 @@ static int cond_write_av_list(struct policydb *p,
struct cond_av_list *list, struct policy_file *fp)
{
__le32 buf[1];
+ struct avtab_node *node;
u32 i;
int rc;
@@ -492,7 +500,8 @@ static int cond_write_av_list(struct policydb *p,
return rc;
for (i = 0; i < list->len; i++) {
- rc = avtab_write_item(p, list->nodes[i], fp);
+ node = avtab_get_node(&p->te_cond_avtab, list->nodes[i]);
+ rc = avtab_write_item(p, node, fp);
if (rc)
return rc;
}
@@ -604,8 +613,10 @@ void cond_compute_av(struct avtab *ctab, struct avtab_key *key,
static int cond_dup_av_list(struct cond_av_list *new,
struct cond_av_list *orig,
- struct avtab *avtab)
+ struct avtab *new_avtab,
+ struct avtab *orig_avtab)
{
+ struct avtab_node *orig_node;
u32 i;
memset(new, 0, sizeof(*new));
@@ -615,9 +626,10 @@ static int cond_dup_av_list(struct cond_av_list *new,
return -ENOMEM;
for (i = 0; i < orig->len; i++) {
- new->nodes[i] = avtab_insert_nonunique(avtab,
- &orig->nodes[i]->key,
- &orig->nodes[i]->datum);
+ orig_node = avtab_get_node(orig_avtab, orig->nodes[i]);
+ new->nodes[i] = avtab_insert_nonunique(new_avtab,
+ &orig_node->key,
+ &orig_node->datum);
if (!new->nodes[i])
return -ENOMEM;
new->len++;
@@ -659,12 +671,12 @@ static int duplicate_policydb_cond_list(struct policydb *newp,
newn->expr.len = orign->expr.len;
rc = cond_dup_av_list(&newn->true_list, &orign->true_list,
- &newp->te_cond_avtab);
+ &newp->te_cond_avtab, &origp->te_cond_avtab);
if (rc)
goto error;
rc = cond_dup_av_list(&newn->false_list, &orign->false_list,
- &newp->te_cond_avtab);
+ &newp->te_cond_avtab, &origp->te_cond_avtab);
if (rc)
goto error;
}
diff --git a/security/selinux/ss/conditional.h b/security/selinux/ss/conditional.h
index 5a7b51278dc6..0f78970493c3 100644
--- a/security/selinux/ss/conditional.h
+++ b/security/selinux/ss/conditional.h
@@ -43,7 +43,7 @@ struct cond_expr {
* struct is for that list.
*/
struct cond_av_list {
- struct avtab_node **nodes;
+ u32 *nodes;
u32 len;
};
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/3] selinux: hweight optimization in avtab_read_item
2023-09-06 15:46 [PATCH 0/3] avtab hotspot optimizations Jacob Satterfield
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
2023-09-06 15:46 ` [PATCH 2/3] selinux: shrink conditional avtab node array Jacob Satterfield
@ 2023-09-06 15:46 ` Jacob Satterfield
2023-09-06 17:18 ` Stephen Smalley
2023-09-13 17:54 ` Paul Moore
2 siblings, 2 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-06 15:46 UTC (permalink / raw)
To: selinux; +Cc: Jacob Satterfield, stephen.smalley.work, paul, omosnace
avtab_read_item() is a hot function called when reading each rule in a
binary policydb. With the current Fedora policy and refpolicy, this
function is called nearly 100,000 times per policy load.
A single avtab node is only permitted to have a single specifier to
describe the data it holds. As such, a check is performed to make sure
only one specifier is set. Previously this was done via a for-loop.
However, there is already an optimal function for finding the number of
bits set (hamming weight) and on some architectures, dedicated
instructions (popcount) which can be executed much more efficiently.
Even when using -mcpu=generic on a x86-64 Fedora 38 VM, this commit
results in a modest 2-4% speedup for policy loading due to a substantial
reduction in the number of instructions executed.
Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
---
security/selinux/ss/avtab.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
index b7a11f417f0a..b0e455fb395c 100644
--- a/security/selinux/ss/avtab.c
+++ b/security/selinux/ss/avtab.c
@@ -17,6 +17,7 @@
* Tuned number of hash slots for avtab to reduce memory usage
*/
+#include <linux/bitops.h>
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/errno.h>
@@ -516,11 +517,7 @@ int avtab_read_item(struct avtab *a, void *fp, struct policydb *pol,
return -EINVAL;
}
- set = 0;
- for (i = 0; i < ARRAY_SIZE(spec_order); i++) {
- if (key.specified & spec_order[i])
- set++;
- }
+ set = hweight16(key.specified & (AVTAB_XPERMS | AVTAB_TYPE | AVTAB_AV));
if (!set || set > 1) {
pr_err("SELinux: avtab: more than one specifier\n");
return -EINVAL;
--
2.41.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
@ 2023-09-06 17:16 ` Stephen Smalley
2023-09-13 3:23 ` Paul Moore
1 sibling, 0 replies; 12+ messages in thread
From: Stephen Smalley @ 2023-09-06 17:16 UTC (permalink / raw)
To: Jacob Satterfield; +Cc: selinux, paul, omosnace
On Wed, Sep 6, 2023 at 11:46 AM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> The current avtab hashtable employs a separate chaining collision
> resolution strategy where each bucket/chain holds an ordered linked list
> of pointers to kmem_cache allocated avtab_node elements.
>
> On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> A call to kmem_cache_zalloc() is required for every single rule, which
> in the default policy is currently 96,730 and continually rising.
>
> When both sets of avtab_node (regular and cond.) are turned into arrays
> with the hash table slot indexing into it rather than a pointer, then
> this results in only 2 additional kvcalloc() calls and the complete
> removal of the kmem_cache itself.
>
> The caching characteristics of iterating a single array are better due
> to locality of reference. Running "perf stat -r 100 -d load_policy"
> has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> with this single patch. Future patches focused on improving the hash
> table's collision resolution strategy and array layout (struct-of-arrays
> vs. array-of-structs) may elicit even more caching and therefore runtime
> performance improvements.
>
> On 64-bit machines, the memory usage of the hash table slots is cut in
> half due to the use of u32 indices instead of memory pointers. Since
> ~65k hash slots are used between the regular and cond. tables with the
> default Fedora 38 policy, this equates to around 256KB of memory saved.
>
> Notes:
>
> A couple helper functions avtab_get_chain() and avtab_get_node() are
> introduced to provide more standardized interfaces for use in functions
> that need to search through the hash table.
>
> NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> in the hash table or the "next" member of an avtab_node are valid.
>
> This patch causes the cond. rules table to waste memory as the size
> requested for the kvcalloc() is the size of the regular rules table.
> These tables rarely, if ever, have the same number of rules in practice.
> The next patch addresses this shortcoming.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
> security/selinux/ss/avtab.h | 28 ++++++++++--
> security/selinux/ss/conditional.c | 13 +++---
> security/selinux/ss/services.c | 20 +++++----
> 4 files changed, 81 insertions(+), 55 deletions(-)
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] selinux: shrink conditional avtab node array
2023-09-06 15:46 ` [PATCH 2/3] selinux: shrink conditional avtab node array Jacob Satterfield
@ 2023-09-06 17:17 ` Stephen Smalley
0 siblings, 0 replies; 12+ messages in thread
From: Stephen Smalley @ 2023-09-06 17:17 UTC (permalink / raw)
To: Jacob Satterfield; +Cc: selinux, paul, omosnace
On Wed, Sep 6, 2023 at 11:46 AM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> Due to how conditional rules are written in the binary policy, the
> code responsible for loading does not know how many conditional rules
> there are before creating the avtab structure. Instead, it uses the
> number of elements in the non-conditional avtab as a hint and allocates
> the hash table based on it. Therefore, pre-allocating an array of nodes
> based on this hint over-allocates at best and under-allocates at worst.
>
> This patch includes two functions, avtab_grow_nodes and
> avtab_shrink_nodes, which help manage the size of the nodes array in the
> unlikely case when there are more conditional rules than non-conditional
> and in the likely case when there are more non-conditional rules than
> conditional rules respectively.
>
> This patch required struct cond_av_list to become an array of indices
> instead of pointers so that the nodes array can be copied and moved.
> This coveniently results in a reduction of memory usage on 64-bit archs
> as pointers become u32 integers.
>
> Future improvements to the binary policy to provide the correct hint to
> the loader code will make these functions obsolete. But as this would be
> a breaking change to the format, it is not a part of this patch series.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> security/selinux/ss/avtab.c | 61 +++++++++++++++++++++++++------
> security/selinux/ss/avtab.h | 8 ++--
> security/selinux/ss/conditional.c | 46 ++++++++++++++---------
> security/selinux/ss/conditional.h | 2 +-
> 4 files changed, 84 insertions(+), 33 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] selinux: hweight optimization in avtab_read_item
2023-09-06 15:46 ` [PATCH 3/3] selinux: hweight optimization in avtab_read_item Jacob Satterfield
@ 2023-09-06 17:18 ` Stephen Smalley
2023-09-13 17:54 ` Paul Moore
1 sibling, 0 replies; 12+ messages in thread
From: Stephen Smalley @ 2023-09-06 17:18 UTC (permalink / raw)
To: Jacob Satterfield; +Cc: selinux, paul, omosnace
On Wed, Sep 6, 2023 at 11:47 AM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> avtab_read_item() is a hot function called when reading each rule in a
> binary policydb. With the current Fedora policy and refpolicy, this
> function is called nearly 100,000 times per policy load.
>
> A single avtab node is only permitted to have a single specifier to
> describe the data it holds. As such, a check is performed to make sure
> only one specifier is set. Previously this was done via a for-loop.
> However, there is already an optimal function for finding the number of
> bits set (hamming weight) and on some architectures, dedicated
> instructions (popcount) which can be executed much more efficiently.
>
> Even when using -mcpu=generic on a x86-64 Fedora 38 VM, this commit
> results in a modest 2-4% speedup for policy loading due to a substantial
> reduction in the number of instructions executed.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> security/selinux/ss/avtab.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
2023-09-06 17:16 ` Stephen Smalley
@ 2023-09-13 3:23 ` Paul Moore
2023-09-14 21:57 ` Jacob Satterfield
1 sibling, 1 reply; 12+ messages in thread
From: Paul Moore @ 2023-09-13 3:23 UTC (permalink / raw)
To: Stephen Smalley, Jacob Satterfield; +Cc: selinux, omosnace
On Sep 6, 2023 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
>
> The current avtab hashtable employs a separate chaining collision
> resolution strategy where each bucket/chain holds an ordered linked list
> of pointers to kmem_cache allocated avtab_node elements.
>
> On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> A call to kmem_cache_zalloc() is required for every single rule, which
> in the default policy is currently 96,730 and continually rising.
>
> When both sets of avtab_node (regular and cond.) are turned into arrays
> with the hash table slot indexing into it rather than a pointer, then
> this results in only 2 additional kvcalloc() calls and the complete
> removal of the kmem_cache itself.
>
> The caching characteristics of iterating a single array are better due
> to locality of reference. Running "perf stat -r 100 -d load_policy"
> has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> with this single patch. Future patches focused on improving the hash
> table's collision resolution strategy and array layout (struct-of-arrays
> vs. array-of-structs) may elicit even more caching and therefore runtime
> performance improvements.
>
> On 64-bit machines, the memory usage of the hash table slots is cut in
> half due to the use of u32 indices instead of memory pointers. Since
> ~65k hash slots are used between the regular and cond. tables with the
> default Fedora 38 policy, this equates to around 256KB of memory saved.
>
> Notes:
>
> A couple helper functions avtab_get_chain() and avtab_get_node() are
> introduced to provide more standardized interfaces for use in functions
> that need to search through the hash table.
>
> NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> in the hash table or the "next" member of an avtab_node are valid.
>
> This patch causes the cond. rules table to waste memory as the size
> requested for the kvcalloc() is the size of the regular rules table.
> These tables rarely, if ever, have the same number of rules in practice.
> The next patch addresses this shortcoming.
Isn't the more important issue that of there being more conditional
rules than regular rules? Yes, I'm sure this is very unlikely, but
given just this patch wouldn't this cause a problem?
It is important to remember that even when combined in a patchset,
any given patch should not break the system. A patch can depend on
prior patches in the patchset, but not upcoming patches.
I've only looked very briefly at patch 2/3, but I suspect at the very
least you may need to squash patches 1/3 and 2/3 to ensure there is
no breakage.
More comments below (all are in the context of patch 1/3, some may not
be relevant in the context of 1/3+2/3).
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
> security/selinux/ss/avtab.h | 28 ++++++++++--
> security/selinux/ss/conditional.c | 13 +++---
> security/selinux/ss/services.c | 20 +++++----
> 4 files changed, 81 insertions(+), 55 deletions(-)
>
> diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
> index 86d98a8e291b..620ea0a03e41 100644
> --- a/security/selinux/ss/avtab.c
> +++ b/security/selinux/ss/avtab.c
> @@ -23,7 +23,6 @@
> #include "avtab.h"
> #include "policydb.h"
>
> -static struct kmem_cache *avtab_node_cachep __ro_after_init;
> static struct kmem_cache *avtab_xperms_cachep __ro_after_init;
>
> /* Based on MurmurHash3, written by Austin Appleby and placed in the
> @@ -70,17 +69,17 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
> struct avtab_node *prev,
> const struct avtab_key *key, const struct avtab_datum *datum)
> {
> + u32 newnodei;
> struct avtab_node *newnode;
> struct avtab_extended_perms *xperms;
> - newnode = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
> - if (newnode == NULL)
> - return NULL;
> + newnodei = ++h->nel;
This makes me nervous that some future code change forgets to
decrement h->nel in its error handler. Let's only bump h->nel at the
end of the function.
We also should have a check in this function to ensure we don't
overflow the array size.
> + newnode = avtab_get_node(h, newnodei);
> newnode->key = *key;
>
> if (key->specified & AVTAB_XPERMS) {
> xperms = kmem_cache_zalloc(avtab_xperms_cachep, GFP_KERNEL);
> if (xperms == NULL) {
> - kmem_cache_free(avtab_node_cachep, newnode);
> + --h->nel;
> return NULL;
> }
> *xperms = *(datum->u.xperms);
> @@ -91,15 +90,12 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
>
> if (prev) {
> newnode->next = prev->next;
> - prev->next = newnode;
> + prev->next = newnodei;
> } else {
> - struct avtab_node **n = &h->htable[hvalue];
> -
> - newnode->next = *n;
> - *n = newnode;
> + newnode->next = h->htable[hvalue];
> + h->htable[hvalue] = newnodei;
> }
See my comments below about the next field in the avtab_node struct.
> - h->nel++;
> return newnode;
> }
>
> @@ -114,9 +110,9 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
> return -EINVAL;
>
> hvalue = avtab_hash(key, h->mask);
> - for (prev = NULL, cur = h->htable[hvalue];
> + for (prev = NULL, cur = avtab_get_chain(h, hvalue);
> cur;
> - prev = cur, cur = cur->next) {
> + prev = cur, cur = avtab_get_node(h, cur->next)) {
> if (key->source_type == cur->key.source_type &&
> key->target_type == cur->key.target_type &&
> key->target_class == cur->key.target_class &&
> @@ -159,9 +155,9 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
> if (!h || !h->nslot || h->nel == U32_MAX)
> return NULL;
> hvalue = avtab_hash(key, h->mask);
> - for (prev = NULL, cur = h->htable[hvalue];
> + for (prev = NULL, cur = avtab_get_chain(h, hvalue);
> cur;
> - prev = cur, cur = cur->next) {
> + prev = cur, cur = avtab_get_node(h, cur->next)) {
> if (key->source_type == cur->key.source_type &&
> key->target_type == cur->key.target_type &&
> key->target_class == cur->key.target_class &&
> @@ -194,8 +190,8 @@ struct avtab_node *avtab_search_node(struct avtab *h,
> return NULL;
>
> hvalue = avtab_hash(key, h->mask);
> - for (cur = h->htable[hvalue]; cur;
> - cur = cur->next) {
> + for (cur = avtab_get_chain(h, hvalue); cur;
> + cur = avtab_get_node(h, cur->next)) {
> if (key->source_type == cur->key.source_type &&
> key->target_type == cur->key.target_type &&
> key->target_class == cur->key.target_class &&
> @@ -216,7 +212,7 @@ struct avtab_node *avtab_search_node(struct avtab *h,
> }
>
> struct avtab_node*
> -avtab_search_node_next(struct avtab_node *node, u16 specified)
> +avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified)
> {
> struct avtab_node *cur;
>
> @@ -224,7 +220,7 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
> return NULL;
>
> specified &= ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
> - for (cur = node->next; cur; cur = cur->next) {
> + for (cur = avtab_get_node(h, node->next); cur; cur = avtab_get_node(h, cur->next)) {
> if (node->key.source_type == cur->key.source_type &&
> node->key.target_type == cur->key.target_type &&
> node->key.target_class == cur->key.target_class &&
> @@ -247,24 +243,24 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
> void avtab_destroy(struct avtab *h)
> {
> u32 i;
> - struct avtab_node *cur, *temp;
> + struct avtab_node *cur;
>
> if (!h)
> return;
>
> for (i = 0; i < h->nslot; i++) {
> - cur = h->htable[i];
> + cur = avtab_get_chain(h, i);
> while (cur) {
> - temp = cur;
> - cur = cur->next;
> - if (temp->key.specified & AVTAB_XPERMS)
> + if (cur->key.specified & AVTAB_XPERMS)
> kmem_cache_free(avtab_xperms_cachep,
> - temp->datum.u.xperms);
> - kmem_cache_free(avtab_node_cachep, temp);
> + cur->datum.u.xperms);
> + cur = avtab_get_node(h, cur->next);
> }
> }
> kvfree(h->htable);
> + kvfree(h->nodes);
> h->htable = NULL;
> + h->nodes = NULL;
> h->nel = 0;
> h->nslot = 0;
> h->mask = 0;
> @@ -273,20 +269,26 @@ void avtab_destroy(struct avtab *h)
> void avtab_init(struct avtab *h)
> {
> h->htable = NULL;
> + h->nodes = NULL;
> h->nel = 0;
> h->nslot = 0;
> h->mask = 0;
> }
>
> -static int avtab_alloc_common(struct avtab *h, u32 nslot)
> +static int avtab_alloc_common(struct avtab *h, u32 nslot, u32 nrules)
> {
> if (!nslot)
> return 0;
>
> - h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
> + h->htable = kvcalloc(nslot, sizeof(u32), GFP_KERNEL);
> if (!h->htable)
> return -ENOMEM;
> -
> + h->nodes = kvcalloc(nrules, sizeof(struct avtab_node), GFP_KERNEL);
> + if (!h->nodes) {
> + kvfree(h->htable);
> + h->htable = NULL;
> + return -ENOMEM;
> + }
> h->nslot = nslot;
> h->mask = nslot - 1;
> return 0;
> @@ -308,7 +310,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
> if (nslot > MAX_AVTAB_HASH_BUCKETS)
> nslot = MAX_AVTAB_HASH_BUCKETS;
>
> - rc = avtab_alloc_common(h, nslot);
> + rc = avtab_alloc_common(h, nslot, nrules);
> if (rc)
> return rc;
> }
> @@ -319,7 +321,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
>
> int avtab_alloc_dup(struct avtab *new, const struct avtab *orig)
> {
> - return avtab_alloc_common(new, orig->nslot);
> + return avtab_alloc_common(new, orig->nslot, orig->nel);
> }
>
> #ifdef CONFIG_SECURITY_SELINUX_DEBUG
> @@ -333,13 +335,13 @@ void avtab_hash_eval(struct avtab *h, const char *tag)
> max_chain_len = 0;
> chain2_len_sum = 0;
> for (i = 0; i < h->nslot; i++) {
> - cur = h->htable[i];
> + cur = avtab_get_chain(h, i);
> if (cur) {
> slots_used++;
> chain_len = 0;
> while (cur) {
> chain_len++;
> - cur = cur->next;
> + cur = avtab_get_node(h, cur->next);
> }
>
> if (chain_len > max_chain_len)
> @@ -627,8 +629,8 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
> return rc;
>
> for (i = 0; i < a->nslot; i++) {
> - for (cur = a->htable[i]; cur;
> - cur = cur->next) {
> + for (cur = avtab_get_chain(a, i); cur;
> + cur = avtab_get_node(a, cur->next)) {
> rc = avtab_write_item(p, cur, fp);
> if (rc)
> return rc;
> @@ -640,9 +642,6 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
>
> void __init avtab_cache_init(void)
> {
> - avtab_node_cachep = kmem_cache_create("avtab_node",
> - sizeof(struct avtab_node),
> - 0, SLAB_PANIC, NULL);
> avtab_xperms_cachep = kmem_cache_create("avtab_extended_perms",
> sizeof(struct avtab_extended_perms),
> 0, SLAB_PANIC, NULL);
> diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> index 3c3904bf02b0..72708aed5fff 100644
> --- a/security/selinux/ss/avtab.h
> +++ b/security/selinux/ss/avtab.h
> @@ -77,16 +77,38 @@ struct avtab_datum {
> struct avtab_node {
> struct avtab_key key;
> struct avtab_datum datum;
> - struct avtab_node *next;
> + u32 next;
Given that one of the common avtab lookup operations is to simply walk
the list, using the next field, why not keep the next field as a
pointer? Sure, you loose the 64-bit to 32-bit size reduction on 64-bit
systems (although I wonder if alignment issues rendered that moot), but
I would expect the walk to be quicker if you only needed to do a single
next pointer lookup as opposed to the array indexing.
I may be missing something, but I believe this would also remove the
need for the NULL_NODE_IDX sentinel.
> };
>
> struct avtab {
> - struct avtab_node **htable;
> + u32 *htable;
The field name "htable" is a little too confusing now, let's change
it to "indicies", "offsets", or something like that.
> + struct avtab_node *nodes;
> u32 nel; /* number of elements */
> u32 nslot; /* number of hash slots */
> u32 mask; /* mask to compute hash func */
> };
>
> +/* sentinel value to signal an empty node */
> +#define NULL_NODE_IDX (0)
> +/* compute the actual index into the nodes array */
> +#define NODES_ARRAY_IDX(idx) ((idx) - 1)
I'm pretty sure we can do away with NULL_NODE_IDX if we keep the next
field as a pointer (NULL works as a sentinel), does that also mean we
can get rid of the NODES_ARRAY_IDX translation?
> +static inline struct avtab_node *avtab_get_chain(struct avtab *h, u32 slot)
> +{
> + u32 chain_start = h->htable[slot];
> +
> + if (chain_start != NULL_NODE_IDX)
> + return &h->nodes[NODES_ARRAY_IDX(chain_start)];
> + return NULL;
Don't duplicate avtab_get_node() here, call it instead; for example:
struct avtab_node *avtab_get_chain(h, slot)
{
return avtab_get_node(h, h->table[slot]);
}
> +}
> +
> +static inline struct avtab_node *avtab_get_node(struct avtab *h, u32 idx)
> +{
> + if (idx != NULL_NODE_IDX)
> + return &h->nodes[NODES_ARRAY_IDX(idx)];
> + return NULL;
> +}
> +
> void avtab_init(struct avtab *h);
> int avtab_alloc(struct avtab *, u32);
> int avtab_alloc_dup(struct avtab *new, const struct avtab *orig);
> @@ -117,7 +139,7 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
> struct avtab_node *avtab_search_node(struct avtab *h,
> const struct avtab_key *key);
>
> -struct avtab_node *avtab_search_node_next(struct avtab_node *node, u16 specified);
> +struct avtab_node *avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified);
Another benefit of preserving the next field as a pointer is that we
should be able to leave the avtab_search_node_next() prototype intact.
Probably the implementation too.
> #define MAX_AVTAB_HASH_BITS 16
> #define MAX_AVTAB_HASH_BUCKETS (1 << MAX_AVTAB_HASH_BITS)
> diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> index 81ff676f209a..448bdacb1769 100644
> --- a/security/selinux/ss/conditional.c
> +++ b/security/selinux/ss/conditional.c
> @@ -262,6 +262,7 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> struct cond_insertf_data *data = ptr;
> struct policydb *p = data->p;
> struct cond_av_list *other = data->other;
> + struct avtab *cond_avtab = &p->te_cond_avtab;
> struct avtab_node *node_ptr;
> u32 i;
> bool found;
> @@ -285,9 +286,9 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> * be any other entries.
> */
> if (other) {
> - node_ptr = avtab_search_node(&p->te_cond_avtab, k);
> + node_ptr = avtab_search_node(cond_avtab, k);
> if (node_ptr) {
> - if (avtab_search_node_next(node_ptr, k->specified)) {
> + if (avtab_search_node_next(cond_avtab, node_ptr, k->specified)) {
> pr_err("SELinux: too many conflicting type rules.\n");
> return -EINVAL;
> }
> @@ -304,14 +305,14 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> }
> }
> } else {
> - if (avtab_search_node(&p->te_cond_avtab, k)) {
> + if (avtab_search_node(cond_avtab, k)) {
> pr_err("SELinux: conflicting type rules when adding type rule for true.\n");
> return -EINVAL;
> }
> }
> }
>
> - node_ptr = avtab_insert_nonunique(&p->te_cond_avtab, k, d);
> + node_ptr = avtab_insert_nonunique(cond_avtab, k, d);
> if (!node_ptr) {
> pr_err("SELinux: could not insert rule.\n");
> return -ENOMEM;
> @@ -563,7 +564,7 @@ void cond_compute_xperms(struct avtab *ctab, struct avtab_key *key,
> return;
>
> for (node = avtab_search_node(ctab, key); node;
> - node = avtab_search_node_next(node, key->specified)) {
> + node = avtab_search_node_next(ctab, node, key->specified)) {
> if (node->key.specified & AVTAB_ENABLED)
> services_compute_xperms_decision(xpermd, node);
> }
> @@ -580,7 +581,7 @@ void cond_compute_av(struct avtab *ctab, struct avtab_key *key,
> return;
>
> for (node = avtab_search_node(ctab, key); node;
> - node = avtab_search_node_next(node, key->specified)) {
> + node = avtab_search_node_next(ctab, node, key->specified)) {
> if ((u16)(AVTAB_ALLOWED|AVTAB_ENABLED) ==
> (node->key.specified & (AVTAB_ALLOWED|AVTAB_ENABLED)))
> avd->allowed |= node->datum.u.data;
> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> index 1eeffc66ea7d..89cb4fc2d4aa 100644
> --- a/security/selinux/ss/services.c
> +++ b/security/selinux/ss/services.c
> @@ -617,6 +617,7 @@ static void context_struct_compute_av(struct policydb *policydb,
> {
> struct constraint_node *constraint;
> struct role_allow *ra;
> + struct avtab *avtab;
> struct avtab_key avkey;
> struct avtab_node *node;
> struct class_datum *tclass_datum;
> @@ -644,6 +645,7 @@ static void context_struct_compute_av(struct policydb *policydb,
> * If a specific type enforcement rule was defined for
> * this permission check, then use it.
> */
> + avtab = &policydb->te_avtab;
> avkey.target_class = tclass;
> avkey.specified = AVTAB_AV | AVTAB_XPERMS;
> sattr = &policydb->type_attr_map_array[scontext->type - 1];
> @@ -652,10 +654,9 @@ static void context_struct_compute_av(struct policydb *policydb,
> ebitmap_for_each_positive_bit(tattr, tnode, j) {
> avkey.source_type = i + 1;
> avkey.target_type = j + 1;
> - for (node = avtab_search_node(&policydb->te_avtab,
> - &avkey);
> + for (node = avtab_search_node(avtab, &avkey);
> node;
> - node = avtab_search_node_next(node, avkey.specified)) {
> + node = avtab_search_node_next(avtab, node, avkey.specified)) {
> if (node->key.specified == AVTAB_ALLOWED)
> avd->allowed |= node->datum.u.data;
> else if (node->key.specified == AVTAB_AUDITALLOW)
> @@ -1008,6 +1009,7 @@ void security_compute_xperms_decision(u32 ssid,
> struct sidtab *sidtab;
> u16 tclass;
> struct context *scontext, *tcontext;
> + struct avtab *avtab;
> struct avtab_key avkey;
> struct avtab_node *node;
> struct ebitmap *sattr, *tattr;
> @@ -1055,6 +1057,7 @@ void security_compute_xperms_decision(u32 ssid,
> goto out;
> }
>
> + avtab = &policydb->te_avtab;
> avkey.target_class = tclass;
> avkey.specified = AVTAB_XPERMS;
> sattr = &policydb->type_attr_map_array[scontext->type - 1];
> @@ -1063,10 +1066,9 @@ void security_compute_xperms_decision(u32 ssid,
> ebitmap_for_each_positive_bit(tattr, tnode, j) {
> avkey.source_type = i + 1;
> avkey.target_type = j + 1;
> - for (node = avtab_search_node(&policydb->te_avtab,
> - &avkey);
> + for (node = avtab_search_node(avtab, &avkey);
> node;
> - node = avtab_search_node_next(node, avkey.specified))
> + node = avtab_search_node_next(avtab, node, avkey.specified))
> services_compute_xperms_decision(xpermd, node);
>
> cond_compute_xperms(&policydb->te_cond_avtab,
> @@ -1705,6 +1707,7 @@ static int security_compute_sid(u32 ssid,
> struct class_datum *cladatum;
> struct context *scontext, *tcontext, newcontext;
> struct sidtab_entry *sentry, *tentry;
> + struct avtab *cond_avtab;
> struct avtab_key avkey;
> struct avtab_node *avnode, *node;
> u16 tclass;
> @@ -1810,6 +1813,7 @@ static int security_compute_sid(u32 ssid,
> }
>
> /* Look for a type transition/member/change rule. */
> + cond_avtab = &policydb->te_cond_avtab;
> avkey.source_type = scontext->type;
> avkey.target_type = tcontext->type;
> avkey.target_class = tclass;
> @@ -1818,8 +1822,8 @@ static int security_compute_sid(u32 ssid,
>
> /* If no permanent rule, also check for enabled conditional rules */
> if (!avnode) {
> - node = avtab_search_node(&policydb->te_cond_avtab, &avkey);
> - for (; node; node = avtab_search_node_next(node, specified)) {
> + node = avtab_search_node(cond_avtab, &avkey);
> + for (; node; node = avtab_search_node_next(cond_avtab, node, specified)) {
> if (node->key.specified & AVTAB_ENABLED) {
> avnode = node;
> break;
> --
> 2.41.0
--
paul-moore.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/3] selinux: hweight optimization in avtab_read_item
2023-09-06 15:46 ` [PATCH 3/3] selinux: hweight optimization in avtab_read_item Jacob Satterfield
2023-09-06 17:18 ` Stephen Smalley
@ 2023-09-13 17:54 ` Paul Moore
1 sibling, 0 replies; 12+ messages in thread
From: Paul Moore @ 2023-09-13 17:54 UTC (permalink / raw)
To: Jacob Satterfield, selinux
Cc: Jacob Satterfield, stephen.smalley.work, omosnace
On Sep 6, 2023 Jacob Satterfield <jsatterfield.linux@gmail.com> wrote:
>
> avtab_read_item() is a hot function called when reading each rule in a
> binary policydb. With the current Fedora policy and refpolicy, this
> function is called nearly 100,000 times per policy load.
>
> A single avtab node is only permitted to have a single specifier to
> describe the data it holds. As such, a check is performed to make sure
> only one specifier is set. Previously this was done via a for-loop.
> However, there is already an optimal function for finding the number of
> bits set (hamming weight) and on some architectures, dedicated
> instructions (popcount) which can be executed much more efficiently.
>
> Even when using -mcpu=generic on a x86-64 Fedora 38 VM, this commit
> results in a modest 2-4% speedup for policy loading due to a substantial
> reduction in the number of instructions executed.
>
> Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> ---
> security/selinux/ss/avtab.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
Merged into selinux/next, thanks.
--
paul-moore.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-13 3:23 ` Paul Moore
@ 2023-09-14 21:57 ` Jacob Satterfield
2023-09-15 1:10 ` Paul Moore
0 siblings, 1 reply; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-14 21:57 UTC (permalink / raw)
To: Paul Moore; +Cc: Stephen Smalley, selinux, omosnace
On Tue, Sep 12, 2023 at 11:23 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Sep 6, 2023 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
> >
> > The current avtab hashtable employs a separate chaining collision
> > resolution strategy where each bucket/chain holds an ordered linked list
> > of pointers to kmem_cache allocated avtab_node elements.
> >
> > On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> > uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> > A call to kmem_cache_zalloc() is required for every single rule, which
> > in the default policy is currently 96,730 and continually rising.
> >
> > When both sets of avtab_node (regular and cond.) are turned into arrays
> > with the hash table slot indexing into it rather than a pointer, then
> > this results in only 2 additional kvcalloc() calls and the complete
> > removal of the kmem_cache itself.
> >
> > The caching characteristics of iterating a single array are better due
> > to locality of reference. Running "perf stat -r 100 -d load_policy"
> > has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> > with this single patch. Future patches focused on improving the hash
> > table's collision resolution strategy and array layout (struct-of-arrays
> > vs. array-of-structs) may elicit even more caching and therefore runtime
> > performance improvements.
> >
> > On 64-bit machines, the memory usage of the hash table slots is cut in
> > half due to the use of u32 indices instead of memory pointers. Since
> > ~65k hash slots are used between the regular and cond. tables with the
> > default Fedora 38 policy, this equates to around 256KB of memory saved.
> >
> > Notes:
> >
> > A couple helper functions avtab_get_chain() and avtab_get_node() are
> > introduced to provide more standardized interfaces for use in functions
> > that need to search through the hash table.
> >
> > NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> > in the hash table or the "next" member of an avtab_node are valid.
> >
> > This patch causes the cond. rules table to waste memory as the size
> > requested for the kvcalloc() is the size of the regular rules table.
> > These tables rarely, if ever, have the same number of rules in practice.
> > The next patch addresses this shortcoming.
>
> Isn't the more important issue that of there being more conditional
> rules than regular rules? Yes, I'm sure this is very unlikely, but
> given just this patch wouldn't this cause a problem?
>
> It is important to remember that even when combined in a patchset,
> any given patch should not break the system. A patch can depend on
> prior patches in the patchset, but not upcoming patches.
>
> I've only looked very briefly at patch 2/3, but I suspect at the very
> least you may need to squash patches 1/3 and 2/3 to ensure there is
> no breakage.
>
> More comments below (all are in the context of patch 1/3, some may not
> be relevant in the context of 1/3+2/3).
>
I will squash the two in the next version.
> > Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > ---
> > security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
> > security/selinux/ss/avtab.h | 28 ++++++++++--
> > security/selinux/ss/conditional.c | 13 +++---
> > security/selinux/ss/services.c | 20 +++++----
> > 4 files changed, 81 insertions(+), 55 deletions(-)
> >
> > diff --git a/security/selinux/ss/avtab.c b/security/selinux/ss/avtab.c
> > index 86d98a8e291b..620ea0a03e41 100644
> > --- a/security/selinux/ss/avtab.c
> > +++ b/security/selinux/ss/avtab.c
> > @@ -23,7 +23,6 @@
> > #include "avtab.h"
> > #include "policydb.h"
> >
> > -static struct kmem_cache *avtab_node_cachep __ro_after_init;
> > static struct kmem_cache *avtab_xperms_cachep __ro_after_init;
> >
> > /* Based on MurmurHash3, written by Austin Appleby and placed in the
> > @@ -70,17 +69,17 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
> > struct avtab_node *prev,
> > const struct avtab_key *key, const struct avtab_datum *datum)
> > {
> > + u32 newnodei;
> > struct avtab_node *newnode;
> > struct avtab_extended_perms *xperms;
> > - newnode = kmem_cache_zalloc(avtab_node_cachep, GFP_KERNEL);
> > - if (newnode == NULL)
> > - return NULL;
> > + newnodei = ++h->nel;
>
> This makes me nervous that some future code change forgets to
> decrement h->nel in its error handler. Let's only bump h->nel at the
> end of the function.
>
Agreed.
> We also should have a check in this function to ensure we don't
> overflow the array size.
>
When 1/3 + 2/3 are squashed, there is a check which grows the array if full.
> > + newnode = avtab_get_node(h, newnodei);
> > newnode->key = *key;
> >
> > if (key->specified & AVTAB_XPERMS) {
> > xperms = kmem_cache_zalloc(avtab_xperms_cachep, GFP_KERNEL);
> > if (xperms == NULL) {
> > - kmem_cache_free(avtab_node_cachep, newnode);
> > + --h->nel;
> > return NULL;
> > }
> > *xperms = *(datum->u.xperms);
> > @@ -91,15 +90,12 @@ avtab_insert_node(struct avtab *h, u32 hvalue,
> >
> > if (prev) {
> > newnode->next = prev->next;
> > - prev->next = newnode;
> > + prev->next = newnodei;
> > } else {
> > - struct avtab_node **n = &h->htable[hvalue];
> > -
> > - newnode->next = *n;
> > - *n = newnode;
> > + newnode->next = h->htable[hvalue];
> > + h->htable[hvalue] = newnodei;
> > }
>
> See my comments below about the next field in the avtab_node struct.
>
> > - h->nel++;
> > return newnode;
> > }
> >
> > @@ -114,9 +110,9 @@ static int avtab_insert(struct avtab *h, const struct avtab_key *key,
> > return -EINVAL;
> >
> > hvalue = avtab_hash(key, h->mask);
> > - for (prev = NULL, cur = h->htable[hvalue];
> > + for (prev = NULL, cur = avtab_get_chain(h, hvalue);
> > cur;
> > - prev = cur, cur = cur->next) {
> > + prev = cur, cur = avtab_get_node(h, cur->next)) {
> > if (key->source_type == cur->key.source_type &&
> > key->target_type == cur->key.target_type &&
> > key->target_class == cur->key.target_class &&
> > @@ -159,9 +155,9 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
> > if (!h || !h->nslot || h->nel == U32_MAX)
> > return NULL;
> > hvalue = avtab_hash(key, h->mask);
> > - for (prev = NULL, cur = h->htable[hvalue];
> > + for (prev = NULL, cur = avtab_get_chain(h, hvalue);
> > cur;
> > - prev = cur, cur = cur->next) {
> > + prev = cur, cur = avtab_get_node(h, cur->next)) {
> > if (key->source_type == cur->key.source_type &&
> > key->target_type == cur->key.target_type &&
> > key->target_class == cur->key.target_class &&
> > @@ -194,8 +190,8 @@ struct avtab_node *avtab_search_node(struct avtab *h,
> > return NULL;
> >
> > hvalue = avtab_hash(key, h->mask);
> > - for (cur = h->htable[hvalue]; cur;
> > - cur = cur->next) {
> > + for (cur = avtab_get_chain(h, hvalue); cur;
> > + cur = avtab_get_node(h, cur->next)) {
> > if (key->source_type == cur->key.source_type &&
> > key->target_type == cur->key.target_type &&
> > key->target_class == cur->key.target_class &&
> > @@ -216,7 +212,7 @@ struct avtab_node *avtab_search_node(struct avtab *h,
> > }
> >
> > struct avtab_node*
> > -avtab_search_node_next(struct avtab_node *node, u16 specified)
> > +avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified)
> > {
> > struct avtab_node *cur;
> >
> > @@ -224,7 +220,7 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
> > return NULL;
> >
> > specified &= ~(AVTAB_ENABLED|AVTAB_ENABLED_OLD);
> > - for (cur = node->next; cur; cur = cur->next) {
> > + for (cur = avtab_get_node(h, node->next); cur; cur = avtab_get_node(h, cur->next)) {
> > if (node->key.source_type == cur->key.source_type &&
> > node->key.target_type == cur->key.target_type &&
> > node->key.target_class == cur->key.target_class &&
> > @@ -247,24 +243,24 @@ avtab_search_node_next(struct avtab_node *node, u16 specified)
> > void avtab_destroy(struct avtab *h)
> > {
> > u32 i;
> > - struct avtab_node *cur, *temp;
> > + struct avtab_node *cur;
> >
> > if (!h)
> > return;
> >
> > for (i = 0; i < h->nslot; i++) {
> > - cur = h->htable[i];
> > + cur = avtab_get_chain(h, i);
> > while (cur) {
> > - temp = cur;
> > - cur = cur->next;
> > - if (temp->key.specified & AVTAB_XPERMS)
> > + if (cur->key.specified & AVTAB_XPERMS)
> > kmem_cache_free(avtab_xperms_cachep,
> > - temp->datum.u.xperms);
> > - kmem_cache_free(avtab_node_cachep, temp);
> > + cur->datum.u.xperms);
> > + cur = avtab_get_node(h, cur->next);
> > }
> > }
> > kvfree(h->htable);
> > + kvfree(h->nodes);
> > h->htable = NULL;
> > + h->nodes = NULL;
> > h->nel = 0;
> > h->nslot = 0;
> > h->mask = 0;
> > @@ -273,20 +269,26 @@ void avtab_destroy(struct avtab *h)
> > void avtab_init(struct avtab *h)
> > {
> > h->htable = NULL;
> > + h->nodes = NULL;
> > h->nel = 0;
> > h->nslot = 0;
> > h->mask = 0;
> > }
> >
> > -static int avtab_alloc_common(struct avtab *h, u32 nslot)
> > +static int avtab_alloc_common(struct avtab *h, u32 nslot, u32 nrules)
> > {
> > if (!nslot)
> > return 0;
> >
> > - h->htable = kvcalloc(nslot, sizeof(void *), GFP_KERNEL);
> > + h->htable = kvcalloc(nslot, sizeof(u32), GFP_KERNEL);
> > if (!h->htable)
> > return -ENOMEM;
> > -
> > + h->nodes = kvcalloc(nrules, sizeof(struct avtab_node), GFP_KERNEL);
> > + if (!h->nodes) {
> > + kvfree(h->htable);
> > + h->htable = NULL;
> > + return -ENOMEM;
> > + }
> > h->nslot = nslot;
> > h->mask = nslot - 1;
> > return 0;
> > @@ -308,7 +310,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
> > if (nslot > MAX_AVTAB_HASH_BUCKETS)
> > nslot = MAX_AVTAB_HASH_BUCKETS;
> >
> > - rc = avtab_alloc_common(h, nslot);
> > + rc = avtab_alloc_common(h, nslot, nrules);
> > if (rc)
> > return rc;
> > }
> > @@ -319,7 +321,7 @@ int avtab_alloc(struct avtab *h, u32 nrules)
> >
> > int avtab_alloc_dup(struct avtab *new, const struct avtab *orig)
> > {
> > - return avtab_alloc_common(new, orig->nslot);
> > + return avtab_alloc_common(new, orig->nslot, orig->nel);
> > }
> >
> > #ifdef CONFIG_SECURITY_SELINUX_DEBUG
> > @@ -333,13 +335,13 @@ void avtab_hash_eval(struct avtab *h, const char *tag)
> > max_chain_len = 0;
> > chain2_len_sum = 0;
> > for (i = 0; i < h->nslot; i++) {
> > - cur = h->htable[i];
> > + cur = avtab_get_chain(h, i);
> > if (cur) {
> > slots_used++;
> > chain_len = 0;
> > while (cur) {
> > chain_len++;
> > - cur = cur->next;
> > + cur = avtab_get_node(h, cur->next);
> > }
> >
> > if (chain_len > max_chain_len)
> > @@ -627,8 +629,8 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
> > return rc;
> >
> > for (i = 0; i < a->nslot; i++) {
> > - for (cur = a->htable[i]; cur;
> > - cur = cur->next) {
> > + for (cur = avtab_get_chain(a, i); cur;
> > + cur = avtab_get_node(a, cur->next)) {
> > rc = avtab_write_item(p, cur, fp);
> > if (rc)
> > return rc;
> > @@ -640,9 +642,6 @@ int avtab_write(struct policydb *p, struct avtab *a, void *fp)
> >
> > void __init avtab_cache_init(void)
> > {
> > - avtab_node_cachep = kmem_cache_create("avtab_node",
> > - sizeof(struct avtab_node),
> > - 0, SLAB_PANIC, NULL);
> > avtab_xperms_cachep = kmem_cache_create("avtab_extended_perms",
> > sizeof(struct avtab_extended_perms),
> > 0, SLAB_PANIC, NULL);
> > diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> > index 3c3904bf02b0..72708aed5fff 100644
> > --- a/security/selinux/ss/avtab.h
> > +++ b/security/selinux/ss/avtab.h
> > @@ -77,16 +77,38 @@ struct avtab_datum {
> > struct avtab_node {
> > struct avtab_key key;
> > struct avtab_datum datum;
> > - struct avtab_node *next;
> > + u32 next;
>
> Given that one of the common avtab lookup operations is to simply walk
> the list, using the next field, why not keep the next field as a
> pointer? Sure, you loose the 64-bit to 32-bit size reduction on 64-bit
> systems (although I wonder if alignment issues rendered that moot), but
> I would expect the walk to be quicker if you only needed to do a single
> next pointer lookup as opposed to the array indexing.
>
> I may be missing something, but I believe this would also remove the
> need for the NULL_NODE_IDX sentinel.
>
The conditional avtab is the reason an index was used instead of a
pointer. Because it doesn't have the right hint when allocating the
nodes array, we have to copy the nodes array into a smaller buffer
after reading the rules (otherwise we waste about 1.7 MB of memory).
The copy invalidates the pointers that each "next" refers to as they
would be pointing to free'd memory instead of the new nodes array.
However, I agree that the walk would be quicker with the direct
pointer reference instead of the array index and with the way the
struct alignment works, the index here doesn't save memory.
An alternative I have prototyped is to use pointer math to calculate
an offset from the original nodes array to where "next" is pointing to
and add it from the new array to calculate the new "next" value. This
changes "next" back to a pointer and gets rid of the indirection.
I agree that the NULL_NODE_IDX sentinel is not very ergonomic, however
it's at the cost of 256 KB of memory in relation to the htable slots.
Mainly due to the difficulty of telling when a slot is empty with an
index vs. a pointer. I could leave the htable slots as indexes, but it
would still require some form of sentinel value to indicate the
emptiness of a slot. What would you prefer?
> > };
> >
> > struct avtab {
> > - struct avtab_node **htable;
> > + u32 *htable;
>
> The field name "htable" is a little too confusing now, let's change
> it to "indicies", "offsets", or something like that.
>
Agreed, what about "slots" to coincide with the "nslot" variable below?
> > + struct avtab_node *nodes;
> > u32 nel; /* number of elements */
> > u32 nslot; /* number of hash slots */
> > u32 mask; /* mask to compute hash func */
> > };
> >
> > +/* sentinel value to signal an empty node */
> > +#define NULL_NODE_IDX (0)
> > +/* compute the actual index into the nodes array */
> > +#define NODES_ARRAY_IDX(idx) ((idx) - 1)
>
> I'm pretty sure we can do away with NULL_NODE_IDX if we keep the next
> field as a pointer (NULL works as a sentinel), does that also mean we
> can get rid of the NODES_ARRAY_IDX translation?
>
See comment above. There is a tradeoff with memory if the sentinel and
translation are removed, but they are not required to use arrays
instead of a linked-list. It's up to you as to what is preferred.
> > +static inline struct avtab_node *avtab_get_chain(struct avtab *h, u32 slot)
> > +{
> > + u32 chain_start = h->htable[slot];
> > +
> > + if (chain_start != NULL_NODE_IDX)
> > + return &h->nodes[NODES_ARRAY_IDX(chain_start)];
> > + return NULL;
>
> Don't duplicate avtab_get_node() here, call it instead; for example:
>
> struct avtab_node *avtab_get_chain(h, slot)
> {
> return avtab_get_node(h, h->table[slot]);
> }
>
Good call.
> > +}
> > +
> > +static inline struct avtab_node *avtab_get_node(struct avtab *h, u32 idx)
> > +{
> > + if (idx != NULL_NODE_IDX)
> > + return &h->nodes[NODES_ARRAY_IDX(idx)];
> > + return NULL;
> > +}
> > +
> > void avtab_init(struct avtab *h);
> > int avtab_alloc(struct avtab *, u32);
> > int avtab_alloc_dup(struct avtab *new, const struct avtab *orig);
> > @@ -117,7 +139,7 @@ struct avtab_node *avtab_insert_nonunique(struct avtab *h,
> > struct avtab_node *avtab_search_node(struct avtab *h,
> > const struct avtab_key *key);
> >
> > -struct avtab_node *avtab_search_node_next(struct avtab_node *node, u16 specified);
> > +struct avtab_node *avtab_search_node_next(struct avtab *h, struct avtab_node *node, u16 specified);
>
> Another benefit of preserving the next field as a pointer is that we
> should be able to leave the avtab_search_node_next() prototype intact.
> Probably the implementation too.
>
I agree and have changed the "next" field back to a pointer. This
reverts the prototype back.
> > #define MAX_AVTAB_HASH_BITS 16
> > #define MAX_AVTAB_HASH_BUCKETS (1 << MAX_AVTAB_HASH_BITS)
> > diff --git a/security/selinux/ss/conditional.c b/security/selinux/ss/conditional.c
> > index 81ff676f209a..448bdacb1769 100644
> > --- a/security/selinux/ss/conditional.c
> > +++ b/security/selinux/ss/conditional.c
> > @@ -262,6 +262,7 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> > struct cond_insertf_data *data = ptr;
> > struct policydb *p = data->p;
> > struct cond_av_list *other = data->other;
> > + struct avtab *cond_avtab = &p->te_cond_avtab;
> > struct avtab_node *node_ptr;
> > u32 i;
> > bool found;
> > @@ -285,9 +286,9 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> > * be any other entries.
> > */
> > if (other) {
> > - node_ptr = avtab_search_node(&p->te_cond_avtab, k);
> > + node_ptr = avtab_search_node(cond_avtab, k);
> > if (node_ptr) {
> > - if (avtab_search_node_next(node_ptr, k->specified)) {
> > + if (avtab_search_node_next(cond_avtab, node_ptr, k->specified)) {
> > pr_err("SELinux: too many conflicting type rules.\n");
> > return -EINVAL;
> > }
> > @@ -304,14 +305,14 @@ static int cond_insertf(struct avtab *a, const struct avtab_key *k,
> > }
> > }
> > } else {
> > - if (avtab_search_node(&p->te_cond_avtab, k)) {
> > + if (avtab_search_node(cond_avtab, k)) {
> > pr_err("SELinux: conflicting type rules when adding type rule for true.\n");
> > return -EINVAL;
> > }
> > }
> > }
> >
> > - node_ptr = avtab_insert_nonunique(&p->te_cond_avtab, k, d);
> > + node_ptr = avtab_insert_nonunique(cond_avtab, k, d);
> > if (!node_ptr) {
> > pr_err("SELinux: could not insert rule.\n");
> > return -ENOMEM;
> > @@ -563,7 +564,7 @@ void cond_compute_xperms(struct avtab *ctab, struct avtab_key *key,
> > return;
> >
> > for (node = avtab_search_node(ctab, key); node;
> > - node = avtab_search_node_next(node, key->specified)) {
> > + node = avtab_search_node_next(ctab, node, key->specified)) {
> > if (node->key.specified & AVTAB_ENABLED)
> > services_compute_xperms_decision(xpermd, node);
> > }
> > @@ -580,7 +581,7 @@ void cond_compute_av(struct avtab *ctab, struct avtab_key *key,
> > return;
> >
> > for (node = avtab_search_node(ctab, key); node;
> > - node = avtab_search_node_next(node, key->specified)) {
> > + node = avtab_search_node_next(ctab, node, key->specified)) {
> > if ((u16)(AVTAB_ALLOWED|AVTAB_ENABLED) ==
> > (node->key.specified & (AVTAB_ALLOWED|AVTAB_ENABLED)))
> > avd->allowed |= node->datum.u.data;
> > diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> > index 1eeffc66ea7d..89cb4fc2d4aa 100644
> > --- a/security/selinux/ss/services.c
> > +++ b/security/selinux/ss/services.c
> > @@ -617,6 +617,7 @@ static void context_struct_compute_av(struct policydb *policydb,
> > {
> > struct constraint_node *constraint;
> > struct role_allow *ra;
> > + struct avtab *avtab;
> > struct avtab_key avkey;
> > struct avtab_node *node;
> > struct class_datum *tclass_datum;
> > @@ -644,6 +645,7 @@ static void context_struct_compute_av(struct policydb *policydb,
> > * If a specific type enforcement rule was defined for
> > * this permission check, then use it.
> > */
> > + avtab = &policydb->te_avtab;
> > avkey.target_class = tclass;
> > avkey.specified = AVTAB_AV | AVTAB_XPERMS;
> > sattr = &policydb->type_attr_map_array[scontext->type - 1];
> > @@ -652,10 +654,9 @@ static void context_struct_compute_av(struct policydb *policydb,
> > ebitmap_for_each_positive_bit(tattr, tnode, j) {
> > avkey.source_type = i + 1;
> > avkey.target_type = j + 1;
> > - for (node = avtab_search_node(&policydb->te_avtab,
> > - &avkey);
> > + for (node = avtab_search_node(avtab, &avkey);
> > node;
> > - node = avtab_search_node_next(node, avkey.specified)) {
> > + node = avtab_search_node_next(avtab, node, avkey.specified)) {
> > if (node->key.specified == AVTAB_ALLOWED)
> > avd->allowed |= node->datum.u.data;
> > else if (node->key.specified == AVTAB_AUDITALLOW)
> > @@ -1008,6 +1009,7 @@ void security_compute_xperms_decision(u32 ssid,
> > struct sidtab *sidtab;
> > u16 tclass;
> > struct context *scontext, *tcontext;
> > + struct avtab *avtab;
> > struct avtab_key avkey;
> > struct avtab_node *node;
> > struct ebitmap *sattr, *tattr;
> > @@ -1055,6 +1057,7 @@ void security_compute_xperms_decision(u32 ssid,
> > goto out;
> > }
> >
> > + avtab = &policydb->te_avtab;
> > avkey.target_class = tclass;
> > avkey.specified = AVTAB_XPERMS;
> > sattr = &policydb->type_attr_map_array[scontext->type - 1];
> > @@ -1063,10 +1066,9 @@ void security_compute_xperms_decision(u32 ssid,
> > ebitmap_for_each_positive_bit(tattr, tnode, j) {
> > avkey.source_type = i + 1;
> > avkey.target_type = j + 1;
> > - for (node = avtab_search_node(&policydb->te_avtab,
> > - &avkey);
> > + for (node = avtab_search_node(avtab, &avkey);
> > node;
> > - node = avtab_search_node_next(node, avkey.specified))
> > + node = avtab_search_node_next(avtab, node, avkey.specified))
> > services_compute_xperms_decision(xpermd, node);
> >
> > cond_compute_xperms(&policydb->te_cond_avtab,
> > @@ -1705,6 +1707,7 @@ static int security_compute_sid(u32 ssid,
> > struct class_datum *cladatum;
> > struct context *scontext, *tcontext, newcontext;
> > struct sidtab_entry *sentry, *tentry;
> > + struct avtab *cond_avtab;
> > struct avtab_key avkey;
> > struct avtab_node *avnode, *node;
> > u16 tclass;
> > @@ -1810,6 +1813,7 @@ static int security_compute_sid(u32 ssid,
> > }
> >
> > /* Look for a type transition/member/change rule. */
> > + cond_avtab = &policydb->te_cond_avtab;
> > avkey.source_type = scontext->type;
> > avkey.target_type = tcontext->type;
> > avkey.target_class = tclass;
> > @@ -1818,8 +1822,8 @@ static int security_compute_sid(u32 ssid,
> >
> > /* If no permanent rule, also check for enabled conditional rules */
> > if (!avnode) {
> > - node = avtab_search_node(&policydb->te_cond_avtab, &avkey);
> > - for (; node; node = avtab_search_node_next(node, specified)) {
> > + node = avtab_search_node(cond_avtab, &avkey);
> > + for (; node; node = avtab_search_node_next(cond_avtab, node, specified)) {
> > if (node->key.specified & AVTAB_ENABLED) {
> > avnode = node;
> > break;
> > --
> > 2.41.0
>
> --
> paul-moore.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-14 21:57 ` Jacob Satterfield
@ 2023-09-15 1:10 ` Paul Moore
2023-09-18 0:12 ` Jacob Satterfield
0 siblings, 1 reply; 12+ messages in thread
From: Paul Moore @ 2023-09-15 1:10 UTC (permalink / raw)
To: Jacob Satterfield; +Cc: Stephen Smalley, selinux, omosnace
On Thu, Sep 14, 2023 at 5:57 PM Jacob Satterfield
<jsatterfield.linux@gmail.com> wrote:
>
> On Tue, Sep 12, 2023 at 11:23 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Sep 6, 2023 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
> > >
> > > The current avtab hashtable employs a separate chaining collision
> > > resolution strategy where each bucket/chain holds an ordered linked list
> > > of pointers to kmem_cache allocated avtab_node elements.
> > >
> > > On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> > > uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> > > A call to kmem_cache_zalloc() is required for every single rule, which
> > > in the default policy is currently 96,730 and continually rising.
> > >
> > > When both sets of avtab_node (regular and cond.) are turned into arrays
> > > with the hash table slot indexing into it rather than a pointer, then
> > > this results in only 2 additional kvcalloc() calls and the complete
> > > removal of the kmem_cache itself.
> > >
> > > The caching characteristics of iterating a single array are better due
> > > to locality of reference. Running "perf stat -r 100 -d load_policy"
> > > has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> > > with this single patch. Future patches focused on improving the hash
> > > table's collision resolution strategy and array layout (struct-of-arrays
> > > vs. array-of-structs) may elicit even more caching and therefore runtime
> > > performance improvements.
> > >
> > > On 64-bit machines, the memory usage of the hash table slots is cut in
> > > half due to the use of u32 indices instead of memory pointers. Since
> > > ~65k hash slots are used between the regular and cond. tables with the
> > > default Fedora 38 policy, this equates to around 256KB of memory saved.
> > >
> > > Notes:
> > >
> > > A couple helper functions avtab_get_chain() and avtab_get_node() are
> > > introduced to provide more standardized interfaces for use in functions
> > > that need to search through the hash table.
> > >
> > > NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> > > in the hash table or the "next" member of an avtab_node are valid.
> > >
> > > This patch causes the cond. rules table to waste memory as the size
> > > requested for the kvcalloc() is the size of the regular rules table.
> > > These tables rarely, if ever, have the same number of rules in practice.
> > > The next patch addresses this shortcoming.
> >
> > Isn't the more important issue that of there being more conditional
> > rules than regular rules? Yes, I'm sure this is very unlikely, but
> > given just this patch wouldn't this cause a problem?
> >
> > It is important to remember that even when combined in a patchset,
> > any given patch should not break the system. A patch can depend on
> > prior patches in the patchset, but not upcoming patches.
> >
> > I've only looked very briefly at patch 2/3, but I suspect at the very
> > least you may need to squash patches 1/3 and 2/3 to ensure there is
> > no breakage.
> >
> > More comments below (all are in the context of patch 1/3, some may not
> > be relevant in the context of 1/3+2/3).
>
> I will squash the two in the next version.
>
> > > Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> > > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > > ---
> > > security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
> > > security/selinux/ss/avtab.h | 28 ++++++++++--
> > > security/selinux/ss/conditional.c | 13 +++---
> > > security/selinux/ss/services.c | 20 +++++----
> > > 4 files changed, 81 insertions(+), 55 deletions(-)
...
0, SLAB_PANIC, NULL);
> > > diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> > > index 3c3904bf02b0..72708aed5fff 100644
> > > --- a/security/selinux/ss/avtab.h
> > > +++ b/security/selinux/ss/avtab.h
> > > @@ -77,16 +77,38 @@ struct avtab_datum {
> > > struct avtab_node {
> > > struct avtab_key key;
> > > struct avtab_datum datum;
> > > - struct avtab_node *next;
> > > + u32 next;
> >
> > Given that one of the common avtab lookup operations is to simply walk
> > the list, using the next field, why not keep the next field as a
> > pointer? Sure, you loose the 64-bit to 32-bit size reduction on 64-bit
> > systems (although I wonder if alignment issues rendered that moot), but
> > I would expect the walk to be quicker if you only needed to do a single
> > next pointer lookup as opposed to the array indexing.
> >
> > I may be missing something, but I believe this would also remove the
> > need for the NULL_NODE_IDX sentinel.
> >
>
> The conditional avtab is the reason an index was used instead of a
> pointer. Because it doesn't have the right hint when allocating the
> nodes array, we have to copy the nodes array into a smaller buffer
> after reading the rules (otherwise we waste about 1.7 MB of memory).
> The copy invalidates the pointers that each "next" refers to as they
> would be pointing to free'd memory instead of the new nodes array.
> However, I agree that the walk would be quicker with the direct
> pointer reference instead of the array index and with the way the
> struct alignment works, the index here doesn't save memory.
>
> An alternative I have prototyped is to use pointer math to calculate
> an offset from the original nodes array to where "next" is pointing to
> and add it from the new array to calculate the new "next" value. This
> changes "next" back to a pointer and gets rid of the indirection.
>
> I agree that the NULL_NODE_IDX sentinel is not very ergonomic, however
> it's at the cost of 256 KB of memory in relation to the htable slots.
> Mainly due to the difficulty of telling when a slot is empty with an
> index vs. a pointer. I could leave the htable slots as indexes, but it
> would still require some form of sentinel value to indicate the
> emptiness of a slot. What would you prefer?
In addition to other changes we've discussed, it looks like squashing
1/3 and 2/3 is going to change things quite a bit, so I'm going to
suggest you make the changes you're planning to make, including
squashing the two patches, and repost. We can re-review the new
revision and pick up the discussion from there.
Does that sound okay?
--
paul-moore.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] selinux: use arrays for avtab hashtable nodes
2023-09-15 1:10 ` Paul Moore
@ 2023-09-18 0:12 ` Jacob Satterfield
0 siblings, 0 replies; 12+ messages in thread
From: Jacob Satterfield @ 2023-09-18 0:12 UTC (permalink / raw)
To: Paul Moore; +Cc: Stephen Smalley, selinux, omosnace
Yes, it does. I will post a new patch set next week.
Thanks for reviewing.
On Fri, Sep 15, 2023 at 1:10 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Sep 14, 2023 at 5:57 PM Jacob Satterfield
> <jsatterfield.linux@gmail.com> wrote:
> >
> > On Tue, Sep 12, 2023 at 11:23 PM Paul Moore <paul@paul-moore.com> wrote:
> > >
> > > On Sep 6, 2023 Stephen Smalley <stephen.smalley.work@gmail.com> wrote:
> > > >
> > > > The current avtab hashtable employs a separate chaining collision
> > > > resolution strategy where each bucket/chain holds an ordered linked list
> > > > of pointers to kmem_cache allocated avtab_node elements.
> > > >
> > > > On Fedora 38 (x86_64) using the default policy, avtab_node_cachep
> > > > uses 573 slabs each containing 170 objects totaling 2,337,840 bytes.
> > > > A call to kmem_cache_zalloc() is required for every single rule, which
> > > > in the default policy is currently 96,730 and continually rising.
> > > >
> > > > When both sets of avtab_node (regular and cond.) are turned into arrays
> > > > with the hash table slot indexing into it rather than a pointer, then
> > > > this results in only 2 additional kvcalloc() calls and the complete
> > > > removal of the kmem_cache itself.
> > > >
> > > > The caching characteristics of iterating a single array are better due
> > > > to locality of reference. Running "perf stat -r 100 -d load_policy"
> > > > has shown a runtime reduction of at least 10% on a Fedora 38 x86_64 VM
> > > > with this single patch. Future patches focused on improving the hash
> > > > table's collision resolution strategy and array layout (struct-of-arrays
> > > > vs. array-of-structs) may elicit even more caching and therefore runtime
> > > > performance improvements.
> > > >
> > > > On 64-bit machines, the memory usage of the hash table slots is cut in
> > > > half due to the use of u32 indices instead of memory pointers. Since
> > > > ~65k hash slots are used between the regular and cond. tables with the
> > > > default Fedora 38 policy, this equates to around 256KB of memory saved.
> > > >
> > > > Notes:
> > > >
> > > > A couple helper functions avtab_get_chain() and avtab_get_node() are
> > > > introduced to provide more standardized interfaces for use in functions
> > > > that need to search through the hash table.
> > > >
> > > > NULL_NODE_IDX defines a sentinel value which helps determinine if a spot
> > > > in the hash table or the "next" member of an avtab_node are valid.
> > > >
> > > > This patch causes the cond. rules table to waste memory as the size
> > > > requested for the kvcalloc() is the size of the regular rules table.
> > > > These tables rarely, if ever, have the same number of rules in practice.
> > > > The next patch addresses this shortcoming.
> > >
> > > Isn't the more important issue that of there being more conditional
> > > rules than regular rules? Yes, I'm sure this is very unlikely, but
> > > given just this patch wouldn't this cause a problem?
> > >
> > > It is important to remember that even when combined in a patchset,
> > > any given patch should not break the system. A patch can depend on
> > > prior patches in the patchset, but not upcoming patches.
> > >
> > > I've only looked very briefly at patch 2/3, but I suspect at the very
> > > least you may need to squash patches 1/3 and 2/3 to ensure there is
> > > no breakage.
> > >
> > > More comments below (all are in the context of patch 1/3, some may not
> > > be relevant in the context of 1/3+2/3).
> >
> > I will squash the two in the next version.
> >
> > > > Signed-off-by: Jacob Satterfield <jsatterfield.linux@gmail.com>
> > > > Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
> > > > ---
> > > > security/selinux/ss/avtab.c | 75 +++++++++++++++----------------
> > > > security/selinux/ss/avtab.h | 28 ++++++++++--
> > > > security/selinux/ss/conditional.c | 13 +++---
> > > > security/selinux/ss/services.c | 20 +++++----
> > > > 4 files changed, 81 insertions(+), 55 deletions(-)
>
> ...
> 0, SLAB_PANIC, NULL);
> > > > diff --git a/security/selinux/ss/avtab.h b/security/selinux/ss/avtab.h
> > > > index 3c3904bf02b0..72708aed5fff 100644
> > > > --- a/security/selinux/ss/avtab.h
> > > > +++ b/security/selinux/ss/avtab.h
> > > > @@ -77,16 +77,38 @@ struct avtab_datum {
> > > > struct avtab_node {
> > > > struct avtab_key key;
> > > > struct avtab_datum datum;
> > > > - struct avtab_node *next;
> > > > + u32 next;
> > >
> > > Given that one of the common avtab lookup operations is to simply walk
> > > the list, using the next field, why not keep the next field as a
> > > pointer? Sure, you loose the 64-bit to 32-bit size reduction on 64-bit
> > > systems (although I wonder if alignment issues rendered that moot), but
> > > I would expect the walk to be quicker if you only needed to do a single
> > > next pointer lookup as opposed to the array indexing.
> > >
> > > I may be missing something, but I believe this would also remove the
> > > need for the NULL_NODE_IDX sentinel.
> > >
> >
> > The conditional avtab is the reason an index was used instead of a
> > pointer. Because it doesn't have the right hint when allocating the
> > nodes array, we have to copy the nodes array into a smaller buffer
> > after reading the rules (otherwise we waste about 1.7 MB of memory).
> > The copy invalidates the pointers that each "next" refers to as they
> > would be pointing to free'd memory instead of the new nodes array.
> > However, I agree that the walk would be quicker with the direct
> > pointer reference instead of the array index and with the way the
> > struct alignment works, the index here doesn't save memory.
> >
> > An alternative I have prototyped is to use pointer math to calculate
> > an offset from the original nodes array to where "next" is pointing to
> > and add it from the new array to calculate the new "next" value. This
> > changes "next" back to a pointer and gets rid of the indirection.
> >
> > I agree that the NULL_NODE_IDX sentinel is not very ergonomic, however
> > it's at the cost of 256 KB of memory in relation to the htable slots.
> > Mainly due to the difficulty of telling when a slot is empty with an
> > index vs. a pointer. I could leave the htable slots as indexes, but it
> > would still require some form of sentinel value to indicate the
> > emptiness of a slot. What would you prefer?
>
> In addition to other changes we've discussed, it looks like squashing
> 1/3 and 2/3 is going to change things quite a bit, so I'm going to
> suggest you make the changes you're planning to make, including
> squashing the two patches, and repost. We can re-review the new
> revision and pick up the discussion from there.
>
> Does that sound okay?
>
> --
> paul-moore.com
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-09-18 0:14 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-06 15:46 [PATCH 0/3] avtab hotspot optimizations Jacob Satterfield
2023-09-06 15:46 ` [PATCH 1/3] selinux: use arrays for avtab hashtable nodes Jacob Satterfield
2023-09-06 17:16 ` Stephen Smalley
2023-09-13 3:23 ` Paul Moore
2023-09-14 21:57 ` Jacob Satterfield
2023-09-15 1:10 ` Paul Moore
2023-09-18 0:12 ` Jacob Satterfield
2023-09-06 15:46 ` [PATCH 2/3] selinux: shrink conditional avtab node array Jacob Satterfield
2023-09-06 17:17 ` Stephen Smalley
2023-09-06 15:46 ` [PATCH 3/3] selinux: hweight optimization in avtab_read_item Jacob Satterfield
2023-09-06 17:18 ` Stephen Smalley
2023-09-13 17:54 ` Paul Moore
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.