From: Stefan Berger <stefanb@linux.ibm.com>
To: Mimi Zohar <zohar@linux.ibm.com>, linux-integrity@vger.kernel.org
Cc: serge@hallyn.com, christian.brauner@ubuntu.com,
containers@lists.linux.dev, dmitry.kasatkin@gmail.com,
ebiederm@xmission.com, krzysztof.struczynski@huawei.com,
roberto.sassu@huawei.com, mpeters@redhat.com, lhinds@redhat.com,
lsturman@redhat.com, puiterwi@redhat.com, jejb@linux.ibm.com,
jamjoom@us.ibm.com, linux-kernel@vger.kernel.org,
paul@paul-moore.com, rgb@redhat.com,
linux-security-module@vger.kernel.org, jmorris@namei.org,
Mehmet Kayaalp <mkayaalp@linux.vnet.ibm.com>
Subject: Re: [PATCH v10 18/27] integrity/ima: Define ns_status for storing namespaced iint data
Date: Wed, 23 Feb 2022 21:21:48 -0500 [thread overview]
Message-ID: <92e1fc33-b97f-b99e-4f28-1d05a07c2f2f@linux.ibm.com> (raw)
In-Reply-To: <d94928dcb87550b7d5cfe277eed8a195ad9c877c.camel@linux.ibm.com>
On 2/23/22 11:12, Mimi Zohar wrote:
> On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
>> From: Mehmet Kayaalp <mkayaalp@linux.vnet.ibm.com>
>>
>> Add an rbtree to the IMA namespace structure that stores a namespaced
>> version of iint->flags in ns_status struct. Similar to the
>> integrity_iint_cache, both the iint and ns_status are looked up using the
>> inode pointer value. The lookup, allocate, and insertion code is also
>> similar.
>>
>> In subsequent patches we will have to find all ns_status entries an iint
>> is being used in and reset flags there. To do this, connect a list of
>> ns_status to the integrity_iint_cache and provide a reader-writer
>> lock in the integrity_iint_cache to lock access to the list.
>>
>> To simplify the code in the non-namespaces case embed an ns_status in
>> the integrity_iint_cache and have it linked into the iint's ns_status list
>> when calling ima_get_ns_status().
>>
>> When getting an ns_status first try to find it in the RB tree. Here we can
>> run into the situation that an ns_status found in the RB tree has a
>> different iint associated with it for the same inode. In this case we need
>> to delete the ns_status structure and get a new one.
>>
>> There are two cases for freeing:
>> - when the iint is freed (inode deletion): Walk the list of ns_status
>> entries and disconnect each ns_status from the list; take the
>> writer lock to protect access to the list; also, take the item off the
>> per-namespace rbtree
>>
>> - when the ima_namepace is freed: While walking the rbtree, remove the
>> ns_status from the list while also holding the iint's writer lock;
>> to be able to grab the lock we have to have a pointer to the iint on
>> the ns_status structure.
>>
>> To avoid an ns_status to be freed by the two cases concurrently, prevent
>> these two cases to run concurrently. Therefore, groups of threads
>> deleting either inodes or ima_namespaces are allowed to run concurrently
>> but no two threads may run and one delete an inode and the other an
>> ima_namespace.
> The locking involved here is really complex. I'm sure you thought
> about it a lot, but isn't there a better alternative?
I am afraid this is a difficult question and a short and concise answer
is not possible...
The complexity of the locking is driven by concurrency and the data
structures that are involved. The data structures (existing global iint
rbtree, ns_status structure, and per namespace rbtree for ns_status) and
how they are organized and connected (via linked lists) are a
consequence of the fact that we need to be able to handle files shared
between IMA namespaces (and the host) so that re-auditing, re-measuring
and re-appraisal of files after file modifications or modifications of
the security.ima xattr (by any namespaces) can be done efficiently.
Furthermore, it helps to efficiently remove all the status information
that an IMA namespace has kept for files it audited/measured/appraised.
The goal was to make this as scalable as possible by having each
namespace get out of the way of other namespaces by preventing them from
locking each other out too much. The single biggest problem are files
shared between IMA namespaces.
The best argument for the design I can come up with is the 'Big O
notation' describing the time complexity of operations.
The existing global iint rbtree maintains IMA status information for
each inode. Lookup and insertion of data into the gloab iint rbtree is
O(log(n)), thus optimal.
To accommodate re-auditing/re-measurement/re-appraisal, which is driven
by resetting status flags, I connected a list of ns_status structures,
in which each namespace maintains its status information for each inode,
to the iint maintained in that global rbtree. The resetting of status
flags is fast because traversing the list after a lookup in the tree is
O(n). Lookup + resetting the flags therefore is O(log(n) + n). If the
list didn't exist we would have to search all IMA namespaces for the
inode to be able to reset the flags, resulting in O(n * log(n)) time
complexity, which is of course much worse. So, the list of ns_status
linked to an iint has a good reason: better time complexity to traverse
the list and reset status flags. Beside that it also supports fast
handling of deletion of files where the iint has to be delete from the
global rbtree and the ns_status list it holds must also be deleted (each
ns_status also needs to be delete from a per IMA-namespace rbtree then)
There's also a per-IMA namespace rbtree for each inode that serves two
purposes:
a) Fast lookup of ns_status (O(log(n)) for an IMA namespace; at least to
insert an ns_status into this tree we need not write-lock the iint tree
but the initial iint creation required the write-locking of the iint tree
b) Maintaining a collection of inodes that the namespace has
audited/measured/appraised for efficient deletion upon IMA namespace
teardown: We can traverse this tree in O(n) time and determine which
iints have no more namespace users and delete them from the iint tree.
Now the dilemma with this is that an ns_status structure is connected to
a list hanging off the iint and on top of this it is part of an rbtree.
And this is where the 'group locking' is coming from. What we need to
prevent is that an ns_status is deleted from its iint list (when a file
is deleted) while it is also deleted from the per-IMA namespace rbtree
(when the namespace is deleted). Both must not be done concurrently.
What is possible is that a group of threads may tear down namespaces and
the other group may act on file deletion, but threads from both groups
must not run concurrently.
Now we can at least look at two alternatives for the per-IMA namespace
rbtree.
1) One alternative is to use a list instead of an rbtree. We would loose
the fast lookup via the per IMA namespace tree and get O(n) lookup times
but quick insertion into the list [O(1)]. We still would have the
collection of inodes. And we would still have the dilemma that an
ns_status would be connected to two lists, thus requiring the group
locking. I don't think using a list instead of an rbtree is a solution.
2) We could try to get rid of the per-IMA namespace rbtree altogether
and just use the global iint rbtree that exists today with a list of
ns_status connected to its iints. If we do this we would loose the
knowledge of which inodes a namespace has an ns_status structure for.
The only way we would find this is by traversing the global iint tree
(O(n)) and follow each iint list (O(m)) to see whether we find an
ns_status holding information about the iint. The time complexity for
this would be O(n*m) but much less than O(n^2). A downside would also be
that we would have to keep a lock on the global iint rbtree while
traversing it, thus locking out those that want to add inodes to the
tree. On the upside it would allow us to get rid of the group locking.
Lookup of an ns_status in the global iint tree would be O(n) + O(m) and
insertion would be O(n) + O(1).
Certainly, the alternative is 2) with its own trade-offs. My guess is
some sort of yielding could probably also be helpful there then to avoid
blocking higher priority operations than deleting of a namespace.
>> +/*
>> + * ima_free_ns_status_tree - free all items on the ns_status_tree and take each
>> + * one off the list; yield to ns_list free'ers
>> + *
>> + * This function is called when an ima_namespace is freed. All entries in the
>> + * rbtree will be taken off their list and collected in a garbage collection
>> + * list and freed at the end. This allows to walk the rbtree again.
>> + */
>> +void ima_free_ns_status_tree(struct ima_namespace *ns)
>> +{
>> + struct ns_status *ns_status, *next;
>> + struct llist_node *node;
>> + LLIST_HEAD(garbage);
>> + unsigned int ctr;
>> + bool restart;
>> +
>> + do {
>> + ctr = 0;
>> + restart = false;
>> +
>> + lock_group(GRP_NS_STATUS_TREE);
>> + write_lock(&ns->ns_tree_lock);
>> +
>> + rbtree_postorder_for_each_entry_safe(ns_status, next,
>> + &ns->ns_status_tree,
>> + rb_node) {
>> + write_lock(&ns_status->iint->ns_list_lock);
>> + if (!list_empty(&ns_status->ns_next)) {
>> + list_del_init(&ns_status->ns_next);
>> + llist_add(&ns_status->gc_llist, &garbage);
>> + ctr++;
>> + }
> At this point when the namespace is being deleted, no entries are being
> added to the rbtree, so it is safe to remove the nodes here. There's
> no need to first create a list and then remove them.
We are traversing the per-namespace rbtree here and later on in this
series (21/27) we try to determine which iints are now unused in the
global iint tree and delete the iints there. Due to the locking that's
necessary to remove the iint from the global rbtree all the ns_status
are collected in this list here already now. True, in this patch here it
is not necessary but in 21/27 it will be necessary to have them on this
list. I had thought about merging the patches but its complex enough as-is.
Stefan
>
>> + write_unlock(&ns_status->iint->ns_list_lock);
>> +
>> + /*
>> + * After some progress yield to any waiting ns_list
>> + * free'ers.
>> + */
>> + if (atomic_read(&ns_list_waiters) > 0 && ctr >= 5) {
>> + restart = true;
>> + break;
>> + }
> Giving priority to removing entries in the iint cache is important, but
> I wish there was a better alternative.
>
>> + }
>> +
>> + write_unlock(&ns->ns_tree_lock);
>> + unlock_group(GRP_NS_STATUS_TREE);
>> + } while (restart);
>> +
>> + node = llist_del_all(&garbage);
>> + llist_for_each_entry_safe(ns_status, next, node, gc_llist)
>> + ns_status_free(ns, ns_status);
>> +
>> + kmem_cache_destroy(ns->ns_status_cache);
>> +}
next prev parent reply other threads:[~2022-02-24 2:22 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-01 20:37 [PATCH v10 00/27] ima: Namespace IMA with audit support in IMA-ns Stefan Berger
2022-02-01 20:37 ` [PATCH v10 01/27] ima: Remove ima_policy file before directory Stefan Berger
2022-02-10 12:02 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 02/27] ima: Do not print policy rule with inactive LSM labels Stefan Berger
2022-02-02 14:17 ` Christian Brauner
2022-02-01 20:37 ` [PATCH v10 03/27] ima: Return error code obtained from securityfs functions Stefan Berger
2022-02-10 12:02 ` Mimi Zohar
2022-02-15 18:09 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 04/27] securityfs: rework dentry creation Stefan Berger
2022-02-10 12:03 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 05/27] ima: Define ima_namespace struct and start moving variables into it Stefan Berger
2022-02-16 14:41 ` Mimi Zohar
2022-02-16 20:25 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 06/27] ima: Move arch_policy_entry into ima_namespace Stefan Berger
2022-02-16 16:39 ` Mimi Zohar
2022-02-16 20:48 ` Stefan Berger
2022-02-16 20:56 ` Mimi Zohar
2022-02-16 21:19 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 07/27] ima: Move ima_htable " Stefan Berger
2022-02-16 14:41 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 08/27] ima: Move measurement list related variables " Stefan Berger
2022-02-17 14:46 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 09/27] ima: Move some IMA policy and filesystem " Stefan Berger
2022-02-17 14:44 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 10/27] ima: Move IMA securityfs files into ima_namespace or onto stack Stefan Berger
2022-02-17 14:44 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 11/27] ima: Move ima_lsm_policy_notifier into ima_namespace Stefan Berger
2022-02-17 20:30 ` Mimi Zohar
2022-02-17 20:59 ` Stefan Berger
2022-02-17 21:24 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 12/27] ima: Define mac_admin_ns_capable() as a wrapper for ns_capable() Stefan Berger
2022-02-05 5:58 ` Serge E. Hallyn
2022-02-06 17:20 ` Stefan Berger
2022-02-07 18:43 ` Stefan Berger
2022-02-23 17:51 ` Serge E. Hallyn
2022-02-01 20:37 ` [PATCH v10 13/27] ima: Only accept AUDIT rules for non-init_ima_ns namespaces for now Stefan Berger
2022-02-17 21:32 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 14/27] userns: Add pointer to ima_namespace to user_namespace Stefan Berger
2022-02-18 16:26 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 15/27] ima: Implement hierarchical processing of file accesses Stefan Berger
2022-02-18 16:27 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 16/27] ima: Implement ima_free_policy_rules() for freeing of an ima_namespace Stefan Berger
2022-02-18 17:09 ` Mimi Zohar
2022-02-18 19:38 ` Stefan Berger
2022-02-18 20:04 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 17/27] ima: Add functions for creating and " Stefan Berger
2022-02-18 19:49 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 18/27] integrity/ima: Define ns_status for storing namespaced iint data Stefan Berger
2022-02-23 16:12 ` Mimi Zohar
2022-02-24 2:21 ` Stefan Berger [this message]
2022-02-24 2:49 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 19/27] integrity: Add optional callback function to integrity_inode_free() Stefan Berger
2022-02-01 20:37 ` [PATCH v10 20/27] ima: Namespace audit status flags Stefan Berger
2022-02-01 20:37 ` [PATCH v10 21/27] ima: Remove unused iints from the integrity_iint_cache Stefan Berger
2022-02-01 20:37 ` [PATCH v10 22/27] securityfs: Extend securityfs with namespacing support Stefan Berger
2022-02-23 1:48 ` Mimi Zohar
2022-02-23 8:14 ` Christian Brauner
2022-02-23 15:44 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 23/27] ima: Setup securityfs for IMA namespace Stefan Berger
2022-02-23 11:45 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 24/27] ima: Introduce securityfs file to activate an " Stefan Berger
2022-02-23 13:54 ` Mimi Zohar
2022-02-23 17:08 ` Stefan Berger
2022-02-23 17:12 ` Mimi Zohar
2022-02-23 22:30 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 25/27] ima: Show owning user namespace's uid and gid when displaying policy Stefan Berger
2022-02-23 14:06 ` Mimi Zohar
2022-02-01 20:37 ` [PATCH v10 26/27] ima: Limit number of policy rules in non-init_ima_ns Stefan Berger
2022-02-23 15:38 ` Mimi Zohar
2022-02-23 16:37 ` Stefan Berger
2022-02-23 17:04 ` Mimi Zohar
2022-02-23 20:45 ` Stefan Berger
2022-02-23 20:59 ` Mimi Zohar
2022-02-23 21:06 ` Stefan Berger
2022-02-01 20:37 ` [PATCH v10 27/27] ima: Enable IMA namespaces Stefan Berger
2022-02-23 17:58 ` Serge E. Hallyn
2022-02-23 20:53 ` Stefan Berger
2022-02-02 14:13 ` [PATCH v10 00/27] ima: Namespace IMA with audit support in IMA-ns Christian Brauner
2022-02-02 14:40 ` Stefan Berger
2022-02-02 16:04 ` Mimi Zohar
2022-02-02 18:18 ` Stefan Berger
2022-02-02 21:27 ` Stefan Berger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=92e1fc33-b97f-b99e-4f28-1d05a07c2f2f@linux.ibm.com \
--to=stefanb@linux.ibm.com \
--cc=christian.brauner@ubuntu.com \
--cc=containers@lists.linux.dev \
--cc=dmitry.kasatkin@gmail.com \
--cc=ebiederm@xmission.com \
--cc=jamjoom@us.ibm.com \
--cc=jejb@linux.ibm.com \
--cc=jmorris@namei.org \
--cc=krzysztof.struczynski@huawei.com \
--cc=lhinds@redhat.com \
--cc=linux-integrity@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=lsturman@redhat.com \
--cc=mkayaalp@linux.vnet.ibm.com \
--cc=mpeters@redhat.com \
--cc=paul@paul-moore.com \
--cc=puiterwi@redhat.com \
--cc=rgb@redhat.com \
--cc=roberto.sassu@huawei.com \
--cc=serge@hallyn.com \
--cc=zohar@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).