linux-audit.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Steve Grubb <sgrubb@redhat.com>
Cc: konrad.wilk@oracle.com, linux-kernel@vger.kernel.org,
	linux-audit@redhat.com, eparis@redhat.com,
	boris.ostrovsky@oracle.com
Subject: Re: [PATCH 1/3] audit: cache ctx->major in audit_filter_syscall()
Date: Thu, 06 Oct 2022 17:55:06 -0700	[thread overview]
Message-ID: <87a668bggl.fsf@oracle.com> (raw)
In-Reply-To: <8171459.NyiUUSuA9g@x2>


Steve Grubb <sgrubb@redhat.com> writes:

> Hello,
>
> Thanks for the detailed notes on this investigation. It really is  a lot of
> good information backing this up. However, there will come a day when someone
> sees this "major = ctx->major" and they will send a patch to "fix" this
> unnecessary assignment. If you are sending a V2 of this set, I would suggest
> adding some comment in the code that this is for a performance improvement
> and to see the commit message for additional info.

Thanks for the comment. Just sent out v2 of the last patch which
addresses this tangentially -- it adds a separate function parameter
for ctx->major/uring_op, so this shouldn't be an issue anymore.

Thanks
Ankur

>
> Thanks,
> -Steve
>
> On Tuesday, September 27, 2022 6:59:42 PM EDT Ankur Arora wrote:
>> ctx->major contains the current syscall number. This is, of course, a
>> constant for the duration of the syscall. Unfortunately, GCC's alias
>> analysis cannot prove that it is not modified via a pointer in the
>> audit_filter_syscall() loop, and so always loads it from memory.
>>
>> In and of itself the load isn't very expensive (ops dependent on the
>> ctx->major load are only used to determine the direction of control flow
>> and have short dependence chains and, in any case the related branches
>> get predicted perfectly in the fastpath) but still cache ctx->major
>> in a local for two reasons:
>>
>> * ctx->major is in the first cacheline of struct audit_context and has
>>   similar alignment as audit_entry::list audit_entry. For cases
>>   with a lot of audit rules, doing this reduces one source of contention
>>   from a potentially busy cache-set.
>>
>> * audit_in_mask() (called in the hot loop in audit_filter_syscall())
>>   does cast manipulation and error checking on ctx->major:
>>
>>      audit_in_mask(const struct audit_krule *rule, unsigned long val):
>>              if (val > 0xffffffff)
>>                      return false;
>>
>>              word = AUDIT_WORD(val);
>>              if (word >= AUDIT_BITMASK_SIZE)
>>                      return false;
>>
>>              bit = AUDIT_BIT(val);
>>
>>              return rule->mask[word] & bit;
>>
>>   The clauses related to the rule need to be evaluated in the loop, but
>>   the rest is unnecessarily re-evaluated for every loop iteration.
>>   (Note, however, that most of these are cheap ALU ops and the branches
>>    are perfectly predicted. However, see discussion on cycles
>>    improvement below for more on why it is still worth hoisting.)
>>
>> On a Skylakex system change in getpid() latency (aggregated over
>> 12 boot cycles):
>>
>>              Min     Mean  Median     Max       pstdev
>>             (ns)     (ns)    (ns)    (ns)
>>
>>  -        201.30   216.14  216.22  228.46      (+- 1.45%)
>>  +        196.63   207.86  206.60  230.98      (+- 3.92%)
>>
>> Performance counter stats for 'bin/getpid' (3 runs) go from:
>>     cycles               836.89  (  +-   .80% )
>>     instructions        2000.19  (  +-   .03% )
>>     IPC                    2.39  (  +-   .83% )
>>     branches             430.14  (  +-   .03% )
>>     branch-misses          1.48  (  +-  3.37% )
>>     L1-dcache-loads      471.11  (  +-   .05% )
>>     L1-dcache-load-misses  7.62  (  +- 46.98% )
>>
>>  to:
>>     cycles               805.58  (  +-  4.11% )
>>     instructions        1654.11  (  +-   .05% )
>>     IPC                    2.06  (  +-  3.39% )
>>     branches             430.02  (  +-   .05% )
>>     branch-misses          1.55  (  +-  7.09% )
>>     L1-dcache-loads      440.01  (  +-   .09% )
>>     L1-dcache-load-misses  9.05  (  +- 74.03% )
>>
>> (Both aggregated over 12 boot cycles.)
>>
>> instructions: we reduce around 8 instructions/iteration because some of
>> the computation is now hoisted out of the loop (branch count does not
>> change because GCC, for reasons unclear, only hoists the computations
>> while keeping the basic-blocks.)
>>
>> cycles: improve by about 5% (in aggregate and looking at individual run
>> numbers.) This is likely because we now waste fewer pipeline resources
>> on unnecessary instructions which allows the control flow to
>> speculatively execute further ahead shortening the execution of the loop
>> a little. The final gating factor on the performance of this loop
>> remains the long dependence chain due to the linked-list load.
>>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  kernel/auditsc.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> index 79a5da1bc5bb..533b087c3c02 100644
>> --- a/kernel/auditsc.c
>> +++ b/kernel/auditsc.c
>> @@ -843,13 +843,14 @@ static void audit_filter_syscall(struct task_struct
>> *tsk, {
>>  	struct audit_entry *e;
>>  	enum audit_state state;
>> +	unsigned long major = ctx->major;
>>
>>  	if (auditd_test_task(tsk))
>>  		return;
>>
>>  	rcu_read_lock();
>>  	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT],
> list) {
>> -		if (audit_in_mask(&e->rule, ctx->major) &&
>> +		if (audit_in_mask(&e->rule, major) &&
>>  		    audit_filter_rules(tsk, &e->rule, ctx, NULL,
>>  				       &state, false)) {
>>  			rcu_read_unlock();


--
ankur

--
Linux-audit mailing list
Linux-audit@redhat.com
https://listman.redhat.com/mailman/listinfo/linux-audit


  parent reply	other threads:[~2022-10-07  0:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-27 22:59 [PATCH 0/3] improve audit syscall-exit latency Ankur Arora
2022-09-27 22:59 ` [PATCH 1/3] audit: cache ctx->major in audit_filter_syscall() Ankur Arora
2022-09-28 22:03   ` Paul Moore
2022-09-29 20:20     ` Ankur Arora
2022-10-17 18:23     ` Paul Moore
2022-09-30 17:45   ` Steve Grubb
2022-09-30 18:22     ` Paul Moore
2022-10-07  0:55     ` Ankur Arora [this message]
2022-09-27 22:59 ` [PATCH 2/3] audit: annotate branch direction for audit_in_mask() Ankur Arora
2022-09-28 22:26   ` Paul Moore
2022-09-29 20:19     ` Ankur Arora
2022-09-30 18:48       ` Paul Moore
2022-10-07  0:57         ` Ankur Arora
2022-09-27 22:59 ` [PATCH 3/3] audit: unify audit_filter_{uring(), inode_name(), syscall()} Ankur Arora
2022-09-28 22:54   ` Paul Moore
2022-09-29 20:23     ` [PATCH 3/3] audit: unify audit_filter_{uring(),inode_name(),syscall()} Ankur Arora
2022-10-07  0:49   ` [PATCH v2] audit: unify audit_filter_{uring(), inode_name(), syscall()} Ankur Arora
2022-10-13 23:11     ` Paul Moore
2022-10-14 16:53       ` [PATCH v2] audit: unify audit_filter_{uring(),inode_name(),syscall()} Ankur Arora
2022-10-17 18:26       ` [PATCH v2] audit: unify audit_filter_{uring(), inode_name(), syscall()} Paul Moore

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a668bggl.fsf@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=eparis@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-audit@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sgrubb@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).