From: Paul Moore <pmoore@redhat.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: linux-audit@redhat.com, linux-kernel@vger.kernel.org,
	sgrubb@redhat.com, eparis@redhat.com, v.rathor@gmail.com,
	ctcard@hotmail.com
Subject: Re: [PATCH V2] audit: try harder to send to auditd upon netlink failure
Date: Wed, 09 Sep 2015 16:41:54 -0400
Message-ID: <4657206.BZD1YinpOL@sifl>
In-Reply-To: <8a3693df3e804bffdd0bf148fb1793d684bf9bbf.1441614933.git.rgb@redhat.com>

On Monday, September 07, 2015 05:10:13 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd when
> it is, in fact, still running. When this happens, kernel syslogs show:
> "audit: *NO* daemon at audit_pid=<pid>"
> although auditd is still running, and is apparently happy, listening on
> the netlink socket. The pid in the "*NO* daemon" message matches the pid
> of the running auditd process. Restarting auditd solves this.
>
> The problem appears to happen randomly, and doesn't seem to be strongly
> correlated to the rate of audit events being logged. The problem
> happens fairly regularly (every few days), but not yet reproduced to
> order.
>
> On production kernels, BUG_ON() is a no-op, so any error will trigger
> this.
>
> Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
> orphaning a newer auditd startup") eliminates one possible cause. This
> isn't the case here, since the PID in the error message and the PID of
> the running auditd match.
>
> The primary expected cause of error here is -ECONNREFUSED when the audit
> daemon goes away, when netlink_getsockbyportid() can't find the auditd
> portid entry in the netlink audit table (or there is no receive
> function). If -EPERM is returned, that situation isn't likely to be
> resolved in a timely fashion without administrator intervention. In
> both cases, reset the audit_pid. This does not rule out a race
> condition. SELinux is expected to return zero since this isn't an INET
> or INET6 socket. Other LSMs may have other return codes. Log the error
> code for better diagnosis in the future.
>
> In the case of -ENOMEM, the situation could be temporary, based on local
> or general availability of buffers. -EAGAIN should never happen since
> the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
> -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
> expected to receive signals. In these cases (or any other unexpected
> ones for now), report the error and re-schedule the thread, retrying up
> to 5 times.
>
> v2:
> Removed BUG_ON().
> Moved comma in pr_*() statements.
> Removed audit_strerror() text.
>
> Reported-by: Vipin Rathor <v.rathor@gmail.com>
> Reported-by: <ctcard@hotmail.com>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> kernel/audit.c | 24 +++++++++++++++++++-----
> 1 files changed, 19 insertions(+), 5 deletions(-)

Queued up for linux-audit#next as soon as 4.3-rc1 is released.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..18cdfe2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
>  static void kauditd_send_skb(struct sk_buff *skb)
>  {
>  	int err;
> +	int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
>  	/* take a reference in case we can't send it and we want to hold it */
>  	skb_get(skb);
>  	err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
>  	if (err < 0) {
> -		BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> +		pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
> +		       audit_pid, err);
>  		if (audit_pid) {
> -			pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> -			audit_log_lost("auditd disappeared");
> -			audit_pid = 0;
> -			audit_sock = NULL;
> +			if (err == -ECONNREFUSED || err == -EPERM
> +			    || ++attempts >= AUDITD_RETRIES) {
> +				audit_log_lost("audit_pid=%d reset");
> +				audit_pid = 0;
> +				audit_sock = NULL;
> +			} else {
> +				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> +					attempts, audit_pid);
> +				set_current_state(TASK_INTERRUPTIBLE);
> +				schedule();
> +				__set_current_state(TASK_RUNNING);
> +				goto restart;
> +			}
>  		}
>  		/* we might get lucky and get this in the next auditd */
>  		audit_hold_skb(skb);
--
paul moore
security @ redhat