* [PATCH V2] audit: try harder to send to auditd upon netlink failure
@ 2015-09-07 9:10 Richard Guy Briggs
2015-09-09 20:41 ` Paul Moore
0 siblings, 1 reply; 2+ messages in thread
From: Richard Guy Briggs @ 2015-09-07 9:10 UTC (permalink / raw)
To: linux-audit, linux-kernel
Cc: Richard Guy Briggs, sgrubb, pmoore, eparis, v.rathor, ctcard
There are several reports of the kernel losing contact with auditd when
it is, in fact, still running. When this happens, kernel syslogs show:
"audit: *NO* daemon at audit_pid=<pid>"
although auditd is still running, and is apparently happy, listening on
the netlink socket. The pid in the "*NO* daemon" message matches the pid
of the running auditd process. Restarting auditd solves this.
The problem appears to happen randomly, and doesn't seem to be strongly
correlated to the rate of audit events being logged. The problem
happens fairly regularly (every few days), but not yet reproduced to
order.
On production kernels, BUG_ON() is a no-op, so any error will trigger
this.
Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
orphaning a newer auditd startup") eliminates one possible cause. This
isn't the case here, since the PID in the error message and the PID of
the running auditd match.
The primary expected cause of error here is -ECONNREFUSED when the audit
daemon goes away, when netlink_getsockbyportid() can't find the auditd
portid entry in the netlink audit table (or there is no receive
function). If -EPERM is returned, that situation isn't likely to be
resolved in a timely fashion without administrator intervention. In
both cases, reset the audit_pid. This does not rule out a race
condition. SELinux is expected to return zero since this isn't an INET
or INET6 socket. Other LSMs may have other return codes. Log the error
code for better diagnosis in the future.
In the case of -ENOMEM, the situation could be temporary, based on local
or general availability of buffers. -EAGAIN should never happen since
the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
-ERESTARTSYS and -EINTR are not expected since this kernel thread is not
expected to receive signals. In these cases (or any other unexpected
ones for now), report the error and re-schedule the thread, retrying up
to 5 times.
v2:
Removed BUG_ON().
Moved comma in pr_*() statements.
Removed audit_strerror() text.
Reported-by: Vipin Rathor <v.rathor@gmail.com>
Reported-by: <ctcard@hotmail.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
kernel/audit.c | 24 +++++++++++++++++++-----
1 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/kernel/audit.c b/kernel/audit.c
index 1c13e42..18cdfe2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
static void kauditd_send_skb(struct sk_buff *skb)
{
int err;
+ int attempts = 0;
+#define AUDITD_RETRIES 5
+
+restart:
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
if (err < 0) {
- BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
+ pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
+ audit_pid, err);
if (audit_pid) {
- pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
- audit_log_lost("auditd disappeared");
- audit_pid = 0;
- audit_sock = NULL;
+ if (err == -ECONNREFUSED || err == -EPERM
+ || ++attempts >= AUDITD_RETRIES) {
+ audit_log_lost("audit_pid=%d reset");
+ audit_pid = 0;
+ audit_sock = NULL;
+ } else {
+ pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
+ attempts, audit_pid);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule();
+ __set_current_state(TASK_RUNNING);
+ goto restart;
+ }
}
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
--
1.7.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH V2] audit: try harder to send to auditd upon netlink failure
2015-09-07 9:10 [PATCH V2] audit: try harder to send to auditd upon netlink failure Richard Guy Briggs
@ 2015-09-09 20:41 ` Paul Moore
0 siblings, 0 replies; 2+ messages in thread
From: Paul Moore @ 2015-09-09 20:41 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: linux-audit, linux-kernel, sgrubb, eparis, v.rathor, ctcard
On Monday, September 07, 2015 05:10:13 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd when
> it is, in fact, still running. When this happens, kernel syslogs show:
> "audit: *NO* daemon at audit_pid=<pid>"
> although auditd is still running, and is apparently happy, listening on
> the netlink socket. The pid in the "*NO* daemon" message matches the pid
> of the running auditd process. Restarting auditd solves this.
>
> The problem appears to happen randomly, and doesn't seem to be strongly
> correlated to the rate of audit events being logged. The problem
> happens fairly regularly (every few days), but not yet reproduced to
> order.
>
> On production kernels, BUG_ON() is a no-op, so any error will trigger
> this.
>
> Commit 34eab0a7cd45 ("audit: prevent an older auditd shutdown from
> orphaning a newer auditd startup") eliminates one possible cause. This
> isn't the case here, since the PID in the error message and the PID of
> the running auditd match.
>
> The primary expected cause of error here is -ECONNREFUSED when the audit
> daemon goes away, when netlink_getsockbyportid() can't find the auditd
> portid entry in the netlink audit table (or there is no receive
> function). If -EPERM is returned, that situation isn't likely to be
> resolved in a timely fashion without administrator intervention. In
> both cases, reset the audit_pid. This does not rule out a race
> condition. SELinux is expected to return zero since this isn't an INET
> or INET6 socket. Other LSMs may have other return codes. Log the error
> code for better diagnosis in the future.
>
> In the case of -ENOMEM, the situation could be temporary, based on local
> or general availability of buffers. -EAGAIN should never happen since
> the netlink audit (kernel) socket is set to MAX_SCHEDULE_TIMEOUT.
> -ERESTARTSYS and -EINTR are not expected since this kernel thread is not
> expected to receive signals. In these cases (or any other unexpected
> ones for now), report the error and re-schedule the thread, retrying up
> to 5 times.
>
> v2:
> Removed BUG_ON().
> Moved comma in pr_*() statements.
> Removed audit_strerror() text.
>
> Reported-by: Vipin Rathor <v.rathor@gmail.com>
> Reported-by: <ctcard@hotmail.com>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
> kernel/audit.c | 24 +++++++++++++++++++-----
> 1 files changed, 19 insertions(+), 5 deletions(-)
Queued up for linux-audit#next as soon as 4.3-rc1 is released.
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..18cdfe2 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -407,16 +407,30 @@ static void audit_printk_skb(struct sk_buff *skb)
> static void kauditd_send_skb(struct sk_buff *skb)
> {
> int err;
> + int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
> /* take a reference in case we can't send it and we want to hold it */
> skb_get(skb);
> err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
> if (err < 0) {
> - BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> + pr_err("netlink_unicast sending to audit_pid=%d returned error: %d\n",
> + audit_pid, err);
> if (audit_pid) {
> - pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> - audit_log_lost("auditd disappeared");
> - audit_pid = 0;
> - audit_sock = NULL;
> + if (err == -ECONNREFUSED || err == -EPERM
> + || ++attempts >= AUDITD_RETRIES) {
> + audit_log_lost("audit_pid=%d reset");
> + audit_pid = 0;
> + audit_sock = NULL;
> + } else {
> + pr_warn("re-scheduling(#%d) write to audit_pid=%d\n",
> + attempts, audit_pid);
> + set_current_state(TASK_INTERRUPTIBLE);
> + schedule();
> + __set_current_state(TASK_RUNNING);
> + goto restart;
> + }
> }
> /* we might get lucky and get this in the next auditd */
> audit_hold_skb(skb);
--
paul moore
security @ redhat
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2015-09-09 20:41 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-07 9:10 [PATCH V2] audit: try harder to send to auditd upon netlink failure Richard Guy Briggs
2015-09-09 20:41 ` Paul Moore
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).