* [PATCH] ratelimit printk messages from the audit system
@ 2008-01-23 19:50 Eric Paris
2008-01-23 21:05 ` Linda Knippers
0 siblings, 1 reply; 8+ messages in thread
From: Eric Paris @ 2008-01-23 19:50 UTC (permalink / raw)
To: linux-audit
Some printk messages from the audit system can become excessive. This
patch ratelimits those messages. It was found that messages such as
the audit backlog lost printk message could flood the logs to the point
that a machine could take an NMI watchdog hit or otherwise become
unresponsive.
Signed-off-by: Eric Paris <eparis@redhat.com>
---
kernel/audit.c | 28 ++++++++++++++++++----------
1 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/kernel/audit.c b/kernel/audit.c
index f93c271..a3d828b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -163,7 +163,8 @@ void audit_panic(const char *message)
 	case AUDIT_FAIL_SILENT:
 		break;
 	case AUDIT_FAIL_PRINTK:
-		printk(KERN_ERR "audit: %s\n", message);
+		if (printk_ratelimit())
+			printk(KERN_ERR "audit: %s\n", message);
 		break;
 	case AUDIT_FAIL_PANIC:
 		panic("audit: %s\n", message);
@@ -231,11 +232,13 @@ void audit_log_lost(const char *message)
 	}
 
 	if (print) {
-		printk(KERN_WARNING
-		       "audit: audit_lost=%d audit_rate_limit=%d audit_backlog_limit=%d\n",
-		       atomic_read(&audit_lost),
-		       audit_rate_limit,
-		       audit_backlog_limit);
+		if (printk_ratelimit())
+			printk(KERN_WARNING
+				"audit: audit_lost=%d audit_rate_limit=%d "
+				"audit_backlog_limit=%d\n",
+				atomic_read(&audit_lost),
+				audit_rate_limit,
+				audit_backlog_limit);
 		audit_panic(message);
 	}
 }
@@ -405,7 +408,11 @@ static int kauditd_thread(void *dummy)
 					audit_pid = 0;
 				}
 			} else {
-				printk(KERN_NOTICE "%s\n", skb->data + NLMSG_SPACE(0));
+				if (printk_ratelimit())
+					printk(KERN_NOTICE "%s\n", skb->data +
+						NLMSG_SPACE(0));
+				else
+					audit_log_lost("printk limit exceeded\n");
 				kfree_skb(skb);
 			}
 		} else {
@@ -1164,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 			remove_wait_queue(&audit_backlog_wait, &wait);
 			continue;
 		}
-		if (audit_rate_check())
+		if (audit_rate_check() && printk_ratelimit())
 			printk(KERN_WARNING
 			       "audit: audit_backlog=%d > "
 			       "audit_backlog_limit=%d\n",
@@ -1433,9 +1440,10 @@ void audit_log_end(struct audit_buffer *ab)
 			skb_queue_tail(&audit_skb_queue, ab->skb);
 			ab->skb = NULL;
 			wake_up_interruptible(&kauditd_wait);
-		} else {
+		} else if (printk_ratelimit())
 			printk(KERN_NOTICE "%s\n", ab->skb->data + NLMSG_SPACE(0));
-		}
+		else
+			audit_log_lost("printk limit exceeded\n");
 	}
 	audit_buffer_free(ab);
 }
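For readers unfamiliar with the mechanism, here is a minimal userspace sketch of the kind of check that printk_ratelimit() provides: allow a burst of messages per time window and refuse the rest. This is not the kernel's implementation. The names struct ratelimit, ratelimit_init() and ratelimit_ok(), the explicit tick argument, and the simple fixed-window policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    long interval;      /* window length in ticks */
    int burst;          /* messages allowed per window */
    long window_start;  /* tick at which the current window began */
    int printed;        /* messages allowed so far in this window */
    int suppressed;     /* messages refused so far in this window */
};

static void ratelimit_init(struct ratelimit *rl, long interval, int burst)
{
    assert(interval > 0 && burst > 0);
    rl->interval = interval;
    rl->burst = burst;
    rl->window_start = 0;
    rl->printed = 0;
    rl->suppressed = 0;
}

/* Returns true if the caller may print at tick `now`, false otherwise. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
    /* Start a fresh window once the current one has expired. */
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->printed = 0;
        rl->suppressed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->suppressed++;
    return false;
}
```

A caller would wrap its print in `if (ratelimit_ok(&rl, now))`, mirroring the `if (printk_ratelimit())` guards in the patch. Anything refused is simply not printed, which is why the patch routes refused audit records through audit_log_lost() so that they are at least counted.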
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-23 19:50 [PATCH] ratelimit printk messages from the audit system Eric Paris
@ 2008-01-23 21:05 ` Linda Knippers
2008-01-23 21:41 ` Eric Paris
0 siblings, 1 reply; 8+ messages in thread
From: Linda Knippers @ 2008-01-23 21:05 UTC (permalink / raw)
To: Eric Paris; +Cc: linux-audit
Eric Paris wrote:
> Some printk messages from the audit system can become excessive. This
> patch ratelimits those messages. It was found that messages, such as
> the audit backlog lost printk message could flood the logs to the point
> that a machine could take an nmi watchdog hit or otherwise become
> unresponsive.
>
> Signed-off-by: Eric Paris <eparis@redhat.com>
>
> ---
> kernel/audit.c | 28 ++++++++++++++++++----------
> 1 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f93c271..a3d828b 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -163,7 +163,8 @@ void audit_panic(const char *message)
> case AUDIT_FAIL_SILENT:
> break;
> case AUDIT_FAIL_PRINTK:
> - printk(KERN_ERR "audit: %s\n", message);
> + if (printk_ratelimit())
> + printk(KERN_ERR "audit: %s\n", message);
> break;
> case AUDIT_FAIL_PANIC:
> panic("audit: %s\n", message);
> @@ -231,11 +232,13 @@ void audit_log_lost(const char *message)
> }
>
> if (print) {
> - printk(KERN_WARNING
> - "audit: audit_lost=%d audit_rate_limit=%d audit_backlog_limit=%d\n",
> - atomic_read(&audit_lost),
> - audit_rate_limit,
> - audit_backlog_limit);
> + if (printk_ratelimit())
> + printk(KERN_WARNING
> + "audit: audit_lost=%d audit_rate_limit=%d "
This is unrelated to your patch but I think it would be nice if
audit_lost represented the number of audit messages lost since the last
time the message came out or the last time an audit record came out.
Today it's a cumulative count since the system was booted. Is it too
much overhead to zero it?
> + "audit_backlog_limit=%d\n",
> + atomic_read(&audit_lost),
> + audit_rate_limit,
> + audit_backlog_limit);
> audit_panic(message);
> }
> }
> @@ -405,7 +408,11 @@ static int kauditd_thread(void *dummy)
> audit_pid = 0;
> }
> } else {
> - printk(KERN_NOTICE "%s\n", skb->data + NLMSG_SPACE(0));
> + if (printk_ratelimit())
> + printk(KERN_NOTICE "%s\n", skb->data +
> + NLMSG_SPACE(0));
> + else
> + audit_log_lost("printk limit exceeded\n");
If you call audit_log_lost when the printk limit is exceeded, but then
audit_log_lost also checks the printk limit, will this message ever
come out? Does it make sense to print a message saying we couldn't
print a message?
> kfree_skb(skb);
> }
> } else {
> @@ -1164,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> remove_wait_queue(&audit_backlog_wait, &wait);
> continue;
> }
> - if (audit_rate_check())
> + if (audit_rate_check() && printk_ratelimit())
> printk(KERN_WARNING
> "audit: audit_backlog=%d > "
> "audit_backlog_limit=%d\n",
> @@ -1433,9 +1440,10 @@ void audit_log_end(struct audit_buffer *ab)
> skb_queue_tail(&audit_skb_queue, ab->skb);
> ab->skb = NULL;
> wake_up_interruptible(&kauditd_wait);
> - } else {
> + } else if (printk_ratelimit())
> printk(KERN_NOTICE "%s\n", ab->skb->data + NLMSG_SPACE(0));
> - }
> + else
> + audit_log_lost("printk limit exceeded\n");
Same question here.
I wonder if it would be better to reduce the generation of the messages,
rather than just their output. For example, once we're losing records,
should we just flush the queue, issue one message, and then keep going?
Or perhaps issue one message, shut off incoming so we don't accept new
records until the backlog goes to zero, then start up again?
> }
> audit_buffer_free(ab);
> }
>
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-23 21:05 ` Linda Knippers
@ 2008-01-23 21:41 ` Eric Paris
2008-01-23 22:06 ` Linda Knippers
0 siblings, 1 reply; 8+ messages in thread
From: Eric Paris @ 2008-01-23 21:41 UTC (permalink / raw)
To: Linda Knippers; +Cc: linux-audit
On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
> Eric Paris wrote:
> > Some printk messages from the audit system can become excessive. This
> > patch ratelimits those messages. It was found that messages, such as
> > the audit backlog lost printk message could flood the logs to the point
> > that a machine could take an nmi watchdog hit or otherwise become
> > unresponsive.
> >
> > Signed-off-by: Eric Paris <eparis@redhat.com>
> >
> > ---
> > kernel/audit.c | 28 ++++++++++++++++++----------
> > 1 files changed, 18 insertions(+), 10 deletions(-)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index f93c271..a3d828b 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -163,7 +163,8 @@ void audit_panic(const char *message)
> > case AUDIT_FAIL_SILENT:
> > break;
> > case AUDIT_FAIL_PRINTK:
> > - printk(KERN_ERR "audit: %s\n", message);
> > + if (printk_ratelimit())
> > + printk(KERN_ERR "audit: %s\n", message);
> > break;
> > case AUDIT_FAIL_PANIC:
> > panic("audit: %s\n", message);
> > @@ -231,11 +232,13 @@ void audit_log_lost(const char *message)
> > }
> >
> > if (print) {
> > - printk(KERN_WARNING
> > - "audit: audit_lost=%d audit_rate_limit=%d audit_backlog_limit=%d\n",
> > - atomic_read(&audit_lost),
> > - audit_rate_limit,
> > - audit_backlog_limit);
> > + if (printk_ratelimit())
> > + printk(KERN_WARNING
> > + "audit: audit_lost=%d audit_rate_limit=%d "
>
> This is unrelated to your patch but I think it would be nice if
> audit_lost represented the number of audit messages lost since the last
> time the message came out or the last time an audit record came out.
> Today its a cumulative count since the system was booted. Is it too
> much overhead to zero it?
Shouldn't be too much overhead, we are already on a slow/unlikely path.
What's the benefit though? Just don't want to have to do a subtraction?
If we are dropping the 'we lost some messages' message, zeroing the
counter at that time would be a bad idea. It's certainly not unsolvable,
but I don't see what it buys us.
>
> > + "audit_backlog_limit=%d\n",
> > + atomic_read(&audit_lost),
> > + audit_rate_limit,
> > + audit_backlog_limit);
> > audit_panic(message);
> > }
> > }
> > @@ -405,7 +408,11 @@ static int kauditd_thread(void *dummy)
> > audit_pid = 0;
> > }
> > } else {
> > - printk(KERN_NOTICE "%s\n", skb->data + NLMSG_SPACE(0));
> > + if (printk_ratelimit())
> > + printk(KERN_NOTICE "%s\n", skb->data +
> > + NLMSG_SPACE(0));
> > + else
> > + audit_log_lost("printk limit exceeded\n");
>
> If you call audit_log_lost when the printk limit is exceeded, but then
> audit_log_lost also checks the printk limit, will this message ever
> come out? Does it make sense to print a message saying we couldn't
> print a message?
No, it won't come out of audit_log_lost() through printk either, but what
it does do is call audit_panic(), and we get the lost message accounting.
> > kfree_skb(skb);
> > }
> > } else {
> > @@ -1164,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> > remove_wait_queue(&audit_backlog_wait, &wait);
> > continue;
> > }
> > - if (audit_rate_check())
> > + if (audit_rate_check() && printk_ratelimit())
> > printk(KERN_WARNING
> > "audit: audit_backlog=%d > "
> > "audit_backlog_limit=%d\n",
> > @@ -1433,9 +1440,10 @@ void audit_log_end(struct audit_buffer *ab)
> > skb_queue_tail(&audit_skb_queue, ab->skb);
> > ab->skb = NULL;
> > wake_up_interruptible(&kauditd_wait);
> > - } else {
> > + } else if (printk_ratelimit())
> > printk(KERN_NOTICE "%s\n", ab->skb->data + NLMSG_SPACE(0));
> > - }
> > + else
> > + audit_log_lost("printk limit exceeded\n");
>
> Same question here.
>
> I wonder if it would be better to reduce the generation of the messages,
> rather than just their output. For example, once we're losing records,
> should we just flush the queue, issue one message, and then keep going?
I'd be scared it'd just fill up too quickly again...
> Or perhaps issue one message, shut off incoming so we don't accept new
> records until the backlog goes to zero, then start up again?
Well that's sorta what we do now, we throw stuff on the floor until it
gets low, maybe not to 0, I don't remember. I'll take a look.
I'll think about it, but really, as long as we are generating audit
events there isn't a great way to solve this problem other than throwing
stuff on the floor.
At this point I think this patch is good, but I'll look at how we handle
lost messages a little more....
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-23 21:41 ` Eric Paris
@ 2008-01-23 22:06 ` Linda Knippers
2008-01-24 17:52 ` Paul Moore
0 siblings, 1 reply; 8+ messages in thread
From: Linda Knippers @ 2008-01-23 22:06 UTC (permalink / raw)
To: Eric Paris; +Cc: linux-audit
Eric Paris wrote:
> On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
>> Eric Paris wrote:
>>> Some printk messages from the audit system can become excessive. This
>>> patch ratelimits those messages. It was found that messages, such as
>>> the audit backlog lost printk message could flood the logs to the point
>>> that a machine could take an nmi watchdog hit or otherwise become
>>> unresponsive.
>>>
>>> Signed-off-by: Eric Paris <eparis@redhat.com>
>>>
>>> ---
>>> kernel/audit.c | 28 ++++++++++++++++++----------
>>> 1 files changed, 18 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/kernel/audit.c b/kernel/audit.c
>>> index f93c271..a3d828b 100644
>>> --- a/kernel/audit.c
>>> +++ b/kernel/audit.c
>>> @@ -163,7 +163,8 @@ void audit_panic(const char *message)
>>> case AUDIT_FAIL_SILENT:
>>> break;
>>> case AUDIT_FAIL_PRINTK:
>>> - printk(KERN_ERR "audit: %s\n", message);
>>> + if (printk_ratelimit())
>>> + printk(KERN_ERR "audit: %s\n", message);
>>> break;
>>> case AUDIT_FAIL_PANIC:
>>> panic("audit: %s\n", message);
>>> @@ -231,11 +232,13 @@ void audit_log_lost(const char *message)
>>> }
>>>
>>> if (print) {
>>> - printk(KERN_WARNING
>>> - "audit: audit_lost=%d audit_rate_limit=%d audit_backlog_limit=%d\n",
>>> - atomic_read(&audit_lost),
>>> - audit_rate_limit,
>>> - audit_backlog_limit);
>>> + if (printk_ratelimit())
>>> + printk(KERN_WARNING
>>> + "audit: audit_lost=%d audit_rate_limit=%d "
>> This is unrelated to your patch but I think it would be nice if
>> audit_lost represented the number of audit messages lost since the last
>> time the message came out or the last time an audit record came out.
>> Today its a cumulative count since the system was booted. Is it too
>> much overhead to zero it?
>
> Shouldn't be too much overhead, we are already on a slow/unlikely path.
> What's the benefit though? Just don't want to have to do a subtraction?
Well that, plus if the system is up for a long time (which we hope) and
the message is infrequent (which we also hope), then it could take me a
while to find the previous message in order to do the subtraction.
> If we are dropping the 'we lost some messages' message 0'ing the counter
> at that time would be a bad idea, certainly not unsolvable, but I don't
> see what it buys us.
I wouldn't want to lose the message, just make it more useful. And if
we zero it we don't have to worry about it wrapping. As it is now, it's
really just the count since the last time it wrapped.
>
>>> + "audit_backlog_limit=%d\n",
>>> + atomic_read(&audit_lost),
>>> + audit_rate_limit,
>>> + audit_backlog_limit);
>>> audit_panic(message);
>>> }
>>> }
>>> @@ -405,7 +408,11 @@ static int kauditd_thread(void *dummy)
>>> audit_pid = 0;
>>> }
>>> } else {
>>> - printk(KERN_NOTICE "%s\n", skb->data + NLMSG_SPACE(0));
>>> + if (printk_ratelimit())
>>> + printk(KERN_NOTICE "%s\n", skb->data +
>>> + NLMSG_SPACE(0));
>>> + else
>>> + audit_log_lost("printk limit exceeded\n");
>> If you call audit_log_lost when the printk limit is exceeded, but then
>> audit_log_lost also checks the printk limit, will this message ever
>> come out? Does it make sense to print a message saying we couldn't
>> print a message?
>
> No it won't come out of audit_log_lost() through printk either, but what
> it does do is call audit_panic() and we get the lost message accounting.
But audit_panic also does a rate limit check, depending on the setting
of audit_failure?
>
>
>>> kfree_skb(skb);
>>> }
>>> } else {
>>> @@ -1164,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>>> remove_wait_queue(&audit_backlog_wait, &wait);
>>> continue;
>>> }
>>> - if (audit_rate_check())
>>> + if (audit_rate_check() && printk_ratelimit())
>>> printk(KERN_WARNING
>>> "audit: audit_backlog=%d > "
>>> "audit_backlog_limit=%d\n",
>>> @@ -1433,9 +1440,10 @@ void audit_log_end(struct audit_buffer *ab)
>>> skb_queue_tail(&audit_skb_queue, ab->skb);
>>> ab->skb = NULL;
>>> wake_up_interruptible(&kauditd_wait);
>>> - } else {
>>> + } else if (printk_ratelimit())
>>> printk(KERN_NOTICE "%s\n", ab->skb->data + NLMSG_SPACE(0));
>>> - }
>>> + else
>>> + audit_log_lost("printk limit exceeded\n");
>> Same question here.
>>
>> I wonder if it would be better to reduce the generation of the messages,
>> rather than just their output. For example, once we're losing records,
>> should we just flush the queue, issue one message, and then keep going?
>
> I'd be scared it'd just fill up too quickly again...
>
>> Or perhaps issue one message, shut off incoming so we don't accept new
>> records until the backlog goes to zero, then start up again?
>
> Well that's sorta what we do now, we throw stuff on the floor until it
> gets low, maybe not to 0, i don't remember. I'll take a look.
I think it will wait a short while for there to be room in the queue
before failing, but it doesn't wait for the queue to really drain.
If there's one slot left, it takes it, so even if the input rate
fairly closely matches the output rate, we essentially have no
buffering.
>
> I'll think about it, but really, as long as we are generating audit
> events there isn't a great way to solve this problem other than throwing
> stuff on the floor.
I'm actually ok with throwing stuff on the floor if that's how the
audit system is configured. In fact I'm suggesting throwing more
stuff on the floor until some low watermark is hit so we can actually
get out of the backlog condition. Sure, it might re-occur again, but
the idea is to not have an audit message rate problem immediately turn
into a printk rate problem to the point that we don't actually know
what we're losing anymore.
>
> At this point I think this patch is good, but I'll look at how we handle
> lost messages a little more....
Ok, thanks. I wish I had an alternate patch to propose.
-- ljk
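The low-watermark idea above can be sketched as a small hysteresis gate: once the backlog limit is hit, drop everything and report once, then resume only when the queue has drained to a low watermark. This is a userspace illustration under stated assumptions, not a proposed kernel patch. The names struct backlog_gate, gate_init() and gate_accept(), and the choice to pass the queue depth in as a plain integer, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical drop policy with hysteresis between two queue depths. */
struct backlog_gate {
    int high;              /* backlog limit: start dropping at this depth */
    int low;               /* low watermark: resume at or below this depth */
    bool dropping;         /* true while we are in a dropping episode */
    unsigned int dropped;  /* records dropped in the current episode */
};

static void gate_init(struct backlog_gate *g, int high, int low)
{
    assert(low >= 0 && low < high);
    g->high = high;
    g->low = low;
    g->dropping = false;
    g->dropped = 0;
}

/*
 * Decide whether a new record may be queued at the given queue depth.
 * *notify is set to true exactly once per episode, when dropping starts,
 * so the caller issues a single message instead of one per lost record.
 */
static bool gate_accept(struct backlog_gate *g, int depth, bool *notify)
{
    *notify = false;
    if (g->dropping) {
        if (depth > g->low) {
            g->dropped++;
            return false;
        }
        g->dropping = false;  /* drained to the low watermark: resume */
    }
    if (depth >= g->high) {
        g->dropping = true;
        g->dropped = 1;
        *notify = true;
        return false;
    }
    return true;
}
```

With a limit of 4 and a watermark of 1, a record arriving at depth 4 starts an episode and triggers the one notification; records arriving at depths 3 and 2 are dropped silently; a record arriving at depth 1 is accepted again. The trade-off is the one discussed above: more records are thrown away on purpose so that the queue regains some real buffering.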
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-23 22:06 ` Linda Knippers
@ 2008-01-24 17:52 ` Paul Moore
2008-01-24 18:01 ` Eric Paris
0 siblings, 1 reply; 8+ messages in thread
From: Paul Moore @ 2008-01-24 17:52 UTC (permalink / raw)
To: linux-audit
On Wednesday 23 January 2008 5:06:53 pm Linda Knippers wrote:
> Eric Paris wrote:
> > On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
> >> This is unrelated to your patch but I think it would be nice if
> >> audit_lost represented the number of audit messages lost since the
> >> last time the message came out or the last time an audit record
> >> came out. Today its a cumulative count since the system was
> >> booted. Is it too much overhead to zero it?
> >
> > Shouldn't be too much overhead, we are already on a slow/unlikely
> > path. What's the benefit though? Just don't want to have to do a
> > subtraction?
>
> Well that, plus if the system is up for a long time (which we hope)
> and the message is infrequent (which we also hope), then it could
> take me a while to find the previous message in order to do the
> subtraction.
>
> > If we are dropping the 'we lost some messages' message 0'ing the
> > counter at that time would be a bad idea, certainly not unsolvable,
> > but I don't see what it buys us.
>
> I wouldn't want to lose the message, just make it more useful. And
> if we zero it we don't have to worry about it wrapping. As it is
> now, its really just the count since the last time it wrapped.
I like Linda's idea of zero'ing the lost message counter once we are
able to start sending messages again for all the reasons listed above.
I haven't looked at the audit message sending code, but we are only
talking about adding an extra conditional in the common case and in the
worst case a conditional and an assignment. Granted they are atomic
ops, but everyone keeps telling me that atomic ops are pretty quick on
almost all of the platforms that Linux supports ...
--
paul moore
linux security @ hp
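As a rough illustration of what zeroing the counter could look like, the sketch below reads and resets a lost-record count in one atomic step, so no increment can slip in between the read and the reset. It uses C11 <stdatomic.h> in userspace rather than the kernel's atomic_t API, and the names lost_records, record_lost() and take_lost_count() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a lost-record counter. */
static atomic_uint lost_records = 0;

/* Called on the drop path: count one lost record. */
static void record_lost(void)
{
    atomic_fetch_add(&lost_records, 1);
}

/*
 * Called when a "records lost" report is emitted: return the number
 * lost since the previous report and reset the counter to zero.
 * atomic_exchange() does both in a single atomic operation.
 */
static unsigned int take_lost_count(void)
{
    return atomic_exchange(&lost_records, 0);
}

/* Usage example: two drops, one report, then nothing left to report. */
static void lost_counter_example(void)
{
    record_lost();
    record_lost();
    assert(take_lost_count() == 2);
    assert(take_lost_count() == 0);
}
```

The exchange makes the reset itself safe, but it does not address whether the report that carried the value ever reached its destination.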
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-24 17:52 ` Paul Moore
@ 2008-01-24 18:01 ` Eric Paris
2008-01-24 18:08 ` Paul Moore
0 siblings, 1 reply; 8+ messages in thread
From: Eric Paris @ 2008-01-24 18:01 UTC (permalink / raw)
To: Paul Moore; +Cc: linux-audit
On Thu, 2008-01-24 at 12:52 -0500, Paul Moore wrote:
> On Wednesday 23 January 2008 5:06:53 pm Linda Knippers wrote:
> > Eric Paris wrote:
> > > On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
> > >> This is unrelated to your patch but I think it would be nice if
> > >> audit_lost represented the number of audit messages lost since the
> > >> last time the message came out or the last time an audit record
> > >> came out. Today its a cumulative count since the system was
> > >> booted. Is it too much overhead to zero it?
> > >
> > > Shouldn't be too much overhead, we are already on a slow/unlikely
> > > path. What's the benefit though? Just don't want to have to do a
> > > subtraction?
> >
> > Well that, plus if the system is up for a long time (which we hope)
> > and the message is infrequent (which we also hope), then it could
> > take me a while to find the previous message in order to do the
> > subtraction.
> >
> > > If we are dropping the 'we lost some messages' message 0'ing the
> > > counter at that time would be a bad idea, certainly not unsolvable,
> > > but I don't see what it buys us.
> >
> > I wouldn't want to lose the message, just make it more useful. And
> > if we zero it we don't have to worry about it wrapping. As it is
> > now, its really just the count since the last time it wrapped.
>
> I like Linda's idea of zero'ing the lost message counter once we are
> able to start sending messages again for all the reasons listed above.
> I haven't looked at the audit message sending code, but we are only
> talking about adding an extra conditional in the common case and in the
> worst case a conditional and an assignment. Granted they are atomic
> ops, but everyone keeps telling me that atomic ops are pretty quick on
> almost all of the platforms that Linux supports ...
Delivery of audit lost messages is through printk/syslog. Even assuming
we can ensure it gets out of printk when we reset the counter, we can't
ensure that it made it to syslog. That means we could lose that message
and have no record of it at all, nor any chance that in the future it
would get recorded that it was lost either.
I wouldn't NAK such a patch, but at the same time don't anyone expect me
to write it :)
-Eric
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-24 18:01 ` Eric Paris
@ 2008-01-24 18:08 ` Paul Moore
2008-01-24 18:13 ` Eric Paris
0 siblings, 1 reply; 8+ messages in thread
From: Paul Moore @ 2008-01-24 18:08 UTC (permalink / raw)
To: Eric Paris; +Cc: linux-audit
On Thursday 24 January 2008 1:01:12 pm Eric Paris wrote:
> On Thu, 2008-01-24 at 12:52 -0500, Paul Moore wrote:
> > On Wednesday 23 January 2008 5:06:53 pm Linda Knippers wrote:
> > > Eric Paris wrote:
> > > > On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
> > > >> This is unrelated to your patch but I think it would be nice
> > > >> if audit_lost represented the number of audit messages lost
> > > >> since the last time the message came out or the last time an
> > > >> audit record came out. Today its a cumulative count since the
> > > >> system was booted. Is it too much overhead to zero it?
> > > >
> > > > Shouldn't be too much overhead, we are already on a
> > > > slow/unlikely path. What's the benefit though? Just don't want
> > > > to have to do a subtraction?
> > >
> > > Well that, plus if the system is up for a long time (which we
> > > hope) and the message is infrequent (which we also hope), then it
> > > could take me a while to find the previous message in order to do
> > > the subtraction.
> > >
> > > > If we are dropping the 'we lost some messages' message 0'ing
> > > > the counter at that time would be a bad idea, certainly not
> > > > unsolvable, but I don't see what it buys us.
> > >
> > > I wouldn't want to lose the message, just make it more useful.
> > > And if we zero it we don't have to worry about it wrapping. As
> > > it is now, its really just the count since the last time it
> > > wrapped.
> >
> > I like Linda's idea of zero'ing the lost message counter once we
> > are able to start sending messages again for all the reasons listed
> > above. I haven't looked at the audit message sending code, but we
> > are only talking about adding an extra conditional in the common
> > case and in the worst case a conditional and an assignment.
> > Granted they are atomic ops, but everyone keeps telling me that
> > atomic ops are pretty quick on almost all of the platforms that
> > Linux supports ...
>
> Delivery of audit lost messages is through printk/syslog. Assuming
> we can assure it gets out of printk when we reset the counter we
> can't assure that it made it to syslog. That means we could lose
> that message and have no record of it at all, nor any chance that in
> the future it would get recorded that it was lost either.
That sort of begs the question - why do we even bother printing the
audit record lost message?
:)
> I wouldn't NAK such a patch, but at the same time don't anyone expect
> me to write it :)
<mumbling>
... every time I open my mouth I end up with more work ...
</mumbling>
--
paul moore
linux security @ hp
* Re: [PATCH] ratelimit printk messages from the audit system
2008-01-24 18:08 ` Paul Moore
@ 2008-01-24 18:13 ` Eric Paris
0 siblings, 0 replies; 8+ messages in thread
From: Eric Paris @ 2008-01-24 18:13 UTC (permalink / raw)
To: Paul Moore; +Cc: linux-audit
On Thu, 2008-01-24 at 13:08 -0500, Paul Moore wrote:
> On Thursday 24 January 2008 1:01:12 pm Eric Paris wrote:
> > On Thu, 2008-01-24 at 12:52 -0500, Paul Moore wrote:
> > > On Wednesday 23 January 2008 5:06:53 pm Linda Knippers wrote:
> > > > Eric Paris wrote:
> > > > > On Wed, 2008-01-23 at 16:05 -0500, Linda Knippers wrote:
> > > > >> This is unrelated to your patch but I think it would be nice
> > > > >> if audit_lost represented the number of audit messages lost
> > > > >> since the last time the message came out or the last time an
> > > > >> audit record came out. Today its a cumulative count since the
> > > > >> system was booted. Is it too much overhead to zero it?
> > > > >
> > > > > Shouldn't be too much overhead, we are already on a
> > > > > slow/unlikely path. What's the benefit though? Just don't want
> > > > > to have to do a subtraction?
> > > >
> > > > Well that, plus if the system is up for a long time (which we
> > > > hope) and the message is infrequent (which we also hope), then it
> > > > could take me a while to find the previous message in order to do
> > > > the subtraction.
> > > >
> > > > > If we are dropping the 'we lost some messages' message 0'ing
> > > > > the counter at that time would be a bad idea, certainly not
> > > > > unsolvable, but I don't see what it buys us.
> > > >
> > > > I wouldn't want to lose the message, just make it more useful.
> > > > And if we zero it we don't have to worry about it wrapping. As
> > > > it is now, its really just the count since the last time it
> > > > wrapped.
> > >
> > > I like Linda's idea of zero'ing the lost message counter once we
> > > are able to start sending messages again for all the reasons listed
> > > above. I haven't looked at the audit message sending code, but we
> > > are only talking about adding an extra conditional in the common
> > > case and in the worst case a conditional and an assignment.
> > > Granted they are atomic ops, but everyone keeps telling me that
> > > atomic ops are pretty quick on almost all of the platforms that
> > > Linux supports ...
> >
> > Delivery of audit lost messages is through printk/syslog. Assuming
> > we can assure it gets out of printk when we reset the counter we
> > can't assure that it made it to syslog. That means we could lose
> > that message and have no record of it at all, nor any chance that in
> > the future it would get recorded that it was lost either.
>
> That sort of begs the question - why do we even bother printing the
> audit record lost message?
>
> :)
Hey, it's best effort, what can I say. At least without resetting the
counter we could realize one of them didn't make it sometime later. Not
worth much I admit :)
-Eric
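To make the trade-off concrete: with a cumulative counter, a reader who has any two reports can recover the number of records lost between them by subtraction, even if reports in between never arrived. The sketch below assumes the value is treated as an unsigned 32-bit quantity so that the subtraction stays correct across a single wrap. That width, and the name lost_between(), are assumptions for illustration and are not taken from the audit code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Records lost between two reports of a cumulative counter.
 * Unsigned arithmetic is modulo 2^32, so the result is still right
 * if the counter wrapped once between the two reports.
 */
static uint32_t lost_between(uint32_t prev, uint32_t cur)
{
    return cur - prev;
}

/* Usage example with made-up report values. */
static void lost_between_example(void)
{
    /* Ordinary case: 25 reported now, 10 reported earlier. */
    assert(lost_between(10, 25) == 15);
    /* Counter wrapped between the two reports. */
    assert(lost_between(UINT32_MAX - 1, 3) == 5);
}
```

A counter that is zeroed on every report gives the per-interval number directly, but a report that goes missing takes its count with it.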