From mboxrd@z Thu Jan  1 00:00:00 1970
From: Linda Knippers <linda.knippers@hp.com>
Subject: Re: [PATCH] ratelimit printk messages from the audit system
Date: Wed, 23 Jan 2008 16:05:18 -0500
Message-ID: <4797AC0E.6030308@hp.com>
References: <1201117808.3295.20.camel@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path: <linux-audit-bounces@redhat.com>
In-Reply-To: <1201117808.3295.20.camel@localhost.localdomain>
List-Unsubscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Eric Paris <eparis@redhat.com>
Cc: linux-audit <linux-audit@redhat.com>
List-Id: linux-audit@redhat.com

Eric Paris wrote:
> Some printk messages from the audit system can become excessive.  This
> patch ratelimits those messages.  It was found that messages, such as
> the audit backlog lost printk message could flood the logs to the point
> that a machine could take an nmi watchdog hit or otherwise become
> unresponsive.
> 
> Signed-off-by: Eric Paris <eparis@redhat.com>
> 
> ---
>  kernel/audit.c |   28 ++++++++++++++++++----------
>  1 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index f93c271..a3d828b 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -163,7 +163,8 @@ void audit_panic(const char *message)
>  	case AUDIT_FAIL_SILENT:
>  		break;
>  	case AUDIT_FAIL_PRINTK:
> -		printk(KERN_ERR "audit: %s\n", message);
> +		if (printk_ratelimit())
> +			printk(KERN_ERR "audit: %s\n", message);
>  		break;
>  	case AUDIT_FAIL_PANIC:
>  		panic("audit: %s\n", message);
> @@ -231,11 +232,13 @@ void audit_log_lost(const char *message)
>  	}
>  
>  	if (print) {
> -		printk(KERN_WARNING
> -		       "audit: audit_lost=%d audit_rate_limit=%d audit_backlog_limit=%d\n",
> -		       atomic_read(&audit_lost),
> -		       audit_rate_limit,
> -		       audit_backlog_limit);
> +		if (printk_ratelimit())
> +			printk(KERN_WARNING
> +				"audit: audit_lost=%d audit_rate_limit=%d "

This is unrelated to your patch but I think it would be nice if
audit_lost represented the number of audit messages lost since the last
time the message came out or the last time an audit record came out.
Today it's a cumulative count since the system was booted.  Is it too
much overhead to zero it?
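
To make the question concrete, here is a minimal userspace sketch of a
read-and-reset counter, assuming C11 atomics.  The names lost_count,
record_lost() and take_lost_count() are hypothetical and are not part of
kernel/audit.c; in the kernel the analogous primitive would presumably be
atomic_xchg() on audit_lost.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's audit_lost counter. */
static atomic_int lost_count;

/* Called whenever a record is dropped. */
static void record_lost(void)
{
	atomic_fetch_add(&lost_count, 1);
}

/*
 * Return the number of records lost since the previous call and reset
 * the counter in a single atomic step, so a concurrent record_lost()
 * is neither missed nor counted twice.
 */
static int take_lost_count(void)
{
	return atomic_exchange(&lost_count, 0);
}
```

If something like this were used, the cost would be one atomic exchange
each time the message is printed, rather than extra work on every lost
record.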

> +				"audit_backlog_limit=%d\n",
> +				atomic_read(&audit_lost),
> +				audit_rate_limit,
> +				audit_backlog_limit);
>  		audit_panic(message);
>  	}
>  }
> @@ -405,7 +408,11 @@ static int kauditd_thread(void *dummy)
>  					audit_pid = 0;
>  				}
>  			} else {
> -				printk(KERN_NOTICE "%s\n", skb->data + NLMSG_SPACE(0));
> +				if (printk_ratelimit())
> +					printk(KERN_NOTICE "%s\n", skb->data +
> +						NLMSG_SPACE(0));
> +				else
> +					audit_log_lost("printk limit exceeded\n");

If you call audit_log_lost when the printk limit is exceeded, but then
audit_log_lost also checks the printk limit, will this message ever
come out?  Does it make sense to print a message saying we couldn't
print a message?
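
The concern can be illustrated with a small userspace model.  This is a
sketch under stated assumptions, not the kernel code: the limiter is a
simplified fixed-window counter, the interval and burst values are
illustrative, and every name (ratelimit_ok, model_emit, model_log_lost)
is hypothetical.  It only shows that when the fallback path consults the
same limiter that just refused the record, the fallback is refused too.

```c
#include <stdbool.h>

/* Hypothetical model of a limiter shared by all callers. */
struct ratelimit {
	long interval;      /* length of one window, in ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages allowed so far in this window */
};

static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

static struct ratelimit shared = { .interval = 5, .burst = 10 };
static int records_printed;
static int lost_printed;

/* Models a "lost" report that checks the same shared limiter. */
static void model_log_lost(long now)
{
	if (ratelimit_ok(&shared, now))
		lost_printed++;
}

/* Models the fallback path: print the record, else report it lost. */
static void model_emit(long now)
{
	if (ratelimit_ok(&shared, now))
		records_printed++;
	else
		model_log_lost(now);
}
```

In this model, a flood of calls to model_emit() inside one window never
increments lost_printed, because the nested check fails for the same
reason the outer one did.
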
>  				kfree_skb(skb);
>  			}
>  		} else {
> @@ -1164,7 +1171,7 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>  			remove_wait_queue(&audit_backlog_wait, &wait);
>  			continue;
>  		}
> -		if (audit_rate_check())
> +		if (audit_rate_check() && printk_ratelimit())
>  			printk(KERN_WARNING
>  			       "audit: audit_backlog=%d > "
>  			       "audit_backlog_limit=%d\n",
> @@ -1433,9 +1440,10 @@ void audit_log_end(struct audit_buffer *ab)
>  			skb_queue_tail(&audit_skb_queue, ab->skb);
>  			ab->skb = NULL;
>  			wake_up_interruptible(&kauditd_wait);
> -		} else {
> +		} else if (printk_ratelimit())
>  			printk(KERN_NOTICE "%s\n", ab->skb->data + NLMSG_SPACE(0));
> -		}
> +		else
> +			audit_log_lost("printk limit exceeded\n");

Same question here.

I wonder if it would be better to reduce the generation of the messages,
rather than just their output.  For example, once we're losing records,
should we just flush the queue, issue one message, and then keep going?
Or perhaps issue one message, shut off incoming so we don't accept new
records until the backlog goes to zero, then start up again?
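
The second option could look roughly like the following userspace
sketch.  It is only an illustration of the idea, with hypothetical names
(struct backlog_gate, gate_submit, gate_drain) and no locking; it is not
a proposal for the actual audit_log_start() code.

```c
#include <stdbool.h>

/* Hypothetical gate in front of the record queue. */
struct backlog_gate {
	int backlog;   /* records currently queued */
	int limit;     /* backlog size at which the gate shuts */
	bool shut;     /* true while new records are refused */
	int dropped;   /* total records refused */
	int messages;  /* warnings issued, one per overflow episode */
};

/* Producer side: returns true if the record was accepted. */
static bool gate_submit(struct backlog_gate *g)
{
	if (!g->shut && g->backlog >= g->limit) {
		g->shut = true;
		g->messages++;  /* the single warning for this episode */
	}
	if (g->shut) {
		g->dropped++;
		return false;
	}
	g->backlog++;
	return true;
}

/* Consumer side: one record leaves the queue. */
static void gate_drain(struct backlog_gate *g)
{
	if (g->backlog > 0)
		g->backlog--;
	/* Reopen only once the backlog has fully drained. */
	if (g->shut && g->backlog == 0)
		g->shut = false;
}
```

The point of reopening only at zero is that a queue hovering near the
limit produces one warning per episode instead of one per record.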

>  	}
>  	audit_buffer_free(ab);
>  }