From: "Lucas C. Villa Real"
Subject: Handling -ENOBUFS
Date: Wed, 5 Nov 2008 14:30:16 -0200
Message-ID: <2c03f9590811050830wc551ca6g3a99c467c5e0b7@mail.gmail.com>
To: linux-audit@redhat.com

Hi guys,

I'm facing a situation where -ENOBUFS is returned from both audit_send() and audit_get_reply(). The system is under heavy load: 250k files are being created, with the creat() and chmod() syscalls audited.

Looking at the code in lib/netlink.c, I saw that audit_send() doesn't handle -ENOBUFS. Would it be possible to change the loop condition from "while (retval < 0 && errno == EINTR)" to "while (retval < 0 && (errno == EINTR || errno == ENOBUFS))" to fix the problem when sending packets from userspace to the kernel?

My understanding of the problem in audit_get_reply() is that the I/O buffers are all full and auditd was simply not scheduled often enough to drain them, causing these buffers to overflow. Does that make sense? If it does, do you have a suggestion about the best way to approach this problem, besides changing auditd's priority? I thought of a dirty trick such as forcing auditd to be rescheduled, but that would be way too intrusive.
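
To make the proposed audit_send() change concrete, here is a minimal sketch of the retry loop. It is not the actual lib/netlink.c code: send_with_retry(), the send_op callback (standing in for the sendto() call on the netlink socket), the max_retries cap, and the flaky_send() test double are all hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback standing in for the sendto() call on the
 * netlink socket: returns < 0 and sets errno on failure. */
typedef int (*send_op)(void *ctx);

/* Retry op while it fails with EINTR or ENOBUFS, at most max_retries
 * extra times. The cap is an assumption added here so that a
 * persistent ENOBUFS cannot make the loop spin forever. */
int send_with_retry(send_op op, void *ctx, int max_retries)
{
    int retval;
    int attempts = 0;

    assert(op != NULL);
    do {
        retval = op(ctx);
    } while (retval < 0 &&
             (errno == EINTR || errno == ENOBUFS) &&
             attempts++ < max_retries);

    return retval;
}

/* Test double: fails failures_left times with err, then succeeds. */
struct flaky {
    int failures_left;
    int err;
};

int flaky_send(void *ctx)
{
    struct flaky *f = ctx;

    if (f->failures_left > 0) {
        f->failures_left--;
        errno = f->err;
        return -1;
    }
    return 0;
}
```

With the original condition the first ENOBUFS would be returned to the caller immediately; with the extended condition the send is retried. Whether retrying without any delay is appropriate under this kind of load is an open question, which is why the sketch bounds the number of attempts.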

One interesting thing I noticed is that 'auditctl -s' doesn't report that messages were lost, although a few events did not appear in the logs. I'm still not sure whether they are missing because of this specific problem, but given that ENOBUFS was returned I would expect to see a positive counter in "lost" below:

AUDIT_STATUS: enabled=1 flag=1 pid=3821 rate_limit=0 backlog_limit=8192 lost=0 backlog=0

This is happening with an old kernel, 2.6.16.46 plus a bunch of patches, and audit 1.7.4. I cannot completely upgrade to a newer release, but I can certainly backport audit-specific bits if you remember having fixed something similar since then.

Thanks,
Lucas