From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Lucas C. Villa Real"
Subject: Re: Handling -ENOBUFS
Date: Wed, 5 Nov 2008 18:56:30 -0200
Message-ID: <2c03f9590811051256k3548a16i6ffc3060f54d11c8@mail.gmail.com>
References: <2c03f9590811050830wc551ca6g3a99c467c5e0b7@mail.gmail.com> <200811051319.00359.sgrubb@redhat.com>
In-Reply-To: <200811051319.00359.sgrubb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Steve Grubb
Cc: linux-audit@redhat.com
List-Id: linux-audit@redhat.com

On Wed, Nov 5, 2008 at 4:19 PM, Steve Grubb wrote:
> On Wednesday 05 November 2008 11:30:16 Lucas C. Villa Real wrote:
>> I'm facing a situation where -ENOBUFS is returned from both
>> audit_send() and audit_get_reply(). The system is under high stress,
>> with 250k files being created and having creat() and chmod() syscalls
>> audited.
>
> Is this what you really wanted to audit? :)

Yes, not a single event can be missed in the system I'm working on,
unfortunately :)

>> Looking at the code in lib/netlink.c, I saw that audit_send() doesn't
>> handle -ENOBUFS. Would it be possible to change the condition from
>> "while (retval < 0 && errno == EINTR)" to "while (retval < 0 && (errno
>> == EINTR || errno == ENOBUFS))" to fix the problem when sending
>> packets from userspace to kernel?
>
> Have you tried that? Does it fix the problem or just hang the utility?

So far it hasn't hung. However, just in case, I added a maximum number
of retries (currently set to 64). I'm about to launch a new batch to
stress the system once again, and then I'll be able to see if it works
as expected.
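A minimal sketch of the bounded retry described above, for illustration
only. It assumes the send path boils down to a sendto() call on the
netlink socket; send_with_retry() and MAX_SEND_RETRIES are hypothetical
names and are not part of libaudit.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical cap on attempts; 64 is the value mentioned above. */
#define MAX_SEND_RETRIES 64

/*
 * Hypothetical helper (not libaudit API): call sendto() and retry while
 * it fails with EINTR or ENOBUFS, giving up after MAX_SEND_RETRIES
 * attempts. Returns the sendto() result; errno is left as set by the
 * last sendto() call.
 */
static ssize_t send_with_retry(int fd, const void *buf, size_t len,
                               const struct sockaddr *addr,
                               socklen_t addrlen)
{
    ssize_t retval;
    int tries = 0;

    do {
        retval = sendto(fd, buf, len, 0, addr, addrlen);
    } while (retval < 0 &&
             (errno == EINTR || errno == ENOBUFS) &&
             ++tries < MAX_SEND_RETRIES);

    return retval;
}
```

The cap keeps the caller from spinning forever if the buffers never
drain. A short pause between attempts might help the receiver catch up,
but that is untested here.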

>> My understanding of the problem in audit_get_reply() is that the I/O
>> buffers are all full and auditd was just not scheduled at the expected
>> rate, causing these buffers to overflow. Does that make sense?
>
> If you go over the backlog limit, you get a syslog message about that unless
> you have it set to ignore. My guess would be that you have a general network
> memory pool depletion and that it is not related to audit specifically.

Yes. I hope that increasing auditd's priority will help to drain that.
I'll let you know if that works.

>> If it does, do you have a suggestion about the best way to approach this
>> problem, besides changing auditd's priority?
>
> Increase the backlog and increase auditd's priority. I have not played with
> running auditd with a different scheduler policy than whatever the default
> is. But you may want to see if one of the other scheduler policies treats
> audit better. Or maybe you want to tune /proc/sys/kernel/sched_granularity_ns.
>
>> One interesting thing which I noticed is that 'auditctl -s' doesn't
>> report that messages were lost,
>
> They weren't lost by the audit system so it doesn't know they didn't arrive.

Do you think it would make sense to add an extra member to struct
sk_buff (a pointer to a callback function) and then have
skb_queue_tail() signal if it failed to send a message? That would
allow audit to keep track of such losses, as well as any other
subsystem using netlink for communicating with userspace.

>> This is happening with an old kernel, 2.6.16.46 + a bunch of patches,
>> and audit 1.7.4. I cannot completely upgrade it to a new release, but
>> I can certainly backport audit-specific bits if you remember having
>> fixed something similar since then.
>
> Well, that proc tunable is only available for the CFS scheduler. Not sure what
> you have for older kernels.

It's not, but I'll keep looking for other ways to improve the
responsiveness of auditd here.

Thanks!
Lucas
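
For the priority change discussed above, a rough sketch of adjusting a
process's nice value with setpriority(). The helper names are
hypothetical. Lowering the nice value (that is, raising the priority)
normally requires root or CAP_SYS_NICE, so it may fail with EACCES or
EPERM otherwise.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: read the nice value of process `pid`
 * (0 means the calling process). Returns 0 on success and stores the
 * value in *out, or -1 with errno set on failure. getpriority() can
 * legitimately return -1, so errno is cleared first and checked after.
 */
static int get_nice(pid_t pid, int *out)
{
    int value;

    errno = 0;
    value = getpriority(PRIO_PROCESS, (id_t)pid);
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}

/*
 * Hypothetical helper: set the nice value of process `pid`
 * (0 means the calling process). Returns 0 on success, -1 with errno
 * set on failure.
 */
static int set_nice(pid_t pid, int nice_value)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value);
}
```

Something like set_nice(auditd_pid, -10) would be the idea, with
auditd_pid and -10 as placeholders only; whether it actually helps
auditd keep up under this load is untested.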