From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Lucas C. Villa Real" <lucasvr@gobolinux.org>
Subject: Re: Handling -ENOBUFS
Date: Wed, 5 Nov 2008 18:56:30 -0200
Message-ID: <2c03f9590811051256k3548a16i6ffc3060f54d11c8@mail.gmail.com>
References: <2c03f9590811050830wc551ca6g3a99c467c5e0b7@mail.gmail.com>
	<200811051319.00359.sgrubb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path: <linux-audit-bounces@redhat.com>
In-Reply-To: <200811051319.00359.sgrubb@redhat.com>
Content-Disposition: inline
List-Unsubscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
List-Id: linux-audit@redhat.com

On Wed, Nov 5, 2008 at 4:19 PM, Steve Grubb <sgrubb@redhat.com> wrote:
> On Wednesday 05 November 2008 11:30:16 Lucas C. Villa Real wrote:
>> I'm facing a situation where -ENOBUFS is returned from both
>> audit_send() and audit_get_reply(). The system is under high stress,
>> with 250k files being created and having creat() and chmod() syscalls
>> audited.
>
> Is this what you really wanted to audit? :)

Yes, not a single event can be missed in the system I'm working on,
unfortunately :)


>> Looking at the code in lib/netlink.c, I saw that audit_send() doesn't
>> handle -ENOBUFS. Would it be possible to replace the condition from
>> "while (retval < 0 && errno == EINTR)" to "while (retval < 0 && (errno
>> == EINTR || errno == ENOBUFS))" to fix the problem when sending
>> packets from userspace to kernel?
>
> Have you tried that? Does it fix the problem or just hang the utility?

So far it hasn't hung. However, just in case, I added a maximum number
of retries (currently set to 64). I'm about to launch a new batch to
stress the system once again, and then I'll be able to see if it works
as expected.
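
A minimal sketch of that bounded retry, not the actual libaudit patch.
The names send_with_retry(), send_fn_t and flaky_send() are hypothetical;
the callback stands in for the sendto() call made inside audit_send().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_SEND_RETRIES 64  /* the retry cap mentioned above */

/* Hypothetical callback type standing in for the real sendto() call. */
typedef ssize_t (*send_fn_t)(void *ctx);

/*
 * Call send_fn until it succeeds, fails with an error other than
 * EINTR/ENOBUFS, or has been retried MAX_SEND_RETRIES times.
 * Returns the last return value of send_fn; errno is left as set by it.
 */
ssize_t send_with_retry(send_fn_t send_fn, void *ctx)
{
    ssize_t retval;
    int retries = 0;

    assert(send_fn != NULL);
    do {
        retval = send_fn(ctx);
    } while (retval < 0 &&
             (errno == EINTR || errno == ENOBUFS) &&
             retries++ < MAX_SEND_RETRIES);

    return retval;
}

/* Stub sender: fails with ENOBUFS until *remaining reaches 0. */
ssize_t flaky_send(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = ENOBUFS;
        return -1;
    }
    return 0;
}
```

With the cap in place a persistent ENOBUFS is still reported to the
caller after one attempt plus 64 retries, so the utility cannot spin
forever. The loop retries immediately; a short sleep between attempts
may be worth trying if the pool stays exhausted.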

>> My understanding for the problem in audit_get_reply() is that the I/O
>> buffers are all full and auditd was just not scheduled at the expected
>> rate, causing these buffers to overflow. Does that make sense?
>
> If you go over the backlog limit, you get a syslog message about that unless
> you have it set to ignore. My guess would be that you have a general network
> memory pool depletion that is not related to audit specifically.

Yes. I hope that increasing auditd's priority will help drain those buffers.
I'll let you know if that works.
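
A rough sketch of adjusting a process's priority from C with
setpriority(2), assuming a Linux/glibc system. The helper names
set_nice() and get_nice() are hypothetical; auditd itself may well be
reprioritized some other way (renice, an init script, and so on).

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Set the nice value of process `pid` (0 = calling process).
 * Lower values mean higher priority; going below the current value
 * normally requires root (CAP_SYS_NICE).
 * Returns 0 on success, or the errno value on failure.
 */
int set_nice(pid_t pid, int nice_value)
{
    if (setpriority(PRIO_PROCESS, (id_t)pid, nice_value) < 0)
        return errno;
    return 0;
}

/* Read the current nice value; returns 0 on success, errno on failure. */
int get_nice(pid_t pid, int *nice_value)
{
    int prio;

    errno = 0;  /* -1 is a legitimate nice value, so errno must be checked */
    prio = getpriority(PRIO_PROCESS, (id_t)pid);
    if (prio == -1 && errno != 0)
        return errno;
    *nice_value = prio;
    return 0;
}
```

For auditd one would pass its PID and a negative value, for example
set_nice(auditd_pid, -10), where -10 is only an illustrative choice.
Switching to a different scheduling policy altogether would go through
sched_setscheduler(2) instead.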

>> If it does, do you have a suggestion about the best way to approach this
>> problem, besides changing auditd's priority?
>
> Increase the backlog and increase auditd's priority. I have not played with
> running auditd with a different scheduler policy than whatever the default
> is. But you may want to see if one of the other scheduler policies treats audit
> better. Or maybe you want to tune /proc/sys/kernel/sched_granularity_ns.
>
>
>> One interesting thing which I noticed is that 'auditctl -s' doesn't
>> report that messages were lost,
>
> They weren't lost by the audit system so it doesn't know they didn't arrive.

Do you think it would make sense to add an extra member to struct
sk_buff (a pointer to a callback function) and then have
skb_queue_tail() signal if it failed to send a message? That would
allow audit to keep track of such losses, as well as any other
subsystem using netlink for communicating with userspace.
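
To make the idea concrete, here is a userspace model only, not kernel
code. struct msg and struct msg_queue are hypothetical stand-ins for
struct sk_buff and a socket queue, QUEUE_LIMIT is an arbitrary number,
and the place where the kernel actually drops a netlink message may
differ from what this model suggests.

```c
#include <stddef.h>

#define QUEUE_LIMIT 4  /* arbitrary small limit for the model */

struct msg;
typedef void (*drop_cb_t)(struct msg *m);

/* Stand-in for struct sk_buff with the proposed extra member. */
struct msg {
    struct msg *next;
    drop_cb_t on_drop;  /* proposed: called if the message cannot be queued */
};

/* Stand-in for a receive queue with a fixed capacity. */
struct msg_queue {
    struct msg *head, *tail;
    size_t len;
};

/* Returns 0 if queued, -1 if dropped (after notifying the owner). */
int queue_tail(struct msg_queue *q, struct msg *m)
{
    if (q->len >= QUEUE_LIMIT) {
        if (m->on_drop)
            m->on_drop(m);
        return -1;
    }
    m->next = NULL;
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    q->len++;
    return 0;
}

/* Owner-side accounting, as audit might do for its "lost" counter. */
unsigned long lost_count;

void count_lost(struct msg *m)
{
    (void)m;
    lost_count++;
}
```

The point of the callback is that the sender learns about the drop at
the moment it happens, so a counter like the one 'auditctl -s' reports
could include messages that never reached userspace.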

>> This is happening with an old kernel, 2.6.16.46 + a bunch of patches,
>> and audit 1.7.4. I cannot completely upgrade it to a new release, but
>> I can certainly backport audit specific bits if you remember having
>> fixed something similar since then.
>
> Well, that proc tunable is only available for the CFS scheduler. Not sure what
> you have for older kernels.

That tunable isn't available on this kernel, but I'll keep looking for
other ways to improve the responsiveness of auditd.

Thanks!
Lucas