From: Luiz Capitulino <lcapitulino@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, oleg@redhat.com, eparis@redhat.com,
	rgb@redhat.com
Subject: Re: [RFC] audit: avoid soft lockup in audit_log_start()
Date: Fri, 30 Aug 2013 14:23:26 -0400	[thread overview]
Message-ID: <20130830142326.79864486@redhat.com> (raw)
In-Reply-To: <20130828160813.e448eee90886310d6640b87d@linux-foundation.org>

On Wed, 28 Aug 2013 16:08:13 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 28 Aug 2013 18:54:36 -0400 Luiz Capitulino <lcapitulino@redhat.com> wrote:
> 
> > > Are you really sure that kauditd is stuck in schedule() and doesn't
> > > come out?
> > 
> > No, that's a guess. Inferred from:
> > 
> > 1. I tried calling wake_up_interruptible(&kauditd_wait); right
> >    before wait_for_auditd(). Nothing changed.
> > 
> > 2. I added these debug printks:
> > 
> >   diff --git a/kernel/audit.c b/kernel/audit.c
> >   index 91e53d0..27448ad 100644
> >   --- a/kernel/audit.c
> >   +++ b/kernel/audit.c
> >   @@ -458,11 +458,14 @@ static int kauditd_thread(void *dummy)
> >    		set_current_state(TASK_INTERRUPTIBLE);
> >    		add_wait_queue(&kauditd_wait, &wait);
> >    
> >   +		pr_emerg_ratelimited("*** sleeping\n");
> >   +
> >    		if (!skb_queue_len(&audit_skb_queue)) {
> >    			try_to_freeze();
> >    			schedule();
> >    		}
> >    
> >   +		pr_emerg_ratelimited("*** waking up\n");
> >    		__set_current_state(TASK_RUNNING);
> >    		remove_wait_queue(&kauditd_wait, &wait);
> >    	}
> > 
> >  I get several pairs of sleeping/waking up strings right before the
> >  system begins to shut down. Then it stops (even though we do
> >  have SKBs queued).
> 
> Well.  I assume "*** sleeping" is the last thing kauditd prints?  If the
> last print is "waking up" then obviously kauditd is stuck somewhere
> else, which makes more sense.

That's correct. According to crash, here's where it's stuck:

 #0 [ffff880115511c58] __schedule at ffffffff815361de
 #1 [ffff880115511cd0] schedule at ffffffff81537039
 #2 [ffff880115511ce0] schedule_timeout at ffffffff81534524
 #3 [ffff880115511d90] netlink_attachskb at ffffffff81498c69
 #4 [ffff880115511df0] netlink_unicast at ffffffff81498d7f
 #5 [ffff880115511e40] kauditd_send_skb at ffffffff810c254f
 #6 [ffff880115511e60] kauditd_thread at ffffffff810c26e9
 #7 [ffff880115511ec0] kthread at ffffffff810693b0
 #8 [ffff880115511f50] ret_from_fork at ffffffff8154071c
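Frames #2 and #3 show kauditd asleep in schedule_timeout() inside netlink_attachskb(), which is where a blocking netlink send waits for the receiver to make room. The sketch below is only a rough user-space analogue on Linux AF_UNIX datagram sockets, not the audit code: the helper blocked_send_ms() is hypothetical, while socketpair(), send() with MSG_DONTWAIT and the SO_SNDTIMEO option are standard socket APIs. It fills the socket with non-blocking sends, then shows that a blocking send() with no reader sleeps until the timeout expires.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns how many ms a blocking send() slept on a
 * full socket (or -1 on setup failure); *err gets that send's errno. */
static long blocked_send_ms(long timeout_ms, int *err)
{
    int sv[2];
    const char msg[] = "audit record";
    struct timeval tv;
    struct timespec t0, t1;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Non-blocking sends until the socket reports that it is full
     * (nobody ever reads from sv[1]). */
    for (int i = 0; i < 100000; i++)
        if (send(sv[0], msg, sizeof(msg), MSG_DONTWAIT) < 0)
            break;

    /* Bound the blocking send so the demo terminates. */
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(sv[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = send(sv[0], msg, sizeof(msg), 0);   /* blocking: sleeps here */
    *err = (n < 0) ? errno : 0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(sv[0]);
    close(sv[1]);
    return (t1.tv_sec - t0.tv_sec) * 1000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Assuming the audit netlink socket keeps the default (unbounded) send timeout, kauditd would have no such exit and would stay asleep until the receiving side drains its buffer.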

I see two possible reasons for this:

 1. User-space triggered a NETLINK_CONGESTED condition. If this is
    the problem, then the fix is to make audit's netlink socket
    non-blocking and queue SKBs if netlink_unicast() returns -EINVAL.

 2. It's a race condition in the audit code. This bug is only triggered
    on SMP machines _and_ adding printk()s to kauditd_thread() makes
    it go away. Also, if I'm not mistaken, there are a number of global
    variables that are shared between kauditd_thread() and threads
    calling audit_log(), for example audit_sock, audit_skb_hold_queue,
    kauditd_wait and others.
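Option 1 could look roughly like the following user-space sketch, assuming an AF_UNIX datagram socket in place of the audit netlink socket. The hold queue, send_or_hold() and flush_hold() are hypothetical names, not kernel functions. In user space a full receiver is reported as EAGAIN/EWOULDBLOCK for MSG_DONTWAIT sends (some platforms may report ENOBUFS); which error netlink_unicast() returns in non-blocking mode would need to be checked against net/netlink/af_netlink.c.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define HOLD_MAX 4096
#define MSG_LEN  32

/* Hypothetical hold queue for records that could not be delivered yet. */
static char hold_queue[HOLD_MAX][MSG_LEN];
static int hold_len;

/* Non-blocking send that never sleeps. Returns 0 if sent, 1 if held,
 * -1 on a non-congestion error or when the hold queue is full (dropped). */
static int send_or_hold(int fd, const char *msg)
{
    if (hold_len == 0) {
        if (send(fd, msg, strlen(msg) + 1, MSG_DONTWAIT) >= 0)
            return 0;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != ENOBUFS)
            return -1;              /* real error, not a full receiver */
    }
    /* Once anything is held, later records queue behind it to keep order. */
    if (hold_len == HOLD_MAX)
        return -1;
    snprintf(hold_queue[hold_len++], MSG_LEN, "%s", msg);
    return 1;
}

/* Retry held records in FIFO order, stopping at the first one that still
 * does not fit. Returns how many were delivered. */
static int flush_hold(int fd)
{
    int sent = 0;

    while (sent < hold_len &&
           send(fd, hold_queue[sent], strlen(hold_queue[sent]) + 1,
                MSG_DONTWAIT) >= 0)
        sent++;
    /* Shift the undelivered remainder to the front of the queue. */
    memmove(hold_queue, hold_queue[sent],
            (size_t)(hold_len - sent) * MSG_LEN);
    hold_len -= sent;
    return sent;
}
```

The sender never sleeps on the socket; when the hold queue itself is full the record is dropped, so a real fix would still have to pick a limit and a policy for that case.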

I can try protecting the global data structures with a mutex, but
will such a patch be accepted if the bug goes away? :)
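The kind of locking proposed for option 2 can be sketched as a user-space pthread analogue: producers (standing in for audit_log() callers) append under a lock and signal, and a single consumer (standing in for kauditd_thread()) checks for work and sleeps under the same lock, so a wakeup cannot slip in between the check and the sleep. All names here (head, tail, enqueue(), consume_all(), producer_done()) are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared queue standing in for audit_skb_queue. */
struct node { int value; struct node *next; };

static struct node *head, *tail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qwait = PTHREAD_COND_INITIALIZER; /* like kauditd_wait */
static int producers_left;                  /* protected by qlock */

/* Producer side: append under the lock, then wake the consumer. */
static void enqueue(int value)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return;                             /* drop the record on OOM */
    n->value = value;
    n->next = NULL;
    pthread_mutex_lock(&qlock);
    if (tail)
        tail->next = n;
    else
        head = n;
    tail = n;
    pthread_cond_signal(&qwait);
    pthread_mutex_unlock(&qlock);
}

/* Called once by each producer when it has nothing more to send. */
static void producer_done(void)
{
    pthread_mutex_lock(&qlock);
    producers_left--;
    pthread_cond_broadcast(&qwait);
    pthread_mutex_unlock(&qlock);
}

/* Consumer side: the emptiness check and the sleep both happen with qlock
 * held, so a signal sent between them cannot be lost. Returns the sum of
 * all consumed values. */
static long consume_all(void)
{
    long sum = 0;

    pthread_mutex_lock(&qlock);
    for (;;) {
        while (!head && producers_left > 0)
            pthread_cond_wait(&qwait, &qlock);
        if (!head)
            break;                          /* empty and no producers left */
        struct node *n = head;
        head = n->next;
        if (!head)
            tail = NULL;
        pthread_mutex_unlock(&qlock);
        sum += n->value;                    /* "send" outside the lock */
        free(n);
        pthread_mutex_lock(&qlock);
    }
    pthread_mutex_unlock(&qlock);
    return sum;
}
```

This is not a drop-in kernel patch: the audit queues are struct sk_buff_head, whose skb_queue_tail() and skb_dequeue() already take the queue's own spinlock, and some audit_log_start() callers pass GFP_ATOMIC, which suggests a sleeping mutex may not be usable on every path.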


Thread overview: 33+ messages
2013-08-28 22:21 [RFC] audit: avoid soft lockup in audit_log_start() Luiz Capitulino
2013-08-28 22:33 ` Andrew Morton
2013-08-28 22:54   ` Luiz Capitulino
2013-08-28 23:08     ` Andrew Morton
2013-08-29  0:49       ` Luiz Capitulino
2013-08-30 18:23       ` Luiz Capitulino [this message]
2013-09-09 14:32 ` Konstantin Khlebnikov
2013-09-09 14:54   ` Luiz Capitulino
2013-09-09 15:19     ` Konstantin Khlebnikov
2013-09-09 15:29       ` Luiz Capitulino
2013-09-09 15:42         ` Konstantin Khlebnikov
2013-09-10 16:03   ` Eric Paris
2013-09-10 17:45     ` Luiz Capitulino
2013-09-17 22:28     ` Andrew Morton
2013-09-17 22:54       ` Luiz Capitulino
2013-09-18  1:57       ` Richard Guy Briggs
2013-09-18  9:48       ` [PATCH] audit: fix endless wait " Konstantin Khlebnikov
2013-09-18 13:31         ` Richard Guy Briggs
2013-09-18 19:06       ` [PATCH 0/8] Audit backlog queue fixes related to soft lockup Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 1/8] audit: avoid soft lockup due to audit_log_start() incorrect loop termination Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 2/8] audit: reset audit backlog wait time after error recovery Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 3/8] audit: make use of remaining sleep time from wait_for_auditd Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 4/8] audit: efficiency fix 1: only wake up if queue shorter than backlog limit Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 5/8] audit: efficiency fix 2: request exclusive wait since all need same resource Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 6/8] audit: add boot option to override default backlog limit Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 7/8] audit: clean up AUDIT_GET/SET local variables and future-proof API Richard Guy Briggs
2013-09-19 21:18           ` Steve Grubb
2013-09-20 14:47             ` Eric Paris
2013-09-23 16:38               ` Richard Guy Briggs
2013-09-18 19:06         ` [PATCH 8/8] audit: add audit_backlog_wait_time configuration option Richard Guy Briggs
2013-09-18 20:33           ` Eric Paris
2013-09-18 20:49             ` Richard Guy Briggs
2013-09-18 20:54               ` Eric Paris
