From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Jan Kara <jack@suse.cz>, Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tejun Heo <tj@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
	alex.elder@linaro.org, johan@kernel.org,
	akpm@linux-foundation.org, rostedt@goodmis.org,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [Query] Preemption (hogging) of the work handler
Date: Tue, 12 Jul 2016 00:44:38 +0900
Message-ID: <20160711154438.GA528@swordfish>
In-Reply-To: <20160711102603.GI12410@quack2.suse.cz>

Hello,

Thanks for Cc-ing.

I'm attending an internal 2-day training now, so I'm a bit
slow at answering emails, sorry.

On (07/11/16 12:26), Jan Kara wrote:
[..]
> > These print messages continue from 2994.918 to 2996.268 (1.35 seconds)
> > and they hog the work-handler for that long, which results in watchdog
> > reboot in our setup. The 3.10 kernel implementation of the printk
> > looks like this (if I am not wrong):
> > 
> >         local_irq_save();
> >         flush-console-buffer(); //console_unlock()
> >         local_irq_restore();
> > 
> > So, the current CPU will try to print all the messages from the
> > buffer, before enabling the interrupts again on the local CPU and so I
> > don't see the hrtimer fire at all for almost a second.
> > 

right. apart from cases where the existing console_unlock() behaviour can
simply "block" a process while it flushes the log_buf to slow serial
consoles (regardless of the process execution context) and make the system
less responsive, I have ~10 entirely different scenarios on my list that
may cause soft/hard lockups, rcu stalls, oom-s, etc., with console_unlock()
as the root cause. the simplest ones involve heavy printk() usage. the
trickier ones do not necessarily have anything that is abusing printk(): a
moderate printk() pressure coming from other CPUs on the system plus a more
or less active tty -> UART can do the trick, because the uart interrupt
service routine and call_console_drivers()->write() have to compete for the
same uart port spin_lock.

soft lockups are probably the most common problem, though they are not all
that easy to catch, because the watchdog does not ring the bell straight
after preempt_enable(), but from the hrtimer interrupt, which happens approx
every 4 seconds. by that time the CPU can be somewhere far away from
console_unlock(). I had an idea of doing the watchdog soft lockup check
from preempt_enable(), when it brings preempt_count down to zero, but I'm
not sure I can recall how well it went.
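
A minimal userspace sketch of that last idea (checking at the moment the
preempt count drops back to zero, instead of waiting for a periodic timer)
could look like the code below. It is only a toy model, not kernel code and
not the experiment mentioned above: struct sketch_cpu, both sketch_*()
functions and the 2 second threshold are hypothetical names and values made
up for illustration, and the caller passes the timestamps in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold for this sketch only; not a kernel default. */
#define SKETCH_LOCKUP_THRESH_NS (2ULL * 1000000000ULL)

/* Hypothetical per-CPU state; nothing here mirrors a real kernel struct. */
struct sketch_cpu {
	unsigned int preempt_count;  /* nesting depth of preempt-off sections */
	uint64_t disabled_since_ns;  /* timestamp of the outermost disable */
	unsigned int lockups_seen;   /* number of overlong sections caught */
};

/* Enter a preempt-off section; remember when the outermost one began. */
void sketch_preempt_disable(struct sketch_cpu *cpu, uint64_t now_ns)
{
	if (cpu->preempt_count++ == 0)
		cpu->disabled_since_ns = now_ns;
}

/*
 * Leave a preempt-off section. Only when the count reaches zero do we
 * compare the elapsed time against the threshold. Returns true if the
 * section that just ended was too long.
 */
bool sketch_preempt_enable(struct sketch_cpu *cpu, uint64_t now_ns)
{
	assert(cpu->preempt_count > 0);

	if (--cpu->preempt_count != 0)
		return false;	/* still nested, nothing to check yet */

	if (now_ns - cpu->disabled_since_ns < SKETCH_LOCKUP_THRESH_NS)
		return false;	/* short section, no report */

	cpu->lockups_seen++;
	return true;
}
```

The point of the model is that the report is tied to the section that
actually ran long, so the offender is identified right at the matching
enable rather than some time later when the CPU has already moved on.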

> > I tried looking at if something related to this changed between 3.10
> > and mainline, and found few patches at least. One of the important
> > ones is:
> > 
> > commit 5874af2003b1 ("printk: enable interrupts before calling
> > console_trylock_for_printk()")
> > 
> > I wasn't able to backport it cleanly to 3.10 yet to see if it makes things
> > work better though. But it looks like it was targeting similar
> > problems.
> Yes. We have similar problems as you observe on machines when they do a lot
> of printing (usually due to device discovery or similar reasons). The
> problem is not fully solved even upstream as Andrew is reluctant to merge
> the patches. Sergey (added to CC) has the latest version of the series [1].
> If you are interested, I can send you the patches for 3.12 kernel which we
> carry in SLES kernels and which fixes the issue for us. It is significantly
> different from current upstream version but it works good enough for us.

yes, an alternative link /* lkml.org is pretty unreliable sometimes*/
is: http://marc.info/?l=linux-kernel&m=146314209118602

I don't have a backport to 3.10, sorry. I had one some time ago (not the
current version, tho), but I think I've lost it by now; I don't have to
deal with 3.10 anymore.

I'll re-spin the series in a day or two, I think. A rebased version
(against next-20160711), basically, has only that KERN_CONT patch as
part of 0001 now: http://marc.info/?l=linux-kernel&m=146717692431893

hopefully it will refresh the discussion and I'll be able to polish
the series so Andrew will be less sceptical about the whole thing.

	-ss
