From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
To: Jan Kara <jack@suse.cz>, Viresh Kumar <viresh.kumar@linaro.org>
Cc: Tejun Heo <tj@kernel.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org,
alex.elder@linaro.org, johan@kernel.org,
akpm@linux-foundation.org, rostedt@goodmis.org,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Subject: Re: [Query] Preemption (hogging) of the work handler
Date: Tue, 12 Jul 2016 00:44:38 +0900
Message-ID: <20160711154438.GA528@swordfish>
In-Reply-To: <20160711102603.GI12410@quack2.suse.cz>
Hello,
Thanks for Cc-ing.
I'm attending an internal 2-day training now, so I'm a bit
slow at answering emails, sorry.
On (07/11/16 12:26), Jan Kara wrote:
[..]
> > These print messages continue from 2994.918 to 2996.268 (1.35 seconds)
> > and they hog the work-handler for that long, which results in watchdog
> > reboot in our setup. The 3.10 kernel implementation of the printk
> > looks like this (if I am not wrong):
> >
> > local_irq_save();
> > flush-console-buffer(); //console_unlock()
> > local_irq_restore();
> >
> > So, the current CPU will try to print all the messages from the
> > buffer, before enabling the interrupts again on the local CPU and so I
> > don't see the hrtimer fire at all for almost a second.
> >
right. apart from cases when the existing console_unlock() behaviour can
simply "block" a process to flush the log_buf to slow serial consoles
(regardless of the process execution context) and make the system less
responsive, I have ~10 absolutely different scenarios on my list that
may cause soft/hard lockups, rcu stalls, oom-s, etc., and console_unlock() is
the root cause there. the simplest ones involve heavy printk() usage. the
trickier ones do not necessarily have anything that is abusing printk(): a
moderate printk() pressure coming from other CPUs on the system and a more or
less active tty -> UART can do the trick, because the uart interrupt service
routine and call_console_drivers()->write() have to compete for the same
uart port spin_lock. soft lockups are probably the most common problem,
though they are not all that easy to catch, because the watchdog does not ring
the bell straight after preempt_enable(), but from the hrtimer interrupt, which
happens approx every 4 seconds. by this time the CPU can be somewhere far away
from console_unlock(). I had an idea of doing the watchdog soft lockup check
from preempt_enable(), when it brings preempt_count down to zero, but I'm not
sure I recall how well it went.
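A minimal userspace sketch of the preempt_enable() idea above, for
illustration only. It is not kernel code and does not reproduce the
experiment mentioned above: struct cpu_model, model_preempt_disable(),
model_preempt_enable() and the SOFTLOCKUP_THRESH_NS value are all
hypothetical names and numbers made up for the example, and the caller
passes timestamps in explicitly instead of reading a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold only (assumption); not taken from any kernel. */
#define SOFTLOCKUP_THRESH_NS (20ULL * 1000000000ULL)

/* Toy per-CPU state; the real kernel keeps this very differently. */
struct cpu_model {
	unsigned int preempt_count;    /* nesting depth of disable calls */
	uint64_t disabled_since_ns;    /* time of the 0 -> 1 transition */
	unsigned int lockups_reported; /* how many times the check fired */
};

static void model_preempt_disable(struct cpu_model *cpu, uint64_t now_ns)
{
	/* Only the outermost disable starts the non-preemptible section. */
	if (cpu->preempt_count++ == 0)
		cpu->disabled_since_ns = now_ns;
}

/*
 * Returns true when this call brings preempt_count down to zero and the
 * section that just ended lasted at least SOFTLOCKUP_THRESH_NS.
 */
static bool model_preempt_enable(struct cpu_model *cpu, uint64_t now_ns)
{
	assert(cpu->preempt_count > 0); /* catch unbalanced enable calls */

	if (--cpu->preempt_count != 0)
		return false; /* still nested, nothing to check yet */

	if (now_ns - cpu->disabled_since_ns < SOFTLOCKUP_THRESH_NS)
		return false;

	cpu->lockups_reported++;
	return true;
}
```

The point of checking at the zero transition is that the report is tied to
the section that actually hogged the CPU, rather than to whatever happens
to be running when a periodic timer fires later.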
> > I tried looking at whether something related to this changed between
> > 3.10 and mainline, and found a few patches at least. One of the
> > important ones is:
> >
> > commit 5874af2003b1 ("printk: enable interrupts before calling
> > console_trylock_for_printk()")
> >
> > I wasn't able to backport it cleanly to 3.10 yet to see if it makes
> > things work better though. But it looks like it was targeting similar
> > problems.
> Yes. We have similar problems as you observe on machines when they do a lot
> of printing (usually due to device discovery or similar reasons). The
> problem is not fully solved even upstream as Andrew is reluctant to merge
> the patches. Sergey (added to CC) has the latest version of the series [1].
> If you are interested, I can send you the patches for 3.12 kernel which we
> carry in SLES kernels and which fix the issue for us. It is significantly
> different from current upstream version but it works well enough for us.
yes, an alternative link /* lkml.org is pretty unreliable sometimes*/
is: http://marc.info/?l=linux-kernel&m=146314209118602
I don't have a backport to 3.10, sorry. I had it some time ago (not the
current version, tho), but I think I lost it by now, don't have to deal
with 3.10 anymore.
I'll re-spin the series in a day or two, I think. A rebased version
(against next-20160711), basically, has only that KERN_CONT patch as
part of 0001 now: http://marc.info/?l=linux-kernel&m=146717692431893
hopefully it will refresh the discussion and I'll be able to polish
the series so Andrew will be less sceptical about the whole thing.
-ss