netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Alexander E. Patrakov" <patrakov@gmail.com>
To: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>,
	Tom Gundersen <teg@jklm.no>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Takashi Iwai <tiwai@suse.de>, Kay Sievers <kay@vrfy.org>,
	Sreekanth Reddy <sreekanth.reddy@avagotech.com>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	Praveen Krishnamoorthy <praveen.krishnamoorthy@avagotech.com>,
	hare <hare@suse.com>,
	Nagalakshmi Nandigama <nagalakshmi.nandigama@avagotech.com>,
	Wu Zhangjin <falcon@meizu.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	"mpt-fusionlinux.pdl" <MPT-FusionLinux.pdl@avagotech.com>,
	Tim Gardner <tim.gardner@canonical.com>,
	Benjamin Poirier <bpoirier@suse.de>,
	Santosh Rastapur <santosh@chelsio.com>,
	Casey Leedom <leedom@chelsio.com>,
	Hariprasad S <hariprasad@chelsio.com>,
	Pierre Fersing <pierre-fersing@pierref.org>,
	Arjan van de Ven <arjan@linux.intel.com>,
	Abhijit Mahajan <abhijit.mahajan@avagotech.com>,
	systemd Mailing List <systemd-devel@lists.freedesk
Subject: Re: [RFC v2 3/6] kthread: warn on kill signal if not OOM
Date: Thu, 11 Sep 2014 11:42:04 +0600	[thread overview]
Message-ID: <5411362C.8080705@gmail.com> (raw)
In-Reply-To: <CAB=NE6UprJH69hzc1mjqyq6PAn+7LZmX65Z8CDwUdJ73nDusHg@mail.gmail.com>

11.09.2014 03:10, Luis R. Rodriguez wrote:
> Tom, thanks for reviewing this! My reply below!
>
> On Tue, Sep 9, 2014 at 11:46 PM, Tom Gundersen <teg@jklm.no> wrote:
>> On Tue, Sep 9, 2014 at 10:45 PM, Luis R. Rodriguez
>> <mcgrof@do-not-panic.com> wrote:
>>> On Tue, Sep 9, 2014 at 12:35 PM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>> On Tue, 2014-09-09 at 12:16 -0700, Luis R. Rodriguez wrote:
>>>>> On Mon, Sep 8, 2014 at 10:38 PM, James Bottomley
>>>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>>>> If we want to sort out some sync/async mechanism for probing devices, as
>>>>>> an agreement between the init systems and the kernel, that's fine, but
>>>>>> its a to-be negotiated enhancement.
>>>>>
>>>>> Unfortunately as Tejun notes the train has left which already made
>>>>> assumptions on this.
>>>>
>>>> Well, that's why it's a bug.  It's a material regression impacting
>>>> users.
>>>
>>> Indeed. I believe the issue with this regression however was that the
>>> original commit e64fae55 (January 2012) was only accepted by *kernel
>>> folks* to be a real regression until recently.
>>
>> Just for the record, this only caused user-visible problems after
>> kernel commit 786235ee (November 2013), right?
>
> Another one was cxgb4:
>
> https://bugzilla.novell.com/show_bug.cgi?id=877622
>
> SLE12 does not yet have commit 786235ee merged. Benjamim did some hard
> work to debug this and trace the kill down to systemd-udev. A debug
> kernel build has been provided now to try to pick up exactly on the
> place where the kill was received, but its at least clear this came
> from systemd.
>
>>> More than two years
>>> have gone by on growing design and assumptions on top of that original
>>> commit. I'm not sure if *systemd folks* yet believe its was a design
>>> regression?
>>
>> I don't think so. udev should not allow its workers to run for an
>> unbounded length of time. Whether the upper bound should be 30, 60,
>> 180 seconds or something else is up for debate (currently it is 60,
>> but if that is too short for some drivers we could certainly revisit
>> that).
>
> That's the thing -- the timeout was put in place under the assumption
> probe was asyncronous and its not, the driver core issues both module
> init *and* probe together, the loader has to wait. That alone makes
> the timeout a design flaw, and then systemd carried on top of that
> design over two years after that. Its not systemd's fault, its just
> that we never spoke about this as a design thing broadly and we should
> have, and I will mention that even when the first issues creeped up,
> the issue was still tossed back a driver problems. It was only until
> recently that we realized that both init and probe run together that
> we've been thinking about this problem differently. Systemd was trying
> to ensure init on driver don't take long but its not init that is
> taking long, its probe, and probe gets then penalized as the driver
> core batches both init and probe synchronously before finishing module
> loading. Furthermore as clarified by Tejun random userland is known to
> exist that will wait indefinitely for module loading under the simple
> assumption things *are done synchronously*, and its precisely why we
> can't just blindly enable async probe upstream through a new driver
> boolean as it can be unfair to this old userland. What is being
> evaluated is to enable aync probe for *all* drivers through a new
> general system-wide option. We cannot regress old userspace and
> assumptions but we can create a new shiny universe.
>
>> Moreover, it seems from this discussion that the aim is (still)
>> that insmod should be near-instantaneous (i.e., not wait for probe),
>
> The only reason that is being discussed is that systemd has not
> accepted the timeout as a system design despite me pointing out the
> original design flaw recently and at this point even if was accepted
> as a design flaw it seems its too late. The approach taken to help
> make all drivers async is simply an afterthought to give systemd what
> it *thought* was in place, and it by no measure should be considered
> the proper fix to the regression introduced by systemd, it may perhaps
> be the right step long term for systemd systems given it goes with
> what it assumed was there, but the timeout was flawed. Its not clear
> if systemd can help with old kernels, it seems the ship has sailed and
> there seems no options but for folks to work around that -- unless of
> course some reasonable solution is found which doesn't break current
> systemd design?
>
>> so it seems to me that the basic design is correct and all we need is
>> some temporary work-around and a way to better detect misbehaving
>> drivers?
>
> As part of this series I addressed hunting for the  "misbehaving
> drivers" in-kernel as I saw no progress on the systemd side of things
> to non-fatally detect "misbehaving drivers" despite my original RFCs
> and request for advice. I quote  "misbehaving drivers" as its a flawed
> view to consider them misbehaving now in light of clarifications of
> how the driver core works in that it batches both init and probe
> together always and we can't be penalizing long probes due to the fact
> long probes are simply fine. My patch to pick up "misbehaving drivers"
> drivers on the kernel front by picking up systemd's signal was
> reactive but it was also simply braindead given the same exact reasons
> why systemd's original timeout was flawed. We want a general solution
> and we don't want to work around the root cause, in this case it was
> systemd's assumption on how drivers work.
>
> Keep in mind that the above just addresses kmod built-in cmd on
> systemd, its where the timeout was introduced but as has been
> clarified here assuming the same timeout on *all* other built-in
> likely is likely pretty flawed as well and this does concern me. Its
> why I mentioned that more than two years have gone by now on growing
> design and assumptions on top of that original commit and its why its
> hard for systemd to consider an alternative.
>
>>>>>   I'm afraid distributions that want to avoid this
>>>>> sigkill at least on the kernel front will have to work around this
>>>>> issue either on systemd by increasing the default timeout which is now
>>>>> possible thanks to Hannes' changes or by some other means such as the
>>>>> combination of a modified non-chatty version of this patch + a check
>>>>> at the end of load_module() as mentioned earlier on these threads.
>>>>
>>>> Increasing the default timeout in systemd seems like the obvious bug fix
>>>> to me.  If the patch exists already, having distros that want it use it
>>>> looks to be correct ... not every bug is a kernel bug, after all.
>>>
>>> Its merged upstream on systemd now, along with a few fixes on top of
>>> it. I also see Kay merged a change to the default timeout to 60 second
>>> on August 30. Its unclear if these discussions had any impact on that
>>> decision or if that was just because udev firmware loading got now
>>> ripped out. I'll note that the new 60 second timeout wouldn't suffice
>>> for cxgb4 even if it didn't do firmware loading, its probe takes over
>>> one full minute.
>>>
>>>> Negotiating a probe vs init split for drivers is fine too, but it's a
>>>> longer term thing rather than a bug fix.
>>>
>>> Indeed. What I proposed with a multiplier for the timeout for the
>>> different types of built in commands was deemed complex but saw no
>>> alternatives proposed despite my interest to work on one and
>>> clarifications noted that this was a design regression. Not quite sure
>>> what else I could have done here. I'm interested in learning what the
>>> better approach is for the future as if we want to marry init + kernel
>>> we need a smooth way for us to discuss design without getting worked
>>> up about it, or taking it personal. I really want this to work as I
>>> personally like systemd so far.
>>
>> How about this: keep the timeout global, but also introduce a
>> (relatively short, say 10 or 15 seconds) timeout after which a warning
>> is printed.
>
> That is something that I originally was looking forward to on systemd,
> but here's the thing once that warning comes up  -- what would we do
> with it? This patch addresses this warning in-kernel and the idea was
> that we'd then peg an async_probe bool as true on the driver as a fix,
> that was decided to be silly given all the above. These drivers are
> actually not misbehaving and it would be even more incorrect to try to
> "fix" them by making them run asynchronously. In fact for some old
> storage drivers it may even be the worst thing to do given the
> possible slew of userland deployment and scripts which assume things
> *are* synchronous.
>
>> Even if nothing is actually killed, having workers (be it
>> insmod or something else) take longer than a couple of seconds is
>> likely a sign that something is seriously off somewhere.
>
> Probe can take a long time and that's fine, so for kmod the current
> assumption is flawed. If we had an option to async probe all drivers
> then perhaps this kmod timeout *might be reasonable*, and even then I
> do recommend for a clear warning that can be collected on logs on its
> first iteration rather than sigkilling, only after a whlie should
> sigkilling be done really. If systemd can deal with module loading in
> the background for drivers that take a long time and warning on that
> intsead of sigkiling it may be good start prior to enabling a default
> sigkill on drivers. This is perhaps also true for other workers but
> its not clear if this is a reasonable strategy for systemd.
>
>   Luis

Just two small remarks to the whole thread.

First, I am quite surprised that nobody brought up the argument that 
module loading is serialized by the kernel. So, while pata-marvell on my 
laptop does its dirty "wait-reset-wait-reset-work" thing, no other 
module can be loaded. This prevention of loading other drivers is the 
thing that slows down the boot.

Second, I am going to XDC2014, LinuxCon Europe and Plumbers. I will take 
my laptop with me, feel free to see the situation firsthand or try 
debugging patches.

-- 
Alexander E. Patrakov

  reply	other threads:[~2014-09-11  5:42 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1409899047-13045-1-git-send-email-mcgrof@do-not-panic.com>
2014-09-05  6:37 ` [RFC v2 2/6] driver-core: add driver async_probe support Luis R. Rodriguez
2014-09-05 11:24   ` Oleg Nesterov
2014-09-05 17:25     ` Luis R. Rodriguez
2014-09-05 22:10   ` Dmitry Torokhov
2014-10-20 23:43     ` Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 3/6] kthread: warn on kill signal if not OOM Luis R. Rodriguez
2014-09-05  7:19   ` Tejun Heo
2014-09-05  7:47     ` Luis R. Rodriguez
2014-09-05  9:14       ` Mike Galbraith
2014-09-05 14:12       ` Tejun Heo
2014-09-05 16:44         ` Dmitry Torokhov
2014-09-05 17:49           ` Tejun Heo
2014-09-05 18:10             ` Dmitry Torokhov
2014-09-05 22:29               ` Tejun Heo
2014-09-05 22:31                 ` Tejun Heo
2014-09-05 22:49                   ` Dmitry Torokhov
2014-09-05 22:55                     ` Tejun Heo
2014-09-05 23:22                       ` Dmitry Torokhov
2014-09-05 23:32                         ` Tejun Heo
2014-09-05 22:45                 ` Arjan van de Ven
2014-09-05 22:52                   ` Dmitry Torokhov
2014-09-05 22:57                     ` Tejun Heo
2014-09-05 23:05                     ` Arjan van de Ven
2014-09-05 23:18                       ` Dmitry Torokhov
2014-09-05 18:12             ` Luis R. Rodriguez
2014-09-05 18:29               ` Dmitry Torokhov
2014-09-05 22:40               ` Tejun Heo
2014-09-09  1:04                 ` Luis R. Rodriguez
2014-09-09  1:10                   ` Tejun Heo
2014-09-09  1:13                     ` Tejun Heo
2014-09-09  1:22                     ` Tejun Heo
2014-09-09  1:26                       ` Luis R. Rodriguez
2014-09-09  1:29                         ` Tejun Heo
2014-09-09  1:38                           ` Luis R. Rodriguez
2014-09-09  1:47                             ` Tejun Heo
2014-09-09  2:28                               ` Luis R. Rodriguez
2014-09-09  2:39                                 ` Tejun Heo
2014-09-09  2:57                                   ` Luis R. Rodriguez
2014-09-09  3:03                                     ` Tejun Heo
2014-09-09  3:19                                       ` Luis R. Rodriguez
2014-09-09  3:25                                         ` Tejun Heo
2014-09-09 23:03                                           ` Tejun Heo
2014-09-12 20:14                                             ` Luis R. Rodriguez
2014-09-22 16:36                                     ` Luis R. Rodriguez
2014-09-10  5:13                         ` Tom Gundersen
2014-09-09  5:38                     ` James Bottomley
2014-09-09 19:16                       ` Luis R. Rodriguez
2014-09-09 19:35                         ` James Bottomley
2014-09-09 20:45                           ` Luis R. Rodriguez
2014-09-10  6:46                             ` Tom Gundersen
2014-09-10 10:07                               ` [systemd-devel] " Ceriel Jacobs
2014-09-10 13:31                                 ` James Bottomley
2014-09-10 21:10                               ` Luis R. Rodriguez
2014-09-11  5:42                                 ` Alexander E. Patrakov [this message]
2014-09-11 21:43                                 ` Tom Gundersen
2014-09-11 22:26                                   ` [systemd-devel] " Luis R. Rodriguez
2014-09-12  5:48                                     ` Tom Gundersen
2014-09-12 20:09                                       ` Luis R. Rodriguez
2014-10-10 21:54                                         ` Anatol Pomozov
2014-10-10 22:45                                           ` Tom Gundersen
2014-10-15 19:41                                             ` Anatol Pomozov
2014-10-15 19:46                                               ` Alexander E. Patrakov
2014-09-09 21:42                           ` Tejun Heo
2014-09-09 22:26                             ` James Bottomley
2014-09-09 22:41                               ` Tejun Heo
2014-09-09 22:46                                 ` James Bottomley
2014-09-09 22:52                                   ` Tejun Heo
2014-09-09 23:01                                   ` Dmitry Torokhov
2014-09-11 19:59                                     ` James Bottomley
2014-09-11 20:23                                       ` Dmitry Torokhov
2014-09-11 20:42                                         ` Luis R. Rodriguez
2014-09-11 20:53                                           ` Dmitry Torokhov
2014-09-11 21:08                                             ` Luis R. Rodriguez
2014-09-22 19:49                                         ` Pavel Machek
2014-09-22 20:23                                           ` Dmitry Torokhov
2014-09-30 21:06                                             ` Pavel Machek
2014-09-30 21:34                                               ` Dmitry Torokhov
2014-09-09 22:00                         ` Jiri Kosina
2014-09-05 10:59   ` Oleg Nesterov
2014-09-05 17:35     ` Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 4/6] cxgb4: use async probe Luis R. Rodriguez
2014-09-05  6:37 ` [RFC v2 5/6] mptsas: " Luis R. Rodriguez
2014-09-05  7:16   ` Tejun Heo
2014-09-05  7:23   ` Hannes Reinecke

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5411362C.8080705@gmail.com \
    --to=patrakov@gmail.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=MPT-FusionLinux.pdl@avagotech.com \
    --cc=abhijit.mahajan@avagotech.com \
    --cc=arjan@linux.intel.com \
    --cc=bpoirier@suse.de \
    --cc=falcon@meizu.com \
    --cc=gnomes@lxorguk.ukuu.org.uk \
    --cc=hare@suse.com \
    --cc=hariprasad@chelsio.com \
    --cc=kay@vrfy.org \
    --cc=leedom@chelsio.com \
    --cc=mcgrof@do-not-panic.com \
    --cc=nagalakshmi.nandigama@avagotech.com \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=pierre-fersing@pierref.org \
    --cc=praveen.krishnamoorthy@avagotech.com \
    --cc=santosh@chelsio.com \
    --cc=sreekanth.reddy@avagotech.com \
    --cc=systemd-devel@lists.freedesk \
    --cc=teg@jklm.no \
    --cc=tim.gardner@canonical.com \
    --cc=tiwai@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).