linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
To: linux-arm-kernel@lists.infradead.org
Subject: Another possible race in deferred probing...
Date: Mon, 12 Aug 2013 11:06:13 +0100	[thread overview]
Message-ID: <20130812100613.GC23006@n2100.arm.linux.org.uk> (raw)

There is no better way to provoke bugs than to show a device to someone
who has never seen it before.  Obviously, I didn't get much chance to
investigate what happened, so all I have are the kernel messages:

kirkwood-spdif-audio kirkwood-spdif-audio.1: ASoC: CPU DAI mvebu-audio.1 not registered
kirkwood-spdif-audio kirkwood-spdif-audio.1: failed to register card
platform kirkwood-spdif-audio.1: Driver kirkwood-spdif-audio requests probe deferral
mvebu-audio mvebu-audio.1: found external clock
kirkwood-spdif-audio kirkwood-spdif-audio.1: ASoC: CODEC spdif-dit not registered
kirkwood-spdif-audio kirkwood-spdif-audio.1: failed to register card
platform kirkwood-spdif-audio.1: Driver kirkwood-spdif-audio requests probe deferral
kirkwood-spdif-audio kirkwood-spdif-audio.1: ASoC: CODEC spdif-dit not registered
kirkwood-spdif-audio kirkwood-spdif-audio.1: failed to register card
platform kirkwood-spdif-audio.1: Driver kirkwood-spdif-audio requests probe deferral

and then when manually removing the kirkwood-spdif module and re-inserting
it:

kirkwood-spdif-audio kirkwood-spdif-audio.1:  dit-hifi <-> mvebu-audio.1 mapping ok

First, remember that Ubuntu loads kernel modules from separate threads,
so module loading here is multithreaded.

What I suspect happened was the spdif-dit module was inserted and
initialized between kirkwood-spdif starting to be re-probed, and
returning from its probe function with -EPROBE_DEFER, and I suspect
that spdif-dit was the last module.

As there was no further module loading or driver activity, there was
no attempt to re-probe the kirkwood-spdif module.

This seems to be a very rare event (it's the first time I've seen it
since the last fix to the deferred probing, and that's many hundreds
of reboots.)

Nevertheless, the fact that it has happened indicates that deferred
probing still has holes that can leave modular drivers unbound after
boot.

Let's consider this scenario, which from a quick scan of the code looks
like it's possible to happen:

Thread 0			Thread 1
deferred probe triggered
driver A's probe function called
 (kirkwood_spdif_probe in my case)
takes lock L to lookup resource A
checks for resource A
 (the spdif_dit codec)
releases lock L
resource A is not found
		----- context switch ------
				module init function called
				 (to register spdif_dit_driver)
				driver B's probe function called
				 (to register the spdif_dit codec
				 via snd_soc_register_codec)
				takes lock L for adding resource A
				registers resource A
				releases lock L
				driver B's probe function completes
				deferred probe triggered,
				 deferred probe list empty
				module init completes
		----- context switch ------
probe returns -EPROBE_DEFER
deferred probe mutex taken
added to deferred probe list
deferred probe mutex released

This can happen because there's no locking between looking up the
resource and having the driver added to the list of devices waiting
to be probed.

Given that deferred probing is becoming the "normal" thing for device
drivers, these races are only going to get worse.  Consider that as we're
working on ARM towards having a single zImage bootable on many boards,
most drivers will be modules, and if they're inserted into the kernel in
a multi-threaded way, the above race is going to continually come back
and bite us.

I don't see an easy solution to this; we can't serialise device probes,
because we have some places where device driver probe functions register
devices themselves, which then go on to have their drivers probed
(nested probes).

Maybe this reflects back into Greg's whole argument against platform
devices: maybe we need a struct device which lists _all_ the resources
which a device requires, have the bus layer safely check for the
presence of all those resources before calling the probe function, and
move the deferred probe out of the core into the bus layer(s).

Another solution would be to have a way to attach to the device a list
of resources which failed, and have the driver core scan that list after
the probe returns rechecking to see if all the resources are now present.
That to me sounds particularly horrid, invasive and quite disgusting to
implement safely.

                 reply	other threads:[~2013-08-12 10:06 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130812100613.GC23006@n2100.arm.linux.org.uk \
    --to=linux@arm.linux.org.uk \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).