All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Andersson <bjorn.andersson@linaro.org>
To: Suman Anna <s-anna@ti.com>
Cc: linux-remoteproc@vger.kernel.org,
	Ohad Ben-Cohen <ohad@wizery.com>,
	linux-kernel@vger.kernel.org, Lee Jones <lee.jones@linaro.org>,
	Loic Pallardy <loic.pallardy@st.com>
Subject: Re: [PATCH v2 1/4] remoteproc: Introduce auto-boot flag
Date: Mon, 19 Sep 2016 16:24:41 -0700	[thread overview]
Message-ID: <20160919232441.GJ21438@tuxbot> (raw)
In-Reply-To: <1cef21a3-2517-4995-06d4-bd5defa13662@ti.com>

On Fri 16 Sep 16:58 PDT 2016, Suman Anna wrote:

> Hi Bjorn,
> 
> On 09/08/2016 05:27 PM, Bjorn Andersson wrote:
> > On Wed 31 Aug 11:27 PDT 2016, Suman Anna wrote:
> > 
> >> Hi Bjorn,
> >>
> >> On 08/11/2016 04:52 PM, Bjorn Andersson wrote:
> >>> Introduce an "auto-boot" flag on rprocs to make it possible to flag
> >>> remote processors without vdevs to automatically boot once the firmware
> >>> is found.
> >>>
> >>> Preserve previous behavior of the wkup_m3 processor being explicitly
> >>> booted by a consumer.
> >>>
> >>> Cc: Lee Jones <lee.jones@linaro.org>
> >>> Cc: Loic Pallardy <loic.pallardy@st.com>
> >>> Cc: Suman Anna <s-anna@ti.com>
> >>> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> >>> ---
> >>>
> >>> Changes since v1:
> >>> - s/always_on/auto_boot/
> >>> - Fixed double increment of "power" in recover path
> >>> - Marked wkup_m3 to not auto_boot
> >>>
> >>
> >> I am seeing various issues with this series as I am testing this series
> >> more thoroughly with various TI remoteproc drivers and IPC stacks based
> >> on virtio devices. I use very simple firmware images that publishes the
> >> rpmsg-client-sample devices, so that I can use the kernel
> >> rpmsg_client_sample to test the communication.
> >>
> > 
> > Thanks for bringing these up!
> > 
> >> Here's a summary of the main issues:
> >>
> >> 1. The rproc_boot holds a module reference count to the remoteproc
> >> platform driver so that it cannot be removed when a remote processor is
> >> booted. The previous architecture allowed virtio_rpmsg_bus or the
> >> platform remoteproc driver to be installed independent of each other
> >> with the boot actually getting triggered when the virtio_rpmsg_bus gets
> >> probed in find_vqs. The boot now happens when just the platform
> >> remoteproc driver is installed independent of virtio_rpmsg_bus and
> >> results in holding self-reference counts. This makes it impossible to
> >> remove the remoteproc platform module cleanly (we shouldn't be imposing
> >> force remove), which means we can't stop the remoteprocs properly.
> >>
> > 
> > I've always found the dependency on rpmsg awkward and don't like this
> > behavior to depend on the firmware content, but rather on some sort of
> > system configuration.
> > 
> > That said, I did not intend to break the ability of shutting down and
> > unloading the module.
> 
> The remoteproc infrastructure always allowed the boot to be done by an
> external module

The only caller of rproc_boot() in the mainline kernel is the
wkup_m3_ipc, which spawns a thread and waits for the remoteproc-internal
completion that indicates when the firmware is available and then call
rproc_boot().

I'm afraid I would much prefer the wkup_m3_rproc to just set auto-boot
and not work around the design like this.


That said, we still provide the support for this.

> (with vdevs, it looks intrinsic because of the
> invocation in remoteproc_virtio.c, but the boot was really triggered
> during virtio_rpmsg probe)

Except for the fact that the first vdev does this as a direct result of
the call from rproc_fw_config_virtio().

As stated above, the only other caller of rproc_boot() is the wkup_m3
driver and although it's not executing in the context of
rproc_fw_config_virtio() it's directly tied into its execution.

> and provided protection against removing the remoteproc driver removal
> when you had a consumer.  I tried the auto-boot when I was upstreaming
> the wkup_m3_rproc driver and it was rejected for exactly this reason.

One of the problems I have with the design chosen in the wkup_m3 case is
if you had to deal with crash recovery, how would you sync the
wkup_m3_ipc driver operations to the async recovery?

Flipping this the other way around and accepting that the "function
device" follows the RPROC_RUNNING state of the rproc implicitly would
allow this.

> 
> > Calling rmmod on your rproc module is a rather explicit operation, do
> > you know why there's a need to hold the module reference? Wouldn't it be
> > cleaner to just shutdown the remoteproc as the module is being removed -
> > which would include tearing down all children (rpmsg and others)
> 
> Right, the introduction of the auto-boot ended up in a self-holding
> reference count which should be fixed to allow you to remove the module.
> The external module boot is still a valid usage (when auto_boot is
> false) though.

The way this is dealt with in other subsystems is that when a client
acquire a handle to something exposed by the implementation it will
reference the implementation.

With remoteproc I can acquire a reference to a remoteproc, then unload
the implementation and when the client calls rproc_boot() the core will
try to dereference dev->parent->driver->owner to try_module_get() it; if
we're lucky that will fail as the driver is gone, but I would assume we
would panic here instead, as driver is no longer a valid pointer.

> 
> > 
> > I expect to be able to compile rpmsg into my kernel and still be able to
> > unload my remoteproc module.
> > 
> >> 2. The reversal of boot between virtio_rpmsg_bus and remoteproc core
> >> also meant that the virtio devices and therefore the memory for vrings
> >> are allocated at the time virtio_rpmsg_bus is probed in find_vqs(). The
> >> remoteproc can be booted without the virtio_rpmsg_bus module installed.
> >> We do use the allocated dma addresses of the vrings in the published
> >> resource table, but now the remote processor is up even before these
> >> values are filled in. I had to actually move up the rproc_alloc_vring
> >> alongside rproc_parse_vring to have the communication up.
> >>
> > 
> > This also means that for a piece of firmware that exposes more than one
> > virtio device we would have the same race.
> 
> Yes, if you had more virtio devices technically. The remoteproc
> infrastructure so far hadn't supported more than one vdev.

I don't see anything preventing you from putting more than one vdev in
your resource table. There will however be a race on getting the vrings
allocated before the firmware needs them.

> We did
> support that in our downstream kernel but that was for a non-SMP usage
> on a dual-core remoteproc subsystem and the virtio devices were still
> virtio_rpmsg devices, so it scaled for us when we removed the
> virtio_rpmsg_bus module as long as the reference counts were managed in
> rproc_boot and rproc_shutdown()
> 

I think the shutdown case would work in either way, but the behavior of
rmmod rpmsg && insmod rpmsg still depends on other things.

> > 
> > As you say it makes more sense to allocate the vrings as we set up the
> > other resources.
> > 
> >> 3. The remoteproc platform driver cannot be removed previously when the
> >> corresponding virtio devices are probed/configured properly and all the
> >> communication flow w.r.t rpmsg channel publishing followed from the
> >> remoteproc boot. These channels are child devices of the parent virtio
> >> devices, and since the remoteproc boot/shutdown followed the virtio
> >> device probe/removal lifecycle, the rpmsg channels life-cycle followed
> >> that of the parent virtio device. My communication paths are now failing
> >> if I remove the virtio_rpmsg_bus and insmod it again as the vrings and
> >> vring buffers are configured again while the remoteproc is still
> >> running. Also, since the remoteproc is not rebooted, the previously
> >> published rpmsg channels are stale and they won't get recreated.
> >>
> > 
> > So in essence this only worked because rmmod'ing the virtio_rpmsg_bus
> > would shut down the remoteproc(?) What if the firmware exposed a
> > VIRTIO_ID_NET and a VIRTIO_ID_RPMSG and you rmmod one of them?
> 
> Yes, it's a problem if firmware exposed different virtio devices, but
> like I said above, it was not supported so far.

It's not prevented, the only reason I can find for it not working is
what I'm arguing about - that the resource handling will be racy.

> It is definitely
> something we should consider going forward.

We had a question related to supporting multiple VIRTIO_ID_NET devices
per remoteproc on LKML last week.

> Also, I would think the
> VIRTIO_ID_RPMSG usage is different from VIRTIO_ID_NET or VIRTIO_CONSOLE.
> The rpmsg usage does provide a bus infrastructure allowing the remote
> side to publish different rpmsg devices.
> 

Maybe I'm missing something here, but if I put any other VIRTIO_ID in
the "id" member of fw_rsc_vdev the virtio core will probe that driver
and it will call the find_vqs callback and there's no limit on the
number of fw_rsc_vdev entries in the resource table.

So the use case of having a remoteproc firmware with 8 VIRTIO_ID_NET
entries should work just fine - except for only the first one being
initialized as the firmware boots (unless we finish this restructure).

> > 
> > 
> > The vrings are in use by the remote side, so allocating those with the
> > other remoteproc resources makes more sense, as above.
> > 
> > But virtio_rpmsg_bus needs to detach all buffers from the rings before
> > going away, so we don't get any writes to those after releasing the dma
> > allocation.
> 
> They do get detached alright from Linux-side but obviously you are now
> creating an additional synchronization to the remoteproc side that they
> cannot use these as you have freed up the memory.
> 

Yeah, that's not acceptable, the backing memory must at least follow the
running state of the rproc, so that it is available once we boot the
core.

> > 
> > We will be sending out RPMSG_NS_DESTROY for some of the channels, but as
> > far as I can tell from the rpmsg implementation there is no support in
> > the protocol for your use case of dropping one end of the rpmsg channels
> > without restarting the firmware and(/or?) vdevs.
> 
> Right, this also causes new synchronization problems when you reprobe
> and the remoteproc side has to republish the channels.

>From the code it seems like the other virtio devices would handle this.

> The existing
> remoteproc infrastructure was designed around this fact - rpmsg channels
> can come and go (if needed) from the remoteproc-side, and the flow is no
> different from regular boot vs error recovery, and since they are tied
> to their parent virtio_rpmsg device, removing the communication path
> ensured that.

There are two levels here, the first is the virtio level which in the
case of rpmsg seems to be required to follow the remoteproc
RPROC_RUNNING state; the other being rpmsg channels have nothing to do
with remoteproc, but can as you say come and go dynamically as either
side like.

But the only problem I can see is if the firmware does not re-announce
channels after we reset the virtio device.

> 
> > 
> >> In summary, the current patches worked nicely in a error recovery
> >> scenario but are not working properly with the various combinations of
> >> module insertion/removal process.
> >>
> > 
> > Thanks for your feedback Suman. I think we should definitely fix #1 and
> > #2, but I'm uncertain to what extent #3 can be fixed.
> 
> The only solution I can think of is to reverse the module dependency as
> well to follow the reversal of the boot dependency, so that
> virtio_rpmsg_bus cannot be removed if you have a remoteproc supporting
> virtio devices is booted and running once the virtio_rpmsg has probed.

Can you please help me understand why it's important to protect either
module from being unloaded?

> I
> don't see any reasons to introduce new synchronization issues and force
> the firmwares to change (having to republish channels) because of this
> boot triggering reversal.
> 

The problem is not the boot triggering reversal - as there's no such
thing, the issue shows because I decoupled the rproc life cycle from one
of its child's.

> I am concerned on the impact as 4.9 is going to an LTS, with most of the
> Si vendors using the LTS for their new software releases.
> 

I'm sorry, but could you help me understand how rpmsg is used downstream
for this to be of concern, how is module unloading used beyond
development?

Per my own arguments I will provide some patches to correct the vring
buffer allocation and I'll look into the module locking.


PS. I really would like to see >0 users of this framework in mainline,
so we can reason about things on the basis of what's actually there!

Regards,
Bjorn

  reply	other threads:[~2016-09-19 23:24 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-11 21:52 [PATCH v2 1/4] remoteproc: Introduce auto-boot flag Bjorn Andersson
2016-08-11 21:52 ` [PATCH v2 2/4] remoteproc: Calculate max_notifyid during load Bjorn Andersson
2016-08-11 21:52 ` [PATCH v2 3/4] remoteproc: Move vdev handling to boot/shutdown Bjorn Andersson
2016-08-11 21:52 ` [PATCH v2 4/4] remoteproc: Move handling of cached table " Bjorn Andersson
2016-08-31 18:27 ` [PATCH v2 1/4] remoteproc: Introduce auto-boot flag Suman Anna
2016-08-31 18:27   ` Suman Anna
2016-09-08 22:27   ` Bjorn Andersson
2016-09-16 23:58     ` Suman Anna
2016-09-16 23:58       ` Suman Anna
2016-09-19 23:24       ` Bjorn Andersson [this message]
2016-09-20 21:29         ` Suman Anna
2016-09-20 21:29           ` Suman Anna

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160919232441.GJ21438@tuxbot \
    --to=bjorn.andersson@linaro.org \
    --cc=lee.jones@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=loic.pallardy@st.com \
    --cc=ohad@wizery.com \
    --cc=s-anna@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.