From: Richard Fitzgerald <rf@opensource.cirrus.com>
To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>,
	<vkoul@kernel.org>, <yung-chuan.liao@linux.intel.com>,
	<sanyog.r.kale@intel.com>
Cc: <patches@opensource.cirrus.com>, <alsa-devel@alsa-project.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] soundwire: bus: Fix lost UNATTACH when re-enumerating
Date: Thu, 25 Aug 2022 16:25:01 +0100
Message-ID: <e9deb2fb-458a-8136-5ba7-a9e2b0f2d174@opensource.cirrus.com>
In-Reply-To: <adfdf06a-e1a3-e47c-a71f-5e5dccef6fd0@linux.intel.com>
On 25/08/2022 15:24, Pierre-Louis Bossart wrote:
> Humm, I am struggling a bit more on this patch.
>
> On 8/25/22 14:22, Richard Fitzgerald wrote:
>> Rearrange sdw_handle_slave_status() so that any peripherals
>> on device #0 that are given a device ID are reported as
>> unattached. This ensures that UNATTACH status is not lost.
>>
>> Handle unenumerated devices first and update the
>> sdw_slave_status array to indicate IDs that must have become
>> UNATTACHED.
>>
>> Look for UNATTACHED devices after this so we can pick up
>> peripherals that were UNATTACHED in the original PING status
>> and those that were still ATTACHED at the time of the PING but
>> then reverted to unenumerated and were found by
>> sdw_program_device_num().
>
> Are those two cases really lost completely? It's a bit surprising, I do
> recall that we added a recheck on the status, see the 'update_status'
> label in cdns_update_slave_status_work
>
Yes they are. We see this happen extremely frequently (like, almost
every time) when we reset our peripherals after a firmware change.

I saw that "try again" stuff in cdns_update_slave_status_work() but
it's not fixing the problem. Maybe because it's looking for devices
still on #0 but that isn't the problem.

The cdns_update_slave_status_work() is running in one workqueue thread,
child drivers in other threads. So for example:
1. Child driver #1 resets #1
2. PING: #1 has reverted to #0, #2 still ATTACHED
3. cdns_update_slave_status() snapshots the status. #2 is ATTACHED
4. #1 has gone so mark it UNATTACHED
5. Child driver #2 gets some CPU time and resets #2
6. PING: #2 has reset, both now on #0 but we are handling the previous
   PING
7. sdw_handle_slave_status() - snapshot PING (from step 3) says #2 is
   attached
8. Device on #0 so call sdw_program_device_num()
9. sdw_program_device_num() loops until no devices on #0, #1 and #2
   are both reprogrammed, return from sdw_handle_slave_status()
10. PING: #1 and #2 both attached
11. cdns_update_slave_status() -> sdw_handle_slave_status()
12. #1 has changed UNATTACHED->ATTACHED, but we never got a PING with
#2 unattached so its slave->status==ATTACHED, "it hasn't changed"
(wrong!)
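
The sequence above can be replayed in a small standalone model. This is
only a sketch under stated assumptions: the model_* names, the on_dev0[]
array and the event counter are hypothetical and are not kernel code.
They stand in for slave->status, for peripherals waiting on #0, and for
the UNATTACHED->ATTACHED transition the bus core would act on. The
handler uses the old ordering (unattach check first, then enumerate #0
and return).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SDW_SLAVE_* and SDW_MAX_DEVICES. */
enum model_status { MODEL_UNATTACHED, MODEL_ATTACHED };
#define MODEL_MAX_DEVICES 11

struct model_bus {
	enum model_status recorded[MODEL_MAX_DEVICES + 1]; /* like slave->status */
	bool on_dev0[MODEL_MAX_DEVICES + 1];   /* has reset, waiting on #0 */
	int reattach_events[MODEL_MAX_DEVICES + 1];
};

/* Old ordering: unattach check first, then enumerate #0 and return. */
static void model_handle_status_old(struct model_bus *bus,
				    const enum model_status ping[])
{
	int i;

	for (i = 1; i <= MODEL_MAX_DEVICES; i++)
		if (ping[i] == MODEL_UNATTACHED)
			bus->recorded[i] = MODEL_UNATTACHED;

	if (ping[0] == MODEL_ATTACHED) {
		/* Reprogram everything found on #0; the snapshot is untouched. */
		for (i = 1; i <= MODEL_MAX_DEVICES; i++)
			bus->on_dev0[i] = false;
		return;
	}

	for (i = 1; i <= MODEL_MAX_DEVICES; i++) {
		if (ping[i] == MODEL_ATTACHED &&
		    bus->recorded[i] == MODEL_UNATTACHED) {
			bus->recorded[i] = MODEL_ATTACHED;
			bus->reattach_events[i]++;
		}
	}
}

/* Replays the steps and reports re-attach events seen for #1 and #2. */
void model_replay_old(int *events_dev1, int *events_dev2)
{
	struct model_bus bus = { .recorded = { MODEL_UNATTACHED } };
	enum model_status ping_a[MODEL_MAX_DEVICES + 1] = { MODEL_UNATTACHED };
	enum model_status ping_b[MODEL_MAX_DEVICES + 1] = { MODEL_UNATTACHED };

	bus.recorded[1] = MODEL_ATTACHED;
	bus.recorded[2] = MODEL_ATTACHED;

	/* #1 resets; the PING snapshot still shows #2 attached. */
	bus.on_dev0[1] = true;
	ping_a[0] = MODEL_ATTACHED;
	ping_a[1] = MODEL_UNATTACHED;
	ping_a[2] = MODEL_ATTACHED;

	/* #2 resets after the snapshot was taken but before it is handled. */
	bus.on_dev0[2] = true;

	model_handle_status_old(&bus, ping_a);
	assert(!bus.on_dev0[1] && !bus.on_dev0[2]);

	/* Next PING: nothing on #0, #1 and #2 both attached. */
	ping_b[0] = MODEL_UNATTACHED;
	ping_b[1] = MODEL_ATTACHED;
	ping_b[2] = MODEL_ATTACHED;
	model_handle_status_old(&bus, ping_b);

	*events_dev1 = bus.reattach_events[1];
	*events_dev2 = bus.reattach_events[2];
}
```

In this model #1 gets its re-attach event, while #2 gets none even
though it was reset and reprogrammed.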
Now, at step 10 the Cadence IP may have accumulated both UNATTACH and
ATTACH flags, and perhaps it should be smarter about deciding what
to report if there are multiple states. HOWEVER.... that's the behaviour
of Cadence IP, other IP may be different so it's probably unwise to
assume that the IP has "remembered" the UNATTACH state before it was
reprogrammed.
If we reprogrammed it, it was definitely UNATTACHED so let's say that.
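
For illustration only, the "smarter about deciding what to report" idea
mentioned above could look roughly like the sketch below, assuming a
controller that latches both flags between reports. The HYP_* flag names
and bit positions are hypothetical and are not the Cadence register
layout. As noted above, other IP may not remember the UNATTACH at all,
so the bus core cannot rely on this.

```c
#include <assert.h>

/* Hypothetical latched flags, NOT a real register layout. */
#define HYP_FLAG_UNATTACHED (1u << 0)
#define HYP_FLAG_ATTACHED   (1u << 1)

enum hyp_status { HYP_UNATTACHED, HYP_ATTACHED };

/*
 * Reduce the flags latched for one device to a single status.
 * If a detach was latched, report it even when an attach was latched
 * too, so that the detach is not hidden by the later attach.
 */
enum hyp_status hyp_resolve_status(unsigned int flags)
{
	/* Only the two modelled flags are expected here. */
	assert((flags & ~(HYP_FLAG_UNATTACHED | HYP_FLAG_ATTACHED)) == 0);

	if (flags & HYP_FLAG_UNATTACHED)
		return HYP_UNATTACHED;
	if (flags & HYP_FLAG_ATTACHED)
		return HYP_ATTACHED;

	/* Nothing latched: treat the device as not present. */
	return HYP_UNATTACHED;
}
```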
>> As sdw_update_slave_status() is always processing a snapshot of
>> a PING from some time in the past, it is possible that the status
>> is changing while sdw_update_slave_status() is running.
>>
>> A peripheral could report attached in the PING, but detach and
>> revert to device #0 and then be found in the loop in
>> sdw_program_device_num(). Previously the code would not have
>> updated slave->status to UNATTACHED because there was never a
>> PING with that status. If the slave->status is not updated to
>> UNATTACHED the next PING will report it as ATTACHED, but its
>> slave->status is already ATTACHED so the re-attach will not be
>> properly handled.
> The idea of detecting first devices that become unattached - and later
> deal with device0 when they re-attach - was based on the fact that
> synchronization takes time. The absolute minimum is 16 frames per the
> SoundWire spec.
>
> I don't see how testing for the status[0] first in
> sdw_handle_slave_status() helps, the value is taken at the same time as
> status[1..11]. If you really want to take the last information, we
> should re-read the status from a new PING frame.
>
>
The point is to deal with unattached devices second, not first.
If we do the unattach check first, sdw_program_device_num() might then
find some more that have become unattached since the PING. Moving the
unattach check second means we don't have to do it twice.
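
A minimal sketch of that ordering, as a standalone model rather than the
kernel code: the ord_* names and the on_dev0[] array are hypothetical.
Enumeration of #0 runs first and marks every reprogrammed device as
unattached in the snapshot, so one later pass catches both the devices
the PING reported as unattached and the ones that dropped off after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SDW_SLAVE_* and SDW_MAX_DEVICES. */
enum ord_status { ORD_UNATTACHED, ORD_ATTACHED };
#define ORD_MAX_DEVICES 11

/*
 * ping[]     - PING snapshot, updated in place like status[] in the patch
 * on_dev0[]  - devices that have reset and are waiting on #0
 * recorded[] - the core's view of each device, like slave->status
 *
 * Returns how many devices moved to UNATTACHED in the single pass.
 */
int ord_handle_status(enum ord_status ping[], bool on_dev0[],
		      enum ord_status recorded[])
{
	int i, newly_unattached = 0;

	assert(ping && on_dev0 && recorded);

	/* First: enumerate #0 and note which IDs must have been unattached. */
	if (ping[0] == ORD_ATTACHED) {
		for (i = 1; i <= ORD_MAX_DEVICES; i++) {
			if (on_dev0[i]) {
				on_dev0[i] = false;
				ping[i] = ORD_UNATTACHED;
			}
		}
	}

	/* Second: one unattach pass over the (possibly updated) snapshot. */
	for (i = 1; i <= ORD_MAX_DEVICES; i++) {
		if (ping[i] == ORD_UNATTACHED &&
		    recorded[i] == ORD_ATTACHED) {
			recorded[i] = ORD_UNATTACHED;
			newly_unattached++;
		}
	}

	return newly_unattached;
}
```

With a snapshot that shows #1 unattached and #2 still attached, while
both are actually waiting on #0, the single pass marks both.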
>> This situation happens fairly frequently with multiple
>> peripherals on a bus that are intentionally reset (for example
>> after downloading firmware).
>>
>> Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
>> ---
>> drivers/soundwire/bus.c | 39 ++++++++++++++++++++++++++-------------
>> 1 file changed, 26 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c
>> index bb8ce26c68b3..1212148ac251 100644
>> --- a/drivers/soundwire/bus.c
>> +++ b/drivers/soundwire/bus.c
>> @@ -718,7 +718,8 @@ void sdw_extract_slave_id(struct sdw_bus *bus,
>>  }
>>  EXPORT_SYMBOL(sdw_extract_slave_id);
>>
>> -static int sdw_program_device_num(struct sdw_bus *bus)
>> +static int sdw_program_device_num(struct sdw_bus *bus,
>> +				  enum sdw_slave_status status[])
>>  {
>>  	u8 buf[SDW_NUM_DEV_ID_REGISTERS] = {0};
>>  	struct sdw_slave *slave, *_s;
>> @@ -776,6 +777,12 @@ static int sdw_program_device_num(struct sdw_bus *bus)
>>  					return ret;
>>  				}
>>
>> +				/*
>> +				 * It could have dropped off the bus since the
>> +				 * PING response so update the status array.
>> +				 */
>> +				status[slave->dev_num] = SDW_SLAVE_UNATTACHED;
>> +
>>  				break;
>>  			}
>>  		}
>> @@ -1735,10 +1742,21 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>>  {
>>  	enum sdw_slave_status prev_status;
>>  	struct sdw_slave *slave;
>> +	bool programmed_dev_num = false;
>>  	bool attached_initializing;
>>  	int i, ret = 0;
>>
>> -	/* first check if any Slaves fell off the bus */
>> +	/* Handle any unenumerated peripherals */
>> +	if (status[0] == SDW_SLAVE_ATTACHED) {
>> +		dev_dbg(bus->dev, "Slave attached, programming device number\n");
>> +		ret = sdw_program_device_num(bus, status);
>> +		if (ret < 0)
>> +			dev_warn(bus->dev, "Slave attach failed: %d\n", ret);
>> +
>> +		programmed_dev_num = true;
>> +	}
>> +
>> +	/* Check if any fell off the bus */
>>  	for (i = 1; i <= SDW_MAX_DEVICES; i++) {
>>  		mutex_lock(&bus->bus_lock);
>>  		if (test_bit(i, bus->assigned) == false) {
>> @@ -1764,17 +1782,12 @@ int sdw_handle_slave_status(struct sdw_bus *bus,
>>  		}
>>  	}
>>
>> -	if (status[0] == SDW_SLAVE_ATTACHED) {
>> -		dev_dbg(bus->dev, "Slave attached, programming device number\n");
>> -		ret = sdw_program_device_num(bus);
>> -		if (ret < 0)
>> -			dev_err(bus->dev, "Slave attach failed: %d\n", ret);
>> -		/*
>> -		 * programming a device number will have side effects,
>> -		 * so we deal with other devices at a later time
>> -		 */
>> -		return ret;
>> -	}
>> +	/*
>> +	 * programming a device number will have side effects,
>> +	 * so we deal with other devices at a later time
>> +	 */
>> +	if (programmed_dev_num)
>> +		return 0;
>>
>>  	/* Continue to check other slave statuses */
>>  	for (i = 1; i <= SDW_MAX_DEVICES; i++) {