From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Takashi Sakamoto <o-takashi@sakamocchi.jp>,
	Adam Goldman <adamg@pobox.com>, Sasha Levin <sashal@kernel.org>,
	linux1394-devel@lists.sourceforge.net
Subject: [PATCH AUTOSEL 6.1 22/28] firewire: core: send bus reset promptly on gap count error
Date: Mon, 12 Feb 2024 19:22:20 -0500
Message-ID: <20240213002235.671934-22-sashal@kernel.org>
In-Reply-To: <20240213002235.671934-1-sashal@kernel.org>

From: Takashi Sakamoto <o-takashi@sakamocchi.jp>

[ Upstream commit 7ed4380009e96d9e9c605e12822e987b35b05648 ]

If we are bus manager and the bus has inconsistent gap counts, send a
bus reset immediately instead of trying to read the root node's config
ROM first. Otherwise, we could spend a lot of time trying to read the
config ROM but never succeeding.

This eliminates a 50+ second delay before the FireWire bus is usable after
a newly connected device is powered on in certain circumstances.

The delay arises when the gap counts are inconsistent, we are not the root
node, and we become bus manager. One scenario that causes this is with a TI
XIO2213B OHCI, the first time a Sony DSR-25 is powered on after being
connected to the FireWire cable. In this configuration, the Linux box will
not receive the initial PHY configuration packet sent by the DSR-25 as IRM,
resulting in the DSR-25 having a gap count of 44 while the Linux box has a
gap count of 63.

FireWire devices have a gap count parameter, which is set to 63 on power-up
and can be changed with a PHY configuration packet. This determines the
duration of the subaction and arbitration gaps. For reliable communication,
all nodes on a FireWire bus must have the same gap count.

A node may have zero or more of the following roles: root node, bus manager
(BM), isochronous resource manager (IRM), and cycle master. Unless a root
node was forced with a PHY configuration packet, any node might become root
node after a bus reset. Only the root node can become cycle master. If the
root node is not cycle master capable, the BM or IRM should force a change
of root node.

After a bus reset, each node sends a self-ID packet, which contains its
current gap count. A single bus reset does not change the gap count, but
two bus resets in a row will set the gap count to 63. Because a consistent
gap count is required for reliable communication, IEEE 1394a-2000 requires
that the bus manager generate a bus reset if it detects that the gap count
is inconsistent.

When the gap count is inconsistent, build_tree() will notice this after the
self identification process. It will set card->gap_count to the invalid
value 0. If we become bus manager, this will force bm_work() to send a bus
reset when it performs gap count optimization.
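
As an illustration only, the consistency check described above can be
sketched as a standalone function. This is not the kernel's build_tree();
the helper names are hypothetical, and the sketch assumes the gap count
occupies the 6-bit field starting at bit 16 of each node's first self-ID
quadlet.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: gap count is a 6-bit field starting at bit 16. */
static unsigned int self_id_gap_count(uint32_t q)
{
	return (q >> 16) & 0x3f;
}

/*
 * Hypothetical helper: ids[] holds the first self-ID quadlet of each node.
 * Returns the gap count shared by all nodes, or 0 (an invalid value) if
 * any node disagrees or if no self-IDs were supplied.
 */
static unsigned int common_gap_count(const uint32_t *ids, size_t n)
{
	unsigned int gap;
	size_t i;

	if (n == 0)
		return 0;

	gap = self_id_gap_count(ids[0]);
	for (i = 1; i < n; i++) {
		if (self_id_gap_count(ids[i]) != gap)
			return 0;
	}

	return gap;
}
```

With the DSR-25 scenario from this message (one node at 44, one at 63),
such a check would yield 0, the value that marks the gap count as invalid.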

After a bus reset, there is no bus manager. We will almost always try to
become bus manager. Once we become bus manager, we will first determine
whether the root node is cycle master capable. Then, we will determine if
the gap count should be changed. If either the root node or the gap count
should be changed, we will generate a bus reset.

To determine if the root node is cycle master capable, we read its
configuration ROM. bm_work() will wait until we have finished trying to
read the configuration ROM.

However, an inconsistent gap count can make this take a long time.
read_config_rom() will read the first few quadlets from the config ROM. Due
to the gap count inconsistency, eventually one of the reads will time out.
When read_config_rom() fails, fw_device_init() calls it again until
MAX_RETRIES is reached. This takes 50+ seconds.

Once we give up trying to read the configuration ROM, bm_work() will wake
up, assume that the root node is not cycle master capable, and do a bus
reset. Hopefully, this will resolve the gap count inconsistency.

This change makes bm_work() check for an inconsistent gap count before
waiting for the root node's configuration ROM. If the gap count is
inconsistent, bm_work() will immediately do a bus reset. This eliminates
the 50+ second delay and rapidly brings the bus to a working state.

I considered that if the gap count is inconsistent, a PHY configuration
packet might not be successful, so it could be desirable to skip the PHY
configuration packet before the bus reset in this case. However, IEEE
1394a-2000 and IEEE 1394-2008 say that the bus manager may transmit a PHY
configuration packet before a bus reset when correcting a gap count error.
Since the standard endorses this, I decided it's safe to retain the PHY
configuration packet transmission.

Normally, after a topology change, we will reset the bus a maximum of 5
times to change the root node and perform gap count optimization. However,
if there is a gap count inconsistency, we must always generate a bus reset.
Otherwise the gap count inconsistency will persist and communication will
be unreliable. For that reason, if there is a gap count inconsistency, we
generate a bus reset even if we already reached the 5 reset limit.
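
The resulting order of checks can be modelled roughly as below. This is a
simplified sketch, not the actual bm_work(): the struct, enum, and function
names are hypothetical, and the real function handles more cases (choosing
a new root, sending the PHY configuration packet, and so on).

```c
#include <stdbool.h>

#define BM_RESET_LIMIT 5	/* the 5-reset limit mentioned above */

/* Hypothetical, reduced view of the state bm_work() looks at. */
struct bm_state {
	unsigned int gap_count;	/* 0 means self-IDs disagreed */
	bool root_rom_done;	/* finished trying to read root's config ROM */
	bool change_wanted;	/* root node or gap count should change */
	unsigned int retries;	/* resets issued since the topology change */
};

enum bm_action { BM_WAIT_FOR_ROM, BM_RESET_BUS, BM_DONE };

static enum bm_action bm_decide(struct bm_state *s)
{
	if (s->gap_count == 0) {
		/* Inconsistent gap count: reset now, ignoring the limit. */
		s->retries = 0;
		return BM_RESET_BUS;
	}
	if (!s->root_rom_done)
		return BM_WAIT_FOR_ROM;
	if (s->change_wanted && s->retries < BM_RESET_LIMIT) {
		s->retries++;
		return BM_RESET_BUS;
	}
	return BM_DONE;
}
```

The point of the model is the ordering: the gap count test comes before
the wait for the configuration ROM, so an inconsistent bus is reset
without waiting for reads that may never succeed.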

Signed-off-by: Adam Goldman <adamg@pobox.com>
Reference: https://sourceforge.net/p/linux1394/mailman/message/58727806/
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/firewire/core-card.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/firewire/core-card.c b/drivers/firewire/core-card.c
index 6ac5ff20a2fe..8aaa7fcb2630 100644
--- a/drivers/firewire/core-card.c
+++ b/drivers/firewire/core-card.c
@@ -429,7 +429,23 @@ static void bm_work(struct work_struct *work)
 	 */
 	card->bm_generation = generation;
 
-	if (root_device == NULL) {
+	if (card->gap_count == 0) {
+		/*
+		 * If self IDs have inconsistent gap counts, do a
+		 * bus reset ASAP. The config rom read might never
+		 * complete, so don't wait for it. However, still
+		 * send a PHY configuration packet prior to the
+		 * bus reset. The PHY configuration packet might
+		 * fail, but 1394-2008 8.4.5.2 explicitly permits
+		 * it in this case, so it should be safe to try.
+		 */
+		new_root_id = local_id;
+		/*
+		 * We must always send a bus reset if the gap count
+		 * is inconsistent, so bypass the 5-reset limit.
+		 */
+		card->bm_retries = 0;
+	} else if (root_device == NULL) {
 		/*
 		 * Either link_on is false, or we failed to read the
 		 * config rom.  In either case, pick another root.
-- 
2.43.0

