From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D7B023ED42 for ; Mon, 13 Jan 2025 14:08:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736777326; cv=none; b=HeRUPOH5diHsRvlznJL/Xf6ozgLJywTNCFJoE4W1oFYCswcmxV/tElmxS2a8kEmZr/qFFE/Jsz+89G5xiqYxOxMGO9+/qeuAomwD8H0nV+wpcOlpPeLJkuaVzYYjDsMpY7EhrilFauiiznsBs/eToUVjlaGFLIDp+0rnd+UlHPs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1736777326; c=relaxed/simple; bh=DFhJcOj4yvsA+mhOUAs7B5VqLEunr0m+ivU1XrbxDv4=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=kH3gqx8NmKaeGdcNUYBS1Sc6us4qbcVPGUUUCx1+pSZ7vLyTgCFoZtvNO9X9kPuHCZyGPRaLwePiCddYHRc2qstjXetaqhqM+rgjblNWbnSFVLHrXDJxNJElk/kf2Zcva62r2Zy9SnXxGYViKm6SVYbiktGQbQ8pZMHYuJqkXgU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=twNE3sFD; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="twNE3sFD" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 011A2C4CED6; Mon, 13 Jan 2025 14:08:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1736777326; bh=DFhJcOj4yvsA+mhOUAs7B5VqLEunr0m+ivU1XrbxDv4=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=twNE3sFDJdNtBA//VFal6WrCvOYK61cnSc19xHIwOtmPNQBEYCDgvH+eLkNcyGKB0 Xmti9VdndfS/KEeA37U7clv1cXrFXHm4MqCeHQ1VBG0mRT3jxTL2Jxum5tJbymR0ZG Imqe9C0nRCec7wllg20L3jBmyUtAgTeiRbCuKLPyTVFyA7pDEVKg4FQcYpf3t4VuI/ gfWupGHgfUmAg68qEzEdIQcCZfVu1jzn5GeWC0/5pXkJnTZrmu7B+/jOcezjTxyqSt kT+wx+P002zv3WO9iI6zqeGf1NggDDeoU4KBDW2a7INfbFJJBHejQAtPH0Vpk4bS+/ LRfMijOzq/Kyw== From: Pratyush Yadav To: Miquel Raynal Cc: Pratyush Yadav , Tudor Ambarus , Michael Walle , Richard Weinberger , Vignesh Raghavendra , Thomas Petazzoni , Steam Lin , linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/2] mtd: spi-nor: winbond: Add support for w25q01jv In-Reply-To: <871pxp798c.fsf@bootlin.com> (Miquel Raynal's message of "Mon, 30 Dec 2024 11:31:31 +0100") References: <20241224-winbond-6-12-rc1-nor-volatile-bit-v1-0-f7c4dff66182@bootlin.com> <20241224-winbond-6-12-rc1-nor-volatile-bit-v1-1-f7c4dff66182@bootlin.com> <871pxp798c.fsf@bootlin.com> Date: Mon, 13 Jan 2025 14:08:43 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain On Mon, Dec 30 2024, Miquel Raynal wrote: > Hello Pratyush, > > On 24/12/2024 at 21:15:41 GMT, Pratyush Yadav wrote: > >> On Tue, Dec 24 2024, Miquel Raynal wrote: [...] >>> As there are very few situations where this can actually happen, a >>> status register write being the most likely one, another possibility >>> might have been to use volatile writes instead of non-volatile writes, >>> as most of the deviation comes from the action of writing the bit. But >>> this would overlook other possible actions where both dies can be used >>> at the same time like a chip erase (or any erase over the die boundary >>> in general). This last approach would have the least impact but because >>> it does not feel like it is totally safe to use and because the impact >>> of the second solution presented above is also negligible, we keep this >>> second approach for now (which can be further tuned later if it appears >>> to be too impacting in the end). >> >> I am a bit confused by this paragraph. What do you mean by "this" in >> the > > "this" = "the race condition" > >> first sentence? What do status register writes have to do with the ready >> bit being racy? > > The bug that has been experienced followed this sequence: > - send the write enable command (non-volatile) > - wait for the ready/busy bit, ie. wait for the WEL bit to be set > because it is non-volatile write > - active die is ready, (but idle die is not!) > - enter 4-byte address mode, only the die that is ready processes the > command. > > We only observed the issue in this particular case which involves > writing the status register, because it is one of the very few commands > targeting all dies at the same time. > > I assume another sequence that might lead to a similar issue might be a > chip erasure, as all dies are involved in parallel, but maybe there are > other situations I did not think about which might be racy as well. > >> I would assume those would be nearly instant since >> status registers are usually volatile. What do volatile writes mean in >> this context? > > You are actually right. Status register bits can be volatile (in this > case writing the bits themselves is almost instant) but currently when > we allow this register to be writable by sending the write enable (06h) > command, the non-volatile way is used, ie. the state of the bit itself > is stored in non-volatile memory and write durations can vary from one > die to another. Okay, that is strange behaviour. Normally the status registers are always volatile, and don't have a non-volatile counterpart. > > Winbond chips (maybe this is a shared capability?) accepts another > command, "Write Enable for Volatile Status Register (50h)", which > specifically change the status register bits to use the volatile method. > > Hence, if the only situation we want to solve is the status register > access, then we may just enable this command (this is the third solution > I tried to explain in the commit log), but if we think there are other > racy situations, this approach is not complete and we must fallback to > one of the approaches listed above. I am not quite sure how you fix the write-enable-being-racy bug with your patch. If you look at the code, spi_nor_write_enable() only calls the write enable command (06h), and does not call spi_nor_wait_till_ready() after that. After the write enable, it immediately executes the program or erase operation. So you never actually wait for all dies to be ready after a write enable. You can see an example in spi_nor_write(). It does: spi_nor_write_enable() -> spi_nor_write_data() -> spi_nor_wait_till_ready() Do you have a consistent reproducer for the race? If so, does the patch actually somehow make the race go away? If so, I would be curious to know why. > >>> >>> However, the fixup, whatever which one we pick, must be applied on >>> multi-die chips, which hence must be properly flagged. The SFDP tables >>> implemented give a lot of information but the die details are part of an >>> optional table that is not implemented, hence we use a post parsing >>> fixup hook to set the params->n_dice value manually. >>> [...] -- Regards, Pratyush Yadav