From: Pratyush Yadav <pratyush@kernel.org>
To: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pratyush Yadav <pratyush@kernel.org>,
	 Tudor Ambarus <tudor.ambarus@linaro.org>,
	 Michael Walle <mwalle@kernel.org>,
	 Richard Weinberger <richard@nod.at>,
	 Vignesh Raghavendra <vigneshr@ti.com>,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	 Steam Lin <STLin2@winbond.com>,
	 linux-mtd@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mtd: spi-nor: winbond: Add support for w25q01jv
Date: Mon, 13 Jan 2025 14:08:43 +0000
Message-ID: <mafs0ldve7qms.fsf@kernel.org>
In-Reply-To: <871pxp798c.fsf@bootlin.com> (Miquel Raynal's message of "Mon, 30 Dec 2024 11:31:31 +0100")

On Mon, Dec 30 2024, Miquel Raynal wrote:

> Hello Pratyush,
>
> On 24/12/2024 at 21:15:41 GMT, Pratyush Yadav <pratyush@kernel.org> wrote:
>
>> On Tue, Dec 24 2024, Miquel Raynal wrote:
[...]
>>> As there are very few situations where this can actually happen, a
>>> status register write being the most likely one, another possibility
>>> might have been to use volatile writes instead of non-volatile writes,
>>> as most of the deviation comes from the action of writing the bit. But
>>> this would overlook other possible actions where both dies can be used
>>> at the same time like a chip erase (or any erase over the die boundary
>>> in general). This last approach would have the least impact but because
>>> it does not feel like it is totally safe to use and because the impact
>>> of the second solution presented above is also negligible, we keep this
>>> second approach for now (which can be further tuned later if it appears
>>> to be too impacting in the end).
>>
>> I am a bit confused by this paragraph. What do you mean by "this" in
>> the
>
> "this" = "the race condition"
>
>> first sentence? What do status register writes have to do with the ready
>> bit being racy?
>
> The bug that has been experienced followed this sequence:
> - send the write enable command (non-volatile)
> - wait for the ready/busy bit, ie. wait for the WEL bit to be set
>   because it is non-volatile write
> - active die is ready, (but idle die is not!)
> - enter 4-byte address mode, only the die that is ready processes the
>   command.
>
> We only observed the issue in this particular case which involves
> writing the status register, because it is one of the very few commands
> targeting all dies at the same time.
>
> I assume another sequence that might lead to a similar issue might be a
> chip erasure, as all dies are involved in parallel, but maybe there are
> other situations I did not think about which might be racy as well.
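
To make the sequence above concrete, here is a minimal toy model in C. It
is only a sketch, not driver code: struct toy_flash, the toy_* helpers and
the tick counts are all made up for illustration. It assumes, as described
above, that a command targeting all dies keeps each die busy for a
different time and that a busy die ignores the next command.

```c
#include <assert.h>
#include <stdbool.h>

#define N_DICE 2

/* Toy model (hypothetical, not the kernel driver). */
struct toy_flash {
	unsigned int busy_ticks[N_DICE]; /* remaining busy time per die */
	unsigned int active_die;         /* the die that answers status reads */
	bool addr_4byte[N_DICE];         /* per-die 4-byte address mode */
};

/* A command hitting all dies, e.g. a non-volatile status register write. */
void toy_write_all_dies(struct toy_flash *f, const unsigned int ticks[N_DICE])
{
	for (unsigned int i = 0; i < N_DICE; i++)
		f->busy_ticks[i] = ticks[i];
}

/* Let one unit of time pass on every die. */
void toy_tick(struct toy_flash *f)
{
	for (unsigned int i = 0; i < N_DICE; i++)
		if (f->busy_ticks[i])
			f->busy_ticks[i]--;
}

/* Poll the busy bit of the active die only. */
void toy_wait_active_die(struct toy_flash *f)
{
	assert(f->active_die < N_DICE);
	while (f->busy_ticks[f->active_die])
		toy_tick(f);
}

/* Poll the busy bit of every die in turn. */
void toy_wait_all_dies(struct toy_flash *f)
{
	for (unsigned int i = 0; i < N_DICE; i++)
		while (f->busy_ticks[i])
			toy_tick(f);
}

/* Enter 4-byte address mode; a die that is still busy ignores it. */
void toy_enter_4byte(struct toy_flash *f)
{
	for (unsigned int i = 0; i < N_DICE; i++)
		if (!f->busy_ticks[i])
			f->addr_4byte[i] = true;
}
```

With die 0 busy for 2 ticks and die 1 busy for 5, toy_wait_active_die()
followed by toy_enter_4byte() leaves die 1 in 3-byte mode, while
toy_wait_all_dies() followed by toy_enter_4byte() switches both dies.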


>
>> I would assume those would be nearly instant since
>> status registers are usually volatile. What do volatile writes mean in
>> this context?
>
> You are actually right. Status register bits can be volatile (in this
> case writing the bits themselves is almost instant) but currently when
> we allow this register to be writable by sending the write enable (06h)
> command, the non-volatile way is used, ie. the state of the bit itself
> is stored in non-volatile memory and write durations can vary from one
> die to another.

Okay, that is strange behaviour. Normally the status registers are
always volatile, and don't have a non-volatile counterpart.

>
> Winbond chips (maybe this is a shared capability?) accept another
> command, "Write Enable for Volatile Status Register (50h)", which
> specifically changes the status register bits to use the volatile method.
>
> Hence, if the only situation we want to solve is the status register
> access, then we may just enable this command (this is the third solution
> I tried to explain in the commit log), but if we think there are other
> racy situations, this approach is not complete and we must fall back to
> one of the approaches listed above.

I am not quite sure how your patch fixes the write-enable-being-racy
bug. If you look at the code, spi_nor_write_enable() only issues the
write enable command (06h) and does not call spi_nor_wait_till_ready()
afterwards. After the write enable, the caller immediately executes the
program or erase operation. So you never actually wait for all dies to
be ready after a write enable.

You can see an example in spi_nor_write(). It does:

    spi_nor_write_enable() -> spi_nor_write_data() -> spi_nor_wait_till_ready()
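
As a rough illustration of that ordering, here is a small self-contained
C sketch. It is not the spi-nor code: struct toy_trace and the toy_*
functions are hypothetical stand-ins that only record the order of the
calls, assuming the three-step sequence shown above.

```c
#include <assert.h>
#include <string.h>

#define TRACE_MAX 8

/* Hypothetical trace buffer, only for illustration. */
struct toy_trace {
	const char *step[TRACE_MAX];
	unsigned int n;
};

void toy_record(struct toy_trace *t, const char *name)
{
	assert(t->n < TRACE_MAX);
	t->step[t->n++] = name;
}

/* Return the position of a step in the trace, or -1 if it never ran. */
int toy_index_of(const struct toy_trace *t, const char *name)
{
	for (unsigned int i = 0; i < t->n; i++)
		if (strcmp(t->step[i], name) == 0)
			return (int)i;
	return -1;
}

/* Stand-ins for the three helpers; they do nothing but log the call. */
void toy_write_enable(struct toy_trace *t)
{
	toy_record(t, "write_enable");
}

void toy_write_data(struct toy_trace *t)
{
	toy_record(t, "write_data");
}

void toy_wait_till_ready(struct toy_trace *t)
{
	toy_record(t, "wait_till_ready");
}

/* Same order as the sequence above: no ready poll after write enable. */
void toy_write(struct toy_trace *t)
{
	toy_write_enable(t);
	toy_write_data(t);
	toy_wait_till_ready(t);
}
```

Running toy_write() on an empty trace records exactly three steps, and
the only ready poll comes after the data write, never between the write
enable and the operation it unlocks.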

Do you have a consistent reproducer for the race? If so, does the patch
actually make the race go away? If it does, I would be curious to know
why.

>
>>>
>>> However, the fixup, whichever one we pick, must be applied on
>>> multi-die chips, which hence must be properly flagged. The SFDP tables
>>> implemented give a lot of information but the die details are part of an
>>> optional table that is not implemented, hence we use a post parsing
>>> fixup hook to set the params->n_dice value manually.
>>>
[...]

-- 
Regards,
Pratyush Yadav

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
