* CXL/region : commit reset of out of order region appears to succeed.
From: Jonathan Cameron @ 2023-03-16 17:14 UTC (permalink / raw)
To: linux-cxl, dan.j.williams
Ran into this whilst testing a fix for QEMU uncommit handling.
To replicate:
1) Set up two regions on a directly connected Type 3 device and commit them both.
2) Uncommit the first region once. (It fails with an out-of-order message.)
Note that from here on the sysfs commit attribute reads as 0.
3) Uncommit that first region again. It appears to succeed.
Reason is easy to track down:
https://elixir.bootlin.com/linux/v6.3-rc2/source/drivers/cxl/core/region.c#L257
commit_store() of 0 unconditionally sets the state to CXL_CONFIG_RESET_PENDING.
When the decoder reset fails, that state is left set.
Hence the next call treats the region as already uncommitted and drops straight through.
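The sequence above can be modelled in a few lines of userspace C. This is only a sketch, not the kernel code: the names (config_state, struct region, reset_rc, commit_show, uncommit_no_rollback) are made up for illustration, the set and order of states is an assumption, and -EBUSY is just a stand-in for whatever the failed decoder reset returns. Locking is left out entirely.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the region config states. Only the relative
 * order matters here; the exact set and order are assumptions.
 */
enum config_state {
	CONFIG_ACTIVE,
	CONFIG_RESET_PENDING,
	CONFIG_COMMIT,
};

struct region {
	enum config_state state;
	int reset_rc;	/* hypothetical: result the decoder reset will report */
};

/* Model of reading the sysfs commit attribute: 1 only once committed. */
bool commit_show(const struct region *r)
{
	return r->state >= CONFIG_COMMIT;
}

/* Model of writing 0 to commit, without any rollback on failure. */
int uncommit_no_rollback(struct region *r)
{
	/* Already in the requested state? Then there is nothing to do. */
	if (r->state < CONFIG_COMMIT)
		return 0;

	r->state = CONFIG_RESET_PENDING;	/* set unconditionally */

	if (r->reset_rc)
		return r->reset_rc;	/* state stays RESET_PENDING */

	r->state = CONFIG_ACTIVE;
	return 0;
}
```

Starting from a region in CONFIG_COMMIT with reset_rc set to -EBUSY, the first call to uncommit_no_rollback() returns -EBUSY and leaves the state at CONFIG_RESET_PENDING, so commit_show() reads 0. The second call returns 0 without attempting any reset, which matches step 3.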
Whilst it's easy to 'fix' the superficial issue by resetting the state to the previous
value on error, I'm not sure that's sufficient or race free.
Hence a report rather than a patch. I can look into this in more depth, but
a few other things come before it on my list.
Thanks,
Jonathan
P.S. I hope to send the QEMU fix for uncommit fairly soon.
* RE: CXL/region : commit reset of out of order region appears to succeed.
From: Dan Williams @ 2023-06-17 0:26 UTC (permalink / raw)
To: Jonathan Cameron, linux-cxl, dan.j.williams
Jonathan Cameron wrote:
> Ran into this whilst testing fix for QEMU uncommit handling.
>
> To replicate.
> 1) Setup two regions on a direct connected Type 3 and commit them both.
> 2) Uncommit the first region once. (it fails with an out of order message)
> Note that from here on the sysfs commit attribute reads as 0.
> 3) Uncommit that first region again. It appears to succeed.
>
> Reason is easy to track down:
> https://elixir.bootlin.com/linux/v6.3-rc2/source/drivers/cxl/core/region.c#L257
>
> commit_store() of 0 unconditionally sets the state to CXL_CONFIG_RESET_PENDING
>
> When the decoder reset fails, that is left set.
> Hence next call drops straight through.
>
> Whilst it's easy to 'fix' the superficial issue by resetting the state to the previous
> value on error, I'm not sure that's sufficient or race free.
I think it is sufficient because the state transition is happening under
the lock and RESET_PENDING > ACTIVE. So any paths that depend on the
region not being active will be protected.
On the other side, if someone races to commit the region via another
thread while the lock is dropped they will either successfully
transition the region back to the COMMIT state, or will re-attempt
the reset. When both those threads re-acquire the lock one of them will
see that the reset state can advance back to ACTIVE, or will see that
someone snuck in and committed the region again while the lock was
dropped.
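To make that argument concrete, here is a rough single-threaded model in userspace C. It is a sketch under assumptions, not a proposed patch: the state names and order (beyond RESET_PENDING > ACTIVE), struct region, reset_rc, the unlocked_hook callback and racing_commit() are all hypothetical. The hook stands in for whatever another thread does while the lock is dropped, and rolling back to the committed state on failure is one reading of "the previous value".

```c
#include <errno.h>
#include <stddef.h>

/* Order is an assumption beyond RESET_PENDING > ACTIVE stated above. */
enum config_state {
	CONFIG_ACTIVE,
	CONFIG_RESET_PENDING,
	CONFIG_COMMIT,
};

struct region {
	enum config_state state;
	int reset_rc;	/* hypothetical: result the decoder reset will report */
};

/* Hypothetical hook: runs at the point where the lock would be dropped. */
typedef void (*unlocked_hook)(struct region *r);

/* Stand-in for another thread committing the region in that window. */
void racing_commit(struct region *r)
{
	r->state = CONFIG_COMMIT;
}

/* Model of writing 0 to commit, restoring the old state on failure. */
int uncommit_with_rollback(struct region *r, unlocked_hook hook)
{
	int rc;

	/* Already uncommitted? Nothing to do. */
	if (r->state < CONFIG_COMMIT)
		return 0;

	r->state = CONFIG_RESET_PENDING;

	/* Lock dropped here; other threads may change the state. */
	if (hook)
		hook(r);

	/* Lock re-acquired: revalidate that the reset is still pending. */
	if (r->state != CONFIG_RESET_PENDING)
		return 0;	/* someone committed the region again */

	rc = r->reset_rc;
	/* Failure goes back to committed, success advances to active. */
	r->state = rc ? CONFIG_COMMIT : CONFIG_ACTIVE;
	return rc;
}
```

In this model a failing reset leaves the region in CONFIG_COMMIT, so a second uncommit attempt fails the same way instead of silently succeeding. If racing_commit() is passed as the hook, the revalidation sees that the state is no longer CONFIG_RESET_PENDING and leaves the region committed.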