* [PATCH v3] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Haren Myneni @ 2023-12-03 2:01 UTC
To: linuxppc-dev; +Cc: nathanl, Haren Myneni, npiggin
The VAS allocate, modify and deallocate HCALLs return
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC when busy
and expect the OS to reissue the HCALL after that delay. But
msleep() will often sleep for at least 20 msecs even though the
hypervisor suggests that the OS reissue these HCALLs after 1 or
10 msecs.
The open and close VAS window functions hold a mutex while issuing
these HCALLs, so these operations can take longer than necessary
when multiple threads call the open or close window APIs
simultaneously.
So instead of msleep(), use usleep_range() so that the sleep does
not run much past the hinted delay before the HCALL is reissued.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
suggested by Nathan
v2 -> v3:
- Sleep 10MSecs even for HCALL delay > 10MSecs and the other
commit / comment changes as suggested by Nathan and Ellerman.
---
arch/powerpc/platforms/pseries/vas.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..5cf81c564d4b 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -38,7 +38,30 @@ static long hcall_return_busy_check(long rc)
{
/* Check if we are stalled for some time */
if (H_IS_LONG_BUSY(rc)) {
- msleep(get_longbusy_msecs(rc));
+ unsigned int ms;
+ /*
+ * Allocate, Modify and Deallocate HCALLs returns
+ * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+ * for the long delay. So the sleep time should always
+ * be either 1 or 10msecs, but in case if the HCALL
+ * returns the long delay > 10 msecs, clamp the sleep
+ * time to 10msecs.
+ */
+ ms = clamp(get_longbusy_msecs(rc), 1, 10);
+
+ /*
+ * msleep() will often sleep at least 20 msecs even
+ * though the hypervisor suggests that the OS reissue
+ * HCALLs after 1 or 10msecs. Also the delay hint from
+ * the HCALL is just a suggestion. So OK to pause for
+ * less time than the hinted delay. Use usleep_range()
+ * to ensure we don't sleep much longer than actually
+ * needed.
+ *
+ * See Documentation/timers/timers-howto.rst for
+ * explanation of the range used here.
+ */
+ usleep_range(ms * 100, ms * 1000);
rc = H_BUSY;
} else if (rc == H_BUSY) {
cond_resched();
--
2.26.3
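The range arithmetic in the hunk above can be checked outside the kernel.
What follows is a minimal userspace sketch, not the kernel code itself:
clamp_uint(), struct sleep_window and vas_sleep_window() are hypothetical
names introduced only for illustration, and the clamp bounds (1, 10) and
the multipliers (100, 1000) are copied from the patch.

```c
/* Hypothetical stand-in for the kernel's clamp(): force v into [lo, hi]. */
static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Bounds, in microseconds, that would be handed to usleep_range(). */
struct sleep_window {
	unsigned int min_us;
	unsigned int max_us;
};

/*
 * Mirror of the patch arithmetic: clamp the hinted delay to 1..10 msecs,
 * then sleep for between one tenth of the hint and the full hint.
 */
static struct sleep_window vas_sleep_window(unsigned int hint_ms)
{
	unsigned int ms = clamp_uint(hint_ms, 1, 10);
	struct sleep_window w = { ms * 100, ms * 1000 };

	return w;
}
```

Under these assumptions a 1 msec hint maps to a 100..1000 usec window and a
10 msec hint to 1000..10000 usecs. Any larger hint is clamped to 10 msecs
and gets the same window as the 10 msec case.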
* Re: [PATCH v3] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Aneesh Kumar K.V @ 2023-12-04 14:05 UTC
To: Haren Myneni, linuxppc-dev; +Cc: nathanl, Haren Myneni, npiggin
Haren Myneni <haren@linux.ibm.com> writes:
> VAS allocate, modify and deallocate HCALLs returns
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
> delay and expects OS to reissue HCALL after that delay. But using
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
> The open and close VAS window functions hold mutex and then issue
> these HCALLs. So these operations can take longer than the
> necessary when multiple threads issue open or close window APIs
> simultaneously.
>
> So instead of msleep(), use usleep_range() to ensure sleep with
> the expected value before issuing HCALL again.
>
Can you summarize whether there is a user-observable impact with the
current code? We have other code paths using
msleep(get_longbusy_msecs()). Should we audit those usages?
>
> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>
> ---
> v1 -> v2:
> - Use usleep_range instead of using RTAS sleep routine as
> suggested by Nathan
> v2 -> v3:
> - Sleep 10MSecs even for HCALL delay > 10MSecs and the other
> commit / comment changes as suggested by Nathan and Ellerman.
> ---
> arch/powerpc/platforms/pseries/vas.c | 25 ++++++++++++++++++++++++-
> 1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..5cf81c564d4b 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -38,7 +38,30 @@ static long hcall_return_busy_check(long rc)
> {
> /* Check if we are stalled for some time */
> if (H_IS_LONG_BUSY(rc)) {
> - msleep(get_longbusy_msecs(rc));
> + unsigned int ms;
> + /*
> + * Allocate, Modify and Deallocate HCALLs returns
> + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> + * for the long delay. So the sleep time should always
> + * be either 1 or 10msecs, but in case if the HCALL
> + * returns the long delay > 10 msecs, clamp the sleep
> + * time to 10msecs.
> + */
> + ms = clamp(get_longbusy_msecs(rc), 1, 10);
> +
> + /*
> + * msleep() will often sleep at least 20 msecs even
> + * though the hypervisor suggests that the OS reissue
> + * HCALLs after 1 or 10msecs. Also the delay hint from
> + * the HCALL is just a suggestion. So OK to pause for
> + * less time than the hinted delay. Use usleep_range()
> + * to ensure we don't sleep much longer than actually
> + * needed.
> + *
> + * See Documentation/timers/timers-howto.rst for
> + * explanation of the range used here.
> + */
> + usleep_range(ms * 100, ms * 1000);
> rc = H_BUSY;
> } else if (rc == H_BUSY) {
> cond_resched();
> --
> 2.26.3
* Re: [PATCH v3] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Haren Myneni @ 2023-12-05 9:21 UTC
To: Aneesh Kumar K.V (IBM), linuxppc-dev; +Cc: nathanl, npiggin
On 12/4/23 6:05 AM, Aneesh Kumar K.V (IBM) wrote:
> Haren Myneni <haren@linux.ibm.com> writes:
>
>> VAS allocate, modify and deallocate HCALLs returns
>> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
>> delay and expects OS to reissue HCALL after that delay. But using
>> msleep() will often sleep at least 20 msecs even though the
>> hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.
>> The open and close VAS window functions hold mutex and then issue
>> these HCALLs. So these operations can take longer than the
>> necessary when multiple threads issue open or close window APIs
>> simultaneously.
>>
>> So instead of msleep(), use usleep_range() to ensure sleep with
>> the expected value before issuing HCALL again.
>>
>
> Can you summarize whether there is a user-observable impact with the
> current code? We have other code paths using
> msleep(get_longbusy_msecs()). Should we audit those usages?
As mentioned in the description, the open and close VAS window APIs can
take longer when they are called simultaneously. This can especially
affect performance when an application repeats open/close for each
compression request. On large machine configurations, which allow more
simultaneous open windows (e.g. 240 cores provide 4800 VAS credits), the
user can observe mutex contention around the open/close HCALLs and hung-up
traces in dmesg. I will repost the patch with this update in the commit
message.
I think a similar approach is applicable to other HCALLs (like in
rtas_busy_delay()), but I have not seen any impact so far with other
HCALLs. So we can add this change later.
Thanks
Haren
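The helper in the patch turns a long-busy return into H_BUSY after sleeping,
which suggests that callers reissue the HCALL in a loop until something
other than H_BUSY comes back. The userspace sketch below simulates that
pattern under stated assumptions: every sim_* name and the SIM_* codes are
hypothetical, the code values are not the real PAPR constants, and sleeping
is replaced by adding up the upper bound of each sleep window.

```c
/* Hypothetical return codes; the values are illustrative only. */
enum {
	SIM_SUCCESS = 0,
	SIM_BUSY,
	SIM_LONG_BUSY_1_MSEC,
	SIM_LONG_BUSY_10_MSEC,
};

/* State of a simulated hypervisor call. */
struct sim_hcall {
	int busy_left;			/* long-busy replies still to come */
	int calls;			/* how many times the call was issued */
	unsigned int slept_us_max;	/* sum of upper sleep bounds, in usecs */
};

/* Simulated HCALL: reports a 10 msec long-busy until busy_left runs out. */
static long sim_hcall_issue(struct sim_hcall *h)
{
	h->calls++;
	if (h->busy_left > 0) {
		h->busy_left--;
		return SIM_LONG_BUSY_10_MSEC;
	}
	return SIM_SUCCESS;
}

/* Simulated busy check: record the sleep bound, then ask for a retry. */
static long sim_busy_check(struct sim_hcall *h, long rc)
{
	if (rc == SIM_LONG_BUSY_1_MSEC || rc == SIM_LONG_BUSY_10_MSEC) {
		unsigned int ms = (rc == SIM_LONG_BUSY_1_MSEC) ? 1 : 10;

		/* Upper bound of the window; a real caller would sleep. */
		h->slept_us_max += ms * 1000;
		return SIM_BUSY;
	}
	return rc;
}

/* Reissue the simulated HCALL until it stops reporting busy. */
static long sim_call_until_done(struct sim_hcall *h)
{
	long rc;

	do {
		rc = sim_hcall_issue(h);
		rc = sim_busy_check(h, rc);
	} while (rc == SIM_BUSY);

	return rc;
}
```

With busy_left set to 3, the loop issues the call four times and accounts
for at most 30000 usecs of sleep. If each retry instead slept for the
20 msecs mentioned in the commit message, the same sequence would take at
least 60 msecs, all of it with the window mutex held.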
>
>
>>
>> Signed-off-by: Haren Myneni <haren@linux.ibm.com>
>> Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> ---
>> v1 -> v2:
>> - Use usleep_range instead of using RTAS sleep routine as
>> suggested by Nathan
>> v2 -> v3:
>> - Sleep 10MSecs even for HCALL delay > 10MSecs and the other
>> commit / comment changes as suggested by Nathan and Ellerman.
>> ---
>> arch/powerpc/platforms/pseries/vas.c | 25 ++++++++++++++++++++++++-
>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
>> index 71d52a670d95..5cf81c564d4b 100644
>> --- a/arch/powerpc/platforms/pseries/vas.c
>> +++ b/arch/powerpc/platforms/pseries/vas.c
>> @@ -38,7 +38,30 @@ static long hcall_return_busy_check(long rc)
>> {
>> /* Check if we are stalled for some time */
>> if (H_IS_LONG_BUSY(rc)) {
>> - msleep(get_longbusy_msecs(rc));
>> + unsigned int ms;
>> + /*
>> + * Allocate, Modify and Deallocate HCALLs returns
>> + * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
>> + * for the long delay. So the sleep time should always
>> + * be either 1 or 10msecs, but in case if the HCALL
>> + * returns the long delay > 10 msecs, clamp the sleep
>> + * time to 10msecs.
>> + */
>> + ms = clamp(get_longbusy_msecs(rc), 1, 10);
>> +
>> + /*
>> + * msleep() will often sleep at least 20 msecs even
>> + * though the hypervisor suggests that the OS reissue
>> + * HCALLs after 1 or 10msecs. Also the delay hint from
>> + * the HCALL is just a suggestion. So OK to pause for
>> + * less time than the hinted delay. Use usleep_range()
>> + * to ensure we don't sleep much longer than actually
>> + * needed.
>> + *
>> + * See Documentation/timers/timers-howto.rst for
>> + * explanation of the range used here.
>> + */
>> + usleep_range(ms * 100, ms * 1000);
>> rc = H_BUSY;
>> } else if (rc == H_BUSY) {
>> cond_resched();
>> --
>> 2.26.3