* [PATCH v5] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Haren Myneni @ 2024-01-11  6:25 UTC
  To: linuxppc-dev; +Cc: nathanl, haren, npiggin, aneesh.kumar

The VAS allocate, modify and deallocate HCALLs return
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC when busy
and expect the OS to reissue the HCALL after that delay. But
msleep() will often sleep at least 20 msecs even though the
hypervisor suggests reissuing these HCALLs after 1 or 10 msecs.

The open and close VAS window functions hold a mutex and then
issue these HCALLs. So these operations can take longer than
necessary when multiple threads open or close windows
simultaneously, which can especially hurt performance when the
open/close APIs are repeated for each compression request.
On a large machine configuration which allows many simultaneous
open/close windows (e.g. 240 cores provide 4800 VAS credits), the
user can observe hung task traces in dmesg due to mutex contention
around the open/close HCALLs.

So use usleep_range() instead of msleep() to ensure the delay
stays close to the hinted value before the HCALL is reissued.

Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>

---
v1 -> v2:
- Use usleep_range instead of using RTAS sleep routine as
  suggested by Nathan
v2 -> v3:
- Sleep 10 msecs even for HCALL delays > 10 msecs, plus the other
  commit / comment changes suggested by Nathan and Ellerman.
v3 -> v4:
- Add more description to the commit log about the visible impact
  of the current code, as suggested by Aneesh
v4 -> v5:
- Use USEC_PER_MSEC macro in usleep_range as suggested by Aneesh
---
 arch/powerpc/platforms/pseries/vas.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
index 71d52a670d95..79ffe8868c04 100644
--- a/arch/powerpc/platforms/pseries/vas.c
+++ b/arch/powerpc/platforms/pseries/vas.c
@@ -38,7 +38,27 @@ static long hcall_return_busy_check(long rc)
 {
 	/* Check if we are stalled for some time */
 	if (H_IS_LONG_BUSY(rc)) {
-		msleep(get_longbusy_msecs(rc));
+		unsigned int ms;
+		/*
+		 * The allocate, modify and deallocate HCALLs return
+		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
+		 * for the long delay, so the sleep time should always
+		 * be either 1 or 10 msecs. But in case the HCALL
+		 * returns a long delay > 10 msecs, clamp the sleep
+		 * time to 10 msecs.
+		 */
+		ms = clamp(get_longbusy_msecs(rc), 1, 10);
+
+		/*
+		 * msleep() will often sleep at least 20 msecs even
+		 * though the hypervisor suggests that the OS reissue
+		 * HCALLs after 1 or 10msecs. Also the delay hint from
+		 * the HCALL is just a suggestion. So OK to pause for
+		 * less time than the hinted delay. Use usleep_range()
+		 * to ensure we don't sleep much longer than actually
+		 * needed.
+		 */
+		usleep_range(ms * 100, ms * USEC_PER_MSEC);
 		rc = H_BUSY;
 	} else if (rc == H_BUSY) {
 		cond_resched();
-- 
2.26.3
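
For context, the VAS HCALL wrappers in vas.c call hcall_return_busy_check()
in a retry loop and reissue the HCALL as long as it reports H_BUSY. Below is
a minimal sketch of that pattern; the wrapper name and arguments are
illustrative, not the driver's exact signatures:

static long h_allocate_vas_window_sketch(u64 flags, u64 type)
{
	unsigned long retbuf[PLPAR_HCALL9_BUFSIZE] = { 0 };
	long rc;

	do {
		rc = plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, flags, type);
		/* Sleeps 1-10 msecs for H_LONG_BUSY_* and maps it to H_BUSY */
		rc = hcall_return_busy_check(rc);
	} while (rc == H_BUSY);

	return rc;
}

With msleep(), each pass through this loop could stall for 20+ msecs while
the caller holds the window mutex; with usleep_range() the pause stays
within the hypervisor's 1-10 msec hint.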



* Re: [PATCH v5] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Nathan Lynch @ 2024-01-11 17:27 UTC
  To: Haren Myneni, linuxppc-dev; +Cc: aneesh.kumar, npiggin

Haren Myneni <haren@linux.ibm.com> writes:
> The VAS allocate, modify and deallocate HCALLs return
> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC when busy
> and expect the OS to reissue the HCALL after that delay. But
> msleep() will often sleep at least 20 msecs even though the
> hypervisor suggests reissuing these HCALLs after 1 or 10 msecs.
>
> The open and close VAS window functions hold a mutex and then
> issue these HCALLs. So these operations can take longer than
> necessary when multiple threads open or close windows
> simultaneously, which can especially hurt performance when the
> open/close APIs are repeated for each compression request.
> On a large machine configuration which allows many simultaneous
> open/close windows (e.g. 240 cores provide 4800 VAS credits), the
> user can observe hung task traces in dmesg due to mutex contention
> around the open/close HCALLs.

Is this because the workload queues enough tasks on the mutex to trigger
the hung task watchdog? With a threshold of 120 seconds, something on
the order of ~6000 tasks each taking 20ms or more to traverse this
critical section would cause the problem I think you're describing.
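
(Rough arithmetic, assuming the default 120-second hung task timeout:

	120 s / 20 ms minimum per msleep()-delayed open/close ~= 6000 queued tasks

so a 4800-credit system lands in the same ballpark once HCALL service time
and scheduling latency are added on top of the 20 ms.)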

Presumably this change improves the situation, but the commit message
isn't explicit. Have you measured the "throughput" of window open/close
activity before and after? Anything that quantifies the improvement
would be welcome.

> diff --git a/arch/powerpc/platforms/pseries/vas.c b/arch/powerpc/platforms/pseries/vas.c
> index 71d52a670d95..79ffe8868c04 100644
> --- a/arch/powerpc/platforms/pseries/vas.c
> +++ b/arch/powerpc/platforms/pseries/vas.c
> @@ -38,7 +38,27 @@ static long hcall_return_busy_check(long rc)
>  {
>  	/* Check if we are stalled for some time */
>  	if (H_IS_LONG_BUSY(rc)) {
> -		msleep(get_longbusy_msecs(rc));
> +		unsigned int ms;
> +		/*
> +		 * The allocate, modify and deallocate HCALLs return
> +		 * H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC
> +		 * for the long delay, so the sleep time should always
> +		 * be either 1 or 10 msecs. But in case the HCALL
> +		 * returns a long delay > 10 msecs, clamp the sleep
> +		 * time to 10 msecs.
> +		 */
> +		ms = clamp(get_longbusy_msecs(rc), 1, 10);
> +
> +		/*
> +		 * msleep() will often sleep at least 20 msecs even
> +		 * though the hypervisor suggests that the OS reissue
> +		 * HCALLs after 1 or 10msecs. Also the delay hint from
> +		 * the HCALL is just a suggestion. So OK to pause for
> +		 * less time than the hinted delay. Use usleep_range()
> +		 * to ensure we don't sleep much longer than actually
> +		 * needed.
> +		 */
> +		usleep_range(ms * 100, ms * USEC_PER_MSEC);

                usleep_range(ms * (USEC_PER_MSEC / 10), ms * USEC_PER_MSEC);

is probably what reviewers want to see when they ask you to use
USEC_PER_MSEC. I.e. both arguments to usleep_range() should be expressed
in terms of USEC_PER_MSEC.
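
For the values in play here (USEC_PER_MSEC is 1000), the two spellings of
the lower bound are numerically identical, so this is a readability change
rather than a behavioral one:

	ms * 100 == ms * (USEC_PER_MSEC / 10)	/* 100 us for ms = 1, 1000 us for ms = 10 */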


* Re: [PATCH v5] powerpc/pseries/vas: Use usleep_range() to support HCALL delay
From: Haren Myneni @ 2024-01-16  2:36 UTC
  To: Nathan Lynch, linuxppc-dev; +Cc: aneesh.kumar, npiggin



On 1/11/24 9:27 AM, Nathan Lynch wrote:
> Haren Myneni <haren@linux.ibm.com> writes:
>> The VAS allocate, modify and deallocate HCALLs return
>> H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC when busy
>> and expect the OS to reissue the HCALL after that delay. But
>> msleep() will often sleep at least 20 msecs even though the
>> hypervisor suggests reissuing these HCALLs after 1 or 10 msecs.
>>
>> The open and close VAS window functions hold a mutex and then
>> issue these HCALLs. So these operations can take longer than
>> necessary when multiple threads open or close windows
>> simultaneously, which can especially hurt performance when the
>> open/close APIs are repeated for each compression request.
>> On a large machine configuration which allows many simultaneous
>> open/close windows (e.g. 240 cores provide 4800 VAS credits), the
>> user can observe hung task traces in dmesg due to mutex contention
>> around the open/close HCALLs.
> 
> Is this because the workload queues enough tasks on the mutex to trigger
> the hung task watchdog? With a threshold of 120 seconds, something on
> the order of ~6000 tasks each taking 20ms or more to traverse this
> critical section would cause the problem I think you're describing.
> 
> Presumably this change improves the situation, but the commit message
> isn't explicit. Have you measured the "throughput" of window open/close
> activity before and after? Anything that quantifies the improvement
> would be welcome.

Yes, this was tested on a large system which allows 4800 windows to be 
opened/closed at the same time (meaning 4800 tasks). Some tasks slept 
more than 20 msecs and some hit hung task traces since the combined 
wait time was more than 120 seconds. With this patch the maximum sleep 
is 10 msecs and I did not see these traces on this system. I will add 
more description to the commit log.

Thanks
Haren





