public inbox for linux-rdma@vger.kernel.org
* [PATCH v2 0/4] ibacm: acmp retry issues
@ 2017-11-13 21:23 Michael J. Ruhl
       [not found] ` <20171113212220.24293.97479.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Michael J. Ruhl @ 2017-11-13 21:23 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

While testing the retry mechanism for the acmp provider, I observed
that the retry event_wait() did not appear to be working correctly.
After studying the issue a bit more, I discovered that there were a
couple of issues.

This patch set addresses those issues.

v2 - added patch for MONOTONIC time base

---

Michael J. Ruhl (4):
      ibacm: Fix an incorrect expiration check for the retry timer
      ibacm: Calculate correct tv_nsec value in event_wait()
      ibacm: Fix a retry loop calculation race condition
      ibacm: Use MONOTONIC time base to avoid timer expiration issues


 ibacm/linux/osd.h          |   25 ++++++++++++++++---------
 ibacm/prov/acmp/src/acmp.c |   12 +++++++-----
 2 files changed, 23 insertions(+), 14 deletions(-)

-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v2 1/4] ibacm: Fix an incorrect expiration check for the retry timer
       [not found] ` <20171113212220.24293.97479.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
@ 2017-11-13 21:24   ` Michael J. Ruhl
  2017-11-13 21:24   ` [PATCH v2 2/4] ibacm: Calculate correct tv_nsec value in event_wait() Michael J. Ruhl
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Michael J. Ruhl @ 2017-11-13 21:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The acmp_process_wait_queue() checks to see if a message expiration
time has passed.

Because the check uses less than (<), a message whose expires value
equals the current time is not treated as expired.  The resulting
timeout value is 0, and the wait loop spins until the next millisecond
has passed.

Using example values to demonstrate the issue, we can see:

With '<':
wait = -2106577636 (no work)
wait = 2510        (message wait)
(process spins)
wait = 0           (expires - current time == 0)
wait = 0
wait = 0
...                (1 ms of output)
wait = 0
wait = -2106580147 (retry complete)
wait = 2512

With '<=':
wait = -2106688780 (no work)
wait = 2512        (message wait)
(process sleeps)
wait = -2106691293 (retry complete)
wait = 2512

Expire the message if its expires value is less than or equal to the
current time.
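
The boundary case can be looked at in isolation.  The following is a
minimal sketch, not the acmp code: msg_expired_lt(), msg_expired_le()
and compute_wait() are hypothetical helpers, and the millisecond values
passed to them are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old check in acmp_process_wait_queue(). */
static int msg_expired_lt(uint64_t expires, uint64_t now)
{
	return expires < now;
}

/* Hypothetical stand-in for the check after this patch. */
static int msg_expired_le(uint64_t expires, uint64_t now)
{
	return expires <= now;
}

/* Wait (in ms) that the retry loop derives from the earliest expiration. */
static int compute_wait(uint64_t next_expire, uint64_t now)
{
	return (int) (next_expire - now);
}
```

With expires == now, msg_expired_lt() reports the message as still
pending while compute_wait() returns 0, which is the combination that
makes the loop spin.  msg_expired_le() retires the message at that
instant, so the zero-length wait is never computed for it.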

Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 ibacm/prov/acmp/src/acmp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c
index 78d9a29..d707b8e 100644
--- a/ibacm/prov/acmp/src/acmp.c
+++ b/ibacm/prov/acmp/src/acmp.c
@@ -1507,7 +1507,7 @@ static void acmp_process_wait_queue(struct acmp_ep *ep, uint64_t *next_expire)
 	struct ibv_send_wr *bad_wr;
 
 	list_for_each_safe(&ep->wait_queue, msg, next, entry) {
-		if (msg->expires < time_stamp_ms()) {
+		if (msg->expires <= time_stamp_ms()) {
 			list_del(&msg->entry);
 			(void) atomic_dec(&wait_cnt);
 			if (--msg->tries) {


* [PATCH v2 2/4] ibacm: Calculate correct tv_nsec value in event_wait()
       [not found] ` <20171113212220.24293.97479.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
  2017-11-13 21:24   ` [PATCH v2 1/4] ibacm: Fix an incorrect expiration check for the retry timer Michael J. Ruhl
@ 2017-11-13 21:24   ` Michael J. Ruhl
  2017-11-13 21:24   ` [PATCH v2 3/4] ibacm: Fix a retry loop calculation race condition Michael J. Ruhl
  2017-11-13 21:25   ` [PATCH v2 4/4] ibacm: Use MONOTONIC time base to avoid timer expiration issues Michael J. Ruhl
  3 siblings, 0 replies; 6+ messages in thread
From: Michael J. Ruhl @ 2017-11-13 21:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The event_wait() function calculates a tv_nsec value based on the
given timeout.  If the calculated tv_nsec value ends up larger
than 1 second, pthread_cond_timedwait() will return EINVAL
and will not wait.

This causes the retry loop to spin (busy wait) until the actual
timeout occurs.

Ensure that the tv_nsec value is less than 1 second.
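
The normalization can be sketched as a standalone helper.  This is an
illustration under assumptions, not the osd.h code: timespec_add_ms()
and NSEC_PER_SEC are hypothetical names, and the caller is assumed to
pass a timespec whose tv_nsec is already in range.  POSIX requires
tv_nsec to be below 1000 million, so the sketch carries on >= rather
than on >.

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L   /* hypothetical name for this sketch */

/* Add timeout_ms to *ts and keep tv_nsec in [0, NSEC_PER_SEC). */
static void timespec_add_ms(struct timespec *ts, unsigned int timeout_ms)
{
	ts->tv_sec += timeout_ms / 1000;
	ts->tv_nsec += (long) (timeout_ms % 1000) * 1000000L;

	/* Both addends are below one second, so one carry is enough. */
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_sec++;
		ts->tv_nsec -= NSEC_PER_SEC;
	}
}
```

For example, adding 250 ms to { 10 s, 900000000 ns } carries into the
seconds field and leaves tv_nsec at 150000000.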

Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 ibacm/linux/osd.h |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/ibacm/linux/osd.h b/ibacm/linux/osd.h
index 95713e6..d6cbb8f 100644
--- a/ibacm/linux/osd.h
+++ b/ibacm/linux/osd.h
@@ -92,6 +92,7 @@ static inline void event_init(event_t *e)
 	pthread_mutex_init(&e->mutex, NULL);
 }
 #define event_signal(e)	pthread_cond_signal(&(e)->cond)
+#define ONE_SEC_IN_NSEC  1000000000
 static inline int event_wait(event_t *e, int timeout)
 {
 	struct timeval curtime;
@@ -101,6 +102,10 @@ static inline int event_wait(event_t *e, int timeout)
 	gettimeofday(&curtime, NULL);
 	wait.tv_sec = curtime.tv_sec + ((unsigned) timeout) / 1000;
 	wait.tv_nsec = (curtime.tv_usec + (((unsigned) timeout) % 1000) * 1000) * 1000;
+	if (wait.tv_nsec > ONE_SEC_IN_NSEC) {
+		wait.tv_sec++;
+		wait.tv_nsec -= ONE_SEC_IN_NSEC;
+	}
 	pthread_mutex_lock(&e->mutex);
 	ret = pthread_cond_timedwait(&e->cond, &e->mutex, &wait);
 	pthread_mutex_unlock(&e->mutex);


* [PATCH v2 3/4] ibacm: Fix a retry loop calculation race condition
       [not found] ` <20171113212220.24293.97479.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
  2017-11-13 21:24   ` [PATCH v2 1/4] ibacm: Fix an incorrect expiration check for the retry timer Michael J. Ruhl
  2017-11-13 21:24   ` [PATCH v2 2/4] ibacm: Calculate correct tv_nsec value in event_wait() Michael J. Ruhl
@ 2017-11-13 21:24   ` Michael J. Ruhl
  2017-11-13 21:25   ` [PATCH v2 4/4] ibacm: Use MONOTONIC time base to avoid timer expiration issues Michael J. Ruhl
  3 siblings, 0 replies; 6+ messages in thread
From: Michael J. Ruhl @ 2017-11-13 21:24 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The retry loop calculation uses a conversion to int of an unsigned
64 bit number (next_expire) minus the current time to decide if
event_wait() should be called.  This calculation works correctly as
long as the next_expire value is not the default value (-1).

If next_expire is the default value, this subtraction can periodically
result in a very large positive timeout value (days rather than
milliseconds).

For example:
next_expire  = 0xFFFFFFFFFFFFFFFF  (-1)
current_ms = 0x15f7db52146  (today's ms since 1970)

max_delay_ms = (int) (next_expire - future_ms)

future_ms  = 0x15f80000000  => max_delay_ms = 2147483647
future_ms  = 0x16080000000  => max_delay_ms = 2147483647

Converting max_delay_ms to days:
2147483647 / 1000 / 60 / 60 / 24 == 24 days

The two future_ms values are 2^32 ms apart:
0x16080000000 - 0x15f80000000 = 4294967296

so this issue repeats about every 49.7 days.

This calculation can occur when message expirations are handled without
next_expire being updated, so that it keeps its default value.  If
wait_cnt is incremented before the wait calculation is done (the race
condition), event_wait() can be called with the potentially very large
value.

If next_expire is not updated, do not do the wait calculation and
avoid the race condition.
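
The arithmetic in the example can be reproduced with a small sketch.
It is an illustration only: wrap_to_int32() and default_expire_wait()
are hypothetical helpers, and the sketch assumes the common ABI
behavior where converting a uint64_t to int keeps the low 32 bits (the
C standard leaves that conversion implementation-defined).

```c
#include <stdint.h>

/* Keep the low 32 bits of v and interpret them as a signed value. */
static int32_t wrap_to_int32(uint64_t v)
{
	uint32_t low = (uint32_t) v;

	if (low <= (uint32_t) INT32_MAX)
		return (int32_t) low;
	/* Map the upper half of the 32-bit range onto negative values. */
	return (int32_t) ((int64_t) low - 4294967296LL);
}

/* Wait the retry loop computes while next_expire is still (uint64_t) -1. */
static int32_t default_expire_wait(uint64_t now_ms)
{
	return wrap_to_int32(UINT64_MAX - now_ms);
}
```

default_expire_wait(0x15f7db52146) is negative, so no wait is
attempted, while default_expire_wait(0x15f80000000) yields 2147483647,
which passes the wait > 0 test.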

Reported-by: Morys Grzegorz <grzegorz.morys-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 ibacm/prov/acmp/src/acmp.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c
index d707b8e..884fc48 100644
--- a/ibacm/prov/acmp/src/acmp.c
+++ b/ibacm/prov/acmp/src/acmp.c
@@ -1579,10 +1579,12 @@ static void *acmp_retry_handler(void *context)
 		pthread_mutex_unlock(&acmp_dev_lock);
 
 		acmp_process_timeouts();
-		wait = (int) (next_expire - time_stamp_ms());
-		if (wait > 0 && atomic_get(&wait_cnt)) {
-			pthread_testcancel();
-			event_wait(&timeout_event, wait);
+		if (next_expire != -1) {
+			wait = (int) (next_expire - time_stamp_ms());
+			if (wait > 0 && atomic_get(&wait_cnt)) {
+				pthread_testcancel();
+				event_wait(&timeout_event, wait);
+			}
 		}
 	}
 


* [PATCH v2 4/4] ibacm: Use MONOTONIC time base to avoid timer expiration issues
       [not found] ` <20171113212220.24293.97479.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-11-13 21:24   ` [PATCH v2 3/4] ibacm: Fix a retry loop calculation race condition Michael J. Ruhl
@ 2017-11-13 21:25   ` Michael J. Ruhl
       [not found]     ` <20171113212454.24293.68783.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
  3 siblings, 1 reply; 6+ messages in thread
From: Michael J. Ruhl @ 2017-11-13 21:25 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

From: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

The event_wait() function uses the CLOCK_REALTIME time base for
calculating expiration times (default time base for gettimeofday()).

Using the CLOCK_REALTIME time base can introduce incorrect expiration
timeout calculations if the REALTIME clock changes, making a timeout
too long (possibly hours or days), or too short.

Update time base usage to the CLOCK_MONOTONIC time base to avoid time
change issues.
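
The pattern is sketched below as a self-contained function.  It is an
illustration under assumptions rather than the osd.h change itself:
monotonic_timed_wait() is a hypothetical name, and the condition
variable is never signaled, so the call is expected to run until the
deadline.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Wait timeout_ms on a condition variable whose deadline is measured
 * against CLOCK_MONOTONIC.  Returns the pthread_cond_timedwait() result
 * (ETIMEDOUT on expiry) or an error code from the setup calls. */
static int monotonic_timed_wait(unsigned int timeout_ms)
{
	pthread_condattr_t attr;
	pthread_cond_t cond;
	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
	struct timespec deadline;
	int ret;

	ret = pthread_condattr_init(&attr);
	if (ret)
		return ret;
	/* The condition variable and the deadline must use the same clock. */
	ret = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
	if (!ret)
		ret = pthread_cond_init(&cond, &attr);
	pthread_condattr_destroy(&attr);
	if (ret)
		return ret;

	clock_gettime(CLOCK_MONOTONIC, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (long) (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&mutex);
	/* Loop so that a spurious wakeup does not end the wait early. */
	do {
		ret = pthread_cond_timedwait(&cond, &mutex, &deadline);
	} while (ret == 0);
	pthread_mutex_unlock(&mutex);

	pthread_cond_destroy(&cond);
	return ret;
}
```

Because the deadline is taken from CLOCK_MONOTONIC, setting the wall
clock forward or backward during the wait should not change when the
call returns.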

Reviewed-by: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 ibacm/linux/osd.h |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/ibacm/linux/osd.h b/ibacm/linux/osd.h
index d6cbb8f..1901a89 100644
--- a/ibacm/linux/osd.h
+++ b/ibacm/linux/osd.h
@@ -88,20 +88,23 @@ typedef struct { volatile int val; } atomic_t;
 typedef struct { pthread_cond_t cond; pthread_mutex_t mutex; } event_t;
 static inline void event_init(event_t *e)
 {
-	pthread_cond_init(&e->cond, NULL);
+	pthread_condattr_t attr;
+
+	pthread_condattr_init(&attr);
+	pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
+	pthread_cond_init(&e->cond, &attr);
 	pthread_mutex_init(&e->mutex, NULL);
 }
 #define event_signal(e)	pthread_cond_signal(&(e)->cond)
 #define ONE_SEC_IN_NSEC  1000000000
 static inline int event_wait(event_t *e, int timeout)
 {
-	struct timeval curtime;
 	struct timespec wait;
 	int ret;
 
-	gettimeofday(&curtime, NULL);
-	wait.tv_sec = curtime.tv_sec + ((unsigned) timeout) / 1000;
-	wait.tv_nsec = (curtime.tv_usec + (((unsigned) timeout) % 1000) * 1000) * 1000;
+	clock_gettime(CLOCK_MONOTONIC, &wait);
+	wait.tv_sec = wait.tv_sec + ((unsigned) timeout) / 1000;
+	wait.tv_nsec = (wait.tv_nsec + (((unsigned) timeout) % 1000) * 1000000);
 	if (wait.tv_nsec > ONE_SEC_IN_NSEC) {
 		wait.tv_sec++;
 		wait.tv_nsec -= ONE_SEC_IN_NSEC;
@@ -114,10 +117,9 @@ static inline int event_wait(event_t *e, int timeout)
 
 static inline uint64_t time_stamp_us(void)
 {
-	struct timeval curtime;
-	timerclear(&curtime);
-	gettimeofday(&curtime, NULL);
-	return (uint64_t) curtime.tv_sec * 1000000 + (uint64_t) curtime.tv_usec;
+	struct timespec t;
+	clock_gettime(CLOCK_MONOTONIC, &t);
+	return ((uint64_t)t.tv_sec * ONE_SEC_IN_NSEC + (uint64_t)t.tv_nsec) / 1000;
 }
 
 #define time_stamp_ms()  (time_stamp_us() / (uint64_t) 1000)


* Re: [PATCH v2 4/4] ibacm: Use MONOTONIC time base to avoid timer expiration issues
       [not found]     ` <20171113212454.24293.68783.stgit-K+u1se/DcYrLESAwzcoQNrvm/XP+8Wra@public.gmane.org>
@ 2017-11-14  3:04       ` Jason Gunthorpe
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2017-11-14  3:04 UTC (permalink / raw)
  To: Michael J. Ruhl; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Mon, Nov 13, 2017 at 04:25:04PM -0500, Michael J. Ruhl wrote:

Sorry, noticed a few more wonky things:

>  #define event_signal(e)	pthread_cond_signal(&(e)->cond)
>  #define ONE_SEC_IN_NSEC  1000000000

That should be 1000000000ULL so the math doesn't accidentally truncate

> +	clock_gettime(CLOCK_MONOTONIC, &wait);
> +	wait.tv_sec = wait.tv_sec + ((unsigned) timeout) / 1000;
> +	wait.tv_nsec = (wait.tv_nsec + (((unsigned) timeout) % 1000) * 1000000);

Why the odd casts to (unsigned)? That should be (unsigned int)

> -	return (uint64_t) curtime.tv_sec * 1000000 + (uint64_t) curtime.tv_usec;
> +	struct timespec t;
> +	clock_gettime(CLOCK_MONOTONIC, &t);
> +	return ((uint64_t)t.tv_sec * ONE_SEC_IN_NSEC + (uint64_t)t.tv_nsec) / 1000;

These casts to uint64_t are also wrong, time is technically
signed. The change to ONE_SEC_IN_NSEC should remove the need
for the casting.
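
One possible reading of that suggestion, as a rough sketch only:
timespec_to_us() is a hypothetical helper, and it assumes non-negative
time values such as those returned for CLOCK_MONOTONIC.  With the ULL
suffix the multiplication is carried out in unsigned long long, so no
per-operand casts are needed.

```c
#include <stdint.h>
#include <time.h>

/* Suffixed so that products involving it are computed in 64 bits. */
#define ONE_SEC_IN_NSEC 1000000000ULL

/* Convert a non-negative timespec to microseconds. */
static uint64_t timespec_to_us(const struct timespec *t)
{
	return (t->tv_sec * ONE_SEC_IN_NSEC + t->tv_nsec) / 1000;
}
```

For { 3 s, 999999999 ns } the intermediate nanosecond count does not
fit in a 32-bit int, yet the result is 3999999 us.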

Jason
