* [PATCH 2/2] opensm: Protect against spurious wakeups when calling cl_event_wait_on
@ 2012-10-29 23:45 Albert Chu
From: Albert Chu @ 2012-10-29 23:45 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Signed-off-by: Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org>
---
opensm/osm_congestion_control.c | 4 ++--
opensm/osm_perfmgr.c | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/opensm/osm_congestion_control.c b/opensm/osm_congestion_control.c
index b5d9cdb..e103ab1 100644
--- a/opensm/osm_congestion_control.c
+++ b/opensm/osm_congestion_control.c
@@ -525,8 +525,8 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
status = osm_vendor_send(p_cc->bind_handle, p_madw, TRUE);
if (status == IB_SUCCESS) {
cl_atomic_inc(&p_cc->outstanding_mads_on_wire);
- if (p_cc->outstanding_mads_on_wire >
- (int32_t)p_opt->cc_max_outstanding_mads)
+ while (p_cc->outstanding_mads_on_wire >
+ (int32_t)p_opt->cc_max_outstanding_mads)
cl_event_wait_on(&p_cc->sig_mads_on_wire_continue,
EVENT_NO_TIMEOUT,
TRUE);
diff --git a/opensm/osm_perfmgr.c b/opensm/osm_perfmgr.c
index 98b4c07..d8f933e 100644
--- a/opensm/osm_perfmgr.c
+++ b/opensm/osm_perfmgr.c
@@ -419,13 +419,13 @@ static ib_api_status_t perfmgr_send_pc_mad(osm_perfmgr_t * perfmgr,
if (status == IB_SUCCESS) {
/* pause thread if there are too many outstanding requests */
cl_atomic_inc(&(perfmgr->outstanding_queries));
- if (perfmgr->outstanding_queries >
- (int32_t)perfmgr->max_outstanding_queries) {
+ while (perfmgr->outstanding_queries >
+ (int32_t)perfmgr->max_outstanding_queries) {
perfmgr->sweep_state = PERFMGR_SWEEP_SUSPENDED;
cl_event_wait_on(&perfmgr->sig_query, EVENT_NO_TIMEOUT,
TRUE);
- perfmgr->sweep_state = PERFMGR_SWEEP_ACTIVE;
}
+ perfmgr->sweep_state = PERFMGR_SWEEP_ACTIVE;
}
OSM_LOG_EXIT(perfmgr->log);
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH 2/2] opensm: Protect against spurious wakeups when calling cl_event_wait_on
@ 2012-11-01 7:59 ` Roland Dreier
From: Roland Dreier @ 2012-11-01 7:59 UTC
To: Albert Chu; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Mon, Oct 29, 2012 at 4:45 PM, Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org> wrote:
> @@ -525,8 +525,8 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
> status = osm_vendor_send(p_cc->bind_handle, p_madw, TRUE);
> if (status == IB_SUCCESS) {
> cl_atomic_inc(&p_cc->outstanding_mads_on_wire);
> - if (p_cc->outstanding_mads_on_wire >
> - (int32_t)p_opt->cc_max_outstanding_mads)
> + while (p_cc->outstanding_mads_on_wire >
> + (int32_t)p_opt->cc_max_outstanding_mads)
> cl_event_wait_on(&p_cc->sig_mads_on_wire_continue,
> EVENT_NO_TIMEOUT,
> TRUE);
I've never looked at the opensm code -- I'm just guessing based on this patch.
But is this (both original and patched) code susceptible to a missed
wakeup race?
i.e.:
    if (outstanding_mads > max)   // <-- decide to go to sleep here
        // other thread signals wakeup, we're not asleep yet
        cl_event_wait_on(...);    // <-- we've already missed the wakeup,
                                  //     so we sleep forever.
- R.
* Re: [PATCH 2/2] opensm: Protect against spurious wakeups when calling cl_event_wait_on
@ 2012-11-01 18:08 ` Jason Gunthorpe
From: Jason Gunthorpe @ 2012-11-01 18:08 UTC
To: Roland Dreier
Cc: Albert Chu, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Thu, Nov 01, 2012 at 12:59:58AM -0700, Roland Dreier wrote:
> On Mon, Oct 29, 2012 at 4:45 PM, Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org> wrote:
> > @@ -525,8 +525,8 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
> > status = osm_vendor_send(p_cc->bind_handle, p_madw, TRUE);
> > if (status == IB_SUCCESS) {
> > cl_atomic_inc(&p_cc->outstanding_mads_on_wire);
> > - if (p_cc->outstanding_mads_on_wire >
> > - (int32_t)p_opt->cc_max_outstanding_mads)
> > + while (p_cc->outstanding_mads_on_wire >
> > + (int32_t)p_opt->cc_max_outstanding_mads)
> > cl_event_wait_on(&p_cc->sig_mads_on_wire_continue,
> > EVENT_NO_TIMEOUT,
> > TRUE);
>
> I've never looked at the opensm code -- I'm just guessing based on this patch.
The event objects have a hidden built-in state that ensures a wakeup
is not lost, so long as only one thread ever calls wait_on. If it is
possible that two threads could be sleeping on the same event, then the
system is unfixably broken by design, since one thread will eat the
internal event and the other will thus miss it, in a racy way.
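As a rough illustration of the latched state described above, an
auto-reset event can be sketched in C on top of a mutex and a condition
variable. The type and function names (latch_event_t and friends) are
hypothetical and are not taken from complib or opensm; this is only an
assumption about how such an object might be built.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical auto-reset event; names are illustrative only. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaled;		/* the "hidden" latch that remembers a signal */
} latch_event_t;

static void latch_event_init(latch_event_t *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->signaled = false;
}

/* Record the signal even if nobody is waiting yet. */
static void latch_event_signal(latch_event_t *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->signaled = true;
	pthread_cond_signal(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

/* Block until the latch is set, then consume it (auto-reset). */
static void latch_event_wait(latch_event_t *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->signaled)	/* loop also absorbs spurious wakeups */
		pthread_cond_wait(&ev->cond, &ev->lock);
	assert(ev->signaled);
	ev->signaled = false;	/* a second waiter would find this cleared */
	pthread_mutex_unlock(&ev->lock);
}
```

With a single waiter, a signal that arrives before the wait stays in
the signaled flag and is consumed by the next wait. With two waiters,
whichever runs first clears the flag and the other keeps sleeping.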
I've had to clean up this kind of mess in other code bases, and now
always discourage this kind of interface. Use POSIX condition
variables: they have cleaner locking semantics and are easier to audit
for correctness.
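A minimal sketch of what the condition-variable version of such a
throttle could look like, assuming the counter and its limit are moved
under one mutex. The throttle_t type and its functions are hypothetical
names for illustration, not existing opensm code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical throttle; field names are illustrative only. */
typedef struct {
	pthread_mutex_t lock;
	pthread_cond_t below_limit;
	int outstanding;	/* requests sent but not yet answered */
	int max_outstanding;
} throttle_t;

static void throttle_init(throttle_t *t, int max_outstanding)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->below_limit, NULL);
	t->outstanding = 0;
	t->max_outstanding = max_outstanding;
}

/* Sender side: count the request, then sleep while over the limit. */
static void throttle_sent(throttle_t *t)
{
	pthread_mutex_lock(&t->lock);
	t->outstanding++;
	/* The predicate is re-checked under the lock after every wakeup,
	 * so neither spurious nor early wakeups can be mishandled. */
	while (t->outstanding > t->max_outstanding)
		pthread_cond_wait(&t->below_limit, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Receiver side: a response arrived, so wake any waiting sender. */
static void throttle_completed(throttle_t *t)
{
	pthread_mutex_lock(&t->lock);
	assert(t->outstanding > 0);
	t->outstanding--;
	pthread_cond_broadcast(&t->below_limit);
	pthread_mutex_unlock(&t->lock);
}
```

Because the check and the sleep happen under the same lock that the
receiver takes before decrementing, there is no window in which a
wakeup can slip between the two, and any number of senders can wait.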
Jason
* Re: [PATCH 2/2] opensm: Protect against spurious wakeups when calling cl_event_wait_on
@ 2012-11-08 12:14 ` Alex Netes
From: Alex Netes @ 2012-11-08 12:14 UTC
To: Jason Gunthorpe
Cc: Roland Dreier, Albert Chu,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 12:08 Thu 01 Nov, Jason Gunthorpe wrote:
> On Thu, Nov 01, 2012 at 12:59:58AM -0700, Roland Dreier wrote:
> > On Mon, Oct 29, 2012 at 4:45 PM, Albert Chu <chu11-i2BcT+NCU+M@public.gmane.org> wrote:
> > > @@ -525,8 +525,8 @@ static void cc_poller_send(osm_congestion_control_t *p_cc,
> > > status = osm_vendor_send(p_cc->bind_handle, p_madw, TRUE);
> > > if (status == IB_SUCCESS) {
> > > cl_atomic_inc(&p_cc->outstanding_mads_on_wire);
> > > - if (p_cc->outstanding_mads_on_wire >
> > > - (int32_t)p_opt->cc_max_outstanding_mads)
> > > + while (p_cc->outstanding_mads_on_wire >
> > > + (int32_t)p_opt->cc_max_outstanding_mads)
> > > cl_event_wait_on(&p_cc->sig_mads_on_wire_continue,
> > > EVENT_NO_TIMEOUT,
> > > TRUE);
> >
> > I've never looked at the opensm code -- I'm just guessing based on this patch.
>
> The event objects have a hidden built in state that ensures a wake up
> is not lost, so long as only one thread ever calls wait_on. If it is
> possible two threads could be sleeping on the same event then the
> system is unfixably-broken-by-design, since one thread will eat the
> internal event and the other will thus miss it, in a racy way.
>
> I've had to clean this kind of a mess up in other code bases, and now
> always discourage this kind of interface. Use POSIX condition
> variables, they have cleaner locking semantics and are easier to audit
> for correctness.
>
Right now only one thread is sleeping on the signal (for both CC and PM), so
it's safe to apply the patch as is. However, improvements in that area are more
than welcome.
-- Alex