All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] libceph: must hold mutex for reset_changed_osds()
@ 2013-05-15 21:35 Alex Elder
  2013-05-16 16:11 ` Sage Weil
  0 siblings, 1 reply; 3+ messages in thread
From: Alex Elder @ 2013-05-15 21:35 UTC (permalink / raw)
  To: ceph-devel

An osd client has a red-black tree describing its osds, and
occasionally we would get crashes due to one of these trees tree
becoming corrupt somehow.

The problem turned out to be that reset_changed_osds() was being
called without protection of the osd client request mutex.  That
function would call __reset_osd() for any osd that had changed, and
__reset_osd() would call __remove_osd() for any osd with no
outstanding requests, and finally __remove_osd() would remove the
corresponding entry from the red-black tree.  Thus, the tree was
getting modified without having any lock protection, and was
vulnerable to problems due to concurrent updates.

This appears to be the only osd tree updating path that has this
problem.  It can be fairly easily fixed by moving the call up
a few lines, to just before the request mutex gets dropped
in kick_requests().

This resolves:
    http://tracker.ceph.com/issues/5043
Signed-off-by: Alex Elder <elder@inktank.com>
---
 net/ceph/osd_client.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index d5953b8..3a246a6 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1675,13 +1675,13 @@ static void kick_requests(struct ceph_osd_client
*osdc, int force_resend)
 		__register_request(osdc, req);
 		__unregister_linger_request(osdc, req);
 	}
+	reset_changed_osds(osdc);
 	mutex_unlock(&osdc->request_mutex);

 	if (needmap) {
 		dout("%d requests for down osds, need new map\n", needmap);
 		ceph_monc_request_next_osdmap(&osdc->client->monc);
 	}
-	reset_changed_osds(osdc);
 }


-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] libceph: must hold mutex for reset_changed_osds()
  2013-05-15 21:35 [PATCH] libceph: must hold mutex for reset_changed_osds() Alex Elder
@ 2013-05-16 16:11 ` Sage Weil
  2013-05-16 16:12   ` Sage Weil
  0 siblings, 1 reply; 3+ messages in thread
From: Sage Weil @ 2013-05-16 16:11 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Wed, 15 May 2013, Alex Elder wrote:
> An osd client has a red-black tree describing its osds, and
> occasionally we would get crashes due to one of these trees tree
> becoming corrupt somehow.
> 
> The problem turned out to be that reset_changed_osds() was being
> called without protection of the osd client request mutex.  That
> function would call __reset_osd() for any osd that had changed, and
> __reset_osd() would call __remove_osd() for any osd with no
> outstanding requests, and finally __remove_osd() would remove the
> corresponding entry from the red-black tree.  Thus, the tree was
> getting modified without having any lock protection, and was
> vulnerable to problems due to concurrent updates.
> 
> This appears to be the only osd tree updating path that has this
> problem.  It can be fairly easily fixed by moving the call up
> a few lines, to just before the request mutex gets dropped
> in kick_requests().
> 
> This resolves:
>     http://tracker.ceph.com/issues/5043
> Signed-off-by: Alex Elder <elder@inktank.com>

Reviewed-by: Sage Weil <sage@inktank.com>


> ---
>  net/ceph/osd_client.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index d5953b8..3a246a6 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1675,13 +1675,13 @@ static void kick_requests(struct ceph_osd_client
> *osdc, int force_resend)
>  		__register_request(osdc, req);
>  		__unregister_linger_request(osdc, req);
>  	}
> +	reset_changed_osds(osdc);
>  	mutex_unlock(&osdc->request_mutex);
> 
>  	if (needmap) {
>  		dout("%d requests for down osds, need new map\n", needmap);
>  		ceph_monc_request_next_osdmap(&osdc->client->monc);
>  	}
> -	reset_changed_osds(osdc);
>  }
> 
> 
> -- 
> 1.7.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] libceph: must hold mutex for reset_changed_osds()
  2013-05-16 16:11 ` Sage Weil
@ 2013-05-16 16:12   ` Sage Weil
  0 siblings, 0 replies; 3+ messages in thread
From: Sage Weil @ 2013-05-16 16:12 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Thu, 16 May 2013, Sage Weil wrote:
> On Wed, 15 May 2013, Alex Elder wrote:
> > An osd client has a red-black tree describing its osds, and
> > occasionally we would get crashes due to one of these trees tree
> > becoming corrupt somehow.
> > 
> > The problem turned out to be that reset_changed_osds() was being
> > called without protection of the osd client request mutex.  That
> > function would call __reset_osd() for any osd that had changed, and
> > __reset_osd() would call __remove_osd() for any osd with no
> > outstanding requests, and finally __remove_osd() would remove the
> > corresponding entry from the red-black tree.  Thus, the tree was
> > getting modified without having any lock protection, and was
> > vulnerable to problems due to concurrent updates.
> > 
> > This appears to be the only osd tree updating path that has this
> > problem.  It can be fairly easily fixed by moving the call up
> > a few lines, to just before the request mutex gets dropped
> > in kick_requests().
> > 
> > This resolves:
> >     http://tracker.ceph.com/issues/5043
> > Signed-off-by: Alex Elder <elder@inktank.com>
> 
> Reviewed-by: Sage Weil <sage@inktank.com>

Before you commit this, can you see if this applies cleanly to the stable 
trees?  If so, let's put the stable tag in the changelog and see if the 
automagic stable backport stuff works!

sage


> 
> 
> > ---
> >  net/ceph/osd_client.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> > index d5953b8..3a246a6 100644
> > --- a/net/ceph/osd_client.c
> > +++ b/net/ceph/osd_client.c
> > @@ -1675,13 +1675,13 @@ static void kick_requests(struct ceph_osd_client
> > *osdc, int force_resend)
> >  		__register_request(osdc, req);
> >  		__unregister_linger_request(osdc, req);
> >  	}
> > +	reset_changed_osds(osdc);
> >  	mutex_unlock(&osdc->request_mutex);
> > 
> >  	if (needmap) {
> >  		dout("%d requests for down osds, need new map\n", needmap);
> >  		ceph_monc_request_next_osdmap(&osdc->client->monc);
> >  	}
> > -	reset_changed_osds(osdc);
> >  }
> > 
> > 
> > -- 
> > 1.7.9.5
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-05-16 16:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-15 21:35 [PATCH] libceph: must hold mutex for reset_changed_osds() Alex Elder
2013-05-16 16:11 ` Sage Weil
2013-05-16 16:12   ` Sage Weil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.