All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] __remove_osd() vs lingering requests
@ 2014-11-05 17:33 Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
  To: ceph-devel

Hello,

These are fixes for the problem I found while trying to reproduce
recently reopened http://tracker.ceph.com/issues/5429.  I'm pretty sure
this is what was reported in http://tracker.ceph.com/issues/8568 too.
The exact BUG_ON() it points to is no longer there, but I checked with
an older kernel and the reproducer I have for this triggered it.

Thanks,

                Ilya


Ilya Dryomov (3):
  libceph: unlink from o_linger_requests when clearing r_osd
  libceph: clear r_req_lru_item in __unregister_linger_request()
  libceph: change from BUG to WARN for __remove_osd() asserts

 net/ceph/osd_client.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd
  2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
  To: ceph-devel

Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd.  Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().

Reproducer, MON=1 OSD=1:

    # cat remove-osd.sh
    #!/bin/bash
    rbd create --size 1 test
    DEV=$(rbd map test)
    ceph osd out 0
    sleep 3
    rbd map bar/b # obtain a new osdmap as a side effect
    rbd unmap $DEV & # will block
    sleep 3
    ceph osd in 0

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
 net/ceph/osd_client.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5a75395fe0ff..ce09da2cf32b 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1395,6 +1395,7 @@ static int __map_request(struct ceph_osd_client *osdc,
 	if (req->r_osd) {
 		__cancel_request(req);
 		list_del_init(&req->r_osd_item);
+		list_del_init(&req->r_linger_osd_item);
 		req->r_osd = NULL;
 	}
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request()
  2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
  2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
  To: ceph-devel

kick_requests() can put linger requests on the notarget list.  This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).

AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
 net/ceph/osd_client.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index ce09da2cf32b..2a42c795b1ac 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1254,6 +1254,8 @@ static void __unregister_linger_request(struct ceph_osd_client *osdc,
 		if (list_empty(&req->r_osd_item))
 			req->r_osd = NULL;
 	}
+
+	list_del_init(&req->r_req_lru_item); /* can be on notarget */
 	ceph_osdc_put_request(req);
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts
  2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
  2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
  2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
  3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
  To: ceph-devel

No reason to use BUG_ON for osd request list assertions.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
 net/ceph/osd_client.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 2a42c795b1ac..1f6c4055adaf 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1007,8 +1007,8 @@ static void put_osd(struct ceph_osd *osd)
 static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
 {
 	dout("__remove_osd %p\n", osd);
-	BUG_ON(!list_empty(&osd->o_requests));
-	BUG_ON(!list_empty(&osd->o_linger_requests));
+	WARN_ON(!list_empty(&osd->o_requests));
+	WARN_ON(!list_empty(&osd->o_linger_requests));
 
 	rb_erase(&osd->o_node, &osdc->osds);
 	list_del_init(&osd->o_osd_lru);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] __remove_osd() vs lingering requests
  2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
                   ` (2 preceding siblings ...)
  2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
@ 2014-11-05 17:49 ` Alex Elder
  2014-11-05 18:42   ` Ilya Dryomov
  3 siblings, 1 reply; 6+ messages in thread
From: Alex Elder @ 2014-11-05 17:49 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 11/05/2014 11:33 AM, Ilya Dryomov wrote:
> Hello,
> 
> These are fixes for the problem I found while trying to reproduce
> recently reopened http://tracker.ceph.com/issues/5429.  I'm pretty sure
> this is what was reported in http://tracker.ceph.com/issues/8568 too.
> The exact BUG_ON() it points to is no longer there, but I checked with
> an older kernel and the reproducer I have for this triggered it.
> 
> Thanks,
> 
>                 Ilya

All three of these look good to me.  The interactions with
these lists has always been a bit too tricky; you've done a
lot to make sense of them.

And I'm sure there are other BUG_ON() (or rbd_assert()) calls
that can be tamed--either removed, changed to warnings, or
changed to simply return errors where possible.

Reviewed-by: Alex Elder <elder@linaro.org>

> 
> 
> Ilya Dryomov (3):
>   libceph: unlink from o_linger_requests when clearing r_osd
>   libceph: clear r_req_lru_item in __unregister_linger_request()
>   libceph: change from BUG to WARN for __remove_osd() asserts
> 
>  net/ceph/osd_client.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] __remove_osd() vs lingering requests
  2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
@ 2014-11-05 18:42   ` Ilya Dryomov
  0 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 18:42 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development

On Wed, Nov 5, 2014 at 8:49 PM, Alex Elder <elder@ieee.org> wrote:
> On 11/05/2014 11:33 AM, Ilya Dryomov wrote:
>> Hello,
>>
>> These are fixes for the problem I found while trying to reproduce
>> recently reopened http://tracker.ceph.com/issues/5429.  I'm pretty sure
>> this is what was reported in http://tracker.ceph.com/issues/8568 too.
>> The exact BUG_ON() it points to is no longer there, but I checked with
>> an older kernel and the reproducer I have for this triggered it.
>>
>> Thanks,
>>
>>                 Ilya
>
> All three of these look good to me.  The interactions with
> these lists has always been a bit too tricky; you've done a
> lot to make sense of them.
>
> And I'm sure there are other BUG_ON() (or rbd_assert()) calls
> that can be tamed--either removed, changed to warnings, or
> changed to simply return errors where possible.
>
> Reviewed-by: Alex Elder <elder@linaro.org>

Thanks for the review, Alex!

                Ilya

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-11-05 18:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
2014-11-05 18:42   ` Ilya Dryomov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.