* [PATCH 0/3] __remove_osd() vs lingering requests
@ 2014-11-05 17:33 Ilya Dryomov
2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
To: ceph-devel
Hello,
These are fixes for the problem I found while trying to reproduce
recently reopened http://tracker.ceph.com/issues/5429. I'm pretty sure
this is what was reported in http://tracker.ceph.com/issues/8568 too.
The exact BUG_ON() it points to is no longer there, but I checked with
an older kernel and the reproducer I have for this triggered it.
Thanks,
Ilya
Ilya Dryomov (3):
libceph: unlink from o_linger_requests when clearing r_osd
libceph: clear r_req_lru_item in __unregister_linger_request()
libceph: change from BUG to WARN for __remove_osd() asserts
net/ceph/osd_client.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--
1.7.10.4
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
To: ceph-devel
Requests have to be unlinked from both osd->o_requests (normal
requests) and osd->o_linger_requests (linger requests) lists when
clearing req->r_osd. Otherwise __unregister_linger_request() gets
confused and we trip over a !list_empty(&osd->o_linger_requests)
assert in __remove_osd().
Reproducer, MON=1 OSD=1:
# cat remove-osd.sh
#!/bin/bash
rbd create --size 1 test
DEV=$(rbd map test)
ceph osd out 0
sleep 3
rbd map bar/b # obtain a new osdmap as a side effect
rbd unmap $DEV & # will block
sleep 3
ceph osd in 0
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
net/ceph/osd_client.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 5a75395fe0ff..ce09da2cf32b 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1395,6 +1395,7 @@ static int __map_request(struct ceph_osd_client *osdc,
if (req->r_osd) {
__cancel_request(req);
list_del_init(&req->r_osd_item);
+ list_del_init(&req->r_linger_osd_item);
req->r_osd = NULL;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request()
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
To: ceph-devel
kick_requests() can put linger requests on the notarget list. This
means we need to clear the much-overloaded req->r_req_lru_item in
__unregister_linger_request() as well, or we get an assertion failure
in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).
AFAICT the assumption was that registered linger requests cannot be on
any of req->r_req_lru_item lists, but that's clearly not the case.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
net/ceph/osd_client.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index ce09da2cf32b..2a42c795b1ac 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1254,6 +1254,8 @@ static void __unregister_linger_request(struct ceph_osd_client *osdc,
if (list_empty(&req->r_osd_item))
req->r_osd = NULL;
}
+
+ list_del_init(&req->r_req_lru_item); /* can be on notarget */
ceph_osdc_put_request(req);
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
@ 2014-11-05 17:33 ` Ilya Dryomov
2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
3 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 17:33 UTC (permalink / raw)
To: ceph-devel
No reason to use BUG_ON for osd request list assertions.
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
---
net/ceph/osd_client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 2a42c795b1ac..1f6c4055adaf 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1007,8 +1007,8 @@ static void put_osd(struct ceph_osd *osd)
static void __remove_osd(struct ceph_osd_client *osdc, struct ceph_osd *osd)
{
dout("__remove_osd %p\n", osd);
- BUG_ON(!list_empty(&osd->o_requests));
- BUG_ON(!list_empty(&osd->o_linger_requests));
+ WARN_ON(!list_empty(&osd->o_requests));
+ WARN_ON(!list_empty(&osd->o_linger_requests));
rb_erase(&osd->o_node, &osdc->osds);
list_del_init(&osd->o_osd_lru);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] __remove_osd() vs lingering requests
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
` (2 preceding siblings ...)
2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
@ 2014-11-05 17:49 ` Alex Elder
2014-11-05 18:42 ` Ilya Dryomov
3 siblings, 1 reply; 6+ messages in thread
From: Alex Elder @ 2014-11-05 17:49 UTC (permalink / raw)
To: Ilya Dryomov, ceph-devel
On 11/05/2014 11:33 AM, Ilya Dryomov wrote:
> Hello,
>
> These are fixes for the problem I found while trying to reproduce
> recently reopened http://tracker.ceph.com/issues/5429. I'm pretty sure
> this is what was reported in http://tracker.ceph.com/issues/8568 too.
> The exact BUG_ON() it points to is no longer there, but I checked with
> an older kernel and the reproducer I have for this triggered it.
>
> Thanks,
>
> Ilya
All three of these look good to me. The interactions with
these lists has always been a bit too tricky; you've done a
lot to make sense of them.
And I'm sure there are other BUG_ON() (or rbd_assert()) calls
that can be tamed--either removed, changed to warnings, or
changed to simply return errors where possible.
Reviewed-by: Alex Elder <elder@linaro.org>
>
>
> Ilya Dryomov (3):
> libceph: unlink from o_linger_requests when clearing r_osd
> libceph: clear r_req_lru_item in __unregister_linger_request()
> libceph: change from BUG to WARN for __remove_osd() asserts
>
> net/ceph/osd_client.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 0/3] __remove_osd() vs lingering requests
2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
@ 2014-11-05 18:42 ` Ilya Dryomov
0 siblings, 0 replies; 6+ messages in thread
From: Ilya Dryomov @ 2014-11-05 18:42 UTC (permalink / raw)
To: Alex Elder; +Cc: Ceph Development
On Wed, Nov 5, 2014 at 8:49 PM, Alex Elder <elder@ieee.org> wrote:
> On 11/05/2014 11:33 AM, Ilya Dryomov wrote:
>> Hello,
>>
>> These are fixes for the problem I found while trying to reproduce
>> recently reopened http://tracker.ceph.com/issues/5429. I'm pretty sure
>> this is what was reported in http://tracker.ceph.com/issues/8568 too.
>> The exact BUG_ON() it points to is no longer there, but I checked with
>> an older kernel and the reproducer I have for this triggered it.
>>
>> Thanks,
>>
>> Ilya
>
> All three of these look good to me. The interactions with
> these lists has always been a bit too tricky; you've done a
> lot to make sense of them.
>
> And I'm sure there are other BUG_ON() (or rbd_assert()) calls
> that can be tamed--either removed, changed to warnings, or
> changed to simply return errors where possible.
>
> Reviewed-by: Alex Elder <elder@linaro.org>
Thanks for the review, Alex!
Ilya
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-11-05 18:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-05 17:33 [PATCH 0/3] __remove_osd() vs lingering requests Ilya Dryomov
2014-11-05 17:33 ` [PATCH 1/3] libceph: unlink from o_linger_requests when clearing r_osd Ilya Dryomov
2014-11-05 17:33 ` [PATCH 2/3] libceph: clear r_req_lru_item in __unregister_linger_request() Ilya Dryomov
2014-11-05 17:33 ` [PATCH 3/3] libceph: change from BUG to WARN for __remove_osd() asserts Ilya Dryomov
2014-11-05 17:49 ` [PATCH 0/3] __remove_osd() vs lingering requests Alex Elder
2014-11-05 18:42 ` Ilya Dryomov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.