* RGW Multisite delete weirdness
From: Abhishek Lekshmanan @ 2016-04-19 16:10 UTC (permalink / raw)
To: Ceph Devel
Trying to delete objects and buckets from a secondary zone in an RGW
multisite configuration leads to some weirdness:

1. Deleting an object and then immediately deleting the bucket will mostly
delete both in the secondary zone, but since we forward the bucket deletion
to the master only after we delete it in the secondary, the forwarded request
fails with 409 (BucketNotEmpty), which gets re-raised as a 500 to the client.
This _seems_ simple enough to fix if we forward the bucket deletion request
to the master zone before attempting the deletion locally
(issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
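To make the ordering concrete, here is a minimal sketch of the reordering the
fix describes (the function names are hypothetical placeholders for
illustration, not the actual RGW code paths):

  #include <iostream>

  // Hypothetical placeholders for the two steps a secondary zone performs
  // when handling a DELETE bucket request; the real RGW code differs.
  int forward_delete_bucket_to_master() { return 0; }  // negative on failure, e.g. bucket not empty on master
  int delete_bucket_locally() { return 0; }

  // Proposed ordering: ask the master first and only remove the bucket
  // locally if the master accepted the request. With the old ordering the
  // local removal happened first, so the forwarded request could still see
  // a non-empty bucket on the master and fail after the fact.
  int handle_delete_bucket_on_secondary() {
    int r = forward_delete_bucket_to_master();
    if (r < 0)
      return r;  // surface the master's error (e.g. 409 BucketNotEmpty) to the client
    return delete_bucket_locally();
  }

  int main() {
    std::cout << handle_delete_bucket_on_secondary() << std::endl;
  }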
2. Deletion of the objects themselves seems to be a bit racy: deleting an
object on a secondary zone succeeds and listing the bucket shows an empty
list, but the listing sometimes gets populated with the object again (this
time with a newer timestamp). This is not always guaranteed to reproduce,
but I've seen it often with multipart uploads, e.g.:
$ s3 -u list test-mp
Key Last Modified Size
-------------------------------------------------- -------------------- -----
test.img 2016-04-19T13:00:17Z 40M
$ s3 -u delete test-mp/test.img
$ s3 -u list test-mp
Key Last Modified Size
-------------------------------------------------- -------------------- -----
test.img 2016-04-19T13:00:45Z 40M
$ s3 -u delete test-mp/test.img # wait for a min
$ s3 -us list test-mp
-------------------------------------------------- -------------------- -----
test.img 2016-04-19T13:01:52Z 40M
I'm mostly seeing log entries of this form in both cases, i.e. where the
object delete appears to succeed in both the master and the secondary zone,
and where it succeeds in the master but fails in the secondary:
20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
Any ideas on this?
--
Abhishek
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-19 17:52 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
> Trying to delete objects and buckets from a secondary zone in an RGW
> multisite configuration leads to some weirdness:
>
> 1. Deleting an object and then immediately deleting the bucket will mostly
> delete both in the secondary zone, but since we forward the bucket deletion
> to the master only after we delete it in the secondary, the forwarded request
> fails with 409 (BucketNotEmpty), which gets re-raised as a 500 to the client.
> This _seems_ simple enough to fix if we forward the bucket deletion request
> to the master zone before attempting the deletion locally
> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>
Yeah, this looks good. We'll get it through testing.
> 2. Deletion of the objects themselves seems to be a bit racy: deleting an
> object on a secondary zone succeeds and listing the bucket shows an empty
> list, but the listing sometimes gets populated with the object again (this
> time with a newer timestamp). This is not always guaranteed to reproduce,
> but I've seen it often with multipart uploads, e.g.:
>
> [...]
>
Do you have more than 2 zones syncing? Is it an object delete that
came right after the object creation?
Yehuda
* Re: RGW Multisite delete weirdness
From: Abhishek L @ 2016-04-19 17:54 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Abhishek Lekshmanan, Ceph Devel
Yehuda Sadeh-Weinraub writes:
> [...]
>
> Do you have more than 2 zones syncing? Is it an object delete that
> came right after the object creation?
Only 2 zones, i.e. one master and one secondary, with the requests made on
the secondary. The delete came right after the create though.
>
> Yehuda
--
Abhishek
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-19 18:08 UTC (permalink / raw)
To: Abhishek L; +Cc: Abhishek Lekshmanan, Ceph Devel
On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
<abhishek.lekshmanan@gmail.com> wrote:
>
> [...]
>
>> Do you have more than 2 zones syncing? Is it an object delete that
>> came right after the object creation?
>
> Only 2 zones, i.e. one master and one secondary, with the requests made on
> the secondary. The delete came right after the create though.
There are two issues that I see here. One is that we sync an object but end
up with a different mtime than the object's source. The second issue is that
we shouldn't have synced that object at all.

There needs to be a check when syncing objects to validate that we don't
sync an object that originated from the current zone (by comparing the
short zone id). We might be missing that.
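Roughly the kind of check I mean, as a sketch (the structure and field names
here are illustrative assumptions, not the actual sync code):

  #include <cstdint>
  #include <iostream>

  // Illustrative stand-in for a bucket index log entry being processed by
  // incremental sync; the real RGW structures carry much more state.
  struct SyncEntry {
    uint32_t source_short_zone_id;  // short id of the zone the change originated from
  };

  // Skip applying a change that originated from the zone we are syncing
  // into; otherwise an operation done locally can be replayed back onto us.
  bool should_skip(const SyncEntry& entry, uint32_t local_short_zone_id) {
    return entry.source_short_zone_id == local_short_zone_id;
  }

  int main() {
    SyncEntry entry{42};
    std::cout << std::boolalpha
              << should_skip(entry, 42) << "\n"   // true: our own change, skip it
              << should_skip(entry, 7) << "\n";   // false: remote change, apply it
  }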
Yehuda
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-22 0:40 UTC (permalink / raw)
To: Abhishek L; +Cc: Abhishek Lekshmanan, Ceph Devel
On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
<yehuda@redhat.com> wrote:
> [...]
>
> There are two issues that I see here. One is that we sync an object but end
> up with a different mtime than the object's source. The second issue is that
> we shouldn't have synced that object at all.
>
> There needs to be a check when syncing objects to validate that we don't
> sync an object that originated from the current zone (by comparing the
> short zone id). We might be missing that.
>
For the first issue, see:
https://github.com/ceph/ceph/pull/8685

However, a create followed by a delete will still be a problem: when we sync
the object we check whether the source mtime is newer than the destination
mtime, which is problematic with deletes, as the object no longer has an
mtime once it has been removed. I think the solution would be to use
temporary tombstone objects (we already have the olh framework that can
provide what we need), which we'll garbage collect.
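As a sketch of why the missing mtime matters and what a tombstone would buy
us (the names and the tombstone field are assumptions for illustration; the
real mechanism would sit on top of the olh framework):

  #include <ctime>
  #include <iostream>
  #include <optional>

  // Illustrative local state for an object on the zone receiving a synced
  // create. If the object was deleted there is no mtime left to compare
  // against, so a stale create can win and resurrect the object. A tombstone
  // keeps the deletion-time information around until it is garbage collected.
  struct LocalObjectState {
    std::optional<std::time_t> mtime;            // empty once the object is removed
    std::optional<std::time_t> tombstone_mtime;  // hypothetical tombstone record
  };

  bool should_apply_create(const LocalObjectState& local, std::time_t source_mtime) {
    if (local.mtime)
      return source_mtime > *local.mtime;            // normal newer-wins check
    if (local.tombstone_mtime)
      return source_mtime > *local.tombstone_mtime;  // deleted: compare against the tombstone
    return true;  // no local state and no tombstone: nothing to compare, so the create is applied
  }

  int main() {
    // example timestamps in seconds since the epoch
    LocalObjectState deleted_no_tombstone{std::nullopt, std::nullopt};
    LocalObjectState deleted_with_tombstone{std::nullopt, 1461070845};
    std::cout << should_apply_create(deleted_no_tombstone, 1461070817) << "\n";    // 1: object resurrected
    std::cout << should_apply_create(deleted_with_tombstone, 1461070817) << "\n";  // 0: stale create skipped
  }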
Yehuda
* Re: RGW Multisite delete weirdness
From: Abhishek Lekshmanan @ 2016-04-25 8:17 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Ceph Devel
Yehuda Sadeh-Weinraub writes:
> [...]
>
> For the first issue, see:
> https://github.com/ceph/ceph/pull/8685
>
> However, a create followed by a delete will still be a problem: when we sync
> the object we check whether the source mtime is newer than the destination
> mtime, which is problematic with deletes, as the object no longer has an
> mtime once it has been removed. I think the solution would be to use
> temporary tombstone objects (we already have the olh framework that can
> provide what we need), which we'll garbage collect.
Further information from logs if it helps:
2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
This is what the master zone logs show. However, the request timestamp
logged here is the `If-Modified-Since` value sent from the secondary zone,
corresponding to when the actual object write was completed (and not to when
the deletion was completed). Do we set the value of the deletion time
anywhere else in the BI log?
>
> Yehuda
--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-25 18:46 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>
> [...]
>
> Further information from logs if it helps:
>
> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>
> This is what the master zone logs show. However, the request timestamp
> logged here is the `If-Modified-Since` value sent from the secondary zone,
> corresponding to when the actual object write was completed (and not to when
> the deletion was completed). Do we set the value of the deletion time
> anywhere else in the BI log?
>
>
Did you apply PR 8685?
Also, take a look at this:
https://github.com/ceph/ceph/pull/8709
With the new code we do store the object creation time in the delete
bucket index entry. That way we make sure we only sync an object removal
if the object was the same as, or older than, the one that was actually
removed.
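A minimal sketch of that check, under the assumption that the delete entry
in the bucket index log now carries the removed object's mtime (names are
illustrative; see the PR for the actual change):

  #include <ctime>
  #include <iostream>

  // Illustrative delete entry: it records the mtime of the object that was
  // actually removed on the zone where the delete happened.
  struct DeleteLogEntry {
    std::time_t removed_obj_mtime;
  };

  // Only remove the local copy if it is not newer than the object that was
  // removed on the other zone; a newer local mtime means a later write that
  // this delete should not clobber.
  bool should_sync_removal(const DeleteLogEntry& entry, std::time_t local_obj_mtime) {
    return local_obj_mtime <= entry.removed_obj_mtime;
  }

  int main() {
    DeleteLogEntry entry{1461681865};
    std::cout << should_sync_removal(entry, 1461681865) << "\n";  // 1: same object, remove it
    std::cout << should_sync_removal(entry, 1461681900) << "\n";  // 0: local copy is newer, keep it
  }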
Yehuda
* Re: RGW Multisite delete weirdness
From: Abhishek L @ 2016-04-25 19:44 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Abhishek Lekshmanan, Ceph Devel
Yehuda Sadeh-Weinraub writes:
> [...]
>
> Did you apply PR 8685?
>
> Also, take a look at this:
>
> https://github.com/ceph/ceph/pull/8709
>
> With the new code we do store the object creation time in the delete
> bucket index entry. That way we make sure we only sync an object removal
> if the object was the same as, or older than, the one that was actually
> removed.
I hadn't applied the PR yet; I'll apply both and see if I can reproduce
the issue again.
>
> Yehuda
Thanks
--
Abhishek
* Re: RGW Multisite delete weirdness
From: Abhishek Lekshmanan @ 2016-04-26 17:37 UTC (permalink / raw)
To: Abhishek L; +Cc: Yehuda Sadeh-Weinraub, Ceph Devel
Abhishek L writes:
> Yehuda Sadeh-Weinraub writes:
>
>> [...]
>>
>> Did you apply PR 8685?
>>
>> Also, take a look at this:
>>
>> https://github.com/ceph/ceph/pull/8709
>>
>> With the new code we do store the object creation time in the delete
>> bucket index entry. That way we make sure we only sync an object removal
>> if the object was the same as, or older than, the one that was actually
>> removed.
Applied both PRs on top of master + 8655. Basically, the object no longer
resyncs back to the secondary zone after deletion, which we had observed
before, and a create followed by an immediate delete now succeeds in both
zones almost every time.

However, allowing the object to sync to the primary by introducing a delay,
e.g. with a script on the secondary like:

for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done

gives an empty list on the secondary zone; on the primary zone, however, it
looks like only a few objects actually got deleted and the others still
exist:
$ s3 -us list foobar
Content-Type: application/xml
Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
Content-Length: 4713
Key Last Modified Size
-------------------------------------------------- -------------------- -----
foo1 2016-04-26T14:24:25Z 87M
foo10 2016-04-26T14:27:28Z 87M
foo11 2016-04-26T14:27:50Z 87M
foo12 2016-04-26T14:28:12Z 87M
foo14 2016-04-26T14:28:48Z 87M
foo15 2016-04-26T14:29:09Z 87M
foo16 2016-04-26T14:29:30Z 87M
foo17 2016-04-26T14:29:51Z 87M
foo18 2016-04-26T14:30:12Z 87M
foo19 2016-04-26T14:30:33Z 87M
foo2 2016-04-26T14:24:47Z 87M
foo20 2016-04-26T14:30:54Z 87M
foo3 2016-04-26T14:25:07Z 87M
foo6 2016-04-26T14:26:05Z 87M
foo7 2016-04-26T14:26:24Z 87M
foo8 2016-04-26T14:26:47Z 87M
foo9 2016-04-26T14:27:07Z 87M
Logs show this in case of failed deletes:
2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
In case of an object that succeeded deletion:
2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
More interesting log:
2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
...
2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
[this was object foo5, which was one of the objects that did get deleted]
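For what it's worth, a rough sketch of how I read these conditional deletes
(this is only my interpretation of the log lines, with illustrative names,
not the actual RGW code; ret=-2 is -ENOENT, and Last-Modified: 0.000000
suggests no object mtime was found at delete time):

  #include <ctime>
  #include <iostream>
  #include <optional>

  enum class DeleteResult { Removed, NotFound, SkippedNewer };

  // The synced delete carries an If-UnModified-Since timestamp; the
  // receiving zone compares it with the object's current Last-Modified.
  DeleteResult conditional_delete(std::optional<std::time_t> last_modified,
                                  std::time_t if_unmodified_since) {
    if (!last_modified)
      return DeleteResult::NotFound;      // no object found: ret=-2 (-ENOENT) in the logs
    if (*last_modified > if_unmodified_since)
      return DeleteResult::SkippedNewer;  // object changed since: leave it alone
    return DeleteResult::Removed;         // timestamps line up: remove the object
  }

  int main() {
    // foo20 case: Last-Modified reported as 0.000000, the delete fails with NotFound.
    std::cout << (conditional_delete(std::nullopt, 1461688254) == DeleteResult::NotFound) << "\n";
    // success case: Last-Modified equals If-UnModified-Since, the delete proceeds.
    std::cout << (conditional_delete(1461688110, 1461688110) == DeleteResult::Removed) << "\n";
  }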
Let me know if any other log info may be helpful (only debug rgw was
set, debug ms was 0)
Possibly unrelated: a pkill radosgw sometimes threw an error message with a
segfault; I hadn't seen this before, though:
*** Caught signal (Segmentation fault) **
in thread 7fdfbddfcb40 thread_name:radosgw
master sha1 before the patches was 25c2f5a793ff726facde236aa0c2dd78cf933c60
Regards
Abhishek
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-26 22:21 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>
> [...]
>
> Let me know if any other log info may be helpful (only debug rgw was
> set, debug ms was 0)
Yes, this would be great. If you could provide a log with 'debug rgw = 20'
and 'debug ms = 1', it'd be helpful. I'm trying to reproduce the issue, and
was able to get some bad behavior, but different from what you describe.
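For example, something along these lines in ceph.conf on the gateway hosts
should do it (the section name is a placeholder; adjust it to your setup):

  # section name is a placeholder; use your gateway's actual section
  [client.rgw.<instance>]
  debug rgw = 20
  debug ms = 1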
>
> Possibly unrelated: a pkill radosgw sometimes threw an error message with a
> segfault; I hadn't seen this before, though:
> *** Caught signal (Segmentation fault) **
> in thread 7fdfbddfcb40 thread_name:radosgw
>
Do you have a backtrace? I found an issue that might explain this one,
but need a backtrace to validate.
Thanks,
Yehuda
* Re: RGW Multisite delete weirdness
From: Yehuda Sadeh-Weinraub @ 2016-04-26 23:12 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
<yehuda@redhat.com> wrote:
> [...]
>
> Yes, this would be great. If you could provide log with 'debug rgw =
> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
> issue, and was able to get some bad behavior, but different than what
> you describe.
Please take a look at PR 8772, and let me know whether things work
differently for you.
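
If it helps, one way to pull the PR into a local tree for testing,
assuming your origin remote points at the GitHub ceph/ceph repo, is
roughly:

  git fetch origin pull/8772/head:pr-8772
  git checkout pr-8772
  # rebuild and reinstall radosgw on both zones, then restart the gateways
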
Thanks,
Yehuda
>
>>
>> Possibly unrelated, a pkill radosgw sometimes used to throw an error
>> message with segfault, haven't seen this before though:
>> *** Caught signal (Segmentation fault) **
>> in thread 7fdfbddfcb40 thread_name:radosgw
>>
>
> Do you have a backtrace? I found an issue that might explain this one,
> but need a backtrace to validate.
>
> Thanks,
> Yehuda
>
>> master sha1 before the patches was 25c2f5a793ff726facde236aa0c2dd78cf933c60
>>
>> Regards
>> Abhishek
>>>
>>> Hadn't applied the PR yet, I'll apply both and see if I can reproduce
>>> the issue again.
>>>>
>>>> Yehuda
>>>
>>> Thanks
>>> --
>>> Abhishek
>>
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-04-26 23:12 ` Yehuda Sadeh-Weinraub
@ 2016-04-27 20:02 ` Abhishek L
2016-04-27 20:15 ` Yehuda Sadeh-Weinraub
0 siblings, 1 reply; 22+ messages in thread
From: Abhishek L @ 2016-04-27 20:02 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Ceph Devel
Yehuda Sadeh-Weinraub writes:
> On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
> <yehuda@redhat.com> wrote:
>> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>
>>> Abhishek L writes:
>>>
>>>> Yehuda Sadeh-Weinraub writes:
>>>>
>>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>
>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>
>>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>>> <yehuda@redhat.com> wrote:
>>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>>> <abhishek.lekshmanan@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>
>>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>> Trying deleting objects & buckets from a secondary zone in a RGW
>>>>>>>>>>> multisite configuration leads to some wierdness:
>>>>>>>>>>>
>>>>>>>>>>> 1. On deleting an object and the bucket immediately will mostly lead to
>>>>>>>>>>> object and bucket getting deleted in the secondary zone, but since we
>>>>>>>>>>> forward the bucket deletion to master only after we delete in secondary
>>>>>>>>>>> it will fail with 409 (BucketNotEmpty) and gets reraised as a 500 to the
>>>>>>>>>>> client. This _seems_ simple enough to fix if we forward the bucket
>>>>>>>>>>> deletion request to master zone before attempting deletion locally,
>>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>>
>>>>>>>>>>> 2. Deletion of objects themselves: deletion of objects themselves seems
>>>>>>>>>>> to be a bit racy, deleting an object on a secondary zone succeeds,
>>>>>>>>>>> listing the bucket seems to show an empty list, but gets populated with
>>>>>>>>>>> the object again sometimes (this time with a newer timestamp), this is
>>>>>>>>>>> not always guaranteed to be reproduce, but I've seen this often with
>>>>>>>>>>> multipart uploads, as an eg:
>>>>>>>>>>>
>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>> test.img 2016-04-19T13:00:17Z 40M
>>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>> test.img 2016-04-19T13:00:45Z 40M
>>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>> test.img 2016-04-19T13:01:52Z 40M
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mostly seeing log entries of this form in both the cases ie. where
>>>>>>>>>>> delete object seems to be successfully delete in both master and
>>>>>>>>>>> secondary zone and the case where it succeeds in master and fails in
>>>>>>>>>>> secondary :
>>>>>>>>>>>
>>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>>
>>>>>>>>>>> Any ideas on this?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>>> came right after the object creation?
>>>>>>>>>
>>>>>>>>> Only 2 zones ie. one master and one secondary, req, on secondary. The delete came right after the
>>>>>>>>> create though
>>>>>>>>
>>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>>> but end up with different mtime than the object's source. The second
>>>>>>>> issue is that we shouldn't have synced that object.
>>>>>>>>
>>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>>
>>>>>>>
>>>>>>> For the first issue, see:
>>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>>
>>>>>>> However, create that follows by a delete will still be a problem, as
>>>>>>> when we sync the object we check it against the source mtime is newer
>>>>>>> than the destination mtime. This is problematic with deletes, as these
>>>>>>> don't have mtime once the object is removed. I think the solution
>>>>>>> would be by using temporary tombstone objects (we already have the olh
>>>>>>> framework that can provide what we need), that we'll garbage collect.
>>>>>>
>>>>>> Further information from logs if it helps:
>>>>>>
>>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>>
>>>>>> This is what the master zone logs show, however the request timestamp
>>>>>> logged here is the `If-Modified-Since` value from secondary zone when
>>>>>> the actual object write was completed (and not the time when deletion
>>>>>> was completed), do we set the value of the deletion time anywhere else
>>>>>> in the BI log
>>>>>>
>>>>>>
>>>>>
>>>>> Did you apply PR 8685?
>>>>>
>>>>> Also, take a look at this:
>>>>>
>>>>> https://github.com/ceph/ceph/pull/8709
>>>>>
>>>>> With the new code we do store the object creation time in the delete
>>>>> bucket index entry. That way we make sure we only sync object removal,
>>>>> if the object was the same or older than the one that was actually
>>>>> removed.
>>>
>>> Applied both PRs atop of master + 8655, basically now the object doesn't
>>> resync back to the secondary zone after deletion which we observed
>>> before.A create followed by an immediate delete succeeds delete in both the
>>> zones almost every time.
>>>
>>> However allowing the object to sync to primary by introducing a delay,
>>> for eg a script on secondary like:
>>>
>>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>>
>>> gives a empty list on the secondary zone, on the primary zone however it
>>> looks like very few objects got actually deleted and others are still existing
>>>
>>> $ s3 -us list foobar
>>> Content-Type: application/xml
>>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>>> Content-Length: 4713
>>> Key Last Modified Size
>>> -------------------------------------------------- -------------------- -----
>>> foo1 2016-04-26T14:24:25Z 87M
>>> foo10 2016-04-26T14:27:28Z 87M
>>> foo11 2016-04-26T14:27:50Z 87M
>>> foo12 2016-04-26T14:28:12Z 87M
>>> foo14 2016-04-26T14:28:48Z 87M
>>> foo15 2016-04-26T14:29:09Z 87M
>>> foo16 2016-04-26T14:29:30Z 87M
>>> foo17 2016-04-26T14:29:51Z 87M
>>> foo18 2016-04-26T14:30:12Z 87M
>>> foo19 2016-04-26T14:30:33Z 87M
>>> foo2 2016-04-26T14:24:47Z 87M
>>> foo20 2016-04-26T14:30:54Z 87M
>>> foo3 2016-04-26T14:25:07Z 87M
>>> foo6 2016-04-26T14:26:05Z 87M
>>> foo7 2016-04-26T14:26:24Z 87M
>>> foo8 2016-04-26T14:26:47Z 87M
>>> foo9 2016-04-26T14:27:07Z 87M
>>>
>>> Logs show this in case of failed deletes:
>>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>>
>>> In case of an object that succeeded deletion:
>>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>>
>>> More interesting log:
>>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>>> ...
>>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>>> [this was object foo5.. which was one of the objects that got deleted]
>>>
>>> Let me know if any other log info may be helpful (only debug rgw was
>>> set, debug ms was 0)
>>
>> Yes, this would be great. If you could provide log with 'debug rgw =
>> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
>> issue, and was able to get some bad behavior, but different than what
>> you describe.
Tried reproducing this behaviour again, though without much success; I
have debug logs, but without the debug ms setting.
>
> Please take a look at PR 8772, and let me know whether things work
> differently for you.
>
Still seeing the old issue (with or without 8772), i.e. the master sees
an object, and even though it is deleted in the secondary, after some
time it reappears. I've uploaded logs of both the regions using
`ceph-post-file`, uid cc9207f5-8aad-4f29-8d2a-6b84f2c253b0
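
For reference, a rough sketch of how I'd capture the logging you asked
for and upload the result (the rgw section name and log path below are
illustrative, adjust for the actual deployment):

  # in ceph.conf on each zone's gateway host
  [client.rgw.us-east-1]
      debug rgw = 20
      debug ms = 1

  # restart the gateway, re-run the workload, then upload the log
  ceph-post-file /var/log/ceph/ceph-client.rgw.us-east-1.log
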
Regards
Abhishek
> Thanks,
> Yehuda
>
>>
>>>
>>> Possibly unrelated, a pkill radosgw sometimes used to throw an error
>>> message with segfault, haven't seen this before though:
>>> *** Caught signal (Segmentation fault) **
>>> in thread 7fdfbddfcb40 thread_name:radosgw
>>>
>>
>> Do you have a backtrace? I found an issue that might explain this one,
>> but need a backtrace to validate.
>>
>> Thanks,
>> Yehuda
>>
>>> master sha1 before the patches was 25c2f5a793ff726facde236aa0c2dd78cf933c60
>>>
>>> Regards
>>> Abhishek
>>>>
>>>> Hadn't applied the PR yet, I'll apply both and see if I can reproduce
>>>> the issue again.
>>>>>
>>>>> Yehuda
>>>>
>>>> Thanks
>>>> --
>>>> Abhishek
>>>
>>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-04-27 20:02 ` Abhishek L
@ 2016-04-27 20:15 ` Yehuda Sadeh-Weinraub
2016-04-27 21:50 ` Yehuda Sadeh-Weinraub
0 siblings, 1 reply; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-04-27 20:15 UTC (permalink / raw)
To: Abhishek L; +Cc: Ceph Devel
On Wed, Apr 27, 2016 at 1:02 PM, Abhishek L <abhishek@suse.com> wrote:
>
> Yehuda Sadeh-Weinraub writes:
>
>> On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
>> <yehuda@redhat.com> wrote:
>>> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>
>>>> Abhishek L writes:
>>>>
>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>
>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>
>>>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>>>> <yehuda@redhat.com> wrote:
>>>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>>>> <abhishek.lekshmanan@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>>> Trying deleting objects & buckets from a secondary zone in a RGW
>>>>>>>>>>>> multisite configuration leads to some wierdness:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. On deleting an object and the bucket immediately will mostly lead to
>>>>>>>>>>>> object and bucket getting deleted in the secondary zone, but since we
>>>>>>>>>>>> forward the bucket deletion to master only after we delete in secondary
>>>>>>>>>>>> it will fail with 409 (BucketNotEmpty) and gets reraised as a 500 to the
>>>>>>>>>>>> client. This _seems_ simple enough to fix if we forward the bucket
>>>>>>>>>>>> deletion request to master zone before attempting deletion locally,
>>>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Deletion of objects themselves: deletion of objects themselves seems
>>>>>>>>>>>> to be a bit racy, deleting an object on a secondary zone succeeds,
>>>>>>>>>>>> listing the bucket seems to show an empty list, but gets populated with
>>>>>>>>>>>> the object again sometimes (this time with a newer timestamp), this is
>>>>>>>>>>>> not always guaranteed to be reproduce, but I've seen this often with
>>>>>>>>>>>> multipart uploads, as an eg:
>>>>>>>>>>>>
>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>> test.img 2016-04-19T13:00:17Z 40M
>>>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>> test.img 2016-04-19T13:00:45Z 40M
>>>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>> test.img 2016-04-19T13:01:52Z 40M
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Mostly seeing log entries of this form in both the cases ie. where
>>>>>>>>>>>> delete object seems to be successfully delete in both master and
>>>>>>>>>>>> secondary zone and the case where it succeeds in master and fails in
>>>>>>>>>>>> secondary :
>>>>>>>>>>>>
>>>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas on this?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>>>> came right after the object creation?
>>>>>>>>>>
>>>>>>>>>> Only 2 zones ie. one master and one secondary, req, on secondary. The delete came right after the
>>>>>>>>>> create though
>>>>>>>>>
>>>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>>>> but end up with different mtime than the object's source. The second
>>>>>>>>> issue is that we shouldn't have synced that object.
>>>>>>>>>
>>>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>>>
>>>>>>>>
>>>>>>>> For the first issue, see:
>>>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>>>
>>>>>>>> However, create that follows by a delete will still be a problem, as
>>>>>>>> when we sync the object we check it against the source mtime is newer
>>>>>>>> than the destination mtime. This is problematic with deletes, as these
>>>>>>>> don't have mtime once the object is removed. I think the solution
>>>>>>>> would be by using temporary tombstone objects (we already have the olh
>>>>>>>> framework that can provide what we need), that we'll garbage collect.
>>>>>>>
>>>>>>> Further information from logs if it helps:
>>>>>>>
>>>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>>>
>>>>>>> This is what the master zone logs show, however the request timestamp
>>>>>>> logged here is the `If-Modified-Since` value from secondary zone when
>>>>>>> the actual object write was completed (and not the time when deletion
>>>>>>> was completed), do we set the value of the deletion time anywhere else
>>>>>>> in the BI log
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Did you apply PR 8685?
>>>>>>
>>>>>> Also, take a look at this:
>>>>>>
>>>>>> https://github.com/ceph/ceph/pull/8709
>>>>>>
>>>>>> With the new code we do store the object creation time in the delete
>>>>>> bucket index entry. That way we make sure we only sync object removal,
>>>>>> if the object was the same or older than the one that was actually
>>>>>> removed.
>>>>
>>>> Applied both PRs atop of master + 8655, basically now the object doesn't
>>>> resync back to the secondary zone after deletion which we observed
>>>> before.A create followed by an immediate delete succeeds delete in both the
>>>> zones almost every time.
>>>>
>>>> However allowing the object to sync to primary by introducing a delay,
>>>> for eg a script on secondary like:
>>>>
>>>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>>>
>>>> gives a empty list on the secondary zone, on the primary zone however it
>>>> looks like very few objects got actually deleted and others are still existing
>>>>
>>>> $ s3 -us list foobar
>>>> Content-Type: application/xml
>>>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>>>> Content-Length: 4713
>>>> Key Last Modified Size
>>>> -------------------------------------------------- -------------------- -----
>>>> foo1 2016-04-26T14:24:25Z 87M
>>>> foo10 2016-04-26T14:27:28Z 87M
>>>> foo11 2016-04-26T14:27:50Z 87M
>>>> foo12 2016-04-26T14:28:12Z 87M
>>>> foo14 2016-04-26T14:28:48Z 87M
>>>> foo15 2016-04-26T14:29:09Z 87M
>>>> foo16 2016-04-26T14:29:30Z 87M
>>>> foo17 2016-04-26T14:29:51Z 87M
>>>> foo18 2016-04-26T14:30:12Z 87M
>>>> foo19 2016-04-26T14:30:33Z 87M
>>>> foo2 2016-04-26T14:24:47Z 87M
>>>> foo20 2016-04-26T14:30:54Z 87M
>>>> foo3 2016-04-26T14:25:07Z 87M
>>>> foo6 2016-04-26T14:26:05Z 87M
>>>> foo7 2016-04-26T14:26:24Z 87M
>>>> foo8 2016-04-26T14:26:47Z 87M
>>>> foo9 2016-04-26T14:27:07Z 87M
>>>>
>>>> Logs show this in case of failed deletes:
>>>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>>>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>>>
>>>> In case of an object that succeeded deletion:
>>>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>>>
>>>> More interesting log:
>>>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>>>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>>>> ...
>>>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>>>> [this was object foo5.. which was one of the objects that got deleted]
>>>>
>>>> Let me know if any other log info may be helpful (only debug rgw was
>>>> set, debug ms was 0)
>>>
>>> Yes, this would be great. If you could provide log with 'debug rgw =
>>> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
>>> issue, and was able to get some bad behavior, but different than what
>>> you describe.
>
> Tried reproducing this behaviour again, not very successful though, I
> have debug logs without debug ms setting.
>>
>> Please take a look at PR 8772, and let me know whether things work
>> differently for you.
>>
> Still seeing the old issue (with or without 8772), ie. master sees an
> object and even though it is deleted in secondary after sometime it
> reappears. I've uploaded logs of both the regions using
> `ceph-post-file`, uid cc9207f5-8aad-4f29-8d2a-6b84f2c253b0
>
Still haven't looked at the logs, but a quick question: did you also
update the OSDs with the changes in these PRs? There is an objclass
change there, so the OSD side needs to be updated and restarted as well.
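
A quick way to sanity-check that (commands are a sketch; adjust for the
init system in use):

  # confirm all daemons are running the rebuilt binaries
  ceph tell 'osd.*' version
  # after installing the patched build, restart the OSDs so the updated
  # rgw object class (cls_rgw) is loaded
  systemctl restart ceph-osd.target
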
Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-04-27 20:15 ` Yehuda Sadeh-Weinraub
@ 2016-04-27 21:50 ` Yehuda Sadeh-Weinraub
2016-05-31 9:21 ` Abhishek Lekshmanan
0 siblings, 1 reply; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-04-27 21:50 UTC (permalink / raw)
To: Abhishek L; +Cc: Ceph Devel
On Wed, Apr 27, 2016 at 1:15 PM, Yehuda Sadeh-Weinraub
<yehuda@redhat.com> wrote:
> On Wed, Apr 27, 2016 at 1:02 PM, Abhishek L <abhishek@suse.com> wrote:
>>
>> Yehuda Sadeh-Weinraub writes:
>>
>>> On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
>>> <yehuda@redhat.com> wrote:
>>>> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>
>>>>> Abhishek L writes:
>>>>>
>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>
>>>>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>
>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>
>>>>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>>>>> <yehuda@redhat.com> wrote:
>>>>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>>>>> <abhishek.lekshmanan@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>>>> Trying deleting objects & buckets from a secondary zone in a RGW
>>>>>>>>>>>>> multisite configuration leads to some wierdness:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. On deleting an object and the bucket immediately will mostly lead to
>>>>>>>>>>>>> object and bucket getting deleted in the secondary zone, but since we
>>>>>>>>>>>>> forward the bucket deletion to master only after we delete in secondary
>>>>>>>>>>>>> it will fail with 409 (BucketNotEmpty) and gets reraised as a 500 to the
>>>>>>>>>>>>> client. This _seems_ simple enough to fix if we forward the bucket
>>>>>>>>>>>>> deletion request to master zone before attempting deletion locally,
>>>>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Deletion of objects themselves: deletion of objects themselves seems
>>>>>>>>>>>>> to be a bit racy, deleting an object on a secondary zone succeeds,
>>>>>>>>>>>>> listing the bucket seems to show an empty list, but gets populated with
>>>>>>>>>>>>> the object again sometimes (this time with a newer timestamp), this is
>>>>>>>>>>>>> not always guaranteed to be reproduce, but I've seen this often with
>>>>>>>>>>>>> multipart uploads, as an eg:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>> test.img 2016-04-19T13:00:17Z 40M
>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>> test.img 2016-04-19T13:00:45Z 40M
>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>> test.img 2016-04-19T13:01:52Z 40M
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mostly seeing log entries of this form in both the cases ie. where
>>>>>>>>>>>>> delete object seems to be successfully delete in both master and
>>>>>>>>>>>>> secondary zone and the case where it succeeds in master and fails in
>>>>>>>>>>>>> secondary :
>>>>>>>>>>>>>
>>>>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas on this?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>>>>> came right after the object creation?
>>>>>>>>>>>
>>>>>>>>>>> Only 2 zones ie. one master and one secondary, req, on secondary. The delete came right after the
>>>>>>>>>>> create though
>>>>>>>>>>
>>>>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>>>>> but end up with different mtime than the object's source. The second
>>>>>>>>>> issue is that we shouldn't have synced that object.
>>>>>>>>>>
>>>>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For the first issue, see:
>>>>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>>>>
>>>>>>>>> However, create that follows by a delete will still be a problem, as
>>>>>>>>> when we sync the object we check it against the source mtime is newer
>>>>>>>>> than the destination mtime. This is problematic with deletes, as these
>>>>>>>>> don't have mtime once the object is removed. I think the solution
>>>>>>>>> would be by using temporary tombstone objects (we already have the olh
>>>>>>>>> framework that can provide what we need), that we'll garbage collect.
>>>>>>>>
>>>>>>>> Further information from logs if it helps:
>>>>>>>>
>>>>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>>>>
>>>>>>>> This is what the master zone logs show, however the request timestamp
>>>>>>>> logged here is the `If-Modified-Since` value from secondary zone when
>>>>>>>> the actual object write was completed (and not the time when deletion
>>>>>>>> was completed), do we set the value of the deletion time anywhere else
>>>>>>>> in the BI log
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Did you apply PR 8685?
>>>>>>>
>>>>>>> Also, take a look at this:
>>>>>>>
>>>>>>> https://github.com/ceph/ceph/pull/8709
>>>>>>>
>>>>>>> With the new code we do store the object creation time in the delete
>>>>>>> bucket index entry. That way we make sure we only sync object removal,
>>>>>>> if the object was the same or older than the one that was actually
>>>>>>> removed.
>>>>>
>>>>> Applied both PRs atop of master + 8655, basically now the object doesn't
>>>>> resync back to the secondary zone after deletion which we observed
>>>>> before.A create followed by an immediate delete succeeds delete in both the
>>>>> zones almost every time.
>>>>>
>>>>> However allowing the object to sync to primary by introducing a delay,
>>>>> for eg a script on secondary like:
>>>>>
>>>>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>>>>
>>>>> gives a empty list on the secondary zone, on the primary zone however it
>>>>> looks like very few objects got actually deleted and others are still existing
>>>>>
>>>>> $ s3 -us list foobar
>>>>> Content-Type: application/xml
>>>>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>>>>> Content-Length: 4713
>>>>> Key Last Modified Size
>>>>> -------------------------------------------------- -------------------- -----
>>>>> foo1 2016-04-26T14:24:25Z 87M
>>>>> foo10 2016-04-26T14:27:28Z 87M
>>>>> foo11 2016-04-26T14:27:50Z 87M
>>>>> foo12 2016-04-26T14:28:12Z 87M
>>>>> foo14 2016-04-26T14:28:48Z 87M
>>>>> foo15 2016-04-26T14:29:09Z 87M
>>>>> foo16 2016-04-26T14:29:30Z 87M
>>>>> foo17 2016-04-26T14:29:51Z 87M
>>>>> foo18 2016-04-26T14:30:12Z 87M
>>>>> foo19 2016-04-26T14:30:33Z 87M
>>>>> foo2 2016-04-26T14:24:47Z 87M
>>>>> foo20 2016-04-26T14:30:54Z 87M
>>>>> foo3 2016-04-26T14:25:07Z 87M
>>>>> foo6 2016-04-26T14:26:05Z 87M
>>>>> foo7 2016-04-26T14:26:24Z 87M
>>>>> foo8 2016-04-26T14:26:47Z 87M
>>>>> foo9 2016-04-26T14:27:07Z 87M
>>>>>
>>>>> Logs show this in case of failed deletes:
>>>>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>>>>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>>>>
>>>>> In case of an object that succeeded deletion:
>>>>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>>>>
>>>>> More interesting log:
>>>>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>>>>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>>>>> ...
>>>>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>>>>> [this was object foo5.. which was one of the objects that got deleted]
>>>>>
>>>>> Let me know if any other log info may be helpful (only debug rgw was
>>>>> set, debug ms was 0)
>>>>
>>>> Yes, this would be great. If you could provide log with 'debug rgw =
>>>> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
>>>> issue, and was able to get some bad behavior, but different than what
>>>> you describe.
>>
>> Tried reproducing this behaviour again, not very successful though, I
>> have debug logs without debug ms setting.
>>>
>>> Please take a look at PR 8772, and let me know whether things work
>>> differently for you.
>>>
>> Still seeing the old issue (with or without 8772), ie. master sees an
>> object and even though it is deleted in secondary after sometime it
>> reappears. I've uploaded logs of both the regions using
>> `ceph-post-file`, uid cc9207f5-8aad-4f29-8d2a-6b84f2c253b0
>>
>
>
> Still haven't looked at the logs, but a quick question. Did you also
> update the osds with the changes in these PRs? There is an objclass
> change there, so the osd side needed to be updated and restarted.
>
OK, after looking at the logs, I think I understand the problem. For
some reason the sync of object deletes can happen concurrently with the
object creation. I'll fix that and update the PR.
Thanks,
Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-04-27 21:50 ` Yehuda Sadeh-Weinraub
@ 2016-05-31 9:21 ` Abhishek Lekshmanan
2016-05-31 11:06 ` Yehuda Sadeh-Weinraub
0 siblings, 1 reply; 22+ messages in thread
From: Abhishek Lekshmanan @ 2016-05-31 9:21 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Abhishek L, Ceph Devel
Yehuda Sadeh-Weinraub writes:
> On Wed, Apr 27, 2016 at 1:15 PM, Yehuda Sadeh-Weinraub
> <yehuda@redhat.com> wrote:
>> On Wed, Apr 27, 2016 at 1:02 PM, Abhishek L <abhishek@suse.com> wrote:
>>>
>>> Yehuda Sadeh-Weinraub writes:
>>>
>>>> On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
>>>> <yehuda@redhat.com> wrote:
>>>>> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>
>>>>>> Abhishek L writes:
>>>>>>
>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>
>>>>>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>
>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>
>>>>>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>>>>>> <yehuda@redhat.com> wrote:
>>>>>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>>>>>> <abhishek.lekshmanan@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>>>>> Trying deleting objects & buckets from a secondary zone in a RGW
>>>>>>>>>>>>>> multisite configuration leads to some wierdness:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. On deleting an object and the bucket immediately will mostly lead to
>>>>>>>>>>>>>> object and bucket getting deleted in the secondary zone, but since we
>>>>>>>>>>>>>> forward the bucket deletion to master only after we delete in secondary
>>>>>>>>>>>>>> it will fail with 409 (BucketNotEmpty) and gets reraised as a 500 to the
>>>>>>>>>>>>>> client. This _seems_ simple enough to fix if we forward the bucket
>>>>>>>>>>>>>> deletion request to master zone before attempting deletion locally,
>>>>>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. Deletion of objects themselves: deletion of objects themselves seems
>>>>>>>>>>>>>> to be a bit racy, deleting an object on a secondary zone succeeds,
>>>>>>>>>>>>>> listing the bucket seems to show an empty list, but gets populated with
>>>>>>>>>>>>>> the object again sometimes (this time with a newer timestamp), this is
>>>>>>>>>>>>>> not always guaranteed to be reproduce, but I've seen this often with
>>>>>>>>>>>>>> multipart uploads, as an eg:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>> test.img 2016-04-19T13:00:17Z 40M
>>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>> test.img 2016-04-19T13:00:45Z 40M
>>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>> test.img 2016-04-19T13:01:52Z 40M
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mostly seeing log entries of this form in both the cases ie. where
>>>>>>>>>>>>>> delete object seems to be successfully delete in both master and
>>>>>>>>>>>>>> secondary zone and the case where it succeeds in master and fails in
>>>>>>>>>>>>>> secondary :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any ideas on this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>>>>>> came right after the object creation?
>>>>>>>>>>>>
>>>>>>>>>>>> Only 2 zones ie. one master and one secondary, req, on secondary. The delete came right after the
>>>>>>>>>>>> create though
>>>>>>>>>>>
>>>>>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>>>>>> but end up with different mtime than the object's source. The second
>>>>>>>>>>> issue is that we shouldn't have synced that object.
>>>>>>>>>>>
>>>>>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For the first issue, see:
>>>>>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>>>>>
>>>>>>>>>> However, create that follows by a delete will still be a problem, as
>>>>>>>>>> when we sync the object we check it against the source mtime is newer
>>>>>>>>>> than the destination mtime. This is problematic with deletes, as these
>>>>>>>>>> don't have mtime once the object is removed. I think the solution
>>>>>>>>>> would be by using temporary tombstone objects (we already have the olh
>>>>>>>>>> framework that can provide what we need), that we'll garbage collect.
>>>>>>>>>
>>>>>>>>> Further information from logs if it helps:
>>>>>>>>>
>>>>>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>>>>>
>>>>>>>>> This is what the master zone logs show, however the request timestamp
>>>>>>>>> logged here is the `If-Modified-Since` value from secondary zone when
>>>>>>>>> the actual object write was completed (and not the time when deletion
>>>>>>>>> was completed), do we set the value of the deletion time anywhere else
>>>>>>>>> in the BI log
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Did you apply PR 8685?
>>>>>>>>
>>>>>>>> Also, take a look at this:
>>>>>>>>
>>>>>>>> https://github.com/ceph/ceph/pull/8709
>>>>>>>>
>>>>>>>> With the new code we do store the object creation time in the delete
>>>>>>>> bucket index entry. That way we make sure we only sync object removal,
>>>>>>>> if the object was the same or older than the one that was actually
>>>>>>>> removed.
>>>>>>
>>>>>> Applied both PRs atop of master + 8655, basically now the object doesn't
>>>>>> resync back to the secondary zone after deletion which we observed
>>>>>> before.A create followed by an immediate delete succeeds delete in both the
>>>>>> zones almost every time.
>>>>>>
>>>>>> However allowing the object to sync to primary by introducing a delay,
>>>>>> for eg a script on secondary like:
>>>>>>
>>>>>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>>>>>
>>>>>> gives a empty list on the secondary zone, on the primary zone however it
>>>>>> looks like very few objects got actually deleted and others are still existing
>>>>>>
>>>>>> $ s3 -us list foobar
>>>>>> Content-Type: application/xml
>>>>>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>>>>>> Content-Length: 4713
>>>>>> Key Last Modified Size
>>>>>> -------------------------------------------------- -------------------- -----
>>>>>> foo1 2016-04-26T14:24:25Z 87M
>>>>>> foo10 2016-04-26T14:27:28Z 87M
>>>>>> foo11 2016-04-26T14:27:50Z 87M
>>>>>> foo12 2016-04-26T14:28:12Z 87M
>>>>>> foo14 2016-04-26T14:28:48Z 87M
>>>>>> foo15 2016-04-26T14:29:09Z 87M
>>>>>> foo16 2016-04-26T14:29:30Z 87M
>>>>>> foo17 2016-04-26T14:29:51Z 87M
>>>>>> foo18 2016-04-26T14:30:12Z 87M
>>>>>> foo19 2016-04-26T14:30:33Z 87M
>>>>>> foo2 2016-04-26T14:24:47Z 87M
>>>>>> foo20 2016-04-26T14:30:54Z 87M
>>>>>> foo3 2016-04-26T14:25:07Z 87M
>>>>>> foo6 2016-04-26T14:26:05Z 87M
>>>>>> foo7 2016-04-26T14:26:24Z 87M
>>>>>> foo8 2016-04-26T14:26:47Z 87M
>>>>>> foo9 2016-04-26T14:27:07Z 87M
>>>>>>
>>>>>> Logs show this in case of failed deletes:
>>>>>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>>>>>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>>>>>
>>>>>> In case of an object that succeeded deletion:
>>>>>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>>>>>
>>>>>> More interesting log:
>>>>>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>>>>>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>>>>>> ...
>>>>>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>>>>>> [this was object foo5.. which was one of the objects that got deleted]
>>>>>>
>>>>>> Let me know if any other log info may be helpful (only debug rgw was
>>>>>> set, debug ms was 0)
>>>>>
>>>>> Yes, this would be great. If you could provide log with 'debug rgw =
>>>>> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
>>>>> issue, and was able to get some bad behavior, but different than what
>>>>> you describe.
>>>
>>> Tried reproducing this behaviour again, not very successful though, I
>>> have debug logs without debug ms setting.
>>>>
>>>> Please take a look at PR 8772, and let me know whether things work
>>>> differently for you.
>>>>
>>> Still seeing the old issue (with or without 8772), ie. master sees an
>>> object and even though it is deleted in secondary after sometime it
>>> reappears. I've uploaded logs of both the regions using
>>> `ceph-post-file`, uid cc9207f5-8aad-4f29-8d2a-6b84f2c253b0
>>>
>>
>>
>> Still haven't looked at the logs, but a quick question. Did you also
>> update the osds with the changes in these PRs? There is an objclass
>> change there, so the osd side needed to be updated and restarted.
>>
>
> Ok, after looking at the logs, I think I understand the problem. For
> some reason object deletes sync can happen in concurrent with object
> creation. I'll fix that and update the PR.
I tried this again on master from a week ago (commit
a41860a501a245b57510543946cb1d2a24b61342). The issue is not reproducible
in most cases, but for a particular object size and delay I could still
see it: here I used an object of ~95M, uploaded via multipart, with a
delay of 1s between the create and the delete (smaller object sizes
generally succeeded without any issue). I've uploaded a snippet of the
logs to the issue tracker at http://tracker.ceph.com/issues/15542; let
me know if you need a fuller log from the run.
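
A sketch of the kind of loop that hits this (bucket and object names are
illustrative; the object just needs to be big enough to go through the
multipart path):

  # ~95M test object, create/delete cycle with a 1s gap, run on the secondary
  dd if=/dev/zero of=test95m.img bs=1M count=95
  for i in {1..20}; do
      s3 -us put foobar/del$i filename=test95m.img && sleep 1 && s3 -us delete foobar/del$i
  done
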
Reading through the logs a little, this is what I could understand,
e.g. for an object del19 in the secondary zone:
14:16:31.9879 -> complete_multipart issued to client
14:16:32.3816 -> system req. Get Obj from the primary
14:16:33.0503 -> client issues delete obj
14:16:33.1609 -> delete obj complete issued from rgw
14:16:33.4394 -> data changes log sent to primary; however, the delete obj
fails in the primary as rados returns -2 (the object has not finished
syncing to the primary yet)
14:16:34.5547 -> system req. Get Obj from primary completed
Is it possible that we've already iterated past the head object, so the
get obj returns the full object to the primary even though a delete was
issued against it, or something similar?
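
For what it's worth, a simple way to line the events up like this is to
grep both zones' rgw logs for the object name and sort by timestamp (log
file names are illustrative):

  grep -h del19 secondary-rgw.log primary-rgw.log | sort
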
Regards,
Abhishek
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-05-31 9:21 ` Abhishek Lekshmanan
@ 2016-05-31 11:06 ` Yehuda Sadeh-Weinraub
2016-06-02 13:01 ` Abhishek Lekshmanan
0 siblings, 1 reply; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-05-31 11:06 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Tue, May 31, 2016 at 2:21 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>
> Yehuda Sadeh-Weinraub writes:
>
>> On Wed, Apr 27, 2016 at 1:15 PM, Yehuda Sadeh-Weinraub
>> <yehuda@redhat.com> wrote:
>>> On Wed, Apr 27, 2016 at 1:02 PM, Abhishek L <abhishek@suse.com> wrote:
>>>>
>>>> Yehuda Sadeh-Weinraub writes:
>>>>
>>>>> On Tue, Apr 26, 2016 at 3:21 PM, Yehuda Sadeh-Weinraub
>>>>> <yehuda@redhat.com> wrote:
>>>>>> On Tue, Apr 26, 2016 at 10:37 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>
>>>>>>> Abhishek L writes:
>>>>>>>
>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>
>>>>>>>>> On Mon, Apr 25, 2016 at 1:17 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
>>>>>>>>>>> <yehuda@redhat.com> wrote:
>>>>>>>>>>>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>>>>>>>>>>>> <abhishek.lekshmanan@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yehuda Sadeh-Weinraub writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>>>>>>>>>>>> Trying deleting objects & buckets from a secondary zone in a RGW
>>>>>>>>>>>>>>> multisite configuration leads to some wierdness:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. On deleting an object and the bucket immediately will mostly lead to
>>>>>>>>>>>>>>> object and bucket getting deleted in the secondary zone, but since we
>>>>>>>>>>>>>>> forward the bucket deletion to master only after we delete in secondary
>>>>>>>>>>>>>>> it will fail with 409 (BucketNotEmpty) and gets reraised as a 500 to the
>>>>>>>>>>>>>>> client. This _seems_ simple enough to fix if we forward the bucket
>>>>>>>>>>>>>>> deletion request to master zone before attempting deletion locally,
>>>>>>>>>>>>>>> (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yeah, this looks good. We'll get it through testing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Deletion of objects themselves: deletion of objects themselves seems
>>>>>>>>>>>>>>> to be a bit racy, deleting an object on a secondary zone succeeds,
>>>>>>>>>>>>>>> listing the bucket seems to show an empty list, but gets populated with
>>>>>>>>>>>>>>> the object again sometimes (this time with a newer timestamp), this is
>>>>>>>>>>>>>>> not always guaranteed to be reproduce, but I've seen this often with
>>>>>>>>>>>>>>> multipart uploads, as an eg:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>>> test.img 2016-04-19T13:00:17Z 40M
>>>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img
>>>>>>>>>>>>>>> $ s3 -u list test-mp
>>>>>>>>>>>>>>> Key Last Modified Size
>>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>>> test.img 2016-04-19T13:00:45Z 40M
>>>>>>>>>>>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>>>>>>>>>>>> $ s3 -us list test-mp
>>>>>>>>>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>>>>>>>>>> test.img 2016-04-19T13:01:52Z 40M
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Mostly seeing log entries of this form in both the cases ie. where
>>>>>>>>>>>>>>> delete object seems to be successfully delete in both master and
>>>>>>>>>>>>>>> secondary zone and the case where it succeeds in master and fails in
>>>>>>>>>>>>>>> secondary :
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>>>>>>>>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>>>>>>>>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas on this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>>>>>>>>>>>> came right after the object creation?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Only 2 zones ie. one master and one secondary, req, on secondary. The delete came right after the
>>>>>>>>>>>>> create though
>>>>>>>>>>>>
>>>>>>>>>>>> There are two issues that I see here. One is that we sync an object,
>>>>>>>>>>>> but end up with different mtime than the object's source. The second
>>>>>>>>>>>> issue is that we shouldn't have synced that object.
>>>>>>>>>>>>
>>>>>>>>>>>> There needs to be a check when syncing objects, to validate that we
>>>>>>>>>>>> don't sync an object that originated from the current zone (by
>>>>>>>>>>>> comparing the short zone id). We might be missing that.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For the first issue, see:
>>>>>>>>>>> https://github.com/ceph/ceph/pull/8685
>>>>>>>>>>>
>>>>>>>>>>> However, create that follows by a delete will still be a problem, as
>>>>>>>>>>> when we sync the object we check it against the source mtime is newer
>>>>>>>>>>> than the destination mtime. This is problematic with deletes, as these
>>>>>>>>>>> don't have mtime once the object is removed. I think the solution
>>>>>>>>>>> would be by using temporary tombstone objects (we already have the olh
>>>>>>>>>>> framework that can provide what we need), that we'll garbage collect.
>>>>>>>>>>
>>>>>>>>>> Further information from logs if it helps:
>>>>>>>>>>
>>>>>>>>>> 2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
>>>>>>>>>> 2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)
>>>>>>>>>>
>>>>>>>>>> This is what the master zone logs show, however the request timestamp
>>>>>>>>>> logged here is the `If-Modified-Since` value from secondary zone when
>>>>>>>>>> the actual object write was completed (and not the time when deletion
>>>>>>>>>> was completed), do we set the value of the deletion time anywhere else
>>>>>>>>>> in the BI log
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Did you apply PR 8685?
>>>>>>>>>
>>>>>>>>> Also, take a look at this:
>>>>>>>>>
>>>>>>>>> https://github.com/ceph/ceph/pull/8709
>>>>>>>>>
>>>>>>>>> With the new code we do store the object creation time in the delete
>>>>>>>>> bucket index entry. That way we make sure we only sync object removal,
>>>>>>>>> if the object was the same or older than the one that was actually
>>>>>>>>> removed.
>>>>>>>
>>>>>>> Applied both PRs atop of master + 8655, basically now the object doesn't
>>>>>>> resync back to the secondary zone after deletion which we observed
>>>>>>> before.A create followed by an immediate delete succeeds delete in both the
>>>>>>> zones almost every time.
>>>>>>>
>>>>>>> However allowing the object to sync to primary by introducing a delay,
>>>>>>> for eg a script on secondary like:
>>>>>>>
>>>>>>> for i in {1..20}; do s3 -us put foobar/foo$i filename=test.img && sleep 3 && s3 -us delete foobar/foo$i; done
>>>>>>>
>>>>>>> gives a empty list on the secondary zone, on the primary zone however it
>>>>>>> looks like very few objects got actually deleted and others are still existing
>>>>>>>
>>>>>>> $ s3 -us list foobar
>>>>>>> Content-Type: application/xml
>>>>>>> Request-Id: tx000000000000000001db8-00571f9cdb-1015-us-east-1
>>>>>>> Content-Length: 4713
>>>>>>> Key Last Modified Size
>>>>>>> -------------------------------------------------- -------------------- -----
>>>>>>> foo1 2016-04-26T14:24:25Z 87M
>>>>>>> foo10 2016-04-26T14:27:28Z 87M
>>>>>>> foo11 2016-04-26T14:27:50Z 87M
>>>>>>> foo12 2016-04-26T14:28:12Z 87M
>>>>>>> foo14 2016-04-26T14:28:48Z 87M
>>>>>>> foo15 2016-04-26T14:29:09Z 87M
>>>>>>> foo16 2016-04-26T14:29:30Z 87M
>>>>>>> foo17 2016-04-26T14:29:51Z 87M
>>>>>>> foo18 2016-04-26T14:30:12Z 87M
>>>>>>> foo19 2016-04-26T14:30:33Z 87M
>>>>>>> foo2 2016-04-26T14:24:47Z 87M
>>>>>>> foo20 2016-04-26T14:30:54Z 87M
>>>>>>> foo3 2016-04-26T14:25:07Z 87M
>>>>>>> foo6 2016-04-26T14:26:05Z 87M
>>>>>>> foo7 2016-04-26T14:26:24Z 87M
>>>>>>> foo8 2016-04-26T14:26:47Z 87M
>>>>>>> foo9 2016-04-26T14:27:07Z 87M
>>>>>>>
>>>>>>> Logs show this in case of failed deletes:
>>>>>>> 2016-04-26 18:31:01.793673 7ff3f1ffb700 10 If-UnModified-Since: 2016-04-26 18:30:54.0.751623s Last-Modified: 0.000000
>>>>>>> 2016-04-26 18:31:01.793716 7ff3f1ffb700 20 _send_request(): delete_obj() obj=foobar3:foo20 returned ret=-2
>>>>>>>
>>>>>>> In case of an object that succeeded deletion:
>>>>>>> 2016-04-26 18:28:40.673885 7ff3f27fc700 10 If-UnModified-Since: 2016-04-26 18:28:30.0.510155s Last-Modified: 2016-04-26 18:28:30.510155
>>>>>>>
>>>>>>> More interesting log:
>>>>>>> 2016-04-26 18:25:52.660674 7ff3cf7fe700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 0.000000
>>>>>>> 2016-04-26 18:25:52.660698 7ff3cf7fe700 20 _send_request(): delete_obj() obj=foobar3:foo5 returned ret=-2
>>>>>>> ...
>>>>>>> 2016-04-26 18:25:59.341420 7ff3a67fc700 10 If-UnModified-Since: 2016-04-26 18:25:46.0.700052s Last-Modified: 2016-04-26 18:25:46.700052
>>>>>>> [this was object foo5.. which was one of the objects that got deleted]
>>>>>>>
>>>>>>> Let me know if any other log info may be helpful (only debug rgw was
>>>>>>> set, debug ms was 0)
>>>>>>
>>>>>> Yes, this would be great. If you could provide log with 'debug rgw =
>>>>>> 20' and 'debug ms = 1' it'd be helpful. I'm trying to reproduce the
>>>>>> issue, and was able to get some bad behavior, but different than what
>>>>>> you describe.
>>>>
>>>> Tried reproducing this behaviour again, not very successful though, I
>>>> have debug logs without debug ms setting.
>>>>>
>>>>> Please take a look at PR 8772, and let me know whether things work
>>>>> differently for you.
>>>>>
>>>> Still seeing the old issue (with or without 8772), i.e. the master sees
>>>> an object, and even though it is deleted on the secondary, after some
>>>> time it reappears. I've uploaded logs from both regions using
>>>> `ceph-post-file`, uid cc9207f5-8aad-4f29-8d2a-6b84f2c253b0
>>>>
>>>
>>>
>>> Still haven't looked at the logs, but a quick question: did you also
>>> update the OSDs with the changes in these PRs? There is an objclass
>>> change there, so the OSD side needs to be updated and restarted.
>>>
>>
>> Ok, after looking at the logs, I think I understand the problem. For
>> some reason, object delete sync can happen concurrently with object
>> creation. I'll fix that and update the PR.
>
> I tried this again on master from a week ago (commit
> a41860a501a245b57510543946cb1d2a24b61342). The issue is not reproducible
> in most cases, but for a particular object size and delay I could see it
> (here I tried an object of ~95M, multipart, with a delay of 1s between
> the create and the delete; smaller object sizes generally succeeded
> without any issue). I've uploaded a snippet of the logs to the issue
> tracker at http://tracker.ceph.com/issues/15542; let me know if you need
> a fuller log from the run.
>
> Reading through the logs a little, this is what I could make out, e.g.
> for an object del19 in the secondary zone:
> 14:16:31.9879 -> complete_multipart issued to client
> 14:16:32.3816 -> system req. Get Obj from the primary
> 14:16:33.0503 -> client issues delete obj
> 14:16:33.1609 -> delete obj complete issued from rgw
> 14:16:33.4394 -> data changes log sent to primary; however, the delete obj
> fails on the primary as rados returns -2 (the object has not fully synced
> to the primary yet)
> 14:16:34.5547 -> system req. Get Obj from the primary completed
>
> Is it possible that we've already iterated past the head obj., so the GET
> returns the full object to the primary even though a delete was issued to
> it, or something similar?
>
Yes, that would be normal behaviour. The primary should not have
concurrent sync operations on the same object if the object has not
completed its previous sync operations. Looking at the log it really seems
that we don't identify the concurrent sync operation on the same
object. This should have been fixed by commit
edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
went wrong there (whether can_do_op() returned true and why)?
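For reference, the guard is supposed to behave roughly like this; a toy
sketch with made-up names, not the real RGWBucketIncSyncShardMarkerTrack:

// A second sync op on the same object key must not start until the first
// one completes.
#include <iostream>
#include <set>
#include <string>

class ShardOpTracker {
    std::set<std::string> pending;   // keys with a non-complete sync op
public:
    bool can_do_op(const std::string& key) const {
        return pending.count(key) == 0;
    }
    void start_op(const std::string& key)  { pending.insert(key); }
    void finish_op(const std::string& key) { pending.erase(key); }
};

int main()
{
    ShardOpTracker tracker;

    tracker.start_op("del19");                       // fetch from peer begins
    std::cout << tracker.can_do_op("del19") << '\n'; // 0: delete must wait

    tracker.finish_op("del19");                      // fetch completes
    std::cout << tracker.can_do_op("del19") << '\n'; // 1: delete can proceed
}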
Thanks,
Yehuda
> Regards,
> Abhishek
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-05-31 11:06 ` Yehuda Sadeh-Weinraub
@ 2016-06-02 13:01 ` Abhishek Lekshmanan
2016-06-02 13:09 ` Yehuda Sadeh-Weinraub
0 siblings, 1 reply; 22+ messages in thread
From: Abhishek Lekshmanan @ 2016-06-02 13:01 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Ceph Devel
[..]
Yehuda Sadeh-Weinraub writes:
>
> Yes, that would be a normal behaviour. The primary should not have
> concurrent sync operations on the same object if object has not
> completed previous sync operations. Looking at the log it really seems
> that we don't identify the concurrent sync operation on the same
> object. This should have been fixed by commit
> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
> went wrong there (whether can_do_op() returned true and why)?
Looked into this a bit: can_do_op() returned true both when the primary
issues a Fetch (or GET) and when a delete is issued, even though the
Fetch is not complete yet. Putting a debug log around where we clear the
keys shows that the delete op and the get op each create and delete the
same key successfully.
Which makes me suspect that different instances of
RGWBucketIncSyncShardMarkerTrack are at play here, leading to
independent values of key_to_marker. Is that possible?
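To illustrate the suspicion (a toy with made-up names, not the actual
tracker code): if the GET path and the DELETE path end up consulting two
different key_to_marker maps, neither sees the other's pending entry, so
both ops get through:

#include <iostream>
#include <map>
#include <string>

using KeyToMarker = std::map<std::string, std::string>;

// Allowed only if no pending (non-complete) op exists for the same key
// in the map we happen to be looking at.
bool can_do_op(const KeyToMarker& key_to_marker, const std::string& key)
{
    return key_to_marker.find(key) == key_to_marker.end();
}

int main()
{
    KeyToMarker tracker_a;           // instance used by the fetch (GET) CR
    KeyToMarker tracker_b;           // instance used by the delete CR

    tracker_a["foo19"] = "00000000027.27.2";   // fetch in flight

    // Same instance: the delete would be held back, as intended.
    std::cout << can_do_op(tracker_a, "foo19") << '\n';   // 0

    // Different instance: the delete is let through concurrently.
    std::cout << can_do_op(tracker_b, "foo19") << '\n';   // 1
}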
Regards
Abhishek
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-06-02 13:01 ` Abhishek Lekshmanan
@ 2016-06-02 13:09 ` Yehuda Sadeh-Weinraub
2016-06-03 8:28 ` Abhishek Lekshmanan
0 siblings, 1 reply; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-06-02 13:09 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
> [..]
> Yehuda Sadeh-Weinraub writes:
>>
>> Yes, that would be a normal behaviour. The primary should not have
>> concurrent sync operations on the same object if object has not
>> completed previous sync operations. Looking at the log it really seems
>> that we don't identify the concurrent sync operation on the same
>> object. This should have been fixed by commit
>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>> went wrong there (whether can_do_op() returned true and why)?
>
> Looked into this a bit, can_do_op() has returned true for the case when
> primary issues a Fetch (or GET) and when a delete is issued,(even though
> the Fetch is still not complete yet) by putting a debug log around when
> we clear the keys, both the delete op and the get op creates and deletes
> the same key successfully.
>
> Which makes me suspect, that different instances of
> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
> independent values for key_to_marker. Is that possible?
>
Shouldn't happen, but maybe something went wrong. Try adding some more
info to the log message to see if that's the case.
Thanks,
Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-06-02 13:09 ` Yehuda Sadeh-Weinraub
@ 2016-06-03 8:28 ` Abhishek Lekshmanan
2016-06-03 9:00 ` Yehuda Sadeh-Weinraub
0 siblings, 1 reply; 22+ messages in thread
From: Abhishek Lekshmanan @ 2016-06-03 8:28 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Abhishek Lekshmanan, Ceph Devel
Yehuda Sadeh-Weinraub writes:
> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>> [..]
>> Yehuda Sadeh-Weinraub writes:
>>>
>>> Yes, that would be a normal behaviour. The primary should not have
>>> concurrent sync operations on the same object if object has not
>>> completed previous sync operations. Looking at the log it really seems
>>> that we don't identify the concurrent sync operation on the same
>>> object. This should have been fixed by commit
>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>> went wrong there (whether can_do_op() returned true and why)?
>>
>> Looked into this a bit, can_do_op() has returned true for the case when
>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>> the Fetch is still not complete yet) by putting a debug log around when
>> we clear the keys, both the delete op and the get op creates and deletes
>> the same key successfully.
>>
>> Which makes me suspect, that different instances of
>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>> independent values for key_to_marker. Is that possible?
>>
> Shouldn't happen, but maybe something went wrong. Try adding some more
> info to the log message to see if that's the case.
I just added a debug log whenever a new instance of
RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
keys. In all cases, i.e. when a GET was called and/or when a DELETE was
called, it was a newer instance of marker_tracker that was being
invoked. Also, a few lines before always showed this:
incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
which seems to be emitted whenever we're creating a new SingleEntry CR?
(The value of modified_iter was the same in every case.)
Also, looking at the cases where the deletion succeeded in the secondary
zone, can_do_op had succeeded every time there too; the difference was
that either the object GET came from the remote site after the original
site had already processed the DELETE, or the GET on the remote site was
processed in time, before the DELETE.
--
Abhishek
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-06-03 8:28 ` Abhishek Lekshmanan
@ 2016-06-03 9:00 ` Yehuda Sadeh-Weinraub
2016-06-03 9:09 ` Yehuda Sadeh-Weinraub
2016-06-03 9:16 ` Abhishek Lekshmanan
0 siblings, 2 replies; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-06-03 9:00 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>
> Yehuda Sadeh-Weinraub writes:
>
>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>> [..]
>>> Yehuda Sadeh-Weinraub writes:
>>>>
>>>> Yes, that would be a normal behaviour. The primary should not have
>>>> concurrent sync operations on the same object if object has not
>>>> completed previous sync operations. Looking at the log it really seems
>>>> that we don't identify the concurrent sync operation on the same
>>>> object. This should have been fixed by commit
>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>> went wrong there (whether can_do_op() returned true and why)?
>>>
>>> Looked into this a bit, can_do_op() has returned true for the case when
>>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>>> the Fetch is still not complete yet) by putting a debug log around when
>>> we clear the keys, both the delete op and the get op creates and deletes
>>> the same key successfully.
>>>
>>> Which makes me suspect, that different instances of
>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>>> independent values for key_to_marker. Is that possible?
>>>
>> Shouldn't happen, but maybe something went wrong. Try adding some more
>> info to the log message to see if that's the case.
>
> I just added a debug log whenever a new instance of
> RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
> keys, in all cases, ie. when a GET was called and/or when a DELETE was
> called, it was a newer instance of marker_tracker that was being invoked.
> Also a few lines before always showed this:
>
> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>
> which seems to be called whenever we're creating a new SingleEntry CR?
> (the value of modified_iter was the same in every case)
>
> Also looking at the cases where the deletion succeeded in the secondary
> zone, it seemed here too can_do_op had succeeded every time, the
> difference was in this case either the Object GET came from the remote
> site after original site had already processed the DELETE or in other
> cases, the GET in remote site was processed in time before the DELETE.
>
>
Can you provide the log? I'm still not sure how you'd end up with different
marker trackers for the same bucket instance, as we take a lease to
prevent concurrent updates to the same bucket shard. This should be the
case for the async updates too.
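For reference, the exclusion the lease is meant to give us, as an
in-process toy (made-up names; the real lease is a lock taken on a RADOS
object and renewed periodically, but the intended effect is the same):

#include <iostream>
#include <map>
#include <mutex>
#include <string>

class ShardLeases {
    std::mutex m;
    std::map<std::string, bool> held;   // shard id -> lease currently held?
public:
    bool try_acquire(const std::string& shard) {
        std::lock_guard<std::mutex> l(m);
        if (held[shard]) return false;  // someone else is syncing this shard
        held[shard] = true;
        return true;
    }
    void release(const std::string& shard) {
        std::lock_guard<std::mutex> l(m);
        held[shard] = false;
    }
};

int main()
{
    ShardLeases leases;
    std::cout << leases.try_acquire("mybucket:24099.1:0") << '\n'; // 1
    std::cout << leases.try_acquire("mybucket:24099.1:0") << '\n'; // 0: blocked
    leases.release("mybucket:24099.1:0");
    std::cout << leases.try_acquire("mybucket:24099.1:0") << '\n'; // 1 again
}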
Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-06-03 9:00 ` Yehuda Sadeh-Weinraub
@ 2016-06-03 9:09 ` Yehuda Sadeh-Weinraub
2016-06-03 9:16 ` Abhishek Lekshmanan
1 sibling, 0 replies; 22+ messages in thread
From: Yehuda Sadeh-Weinraub @ 2016-06-03 9:09 UTC (permalink / raw)
To: Abhishek Lekshmanan; +Cc: Ceph Devel
On Fri, Jun 3, 2016 at 2:00 AM, Yehuda Sadeh-Weinraub <yehuda@redhat.com> wrote:
> On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>
>> Yehuda Sadeh-Weinraub writes:
>>
>>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>> [..]
>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>> Yes, that would be a normal behaviour. The primary should not have
>>>>> concurrent sync operations on the same object if object has not
>>>>> completed previous sync operations. Looking at the log it really seems
>>>>> that we don't identify the concurrent sync operation on the same
>>>>> object. This should have been fixed by commit
>>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>>> went wrong there (whether can_do_op() returned true and why)?
>>>>
>>>> Looked into this a bit, can_do_op() has returned true for the case when
>>>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>>>> the Fetch is still not complete yet) by putting a debug log around when
>>>> we clear the keys, both the delete op and the get op creates and deletes
>>>> the same key successfully.
>>>>
>>>> Which makes me suspect, that different instances of
>>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>>>> independent values for key_to_marker. Is that possible?
>>>>
>>> Shouldn't happen, but maybe something went wrong. Try adding some more
>>> info to the log message to see if that's the case.
>>
>> I just added a debug log whenever a new instance of
>> RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
>> keys, in all cases, ie. when a GET was called and/or when a DELETE was
>> called, it was a newer instance of marker_tracker that was being invoked.
>> Also a few lines before always showed this:
>>
>> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>>
>> which seems to be called whenever we're creating a new SingleEntry CR?
>> (the value of modified_iter was the same in every case)
>>
>> Also looking at the cases where the deletion succeeded in the secondary
>> zone, it seemed here too can_do_op had succeeded every time, the
>> difference was in this case either the Object GET came from the remote
>> site after original site had already processed the DELETE or in other
>> cases, the GET in remote site was processed in time before the DELETE.
>>
>>
>
> Can you provide the log? I'm still not sure how you'd have different
> tracker markers for the same bucket instance, as we take a lease to
> prevent concurrent updates to the same bucket shard. This should
> happen in the async updates too.
>
Also, try this:
https://github.com/yehudasa/ceph/commit/b00096207e5fb2b1d7591a59a8012ec458bcde4b
Thanks,
Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RGW Multisite delete wierdness
2016-06-03 9:00 ` Yehuda Sadeh-Weinraub
2016-06-03 9:09 ` Yehuda Sadeh-Weinraub
@ 2016-06-03 9:16 ` Abhishek Lekshmanan
1 sibling, 0 replies; 22+ messages in thread
From: Abhishek Lekshmanan @ 2016-06-03 9:16 UTC (permalink / raw)
To: Yehuda Sadeh-Weinraub; +Cc: Abhishek Lekshmanan, Ceph Devel
Yehuda Sadeh-Weinraub writes:
> On Fri, Jun 3, 2016 at 1:28 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>
>> Yehuda Sadeh-Weinraub writes:
>>
>>> On Thu, Jun 2, 2016 at 6:01 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>> [..]
>>>> Yehuda Sadeh-Weinraub writes:
>>>>>
>>>>> Yes, that would be a normal behaviour. The primary should not have
>>>>> concurrent sync operations on the same object if object has not
>>>>> completed previous sync operations. Looking at the log it really seems
>>>>> that we don't identify the concurrent sync operation on the same
>>>>> object. This should have been fixed by commit
>>>>> edea6d58dd25995bcc1ed4fc5be6f72ce4a6835a. Can you try to verify what
>>>>> went wrong there (whether can_do_op() returned true and why)?
>>>>
>>>> Looked into this a bit, can_do_op() has returned true for the case when
>>>> primary issues a Fetch (or GET) and when a delete is issued,(even though
>>>> the Fetch is still not complete yet) by putting a debug log around when
>>>> we clear the keys, both the delete op and the get op creates and deletes
>>>> the same key successfully.
>>>>
>>>> Which makes me suspect, that different instances of
>>>> RGWBucketIncSyncShardMarkerTrack are at play here, leading to different
>>>> independent values for key_to_marker. Is that possible?
>>>>
>>> Shouldn't happen, but maybe something went wrong. Try adding some more
>>> info to the log message to see if that's the case.
>>
>> I just added a debug log whenever a new instance of
>> RGWBucketIncSyncShardMarkerTrack was created, and when we check/delete
>> keys, in all cases, ie. when a GET was called and/or when a DELETE was
>> called, it was a newer instance of marker_tracker that was being invoked.
>> Also a few lines before always showed this:
>>
>> incremental_sync(): async update notification: mybucket:62bc922d-f295-4067-ae36-e23e2f231aad.24099.1:-1
>>
>> which seems to be called whenever we're creating a new SingleEntry CR?
>> (the value of modified_iter was the same in every case)
>>
>> Also looking at the cases where the deletion succeeded in the secondary
>> zone, it seemed here too can_do_op had succeeded every time, the
>> difference was in this case either the Object GET came from the remote
>> site after original site had already processed the DELETE or in other
>> cases, the GET in remote site was processed in time before the DELETE.
>>
>>
>
> Can you provide the log? I'm still not sure how you'd have different
> tracker markers for the same bucket instance, as we take a lease to
> prevent concurrent updates to the same bucket shard. This should
> happen in the async updates too.
Uploaded via ceph-post-file, id: c24eed72-47d4-452c-8d0e-86d96be8fff1
radosgw.8001.log is the master (and in this case the remote site)
>
> Yehuda
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread
Thread overview: 22+ messages
2016-04-19 16:10 RGW Multisite delete wierdness Abhishek Lekshmanan
2016-04-19 17:52 ` Yehuda Sadeh-Weinraub
2016-04-19 17:54 ` Abhishek L
2016-04-19 18:08 ` Yehuda Sadeh-Weinraub
2016-04-22 0:40 ` Yehuda Sadeh-Weinraub
2016-04-25 8:17 ` Abhishek Lekshmanan
2016-04-25 18:46 ` Yehuda Sadeh-Weinraub
2016-04-25 19:44 ` Abhishek L
2016-04-26 17:37 ` Abhishek Lekshmanan
2016-04-26 22:21 ` Yehuda Sadeh-Weinraub
2016-04-26 23:12 ` Yehuda Sadeh-Weinraub
2016-04-27 20:02 ` Abhishek L
2016-04-27 20:15 ` Yehuda Sadeh-Weinraub
2016-04-27 21:50 ` Yehuda Sadeh-Weinraub
2016-05-31 9:21 ` Abhishek Lekshmanan
2016-05-31 11:06 ` Yehuda Sadeh-Weinraub
2016-06-02 13:01 ` Abhishek Lekshmanan
2016-06-02 13:09 ` Yehuda Sadeh-Weinraub
2016-06-03 8:28 ` Abhishek Lekshmanan
2016-06-03 9:00 ` Yehuda Sadeh-Weinraub
2016-06-03 9:09 ` Yehuda Sadeh-Weinraub
2016-06-03 9:16 ` Abhishek Lekshmanan