From: Abhishek Lekshmanan <abhishek@suse.com>
To: Yehuda Sadeh-Weinraub <yehuda@redhat.com>
Cc: Ceph Devel <ceph-devel@vger.kernel.org>
Subject: Re: RGW Multisite delete wierdness
Date: Mon, 25 Apr 2016 10:17:36 +0200
Message-ID: <87a8ki6r1b.fsf@suse.com>
In-Reply-To: <CADRKj5SJVMzjbvMt7PkvdDxBq1+b1GoCNo7f23Jsf9434+0O-A@mail.gmail.com>
Yehuda Sadeh-Weinraub writes:
> On Tue, Apr 19, 2016 at 11:08 AM, Yehuda Sadeh-Weinraub
> <yehuda@redhat.com> wrote:
>> On Tue, Apr 19, 2016 at 10:54 AM, Abhishek L
>> <abhishek.lekshmanan@gmail.com> wrote:
>>>
>>> Yehuda Sadeh-Weinraub writes:
>>>
>>>> On Tue, Apr 19, 2016 at 9:10 AM, Abhishek Lekshmanan <abhishek@suse.com> wrote:
>>>>> Deleting objects and buckets from a secondary zone in an RGW
>>>>> multisite configuration leads to some weirdness:
>>>>>
>>>>> 1. Deleting an object and then immediately deleting its bucket mostly
>>>>> results in both being deleted in the secondary zone. However, since we
>>>>> forward the bucket deletion to the master only after deleting locally,
>>>>> the master rejects it with 409 (BucketNotEmpty), which is re-raised to
>>>>> the client as a 500. This _seems_ simple enough to fix by forwarding the
>>>>> bucket deletion request to the master zone before attempting the local
>>>>> deletion (issue: http://tracker.ceph.com/issues/15540, possible fix: https://github.com/ceph/ceph/pull/8655).
>>>>>
>>>>
>>>> Yeah, this looks good. We'll get it through testing.
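A minimal Python model of the two orderings in point 1 may make the failure concrete. Everything here (the `Zone` class, `BucketNotEmpty`, both `delete_bucket_*` functions and the status codes they return) is hypothetical and only mimics the behaviour described above; it is not RGW code, and it assumes the object delete has reached the secondary but not yet the master.

```python
# Hypothetical model of point 1; none of these names are RGW APIs.


class BucketNotEmpty(Exception):
    """Stands in for the master's 409 BucketNotEmpty reply."""


class Zone:
    def __init__(self, name, buckets):
        self.name = name
        # bucket name -> set of object names
        self.buckets = {b: set(objs) for b, objs in buckets.items()}

    def delete_bucket(self, bucket):
        if self.buckets[bucket]:
            raise BucketNotEmpty(f"{self.name}: {bucket} is not empty")
        del self.buckets[bucket]


def delete_bucket_local_first(secondary, master, bucket):
    """Current order: delete on the secondary, then forward to the master."""
    secondary.delete_bucket(bucket)
    try:
        master.delete_bucket(bucket)
    except BucketNotEmpty:
        # The local bucket is already gone; in this model the client gets a 500.
        return 500
    return 204


def delete_bucket_master_first(secondary, master, bucket):
    """Proposed order: forward to the master, delete locally only on success."""
    try:
        master.delete_bucket(bucket)
    except BucketNotEmpty:
        # Nothing was deleted anywhere, so the conflict can be reported as-is.
        return 409
    secondary.delete_bucket(bucket)
    return 204


# The object delete has reached the secondary but not yet the master.
secondary = Zone("secondary", {"test-mp": []})
master = Zone("master", {"test-mp": ["test.img"]})
print(delete_bucket_local_first(secondary, master, "test-mp"))  # 500
print("test-mp" in secondary.buckets, "test-mp" in master.buckets)  # False True
```

Running `delete_bucket_master_first` against fresh zones in the same state returns 409 and leaves the bucket in both zones, so in this model the zones no longer diverge.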
>>>>
>>>>> 2. Deletion of objects: this seems to be a bit racy. Deleting an
>>>>> object on a secondary zone succeeds, and listing the bucket seems to
>>>>> show an empty list, but sometimes the object reappears (this time with
>>>>> a newer timestamp). This does not reproduce every time, but I've seen
>>>>> it often with multipart uploads, e.g.:
>>>>>
>>>>> $ s3 -u list test-mp
>>>>> Key                                                Last Modified        Size
>>>>> -------------------------------------------------- -------------------- -----
>>>>> test.img                                           2016-04-19T13:00:17Z 40M
>>>>> $ s3 -u delete test-mp/test.img
>>>>> $ s3 -u list test-mp
>>>>> Key                                                Last Modified        Size
>>>>> -------------------------------------------------- -------------------- -----
>>>>> test.img                                           2016-04-19T13:00:45Z 40M
>>>>> $ s3 -u delete test-mp/test.img # wait for a min
>>>>> $ s3 -us list test-mp
>>>>> -------------------------------------------------- -------------------- -----
>>>>> test.img                                           2016-04-19T13:01:52Z 40M
>>>>>
>>>>>
>>>>> I mostly see log entries of this form in both cases, i.e. where the
>>>>> object delete seems to succeed in both the master and secondary zones,
>>>>> and where it succeeds in the master but fails in the secondary:
>>>>>
>>>>> 20 parsed entry: id=00000000027.27.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: non-complete operation
>>>>> 20 parsed entry: id=00000000028.28.2 iter->object=foo iter->instance= name=foo instance= ns=
>>>>> 20 [inc sync] skipping object: dkr:d8e0ec3d-b3da-43f8-a99b-38a5b4941b6f.14113.2:-1/foo: canceled operation
>>>>>
>>>>> Any ideas on this?
>>>>>
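The two log lines suggest that incremental sync filters bucket index log entries by their state before applying them. A rough Python model of that filter, assuming (not verified against the RGW source) that each entry carries a completion state and an op type; the `State`, `Op`, `BilogEntry` and `skip_reason` names are hypothetical, and only the reason strings are copied from the log:

```python
from dataclasses import dataclass
from enum import Enum


# Hypothetical stand-ins for bucket index log fields; RGW's real types differ.
class State(Enum):
    PENDING = "pending"
    COMPLETE = "complete"


class Op(Enum):
    ADD = "add"
    DEL = "del"
    CANCEL = "cancel"


@dataclass(frozen=True)
class BilogEntry:
    id: str
    obj: str
    op: Op
    state: State


def skip_reason(entry):
    """Return why incremental sync skips an entry, or None if it is applied."""
    if entry.state is not State.COMPLETE:
        return "non-complete operation"
    if entry.op is Op.CANCEL:
        return "canceled operation"
    return None


# A prepared-then-canceled pair for the same object (op values are assumed).
log = [
    BilogEntry("00000000027.27.2", "foo", Op.DEL, State.PENDING),
    BilogEntry("00000000028.28.2", "foo", Op.CANCEL, State.COMPLETE),
]
for e in log:
    print(e.id, skip_reason(e))
```

If the two entries above really are a pending operation followed by its cancellation, neither is applied, so those entries alone give the peer zone nothing to replay for `foo`.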
>>>>
>>>> Do you have more than 2 zones syncing? Is it an object delete that
>>>> came right after the object creation?
>>>
>>> Only 2 zones, i.e. one master and one secondary, with the requests issued
>>> on the secondary. The delete did come right after the create, though.
>>
>> There are two issues that I see here. One is that we sync an object
>> but end up with a different mtime than the object's source. The second
>> issue is that we shouldn't have synced that object.
>>
>> There needs to be a check when syncing objects, to validate that we
>> don't sync an object that originated from the current zone (by
>> comparing the short zone id). We might be missing that.
>>
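The origin check described above amounts to a filter on the syncing side. A minimal sketch, assuming each replicated entry records the short zone id of the zone where the write originated; the `origin_short_zone_id` field, the helper names and the id values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncEntry:
    obj: str
    # Hypothetical field: short id of the zone where this write originated.
    origin_short_zone_id: int


def should_sync(entry, local_short_zone_id):
    """Skip entries whose write originated in the local zone."""
    return entry.origin_short_zone_id != local_short_zone_id


def entries_to_apply(entries, local_short_zone_id):
    return [e for e in entries if should_sync(e, local_short_zone_id)]


MASTER, SECONDARY = 1, 2  # illustrative short zone ids
# Entries read from the master's log while syncing into the secondary.
incoming = [
    SyncEntry("test.img", origin_short_zone_id=SECONDARY),  # written on the secondary
    SyncEntry("other.img", origin_short_zone_id=MASTER),  # written on the master
]
print([e.obj for e in entries_to_apply(incoming, SECONDARY)])  # ['other.img']
```

With this check, `test.img` is never pulled back into the zone that created it, which is the loop that lets a deleted object reappear.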
>
> For the first issue, see:
> https://github.com/ceph/ceph/pull/8685
>
> However, a create followed by a delete will still be a problem: when we
> sync the object, we check that the source mtime is newer than the
> destination mtime. This is problematic with deletes, as these don't
> have an mtime once the object is removed. I think the solution would be
> to use temporary tombstone objects (we already have the olh framework
> that can provide what we need) that we'll garbage collect.
Further information from the logs, if it helps:

2016-04-19 17:00:45.539356 7fc99effd700 0 _send_request(): deleting obj=test-mp:test.img
2016-04-19 17:00:45.539902 7fc99effd700 20 _send_request(): skipping object removal obj=test-mp:test.img (obj mtime=2016-04-19 17:00:26.0.098255s, request timestamp=2016-04-19 17:00:17.0.395208s)

This is what the master zone logs show. However, the request timestamp
logged here is the `If-Modified-Since` value sent from the secondary zone,
i.e. the time when the actual object write completed (not when the
deletion completed). Do we set the deletion time anywhere else in the BI
log?
>
> Yehuda
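The skip on the master reduces to a timestamp comparison. A small Python sketch using the two timestamps from the master's log line, assuming the guard behaves as the log message suggests (`should_remove` is a hypothetical name, not the RGW function, and `17:00:26.0.098255s` is read as 17:00:26.098255):

```python
from datetime import datetime


def should_remove(obj_mtime, request_ts):
    """Hypothetical guard mirroring the log: remove only if the stored
    object is not newer than the timestamp carried by the delete."""
    return obj_mtime <= request_ts


# Values from the master's "skipping object removal" log line.
master_obj_mtime = datetime(2016, 4, 19, 17, 0, 26, 98255)
request_ts = datetime(2016, 4, 19, 17, 0, 17, 395208)

print(should_remove(master_obj_mtime, request_ts))  # False
print(master_obj_mtime - request_ts)  # 0:00:08.703047
```

The master's copy is about 8.7 s newer than the timestamp sent with the delete, so the removal is skipped. The gap appears consistent with the first issue Yehuda described (a synced object ending up with a different mtime than its source); in this model, a delete carrying a timestamp at or after the master's mtime would pass the check.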
--
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)