From: Osier Yang <osier@yunify.com>
To: Guang Yang <yguang11@outlook.com>, Yehuda Sadeh <yehuda@inktank.com>
Cc: Ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: bucket index sharding - IO throttle
Date: Wed, 06 Aug 2014 12:38:01 +0800
Message-ID: <53E1B129.4060006@yunify.com>
In-Reply-To: <BLU436-SMTP530E3373FB99617E291C2ADFE20@phx.gbl>


On 2014-08-04 15:20, Guang Yang wrote:
> Hi Yehuda,
> Here is the new pull request - https://github.com/ceph/ceph/pull/2187

I simply applied the patches on top of the current git tree, and testing
shows that "rest-bench" is completely broken with the 2 patches:


root@testing-s3gw0:~/s3-tests# /usr/bin/rest-bench
--api-host=testing-s3gw0 --access-key=93EEF3F5O7VY89Q2GSWC
--secret="lf2bwxiRf1e9/nrOTCZyN/HgTqCz7XwrB2LDocY1" --protocol=http
--uri_style=path --bucket=cool0 --seconds=20 --concurrent-ios=50
--block-size=204800 --show-time write
host=testing-s3gw0
2014-08-06 12:28:56.500235 7f1336645780 -1 did not load config file,
using default settings.
ERROR: failed to create bucket: ConnectionFailed
failed initializing benchmark

The related debug log entry:

2014-08-06 12:29:48.137559 7fea62fcd700 20 state for
obj=.rgw:.bucket.meta.rest-bench-bucket:default.9738.2 is not atomic,
not appending atomic test

After a short time, all the memory was eaten up:

root@testing-s3gw0:~/s3-tests# /usr/bin/rest-bench
--api-host=testing-s3gw0 --access-key=93EEF3F5O7VY89Q2GSWC
--secret="lf2bwxiRf1e9/nrOTCZyN/HgTqCz7XwrB2LDocY1" --protocol=http
--uri_style=path --seconds=20 --concurrent-ios=50 --block-size=204800
--show-time write
-bash: fork: Cannot allocate memory
root@testing-s3gw0:~/s3-tests# /usr/bin/rest-bench
--api-host=testing-s3gw0 --access-key=93EEF3F5O7VY89Q2GSWC
--secret="lf2bwxiRf1e9/nrOTCZyN/HgTqCz7XwrB2LDocY1" --protocol=http
--uri_style=path --seconds=20 --concurrent-ios=50 --block-size=204800
--show-time write
-bash: fork: Cannot allocate memory
root@testing-s3gw0:~/s3-tests# free
-bash: fork: Cannot allocate memory

A few minutes later, the VM became completely unresponsive, and I had
to destroy it and start over.

Guang, how did you test the patches when you created them?

>  
>
> Thanks,
> Guang
> On Jul 31, 2014, at 10:40 PM, Guang Yang <yguang11@outlook.com> wrote:
>
>> Thanks Yehuda. I will do that (sorry, I have been occupied with other work recently, but I will try my best to provide a patch as soon as possible).
>>
>> Thanks,
>> Guang
>>
>> On Jul 31, 2014, at 1:00 AM, Yehuda Sadeh <yehuda@inktank.com> wrote:
>>
>>> Can you send this code through a github pull request (or at least as a
>>> patch)? It'll be easier to review and comment on.
>>>
>>> Thanks,
>>> Yehuda
>>>
>>> On Wed, Jul 30, 2014 at 7:58 AM, Guang Yang <yguang11@outlook.com> wrote:
>>>> +ceph-devel.
>>>>
>>>> Thanks,
>>>> Guang
>>>>
>>>> On Jul 29, 2014, at 10:20 PM, Guang Yang <yguang11@outlook.com> wrote:
>>>>
>>>>> Hi Yehuda,
>>>>> Per your review comment about IO throttling for bucket index operations, I prototyped the code below (details still need polishing). Can you take a look and tell me whether this is the right way to go?
>>>>>
>>>>> Another problem I came across is that ClsBucketIndexOpCtx::handle_completion was not called for the bucket index init op (below). Is there anything obvious that I missed here?
>>>>>
>>>>> Thanks,
>>>>> Guang
>>>>>
>>>>>
>>>>> class ClsBucketIndexAioThrottler {
>>>>> protected:
>>>>> int completed;
>>>>> int ret_code;
>>>>> IoCtx& io_ctx;
>>>>> Mutex lock;
>>>>> struct LockCond {
>>>>>  Mutex lock;
>>>>>  Cond cond;
>>>>>  LockCond() : lock("LockCond"), cond() {}
>>>>> } lock_cond;
>>>>> public:
>>>>> ClsBucketIndexAioThrottler(IoCtx& _io_ctx)
>>>>>  : completed(0), ret_code(0), io_ctx(_io_ctx),
>>>>>  lock("ClsBucketIndexAioThrottler"), lock_cond() {}
>>>>>
>>>>> virtual ~ClsBucketIndexAioThrottler() {}
>>>>> virtual void do_next() = 0;
>>>>> virtual bool is_completed () = 0;
>>>>>
>>>>> void complete(int ret) {
>>>>>  {
>>>>>    Mutex::Locker l(lock);
>>>>>    if (ret < 0)
>>>>>      ret_code = ret;
>>>>>    ++completed;
>>>>>  }
>>>>>
>>>>>  lock_cond.lock.Lock();
>>>>>  lock_cond.cond.Signal();
>>>>>  lock_cond.lock.Unlock();
>>>>> }
>>>>>
>>>>> int get_ret_code () {
>>>>>  Mutex::Locker l(lock);
>>>>>  return ret_code;
>>>>> }
>>>>>
>>>>> virtual int wait_completion() {
>>>>>  lock_cond.lock.Lock();
>>>>>  // Cond::Wait() returns with the mutex held again, so it must not
>>>>>  // be locked a second time after waking up.
>>>>>  while (!is_completed()) {
>>>>>    lock_cond.cond.Wait(lock_cond.lock);
>>>>>  }
>>>>>  lock_cond.lock.Unlock();
>>>>>  // Read the result under its own lock
>>>>>  return get_ret_code();
>>>>> }
>>>>> };
>>>>>
>>>>> class ClsBucketIndexListAioThrottler : public ClsBucketIndexAioThrottler {
>>>>> protected:
>>>>> vector<string> bucket_objects;
>>>>> vector<string>::iterator iter_pos;
>>>>> public:
>>>>> ClsBucketIndexListAioThrottler(IoCtx& _io_ctx, const vector<string> _bucket_objs)
>>>>>  : ClsBucketIndexAioThrottler(_io_ctx), bucket_objects(_bucket_objs),
>>>>>  iter_pos(bucket_objects.begin()) {}
>>>>>
>>>>> virtual bool is_completed() {
>>>>>  Mutex::Locker l(lock);
>>>>>  int sent = 0;
>>>>>  vector<string>::iterator iter = bucket_objects.begin();
>>>>>  for (; iter != iter_pos; ++iter) ++sent;
>>>>>
>>>>>  return (sent == completed &&
>>>>>      (iter_pos == bucket_objects.end() /*Success*/ || ret_code < 0 /*Failure*/));
>>>>> }
>>>>> };
>>>>>
>>>>> template<typename T>
>>>>> class ClsBucketIndexOpCtx : public ObjectOperationCompletion {
>>>>> private:
>>>>> T* data;
>>>>> // Return code of the operation
>>>>> int* ret_code;
>>>>>
>>>>> // The Aio completion object associated with this Op, it should
>>>>> // be release from within the completion handler
>>>>> librados::AioCompletion* completion;
>>>>> ClsBucketIndexAioThrottler* throttler;
>>>>> public:
>>>>> ClsBucketIndexOpCtx(T* _data, int* _ret_code, librados::AioCompletion* _completion,
>>>>>        ClsBucketIndexAioThrottler* _throttler)
>>>>>  : data(_data), ret_code(_ret_code), completion(_completion), throttler(_throttler) {}
>>>>> ~ClsBucketIndexOpCtx() {}
>>>>>
>>>>> // The completion callback, fill the response data
>>>>> void handle_completion(int r, bufferlist& outbl) {
>>>>>  if (r >= 0 && data) {
>>>>>    try {
>>>>>      bufferlist::iterator iter = outbl.begin();
>>>>>      ::decode((*data), iter);
>>>>>    } catch (buffer::error& err) {
>>>>>      r = -EIO;
>>>>>    }
>>>>>  }
>>>>>  // Issue the next request only if this one succeeded
>>>>>  if (r >= 0)
>>>>>    throttler->do_next();
>>>>>  // Record the result and wake up the waiter
>>>>>  throttler->complete(r);
>>>>>  if (completion) {
>>>>>    completion->release();
>>>>>  }
>>>>> }
>>>>> };
>>>>>
>>>>>
>>>>> class ClsBucketIndexInitAioThrottler : public ClsBucketIndexListAioThrottler {
>>>>> public:
>>>>> ClsBucketIndexInitAioThrottler(IoCtx& _io_ctx, const vector<string> _bucket_objs) :
>>>>>  ClsBucketIndexListAioThrottler(_io_ctx, _bucket_objs) {}
>>>>>
>>>>> virtual void do_next() {
>>>>>  string oid;
>>>>>  {
>>>>>    Mutex::Locker l(lock);
>>>>>    if (iter_pos == bucket_objects.end())
>>>>>      return;
>>>>>    oid = *(iter_pos++);
>>>>>  }
>>>>>  AioCompletion* c = librados::Rados::aio_create_completion(NULL, NULL, NULL);
>>>>>  // Dummy
>>>>>  bufferlist in;
>>>>>  librados::ObjectWriteOperation op;
>>>>>  op.create(true);
>>>>>  op.exec("rgw", "bucket_init_index", in, new ClsBucketIndexOpCtx<int>(NULL, NULL, c, this));
>>>>>  io_ctx.aio_operate(oid, c, &op, NULL);
>>>>> }
>>>>> };
>>>>>
>>>>>
>>>>> int cls_rgw_bucket_index_init_op(librados::IoCtx &io_ctx,
>>>>>      const vector<string>& bucket_objs, uint32_t max_aio)
>>>>> {
>>>>> // Stack allocation, so the throttler is not leaked on return
>>>>> ClsBucketIndexInitAioThrottler throttler(io_ctx, bucket_objs);
>>>>> // Start at most max_aio requests; each completion issues the next one
>>>>> vector<string>::const_iterator iter = bucket_objs.begin();
>>>>> for (; iter != bucket_objs.end() && max_aio-- > 0; ++iter) {
>>>>>     throttler.do_next();
>>>>> }
>>>>> // Propagate the recorded error instead of always returning 0
>>>>> return throttler.wait_completion();
>>>>> }
>>>>>
>>>>>
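
To make sure we are talking about the same behaviour: the idea in the
prototype above, as I read it, is to keep at most max_aio requests in
flight and to issue the next one as each completes. Below is a minimal
standalone sketch of that pattern, not the code from the pull request.
It assumes plain std::thread workers in place of librados AIO callbacks,
and the names run_throttled and IndexOp are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one asynchronous bucket index operation:
// returns 0 on success or a negative error code.
using IndexOp = std::function<int(const std::string& oid)>;

// Runs `op` once per object ID with at most `max_aio` operations in
// flight. Stops issuing new operations after the first failure, waits
// for the outstanding ones, and returns 0 or the last error recorded.
int run_throttled(const std::vector<std::string>& oids,
                  std::size_t max_aio, const IndexOp& op) {
  if (max_aio == 0) max_aio = 1;  // avoid waiting forever for a slot

  std::mutex m;
  std::condition_variable cv;
  std::size_t next = 0;       // index of the next object to issue
  std::size_t in_flight = 0;  // operations started but not yet completed
  int ret_code = 0;
  std::vector<std::thread> workers;

  std::unique_lock<std::mutex> l(m);
  while (next < oids.size() && ret_code == 0) {
    // Block until a slot is free; wait() re-acquires `m` before returning.
    cv.wait(l, [&] { return in_flight < max_aio; });
    if (ret_code != 0) break;  // a failure arrived while we were waiting

    const std::size_t i = next++;
    ++in_flight;
    workers.emplace_back([&, i] {
      const int r = op(oids[i]);
      std::lock_guard<std::mutex> g(m);
      if (r < 0) ret_code = r;
      --in_flight;
      cv.notify_all();
    });
  }

  // Drain whatever is still outstanding before reading the result.
  cv.wait(l, [&] { return in_flight == 0; });
  const int result = ret_code;
  l.unlock();
  for (std::thread& t : workers) t.join();
  return result;
}
```

With 20 object IDs and max_aio set to 4, a call such as
run_throttled(oids, 4, op) should never have more than 4 calls to op
running at once. One thread per operation is only for illustration; the
real code would rely on the AIO completion callbacks instead.
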
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Thread overview: 7+ messages
     [not found] <1963E3AE-B242-4896-904C-B0868F5AC569@outlook.com>
2014-07-30 14:58 ` bucket index sharding - IO throttle Guang Yang
2014-07-30 17:00   ` Yehuda Sadeh
2014-07-31 14:40     ` Guang Yang
2014-08-04  7:20       ` Guang Yang
2014-08-06  4:38         ` Osier Yang [this message]
2014-08-06  7:51           ` Guang Yang
     [not found]       ` <BAA0B931-AFB1-41E6-AA11-A901D4CF8A27@outlook.com>
2014-08-12  7:50         ` Guang Yang
