From: Loic Dachary <loic@dachary.org>
To: "Miyamae, Takeshi" <miyamae.takeshi@jp.fujitsu.com>,
Ceph Development <ceph-devel@vger.kernel.org>
Cc: "Kawaguchi, Shotaro" <kawaguchi.s@jp.fujitsu.com>,
"Imai, Hiroki" <imai.hiroki@jp.fujitsu.com>,
"Nakao, Takanori" <nakao.takanori@jp.fujitsu.com>,
"Shiozawa, Kensuke" <shiozawa.kennsu@jp.fujitsu.com>
Subject: Re: teuthology timeout error
Date: Thu, 21 May 2015 11:37:36 +0200
Message-ID: <555DA760.50604@dachary.org>
In-Reply-To: <870DE8DBB716524BAE51B2D499EC81E40AAFDE74@g01jpexmbyt24>
Hi,
[sorry the previous mail was sent by accident, here is the full mail]
On 21/05/2015 10:32, Miyamae, Takeshi wrote:
> Hi Loic,
>
>> Could you please share the teuthology/ceph-qa-suite repository you are using to run these tests
>> so I can try to reproduce / diagnose the problem ?
>
> https://github.com/kawaguchi-s/teuthology/tree/wip-10886
> https://github.com/kawaguchi-s/ceph-qa-suite/tree/wip-10886
>
Compared against master, both branches have diverged enough that a rebase would be a good idea:
https://github.com/ceph/teuthology/compare/master...kawaguchi-s:wip-10886
https://github.com/ceph/ceph-qa-suite/compare/master...kawaguchi-s:wip-10886
I think the teuthology commit on top of wip-10886 is a mistake:
https://github.com/kawaguchi-s/teuthology/commit/348e54931f89c9b0ae7a84eb931576f8414017b5
Do you really need to modify teuthology? Using the latest master branch should be enough.
It looks like the commit
https://github.com/kawaguchi-s/ceph-qa-suite/commit/f2e3ca5d12ceef742eae2a9cf4057c436e9040c3
in your ceph-qa-suite is not what you intended. However,
https://github.com/kawaguchi-s/ceph-qa-suite/commit/4b39d6d4862f9091a849d224e880795be406815d
https://github.com/kawaguchi-s/ceph-qa-suite/commit/d16b4b058ae118931928541a2c8acd68f9703a44
look ok :-) Instead of naming the test 4nodes16osds3mons1client.yaml, it would be better to use the same kind of naming you see at
https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados/thrash-erasure-code/workloads
that is, a file name made of the distinctive parameters for the shec plugin (parameters left at their default value can be omitted).
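To illustrate the naming idea, here is a minimal sketch that builds a workload file name from the plugin name plus only those parameters that differ from the defaults. Everything in it is an assumption made for illustration: the helper `workload_filename` is hypothetical (it is not part of teuthology or ceph-qa-suite), the `ec-rados-plugin=...` prefix is only modeled on the kind of names found in that workloads directory, and the default values shown are placeholders rather than the real shec defaults.

```python
# Hypothetical helper, NOT part of teuthology or ceph-qa-suite.
# It only illustrates "name the file after the distinctive parameters".

def workload_filename(plugin, params, defaults):
    """Return a file name built from the plugin and its non-default parameters.

    plugin   -- erasure code plugin name, e.g. "shec"
    params   -- dict of parameters used by the test (insertion order is kept)
    defaults -- dict of parameter defaults; matching values are left out
    """
    parts = ["plugin=" + plugin]
    for key, value in params.items():
        if key in defaults and defaults[key] == value:
            continue  # default value: not distinctive, so omit it
        parts.append("{}={}".format(key, value))
    # Assumed prefix, modeled on existing workload names; adjust as needed.
    return "ec-rados-" + "-".join(parts) + ".yaml"


if __name__ == "__main__":
    # Placeholder defaults for illustration only; check the plugin
    # documentation for the actual values.
    assumed_defaults = {"k": 4, "m": 3, "c": 2}
    print(workload_filename("shec", {"k": 8, "m": 3, "c": 2}, assumed_defaults))
```

A name produced this way tells the reader which erasure code profile the test exercises, which a cluster-layout name such as 4nodes16osds3mons1client.yaml does not.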
Cheers
> Here are our teuthology/ceph-qa-suite repositories. Thanks in advance.
>
> Best regards,
> Takeshi Miyamae
>
> -----Original Message-----
> From: Loic Dachary [mailto:loic@dachary.org]
> Sent: Wednesday, May 20, 2015 4:49 PM
> To: Miyamae, Takeshi/宮前 剛; Ceph Development
> Cc: Kawaguchi, Shotaro/川口 翔太朗; Imai, Hiroki/今井 宏樹; Nakao, Takanori/中尾 鷹詔; Shiozawa, Kensuke/塩沢 賢輔
> Subject: Re: teuthology timeout error
>
> Hi,
>
> On 20/05/2015 04:20, Miyamae, Takeshi wrote:
>> Hi Loic,
>>
>> When we fixed our own issue and restarted teuthology,
>
> Great !
>
>> we encountered another issue (timeout error) which occurs in case of LRC as well.
>> Do you have any information about that ?
>
> Could you please share the teuthology/ceph-qa-suite repository you are using to run these tests so I can try to reproduce / diagnose the problem ?
>
> Thanks
>
>>
>> [error messages (in case of LRC pool)]
>>
>> 2015-04-28 12:38:54,128.128 INFO:teuthology.orchestra.run.RX35-1:Running: 'adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage ceph status --format=json-pretty'
>> 2015-04-28 12:38:54,516.516 INFO:tasks.ceph.ceph_manager:no progress seen, keeping timeout for now
>> 2015-04-28 12:38:54,516.516 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
> >>     return func(self)
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
> >>     timeout=self.config.get('timeout')
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
> >>     'failed to recover before timeout expired'
>> AssertionError: failed to recover before timeout expired
>>
>> Traceback (most recent call last):
> >>   File "/root/work/teuthology/virtualenv/lib/python2.7/site-packages/gevent/greenlet.py", line 390, in run
> >>     result = self._run(*self.args, **self.kwargs)
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 632, in wrapper
> >>     return func(self)
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 665, in do_thrash
> >>     timeout=self.config.get('timeout')
> >>   File "/root/src/ceph-qa-suite_master/tasks/ceph_manager.py", line 1566, in wait_for_recovery
> >>     'failed to recover before timeout expired'
> >> AssertionError: failed to recover before timeout expired
> >> <Greenlet at 0x2a7d550: <bound method Thrasher.do_thrash of <tasks.ceph_manager.Thrasher instance at 0x2bd12d8>>> failed with AssertionError
>>
>> [ceph version]
>> 0.93-952-gfe28daa
>>
>> [teuthology, ceph-qa-suite]
>> newest version at 3/25/2015
>>
>> [configurations]
> >> check-locks: false
> >> overrides:
> >>   ceph:
> >>     conf:
> >>       global:
> >>         ms inject socket failures: 5000
> >>       osd:
> >>         osd heartbeat use min delay socket: true
> >>         osd sloppy crc: true
> >>     fs: xfs
> >> roles:
> >> - - mon.a
> >>   - osd.0
> >>   - osd.4
> >>   - osd.8
> >>   - osd.12
> >> - - mon.b
> >>   - osd.1
> >>   - osd.5
> >>   - osd.9
> >>   - osd.13
> >> - - mon.c
> >>   - osd.2
> >>   - osd.6
> >>   - osd.10
> >>   - osd.14
> >> - - osd.3
> >>   - osd.7
> >>   - osd.11
> >>   - osd.15
> >>   - client.0
> >> targets:
> >>   ubuntu@RX35-1.primary.ceph-poc.fsc.net:
> >>   ubuntu@RX35-2.primary.ceph-poc.fsc.net:
> >>   ubuntu@RX35-3.primary.ceph-poc.fsc.net:
> >>   ubuntu@RX35-4.primary.ceph-poc.fsc.net:
> >> tasks:
> >> - ceph:
> >>     conf:
> >>       osd:
> >>         osd debug reject backfill probability: 0.3
> >>         osd max backfills: 1
> >>         osd scrub max interval: 120
> >>         osd scrub min interval: 60
> >>     log-whitelist:
> >>     - wrongly marked me down
> >>     - objects unfound and apparently lost
> >> - thrashosds:
> >>     chance_pgnum_grow: 1
> >>     chance_pgpnum_fix: 1
> >>     min_in: 4
> >>     timeout: 1200
> >> - rados:
> >>     clients:
> >>     - client.0
> >>     ec_pool: true
> >>     erasure_code_profile:
> >>       k: 4
> >>       l: 3
> >>       m: 2
> >>       name: lrcprofile
> >>       plugin: lrc
> >>       ruleset-failure-domain: osd
> >>     objects: 50
> >>     op_weights:
> >>       append: 100
> >>       copy_from: 50
> >>       delete: 50
> >>       read: 100
> >>       rmattr: 25
> >>       rollback: 50
> >>       setattr: 25
> >>       snap_create: 50
> >>       snap_remove: 50
> >>       write: 0
> >>     ops: 190000
>>
>> Best regards,
>> Takeshi Miyamae
>>
>
--
Loïc Dachary, Artisan Logiciel Libre