re-running teuthology jobs


* re-running teuthology jobs
@ 2015-02-28 10:28 Loic Dachary
  2015-02-28 15:01 ` Loic Dachary
  0 siblings, 1 reply; 5+ messages in thread
From: Loic Dachary @ 2015-02-28 10:28 UTC (permalink / raw)
  To: Ceph Development


Hi,

A teuthology rados run ( https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados ) completed with five dead jobs out of 693. They failed because of DNS errors and I'd like to re-run them. Ideally I could do something like:

teuthology-schedule --run loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi --job-id 781444 --job-id  781457 ...

and it would re-schedule a run of the designated jobs from the designated run. But I don't think such a command exists.

I will therefore manually do what such a command would do, for each failed job:

* download http://qa-proxy.ceph.com/teuthology/loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi/781444/orig.config.yaml
* git clone https://github.com/ceph/ceph-qa-suite /srv/ceph-qa-suite
* cd /srv/ceph-qa-suite ; git checkout firefly (assuming that's the ceph-qa-suite branch I'm interested in)
* remove the following fields from orig.config.yaml:
   job_id: '781444'
   last_in_suite: false
   worker_log: /var/lib/teuthworker/archive/worker_logs/worker.multi.14588
* replace the suite_path: field with suite_path: /srv/ceph-qa-suite
* lock enough machines with teuthology-lock --lock (i.e. one for each element in the roles: section of orig.config.yaml)
* turn the machine list into a file that teuthology can consume: teuthology-lock --list-targets > targets.yaml
* run teuthology orig.config.yaml targets.yaml
* wait for the result
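
The config editing steps above could be scripted roughly as follows. This is only a sketch: prepare_config and DROP_KEYS are made-up names, and it assumes each of the fields is a top-level, single-line "key: value" entry in orig.config.yaml, so it filters lines instead of parsing YAML.

```python
# Sketch only: drop the per-job fields listed above and repoint suite_path.
# Assumption: every field is a top-level, single-line "key: value" entry.
DROP_KEYS = ("job_id", "last_in_suite", "worker_log")


def prepare_config(text, suite_path="/srv/ceph-qa-suite"):
    """Return the config text with DROP_KEYS removed and suite_path replaced."""
    out = []
    for line in text.splitlines():
        # Indented (nested) lines keep their leading spaces in `key`,
        # so only top-level keys can match below.
        key = line.split(":", 1)[0]
        if key in DROP_KEYS:
            continue  # bookkeeping from the original run, not to be replayed
        if key == "suite_path":
            line = "suite_path: " + suite_path
        out.append(line)
    return "\n".join(out) + "\n"
```

Reading the downloaded orig.config.yaml, passing its content through prepare_config() and writing the result back gives the file to hand to teuthology together with targets.yaml.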

Is there a better way to do that?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: re-running teuthology jobs
  2015-02-28 10:28 re-running teuthology jobs Loic Dachary
@ 2015-02-28 15:01 ` Loic Dachary
  2015-02-28 15:47   ` Yuri Weinstein
  0 siblings, 1 reply; 5+ messages in thread
From: Loic Dachary @ 2015-02-28 15:01 UTC (permalink / raw)
  To: Ceph Development


A simpler way is to use the --filter argument of teuthology-suite with the value of the description: field found in the config.yaml file of each failed job. For instance, to re-run the failed rados jobs ( http://tracker.ceph.com/issues/10641#rados ):

$ ./virtualenv/bin/teuthology-suite --priority 101 --suite rados --filter 'rados/multimon/{clusters/21.yaml msgr-failures/many.yaml tasks/mon_clock_with_skews.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/morepggrow.yaml workloads/small-objects.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-small-objects.yaml},rados/verify/{1thrash/none.yaml clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml tasks/mon_recovery.yaml validater/valgrind.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-small.yaml}' --suite-branch firefly --machine-type plana,burnupi,mira --distro ubuntu --email loic@dachary.org --owner loic@dachary.org  --ceph firefly-backports
2015-02-28 15:58:08,474.474 INFO:teuthology.suite:ceph sha1: e54834bfac3c38562987730b317cb1944a96005b
2015-02-28 15:58:08,969.969 INFO:teuthology.suite:ceph version: 0.80.8-75-ge54834b-1precise
2015-02-28 15:58:09,606.606 INFO:teuthology.suite:teuthology branch: master
2015-02-28 15:58:10,407.407 INFO:teuthology.suite:ceph-qa-suite branch: firefly
2015-02-28 15:58:10,409.409 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_firefly
2015-02-28 15:58:11,522.522 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_firefly to branch firefly
2015-02-28 15:58:12,393.393 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados generated 693 jobs (not yet filtered)
2015-02-28 15:58:12,419.419 INFO:teuthology.suite:Scheduling rados/multimon/{clusters/21.yaml msgr-failures/many.yaml tasks/mon_clock_with_skews.yaml}
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783145
2015-02-28 15:58:14,199.199 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-small.yaml}
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783146
2015-02-28 15:58:15,650.650 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/morepggrow.yaml workloads/small-objects.yaml}
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783147
2015-02-28 15:58:16,837.837 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-small-objects.yaml}
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783148
2015-02-28 15:58:18,421.421 INFO:teuthology.suite:Scheduling rados/verify/{1thrash/none.yaml clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml tasks/mon_recovery.yaml validater/valgrind.yaml}
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783149
2015-02-28 15:58:19,729.729 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados scheduled 5 jobs.
2015-02-28 15:58:19,729.729 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados -- 688 jobs were filtered out.
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783150

This creates the run http://pulpito.ceph.com/loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi/ with just 5 jobs.
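
Building the long --filter value by hand is error prone, so here is a minimal sketch of how it could be assembled from a list of description: values. build_filter is a hypothetical helper, not part of teuthology; it only relies on --filter taking a comma-separated list, as in the command above, and the two descriptions are copied from that command.

```python
import shlex


def build_filter(descriptions):
    """Join job descriptions into one comma-separated --filter value."""
    for d in descriptions:
        # A comma inside a description would be read as a separator.
        if "," in d:
            raise ValueError("description contains a comma: %r" % d)
    return ",".join(descriptions)


descriptions = [
    "rados/multimon/{clusters/21.yaml msgr-failures/many.yaml tasks/mon_clock_with_skews.yaml}",
    "rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-small-objects.yaml}",
]
# shlex.quote keeps the braces and spaces intact when pasted into a shell.
print("--filter " + shlex.quote(build_filter(descriptions)))
```

The printed argument can be pasted into a teuthology-suite command line in place of the hand-written --filter '...' part.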


-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: re-running teuthology jobs
  2015-02-28 15:01 ` Loic Dachary
@ 2015-02-28 15:47   ` Yuri Weinstein
  2015-02-28 16:17     ` Loic Dachary
  0 siblings, 1 reply; 5+ messages in thread
From: Yuri Weinstein @ 2015-02-28 15:47 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

Loic

In case you want to add some comments - http://tracker.ceph.com/issues/10945

Thx
YuriW


* Re: re-running teuthology jobs
  2015-02-28 15:47   ` Yuri Weinstein
@ 2015-02-28 16:17     ` Loic Dachary
  2015-02-28 17:21       ` Sage Weil
  0 siblings, 1 reply; 5+ messages in thread
From: Loic Dachary @ 2015-02-28 16:17 UTC (permalink / raw)
  To: Yuri Weinstein; +Cc: Ceph Development




On 28/02/2015 16:47, Yuri Weinstein wrote:
> Loic
> 
> In case you want to add some comments - http://tracker.ceph.com/issues/10945

Done, thanks!


-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: re-running teuthology jobs
  2015-02-28 16:17     ` Loic Dachary
@ 2015-02-28 17:21       ` Sage Weil
  0 siblings, 0 replies; 5+ messages in thread
From: Sage Weil @ 2015-02-28 17:21 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Yuri Weinstein, Ceph Development


On Sat, 28 Feb 2015, Loic Dachary wrote:
> 
> 
> On 28/02/2015 16:47, Yuri Weinstein wrote:
> > Loic
> > 
> > In case you want to add some comments - http://tracker.ceph.com/issues/10945
> 
> Done thanks !

It would also be nice to just point it at the archive directory of the run 
that failed and have it figure the rest out from the orig.config.yaml (or 
whatever else) in that directory.  At least, that's how I would probably 
use it!

sage
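
As a rough illustration of the idea of pointing a tool at the archive directory, the description of each job could be collected like this. It is a sketch under assumptions, not an existing teuthology command: the function names are invented, the layout <run directory>/<job id>/orig.config.yaml is inferred from the qa-proxy URL earlier in the thread, and description: is assumed to fit on a single top-level line.

```python
import os


def read_description(config_path):
    """Return the top-level description: value of a config file, or None."""
    with open(config_path) as f:
        for line in f:
            if line.startswith("description:"):
                # Assumption: the value is on the same line, possibly quoted.
                return line.split(":", 1)[1].strip().strip("'\"")
    return None


def descriptions_for_jobs(run_dir, job_ids):
    """Map each job id to the description in <run_dir>/<job_id>/orig.config.yaml."""
    found = {}
    for job_id in job_ids:
        path = os.path.join(run_dir, str(job_id), "orig.config.yaml")
        if os.path.exists(path):
            found[str(job_id)] = read_description(path)
    return found
```

The values of the returned dictionary are the strings that would go, comma separated, into the --filter argument of teuthology-suite.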


