Analyzing the nightlies

* Analyzing the nightlies
@ 2015-05-09  9:31 Loic Dachary
  2015-05-10  0:27 ` Yuri Weinstein
  2015-05-11 22:50 ` Gregory Farnum
  0 siblings, 2 replies; 6+ messages in thread
From: Loic Dachary @ 2015-05-09  9:31 UTC (permalink / raw)
  To: Yuri Weinstein; +Cc: Ceph Development

Hi Yuri,

It would be useful to add more information about how the nightlies are analyzed at

   http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_monitor_the_automated_tests_AKA_nightlies

At this point my understanding is that you look over all of them and you carry the burden of 

* sorting out the environmental noise
* creating new bugs for errors for which there is no match in the tracker
* adding a link to the failed job in pre-existing issues found in the tracker (useful to figure out the frequency, and helpful for debugging when there are multiple outputs / logs)
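
The three triage steps above could be sketched roughly as follows. This is only an illustration, not how the nightlies are actually processed: the noise patterns, the `known_issues` mapping (tracker issue number to an identifying substring) and the `Triage` type are all hypothetical names made up for this sketch.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical examples of environmental noise; real signatures would
# have to come from experience with the lab.
NOISE_PATTERNS = [
    re.compile(r"could not connect", re.IGNORECASE),
    re.compile(r"no space left on device", re.IGNORECASE),
]


@dataclass
class Triage:
    action: str                  # "noise", "link" or "new"
    job_url: str                 # the failed job this decision is about
    issue: Optional[int] = None  # tracker issue to link to, if any


def triage(failure_reason: str, job_url: str,
           known_issues: Dict[int, str]) -> Triage:
    """Classify one failed job according to the three steps above."""
    # 1. sort out the environmental noise
    if any(p.search(failure_reason) for p in NOISE_PATTERNS):
        return Triage("noise", job_url)
    # 2. link the failed job to a pre-existing issue when one matches
    for issue, signature in known_issues.items():
        if signature in failure_reason:
            return Triage("link", job_url, issue)
    # 3. otherwise a new bug has to be created
    return Triage("new", job_url)
```

Matching on plain substrings is a deliberate simplification; in practice a human still has to confirm that a failure really is the same bug.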

You do so by using tools such as https://github.com/jcsp/scrape/blob/master/scrape.py, and maybe others. You also format your mail messages so that they can be parsed by a program; such a program does not exist yet, but it could go over all your messages and build a database from the mails you sent.
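
A minimal sketch of what such a program could look like, using only the Python standard library. It rests on assumptions: the actual mail format is not described here, so the sketch simply assumes that the body mentions issues as http://tracker.ceph.com/issues/<number> URLs, and the `mentions` table and function names are invented for illustration.

```python
import email
import re
import sqlite3

# Assumed convention: issues are referenced by their tracker URL.
ISSUE_URL = re.compile(r"http://tracker\.ceph\.com/issues/(\d+)")


def create_db():
    """Create an in-memory database with one row per (mail, issue)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mentions (sent TEXT, subject TEXT, issue INTEGER)")
    return db


def index_mail(db, raw_message):
    """Record every tracker issue mentioned in one raw mail message."""
    msg = email.message_from_string(raw_message)
    # Only plain-text parts are read; transfer encodings such as
    # quoted-printable are not decoded in this sketch.
    body = "\n".join(
        part.get_payload()
        for part in msg.walk()
        if part.get_content_type() == "text/plain"
    )
    issues = sorted({int(number) for number in ISSUE_URL.findall(body)})
    db.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?)",
        [(msg["Date"], msg["Subject"], issue) for issue in issues],
    )
    return issues
```

With such a table, counting how often a given issue shows up is a single query, for instance `SELECT issue, COUNT(*) FROM mentions GROUP BY issue`.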

In the http://lists.ceph.com/private.cgi/ceph-qa-ceph.com/ archives, I see that Greg also regularly goes over the errors, as do other developers. What I'm not sure about is if it's best effort? Is there a time, like bug scrubbing or sprint planning, when developers say "Let's analyze QA results and dig bugs"?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

* Re: Analyzing the nightlies
  2015-05-09  9:31 Analyzing the nightlies Loic Dachary
@ 2015-05-10  0:27 ` Yuri Weinstein
  2015-05-10  8:25   ` Loic Dachary
  2015-05-11 22:50 ` Gregory Farnum
  1 sibling, 1 reply; 6+ messages in thread
From: Yuri Weinstein @ 2015-05-10  0:27 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development, Sage Weil

Loic 

Your description at a high level is correct.  There are, of course, more details when someone actually goes through the nightlies results.

As far as your question "Is there a time like bug scrubbing or sprint planning when developers say "Let's analyze QA results and dig bugs" ?" - I surely hope so and do see updates and triages on bugs in the tracker, but I'm not 100% sure what exactly our process is, so Sage and the development leads are better people to ask about this.

Also when you say "What I'm not sure about is if it's best effort ? "  - do you have something in mind instead of or in addition to what we do now?
(I hope something that can lighten the burden :) )

Thx
YuriW

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: Analyzing the nightlies
  2015-05-10  0:27 ` Yuri Weinstein
@ 2015-05-10  8:25   ` Loic Dachary
  2015-05-10 18:52     ` Yuri Weinstein
  0 siblings, 1 reply; 6+ messages in thread
From: Loic Dachary @ 2015-05-10  8:25 UTC (permalink / raw)
  To: Yuri Weinstein; +Cc: Ceph Development

Hi Yuri,

On 10/05/2015 02:27, Yuri Weinstein wrote:
> Loic 
> 
> Your description at a high level is correct.  There are, of course, more details when someone actually goes through the nightlies results.
> 
> As far as your question "Is there a time like bug scrubbing or sprint planning when developers say "Let's analyze QA results and dig bugs" ?" - I surely hope so and do see updates and triages on bugs in the tracker, but I'm not 100% sure what exactly our process is, so Sage and the development leads are better people to ask about this.
> 
> Also when you say "What I'm not sure about is if it's best effort ? "  - do you have something in mind instead of or in addition to what we do now?
> (I hope something that can lighten the burden :) )

It would make sense for the "Stable releases and backports" team to monitor the nightlies. Not on a daily basis because it would be too much work (there are only a few of us right now). But after merges to the stable branch, I think we should monitor the nightlies that could be impacted. Here is an example:

* 20 pull requests from rgw, rbd, fs, rados are merged in the integration branch (the nightlies don't see it)
* the rgw, rbd, fs, rados suites run on the integration branch and succeed
* the 20 pull requests are merged (after approval by the original developer and the lead if it was backported by someone from the "Stable releases and backports" team). This typically happens within two or three days.
* the nightlies will run on a stable branch in which 20 pull requests have been merged
* if a job fails because of one of these 20 pull requests, the person in charge of the stable branch is likely to be in a good position to figure out where it comes from

I keep an eye on your comments on a daily basis, but I think I should pay attention more closely. I guess the amount of output is intimidating and it's difficult to figure out how to contribute usefully when you only have one or two hours a week to devote to this. 

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre

* Re: Analyzing the nightlies
  2015-05-10  8:25   ` Loic Dachary
@ 2015-05-10 18:52     ` Yuri Weinstein
  2015-05-10 19:33       ` Loic Dachary
  0 siblings, 1 reply; 6+ messages in thread
From: Yuri Weinstein @ 2015-05-10 18:52 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

See inline

Thx
YuriW

----- Original Message -----
From: "Loic Dachary" <loic@dachary.org>
To: "Yuri Weinstein" <yweinste@redhat.com>
Cc: "Ceph Development" <ceph-devel@vger.kernel.org>
Sent: Sunday, May 10, 2015 1:25:02 AM
Subject: Re: Analyzing the nightlies

Hi Yuri,

It would make sense for the "Stable releases and backports" team to monitor the nightlies. Not on a daily basis because it would be too much work (there are only a few of us right now). But after merges to the stable branch, I think we should monitor the nightlies that could be impacted. Here is an example:

* 20 pull requests from rgw, rbd, fs, rados are merged in the integration branch (the nightlies don't see it)
* the rgw, rbd, fs, rados suites run on the integration branch and succeed
* the 20 pull requests are merged (after approval by the original developer and the lead if it was backported by someone from the "Stable releases and backports" team). This typically happens within two or three days.
* the nightlies will run on a stable branch in which 20 pull requests have been merged
* if a job fails because of one of these 20 pull requests, the person in charge of the stable branch is likely to be in a good position to figure out where it comes from

I keep an eye on your comments on a daily basis, but I think I should pay attention more closely. I guess the amount of output is intimidating and it's difficult to figure out how to contribute usefully when you only have one or two hours a week to devote to this. 

What do you think ?
================
I think it's a good idea.

Two points for consideration:

- we need to coordinate those activities so that we use our lab resources very conservatively
- (related to the point above) - we tend to have "stable"-related runs scheduled in the Octo lab, how will that work out for the "Stable releases and backports" team?
================  


* Re: Analyzing the nightlies
  2015-05-10 18:52     ` Yuri Weinstein
@ 2015-05-10 19:33       ` Loic Dachary
  0 siblings, 0 replies; 6+ messages in thread
From: Loic Dachary @ 2015-05-10 19:33 UTC (permalink / raw)
  To: Yuri Weinstein; +Cc: Ceph Development

On 10/05/2015 20:52, Yuri Weinstein wrote:
> ================
> I think it's a good idea.
> 
> Two points for consideration:
> 
> - we need to coordinate those activities so that we use our lab resources very conservatively

I don't think we plan to add anything to the nightlies.

> - (related to the point above) - we tend to have "stable"-related runs scheduled in the Octo lab, how will that work out for the "Stable releases and backports" team?

We would just help with analyzing the existing nightlies.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


* Re: Analyzing the nightlies
  2015-05-09  9:31 Analyzing the nightlies Loic Dachary
  2015-05-10  0:27 ` Yuri Weinstein
@ 2015-05-11 22:50 ` Gregory Farnum
  1 sibling, 0 replies; 6+ messages in thread
From: Gregory Farnum @ 2015-05-11 22:50 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Yuri Weinstein, Ceph Development

On Sat, May 9, 2015 at 2:31 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi Yuri,
>
> It would be useful to add more information about how the nightlies are analyzed at
>
>    http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_monitor_the_automated_tests_AKA_nightlies
>
> At this point my understanding is that you look over all of them and you carry the burden of
>
> * sorting out the environmental noise
> * creating new bugs for errors for which there is no match in the tracker
> * adding a link to the failed job in pre-existing issues found in the tracker (useful to figure out the frequency, and helpful for debugging when there are multiple outputs / logs)
>
> You do so by using tools such as https://github.com/jcsp/scrape/blob/master/scrape.py and maybe others and you also format your mail messages so that they can be parsed by a program (although such a program does not exist yet, it could go over all your messages and build a database from the mails you sent).
>
> In the http://lists.ceph.com/private.cgi/ceph-qa-ceph.com/ archives, I see that Greg also regularly goes over the errors and other developers also do. What I'm not sure about is if it's best effort ? Is there a time like bug scrubbing or sprint planning when developers say "Let's analyze QA results and dig bugs" ?

I know that Yuri looks at some nightlies but I'm not sure which ones
he's responsible for — I think it's the upgrade suites?
In general analyzing the nightlies is (unfortunately? maybe
positively) the team lead's responsibility to make happen. Right now I
think that means we each pretty much go over (or ignore) the tests
covering our area as it suits us; I send emails because I find it
convenient but I know Sam mostly just makes bugs. Now that things have
settled down some in the labs I'm planning to set up a rotation
amongst my team to cover them (it is *not* a small time commitment,
sadly).

The most annoying part of the job is when the lab breaks — realizing
that this means we can or cannot ignore such-and-such a set of
symptoms, making sure that it's not something new in Ceph or the test
that we changed, and then adjudicating responsibility for the fix
between the very nebulous group of people whose fault or
responsibility it might be. (We have a lot of hands in teuthology and
a lot in the lab, not in an entirely overlapping set, and any one of
them can cause breakage.) It's not always clear when I see a batch of
runs failed over the weekend if the problem has been resolved yet or
not. :(
-Greg