* getting ready for jewel 10.2.1
@ 2016-03-30 10:30 Loic Dachary
2016-03-30 10:45 ` Abhishek Varshney
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Loic Dachary @ 2016-03-30 10:30 UTC (permalink / raw)
To: Abhishek Varshney; +Cc: Ceph Development
Hi,
Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)
Cheers
--
Loïc Dachary, Artisan Logiciel Libre
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: getting ready for jewel 10.2.1
2016-03-30 10:30 getting ready for jewel 10.2.1 Loic Dachary
@ 2016-03-30 10:45 ` Abhishek Varshney
2016-03-30 18:47 ` Gregory Farnum
2016-03-31 15:49 ` John Spray
2 siblings, 0 replies; 9+ messages in thread
From: Abhishek Varshney @ 2016-03-30 10:45 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
Hi Loic,

On Wed, Mar 30, 2016 at 4:00 PM, Loic Dachary <loic@dachary.org> wrote:
> Hi,
>
> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose.

Thanks for creating the tracker issue.

> The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.

Let's aim for the stars :)

>
> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre

Thanks
Abhishek
* Re: getting ready for jewel 10.2.1
2016-03-30 10:30 getting ready for jewel 10.2.1 Loic Dachary
2016-03-30 10:45 ` Abhishek Varshney
@ 2016-03-30 18:47 ` Gregory Farnum
2016-03-31 14:31 ` Loic Dachary
2016-03-31 15:49 ` John Spray
2 siblings, 1 reply; 9+ messages in thread
From: Gregory Farnum @ 2016-03-30 18:47 UTC (permalink / raw)
To: Loic Dachary; +Cc: Abhishek Varshney, Ceph Development
On Wed, Mar 30, 2016 at 3:30 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi,
>
> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
>
> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)

I think you've convinced us all it's important in the abstract; that's
just very different from putting it on top of our list of priorities,
especially since we alleviated many of our needs in the sepia lab.
Beyond that, a lot of the issues we're seeing have very little to do
with Ceph itself, or even the testing programs, and that can make it
more difficult to get interested as we lack the necessary expertise. I
spent some time trying to get disk sizes and things matched up (and I
suddenly realize that never got merged), but some of the other odder
issues we're having:

http://tracker.ceph.com/issues/13980, in which we are failing to mount
anything with nfs v3. This is a config file that needs to get updated;
we do it for the sepia lab (probably in ansible?) but somehow that
information isn't getting into the ovh slaves. (Or else it is in
there, and there's something *else* broken.) If we are using a
separate setup regimen for OpenStack than we are in the sepia lab
there will be persistent breakage as new dependencies and
environmental expectations get added to one and not the other. :/

http://tracker.ceph.com/issues/13876, in which MPI is just failing to
get any connections going. Why? No idea; there's a teuthology commit
from you that's supposed to have opened up all the ports in the
firewall (and it sure *looks* like it does do that, but I don't know
how the rules work), but this works in sepia and inasmuch as we have
debugging info sure looks like some kind of network blockage...

So I think this isn't something that's going to get done properly
unless somebody gets assigned to just make everything work in all the
suites, who has the time to learn all the fiddly little bits. (Or we
somehow take a break for it as a project. But I don't see that going
well.) :/
-Greg
* Re: getting ready for jewel 10.2.1
2016-03-30 18:47 ` Gregory Farnum
@ 2016-03-31 14:31 ` Loic Dachary
2016-03-31 19:01 ` Gregory Farnum
0 siblings, 1 reply; 9+ messages in thread
From: Loic Dachary @ 2016-03-31 14:31 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Ceph Development
Hi Gregory,

On 30/03/2016 20:47, Gregory Farnum wrote:
> On Wed, Mar 30, 2016 at 3:30 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi,
>>
>> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
>>
>> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)
>
> I think you've convinced us all it's important in the abstract; that's
> just very different from putting it on top of our list of priorities,
> especially since we alleviated many of our needs in the sepia lab.
> Beyond that, a lot of the issues we're seeing have very little to do
> with Ceph itself, or even the testing programs, and that can make it
> more difficult to get interested as we lack the necessary expertise. I
> spent some time trying to get disk sizes and things matched up (and I
> suddenly realize that never got merged), but some of the other odder
> issues we're having:
>
> http://tracker.ceph.com/issues/13980, in which we are failing to mount
> anything with nfs v3. This is a config file that needs to get updated;
> we do it for the sepia lab (probably in ansible?) but somehow that
> information isn't getting into the ovh slaves. (Or else it is in
> there, and there's something *else* broken.) If we are using a
> separate setup regimen for OpenStack than we are in the sepia lab
> there will be persistent breakage as new dependencies and
> environmental expectations get added to one and not the other. :/

ceph-cm-ansible does not have any OpenStack-specific instructions. It's supposed to work exactly the same on both sepia and OpenStack. When teuthology provisions an OpenStack target, it does so in the same way it provisions VPS in sepia. The only difference is that OpenStack uses images that come from http://cloud.centos.org/centos/7/images/ etc., unmodified. The VPS images have sometimes been modified. However, this has only been an issue once, over six months ago.

On OVH the UDP ports were firewalled, and that created the problem. I changed the firewall rules and I'm hopeful http://pulpito.ovh.sepia.ceph.com:8081/loic-2016-03-31_14:10:18-knfs-jewel-testing-basic-openstack/ will now pass.

> http://tracker.ceph.com/issues/13876, in which MPI is just failing to
> get any connections going. Why? No idea; there's a teuthology commit
> from you that's supposed to have opened up all the ports in the
> firewall (and it sure *looks* like it does do that, but I don't know
> how the rules work), but this works in sepia and inasmuch as we have
> debugging info sure looks like some kind of network blockage...

I opened the required port on the OVH lab. I don't think there is an ansible rule that does it but I'll ask Zack to be sure.

> So I think this isn't something that's going to get done properly
> unless somebody gets assigned to just make everything work in all the
> suites, who has the time to learn all the fiddly little bits. (Or we
> somehow take a break for it as a project. But I don't see that going
> well.) :/

If you suspect an OpenStack-specific problem, feel free to ping me. There is a good chance I can help and together we can make teuthology happy with OpenStack :-)

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
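The network symptoms discussed in this message (connections that work in sepia but fail on the OVH OpenStack nodes) can be narrowed down with a quick reachability check between two test nodes. What follows is only a minimal sketch in Python, not part of teuthology or ceph-cm-ansible. The helper names tcp_port_open and blocked_ports are hypothetical, the check covers TCP only (so it cannot confirm the UDP blockage that affected NFS v3), and it cannot tell a firewalled port from a port that simply has no listener.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration. A refused connection, a
    timeout, and an unresolvable host all count as "not open".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror
        return False


def blocked_ports(host, ports, timeout=2.0):
    """Return the ports from `ports` that could not be reached on `host`."""
    return [p for p in ports if not tcp_port_open(host, p, timeout)]
```

For example, blocked_ports("target-node", [22, 2049]) would return the subset of those ports that could not be reached from the node running the check. The host name and port list are placeholders, and a port reported here may be closed rather than firewalled.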
* Re: getting ready for jewel 10.2.1
2016-03-31 14:31 ` Loic Dachary
@ 2016-03-31 19:01 ` Gregory Farnum
2016-03-31 22:13 ` Loic Dachary
0 siblings, 1 reply; 9+ messages in thread
From: Gregory Farnum @ 2016-03-31 19:01 UTC (permalink / raw)
To: Loic Dachary; +Cc: Ceph Development
On Thu, Mar 31, 2016 at 7:31 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi Gregory,
>
> On 30/03/2016 20:47, Gregory Farnum wrote:
>> On Wed, Mar 30, 2016 at 3:30 AM, Loic Dachary <loic@dachary.org> wrote:
>>> Hi,
>>>
>>> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
>>>
>>> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)
>>
>> I think you've convinced us all it's important in the abstract; that's
>> just very different from putting it on top of our list of priorities,
>> especially since we alleviated many of our needs in the sepia lab.
>> Beyond that, a lot of the issues we're seeing have very little to do
>> with Ceph itself, or even the testing programs, and that can make it
>> more difficult to get interested as we lack the necessary expertise. I
>> spent some time trying to get disk sizes and things matched up (and I
>> suddenly realize that never got merged), but some of the other odder
>> issues we're having:
>>
>> http://tracker.ceph.com/issues/13980, in which we are failing to mount
>> anything with nfs v3. This is a config file that needs to get updated;
>> we do it for the sepia lab (probably in ansible?) but somehow that
>> information isn't getting into the ovh slaves. (Or else it is in
>> there, and there's something *else* broken.) If we are using a
>> separate setup regimen for OpenStack than we are in the sepia lab
>> there will be persistent breakage as new dependencies and
>> environmental expectations get added to one and not the other. :/
>
> ceph-cm-ansible does not have any OpenStack-specific instructions. It's supposed to work exactly the same on both sepia and OpenStack. When teuthology provisions an OpenStack target, it does so in the same way it provisions VPS in sepia. The only difference is that OpenStack uses images that come from http://cloud.centos.org/centos/7/images/ etc., unmodified. The VPS images have sometimes been modified. However, this has only been an issue once, over six months ago.
>
> On OVH the UDP ports were firewalled, and that created the problem. I changed the firewall rules and I'm hopeful http://pulpito.ovh.sepia.ceph.com:8081/loic-2016-03-31_14:10:18-knfs-jewel-testing-basic-openstack/ will now pass.
>
>> http://tracker.ceph.com/issues/13876, in which MPI is just failing to
>> get any connections going. Why? No idea; there's a teuthology commit
>> from you that's supposed to have opened up all the ports in the
>> firewall (and it sure *looks* like it does do that, but I don't know
>> how the rules work), but this works in sepia and inasmuch as we have
>> debugging info sure looks like some kind of network blockage...
>
> I opened the required port on the OVH lab. I don't think there is an ansible rule that does it but I'll ask Zack to be sure.
>
>> So I think this isn't something that's going to get done properly
>> unless somebody gets assigned to just make everything work in all the
>> suites, who has the time to learn all the fiddly little bits. (Or we
>> somehow take a break for it as a project. But I don't see that going
>> well.) :/
>
> If you suspect an OpenStack-specific problem, feel free to ping me. There is a good chance I can help and together we can make teuthology happy with OpenStack :-)

I really wasn't fishing with those, but hey! thanks so much for those fixes. :)

Do we have any way to automate those kinds of things for external
users? It sounds like right now these are just some random things any
third party needs to know to do, or their tests will mysteriously
fail?
-Greg
* Re: getting ready for jewel 10.2.1
2016-03-31 19:01 ` Gregory Farnum
@ 2016-03-31 22:13 ` Loic Dachary
0 siblings, 0 replies; 9+ messages in thread
From: Loic Dachary @ 2016-03-31 22:13 UTC (permalink / raw)
To: Gregory Farnum; +Cc: Ceph Development
On 31/03/2016 21:01, Gregory Farnum wrote:
> On Thu, Mar 31, 2016 at 7:31 AM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Gregory,
>>
>> On 30/03/2016 20:47, Gregory Farnum wrote:
>>> On Wed, Mar 30, 2016 at 3:30 AM, Loic Dachary <loic@dachary.org> wrote:
>>>> Hi,
>>>>
>>>> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
>>>>
>>>> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)
>>>
>>> I think you've convinced us all it's important in the abstract; that's
>>> just very different from putting it on top of our list of priorities,
>>> especially since we alleviated many of our needs in the sepia lab.
>>> Beyond that, a lot of the issues we're seeing have very little to do
>>> with Ceph itself, or even the testing programs, and that can make it
>>> more difficult to get interested as we lack the necessary expertise. I
>>> spent some time trying to get disk sizes and things matched up (and I
>>> suddenly realize that never got merged), but some of the other odder
>>> issues we're having:
>>>
>>> http://tracker.ceph.com/issues/13980, in which we are failing to mount
>>> anything with nfs v3. This is a config file that needs to get updated;
>>> we do it for the sepia lab (probably in ansible?) but somehow that
>>> information isn't getting into the ovh slaves. (Or else it is in
>>> there, and there's something *else* broken.) If we are using a
>>> separate setup regimen for OpenStack than we are in the sepia lab
>>> there will be persistent breakage as new dependencies and
>>> environmental expectations get added to one and not the other. :/
>>
>> ceph-cm-ansible does not have any OpenStack-specific instructions. It's supposed to work exactly the same on both sepia and OpenStack. When teuthology provisions an OpenStack target, it does so in the same way it provisions VPS in sepia. The only difference is that OpenStack uses images that come from http://cloud.centos.org/centos/7/images/ etc., unmodified. The VPS images have sometimes been modified. However, this has only been an issue once, over six months ago.
>>
>> On OVH the UDP ports were firewalled, and that created the problem. I changed the firewall rules and I'm hopeful http://pulpito.ovh.sepia.ceph.com:8081/loic-2016-03-31_14:10:18-knfs-jewel-testing-basic-openstack/ will now pass.
>>
>>> http://tracker.ceph.com/issues/13876, in which MPI is just failing to
>>> get any connections going. Why? No idea; there's a teuthology commit
>>> from you that's supposed to have opened up all the ports in the
>>> firewall (and it sure *looks* like it does do that, but I don't know
>>> how the rules work), but this works in sepia and inasmuch as we have
>>> debugging info sure looks like some kind of network blockage...
>>
>> I opened the required port on the OVH lab. I don't think there is an ansible rule that does it but I'll ask Zack to be sure.
>>
>>> So I think this isn't something that's going to get done properly
>>> unless somebody gets assigned to just make everything work in all the
>>> suites, who has the time to learn all the fiddly little bits. (Or we
>>> somehow take a break for it as a project. But I don't see that going
>>> well.) :/
>>
>> If you suspect an OpenStack-specific problem, feel free to ping me. There is a good chance I can help and together we can make teuthology happy with OpenStack :-)
>
> I really wasn't fishing with those, but hey! thanks so much for those fixes. :)
>
> Do we have any way to automate those kinds of things for external
> users? It sounds like right now these are just some random things any
> third party needs to know to do, or their tests will mysteriously
> fail?

https://github.com/ceph/teuthology/pull/834/files automates that.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
* Re: getting ready for jewel 10.2.1
2016-03-30 10:30 getting ready for jewel 10.2.1 Loic Dachary
2016-03-30 10:45 ` Abhishek Varshney
2016-03-30 18:47 ` Gregory Farnum
@ 2016-03-31 15:49 ` John Spray
2016-04-01 15:18 ` Sage Weil
2 siblings, 1 reply; 9+ messages in thread
From: John Spray @ 2016-03-31 15:49 UTC (permalink / raw)
To: Loic Dachary; +Cc: Abhishek Varshney, Ceph Development
On Wed, Mar 30, 2016 at 11:30 AM, Loic Dachary <loic@dachary.org> wrote:
> Hi,
>
> Now is a good time to get ready for jewel 10.2.1 and I created http://tracker.ceph.com/issues/15317 for that purpose. The goal is to be able to run as many suites as possible on OpenStack, so that we do not have to wait days (sometimes a week) for runs to complete on Sepia. Best case scenario, all OpenStack-specific problems are fixed by the time 10.2.1 is being prepared. Worst case scenario, there is no time to fix issues and we keep using the sepia lab. I guess we'll end up somewhere in the middle: some suites will run fine on OpenStack and we'll use sepia for others.
>
> In a previous mail I voiced my concerns about developers' lack of interest in teuthology job failures that are caused by variations in the infrastructure. I still have no clue how to convey my belief that it is important for teuthology jobs to succeed despite infrastructure variations. But instead of just giving up and doing nothing, I will work on that for the rados suite and hope things will evolve in a good way. To be honest, figuring out http://tracker.ceph.com/issues/15236 and seeing a good run of the rados suite on jewel as a result renewed my motivation in that area :-)

If I was dedicating time to working on lab infrastructure, I think I
would prioritise stabilising the existing sepia lab. I still see
infrastructure issues (these days usually package install failures)
sprinkled all over the place, so I have to question the value of
spreading ourselves even more thinly by trying to handle multiple
environments with their different quirks.

I have nothing against the openstack work, it is a good tool, but I
don't think it was wise to just deploy it and expect other developers
to handle the issues. I would have liked to see at least one passing
filesystem run on openstack before regular nightlies were scheduled on
it. Maybe now that we have fixes for #13980 and #13876 we will see a
passing run, and can get more of a sense of how stable/unstable these
tests are in the openstack environment: I think it's likely that we
will continue to see timeouts/instability from the comparatively
underpowered nodes.

John

> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre
* Re: getting ready for jewel 10.2.1
2016-03-31 15:49 ` John Spray
@ 2016-04-01 15:18 ` Sage Weil
2016-04-01 16:19 ` John Spray
0 siblings, 1 reply; 9+ messages in thread
From: Sage Weil @ 2016-04-01 15:18 UTC (permalink / raw)
To: John Spray; +Cc: Loic Dachary, Abhishek Varshney, Ceph Development
On Thu, 31 Mar 2016, John Spray wrote:
> On Wed, Mar 30, 2016 at 11:30 AM, Loic Dachary <loic@dachary.org> wrote:
> > Hi,
> >
> > Now is a good time to get ready for jewel 10.2.1 and I created
> > http://tracker.ceph.com/issues/15317 for that purpose. The goal is to
> > be able to run as many suites as possible on OpenStack, so that we do
> > not have to wait days (sometimes a week) for runs to complete on Sepia.
> > Best case scenario, all OpenStack-specific problems are fixed by the
> > time 10.2.1 is being prepared. Worst case scenario, there is no time to
> > fix issues and we keep using the sepia lab. I guess we'll end up
> > somewhere in the middle: some suites will run fine on OpenStack and
> > we'll use sepia for others.
> >
> > In a previous mail I voiced my concerns about developers' lack of
> > interest in teuthology job failures that are caused by
> > variations in the infrastructure. I still have no clue how to convey
> > my belief that it is important for teuthology jobs to succeed despite
> > infrastructure variations. But instead of just giving up and doing
> > nothing, I will work on that for the rados suite and hope things will
> > evolve in a good way. To be honest, figuring out
> > http://tracker.ceph.com/issues/15236 and seeing a good run of the
> > rados suite on jewel as a result renewed my motivation in that area
> > :-)
>
> If I was dedicating time to working on lab infrastructure, I think I
> would prioritise stabilising the existing sepia lab. I still see
> infrastructure issues (these days usually package install failures)
> sprinkled all over the place, so I have to question the value of
> spreading ourselves even more thinly by trying to handle multiple
> environments with their different quirks.

I think we can't afford not to do both. The problem with focusing only on
sepia is that it prevents new contributors from testing their
code, and testing is one of the key pieces that is preventing us from scaling
our overall development velocity.

Also, FWIW, Sam sank a couple days this week into improvements on the
sepia side that have eliminated almost all of the sepia package install
noise we've been seeing (at least on the rados suite). With Jewel
stabilizing, now is a good time to do the same with openstack.

> I have nothing against the openstack work, it is a good tool, but I
> don't think it was wise to just deploy it and expect other developers
> to handle the issues. I would have liked to see at least one passing
> filesystem run on openstack before regular nightlies were scheduled on
> it. Maybe now that we have fixes for #13980 and #13876 we will see a
> passing run, and can get more of a sense of how stable/unstable these
> tests are in the openstack environment: I think it's likely that we
> will continue to see timeouts/instability from the comparatively
> underpowered nodes.

The earlier transition to openstack left much to be desired, although to
be fair it would have been hard to do it all that differently given we
were forced out of Irvine by the sepia lab move. In my view the main
lesson learned was that without everyone feeling invested in fixing the
issues to make the tests pass, the issues won't get fixed. The lab folks
don't understand all of the tests and their weird issues, and the
developers are too busy with code to help debug them.

I'd like to convince everyone that making openstack a reliable testing
environment is an important strategic goal for the project as a whole, and
in everyone's best interests, and that right now (while we're focusing on
teuthology tests and waiting for the final blocking jewel bugs to be
squashed) is as good a time as any to dig into the remaining issues...
both with sepia *and* openstack.

Is that reasonable?

sage
* Re: getting ready for jewel 10.2.1
2016-04-01 15:18 ` Sage Weil
@ 2016-04-01 16:19 ` John Spray
0 siblings, 0 replies; 9+ messages in thread
From: John Spray @ 2016-04-01 16:19 UTC (permalink / raw)
To: Sage Weil; +Cc: Loic Dachary, Abhishek Varshney, Ceph Development
On Fri, Apr 1, 2016 at 4:18 PM, Sage Weil <sage@newdream.net> wrote:
> On Thu, 31 Mar 2016, John Spray wrote:
>> On Wed, Mar 30, 2016 at 11:30 AM, Loic Dachary <loic@dachary.org> wrote:
>> > Hi,
>> >
>> > Now is a good time to get ready for jewel 10.2.1 and I created
>> > http://tracker.ceph.com/issues/15317 for that purpose. The goal is to
>> > be able to run as many suites as possible on OpenStack, so that we do
>> > not have to wait days (sometimes a week) for runs to complete on Sepia.
>> > Best case scenario, all OpenStack-specific problems are fixed by the
>> > time 10.2.1 is being prepared. Worst case scenario, there is no time to
>> > fix issues and we keep using the sepia lab. I guess we'll end up
>> > somewhere in the middle: some suites will run fine on OpenStack and
>> > we'll use sepia for others.
>> >
>> > In a previous mail I voiced my concerns about developers' lack of
>> > interest in teuthology job failures that are caused by
>> > variations in the infrastructure. I still have no clue how to convey
>> > my belief that it is important for teuthology jobs to succeed despite
>> > infrastructure variations. But instead of just giving up and doing
>> > nothing, I will work on that for the rados suite and hope things will
>> > evolve in a good way. To be honest, figuring out
>> > http://tracker.ceph.com/issues/15236 and seeing a good run of the
>> > rados suite on jewel as a result renewed my motivation in that area
>> > :-)
>>
>> If I was dedicating time to working on lab infrastructure, I think I
>> would prioritise stabilising the existing sepia lab. I still see
>> infrastructure issues (these days usually package install failures)
>> sprinkled all over the place, so I have to question the value of
>> spreading ourselves even more thinly by trying to handle multiple
>> environments with their different quirks.
>
> I think we can't afford not to do both. The problem with focusing only on
> sepia is that it prevents new contributors from testing their
> code, and testing is one of the key pieces that is preventing us from scaling
> our overall development velocity.
>
> Also, FWIW, Sam sank a couple days this week into improvements on the
> sepia side that have eliminated almost all of the sepia package install
> noise we've been seeing (at least on the rados suite). With Jewel
> stabilizing, now is a good time to do the same with openstack.
>
>> I have nothing against the openstack work, it is a good tool, but I
>> don't think it was wise to just deploy it and expect other developers
>> to handle the issues. I would have liked to see at least one passing
>> filesystem run on openstack before regular nightlies were scheduled on
>> it. Maybe now that we have fixes for #13980 and #13876 we will see a
>> passing run, and can get more of a sense of how stable/unstable these
>> tests are in the openstack environment: I think it's likely that we
>> will continue to see timeouts/instability from the comparatively
>> underpowered nodes.
>
> The earlier transition to openstack left much to be desired, although to
> be fair it would have been hard to do it all that differently given we
> were forced out of Irvine by the sepia lab move. In my view the main
> lesson learned was that without everyone feeling invested in fixing the
> issues to make the tests pass, the issues won't get fixed. The lab folks
> don't understand all of the tests and their weird issues, and the
> developers are too busy with code to help debug them.
>
> I'd like to convince everyone that making openstack a reliable testing
> environment is an important strategic goal for the project as a whole, and
> in everyone's best interests, and that right now (while we're focusing on
> teuthology tests and waiting for the final blocking jewel bugs to be
> squashed) is as good a time as any to dig into the remaining issues...
> both with sepia *and* openstack.
>
> Is that reasonable?

OK -- if we are committed to doing both, then my opinions about priorities are kind of academic :-) If we're all pulling in the same direction I think it's fine.

Cheers,
John