* Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS
       [not found] ` <55ABBD01.3070002@dachary.org>
@ 2015-07-20 15:38   ` Zack Cerza
  2015-07-20 17:03     ` Loic Dachary
  0 siblings, 1 reply; 2+ messages in thread
From: Zack Cerza @ 2015-07-20 15:38 UTC
  To: Loic Dachary; +Cc: Andrew Schoen, Ceph Development

Hi Loic,

This is definitely something to keep an eye on. It's actually a bit surprising to me, though - I haven't seen ansible-playbook use any significant resources in sepia.

I wouldn't really recommend running paddles on the same host as teuthology for any serious amount of testing, though; some teuthology tasks do use large amounts of RAM and/or CPU, and severe load issues could feasibly cause requests to time out, affecting other jobs.

That's all theory though, as I've always used separate hosts for the two services.

Zack

----- Original Message -----
> From: "Loic Dachary" <loic@dachary.org>
> To: "Zack Cerza" <zack@redhat.com>, "Andrew Schoen" <aschoen@redhat.com>
> Cc: "Ceph Development" <ceph-devel@vger.kernel.org>
> Sent: Sunday, July 19, 2015 9:06:41 AM
> Subject: Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS
>
> Hi again,
>
> I had the same problem when 50 workers kicked in at the same time. I lowered
> the number of workers to 25 and it went well. For about 8 minutes the load
> average stayed around 25 (CPU bound, mainly the ansible playbooks competing,
> see the screenshot of htop), but I did not see any errors or timeouts. Then
> I added 15 workers, waited for the load to go back to < 2 (10 minutes), then
> added 15 more (10 minutes) to get to 55.
>
> That sounds like a lot of CPU used by a single playbook run. Is there a known
> way to reduce that? If not I'll just upgrade the machine. Just want to make
> sure I'm not missing a simple solution ;-)
>
> Cheers
>
> On 19/07/2015 14:22, Loic Dachary wrote:
> > Hi,
> >
> > For the record, I launched a rados suite on an idle teuthology cluster,
> > with 70 workers running on an 8GB RAM / 2 CPUs / 40GB SSD disk machine.
> > The load average reached 40 within a minute or two and some jobs started
> > failing / timing out. I had pulpito running on the same machine and it
> > failed one time out of two because of the load (see the top image).
> >
> > On Friday I was able to run 70 workers because I added them gradually. The
> > load peak is when a job starts and all workers kick in at the same time.
> >
> > Cheers
> >
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS
  2015-07-20 15:38   ` teuthology : 70 workers need more than 8GB RAM / 2 CPUS Zack Cerza
@ 2015-07-20 17:03     ` Loic Dachary
  0 siblings, 0 replies; 2+ messages in thread
From: Loic Dachary @ 2015-07-20 17:03 UTC
  To: Zack Cerza; +Cc: Andrew Schoen, Ceph Development

Thanks for the feedback. I'll try with PostgreSQL, as it seems the SQLite modifications made no significant difference.

On 20/07/2015 17:38, Zack Cerza wrote:
> Hi Loic,
>
> This is definitely something to keep an eye on. It's actually a bit surprising to me, though - I haven't seen ansible-playbook use any significant resources in sepia.
>
> I wouldn't really recommend running paddles on the same host as teuthology for any serious amount of testing, though; some teuthology tasks do use large amounts of RAM and/or CPU, and severe load issues could feasibly cause requests to time out, affecting other jobs.
>
> That's all theory though, as I've always used separate hosts for the two services.
>
> Zack
>
> ----- Original Message -----
>> From: "Loic Dachary" <loic@dachary.org>
>> To: "Zack Cerza" <zack@redhat.com>, "Andrew Schoen" <aschoen@redhat.com>
>> Cc: "Ceph Development" <ceph-devel@vger.kernel.org>
>> Sent: Sunday, July 19, 2015 9:06:41 AM
>> Subject: Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS
>>
>> Hi again,
>>
>> I had the same problem when 50 workers kicked in at the same time. I lowered
>> the number of workers to 25 and it went well. For about 8 minutes the load
>> average stayed around 25 (CPU bound, mainly the ansible playbooks competing,
>> see the screenshot of htop), but I did not see any errors or timeouts. Then
>> I added 15 workers, waited for the load to go back to < 2 (10 minutes), then
>> added 15 more (10 minutes) to get to 55.
>>
>> That sounds like a lot of CPU used by a single playbook run. Is there a known
>> way to reduce that? If not I'll just upgrade the machine. Just want to make
>> sure I'm not missing a simple solution ;-)
>>
>> Cheers
>>
>> On 19/07/2015 14:22, Loic Dachary wrote:
>>> Hi,
>>>
>>> For the record, I launched a rados suite on an idle teuthology cluster,
>>> with 70 workers running on an 8GB RAM / 2 CPUs / 40GB SSD disk machine.
>>> The load average reached 40 within a minute or two and some jobs started
>>> failing / timing out. I had pulpito running on the same machine and it
>>> failed one time out of two because of the load (see the top image).
>>>
>>> On Friday I was able to run 70 workers because I added them gradually. The
>>> load peak is when a job starts and all workers kick in at the same time.
>>>
>>> Cheers
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

--
Loïc Dachary, Artisan Logiciel Libre
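
The gradual ramp-up described in the thread (start a batch of workers, wait for the load average to drop back below 2, then start the next batch) could be automated along the following lines. This is only a sketch under stated assumptions, not part of teuthology: `ramp_up_workers` and its parameters are hypothetical names, and `start_worker` stands in for whatever command launches a worker in a given setup. The load is read with Python's standard `os.getloadavg()`, which is available on Unix only.

```python
import os
import time
from typing import Callable


def ramp_up_workers(
    total: int,
    batch_size: int,
    start_worker: Callable[[int], None],
    get_load: Callable[[], float] = lambda: os.getloadavg()[0],
    max_load: float = 2.0,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start `total` workers in batches, letting the load settle between batches.

    `start_worker` is a hypothetical callback that launches worker number
    `index`; it is an assumption of this sketch, not a teuthology API.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    started = 0
    while started < total:
        # Hold off while the 1-minute load average is at or above the
        # threshold (the thread waited for the load to go back to < 2).
        while get_load() >= max_load:
            sleep(poll_seconds)

        # Launch the next batch; the final batch may be smaller.
        end = min(started + batch_size, total)
        for index in range(started, end):
            start_worker(index)
        started = end

    return started
```

With `total=55` and `batch_size=15` this approximates the manual procedure from the thread. The load reader and the sleep function are passed in as parameters so the loop can be exercised without actually loading a machine.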