Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS

From: Zack Cerza @ 2015-07-20 15:38 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Andrew Schoen, Ceph Development

Hi Loic,

This is definitely something to keep an eye on. It's actually a bit surprising to me, though: I haven't seen ansible-playbook use any significant resources in sepia.

For any serious amount of testing, though, I wouldn't recommend running paddles on the same host as teuthology. Some teuthology tasks use large amounts of RAM and/or CPU, and severe load could cause requests to time out, affecting other jobs.

That's all theory though, as I've always used separate hosts for the two services.

Zack

----- Original Message -----
> From: "Loic Dachary" <loic@dachary.org>
> To: "Zack Cerza" <zack@redhat.com>, "Andrew Schoen" <aschoen@redhat.com>
> Cc: "Ceph Development" <ceph-devel@vger.kernel.org>
> Sent: Sunday, July 19, 2015 9:06:41 AM
> Subject: Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS
> 
> Hi again,
> 
> I had the same problem when 50 workers kicked in at the same time. I lowered
> the number of workers to 25 and it went well. For a few minutes (~8
> minutes) the load average stayed around 25 (CPU bound, mainly the ansible
> playbooks competing; see the htop screenshot), but I did not see any errors
> or timeouts. Then I added 15 workers, waited for the load to go back below 2
> (10 minutes), then added 15 more (another 10 minutes) to get to 55.
> 
> That sounds like a lot of CPU used by a single playbook run. Is there a known
> way to reduce that? If not, I'll just upgrade the machine. I just want to make
> sure I'm not missing a simple solution ;-)
> 
> Cheers
> 
> On 19/07/2015 14:22, Loic Dachary wrote:
> > Hi,
> > 
> > For the record, I launched a rados suite on an idle teuthology cluster,
> > with 70 workers running on an 8GB RAM / 2 CPU / 40GB SSD machine. The load
> > average reached 40 within a minute or two and some jobs started failing or
> > timing out. I had pulpito running on the same machine and it failed one
> > time out of two because of the load (see the top image).
> > 
> > On Friday I was able to run 70 workers because I added them gradually. The
> > load peaks when a job starts and all workers kick in at the same time.
> > 
> > Cheers
> > 
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 
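Loic's workaround in the quoted messages amounts to a staggered ramp-up: start a first batch of workers, wait for the load average to settle, then add the next batch. The following is a minimal Python sketch of that loop, not part of teuthology. It assumes a Unix host (it reads the 1-minute load average via `os.getloadavg()`), and `start_worker` is a hypothetical callback standing in for however your deployment launches one teuthology worker. The batch sizes (25, 15, 15) and the settle threshold of 2 come from the messages; the 30-second poll interval is an arbitrary assumption.

```python
import os
import time
from typing import Callable, List


def ramp_up_workers(
    start_worker: Callable[[], None],
    batches: List[int],
    settle_load: float = 2.0,
    poll_seconds: float = 30.0,
    get_load: Callable[[], float] = lambda: os.getloadavg()[0],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Start workers batch by batch and return how many were started.

    Before every batch except the first, poll the 1-minute load average
    and wait until it drops below settle_load, so the CPU-heavy startup
    of one batch (e.g. ansible-playbook runs) does not overlap the next.
    """
    started = 0
    for index, size in enumerate(batches):
        if index > 0:
            # Let the previous batch finish its startup burst first.
            while get_load() >= settle_load:
                sleep(poll_seconds)
        for _ in range(size):
            # Hypothetical hook: launch one worker however your site does it.
            start_worker()
            started += 1
    return started
```

For example, `ramp_up_workers(launch_one, [25, 15, 15])`, with `launch_one` being your own (hypothetical) launcher, would reproduce the 25 + 15 + 15 = 55 ramp described above. Passing fake `get_load` and `sleep` functions lets the loop be exercised without a loaded host.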
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: teuthology : 70 workers need more than 8GB RAM / 2 CPUS

From: Loic Dachary @ 2015-07-20 17:03 UTC (permalink / raw)
  To: Zack Cerza; +Cc: Andrew Schoen, Ceph Development

Thanks for the feedback. I'll try PostgreSQL, since the SQLite modifications don't seem to have made any significant difference.

-- 
Loïc Dachary, Artisan Logiciel Libre