qemu-devel.nongnu.org archive mirror
* how long do we need to retain gitlab CI job stdout logs?
@ 2022-08-08 17:47 Peter Maydell
  2022-08-08 18:01 ` Warner Losh
  2022-08-08 18:42 ` Thomas Huth
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Maydell @ 2022-08-08 17:47 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Alex Bennée, Daniel P. Berrange, Richard Henderson

Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
pointed me at. This script removes old pipelines, which take up a
lot of storage space for QEMU because they include the stdout logs
for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
either by default or configurably -- you have to either manually delete
the pipeline in the UI or else use the API, as this script does.)
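
[Editor's illustration, not taken from the gitlab_cleaner script.] The
API-based pruning described above can be sketched roughly as follows.
This is a minimal sketch under stated assumptions: it uses the GitLab
REST endpoint DELETE /projects/:id/pipelines/:pipeline_id, the project
path "qemu-project/qemu" is only an example, and the helper names
(parse_ts, pipelines_to_delete, delete_url) are hypothetical. It only
selects pipelines and builds URLs; it sends no requests.

```python
from datetime import datetime, timezone
from urllib.parse import quote

API = "https://gitlab.com/api/v4"  # GitLab REST API base URL


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2021-12-31T23:59:59.000Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pipelines_to_delete(pipelines, cutoff):
    """Return the ids of pipelines created strictly before `cutoff`.

    `pipelines` is a list of dicts shaped like the entries returned by
    GET /projects/:id/pipelines (only 'id' and 'created_at' are used).
    """
    return [p["id"] for p in pipelines if parse_ts(p["created_at"]) < cutoff]


def delete_url(project_path, pipeline_id):
    """Build the URL for DELETE /projects/:id/pipelines/:pipeline_id.

    The project path is URL-encoded, so 'group/name' becomes 'group%2Fname'.
    """
    return f"{API}/projects/{quote(project_path, safe='')}/pipelines/{pipeline_id}"


if __name__ == "__main__":
    # Hypothetical sample data; a real run would page through the API.
    sample = [
        {"id": 1, "created_at": "2021-12-31T23:59:59.000Z"},
        {"id": 2, "created_at": "2022-01-01T00:00:00.000Z"},
    ]
    cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)
    for pid in pipelines_to_delete(sample, cutoff):
        # A real script would issue an authenticated DELETE request here.
        print("would delete:", delete_url("qemu-project/qemu", pid))
```

Keeping the selection step separate from the deletion step makes it
easy to do a dry run before anything is actually removed.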

I somewhat conservatively only blew away pipelines from before the
1st January 2022. I feel like we don't really even need 6 months worth
of CI job logs, though -- any views on whether we should be pruning
them more aggressively ?

thanks
-- PMM



* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-08 17:47 how long do we need to retain gitlab CI job stdout logs? Peter Maydell
@ 2022-08-08 18:01 ` Warner Losh
  2022-08-08 18:42 ` Thomas Huth
  1 sibling, 0 replies; 7+ messages in thread
From: Warner Losh @ 2022-08-08 18:01 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Alex Bennée, Daniel P. Berrange,
	Richard Henderson


On Mon, Aug 8, 2022, 11:49 AM Peter Maydell <peter.maydell@linaro.org>
wrote:

> Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
> using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
> pointed me at. This script removes old pipelines, which take up a
> lot of storage space for QEMU because they include the stdout logs
> for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
> either by default or configurably -- you have to either manually delete
> the pipeline in the UI or else use the API, as this script does.)
>
> I somewhat conservatively only blew away pipelines from before the
> 1st January 2022. I feel like we don't really even need 6 months worth
> of CI job logs, though -- any views on whether we should be pruning
> them more aggressively ?
>

My finger in the air says "more than a month, less than a year."

It can often take a while to notice problems, especially non-fatal ones. If
we had a one month retention we'd likely find we'd need older logs fairly
often. If we expire after a year, we'd never wish we hadn't. Nearly all
problems CI would be helpful to address are found in that time.

Usually, in other projects, almost all issues like this are fixed within a
couple months (often much sooner). That suggests that 4-6 months likely is
the right balance with my personal bias to 6 months unless there is
significant financial or other savings from 4 months.

Warner

> thanks
> -- PMM



* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-08 17:47 how long do we need to retain gitlab CI job stdout logs? Peter Maydell
  2022-08-08 18:01 ` Warner Losh
@ 2022-08-08 18:42 ` Thomas Huth
  2022-08-09  6:15   ` Markus Armbruster
  2022-08-09  8:50   ` Daniel P. Berrangé
  1 sibling, 2 replies; 7+ messages in thread
From: Thomas Huth @ 2022-08-08 18:42 UTC (permalink / raw)
  To: Peter Maydell, QEMU Developers
  Cc: Alex Bennée, Daniel P. Berrange, Richard Henderson

On 08/08/2022 19.47, Peter Maydell wrote:
> Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
> using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
> pointed me at. This script removes old pipelines, which take up a
> lot of storage space for QEMU because they include the stdout logs
> for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
> either by default or configurably -- you have to either manually delete
> the pipeline in the UI or else use the API, as this script does.)
> 
> I somewhat conservatively only blew away pipelines from before the
> 1st January 2022. I feel like we don't really even need 6 months worth
> of CI job logs, though -- any views on whether we should be pruning
> them more aggressively ?

I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the 
logs for one release cycle, so we can check these logs in case we introduced 
a new bug in the current release cycle.

  Thomas




* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-08 18:42 ` Thomas Huth
@ 2022-08-09  6:15   ` Markus Armbruster
  2022-08-09  8:50   ` Daniel P. Berrangé
  1 sibling, 0 replies; 7+ messages in thread
From: Markus Armbruster @ 2022-08-09  6:15 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Peter Maydell, QEMU Developers, Alex Bennée,
	Daniel P. Berrange, Richard Henderson

Thomas Huth <thuth@redhat.com> writes:

> On 08/08/2022 19.47, Peter Maydell wrote:
>> Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
>> using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
>> pointed me at. This script removes old pipelines, which take up a
>> lot of storage space for QEMU because they include the stdout logs
>> for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
>> either by default or configurably -- you have to either manually delete
>> the pipeline in the UI or else use the API, as this script does.)
>> I somewhat conservatively only blew away pipelines from before the
>> 1st January 2022. I feel like we don't really even need 6 months worth
>> of CI job logs, though -- any views on whether we should be pruning
>> them more aggressively ?
>
> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the
> logs for one release cycle, so we can check these logs in case we introduced
> a new bug in the current release cycle.

If this takes too much space, consider keeping every n-th log after a
month.
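
[Editor's illustration.] One way to read the "every n-th log" idea is
the thinning policy below. It is only a sketch under assumptions: the
function name thin_out, the 30-day window, and the default n=10 are
all hypothetical, and each pipeline is represented by a dict with an
'id' and a timezone-aware 'created_at' datetime.

```python
from datetime import datetime, timedelta, timezone


def thin_out(pipelines, now, keep_days=30, n=10):
    """Split pipelines into (keep_ids, delete_ids).

    Everything newer than `keep_days` is kept. Among older pipelines,
    ordered oldest first, every n-th one is kept and the rest deleted.
    """
    cutoff = now - timedelta(days=keep_days)
    recent = [p for p in pipelines if p["created_at"] >= cutoff]
    old = sorted(
        (p for p in pipelines if p["created_at"] < cutoff),
        key=lambda p: p["created_at"],
    )
    keep = [p["id"] for p in recent]
    delete = []
    for i, p in enumerate(old):
        # Index 0, n, 2n, ... of the old pipelines survive.
        (keep if i % n == 0 else delete).append(p["id"])
    return keep, delete


if __name__ == "__main__":
    now = datetime(2022, 8, 9, tzinfo=timezone.utc)
    ages = [0, 10, 40, 50, 60, 70]  # age in days, hypothetical data
    pipes = [{"id": i, "created_at": now - timedelta(days=a)}
             for i, a in enumerate(ages)]
    keep, delete = thin_out(pipes, now, keep_days=30, n=2)
    print("keep:", keep, "delete:", delete)
```

Since GitLab only offers keep or delete per pipeline, a policy like
this would still have to be applied from outside through the API.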




* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-08 18:42 ` Thomas Huth
  2022-08-09  6:15   ` Markus Armbruster
@ 2022-08-09  8:50   ` Daniel P. Berrangé
  2022-08-09  9:44     ` Markus Armbruster
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel P. Berrangé @ 2022-08-09  8:50 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Peter Maydell, QEMU Developers, Alex Bennée,
	Richard Henderson

On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote:
> On 08/08/2022 19.47, Peter Maydell wrote:
> > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
> > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
> > pointed me at. This script removes old pipelines, which take up a
> > lot of storage space for QEMU because they include the stdout logs
> > for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
> > either by default or configurably -- you have to either manually delete
> > the pipeline in the UI or else use the API, as this script does.)
> > 
> > I somewhat conservatively only blew away pipelines from before the
> > 1st January 2022. I feel like we don't really even need 6 months worth
> > of CI job logs, though -- any views on whether we should be pruning
> > them more aggressively ?
> 
> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the
> logs for one release cycle, so we can check these logs in case we introduced
> a new bug in the current release cycle.

Have we ever actually done this in practice ?  I don't think I've ever
looked at a pipeline older than 1-2 weeks in any project I've worked
with on gitlab.

Note that we currently use 165 GB, over an 8 month period (not sure on
the split between container registry and pipeline). I'd guess 4-5 months
might knock another 30-40 GB off our usage, still leaving it huge.

Personally I would suggest 1 month is sufficient for 99% of our needs.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-09  8:50   ` Daniel P. Berrangé
@ 2022-08-09  9:44     ` Markus Armbruster
  2022-08-09 10:42       ` Daniel P. Berrangé
  0 siblings, 1 reply; 7+ messages in thread
From: Markus Armbruster @ 2022-08-09  9:44 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Thomas Huth, Peter Maydell, QEMU Developers, Alex Bennée,
	Richard Henderson

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote:
>> On 08/08/2022 19.47, Peter Maydell wrote:
>> > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
>> > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
>> > pointed me at. This script removes old pipelines, which take up a
>> > lot of storage space for QEMU because they include the stdout logs
>> > for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
>> > either by default or configurably -- you have to either manually delete
>> > the pipeline in the UI or else use the API, as this script does.)
>> > 
>> > I somewhat conservatively only blew away pipelines from before the
>> > 1st January 2022. I feel like we don't really even need 6 months worth
>> > of CI job logs, though -- any views on whether we should be pruning
>> > them more aggressively ?
>> 
>> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the
>> logs for one release cycle, so we can check these logs in case we introduced
>> a new bug in the current release cycle.
>
> Have we ever actually done this in practice ?  I don't think I've ever
> looked at a pipeline older than 1-2 weeks in any project I've worked
> with on gitlab.
>
> Note that we currently use 165 GB, over an 8 month period (not sure on
> the split between container registry and pipeline). I'd guess 4-5 months
> might knock another 30-40 GB off our usage, still leaving it huge.

100GiB is a lot even in 2022.

> Personally I would suggest 1 month is sufficient for 99% of our needs.

Makes sense to me.

If we really need more, maybe look into storing suitable deltas?




* Re: how long do we need to retain gitlab CI job stdout logs?
  2022-08-09  9:44     ` Markus Armbruster
@ 2022-08-09 10:42       ` Daniel P. Berrangé
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2022-08-09 10:42 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Thomas Huth, Peter Maydell, QEMU Developers, Alex Bennée,
	Richard Henderson

On Tue, Aug 09, 2022 at 11:44:52AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote:
> >> On 08/08/2022 19.47, Peter Maydell wrote:
> >> > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!)
> >> > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully
> >> > pointed me at. This script removes old pipelines, which take up a
> >> > lot of storage space for QEMU because they include the stdout logs
> >> > for all the CI jobs in the pipeline. (Gitlab doesn't expire these,
> >> > either by default or configurably -- you have to either manually delete
> >> > the pipeline in the UI or else use the API, as this script does.)
> >> > 
> >> > I somewhat conservatively only blew away pipelines from before the
> >> > 1st January 2022. I feel like we don't really even need 6 months worth
> >> > of CI job logs, though -- any views on whether we should be pruning
> >> > them more aggressively ?
> >> 
> >> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the
> >> logs for one release cycle, so we can check these logs in case we introduced
> >> a new bug in the current release cycle.
> >
> > Have we ever actually done this in practice ?  I don't think I've ever
> > looked at a pipeline older than 1-2 weeks in any project I've worked
> > with on gitlab.
> >
> > Note that we currently use 165 GB, over an 8 month period (not sure on
> > the split between container registry and pipeline). I'd guess 4-5 months
> > might knock another 30-40 GB off our usage, still leaving it huge.
> 
> 100GiB is a lot even in 2022.

BTW, frequent users of gitlab CI should check their forks too

  https://gitlab.com/$USERNAME/qemu/-/usage_quotas

I'm a bit of an extreme case since I run sooooo many pipelines when
working on CI configs, but I was using about 450 GB in my fork !

I can recommend Erik's cleaner script linked above, works fine for
forks too.
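
[Editor's illustration.] Besides the usage_quotas page, the same
numbers can probably be read from the REST API. The sketch below
assumes GET /projects/:id?statistics=true and its "statistics" object
with byte counts in fields ending in "_size" (for example
storage_size and job_artifacts_size). The helper names stats_url and
summarize are hypothetical, and the sample values are made up.

```python
from urllib.parse import quote

API = "https://gitlab.com/api/v4"  # GitLab REST API base URL
GIB = 1024 ** 3


def stats_url(project_path):
    """URL that asks GitLab to include storage statistics for a project."""
    return f"{API}/projects/{quote(project_path, safe='')}?statistics=true"


def summarize(stats):
    """Convert every '*_size' byte count in `stats` to GiB, one decimal."""
    return {k: round(v / GIB, 1) for k, v in stats.items()
            if k.endswith("_size")}


if __name__ == "__main__":
    # Made-up numbers, shaped like the assumed 'statistics' object.
    sample = {
        "commit_count": 5,
        "storage_size": 2 * GIB,
        "job_artifacts_size": 3 * GIB // 2,
    }
    print(stats_url("someuser/qemu"))
    print(summarize(sample))
```

Which of these fields includes the job stdout logs is an assumption to
verify against the GitLab documentation before relying on it.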

> > Personally I would suggest 1 month is sufficient for 99% of our needs.
> 
> Makes sense to me.
> 
> If we really need more, maybe look into storing suitable deltas?

We don't really have control over how stuff is stored. GitLab just
captures stdout/stderr from the jobs and presents that. Our options are
keep it, or delete it. For anything else, we would have to download
it and store it outside of gitlab, which doesn't look like it's a
good use of time.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




end of thread, other threads:[~2022-08-09 10:44 UTC | newest]

Thread overview: 7+ messages
2022-08-08 17:47 how long do we need to retain gitlab CI job stdout logs? Peter Maydell
2022-08-08 18:01 ` Warner Losh
2022-08-08 18:42 ` Thomas Huth
2022-08-09  6:15   ` Markus Armbruster
2022-08-09  8:50   ` Daniel P. Berrangé
2022-08-09  9:44     ` Markus Armbruster
2022-08-09 10:42       ` Daniel P. Berrangé
