* how long do we need to retain gitlab CI job stdout logs? @ 2022-08-08 17:47 Peter Maydell 2022-08-08 18:01 ` Warner Losh 2022-08-08 18:42 ` Thomas Huth 0 siblings, 2 replies; 7+ messages in thread From: Peter Maydell @ 2022-08-08 17:47 UTC (permalink / raw) To: QEMU Developers; +Cc: Alex Bennée, Daniel P. Berrange, Richard Henderson Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully pointed me at. This script removes old pipelines, which take up a lot of storage space for QEMU because they include the stdout logs for all the CI jobs in the pipeline. (Gitlab doesn't expire these, either by default or configurably -- you have to either manually delete the pipeline in the UI or else use the API, as this script does.) I somewhat conservatively only blew away pipelines from before the 1st January 2022. I feel like we don't really even need 6 months worth of CI job logs, though -- any views on whether we should be pruning them more aggressively ? thanks -- PMM ^ permalink raw reply [flat|nested] 7+ messages in thread
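For readers who want to see what "use the API" amounts to, here is a minimal sketch of pruning pipelines older than a cutoff date through the GitLab REST API. It is not the gitlab_cleaner script itself. The project path `qemu-project/qemu`, the helper names, and the token handling are assumptions made for illustration. To the best of my knowledge, deleting a pipeline needs an access token with sufficient rights on the project (Owner at the time of writing).

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

API = "https://gitlab.com/api/v4"  # gitlab.com REST API root


def pipelines_url(project, cutoff, page=1, per_page=100):
    """Build the URL listing pipelines last updated before `cutoff`.

    `project` is a "namespace/name" path; `cutoff` is a tz-aware datetime.
    """
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urllib.parse.urlencode(
        {"updated_before": stamp, "per_page": per_page, "page": page}
    )
    project_id = urllib.parse.quote(project, safe="")  # "a/b" -> "a%2Fb"
    return f"{API}/projects/{project_id}/pipelines?{query}"


def delete_old_pipelines(project, cutoff, token, dry_run=True):
    """Collect pipelines older than `cutoff`; delete them unless dry_run."""
    headers = {"PRIVATE-TOKEN": token}
    doomed = []
    page = 1
    # Gather every ID first so deletions cannot shift the pagination.
    while True:
        req = urllib.request.Request(
            pipelines_url(project, cutoff, page), headers=headers
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            break
        doomed.extend(p["id"] for p in batch)
        page += 1
    if not dry_run:
        project_id = urllib.parse.quote(project, safe="")
        for pipeline_id in doomed:
            req = urllib.request.Request(
                f"{API}/projects/{project_id}/pipelines/{pipeline_id}",
                headers=headers,
                method="DELETE",
            )
            urllib.request.urlopen(req).close()
    return doomed


# Hypothetical usage: list what would go, without deleting anything.
# cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)
# print(len(delete_old_pipelines("qemu-project/qemu", cutoff, "TOKEN")))
```

Running with `dry_run=True` first shows how many pipelines fall before the cutoff, which seems a sensible check before anything is removed for good.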
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-08 17:47 how long do we need to retain gitlab CI job stdout logs? Peter Maydell @ 2022-08-08 18:01 ` Warner Losh 2022-08-08 18:42 ` Thomas Huth 1 sibling, 0 replies; 7+ messages in thread From: Warner Losh @ 2022-08-08 18:01 UTC (permalink / raw) To: Peter Maydell Cc: QEMU Developers, Alex Bennée, Daniel P. Berrange, Richard Henderson On Mon, Aug 8, 2022, 11:49 AM Peter Maydell <peter.maydell@linaro.org> wrote: > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully > pointed me at. This script removes old pipelines, which take up a > lot of storage space for QEMU because they include the stdout logs > for all the CI jobs in the pipeline. (Gitlab doesn't expire these, > either by default or configurably -- you have to either manually delete > the pipeline in the UI or else use the API, as this script does.) > > I somewhat conservatively only blew away pipelines from before the > 1st January 2022. I feel like we don't really even need 6 months worth > of CI job logs, though -- any views on whether we should be pruning > them more aggressively ? My finger in the air says "more than a month, less than a year." It can often take a while to notice problems, especially non-fatal ones. If we had a one month retention we'd likely find we'd need older logs fairly often. If we expire after a year, we'd never wish we hadn't. Nearly all problems CI would be helpful to address are found in that time. Usually, in other projects, almost all issues like this are fixed within a couple of months (often much sooner). That suggests that 4-6 months is likely the right balance, with my personal bias towards 6 months unless there are significant financial or other savings from 4 months.
Warner ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-08 17:47 how long do we need to retain gitlab CI job stdout logs? Peter Maydell 2022-08-08 18:01 ` Warner Losh @ 2022-08-08 18:42 ` Thomas Huth 2022-08-09 6:15 ` Markus Armbruster 2022-08-09 8:50 ` Daniel P. Berrangé 1 sibling, 2 replies; 7+ messages in thread From: Thomas Huth @ 2022-08-08 18:42 UTC (permalink / raw) To: Peter Maydell, QEMU Developers Cc: Alex Bennée, Daniel P. Berrange, Richard Henderson On 08/08/2022 19.47, Peter Maydell wrote: > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully > pointed me at. This script removes old pipelines, which take up a > lot of storage space for QEMU because they include the stdout logs > for all the CI jobs in the pipeline. (Gitlab doesn't expire these, > either by default or configurably -- you have to either manually delete > the pipeline in the UI or else use the API, as this script does.) > > I somewhat conservatively only blew away pipelines from before the > 1st January 2022. I feel like we don't really even need 6 months worth > of CI job logs, though -- any views on whether we should be pruning > them more aggressively ? I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the logs for one release cycle, so we can check these logs in case we introduced a new bug in the current release cycle. Thomas ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-08 18:42 ` Thomas Huth @ 2022-08-09 6:15 ` Markus Armbruster 2022-08-09 8:50 ` Daniel P. Berrangé 1 sibling, 0 replies; 7+ messages in thread From: Markus Armbruster @ 2022-08-09 6:15 UTC (permalink / raw) To: Thomas Huth Cc: Peter Maydell, QEMU Developers, Alex Bennée, Daniel P. Berrange, Richard Henderson Thomas Huth <thuth@redhat.com> writes: > On 08/08/2022 19.47, Peter Maydell wrote: >> Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) >> using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully >> pointed me at. This script removes old pipelines, which take up a >> lot of storage space for QEMU because they include the stdout logs >> for all the CI jobs in the pipeline. (Gitlab doesn't expire these, >> either by default or configurably -- you have to either manually delete >> the pipeline in the UI or else use the API, as this script does.) >> I somewhat conservatively only blew away pipelines from before the >> 1st January 2022. I feel like we don't really even need 6 months worth >> of CI job logs, though -- any views on whether we should be pruning >> them more aggressively ? > > I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the logs for one release cycle, so we can check these logs in case we introduced > a new bug in the current release cycle. If this takes too much space, consider keeping every n-th log after a month. ^ permalink raw reply [flat|nested] 7+ messages in thread
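The "keep every n-th log after a month" idea can be sketched as a pure selection step that decides which pipelines to delete. This is only an illustration: the function name, the `(id, created_at)` input shape, and the default of keeping every 10th pipeline are assumptions, not anything proposed in the thread.

```python
from datetime import datetime, timedelta, timezone


def select_for_deletion(pipelines, now, keep_all_days=30, keep_every=10):
    """Return the IDs of pipelines to delete.

    Pipelines newer than `keep_all_days` are all kept. Older ones are
    sorted oldest first and only every `keep_every`-th one survives.
    `pipelines` is an iterable of (id, created_at) pairs, with tz-aware
    datetimes.
    """
    threshold = now - timedelta(days=keep_all_days)
    old = sorted(
        (p for p in pipelines if p[1] < threshold), key=lambda p: p[1]
    )
    return [
        pipeline_id
        for index, (pipeline_id, _) in enumerate(old)
        if index % keep_every != 0
    ]


# Hypothetical usage with one pipeline per day over the last 60 days.
now = datetime(2022, 8, 9, tzinfo=timezone.utc)
history = [(day, now - timedelta(days=day)) for day in range(1, 61)]
doomed = select_for_deletion(history, now)
```

The returned IDs could then be fed to whatever deletion mechanism is in use. Keeping the selection separate from the deletion makes the policy easy to check before anything is removed.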
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-08 18:42 ` Thomas Huth 2022-08-09 6:15 ` Markus Armbruster @ 2022-08-09 8:50 ` Daniel P. Berrangé 2022-08-09 9:44 ` Markus Armbruster 1 sibling, 1 reply; 7+ messages in thread From: Daniel P. Berrangé @ 2022-08-09 8:50 UTC (permalink / raw) To: Thomas Huth Cc: Peter Maydell, QEMU Developers, Alex Bennée, Richard Henderson On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote: > On 08/08/2022 19.47, Peter Maydell wrote: > > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) > > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully > > pointed me at. This script removes old pipelines, which take up a > > lot of storage space for QEMU because they include the stdout logs > > for all the CI jobs in the pipeline. (Gitlab doesn't expire these, > > either by default or configurably -- you have to either manually delete > > the pipeline in the UI or else use the API, as this script does.) > > > > I somewhat conservatively only blew away pipelines from before the > > 1st January 2022. I feel like we don't really even need 6 months worth > > of CI job logs, though -- any views on whether we should be pruning > > them more aggressively ? > > I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the > logs for one release cycle, so we can check these logs in case we introduced > a new bug in the current release cycle. Have we ever actually done this in practice ? I don't think I've ever looked at a pipeline older than 1-2 weeks in any project I've worked with on gitlab. Note that we currently use 165 GB, over an 8 month period (not sure on the split between container registry and pipeline). I'd guess 4-5 months might knock another 30-40 GB off our usage, still leaving it huge. Personally I would suggest 1 month is sufficient for 99% of our needs. 
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-09 8:50 ` Daniel P. Berrangé @ 2022-08-09 9:44 ` Markus Armbruster 2022-08-09 10:42 ` Daniel P. Berrangé 0 siblings, 1 reply; 7+ messages in thread From: Markus Armbruster @ 2022-08-09 9:44 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Thomas Huth, Peter Maydell, QEMU Developers, Alex Bennée, Richard Henderson Daniel P. Berrangé <berrange@redhat.com> writes: > On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote: >> On 08/08/2022 19.47, Peter Maydell wrote: >> > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) >> > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully >> > pointed me at. This script removes old pipelines, which take up a >> > lot of storage space for QEMU because they include the stdout logs >> > for all the CI jobs in the pipeline. (Gitlab doesn't expire these, >> > either by default or configurably -- you have to either manually delete >> > the pipeline in the UI or else use the API, as this script does.) >> > >> > I somewhat conservatively only blew away pipelines from before the >> > 1st January 2022. I feel like we don't really even need 6 months worth >> > of CI job logs, though -- any views on whether we should be pruning >> > them more aggressively ? >> >> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the >> logs for one release cycle, so we can check these logs in case we introduced >> a new bug in the current release cycle. > > Have we ever actually done this in practice ? I don't think I've ever > looked at a pipeline older than 1-2 weeks in any project I've worked > with on gitlab. > > Note that we currently use 165 GB, over an 8 month period (not sure on > the split between container registry and pipeline). I'd guess 4-5 months > might knock another 30-40 GB off our usage, still leaving it huge. 100GiB is a lot even in 2022. 
> Personally I would suggest 1 month is sufficient for 99% of our needs. Makes sense to me. If we really need more, maybe look into storing suitable deltas? ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: how long do we need to retain gitlab CI job stdout logs? 2022-08-09 9:44 ` Markus Armbruster @ 2022-08-09 10:42 ` Daniel P. Berrangé 0 siblings, 0 replies; 7+ messages in thread From: Daniel P. Berrangé @ 2022-08-09 10:42 UTC (permalink / raw) To: Markus Armbruster Cc: Thomas Huth, Peter Maydell, QEMU Developers, Alex Bennée, Richard Henderson On Tue, Aug 09, 2022 at 11:44:52AM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Mon, Aug 08, 2022 at 08:42:28PM +0200, Thomas Huth wrote: > >> On 08/08/2022 19.47, Peter Maydell wrote: > >> > Hi; I just reduced QEMU's storage usage on gitlab by 130GB (no typo!) > >> > using https://gitlab.com/eskultety/gitlab_cleaner, which Dan helpfully > >> > pointed me at. This script removes old pipelines, which take up a > >> > lot of storage space for QEMU because they include the stdout logs > >> > for all the CI jobs in the pipeline. (Gitlab doesn't expire these, > >> > either by default or configurably -- you have to either manually delete > >> > the pipeline in the UI or else use the API, as this script does.) > >> > > >> > I somewhat conservatively only blew away pipelines from before the > >> > 1st January 2022. I feel like we don't really even need 6 months worth > >> > of CI job logs, though -- any views on whether we should be pruning > >> > them more aggressively ? > >> > >> I'd say we should at least keep the logs of the last 4 to 5 months, i.e. the > >> logs for one release cycle, so we can check these logs in case we introduced > >> a new bug in the current release cycle. > > > > Have we ever actually done this in practice ? I don't think I've ever > > looked at a pipeline older than 1-2 weeks in any project I've worked > > with on gitlab. > > > > Note that we currently use 165 GB, over an 8 month period (not sure on > > the split between container registry and pipeline). I'd guess 4-5 months > > might knock another 30-40 GB off our usage, still leaving it huge. 
> > 100GiB is a lot even in 2022. BTW, frequent users of gitlab CI should check their forks too https://gitlab.com/$USERNAME/qemu/-/usage_quotas I'm a bit of an extreme case since I run sooooo many pipelines when working on CI configs, but I was using about 450 GB in my fork ! I can recommend Erik's cleaner script linked above, works fine for forks too. > > Personally I would suggest 1 month is sufficient for 99% of our needs. > > Makes sense to me. > > If we really need more, maybe look into storing suitable deltas? We don't really have control over how stuff is stored. GitLab just captures stdout/err from the jobs and presents that. Our options are to keep it or delete it. For anything else, we would have to download it and store it outside of gitlab, which doesn't look like it's a good use of time. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 7+ messages in thread
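As an alternative to opening the usage_quotas page by hand, the same kind of figures can be read from the projects API by requesting `statistics=true`. This is a minimal sketch: the helper names and the choice of fields are assumptions, the token needs enough access to see project statistics, and to my understanding job logs are counted under `job_artifacts_size`, which is worth verifying against the usage page.

```python
import json
import urllib.parse
import urllib.request

API = "https://gitlab.com/api/v4"  # gitlab.com REST API root


def fetch_statistics(project, token):
    """Fetch the `statistics` object for a "namespace/name" project path."""
    project_id = urllib.parse.quote(project, safe="")
    req = urllib.request.Request(
        f"{API}/projects/{project_id}?statistics=true",
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("statistics", {})


def summarize(
    stats,
    fields=("storage_size", "repository_size", "job_artifacts_size"),
):
    """Convert byte counts to GB (10**9 bytes), rounded to one decimal."""
    return {name: round(stats.get(name, 0) / 1e9, 1) for name in fields}


# Hypothetical usage for a personal fork:
# print(summarize(fetch_statistics("USERNAME/qemu", "TOKEN")))
```

Fields missing from the response are reported as 0.0 rather than raising, so the summary still prints when a statistic is unavailable.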