From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH] support/scripts/cpedb.py: drop CPE XML database caching
Date: Sun, 14 Feb 2021 10:14:05 +0100
Message-ID: <20210214091405.GD2740149@scaer>
In-Reply-To: <20210213221948.25889-1-thomas.petazzoni@bootlin.com>
Thomas, All,
On 2021-02-13 23:19 +0100, Thomas Petazzoni spake thusly:
> Currently, the CPE XML database is parsed into a Python dict, which is
> then pickled into a local file, to speed up the processing of further
> invocations.
>
> However, it turns out that since the initial implementation, we have
> switched the XML parsing from the out of tree xmltodict module to the
> standard ElementTree one, which has made the parsing much faster. The
> pickle caching only saves 6 seconds, on something that takes more than
> 13 minutes total.
>
> In addition, this pickle caching consumes a significant amount of RAM,
> causing the Python process to be OOM-killed on a server with 4 GB of
> RAM.
>
> So let's just drop this caching entirely.
>
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
> This should be applied to master and next. Indeed, the pkg-stats
> results used for autobuild.buildroot.org/stats/ are currently done on
> next, but we also probably want people to have this change in master
> for the 2021.02 release.
Applied to master and next, thanks.
Note a comment below...
> ---
> support/scripts/cpedb.py | 40 ++++++----------------------------------
> 1 file changed, 6 insertions(+), 34 deletions(-)
>
> diff --git a/support/scripts/cpedb.py b/support/scripts/cpedb.py
> index 825ed6cb1e..b1e7e7012c 100644
> --- a/support/scripts/cpedb.py
> +++ b/support/scripts/cpedb.py
[--SNIP--]
> @@ -121,24 +105,12 @@ class CPEDB:
> cpe_dict = requests.get(CPEDB_URL)
> open(cpe_dict_local, "wb").write(cpe_dict.content)
>
> - cache_all_cpes = os.path.join(self.nvd_path, "cpe", "all_cpes.pkl")
> - cache_all_cpes_no_version = os.path.join(self.nvd_path, "cpe", "all_cpes_no_version.pkl")
> -
> - if not os.path.exists(cache_all_cpes) or \
> - not os.path.exists(cache_all_cpes_no_version) or \
> - os.stat(cache_all_cpes).st_mtime < os.stat(cpe_dict_local).st_mtime or \
> - os.stat(cache_all_cpes_no_version).st_mtime < os.stat(cpe_dict_local).st_mtime:
> - self.gen_cached_cpedb(cpe_dict_local,
> - cache_all_cpes,
> - cache_all_cpes_no_version)
> -
> - print("CPE: Loading CACHED dictionary")
> - cpe_file = open(cache_all_cpes, 'rb')
> - self.all_cpes = pickle.load(cpe_file)
> - cpe_file.close()
> - cpe_file = open(cache_all_cpes_no_version, 'rb')
> - self.all_cpes_no_version = pickle.load(cpe_file)
> - cpe_file.close()
> + print("CPE: Unzipping xml manifest...")
> + nist_cpe_file = gzip.GzipFile(fileobj=open(cpe_dict_local, 'rb'))
> + print("CPE: Converting xml manifest to dict...")
> + tree = ET.parse(nist_cpe_file)
Once your nist_cpe_file has been parsed, you could delete it to reclaim
some memory:

    del nist_cpe_file

And maybe do so for a few other intermediate blobs that are really big...
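
To illustrate, here is a minimal standalone sketch of that idea, not the
actual cpedb.py code. The helper name parse_cpe_names and the tiny sample
XML are made up for the example, and the real NVD dictionary uses XML
namespaces that are ignored here. It closes the gzip stream as soon as
parsing is done and drops the references to the tree once the useful data
has been extracted. How much memory this actually gives back is something
to measure, not assume.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_cpe_names(cpe_dict_local):
    """Return the 'name' attribute of each top-level element (hypothetical helper)."""
    # The context manager closes the gzip stream and the underlying file
    # as soon as the XML has been parsed.
    with gzip.open(cpe_dict_local, "rb") as nist_cpe_file:
        tree = ET.parse(nist_cpe_file)
    all_cpedb = tree.getroot()
    names = [item.get("name") for item in all_cpedb]
    # Drop the references to the big intermediate objects. In a longer
    # method, they would otherwise stay alive until the method returns.
    del tree, all_cpedb
    return names


# Tiny stand-in for the real dictionary, for demonstration only.
sample = b'<cpe-list><cpe-item name="cpe:/a:busybox:busybox:1.33.0"/></cpe-list>'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml.gz")
    with gzip.open(path, "wb") as f:
        f.write(sample)
    names = parse_cpe_names(path)
print(names)
```

If the peak during parsing itself is the problem, ET.iterparse() with
elem.clear() on each processed element might be worth a look, but that
would be a bigger change than this patch.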
Regards,
Yann E. MORIN.
> + all_cpedb = tree.getroot()
> + self.parse_dict(all_cpedb)
>
> def parse_dict(self, all_cpedb):
> # Cycle through the dict and build two dict to be used for custom
> --
> 2.29.2
>
--
.-----------------.--------------------.------------------.--------------------.
| Yann E. MORIN | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software Designer | \ / CAMPAIGN | ___ |
| +33 561 099 427 `------------.-------: X AGAINST | \e/ There is no |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL | v conspiracy. |
'------------------------------^-------^------------------^--------------------'