Buildroot Archive on lore.kernel.org
From: Yann E. MORIN <yann.morin.1998@free.fr>
To: buildroot@busybox.net
Subject: [Buildroot] [PATCH] support/scripts/cpedb.py: drop CPE XML database caching
Date: Sun, 14 Feb 2021 10:14:05 +0100
Message-ID: <20210214091405.GD2740149@scaer>
In-Reply-To: <20210213221948.25889-1-thomas.petazzoni@bootlin.com>

Thomas, All,

On 2021-02-13 23:19 +0100, Thomas Petazzoni spake thusly:
> Currently, the CPE XML database is parsed into a Python dict, which is
> then pickled into a local file, to speed up the processing of further
> invocations.
> 
> However, it turns out that since the initial implementation, we have
> switched the XML parsing from the out of tree xmltodict module to the
> standard ElementTree one, which has made the parsing much faster. The
> pickle caching only saves 6 seconds, on something that takes more than
> 13 minutes total.
> 
> In addition, this pickle caching consumes a significant amount of RAM,
> causing the Python process to be OOM-killed on a server with 4 GB of
> RAM.
> 
> So let's just drop this caching entirely.
> 
> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> ---
> This should be applied to master and next. Indeed, the pkg-stats
> results used for autobuild.buildroot.org/stats/ are currently done on
> next, but we also probably want people to have this change in master
> for the 2021.02 release.

Applied to master and next, thanks.

Note a comment below...

> ---
>  support/scripts/cpedb.py | 40 ++++++----------------------------------
>  1 file changed, 6 insertions(+), 34 deletions(-)
> 
> diff --git a/support/scripts/cpedb.py b/support/scripts/cpedb.py
> index 825ed6cb1e..b1e7e7012c 100644
> --- a/support/scripts/cpedb.py
> +++ b/support/scripts/cpedb.py
[--SNIP--]
> @@ -121,24 +105,12 @@ class CPEDB:
>              cpe_dict = requests.get(CPEDB_URL)
>              open(cpe_dict_local, "wb").write(cpe_dict.content)
>  
> -        cache_all_cpes = os.path.join(self.nvd_path, "cpe", "all_cpes.pkl")
> -        cache_all_cpes_no_version = os.path.join(self.nvd_path, "cpe", "all_cpes_no_version.pkl")
> -
> -        if not os.path.exists(cache_all_cpes) or \
> -           not os.path.exists(cache_all_cpes_no_version) or \
> -           os.stat(cache_all_cpes).st_mtime < os.stat(cpe_dict_local).st_mtime or \
> -           os.stat(cache_all_cpes_no_version).st_mtime < os.stat(cpe_dict_local).st_mtime:
> -            self.gen_cached_cpedb(cpe_dict_local,
> -                                  cache_all_cpes,
> -                                  cache_all_cpes_no_version)
> -
> -        print("CPE: Loading CACHED dictionary")
> -        cpe_file = open(cache_all_cpes, 'rb')
> -        self.all_cpes = pickle.load(cpe_file)
> -        cpe_file.close()
> -        cpe_file = open(cache_all_cpes_no_version, 'rb')
> -        self.all_cpes_no_version = pickle.load(cpe_file)
> -        cpe_file.close()
> +        print("CPE: Unzipping xml manifest...")
> +        nist_cpe_file = gzip.GzipFile(fileobj=open(cpe_dict_local, 'rb'))
> +        print("CPE: Converting xml manifest to dict...")
> +        tree = ET.parse(nist_cpe_file)

Once your nist_cpe_file has been parsed, you could delete it to reclaim
some memory:

    del nist_cpe_file

And maybe do the same for a few other intermediate blobs that are really big...
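
A minimal sketch of the idea, not tested against cpedb.py itself: the
function name load_cpe_names() and the flat, namespace-free XML layout
(a root element holding children with a "name" attribute) are assumptions
made for illustration only. Note that "del" only drops a reference; the
memory is released once nothing else refers to the object.

```python
import gzip
import xml.etree.ElementTree as ET


def load_cpe_names(gz_path):
    """Parse a gzip-compressed XML file and return the 'name' attribute
    of each direct child of the root element (hypothetical layout)."""
    # The context managers close the raw file and the gzip wrapper as
    # soon as parsing is finished, so neither outlives the parse step.
    with open(gz_path, "rb") as raw, gzip.GzipFile(fileobj=raw) as xml_file:
        tree = ET.parse(xml_file)

    root = tree.getroot()
    names = [item.get("name") for item in root]

    # Drop the references to the large element tree before any further
    # processing; the list of strings is all that is still needed.
    del root, tree
    return names
```

The "del" has to come before the expensive follow-up work to be of any
use; at the very end of a function the locals go away on return anyway.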

Regards,
Yann E. MORIN.

> +        all_cpedb = tree.getroot()
> +        self.parse_dict(all_cpedb)
>  
>      def parse_dict(self, all_cpedb):
>          # Cycle through the dict and build two dict to be used for custom
> -- 
> 2.29.2
> 

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'
