From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yann E. MORIN
Date: Sun, 14 Feb 2021 10:14:05 +0100
Subject: [Buildroot] [PATCH] support/scripts/cpedb.py: drop CPE XML database caching
In-Reply-To: <20210213221948.25889-1-thomas.petazzoni@bootlin.com>
References: <20210213221948.25889-1-thomas.petazzoni@bootlin.com>
Message-ID: <20210214091405.GD2740149@scaer>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: buildroot@busybox.net

Thomas, All,

On 2021-02-13 23:19 +0100, Thomas Petazzoni spake thusly:
> Currently, the CPE XML database is parsed into a Python dict, which is
> then pickled into a local file, to speed up the processing of further
> invocations.
>
> However, it turns out that since the initial implementation, we have
> switched the XML parsing from the out of tree xmltodict module to the
> standard ElementTree one, which has made the parsing much faster. The
> pickle caching only saves 6 seconds, on something that takes more than
> 13 minutes total.
>
> In addition, this pickle caching consumes a significant amount of RAM,
> causing the Python process to be OOM-killed on a server with 4 GB of
> RAM.
>
> So let's just drop this caching entirely.
>
> Signed-off-by: Thomas Petazzoni
> ---
> This should be applied to master and next. Indeed, the pkg-stats
> results used for autobuild.buildroot.org/stats/ are currently done on
> next, but we also probably want people to have this change in master
> for the 2021.02 release.

Applied to master and next, thanks.

Note a comment below...

> ---
>  support/scripts/cpedb.py | 40 ++++----------------------------------
>  1 file changed, 6 insertions(+), 34 deletions(-)
>
> diff --git a/support/scripts/cpedb.py b/support/scripts/cpedb.py
> index 825ed6cb1e..b1e7e7012c 100644
> --- a/support/scripts/cpedb.py
> +++ b/support/scripts/cpedb.py
[--SNIP--]
> @@ -121,24 +105,12 @@ class CPEDB:
>          cpe_dict = requests.get(CPEDB_URL)
>          open(cpe_dict_local, "wb").write(cpe_dict.content)
>
> -        cache_all_cpes = os.path.join(self.nvd_path, "cpe", "all_cpes.pkl")
> -        cache_all_cpes_no_version = os.path.join(self.nvd_path, "cpe", "all_cpes_no_version.pkl")
> -
> -        if not os.path.exists(cache_all_cpes) or \
> -           not os.path.exists(cache_all_cpes_no_version) or \
> -           os.stat(cache_all_cpes).st_mtime < os.stat(cpe_dict_local).st_mtime or \
> -           os.stat(cache_all_cpes_no_version).st_mtime < os.stat(cpe_dict_local).st_mtime:
> -            self.gen_cached_cpedb(cpe_dict_local,
> -                                  cache_all_cpes,
> -                                  cache_all_cpes_no_version)
> -
> -        print("CPE: Loading CACHED dictionary")
> -        cpe_file = open(cache_all_cpes, 'rb')
> -        self.all_cpes = pickle.load(cpe_file)
> -        cpe_file.close()
> -        cpe_file = open(cache_all_cpes_no_version, 'rb')
> -        self.all_cpes_no_version = pickle.load(cpe_file)
> -        cpe_file.close()
> +        print("CPE: Unzipping xml manifest...")
> +        nist_cpe_file = gzip.GzipFile(fileobj=open(cpe_dict_local, 'rb'))
> +        print("CPE: Converting xml manifest to dict...")
> +        tree = ET.parse(nist_cpe_file)

Once your nist_cpe_file has been parsed, you could delete it to reclaim
some memory:

    del nist_cpe_file

And maybe do the same for a few other intermediate blobs that are really
big...

Regards,
Yann E. MORIN.

> +        all_cpedb = tree.getroot()
> +        self.parse_dict(all_cpedb)
>
>      def parse_dict(self, all_cpedb):
>          # Cycle through the dict and build two dict to be used for custom
> --
> 2.29.2
>

--
Yann E. MORIN | Real-Time Embedded Software Designer
http://ymorin.is-a-geek.org/
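
For illustration only, here is a minimal sketch of the "delete the
intermediate blobs" suggestion. It is not the actual cpedb.py code: the
helper name load_cpe_names, the sample data and the simplified,
namespace-free element layout are all assumptions made up for this
example. Note that del only drops a reference; the memory is reclaimed
only once nothing else refers to the object, so the real gain depends on
what else still holds it.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def load_cpe_names(gz_file):
    """Return the 'name' attribute of every cpe-item in a gzipped XML stream.

    Hypothetical helper: the element layout is a simplified stand-in
    for the real CPE dictionary (no XML namespaces).
    """
    # The context manager closes the decompression stream once parsing is done.
    with gzip.GzipFile(fileobj=gz_file) as nist_cpe_file:
        tree = ET.parse(nist_cpe_file)
    # The stream is no longer needed: drop our reference to it.
    del nist_cpe_file

    names = [item.get("name") for item in tree.getroot().iter("cpe-item")]
    # Only the extracted names are needed from here on: drop the full tree too.
    del tree
    return names


# Made-up sample data, compressed in memory so the sketch is self-contained.
sample = gzip.compress(
    b'<cpe-list>'
    b'<cpe-item name="cpe:/a:busybox:busybox:1.33.0"/>'
    b'<cpe-item name="cpe:/a:gnu:glibc:2.32"/>'
    b'</cpe-list>'
)
names = load_cpe_names(io.BytesIO(sample))
print(names)
```

If peak memory during parsing is the real concern, ET.iterparse() with
elem.clear() on each processed element is another option to consider,
since it avoids keeping the whole tree around at once.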