public inbox for openembedded-core@lists.openembedded.org
From: Richard Purdie <richard.purdie@linuxfoundation.org>
To: Marta Rybczynska <rybczynska@gmail.com>
Cc: mrybczynska@syslinbit.com,
	openembedded-core@lists.openembedded.org,
	 Marta Rybczynska <marta.rybczynska@syslinbit.com>,
	Samantha Jalabert <samantha.jalabert@syslinbit.com>
Subject: Re: [OE-core][PATCH v3 1/5] cve-check: annotate CVEs during analysis
Date: Fri, 02 Aug 2024 14:00:29 +0100
Message-ID: <44cf64be9b222545d86163afc0c6671ace33fa42.camel@linuxfoundation.org>
In-Reply-To: <CAApg2=RRLDsj7jnQh=JJTZOzF2KsMvhqy9UKvxs5igX_TYAmwg@mail.gmail.com>

On Fri, 2024-08-02 at 14:50 +0200, Marta Rybczynska wrote:
> On Thu, Aug 1, 2024 at 4:25 PM Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > On Fri, 2024-07-26 at 15:02 +0200, Marta Rybczynska wrote:
> > > On Thu, Jul 25, 2024 at 5:27 PM Richard Purdie
> > > <richard.purdie@linuxfoundation.org> wrote:
> > > > On Thu, 2024-07-25 at 16:48 +0200, mrybczynska@syslinbit.com
> > > > wrote:
> > > > > On 25.07.2024 16:29, Richard Purdie wrote:
> > > > > > Hi Marta,
> > > > > > 
> > > > > > 
> > > > > > With the v3 series applied we did just see this on the
> > > > > > autobuilder
> > > > > > unfortunately so I'm not sure that problem is addressed:
> > > > > > 
> > > > > > https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/7004/steps/14/logs/stdio
> > > > > > 
> > > > > 
> > > > > Hello Richard,
> > > > > Thanks, this is unfortunate. Is it possible to have a copy of
> > > > > the
> > > > > corrupted database somewhere?
> > > > 
> > > > I think it is transient as we never clean it up and not all
> > > > tasks fail.
> > > > That seems to imply it is a race of some kind.
> > > 
> > > I have a few ideas of what it might be, but I do not have a
> > > reproducer right now. With the vex changes, the duration of the
> > > cve_check operation changed slightly. On the other hand, the
> > > database download is slower these days (I have had standalone
> > > runs that lasted for 5+ hours). Also, I noticed that some of the
> > > builds were cancelled, so cancellation of the download may be in
> > > play too.
> > > 
> > > A question: does the autobuilder configuration share DL_DIR among
> > > multiple builds?
> > 
> > DL_DIR is shared between all the workers over NFS.
> > 
> > > My possibility list right now:
> > > - the "download" job timeout is too short
> > > - download failure/timeout
> > > - job cancellation during the download
> > 
> > While a download is in progress, the exclusive lock should be held.
> > If the database were damaged, I'd then expect all subsequent
> > cve_check tasks to fail the same way.
> > 
> > In the failures, 2 or 3 tasks fail, the rest all continue to work.
> > So it doesn't really fit.
> > 
> 
> 
> I would suspect there's an additional fetch job running somewhere
> that manages to do the download. From that point, subsequent checks
> will work. But where that corruption comes from, I have no idea.

We only ever update the database through the recipe though, right? That
recipe does have the correct lockfile specified for do_fetch? That
should mean it always has an exclusive lock when updating.
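
A minimal sketch of the expectation above, in plain Python: the writer
takes an exclusive lock before touching the database file. This is an
illustration using fcntl.flock, not the BitBake lockfile code; the name
update_db_locked and the file names are hypothetical, and how such a lock
behaves over NFS depends on the client and server.

```python
import fcntl
import os
import tempfile


def update_db_locked(lock_path, db_path, payload):
    """Write payload to db_path while holding an exclusive lock on lock_path."""
    # Open (or create) the lock file; the lock is tied to this descriptor.
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds a lock on the file.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            with open(db_path, "wb") as db:
                db.write(payload)
                db.flush()
                # Push the data to storage before the lock is released.
                os.fsync(db.fileno())
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical paths in a scratch directory, for demonstration only.
    workdir = tempfile.mkdtemp()
    update_db_locked(
        os.path.join(workdir, "db.lock"),
        os.path.join(workdir, "db.bin"),
        b"example",
    )
```

If every writer goes through a path like this, two updates cannot
interleave, which is why a parallel fetch would only explain the failures
if the lock were not being honoured.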

> I've noticed that tests *could* cause a database update and that the
> temporary download path will be the same for all instances
> (CVE_DB_TEMP_FILE). This could cause corruption if the lock doesn't
> work as we expect it to.

I did some tests and the lock does work between different autobuilder
nodes. We've noticed that the issue only seems to happen on Ubuntu
22.04, which makes me wonder if there is a bug somewhere there, such as
in the host sqlite3?
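
One way to compare hosts, sketched under assumptions: report the sqlite
library version that Python is linked against and run sqlite's own
integrity check on a database file. The helper name check_db is
hypothetical, and the path has to be replaced with the real database;
whether the library seen by this script is the same one used during the
build is itself an assumption to verify.

```python
import sqlite3


def check_db(db_path):
    """Return (sqlite library version, rows reported by integrity_check)."""
    conn = sqlite3.connect(db_path)
    try:
        # A healthy database yields a single row containing "ok".
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
    return sqlite3.sqlite_version, rows


if __name__ == "__main__":
    # ":memory:" is a placeholder; point this at the real database file.
    version, result = check_db(":memory:")
    print("sqlite library:", version)
    print("integrity_check:", result)
```

Running this on an Ubuntu 22.04 worker and on a worker that never fails
would show whether the library versions differ and whether the file on
disk is actually damaged or only read incorrectly.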

> Now, between kirkstone and master there should be no corruption, as
> this is not the same database - files have different names as changed
> in 048ff0ad927f4d37cc5547ebeba9e0c221687ea6.

Steve has observed the same issue in kirkstone, only on Ubuntu 22.04.

> 
> We could tweak the configuration to make sure tests do not download
> the database (CVE_DB_UPDATE_INTERVAL = "-1"). We could even do a run
> or two with that set for the whole build for all configurations, to
> make sure the corruption does not happen at runtime.
> 
> We also have a standalone script to download the database (no change
> in the format from the master branch), so we can use it and then
> point builds to the copy, while disabling updates.
> 
> The source is here:
> https://gitlab.com/syslinbit/public/yocto-vex-check/-/blob/main/cve-update-nvd2-native.py?ref_type=heads
> 
> We can also change the location of the database and always keep it in
> TMPDIR or similar. This could mean a long wait for the download.
> 
> Which solution would you prefer to test?

I'm not convinced the issue is a parallel fetch; I think sqlite is
breaking somehow and that it is host specific.

I think we should add a do_unpack to the recipe and work from a local
copy in TMPDIR, not one over NFS. It can copy from DL_DIR so we
shouldn't lose much speed. I'm willing to give that patch a go if that
helps and lets you focus on the other patches?
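
A rough sketch of the idea, not the eventual patch: copy the database
from the shared directory into a local working directory and open only
the local copy, read-only. The function name open_local_copy and the
directory and file names are hypothetical stand-ins for DL_DIR and a
location under TMPDIR; a real task would presumably hold the lock while
copying.

```python
import os
import pathlib
import shutil
import sqlite3
import tempfile


def open_local_copy(shared_db, local_dir):
    """Copy shared_db into local_dir and open the copy read-only."""
    os.makedirs(local_dir, exist_ok=True)
    local_db = os.path.join(local_dir, os.path.basename(shared_db))
    shutil.copy2(shared_db, local_db)
    # mode=ro means queries can never modify the local copy.
    uri = pathlib.Path(local_db).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True), local_db


if __name__ == "__main__":
    shared = tempfile.mkdtemp()  # stands in for the shared download dir
    local = tempfile.mkdtemp()   # stands in for a local build dir
    src = os.path.join(shared, "example.db")
    conn = sqlite3.connect(src)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    conn.close()

    conn, path = open_local_copy(src, os.path.join(local, "work"))
    print("reading from", path)
    conn.close()
```

With this layout sqlite only ever opens a file on local storage, so any
remaining failures could no longer be attributed to reading the database
over NFS.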

Cheers,

Richard





