Re: [PATCH v4 15/35] tests/functional: enable pre-emptive caching of assets


From: Thomas Huth <thuth@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel@nongnu.org, "Ani Sinha" <anisinha@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"John Snow" <jsnow@redhat.com>,
	qemu-ppc@nongnu.org, "Fabiano Rosas" <farosas@suse.de>
Subject: Re: [PATCH v4 15/35] tests/functional: enable pre-emptive caching of assets
Date: Fri, 30 Aug 2024 13:27:15 +0200
Message-ID: <aa7edbf3-d615-4f00-9e5f-2c675fa3a01c@redhat.com>
In-Reply-To: <ZtF366lfxm1gNR_Z@redhat.com>

On 30/08/2024 09.42, Daniel P. Berrangé wrote:
> On Fri, Aug 30, 2024 at 09:38:17AM +0200, Thomas Huth wrote:
>> On 29/08/2024 12.15, Daniel P. Berrangé wrote:
>>> On Tue, Aug 27, 2024 at 04:24:59PM +0200, Thomas Huth wrote:
>>>> On 27/08/2024 15.16, Thomas Huth wrote:
>>>>> On 23/08/2024 09.28, Philippe Mathieu-Daudé wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 21/8/24 10:27, Thomas Huth wrote:
>>>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>
>>>>>>> Many tests need to access assets stored on remote sites. We don't want
>>>>>>> to download these during test execution when run by meson, since this
>>>>>>> risks hitting test timeouts when data transfers are slow.
>>>>>>>
>>>>>>> Add support for pre-emptive caching of assets by setting the env var
>>>>>>> QEMU_TEST_PRECACHE to point to a timestamp file. When this is set,
>>>>>>> instead of running the test, the assets will be downloaded and saved
>>>>>>> to the cache, then the timestamp file created.
>> ...
>>>>>>
>>>>>> When using multiple jobs (-jN) I'm observing some hangs,
>>>>>> apparently multiple threads trying to download the same file.
>>>>>> The files are eventually downloaded successfully but it takes
>>>>>> longer. Should we acquire some exclusive lock somewhere?
>>>>>
>>>>> I haven't seen that yet ... what did you exactly run? "make
>>>>> check-functional -jN" ? Or "make check-functional-<target> -jN" ?
>>>>
>>>> After applying some of your patches, I think I've run now into this problem,
>>>> too: It's because test_aarch64_sbsaref.py and test_aarch64_virt.py try to
>>>> download the same asset in parallel (alpine-standard-3.17.2-aarch64.iso).
>>>>
>>>> Daniel, any ideas how to fix this in the Asset code?
>>>
>>> So when downloading we open a file with a ".download" suffix, write to
>>> that, and then rename it to the final filename.
>>>
>>> If we have concurrent usage, both will open the same file and try to
>>> write to it. Assuming both are downloading the same content we would
>>> probably "get lucky" and have a consistent file at the end, but clearly
>>> it is bad to rely on luck.
>>>
>>> The lame option is to use NamedTemporaryFile for the temporary file.
>>> This ensures both processes will write to different temp files, and
>>> the final rename is atomic. This guarantees safety, but still has
>>> the double download penalty.
>>>
>>> The serious option is to use fcntl.lockf(..., fcntl.LOCK_EX) on the
>>> temp file. If we can't acquire the lock then just immediately close
>>> the temp file (don't delete it) and assume another thread is going to
>>> finish its download.
>>>
>>> On Windows we'll need msvcrt.locking(..., msvcrt.LK_WLCK, ...)
>>> instead of fcntl.
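
For reference, the fcntl.lockf() option described above could look roughly
like the following. This is only a minimal sketch, assuming a POSIX system;
the helper name try_lock_exclusive() is made up for illustration and is not
part of the Asset code. It adds LOCK_NB so that a failed attempt returns
immediately rather than blocking.

```python
import fcntl


def try_lock_exclusive(path):
    """Open path for appending and try to take an exclusive lock.

    Returns the open file object if the lock was acquired, or None if
    another process already holds it. (Hypothetical helper.)
    """
    f = open(path, "ab")
    try:
        # LOCK_NB makes lockf() fail right away instead of blocking
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Somebody else is downloading: close our handle, keep the file
        f.close()
        return None
    return f
```

Note that these are per-process locks, so the check only works between
separate processes, and the lock is dropped automatically when the holder
exits or crashes.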
>>
>> While looking for portable solutions, I noticed that newer versions
>> of Python have a "x" mode for creating files only if they do not
>> exist yet. So I think something like this could be a solution:
>>
>> @@ -71,17 +72,26 @@ def fetch(self):
>>           tmp_cache_file = self.cache_file.with_suffix(".download")
>>           try:
>> -            resp = urllib.request.urlopen(self.url)
>> +            with tmp_cache_file.open("xb") as dst:
>> +                with urllib.request.urlopen(self.url) as resp:
>> +                    copyfileobj(resp, dst)
>> +        except FileExistsError:
>> +            # Another thread already seems to download this asset,
>> +            # so wait until it is done
>> +            self.log.debug("%s already exists, waiting for other thread to finish...",
>> +                           tmp_cache_file)
>> +            i = 0
>> +            while i < 600 and os.path.exists(tmp_cache_file):
>> +                sleep(1)
>> +                i += 1
>> +            if os.path.exists(self.cache_file):
>> +                return str(self.cache_file)
>> +            raise
>>           except Exception as e:
>>               self.log.error("Unable to download %s: %s", self.url, e)
>> -            raise
>> -
>> -        try:
>> -            with tmp_cache_file.open("wb+") as dst:
>> -                copyfileobj(resp, dst)
>> -        except:
>>               tmp_cache_file.unlink()
>>               raise
>> +
>>           try:
>>               # Set these just for informational purposes
>>               os.setxattr(str(tmp_cache_file), "user.qemu-asset-url",
>>
>> What do you think, does it look reasonable?
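
To illustrate the "x" mode on its own: opening a file with "xb" either
creates it or raises FileExistsError, so exactly one caller wins. A minimal
sketch follows; the function name claim() is hypothetical and only used for
this example.

```python
def claim(path):
    """Try to create path exclusively.

    Returns True if this call created the file, False if it already
    existed. (Hypothetical helper.)
    """
    try:
        # "x" fails with FileExistsError if the file is already there
        with open(path, "xb"):
            return True
    except FileExistsError:
        return False
```

The first caller gets True and would go on to download; every later caller
gets False for as long as the file exists.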
> 
> The main risk with this, as opposed to fcntl locking, is that it is not
> crash-safe. If a download is interrupted, subsequent cache runs will
> wait for a process that doesn't exist to finish downloading and then
> raise an exception, requiring manual user cleanup of the partial
> download.
> 
> Perhaps if we see the tmp_cache_file, and it doesn't change in size
> after N seconds, we could force unlink it, and create a new download,
> so we gracefully recover ?

Sounds like a plan ... does this look acceptable:

@@ -70,18 +71,52 @@ def fetch(self):
          self.log.info("Downloading %s to %s...", self.url, self.cache_file)
          tmp_cache_file = self.cache_file.with_suffix(".download")

-        try:
-            resp = urllib.request.urlopen(self.url)
-        except Exception as e:
-            self.log.error("Unable to download %s: %s", self.url, e)
-            raise
+        for retries in range(3):
+            try:
+                with tmp_cache_file.open("xb") as dst:
+                    with urllib.request.urlopen(self.url) as resp:
+                        copyfileobj(resp, dst)
+                break
+            except FileExistsError:
+                # Another thread already seems to download this asset,
+                # so wait until it is done
+                self.log.debug("%s already exists, "
+                               "waiting for other thread to finish...",
+                               tmp_cache_file)
+                try:
+                    current_size = tmp_cache_file.stat().st_size
+                    new_size = current_size
+                except:
+                    if os.path.exists(self.cache_file):
+                        return str(self.cache_file)
+                    raise
+                waittime = lastchange = 600
+                while waittime > 0:
+                    sleep(1)
+                    waittime -= 1
+                    try:
+                        new_size = tmp_cache_file.stat().st_size
+                    except:
+                        if os.path.exists(self.cache_file):
+                            return str(self.cache_file)
+                        raise
+                    if new_size != current_size:
+                        lastchange = waittime
+                        current_size = new_size
+                    elif lastchange - waittime > 90:
+                        self.log.debug("%s seems to be stale ... "
+                                       "deleting and retrying download...",
+                                       tmp_cache_file)
+                        tmp_cache_file.unlink()
+                        break
+                if waittime > 0:
+                    continue
+                raise
+            except Exception as e:
+                self.log.error("Unable to download %s: %s", self.url, e)
+                tmp_cache_file.unlink()
+                raise

-        try:
-            with tmp_cache_file.open("wb+") as dst:
-                copyfileobj(resp, dst)
-        except:
-            tmp_cache_file.unlink()
-            raise
          try:
              # Set these just for informational purposes
              os.setxattr(str(tmp_cache_file), "user.qemu-asset-url",

?

I tried it with a stale file in my cache, and it seems to work - after 90 
seconds, one of the threads is properly trying to redownload the file.

  Thomas
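
The stale-file check discussed above can also be written as a small
standalone function, which makes it easier to try out in isolation. This is
just a sketch under assumptions, not the code from the patch: the function
name, its parameters and its return values are invented for illustration.
The defaults of 600 and 90 seconds mirror the values used in the diff.

```python
import os
import time


def wait_for_download(tmp_path, final_path, timeout=600, stale_after=90,
                      poll=1.0):
    """Wait for another process to finish writing tmp_path.

    Returns "done" if tmp_path vanished and final_path exists, "gone" if
    tmp_path vanished without a final file, "stale" if tmp_path did not
    change in size for more than stale_after seconds, and "timeout" if
    the overall timeout expired. (Hypothetical helper.)
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = os.stat(tmp_path).st_size
        except FileNotFoundError:
            # The writer either renamed the temp file or gave up
            return "done" if os.path.exists(final_path) else "gone"
        now = time.monotonic()
        if size != last_size:
            # Still growing, so remember when we last saw progress
            last_size, last_change = size, now
        elif now - last_change > stale_after:
            return "stale"
        time.sleep(poll)
    return "timeout"
```

A caller would unlink the temp file and retry the download on "stale" or
"gone", and return the cached file on "done".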
