qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/3] tests/functional/asset: improve partial-download handling
@ 2025-03-12 12:25 Nicholas Piggin
  2025-03-12 12:25 ` [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nicholas Piggin @ 2025-03-12 12:25 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
	Daniel P. Berrangé, qemu-devel

v1 discussion:

https://lore.kernel.org/qemu-devel/20250312051739.938441-1-npiggin@gmail.com/T/#md49b293a64207b578600a8c428bccbce3d471e68

Changes since v1:

- Change retry exceeded handling to be a check for no file
- Tidied comments and debug leftover from patch 2
- If downloaded size does not match, the advertised and received
  sizes are now printed in the error log.

Thanks Thomas and Daniel for review comments.

Thanks,
Nick

Nicholas Piggin (3):
  tests/functional/asset: Fail assert fetch when retries are exceeded
  tests/functional/asset: Verify downloaded size
  tests/functional/asset: Add AssetError exception class

 roms/skiboot                        |  2 +-
 tests/functional/qemu_test/asset.py | 58 ++++++++++++++++++++++-------
 tests/lcitool/libvirt-ci            |  2 +-
 3 files changed, 46 insertions(+), 16 deletions(-)

-- 
2.47.1




* [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded
  2025-03-12 12:25 [PATCH v2 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
@ 2025-03-12 12:25 ` Nicholas Piggin
  2025-03-12 12:27   ` Daniel P. Berrangé
  2025-03-12 12:25 ` [PATCH v2 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
  2025-03-12 12:25 ` [PATCH v2 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
  2 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2025-03-12 12:25 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
	Daniel P. Berrangé, qemu-devel

Currently the fetch code does not fail gracefully when the retry limit
is exceeded: it just falls through the loop with no file, which ends up
hitting other errors.

Add a check for a non-existent file, which indicates that the retry
limit was exceeded.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
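A minimal standalone sketch of the fall-through this patch guards
against. The names fetch_with_retries, download and dest are
hypothetical and not part of asset.py, and treating OSError as the
retryable failure is an assumption of the sketch:

```python
from pathlib import Path


def fetch_with_retries(download, dest: Path, retries: int = 3) -> Path:
    """Call download(dest) up to `retries` times; fail loudly if no file results."""
    for _ in range(retries):
        try:
            download(dest)
            break
        except OSError:
            # Assumed retryable in this sketch: discard any partial file.
            dest.unlink(missing_ok=True)
            continue

    # Without this check, exhausting the loop falls through with no file
    # and the caller only hits an unrelated error later on.
    if not dest.exists():
        raise RuntimeError("Retries exceeded downloading to %s" % dest)
    return dest
```

With a downloader that always fails, the loop ends without a file and
the final check raises, rather than leaving later code to trip over the
missing path.
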
 tests/functional/qemu_test/asset.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index f0730695f09..27dd839e705 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -138,6 +138,9 @@ def fetch(self):
                 tmp_cache_file.unlink()
                 raise
 
+        if not os.path.exists(tmp_cache_file):
+            raise Exception("Retries exceeded downloading %s", self.url)
+
         try:
             # Set these just for informational purposes
             os.setxattr(str(tmp_cache_file), "user.qemu-asset-url",
-- 
2.47.1




* [PATCH v2 2/3] tests/functional/asset: Verify downloaded size
  2025-03-12 12:25 [PATCH v2 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
  2025-03-12 12:25 ` [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
@ 2025-03-12 12:25 ` Nicholas Piggin
  2025-03-12 12:27   ` Daniel P. Berrangé
  2025-03-12 12:25 ` [PATCH v2 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
  2 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2025-03-12 12:25 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
	Daniel P. Berrangé, qemu-devel

If the server provides a Content-Length header, use that to verify the
size of the downloaded file. This catches cases where the connection
terminates early, and gives the opportunity to retry. Without this, the
checksum will likely mismatch and fail without retry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
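For illustration only, the size comparison can be isolated into a small
helper. size_matches is a hypothetical name; the patch itself does the
check inline in fetch() and retries on a mismatch:

```python
from pathlib import Path
from typing import Optional


def size_matches(length_hdr: Optional[str], path: Path) -> bool:
    """Return False only if a Content-Length was advertised and the size differs."""
    if length_hdr is None:
        # No length metadata to compare against (for example, a response
        # sent with chunked transfer encoding), so nothing can be verified.
        return True
    return path.stat().st_size == int(length_hdr)
```

A caller in a retry loop would delete the temporary file and retry when
this returns False, instead of going on to a checksum that is bound to
fail.
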
 tests/functional/qemu_test/asset.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index 27dd839e705..6bbfb9e1cad 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -121,6 +121,20 @@ def fetch(self):
                 with tmp_cache_file.open("xb") as dst:
                     with urllib.request.urlopen(self.url) as resp:
                         copyfileobj(resp, dst)
+                        length_hdr = resp.getheader("Content-Length")
+
+                # Verify downloaded file size against length metadata, if
+                # available.
+                if length_hdr is not None:
+                    length = int(length_hdr)
+                    fsize = tmp_cache_file.stat().st_size
+                    if fsize != length:
+                        self.log.error("Unable to download %s: "
+                                       "connection closed before "
+                                       "transfer complete (%d/%d)",
+                                       self.url, fsize, length)
+                        tmp_cache_file.unlink()
+                        continue
                 break
             except FileExistsError:
                 self.log.debug("%s already exists, "
-- 
2.47.1




* [PATCH v2 3/3] tests/functional/asset: Add AssetError exception class
  2025-03-12 12:25 [PATCH v2 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
  2025-03-12 12:25 ` [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
  2025-03-12 12:25 ` [PATCH v2 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
@ 2025-03-12 12:25 ` Nicholas Piggin
  2025-03-12 12:29   ` Daniel P. Berrangé
  2 siblings, 1 reply; 7+ messages in thread
From: Nicholas Piggin @ 2025-03-12 12:25 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
	Daniel P. Berrangé, qemu-devel

Assets are uniquely identified by a human-readable-ish URL, so add an
AssetError exception class that prints the URL with the error message.

A 'transient' property captures whether the client may retry or try
again later, or whether the error is serious and likely permanent.
This retains the existing behaviour of treating HTTP errors other than
404 as transient, so they do not cause the precache step to fail.
Additionally, partial downloads and stale asset caches that fail to
resolve after the retry limit are now treated as transient and do not
cause the precache step to fail.

For background: the NetBSD archive is, at the time of writing, failing
with a short transfer. Retrying the fetch at that position (as wget
does) results in a "503 backend unavailable" error. We would like to
get that error code directly, but I have not found a way to do that
with urllib, so treating the short copy as a transient failure covers
that case (and seems like a reasonable way to handle it in general).

Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
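A standalone sketch of how a caller can use the 'transient' property.
FakeAsset, precache and the example.org URLs are hypothetical stand-ins
for this illustration only; AssetError follows the shape added in the
diff below, plus a super().__init__() call:

```python
import logging

log = logging.getLogger("precache-sketch")


class AssetError(Exception):
    def __init__(self, asset, msg, transient=False):
        super().__init__(msg)
        self.url = asset.url
        self.msg = msg
        self.transient = transient

    def __str__(self):
        return "%s: %s" % (self.url, self.msg)


class FakeAsset:
    """Hypothetical stand-in for Asset: fetch() raises a preset error, if any."""

    def __init__(self, url, error=None, transient=False):
        self.url = url
        self._error = error
        self._transient = transient

    def fetch(self):
        if self._error is not None:
            raise AssetError(self, self._error, transient=self._transient)
        return self.url


def precache(assets):
    """Fetch every asset; skip transient failures, propagate permanent ones."""
    skipped = []
    for asset in assets:
        try:
            asset.fetch()
        except AssetError as e:
            if not e.transient:
                raise
            log.error("%s: skipping asset precache", e)
            skipped.append(asset.url)
    return skipped
```

A transient failure is logged and skipped, while a non-transient one
(such as the 404 case) propagates and fails the run.
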
 roms/skiboot                        |  2 +-
 tests/functional/qemu_test/asset.py | 43 +++++++++++++++++++----------
 tests/lcitool/libvirt-ci            |  2 +-
 3 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/roms/skiboot b/roms/skiboot
index 24a7eb35966..785a5e3070a 160000
--- a/roms/skiboot
+++ b/roms/skiboot
@@ -1 +1 @@
-Subproject commit 24a7eb35966d93455520bc2debdd7954314b638b
+Subproject commit 785a5e3070a86e18521e62fe202b87209de30fa2
diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index 6bbfb9e1cad..704b84d0ea6 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -17,6 +17,14 @@
 from shutil import copyfileobj
 from urllib.error import HTTPError
 
+class AssetError(Exception):
+    def __init__(self, asset, msg, transient=False):
+        self.url = asset.url
+        self.msg = msg
+        self.transient = transient
+
+    def __str__(self):
+        return "%s: %s" % (self.url, self.msg)
 
 # Instances of this class must be declared as class level variables
 # starting with a name "ASSET_". This enables the pre-caching logic
@@ -51,7 +59,7 @@ def _check(self, cache_file):
         elif len(self.hash) == 128:
             hl = hashlib.sha512()
         else:
-            raise Exception("unknown hash type")
+            raise AssetError(self, "unknown hash type")
 
         # Calculate the hash of the file:
         with open(cache_file, 'rb') as file:
@@ -111,7 +119,8 @@ def fetch(self):
             return str(self.cache_file)
 
         if not self.fetchable():
-            raise Exception("Asset cache is invalid and downloads disabled")
+            raise AssetError(self,
+                             "Asset cache is invalid and downloads disabled")
 
         self.log.info("Downloading %s to %s...", self.url, self.cache_file)
         tmp_cache_file = self.cache_file.with_suffix(".download")
@@ -147,13 +156,23 @@ def fetch(self):
                                tmp_cache_file)
                 tmp_cache_file.unlink()
                 continue
+            except HTTPError as e:
+                tmp_cache_file.unlink()
+                self.log.error("Unable to download %s: HTTP error %d",
+                               self.url, e.code)
+                # Treat 404 as fatal, since it is highly likely to
+                # indicate a broken test rather than a transient
+                # server or networking problem
+                if e.code == 404:
+                    raise AssetError(self, "Unable to download: "
+                                     "HTTP error %d" % e.code)
+                continue
             except Exception as e:
-                self.log.error("Unable to download %s: %s", self.url, e)
                 tmp_cache_file.unlink()
-                raise
+                raise AssetError(self, "Unable to download: %s" % e)
 
         if not os.path.exists(tmp_cache_file):
-            raise Exception("Retries exceeded downloading %s", self.url)
+            raise AssetError(self, "Download retries exceeded", transient=True)
 
         try:
             # Set these just for informational purposes
@@ -167,8 +186,7 @@ def fetch(self):
 
         if not self._check(tmp_cache_file):
             tmp_cache_file.unlink()
-            raise Exception("Hash of %s does not match %s" %
-                            (self.url, self.hash))
+            raise AssetError(self, "Hash does not match %s" % self.hash)
         tmp_cache_file.replace(self.cache_file)
         # Remove write perms to stop tests accidentally modifying them
         os.chmod(self.cache_file, stat.S_IRUSR | stat.S_IRGRP)
@@ -190,15 +208,10 @@ def precache_test(test):
                 log.info("Attempting to cache '%s'" % asset)
                 try:
                     asset.fetch()
-                except HTTPError as e:
-                    # Treat 404 as fatal, since it is highly likely to
-                    # indicate a broken test rather than a transient
-                    # server or networking problem
-                    if e.code == 404:
+                except AssetError as e:
+                    if not e.transient:
                         raise
-
-                    log.debug(f"HTTP error {e.code} from {asset.url} " +
-                              "skipping asset precache")
+                    log.error("%s: skipping asset precache" % e)
 
         log.removeHandler(handler)
 
diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
index 18c4bfe02c4..b6a65806bc9 160000
--- a/tests/lcitool/libvirt-ci
+++ b/tests/lcitool/libvirt-ci
@@ -1 +1 @@
-Subproject commit 18c4bfe02c467e5639bf9a687139735ccd7a3fff
+Subproject commit b6a65806bc9b2b56985f5e97c936b77c7e7a99fc
-- 
2.47.1




* Re: [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded
  2025-03-12 12:25 ` [PATCH v2 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
@ 2025-03-12 12:27   ` Daniel P. Berrangé
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2025-03-12 12:27 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Thomas Huth, Philippe Mathieu-Daudé, qemu-devel

On Wed, Mar 12, 2025 at 10:25:56PM +1000, Nicholas Piggin wrote:
> Currently the fetch code does not fail gracefully when retry limit is
> exceeded, it just falls through the loop with no file, which ends up
> hitting other errors.
> 
> Add a check for non-existing file, which indicates the retry limit was
> exceeded.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tests/functional/qemu_test/asset.py | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 2/3] tests/functional/asset: Verify downloaded size
  2025-03-12 12:25 ` [PATCH v2 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
@ 2025-03-12 12:27   ` Daniel P. Berrangé
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2025-03-12 12:27 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Thomas Huth, Philippe Mathieu-Daudé, qemu-devel

On Wed, Mar 12, 2025 at 10:25:57PM +1000, Nicholas Piggin wrote:
> If the server provides a Content-Length header, use that to verify the
> size of the downloaded file. This catches cases where the connection
> terminates early, and gives the opportunity to retry. Without this, the
> checksum will likely mismatch and fail without retry.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tests/functional/qemu_test/asset.py | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel




* Re: [PATCH v2 3/3] tests/functional/asset: Add AssetError exception class
  2025-03-12 12:25 ` [PATCH v2 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
@ 2025-03-12 12:29   ` Daniel P. Berrangé
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2025-03-12 12:29 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Thomas Huth, Philippe Mathieu-Daudé, qemu-devel

On Wed, Mar 12, 2025 at 10:25:58PM +1000, Nicholas Piggin wrote:
> Assets are uniquely identified by human-readable-ish url, so make an
> AssetError exception class that prints url with error message.
> 
> A property 'transient' is used to capture whether the client may retry
> or try again later, or if it is a serious and likely permanent error.
> This is used to retain the existing behaviour of treating HTTP errors
> other than 404 as 'transient' and not causing precache step to fail.
> Additionally, partial-downloads and stale asset caches that fail to
> resolve after the retry limit are now treated as transient and do not
> cause precache step to fail.
> 
> For background: The NetBSD archive is, at the time of writing, failing
> with short transfer. Retrying the fetch at that position (as wget does)
> results in a "503 backend unavailable" error. We would like to get that
> error code directly, but I have not found a way to do that with urllib,
> so treating the short-copy as a transient failure covers that case (and
> seems like a reasonable way to handle it in general).
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  roms/skiboot                        |  2 +-
>  tests/functional/qemu_test/asset.py | 43 +++++++++++++++++++----------
>  tests/lcitool/libvirt-ci            |  2 +-
>  3 files changed, 30 insertions(+), 17 deletions(-)
> 
> diff --git a/roms/skiboot b/roms/skiboot
> index 24a7eb35966..785a5e3070a 160000
> --- a/roms/skiboot
> +++ b/roms/skiboot
> @@ -1 +1 @@
> -Subproject commit 24a7eb35966d93455520bc2debdd7954314b638b
> +Subproject commit 785a5e3070a86e18521e62fe202b87209de30fa2


> diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
> index 18c4bfe02c4..b6a65806bc9 160000
> --- a/tests/lcitool/libvirt-ci
> +++ b/tests/lcitool/libvirt-ci
> @@ -1 +1 @@
> -Subproject commit 18c4bfe02c467e5639bf9a687139735ccd7a3fff
> +Subproject commit b6a65806bc9b2b56985f5e97c936b77c7e7a99fc

Two accidental submodule changes here; with those removed:

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel



