* [PATCH 0/3] tests/functional/asset: improve partial-download handling
@ 2025-03-12 5:17 Nicholas Piggin
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Nicholas Piggin @ 2025-03-12 5:17 UTC (permalink / raw)
To: Thomas Huth
Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
Daniel P. Berrangé, qemu-devel
Continuing discussion from
https://lore.kernel.org/qemu-devel/20250311131327.903329-1-npiggin@gmail.com/
I added a basic short-download detection, cleaned up download retry
failure, and added an AssetError class that can help us be a bit
smarter about what to do with failures. That's extended to treating
short downloads similarly to non-404 HTTP errors in that it won't
fail the precache step, just skip the asset.
I still think that no asset errors should cache precache to fail
including 404 and hash mismatch, but that they should be punted to
the individual test cases that use those assets. IMO they should cause
test failure rather than test skip, so it's more obvious. But that can
be topic for later discussion. For now, this basically does the right
thing with the NetBSD archive short download problem and treats it the
way we would treat a 503 (which it essentially is), and that gets us
past the CI failures in functional-system-debian and
functional-system-fedora that are hitting upstream testing now.
Thanks,
Nick
Nicholas Piggin (3):
tests/functional/asset: Fail assert fetch when retries are exceeded
tests/functional/asset: Verify downloaded size
tests/functional/asset: Add AssetError exception class
tests/functional/qemu_test/asset.py | 60 +++++++++++++++++++++--------
1 file changed, 45 insertions(+), 15 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded
2025-03-12 5:17 [PATCH 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
@ 2025-03-12 5:17 ` Nicholas Piggin
2025-03-12 6:40 ` Thomas Huth
2025-03-12 8:13 ` Daniel P. Berrangé
2025-03-12 5:17 ` [PATCH 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
2025-03-12 5:17 ` [PATCH 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
2 siblings, 2 replies; 8+ messages in thread
From: Nicholas Piggin @ 2025-03-12 5:17 UTC (permalink / raw)
To: Thomas Huth
Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
Daniel P. Berrangé, qemu-devel
Currently the fetch code does not fail gracefully when retry limit is
exceeded, it just falls through the loop with no file, which ends up
hitting other errors.
In preparation for adding more cases where a download gets retried,
add an explicit check for retry limit exceeded.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/functional/qemu_test/asset.py | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index f0730695f09..6a1c92ffbef 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -116,7 +116,10 @@ def fetch(self):
self.log.info("Downloading %s to %s...", self.url, self.cache_file)
tmp_cache_file = self.cache_file.with_suffix(".download")
- for retries in range(3):
+ for retries in range(4):
+ if retries == 3:
+ raise Exception("Retries exceeded downloading %s", self.url)
+
try:
with tmp_cache_file.open("xb") as dst:
with urllib.request.urlopen(self.url) as resp:
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/3] tests/functional/asset: Verify downloaded size
2025-03-12 5:17 [PATCH 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
@ 2025-03-12 5:17 ` Nicholas Piggin
2025-03-12 6:49 ` Thomas Huth
2025-03-12 5:17 ` [PATCH 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
2 siblings, 1 reply; 8+ messages in thread
From: Nicholas Piggin @ 2025-03-12 5:17 UTC (permalink / raw)
To: Thomas Huth
Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
Daniel P. Berrangé, qemu-devel
If the server provides a Content-Length header, use that to verify the
size of the downloaded file. This catches cases where the connection
terminates early, and gives the opportunity to retry. Without this, the
checksum will likely mismatch and fail without retry.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/functional/qemu_test/asset.py | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index 6a1c92ffbef..d34e8f5e2ad 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -124,6 +124,22 @@ def fetch(self):
with tmp_cache_file.open("xb") as dst:
with urllib.request.urlopen(self.url) as resp:
copyfileobj(resp, dst)
+ length_hdr = resp.getheader("Content-Length")
+
+ # Verify downloaded file size against length metadata, if
+ # available. dst must be out of scope before testing st_size
+ # because # copyfileobj returns before all buffers are
+ # flushed to filesystem.
+ if length_hdr:
+ length = int(length_hdr)
+ if tmp_cache_file.stat().st_size != length:
+ print("st_size %ld", tmp_cache_file.stat().st_size)
+ self.log.error("Unable to download %s: "
+ "connection closed before "
+ "transfer complete",
+ self.url)
+ tmp_cache_file.unlink()
+ continue
break
except FileExistsError:
self.log.debug("%s already exists, "
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] tests/functional/asset: Add AssetError exception class
2025-03-12 5:17 [PATCH 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
2025-03-12 5:17 ` [PATCH 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
@ 2025-03-12 5:17 ` Nicholas Piggin
2025-03-12 6:56 ` Thomas Huth
2 siblings, 1 reply; 8+ messages in thread
From: Nicholas Piggin @ 2025-03-12 5:17 UTC (permalink / raw)
To: Thomas Huth
Cc: Nicholas Piggin, Philippe Mathieu-Daudé,
Daniel P. Berrangé, qemu-devel
Assets are uniquely identified by human-readable-ish url, so make an
AssetError exception class that prints url with error message.
A property 'transient' is used to capture whether the client may retry
or try again later, or if it is a serious and likely permanent error.
This is used to retain the existing behaviour of treating HTTP errors
other than 404 as 'transient' and not causing precache step to fail.
Additionally, partial-downloads and stale asset caches that fail to
resolve after the retry limit are now treated as transient and do not
cause precache step to fail.
For background: The NetBSD archive is, at the time of writing, failing
with short transfer. Retrying the fetch at that position (as wget does)
results in a "503 backend unavailable" error. We would like to get that
error code directly, but I have not found a way to do that with urllib,
so treating the short-copy as a transient failure covers that case (and
seems like a reasonable way to handle it in general).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
tests/functional/qemu_test/asset.py | 41 ++++++++++++++++++-----------
1 file changed, 26 insertions(+), 15 deletions(-)
diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
index d34e8f5e2ad..69b7a5ecb0e 100644
--- a/tests/functional/qemu_test/asset.py
+++ b/tests/functional/qemu_test/asset.py
@@ -17,6 +17,14 @@
from shutil import copyfileobj
from urllib.error import HTTPError
+class AssetError(Exception):
+ def __init__(self, asset, msg, transient=False):
+ self.url = asset.url
+ self.msg = msg
+ self.transient = transient
+
+ def __str__(self):
+ return "%s: %s" % (self.url, self.msg)
# Instances of this class must be declared as class level variables
# starting with a name "ASSET_". This enables the pre-caching logic
@@ -51,7 +59,7 @@ def _check(self, cache_file):
elif len(self.hash) == 128:
hl = hashlib.sha512()
else:
- raise Exception("unknown hash type")
+ raise AssetError(self, "unknown hash type")
# Calculate the hash of the file:
with open(cache_file, 'rb') as file:
@@ -111,14 +119,16 @@ def fetch(self):
return str(self.cache_file)
if not self.fetchable():
- raise Exception("Asset cache is invalid and downloads disabled")
+ raise AssetError(self,
+ "Asset cache is invalid and downloads disabled")
self.log.info("Downloading %s to %s...", self.url, self.cache_file)
tmp_cache_file = self.cache_file.with_suffix(".download")
for retries in range(4):
if retries == 3:
- raise Exception("Retries exceeded downloading %s", self.url)
+ raise AssetError(self, "Download retries exceeded",
+ transient=True)
try:
with tmp_cache_file.open("xb") as dst:
@@ -152,10 +162,17 @@ def fetch(self):
tmp_cache_file)
tmp_cache_file.unlink()
continue
+ except HTTPError as e:
+ tmp_cache_file.unlink()
+ # Treat 404 as fatal, since it is highly likely to
+ # indicate a broken test rather than a transient
+ # server or networking problem
+ raise AssetError(self, "Unable to download: "
+ "HTTP error %d" % e.code,
+ transient = e.code != 404)
except Exception as e:
- self.log.error("Unable to download %s: %s", self.url, e)
tmp_cache_file.unlink()
- raise
+ raise AssetError(self, "Unable to download: " % e)
try:
# Set these just for informational purposes
@@ -169,8 +186,7 @@ def fetch(self):
if not self._check(tmp_cache_file):
tmp_cache_file.unlink()
- raise Exception("Hash of %s does not match %s" %
- (self.url, self.hash))
+ raise AssetError(self, "Hash does not match %s" % self.hash)
tmp_cache_file.replace(self.cache_file)
# Remove write perms to stop tests accidentally modifying them
os.chmod(self.cache_file, stat.S_IRUSR | stat.S_IRGRP)
@@ -192,15 +208,10 @@ def precache_test(test):
log.info("Attempting to cache '%s'" % asset)
try:
asset.fetch()
- except HTTPError as e:
- # Treat 404 as fatal, since it is highly likely to
- # indicate a broken test rather than a transient
- # server or networking problem
- if e.code == 404:
+ except AssetError as e:
+ if not e.transient:
raise
-
- log.debug(f"HTTP error {e.code} from {asset.url} " +
- "skipping asset precache")
+ log.error("%s: skipping asset precache" % e)
log.removeHandler(handler)
--
2.47.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
@ 2025-03-12 6:40 ` Thomas Huth
2025-03-12 8:13 ` Daniel P. Berrangé
1 sibling, 0 replies; 8+ messages in thread
From: Thomas Huth @ 2025-03-12 6:40 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Philippe Mathieu-Daudé, Daniel P. Berrangé, qemu-devel
On 12/03/2025 06.17, Nicholas Piggin wrote:
> Currently the fetch code does not fail gracefully when retry limit is
> exceeded, it just falls through the loop with no file, which ends up
> hitting other errors.
>
> In preparation for adding more cases where a download gets retried,
> add an explicit check for retry limit exceeded.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/functional/qemu_test/asset.py | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
> index f0730695f09..6a1c92ffbef 100644
> --- a/tests/functional/qemu_test/asset.py
> +++ b/tests/functional/qemu_test/asset.py
> @@ -116,7 +116,10 @@ def fetch(self):
> self.log.info("Downloading %s to %s...", self.url, self.cache_file)
> tmp_cache_file = self.cache_file.with_suffix(".download")
>
> - for retries in range(3):
> + for retries in range(4):
> + if retries == 3:
> + raise Exception("Retries exceeded downloading %s", self.url)
> +
> try:
> with tmp_cache_file.open("xb") as dst:
> with urllib.request.urlopen(self.url) as resp:
Reviewed-by: Thomas Huth <thuth@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/3] tests/functional/asset: Verify downloaded size
2025-03-12 5:17 ` [PATCH 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
@ 2025-03-12 6:49 ` Thomas Huth
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Huth @ 2025-03-12 6:49 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Philippe Mathieu-Daudé, Daniel P. Berrangé, qemu-devel
On 12/03/2025 06.17, Nicholas Piggin wrote:
> If the server provides a Content-Length header, use that to verify the
> size of the downloaded file. This catches cases where the connection
> terminates early, and gives the opportunity to retry. Without this, the
> checksum will likely mismatch and fail without retry.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/functional/qemu_test/asset.py | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
> index 6a1c92ffbef..d34e8f5e2ad 100644
> --- a/tests/functional/qemu_test/asset.py
> +++ b/tests/functional/qemu_test/asset.py
> @@ -124,6 +124,22 @@ def fetch(self):
> with tmp_cache_file.open("xb") as dst:
> with urllib.request.urlopen(self.url) as resp:
> copyfileobj(resp, dst)
> + length_hdr = resp.getheader("Content-Length")
> +
> + # Verify downloaded file size against length metadata, if
> + # available. dst must be out of scope before testing st_size
> + # because # copyfileobj returns before all buffers are
remove the "#" after "because" ?
> + # flushed to filesystem.
As far as I understood Python, that's the main reason for using "with" in
cases like this here, so I'd maybe even rather scratch the last sentence
completely.
> + if length_hdr:
> + length = int(length_hdr)
> + if tmp_cache_file.stat().st_size != length:
> + print("st_size %ld", tmp_cache_file.stat().st_size)
Debug left-over?
> + self.log.error("Unable to download %s: "
> + "connection closed before "
> + "transfer complete",
> + self.url)
Maybe it would be good to include st_size and length in the log message instead?
> + tmp_cache_file.unlink()
> + continue
> break
> except FileExistsError:
> self.log.debug("%s already exists, "
Thomas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 3/3] tests/functional/asset: Add AssetError exception class
2025-03-12 5:17 ` [PATCH 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
@ 2025-03-12 6:56 ` Thomas Huth
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Huth @ 2025-03-12 6:56 UTC (permalink / raw)
To: Nicholas Piggin
Cc: Philippe Mathieu-Daudé, Daniel P. Berrangé, qemu-devel
On 12/03/2025 06.17, Nicholas Piggin wrote:
> Assets are uniquely identified by human-readable-ish url, so make an
> AssetError exception class that prints url with error message.
>
> A property 'transient' is used to capture whether the client may retry
> or try again later, or if it is a serious and likely permanent error.
> This is used to retain the existing behaviour of treating HTTP errors
> other than 404 as 'transient' and not causing precache step to fail.
> Additionally, partial-downloads and stale asset caches that fail to
> resolve after the retry limit are now treated as transient and do not
> cause precache step to fail.
>
> For background: The NetBSD archive is, at the time of writing, failing
> with short transfer. Retrying the fetch at that position (as wget does)
> results in a "503 backend unavailable" error. We would like to get that
> error code directly, but I have not found a way to do that with urllib,
> so treating the short-copy as a transient failure covers that case (and
> seems like a reasonable way to handle it in general).
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/functional/qemu_test/asset.py | 41 ++++++++++++++++++-----------
> 1 file changed, 26 insertions(+), 15 deletions(-)
Reviewed-by: Thomas Huth <thuth@redhat.com>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
2025-03-12 6:40 ` Thomas Huth
@ 2025-03-12 8:13 ` Daniel P. Berrangé
1 sibling, 0 replies; 8+ messages in thread
From: Daniel P. Berrangé @ 2025-03-12 8:13 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Thomas Huth, Philippe Mathieu-Daudé, qemu-devel
On Wed, Mar 12, 2025 at 03:17:36PM +1000, Nicholas Piggin wrote:
> Currently the fetch code does not fail gracefully when retry limit is
> exceeded, it just falls through the loop with no file, which ends up
> hitting other errors.
>
> In preparation for adding more cases where a download gets retried,
> add an explicit check for retry limit exceeded.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> tests/functional/qemu_test/asset.py | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tests/functional/qemu_test/asset.py b/tests/functional/qemu_test/asset.py
> index f0730695f09..6a1c92ffbef 100644
> --- a/tests/functional/qemu_test/asset.py
> +++ b/tests/functional/qemu_test/asset.py
> @@ -116,7 +116,10 @@ def fetch(self):
> self.log.info("Downloading %s to %s...", self.url, self.cache_file)
> tmp_cache_file = self.cache_file.with_suffix(".download")
>
> - for retries in range(3):
> + for retries in range(4):
> + if retries == 3:
> + raise Exception("Retries exceeded downloading %s", self.url)
While it works, it feels a bit wierd to me. Given the error retry
scenario will unlink the file, I think it would be better todo
if not os.path.exists(tmp_cache_file)
raise Exception(...)
immediately after the for() loop
> +
> try:
> with tmp_cache_file.open("xb") as dst:
> with urllib.request.urlopen(self.url) as resp:
> --
> 2.47.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-03-12 8:14 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-12 5:17 [PATCH 0/3] tests/functional/asset: improve partial-download handling Nicholas Piggin
2025-03-12 5:17 ` [PATCH 1/3] tests/functional/asset: Fail assert fetch when retries are exceeded Nicholas Piggin
2025-03-12 6:40 ` Thomas Huth
2025-03-12 8:13 ` Daniel P. Berrangé
2025-03-12 5:17 ` [PATCH 2/3] tests/functional/asset: Verify downloaded size Nicholas Piggin
2025-03-12 6:49 ` Thomas Huth
2025-03-12 5:17 ` [PATCH 3/3] tests/functional/asset: Add AssetError exception class Nicholas Piggin
2025-03-12 6:56 ` Thomas Huth
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).