public inbox for buildroot@busybox.net
 help / color / mirror / Atom feed
* [Buildroot] [PATCH 1/1] support/testing: test_aichat: improve test reliability
@ 2026-03-18 21:38 Julien Olivain via buildroot
  2026-03-19 20:55 ` Julien Olivain via buildroot
  0 siblings, 1 reply; 2+ messages in thread
From: Julien Olivain via buildroot @ 2026-03-18 21:38 UTC (permalink / raw)
  To: buildroot; +Cc: Julien Olivain

Since the llama.cpp update in Buildroot commit [1], test_aichat can
fail for several reasons:

The loop checking for llama-server availability can fail if curl
succeeds but the returned JSON data is not formatted as expected.
This can happen when the server is ready but the model is not
completely loaded. In that case, the server returns:

    {"error":{"message":"Loading model","type":"unavailable_error","code":503}}

This commit ignores Python KeyError exceptions during the server
check, to avoid failing when this message is received.

Also, this new llama-server version introduced prompt caching, which
uses too much memory. This commit completely disables prompt caching
by adding "--cache-ram 0" to the llama-server options.

[1] https://gitlab.com/buildroot.org/buildroot/-/commit/05c36d5d875713521f99b7bad48be316dcde2510

Signed-off-by: Julien Olivain <ju.o@free.fr>
---
 support/testing/tests/package/test_aichat.py | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/support/testing/tests/package/test_aichat.py b/support/testing/tests/package/test_aichat.py
index 5fc554bbb5..50c07f87dc 100644
--- a/support/testing/tests/package/test_aichat.py
+++ b/support/testing/tests/package/test_aichat.py
@@ -70,6 +70,8 @@ class TestAiChat(infra.basetest.BRTest):
         llama_opts = "--log-file /tmp/llama-server.log"
         # We set a fixed seed, to reduce variability of the test
         llama_opts += " --seed 123456789"
+        # We disable prompt caching to reduce RAM usage
+        llama_opts += " --cache-ram 0"
         llama_opts += f" --hf-repo {hf_model}"
 
         # We start a llama-server in background, which will expose an
@@ -91,9 +93,12 @@ class TestAiChat(infra.basetest.BRTest):
             if ret == 0:
                 models_json = "".join(out)
                 models = json.loads(models_json)
-                model_name = models['models'][0]['name']
-                if model_name == hf_model:
-                    break
+                try:
+                    model_name = models['models'][0]['name']
+                    if model_name == hf_model:
+                        break
+                except KeyError:
+                    pass
         else:
             self.fail("Timeout while waiting for llama-server.")
 
-- 
2.53.0

_______________________________________________
buildroot mailing list
buildroot@buildroot.org
https://lists.buildroot.org/mailman/listinfo/buildroot

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [Buildroot] [PATCH 1/1] support/testing: test_aichat: improve test reliability
  2026-03-18 21:38 [Buildroot] [PATCH 1/1] support/testing: test_aichat: improve test reliability Julien Olivain via buildroot
@ 2026-03-19 20:55 ` Julien Olivain via buildroot
  0 siblings, 0 replies; 2+ messages in thread
From: Julien Olivain via buildroot @ 2026-03-19 20:55 UTC (permalink / raw)
  To: Julien Olivain; +Cc: buildroot

On 18/03/2026 22:38, Julien Olivain via buildroot wrote:
> Since the llama.cpp update in Buildroot commit [1], test_aichat can
> fail for several reasons:
> 
> The loop checking for llama-server availability can fail if curl
> succeeds but the returned JSON data is not formatted as expected.
> This can happen when the server is ready but the model is not
> completely loaded. In that case, the server returns:
> 
>     {"error":{"message":"Loading model","type":"unavailable_error","code":503}}
> 
> This commit ignores Python KeyError exceptions during the server
> check, to avoid failing when this message is received.
> 
> Also, this new llama-server version introduced prompt caching, which
> uses too much memory. This commit completely disables prompt caching
> by adding "--cache-ram 0" to the llama-server options.
> 
> [1] https://gitlab.com/buildroot.org/buildroot/-/commit/05c36d5d875713521f99b7bad48be316dcde2510
> 
> Signed-off-by: Julien Olivain <ju.o@free.fr>

Applied to master.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-03-19 20:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-18 21:38 [Buildroot] [PATCH 1/1] support/testing: test_aichat: improve test reliability Julien Olivain via buildroot
2026-03-19 20:55 ` Julien Olivain via buildroot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox