From: "Alexandre Marques" <Alexandre.Marques@criticaltechworks.com>
To: bitbake-devel@lists.openembedded.org
Subject: Re: Clarification on Cleaning Up a Remote Hash Equivalence DB
Date: Wed, 26 Feb 2025 16:42:25 -0800
Message-ID: <16993.1740616945379904359@lists.openembedded.org>
In-Reply-To: <CAJdd5GZtxMRQB=VyEms8cS+1BjHxpDWZ8BZgPdvwQmYA8oy=7Q@mail.gmail.com>
On Wed, Feb 26, 2025 at 08:08 AM, Joshua Watt wrote:
>
> On Wed, Feb 26, 2025 at 2:51 AM Alexandre Marques
> <Alexandre.Marques@criticaltechworks.com> wrote:
>
>> It already does this. If you specify multiple KEY VALUE pairs on the
>> command line, it will send all of them to the server in a single
>> message.
>>
>> Well yes, but as far as I understand, keys get overwritten, and the key is
>> the actual db column, meaning we can't really pass multiple hashes, just
>> "refine" the query.
>>
>> For example:
>> Client Side:
>> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where
>> taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0
>> New hashes marked: 1
>>
>> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where
>> taskhash f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
>> New hashes marked: 1
>>
>> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where
>> taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0
>> --where taskhash
>> f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
>> New hashes marked: 1
>>
>> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where
>> taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0
>> --where taskhash2
>> f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
>> New hashes marked: 1
>>
>> Server Side:
>> {'mark': 'alive', 'where': {'taskhash':
>> '9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0'}}
>> {'mark': 'alive', 'where': {'taskhash':
>> 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}
>> {'mark': 'alive', 'where': {'taskhash':
>> 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}
>> {'mark': 'alive', 'where': {'taskhash':
>> '9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0',
>> 'taskhash2':
>> 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}
>>
>> Perhaps I'm missing something.
>> What I was proposing would be supporting something similar to this:
>> Client Side:
>> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where
>> 9cd45da f1c4cca
>>
>> Server Side:
>> {'mark': 'alive', 'where': {'taskhash': ['9cd45da', 'f1c4cca']}}
>
> No, you aren't missing anything, I was incorrect :)
>
> There are 2 approaches here.
>
> The first would be to improve bitbake-hashclient to add the input
> stream command, but keep the existing API with the server. Each mark
> sent through the file/pipe would result in a separate `mark` command
> being sent to the server. This should still be faster since it will
> reuse the same connection to the server as long as bitbake-hashclient
> is running, which will save the overhead of establishing and
> negotiating a connection. This also has the advantage that it doesn't
> require server side changes, but it does mean a round-trip for each
> `mark` command
I have a simple POC for this first approach, and it seems to be working fine.
I still need to test against the remote server to get a better sense of how
much faster it is, but based on my tests with a local server, I expect the
run time to still be in the range of minutes.
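
The first approach quoted above can be sketched roughly as follows. This is
not the actual POC: `mark_from_stream` and the `send_mark` callable are
hypothetical names, and `send_mark` only stands in for whatever sends a single
`mark` command over an already-open connection and returns the number of newly
marked hashes.

```python
import io
from typing import Callable, Dict, Iterable


def mark_from_stream(
    lines: Iterable[str],
    send_mark: Callable[[str, Dict[str, str]], int],
    mark: str = "alive",
    column: str = "taskhash",
) -> int:
    """Send one mark command per non-empty input line; return the total marked."""
    total = 0
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines in the file/pipe
        # One round-trip per hash, but the connection behind send_mark is reused.
        total += send_mark(mark, {column: value})
    return total


if __name__ == "__main__":
    sent = []

    def fake_send_mark(mark, where):
        # Hypothetical stand-in for the real client call; it records the request.
        sent.append({"mark": mark, "where": where})
        return 1

    stream = io.StringIO("9cd45da\n\nf1c4cca\n")
    print(mark_from_stream(stream, fake_send_mark))
    print(sent)
```

Passing the sender in as a callable keeps the sketch independent of the real
client class, so the same loop works with sys.stdin or an open file.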
>
> The second is to make a new server API to allow streaming of marks.
> The protocol between the client and server allows commands to go into
> a "stream" mode (which to be clear is distinct from the "input stream"
> discussed above, the name overlap is unfortunate). This mode allows
> the client to send newline delimited messages as fast as it wants
> (usually in some large batch size), and asynchronously read the
> responses from the server (see send_stream_batch() in the client
> code), allowing multiple messages to be in-flight at once.
> Implementing a new mark API on the server using this mechanism would
> be the fastest possible way of communicating the marks. Of course the
> disadvantage here is that it would require new API on the server, so a
> server upgrade would be required to use it. That said, it may be
> possible to make bitbake-hashclient intelligent enough to know if this
> new API exists and if so use it for the "input stream" and if not
> fallback to the older messages as described above
I haven't tried the "stream mode" yet, but I had a quick look at the code and I
don't see any obvious reason for it not to work, so thanks! :)
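
As a rough illustration of the batching idea in the quoted description (not
the hashserv implementation), grouping newline-delimited messages could look
like the sketch below. `iter_batches` is a hypothetical helper, and sending
one bare hash per line is an assumption made only for this example; the real
stream-mode message format may differ.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_batches(hashes: Iterable[str], batch_size: int = 100) -> Iterator[bytes]:
    """Yield newline-delimited payloads, each holding up to batch_size messages."""
    it = iter(hashes)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Assumed wire shape for illustration: one hash per line.
        yield "".join(f"{h}\n" for h in chunk).encode("utf-8")
```

Each yielded payload could then be written to the connection in one go, with
the responses read back asynchronously, so several messages are in flight at
once instead of waiting for a reply after every mark.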
I've been trying to tackle the "make bitbake-hashclient intelligent enough to know
if this new API exists" part and, to be honest, I'm struggling a bit.
Whenever I use a command that isn't available on the server, I just get "Error talking
to server: Connection closed". So far I don't see a way of getting more
information about the particular error, and blindly assuming that "connection closed"
means the new API is not available does not seem right.
I was thinking of adding a new command to the server to "request the API", or to check
whether a particular command is available, but that in itself changes the API :,)
So it's a bit of a chicken-and-egg problem, which makes me think this might not be the way to go.
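
To make the concern concrete, the naive fallback would look something like the
sketch below. All names are hypothetical: `stream_mark` stands in for a call
to the not-yet-existing streaming API, and `single_mark` for the existing
one-mark-per-message call, which is assumed to reconnect on its own if needed.

```python
from typing import Callable, Iterable, List


def mark_all(
    hashes: Iterable[str],
    stream_mark: Callable[[List[str]], int],
    single_mark: Callable[[str], int],
) -> int:
    """Try the (hypothetical) streaming API first; fall back to one mark per hash."""
    items = list(hashes)
    try:
        return stream_mark(items)
    except ConnectionError:
        # Ambiguous: an old server and a genuine network failure look the same here.
        return sum(single_mark(h) for h in items)
```

The `except` branch is exactly the weak spot: with only a closed connection to
go on, the client cannot tell "command unknown" apart from any other failure.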