* Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Alexandre Marques @ 2025-02-25 18:44 UTC
To: bitbake-devel@lists.openembedded.org
Cc: Helios, node0-integration-build@list.bmw.com

Hello there,

We are using a remote hash equivalence server and need to clean up
irrelevant hashes. Currently, we iterate over a list of hashes, marking
each as "alive" using "bitbake-hashclient gc-mark", followed by
"bitbake-hashclient gc-sweep <marker>". This process is inefficient,
taking about 9 minutes for 12,000 hashes.

We are wondering what the best way to improve this would be.

We propose extending "gc-mark" to support a "bulk mode" for efficiency.
Additionally, we obtain hashes from sstate files and were considering
adapting the client to also accept a list of files, possibly through a
new command or flag. However, the client seems to act as a one-to-one
frontend to the server API calls. It's unclear whether this is by
design, and whether there would be interest in adding such a feature.

Thank you,

Best regards,
Alexandre Marques

The information in this communication may contain confidential or legally privileged information. It is intended solely for the use of the individual or entity it addresses and others authorized to receive it. If you are not an intended recipient, you are hereby notified that any disclosure, copying, distribution or action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication by error, please notify us immediately by responding to this e-mail and then delete it from your system. Critical TechWorks is not liable for the proper and complete transmission of the information in this communication nor for any delay in its receipt.

This e-mail is environmentally friendly, just like Critical TechWorks, which lives in a paper-free atmosphere. Therefore, please consider the environment before printing it!
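For scale, the two figures quoted in the message above (about 9 minutes for 12,000 hashes) work out to roughly 45 ms per hash, presumably dominated by per-invocation overhead rather than by the mark itself. A quick check of that arithmetic, using only those two numbers:

```python
# Figures taken from the message: ~9 minutes for 12,000 hashes.
total_seconds = 9 * 60
num_hashes = 12_000

# Average wall-clock cost of marking a single hash, in milliseconds.
per_hash_ms = total_seconds / num_hashes * 1000
print(f"{per_hash_ms:.0f} ms per hash")
```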
* Re: [bitbake-devel] Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Joshua Watt @ 2025-02-26 2:11 UTC
To: Alexandre.Marques
Cc: bitbake-devel@lists.openembedded.org, Helios, node0-integration-build@list.bmw.com

On Tue, Feb 25, 2025 at 11:44 AM Alexandre Marques via
lists.openembedded.org
<Alexandre.Marques=criticaltechworks.com@lists.openembedded.org>
wrote:
>
> Hello there,
>
> We are using a remote hash equivalence server and need to clean up irrelevant hashes. Currently, we
> iterate over a list of hashes, marking each as "alive" using "bitbake-hashclient gc-mark", followed by
> "bitbake-hashclient gc-sweep <marker>." This process is inefficient, taking about 9 minutes for 12,000
> hashes.
>
> We are wondering what would be the best way to go about improving this..
>
> We propose extending "gc-mark" to support a "bulk-mode" for efficiency.

It already does this. If you specify multiple KEY VALUE pairs on the
command line, it will send all of them to the server in a single
message. Granted, this is not the most useful on the command line. It
might be possible to extend the command to take a file/stdin list of
KEY VALUE pairs and automatically "chunk" them into the appropriate
number of messages (maybe 50-100 at a time?). Then you could
probably have some script that finds all the files you want to keep
and do something like:

    find-sstate-files | bitbake-hashclient gc-mark-stream

> Additionally, we obtain hashes from sstate files and were considering adapting the client to also accept
> a list of files, possibly through a new command or flag. However the client seems to act as a one-to-one
> frontend to the server API calls. It's unclear if this is intentional or by design, and if it would be of interest
> adding such a feature?

It's useful to be able to invoke individual server APIs with the tool,
but the intent was not to limit it to just that (see the stress test).
That said, the streaming command outlined above might be sufficient,
and would mean that different users could implement a different filter
on the front end instead of trying to code a single piece of logic that
works for everyone in bitbake-hashclient. And if you do write a front
end script like that, please consider sharing it with the rest of us :)

> Thank you,
>
> Best regards,
> Alexandre Marques
* Re: [bitbake-devel] Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Alexandre Marques @ 2025-02-26 9:50 UTC
To: bitbake-devel@lists.openembedded.org
Cc: Helios, node0-integration-build@list.bmw.com, Joshua Watt

> It already does this. If you specify multiple KEY VALUE pairs on the
> command line, it will send all of them to the server in a single
> message.

Well yes, but as far as I understand, keys get overwritten, and the key is the actual DB column, meaning we can't really pass multiple hashes, just "refine" the query.

For example:

Client Side:
$ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0
New hashes marked: 1

$ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where taskhash f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
New hashes marked: 1

$ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0 --where taskhash f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
New hashes marked: 1

$ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where taskhash 9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0 --where taskhash2 f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0
New hashes marked: 1

Server Side:
{'mark': 'alive', 'where': {'taskhash': '9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0'}}
{'mark': 'alive', 'where': {'taskhash': 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}
{'mark': 'alive', 'where': {'taskhash': 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}
{'mark': 'alive', 'where': {'taskhash': '9cd45da2fb6aa303a7828ec3cad7709bde2882422792e696016663f390aeece0', 'taskhash2': 'f1c4cca2ea1fc1181c40afc8518d75db42d5c5e841fc4a4dbdcba30e1a9879e0'}}

Perhaps I'm missing something.
What I was proposing would be supporting something similar to this:

Client Side:
$ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where 9cd45da f1c4cca

Server Side:
{'mark': 'alive', 'where': {'taskhash': ['9cd45da', 'f1c4cca']}}

> That said, the streaming command outlined above might be sufficient,

Sounds reasonable.

> That said, if you do write a front end
> script like that, please consider sharing it with the rest of us :)

Most definitely :)

Thanks for the quick reply.
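The server-side log in the message above (the third request carries only the second hash) is what one would expect if the repeated `--where KEY VALUE` pairs are folded into a Python dict keyed by column name. A minimal sketch of that behaviour, assuming the filter is built roughly this way; `build_where` and `build_where_multi` are hypothetical helpers for illustration, not functions from bitbake-hashclient:

```python
def build_where(pairs):
    """Fold (key, value) pairs into a dict; a repeated key keeps only its last value."""
    return {key: value for key, value in pairs}


def build_where_multi(pairs):
    """Variant matching the proposal: collect every value given for a key in a list."""
    where = {}
    for key, value in pairs:
        where.setdefault(key, []).append(value)
    return where


# Shortened hashes, as in the proposal in the message.
pairs = [("taskhash", "9cd45da"), ("taskhash", "f1c4cca")]

print(build_where(pairs))        # {'taskhash': 'f1c4cca'}
print(build_where_multi(pairs))  # {'taskhash': ['9cd45da', 'f1c4cca']}
```

The second form only helps if the server also accepts a list for a column, which is the API change being proposed.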
* Re: [bitbake-devel] Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Joshua Watt @ 2025-02-26 16:07 UTC
To: Alexandre Marques
Cc: bitbake-devel@lists.openembedded.org, Helios, node0-integration-build@list.bmw.com

On Wed, Feb 26, 2025 at 2:51 AM Alexandre Marques
<Alexandre.Marques@criticaltechworks.com> wrote:
>
> Well yes, but as far as I understand, keys get overwritten, and the key is the actual db column, meaning we can't really pass multiple hashes, just "refine" the query.
>
> Perhaps I'm missing something.
> What I was proposing would be supporting something similar to this:
> Client Side:
> $ ./bitbake-hashclient --address localhost:8688 gc-mark alive --where 9cd45da f1c4cca
>
> Server Side:
> {'mark': 'alive', 'where': {'taskhash': ['9cd45da', 'f1c4cca']}}

No, you aren't missing anything, I was incorrect :)

There are 2 approaches here.

The first would be to improve bitbake-hashclient to add the input
stream command, but keep the existing API with the server. Each mark
sent through the file/pipe would result in a separate `mark` command
being sent to the server. This should still be faster since it will
reuse the same connection to the server as long as bitbake-hashclient
is running, which will save the overhead of establishing and
negotiating a connection. This also has the advantage that it doesn't
require server side changes, but it does mean a round-trip for each
`mark` command.

The second is to make a new server API to allow streaming of marks.
The protocol between the client and server allows commands to go into
a "stream" mode (which to be clear is distinct from the "input stream"
discussed above, the name overlap is unfortunate). This mode allows
the client to send newline delimited messages as fast as it wants
(usually in some large batch size), and asynchronously read the
responses from the server (see send_stream_batch() in the client
code), allowing multiple messages to be in-flight at once.
Implementing a new mark API on the server using this mechanism would
be the fastest possible way of communicating the marks. Of course the
disadvantage here is that it would require a new API on the server, so
a server upgrade would be required to use it. That said, it may be
possible to make bitbake-hashclient intelligent enough to know if this
new API exists and if so use it for the "input stream", and if not
fall back to the older messages as described above.

> > That said, the streaming command outlined above might be sufficient,
>
> Sounds reasonable.
>
> > That said, if you do write a front end
> > script like that, please consider sharing it with the rest of us :)
>
> Most definitely :)
>
> Thanks for the quick reply.
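The first approach described above can be sketched as a loop that reads one `KEY VALUE` pair per input line and issues one mark request per line over a connection that is opened only once. Everything named here is an assumption for illustration: `mark_from_stream`, the `send_mark` callable (standing in for whatever the client uses to send a single `mark` request), and the one-pair-per-line input format are not taken from bitbake-hashclient.

```python
import io


def mark_from_stream(lines, mark, send_mark):
    """Send one mark request per non-empty input line; return the total rows marked.

    `send_mark(mark, where)` is a hypothetical callable that performs a single
    request on a connection the caller has already opened, and returns the
    number of rows it marked.
    """
    total = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'KEY VALUE', got {line!r}")
        key, value = parts
        total += send_mark(mark, {key: value})
    return total


# Stand-in for a real server connection so the sketch runs on its own.
sent = []


def fake_send_mark(mark, where):
    sent.append((mark, where))
    return 1


stream = io.StringIO("taskhash 9cd45da\n\ntaskhash f1c4cca\n")
total = mark_from_stream(stream, "alive", fake_send_mark)
print(total, sent)
```

In a real client the input would be `sys.stdin` and `send_mark` would wrap the existing request path, so connection setup is paid once rather than once per hash. Each line still costs one round-trip, as noted above.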
* Re: Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Alexandre Marques @ 2025-02-27 0:42 UTC
To: bitbake-devel

On Wed, Feb 26, 2025 at 08:08 AM, Joshua Watt wrote:
> The first would be to improve bitbake-hashclient to add the input
> stream command, but keep the existing API with the server. Each mark
> sent through the file/pipe would result in a separate `mark` command
> being sent to the server. This should still be faster since it will
> reuse the same connection to the server as long as bitbake-hashclient
> is running, which will save the overhead of establishing and
> negotiating a connection. This also has the advantage that it doesn't
> require server side changes, but it does mean a round-trip for each
> `mark` command

I have a simple POC for this first approach, and it seems to be working
fine. I still need to test with the remote server to get a better sense
of how much faster it is, but based on my tests with a local server, I
expect it to still be in the "minutes".

> The second is to make a new server API to allow streaming of marks.
> The protocol between the client and server allows commands to go into
> a "stream" mode (which to be clear is distinct from the "input stream"
> discussed above, the name overlap is unfortunate). This mode allows
> the client to send newline delimited messages as fast as it wants
> (usually in some large batch size), and asynchronously read the
> responses from the server (see send_stream_batch() in the client
> code), allowing multiple messages to be in-flight at once.
> Implementing a new mark API on the server using this mechanism would
> be the fastest possible way of communicating the marks. Of course the
> disadvantage here is that it would require new API on the server, so a
> server upgrade would be required to use it. That said, it may be
> possible to make bitbake-hashclient intelligent enough to know if this
> new API exists and if so use it for the "input stream" and if not
> fall back to the older messages as described above

I haven't tried the "stream mode" yet, but I had a quick look at the
code and I don't see any obvious reason for it not to work. :) So
thanks!!

I've been trying to tackle this "make bitbake-hashclient intelligent
enough to know if this new API exists" and, to be honest, I'm struggling
a bit.

Whenever I use a command that isn't available on the server I just get
"Error talking to server: Connection closed". So far I'm not really
seeing a way of getting more info on the particular error, and blindly
assuming that "connection closed" means the new API is not available
does not seem right.

I was thinking of adding a new command to the server to "request the
API", or check if a particular command is available, but that in itself
changes the API :,) so it's a bit chicken and egg, which makes me think
this might not be the way to go.
* Re: [bitbake-devel] Clarification on Cleaning Up a Remote Hash Equivalence DB
From: Joshua Watt @ 2025-02-28 15:00 UTC
To: Alexandre.Marques
Cc: bitbake-devel

On Wed, Feb 26, 2025 at 5:42 PM Alexandre Marques via
lists.openembedded.org
<Alexandre.Marques=criticaltechworks.com@lists.openembedded.org>
wrote:
>
> I have a simple POC for this first approach, and seems to be working fine.
> I still need to test with the remote server to have a better sense of how
> much faster it is, but based on my tests with a local server, I expect it to
> still be in the "minutes".
>
> I haven't tried the "stream mode" yet, but had a quick look at the code and I
> don't see any obvious reason for it not to work. :) so thanks!!
>
> I've been trying to tackle this "make bitbake-hashclient intelligent enough to know
> if this new API exists" and tbh I'm struggling a bit..

Right, sorry about that. Every time I've gone to add a new API to the
server I've thought: the server should really be able to communicate
its version to the client, so the client knows what API is
available... but I'll do that next time :) I was certain I had
actually done it last time, but I was wrong. I might have a patch, let
me check.

> Whenever I use a command that isn't available on the server I just get "Error talking
> to server: Connection closed". So far I'm not really seeing a way of getting more
> info on the particular error, and blindly assume a "connection closed" means the
> new API is not available does not seem right.
>
> I was thinking of adding a new command to the server to "request the API", or check
> if a particular command is available, but that in itself changes the API :,)
> so its a bit chicken and egg, which makes me think this might not be the way to go..
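If the server did report a version, as suggested above, the client-side decision could be as small as a tuple comparison. This is only a sketch under stated assumptions: the dotted version format, the `MIN_STREAM_MARK_VERSION` threshold, and the two mode names are invented for illustration, and none of them come from the hash equivalence protocol discussed in this thread.

```python
# Hypothetical: first server version that would offer a streaming mark API.
MIN_STREAM_MARK_VERSION = (1, 2)


def parse_version(text):
    """Turn a dotted version string such as '1.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def choose_mark_mode(server_version):
    """Pick 'stream' when the server is new enough, otherwise 'single'.

    `server_version` is None for an older server that reports no version,
    in which case the client falls back to one request per mark.
    """
    if server_version is None:
        return "single"
    if parse_version(server_version) >= MIN_STREAM_MARK_VERSION:
        return "stream"
    return "single"


print(choose_mark_mode(None))    # single
print(choose_mark_mode("1.1"))   # single
print(choose_mark_mode("1.10"))  # stream
```

Comparing integer tuples rather than raw strings keeps "1.10" ordered after "1.2". A server that reports nothing is treated as old, which avoids guessing from a "Connection closed" error.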