* nfnetlink_queue -- why linear lookup ? @ 2021-08-13 11:55 alexandre.ferrieux 2021-08-14 21:01 ` Florian Westphal 0 siblings, 1 reply; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-13 11:55 UTC (permalink / raw) To: netfilter-devel Hello, In nfnetlink_queue.c, when receiving a verdict for a packet, its entry in the queue is looked up linearly: find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id) { ... list_for_each_entry(i, &queue->queue_list, list) { if (i->id == id) { entry = i; break; } } ... } As a result, in a situation of "highly asynchronous" verdicts, i.e. when we want some packets to linger in the queue for some time before reinjection, the mere existence of a large number of such "old packets" may incur a nonnegligible cost to the system. So I'm wondering: why is the list implemented as a simple linked list instead of an array directly indexed by the id (like file descriptors) ? Indeed, the list has a configured max size, the passed id can be bound-checked, discarded entries can simply hold a NULL, and id reuse is userspace's responsibility. So it looks like the array would yield constant-time lookup with no extra risk. What am I missing ? Thans in advance, -Alex _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-13 11:55 nfnetlink_queue -- why linear lookup ? alexandre.ferrieux @ 2021-08-14 21:01 ` Florian Westphal 2021-08-14 21:05 ` alexandre.ferrieux 0 siblings, 1 reply; 20+ messages in thread From: Florian Westphal @ 2021-08-14 21:01 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: netfilter-devel alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: > find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id) > { > ... > list_for_each_entry(i, &queue->queue_list, list) { > if (i->id == id) { > entry = i; > break; > } > } > ... > } > > As a result, in a situation of "highly asynchronous" verdicts, i.e. when we > want some packets to linger in the queue for some time before reinjection, > the mere existence of a large number of such "old packets" may incur a > nonnegligible cost to the system. > > So I'm wondering: why is the list implemented as a simple linked list > instead of an array directly indexed by the id (like file descriptors) ? Because when this was implemented "highly asynchronous" was not on the radar. All users of this (that I know of) do in-order verdicts. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-14 21:01 ` Florian Westphal @ 2021-08-14 21:05 ` alexandre.ferrieux 2021-08-14 21:12 ` Florian Westphal 0 siblings, 1 reply; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-14 21:05 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On 8/14/21 11:01 PM, Florian Westphal wrote: > alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: >> find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id) >> { >> ... >> list_for_each_entry(i, &queue->queue_list, list) { >> if (i->id == id) { >> entry = i; >> break; >> } >> } >> ... >> } >> >> As a result, in a situation of "highly asynchronous" verdicts, i.e. when we >> want some packets to linger in the queue for some time before reinjection, >> the mere existence of a large number of such "old packets" may incur a >> nonnegligible cost to the system. >> >> So I'm wondering: why is the list implemented as a simple linked list >> instead of an array directly indexed by the id (like file descriptors) ? > > Because when this was implemented "highly asynchronous" was not on the > radar. All users of this (that I know of) do in-order verdicts. So, O(N) instead of O(1) just because "I currently can't imagine N>5" ? Would a patch to that effect be rejected ? _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-14 21:05 ` alexandre.ferrieux @ 2021-08-14 21:12 ` Florian Westphal 2021-08-15 12:17 ` alexandre.ferrieux 0 siblings, 1 reply; 20+ messages in thread From: Florian Westphal @ 2021-08-14 21:12 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: > > Because when this was implemented "highly asynchronous" was not on the > > radar. All users of this (that I know of) do in-order verdicts. > > So, O(N) instead of O(1) just because "I currently can't imagine N>5" ? Seems so. THis code was written 21 years ago. > Would a patch to that effect be rejected ? Probably not, depends on the implementation. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-14 21:12 ` Florian Westphal @ 2021-08-15 12:17 ` alexandre.ferrieux 2021-08-15 13:07 ` Pablo Neira Ayuso 0 siblings, 1 reply; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-15 12:17 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel On 8/14/21 11:12 PM, Florian Westphal wrote: > alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: >> > Because when this was implemented "highly asynchronous" was not on the >> > radar. All users of this (that I know of) do in-order verdicts. >> >> So, O(N) instead of O(1) just because "I currently can't imagine N>5" ? > > Seems so. THis code was written 21 years ago. > >> Would a patch to that effect be rejected ? > > Probably not, depends on the implementation. Sad: while the (necessarily) async nature of the kernel/user interface naturally suggests this change, one specific part of the existing API complicates things: batch verdicts ! Indeed, the very notion of an "id range" for batch verdicts, forbids the simple approach of reused small integers as array indices. So, the only way forward would be a separate hashtable on ids. Much less elegant + risk of slight overhead for housekeeping. I stand disappointed :) PS: what is the intended dominant use case for batch verdicts ? _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 12:17 ` alexandre.ferrieux @ 2021-08-15 13:07 ` Pablo Neira Ayuso 2021-08-15 13:32 ` alexandre.ferrieux 2021-08-15 13:33 ` alexandre.ferrieux 0 siblings, 2 replies; 20+ messages in thread From: Pablo Neira Ayuso @ 2021-08-15 13:07 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: [...] > So, the only way forward would be a separate hashtable on ids. Using the rhashtable implementation is fine for this, it's mostly boilerplate code that is needed to use it and there are plenty of examples in the kernel tree if you need a reference. [...] > PS: what is the intended dominant use case for batch verdicts ? Issuing a batch containing several packets helps to amortize the cost of the syscall. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 13:07 ` Pablo Neira Ayuso @ 2021-08-15 13:32 ` alexandre.ferrieux 2021-08-15 14:12 ` Pablo Neira Ayuso 2021-08-15 13:33 ` alexandre.ferrieux 1 sibling, 1 reply; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-15 13:32 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote: > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: > [...] >> So, the only way forward would be a separate hashtable on ids. > > Using the rhashtable implementation is fine for this, it's mostly > boilerplate code that is needed to use it and there are plenty of > examples in the kernel tree if you need a reference. Thanks, that's indeed pretty simple. I was just worried that people would object to adding even the slightest overhead (hash_add/hash_del) to the existing code path, that satisfies 99% of uses (LIFO). What do you think ? >> PS: what is the intended dominant use case for batch verdicts ? > > Issuing a batch containing several packets helps to amortize the > cost of the syscall. Yes, but that could also be achieved by passing an array of ids. The extra constraint of using a (contiguous) range means that there is no outlier. This seems to imply that ranges are no help when flows are multiplexed. Or maybe, the assumption was that bursts tend to be homogeneous ? _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 13:32 ` alexandre.ferrieux @ 2021-08-15 14:12 ` Pablo Neira Ayuso 2021-08-15 18:47 ` alexandre.ferrieux 2021-08-16 12:04 ` Duncan Roe 0 siblings, 2 replies; 20+ messages in thread From: Pablo Neira Ayuso @ 2021-08-15 14:12 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote: > On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote: > > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: > > [...] > > > So, the only way forward would be a separate hashtable on ids. > > > > Using the rhashtable implementation is fine for this, it's mostly > > boilerplate code that is needed to use it and there are plenty of > > examples in the kernel tree if you need a reference. > > Thanks, that's indeed pretty simple. I was just worried that people would > object to adding even the slightest overhead (hash_add/hash_del) to the > existing code path, that satisfies 99% of uses (LIFO). What do you think ? It should be possible to maintain both the list and the hashtable, AFAICS, the batch callback still needs the queue_list. > > > PS: what is the intended dominant use case for batch verdicts ? > > > > Issuing a batch containing several packets helps to amortize the > > cost of the syscall. > > Yes, but that could also be achieved by passing an array of ids. You mean, one single sendmsg() with several netlink messages, that would also work to achieve a similar batching effect. > The extra constraint of using a (contiguous) range means that there > is no outlier. This seems to imply that ranges are no help when > flows are multiplexed. Or maybe, the assumption was that bursts tend > to be homogeneous ? What is your usecase? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 14:12 ` Pablo Neira Ayuso @ 2021-08-15 18:47 ` alexandre.ferrieux 2021-08-16 9:05 ` Pablo Neira Ayuso 2021-08-16 12:04 ` Duncan Roe 1 sibling, 1 reply; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-15 18:47 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel On 8/15/21 4:12 PM, Pablo Neira Ayuso wrote: > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote: >> >> [...] I was just worried that people would >> object to adding even the slightest overhead (hash_add/hash_del) to the >> existing code path, that satisfies 99% of uses (LIFO). What do you think ? > > It should be possible to maintain both the list and the hashtable, > AFAICS, the batch callback still needs the queue_list. Yes, but to maintain the hashtable, we need to bother the "normal" code path with hash_add/del. Not much, but still, some overhead... >> > > PS: what is the intended dominant use case for batch verdicts ? >> > >> > Issuing a batch containing several packets helps to amortize the >> > cost of the syscall. >> >> Yes, but that could also be achieved by passing an array of ids. > > You mean, one single sendmsg() with several netlink messages, that > would also work to achieve a similar batching effect. Yes, a full spectrum of batching methods are possible. If we're to minimize the number of bytes crossing the kernel/user boundary though, an array of ids looks like the way to go (4 bytes per packet, assuming uint32 ids). >> The extra constraint of using a (contiguous) range means that there >> is no outlier. This seems to imply that ranges are no help when >> flows are multiplexed. Or maybe, the assumption was that bursts tend >> to be homogeneous ? > > What is your usecase? For O(1) lookup: My primary motivation is for transparent proxies and userland qdiscs. In both cases, specific packets must be held for some time and reinjected at a later time which is not computed by a simple, fixed delay, but rather triggered by some external event. My secondary motivation is that the netfilter queue is a beautifully asynchronous mechanism, and absolutely nothing in its definition limits it to the dumb FIFO-of-instantaneous-drop-decisions that seems to dominate sample code. For the deprecation of id-range-based batching: It seems that as soon as two different packet streams are muxed in the queue, one deserving verdict A and the other verdict B, contiguous id ranges of a given verdict may be very small. But I realize I'm 20 years late to complain :) That being said, the Doxygen of the userland nfqueue API mentions being DEPRECATED... So what is the recommended replacement ? _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 18:47 ` alexandre.ferrieux @ 2021-08-16 9:05 ` Pablo Neira Ayuso 2021-08-16 10:53 ` alexandre.ferrieux 0 siblings, 1 reply; 20+ messages in thread From: Pablo Neira Ayuso @ 2021-08-16 9:05 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote: > > > On 8/15/21 4:12 PM, Pablo Neira Ayuso wrote: > > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote: > >> > > > [...] I was just worried that people would >> object to adding even the slightest overhead (hash_add/hash_del) to the > > > existing code path, that satisfies 99% of uses (LIFO). What do you think ? > > > > It should be possible to maintain both the list and the hashtable, > > AFAICS, the batch callback still needs the queue_list. > > Yes, but to maintain the hashtable, we need to bother the "normal" code path > with hash_add/del. Not much, but still, some overhead... Probably you can collect some numbers to make sure this is not a theoretical issue. > > > > > PS: what is the intended dominant use case for batch verdicts ? > > > > > Issuing a batch containing several packets helps to amortize the > > > > cost of the syscall. > > > > > > Yes, but that could also be achieved by passing an array of ids. > > > > You mean, one single sendmsg() with several netlink messages, that > > would also work to achieve a similar batching effect. > > Yes, a full spectrum of batching methods are possible. If we're to minimize > the number of bytes crossing the kernel/user boundary though, an array of > ids looks like the way to go (4 bytes per packet, assuming uint32 ids). Are you proposing a new batching mechanism? > > > The extra constraint of using a (contiguous) range means that there > > > is no outlier. This seems to imply that ranges are no help when > > > flows are multiplexed. Or maybe, the assumption was that bursts tend > > > to be homogeneous ? > > > > What is your usecase? > > For O(1) lookup: > > My primary motivation is for transparent proxies and userland qdiscs. In > both cases, specific packets must be held for some time and reinjected at a > later time which is not computed by a simple, fixed delay, but rather > triggered by some external event. > > My secondary motivation is that the netfilter queue is a beautifully > asynchronous mechanism, and absolutely nothing in its definition limits it > to the dumb FIFO-of-instantaneous-drop-decisions that seems to dominate > sample code. I see. Thanks for telling me. > For the deprecation of id-range-based batching: > > It seems that as soon as two different packet streams are muxed in the > queue, one deserving verdict A and the other verdict B, contiguous id ranges > of a given verdict may be very small. But I realize I'm 20 years late to > complain :) As I said, you can place several netlink messages in one single sendmsg() call, then they do not need to be contiguous. > That being said, the Doxygen of the userland nfqueue API mentions being > DEPRECATED... So what is the recommended replacement ? What API are you refering to specifically? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 9:05 ` Pablo Neira Ayuso @ 2021-08-16 10:53 ` alexandre.ferrieux 2021-08-16 10:56 ` Florian Westphal ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-16 10:53 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote: > On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote: >> >> >> [...] to maintain the hashtable, we need to bother the "normal" code path >> with hash_add/del. Not much, but still, some overhead... > > Probably you can collect some numbers to make sure this is not a > theoretical issue. 'k, will do :) >> Yes, a full spectrum of batching methods are possible. If we're to minimize >> the number of bytes crossing the kernel/user boundary though, an array of >> ids looks like the way to go (4 bytes per packet, assuming uint32 ids). > > Are you proposing a new batching mechanism? Well, the problem is backwards compatibility. Indeed I'd propose more flexible batching via an array of ids instead of a maxid. But the main added value of this flexibility is to enable reused-small-integers ids, like file descriptors. As long as the maxid API remains in place, this is impossible. >> That being said, the Doxygen of the userland nfqueue API mentions being >> DEPRECATED... So what is the recommended replacement ? > > What API are you refering to specifically? I'm referring to the nfq API documented here: https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html It starts with "Queue handling [DEPRECATED]"... _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 10:53 ` alexandre.ferrieux @ 2021-08-16 10:56 ` Florian Westphal 2021-08-16 11:07 ` alexandre.ferrieux 2021-08-16 11:19 ` Pablo Neira Ayuso 2021-08-16 11:42 ` Duncan Roe 2 siblings, 1 reply; 20+ messages in thread From: Florian Westphal @ 2021-08-16 10:56 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: > Well, the problem is backwards compatibility. Indeed I'd propose more > flexible batching via an array of ids instead of a maxid. But the main added > value of this flexibility is to enable reused-small-integers ids, like file > descriptors. As long as the maxid API remains in place, this is impossible. You cannot remove the maxid API. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 10:56 ` Florian Westphal @ 2021-08-16 11:07 ` alexandre.ferrieux 0 siblings, 0 replies; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-16 11:07 UTC (permalink / raw) To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel On 8/16/21 12:56 PM, Florian Westphal wrote: > alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote: >> Well, the problem is backwards compatibility. Indeed I'd propose more >> flexible batching via an array of ids instead of a maxid. But the main added >> value of this flexibility is to enable reused-small-integers ids, like file >> descriptors. As long as the maxid API remains in place, this is impossible. > > You cannot remove the maxid API. Ok, I'll just propose an id-hashtable patch. I'm thinking of using "modulo queue_maxlen" instead of a traditional hash function (and queue_maxlen as table size) , as this would yield perfect hashing for the FIFO case. _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 10:53 ` alexandre.ferrieux 2021-08-16 10:56 ` Florian Westphal @ 2021-08-16 11:19 ` Pablo Neira Ayuso 2021-08-16 11:42 ` Duncan Roe 2 siblings, 0 replies; 20+ messages in thread From: Pablo Neira Ayuso @ 2021-08-16 11:19 UTC (permalink / raw) To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel On Mon, Aug 16, 2021 at 12:53:33PM +0200, alexandre.ferrieux@orange.com wrote: > > > On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote: > > On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote: > > > > > > > > > [...] to maintain the hashtable, we need to bother the "normal" code path > > > with hash_add/del. Not much, but still, some overhead... > > > > Probably you can collect some numbers to make sure this is not a > > theoretical issue. > > 'k, will do :) > > > > Yes, a full spectrum of batching methods are possible. If we're to minimize > > > the number of bytes crossing the kernel/user boundary though, an array of > > > ids looks like the way to go (4 bytes per packet, assuming uint32 ids). > > > > Are you proposing a new batching mechanism? > > Well, the problem is backwards compatibility. Indeed I'd propose more > flexible batching via an array of ids instead of a maxid. But the main added > value of this flexibility is to enable reused-small-integers ids, like file > descriptors. As long as the maxid API remains in place, this is impossible. OK, I'll compare this to the sendmsg() call including several netlink messages once you send your patch. > > > That being said, the Doxygen of the userland nfqueue API mentions being > > > DEPRECATED... So what is the recommended replacement ? > > > > What API are you refering to specifically? > > I'm referring to the nfq API documented here: > > https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html > > It starts with "Queue handling [DEPRECATED]"... libmnl is preferred these days, specifically for advanced stuff. I've been getting reports from users about this old API: It is simple enough and people don't have to deal with netlink details to get packets from the kernel for simple stuff. Probably we should turn this deprecation into using libmnl and the helpers that libnetfilter_queue provides is recommended. Or translate this old API to use libmnl behind the curtain. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 10:53 ` alexandre.ferrieux 2021-08-16 10:56 ` Florian Westphal 2021-08-16 11:19 ` Pablo Neira Ayuso @ 2021-08-16 11:42 ` Duncan Roe 2 siblings, 0 replies; 20+ messages in thread From: Duncan Roe @ 2021-08-16 11:42 UTC (permalink / raw) To: alexandre.ferrieux Cc: Pablo Neira Ayuso, Florian Westphal, Netfilter Development On Mon, Aug 16, 2021 at 12:53:33PM +0200, alexandre.ferrieux@orange.com wrote: > > > On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote: > > On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote: > > > > > > > > > [...] to maintain the hashtable, we need to bother the "normal" code path > > > with hash_add/del. Not much, but still, some overhead... > > > > Probably you can collect some numbers to make sure this is not a > > theoretical issue. > > 'k, will do :) > > > > Yes, a full spectrum of batching methods are possible. If we're to minimize > > > the number of bytes crossing the kernel/user boundary though, an array of > > > ids looks like the way to go (4 bytes per packet, assuming uint32 ids). > > > > Are you proposing a new batching mechanism? > > Well, the problem is backwards compatibility. Indeed I'd propose more > flexible batching via an array of ids instead of a maxid. But the main added > value of this flexibility is to enable reused-small-integers ids, like file > descriptors. As long as the maxid API remains in place, this is impossible. > > > > That being said, the Doxygen of the userland nfqueue API mentions being > > > DEPRECATED... So what is the recommended replacement ? > > > > What API are you refering to specifically? > > > I'm referring to the nfq API documented here: > > > https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html > > It starts with "Queue handling [DEPRECATED]"... > The deprecated functions are not going away, it's just not recommended to use them in new code. The replacements are the non-deprecated functions. Here's a bit of background: libnetfilter_queue is essentially a blend of 2 separate libraries. The older (and deprecated) library uses libnfnetlink to communicate with the kernel. The newer (current) library uses libmnl to communicate with the kernel. You either use functions from one library or the other: they don't mix. libnetfilter_queue is a wrapper for all libnfnetlink functions except nfnl_rcvbufsiz(), while it only provides helpers for *some* libmnl functions. The main new feature of the current libnetfilter_queue library is a suite of helpers for packet mangling. These manage checksums and other required header manipulation for ipv4 & ipv6 and the upb & tcp transport layers. Current libnetfilter_queue also provides inclusion of a mangled packet in a verdict - not available from the deprecated library AFAICS. Current libnetfilter_queue doesn't provide batch verdicts. I don't know why - perhaps Pablo can elaborate. Userland support for any new featuer would normally go into libmnl. Cheers ... Duncan. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 14:12 ` Pablo Neira Ayuso 2021-08-15 18:47 ` alexandre.ferrieux @ 2021-08-16 12:04 ` Duncan Roe 2021-08-16 16:10 ` Pablo Neira Ayuso 1 sibling, 1 reply; 20+ messages in thread From: Duncan Roe @ 2021-08-16 12:04 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: alexandre.ferrieux, Florian Westphal, Pablo Neira Ayuso, Netfilter Development On Sun, Aug 15, 2021 at 04:12:04PM +0200, Pablo Neira Ayuso wrote: > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote: > > On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote: > > > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: > > > [...] > > > > So, the only way forward would be a separate hashtable on ids. > > > > > > Using the rhashtable implementation is fine for this, it's mostly > > > boilerplate code that is needed to use it and there are plenty of > > > examples in the kernel tree if you need a reference. > > > > Thanks, that's indeed pretty simple. I was just worried that people would > > object to adding even the slightest overhead (hash_add/hash_del) to the > > existing code path, that satisfies 99% of uses (LIFO). What do you think ? > > It should be possible to maintain both the list and the hashtable, > AFAICS, the batch callback still needs the queue_list. > > > > > PS: what is the intended dominant use case for batch verdicts ? > > > > > > Issuing a batch containing several packets helps to amortize the > > > cost of the syscall. > > > > Yes, but that could also be achieved by passing an array of ids. > > You mean, one single sendmsg() with several netlink messages, that > would also work to achieve a similar batching effect. sendmsg() can actually be slower. I gave up on a project to send verdicts using sendmsg() (especially with large mangled packets), because benchmarking showed: 1. on a 3YO laptop, no discernable difference. 2. On a 1YO Ryzen desktop, sendmsg() significantly slower. sendmsg() sent 3 or 4 buffers: 1. leading netlink message blocks; 2. the packet; 3. padding to 4 bytes (if required); last: trailing netlink message blocks. sendmsg() saved moving these data blocks into a single buffer. But it introduced the overhead of the kernel's having to validate 4 userland buffers instead of 1. A colleague suggested the Ryzen result is because of having 128-bit registers to move data. I guess that must be it. The spreadsheets from the tests are up on GitHub: https://github.com/duncan-roe/nfq6/blob/main/laptop_timings.ods https://github.com/duncan-roe/nfq6/blob/main/timings.ods Cheers ... Duncan. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 12:04 ` Duncan Roe @ 2021-08-16 16:10 ` Pablo Neira Ayuso 2021-08-16 16:15 ` Florian Westphal 2021-08-17 4:03 ` Duncan Roe 0 siblings, 2 replies; 20+ messages in thread From: Pablo Neira Ayuso @ 2021-08-16 16:10 UTC (permalink / raw) To: alexandre.ferrieux, Florian Westphal, Netfilter Development On Mon, Aug 16, 2021 at 10:04:58PM +1000, Duncan Roe wrote: > On Sun, Aug 15, 2021 at 04:12:04PM +0200, Pablo Neira Ayuso wrote: > > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote: > > > On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote: > > > > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: > > > > [...] > > > > > So, the only way forward would be a separate hashtable on ids. > > > > > > > > Using the rhashtable implementation is fine for this, it's mostly > > > > boilerplate code that is needed to use it and there are plenty of > > > > examples in the kernel tree if you need a reference. > > > > > > Thanks, that's indeed pretty simple. I was just worried that people would > > > object to adding even the slightest overhead (hash_add/hash_del) to the > > > existing code path, that satisfies 99% of uses (LIFO). What do you think ? > > > > It should be possible to maintain both the list and the hashtable, > > AFAICS, the batch callback still needs the queue_list. > > > > > > > PS: what is the intended dominant use case for batch verdicts ? > > > > > > > > Issuing a batch containing several packets helps to amortize the > > > > cost of the syscall. > > > > > > Yes, but that could also be achieved by passing an array of ids. > > > > You mean, one single sendmsg() with several netlink messages, that > > would also work to achieve a similar batching effect. > > sendmsg() can actually be slower. I gave up on a project to send verdicts using > sendmsg() (especially with large mangled packets), because benchmarking showed: > > 1. on a 3YO laptop, no discernable difference. > > 2. On a 1YO Ryzen desktop, sendmsg() significantly slower. > > sendmsg() sent 3 or 4 buffers: 1. leading netlink message blocks; 2. the packet; > 3. padding to 4 bytes (if required); last: trailing netlink message blocks. > > sendmsg() saved moving these data blocks into a single buffer. But it introduced > the overhead of the kernel's having to validate 4 userland buffers instead of 1. > > A colleague suggested the Ryzen result is because of having 128-bit registers to > move data. I guess that must be it. > > The spreadsheets from the tests are up on GitHub: > https://github.com/duncan-roe/nfq6/blob/main/laptop_timings.ods > https://github.com/duncan-roe/nfq6/blob/main/timings.ods Just a quick test creating 64K entries in the conntrack table, using libmnl. - With batching # time ./batch real 0m0,122s user 0m0,010s sys 0m0,112s - Without batching # time ./nobatch real 0m0,195s user 0m0,049s sys 0m0,146s Just a sample, repeating the tests show similar numbers. Submitting a verdict on a packet via nfnetlink_queue is similar to creating an ct entry through ctnetlink (both use the send syscall). If you only have a few packets waiting for verdict in userspace, then probably batching is not helping much. BTW, leading and trailing netlink message blocks to the kernel are not required for nfnetlink_queue. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 16:10 ` Pablo Neira Ayuso @ 2021-08-16 16:15 ` Florian Westphal 2021-08-17 4:03 ` Duncan Roe 1 sibling, 0 replies; 20+ messages in thread From: Florian Westphal @ 2021-08-16 16:15 UTC (permalink / raw) To: Pablo Neira Ayuso Cc: alexandre.ferrieux, Florian Westphal, Netfilter Development Pablo Neira Ayuso <pablo@netfilter.org> wrote: > Submitting a verdict on a packet via nfnetlink_queue is similar to > creating an ct entry through ctnetlink (both use the send syscall). Reinject happens in the context of the process, so batching allows multiple skbs to get transmitted before returning to userspace. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-16 16:10 ` Pablo Neira Ayuso 2021-08-16 16:15 ` Florian Westphal @ 2021-08-17 4:03 ` Duncan Roe 1 sibling, 0 replies; 20+ messages in thread From: Duncan Roe @ 2021-08-17 4:03 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Netfilter Development On Mon, Aug 16, 2021 at 06:10:09PM +0200, Pablo Neira Ayuso wrote: [...] > > BTW, leading and trailing netlink message blocks to the kernel are not > required for nfnetlink_queue. Sorry, sloppy explanation on my part. This had nothing to do with batches. The idea was to attain a zero-copy nfq program by using sendmsg with an iov pointing at the mangled packet, wherever it is. A previous iov has to point to the struct nlmsghdr that starts the message. That first buffer ends with the struct nlattr for the packet data addressed by the next iov. If padding was required, I was sending a 3rd buffer. It's almost certainly fine to append the padding to the packet buffer instead, and for sure fine if check `pktb_tailroom > pad` first. Then I was sending the ACCEPT verdict in a 4th buffer. As you point out, I could have sent that earlier, before the trailing struct nlattr. When I originally started writing nfq programs I was concerned that the kernel might accept the original packet as soon as it encountered the ACCEPT, and miss that there was an updated packet following. I now realise it doesn't work that way, but got stuck with the old order of doing thins. So the code would be down to trading 1 kernel user buffer validate against 2 userland data copies. Which method is faster might well depend on packet length. I don't plan to pursue this further for now. Cheers ... Duncan. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: nfnetlink_queue -- why linear lookup ? 2021-08-15 13:07 ` Pablo Neira Ayuso 2021-08-15 13:32 ` alexandre.ferrieux @ 2021-08-15 13:33 ` alexandre.ferrieux 1 sibling, 0 replies; 20+ messages in thread From: alexandre.ferrieux @ 2021-08-15 13:33 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel (reset, typo: LIFO->FIFO) On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote: > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote: > [...] >> So, the only way forward would be a separate hashtable on ids. > > Using the rhashtable implementation is fine for this, it's mostly > boilerplate code that is needed to use it and there are plenty of > examples in the kernel tree if you need a reference. Thanks, that's indeed pretty simple. I was just worried that people would object to adding even the slightest overhead (hash_add/hash_del) to the existing code path, that satisfies 99% of uses (FIFO). What do you think ? >> PS: what is the intended dominant use case for batch verdicts ? > > Issuing a batch containing several packets helps to amortize the > cost of the syscall. Yes, but that could also be achieved by passing an array of ids. The extra constraint of using a (contiguous) range means that there is no outlier. This seems to imply that ranges are no help when flows are multiplexed. Or maybe, the assumption was that bursts tend to be homogeneous ? _________________________________________________________________________________________________________________________ Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration, Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci. This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2021-08-17 4:03 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-08-13 11:55 nfnetlink_queue -- why linear lookup ? alexandre.ferrieux 2021-08-14 21:01 ` Florian Westphal 2021-08-14 21:05 ` alexandre.ferrieux 2021-08-14 21:12 ` Florian Westphal 2021-08-15 12:17 ` alexandre.ferrieux 2021-08-15 13:07 ` Pablo Neira Ayuso 2021-08-15 13:32 ` alexandre.ferrieux 2021-08-15 14:12 ` Pablo Neira Ayuso 2021-08-15 18:47 ` alexandre.ferrieux 2021-08-16 9:05 ` Pablo Neira Ayuso 2021-08-16 10:53 ` alexandre.ferrieux 2021-08-16 10:56 ` Florian Westphal 2021-08-16 11:07 ` alexandre.ferrieux 2021-08-16 11:19 ` Pablo Neira Ayuso 2021-08-16 11:42 ` Duncan Roe 2021-08-16 12:04 ` Duncan Roe 2021-08-16 16:10 ` Pablo Neira Ayuso 2021-08-16 16:15 ` Florian Westphal 2021-08-17 4:03 ` Duncan Roe 2021-08-15 13:33 ` alexandre.ferrieux
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).