nfnetlink_queue -- why linear lookup ?

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* nfnetlink_queue -- why linear lookup ?
@ 2021-08-13 11:55 alexandre.ferrieux
  2021-08-14 21:01 ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-13 11:55 UTC (permalink / raw)
  To: netfilter-devel

Hello,

In nfnetlink_queue.c, when receiving a verdict for a packet, its entry in the 
queue is looked up linearly:

   find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
   {
     ...
     list_for_each_entry(i, &queue->queue_list, list) {
       if (i->id == id) {
         entry = i;
         break;
       }
     }
     ...
   }

As a result, in a situation of "highly asynchronous" verdicts, i.e. when we want 
some packets to linger in the queue for some time before reinjection, the mere 
existence of a large number of such "old packets" may incur a nonnegligible cost 
to the system.

So I'm wondering: why is the list implemented as a simple linked list instead of 
an array directly indexed by the id (like file descriptors) ?

Indeed, the list has a configured max size, the passed id can be bound-checked, 
discarded entries can simply hold a NULL, and id reuse is userspace's 
responsibility. So it looks like the array would yield constant-time lookup with 
no extra risk.

What am I missing ?

Thans in advance,

-Alex

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-13 11:55 nfnetlink_queue -- why linear lookup ? alexandre.ferrieux
@ 2021-08-14 21:01 ` Florian Westphal
  2021-08-14 21:05   ` alexandre.ferrieux
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2021-08-14 21:01 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: netfilter-devel

alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
>   find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
>   {
>     ...
>     list_for_each_entry(i, &queue->queue_list, list) {
>       if (i->id == id) {
>         entry = i;
>         break;
>       }
>     }
>     ...
>   }
> 
> As a result, in a situation of "highly asynchronous" verdicts, i.e. when we
> want some packets to linger in the queue for some time before reinjection,
> the mere existence of a large number of such "old packets" may incur a
> nonnegligible cost to the system.
> 
> So I'm wondering: why is the list implemented as a simple linked list
> instead of an array directly indexed by the id (like file descriptors) ?

Because when this was implemented "highly asynchronous" was not on the
radar.  All users of this (that I know of) do in-order verdicts.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-14 21:01 ` Florian Westphal
@ 2021-08-14 21:05   ` alexandre.ferrieux
  2021-08-14 21:12     ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-14 21:05 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 8/14/21 11:01 PM, Florian Westphal wrote:
> alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
>>   find_dequeue_entry(struct nfqnl_instance *queue, unsigned int id)
>>   {
>>     ...
>>     list_for_each_entry(i, &queue->queue_list, list) {
>>       if (i->id == id) {
>>         entry = i;
>>         break;
>>       }
>>     }
>>     ...
>>   }
>> 
>> As a result, in a situation of "highly asynchronous" verdicts, i.e. when we
>> want some packets to linger in the queue for some time before reinjection,
>> the mere existence of a large number of such "old packets" may incur a
>> nonnegligible cost to the system.
>> 
>> So I'm wondering: why is the list implemented as a simple linked list
>> instead of an array directly indexed by the id (like file descriptors) ?
> 
> Because when this was implemented "highly asynchronous" was not on the
> radar.  All users of this (that I know of) do in-order verdicts.

So, O(N) instead of O(1) just because "I currently can't imagine N>5" ?

Would a patch to that effect be rejected ?

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-14 21:05   ` alexandre.ferrieux
@ 2021-08-14 21:12     ` Florian Westphal
  2021-08-15 12:17       ` alexandre.ferrieux
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2021-08-14 21:12 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel

alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
> > Because when this was implemented "highly asynchronous" was not on the
> > radar.  All users of this (that I know of) do in-order verdicts.
> 
> So, O(N) instead of O(1) just because "I currently can't imagine N>5" ?

Seems so. THis code was written 21 years ago.

> Would a patch to that effect be rejected ?

Probably not, depends on the implementation.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-14 21:12     ` Florian Westphal
@ 2021-08-15 12:17       ` alexandre.ferrieux
  2021-08-15 13:07         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-15 12:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On 8/14/21 11:12 PM, Florian Westphal wrote:
> alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
>> > Because when this was implemented "highly asynchronous" was not on the
>> > radar.  All users of this (that I know of) do in-order verdicts.
>> 
>> So, O(N) instead of O(1) just because "I currently can't imagine N>5" ?
> 
> Seems so. THis code was written 21 years ago.
> 
>> Would a patch to that effect be rejected ?
> 
> Probably not, depends on the implementation.

Sad: while the (necessarily) async nature of the kernel/user interface naturally 
suggests this change, one specific part of the existing API complicates things: 
batch verdicts !

Indeed, the very notion of an "id range" for batch verdicts, forbids the simple 
approach of reused small integers as array indices.

So, the only way forward would be a separate hashtable on ids. Much less elegant 
+ risk of slight overhead for housekeeping. I stand disappointed :)

PS: what is the intended dominant use case for batch verdicts ?

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 12:17       ` alexandre.ferrieux
@ 2021-08-15 13:07         ` Pablo Neira Ayuso
  2021-08-15 13:32           ` alexandre.ferrieux
  2021-08-15 13:33           ` alexandre.ferrieux
  0 siblings, 2 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2021-08-15 13:07 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel

On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
[...]
> So, the only way forward would be a separate hashtable on ids.

Using the rhashtable implementation is fine for this, it's mostly
boilerplate code that is needed to use it and there are plenty of
examples in the kernel tree if you need a reference.

[...]
> PS: what is the intended dominant use case for batch verdicts ?

Issuing a batch containing several packets helps to amortize the
cost of the syscall.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 13:07         ` Pablo Neira Ayuso
@ 2021-08-15 13:32           ` alexandre.ferrieux
  2021-08-15 14:12             ` Pablo Neira Ayuso
  2021-08-15 13:33           ` alexandre.ferrieux
  1 sibling, 1 reply; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-15 13:32 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote:
> On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
> [...]
>> So, the only way forward would be a separate hashtable on ids.
> 
> Using the rhashtable implementation is fine for this, it's mostly
> boilerplate code that is needed to use it and there are plenty of
> examples in the kernel tree if you need a reference.

Thanks, that's indeed pretty simple. I was just worried that people would object 
to adding even the slightest overhead (hash_add/hash_del) to the existing code 
path, that satisfies 99% of uses (LIFO). What do you think ?

>> PS: what is the intended dominant use case for batch verdicts ?
> 
> Issuing a batch containing several packets helps to amortize the
> cost of the syscall.

Yes, but that could also be achieved by passing an array of ids. The extra 
constraint of using a (contiguous) range means that there is no outlier. This 
seems to imply that ranges are no help when flows are multiplexed. Or maybe, the 
assumption was that bursts tend to be homogeneous ?

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 13:32           ` alexandre.ferrieux
@ 2021-08-15 14:12             ` Pablo Neira Ayuso
  2021-08-15 18:47               ` alexandre.ferrieux
  2021-08-16 12:04               ` Duncan Roe
  0 siblings, 2 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2021-08-15 14:12 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel

On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote:
> On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote:
> > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
> > [...]
> > > So, the only way forward would be a separate hashtable on ids.
> > 
> > Using the rhashtable implementation is fine for this, it's mostly
> > boilerplate code that is needed to use it and there are plenty of
> > examples in the kernel tree if you need a reference.
> 
> Thanks, that's indeed pretty simple. I was just worried that people would
> object to adding even the slightest overhead (hash_add/hash_del) to the
> existing code path, that satisfies 99% of uses (LIFO). What do you think ?

It should be possible to maintain both the list and the hashtable,
AFAICS, the batch callback still needs the queue_list.

> > > PS: what is the intended dominant use case for batch verdicts ?
> > 
> > Issuing a batch containing several packets helps to amortize the
> > cost of the syscall.
> 
> Yes, but that could also be achieved by passing an array of ids.

You mean, one single sendmsg() with several netlink messages, that
would also work to achieve a similar batching effect.

> The extra constraint of using a (contiguous) range means that there
> is no outlier.  This seems to imply that ranges are no help when
> flows are multiplexed. Or maybe, the assumption was that bursts tend
> to be homogeneous ?

What is your usecase?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 14:12             ` Pablo Neira Ayuso
@ 2021-08-15 18:47               ` alexandre.ferrieux
  2021-08-16  9:05                 ` Pablo Neira Ayuso
  2021-08-16 12:04               ` Duncan Roe
  1 sibling, 1 reply; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-15 18:47 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

On 8/15/21 4:12 PM, Pablo Neira Ayuso wrote:
> On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote:
 >>
>> [...] I was just worried that people would >> object to adding even the slightest overhead (hash_add/hash_del) to the
>> existing code path, that satisfies 99% of uses (LIFO). What do you think ?
> 
> It should be possible to maintain both the list and the hashtable,
> AFAICS, the batch callback still needs the queue_list.

Yes, but to maintain the hashtable, we need to bother the "normal" code path 
with hash_add/del. Not much, but still, some overhead...

>> > > PS: what is the intended dominant use case for batch verdicts ?
>> > 
>> > Issuing a batch containing several packets helps to amortize the
>> > cost of the syscall.
>> 
>> Yes, but that could also be achieved by passing an array of ids.
> 
> You mean, one single sendmsg() with several netlink messages, that
> would also work to achieve a similar batching effect.

Yes, a full spectrum of batching methods are possible. If we're to minimize the 
number of bytes crossing the kernel/user boundary though, an array of ids looks 
like the way to go (4 bytes per packet, assuming uint32 ids).

>> The extra constraint of using a (contiguous) range means that there
>> is no outlier.  This seems to imply that ranges are no help when
>> flows are multiplexed. Or maybe, the assumption was that bursts tend
>> to be homogeneous ?
> 
> What is your usecase?

For O(1) lookup:

My primary motivation is for transparent proxies and userland qdiscs. In both 
cases, specific packets must be held for some time and reinjected at a later 
time which is not computed by a simple, fixed delay, but rather triggered by 
some external event.

My secondary motivation is that the netfilter queue is a beautifully 
asynchronous mechanism, and absolutely nothing in its definition limits it to 
the dumb FIFO-of-instantaneous-drop-decisions that seems to dominate sample code.

For the deprecation of id-range-based batching:

It seems that as soon as two different packet streams are muxed in the queue, 
one deserving verdict A and the other verdict B, contiguous id ranges of a given 
verdict may be very small. But I realize I'm 20 years late to complain :)

That being said, the Doxygen of the userland nfqueue API mentions being 
DEPRECATED... So what is the recommended replacement ?

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 18:47               ` alexandre.ferrieux
@ 2021-08-16  9:05                 ` Pablo Neira Ayuso
  2021-08-16 10:53                   ` alexandre.ferrieux
  0 siblings, 1 reply; 20+ messages in thread
From: Pablo Neira Ayuso @ 2021-08-16  9:05 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel

On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote:
> 
> 
> On 8/15/21 4:12 PM, Pablo Neira Ayuso wrote:
> > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote:
> >>
> > > [...] I was just worried that people would >> object to adding even the slightest overhead (hash_add/hash_del) to the
> > > existing code path, that satisfies 99% of uses (LIFO). What do you think ?
> > 
> > It should be possible to maintain both the list and the hashtable,
> > AFAICS, the batch callback still needs the queue_list.
> 
> Yes, but to maintain the hashtable, we need to bother the "normal" code path
> with hash_add/del. Not much, but still, some overhead...

Probably you can collect some numbers to make sure this is not a
theoretical issue.

> > > > > PS: what is the intended dominant use case for batch verdicts ?
> > > > > Issuing a batch containing several packets helps to amortize the
> > > > cost of the syscall.
> > > 
> > > Yes, but that could also be achieved by passing an array of ids.
> > 
> > You mean, one single sendmsg() with several netlink messages, that
> > would also work to achieve a similar batching effect.
> 
> Yes, a full spectrum of batching methods are possible. If we're to minimize
> the number of bytes crossing the kernel/user boundary though, an array of
> ids looks like the way to go (4 bytes per packet, assuming uint32 ids).

Are you proposing a new batching mechanism?

> > > The extra constraint of using a (contiguous) range means that there
> > > is no outlier.  This seems to imply that ranges are no help when
> > > flows are multiplexed. Or maybe, the assumption was that bursts tend
> > > to be homogeneous ?
> > 
> > What is your usecase?
> 
> For O(1) lookup:
> 
> My primary motivation is for transparent proxies and userland qdiscs. In
> both cases, specific packets must be held for some time and reinjected at a
> later time which is not computed by a simple, fixed delay, but rather
> triggered by some external event.
> 
> My secondary motivation is that the netfilter queue is a beautifully
> asynchronous mechanism, and absolutely nothing in its definition limits it
> to the dumb FIFO-of-instantaneous-drop-decisions that seems to dominate
> sample code.

I see. Thanks for telling me.

> For the deprecation of id-range-based batching:
> 
> It seems that as soon as two different packet streams are muxed in the
> queue, one deserving verdict A and the other verdict B, contiguous id ranges
> of a given verdict may be very small. But I realize I'm 20 years late to
> complain :)

As I said, you can place several netlink messages in one single
sendmsg() call, then they do not need to be contiguous.

> That being said, the Doxygen of the userland nfqueue API mentions being
> DEPRECATED... So what is the recommended replacement ?

What API are you refering to specifically?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16  9:05                 ` Pablo Neira Ayuso
@ 2021-08-16 10:53                   ` alexandre.ferrieux
  2021-08-16 10:56                     ` Florian Westphal
                                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-16 10:53 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote:
> On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote:
>> 
>> 
>> [...] to maintain the hashtable, we need to bother the "normal" code path
>> with hash_add/del. Not much, but still, some overhead...
> 
> Probably you can collect some numbers to make sure this is not a
> theoretical issue.

'k, will do :)

>> Yes, a full spectrum of batching methods are possible. If we're to minimize
>> the number of bytes crossing the kernel/user boundary though, an array of
>> ids looks like the way to go (4 bytes per packet, assuming uint32 ids).
> 
> Are you proposing a new batching mechanism?

Well, the problem is backwards compatibility. Indeed I'd propose more flexible 
batching via an array of ids instead of a maxid. But the main added value of 
this flexibility is to enable reused-small-integers ids, like file descriptors. 
As long as the maxid API remains in place, this is impossible.

>> That being said, the Doxygen of the userland nfqueue API mentions being
>> DEPRECATED... So what is the recommended replacement ?
> 
> What API are you refering to specifically?

I'm referring to the nfq API documented here:

https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html

It starts with "Queue handling [DEPRECATED]"...

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 10:53                   ` alexandre.ferrieux
@ 2021-08-16 10:56                     ` Florian Westphal
  2021-08-16 11:07                       ` alexandre.ferrieux
  2021-08-16 11:19                     ` Pablo Neira Ayuso
  2021-08-16 11:42                     ` Duncan Roe
  2 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2021-08-16 10:56 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Pablo Neira Ayuso, Florian Westphal, netfilter-devel

alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
> Well, the problem is backwards compatibility. Indeed I'd propose more
> flexible batching via an array of ids instead of a maxid. But the main added
> value of this flexibility is to enable reused-small-integers ids, like file
> descriptors. As long as the maxid API remains in place, this is impossible.

You cannot remove the maxid API.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 10:56                     ` Florian Westphal
@ 2021-08-16 11:07                       ` alexandre.ferrieux
  0 siblings, 0 replies; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-16 11:07 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

On 8/16/21 12:56 PM, Florian Westphal wrote:
> alexandre.ferrieux@orange.com <alexandre.ferrieux@orange.com> wrote:
>> Well, the problem is backwards compatibility. Indeed I'd propose more
>> flexible batching via an array of ids instead of a maxid. But the main added
>> value of this flexibility is to enable reused-small-integers ids, like file
>> descriptors. As long as the maxid API remains in place, this is impossible.
> 
> You cannot remove the maxid API.

Ok, I'll just propose an id-hashtable patch.
I'm thinking of using "modulo queue_maxlen" instead of a traditional hash 
function (and queue_maxlen as table size) , as this would yield perfect hashing 
for the FIFO case.

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 10:53                   ` alexandre.ferrieux
  2021-08-16 10:56                     ` Florian Westphal
@ 2021-08-16 11:19                     ` Pablo Neira Ayuso
  2021-08-16 11:42                     ` Duncan Roe
  2 siblings, 0 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2021-08-16 11:19 UTC (permalink / raw)
  To: alexandre.ferrieux; +Cc: Florian Westphal, netfilter-devel

On Mon, Aug 16, 2021 at 12:53:33PM +0200, alexandre.ferrieux@orange.com wrote:
> 
> 
> On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote:
> > On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote:
> > > 
> > > 
> > > [...] to maintain the hashtable, we need to bother the "normal" code path
> > > with hash_add/del. Not much, but still, some overhead...
> > 
> > Probably you can collect some numbers to make sure this is not a
> > theoretical issue.
> 
> 'k, will do :)
> 
> > > Yes, a full spectrum of batching methods are possible. If we're to minimize
> > > the number of bytes crossing the kernel/user boundary though, an array of
> > > ids looks like the way to go (4 bytes per packet, assuming uint32 ids).
> > 
> > Are you proposing a new batching mechanism?
> 
> Well, the problem is backwards compatibility. Indeed I'd propose more
> flexible batching via an array of ids instead of a maxid. But the main added
> value of this flexibility is to enable reused-small-integers ids, like file
> descriptors. As long as the maxid API remains in place, this is impossible.

OK, I'll compare this to the sendmsg() call including several netlink
messages once you send your patch.

> > > That being said, the Doxygen of the userland nfqueue API mentions being
> > > DEPRECATED... So what is the recommended replacement ?
> > 
> > What API are you refering to specifically?
> 
> I'm referring to the nfq API documented here:
> 
> https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html
> 
> It starts with "Queue handling [DEPRECATED]"...

libmnl is preferred these days, specifically for advanced stuff.

I've been getting reports from users about this old API: It is simple
enough and people don't have to deal with netlink details to get
packets from the kernel for simple stuff.

Probably we should turn this deprecation into using libmnl and the
helpers that libnetfilter_queue provides is recommended. Or translate
this old API to use libmnl behind the curtain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 10:53                   ` alexandre.ferrieux
  2021-08-16 10:56                     ` Florian Westphal
  2021-08-16 11:19                     ` Pablo Neira Ayuso
@ 2021-08-16 11:42                     ` Duncan Roe
  2 siblings, 0 replies; 20+ messages in thread
From: Duncan Roe @ 2021-08-16 11:42 UTC (permalink / raw)
  To: alexandre.ferrieux
  Cc: Pablo Neira Ayuso, Florian Westphal, Netfilter Development

On Mon, Aug 16, 2021 at 12:53:33PM +0200, alexandre.ferrieux@orange.com wrote:
>
>
> On 8/16/21 11:05 AM, Pablo Neira Ayuso wrote:
> > On Sun, Aug 15, 2021 at 08:47:04PM +0200, alexandre.ferrieux@orange.com wrote:
> > >
> > >
> > > [...] to maintain the hashtable, we need to bother the "normal" code path
> > > with hash_add/del. Not much, but still, some overhead...
> >
> > Probably you can collect some numbers to make sure this is not a
> > theoretical issue.
>
> 'k, will do :)
>
> > > Yes, a full spectrum of batching methods are possible. If we're to minimize
> > > the number of bytes crossing the kernel/user boundary though, an array of
> > > ids looks like the way to go (4 bytes per packet, assuming uint32 ids).
> >
> > Are you proposing a new batching mechanism?
>
> Well, the problem is backwards compatibility. Indeed I'd propose more
> flexible batching via an array of ids instead of a maxid. But the main added
> value of this flexibility is to enable reused-small-integers ids, like file
> descriptors. As long as the maxid API remains in place, this is impossible.
>
> > > That being said, the Doxygen of the userland nfqueue API mentions being
> > > DEPRECATED... So what is the recommended replacement ?
> >
> > What API are you refering to specifically?
>
>
> I'm referring to the nfq API documented here:
>
>
> https://www.netfilter.org/projects/libnetfilter_queue/doxygen/html/group__Queue.html
>
> It starts with "Queue handling [DEPRECATED]"...
>
The deprecated functions are not going away, it's just not recommended to use
them in new code.

The replacements are the non-deprecated functions.

Here's a bit of background: libnetfilter_queue is essentially a blend of 2
separate libraries. The older (and deprecated) library uses libnfnetlink to
communicate with the kernel. The newer (current) library uses libmnl to
communicate with the kernel. You either use functions from one library or the
other: they don't mix.

libnetfilter_queue is a wrapper for all libnfnetlink functions except
nfnl_rcvbufsiz(), while it only provides helpers for *some* libmnl functions.

The main new feature of the current libnetfilter_queue library is a suite of
helpers for packet mangling. These manage checksums and other required header
manipulation for ipv4 & ipv6 and the upb & tcp transport layers. Current
libnetfilter_queue also provides inclusion of a mangled packet in a verdict -
not available from the deprecated library AFAICS.

Current libnetfilter_queue doesn't provide batch verdicts. I don't know why -
perhaps Pablo can elaborate.

Userland support for any new featuer would normally go into libmnl.

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 14:12             ` Pablo Neira Ayuso
  2021-08-15 18:47               ` alexandre.ferrieux
@ 2021-08-16 12:04               ` Duncan Roe
  2021-08-16 16:10                 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 20+ messages in thread
From: Duncan Roe @ 2021-08-16 12:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: alexandre.ferrieux, Florian Westphal, Pablo Neira Ayuso,
	Netfilter Development

On Sun, Aug 15, 2021 at 04:12:04PM +0200, Pablo Neira Ayuso wrote:
> On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote:
> > On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote:
> > > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
> > > [...]
> > > > So, the only way forward would be a separate hashtable on ids.
> > >
> > > Using the rhashtable implementation is fine for this, it's mostly
> > > boilerplate code that is needed to use it and there are plenty of
> > > examples in the kernel tree if you need a reference.
> >
> > Thanks, that's indeed pretty simple. I was just worried that people would
> > object to adding even the slightest overhead (hash_add/hash_del) to the
> > existing code path, that satisfies 99% of uses (LIFO). What do you think ?
>
> It should be possible to maintain both the list and the hashtable,
> AFAICS, the batch callback still needs the queue_list.
>
> > > > PS: what is the intended dominant use case for batch verdicts ?
> > >
> > > Issuing a batch containing several packets helps to amortize the
> > > cost of the syscall.
> >
> > Yes, but that could also be achieved by passing an array of ids.
>
> You mean, one single sendmsg() with several netlink messages, that
> would also work to achieve a similar batching effect.

sendmsg() can actually be slower. I gave up on a project to send verdicts using
sendmsg() (especially with large mangled packets), because benchmarking showed:

1. on a 3YO laptop, no discernable difference.

2. On a 1YO Ryzen desktop, sendmsg() significantly slower.

sendmsg() sent 3 or 4 buffers: 1. leading netlink message blocks; 2. the packet;
3. padding to 4 bytes (if required); last: trailing netlink message blocks.

sendmsg() saved moving these data blocks into a single buffer. But it introduced
the overhead of the kernel's having to validate 4 userland buffers instead of 1.

A colleague suggested the Ryzen result is because of having 128-bit registers to
move data. I guess that must be it.

The spreadsheets from the tests are up on GitHub:
https://github.com/duncan-roe/nfq6/blob/main/laptop_timings.ods
https://github.com/duncan-roe/nfq6/blob/main/timings.ods

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 12:04               ` Duncan Roe
@ 2021-08-16 16:10                 ` Pablo Neira Ayuso
  2021-08-16 16:15                   ` Florian Westphal
  2021-08-17  4:03                   ` Duncan Roe
  0 siblings, 2 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2021-08-16 16:10 UTC (permalink / raw)
  To: alexandre.ferrieux, Florian Westphal, Netfilter Development

On Mon, Aug 16, 2021 at 10:04:58PM +1000, Duncan Roe wrote:
> On Sun, Aug 15, 2021 at 04:12:04PM +0200, Pablo Neira Ayuso wrote:
> > On Sun, Aug 15, 2021 at 03:32:30PM +0200, alexandre.ferrieux@orange.com wrote:
> > > On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote:
> > > > On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
> > > > [...]
> > > > > So, the only way forward would be a separate hashtable on ids.
> > > >
> > > > Using the rhashtable implementation is fine for this, it's mostly
> > > > boilerplate code that is needed to use it and there are plenty of
> > > > examples in the kernel tree if you need a reference.
> > >
> > > Thanks, that's indeed pretty simple. I was just worried that people would
> > > object to adding even the slightest overhead (hash_add/hash_del) to the
> > > existing code path, that satisfies 99% of uses (LIFO). What do you think ?
> >
> > It should be possible to maintain both the list and the hashtable,
> > AFAICS, the batch callback still needs the queue_list.
> >
> > > > > PS: what is the intended dominant use case for batch verdicts ?
> > > >
> > > > Issuing a batch containing several packets helps to amortize the
> > > > cost of the syscall.
> > >
> > > Yes, but that could also be achieved by passing an array of ids.
> >
> > You mean, one single sendmsg() with several netlink messages, that
> > would also work to achieve a similar batching effect.
> 
> sendmsg() can actually be slower. I gave up on a project to send verdicts using
> sendmsg() (especially with large mangled packets), because benchmarking showed:
> 
> 1. on a 3YO laptop, no discernable difference.
> 
> 2. On a 1YO Ryzen desktop, sendmsg() significantly slower.
> 
> sendmsg() sent 3 or 4 buffers: 1. leading netlink message blocks; 2. the packet;
> 3. padding to 4 bytes (if required); last: trailing netlink message blocks.
>
> sendmsg() saved moving these data blocks into a single buffer. But it introduced
> the overhead of the kernel's having to validate 4 userland buffers instead of 1.
> 
> A colleague suggested the Ryzen result is because of having 128-bit registers to
> move data. I guess that must be it.
> 
> The spreadsheets from the tests are up on GitHub:
> https://github.com/duncan-roe/nfq6/blob/main/laptop_timings.ods
> https://github.com/duncan-roe/nfq6/blob/main/timings.ods

Just a quick test creating 64K entries in the conntrack table, using
libmnl.

- With batching

# time ./batch

real    0m0,122s
user    0m0,010s
sys     0m0,112s

- Without batching

# time ./nobatch

real    0m0,195s
user    0m0,049s
sys     0m0,146s

Just a sample, repeating the tests show similar numbers.

Submitting a verdict on a packet via nfnetlink_queue is similar to
creating an ct entry through ctnetlink (both use the send syscall).

If you only have a few packets waiting for verdict in userspace, then
probably batching is not helping much.

BTW, leading and trailing netlink message blocks to the kernel are not
required for nfnetlink_queue.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 16:10                 ` Pablo Neira Ayuso
@ 2021-08-16 16:15                   ` Florian Westphal
  2021-08-17  4:03                   ` Duncan Roe
  1 sibling, 0 replies; 20+ messages in thread
From: Florian Westphal @ 2021-08-16 16:15 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: alexandre.ferrieux, Florian Westphal, Netfilter Development

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Submitting a verdict on a packet via nfnetlink_queue is similar to
> creating an ct entry through ctnetlink (both use the send syscall).

Reinject happens in the context of the process, so batching allows
multiple skbs to get transmitted before returning to userspace.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-16 16:10                 ` Pablo Neira Ayuso
  2021-08-16 16:15                   ` Florian Westphal
@ 2021-08-17  4:03                   ` Duncan Roe
  1 sibling, 0 replies; 20+ messages in thread
From: Duncan Roe @ 2021-08-17  4:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Netfilter Development

On Mon, Aug 16, 2021 at 06:10:09PM +0200, Pablo Neira Ayuso wrote: [...]
>
> BTW, leading and trailing netlink message blocks to the kernel are not
> required for nfnetlink_queue.

Sorry, sloppy explanation on my part. This had nothing to do with batches.

The idea was to attain a zero-copy nfq program by using sendmsg with an iov
pointing at the mangled packet, wherever it is. A previous iov has to point to
the struct nlmsghdr that starts the message.

That first buffer ends with the struct nlattr for the packet data addressed by
the next iov.

If padding was required, I was sending a 3rd buffer. It's almost certainly fine
to append the padding to the packet buffer instead, and for sure fine if check
`pktb_tailroom > pad` first.

Then I was sending the ACCEPT verdict in a 4th buffer. As you point out, I could
have sent that earlier, before the trailing struct nlattr.

When I originally started writing nfq programs I was concerned that the kernel
might accept the original packet as soon as it encountered the ACCEPT, and miss
that there was an updated packet following.

I now realise it doesn't work that way, but got stuck with the old order of
doing thins.

So the code would be down to trading 1 kernel user buffer validate against 2
userland data copies. Which method is faster might well depend on packet length.

I don't plan to pursue this further for now.

Cheers ... Duncan.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: nfnetlink_queue -- why linear lookup ?
  2021-08-15 13:07         ` Pablo Neira Ayuso
  2021-08-15 13:32           ` alexandre.ferrieux
@ 2021-08-15 13:33           ` alexandre.ferrieux
  1 sibling, 0 replies; 20+ messages in thread
From: alexandre.ferrieux @ 2021-08-15 13:33 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

(reset, typo: LIFO->FIFO)

On 8/15/21 3:07 PM, Pablo Neira Ayuso wrote:
> On Sun, Aug 15, 2021 at 02:17:08PM +0200, alexandre.ferrieux@orange.com wrote:
> [...]
>> So, the only way forward would be a separate hashtable on ids.
>
> Using the rhashtable implementation is fine for this, it's mostly
> boilerplate code that is needed to use it and there are plenty of
> examples in the kernel tree if you need a reference.

Thanks, that's indeed pretty simple. I was just worried that people would object 
to adding even the slightest overhead (hash_add/hash_del) to the existing code 
path, that satisfies 99% of uses (FIFO). What do you think ?

>> PS: what is the intended dominant use case for batch verdicts ?
>
> Issuing a batch containing several packets helps to amortize the
> cost of the syscall.

Yes, but that could also be achieved by passing an array of ids. The extra 
constraint of using a (contiguous) range means that there is no outlier. This 
seems to imply that ranges are no help when flows are multiplexed. Or maybe, the 
assumption was that bursts tend to be homogeneous ?

_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-08-17  4:03 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-13 11:55 nfnetlink_queue -- why linear lookup ? alexandre.ferrieux
2021-08-14 21:01 ` Florian Westphal
2021-08-14 21:05   ` alexandre.ferrieux
2021-08-14 21:12     ` Florian Westphal
2021-08-15 12:17       ` alexandre.ferrieux
2021-08-15 13:07         ` Pablo Neira Ayuso
2021-08-15 13:32           ` alexandre.ferrieux
2021-08-15 14:12             ` Pablo Neira Ayuso
2021-08-15 18:47               ` alexandre.ferrieux
2021-08-16  9:05                 ` Pablo Neira Ayuso
2021-08-16 10:53                   ` alexandre.ferrieux
2021-08-16 10:56                     ` Florian Westphal
2021-08-16 11:07                       ` alexandre.ferrieux
2021-08-16 11:19                     ` Pablo Neira Ayuso
2021-08-16 11:42                     ` Duncan Roe
2021-08-16 12:04               ` Duncan Roe
2021-08-16 16:10                 ` Pablo Neira Ayuso
2021-08-16 16:15                   ` Florian Westphal
2021-08-17  4:03                   ` Duncan Roe
2021-08-15 13:33           ` alexandre.ferrieux

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).