From: Duncan Roe <duncan_roe@optusnet.com.au>
To: Netfilter Development <netfilter-devel@vger.kernel.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Subject: Re: [PATCH libnetfilter_queue 0/3] pktbuff API updates
Date: Wed, 29 Apr 2020 23:28:40 +1000 [thread overview]
Message-ID: <20200429132840.GA3833@dimstar.local.net> (raw)
In-Reply-To: <20200428225520.GA30421@salvia>
On Wed, Apr 29, 2020 at 12:55:20AM +0200, Pablo Neira Ayuso wrote:
> On Wed, Apr 29, 2020 at 07:14:52AM +1000, Duncan Roe wrote:
> > On Tue, Apr 28, 2020 at 12:34:07PM +0200, Pablo Neira Ayuso wrote:
> [...]
> > > pktb_alloc2() still has a memcpy which is not needed by people that do
> > > not need to mangle the packet.
> >
> > No it does not. Please look again. There is only a memcpy if the caller
> > specifies extra > 0, in which case she clearly intends to mangle it (perhaps
> > depending on its contents).
>
> Right, it only happens if extra is specified.
>
> + if (extra) {
> + pkt_data = buf;
> + memcpy(pkt_data, data, len);
> + memset((uint8_t *)pkt_data + len, 0, extra);
> + } else {
> + pkt_data = data;
> + }
>
> So buf is only used if extra is specified?
Yes, that's right.
>
> > "depending on its contents" is where the memcpy deferral comes in. pktb_alloc2()
> > verifies that the supplied buffer is big enough (size >= len + extra). The user
> > declared it as a stack variable that size so it will be. With the deferral
> > enhancement, pktb_alloc2() records the buffer address and extra in the enlarged
> > struct pktbuff (extra is needed to tell pktb_mangle how much memory to memset to
> > 0).
>
> I agree that deferring the memcpy() and avoiding the malloc() is the
> way to go, we only have to agree in the way to achieve this.
>
> > If pktb_mangle() finds it has to make the packet larger then its original length
> > and the packet is still in its original location then copy it and zero extra.
> > (i.e. pktb_mangle() doesn't just check whether it was asked to make the packet
> > bigger: it might have previously been asked to make it smaller).
> >
> > Also (and this *is* tricky, update relevant pointers in the struct pktbuff).
> > That invalidates any poiners the caller may have obtained from e.g. pktb_data()
> > - see end of email.
>
> Regarding pktb_mangle() reallocation case, refetching the pointers
> sounds fine, documenting this is sufficient.
>
> [...]
> > > Revisiting, I would prefer to keep things simple. The caller should
> > > make sure that pktb_mangle() has a buffer containing enough room. I
> > > think it's more simple for the caller to allocate a buffer that is
> > > large enough for any mangling.
I reckon they'll just copy the code from the nfq_nlmsg_verdict_put_pkt() man /
web page. After declaring "char pktbuf[plen + EXTRA];" one can use "sizeof
pktbuf" as the length argument.
> >
> > Yes it's more complex. No problem with the buffer - the user gave that to
> > pktb_alloc2().
>
> I'm just hesitating about the new pktb_alloc2() approach because it
> has many parameters, it's just looks a bit complicated to me (this
> function takes 8 parameters).
It has the original 4 from pktb_alloc() plus 2 {buffer, size} pairs. It could
have been just one pair, with packet data appended to metadata as in
pktb_alloc() but I thought it would be really awkward to document how to
dimension it.
>
> If you can just pre-allocate the pkt_buff head from the configuration
> phase (before receiving packets from the kernel), then attach the
> buffer via pktb_setup_metadata() for each packet that is received (so
> the pkt_buff head is recycled). With this approach, pktb_head_size()
> won't be needed either.
I think we should not be usurping the data pointer of mnl_cb_run(). I can see
people wanting to use it to pass a pointer to e.g. some kind of database that
callbacks need to access. There's no performance gain to recycling the buffer:
the CB doesn't need to call pktb_head_size() on every invocation, that can be
done once by main() e.g.
static size_t sizeof_head;
...
int main(int argc, char *argv[])
{
...
sizeof_head = pktb_head_size(); /* Avoid multiple calls in CB */
...
static int queue_cb(const struct nlmsghdr *nlh, void *data)
{
char head[sizeof_head];
>
> My understanding is that requirements are:
>
> * Users that do not want to mangle the packet, they use the buffer
> that was passed to recvmsg().
> * Users that want to mangle the packet, they use the _mangle()
> function that might reallocate the data buffer (the one you would
> like to have). However, if this data buffer reallocation happens,
> then pkt_buff should annotate that this pkt_buff object needs to
> release this data buffer from pktb_free() otherwise.
No, there is nothing to release. We told pktb_alloc2() where the buffer was,
it's on the stack (usually).
>
> > Problem is that if mangler moves the packet, then any packet pointer the caller
> > had is invalid (points to the un-mangled copy). This applies at all levels, e.g.
> > nfq_udp_get_payload(). There is no way for the mangler functions to address
> > this: it just has to be highlighted in the documentation.
>
> That's fine, this is exactly how the kernel works: if the function
> might reallocate the data area, then you have to refetch pointers
> after this. If you teach _mangle() to do reallocations, then
> documenting this is fine.
>
> However, those reallocation need pktb_free() to release that new data
> buffer, right?
No way. There is no malloc() nor free() anywhere. The data buffer is
(recommended to be) on the stack; for running under gdb it may be preferred to
us a static buffer which has to be dimensioned hugely.
>
> > Still, I really like the deferred copy enhancement. Your thoughts?
>
> The deferred copy idea when mangling sounds fine, we only have to
> agree on how to get this done.
>
> Thanks.
Cheers ... Duncan.
next prev parent reply other threads:[~2020-04-29 13:28 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-26 13:23 [PATCH libnetfilter_queue 0/3] pktbuff API updates Pablo Neira Ayuso
2020-04-26 13:23 ` [PATCH libnetfilter_queue 1/3] pktbuff: add pktb_alloc_head() and pktb_build_data() Pablo Neira Ayuso
2020-04-30 5:41 ` Duncan Roe
2020-04-26 13:23 ` [PATCH libnetfilter_queue 2/3] example: nf-queue: use pkt_buff Pablo Neira Ayuso
2020-05-14 4:35 ` Duncan Roe
2020-05-14 4:35 ` [PATCH libnetfilter_queue 1/1] example: nf-queue: use pkt_buff (updated) Duncan Roe
2020-04-26 13:23 ` [PATCH libnetfilter_queue 3/3] pktbuff: add pktb_reset_network_header() and pktb_set_network_header() Pablo Neira Ayuso
2020-04-27 11:06 ` [PATCH libnetfilter_queue 0/3] pktbuff API updates Duncan Roe
2020-04-27 17:06 ` Pablo Neira Ayuso
2020-04-28 4:33 ` Duncan Roe
2020-04-28 10:34 ` Pablo Neira Ayuso
2020-04-28 21:14 ` Duncan Roe
2020-04-28 22:55 ` Pablo Neira Ayuso
2020-04-29 13:28 ` Duncan Roe [this message]
2020-04-29 19:00 ` Pablo Neira Ayuso
2020-04-29 19:54 ` Duncan Roe
2020-04-29 21:12 ` Pablo Neira Ayuso
2020-04-29 19:10 ` Duncan Roe
2020-04-29 19:16 ` Pablo Neira Ayuso
2020-04-29 20:30 ` Duncan Roe
2020-04-29 21:05 ` Pablo Neira Ayuso
2020-04-30 6:34 ` Duncan Roe
2020-05-02 12:50 ` Duncan Roe
2020-05-05 12:30 ` Pablo Neira Ayuso
2020-05-06 0:57 ` Duncan Roe
2020-05-06 2:39 ` Duncan Roe
2020-05-08 1:13 ` Duncan Roe
2020-05-09 8:26 ` Duncan Roe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200429132840.GA3833@dimstar.local.net \
--to=duncan_roe@optusnet.com.au \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.