Re: Xtables2 Netlink spec - Pablo Neira Ayuso

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Jan Engelhardt <jengelh@medozas.de>
Cc: Netfilter Developer Mailing List <netfilter-devel@vger.kernel.org>
Subject: Re: Xtables2 Netlink spec
Date: Thu, 25 Nov 2010 15:21:02 +0100	[thread overview]
Message-ID: <4CEE70CE.60502@netfilter.org> (raw)
In-Reply-To: <alpine.LNX.2.01.1011251327140.21752@obet.zrqbmnf.qr>

On 25/11/10 14:35, Jan Engelhardt wrote:
>
> On Thursday 2010-11-25 12:42, Pablo Neira Ayuso wrote:
>>>
>>> nfxt_socket = socket(AF_NETLINK, SOCK_RAW, NETFILTER_XTABLES);
>>
>> This has to go upon nfnetlink as other netfilter subsystems.
>
> Why so? It is not like Netlink protocols were limited to 32 AFAICS.
> Also as told, nfnetlink is not fit for parsing netlink messages where
> an attribute type appears more than once. If anything, I would look
> into genetlink, though that also starts to look like it cannot do
> that.

All netfilter subsystems must go over nfnetlink, dot.

If you are repeating the same attribute in one message, it means that 
you have to split your data into several messages.

>>> The Xtables2 Netlink protocol however encodes each node as a
>>> standalone attribute, to be called Flat Encoding, that is
>>> appended (a. k. a. “chained”) to the data stream. This makes it
>>> possible to split requests and dumps at a finer level than
>>> encapsulation would. Above all, it gets extensions the guarantee
>>> to have data blocks of a minimum guaranteed size.
>>>
>>> Since Netlink messages do have a 32-bit quantity to store the
>>> message length, rulesets of roughly up to 4 GB are possibile,
>>> which is currently regarded as sufficient. The largest (and
>>> meaningful) rulesets seen to date in the industry weighed in at
>>> approximately 150 MB.
>>
>> You can split data into several messages and avoid this limitation.
>
> Netlink may have support for splitting messages, but not really
> splitting data. So I am just splitting messages at attribute
> boundaries like everyone else.
>
>>> Whereas attribute nesting automatically provided for boundaries,
>>> this is realized using a dummy attribute in the chained approach.
>>> Certain attributes can start such a flattened nesting, and
>>> NFXTA_STOP terminates it.
>>
>> I don't like this trailing attribute, see below.
>>
>>> This attribute serves to denote the end of a nesting level as
>>> introduced by NFXTA_CHAIN, NFXTA_RULE, NFXTA_MATCH or
>>> NFXTA_TARGET. It has no data portion.
>>>
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> | nla_len = 4                   | nla_type = NFXTA_STOP         |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> It's not a good idea to make assumptions on the order of the TLVs in
>> a Netlink message. I mean, you should not assume that NFXTA_STOP
>> comes after one specific attribute.
>
> Ordering is a necessary constraint with flat encoding. Furthermore,
> rules exhibit order, so even if I were to use encapsulated encoding,
> there would be ordering requirements.
>
> The Netlink RFC does not make any statements about what is to follow
> nlmsghdr; unless I missed something, it does not mention ordering,
> not even attributes at all. So XTNL is free to use what it chooses -
> including an nlattr32 that is not compatible with nlattr16.

Because the Netlink RFC doesn't make any statement, it doesn't mean that 
you can make assumptions. Moreover, that RFC doesn't cover everything in 
Netlink, that document requires lots of updates or way more RFCs to 
specify lots of undocumented Netlink aspects.

BTW, you may want to read this:
http://1984.lsi.us.es/~pablo/docs/spae.pdf

It still misses lots of aspects, including this, but we've got some more 
new documentation at least. It's not a RFC, it aims to be a tutorial.

>>> 2.2 Dump error code<sub:nfxta_errno>
>>>
>>> Once a NLM_F_MULTI dump operation has been started, for example
>>> with the NFXTM_CHAIN_DUMP request, Netlink kernel users must
>>> always end it successfully with NLMSG_DONE. To convey an error
>>> during the dump, Xtables2 will emit a NFXTA_ERRNO attribute into
>>> the stream (if it can), emit no further attributes for the
>>> request, and cause the dump to stop.
>>>
>>> 0                   1                   2                   3
>>> 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> | nla_len = 8                   | nla_type = NFXTA_ERRNO        |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> | int errno;                                                    |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>
>> Isn't nlmsg_err OK for your needs?
>
> You cannot abort a dump from the kernel, which is why nlmsg_err
> does not get used.

What error can cause a dump from the kernel to be aborted? If we really 
need this, the point would be to add it to netlink instead of 
introducing some ad-hoc facility.

>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> | nla_len = 4 + payload         | nla_type = NFXTA_DATA         |
>>> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>> .                                                               .
>>> . e.g. struct xt_hashlimit_info
>>
>> This is fine during some transition period, but Netlink protocols
>> must not encapsulate structures in the payload of their TLVs.
>
> I did not see such a requirement in the Netlink RFC.
> Of course it is for existing extensions.

Again, the RFC is a useless argument for this, look for a better one. 
Encapsulating structures into TLVs is a *really bad practise* since you 
have to stick to the structure layout, which is indeed the problem that 
we have faced in iptables for 10 years, and that many other interfaces 
in the Linux kernel have.

Supporting the encapsulation of the structure during some time (during 
the transition) may be OK, but it's definitely not the way to go in the 
long run.

Remember that the revision field in iptables is a workaround, and the 
result in quite dirty code. The aim at that time we add it was to find 
some temporary solution until we could provide an extensible interface 
for iptables.

Moreover, if we support Netlink on the wire in the future, you'll have 
problems with encapsulated structures.

>> We can avoid this if structures are splitted into several TLVs. You
>> can add new attributes and obsolete old ones.
>
> Yes, but not at this stage. Complete architectural rewrites of
> everything at once comes with plenty of problems. Linux evolution has
> shown that small incremental reviewable patches are the credo.
>
> Do not worry, I left room in XTNL for attributes upgrades.

BTW, I didn't look at your protocol in deep yet but I'd suggest the 
following basis to rework it: one netlink message, one rule operation.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2010-11-25 14:20 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-24 22:29 Xtables2 Netlink spec Jan Engelhardt
2010-11-25 11:42 ` Pablo Neira Ayuso
2010-11-25 13:35   ` Jan Engelhardt
2010-11-25 14:21     ` Pablo Neira Ayuso [this message]
2010-11-25 21:46       ` Jan Engelhardt
2010-11-26  8:25         ` Pablo Neira Ayuso
2010-11-26 13:59           ` Jan Engelhardt
2010-11-26 19:48             ` Jozsef Kadlecsik
2010-11-26 19:55               ` Jan Engelhardt
2010-11-26 20:05                 ` Jozsef Kadlecsik
2010-11-26 21:33                   ` Jan Engelhardt
     [not found]                     ` <alpine.DEB.2.00.1011270951330.20431@blackhole.kfki.hu>
2010-11-27 13:39                       ` Jan Engelhardt
2010-11-27 17:04                         ` Jozsef Kadlecsik
2010-11-27 17:35                           ` Jan Engelhardt
2010-11-27 20:42                             ` Jozsef Kadlecsik
2010-11-29 12:30                               ` Pablo Neira Ayuso
2010-11-29 12:39                                 ` Jozsef Kadlecsik
2010-11-29 12:55                                   ` Pablo Neira Ayuso
2010-11-29 13:26                                     ` Jan Engelhardt
2010-11-29 13:49                                   ` Pablo Neira Ayuso
2010-11-29 12:23                 ` Pablo Neira Ayuso
2010-11-27 11:10             ` Pablo Neira Ayuso
2010-11-26 15:27           ` Jan Engelhardt
2010-11-27 12:25             ` Pablo Neira Ayuso
2010-12-03 21:03           ` Jan Engelhardt
2010-12-07  7:49             ` Pablo Neira Ayuso
2010-12-07 13:30               ` Jan Engelhardt
2010-12-08 11:36                 ` Pablo Neira Ayuso
2010-11-26 19:01 ` Jozsef Kadlecsik
2010-12-09 12:08   ` Pablo Neira Ayuso
2010-12-14  2:01     ` Jan Engelhardt
2010-12-14  2:16       ` James Nurmi
2010-12-14  3:46         ` Jan Engelhardt
2010-12-15 13:54       ` Pablo Neira Ayuso
2010-12-16 14:05         ` Thomas Graf
2010-12-16 14:22           ` Jan Engelhardt
2010-12-17  7:25             ` Thomas Graf
2010-12-17  9:35               ` Jan Engelhardt
2010-12-17  9:50                 ` Pablo Neira Ayuso
2010-12-17  9:55           ` Pablo Neira Ayuso
2010-12-17 14:56             ` Jan Engelhardt
2010-12-15  4:55   ` Jan Engelhardt
2010-12-15  8:51     ` Jozsef Kadlecsik
2010-12-16  9:57       ` Jesper Dangaard Brouer
2010-12-16 12:51         ` Error reporting in Netlink (Re: Xtables2 Netlink spec) Jan Engelhardt
2010-12-16 13:43           ` Thomas Graf
2010-12-16 13:51             ` Jan Engelhardt
2010-12-16 14:19               ` Thomas Graf
2010-12-17 10:00                 ` Pablo Neira Ayuso
2010-12-16 14:47             ` Jozsef Kadlecsik
2010-12-16 15:09               ` Jan Engelhardt
2010-12-16 23:31             ` Patrick McHardy
2010-12-17  6:58               ` Thomas Graf
2010-12-16 23:23           ` Patrick McHardy
2010-12-17 10:02             ` Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CEE70CE.60502@netfilter.org \
    --to=pablo@netfilter.org \
    --cc=jengelh@medozas.de \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).