Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vlad Buslov <vladbu@mellanox.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Vlad Buslov <vladbu@mellanox.com>,
	Edward Cree <ecree@solarflare.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Davide Caratti <dcaratti@redhat.com>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH net-next v2 0/4] Implement classifier-action terse dump mode
Date: Wed, 20 May 2020 10:24:16 +0300	[thread overview]
Message-ID: <vbfv9krvzkv.fsf@mellanox.com> (raw)
In-Reply-To: <CAM_iQpXqLdAJOcwyQ=DZs5zi=zEtr97_LT9uhPtTTPke=8Vvdw@mail.gmail.com>


On Tue 19 May 2020 at 21:58, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, May 19, 2020 at 2:04 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> I considered that approach initially but decided against it for
>> following reasons:
>>
>> - Generic data is covered by current terse dump implementation.
>>   Everything else will be act or cls specific which would result long
>>   list of flag values like: TCA_DUMP_FLOWER_KEY_ETH_DST,
>>   TCA_DUMP_FLOWER_KEY_ETH_DST, TCA_DUMP_FLOWER_KEY_VLAN_ID, ...,
>>   TCA_DUMP_TUNNEL_KEY_ENC_KEY_ID, TCA_DUMP_TUNNEL_KEY_ENC_TOS. All of
>>   these would require a lot of dedicated logic in act and cls dump
>>   callbacks. Also, it would be quite a challenge to test all possible
>>   combinations.
>
> Well, if you consider netlink dump as a database query, what Edward
> proposed is merely "select COLUMN1 COLUMN2 from cls_db" rather
> than "select * from cls_db".
>
> No one said it is easy to implement, it is just more elegant than you
> select a hardcoded set of columns for the user.

As I explained to Edward, having denser netlink packets with more
filters per packet is only part of optimization. Another part is not
executing some code at all. Consider fl_dump_key() which is 200 lines
function with bunch of conditionals like that:

static int fl_dump_key(struct sk_buff *skb, struct net *net,
		       struct fl_flow_key *key, struct fl_flow_key *mask)
{
	if (mask->meta.ingress_ifindex) {
		struct net_device *dev;

		dev = __dev_get_by_index(net, key->meta.ingress_ifindex);
		if (dev && nla_put_string(skb, TCA_FLOWER_INDEV, dev->name))
			goto nla_put_failure;
	}

	if (fl_dump_key_val(skb, key->eth.dst, TCA_FLOWER_KEY_ETH_DST,
			    mask->eth.dst, TCA_FLOWER_KEY_ETH_DST_MASK,
			    sizeof(key->eth.dst)) ||
	    fl_dump_key_val(skb, key->eth.src, TCA_FLOWER_KEY_ETH_SRC,
			    mask->eth.src, TCA_FLOWER_KEY_ETH_SRC_MASK,
			    sizeof(key->eth.src)) ||
	    fl_dump_key_val(skb, &key->basic.n_proto, TCA_FLOWER_KEY_ETH_TYPE,
			    &mask->basic.n_proto, TCA_FLOWER_UNSPEC,
			    sizeof(key->basic.n_proto)))
		goto nla_put_failure;

	if (fl_dump_key_mpls(skb, &key->mpls, &mask->mpls))
		goto nla_put_failure;

	if (fl_dump_key_vlan(skb, TCA_FLOWER_KEY_VLAN_ID,
			     TCA_FLOWER_KEY_VLAN_PRIO, &key->vlan, &mask->vlan))
		goto nla_put_failure;
    ...


Now imagine all of these are extended with additional if (flags &
TCA_DUMP_XXX). All gains from not outputting some other minor stuff into
netlink packet will be negated by it.


>
> Think about it, what if another user wants a less terse dump but still
> not a full dump? Would you implement ops->terse_dump2()? Or
> what if people still think your terse dump is not as terse as she wants?
> ops->mini_dump()? How many ops's we would end having?

User can discard whatever he doesn't need in user land code. The goal of
this change is performance optimization, not designing a generic
kernel-space data filtering mechanism.

>
>
>>
>> - It is hard to come up with proper validation for such implementation.
>>   In case of terse dump I just return an error if classifier doesn't
>>   implement the callback (and since current implementation only outputs
>>   generic action info, it doesn't even require support from
>>   action-specific dump callbacks). But, for example, how do we validate
>>   a case where user sets some flower and tunnel_key act dump flags from
>>   previous paragraph, but Qdisc contains some other classifier? Or
>>   flower classifier points to other types of actions? Or when flower
>>   classifier has and tunnel_key actions but also mirred? Should the
>
> Each action should be able to dump selectively too. If you think it
> as a database, it is just a different table with different schemas.

How is designing custom SQL-like query language (according to your
example at the beginning of the mail) for filter dump is going to
improve performance? If there is a way to do it in fast a generic manner
with BPF, then I'm very interested to hear the details. But adding
hundred more hardcoded conditionals is just not a solution considering
main motivations for this change is performance.

next prev parent reply	other threads:[~2020-05-20  7:24 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15 11:40 [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Vlad Buslov
2020-05-15 11:40 ` [PATCH net-next v2 1/4] net: sched: introduce terse dump flag Vlad Buslov
2020-05-15 11:51   ` Jiri Pirko
2020-05-15 11:40 ` [PATCH net-next v2 2/4] net: sched: implement terse dump support in act Vlad Buslov
2020-05-15 11:51   ` Jiri Pirko
2020-05-15 11:40 ` [PATCH net-next v2 3/4] net: sched: cls_flower: implement terse dump support Vlad Buslov
2020-05-15 11:40 ` [PATCH net-next v2 4/4] selftests: implement flower classifier terse dump tests Vlad Buslov
2020-05-15 17:04 ` [PATCH net-next v2 0/4] Implement classifier-action terse dump mode Jakub Kicinski
2020-05-18  6:46   ` Vlad Buslov
2020-05-15 17:25 ` David Miller
2020-05-18  6:50   ` Vlad Buslov
2020-05-17 19:13 ` Cong Wang
2020-05-18  6:44   ` Vlad Buslov
2020-05-18 18:46     ` Cong Wang
2020-05-19  9:10       ` Vlad Buslov
2020-05-19 18:39         ` Cong Wang
2020-05-20  7:33           ` Vlad Buslov
2020-05-18 15:37 ` Edward Cree
2020-05-18 18:50   ` Cong Wang
2020-05-19  9:04   ` Vlad Buslov
2020-05-19 14:30     ` Edward Cree
2020-05-19 15:17       ` Vlad Buslov
2020-05-19 18:58     ` Cong Wang
2020-05-20  7:24       ` Vlad Buslov [this message]
2020-05-22 19:33         ` Cong Wang
2020-05-25 11:38           ` Vlad Buslov
2020-05-21 14:36   ` Vlad Buslov
2020-05-21 17:02     ` Jakub Kicinski
2020-05-22 16:16       ` Vlad Buslov
2020-05-23 11:06         ` Jamal Hadi Salim
2020-05-22 19:41     ` Cong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vbfv9krvzkv.fsf@mellanox.com \
    --to=vladbu@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=dcaratti@redhat.com \
    --cc=ecree@solarflare.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=marcelo.leitner@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.