Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Daniel Borkmann <daniel@iogearbox.net>
To: Jamal Hadi Salim <jhs@mojatatu.com>,
	Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	WANG Cong <xiyou.wangcong@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, Florian Westphal <fw@strlen.de>
Subject: Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
Date: Sun, 25 Sep 2016 18:26:47 +0200	[thread overview]
Message-ID: <57E7FAC7.6090904@iogearbox.net> (raw)
In-Reply-To: <6d2bd45a-a8a0-846d-5934-5e246522cab8@mojatatu.com>

On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
>> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>> Even today, one may create loops using existing 'egress redirect',
>>>> e.g. this rediculously errorneous construct:
>>>>
>>>>  # ip l add v0 type veth peer name v0p
>>>>  # tc filter add dev v0p parent ffff: basic \
>>>>     action mirred egress redirect dev v0
>>>
>>> I think we actually recover from this one by eventually
>>> dropping (theres a ttl field).
>>
>> [off topic]
>>
>> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
>> and after one second I got:
>>
>> # ip -s l show type veth
>> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     71660305923 469890864 0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     3509       24       0       0       0       0
>> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     3509       24       0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     71660713017 469893555 0       0       0       0
>>
>
> I think this is still on topic!
>
> Now I realize that code we took out around 4.2.x is still useful
> for such a use case (I wasnt thinking about veth when Florian was
> slimming the skb);
> +Cc Florian W.
>
> This snippet from 4.2:
> -------------
> 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
> 3526 {
> 3527         struct net_device *dev = skb->dev;
> 3528         u32 ttl = G_TC_RTTL(skb->tc_verd);
> 3529         int result = TC_ACT_OK;
> 3530         struct Qdisc *q;
> 3531
> 3532         if (unlikely(MAX_RED_LOOP < ttl++)) {
> 3533                 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n",
> 3534                                      skb->skb_iif, dev->ifindex);
> 3535                 return TC_ACT_SHOT;
> 3536         }
> 3537
> 3538         skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
> 3539         skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> 3540
> 3541         q = rcu_dereference(rxq->qdisc);
> 3542         if (q != &noop_qdisc) {
> 3543                 spin_lock(qdisc_lock(q));
> 3544                 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> 3545                         result = qdisc_enqueue_root(skb, q);
> 3546                 spin_unlock(qdisc_lock(q));
> 3547         }
> 3548
> 3549         return result;
> 3550 }
> --------------------
>
> MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> current code. The idea above was that we would increment the rttl
> counter  once and if we saw it again upto MAX_RED_LOOP we would assume
> a loop and drop the packet (at the time i didnt think it was wise to
> let the actions be in charge of setting the RTTL; it had to be central
> core code - but it may not be neccessary)
>
> Florian, when we discussed I said it was fine to reclaim those 3 bits
> on tc verdict for RTTL at the time because i had taken out the
> feature and never added it back. Your comment at the time was we can
> add it back when someone shows up with the feature.
> Shmulik is looking to add it.

Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.

>> Similarly to all constructs injecting skbs to device rx (bond/team,
>> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
>> obligated to assign 'skb2->dev' as the new rx device.
>>
>> Regarding 'skb2->skb_iif', original act_mirred code already has:
>>
>>      skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
>>      skb2->dev = dev;                     <--- THIS IS TARGET DEV
>>     err = dev_queue_xmit(skb2);
>>
>> I'm preserving this; OTOH the suggested modification in the patch is
>>
>> -    err = dev_queue_xmit(skb2);
>> +    if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
>> +        err = dev_queue_xmit(skb2);
>> +    else
>> +        netif_receive_skb(skb2);
>>
>> now, the call to 'netif_receive_skb' will eventually override skb_iif to
>> the target RX dev's index, upon entry to __netif_receive_skb_core.
>>
>> I think this IS the expected behavior - as done by other "rx injection"
>> constructs.
>
> Sounds fine.
> I am wondering if we can have a tracing feature to show the lifetime of
> the packet as it is being cycled around the kernel? It would help
> debugging if some policy misbehaves.
>
>> My doubts were around whether we should call 'dev_forward_skb' instead
>> of 'netif_receive_skb'.
>> The former does some things I assumed we're not interested of, like
>> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
>> OTOH, it DOES scrub the skb.
>> Maybe we should scrub it as well prior the netif_receive_skb call?
>
> Scrubbing the skb could be a bad idea if it gets rid of global state
> like the RTTL if you add it back.
>
> cheers,
> jamal
>

next prev parent reply	other threads:[~2016-09-25 16:26 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
2016-09-27 10:30   ` Daniel Borkmann
2016-09-27 18:24     ` Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
2016-09-22 14:54   ` Eric Dumazet
2016-09-22 18:27     ` Shmulik Ladkani
2016-09-22 18:42       ` Eric Dumazet
2016-09-22 23:40   ` Jamal Hadi Salim
2016-09-23  5:11     ` Shmulik Ladkani
2016-09-23 12:48       ` Jamal Hadi Salim
2016-09-23 15:40         ` Shmulik Ladkani
2016-09-25  0:20           ` Cong Wang
2016-09-25 13:05           ` Jamal Hadi Salim
2016-09-25 16:26             ` Daniel Borkmann [this message]
2016-09-25 18:33               ` Florian Westphal
2016-09-25 23:47                 ` Jamal Hadi Salim
2016-09-25 23:31               ` Jamal Hadi Salim
2016-09-25 17:33             ` Shmulik Ladkani
2016-09-25 18:31               ` Florian Westphal
2016-09-26  1:15                 ` Jamal Hadi Salim
2016-09-26  1:35                   ` Florian Westphal
2016-09-26  1:40                     ` Jamal Hadi Salim
2016-09-26 14:43                     ` Hannes Frederic Sowa
2016-09-26 14:53                       ` Daniel Borkmann
2016-09-26 15:12                         ` Hannes Frederic Sowa
2016-09-26 15:53                           ` Daniel Borkmann
2016-09-26 15:26                       ` Shmulik Ladkani
2016-09-25 23:45               ` Jamal Hadi Salim
2016-09-25  0:07       ` Cong Wang
2016-09-25 13:39         ` Jamal Hadi Salim
2016-09-26  4:55           ` Cong Wang
2016-09-25 17:59         ` Shmulik Ladkani
2016-09-26  4:56           ` Cong Wang
2016-09-24 23:50   ` Cong Wang
2016-09-27  5:56   ` David Miller
2016-09-27  8:07     ` Shmulik Ladkani
2016-09-27 10:39       ` Daniel Borkmann
2016-09-27 13:44         ` David Miller
2016-09-27 14:18           ` Shmulik Ladkani
2016-09-27 14:47             ` Daniel Borkmann
2016-09-27 14:06       ` Jamal Hadi Salim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57E7FAC7.6090904@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fw@strlen.de \
    --cc=jhs@mojatatu.com \
    --cc=netdev@vger.kernel.org \
    --cc=shmulik.ladkani@gmail.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.