netdev.vger.kernel.org archive mirror
From: Cong Wang <xiyou.wangcong@gmail.com>
To: Paul Blakey <paulb@nvidia.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	Vlad Buslov <vladbu@nvidia.com>, Oz Shlomo <ozsh@nvidia.com>,
	Roi Dayan <roid@nvidia.com>,
	netdev@vger.kernel.org, Saeed Mahameed <saeedm@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net v2 1/2] net: Fix return value of qdisc ingress handling on success
Date: Sat, 1 Oct 2022 13:33:19 -0700
Message-ID: <YzikD3a8VGTpAiJg@pop-os.localdomain>
In-Reply-To: <Yzig5mvDDFqqieDl@pop-os.localdomain>

On Sat, Oct 01, 2022 at 01:19:50PM -0700, Cong Wang wrote:
> On Wed, Sep 28, 2022 at 10:55:49AM +0300, Paul Blakey wrote:
> > 
> > 
> > On 25/09/2022 21:00, Cong Wang wrote:
> > > On Sun, Sep 25, 2022 at 11:14:21AM +0300, Paul Blakey wrote:
> > > > Currently qdisc ingress handling (sch_handle_ingress()) doesn't
> > > > set a return value and it is left to the old return value of
> > > > the caller (__netif_receive_skb_core()) which is RX drop, so if
> > > > the packet is consumed, caller will stop and return this value
> > > > as if the packet was dropped.
> > > > 
> > > > This causes a problem in the kernel tcp stack when having an
> > > > egress tc rule forwarding to an ingress tc rule.
> > > > The tcp stack sending packets on the device having the egress rule
> > > > will see the packets as not successfully transmitted (although they
> > > > actually were), will not advance its internal state of sent data,
> > > > and packets returning on such tcp stream will be dropped by the tcp
> > > > stack with reason ack-of-unsent-data. See reproduction in [0] below.
> > > > 
> > > 
> > > Hm, but how is this return value propagated to egress? I checked
> > > tcf_mirred_act() code, but don't see how it is even used there.
> > > 
> > > 318         err = tcf_mirred_forward(want_ingress, skb2);
> > > 319         if (err) {
> > > 320 out:
> > > 321                 tcf_action_inc_overlimit_qstats(&m->common);
> > > 322                 if (tcf_mirred_is_act_redirect(m_eaction))
> > > 323                         retval = TC_ACT_SHOT;
> > > 324         }
> > > 325         __this_cpu_dec(mirred_rec_level);
> > > 326
> > > 327         return retval;
> > > 
> > > 
> > > What am I missing?
> > 
> > For the act_mirred instance acting on ingress, it will return
> > TC_ACT_CONSUMED above the code you mentioned (since redirect=1 and
> > use_reinsert=1; TC_ACT_STOLEN, which is the retval set for this
> > action, would behave the same way).
> > 
> > 
> > It is propagated as such (TX stack starting from tcp):
> > 
> 
> Sorry for my misunderstanding.
> 
> I meant the TC_ACT_* return values, not NET_RX_*, but I worried too
> much here, as mirred lets the user specify the return value.
> 
> BTW, it seems you miss at least the drop case, which should be
> NET_RX_DROP for TC_ACT_SHOT? Possibly other code paths in
> sch_handle_ingress() too.
> 

I mean:

diff --git a/net/core/dev.c b/net/core/dev.c
index fa53830d0683..d1db8210d671 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5109,6 +5109,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
        struct mini_Qdisc *miniq = rcu_dereference_bh(skb->dev->miniq_ingress);
        struct tcf_result cl_res;

+       *ret = NET_RX_SUCCESS;
        /* If there's at least one ingress present somewhere (so
         * we get here via enabled static key), remaining devices
         * that are not configured with an ingress qdisc will bail
@@ -5136,6 +5137,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
        case TC_ACT_SHOT:
                mini_qdisc_qstats_cpu_drop(miniq);
                kfree_skb_reason(skb, SKB_DROP_REASON_TC_INGRESS);
+               *ret = NET_RX_DROP;
                return NULL;
        case TC_ACT_STOLEN:
        case TC_ACT_QUEUED:
@@ -5160,6 +5162,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
                break;
        }
 #endif /* CONFIG_NET_CLS_ACT */
+       *ret = NET_RX_SUCCESS;
        return skb;
 }
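
For illustration only (this is not part of the patch): a minimal userspace
sketch of the out-parameter pattern discussed above, assuming a caller that
starts with a "drop" return value and a handler that returns NULL when the
packet is consumed or dropped. All MODEL_* constants, struct pkt and the
model_* functions are hypothetical stand-ins, not kernel code; the set_ret
flag toggles between the unfixed behaviour and the behaviour the diff aims for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the numeric values are illustrative only. */
enum { MODEL_RX_SUCCESS, MODEL_RX_DROP };
enum { MODEL_ACT_OK, MODEL_ACT_SHOT, MODEL_ACT_STOLEN };

/* Hypothetical packet: carries the verdict the classifier will return. */
struct pkt {
	int verdict;
};

/*
 * Model of the ingress handler. Returns NULL when the packet is gone
 * (dropped or consumed). With set_ret == 0 it never writes *ret, which
 * models the behaviour the patch description complains about.
 */
static struct pkt *model_ingress(struct pkt *p, int *ret, int set_ret)
{
	if (set_ret)
		*ret = MODEL_RX_SUCCESS;

	switch (p->verdict) {
	case MODEL_ACT_SHOT:
		if (set_ret)
			*ret = MODEL_RX_DROP;	/* explicit drop case */
		return NULL;
	case MODEL_ACT_STOLEN:
		return NULL;			/* consumed, not dropped */
	default:
		return p;			/* continue up the stack */
	}
}

/* Model of the caller: its return value starts out as "drop". */
static int model_receive(struct pkt *p, int set_ret)
{
	int ret = MODEL_RX_DROP;

	p = model_ingress(p, &ret, set_ret);
	if (!p)
		return ret;	/* whatever the handler left in ret */

	return MODEL_RX_SUCCESS;	/* delivered normally */
}

/* Usage: a consumed packet reports drop unless the handler sets *ret. */
static void model_selfcheck(void)
{
	struct pkt stolen = { MODEL_ACT_STOLEN };
	struct pkt shot = { MODEL_ACT_SHOT };

	assert(model_receive(&stolen, 0) == MODEL_RX_DROP);
	assert(model_receive(&stolen, 1) == MODEL_RX_SUCCESS);
	assert(model_receive(&shot, 1) == MODEL_RX_DROP);
}
```

In this model, the only way a consumed packet is reported as success is for
the handler to write *ret itself, and the SHOT case then needs its own
explicit drop value.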


