* Re: [PATCH net-next 7/9] net: tcp: add skb drop reasons to tcp connect requesting
From: kernel test robot @ 2022-05-16 12:54 UTC (permalink / raw)
To: menglong8.dong, edumazet
Cc: llvm, kbuild-all, rostedt, mingo, davem, yoshfuji, dsahern, kuba,
pabeni, imagedong, kafai, talalahmad, keescook, dongli.zhang,
linux-kernel, netdev
In-Reply-To: <20220516034519.184876-8-imagedong@tencent.com>
Hi,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git d9713088158b23973266e07fdc85ff7d68791a8c
config: mips-mtx1_defconfig (https://download.01.org/0day-ci/archive/20220516/202205162057.owcP29LO-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 853fa8ee225edf2d0de94b0dcbd31bea916e825e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/d93679590223760e685126e344dfddd7d7c08cc3
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review menglong8-dong-gmail-com/net-tcp-add-skb-drop-reasons-to-tcp-state-change/20220516-114934
git checkout d93679590223760e685126e344dfddd7d7c08cc3
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash net/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> net/dccp/ipv4.c:584:5: error: conflicting types for 'dccp_v4_conn_request'
int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
^
net/dccp/dccp.h:258:5: note: previous declaration is here
int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb);
^
>> net/dccp/ipv4.c:921:21: error: incompatible function pointer types initializing 'int (*)(struct sock *, struct sk_buff *, enum skb_drop_reason *)' with an expression of type 'int (struct sock *, struct sk_buff *)' [-Werror,-Wincompatible-function-pointer-types]
.conn_request = dccp_v4_conn_request,
^~~~~~~~~~~~~~~~~~~~
2 errors generated.
vim +/dccp_v4_conn_request +584 net/dccp/ipv4.c
583
> 584 int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb,
585 enum skb_drop_reason *reason)
586 {
587 struct inet_request_sock *ireq;
588 struct request_sock *req;
589 struct dccp_request_sock *dreq;
590 const __be32 service = dccp_hdr_request(skb)->dccph_req_service;
591 struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
592
593 /* Never answer to DCCP_PKT_REQUESTs send to broadcast or multicast */
594 if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
595 return 0; /* discard, don't send a reset here */
596
597 if (dccp_bad_service_code(sk, service)) {
598 dcb->dccpd_reset_code = DCCP_RESET_CODE_BAD_SERVICE_CODE;
599 goto drop;
600 }
601 /*
602 * TW buckets are converted to open requests without
603 * limitations, they conserve resources and peer is
604 * evidently real one.
605 */
606 dcb->dccpd_reset_code = DCCP_RESET_CODE_TOO_BUSY;
607 if (inet_csk_reqsk_queue_is_full(sk))
608 goto drop;
609
610 if (sk_acceptq_is_full(sk))
611 goto drop;
612
613 req = inet_reqsk_alloc(&dccp_request_sock_ops, sk, true);
614 if (req == NULL)
615 goto drop;
616
617 if (dccp_reqsk_init(req, dccp_sk(sk), skb))
618 goto drop_and_free;
619
620 dreq = dccp_rsk(req);
621 if (dccp_parse_options(sk, dreq, skb))
622 goto drop_and_free;
623
624 if (security_inet_conn_request(sk, skb, req))
625 goto drop_and_free;
626
627 ireq = inet_rsk(req);
628 sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
629 sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
630 ireq->ir_mark = inet_request_mark(sk, skb);
631 ireq->ireq_family = AF_INET;
632 ireq->ir_iif = sk->sk_bound_dev_if;
633
634 /*
635 * Step 3: Process LISTEN state
636 *
637 * Set S.ISR, S.GSR, S.SWL, S.SWH from packet or Init Cookie
638 *
639 * Setting S.SWL/S.SWH to is deferred to dccp_create_openreq_child().
640 */
641 dreq->dreq_isr = dcb->dccpd_seq;
642 dreq->dreq_gsr = dreq->dreq_isr;
643 dreq->dreq_iss = dccp_v4_init_sequence(skb);
644 dreq->dreq_gss = dreq->dreq_iss;
645 dreq->dreq_service = service;
646
647 if (dccp_v4_send_response(sk, req))
648 goto drop_and_free;
649
650 inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
651 reqsk_put(req);
652 return 0;
653
654 drop_and_free:
655 reqsk_free(req);
656 drop:
657 __DCCP_INC_STATS(DCCP_MIB_ATTEMPTFAILS);
658 return -1;
659 }
660 EXPORT_SYMBOL_GPL(dccp_v4_conn_request);
661
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply
* [Patch] net: af_key: check encryption module availability consistency
From: Thomas Bartschies @ 2022-05-16 12:38 UTC (permalink / raw)
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
produces invalid pfkey acquire messages, when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
diff -uprN a/net/key/af_key.c b/net/key/af_key.c
--- a/net/key/af_key.c 2022-05-09 09:16:33.000000000 +0200
+++ b/net/key/af_key.c 2022-05-13 13:51:58.286250337 +0200
@@ -2898,7 +2898,7 @@ static int count_ah_combs(const struct x
break;
if (!aalg->pfkey_supported)
continue;
- if (aalg_tmpl_set(t, aalg))
+ if (aalg_tmpl_set(t, aalg) && aalg->available)
sz += sizeof(struct sadb_comb);
}
return sz + sizeof(struct sadb_prop);
@@ -2916,7 +2916,7 @@ static int count_esp_combs(const struct
if (!ealg->pfkey_supported)
continue;
- if (!(ealg_tmpl_set(t, ealg)))
+ if (!(ealg_tmpl_set(t, ealg) && ealg->available))
continue;
for (k = 1; ; k++) {
@@ -2927,7 +2927,7 @@ static int count_esp_combs(const struct
if (!aalg->pfkey_supported)
continue;
- if (aalg_tmpl_set(t, aalg))
+ if (aalg_tmpl_set(t, aalg) && aalg->available)
sz += sizeof(struct sadb_comb);
}
}
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Pablo Neira Ayuso @ 2022-05-16 12:43 UTC (permalink / raw)
To: Sven Auhagen
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <20220516122300.6gwrlmun4w3ynz7s@SvensMacbookPro.hq.voleatech.com>
On Mon, May 16, 2022 at 02:23:00PM +0200, Sven Auhagen wrote:
> On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> > On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > set the flow table teardown flag. The packet path continues to set lower
> > > > timeout value as per the new TCP state but the offload flag remains set.
> > > >
> > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > the internal offload timeout (one day).
> > > >
> > > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > > connections.
> > > >
> > > > On the nftables side we only need to allow established TCP connections to
> > > > create a flow offload entry. Since we can not guaruantee that
> > > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > > and only fixes up established TCP connections.
> > > [...]
> > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > > >
> > > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > - nf_ct_offload_timeout(tmp);
> > >
> > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > path that triggers the race, ie. nf_ct_is_expired()
> > >
> > > The flowtable ct fixup races with conntrack gc collector.
> > >
> > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > the closing packets.
> > >
> > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > in a TCP state that represent closure?
> > >
> > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > goto out;
> > >
> > > this is already the intention in the existing code.
> >
> > I'm attaching an incomplete sketch patch. My goal is to avoid the
> > extra IPS_ bit.
>
> You might create a race with ct gc that will remove the ct
> if it is in close or end of close and before flow offload teardown is running
> so flow offload teardown might access memory that was freed.
flow object holds a reference to the ct object until it is released,
no use-after-free can happen.
> It is not a very likely scenario but never the less it might happen now
> since the IPS_OFFLOAD_BIT is not set and the state might just time out.
>
> If someone sets a very small TCP CLOSE timeout it gets more likely.
>
> So Oz and myself were debatting about three possible cases/problems:
>
> 1. ct gc sets timeout even though the state is in CLOSE/FIN because the
> IPS_OFFLOAD is still set but the flow is in teardown
> 2. ct gc removes the ct because the IPS_OFFLOAD is not set and
> the CLOSE timeout is reached before the flow offload del
OK.
> 3. tcp ct is always set to ESTABLISHED with a very long timeout
> in flow offload teardown/delete even though the state is already
> CLOSED.
>
> Also as a remark we can not assume that the FIN or RST packet is hitting
> flow table teardown as the packet might get bumped to the slow path in
> nftables.
I assume this remark is related to 3.?
if IPS_OFFLOAD is unset, then conntrack would update the state
according to this FIN or RST.
Thanks for the summary.
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Sven Auhagen @ 2022-05-16 12:23 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoI/z+aWkmAAycR3@salvia>
On Mon, May 16, 2022 at 02:13:03PM +0200, Pablo Neira Ayuso wrote:
> On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > set the flow table teardown flag. The packet path continues to set lower
> > > timeout value as per the new TCP state but the offload flag remains set.
> > >
> > > Hence, the conntrack garbage collector may race to undo the timeout
> > > adjustment of the packet path, leaving the conntrack entry in place with
> > > the internal offload timeout (one day).
> > >
> > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > connections.
> > >
> > > On the nftables side we only need to allow established TCP connections to
> > > create a flow offload entry. Since we can not guaruantee that
> > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > and only fixes up established TCP connections.
> > [...]
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 0164e5f522e8..324fdb62c08b 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > >
> > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > - nf_ct_offload_timeout(tmp);
> >
> > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > path that triggers the race, ie. nf_ct_is_expired()
> >
> > The flowtable ct fixup races with conntrack gc collector.
> >
> > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > the closing packets.
> >
> > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > in a TCP state that represent closure?
> >
> > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > goto out;
> >
> > this is already the intention in the existing code.
>
> I'm attaching an incomplete sketch patch. My goal is to avoid the
> extra IPS_ bit.
You might create a race with ct gc that will remove the ct
if it is in close or end of close and before flow offload teardown is running
so flow offload teardown might access memory that was freed.
It is not a very likely scenario but never the less it might happen now
since the IPS_OFFLOAD_BIT is not set and the state might just time out.
If someone sets a very small TCP CLOSE timeout it gets more likely.
So Oz and myself were debatting about three possible cases/problems:
1. ct gc sets timeout even though the state is in CLOSE/FIN because the
IPS_OFFLOAD is still set but the flow is in teardown
2. ct gc removes the ct because the IPS_OFFLOAD is not set and
the CLOSE timeout is reached before the flow offload del
3. tcp ct is always set to ESTABLISHED with a very long timeout
in flow offload teardown/delete even though the state is already
CLOSED.
Also as a remark we can not assume that the FIN or RST packet is hitting
flow table teardown as the packet might get bumped to the slow path in
nftables.
Best
Sven
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Sven Auhagen @ 2022-05-16 12:17 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoI+RGnrHbTJJqxB@salvia>
On Mon, May 16, 2022 at 02:06:28PM +0200, Pablo Neira Ayuso wrote:
> On Mon, May 16, 2022 at 01:37:40PM +0200, Pablo Neira Ayuso wrote:
> > On Mon, May 16, 2022 at 01:18:17PM +0200, Sven Auhagen wrote:
> > > On Mon, May 16, 2022 at 12:56:38PM +0200, Pablo Neira Ayuso wrote:
> > > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > > set the flow table teardown flag. The packet path continues to set lower
> > > > > timeout value as per the new TCP state but the offload flag remains set.
> > > > >
> > > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > > the internal offload timeout (one day).
> > > > >
> > > > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > > > connections.
> > > > >
> > > > > On the nftables side we only need to allow established TCP connections to
> > > > > create a flow offload entry. Since we can not guaruantee that
> > > > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > > > and only fixes up established TCP connections.
> > > > [...]
> > > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > > > >
> > > > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > > - nf_ct_offload_timeout(tmp);
> > > >
> > > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > > path that triggers the race, ie. nf_ct_is_expired()
> > > >
> > > > The flowtable ct fixup races with conntrack gc collector.
> > > >
> > > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > > the closing packets.
> > > >
> > > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > > in a TCP state that represent closure?
> > >
> > > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > > goto out;
> > > >
> > > > this is already the intention in the existing code.
> > > >
> > > > If this does work, could you keep IPS_OFFLOAD_TEARDOWN_BIT internal,
> > > > ie. no in uapi? Define it at include/net/netfilter/nf_conntrack.h and
> > > > add a comment regarding this to avoid an overlap in the future.
> > > >
> > > > > + if (!test_bit(IPS_OFFLOAD_TEARDOWN_BIT, &tmp->status))
> > > > > + nf_ct_offload_timeout(tmp);
> > > > > continue;
> > > > > }
> > > > >
> > > > > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > > > > index 3db256da919b..aaed1a244013 100644
> > > > > --- a/net/netfilter/nf_flow_table_core.c
> > > > > +++ b/net/netfilter/nf_flow_table_core.c
> > > > > @@ -177,14 +177,8 @@ int flow_offload_route_init(struct flow_offload *flow,
> > > > > }
> > > > > EXPORT_SYMBOL_GPL(flow_offload_route_init);
> > > > >
> > > > > -static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> > > > > -{
> > > > > - tcp->state = TCP_CONNTRACK_ESTABLISHED;
> > > > > - tcp->seen[0].td_maxwin = 0;
> > > > > - tcp->seen[1].td_maxwin = 0;
> > > > > -}
> > > > >
> > > > > -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > > > +static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > > > {
> > > > > struct net *net = nf_ct_net(ct);
> > > > > int l4num = nf_ct_protonum(ct);
> > > > > @@ -192,8 +186,12 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > > >
> > > > > if (l4num == IPPROTO_TCP) {
> > > > > struct nf_tcp_net *tn = nf_tcp_pernet(net);
> > > > > + struct ip_ct_tcp *tcp = &ct->proto.tcp;
> > > > > +
> > > > > + tcp->seen[0].td_maxwin = 0;
> > > > > + tcp->seen[1].td_maxwin = 0;
> > > > >
> > > > > - timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > > > > + timeout = tn->timeouts[ct->proto.tcp.state];
> > > > > timeout -= tn->offload_timeout;
> > > > > } else if (l4num == IPPROTO_UDP) {
> > > > > struct nf_udp_net *tn = nf_udp_pernet(net);
> > > > > @@ -211,18 +209,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > > > WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
> > > > > }
> > > > >
> > > > > -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> > > > > -{
> > > > > - if (nf_ct_protonum(ct) == IPPROTO_TCP)
> > > > > - flow_offload_fixup_tcp(&ct->proto.tcp);
> > > > > -}
> > > > > -
> > > > > -static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > > > -{
> > > > > - flow_offload_fixup_ct_state(ct);
> > > > > - flow_offload_fixup_ct_timeout(ct);
> > > > > -}
> > > > > -
> > > > > static void flow_offload_route_release(struct flow_offload *flow)
> > > > > {
> > > > > nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> > > > > @@ -353,6 +339,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
> > > > > static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > > struct flow_offload *flow)
> > > > > {
> > > > > + struct nf_conn *ct = flow->ct;
> > > > > +
> > > > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > > > > +
> > > > > rhashtable_remove_fast(&flow_table->rhashtable,
> > > > > &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
> > > > > nf_flow_offload_rhash_params);
> > > > > @@ -360,12 +350,11 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > > &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
> > > > > nf_flow_offload_rhash_params);
> > > > >
> > > > > - clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > > > > -
> > > > > if (nf_flow_has_expired(flow))
> > > > > - flow_offload_fixup_ct(flow->ct);
> > > > > - else
> > > > > - flow_offload_fixup_ct_timeout(flow->ct);
> > > > > + flow_offload_fixup_ct(ct);
> > > >
> > > > Very unlikely, but race might still happen between fixup and
> > > > clear IPS_OFFLOAD_BIT with gc below?
> > > >
> > > > Without checking from the packet path, the conntrack gc might race to
> > > > refresh the timeout, I don't see a 100% race free solution.
> > > >
> > > > Probably update the nf_ct_offload_timeout to a shorter value than a
> > > > day would mitigate this issue too.
> > >
> > > This section of the code is now protected by IPS_OFFLOAD_TEARDOWN_BIT
> > > which will prevent the update via nf_ct_offload_timeout.
> > > We set it at the beginning of flow_offload_del and flow_offload_teardown.
> > >
> > > Since flow_offload_teardown is only called on TCP packets
> > > we also need to set it at flow_offload_del to prevent the race.
> > >
> > > This should prevent the race at this point.
> >
> > OK.
> >
> > > > > + clear_bit(IPS_OFFLOAD_BIT, &ct->status);
> > > > > + clear_bit(IPS_OFFLOAD_TEARDOWN_BIT, &ct->status);
> > > > >
> > > > > flow_offload_free(flow);
> > > > > }
> > > > > @@ -373,8 +362,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > > void flow_offload_teardown(struct flow_offload *flow)
> > > > > {
> > > > > set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > > > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > > > >
> > > > > - flow_offload_fixup_ct_state(flow->ct);
> > > > > + flow_offload_fixup_ct(flow->ct);
> > > > > }
> > > > > EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > > > >
> > > > > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > > > > index 900d48c810a1..9cc3ea08eb3a 100644
> > > > > --- a/net/netfilter/nft_flow_offload.c
> > > > > +++ b/net/netfilter/nft_flow_offload.c
> > > > > @@ -295,6 +295,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> > > > > sizeof(_tcph), &_tcph);
> > > > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > > > goto out;
> > > > > + if (unlikely(!nf_conntrack_tcp_established(ct)))
> > > > > + goto out;
> > > >
> > > > This chunk is not required, from ruleset users can do
> > > >
> > > > ... ct status assured ...
> > > >
> > > > instead.
> > >
> > > Maybe this should be mentioned in the manual or wiki if
> > > it is not necessary in the flow offload code.
> >
> > Yes, documentation and wiki can be updated.
> >
> > Users might want to offload the flow at a later stage in the TCP
> > connection.
>
> Well, actually there is not later stage than established, anything
> after established are TCP teardown states.
>
> What's the issue with allowing to offload from SYN_RECV state?
There were multiple problem in general with the code.
flow_offload_fixup_tcp always moves a TCP connection
to established even if it is in FIN or CLOSE.
The flowoffload_del function was always setting the TCP timeout
to ESTABLISHED timeout even when the state was in CLOSE and therefore
creating a very long lasting dead state.
Since we might miss or bump packets to slow path, we do not know
what will happen there when we are still in SYN_RECV.
We will have a better knowledge of the TCP state when we are in
established first and we know that we are either still in it or
we have moved past it to a closing state.
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Pablo Neira Ayuso @ 2022-05-16 12:13 UTC (permalink / raw)
To: Oz Shlomo
Cc: Sven Auhagen, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoIt5rHw4Xwl1zgY@salvia>
[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]
On Mon, May 16, 2022 at 12:56:41PM +0200, Pablo Neira Ayuso wrote:
> On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > Connections leaving the established state (due to RST / FIN TCP packets)
> > set the flow table teardown flag. The packet path continues to set lower
> > timeout value as per the new TCP state but the offload flag remains set.
> >
> > Hence, the conntrack garbage collector may race to undo the timeout
> > adjustment of the packet path, leaving the conntrack entry in place with
> > the internal offload timeout (one day).
> >
> > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > connections.
> >
> > On the nftables side we only need to allow established TCP connections to
> > create a flow offload entry. Since we can not guaruantee that
> > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > and only fixes up established TCP connections.
> [...]
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 0164e5f522e8..324fdb62c08b 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > tmp = nf_ct_tuplehash_to_ctrack(h);
> >
> > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > - nf_ct_offload_timeout(tmp);
>
> Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> path that triggers the race, ie. nf_ct_is_expired()
>
> The flowtable ct fixup races with conntrack gc collector.
>
> Clearing IPS_OFFLOAD might result in offloading the entry again for
> the closing packets.
>
> Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> in a TCP state that represent closure?
>
> if (unlikely(!tcph || tcph->fin || tcph->rst))
> goto out;
>
> this is already the intention in the existing code.
I'm attaching an incomplete sketch patch. My goal is to avoid the
extra IPS_ bit.
[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 1804 bytes --]
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 20b4a14e5d4e..7af1e2e8f595 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -362,8 +362,6 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
&flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
nf_flow_offload_rhash_params);
- clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
-
if (nf_flow_has_expired(flow))
flow_offload_fixup_ct(flow->ct);
else
@@ -375,6 +373,7 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
void flow_offload_teardown(struct flow_offload *flow)
{
set_bit(NF_FLOW_TEARDOWN, &flow->flags);
+ clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
flow_offload_fixup_ct_state(flow->ct);
}
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 187b8cb9a510..7bc56377496c 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -273,6 +273,12 @@ static bool nft_flow_offload_skip(struct sk_buff *skb, int family)
return false;
}
+static bool flow_offload_teardown_state(const struct ip_ct_tcp *state)
+{
+ return state->state > TCP_CONNTRACK_ESTABLISHED &&
+ state->state <= TCP_CONNTRACK_CLOSE;
+}
+
static void nft_flow_offload_eval(const struct nft_expr *expr,
struct nft_regs *regs,
const struct nft_pktinfo *pkt)
@@ -298,7 +304,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
case IPPROTO_TCP:
tcph = skb_header_pointer(pkt->skb, nft_thoff(pkt),
sizeof(_tcph), &_tcph);
- if (unlikely(!tcph || tcph->fin || tcph->rst))
+ if (unlikely(!tcph || tcph->fin || tcph->rst ||
+ flow_offload_teardown_state(ct->proto.tcp)))
goto out;
break;
case IPPROTO_UDP:
^ permalink raw reply related
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Pablo Neira Ayuso @ 2022-05-16 12:06 UTC (permalink / raw)
To: Sven Auhagen
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoI3gliaYc250Vnb@salvia>
On Mon, May 16, 2022 at 01:37:40PM +0200, Pablo Neira Ayuso wrote:
> On Mon, May 16, 2022 at 01:18:17PM +0200, Sven Auhagen wrote:
> > On Mon, May 16, 2022 at 12:56:38PM +0200, Pablo Neira Ayuso wrote:
> > > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > set the flow table teardown flag. The packet path continues to set lower
> > > > timeout value as per the new TCP state but the offload flag remains set.
> > > >
> > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > the internal offload timeout (one day).
> > > >
> > > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > > connections.
> > > >
> > > > On the nftables side we only need to allow established TCP connections to
> > > > create a flow offload entry. Since we can not guaruantee that
> > > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > > and only fixes up established TCP connections.
> > > [...]
> > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > index 0164e5f522e8..324fdb62c08b 100644
> > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > > >
> > > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > > - nf_ct_offload_timeout(tmp);
> > >
> > > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > > path that triggers the race, ie. nf_ct_is_expired()
> > >
> > > The flowtable ct fixup races with conntrack gc collector.
> > >
> > > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > > the closing packets.
> > >
> > > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > > in a TCP state that represent closure?
> >
> > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > goto out;
> > >
> > > this is already the intention in the existing code.
> > >
> > > If this does work, could you keep IPS_OFFLOAD_TEARDOWN_BIT internal,
> > > ie. no in uapi? Define it at include/net/netfilter/nf_conntrack.h and
> > > add a comment regarding this to avoid an overlap in the future.
> > >
> > > > + if (!test_bit(IPS_OFFLOAD_TEARDOWN_BIT, &tmp->status))
> > > > + nf_ct_offload_timeout(tmp);
> > > > continue;
> > > > }
> > > >
> > > > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > > > index 3db256da919b..aaed1a244013 100644
> > > > --- a/net/netfilter/nf_flow_table_core.c
> > > > +++ b/net/netfilter/nf_flow_table_core.c
> > > > @@ -177,14 +177,8 @@ int flow_offload_route_init(struct flow_offload *flow,
> > > > }
> > > > EXPORT_SYMBOL_GPL(flow_offload_route_init);
> > > >
> > > > -static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> > > > -{
> > > > - tcp->state = TCP_CONNTRACK_ESTABLISHED;
> > > > - tcp->seen[0].td_maxwin = 0;
> > > > - tcp->seen[1].td_maxwin = 0;
> > > > -}
> > > >
> > > > -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > > +static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > > {
> > > > struct net *net = nf_ct_net(ct);
> > > > int l4num = nf_ct_protonum(ct);
> > > > @@ -192,8 +186,12 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > >
> > > > if (l4num == IPPROTO_TCP) {
> > > > struct nf_tcp_net *tn = nf_tcp_pernet(net);
> > > > + struct ip_ct_tcp *tcp = &ct->proto.tcp;
> > > > +
> > > > + tcp->seen[0].td_maxwin = 0;
> > > > + tcp->seen[1].td_maxwin = 0;
> > > >
> > > > - timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > > > + timeout = tn->timeouts[ct->proto.tcp.state];
> > > > timeout -= tn->offload_timeout;
> > > > } else if (l4num == IPPROTO_UDP) {
> > > > struct nf_udp_net *tn = nf_udp_pernet(net);
> > > > @@ -211,18 +209,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > > WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
> > > > }
> > > >
> > > > -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> > > > -{
> > > > - if (nf_ct_protonum(ct) == IPPROTO_TCP)
> > > > - flow_offload_fixup_tcp(&ct->proto.tcp);
> > > > -}
> > > > -
> > > > -static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > > -{
> > > > - flow_offload_fixup_ct_state(ct);
> > > > - flow_offload_fixup_ct_timeout(ct);
> > > > -}
> > > > -
> > > > static void flow_offload_route_release(struct flow_offload *flow)
> > > > {
> > > > nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> > > > @@ -353,6 +339,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
> > > > static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > struct flow_offload *flow)
> > > > {
> > > > + struct nf_conn *ct = flow->ct;
> > > > +
> > > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > > > +
> > > > rhashtable_remove_fast(&flow_table->rhashtable,
> > > > &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
> > > > nf_flow_offload_rhash_params);
> > > > @@ -360,12 +350,11 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
> > > > nf_flow_offload_rhash_params);
> > > >
> > > > - clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > > > -
> > > > if (nf_flow_has_expired(flow))
> > > > - flow_offload_fixup_ct(flow->ct);
> > > > - else
> > > > - flow_offload_fixup_ct_timeout(flow->ct);
> > > > + flow_offload_fixup_ct(ct);
> > >
> > > Very unlikely, but race might still happen between fixup and
> > > clear IPS_OFFLOAD_BIT with gc below?
> > >
> > > Without checking from the packet path, the conntrack gc might race to
> > > refresh the timeout, I don't see a 100% race free solution.
> > >
> > > Probably update the nf_ct_offload_timeout to a shorter value than a
> > > day would mitigate this issue too.
> >
> > This section of the code is now protected by IPS_OFFLOAD_TEARDOWN_BIT
> > which will prevent the update via nf_ct_offload_timeout.
> > We set it at the beginning of flow_offload_del and flow_offload_teardown.
> >
> > Since flow_offload_teardown is only called on TCP packets
> > we also need to set it at flow_offload_del to prevent the race.
> >
> > This should prevent the race at this point.
>
> OK.
>
> > > > + clear_bit(IPS_OFFLOAD_BIT, &ct->status);
> > > > + clear_bit(IPS_OFFLOAD_TEARDOWN_BIT, &ct->status);
> > > >
> > > > flow_offload_free(flow);
> > > > }
> > > > @@ -373,8 +362,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > > void flow_offload_teardown(struct flow_offload *flow)
> > > > {
> > > > set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > > >
> > > > - flow_offload_fixup_ct_state(flow->ct);
> > > > + flow_offload_fixup_ct(flow->ct);
> > > > }
> > > > EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > > >
> > > > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > > > index 900d48c810a1..9cc3ea08eb3a 100644
> > > > --- a/net/netfilter/nft_flow_offload.c
> > > > +++ b/net/netfilter/nft_flow_offload.c
> > > > @@ -295,6 +295,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> > > > sizeof(_tcph), &_tcph);
> > > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > > goto out;
> > > > + if (unlikely(!nf_conntrack_tcp_established(ct)))
> > > > + goto out;
> > >
> > > This chunk is not required, from ruleset users can do
> > >
> > > ... ct status assured ...
> > >
> > > instead.
> >
> > Maybe this should be mentioned in the manual or wiki if
> > it is not necessary in the flow offload code.
>
> Yes, documentation and wiki can be updated.
>
> Users might want to offload the flow at a later stage in the TCP
> connection.
Well, actually there is not later stage than established, anything
after established are TCP teardown states.
What's the issue with allowing to offload from SYN_RECV state?
^ permalink raw reply
* Re: [PATCH v2] net: phy: marvell: Add errata section 5.1 for Alaska PHY
From: Andrew Lunn @ 2022-05-16 11:57 UTC (permalink / raw)
To: Stefan Roese
Cc: netdev, Leszek Polak, Marek Behún, Heiner Kallweit,
Russell King, David S . Miller
In-Reply-To: <20220516070859.549170-1-sr@denx.de>
On Mon, May 16, 2022 at 09:08:59AM +0200, Stefan Roese wrote:
> From: Leszek Polak <lpolak@arri.de>
>
> As per Errata Section 5.1, if EEE is intended to be used, some register
> writes must be done once after every hardware reset. This patch now adds
> the necessary register writes as listed in the Marvell errata.
>
> Without this fix we experience ethernet problems on some of our boards
> equipped with a new version of this ethernet PHY (different supplier).
>
> The fix applies to Marvell Alaska 88E1510/88E1518/88E1512/88E1514
> Rev. A0.
>
> Signed-off-by: Leszek Polak <lpolak@arri.de>
> Signed-off-by: Stefan Roese <sr@denx.de>
> Cc: Marek Behún <kabel@kernel.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* [PATCH linux-next] net: smc911x: replace ternary operator with min()
From: Guo Zhengkui @ 2022-05-16 11:56 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Colin Ian King, Guo Zhengkui, Jiasheng Jiang,
open list:NETWORKING DRIVERS, open list
Cc: zhengkui_guo
Fix the following coccicheck warning:
drivers/net/ethernet/smsc/smc911x.c:483:20-22: WARNING opportunity for min()
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
---
drivers/net/ethernet/smsc/smc911x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/smsc/smc911x.c b/drivers/net/ethernet/smsc/smc911x.c
index fc9cef9dcefc..2694287770e6 100644
--- a/drivers/net/ethernet/smsc/smc911x.c
+++ b/drivers/net/ethernet/smsc/smc911x.c
@@ -480,7 +480,7 @@ static void smc911x_hardware_send_pkt(struct net_device *dev)
SMC_SET_TX_FIFO(lp, cmdB);
DBG(SMC_DEBUG_PKTS, dev, "Transmitted packet\n");
- PRINT_PKT(buf, len <= 64 ? len : 64);
+ PRINT_PKT(buf, min(len, 64));
/* Send pkt via PIO or DMA */
#ifdef SMC_USE_DMA
--
2.20.1
^ permalink raw reply related
* Re: [PATCH bpf-next 1/2] cpuidle/rcu: Making arch_cpu_idle and rcu_idle_exit noinstr
From: Frederic Weisbecker @ 2022-05-16 11:49 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Masami Hiramatsu, netdev, bpf, lkml, Martin KaFai Lau, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Steven Rostedt
In-Reply-To: <20220516042535.GV1790663@paulmck-ThinkPad-P17-Gen-1>
On Sun, May 15, 2022 at 09:25:35PM -0700, Paul E. McKenney wrote:
> On Sun, May 15, 2022 at 10:36:52PM +0200, Jiri Olsa wrote:
> > Making arch_cpu_idle and rcu_idle_exit noinstr. Both functions run
> > in rcu 'not watching' context and if there's tracer attached to
> > them, which uses rcu (e.g. kprobe multi interface) it will hit RCU
> > warning like:
> >
> > [ 3.017540] WARNING: suspicious RCU usage
> > ...
> > [ 3.018363] kprobe_multi_link_handler+0x68/0x1c0
> > [ 3.018364] ? kprobe_multi_link_handler+0x3e/0x1c0
> > [ 3.018366] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018367] ? arch_cpu_idle_dead+0x10/0x10
> > [ 3.018371] fprobe_handler.part.0+0xab/0x150
> > [ 3.018374] 0xffffffffa00080c8
> > [ 3.018393] ? arch_cpu_idle+0x5/0x10
> > [ 3.018398] arch_cpu_idle+0x5/0x10
> > [ 3.018399] default_idle_call+0x59/0x90
> > [ 3.018401] do_idle+0x1c3/0x1d0
> >
> > The call path is following:
> >
> > default_idle_call
> > rcu_idle_enter
> > arch_cpu_idle
> > rcu_idle_exit
> >
> > The arch_cpu_idle and rcu_idle_exit are the only ones from above
> > path that are traceble and cause this problem on my setup.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>
> From an RCU viewpoint:
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>
> [ I considered asking for an instrumentation_on() in rcu_idle_exit(),
> but there is no point given that local_irq_restore() isn't something
> you instrument anyway. ]
So local_irq_save() in the beginning of rcu_idle_exit() is unsafe because
it is instrumentable by the function (graph) tracers and the irqsoff tracer.
Also it calls into lockdep that might make use of RCU.
That's why rcu_idle_exit() is not noinstr yet. See this patch:
https://lore.kernel.org/lkml/20220503100051.2799723-4-frederic@kernel.org/
Thanks.
>
> > ---
> > arch/x86/kernel/process.c | 2 +-
> > kernel/rcu/tree.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index b370767f5b19..1345cb0124a6 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -720,7 +720,7 @@ void arch_cpu_idle_dead(void)
> > /*
> > * Called from the generic idle code.
> > */
> > -void arch_cpu_idle(void)
> > +void noinstr arch_cpu_idle(void)
> > {
> > x86_idle();
> > }
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a4b8189455d5..20d529722f51 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -896,7 +896,7 @@ static void noinstr rcu_eqs_exit(bool user)
> > * If you add or remove a call to rcu_idle_exit(), be sure to test with
> > * CONFIG_RCU_EQS_DEBUG=y.
> > */
> > -void rcu_idle_exit(void)
> > +void noinstr rcu_idle_exit(void)
> > {
> > unsigned long flags;
> >
> > --
> > 2.35.3
> >
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Pablo Neira Ayuso @ 2022-05-16 11:37 UTC (permalink / raw)
To: Sven Auhagen
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <20220516111817.2jic2qnij2dvkp5i@Svens-MacBookPro.local>
On Mon, May 16, 2022 at 01:18:17PM +0200, Sven Auhagen wrote:
> On Mon, May 16, 2022 at 12:56:38PM +0200, Pablo Neira Ayuso wrote:
> > On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > set the flow table teardown flag. The packet path continues to set lower
> > > timeout value as per the new TCP state but the offload flag remains set.
> > >
> > > Hence, the conntrack garbage collector may race to undo the timeout
> > > adjustment of the packet path, leaving the conntrack entry in place with
> > > the internal offload timeout (one day).
> > >
> > > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > > connections.
> > >
> > > On the nftables side we only need to allow established TCP connections to
> > > create a flow offload entry. Since we can not guaruantee that
> > > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > > and only fixes up established TCP connections.
> > [...]
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 0164e5f522e8..324fdb62c08b 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > > tmp = nf_ct_tuplehash_to_ctrack(h);
> > >
> > > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > > - nf_ct_offload_timeout(tmp);
> >
> > Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> > path that triggers the race, ie. nf_ct_is_expired()
> >
> > The flowtable ct fixup races with conntrack gc collector.
> >
> > Clearing IPS_OFFLOAD might result in offloading the entry again for
> > the closing packets.
> >
> > Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> > in a TCP state that represent closure?
>
> > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > goto out;
> >
> > this is already the intention in the existing code.
> >
> > If this does work, could you keep IPS_OFFLOAD_TEARDOWN_BIT internal,
> > ie. no in uapi? Define it at include/net/netfilter/nf_conntrack.h and
> > add a comment regarding this to avoid an overlap in the future.
> >
> > > + if (!test_bit(IPS_OFFLOAD_TEARDOWN_BIT, &tmp->status))
> > > + nf_ct_offload_timeout(tmp);
> > > continue;
> > > }
> > >
> > > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > > index 3db256da919b..aaed1a244013 100644
> > > --- a/net/netfilter/nf_flow_table_core.c
> > > +++ b/net/netfilter/nf_flow_table_core.c
> > > @@ -177,14 +177,8 @@ int flow_offload_route_init(struct flow_offload *flow,
> > > }
> > > EXPORT_SYMBOL_GPL(flow_offload_route_init);
> > >
> > > -static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> > > -{
> > > - tcp->state = TCP_CONNTRACK_ESTABLISHED;
> > > - tcp->seen[0].td_maxwin = 0;
> > > - tcp->seen[1].td_maxwin = 0;
> > > -}
> > >
> > > -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > +static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > {
> > > struct net *net = nf_ct_net(ct);
> > > int l4num = nf_ct_protonum(ct);
> > > @@ -192,8 +186,12 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > >
> > > if (l4num == IPPROTO_TCP) {
> > > struct nf_tcp_net *tn = nf_tcp_pernet(net);
> > > + struct ip_ct_tcp *tcp = &ct->proto.tcp;
> > > +
> > > + tcp->seen[0].td_maxwin = 0;
> > > + tcp->seen[1].td_maxwin = 0;
> > >
> > > - timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > > + timeout = tn->timeouts[ct->proto.tcp.state];
> > > timeout -= tn->offload_timeout;
> > > } else if (l4num == IPPROTO_UDP) {
> > > struct nf_udp_net *tn = nf_udp_pernet(net);
> > > @@ -211,18 +209,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > > WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
> > > }
> > >
> > > -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> > > -{
> > > - if (nf_ct_protonum(ct) == IPPROTO_TCP)
> > > - flow_offload_fixup_tcp(&ct->proto.tcp);
> > > -}
> > > -
> > > -static void flow_offload_fixup_ct(struct nf_conn *ct)
> > > -{
> > > - flow_offload_fixup_ct_state(ct);
> > > - flow_offload_fixup_ct_timeout(ct);
> > > -}
> > > -
> > > static void flow_offload_route_release(struct flow_offload *flow)
> > > {
> > > nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> > > @@ -353,6 +339,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
> > > static void flow_offload_del(struct nf_flowtable *flow_table,
> > > struct flow_offload *flow)
> > > {
> > > + struct nf_conn *ct = flow->ct;
> > > +
> > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > > +
> > > rhashtable_remove_fast(&flow_table->rhashtable,
> > > &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
> > > nf_flow_offload_rhash_params);
> > > @@ -360,12 +350,11 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
> > > nf_flow_offload_rhash_params);
> > >
> > > - clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > > -
> > > if (nf_flow_has_expired(flow))
> > > - flow_offload_fixup_ct(flow->ct);
> > > - else
> > > - flow_offload_fixup_ct_timeout(flow->ct);
> > > + flow_offload_fixup_ct(ct);
> >
> > Very unlikely, but race might still happen between fixup and
> > clear IPS_OFFLOAD_BIT with gc below?
> >
> > Without checking from the packet path, the conntrack gc might race to
> > refresh the timeout, I don't see a 100% race free solution.
> >
> > Probably update the nf_ct_offload_timeout to a shorter value than a
> > day would mitigate this issue too.
>
> This section of the code is now protected by IPS_OFFLOAD_TEARDOWN_BIT
> which will prevent the update via nf_ct_offload_timeout.
> We set it at the beginning of flow_offload_del and flow_offload_teardown.
>
> Since flow_offload_teardown is only called on TCP packets
> we also need to set it at flow_offload_del to prevent the race.
>
> This should prevent the race at this point.
OK.
> > > + clear_bit(IPS_OFFLOAD_BIT, &ct->status);
> > > + clear_bit(IPS_OFFLOAD_TEARDOWN_BIT, &ct->status);
> > >
> > > flow_offload_free(flow);
> > > }
> > > @@ -373,8 +362,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > > void flow_offload_teardown(struct flow_offload *flow)
> > > {
> > > set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > >
> > > - flow_offload_fixup_ct_state(flow->ct);
> > > + flow_offload_fixup_ct(flow->ct);
> > > }
> > > EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > >
> > > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > > index 900d48c810a1..9cc3ea08eb3a 100644
> > > --- a/net/netfilter/nft_flow_offload.c
> > > +++ b/net/netfilter/nft_flow_offload.c
> > > @@ -295,6 +295,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> > > sizeof(_tcph), &_tcph);
> > > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > > goto out;
> > > + if (unlikely(!nf_conntrack_tcp_established(ct)))
> > > + goto out;
> >
> > This chunk is not required, from ruleset users can do
> >
> > ... ct status assured ...
> >
> > instead.
>
> Maybe this should be mentioned in the manual or wiki if
> it is not necessary in the flow offload code.
Yes, documentation and wiki can be updated.
Users might want to offload the flow at a later stage in the TCP
connection.
^ permalink raw reply
* Re: [PATCH net-next v3] net: dsa: realtek: rtl8366rb: Serialize indirect PHY register access
From: Vladimir Oltean @ 2022-05-16 11:33 UTC (permalink / raw)
To: Linus Walleij
Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S . Miller,
Jakub Kicinski, netdev, Alvin Šipraga, kernel test robot
In-Reply-To: <20220513213618.2742895-1-linus.walleij@linaro.org>
On Fri, May 13, 2022 at 11:36:18PM +0200, Linus Walleij wrote:
> From: Alvin Šipraga <alsi@bang-olufsen.dk>
>
> Lock the regmap during the whole PHY register access routines in
> rtl8366rb.
>
> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
> Reported-by: kernel test robot <lkp@intel.com>
I don't think I would have added this tag.
> Tested-by: Linus Walleij <linus.walleij@linaro.org>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v2->v3:
> - Explicitly target net-next
> ChangeLog v1->v2:
> - Make sure to always return a properly assigned error
> code on the error path in rtl8366rb_phy_read()
> found by the kernel test robot.
>
> I have tested that this does not create any regressions,
> it makes more sense to have this applied than not. First
> it is related to the same family as the other ASICs, also
> it makes perfect logical sense to enforce serialization
> of these reads/writes.
> ---
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
^ permalink raw reply
* [PATCH net-next] net: PIM register decapsulation and Forwarding.
From: Saranya Panjarathina @ 2022-05-16 11:29 UTC (permalink / raw)
To: netdev
Cc: Saranya_Panjarathina, davem, yoshfuji, dsahern, edumazet, kuba,
pabeni, linux-kernel, g_balaji1, Saranya Panjarathina
In-Reply-To: <20220512070138.19170-1-plsaranya@gmail.com>
PIM register packet is decapsulated but not forwarded in RP
__pim_rcv decapsulates the PIM register packet and reinjects for forwarding
after replacing the skb->dev to reg_dev (vif with VIFF_Register)
Ideally the incoming device should be same as skb->dev where the
original PIM register packet is received. mcache would not have
reg_vif as IIF. Decapsulated packet forwarding is failing
because of IIF mismatch. In RP for this S,G RPF interface would be
skb->dev vif only, so that would be IIF for the cache entry.
Signed-off-by: Saranya Panjarathina <plsaranya@gmail.com>
---
net/ipv4/ipmr.c | 2 +-
net/ipv6/ip6mr.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 13e6329784fb..7b9586335fb7 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -598,7 +598,7 @@ static int __pim_rcv(struct mr_table *mrt, struct sk_buff *skb,
skb->protocol = htons(ETH_P_IP);
skb->ip_summed = CHECKSUM_NONE;
- skb_tunnel_rx(skb, reg_dev, dev_net(reg_dev));
+ skb_tunnel_rx(skb, skb->dev, dev_net(skb->dev));
netif_rx(skb);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 4e74bc61a3db..147e29a818ca 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -566,7 +566,7 @@ static int pim6_rcv(struct sk_buff *skb)
skb->protocol = htons(ETH_P_IPV6);
skb->ip_summed = CHECKSUM_NONE;
- skb_tunnel_rx(skb, reg_dev, dev_net(reg_dev));
+ skb_tunnel_rx(skb, skb->dev, net);
netif_rx(skb);
--
2.20.1
^ permalink raw reply related
* Re: [RFC Patch net-next v2 9/9] net: dsa: microchip: remove unused members in ksz_device
From: Vladimir Oltean @ 2022-05-16 11:30 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-10-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:19PM +0530, Arun Ramadoss wrote:
> The name, regs_size and overrides members in struct ksz_device are
> unused. Hence remove it.
> And host_mask is used in only place of ksz8795.c file, which can be
> replaced by dev->info->cpu_ports
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
^ permalink raw reply
* Re: [RFC Patch net-next v2 8/9] net: dsa: microchip: add the phylink get_caps
From: Vladimir Oltean @ 2022-05-16 11:29 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-9-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:18PM +0530, Arun Ramadoss wrote:
> This patch add the support for phylink_get_caps for ksz8795 and ksz9477
> series switch. It updates the struct ksz_switch_chip with the details of
> the internal phys and xmii interface. Then during the get_caps based on
> the bits set in the structure, corresponding phy mode is set.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Looks good, although I haven't verified the exact compatibility matrix
for all switches. Just one comment below.
> @@ -179,6 +183,10 @@ const struct ksz_chip_data ksz_switch_chips[] = {
> .mib_names = ksz9477_mib_names,
> .mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
> .reg_mib_cnt = MIB_COUNTER_NUM,
> + .supports_mii = {false, false, false, false, true},
> + .supports_rmii = {false, false, false, false, true},
> + .supports_rgmii = {false, false, false, false, true},
> + .internal_phy = {true, true, true, false, false},
> },
>
> [KSZ8765] = {
> @@ -193,6 +201,10 @@ const struct ksz_chip_data ksz_switch_chips[] = {
> .mib_names = ksz9477_mib_names,
> .mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
> .reg_mib_cnt = MIB_COUNTER_NUM,
> + .supports_mii = {false, false, false, false, true},
> + .supports_rmii = {false, false, false, false, true},
> + .supports_rgmii = {false, false, false, false, true},
> + .internal_phy = {true, true, true, true, false},
> },
>
> [KSZ8830] = {
> @@ -206,6 +218,9 @@ const struct ksz_chip_data ksz_switch_chips[] = {
> .mib_names = ksz88xx_mib_names,
> .mib_cnt = ARRAY_SIZE(ksz88xx_mib_names),
> .reg_mib_cnt = MIB_COUNTER_NUM,
> + .supports_mii = {false, false, true},
> + .supports_rmii = {false, false, true},
> + .internal_phy = {true, true, false},
> },
>
> [KSZ9477] = {
> @@ -220,6 +235,14 @@ const struct ksz_chip_data ksz_switch_chips[] = {
> .mib_names = ksz9477_mib_names,
> .mib_cnt = ARRAY_SIZE(ksz9477_mib_names),
> .reg_mib_cnt = MIB_COUNTER_NUM,
> + .supports_mii = {false, false, false, false,
> + false, true, false},
> + .supports_rmii = {false, false, false, false,
> + false, true, false},
> + .supports_rgmii = {false, false, false, false,
> + false, true, false},
Please fix indentation here and for KSZ9897 and KSZ9567.
> + .internal_phy = {true, true, true, true,
> + true, false, false},
> },
^ permalink raw reply
* Re: [PATCH net-next v2] docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
From: Akira Yokosawa @ 2022-05-16 11:24 UTC (permalink / raw)
To: Marc Kleine-Budde, Pavel Pisa, netdev, David S. Miller,
Jakub Kicinski
Cc: Martin Jerabek, Ondrej Ille, linux-doc, linux-kernel
In-Reply-To: <5986752a-1c2a-5d64-f91d-58b1e6decd17@gmail.com>
On Wed, 11 May 2022 08:45:43 +0900, Akira Yokosawa wrote:
> Two issues were observed in the ReST doc added by commit c3a0addefbde
> ("docs: ctucanfd: CTU CAN FD open-source IP core documentation.")
> with Sphinx versions 2.4.4 and 4.5.0.
>
> The plain "figure" directive broke "make pdfdocs" due to a missing
> PDF figure. For conversion of SVG -> PDF to work, the "kernel-figure"
> directive, which is an extension for kernel documentation, should
> be used instead.
>
> The directive of "code:: raw" causes a warning from both
> "make htmldocs" and "make pdfdocs", which reads:
>
> [...]/can/ctu/ctucanfd-driver.rst:75: WARNING: Pygments lexer name
> 'raw' is not known
>
> A plain literal-block marker should suffice where no syntax
> highlighting is intended.
>
> Fix the issues by using suitable directive and marker.
>
> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> Fixes: c3a0addefbde ("docs: ctucanfd: CTU CAN FD open-source IP core documentation.")
> Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
> Cc: Martin Jerabek <martin.jerabek01@gmail.com>
> Cc: Ondrej Ille <ondrej.ille@gmail.com>
> Cc: Marc Kleine-Budde <mkl@pengutronix.de>
> ---
> Changes in v1 -> v2
> - no change in diff
> - added explicit Sphinx versions the issues were observed
> - picked Pavel's Acked-by
>
Gentle ping to netdev maintainers.
I believe this one should go upstream together with the
offending commit.
If there is something I can do better, please let me know.
Thanks, Akira
> --
> .../networking/device_drivers/can/ctu/ctucanfd-driver.rst | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst b/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
> index 2fde5551e756..40c92ea272af 100644
> --- a/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
> +++ b/Documentation/networking/device_drivers/can/ctu/ctucanfd-driver.rst
> @@ -72,7 +72,7 @@ it is reachable (on which bus it resides) and its configuration –
> registers address, interrupts and so on. An example of such a device
> tree is given in .
>
> -.. code:: raw
> +::
>
> / {
> /* ... */
> @@ -451,7 +451,7 @@ the FIFO is maintained, together with priority rotation, is depicted in
>
> |
>
> -.. figure:: fsm_txt_buffer_user.svg
> +.. kernel-figure:: fsm_txt_buffer_user.svg
>
> TX Buffer states with possible transitions
>
^ permalink raw reply
* Re: [RFC Patch net-next v2 7/9] net: dsa: move mib->cnt_ptr reset code to ksz_common.c
From: Vladimir Oltean @ 2022-05-16 11:25 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-8-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:17PM +0530, Arun Ramadoss wrote:
> From: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
>
> mib->cnt_ptr resetting is handled in multiple places as part of
> port_init_cnt(). Hence moved mib->cnt_ptr code to ksz common layer
> and removed from individual product files.
>
> Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
^ permalink raw reply
* Re: [RFC Patch net-next v2 6/9] net: dsa: microchip: move get_strings to ksz_common
From: Vladimir Oltean @ 2022-05-16 11:24 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-7-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:16PM +0530, Arun Ramadoss wrote:
> ksz8795 and ksz9477 uses the same algorithm for copying the ethtool
> strings. Hence moved to ksz_common to remove the redundant code.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
^ permalink raw reply
* Re: [RFC Patch net-next v2 5/9] net: dsa: microchip: move struct mib_names to ksz_chip_data
From: Vladimir Oltean @ 2022-05-16 11:23 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-6-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:15PM +0530, Arun Ramadoss wrote:
> The ksz88xx family has one set of mib_names. The ksz87xx, ksz9477,
> LAN937x based switches has one set of mib_names. In order to remove
> redundant declaration, moved the struct mib_names to ksz_chip_data
> structure. And allocated the mib memory in switch_register instead of
> individual switch_init function.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
> static int ksz9477_switch_init(struct ksz_device *dev)
> {
> - int i;
> -
> dev->ds->ops = &ksz9477_switch_ops;
>
> dev->port_mask = (1 << dev->info->port_cnt) - 1;
>
> - dev->reg_mib_cnt = SWITCH_COUNTER_NUM;
> - dev->mib_cnt = TOTAL_SWITCH_COUNTER_NUM;
> -
> - for (i = 0; i < dev->info->port_cnt; i++) {
> - spin_lock_init(&dev->ports[i].mib.stats64_lock);
> - mutex_init(&dev->ports[i].mib.cnt_mutex);
> - dev->ports[i].mib.counters =
> - devm_kzalloc(dev->dev,
> - sizeof(u64) *
> - (TOTAL_SWITCH_COUNTER_NUM + 1),
> - GFP_KERNEL);
> - if (!dev->ports[i].mib.counters)
> - return -ENOMEM;
> - }
> -
This fixes the NULL pointer dereference on probe that was introduced in
the previous patch, but please make sure that this does not happen in
the first place, for bisectability purposes.
> return 0;
> }
^ permalink raw reply
* Re: [PATCH net v2] netfilter: nf_flow_table: fix teardown flow timeout
From: Sven Auhagen @ 2022-05-16 11:18 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Oz Shlomo, Felix Fietkau, netdev, netfilter-devel,
Florian Westphal, Paul Blakey
In-Reply-To: <YoIt5rHw4Xwl1zgY@salvia>
On Mon, May 16, 2022 at 12:56:38PM +0200, Pablo Neira Ayuso wrote:
> On Thu, May 12, 2022 at 09:28:03PM +0300, Oz Shlomo wrote:
> > Connections leaving the established state (due to RST / FIN TCP packets)
> > set the flow table teardown flag. The packet path continues to set lower
> > timeout value as per the new TCP state but the offload flag remains set.
> >
> > Hence, the conntrack garbage collector may race to undo the timeout
> > adjustment of the packet path, leaving the conntrack entry in place with
> > the internal offload timeout (one day).
> >
> > Avoid ct gc timeout overwrite by flagging teared down flowtable
> > connections.
> >
> > On the nftables side we only need to allow established TCP connections to
> > create a flow offload entry. Since we can not guaruantee that
> > flow_offload_teardown is called by a TCP FIN packet we also need to make
> > sure that flow_offload_fixup_ct is also called in flow_offload_del
> > and only fixes up established TCP connections.
> [...]
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 0164e5f522e8..324fdb62c08b 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1477,7 +1477,8 @@ static void gc_worker(struct work_struct *work)
> > tmp = nf_ct_tuplehash_to_ctrack(h);
> >
> > if (test_bit(IPS_OFFLOAD_BIT, &tmp->status)) {
> > - nf_ct_offload_timeout(tmp);
>
> Hm, it is the trick to avoid checking for IPS_OFFLOAD from the packet
> path that triggers the race, ie. nf_ct_is_expired()
>
> The flowtable ct fixup races with conntrack gc collector.
>
> Clearing IPS_OFFLOAD might result in offloading the entry again for
> the closing packets.
>
> Probably clear IPS_OFFLOAD from teardown, and skip offload if flow is
> in a TCP state that represent closure?
>
> if (unlikely(!tcph || tcph->fin || tcph->rst))
> goto out;
>
> this is already the intention in the existing code.
>
> If this does work, could you keep IPS_OFFLOAD_TEARDOWN_BIT internal,
> ie. no in uapi? Define it at include/net/netfilter/nf_conntrack.h and
> add a comment regarding this to avoid an overlap in the future.
>
> > + if (!test_bit(IPS_OFFLOAD_TEARDOWN_BIT, &tmp->status))
> > + nf_ct_offload_timeout(tmp);
> > continue;
> > }
> >
> > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > index 3db256da919b..aaed1a244013 100644
> > --- a/net/netfilter/nf_flow_table_core.c
> > +++ b/net/netfilter/nf_flow_table_core.c
> > @@ -177,14 +177,8 @@ int flow_offload_route_init(struct flow_offload *flow,
> > }
> > EXPORT_SYMBOL_GPL(flow_offload_route_init);
> >
> > -static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
> > -{
> > - tcp->state = TCP_CONNTRACK_ESTABLISHED;
> > - tcp->seen[0].td_maxwin = 0;
> > - tcp->seen[1].td_maxwin = 0;
> > -}
> >
> > -static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > +static void flow_offload_fixup_ct(struct nf_conn *ct)
> > {
> > struct net *net = nf_ct_net(ct);
> > int l4num = nf_ct_protonum(ct);
> > @@ -192,8 +186,12 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> >
> > if (l4num == IPPROTO_TCP) {
> > struct nf_tcp_net *tn = nf_tcp_pernet(net);
> > + struct ip_ct_tcp *tcp = &ct->proto.tcp;
> > +
> > + tcp->seen[0].td_maxwin = 0;
> > + tcp->seen[1].td_maxwin = 0;
> >
> > - timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> > + timeout = tn->timeouts[ct->proto.tcp.state];
> > timeout -= tn->offload_timeout;
> > } else if (l4num == IPPROTO_UDP) {
> > struct nf_udp_net *tn = nf_udp_pernet(net);
> > @@ -211,18 +209,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
> > WRITE_ONCE(ct->timeout, nfct_time_stamp + timeout);
> > }
> >
> > -static void flow_offload_fixup_ct_state(struct nf_conn *ct)
> > -{
> > - if (nf_ct_protonum(ct) == IPPROTO_TCP)
> > - flow_offload_fixup_tcp(&ct->proto.tcp);
> > -}
> > -
> > -static void flow_offload_fixup_ct(struct nf_conn *ct)
> > -{
> > - flow_offload_fixup_ct_state(ct);
> > - flow_offload_fixup_ct_timeout(ct);
> > -}
> > -
> > static void flow_offload_route_release(struct flow_offload *flow)
> > {
> > nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
> > @@ -353,6 +339,10 @@ static inline bool nf_flow_has_expired(const struct flow_offload *flow)
> > static void flow_offload_del(struct nf_flowtable *flow_table,
> > struct flow_offload *flow)
> > {
> > + struct nf_conn *ct = flow->ct;
> > +
> > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> > +
> > rhashtable_remove_fast(&flow_table->rhashtable,
> > &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
> > nf_flow_offload_rhash_params);
> > @@ -360,12 +350,11 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
> > nf_flow_offload_rhash_params);
> >
> > - clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > -
> > if (nf_flow_has_expired(flow))
> > - flow_offload_fixup_ct(flow->ct);
> > - else
> > - flow_offload_fixup_ct_timeout(flow->ct);
> > + flow_offload_fixup_ct(ct);
>
> Very unlikely, but race might still happen between fixup and
> clear IPS_OFFLOAD_BIT with gc below?
>
> Without checking from the packet path, the conntrack gc might race to
> refresh the timeout, I don't see a 100% race free solution.
>
> Probably update the nf_ct_offload_timeout to a shorter value than a
> day would mitigate this issue too.
This section of the code is now protected by IPS_OFFLOAD_TEARDOWN_BIT
which will prevent the update via nf_ct_offload_timeout.
We set it at the beginning of flow_offload_del and flow_offload_teardown.
Since flow_offload_teardown is only called on TCP packets
we also need to set it at flow_offload_del to prevent the race.
This should prevent the race at this point.
>
> > + clear_bit(IPS_OFFLOAD_BIT, &ct->status);
> > + clear_bit(IPS_OFFLOAD_TEARDOWN_BIT, &ct->status);
> >
> > flow_offload_free(flow);
> > }
> > @@ -373,8 +362,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
> > void flow_offload_teardown(struct flow_offload *flow)
> > {
> > set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > + set_bit(IPS_OFFLOAD_TEARDOWN_BIT, &flow->ct->status);
> >
> > - flow_offload_fixup_ct_state(flow->ct);
> > + flow_offload_fixup_ct(flow->ct);
> > }
> > EXPORT_SYMBOL_GPL(flow_offload_teardown);
> >
> > diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
> > index 900d48c810a1..9cc3ea08eb3a 100644
> > --- a/net/netfilter/nft_flow_offload.c
> > +++ b/net/netfilter/nft_flow_offload.c
> > @@ -295,6 +295,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
> > sizeof(_tcph), &_tcph);
> > if (unlikely(!tcph || tcph->fin || tcph->rst))
> > goto out;
> > + if (unlikely(!nf_conntrack_tcp_established(ct)))
> > + goto out;
>
> This chunk is not required, from ruleset users can do
>
> ... ct status assured ...
>
> instead.
Maybe this should be mentioned in the manual or wiki if
it is not necessary in the flow offload code.
>
> > break;
> > case IPPROTO_UDP:
> > break;
> > --
> > 1.8.3.1
> >
^ permalink raw reply
* Re: [RFC Patch net-next v2 4/9] net: dsa: microchip: move port memory allocation to ksz_common
From: Vladimir Oltean @ 2022-05-16 11:12 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-5-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:14PM +0530, Arun Ramadoss wrote:
> ksz8795 and ksz9477 init function initializes the memory to dev->ports
> and assigns the ds real number of ports. Since both the routines are
> same, moved the allocation of port memory to ksz_switch_register after
> init.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
Does this actually work? ksz8_switch_init() and ksz9477_switch_init()
still dereference dev->ports. They are called from dev->dev_ops->init()
from ksz_switch_register(). You have moved the devm_kzalloc() to _after_
the dev->dev_ops->init() call. So these functions are accessing memory
behind a not-yet-allocated pointer.
> drivers/net/dsa/microchip/ksz8795.c | 8 --------
> drivers/net/dsa/microchip/ksz9477.c | 8 --------
> drivers/net/dsa/microchip/ksz_common.c | 9 +++++++++
> 3 files changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
> index b6032b65afc2..91f29ff7256c 100644
> --- a/drivers/net/dsa/microchip/ksz8795.c
> +++ b/drivers/net/dsa/microchip/ksz8795.c
> @@ -1599,11 +1599,6 @@ static int ksz8_switch_init(struct ksz_device *dev)
>
> dev->reg_mib_cnt = MIB_COUNTER_NUM;
>
> - dev->ports = devm_kzalloc(dev->dev,
> - dev->info->port_cnt * sizeof(struct ksz_port),
> - GFP_KERNEL);
> - if (!dev->ports)
> - return -ENOMEM;
> for (i = 0; i < dev->info->port_cnt; i++) {
> mutex_init(&dev->ports[i].mib.cnt_mutex);
> dev->ports[i].mib.counters =
> @@ -1615,9 +1610,6 @@ static int ksz8_switch_init(struct ksz_device *dev)
> return -ENOMEM;
> }
>
> - /* set the real number of ports */
> - dev->ds->num_ports = dev->info->port_cnt;
> -
> /* We rely on software untagging on the CPU port, so that we
> * can support both tagged and untagged VLANs
> */
> diff --git a/drivers/net/dsa/microchip/ksz9477.c b/drivers/net/dsa/microchip/ksz9477.c
> index c712a0011367..1a0fd36e180e 100644
> --- a/drivers/net/dsa/microchip/ksz9477.c
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -1482,11 +1482,6 @@ static int ksz9477_switch_init(struct ksz_device *dev)
> dev->reg_mib_cnt = SWITCH_COUNTER_NUM;
> dev->mib_cnt = TOTAL_SWITCH_COUNTER_NUM;
>
> - dev->ports = devm_kzalloc(dev->dev,
> - dev->info->port_cnt * sizeof(struct ksz_port),
> - GFP_KERNEL);
> - if (!dev->ports)
> - return -ENOMEM;
> for (i = 0; i < dev->info->port_cnt; i++) {
> spin_lock_init(&dev->ports[i].mib.stats64_lock);
> mutex_init(&dev->ports[i].mib.cnt_mutex);
> @@ -1499,9 +1494,6 @@ static int ksz9477_switch_init(struct ksz_device *dev)
> return -ENOMEM;
> }
>
> - /* set the real number of ports */
> - dev->ds->num_ports = dev->info->port_cnt;
> -
> return 0;
> }
>
> diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
> index fd2f1bd3feb5..717734fe437e 100644
> --- a/drivers/net/dsa/microchip/ksz_common.c
> +++ b/drivers/net/dsa/microchip/ksz_common.c
> @@ -768,6 +768,15 @@ int ksz_switch_register(struct ksz_device *dev,
> if (ret)
> return ret;
>
> + dev->ports = devm_kzalloc(dev->dev,
> + dev->info->port_cnt * sizeof(struct ksz_port),
> + GFP_KERNEL);
> + if (!dev->ports)
> + return -ENOMEM;
> +
> + /* set the real number of ports */
> + dev->ds->num_ports = dev->info->port_cnt;
> +
> /* Host port interface will be self detected, or specifically set in
> * device tree.
> */
> --
> 2.33.0
>
^ permalink raw reply
* Re: [RFC Patch net-next v2 3/9] net: dsa: microchip: perform the compatibility check for dev probed
From: Vladimir Oltean @ 2022-05-16 11:08 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-4-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:13PM +0530, Arun Ramadoss wrote:
> +static int ksz_check_device_id(struct ksz_device *dev)
> +{
> + const struct ksz_chip_data *dt_chip_data;
> +
> + dt_chip_data = of_device_get_match_data(dev->dev);
And one other comment. You haven't converted ksz8863_smi.c to put
anything in struct of_device_id :: data, so that driver will dereference
NULL here.
> +
> + /* Check for Device Tree and Chip ID */
> + if (dt_chip_data->chip_id != dev->chip_id) {
> + dev_err(dev->dev,
> + "Device tree specifies chip %s but found %s, please fix it!\n",
> + dt_chip_data->dev_name, dev->info->dev_name);
> + return -ENODEV;
> + }
> +
> + return 0;
> +}
^ permalink raw reply
* Re: [PATCH nf v2 2/2] netfilter: Use l3mdev flow key when re-routing mangled packets
From: Pablo Neira Ayuso @ 2022-05-16 11:03 UTC (permalink / raw)
To: Martin Willi; +Cc: Florian Westphal, David Ahern, netfilter-devel, netdev
In-Reply-To: <20220419134701.153090-3-martin@strongswan.org>
On Tue, Apr 19, 2022 at 03:47:01PM +0200, Martin Willi wrote:
> Commit 40867d74c374 ("net: Add l3mdev index to flow struct and avoid oif
> reset for port devices") introduces a flow key specific for layer 3
> domains, such as a VRF master device. This allows for explicit VRF domain
> selection instead of abusing the oif flow key.
>
> Update ip[6]_route_me_harder() to make use of that new key when re-routing
> mangled packets within VRFs instead of setting the flow oif, making it
> consistent with other users.
Applied to nf-next
^ permalink raw reply
* Re: [RFC Patch net-next v2 3/9] net: dsa: microchip: perform the compatibility check for dev probed
From: Vladimir Oltean @ 2022-05-16 11:03 UTC (permalink / raw)
To: Arun Ramadoss
Cc: linux-kernel, netdev, Russell King, Woojung Huh, UNGLinuxDriver,
Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller,
Jakub Kicinski, Paolo Abeni, Oleksij Rempel, Marek Vasut,
Michael Grzeschik, Eric Dumazet
In-Reply-To: <20220513102219.30399-4-arun.ramadoss@microchip.com>
On Fri, May 13, 2022 at 03:52:13PM +0530, Arun Ramadoss wrote:
> This patch perform the compatibility check for the device after the chip
> detect is done. It is to prevent the mismatch between the device
> compatible specified in the device tree and actual device found during
> the detect. The ksz9477 device doesn't use any .data in the
> of_device_id. But the ksz8795 uses .data for assigning the regmap
> between 8830 family and 87xx family switch. Changed the regmap
> assignment based on the chip_id from the .data.
>
> Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
> ---
> drivers/net/dsa/microchip/ksz8795_spi.c | 37 ++++++++++++++++++++-----
> drivers/net/dsa/microchip/ksz9477_i2c.c | 30 ++++++++++++++++----
> drivers/net/dsa/microchip/ksz9477_spi.c | 30 ++++++++++++++++----
> drivers/net/dsa/microchip/ksz_common.c | 25 ++++++++++++++++-
> drivers/net/dsa/microchip/ksz_common.h | 1 +
> 5 files changed, 103 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/net/dsa/microchip/ksz8795_spi.c b/drivers/net/dsa/microchip/ksz8795_spi.c
> index 5f8d94aee774..1ae1b1ee9f2a 100644
> --- a/drivers/net/dsa/microchip/ksz8795_spi.c
> +++ b/drivers/net/dsa/microchip/ksz8795_spi.c
> @@ -31,9 +31,12 @@ KSZ_REGMAP_TABLE(ksz8795, 16, KSZ8795_SPI_ADDR_SHIFT,
> KSZ_REGMAP_TABLE(ksz8863, 16, KSZ8863_SPI_ADDR_SHIFT,
> KSZ8863_SPI_TURNAROUND_SHIFT, KSZ8863_SPI_ADDR_ALIGN);
>
> +#define KSZ_88X3_FAMILY 0x8830
> +
Can we have this macro defined in ksz_common.h and used in the chip_id
of the ksz_chip_data structure as well? It makes things easier to follow
by pattern matching. And for symmetry, it would probably be good to have
such a macro for all chip ids.
^ permalink raw reply
* Re: [PATCH v2 nf 1/4] netfilter: flowtable: fix excessive hw offload attempts after failure
From: Pablo Neira Ayuso @ 2022-05-16 10:59 UTC (permalink / raw)
To: Felix Fietkau; +Cc: netdev, netfilter-devel
In-Reply-To: <20220509122616.65449-1-nbd@nbd.name>
Series applied to nf.git
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox