From: Martynas <m@lambda.lt>
To: "Martin KaFai Lau" <martin.lau@linux.dev>
Cc: "Daniel Borkmann" <daniel@iogearbox.net>,
netdev <netdev@vger.kernel.org>,
"Nikolay Aleksandrov" <razor@blackwall.org>,
bpf@vger.kernel.org
Subject: Re: [PATCH bpf v2 1/2] bpf: Derive source IP addr via bpf_*_fib_lookup()
Date: Thu, 05 Oct 2023 22:16:53 +0200 [thread overview]
Message-ID: <e7b992e3-8059-4058-8561-cb017c200c8d@app.fastmail.com> (raw)
In-Reply-To: <5bef21a3-18c0-e335-d64e-bcd6f1e304a4@linux.dev>
Thanks for the review.
On Wed, Oct 4, 2023, at 10:09 PM, Martin KaFai Lau wrote:
> On 10/3/23 12:10 AM, Martynas Pumputis wrote:
>> Extend the bpf_fib_lookup() helper by making it to return the source
>> IPv4/IPv6 address if the BPF_FIB_LOOKUP_SET_SRC flag is set.
>>
>> For example, the following snippet can be used to derive the desired
>> source IP address:
>>
>> struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };
>>
>> ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
>> BPF_FIB_LOOKUP_SET_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
>> if (ret != BPF_FIB_LKUP_RET_SUCCESS)
>> return TC_ACT_SHOT;
>>
>> /* the p.ipv4_src now contains the source address */
>>
>> The inability to derive the proper source address may cause malfunctions
>> in BPF-based dataplanes for hosts containing netdevs with more than one
>> routable IP address or for multi-homed hosts.
>>
>> For example, Cilium implements packet masquerading in BPF. If an
>> egressing netdev to which the Cilium's BPF prog is attached has
>> multiple IP addresses, then only one [hardcoded] IP address can be used for
>> masquerading. This breaks connectivity if any other IP address should have
>> been selected instead, for example, when a public and private addresses
>> are attached to the same egress interface.
>>
>> The change was tested with Cilium [1].
>>
>> Nikolay Aleksandrov helped to figure out the IPv6 addr selection.
>>
>> [1]: https://github.com/cilium/cilium/pull/28283
>>
>> Signed-off-by: Martynas Pumputis <m@lambda.lt>
>> ---
>> include/net/ipv6_stubs.h | 5 +++++
>> include/uapi/linux/bpf.h | 9 +++++++++
>> net/core/filter.c | 19 ++++++++++++++++++-
>> net/ipv6/af_inet6.c | 1 +
>> tools/include/uapi/linux/bpf.h | 10 ++++++++++
>> 5 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h
>> index c48186bf4737..21da31e1dff5 100644
>> --- a/include/net/ipv6_stubs.h
>> +++ b/include/net/ipv6_stubs.h
>> @@ -85,6 +85,11 @@ struct ipv6_bpf_stub {
>> sockptr_t optval, unsigned int optlen);
>> int (*ipv6_getsockopt)(struct sock *sk, int level, int optname,
>> sockptr_t optval, sockptr_t optlen);
>> + int (*ipv6_dev_get_saddr)(struct net *net,
>> + const struct net_device *dst_dev,
>> + const struct in6_addr *daddr,
>> + unsigned int prefs,
>> + struct in6_addr *saddr);
>> };
>> extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 0448700890f7..a6bf686eecbc 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3257,6 +3257,10 @@ union bpf_attr {
>> * and *params*->smac will not be set as output. A common
>> * use case is to call **bpf_redirect_neigh**\ () after
>> * doing **bpf_fib_lookup**\ ().
>> + * **BPF_FIB_LOOKUP_SET_SRC**
>> + * Derive and set source IP addr in *params*->ipv{4,6}_src
>> + * for the nexthop. If the src addr cannot be derived,
>> + * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned.
>> *
>> * *ctx* is either **struct xdp_md** for XDP programs or
>> * **struct sk_buff** tc cls_act programs.
>> @@ -6953,6 +6957,7 @@ enum {
>> BPF_FIB_LOOKUP_OUTPUT = (1U << 1),
>> BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
>> BPF_FIB_LOOKUP_TBID = (1U << 3),
>> + BPF_FIB_LOOKUP_SET_SRC = (1U << 4),
>
> bikeshedding: Shorten it to BPF_FIB_LOOKUP_SRC?
SGTM.
>
>> };
>>
>> enum {
>> @@ -6965,6 +6970,7 @@ enum {
>> BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */
>> BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
>> BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */
>> + BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */
>> };
>>
>> struct bpf_fib_lookup {
>> @@ -6999,6 +7005,9 @@ struct bpf_fib_lookup {
>> __u32 rt_metric;
>> };
>>
>> + /* input: source address to consider for lookup
>> + * output: source address result from lookup
>> + */
>> union {
>> __be32 ipv4_src;
>> __u32 ipv6_src[4]; /* in6_addr; network order */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index a094694899c9..f3777ef1840b 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5850,6 +5850,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> params->rt_metric = res.fi->fib_priority;
>> params->ifindex = dev->ifindex;
>>
>> + if (flags & BPF_FIB_LOOKUP_SET_SRC)
>> + params->ipv4_src = fib_result_prefsrc(net, &res);
>
> Does it need to check 0 and return BPF_FIB_LKUP_RET_NO_SRC_ADDR for v4 also,
> or the fib_result_prefsrc does not fail?
From inspecting the other usages of the function - it does not fail.
>
>> +
>> /* xdp and cls_bpf programs are run in RCU-bh so
>> * rcu_read_lock_bh is not needed here
>> */
>> @@ -5992,6 +5995,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> params->rt_metric = res.f6i->fib6_metric;
>> params->ifindex = dev->ifindex;
>>
>> + if (flags & BPF_FIB_LOOKUP_SET_SRC) {
>> + if (res.f6i->fib6_prefsrc.plen) {
>> + *(struct in6_addr *)params->ipv6_src = res.f6i->fib6_prefsrc.addr;
>> + } else {
>> + err = ipv6_bpf_stub->ipv6_dev_get_saddr(net, dev,
>> + &fl6.daddr, 0,
>> + (struct in6_addr *)
>> + params->ipv6_src);
>> + if (err)
>> + return BPF_FIB_LKUP_RET_NO_SRC_ADDR;
>
> This error also implies BPF_FIB_LKUP_RET_NO_NEIGH. I don't have a clean way of
> improving the API. May be others have some ideas.
>
> Considering dev has no saddr is probably (?) an unlikely case, it should be ok
> to leave it as is but at least a comment in the uapi will be needed. Otherwise,
> the bpf prog may use the 0 dmac as-is.
I expect that a user of the helper checks that err == 0 before using any of the output params.
>
> I feel the current bpf_ipv[46]_fib_lookup helper is doing many things
> in one
> function and then requires different BPF_FIB_LOOKUP_* bits to select
> what/how to
> do. In the future, it may be worth to consider breaking it into smaller
> kfunc(s). e.g. the __ipv[46]_neigh_lookup could be in its own kfunc.
>
Yep, good idea. At least it seems that the neigh lookup could live in its own function.
>> + }
>> + }
>> +
>> if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
>> goto set_fwd_params;
>>
>> @@ -6010,7 +6026,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> #endif
>>
>> #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
>> - BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID)
>> + BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
>> + BPF_FIB_LOOKUP_SET_SRC)
>>
>> BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
>> struct bpf_fib_lookup *, params, int, plen, u32, flags)
next prev parent reply other threads:[~2023-10-05 20:18 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 7:10 [PATCH bpf v2 0/2] bpf: Fix src IP addr related limitation in bpf_*_fib_lookup() Martynas Pumputis
2023-10-03 7:10 ` [PATCH bpf v2 1/2] bpf: Derive source IP addr via bpf_*_fib_lookup() Martynas Pumputis
2023-10-04 20:09 ` Martin KaFai Lau
2023-10-05 20:16 ` Martynas [this message]
2023-10-06 6:29 ` Martin KaFai Lau
2023-10-06 7:03 ` Martynas
2023-10-03 7:10 ` [PATCH bpf v2 2/2] selftests/bpf: Add BPF_FIB_LOOKUP_SET_SRC tests Martynas Pumputis
2023-10-04 22:02 ` Martin KaFai Lau
2023-10-05 20:19 ` Martynas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e7b992e3-8059-4058-8561-cb017c200c8d@app.fastmail.com \
--to=m@lambda.lt \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
--cc=razor@blackwall.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox