* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
From: VishvambarPanth.S @ 2023-11-01 7:20 UTC (permalink / raw)
To: kuba, f.fainelli
Cc: Bryan.Whitehead, andrew, davem, linux-kernel, pabeni, netdev,
UNGLinuxDriver, edumazet
In-Reply-To: <20231004130957.2d633d03@kernel.org>
On Wed, 2023-10-04 at 13:09 -0700, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
>
> On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > > Nobody complained for 5 years, and it's not a regression.
> > > Let's not treat this as a fix, please repost without the Fixes
> > > tag for
> > > net-next.
> >
> > As a driver maintainer, you may want to provide some guarantees to
> > your
> > end users/customers that from stable version X.Y.Z the performance
> > issues have been fixed. Performance improvements are definitively
> > border
> > line in terms of being considered as bug fixes though.
>
> I understand that, but too often people just "feel like a device
> which
> advertises X Mbps / Gbps should reach line rate" while no end user
> cares.
>
> Luckily stable rules are pretty clear about this (search for
> "performance"):
> https://docs.kernel.org/process/stable-kernel-rules.html
>
> As posted it doesn't fulfill the requirements 🤷️
Thanks for your feedback. I apologize for the delayed response.
The data presented in the patch description was aimed to convince a
reviewer with the visible impact of the performance boosts in both x64
and ARM platforms. However, the main motivation behind the patch was
not merely a "good-to-have" improvement but a solution to the
throughput issues reported by multiple customers in several platforms.
We received lots of customer requests through our ticket site system
urging us to address the performance issues on multiple kernel versions
including LTS. While it's acknowledged that stable branch rules
typically do not consider performance fixes that are not documented in
public Bugzilla, this performance enhancement is essential to many of
our customers and their end users and we believe should therefore be
considered for stable branch on the basis of it’s visible user impact.
Few issues reported by our customers are mentioned below, even though
these issues have existed for a long time, the data presented below is
collected from the customer within last 3 months.
Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
improved the performance to ~900Mbps Rx in their platform.
Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
157Mbps in their platform. Including the fix in the patch boosts the
performance to ~600Mbps in Tx UDP.
Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
Tx/Rx to be 126/723 Mbps and the fix improved the performance to
828/956 Mbps.
Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
for their platform from UDP Rx 200Mbps. The fix along with few other
changes helped us to bring Rx perf to 800Mbps in customer’s platform
This is a kind request for considering the acceptance of this patch
into the net branch, as it has a significant positive impact on users
and does not have any adverse effects.
Thanks,
Vishvambar Panth S
^ permalink raw reply
* [PATCH net-next v2 1/1] net: stmmac: check CBS input values before configuration
From: Gan Yi Fang @ 2023-11-01 6:19 UTC (permalink / raw)
To: Alexandre Torgue, Jose Abreu, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Maxime Coquelin, netdev, linux-stm32,
linux-arm-kernel, linux-kernel
Cc: Looi Hong Aun, Voon Weifeng, Song Yoong Siang,
Michael Sit Wei Hong, Gan Yi Fang
From: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Add check for below conditions before proceeding to configuration.
A message will be prompted if the input value is invalid.
Idleslope minus sendslope should equal speed_div.
Idleslope is always a positive value including zero.
Sendslope is always a negative value including zero.
Hicredit is always a positive value including zero.
Locredit is always a negative value including zero.
Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index ac41ef4cbd2f..e8a079946f84 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -381,6 +381,11 @@ static int tc_setup_cbs(struct stmmac_priv *priv,
return -EOPNOTSUPP;
}
+ if ((qopt->idleslope - qopt->sendslope != speed_div) ||
+ qopt->idleslope < 0 || qopt->sendslope > 0 ||
+ qopt->hicredit < 0 || qopt->locredit > 0)
+ return -EINVAL;
+
mode_to_use = priv->plat->tx_queues_cfg[queue].mode_to_use;
if (mode_to_use == MTL_QUEUE_DCB && qopt->enable) {
ret = stmmac_dma_qmode(priv, priv->ioaddr, queue, MTL_QUEUE_AVB);
--
2.34.1
^ permalink raw reply related
* RE: [PATCH net-next 1/1] net: stmmac: add check for advertising linkmode request for set-eee
From: Gan, Yi Fang @ 2023-11-01 5:41 UTC (permalink / raw)
To: Russell King
Cc: Alexandre Torgue, Jose Abreu, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Looi, Hong Aun, Voon, Weifeng,
Song, Yoong Siang, Ahmad Tarmizi, Noor Azura
In-Reply-To: <ZUDEIZwKMl88hGcX@shell.armlinux.org.uk>
> -----Original Message-----
> From: Russell King <linux@armlinux.org.uk>
> Sent: Tuesday, October 31, 2023 5:09 PM
> To: Gan, Yi Fang <yi.fang.gan@intel.com>
> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>; Jose Abreu
> <joabreu@synopsys.com>; David S . Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Maxime Coquelin
> <mcoquelin.stm32@gmail.com>; netdev@vger.kernel.org; linux-stm32@st-md-
> mailman.stormreply.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; Looi, Hong Aun <hong.aun.looi@intel.com>; Voon,
> Weifeng <weifeng.voon@intel.com>; Song, Yoong Siang
> <yoong.siang.song@intel.com>; Ahmad Tarmizi, Noor Azura
> <noor.azura.ahmad.tarmizi@intel.com>
> Subject: Re: [PATCH net-next 1/1] net: stmmac: add check for advertising
> linkmode request for set-eee
>
> On Tue, Oct 31, 2023 at 08:44:23AM +0000, Gan, Yi Fang wrote:
> > Hi Russell King,
> >
> > > Why should this functionality be specific to stmmac?
> > This functionality is not specific to stmmac but other drivers can
> > have their own implementation.
> > (e.g.
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ql
> > ogic/qede/qede_ethtool.c#L1855)
>
> This is probably wrong (see below.)
>
> >
> > > Why do we need this?
> > Current implementation will not take any effect if user enters
> > unsupported value but user might not aware. With this, an error will be
> prompted if unsupported value is given.
>
> Why can't the user read back what settings were actually set like the other
> ethtool APIs? This is how ETHTOOL_GLINKSETTINGS works.
>
For current implementation, the behaviour of "fail to set" and
"set successfully" are the same from user's perspective.
But yes, user have responsibility to read back and check the setting.
> > > What is wrong with the checking and masking that phylib is doing?
> > Nothing wrong with the phylib but there is no error return back to
> > ethtool commands if unsupported value is given.
>
> Maybe because that is the correct implementation?
>
Yes, I agree with this.
> > > Why should we trust the value in edata->supported provided by the user?
> > The edata->supported is getting from the current setting and the value is set
> upon bootup.
> > Users are not allowed to change it.
>
> "not allowed" but there is nothing that prevents it. So an easy way to bypass
> your check is:
>
> struct ethtool_eee eeecmd;
>
> eeecmd.cmd = ETHTOOL_GEEE;
> send_ioctl(..., &eeecmd);
>
> eeecmd.cmd = ETHTOOL_SEEE;
> eeecmd.supported = ~0;
> eeecmd.advertised = ~0;
> error = send_ioctl(..., &eeecmd);
>
> and that won't return any error. So your check is weak at best, and relies upon
> the user doing the right thing.
>
Thank you for the example.
> > > Sorry, but no. I see no reason that this should be done, especially not in the
> stmmac driver.
> > I understand your reasoning. From your point of view, is this kind of
> > error message/ error handling not needed?
>
> It is not - ethtool APIs don't return errors if the advertise mask is larger than the
> supported mask - they merely limit to what is supported and set that. When
> subsequently querying the settings, they return what is actually set (so the
> advertise mask will always be a subset of the supported mask at that point.)
>
> So, if in userspace you really want to know if some modes were dropped, then
> you have to do a set-get-check sequence.
>
Thank you for all the feedbacks and explanations. It was a great discussion.
I understand your concerns and reasonings. Let's close the review for this patch.
Thanks.
Best Regards,
Fang
^ permalink raw reply
* Re: [PATCH net-XXX] vhost-vdpa: fix use after free in vhost_vdpa_probe()
From: Michael S. Tsirkin @ 2023-11-01 5:33 UTC (permalink / raw)
To: Dan Carpenter
Cc: Bo Liu, Jason Wang, kvm, virtualization, netdev, kernel-janitors
In-Reply-To: <cf53cb61-0699-4e36-a980-94fd4268ff00@moroto.mountain>
On Fri, Oct 27, 2023 at 03:12:54PM +0300, Dan Carpenter wrote:
> The put_device() calls vhost_vdpa_release_dev() which calls
> ida_simple_remove() and frees "v". So this call to
> ida_simple_remove() is a use after free and a double free.
>
> Fixes: ebe6a354fa7e ("vhost-vdpa: Call ida_simple_remove() when failed")
> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
queued, thanks!
> ---
> drivers/vhost/vdpa.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 9a2343c45df0..1aa67729e188 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1511,7 +1511,6 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
>
> err:
> put_device(&v->dev);
> - ida_simple_remove(&vhost_vdpa_ida, v->minor);
> return r;
> }
>
> --
> 2.42.0
^ permalink raw reply
* Re: [PATCH net-next v2 5/9] net: ethernet: oa_tc6: implement internal PHY initialization
From: Parthiban.Veerasooran @ 2023-11-01 4:53 UTC (permalink / raw)
To: andrew
Cc: davem, edumazet, kuba, pabeni, robh+dt, krzysztof.kozlowski+dt,
conor+dt, corbet, Steen.Hegelund, rdunlap, horms, casper.casan,
netdev, devicetree, linux-kernel, linux-doc, Horatiu.Vultur,
Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver, Thorsten.Kummermehr
In-Reply-To: <c67b0e57-b87b-45cd-b3fd-be11b0670b0d@lunn.ch>
Hi Andrew,
On 31/10/23 6:18 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
>>>> + tc6->mdiobus = mdiobus_alloc();
>>>> + if (!tc6->mdiobus) {
>>>> + netdev_err(tc6->netdev, "MDIO bus alloc failed\n");
>>>> + return -ENODEV;
>>>> + }
>>>> +
>>>> + tc6->mdiobus->phy_mask = ~(u32)BIT(1);
>>>
>>> Does the standard define this ? BIT(1), not BIT(0)?
>> Ok, I think here is a typo. Will correct it.
>
> There is still the open question, does the standard define this? If
> not, a vendor might decide to use some other address, not 0. So it
> might be better to not set a mask and scan the whole bus.
> phy_find_first() should then work, no matter what address it is using.
The standard doesn't define anything about this. Ok I agree with this,
and remove the phy_mask and leave the phy_find_first() to find the phy.
Best Regards,
Parthiban V
>
> Andrew
^ permalink raw reply
* [PATCH v2 net] net/tcp: fix possible out-of-bounds reads in tcp_hash_fail()
From: Eric Dumazet @ 2023-11-01 4:52 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: netdev, eric.dumazet, Eric Dumazet, syzbot, Dmitry Safonov,
Francesco Ruggeri, David Ahern
syzbot managed to trigger a fault by sending TCP packets
with all flags being set.
v2:
- While fixing this bug, add PSH flag handling and represent
flags the way tcpdump does : [S], [S.], [P.]
- Print 4-tuples more consistently between families.
BUG: KASAN: stack-out-of-bounds in string_nocheck lib/vsprintf.c:645 [inline]
BUG: KASAN: stack-out-of-bounds in string+0x394/0x3d0 lib/vsprintf.c:727
Read of size 1 at addr ffffc9000397f3f5 by task syz-executor299/5039
CPU: 1 PID: 5039 Comm: syz-executor299 Not tainted 6.6.0-rc7-syzkaller-02075-g55c900477f5b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:475
kasan_report+0xda/0x110 mm/kasan/report.c:588
string_nocheck lib/vsprintf.c:645 [inline]
string+0x394/0x3d0 lib/vsprintf.c:727
vsnprintf+0xc5f/0x1870 lib/vsprintf.c:2818
vprintk_store+0x3a0/0xb80 kernel/printk/printk.c:2191
vprintk_emit+0x14c/0x5f0 kernel/printk/printk.c:2288
vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
_printk+0xc8/0x100 kernel/printk/printk.c:2332
tcp_inbound_hash.constprop.0+0xdb2/0x10d0 include/net/tcp.h:2760
tcp_v6_rcv+0x2b31/0x34d0 net/ipv6/tcp_ipv6.c:1882
ip6_protocol_deliver_rcu+0x33b/0x13d0 net/ipv6/ip6_input.c:438
ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ip6_input+0xce/0x440 net/ipv6/ip6_input.c:492
dst_input include/net/dst.h:461 [inline]
ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
NF_HOOK include/linux/netfilter.h:308 [inline]
ipv6_rcv+0x563/0x720 net/ipv6/ip6_input.c:310
__netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527
__netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641
netif_receive_skb_internal net/core/dev.c:5727 [inline]
netif_receive_skb+0x133/0x700 net/core/dev.c:5786
tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
tun_get_user+0x29e7/0x3bc0 drivers/net/tun.c:2002
tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
call_write_iter include/linux/fs.h:1956 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x650/0xe40 fs/read_write.c:584
ksys_write+0x12f/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 2717b5adea9e ("net/tcp: Add tcp_hash_fail() ratelimited logs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Safonov <dima@arista.com>
Cc: Francesco Ruggeri <fruggeri@arista.com>
Cc: David Ahern <dsahern@kernel.org>
---
include/net/tcp_ao.h | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
index a375a171ef3cb37ab1d8246c72c6a3e83f5c9184..b56be10838f09a2cb56ab511242d2b583eb4c33b 100644
--- a/include/net/tcp_ao.h
+++ b/include/net/tcp_ao.h
@@ -124,7 +124,7 @@ struct tcp_ao_info {
#define tcp_hash_fail(msg, family, skb, fmt, ...) \
do { \
const struct tcphdr *th = tcp_hdr(skb); \
- char hdr_flags[5] = {}; \
+ char hdr_flags[6]; \
char *f = hdr_flags; \
\
if (th->fin) \
@@ -133,17 +133,18 @@ do { \
*f++ = 'S'; \
if (th->rst) \
*f++ = 'R'; \
+ if (th->psh) \
+ *f++ = 'P'; \
if (th->ack) \
- *f++ = 'A'; \
- if (f != hdr_flags) \
- *f = ' '; \
+ *f++ = '.'; \
+ *f = 0; \
if ((family) == AF_INET) { \
- net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d) %s" fmt "\n", \
+ net_info_ratelimited("%s for %pI4.%d->%pI4.%d [%s] " fmt "\n", \
msg, &ip_hdr(skb)->saddr, ntohs(th->source), \
&ip_hdr(skb)->daddr, ntohs(th->dest), \
hdr_flags, ##__VA_ARGS__); \
} else { \
- net_info_ratelimited("%s for [%pI6c]:%u->[%pI6c]:%u %s" fmt "\n", \
+ net_info_ratelimited("%s for [%pI6c].%d->[%pI6c].%d [%s]" fmt "\n", \
msg, &ipv6_hdr(skb)->saddr, ntohs(th->source), \
&ipv6_hdr(skb)->daddr, ntohs(th->dest), \
hdr_flags, ##__VA_ARGS__); \
--
2.42.0.820.g83a721a137-goog
^ permalink raw reply related
* Re: [PATCH net-next v2 8/9] microchip: lan865x: add driver support for Microchip's LAN865X MACPHY
From: Parthiban.Veerasooran @ 2023-11-01 4:51 UTC (permalink / raw)
To: andrew
Cc: davem, edumazet, kuba, pabeni, robh+dt, krzysztof.kozlowski+dt,
conor+dt, corbet, Steen.Hegelund, rdunlap, horms, casper.casan,
netdev, devicetree, linux-kernel, linux-doc, Horatiu.Vultur,
Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver, Thorsten.Kummermehr
In-Reply-To: <0ad1c14e-cd7e-4b24-aec9-75cbd19ad8e9@lunn.ch>
Hi Andrew,
On 31/10/23 6:23 pm, Andrew Lunn wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
>> Ah ok, then it is supposed to be like below, isn't it?
>>
>> static int lan865x_set_mac_address(struct net_device *netdev, void *addr)
>> {
>> struct sockaddr *address = addr;
>> int ret;
>>
>> if (netif_running(netdev))
>> return -EBUSY;
>>
>> ret = lan865x_set_hw_macaddr(netdev);
>> if (ret)
>> return ret;
>>
>> eth_hw_addr_set(netdev, address->sa_data);
>>
>> return 0;
>> }
>
> Yes, that is better. In practice, its probably not an issue, setting
> the MAC address will never fail, but it is good to get right, just in
> case.
Ok, thanks.
Best Regards,
Parthiban V
>
> Andrew
^ permalink raw reply
* Re: [PATCH net] net/tcp_sigpool: Fix some off by one bugs
From: Eric Dumazet @ 2023-11-01 4:51 UTC (permalink / raw)
To: Dan Carpenter
Cc: Dmitry Safonov, David S. Miller, David Ahern, Jakub Kicinski,
Paolo Abeni, Steen Hegelund, netdev, kernel-janitors
In-Reply-To: <ce915d61-04bc-44fb-b450-35fcc9fc8831@moroto.mountain>
On Tue, Oct 31, 2023 at 10:51 AM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> The "cpool_populated" variable is the number of elements in the cpool[]
> array that have been populated. It is incremented in
> tcp_sigpool_alloc_ahash() every time we populate a new element.
> Unpopulated elements are NULL but if we have populated every element then
> this code will read one element beyond the end of the array.
>
> Fixes: 8c73b26315aa ("net/tcp: Prepare tcp_md5sig_pool for TCP-AO")
> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
> ---
> From static analysis and review.
>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH net] tcp: Fix -Wc23-extensions in tcp_options_write()
From: Eric Dumazet @ 2023-11-01 4:49 UTC (permalink / raw)
To: Nathan Chancellor
Cc: davem, dsahern, kuba, pabeni, ndesaulniers, trix, 0x7f454c46,
fruggeri, noureddine, netdev, linux-kernel, llvm, patches
In-Reply-To: <20231031-tcp-ao-fix-label-in-compound-statement-warning-v1-1-c9731d115f17@kernel.org>
On Tue, Oct 31, 2023 at 9:23 PM Nathan Chancellor <nathan@kernel.org> wrote:
>
> Clang warns (or errors with CONFIG_WERROR=y) when CONFIG_TCP_AO is set:
>
> net/ipv4/tcp_output.c:663:2: error: label at end of compound statement is a C23 extension [-Werror,-Wc23-extensions]
> 663 | }
> | ^
> 1 error generated.
>
> On earlier releases (such as clang-11, the current minimum supported
> version for building the kernel) that do not support C23, this was a
> hard error unconditionally:
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH net-XXX] vhost-vdpa: fix use after free in vhost_vdpa_probe()
From: Jason Wang @ 2023-11-01 4:36 UTC (permalink / raw)
To: Dan Carpenter
Cc: Bo Liu, Michael S. Tsirkin, kvm, virtualization, netdev,
kernel-janitors
In-Reply-To: <cf53cb61-0699-4e36-a980-94fd4268ff00@moroto.mountain>
On Fri, Oct 27, 2023 at 8:13 PM Dan Carpenter <dan.carpenter@linaro.org> wrote:
>
> The put_device() calls vhost_vdpa_release_dev() which calls
> ida_simple_remove() and frees "v". So this call to
> ida_simple_remove() is a use after free and a double free.
>
> Fixes: ebe6a354fa7e ("vhost-vdpa: Call ida_simple_remove() when failed")
> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Thanks
> ---
> drivers/vhost/vdpa.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 9a2343c45df0..1aa67729e188 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1511,7 +1511,6 @@ static int vhost_vdpa_probe(struct vdpa_device *vdpa)
>
> err:
> put_device(&v->dev);
> - ida_simple_remove(&vhost_vdpa_ida, v->minor);
> return r;
> }
>
> --
> 2.42.0
>
^ permalink raw reply
* Re: [PATCH net] net/tcp: fix possible out-of-bounds reads in tcp_hash_fail()
From: Eric Dumazet @ 2023-11-01 4:23 UTC (permalink / raw)
To: Dmitry Safonov
Cc: David S . Miller, netdev, eric.dumazet, syzbot, Jakub Kicinski,
Paolo Abeni, David Ahern, Francesco Ruggeri, Salam Noureddine
In-Reply-To: <CANn89iLzYCuUvKmw-4yiB-FBOtLnk0MmzM-h3Kvgn7qDpQg9bQ@mail.gmail.com>
On Wed, Nov 1, 2023 at 4:59 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Nov 1, 2023 at 1:02 AM Dmitry Safonov <dima@arista.com> wrote:
> >
> > Hi Eric,
> >
> > On 10/30/23 17:47, Dmitry Safonov wrote:
> > > On 10/30/23 08:45, Eric Dumazet wrote:
> > >> syzbot managed to trigger a fault by sending TCP packets
> > >> with all flags being set.
> > >>
> > >> BUG: KASAN: stack-out-of-bounds in string_nocheck lib/vsprintf.c:645 [inline]
> > >> BUG: KASAN: stack-out-of-bounds in string+0x394/0x3d0 lib/vsprintf.c:727
> > >> Read of size 1 at addr ffffc9000397f3f5 by task syz-executor299/5039
> > >>
> > >> CPU: 1 PID: 5039 Comm: syz-executor299 Not tainted 6.6.0-rc7-syzkaller-02075-g55c900477f5b #0
> > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
> > >> Call Trace:
> > >> <TASK>
> > >> __dump_stack lib/dump_stack.c:88 [inline]
> > >> dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
> > >> print_address_description mm/kasan/report.c:364 [inline]
> > >> print_report+0xc4/0x620 mm/kasan/report.c:475
> > >> kasan_report+0xda/0x110 mm/kasan/report.c:588
> > >> string_nocheck lib/vsprintf.c:645 [inline]
> > >> string+0x394/0x3d0 lib/vsprintf.c:727
> > >> vsnprintf+0xc5f/0x1870 lib/vsprintf.c:2818
> > >> vprintk_store+0x3a0/0xb80 kernel/printk/printk.c:2191
> > >> vprintk_emit+0x14c/0x5f0 kernel/printk/printk.c:2288
> > >> vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
> > >> _printk+0xc8/0x100 kernel/printk/printk.c:2332
> > >> tcp_inbound_hash.constprop.0+0xdb2/0x10d0 include/net/tcp.h:2760
> > >> tcp_v6_rcv+0x2b31/0x34d0 net/ipv6/tcp_ipv6.c:1882
> > >> ip6_protocol_deliver_rcu+0x33b/0x13d0 net/ipv6/ip6_input.c:438
> > >> ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
> > >> NF_HOOK include/linux/netfilter.h:314 [inline]
> > >> NF_HOOK include/linux/netfilter.h:308 [inline]
> > >> ip6_input+0xce/0x440 net/ipv6/ip6_input.c:492
> > >> dst_input include/net/dst.h:461 [inline]
> > >> ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
> > >> NF_HOOK include/linux/netfilter.h:314 [inline]
> > >> NF_HOOK include/linux/netfilter.h:308 [inline]
> > >> ipv6_rcv+0x563/0x720 net/ipv6/ip6_input.c:310
> > >> __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527
> > >> __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641
> > >> netif_receive_skb_internal net/core/dev.c:5727 [inline]
> > >> netif_receive_skb+0x133/0x700 net/core/dev.c:5786
> > >> tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
> > >> tun_get_user+0x29e7/0x3bc0 drivers/net/tun.c:2002
> > >> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
> > >> call_write_iter include/linux/fs.h:1956 [inline]
> > >> new_sync_write fs/read_write.c:491 [inline]
> > >> vfs_write+0x650/0xe40 fs/read_write.c:584
> > >> ksys_write+0x12f/0x250 fs/read_write.c:637
> > >> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
> > >> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>
> > >> Fixes: 2717b5adea9e ("net/tcp: Add tcp_hash_fail() ratelimited logs")
> > >> Reported-by: syzbot <syzkaller@googlegroups.com>
> > >> Signed-off-by: Eric Dumazet <edumazet@google.com>
> > >> Cc: Dmitry Safonov <dima@arista.com>
> > >> Cc: Francesco Ruggeri <fruggeri@arista.com>
> > >> Cc: David Ahern <dsahern@kernel.org>
> > >
> > > Thanks for fixing this,
> > > Reviewed-by: Dmitry Safonov <dima@arista.com>
> > >
> > >> ---
> > >> include/net/tcp_ao.h | 5 ++---
> > >> 1 file changed, 2 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
> > >> index a375a171ef3cb37ab1d8246c72c6a3e83f5c9184..5daf96a3dbee14bd3786e19ea4972e351058e6e7 100644
> > >> --- a/include/net/tcp_ao.h
> > >> +++ b/include/net/tcp_ao.h
> > >> @@ -124,7 +124,7 @@ struct tcp_ao_info {
> > >> #define tcp_hash_fail(msg, family, skb, fmt, ...) \
> > >> do { \
> > >> const struct tcphdr *th = tcp_hdr(skb); \
> > >> - char hdr_flags[5] = {}; \
> > >> + char hdr_flags[5]; \
> > >> char *f = hdr_flags; \
> > >> \
> > >> if (th->fin) \
> > >> @@ -135,8 +135,7 @@ do { \
> > >> *f++ = 'R'; \
> > >> if (th->ack) \
> > >> *f++ = 'A'; \
> > >> - if (f != hdr_flags) \
> > >> - *f = ' '; \
> > > Ah, that was clearly typo: I meant "*f = 0;"
> >
> > Actually, after testing, I can see that the space was intended as well,
> > otherwise it "sticks" to L3index:
> > [ 130.965652] TCP: AO key not found for (10.0.1.1, 56920)->(10.0.254.1,
> > 7010) Skeyid: 100 L3index: 0
> > [ 131.975116] TCP: AO hash is required, but not found for (10.0.1.1,
> > 52686)->(10.0.254.1, 7011) SL3 index 0
> > [ 132.984024] TCP: AO hash mismatch for (10.0.1.1, 51382)->(10.0.254.1,
> > 7012) SL3index: 0
> > [ 133.992221] TCP: Requested by the peer AO key id not found for
> > (10.0.1.1, 36548)->(10.0.254.1, 7013) SL3index: 0
> >
> >
> > If you don't mind, I'll send an updated version of your patch together
> > with some other small post-merge fixes this week.
>
> I will send a v2, adding the space in the format like this :
>
> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
> index 5daf96a3dbee14bd3786e19ea4972e351058e6e7..b90b4f8fb10fcbb31ddf65b825c098b215a91e67
> 100644
> --- a/include/net/tcp_ao.h
> +++ b/include/net/tcp_ao.h
> @@ -137,7 +137,7 @@ do {
> \
> *f++ = 'A'; \
> *f = 0; \
> if ((family) == AF_INET) { \
> - net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d)
> %s" fmt "\n", \
> + net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d) %s
> " fmt "\n", \
> msg, &ip_hdr(skb)->saddr, ntohs(th->source), \
> &ip_hdr(skb)->daddr, ntohs(th->dest), \
> hdr_flags, ##__VA_ARGS__); \
I will also change the format to use tcpdump one:
ACK is represented with a dot.
diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
index b90b4f8fb10fcbb31ddf65b825c098b215a91e67..5ce1f8de0d77a1208bdbe4266e4bf2f76089f56c
100644
--- a/include/net/tcp_ao.h
+++ b/include/net/tcp_ao.h
@@ -134,10 +134,10 @@ do {
\
if (th->rst) \
*f++ = 'R'; \
if (th->ack) \
- *f++ = 'A'; \
+ *f++ = '.'; \
*f = 0; \
if ((family) == AF_INET) { \
- net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d) %s
" fmt "\n", \
+ net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d)
[%s] " fmt "\n", \
msg, &ip_hdr(skb)->saddr, ntohs(th->source), \
&ip_hdr(skb)->daddr, ntohs(th->dest), \
hdr_flags, ##__VA_ARGS__); \
^ permalink raw reply
* Re: [PATCH net] net/tcp: fix possible out-of-bounds reads in tcp_hash_fail()
From: Eric Dumazet @ 2023-11-01 3:59 UTC (permalink / raw)
To: Dmitry Safonov
Cc: David S . Miller, netdev, eric.dumazet, syzbot, Jakub Kicinski,
Paolo Abeni, David Ahern, Francesco Ruggeri, Salam Noureddine
In-Reply-To: <b3f9ae14-0cf6-4587-8d2a-9db1661746cc@arista.com>
On Wed, Nov 1, 2023 at 1:02 AM Dmitry Safonov <dima@arista.com> wrote:
>
> Hi Eric,
>
> On 10/30/23 17:47, Dmitry Safonov wrote:
> > On 10/30/23 08:45, Eric Dumazet wrote:
> >> syzbot managed to trigger a fault by sending TCP packets
> >> with all flags being set.
> >>
> >> BUG: KASAN: stack-out-of-bounds in string_nocheck lib/vsprintf.c:645 [inline]
> >> BUG: KASAN: stack-out-of-bounds in string+0x394/0x3d0 lib/vsprintf.c:727
> >> Read of size 1 at addr ffffc9000397f3f5 by task syz-executor299/5039
> >>
> >> CPU: 1 PID: 5039 Comm: syz-executor299 Not tainted 6.6.0-rc7-syzkaller-02075-g55c900477f5b #0
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
> >> Call Trace:
> >> <TASK>
> >> __dump_stack lib/dump_stack.c:88 [inline]
> >> dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
> >> print_address_description mm/kasan/report.c:364 [inline]
> >> print_report+0xc4/0x620 mm/kasan/report.c:475
> >> kasan_report+0xda/0x110 mm/kasan/report.c:588
> >> string_nocheck lib/vsprintf.c:645 [inline]
> >> string+0x394/0x3d0 lib/vsprintf.c:727
> >> vsnprintf+0xc5f/0x1870 lib/vsprintf.c:2818
> >> vprintk_store+0x3a0/0xb80 kernel/printk/printk.c:2191
> >> vprintk_emit+0x14c/0x5f0 kernel/printk/printk.c:2288
> >> vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
> >> _printk+0xc8/0x100 kernel/printk/printk.c:2332
> >> tcp_inbound_hash.constprop.0+0xdb2/0x10d0 include/net/tcp.h:2760
> >> tcp_v6_rcv+0x2b31/0x34d0 net/ipv6/tcp_ipv6.c:1882
> >> ip6_protocol_deliver_rcu+0x33b/0x13d0 net/ipv6/ip6_input.c:438
> >> ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
> >> NF_HOOK include/linux/netfilter.h:314 [inline]
> >> NF_HOOK include/linux/netfilter.h:308 [inline]
> >> ip6_input+0xce/0x440 net/ipv6/ip6_input.c:492
> >> dst_input include/net/dst.h:461 [inline]
> >> ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
> >> NF_HOOK include/linux/netfilter.h:314 [inline]
> >> NF_HOOK include/linux/netfilter.h:308 [inline]
> >> ipv6_rcv+0x563/0x720 net/ipv6/ip6_input.c:310
> >> __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527
> >> __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641
> >> netif_receive_skb_internal net/core/dev.c:5727 [inline]
> >> netif_receive_skb+0x133/0x700 net/core/dev.c:5786
> >> tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
> >> tun_get_user+0x29e7/0x3bc0 drivers/net/tun.c:2002
> >> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
> >> call_write_iter include/linux/fs.h:1956 [inline]
> >> new_sync_write fs/read_write.c:491 [inline]
> >> vfs_write+0x650/0xe40 fs/read_write.c:584
> >> ksys_write+0x12f/0x250 fs/read_write.c:637
> >> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
> >> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>
> >> Fixes: 2717b5adea9e ("net/tcp: Add tcp_hash_fail() ratelimited logs")
> >> Reported-by: syzbot <syzkaller@googlegroups.com>
> >> Signed-off-by: Eric Dumazet <edumazet@google.com>
> >> Cc: Dmitry Safonov <dima@arista.com>
> >> Cc: Francesco Ruggeri <fruggeri@arista.com>
> >> Cc: David Ahern <dsahern@kernel.org>
> >
> > Thanks for fixing this,
> > Reviewed-by: Dmitry Safonov <dima@arista.com>
> >
> >> ---
> >> include/net/tcp_ao.h | 5 ++---
> >> 1 file changed, 2 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
> >> index a375a171ef3cb37ab1d8246c72c6a3e83f5c9184..5daf96a3dbee14bd3786e19ea4972e351058e6e7 100644
> >> --- a/include/net/tcp_ao.h
> >> +++ b/include/net/tcp_ao.h
> >> @@ -124,7 +124,7 @@ struct tcp_ao_info {
> >> #define tcp_hash_fail(msg, family, skb, fmt, ...) \
> >> do { \
> >> const struct tcphdr *th = tcp_hdr(skb); \
> >> - char hdr_flags[5] = {}; \
> >> + char hdr_flags[5]; \
> >> char *f = hdr_flags; \
> >> \
> >> if (th->fin) \
> >> @@ -135,8 +135,7 @@ do { \
> >> *f++ = 'R'; \
> >> if (th->ack) \
> >> *f++ = 'A'; \
> >> - if (f != hdr_flags) \
> >> - *f = ' '; \
> > Ah, that was clearly typo: I meant "*f = 0;"
>
> Actually, after testing, I can see that the space was intended as well,
> otherwise it "sticks" to L3index:
> [ 130.965652] TCP: AO key not found for (10.0.1.1, 56920)->(10.0.254.1,
> 7010) Skeyid: 100 L3index: 0
> [ 131.975116] TCP: AO hash is required, but not found for (10.0.1.1,
> 52686)->(10.0.254.1, 7011) SL3 index 0
> [ 132.984024] TCP: AO hash mismatch for (10.0.1.1, 51382)->(10.0.254.1,
> 7012) SL3index: 0
> [ 133.992221] TCP: Requested by the peer AO key id not found for
> (10.0.1.1, 36548)->(10.0.254.1, 7013) SL3index: 0
>
>
> If you don't mind, I'll send an updated version of your patch together
> with some other small post-merge fixes this week.
I will send a v2, adding the space in the format like this :
diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
index 5daf96a3dbee14bd3786e19ea4972e351058e6e7..b90b4f8fb10fcbb31ddf65b825c098b215a91e67
100644
--- a/include/net/tcp_ao.h
+++ b/include/net/tcp_ao.h
@@ -137,7 +137,7 @@ do {
\
*f++ = 'A'; \
*f = 0; \
if ((family) == AF_INET) { \
- net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d)
%s" fmt "\n", \
+ net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d) %s
" fmt "\n", \
msg, &ip_hdr(skb)->saddr, ntohs(th->source), \
&ip_hdr(skb)->daddr, ntohs(th->dest), \
hdr_flags, ##__VA_ARGS__); \
^ permalink raw reply
* [PATCH net 2/3] net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
From: D. Wythe @ 2023-11-01 3:42 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
In-Reply-To: <1698810177-69740-1-git-send-email-alibuda@linux.alibaba.com>
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patch re-fix the issues memtianed by commit 22a825c541d7
("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()").
Blocking sending message do solve the issues though, but it also
prevents the peer to receive the final message. Besides, in logic,
whether the sndbuf_desc is NULL or not have no impact on the processing
of cdc message sending.
Hence that, this patch allow the cdc message sending but to check the
sndbuf_desc with care in smc_cdc_tx_handler().
Fixes: 22a825c541d7 ("net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
---
net/smc/smc_cdc.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 01bdb79..3c06625 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -28,13 +28,15 @@ static void smc_cdc_tx_handler(struct smc_wr_tx_pend_priv *pnd_snd,
{
struct smc_cdc_tx_pend *cdcpend = (struct smc_cdc_tx_pend *)pnd_snd;
struct smc_connection *conn = cdcpend->conn;
+ struct smc_buf_desc *sndbuf_desc;
struct smc_sock *smc;
int diff;
+ sndbuf_desc = conn->sndbuf_desc;
smc = container_of(conn, struct smc_sock, conn);
bh_lock_sock(&smc->sk);
- if (!wc_status) {
- diff = smc_curs_diff(cdcpend->conn->sndbuf_desc->len,
+ if (!wc_status && sndbuf_desc) {
+ diff = smc_curs_diff(sndbuf_desc->len,
&cdcpend->conn->tx_curs_fin,
&cdcpend->cursor);
/* sndbuf_space is decreased in smc_sendmsg */
@@ -114,9 +116,6 @@ int smc_cdc_msg_send(struct smc_connection *conn,
union smc_host_cursor cfed;
int rc;
- if (unlikely(!READ_ONCE(conn->sndbuf_desc)))
- return -ENOBUFS;
-
smc_cdc_add_pending_send(conn, pend);
conn->tx_cdc_seq++;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net 3/3] net/smc: put sk reference if close work was canceled
From: D. Wythe @ 2023-11-01 3:42 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
In-Reply-To: <1698810177-69740-1-git-send-email-alibuda@linux.alibaba.com>
From: "D. Wythe" <alibuda@linux.alibaba.com>
Note that we always hold a reference to sock when attempting
to submit close_work. Therefore, if we have successfully
canceled close_work from pending, we MUST release that reference
to avoid potential leaks.
Fixes: 42bfba9eaa33 ("net/smc: immediate termination for SMCD link groups")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
---
net/smc/smc_close.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index 449ef45..10219f5 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -116,7 +116,8 @@ static void smc_close_cancel_work(struct smc_sock *smc)
struct sock *sk = &smc->sk;
release_sock(sk);
- cancel_work_sync(&smc->conn.close_work);
+ if (cancel_work_sync(&smc->conn.close_work))
+ sock_put(sk);
cancel_delayed_work_sync(&smc->conn.tx_work);
lock_sock(sk);
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net 1/3] net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
From: D. Wythe @ 2023-11-01 3:42 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
In-Reply-To: <1698810177-69740-1-git-send-email-alibuda@linux.alibaba.com>
From: "D. Wythe" <alibuda@linux.alibaba.com>
Considering scenario:
smc_cdc_rx_handler_rwwi
__smc_release
sock_set_flag
smc_close_active()
sock_set_flag
__set_bit(DEAD) __set_bit(DONE)
Dues to __set_bit is not atomic, the DEAD or DONE might be lost.
if the DEAD flag lost, the state SMC_CLOSED will be never be reached
in smc_close_passive_work:
if (sock_flag(sk, SOCK_DEAD) &&
smc_close_sent_any_close(conn)) {
sk->sk_state = SMC_CLOSED;
} else {
/* just shutdown, but not yet closed locally */
sk->sk_state = SMC_APPFINCLOSEWAIT;
}
Replace sock_set_flags or __set_bit to set_bit will fix this problem.
Since set_bit is atomic.
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
---
net/smc/af_smc.c | 4 ++--
net/smc/smc.h | 5 +++++
net/smc/smc_cdc.c | 2 +-
net/smc/smc_close.c | 2 +-
4 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index abd2667..da97f94 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -275,7 +275,7 @@ static int __smc_release(struct smc_sock *smc)
if (!smc->use_fallback) {
rc = smc_close_active(smc);
- sock_set_flag(sk, SOCK_DEAD);
+ smc_sock_set_flag(sk, SOCK_DEAD);
sk->sk_shutdown |= SHUTDOWN_MASK;
} else {
if (sk->sk_state != SMC_CLOSED) {
@@ -1743,7 +1743,7 @@ static int smc_clcsock_accept(struct smc_sock *lsmc, struct smc_sock **new_smc)
if (new_clcsock)
sock_release(new_clcsock);
new_sk->sk_state = SMC_CLOSED;
- sock_set_flag(new_sk, SOCK_DEAD);
+ smc_sock_set_flag(new_sk, SOCK_DEAD);
sock_put(new_sk); /* final */
*new_smc = NULL;
goto out;
diff --git a/net/smc/smc.h b/net/smc/smc.h
index 24745fd..e377980 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -377,4 +377,9 @@ void smc_fill_gid_list(struct smc_link_group *lgr,
int smc_nl_enable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
int smc_nl_disable_hs_limitation(struct sk_buff *skb, struct genl_info *info);
+static inline void smc_sock_set_flag(struct sock *sk, enum sock_flags flag)
+{
+ set_bit(flag, &sk->sk_flags);
+}
+
#endif /* __SMC_H */
diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 89105e9..01bdb79 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -385,7 +385,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
smc->sk.sk_shutdown |= RCV_SHUTDOWN;
if (smc->clcsock && smc->clcsock->sk)
smc->clcsock->sk->sk_shutdown |= RCV_SHUTDOWN;
- sock_set_flag(&smc->sk, SOCK_DONE);
+ smc_sock_set_flag(&smc->sk, SOCK_DONE);
sock_hold(&smc->sk); /* sock_put in close_work */
if (!queue_work(smc_close_wq, &conn->close_work))
sock_put(&smc->sk);
diff --git a/net/smc/smc_close.c b/net/smc/smc_close.c
index dbdf03e..449ef45 100644
--- a/net/smc/smc_close.c
+++ b/net/smc/smc_close.c
@@ -173,7 +173,7 @@ void smc_close_active_abort(struct smc_sock *smc)
break;
}
- sock_set_flag(sk, SOCK_DEAD);
+ smc_sock_set_flag(sk, SOCK_DEAD);
sk->sk_state_change(sk);
if (release_clcsock) {
--
1.8.3.1
^ permalink raw reply related
* [PATCH net 0/3] bugfixs for smc
From: D. Wythe @ 2023-11-01 3:42 UTC (permalink / raw)
To: kgraul, wenjia, jaka, wintera; +Cc: kuba, davem, netdev, linux-s390, linux-rdma
From: "D. Wythe" <alibuda@linux.alibaba.com>
This patches includes bugfix following:
1. hung state
2. sock leak
3. potential panic
We have been testing these patches for some time, but
if you have any questions, please let us know.
Thanks,
D. Wythe
D. Wythe (3):
net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
net/smc: put sk reference if close work was canceled
net/smc/af_smc.c | 4 ++--
net/smc/smc.h | 5 +++++
net/smc/smc_cdc.c | 11 +++++------
net/smc/smc_close.c | 5 +++--
4 files changed, 15 insertions(+), 10 deletions(-)
--
1.8.3.1
^ permalink raw reply
* Re: [syzbot] [net?] KASAN: slab-use-after-free Read in ptp_release
From: syzbot @ 2023-11-01 2:51 UTC (permalink / raw)
To: linux-kernel, netdev, richardcochran, syzkaller-bugs
In-Reply-To: <000000000000ffc87a06086172a0@google.com>
syzbot has found a reproducer for the following issue on:
HEAD commit: 89ed67ef126c Merge tag 'net-next-6.7' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1142a1a5680000
kernel config: https://syzkaller.appspot.com/x/.config?x=6e3b1d98cf5a2cca
dashboard link: https://syzkaller.appspot.com/bug?extid=8a676a50d4eee2f21539
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1751c63d680000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b69c238dd56a/disk-89ed67ef.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/f555d654a8ba/vmlinux-89ed67ef.xz
kernel image: https://storage.googleapis.com/syzbot-assets/335bbfb6c442/bzImage-89ed67ef.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8a676a50d4eee2f21539@syzkaller.appspotmail.com
list_del corruption. next->prev should be ffff888020fe5048, but was ffff88807a0f9048. (next=ffff88802533e5e8)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:67!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5827 Comm: syz-executor.5 Not tainted 6.6.0-syzkaller-05843-g89ed67ef126c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
RIP: 0010:__list_del_entry_valid_or_report+0x122/0x130 lib/list_debug.c:65
Code: 85 06 0f 0b 48 c7 c7 20 5f 9d 8b 4c 89 fe 48 89 d9 e8 52 db 85 06 0f 0b 48 c7 c7 a0 5f 9d 8b 4c 89 fe 4c 89 f1 e8 3e db 85 06 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 80 3d 1d 6e
RSP: 0018:ffffc9000aaa7db0 EFLAGS: 00010046
RAX: 000000000000006d RBX: ffff88802533e5f0 RCX: 88e517f49d581b00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: ffff888020fe5008 R08: ffffffff81717aac R09: 1ffff92001554f54
R10: dffffc0000000000 R11: fffff52001554f55 R12: dffffc0000000000
R13: ffff888020fe4000 R14: ffff88802533e5e8 R15: ffff888020fe5048
FS: 0000555557106480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1cc2398000 CR3: 000000001cad1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__list_del_entry_valid include/linux/list.h:124 [inline]
__list_del_entry include/linux/list.h:215 [inline]
list_del include/linux/list.h:229 [inline]
ptp_release+0xa8/0x1e0 drivers/ptp/ptp_chardev.c:147
posix_clock_release+0x8c/0x100 kernel/time/posix-clock.c:157
__fput+0x3cc/0xa10 fs/file_table.c:394
__do_sys_close fs/open.c:1590 [inline]
__se_sys_close+0x15f/0x220 fs/open.c:1575
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f1cc227b9da
Code: 48 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c 24 0c e8 03 7f 02 00 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 36 89 d7 89 44 24 0c e8 63 7f 02 00 8b 44 24
RSP: 002b:00007ffe5d85d9f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f1cc227b9da
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000032 R08: 0000001b30160000 R09: 00007f1cc239bf8c
R10: 00007ffe5d85db40 R11: 0000000000000293 R12: 00007f1cc1e001d8
R13: ffffffffffffffff R14: 00007f1cc1e00000 R15: 0000000000015db0
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__list_del_entry_valid_or_report+0x122/0x130 lib/list_debug.c:65
Code: 85 06 0f 0b 48 c7 c7 20 5f 9d 8b 4c 89 fe 48 89 d9 e8 52 db 85 06 0f 0b 48 c7 c7 a0 5f 9d 8b 4c 89 fe 4c 89 f1 e8 3e db 85 06 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 80 3d 1d 6e
RSP: 0018:ffffc9000aaa7db0 EFLAGS: 00010046
RAX: 000000000000006d RBX: ffff88802533e5f0 RCX: 88e517f49d581b00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: ffff888020fe5008 R08: ffffffff81717aac R09: 1ffff92001554f54
R10: dffffc0000000000 R11: fffff52001554f55 R12: dffffc0000000000
R13: ffff888020fe4000 R14: ffff88802533e5e8 R15: ffff888020fe5048
FS: 0000555557106480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1cc2398000 CR3: 000000001cad1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
^ permalink raw reply
* [PATCH net-next] netfilter: nf_tables: Remove unused variable nft_net
From: Yang Li @ 2023-11-01 1:33 UTC (permalink / raw)
To: kuba, edumazet, davem, pabeni, fw, kadlec, pablo
Cc: netfilter-devel, coreteam, netdev, linux-kernel, Yang Li,
Abaci Robot
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=n, Size: 1241 bytes --]
The code that uses nft_net has been removed, and the nft_pernet function
is merely obtaining a reference to shared data through the net pointer.
The content of the net pointer is not modified or changed, so both of
them should be removed.
silence the warning:
net/netfilter/nft_set_rbtree.c:627:26: warning: variable ‘nft_net’ set but not used
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7103
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
---
net/netfilter/nft_set_rbtree.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 6f1186abd47b..baa3fea4fe65 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -624,14 +624,12 @@ static void nft_rbtree_gc(struct nft_set *set)
{
struct nft_rbtree *priv = nft_set_priv(set);
struct nft_rbtree_elem *rbe, *rbe_end = NULL;
- struct nftables_pernet *nft_net;
struct rb_node *node, *next;
struct nft_trans_gc *gc;
struct net *net;
set = nft_set_container_of(priv);
net = read_pnet(&set->net);
- nft_net = nft_pernet(net);
gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
if (!gc)
--
2.20.1.7.g153144c
^ permalink raw reply related
* [GIT PULL] Networking follow up for 6.7
From: Jakub Kicinski @ 2023-11-01 1:18 UTC (permalink / raw)
To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni
In-Reply-To: <CAHk-=wjTWHVjEA2wfU+eHMXygyuh4Jf_tqXRxv8VnzqAPB4htg@mail.gmail.com>
> Well, I had actually already merged the original pull request, and
> then started a full allmodconfig build.
>
> And because I'm off gallivanting and traveling, that takes 2h on this
> poor little laptop, so I had left to do more fun things than watch
> paint dry.
Eish. Sorry.
> I pushed out my original merge. I'll pull the updates later.
In case it's helpful here's a tag with just the delta described.
Running a risk of another race condition :)
The following changes since commit f1c73396133cb3d913e2075298005644ee8dfade:
net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers (2023-10-27 15:59:44 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git tags/net-next-6.7-followup
for you to fetch changes up to f2fbb908112311423b09cd0d2b4978f174b99585:
net: tcp: remove call to obsolete crypto_ahash_alignmask() (2023-10-31 13:11:51 -0700)
----------------------------------------------------------------
Follow up to networking PR for 6.7
- Support GRO decapsulation for IPsec ESP in UDP.
- Add a handful of MODULE_DESCRIPTION()s.
- Drop questionable alignment check in TCP AO to avoid
build issue after changes in the crypto tree.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
----------------------------------------------------------------
Florian Westphal (4):
xfrm: pass struct net to xfrm_decode_session wrappers
xfrm: move mark and oif flowi decode into common code
xfrm: policy: replace session decode with flow dissector
xfrm: policy: fix layer 4 flowi decoding
Jakub Kicinski (5):
net: fill in MODULE_DESCRIPTION()s in kuba@'s modules
net: fill in MODULE_DESCRIPTION()s under net/core
net: fill in MODULE_DESCRIPTION()s under net/802*
net: fill in MODULE_DESCRIPTION()s under drivers/net/
Merge tag 'ipsec-next-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Kees Cook (1):
xfrm: Annotate struct xfrm_sec_ctx with __counted_by
Steffen Klassert (6):
xfrm: Use the XFRM_GRO to indicate a GRO call on input
xfrm: Support GRO for IPv4 ESP in UDP encapsulation
xfrm: Support GRO for IPv6 ESP in UDP encapsulation
Merge branch 'xfrm: Support GRO decapsulation for ESP in UDP encapsulation'
Merge branch 'xfrm: policy: replace session decode with flow dissector'
xfrm Fix use after free in __xfrm6_udp_encap_rcv.
Stephen Rothwell (1):
net: tcp: remove call to obsolete crypto_ahash_alignmask()
Yue Haibing (1):
xfrm: Remove unused function declarations
drivers/net/amt.c | 1 +
drivers/net/dummy.c | 1 +
drivers/net/eql.c | 1 +
drivers/net/ifb.c | 1 +
drivers/net/macvtap.c | 1 +
drivers/net/netdevsim/netdev.c | 1 +
drivers/net/sungem_phy.c | 1 +
drivers/net/tap.c | 1 +
drivers/net/wireless/mediatek/mt7601u/usb.c | 1 +
include/net/gro.h | 2 +-
include/net/ipv6_stubs.h | 3 +
include/net/xfrm.h | 18 +-
include/uapi/linux/xfrm.h | 3 +-
net/802/fddi.c | 1 +
net/802/garp.c | 1 +
net/802/mrp.c | 1 +
net/802/p8022.c | 1 +
net/802/psnap.c | 1 +
net/802/stp.c | 1 +
net/8021q/vlan.c | 1 +
net/core/dev_addr_lists_test.c | 1 +
net/core/selftests.c | 1 +
net/ipv4/esp4_offload.c | 6 +-
net/ipv4/icmp.c | 2 +-
net/ipv4/ip_vti.c | 4 +-
net/ipv4/netfilter.c | 2 +-
net/ipv4/tcp_ao.c | 6 -
net/ipv4/udp.c | 16 ++
net/ipv4/xfrm4_input.c | 95 +++++++--
net/ipv6/af_inet6.c | 1 +
net/ipv6/esp6_offload.c | 10 +-
net/ipv6/icmp.c | 2 +-
net/ipv6/ip6_vti.c | 4 +-
net/ipv6/netfilter.c | 2 +-
net/ipv6/xfrm6_input.c | 103 ++++++++--
net/netfilter/nf_nat_proto.c | 2 +-
net/xfrm/xfrm_input.c | 6 +-
net/xfrm/xfrm_interface_core.c | 4 +-
net/xfrm/xfrm_policy.c | 301 ++++++++++++----------------
39 files changed, 362 insertions(+), 248 deletions(-)
^ permalink raw reply
* Re: [PATCH net] net/smc: fix documentation of buffer sizes
From: Dust Li @ 2023-11-01 1:07 UTC (permalink / raw)
To: Gerd Bayer, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Jonathan Corbet
Cc: Jan Karcher, Wenjia Zhang, Tony Lu, netdev, linux-doc,
linux-kernel
In-Reply-To: <20231030170343.748097-1-gbayer@linux.ibm.com>
On Mon, Oct 30, 2023 at 06:03:43PM +0100, Gerd Bayer wrote:
>Since commit 833bac7ec392 ("net/smc: Fix setsockopt and sysctl to
>specify same buffer size again") the SMC protocol uses its own
>default values for the smc.rmem and smc.wmem sysctl variables
>which are no longer derived from the TCP IPv4 buffer sizes.
>
>Fixup the kernel documentation to reflect this change, too.
>
>Fixes: 833bac7ec392 ("net/smc: Fix setsockopt and sysctl to specify same buffer size again")
>Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
>Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
>---
> Documentation/networking/smc-sysctl.rst | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
>diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
>index 6d8acdbe9be1..769149d98773 100644
>--- a/Documentation/networking/smc-sysctl.rst
>+++ b/Documentation/networking/smc-sysctl.rst
>@@ -44,18 +44,16 @@ smcr_testlink_time - INTEGER
>
> wmem - INTEGER
> Initial size of send buffer used by SMC sockets.
>- The default value inherits from net.ipv4.tcp_wmem[1].
>
> The minimum value is 16KiB and there is no hard limit for max value, but
> only allowed 512KiB for SMC-R and 1MiB for SMC-D.
>
>- Default: 16K
>+ Default: 64KiB
>
> rmem - INTEGER
> Initial size of receive buffer (RMB) used by SMC sockets.
>- The default value inherits from net.ipv4.tcp_rmem[1].
>
> The minimum value is 16KiB and there is no hard limit for max value, but
> only allowed 512KiB for SMC-R and 1MiB for SMC-D.
>
>- Default: 128K
>+ Default: 64KiB
>--
>2.41.0
^ permalink raw reply
* Re: [PATCH v4 1/5] r8169: Coalesce r8169_mac_ocp_write/modify calls to reduce spinlock stalls
From: Mirsad Todorovac @ 2023-11-01 0:35 UTC (permalink / raw)
To: Jacob Keller, Heiner Kallweit, Jason Gunthorpe, Joerg Roedel,
Lu Baolu, iommu, linux-kernel, netdev
Cc: Joerg Roedel, Will Deacon, Robin Murphy, nic_swsd,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Marco Elver
In-Reply-To: <f2f3a5dc-8b00-400a-9d3f-f10663ce8857@intel.com>
On 10/31/23 20:46, Jacob Keller wrote:
>
>
> On 10/30/2023 8:51 PM, Mirsad Todorovac wrote:
>> Am I allowed to keep Mr. Keller's Reviewed-by: tags on the reviewed diffs provided
>> that I fix the cover letter issue and objections?
>>
>
> I have no objections as long as the content otherwise remains the same :)
>
> Thanks,
> Jake
Of course, one changed character would require another review.
Thank you.
Mirsad
^ permalink raw reply
* Re: [PATCH bpf-next v8 07/10] bpf, net: switch to dynamic registration
From: Kui-Feng Lee @ 2023-11-01 0:19 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: kuifeng, netdev, bpf, ast, song, kernel-team, andrii, thinker.li,
drosen
In-Reply-To: <5a8520dd-0dd6-4d51-9e4a-6eebcf7e792d@linux.dev>
On 10/31/23 17:02, Martin KaFai Lau wrote:
> On 10/31/23 4:34 PM, Kui-Feng Lee wrote:
>>>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>>>> index a8813605f2f6..954536431e0b 100644
>>>> --- a/include/linux/btf.h
>>>> +++ b/include/linux/btf.h
>>>> @@ -12,6 +12,8 @@
>>>> #include <uapi/linux/bpf.h>
>>>> #define BTF_TYPE_EMIT(type) ((void)(type *)0)
>>>> +#define BTF_STRUCT_OPS_TYPE_EMIT(type) {((void)(struct type *)0); \
>>>
>>> ((void)(struct type *)0); is new. Why is it needed?
>>
>> This is a trick of BTF to force compiler generate type info for
>> the given type. Without trick, compiler may skip these types if these
>> type are not used at all in the module. For example, modules usually
>> don't use value types of struct_ops directly.
> It is not the value type and value type emit is understood. It is the
> struct_ops type itself and it is new addition in this patchset afaict.
> The value type emit is in the next line which was cut out from the
> context here.
>
I mean both of them are required.
In the case of a dummy implementation, struct_ops type itself properly
never being used, only being declared by the module. Without this line,
the module developer will fail to load a struct_ops map of the dummy
type. This line is added to avoid this awful situation.
^ permalink raw reply
* Re: [PATCH bpf-next v8 07/10] bpf, net: switch to dynamic registration
From: Kui-Feng Lee @ 2023-11-01 0:19 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: kuifeng, netdev, bpf, ast, song, kernel-team, andrii, thinker.li,
drosen
In-Reply-To: <5a8520dd-0dd6-4d51-9e4a-6eebcf7e792d@linux.dev>
On 10/31/23 17:02, Martin KaFai Lau wrote:
> On 10/31/23 4:34 PM, Kui-Feng Lee wrote:
>>>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>>>> index a8813605f2f6..954536431e0b 100644
>>>> --- a/include/linux/btf.h
>>>> +++ b/include/linux/btf.h
>>>> @@ -12,6 +12,8 @@
>>>> #include <uapi/linux/bpf.h>
>>>> #define BTF_TYPE_EMIT(type) ((void)(type *)0)
>>>> +#define BTF_STRUCT_OPS_TYPE_EMIT(type) {((void)(struct type *)0); \
>>>
>>> ((void)(struct type *)0); is new. Why is it needed?
>>
>> This is a trick of BTF to force compiler generate type info for
>> the given type. Without trick, compiler may skip these types if these
>> type are not used at all in the module. For example, modules usually
>> don't use value types of struct_ops directly.
> It is not the value type and value type emit is understood. It is the
> struct_ops type itself and it is new addition in this patchset afaict.
> The value type emit is in the next line which was cut out from the
> context here.
>
I mean both of them are required.
In the case of a dummy implementation, struct_ops type itself properly
never being used, only being declared by the module. Without this line,
the module developer will fail to load a struct_ops map of the dummy
type. This line is added to avoid this awful situation.
^ permalink raw reply
* Re: [PATCH bpf-next v8 07/10] bpf, net: switch to dynamic registration
From: Martin KaFai Lau @ 2023-11-01 0:02 UTC (permalink / raw)
To: Kui-Feng Lee
Cc: kuifeng, netdev, bpf, ast, song, kernel-team, andrii, thinker.li,
drosen
In-Reply-To: <10f383a2-c83b-4a40-a1f9-bcf33c76c164@gmail.com>
On 10/31/23 4:34 PM, Kui-Feng Lee wrote:
>>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>>> index a8813605f2f6..954536431e0b 100644
>>> --- a/include/linux/btf.h
>>> +++ b/include/linux/btf.h
>>> @@ -12,6 +12,8 @@
>>> #include <uapi/linux/bpf.h>
>>> #define BTF_TYPE_EMIT(type) ((void)(type *)0)
>>> +#define BTF_STRUCT_OPS_TYPE_EMIT(type) {((void)(struct type *)0); \
>>
>> ((void)(struct type *)0); is new. Why is it needed?
>
> This is a trick of BTF to force compiler generate type info for
> the given type. Without trick, compiler may skip these types if these
> type are not used at all in the module. For example, modules usually
> don't use value types of struct_ops directly.
It is not the value type and value type emit is understood. It is the struct_ops
type itself and it is new addition in this patchset afaict. The value type emit
is in the next line which was cut out from the context here.
^ permalink raw reply
* Re: [PATCH net] net/tcp: fix possible out-of-bounds reads in tcp_hash_fail()
From: Dmitry Safonov @ 2023-11-01 0:02 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller
Cc: netdev, eric.dumazet, syzbot, Jakub Kicinski, Paolo Abeni,
David Ahern, Francesco Ruggeri, Salam Noureddine
In-Reply-To: <4053721c-8207-4793-8f43-7207a1454d63@arista.com>
Hi Eric,
On 10/30/23 17:47, Dmitry Safonov wrote:
> On 10/30/23 08:45, Eric Dumazet wrote:
>> syzbot managed to trigger a fault by sending TCP packets
>> with all flags being set.
>>
>> BUG: KASAN: stack-out-of-bounds in string_nocheck lib/vsprintf.c:645 [inline]
>> BUG: KASAN: stack-out-of-bounds in string+0x394/0x3d0 lib/vsprintf.c:727
>> Read of size 1 at addr ffffc9000397f3f5 by task syz-executor299/5039
>>
>> CPU: 1 PID: 5039 Comm: syz-executor299 Not tainted 6.6.0-rc7-syzkaller-02075-g55c900477f5b #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
>> Call Trace:
>> <TASK>
>> __dump_stack lib/dump_stack.c:88 [inline]
>> dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
>> print_address_description mm/kasan/report.c:364 [inline]
>> print_report+0xc4/0x620 mm/kasan/report.c:475
>> kasan_report+0xda/0x110 mm/kasan/report.c:588
>> string_nocheck lib/vsprintf.c:645 [inline]
>> string+0x394/0x3d0 lib/vsprintf.c:727
>> vsnprintf+0xc5f/0x1870 lib/vsprintf.c:2818
>> vprintk_store+0x3a0/0xb80 kernel/printk/printk.c:2191
>> vprintk_emit+0x14c/0x5f0 kernel/printk/printk.c:2288
>> vprintk+0x7b/0x90 kernel/printk/printk_safe.c:45
>> _printk+0xc8/0x100 kernel/printk/printk.c:2332
>> tcp_inbound_hash.constprop.0+0xdb2/0x10d0 include/net/tcp.h:2760
>> tcp_v6_rcv+0x2b31/0x34d0 net/ipv6/tcp_ipv6.c:1882
>> ip6_protocol_deliver_rcu+0x33b/0x13d0 net/ipv6/ip6_input.c:438
>> ip6_input_finish+0x14f/0x2f0 net/ipv6/ip6_input.c:483
>> NF_HOOK include/linux/netfilter.h:314 [inline]
>> NF_HOOK include/linux/netfilter.h:308 [inline]
>> ip6_input+0xce/0x440 net/ipv6/ip6_input.c:492
>> dst_input include/net/dst.h:461 [inline]
>> ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
>> NF_HOOK include/linux/netfilter.h:314 [inline]
>> NF_HOOK include/linux/netfilter.h:308 [inline]
>> ipv6_rcv+0x563/0x720 net/ipv6/ip6_input.c:310
>> __netif_receive_skb_one_core+0x115/0x180 net/core/dev.c:5527
>> __netif_receive_skb+0x1f/0x1b0 net/core/dev.c:5641
>> netif_receive_skb_internal net/core/dev.c:5727 [inline]
>> netif_receive_skb+0x133/0x700 net/core/dev.c:5786
>> tun_rx_batched+0x429/0x780 drivers/net/tun.c:1579
>> tun_get_user+0x29e7/0x3bc0 drivers/net/tun.c:2002
>> tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2048
>> call_write_iter include/linux/fs.h:1956 [inline]
>> new_sync_write fs/read_write.c:491 [inline]
>> vfs_write+0x650/0xe40 fs/read_write.c:584
>> ksys_write+0x12f/0x250 fs/read_write.c:637
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> Fixes: 2717b5adea9e ("net/tcp: Add tcp_hash_fail() ratelimited logs")
>> Reported-by: syzbot <syzkaller@googlegroups.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Dmitry Safonov <dima@arista.com>
>> Cc: Francesco Ruggeri <fruggeri@arista.com>
>> Cc: David Ahern <dsahern@kernel.org>
>
> Thanks for fixing this,
> Reviewed-by: Dmitry Safonov <dima@arista.com>
>
>> ---
>> include/net/tcp_ao.h | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/tcp_ao.h b/include/net/tcp_ao.h
>> index a375a171ef3cb37ab1d8246c72c6a3e83f5c9184..5daf96a3dbee14bd3786e19ea4972e351058e6e7 100644
>> --- a/include/net/tcp_ao.h
>> +++ b/include/net/tcp_ao.h
>> @@ -124,7 +124,7 @@ struct tcp_ao_info {
>> #define tcp_hash_fail(msg, family, skb, fmt, ...) \
>> do { \
>> const struct tcphdr *th = tcp_hdr(skb); \
>> - char hdr_flags[5] = {}; \
>> + char hdr_flags[5]; \
>> char *f = hdr_flags; \
>> \
>> if (th->fin) \
>> @@ -135,8 +135,7 @@ do { \
>> *f++ = 'R'; \
>> if (th->ack) \
>> *f++ = 'A'; \
>> - if (f != hdr_flags) \
>> - *f = ' '; \
> Ah, that was clearly typo: I meant "*f = 0;"
Actually, after testing, I can see that the space was intended as well,
otherwise it "sticks" to L3index:
[ 130.965652] TCP: AO key not found for (10.0.1.1, 56920)->(10.0.254.1,
7010) Skeyid: 100 L3index: 0
[ 131.975116] TCP: AO hash is required, but not found for (10.0.1.1,
52686)->(10.0.254.1, 7011) SL3 index 0
[ 132.984024] TCP: AO hash mismatch for (10.0.1.1, 51382)->(10.0.254.1,
7012) SL3index: 0
[ 133.992221] TCP: Requested by the peer AO key id not found for
(10.0.1.1, 36548)->(10.0.254.1, 7013) SL3index: 0
If you don't mind, I'll send an updated version of your patch together
with some other small post-merge fixes this week.
>
>> + *f = 0; \
>> if ((family) == AF_INET) { \
>> net_info_ratelimited("%s for (%pI4, %d)->(%pI4, %d) %s" fmt "\n", \
>> msg, &ip_hdr(skb)->saddr, ntohs(th->source), \
>
Thank you,
Dmitry
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox