Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH net] net: fec: Coverity issue: Dereference null return value
From: Wei Fang @ 2022-12-19  2:21 UTC (permalink / raw)
  To: Alexander H Duyck, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Clark Wang, Shenwei Wang,
	dl-linux-imx
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <be98552a061f6249de558b210ff25de45e80d690.camel@gmail.com>


> -----Original Message-----
> From: Alexander H Duyck <alexander.duyck@gmail.com>
> Sent: 2022年12月16日 23:34
> To: Wei Fang <wei.fang@nxp.com>; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; Clark Wang
> <xiaoning.wang@nxp.com>; Shenwei Wang <shenwei.wang@nxp.com>;
> dl-linux-imx <linux-imx@nxp.com>
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net] net: fec: Coverity issue: Dereference null return value
> 
> On Thu, 2022-12-15 at 17:11 +0800, wei.fang@nxp.com wrote:
> > From: Wei Fang <wei.fang@nxp.com>
> >
> > The build_skb might return a null pointer but there is no check on the
> > return value in the fec_enet_rx_queue(). So a null pointer dereference
> > might occur. To avoid this, we check the return value of build_skb. If
> > the return value is a null pointer, the driver will recycle the page
> > and update the statistic of ndev. Then jump to rx_processing_done to
> > clear the status flags of the BD so that the hardware can recycle the BD.
> >
> > Signed-off-by: Wei Fang <wei.fang@nxp.com>
> > Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
> > ---
> >  drivers/net/ethernet/freescale/fec_main.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/freescale/fec_main.c
> > b/drivers/net/ethernet/freescale/fec_main.c
> > index 5528b0af82ae..c78aaa780983 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -1674,6 +1674,16 @@ fec_enet_rx_queue(struct net_device *ndev, int
> budget, u16 queue_id)
> >  		 * bridging applications.
> >  		 */
> >  		skb = build_skb(page_address(page), PAGE_SIZE);
> > +		if (unlikely(!skb)) {
> > +			page_pool_recycle_direct(rxq->page_pool, page);
> > +			ndev->stats.rx_packets--;
> > +			ndev->stats.rx_bytes -= pkt_len;
> > +			ndev->stats.rx_dropped++;
> 
> I'm not sure you really need to bother with rewinding the rx_packets and
> rx_bytes counters. I know that the rx_dropped statistic will get incremented in
> the network stack in the event of a packet failing to enqueue to the backlog, so
> it might be better to just leave the rx_packets counter as is and assume the
> actual packet count is rx_packets - rx_dropped.
> 
According to your advice, I looked up the Linux document, actually as you said,
the rx_packets should include packets which host had to drop at various stages
of processing (even in the driver). Thanks for your review, I‘ll amend this in the
next version.

> > +
> > +			netdev_err(ndev, "build_skb failed!\n");
> 
> Instead of netdev_err you may want to consider netdev_err_once for this.
> Generally speaking when we start seeing memory allocation error issues they
> can get very noisy very quickly as you are likely to fail the allocation for every
> packet in a given polling session, and sessions to follow.
> 
Yes, it's better to use netdev_err_once than netdev_err in the situation you describe.
Thanks again!

> > +			goto rx_processing_done;
> > +		}
> > +
> >  		skb_reserve(skb, data_start);
> >  		skb_put(skb, pkt_len - sub_len);
> >  		skb_mark_for_recycle(skb);


^ permalink raw reply

* [PATCH V2 net] net: fec: Coverity issue: Dereference null return value
From: wei.fang @ 2022-12-19  2:27 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, xiaoning.wang, shenwei.wang,
	alexander.duyck, linux-imx
  Cc: netdev, linux-kernel

From: Wei Fang <wei.fang@nxp.com>

The build_skb might return a null pointer but there is no check on the
return value in the fec_enet_rx_queue(). So a null pointer dereference
might occur. To avoid this, we check the return value of build_skb. If
the return value is a null pointer, the driver will recycle the page and
update the statistic of ndev. Then jump to rx_processing_done to clear
the status flags of the BD so that the hardware can recycle the BD.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
---
V2 changes:
1. Remove rx_packets and rx_bytes counters.
2. Use netdev_err_once instead of netdev_err.
---
 drivers/net/ethernet/freescale/fec_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 5528b0af82ae..644f3c963730 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1674,6 +1674,14 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
 		 * bridging applications.
 		 */
 		skb = build_skb(page_address(page), PAGE_SIZE);
+		if (unlikely(!skb)) {
+			page_pool_recycle_direct(rxq->page_pool, page);
+			ndev->stats.rx_dropped++;
+
+			netdev_err_once(ndev, "build_skb failed!\n");
+			goto rx_processing_done;
+		}
+
 		skb_reserve(skb, data_start);
 		skb_put(skb, pkt_len - sub_len);
 		skb_mark_for_recycle(skb);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] net/ncsi: Always use unicast source MAC address
From: Peter Delevoryas @ 2022-12-19  3:16 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Peter Delevoryas, sam, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <CAKgT0UfOnJGf+n_PTizCyq77H+ZvWMU4i=D=GW3o13RNqWf-Gg@mail.gmail.com>



> On Dec 17, 2022, at 12:57 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> On Fri, Dec 16, 2022 at 8:20 PM Peter Delevoryas <peter@pjd.dev> wrote:
>> 
>> 
>> 
>>> On Dec 16, 2022, at 10:29 AM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>> 
>>> On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <peter@pjd.dev> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <alexander.duyck@gmail.com> wrote:
>>>>> 
>>>>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:
> 
> <...>
> 
>>> 
>>>>> My main
>>>>> concern would be that the dev_addr is not initialized for those first
>>>>> few messages so you may be leaking information.
>>>>> 
>>>>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
>>>>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
>>>>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
>>>>>> even have MAC learning enabled from the out-of-band BMC link, lol.
>>>>>> 
>>>>>> [1]: https://tinyurl.com/4933mhaj
>>>>>> [2]: https://tinyurl.com/mr3tyadb
>>>>> 
>>>>> The thing is the OpenBMC approach initializes the value themselves to
>>>>> broadcast[3]. As a result the two code bases are essentially doing the
>>>>> same thing since mac_addr is defaulted to the broadcast address when
>>>>> the ncsi interface is registered.
>>>> 
>>>> That’s a very good point, thanks for pointing that out, I hadn’t
>>>> even noticed that!
>>>> 
>>>> Anyways, let me know what you think of the traces I added above.
>>>> Sorry for the delay, I’ve just been busy with some other stuff,
>>>> but I do really actually care about upstreaming this (and several
>>>> other NC-SI changes I’ll submit after this one, which are unrelated
>>>> but more useful).
>>>> 
>>>> Thanks,
>>>> Peter
>>> 
>>> So the NC-SI spec says any value can be used for the source MAC and
>>> that broadcast "may" be used. I would say there are some debugging
>>> advantages to using broadcast that will be obvious in a packet trace.
>> 
>> Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
>> a broadcast source MAC is pretty unique too.
>> 
>>> I wonder if we couldn't look at doing something like requiring
>>> broadcast or LAA if the gma_flag isn't set.
>> 
>> What is LAA? I’m out of the loop
> 
> Locally administered MAC address[4]. Basically it is a MAC address
> that is generated locally such as your random MAC address. Assuming
> the other end of the NC-SI link is using a MAC address with a vendor
> OUI there should be no risk of collisions on a point-to-point link.
> Essentially if you wanted to you could probably just generate a random
> MAC address for the NCSI protocol and then use that in place of the
> broadcast address.
> 
>> But also: aren’t we already using broadcast if the gma_flag isn’t set?
>> 
>> -       if (nca->ndp->gma_flag == 1)
>> -               memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
>> -       else
>> -               eth_broadcast_addr(eh->h_source);
>> +       memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
> 
> That I am not sure about. You were using this kernel without your
> patch right? With your patch it would make sense to see that behavior,
> but without I am not sure why you would see that address for any NC-SI
> commands before the gma_flag is set.
> 
>> 
>>> With that we could at
>>> least advertise that we don't expect this packet to be going out in a
>>> real network as we cannot guarantee the MAC is unique.
>> 
>> Yeah, but it probably wouldn’t help my simulation scenario.
>> 
>> I guess it sounds like this patch is not a good idea, which to be fair,
>> is totally reasonable.
>> 
>> I can just add some iptables rules to tunnel these packets with a different
>> source MAC, or fix the multicast socket issue I was having. It’s really
>> not a big deal, and like you’re saying, we probably don’t want to make
>> it harder to maintain _forever_.
> 
> Like I said before I would be good with either a Broadcast address OR
> a LAA address. The one thing we need to watch out for though is any
> sort of leak. One possible concern would be if for example you had 4
> ports using 4 different MAC addresses but one BMC. You don't want to
> accidently leak the MAC address from one port onto the other one. With
> a LAA address if it were to leak and screw up ARP tables somewhere it
> wouldn't be a big deal since it isn't expected to be switched in the
> first place.
> 
>> I would just suggest praying for the next guy that tries to test NC-SI
>> stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
>> I had to resort to reading the source code and printing stuff with
>> BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
>> work though.
> 
> Well it seems like NC-SI isn't meant to be bridged based on the fact
> that it is using a broadcast MAC address as a source. If nothing else
> I suppose you could try to work with the standards committee on that
> to see what can be done to make the protocol more portable.. :-)

Well, I started preparing some of my other patches to send, and while
digging up the history for that, I happened to notice this commit
completely by chance while browsing Github:

https://github.com/facebook/openbmc-linux/commit/933b5bd024d28f48a6359e6a9db631f778ba9ea7

[openbmc.quanta][PR] FBAL:Fixed NCSI can't work when import BR function

Summary:
As title.
Pull Request resolved: https://github.com/facebookexternal/openbmc.quanta/pull/1668
GitHub Author: Peter <peter.yin@quantatw.com>

diff --git a/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c b/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
index 5ea7e56119c1..8ef0b627f5ec 100644
--- a/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
+++ b/meta-aspeed/recipes-kernel/linux/files/linux-aspeed-5.0/net/bridge/br_input.c
@@ -220,6 +220,9 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
        if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
                return RX_HANDLER_PASS;

+       if (skb->protocol == cpu_to_be16(ETH_P_NCSI))
+               return RX_HANDLER_PASS;
+
        if (!is_valid_ether_addr(eth_hdr(skb)->h_source))
                goto drop;

Which is accomplishing the same thing I suggested in my patch, except
that it’s modifying the Linux bridge code instead of changing the NC-SI
packets’ source MAC address.

To explain what I *think* this person was doing...

Meta has a system called Zion that’s described here:

https://engineering.fb.com/2019/03/14/data-center-engineering/accelerating-infrastructure/

It consists of two chassis, “Angel's Landing” and “Emerald Pools”.

Together, it’s kinda like an Nvidia DGX A100 system, but with generic
PCIe switches, and “OCP Accelerators”. There’s like an AMD GPU or an
Intel accelerator that can fit there. Maybe an A100 can fit too? I’m
not really completely clear on how its being used compared to GrandTeton,
announced at OCP 2022, which is even closer to the DGX architecture,
but yeah.

Angel’s Landing is 4 dual-socket boards stacked together, each board
with a BMC and NIC supporting NC-SI. I think in practice we reduced
this to 1-2 dual-socket boards, each with 2 NIC’s (presumably cause
we don't need that many CPU's but still need the network bandwidth).

Emerald Pools is a single board and 8 accelerator modules, and
the board has a BMC on it. To get network connectivity to the BMC,
there’s a USB from Emerald Pools to one of the Angel’s Landing BMC's
and the Angel’s Landing BMC bridges Emerald Pools traffic through
its NIC. If this doesn’t make sense, I think this is the whole setup
(Omitting the device tree and some MAC filtering stuff):

On an Angel’s Landing BMC:

$ ip link add br0 type bridge
$ ip link set eth0 master br0
$ ip link set eth1 master br0
$ ip link set usb0 master br0

And on the Emerald Pools BMC, there’s just a usb net intf:

$ ifconfig
lo        ….

usb0      Link encap:Ethernet  HWaddr xxxxxxxxxxx
          inet6 addr: xxxxxx Scope:Link
          inet6 addr: xxxxxx Scope:Global
          inet6 addr: xxxxxx Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:999332 errors:0 dropped:0 overruns:0 frame:0
          TX packets:594253 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:211829527 (202.0 MiB)  TX bytes:150569888 (143.5 MiB)

Anyways, so then my question was: is Zion actually relying on NC-SI
packets traversing a bridge?

The Emerald Pools BMC doesn’t have NC-SI enabled at all, not even a
userspace daemon or utility of any kind.

NC-SI *is* enabled and used on the Angel's Landing BMC, so I checked
to see if they traverse the bridge (in QEMU, I didn’t check on a real
system):

root@bmc-oob:~# tcpdump -i br0 -v "ether proto 0x88f8" &
[1] 12045
root@bmc-oob:~# [ 1434.520314] device br0 entered promiscuous mode
tcpdump: listening on br0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
ifconfig eth0 down
[ 1442.863305] br0: port 1(eth0) entered disabled state
root@bmc-oob:~# ifconfig eth0 up
[ 1445.978424] br0: port 1(eth0) entered blocking state
[ 1445.978743] br0: port 1(eth0) entered forwarding state
[ 1445.979131] 8021q: adding VLAN 0 to HW filter on device eth0
[ 1445.979814] ftgmac100 1e660000.ethernet eth0: NCSI: Handler for packet type 0x82 returned -19
root@bmc-oob:~# tcpdump -i eth0 -v "ether proto 0x88f8" &
[2] 12258
root@bmc-oob:~# tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
ifcon04:58:49.464810 fa:ce:b0:02:20:22 (oui Unknown) > Broadcast, ethertype Unknown (0x88f8), length 60:
        0x0000:  0001 0068 0a00 0000 0000 0000 0000 0000  ...h............
        0x0010:  ffff f597 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
04:58:49.465099 Broadcast > Broadcast, ethertype Unknown (0x88f8), length 64:
        0x0000:  0001 0068 8a00 0010 0000 0000 0000 0000  ...h............
        0x0010:  0000 0000 0000 0001 0000 0000 0000 0000  ................
        0x0020:  ffff 7586 0000 0000 0000 0000 0000 d8cd  ..u.............
        0x0030:  c6bc                                     ..
04:58:49.471206 fa:ce:b0:02:20:22 (oui Unknown) > Broadcast, ethertype Unknown (0x88f8), length 60:
        0x0000:  0001 0069 1500 0000 0000 0000 0000 0000  ...i............
        0x0010:  ffff ea96 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000       ..............
04:58:49.471432 Broadcast > Broadcast, ethertype Unknown (0x88f8), length 78:
        0x0000:  0001 0069 9500 0028 0000 0000 0000 0000  ...i...(........
        0x0010:  0000 0000 f1f0 f000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 8119 fffd 0765 84e0 9fa4  ...........e….

So, I’m able to see packets on eth0, but so far I haven’t really seen
anything hitting the bridge. ¯\_(ツ)_/¯

Perhaps if there’s some cross-interface NC-SI traffic (eth0 <-> eth1), then
yes this would occur. But I don’t know why that would even happen? Regular
NC-SI failover or bonding (eth0, eth1) would be the actual solution? not sure.

The original commit was very vague, so perhaps I’ll follow up with
the author and reviewer to see if this patch was actually necessary.

> 
> [4]: https://macaddress.io/faq/what-are-a-universal-address-and-a-local-administered-address



^ permalink raw reply related

* [syzbot] memory leak in ath9k_hif_usb_rx_cb
From: syzbot @ 2022-12-19  3:40 UTC (permalink / raw)
  To: davem, edumazet, kuba, kvalo, linux-kernel, linux-wireless,
	netdev, pabeni, syzkaller-bugs, toke

Hello,

syzbot found the following issue on:

HEAD commit:    6f1f5caed5bf Merge tag 'for-linus-6.2-ofs1' of git://git.k..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11a5aa57880000
kernel config:  https://syzkaller.appspot.com/x/.config?x=aa9d05fc5567240b
dashboard link: https://syzkaller.appspot.com/bug?extid=e9632e3eb038d93d6bc6
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1138a5c0480000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10d07e77880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/e0b09490fc5c/disk-6f1f5cae.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2f00e5ef8dce/vmlinux-6f1f5cae.xz
kernel image: https://storage.googleapis.com/syzbot-assets/78f4c439075f/bzImage-6f1f5cae.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e9632e3eb038d93d6bc6@syzkaller.appspotmail.com

BUG: memory leak
unreferenced object 0xffff888101f97700 (size 240):
  comm "softirq", pid 0, jiffies 4294945988 (age 15.200s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83ac0212>] __alloc_skb+0x202/0x270 net/core/skbuff.c:552
    [<ffffffff83ac396a>] __netdev_alloc_skb+0x6a/0x220 net/core/skbuff.c:630
    [<ffffffff82df70d0>] __dev_alloc_skb include/linux/skbuff.h:3165 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_stream drivers/net/wireless/ath/ath9k/hif_usb.c:635 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_cb+0x1d0/0x660 drivers/net/wireless/ath/ath9k/hif_usb.c:686
    [<ffffffff82fd9d89>] __usb_hcd_giveback_urb+0xf9/0x230 drivers/usb/core/hcd.c:1671
    [<ffffffff82fda06b>] usb_hcd_giveback_urb+0x1ab/0x1c0 drivers/usb/core/hcd.c:1754
    [<ffffffff8318c0b4>] dummy_timer+0x8e4/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1988
    [<ffffffff81328243>] call_timer_fn+0x33/0x1f0 kernel/time/timer.c:1700
    [<ffffffff813284ff>] expire_timers+0xff/0x1d0 kernel/time/timer.c:1751
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:2022 [inline]
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:1995 [inline]
    [<ffffffff813286f9>] run_timer_softirq+0x129/0x2f0 kernel/time/timer.c:2035
    [<ffffffff84c000eb>] __do_softirq+0xeb/0x2ef kernel/softirq.c:571
    [<ffffffff8126a086>] invoke_softirq kernel/softirq.c:445 [inline]
    [<ffffffff8126a086>] __irq_exit_rcu+0xc6/0x110 kernel/softirq.c:650
    [<ffffffff848a7742>] sysvec_apic_timer_interrupt+0xa2/0xd0 arch/x86/kernel/apic/apic.c:1107
    [<ffffffff84a00cc6>] asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649
    [<ffffffff848bd6e9>] native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
    [<ffffffff848bd6e9>] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
    [<ffffffff848bd6e9>] acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
    [<ffffffff848bd6e9>] acpi_idle_do_entry+0xc9/0xe0 drivers/acpi/processor_idle.c:570
    [<ffffffff848bdc00>] acpi_idle_enter+0x150/0x230 drivers/acpi/processor_idle.c:707
    [<ffffffff83699eb4>] cpuidle_enter_state+0xc4/0x740 drivers/cpuidle/cpuidle.c:239

BUG: memory leak
unreferenced object 0xffff88810c312800 (size 1024):
  comm "softirq", pid 0, jiffies 4294945988 (age 15.200s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814f6467>] __do_kmalloc_node mm/slab_common.c:967 [inline]
    [<ffffffff814f6467>] __kmalloc_node_track_caller+0x47/0x120 mm/slab_common.c:988
    [<ffffffff83ac00f1>] kmalloc_reserve net/core/skbuff.c:492 [inline]
    [<ffffffff83ac00f1>] __alloc_skb+0xe1/0x270 net/core/skbuff.c:565
    [<ffffffff83ac396a>] __netdev_alloc_skb+0x6a/0x220 net/core/skbuff.c:630
    [<ffffffff82df70d0>] __dev_alloc_skb include/linux/skbuff.h:3165 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_stream drivers/net/wireless/ath/ath9k/hif_usb.c:635 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_cb+0x1d0/0x660 drivers/net/wireless/ath/ath9k/hif_usb.c:686
    [<ffffffff82fd9d89>] __usb_hcd_giveback_urb+0xf9/0x230 drivers/usb/core/hcd.c:1671
    [<ffffffff82fda06b>] usb_hcd_giveback_urb+0x1ab/0x1c0 drivers/usb/core/hcd.c:1754
    [<ffffffff8318c0b4>] dummy_timer+0x8e4/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1988
    [<ffffffff81328243>] call_timer_fn+0x33/0x1f0 kernel/time/timer.c:1700
    [<ffffffff813284ff>] expire_timers+0xff/0x1d0 kernel/time/timer.c:1751
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:2022 [inline]
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:1995 [inline]
    [<ffffffff813286f9>] run_timer_softirq+0x129/0x2f0 kernel/time/timer.c:2035
    [<ffffffff84c000eb>] __do_softirq+0xeb/0x2ef kernel/softirq.c:571
    [<ffffffff8126a086>] invoke_softirq kernel/softirq.c:445 [inline]
    [<ffffffff8126a086>] __irq_exit_rcu+0xc6/0x110 kernel/softirq.c:650
    [<ffffffff848a7742>] sysvec_apic_timer_interrupt+0xa2/0xd0 arch/x86/kernel/apic/apic.c:1107
    [<ffffffff84a00cc6>] asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649
    [<ffffffff848bd6e9>] native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
    [<ffffffff848bd6e9>] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
    [<ffffffff848bd6e9>] acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
    [<ffffffff848bd6e9>] acpi_idle_do_entry+0xc9/0xe0 drivers/acpi/processor_idle.c:570
    [<ffffffff848bdc00>] acpi_idle_enter+0x150/0x230 drivers/acpi/processor_idle.c:707

BUG: memory leak
unreferenced object 0xffff888101f97500 (size 240):
  comm "softirq", pid 0, jiffies 4294945988 (age 15.200s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff83ac0212>] __alloc_skb+0x202/0x270 net/core/skbuff.c:552
    [<ffffffff83ac396a>] __netdev_alloc_skb+0x6a/0x220 net/core/skbuff.c:630
    [<ffffffff82df70d0>] __dev_alloc_skb include/linux/skbuff.h:3165 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_stream drivers/net/wireless/ath/ath9k/hif_usb.c:635 [inline]
    [<ffffffff82df70d0>] ath9k_hif_usb_rx_cb+0x1d0/0x660 drivers/net/wireless/ath/ath9k/hif_usb.c:686
    [<ffffffff82fd9d89>] __usb_hcd_giveback_urb+0xf9/0x230 drivers/usb/core/hcd.c:1671
    [<ffffffff82fda06b>] usb_hcd_giveback_urb+0x1ab/0x1c0 drivers/usb/core/hcd.c:1754
    [<ffffffff8318c0b4>] dummy_timer+0x8e4/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1988
    [<ffffffff81328243>] call_timer_fn+0x33/0x1f0 kernel/time/timer.c:1700
    [<ffffffff813284ff>] expire_timers+0xff/0x1d0 kernel/time/timer.c:1751
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:2022 [inline]
    [<ffffffff813286f9>] __run_timers kernel/time/timer.c:1995 [inline]
    [<ffffffff813286f9>] run_timer_softirq+0x129/0x2f0 kernel/time/timer.c:2035
    [<ffffffff84c000eb>] __do_softirq+0xeb/0x2ef kernel/softirq.c:571
    [<ffffffff8126a086>] invoke_softirq kernel/softirq.c:445 [inline]
    [<ffffffff8126a086>] __irq_exit_rcu+0xc6/0x110 kernel/softirq.c:650
    [<ffffffff848a7742>] sysvec_apic_timer_interrupt+0xa2/0xd0 arch/x86/kernel/apic/apic.c:1107
    [<ffffffff84a00cc6>] asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649
    [<ffffffff848bd6e9>] native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
    [<ffffffff848bd6e9>] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
    [<ffffffff848bd6e9>] acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
    [<ffffffff848bd6e9>] acpi_idle_do_entry+0xc9/0xe0 drivers/acpi/processor_idle.c:570
    [<ffffffff848bdc00>] acpi_idle_enter+0x150/0x230 drivers/acpi/processor_idle.c:707
    [<ffffffff83699eb4>] cpuidle_enter_state+0xc4/0x740 drivers/cpuidle/cpuidle.c:239



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next 09/11] sfc: implement iova rbtree to store dma mappings
From: Jason Wang @ 2022-12-19  6:03 UTC (permalink / raw)
  To: Gautam Dawar
  Cc: Gautam Dawar, linux-net-drivers, netdev, eperezma, tanuj.kamde,
	Koushik.Dutta, harpreet.anand, Edward Cree, Martin Habets,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel
In-Reply-To: <6f3eb21d-4f2d-eff3-37d4-9731eacd4af3@amd.com>

On Fri, Dec 16, 2022 at 8:48 PM Gautam Dawar <gdawar@amd.com> wrote:
>
>
> On 12/14/22 12:16, Jason Wang wrote:
> > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, Dec 7, 2022 at 10:57 PM Gautam Dawar <gautam.dawar@amd.com> wrote:
> >> sfc uses a MCDI DMA buffer that is allocated on the host
> >> for communicating with the Firmware. The MCDI buffer IOVA
> >> could overlap with the IOVA used by the guest for the
> >> virtqueue buffers. To detect such overlap, the DMA mappings
> >> from the guest will be stored in a IOVA rbtree and every
> >> such mapping will be compared against the MCDI buffer IOVA
> >> range. If an overlap is detected, the MCDI buffer will be
> >> relocated to a different IOVA.
> > I think it can't prevent guests from guessing the MCDI buffer address
> > and trying to DMA to/from that buffer.
>
> Yes, if the guest can guess the MCDI buffer address, it could use it to
> DMA to/from this buffer.
>
> However, the guest can modify the buffer contents but can't instruct the
> MC to execute the request. To cause any MCDI failure, the request buffer
> needs to be updated when host driver is about to execute the request or
> response buffer needs to be updated after command execution but before
> host driver reads it. This would be a very small time window and hard to
> guess for the guest.

Not that hard probably, actually, the guest driver don't even need to
guess, just leave a small space in its IOVA space then it knows the
host driver will use that for MCDI. So this is something we need to
address.

Any possibility to let the MCDI command run on PF instead of VF?

>
> > It might work with some co-operation of the NIC who can validate the
> > DMA initialized by the virtqueue and forbid the DMA to the MDCI
> > buffer.
> I think this problem can be solved using PASID which will be supported
> by our next hardware version.

That one way, another way is to add a check before initiating
virtqueue DMA, if it tries to DMA to MCDI, fail. This seems easier.

Thanks

> >
> > Thanks
>
> >> Signed-off-by: Gautam Dawar <gautam.dawar@amd.com>
> >> ---
> >>   drivers/net/ethernet/sfc/Makefile         |   3 +-
> >>   drivers/net/ethernet/sfc/ef100_iova.c     | 205 ++++++++++++++++++++++
> >>   drivers/net/ethernet/sfc/ef100_iova.h     |  40 +++++
> >>   drivers/net/ethernet/sfc/ef100_nic.c      |   1 -
> >>   drivers/net/ethernet/sfc/ef100_vdpa.c     |  38 ++++
> >>   drivers/net/ethernet/sfc/ef100_vdpa.h     |  15 ++
> >>   drivers/net/ethernet/sfc/ef100_vdpa_ops.c |   5 +
> >>   drivers/net/ethernet/sfc/mcdi.h           |   3 +
> >>   8 files changed, 308 insertions(+), 2 deletions(-)
> >>   create mode 100644 drivers/net/ethernet/sfc/ef100_iova.c
> >>   create mode 100644 drivers/net/ethernet/sfc/ef100_iova.h
> >>
> >> diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
> >> index a10eac91ab23..85852ff50b7c 100644
> >> --- a/drivers/net/ethernet/sfc/Makefile
> >> +++ b/drivers/net/ethernet/sfc/Makefile
> >> @@ -11,7 +11,8 @@ sfc-$(CONFIG_SFC_MTD) += mtd.o
> >>   sfc-$(CONFIG_SFC_SRIOV)        += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
> >>                              mae.o tc.o tc_bindings.o tc_counters.o
> >>
> >> -sfc-$(CONFIG_SFC_VDPA) += mcdi_vdpa.o ef100_vdpa.o ef100_vdpa_ops.o
> >> +sfc-$(CONFIG_SFC_VDPA) += mcdi_vdpa.o ef100_vdpa.o ef100_vdpa_ops.o \
> >> +                          ef100_iova.o
> >>   obj-$(CONFIG_SFC)      += sfc.o
> >>
> >>   obj-$(CONFIG_SFC_FALCON) += falcon/
> >> diff --git a/drivers/net/ethernet/sfc/ef100_iova.c b/drivers/net/ethernet/sfc/ef100_iova.c
> >> new file mode 100644
> >> index 000000000000..863314c5b9b5
> >> --- /dev/null
> >> +++ b/drivers/net/ethernet/sfc/ef100_iova.c
> >> @@ -0,0 +1,205 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/* Driver for Xilinx network controllers and boards
> >> + * Copyright(C) 2020-2022 Xilinx, Inc.
> >> + * Copyright(C) 2022 Advanced Micro Devices, Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify it
> >> + * under the terms of the GNU General Public License version 2 as published
> >> + * by the Free Software Foundation, incorporated herein by reference.
> >> + */
> >> +
> >> +#include "ef100_iova.h"
> >> +
> >> +static void update_free_list_node(struct ef100_vdpa_iova_node *target_node,
> >> +                                 struct ef100_vdpa_iova_node *next_node,
> >> +                                 struct ef100_vdpa_nic *vdpa_nic)
> >> +{
> >> +       unsigned long target_node_end;
> >> +       unsigned long free_area;
> >> +       bool in_list;
> >> +
> >> +       target_node_end = target_node->iova + target_node->size;
> >> +       free_area = next_node->iova - target_node_end;
> >> +       in_list = !(list_empty(&target_node->free_node));
> >> +
> >> +       if (!in_list && free_area >= MCDI_BUF_LEN) {
> >> +               list_add(&target_node->free_node,
> >> +                        &vdpa_nic->free_list);
> >> +       } else if (in_list && free_area < MCDI_BUF_LEN) {
> >> +               list_del_init(&target_node->free_node);
> >> +       }
> >> +}
> >> +
> >> +static void update_free_list(struct ef100_vdpa_iova_node *iova_node,
> >> +                            struct ef100_vdpa_nic *vdpa_nic,
> >> +                            bool add_node)
> >> +{
> >> +       struct ef100_vdpa_iova_node *prev_in = NULL;
> >> +       struct ef100_vdpa_iova_node *next_in = NULL;
> >> +       struct rb_node *prev_node;
> >> +       struct rb_node *next_node;
> >> +
> >> +       prev_node = rb_prev(&iova_node->node);
> >> +       next_node = rb_next(&iova_node->node);
> >> +
> >> +       if (prev_node)
> >> +               prev_in = rb_entry(prev_node,
> >> +                                  struct ef100_vdpa_iova_node, node);
> >> +       if (next_node)
> >> +               next_in = rb_entry(next_node,
> >> +                                  struct ef100_vdpa_iova_node, node);
> >> +
> >> +       if (add_node) {
> >> +               if (prev_in)
> >> +                       update_free_list_node(prev_in, iova_node, vdpa_nic);
> >> +
> >> +               if (next_in)
> >> +                       update_free_list_node(iova_node, next_in, vdpa_nic);
> >> +       } else {
> >> +               if (next_in && prev_in)
> >> +                       update_free_list_node(prev_in, next_in, vdpa_nic);
> >> +               if (!list_empty(&iova_node->free_node))
> >> +                       list_del_init(&iova_node->free_node);
> >> +       }
> >> +}
> >> +
> >> +int efx_ef100_insert_iova_node(struct ef100_vdpa_nic *vdpa_nic,
> >> +                              u64 iova, u64 size)
> >> +{
> >> +       struct ef100_vdpa_iova_node *iova_node;
> >> +       struct ef100_vdpa_iova_node *new_node;
> >> +       struct rb_node *parent;
> >> +       struct rb_node **link;
> >> +       struct rb_root *root;
> >> +       int rc = 0;
> >> +
> >> +       mutex_lock(&vdpa_nic->iova_lock);
> >> +
> >> +       root = &vdpa_nic->iova_root;
> >> +       link = &root->rb_node;
> >> +       parent = *link;
> >> +       /* Go to the bottom of the tree */
> >> +       while (*link) {
> >> +               parent = *link;
> >> +               iova_node = rb_entry(parent, struct ef100_vdpa_iova_node, node);
> >> +
> >> +               /* handle duplicate node */
> >> +               if (iova_node->iova == iova) {
> >> +                       rc = -EEXIST;
> >> +                       goto out_unlock;
> >> +               }
> >> +
> >> +               if (iova_node->iova > iova)
> >> +                       link = &(*link)->rb_left;
> >> +               else
> >> +                       link = &(*link)->rb_right;
> >> +       }
> >> +
> >> +       new_node = kzalloc(sizeof(*new_node), GFP_KERNEL);
> >> +       if (!new_node) {
> >> +               rc = -ENOMEM;
> >> +               goto out_unlock;
> >> +       }
> >> +
> >> +       new_node->iova = iova;
> >> +       new_node->size = size;
> >> +       INIT_LIST_HEAD(&new_node->free_node);
> >> +
> >> +       /* Put the new node here */
> >> +       rb_link_node(&new_node->node, parent, link);
> >> +       rb_insert_color(&new_node->node, root);
> >> +
> >> +       update_free_list(new_node, vdpa_nic, true);
> >> +
> >> +out_unlock:
> >> +       mutex_unlock(&vdpa_nic->iova_lock);
> >> +       return rc;
> >> +}
> >> +
> >> +static struct ef100_vdpa_iova_node*
> >> +ef100_rbt_search_node(struct ef100_vdpa_nic *vdpa_nic,
> >> +                     unsigned long iova)
> >> +{
> >> +       struct ef100_vdpa_iova_node *iova_node;
> >> +       struct rb_node *rb_node;
> >> +       struct rb_root *root;
> >> +
> >> +       root = &vdpa_nic->iova_root;
> >> +       if (!root)
> >> +               return NULL;
> >> +
> >> +       rb_node = root->rb_node;
> >> +
> >> +       while (rb_node) {
> >> +               iova_node = rb_entry(rb_node, struct ef100_vdpa_iova_node,
> >> +                                    node);
> >> +               if (iova_node->iova > iova)
> >> +                       rb_node = rb_node->rb_left;
> >> +               else if (iova_node->iova < iova)
> >> +                       rb_node = rb_node->rb_right;
> >> +               else
> >> +                       return iova_node;
> >> +       }
> >> +
> >> +       return NULL;
> >> +}
> >> +
> >> +void efx_ef100_remove_iova_node(struct ef100_vdpa_nic *vdpa_nic,
> >> +                               unsigned long iova)
> >> +{
> >> +       struct ef100_vdpa_iova_node *iova_node;
> >> +
> >> +       mutex_lock(&vdpa_nic->iova_lock);
> >> +       iova_node = ef100_rbt_search_node(vdpa_nic, iova);
> >> +       if (!iova_node)
> >> +               goto out_unlock;
> >> +
> >> +       update_free_list(iova_node, vdpa_nic, false);
> >> +
> >> +       rb_erase(&iova_node->node, &vdpa_nic->iova_root);
> >> +       kfree(iova_node);
> >> +
> >> +out_unlock:
> >> +       mutex_unlock(&vdpa_nic->iova_lock);
> >> +}
> >> +
> >> +void efx_ef100_delete_iova(struct ef100_vdpa_nic *vdpa_nic)
> >> +{
> >> +       struct ef100_vdpa_iova_node *iova_node;
> >> +       struct rb_root *iova_root;
> >> +       struct rb_node *node;
> >> +
> >> +       mutex_lock(&vdpa_nic->iova_lock);
> >> +
> >> +       iova_root = &vdpa_nic->iova_root;
> >> +       while (!RB_EMPTY_ROOT(iova_root)) {
> >> +               node = rb_first(iova_root);
> >> +               iova_node = rb_entry(node, struct ef100_vdpa_iova_node, node);
> >> +               if (!list_empty(&iova_node->free_node))
> >> +                       list_del_init(&iova_node->free_node);
> >> +               if (vdpa_nic->domain)
> >> +                       iommu_unmap(vdpa_nic->domain, iova_node->iova,
> >> +                                   iova_node->size);
> >> +               rb_erase(node, iova_root);
> >> +               kfree(iova_node);
> >> +       }
> >> +
> >> +       mutex_unlock(&vdpa_nic->iova_lock);
> >> +}
> >> +
> >> +int efx_ef100_find_new_iova(struct ef100_vdpa_nic *vdpa_nic,
> >> +                           unsigned int buf_len,
> >> +                           u64 *new_iova)
> >> +{
> >> +       struct ef100_vdpa_iova_node *iova_node;
> >> +
> >> +       /* pick the first node from freelist */
> >> +       iova_node = list_first_entry_or_null(&vdpa_nic->free_list,
> >> +                                            struct ef100_vdpa_iova_node,
> >> +                                            free_node);
> >> +       if (!iova_node)
> >> +               return -ENOENT;
> >> +
> >> +       *new_iova = iova_node->iova + iova_node->size;
> >> +       return 0;
> >> +}
> >> diff --git a/drivers/net/ethernet/sfc/ef100_iova.h b/drivers/net/ethernet/sfc/ef100_iova.h
> >> new file mode 100644
> >> index 000000000000..68e39c4152c7
> >> --- /dev/null
> >> +++ b/drivers/net/ethernet/sfc/ef100_iova.h
> >> @@ -0,0 +1,40 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +/* Driver for Xilinx network controllers and boards
> >> + * Copyright(C) 2020-2022 Xilinx, Inc.
> >> + * Copyright(C) 2022 Advanced Micro Devices, Inc.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify it
> >> + * under the terms of the GNU General Public License version 2 as published
> >> + * by the Free Software Foundation, incorporated herein by reference.
> >> + */
> >> +#ifndef EFX_EF100_IOVA_H
> >> +#define EFX_EF100_IOVA_H
> >> +
> >> +#include "ef100_nic.h"
> >> +#include "ef100_vdpa.h"
> >> +
> >> +#if defined(CONFIG_SFC_VDPA)
> >> +/**
> >> + * struct ef100_vdpa_iova_node - guest buffer iova entry
> >> + *
> >> + * @node: red black tree node
> >> + * @iova: mapping's IO virtual address
> >> + * @size: length of mapped region in bytes
> >> + * @free_node: free list node
> >> + */
> >> +struct ef100_vdpa_iova_node {
> >> +       struct rb_node node;
> >> +       unsigned long iova;
> >> +       size_t size;
> >> +       struct list_head free_node;
> >> +};
> >> +
> >> +int efx_ef100_insert_iova_node(struct ef100_vdpa_nic *vdpa_nic,
> >> +                              u64 iova, u64 size);
> >> +void efx_ef100_remove_iova_node(struct ef100_vdpa_nic *vdpa_nic,
> >> +                               unsigned long iova);
> >> +void efx_ef100_delete_iova(struct ef100_vdpa_nic *vdpa_nic);
> >> +int efx_ef100_find_new_iova(struct ef100_vdpa_nic *vdpa_nic,
> >> +                           unsigned int buf_len, u64 *new_iova);
> >> +#endif  /* CONFIG_SFC_VDPA */
> >> +#endif /* EFX_EF100_IOVA_H */
> >> diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
> >> index 41811c519275..72820d2fe19d 100644
> >> --- a/drivers/net/ethernet/sfc/ef100_nic.c
> >> +++ b/drivers/net/ethernet/sfc/ef100_nic.c
> >> @@ -33,7 +33,6 @@
> >>
> >>   #define EF100_MAX_VIS 4096
> >>   #define EF100_NUM_MCDI_BUFFERS 1
> >> -#define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX)
> >>
> >>   #define EF100_RESET_PORT ((ETH_RESET_MAC | ETH_RESET_PHY) << ETH_RESET_SHARED_SHIFT)
> >>
> >> diff --git a/drivers/net/ethernet/sfc/ef100_vdpa.c b/drivers/net/ethernet/sfc/ef100_vdpa.c
> >> index 80bca281a748..b9368eb1acd5 100644
> >> --- a/drivers/net/ethernet/sfc/ef100_vdpa.c
> >> +++ b/drivers/net/ethernet/sfc/ef100_vdpa.c
> >> @@ -14,6 +14,7 @@
> >>   #include <uapi/linux/vdpa.h>
> >>   #include "ef100_vdpa.h"
> >>   #include "mcdi_vdpa.h"
> >> +#include "ef100_iova.h"
> >>   #include "mcdi_filters.h"
> >>   #include "mcdi_functions.h"
> >>   #include "ef100_netdev.h"
> >> @@ -280,6 +281,34 @@ static int get_net_config(struct ef100_vdpa_nic *vdpa_nic)
> >>          return 0;
> >>   }
> >>
> >> +static int vdpa_update_domain(struct ef100_vdpa_nic *vdpa_nic)
> >> +{
> >> +       struct vdpa_device *vdpa = &vdpa_nic->vdpa_dev;
> >> +       struct iommu_domain_geometry *geo;
> >> +       struct device *dma_dev;
> >> +
> >> +       dma_dev = vdpa_get_dma_dev(vdpa);
> >> +       if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY))
> >> +               return -EOPNOTSUPP;
> >> +
> >> +       vdpa_nic->domain = iommu_get_domain_for_dev(dma_dev);
> >> +       if (!vdpa_nic->domain)
> >> +               return -ENODEV;
> >> +
> >> +       geo = &vdpa_nic->domain->geometry;
> >> +       /* save the geo aperture range for validation in dma_map */
> >> +       vdpa_nic->geo_aper_start = geo->aperture_start;
> >> +
> >> +       /* Handle the boundary case */
> >> +       if (geo->aperture_end == ~0ULL)
> >> +               geo->aperture_end -= 1;
> >> +       vdpa_nic->geo_aper_end = geo->aperture_end;
> >> +
> >> +       /* insert a sentinel node */
> >> +       return efx_ef100_insert_iova_node(vdpa_nic,
> >> +                                         vdpa_nic->geo_aper_end + 1, 0);
> >> +}
> >> +
> >>   static struct ef100_vdpa_nic *ef100_vdpa_create(struct efx_nic *efx,
> >>                                                  const char *dev_name,
> >>                                                  enum ef100_vdpa_class dev_type,
> >> @@ -316,6 +345,7 @@ static struct ef100_vdpa_nic *ef100_vdpa_create(struct efx_nic *efx,
> >>          }
> >>
> >>          mutex_init(&vdpa_nic->lock);
> >> +       mutex_init(&vdpa_nic->iova_lock);
> >>          dev = &vdpa_nic->vdpa_dev.dev;
> >>          efx->vdpa_nic = vdpa_nic;
> >>          vdpa_nic->vdpa_dev.dma_dev = &efx->pci_dev->dev;
> >> @@ -325,9 +355,11 @@ static struct ef100_vdpa_nic *ef100_vdpa_create(struct efx_nic *efx,
> >>          vdpa_nic->pf_index = nic_data->pf_index;
> >>          vdpa_nic->vf_index = nic_data->vf_index;
> >>          vdpa_nic->vdpa_state = EF100_VDPA_STATE_INITIALIZED;
> >> +       vdpa_nic->iova_root = RB_ROOT;
> >>          vdpa_nic->mac_address = (u8 *)&vdpa_nic->net_config.mac;
> >>          ether_addr_copy(vdpa_nic->mac_address, mac);
> >>          vdpa_nic->mac_configured = true;
> >> +       INIT_LIST_HEAD(&vdpa_nic->free_list);
> >>
> >>          for (i = 0; i < EF100_VDPA_MAC_FILTER_NTYPES; i++)
> >>                  vdpa_nic->filters[i].filter_id = EFX_INVALID_FILTER_ID;
> >> @@ -353,6 +385,12 @@ static struct ef100_vdpa_nic *ef100_vdpa_create(struct efx_nic *efx,
> >>                  goto err_put_device;
> >>          }
> >>
> >> +       rc = vdpa_update_domain(vdpa_nic);
> >> +       if (rc) {
> >> +               pci_err(efx->pci_dev, "update_domain failed, err: %d\n", rc);
> >> +               goto err_put_device;
> >> +       }
> >> +
> >>          rc = get_net_config(vdpa_nic);
> >>          if (rc)
> >>                  goto err_put_device;
> >> diff --git a/drivers/net/ethernet/sfc/ef100_vdpa.h b/drivers/net/ethernet/sfc/ef100_vdpa.h
> >> index 1b0bbba88154..c3c77029973d 100644
> >> --- a/drivers/net/ethernet/sfc/ef100_vdpa.h
> >> +++ b/drivers/net/ethernet/sfc/ef100_vdpa.h
> >> @@ -12,7 +12,9 @@
> >>   #define __EF100_VDPA_H__
> >>
> >>   #include <linux/vdpa.h>
> >> +#include <linux/iommu.h>
> >>   #include <uapi/linux/virtio_net.h>
> >> +#include <linux/rbtree.h>
> >>   #include "net_driver.h"
> >>   #include "ef100_nic.h"
> >>
> >> @@ -155,6 +157,12 @@ struct ef100_vdpa_filter {
> >>    * @mac_configured: true after MAC address is configured
> >>    * @filters: details of all filters created on this vdpa device
> >>    * @cfg_cb: callback for config change
> >> + * @domain: IOMMU domain
> >> + * @iova_root: iova rbtree root
> >> + * @iova_lock: lock to synchronize updates to rbtree and freelist
> >> + * @free_list: list to store free iova areas of size >= MCDI buffer length
> >> + * @geo_aper_start: start of valid IOVA range
> >> + * @geo_aper_end: end of valid IOVA range
> >>    */
> >>   struct ef100_vdpa_nic {
> >>          struct vdpa_device vdpa_dev;
> >> @@ -174,6 +182,13 @@ struct ef100_vdpa_nic {
> >>          bool mac_configured;
> >>          struct ef100_vdpa_filter filters[EF100_VDPA_MAC_FILTER_NTYPES];
> >>          struct vdpa_callback cfg_cb;
> >> +       struct iommu_domain *domain;
> >> +       struct rb_root iova_root;
> >> +       /* mutex to synchronize rbtree operations */
> >> +       struct mutex iova_lock;
> >> +       struct list_head free_list;
> >> +       u64 geo_aper_start;
> >> +       u64 geo_aper_end;
> >>   };
> >>
> >>   int ef100_vdpa_init(struct efx_probe_data *probe_data);
> >> diff --git a/drivers/net/ethernet/sfc/ef100_vdpa_ops.c b/drivers/net/ethernet/sfc/ef100_vdpa_ops.c
> >> index 718b67f6da90..8c198d949fdb 100644
> >> --- a/drivers/net/ethernet/sfc/ef100_vdpa_ops.c
> >> +++ b/drivers/net/ethernet/sfc/ef100_vdpa_ops.c
> >> @@ -10,6 +10,7 @@
> >>
> >>   #include <linux/vdpa.h>
> >>   #include "ef100_vdpa.h"
> >> +#include "ef100_iova.h"
> >>   #include "io.h"
> >>   #include "mcdi_vdpa.h"
> >>
> >> @@ -260,6 +261,7 @@ static void ef100_reset_vdpa_device(struct ef100_vdpa_nic *vdpa_nic)
> >>          if (!vdpa_nic->status)
> >>                  return;
> >>
> >> +       efx_ef100_delete_iova(vdpa_nic);
> >>          vdpa_nic->vdpa_state = EF100_VDPA_STATE_INITIALIZED;
> >>          vdpa_nic->status = 0;
> >>          vdpa_nic->features = 0;
> >> @@ -743,9 +745,12 @@ static void ef100_vdpa_free(struct vdpa_device *vdev)
> >>          int i;
> >>
> >>          if (vdpa_nic) {
> >> +               /* clean-up the mappings and iova tree */
> >> +               efx_ef100_delete_iova(vdpa_nic);
> >>                  for (i = 0; i < (vdpa_nic->max_queue_pairs * 2); i++)
> >>                          reset_vring(vdpa_nic, i);
> >>                  ef100_vdpa_irq_vectors_free(vdpa_nic->efx->pci_dev);
> >> +               mutex_destroy(&vdpa_nic->iova_lock);
> >>                  mutex_destroy(&vdpa_nic->lock);
> >>                  vdpa_nic->efx->vdpa_nic = NULL;
> >>          }
> >> diff --git a/drivers/net/ethernet/sfc/mcdi.h b/drivers/net/ethernet/sfc/mcdi.h
> >> index db4ca4975ada..7d977a58a0df 100644
> >> --- a/drivers/net/ethernet/sfc/mcdi.h
> >> +++ b/drivers/net/ethernet/sfc/mcdi.h
> >> @@ -7,6 +7,7 @@
> >>   #ifndef EFX_MCDI_H
> >>   #define EFX_MCDI_H
> >>
> >> +#include "mcdi_pcol.h"
> >>   /**
> >>    * enum efx_mcdi_state - MCDI request handling state
> >>    * @MCDI_STATE_QUIESCENT: No pending MCDI requests. If the caller holds the
> >> @@ -40,6 +41,8 @@ enum efx_mcdi_mode {
> >>          MCDI_MODE_FAIL,
> >>   };
> >>
> >> +#define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX)
> >> +
> >>   /**
> >>    * struct efx_mcdi_iface - MCDI protocol context
> >>    * @efx: The associated NIC.
> >> --
> >> 2.30.1
> >>
>


^ permalink raw reply

* [PATCH v4] vhost_vdpa: fix the crash in unmap a large memory
From: Cindy Lu @ 2022-12-19  7:33 UTC (permalink / raw)
  To: lulu, jasowang, mst, kvm, linux-kernel, virtualization, netdev; +Cc: stable

While testing in vIOMMU, sometimes Guest will unmap very large memory,
which will cause the crash. To fix this, add a new function
vhost_vdpa_general_unmap(). This function will only unmap the memory
that saved in iotlb.

Call Trace:
[  647.820144] ------------[ cut here ]------------
[  647.820848] kernel BUG at drivers/iommu/intel/iommu.c:1174!
[  647.821486] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  647.822082] CPU: 10 PID: 1181 Comm: qemu-system-x86 Not tainted 6.0.0-rc1home_lulu_2452_lulu7_vhost+ #62
[  647.823139] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qem4
[  647.824365] RIP: 0010:domain_unmap+0x48/0x110
[  647.825424] Code: 48 89 fb 8d 4c f6 1e 39 c1 0f 4f c8 83 e9 0c 83 f9 3f 7f 18 48 89 e8 48 d3 e8 48 85 c0 75 59
[  647.828064] RSP: 0018:ffffae5340c0bbf0 EFLAGS: 00010202
[  647.828973] RAX: 0000000000000001 RBX: ffff921793d10540 RCX: 000000000000001b
[  647.830083] RDX: 00000000080000ff RSI: 0000000000000001 RDI: ffff921793d10540
[  647.831214] RBP: 0000000007fc0100 R08: ffffae5340c0bcd0 R09: 0000000000000003
[  647.832388] R10: 0000007fc0100000 R11: 0000000000100000 R12: 00000000080000ff
[  647.833668] R13: ffffae5340c0bcd0 R14: ffff921793d10590 R15: 0000008000100000
[  647.834782] FS:  00007f772ec90640(0000) GS:ffff921ce7a80000(0000) knlGS:0000000000000000
[  647.836004] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  647.836990] CR2: 00007f02c27a3a20 CR3: 0000000101b0c006 CR4: 0000000000372ee0
[  647.838107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  647.839283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  647.840666] Call Trace:
[  647.841437]  <TASK>
[  647.842107]  intel_iommu_unmap_pages+0x93/0x140
[  647.843112]  __iommu_unmap+0x91/0x1b0
[  647.844003]  iommu_unmap+0x6a/0x95
[  647.844885]  vhost_vdpa_unmap+0x1de/0x1f0 [vhost_vdpa]
[  647.845985]  vhost_vdpa_process_iotlb_msg+0xf0/0x90b [vhost_vdpa]
[  647.847235]  ? _raw_spin_unlock+0x15/0x30
[  647.848181]  ? _copy_from_iter+0x8c/0x580
[  647.849137]  vhost_chr_write_iter+0xb3/0x430 [vhost]
[  647.850126]  vfs_write+0x1e4/0x3a0
[  647.850897]  ksys_write+0x53/0xd0
[  647.851688]  do_syscall_64+0x3a/0x90
[  647.852508]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  647.853457] RIP: 0033:0x7f7734ef9f4f
[  647.854408] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 76 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c8
[  647.857217] RSP: 002b:00007f772ec8f040 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  647.858486] RAX: ffffffffffffffda RBX: 00000000fef00000 RCX: 00007f7734ef9f4f
[  647.859713] RDX: 0000000000000048 RSI: 00007f772ec8f090 RDI: 0000000000000010
[  647.860942] RBP: 00007f772ec8f1a0 R08: 0000000000000000 R09: 0000000000000000
[  647.862206] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000010
[  647.863446] R13: 0000000000000002 R14: 0000000000000000 R15: ffffffff01100000
[  647.864692]  </TASK>
[  647.865458] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs v]
[  647.874688] ---[ end trace 0000000000000000 ]---

Cc: stable@vger.kernel.org
Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend")
Signed-off-by: Cindy Lu <lulu@redhat.com>
---
 drivers/vhost/vdpa.c | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 166044642fd5..f8da6a3f98d2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -683,10 +683,20 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 	mutex_unlock(&d->mutex);
 	return r;
 }
+static void vhost_vdpa_general_unmap(struct vhost_vdpa *v,
+				     struct vhost_iotlb_map *map, u32 asid)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+	if (ops->dma_map) {
+		ops->dma_unmap(vdpa, asid, map->start, map->size);
+	} else if (ops->set_map == NULL) {
+		iommu_unmap(v->domain, map->start, map->size);
+	}
+}
 
-static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v,
-				struct vhost_iotlb *iotlb,
-				u64 start, u64 last)
+static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
+				u64 start, u64 last, u32 asid)
 {
 	struct vhost_dev *dev = &v->vdev;
 	struct vhost_iotlb_map *map;
@@ -703,13 +713,13 @@ static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v,
 			unpin_user_page(page);
 		}
 		atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm);
+		vhost_vdpa_general_unmap(v, map, asid);
 		vhost_iotlb_map_free(iotlb, map);
 	}
 }
 
-static void vhost_vdpa_va_unmap(struct vhost_vdpa *v,
-				struct vhost_iotlb *iotlb,
-				u64 start, u64 last)
+static void vhost_vdpa_va_unmap(struct vhost_vdpa *v, struct vhost_iotlb *iotlb,
+				u64 start, u64 last, u32 asid)
 {
 	struct vhost_iotlb_map *map;
 	struct vdpa_map_file *map_file;
@@ -718,20 +728,21 @@ static void vhost_vdpa_va_unmap(struct vhost_vdpa *v,
 		map_file = (struct vdpa_map_file *)map->opaque;
 		fput(map_file->file);
 		kfree(map_file);
+		vhost_vdpa_general_unmap(v, map, asid);
 		vhost_iotlb_map_free(iotlb, map);
 	}
 }
 
 static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v,
-				   struct vhost_iotlb *iotlb,
-				   u64 start, u64 last)
+				   struct vhost_iotlb *iotlb, u64 start,
+				   u64 last, u32 asid)
 {
 	struct vdpa_device *vdpa = v->vdpa;
 
 	if (vdpa->use_va)
-		return vhost_vdpa_va_unmap(v, iotlb, start, last);
+		return vhost_vdpa_va_unmap(v, iotlb, start, last, asid);
 
-	return vhost_vdpa_pa_unmap(v, iotlb, start, last);
+	return vhost_vdpa_pa_unmap(v, iotlb, start, last, asid);
 }
 
 static int perm_to_iommu_flags(u32 perm)
@@ -798,17 +809,12 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v,
 	const struct vdpa_config_ops *ops = vdpa->config;
 	u32 asid = iotlb_to_asid(iotlb);
 
-	vhost_vdpa_iotlb_unmap(v, iotlb, iova, iova + size - 1);
+	vhost_vdpa_iotlb_unmap(v, iotlb, iova, iova + size - 1, asid);
 
-	if (ops->dma_map) {
-		ops->dma_unmap(vdpa, asid, iova, size);
-	} else if (ops->set_map) {
+	if (ops->set_map) {
 		if (!v->in_batch)
 			ops->set_map(vdpa, asid, iotlb);
-	} else {
-		iommu_unmap(v->domain, iova, size);
 	}
-
 	/* If we are in the middle of batch processing, delay the free
 	 * of AS until BATCH_END.
 	 */
-- 
2.34.3


^ permalink raw reply related

* Re: [PATCH] vhost_net: revert upend_idx only on retriable error
From: Michael S. Tsirkin @ 2022-12-19  7:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Andrey Smetanin, kvm, virtualization, netdev, linux-kernel,
	yc-core
In-Reply-To: <CACGkMEs3gdcQ5_PkYmz2eV-kFodZnnPPhvyRCyLXBYYdfHtNjw@mail.gmail.com>

On Thu, Dec 01, 2022 at 05:01:58PM +0800, Jason Wang wrote:
> On Wed, Nov 23, 2022 at 6:24 PM Andrey Smetanin
> <asmetanin@yandex-team.ru> wrote:
> >
> > Fix possible virtqueue used buffers leak and corresponding stuck
> > in case of temporary -EIO from sendmsg() which is produced by
> > tun driver while backend device is not up.
> >
> > In case of no-retriable error and zcopy do not revert upend_idx
> > to pass packet data (that is update used_idx in corresponding
> > vhost_zerocopy_signal_used()) as if packet data has been
> > transferred successfully.
> 
> Should we mark head.len as VHOST_DMA_DONE_LEN in this case?
> 
> Thanks

Any response here?


> >
> > Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
> > ---
> >  drivers/vhost/net.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 20265393aee7..93e9166039b9 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -934,13 +934,16 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> >
> >                 err = sock->ops->sendmsg(sock, &msg, len);
> >                 if (unlikely(err < 0)) {
> > +                       bool retry = err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS;
> > +
> >                         if (zcopy_used) {
> >                                 if (vq->heads[ubuf->desc].len == VHOST_DMA_IN_PROGRESS)
> >                                         vhost_net_ubuf_put(ubufs);
> > -                               nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> > -                                       % UIO_MAXIOV;
> > +                               if (retry)
> > +                                       nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
> > +                                               % UIO_MAXIOV;
> >                         }
> > -                       if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
> > +                       if (retry) {
> >                                 vhost_discard_vq_desc(vq, 1);
> >                                 vhost_net_enable_vq(net, vq);
> >                                 break;
> > --
> > 2.25.1
> >


^ permalink raw reply

* Re: [PATCH] usbnet: jump to rx_cleanup case instead of calling skb_queue_tail
From: Leesoo Ahn @ 2022-12-19  7:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Oliver Neukum, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-usb, linux-kernel
In-Reply-To: <Y57VkLKetDsbUUjC@kroah.com>


On 22. 12. 18. 17:55, Greg KH wrote:
> On Sun, Dec 18, 2022 at 01:18:51AM +0900, Leesoo Ahn wrote:
>> The current source pushes skb into dev->done queue by calling
>> skb_queue_tail() and then, call skb_dequeue() to pop for rx_cleanup state
>> to free urb and skb next in usbnet_bh().
>> It wastes CPU resource with extra instructions. Instead, use return values
>> jumping to rx_cleanup case directly to free them. Therefore calling
>> skb_queue_tail() and skb_dequeue() is not necessary.
>>
>> The follows are just showing difference between calling skb_queue_tail()
>> and using return values jumping to rx_cleanup state directly in usbnet_bh()
>> in Arm64 instructions with perf tool.
>>
>> ----------- calling skb_queue_tail() -----------
>>         │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
>>    7.58 │248:   ldr     x0, [x20, #16]
>>    2.46 │24c:   ldr     w0, [x0, #8]
>>    1.64 │250: ↑ tbnz    w0, #14, 16c
>>         │     dev->net->stats.rx_errors++;
>>    0.57 │254:   ldr     x1, [x20, #184]
>>    1.64 │258:   ldr     x0, [x1, #336]
>>    2.65 │25c:   add     x0, x0, #0x1
>>         │260:   str     x0, [x1, #336]
>>         │     skb_queue_tail(&dev->done, skb);
>>    0.38 │264:   mov     x1, x19
>>         │268:   mov     x0, x21
>>    2.27 │26c: → bl      skb_queue_tail
>>    0.57 │270: ↑ b       44    // branch to call skb_dequeue()
>>
>> ----------- jumping to rx_cleanup state -----------
>>         │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
>>    1.69 │25c:   ldr     x0, [x21, #16]
>>    4.78 │260:   ldr     w0, [x0, #8]
>>    3.28 │264: ↑ tbnz    w0, #14, e4    // jump to 'rx_cleanup' state
>>         │     dev->net->stats.rx_errors++;
>>    0.09 │268:   ldr     x1, [x21, #184]
>>    2.72 │26c:   ldr     x0, [x1, #336]
>>    3.37 │270:   add     x0, x0, #0x1
>>    0.09 │274:   str     x0, [x1, #336]
>>    0.66 │278: ↑ b       e4    // branch to 'rx_cleanup' state
> Interesting, but does this even really matter given the slow speed of
> the USB hardware?

It doesn't if USB hardware has slow speed but in software view, it's 
still worth avoiding calling skb_queue_tail() and skb_dequeue() which 
work with spinlock, if possible.


>> Signed-off-by: Leesoo Ahn <lsahn@ooseel.net>
>> ---
>>   drivers/net/usb/usbnet.c | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index 64a9a80b2309..924392a37297 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -555,7 +555,7 @@ static int rx_submit (struct usbnet *dev, struct urb *urb, gfp_t flags)
>>   
>>   /*-------------------------------------------------------------------------*/
>>   
>> -static inline void rx_process (struct usbnet *dev, struct sk_buff *skb)
>> +static inline int rx_process(struct usbnet *dev, struct sk_buff *skb)
>>   {
>>   	if (dev->driver_info->rx_fixup &&
>>   	    !dev->driver_info->rx_fixup (dev, skb)) {
>> @@ -576,11 +576,11 @@ static inline void rx_process (struct usbnet *dev, struct sk_buff *skb)
>>   		netif_dbg(dev, rx_err, dev->net, "rx length %d\n", skb->len);
>>   	} else {
>>   		usbnet_skb_return(dev, skb);
>> -		return;
>> +		return 0;
>>   	}
>>   
>>   done:
>> -	skb_queue_tail(&dev->done, skb);
>> +	return -1;
> Don't make up error numbers, this makes it look like this failed, not
> succeeded.  And if this failed, give it a real error value.
>
> thanks,
>
> greg k-h

Best regards,
Leesoo


^ permalink raw reply

* Re: [PATCH] usbnet: jump to rx_cleanup case instead of calling skb_queue_tail
From: Greg KH @ 2022-12-19  7:50 UTC (permalink / raw)
  To: Leesoo Ahn
  Cc: Oliver Neukum, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-usb, linux-kernel
In-Reply-To: <a2e0e98a-1044-908a-15bc-b165ff8b23ea@ooseel.net>

On Mon, Dec 19, 2022 at 04:41:16PM +0900, Leesoo Ahn wrote:
> 
> On 22. 12. 18. 17:55, Greg KH wrote:
> > On Sun, Dec 18, 2022 at 01:18:51AM +0900, Leesoo Ahn wrote:
> > > The current source pushes skb into dev->done queue by calling
> > > skb_queue_tail() and then, call skb_dequeue() to pop for rx_cleanup state
> > > to free urb and skb next in usbnet_bh().
> > > It wastes CPU resource with extra instructions. Instead, use return values
> > > jumping to rx_cleanup case directly to free them. Therefore calling
> > > skb_queue_tail() and skb_dequeue() is not necessary.
> > > 
> > > The follows are just showing difference between calling skb_queue_tail()
> > > and using return values jumping to rx_cleanup state directly in usbnet_bh()
> > > in Arm64 instructions with perf tool.
> > > 
> > > ----------- calling skb_queue_tail() -----------
> > >         │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
> > >    7.58 │248:   ldr     x0, [x20, #16]
> > >    2.46 │24c:   ldr     w0, [x0, #8]
> > >    1.64 │250: ↑ tbnz    w0, #14, 16c
> > >         │     dev->net->stats.rx_errors++;
> > >    0.57 │254:   ldr     x1, [x20, #184]
> > >    1.64 │258:   ldr     x0, [x1, #336]
> > >    2.65 │25c:   add     x0, x0, #0x1
> > >         │260:   str     x0, [x1, #336]
> > >         │     skb_queue_tail(&dev->done, skb);
> > >    0.38 │264:   mov     x1, x19
> > >         │268:   mov     x0, x21
> > >    2.27 │26c: → bl      skb_queue_tail
> > >    0.57 │270: ↑ b       44    // branch to call skb_dequeue()
> > > 
> > > ----------- jumping to rx_cleanup state -----------
> > >         │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
> > >    1.69 │25c:   ldr     x0, [x21, #16]
> > >    4.78 │260:   ldr     w0, [x0, #8]
> > >    3.28 │264: ↑ tbnz    w0, #14, e4    // jump to 'rx_cleanup' state
> > >         │     dev->net->stats.rx_errors++;
> > >    0.09 │268:   ldr     x1, [x21, #184]
> > >    2.72 │26c:   ldr     x0, [x1, #336]
> > >    3.37 │270:   add     x0, x0, #0x1
> > >    0.09 │274:   str     x0, [x1, #336]
> > >    0.66 │278: ↑ b       e4    // branch to 'rx_cleanup' state
> > Interesting, but does this even really matter given the slow speed of
> > the USB hardware?
> 
> It doesn't if USB hardware has slow speed but in software view, it's still
> worth avoiding calling skb_queue_tail() and skb_dequeue() which work with
> spinlock, if possible.

But can you actually measure that in either CPU load or in increased
transfer speeds?

thanks,

greg k-h

^ permalink raw reply

* [syzbot] KASAN: use-after-free Read in put_pmu_ctx
From: syzbot @ 2022-12-19  8:04 UTC (permalink / raw)
  To: acme, alexander.shishkin, bpf, jolsa, linux-kernel,
	linux-perf-users, mark.rutland, mingo, namhyung, netdev, peterz,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    13e3c7793e2f Merge tag 'for-netdev' of https://git.kernel...
git tree:       bpf
console+strace: https://syzkaller.appspot.com/x/log.txt?x=177df7e0480000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b0e91ad4b5f69c47
dashboard link: https://syzkaller.appspot.com/bug?extid=b8e8c01c8ade4fe6e48f
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15e87100480000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16ceeb13880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/373a99daa295/disk-13e3c779.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7fa71ed0fe17/vmlinux-13e3c779.xz
kernel image: https://storage.googleapis.com/syzbot-assets/2842ad5c698b/bzImage-13e3c779.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b8e8c01c8ade4fe6e48f@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __lock_acquire+0x3ee7/0x56d0 kernel/locking/lockdep.c:4925
Read of size 8 at addr ffff8880237d6018 by task syz-executor287/8300

CPU: 0 PID: 8300 Comm: syz-executor287 Not tainted 6.1.0-syzkaller-09661-g13e3c7793e2f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:284 [inline]
 print_report+0x15e/0x45d mm/kasan/report.c:395
 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495
 __lock_acquire+0x3ee7/0x56d0 kernel/locking/lockdep.c:4925
 lock_acquire kernel/locking/lockdep.c:5668 [inline]
 lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
 put_pmu_ctx kernel/events/core.c:4913 [inline]
 put_pmu_ctx+0xad/0x390 kernel/events/core.c:4893
 _free_event+0x3c5/0x13d0 kernel/events/core.c:5196
 free_event+0x58/0xc0 kernel/events/core.c:5224
 __do_sys_perf_event_open+0x66d/0x2980 kernel/events/core.c:12701
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3a2b1b3f29
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff4215df68 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f3a2b1b3f29
RDX: 0000000000000000 RSI: 0000000000002070 RDI: 0000000020000480
RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000001
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000012a59
R13: 00007fff4215df7c R14: 00007fff4215df90 R15: 00007fff4215df80
 </TASK>

Allocated by task 8300:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 ____kasan_kmalloc mm/kasan/common.c:371 [inline]
 ____kasan_kmalloc mm/kasan/common.c:330 [inline]
 __kasan_kmalloc+0xa5/0xb0 mm/kasan/common.c:380
 kmalloc include/linux/slab.h:580 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 alloc_perf_context kernel/events/core.c:4693 [inline]
 find_get_context+0xcc/0x810 kernel/events/core.c:4763
 __do_sys_perf_event_open+0x963/0x2980 kernel/events/core.c:12476
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 5310:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:518
 ____kasan_slab_free mm/kasan/common.c:236 [inline]
 ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
 kasan_slab_free include/linux/kasan.h:177 [inline]
 slab_free_hook mm/slub.c:1781 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807
 slab_free mm/slub.c:3787 [inline]
 __kmem_cache_free+0xaf/0x3b0 mm/slub.c:3800
 rcu_do_batch kernel/rcu/tree.c:2244 [inline]
 rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2504
 __do_softirq+0x1fb/0xadc kernel/softirq.c:571

Last potentially related work creation:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:488
 __call_rcu_common.constprop.0+0x99/0x820 kernel/rcu/tree.c:2753
 put_ctx kernel/events/core.c:1180 [inline]
 put_ctx+0x116/0x1e0 kernel/events/core.c:1173
 perf_event_exit_task_context kernel/events/core.c:13046 [inline]
 perf_event_exit_task+0x556/0x760 kernel/events/core.c:13073
 do_exit+0xb4d/0x2a30 kernel/exit.c:829
 __do_sys_exit kernel/exit.c:917 [inline]
 __se_sys_exit kernel/exit.c:915 [inline]
 __x64_sys_exit+0x42/0x50 kernel/exit.c:915
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff8880237d6000
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 24 bytes inside of
 512-byte region [ffff8880237d6000, ffff8880237d6200)

The buggy address belongs to the physical page:
page:ffffea00008df500 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x237d4
head:ffffea00008df500 order:2 compound_mapcount:0 compound_pincount:0
anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441c80 0000000000000000 dead000000000001
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2173, tgid 2173 (kworker/u4:3), ts 10211275854, free_ts 0
 prep_new_page mm/page_alloc.c:2539 [inline]
 get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4291
 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5558
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285
 alloc_slab_page mm/slub.c:1851 [inline]
 allocate_slab+0x25f/0x350 mm/slub.c:1998
 new_slab mm/slub.c:2051 [inline]
 ___slab_alloc+0xa91/0x1400 mm/slub.c:3193
 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
 __slab_alloc_node mm/slub.c:3345 [inline]
 slab_alloc_node mm/slub.c:3442 [inline]
 __kmem_cache_alloc_node+0x1a4/0x430 mm/slub.c:3491
 kmalloc_trace+0x26/0x60 mm/slab_common.c:1062
 kmalloc include/linux/slab.h:580 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 alloc_bprm+0x51/0x900 fs/exec.c:1510
 kernel_execve+0xaf/0x500 fs/exec.c:1981
 call_usermodehelper_exec_async+0x2e7/0x580 kernel/umh.c:113
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
page_owner free stack trace missing

Memory state around the buggy address:
 ffff8880237d5f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8880237d5f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880237d6000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff8880237d6080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880237d6100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH] usbnet: jump to rx_cleanup case instead of calling skb_queue_tail
From: Leesoo Ahn @ 2022-12-19  8:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Oliver Neukum, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-usb, linux-kernel
In-Reply-To: <Y6AXqOlCUy7mahgj@kroah.com>


On 22. 12. 19. 16:50, Greg KH wrote:
> On Mon, Dec 19, 2022 at 04:41:16PM +0900, Leesoo Ahn wrote:
>> On 22. 12. 18. 17:55, Greg KH wrote:
>>> On Sun, Dec 18, 2022 at 01:18:51AM +0900, Leesoo Ahn wrote:
>>>> The current source pushes skb into dev->done queue by calling
>>>> skb_queue_tail() and then, call skb_dequeue() to pop for rx_cleanup state
>>>> to free urb and skb next in usbnet_bh().
>>>> It wastes CPU resource with extra instructions. Instead, use return values
>>>> jumping to rx_cleanup case directly to free them. Therefore calling
>>>> skb_queue_tail() and skb_dequeue() is not necessary.
>>>>
>>>> The follows are just showing difference between calling skb_queue_tail()
>>>> and using return values jumping to rx_cleanup state directly in usbnet_bh()
>>>> in Arm64 instructions with perf tool.
>>>>
>>>> ----------- calling skb_queue_tail() -----------
>>>>          │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
>>>>     7.58 │248:   ldr     x0, [x20, #16]
>>>>     2.46 │24c:   ldr     w0, [x0, #8]
>>>>     1.64 │250: ↑ tbnz    w0, #14, 16c
>>>>          │     dev->net->stats.rx_errors++;
>>>>     0.57 │254:   ldr     x1, [x20, #184]
>>>>     1.64 │258:   ldr     x0, [x1, #336]
>>>>     2.65 │25c:   add     x0, x0, #0x1
>>>>          │260:   str     x0, [x1, #336]
>>>>          │     skb_queue_tail(&dev->done, skb);
>>>>     0.38 │264:   mov     x1, x19
>>>>          │268:   mov     x0, x21
>>>>     2.27 │26c: → bl      skb_queue_tail
>>>>     0.57 │270: ↑ b       44    // branch to call skb_dequeue()
>>>>
>>>> ----------- jumping to rx_cleanup state -----------
>>>>          │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
>>>>     1.69 │25c:   ldr     x0, [x21, #16]
>>>>     4.78 │260:   ldr     w0, [x0, #8]
>>>>     3.28 │264: ↑ tbnz    w0, #14, e4    // jump to 'rx_cleanup' state
>>>>          │     dev->net->stats.rx_errors++;
>>>>     0.09 │268:   ldr     x1, [x21, #184]
>>>>     2.72 │26c:   ldr     x0, [x1, #336]
>>>>     3.37 │270:   add     x0, x0, #0x1
>>>>     0.09 │274:   str     x0, [x1, #336]
>>>>     0.66 │278: ↑ b       e4    // branch to 'rx_cleanup' state
>>> Interesting, but does this even really matter given the slow speed of
>>> the USB hardware?
>> It doesn't if USB hardware has slow speed but in software view, it's still
>> worth avoiding calling skb_queue_tail() and skb_dequeue() which work with
>> spinlock, if possible.
> But can you actually measure that in either CPU load or in increased
> transfer speeds?
>
> thanks,
>
> greg k-h

I think the follows are maybe what you would be interested in. I have 
tested both case with perf on the same machine and environments, also 
modified driver code a bit to go to rx_cleanup case, not to net stack in 
a specific packet.

----- calling skb_queue_tail() -----
-   11.58%     0.26%  swapper          [k] usbnet_bh
    - 11.32% usbnet_bh
       - 6.43% skb_dequeue
            6.34% _raw_spin_unlock_irqrestore
       - 2.21% skb_queue_tail
            2.19% _raw_spin_unlock_irqrestore
       - 1.68% consume_skb
          - 0.97% kfree_skbmem
               0.80% kmem_cache_free
            0.53% skb_release_data

----- jump to rx_cleanup directly -----
-    7.62%     0.18%  swapper          [k] usbnet_bh
    - 7.44% usbnet_bh
       - 4.63% skb_dequeue
            4.57% _raw_spin_unlock_irqrestore
       - 1.76% consume_skb
          - 1.03% kfree_skbmem
               0.86% kmem_cache_free
            0.56% skb_release_data
         0.54% smsc95xx_rx_fixup

The first case takes CPU resource a bit much by the result.

Thank you for reviewing, by the way.

Best regards,
Leesoo




^ permalink raw reply

* [PATCH net v2] net: microchip: vcap: Fix initialization of value and mask
From: Horatiu Vultur @ 2022-12-19  8:22 UTC (permalink / raw)
  To: linux-arm-kernel, netdev, linux-kernel
  Cc: davem, edumazet, kuba, pabeni, lars.povlsen, Steen.Hegelund,
	daniel.machon, UNGLinuxDriver, Horatiu Vultur, kernel test robot,
	Dan Carpenter, Saeed Mahameed

Fix the following smatch warning:

smatch warnings:
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:103 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'value'.
drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:106 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'mask'.

In case the vcap field was VCAP_FIELD_U128 and the key was different
than IP6_S/DIP then the value and mask were not initialized, therefore
initialize them.

Fixes: 610c32b2ce66 ("net: microchip: vcap: Add vcap_get_rule")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
v1->v2:
- rebase on net
- both the mask and value were assigned to data->u128.value, which is
  wrong, fix this.
---
 drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
index 895bfff550d23..e0b206247f2eb 100644
--- a/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
+++ b/drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c
@@ -83,6 +83,8 @@ static void vcap_debugfs_show_rule_keyfield(struct vcap_control *vctrl,
 		hex = true;
 		break;
 	case VCAP_FIELD_U128:
+		value = data->u128.value;
+		mask = data->u128.mask;
 		if (key == VCAP_KF_L3_IP6_SIP || key == VCAP_KF_L3_IP6_DIP) {
 			u8 nvalue[16], nmask[16];
 
-- 
2.38.0


^ permalink raw reply related

* Re: [Patch net] net_sched: reject TCF_EM_SIMPLE case for complex ematch module
From: Paolo Abeni @ 2022-12-19  8:16 UTC (permalink / raw)
  To: Cong Wang, netdev
  Cc: Cong Wang, syzbot+4caeae4c7103813598ae, Jun Nie, Jamal Hadi Salim
In-Reply-To: <20221217221707.46010-1-xiyou.wangcong@gmail.com>

Hello,

On Sat, 2022-12-17 at 14:17 -0800, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> When TCF_EM_SIMPLE was introduced, it is supposed to be convenient
> for ematch implementation:
> 
> https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/
> 
> "You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
> set will simply result in allocating & copy. It's an optimization,
> nothing more."
> 
> So if an ematch module provides ops->datalen that means it wants a
> complex data structure (saved in its em->data) instead of a simple u32
> value. We should simply reject such a combination, otherwise this u32
> could be misinterpreted as a pointer.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com
> Reported-by: Jun Nie <jun.nie@linaro.org>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  net/sched/ematch.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sched/ematch.c b/net/sched/ematch.c
> index 4ce681361851..5c1235e6076a 100644
> --- a/net/sched/ematch.c
> +++ b/net/sched/ematch.c
> @@ -255,6 +255,8 @@ static int tcf_em_validate(struct tcf_proto *tp,
>  			 * the value carried.
>  			 */
>  			if (em_hdr->flags & TCF_EM_SIMPLE) {
> +				if (em->ops->datalen > 0)
> +					goto errout;
>  				if (data_len < sizeof(u32))
>  					goto errout;
>  				em->data = *(u32 *) data;


The patch LGTM, thanks!

Acked-by: Paolo Abeni <pabeni@redhat.com>

If I read correctly, this effectively rejects any ematch with
TCF_EM_SIMPLE set (all the existing tcf_ematch_ops structs have eiter
.change or . datalen > 0).

It looks like EM_SIMPLE does not work as intended since a lot of time
(possibly since its introduction !?!). Can we drop it completely?

Cheers,

Paolo


^ permalink raw reply

* Re: [PATCH RFC 1/2] i2c: add fwnode APIs
From: Russell King (Oracle) @ 2022-12-19  8:47 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: linux-acpi, linux-i2c, netdev, Andrew Lunn, David S. Miller,
	Eric Dumazet, Heiner Kallweit, Jakub Kicinski, Paolo Abeni,
	Wolfram Sang
In-Reply-To: <Y5G5ZyO1XRgjfN90@shell.armlinux.org.uk>

Hi Mika,

On Thu, Dec 08, 2022 at 10:16:07AM +0000, Russell King (Oracle) wrote:
> Hi Mika,
> 
> On Thu, Dec 08, 2022 at 12:04:02PM +0200, Mika Westerberg wrote:
> > Hi,
> > 
> > On Wed, Dec 07, 2022 at 11:22:24AM +0000, Russell King (Oracle) wrote:
> > > +EXPORT_SYMBOL(i2c_find_device_by_fwnode);
> > > +
> > 
> > Drop this empty line.
> 
> The additional empty line was there before, and I guess is something the
> I2C maintainer wants to logically separate the i2c device stuff from
> the rest of the file.
> 
> > > +/* must call put_device() when done with returned i2c_client device */
> > > +struct i2c_client *i2c_find_device_by_fwnode(struct fwnode_handle *fwnode);
> > 
> > With the kernel-docs in place you probably can drop these comments.
> 
> It's what is there against the other prototypes - and is very easy to
> get wrong, as I've recently noticed in the sfp.c code as a result of
> creating this series.
> 
> I find the whole _find_ vs _get_ thing a tad confusing, and there
> probably should be just one interface with one way of putting
> afterwards to avoid subtle long-standing bugs like this.
> 
> Thanks.

Do you have any comments on my reply please?

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH] usbnet: jump to rx_cleanup case instead of calling skb_queue_tail
From: Greg KH @ 2022-12-19  8:55 UTC (permalink / raw)
  To: Leesoo Ahn
  Cc: Oliver Neukum, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-usb, linux-kernel
In-Reply-To: <403f3ea8-eeec-2a78-640e-c11c3fe28f45@ooseel.net>

On Mon, Dec 19, 2022 at 05:09:21PM +0900, Leesoo Ahn wrote:
> 
> On 22. 12. 19. 16:50, Greg KH wrote:
> > On Mon, Dec 19, 2022 at 04:41:16PM +0900, Leesoo Ahn wrote:
> > > On 22. 12. 18. 17:55, Greg KH wrote:
> > > > On Sun, Dec 18, 2022 at 01:18:51AM +0900, Leesoo Ahn wrote:
> > > > > The current source pushes skb into dev->done queue by calling
> > > > > skb_queue_tail() and then, call skb_dequeue() to pop for rx_cleanup state
> > > > > to free urb and skb next in usbnet_bh().
> > > > > It wastes CPU resource with extra instructions. Instead, use return values
> > > > > jumping to rx_cleanup case directly to free them. Therefore calling
> > > > > skb_queue_tail() and skb_dequeue() is not necessary.
> > > > > 
> > > > > The follows are just showing difference between calling skb_queue_tail()
> > > > > and using return values jumping to rx_cleanup state directly in usbnet_bh()
> > > > > in Arm64 instructions with perf tool.
> > > > > 
> > > > > ----------- calling skb_queue_tail() -----------
> > > > >          │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
> > > > >     7.58 │248:   ldr     x0, [x20, #16]
> > > > >     2.46 │24c:   ldr     w0, [x0, #8]
> > > > >     1.64 │250: ↑ tbnz    w0, #14, 16c
> > > > >          │     dev->net->stats.rx_errors++;
> > > > >     0.57 │254:   ldr     x1, [x20, #184]
> > > > >     1.64 │258:   ldr     x0, [x1, #336]
> > > > >     2.65 │25c:   add     x0, x0, #0x1
> > > > >          │260:   str     x0, [x1, #336]
> > > > >          │     skb_queue_tail(&dev->done, skb);
> > > > >     0.38 │264:   mov     x1, x19
> > > > >          │268:   mov     x0, x21
> > > > >     2.27 │26c: → bl      skb_queue_tail
> > > > >     0.57 │270: ↑ b       44    // branch to call skb_dequeue()
> > > > > 
> > > > > ----------- jumping to rx_cleanup state -----------
> > > > >          │     if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
> > > > >     1.69 │25c:   ldr     x0, [x21, #16]
> > > > >     4.78 │260:   ldr     w0, [x0, #8]
> > > > >     3.28 │264: ↑ tbnz    w0, #14, e4    // jump to 'rx_cleanup' state
> > > > >          │     dev->net->stats.rx_errors++;
> > > > >     0.09 │268:   ldr     x1, [x21, #184]
> > > > >     2.72 │26c:   ldr     x0, [x1, #336]
> > > > >     3.37 │270:   add     x0, x0, #0x1
> > > > >     0.09 │274:   str     x0, [x1, #336]
> > > > >     0.66 │278: ↑ b       e4    // branch to 'rx_cleanup' state
> > > > Interesting, but does this even really matter given the slow speed of
> > > > the USB hardware?
> > > It doesn't if USB hardware has slow speed but in software view, it's still
> > > worth avoiding calling skb_queue_tail() and skb_dequeue() which work with
> > > spinlock, if possible.
> > But can you actually measure that in either CPU load or in increased
> > transfer speeds?
> > 
> > thanks,
> > 
> > greg k-h
> 
> I think the follows are maybe what you would be interested in. I have tested
> both case with perf on the same machine and environments, also modified
> driver code a bit to go to rx_cleanup case, not to net stack in a specific
> packet.
> 
> ----- calling skb_queue_tail() -----
> -   11.58%     0.26%  swapper          [k] usbnet_bh
>    - 11.32% usbnet_bh
>       - 6.43% skb_dequeue
>            6.34% _raw_spin_unlock_irqrestore
>       - 2.21% skb_queue_tail
>            2.19% _raw_spin_unlock_irqrestore
>       - 1.68% consume_skb
>          - 0.97% kfree_skbmem
>               0.80% kmem_cache_free
>            0.53% skb_release_data
> 
> ----- jump to rx_cleanup directly -----
> -    7.62%     0.18%  swapper          [k] usbnet_bh
>    - 7.44% usbnet_bh
>       - 4.63% skb_dequeue
>            4.57% _raw_spin_unlock_irqrestore
>       - 1.76% consume_skb
>          - 1.03% kfree_skbmem
>               0.86% kmem_cache_free
>            0.56% skb_release_data
>         0.54% smsc95xx_rx_fixup
> 
> The first case takes CPU resource a bit much by the result.

Ok, great!  Fix up the patch based on the review comments and add this
information to the changelog as well.

thanks,

greg k-h

^ permalink raw reply

* [syzbot] KASAN: use-after-free Read in aa_label_sk_perm
From: syzbot @ 2022-12-19  9:03 UTC (permalink / raw)
  To: apparmor, jmorris, john.johansen, linux-kernel,
	linux-security-module, netdev, paul, serge, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    7e68dd7d07a2 Merge tag 'net-next-6.2' of git://git.kernel...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12d5b793880000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b0e91ad4b5f69c47
dashboard link: https://syzkaller.appspot.com/bug?extid=276237fb1679fa55a29b
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/3e9423c78657/disk-7e68dd7d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/3844318fc016/vmlinux-7e68dd7d.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b47aaab121f4/bzImage-7e68dd7d.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+276237fb1679fa55a29b@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in aa_label_sk_perm+0x4ec/0x530 security/apparmor/net.c:148
Read of size 8 at addr ffff88804a765480 by task syz-executor.5/12994

CPU: 0 PID: 12994 Comm: syz-executor.5 Not tainted 6.1.0-syzkaller-07445-g7e68dd7d07a2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:284 [inline]
 print_report+0x15e/0x45d mm/kasan/report.c:395
 kasan_report+0xbf/0x1f0 mm/kasan/report.c:495
 aa_label_sk_perm+0x4ec/0x530 security/apparmor/net.c:148
 aa_sk_perm+0x1e9/0xab0 security/apparmor/net.c:175
 security_socket_recvmsg+0x60/0xc0 security/security.c:2311
 sock_recvmsg net/socket.c:1011 [inline]
 ____sys_recvmsg+0x23b/0x610 net/socket.c:2695
 ___sys_recvmsg+0xf2/0x180 net/socket.c:2737
 do_recvmmsg+0x25e/0x6e0 net/socket.c:2831
 __sys_recvmmsg net/socket.c:2910 [inline]
 __do_sys_recvmmsg net/socket.c:2933 [inline]
 __se_sys_recvmmsg net/socket.c:2926 [inline]
 __x64_sys_recvmmsg+0x20f/0x260 net/socket.c:2926
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f26e788c0d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f26e85c3168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007f26e79ac2c0 RCX: 00007f26e788c0d9
RDX: 00000000000005dd RSI: 0000000020000540 RDI: 0000000000000004
RBP: 00007f26e78e7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000040012062 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc7c633f2f R14: 00007f26e85c3300 R15: 0000000000022000
 </TASK>

Allocated by task 12989:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 ____kasan_kmalloc mm/kasan/common.c:371 [inline]
 ____kasan_kmalloc mm/kasan/common.c:330 [inline]
 __kasan_kmalloc+0xa5/0xb0 mm/kasan/common.c:380
 kasan_kmalloc include/linux/kasan.h:211 [inline]
 __do_kmalloc_node mm/slab_common.c:968 [inline]
 __kmalloc+0x5a/0xd0 mm/slab_common.c:981
 kmalloc include/linux/slab.h:584 [inline]
 sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
 __sock_create+0x359/0x790 net/socket.c:1515
 sock_create net/socket.c:1566 [inline]
 __sys_socket_create net/socket.c:1603 [inline]
 __sys_socket_create net/socket.c:1588 [inline]
 __sys_socket+0x133/0x250 net/socket.c:1636
 __do_sys_socket net/socket.c:1649 [inline]
 __se_sys_socket net/socket.c:1647 [inline]
 __x64_sys_socket+0x73/0xb0 net/socket.c:1647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 9738:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:518
 ____kasan_slab_free mm/kasan/common.c:236 [inline]
 ____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
 kasan_slab_free include/linux/kasan.h:177 [inline]
 slab_free_hook mm/slub.c:1781 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807
 slab_free mm/slub.c:3787 [inline]
 __kmem_cache_free+0xaf/0x3b0 mm/slub.c:3800
 sk_prot_free net/core/sock.c:2074 [inline]
 __sk_destruct+0x5df/0x750 net/core/sock.c:2166
 sk_destruct net/core/sock.c:2181 [inline]
 __sk_free+0x175/0x460 net/core/sock.c:2192
 sk_free+0x7c/0xa0 net/core/sock.c:2203
 sock_put include/net/sock.h:1987 [inline]
 nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148
 call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700
 expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751
 __run_timers kernel/time/timer.c:2022 [inline]
 __run_timers kernel/time/timer.c:1995 [inline]
 run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
 __do_softirq+0x1fb/0xadc kernel/softirq.c:571

Last potentially related work creation:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:488
 __call_rcu_common.constprop.0+0x99/0x820 kernel/rcu/tree.c:2753
 netlink_release+0xdcb/0x1e60 net/netlink/af_netlink.c:826
 __sock_release net/socket.c:650 [inline]
 sock_release+0x8b/0x1b0 net/socket.c:678
 netlink_kernel_release+0x4f/0x60 net/netlink/af_netlink.c:2118
 nfnetlink_net_exit_batch+0x112/0x320 net/netfilter/nfnetlink.c:782
 ops_exit_list+0x125/0x170 net/core/net_namespace.c:174
 cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:606
 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
 worker_thread+0x669/0x1090 kernel/workqueue.c:2436
 kthread+0x2e8/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

Second to last potentially related work creation:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:488
 __call_rcu_common.constprop.0+0x99/0x820 kernel/rcu/tree.c:2753
 netlink_release+0xdcb/0x1e60 net/netlink/af_netlink.c:826
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x1c/0x20 net/socket.c:1365
 __fput+0x27c/0xa90 fs/file_table.c:320
 task_work_run+0x16f/0x270 kernel/task_work.c:179
 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:203
 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
 syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
 do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff88804a765000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 1152 bytes inside of
 2048-byte region [ffff88804a765000, ffff88804a765800)

The buggy address belongs to the physical page:
page:ffffea000129d800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4a760
head:ffffea000129d800 order:3 compound_mapcount:0 compound_pincount:0
anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012442000 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5354, tgid 5354 (syz-executor.3), ts 142357401536, free_ts 13035931763
 prep_new_page mm/page_alloc.c:2539 [inline]
 get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4291
 __alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5558
 alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285
 alloc_slab_page mm/slub.c:1851 [inline]
 allocate_slab+0x25f/0x350 mm/slub.c:1998
 new_slab mm/slub.c:2051 [inline]
 ___slab_alloc+0xa91/0x1400 mm/slub.c:3193
 __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
 __slab_alloc_node mm/slub.c:3345 [inline]
 slab_alloc_node mm/slub.c:3442 [inline]
 __kmem_cache_alloc_node+0x1a4/0x430 mm/slub.c:3491
 __do_kmalloc_node mm/slab_common.c:967 [inline]
 __kmalloc_node_track_caller+0x4b/0xc0 mm/slab_common.c:988
 kmalloc_reserve net/core/skbuff.c:492 [inline]
 __alloc_skb+0xe9/0x310 net/core/skbuff.c:565
 alloc_skb include/linux/skbuff.h:1270 [inline]
 nlmsg_new include/net/netlink.h:1002 [inline]
 inet6_ifinfo_notify+0x76/0x150 net/ipv6/addrconf.c:6047
 addrconf_notify+0x4c5/0x1c80 net/ipv6/addrconf.c:3657
 notifier_call_chain+0xb5/0x200 kernel/notifier.c:87
 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1944
 call_netdevice_notifiers_extack net/core/dev.c:1982 [inline]
 call_netdevice_notifiers net/core/dev.c:1996 [inline]
 __dev_notify_flags+0x120/0x2d0 net/core/dev.c:8569
 dev_change_flags+0x11b/0x170 net/core/dev.c:8607
 do_setlink+0x9f1/0x3bb0 net/core/rtnetlink.c:2827
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1459 [inline]
 free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509
 free_unref_page_prepare mm/page_alloc.c:3387 [inline]
 free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483
 free_contig_range+0xb5/0x180 mm/page_alloc.c:9496
 destroy_args+0xa8/0x64c mm/debug_vm_pgtable.c:1031
 debug_vm_pgtable+0x2958/0x29e9 mm/debug_vm_pgtable.c:1354
 do_one_initcall+0x141/0x790 init/main.c:1306
 do_initcall_level init/main.c:1379 [inline]
 do_initcalls init/main.c:1395 [inline]
 do_basic_setup init/main.c:1414 [inline]
 kernel_init_freeable+0x6f9/0x782 init/main.c:1634
 kernel_init+0x1e/0x1d0 init/main.c:1522
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

Memory state around the buggy address:
 ffff88804a765380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88804a765400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88804a765480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff88804a765500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88804a765580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH] nfc: st-nci: array index overflow in st_nci_se_get_bwi()
From: Krzysztof Kozlowski @ 2022-12-19  9:06 UTC (permalink / raw)
  To: Alexander H Duyck, Aleksandr Burakov, Christophe Ricard,
	Samuel Ortiz
  Cc: netdev, linux-kernel, lvc-project
In-Reply-To: <5841f9021baf856c26fb27ac1d75fc1e29d3e044.camel@gmail.com>

On 14/12/2022 19:35, Alexander H Duyck wrote:
> On Tue, 2022-12-13 at 09:12 -0500, Aleksandr Burakov wrote:
>> Index of info->se_info.atr can be overflow due to unchecked increment
>> in the loop "for". The patch checks the value of current array index
>> and doesn't permit increment in case of the index is equal to
>> ST_NCI_ESE_MAX_LENGTH - 1.
>>
>> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>>
>> Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci")
>> Signed-off-by: Aleksandr Burakov <a.burakov@rosalinux.ru>
>> ---
>>  drivers/nfc/st-nci/se.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nfc/st-nci/se.c b/drivers/nfc/st-nci/se.c
>> index ec87dd21e054..ff8ac1784880 100644
>> --- a/drivers/nfc/st-nci/se.c
>> +++ b/drivers/nfc/st-nci/se.c
>> @@ -119,10 +119,11 @@ static u8 st_nci_se_get_bwi(struct nci_dev *ndev)
>>  	/* Bits 8 to 5 of the first TB for T=1 encode BWI from zero to nine */
>>  	for (i = 1; i < ST_NCI_ESE_MAX_LENGTH; i++) {
>>  		td = ST_NCI_ATR_GET_Y_FROM_TD(info->se_info.atr[i]);
>> -		if (ST_NCI_ATR_TA_PRESENT(td))
>> +		if (ST_NCI_ATR_TA_PRESENT(td) && i < ST_NCI_ESE_MAX_LENGTH - 1)
>>  			i++;
>>  		if (ST_NCI_ATR_TB_PRESENT(td)) {
>> -			i++;
>> +			if (i < ST_NCI_ESE_MAX_LENGTH - 1)
>> +				i++;
>>  			return info->se_info.atr[i] >> 4;
>>  		}
>>  	}
> 
> Rather than adding 2 checks you could do this all with one check.
> Basically you would just need to replace:
>   if (ST_NCI_ATR_TB_PRESENT(td)) {
> 	i++;
> 
> with:
>   if (ST_NCI_ATR_TB_PRESENT(td) && ++i < ST_NCI_ESE_MAX_LENGTH)
> 
> Basically it is fine to increment "i" as long as it isn't being used as
> an index so just restricting the last access so that we don't
> dereference using it as an index should be enough.

These are different checks - TA and TB. By skipping TA, your code is not
equivalent. Was it intended?

Best regards,
Krzysztof


^ permalink raw reply

* Re: [RFC PATCH v4 3/4] dpll: documentation on DPLL subsystem interface
From: Paolo Abeni @ 2022-12-19  9:13 UTC (permalink / raw)
  To: Vadim Fedorenko, Jakub Kicinski, Jiri Pirko, Arkadiusz Kubalewski,
	Jonathan Lemon
  Cc: netdev, Vadim Fedorenko, linux-arm-kernel, linux-clk
In-Reply-To: <20221129213724.10119-4-vfedorenko@novek.ru>

Hello,

I have a just a few minor notes WRT the documentation - which was a
very useful entry point for me to help understanding the subsystem.

On Wed, 2022-11-30 at 00:37 +0300, Vadim Fedorenko wrote:
> From: Vadim Fedorenko <vadfed@fb.com>
> 
> Add documentation explaining common netlink interface to configure DPLL
> devices and monitoring events. Common way to implement DPLL device in
> a driver is also covered.
> 
> Co-developed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Vadim Fedorenko <vadfed@fb.com>
> ---
>  Documentation/networking/dpll.rst  | 271 +++++++++++++++++++++++++++++
>  Documentation/networking/index.rst |   1 +
>  2 files changed, 272 insertions(+)
>  create mode 100644 Documentation/networking/dpll.rst
> 
> diff --git a/Documentation/networking/dpll.rst b/Documentation/networking/dpll.rst
> new file mode 100644
> index 000000000000..58401e2b70a7
> --- /dev/null
> +++ b/Documentation/networking/dpll.rst
> @@ -0,0 +1,271 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===============================
> +The Linux kernel DPLL subsystem
> +===============================
> +
> +
> +The main purpose of DPLL subsystem is to provide general interface
> +to configure devices that use any kind of Digital PLL and could use
> +different sources of signal to synchronize to as well as different
> +types of outputs.
> +The main interface is NETLINK_GENERIC based protocol with an event
> +monitoring multicast group defined.
> +
> +
> +Pin object
> +==========
> +A pin is amorphic object which represents either input and output, it
> +could be internal component of the device, as well as externaly
> +connected.
> +The number of pins per dpll vary, but usually multiple pins shall be
> +provided for a single dpll device.
> +Direction of a pin and it's capabilities are provided to the user in
> +response for netlink dump request messages.
> +Pin can be shared by multiple dpll devices. Where configuration on one
> +pin can alter multiple dplls (i.e. DPLL_PIN_SGINAL_TYPE, DPLL_PIN_TYPE,

Likely typo above: DPLL_PIN_SIGNAL_TYPE

> +DPLL_PIN_STATE), or just one pin-dpll pair (i.e. DPLL_PIN_PRIO).
> +Pin can be also a MUX type, where one or more pins are attached to
> +a parent pin. The parent pin is the one directly connected to the dpll,
> +which may be used by dplls in DPLL_MODE_AUTOMATIC selection mode, where
> +only pins directly connected to the dpll are capable of automatic
> +source pin selection. In such case, pins are dumped with
> +DPLLA_PIN_PARENT_IDX, and are able to be selected by the userspace with
> +netlink request.
> +
> +Configuration commands group
> +============================
> +
> +Configuration commands are used to get or dump information about
> +registered DPLL devices (and pins), as well as set configuration of
> +device or pins. As DPLL device could not be abstract and reflects real
> +hardware, there is no way to add new DPLL device via netlink from user
> +space and each device should be registered by it's driver.

Side note: in the long run we could end-up with a virtual/dummy dpll
driver for self-tests and/or reference's implementation sake.

> +
> +List of command with possible attributes
> +========================================
> +
> +All constants identifying command types use ``DPLL_CMD_`` prefix and
> +suffix according to command purpose. All attributes use ``DPLLA_``
> +prefix and suffix according to attribute purpose:
> +
> +  ============================  =======================================
> +  ``DEVICE_GET``                userspace to get device info
> +    ``ID``                      attr internal dpll device index
> +    ``NAME``                    attr dpll device name
> +    ``MODE``                    attr selection mode
> +    ``MODE_SUPPORTED``          attr available selection modes
> +    ``SOURCE_PIN_IDX``          attr index of currently selected source
> +    ``LOCK_STATUS``             attr internal frequency-lock status
> +    ``TEMP``                    attr device temperature information
> +    ``NETIFINDEX``              attr dpll owner Linux netdevice index

should we include also the cookie (or wuhatever will be used for
persistent device identification) into the readable attributes list? 

> +  ``DEVICE_SET``                userspace to set dpll device
> +                                configuration
> +    ``ID``                      attr internal dpll device index
> +    ``MODE``                    attr selection mode to configure
> +    ``PIN_IDX``                 attr index of source pin to select as
> +                                active source

It looks like the descrition for the above attribute ('PIN_IDX') and
'SOURCE_PIN_IDX' has been swapped.

> +  ``PIN_SET``                   userspace to set pins configuration
> +    ``ID``                      attr internal dpll device index
> +    ``PIN_IDX``                 attr index of a pin to configure
> +    ``PIN_TYPE``                attr type configuration value for
> +                                selected pin
> +    ``PIN_SIGNAL_TYPE``         attr signal type configuration value
> +                                for selected pin
> +    ``PIN_CUSTOM_FREQ``         attr signal custom frequency to be set
> +    ``PIN_STATE``               attr pin state to be set
> +    ``PIN_PRIO``                attr pin priority to be set
> +
> +Netlink dump requests
> +=====================
> +The ``DEVICE_GET`` command is capable of dump type netlink requests.
> +In such case the userspace shall provide ``DUMP_FILTER`` attribute
> +value to filter the response as required.
> +If filter is not provided only name and id of available dpll(s) is
> +provided. If the request also contains ``ID`` attribute, only selected
> +dpll device shall be dumped.

Should we explicitly document even the required permissions?

> +
> +Possible response message attributes for netlink requests depending on
> +the value of ``DPLLA_DUMP_FILTER`` attribute:
> +
> +  =============================== ====================================
> +  ``DPLL_DUMP_FILTER_PINS``       value of ``DUMP_FILTER`` attribute
> +    ``PIN``                       attr nested type contain single pin
> +                                  attributes
> +    ``PIN_IDX``                   attr index of dumped pin
> +    ``PIN_DESCRIPTION``           description of a pin provided by
> +                                  driver
> +    ``PIN_TYPE``                  attr value of pin type
> +    ``PIN_TYPE_SUPPORTED``        attr value of supported pin type
> +    ``PIN_SIGNAL_TYPE``           attr value of pin signal type
> +    ``PIN_SIGNAL_TYPE_SUPPORTED`` attr value of supported pin signal
> +                                  type
> +    ``PIN_CUSTOM_FREQ``           attr value of pin custom frequency
> +    ``PIN_STATE``                 attr value of pin state
> +    ``PIN_STATE_SUPPORTED``       attr value of supported pin state
> +    ``PIN_PRIO``                  attr value of pin prio
> +    ``PIN_PARENT_IDX``            attr value of pin patent index
> +    ``PIN_NETIFINDEX``            attr value of netdevice assocaiated
> +                                  with the pin
> +  ``DPLL_DUMP_FILTER_STATUS``     value of ``DUMP_FILTER`` attribute
> +    ``ID``                        attr internal dpll device index
> +    ``NAME``                      attr dpll device name
> +    ``MODE``                      attr selection mode
> +    ``MODE_SUPPORTED``            attr available selection modes
> +    ``SOURCE_PIN_IDX``            attr index of currently selected
> +                                  source
> +    ``LOCK_STATUS``               attr internal frequency-lock status
> +    ``TEMP``                      attr device temperature information
> +    ``NETIFINDEX``                attr dpll owner Linux netdevice index
> +
> +
> +The pre-defined enums
> +=====================
> +
> +All the enums use the ``DPLL_`` prefix.
> +
> +Values for ``PIN_TYPE`` and ``PIN_TYPE_SUPPORTED`` attributes:
> +
> +  ============================ ========================================
> +  ``PIN_TYPE_MUX``             MUX type pin, connected pins shall
> +                               have their own types
> +  ``PIN_TYPE_EXT``             External pin
> +  ``PIN_TYPE_SYNCE_ETH_PORT``  SyncE on Ethernet port
> +  ``PIN_TYPE_INT_OSCILLATOR``  Internal Oscillator (i.e. Holdover
> +                               with Atomic Clock as a Source)
> +  ``PIN_TYPE_GNSS``            GNSS 1PPS source
> +
> +Values for ``PIN_SIGNAL_TYPE`` and ``PIN_SIGNAL_TYPE_SUPPORTED``
> +attributes:
> +
> +  ===============================  ===================================
> +  ``PIN_SIGNAL_TYPE_1_PPS``        1 Hz frequency
> +  ``PIN_SIGNAL_TYPE_10_MHZ``       10 MHz frequency
> +  ``PIN_SIGNAL_TYPE_CUSTOM_FREQ``  Frequency value provided in attr
> +                                   ``PIN_CUSTOM_FREQ``
> +
> +Values for ``LOCK_STATUS`` attribute:
> +
> +  ============================= ======================================
> +  ``LOCK_STATUS_UNLOCKED``      DPLL is in freerun, not locked to any
> +                                source pin
> +  ``LOCK_STATUS_CALIBRATING``   DPLL device calibrates to lock to the
> +                                source pin signal
> +  ``LOCK_STATUS_LOCKED``        DPLL device is locked to the source
> +                                pin frequency
> +  ``LOCK_STATUS_HOLDOVER``      DPLL device lost a lock, using its
> +                                frequency holdover capabilities
> +
> +Values for ``PIN_STATE`` and ``PIN_STATE_SUPPORTED`` attributes:
> +
> +============================= ============================
> +  ``PIN_STATE_CONNECTED``     Pin connected to a dpll
> +  ``PIN_STATE_DISCONNECTED``  Pin disconnected from dpll
> +  ``PIN_STATE_SOURCE``        Source pin
> +  ``PIN_STATE_OUTPUT``        Output pin
> +
> +Possible DPLL source selection mode values:
> +
> +  =================== ================================================
> +  ``MODE_FORCED``     source pin is force-selected by
> +                      ``DPLL_CMD_DEVICE_SET`` with given value of
> +                      ``DPLLA_SOURCE_PIN_IDX`` attribute
> +  ``MODE_AUTOMATIC``  source pin ise auto selected according to

typo above 'ise' -> 'is'


Cheers,

Paolo


^ permalink raw reply

* Re: [PATCH RFC 1/2] i2c: add fwnode APIs
From: Mika Westerberg @ 2022-12-19  9:23 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-acpi, linux-i2c, netdev, Andrew Lunn, David S. Miller,
	Eric Dumazet, Heiner Kallweit, Jakub Kicinski, Paolo Abeni,
	Wolfram Sang
In-Reply-To: <Y6AlC+9iGVGzWSbc@shell.armlinux.org.uk>

Hi,

On Mon, Dec 19, 2022 at 08:47:07AM +0000, Russell King (Oracle) wrote:
> Hi Mika,
> 
> On Thu, Dec 08, 2022 at 10:16:07AM +0000, Russell King (Oracle) wrote:
> > Hi Mika,
> > 
> > On Thu, Dec 08, 2022 at 12:04:02PM +0200, Mika Westerberg wrote:
> > > Hi,
> > > 
> > > On Wed, Dec 07, 2022 at 11:22:24AM +0000, Russell King (Oracle) wrote:
> > > > +EXPORT_SYMBOL(i2c_find_device_by_fwnode);
> > > > +
> > > 
> > > Drop this empty line.
> > 
> > The additional empty line was there before, and I guess is something the
> > I2C maintainer wants to logically separate the i2c device stuff from
> > the rest of the file.
> > 
> > > > +/* must call put_device() when done with returned i2c_client device */
> > > > +struct i2c_client *i2c_find_device_by_fwnode(struct fwnode_handle *fwnode);
> > > 
> > > With the kernel-docs in place you probably can drop these comments.
> > 
> > It's what is there against the other prototypes - and is very easy to
> > get wrong, as I've recently noticed in the sfp.c code as a result of
> > creating this series.
> > 
> > I find the whole _find_ vs _get_ thing a tad confusing, and there
> > probably should be just one interface with one way of putting
> > afterwards to avoid subtle long-standing bugs like this.
> > 
> > Thanks.
> 
> Do you have any comments on my reply please?

Sorry, no comments :) Thanks for the clarification.

^ permalink raw reply

* Re: [Patch net] net_sched: reject TCF_EM_SIMPLE case for complex ematch module
From: patchwork-bot+netdevbpf @ 2022-12-19  9:50 UTC (permalink / raw)
  To: Cong Wang
  Cc: netdev, cong.wang, syzbot+4caeae4c7103813598ae, jun.nie, jhs,
	pabeni
In-Reply-To: <20221217221707.46010-1-xiyou.wangcong@gmail.com>

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Sat, 17 Dec 2022 14:17:07 -0800 you wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> When TCF_EM_SIMPLE was introduced, it is supposed to be convenient
> for ematch implementation:
> 
> https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/
> 
> [...]

Here is the summary with links:
  - [net] net_sched: reject TCF_EM_SIMPLE case for complex ematch module
    https://git.kernel.org/netdev/net/c/9cd3fd2054c3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH] hamradio: baycom_epp: Do not use x86-specific rdtsc()
From: patchwork-bot+netdevbpf @ 2022-12-19  9:50 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: t.sailer, linux-hams, netdev, linux-kernel
In-Reply-To: <20221218120405.2431-1-bp@alien8.de>

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 18 Dec 2022 13:04:05 +0100 you wrote:
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> 
> Use get_cycles() which is provided by pretty much every arch.
> 
> The UML build works too because get_cycles() is a simple "return 0;"
> because the rdtsc() is optimized away there.
> 
> [...]

Here is the summary with links:
  - hamradio: baycom_epp: Do not use x86-specific rdtsc()
    https://git.kernel.org/netdev/net-next/c/aba5b397cad7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH RFC v2 0/2] Add I2C fwnode lookup/get interfaces
From: Russell King (Oracle) @ 2022-12-19  9:50 UTC (permalink / raw)
  To: linux-acpi, linux-i2c, netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, Mika Westerberg, Paolo Abeni, Wolfram Sang

Hi,

This RFC series is intended for the next merge window, but we will need
to decide how to merge it as it is split across two subsystems. These
patches have been generated against the net-next, since patch 2 depends
on a recently merged patch in that tree (which is now in mainline.)

Currently, the SFP code attempts to work out what kind of fwnode we
found when looking up the I2C bus for the SFP cage, converts the fwnode
to the appropriate firmware specific representation to then call the
appropriate I2C layer function. This is inefficient, since the device
model provides a way to locate items on a bus_type by fwnode.

In order to reduce this complexity, this series adds fwnode interfaces
to the I2C subsystem to allow I2C adapters to be looked up. I also
accidentally also converted the I2C clients to also be looked up, so
I've left that in patch 1 if people think that could be useful - if
not, I'll remove it.

We could also convert the of_* functions to be inline in i2c.h and
remove the stub of_* functions and exports.

Do we want these to live in i2c-core-fwnode.c ? I don't see a Kconfig
symbol that indicates whether we want fwnode support, and I know there
are people looking to use software nodes to lookup the SFP I2C bus
(which is why the manual firmware-specific code in sfp.c is a problem.)

Thanks!

v2: updated patch 1 with docbook comments.

 drivers/i2c/i2c-core-acpi.c | 13 +-----
 drivers/i2c/i2c-core-base.c | 98 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/i2c/i2c-core-of.c   | 51 ++---------------------
 drivers/net/phy/sfp.c       | 13 +-----
 include/linux/i2c.h         |  9 +++++
 5 files changed, 112 insertions(+), 72 deletions(-)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* [PATCH RFC net-next v2 1/2] i2c: add fwnode APIs
From: Russell King (Oracle) @ 2022-12-19  9:52 UTC (permalink / raw)
  To: linux-acpi, linux-i2c, netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, Mika Westerberg, Paolo Abeni, Wolfram Sang
In-Reply-To: <Y6Az235wsnRWFYWA@shell.armlinux.org.uk>

Add fwnode APIs for finding and getting I2C adapters, which will be
used by the SFP code. These are passed the fwnode corresponding to
the adapter, and return the I2C adapter. It is the responsibility of
the caller to find the appropriate fwnode.

We keep the DT and ACPI interfaces, but where appropriate, recode them
to use the fwnode interfaces internally.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
v2: update docbook comments

 drivers/i2c/i2c-core-acpi.c | 13 +----
 drivers/i2c/i2c-core-base.c | 98 +++++++++++++++++++++++++++++++++++++
 drivers/i2c/i2c-core-of.c   | 51 ++-----------------
 include/linux/i2c.h         |  9 ++++
 4 files changed, 111 insertions(+), 60 deletions(-)

diff --git a/drivers/i2c/i2c-core-acpi.c b/drivers/i2c/i2c-core-acpi.c
index 4dd777cc0c89..d6037a328669 100644
--- a/drivers/i2c/i2c-core-acpi.c
+++ b/drivers/i2c/i2c-core-acpi.c
@@ -442,18 +442,7 @@ EXPORT_SYMBOL_GPL(i2c_acpi_find_adapter_by_handle);
 
 static struct i2c_client *i2c_acpi_find_client_by_adev(struct acpi_device *adev)
 {
-	struct device *dev;
-	struct i2c_client *client;
-
-	dev = bus_find_device_by_acpi_dev(&i2c_bus_type, adev);
-	if (!dev)
-		return NULL;
-
-	client = i2c_verify_client(dev);
-	if (!client)
-		put_device(dev);
-
-	return client;
+	return i2c_find_device_by_fwnode(acpi_fwnode_handle(adev));
 }
 
 static int i2c_acpi_notify(struct notifier_block *nb, unsigned long value,
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 13fafb74bab8..1395a78d4a22 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -1012,6 +1012,35 @@ void i2c_unregister_device(struct i2c_client *client)
 }
 EXPORT_SYMBOL_GPL(i2c_unregister_device);
 
+/**
+ * i2c_find_device_by_fwnode() - find an i2c_client for the fwnode
+ * @fwnode: &struct fwnode_handle corresponding to the &struct i2c_client
+ *
+ * Look up and return the &struct i2c_client corresponding to the @fwnode.
+ * If no client can be found, or @fwnode is NULL, this returns NULL.
+ *
+ * The user must call put_device(&client->dev) once done with the i2c client.
+ */
+struct i2c_client *i2c_find_device_by_fwnode(struct fwnode_handle *fwnode)
+{
+	struct i2c_client *client;
+	struct device *dev;
+
+	if (!fwnode)
+		return NULL;
+
+	dev = bus_find_device_by_fwnode(&i2c_bus_type, fwnode);
+	if (!dev)
+		return NULL;
+
+	client = i2c_verify_client(dev);
+	if (!client)
+		put_device(dev);
+
+	return client;
+}
+EXPORT_SYMBOL(i2c_find_device_by_fwnode);
+
 
 static const struct i2c_device_id dummy_id[] = {
 	{ "dummy", 0 },
@@ -1762,6 +1791,75 @@ int devm_i2c_add_adapter(struct device *dev, struct i2c_adapter *adapter)
 }
 EXPORT_SYMBOL_GPL(devm_i2c_add_adapter);
 
+static int i2c_dev_or_parent_fwnode_match(struct device *dev, const void *data)
+{
+	if (dev_fwnode(dev) == data)
+		return 1;
+
+	if (dev->parent && dev_fwnode(dev->parent) == data)
+		return 1;
+
+	return 0;
+}
+
+/**
+ * i2c_find_adapter_by_fwnode() - find an i2c_adapter for the fwnode
+ * @fwnode: &struct fwnode_handle corresponding to the &struct i2c_adapter
+ *
+ * Look up and return the &struct i2c_adapter corresponding to the @fwnode.
+ * If no adapter can be found, or @fwnode is NULL, this returns NULL.
+ *
+ * The user must call put_device(&adapter->dev) once done with the i2c adapter.
+ */
+struct i2c_adapter *i2c_find_adapter_by_fwnode(struct fwnode_handle *fwnode)
+{
+	struct i2c_adapter *adapter;
+	struct device *dev;
+
+	if (!fwnode)
+		return NULL;
+
+	dev = bus_find_device(&i2c_bus_type, NULL, fwnode,
+			      i2c_dev_or_parent_fwnode_match);
+	if (!dev)
+		return NULL;
+
+	adapter = i2c_verify_adapter(dev);
+	if (!adapter)
+		put_device(dev);
+
+	return adapter;
+}
+EXPORT_SYMBOL(i2c_find_adapter_by_fwnode);
+
+/**
+ * i2c_get_adapter_by_fwnode() - find an i2c_adapter for the fwnode
+ * @fwnode: &struct fwnode_handle corresponding to the &struct i2c_adapter
+ *
+ * Look up and return the &struct i2c_adapter corresponding to the @fwnode,
+ * and increment the adapter module's use count. If no adapter can be found,
+ * or @fwnode is NULL, this returns NULL.
+ *
+ * The user must call i2c_put_adapter(adapter) once done with the i2c adapter.
+ * Note that this is different from i2c_find_adapter_by_node().
+ */
+struct i2c_adapter *i2c_get_adapter_by_fwnode(struct fwnode_handle *fwnode)
+{
+	struct i2c_adapter *adapter;
+
+	adapter = i2c_find_adapter_by_fwnode(fwnode);
+	if (!adapter)
+		return NULL;
+
+	if (!try_module_get(adapter->owner)) {
+		put_device(&adapter->dev);
+		adapter = NULL;
+	}
+
+	return adapter;
+}
+EXPORT_SYMBOL(i2c_get_adapter_by_fwnode);
+
 static void i2c_parse_timing(struct device *dev, char *prop_name, u32 *cur_val_p,
 			    u32 def_val, bool use_def)
 {
diff --git a/drivers/i2c/i2c-core-of.c b/drivers/i2c/i2c-core-of.c
index 3ed74aa4b44b..c3e565e4bddf 100644
--- a/drivers/i2c/i2c-core-of.c
+++ b/drivers/i2c/i2c-core-of.c
@@ -113,69 +113,24 @@ void of_i2c_register_devices(struct i2c_adapter *adap)
 	of_node_put(bus);
 }
 
-static int of_dev_or_parent_node_match(struct device *dev, const void *data)
-{
-	if (dev->of_node == data)
-		return 1;
-
-	if (dev->parent)
-		return dev->parent->of_node == data;
-
-	return 0;
-}
-
 /* must call put_device() when done with returned i2c_client device */
 struct i2c_client *of_find_i2c_device_by_node(struct device_node *node)
 {
-	struct device *dev;
-	struct i2c_client *client;
-
-	dev = bus_find_device_by_of_node(&i2c_bus_type, node);
-	if (!dev)
-		return NULL;
-
-	client = i2c_verify_client(dev);
-	if (!client)
-		put_device(dev);
-
-	return client;
+	return i2c_find_device_by_fwnode(of_fwnode_handle(node));
 }
 EXPORT_SYMBOL(of_find_i2c_device_by_node);
 
 /* must call put_device() when done with returned i2c_adapter device */
 struct i2c_adapter *of_find_i2c_adapter_by_node(struct device_node *node)
 {
-	struct device *dev;
-	struct i2c_adapter *adapter;
-
-	dev = bus_find_device(&i2c_bus_type, NULL, node,
-			      of_dev_or_parent_node_match);
-	if (!dev)
-		return NULL;
-
-	adapter = i2c_verify_adapter(dev);
-	if (!adapter)
-		put_device(dev);
-
-	return adapter;
+	return i2c_find_adapter_by_fwnode(of_fwnode_handle(node));
 }
 EXPORT_SYMBOL(of_find_i2c_adapter_by_node);
 
 /* must call i2c_put_adapter() when done with returned i2c_adapter device */
 struct i2c_adapter *of_get_i2c_adapter_by_node(struct device_node *node)
 {
-	struct i2c_adapter *adapter;
-
-	adapter = of_find_i2c_adapter_by_node(node);
-	if (!adapter)
-		return NULL;
-
-	if (!try_module_get(adapter->owner)) {
-		put_device(&adapter->dev);
-		adapter = NULL;
-	}
-
-	return adapter;
+	return i2c_get_adapter_by_fwnode(of_fwnode_handle(node));
 }
 EXPORT_SYMBOL(of_get_i2c_adapter_by_node);
 
diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index d84e0e99f084..bcee9faaf2e6 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -965,6 +965,15 @@ int i2c_handle_smbus_host_notify(struct i2c_adapter *adap, unsigned short addr);
 
 #endif /* I2C */
 
+/* must call put_device() when done with returned i2c_client device */
+struct i2c_client *i2c_find_device_by_fwnode(struct fwnode_handle *fwnode);
+
+/* must call put_device() when done with returned i2c_adapter device */
+struct i2c_adapter *i2c_find_adapter_by_fwnode(struct fwnode_handle *fwnode);
+
+/* must call i2c_put_adapter() when done with returned i2c_adapter device */
+struct i2c_adapter *i2c_get_adapter_by_fwnode(struct fwnode_handle *fwnode);
+
 #if IS_ENABLED(CONFIG_OF)
 /* must call put_device() when done with returned i2c_client device */
 struct i2c_client *of_find_i2c_device_by_node(struct device_node *node);
-- 
2.30.2


^ permalink raw reply related

* [PATCH RFC net-next v2 2/2] net: sfp: use i2c_get_adapter_by_fwnode()
From: Russell King (Oracle) @ 2022-12-19  9:52 UTC (permalink / raw)
  To: linux-acpi, linux-i2c, netdev
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, Mika Westerberg, Paolo Abeni, Wolfram Sang
In-Reply-To: <Y6Az235wsnRWFYWA@shell.armlinux.org.uk>

Use the newly introduced i2c_get_adapter_by_fwnode() API, so that we
can retrieve the I2C adapter in a firmware independent manner once we
have the fwnode handle for the adapter.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/sfp.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 83b99d95b278..aa2f7ebbdebc 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -2644,10 +2644,8 @@ static void sfp_cleanup(void *data)
 
 static int sfp_i2c_get(struct sfp *sfp)
 {
-	struct acpi_handle *acpi_handle;
 	struct fwnode_handle *h;
 	struct i2c_adapter *i2c;
-	struct device_node *np;
 	int err;
 
 	h = fwnode_find_reference(dev_fwnode(sfp->dev), "i2c-bus", 0);
@@ -2656,16 +2654,7 @@ static int sfp_i2c_get(struct sfp *sfp)
 		return -ENODEV;
 	}
 
-	if (is_acpi_device_node(h)) {
-		acpi_handle = ACPI_HANDLE_FWNODE(h);
-		i2c = i2c_acpi_find_adapter_by_handle(acpi_handle);
-	} else if ((np = to_of_node(h)) != NULL) {
-		i2c = of_find_i2c_adapter_by_node(np);
-	} else {
-		err = -EINVAL;
-		goto put;
-	}
-
+	i2c = i2c_get_adapter_by_fwnode(h);
 	if (!i2c) {
 		err = -EPROBE_DEFER;
 		goto put;
-- 
2.30.2


^ permalink raw reply related

* Re: BUG: unable to handle kernel paging request in bpf_dispatcher_xdp #forregzbot
From: Thorsten Leemhuis @ 2022-12-19  9:59 UTC (permalink / raw)
  To: bpf, regressions@lists.linux.dev; +Cc: Linux Kernel Mailing List, netdev
In-Reply-To: <96ee7141-f09c-4df9-015b-6ae8f3588091@leemhuis.info>

On 08.12.22 09:44, Thorsten Leemhuis wrote:
> [Note: this mail contains only information for Linux kernel regression
> tracking. Mails like these contain '#forregzbot' in the subject to make
> then easy to spot and filter out. The author also tried to remove most
> or all individuals from the list of recipients to spare them the hassle.]
> 
> On 06.12.22 04:28, Hao Sun wrote:
>>
>> The following crash can be triggered with the BPF prog provided.
>> It seems the verifier passed some invalid progs. I will try to simplify
>> the C reproducer, for now, the following can reproduce this:
> 
> Thanks for the report. To be sure below issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced c86df29d11df
> #regzbot title net/bpf: BUG: unable to handle kernel paging request in
> bpf_dispatcher_xdp
> #regzbot ignore-activity

#regzbot fix: bpf: Synchronize dispatcher update with
bpf_dispatcher_xdp_func

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox