Netdev List


* Re: [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
From: Michael Chan @ 2023-11-02 17:27 UTC (permalink / raw)
  To: Thinh Tran
  Cc: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc,
	venkata.sai.duggi
In-Reply-To: <20231102161219.220-1-thinhtr@linux.vnet.ibm.com>

On Thu, Nov 2, 2023 at 9:16 AM Thinh Tran <thinhtr@linux.vnet.ibm.com> wrote:
>
> When an EEH error is encountered by a PCI adapter, the EEH driver
> modifies the PCI channel's state as shown below:
>
>    enum {
>       /* I/O channel is in normal state */
>       pci_channel_io_normal = (__force pci_channel_state_t) 1,
>
>       /* I/O to channel is blocked */
>       pci_channel_io_frozen = (__force pci_channel_state_t) 2,
>
>       /* PCI card is dead */
>       pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
>    };
>
> If the same EEH error then causes the tg3 driver's transmit timeout
> logic to execute, the tg3_tx_timeout() function schedules a reset
> task via tg3_reset_task_schedule(), which may cause a race condition
> between the tg3 and EEH driver as both attempt to recover the HW via
> a reset action.
>
> EEH driver gets error event
> --> eeh_set_channel_state()
>     and set device to one of
>     error state above           scheduler: tg3_reset_task() get
>                                 returned error from tg3_init_hw()
>                              --> dev_close() shuts down the interface
>
> tg3_io_slot_reset() and
> tg3_io_resume() fail to
> reset/resume the device
>
>
> To resolve this issue, we avoid the race condition by checking the PCI
> channel state in the tg3_tx_timeout() function and skip the tg3 driver
> initiated reset when the PCI channel is not in the normal state.  (The
> driver has no access to tg3 device registers at this point and cannot
> even complete the reset task successfully without external assistance.)
> We'll leave the reset procedure to be managed by the EEH driver which
> calls the tg3_io_error_detected(), tg3_io_slot_reset() and
> tg3_io_resume() functions as appropriate.

This scenario can affect other drivers too, right?  Shouldn't this be
handled in a higher layer before calling ->ndo_tx_timeout() so we
don't have to add this logic to all the other drivers?  Thanks.
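
One way to picture the question above is a small userspace model in which a common watchdog helper looks at the channel state before it calls into the driver. This is only a sketch under stated assumptions: `struct fake_netdev`, `watchdog_fire()` and `enum chan_state` are hypothetical stand-ins, not kernel APIs, and whether a higher layer can or should do this is exactly what is being asked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in mirroring the three channel states quoted above. */
enum chan_state {
	CHAN_IO_NORMAL = 1,
	CHAN_IO_FROZEN = 2,
	CHAN_IO_PERM_FAILURE = 3,
};

struct fake_netdev {
	enum chan_state error_state;              /* models pdev->error_state */
	void (*tx_timeout)(struct fake_netdev *); /* models ->ndo_tx_timeout() */
	int resets_scheduled;                     /* how often the driver hook ran */
};

/* Models a driver timeout handler that schedules a reset. */
static void fake_driver_tx_timeout(struct fake_netdev *dev)
{
	dev->resets_scheduled++;
}

/* Common-layer helper (assumption): skip the driver hook unless the
 * channel is in the normal state. Returns true if the hook was reached.
 */
static bool watchdog_fire(struct fake_netdev *dev)
{
	if (dev->error_state != CHAN_IO_NORMAL)
		return false;

	if (dev->tx_timeout != NULL)
		dev->tx_timeout(dev);

	return true;
}
```

In this model no driver needs its own channel-state check, because the hook is never reached while the channel is frozen or permanently failed.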


* [PATCH v2 1/2] tg3: Increment tx_dropped in tg3_tso_bug()
From: alexey.pakhunov @ 2023-11-02 17:25 UTC (permalink / raw)
  To: mchan
  Cc: vincent.wong2, netdev, linux-kernel, siva.kallam, prashant,
	Alex Pakhunov
In-Reply-To: <20231102172503.3413318-1-alexey.pakhunov@spacex.com>

From: Alex Pakhunov <alexey.pakhunov@spacex.com>

tg3_tso_bug() drops a packet if it cannot be segmented for any reason.
The number of discarded frames should be incremented accordingly.

Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..99638e6c9e16 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7874,8 +7874,10 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 
 	segs = skb_gso_segment(skb, tp->dev->features &
 				    ~(NETIF_F_TSO | NETIF_F_TSO6));
-	if (IS_ERR(segs) || !segs)
+	if (IS_ERR(segs) || !segs) {
+		tp->tx_dropped++;
 		goto tg3_tso_bug_end;
+	}
 
 	skb_list_walk_safe(segs, seg, next) {
 		skb_mark_not_on_list(seg);
-- 
2.39.3


* [PATCH v2 2/2] tg3: Fix the TX ring stall
From: alexey.pakhunov @ 2023-11-02 17:25 UTC (permalink / raw)
  To: mchan
  Cc: vincent.wong2, netdev, linux-kernel, siva.kallam, prashant,
	Alex Pakhunov
In-Reply-To: <20231102172503.3413318-1-alexey.pakhunov@spacex.com>

From: Alex Pakhunov <alexey.pakhunov@spacex.com>

The TX ring maintained by the tg3 driver can end up in a state where it
has packets queued for sending but the NIC hardware is not informed, so no
progress is made. This leads to a multi-second interruption in network
traffic followed by dev_watchdog() firing and resetting the queue.

The specific sequence of steps is:

1. tg3_start_xmit() is called at least once and queues packet(s) without
   updating tnapi->prodmbox (netdev_xmit_more() returns true)
2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be
   called.
3. tg3_tso_bug() determines that the SKB is too large, ...

        if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {

   ... stops the queue, and returns NETDEV_TX_BUSY:

        netif_tx_stop_queue(txq);
        ...
        if (tg3_tx_avail(tnapi) <= frag_cnt_est)
                return NETDEV_TX_BUSY;

4. Since all tg3_tso_bug() call sites directly return, the code updating
   tnapi->prodmbox is skipped.

5. The queue is stuck now. tg3_start_xmit() is not called while the queue
   is stopped. The NIC is not processing new packets because
   tnapi->prodmbox wasn't updated. tg3_tx() is not called by
   tg3_poll_work() because all TX descriptors that could be freed have
   been freed:

        /* run TX completion thread */
        if (tnapi->hw_status->idx[0].tx_consumer != tnapi->tx_cons) {
                tg3_tx(tnapi);

6. Eventually, dev_watchdog() fires triggering a reset of the queue.

This fix makes sure that the tnapi->prodmbox update happens regardless of
the reason tg3_start_xmit() returned.

Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com>
Signed-off-by: Vincent Wong <vincent.wong2@spacex.com>
---
v2: Sort the local variables in tg3_start_xmit() in the RCS order
v1: https://lore.kernel.org/netdev/20231101191858.2611154-1-alexey.pakhunov@spacex.com/T/#t
---
 drivers/net/ethernet/broadcom/tg3.c | 53 +++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 99638e6c9e16..f7680d3e46da 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6603,9 +6603,9 @@ static void tg3_tx(struct tg3_napi *tnapi)
 
 	tnapi->tx_cons = sw_idx;
 
-	/* Need to make the tx_cons update visible to tg3_start_xmit()
+	/* Need to make the tx_cons update visible to __tg3_start_xmit()
 	 * before checking for netif_queue_stopped().  Without the
-	 * memory barrier, there is a small possibility that tg3_start_xmit()
+	 * memory barrier, there is a small possibility that __tg3_start_xmit()
 	 * will miss it and cause the queue to be stopped forever.
 	 */
 	smp_mb();
@@ -7845,7 +7845,7 @@ static bool tg3_tso_bug_gso_check(struct tg3_napi *tnapi, struct sk_buff *skb)
 	return skb_shinfo(skb)->gso_segs < tnapi->tx_pending / 3;
 }
 
-static netdev_tx_t tg3_start_xmit(struct sk_buff *, struct net_device *);
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *, struct net_device *);
 
 /* Use GSO to workaround all TSO packets that meet HW bug conditions
  * indicated in tg3_tx_frag_set()
@@ -7881,7 +7881,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 
 	skb_list_walk_safe(segs, seg, next) {
 		skb_mark_not_on_list(seg);
-		tg3_start_xmit(seg, tp->dev);
+		__tg3_start_xmit(seg, tp->dev);
 	}
 
 tg3_tso_bug_end:
@@ -7891,7 +7891,7 @@ static int tg3_tso_bug(struct tg3 *tp, struct tg3_napi *tnapi,
 }
 
 /* hard_start_xmit for all devices */
-static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
+static netdev_tx_t __tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct tg3 *tp = netdev_priv(dev);
 	u32 len, entry, base_flags, mss, vlan = 0;
@@ -8135,11 +8135,6 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			netif_tx_wake_queue(txq);
 	}
 
-	if (!netdev_xmit_more() || netif_xmit_stopped(txq)) {
-		/* Packets are ready, update Tx producer idx on card. */
-		tw32_tx_mbox(tnapi->prodmbox, entry);
-	}
-
 	return NETDEV_TX_OK;
 
 dma_error:
@@ -8152,6 +8147,42 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
+static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct netdev_queue *txq;
+	u16 skb_queue_mapping;
+	netdev_tx_t ret;
+
+	skb_queue_mapping = skb_get_queue_mapping(skb);
+	txq = netdev_get_tx_queue(dev, skb_queue_mapping);
+
+	ret = __tg3_start_xmit(skb, dev);
+
+	/* Notify the hardware that packets are ready by updating the TX ring
+	 * tail pointer. We respect netdev_xmit_more() thus avoiding poking
+	 * the hardware for every packet. To guarantee forward progress the TX
+	 * ring must be drained when it is full as indicated by
+	 * netif_xmit_stopped(). This needs to happen even when the current
+	 * skb was dropped or rejected with NETDEV_TX_BUSY. Otherwise packets
+	 * queued by previous __tg3_start_xmit() calls might get stuck in
+	 * the queue forever.
+	 */
+	if (!netdev_xmit_more() || netif_xmit_stopped(txq)) {
+		struct tg3_napi *tnapi;
+		struct tg3 *tp;
+
+		tp = netdev_priv(dev);
+		tnapi = &tp->napi[skb_queue_mapping];
+
+		if (tg3_flag(tp, ENABLE_TSS))
+			tnapi++;
+
+		tw32_tx_mbox(tnapi->prodmbox, tnapi->tx_prod);
+	}
+
+	return ret;
+}
+
 static void tg3_mac_loopback(struct tg3 *tp, bool enable)
 {
 	if (enable) {
@@ -17682,7 +17713,7 @@ static int tg3_init_one(struct pci_dev *pdev,
 	 * device behind the EPB cannot support DMA addresses > 40-bit.
 	 * On 64-bit systems with IOMMU, use 40-bit dma_mask.
 	 * On 64-bit systems without IOMMU, use 64-bit dma_mask and
-	 * do DMA address check in tg3_start_xmit().
+	 * do DMA address check in __tg3_start_xmit().
 	 */
 	if (tg3_flag(tp, IS_5788))
 		persist_dma_mask = dma_mask = DMA_BIT_MASK(32);
-- 
2.39.3


* [PATCH v2 0/2] tg3: Fix the TX ring stall
From: alexey.pakhunov @ 2023-11-02 17:25 UTC (permalink / raw)
  To: mchan
  Cc: vincent.wong2, netdev, linux-kernel, siva.kallam, prashant,
	Alex Pakhunov

From: Alex Pakhunov <alexey.pakhunov@spacex.com>

This patch fixes a problem with the tg3 driver we encountered on several
machines with a Broadcom 5719 NIC. The problem showed up as a 10-20 second
interruption in network traffic and these dmesg messages, followed by a dump
of the NIC registers:

=== dmesg ===
NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
...
RIP: 0010:dev_watchdog+0x21e/0x230
...
tg3 0000:02:00.2 eth0: transmit timed out, resetting
=== ===

The issue was observed with "4.15.0-52-lowlatency #56~16.04.1-Ubuntu" and
"4.15.0-161-lowlatency #169~16.04.1-Ubuntu" kernels.

Based on the state of the TX queue at the time of the reset and analysis of
dev_watchdog() it appeared that the NIC has not been notified about packets
accumulated in the TX ring for TG3_TX_TIMEOUT seconds and was reset:

=== dmesg ===
tg3 0000:02:00.2 eth0: 0: Host status block [00000001:000000a0:(0000:06d8:0000):(0000:01a0)]
tg3 0000:02:00.2 eth0: 0: NAPI info [000000a0:000000a0:(0188:01a0:01ff):0000:(06f2:0000:0000:0000)]
=== ===

tnapi->hw_status->idx[0].tx_consumer is the same as tnapi->tx_cons (0x1a0)
meaning that the driver has processed all TX descriptors released by
the NIC. tnapi->tx_prod (0x188) is ahead of 0x1a0 meaning that there are
more descriptors in the TX ring ready to be sent but the NIC does not know
about that yet.
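
The wrap-around arithmetic behind "0x188 is ahead of 0x1a0" can be sketched in plain C. This is an illustrative model rather than driver code: it assumes a 512-entry power-of-two ring and assumes the third value in the NAPI dump (0x1ff) is the configured descriptor limit; the helper names are made up for this example.

```c
#include <stdint.h>

#define RING_SIZE 512u	/* assumed power-of-two ring size */

/* Descriptors queued by the driver but not yet reclaimed, modulo the ring. */
static uint32_t ring_in_flight(uint32_t prod, uint32_t cons)
{
	return (prod - cons) & (RING_SIZE - 1u);
}

/* Free slots left, given a configured limit of `pending` descriptors. */
static uint32_t ring_avail(uint32_t pending, uint32_t prod, uint32_t cons)
{
	return pending - ring_in_flight(prod, cons);
}
```

Under these assumptions, ring_in_flight(0x188, 0x1a0) is 488 and ring_avail(0x1ff, 0x188, 0x1a0) is 23, i.e. the producer index has wrapped and sits numerically below the consumer index while the ring is nearly full.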

Further analysis showed that tg3_start_xmit() can stop the TX queue and
not tell the NIC about already enqueued packets. The specific sequence
is:

1. tg3_start_xmit() is called at least once and queues packet(s) without
   updating tnapi->prodmbox (netdev_xmit_more() returns true)
2. tg3_start_xmit() is called with an SKB which causes tg3_tso_bug() to be
   called.
3. tg3_tso_bug() determines that the SKB is too large [L7860], ...

        if (unlikely(tg3_tx_avail(tnapi) <= frag_cnt_est)) {

   ... stops the queue [L7861], and returns NETDEV_TX_BUSY [L7870]:

        netif_tx_stop_queue(txq);
        ...
        if (tg3_tx_avail(tnapi) <= frag_cnt_est)
                return NETDEV_TX_BUSY;

4. Since all tg3_tso_bug() call sites directly return, the code updating
   tnapi->prodmbox [L8138] is skipped.

5. The queue is stuck now. tg3_start_xmit() is not called while the queue
   is stopped. The NIC is not processing new packets because
   tnapi->prodmbox wasn't updated. tg3_tx() is not called by
   tg3_poll_work() because all TX descriptors that could be freed have
   been freed [L7159]:

        /* run TX completion thread */
        if (tnapi->hw_status->idx[0].tx_consumer != tnapi->tx_cons) {
                tg3_tx(tnapi);

6. Eventually, dev_watchdog() fires resetting the queue.
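
The sequence above can be replayed in a toy userspace model. Everything here is a simplification for illustration: `struct toy_txq` and the `toy_*` helpers are hypothetical names, the "NIC" consumes whatever the mailbox tells it about, and nothing is taken from the driver beyond the ordering of the steps.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one TX queue; field names are illustrative only. */
struct toy_txq {
	uint32_t tx_prod;	/* next descriptor the driver will fill */
	uint32_t mbox;		/* producer index last written to the NIC */
	uint32_t tx_cons;	/* descriptors the driver has reclaimed */
	uint32_t hw_cons;	/* descriptors the NIC reports as sent */
	bool stopped;		/* queue stopped by the driver */
};

/* Step 1: queue a packet; ring the doorbell only if no more follow. */
static void toy_xmit(struct toy_txq *q, bool xmit_more)
{
	q->tx_prod++;
	if (!xmit_more)
		q->mbox = q->tx_prod;
}

/* Steps 2-4: the "too large" path stops the queue and returns early,
 * so the doorbell write is skipped.
 */
static void toy_xmit_busy(struct toy_txq *q)
{
	q->stopped = true;
}

/* The NIC sends everything it was told about; the driver reclaims it. */
static void toy_nic_and_reclaim(struct toy_txq *q)
{
	q->hw_cons = q->mbox;
	q->tx_cons = q->hw_cons;
}

/* Step 5: no party can make progress any more. */
static bool toy_stalled(const struct toy_txq *q)
{
	return q->stopped &&			/* no further xmit calls */
	       q->hw_cons == q->tx_cons &&	/* completion path is idle */
	       q->mbox != q->tx_prod;		/* NIC unaware of queued work */
}
```

Calling toy_xmit() with xmit_more set and then toy_xmit_busy() leaves the model in the stalled state; in the real driver only the watchdog in step 6 gets out of it.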

As far as I can tell this sequence is still possible in HEAD of master.

I could not reproduce this stall by generating traffic to match conditions
required for tg3_tso_bug() to be called. Based on the driver's code
the SKB must be a TSO or GSO skb; it should contain a VLAN tag or extra TCP
header options; and it should be queued at exactly the right time.
I believe that the last part is what makes reproducing it harder.

However I was able to reproduce the stall by mimicking the behavior of
tg3_tso_bug() in tg3_start_xmit(). I added the following lines to
tg3_start_xmit() before "would_hit_hwbug = 0;" [L8046]:

        if (...) {
                netif_tx_stop_queue(txq);
                return NETDEV_TX_BUSY;
        }

        would_hit_hwbug = 0;

The condition is not super relevant. It was used to control when the stall
is induced, so that the network is not completely broken during testing.
This approach reproduced the issue rather reliably.

The proposed fix makes sure that the tnapi->prodmbox update happens
regardless of the reason tg3_start_xmit() returned. It essentially moves
the code updating tnapi->prodmbox from tg3_start_xmit() (which is renamed
to __tg3_start_xmit()) to a new wrapper. This makes sure all return paths
are covered.
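
The shape of the wrapper approach, reduced to a userspace sketch. The types and names (`struct toy_ring`, `toy_xmit_inner()`, `toy_xmit()`) are hypothetical and the logic is heavily simplified; the point is only that the doorbell decision is made in one place after the inner function returns, whatever path it took.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_ret { TOY_TX_OK = 0, TOY_TX_BUSY = 1 };

struct toy_ring {
	uint32_t tx_prod;	/* descriptors queued so far */
	uint32_t mbox;		/* producer index the NIC knows about */
	bool stopped;		/* queue stopped */
};

/* Inner transmit: several return paths, none of which touches the
 * doorbell. `too_large` models the early bail-out with the queue stopped.
 */
static enum toy_ret toy_xmit_inner(struct toy_ring *r, bool too_large)
{
	if (too_large) {
		r->stopped = true;
		return TOY_TX_BUSY;
	}

	r->tx_prod++;
	return TOY_TX_OK;
}

/* Wrapper: publish the producer index once, regardless of how the
 * inner call returned.
 */
static enum toy_ret toy_xmit(struct toy_ring *r, bool too_large, bool xmit_more)
{
	enum toy_ret ret = toy_xmit_inner(r, too_large);

	if (!xmit_more || r->stopped)
		r->mbox = r->tx_prod;

	return ret;
}
```

A later early return added to the inner function is covered automatically, which is the property the goto-based alternative would have to maintain by hand.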

I tested this fix with the code inducing the TX stall from above. The fix
eliminated stalls completely.

An alternative approach, jumping to the code updating tnapi->prodmbox after
returning from tg3_tso_bug(), was considered. It yields a patch of almost
the same size. There are four branches in tg3_start_xmit() that would
need the goto: three tg3_tso_bug() call sites and the early return in
the very beginning of tg3_start_xmit(). This seemed like a more fragile
approach too since anyone modifying the function would need to be careful
to preserve the invariant of leaving it through a particular branch.

Alex Pakhunov (2):
  tg3: Increment tx_dropped in tg3_tso_bug()
  tg3: Fix the TX ring stall

 drivers/net/ethernet/broadcom/tg3.c | 57 +++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 12 deletions(-)

base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa
-- 
2.39.3

* Re: [net-next RFC PATCH v3 2/4] net: phy: aquantia: move MMD_VEND define to header
From: Andrew Lunn @ 2023-11-02 17:19 UTC (permalink / raw)
  To: Christian Marangi
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiner Kallweit,
	Russell King, Robert Marko, Vladimir Oltean, netdev, devicetree,
	linux-kernel
In-Reply-To: <20231102150032.10740-2-ansuelsmth@gmail.com>

On Thu, Nov 02, 2023 at 04:00:30PM +0100, Christian Marangi wrote:
> Move MMD_VEND define to header to clean things up and in preparation for
> firmware loading support that require some define placed in
> aquantia_main.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

* Re: [PATCH net-next 9/9] mptcp: refactor sndbuf auto-tuning
From: Eric Dumazet @ 2023-11-02 17:19 UTC (permalink / raw)
  To: Mat Martineau
  Cc: Matthieu Baerts, David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev, mptcp
In-Reply-To: <20231023-send-net-next-20231023-2-v1-9-9dc60939d371@kernel.org>

On Mon, Oct 23, 2023 at 10:45 PM Mat Martineau <martineau@kernel.org> wrote:
>
> From: Paolo Abeni <pabeni@redhat.com>
>
> The MPTCP protocol accounts for the data enqueued on all the subflows
> to the main socket send buffer, while the send buffer auto-tuning
> algorithm sets the main socket send buffer size as the max size among
> the subflows.
>
> That causes bad performance when at least one subflow is sndbuf
> limited, e.g. due to very high latency, as the MPTCP scheduler can't
> even fill such a buffer.
>
> Change the send-buffer auto-tuning algorithm to compute the main socket
> send buffer size as the sum of all the subflow buffer sizes.
>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Mat Martineau <martineau@kernel.org>

...

> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index df208666fd19..2b43577f952e 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -421,6 +421,7 @@ static bool subflow_use_different_dport(struct mptcp_sock *msk, const struct soc
>
>  void __mptcp_set_connected(struct sock *sk)
>  {
> +       __mptcp_propagate_sndbuf(sk, mptcp_sk(sk)->first);

->first can be NULL here, according to syzbot.

>         if (sk->sk_state == TCP_SYN_SENT) {
>                 inet_sk_state_store(sk, TCP_ESTABLISHED);
>                 sk->sk_state_change(sk);

* Re: [PATCH net] netlink: fill in missing MODULE_DESCRIPTION()
From: Jakub Kicinski @ 2023-11-02 17:14 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Jiri Pirko, davem, netdev, edumazet, pabeni
In-Reply-To: <20231102120533.GL6174@breakpoint.cc>

On Thu, 2 Nov 2023 13:05:33 +0100 Florian Westphal wrote:
> > It's a bit odd to target -net with this, isn't it?  

I mostly wanted to make sure the build bot still works after
we sucked in all the code from Linus. There were no patches getting
posted but...

> I had planned to fill the missing descriptions for
> all netfilter via next nf.git PR as I consider those as
> bug fixes.
> 
> What's the regression risk here?

+1 for getting it into net, no regression risk and it's low key
annoying to see all these warnings.

* Re: [PATCH] [PATCH net] tg3: power down device only on SYSTEM_POWER_OFF
From: Pavan Chebbi @ 2023-11-02 17:11 UTC (permalink / raw)
  To: George Shuklin; +Cc: netdev, Andrew Gospodarek, Michael Chan
In-Reply-To: <5c778d51-ec87-4e74-9fd6-63dc4a9ae2a6@gmail.com>

On Thu, Nov 2, 2023 at 7:44 PM George Shuklin <george.shuklin@gmail.com> wrote:
>
> On 11/2/23 09:04, Pavan Chebbi wrote:
> > On Thu, Nov 2, 2023 at 1:28 AM George Shuklin <george.shuklin@gmail.com> wrote:
> >> On 01/11/2023 17:20, Pavan Chebbi wrote:
> >>> On Wed, Nov 1, 2023 at 6:34 PM George Shuklin <george.shuklin@gmail.com> wrote:
> >>>> Dell R650xs servers hang if the tg3 driver calls tg3_power_down.
> >>>>
> >>>> This happens only if network adapters (BCM5720 for R650xs) were
> >>>> initialized using SNP (e.g. by booting ipxe.efi).
> >>>>
> >>>> This is partial revert of commit 2ca1c94ce0b.
> >>>>
> >>>> The actual problem is on Dell side, but this fix allows servers
> >>>> to come back alive after reboot.
> >>> How are you sure that the problem solved by 2ca1c94ce0b is not
> >>> reintroduced with this change?
> >> I contacted the author of original patch, no reply yet (1st day). Also,
> >> I tested it on few generations of available Dell servers (R330, R340,
> >> R350 and R650xs, for which this fix should help). It does produce log
> >> message from
> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471, but, at
> >> least, it reboots without issues.
> >>
> >> Actually, the original patch is a regression: 5.19 reboots just fine, 6.0
> >> starts to hang. I also reported it to the Dell support forum, but I'm not
> >> sure if they picked it up or not.
> >>
> >> What would be the proper course of actions for such problem (outside of
> >> fixing UEFI SNP, for which I don't have access to sources)?
> >>
> > Thanks for the explanation. I am not sure if we should make this
> > change unless we are 100pc sure that this patch won't cause
> > regression.
> > I feel Dell is in the best position to debug this and they can even
> > contact Broadcom if they see any problem in UEFI.
>
> I'm right now with dell support, and what they asked is to 'try this on
> supported distros', which at newest are 5.15. I'll try to bypass their
> L1 with Ubuntu + HWE to get to 6+ versions...
>
> I was able to reproduce hanging at reboot there (without ACPI messages),
> and patching helps there too.
>
OK. I am not too sure what we should do. The change as such looks fine to me.
Of course, the patch needs proper tags (tree.fixes, cc ...)

@Michael : Do you have any suggestions on this?


* Re: [PATCH net 6/7] net: hns3: fix VF reset fail issue
From: Paolo Abeni @ 2023-11-02 16:24 UTC (permalink / raw)
  To: Jijie Shao, yisen.zhuang, salil.mehta, davem, edumazet, kuba
  Cc: shenjian15, wangjie125, liuyonglong, netdev, linux-kernel
In-Reply-To: <c87cfcbc-8cd6-4a01-bac0-74113f7ca904@huawei.com>

On Thu, 2023-11-02 at 20:16 +0800, Jijie Shao wrote:
> on 2023/11/2 18:45, Paolo Abeni wrote:
> > On Sat, 2023-10-28 at 10:59 +0800, Jijie Shao wrote:
> > >   
> > > -static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr)
> > > +static void hclgevf_clear_event_cause(struct hclgevf_dev *hdev, u32 regclr,
> > > +				      bool need_dalay)
> > >   {
> > > +#define HCLGEVF_RESET_DELAY		5
> > > +
> > > +	if (need_dalay)
> > > +		mdelay(HCLGEVF_RESET_DELAY);
> > 5ms delay in an interrupt handler is quite a lot. What about scheduling
> > a timer from the IH to clear the register when such delay is needed?
> > 
> > Thanks!
> > 
> > Paolo
> 
> Using timer in this case will complicate the code and make maintenance difficult.

Why? 

Would something like the following be ok? (plus reset_timer
initialization at vf creation and cleanup at vf removal time):

---
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index a4d68fb216fb..626bc67065fc 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1974,6 +1974,14 @@ static enum hclgevf_evt_cause hclgevf_check_evt_cause(struct hclgevf_dev *hdev,
 	return HCLGEVF_VECTOR0_EVENT_OTHER;
 }
 
+static void hclgevf_reset_timer(struct timer_list *t)
+{
+	struct hclgevf_dev *hdev = from_timer(hdev, t, reset_timer);
+
+	hclgevf_clear_event_cause(hdev, HCLGEVF_VECTOR0_EVENT_RST);
+	hclgevf_reset_task_schedule(hdev);
+}
+
 static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
 {
 	enum hclgevf_evt_cause event_cause;
@@ -1982,13 +1990,13 @@ static irqreturn_t hclgevf_misc_irq_handle(int irq, void *data)
 
 	hclgevf_enable_vector(&hdev->misc_vector, false);
 	event_cause = hclgevf_check_evt_cause(hdev, &clearval);
+	if (event_cause == HCLGEVF_VECTOR0_EVENT_RST)
+		mod_timer(&hdev->reset_timer, jiffies + msecs_to_jiffies(5));
+
 	if (event_cause != HCLGEVF_VECTOR0_EVENT_OTHER)
 		hclgevf_clear_event_cause(hdev, clearval);
 
 	switch (event_cause) {
-	case HCLGEVF_VECTOR0_EVENT_RST:
-		hclgevf_reset_task_schedule(hdev);
-		break;
 	case HCLGEVF_VECTOR0_EVENT_MBX:
 		hclgevf_mbx_handler(hdev);
 		break;
---

> We are considering reducing the delay time by polling. For example,
> the code checks every 50 us whether the register write has taken effect.
> If so, the function returns immediately; otherwise it keeps polling for up to 5 ms.
> 
> Is this method appropriate?

IMHO such a solution will not remove the problem. How frequently is
the irq that incurs such a delay expected to fire?
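
To make the worst case concrete, here is a userspace model of the polling idea. It is only a sketch: `poll_wait_us()` and the two readiness callbacks are hypothetical, time is simulated by counting steps instead of actually delaying, and the 50 us / 5 ms figures are the ones from the proposal above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poll `ready(ctx)` every `step_us` until it returns true or
 * `budget_us` is used up. Returns the simulated time waited.
 */
static uint32_t poll_wait_us(bool (*ready)(void *), void *ctx,
			     uint32_t step_us, uint32_t budget_us)
{
	uint32_t waited = 0;

	while (waited < budget_us && !ready(ctx))
		waited += step_us;

	return waited;
}

/* Becomes ready after the number of polls stored in *ctx. */
static bool ready_after(void *ctx)
{
	uint32_t *polls_left = ctx;

	if (*polls_left == 0)
		return true;

	(*polls_left)--;
	return false;
}

/* Models a register write that never takes effect. */
static bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}
```

With a 50 us step and a 5000 us budget, a condition that becomes true on the fourth check costs 150 us in this model, but one that never becomes true still costs the full 5000 us, so the bound on time spent in the handler is unchanged.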

Thanks

Paolo



* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Vladimir Oltean @ 2023-11-02 16:21 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Luiz Angelo Daros de Luca, netdev, alsi, andrew, vivien.didelot,
	f.fainelli, davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <20231102155521.2yo5qpugdhkjy22x@skbuf>

On Thu, Nov 02, 2023 at 05:55:21PM +0200, Vladimir Oltean wrote:
> +static int __init realtek_interface_init(void)
> +{
> +	int err;
> +
> +	err = realtek_mdio_init();
> +	if (err)
> +		return err;
> +
> +	err = realtek_smi_init();
> +	if (err) {
> +		realtek_smi_exit();

One more correction, this was supposed to be realtek_mdio_exit().

> +		return err;
> +	}
> +
> +	return 0;
> +}
> +module_init(realtek_interface_init);
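
The unwinding rule behind this correction, as a self-contained userspace sketch. The `fake_*` functions and the bookkeeping flags are hypothetical stand-ins for the two registration steps; only the ordering mirrors the quoted code.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping so the sketch can be checked. */
static bool mdio_registered;
static bool smi_registered;
static bool smi_should_fail;

static int fake_mdio_init(void)
{
	mdio_registered = true;
	return 0;
}

static void fake_mdio_exit(void)
{
	mdio_registered = false;
}

static int fake_smi_init(void)
{
	if (smi_should_fail)
		return -1;

	smi_registered = true;
	return 0;
}

static int fake_interface_init(void)
{
	int err;

	err = fake_mdio_init();
	if (err)
		return err;

	err = fake_smi_init();
	if (err) {
		/* Undo the step that succeeded, not the one that failed. */
		fake_mdio_exit();
		return err;
	}

	return 0;
}
```

When the second step fails, nothing from it needs undoing; the only state left behind is the first registration, which is why the exit call has to be the MDIO one.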

* [PATCH v2] net/tg3: fix race condition in tg3_reset_task()
From: Thinh Tran @ 2023-11-02 16:12 UTC (permalink / raw)
  To: netdev, siva.kallam, prashant, mchan, pavan.chebbi, drc
  Cc: venkata.sai.duggi, Thinh Tran
In-Reply-To: <20231002185510.1488-1-thinhtr@linux.vnet.ibm.com>

When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:

   enum {
      /* I/O channel is in normal state */
      pci_channel_io_normal = (__force pci_channel_state_t) 1,

      /* I/O to channel is blocked */
      pci_channel_io_frozen = (__force pci_channel_state_t) 2,

      /* PCI card is dead */
      pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
   };

If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.

EEH driver gets error event
--> eeh_set_channel_state()
    and set device to one of
    error state above           scheduler: tg3_reset_task() get
                                returned error from tg3_init_hw()
                             --> dev_close() shuts down the interface

tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device


To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_tx_timeout() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state.  (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and 
tg3_io_resume() functions as appropriate. 



Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>

---
 drivers/net/ethernet/broadcom/tg3.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 14b311196b8f..1c72ef05ab1b 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7630,6 +7630,26 @@ static void tg3_tx_timeout(struct net_device *dev, unsigned int txqueue)
 {
 	struct tg3 *tp = netdev_priv(dev);
 
+	/* checking the PCI channel state for hard errors
+	 * for pci_channel_io_frozen case
+	 *   - I/O to channel is blocked.
+	 *     The EEH layer and I/O error detections will
+	 *     handle the reset procedure
+	 * for pci_channel_io_perm_failure  case
+	 *   - the PCI card is dead.
+	 *     The reset will not help
+	 * report the error for both cases and return.
+	 */
+	if (tp->pdev->error_state == pci_channel_io_frozen) {
+		netdev_err(dev, " %s, I/O to channel is blocked\n", __func__);
+		return;
+	}
+
+	if (tp->pdev->error_state == pci_channel_io_perm_failure) {
+		netdev_err(dev, " %s, adapter has failed permanently!\n", __func__);
+		return;
+	}
+
 	if (netif_msg_tx_err(tp)) {
 		netdev_err(dev, "transmit timed out, resetting\n");
 		tg3_dump_state(tp);
-- 
2.25.1


* Re: [PATCH bpf-next v3 1/2] bpf: add skcipher API support to TC/XDP programs
From: Vadim Fedorenko @ 2023-11-02 16:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Martin KaFai Lau, Song Liu, bpf, Network Development,
	Linux Crypto Mailing List, Jakub Kicinski, Andrii Nakryiko,
	Alexei Starovoitov, Mykola Lysenko, Vadim Fedorenko,
	David S. Miller, Herbert Xu
In-Reply-To: <CAADnVQ+9pp33zv9DxouEmg24o7w27OKFUcvKChHuby_+d6-bLg@mail.gmail.com>

On 02/11/2023 15:36, Alexei Starovoitov wrote:
> On Thu, Nov 2, 2023 at 6:44 AM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
>>
>> On 01/11/2023 23:41, Martin KaFai Lau wrote:
>>> On 11/1/23 3:50 PM, Vadim Fedorenko wrote:
>>>>>> +static void *__bpf_dynptr_data_ptr(const struct bpf_dynptr_kern *ptr)
>>>>>> +{
>>>>>> +    enum bpf_dynptr_type type;
>>>>>> +
>>>>>> +    if (!ptr->data)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    type = bpf_dynptr_get_type(ptr);
>>>>>> +
>>>>>> +    switch (type) {
>>>>>> +    case BPF_DYNPTR_TYPE_LOCAL:
>>>>>> +    case BPF_DYNPTR_TYPE_RINGBUF:
>>>>>> +        return ptr->data + ptr->offset;
>>>>>> +    case BPF_DYNPTR_TYPE_SKB:
>>>>>> +        return skb_pointer_if_linear(ptr->data, ptr->offset,
>>>>>> __bpf_dynptr_size(ptr));
>>>>>> +    case BPF_DYNPTR_TYPE_XDP:
>>>>>> +    {
>>>>>> +        void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset,
>>>>>> __bpf_dynptr_size(ptr));
>>>>>
>>>>> I suspect what it is doing here (for skb and xdp in particular) is
>>>>> very similar to bpf_dynptr_slice. Please check if
>>>>> bpf_dynptr_slice(ptr, 0, NULL, sz) will work.
>>>>>
>>>>
>>>> Well, yes, it's a simplified version of bpf_dynptr_slice. The problem is
>>>> that bpf_dynptr_slice is a bpf_kfunc, which cannot be used in another
>>>> bpf_kfunc. Should I refactor the code to use it in both places? Like
>>>
>>> Sorry, scrolled too fast in my earlier reply :(
>>>
>>> I am not aware of this limitation. What error does it have?
>>> The bpf_dynptr_slice_rdwr kfunc() is also calling the bpf_dynptr_slice()
>>> kfunc.
>>>
>>>> create __bpf_dynptr_slice() which will be internal part of bpf_kfunc?
>>
>> Apparently Song has a patch to expose these bpf_dynptr_slice* functions
>> to in-kernel users.
>>
>> https://lore.kernel.org/bpf/20231024235551.2769174-2-song@kernel.org/
>>
>> Should I wait for it to be merged before sending next version?
> 
> If you need something from another developer it's best to ask them
> explicitly :)
> In this case Song can respin with just that change that you need.

Got it. I actually need 2 different changes from the same patchset, I'll 
ping Song in the appropriate thread, thanks!


* [PATCH] net: ti: icssg-prueth: Add missing icss_iep_put to error path
From: Jan Kiszka @ 2023-11-02 16:03 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	MD Danish Anwar
  Cc: netdev, linux-kernel, Lopes Ivo, Diogo Miguel (T CED IFD-PT),
	Nishanth Menon, Su, Bao Cheng (RC-CN DF FA R&D)

From: Jan Kiszka <jan.kiszka@siemens.com>

Analogously to prueth_remove.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

This was lost from the TI SDK version while ripping out SR1.0 support - 
which we are currently restoring for upstream.

 drivers/net/ethernet/ti/icssg/icssg_prueth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
index ffae89a6ccc5..0242e123fc05 100644
--- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
+++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
@@ -2200,6 +2200,9 @@ static int prueth_probe(struct platform_device *pdev)
 	if (prueth->pdata.quirk_10m_link_issue)
 		icss_iep_exit_fw(prueth->iep1);
 
+	icss_iep_put(prueth->iep1);
+	icss_iep_put(prueth->iep0);
+
 free_pool:
 	gen_pool_free(prueth->sram_pool,
 		      (unsigned long)prueth->msmcram.va, msmc_ram_size);
-- 
2.35.3

^ permalink raw reply related

* Re: [PATCH] net/tg3: fix race condition in tg3_reset_task_cancel()
From: Thinh Tran @ 2023-11-02 16:02 UTC (permalink / raw)
  To: Michael Chan
  Cc: netdev, siva.kallam, prashant, mchan, drc, pavan.chebbi,
	Venkata Sai Duggi
In-Reply-To: <09dbfd72-0efb-0275-9589-6178c9aca8a1@linux.vnet.ibm.com>

Hi Michael,

On 10/3/2023 5:05 PM, Thinh Tran wrote:
> Thanks for the review.
> 
> On 10/3/2023 4:37 AM, Michael Chan wrote:
> 
>>
>> tg3_flag_set() calls set_bit() which is atomic.  The same is true for
>> tg3_flag_clear().  Maybe we just need some smp_mb__after_atomic() or
>> similar memory barriers.
>>
> 
> I did not see it being used in this driver. I'll try that.
> 
> Thinh Tran
> 

I tried that, but the intermittent problem still persists.
I have a fix that I'll describe in V2 of the patch.

Thinh Tran
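
For readers following the barrier suggestion quoted above: set_bit() and
clear_bit() are atomic but do not by themselves order surrounding memory
accesses, which is why smp_mb__after_atomic() was proposed. The sketch
below is only a userspace C11 analogy of that pattern, not tg3 or kernel
code. The names flag_set(), flag_clear(), flag_test() and the bit number
are hypothetical. As reported above, adding the barrier did not resolve
the intermittent problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, for illustration only. */
enum { FLAG_RESET_TASK_PENDING = 0 };

/* One word of flag bits shared between threads. */
static atomic_ulong flags;

/* Atomic but unordered update, followed by an explicit full fence.
 * Rough analogy of set_bit() + smp_mb__after_atomic().
 */
void flag_set(unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	atomic_fetch_or_explicit(&flags, 1UL << bit, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Rough analogy of clear_bit() + smp_mb__after_atomic(). */
void flag_clear(unsigned int bit)
{
	assert(bit < 8 * sizeof(unsigned long));
	atomic_fetch_and_explicit(&flags, ~(1UL << bit), memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Read one flag bit. */
bool flag_test(unsigned int bit)
{
	unsigned long v = atomic_load_explicit(&flags, memory_order_acquire);

	return (v & (1UL << bit)) != 0;
}
```

The fence only orders the flag update against later accesses from the
same thread. It does not make a check-then-act sequence on the flag
atomic as a whole.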

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Vladimir Oltean @ 2023-11-02 15:57 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Luiz Angelo Daros de Luca, netdev, alsi, andrew, vivien.didelot,
	f.fainelli, davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <20231102155521.2yo5qpugdhkjy22x@skbuf>

On Thu, Nov 02, 2023 at 05:55:21PM +0200, Vladimir Oltean wrote:
> diff --git a/drivers/net/dsa/realtek/Kconfig b/drivers/net/dsa/realtek/Kconfig
> index 060165a85fb7..857a039fb0f1 100644
> --- a/drivers/net/dsa/realtek/Kconfig
> +++ b/drivers/net/dsa/realtek/Kconfig
> @@ -15,39 +15,37 @@ menuconfig NET_DSA_REALTEK
>  
>  if NET_DSA_REALTEK
>  
> +config NET_DSA_REALTEK_INTERFACE
> +	tristate
> +	help
> +	  Common interface driver for accessing Realtek switches, either
> +	  through MDIO or SMI.
> +
>  config NET_DSA_REALTEK_MDIO
> -	tristate "Realtek MDIO interface driver"
> -	depends on OF
> -	depends on NET_DSA_REALTEK_RTL8365MB || NET_DSA_REALTEK_RTL8366RB
> -	depends on NET_DSA_REALTEK_RTL8365MB || !NET_DSA_REALTEK_RTL8365MB
> -	depends on NET_DSA_REALTEK_RTL8366RB || !NET_DSA_REALTEK_RTL8366RB
> +	tristate "Realtek MDIO interface support"

I meant to also make "config NET_DSA_REALTEK_MDIO" a bool and not tristate.

>  	help
>  	  Select to enable support for registering switches configured
>  	  through MDIO.
>  
>  config NET_DSA_REALTEK_SMI
> -	tristate "Realtek SMI interface driver"
> -	depends on OF
> -	depends on NET_DSA_REALTEK_RTL8365MB || NET_DSA_REALTEK_RTL8366RB
> -	depends on NET_DSA_REALTEK_RTL8365MB || !NET_DSA_REALTEK_RTL8365MB
> -	depends on NET_DSA_REALTEK_RTL8366RB || !NET_DSA_REALTEK_RTL8366RB
> +	bool "Realtek SMI interface support"
>  	help
>  	  Select to enable support for registering switches connected
>  	  through SMI.
>  
>  config NET_DSA_REALTEK_RTL8365MB
>  	tristate "Realtek RTL8365MB switch subdriver"
> -	imply NET_DSA_REALTEK_SMI
> -	imply NET_DSA_REALTEK_MDIO
> +	select NET_DSA_REALTEK_INTERFACE
>  	select NET_DSA_TAG_RTL8_4
> +	depends on OF
>  	help
>  	  Select to enable support for Realtek RTL8365MB-VC and RTL8367S.
>  
>  config NET_DSA_REALTEK_RTL8366RB
>  	tristate "Realtek RTL8366RB switch subdriver"
> -	imply NET_DSA_REALTEK_SMI
> -	imply NET_DSA_REALTEK_MDIO
> +	select NET_DSA_REALTEK_INTERFACE
>  	select NET_DSA_TAG_RTL4_A
> +	depends on OF
>  	help
>  	  Select to enable support for Realtek RTL8366RB.

^ permalink raw reply

* [PATCH iwl-next v2] ice: Reset VF on Tx MDD event
From: Pawel Chmielewski @ 2023-11-02 15:51 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, pmenzel, lukasz.czapnik, Liang-Min Wang,
	Pawel Chmielewski, Michal Swiatkowski

From: Liang-Min Wang <liang-min.wang@intel.com>

When a VF sends malformed packets that are classified as malicious, the
Tx queue can sometimes freeze. The frozen queue can remain stuck and
unusable for several minutes. This behavior can be reproduced with the
DPDK application testpmd.

When a Malicious Driver Detection event occurs, perform a graceful VF
reset to quickly bring the VF back to an operational state. Add a log
message to report the cause of the reset.

Signed-off-by: Liang-Min Wang <liang-min.wang@intel.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
Changelog
v1->v2:
Reverted unneeded formatting change, fixed commit message, fixed a log
message with a correct event name.
---

 drivers/net/ethernet/intel/ice/ice_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 3c9419b05a2a..ee9752af6397 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1839,6 +1839,10 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
 			if (netif_msg_tx_err(pf))
 				dev_info(dev, "Malicious Driver Detection event TX_TCLAN detected on VF %d\n",
 					 vf->vf_id);
+			dev_info(dev,
+				 "PF-to-VF reset on VF %d due to Tx MDD TX_TCLAN event\n",
+				 vf->vf_id);
+			ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
 		}
 
 		reg = rd32(hw, VP_MDET_TX_TDPU(vf->vf_id));
@@ -1849,6 +1853,10 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
 			if (netif_msg_tx_err(pf))
 				dev_info(dev, "Malicious Driver Detection event TX_TDPU detected on VF %d\n",
 					 vf->vf_id);
+			dev_info(dev,
+				 "PF-to-VF reset on VF %d due to Tx MDD TX_TDPU event\n",
+				 vf->vf_id);
+			ice_reset_vf(vf, ICE_VF_RESET_NOTIFY);
 		}
 
 		reg = rd32(hw, VP_MDET_RX(vf->vf_id));
-- 
2.37.3


^ permalink raw reply related

* Re: [PATCH net-next v2 3/3] net: dsa: realtek: support reset controller
From: Vladimir Oltean @ 2023-11-02 15:55 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Luiz Angelo Daros de Luca, netdev, alsi, andrew, vivien.didelot,
	f.fainelli, davem, kuba, pabeni, robh+dt, krzk+dt, arinc.unal
In-Reply-To: <CACRpkdairxqm_YVshEuk_KbnZw9oH2sKiHapY_sTrgc85_+AmQ@mail.gmail.com>

On Thu, Nov 02, 2023 at 03:59:48PM +0100, Linus Walleij wrote:
> I don't know if this is an answer to your question, but look at what I did in
> 
> drivers/usb/fotg210/Makefile:
> 
> # This setup links the different object files into one single
> # module so we don't have to EXPORT() a lot of internal symbols
> # or create unnecessary submodules.
> fotg210-objs-y                          += fotg210-core.o
> fotg210-objs-$(CONFIG_USB_FOTG210_HCD)  += fotg210-hcd.o
> fotg210-objs-$(CONFIG_USB_FOTG210_UDC)  += fotg210-udc.o
> fotg210-objs                            := $(fotg210-objs-y)
> obj-$(CONFIG_USB_FOTG210)               += fotg210.o
> 
> Everything starting with CONFIG_* is a Kconfig option obviously.
> 
> The final module is just one file named fotg210.ko no matter whether
> HCD (host controller), UDC (device controller) or both parts were
> compiled into it. Often you just need one of them, sometimes you may
> need both.
> 
> It's a pretty clean example of how you do this "one module from
> several optional parts" using Kbuild.

To be clear, something like this is what you mean, right?

diff --git a/drivers/net/dsa/realtek/Kconfig b/drivers/net/dsa/realtek/Kconfig
index 060165a85fb7..857a039fb0f1 100644
--- a/drivers/net/dsa/realtek/Kconfig
+++ b/drivers/net/dsa/realtek/Kconfig
@@ -15,39 +15,37 @@ menuconfig NET_DSA_REALTEK
 
 if NET_DSA_REALTEK
 
+config NET_DSA_REALTEK_INTERFACE
+	tristate
+	help
+	  Common interface driver for accessing Realtek switches, either
+	  through MDIO or SMI.
+
 config NET_DSA_REALTEK_MDIO
-	tristate "Realtek MDIO interface driver"
-	depends on OF
-	depends on NET_DSA_REALTEK_RTL8365MB || NET_DSA_REALTEK_RTL8366RB
-	depends on NET_DSA_REALTEK_RTL8365MB || !NET_DSA_REALTEK_RTL8365MB
-	depends on NET_DSA_REALTEK_RTL8366RB || !NET_DSA_REALTEK_RTL8366RB
+	tristate "Realtek MDIO interface support"
 	help
 	  Select to enable support for registering switches configured
 	  through MDIO.
 
 config NET_DSA_REALTEK_SMI
-	tristate "Realtek SMI interface driver"
-	depends on OF
-	depends on NET_DSA_REALTEK_RTL8365MB || NET_DSA_REALTEK_RTL8366RB
-	depends on NET_DSA_REALTEK_RTL8365MB || !NET_DSA_REALTEK_RTL8365MB
-	depends on NET_DSA_REALTEK_RTL8366RB || !NET_DSA_REALTEK_RTL8366RB
+	bool "Realtek SMI interface support"
 	help
 	  Select to enable support for registering switches connected
 	  through SMI.
 
 config NET_DSA_REALTEK_RTL8365MB
 	tristate "Realtek RTL8365MB switch subdriver"
-	imply NET_DSA_REALTEK_SMI
-	imply NET_DSA_REALTEK_MDIO
+	select NET_DSA_REALTEK_INTERFACE
 	select NET_DSA_TAG_RTL8_4
+	depends on OF
 	help
 	  Select to enable support for Realtek RTL8365MB-VC and RTL8367S.
 
 config NET_DSA_REALTEK_RTL8366RB
 	tristate "Realtek RTL8366RB switch subdriver"
-	imply NET_DSA_REALTEK_SMI
-	imply NET_DSA_REALTEK_MDIO
+	select NET_DSA_REALTEK_INTERFACE
 	select NET_DSA_TAG_RTL4_A
+	depends on OF
 	help
 	  Select to enable support for Realtek RTL8366RB.
 
diff --git a/drivers/net/dsa/realtek/Makefile b/drivers/net/dsa/realtek/Makefile
index 0aab57252a7c..35b7734c0ad0 100644
--- a/drivers/net/dsa/realtek/Makefile
+++ b/drivers/net/dsa/realtek/Makefile
@@ -1,6 +1,15 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_NET_DSA_REALTEK_MDIO) 	+= realtek-mdio.o
-obj-$(CONFIG_NET_DSA_REALTEK_SMI) 	+= realtek-smi.o
+
+obj-$(CONFIG_NET_DSA_REALTEK_INTERFACE) := realtek-interface.o
+
+realtek-interface-objs			:= realtek-interface-common.o
+ifdef CONFIG_NET_DSA_REALTEK_MDIO
+realtek-interface-objs			+= realtek-mdio.o
+endif
+ifdef CONFIG_NET_DSA_REALTEK_SMI
+realtek-interface-objs			+= realtek-smi.o
+endif
+
 obj-$(CONFIG_NET_DSA_REALTEK_RTL8366RB) += rtl8366.o
 rtl8366-objs 				:= rtl8366-core.o rtl8366rb.o
 obj-$(CONFIG_NET_DSA_REALTEK_RTL8365MB) += rtl8365mb.o
diff --git a/drivers/net/dsa/realtek/realtek-interface-common.c b/drivers/net/dsa/realtek/realtek-interface-common.c
new file mode 100644
index 000000000000..bb7c77cdb9e2
--- /dev/null
+++ b/drivers/net/dsa/realtek/realtek-interface-common.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/module.h>
+
+#include "realtek-mdio.h"
+#include "realtek-smi.h"
+
+static int __init realtek_interface_init(void)
+{
+	int err;
+
+	err = realtek_mdio_init();
+	if (err)
+		return err;
+
+	err = realtek_smi_init();
+	if (err) {
+		realtek_mdio_exit();
+		return err;
+	}
+
+	return 0;
+}
+module_init(realtek_interface_init);
+
+static void __exit realtek_interface_exit(void)
+{
+	realtek_smi_exit();
+	realtek_mdio_exit();
+}
+module_exit(realtek_interface_exit);
+
+MODULE_AUTHOR("Luiz Angelo Daros de Luca <luizluca@gmail.com>");
+MODULE_AUTHOR("Linus Walleij <linus.walleij@linaro.org>");
+MODULE_DESCRIPTION("Driver for interfacing with Realtek switches via MDIO or SMI");
+MODULE_LICENSE("GPL");
diff --git a/drivers/net/dsa/realtek/realtek-mdio.c b/drivers/net/dsa/realtek/realtek-mdio.c
index 292e6d087e8b..6997dec14de2 100644
--- a/drivers/net/dsa/realtek/realtek-mdio.c
+++ b/drivers/net/dsa/realtek/realtek-mdio.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0+
-/* Realtek MDIO interface driver
+/* Realtek MDIO interface support
  *
  * ASICs we intend to support with this driver:
  *
@@ -19,12 +19,12 @@
  * Copyright (C) 2009-2010 Gabor Juhos <juhosg@openwrt.org>
  */
 
-#include <linux/module.h>
 #include <linux/of.h>
 #include <linux/overflow.h>
 #include <linux/regmap.h>
 
 #include "realtek.h"
+#include "realtek-mdio.h"
 
 /* Read/write via mdiobus */
 #define REALTEK_MDIO_CTRL0_REG		31
@@ -283,8 +283,12 @@ static struct mdio_driver realtek_mdio_driver = {
 	.shutdown = realtek_mdio_shutdown,
 };
 
-mdio_module_driver(realtek_mdio_driver);
+int realtek_mdio_init(void)
+{
+	return mdio_driver_register(&realtek_mdio_driver);
+}
 
-MODULE_AUTHOR("Luiz Angelo Daros de Luca <luizluca@gmail.com>");
-MODULE_DESCRIPTION("Driver for Realtek ethernet switch connected via MDIO interface");
-MODULE_LICENSE("GPL");
+void realtek_mdio_exit(void)
+{
+	mdio_driver_unregister(&realtek_mdio_driver);
+}
diff --git a/drivers/net/dsa/realtek/realtek-mdio.h b/drivers/net/dsa/realtek/realtek-mdio.h
new file mode 100644
index 000000000000..941b4ef9d531
--- /dev/null
+++ b/drivers/net/dsa/realtek/realtek-mdio.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _REALTEK_MDIO_H
+#define _REALTEK_MDIO_H
+
+#if IS_ENABLED(CONFIG_NET_DSA_REALTEK_MDIO)
+
+int realtek_mdio_init(void);
+void realtek_mdio_exit(void);
+
+#else
+
+static inline int realtek_mdio_init(void)
+{
+	return 0;
+}
+
+static inline void realtek_mdio_exit(void)
+{
+}
+
+#endif
+
+#endif
diff --git a/drivers/net/dsa/realtek/realtek-smi.c b/drivers/net/dsa/realtek/realtek-smi.c
index 755546ed8db6..4c282bfc884d 100644
--- a/drivers/net/dsa/realtek/realtek-smi.c
+++ b/drivers/net/dsa/realtek/realtek-smi.c
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0+
-/* Realtek Simple Management Interface (SMI) driver
+/* Realtek Simple Management Interface (SMI) interface
  * It can be discussed how "simple" this interface is.
  *
  * The SMI protocol piggy-backs the MDIO MDC and MDIO signals levels
@@ -26,7 +26,6 @@
  */
 
 #include <linux/kernel.h>
-#include <linux/module.h>
 #include <linux/device.h>
 #include <linux/spinlock.h>
 #include <linux/skbuff.h>
@@ -40,6 +39,7 @@
 #include <linux/if_bridge.h>
 
 #include "realtek.h"
+#include "realtek-smi.h"
 
 #define REALTEK_SMI_ACK_RETRY_COUNT		5
 
@@ -560,8 +560,13 @@ static struct platform_driver realtek_smi_driver = {
 	.remove_new = realtek_smi_remove,
 	.shutdown = realtek_smi_shutdown,
 };
-module_platform_driver(realtek_smi_driver);
 
-MODULE_AUTHOR("Linus Walleij <linus.walleij@linaro.org>");
-MODULE_DESCRIPTION("Driver for Realtek ethernet switch connected via SMI interface");
-MODULE_LICENSE("GPL");
+int realtek_smi_init(void)
+{
+	return platform_driver_register(&realtek_smi_driver);
+}
+
+void realtek_smi_exit(void)
+{
+	platform_driver_unregister(&realtek_smi_driver);
+}
diff --git a/drivers/net/dsa/realtek/realtek-smi.h b/drivers/net/dsa/realtek/realtek-smi.h
new file mode 100644
index 000000000000..9a4838321f94
--- /dev/null
+++ b/drivers/net/dsa/realtek/realtek-smi.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _REALTEK_SMI_H
+#define _REALTEK_SMI_H
+
+#if IS_ENABLED(CONFIG_NET_DSA_REALTEK_SMI)
+
+int realtek_smi_init(void);
+void realtek_smi_exit(void);
+
+#else
+
+static inline int realtek_smi_init(void)
+{
+	return 0;
+}
+
+static inline void realtek_smi_exit(void)
+{
+}
+
+#endif
+
+#endif
-- 
2.34.1


It looks pretty reasonable to me. More stuff could go into
realtek-interface-common.c, that could be called directly from
realtek-smi.c and realtek-mdio.c without exporting anything.

I've eliminated the possibility for the SMI and MDIO options to be
anything other than y or n, because only a single interface module
(the common one) exists, and the y/n/m quality of that is
implied/selected by the drivers which depend on it. I hope I wasn't too
trigger-happy with this.
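
When two registrations are chained in a single module init, a failure of
the second one is normally unwound by undoing the first. Here is a small
userspace sketch of that pattern, under stated assumptions: struct
fake_driver, fake_register() and fake_unregister() are hypothetical
stand-ins for mdio_driver_register()/platform_driver_register() and
their unregister counterparts, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver object. */
struct fake_driver {
	const char *name;
	int fail;       /* nonzero: registration fails with this error code */
	int registered; /* 1 while the driver is registered */
};

/* Hypothetical stand-in for a *_driver_register() call. */
int fake_register(struct fake_driver *drv)
{
	if (drv->fail)
		return drv->fail;
	drv->registered = 1;
	return 0;
}

/* Hypothetical stand-in for a *_driver_unregister() call. */
void fake_unregister(struct fake_driver *drv)
{
	drv->registered = 0;
}

/* Register MDIO first, then SMI. If SMI fails, undo MDIO. */
int interface_init(struct fake_driver *mdio, struct fake_driver *smi)
{
	int err;

	assert(mdio != NULL && smi != NULL);

	err = fake_register(mdio);
	if (err)
		return err;

	err = fake_register(smi);
	if (err) {
		/* Roll back the step that already succeeded. */
		fake_unregister(mdio);
		return err;
	}

	return 0;
}

/* Tear down in reverse order of registration. */
void interface_exit(struct fake_driver *mdio, struct fake_driver *smi)
{
	fake_unregister(smi);
	fake_unregister(mdio);
}
```

With the static inline stubs from the headers, a disabled interface
simply returns 0 from its init and the same flow applies.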

^ permalink raw reply related

* Re: [net-next RFC PATCH v3 1/4] net: phy: aquantia: move to separate directory
From: Andrew Lunn @ 2023-11-02 15:49 UTC (permalink / raw)
  To: Christian Marangi
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiner Kallweit,
	Russell King, Robert Marko, Vladimir Oltean, netdev, devicetree,
	linux-kernel
In-Reply-To: <6543bb3e.df0a0220.385df.cdb1@mx.google.com>

On Thu, Nov 02, 2023 at 04:07:41PM +0100, Christian Marangi wrote:
> On Thu, Nov 02, 2023 at 04:03:33PM +0100, Andrew Lunn wrote:
> > > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> > > index 421d2b62918f..4b2451dd6c45 100644
> > > --- a/drivers/net/phy/Kconfig
> > > +++ b/drivers/net/phy/Kconfig
> > > @@ -68,6 +68,8 @@ config SFP
> > >  
> > >  comment "MII PHY device drivers"
> > >  
> > > +source "drivers/net/phy/aquantia/Kconfig"
> > > +
> > >  config AMD_PHY
> > >  	tristate "AMD and Altima PHYs"
> > >  	help
> > > @@ -96,11 +98,6 @@ config ADIN1100_PHY
> > >  	  Currently supports the:
> > >  	  - ADIN1100 - Robust,Industrial, Low Power 10BASE-T1L Ethernet PHY
> > >  
> > > -config AQUANTIA_PHY
> > > -	tristate "Aquantia PHYs"
> > > -	help
> > > -	  Currently supports the Aquantia AQ1202, AQ2104, AQR105, AQR405
> > > -
> > 
> > Does this move the PHY in the make menuconfig menu? We try to keep it
> > sorted based on the tristate string.
> >
> 
> Oh wasn't aware... Yes it does move it to the top of the list... I can
> just move the source entry where AQUANTIA_PHY was...

Yes, that would be best.

Thanks

    Andrew

---
pw-bot: cr


^ permalink raw reply

* [PATCH net-next v10 07/13] net:ethernet:realtek:rtase: Implement a function to receive packets
From: Justin Lai @ 2023-11-02 15:44 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Implement rx_handler to read the information of the rx descriptor,
thereby checking the packet accordingly and storing the packet
in the socket buffer to complete the reception of the packet.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
 .../net/ethernet/realtek/rtase/rtase_main.c   | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index b0bd9ec02710..fc0bd1bfd83b 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -452,6 +452,154 @@ static void rtase_rx_ring_clear(struct rtase_ring *ring)
 	}
 }
 
+static int rtase_fragmented_frame(u32 status)
+{
+	return (status & (RX_FIRST_FRAG | RX_LAST_FRAG)) !=
+		(RX_FIRST_FRAG | RX_LAST_FRAG);
+}
+
+static void rtase_rx_csum(const struct rtase_private *tp, struct sk_buff *skb,
+			  const union rx_desc *desc)
+{
+	u32 opts2 = le32_to_cpu(desc->desc_status.opts2);
+
+	/* rx csum offload */
+	if (((opts2 & RX_V4F) && !(opts2 & RX_IPF)) || (opts2 & RX_V6F)) {
+		if (((opts2 & RX_TCPT) && !(opts2 & RX_TCPF)) ||
+		    ((opts2 & RX_UDPT) && !(opts2 & RX_UDPF))) {
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
+		} else {
+			skb->ip_summed = CHECKSUM_NONE;
+		}
+	} else {
+		skb->ip_summed = CHECKSUM_NONE;
+	}
+}
+
+static void rtase_rx_vlan_skb(union rx_desc *desc, struct sk_buff *skb)
+{
+	u32 opts2 = le32_to_cpu(desc->desc_status.opts2);
+
+	if (!(opts2 & RX_VLAN_TAG))
+		return;
+
+	__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), swab16(opts2 & VLAN_TAG_MASK));
+}
+
+static void rtase_rx_skb(const struct rtase_ring *ring, struct sk_buff *skb)
+{
+	struct rtase_int_vector *ivec = ring->ivec;
+
+	napi_gro_receive(&ivec->napi, skb);
+}
+
+static int rx_handler(struct rtase_ring *ring, int budget)
+{
+	const struct rtase_private *tp = ring->ivec->tp;
+	u32 pkt_size, cur_rx, delta, entry, status;
+	struct net_device *dev = tp->dev;
+	union rx_desc *desc_base = ring->desc;
+	struct sk_buff *skb;
+	union rx_desc *desc;
+	int workdone = 0;
+
+	if (!ring->desc)
+		return workdone;
+
+	cur_rx = ring->cur_idx;
+	entry = cur_rx % NUM_DESC;
+	desc = &desc_base[entry];
+
+	do {
+		/* make sure descriptor has been updated */
+		rmb();
+		status = le32_to_cpu(desc->desc_status.opts1);
+
+		if (status & DESC_OWN)
+			break;
+
+		if (unlikely(status & RX_RES)) {
+			if (net_ratelimit())
+				netdev_warn(dev, "Rx ERROR. status = %08x\n",
+					    status);
+
+			dev->stats.rx_errors++;
+
+			if (status & (RX_RWT | RX_RUNT))
+				dev->stats.rx_length_errors++;
+
+			if (status & RX_CRC)
+				dev->stats.rx_crc_errors++;
+
+			if (dev->features & NETIF_F_RXALL)
+				goto process_pkt;
+
+			rtase_mark_to_asic(desc, tp->rx_buf_sz);
+			goto skip_process_pkt;
+		}
+
+process_pkt:
+		pkt_size = status & RX_PKT_SIZE_MASK;
+		if (likely(!(dev->features & NETIF_F_RXFCS)))
+			pkt_size -= ETH_FCS_LEN;
+
+		/* the driver does not support incoming fragmented
+		 * frames. they are seen as a symptom of over-mtu
+		 * sized frames
+		 */
+		if (unlikely(rtase_fragmented_frame(status))) {
+			dev->stats.rx_dropped++;
+			dev->stats.rx_length_errors++;
+			rtase_mark_to_asic(desc, tp->rx_buf_sz);
+			continue;
+		}
+
+		skb = ring->skbuff[entry];
+		dma_sync_single_for_cpu(&tp->pdev->dev,
+					ring->mis.data_phy_addr[entry],
+					tp->rx_buf_sz, DMA_FROM_DEVICE);
+
+		ring->skbuff[entry] = NULL;
+
+		if (dev->features & NETIF_F_RXCSUM)
+			rtase_rx_csum(tp, skb, desc);
+
+		skb->dev = dev;
+		skb_put(skb, pkt_size);
+		skb_mark_for_recycle(skb);
+		skb->protocol = eth_type_trans(skb, dev);
+
+		if (skb->pkt_type == PACKET_MULTICAST)
+			dev->stats.multicast++;
+
+		rtase_rx_vlan_skb(desc, skb);
+		rtase_rx_skb(ring, skb);
+
+		dev->stats.rx_bytes += pkt_size;
+		dev->stats.rx_packets++;
+
+skip_process_pkt:
+		workdone++;
+		cur_rx++;
+		entry = cur_rx % NUM_DESC;
+		desc = ring->desc + sizeof(union rx_desc) * entry;
+		prefetch(desc);
+	} while (workdone != budget);
+
+	ring->cur_idx = cur_rx;
+	delta = rtase_rx_ring_fill(ring, ring->dirty_idx, ring->cur_idx, 1);
+
+	if (!delta && workdone)
+		netdev_info(dev, "no Rx buffer allocated\n");
+
+	ring->dirty_idx += delta;
+
+	if ((ring->dirty_idx + NUM_DESC) == ring->cur_idx)
+		netdev_emerg(dev, "Rx buffers exhausted\n");
+
+	return workdone;
+}
+
 static void rtase_rx_desc_init(struct rtase_private *tp, u16 idx)
 {
 	struct rtase_ring *ring = &tp->rx_ring[idx];
-- 
2.34.1
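
The per-descriptor decisions made by rx_handler() in the patch above can
be exercised outside the kernel. Below is a standalone userspace sketch
using the bit positions from rtase.h in this series. The function names
(rx_is_fragmented, rx_payload_len, rx_csum_verified, ring_entry) are
made up for illustration and are not part of the driver. ETH_FCS_LEN is
defined locally with the usual value of 4.

```c
#include <assert.h>
#include <stdint.h>

#define BIT32(n) (UINT32_C(1) << (n))

/* opts1 bits and mask, values as in rtase.h from this series */
#define RX_FIRST_FRAG    BIT32(25)
#define RX_LAST_FRAG     BIT32(24)
#define RX_PKT_SIZE_MASK UINT32_C(0x3FFF) /* GENMASK(13, 0) */

/* opts2 bits, values as in rtase.h from this series */
#define RX_V6F  BIT32(31)
#define RX_V4F  BIT32(30)
#define RX_UDPT BIT32(29)
#define RX_TCPT BIT32(28)
#define RX_IPF  BIT32(26)
#define RX_UDPF BIT32(25)
#define RX_TCPF BIT32(24)

#define NUM_DESC    1024u
#define ETH_FCS_LEN 4u

/* A frame is whole only if both first and last fragment bits are set. */
int rx_is_fragmented(uint32_t opts1)
{
	return (opts1 & (RX_FIRST_FRAG | RX_LAST_FRAG)) !=
	       (RX_FIRST_FRAG | RX_LAST_FRAG);
}

/* Length handed to the stack; the FCS is stripped unless it is kept. */
uint32_t rx_payload_len(uint32_t opts1, int keep_fcs)
{
	uint32_t len = opts1 & RX_PKT_SIZE_MASK;

	assert(keep_fcs || len >= ETH_FCS_LEN);
	return keep_fcs ? len : len - ETH_FCS_LEN;
}

/* 1 when the hardware vouches for both the L3 and the L4 checksum,
 * which is the case where the patch sets CHECKSUM_UNNECESSARY.
 */
int rx_csum_verified(uint32_t opts2)
{
	int l3_ok = ((opts2 & RX_V4F) && !(opts2 & RX_IPF)) ||
		    (opts2 & RX_V6F);
	int l4_ok = ((opts2 & RX_TCPT) && !(opts2 & RX_TCPF)) ||
		    ((opts2 & RX_UDPT) && !(opts2 & RX_UDPF));

	return l3_ok && l4_ok;
}

/* The free-running ring index wraps onto the descriptor array. */
uint32_t ring_entry(uint32_t cur_idx)
{
	return cur_idx % NUM_DESC;
}
```

Every other combination of opts2 bits falls back to CHECKSUM_NONE in the
patch, which corresponds to rx_csum_verified() returning 0 here.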


^ permalink raw reply related

* [PATCH net-next v10 13/13] MAINTAINERS: Add the rtase ethernet driver entry
From: Justin Lai @ 2023-11-02 15:45 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Add myself and Larry Chiu as the maintainers for the rtase ethernet driver.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 53b7ca804465..239aae94dc0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18476,6 +18476,13 @@ L:	linux-remoteproc@vger.kernel.org
 S:	Maintained
 F:	drivers/tty/rpmsg_tty.c
 
+RTASE ETHERNET DRIVER
+M:	Justin Lai <justinlai0215@realtek.com>
+M:	Larry Chiu <larry.chiu@realtek.com>
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/ethernet/realtek/rtase/
+
 RTL2830 MEDIA DRIVER
 M:	Antti Palosaari <crope@iki.fi>
 L:	linux-media@vger.kernel.org
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v10 01/13] net:ethernet:realtek:rtase: Add pci table supported in this module
From: Justin Lai @ 2023-11-02 15:44 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Add the PCI device table supported by this module, and implement the
pci_driver callbacks to initialize, remove, or shut down this driver.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
 drivers/net/ethernet/realtek/rtase/rtase.h    | 353 ++++++++++
 .../net/ethernet/realtek/rtase/rtase_main.c   | 622 ++++++++++++++++++
 2 files changed, 975 insertions(+)
 create mode 100644 drivers/net/ethernet/realtek/rtase/rtase.h
 create mode 100644 drivers/net/ethernet/realtek/rtase/rtase_main.c

diff --git a/drivers/net/ethernet/realtek/rtase/rtase.h b/drivers/net/ethernet/realtek/rtase/rtase.h
new file mode 100644
index 000000000000..9239c518c504
--- /dev/null
+++ b/drivers/net/ethernet/realtek/rtase/rtase.h
@@ -0,0 +1,353 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
+/*
+ *  rtase is the Linux device driver released for Realtek Automotive Switch
+ *  controllers with PCI-Express interface.
+ *
+ *  Copyright(c) 2023 Realtek Semiconductor Corp.
+ */
+
+#ifndef _RTASE_H_
+#define _RTASE_H_
+
+/* the low 32 bits of the receive buffer address must be 8-byte aligned. */
+#define RTK_RX_ALIGN 8
+
+#define HW_VER_MASK 0x7C800000
+
+#define RX_DMA_BURST_256       4
+#define TX_DMA_BURST_UNLIMITED 7
+#define RX_BUF_SIZE            (PAGE_SIZE - \
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+#define MAX_JUMBO_SIZE         (RX_BUF_SIZE - VLAN_ETH_HLEN - ETH_FCS_LEN)
+
+/* 3 means InterFrameGap = the shortest one */
+#define INTERFRAMEGAP 0x03
+
+#define RTASE_REGS_SIZE     256
+#define RTASE_PCI_REGS_SIZE 0x100
+
+#define MULTICAST_FILTER_MASK  GENMASK(30, 26)
+#define MULTICAST_FILTER_LIMIT 32
+
+#define RTASE_VLAN_FILTER_ENTRY_NUM 32
+#define RTASE_NUM_TX_QUEUE 8
+#define RTASE_NUM_RX_QUEUE 4
+
+#define RTASE_TXQ_CTRL      1
+#define RTASE_FUNC_TXQ_NUM  1
+#define RTASE_FUNC_RXQ_NUM  1
+#define RTASE_INTERRUPT_NUM 1
+
+#define MITI_TIME_COUNT_MASK     GENMASK(3, 0)
+#define MITI_TIME_UNIT_MASK      GENMASK(7, 4)
+#define MITI_DEFAULT_TIME        128
+#define MITI_MAX_TIME            491520
+#define MITI_PKT_NUM_COUNT_MASK  GENMASK(11, 8)
+#define MITI_PKT_NUM_UNIT_MASK   GENMASK(13, 12)
+#define MITI_DEFAULT_PKT_NUM     64
+#define MITI_MAX_PKT_NUM_IDX     3
+#define MITI_MAX_PKT_NUM_UNIT    16
+#define MITI_MAX_PKT_NUM         240
+#define MITI_COUNT_BIT_NUM       4
+
+#define RTASE_NUM_MSIX 4
+
+#define RTASE_DWORD_MOD 16
+
+/*****************************************************************************/
+enum rtase_registers {
+	RTASE_MAC0   = 0x0000,
+	RTASE_MAC4   = 0x0004,
+	RTASE_MAR0   = 0x0008,
+	RTASE_MAR1   = 0x000C,
+	RTASE_DTCCR0 = 0x0010,
+	RTASE_DTCCR4 = 0x0014,
+#define COUNTER_RESET BIT(0)
+#define COUNTER_DUMP  BIT(3)
+
+	RTASE_FCR    = 0x0018,
+#define FCR_RXQ_MASK    GENMASK(5, 4)
+#define FCR_VLAN_FTR_EN BIT(1)
+
+	RTASE_LBK_CTRL = 0x001A,
+#define LBK_ATLD BIT(1)
+#define LBK_CLR  BIT(0)
+
+	RTASE_TX_DESC_ADDR0   = 0x0020,
+	RTASE_TX_DESC_ADDR4   = 0x0024,
+	RTASE_TX_DESC_COMMAND = 0x0028,
+#define TX_DESC_CMD_CS BIT(15)
+#define TX_DESC_CMD_WE BIT(14)
+
+	RTASE_BOOT_CTL  = 0x6004,
+	RTASE_CLKSW_SET = 0x6018,
+
+	RTASE_CHIP_CMD = 0x0037,
+#define STOP_REQ      BIT(7)
+#define STOP_REQ_DONE BIT(6)
+#define RE            BIT(3)
+#define TE            BIT(2)
+
+	RTASE_IMR0 = 0x0038,
+	RTASE_ISR0 = 0x003C,
+#define TOK7 BIT(30)
+#define TOK6 BIT(28)
+#define TOK5 BIT(26)
+#define TOK4 BIT(24)
+#define FOVW BIT(6)
+#define RDU  BIT(4)
+#define TOK  BIT(2)
+#define ROK  BIT(0)
+
+	RTASE_IMR1 = 0x0800,
+	RTASE_ISR1 = 0x0802,
+#define Q_TOK BIT(4)
+#define Q_RDU BIT(1)
+#define Q_ROK BIT(0)
+
+	RTASE_EPHY_ISR = 0x6014,
+	RTASE_EPHY_IMR = 0x6016,
+
+	RTASE_TX_CONFIG_0 = 0x0040,
+#define TX_INTER_FRAME_GAP_MASK GENMASK(25, 24)
+	/* DMA burst value (0-7) is shifted this many bits */
+#define TX_DMA_MASK             GENMASK(10, 8)
+
+	RTASE_RX_CONFIG_0 = 0x0044,
+#define RX_SINGLE_FETCH  BIT(14)
+#define RX_SINGLE_TAG    BIT(13)
+#define RX_MX_DMA_MASK   GENMASK(10, 8)
+#define ACPT_FLOW        BIT(7)
+#define ACCEPT_ERR       BIT(5)
+#define ACCEPT_RUNT      BIT(4)
+#define ACCEPT_BROADCAST BIT(3)
+#define ACCEPT_MULTICAST BIT(2)
+#define ACCEPT_MYPHYS    BIT(1)
+#define ACCEPT_ALLPHYS   BIT(0)
+#define ACCEPT_MASK      (ACPT_FLOW | ACCEPT_ERR | ACCEPT_RUNT | \
+			  ACCEPT_BROADCAST | ACCEPT_MULTICAST | \
+			  ACCEPT_MYPHYS | ACCEPT_ALLPHYS)
+
+	RTASE_RX_CONFIG_1 = 0x0046,
+#define RX_MAX_FETCH_DESC_MASK  GENMASK(15, 11)
+#define RX_NEW_DESC_FORMAT_EN   BIT(8)
+#define OUTER_VLAN_DETAG_EN     BIT(7)
+#define INNER_VLAN_DETAG_EN     BIT(6)
+#define PCIE_NEW_FLOW           BIT(2)
+#define PCIE_RELOAD_En          BIT(0)
+
+	RTASE_EEM = 0x0050,
+#define EEM_UNLOCK 0xC0
+
+	RTASE_TDFNR  = 0x0057,
+	RTASE_TPPOLL = 0x0090,
+	RTASE_PDR    = 0x00B0,
+	RTASE_FIFOR  = 0x00D3,
+#define TX_FIFO_EMPTY BIT(5)
+#define RX_FIFO_EMPTY BIT(4)
+
+	RTASE_PCPR = 0x00D8,
+#define PCPR_VLAN_FTR_EN BIT(6)
+
+	RTASE_RMS       = 0x00DA,
+	RTASE_CPLUS_CMD = 0x00E0,
+#define FORCE_RXFLOW_EN BIT(11)
+#define FORCE_TXFLOW_EN BIT(10)
+#define RX_CHKSUM       BIT(5)
+
+	RTASE_Q0_RX_DESC_ADDR0 = 0x00E4,
+	RTASE_Q0_RX_DESC_ADDR4 = 0x00E8,
+	RTASE_Q1_RX_DESC_ADDR0 = 0x4000,
+	RTASE_Q1_RX_DESC_ADDR4 = 0x4004,
+	RTASE_MTPS             = 0x00EC,
+#define TAG_NUM_SEL_MASK  GENMASK(10, 8)
+
+	RTASE_MISC = 0x00F2,
+#define RX_DV_GATE_EN BIT(3)
+
+	RTASE_TFUN_CTRL = 0x0400,
+#define TX_NEW_DESC_FORMAT_EN BIT(0)
+
+	RTASE_TX_CONFIG_1 = 0x203E,
+#define TC_MODE_MASK  GENMASK(11, 10)
+
+	RTASE_TOKSEL      = 0x2046,
+	RTASE_TXQCRDT_0   = 0x2500,
+	RTASE_RFIFONFULL  = 0x4406,
+	RTASE_INT_MITI_TX = 0x0A00,
+	RTASE_INT_MITI_RX = 0x0A80,
+
+	RTASE_VLAN_ENTRY_MEM_0 = 0x7234,
+	RTASE_VLAN_ENTRY_0     = 0xAC80,
+};
+
+enum desc_status_bit {
+	DESC_OWN = BIT(31), /* Descriptor is owned by NIC */
+	RING_END = BIT(30), /* End of descriptor ring */
+};
+
+enum sw_flag_content {
+	SWF_MSI_ENABLED  = BIT(1),
+	SWF_MSIX_ENABLED = BIT(2),
+};
+
+#define RSVD_MASK 0x3FFFC000
+
+struct tx_desc {
+	__le32 opts1;
+	__le32 opts2;
+	__le64 addr;
+	__le32 opts3;
+	__le32 reserved1;
+	__le32 reserved2;
+	__le32 reserved3;
+} __packed;
+
+/*------ offset 0 of tx descriptor ------*/
+#define TX_FIRST_FRAG BIT(29) /* Tx First segment of a packet */
+#define TX_LAST_FRAG  BIT(28) /* Tx Final segment of a packet */
+#define GIANT_SEND_V4 BIT(26) /* TCP Giant Send Offload V4 (GSOv4) */
+#define GIANT_SEND_V6 BIT(25) /* TCP Giant Send Offload V6 (GSOv6) */
+#define TX_VLAN_TAG   BIT(17) /* Add VLAN tag */
+
+/*------ offset 4 of tx descriptor ------*/
+#define TX_UDPCS_C BIT(31) /* Calculate UDP/IP checksum */
+#define TX_TCPCS_C BIT(30) /* Calculate TCP/IP checksum */
+#define TX_IPCS_C  BIT(29) /* Calculate IP checksum */
+#define TX_IPV6F_C BIT(28) /* Indicate it is an IPv6 packet */
+
+union rx_desc {
+	struct {
+		__le64 header_buf_addr;
+		__le32 reserved1;
+		__le32 opts_header_len;
+		__le64 addr;
+		__le32 reserved2;
+		__le32 opts1;
+	} __packed desc_cmd;
+
+	struct {
+		__le32 reserved1;
+		__le32 reserved2;
+		__le32 rss;
+		__le32 opts4;
+		__le32 reserved3;
+		__le32 opts3;
+		__le32 opts2;
+		__le32 opts1;
+	} __packed desc_status;
+} __packed;
+
+/*------ offset 28 of rx descriptor ------*/
+#define RX_FIRST_FRAG    BIT(25) /* Rx First segment of a packet */
+#define RX_LAST_FRAG     BIT(24) /* Rx Final segment of a packet */
+#define RX_RES           BIT(20)
+#define RX_RUNT          BIT(19)
+#define RX_RWT           BIT(18)
+#define RX_CRC           BIT(16)
+#define RX_V6F           BIT(31)
+#define RX_V4F           BIT(30)
+#define RX_UDPT          BIT(29)
+#define RX_TCPT          BIT(28)
+#define RX_IPF           BIT(26) /* IP checksum failed */
+#define RX_UDPF          BIT(25) /* UDP/IP checksum failed */
+#define RX_TCPF          BIT(24) /* TCP/IP checksum failed */
+#define RX_LBK_FIFO_FULL BIT(17) /* Loopback FIFO Full */
+#define RX_VLAN_TAG      BIT(16) /* VLAN tag available */
+
+#define NUM_DESC                1024
+#define RTASE_TX_RING_DESC_SIZE (NUM_DESC * sizeof(struct tx_desc))
+#define RTASE_RX_RING_DESC_SIZE (NUM_DESC * sizeof(union rx_desc))
+#define VLAN_ENTRY_CAREBIT      0xF0000000
+#define VLAN_TAG_MASK           GENMASK(15, 0)
+#define RX_PKT_SIZE_MASK        GENMASK(13, 0)
+
+/* txqos hardware definitions */
+#define RTASE_1T_CLOCK            64
+#define RTASE_1T_POWER            10000000
+#define RTASE_IDLESLOPE_INT_SHIFT 25
+#define RTASE_IDLESLOPE_INT_MASK  GENMASK(31, 25)
+
+#define IVEC_NAME_SIZE (IFNAMSIZ + 10)
+
+struct rtase_int_vector {
+	struct rtase_private *tp;
+	unsigned int irq;
+	u8 status;
+	char name[IVEC_NAME_SIZE];
+	u16 index;
+	u16 imr_addr;
+	u16 isr_addr;
+	u32 imr;
+	struct list_head ring_list;
+	struct napi_struct napi;
+	int (*poll)(struct napi_struct *napi, int budget);
+};
+
+struct rtase_ring {
+	struct rtase_int_vector *ivec;
+	void *desc;
+	dma_addr_t phy_addr;
+	u32 cur_idx;
+	u32 dirty_idx;
+	u16 index;
+
+	struct sk_buff *skbuff[NUM_DESC];
+	union {
+		u32 len[NUM_DESC];
+		dma_addr_t data_phy_addr[NUM_DESC];
+	} mis;
+
+	struct list_head ring_entry;
+	int (*ring_handler)(struct rtase_ring *ring, int budget);
+};
+
+struct rtase_txqos {
+	int hicredit;
+	int locredit;
+	int idleslope;
+	int sendslope;
+};
+
+struct rtase_private {
+	void __iomem *mmio_addr;
+	u32 sw_flag;
+	u32 mc_filter[2];
+
+	struct pci_dev *pdev;
+	struct net_device *dev;
+	u32 rx_buf_sz;
+
+	struct page_pool *page_pool;
+	struct rtase_ring tx_ring[RTASE_NUM_TX_QUEUE];
+	struct rtase_txqos tx_qos[RTASE_NUM_TX_QUEUE];
+	struct rtase_ring rx_ring[RTASE_NUM_RX_QUEUE];
+	struct rtase_counters *tally_vaddr;
+	dma_addr_t tally_paddr;
+
+	u32 vlan_filter_ctrl;
+	u16 vlan_filter_vid[RTASE_VLAN_FILTER_ENTRY_NUM];
+
+	struct delayed_work task;
+	u8 org_mac_addr[ETH_ALEN];
+	struct msix_entry msix_entry[RTASE_NUM_MSIX];
+	struct rtase_int_vector int_vector[RTASE_NUM_MSIX];
+
+	u16 tx_queue_ctrl;
+	u16 func_tx_queue_num;
+	u16 func_rx_queue_num;
+	u16 int_nums;
+	u16 tx_int_mit;
+	u16 rx_int_mit;
+};
+
+#define LSO_64K 64000
+
+#define NIC_MAX_PHYS_BUF_COUNT_LSO2 (16 * 4)
+
+#define TCPHO_MASK GENMASK(24, 18)
+
+#define MSS_MAX  0x07FF /* MSS value */
+#define MSS_MASK GENMASK(28, 18)
+
+#endif /* _RTASE_H_ */
diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
new file mode 100644
index 000000000000..03352a0b8a91
--- /dev/null
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -0,0 +1,622 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+/*
+ *  rtase is the Linux device driver released for Realtek Automotive Switch
+ *  controllers with PCI-Express interface.
+ *
+ *  Copyright(c) 2023 Realtek Semiconductor Corp.
+ *
+ *  Below is a simplified block diagram of the chip and its relevant interfaces.
+ *
+ *               *************************
+ *               *                       *
+ *               *  CPU network device   *
+ *               *                       *
+ *               *   +-------------+     *
+ *               *   |  PCIE Host  |     *
+ *               ***********++************
+ *                          ||
+ *                         PCIE
+ *                          ||
+ *      ********************++**********************
+ *      *            | PCIE Endpoint |             *
+ *      *            +---------------+             *
+ *      *                | GMAC |                  *
+ *      *                +--++--+  Realtek         *
+ *      *                   ||     RTL90xx Series  *
+ *      *                   ||                     *
+ *      *     +-------------++----------------+    *
+ *      *     |           | MAC |             |    *
+ *      *     |           +-----+             |    *
+ *      *     |                               |    *
+ *      *     |     Ethernet Switch Core      |    *
+ *      *     |                               |    *
+ *      *     |   +-----+           +-----+   |    *
+ *      *     |   | MAC |...........| MAC |   |    *
+ *      *     +---+-----+-----------+-----+---+    *
+ *      *         | PHY |...........| PHY |        *
+ *      *         +--++-+           +--++-+        *
+ *      *************||****************||***********
+ *
+ *  The block of the Realtek RTL90xx series is our entire chip architecture,
+ *  the GMAC is connected to the switch core, and there is no PHY in between.
+ *  In addition, this driver is mainly used to control GMAC, but does not
+ *  control the switch core, so it is not the same as DSA.
+ */
+
+#include <linux/crc32.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/etherdevice.h>
+#include <linux/if_vlan.h>
+#include <linux/in.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iopoll.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/mdio.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/netdevice.h>
+#include <linux/pci.h>
+#include <linux/prefetch.h>
+#include <linux/rtnetlink.h>
+#include <linux/tcp.h>
+#include <asm/irq.h>
+#include <net/ip6_checksum.h>
+#include <net/page_pool.h>
+#include <net/pkt_cls.h>
+
+#include "rtase.h"
+
+#define RTK_OPTS1_DEBUG_VALUE 0x0BADBEEF
+#define RTK_MAGIC_NUMBER      0x0BADBADBADBADBAD
+
+static const struct pci_device_id rtase_pci_tbl[] = {
+	{PCI_VDEVICE(REALTEK, 0x906A)},
+	{}
+};
+
+MODULE_DEVICE_TABLE(pci, rtase_pci_tbl);
+
+MODULE_AUTHOR("Realtek ARD Software Team");
+MODULE_DESCRIPTION("Network Driver for the PCIe interface of Realtek Automotive Ethernet Switch");
+MODULE_LICENSE("Dual BSD/GPL");
+
+struct rtase_counters {
+	__le64 tx_packets;
+	__le64 rx_packets;
+	__le64 tx_errors;
+	__le32 rx_errors;
+	__le16 rx_missed;
+	__le16 align_errors;
+	__le32 tx_one_collision;
+	__le32 tx_multi_collision;
+	__le64 rx_unicast;
+	__le64 rx_broadcast;
+	__le32 rx_multicast;
+	__le16 tx_aborted;
+	__le16 tx_underun;
+} __packed;
+
+static void rtase_w8(const struct rtase_private *tp, u16 reg, u8 val8)
+{
+	writeb(val8, tp->mmio_addr + reg);
+}
+
+static void rtase_w16(const struct rtase_private *tp, u16 reg, u16 val16)
+{
+	writew(val16, tp->mmio_addr + reg);
+}
+
+static void rtase_w32(const struct rtase_private *tp, u16 reg, u32 val32)
+{
+	writel(val32, tp->mmio_addr + reg);
+}
+
+static u8 rtase_r8(const struct rtase_private *tp, u16 reg)
+{
+	return readb(tp->mmio_addr + reg);
+}
+
+static u16 rtase_r16(const struct rtase_private *tp, u16 reg)
+{
+	return readw(tp->mmio_addr + reg);
+}
+
+static u32 rtase_r32(const struct rtase_private *tp, u16 reg)
+{
+	return readl(tp->mmio_addr + reg);
+}
+
+static void rtase_tally_counter_clear(const struct rtase_private *tp)
+{
+	u32 cmd = lower_32_bits(tp->tally_paddr);
+
+	rtase_w32(tp, RTASE_DTCCR4, upper_32_bits(tp->tally_paddr));
+	rtase_w32(tp, RTASE_DTCCR0, cmd | COUNTER_RESET);
+}
+
+static void rtase_enable_eem_write(const struct rtase_private *tp)
+{
+	u8 val;
+
+	val = rtase_r8(tp, RTASE_EEM);
+	rtase_w8(tp, RTASE_EEM, val | EEM_UNLOCK);
+}
+
+static void rtase_disable_eem_write(const struct rtase_private *tp)
+{
+	u8 val;
+
+	val = rtase_r8(tp, RTASE_EEM);
+	rtase_w8(tp, RTASE_EEM, val & ~EEM_UNLOCK);
+}
+
+static void rtase_rar_set(const struct rtase_private *tp, const u8 *addr)
+{
+	u32 rar_low, rar_high;
+
+	rar_low = (u32)addr[0] | ((u32)addr[1] << 8) |
+		  ((u32)addr[2] << 16) | ((u32)addr[3] << 24);
+
+	rar_high = (u32)addr[4] | ((u32)addr[5] << 8);
+
+	rtase_enable_eem_write(tp);
+	rtase_w32(tp, RTASE_MAC0, rar_low);
+	rtase_w32(tp, RTASE_MAC4, rar_high);
+	rtase_disable_eem_write(tp);
+	rtase_w16(tp, RTASE_LBK_CTRL, LBK_ATLD | LBK_CLR);
+}
+
+static void rtase_get_mac_address(struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	u8 mac_addr[ETH_ALEN] __aligned(2) = {};
+	u32 i;
+
+	for (i = 0; i < ETH_ALEN; i++)
+		mac_addr[i] = rtase_r8(tp, RTASE_MAC0 + i);
+
+	if (!is_valid_ether_addr(mac_addr)) {
+		eth_random_addr(mac_addr);
+		dev->addr_assign_type = NET_ADDR_RANDOM;
+		netdev_warn(dev, "Random ether addr %pM\n", mac_addr);
+	}
+
+	eth_hw_addr_set(dev, mac_addr);
+	rtase_rar_set(tp, mac_addr);
+
+	/* keep the original MAC address */
+	ether_addr_copy(tp->org_mac_addr, dev->dev_addr);
+	ether_addr_copy(dev->perm_addr, dev->dev_addr);
+}
+
+static void rtase_reset_interrupt(struct pci_dev *pdev,
+				  const struct rtase_private *tp)
+{
+	if (tp->sw_flag & SWF_MSIX_ENABLED)
+		pci_disable_msix(pdev);
+	else
+		pci_disable_msi(pdev);
+}
+
+static int rtase_alloc_msix(struct pci_dev *pdev, struct rtase_private *tp)
+{
+	int ret;
+	u16 i;
+
+	memset(tp->msix_entry, 0x0, RTASE_NUM_MSIX * sizeof(struct msix_entry));
+
+	for (i = 0; i < RTASE_NUM_MSIX; i++)
+		tp->msix_entry[i].entry = i;
+
+	ret = pci_enable_msix_range(pdev, tp->msix_entry, tp->int_nums,
+				    tp->int_nums);
+
+	if (ret == tp->int_nums) {
+		for (i = 0; i < tp->int_nums; i++) {
+			tp->int_vector[i].irq = pci_irq_vector(pdev, i);
+			tp->int_vector[i].status = 1;
+		}
+	}
+
+	return ret;
+}
+
+static int rtase_alloc_interrupt(struct pci_dev *pdev,
+				 struct rtase_private *tp)
+{
+	int ret;
+
+	ret = rtase_alloc_msix(pdev, tp);
+	if (ret != tp->int_nums) {
+		ret = pci_enable_msi(pdev);
+		if (ret)
+			dev_err(&pdev->dev,
+				"unable to alloc interrupt.(MSI)\n");
+		else
+			tp->sw_flag |= SWF_MSI_ENABLED;
+	} else {
+		tp->sw_flag |= SWF_MSIX_ENABLED;
+	}
+
+	return ret;
+}
+
+static void rtase_init_hardware(const struct rtase_private *tp)
+{
+	u16 i;
+
+	for (i = 0; i < RTASE_VLAN_FILTER_ENTRY_NUM; i++)
+		rtase_w32(tp, RTASE_VLAN_ENTRY_0 + i * 4, 0);
+}
+
+static void rtase_init_int_vector(struct rtase_private *tp)
+{
+	u16 i;
+
+	/* interrupt vector 0 */
+	tp->int_vector[0].tp = tp;
+	tp->int_vector[0].index = 0;
+	tp->int_vector[0].imr_addr = RTASE_IMR0;
+	tp->int_vector[0].isr_addr = RTASE_ISR0;
+	tp->int_vector[0].imr = ROK | RDU | TOK | TOK4 | TOK5 | TOK6 | TOK7;
+	tp->int_vector[0].poll = rtase_poll;
+
+	memset(tp->int_vector[0].name, 0x0, sizeof(tp->int_vector[0].name));
+	INIT_LIST_HEAD(&tp->int_vector[0].ring_list);
+
+	netif_napi_add(tp->dev, &tp->int_vector[0].napi,
+		       tp->int_vector[0].poll);
+	napi_enable(&tp->int_vector[0].napi);
+
+	/* interrupt vector 1 ~ 3 */
+	for (i = 1; i < tp->int_nums; i++) {
+		tp->int_vector[i].tp = tp;
+		tp->int_vector[i].index = i;
+		tp->int_vector[i].imr_addr = RTASE_IMR1 + (i - 1) * 4;
+		tp->int_vector[i].isr_addr = RTASE_ISR1 + (i - 1) * 4;
+		tp->int_vector[i].imr = Q_ROK | Q_RDU | Q_TOK;
+		tp->int_vector[i].poll = rtase_poll;
+
+		memset(tp->int_vector[i].name, 0x0, sizeof(tp->int_vector[0].name));
+		INIT_LIST_HEAD(&tp->int_vector[i].ring_list);
+
+		netif_napi_add(tp->dev, &tp->int_vector[i].napi,
+			       tp->int_vector[i].poll);
+		napi_enable(&tp->int_vector[i].napi);
+	}
+}
+
+static u16 rtase_calc_time_mitigation(u32 time_us)
+{
+	u16 int_miti;
+	u8 msb, time_count, time_unit;
+
+	time_us = min_t(u32, time_us, MITI_MAX_TIME);
+
+	msb = fls(time_us);
+	if (msb >= MITI_COUNT_BIT_NUM) {
+		time_unit = msb - MITI_COUNT_BIT_NUM;
+		time_count = time_us >> (msb - MITI_COUNT_BIT_NUM);
+	} else {
+		time_unit = 0;
+		time_count = time_us;
+	}
+
+	int_miti = u16_encode_bits(time_count, MITI_TIME_COUNT_MASK) |
+		   u16_encode_bits(time_unit, MITI_TIME_UNIT_MASK);
+
+	return int_miti;
+}
+
+static u16 rtase_calc_packet_num_mitigation(u16 pkt_num)
+{
+	u16 int_miti;
+	u8 msb, pkt_num_count, pkt_num_unit;
+
+	pkt_num = min_t(int, pkt_num, MITI_MAX_PKT_NUM);
+
+	if (pkt_num > 60) {
+		pkt_num_unit = MITI_MAX_PKT_NUM_IDX;
+		pkt_num_count = pkt_num / MITI_MAX_PKT_NUM_UNIT;
+	} else {
+		msb = fls(pkt_num);
+		if (msb >= MITI_COUNT_BIT_NUM) {
+			pkt_num_unit = msb - MITI_COUNT_BIT_NUM;
+			pkt_num_count = pkt_num >> (msb - MITI_COUNT_BIT_NUM);
+		} else {
+			pkt_num_unit = 0;
+			pkt_num_count = pkt_num;
+		}
+	}
+
+	int_miti = u16_encode_bits(pkt_num_count, MITI_PKT_NUM_COUNT_MASK) |
+		   u16_encode_bits(pkt_num_unit, MITI_PKT_NUM_UNIT_MASK);
+
+	return int_miti;
+}
+
+static void rtase_init_software_variable(struct pci_dev *pdev,
+					 struct rtase_private *tp)
+{
+	u16 int_miti;
+
+	tp->tx_queue_ctrl = RTASE_TXQ_CTRL;
+	tp->func_tx_queue_num = RTASE_FUNC_TXQ_NUM;
+	tp->func_rx_queue_num = RTASE_FUNC_RXQ_NUM;
+	tp->int_nums = RTASE_INTERRUPT_NUM;
+
+	int_miti = rtase_calc_time_mitigation(MITI_DEFAULT_TIME) |
+		   rtase_calc_packet_num_mitigation(MITI_DEFAULT_PKT_NUM);
+	tp->tx_int_mit = int_miti;
+	tp->rx_int_mit = int_miti;
+
+	tp->sw_flag = 0;
+
+	rtase_init_int_vector(tp);
+
+	/* MTU range: 60 - hw-specific max */
+	tp->dev->min_mtu = ETH_ZLEN;
+	tp->dev->max_mtu = MAX_JUMBO_SIZE;
+}
+
+static bool rtase_check_mac_version_valid(struct rtase_private *tp)
+{
+	u32 hw_ver = rtase_r32(tp, RTASE_TX_CONFIG_0) & HW_VER_MASK;
+	bool known_ver = false;
+
+	switch (hw_ver) {
+	case 0x00800000:
+	case 0x04000000:
+	case 0x04800000:
+		known_ver = true;
+		break;
+	}
+
+	return known_ver;
+}
+
+static int rtase_init_board(struct pci_dev *pdev, struct net_device **dev_out,
+			    void __iomem **ioaddr_out)
+{
+	struct net_device *dev;
+	void __iomem *ioaddr;
+	int ret = -ENOMEM;
+
+	/* dev zeroed in alloc_etherdev */
+	dev = alloc_etherdev_mq(sizeof(struct rtase_private),
+				RTASE_FUNC_TXQ_NUM);
+	if (!dev)
+		goto err_out;
+
+	SET_NETDEV_DEV(dev, &pdev->dev);
+
+	ret = pci_enable_device(pdev);
+	if (ret < 0)
+		goto err_out_free_dev;
+
+	/* make sure PCI BAR 2 is MMIO */
+	if (!(pci_resource_flags(pdev, 2) & IORESOURCE_MEM)) {
+		ret = -ENODEV;
+		goto err_out_disable;
+	}
+
+	/* check for weird/broken PCI region reporting */
+	if (pci_resource_len(pdev, 2) < RTASE_REGS_SIZE) {
+		ret = -ENODEV;
+		goto err_out_disable;
+	}
+
+	ret = pci_request_regions(pdev, KBUILD_MODNAME);
+	if (ret < 0)
+		goto err_out_disable;
+
+	if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
+		dev->features |= NETIF_F_HIGHDMA;
+	else if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
+		goto err_out_free_res;
+	else
+		dev_info(&pdev->dev, "DMA_BIT_MASK: 32\n");
+
+	pci_set_master(pdev);
+
+	/* ioremap MMIO region */
+	ioaddr = ioremap(pci_resource_start(pdev, 2),
+			 pci_resource_len(pdev, 2));
+	if (!ioaddr) {
+		ret = -EIO;
+		goto err_out_free_res;
+	}
+
+	*ioaddr_out = ioaddr;
+	*dev_out = dev;
+	goto out;
+
+err_out_free_res:
+	pci_release_regions(pdev);
+
+err_out_disable:
+	pci_disable_device(pdev);
+
+err_out_free_dev:
+	free_netdev(dev);
+
+err_out:
+	*ioaddr_out = NULL;
+	*dev_out = NULL;
+
+out:
+	return ret;
+}
+
+static void rtase_release_board(struct pci_dev *pdev, struct net_device *dev,
+				void __iomem *ioaddr)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+
+	rtase_rar_set(tp, tp->org_mac_addr);
+	iounmap(ioaddr);
+
+	if (tp->sw_flag & SWF_MSIX_ENABLED)
+		pci_disable_msix(pdev);
+	else
+		pci_disable_msi(pdev);
+
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	free_netdev(dev);
+}
+
+static int rtase_init_one(struct pci_dev *pdev,
+			  const struct pci_device_id *ent)
+{
+	struct net_device *dev = NULL;
+	void __iomem *ioaddr = NULL;
+	struct rtase_private *tp;
+	int ret;
+
+	if (!pdev->is_physfn && pdev->is_virtfn) {
+		dev_err(&pdev->dev, "This module does not support a virtual function.\n");
+		return -EINVAL;
+	}
+
+	dev_dbg(&pdev->dev, "Automotive Switch Ethernet driver loaded\n");
+
+	ret = rtase_init_board(pdev, &dev, &ioaddr);
+	if (ret != 0)
+		return ret;
+
+	tp = netdev_priv(dev);
+	tp->mmio_addr = ioaddr;
+	tp->dev = dev;
+	tp->pdev = pdev;
+
+	/* identify chip attached to board */
+	if (!rtase_check_mac_version_valid(tp)) {
+		return dev_err_probe(&pdev->dev, -ENODEV,
+				     "unknown chip version, contact rtase maintainers (see MAINTAINERS file)\n");
+	}
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
+		goto err_out_1;
+
+	rtase_init_software_variable(pdev, tp);
+	rtase_init_hardware(tp);
+
+	ret = rtase_alloc_interrupt(pdev, tp);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "unable to alloc MSIX/MSI\n");
+		goto err_out_1;
+	}
+
+	rtase_init_netdev_ops(dev);
+
+	dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
+
+	dev->features |= NETIF_F_IP_CSUM;
+	dev->features |= NETIF_F_RXCSUM | NETIF_F_SG | NETIF_F_TSO;
+	dev->features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
+	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+			   NETIF_F_RXCSUM | NETIF_F_HW_VLAN_CTAG_TX |
+			   NETIF_F_HW_VLAN_CTAG_RX;
+	dev->hw_features |= NETIF_F_RXALL;
+	dev->hw_features |= NETIF_F_RXFCS;
+	dev->hw_features |= NETIF_F_IPV6_CSUM | NETIF_F_TSO6;
+	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+			     NETIF_F_HIGHDMA;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	netif_set_tso_max_size(dev, LSO_64K);
+	netif_set_tso_max_segs(dev, NIC_MAX_PHYS_BUF_COUNT_LSO2);
+
+	rtase_get_mac_address(dev);
+
+	tp->tally_vaddr = dma_alloc_coherent(&pdev->dev,
+					     sizeof(*tp->tally_vaddr),
+					     &tp->tally_paddr,
+					     GFP_KERNEL);
+	if (!tp->tally_vaddr) {
+		ret = -ENOMEM;
+		goto err_out;
+	}
+
+	rtase_tally_counter_clear(tp);
+
+	pci_set_drvdata(pdev, dev);
+
+	ret = register_netdev(dev);
+	if (ret != 0)
+		goto err_out;
+
+	netdev_dbg(dev, "%pM, IRQ %d\n", dev->dev_addr, dev->irq);
+
+	netif_carrier_off(dev);
+
+	goto out;
+
+err_out:
+	if (tp->tally_vaddr) {
+		dma_free_coherent(&pdev->dev,
+				  sizeof(*tp->tally_vaddr),
+				  tp->tally_vaddr,
+				  tp->tally_paddr);
+
+		tp->tally_vaddr = NULL;
+	}
+
+err_out_1:
+	rtase_release_board(pdev, dev, ioaddr);
+
+out:
+	return ret;
+}
+
+static void rtase_remove_one(struct pci_dev *pdev)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	struct rtase_private *tp = netdev_priv(dev);
+	struct rtase_int_vector *ivec;
+	u32 i;
+
+	for (i = 0; i < tp->int_nums; i++) {
+		ivec = &tp->int_vector[i];
+		netif_napi_del(&ivec->napi);
+	}
+
+	unregister_netdev(dev);
+	rtase_reset_interrupt(pdev, tp);
+	if (tp->tally_vaddr) {
+		dma_free_coherent(&pdev->dev,
+				  sizeof(*tp->tally_vaddr),
+				  tp->tally_vaddr,
+				  tp->tally_paddr);
+		tp->tally_vaddr = NULL;
+	}
+
+	rtase_release_board(pdev, dev, tp->mmio_addr);
+	pci_set_drvdata(pdev, NULL);
+}
+
+static void rtase_shutdown(struct pci_dev *pdev)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	const struct rtase_private *tp = netdev_priv(dev);
+
+	if (netif_running(dev))
+		rtase_close(dev);
+
+	rtase_reset_interrupt(pdev, tp);
+}
+
+static struct pci_driver rtase_pci_driver = {
+	.name = KBUILD_MODNAME,
+	.id_table = rtase_pci_tbl,
+	.probe = rtase_init_one,
+	.remove = rtase_remove_one,
+	.shutdown = rtase_shutdown,
+};
+
+module_pci_driver(rtase_pci_driver);
-- 
2.34.1


* [PATCH net-next v10 08/13] net:ethernet:realtek:rtase: Implement net_device_ops
From: Justin Lai @ 2023-11-02 15:45 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

1. Implement .ndo_set_rx_mode so that the device can change address
list filtering.
2. Implement .ndo_set_mac_address so that the MAC address can be changed.
3. Implement .ndo_change_mtu so that the MTU can be changed.
4. Implement .ndo_tx_timeout to dump the device state and reset the
device when the transmitter does not make any progress.
5. Implement .ndo_get_stats64 to provide the statistics that are
reported when the user queries network device usage.
6. Implement .ndo_vlan_rx_add_vid to register a VLAN ID when the device
supports VLAN filtering.
7. Implement .ndo_vlan_rx_kill_vid to unregister a VLAN ID when the
device supports VLAN filtering.
8. Implement .ndo_setup_tc to enable setting any "tc" scheduler,
classifier or action on dev.
9. Implement .ndo_fix_features to adjust the requested feature flags
based on device-specific constraints.
10. Implement .ndo_set_features to update the device configuration to
the new features.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
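Note for reviewers: below is a minimal user-space sketch of the
fixed-point arithmetic that rtase_set_hw_cbs() performs for item 8. The
constants are copied from rtase.h in this series; the helper name
encode_idleslope() is hypothetical and exists only for illustration. It
is not the driver code and does not touch any register.

```c
#include <stdint.h>

#define RTASE_1T_CLOCK            64u
#define RTASE_1T_POWER            10000000u
#define RTASE_IDLESLOPE_INT_SHIFT 25
#define RTASE_IDLESLOPE_INT_MASK  0xFE000000u /* GENMASK(31, 25) */

/* Hypothetical helper: mirrors the arithmetic in rtase_set_hw_cbs(). */
static uint32_t encode_idleslope(uint32_t idleslope)
{
	uint32_t idle = idleslope * RTASE_1T_CLOCK;
	uint32_t val;
	int i;

	/* integer part of idle / RTASE_1T_POWER goes into bits 31:25 */
	val = ((idle / RTASE_1T_POWER) << RTASE_IDLESLOPE_INT_SHIFT) &
	      RTASE_IDLESLOPE_INT_MASK;
	idle %= RTASE_1T_POWER;

	/* fractional part, one binary digit per iteration, bits 24:0 */
	for (i = 1; i <= RTASE_IDLESLOPE_INT_SHIFT; i++) {
		idle *= 2;
		if (idle / RTASE_1T_POWER == 1)
			val |= 1u << (RTASE_IDLESLOPE_INT_SHIFT - i);
		idle %= RTASE_1T_POWER;
	}

	return val;
}
```

With these constants, an idleslope of 156250 encodes as 0x02000000
(integer part 1, no fraction) and 78125 encodes as 0x01000000 (one
half).
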
 .../net/ethernet/realtek/rtase/rtase_main.c   | 378 ++++++++++++++++++
 1 file changed, 378 insertions(+)

diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index fc0bd1bfd83b..feae944bc5c0 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -1444,6 +1444,11 @@ static netdev_tx_t rtase_start_xmit(struct sk_buff *skb,
 	return NETDEV_TX_BUSY;
 }
 
+static void rtase_set_rx_mode(struct net_device *dev)
+{
+	rtase_hw_set_rx_packet_filter(dev);
+}
+
 static void rtase_enable_eem_write(const struct rtase_private *tp)
 {
 	u8 val;
@@ -1476,6 +1481,290 @@ static void rtase_rar_set(const struct rtase_private *tp, const u8 *addr)
 	rtase_w16(tp, RTASE_LBK_CTRL, LBK_ATLD | LBK_CLR);
 }
 
+static int rtase_set_mac_address(struct net_device *dev, void *p)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	int ret;
+
+	ret = eth_mac_addr(dev, p);
+	if (ret)
+		return ret;
+
+	rtase_rar_set(tp, dev->dev_addr);
+
+	return 0;
+}
+
+static int rtase_change_mtu(struct net_device *dev, int new_mtu)
+{
+	dev->mtu = new_mtu;
+
+	netdev_update_features(dev);
+
+	return 0;
+}
+
+static void rtase_wait_for_quiescence(const struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	struct rtase_int_vector *ivec;
+	u32 i;
+
+	for (i = 0; i < tp->int_nums; i++) {
+		ivec = &tp->int_vector[i];
+		synchronize_irq(ivec->irq);
+		/* wait for any pending NAPI task to complete */
+		napi_disable(&ivec->napi);
+	}
+
+	rtase_irq_dis_and_clear(tp);
+
+	for (i = 0; i < tp->int_nums; i++) {
+		ivec = &tp->int_vector[i];
+		napi_enable(&ivec->napi);
+	}
+}
+
+static void rtase_sw_reset(struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	int ret;
+
+	netif_stop_queue(dev);
+	netif_carrier_off(dev);
+	rtase_hw_reset(dev);
+
+	/* wait for any in-flight (async) irq to be handled */
+	rtase_wait_for_quiescence(dev);
+	rtase_tx_clear(tp);
+	rtase_rx_clear(tp);
+
+	ret = rtase_init_ring(dev);
+	if (ret)
+		netdev_alert(dev, "unable to init ring\n");
+
+	rtase_hw_config(dev);
+	/* always link, so start to transmit & receive */
+	rtase_hw_start(dev);
+
+	netif_carrier_on(dev);
+	netif_wake_queue(dev);
+}
+
+static void rtase_dump_tally_counter(const struct rtase_private *tp)
+{
+	dma_addr_t paddr = tp->tally_paddr;
+	u32 cmd = lower_32_bits(paddr);
+	u32 val;
+	int err;
+
+	rtase_w32(tp, RTASE_DTCCR4, upper_32_bits(paddr));
+	rtase_w32(tp, RTASE_DTCCR0, cmd);
+	rtase_w32(tp, RTASE_DTCCR0, cmd | COUNTER_DUMP);
+
+	err = read_poll_timeout(rtase_r32, val, !(val & COUNTER_DUMP), 10, 250,
+				false, tp, RTASE_DTCCR0);
+
+	if (err == -ETIMEDOUT)
+		netdev_err(tp->dev, "error occurred in dump tally counter\n");
+}
+
+static void rtase_dump_state(const struct net_device *dev)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	const struct rtase_counters *counters;
+	int max_reg_size = RTASE_PCI_REGS_SIZE;
+	const struct rtase_ring *ring;
+	u32 dword_rd;
+	int n = 0;
+
+	ring = &tp->tx_ring[0];
+	netdev_err(dev, "Tx descriptor info:\n");
+	netdev_err(dev, "Tx curIdx = 0x%x\n", ring->cur_idx);
+	netdev_err(dev, "Tx dirtyIdx = 0x%x\n", ring->dirty_idx);
+	netdev_err(dev, "Tx phyAddr = %pad\n", &ring->phy_addr);
+
+	ring = &tp->rx_ring[0];
+	netdev_err(dev, "Rx descriptor info:\n");
+	netdev_err(dev, "Rx curIdx = 0x%x\n", ring->cur_idx);
+	netdev_err(dev, "Rx dirtyIdx = 0x%x\n", ring->dirty_idx);
+	netdev_err(dev, "Rx phyAddr = %pad\n", &ring->phy_addr);
+
+	netdev_err(dev, "Device Registers:\n");
+	netdev_err(dev, "Chip Command = 0x%02x\n", rtase_r8(tp, RTASE_CHIP_CMD));
+	netdev_err(dev, "IMR = %08x\n", rtase_r32(tp, RTASE_IMR0));
+	netdev_err(dev, "ISR = %08x\n", rtase_r32(tp, RTASE_ISR0));
+	netdev_err(dev, "Boot Ctrl Reg(0xE004) = %04x\n",
+		   rtase_r16(tp, RTASE_BOOT_CTL));
+	netdev_err(dev, "EPHY ISR(0xE014) = %04x\n",
+		   rtase_r16(tp, RTASE_EPHY_ISR));
+	netdev_err(dev, "EPHY IMR(0xE016) = %04x\n",
+		   rtase_r16(tp, RTASE_EPHY_IMR));
+	netdev_err(dev, "CLKSW SET REG(0xE018) = %04x\n",
+		   rtase_r16(tp, RTASE_CLKSW_SET));
+
+	netdev_err(dev, "Dump PCI Registers:\n");
+
+	while (n < max_reg_size) {
+		if ((n % RTASE_DWORD_MOD) == 0)
+			netdev_err(tp->dev, "0x%03x:\n", n);
+
+		pci_read_config_dword(tp->pdev, n, &dword_rd);
+		netdev_err(tp->dev, "%08x\n", dword_rd);
+		n += 4;
+	}
+
+	netdev_err(dev, "Dump tally counter:\n");
+	counters = tp->tally_vaddr;
+	rtase_dump_tally_counter(tp);
+
+	netdev_err(dev, "tx_packets %llu\n",
+		   le64_to_cpu(counters->tx_packets));
+	netdev_err(dev, "rx_packets %llu\n",
+		   le64_to_cpu(counters->rx_packets));
+	netdev_err(dev, "tx_errors %llu\n",
+		   le64_to_cpu(counters->tx_errors));
+	netdev_err(dev, "rx_missed %u\n",
+		   le16_to_cpu(counters->rx_missed));
+	netdev_err(dev, "align_errors %u\n",
+		   le16_to_cpu(counters->align_errors));
+	netdev_err(dev, "tx_one_collision %u\n",
+		   le32_to_cpu(counters->tx_one_collision));
+	netdev_err(dev, "tx_multi_collision %u\n",
+		   le32_to_cpu(counters->tx_multi_collision));
+	netdev_err(dev, "rx_unicast %llu\n",
+		   le64_to_cpu(counters->rx_unicast));
+	netdev_err(dev, "rx_broadcast %llu\n",
+		   le64_to_cpu(counters->rx_broadcast));
+	netdev_err(dev, "rx_multicast %u\n",
+		   le32_to_cpu(counters->rx_multicast));
+	netdev_err(dev, "tx_aborted %u\n",
+		   le16_to_cpu(counters->tx_aborted));
+	netdev_err(dev, "tx_underun %u\n",
+		   le16_to_cpu(counters->tx_underun));
+}
+
+static void rtase_tx_timeout(struct net_device *dev, unsigned int txqueue)
+{
+	rtase_dump_state(dev);
+	rtase_sw_reset(dev);
+}
+
+static void rtase_get_stats64(struct net_device *dev,
+			      struct rtnl_link_stats64 *stats)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	const struct rtase_counters *counters = tp->tally_vaddr;
+
+	if (!counters)
+		return;
+
+	netdev_stats_to_stats64(stats, &dev->stats);
+	dev_fetch_sw_netstats(stats, dev->tstats);
+
+	/* fetch additional counter values missing in stats collected by driver
+	 * from tally counter
+	 */
+	rtase_dump_tally_counter(tp);
+
+	stats->tx_errors = le64_to_cpu(counters->tx_errors);
+	stats->collisions = le32_to_cpu(counters->tx_multi_collision);
+	stats->tx_aborted_errors = le16_to_cpu(counters->tx_aborted);
+	stats->rx_missed_errors = le16_to_cpu(counters->rx_missed);
+}
+
+static void rtase_enable_vlan_filter(const struct rtase_private *tp, bool enabled)
+{
+	u16 tmp;
+
+	if (enabled) {
+		tmp = rtase_r16(tp, RTASE_FCR);
+		if (!(tmp & FCR_VLAN_FTR_EN))
+			rtase_w16(tp, RTASE_FCR, tmp | FCR_VLAN_FTR_EN);
+
+		tmp = rtase_r16(tp, RTASE_PCPR);
+		if (!(tmp & PCPR_VLAN_FTR_EN))
+			rtase_w16(tp, RTASE_PCPR, tmp | PCPR_VLAN_FTR_EN);
+	} else {
+		tmp = rtase_r16(tp, RTASE_FCR);
+		if (tmp & FCR_VLAN_FTR_EN)
+			rtase_w16(tp, RTASE_FCR, tmp & ~FCR_VLAN_FTR_EN);
+
+		tmp = rtase_r16(tp, RTASE_PCPR);
+		if (tmp & PCPR_VLAN_FTR_EN)
+			rtase_w16(tp, RTASE_PCPR, tmp & ~PCPR_VLAN_FTR_EN);
+	}
+}
+
+static int rtase_vlan_rx_add_vid(struct net_device *dev, __be16 protocol,
+				 u16 vid)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	u16 tmp_mem, i;
+
+	if (be16_to_cpu(protocol) != ETH_P_8021Q)
+		return -EINVAL;
+
+	for (i = 0; i < RTASE_VLAN_FILTER_ENTRY_NUM; i++) {
+		u16 addr, mask;
+
+		if (!(tp->vlan_filter_ctrl & BIT(i))) {
+			tp->vlan_filter_ctrl |= BIT(i);
+			tp->vlan_filter_vid[i] = vid;
+			rtase_w32(tp, RTASE_VLAN_ENTRY_0 + i * 4,
+				  vid | VLAN_ENTRY_CAREBIT);
+			/* each 16-bit register contains two VLAN entries */
+			addr = RTASE_VLAN_ENTRY_MEM_0 + (i & ~0x1);
+			mask = 0x1 << ((i & 0x1) * 8);
+			tmp_mem = rtase_r16(tp, addr);
+			tmp_mem |= mask;
+			rtase_w16(tp, addr, tmp_mem);
+			break;
+		}
+	}
+
+	if (i == RTASE_VLAN_FILTER_ENTRY_NUM)
+		return -ENOENT;
+
+	rtase_enable_vlan_filter(tp, true);
+
+	return 0;
+}
+
+static int rtase_vlan_rx_kill_vid(struct net_device *dev, __be16 protocol,
+				  u16 vid)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	u16 tmp_mem, i;
+
+	if (be16_to_cpu(protocol) != ETH_P_8021Q)
+		return -EINVAL;
+
+	for (i = 0; i < RTASE_VLAN_FILTER_ENTRY_NUM; i++) {
+		u16 addr, mask;
+
+		if (tp->vlan_filter_vid[i] == vid) {
+			tp->vlan_filter_ctrl &= ~BIT(i);
+			tp->vlan_filter_vid[i] = 0;
+			rtase_w32(tp, RTASE_VLAN_ENTRY_0 + i * 4, 0);
+
+			/* each 16-bit register contains two VLAN entries */
+			addr = RTASE_VLAN_ENTRY_MEM_0 + (i & ~0x1);
+			mask = ~(0x1 << ((i & 0x1) * 8));
+			tmp_mem = rtase_r16(tp, addr);
+			tmp_mem &= mask;
+			rtase_w16(tp, addr, tmp_mem);
+			break;
+		}
+	}
+
+	/* check vlan filter enabled */
+	if (!tp->vlan_filter_ctrl)
+		rtase_enable_vlan_filter(tp, false);
+
+	return 0;
+}
+
 #ifdef CONFIG_NET_POLL_CONTROLLER
 /* Polling 'interrupt' - used by things like netconsole to send skbs
  * without having to re-enable interrupts. It's not called while
@@ -1492,13 +1781,102 @@ static void rtase_netpoll(struct net_device *dev)
 }
 #endif
 
+static void rtase_set_hw_cbs(const struct rtase_private *tp, u32 queue)
+{
+	u32 idle = tp->tx_qos[queue].idleslope * RTASE_1T_CLOCK;
+	u32 val, i;
+
+	val = u32_encode_bits(idle / RTASE_1T_POWER, RTASE_IDLESLOPE_INT_MASK);
+	idle %= RTASE_1T_POWER;
+
+	for (i = 1; i <= RTASE_IDLESLOPE_INT_SHIFT; i++) {
+		idle *= 2;
+		if ((idle / RTASE_1T_POWER) == 1)
+			val |= BIT(RTASE_IDLESLOPE_INT_SHIFT - i);
+
+		idle %= RTASE_1T_POWER;
+	}
+	rtase_w32(tp, RTASE_TXQCRDT_0 + queue * 4, val);
+}
+
+static int rtase_setup_tc_cbs(struct rtase_private *tp,
+			      const struct tc_cbs_qopt_offload *qopt)
+{
+	u32 queue = qopt->queue;
+
+	tp->tx_qos[queue].hicredit = qopt->hicredit;
+	tp->tx_qos[queue].locredit = qopt->locredit;
+	tp->tx_qos[queue].idleslope = qopt->idleslope;
+	tp->tx_qos[queue].sendslope = qopt->sendslope;
+
+	/* set hardware cbs */
+	rtase_set_hw_cbs(tp, queue);
+
+	return 0;
+}
+
+static int rtase_setup_tc(struct net_device *dev, enum tc_setup_type type,
+			  void *type_data)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	int ret = 0;
+
+	switch (type) {
+	case TC_SETUP_QDISC_CBS:
+		ret = rtase_setup_tc_cbs(tp, type_data);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return ret;
+}
+
+static netdev_features_t rtase_fix_features(struct net_device *dev,
+					    netdev_features_t features)
+{
+	netdev_features_t features_fix = features;
+
+	if (dev->mtu > MSS_MAX)
+		features_fix &= ~NETIF_F_ALL_TSO;
+
+	if (dev->mtu > ETH_DATA_LEN)
+		features_fix &= ~NETIF_F_ALL_TSO;
+
+	return features_fix;
+}
+
+static int rtase_set_features(struct net_device *dev,
+			      netdev_features_t features)
+{
+	netdev_features_t features_set = features;
+
+	features_set &= NETIF_F_RXALL | NETIF_F_RXCSUM |
+			NETIF_F_HW_VLAN_CTAG_RX;
+
+	if (features_set ^ dev->features)
+		rtase_hw_set_features(dev, features_set);
+
+	return 0;
+}
+
 static const struct net_device_ops rtase_netdev_ops = {
 	.ndo_open = rtase_open,
 	.ndo_stop = rtase_close,
 	.ndo_start_xmit = rtase_start_xmit,
+	.ndo_set_rx_mode = rtase_set_rx_mode,
+	.ndo_set_mac_address = rtase_set_mac_address,
+	.ndo_change_mtu = rtase_change_mtu,
+	.ndo_tx_timeout = rtase_tx_timeout,
+	.ndo_get_stats64 = rtase_get_stats64,
+	.ndo_vlan_rx_add_vid = rtase_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid = rtase_vlan_rx_kill_vid,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller = rtase_netpoll,
 #endif
+	.ndo_setup_tc = rtase_setup_tc,
+	.ndo_fix_features = rtase_fix_features,
+	.ndo_set_features = rtase_set_features,
 };
 
 static void rtase_get_mac_address(struct net_device *dev)
-- 
2.34.1


* [PATCH net-next v10 05/13] net:ethernet:realtek:rtase: Implement hardware configuration function
From: Justin Lai @ 2023-11-02 15:44 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Implement rtase_hw_config to apply the default hardware settings:
interrupt mitigation, tx/rx DMA burst, interframe gap time, rx packet
filter, near FIFO threshold, descriptor ring and tally counter
addresses, and flow control. When filling the rx descriptor ring
addresses, the first queue needs to be handled separately because its
registers are not laid out at a regular offset from those of the
subsequent queues. The other queues are all newly added features, but
we want to retain the original design, so they were not put together.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
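Note for reviewers: a minimal user-space sketch of the rx descriptor
address register selection described above, assuming queue 0 has its
own register and queues 1..n are spaced 8 bytes apart as in
rtase_desc_addr_fill(). The EX_* offsets and the helper name are
placeholders invented for illustration; the real offsets are the
RTASE_Q0_RX_DESC_ADDR0 and RTASE_Q1_RX_DESC_ADDR0 definitions in
rtase.h.

```c
#include <stdint.h>

/* Placeholder offsets, for illustration only (not the real register map). */
#define EX_Q0_RX_DESC_ADDR0 0x00E4u
#define EX_Q1_RX_DESC_ADDR0 0x4000u

/* Hypothetical helper: offset of the low 32-bit address register of rx queue q. */
static uint32_t ex_rx_desc_addr0_reg(uint32_t q)
{
	/* queue 0 lives at its own, irregular location */
	if (q == 0)
		return EX_Q0_RX_DESC_ADDR0;

	/* queues 1..n sit 8 bytes apart, starting at the queue 1 register */
	return EX_Q1_RX_DESC_ADDR0 + (q - 1) * 8;
}
```

The high 32-bit register follows the same pattern, starting from the
corresponding *_ADDR4 definitions.
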
 .../net/ethernet/realtek/rtase/rtase_main.c   | 245 ++++++++++++++++++
 1 file changed, 245 insertions(+)

diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index 968150b2730c..166034b83506 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -466,6 +466,25 @@ static int rtase_init_ring(const struct net_device *dev)
 	return -ENOMEM;
 }
 
+static void rtase_interrupt_mitigation(const struct rtase_private *tp)
+{
+	u32 i;
+
+	/* tx interrupt mitigation */
+	for (i = 0; i < tp->func_tx_queue_num; i++)
+		rtase_w16(tp, RTASE_INT_MITI_TX + i * 2, tp->tx_int_mit);
+
+	/* rx interrupt mitigation */
+	for (i = 0; i < tp->func_rx_queue_num; i++)
+		rtase_w16(tp, RTASE_INT_MITI_RX + i * 2, tp->rx_int_mit);
+}
+
+static void rtase_tally_counter_addr_fill(const struct rtase_private *tp)
+{
+	rtase_w32(tp, RTASE_DTCCR4, upper_32_bits(tp->tally_paddr));
+	rtase_w32(tp, RTASE_DTCCR0, lower_32_bits(tp->tally_paddr));
+}
+
 static void rtase_tally_counter_clear(const struct rtase_private *tp)
 {
 	u32 cmd = lower_32_bits(tp->tally_paddr);
@@ -474,6 +493,125 @@ static void rtase_tally_counter_clear(const struct rtase_private *tp)
 	rtase_w32(tp, RTASE_DTCCR0, cmd | COUNTER_RESET);
 }
 
+static void rtase_desc_addr_fill(const struct rtase_private *tp)
+{
+	const struct rtase_ring *ring;
+	u16 i, cmd, val;
+	int err;
+
+	for (i = 0; i < tp->func_tx_queue_num; i++) {
+		ring = &tp->tx_ring[i];
+
+		rtase_w32(tp, RTASE_TX_DESC_ADDR0,
+			  lower_32_bits(ring->phy_addr));
+		rtase_w32(tp, RTASE_TX_DESC_ADDR4,
+			  upper_32_bits(ring->phy_addr));
+
+		cmd = i | TX_DESC_CMD_WE | TX_DESC_CMD_CS;
+		rtase_w16(tp, RTASE_TX_DESC_COMMAND, cmd);
+
+		err = read_poll_timeout(rtase_r16, val, !(val & TX_DESC_CMD_CS),
+					10, 1000, false, tp, RTASE_TX_DESC_COMMAND);
+
+		if (err == -ETIMEDOUT)
+			netdev_err(tp->dev, "error occurred in fill tx descriptor\n");
+	}
+
+	for (i = 0; i < tp->func_rx_queue_num; i++) {
+		ring = &tp->rx_ring[i];
+
+		if (i == 0) {
+			rtase_w32(tp, RTASE_Q0_RX_DESC_ADDR0,
+				  lower_32_bits(ring->phy_addr));
+			rtase_w32(tp, RTASE_Q0_RX_DESC_ADDR4,
+				  upper_32_bits(ring->phy_addr));
+		} else {
+			rtase_w32(tp, (RTASE_Q1_RX_DESC_ADDR0 + ((i - 1) * 8)),
+				  lower_32_bits(ring->phy_addr));
+			rtase_w32(tp, (RTASE_Q1_RX_DESC_ADDR4 + ((i - 1) * 8)),
+				  upper_32_bits(ring->phy_addr));
+		}
+	}
+}
+
+static int rtase_hw_set_features(const struct net_device *dev,
+				 netdev_features_t features)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	u16 rx_config, val;
+
+	rx_config = rtase_r16(tp, RTASE_RX_CONFIG_0);
+	if (features & NETIF_F_RXALL)
+		rx_config |= (ACCEPT_ERR | ACCEPT_RUNT);
+	else
+		rx_config &= ~(ACCEPT_ERR | ACCEPT_RUNT);
+
+	rtase_w16(tp, RTASE_RX_CONFIG_0, rx_config);
+
+	val = rtase_r16(tp, RTASE_CPLUS_CMD);
+	if (features & NETIF_F_RXCSUM)
+		rtase_w16(tp, RTASE_CPLUS_CMD, val | RX_CHKSUM);
+	else
+		rtase_w16(tp, RTASE_CPLUS_CMD, val & ~RX_CHKSUM);
+
+	rx_config = rtase_r16(tp, RTASE_RX_CONFIG_1);
+	if (dev->features & NETIF_F_HW_VLAN_CTAG_RX)
+		rx_config |= (INNER_VLAN_DETAG_EN | OUTER_VLAN_DETAG_EN);
+	else
+		rx_config &= ~(INNER_VLAN_DETAG_EN | OUTER_VLAN_DETAG_EN);
+
+	rtase_w16(tp, RTASE_RX_CONFIG_1, rx_config);
+
+	return 0;
+}
+
+static void rtase_set_mar(const struct rtase_private *tp)
+{
+	rtase_w32(tp, RTASE_MAR0, tp->mc_filter[0]);
+	rtase_w32(tp, RTASE_MAR1, tp->mc_filter[1]);
+}
+
+static void rtase_hw_set_rx_packet_filter(struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	u32 mc_filter[2] = { 0xFFFFFFFF, 0xFFFFFFFF };
+	u16 rx_mode;
+
+	rx_mode = rtase_r16(tp, RTASE_RX_CONFIG_0) & ~ACCEPT_MASK;
+	rx_mode |= ACCEPT_BROADCAST | ACCEPT_MYPHYS;
+
+	if (dev->flags & IFF_PROMISC) {
+		rx_mode |= ACCEPT_MULTICAST | ACCEPT_ALLPHYS;
+	} else if ((netdev_mc_count(dev) > MULTICAST_FILTER_LIMIT) ||
+		   (dev->flags & IFF_ALLMULTI)) {
+		/* too many to filter perfectly -- accept all multicasts */
+		rx_mode |= ACCEPT_MULTICAST;
+	} else {
+		struct netdev_hw_addr *hw_addr;
+
+		mc_filter[0] = 0;
+		mc_filter[1] = 0;
+
+		netdev_for_each_mc_addr(hw_addr, dev) {
+			u32 bit_nr = eth_hw_addr_crc(hw_addr);
+			u32 idx = u32_get_bits(bit_nr, BIT(31));
+			u32 bit = u32_get_bits(bit_nr, MULTICAST_FILTER_MASK);
+
+			mc_filter[idx] |= BIT(bit);
+			rx_mode |= ACCEPT_MULTICAST;
+		}
+	}
+
+	if (dev->features & NETIF_F_RXALL)
+		rx_mode |= ACCEPT_ERR | ACCEPT_RUNT;
+
+	tp->mc_filter[0] = swab32(mc_filter[1]);
+	tp->mc_filter[1] = swab32(mc_filter[0]);
+
+	rtase_set_mar(tp);
+	rtase_w16(tp, RTASE_RX_CONFIG_0, rx_mode);
+}
+
 static void rtase_irq_dis_and_clear(const struct rtase_private *tp)
 {
 	const struct rtase_int_vector *ivec = &tp->int_vector[0];
@@ -545,6 +683,113 @@ static void rtase_hw_reset(const struct net_device *dev)
 	rtase_nic_reset(dev);
 }
 
+static void rtase_set_rx_queue(const struct rtase_private *tp)
+{
+	u16 reg_data;
+
+	reg_data = rtase_r16(tp, RTASE_FCR);
+	switch (tp->func_rx_queue_num) {
+	case 1:
+		u16p_replace_bits(&reg_data, 0x1, FCR_RXQ_MASK);
+		break;
+	case 2:
+		u16p_replace_bits(&reg_data, 0x2, FCR_RXQ_MASK);
+		break;
+	case 4:
+		u16p_replace_bits(&reg_data, 0x3, FCR_RXQ_MASK);
+		break;
+	}
+	rtase_w16(tp, RTASE_FCR, reg_data);
+}
+
+static void rtase_set_tx_queue(const struct rtase_private *tp)
+{
+	u16 reg_data;
+
+	reg_data = rtase_r16(tp, RTASE_TX_CONFIG_1);
+	switch (tp->tx_queue_ctrl) {
+	case 1:
+		u16p_replace_bits(&reg_data, 0x0, TC_MODE_MASK);
+		break;
+	case 2:
+		u16p_replace_bits(&reg_data, 0x1, TC_MODE_MASK);
+		break;
+	case 3:
+	case 4:
+		u16p_replace_bits(&reg_data, 0x2, TC_MODE_MASK);
+		break;
+	default:
+		u16p_replace_bits(&reg_data, 0x3, TC_MODE_MASK);
+		break;
+	}
+	rtase_w16(tp, RTASE_TX_CONFIG_1, reg_data);
+}
+
+static void rtase_hw_config(struct net_device *dev)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	u32 reg_data32;
+	u16 reg_data16;
+
+	rtase_hw_reset(dev);
+
+	/* Set Rx DMA burst */
+	reg_data16 = rtase_r16(tp, RTASE_RX_CONFIG_0);
+	reg_data16 &= ~(RX_SINGLE_TAG | RX_SINGLE_FETCH);
+	u16p_replace_bits(&reg_data16, RX_DMA_BURST_256, RX_MX_DMA_MASK);
+	rtase_w16(tp, RTASE_RX_CONFIG_0, reg_data16);
+
+	/* New Rx Descriptor */
+	reg_data16 = rtase_r16(tp, RTASE_RX_CONFIG_1);
+	reg_data16 |= RX_NEW_DESC_FORMAT_EN | PCIE_NEW_FLOW;
+	u16p_replace_bits(&reg_data16, 0xF, RX_MAX_FETCH_DESC_MASK);
+	rtase_w16(tp, RTASE_RX_CONFIG_1, reg_data16);
+
+	rtase_set_rx_queue(tp);
+
+	/* interrupt mitigation */
+	rtase_interrupt_mitigation(tp);
+
+	/* set tx DMA burst size and interframe gap time */
+	reg_data32 = rtase_r32(tp, RTASE_TX_CONFIG_0);
+	u32p_replace_bits(&reg_data32, TX_DMA_BURST_UNLIMITED, TX_DMA_MASK);
+	u32p_replace_bits(&reg_data32, INTERFRAMEGAP, TX_INTER_FRAME_GAP_MASK);
+	rtase_w32(tp, RTASE_TX_CONFIG_0, reg_data32);
+
+	/* new tx Descriptor */
+	reg_data16 = rtase_r16(tp, RTASE_TFUN_CTRL);
+	rtase_w16(tp, RTASE_TFUN_CTRL, reg_data16 | TX_NEW_DESC_FORMAT_EN);
+
+	/* tx Fetch Desc Number */
+	rtase_w8(tp, RTASE_TDFNR, 0x10);
+
+	/* tag num select */
+	reg_data16 = rtase_r16(tp, RTASE_MTPS);
+	u16p_replace_bits(&reg_data16, 0x4, TAG_NUM_SEL_MASK);
+	rtase_w16(tp, RTASE_MTPS, reg_data16);
+
+	rtase_set_tx_queue(tp);
+
+	/* TOK condition */
+	rtase_w16(tp, RTASE_TOKSEL, 0x5555);
+
+	rtase_tally_counter_addr_fill(tp);
+	rtase_desc_addr_fill(tp);
+	rtase_hw_set_features(dev, dev->features);
+
+	/* enable flow control */
+	reg_data16 = rtase_r16(tp, RTASE_CPLUS_CMD);
+	reg_data16 |= (FORCE_TXFLOW_EN | FORCE_RXFLOW_EN);
+	rtase_w16(tp, RTASE_CPLUS_CMD, reg_data16);
+	/* set Near FIFO Threshold - rx missed issue. */
+	rtase_w16(tp, RTASE_RFIFONFULL, 0x190);
+
+	rtase_w16(tp, RTASE_RMS, tp->rx_buf_sz);
+
+	/* set Rx packet filter */
+	rtase_hw_set_rx_packet_filter(dev);
+}
+
 static void rtase_nic_enable(const struct net_device *dev)
 {
 	const struct rtase_private *tp = netdev_priv(dev);
-- 
2.34.1



* [PATCH net-next v10 10/13] net:ethernet:realtek:rtase: Implement ethtool function
From: Justin Lai @ 2023-11-02 15:45 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Implement the ethtool operations so that users can query and configure
the network card: get various device settings, report whether the
physical link is up, report and set pause parameters, return the set of
strings that describe the requested objects, get the number of strings
that @get_strings will write, and return extended statistics about the
device.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
 .../net/ethernet/realtek/rtase/rtase_main.c   | 144 ++++++++++++++++++
 1 file changed, 144 insertions(+)
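
Editorial note, not part of the patch: the pause handling in this patch
comes down to clearing and setting two flag bits in a 16-bit register
value. The user-space sketch below shows that round trip. The bit
positions and all ex_* names are hypothetical; the real FORCE_TXFLOW_EN
and FORCE_RXFLOW_EN definitions are not part of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define EX_FORCE_TXFLOW_EN	(1u << 0)
#define EX_FORCE_RXFLOW_EN	(1u << 1)

struct ex_pause {
	bool rx_pause;
	bool tx_pause;
};

/* Clear both flow-control bits, then set the requested ones, leaving
 * every other bit of the register value untouched.
 */
static uint16_t ex_pause_encode(uint16_t reg, struct ex_pause p)
{
	reg &= (uint16_t)~(EX_FORCE_TXFLOW_EN | EX_FORCE_RXFLOW_EN);

	if (p.tx_pause)
		reg |= EX_FORCE_TXFLOW_EN;
	if (p.rx_pause)
		reg |= EX_FORCE_RXFLOW_EN;

	return reg;
}

/* Each direction is reported independently from its own bit. */
static struct ex_pause ex_pause_decode(uint16_t reg)
{
	struct ex_pause p = {
		.rx_pause = (reg & EX_FORCE_RXFLOW_EN) != 0,
		.tx_pause = (reg & EX_FORCE_TXFLOW_EN) != 0,
	};

	return p;
}
```

Decoding each bit on its own yields the same result as the three-way
if/else chain in rtase_get_pauseparam(), provided the caller starts
from a zeroed pause structure.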

diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index 829f6df163e9..7f2351b8cc52 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -1902,9 +1902,153 @@ static void rtase_get_mac_address(struct net_device *dev)
 	ether_addr_copy(dev->perm_addr, dev->dev_addr);
 }
 
+static void rtase_get_drvinfo(struct net_device *dev,
+			      struct ethtool_drvinfo *drvinfo)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+
+	strscpy(drvinfo->driver, KBUILD_MODNAME, 32);
+	strscpy(drvinfo->bus_info, pci_name(tp->pdev), 32);
+}
+
+static int rtase_get_settings(struct net_device *dev,
+			      struct ethtool_link_ksettings *cmd)
+{
+	u32 supported = SUPPORTED_MII | SUPPORTED_Pause;
+
+	ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+						supported);
+	cmd->base.speed = SPEED_5000;
+	cmd->base.duplex = DUPLEX_FULL;
+	cmd->base.port = PORT_MII;
+	cmd->base.autoneg = AUTONEG_DISABLE;
+
+	return 0;
+}
+
+static void rtase_get_pauseparam(struct net_device *dev,
+				 struct ethtool_pauseparam *pause)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	u16 value = rtase_r16(tp, RTASE_CPLUS_CMD);
+
+	pause->autoneg = AUTONEG_DISABLE;
+
+	if ((value & (FORCE_TXFLOW_EN | FORCE_RXFLOW_EN)) ==
+	    (FORCE_TXFLOW_EN | FORCE_RXFLOW_EN)) {
+		pause->rx_pause = 1;
+		pause->tx_pause = 1;
+	} else if ((value & FORCE_TXFLOW_EN)) {
+		pause->tx_pause = 1;
+	} else if ((value & FORCE_RXFLOW_EN)) {
+		pause->rx_pause = 1;
+	}
+}
+
+static int rtase_set_pauseparam(struct net_device *dev,
+				struct ethtool_pauseparam *pause)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	u16 value = rtase_r16(tp, RTASE_CPLUS_CMD);
+
+	if (pause->autoneg)
+		return -EOPNOTSUPP;
+
+	value &= ~(FORCE_TXFLOW_EN | FORCE_RXFLOW_EN);
+
+	if (pause->tx_pause)
+		value |= FORCE_TXFLOW_EN;
+
+	if (pause->rx_pause)
+		value |= FORCE_RXFLOW_EN;
+
+	rtase_w16(tp, RTASE_CPLUS_CMD, value);
+	return 0;
+}
+
+static const char rtase_gstrings[][ETH_GSTRING_LEN] = {
+	"tx_packets",
+	"rx_packets",
+	"tx_errors",
+	"rx_errors",
+	"rx_missed",
+	"align_errors",
+	"tx_single_collisions",
+	"tx_multi_collisions",
+	"unicast",
+	"broadcast",
+	"multicast",
+	"tx_aborted",
+	"tx_underrun",
+};
+
+static void rtase_get_strings(struct net_device *dev, u32 stringset, u8 *data)
+{
+	switch (stringset) {
+	case ETH_SS_STATS:
+		memcpy(data, *rtase_gstrings, sizeof(rtase_gstrings));
+		break;
+	}
+}
+
+static int rtase_get_sset_count(struct net_device *dev, int sset)
+{
+	int ret = -EOPNOTSUPP;
+
+	switch (sset) {
+	case ETH_SS_STATS:
+		ret = ARRAY_SIZE(rtase_gstrings);
+		break;
+	}
+
+	return ret;
+}
+
+static void rtase_get_ethtool_stats(struct net_device *dev,
+				    struct ethtool_stats *stats, u64 *data)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	const struct rtase_counters *counters;
+
+	ASSERT_RTNL();
+
+	counters = tp->tally_vaddr;
+	if (!counters)
+		return;
+
+	rtase_dump_tally_counter(tp);
+
+	data[0] = le64_to_cpu(counters->tx_packets);
+	data[1] = le64_to_cpu(counters->rx_packets);
+	data[2] = le64_to_cpu(counters->tx_errors);
+	data[3] = le32_to_cpu(counters->rx_errors);
+	data[4] = le16_to_cpu(counters->rx_missed);
+	data[5] = le16_to_cpu(counters->align_errors);
+	data[6] = le32_to_cpu(counters->tx_one_collision);
+	data[7] = le32_to_cpu(counters->tx_multi_collision);
+	data[8] = le64_to_cpu(counters->rx_unicast);
+	data[9] = le64_to_cpu(counters->rx_broadcast);
+	data[10] = le32_to_cpu(counters->rx_multicast);
+	data[11] = le16_to_cpu(counters->tx_aborted);
+	data[12] = le16_to_cpu(counters->tx_underun);
+}
+
+static const struct ethtool_ops rtase_ethtool_ops = {
+	.get_drvinfo = rtase_get_drvinfo,
+	.get_link = ethtool_op_get_link,
+	.get_link_ksettings = rtase_get_settings,
+	.get_pauseparam = rtase_get_pauseparam,
+	.set_pauseparam = rtase_set_pauseparam,
+	.get_strings = rtase_get_strings,
+	.get_sset_count = rtase_get_sset_count,
+	.get_ethtool_stats = rtase_get_ethtool_stats,
+	.get_ts_info = ethtool_op_get_ts_info,
+};
+
 static void rtase_init_netdev_ops(struct net_device *dev)
 {
 	dev->netdev_ops = &rtase_netdev_ops;
+	dev->ethtool_ops = &rtase_ethtool_ops;
 }
 
 static void rtase_reset_interrupt(struct pci_dev *pdev,
-- 
2.34.1



* [PATCH net-next v10 02/13] net:ethernet:realtek:rtase: Implement the .ndo_open function
From: Justin Lai @ 2023-11-02 15:44 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, linux-kernel, netdev, andrew, pkshih,
	larry.chiu, Justin Lai
In-Reply-To: <20231102154505.940783-1-justinlai0215@realtek.com>

Implement the .ndo_open function to set the default hardware settings
and initialize the descriptor rings and interrupts. When requesting
IRQs, the first group of interrupts is handled separately because it
has to process more events, so its overall structure differs from that
of the other interrupt groups.

Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
 .../net/ethernet/realtek/rtase/rtase_main.c   | 422 ++++++++++++++++++
 1 file changed, 422 insertions(+)
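
Editorial note, not part of the patch: the sketch below models the
"first vector is special" IRQ request pattern in user space, together
with one common way of releasing already-requested vectors when a later
request fails. It is a generic illustration and not the code in this
patch. Every ex_* name is hypothetical, and ex_request_irq() and
ex_free_irq() are stubs standing in for the kernel's request_irq() and
free_irq().

```c
#include <stddef.h>

enum ex_handler {
	EX_HANDLER_NONE,
	EX_HANDLER_MAIN,	/* vector 0: handles the extra events */
	EX_HANDLER_QUEUE,	/* vectors 1..n: per-queue work only */
};

struct ex_vector {
	int irq;
	enum ex_handler handler;
	int requested;
};

/* Test hook: the stub fails for this irq number (-1 means never). */
static int ex_fail_irq = -1;

static int ex_request_irq(struct ex_vector *v, enum ex_handler h)
{
	if (v->irq == ex_fail_irq)
		return -1;

	v->handler = h;
	v->requested = 1;
	return 0;
}

static void ex_free_irq(struct ex_vector *v)
{
	v->handler = EX_HANDLER_NONE;
	v->requested = 0;
}

/* Request every vector; on failure, free the ones already requested
 * in reverse order and return the error.
 */
static int ex_request_all(struct ex_vector *vec, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = ex_request_irq(&vec[i], i == 0 ? EX_HANDLER_MAIN :
						       EX_HANDLER_QUEUE);
		if (ret)
			goto err_unwind;
	}

	return 0;

err_unwind:
	while (i-- > 0)
		ex_free_irq(&vec[i]);

	return ret;
}
```

Unwinding in reverse order keeps the error path symmetric with the
request path, so a failure part-way through leaves no vector requested.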

diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index 03352a0b8a91..5bea71c25645 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -130,6 +130,291 @@ static u32 rtase_r32(const struct rtase_private *tp, u16 reg)
 	return readl(tp->mmio_addr + reg);
 }
 
+static void rtase_set_rxbufsize(struct rtase_private *tp)
+{
+	tp->rx_buf_sz = RX_BUF_SIZE;
+}
+
+static int rtase_alloc_desc(struct rtase_private *tp)
+{
+	struct pci_dev *pdev = tp->pdev;
+	u32 i;
+
+	/* rx and tx descriptors need 256-byte alignment.
+	 * dma_alloc_coherent provides more.
+	 */
+	for (i = 0; i < tp->func_tx_queue_num; i++) {
+		tp->tx_ring[i].desc = dma_alloc_coherent(&pdev->dev,
+							 RTASE_TX_RING_DESC_SIZE,
+							 &tp->tx_ring[i].phy_addr,
+							 GFP_KERNEL);
+		if (!tp->tx_ring[i].desc)
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < tp->func_rx_queue_num; i++) {
+		tp->rx_ring[i].desc =
+			dma_alloc_coherent(&pdev->dev, RTASE_RX_RING_DESC_SIZE,
+					   &tp->rx_ring[i].phy_addr,
+					   GFP_KERNEL);
+		if (!tp->rx_ring[i].desc)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void rtase_free_desc(struct rtase_private *tp)
+{
+	struct pci_dev *pdev = tp->pdev;
+	u32 i;
+
+	for (i = 0; i < tp->func_tx_queue_num; i++) {
+		if (!tp->tx_ring[i].desc)
+			continue;
+
+		dma_free_coherent(&pdev->dev, RTASE_TX_RING_DESC_SIZE,
+				  tp->tx_ring[i].desc,
+				  tp->tx_ring[i].phy_addr);
+		tp->tx_ring[i].desc = NULL;
+	}
+
+	for (i = 0; i < tp->func_rx_queue_num; i++) {
+		if (!tp->rx_ring[i].desc)
+			continue;
+
+		dma_free_coherent(&pdev->dev, RTASE_RX_RING_DESC_SIZE,
+				  tp->rx_ring[i].desc,
+				  tp->rx_ring[i].phy_addr);
+		tp->rx_ring[i].desc = NULL;
+	}
+}
+
+static void rtase_mark_to_asic(union rx_desc *desc, u32 rx_buf_sz)
+{
+	u32 eor = le32_to_cpu(desc->desc_cmd.opts1) & RING_END;
+
+	desc->desc_status.opts2 = 0;
+	/* force memory writes to complete before releasing descriptor */
+	dma_wmb();
+	WRITE_ONCE(desc->desc_cmd.opts1,
+		   cpu_to_le32(DESC_OWN | eor | rx_buf_sz));
+}
+
+static void rtase_tx_desc_init(struct rtase_private *tp, u16 idx)
+{
+	struct rtase_ring *ring = &tp->tx_ring[idx];
+	struct tx_desc *desc;
+	u32 i;
+
+	memset(ring->desc, 0x0, RTASE_TX_RING_DESC_SIZE);
+	memset(ring->skbuff, 0x0, sizeof(ring->skbuff));
+	ring->cur_idx = 0;
+	ring->dirty_idx = 0;
+	ring->index = idx;
+
+	for (i = 0; i < NUM_DESC; i++) {
+		ring->mis.len[i] = 0;
+		if ((NUM_DESC - 1) == i) {
+			desc = ring->desc + sizeof(struct tx_desc) * i;
+			desc->opts1 = cpu_to_le32(RING_END);
+		}
+	}
+
+	ring->ring_handler = tx_handler;
+	if (idx < 4) {
+		ring->ivec = &tp->int_vector[idx];
+		list_add_tail(&ring->ring_entry,
+			      &tp->int_vector[idx].ring_list);
+	} else {
+		ring->ivec = &tp->int_vector[0];
+		list_add_tail(&ring->ring_entry, &tp->int_vector[0].ring_list);
+	}
+}
+
+static void rtase_map_to_asic(union rx_desc *desc, dma_addr_t mapping,
+			      u32 rx_buf_sz)
+{
+	desc->desc_cmd.addr = cpu_to_le64(mapping);
+	/* make sure the physical address has been updated */
+	wmb();
+	rtase_mark_to_asic(desc, rx_buf_sz);
+}
+
+static void rtase_make_unusable_by_asic(union rx_desc *desc)
+{
+	desc->desc_cmd.addr = cpu_to_le64(RTK_MAGIC_NUMBER);
+	desc->desc_cmd.opts1 &= ~cpu_to_le32(DESC_OWN | RSVD_MASK);
+}
+
+static int rtase_alloc_rx_skb(const struct rtase_ring *ring,
+			      struct sk_buff **p_sk_buff, union rx_desc *desc,
+			      dma_addr_t *rx_phy_addr, u8 in_intr)
+{
+	struct rtase_int_vector *ivec = ring->ivec;
+	const struct rtase_private *tp = ivec->tp;
+	struct sk_buff *skb = NULL;
+	struct page *page;
+	dma_addr_t mapping;
+	void *buf_addr;
+	int ret = 0;
+
+	page = page_pool_dev_alloc_pages(tp->page_pool);
+	if (!page) {
+		netdev_err(tp->dev, "failed to alloc page\n");
+		goto err_out;
+	}
+
+	buf_addr = page_address(page);
+	mapping = page_pool_get_dma_addr(page);
+
+	skb = build_skb(buf_addr, PAGE_SIZE);
+	if (!skb) {
+		page_pool_put_full_page(tp->page_pool, page, true);
+		netdev_err(tp->dev, "failed to build skb\n");
+		goto err_out;
+	}
+
+	*p_sk_buff = skb;
+	*rx_phy_addr = mapping;
+	rtase_map_to_asic(desc, mapping, tp->rx_buf_sz);
+
+	return ret;
+
+err_out:
+	if (skb)
+		dev_kfree_skb(skb);
+
+	ret = -ENOMEM;
+	rtase_make_unusable_by_asic(desc);
+
+	return ret;
+}
+
+static u32 rtase_rx_ring_fill(struct rtase_ring *ring, u32 ring_start,
+			      u32 ring_end, u8 in_intr)
+{
+	union rx_desc *desc_base = ring->desc;
+	u32 cur;
+
+	for (cur = ring_start; ring_end - cur > 0; cur++) {
+		u32 i = cur % NUM_DESC;
+		union rx_desc *desc = desc_base + i;
+		int ret;
+
+		if (ring->skbuff[i])
+			continue;
+
+		ret = rtase_alloc_rx_skb(ring, &ring->skbuff[i], desc,
+					 &ring->mis.data_phy_addr[i],
+					 in_intr);
+		if (ret)
+			break;
+	}
+
+	return cur - ring_start;
+}
+
+static void rtase_mark_as_last_descriptor(union rx_desc *desc)
+{
+	desc->desc_cmd.opts1 |= cpu_to_le32(RING_END);
+}
+
+static void rtase_rx_ring_clear(struct rtase_ring *ring)
+{
+	union rx_desc *desc;
+	u32 i;
+
+	for (i = 0; i < NUM_DESC; i++) {
+		desc = ring->desc + sizeof(union rx_desc) * i;
+
+		if (!ring->skbuff[i])
+			continue;
+
+		dev_kfree_skb(ring->skbuff[i]);
+
+		ring->skbuff[i] = NULL;
+
+		rtase_make_unusable_by_asic(desc);
+	}
+}
+
+static void rtase_rx_desc_init(struct rtase_private *tp, u16 idx)
+{
+	struct rtase_ring *ring = &tp->rx_ring[idx];
+	u16 i;
+
+	memset(ring->desc, 0x0, RTASE_RX_RING_DESC_SIZE);
+	memset(ring->skbuff, 0x0, sizeof(ring->skbuff));
+	ring->cur_idx = 0;
+	ring->dirty_idx = 0;
+	ring->index = idx;
+
+	for (i = 0; i < NUM_DESC; i++)
+		ring->mis.data_phy_addr[i] = 0;
+
+	ring->ring_handler = rx_handler;
+	ring->ivec = &tp->int_vector[idx];
+	list_add_tail(&ring->ring_entry, &tp->int_vector[idx].ring_list);
+}
+
+static void rtase_rx_clear(struct rtase_private *tp)
+{
+	u32 i;
+
+	for (i = 0; i < tp->func_rx_queue_num; i++)
+		rtase_rx_ring_clear(&tp->rx_ring[i]);
+
+	page_pool_destroy(tp->page_pool);
+	tp->page_pool = NULL;
+}
+
+static int rtase_init_ring(const struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	struct page_pool *page_pool;
+	struct page_pool_params pp_params = {
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+		.order = 0,
+		.pool_size = NUM_DESC * tp->func_rx_queue_num,
+		.nid = dev_to_node(&tp->pdev->dev),
+		.dev = &tp->pdev->dev,
+		.dma_dir = DMA_FROM_DEVICE,
+		.max_len = PAGE_SIZE,
+		.offset = 0,
+	};
+	u32 num;
+	u16 i;
+
+	page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(page_pool)) {
+		netdev_err(tp->dev, "failed to create page pool\n");
+		return -ENOMEM;
+	}
+
+	tp->page_pool = page_pool;
+
+	for (i = 0; i < tp->func_tx_queue_num; i++)
+		rtase_tx_desc_init(tp, i);
+
+	for (i = 0; i < tp->func_rx_queue_num; i++) {
+		rtase_rx_desc_init(tp, i);
+		num = rtase_rx_ring_fill(&tp->rx_ring[i], 0, NUM_DESC, 0);
+		if (num != NUM_DESC)
+			goto err_out;
+
+		rtase_mark_as_last_descriptor(tp->rx_ring[i].desc +
+					      sizeof(union rx_desc) *
+					      (NUM_DESC - 1));
+	}
+
+	return 0;
+
+err_out:
+	rtase_rx_clear(tp);
+	return -ENOMEM;
+}
+
 static void rtase_tally_counter_clear(const struct rtase_private *tp)
 {
 	u32 cmd = lower_32_bits(tp->tally_paddr);
@@ -138,6 +423,133 @@ static void rtase_tally_counter_clear(const struct rtase_private *tp)
 	rtase_w32(tp, RTASE_DTCCR0, cmd | COUNTER_RESET);
 }
 
+static void rtase_nic_enable(const struct net_device *dev)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+	u16 rcr = rtase_r16(tp, RTASE_RX_CONFIG_1);
+	u8 val;
+
+	/* PCIe PLA reload */
+	rtase_w16(tp, RTASE_RX_CONFIG_1, rcr & ~PCIE_RELOAD_En);
+	rtase_w16(tp, RTASE_RX_CONFIG_1, rcr | PCIE_RELOAD_En);
+
+	/* set PCIe TE & RE */
+	val = rtase_r8(tp, RTASE_CHIP_CMD);
+	rtase_w8(tp, RTASE_CHIP_CMD, val | TE | RE);
+
+	/* clear rxdv_gated_en */
+	val = rtase_r8(tp, RTASE_MISC);
+	rtase_w8(tp, RTASE_MISC, val & ~RX_DV_GATE_EN);
+}
+
+static void rtase_enable_hw_interrupt(const struct rtase_private *tp)
+{
+	const struct rtase_int_vector *ivec = &tp->int_vector[0];
+	u32 i;
+
+	rtase_w32(tp, ivec->imr_addr, ivec->imr);
+
+	for (i = 1; i < tp->int_nums; i++) {
+		ivec = &tp->int_vector[i];
+		rtase_w16(tp, ivec->imr_addr, ivec->imr);
+	}
+}
+
+static void rtase_hw_start(const struct net_device *dev)
+{
+	const struct rtase_private *tp = netdev_priv(dev);
+
+	rtase_nic_enable(dev);
+	rtase_enable_hw_interrupt(tp);
+}
+
+static int rtase_open(struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	struct rtase_int_vector *ivec = &tp->int_vector[0];
+	const struct pci_dev *pdev = tp->pdev;
+	int ret;
+	u16 i;
+
+	rtase_set_rxbufsize(tp);
+
+	ret = rtase_alloc_desc(tp);
+	if (ret)
+		goto err_free_all_allocated_mem;
+
+	ret = rtase_init_ring(dev);
+	if (ret)
+		goto err_free_all_allocated_mem;
+
+	INIT_DELAYED_WORK(&tp->task, NULL);
+
+	rtase_hw_config(dev);
+
+	if (tp->sw_flag & SWF_MSIX_ENABLED) {
+		ret = request_irq(ivec->irq, rtase_interrupt, 0,
+				  dev->name, ivec);
+
+		/* request other interrupts to handle multiqueue */
+		for (i = 1; i < tp->int_nums; i++) {
+			if (ret)
+				continue;
+
+			ivec = &tp->int_vector[i];
+			if (ivec->status != 1)
+				continue;
+
+			snprintf(ivec->name, sizeof(ivec->name), "%s_int%i", tp->dev->name, i);
+			ret = request_irq(ivec->irq, rtase_q_interrupt, 0,
+					  ivec->name, ivec);
+		}
+	} else if (tp->sw_flag & SWF_MSI_ENABLED) {
+		ret = request_irq(pdev->irq, rtase_interrupt, 0, dev->name,
+				  ivec);
+	} else {
+		ret = request_irq(pdev->irq, rtase_interrupt, IRQF_SHARED,
+				  dev->name, ivec);
+	}
+
+	if (ret != 0) {
+		netdev_err(dev, "can't request interrupt. Error: %d\n", ret);
+		goto err_free_all_allocated_mem;
+	}
+
+	rtase_hw_start(dev);
+
+	netif_carrier_on(dev);
+	netif_wake_queue(dev);
+
+	goto out;
+
+err_free_all_allocated_mem:
+	rtase_free_desc(tp);
+
+out:
+	return ret;
+}
+
+static int rtase_close(struct net_device *dev)
+{
+	struct rtase_private *tp = netdev_priv(dev);
+	const struct pci_dev *pdev = tp->pdev;
+	u32 i;
+
+	rtase_down(dev);
+
+	if (tp->sw_flag & SWF_MSIX_ENABLED) {
+		for (i = 0; i < tp->int_nums; i++)
+			free_irq(tp->int_vector[i].irq, &tp->int_vector[i]);
+
+	} else {
+		free_irq(pdev->irq, &tp->int_vector[0]);
+	}
+
+	rtase_free_desc(tp);
+
+	return 0;
+}
+
 static void rtase_enable_eem_write(const struct rtase_private *tp)
 {
 	u8 val;
@@ -170,6 +582,11 @@ static void rtase_rar_set(const struct rtase_private *tp, const u8 *addr)
 	rtase_w16(tp, RTASE_LBK_CTRL, LBK_ATLD | LBK_CLR);
 }
 
+static const struct net_device_ops rtase_netdev_ops = {
+	.ndo_open = rtase_open,
+	.ndo_stop = rtase_close,
+};
+
 static void rtase_get_mac_address(struct net_device *dev)
 {
 	struct rtase_private *tp = netdev_priv(dev);
@@ -193,6 +610,11 @@ static void rtase_get_mac_address(struct net_device *dev)
 	ether_addr_copy(dev->perm_addr, dev->dev_addr);
 }
 
+static void rtase_init_netdev_ops(struct net_device *dev)
+{
+	dev->netdev_ops = &rtase_netdev_ops;
+}
+
 static void rtase_reset_interrupt(struct pci_dev *pdev,
 				  const struct rtase_private *tp)
 {
-- 
2.34.1


