* [GIT PULL] Networking for v6.7-rc1
From: Jakub Kicinski @ 2023-11-09 21:00 UTC
To: torvalds; +Cc: kuba, davem, netdev, linux-kernel, pabeni
Hi Linus!
The following changes since commit ff269e2cd5adce4ae14f883fc9c8803bc43ee1e9:
Merge tag 'net-next-6.7-followup' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2023-11-01 16:33:20 -1000)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git net-6.7-rc1
for you to fetch changes up to 83b9dda8afa4e968d9cce253f390b01c0612a2a5:
net: ti: icss-iep: fix setting counter value (2023-11-09 13:15:40 +0100)
----------------------------------------------------------------
Including fixes from netfilter and bpf.
Current release - regressions:
- sched: fix SKB_NOT_DROPPED_YET splat under debug config
Current release - new code bugs:
- tcp: fix usec timestamps with TCP fastopen
- tcp_sigpool: fix some off by one bugs
- tcp: fix possible out-of-bounds reads in tcp_hash_fail()
- tcp: fix SYN option room calculation for TCP-AO
- bpf: fix compilation error without CGROUPS
- ptp:
  - ptp_read() should not release queue
  - fix tsevqs corruption
Previous releases - regressions:
- llc: verify mac len before reading mac header
Previous releases - always broken:
- bpf:
  - fix check_stack_write_fixed_off() to correctly spill imm
  - fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
  - check map->usercnt after timer->timer is assigned
- dsa: lan9303: consequently nested-lock physical MDIO
- dccp/tcp: call security_inet_conn_request() after setting IP addr
- tg3: fix the TX ring stall due to incorrect full ring handling
- phylink: initialize carrier state at creation
- ice: fix direction of VF rules in switchdev mode
Misc:
- fill in a bunch of missing MODULE_DESCRIPTION()s, more to come
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
----------------------------------------------------------------
Alex Pakhunov (1):
tg3: Fix the TX ring stall
Alexander Sverdlin (1):
net: dsa: lan9303: consequently nested-lock physical MDIO
Alexei Starovoitov (3):
Merge branch 'bpf-fix-incorrect-immediate-spill'
Merge branch 'relax-allowlist-for-open-coded-css_task-iter'
Merge branch 'bpf-fix-precision-tracking-for-bpf_alu-bpf_to_be-bpf_end'
Andrew Lunn (3):
net: phy: fill in missing MODULE_DESCRIPTION()s
net: mdio: fill in missing MODULE_DESCRIPTION()s
net: ethtool: Fix documentation of ethtool_sprintf()
Andrii Nakryiko (1):
selftests/bpf: fix test_maps' use of bpf_map_create_opts
Aniruddha Paul (1):
ice: Fix VF-VF filter rules in switchdev mode
Björn Töpel (1):
selftests/bpf: Fix broken build where char is unsigned
Chuyi Zhou (5):
bpf: Relax allowlist for css_task iter
selftests/bpf: Add tests for css_task iter combining with cgroup iter
selftests/bpf: Add test for using css_task iter in sleepable progs
bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg
selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly
D. Wythe (3):
net/smc: fix dangling sock under state SMC_APPFINCLOSEWAIT
net/smc: allow cdc msg send rather than drop it with NULL sndbuf_desc
net/smc: put sk reference if close work was canceled
Dan Carpenter (2):
hsr: Prevent use after free in prp_create_tagged_frame()
net/tcp_sigpool: Fix some off by one bugs
Dave Ertman (1):
ice: Fix SRIOV LAG disable on non-compliant aggregate
Dave Marchevsky (2):
bpf: Add __bpf_kfunc_{start,end}_defs macros
bpf: Add __bpf_hook_{start,end} macros
David Howells (1):
rxrpc: Fix two connection reaping bugs
David S. Miller (2):
Merge branch 'smc-fixes'
Merge branch 'vsock-fixes'
Diogo Ivo (1):
net: ti: icss-iep: fix setting counter value
Edward Adam Davis (2):
ptp: ptp_read should not release queue
ptp: fix corrupted list in ptp_open
Eric Dumazet (5):
inet: shrink struct flowi_common
tcp: fix fastopen code vs usec TS
net/tcp: fix possible out-of-bounds reads in tcp_hash_fail()
idpf: fix potential use-after-free in idpf_tso()
net_sched: sch_fq: better validate TCA_FQ_WEIGHTS and TCA_FQ_PRIOMAP
Filippo Storniolo (4):
vsock/virtio: remove socket from connected/bound list on shutdown
test/vsock fix: add missing check on socket creation
test/vsock: refactor vsock_accept
test/vsock: add dobule bind connect test
Florian Westphal (3):
netfilter: add missing module descriptions
ipvs: add missing module descriptions
netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses
Furong Xu (1):
net: stmmac: xgmac: Enable support for multiple Flexible PPS outputs
Geetha sowjanya (1):
octeontx2-pf: Free pending and dropped SQEs
George Shuklin (1):
tg3: power down device only on SYSTEM_POWER_OFF
Gerd Bayer (1):
net/smc: fix documentation of buffer sizes
Hangbin Liu (1):
selftests: pmtu.sh: fix result checking
Hao Sun (2):
bpf: Fix check_stack_write_fixed_off() to correctly spill imm
selftests/bpf: Add test for immediate spilled to stack
Heiner Kallweit (1):
r8169: respect userspace disabling IFF_MULTICAST
Hou Tao (1):
bpf: Check map->usercnt after timer->timer is assigned
Ivan Vecera (2):
i40e: Do not call devlink_port_type_clear()
i40e: Fix devlink port unregistering
Jakub Kicinski (10):
Merge branch 'net-sched-fill-in-missing-module_descriptions-for-net-sched'
Merge branch 'add-missing-module_descriptions'
tools: ynl-gen: don't touch the output file if content is the same
netlink: fill in missing MODULE_DESCRIPTION()
nfsd: regenerate user space parsers after ynl-gen changes
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Merge tag 'nf-23-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
net: kcm: fill in MODULE_DESCRIPTION()
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jamal Hadi Salim (1):
net, sched: Fix SKB_NOT_DROPPED_YET splat under debug config
Jian Shen (1):
net: page_pool: add missing free_percpu when page_pool_init fail
Jiri Pirko (1):
netlink: specs: devlink: add forgotten port function caps enum values
Klaus Kudielka (1):
net: phylink: initialize carrier state at creation
Kuan-Wei Chiu (1):
s390/qeth: Fix typo 'weed' in comment
Kuniyuki Iwashima (3):
dccp: Call security_inet_conn_request() after setting IPv4 addresses.
dccp/tcp: Call security_inet_conn_request() after setting IPv6 addresses.
tcp: Fix SYN option room calculation for TCP-AO.
Linus Walleij (1):
net: xscale: Drop unused PHY number
Maciej Żenczykowski (1):
netfilter: xt_recent: fix (increase) ipv6 literal buffer length
Manu Bretelle (1):
selftests/bpf: fix test_bpffs
Marcin Szycik (1):
ice: Fix VF-VF direction matching in drop rule in switchdev
Martin KaFai Lau (1):
Merge branch 'Let BPF verifier consider {task,cgroup} is trusted in bpf_iter_reg'
Matthieu Baerts (1):
bpf: fix compilation error without CGROUPS
Michal Schmidt (1):
ice: lag: in RCU, use atomic allocation
Nathan Chancellor (1):
tcp: Fix -Wc23-extensions in tcp_options_write()
NeilBrown (1):
Fix termination state for idr_for_each_entry_ul()
Pablo Neira Ayuso (1):
netfilter: nf_tables: remove catchall element in GC sync path
Paolo Abeni (1):
Merge branch 'dccp-tcp-relocate-security_inet_conn_request'
Patrick Thompson (1):
net: r8169: Disable multicast filter for RTL8168H and RTL8107E
Philipp Stanner (1):
drivers/net/ppp: use standard array-copy-function
Ratheesh Kannoth (2):
octeontx2-pf: Fix error codes
octeontx2-pf: Fix holes in error code
Ronald Wahl (1):
net: ethernet: ti: am65-cpsw: rx_pause/tx_pause controls wrong direction
Shigeru Yoshida (2):
tipc: Change nla_policy for bearer-related names to NLA_NUL_STRING
virtio/vsock: Fix uninit-value in virtio_transport_recv_pkt()
Shung-Hsi Yu (2):
bpf: Fix precision tracking for BPF_ALU | BPF_TO_BE | BPF_END
selftests/bpf: precision tracking test for BPF_NEG and BPF_END
Victor Nogueira (3):
net: sched: Fill in MODULE_DESCRIPTION for act_gate
net: sched: Fill in missing MODULE_DESCRIPTION for classifiers
net: sched: Fill in missing MODULE_DESCRIPTION for qdiscs
Vlad Buslov (1):
net/sched: act_ct: Always fill offloading tuple iifidx
Vladimir Oltean (1):
net: enetc: shorten enetc_setup_xdp_prog() error message to fit NETLINK_MAX_FMTMSG_LEN
Willem de Bruijn (1):
llc: verify mac len before reading mac header
Documentation/bpf/kfuncs.rst | 6 +-
Documentation/netlink/specs/devlink.yaml | 4 +
Documentation/networking/smc-sysctl.rst | 6 +-
drivers/net/dsa/lan9303_mdio.c | 4 +-
drivers/net/ethernet/broadcom/tg3.c | 56 +++++++---
drivers/net/ethernet/freescale/enetc/enetc.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_devlink.c | 1 -
drivers/net/ethernet/intel/i40e/i40e_main.c | 10 +-
drivers/net/ethernet/intel/ice/ice_lag.c | 18 ++--
drivers/net/ethernet/intel/ice/ice_tc_lib.c | 114 +++++++++++++++-----
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 6 +-
.../ethernet/marvell/octeontx2/nic/otx2_common.c | 15 +--
.../ethernet/marvell/octeontx2/nic/otx2_common.h | 1 +
.../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 81 ++++++++------
.../ethernet/marvell/octeontx2/nic/otx2_struct.h | 34 +++---
.../net/ethernet/marvell/octeontx2/nic/otx2_txrx.c | 42 ++++++++
drivers/net/ethernet/realtek/r8169_main.c | 6 +-
drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h | 2 +-
.../net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 14 ++-
drivers/net/ethernet/ti/am65-cpsw-nuss.c | 4 +-
drivers/net/ethernet/ti/icssg/icss_iep.c | 2 +-
drivers/net/ethernet/xscale/ixp4xx_eth.c | 3 +-
drivers/net/mdio/acpi_mdio.c | 1 +
drivers/net/mdio/fwnode_mdio.c | 1 +
drivers/net/mdio/mdio-aspeed.c | 1 +
drivers/net/mdio/mdio-bitbang.c | 1 +
drivers/net/mdio/of_mdio.c | 1 +
drivers/net/phy/bcm-phy-ptp.c | 1 +
drivers/net/phy/bcm87xx.c | 1 +
drivers/net/phy/phylink.c | 2 +
drivers/net/phy/sfp.c | 1 +
drivers/net/ppp/ppp_generic.c | 4 +-
drivers/ptp/ptp_chardev.c | 23 ++--
drivers/ptp/ptp_clock.c | 8 +-
drivers/ptp/ptp_private.h | 1 +
drivers/s390/net/qeth_core_main.c | 2 +-
include/linux/btf.h | 11 ++
include/linux/ethtool.h | 4 +-
include/linux/idr.h | 6 +-
include/linux/tcp.h | 2 +-
include/net/flow.h | 2 +-
include/net/netfilter/nf_conntrack_act_ct.h | 34 +++---
include/net/tcp_ao.h | 13 +--
include/uapi/linux/nfsd_netlink.h | 6 +-
kernel/bpf/bpf_iter.c | 6 +-
kernel/bpf/cgroup_iter.c | 8 +-
kernel/bpf/cpumask.c | 6 +-
kernel/bpf/helpers.c | 39 ++++---
kernel/bpf/map_iter.c | 6 +-
kernel/bpf/task_iter.c | 24 ++---
kernel/bpf/verifier.c | 33 ++++--
kernel/cgroup/rstat.c | 9 +-
kernel/trace/bpf_trace.c | 6 +-
net/bpf/test_run.c | 7 +-
net/bridge/netfilter/ebtable_broute.c | 1 +
net/bridge/netfilter/ebtable_filter.c | 1 +
net/bridge/netfilter/ebtable_nat.c | 1 +
net/bridge/netfilter/ebtables.c | 1 +
net/bridge/netfilter/nf_conntrack_bridge.c | 1 +
net/core/filter.c | 13 +--
net/core/page_pool.c | 6 +-
net/core/xdp.c | 6 +-
net/dccp/ipv4.c | 6 +-
net/dccp/ipv6.c | 6 +-
net/devlink/netlink_gen.c | 2 +-
net/hsr/hsr_forward.c | 4 +-
net/ipv4/fou_bpf.c | 6 +-
net/ipv4/netfilter/iptable_nat.c | 1 +
net/ipv4/netfilter/iptable_raw.c | 1 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 1 +
net/ipv4/netfilter/nf_reject_ipv4.c | 1 +
net/ipv4/syncookies.c | 2 +-
net/ipv4/tcp_ao.c | 5 +-
net/ipv4/tcp_input.c | 7 +-
net/ipv4/tcp_output.c | 72 +++++++------
net/ipv4/tcp_sigpool.c | 8 +-
net/ipv6/netfilter/ip6table_nat.c | 1 +
net/ipv6/netfilter/ip6table_raw.c | 1 +
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 1 +
net/ipv6/netfilter/nf_reject_ipv6.c | 1 +
net/ipv6/syncookies.c | 7 +-
net/kcm/kcmsock.c | 1 +
net/llc/llc_input.c | 10 +-
net/llc/llc_s_ac.c | 3 +
net/llc/llc_station.c | 3 +
net/netfilter/ipvs/ip_vs_core.c | 1 +
net/netfilter/ipvs/ip_vs_dh.c | 1 +
net/netfilter/ipvs/ip_vs_fo.c | 1 +
net/netfilter/ipvs/ip_vs_ftp.c | 1 +
net/netfilter/ipvs/ip_vs_lblc.c | 1 +
net/netfilter/ipvs/ip_vs_lblcr.c | 1 +
net/netfilter/ipvs/ip_vs_lc.c | 1 +
net/netfilter/ipvs/ip_vs_nq.c | 1 +
net/netfilter/ipvs/ip_vs_ovf.c | 1 +
net/netfilter/ipvs/ip_vs_pe_sip.c | 1 +
net/netfilter/ipvs/ip_vs_rr.c | 1 +
net/netfilter/ipvs/ip_vs_sed.c | 1 +
net/netfilter/ipvs/ip_vs_sh.c | 1 +
net/netfilter/ipvs/ip_vs_twos.c | 1 +
net/netfilter/ipvs/ip_vs_wlc.c | 1 +
net/netfilter/ipvs/ip_vs_wrr.c | 1 +
net/netfilter/nf_conntrack_bpf.c | 6 +-
net/netfilter/nf_conntrack_broadcast.c | 1 +
net/netfilter/nf_conntrack_netlink.c | 1 +
net/netfilter/nf_conntrack_proto.c | 1 +
net/netfilter/nf_nat_bpf.c | 6 +-
net/netfilter/nf_nat_core.c | 1 +
net/netfilter/nf_nat_redirect.c | 27 ++++-
net/netfilter/nf_tables_api.c | 23 +++-
net/netfilter/nfnetlink_osf.c | 1 +
net/netfilter/nft_chain_nat.c | 1 +
net/netfilter/nft_fib.c | 1 +
net/netfilter/nft_fwd_netdev.c | 1 +
net/netfilter/xt_recent.c | 2 +-
net/netlink/diag.c | 1 +
net/openvswitch/conntrack.c | 2 +-
net/rxrpc/conn_object.c | 2 +-
net/rxrpc/local_object.c | 2 +-
net/sched/act_api.c | 2 +-
net/sched/act_ct.c | 15 ++-
net/sched/act_gate.c | 1 +
net/sched/cls_api.c | 9 +-
net/sched/cls_basic.c | 1 +
net/sched/cls_cgroup.c | 1 +
net/sched/cls_fw.c | 1 +
net/sched/cls_route.c | 1 +
net/sched/cls_u32.c | 1 +
net/sched/sch_cbs.c | 1 +
net/sched/sch_choke.c | 1 +
net/sched/sch_drr.c | 1 +
net/sched/sch_etf.c | 1 +
net/sched/sch_ets.c | 1 +
net/sched/sch_fifo.c | 1 +
net/sched/sch_fq.c | 10 +-
net/sched/sch_gred.c | 1 +
net/sched/sch_hfsc.c | 1 +
net/sched/sch_htb.c | 1 +
net/sched/sch_ingress.c | 1 +
net/sched/sch_mqprio.c | 1 +
net/sched/sch_mqprio_lib.c | 1 +
net/sched/sch_multiq.c | 1 +
net/sched/sch_netem.c | 1 +
net/sched/sch_plug.c | 1 +
net/sched/sch_prio.c | 1 +
net/sched/sch_qfq.c | 1 +
net/sched/sch_red.c | 1 +
net/sched/sch_sfq.c | 1 +
net/sched/sch_skbprio.c | 1 +
net/sched/sch_taprio.c | 1 +
net/sched/sch_tbf.c | 1 +
net/sched/sch_teql.c | 1 +
net/smc/af_smc.c | 4 +-
net/smc/smc.h | 5 +
net/smc/smc_cdc.c | 11 +-
net/smc/smc_close.c | 5 +-
net/socket.c | 8 +-
net/tipc/netlink.c | 4 +-
net/vmw_vsock/virtio_transport_common.c | 18 +++-
net/xfrm/xfrm_interface_bpf.c | 6 +-
tools/net/ynl/generated/devlink-user.c | 2 +
tools/net/ynl/generated/nfsd-user.c | 120 +++++++++++++++++++--
tools/net/ynl/generated/nfsd-user.h | 44 +++++++-
tools/net/ynl/ynl-gen-c.py | 7 +-
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 6 +-
.../selftests/bpf/map_tests/map_percpu_stats.c | 20 +---
.../testing/selftests/bpf/prog_tests/cgroup_iter.c | 33 ++++++
tools/testing/selftests/bpf/prog_tests/iters.c | 1 +
.../testing/selftests/bpf/prog_tests/test_bpffs.c | 11 +-
tools/testing/selftests/bpf/prog_tests/verifier.c | 2 +
tools/testing/selftests/bpf/progs/iters_css_task.c | 55 ++++++++++
.../selftests/bpf/progs/iters_task_failure.c | 4 +-
.../selftests/bpf/progs/verifier_precision.c | 93 ++++++++++++++++
tools/testing/selftests/bpf/verifier/bpf_st_mem.c | 32 ++++++
tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +-
tools/testing/selftests/net/pmtu.sh | 2 +-
tools/testing/vsock/util.c | 87 ++++++++++++---
tools/testing/vsock/util.h | 3 +
tools/testing/vsock/vsock_test.c | 50 +++++++++
178 files changed, 1242 insertions(+), 434 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/verifier_precision.c
* Re: [REPORT] BPF: Reproducible triggering of BUG() from userspace PoC
From: Jiri Olsa @ 2023-11-09 21:39 UTC
To: Lee Jones
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, x86, bpf,
linux-kernel, netdev
In-Reply-To: <20231108154626.GB8909@google.com>
On Wed, Nov 08, 2023 at 03:46:26PM +0000, Lee Jones wrote:
> Good afternoon,
>
> After coming across a recent Syzkaller report [0] I thought I'd take
> some time to firstly reproduce the issue, then see if there was a
> trivial way to mitigate it. The report suggests that a BUG() in
> prog_array_map_poke_run() [1] can be trivially and reliably triggered
> from userspace using the PoC provided [2].
>
> ret = bpf_arch_text_poke(poke->tailcall_bypass,
> BPF_MOD_JUMP,
> old_bypass_addr,
> poke->bypass_addr);
> BUG_ON(ret < 0 && ret != -EINVAL);
>
> Indeed the PoC does seem to be able to consistently trigger the BUG(),
> not only on the reported kernel (v6.1), but also on linux-next. I went
> to the trouble of checking LORE, but failed to find any patches which
> may be attempting to fix this.
>
> kernel BUG at kernel/bpf/arraymap.c:1094!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 5 PID: 45 Comm: kworker/5:0 Not tainted 6.6.0-rc3-next-20230929-dirty #74
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Workqueue: events prog_array_map_clear_deferred
> RIP: 0010:prog_array_map_poke_run+0x6b4/0x6d0
> Code: ff 0f 0b e8 1e 27 e1 ff 48 c7 c7 60 80 93 85 48 c7 c6 00 7f 93 85 48 c7 c2 bb c2 39 86 b9 45 04 00 00 45 89 f8 e8 9c 890
> RSP: 0018:ffffc9000036fb50 EFLAGS: 00010246
> RAX: 0000000000000044 RBX: ffff88811f337490 RCX: 63af48a1314f9900
> RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000
> RBP: ffffc9000036fbe8 R08: ffffffff815c23c5 R09: 1ffff11084c14eba
> R10: dfffe91084c14ebc R11: ffffed1084c14ebb R12: ffff888116517800
> R13: dffffc0000000000 R14: ffff888125a1a400 R15: 00000000fffffff0
> FS: 0000000000000000(0000) GS:ffff888426080000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000004ab678 CR3: 0000000122ac4000 CR4: 0000000000350eb0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __die_body+0x92/0xf0
> ? die+0xa2/0xe0
> ? do_trap+0x12f/0x370
> ? handle_invalid_op+0xa6/0x140
> ? handle_invalid_op+0xdf/0x140
> ? prog_array_map_poke_run+0x6b4/0x6d0
> ? prog_array_map_poke_run+0x6b4/0x6d0
> ? exc_invalid_op+0x32/0x50
> ? asm_exc_invalid_op+0x1b/0x20
> ? __wake_up_klogd+0xd5/0x110
> ? prog_array_map_poke_run+0x6b4/0x6d0
> ? bpf_prog_6781ebc2dae4bad9+0xb/0x53
> fd_array_map_delete_elem+0x152/0x250
> prog_array_map_clear_deferred+0xf6/0x210
> ? __bpf_array_map_seq_show+0xa40/0xa40
> ? kick_pool+0x164/0x350
> ? process_one_work+0x57a/0xd00
> process_one_work+0x5e4/0xd00
> worker_thread+0x9cf/0xea0
> kthread+0x2b4/0x350
> ? pr_cont_work+0x580/0x580
> ? kthread_blkcg+0xd0/0xd0
> ret_from_fork+0x4a/0x80
> ? kthread_blkcg+0xd0/0xd0
> ret_from_fork_asm+0x11/0x20
> </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
>
> However, with my very limited BPF subsystem knowledge I was unable to
> trivially fix the issue. Hopefully some knowledgable person would be
> kind enough to provide me with some pointers.
>
> bpf_arch_text_poke() seems to be returning -EBUSY due to a negative
> memcmp() result from [3].
>
> ret = -EBUSY;
> mutex_lock(&text_mutex);
> if (memcmp(ip, old_insn, X86_PATCH_SIZE)) {
> goto out;
> [...]
>
> When spitting out the memory at those locations, this is the result:
>
> ip: e9 06 00 00 00
> old_insn: 0f 1f 44 00 00
> nop_insn: 0f 1f 44 00 00
>
> As you can see, the information stored in 'ip' does not match that of
> the data stored in 'old_insn', causing bpf_arch_text_poke() to return
> early with the error -EBUSY, suggesting that the data pointed to by
> 'old_insn', and by extension 'prog' should have been changed when
> emit_call()ing, to the value of 'ip', but wasn't.
hi,
thanks for the report, I can reproduce it easily with [2]
AFAICS the previous update fails because we use bpf_arch_text_poke,
which can't find the poke->tailcall_bypass address as a bpf program symbol
and fails with -EINVAL
the following update then fails to find the expected jmp/nop because the site
was never updated.. I think we should use __bpf_arch_text_poke like we do in
bpf_tail_call_direct_fixup and skip the bpf symbol check
with the patch below I can't reproduce the issue anymore, I'll do some more
checking though
jirka
---
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8c10d9abc239..35c2988caf29 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -391,8 +391,8 @@ static int emit_jump(u8 **pprog, void *func, void *ip)
return emit_patch(pprog, func, ip, 0xE9);
}
-static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
- void *old_addr, void *new_addr)
+int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
+ void *old_addr, void *new_addr)
{
const u8 *nop_insn = x86_nops[5];
u8 old_insn[X86_PATCH_SIZE];
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eb84caf133df..0d7b8311fada 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3172,6 +3172,8 @@ enum bpf_text_poke_type {
int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
void *addr1, void *addr2);
+int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
+ void *old_addr, void *new_addr);
void *bpf_arch_text_copy(void *dst, void *src, size_t len);
int bpf_arch_text_invalidate(void *dst, size_t len);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..4ab5864746ce 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -1073,33 +1073,33 @@ static void prog_array_map_poke_run(struct bpf_map *map, u32 key,
new_addr = new ? (u8 *)new->bpf_func + poke->adj_off : NULL;
if (new) {
- ret = bpf_arch_text_poke(poke->tailcall_target,
+ ret = __bpf_arch_text_poke(poke->tailcall_target,
BPF_MOD_JUMP,
old_addr, new_addr);
BUG_ON(ret < 0 && ret != -EINVAL);
if (!old) {
- ret = bpf_arch_text_poke(poke->tailcall_bypass,
+ ret = __bpf_arch_text_poke(poke->tailcall_bypass,
BPF_MOD_JUMP,
poke->bypass_addr,
NULL);
- BUG_ON(ret < 0 && ret != -EINVAL);
+ BUG_ON(ret < 0);
}
} else {
- ret = bpf_arch_text_poke(poke->tailcall_bypass,
+ ret = __bpf_arch_text_poke(poke->tailcall_bypass,
BPF_MOD_JUMP,
old_bypass_addr,
poke->bypass_addr);
- BUG_ON(ret < 0 && ret != -EINVAL);
+ BUG_ON(ret < 0);
/* let other CPUs finish the execution of program
* so that it will not possible to expose them
* to invalid nop, stack unwind, nop state
*/
if (!ret)
synchronize_rcu();
- ret = bpf_arch_text_poke(poke->tailcall_target,
+ ret = __bpf_arch_text_poke(poke->tailcall_target,
BPF_MOD_JUMP,
old_addr, NULL);
- BUG_ON(ret < 0 && ret != -EINVAL);
+ BUG_ON(ret < 0);
}
}
}
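To illustrate the failure sequence described above, here is a minimal user-space sketch, not kernel code. It assumes a simplified model: `model_checked_poke()` stands in for bpf_arch_text_poke() with its symbol check, `model_text_poke()` stands in for the compare-then-patch step, and `replay_updates()` is a hypothetical helper that replays "install a program, then remove it" on one bypass site. All three names are made up for this example; only the byte patterns come from the report.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PATCH_SIZE 5 /* stand-in for X86_PATCH_SIZE */

/* 5-byte x86 NOP, same bytes as old_insn/nop_insn in the report. */
static const uint8_t nop5[PATCH_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "jmp rel32": opcode 0xE9 followed by a little-endian displacement. */
void encode_jmp(uint8_t insn[PATCH_SIZE], int32_t rel)
{
	uint32_t u = (uint32_t)rel;

	insn[0] = 0xe9;
	insn[1] = (uint8_t)(u & 0xff);
	insn[2] = (uint8_t)((u >> 8) & 0xff);
	insn[3] = (uint8_t)((u >> 16) & 0xff);
	insn[4] = (uint8_t)((u >> 24) & 0xff);
}

/* Hypothetical model: patch 'ip' only if it currently holds 'old_insn'. */
int model_text_poke(uint8_t *ip, const uint8_t *old_insn,
		    const uint8_t *new_insn)
{
	if (memcmp(ip, old_insn, PATCH_SIZE))
		return -EBUSY; /* site does not hold the expected bytes */
	memcpy(ip, new_insn, PATCH_SIZE);
	return 0;
}

/* Hypothetical model of the checked variant: bail out before patching. */
int model_checked_poke(int is_known_bpf_text, uint8_t *ip,
		       const uint8_t *old_insn, const uint8_t *new_insn)
{
	if (!is_known_bpf_text)
		return -EINVAL; /* the site is left untouched */
	return model_text_poke(ip, old_insn, new_insn);
}

/*
 * Replay two updates on one bypass site and return the result of the
 * second one. The first update (jmp -> nop) may be rejected with -EINVAL,
 * which the original BUG_ON() condition tolerates.
 */
int replay_updates(int first_update_passes_check)
{
	uint8_t site[PATCH_SIZE], jmp[PATCH_SIZE];
	int ret;

	encode_jmp(jmp, 6);            /* e9 06 00 00 00, as dumped for 'ip' */
	memcpy(site, jmp, PATCH_SIZE); /* bypass jump is initially in place */

	ret = model_checked_poke(first_update_passes_check, site, jmp, nop5);
	if (ret < 0 && ret != -EINVAL)
		return ret;

	/* Second update expects the nop the first one should have written. */
	return model_text_poke(site, nop5, jmp);
}
```

In this model, `replay_updates(1)` succeeds, while `replay_updates(0)` returns -EBUSY because the silently skipped first update leaves the jmp in place. That matches the `ip` versus `old_insn` mismatch in the report, under the assumptions stated above.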
* Re: [PATCH net] tty: Fix uninit-value access in ppp_sync_receive()
From: Simon Horman @ 2023-11-09 21:48 UTC
To: Shigeru Yoshida
Cc: davem, edumazet, kuba, pabeni, linux-ppp, netdev, linux-kernel
In-Reply-To: <20231108154420.1474853-1-syoshida@redhat.com>
On Thu, Nov 09, 2023 at 12:44:20AM +0900, Shigeru Yoshida wrote:
> KMSAN reported the following uninit-value access issue:
>
> =====================================================
> BUG: KMSAN: uninit-value in ppp_sync_input drivers/net/ppp/ppp_synctty.c:690 [inline]
> BUG: KMSAN: uninit-value in ppp_sync_receive+0xdc9/0xe70 drivers/net/ppp/ppp_synctty.c:334
> ppp_sync_input drivers/net/ppp/ppp_synctty.c:690 [inline]
> ppp_sync_receive+0xdc9/0xe70 drivers/net/ppp/ppp_synctty.c:334
> tiocsti+0x328/0x450 drivers/tty/tty_io.c:2295
> tty_ioctl+0x808/0x1920 drivers/tty/tty_io.c:2694
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:871 [inline]
> __se_sys_ioctl+0x211/0x400 fs/ioctl.c:857
> __x64_sys_ioctl+0x97/0xe0 fs/ioctl.c:857
> do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
> Uninit was created at:
> __alloc_pages+0x75d/0xe80 mm/page_alloc.c:4591
> __alloc_pages_node include/linux/gfp.h:238 [inline]
> alloc_pages_node include/linux/gfp.h:261 [inline]
> __page_frag_cache_refill+0x9a/0x2c0 mm/page_alloc.c:4691
> page_frag_alloc_align+0x91/0x5d0 mm/page_alloc.c:4722
> page_frag_alloc include/linux/gfp.h:322 [inline]
> __netdev_alloc_skb+0x215/0x6d0 net/core/skbuff.c:728
> netdev_alloc_skb include/linux/skbuff.h:3225 [inline]
> dev_alloc_skb include/linux/skbuff.h:3238 [inline]
> ppp_sync_input drivers/net/ppp/ppp_synctty.c:669 [inline]
> ppp_sync_receive+0x237/0xe70 drivers/net/ppp/ppp_synctty.c:334
> tiocsti+0x328/0x450 drivers/tty/tty_io.c:2295
> tty_ioctl+0x808/0x1920 drivers/tty/tty_io.c:2694
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:871 [inline]
> __se_sys_ioctl+0x211/0x400 fs/ioctl.c:857
> __x64_sys_ioctl+0x97/0xe0 fs/ioctl.c:857
> do_syscall_x64 arch/x86/entry/common.c:51 [inline]
> do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
> CPU: 0 PID: 12950 Comm: syz-executor.1 Not tainted 6.6.0-14500-g1c41041124bd #10
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
> =====================================================
>
> ppp_sync_input() checks the first 2 bytes of the data are PPP_ALLSTATIONS
> and PPP_UI. However, if the data length is 1 and the first byte is
> PPP_ALLSTATIONS, an access to an uninitialized value occurs when checking
> PPP_UI. This patch resolves this issue by checking the data length.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
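For readers following along, the length check described in the commit message can be sketched in user space as below. This is an illustration only, not the submitted patch: `has_addr_ctrl_header()` is a hypothetical helper name, and the two constants are defined locally with the values they have in the PPP headers (0xff and 0x03).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPP_ALLSTATIONS 0xff /* PPP address field: all stations */
#define PPP_UI          0x03 /* PPP control field: unnumbered information */

/*
 * Return true only if the buffer holds at least two bytes and they are
 * the address/control pair. Checking the length first means buf[1] is
 * never read when only one byte was received.
 */
bool has_addr_ctrl_header(const uint8_t *buf, size_t count)
{
	if (count < 2)
		return false;
	return buf[0] == PPP_ALLSTATIONS && buf[1] == PPP_UI;
}
```

With a one-byte input of 0xff the helper returns false without touching the second byte, which is the case KMSAN flagged.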
* [RFC PATCH 0/8] Add support for 10G Ethernet SerDes on MT7988
From: Daniel Golle @ 2023-11-09 21:50 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
This series aims to add support for GMAC2 and GMAC3 of the MediaTek MT7988 SoC.
While the vendor SDK stuffs all this into their Ethernet driver, I've tried to
separate things into a PHY driver, a PCS driver as well as changes to the
existing Ethernet and LynxI PCS drivers.
                  +----------------+
+--------------+  |  USXGMII PCS   |   +------------------+
| Ethernet MAC +--+-------------+  +---+ PEXTP SerDes PHY |
+--------------+  |  SGMII PCS  |  |   +------------------+
                  +-------------+--+
Altogether this allows using GMAC2 and GMAC3 with all possible interface modes,
including in-band-status if needed.
Daniel Golle (8):
dt-bindings: phy: mediatek,xfi-pextp: add new bindings
phy: add driver for MediaTek pextp 10GE SerDes PHY
net: pcs: pcs-mtk-lynxi: use 2500Base-X without AN
net: pcs: pcs-mtk-lynxi: allow calling with NULL advertising
dt-bindings: net: pcs: add bindings for MediaTek USXGMII PCS
net: pcs: add driver for MediaTek USXGMII PCS
dt-bindings: net: mediatek,net: fix and complete mt7988-eth binding
net: ethernet: mtk_eth_soc: add paths and SerDes modes for MT7988
.../devicetree/bindings/net/mediatek,net.yaml | 171 ++++-
.../bindings/net/pcs/mediatek,usxgmii.yaml | 105 +++
.../bindings/phy/mediatek,xfi-pextp.yaml | 71 ++
MAINTAINERS | 3 +
drivers/net/ethernet/mediatek/Kconfig | 17 +
drivers/net/ethernet/mediatek/mtk_eth_path.c | 122 +++-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 178 ++++-
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 105 ++-
drivers/net/pcs/Kconfig | 10 +
drivers/net/pcs/Makefile | 1 +
drivers/net/pcs/pcs-mtk-lynxi.c | 38 +-
drivers/net/pcs/pcs-mtk-usxgmii.c | 688 ++++++++++++++++++
drivers/phy/mediatek/Kconfig | 11 +
drivers/phy/mediatek/Makefile | 1 +
drivers/phy/mediatek/phy-mtk-pextp.c | 355 +++++++++
include/linux/pcs/pcs-mtk-usxgmii.h | 18 +
16 files changed, 1813 insertions(+), 81 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/pcs/mediatek,usxgmii.yaml
create mode 100644 Documentation/devicetree/bindings/phy/mediatek,xfi-pextp.yaml
create mode 100644 drivers/net/pcs/pcs-mtk-usxgmii.c
create mode 100644 drivers/phy/mediatek/phy-mtk-pextp.c
create mode 100644 include/linux/pcs/pcs-mtk-usxgmii.h
--
2.42.1
* [RFC PATCH 1/8] dt-bindings: phy: mediatek,xfi-pextp: add new bindings
From: Daniel Golle @ 2023-11-09 21:50 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Add bindings for the MediaTek PEXTP Ethernet SerDes PHY found in the
MediaTek MT7988 SoC, which can operate in several interface modes:
* USXGMII
* 10GBase-R
* 5GBase-R
* 2500Base-X
* 1000Base-X
* Cisco SGMII (MAC side)
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
.../bindings/phy/mediatek,xfi-pextp.yaml | 71 +++++++++++++++++++
1 file changed, 71 insertions(+)
create mode 100644 Documentation/devicetree/bindings/phy/mediatek,xfi-pextp.yaml
diff --git a/Documentation/devicetree/bindings/phy/mediatek,xfi-pextp.yaml b/Documentation/devicetree/bindings/phy/mediatek,xfi-pextp.yaml
new file mode 100644
index 0000000000000..948d5031af1e3
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/mediatek,xfi-pextp.yaml
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/phy/mediatek,xfi-pextp.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek XFI PEXTP SerDes PHY
+
+maintainers:
+  - Daniel Golle <daniel@makrotopia.org>
+
+description: |
+  The MediaTek XFI PEXTP SerDes PHY provides the physical SerDes lanes
+  used by the MediaTek USXGMII PCS.
+
+properties:
+  $nodename:
+    pattern: "^phy@[0-9a-f]+$"
+
+  compatible:
+    const: mediatek,mt7988-xfi-pextp
+
+  reg:
+    maxItems: 1
+
+  clocks:
+    items:
+      - description: XFI PHY clock
+
+  resets:
+    items:
+      - description: PEXTP reset
+
+  mediatek,usxgmii-performance-errata:
+    $ref: /schemas/types.yaml#/definitions/flag
+    description:
+      USXGMII0 on MT7988 suffers from a performance problem in 10GBase-R
+      mode which needs a work-around in the driver. The work-around is
+      enabled using this flag.
+
+  "#phy-cells":
+    const: 0
+
+required:
+  - compatible
+  - reg
+  - clocks
+  - resets
+  - "#phy-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/mediatek,mt7988-clk.h>
+    #include <dt-bindings/reset/mediatek,mt7988-resets.h>
+    soc {
+      #address-cells = <2>;
+      #size-cells = <2>;
+
+      xfi_pextp0: phy@11f20000 {
+        compatible = "mediatek,mt7988-xfi-pextp";
+        reg = <0 0x11f20000 0 0x10000>;
+        clocks = <&topckgen CLK_TOP_XFI_PHY_0_XTAL_SEL>;
+        resets = <&watchdog MT7988_TOPRGU_XFI_PEXTP0_GRST>;
+        mediatek,usxgmii-performance-errata;
+        #phy-cells = <0>;
+      };
+    };
+ };
+
+...
--
2.42.1
* [RFC PATCH 2/8] phy: add driver for MediaTek pextp 10GE SerDes PHY
From: Daniel Golle @ 2023-11-09 21:51 UTC
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Add driver for MediaTek's pextp 10 Gigabit/s Ethernet SerDes PHY which
can be found in the MT7988 SoC.
The PHY operates only in PHY_MODE_ETHERNET; the submode is one of the
PHY_INTERFACE_MODE_* values corresponding to the supported modes:
* USXGMII
* 10GBase-R
* 5GBase-R
* 2500Base-X
* 1000Base-X
* Cisco SGMII (MAC side)
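As a rough illustration of the mode/submode restriction listed above, the following self-contained sketch validates a requested combination. It is not taken from the driver: the `model_*` enums and `model_pextp_check_mode()` are hypothetical stand-ins for the kernel's PHY_MODE_* and PHY_INTERFACE_MODE_* values, and returning -EINVAL for unsupported combinations is an assumption.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's phy_mode values. */
enum model_phy_mode {
	MODEL_PHY_MODE_ETHERNET,
	MODEL_PHY_MODE_PCIE, /* any non-Ethernet mode, for contrast */
};

/* Hypothetical stand-ins for PHY_INTERFACE_MODE_* submodes. */
enum model_submode {
	MODEL_USXGMII,
	MODEL_10GBASER,
	MODEL_5GBASER,
	MODEL_2500BASEX,
	MODEL_1000BASEX,
	MODEL_SGMII,
	MODEL_RGMII, /* not a SerDes mode, so not in the supported list */
};

/* Accept only Ethernet mode with one of the six listed submodes. */
int model_pextp_check_mode(enum model_phy_mode mode,
			   enum model_submode submode)
{
	if (mode != MODEL_PHY_MODE_ETHERNET)
		return -EINVAL;

	switch (submode) {
	case MODEL_USXGMII:
	case MODEL_10GBASER:
	case MODEL_5GBASER:
	case MODEL_2500BASEX:
	case MODEL_1000BASEX:
	case MODEL_SGMII:
		return 0;
	default:
		return -EINVAL;
	}
}
```

In the kernel, a consumer would typically request such a combination through the generic PHY framework's mode-setting call; the model only shows which combinations this commit message describes as valid.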
To work around a performance issue on the first of the two PEXTP
instances in the MT7988, special tuning is applied. It is selected by
adding the mediatek,usxgmii-performance-errata property to the device
tree node.
There is no documentation whatsoever for the pextp registers; this
driver is based on a GPL-licensed implementation found in MediaTek's
SDK.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
MAINTAINERS | 1 +
drivers/phy/mediatek/Kconfig | 11 +
drivers/phy/mediatek/Makefile | 1 +
drivers/phy/mediatek/phy-mtk-pextp.c | 355 +++++++++++++++++++++++++++
4 files changed, 368 insertions(+)
create mode 100644 drivers/phy/mediatek/phy-mtk-pextp.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 7b151710e8c58..6499acd8f3874 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13527,6 +13527,7 @@ L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/phy/mediatek-ge-soc.c
F: drivers/net/phy/mediatek-ge.c
+F: drivers/phy/mediatek/phy-mtk-pextp.c
MEDIATEK I2C CONTROLLER DRIVER
M: Qii Wang <qii.wang@mediatek.com>
diff --git a/drivers/phy/mediatek/Kconfig b/drivers/phy/mediatek/Kconfig
index 3125ecb5d119f..a7749a6d96541 100644
--- a/drivers/phy/mediatek/Kconfig
+++ b/drivers/phy/mediatek/Kconfig
@@ -13,6 +13,17 @@ config PHY_MTK_PCIE
callback for PCIe GEN3 port, it supports software efuse
initialization.
+config PHY_MTK_PEXTP
+ tristate "MediaTek PEXTP Driver"
+ depends on ARCH_MEDIATEK || COMPILE_TEST
+ depends on OF && OF_ADDRESS
+ depends on HAS_IOMEM
+ select GENERIC_PHY
+ help
+ Say 'Y' here to add support for MediaTek pextp PHY driver.
+ The driver provides access to the Ethernet SerDes PHY supporting
+ various 1GE, 2.5GE, 5GE and 10GE modes.
+
config PHY_MTK_TPHY
tristate "MediaTek T-PHY Driver"
depends on ARCH_MEDIATEK || COMPILE_TEST
diff --git a/drivers/phy/mediatek/Makefile b/drivers/phy/mediatek/Makefile
index c9a50395533eb..ca60c7b9b02ac 100644
--- a/drivers/phy/mediatek/Makefile
+++ b/drivers/phy/mediatek/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_PHY_MTK_PCIE) += phy-mtk-pcie.o
obj-$(CONFIG_PHY_MTK_TPHY) += phy-mtk-tphy.o
obj-$(CONFIG_PHY_MTK_UFS) += phy-mtk-ufs.o
obj-$(CONFIG_PHY_MTK_XSPHY) += phy-mtk-xsphy.o
+obj-$(CONFIG_PHY_MTK_PEXTP) += phy-mtk-pextp.o
phy-mtk-hdmi-drv-y := phy-mtk-hdmi.o
phy-mtk-hdmi-drv-y += phy-mtk-hdmi-mt2701.o
diff --git a/drivers/phy/mediatek/phy-mtk-pextp.c b/drivers/phy/mediatek/phy-mtk-pextp.c
new file mode 100644
index 0000000000000..272bff4f37a96
--- /dev/null
+++ b/drivers/phy/mediatek/phy-mtk-pextp.c
@@ -0,0 +1,355 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* MediaTek 10GE SerDes PHY driver
+ *
+ * Copyright (c) 2023 Daniel Golle <daniel@makrotopia.org>
+ * based on mtk_usxgmii.c found in MediaTek's SDK released under GPL-2.0
+ * Copyright (c) 2022 MediaTek Inc.
+ * Author: Henry Yen <henry.yen@mediatek.com>
+ */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <linux/clk.h>
+#include <linux/reset.h>
+#include <linux/phy.h>
+#include <linux/phy/phy.h>
+
+struct mtk_pextp_phy {
+ void __iomem *base;
+ struct device *dev;
+ struct reset_control *reset;
+ struct clk *clk;
+ bool da_war;
+};
+
+static inline bool mtk_interface_mode_is_xgmii(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_INTERNAL:
+ case PHY_INTERFACE_MODE_USXGMII:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_5GBASER:
+ return true;
+ default:
+ return false;
+ }
+}
+
+static void mtk_pextp_setup(struct mtk_pextp_phy *pextp, phy_interface_t interface)
+{
+ bool is_10g = (interface == PHY_INTERFACE_MODE_10GBASER ||
+ interface == PHY_INTERFACE_MODE_USXGMII);
+ bool is_2p5g = (interface == PHY_INTERFACE_MODE_2500BASEX);
+ bool is_5g = (interface == PHY_INTERFACE_MODE_5GBASER);
+
+ dev_dbg(pextp->dev, "setting up for mode %s\n", phy_modes(interface));
+
+ /* Setup operation mode */
+ if (is_10g)
+ iowrite32(0x00C9071C, pextp->base + 0x9024);
+ else
+ iowrite32(0x00D9071C, pextp->base + 0x9024);
+
+ if (is_5g)
+ iowrite32(0xAAA5A5AA, pextp->base + 0x2020);
+ else
+ iowrite32(0xAA8585AA, pextp->base + 0x2020);
+
+ if (is_2p5g || is_5g || is_10g) {
+ iowrite32(0x0C020707, pextp->base + 0x2030);
+ iowrite32(0x0E050F0F, pextp->base + 0x2034);
+ iowrite32(0x00140032, pextp->base + 0x2040);
+ } else {
+ iowrite32(0x0C020207, pextp->base + 0x2030);
+ iowrite32(0x0E05050F, pextp->base + 0x2034);
+ iowrite32(0x00200032, pextp->base + 0x2040);
+ }
+
+ if (is_2p5g || is_10g)
+ iowrite32(0x00C014AA, pextp->base + 0x50F0);
+ else if (is_5g)
+ iowrite32(0x00C018AA, pextp->base + 0x50F0);
+ else
+ iowrite32(0x00C014BA, pextp->base + 0x50F0);
+
+ if (is_5g) {
+ iowrite32(0x3777812B, pextp->base + 0x50E0);
+ iowrite32(0x005C9CFF, pextp->base + 0x506C);
+ iowrite32(0x9DFAFAFA, pextp->base + 0x5070);
+ iowrite32(0x273F3F3F, pextp->base + 0x5074);
+ iowrite32(0xA8883868, pextp->base + 0x5078);
+ iowrite32(0x14661466, pextp->base + 0x507C);
+ } else {
+ iowrite32(0x3777C12B, pextp->base + 0x50E0);
+ iowrite32(0x005F9CFF, pextp->base + 0x506C);
+ iowrite32(0x9D9DFAFA, pextp->base + 0x5070);
+ iowrite32(0x27273F3F, pextp->base + 0x5074);
+ iowrite32(0xA7883C68, pextp->base + 0x5078);
+ iowrite32(0x11661166, pextp->base + 0x507C);
+ }
+
+ if (is_2p5g || is_10g) {
+ iowrite32(0x0E000AAF, pextp->base + 0x5080);
+ iowrite32(0x08080D0D, pextp->base + 0x5084);
+ iowrite32(0x02030909, pextp->base + 0x5088);
+ } else if (is_5g) {
+ iowrite32(0x0E001ABF, pextp->base + 0x5080);
+ iowrite32(0x080B0D0D, pextp->base + 0x5084);
+ iowrite32(0x02050909, pextp->base + 0x5088);
+ } else {
+ iowrite32(0x0E000EAF, pextp->base + 0x5080);
+ iowrite32(0x08080E0D, pextp->base + 0x5084);
+ iowrite32(0x02030B09, pextp->base + 0x5088);
+ }
+
+ if (is_5g) {
+ iowrite32(0x0C000000, pextp->base + 0x50E4);
+ iowrite32(0x04000000, pextp->base + 0x50E8);
+ } else {
+ iowrite32(0x0C0C0000, pextp->base + 0x50E4);
+ iowrite32(0x04040000, pextp->base + 0x50E8);
+ }
+
+ if (is_2p5g || mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x0F0F0C06, pextp->base + 0x50EC);
+ else
+ iowrite32(0x0F0F0606, pextp->base + 0x50EC);
+
+ if (is_5g) {
+ iowrite32(0x50808C8C, pextp->base + 0x50A8);
+ iowrite32(0x18000000, pextp->base + 0x6004);
+ } else {
+ iowrite32(0x506E8C8C, pextp->base + 0x50A8);
+ iowrite32(0x18190000, pextp->base + 0x6004);
+ }
+
+ if (is_10g)
+ iowrite32(0x01423342, pextp->base + 0x00F8);
+ else if (is_5g)
+ iowrite32(0x00A132A1, pextp->base + 0x00F8);
+ else if (is_2p5g)
+ iowrite32(0x009C329C, pextp->base + 0x00F8);
+ else
+ iowrite32(0x00FA32FA, pextp->base + 0x00F8);
+
+ /* Force SGDT_OUT off and select PCS */
+ if (mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x80201F20, pextp->base + 0x00F4);
+ else
+ iowrite32(0x80201F21, pextp->base + 0x00F4);
+
+ /* Force GLB_CKDET_OUT */
+ iowrite32(0x00050C00, pextp->base + 0x0030);
+
+ /* Force AEQ on */
+ iowrite32(0x02002800, pextp->base + 0x0070);
+ ndelay(1020);
+
+ /* Setup DA default value */
+ iowrite32(0x00000020, pextp->base + 0x30B0);
+ iowrite32(0x00008A01, pextp->base + 0x3028);
+ iowrite32(0x0000A884, pextp->base + 0x302C);
+ iowrite32(0x00083002, pextp->base + 0x3024);
+ if (mtk_interface_mode_is_xgmii(interface)) {
+ iowrite32(0x00022220, pextp->base + 0x3010);
+ iowrite32(0x0F020A01, pextp->base + 0x5064);
+ iowrite32(0x06100600, pextp->base + 0x50B4);
+ if (interface == PHY_INTERFACE_MODE_USXGMII)
+ iowrite32(0x40704000, pextp->base + 0x3048);
+ else
+ iowrite32(0x47684100, pextp->base + 0x3048);
+ } else {
+ iowrite32(0x00011110, pextp->base + 0x3010);
+ iowrite32(0x40704000, pextp->base + 0x3048);
+ }
+
+ if (!mtk_interface_mode_is_xgmii(interface) && !is_2p5g)
+ iowrite32(0x0000C000, pextp->base + 0x3064);
+
+ if (interface == PHY_INTERFACE_MODE_USXGMII) {
+ iowrite32(0xA8000000, pextp->base + 0x3050);
+ iowrite32(0x000000AA, pextp->base + 0x3054);
+ } else if (mtk_interface_mode_is_xgmii(interface)) {
+ iowrite32(0x00000000, pextp->base + 0x3050);
+ iowrite32(0x00000000, pextp->base + 0x3054);
+ } else {
+ iowrite32(0xA8000000, pextp->base + 0x3050);
+ iowrite32(0x000000AA, pextp->base + 0x3054);
+ }
+
+ if (mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x00000F00, pextp->base + 0x306C);
+ else if (is_2p5g)
+ iowrite32(0x22000F00, pextp->base + 0x306C);
+ else
+ iowrite32(0x20200F00, pextp->base + 0x306C);
+
+ if (interface == PHY_INTERFACE_MODE_10GBASER && pextp->da_war)
+ iowrite32(0x0007B400, pextp->base + 0xA008);
+
+ if (mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x00040000, pextp->base + 0xA060);
+ else
+ iowrite32(0x00050000, pextp->base + 0xA060);
+
+ if (is_10g)
+ iowrite32(0x00000001, pextp->base + 0x90D0);
+ else if (is_5g)
+ iowrite32(0x00000003, pextp->base + 0x90D0);
+ else if (is_2p5g)
+ iowrite32(0x00000005, pextp->base + 0x90D0);
+ else
+ iowrite32(0x00000007, pextp->base + 0x90D0);
+
+ /* Release reset */
+ iowrite32(0x0200E800, pextp->base + 0x0070);
+ usleep_range(150, 500);
+
+ /* Switch to P0 */
+ iowrite32(0x0200C111, pextp->base + 0x0070);
+ ndelay(1020);
+ iowrite32(0x0200C101, pextp->base + 0x0070);
+ usleep_range(15, 50);
+
+ if (mtk_interface_mode_is_xgmii(interface)) {
+ /* Switch to Gen3 */
+ iowrite32(0x0202C111, pextp->base + 0x0070);
+ } else {
+ /* Switch to Gen2 */
+ iowrite32(0x0201C111, pextp->base + 0x0070);
+ }
+ ndelay(1020);
+ if (mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x0202C101, pextp->base + 0x0070);
+ else
+ iowrite32(0x0201C101, pextp->base + 0x0070);
+ usleep_range(100, 500);
+ iowrite32(0x00000030, pextp->base + 0x30B0);
+ if (mtk_interface_mode_is_xgmii(interface))
+ iowrite32(0x80201F00, pextp->base + 0x00F4);
+ else
+ iowrite32(0x80201F01, pextp->base + 0x00F4);
+
+ iowrite32(0x30000000, pextp->base + 0x3040);
+ usleep_range(400, 1000);
+}
+
+static int mtk_pextp_set_mode(struct phy *phy, enum phy_mode mode, int submode)
+{
+ struct mtk_pextp_phy *pextp = phy_get_drvdata(phy);
+
+ if (mode != PHY_MODE_ETHERNET)
+ return -EINVAL;
+
+ switch (submode) {
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_5GBASER:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_USXGMII:
+ mtk_pextp_setup(pextp, submode);
+ return 0;
+ default:
+ return -EINVAL;
+ }
+}
+
+static int mtk_pextp_reset(struct phy *phy)
+{
+ struct mtk_pextp_phy *pextp = phy_get_drvdata(phy);
+
+ reset_control_assert(pextp->reset);
+ usleep_range(100, 500);
+ reset_control_deassert(pextp->reset);
+ mdelay(10);
+
+ return 0;
+}
+
+static int mtk_pextp_power_on(struct phy *phy)
+{
+ struct mtk_pextp_phy *pextp = phy_get_drvdata(phy);
+
+ return clk_prepare_enable(pextp->clk);
+}
+
+static int mtk_pextp_power_off(struct phy *phy)
+{
+ struct mtk_pextp_phy *pextp = phy_get_drvdata(phy);
+
+ clk_disable_unprepare(pextp->clk);
+
+ return 0;
+}
+
+static const struct phy_ops mtk_pextp_ops = {
+ .power_on = mtk_pextp_power_on,
+ .power_off = mtk_pextp_power_off,
+ .set_mode = mtk_pextp_set_mode,
+ .reset = mtk_pextp_reset,
+ .owner = THIS_MODULE,
+};
+
+static int mtk_pextp_probe(struct platform_device *pdev)
+{
+ struct device_node *np = pdev->dev.of_node;
+ struct phy_provider *phy_provider;
+ struct mtk_pextp_phy *pextp;
+ struct phy *phy;
+
+ if (!np)
+ return -ENODEV;
+
+ pextp = devm_kzalloc(&pdev->dev, sizeof(*pextp), GFP_KERNEL);
+ if (!pextp)
+ return -ENOMEM;
+
+ pextp->base = devm_of_iomap(&pdev->dev, np, 0, NULL);
+ if (!pextp->base)
+ return -EIO;
+
+ pextp->dev = &pdev->dev;
+ pextp->clk = devm_clk_get(&pdev->dev, NULL);
+ if (IS_ERR(pextp->clk))
+ return PTR_ERR(pextp->clk);
+
+ pextp->reset = devm_reset_control_get_exclusive(&pdev->dev, NULL);
+ if (IS_ERR(pextp->reset))
+ return PTR_ERR(pextp->reset);
+
+ pextp->da_war = of_property_read_bool(np, "mediatek,usxgmii-performance-errata");
+
+ phy = devm_phy_create(&pdev->dev, NULL, &mtk_pextp_ops);
+ if (IS_ERR(phy))
+ return PTR_ERR(phy);
+
+ phy_set_drvdata(phy, pextp);
+
+ phy_provider = devm_of_phy_provider_register(&pdev->dev, of_phy_simple_xlate);
+
+ return PTR_ERR_OR_ZERO(phy_provider);
+}
+
+static const struct of_device_id mtk_pextp_match[] = {
+ { .compatible = "mediatek,mt7988-xfi-pextp", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, mtk_pextp_match);
+
+static struct platform_driver mtk_pextp_driver = {
+ .probe = mtk_pextp_probe,
+ .driver = {
+ .name = "mtk-pextp",
+ .of_match_table = mtk_pextp_match,
+ },
+};
+module_platform_driver(mtk_pextp_driver);
+
+MODULE_DESCRIPTION("MediaTek pextp SerDes PHY driver");
+MODULE_AUTHOR("Daniel Golle <daniel@makrotopia.org>");
+MODULE_LICENSE("GPL");
--
2.42.1
^ permalink raw reply related
* [RFC PATCH 3/8] net: pcs: pcs-mtk-lynxi: use 2500Base-X without AN
From: Daniel Golle @ 2023-11-09 21:51 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Using 2500Base-T SFP modules, e.g. on the BananaPi R3, currently
requires manually disabling auto-negotiation, for instance with ethtool.
While a proper fix using SFP quirks is being discussed upstream, add a
workaround that restores the user experience to what it was before the
switch to the dedicated SGMII PCS driver.
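A minimal userspace sketch of the resulting get_state behaviour for
2500Base-X follows: the link and AN-complete flags come straight from
the BMSR bits, while speed and duplex are fixed rather than decoded
from the link partner advertisement. struct sketch_link_state and
sketch_decode_2500basex() are hypothetical names used only for this
illustration; the two BMSR bit values match include/uapi/linux/mii.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/mii.h. */
#define BMSR_LSTATUS      0x0004
#define BMSR_ANEGCOMPLETE 0x0020

/* Hypothetical, reduced stand-in for struct phylink_link_state. */
struct sketch_link_state {
	bool link;
	bool an_complete;
	int speed;        /* Mbit/s */
	bool full_duplex;
};

/* 2500Base-X without AN: only the status bits are read from hardware. */
static struct sketch_link_state sketch_decode_2500basex(uint16_t bmsr)
{
	struct sketch_link_state st = {
		.link = !!(bmsr & BMSR_LSTATUS),
		.an_complete = !!(bmsr & BMSR_ANEGCOMPLETE),
		.speed = 2500,       /* fixed, not negotiated */
		.full_duplex = true, /* fixed, not negotiated */
	};

	return st;
}
```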
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
drivers/net/pcs/pcs-mtk-lynxi.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/drivers/net/pcs/pcs-mtk-lynxi.c b/drivers/net/pcs/pcs-mtk-lynxi.c
index 8501dd365279b..6204448d8eac6 100644
--- a/drivers/net/pcs/pcs-mtk-lynxi.c
+++ b/drivers/net/pcs/pcs-mtk-lynxi.c
@@ -92,14 +92,23 @@ static void mtk_pcs_lynxi_get_state(struct phylink_pcs *pcs,
struct phylink_link_state *state)
{
struct mtk_pcs_lynxi *mpcs = pcs_to_mtk_pcs_lynxi(pcs);
- unsigned int bm, adv;
+ unsigned int bm, bmsr, adv;
/* Read the BMSR and LPA */
regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
- regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
+ bmsr = FIELD_GET(SGMII_BMSR, bm);
+
+ if (state->interface == PHY_INTERFACE_MODE_2500BASEX) {
+ state->link = !!(bmsr & BMSR_LSTATUS);
+ state->an_complete = !!(bmsr & BMSR_ANEGCOMPLETE);
+ state->speed = SPEED_2500;
+ state->duplex = DUPLEX_FULL;
+
+ return;
+ }
- phylink_mii_c22_pcs_decode_state(state, FIELD_GET(SGMII_BMSR, bm),
- FIELD_GET(SGMII_LPA, adv));
+ regmap_read(mpcs->regmap, SGMSYS_PCS_ADVERTISE, &adv);
+ phylink_mii_c22_pcs_decode_state(state, bmsr, FIELD_GET(SGMII_LPA, adv));
}
static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int neg_mode,
@@ -129,7 +138,8 @@ static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int neg_mode,
if (neg_mode & PHYLINK_PCS_NEG_INBAND)
sgm_mode |= SGMII_REMOTE_FAULT_DIS;
- if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED) {
+ if (neg_mode == PHYLINK_PCS_NEG_INBAND_ENABLED &&
+ interface != PHY_INTERFACE_MODE_2500BASEX) {
if (interface == PHY_INTERFACE_MODE_SGMII)
sgm_mode |= SGMII_SPEED_DUPLEX_AN;
bmcr = BMCR_ANENABLE;
--
2.42.1
^ permalink raw reply related
* [RFC PATCH 4/8] net: pcs: pcs-mtk-lynxi: allow calling with NULL advertising
From: Daniel Golle @ 2023-11-09 21:51 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Allow calling pcs_config with advertising set to NULL and in this case
keep the previously assigned advertisement.
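The caching logic can be sketched in plain C as below. This is an
illustration under assumptions, not the driver code: struct sketch_pcs,
sketch_encode() and sketch_config() are hypothetical names, the encoder
is a trivial placeholder for phylink_mii_c22_pcs_encode_advertisement(),
and the function returns the advertisement word it would program only
so that the behaviour is observable (the real pcs_config does not).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCS state; advertise is -1 until set. */
struct sketch_pcs {
	int advertise;
};

/* Placeholder encoder: keeps the low 16 bits of the first mask word. */
static int sketch_encode(const unsigned long *advertising)
{
	return (int)(*advertising & 0xffff);
}

/* Returns the advertisement word to program, or a negative errno. */
static int sketch_config(struct sketch_pcs *p, const unsigned long *advertising)
{
	int adv;

	if (advertising) {
		adv = sketch_encode(advertising);
		if (adv < 0)
			return adv;
		p->advertise = adv;     /* remember for later NULL calls */
	} else {
		if (p->advertise < 0)
			return -EINVAL; /* nothing cached yet */
		adv = p->advertise;
	}

	return adv;
}
```

Initialising the cached value to -1 at creation time is what lets a
NULL call before the first real configuration fail cleanly.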
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
drivers/net/pcs/pcs-mtk-lynxi.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/net/pcs/pcs-mtk-lynxi.c b/drivers/net/pcs/pcs-mtk-lynxi.c
index 6204448d8eac6..1372653c3d422 100644
--- a/drivers/net/pcs/pcs-mtk-lynxi.c
+++ b/drivers/net/pcs/pcs-mtk-lynxi.c
@@ -81,6 +81,7 @@ struct mtk_pcs_lynxi {
phy_interface_t interface;
struct phylink_pcs pcs;
u32 flags;
+ int advertise;
};
static struct mtk_pcs_lynxi *pcs_to_mtk_pcs_lynxi(struct phylink_pcs *pcs)
@@ -121,11 +122,19 @@ static int mtk_pcs_lynxi_config(struct phylink_pcs *pcs, unsigned int neg_mode,
unsigned int rgc3, sgm_mode, bmcr;
int advertise, link_timer;
- advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
- advertising);
- if (advertise < 0)
- return advertise;
+ if (advertising) {
+ advertise = phylink_mii_c22_pcs_encode_advertisement(interface,
+ advertising);
+ if (advertise < 0)
+ return advertise;
+ mpcs->advertise = advertise;
+ } else {
+ if (mpcs->advertise < 0)
+ return -EINVAL;
+
+ advertise = mpcs->advertise;
+ }
/* Clearing IF_MODE_BIT0 switches the PCS to BASE-X mode, and
* we assume that fixes it's speed at bitrate = line rate (in
* other words, 1000Mbps or 2500Mbps).
@@ -299,6 +308,7 @@ struct phylink_pcs *mtk_pcs_lynxi_create(struct device *dev,
mpcs->pcs.neg_mode = true;
mpcs->pcs.poll = true;
mpcs->interface = PHY_INTERFACE_MODE_NA;
+ mpcs->advertise = -1;
return &mpcs->pcs;
}
--
2.42.1
^ permalink raw reply related
* [RFC PATCH 5/8] dt-bindings: net: pcs: add bindings for MediaTek USXGMII PCS
From: Daniel Golle @ 2023-11-09 21:51 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
MediaTek's USXGMII can be found in the MT7988 SoC. We need to access
it in order to configure and monitor the Ethernet SerDes link in
USXGMII, 10GBase-R and 5GBase-R mode. By including a wrapped
legacy 1000Base-X/2500Base-X/Cisco SGMII LynxI PCS as well, those
interface modes are also available.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
.../bindings/net/pcs/mediatek,usxgmii.yaml | 105 ++++++++++++++++++
1 file changed, 105 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/pcs/mediatek,usxgmii.yaml
diff --git a/Documentation/devicetree/bindings/net/pcs/mediatek,usxgmii.yaml b/Documentation/devicetree/bindings/net/pcs/mediatek,usxgmii.yaml
new file mode 100644
index 0000000000000..199cf47859e31
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/pcs/mediatek,usxgmii.yaml
@@ -0,0 +1,105 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/pcs/mediatek,usxgmii.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek USXGMII PCS
+
+maintainers:
+ - Daniel Golle <daniel@makrotopia.org>
+
+description:
+ The MediaTek USXGMII PCS provides physical link control and status
+ for USXGMII, 10GBase-R and 5GBase-R links on the SerDes interfaces
+ provided by the PEXTP PHY.
+ In order to also support legacy 2500Base-X, 1000Base-X and Cisco
+ SGMII an existing mediatek,*-sgmiisys LynxI PCS is wrapped to
+ provide those interface modes on the same SerDes interfaces shared
+ with the USXGMII PCS.
+
+properties:
+ $nodename:
+ pattern: "^pcs@[0-9a-f]+$"
+
+ compatible:
+ const: mediatek,mt7988-usxgmiisys
+
+ reg:
+ maxItems: 1
+
+ clocks:
+ items:
+ - description: USXGMII top-level clock
+ - description: SGMII top-level clock
+ - description: SGMII subsystem TX clock
+ - description: SGMII subsystem RX clock
+ - description: XFI PLL clock
+
+ clock-names:
+ items:
+ - const: usxgmii
+ - const: sgmii_sel
+ - const: sgmii_tx
+ - const: sgmii_rx
+ - const: xfi_pll
+
+ phys:
+ items:
+ - description: PEXTP SerDes PHY
+
+ mediatek,sgmiisys:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ Phandle to the syscon node of the corresponding SGMII LynxI PCS.
+
+ resets:
+ items:
+ - description: XFI reset
+ - description: SGMII reset
+
+ reset-names:
+ items:
+ - const: xfi
+ - const: sgmii
+
+ "#pcs-cells":
+ const: 0
+
+required:
+ - compatible
+ - reg
+ - clocks
+ - clock-names
+ - phys
+ - mediatek,sgmiisys
+ - resets
+ - reset-names
+ - "#pcs-cells"
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/mediatek,mt7988-clk.h>
+ #include <dt-bindings/reset/mediatek,mt7988-resets.h>
+ soc {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ usxgmiisys0: pcs@10080000 {
+ compatible = "mediatek,mt7988-usxgmiisys";
+ reg = <0 0x10080000 0 0x1000>;
+ clocks = <&topckgen CLK_TOP_USXGMII_SBUS_0_SEL>,
+ <&topckgen CLK_TOP_SGM_0_SEL>,
+ <&sgmiisys0 CLK_SGM0_TX_EN>,
+ <&sgmiisys0 CLK_SGM0_RX_EN>,
+ <&xfi_pll CLK_XFIPLL_PLL_EN>;
+ clock-names = "usxgmii", "sgmii_sel", "sgmii_tx", "sgmii_rx", "xfi_pll";
+ resets = <&watchdog MT7988_TOPRGU_XFI0_GRST>,
+ <&watchdog MT7988_TOPRGU_SGMII0_GRST>;
+ reset-names = "xfi", "sgmii";
+ phys = <&xfi_pextp0>;
+ mediatek,sgmiisys = <&sgmiisys0>;
+ #pcs-cells = <0>;
+ };
+ };
--
2.42.1
^ permalink raw reply related
* [RFC PATCH 6/8] net: pcs: add driver for MediaTek USXGMII PCS
From: Daniel Golle @ 2023-11-09 21:51 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Add a driver for the USXGMII PCS found in the MediaTek MT7988 SoC, which
supports USXGMII, 10GBase-R and 5GBase-R interface modes. To support
Cisco SGMII, 1000Base-X and 2500Base-X via the LynxI PCS that is also
present, create a wrapped PCS which takes care of the components shared
between the new USXGMII PCS and the legacy LynxI PCS.
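The wrapping idea can be sketched in userspace C as follows. All names
prefixed with sketch_ are hypothetical and much reduced compared to the
structures in this patch; the reset counter merely stands in for the
shared reset and SerDes setup that the wrapper performs when the
interface mode changes, before delegating to the inner PCS.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_pcs;

struct sketch_pcs_ops {
	int (*config)(struct sketch_pcs *pcs, int interface);
};

struct sketch_pcs {
	const struct sketch_pcs_ops *ops;
};

/* Wrapper embedding its own PCS and pointing at the inner (LynxI-like) one. */
struct sketch_wrapper {
	struct sketch_pcs *inner;
	struct sketch_pcs pcs;
	int interface; /* last configured mode, -1 if none */
	int resets;    /* counts full reconfigurations */
};

/* Inner PCS config: a no-op that always succeeds. */
static int sketch_inner_config(struct sketch_pcs *pcs, int interface)
{
	(void)pcs;
	(void)interface;
	return 0;
}

static const struct sketch_pcs_ops sketch_inner_ops = {
	.config = sketch_inner_config,
};

/* Wrapper config: handle shared parts on mode change, then delegate. */
static int sketch_wrapped_config(struct sketch_pcs *pcs, int interface)
{
	struct sketch_wrapper *wp =
		sketch_container_of(pcs, struct sketch_wrapper, pcs);
	int ret;

	if (interface != wp->interface)
		wp->resets++; /* stands in for the shared reset sequence */

	ret = wp->inner->ops->config(wp->inner, interface);
	wp->interface = interface;

	return ret;
}
```

Reconfiguring with an unchanged interface mode skips the reset step and
only calls into the inner PCS.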
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
MAINTAINERS | 2 +
drivers/net/pcs/Kconfig | 10 +
drivers/net/pcs/Makefile | 1 +
drivers/net/pcs/pcs-mtk-usxgmii.c | 688 ++++++++++++++++++++++++++++
include/linux/pcs/pcs-mtk-usxgmii.h | 18 +
5 files changed, 719 insertions(+)
create mode 100644 drivers/net/pcs/pcs-mtk-usxgmii.c
create mode 100644 include/linux/pcs/pcs-mtk-usxgmii.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 6499acd8f3874..026f62243f595 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13517,7 +13517,9 @@ M: Daniel Golle <daniel@makrotopia.org>
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/pcs/pcs-mtk-lynxi.c
+F: drivers/net/pcs/pcs-mtk-usxgmii.c
F: include/linux/pcs/pcs-mtk-lynxi.h
+F: include/linux/pcs/pcs-mtk-usxgmii.h
MEDIATEK ETHERNET PHY DRIVERS
M: Daniel Golle <daniel@makrotopia.org>
diff --git a/drivers/net/pcs/Kconfig b/drivers/net/pcs/Kconfig
index 87cf308fc6d8b..5df5b19c4eb93 100644
--- a/drivers/net/pcs/Kconfig
+++ b/drivers/net/pcs/Kconfig
@@ -25,6 +25,16 @@ config PCS_MTK_LYNXI
This module provides helpers to phylink for managing the LynxI PCS
which is part of MediaTek's SoC and Ethernet switch ICs.
+config PCS_MTK_USXGMII
+ tristate "MediaTek USXGMII PCS"
+ select PCS_MTK_LYNXI
+ select PHY_MTK_PEXTP
+ help
+ This module provides a driver for MediaTek's USXGMII PCS supporting
+ 10GBase-R, 5GBase-R and USXGMII interface modes.
+ 1000Base-X, 2500Base-X and Cisco SGMII are supported on the same
+ differential pairs via an embedded LynxI PCS.
+
config PCS_RZN1_MIIC
tristate "Renesas RZ/N1 MII converter"
depends on OF && (ARCH_RZN1 || COMPILE_TEST)
diff --git a/drivers/net/pcs/Makefile b/drivers/net/pcs/Makefile
index fb1694192ae63..cc355152ca1ca 100644
--- a/drivers/net/pcs/Makefile
+++ b/drivers/net/pcs/Makefile
@@ -6,4 +6,5 @@ pcs_xpcs-$(CONFIG_PCS_XPCS) := pcs-xpcs.o pcs-xpcs-nxp.o pcs-xpcs-wx.o
obj-$(CONFIG_PCS_XPCS) += pcs_xpcs.o
obj-$(CONFIG_PCS_LYNX) += pcs-lynx.o
obj-$(CONFIG_PCS_MTK_LYNXI) += pcs-mtk-lynxi.o
+obj-$(CONFIG_PCS_MTK_USXGMII) += pcs-mtk-usxgmii.o
obj-$(CONFIG_PCS_RZN1_MIIC) += pcs-rzn1-miic.o
diff --git a/drivers/net/pcs/pcs-mtk-usxgmii.c b/drivers/net/pcs/pcs-mtk-usxgmii.c
new file mode 100644
index 0000000000000..b3ca66c9df2a9
--- /dev/null
+++ b/drivers/net/pcs/pcs-mtk-usxgmii.c
@@ -0,0 +1,688 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ * Author: Henry Yen <henry.yen@mediatek.com>
+ * Daniel Golle <daniel@makrotopia.org>
+ */
+
+#include <linux/clk.h>
+#include <linux/io.h>
+#include <linux/mfd/syscon.h>
+#include <linux/mdio.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/regmap.h>
+#include <linux/reset.h>
+#include <linux/pcs/pcs-mtk-lynxi.h>
+#include <linux/pcs/pcs-mtk-usxgmii.h>
+#include <linux/phy/phy.h>
+#include <linux/platform_device.h>
+
+/* USXGMII subsystem config registers */
+/* Register to control speed */
+#define RG_PHY_TOP_SPEED_CTRL1 0x80c
+#define USXGMII_RATE_UPDATE_MODE BIT(31)
+#define USXGMII_MAC_CK_GATED BIT(29)
+#define USXGMII_IF_FORCE_EN BIT(28)
+#define USXGMII_RATE_ADAPT_MODE GENMASK(10, 8)
+#define USXGMII_RATE_ADAPT_MODE_X1 0
+#define USXGMII_RATE_ADAPT_MODE_X2 1
+#define USXGMII_RATE_ADAPT_MODE_X4 2
+#define USXGMII_RATE_ADAPT_MODE_X10 3
+#define USXGMII_RATE_ADAPT_MODE_X100 4
+#define USXGMII_RATE_ADAPT_MODE_X5 5
+#define USXGMII_RATE_ADAPT_MODE_X50 6
+#define USXGMII_XFI_RX_MODE GENMASK(6, 4)
+#define USXGMII_XFI_TX_MODE GENMASK(2, 0)
+#define USXGMII_XFI_MODE_10G 0
+#define USXGMII_XFI_MODE_5G 1
+#define USXGMII_XFI_MODE_2P5G 3
+
+/* Register to control PCS AN */
+#define RG_PCS_AN_CTRL0 0x810
+#define USXGMII_AN_RESTART BIT(31)
+#define USXGMII_AN_SYNC_CNT GENMASK(30, 11)
+#define USXGMII_AN_ENABLE BIT(0)
+
+#define RG_PCS_AN_CTRL2 0x818
+#define USXGMII_LINK_TIMER_IDLE_DETECT GENMASK(29, 20)
+#define USXGMII_LINK_TIMER_COMP_ACK_DETECT GENMASK(19, 10)
+#define USXGMII_LINK_TIMER_AN_RESTART GENMASK(9, 0)
+
+/* Register to read PCS AN status */
+#define RG_PCS_AN_STS0 0x81c
+#define USXGMII_LPA GENMASK(15, 0)
+#define USXGMII_LPA_LATCH BIT(31)
+
+/* Register to read PCS link status */
+#define RG_PCS_RX_STATUS0 0x904
+#define RG_PCS_RX_STATUS_UPDATE BIT(16)
+#define RG_PCS_RX_LINK_STATUS BIT(2)
+
+#define MTK_NETSYS_V3_AMA_RGC3 0x128
+
+/* struct mtk_usxgmii_pcs - This structure holds each usxgmii PCS
+ * @pcs: Phylink PCS structure
+ * @dev: Pointer to device structure
+ * @base: IO memory to access PCS hardware
+ * @clk: Pointer to USXGMII clk
+ * @xfi_pll: Pointer to XFI PLL clk
+ * @pextp: The PHYA instance attached to the PCS
+ * @reset: Pointer to USXGMII reset control
+ * @interface: Currently selected interface mode
+ * @neg_mode: Currently used phylink neg_mode
+ */
+struct mtk_usxgmii_pcs {
+ struct phylink_pcs pcs;
+ struct phylink_pcs *sgmii_pcs;
+ struct device *dev;
+ void __iomem *base;
+ struct clk *clk;
+ struct clk *xfi_pll;
+ struct phy *pextp;
+ struct reset_control *reset;
+ struct regmap *regmap_pll;
+ phy_interface_t interface;
+ unsigned int neg_mode;
+};
+
+/* struct mtk_sgmii_wrapper_pcs - Structure holding wrapped SGMII PCS
+ * @usxgmii_pcs: Pointer to owning mtk_usxgmii_pcs structure
+ * @pcs: Phylink PCS structure
+ * @clks: Pointers to 3 SGMII clks
+ * @reset: Pointer to SGMII reset control
+ * @interface: Currently selected interface mode
+ * @neg_mode: Currently used phylink neg_mode
+ */
+struct mtk_sgmii_wrapper_pcs {
+ struct mtk_usxgmii_pcs *usxgmii_pcs;
+ struct phylink_pcs *lynxi_pcs;
+ struct phylink_pcs pcs;
+ struct clk *clks[3];
+ struct reset_control *reset;
+ phy_interface_t interface;
+ unsigned int neg_mode;
+ bool permit_pause_to_mac;
+};
+
+static u32 mtk_r32(struct mtk_usxgmii_pcs *mpcs, unsigned int reg)
+{
+ return ioread32(mpcs->base + reg);
+}
+
+static void mtk_m32(struct mtk_usxgmii_pcs *mpcs, unsigned int reg, u32 mask, u32 set)
+{
+ u32 val;
+
+ val = ioread32(mpcs->base + reg);
+ val &= ~mask;
+ val |= set;
+ iowrite32(val, mpcs->base + reg);
+}
+
+static struct mtk_usxgmii_pcs *pcs_to_mtk_usxgmii_pcs(struct phylink_pcs *pcs)
+{
+ return container_of(pcs, struct mtk_usxgmii_pcs, pcs);
+}
+
+static int mtk_xfi_pextp_init(struct mtk_usxgmii_pcs *mpcs)
+{
+ struct device *dev = mpcs->dev;
+
+ mpcs->pextp = devm_phy_get(dev, NULL);
+ if (IS_ERR(mpcs->pextp))
+ return dev_err_probe(dev, PTR_ERR(mpcs->pextp), "cannot acquire PHYA\n");
+
+ if (!mpcs->pextp)
+ return dev_err_probe(dev, -ENODEV, "PHYA not found\n");
+
+ return 0;
+}
+
+static void mtk_usxgmii_reset(struct mtk_usxgmii_pcs *mpcs)
+{
+ if (!mpcs->pextp)
+ return;
+
+ phy_reset(mpcs->pextp);
+
+ reset_control_assert(mpcs->reset);
+ usleep_range(100, 500);
+ reset_control_deassert(mpcs->reset);
+
+ mdelay(10);
+}
+
+static void mtk_sgmii_reset(struct mtk_sgmii_wrapper_pcs *wp)
+{
+ if (!wp->usxgmii_pcs->pextp)
+ return;
+
+ phy_reset(wp->usxgmii_pcs->pextp);
+
+ reset_control_assert(wp->reset);
+ usleep_range(100, 500);
+ reset_control_deassert(wp->reset);
+
+ mdelay(10);
+}
+
+static int mtk_sgmii_wrapped_pcs_config(struct phylink_pcs *pcs,
+ unsigned int neg_mode,
+ phy_interface_t interface,
+ const unsigned long *advertising,
+ bool permit_pause_to_mac)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+ bool full_reconf;
+ int ret;
+
+ phy_power_on(wp->usxgmii_pcs->pextp);
+
+ full_reconf = (interface != wp->interface);
+ if (full_reconf)
+ mtk_sgmii_reset(wp);
+
+ ret = wp->lynxi_pcs->ops->pcs_config(wp->lynxi_pcs, neg_mode, interface, advertising,
+ permit_pause_to_mac);
+
+ if (full_reconf)
+ phy_set_mode_ext(wp->usxgmii_pcs->pextp, PHY_MODE_ETHERNET, interface);
+
+ wp->interface = interface;
+ wp->neg_mode = neg_mode;
+ wp->permit_pause_to_mac = permit_pause_to_mac;
+
+ return ret;
+}
+
+static void mtk_sgmii_wrapped_pcs_get_state(struct phylink_pcs *pcs,
+ struct phylink_link_state *state)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+
+ wp->lynxi_pcs->ops->pcs_get_state(wp->lynxi_pcs, state);
+
+ /* Continuously repeat re-configuration sequence until link comes up */
+ if (!state->link)
+ mtk_sgmii_wrapped_pcs_config(pcs, wp->neg_mode, state->interface,
+ NULL, wp->permit_pause_to_mac);
+}
+
+static void mtk_sgmii_wrapped_pcs_an_restart(struct phylink_pcs *pcs)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+
+ wp->lynxi_pcs->ops->pcs_an_restart(wp->lynxi_pcs);
+}
+
+static void mtk_sgmii_wrapped_pcs_link_up(struct phylink_pcs *pcs,
+ unsigned int neg_mode,
+ phy_interface_t interface, int speed,
+ int duplex)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+
+ wp->lynxi_pcs->ops->pcs_link_up(wp->lynxi_pcs, neg_mode, interface, speed, duplex);
+}
+
+static void mtk_sgmii_wrapped_pcs_disable(struct phylink_pcs *pcs)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+
+ wp->lynxi_pcs->ops->pcs_disable(wp->lynxi_pcs);
+ wp->interface = PHY_INTERFACE_MODE_NA;
+ wp->neg_mode = -1;
+}
+
+static int mtk_sgmii_wrapped_pcs_enable(struct phylink_pcs *pcs)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(pcs, struct mtk_sgmii_wrapper_pcs, pcs);
+
+ if (wp->lynxi_pcs->ops->pcs_enable)
+ wp->lynxi_pcs->ops->pcs_enable(wp->lynxi_pcs);
+
+ wp->interface = PHY_INTERFACE_MODE_NA;
+ wp->neg_mode = -1;
+
+ phy_power_on(wp->usxgmii_pcs->pextp);
+
+ return 0;
+}
+
+static const struct phylink_pcs_ops mtk_sgmii_wrapped_pcs_ops = {
+ .pcs_get_state = mtk_sgmii_wrapped_pcs_get_state,
+ .pcs_config = mtk_sgmii_wrapped_pcs_config,
+ .pcs_an_restart = mtk_sgmii_wrapped_pcs_an_restart,
+ .pcs_link_up = mtk_sgmii_wrapped_pcs_link_up,
+ .pcs_disable = mtk_sgmii_wrapped_pcs_disable,
+ .pcs_enable = mtk_sgmii_wrapped_pcs_enable,
+};
+
+static int mtk_sgmii_wrapper_init(struct mtk_usxgmii_pcs *mpcs)
+{
+ struct device_node *r = mpcs->dev->of_node, *np;
+ struct mtk_sgmii_wrapper_pcs *wp;
+ struct phylink_pcs *lynxi_pcs;
+ struct reset_control *rstc;
+ struct regmap *regmap;
+ struct clk *sgmii_sel, *sgmii_rx, *sgmii_tx;
+
+ np = of_parse_phandle(r, "mediatek,sgmiisys", 0);
+ if (!np)
+ return -ENODEV;
+
+ regmap = syscon_node_to_regmap(np);
+ of_node_put(np);
+ if (IS_ERR(regmap))
+ return PTR_ERR(regmap);
+
+ rstc = of_reset_control_get_shared(r, "sgmii");
+
+ if (IS_ERR(rstc))
+ return PTR_ERR(rstc);
+
+ sgmii_sel = devm_clk_get_enabled(mpcs->dev, "sgmii_sel");
+ if (IS_ERR(sgmii_sel))
+ return PTR_ERR(sgmii_sel);
+
+ sgmii_rx = devm_clk_get_enabled(mpcs->dev, "sgmii_rx");
+ if (IS_ERR(sgmii_rx))
+ return PTR_ERR(sgmii_rx);
+
+ sgmii_tx = devm_clk_get_enabled(mpcs->dev, "sgmii_tx");
+ if (IS_ERR(sgmii_tx))
+ return PTR_ERR(sgmii_tx);
+
+ lynxi_pcs = mtk_pcs_lynxi_create(mpcs->dev, regmap, MTK_NETSYS_V3_AMA_RGC3, 0);
+
+ if (IS_ERR(lynxi_pcs))
+ return PTR_ERR(lynxi_pcs);
+
+ if (!lynxi_pcs)
+ return -ENODEV;
+
+ /* Make sure all PCS ops are supported by wrapped PCS */
+ if (!lynxi_pcs->ops->pcs_get_state ||
+ !lynxi_pcs->ops->pcs_config ||
+ !lynxi_pcs->ops->pcs_an_restart ||
+ !lynxi_pcs->ops->pcs_link_up ||
+ !lynxi_pcs->ops->pcs_disable)
+ return -EOPNOTSUPP;
+
+ wp = devm_kzalloc(mpcs->dev, sizeof(*wp), GFP_KERNEL);
+ if (!wp)
+ return -ENOMEM;
+
+ wp->pcs.neg_mode = lynxi_pcs->neg_mode;
+ wp->pcs.ops = &mtk_sgmii_wrapped_pcs_ops;
+ wp->pcs.poll = true;
+ wp->lynxi_pcs = lynxi_pcs;
+ wp->usxgmii_pcs = mpcs;
+ wp->clks[0] = sgmii_sel;
+ wp->clks[1] = sgmii_rx;
+ wp->clks[2] = sgmii_tx;
+ wp->reset = rstc;
+ wp->interface = PHY_INTERFACE_MODE_NA;
+ wp->neg_mode = -1;
+
+ if (IS_ERR(wp->reset))
+ return PTR_ERR(wp->reset);
+
+ reset_control_deassert(wp->reset);
+
+ mpcs->sgmii_pcs = &wp->pcs;
+
+ return 0;
+}
+
+static void mtk_sgmii_wrapper_destroy(struct mtk_usxgmii_pcs *mpcs)
+{
+ struct mtk_sgmii_wrapper_pcs *wp = container_of(mpcs->sgmii_pcs,
+ struct mtk_sgmii_wrapper_pcs,
+ pcs);
+
+ mtk_pcs_lynxi_destroy(wp->lynxi_pcs);
+ reset_control_put(wp->reset);
+}
+
+static int mtk_usxgmii_pcs_config(struct phylink_pcs *pcs, unsigned int neg_mode,
+ phy_interface_t interface,
+ const unsigned long *advertising,
+ bool permit_pause_to_mac)
+{
+ struct mtk_usxgmii_pcs *mpcs = pcs_to_mtk_usxgmii_pcs(pcs);
+ unsigned int an_ctrl = 0, link_timer = 0, xfi_mode = 0, adapt_mode = 0;
+ bool mode_changed = false;
+
+ if (interface == PHY_INTERFACE_MODE_USXGMII) {
+ an_ctrl = FIELD_PREP(USXGMII_AN_SYNC_CNT, 0x1FF) | USXGMII_AN_ENABLE;
+ link_timer = FIELD_PREP(USXGMII_LINK_TIMER_IDLE_DETECT, 0x7B) |
+ FIELD_PREP(USXGMII_LINK_TIMER_COMP_ACK_DETECT, 0x7B) |
+ FIELD_PREP(USXGMII_LINK_TIMER_AN_RESTART, 0x7B);
+ xfi_mode = FIELD_PREP(USXGMII_XFI_RX_MODE, USXGMII_XFI_MODE_10G) |
+ FIELD_PREP(USXGMII_XFI_TX_MODE, USXGMII_XFI_MODE_10G);
+ } else if (interface == PHY_INTERFACE_MODE_10GBASER) {
+ an_ctrl = FIELD_PREP(USXGMII_AN_SYNC_CNT, 0x1FF);
+ link_timer = FIELD_PREP(USXGMII_LINK_TIMER_IDLE_DETECT, 0x7B) |
+ FIELD_PREP(USXGMII_LINK_TIMER_COMP_ACK_DETECT, 0x7B) |
+ FIELD_PREP(USXGMII_LINK_TIMER_AN_RESTART, 0x7B);
+ xfi_mode = FIELD_PREP(USXGMII_XFI_RX_MODE, USXGMII_XFI_MODE_10G) |
+ FIELD_PREP(USXGMII_XFI_TX_MODE, USXGMII_XFI_MODE_10G);
+ adapt_mode = USXGMII_RATE_UPDATE_MODE;
+ } else if (interface == PHY_INTERFACE_MODE_5GBASER) {
+ an_ctrl = FIELD_PREP(USXGMII_AN_SYNC_CNT, 0xFF);
+ link_timer = FIELD_PREP(USXGMII_LINK_TIMER_IDLE_DETECT, 0x3D) |
+ FIELD_PREP(USXGMII_LINK_TIMER_COMP_ACK_DETECT, 0x3D) |
+ FIELD_PREP(USXGMII_LINK_TIMER_AN_RESTART, 0x3D);
+ xfi_mode = FIELD_PREP(USXGMII_XFI_RX_MODE, USXGMII_XFI_MODE_5G) |
+ FIELD_PREP(USXGMII_XFI_TX_MODE, USXGMII_XFI_MODE_5G);
+ adapt_mode = USXGMII_RATE_UPDATE_MODE;
+ } else {
+ return -EINVAL;
+ }
+
+ adapt_mode |= FIELD_PREP(USXGMII_RATE_ADAPT_MODE, USXGMII_RATE_ADAPT_MODE_X1);
+
+ if (mpcs->interface != interface) {
+ mpcs->interface = interface;
+ mode_changed = true;
+ }
+
+ phy_power_on(mpcs->pextp);
+ mtk_usxgmii_reset(mpcs);
+
+ /* Setup USXGMII AN ctrl */
+ mtk_m32(mpcs, RG_PCS_AN_CTRL0,
+ USXGMII_AN_SYNC_CNT | USXGMII_AN_ENABLE,
+ an_ctrl);
+
+ mtk_m32(mpcs, RG_PCS_AN_CTRL2,
+ USXGMII_LINK_TIMER_IDLE_DETECT |
+ USXGMII_LINK_TIMER_COMP_ACK_DETECT |
+ USXGMII_LINK_TIMER_AN_RESTART,
+ link_timer);
+
+ mpcs->neg_mode = neg_mode;
+
+ /* Gated MAC CK */
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1,
+ USXGMII_MAC_CK_GATED, USXGMII_MAC_CK_GATED);
+
+ /* Enable interface force mode */
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1,
+ USXGMII_IF_FORCE_EN, USXGMII_IF_FORCE_EN);
+
+ /* Setup USXGMII adapt mode */
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1,
+ USXGMII_RATE_UPDATE_MODE | USXGMII_RATE_ADAPT_MODE,
+ adapt_mode);
+
+ /* Setup USXGMII speed */
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1,
+ USXGMII_XFI_RX_MODE | USXGMII_XFI_TX_MODE,
+ xfi_mode);
+
+ usleep_range(1, 10);
+
+ /* Un-gated MAC CK */
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1, USXGMII_MAC_CK_GATED, 0);
+
+ usleep_range(1, 10);
+
+ /* Disable interface force mode for the AN mode */
+ if (an_ctrl & USXGMII_AN_ENABLE)
+ mtk_m32(mpcs, RG_PHY_TOP_SPEED_CTRL1, USXGMII_IF_FORCE_EN, 0);
+
+ /* Setup PHY for interface mode */
+ phy_set_mode_ext(mpcs->pextp, PHY_MODE_ETHERNET, interface);
+
+ return mode_changed;
+}
+
+static void mtk_usxgmii_pcs_get_fixed_speed(struct mtk_usxgmii_pcs *mpcs,
+ struct phylink_link_state *state)
+{
+ u32 val = mtk_r32(mpcs, RG_PHY_TOP_SPEED_CTRL1);
+ int speed;
+
+ /* Calculate speed from interface speed and rate adapt mode */
+ switch (FIELD_GET(USXGMII_XFI_RX_MODE, val)) {
+ case USXGMII_XFI_MODE_10G:
+ speed = 10000;
+ break;
+ case USXGMII_XFI_MODE_5G:
+ speed = 5000;
+ break;
+ case USXGMII_XFI_MODE_2P5G:
+ speed = 2500;
+ break;
+ default:
+ state->speed = SPEED_UNKNOWN;
+ return;
+ }
+
+ switch (FIELD_GET(USXGMII_RATE_ADAPT_MODE, val)) {
+ case USXGMII_RATE_ADAPT_MODE_X100:
+ speed /= 100;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X50:
+ speed /= 50;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X10:
+ speed /= 10;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X5:
+ speed /= 5;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X4:
+ speed /= 4;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X2:
+ speed /= 2;
+ break;
+ case USXGMII_RATE_ADAPT_MODE_X1:
+ break;
+ default:
+ state->speed = SPEED_UNKNOWN;
+ return;
+ }
+
+ state->speed = speed;
+ state->duplex = DUPLEX_FULL;
+}
+
+static void mtk_usxgmii_pcs_get_an_state(struct mtk_usxgmii_pcs *mpcs,
+ struct phylink_link_state *state)
+{
+ u16 lpa;
+
+ /* Refresh LPA by toggling LPA_LATCH */
+ mtk_m32(mpcs, RG_PCS_AN_STS0, USXGMII_LPA_LATCH, USXGMII_LPA_LATCH);
+ ndelay(1020);
+ mtk_m32(mpcs, RG_PCS_AN_STS0, USXGMII_LPA_LATCH, 0);
+ ndelay(1020);
+ lpa = FIELD_GET(USXGMII_LPA, mtk_r32(mpcs, RG_PCS_AN_STS0));
+
+ phylink_decode_usxgmii_word(state, lpa);
+}
+
+static void mtk_usxgmii_pcs_get_state(struct phylink_pcs *pcs,
+ struct phylink_link_state *state)
+{
+ struct mtk_usxgmii_pcs *mpcs = pcs_to_mtk_usxgmii_pcs(pcs);
+
+ /* Refresh USXGMII link status by toggling RG_PCS_RX_STATUS_UPDATE */
+ mtk_m32(mpcs, RG_PCS_RX_STATUS0, RG_PCS_RX_STATUS_UPDATE,
+ RG_PCS_RX_STATUS_UPDATE);
+ ndelay(1020);
+ mtk_m32(mpcs, RG_PCS_RX_STATUS0, RG_PCS_RX_STATUS_UPDATE, 0);
+ ndelay(1020);
+
+ /* Read USXGMII link status */
+ state->link = FIELD_GET(RG_PCS_RX_LINK_STATUS,
+ mtk_r32(mpcs, RG_PCS_RX_STATUS0));
+
+ /* Continuously repeat re-configuration sequence until link comes up */
+ if (!state->link) {
+ mtk_usxgmii_pcs_config(pcs, mpcs->neg_mode,
+ state->interface, NULL, false);
+ return;
+ }
+
+ if (FIELD_GET(USXGMII_AN_ENABLE, mtk_r32(mpcs, RG_PCS_AN_CTRL0)))
+ mtk_usxgmii_pcs_get_an_state(mpcs, state);
+ else
+ mtk_usxgmii_pcs_get_fixed_speed(mpcs, state);
+}
+
+static void mtk_usxgmii_pcs_restart_an(struct phylink_pcs *pcs)
+{
+ struct mtk_usxgmii_pcs *mpcs = pcs_to_mtk_usxgmii_pcs(pcs);
+
+ mtk_m32(mpcs, RG_PCS_AN_CTRL0, USXGMII_AN_RESTART, USXGMII_AN_RESTART);
+}
+
+static void mtk_usxgmii_pcs_link_up(struct phylink_pcs *pcs, unsigned int neg_mode,
+ phy_interface_t interface,
+ int speed, int duplex)
+{
+ /* Reconfiguring USXGMII to ensure the quality of the RX signal
+ * after the line side link up.
+ */
+ mtk_usxgmii_pcs_config(pcs, neg_mode, interface, NULL, false);
+}
+
+static void mtk_usxgmii_pcs_disable(struct phylink_pcs *pcs)
+{
+ struct mtk_usxgmii_pcs *mpcs = pcs_to_mtk_usxgmii_pcs(pcs);
+
+ mpcs->interface = PHY_INTERFACE_MODE_NA;
+ mpcs->neg_mode = -1;
+}
+
+static const struct phylink_pcs_ops mtk_usxgmii_pcs_ops = {
+ .pcs_config = mtk_usxgmii_pcs_config,
+ .pcs_get_state = mtk_usxgmii_pcs_get_state,
+ .pcs_an_restart = mtk_usxgmii_pcs_restart_an,
+ .pcs_link_up = mtk_usxgmii_pcs_link_up,
+ .pcs_disable = mtk_usxgmii_pcs_disable,
+};
+
+static int mtk_usxgmii_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ struct mtk_usxgmii_pcs *mpcs;
+ int ret;
+
+ mpcs = devm_kzalloc(dev, sizeof(*mpcs), GFP_KERNEL);
+ if (!mpcs)
+ return -ENOMEM;
+
+ mpcs->base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(mpcs->base))
+ return PTR_ERR(mpcs->base);
+
+ mpcs->dev = dev;
+ mpcs->pcs.ops = &mtk_usxgmii_pcs_ops;
+ mpcs->pcs.poll = true;
+ mpcs->pcs.neg_mode = true;
+ mpcs->interface = PHY_INTERFACE_MODE_NA;
+ mpcs->neg_mode = -1;
+
+ mpcs->clk = devm_clk_get_enabled(mpcs->dev, "usxgmii");
+ if (IS_ERR(mpcs->clk))
+ return PTR_ERR(mpcs->clk);
+
+ mpcs->xfi_pll = devm_clk_get_enabled(mpcs->dev, "xfi_pll");
+ if (IS_ERR(mpcs->xfi_pll))
+ return PTR_ERR(mpcs->xfi_pll);
+
+ mpcs->reset = devm_reset_control_get_shared(dev, "xfi");
+ if (IS_ERR(mpcs->reset))
+ return PTR_ERR(mpcs->reset);
+
+ reset_control_deassert(mpcs->reset);
+
+ ret = mtk_xfi_pextp_init(mpcs);
+ if (ret)
+ return ret;
+
+ ret = mtk_sgmii_wrapper_init(mpcs);
+ if (ret)
+ return ret;
+
+ platform_set_drvdata(pdev, mpcs);
+
+ return 0;
+}
+
+static int mtk_usxgmii_remove(struct platform_device *pdev)
+{
+ struct mtk_usxgmii_pcs *mpcs = platform_get_drvdata(pdev);
+
+ mtk_sgmii_wrapper_destroy(mpcs);
+ phy_power_off(mpcs->pextp);
+
+ return 0;
+}
+
+static const struct of_device_id mtk_usxgmii_of_mtable[] = {
+ { .compatible = "mediatek,mt7988-usxgmiisys" },
+ { /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, mtk_usxgmii_of_mtable);
+
+struct phylink_pcs *mtk_usxgmii_select_pcs(struct device_node *np, phy_interface_t mode)
+{
+ struct platform_device *pdev;
+ struct mtk_usxgmii_pcs *mpcs;
+
+ if (!np)
+ return NULL;
+
+ if (!of_device_is_available(np))
+ return ERR_PTR(-ENODEV);
+
+ if (!of_match_node(mtk_usxgmii_of_mtable, np))
+ return ERR_PTR(-EINVAL);
+
+ pdev = of_find_device_by_node(np);
+ if (!pdev || !platform_get_drvdata(pdev)) {
+ if (pdev)
+ put_device(&pdev->dev);
+ return ERR_PTR(-EPROBE_DEFER);
+ }
+
+ mpcs = platform_get_drvdata(pdev);
+ put_device(&pdev->dev);
+
+ switch (mode) {
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_2500BASEX:
+ case PHY_INTERFACE_MODE_SGMII:
+ return mpcs->sgmii_pcs;
+ case PHY_INTERFACE_MODE_5GBASER:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_USXGMII:
+ return &mpcs->pcs;
+ default:
+ return NULL;
+ }
+}
+EXPORT_SYMBOL(mtk_usxgmii_select_pcs);
+
+static struct platform_driver mtk_usxgmii_driver = {
+ .driver = {
+ .name = "mtk_usxgmii",
+ .suppress_bind_attrs = true,
+ .of_match_table = mtk_usxgmii_of_mtable,
+ },
+ .probe = mtk_usxgmii_probe,
+ .remove = mtk_usxgmii_remove,
+};
+module_platform_driver(mtk_usxgmii_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("MediaTek USXGMII PCS driver");
+MODULE_AUTHOR("Daniel Golle <daniel@makrotopia.org>");
diff --git a/include/linux/pcs/pcs-mtk-usxgmii.h b/include/linux/pcs/pcs-mtk-usxgmii.h
new file mode 100644
index 0000000000000..7a3c49760ffa6
--- /dev/null
+++ b/include/linux/pcs/pcs-mtk-usxgmii.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __LINUX_PCS_MTK_USXGMII_H
+#define __LINUX_PCS_MTK_USXGMII_H
+
+#include <linux/phylink.h>
+
+/**
+ * mtk_usxgmii_select_pcs
+ * Return PCS identified by a device node and the PHY interface mode in use
+ *
+ * @param np Pointer to device node identifying a MediaTek USXGMII PCS
+ * @param mode Ethernet PHY interface mode
+ *
+ * @return Pointer to phylink PCS instance or NULL
+ */
+struct phylink_pcs *mtk_usxgmii_select_pcs(struct device_node *np, phy_interface_t mode);
+
+#endif /* __LINUX_PCS_MTK_USXGMII_H */
--
2.42.1
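A note for readers of mtk_usxgmii_pcs_get_fixed_speed() in the patch above:
the reported speed is the XFI lane speed divided by the rate-adaptation
factor. The standalone sketch below restates that arithmetic outside the
kernel. The names xfi_mode and usxgmii_fixed_speed() and the plain integer
divisor are illustrative assumptions; the driver decodes both fields from
RG_PHY_TOP_SPEED_CTRL1 instead.

```c
/* Hypothetical stand-in for the USXGMII_XFI_MODE_* field values. */
enum xfi_mode { XFI_MODE_10G, XFI_MODE_5G, XFI_MODE_2P5G };

/*
 * Return the link speed in Mbit/s for a given lane mode and rate-adapt
 * divisor, or -1 when either value is not one the driver handles.
 */
int usxgmii_fixed_speed(enum xfi_mode mode, unsigned int divisor)
{
	int base;

	switch (mode) {
	case XFI_MODE_10G:
		base = 10000;
		break;
	case XFI_MODE_5G:
		base = 5000;
		break;
	case XFI_MODE_2P5G:
		base = 2500;
		break;
	default:
		return -1;
	}

	/* Only the X1, X2, X4, X5, X10, X50 and X100 modes appear in the patch. */
	switch (divisor) {
	case 1:
	case 2:
	case 4:
	case 5:
	case 10:
	case 50:
	case 100:
		return base / divisor;
	default:
		return -1;
	}
}
```

With these assumptions, a 10G lane with a divisor of 10 yields 1000 Mbit/s,
and an unlisted divisor yields -1, which corresponds to the SPEED_UNKNOWN
path in the driver.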
* [RFC PATCH 7/8] dt-bindings: net: mediatek,net: fix and complete mt7988-eth binding
From: Daniel Golle @ 2023-11-09 21:52 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
Remove clocks which were copied from the vendor driver but are now taken
care of by dedicated drivers for PCS and PHY in the upstream driver.
Also remove mediatek,sgmiisys phandle which isn't required on MT7988
because we use pcs-handle on the MAC nodes instead.
Last but not least, add an example for MT7988.
Fixes: c94a9aabec36 ("dt-bindings: net: mediatek,net: add mt7988-eth binding")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
.../devicetree/bindings/net/mediatek,net.yaml | 171 +++++++++++++++---
1 file changed, 142 insertions(+), 29 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/mediatek,net.yaml b/Documentation/devicetree/bindings/net/mediatek,net.yaml
index e74502a0afe86..c0f7bb6f3ef8d 100644
--- a/Documentation/devicetree/bindings/net/mediatek,net.yaml
+++ b/Documentation/devicetree/bindings/net/mediatek,net.yaml
@@ -27,9 +27,6 @@ properties:
- mediatek,mt7988-eth
- ralink,rt5350-eth
- reg:
- maxItems: 1
-
clocks: true
clock-names: true
@@ -115,6 +112,9 @@ allOf:
- mediatek,mt7623-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
maxItems: 3
@@ -149,6 +149,9 @@ allOf:
- mediatek,mt7621-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
maxItems: 1
@@ -174,6 +177,9 @@ allOf:
const: mediatek,mt7622-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
maxItems: 3
@@ -215,6 +221,9 @@ allOf:
const: mediatek,mt7629-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
maxItems: 3
@@ -257,6 +266,9 @@ allOf:
const: mediatek,mt7981-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
minItems: 4
@@ -295,6 +307,9 @@ allOf:
const: mediatek,mt7986-eth
then:
properties:
+ reg:
+ maxItems: 1
+
interrupts:
minItems: 4
@@ -333,36 +348,32 @@ allOf:
const: mediatek,mt7988-eth
then:
properties:
+ reg:
+ maxItems: 2
+ minItems: 2
+
interrupts:
minItems: 4
+ maxItems: 4
clocks:
- minItems: 34
- maxItems: 34
+ minItems: 24
+ maxItems: 24
clock-names:
items:
- - const: crypto
+ - const: xgp1
+ - const: xgp2
+ - const: xgp3
- const: fe
- const: gp2
- const: gp1
- const: gp3
+ - const: esw
+ - const: crypto
- const: ethwarp_wocpu2
- const: ethwarp_wocpu1
- const: ethwarp_wocpu0
- - const: esw
- - const: netsys0
- - const: netsys1
- - const: sgmii_tx250m
- - const: sgmii_rx250m
- - const: sgmii2_tx250m
- - const: sgmii2_rx250m
- - const: top_usxgmii0_sel
- - const: top_usxgmii1_sel
- - const: top_sgm0_sel
- - const: top_sgm1_sel
- - const: top_xfi_phy0_xtal_sel
- - const: top_xfi_phy1_xtal_sel
- const: top_eth_gmii_sel
- const: top_eth_refck_50m_sel
- const: top_eth_sys_200m_sel
@@ -375,18 +386,9 @@ allOf:
- const: top_netsys_sync_250m_sel
- const: top_netsys_ppefb_250m_sel
- const: top_netsys_warp_sel
- - const: wocpu1
- - const: wocpu0
- - const: xgp1
- - const: xgp2
- - const: xgp3
-
- mediatek,sgmiisys:
- minItems: 2
- maxItems: 2
patternProperties:
- "^mac@[0-1]$":
+ "^mac@[0-2]$":
type: object
unevaluatedProperties: false
allOf:
@@ -577,3 +579,114 @@ examples:
};
};
};
+
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/clock/mediatek,mt7988-clk.h>
+
+ soc {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ ethernet@15100000 {
+ compatible = "mediatek,mt7988-eth";
+ reg = <0 0x15100000 0 0x80000>, <0 0x15400000 0 0x380000>;
+ interrupts = <GIC_SPI 196 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 197 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 199 IRQ_TYPE_LEVEL_HIGH>;
+
+ clocks = <&ethsys CLK_ETHDMA_XGP1_EN>,
+ <&ethsys CLK_ETHDMA_XGP2_EN>,
+ <&ethsys CLK_ETHDMA_XGP3_EN>,
+ <&ethsys CLK_ETHDMA_FE_EN>,
+ <&ethsys CLK_ETHDMA_GP2_EN>,
+ <&ethsys CLK_ETHDMA_GP1_EN>,
+ <&ethsys CLK_ETHDMA_GP3_EN>,
+ <&ethsys CLK_ETHDMA_ESW_EN>,
+ <&ethsys CLK_ETHDMA_CRYPT0_EN>,
+ <&ethwarp CLK_ETHWARP_WOCPU2_EN>,
+ <&ethwarp CLK_ETHWARP_WOCPU1_EN>,
+ <&ethwarp CLK_ETHWARP_WOCPU0_EN>,
+ <&topckgen CLK_TOP_ETH_GMII_SEL>,
+ <&topckgen CLK_TOP_ETH_REFCK_50M_SEL>,
+ <&topckgen CLK_TOP_ETH_SYS_200M_SEL>,
+ <&topckgen CLK_TOP_ETH_SYS_SEL>,
+ <&topckgen CLK_TOP_ETH_XGMII_SEL>,
+ <&topckgen CLK_TOP_ETH_MII_SEL>,
+ <&topckgen CLK_TOP_NETSYS_SEL>,
+ <&topckgen CLK_TOP_NETSYS_500M_SEL>,
+ <&topckgen CLK_TOP_NETSYS_PAO_2X_SEL>,
+ <&topckgen CLK_TOP_NETSYS_SYNC_250M_SEL>,
+ <&topckgen CLK_TOP_NETSYS_PPEFB_250M_SEL>,
+ <&topckgen CLK_TOP_NETSYS_WARP_SEL>;
+
+ clock-names = "xgp1", "xgp2", "xgp3", "fe", "gp2", "gp1",
+ "gp3", "esw", "crypto",
+ "ethwarp_wocpu2", "ethwarp_wocpu1",
+ "ethwarp_wocpu0", "top_eth_gmii_sel",
+ "top_eth_refck_50m_sel", "top_eth_sys_200m_sel",
+ "top_eth_sys_sel", "top_eth_xgmii_sel",
+ "top_eth_mii_sel", "top_netsys_sel",
+ "top_netsys_500m_sel", "top_netsys_pao_2x_sel",
+ "top_netsys_sync_250m_sel",
+ "top_netsys_ppefb_250m_sel",
+ "top_netsys_warp_sel";
+ assigned-clocks = <&topckgen CLK_TOP_NETSYS_2X_SEL>,
+ <&topckgen CLK_TOP_NETSYS_GSW_SEL>,
+ <&topckgen CLK_TOP_USXGMII_SBUS_0_SEL>,
+ <&topckgen CLK_TOP_USXGMII_SBUS_1_SEL>,
+ <&topckgen CLK_TOP_SGM_0_SEL>,
+ <&topckgen CLK_TOP_SGM_1_SEL>;
+ assigned-clock-parents = <&apmixedsys CLK_APMIXED_NET2PLL>,
+ <&topckgen CLK_TOP_NET1PLL_D4>,
+ <&topckgen CLK_TOP_NET1PLL_D8_D4>,
+ <&topckgen CLK_TOP_NET1PLL_D8_D4>,
+ <&apmixedsys CLK_APMIXED_SGMPLL>,
+ <&apmixedsys CLK_APMIXED_SGMPLL>;
+ mediatek,ethsys = <&ethsys>;
+ mediatek,infracfg = <&topmisc>;
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ mac@0 {
+ compatible = "mediatek,eth-mac";
+ reg = <0>;
+ phy-mode = "internal";
+ status = "disabled";
+
+ fixed-link {
+ speed = <10000>;
+ full-duplex;
+ pause;
+ };
+ };
+
+ mac@1 {
+ compatible = "mediatek,eth-mac";
+ reg = <1>;
+ status = "disabled";
+ pcs-handle = <&usxgmiisys1>;
+ };
+
+ mac@2 {
+ compatible = "mediatek,eth-mac";
+ reg = <2>;
+ status = "disabled";
+ pcs-handle = <&usxgmiisys0>;
+ };
+
+ mdio_bus: mdio-bus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ /* internal 2.5G PHY */
+ int_2p5g_phy: ethernet-phy@15 {
+ reg = <15>;
+ compatible = "ethernet-phy-ieee802.3-c45";
+ phy-mode = "internal";
+ };
+ };
+ };
+ };
--
2.42.1
* [RFC PATCH 8/8] net: ethernet: mtk_eth_soc: add paths and SerDes modes for MT7988
From: Daniel Golle @ 2023-11-09 21:52 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Andrew Lunn, Heiner Kallweit,
Russell King, Alexander Couzens, Daniel Golle, Philipp Zabel,
netdev, devicetree, linux-kernel, linux-arm-kernel,
linux-mediatek, linux-phy
In-Reply-To: <cover.1699565880.git.daniel@makrotopia.org>
MT7988 comes with a built-in 2.5G PHY as well as SerDes lanes to
connect external PHYs or transceivers in USXGMII, 10GBase-R, 5GBase-R,
2500Base-X, 1000Base-X and Cisco SGMII interface modes.
Implement support for configuring the new paths to the SerDes interfaces
and the internal 2.5G PHY.
Add USXGMII PCS driver for 10GBase-R, 5GBase-R and USXGMII modes, and
set up the new PHYA on MT7988 to access the old LynxI PCS, which is
still present, for 1000Base-X, 2500Base-X and Cisco SGMII interface
modes.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
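The split between the two PCS described in the commit message can be
sketched as a simple dispatch on the interface mode. This is a standalone
illustration, not driver code: enum iface_mode, enum pcs_kind and
select_pcs() are hypothetical stand-ins for phy_interface_t and for the
phylink_pcs pointers returned by mtk_usxgmii_select_pcs().

```c
/* Hypothetical subset of interface modes, for illustration only. */
enum iface_mode {
	MODE_SGMII,
	MODE_1000BASEX,
	MODE_2500BASEX,
	MODE_5GBASER,
	MODE_10GBASER,
	MODE_USXGMII,
	MODE_RGMII,
};

/* Which PCS block would serve a given mode in this sketch. */
enum pcs_kind { PCS_NONE, PCS_LYNXI, PCS_USXGMII };

enum pcs_kind select_pcs(enum iface_mode mode)
{
	switch (mode) {
	case MODE_SGMII:
	case MODE_1000BASEX:
	case MODE_2500BASEX:
		/* Lower-speed SerDes modes go to the wrapped LynxI PCS. */
		return PCS_LYNXI;
	case MODE_5GBASER:
	case MODE_10GBASER:
	case MODE_USXGMII:
		/* 5G/10G modes go to the USXGMII PCS. */
		return PCS_USXGMII;
	default:
		/* Anything else needs no PCS from this device. */
		return PCS_NONE;
	}
}
```

Under these assumptions, select_pcs(MODE_2500BASEX) picks the LynxI PCS and
select_pcs(MODE_10GBASER) picks the USXGMII PCS, while a non-SerDes mode
such as MODE_RGMII gets none.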
drivers/net/ethernet/mediatek/Kconfig | 17 ++
drivers/net/ethernet/mediatek/mtk_eth_path.c | 122 ++++++++++++-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 178 ++++++++++++++++---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 105 +++++++++--
4 files changed, 379 insertions(+), 43 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index da0db417ab690..b63723b8d1d2c 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -21,10 +21,27 @@ config NET_MEDIATEK_SOC
select PAGE_POOL_STATS
select PCS_MTK_LYNXI
select REGMAP_MMIO
+ select PCS_MTK_USXGMII if NET_MEDIATEK_SOC_USXGMII
help
This driver supports the gigabit ethernet MACs in the
MediaTek SoC family.
+config NET_MEDIATEK_SOC_USXGMII
+ bool "Support USXGMII SerDes on MT7988"
+ depends on (ARCH_MEDIATEK && ARM64) || COMPILE_TEST
+ def_bool NET_MEDIATEK_SOC != n
+ help
+ Include support for 10GE SerDes which can be found on MT7988.
+ If this kernel should run on SoCs with 10 GBit/s Ethernet you
+ will need to select this option to use GMAC2 and GMAC3 with
+ external PHYs or SFP(+) cages in 10GBase-R, 5GBase-R or USXGMII
+ interface modes.
+
+ Note that as the 2500Base-X/1000Base-X/Cisco SGMII SerDes PCS
+ unit (MediaTek LynxI) in MT7988 is connected via the new 10GE
+ SerDes, you will also need to select this option in case you
+ want to use any of those SerDes modes.
+
config NET_MEDIATEK_STAR_EMAC
tristate "MediaTek STAR Ethernet MAC support"
select PHYLIB
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_path.c b/drivers/net/ethernet/mediatek/mtk_eth_path.c
index 7c27a19c4d8f4..3f4f4cfe6a233 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_path.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_path.c
@@ -31,10 +31,20 @@ static const char *mtk_eth_path_name(u64 path)
return "gmac2_rgmii";
case MTK_ETH_PATH_GMAC2_SGMII:
return "gmac2_sgmii";
+ case MTK_ETH_PATH_GMAC2_2P5GPHY:
+ return "gmac2_2p5gphy";
case MTK_ETH_PATH_GMAC2_GEPHY:
return "gmac2_gephy";
+ case MTK_ETH_PATH_GMAC3_SGMII:
+ return "gmac3_sgmii";
case MTK_ETH_PATH_GDM1_ESW:
return "gdm1_esw";
+ case MTK_ETH_PATH_GMAC1_USXGMII:
+ return "gmac1_usxgmii";
+ case MTK_ETH_PATH_GMAC2_USXGMII:
+ return "gmac2_usxgmii";
+ case MTK_ETH_PATH_GMAC3_USXGMII:
+ return "gmac3_usxgmii";
default:
return "unknown path";
}
@@ -127,6 +137,27 @@ static int set_mux_u3_gmac2_to_qphy(struct mtk_eth *eth, u64 path)
return 0;
}
+static int set_mux_gmac2_to_2p5gphy(struct mtk_eth *eth, u64 path)
+{
+ int ret;
+
+ if (path == MTK_ETH_PATH_GMAC2_2P5GPHY) {
+ ret = regmap_clear_bits(eth->ethsys, ETHSYS_SYSCFG0, SYSCFG0_SGMII_GMAC2_V2);
+ if (ret)
+ return ret;
+
+ /* Setup mux to 2p5g PHY */
+ ret = regmap_clear_bits(eth->infra, TOP_MISC_NETSYS_PCS_MUX, MUX_G2_USXGMII_SEL);
+ if (ret)
+ return ret;
+
+ dev_dbg(eth->dev, "path %s in %s updated\n",
+ mtk_eth_path_name(path), __func__);
+ }
+
+ return 0;
+}
+
static int set_mux_gmac1_gmac2_to_sgmii_rgmii(struct mtk_eth *eth, u64 path)
{
unsigned int val = 0;
@@ -165,7 +196,48 @@ static int set_mux_gmac1_gmac2_to_sgmii_rgmii(struct mtk_eth *eth, u64 path)
return 0;
}
-static int set_mux_gmac12_to_gephy_sgmii(struct mtk_eth *eth, u64 path)
+static int set_mux_gmac123_to_usxgmii(struct mtk_eth *eth, u64 path)
+{
+ unsigned int val = 0;
+ bool updated = true;
+ int mac_id = 0;
+
+ /* Disable SYSCFG1 SGMII */
+ regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+
+ switch (path) {
+ case MTK_ETH_PATH_GMAC1_USXGMII:
+ val &= ~(u32)SYSCFG0_SGMII_GMAC1_V2;
+ mac_id = MTK_GMAC1_ID;
+ break;
+ case MTK_ETH_PATH_GMAC2_USXGMII:
+ val &= ~(u32)SYSCFG0_SGMII_GMAC2_V2;
+ mac_id = MTK_GMAC2_ID;
+ break;
+ case MTK_ETH_PATH_GMAC3_USXGMII:
+ val &= ~(u32)SYSCFG0_SGMII_GMAC3_V2;
+ mac_id = MTK_GMAC3_ID;
+ break;
+ default:
+ updated = false;
+ }
+
+ if (updated) {
+ regmap_update_bits(eth->ethsys, ETHSYS_SYSCFG0,
+ SYSCFG0_SGMII_MASK, val);
+
+ if (mac_id == MTK_GMAC2_ID)
+ regmap_set_bits(eth->infra, TOP_MISC_NETSYS_PCS_MUX,
+ MUX_G2_USXGMII_SEL);
+ }
+
+ dev_dbg(eth->dev, "path %s in %s updated = %d\n",
+ mtk_eth_path_name(path), __func__, updated);
+
+ return 0;
+}
+
+static int set_mux_gmac123_to_gephy_sgmii(struct mtk_eth *eth, u64 path)
{
unsigned int val = 0;
bool updated = true;
@@ -182,6 +254,9 @@ static int set_mux_gmac12_to_gephy_sgmii(struct mtk_eth *eth, u64 path)
case MTK_ETH_PATH_GMAC2_SGMII:
val |= SYSCFG0_SGMII_GMAC2_V2;
break;
+ case MTK_ETH_PATH_GMAC3_SGMII:
+ val |= SYSCFG0_SGMII_GMAC3_V2;
+ break;
default:
updated = false;
}
@@ -209,6 +284,10 @@ static const struct mtk_eth_muxc mtk_eth_muxc[] = {
.name = "mux_u3_gmac2_to_qphy",
.cap_bit = MTK_ETH_MUX_U3_GMAC2_TO_QPHY,
.set_path = set_mux_u3_gmac2_to_qphy,
+ }, {
+ .name = "mux_gmac2_to_2p5gphy",
+ .cap_bit = MTK_ETH_MUX_GMAC2_TO_2P5GPHY,
+ .set_path = set_mux_gmac2_to_2p5gphy,
}, {
.name = "mux_gmac1_gmac2_to_sgmii_rgmii",
.cap_bit = MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII,
@@ -216,7 +295,15 @@ static const struct mtk_eth_muxc mtk_eth_muxc[] = {
}, {
.name = "mux_gmac12_to_gephy_sgmii",
.cap_bit = MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII,
- .set_path = set_mux_gmac12_to_gephy_sgmii,
+ .set_path = set_mux_gmac123_to_gephy_sgmii,
+ }, {
+ .name = "mux_gmac123_to_gephy_sgmii",
+ .cap_bit = MTK_ETH_MUX_GMAC123_TO_GEPHY_SGMII,
+ .set_path = set_mux_gmac123_to_gephy_sgmii,
+ }, {
+ .name = "mux_gmac123_to_usxgmii",
+ .cap_bit = MTK_ETH_MUX_GMAC123_TO_USXGMII,
+ .set_path = set_mux_gmac123_to_usxgmii,
},
};
@@ -249,12 +336,39 @@ static int mtk_eth_mux_setup(struct mtk_eth *eth, u64 path)
return err;
}
+int mtk_gmac_usxgmii_path_setup(struct mtk_eth *eth, int mac_id)
+{
+ u64 path;
+
+ path = (mac_id == MTK_GMAC1_ID) ? MTK_ETH_PATH_GMAC1_USXGMII :
+ (mac_id == MTK_GMAC2_ID) ? MTK_ETH_PATH_GMAC2_USXGMII :
+ MTK_ETH_PATH_GMAC3_USXGMII;
+
+ /* Setup proper MUXes along the path */
+ return mtk_eth_mux_setup(eth, path);
+}
+
int mtk_gmac_sgmii_path_setup(struct mtk_eth *eth, int mac_id)
{
u64 path;
- path = (mac_id == 0) ? MTK_ETH_PATH_GMAC1_SGMII :
- MTK_ETH_PATH_GMAC2_SGMII;
+ path = (mac_id == MTK_GMAC1_ID) ? MTK_ETH_PATH_GMAC1_SGMII :
+ (mac_id == MTK_GMAC2_ID) ? MTK_ETH_PATH_GMAC2_SGMII :
+ MTK_ETH_PATH_GMAC3_SGMII;
+
+ /* Setup proper MUXes along the path */
+ return mtk_eth_mux_setup(eth, path);
+}
+
+int mtk_gmac_2p5gphy_path_setup(struct mtk_eth *eth, int mac_id)
+{
+ u64 path = 0;
+
+ if (mac_id == MTK_GMAC2_ID)
+ path = MTK_ETH_PATH_GMAC2_2P5GPHY;
+
+ if (!path)
+ return -EINVAL;
/* Setup proper MUXes along the path */
return mtk_eth_mux_setup(eth, path);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 3cf6589cfdacf..a550cf7ab0d91 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -22,6 +22,7 @@
#include <linux/pinctrl/devinfo.h>
#include <linux/phylink.h>
#include <linux/pcs/pcs-mtk-lynxi.h>
+#include <linux/pcs/pcs-mtk-usxgmii.h>
#include <linux/jhash.h>
#include <linux/bitfield.h>
#include <net/dsa.h>
@@ -260,12 +261,8 @@ static const char * const mtk_clks_source_name[] = {
"ethwarp_wocpu2",
"ethwarp_wocpu1",
"ethwarp_wocpu0",
- "top_usxgmii0_sel",
- "top_usxgmii1_sel",
"top_sgm0_sel",
"top_sgm1_sel",
- "top_xfi_phy0_xtal_sel",
- "top_xfi_phy1_xtal_sel",
"top_eth_gmii_sel",
"top_eth_refck_50m_sel",
"top_eth_sys_200m_sel",
@@ -508,6 +505,30 @@ static void mtk_setup_bridge_switch(struct mtk_eth *eth)
MTK_GSW_CFG);
}
+static bool mtk_check_gmac23_idle(struct mtk_mac *mac)
+{
+ u32 mac_fsm, gdm_fsm;
+
+ mac_fsm = mtk_r32(mac->hw, MTK_MAC_FSM(mac->id));
+
+ switch (mac->id) {
+ case MTK_GMAC2_ID:
+ gdm_fsm = mtk_r32(mac->hw, MTK_FE_GDM2_FSM);
+ break;
+ case MTK_GMAC3_ID:
+ gdm_fsm = mtk_r32(mac->hw, MTK_FE_GDM3_FSM);
+ break;
+ default:
+ return true;
+ }
+
+ if ((mac_fsm & 0xFFFF0000) == 0x01010000 &&
+ (gdm_fsm & 0xFFFF0000) == 0x00000000)
+ return true;
+
+ return false;
+}
+
static struct phylink_pcs *mtk_mac_select_pcs(struct phylink_config *config,
phy_interface_t interface)
{
@@ -516,6 +537,14 @@ static struct phylink_pcs *mtk_mac_select_pcs(struct phylink_config *config,
struct mtk_eth *eth = mac->hw;
unsigned int sid;
+ if (mtk_is_netsys_v3_or_greater(eth)) {
+#if IS_ENABLED(CONFIG_NET_MEDIATEK_SOC_USXGMII)
+ return mtk_usxgmii_select_pcs(mac->pcs_of_node, interface);
+#else
+ return NULL;
+#endif
+ }
+
if (interface == PHY_INTERFACE_MODE_SGMII ||
phy_interface_mode_is_8023z(interface)) {
sid = (MTK_HAS_CAPS(eth->soc->caps, MTK_SHARED_SGMII)) ?
@@ -567,7 +596,22 @@ static void mtk_mac_config(struct phylink_config *config, unsigned int mode,
goto init_err;
}
break;
+ case PHY_INTERFACE_MODE_USXGMII:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_5GBASER:
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_USXGMII)) {
+ err = mtk_gmac_usxgmii_path_setup(eth, mac->id);
+ if (err)
+ goto init_err;
+ }
+ break;
case PHY_INTERFACE_MODE_INTERNAL:
+ if (mac->id == MTK_GMAC2_ID &&
+ MTK_HAS_CAPS(eth->soc->caps, MTK_2P5GPHY)) {
+ err = mtk_gmac_2p5gphy_path_setup(eth, mac->id);
+ if (err)
+ goto init_err;
+ }
break;
default:
goto err_phy;
@@ -614,8 +658,6 @@ static void mtk_mac_config(struct phylink_config *config, unsigned int mode,
val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
val |= SYSCFG0_GE_MODE(ge_mode, mac->id);
regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
-
- mac->interface = state->interface;
}
/* SGMII */
@@ -632,21 +674,40 @@ static void mtk_mac_config(struct phylink_config *config, unsigned int mode,
/* Save the syscfg0 value for mac_finish */
mac->syscfg0 = val;
- } else if (phylink_autoneg_inband(mode)) {
+ } else if (state->interface != PHY_INTERFACE_MODE_USXGMII &&
+ state->interface != PHY_INTERFACE_MODE_10GBASER &&
+ state->interface != PHY_INTERFACE_MODE_5GBASER &&
+ phylink_autoneg_inband(mode)) {
dev_err(eth->dev,
- "In-band mode not supported in non SGMII mode!\n");
+ "In-band mode not supported in non-SerDes modes!\n");
return;
}
/* Setup gmac */
- if (mtk_is_netsys_v3_or_greater(eth) &&
- mac->interface == PHY_INTERFACE_MODE_INTERNAL) {
- mtk_w32(mac->hw, MTK_GDMA_XGDM_SEL, MTK_GDMA_EG_CTRL(mac->id));
- mtk_w32(mac->hw, MAC_MCR_FORCE_LINK_DOWN, MTK_MAC_MCR(mac->id));
+ if (mtk_is_netsys_v3_or_greater(eth)) {
+ if (mtk_interface_mode_is_xgmii(state->interface)) {
+ mtk_w32(mac->hw, MTK_GDMA_XGDM_SEL, MTK_GDMA_EG_CTRL(mac->id));
+ mtk_w32(mac->hw, MAC_MCR_FORCE_LINK_DOWN, MTK_MAC_MCR(mac->id));
- mtk_setup_bridge_switch(eth);
+ if (mac->id == MTK_GMAC1_ID)
+ mtk_setup_bridge_switch(eth);
+ } else {
+ mtk_w32(eth, 0, MTK_GDMA_EG_CTRL(mac->id));
+
+ /* FIXME: In current hardware design, we have to reset FE
+ * when switching XGDM to GDM. Therefore, here trigger an SER
+ * to let GDM go back to the initial state.
+ */
+ if ((mtk_interface_mode_is_xgmii(mac->interface) ||
+ mac->interface == PHY_INTERFACE_MODE_NA) &&
+ !mtk_check_gmac23_idle(mac) &&
+ !test_bit(MTK_RESETTING, &eth->state))
+ schedule_work(&eth->pending_work);
+ }
}
+ mac->interface = state->interface;
+
return;
err_phy:
@@ -692,10 +753,13 @@ static void mtk_mac_link_down(struct phylink_config *config, unsigned int mode,
{
struct mtk_mac *mac = container_of(config, struct mtk_mac,
phylink_config);
- u32 mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
- mcr &= ~(MAC_MCR_TX_EN | MAC_MCR_RX_EN);
- mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
+ if (!mtk_interface_mode_is_xgmii(interface)) {
+ mtk_m32(mac->hw, MAC_MCR_TX_EN | MAC_MCR_RX_EN, 0, MTK_MAC_MCR(mac->id));
+ mtk_m32(mac->hw, MTK_XGMAC_FORCE_LINK(mac->id), 0, MTK_XGMAC_STS(mac->id));
+ } else if (mac->id != MTK_GMAC1_ID) {
+ mtk_m32(mac->hw, XMAC_MCR_TRX_DISABLE, XMAC_MCR_TRX_DISABLE, MTK_XMAC_MCR(mac->id));
+ }
}
static void mtk_set_queue_speed(struct mtk_eth *eth, unsigned int idx,
@@ -767,13 +831,11 @@ static void mtk_set_queue_speed(struct mtk_eth *eth, unsigned int idx,
mtk_w32(eth, val, soc->reg_map->qdma.qtx_sch + ofs);
}
-static void mtk_mac_link_up(struct phylink_config *config,
- struct phy_device *phy,
- unsigned int mode, phy_interface_t interface,
- int speed, int duplex, bool tx_pause, bool rx_pause)
+static void mtk_gdm_mac_link_up(struct mtk_mac *mac,
+ struct phy_device *phy,
+ unsigned int mode, phy_interface_t interface,
+ int speed, int duplex, bool tx_pause, bool rx_pause)
{
- struct mtk_mac *mac = container_of(config, struct mtk_mac,
- phylink_config);
u32 mcr;
mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
@@ -807,6 +869,55 @@ static void mtk_mac_link_up(struct phylink_config *config,
mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
}
+static void mtk_xgdm_mac_link_up(struct mtk_mac *mac,
+ struct phy_device *phy,
+ unsigned int mode, phy_interface_t interface,
+ int speed, int duplex, bool tx_pause, bool rx_pause)
+{
+ u32 mcr, force_link = 0;
+
+ if (mac->id == MTK_GMAC1_ID)
+ return;
+
+ /* Eliminate the interference(before link-up) caused by PHY noise */
+ mtk_m32(mac->hw, XMAC_LOGIC_RST, 0, MTK_XMAC_LOGIC_RST(mac->id));
+ mdelay(20);
+ mtk_m32(mac->hw, XMAC_GLB_CNTCLR, XMAC_GLB_CNTCLR, MTK_XMAC_CNT_CTRL(mac->id));
+
+ if (mac->interface == PHY_INTERFACE_MODE_INTERNAL || mac->id == MTK_GMAC3_ID)
+ force_link = MTK_XGMAC_FORCE_LINK(mac->id);
+
+ mtk_m32(mac->hw, MTK_XGMAC_FORCE_LINK(mac->id), force_link, MTK_XGMAC_STS(mac->id));
+
+ mcr = mtk_r32(mac->hw, MTK_XMAC_MCR(mac->id));
+ mcr &= ~(XMAC_MCR_FORCE_TX_FC | XMAC_MCR_FORCE_RX_FC | XMAC_MCR_TRX_DISABLE);
+ /* Configure pause modes -
+ * phylink will avoid these for half duplex
+ */
+ if (tx_pause)
+ mcr |= XMAC_MCR_FORCE_TX_FC;
+ if (rx_pause)
+ mcr |= XMAC_MCR_FORCE_RX_FC;
+
+ mtk_w32(mac->hw, mcr, MTK_XMAC_MCR(mac->id));
+}
+
+static void mtk_mac_link_up(struct phylink_config *config,
+ struct phy_device *phy,
+ unsigned int mode, phy_interface_t interface,
+ int speed, int duplex, bool tx_pause, bool rx_pause)
+{
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
+
+ if (mtk_interface_mode_is_xgmii(interface))
+ mtk_xgdm_mac_link_up(mac, phy, mode, interface, speed, duplex,
+ tx_pause, rx_pause);
+ else
+ mtk_gdm_mac_link_up(mac, phy, mode, interface, speed, duplex,
+ tx_pause, rx_pause);
+}
+
static const struct phylink_mac_ops mtk_phylink_ops = {
.mac_select_pcs = mtk_mac_select_pcs,
.mac_config = mtk_mac_config,
@@ -4484,6 +4595,7 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
const __be32 *_id = of_get_property(np, "reg", NULL);
phy_interface_t phy_mode;
struct phylink *phylink;
+ struct phylink_pcs *pcs;
struct mtk_mac *mac;
int id, err;
int txqs = 1;
@@ -4518,6 +4630,12 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
mac->id = id;
mac->hw = eth;
mac->of_node = np;
+ mac->pcs_of_node = of_parse_phandle(mac->of_node, "pcs-handle", 0);
+ if (mac->pcs_of_node) {
+ pcs = mtk_usxgmii_select_pcs(mac->pcs_of_node, PHY_INTERFACE_MODE_NA);
+ if (IS_ERR(pcs))
+ return PTR_ERR(pcs);
+ }
err = of_get_ethdev_address(mac->of_node, eth->netdev[id]);
if (err == -EPROBE_DEFER)
@@ -4610,8 +4728,21 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
phy_interface_zero(mac->phylink_config.supported_interfaces);
__set_bit(PHY_INTERFACE_MODE_INTERNAL,
mac->phylink_config.supported_interfaces);
+ } else if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_USXGMII)) {
+ mac->phylink_config.mac_capabilities |= MAC_5000FD | MAC_10000FD;
+ __set_bit(PHY_INTERFACE_MODE_5GBASER,
+ mac->phylink_config.supported_interfaces);
+ __set_bit(PHY_INTERFACE_MODE_10GBASER,
+ mac->phylink_config.supported_interfaces);
+ __set_bit(PHY_INTERFACE_MODE_USXGMII,
+ mac->phylink_config.supported_interfaces);
}
+ if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_2P5GPHY) &&
+ id == MTK_GMAC2_ID)
+ __set_bit(PHY_INTERFACE_MODE_INTERNAL,
+ mac->phylink_config.supported_interfaces);
+
phylink = phylink_create(&mac->phylink_config,
of_fwnode_handle(mac->of_node),
phy_mode, &mtk_phylink_ops);
@@ -4805,7 +4936,8 @@ static int mtk_probe(struct platform_device *pdev)
regmap_write(cci, 0, 3);
}
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII)) {
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_SGMII) &&
+ !mtk_is_netsys_v3_or_greater(eth)) {
err = mtk_sgmii_init(eth);
if (err)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 9ae3b8a71d0e6..ba5998ef7965e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -15,6 +15,7 @@
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h>
#include <linux/phylink.h>
+#include <linux/reset.h>
#include <linux/rhashtable.h>
#include <linux/dim.h>
#include <linux/bitfield.h>
@@ -503,6 +504,21 @@
#define INTF_MODE_RGMII_1000 (TRGMII_MODE | TRGMII_CENTRAL_ALIGNED)
#define INTF_MODE_RGMII_10_100 0
+/* XFI Mac control registers */
+#define MTK_XMAC_BASE(x) (0x12000 + (((x) - 1) * 0x1000))
+#define MTK_XMAC_MCR(x) (MTK_XMAC_BASE(x))
+#define XMAC_MCR_TRX_DISABLE 0xf
+#define XMAC_MCR_FORCE_TX_FC BIT(5)
+#define XMAC_MCR_FORCE_RX_FC BIT(4)
+
+/* XFI Mac logic reset registers */
+#define MTK_XMAC_LOGIC_RST(x) (MTK_XMAC_BASE(x) + 0x10)
+#define XMAC_LOGIC_RST BIT(0)
+
+/* XFI Mac count global control */
+#define MTK_XMAC_CNT_CTRL(x) (MTK_XMAC_BASE(x) + 0x100)
+#define XMAC_GLB_CNTCLR BIT(0)
+
/* GPIO port control registers for GMAC 2*/
#define GPIO_OD33_CTRL8 0x4c0
#define GPIO_BIAS_CTRL 0xed0
@@ -528,6 +544,7 @@
#define SYSCFG0_SGMII_GMAC2 ((3 << 8) & SYSCFG0_SGMII_MASK)
#define SYSCFG0_SGMII_GMAC1_V2 BIT(9)
#define SYSCFG0_SGMII_GMAC2_V2 BIT(8)
+#define SYSCFG0_SGMII_GMAC3_V2 BIT(7)
/* ethernet subsystem clock register */
@@ -566,6 +583,11 @@
#define GEPHY_MAC_SEL BIT(1)
/* Top misc registers */
+#define TOP_MISC_NETSYS_PCS_MUX 0x84
+#define NETSYS_PCS_MUX_MASK GENMASK(1, 0)
+#define MUX_G2_USXGMII_SEL BIT(1)
+#define MUX_HSGMII1_G1_SEL BIT(0)
+
#define USB_PHY_SWITCH_REG 0x218
#define QPHY_SEL_MASK GENMASK(1, 0)
#define SGMII_QPHY_SEL 0x2
@@ -590,6 +612,8 @@
#define MT7628_SDM_RBCNT (MT7628_SDM_OFFSET + 0x10c)
#define MT7628_SDM_CS_ERR (MT7628_SDM_OFFSET + 0x110)
+/* Debug Purpose Register */
+#define MTK_PSE_FQFC_CFG 0x100
#define MTK_FE_CDM1_FSM 0x220
#define MTK_FE_CDM2_FSM 0x224
#define MTK_FE_CDM3_FSM 0x238
@@ -598,6 +622,11 @@
#define MTK_FE_CDM6_FSM 0x328
#define MTK_FE_GDM1_FSM 0x228
#define MTK_FE_GDM2_FSM 0x22C
+#define MTK_FE_GDM3_FSM 0x23C
+#define MTK_FE_PSE_FREE 0x240
+#define MTK_FE_DROP_FQ 0x244
+#define MTK_FE_DROP_FC 0x248
+#define MTK_FE_DROP_PPE 0x24C
#define MTK_MAC_FSM(x) (0x1010C + ((x) * 0x100))
@@ -722,12 +751,8 @@ enum mtk_clks_map {
MTK_CLK_ETHWARP_WOCPU2,
MTK_CLK_ETHWARP_WOCPU1,
MTK_CLK_ETHWARP_WOCPU0,
- MTK_CLK_TOP_USXGMII_SBUS_0_SEL,
- MTK_CLK_TOP_USXGMII_SBUS_1_SEL,
MTK_CLK_TOP_SGM_0_SEL,
MTK_CLK_TOP_SGM_1_SEL,
- MTK_CLK_TOP_XFI_PHY_0_XTAL_SEL,
- MTK_CLK_TOP_XFI_PHY_1_XTAL_SEL,
MTK_CLK_TOP_ETH_GMII_SEL,
MTK_CLK_TOP_ETH_REFCK_50M_SEL,
MTK_CLK_TOP_ETH_SYS_200M_SEL,
@@ -798,19 +823,9 @@ enum mtk_clks_map {
BIT_ULL(MTK_CLK_GP3) | BIT_ULL(MTK_CLK_XGP1) | \
BIT_ULL(MTK_CLK_XGP2) | BIT_ULL(MTK_CLK_XGP3) | \
BIT_ULL(MTK_CLK_CRYPTO) | \
- BIT_ULL(MTK_CLK_SGMII_TX_250M) | \
- BIT_ULL(MTK_CLK_SGMII_RX_250M) | \
- BIT_ULL(MTK_CLK_SGMII2_TX_250M) | \
- BIT_ULL(MTK_CLK_SGMII2_RX_250M) | \
BIT_ULL(MTK_CLK_ETHWARP_WOCPU2) | \
BIT_ULL(MTK_CLK_ETHWARP_WOCPU1) | \
BIT_ULL(MTK_CLK_ETHWARP_WOCPU0) | \
- BIT_ULL(MTK_CLK_TOP_USXGMII_SBUS_0_SEL) | \
- BIT_ULL(MTK_CLK_TOP_USXGMII_SBUS_1_SEL) | \
- BIT_ULL(MTK_CLK_TOP_SGM_0_SEL) | \
- BIT_ULL(MTK_CLK_TOP_SGM_1_SEL) | \
- BIT_ULL(MTK_CLK_TOP_XFI_PHY_0_XTAL_SEL) | \
- BIT_ULL(MTK_CLK_TOP_XFI_PHY_1_XTAL_SEL) | \
BIT_ULL(MTK_CLK_TOP_ETH_GMII_SEL) | \
BIT_ULL(MTK_CLK_TOP_ETH_REFCK_50M_SEL) | \
BIT_ULL(MTK_CLK_TOP_ETH_SYS_200M_SEL) | \
@@ -944,6 +959,8 @@ enum mkt_eth_capabilities {
MTK_RGMII_BIT = 0,
MTK_TRGMII_BIT,
MTK_SGMII_BIT,
+ MTK_USXGMII_BIT,
+ MTK_2P5GPHY_BIT,
MTK_ESW_BIT,
MTK_GEPHY_BIT,
MTK_MUX_BIT,
@@ -964,8 +981,11 @@ enum mkt_eth_capabilities {
MTK_ETH_MUX_GDM1_TO_GMAC1_ESW_BIT,
MTK_ETH_MUX_GMAC2_GMAC0_TO_GEPHY_BIT,
MTK_ETH_MUX_U3_GMAC2_TO_QPHY_BIT,
+ MTK_ETH_MUX_GMAC2_TO_2P5GPHY_BIT,
MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII_BIT,
MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII_BIT,
+ MTK_ETH_MUX_GMAC123_TO_GEPHY_SGMII_BIT,
+ MTK_ETH_MUX_GMAC123_TO_USXGMII_BIT,
/* PATH BITS */
MTK_ETH_PATH_GMAC1_RGMII_BIT,
@@ -973,14 +993,21 @@ enum mkt_eth_capabilities {
MTK_ETH_PATH_GMAC1_SGMII_BIT,
MTK_ETH_PATH_GMAC2_RGMII_BIT,
MTK_ETH_PATH_GMAC2_SGMII_BIT,
+ MTK_ETH_PATH_GMAC2_2P5GPHY_BIT,
MTK_ETH_PATH_GMAC2_GEPHY_BIT,
+ MTK_ETH_PATH_GMAC3_SGMII_BIT,
MTK_ETH_PATH_GDM1_ESW_BIT,
+ MTK_ETH_PATH_GMAC1_USXGMII_BIT,
+ MTK_ETH_PATH_GMAC2_USXGMII_BIT,
+ MTK_ETH_PATH_GMAC3_USXGMII_BIT,
};
/* Supported hardware group on SoCs */
#define MTK_RGMII BIT_ULL(MTK_RGMII_BIT)
#define MTK_TRGMII BIT_ULL(MTK_TRGMII_BIT)
#define MTK_SGMII BIT_ULL(MTK_SGMII_BIT)
+#define MTK_USXGMII BIT_ULL(MTK_USXGMII_BIT)
+#define MTK_2P5GPHY BIT_ULL(MTK_2P5GPHY_BIT)
#define MTK_ESW BIT_ULL(MTK_ESW_BIT)
#define MTK_GEPHY BIT_ULL(MTK_GEPHY_BIT)
#define MTK_MUX BIT_ULL(MTK_MUX_BIT)
@@ -1003,10 +1030,16 @@ enum mkt_eth_capabilities {
BIT_ULL(MTK_ETH_MUX_GMAC2_GMAC0_TO_GEPHY_BIT)
#define MTK_ETH_MUX_U3_GMAC2_TO_QPHY \
BIT_ULL(MTK_ETH_MUX_U3_GMAC2_TO_QPHY_BIT)
+#define MTK_ETH_MUX_GMAC2_TO_2P5GPHY \
+ BIT_ULL(MTK_ETH_MUX_GMAC2_TO_2P5GPHY_BIT)
#define MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII \
BIT_ULL(MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII_BIT)
#define MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII \
BIT_ULL(MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII_BIT)
+#define MTK_ETH_MUX_GMAC123_TO_GEPHY_SGMII \
+ BIT_ULL(MTK_ETH_MUX_GMAC123_TO_GEPHY_SGMII_BIT)
+#define MTK_ETH_MUX_GMAC123_TO_USXGMII \
+ BIT_ULL(MTK_ETH_MUX_GMAC123_TO_USXGMII_BIT)
/* Supported path present on SoCs */
#define MTK_ETH_PATH_GMAC1_RGMII BIT_ULL(MTK_ETH_PATH_GMAC1_RGMII_BIT)
@@ -1014,8 +1047,13 @@ enum mkt_eth_capabilities {
#define MTK_ETH_PATH_GMAC1_SGMII BIT_ULL(MTK_ETH_PATH_GMAC1_SGMII_BIT)
#define MTK_ETH_PATH_GMAC2_RGMII BIT_ULL(MTK_ETH_PATH_GMAC2_RGMII_BIT)
#define MTK_ETH_PATH_GMAC2_SGMII BIT_ULL(MTK_ETH_PATH_GMAC2_SGMII_BIT)
+#define MTK_ETH_PATH_GMAC2_2P5GPHY BIT_ULL(MTK_ETH_PATH_GMAC2_2P5GPHY_BIT)
#define MTK_ETH_PATH_GMAC2_GEPHY BIT_ULL(MTK_ETH_PATH_GMAC2_GEPHY_BIT)
+#define MTK_ETH_PATH_GMAC3_SGMII BIT_ULL(MTK_ETH_PATH_GMAC3_SGMII_BIT)
#define MTK_ETH_PATH_GDM1_ESW BIT_ULL(MTK_ETH_PATH_GDM1_ESW_BIT)
+#define MTK_ETH_PATH_GMAC1_USXGMII BIT_ULL(MTK_ETH_PATH_GMAC1_USXGMII_BIT)
+#define MTK_ETH_PATH_GMAC2_USXGMII BIT_ULL(MTK_ETH_PATH_GMAC2_USXGMII_BIT)
+#define MTK_ETH_PATH_GMAC3_USXGMII BIT_ULL(MTK_ETH_PATH_GMAC3_USXGMII_BIT)
#define MTK_GMAC1_RGMII (MTK_ETH_PATH_GMAC1_RGMII | MTK_RGMII)
#define MTK_GMAC1_TRGMII (MTK_ETH_PATH_GMAC1_TRGMII | MTK_TRGMII)
@@ -1023,7 +1061,12 @@ enum mkt_eth_capabilities {
#define MTK_GMAC2_RGMII (MTK_ETH_PATH_GMAC2_RGMII | MTK_RGMII)
#define MTK_GMAC2_SGMII (MTK_ETH_PATH_GMAC2_SGMII | MTK_SGMII)
#define MTK_GMAC2_GEPHY (MTK_ETH_PATH_GMAC2_GEPHY | MTK_GEPHY)
+#define MTK_GMAC2_2P5GPHY (MTK_ETH_PATH_GMAC2_2P5GPHY | MTK_2P5GPHY)
+#define MTK_GMAC3_SGMII (MTK_ETH_PATH_GMAC3_SGMII | MTK_SGMII)
#define MTK_GDM1_ESW (MTK_ETH_PATH_GDM1_ESW | MTK_ESW)
+#define MTK_GMAC1_USXGMII (MTK_ETH_PATH_GMAC1_USXGMII | MTK_USXGMII)
+#define MTK_GMAC2_USXGMII (MTK_ETH_PATH_GMAC2_USXGMII | MTK_USXGMII)
+#define MTK_GMAC3_USXGMII (MTK_ETH_PATH_GMAC3_USXGMII | MTK_USXGMII)
/* MUXes present on SoCs */
/* 0: GDM1 -> GMAC1, 1: GDM1 -> ESW */
@@ -1042,10 +1085,20 @@ enum mkt_eth_capabilities {
(MTK_ETH_MUX_GMAC1_GMAC2_TO_SGMII_RGMII | MTK_MUX | \
MTK_SHARED_SGMII)
+/* 2: GMAC2 -> XGMII */
+#define MTK_MUX_GMAC2_TO_2P5GPHY \
+ (MTK_ETH_MUX_GMAC2_TO_2P5GPHY | MTK_MUX | MTK_INFRA)
+
/* 0: GMACx -> GEPHY, 1: GMACx -> SGMII where x is 1 or 2 */
#define MTK_MUX_GMAC12_TO_GEPHY_SGMII \
(MTK_ETH_MUX_GMAC12_TO_GEPHY_SGMII | MTK_MUX)
+#define MTK_MUX_GMAC123_TO_GEPHY_SGMII \
+ (MTK_ETH_MUX_GMAC123_TO_GEPHY_SGMII | MTK_MUX)
+
+#define MTK_MUX_GMAC123_TO_USXGMII \
+ (MTK_ETH_MUX_GMAC123_TO_USXGMII | MTK_MUX | MTK_INFRA)
+
#define MTK_HAS_CAPS(caps, _x) (((caps) & (_x)) == (_x))
#define MT7621_CAPS (MTK_GMAC1_RGMII | MTK_GMAC1_TRGMII | \
@@ -1077,8 +1130,12 @@ enum mkt_eth_capabilities {
MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA | \
MTK_RSTCTRL_PPE1 | MTK_SRAM)
-#define MT7988_CAPS (MTK_36BIT_DMA | MTK_GDM1_ESW | MTK_QDMA | \
- MTK_RSTCTRL_PPE1 | MTK_RSTCTRL_PPE2 | MTK_SRAM)
+#define MT7988_CAPS (MTK_36BIT_DMA | MTK_GDM1_ESW | MTK_GMAC1_SGMII | \
+ MTK_GMAC2_2P5GPHY | MTK_GMAC2_SGMII | MTK_GMAC2_USXGMII | \
+ MTK_GMAC3_SGMII | MTK_GMAC3_USXGMII | \
+ MTK_MUX_GMAC123_TO_GEPHY_SGMII | \
+ MTK_MUX_GMAC123_TO_USXGMII | MTK_MUX_GMAC2_TO_2P5GPHY | \
+ MTK_QDMA | MTK_RSTCTRL_PPE1 | MTK_RSTCTRL_PPE2 | MTK_SRAM)
struct mtk_tx_dma_desc_info {
dma_addr_t addr;
@@ -1313,6 +1370,7 @@ struct mtk_mac {
phy_interface_t interface;
int speed;
struct device_node *of_node;
+ struct device_node *pcs_of_node;
struct phylink *phylink;
struct phylink_config phylink_config;
struct mtk_eth *hw;
@@ -1421,6 +1479,19 @@ static inline u32 mtk_get_ib2_multicast_mask(struct mtk_eth *eth)
return MTK_FOE_IB2_MULTICAST;
}
+static inline bool mtk_interface_mode_is_xgmii(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_INTERNAL:
+ case PHY_INTERFACE_MODE_USXGMII:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_5GBASER:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* read the hardware status register */
void mtk_stats_update_mac(struct mtk_mac *mac);
@@ -1429,8 +1500,10 @@ u32 mtk_r32(struct mtk_eth *eth, unsigned reg);
u32 mtk_m32(struct mtk_eth *eth, u32 mask, u32 set, unsigned int reg);
int mtk_gmac_sgmii_path_setup(struct mtk_eth *eth, int mac_id);
+int mtk_gmac_2p5gphy_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_gephy_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_gmac_rgmii_path_setup(struct mtk_eth *eth, int mac_id);
+int mtk_gmac_usxgmii_path_setup(struct mtk_eth *eth, int mac_id);
int mtk_eth_offload_init(struct mtk_eth *eth);
int mtk_eth_setup_tc(struct net_device *dev, enum tc_setup_type type,
--
2.42.1
^ permalink raw reply related
* Re: [PATCH net] net: sched: fix warn on htb offloaded class creation
From: Chittim, Madhu @ 2023-11-09 21:54 UTC (permalink / raw)
To: Maxim Mikityanskiy, Paolo Abeni
Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
Eric Dumazet, Jakub Kicinski, Tariq Toukan, Gal Pressman,
Saeed Mahameed, xuejun.zhang, sridhar.samudrala
In-Reply-To: <ZUEQzsKiIlgtbN-S@mail.gmail.com>
On 10/31/2023 7:40 AM, Maxim Mikityanskiy wrote:
> On Tue, 31 Oct 2023 at 10:11:14 +0100, Paolo Abeni wrote:
>> Hi,
>>
>> I'm sorry for the late reply.
>>
>> On Fri, 2023-10-27 at 16:57 +0300, Maxim Mikityanskiy wrote:
>>> I believe this is not the right fix.
>>>
>>> On Thu, 26 Oct 2023 at 17:36:48 +0200, Paolo Abeni wrote:
>>>> The following commands:
>>>>
>>>> tc qdisc add dev eth1 handle 2: root htb offload
>>>> tc class add dev eth1 parent 2: classid 2:1 htb rate 5mbit burst 15k
>>>>
>>>> yield a WARN in the HTB qdisc:
>>>
>>> Something is off here. These are literally the most basic commands one
>>> could invoke with HTB offload, I'm sure they worked. Is it something
>>> that broke recently? Tariq/Gal/Saeed, could you check them on a Mellanox
>>> NIC?
>>>
>>>>
>>>> WARNING: CPU: 2 PID: 1583 at net/sched/sch_htb.c:1959
>>>> CPU: 2 PID: 1583 Comm: tc Kdump: loaded 6.6.0-rc2.mptcp_7895773e5235+ #59
>>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
>>>> RIP: 0010:htb_change_class+0x25c4/0x2e30 [sch_htb]
>>>> Code: 24 58 48 b8 00 00 00 00 00 fc ff df 48 89 ca 48 c1 ea 03 80 3c 02 00 0f 85 92 01 00 00 49 89 8c 24 b0 01 00 00 e9 77 fc ff ff <0f> 0b e9 15 ec ff ff 80 3d f8 35 00 00 00 0f 85 d4 f9 ff ff ba 32
>>>> RSP: 0018:ffffc900015df240 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: ffff88811b4ca000 RCX: ffff88811db42800
>>>> RDX: 1ffff11023b68502 RSI: ffffffffaf2e6a00 RDI: ffff88811db42810
>>>> RBP: ffff88811db45000 R08: 0000000000000001 R09: fffffbfff664bbc9
>>>> R10: ffffffffb325de4f R11: ffffffffb2d33748 R12: 0000000000000000
>>>> R13: ffff88811db43000 R14: ffff88811b4caaac R15: ffff8881252c0030
>>>> FS: 00007f6c1f126740(0000) GS:ffff88815aa00000(0000) knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 000055dca8e5b4a8 CR3: 000000011bc7a006 CR4: 0000000000370ee0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> Call Trace:
>>>> <TASK>
>>>> tc_ctl_tclass+0x394/0xeb0
>>>> rtnetlink_rcv_msg+0x2f5/0xaa0
>>>> netlink_rcv_skb+0x12e/0x3a0
>>>> netlink_unicast+0x421/0x730
>>>> netlink_sendmsg+0x79e/0xc60
>>>> ____sys_sendmsg+0x95a/0xc20
>>>> ___sys_sendmsg+0xee/0x170
>>>> __sys_sendmsg+0xc6/0x170
>>>> do_syscall_64+0x58/0x80
>>>> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>
>>>> The first command creates per TX queue pfifo qdiscs in
>>>> tc_modify_qdisc() -> htb_init() and grafts the pfifo to each dev_queue
>>>> via tc_modify_qdisc() -> qdisc_graft() -> htb_attach().
>>>
>>> Not exactly; it grafts pfifo to direct queues only. htb_attach_offload
>>> explicitly grafts noop to all the remaining queues.
>>
>> num_direct_qdiscs == real_num_tx_queues:
>>
>> https://elixir.bootlin.com/linux/latest/source/net/sched/sch_htb.c#L1101
>>
>> pfifo will be configured on all the TX queues available at TC creation
>> time, right?
>
> Yes, all real TX queues will be used as direct queues (for unclassified
> traffic). num_tx_queues should be somewhat bigger than
> real_num_tx_queues - it should reserve a queue per potential leaf class.
>
> pfifo is configured on direct queues, and the reserved queues have noop.
> Then, when a new leaf class is added (TC_HTB_LEAF_ALLOC_QUEUE), the
> driver allocates a new queue and increases real_num_tx_queues. HTB
> assigns a pfifo qdisc to the newly allocated queue.
>
> Changing the hierarchy (deleting a node or converting an inner node to a
> leaf) may reorder the classful queues (with indexes >= the initial
> real_num_tx_queues), so that there are no gaps.
>
>> Lacking a mlx card with offload support, I hacked basic HTB support into
>> netdevsim and I observe the splat on top of such a device. I can as well
>> share the netdevsim patch - it will need some clean-up.
>
> I will be happy to review the netdevsim patch, but I don't promise
> prompt responsiveness.
>
>>>
>>>> When the command completes, the qdisc_sleeping for each dev_queue is a
>>>> pfifo one. The next class creation will trigger the reported splat.
>>>>
>>>> Address the issue taking care of old non-builtin qdisc in
>>>> htb_change_class().
>>>>
>>>> Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>> net/sched/sch_htb.c | 3 +--
>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
>>>> index 0d947414e616..dc682bd542b4 100644
>>>> --- a/net/sched/sch_htb.c
>>>> +++ b/net/sched/sch_htb.c
>>>> @@ -1955,8 +1955,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
>>>> qdisc_refcount_inc(new_q);
>>>> }
>>>> old_q = htb_graft_helper(dev_queue, new_q);
>>>> - /* No qdisc_put needed. */
>>>> - WARN_ON(!(old_q->flags & TCQ_F_BUILTIN));
>>>> + qdisc_put(old_q);
>>>
>>> We can get here after one of two cases above:
>>>
>>> 1. A new queue is allocated with TC_HTB_LEAF_ALLOC_QUEUE. It's supposed
>>> to have a noop qdisc by default (after htb_attach_offload).
>>
>> So most likely the trivial netdevsim implementation I used was not good
>> enough.
>>
>> Which constraints should TC_HTB_LEAF_ALLOC_QUEUE respect WRT the
>> returned qid value? Should it be in the (real_num_tx_queues,
>> num_tx_queues] range?
>
> Let's say N is real_num_tx_queues as it was at the moment of attaching.
> HTB queues should be allocated from [N, num_tx_queues), and
> real_num_tx_queues should be increased accordingly. It should not return
> queues number [0, N).
>
> Deletions should fill the gaps: if queue X is being deleted, N <= X <
> real_num_tx_queues - 1, then the gap should be filled with queue number
> real_num_tx_queues - 1 by swapping the queues (real_num_tx_queues will
> be decreased by 1 accordingly). Some care also needs to be taken when
> converting inner-to-leaf (TC_HTB_LEAF_DEL_LAST) and leaf-to-inner (it's
> better to get insights from [1], there are also some comments).
>
>> Can HTB actually configure H/W shaping on
>> real_num_tx_queues?
>
> It will be on real_num_tx_queues, but after it's increased to add new
> HTB queues. The original queues [0, N) are used for direct traffic, same
> as the non-offloaded HTB's direct_queue (it's not shaped).
>
>> I find no clear documentation WRT the above.
>
> I'm sorry for the lack of documentation. All I have is the commit
> message [2] and a netdev talk [3]. Maybe the slides could be of some
> use...
>
> I hope the above explanation clarifies something, and feel free to ask
> further questions, I'll be glad to explain what hasn't been documented
> properly.
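As a reading aid, the queue numbering rules quoted above can be sketched as
a small userspace model. This is only an illustration under assumptions: the
struct and function names are hypothetical, it is not the sch_htb or driver
code, and it tracks queue indexes only, not the qdiscs attached to them.

```c
/* Toy model of the HTB offload queue numbering described above.
 * Not kernel code; every name in this block is hypothetical.
 */
struct txq_model {
	unsigned int num_tx_queues;      /* total queues reserved by the device */
	unsigned int direct_queues;      /* N: real_num_tx_queues at attach time */
	unsigned int real_num_tx_queues; /* grows and shrinks with leaf classes */
};

/* Allocate a queue for a new leaf class. The result is always the next
 * index in [N, num_tx_queues); -1 means no reserved queue is left.
 */
static int txq_model_alloc(struct txq_model *m)
{
	if (m->real_num_tx_queues >= m->num_tx_queues)
		return -1;
	return (int)m->real_num_tx_queues++;
}

/* Delete leaf queue qid and shrink real_num_tx_queues by one.
 * Returns the index of the queue that has to be moved into the gap,
 * -1 if qid was already the last queue (nothing moves), or
 * -2 if qid is a direct queue or out of range.
 */
static int txq_model_free(struct txq_model *m, unsigned int qid)
{
	unsigned int last;

	if (qid < m->direct_queues || qid >= m->real_num_tx_queues)
		return -2;

	last = m->real_num_tx_queues - 1;
	m->real_num_tx_queues--;
	return qid == last ? -1 : (int)last;
}
```

With N = 4 and num_tx_queues = 8, three allocations hand out queues 4, 5
and 6; freeing queue 4 then asks the caller to move queue 6 into the gap,
so the classful queues stay contiguous.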
We would like to enable Tx rate limiting using HTB offload on all the
existing queues. With Paolo's patch applied, we are able to do so with the
following set of commands:
tc qdisc add dev enp175s0v0 handle 1: root htb offload
tc class add dev enp175s0v0 parent 1: classid 1:1 htb rate 1000mbit ceil
2000mbit burst 100k
where
classid 1:1 is tx queue0
tx_minrate is rate 1000mbps
tx_maxrate is ceil 2000mbps
In order not to break your implementation, could we use an if condition
instead of the WARN_ON, something like below?
if (!(old_q->flags & TCQ_F_BUILTIN))
qdisc_put(old_q);
Would this work for you? Please advise.
^ permalink raw reply
* Re: [RFC PATCH 1/8] dt-bindings: phy: mediatek,xfi-pextp: add new bindings
From: Andrew Lunn @ 2023-11-09 21:55 UTC (permalink / raw)
To: Daniel Golle
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Heiner Kallweit, Russell King,
Alexander Couzens, Philipp Zabel, netdev, devicetree,
linux-kernel, linux-arm-kernel, linux-mediatek, linux-phy
In-Reply-To: <924c2c6316e6d51a17423eded3a2c5c5bbf349d2.1699565880.git.daniel@makrotopia.org>
> + mediatek,usxgmii-performance-errata:
> + $ref: /schemas/types.yaml#/definitions/flag
> + description:
> + USXGMII0 on MT7988 suffers from a performance problem in 10GBase-R
> + mode which needs a work-around in the driver. The work-around is
> + enabled using this flag.
Are there more details about this? I'm just wondering if this should be
based on the compatible, rather than a bool property.
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2 1/1] ptp: clockmatrix: support 32-bit address space
From: Simon Horman @ 2023-11-09 22:25 UTC (permalink / raw)
To: Min Li; +Cc: richardcochran, lee, linux-kernel, netdev, Min Li
In-Reply-To: <MW5PR03MB6932A4AAD4F612B45E9F6856A0AFA@MW5PR03MB6932.namprd03.prod.outlook.com>
On Thu, Nov 09, 2023 at 01:13:52PM -0500, Min Li wrote:
> From: Min Li <min.li.xe@renesas.com>
>
> We used to assume 0x2010xxxx address. Now that
> we need to access 0x2011xxxx address, we need
> to support read/write the whole 32-bit address space.
>
> Signed-off-by: Min Li <min.li.xe@renesas.com>
> ---
> - Drop MAX_ABS_WRITE_PHASE_PICOSECONDS advised by Rahul
...
> @@ -62,7 +62,8 @@ static int contains_full_configuration(struct idtcm *idtcm,
> const struct firmware *fw)
> {
> struct idtcm_fwrc *rec = (struct idtcm_fwrc *)fw->data;
> - u16 scratch = IDTCM_FW_REG(idtcm->fw_ver, V520, SCRATCH);
> + u16 scratch = SCSR_ADDR(IDTCM_FW_REG(idtcm->fw_ver, V520, SCRATCH));
Hi Min Li,
I think a similar conversion for scratch in idtcm_load_firmware()
is required.
As flagged by clang-16 W=1, and Smatch.
> + u16 gpio_control = SCSR_ADDR(GPIO_USER_CONTROL);
> s32 full_count;
> s32 count = 0;
> u16 regaddr;
...
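To illustrate the concern in general terms: a 16-bit local cannot hold a
full 0x2010xxxx or 0x2011xxxx address, so any value stored in a u16 has to
be narrowed explicitly first. The sketch below is a generic userspace
example; low16() and high16() are hypothetical helpers, the addresses are
made up, and I have not checked how SCSR_ADDR() is defined in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the in-page offset (low 16 bits) of a
 * full 32-bit register address, so it fits in a 16-bit variable.
 */
static inline uint16_t low16(uint32_t addr)
{
	return (uint16_t)(addr & 0xffffu);
}

/* Hypothetical helper: the upper 16 bits, which tell 0x2010xxxx and
 * 0x2011xxxx addresses apart once the offset has been split off.
 */
static inline uint16_t high16(uint32_t addr)
{
	return (uint16_t)(addr >> 16);
}
```

Two addresses that differ only in the upper half share the same 16-bit
offset, which is why an unconverted assignment to a u16 loses information.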
^ permalink raw reply
* Re: [PATCH v9 bpf-next 02/17] bpf: add BPF token delegation mount options to BPF FS
From: Andrii Nakryiko @ 2023-11-09 22:29 UTC (permalink / raw)
To: Christian Brauner
Cc: Andrii Nakryiko, bpf, netdev, paul, linux-fsdevel,
linux-security-module, keescook, kernel-team, sargun
In-Reply-To: <CAEf4Bza-Rv4YJs8R2YeMyk6psnT71dnuwBt2H=p32PdTCt-6nA@mail.gmail.com>
On Thu, Nov 9, 2023 at 9:09 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 12:48 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Nov 08, 2023 at 01:09:27PM -0800, Andrii Nakryiko wrote:
> > > On Wed, Nov 8, 2023 at 5:51 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Fri, Nov 03, 2023 at 12:05:08PM -0700, Andrii Nakryiko wrote:
> > > > > Add few new mount options to BPF FS that allow to specify that a given
> > > > > BPF FS instance allows creation of BPF token (added in the next patch),
> > > > > and what sort of operations are allowed under BPF token. As such, we get
> > > > > 4 new mount options, each is a bit mask
> > > > > - `delegate_cmds` allow to specify which bpf() syscall commands are
> > > > > allowed with BPF token derived from this BPF FS instance;
> > > > > - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> > > > > a set of allowable BPF map types that could be created with BPF token;
> > > > > - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> > > > > a set of allowable BPF program types that could be loaded with BPF token;
> > > > > - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> > > > > a set of allowable BPF program attach types that could be loaded with
> > > > > BPF token; delegate_progs and delegate_attachs are meant to be used
> > > > > together, as full BPF program type is, in general, determined
> > > > > through both program type and program attach type.
> > > > >
> > > > > Currently, these mount options accept the following forms of values:
> > > > > - a special value "any", that enables all possible values of a given
> > > > > bit set;
> > > > > - numeric value (decimal or hexadecimal, determined by kernel
> > > > > automatically) that specifies a bit mask value directly;
> > > > > - all the values for a given mount option are combined, if specified
> > > > > multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> > > > > delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> > > > > mask.
> > > > >
> > > > > Ideally, more convenient (for humans) symbolic form derived from
> > > > > corresponding UAPI enums would be accepted (e.g., `-o
> > > > > delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
> > > > > it requires a bunch of UAPI header churn, so I postponed it until this
> > > > > feature lands upstream or at least there is a definite consensus that
> > > > > this feature is acceptable and is going to make it, just to minimize
> > > > > amount of wasted effort and not increase amount of non-essential code to
> > > > > be reviewed.
> > > > >
> > > > > Attentive reader will notice that BPF FS is now marked as
> > > > > FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
> > > > > user namespace as long as the process has sufficient *namespaced*
> > > > > capabilities within that user namespace. But in reality we still
> > > > > restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
> > > > > init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
> > > > > to allow creating BPF FS context object (i.e., fsopen("bpf")) from
> > > > > inside unprivileged process inside non-init userns, to capture that
> > > > > userns as the owning userns. It will still be required to pass this
> > > > > context object back to privileged process to instantiate and mount it.
> > > > >
> > > > > This manipulation is important, because capturing non-init userns as the
> > > > > owning userns of BPF FS instance (super block) allows to use that userns
> > > > > to constrain BPF token to that userns later on (see next patch). So
> > > > > creating BPF FS with delegation inside unprivileged userns will restrict
> > > > > derived BPF token objects to only "work" inside that intended userns,
> > > > > making it scoped to an intended "container".
> > > > >
> > > > > There is a set of selftests at the end of the patch set that simulates
> > > > > this sequence of steps and validates that everything works as intended.
> > > > > But careful review is requested to make sure there are no missed gaps in
> > > > > the implementation and testing.
> > > > >
> > > > > All this is based on suggestions and discussions with Christian Brauner
> > > > > ([0]), to the best of my ability to follow all the implications.
> > > >
> > > > "who will not be held responsible for any CVE future or present as he's
> > > > not sure whether bpf token is a good idea in general"
> > > >
> > > > I'm not opposing it because it's really not my subsystem. But it'd be
> > > > nice if you also added a disclaimer that I'm not endorsing this. :)
> > > >
> > >
> > > Sure, I'll clarify. I still appreciate your reviewing everything and
> > > pointing out all the gotchas (like the reconfiguration and other
> > > stuff), thanks!
> > >
> > > > A comment below.
> > > >
> > > > >
> > > > > [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > ---
> > > > > include/linux/bpf.h | 10 ++++++
> > > > > kernel/bpf/inode.c | 88 +++++++++++++++++++++++++++++++++++++++------
> > > > > 2 files changed, 88 insertions(+), 10 deletions(-)
> > > > >
> > >
> > > [...]
> > >
> > > > > opt = fs_parse(fc, bpf_fs_parameters, param, &result);
> > > > > if (opt < 0) {
> > > > > @@ -665,6 +692,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
> > > > > case OPT_MODE:
> > > > > opts->mode = result.uint_32 & S_IALLUGO;
> > > > > break;
> > > > > + case OPT_DELEGATE_CMDS:
> > > > > + case OPT_DELEGATE_MAPS:
> > > > > + case OPT_DELEGATE_PROGS:
> > > > > + case OPT_DELEGATE_ATTACHS:
> > > > > + if (strcmp(param->string, "any") == 0) {
> > > > > + msk = ~0ULL;
> > > > > + } else {
> > > > > + err = kstrtou64(param->string, 0, &msk);
> > > > > + if (err)
> > > > > + return err;
> > > > > + }
> > > > > + switch (opt) {
> > > > > + case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
> > > > > + case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
> > > > > + case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
> > > > > + case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
> > > > > + default: return -EINVAL;
> > > > > + }
> > > > > + break;
> > > > > }
> > > >
> > > > So just to repeat that this will allow a container to set its own
> > > > delegation options:
> > > >
> > > > # unprivileged container
> > > >
> > > > fd_fs = fsopen();
> > > > fsconfig(fd_fs, FSCONFIG_BLA_BLA, "give-me-all-the-delegation");
> > > >
> > > > # Now hand off that fd_fs to a privileged process
> > > >
> > > > fsconfig(fd_fs, FSCONFIG_CREATE_CMD, ...)
> > > >
> > > > This means the container manager can't be part of your threat model
> > > > because you need to trust it to set delegation options.
> > > >
> > > > But if the container manager is part of your threat model then you can
> > > > never trust an fd_fs handed to you because the container manager might
> > > > have enabled arbitrary delegation privileges.
> > > >
> > > > There's ways around this:
> > > >
> > > > (1) kernel: Account for this in the kernel and require privileges when
> > > > setting delegation options.
> > >
> > > What sort of privilege would that be? We are in an unprivileged user
> > > namespace, so that would have to be some ns_capable() checks or
> > > something? I can add ns_capable(CAP_BPF), but what else did you have
> > > in mind?
> >
> > You would require privileges in the initial namespace aka capable()
> > checks similar to what you require for superblock creation.
>
> ok, I was just wondering if I'm missing something non-obvious.
> capable(CAP_SYS_ADMIN) makes sense and doesn't really hurt intended
> use case. Privileged parent will set these config values and then do
> FSCONFIG_CREATE_CMD.
>
> For reconfiguration I'll enforce same capable(CAP_SYS_ADMIN) checks,
> unless unprivileged user drops permissions to more restrictive ones
> (but I haven't had a chance to look at exact callback API, so we'll
> see if that's easy to support).
Ok, so I played with this a bit. It seems that if I require
capable(CAP_SYS_ADMIN) in fsconfig() to set delegation options, I
don't have to do anything special about reconfiguration. Any
FSCONFIG_SET_xxx command for a delegation option will just fail, and so
reconfiguration is harmless. I'm going to go with that and keep it
simple.
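A minimal userspace sketch of the rule described above, for illustration only. It is not the bpffs code: struct deleg_opts, parse_deleg_mask() and set_delegate_cmds() are hypothetical names, the `privileged` flag stands in for the result of a capable(CAP_SYS_ADMIN) check, and the -EPERM return value is an assumption. The mask handling ("any" or a number, OR-ed into the existing value) follows the quoted patch hunk.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the bpffs mount options discussed above. */
struct deleg_opts {
	uint64_t delegate_cmds;
};

/* Parse "any" or a numeric mask (decimal, 0x hex, or 0 octal) into *msk. */
static int parse_deleg_mask(const char *s, uint64_t *msk)
{
	char *end;

	assert(s != NULL && msk != NULL);
	if (strcmp(s, "any") == 0) {
		*msk = ~0ULL;
		return 0;
	}
	/* Reject empty strings, signs and leading whitespace up front. */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	*msk = strtoull(s, &end, 0);
	/* Overflow or trailing garbage both count as invalid in this sketch. */
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	return 0;
}

/*
 * Set a delegation option. `privileged` models the outcome of a
 * capable(CAP_SYS_ADMIN) check; without it every attempt fails, so an
 * unprivileged holder of the fs_context fd cannot change the mask,
 * neither before superblock creation nor on reconfiguration.
 */
static int set_delegate_cmds(struct deleg_opts *opts, const char *val,
			     bool privileged)
{
	uint64_t msk;
	int err;

	if (!privileged)
		return -EPERM;	/* assumed error code */
	err = parse_deleg_mask(val, &msk);
	if (err)
		return err;
	opts->delegate_cmds |= msk;	/* OR-in, as in the quoted hunk */
	return 0;
}
```

Because the privilege check comes first, the race mentioned further down (child and parent both holding the same fs_fd) stops mattering in this model: the child's FSCONFIG_SET_xxx attempt fails regardless of timing.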
>
> Thanks for feedback!
>
> >
> > >
> > > I think even if we say that privileged parent does FSCONFIG_SET_STRING
> > > and unprivileged child just does sys_fsopen("bpf", 0) and nothing
> > > more, we still can't be sure that child won't race with parent and set
> > > FSCONFIG_SET_STRING at the same time. Because they both have access to
> > > the same fs_fd.
> >
> > Unless you require privileges as outlined above to set delegation
> > options in which case an unprivileged container cannot change delegation
> > options at all.
>
> Yep, makes sense, that's what I'm going to do.
>
> >
> > >
> > > > (2) userspace: A trusted helper that allocates an fs_context fd in
> > > > the target user namespace, then sets delegation options and creates
> > > > superblock.
> > > >
> > > > (1) Is more restrictive but also more secure. (2) is less restrictive
> > > > but requires more care from userspace.
> > > >
> > > > Either way I would probably consider writing a document detailing
> > > > various delegation scenarios and possible pitfalls and implications
> > > > before advertising it.
> > > >
> > > > If you choose (2) then you also need to be aware that the security of
> > > > this also hinges on bpffs not allowing to reconfigure parameters once it
> > > > has been mounted. Otherwise an unprivileged container can change
> > > > delegation options.
> > > >
> > > > I would recommend that you either add a dummy bpf_reconfigure() method
> > > > with a comment in it or you add a comment on top of bpf_context_ops.
> > > > Something like:
> > > >
> > > > /*
> > > > * Unprivileged mounts of bpffs are owned by the user namespace they are
> > > > * mounted in. That means unprivileged users can change vfs mount
> > > > * options (ro<->rw, nosuid, etc.).
> > > > *
> > > > * They currently cannot change bpffs specific mount options such as
> > > > * delegation settings. If that is ever implemented it is necessary to
> > > > * require privileges in the initial namespace. Otherwise unprivileged
> > > > * users can change delegation options to whatever they want.
> > > > */
> > >
> > > Yep, I will add a custom callback. I think we can allow reconfiguring
> > > towards less permissive delegation subset, but I'll need to look at
> > > the specifics to see if we can support that easily.
^ permalink raw reply
* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
From: Jakub Kicinski @ 2023-11-09 23:04 UTC (permalink / raw)
To: VishvambarPanth.S
Cc: f.fainelli, Bryan.Whitehead, andrew, davem, linux-kernel, pabeni,
netdev, UNGLinuxDriver, edumazet
In-Reply-To: <0d0627cbd32afb813b75b485ea8e979ac027482d.camel@microchip.com>
On Thu, 9 Nov 2023 10:53:26 +0000 VishvambarPanth.S@microchip.com wrote:
> Thanks for your feedback. I apologize for the delayed response.
>
> The data presented in the patch description was aimed to convince a
> reviewer with the visible impact of the performance boosts in both x64
> and ARM platforms. However, the main motivation behind the patch was
> not merely a "good-to-have" improvement but a solution to the
> throughput issues reported by multiple customers in several platforms.
> We received lots of customer requests through our ticket site system
> urging us to address the performance issues on multiple kernel versions
> including LTS. While it's acknowledged that stable branch rules
> typically do not consider performance fixes that are not documented in
> public Bugzilla, this performance enhancement is essential to many of
> our customers and their end users and we believe should therefore be
> considered for stable branch on the basis of its visible user impact.
> A few issues reported by our customers are mentioned below. Even though
> these issues have existed for a long time, the data presented below was
> collected from the customers within the last 3 months.
>
> Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
> kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> improved the performance to ~900Mbps Rx in their platform.
>
> Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
> 157Mbps in their platform. Including the fix in the patch boosts the
> performance to ~600Mbps in Tx UDP.
>
> Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> 828/956 Mbps.
>
> Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
> for their platform from UDP Rx 200Mbps. The fix along with few other
> changes helped us to bring Rx perf to 800Mbps in customer’s platform
>
> This is a kind request for considering the acceptance of this patch
> into the net branch, as it has a significant positive impact on users
> and does not have any adverse effects.
Thanks a lot for the details. Unfortunately after further consideration
I can't accept this patch as a fix with a clear conscience. The code has
been this way for a long time, performance improvements should end up
in new kernels and people who want to benefit from faster kernels should
not be sticking to old LTS releases.
So please repost for net-next next week, when it's open again.
^ permalink raw reply
* Re: [RFC PATCH 1/8] dt-bindings: phy: mediatek,xfi-pextp: add new bindings
From: Daniel Golle @ 2023-11-09 23:11 UTC (permalink / raw)
To: Andrew Lunn
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Chunfeng Yun,
Vinod Koul, Kishon Vijay Abraham I, Felix Fietkau, John Crispin,
Sean Wang, Mark Lee, Lorenzo Bianconi, Matthias Brugger,
AngeloGioacchino Del Regno, Heiner Kallweit, Russell King,
Alexander Couzens, Philipp Zabel, netdev, devicetree,
linux-kernel, linux-arm-kernel, linux-mediatek, linux-phy
In-Reply-To: <797ea94b-9c26-43a2-85d7-633990ed8c57@lunn.ch>
Hi Andrew,
On Thu, Nov 09, 2023 at 10:55:55PM +0100, Andrew Lunn wrote:
> > + mediatek,usxgmii-performance-errata:
> > + $ref: /schemas/types.yaml#/definitions/flag
> > + description:
> > + USXGMII0 on MT7988 suffers from a performance problem in 10GBase-R
> > + mode which needs a work-around in the driver. The work-around is
> > + enabled using this flag.
>
> Are there more details about this? I'm just wondering if this should be
> based on the compatible, rather than a bool property.
The vendor sources where this is coming from are here:
https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/a500d94cd47e279015ce22947e1ce396a7516598%5E%21/#F0
And I'm afraid this is as much detail as it gets. And yes, we could
also base this on the compatible and just have two different ones for
the two PEXTP instances found in MT7988.
Let me know your conclusion in that regard.
Cheers
Daniel
^ permalink raw reply
* RE: [PATCH net 3/3] dpll: fix register pin with unregistered parent pin
From: Kubalewski, Arkadiusz @ 2023-11-09 23:21 UTC (permalink / raw)
To: Jiri Pirko
Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <ZU0fFz+GTdqjA7RD@nanopsycho>
>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Thursday, November 9, 2023 7:04 PM
>
>Thu, Nov 09, 2023 at 05:02:48PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>Sent: Thursday, November 9, 2023 11:56 AM
>>>To: Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Jiri Pirko
>>>
>>>On 09/11/2023 09:59, Kubalewski, Arkadiusz wrote:
>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>> Sent: Wednesday, November 8, 2023 4:08 PM
>>>>>
>>>>> Wed, Nov 08, 2023 at 11:32:26AM CET, arkadiusz.kubalewski@intel.com
>>>>> wrote:
>>>>>> In case of multiple kernel module instances using the same dpll
>>>>>>device:
>>>>>> if only one registers dpll device, then only that one can register
>>>>>
>>>>> Then why don't you register in multiple instances? See mlx5 for a
>>>>> reference.
>>>>>
>>>>
>>>> Every registration requires ops, but for our case only PF0 is able to
>>>> control dpll pins and device, thus only this can provide ops.
>>>> Basically without PF0, dpll is not able to be controlled, as well
>>>> as directly connected pins.
>>>>
>>>But why do you need other pins then, if PF0 doesn't exist?
>>>
>>
>>In general we don't need them at that point, but this is a corner case,
>>where users for some reason decided to unbind PF 0, and I treat this state
>>as temporary, where dpll/pins controllability is temporarily broken.
>
>So resolve this broken situation internally in the driver, registering
>things only in case PF0 is present. Some simple notification infra would
>do. Don't drag this into the subsystem internals.
>
Thanks for your feedback, but I believe this advice is wrong.
Our HW/FW is designed differently from yours; that doesn't mean it is wrong.
As you might recall from our sync meetings, the dpll subsystem is meant to unify
approaches and reduce the code in the drivers. Your advice does exactly the
opposite: the suggested fix would require implementing extra synchronization of
the dpll and pin registration state between driver instances, most probably with
additional modules like aux-bus or something similar, which is something we
tried to avoid from the very beginning.
Only ice uses the infrastructure of muxed pins, and it is broken: a driver
instance which has registered the dpll and pins cannot be unbound without
crashing the kernel. So a fix is required in the dpll subsystem, not in the driver.
Thank you!
Arkadiusz
>
>>
>>The dpll at that point is not registered, all the direct pins are also
>>not registered, thus not available to the users.
>>
>>When I do dump at that point there are still 3 pins present, one for each
>>PF, although they are all zombies - no parents as their parent pins are
>>not
>>registered (as the other patch [1/3] prevents dump of pin parent if the
>>parent is not registered). Maybe we can remove the REGISTERED mark for all
>>the muxed pins, if all their parents have been unregistered, so they won't
>>be visible to the user at all. Will try to POC that.
>>
>>>>>
>>>>>> directly connected pins with a dpll device. If unregistered parent
>>>>>> determines if the muxed pin can be register with it or not, it forces
>>>>>> serialized driver load order - first the driver instance which
>>>>>> registers the direct pins needs to be loaded, then the other
>>>>>> instances
>>>>>> could register muxed type pins.
>>>>>>
>>>>>> Allow registration of a pin with a parent even if the parent was not
>>>>>> yet registered, thus allow ability for unserialized driver instance
>>>>>
>>>>> Weird.
>>>>>
>>>>
>>>> Yeah, this is issue only for MUX/parent pin part, couldn't find better
>>>> way, but it doesn't seem to break things around..
>>>>
>>>
>>>I just wonder how do you see the registration procedure? How can parent
>>>pin exist if it's not registered? I believe you cannot get it through
>>>DPLL API, then the only possible way is to create it within the same
>>>driver code, which can be simply re-arranged. Am I wrong here?
>>>
>>
>>By "parent exist" I mean the parent pin exist in the dpll subsystem
>>(allocated on pins xa), but it doesn't mean it is available to the users,
>>as it might not be registered with a dpll device.
>>
>>We have this 2-step init approach:
>>1. dpll_pin_get(..) -> allocate a new pin or increase the reference if it exists
>>2.1. dpll_pin_register(..) -> register with a dpll device
>>2.2. dpll_pin_on_pin_register(..) -> register with a parent pin
>>
>>Basically:
>>- PF 0 does 1 & 2.1 for all the direct inputs, and steps: 1 & 2.2 for its
>> recovery clock pin,
>>- other PF's only do step 1 for the direct input pins (as they must get
>> reference to those in order to register recovery clock pin with them),
>> and steps: 1 & 2.2 for their recovery clock pin.
>>
>>
>>Thank you!
>>Arkadiusz
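The two-step initialisation described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the dpll core: struct model_pin and the model_* functions are hypothetical, the lookup key omits the module pointer, and the `registered` flag only imitates the DPLL_REGISTERED mark. It shows a muxed pin attaching to a parent that exists (step 1 done) but is not yet registered (step 2.1 pending), which is the unserialized load order this patch wants to allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PINS 8

/* Simplified model of a pin entry; not the kernel's struct dpll_pin. */
struct model_pin {
	bool in_use;
	uint64_t clock_id;
	uint32_t pin_idx;
	unsigned int refcount;
	bool registered;		/* imitates the DPLL_REGISTERED mark */
	struct model_pin *parent;	/* set by step 2.2 */
};

static struct model_pin pins[MAX_PINS];

/* Step 1: return the existing pin with one more reference, or allocate it. */
static struct model_pin *model_pin_get(uint64_t clock_id, uint32_t pin_idx)
{
	struct model_pin *free_slot = NULL;

	for (size_t i = 0; i < MAX_PINS; i++) {
		struct model_pin *p = &pins[i];

		if (p->in_use && p->clock_id == clock_id &&
		    p->pin_idx == pin_idx) {
			p->refcount++;
			return p;
		}
		if (!p->in_use && !free_slot)
			free_slot = p;
	}
	if (!free_slot)
		return NULL;
	*free_slot = (struct model_pin){
		.in_use = true,
		.clock_id = clock_id,
		.pin_idx = pin_idx,
		.refcount = 1,
	};
	return free_slot;
}

/* Step 2.1: register the pin with a dpll device (the device is implied). */
static void model_pin_register(struct model_pin *pin)
{
	assert(pin && pin->in_use);
	pin->registered = true;
}

/* Step 2.2: attach a pin to a parent; the parent need not be registered. */
static void model_pin_on_pin_register(struct model_pin *parent,
				      struct model_pin *pin)
{
	assert(parent && pin && parent->in_use && pin->in_use);
	pin->parent = parent;
}

/* A dump would report the parent only while the parent is registered. */
static bool model_parent_visible(const struct model_pin *pin)
{
	return pin->parent && pin->parent->registered;
}
```

In this model a non-PF0 instance can run steps 1 and 2.2 first; its parent only becomes visible once the instance owning the direct inputs performs step 2.1 on the same pin object.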
>>
>>>> Thank you!
>>>> Arkadiusz
>>>>
>>>>>
>>>>>> load order.
>>>>>> Do not WARN_ON notification for unregistered pin, which can be
>>>>>> invoked
>>>>>> for described case, instead just return error.
>>>>>>
>>>>>> Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
>>>>>> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>> functions")
>>>>>> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>>>>>> ---
>>>>>> drivers/dpll/dpll_core.c | 4 ----
>>>>>> drivers/dpll/dpll_netlink.c | 2 +-
>>>>>> 2 files changed, 1 insertion(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>> index
>>>>>> 4077b562ba3b..ae884b92d68c 100644
>>>>>> --- a/drivers/dpll/dpll_core.c
>>>>>> +++ b/drivers/dpll/dpll_core.c
>>>>>> @@ -28,8 +28,6 @@ static u32 dpll_xa_id;
>>>>>> WARN_ON_ONCE(!xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>> DPLL_REGISTERED))
>>>>>> #define ASSERT_DPLL_NOT_REGISTERED(d) \
>>>>>> WARN_ON_ONCE(xa_get_mark(&dpll_device_xa, (d)->id,
>>>>>> DPLL_REGISTERED))
>>>>>> -#define ASSERT_PIN_REGISTERED(p) \
>>>>>> - WARN_ON_ONCE(!xa_get_mark(&dpll_pin_xa, (p)->id,
>>>>>> DPLL_REGISTERED))
>>>>>>
>>>>>> struct dpll_device_registration {
>>>>>> struct list_head list;
>>>>>> @@ -641,8 +639,6 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>> *parent,
>>>>>> struct dpll_pin *pin,
>>>>>> WARN_ON(!ops->state_on_pin_get) ||
>>>>>> WARN_ON(!ops->direction_get))
>>>>>> return -EINVAL;
>>>>>> - if (ASSERT_PIN_REGISTERED(parent))
>>>>>> - return -EINVAL;
>>>>>>
>>>>>> mutex_lock(&dpll_lock);
>>>>>> ret = dpll_xa_ref_pin_add(&pin->parent_refs, parent, ops,
>>>>>> priv); diff
>>>>>> --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>>>>> index
>>>>>> 963bbbbe6660..ff430f43304f 100644
>>>>>> --- a/drivers/dpll/dpll_netlink.c
>>>>>> +++ b/drivers/dpll/dpll_netlink.c
>>>>>> @@ -558,7 +558,7 @@ dpll_pin_event_send(enum dpll_cmd event, struct
>>>>>> dpll_pin *pin)
>>>>>> int ret = -ENOMEM;
>>>>>> void *hdr;
>>>>>>
>>>>>> - if (WARN_ON(!xa_get_mark(&dpll_pin_xa, pin->id,
>>>>>> DPLL_REGISTERED)))
>>>>>> + if (!xa_get_mark(&dpll_pin_xa, pin->id, DPLL_REGISTERED))
>>>>>> return -ENODEV;
>>>>>>
>>>>>> msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
>>>>>> --
>>>>>> 2.38.1
>>>>>>
>>>
>>
^ permalink raw reply
* RE: [PATCH net 2/3] dpll: fix pin dump crash for rebound module
From: Kubalewski, Arkadiusz @ 2023-11-09 23:32 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev@vger.kernel.org, vadim.fedorenko@linux.dev,
Michalik, Michal, Olech, Milena, pabeni@redhat.com,
kuba@kernel.org
In-Reply-To: <ZU0fj5y9mAvVzXuf@nanopsycho>
>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Thursday, November 9, 2023 7:06 PM
>
>Thu, Nov 09, 2023 at 05:30:20PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Jiri Pirko <jiri@resnulli.us>
>>>Sent: Thursday, November 9, 2023 2:19 PM
>>>
>>>Thu, Nov 09, 2023 at 01:20:48PM CET, arkadiusz.kubalewski@intel.com
>>>wrote:
>>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>>Sent: Wednesday, November 8, 2023 3:30 PM
>>>>>
>>>>>Wed, Nov 08, 2023 at 11:32:25AM CET, arkadiusz.kubalewski@intel.com
>>>>>wrote:
>>>>>>When a kernel module is unbound but the pin resources were not
>>>>>>entirely
>>>>>>freed (other kernel module instance have had kept the reference to
>>>>>>that
>>>>>>pin), and kernel module is again bound, the pin properties would not
>>>>>>be
>>>>>>updated (the properties are only assigned when memory for the pin is
>>>>>>allocated), prop pointer still points to the kernel module memory of
>>>>>>the kernel module which was deallocated on the unbind.
>>>>>>
>>>>>>If the pin dump is invoked in this state, the result is a kernel
>>>>>>crash.
>>>>>>Prevent the crash by storing persistent pin properties in dpll
>>>>>>subsystem,
>>>>>>copy the content from the kernel module when pin is allocated, instead
>>>>>>of
>>>>>>using memory of the kernel module.
>>>>>>
>>>>>>Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
>>>>>>Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base
>>>>>>functions")
>>>>>>Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>>>>>>---
>>>>>> drivers/dpll/dpll_core.c | 4 ++--
>>>>>> drivers/dpll/dpll_core.h | 4 ++--
>>>>>> drivers/dpll/dpll_netlink.c | 28 ++++++++++++++--------------
>>>>>> 3 files changed, 18 insertions(+), 18 deletions(-)
>>>>>>
>>>>>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>>>>>index 3568149b9562..4077b562ba3b 100644
>>>>>>--- a/drivers/dpll/dpll_core.c
>>>>>>+++ b/drivers/dpll/dpll_core.c
>>>>>>@@ -442,7 +442,7 @@ dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct
>>>>>>module *module,
>>>>>> ret = -EINVAL;
>>>>>> goto err;
>>>>>> }
>>>>>>- pin->prop = prop;
>>>>>>+ memcpy(&pin->prop, prop, sizeof(pin->prop));
>>>>>
>>>>>Odd, you don't care about the pointer within this structure?
>>>>>
>>>>
>>>>Well, true. Need a fix.
>>>>Wondering if copying idea is better than just assigning prop pointer on
>>>>each call to dpll_pin_get(..) function (when pin already exists)?
>>>
>>>Not sure what do you mean. Examples please.
>>>
>>
>>Sure,
>>
>>Basically this change:
>>
>>diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
>>index ae884b92d68c..06b72d5877c3 100644
>>--- a/drivers/dpll/dpll_core.c
>>+++ b/drivers/dpll/dpll_core.c
>>@@ -483,6 +483,7 @@ dpll_pin_get(u64 clock_id, u32 pin_idx, struct module
>>*module,
>> pos->pin_idx == pin_idx &&
>> pos->module == module) {
>> ret = pos;
>>+ pos->prop = prop;
>> refcount_inc(&ret->refcount);
>> break;
>> }
>>
>>would replace whole of this patch changes, although seems a bit hacky.
>
>Or even better, as I suggested in the other patch reply, resolve this
>internally in the driver, registering things only when they are valid.
>Much better than hacking anything in dpll core.
>
This approach seemed hacky to me, which is why I started with copying the
data.
It is not about registering, but rather about unregistering on driver
unbind, which breaks things and currently cannot be recovered from in the
described case.
Thank you!
Arkadiusz
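To illustrate the review comment quoted above about the pointer inside the structure: a plain memcpy() of the properties copies the pointer value, so the copy still refers to the driver's frequency table. The sketch below is a userspace illustration, not the eventual fix: struct pin_props, struct freq_range and both helpers are hypothetical, and only the freq_supported and freq_supported_num members seen in the diff are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins; member names follow the diff, layout is assumed. */
struct freq_range {
	uint64_t min;
	uint64_t max;
};

struct pin_props {
	uint32_t freq_supported_num;
	struct freq_range *freq_supported;	/* in src: owned by the driver */
};

/*
 * Copy src into dst and duplicate the frequency table, so that dst stays
 * valid after the memory behind src->freq_supported is released.
 */
static int pin_props_dup(struct pin_props *dst, const struct pin_props *src)
{
	size_t bytes;

	assert(dst && src);
	memcpy(dst, src, sizeof(*dst));	/* shallow: scalars and pointer value */
	dst->freq_supported = NULL;	/* never keep the borrowed pointer */
	if (src->freq_supported_num == 0 || !src->freq_supported) {
		dst->freq_supported_num = 0;
		return 0;
	}
	bytes = (size_t)src->freq_supported_num * sizeof(*src->freq_supported);
	dst->freq_supported = malloc(bytes);
	if (!dst->freq_supported)
		return -ENOMEM;
	memcpy(dst->freq_supported, src->freq_supported, bytes);
	return 0;
}

/* Release the duplicated table when the pin itself is freed. */
static void pin_props_free(struct pin_props *p)
{
	free(p->freq_supported);
	p->freq_supported = NULL;
	p->freq_supported_num = 0;
}
```

The alternative from the diff in this message, reassigning pos->prop on every dpll_pin_get(), avoids the allocation but keeps the pin pointing at memory owned by whichever instance called last.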
>
>>
>>Thank you!
>>Arkadiusz
>>
>>>
>>>>
>>>>Thank you!
>>>>Arkadiusz
>>>>
>>>>>
>>>>>> refcount_set(&pin->refcount, 1);
>>>>>> xa_init_flags(&pin->dpll_refs, XA_FLAGS_ALLOC);
>>>>>> xa_init_flags(&pin->parent_refs, XA_FLAGS_ALLOC);
>>>>>>@@ -634,7 +634,7 @@ int dpll_pin_on_pin_register(struct dpll_pin
>>>>>>*parent,
>>>>>>struct dpll_pin *pin,
>>>>>> unsigned long i, stop;
>>>>>> int ret;
>>>>>>
>>>>>>- if (WARN_ON(parent->prop->type != DPLL_PIN_TYPE_MUX))
>>>>>>+ if (WARN_ON(parent->prop.type != DPLL_PIN_TYPE_MUX))
>>>>>> return -EINVAL;
>>>>>>
>>>>>> if (WARN_ON(!ops) ||
>>>>>>diff --git a/drivers/dpll/dpll_core.h b/drivers/dpll/dpll_core.h
>>>>>>index 5585873c5c1b..717f715015c7 100644
>>>>>>--- a/drivers/dpll/dpll_core.h
>>>>>>+++ b/drivers/dpll/dpll_core.h
>>>>>>@@ -44,7 +44,7 @@ struct dpll_device {
>>>>>> * @module: module of creator
>>>>>> * @dpll_refs: hold referencees to dplls pin was registered
>>>>>>with
>>>>>> * @parent_refs: hold references to parent pins pin was registered
>>>>>>with
>>>>>>- * @prop: pointer to pin properties given by registerer
>>>>>>+ * @prop: pin properties copied from the registerer
>>>>>> * @rclk_dev_name: holds name of device when pin can recover
>>>>>>clock
>>>>>>from it
>>>>>> * @refcount: refcount
>>>>>> **/
>>>>>>@@ -55,7 +55,7 @@ struct dpll_pin {
>>>>>> struct module *module;
>>>>>> struct xarray dpll_refs;
>>>>>> struct xarray parent_refs;
>>>>>>- const struct dpll_pin_properties *prop;
>>>>>>+ struct dpll_pin_properties prop;
>>>>>> refcount_t refcount;
>>>>>> };
>>>>>>
>>>>>>diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
>>>>>>index 93fc6c4b8a78..963bbbbe6660 100644
>>>>>>--- a/drivers/dpll/dpll_netlink.c
>>>>>>+++ b/drivers/dpll/dpll_netlink.c
>>>>>>@@ -278,17 +278,17 @@ dpll_msg_add_pin_freq(struct sk_buff *msg,
>>>>>>struct
>>>>>>dpll_pin *pin,
>>>>>> if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY, sizeof(freq),
>>>>>>&freq,
>>>>>> DPLL_A_PIN_PAD))
>>>>>> return -EMSGSIZE;
>>>>>>- for (fs = 0; fs < pin->prop->freq_supported_num; fs++) {
>>>>>>+ for (fs = 0; fs < pin->prop.freq_supported_num; fs++) {
>>>>>> nest = nla_nest_start(msg,
>>>>>>DPLL_A_PIN_FREQUENCY_SUPPORTED);
>>>>>> if (!nest)
>>>>>> return -EMSGSIZE;
>>>>>>- freq = pin->prop->freq_supported[fs].min;
>>>>>>+ freq = pin->prop.freq_supported[fs].min;
>>>>>> if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MIN,
>>>>>>sizeof(freq),
>>>>>> &freq, DPLL_A_PIN_PAD)) {
>>>>>> nla_nest_cancel(msg, nest);
>>>>>> return -EMSGSIZE;
>>>>>> }
>>>>>>- freq = pin->prop->freq_supported[fs].max;
>>>>>>+ freq = pin->prop.freq_supported[fs].max;
>>>>>> if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MAX,
>>>>>>sizeof(freq),
>>>>>> &freq, DPLL_A_PIN_PAD)) {
>>>>>> nla_nest_cancel(msg, nest);
>>>>>>@@ -304,9 +304,9 @@ static bool dpll_pin_is_freq_supported(struct
>>>>>>dpll_pin
>>>>>>*pin, u32 freq)
>>>>>> {
>>>>>> int fs;
>>>>>>
>>>>>>- for (fs = 0; fs < pin->prop->freq_supported_num; fs++)
>>>>>>- if (freq >= pin->prop->freq_supported[fs].min &&
>>>>>>- freq <= pin->prop->freq_supported[fs].max)
>>>>>>+ for (fs = 0; fs < pin->prop.freq_supported_num; fs++)
>>>>>>+ if (freq >= pin->prop.freq_supported[fs].min &&
>>>>>>+ freq <= pin->prop.freq_supported[fs].max)
>>>>>> return true;
>>>>>> return false;
>>>>>> }
>>>>>>@@ -403,7 +403,7 @@ static int
>>>>>> dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
>>>>>> struct netlink_ext_ack *extack)
>>>>>> {
>>>>>>- const struct dpll_pin_properties *prop = pin->prop;
>>>>>>+ const struct dpll_pin_properties *prop = &pin->prop;
>>>>>> struct dpll_pin_ref *ref;
>>>>>> int ret;
>>>>>>
>>>>>>@@ -696,7 +696,7 @@ dpll_pin_on_pin_state_set(struct dpll_pin *pin,
>>>>>>u32
>>>>>>parent_idx,
>>>>>> int ret;
>>>>>>
>>>>>> if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>- pin->prop->capabilities)) {
>>>>>>+ pin->prop.capabilities)) {
>>>>>> NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>> return -EOPNOTSUPP;
>>>>>> }
>>>>>>@@ -732,7 +732,7 @@ dpll_pin_state_set(struct dpll_device *dpll,
>>>>>>struct
>>>>>>dpll_pin *pin,
>>>>>> int ret;
>>>>>>
>>>>>> if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
>>>>>>- pin->prop->capabilities)) {
>>>>>>+ pin->prop.capabilities)) {
>>>>>> NL_SET_ERR_MSG(extack, "state changing is not allowed");
>>>>>> return -EOPNOTSUPP;
>>>>>> }
>>>>>>@@ -759,7 +759,7 @@ dpll_pin_prio_set(struct dpll_device *dpll, struct
>>>>>>dpll_pin *pin,
>>>>>> int ret;
>>>>>>
>>>>>> if (!(DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE &
>>>>>>- pin->prop->capabilities)) {
>>>>>>+ pin->prop.capabilities)) {
>>>>>> NL_SET_ERR_MSG(extack, "prio changing is not allowed");
>>>>>> return -EOPNOTSUPP;
>>>>>> }
>>>>>>@@ -787,7 +787,7 @@ dpll_pin_direction_set(struct dpll_pin *pin,
>>>>>>struct
>>>>>>dpll_device *dpll,
>>>>>> int ret;
>>>>>>
>>>>>> if (!(DPLL_PIN_CAPABILITIES_DIRECTION_CAN_CHANGE &
>>>>>>- pin->prop->capabilities)) {
>>>>>>+ pin->prop.capabilities)) {
>>>>>> NL_SET_ERR_MSG(extack, "direction changing is not
>>>>>>allowed");
>>>>>> return -EOPNOTSUPP;
>>>>>> }
>>>>>>@@ -817,8 +817,8 @@ dpll_pin_phase_adj_set(struct dpll_pin *pin,
>>>>>>struct
>>>>>>nlattr *phase_adj_attr,
>>>>>> int ret;
>>>>>>
>>>>>> phase_adj = nla_get_s32(phase_adj_attr);
>>>>>>- if (phase_adj > pin->prop->phase_range.max ||
>>>>>>- phase_adj < pin->prop->phase_range.min) {
>>>>>>+ if (phase_adj > pin->prop.phase_range.max ||
>>>>>>+ phase_adj < pin->prop.phase_range.min) {
>>>>>> NL_SET_ERR_MSG_ATTR(extack, phase_adj_attr,
>>>>>> "phase adjust value not supported");
>>>>>> return -EINVAL;
>>>>>>@@ -999,7 +999,7 @@ dpll_pin_find(u64 clock_id, struct nlattr
>>>>>>*mod_name_attr,
>>>>>> unsigned long i;
>>>>>>
>>>>>> xa_for_each_marked(&dpll_pin_xa, i, pin, DPLL_REGISTERED) {
>>>>>>- prop = pin->prop;
>>>>>>+ prop = &pin->prop;
>>>>>> cid_match = clock_id ? pin->clock_id == clock_id : true;
>>>>>> mod_match = mod_name_attr && module_name(pin->module) ?
>>>>>> !nla_strcmp(mod_name_attr,
>>>>>>--
>>>>>>2.38.1
>>>>>>
>>>>
^ permalink raw reply
* Re: [PATCH net-next v2 1/1] ptp: clockmatrix: support 32-bit address space
From: Rahul Rameshbabu @ 2023-11-09 23:34 UTC (permalink / raw)
To: Min Li; +Cc: richardcochran, lee, linux-kernel, netdev, Min Li
In-Reply-To: <MW5PR03MB6932A4AAD4F612B45E9F6856A0AFA@MW5PR03MB6932.namprd03.prod.outlook.com>
On Thu, 09 Nov, 2023 13:13:52 -0500 Min Li <lnimi@hotmail.com> wrote:
> From: Min Li <min.li.xe@renesas.com>
>
> We used to assume 0x2010xxxx address. Now that
> we need to access 0x2011xxxx address, we need
> to support read/write the whole 32-bit address space.
>
> Signed-off-by: Min Li <min.li.xe@renesas.com>
> ---
> - Drop MAX_ABS_WRITE_PHASE_PICOSECONDS advised by Rahul
>
> drivers/ptp/ptp_clockmatrix.c | 61 ++--
> drivers/ptp/ptp_clockmatrix.h | 32 +-
> include/linux/mfd/idt8a340_reg.h | 542 ++++++++++++++++---------------
> 3 files changed, 328 insertions(+), 307 deletions(-)
>
> diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
> index f6f9d4adce04..ff316aebff45 100644
> --- a/drivers/ptp/ptp_clockmatrix.c
> +++ b/drivers/ptp/ptp_clockmatrix.c
<snip>
> @@ -1705,10 +1720,14 @@ static s32 idtcm_getmaxphase(struct ptp_clock_info *ptp __always_unused)
> }
>
> /*
> - * Internal function for implementing support for write phase offset
> + * Maximum absolute value for write phase offset in picoseconds
> *
> * @channel: channel
> * @delta_ns: delta in nanoseconds
> + *
> + * Destination signed register is 32-bit register in resolution of 50ps
> + *
> + * 0x7fffffff * 50 = 2147483647 * 50 = 107374182350
You would want to drop these comment changes as well. They were moved to
idtcm_adjphase.
> */
> static int _idtcm_adjphase(struct idtcm_channel *channel, s32 delta_ns)
> {
--
Thanks,
Rahul Rameshbabu
^ permalink raw reply
* RE: [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues
From: Kubalewski, Arkadiusz @ 2023-11-09 23:35 UTC (permalink / raw)
To: Jiri Pirko
Cc: Vadim Fedorenko, netdev@vger.kernel.org, Michalik, Michal,
Olech, Milena, pabeni@redhat.com, kuba@kernel.org
In-Reply-To: <ZU0fzzmmxjnsNW0n@nanopsycho>
>From: Jiri Pirko <jiri@resnulli.us>
>Sent: Thursday, November 9, 2023 7:07 PM
>
>Thu, Nov 09, 2023 at 06:20:14PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>Sent: Thursday, November 9, 2023 11:51 AM
>>>
>>>On 08/11/2023 10:32, Arkadiusz Kubalewski wrote:
>>>> Fix issues when performing unordered unbind/bind of a kernel modules
>>>> which are using a dpll device with DPLL_PIN_TYPE_MUX pins.
>>>> Currently only serialized bind/unbind of such use case works, fix
>>>> the issues and allow for unserialized kernel module bind order.
>>>>
>>>> The issues are observed on the ice driver, i.e.,
>>>>
>>>> $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind
>>>> $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind
>>>>
>>>> results in:
>>>>
>>>> ice 0000:af:00.0: Removed PTP clock
>>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>>> PF: supervisor read access in kernel mode
>>>> PF: error_code(0x0000) - not-present page
>>>> PGD 0 P4D 0
>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>> CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5_next-
>>>>queue_19th-Oct-2023-01625-g039e5d15e451 #1
>>>> Hardware name: Intel Corporation S2600STB/S2600STB, BIOS
>>>>SE5C620.86B.02.01.0008.031920191559 03/19/2019
>>>> RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>> Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b 66
>>>>08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b
>>>>10
>>>>41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b
>>>> RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246
>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>> RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90
>>>> RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000
>>>> R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0
>>>> R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000
>>>> FS: 00007fdc7dae0740(0000) GS:ffff888c105c0000(0000)
>>>>knlGS:0000000000000000
>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0
>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> PKRU: 55555554
>>>> Call Trace:
>>>> <TASK>
>>>> ? __die+0x20/0x70
>>>> ? page_fault_oops+0x76/0x170
>>>> ? exc_page_fault+0x65/0x150
>>>> ? asm_exc_page_fault+0x22/0x30
>>>> ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>> ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice]
>>>> dpll_msg_add_pin_parents+0x142/0x1d0
>>>> dpll_pin_event_send+0x7d/0x150
>>>> dpll_pin_on_pin_unregister+0x3f/0x100
>>>> ice_dpll_deinit_pins+0xa1/0x230 [ice]
>>>> ice_dpll_deinit+0x29/0xe0 [ice]
>>>> ice_remove+0xcd/0x200 [ice]
>>>> pci_device_remove+0x33/0xa0
>>>> device_release_driver_internal+0x193/0x200
>>>> unbind_store+0x9d/0xb0
>>>> kernfs_fop_write_iter+0x128/0x1c0
>>>> vfs_write+0x2bb/0x3e0
>>>> ksys_write+0x5f/0xe0
>>>> do_syscall_64+0x59/0x90
>>>> ? filp_close+0x1b/0x30
>>>> ? do_dup2+0x7d/0xd0
>>>> ? syscall_exit_work+0x103/0x130
>>>> ? syscall_exit_to_user_mode+0x22/0x40
>>>> ? do_syscall_64+0x69/0x90
>>>> ? syscall_exit_work+0x103/0x130
>>>> ? syscall_exit_to_user_mode+0x22/0x40
>>>> ? do_syscall_64+0x69/0x90
>>>> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>> RIP: 0033:0x7fdc7d93eb97
>>>> Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e
>>>>fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0
>>>>ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>>> RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97
>>>> RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001
>>>> RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0
>>>> R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d
>>>> R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0
>>>> </TASK>
>>>> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1 vfio
>>>>irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer
>>>>snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs libcrc32c
>>>>rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod
>>>>target_core_mod
>>>>ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr
>>>>intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
>>>>isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal
>>>>intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt
>>>>iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr
>>>>i2c_i801
>>>>ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus
>>>>ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache
>>>>jbd2
>>>>sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice
>>>>crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci
>>>>ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded: iavf]
>>>> CR2: 0000000000000010
>>>>
>>>> Arkadiusz Kubalewski (3):
>>>> dpll: fix pin dump crash after module unbind
>>>> dpll: fix pin dump crash for rebound module
>>>> dpll: fix register pin with unregistered parent pin
>>>>
>>>> drivers/dpll/dpll_core.c | 8 ++------
>>>> drivers/dpll/dpll_core.h | 4 ++--
>>>> drivers/dpll/dpll_netlink.c | 37 ++++++++++++++++++++++---------------
>>>> 3 files changed, 26 insertions(+), 23 deletions(-)
>>>>
>>>
>>>
>>>I still don't get how we can end up with an unregistered pin. And shouldn't
>>>drivers unregister the dpll/pin during the release procedure? I thought that
>>>was the agreement we reached while developing the subsystem.
>>>
>>
>>It's definitely not about ending up with unregistered pins.
>>
>>Usually the driver is loaded for PF0, PF1, PF2, PF3 and unloaded in the
>>opposite order: PF3, PF2, PF1, PF0. And this is working without any issues.
>
>Please fix this in the driver.
>
Thanks for your feedback, but I think this is the wrong advice.
Our HW/FW is designed differently from yours; that doesn't mean it is wrong.
As you might recall from our sync meetings, the dpll subsystem is meant to unify
approaches and reduce the code in the drivers. Your advice does exactly the
opposite: the suggested fix would require implementing extra synchronization of
the dpll and pin registration state between driver instances, most probably with
the use of additional modules like aux-bus or something similar, which was from
the very beginning something we tried to avoid.
Only ice uses the infrastructure of muxed pins, and it is broken: a driver that
has registered dpll and pins cannot be unbound without crashing the kernel.
So a fix is required in the dpll subsystem, not in the driver.
Thank you!
Arkadiusz
>
>>
>>The above crash is caused by an unordered driver unload, where the dpll
>>subsystem tries to notify that a muxed pin was deleted, but at that time the
>>parent is already gone, so the data points to memory which is no longer
>>available, and the crash happens when trying to dump pin parents.
>>
>>This series fixes all issues I could find connected to the situation where
>>muxed pins are trying to access their parents when the parent registerer was
>>removed in the meantime.
>>
>>Thank you!
>>Arkadiusz
* Re: [RFC v1 0/8] vhost-vdpa: add support for iommufd
From: Michael S. Tsirkin @ 2023-11-09 23:48 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cindy Lu, jasowang, yi.l.liu, linux-kernel, virtualization,
netdev
In-Reply-To: <20231107155217.GQ4488@nvidia.com>
On Tue, Nov 07, 2023 at 11:52:17AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 07, 2023 at 09:30:21AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Nov 07, 2023 at 10:12:37AM -0400, Jason Gunthorpe wrote:
> > > Big companies should take the responsibility to train and provide
> > > skill development for their own staff.
> >
> > That would result in a beautiful cathedral of a patch. I know this is
> > how some companies work. We are doing more of a bazaar thing here,
> > though. In a bunch of subsystems it seems that you don't get the
> > > necessary skills until you have been publicly shouted at by
> > maintainers - better to start early ;). Not a nice environment for
> > novices, for sure.
>
> In my view the "shouting from maintainers" is harmful to the people
> building skills and it is an unkind thing to dump employees into that
> kind of situation.
>
> They should have help to establish the basic level of competence where
> they may do the wrong thing, but all the process and presentation of
> the wrong thing is top notch. You get a much better reception.
>
> Jason
What - like e.g. mechanically fixing checkpatch warnings without
understanding? I actually very much dislike it when people take a bad
patch and just polish the presentation - it is easier for me if I can
judge patch quality quickly from the presentation. Matter of taste I guess.
--
MST
* [PATCH net] gve: Fixes for napi_poll when budget is 0
From: Ziwei Xiao @ 2023-11-09 23:59 UTC (permalink / raw)
To: netdev; +Cc: davem, kuba, Ziwei Xiao
Netpoll will explicitly pass the polling call with a budget of 0 to
indicate it's clearing the Tx path only. gve_rx_poll and gve_xdp_poll
mistakenly took a budget of 0 as the indication to do all the work. Add
checks to avoid calling the rx path and xdp path when budget is 0. Also
add a check to avoid triggering napi_complete_done when budget is 0 for
netpoll.
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Ziwei Xiao <ziweixiao@google.com>
---
drivers/net/ethernet/google/gve/gve_main.c | 10 +++++-----
drivers/net/ethernet/google/gve/gve_rx.c | 4 ----
drivers/net/ethernet/google/gve/gve_tx.c | 4 ----
3 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 276f996f95dc..5a84ccfd3423 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -254,16 +254,16 @@ static int gve_napi_poll(struct napi_struct *napi, int budget)
if (block->tx) {
if (block->tx->q_num < priv->tx_cfg.num_queues)
reschedule |= gve_tx_poll(block, budget);
- else
+ else if (budget)
reschedule |= gve_xdp_poll(block, budget);
}
- if (block->rx) {
+ if (block->rx && budget > 0) {
work_done = gve_rx_poll(block, budget);
reschedule |= work_done == budget;
}
- if (reschedule)
+ if (reschedule || budget == 0)
return budget;
/* Complete processing - don't unmask irq if busy polling is enabled */
@@ -298,12 +298,12 @@ static int gve_napi_poll_dqo(struct napi_struct *napi, int budget)
if (block->tx)
reschedule |= gve_tx_poll_dqo(block, /*do_clean=*/true);
- if (block->rx) {
+ if (block->rx && budget > 0) {
work_done = gve_rx_poll_dqo(block, budget);
reschedule |= work_done == budget;
}
- if (reschedule)
+ if (reschedule || budget == 0)
return budget;
if (likely(napi_complete_done(napi, work_done))) {
diff --git a/drivers/net/ethernet/google/gve/gve_rx.c b/drivers/net/ethernet/google/gve/gve_rx.c
index e84a066aa1a4..73655347902d 100644
--- a/drivers/net/ethernet/google/gve/gve_rx.c
+++ b/drivers/net/ethernet/google/gve/gve_rx.c
@@ -1007,10 +1007,6 @@ int gve_rx_poll(struct gve_notify_block *block, int budget)
feat = block->napi.dev->features;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
if (budget > 0)
work_done = gve_clean_rx_done(rx, budget, feat);
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index 6957a865cff3..9f6ffc4a54f0 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -925,10 +925,6 @@ bool gve_xdp_poll(struct gve_notify_block *block, int budget)
bool repoll;
u32 to_do;
- /* If budget is 0, do all the work */
- if (budget == 0)
- budget = INT_MAX;
-
/* Find out how much work there is to be done */
nic_done = gve_tx_load_event_counter(priv, tx);
to_do = min_t(u32, (nic_done - tx->done), budget);
--
2.43.0.rc0.421.g78406f8d94-goog
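
For readers following the budget handling in the patch above, here is a minimal user-space sketch of the control flow it describes. It is not the gve driver code: struct demo_block and the demo_* functions are hypothetical stand-ins, and the "completed" flag only marks the point where a real driver would call napi_complete_done(). It assumes TX cleanup may run regardless of budget, while RX work and completion are skipped when budget is 0.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a driver notify block; not the real
 * struct gve_notify_block. */
struct demo_block {
	int tx_pending;  /* TX completions waiting to be reclaimed */
	int rx_pending;  /* received packets waiting to be processed */
	bool completed;  /* set where a driver would call napi_complete_done() */
};

/* TX cleanup is permitted even when budget is 0, so reclaim everything.
 * Returns true if TX work remains (never, in this simplified model). */
static bool demo_tx_poll(struct demo_block *b)
{
	b->tx_pending = 0;
	return false;
}

/* Process at most `budget` RX packets and return how many were handled. */
static int demo_rx_poll(struct demo_block *b, int budget)
{
	int done = b->rx_pending < budget ? b->rx_pending : budget;

	b->rx_pending -= done;
	return done;
}

/* Poll routine modelled on the shape of the patched gve_napi_poll(). */
static int demo_napi_poll(struct demo_block *b, int budget)
{
	bool reschedule = false;
	int work_done = 0;

	reschedule |= demo_tx_poll(b);

	/* Skip RX entirely when budget is 0 (the netpoll case). */
	if (budget > 0) {
		work_done = demo_rx_poll(b, budget);
		reschedule |= work_done == budget;
	}

	/* With budget 0, return before signalling completion. */
	if (reschedule || budget == 0)
		return budget;

	b->completed = true;
	return work_done;
}
```

Calling demo_napi_poll() with a budget of 0 on a block that has both TX and RX work pending should clear only the TX side and leave "completed" unset; a later call with a positive budget drains RX and marks completion.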
* Re: [net-next RFC PATCH v6 1/4] net: phy: aquantia: move to separate directory
From: Andrew Lunn @ 2023-11-10 0:12 UTC (permalink / raw)
To: Christian Marangi
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiner Kallweit,
Russell King, Robert Marko, Vladimir Oltean, netdev, devicetree,
linux-kernel
In-Reply-To: <20231109123253.3933-1-ansuelsmth@gmail.com>
On Thu, Nov 09, 2023 at 01:32:50PM +0100, Christian Marangi wrote:
> Move aquantia PHY driver to separate directory in preparation for
> firmware loading support to keep things tidy.
>
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew