From: Jiri Pirko <jiri@resnulli.us>
To: "Kubalewski, Arkadiusz" <arkadiusz.kubalewski@intel.com>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Michalik, Michal" <michal.michalik@intel.com>,
	"Olech, Milena" <milena.olech@intel.com>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"kuba@kernel.org" <kuba@kernel.org>
Subject: Re: [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues
Date: Fri, 10 Nov 2023 11:09:25 +0100
Message-ID: <ZU4BVdPvAwKer+3v@nanopsycho>
In-Reply-To: <DM6PR11MB4657DE812ADB8C5079705DC99BAEA@DM6PR11MB4657.namprd11.prod.outlook.com>

Fri, Nov 10, 2023 at 10:06:59AM CET, arkadiusz.kubalewski@intel.com wrote:
>>From: Jiri Pirko <jiri@resnulli.us>
>>Sent: Friday, November 10, 2023 7:49 AM
>>
>>Fri, Nov 10, 2023 at 12:35:43AM CET, arkadiusz.kubalewski@intel.com wrote:
>>>>From: Jiri Pirko <jiri@resnulli.us>
>>>>Sent: Thursday, November 9, 2023 7:07 PM
>>>>
>>>>Thu, Nov 09, 2023 at 06:20:14PM CET, arkadiusz.kubalewski@intel.com wrote:
>>>>>>From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>>>>>>Sent: Thursday, November 9, 2023 11:51 AM
>>>>>>
>>>>>>On 08/11/2023 10:32, Arkadiusz Kubalewski wrote:
>>>>>>> Fix issues when performing an unordered unbind/bind of kernel modules
>>>>>>> that use a dpll device with DPLL_PIN_TYPE_MUX pins.
>>>>>>> Currently only a serialized bind/unbind works in this use case; fix
>>>>>>> the issues and allow an unserialized kernel module bind order.
>>>>>>>
>>>>>>> The issues are observed on the ice driver, i.e.,
>>>>>>>
>>>>>>> $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind
>>>>>>> $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind
>>>>>>>
>>>>>>> results in:
>>>>>>>
>>>>>>> ice 0000:af:00.0: Removed PTP clock
>>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>>>>>> PF: supervisor read access in kernel mode
>>>>>>> PF: error_code(0x0000) - not-present page
>>>>>>> PGD 0 P4D 0
>>>>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>> CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5_next-queue_19th-Oct-2023-01625-g039e5d15e451 #1
>>>>>>> Hardware name: Intel Corporation S2600STB/S2600STB, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
>>>>>>> RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>>>> Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b 66 08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b 10 41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b
>>>>>>> RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246
>>>>>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>>>>> RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90
>>>>>>> RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000
>>>>>>> R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0
>>>>>>> R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000
>>>>>>> FS:  00007fdc7dae0740(0000) GS:ffff888c105c0000(0000) knlGS:0000000000000000
>>>>>>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0
>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> PKRU: 55555554
>>>>>>> Call Trace:
>>>>>>>   <TASK>
>>>>>>>   ? __die+0x20/0x70
>>>>>>>   ? page_fault_oops+0x76/0x170
>>>>>>>   ? exc_page_fault+0x65/0x150
>>>>>>>   ? asm_exc_page_fault+0x22/0x30
>>>>>>>   ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>>>>>>   ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice]
>>>>>>>   dpll_msg_add_pin_parents+0x142/0x1d0
>>>>>>>   dpll_pin_event_send+0x7d/0x150
>>>>>>>   dpll_pin_on_pin_unregister+0x3f/0x100
>>>>>>>   ice_dpll_deinit_pins+0xa1/0x230 [ice]
>>>>>>>   ice_dpll_deinit+0x29/0xe0 [ice]
>>>>>>>   ice_remove+0xcd/0x200 [ice]
>>>>>>>   pci_device_remove+0x33/0xa0
>>>>>>>   device_release_driver_internal+0x193/0x200
>>>>>>>   unbind_store+0x9d/0xb0
>>>>>>>   kernfs_fop_write_iter+0x128/0x1c0
>>>>>>>   vfs_write+0x2bb/0x3e0
>>>>>>>   ksys_write+0x5f/0xe0
>>>>>>>   do_syscall_64+0x59/0x90
>>>>>>>   ? filp_close+0x1b/0x30
>>>>>>>   ? do_dup2+0x7d/0xd0
>>>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>>>   ? do_syscall_64+0x69/0x90
>>>>>>>   ? syscall_exit_work+0x103/0x130
>>>>>>>   ? syscall_exit_to_user_mode+0x22/0x40
>>>>>>>   ? do_syscall_64+0x69/0x90
>>>>>>>   entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>>>>>> RIP: 0033:0x7fdc7d93eb97
>>>>>>> Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>>>>>> RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>>>>>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97
>>>>>>> RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001
>>>>>>> RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0
>>>>>>> R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d
>>>>>>> R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0
>>>>>>>   </TASK>
>>>>>>> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1 vfio irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs libcrc32c rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr i2c_i801 ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache jbd2 sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded: iavf]
>>>>>>> CR2: 0000000000000010
>>>>>>>
>>>>>>> Arkadiusz Kubalewski (3):
>>>>>>>    dpll: fix pin dump crash after module unbind
>>>>>>>    dpll: fix pin dump crash for rebound module
>>>>>>>    dpll: fix register pin with unregistered parent pin
>>>>>>>
>>>>>>>   drivers/dpll/dpll_core.c    |  8 ++------
>>>>>>>   drivers/dpll/dpll_core.h    |  4 ++--
>>>>>>>   drivers/dpll/dpll_netlink.c | 37 ++++++++++++++++++++++---------------
>>>>>>>   3 files changed, 26 insertions(+), 23 deletions(-)
>>>>>>>
>>>>>>
>>>>>>
>>>>>>I still don't get how we can end up with an unregistered pin. And shouldn't
>>>>>>drivers unregister the dpll/pin during the release procedure? I thought that
>>>>>>was kind of an agreement we reached while developing the subsystem.
>>>>>>
>>>>>
>>>>>It's definitely not about ending up with unregistered pins.
>>>>>
>>>>>Usually the driver is loaded for PF0, PF1, PF2, PF3 and unloaded in the
>>>>>opposite order: PF3, PF2, PF1, PF0. This works without any issues.
>>>>
>>>>Please fix this in the driver.
>>>>
>>>
>>>Thanks for your feedback, but this is already wrong advice.
>>>
>>>Our HW/FW is designed differently from yours; that doesn't mean it is wrong.
>>>As you might recall from our sync meetings, the dpll subsystem is meant to
>>>unify approaches and reduce the code in the drivers, whereas your advice does
>>>exactly the opposite: the suggested fix would require implementing extra
>>>synchronization of the dpll and pin registration state between driver
>>>instances, most probably with additional modules like aux-bus or something
>>>similar, which we tried to avoid from the very beginning.
>>>Only ice uses the infrastructure of muxed pins, and it is broken, as it
>>>doesn't allow unbinding a driver that has registered dpll and pins without
>>>crashing the kernel, so a fix is required in the dpll subsystem, not in the
>>>driver.
>>
>>I replied in the other patch thread.
>>
>
>Yes, so did I.
>But what is the reason you have moved the discussion from the other thread
>into this one?

I didn't, not sure why you say so. I just wanted to make sure you
follow.

>
>Thank you!
>Arkadiusz
>
>>
>>>
>>>Thank you!
>>>Arkadiusz
>>>
>>>>
>>>>>
>>>>>The above crash is caused by an unordered driver unload: the dpll subsystem
>>>>>tries to notify that a muxed pin was deleted, but by then the parent is
>>>>>already gone, so the data points to memory that is no longer available and
>>>>>the crash happens when trying to dump the pin parents.
>>>>>
>>>>>This series fixes all the issues I could find related to muxed pins trying
>>>>>to access their parents after the parent's registerer was removed in the
>>>>>meantime.
>>>>>
>>>>>Thank you!
>>>>>Arkadiusz
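
The failure mode described in the quoted text (a muxed pin's removal
notification dumping a parent whose registerer is already gone) can be
pictured with a small userspace model. This is only a sketch under stated
assumptions: struct model_pin, model_report_parent() and model_unregister()
are hypothetical names invented for illustration, they are not the dpll
subsystem API, and the guard shown is not claimed to be what the series
actually implements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_pin {
	bool registered;          /* cleared when the owning instance unbinds */
	int state;                /* stand-in for driver-private pin data */
	struct model_pin *parent; /* MUX-type parent pin, or NULL */
};

/* Mark a pin as no longer backed by a bound driver instance. */
void model_unregister(struct model_pin *pin)
{
	pin->registered = false;
}

/*
 * Build the "parent" part of a pin notification.
 * Returns the number of parent entries reported (0 or 1) and writes the
 * parent's state to *out only when the parent is still registered, so a
 * parent that went away first is skipped instead of being dereferenced.
 */
int model_report_parent(const struct model_pin *pin, int *out)
{
	const struct model_pin *parent = pin->parent;

	if (parent == NULL || !parent->registered)
		return 0;

	*out = parent->state;
	return 1;
}
```

In this model, unregistering the parent before the child (the unordered
unbind case) makes model_report_parent() return 0 rather than touching the
parent's data.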

Thread overview: 42+ messages
2023-11-08 10:32 [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues Arkadiusz Kubalewski
2023-11-08 10:32 ` [PATCH net 1/3] dpll: fix pin dump crash after module unbind Arkadiusz Kubalewski
2023-11-08 11:36   ` Przemek Kitszel
2023-11-08 12:08     ` Kubalewski, Arkadiusz
2023-11-08 15:08   ` Jiri Pirko
2023-11-09  9:49     ` Kubalewski, Arkadiusz
2023-11-09 13:18       ` Jiri Pirko
2023-11-09 16:33         ` Kubalewski, Arkadiusz
2023-11-08 10:32 ` [PATCH net 2/3] dpll: fix pin dump crash for rebound module Arkadiusz Kubalewski
2023-11-08 14:30   ` Jiri Pirko
2023-11-09 12:20     ` Kubalewski, Arkadiusz
2023-11-09 13:19       ` Jiri Pirko
2023-11-09 16:30         ` Kubalewski, Arkadiusz
2023-11-09 18:06           ` Jiri Pirko
2023-11-09 23:32             ` Kubalewski, Arkadiusz
2023-11-10  6:45               ` Jiri Pirko
2023-11-10  9:01                 ` Kubalewski, Arkadiusz
2023-11-10 10:06                   ` Jiri Pirko
2023-11-10 11:18                     ` Kubalewski, Arkadiusz
2023-11-10 11:44                       ` Jiri Pirko
2023-11-10 14:11                         ` Kubalewski, Arkadiusz
2023-11-08 10:32 ` [PATCH net 3/3] dpll: fix register pin with unregistered parent pin Arkadiusz Kubalewski
2023-11-08 15:07   ` Jiri Pirko
2023-11-09  9:59     ` Kubalewski, Arkadiusz
2023-11-09 10:56       ` Vadim Fedorenko
2023-11-09 16:02         ` Kubalewski, Arkadiusz
2023-11-09 18:04           ` Jiri Pirko
2023-11-09 23:21             ` Kubalewski, Arkadiusz
2023-11-10  6:48               ` Jiri Pirko
2023-11-10  8:50                 ` Kubalewski, Arkadiusz
2023-11-10 10:07                   ` Jiri Pirko
2023-11-10 11:19                     ` Kubalewski, Arkadiusz
2023-11-09 13:20       ` Jiri Pirko
2023-11-09 16:13         ` Kubalewski, Arkadiusz
2023-11-09 10:50 ` [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues Vadim Fedorenko
2023-11-09 17:20   ` Kubalewski, Arkadiusz
2023-11-09 18:07     ` Jiri Pirko
2023-11-09 23:35       ` Kubalewski, Arkadiusz
2023-11-10  6:48         ` Jiri Pirko
2023-11-10  9:06           ` Kubalewski, Arkadiusz
2023-11-10 10:09             ` Jiri Pirko [this message]
2023-11-10 11:22               ` Kubalewski, Arkadiusz
