Date: Thu, 9 Nov 2023 19:07:11 +0100
From: Jiri Pirko
To: "Kubalewski, Arkadiusz"
Cc: Vadim Fedorenko, "netdev@vger.kernel.org", "Michalik, Michal", "Olech, Milena", "pabeni@redhat.com", "kuba@kernel.org"
Subject: Re: [PATCH net 0/3] dpll: fix unordered unbind/bind registerer issues
References: <20231108103226.1168500-1-arkadiusz.kubalewski@intel.com> <4c251905-308b-4709-8e08-39cda85678f9@linux.dev>
X-Mailing-List: netdev@vger.kernel.org

Thu, Nov 09, 2023 at 06:20:14PM CET, arkadiusz.kubalewski@intel.com wrote:
>>From: Vadim Fedorenko
>>Sent: Thursday, November 9, 2023 11:51 AM
>>
>>On 08/11/2023 10:32, Arkadiusz Kubalewski wrote:
>>> Fix issues when performing unordered unbind/bind of a kernel modules
>>> which are using a dpll device with DPLL_PIN_TYPE_MUX pins.
>>> Currently only serialized bind/unbind of such use case works, fix
>>> the issues and allow for unserialized kernel module bind order.
>>>
>>> The issues are observed on the ice driver, i.e.,
>>>
>>> $ echo 0000:af:00.0 > /sys/bus/pci/drivers/ice/unbind
>>> $ echo 0000:af:00.1 > /sys/bus/pci/drivers/ice/unbind
>>>
>>> results in:
>>>
>>> ice 0000:af:00.0: Removed PTP clock
>>> BUG: kernel NULL pointer dereference, address: 0000000000000010
>>> PF: supervisor read access in kernel mode
>>> PF: error_code(0x0000) - not-present page
>>> PGD 0 P4D 0
>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>> CPU: 7 PID: 71848 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5_next-
>>>queue_19th-Oct-2023-01625-g039e5d15e451 #1
>>> Hardware name: Intel Corporation S2600STB/S2600STB, BIOS
>>>SE5C620.86B.02.01.0008.031920191559 03/19/2019
>>> RIP: 0010:ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>> Code: 41 57 4d 89 cf 41 56 41 55 4d 89 c5 41 54 55 48 89 f5 53 4c 8b 66
>>>08 48 89 cb 4d 8d b4 24 f0 49 00 00 4c 89 f7 e8 71 ec 1f c5 <0f> b6 5b 10
>>>41 0f b6 84 24 30 4b 00 00 29 c3 41 0f b6 84 24 28 4b
>>> RSP: 0018:ffffc902b179fb60 EFLAGS: 00010246
>>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> RDX: ffff8882c1398000 RSI: ffff888c7435cc60 RDI: ffff888c7435cb90
>>> RBP: ffff888c7435cc60 R08: ffffc902b179fbb0 R09: 0000000000000000
>>> R10: ffff888ef1fc8050 R11: fffffffffff82700 R12: ffff888c743581a0
>>> R13: ffffc902b179fbb0 R14: ffff888c7435cb90 R15: 0000000000000000
>>> FS: 00007fdc7dae0740(0000) GS:ffff888c105c0000(0000)
>>knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 0000000000000010 CR3: 0000000132c24002 CR4: 00000000007706e0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> PKRU: 55555554
>>> Call Trace:
>>>
>>> ? __die+0x20/0x70
>>> ? page_fault_oops+0x76/0x170
>>> ? exc_page_fault+0x65/0x150
>>> ? asm_exc_page_fault+0x22/0x30
>>> ? ice_dpll_rclk_state_on_pin_get+0x2f/0x90 [ice]
>>> ? __pfx_ice_dpll_rclk_state_on_pin_get+0x10/0x10 [ice]
>>> dpll_msg_add_pin_parents+0x142/0x1d0
>>> dpll_pin_event_send+0x7d/0x150
>>> dpll_pin_on_pin_unregister+0x3f/0x100
>>> ice_dpll_deinit_pins+0xa1/0x230 [ice]
>>> ice_dpll_deinit+0x29/0xe0 [ice]
>>> ice_remove+0xcd/0x200 [ice]
>>> pci_device_remove+0x33/0xa0
>>> device_release_driver_internal+0x193/0x200
>>> unbind_store+0x9d/0xb0
>>> kernfs_fop_write_iter+0x128/0x1c0
>>> vfs_write+0x2bb/0x3e0
>>> ksys_write+0x5f/0xe0
>>> do_syscall_64+0x59/0x90
>>> ? filp_close+0x1b/0x30
>>> ? do_dup2+0x7d/0xd0
>>> ? syscall_exit_work+0x103/0x130
>>> ? syscall_exit_to_user_mode+0x22/0x40
>>> ? do_syscall_64+0x69/0x90
>>> ? syscall_exit_work+0x103/0x130
>>> ? syscall_exit_to_user_mode+0x22/0x40
>>> ? do_syscall_64+0x69/0x90
>>> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>>> RIP: 0033:0x7fdc7d93eb97
>>> Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e
>>fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0
>>ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>> RSP: 002b:00007fff2aa91028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>> RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fdc7d93eb97
>>> RDX: 000000000000000d RSI: 00005644814ec9b0 RDI: 0000000000000001
>>> RBP: 00005644814ec9b0 R08: 0000000000000000 R09: 00007fdc7d9b14e0
>>> R10: 00007fdc7d9b13e0 R11: 0000000000000246 R12: 000000000000000d
>>> R13: 00007fdc7d9fb780 R14: 000000000000000d R15: 00007fdc7d9f69e0
>>>
>>> Modules linked in: uinput vfio_pci vfio_pci_core vfio_iommu_type1 vfio
>>>irqbypass ixgbevf snd_seq_dummy snd_hrtimer snd_seq snd_timer
>>>snd_seq_device snd soundcore overlay qrtr rfkill vfat fat xfs libcrc32c
>>>rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod
>>>ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm intel_rapl_msr
>>>intel_rapl_common intel_uncore_frequency
>>>intel_uncore_frequency_common
>>>isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal
>>>intel_powerclamp coretemp irdma rapl intel_cstate ib_uverbs iTCO_wdt
>>>iTCO_vendor_support acpi_ipmi intel_uncore mei_me ipmi_si pcspkr i2c_i801
>>>ib_core mei ipmi_devintf intel_pch_thermal ioatdma i2c_smbus
>>>ipmi_msghandler lpc_ich joydev acpi_power_meter acpi_pad ext4 mbcache jbd2
>>>sd_mod t10_pi sg ast i2c_algo_bit drm_shmem_helper drm_kms_helper ice
>>>crct10dif_pclmul ixgbe crc32_pclmul drm crc32c_intel ahci i40e libahci
>>>ghash_clmulni_intel libata mdio dca gnss wmi fuse [last unloaded: iavf]
>>> CR2: 0000000000000010
>>>
>>> Arkadiusz Kubalewski (3):
>>> dpll: fix pin dump crash after module unbind
>>> dpll: fix pin dump crash for rebound module
>>> dpll: fix register pin with unregistered parent pin
>>>
>>> drivers/dpll/dpll_core.c | 8 ++------
>>> drivers/dpll/dpll_core.h | 4 ++--
>>> drivers/dpll/dpll_netlink.c | 37 ++++++++++++++++++++++---------------
>>> 3 files changed, 26 insertions(+), 23 deletions(-)
>>>
>>
>>
>>I still don't get how we can end up with an unregistered pin. And shouldn't
>>drivers unregister the dpll/pin during the release procedure? I thought that
>>was kind of an agreement we reached while developing the subsystem.
>>
>
>It's definitely not about ending up with unregistered pins.
>
>Usually the driver is loaded for PF0, PF1, PF2, PF3 and unloaded in the opposite
>order: PF3, PF2, PF1, PF0. And this works without any issues.

Please fix this in the driver.

>
>The above crash is caused by an unordered driver unload, where the dpll subsystem
>tries to notify that a muxed pin was deleted, but at that time the parent is already
>gone, so the data points to memory which is no longer available, and the crash
>happens when trying to dump pin parents.
>
>This series fixes all issues I could find connected to the situation where
>muxed pins try to access their parents when the parent registerer was removed
>in the meantime.
>
>Thank you!
>Arkadiusz
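
To illustrate the failure mode described in the quoted text, here is a minimal userspace sketch in C. It is not the patch from this series and does not use the kernel's dpll structures: `parent_priv`, `parent_ref`, `muxed_pin`, `parent_unbind()` and `dump_parents()` are all hypothetical names made up for the example. It assumes that a parent's private data becomes NULL once its registerer is unbound, and shows a dump routine that skips such parents instead of dereferencing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PARENTS 4

/* Hypothetical per-parent driver data; stands in for whatever the
 * registering driver attached to the parent pin. */
struct parent_priv {
	int rclk_state;
};

/* Hypothetical reference from a muxed pin to one of its parents. */
struct parent_ref {
	const char *name;
	struct parent_priv *priv; /* NULL once the parent's registerer is gone */
};

/* Hypothetical muxed pin holding references to its parents. */
struct muxed_pin {
	const char *name;
	struct parent_ref parents[MAX_PARENTS];
	size_t n_parents;
};

/* Model an unbind of the module that registered parent idx. */
static void parent_unbind(struct muxed_pin *pin, size_t idx)
{
	if (idx < pin->n_parents)
		pin->parents[idx].priv = NULL;
}

/* Dump the state of every parent that is still registered.
 * Parents without private data are skipped rather than dereferenced.
 * Returns the number of parents reported. */
static size_t dump_parents(const struct muxed_pin *pin, FILE *out)
{
	size_t reported = 0;

	for (size_t i = 0; i < pin->n_parents; i++) {
		const struct parent_ref *ref = &pin->parents[i];

		if (!ref->priv)
			continue; /* registerer already removed */

		fprintf(out, "%s: parent %s state %d\n",
			pin->name, ref->name, ref->priv->rclk_state);
		reported++;
	}
	return reported;
}
```

With two parents registered, unbinding them in any order makes `dump_parents()` report fewer entries instead of faulting. Whether such a guard belongs in the dpll core or in the driver is exactly what this thread is debating, so the sketch takes no position on that.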