Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"

From: Igor Pylypiv <ipylypiv@google.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Niklas Cassel <cassel@kernel.org>,
	Jack Wang <jinpu.wang@cloud.ionos.com>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Terrence Adams <tadamsjr@google.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
Date: Sat, 2 Aug 2025 11:31:20 -0700
Message-ID: <aI5ZeAweRWKQmLPU@google.com>
In-Reply-To: <72881ac7-f276-49d6-8918-a81d41502d11@kernel.org>

On Wed, Jul 23, 2025 at 10:38:56AM +0900, Damien Le Moal wrote:
> On 7/23/25 6:36 AM, Igor Pylypiv wrote:
> >>> And it works, I can see the drives in the enclosure behind the expander.
> >>> Care to send a proper patch?
> >>>
> >>> I think this needs more testing though, especially special cases like yanking
> >>> the SAS cable and doing device hotplug/unplug. Will do that later today.
> >>
> >> So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
> >> crashes the kernel on a bad pointer dereference (invalid port address). Same if
> >> I hot-unplug drives from the enclosure. But that happens even with only Niklas'
> >> revert patch applied. So I think that is unrelated to this change.
> >>
> >> That said, I will dig further to understand how the port pointers become
> >> invalid, and make sure this change is OK. Note that there are no issues that I
> >> can see when there is no expander (drives directly attached to the HBA).
> > 
> > Thank you for testing, Damien!
> > 
> > Just guessing, would defining the lldd_port_deformed() callback help?
> > The callback can set lldd_port to NULL if the problem is due to a dangling
> > lldd_port pointer.
> 
> Not sure if that is needed yet. The crash I am seeing is:
> 
> [56961.621080] BUG: unable to handle page fault for address: ff303500e1ee00ac
> [56961.629527] #PF: supervisor read access in kernel mode
> [56961.635315] #PF: error_code(0x0000) - not-present page
> [56961.641102] PGD 95e001067 P4D 95e002067 PUD 0
> [56961.646113] Oops: Oops: 0000 [#1] SMP
> [56961.650244] CPU: 10 UID: 0 PID: 3373 Comm: rmmod Not tainted 6.16.0-rc7+ #380 PREEMPT(voluntary)
> [56961.660238] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
> [56961.668664] RIP: 0010:do_raw_spin_lock+0xa/0xb0
> [56961.673776] Code: ff 48 c7 03 00 00 00 00 48 c7 43 10 ff ff ff ff 48 c7 43 08 ed 1e af de 48 83 c4 10 5b 5d c3 90 66 0f 1f 00 0f 1f 44 00 00 53 <8b> 47 04 48 89 fb 3d ad 4e ad de 75 41 48 8b 43 10 65 48 3b 05 4d
> [56961.694912] RSP: 0018:ff31e51a4f09fa60 EFLAGS: 00010082
> [56961.700799] RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000000
> [56961.708841] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff303500e1ee00a8
> [56961.716880] RBP: ff303500e1ee00a8 R08: 0000000000000001 R09: 0000000000000000
> [56961.724920] R10: 0000000000000000 R11: 0000000000000000 R12: ff303500e1ee0000
> [56961.732958] R13: ff303504f4f64000 R14: ff303504e1ee0000 R15: ff303504d9f60000
> [56961.740999] FS:  00007f126637a740(0000) GS:ff3035145d8ed000(0000) knlGS:0000000000000000
> [56961.750114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [56961.756590] CR2: ff303500e1ee00ac CR3: 00000001f2f38001 CR4: 0000000000773ef0
> [56961.765229] PKRU: 55555554
> [56961.768887] Call Trace:
> [56961.772249]  <TASK>
> [56961.775212]  _raw_spin_lock_irqsave+0x41/0x50
> [56961.780723]  ? sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.786532]  sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.792133]  sas_unregister_common_dev+0xc3/0x1a0 [libsas]
> [56961.798897]  sas_destruct_devices+0x9f/0xc0 [libsas]
> [56961.805073]  sas_deform_port.cold+0x83/0x288 [libsas]
> [56961.811330]  sas_unregister_ports+0x36/0x50 [libsas]
> [56961.817484]  sas_unregister_ha+0x56/0x90 [libsas]
> [56961.823343]  pm8001_pci_remove+0x28/0x160 [pm80xx]
> [56961.829315]  pci_device_remove+0x44/0xb0
> [56961.834287]  device_release_driver_internal+0x1a4/0x210
> [56961.840718]  driver_detach+0x4b/0x90
> [56961.845297]  bus_remove_driver+0x70/0xf0
> [56961.850265]  pci_unregister_driver+0x2f/0xb0
> [56961.855625]  pm8001_exit+0x10/0x28 [pm80xx]
> [56961.860909]  __x64_sys_delete_module+0x19e/0x2d0
> [56961.866650]  do_syscall_64+0x92/0x380
> [56961.871310]  ? __x64_sys_close+0x3d/0x80
> [56961.876246]  ? trace_hardirqs_on_prepare+0x77/0xa0
> [56961.882156]  ? do_syscall_64+0x154/0x380
> [56961.887081]  ? do_fault+0x3ef/0x720
> [56961.891510]  ? ___pte_offset_map+0x43/0x1c0
> [56961.896711]  ? ___pte_offset_map+0x24/0x1c0
> [56961.901901]  ? __handle_mm_fault+0x5f2/0x1140
> [56961.907278]  ? ksys_read+0x79/0xf0
> [56961.911570]  ? lock_acquire+0x280/0x2e0
> [56961.916339]  ? lock_acquire+0x280/0x2e0
> [56961.921103]  ? lock_release+0x222/0x2d0
> [56961.925865]  ? rcu_read_unlock+0x1c/0x60
> [56961.930720]  ? lock_release+0x222/0x2d0
> [56961.935476]  ? do_user_addr_fault+0x368/0x6a0
> [56961.940811]  ? trace_hardirqs_off+0x40/0xb0
> [56961.945936]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [56961.952031] RIP: 0033:0x7f1265d02a2b
> [56961.956444] Code: 73 01 c3 48 8b 0d dd 33 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 33 0f 00 f7 d8 64 89 01 48
> [56961.978361] RSP: 002b:00007ffe899f06c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
> [56961.987286] RAX: ffffffffffffffda RBX: 0000559d15a32750 RCX: 00007f1265d02a2b
> [56961.995727] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559d15a327b8
> [56962.004170] RBP: 00007ffe899f06f0 R08: 0000000000000000 R09: 0000000000000000
> [56962.012612] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> [56962.021058] R13: 00007ffe899f1701 R14: 00007ffe899f0940 R15: 0000000000000000
> 
> So it is here:
> 
> void sas_ata_end_eh(struct ata_port *ap)
> {
>         struct domain_device *dev = ap->private_data;
>         struct sas_ha_struct *ha = dev->port->ha;
>         unsigned long flags;
> 
>         spin_lock_irqsave(&ha->lock, flags);
> 	...
> 
> Meaning that the ha pointer is broken...
> Not sure why yet, but I get the exact same place for the crash if I hot-unplug
> a drive from the enclosure. So something is wrong with removing devices when
> there is an expander. Still digging into it.
>
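For anyone following along, the register dump above narrows the fault down a
little. The faulting instruction bytes are `<8b> 47 04`, a 32-bit load from
4(%rdi), and CR2 is exactly RDI + 4. The `3d ad 4e ad de` that follows is a
compare against 0xdead4ead, which looks like the spinlock debug magic check, so
the fault would be the very first read of the lock. Below is a minimal sketch
of that arithmetic in plain userspace C. The helper names are made up for
illustration, and treating R12 as the ha pointer is only an assumption based on
RDI being R12 + 0xa8; nothing here is taken from libsas itself.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Register values copied verbatim from the oops quoted above. */
#define OOPS_CR2 UINT64_C(0xff303500e1ee00ac) /* faulting address */
#define OOPS_RDI UINT64_C(0xff303500e1ee00a8) /* lock pointer passed to do_raw_spin_lock() */
#define OOPS_R12 UINT64_C(0xff303500e1ee0000) /* ASSUMPTION: base pointer the lock is embedded in */
#define OOPS_R14 UINT64_C(0xff303504e1ee0000) /* another pointer live in the same frame */

/* Hypothetical helper: displacement of the faulting read relative to RDI. */
static uint64_t fault_offset_from_rdi(void)
{
	return OOPS_CR2 - OOPS_RDI;
}

/* Hypothetical helper: offset of the lock inside the object based at R12. */
static uint64_t lock_offset_from_r12(void)
{
	return OOPS_RDI - OOPS_R12;
}

/* Hypothetical helper: bits in which R12 and R14 differ. */
static uint64_t r12_xor_r14(void)
{
	return OOPS_R12 ^ OOPS_R14;
}

/* True if exactly one bit is set in x. */
static bool is_single_bit(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/* Print the three derived values so they can be compared with the dump. */
static void print_oops_arithmetic(void)
{
	assert(fault_offset_from_rdi() == 4); /* matches the 4(%rdi) load */
	printf("CR2 - RDI = 0x%" PRIx64 "\n", fault_offset_from_rdi());
	printf("RDI - R12 = 0x%" PRIx64 "\n", lock_offset_from_r12());
	printf("R12 ^ R14 = 0x%" PRIx64 " (single bit: %s)\n",
	       r12_xor_r14(), is_single_bit(r12_xor_r14()) ? "yes" : "no");
}
```

Calling print_oops_arithmetic() from a small main() is enough to reproduce the
numbers. Note that R12 and R14 differ in a single bit (bit 34). That may well
be coincidence and is only an observation from the dump, not a diagnosis, but
it is consistent with the page at R12 not being mapped while nearby pointers
in the ff303504... range are in use.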

Hi Damien,

Do you happen to have any updates regarding this crash?

Thanks,
Igor
 
> -- 
> Damien Le Moal
> Western Digital Research


Thread overview: 12+ messages
2025-07-17 16:56 [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID" Niklas Cassel
2025-07-17 21:20 ` Igor Pylypiv
2025-07-17 21:29   ` Niklas Cassel
2025-07-18  4:35   ` Damien Le Moal
2025-07-18 22:30     ` Igor Pylypiv
2025-07-22  1:28       ` Damien Le Moal
2025-07-22  4:25         ` Damien Le Moal
2025-07-22 21:36           ` Igor Pylypiv
2025-07-23  1:38             ` Damien Le Moal
2025-08-02 18:31               ` Igor Pylypiv [this message]
2025-08-04  0:03                 ` Damien Le Moal
2025-07-18  4:36 ` Damien Le Moal
