All of lore.kernel.org
 help / color / mirror / Atom feed
From: Igor Pylypiv <ipylypiv@google.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Niklas Cassel <cassel@kernel.org>,
	Jack Wang <jinpu.wang@cloud.ionos.com>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Terrence Adams <tadamsjr@google.com>,
	linux-scsi@vger.kernel.org
Subject: Re: [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID"
Date: Sat, 2 Aug 2025 11:31:20 -0700	[thread overview]
Message-ID: <aI5ZeAweRWKQmLPU@google.com> (raw)
In-Reply-To: <72881ac7-f276-49d6-8918-a81d41502d11@kernel.org>

On Wed, Jul 23, 2025 at 10:38:56AM +0900, Damien Le Moal wrote:
> On 7/23/25 6:36 AM, Igor Pylypiv wrote:
> >>> And it works, I can see the drives in the enclosure behind the expander.
> >>> Care to send a proper path ?
> >>>
> >>> I think this needs more testing though, especially special cases like yanking
> >>> the SAS cable and doing device hotplug/unplug. Will do that later today.
> >>
> >> So I did that. And things are not pretty... Even a simple "rmmod pm80xx"
> >> crashes the kernel on a bad pointer dereference (invalid port address). Same if
> >> I hot-unplug drives from the enclosure. But that happens even with only Niklas
> >> revert patch applied. So I think that is unrelated to this change.
> >>
> >> That said, I will dig further to understand how the port pointers become
> >> invalid, and make sure this change is OK. Note that there are no issues that I
> >> can see when there is no expander (drives directly attached to the HBA).
> > 
> > Thank you for testing, Damien!
> > 
> > Just guessing, would defining the lldd_port_deformed() callback help?
> > The callback can set lldd_port to NULL if the problem is due to a dangling
> > lldd_port pointer.
> 
> Not sure if that is needed yet. The crash I am seeing is:
> 
> [56961.621080] BUG: unable to handle page fault for address: ff303500e1ee00ac
> [56961.629527] #PF: supervisor read access in kernel mode
> [56961.635315] #PF: error_code(0x0000) - not-present page
> [56961.641102] PGD 95e001067 P4D 95e002067 PUD 0
> [56961.646113] Oops: Oops: 0000 [#1] SMP
> [56961.650244] CPU: 10 UID: 0 PID: 3373 Comm: rmmod Not tainted 6.16.0-rc7+
> #380 PREEMPT(voluntary)
> [56961.660238] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2
> 02/14/2022
> [56961.668664] RIP: 0010:do_raw_spin_lock+0xa/0xb0
> [56961.673776] Code: ff 48 c7 03 00 00 00 00 48 c7 43 10 ff ff ff ff 48 c7 43
> 08 ed 1e af de 48 83 c4 10 5b 5d c3 90 66 0f 1f 00 0f 1f 44 00 00 53 <8b> 47 04
> 48 89 fb 3d ad 4e ad de 75 41 48 8b 43 10 65 48 3b 05 4d
> [56961.694912] RSP: 0018:ff31e51a4f09fa60 EFLAGS: 00010082
> [56961.700799] RAX: 0000000000000000 RBX: 0000000000000046 RCX: 0000000000000000
> [56961.708841] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff303500e1ee00a8
> [56961.716880] RBP: ff303500e1ee00a8 R08: 0000000000000001 R09: 0000000000000000
> [56961.724920] R10: 0000000000000000 R11: 0000000000000000 R12: ff303500e1ee0000
> [56961.732958] R13: ff303504f4f64000 R14: ff303504e1ee0000 R15: ff303504d9f60000
> [56961.740999] FS:  00007f126637a740(0000) GS:ff3035145d8ed000(0000)
> knlGS:0000000000000000
> [56961.750114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [56961.756590] CR2: ff303500e1ee00ac CR3: 00000001f2f38001 CR4: 0000000000773ef0
> [56961.765229] PKRU: 55555554
> [56961.768887] Call Trace:
> [56961.772249]  <TASK>
> [56961.775212]  _raw_spin_lock_irqsave+0x41/0x50
> [56961.780723]  ? sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.786532]  sas_ata_end_eh+0x2f/0x60 [libsas]
> [56961.792133]  sas_unregister_common_dev+0xc3/0x1a0 [libsas]
> [56961.798897]  sas_destruct_devices+0x9f/0xc0 [libsas]
> [56961.805073]  sas_deform_port.cold+0x83/0x288 [libsas]
> [56961.811330]  sas_unregister_ports+0x36/0x50 [libsas]
> [56961.817484]  sas_unregister_ha+0x56/0x90 [libsas]
> [56961.823343]  pm8001_pci_remove+0x28/0x160 [pm80xx]
> [56961.829315]  pci_device_remove+0x44/0xb0
> [56961.834287]  device_release_driver_internal+0x1a4/0x210
> [56961.840718]  driver_detach+0x4b/0x90
> [56961.845297]  bus_remove_driver+0x70/0xf0
> [56961.850265]  pci_unregister_driver+0x2f/0xb0
> [56961.855625]  pm8001_exit+0x10/0x28 [pm80xx]
> [56961.860909]  __x64_sys_delete_module+0x19e/0x2d0
> [56961.866650]  do_syscall_64+0x92/0x380
> [56961.871310]  ? __x64_sys_close+0x3d/0x80
> [56961.876246]  ? trace_hardirqs_on_prepare+0x77/0xa0
> [56961.882156]  ? do_syscall_64+0x154/0x380
> [56961.887081]  ? do_fault+0x3ef/0x720
> [56961.891510]  ? ___pte_offset_map+0x43/0x1c0
> [56961.896711]  ? ___pte_offset_map+0x24/0x1c0
> [56961.901901]  ? __handle_mm_fault+0x5f2/0x1140
> [56961.907278]  ? ksys_read+0x79/0xf0
> [56961.911570]  ? lock_acquire+0x280/0x2e0
> [56961.916339]  ? lock_acquire+0x280/0x2e0
> [56961.921103]  ? lock_release+0x222/0x2d0
> [56961.925865]  ? rcu_read_unlock+0x1c/0x60
> [56961.930720]  ? lock_release+0x222/0x2d0
> [56961.935476]  ? do_user_addr_fault+0x368/0x6a0
> [56961.940811]  ? trace_hardirqs_off+0x40/0xb0
> [56961.945936]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> [56961.952031] RIP: 0033:0x7f1265d02a2b
> [56961.956444] Code: 73 01 c3 48 8b 0d dd 33 0f 00 f7 d8 64 89 01 48 83 c8 ff
> c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d ad 33 0f 00 f7 d8 64 89 01 48
> [56961.978361] RSP: 002b:00007ffe899f06c8 EFLAGS: 00000202 ORIG_RAX:
> 00000000000000b0
> [56961.987286] RAX: ffffffffffffffda RBX: 0000559d15a32750 RCX: 00007f1265d02a2b
> [56961.995727] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000559d15a327b8
> [56962.004170] RBP: 00007ffe899f06f0 R08: 0000000000000000 R09: 0000000000000000
> [56962.012612] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> [56962.021058] R13: 00007ffe899f1701 R14: 00007ffe899f0940 R15: 0000000000000000
> 
> So it is here:
> 
> void sas_ata_end_eh(struct ata_port *ap)
> {
>         struct domain_device *dev = ap->private_data;
>         struct sas_ha_struct *ha = dev->port->ha;
>         unsigned long flags;
> 
>         spin_lock_irqsave(&ha->lock, flags);
> 	...
> 
> Meaning that the ha pointer is broken...
> Not sure why yet, but I get the exact same place for the crash if I hot-unplug
> a drive from the enclosure. So something is wrong with removing devices when
> there is an expander. Still digging into it.
>

Hi Damien,

Do you happen to have any updates regarding this crash?

Thanks,
Igor
 
> -- 
> Damien Le Moal
> Western Digital Research

  reply	other threads:[~2025-08-02 18:31 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-17 16:56 [PATCH] Revert "scsi: pm80xx: Do not use libsas port ID" Niklas Cassel
2025-07-17 21:20 ` Igor Pylypiv
2025-07-17 21:29   ` Niklas Cassel
2025-07-18  4:35   ` Damien Le Moal
2025-07-18 22:30     ` Igor Pylypiv
2025-07-22  1:28       ` Damien Le Moal
2025-07-22  4:25         ` Damien Le Moal
2025-07-22 21:36           ` Igor Pylypiv
2025-07-23  1:38             ` Damien Le Moal
2025-08-02 18:31               ` Igor Pylypiv [this message]
2025-08-04  0:03                 ` Damien Le Moal
2025-07-18  4:36 ` Damien Le Moal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aI5ZeAweRWKQmLPU@google.com \
    --to=ipylypiv@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=cassel@kernel.org \
    --cc=dlemoal@kernel.org \
    --cc=jinpu.wang@cloud.ionos.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=tadamsjr@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.