From: Jason Ozolins <jason.ozolins@hpe.com>
To: Li Dongyang <dongyang.li@anu.edu.au>,
James.Bottomley@HansenPartnership.com,
linux-scsi@vger.kernel.org
Subject: Re: [REPOST PATCH] scsi: ses: don't ask for diagnostic pages repeatedly during probe
Date: Wed, 22 Nov 2017 04:45:41 +1100 [thread overview]
Message-ID: <5A146645.6070307@hpe.com> (raw)
In-Reply-To: <20171113234804.4675-1-dongyang.li@anu.edu.au>
Supporting info for the use case this patch addresses:
On 14/11/2017 10:48, Li Dongyang wrote:
> We are testing if there is a match with the ses device in a loop
> by calling ses_match_to_enclosure(), which will issue scsi receive
> diagnostics commands to the ses device for every device on the
> same host.
> On one of our boxes with 840 disks, it takes a long time to load
> the driver:
>
> [root@g1b-oss06 ~]# time modprobe ses
>
> real 40m48.247s
> user 0m0.001s
> sys 0m0.196s
The use case the patch addresses is Linux HA storage servers natively
handling block storage at scales that until now have been handled
by proprietary modular storage arrays.
Config tested has 840 targets across 26 SES devices in chained JBODS:
host# #targets #enclosures
1 40 2
2 400 12
3 400 12
Without the patch, time taken in ses_init goes up as
#disks * #SES enclosures; so ~41 minutes is spent within
class_interface_register(), holding the scsi_device class mutex.
Stack from kworker during modprobe ses:
[<ffffffff812f440b>] blk_execute_rq+0xab/0x150
[<ffffffff81457653>] scsi_execute+0xd3/0x170
[<ffffffff81458ffe>] scsi_execute_req_flags+0xee/0x100
[<ffffffffa028e19c>] ses_recv_diag+0x7c/0xd0 [ses]
[<ffffffffa028e9fc>] ses_enclosure_data_process+0x7c/0x3e0 [ses]
[<ffffffffa028edac>] ses_match_to_enclosure+0x4c/0xb0 [ses]
[<ffffffffa028f279>] ses_intf_add+0x469/0x4d0 [ses]
[<ffffffff8142dbe9>] class_interface_register+0xb9/0x110
[<ffffffff8145fcb6>] scsi_register_interface+0x16/0x20
[<ffffffffa0037013>] ses_init+0x13/0x1000 [ses]
[<ffffffff810020e8>] do_one_initcall+0xb8/0x230
[<ffffffff81100288>] load_module+0x22c8/0x2930
[<ffffffff81100aa6>] SyS_finit_module+0xa6/0xd0
[<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
During this time, if e.g. failed disks are replaced, the kworker
handling that event in the hpsa driver will block trying to get
the scsi_device class mutex, and hung task messages will be printed.
Task dump excerpt after disk insertion:
[ 961.323141] kworker/u56:2 D ffff88407ee98680 0 676 2 0x00000000
[ 961.323429] Workqueue: rescan_3_hpsa hpsa_rescan_ctlr_worker [hpsa]
[ 961.323727] ffff881ff456f890 0000000000000046 ffff881ffa454f10 ffff881ff456ffd8
[ 961.324034] ffff881ff456ffd8 ffff881ff456ffd8 ffff881ffa454f10 ffff88407ee98678
[ 961.324332] ffff88407ee9867c ffff881ffa454f10 00000000ffffffff ffff88407ee98680
[ 961.324646] Call Trace:
[ 961.324951] [<ffffffff816aa409>] schedule_preempt_disabled+0x29/0x70
[ 961.325252] [<ffffffff816a8337>] __mutex_lock_slowpath+0xc7/0x1d0
[ 961.325605] [<ffffffff816a774f>] mutex_lock+0x1f/0x2f
[ 961.325939] [<ffffffff8143c7f5>] device_add+0x535/0x7c0
[ 961.326266] [<ffffffff81473d27>] scsi_sysfs_add_sdev+0xb7/0x280
[ 961.326633] [<ffffffff81470d05>] scsi_probe_and_add_lun+0xc35/0xe30
[ 961.326971] [<ffffffff8144acfc>] ? __pm_runtime_resume+0x5c/0x80
[ 961.327327] [<ffffffff8147189d>] __scsi_scan_target+0xad/0x260
[ 961.327662] [<ffffffff81471b68>] scsi_scan_target+0x118/0x130
[ 961.327981] [<ffffffffc00a9df6>] sas_rphy_add+0x106/0x170 [scsi_transport_sas]
[ 961.328334] [<ffffffffc00f2c3c>] adjust_hpsa_scsi_table+0xefc/0x10b0 [hpsa]
[ 961.328705] [<ffffffffc00f4120>] ? hpsa_update_scsi_devices+0x1330/0x18f0 [hpsa]
[ 961.329055] [<ffffffffc00f37d7>] hpsa_update_scsi_devices+0x9e7/0x18f0 [hpsa]
[ 961.329442] [<ffffffff812fa413>] ? blk_peek_request+0x83/0x280
[ 961.329841] [<ffffffffc00f4890>] hpsa_scan_start+0x1b0/0x1e0 [hpsa]
[ 961.330216] [<ffffffffc00f4fd1>] hpsa_rescan_ctlr_worker+0x181/0x657 [hpsa]
[ 961.330655] [<ffffffff810a881a>] process_one_work+0x17a/0x440
[ 961.331089] [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
[ 961.331519] [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 961.331951] [<ffffffff810b098f>] kthread+0xcf/0xe0
[ 961.332379] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[ 961.332830] [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
[ 961.333249] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
The delay processing disk changes causes problems in maintenance and
hardware failure situations where the machine has recently rebooted.
The HPSA HBA firmware presents disks before enclosures when it detects
a change to the SAS fabric; the patch speeds up initialization in
this case:
- When the module loads at boot, in the ses_intf_add() for each
disk device, there are not yet any enclosures registered for that host;
so ses_match_to_enclosure() in the enclosure_find() loop body is not
called.
- When ses_intf_add() is called for an enclosure device, it directly
calls ses_enclosure_data_process(), which populates the enclosure_device
with results from one page 10/page 7 query. The subsequent
call to ses_match_to_enclosure() within the shost_for_each_device()
loop fixes up all the disks for that enclosure, with refresh=0
preventing redundant ses_enclosure_data_process() calls on all
registered enclosure_devices for each device processed within the loop.
- Once enclosure_devices have been registered for a host, ses_intf_add()
calls for disk devices in that host will refresh registered
enclosure_devices through the ses_match_to_enclosure() call within the
enclosure_find() loop. For disks within enclosures not yet processed
by ses_intf_add(), they get handled by the cases already given above.
The detection ordering of disks then enclosures at boot holds for HPSA
driver HBAs, but may not hold in general. This case is also produced
by the HPSA driver if a cable to a chain of enclosures is replaced,
causing a whole chain of disks and enclosures to be re-detected, while
other disks and enclosures remain on the host.
We confirmed that enclosure device sysfs entries are correctly created
for that case, and correctly re-created if the module is removed and
reprobed, just with extra time taken during ses_init() for the
disks that follow already created enclosure devices. For any ordering
of devices on a host where disks precede enclosure devices, this patch
will reduce time taken in ses_init().
So the patch helps for this use case, and does not hurt in general.
Tested-by: Jason Ozolins <jason.ozolins@hpe.com>
> With the patch:
>
> [root@g1b-oss06 ~]# time modprobe ses
>
> real 0m17.915s
> user 0m0.008s
> sys 0m0.053s
>
> Note that we still need to refresh page 10 when we see a new disk
> to create the link.
>
> Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
> ---
> drivers/scsi/ses.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
> index 11826c5c2dd4..62f04c0511cf 100644
> --- a/drivers/scsi/ses.c
> +++ b/drivers/scsi/ses.c
> @@ -615,13 +615,16 @@ static void ses_enclosure_data_process(struct enclosure_device *edev,
> }
>
> static void ses_match_to_enclosure(struct enclosure_device *edev,
> - struct scsi_device *sdev)
> + struct scsi_device *sdev,
> + int refresh)
> {
> + struct scsi_device *edev_sdev = to_scsi_device(edev->edev.parent);
> struct efd efd = {
> .addr = 0,
> };
>
> - ses_enclosure_data_process(edev, to_scsi_device(edev->edev.parent), 0);
> + if (refresh)
> + ses_enclosure_data_process(edev, edev_sdev, 0);
>
> if (scsi_is_sas_rphy(sdev->sdev_target->dev.parent))
> efd.addr = sas_get_address(sdev);
> @@ -652,7 +655,7 @@ static int ses_intf_add(struct device *cdev,
> struct enclosure_device *prev = NULL;
>
> while ((edev = enclosure_find(&sdev->host->shost_gendev, prev)) != NULL) {
> - ses_match_to_enclosure(edev, sdev);
> + ses_match_to_enclosure(edev, sdev, 1);
> prev = edev;
> }
> return -ENODEV;
> @@ -768,7 +771,7 @@ static int ses_intf_add(struct device *cdev,
> shost_for_each_device(tmp_sdev, sdev->host) {
> if (tmp_sdev->lun != 0 || scsi_device_enclosure(tmp_sdev))
> continue;
> - ses_match_to_enclosure(edev, tmp_sdev);
> + ses_match_to_enclosure(edev, tmp_sdev, 0);
> }
>
> return 0;
>
prev parent reply other threads:[~2017-11-21 17:46 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-13 23:48 [REPOST PATCH] scsi: ses: don't ask for diagnostic pages repeatedly during probe Li Dongyang
2017-11-21 17:45 ` Jason Ozolins [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5A146645.6070307@hpe.com \
--to=jason.ozolins@hpe.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=dongyang.li@anu.edu.au \
--cc=linux-scsi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.