From: Jason Yan <yanaijie@huawei.com>
To: John Garry <john.garry@huawei.com>,
martin.petersen@oracle.com, jejb@linux.vnet.ibm.com
Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
zhaohongjiang@huawei.com, hare@suse.com,
dan.j.williams@intel.com, jthumshirn@suse.de, hch@lst.de,
huangdaode@hisilicon.com, chenxiang66@hisilicon.com,
xiexiuqi@huawei.com, tj@kernel.org, miaoxie@huawei.com,
Xiaofei Tan <tanxiaofei@huawei.com>,
Ewan Milne <emilne@redhat.com>, Tomas Henzl <thenzl@redhat.com>
Subject: Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks
Date: Fri, 1 Feb 2019 10:04:31 +0800 [thread overview]
Message-ID: <5C53A92F.50608@huawei.com> (raw)
In-Reply-To: <1135102e-e563-d3ab-9b44-d8691c3e6ccb@huawei.com>
On 2019/2/1 0:34, John Garry wrote:
> On 31/01/2019 02:55, Jason Yan wrote:
>>
>>
>> On 2019/1/31 1:53, John Garry wrote:
>>> On 30/01/2019 08:24, Jason Yan wrote:
>>>> The work flow of revalidation now is scanning expander phy by the
>>>> sequence of the phy and check if the phy have changed. This will leads
>>>> to an issue of swapping two sas disks on one expander.
>>>>
>>>> Assume we have two sas disks, connected with expander phy10 and phy11:
>>>>
>>>> phy10: 5000cca04eb1001d port-0:0:10
>>>> phy11: 5000cca04eb043ad port-0:0:11
>>>>
>>>> Swap these two disks, and imaging the following scenario:
>>>>
>>>> revalidation 1:
>>>
>>> What does "revalidation 1" actually mean?
>>
>> 'revalidation 1' means one entry in sas_discover_domain().
>>
>>>
>>>> -->phy10: 0 --> delete phy10 domain device
>>>> -->phy11: 5000cca04eb043ad (no change)
>>>
>>> so is disk 11 still inserted at this stage?
>>
>> Maybe, but that's what we read from the hardware.
>>
>>>
>>>> revalidation done
>>>>
>>>> revalidation 2:
>>>
>>> is port-0:0:10 deleted now?
>>>
>>
>> Yes. But we don't care about it.
>>
>>>> -->step 1, check phy10:
>>>> -->phy10: 5000cca04eb043ad --> add to wide port(port-0:0:11) (phy11
>>>> address is still 5000cca04eb043ad now)
>
> We do not want this to happen and it seems to be the crux of the problem.
>
> As an alternate to your solution, how about check if the PHY is an end
> device. If so, it should not form/join a wideport; that is, apart from
> dual-port disks, which I am not sure about - I think each port still has
> a unique WWN, so should be ok.
>
If the PHY do not join a wideport, then it have to form a wideport of
it's own. I'm not sure if we can have two ports with the same address
and do not break anything?
>>>
>>> So this should not happen. How are you physcially swapping them such
>>> that phy11 address is still 5000cca04eb043ad? I don't see how this would
>>> be true at revalidation 1.
>>>
>>
>> This issue is because we always process the PHYs from 0 to max phy
>> number. And please be aware of the real physcial address of the PHY and
>> the address stored in the memory is not always the same.
>> Actually when you checking phy10, phy11 physcial address is not
>> 5000cca04eb043ad. But the address stored in domain device is still
>> 5000cca04eb043ad. We have not get a chance to to read it because we are
>> processing phy10 now, right?
>>
>
> I see.
>
>> It's very easy to reproduce. I suggest you to do it yourself and look at
>> the logs.
>>
>
> I can't physically access the backpane, and this is not the sort of
> thing which is easy to fake by hacking the driver.
>
> And the log which you provided internally does not have much - if any -
> libsas logs to help me understand it.
>
>>>>
>>>> -->step 2, check phy11:
>>>> -->phy11: 0 --> phy11 address is 0 now, but it's part of wide
>>>> port(port-0:0:11), the domain device will not be deleted.
>>>> revalidation done
>>>>
>>>> revalidation 3:
>>>> -->phy10, 5000cca04eb043ad (no change)
>>>> -->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed,
>>>> port-0:0:11 already exist, trigger a warning as follows
>>>> revalidation done
>>>>
>>>> [14790.189699] sysfs: cannot create duplicate filename
>>>> '/devices/pci0000:74/0000:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'
>>>>
>>>>
>>>>
>>>> [14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
>>>> 4.16.0-rc1-191134-g138f084-dirty #228
>>>> [14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC
>>>> UEFI
>>>> Nemo 2.0 RC0 - B303 05/16/2018
>>>> [14790.219323] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain
>>>> [14790.225404] Call trace:
>>>> [14790.227842] dump_backtrace+0x0/0x18c
>>>> [14790.231492] show_stack+0x14/0x1c
>>>> [14790.234798] dump_stack+0x88/0xac
>>>> [14790.238101] sysfs_warn_dup+0x64/0x7c
>>>> [14790.241751] sysfs_create_dir_ns+0x90/0xa0
>>>> [14790.245835] kobject_add_internal+0xa0/0x284
>>>> [14790.250092] kobject_add+0xb8/0x11c
>>>> [14790.253570] device_add+0xe8/0x598
>>>> [14790.256960] sas_port_add+0x24/0x50
>>>> [14790.260436] sas_ex_discover_devices+0xb10/0xc30
>>>> [14790.265040] sas_ex_revalidate_domain+0x1d8/0x518
>>>> [14790.269731] sas_revalidate_domain+0x12c/0x154
>>>> [14790.274163] process_one_work+0x128/0x2b0
>>>> [14790.278160] worker_thread+0x14c/0x408
>>>> [14790.281897] kthread+0xfc/0x128
>>>> [14790.285026] ret_from_fork+0x10/0x18
>>>> [14790.288598] ------------[ cut here ]------------
>>>>
>>>> At last, the disk 5000cca04eb1001d is lost.
>
>
> .
>
WARNING: multiple messages have this Message-ID (diff)
From: Jason Yan <yanaijie@huawei.com>
To: John Garry <john.garry@huawei.com>, <martin.petersen@oracle.com>,
<jejb@linux.vnet.ibm.com>
Cc: <linux-scsi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<zhaohongjiang@huawei.com>, <hare@suse.com>,
<dan.j.williams@intel.com>, <jthumshirn@suse.de>, <hch@lst.de>,
<huangdaode@hisilicon.com>, <chenxiang66@hisilicon.com>,
<xiexiuqi@huawei.com>, <tj@kernel.org>, <miaoxie@huawei.com>,
Xiaofei Tan <tanxiaofei@huawei.com>,
Ewan Milne <emilne@redhat.com>, Tomas Henzl <thenzl@redhat.com>
Subject: Re: [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks
Date: Fri, 1 Feb 2019 10:04:31 +0800 [thread overview]
Message-ID: <5C53A92F.50608@huawei.com> (raw)
In-Reply-To: <1135102e-e563-d3ab-9b44-d8691c3e6ccb@huawei.com>
On 2019/2/1 0:34, John Garry wrote:
> On 31/01/2019 02:55, Jason Yan wrote:
>>
>>
>> On 2019/1/31 1:53, John Garry wrote:
>>> On 30/01/2019 08:24, Jason Yan wrote:
>>>> The work flow of revalidation now is scanning expander phy by the
>>>> sequence of the phy and check if the phy have changed. This will leads
>>>> to an issue of swapping two sas disks on one expander.
>>>>
>>>> Assume we have two sas disks, connected with expander phy10 and phy11:
>>>>
>>>> phy10: 5000cca04eb1001d port-0:0:10
>>>> phy11: 5000cca04eb043ad port-0:0:11
>>>>
>>>> Swap these two disks, and imaging the following scenario:
>>>>
>>>> revalidation 1:
>>>
>>> What does "revalidation 1" actually mean?
>>
>> 'revalidation 1' means one entry in sas_discover_domain().
>>
>>>
>>>> -->phy10: 0 --> delete phy10 domain device
>>>> -->phy11: 5000cca04eb043ad (no change)
>>>
>>> so is disk 11 still inserted at this stage?
>>
>> Maybe, but that's what we read from the hardware.
>>
>>>
>>>> revalidation done
>>>>
>>>> revalidation 2:
>>>
>>> is port-0:0:10 deleted now?
>>>
>>
>> Yes. But we don't care about it.
>>
>>>> -->step 1, check phy10:
>>>> -->phy10: 5000cca04eb043ad --> add to wide port(port-0:0:11) (phy11
>>>> address is still 5000cca04eb043ad now)
>
> We do not want this to happen and it seems to be the crux of the problem.
>
> As an alternate to your solution, how about check if the PHY is an end
> device. If so, it should not form/join a wideport; that is, apart from
> dual-port disks, which I am not sure about - I think each port still has
> a unique WWN, so should be ok.
>
If the PHY do not join a wideport, then it have to form a wideport of
it's own. I'm not sure if we can have two ports with the same address
and do not break anything?
>>>
>>> So this should not happen. How are you physcially swapping them such
>>> that phy11 address is still 5000cca04eb043ad? I don't see how this would
>>> be true at revalidation 1.
>>>
>>
>> This issue is because we always process the PHYs from 0 to max phy
>> number. And please be aware of the real physcial address of the PHY and
>> the address stored in the memory is not always the same.
>> Actually when you checking phy10, phy11 physcial address is not
>> 5000cca04eb043ad. But the address stored in domain device is still
>> 5000cca04eb043ad. We have not get a chance to to read it because we are
>> processing phy10 now, right?
>>
>
> I see.
>
>> It's very easy to reproduce. I suggest you to do it yourself and look at
>> the logs.
>>
>
> I can't physically access the backpane, and this is not the sort of
> thing which is easy to fake by hacking the driver.
>
> And the log which you provided internally does not have much - if any -
> libsas logs to help me understand it.
>
>>>>
>>>> -->step 2, check phy11:
>>>> -->phy11: 0 --> phy11 address is 0 now, but it's part of wide
>>>> port(port-0:0:11), the domain device will not be deleted.
>>>> revalidation done
>>>>
>>>> revalidation 3:
>>>> -->phy10, 5000cca04eb043ad (no change)
>>>> -->phy11: 5000cca04eb1001d --> try to add port-0:0:11 but failed,
>>>> port-0:0:11 already exist, trigger a warning as follows
>>>> revalidation done
>>>>
>>>> [14790.189699] sysfs: cannot create duplicate filename
>>>> '/devices/pci0000:74/0000:74:02.0/host0/port-0:0/expander-0:0/port-0:0:11'
>>>>
>>>>
>>>>
>>>> [14790.201081] CPU: 25 PID: 5031 Comm: kworker/u192:3 Not tainted
>>>> 4.16.0-rc1-191134-g138f084-dirty #228
>>>> [14790.210199] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 EC
>>>> UEFI
>>>> Nemo 2.0 RC0 - B303 05/16/2018
>>>> [14790.219323] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain
>>>> [14790.225404] Call trace:
>>>> [14790.227842] dump_backtrace+0x0/0x18c
>>>> [14790.231492] show_stack+0x14/0x1c
>>>> [14790.234798] dump_stack+0x88/0xac
>>>> [14790.238101] sysfs_warn_dup+0x64/0x7c
>>>> [14790.241751] sysfs_create_dir_ns+0x90/0xa0
>>>> [14790.245835] kobject_add_internal+0xa0/0x284
>>>> [14790.250092] kobject_add+0xb8/0x11c
>>>> [14790.253570] device_add+0xe8/0x598
>>>> [14790.256960] sas_port_add+0x24/0x50
>>>> [14790.260436] sas_ex_discover_devices+0xb10/0xc30
>>>> [14790.265040] sas_ex_revalidate_domain+0x1d8/0x518
>>>> [14790.269731] sas_revalidate_domain+0x12c/0x154
>>>> [14790.274163] process_one_work+0x128/0x2b0
>>>> [14790.278160] worker_thread+0x14c/0x408
>>>> [14790.281897] kthread+0xfc/0x128
>>>> [14790.285026] ret_from_fork+0x10/0x18
>>>> [14790.288598] ------------[ cut here ]------------
>>>>
>>>> At last, the disk 5000cca04eb1001d is lost.
>
>
> .
>
next prev parent reply other threads:[~2019-02-01 2:04 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-30 8:24 [PATCH v2 0/7] libsas: fix issue of swapping or replacing disks Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 8:24 ` [PATCH v2 1/7] scsi: libsas: reset the negotiated_linkrate when phy is down Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 13:08 ` John Garry
2019-01-30 13:08 ` John Garry
2019-01-31 1:11 ` Jason Yan
2019-01-31 1:11 ` Jason Yan
2019-01-31 9:00 ` John Garry
2019-01-31 9:00 ` John Garry
2019-01-30 8:24 ` [PATCH v2 2/7] scsi: libsas: only clear phy->in_shutdown after shutdown event done Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 16:26 ` John Garry
2019-01-30 16:26 ` John Garry
2019-01-31 1:13 ` Jason Yan
2019-01-31 1:13 ` Jason Yan
2019-01-30 8:24 ` [PATCH v2 3/7] scsi: libsas: optimize the debug print of the revalidate process Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 16:41 ` John Garry
2019-01-30 16:41 ` John Garry
2019-01-31 1:31 ` Jason Yan
2019-01-31 1:31 ` Jason Yan
2019-01-31 10:25 ` John Garry
2019-01-31 10:25 ` John Garry
2019-01-30 8:24 ` [PATCH v2 4/7] scsi: libsas: split the replacement of sas disks in two steps Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 17:22 ` John Garry
2019-01-30 17:22 ` John Garry
2019-01-31 2:04 ` Jason Yan
2019-01-31 2:04 ` Jason Yan
2019-01-31 10:29 ` John Garry
2019-01-31 10:29 ` John Garry
2019-01-31 16:38 ` John Garry
2019-01-31 16:38 ` John Garry
2019-02-01 1:58 ` Jason Yan
2019-02-01 1:58 ` Jason Yan
2019-02-01 9:34 ` John Garry
2019-02-01 9:34 ` John Garry
2019-01-30 8:24 ` [PATCH v2 5/7] scsi: libsas: check if the same device when flutter Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 8:24 ` [PATCH v2 6/7] scsi: libsas: reset the phy address if discover failed Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 17:36 ` John Garry
2019-01-30 17:36 ` John Garry
2019-01-31 2:13 ` Jason Yan
2019-01-31 2:13 ` Jason Yan
2019-01-31 9:10 ` John Garry
2019-01-31 9:10 ` John Garry
2019-01-30 8:24 ` [PATCH v2 7/7] scsi: libsas: fix issue of swapping two sas disks Jason Yan
2019-01-30 8:24 ` Jason Yan
2019-01-30 17:53 ` John Garry
2019-01-30 17:53 ` John Garry
2019-01-31 2:55 ` Jason Yan
2019-01-31 2:55 ` Jason Yan
2019-01-31 16:34 ` John Garry
2019-01-31 16:34 ` John Garry
2019-02-01 2:04 ` Jason Yan [this message]
2019-02-01 2:04 ` Jason Yan
2019-02-01 9:27 ` John Garry
2019-02-01 9:27 ` John Garry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5C53A92F.50608@huawei.com \
--to=yanaijie@huawei.com \
--cc=chenxiang66@hisilicon.com \
--cc=dan.j.williams@intel.com \
--cc=emilne@redhat.com \
--cc=hare@suse.com \
--cc=hch@lst.de \
--cc=huangdaode@hisilicon.com \
--cc=jejb@linux.vnet.ibm.com \
--cc=john.garry@huawei.com \
--cc=jthumshirn@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=miaoxie@huawei.com \
--cc=tanxiaofei@huawei.com \
--cc=thenzl@redhat.com \
--cc=tj@kernel.org \
--cc=xiexiuqi@huawei.com \
--cc=zhaohongjiang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.