* [bug report] md-cluster: Fix adding of new disk with new reload code
From: Dan Carpenter @ 2017-04-26 19:59 UTC
To: rgoldwyn; +Cc: linux-raid
Hello Goldwyn Rodrigues,
The patch dbb64f8635f5: "md-cluster: Fix adding of new disk with new
reload code" from Oct 1, 2015, leads to the following static checker
warning:
drivers/md/md-cluster.c:1341 add_new_disk()
warn: inconsistent returns 'cinfo->recv_mutex'.
Locked on : 1315,1341
Unlocked on: 1341
drivers/md/md-cluster.c
1300 static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
1301 {
1302 struct md_cluster_info *cinfo = mddev->cluster_info;
1303 struct cluster_msg cmsg;
1304 int ret = 0;
1305 struct mdp_superblock_1 *sb = page_address(rdev->sb_page);
1306 char *uuid = sb->device_uuid;
1307
1308 memset(&cmsg, 0, sizeof(cmsg));
1309 cmsg.type = cpu_to_le32(NEWDISK);
1310 memcpy(cmsg.uuid, uuid, 16);
1311 cmsg.raid_slot = cpu_to_le32(rdev->desc_nr);
1312 lock_comm(cinfo, 1);
^^^^^^^^^^^^^^^^^^^
We take the lock here.
1313 ret = __sendmsg(cinfo, &cmsg);
1314 if (ret)
1315 return ret;
Should we unlock on failure here?
1316 cinfo->no_new_dev_lockres->flags |= DLM_LKF_NOQUEUE;
1317 ret = dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_EX);
1318 cinfo->no_new_dev_lockres->flags &= ~DLM_LKF_NOQUEUE;
1319 /* Some node does not "see" the device */
1320 if (ret == -EAGAIN)
1321 ret = -ENOENT;
1322 if (ret)
1323 unlock_comm(cinfo);
Because we do unlock here. I think we're only supposed to keep holding the
lock on success, but how this all interacts with the cancel paths looked
slightly complicated, so I decided to ask instead of sending a patch.
1324 else {
1325 dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
1326 /* Since MD_CHANGE_DEVS will be set in add_bound_rdev which
1327 * will run soon after add_new_disk, the below path will be
1328 * invoked:
1329 * md_wakeup_thread(mddev->thread)
1330 * -> conf->thread (raid1d)
1331 * -> md_check_recovery -> md_update_sb
1332 * -> metadata_update_start/finish
1333 * MD_CLUSTER_SEND_LOCKED_ALREADY will be cleared eventually.
1334 *
1335 * For other failure cases, metadata_update_cancel and
1336 * add_new_disk_cancel also clear below bit as well.
1337 * */
1338 set_bit(MD_CLUSTER_SEND_LOCKED_ALREADY, &cinfo->state);
1339 wake_up(&cinfo->wait);
1340 }
1341 return ret;
1342 }
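If holding the lock past a failed __sendmsg() really is unintended, the
minimal change I have in mind would look something like this (just a
sketch, untested):

	ret = __sendmsg(cinfo, &cmsg);
	if (ret) {
		/* drop the lock taken by lock_comm() above */
		unlock_comm(cinfo);
		return ret;
	}

But, as I said, I may well be missing how the cancel paths are supposed to
clean this up.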
regards,
dan carpenter
* Re: [bug report] md-cluster: Fix adding of new disk with new reload code
From: Guoqing Jiang @ 2017-04-27 2:52 UTC
To: Dan Carpenter, rgoldwyn; +Cc: linux-raid
Hi,
Thanks for checking.
On 04/27/2017 03:59 AM, Dan Carpenter wrote:
> Hello Goldwyn Rodrigues,
>
> The patch dbb64f8635f5: "md-cluster: Fix adding of new disk with new
> reload code" from Oct 1, 2015, leads to the following static checker
> warning:
>
> drivers/md/md-cluster.c:1341 add_new_disk()
> warn: inconsistent returns 'cinfo->recv_mutex'.
> Locked on : 1315,1341
> Unlocked on: 1341
>
> drivers/md/md-cluster.c
> 1300 static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
> 1301 {
> 1302 struct md_cluster_info *cinfo = mddev->cluster_info;
> 1303 struct cluster_msg cmsg;
> 1304 int ret = 0;
> 1305 struct mdp_superblock_1 *sb = page_address(rdev->sb_page);
> 1306 char *uuid = sb->device_uuid;
> 1307
> 1308 memset(&cmsg, 0, sizeof(cmsg));
> 1309 cmsg.type = cpu_to_le32(NEWDISK);
> 1310 memcpy(cmsg.uuid, uuid, 16);
> 1311 cmsg.raid_slot = cpu_to_le32(rdev->desc_nr);
> 1312 lock_comm(cinfo, 1);
> ^^^^^^^^^^^^^^^^^^^
> We take the lock here.
>
> 1313 ret = __sendmsg(cinfo, &cmsg);
> 1314 if (ret)
> 1315 return ret;
>
> Should we unlock on failure here?
I agree we probably need an unlock here: on this failure path neither
add_new_disk_cancel nor unlock_comm is ever called, so the lock is leaked.
But Goldwyn knows this code better, so let's wait for his comment.
> 1316 cinfo->no_new_dev_lockres->flags |= DLM_LKF_NOQUEUE;
> 1317 ret = dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_EX);
> 1318 cinfo->no_new_dev_lockres->flags &= ~DLM_LKF_NOQUEUE;
> 1319 /* Some node does not "see" the device */
> 1320 if (ret == -EAGAIN)
> 1321 ret = -ENOENT;
> 1322 if (ret)
> 1323 unlock_comm(cinfo);
>
> Because we do unlock here. I think we're only supposed to keep holding the
> lock on success, but how this all interacts with the cancel paths looked
> slightly complicated, so I decided to ask instead of sending a patch.
Please take a look at the comments on lines 1326 to 1337: the unlock is
eventually done by metadata_update_finish (in the success case) or by
metadata_update_cancel/add_new_disk_cancel (in the failure cases).
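Roughly, this is how I read the lock lifetime (an outline only, please
double-check against the source):

	add_new_disk()
	  lock_comm(cinfo, 1)                       lock taken
	  __sendmsg() fails      -> return ret      lock still held  <-- the report
	  dlm_lock_sync() fails  -> unlock_comm()   lock released
	  success                -> set MD_CLUSTER_SEND_LOCKED_ALREADY, lock kept
	    md_update_sb() -> metadata_update_start()/metadata_update_finish()
	                                            -> unlock_comm(), lock released
	    or, on a later failure, metadata_update_cancel()/add_new_disk_cancel()
	                                            -> unlock_comm(), lock released

So the early return after a failed __sendmsg() is the only exit that falls
outside this scheme.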
>
> 1324 else {
> 1325 dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
> 1326 /* Since MD_CHANGE_DEVS will be set in add_bound_rdev which
> 1327 * will run soon after add_new_disk, the below path will be
> 1328 * invoked:
> 1329 * md_wakeup_thread(mddev->thread)
> 1330 * -> conf->thread (raid1d)
> 1331 * -> md_check_recovery -> md_update_sb
> 1332 * -> metadata_update_start/finish
> 1333 * MD_CLUSTER_SEND_LOCKED_ALREADY will be cleared eventually.
> 1334 *
> 1335 * For other failure cases, metadata_update_cancel and
> 1336 * add_new_disk_cancel also clear below bit as well.
> 1337 * */
> 1338 set_bit(MD_CLUSTER_SEND_LOCKED_ALREADY, &cinfo->state);
> 1339 wake_up(&cinfo->wait);
> 1340 }
> 1341 return ret;
> 1342 }
>
Regards,
Guoqing
* Re: [bug report] md-cluster: Fix adding of new disk with new reload code
From: Goldwyn Rodrigues @ 2017-05-05 22:30 UTC
To: Guoqing Jiang, Dan Carpenter; +Cc: linux-raid
On 04/26/2017 09:52 PM, Guoqing Jiang wrote:
> Hi,
>
> Thanks for checking.
>
> On 04/27/2017 03:59 AM, Dan Carpenter wrote:
>> Hello Goldwyn Rodrigues,
>>
>> The patch dbb64f8635f5: "md-cluster: Fix adding of new disk with new
>> reload code" from Oct 1, 2015, leads to the following static checker
>> warning:
>>
>> drivers/md/md-cluster.c:1341 add_new_disk()
>> warn: inconsistent returns 'cinfo->recv_mutex'.
>> Locked on : 1315,1341
>> Unlocked on: 1341
>>
>> drivers/md/md-cluster.c
>> 1300 static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
>> 1301 {
>> 1302 struct md_cluster_info *cinfo = mddev->cluster_info;
>> 1303 struct cluster_msg cmsg;
>> 1304 int ret = 0;
>> 1305 struct mdp_superblock_1 *sb = page_address(rdev->sb_page);
>> 1306 char *uuid = sb->device_uuid;
>> 1307
>> 1308 memset(&cmsg, 0, sizeof(cmsg));
>> 1309 cmsg.type = cpu_to_le32(NEWDISK);
>> 1310 memcpy(cmsg.uuid, uuid, 16);
>> 1311 cmsg.raid_slot = cpu_to_le32(rdev->desc_nr);
>> 1312 lock_comm(cinfo, 1);
>> ^^^^^^^^^^^^^^^^^^^
>> We take the lock here.
>>
>> 1313 ret = __sendmsg(cinfo, &cmsg);
>> 1314 if (ret)
>> 1315 return ret;
>>
>> Should we unlock on failure here?
>
> I agree we probably need an unlock here: on this failure path neither
> add_new_disk_cancel nor unlock_comm is ever called, so the lock is leaked.
>
> But Goldwyn knows this code better, so let's wait for his comment.
Yes, you are right. We should unlock here. Thanks.
Could you please send a patch?
>
>
>> 1316 cinfo->no_new_dev_lockres->flags |= DLM_LKF_NOQUEUE;
>> 1317 ret = dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_EX);
>> 1318 cinfo->no_new_dev_lockres->flags &= ~DLM_LKF_NOQUEUE;
>> 1319 /* Some node does not "see" the device */
>> 1320 if (ret == -EAGAIN)
>> 1321 ret = -ENOENT;
>> 1322 if (ret)
>> 1323 unlock_comm(cinfo);
>>
>> Because we do unlock here. I think we're only supposed to keep holding the
>> lock on success, but how this all interacts with the cancel paths looked
>> slightly complicated, so I decided to ask instead of sending a patch.
>
> Please take a look at the comments on lines 1326 to 1337: the unlock is
> eventually done by metadata_update_finish (in the success case) or by
> metadata_update_cancel/add_new_disk_cancel (in the failure cases).
>
>>
>> 1324 else {
>> 1325 dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR);
>> 1326 /* Since MD_CHANGE_DEVS will be set in add_bound_rdev which
>> 1327 * will run soon after add_new_disk, the below path will be
>> 1328 * invoked:
>> 1329 * md_wakeup_thread(mddev->thread)
>> 1330 * -> conf->thread (raid1d)
>> 1331 * -> md_check_recovery -> md_update_sb
>> 1332 * -> metadata_update_start/finish
>> 1333 * MD_CLUSTER_SEND_LOCKED_ALREADY will be cleared eventually.
>> 1334 *
>> 1335 * For other failure cases, metadata_update_cancel and
>> 1336 * add_new_disk_cancel also clear below bit as well.
>> 1337 * */
>> 1338 set_bit(MD_CLUSTER_SEND_LOCKED_ALREADY, &cinfo->state);
>> 1339 wake_up(&cinfo->wait);
>> 1340 }
>> 1341 return ret;
>> 1342 }
>>
>
> Regards,
> Guoqing
--
Goldwyn