From mboxrd@z Thu Jan  1 00:00:00 1970
From: NeilBrown
Subject: Re: [PATCH 1/6] md-cluster: Protect communication with mutexes
Date: Fri, 13 Nov 2015 08:59:08 +1100
Message-ID: <87fv0ariab.fsf@notabene.neil.brown.name>
References: <1446781819-25571-1-git-send-email-rgoldwyn@suse.de>
 <87mvumsqbp.fsf@notabene.neil.brown.name>
 <5641631F.2080000@suse.de>
In-Reply-To: <5641631F.2080000@suse.de>
Sender: linux-raid-owner@vger.kernel.org
To: Goldwyn Rodrigues, linux-raid@vger.kernel.org
Cc: Goldwyn Rodrigues
List-Id: linux-raid.ids

On Tue, Nov 10 2015, Goldwyn Rodrigues wrote:

> On 11/09/2015 05:31 PM, NeilBrown wrote:
>> On Fri, Nov 06 2015, rgoldwyn@suse.de wrote:
>>
>>> From: Goldwyn Rodrigues
>>>
>>> Communication can happen through multiple threads. It is possible
>>> that one thread steps over another thread's sequence, so we use
>>> mutexes to protect both the send and receive sequences.
>>>
>>> We use a new flag, MD_CLUSTER_COMM_LOCKED, to detect whether
>>> communication is already locked. This is useful for cases where we
>>> need to take and hold the token DLM lock for a while. This bit is
>>> set only after locking communication.
>>
>> I don't understand what this flag is trying to achieve, but I'm fairly
>> sure it doesn't achieve it.
>
> Let's consider three specific cases of locking the communication
> channel to show the conflict:
>
> 1. resync_info_update(): communication is locked and released for
>    sending a message.
> 2. A regular md_update_sb(): communication is locked in
>    metadata_update_start() and unlocked in metadata_update_finish()
>    after writing to disk. In metadata_update_finish(), sendmsg() is
>    called to send METADATA_UPDATED.
> 3. An add operation which culminates in an md_update_sb(): here the
>    communication is locked before initiating the add. If the add is
>    successful, it results in md_update_sb(). In md_update_sb(),
>    metadata_update_start() has to check whether the communication is
>    already locked. If locked, it should not lock again.

Oh, I get it now - thanks.

Going back and looking at the original commit I can now see that it does
say that, but I didn't understand the implication at the time.

I might be wrong again, but I think this approach is broken.

The 'add disk' sequence does:
 1/ ->add_new_disk, which takes the lock
 2/ other stuff, protected by the lock
 3/ schedule a metadata update
 4/ metadata update is initiated, which doesn't take the lock because
    the flag is set
 5/ metadata update completes, lock is dropped.

What if some other event causes the metadata update to happen during '2'?

Maybe if you delayed setting MD_CLUSTER_COMM_LOCKED until after '2',
i.e. set it when scheduling the update.  So the flag means: "lock has
been taken for metadata update".

One tricky bit there would be if metadata_update_start() found the bit
wasn't set, and then entered mutex_lock() in lock_comm().  When
MD_CLUSTER_COMM_LOCKED gets set we want that code to stop waiting and
start doing useful things.  But it won't.

It might be easiest to make our own 'mutex' with a flag bit and a wait
queue.
Then calls to mutex_lock become

   wait_event(mddev->wait, !test_and_set_bit(bit, &cinfo->state));

mutex_unlock becomes

   clear_bit(bit, &cinfo->state);

and the mutex_lock used from metadata_update_start is

   wait_event(mddev->wait,
              !test_and_set_bit(bit, &cinfo->state) ||
              test_and_clear_bit(MD_CLUSTER_COMM_LOCKED, ....));

That might work.... providing it is well documented.

(and you are right - mutex_trylock wouldn't have helped, I was
completely misunderstanding).

Thanks,
NeilBrown

>
> The flag is used only for case 3. If the communication is already
> locked, it should not lock again. This flag is set only after
> lock_comm() has executed, but it is checked for in
> metadata_update_start(). This should ensure that operations 1, 2 and 3
> do not interfere with each other.
>
> I am not sure if I have made the best effort to explain this. I had a
> tough time getting it right (which may or may not be complete).
>
>>
>> Maybe if it was test_and_set_bit in metadata_update_start, it might
>> make sense.  But then I would suggest that clearing the bit be moved
>> to unlock_comm()
>>
>> Do you just want to use mutex_trylock() to detect if communication is
>> already locked?
>
> Consider a race between md_update_sb() and sendmsg() [cases 1 and 2].
>
> mutex_trylock() may not work in this situation because lock_comm()
> could have been called by sendmsg(), which will release it as soon as
> the message is sent. If, in the meantime (while the lock is held), a
> metadata_update_sb() operation executes, it will not re-lock the
> communication. This will result in the WARN_ON in unlock_comm(), since
> the sendmsg() sequence had already unlocked it.
>
>
>>
>>>
>>> Also, remove the stray/unused sb_mutex.
>>
>> I already removed that in mainline - I should have mentioned.
>>
>> Thanks,
>> NeilBrown
>>
>>
>>>
>>> Signed-off-by: Goldwyn Rodrigues
>>> ---
>>>   drivers/md/md-cluster.c | 26 +++++++++++++++++++++-----
>>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
>>> index b73729b..a93734e 100644
>>> --- a/drivers/md/md-cluster.c
>>> +++ b/drivers/md/md-cluster.c
>>> @@ -47,6 +47,7 @@ struct resync_info {
>>>  #define MD_CLUSTER_WAITING_FOR_NEWDISK		1
>>>  #define MD_CLUSTER_SUSPEND_READ_BALANCING	2
>>>  #define MD_CLUSTER_BEGIN_JOIN_CLUSTER		3
>>> +#define MD_CLUSTER_COMM_LOCKED			4
>>>
>>>
>>>  struct md_cluster_info {
>>> @@ -54,7 +55,8 @@ struct md_cluster_info {
>>>  	dlm_lockspace_t *lockspace;
>>>  	int slot_number;
>>>  	struct completion completion;
>>> -	struct mutex sb_mutex;
>>> +	struct mutex recv_mutex;
>>> +	struct mutex send_mutex;
>>>  	struct dlm_lock_resource *bitmap_lockres;
>>>  	struct dlm_lock_resource *resync_lockres;
>>>  	struct list_head suspend_list;
>>> @@ -503,6 +505,7 @@ static void recv_daemon(struct md_thread *thread)
>>>  	struct cluster_msg msg;
>>>  	int ret;
>>>
>>> +	mutex_lock(&cinfo->recv_mutex);
>>>  	/*get CR on Message*/
>>>  	if (dlm_lock_sync(message_lockres, DLM_LOCK_CR)) {
>>>  		pr_err("md/raid1:failed to get CR on MESSAGE\n");
>>> @@ -529,6 +532,7 @@ static void recv_daemon(struct md_thread *thread)
>>>  	ret = dlm_unlock_sync(message_lockres);
>>>  	if (unlikely(ret != 0))
>>>  		pr_info("unlock msg failed return %d\n", ret);
>>> +	mutex_unlock(&cinfo->recv_mutex);
>>>  }
>>>
>>>  /* lock_comm()
>>> @@ -542,20 +546,22 @@ static int lock_comm(struct md_cluster_info *cinfo)
>>>  {
>>>  	int error;
>>>
>>> -	if (cinfo->token_lockres->mode == DLM_LOCK_EX)
>>> -		return 0;
>>> +	mutex_lock(&cinfo->send_mutex);
>>>
>>>  	error = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
>>>  	if (error)
>>>  		pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n",
>>>  				__func__, __LINE__, error);
>>> +	mutex_lock(&cinfo->recv_mutex);
>>>  	return error;
>>>  }
>>>
>>>  static void unlock_comm(struct md_cluster_info *cinfo)
>>>  {
>>>  	WARN_ON(cinfo->token_lockres->mode != DLM_LOCK_EX);
>>> +	mutex_unlock(&cinfo->recv_mutex);
>>>  	dlm_unlock_sync(cinfo->token_lockres);
>>> +	mutex_unlock(&cinfo->send_mutex);
>>>  }
>>>
>>>  /* __sendmsg()
>>> @@ -709,7 +715,8 @@ static int join(struct mddev *mddev, int nodes)
>>>  	init_completion(&cinfo->completion);
>>>  	set_bit(MD_CLUSTER_BEGIN_JOIN_CLUSTER, &cinfo->state);
>>>
>>> -	mutex_init(&cinfo->sb_mutex);
>>> +	mutex_init(&cinfo->send_mutex);
>>> +	mutex_init(&cinfo->recv_mutex);
>>>  	mddev->cluster_info = cinfo;
>>>
>>>  	memset(str, 0, 64);
>>> @@ -839,7 +846,12 @@ static int slot_number(struct mddev *mddev)
>>>
>>>  static int metadata_update_start(struct mddev *mddev)
>>>  {
>>> -	return lock_comm(mddev->cluster_info);
>>> +	struct md_cluster_info *cinfo = mddev->cluster_info;
>>> +	int ret;
>>> +	if (test_and_clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state))
>>> +		return 0;
>>> +	ret = lock_comm(cinfo);
>>> +	return ret;
>>>  }
>>>
>>>  static int metadata_update_finish(struct mddev *mddev)
>>> @@ -864,6 +876,7 @@ static int metadata_update_finish(struct mddev *mddev)
>>>  		ret = __sendmsg(cinfo, &cmsg);
>>>  	} else
>>>  		pr_warn("md-cluster: No good device id found to send\n");
>>> +	clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>>  	unlock_comm(cinfo);
>>>  	return ret;
>>>  }
>>> @@ -871,6 +884,7 @@ static int metadata_update_finish(struct mddev *mddev)
>>>  static void metadata_update_cancel(struct mddev *mddev)
>>>  {
>>>  	struct md_cluster_info *cinfo = mddev->cluster_info;
>>> +	clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>>  	unlock_comm(cinfo);
>>>  }
>>>
>>> @@ -945,6 +959,7 @@ static int add_new_disk(struct mddev *mddev, struct md_rdev *rdev)
>>>  	memcpy(cmsg.uuid, uuid, 16);
>>>  	cmsg.raid_slot = cpu_to_le32(rdev->desc_nr);
>>>  	lock_comm(cinfo);
>>> +	set_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>>  	ret = __sendmsg(cinfo, &cmsg);
>>>  	if (ret)
>>>  		return ret;
>>> @@ -964,6 +979,7 @@ static void add_new_disk_cancel(struct mddev *mddev)
>>>  {
>>>  	struct md_cluster_info *cinfo = mddev->cluster_info;
>>> +	clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state);
>>>  	unlock_comm(cinfo);
>>>  }
>>>
>>> --
>>> 1.8.5.6
>
> --
> Goldwyn
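
Purely as an illustration of the "home-made mutex" idea NeilBrown sketches
above (and not the patch that was posted, nor what was eventually merged),
the locking helpers might end up looking roughly like this.  The bit name
MD_CLUSTER_SEND_LOCK and the wait queue cinfo->wait are assumptions made
for the example; cinfo->wait would need to be a wait_queue_head_t added to
struct md_cluster_info and initialised with init_waitqueue_head() in join().

/*
 * Sketch only: a hand-rolled lock built from a state bit and a wait
 * queue, so that metadata_update_start() can either take the lock
 * itself or adopt one already held (and flagged) by add_new_disk().
 * MD_CLUSTER_SEND_LOCK and cinfo->wait are assumed names.
 */
#define MD_CLUSTER_SEND_LOCK	5	/* assumed, currently unused bit */

static int lock_token(struct md_cluster_info *cinfo)
{
	int error;

	/* take the token DLM lock, as lock_comm() does today */
	error = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
	if (error)
		pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n",
		       __func__, __LINE__, error);
	return error;
}

static int lock_comm(struct md_cluster_info *cinfo)
{
	/* "mutex_lock": wait until we can claim the SEND_LOCK bit */
	wait_event(cinfo->wait,
		   !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state));
	return lock_token(cinfo);
}

static void unlock_comm(struct md_cluster_info *cinfo)
{
	WARN_ON(cinfo->token_lockres->mode != DLM_LOCK_EX);
	dlm_unlock_sync(cinfo->token_lockres);
	/* "mutex_unlock": release the bit and wake any waiter */
	clear_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state);
	wake_up(&cinfo->wait);
}

static int metadata_update_start(struct mddev *mddev)
{
	struct md_cluster_info *cinfo = mddev->cluster_info;
	int got_lock = 0;

	/*
	 * Either claim the lock ourselves, or take over a lock that
	 * add_new_disk() flagged for this metadata update by setting
	 * MD_CLUSTER_COMM_LOCKED when it scheduled the update.
	 */
	wait_event(cinfo->wait,
		   (got_lock = !test_and_set_bit(MD_CLUSTER_SEND_LOCK,
						 &cinfo->state)) ||
		   test_and_clear_bit(MD_CLUSTER_COMM_LOCKED, &cinfo->state));

	if (!got_lock)
		/* lock and token are already held by the add_new_disk() path */
		return 0;
	return lock_token(cinfo);
}

Note that whichever path sets MD_CLUSTER_COMM_LOCKED (for example
add_new_disk() when it schedules the metadata update) would also need to
call wake_up(&cinfo->wait), so that a metadata_update_start() already
sleeping in the wait_event() re-checks the condition and stops waiting.
That hand-over while a waiter is parked is exactly what a plain mutex
cannot provide, which is the point of the flag-plus-wait-queue approach.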