From mboxrd@z Thu Jan 1 00:00:00 1970
From: sagi@grimberg.me (Sagi Grimberg)
Date: Wed, 8 Mar 2017 21:34:17 +0200
Subject: nvmet: race condition while CQE are getting processed concurrently with the DISCONNECTED event
In-Reply-To: <20170308154605.GA28937@infradead.org>
References: <7dc99796-899e-b1a0-6ddb-cbfc497195dd@grimberg.me> <20170308154605.GA28937@infradead.org>
Message-ID: <9e9301a7-93a1-20a7-4f4f-d50f26a176e8@grimberg.me>

>> For that perhaps you can try patch [1].
>
> Yes, I'll think we need that. Did I mention that the percpu
> refounter API is a complete trainwreck a couple times? :)

Heh, you probably did. I wonder what the use case is for
percpu_ref_kill without the guarantee that a subsequent
percpu_ref_tryget_live will fail...

>> 2. ib_destroy_cq does not really protect against a case where
>> the work requeue itself because it runs flush_work(). In this
>> case when the work re-executes it polls a cq array that is
>> already freed and sees a bogus successful completion. Perhaps
>> ib_free_cq should run cancel_work_sync() instead? see [2].
>
> Yeah, we'll probably need that as well. Independent of it solves
> the problem reported here.

I'll send proper patches.

It would be nice if Raju or Yi could check whether this helps at all.