From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from 71-19-161-253.dedicated.allstream.net ([71.19.161.253] helo=nsa.nbspaymentsolutions.com) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1WHxHm-00084w-Ct for linux-mtd@lists.infradead.org; Mon, 24 Feb 2014 15:17:39 +0000 From: Bill Pringlemeir To: Richard Weinberger Subject: Re: UBI leb_write_unlock NULL pointer Oops (continuation) References: <52EF772D.8080207@nod.at> <52EF9FFE.4020405@nod.at> <52F1F658.9080701@nod.at> <87zjlxy8lj.fsf@nbsps.com> <87txc4w698.fsf@nbsps.com> <877g8ojqsn.fsf@nbsps.com> <53079725.6080105@nod.at> <87ios8gsho.fsf@nbsps.com> <87k3coq8di.fsf@nbsps.com> <530860B1.9000808@nod.at> Date: Mon, 24 Feb 2014 10:09:56 -0500 In-Reply-To: <530860B1.9000808@nod.at> (Richard Weinberger's message of "Sat, 22 Feb 2014 09:32:49 +0100") Message-ID: <877g8kr1h7.fsf@nbsps.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: "Wiedemer, Thorsten \(Lawo AG\)" , "linux-mtd@lists.infradead.org" List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , On 22 Feb 2014, richard@nod.at wrote: > Am 22.02.2014 01:49, schrieb Bill Pringlemeir: >> I think the 'ubi_eba_copy_leb()' may have an issue. >> >> int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to, >> struct ubi_vid_hdr *vid_hdr) >> { >> ... >> >> err = leb_write_trylock(ubi, vol_id, lnum); >> >> static int leb_write_trylock(struct ubi_device *ubi, int vol_id, int lnum) >> { >> .. >> le = ltree_add_entry(ubi, vol_id, lnum); /* users + 1 */ >> if (IS_ERR(le)) >> return PTR_ERR(le); >> if (down_write_trylock(&le->mutex)) >> return 0; >> >> /* Contention, cancel */ >> spin_lock(&ubi->ltree_lock); >> le->users -= 1; /* user - 1 */ >> ... >> spin_unlock(&ubi->ltree_lock); >> >> return 1; >> } >> >> if (err)... >> if (vol->eba_tbl[lnum] != from) { dbg_wl("LEB %d:%d is no longer mapped to PEB %d, mapped to PEB %d, cancel", vol_id, lnum, from, vol->eba_tbl[lnum]); err = MOVE_CANCEL_RACE; goto out_unlock_leb; } ... out_unlock_leb: leb_write_unlock(ubi, vol_id, lnum); /* user - 1 */ >> Didn't the count have to bump up in this case? This isn't in Thorsten's >> traces, but neither are any 'down_read' or 'up_read' traces; maybe >> everything is in cache? > Hmm, I'm not sure whether I was able to follow your thought. But > leb_write_unlock() is balanced with leb_write_trylock() in > ubi_eba_copy_leb() which makes perfectly sense to me. What exactly is > the problem? There are two things that must be balanced. The 'reference count' ubi_ltree_entry -> users and the rw_semaphore down/up. You are right, the trylock needs to be balanced by the 'leb_write_unlock'. However, the 'leb_write_trylock()' has already decremented 'users' in preperation to move the 'lnum'. However, in the case of contention, 'ubi_eba_copy_leb' bails and does the 'leb_write_unlock()', which balances the 'trylock', but unbalances the 'users' reference count (I added some comments on the lines). I was postulating that the down/up paths are not on the same semaphore. There is nothing in the code to check this. It is possible for someone to come in and recycle the 'lnum' and create a new rwsemaphore. Here, the down/up are done on different 'rwsemaphores' and this will cause issues. I was looking at using a 'kref' instead of 'users' and then I found this path. Fwiw, Bill Pringlemeir.