From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga09.intel.com ([134.134.136.24]) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1WUZFb-0003i8-DT for linux-mtd@lists.infradead.org; Mon, 31 Mar 2014 10:15:31 +0000
Message-ID: <1396260879.9016.70.camel@sauron.fi.intel.com>
Subject: Re: ubifs: assertion fails
From: Artem Bityutskiy
To: Dolev Raviv
Date: Mon, 31 Mar 2014 13:14:39 +0300
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Cc: linux-mtd@lists.infradead.org
Reply-To: dedekind1@gmail.com
List-Id: Linux MTD discussion mailing list

On Mon, 2014-03-24 at 06:03 +0000, Dolev Raviv wrote:
> Hi all,
>
> I’m doing my first steps learning ubifs and I’m trying to understand
> something that does not make much sense to me.
>
> In fs/ubifs/shrinker.c, at shrink_tnc(), there is an assert condition that
> shows up every once in a while (after stressing).
> ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0);

When this happens, do you then see a storm of similar assertions from
other parts of the code?

I am trying to understand whether this assertion is incorrect, or whether
the accounting really gets screwed up when shrinking happens. In the
former case, this would probably be a single assertion; in the latter,
you'd probably see many similar warnings from other code, e.g., when you
unmount.

> In another place in the same file in the function ubifs_shrinker(), I
> found the following comment:
> /*
>  * Due to the way UBIFS updates the clean znode counter it may
>  * temporarily be negative.
>  */

Yeah. The key here is 'c->cnext'. If it is NULL, the accounting must be
correct; if it is not NULL, it may be incorrect. It becomes correct again
when the ongoing commit operation finishes and 'free_obsolete_znodes()'
is called.

> Could the assertion condition be wrong?

Could be, but it could also indicate a real accounting error that occurs
when the shrinker starts. I have seen mysterious errors once the shrinker
starts working, but did not have time to dig into them. So there is at
least one bug in the shrinker path that I have seen.

> Can anyone share information on what are those times that the counter can
> be negative?

When the commit operation starts, it grabs the tnc_mutex, prepares the
list of nodes to commit, and releases the tnc_mutex. From this point on
the accounting may be incorrect. When the commit finishes, it grabs the
mutex again, does some stuff, and also fixes the accounting. Then it
drops the mutex.

The idea was to make sure that commit does not block I/O, meaning that
you can still write files while a commit is going on.

-- 
Best Regards,
Artem Bityutskiy
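
P.S. To make the commit window described above more concrete, here is a
small userspace toy model. It is only a sketch and not UBIFS code: the
struct, its fields and all function names are made up for illustration,
a pthread mutex stands in for tnc_mutex, and a boolean stands in for "the
commit list is not empty". The model assumes that the counter corrections
owed by a commit are applied only when the commit ends, so a decrement
that arrives in between can push the counter below zero for a while.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the file-system state; not the real ubifs_info. */
struct toy_fs {
	pthread_mutex_t tnc_mutex; /* protects all fields below */
	long clean_cnt;            /* clean-node counter, may drift during commit */
	long deferred;             /* corrections owed, applied at commit end */
	bool commit_running;       /* models "commit list is not empty" */
};

/* Commit start: take the mutex, record what will be committed, drop it. */
static void commit_start(struct toy_fs *fs, long nodes)
{
	pthread_mutex_lock(&fs->tnc_mutex);
	fs->commit_running = true;
	fs->deferred = nodes; /* counted as clean only at commit end */
	pthread_mutex_unlock(&fs->tnc_mutex);
	/* From here on, writers may run while the commit is in flight. */
}

/* A writer dirties one node during the commit: one clean node fewer. */
static void dirty_one(struct toy_fs *fs)
{
	pthread_mutex_lock(&fs->tnc_mutex);
	fs->clean_cnt--;
	pthread_mutex_unlock(&fs->tnc_mutex);
}

/* Commit end: take the mutex again and fix up the accounting. */
static void commit_end(struct toy_fs *fs)
{
	pthread_mutex_lock(&fs->tnc_mutex);
	fs->clean_cnt += fs->deferred;
	fs->deferred = 0;
	fs->commit_running = false;
	pthread_mutex_unlock(&fs->tnc_mutex);
}

/* The ">= 0" check is only meaningful when no commit is in flight. */
static bool counter_ok(struct toy_fs *fs)
{
	bool ok;

	pthread_mutex_lock(&fs->tnc_mutex);
	ok = fs->commit_running || fs->clean_cnt >= 0;
	pthread_mutex_unlock(&fs->tnc_mutex);
	return ok;
}

/* Runs one commit with a concurrent write; reports the counter twice. */
void run_scenario(long *mid_commit, long *after_commit)
{
	struct toy_fs fs = { PTHREAD_MUTEX_INITIALIZER, 0, 0, false };

	commit_start(&fs, 3);
	dirty_one(&fs);
	*mid_commit = fs.clean_cnt;   /* temporarily negative */
	assert(counter_ok(&fs));      /* tolerated: commit still running */

	commit_end(&fs);
	*after_commit = fs.clean_cnt; /* accounting fixed up again */
	assert(counter_ok(&fs));      /* strict check applies now */
}
```

In this model an unconditional "counter >= 0" assertion placed in
dirty_one() would fire during the commit even though nothing is broken,
while the same check gated on the commit state would not. Whether that is
what happens in shrink_tnc() is exactly the open question in this thread.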