From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.nokia.com ([192.100.122.233] helo=mgw-mx06.nokia.com) by bombadil.infradead.org with esmtps (Exim 4.69 #1 (Red Hat Linux)) id 1N8rPH-0006EU-Ul for linux-mtd@lists.infradead.org; Fri, 13 Nov 2009 08:21:29 +0000 Subject: Re: a UBIFS image makes task pdflush blocked > 120 seconds From: Artem Bityutskiy To: Norbert van Bolhuis In-Reply-To: <4AFADE3B.3050406@aimvalley.nl> References: <4ACF3478.1060503@aimvalley.nl> <1255269148.16942.95.camel@localhost> <4AD30069.8060101@aimvalley.nl> <1255510600.32489.130.camel@localhost> <4AFADE3B.3050406@aimvalley.nl> Content-Type: text/plain; charset="UTF-8" Date: Fri, 13 Nov 2009 10:20:55 +0200 Message-Id: <1258100455.21596.1187.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Cc: linux-mtd Reply-To: dedekind1@gmail.com List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Hi, On Wed, 2009-11-11 at 16:54 +0100, Norbert van Bolhuis wrote: > >> ok, will do that. I'll track commit_sem, io_mutex and ubifs_garbage_collect(). > >> I'll report back. > >> > >> Btw. stackdump is the same (2 out of 2 times). > > > > OK. I really do not have any idea off the top of my head now, sorry. > > Try to investigate this a beet deeper. > > > > > I investigated this. It turn out it's some kind of race condition between the pdflush > deamon and other tasks (e.g. events). It has everything to do the the commit_sem > and io_mutex. It's not related to my preprogrammed UBIFS image (i've seen it also after > a big file was written to UBIFS and a sync was done "simultaneously") nor is it related > to ubifs_garbage_collect(). > > Here's what I did and how I conclude the above. > > I logged (in memory) all commit_sem, io_mutex and ubifs_garbage_collect actions > and parse the results to human readable text afterwards. > "Req" means mutex/semaphore requested. If there's no "Req" the operation > has been done. > > > The race condition occurs for the below UBIFS pseaudo code (which is done by many > UBIFS functions): > > > err = make_reservation () { > down_read(&c->commit_sem) /*** commit_sem, make_reservation, DOWN 1 ***/ > err = reserve_space(c, jhead, len) { > ... > again: > mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead); /*** io_mutex, reserve_space, LOCK 1 ***/ > ... > avail = c->leb_size - wbuf->offs - wbuf->used; > if (wbuf->lnum != -1 && avail >= len) > return 0; > } > if (!err) > return 0; > } > /* the journal head stays locked and later has to be unlocked using release_head() */ > if (err) > goto out_free; > ... > err = write_head(c, BASEHD, dent, len, &lnum, &dent_offs, sync); > if (err) > goto out_release; > ... > release_head(c, BASEHD) { > mutex_unlock(&c->jheads[jhead].wbuf.io_mutex); /*** io_mutex, release_head, UNLOCK ***/ > } > ... > finish_reservation(c) { > up_read(&c->commit_sem); /*** commit_sem, finish_reservation, UP ***/ > } > > > > > Normally (for one "update" I guess) the logs shows: > > TIME=0000.000, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1 > TIME=0000.000, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1 > TIME=0000.000, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1 > TIME=0000.000, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1 > TIME=0000.480, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK > TIME=0000.480, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK > TIME=0000.480, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP > TIME=0000.480, M(t=0,xy=88)=commit_sem, finish_reservation, UP What are M(t= ,xy= ) ? Would be very useful to save process ID (current()->pid) as well. > > > > when "another task" is also updating the log shows: > > TIME=0304.252, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1 > TIME=0304.252, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1 > TIME=0304.252, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1 > TIME=0304.252, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1 > TIME=0304.292, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1 > TIME=0304.292, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1 <- 2nd DOWN !? Right, this is RW semaphore, and in journal reservation we lock it for reading (down_read()), and there may be many readers at a time. The only place this semaphore is locked for writing is the commit code, where we prohibit all journal accesses for the first phase of commit. May be there is some interference with down_write(commit_sem) ? > TIME=0304.292, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1 > TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK > TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK > TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP > TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP > TIME=0304.736, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1 > TIME=0304.736, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK > TIME=0304.736, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK > TIME=0304.736, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP > TIME=0304.736, M(t=0,xy=88)=commit_sem, finish_reservation, UP > > I really don't understand the commit_sem here. Is there a 2nd copy of commit_sem ? See above. > the very first time pdflush interfers the pdflush error report shows > (for which there's also a log entry): > > TIME=0021.644, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1 > TIME=0021.644, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1 > TIME=0021.644, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1 > TIME=0021.644, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1 > TIME=0034.316, M(t=8,xy=06)=commit_sem, make_reservation, Req DOWN 1 > TIME=0034.316, M(t=0,xy=06)=commit_sem, make_reservation, DOWN 1 <- 2nd DOWN !? > TIME=0034.316, M(t=9,xy=07)=io_mutex, reserve_space, Req LOCK 1 <- pdflush blocks here > TIME=0168.868, M(t=3,xy=00)=pdflush error report > TIME=0302.012, M(t=3,xy=00)=pdflush error report Please, save PIDs as well, it is very useful. So now we need to figure out why it cannot take the mutex fro so long time. You should look deeper to the first task, which took the mutex and blocks pdflush. What that task is doing. For sure it writes something to the flash, which goes down to UBI, then to MTD, then to your flash driver. And probably somewhere there the task is blocked for long time? > TIME=0303.280, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK > TIME=0303.280, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK <- pdflush can continue > TIME=0303.280, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP > TIME=0303.280, M(t=0,xy=88)=commit_sem, finish_reservation, UP > TIME=0303.284, M(t=1,xy=07)=io_mutex, reserve_space, LOCK 1 <- pdflush continues here > TIME=0303.284, M(t=9,xy=8a)=io_mutex, release_head, Req UNLOCK > TIME=0303.284, M(t=1,xy=8a)=io_mutex, release_head, UNLOCK > TIME=0303.284, M(t=8,xy=88)=commit_sem, finish_reservation, Req UP > TIME=0303.284, M(t=0,xy=88)=commit_sem, finish_reservation, UP > > > > I must admit I dont't see the root cause. I don't understand why the commit_sem > doesn't seem to work. Because it is R/W semaphore. In your investigations you looked at we locks: the commit_sem (which is an R/W sem.) and io_mutex (which is a normal mutex). Let's discuss what they protect. 1. commit_sem serializes all FS writers vs. the commit process. While UBIFS has space in journals, tasks can just write. But once the journal is full, we have to commit. When we commit, we need to stop all writers for a short period of time (not for whole commit time). Thus, what we do is that all journal writers take the commit_sem for reading, and the commit function takes it for writing. 2. In UBIFS wa have several journal heads. We put different data to different heads. And the io_mutex is per-head mutex. 2 tasks are allowed to write at the same time to different heads. But if they want to write to the same head, they are serialized by the io_mutex. > Nor do I understand why the "other task" needs to hold the > io_mutex for more than 280 seconds. You need to look deeper to that task. See where it goes next, and how it spends some that time. It might happen to be that by the time the other task releases io_mutex, there is a third task which takes it, and by the time the third task releases the mutex, there is a fourth task which takes it. So pdflush waits very long. IOW, I think mutexes are not very fair. However, I think that most likely the other task is just blocked for long time somewhere in UBI or MTD level, for some reasons. > If you see the reason or have any idea please let me know, otherwise I'll > investigate this more thorough. Try to see deeper at what the other task is doing, and store process ids as well. HTH. -- Best Regards, Artem Bityutskiy (Артём Битюцкий)