From: Jörn Engel
Subject: Race between __sync_single_inode() and LogFS garbage collector
Date: Mon, 19 Feb 2007 21:31:51 +0000
Message-ID: <20070219213150.GD7813@lazybastard.org>
To: linux-fsdevel@vger.kernel.org

Looks like I really am writing the first log-structured filesystem for
Linux.  At least I ran into a fairly arcane race that seems to be generic
to all of them.

Writing when space is tight may involve calling the garbage collector.
The garbage collector will iget() random inodes, either to verify whether
a block is valid or to copy the block around.  At this point, all writes
to LogFS are serialized.

__sync_single_inode() will first lock a random inode, then call
write_inode(), then unlock the inode.

So we can get this:

__sync_single_inode()                   garbage collector
---------------------------------------------------------------------
inode->i_state |= I_LOCK;               ...
...                                     mutex_lock(&super->s_w_mutex);
write_inode(inode, wait);               ...
...                                     iget(sb, ino);
mutex_lock(&super->s_w_mutex);          ...
...                                     wait_on_inode(inode);
mutex_unlock(&super->s_w_mutex);        ...
...
inode->i_state &= ~I_LOCK;

And once in a blue moon, those two will race for the same inode.  When
that happens, each side sleeps on the lock the other one holds.  As far
as I can see, the race can only get fixed in two ways:
1. Never iget() inside the garbage collector.  That would require having
   a private inode cache for LogFS.
2. Synchronize __sync_single_inode() and the garbage collector somehow.

Variant 1 would result in double caching for the same object, something
I would like to avoid.  So does anyone have suggestions how variant 2
could be achieved?  Essentially what I need is a way to say "don't sync
any inodes right now, I'll be back in 5 milliseconds or so".

Jörn

-- 
Courage is not the absence of fear, but rather the judgement that
something else is more important than fear.
-- Ambrose Redmoon
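
For anyone who wants to poke at the interleaving in the table without a kernel, here is a minimal userspace sketch that replays it step by step. It is only a model under stated assumptions: the names `struct model`, `try_take()` and `replay_deadlocks()` are hypothetical and are neither kernel nor LogFS interfaces, locks are plain owner fields rather than real mutexes, and the only thing it shows is that both sides block exactly when the garbage collector asks for the inode the sync path has locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names are kernel or LogFS APIs. */
enum owner { NOBODY, SYNC_PATH, GC_PATH };

#define NR_INODES 2

struct model {
    enum owner i_lock[NR_INODES]; /* stands in for I_LOCK in i_state */
    enum owner s_w_mutex;         /* stands in for super->s_w_mutex */
};

/* Take a lock if it is free; false means the caller would have to sleep. */
static bool try_take(enum owner *lock, enum owner who)
{
    if (*lock != NOBODY)
        return false;
    *lock = who;
    return true;
}

/*
 * Replay the interleaving from the table: the sync path locks sync_ino,
 * the GC takes the write mutex, then each side asks for what the other
 * one holds.  Returns true if both sides end up blocked.
 */
static bool replay_deadlocks(int sync_ino, int gc_ino)
{
    struct model m = { { NOBODY, NOBODY }, NOBODY };

    assert(sync_ino >= 0 && sync_ino < NR_INODES);
    assert(gc_ino >= 0 && gc_ino < NR_INODES);

    try_take(&m.i_lock[sync_ino], SYNC_PATH); /* i_state |= I_LOCK */
    try_take(&m.s_w_mutex, GC_PATH);          /* GC: mutex_lock() */

    /* write_inode() needs the write mutex, which the GC already holds. */
    bool sync_blocked = !try_take(&m.s_w_mutex, SYNC_PATH);

    /* iget() ends in wait_on_inode(), which sleeps while I_LOCK is set. */
    bool gc_blocked = m.i_lock[gc_ino] != NOBODY;

    return sync_blocked && gc_blocked;
}
```

In this model `replay_deadlocks(0, 0)` reports a deadlock and `replay_deadlocks(0, 1)` does not: with different inodes the garbage collector finishes, drops the mutex, and the sync path can continue. That matches the "once in a blue moon" behaviour, since the two paths have to pick the same inode.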