From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hin-Tak Leung
Subject: Re: hfsplus BUG(), kmap and journalling.
Date: Tue, 30 Oct 2012 08:24:36 +0000
Message-ID: <508F8EC4.7050905@users.sourceforge.net>
References: <1350714241.15797.YahooMailClassic@web29405.mail.ird.yahoo.com>
 <1350896532.3236.31.camel@slavad-ubuntu>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------090704090409060807020004"
Cc: linux-fsdevel@vger.kernel.org, Till Kamppeter, Naohiro Aota
To: Vyacheslav Dubeyko
Return-path: 
Received: from nm18.bullet.mail.ukl.yahoo.com ([217.146.183.192]:30083 "EHLO
 nm18.bullet.mail.ukl.yahoo.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751163Ab2J3IYr (ORCPT );
 Tue, 30 Oct 2012 04:24:47 -0400
In-Reply-To: <1350896532.3236.31.camel@slavad-ubuntu>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: 

This is a multi-part message in MIME format.
--------------090704090409060807020004
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Vyacheslav Dubeyko wrote:
> On Sat, 2012-10-20 at 07:24 +0100, Hin-Tak Leung wrote:
>> --- On Fri, 19/10/12, Vyacheslav Dubeyko wrote:
>>
>>> Hi Hin-Tak,
>>>
>>> On Thu, 2012-10-18 at 17:55 +0100, Hin-Tak Leung wrote:
>>>> Hi,
>>>>
>>>> While looking at a few of the older BUG() traces I have consistently
>>>> running du on a somewhat large directory with lots of small files and
>>>> small directories, I noticed that it tends to have two sleeping
>>>> "? hfs_bnode_read()" towards the top. As it is a very small and simple
>>>> function which just reads a b-tree node record - sometimes only a few
>>>> bytes between a kmap/kunmap, I see that it might just be the number of
>>>> simultaneous kmap() being run. So I put a mutex around it just to make
>>>> sure only one copy of hfs_bnode_read() is run at a time.
>>>
>>> Yeah, you touch very important problem.
>>> It needs to rework hfsplus driver from using kmap()/kunmap() because
>>> kmap() is slow, theoretically deadlocky and is deprecated. The
>>> alternative is kunmap_atomic() but it needs to dive more deeply in
>>> every case of kmap() using in hfsplus driver.
>>>
>>> The mutex is useless. It simply hides the issue.
>>
>> Yes, I am aware of that - putting mutexes in just makes for fewer kmap
>> calls, but the limit of simultaneous kmap()'s can still be reached -
>> and reasonably easily - just run 'du' a few more times, as I wrote
>> below.
>>
>
> Usually, hfs_bnode_read() is called after searching of some object in
> b-tree. It needs to initialize struct hfsplus_find_data by means of
> hfs_find_init() before any search and operations inside b-tree node.
> And then, it needs to call hfs_find_exit(). The hfsplus_find_data
> structure contains mutex that it locks of b-tree during hfs_find_init()
> call and unlock during hfs_find_exit(). And, usually, hfs_bnode_read()
> is placed between hfs_find_init()/hfs_find_exit() calls. So, as I can
> understand, your mutex inside hfs_bnode_read() is useless. But, maybe,
> not all hfs_bnode_read() calls are placed inside
> hfs_find_init()/hfs_find_exit() calls. It needs to check.
>
> I can't clear understand about what simultaneous kmap()'s you are
> talking. As I can see, usually, (especially, in the case of
> hfs_bnode_read()) pairs of kmap()/kunmap() localize in small parts of
> code and I expect that it executes fast. Do I misunderstand something?

Perhaps it is easier to show some code for the discussion. The attached 
patch serializes kmap() in hfs_bnode_read(). This makes the driver work 
more reliably, somehow. In real usage, multiple instances of 
hfs_bnode_read() do get invoked in parallel. I assume that is because of 
both SMP and the fact that file system access itself is parallelized at 
various levels - e.g. a recursive operation like running du probably 
invokes a few instances of readdir/getdents?
>> I tried swapping those by kmap_atomic()/kunmap_atomic() (beware the
>> arguments are different for unmap) - but the kernel immediately warned
>> that *_atomic() code is used where code can sleep.
>>
>
> Could you describe in more details about warning?

In hfs_bnode_read(), I tried changing kmap()/kunmap() to 
kmap_atomic()/kunmap_atomic(). (Sorry, I deleted the change since it 
does not work - but it shouldn't be too difficult to redo, since it is 
just changing two lines in hfs_bnode_read().) I then get many:

BUG: scheduling while atomic: ...

Netgear's hfs+ journalling code does try to replay journals, etc. But 
(1) it does not *create* journal entries correctly for the attributes 
file, since the implementation of the attributes file itself was 
incomplete until recently, and (2) I suspect it does not create/update 
journals the *exact same* way as Mac OS X - this means it is unsafe to 
do cross-OS unclean mounts, even when unclean-mount works perfectly by 
itself under Linux.

BTW, I think I see another bug - unrelated to journalling - in the 
current hfs+ code: folder counts are not updated correctly. It seems 
that hfs+ folders keep recursive folder counts (i.e. a parent dir knows 
how many sub-dirs it has, without needing to recurse down), and these 
are currently not updated correctly. One way of demo'ing this is:

(make sure the fs is fsck-clean; mount)
cp -ipr somedir_with_some_files/ some_hfs+_place/someplace_inside/
(umount; run fsck_hfs again - now it complains about folder counts)

The actual message looks a bit like this:

** Checking catalog hierarchy.
   HasFolderCount flag needs to be set (id = 393055)
   (It should be 0x10 instead of 0)
   Incorrect folder count in a directory (id = 205)
   (It should be 18 instead of 17)

--------------090704090409060807020004
Content-Type: text/x-patch;
 name="0001-serialize-hfs_bnode_read_kmap.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-serialize-hfs_bnode_read_kmap.patch"

From 3b69f7f0b2827d00daf20aa75991e8d184be5ad0 Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung
Date: Thu, 11 Oct 2012 01:46:49 +0100
Subject: [PATCH] serialize hfs_bnode_read_kmap

Signed-off-by: Hin-Tak Leung
---
 fs/hfsplus/bnode.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 5c125ce..7c19ad9 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -17,6 +17,20 @@
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"
 
+static DEFINE_MUTEX(hfs_bnode_read_kmap_mutex_mutex);
+
+static inline void
+hfs_bnode_read_kmap_mutex_lock(void)
+{
+	mutex_lock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
+static inline void
+hfs_bnode_read_kmap_mutex_unlock(void)
+{
+	mutex_unlock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
 /* Copy a specified range of bytes from the raw data of a node */
 void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 {
@@ -28,14 +42,18 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 	off &= ~PAGE_CACHE_MASK;
 
 	l = min(len, (int)PAGE_CACHE_SIZE - off);
+	hfs_bnode_read_kmap_mutex_lock();
 	memcpy(buf, kmap(*pagep) + off, l);
 	kunmap(*pagep);
+	hfs_bnode_read_kmap_mutex_unlock();
 
 	while ((len -= l) != 0) {
 		buf += l;
 		l = min(len, (int)PAGE_CACHE_SIZE);
+		hfs_bnode_read_kmap_mutex_lock();
 		memcpy(buf, kmap(*++pagep), l);
 		kunmap(*pagep);
+		hfs_bnode_read_kmap_mutex_unlock();
 	}
 }
 
-- 
1.7.11.7

--------------090704090409060807020004--