From: Hin-Tak Leung <htl10@users.sourceforge.net>
To: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: linux-fsdevel@vger.kernel.org,
Till Kamppeter <till.kamppeter@gmail.com>,
Naohiro Aota <naota@elisp.net>
Subject: Re: hfsplus BUG(), kmap and journalling.
Date: Tue, 30 Oct 2012 08:24:36 +0000
Message-ID: <508F8EC4.7050905@users.sourceforge.net>
In-Reply-To: <1350896532.3236.31.camel@slavad-ubuntu>
[-- Attachment #1: Type: text/plain, Size: 4664 bytes --]
Vyacheslav Dubeyko wrote:
> On Sat, 2012-10-20 at 07:24 +0100, Hin-Tak Leung wrote:
>> --- On Fri, 19/10/12, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>
>>> Hi Hin-Tak,
>>>
>>> On Thu, 2012-10-18 at 17:55 +0100, Hin-Tak Leung wrote:
>>>> Hi,
>>>>
>>>> While looking at a few of the older BUG() traces I keep getting from
>>>> running du on a somewhat large directory with lots of small files and
>>>> small directories, I noticed that they tend to have two sleeping
>>>> "? hfs_bnode_read()" entries towards the top. As it is a very small and
>>>> simple function which just reads a b-tree node record - sometimes only a
>>>> few bytes between a kmap/kunmap - I suspected it might just be the number
>>>> of simultaneous kmap()'s being run. So I put a mutex around it, just to
>>>> make sure only one copy of hfs_bnode_read() runs at a time.
>>>
>>> Yeah, you touch on a very important problem. The hfsplus driver needs to
>>> be reworked away from kmap()/kunmap(), because kmap() is slow,
>>> theoretically prone to deadlock, and deprecated. The alternative is
>>> kmap_atomic()/kunmap_atomic(), but that needs a deeper look at every
>>> place kmap() is used in the hfsplus driver.
>>>
>>> The mutex is useless. It simply hides the issue.
>>
>> Yes, I am aware of that - putting mutexes in just makes for fewer
>> simultaneous kmap() calls, but the limit on simultaneous kmap()'s can still
>> be reached, and fairly easily - just run 'du' a few more times, as I wrote
>> below.
>>
>
> Usually, hfs_bnode_read() is called after searching for some object in a
> b-tree. A struct hfs_find_data has to be initialized with hfs_find_init()
> before any search or operation inside a b-tree node, and hfs_find_exit()
> has to be called afterwards. The find data holds the b-tree's mutex: it is
> locked in hfs_find_init() and unlocked in hfs_find_exit(). And, usually,
> hfs_bnode_read() sits between the hfs_find_init()/hfs_find_exit() calls.
> So, as I understand it, your mutex inside hfs_bnode_read() is useless. But
> maybe not all hfs_bnode_read() calls are placed inside
> hfs_find_init()/hfs_find_exit(); that needs checking.
>
> I can't clearly understand which simultaneous kmap()'s you are talking
> about. As far as I can see, usually (especially in the case of
> hfs_bnode_read()) the kmap()/kunmap() pairs are localized in small pieces
> of code, and I would expect them to execute quickly. Do I misunderstand
> something?
Perhaps it is easier to show some code for the discussion. The attached patch
serializes the kmap in hfs_bnode_read(). This makes the driver work more
reliably, somehow. In real usage, multiple instances of hfs_bnode_read() do get
invoked in parallel. I assume that is because both SMP and file system access
itself are parallelized at various levels - e.g. a recursive walk like running
du probably has a few instances of readdir/getdents in flight at once?
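
For context, the call pattern Vyacheslav describes above looks schematically
like this - a simplified sketch of what e.g. hfsplus_lookup() does around
hfs_bnode_read(), not a verbatim excerpt from the driver (sb and cnid stand
for the superblock and the catalog node id being looked up):

	struct hfs_find_data fd;
	hfsplus_cat_entry entry;
	int err;

	/* takes the b-tree's tree_lock mutex and allocates the search keys */
	err = hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
	if (err)
		return err;

	err = hfsplus_find_cat(sb, cnid, &fd);	/* search the catalog b-tree */
	if (!err)
		/* copy the found record out of the b-tree node (kmap inside) */
		hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, fd.entrylength);

	/* unlocks tree_lock and frees the search keys */
	hfs_find_exit(&fd);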
>> I tried swapping those for kmap_atomic()/kunmap_atomic() (beware that the
>> arguments are different for the unmap) - but the kernel immediately warned
>> that *_atomic() code is being used where code can sleep.
>>
>
> Could you describe the warning in more detail?
In hfs_bnode_read(), I tried changing kmap()/kunmap() to
kmap_atomic()/kunmap_atomic(). (Sorry, I deleted the change since it does not
work - but it shouldn't be too difficult to redo, since it is just changing two
lines in hfs_bnode_read().) Then I get many:
BUG: scheduling while atomic: ...
<a lot of stuff snipped>
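
(For reference, the change was roughly the following in hfs_bnode_read() -
reconstructed here from memory rather than copied from the deleted diff, with
"kaddr" just a local name used for illustration:

	void *kaddr;

	/* first chunk, possibly starting mid-page */
	kaddr = kmap_atomic(*pagep);
	memcpy(buf, kaddr + off, l);
	kunmap_atomic(kaddr);	/* note: takes the mapped address, not the page */

and similarly for the whole-page copies inside the while loop.)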
Netgear's hfs+ journalling code does try to replay journals, etc. But (1) it
does not *create* journal entries correctly for the attributes file, since the
implementation of the attributes file itself was incomplete until recently,
and (2) I suspect it does not create/update the journal in the *exact same*
way as Mac OS X - which means it is unsafe to do cross-OS unclean mounts, even
when an unclean mount works perfectly by itself under Linux.
BTW, I think I see another bug - unrelated to journalling - in the current hfs+
code: folder counts are not updated correctly. It seems that hfs+ folders carry
a folder count (i.e. a parent directory knows how many sub-directories it has,
without needing to recurse down), and this is currently not kept up to date.
One way of demonstrating this is:

(make sure the fs is fsck-clean; mount)
cp -ipr somedir_with_some_files/ some_hfs+_place/someplace_inside/
(umount; run fsck_hfs again - now it complains about folder counts)

The actual messages look a bit like this:
** Checking catalog hierarchy.
HasFolderCount flag needs to be set (id = 393055)
(It should be 0x10 instead of 0)
Incorrect folder count in a directory (id = 205)
(It should be 18 instead of 17)
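
The kind of bookkeeping that appears to be missing is, very roughly, the
following. This is only an illustrative sketch - the struct, field and macro
names below are placeholders made up for the example, not the hfsplus driver's
actual identifiers:

	#include <linux/types.h>	/* __be16, __be32 */
	#include <asm/byteorder.h>	/* cpu_to_be16, be32_add_cpu */

	#define HFSP_HAS_FOLDER_COUNT	0x0010	/* the 0x10 flag fsck reports */

	struct folder_rec {		/* stand-in for a catalog folder record */
		__be16 flags;
		__be32 folder_count;	/* number of immediate sub-directories */
	};

	/* When a sub-directory is created (or copied in) under "parent",
	 * the parent's count should be bumped and the flag set, so the
	 * count can be read without recursing into the directory. */
	static void bump_folder_count(struct folder_rec *parent)
	{
		parent->flags |= cpu_to_be16(HFSP_HAS_FOLDER_COUNT);
		be32_add_cpu(&parent->folder_count, 1);
	}

A matching decrement would be needed on rmdir/rename; from the fsck output
above (17 instead of 18), it looks like the increment on creation is the part
that is not happening.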
[-- Attachment #2: 0001-serialize-hfs_bnode_read_kmap.patch --]
[-- Type: text/x-patch, Size: 1520 bytes --]
>From 3b69f7f0b2827d00daf20aa75991e8d184be5ad0 Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <htl10@users.sourceforge.net>
Date: Thu, 11 Oct 2012 01:46:49 +0100
Subject: [PATCH] serialize hfs_bnode_read_kmap

Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
---
 fs/hfsplus/bnode.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 5c125ce..7c19ad9 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -17,6 +17,20 @@
 #include "hfsplus_fs.h"
 #include "hfsplus_raw.h"
 
+static DEFINE_MUTEX(hfs_bnode_read_kmap_mutex_mutex);
+
+static inline void
+hfs_bnode_read_kmap_mutex_lock(void)
+{
+	mutex_lock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
+static inline void
+hfs_bnode_read_kmap_mutex_unlock(void)
+{
+	mutex_unlock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
 /* Copy a specified range of bytes from the raw data of a node */
 void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 {
@@ -28,14 +42,18 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
 	off &= ~PAGE_CACHE_MASK;
 
 	l = min(len, (int)PAGE_CACHE_SIZE - off);
+	hfs_bnode_read_kmap_mutex_lock();
 	memcpy(buf, kmap(*pagep) + off, l);
 	kunmap(*pagep);
+	hfs_bnode_read_kmap_mutex_unlock();
 
 	while ((len -= l) != 0) {
 		buf += l;
 		l = min(len, (int)PAGE_CACHE_SIZE);
+		hfs_bnode_read_kmap_mutex_lock();
 		memcpy(buf, kmap(*++pagep), l);
 		kunmap(*pagep);
+		hfs_bnode_read_kmap_mutex_unlock();
 	}
 }
 
-- 
1.7.11.7