From: Hin-Tak Leung <htl10@users.sourceforge.net>
To: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: linux-fsdevel@vger.kernel.org,
Till Kamppeter <till.kamppeter@gmail.com>,
Naohiro Aota <naota@elisp.net>
Subject: Re: hfsplus BUG(), kmap and journalling.
Date: Tue, 30 Oct 2012 08:24:36 +0000 [thread overview]
Message-ID: <508F8EC4.7050905@users.sourceforge.net> (raw)
In-Reply-To: <1350896532.3236.31.camel@slavad-ubuntu>
[-- Attachment #1: Type: text/plain, Size: 4664 bytes --]
Vyacheslav Dubeyko wrote:
> On Sat, 2012-10-20 at 07:24 +0100, Hin-Tak Leung wrote:
>> --- On Fri, 19/10/12, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>
>>> Hi Hin-Tak,
>>>
>>> On Thu, 2012-10-18 at 17:55 +0100, Hin-Tak Leung wrote:
>>>> Hi,
>>>>
>>>> While looking at a few of the older BUG() traces I have
>>> consistently
>>>> running du on a somewhat large directory with lots of
>>> small files and
>>>> small directories, I noticed that it tends to have two
>>> sleeping "?
>>>> hfs_bnode_read()" towards the top. As it is a very
>>> small and simple
>>>> function which just reads a b-tree node record -
>>> sometimes only a few
>>>> bytes between a kmap/kunmap, I see that it might just
>>> be the number of
>>>> simultaneous kmap() being run. So I put a mutex around
>>> it just to make
>>>> sure only one copy of hfs_bnode_read() is run at a
>>> time.
>>>
>>> Yeah, you touch very important problem. It needs to rework
>>> hfsplus
>>> driver from using kmap()/kunmap() because kmap() is slow,
>>> theoretically
>>> deadlocky and is deprecated. The alternative is
>>> kunmap_atomic() but it
>>> needs to dive more deeply in every case of kmap() using in
>>> hfsplus
>>> driver.
>>>
>>> The mutex is useless. It simply hides the issue.
>>
>> Yes, I am aware of that - putting mutex'es in just makes fewer kmap calls, but the limit of simultaneous kmap()'s can still be reached - and reasonably easily - just run 'du' a few more times, as I wrote below.
>>
>
> Usually, hfs_bnode_read() is called after searching of some object in b-tree.
> It needs to initialize struct hfsplus_find_data by means of hfs_find_init()
> before any search and operations inside b-tree node. And then, it needs to
> call hfs_find_exit(). The hfsplus_find_data structure contains mutex that it
> locks of b-tree during hfs_find_init() call and unlock during
> hfs_find_exit(). And, usually, hfs_bnode_read() is placed
> between hfs_find_init()/hfs_find_exit() calls. So, as I can understand, your
> mutex inside hfs_bnode_read() is useless. But, maybe, not all hfs_bnode_read()
> calls are placed inside hfs_find_init()/hfs_find_exit() calls. It needs to check.
> I can't clear understand about what simultaneous kmap()'s you are talking.
> As
> I can see, usually, (especially, in the case of hfs_bnode_read()) pairs of
> kmap()/kunmap() localize in small parts of code and I expect that it executes
> fast. Do I misunderstand something?
Perhaps it is easier to show some code for the discussion. The attached
serializes kmap in hfs_bnode_read() . This makes the driver works more reliably,
somehow. In real usage, multiple instances of hfs_bnode_read() do get invoked in
parallel. I assume that is because of both SMP as well as file system access
itself are parallelized at various level - e.g. recursion like running du
probably invokes a few instances of readdir/getdents?
>> I tried swapping those by kmap_atomic()/kunmap_atomic() (beware the arguments are different for unmap) - but the kernel immediately warned that *_atom() code is used where code can sleep.
>>
>
> Could you describe in more details about warning?
In hfs_bnode_read(), I tried changing kmap/kunmap to
kmap_atomic()/kunmap_atomic() . (Sorry I deleted the change since it does not
work - but it shouldn't be too difficult to redo since it is just changing two
lines in hfs_bnode_read()) - then I get many:
BUG: scheduling while atomic: ...
<a lot of stuff snipped>
Netgear's hfs+ journalling code do try to replay journals, etc. But (1) it does
not *create* journal entry correctly for attributes files, since implementation
of attributes file itself was incomplete until recently, (2) I suspect it does
not create/update journals the *exact same* way as Mac OS X - this means it is
unsafe to do cross-OS unclean mounts, even when unclean-mount works perfectly by
itself under Linux.
BTW, I think I see another bug - unrelated to journalling - in the current hfs+
code: folder counts are not updated correctly. It seems that hfs+ folders have
recursive folder counts (i.e. a parent dir knows how many sub-dir's it has,
without needing to recurse down), and this is currently not updated correctly.
One way of demo'ing this is to:
(make sure fs is fsck-clean ; mount)
cp -ipr somedir_with_some_files/ some_hfs+_place/someplace_inside/
(umount; run fsck_hfs again, now it complains about folder counts)
The actual message looks a it like this:
** Checking catalog hierarchy.
HasFolderCount flag needs to be set (id = 393055)
(It should be 0x10 instead of 0)
Incorrect folder count in a directory (id = 205)
(It should be 18 instead of 17)
[-- Attachment #2: 0001-serialize-hfs_bnode_read_kmap.patch --]
[-- Type: text/x-patch, Size: 1520 bytes --]
>From 3b69f7f0b2827d00daf20aa75991e8d184be5ad0 Mon Sep 17 00:00:00 2001
From: Hin-Tak Leung <htl10@users.sourceforge.net>
Date: Thu, 11 Oct 2012 01:46:49 +0100
Subject: [PATCH] serialize hfs_bnode_read_kmap
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
---
fs/hfsplus/bnode.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
index 5c125ce..7c19ad9 100644
--- a/fs/hfsplus/bnode.c
+++ b/fs/hfsplus/bnode.c
@@ -17,6 +17,20 @@
#include "hfsplus_fs.h"
#include "hfsplus_raw.h"
+static DEFINE_MUTEX(hfs_bnode_read_kmap_mutex_mutex);
+
+static inline void
+hfs_bnode_read_kmap_mutex_lock(void)
+{
+ mutex_lock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
+static inline void
+hfs_bnode_read_kmap_mutex_unlock(void)
+{
+ mutex_unlock(&hfs_bnode_read_kmap_mutex_mutex);
+}
+
/* Copy a specified range of bytes from the raw data of a node */
void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
{
@@ -28,14 +42,18 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len)
off &= ~PAGE_CACHE_MASK;
l = min(len, (int)PAGE_CACHE_SIZE - off);
+ hfs_bnode_read_kmap_mutex_lock();
memcpy(buf, kmap(*pagep) + off, l);
kunmap(*pagep);
+ hfs_bnode_read_kmap_mutex_unlock();
while ((len -= l) != 0) {
buf += l;
l = min(len, (int)PAGE_CACHE_SIZE);
+ hfs_bnode_read_kmap_mutex_lock();
memcpy(buf, kmap(*++pagep), l);
kunmap(*pagep);
+ hfs_bnode_read_kmap_mutex_unlock();
}
}
--
1.7.11.7
next prev parent reply other threads:[~2012-10-30 8:24 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-18 16:55 hfsplus BUG(), kmap and journalling Hin-Tak Leung
2012-10-19 12:45 ` Vyacheslav Dubeyko
2012-10-20 6:24 ` Hin-Tak Leung
2012-10-22 9:02 ` Vyacheslav Dubeyko
2012-10-30 8:24 ` Hin-Tak Leung [this message]
2012-10-30 11:45 ` Vyacheslav Dubeyko
2012-11-02 5:43 ` hfsplus foldercount (Re: hfsplus BUG(), kmap and journalling.) Hin-Tak Leung
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=508F8EC4.7050905@users.sourceforge.net \
--to=htl10@users.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=naota@elisp.net \
--cc=slava@dubeyko.com \
--cc=till.kamppeter@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.