From: Corey Hickey <bugfood-ml@fatooh.org>
To: Jeff Mahoney <jeffm@suse.com>
Cc: reiserfs-devel@vger.kernel.org
Subject: Re: Lack of cached bitmap causing degraded performance and occasional hangs
Date: Wed, 20 Feb 2008 13:35:11 -0800
Message-ID: <47BC9D0F.4060302@fatooh.org>
In-Reply-To: <47BC7BE7.9010400@suse.com>

Jeff Mahoney wrote:
> Corey Hickey wrote:
>> Hello,
> 
>> Every once in a while one of the hard drives in my RAID-0 array starts
>> buzzing: seeking rapidly and regularly such that it provides a
>> continuous tone. The tone is continuous for 0.5-2 seconds before
>> changing frequency; the sound goes through many such steps over the
>> course of 5-30 seconds. Meanwhile, my computer is effectively unusable:
>> programs are starved for I/O, terminals hang, and sometimes X becomes
>> unresponsive--I can't even move the mouse pointer.
> 
>> This drove me nuts for a while until I figured out the problem:
>> reiserfs' bitmap data keeps falling out of the kernel's page cache, and
>> re-reading the bitmap is very slow.
> 
>> Dropping the page cache instantly triggers the same behavior.
> 
>> # echo 1 > /proc/sys/vm/drop_caches
>> # dd if=/dev/zero of=file bs=1M count=1024
> 
>> It's quite common for writing a gigabyte to consist of 30 seconds of
>> reading bitmap data followed by 7 seconds of writing. Sometimes writing
>> a single byte takes 15 seconds of reading and 0 seconds of writing. :)
> 
>> I did some tests this evening that appear to confirm my analysis. I
>> compiled two kernels: one from git immediately before this commit, and
>> one from after.
> 
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5065227b46235ec0131b383cc2f537069b55c6b6
> 
>> Before:
>> - filesystem takes a long time to mount (of course)
>> - no problems thereafter
> 
>> After:
>> - filesystem mounts pretty quickly
>> - the usual buzzing and such
> 
> 
>> I don't understand why this problem is biting me so badly--I have
>> several other reiserfs filesystems (on the same computer and on others)
>> and I can't make any trouble happen with them. Actually, I can always
>> force the bitmap data to be forgotten by dropping the page cache, but
>> re-reading it only takes a moment on every other reiserfs I have. For
>> example, when writing a 1GB file, my 185 GB single-disk filesystem reads
>> about 600 KB of bitmap data in 1 second; my 932 GB RAID-0 is likely to
>> read 15 MB in 30 seconds.
> 
> 
>> I tried gathering information about the bitmaps on the two filesystems
>> and how quickly they can be read.
> 
>> # echo 1 > /proc/sys/vm/drop_caches
>> # time debugreiserfs -m /dev/md0 | wc -l
>> (and the same thing for /dev/sda4)
> 
>> Meanwhile, I captured disk read info with dstat to see how many
>> kilobytes of data were read.
> 
>>                time      lines     kilobytes
>> /dev/md0     55.125s     14935       29496
>> /dev/sda4     9.524s      2987        6680
> 
>> The ratios of the above data are very close to each other and to the
>> ratio of the filesystem sizes:
> 
>> fs size:   932 / 185      = 5.038
>> time:      55.125 / 9.524 = 5.788
>> lines:     14935 / 2987   = 5.000
>> kilobytes: 29496 / 6680   = 4.416
> 
> That makes sense. The number of bitmaps is a function of the size of the
> file system. There is one bitmap per 128MB of disk, and they're spaced
> as-needed, so every 128MB.

I thought that might be the case. Thanks for clarifying.
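
As a rough sanity check on the figures above, the expected number of bitmaps
can be estimated from the filesystem size. This is only a sketch: the 4 KiB
block size is an assumption (with 4 KiB blocks, one bitmap block has 32768
bits and so covers 128 MiB), sizes are treated as GiB, and bitmap_estimate is
a made-up helper name, not a reiserfs tool.

```shell
#!/bin/sh
# Rough estimate of reiserfs bitmap overhead for a given filesystem size.
# Assumptions (not from the thread): 4 KiB blocks, one bitmap block per
# group, and "GB" treated as GiB.
bitmap_estimate() {
    size_gb=$1
    group_mb=128   # one bitmap per 128 MB of disk, as stated above
    block_kb=4     # assumed filesystem block size in KiB
    # Round up so a partial trailing group still gets a bitmap.
    bitmaps=$(( (size_gb * 1024 + group_mb - 1) / group_mb ))
    # Output: "<number of bitmaps> <KiB of bitmap data>"
    echo "$bitmaps $(( bitmaps * block_kb ))"
}

bitmap_estimate 932   # /dev/md0
bitmap_estimate 185   # /dev/sda4
```

Under those assumptions this gives 7456 bitmaps (29824 KiB) for the 932 GB
array and 1480 bitmaps (5920 KiB) for the 185 GB disk, which is in the same
range as the 29496 and 6680 kilobytes that dstat reported.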

>> So, then, why does the larger filesystem have to read so much more
>> bitmap data before writing? As I mentioned before, /dev/md0 reads up to
>> 15 MB before writing, and /dev/sda4 reads only 600 KB.
> 
> It will only read until it can find the space available. How full are
> each of these file systems?

Well, I guess that would explain why so much is read.

/dev/sda4             185G  160G   25G  87% /nazgul
/dev/md0              932G  897G   35G  97% /oliphaunt

They're both pretty full, but it's quite likely that /dev/sda4 has a
large contiguous chunk of free space near the beginning. Most of that FS
is temporary storage for large files (many GB).

Unfortunately, I can't test cleaning out /dev/md0 right now--one of the
disks in my backup array started dying yesterday and I won't have a
replacement for a couple days.

I tried temporarily filling up /dev/sda4 to 98%, but I still wasn't able
to reproduce the problem there.

> It's certainly strange behavior. I have a 1.2 TB reiserfs file system
> that I can't duplicate this behavior with, even after dropping the
> caches. It's about 67% full, so finding free space is relatively easy.

What happens if you fill up the filesystem? I suppose the problem might
have something to do with the ratio between FS size and RAM size. I have
1 GB of RAM.

Once I get my replacement drive I'll be able to make a 1.2 TB array and
test it on a system with 640 MB of RAM.

> Does this happen repeatedly, or just the first time a write occurs? I'd
> be surprised if it happened every time, since reiserfs caches how many
> free blocks are in each bitmap group the first time the block is read.
> The cache is updated when a block is used or freed. If an allocation
> can't be met within that group, it's skipped.

Does dropping the page cache make reiserfs forget how many free blocks
are in the bitmap groups, or is that cached separately? I can always
make the problem occur after dropping the page cache.

If I drop the page cache, and then start writing repeatedly, as in:
-----------------------------------------------------
echo 1 > /proc/sys/vm/drop_caches
while true ; do
    dd if=/dev/zero of=file bs=1M count=1024 2>&1 | \
        grep copied | cut -d' ' -f6-
done
-----------------------------------------------------

...then I get the following results:
47.7652 s, 22.5 MB/s
34.7170 s, 30.9 MB/s
34.3364 s, 31.3 MB/s
35.0858 s, 30.6 MB/s
34.2207 s, 31.4 MB/s
34.4387 s, 31.2 MB/s
34.1648 s, 31.4 MB/s
34.6974 s, 30.9 MB/s
33.8431 s, 31.7 MB/s
35.1522 s, 30.5 MB/s


If, instead of dropping the page cache, I trick the kernel into caching
the bitmap with "debugreiserfs -m /dev/md0 &>/dev/null":
7.53645 s, 142 MB/s
8.17551 s, 131 MB/s
9.20222 s, 117 MB/s
7.12582 s, 151 MB/s
7.35693 s, 146 MB/s
6.98245 s, 154 MB/s
7.85886 s, 137 MB/s
7.96864 s, 135 MB/s
7.82978 s, 137 MB/s
7.84058 s, 137 MB/s
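
To compare the two runs by more than eyeballing, the elapsed times can be
averaged. A minimal sketch, assuming input lines in the "SECONDS s, RATE MB/s"
form shown above; mean_seconds is a hypothetical helper name, not part of dd
or any reiserfs tool.

```shell
#!/bin/sh
# mean_seconds: read lines like "34.7170 s, 30.9 MB/s" on stdin and print
# the mean of the first field (elapsed seconds) to two decimal places.
# Lines whose first field is not a positive number are ignored.
mean_seconds() {
    awk '$1 + 0 > 0 { sum += $1; n++ } END { if (n) printf "%.2f\n", sum / n }'
}

# Example: the first three timings from the uncached run.
printf '%s\n' \
    '47.7652 s, 22.5 MB/s' \
    '34.7170 s, 30.9 MB/s' \
    '34.3364 s, 31.3 MB/s' | mean_seconds
```

The same function can be fed the full output of the dd loop, one run per
line, once for the uncached case and once for the pre-cached case.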


I don't know why the writing speeds are staying so consistently low in
the first test. Yesterday I ran pretty much the same thing and saw the
write speeds climb back up to around 140 MB/s over the course of five or
six runs; today I repeated the test several times and saw the same
results as I pasted above. I guess the kernel is preferring to cache the
1 GB file it just wrote. If I drop caches and write a 512 MB file
repeatedly, the results are nicer:

40.0924 s, 13.4 MB/s
3.78939 s, 142 MB/s
3.17951 s, 169 MB/s
3.33849 s, 161 MB/s
3.77553 s, 142 MB/s
3.78852 s, 142 MB/s
2.92377 s, 184 MB/s
3.38227 s, 159 MB/s
3.71573 s, 144 MB/s



The system wasn't under any particular memory pressure during these tests:

$ free
            total       used       free     shared    buffers     cached
Mem:      1023336     291284     732052          0      48936      30300
-/+ buffers/cache     212048     811288
Swap:     1004052      12000     992052



Thank you very much for your reply, by the way. I was hoping you would. :)

-Corey
