* Lack of cached bitmap causing degraded performance and occasional hangs
@ 2008-02-20 17:50 Corey Hickey
  2008-02-20 19:13 ` Jeff Mahoney
  2008-02-20 19:38 ` Jeff Mahoney
  0 siblings, 2 replies; 6+ messages in thread
From: Corey Hickey @ 2008-02-20 17:50 UTC
  To: reiserfs-devel

Hello,

Every once in a while one of the hard drives in my RAID-0 array starts
buzzing: seeking rapidly and regularly such that it provides a
continuous tone. The tone is continuous for 0.5-2 seconds before
changing frequency; the sound goes through many such steps over the
course of 5-30 seconds. Meanwhile, my computer is effectively unusable:
programs are starved for I/O, terminals hang, and sometimes X becomes
unresponsive--I can't even move the mouse pointer.

This drove me nuts for a while until I figured out the problem:
reiserfs' bitmap data keeps falling out of the kernel's page cache, and
re-reading the bitmap is very slow.

Dropping the page cache instantly triggers the same behavior.

# echo 1 > /proc/sys/vm/drop_caches
# dd if=/dev/zero of=file bs=1M count=1024

It's quite common for writing a gigabyte to consist of 30 seconds of
reading bitmap data followed by 7 seconds of writing. Sometimes writing
a single byte takes 15 seconds of reading and 0 seconds of writing. :)

I did some tests this evening that appear to confirm my analysis. I
compiled two kernels: one from git immediately before this commit, and
one from after.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5065227b46235ec0131b383cc2f537069b55c6b6

Before:
- filesystem takes a long time to mount (of course)
- no problems thereafter

After:
- filesystem mounts pretty quickly
- the usual buzzing and such


I don't understand why this problem is biting me so badly--I have
several other reiserfs filesystems (on the same computer and on others)
and I can't make any trouble happen with them. Actually, I can always
force the bitmap data to be forgotten by dropping the page cache, but
re-reading it only takes a moment on every other reiserfs I have. For
example, when writing a 1GB file, my 185 GB single-disk filesystem reads
about 600 KB of bitmap data in 1 second; my 932 GB RAID-0 is likely to
read 15 MB in 30 seconds.


I tried gathering information about the bitmaps on the two filesystems
and how quickly they can be read.

# echo 1 > /proc/sys/vm/drop_caches
# time debugreiserfs -m /dev/md0 | wc -l
(and the same thing for /dev/sda4)

Meanwhile, I captured disk read info with dstat to see how many
kilobytes of data were read.

               time      lines     kilobytes
/dev/md0     55.125s     14935       29496
/dev/sda4     9.524s      2987        6680

The ratios of the above data are very close to each other and to the
ratio of the filesystem sizes:

fs size:   932 / 185      = 5.038
time:      55.125 / 9.524 = 5.788
lines:     14935 / 2987   = 5.000
kilobytes: 29496 / 6680   = 4.416


So, then, why does the larger filesystem have to read so much more
bitmap data before writing? As I mentioned before, /dev/md0 reads up to
15 MB before writing, and /dev/sda4 reads only 600 KB.

Thanks,
Corey

* Re: Lack of cached bitmap causing degraded performance and occasional hangs
  2008-02-20 17:50 Lack of cached bitmap causing degraded performance and occasional hangs Corey Hickey
@ 2008-02-20 19:13 ` Jeff Mahoney
  2008-02-20 21:35   ` Corey Hickey
  2008-02-20 19:38 ` Jeff Mahoney
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Mahoney @ 2008-02-20 19:13 UTC
  To: Corey Hickey; +Cc: reiserfs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Corey Hickey wrote:
> I tried gathering information about the bitmaps on the two filesystems
> and how quickly they can be read.
> 
> # echo 1 > /proc/sys/vm/drop_caches
> # time debugreiserfs -m /dev/md0 | wc -l
> (and the same thing for /dev/sda4)
> 
> Meanwhile, I captured disk read info with dstat to see how many
> kilobytes of data were read.
> 
>                time      lines     kilobytes
> /dev/md0     55.125s     14935       29496
> /dev/sda4     9.524s      2987        6680
> 
> The ratios of the above data are very close to each other and to the
> ratio of the filesystem sizes:
> 
> fs size:   932 / 185      = 5.038
> time:      55.125 / 9.524 = 5.788
> lines:     14935 / 2987   = 5.000
> kilobytes: 29496 / 6680   = 4.416

That makes sense. The number of bitmaps is a function of the size of the
file system. There is one bitmap per 128MB of disk, and they're spaced
as-needed, so every 128MB.
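
As a rough cross-check of that rule, the arithmetic can be sketched in
shell. This assumes a 4 KiB block size (not stated in the thread), so that
one bitmap block holds 4096 * 8 bits and covers 128 MiB. The function name
bitmap_stats is made up for illustration, and the inputs are the rounded
932 GB and 185 GB sizes quoted above.

```sh
# bitmap_stats SIZE_GIB [BLOCK_BYTES]
# Prints: <MiB covered per bitmap> <number of bitmaps> <total bitmap KiB>
# Assumption: one bitmap block per group, one bit per filesystem block.
bitmap_stats() (
    size_gib=$1
    block_bytes=${2:-4096}                      # assumed block size
    blocks_per_bitmap=$((block_bytes * 8))      # one bit per block
    span_mib=$((blocks_per_bitmap * block_bytes / 1048576))
    # Round up so a partial trailing group still gets a bitmap.
    bitmaps=$(( (size_gib * 1024 + span_mib - 1) / span_mib ))
    total_kib=$((bitmaps * block_bytes / 1024))
    echo "$span_mib $bitmaps $total_kib"
)

bitmap_stats 932    # prints "128 7456 29824"
bitmap_stats 185    # prints "128 1480 5920"
```

The 29824 KiB estimate for the 932 GB array is close to the 29496 KB that
dstat reported. The 185 GB estimate is somewhat below the measured 6680 KB,
which may include reads other than the bitmaps themselves.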

> So, then, why does the larger filesystem have to read so much more
> bitmap data before writing? As I mentioned before, /dev/md0 reads up to
> 15 MB before writing, and /dev/sda4 reads only 600 KB.

It will only read until it can find the space available. How full is
each of these file systems?

It's certainly strange behavior. I have a 1.2 TB reiserfs file system
that I can't duplicate this behavior with, even after dropping the
caches. It's about 67% full, so finding free space is relatively easy.

Does this happen repeatedly, or just the first time a write occurs? I'd
be surprised if it happened every time, since reiserfs caches how many
free blocks are in each bitmap group the first time the block is read.
The cache is updated when a block is used or freed. If an allocation
can't be met within that group, it's skipped.
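
The effect of that per-group free count can be illustrated with a toy
model. This is only a sketch of the idea described above, not the reiserfs
allocator: scan_reads is a hypothetical helper, each argument after the
first is the number of free blocks in one bitmap group, and "enough free
blocks" is treated as sufficient to satisfy the request.

```sh
# scan_reads NEEDED FREE_COUNT...
# Prints two numbers: bitmap blocks read when no free counts are cached yet
# (every group up to the first usable one must be read), and bitmap blocks
# read when the counts are cached (unusable groups are skipped unread).
scan_reads() (
    needed=$1
    shift
    cold=0
    warm=0
    for free in "$@"; do
        cold=$((cold + 1))              # count unknown: read this bitmap
        if [ "$free" -ge "$needed" ]; then
            warm=1                      # counts known: read only this one
            break
        fi
    done
    echo "$cold $warm"
)

scan_reads 256 0 12 3 0 900 40    # prints "5 1"
```

In this model a nearly full file system is expensive only on the first
scan; once the counts are known, full groups cost nothing to skip.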

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFHvHvnLPWxlyuTD7IRAp0kAKCJqkCWNocayJ7So94RfPhPB6DVzwCePCK/
GOYifjzCgRRptQFs5e5YtD8=
=cLW3
-----END PGP SIGNATURE-----

* Re: Lack of cached bitmap causing degraded performance and occasional hangs
  2008-02-20 17:50 Lack of cached bitmap causing degraded performance and occasional hangs Corey Hickey
  2008-02-20 19:13 ` Jeff Mahoney
@ 2008-02-20 19:38 ` Jeff Mahoney
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Mahoney @ 2008-02-20 19:38 UTC
  To: Corey Hickey; +Cc: reiserfs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFHvIG7LPWxlyuTD7IRAjLgAJsGekcbqlSyMtFpT+iWI8JU1LxUyQCfUeYy
cf/n+R9tlBAMQtp67e2eEnE=
=we6q
-----END PGP SIGNATURE-----

* Re: Lack of cached bitmap causing degraded performance and occasional hangs
  2008-02-20 19:13 ` Jeff Mahoney
@ 2008-02-20 21:35   ` Corey Hickey
  2008-02-20 22:00     ` Jeff Mahoney
  0 siblings, 1 reply; 6+ messages in thread
From: Corey Hickey @ 2008-02-20 21:35 UTC
  To: Jeff Mahoney; +Cc: reiserfs-devel

Jeff Mahoney wrote:
> Corey Hickey wrote:
>> I tried gathering information about the bitmaps on the two filesystems
>> and how quickly they can be read.
> 
>> # echo 1 > /proc/sys/vm/drop_caches
>> # time debugreiserfs -m /dev/md0 | wc -l
>> (and the same thing for /dev/sda4)
> 
>> Meanwhile, I captured disk read info with dstat to see how many
>> kilobytes of data were read.
> 
>>                time      lines     kilobytes
>> /dev/md0     55.125s     14935       29496
>> /dev/sda4     9.524s      2987        6680
> 
>> The ratios of the above data are very close to each other and to the
>> ratio of the filesystem sizes:
> 
>> fs size:   932 / 185      = 5.038
>> time:      55.125 / 9.524 = 5.788
>> lines:     14935 / 2987   = 5.000
>> kilobytes: 29496 / 6680   = 4.416
> 
> That makes sense. The number of bitmaps is a function of the size of the
> file system. There is one bitmap per 128MB of disk, and they're spaced
> as-needed, so every 128MB.

I thought that might be the case. Thanks for clarifying.

>> So, then, why does the larger filesystem have to read so much more
>> bitmap data before writing? As I mentioned before, /dev/md0 reads up to
>> 15 MB before writing, and /dev/sda4 reads only 600 KB.
> 
> It will only read until it can find the space available. How full is
> each of these file systems?

Well, I guess that would explain why so much is read.

/dev/sda4             185G  160G   25G  87% /nazgul
/dev/md0              932G  897G   35G  97% /oliphaunt

They're both pretty full, but it's quite likely that /dev/sda4 has a
large contiguous chunk of free space near the beginning. Most of that FS
is temporary storage for large files (many GB).

Unfortunately, I can't test cleaning out /dev/md0 right now--one of the
disks in my backup array started dying yesterday and I won't have a
replacement for a couple days.

I tried temporarily filling up /dev/sda4 to 98%, but I still wasn't able
to reproduce the problem there.

> It's certainly strange behavior. I have a 1.2 TB reiserfs file system
> that I can't duplicate this behavior with, even after dropping the
> caches. It's about 67% full, so finding free space is relatively easy.

What happens if you fill up the filesystem? I suppose the problem might
have something to do with the ratio between FS size and RAM size. I have
1 GB.

Once I get my replacement drive I'll be able to make a 1.2 TB array and
test it on a system with 640 MB of RAM.

> Does this happen repeatedly, or just the first time a write occurs? I'd
> be surprised if it happened every time, since reiserfs caches how many
> free blocks are in each bitmap group the first time the block is read.
> The cache is updated when a block is used or freed. If an allocation
> can't be met within that group, it's skipped.

Does dropping the page cache make reiserfs forget how many free blocks
are in the bitmap groups, or is that cached separately? I can always
make the problem occur after dropping the page cache.

If I drop the page cache, and then start writing repeatedly, as in:
-----------------------------------------------------
echo 1 > /proc/sys/vm/drop_caches
while true ; do
    dd if=/dev/zero of=file bs=1M count=1024 2>&1 | \
        grep copied | cut -d' ' -f6-
done
-----------------------------------------------------

...then I get the following results:
47.7652 s, 22.5 MB/s
34.7170 s, 30.9 MB/s
34.3364 s, 31.3 MB/s
35.0858 s, 30.6 MB/s
34.2207 s, 31.4 MB/s
34.4387 s, 31.2 MB/s
34.1648 s, 31.4 MB/s
34.6974 s, 30.9 MB/s
33.8431 s, 31.7 MB/s
35.1522 s, 30.5 MB/s


If, instead of dropping the page cache, I trick the kernel into caching
the bitmap with "debugreiserfs -m /dev/md0 &>/dev/null":
7.53645 s, 142 MB/s
8.17551 s, 131 MB/s
9.20222 s, 117 MB/s
7.12582 s, 151 MB/s
7.35693 s, 146 MB/s
6.98245 s, 154 MB/s
7.85886 s, 137 MB/s
7.96864 s, 135 MB/s
7.82978 s, 137 MB/s
7.84058 s, 137 MB/s


I don't know why the writing speeds are staying so consistently low in
the first test. Yesterday I ran pretty much the same thing and saw the
write speeds climb back up to around 140 MB/s over the course of five or
six runs; today I repeated the test several times and saw the same
results as I pasted above. I guess the kernel is preferring to cache the
1 GB file it just wrote. If I drop caches and write a 512 MB file
repeatedly, the results are nicer:

40.0924 s, 13.4 MB/s
3.78939 s, 142 MB/s
3.17951 s, 169 MB/s
3.33849 s, 161 MB/s
3.77553 s, 142 MB/s
3.78852 s, 142 MB/s
2.92377 s, 184 MB/s
3.38227 s, 159 MB/s
3.71573 s, 144 MB/s



This wasn't under any particular memory starvation.

$ free
            total       used       free     shared    buffers     cached
Mem:      1023336     291284     732052          0      48936      30300
-/+ buffers/cache:    212048     811288
Swap:     1004052      12000     992052



Thank you very much for your reply, by the way. I was hoping you would. :)

-Corey

* Re: Lack of cached bitmap causing degraded performance and occasional hangs
  2008-02-20 21:35   ` Corey Hickey
@ 2008-02-20 22:00     ` Jeff Mahoney
  2008-02-20 23:44       ` Corey Hickey
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Mahoney @ 2008-02-20 22:00 UTC
  To: Corey Hickey; +Cc: reiserfs-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Corey Hickey wrote:
> Does dropping the page cache make reiserfs forget how many free blocks
> are in the bitmap groups, or is that cached separately? I can always
> make the problem occur after dropping the page cache.

That's cached separately. What version of the kernel are you using?
There was an issue a while ago where file systems over 90% full would
run into huge performance problems because the allocator would always
try to find a free "window" of the size requested. This would cause it
to loop over the entire file system, and then step back and take
whatever it could find. We fixed that a while ago, though.

Caching all the bitmaps in memory for your larger file system would take
30 MB. The pattern of looping over them and back is not a good case for
an LRU list, since it loops over all of them and starts from the
beginning again. What did the memory footprint look like before you
dropped the caches?
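
The mismatch between a cyclic scan and LRU eviction can be illustrated with
a toy simulation. This is only a sketch in POSIX sh, not the kernel's page
cache logic: lru_hits is a hypothetical helper, block numbers stand in for
bitmap blocks, and the capacity is assumed to be at least 1.

```sh
# lru_hits NBLOCKS CAPACITY PASSES
# Scans blocks 0..NBLOCKS-1 in order, PASSES times, through an LRU cache
# that holds CAPACITY blocks. Prints the number of cache hits.
lru_hits() (
    nblocks=$1 capacity=$2 passes=$3
    set --                      # positional parameters hold the cache, LRU first
    hits=0
    pass=0
    while [ "$pass" -lt "$passes" ]; do
        blk=0
        while [ "$blk" -lt "$nblocks" ]; do
            found=0
            rest=""
            for entry in "$@"; do
                if [ "$entry" -eq "$blk" ]; then
                    found=1
                else
                    rest="$rest $entry"
                fi
            done
            set -- $rest        # cache without the block just touched
            if [ "$found" -eq 1 ]; then
                hits=$((hits + 1))
            elif [ "$#" -ge "$capacity" ]; then
                shift           # evict the least recently used block
            fi
            set -- "$@" "$blk"  # touched block becomes most recently used
            blk=$((blk + 1))
        done
        pass=$((pass + 1))
    done
    echo "$hits"
)

lru_hits 8 4 3    # prints 0: each block is evicted before it is needed again
lru_hits 4 4 3    # prints 8: everything fits, so passes 2 and 3 always hit
```

When the scan is even slightly larger than the cache, LRU evicts exactly
the block that will be wanted next, so repeated passes never get faster.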


> If I drop the page cache, and then start writing repeatedly, as in:
> -----------------------------------------------------
> echo 1 > /proc/sys/vm/drop_caches
> while true ; do
>     dd if=/dev/zero of=file bs=1M count=1024 2>&1 | \
>         grep copied | cut -d' ' -f6-
> done
> -----------------------------------------------------
> 
> ...then I get the following results:
> 47.7652 s, 22.5 MB/s

... and now we've cached a bit ...

> 34.7170 s, 30.9 MB/s
> 34.3364 s, 31.3 MB/s
> 35.0858 s, 30.6 MB/s
> 34.2207 s, 31.4 MB/s
> 34.4387 s, 31.2 MB/s
> 34.1648 s, 31.4 MB/s
> 34.6974 s, 30.9 MB/s
> 33.8431 s, 31.7 MB/s
> 35.1522 s, 30.5 MB/s

> If, instead of dropping the page cache, I trick the kernel into caching
> the bitmap with "debugreiserfs -m /dev/md0 &>/dev/null":
> 7.53645 s, 142 MB/s
> 8.17551 s, 131 MB/s
> 9.20222 s, 117 MB/s
> 7.12582 s, 151 MB/s
> 7.35693 s, 146 MB/s
> 6.98245 s, 154 MB/s
> 7.85886 s, 137 MB/s
> 7.96864 s, 135 MB/s
> 7.82978 s, 137 MB/s
> 7.84058 s, 137 MB/s

Yep, touching those blocks would delay those getting dropped.


> I don't know why the writing speeds are staying so consistently low in
> the first test. Yesterday I ran pretty much the same thing and saw the
> write speeds climb back up to around 140 MB/s over the course of five or
> six runs; today I repeated the test several times and saw the same
> results as I pasted above. I guess the kernel is preferring to cache the
> 1 GB file it just wrote. If I drop caches and write a 512 MB file
> repeatedly, the results are nicer:
> 
> 40.0924 s, 13.4 MB/s

.. and again, we've cached a bit ...

> 3.78939 s, 142 MB/s
> 3.17951 s, 169 MB/s
> 3.33849 s, 161 MB/s
> 3.77553 s, 142 MB/s
> 3.78852 s, 142 MB/s
> 2.92377 s, 184 MB/s
> 3.38227 s, 159 MB/s
> 3.71573 s, 144 MB/s

Your analysis is probably right: Writing the 1 GB file is forcing the
bitmaps out of the cache. Writing a 512MB file ends up not causing
memory pressure, so nothing is forced out. Your original report
mentioned that you could see measurable delays with 1 MB transferred or
even just one byte. Was that while your system was running at normal
load with a bit of memory pressure?

I think right now the most important question is which kernel version
you're running.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4-svn0 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFHvKMHLPWxlyuTD7IRAkLkAJ95UlfvkCMOBVsksDlV+jlK8vO7/ACfVr2h
U+DjYplVdcjXFQJzs37cmck=
=rYPO
-----END PGP SIGNATURE-----

* Re: Lack of cached bitmap causing degraded performance and occasional hangs
  2008-02-20 22:00     ` Jeff Mahoney
@ 2008-02-20 23:44       ` Corey Hickey
  0 siblings, 0 replies; 6+ messages in thread
From: Corey Hickey @ 2008-02-20 23:44 UTC
  To: Jeff Mahoney; +Cc: reiserfs-devel

Jeff Mahoney wrote:
> Corey Hickey wrote:
>> Does dropping the page cache make reiserfs forget how many free blocks
>> are in the bitmap groups, or is that cached separately? I can always
>> make the problem occur after dropping the page cache.
> 
> That's cached separately. What version of the kernel are you using?

2.6.24.2. I've also seen what appeared to be the same problem in
- 2.6.24
- 2.6.23.1
- 2.6.21

...ever since I made this array and copied files to it from backup.

> There was an issue a while ago where file systems over 90% full would
> run into huge performance problems because the allocator would always
> try to find a free "window" of the size requested. This would cause it
> to loop over the entire file system, and then step back and take
> whatever it could find. We fixed that a while ago, though.

If you think there's any use in my testing it, I can try to clean house
and move files off the array to down below 90%. I'll start cleaning
after I send this (I ought to anyway); let me know if I should try to
get below 90%, though.

Still, I'm not seeing any issues when I fill up /dev/sda4 (on the same
machine) to 98%.

> Caching all the bitmaps in memory for your larger file system would take
> 30 MB. The pattern of looping over them and back is not a good case for
> an LRU list, since it loops over all of them and starts from the
> beginning again. What did the memory footprint look like before you
> dropped the caches?

For the report I gave earlier, I had closed a few memory hogs to see if
more free memory would alleviate the problem. Here's a more typical
report for free memory:
- after normal usage
- after dropping page cache
- after reading bitmaps
- after dropping page cache again

$ free
           total       used       free     shared    buffers     cached
Mem:     1023336    1004704      18632          0       8428     639680
-/+ buffers/cache:   356596     666740
Swap:    1004052      12000     992052

# echo 1 > /proc/sys/vm/drop_caches

$ free
           total       used       free     shared    buffers     cached
Mem:     1023336     419740     603596          0       3884      60072
-/+ buffers/cache:   355784     667552
Swap:    1004052      12000     992052

# debugreiserfs -m /dev/md0 &>/dev/null

$ free
           total       used       free     shared    buffers     cached
Mem:     1023336     456384     566952          0      33436      60296
-/+ buffers/cache:   362652     660684
Swap:    1004052      12000     992052

# echo 1 > /proc/sys/vm/drop_caches

$ free
           total       used       free     shared    buffers     cached
Mem:     1023336     419736     603600          0       3812      60056
-/+ buffers/cache:   355868     667468
Swap:    1004052      12000     992052
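
The change in the "buffers" column across the debugreiserfs run can be
computed directly. The sketch below uses the figures from the free output
above; buffers_kib and kib_delta are hypothetical helper names, and reading
/proc/meminfo assumes a Linux system.

```sh
# Print the "Buffers:" value (in KiB) from a meminfo-style file.
# Defaults to /proc/meminfo; a file argument is accepted for testing.
buffers_kib() {
    awk '$1 == "Buffers:" { print $2 }' "${1:-/proc/meminfo}"
}

# kib_delta BEFORE AFTER: growth between two readings, in KiB.
kib_delta() {
    echo $(( $2 - $1 ))
}

# Figures from the free output: before and after "debugreiserfs -m".
kib_delta 3884 33436    # prints 29552

# Live use (assumed workflow; the debugreiserfs call needs root):
#   before=$(buffers_kib)
#   debugreiserfs -m /dev/md0 >/dev/null 2>&1
#   kib_delta "$before" "$(buffers_kib)"
```

The 29552 KiB increase is close to both the 29496 KB that dstat measured
and the 30 MB estimate for holding every bitmap in memory.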

> Your analysis is probably right: Writing the 1 GB file is forcing the
> bitmaps out of the cache. Writing a 512MB file ends up not causing
> memory pressure, so nothing is forced out. Your original report
> mentioned that you could see measurable delays with 1 MB transferred or
> even just one byte. Was that while your system was running at normal
> load with a bit of memory pressure?

I was referring to seeing a delay after dropping the page cache, such as:

# echo 1 > /proc/sys/vm/drop_caches
# dd if=/dev/zero of=file bs=1c count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 7.72591 s, 0.0 kB/s

I'm not sure what to make of that; it would surprise me if there are
really so few "holes" toward the beginning of the filesystem that it
ought to take that long to find room for such a small file. I'm just
speculating, though....

As for when the problem crops up on its own, I often see it when the
system is under an I/O load (or was recently): for example, copying a
large file, compiling a program, watching a movie, or doing something
with git. That would seem consistent with the kernel dropping bitmap
data from cache in favor of files recently read/written. Being under
memory pressure might make the problem more likely, but it isn't
strictly necessary.

-Corey
