From: Corey Hickey <bugfood-ml@fatooh.org>
To: lrhorer@satx.rr.com
Cc: reiserfs-devel@vger.kernel.org
Subject: Re: Problem with reiserfs volume
Date: Tue, 05 May 2009 16:40:37 -0700
Message-ID: <4A00CE75.4020109@fatooh.org>
In-Reply-To: <20090505084353762.EEUS22077@cdptpa-omta02.mail.rr.com>

Leslie Rhorer wrote:
>>>>>> It would always be the same 5 drives which dropped to zero
>>>>>> and the same 5 which still reported some reads going on.
>>>> I did the math and (if a couple reasonable assumptions I made are
>>>> correct), then the reiserfs bitmaps would indeed be distributed among
>>>> five of 10 drives in a RAID-6.
>>>>
>>>> If you're interested, ask, and I'll write it up.
>>> It's academic, but I'm curious.  Why would the default parameters have
>>> failed?
>> It's not exactly a "failure"--it's just that the bitmaps are placed
>> every 128 MB, and that results in a certain distribution among your disks.
> 
> This triggered a thought.  When I built the array, it was physically in a
> temporary configuration, so that while /dev/sda was drive 0 in the array
> and /dev/sdj was drive 9 in the array when it was built, the drives were
> moved in a piecemeal fashion to the new chassis, so that the order was
> something like /dev/sdf, /dev/sdg, /dev/sdh, /dev/sdi, /dev/sdj, /dev/sda,
> /dev/sde, /dev/sdd, /dev/sdc, /dev/sdb, or something like that.  This
> shouldn't create a problem, as md handles RAID assembly based upon the drive
> superblock, not the udev assignment.  Is it possible the re-arrangement
> caused a failure of the bitmap somehow?

That should be fine.

I might not have been clear on this before: reading the bitmap data is
slow because it is distributed every 128 MB across the filesystem; this
means that in order to read lots of bitmaps, the disk spends most of its
time seeking rather than reading. For me, that's what was causing the
disk to "buzz", and that's why dstat showed read rates of only 400-600
KB/sec.

I just ran a quick test on my single-disk reiserfs and calculated the
average seek rate:

fs_size = 242341144 KB
bitmap_spacing = 128 MB = 131072 KB
num_bitmaps = fs_size / bitmap_spacing = 1849
bitmaps_read_time = 15.5 sec   (from debugreiserfs -m)
bitmap_read_rate = num_bitmaps / bitmaps_read_time = 119 bitmaps/sec
seek_rate = bitmap_read_rate = 119 seeks/sec  (seek to every bitmap)

That's a lot of seeking!
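
The arithmetic above can be reproduced with a short script. This is only
a sketch: the variable names are mine, the per-seek time is derived from
the same numbers, and it assumes (as above) one seek per bitmap.

```python
import math

# Figures taken from the test above.
fs_size_kb = 242341144            # filesystem size in KB
bitmap_spacing_kb = 128 * 1024    # one bitmap every 128 MB
bitmaps_read_time_s = 15.5        # wall time of debugreiserfs -m

# A partially filled final 128 MB region still has a bitmap: round up.
num_bitmaps = math.ceil(fs_size_kb / bitmap_spacing_kb)

# Assumption: every bitmap read costs exactly one seek.
seek_rate = num_bitmaps / bitmaps_read_time_s
ms_per_seek = 1000.0 / seek_rate

print(f"num_bitmaps = {num_bitmaps}")
print(f"seek_rate   = {seek_rate:.0f} seeks/sec")
print(f"ms_per_seek = {ms_per_seek:.1f} ms")
```

That works out to a bit over 8 ms per bitmap, which is in the range of
an ordinary seek plus rotational delay on a desktop disk.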

Having the bitmaps spread out among several disks of a RAID probably
wouldn't help. Reiserfs doesn't try to read the bitmaps in parallel;
that would be bad unless it knew the RAID layout. So, each disk would
just be idle when it wasn't its turn to seek and read another bitmap.
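
As for why only five of the ten drives see bitmap reads at all, here is
a rough sketch of the idea. Everything in it is an assumption rather
than something measured on your array: a 64 KB chunk size, a simplified
left-symmetric style parity rotation (not md's actual code), and bitmaps
sitting at exact multiples of 128 MB from the start of the array.

```python
DISKS = 10                        # drives in the RAID-6
PARITY = 2                        # P and Q chunks per stripe
CHUNK_KB = 64                     # assumed chunk size
BITMAP_SPACING_KB = 128 * 1024    # one bitmap every 128 MB
NUM_BITMAPS = 1849                # arbitrary count for illustration

def data_disk(offset_kb):
    """Map an array offset to a disk index in a simplified model."""
    data_disks = DISKS - PARITY
    chunk = offset_kb // CHUNK_KB
    stripe, idx = divmod(chunk, data_disks)
    # Parity rotates backwards by one disk per stripe; data chunks
    # follow the two parity chunks and wrap around.
    p_disk = DISKS - 1 - (stripe % DISKS)
    return (p_disk + PARITY + idx) % DISKS

disks_hit = sorted({data_disk(k * BITMAP_SPACING_KB)
                    for k in range(NUM_BITMAPS)})
print(f"{len(disks_hit)} of {DISKS} disks hold bitmaps: {disks_hit}")
```

Under these assumptions each bitmap starts a stripe, and consecutive
bitmaps are 256 stripes apart. Since 256 and 10 share a factor of 2, the
rotation only ever lands on half of the disks. A different chunk size or
layout would change the result.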


Remember how in the old days (before 2.6.19, I think) large reiserfs
filesystems took forever to mount? That's because reiserfs was reading
all the bitmap data and caching it internally. Eventually Jeff Mahoney
wrote a patch to make reiserfs read bitmap data on demand and just let
the kernel cache it (or not).

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5065227b46235ec0131b383cc2f537069b55c6b6

> It still doesn't quite explain to me how a high read rate strictly at the
> drive level (e.g. ckarray) causes severe problems at the FS level, while an
> idle system did not exhibit nearly the frequency of problems nor did the
> hang last even a fraction as long (40 seconds vs. 20 minutes).

20 minutes sounds excessive, even when competing with a resync. I can't
explain that difference, and I can't test it here.

>>>>>> During a RAID resync, almost every file create causes a halt.
>>>> Perhaps because the resync I/O caused the bitmap data to fall off the
>>>> page cache.
>>> How would that happen?  More to the point, how would it happen without
>>> triggering activity in the FS?
>> That was sort of a speculative statement, and I can't really back it up
>> because I don't know the details of how the page cache fits in, but IF
>> the data read and written during a resync gets cached, then the page
>> cache might prefer to retain that data rather than the bitmap data.
>>
>> If the bitmap data never stays in the page cache for long, then a file
>> write would pretty much always require some bitmaps to be re-read.
> 
> Except this happened without any file writes or reads other than the file
> creation itself and with no disk activity other than the array re-sync.

I remember even 0-byte files taking a long time to write. My guess would
be that reiserfs doesn't know the file will end up being empty when the
file is created, or perhaps it tries to find some contiguous space
anyway so the file can be appended to without excessive fragmentation.

In order to find contiguous space, reiserfs needs to look at the
bitmaps; if enough bitmap data isn't cached, reiserfs will have to read
some, which, as we know, can take a long time.

-Corey