From: NeilBrown <neilb@suse.de>
To: "Yucong Sun (叶雨飞)" <sunyucong@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid10 and page cache
Date: Wed, 7 Dec 2011 12:01:33 +1100
Message-ID: <20111207120133.70ca294c@notabene.brown>
In-Reply-To: <CAJygYd3fM+wtnU0HkN7s7o=FPizbPNPET2A21jxWk-BJZgdFbA@mail.gmail.com>

On Tue, 6 Dec 2011 15:13:34 -0800 Yucong Sun (叶雨飞) <sunyucong@gmail.com>
wrote:

> On Tue, Dec 6, 2011 at 2:26 PM, NeilBrown <neilb@suse.de> wrote:
> > On Tue, 6 Dec 2011 14:01:14 -0800 Yucong Sun (叶雨飞) <sunyucong@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I recently set up raid10 on 4 physical disks and have iSCSI serve it
> >> as a block device, and have been trying to tune it for performance.
> >>
> >> The first thing I noticed is that MD seems to rely on the page cache to
> >> flush changes to disk. Is there any way to turn that off so changes are
> >> flushed straight to the disk, like O_FSYNC|O_DIRECT does? The reason I
> >> want to turn it off is to understand the performance difference. I want
> >> to be sure that the page cache is truly acting as a write-back cache. I
> >> know one can tune the dirty_* settings to control the cache flush, but I
> >> want to make sure that it is actually doing what I think it does.
> >
> > Why do you think this?
> >
> > md/raid10 sends all requests straight through to the relevant underlying
> > device(s).
> > Reads are just passed straight down.
> > Writes are duplicated (the request structure, not the data) and queued to a
> > separate thread which does the actual write, but it is fairly direct.
> 
> So I know there's page caching/flushing involved because I watch
> /proc/meminfo and see the Dirty value growing, and after it reaches the
> threshold, Writeback kicks in and writes the data.
> So if, as you said, md does no page flushing, then it must be because
> the iscsi software opens the device without O_DIRECT, so it uses the page
> cache, which in turn flushes data to MD. Now it makes more sense.
> 
> But the md write is not a SYNC write? Meaning that after a write
> call with O_DIRECT to the md device returns, the data may still
> be in flight to the disk? How does having a bitmap play into
> this? Does it work like the ext3 journal? After a power loss, can we
> expect crash-consistent data on the disk?

When you want sync writes, you need to use fsync.

When md writes the superblock or a bitmap page it uses SYNC and FLUSH writes
to ensure they get to the media before the subsequent data write.
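
As a rough illustration of the fsync advice above, here is a minimal
userspace sketch in Python. The helper name write_durably is hypothetical
(it is not part of md or of any iscsi software), and it assumes a Unix
system where os.pwrite is available. It only shows the write-then-fsync
pattern: the call returns once fsync() has reported success.

```python
import os


def write_durably(path: str, data: bytes, offset: int = 0) -> int:
    """Write data at offset, then fsync before returning (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so loop until done.
            written += os.pwrite(fd, data[written:], offset + written)
        # Without this call the data may still be sitting in the page cache.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

The same helper could in principle be pointed at a block device node such
as an md array, but that overwrites whatever is stored there, so try it on
a scratch file first.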


> 
> Another thing to note is that I found the IO size on the MD device is
> always 4K, which is the page size. Is that normal? I just want to make
> sure this isn't bad behavior resulting from the iscsi software.

It is normal in some cases.  It depends a bit on the details of the
underlying device.


NeilBrown

