From: linbloke <linbloke@fastmail.fm>
To: John Adams <john.adams@avid.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: What are these reads in what should be simply a full-stripe write?
Date: Wed, 29 Feb 2012 10:14:49 +1100
Message-ID: <4F4D5FE9.7090301@fastmail.fm>
In-Reply-To: <FC0D60DF-6357-495C-B10A-1C727AAFCBE3@avid.com>


On 29/02/12 5:47 AM, John Adams wrote:
> For some years I've been working on some niche filesystems which serve
> workflows involving lots of video.  Lately, I have had occasion to
> investigate the behavior of md as a possible raid solution (2.6.32
> kernel).
>
> As part of that, we looked at some fio based loads in the buffered and
> O_DIRECT cases and noticed some reading that we didn't understand when
> using O_DIRECT.  We were led to this comparison by incorrect
> information from a vendor. (We were trying to repro some reported
> performance and were initially told that O_DIRECT had been used).
>
> We are aware of the problems discussed concerning O_DIRECT.  As fs
> guys, we're accustomed to worrying about copies and such, so it wasn't
> immediately obvious to us that O_DIRECT would be a mistake in our
> case.  This is essentially an embedded system with a single process
> owning a group of disks with no filesystem.  There is no possibility
> of a race with another process.
>
> Anyway, I am curious about this reading behavior and I would be
> grateful for any comments.
>
> I tried writing single stripes under both scenarios.  To give the
> barest possible summary, I used a dd command like the one below, with
> and without oflag=direct.  This was driven from a script that sets up
> some blktrace and ftrace things, waits an appropriate time in the
> buffered case, etc.
>
> dd oflag=direct if=/dev/zero of=/dev/md0 seek=0 bs=1M count=1
>
> 8+2 128k strip
>
> [physical disk completions via blkparse]
>
> Buffered:
>
>   Reads Completed:        2,        5KiB  Writes Completed:        4,      258KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        2,        8KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        6,       24KiB  Writes Completed:        4,      258KiB
>
> Direct Example 1:
>
>   Reads Completed:        2,        5KiB  Writes Completed:       20,      258KiB
>   Reads Completed:        9,       36KiB  Writes Completed:       14,      130KiB
>   Reads Completed:       32,      128KiB  Writes Completed:       14,      130KiB
>   Reads Completed:        1,        4KiB  Writes Completed:       16,      130KiB
>   Reads Completed:       32,      128KiB  Writes Completed:       12,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        8,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        8,      130KiB
>   Reads Completed:        0,        0KiB  Writes Completed:        8,      130KiB
>   Reads Completed:        2,        8KiB  Writes Completed:        8,      130KiB
>   Reads Completed:        6,       24KiB  Writes Completed:       19,      258KiB
>
> Direct Example 2:
>
>   Reads Completed:        4,      133KiB  Writes Completed:        3,      130KiB
>   Reads Completed:       11,      164KiB  Writes Completed:        3,      130KiB
>   Reads Completed:       34,      256KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        2,      132KiB  Writes Completed:        3,      130KiB
>   Reads Completed:       33,      256KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        3,      136KiB  Writes Completed:        3,      130KiB
>   Reads Completed:        7,      152KiB  Writes Completed:        3,      130KiB
>
>
> I was able to gain a little bit of insight through blktrace and
> ftrace.  Our initial assumption was that maybe things were being
> broken up differently such that md thought it needed to do an rmw.
>
> But as I dug into the blktrace output, that did not seem to be the
> case (reads are coming after what is obviously the strip write).  I
> used ftrace to show me the path down to md_make_request in the
> O_DIRECT and buffered cases.  This showed me some calls referring to
> read_ahead in the direct case.
>
>             <...>-14859 [001] 510340.525310: md_make_request
>             <...>-14859 [001] 510340.525311:<stack trace>
>   =>  generic_make_request
>   =>  submit_bio
>   =>  submit_bh
>   =>  block_read_full_page
>   =>  blkdev_readpage
>   =>  __do_page_cache_readahead
>   =>  force_page_cache_readahead
>   =>  page_cache_sync_readahead
>
> So is this read ahead I'm observing?  Why does it occur only in the
> direct case?
>
> I noticed that blktrace sometimes identifies what I assume to be the
> instigator of the io.  So I can sometimes see dd or md_raid6 there.
> As in [dd] or [md0_raid6]:
>
>   8,16   1      115     0.042000000  2910  D   W 2256 + 48 [md0_raid6]
>
> These unexplained reads either mention blkid or [0] or [(null)].
>
> It isn't clear to me whether the unexpected read behavior is due to a
> tuning problem in the O_DIRECT case or simply the way things work.
>
> Thank you for any comments.

G'day John,

You need to give us more detail about your md raid setup. Besides a
reference to md_raid6, there are no other details about your array.
How about sending:

mdadm -V
uname -a
mdadm -Dvv /dev/mdarray
mdadm -Evv /dev/arraycomponentdevices   (for every component device)
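
If it helps, here is a minimal sketch of gathering all of that in one go.
The helper name md_report_cmds is made up for illustration, /dev/md0 is
taken from your dd command, and /dev/sdb and /dev/sdc are placeholder
member names, so substitute your real component devices. It only prints
the commands; nothing is run until you pipe the output to a shell.

```shell
#!/bin/sh
# Sketch only: print the diagnostic commands for one md array.
# md_report_cmds is a hypothetical helper; the device names passed to it
# below are assumptions, not details taken from the original report.
md_report_cmds() {
    array=$1
    shift
    echo "mdadm -V"
    echo "uname -a"
    echo "mdadm -Dvv $array"
    # One examine command per component device given on the command line.
    for dev in "$@"; do
        echo "mdadm -Evv $dev"
    done
}

# Dry run: list what would be executed for an array with two members.
md_report_cmds /dev/md0 /dev/sdb /dev/sdc
```

To actually collect the output, run it as root and pipe the printed
commands through sh, redirecting both stdout and stderr to a file that
you can attach to your reply.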


Good luck in the hunt,
J




Thread overview: 4+ messages
2012-02-28 18:47 What are these reads in what should be simply a full-stripe write? John Adams
2012-02-28 23:14 ` linbloke [this message]
2012-03-05 16:29   ` John Adams
2012-02-29  5:52 ` Doug Dumitru