From: troby <Thorn.Roby@harlandfs.com>
To: xfs@oss.sgi.com
Subject: Re: How to deal with XFS stripe geometry mismatch with hardware RAID5
Date: Wed, 14 Mar 2012 16:21:04 -0700 (PDT)
Message-ID: <33506375.post@talk.nabble.com>
In-Reply-To: <20120314210514.GA46448@nsrc.org>




Brian Candler wrote:
> 
> On Wed, Mar 14, 2012 at 10:43:44AM -0700, troby wrote:
>> Mongo pre-allocates its datafiles and zero-fills them (there is a short
>> header at the start of each, not rewritten as far as I know) and then
>> writes to them sequentially, wrapping around when it hits the end. In
>> this case the entire load is inserts, no updates, hence the sequential
>> writes. The data will not wrap around for about 6 months, at which time
>> old files will be overwritten starting from the beginning. The BBU is
>> functioning and the cache is set to write-back. The files are
>> memory-mapped; I'll check whether fsync is used. Flushing is done about
>> every 30 seconds and takes about 8 seconds.
> 
> How much data has been added to mongodb in those 30 seconds?

Typically 2.5 MB.

> If everything really was being written sequentially then I reckon you
> could write about 6.6GB in that time (11 disks x 75MB/sec x 8 seconds).
> From your posting I suspect you are not achieving that level of
> performance :-)
> 
> If it really is being written sequentially to a contiguous file then the
> stripe alignment won't make any difference, because this is just a big
> pre-allocated file, and XFS will do its best to give one big contiguous
> chunk of space for it.
> 
> Anyway, you don't need to guess these things; you can easily find out.
> 
> (1) Is the file preallocated and contiguous, or fragmented?
> 
>     # xfs_bmap /path/to/file
> 

All seem to have a single extent. This is a currently active file:

lfs.303:
        0: [0..4192255]: 36322376672..36326568927

This is an old file:

lfs.3:
        0: [0..1048575]: 2039336992..2040385567

> This will show you if you get one huge extent. If you get a number of
> large extents (say 100MB+) that would be fine for performance too. If you
> get lots of shrapnel then there's a problem.
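
xfs_bmap reports ranges in 512-byte units, so the single extent of lfs.303 covers 4,192,256 x 512 = 2,146,435,072 bytes (just under 2 GiB) and that of lfs.3 covers 512 MiB. The sketch below sums extent sizes from saved xfs_bmap output and flags anything under Brian's ~100MB rule of thumb; the function names and the exact threshold are assumptions for illustration.

```python
import re

# An extent line from xfs_bmap, e.g. "0: [0..4192255]: 36322376672..36326568927".
# Both the file-offset range and the disk-block range are in 512-byte units.
EXTENT = re.compile(r"^\s*(\d+):\s*\[(\d+)\.\.(\d+)\]:\s*(\d+)\.\.(\d+)")
UNIT_BYTES = 512
# Assumed cut-off based on Brian's "say 100MB+" remark.
THRESHOLD_BYTES = 100 * 1024 * 1024

def extent_sizes(bmap_output):
    """Return the size in bytes of each allocated extent in xfs_bmap output."""
    sizes = []
    for line in bmap_output.splitlines():
        m = EXTENT.match(line)
        if m:  # the "file:" header line and "hole" lines do not match
            first, last = int(m.group(2)), int(m.group(3))
            sizes.append((last - first + 1) * UNIT_BYTES)
    return sizes

def shrapnel(sizes, threshold=THRESHOLD_BYTES):
    """Extents smaller than the threshold."""
    return [s for s in sizes if s < threshold]

# Output copied from the active file above.
sample = """lfs.303:
        0: [0..4192255]: 36322376672..36326568927
"""
sizes = extent_sizes(sample)
print(len(sizes), sizes, shrapnel(sizes))
```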
> 
> (2) Are you really writing sequentially?
> 
>     # btrace /dev/whatever | grep ' [DC] '
> 
> This will show you block requests dispatched [D] and completed [C] to the
> controller.
> 

I'm not familiar with the btrace output, but here's the summary of roughly
5 minutes:

Total (8,16):
 Reads Queued:      16,914,    1,888MiB  Writes Queued:      47,147,    1,438MiB
 Read Dispatches:   16,914,    1,888MiB  Write Dispatches:   47,050,    1,438MiB
 Reads Requeued:         0               Writes Requeued:         0
 Reads Completed:   16,914,    1,888MiB  Writes Completed:   47,050,    1,438MiB
 Read Merges:            0,        0KiB  Write Merges:           97,      592KiB
 IO unplugs:        17,060               Timer unplugs:           6

Throughput (R/W): 5,528KiB/s / 4,209KiB/s
Events (8,16): 418,873 entries
Skips: 0 forward (0 -   0.0%)

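
The totals in this summary give average request sizes: about 114 KiB per read and about 31 KiB per write. Because the summary rounds the MiB totals, these are approximations; `avg_request_kib` is an illustrative helper, not part of blktrace.

```python
def avg_request_kib(total_mib, requests):
    """Average request size in KiB: total MiB transferred / number of requests."""
    return total_mib * 1024 / requests

# Figures from the "Total (8,16)" btrace summary (roughly 5 minutes of I/O).
read_kib = avg_request_kib(1888, 16914)   # Read Dispatches
write_kib = avg_request_kib(1438, 47050)  # Write Dispatches
print(f"average read request {read_kib:.1f} KiB, average write request {write_kib:.1f} KiB")
```

Small average writes are one more hint that the flush is not reaching the array as large streaming I/O.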

And here is some of the detail:

  8,16   0     2251     7.674877079  5364  C   R 42376096952 + 256 [0]
  8,16   0     2252     7.675031410  5364  C   R 4046119976 + 256 [0]
  8,16   0     2259     7.689553858  5364  D   R 4046120232 + 256 [mongod]
  8,16   0     2260     7.689812456  5364  C   R 4046120232 + 256 [0]
  8,16   0     2267     7.690973707  5364  D   R 42376097208 + 256 [mongod]
  8,16   0     2268     7.691225467  5364  C   R 42376097208 + 256 [0]
  8,16   0     2275     7.699438100  5364  D   R 21964732520 + 256 [mongod]
  8,16   0     2276     7.699688313     0  C   R 21964732520 + 256 [0]
  8,16   0     2283     7.700493875  5364  D   R 4046120488 + 256 [mongod]
  8,16   0     2284     7.700749134  5364  C   R 4046120488 + 256 [0]
  8,16   0     2291     7.703460687  5364  D   R 42376097464 + 256 [mongod]
  8,16   0     2292     7.703707154  5364  C   R 42376097464 + 256 [0]
  8,16   2      928     7.730573720  5364  D   R 21964760296 + 256 [mongod]
  8,16   0     2293     7.747651477     0  C   R 21964760296 + 256 [0]
  8,16   0     2300     7.754517529  5364  D   R 4046120744 + 256 [mongod]
  8,16   0     2301     7.754781549  5364  C   R 4046120744 + 256 [0]
  8,16   0     2308     7.760712917  5364  D   R 42376097720 + 256 [mongod]
  8,16   0     2309     7.761392841  5364  C   R 42376097720 + 256 [0]
  8,16   2      935     7.769193162  5597  D   R 4046121000 + 256 [mongod]
  8,16   0     2310     7.769458041     0  C   R 4046121000 + 256 [0]
  8,16   2      942     7.773021214  5597  D   R 42376097976 + 256 [mongod]
  8,16   0     2311     7.773290126     0  C   R 42376097976 + 256 [0]
  8,16   2      949     7.780080336  5597  D   R 4046121256 + 256 [mongod]
  8,16   0     2312     7.780346410     0  C   R 4046121256 + 256 [0]
  8,16   2      956     7.808903046  5597  D   R 42376098232 + 256 [mongod]
  8,16   0     2313     7.809197289     0  C   R 42376098232 + 256 [0]
  8,16   2      963     7.816907787  5597  D   R 4046121512 + 256 [mongod]
  8,16   0     2314     7.817182676     0  C   R 4046121512 + 256 [0]
  8,16   2      970     7.827457411  5597  D   R 42376098488 + 256 [mongod]
  8,16   0     2315     7.827730410     0  C   R 42376098488 + 256 [0]
  8,16   0     2316     7.833225453     0  C   R 4046121768 + 256 [0]
  8,16   1     2410     7.844128616 37922  D   W 60216121432 + 80 [flush-8:16]
  8,16   1     2411     7.844140476 37922  D   W 60216121528 + 256 [flush-8:16]
  8,16   1     2412     7.844145438 37922  D   W 60216121784 + 256 [flush-8:16]
  8,16   1     2413     7.844149939 37922  D   W 60216122040 + 256 [flush-8:16]
  8,16   1     2414     7.844154486 37922  D   W 60216122296 + 256 [flush-8:16]
  8,16   1     2415     7.844159104 37922  D   W 60216122552 + 256 [flush-8:16]
  8,16   1     2416     7.844163489 37922  D   W 60216122808 + 256 [flush-8:16]
  8,16   1     2417     7.844169195 37922  D   W 60216123064 + 256 [flush-8:16]
  8,16   1     2418     7.844173666 37922  D   W 60216123320 + 256 [flush-8:16]
  8,16   1     2419     7.844178182 37922  D   W 60216123576 + 208 [flush-8:16]
  8,16   1     2420     7.844182518 37922  D   W 60216123800 + 256 [flush-8:16]
  8,16   1     2421     7.844186886 37922  D   W 60216124056 + 256 [flush-8:16]
  8,16   1     2422     7.844191572 37922  D   W 60216124312 + 256 [flush-8:16]
  8,16   1     2423     7.844195825 37922  D   W 60216124568 + 256 [flush-8:16]
  8,16   1     2424     7.844200405 37922  D   W 60216124824 + 256 [flush-8:16]
  8,16   1     2425     7.844205039 37922  D   W 60216125080 + 256 [flush-8:16]
  8,16   1     2426     7.844209304 37922  D   W 60216125336 + 256 [flush-8:16]
  8,16   1     2427     7.844213483 37922  D   W 60216125592 + 256 [flush-8:16]
  8,16   1     2428     7.844217895 37922  D   W 60216125848 + 256 [flush-8:16]
  8,16   1     2429     7.844222295 37922  D   W 60216126104 + 256 [flush-8:16]
  8,16   1     2430     7.844226651 37922  D   W 60216126360 + 256 [flush-8:16]
  8,16   1     2431     7.844230959 37922  D   W 60216126616 + 256 [flush-8:16]
  8,16   1     2432     7.844235575 37922  D   W 60216126872 + 256 [flush-8:16]
  8,16   1     2433     7.844239866 37922  D   W 60216127128 + 256 [flush-8:16]
  8,16   1     2434     7.844244274 37922  D   W 60216127384 + 256 [flush-8:16]
  8,16   1     2435     7.844249817 37922  D   W 60216127640 + 256 [flush-8:16]
  8,16   1     2436     7.844254266 37922  D   W 60216127896 + 256 [flush-8:16]
  8,16   1     2437     7.844258706 37922  D   W 60216128152 + 256 [flush-8:16]
  8,16   1     2438     7.844263213 37922  D   W 60216128408 + 256 [flush-8:16]
  8,16   1     2439     7.844267570 37922  D   W 60216128664 + 256 [flush-8:16]

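
In this detail the dispatched reads interleave several streams. Two of them (starting at sectors 4046120232 and 42376097208) each advance by exactly 256 sectors (128 KiB) per request. The flusher's writes are almost contiguous, apart from small 16-sector gaps after the 80- and 208-sector requests. A check that only compares each request with the one immediately before it would call the interleaved reads random, so the sketch below remembers every open end point. It assumes the default blkparse/btrace text line format; `parse` and `sequential_fraction` are hypothetical helpers, and the input would be saved btrace output.

```python
import re
from dataclasses import dataclass

# Default blkparse/btrace line:
# dev cpu seq timestamp pid action RWBS sector + nsectors [process]
LINE = re.compile(
    r"^\s*\d+,\d+\s+\d+\s+\d+\s+[\d.]+\s+\d+\s+([A-Z])\s+(\S+)\s+(\d+)\s+\+\s+(\d+)"
)

@dataclass
class Request:
    action: str  # 'D' = dispatched to the driver, 'C' = completed
    rwbs: str    # RWBS field, e.g. 'R', 'W', 'WS'
    sector: int  # first 512-byte sector
    nsect: int   # length in 512-byte sectors

def parse(lines):
    """Turn btrace text lines into Request records, skipping other events."""
    reqs = []
    for line in lines:
        m = LINE.match(line)
        if m:
            reqs.append(Request(m.group(1), m.group(2), int(m.group(3)), int(m.group(4))))
    return reqs

def sequential_fraction(reqs, kind="W"):
    """Share of dispatched requests of one kind ('R' or 'W') that begin exactly
    where an earlier dispatched request of that kind ended. Tracking every open
    end point, not just the last one, keeps interleaved streams sequential."""
    open_ends = set()
    hits = total = 0
    for r in reqs:
        if r.action != "D" or kind not in r.rwbs:
            continue
        total += 1
        if r.sector in open_ends:
            hits += 1
            open_ends.discard(r.sector)
        open_ends.add(r.sector + r.nsect)
    return hits / total if total else 0.0
```

A value near 1.0 means the device sees mostly sequential streams even if they are interleaved; a value near 0 means genuinely scattered I/O.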
> 
> 
> And at a higher level:
> 
>     # strace -p <pid-of-mongodb-process>
> 
> will show you the seek/write/read operations that the application is
> performing.
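
Because troby's data files are memory-mapped, stores into mapped pages never appear as write(2) calls; the kernel writes dirty pages back on its own (the [flush-8:16] writes in the trace look like that), so strace can only show explicit flushes such as msync(2) or fsync(2), plus ordinary reads. mongod is multithreaded, and `strace -f -p <pid>` attaches to all of its threads rather than just one. The sketch tallies syscall names from saved strace text, assuming the default line format with an optional `[pid N]` prefix; `count_syscalls` is a hypothetical helper, and `strace -c` produces a similar summary directly.

```python
import re
from collections import Counter

# Optional "[pid NNNN] " prefix (added by strace -f), then "name(".
# "<... name resumed>" continuation lines start with "<" and are not counted,
# so a call split by "<unfinished ...>" is counted once.
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def count_syscalls(lines):
    """Count how often each syscall name appears in strace text output."""
    counts = Counter()
    for line in lines:
        m = SYSCALL.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `count_syscalls(open("mongod.strace"))` on a saved trace (the filename is hypothetical) would show whether msync or fsync dominates.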
> 
> Once you have the answers to those, you can make a better judgement as to
> what's happening.
> 
> (3) One other thing to check:
> 
> cat /sys/block/xxx/bdi/read_ahead_kb
> cat /sys/block/xxx/queue/max_sectors_kb
> 
> Increasing those to 1024 (echo 1024 > ....) may make some improvement.
> 

They were 128. I increased the first, but trying to write the second gave
me a write error.

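
The 256-sector requests in the trace are 256 x 512 bytes = 128 KiB, matching these 128 KiB settings. A likely reason the second write failed is that the kernel rejects (with EINVAL) a max_sectors_kb value larger than the device's max_hw_sectors_kb. The sketch reads that limit and clamps the requested value, assuming the device is sdb (major 8, minor 16, as in the btrace output); the helper names are hypothetical.

```python
from pathlib import Path

def read_sysfs_int(path: Path) -> int:
    """Read a single integer from a sysfs attribute file."""
    return int(path.read_text().strip())

def clamp_max_sectors_kb(wanted_kb: int, max_hw_sectors_kb: int) -> int:
    """max_sectors_kb may not exceed the driver's max_hw_sectors_kb."""
    return min(wanted_kb, max_hw_sectors_kb)

def suggested_max_sectors_kb(dev: str, wanted_kb: int = 1024,
                             sysfs: Path = Path("/sys/block")) -> int:
    """Largest value up to wanted_kb that the device should accept."""
    hw = read_sysfs_int(sysfs / dev / "queue" / "max_hw_sectors_kb")
    return clamp_max_sectors_kb(wanted_kb, hw)
```

On the server, `suggested_max_sectors_kb("sdb")` gives a value that can then be echoed into `/sys/block/sdb/queue/max_sectors_kb`.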
> 
>> One thing I'm wondering is whether the incorrect stripe structure I
>> specified with mkfs is actually written into the file system structure.
> 
> I am guessing that things like chunks of inodes are probably
> stripe-aligned. But if you're really writing sequentially to a huge
> contiguous file then it won't matter anyway.
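
On the stripe-geometry question itself: `mkfs.xfs -d su=...,sw=...` takes the stripe unit in bytes and the width as a count of data disks, while the `sunit`/`swidth` mount options are given in 512-byte units. The sketch converts a controller chunk size into those mount-option values. The 64 KiB chunk is a hypothetical example, and the 11 data disks follow Brian's throughput estimate; both should be checked against the actual RAID5 layout.

```python
SECTOR_BYTES = 512  # XFS sunit/swidth mount options are in 512-byte units

def xfs_stripe_mount_opts(chunk_kib: int, data_disks: int) -> str:
    """sunit = RAID chunk size in sectors; swidth = sunit x data disks."""
    sunit = chunk_kib * 1024 // SECTOR_BYTES
    swidth = sunit * data_disks
    return f"sunit={sunit},swidth={swidth}"

# Hypothetical 64 KiB chunk on a RAID5 with 11 data disks.
print(xfs_stripe_mount_opts(64, 11))
```

The same geometry at mkfs time would be written `-d su=64k,sw=11`.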
> 
> Regards,
> 
> Brian.
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 
> 

-- 
View this message in context: http://old.nabble.com/How-to-deal-with-XFS-stripe-geometry-mismatch-with-hardware-RAID5-tp33498437p33506375.html
Sent from the Xfs - General mailing list archive at Nabble.com.

Thread overview: 12+ messages
2012-03-13 23:21 How to deal with XFS stripe geometry mismatch with hardware RAID5 troby
2012-03-14  7:37 ` Brian Candler
2012-03-14  7:52   ` Brian Candler
2012-03-14 15:41   ` Peter Grandi
2012-03-14 17:53   ` troby
2012-03-14  8:36 ` Stan Hoeppner
2012-03-14 17:43   ` troby
2012-03-14 21:05     ` Brian Candler
2012-03-14 23:21       ` troby [this message]
2012-03-15  0:31         ` Peter Grandi
2012-03-14 22:48     ` Peter Grandi
2012-03-14 23:22 ` Peter Grandi
