From: Mrten <mrten+drbd@ii.nl>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] [PATCH 2/2] expand section on throughput tuning to highlight prime usecase of external metadata
Date: Fri, 08 Jul 2011 20:37:07 +0200
Message-ID: <4E174E53.5000106@ii.nl>
In-Reply-To: <4E17116D.4080305@linbit.com>

On 08-07-2011 16:17:17, Florian Haas wrote:

> You're adding a third item to the enumeration; so it would be nice if
> you could also rephrase the next paragraph which talks about "the 
> minimum between the two".

Will do.

> You're talking about a battery backup of a cache that is not there. 
> Does not compute. :)

So true, will fix ;)

>> DRBD metadata updates necessary to guarantee data-completeness
>> in case of failure can slow down write throughput significantly.
>> If a raw device is normally capable of 250 MB/s write throughput
>> it is not an anomaly to see writes as slow as 70 MB/s with DRBD
>> enabled (numbers are for rotational disks). This is purely
>> caused by head seeks; 4MB data updates have to be followed by
>> metadata updates and the data-writes can only continue after the
>> metadata has been reached the platters (caching and write
>> reordering does not help).
> 
> I'm afraid you're missing some context here. DRBD performs the 
> synchronous meta data updates you are referring to only when an AL 
> extent goes hot or cold. It doesn't do so randomly or, as your 
> paragraph seems to imply to a casual reader, every time it has 
> written 4M of data.
> 
> And it is definitely _not_ normal to see 250MB/s write bandwidth drop
> to 70 MB/s. 110 MB/s would be entirely normal if you are replicating
> over Gigabit Ethernet, but that is determined by the bandwidth of the
> replication link, it doesn't have much to do with AL updates.

I think I should explain what I was trying to convey, or rather, my mental
image of what happened while I was benchmarking (and saw that huge
performance drop).

My backing device for DRBD is a software raid-0 (two disks), with
'meta-disk internal'. Benchmarking was done by dd'ing a few gigs from
/dev/zero. All this dd-writing makes a lot of new extents hot (one for
every 4MB written?), each of which has to be recorded in the metadata with
synchronous writes. Since my backing device is raid-0 and the default
chunk size for that is rather large these days, the (small) metadata
updates aren't spread over the raid-0 disks but are concentrated on one
device, which becomes the bottleneck for the benchmark because it has to
seek all the time.
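
To put a number on that guess, here is a quick sketch. It assumes one
extent covers 4 MiB (as the "4MB" above suggests) and that a purely
sequential dd enters each extent exactly once. The 4 GiB write size is an
illustrative value, not the size used in the benchmark.

```shell
#!/bin/sh
# Illustrative values only; neither is taken from the benchmark above.
write_gib=4      # assumed size of the dd run, in GiB
extent_mib=4     # assumed size of one extent, in MiB

# A purely sequential write enters each extent once, so the number of
# extents that go hot equals the write size divided by the extent size.
extents=$(( write_gib * 1024 / extent_mib ))
echo "extents touched: $extents"
```

If each of those extents really does cost one synchronous metadata write,
that is the number of interruptions to the linear write stream.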

This is not a cause for concern when you have a hardware battery-backed
cache, as the raid-controller can then delay writing the metadata, but I
don't have that.

I've blktrace-d, blkparse-d and seekwatcher-ed the hell out of this and
the images show exactly that happening, so I dared to write it up like this
without having read the source ;). Lots of linear writes, regularly
interrupted by a seek to synchronously write the metadata.
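
A back-of-the-envelope check of that picture follows. It assumes
(hypothetically) that one synchronous metadata update per 4 MiB of data
accounts for the entire drop from 250 MB/s to 70 MB/s, and estimates how
much extra time each 4 MiB chunk would then have to cost. 1 MB is taken
as 10^6 bytes.

```shell
#!/bin/sh
# Figures from the discussion: 250 MB/s raw, 70 MB/s with DRBD enabled.
chunk_bytes=$(( 4 * 1024 * 1024 ))   # assumed 4 MiB of data per metadata update
raw_mb_s=250
drbd_mb_s=70

# bytes divided by MB/s gives microseconds when 1 MB = 10^6 bytes.
raw_us=$(( chunk_bytes / raw_mb_s ))
drbd_us=$(( chunk_bytes / drbd_mb_s ))
overhead_us=$(( drbd_us - raw_us ))

echo "time per chunk, raw device: ${raw_us} us"
echo "time per chunk, with DRBD:  ${drbd_us} us"
echo "implied extra cost:         ${overhead_us} us"
```

With these inputs the implied extra cost is roughly 43 ms per 4 MiB chunk,
which would be in the range of a few seeks plus a flush on a rotational
disk. This is an estimate under the stated assumption, not a measurement.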


The slowdown wasn't caused by the interconnection between primary and
secondary, the 70MB/s was measured both in StandAlone and UpToDate (I
bonded 3 GE interfaces for nice syncing bandwidth).

And it was pure benchmarking, no other things happening on the server so
I'd expect that only the benchmark made extents hot.

I of course do not know the exact criteria that mark extents hot. If
what I described above is not an accurate description of what happens,
please correct me.


But the reason I think this should be in the docs is that I reckon that
lots of people would like to run raid-0 + "network raid-1" on relatively
cheap hardware, do the simplest of benchmarks and get confused by the
slowdown. Googling this, I saw the subject come up on the mailing list
a couple of times.

> And what you mean by "caching and write reordering does not help" I 
> don't understand at all, can you elaborate please?

The synchronous (barrier?) writes for the metadata, as far as I
understand it from a mailing list post from Lars, *must* have reached the
platters before the linear dd-writing can continue. So no enabling of
write caches, NCQ or tuning of elevators is going to help.

However, if you think that the paragraph now implies that *every* write
randomly makes extents hot then I should do some polishing ;)


>> +[[s-tune-external-metadata]]

[...]

> This section would be ok, but it's still missing the steps to dump 
> the existing metadata and restore it onto the new metadata device. 
> Can you add that and repost the patch please?

Ah, I hadn't thought of that scenario (am using a raid-1 for the
metadata). Is this along the lines of:

drbdadm down [resource]
drbdadm dump-md [resource] > savefile
[change meta-disk]
drbdmeta /dev/drbdX v08 [metadevice] 0 restore-md savefile

?

Is the index 0 correct usage when using flexible-meta-disk?

Maarten.
