All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mrten <mrten+drbd@ii.nl>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] [PATCH 2/2] expand section on throughput tuning to highlight prime usecase of external metadata
Date: Fri, 08 Jul 2011 20:37:07 +0200	[thread overview]
Message-ID: <4E174E53.5000106@ii.nl> (raw)
In-Reply-To: <4E17116D.4080305@linbit.com>

On 08-07-2011 16:17:17, Florian Haas wrote:

> You're adding a third item to the enumeration; so it would be nice if
> you could also rephrase the next paragraph which talks about "the 
> minimum between the two".

Will do.

> You're talking about a battery backup of a cache that is not there. 
> Does not compute. :)

So true, will fix ;)

>> DRBD metadata updates necessary to guarantee +  data-completeness 
>> in case of failure can slow down +  write throughput significantly.
>> If a raw device is normally capable of +  250 MB/s write throughput
>> it is not an anomaly to see writes as slow as + 70 MB/s with DRBD
>> enabled (numbers are for rotational disks). This is +  purely
>> caused by head seeks; 4MB data updates have to be followed by
>> metadata updates +  and the data-writes can only continue after the
>> metadata has been reached the +  platters (caching and write
>> reordering does not help).
> 
> I'm afraid you're missing some context here. DRBD performs the 
> synchronous meta data updates you are referring to only when an AL 
> extent goes hot or cold. It doesn't do so randomly or, as your 
> paragraph seems to imply to a casual reader, every time it has 
> written 4M of data.
> 
> And it is definitely _not_ normal to see 250MB/s write bandwidth drop
> to 70 MB/s. 110 MB/s would be entirely normal if you are replicating
> over Gigagit Ethernet, but that is determined by the bandwidth of the
> replication link, it doesn't have much to do with AL updates.

I think I should explain what I trying to convey, or rather, my mental
image of what happened while I was benchmarking (and saw that huge
performance drop).

My backing device for DRBD is a software raid-0 (two disks), with
'meta-disk internal'. Benchmarking was done by dd'ing a few gigs from
/dev/zero. All this dd-writing makes a lot of new extents hot (one for
every 4MB written?), which has to be remembered in the metadata, with
synchronous writes. Since my backing device is raid-0 and the default
chunk size for that is rather large these days, the (small) metadata
updates aren't spread over the raid-0 disks but are concentrated on one
device, which becomes the bottleneck for the benchmark because it has to
seek all the time.

This is not a cause for concern when you have a hardware battery-backed
cache, as the raid-controller can then delay writing the metadata, but I
don't have that.

I've blktrace-d, blkparse-d and seekwatcher-ed the hell out of this and
the images show exactly that happen, so I dared to write it up like this
without having read the source ;). Lots of linear writes, regularly
interrupted by a seek to synchronously write the metadata.


The slowdown wasn't caused by the interconnection between primary and
secondary, the 70MB/s was measured both in StandAlone and UpToDate (I
bonded 3 GE interfaces for nice syncing bandwidth).

And it was pure benchmarking, no other things happening on the server so
I'd expect that only the benchmark made extents hot.

I of course do not know the exact criteria that mark extents hot, if
what I described above is not an accurate description of what happens,
please correct me.


But the reason I think this should be in the docs is that I reckon that
lots of people would like to 0+"network raid-1" with relatively cheap
hardware, do the simplest of benchmarks and get confused by the
slowdown. Googling this I saw this subject passing over the mailinglist
a couple of times.

> And what you mean by "caching and write reordering does not help" I 
> don't understand at all, can you elaborate please?

The synchronous (barrier?) writes for the metadata, as far as I
understand it from a mailing post from Lars, *must* have reached the
platters before the linear dd-writing can continue. So no enabling of
write caches, NCQ or tuning of elevators is going to help.

However, if you think that the paragraph now implies that *every* write
randomly makes extents hot then I should do some polishing ;)


>> +[[s-tune-external-metadata]]

[...]

> This section would be ok, but it's still missing the steps to dump 
> the existing metadata and restore it onto the new metadata device. 
> Can you add that and repost the patch please?

Ah, I hadn't thought of that scenario (am using a raid-1 for the
metadata). Is this along the lines of:

drbdadm down [resource]
drbdadm dump-md [resource] > savefile
[change meta-disk]
drbdmeta /dev/drbdX v08 [metadevice] 0 restore-md savefile

?

Is the index 0 correct usage when using flexible-meta-disk?

Maarten.

  reply	other threads:[~2011-07-08 18:37 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-07 15:44 [Drbd-dev] [PATCH 1/2] add extra paragraph about manpages/ directory Mrten
2011-07-07 15:44 ` [Drbd-dev] [PATCH 2/2] expand section on throughput tuning to highlight prime usecase of external metadata Mrten
2011-07-08 14:17   ` Florian Haas
2011-07-08 18:37     ` Mrten [this message]
2011-07-08 14:01 ` [Drbd-dev] [PATCH 1/2] add extra paragraph about manpages/ directory Florian Haas
2011-07-08 14:45   ` Florian Haas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E174E53.5000106@ii.nl \
    --to=mrten+drbd@ii.nl \
    --cc=drbd-dev@lists.linbit.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.