linux-raid.vger.kernel.org archive mirror
* split RAID1 during backups?
@ 2005-10-24 10:57 Jeff Breidenbach
  2005-10-24 11:22 ` Jurriaan Kalkman
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-24 10:57 UTC (permalink / raw)
  To: linux-raid


Hi all,

I have a two drive RAID1 serving data for a busy website. The
partition is 500GB and contains millions of 10KB files. For reference,
here's /proc/mdstat

Personalities : [raid1]
md0 : active raid1 sdc1[0] sdd1[1]
      488383936 blocks [2/2] [UU]

For backups, I set the md0 partition to readonly and then use dd_rescue
+ netcat to copy the partition over a gigabit network. Unfortunately,
this process takes almost 10 hours. I'm only able to copy about 18MB/s
from md0 due to disk contention with the webserver. If I had the full
attention of a single disk, I could read at nearly 60MB/s.
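
For the curious, the pipeline is roughly the following (hostname, port and
destination path here are illustrative; the real invocation uses a few more
dd_rescue options):

# on the backup host
netcat -l -p 7000 > /backup/md0.img
# on the web server, after setting md0 readonly
dd_rescue /dev/md0 - | netcat backuphost 7000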

So - I'm thinking of the following backup scenario.  First, remount
/dev/md0 readonly just to be safe. Then mount the two component
partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
component partition, and tell the backup process to work from the
other component partition. Once the backup is complete, point the
webserver back at /dev/md0, unmount the component partitions, then
switch read-write mode back on.
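
In commands, what I have in mind is roughly this (mount points are made up,
and it assumes the kernel will let me mount the components of an assembled
array at all):

mount -o remount,ro /dev/md0 /data1
mount -o ro /dev/sdc1 /mnt/web       # webserver reads from here
mount -o ro /dev/sdd1 /mnt/backup    # backup reads from here
<do backup from /mnt/backup, serve from /mnt/web>
umount /mnt/web /mnt/backup
mount -o remount,rw /dev/md0 /data1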

Am I insane? 

Everything on this system seems bottlenecked by disk I/O. That
includes the rate web pages are served as well as the backup process
described above. While I'm always hungry for performance tips, faster
backups are the current focus. For those interested in gory details
such as drive types, NCQ settings, kernel version and whatnot, I
dumped a copy of dmesg output here: http://www.jab.org/dmesg

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 10:57 Jeff Breidenbach
@ 2005-10-24 11:22 ` Jurriaan Kalkman
  2005-10-24 11:37 ` Brad Campbell
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Jurriaan Kalkman @ 2005-10-24 11:22 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

>
> Hi all,
>
> I have a two drive RAID1 serving data for a busy website. The
> partition is 500GB and contains millions of 10KB files. For reference,
> here's /proc/mdstat
>
> Personalities : [raid1]
> md0 : active raid1 sdc1[0] sdd1[1]
>       488383936 blocks [2/2] [UU]
>
> For backups, I set the md0 partition to readonly and then use dd_rescue
> + netcat to copy the partition over a gigabit network. Unfortunately,
> this process takes almost 10 hours. I'm only able to copy about 18MB/s
> from md0 due to disk contention with the webserver. If I had the full
> attention of a single disk, I could read at nearly 60MB/s.

First of all, if the data is mostly static, rsync might work faster. Don't
feed rsync millions of files in one go - try to split it into separate
processes, say one for all files starting with a, one for all files
starting with b, etc.
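
Something along these lines, for example (paths and hostname are purely
illustrative, and this assumes the tree is already split into top-level
subdirectories):

for d in /data1/*/ ; do
    rsync -a "$d" "backuphost:/backup/data1/$(basename "$d")/"
done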

> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.
>
> Am I insane?

It doesn't sound insane. Whether it's actually faster is something only you
can test on your hardware.

By the way, it used to be with regular IDE disks that using hdc and hdd
together on a single wire was a sure way to get a slow system. I take it
sdc and sdd using SATA don't influence each other?

Good luck,
Jurriaan


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 10:57 Jeff Breidenbach
  2005-10-24 11:22 ` Jurriaan Kalkman
@ 2005-10-24 11:37 ` Brad Campbell
  2005-10-24 19:05 ` Bill Davidsen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: Brad Campbell @ 2005-10-24 11:37 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Jeff Breidenbach wrote:

> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.

Why not do something like this ?

mount -o remount,ro /dev/md0 /web
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mount -o ro /dev/sdd1 /target

<do backup here>

umount /target
mdadm --add /dev/md0 /dev/sdd1
mount -o remount,rw /dev/md0 /web

That way the web server continues to run from the md..
However, you will endure a rebuild on md0 when you re-add the disk; but given everything is mounted 
read-only you should not practically be writing anything, and if a disk fails during the rebuild 
the other disk will still be intact.

I second jurriaan's vote for rsync also, but I would be inclined just to let it loose on the whole 
disk rather than break it up into parts.. but then I have heaps of ram too..

Regards,
Brad
-- 
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
@ 2005-10-24 12:07 Jeff Breidenbach
  2005-10-24 13:26 ` Paul Clements
  2005-10-24 18:55 ` dean gaudet
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-24 12:07 UTC (permalink / raw)
  To: linux-raid


>First of all, if the data is mostly static, rsync might work faster.

Any operation that stats the individual files - even to just look at
timestamps - takes about two weeks. Therefore it is hard for me to see
rsync as a viable solution, even though the data is mostly
static. About 400,000 files change between weekly backups.

>I take it sdc and sdd using SATA don't influence each other?

Correct.

>However you will endure a rebuild on md0 when you re-add the disk, but
>given everything is mounted read-only, you should not practically be
>doing anything

If the rebuild operation is a no-op, then that sounds like a great
idea. If the rebuild operation requires scanning over all data in both
drives, I think that's going to be at least as expensive as the
current 10 hour process.

Thanks for the suggestions so far.

Cheers,
Jeff


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 12:07 Jeff Breidenbach
@ 2005-10-24 13:26 ` Paul Clements
  2005-10-24 18:55 ` dean gaudet
  1 sibling, 0 replies; 25+ messages in thread
From: Paul Clements @ 2005-10-24 13:26 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Jeff Breidenbach wrote:

>>However you will endure a rebuild on md0 when you re-add the disk, but
>>given everything is mounted read-only, you should not practically be
>>doing anything
> 
> 
> If the rebuild operation is a no-op, then that sounds like a great
> idea. If the rebuild operation requires scanning over all data in both
> drives, I think that's going to be at least as expensive as the
> current 10 hour process.

You seem to be using a pretty bleeding edge kernel (2.6.12). If you were 
to upgrade to 2.6.13, the md bitmap (intent logging) functionality that 
is present in that kernel will allow the resync to be a no-op. 
Basically, the bitmap tracks writes to the md array, and then when 
resync occurs, only syncs blocks that have been written. Since the array 
would be readonly while you're doing backups, there would be nothing to 
resync.

See the list archives for more details on md bitmap.
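
With mdadm 2.x I believe the bitmap can be added to an existing array in
place, something along these lines (untested; check the mdadm man page for
your version):

mdadm --grow /dev/md0 --bitmap=internal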

--
Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 12:07 Jeff Breidenbach
  2005-10-24 13:26 ` Paul Clements
@ 2005-10-24 18:55 ` dean gaudet
  1 sibling, 0 replies; 25+ messages in thread
From: dean gaudet @ 2005-10-24 18:55 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> >First of all, if the data is mostly static, rsync might work faster.
> 
> Any operation that stats the individual files - even to just look at
> timestamps - takes about two weeks. Therefore it is hard for me to see
> rsync as a viable solution, even though the data is mostly
> static. About 400,000 files change between weekly backups.

taking a long time to stat individual files makes me wonder if you're
suffering from atime updates and O(n) directory lookups... have you tried
this:

- mount -o noatime,nodiratime
- tune2fs -O dir_index  (and e2fsck -D)
  (you need recentish e2fsprogs for this, and i'm pretty sure you want
  2.6.x kernel)

a big hint you're suffering from atime updates is write traffic when your
fs is mounted rw, and your static webserver is the only thing running (and
your logs go elsewhere)... atime updates are probably the only writes
then.  try "iostat -x 5".

a big hint you're suffering from O(n) directory lookups is heaps of system
time... (vmstat or top).


On Mon, 24 Oct 2005, Brad Campbell wrote:

> mount -o remount,ro /dev/md0 /web
> mdadm --fail /dev/md0 /dev/sdd1
> mdadm --remove /dev/md0 /dev/sdd1
> mount -o ro /dev/sdd1 /target
> 
> <do backup here>
> 
> umount /target
> mdadm --add /dev/md0 /dev/sdd1
> mount -o remount,rw /dev/md0 /web

the md event counts would be out of sync and unless you're using bitmapped 
intent logging this would cause a full resync.  if the raid wasn't online 
you could probably use one of the mdadm options to force the two devices 
to be a sync'd raid1 ... but i'm guessing you wouldn't be able to do it 
online.

other 2.6.x bleeding edge options are to mark one drive as write-mostly
so that you have no read traffic competition while doing a backup... or
just use the bitmap intent logging and a nbd to add a third, networked,
copy of the drive on another machine.
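
a rough sketch of the nbd variant, assuming md0 already carries a bitmap
(device names, host and port are made up, and i haven't tested this):

nbd-client backuphost 2000 /dev/nbd0
mdadm --grow /dev/md0 --raid-devices=3
mdadm /dev/md0 --add --write-mostly /dev/nbd0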

-dean

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 10:57 Jeff Breidenbach
  2005-10-24 11:22 ` Jurriaan Kalkman
  2005-10-24 11:37 ` Brad Campbell
@ 2005-10-24 19:05 ` Bill Davidsen
  2005-10-25  4:30 ` Thomas Garner
  2005-10-27  0:04 ` Christopher Smith
  4 siblings, 0 replies; 25+ messages in thread
From: Bill Davidsen @ 2005-10-24 19:05 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> 
> Hi all,
> 
> I have a two drive RAID1 serving data for a busy website. The
> partition is 500GB and contains millions of 10KB files. For reference,
> here's /proc/mdstat
> 
> Personalities : [raid1]
> md0 : active raid1 sdc1[0] sdd1[1]
>       488383936 blocks [2/2] [UU]
> 
> For backups, I set the md0 partition to readonly and then use dd_rescue
> + netcat to copy the partition over a gigabit network. Unfortunately,
> this process takes almost 10 hours. I'm only able to copy about 18MB/s
> from md0 due to disk contention with the webserver. If I had the full
> attention of a single disk, I could read at nearly 60MB/s.

Sounds like the wrong solution... how about this:
  Get one more drive of the same size, and at backup time add it to the 
mirror. After rebuild take it back out of the mirror. Put a remount r/o in 
there at your discretion. Now you have a valid copy of your data, back 
that up as you like.
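
  Roughly this (the third device name is made up, and the syntax is from 
memory, so check it before trying):

  mdadm --grow /dev/md0 --raid-devices=3
  mdadm /dev/md0 --add /dev/sde1
  <wait for the rebuild to finish>
  mdadm /dev/md0 --fail /dev/sde1 --remove /dev/sde1
  mdadm --grow /dev/md0 --raid-devices=2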

  If you want to try something "which used to work" see nbd, export 500GB 
from another machine, add the network block device to the mirror, let it 
sync, break the mirror. Haven't tried since 2.4.19 or so.

  The real problem is that you should be able to search the disk faster 
than that, identify the modified files, and do incrementals regularly. I 
have 400GB of 2-5MB files, and it takes minutes, not hours, to scan them. 
That's PATA not SATA, I suspect there may still be issues there, SATA is 
not as well explored, certainly not by me!

> 
> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.
> 
> Am I insane? 

I don't like the failure scenarios on that one, personally.

One last thought, the slowness of the disk may result from the extended 
times to do directory operations which you mention. You don't have all 
those thousands of files in a single directory, do you? How long does it 
take to do an unsuccessful "find" from the root? Like "find /base -name 
spvgZy3G" or other name it's not going to find.

> 
> Everything on this system seems bottlenecked by disk I/O. That
> includes the rate web pages are served as well as the backup process
> described above. While I'm always hungry for performance tips, faster
> backups are the current focus. For those interested in gory details
> such as drive types, NCQ settings, kernel version and whatnot, I
> dumped a copy of dmesg output here: http://www.jab.org/dmesg
> 
> Cheers,
> Jeff
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
Doing interesting things with little computers since 1979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
@ 2005-10-24 20:28 Jeff Breidenbach
  2005-10-24 20:58 ` John Stoffel
  2005-10-25 22:18 ` David Greaves
  0 siblings, 2 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-24 20:28 UTC (permalink / raw)
  To: linux-raid


Thanks for all the suggestions.

> a big hint you're suffering from atime updates is write traffic when your
> fs is mounted rw, and your static webserver is the only thing running (and
> your logs go elsewhere)... atime updates are probably the only writes
> then.  try "iostat -x 5".

I think atime updates are unlikely, since the partition is mounted
with atime turned off. I will look into the directory listing issue.

# mount | grep md0
/dev/md0 on /data1 type reiserfs (rw,noatime,nodiratime)

# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0  57468 962964 2301336    0    0   345   500    3     2 40  6 31 23


>  The real problem is that you should be able to search the disk faster
>than that, identify the modified files, and do incrementals regularly. I
>have 400GB of 2-5MB files, and it takes minutes, not hours, to scan them.
>That's PATA not SATA, I suspect there may still be issues there, SATA is
>not as well explored, certainly not by me!

I'm not sure this is a relevant comparison. The files in question are
about 1000 times smaller, and there are about 1000 times more of them.

>Get one more drive of the same size, and at backup time add it to
>the mirror. After rebuild take it back out of the mirror. Put a
>remount r/o in there at your discresion. Now you have a valid copy of
>your data, back that up as you like.

Interesting. What is the advantage over the current practice? Is it
faster, or does it use less disk I/O? Reminder: the current practice
(which I think is too slow) is to copy md0 with dd_rescue while the
partition is also feeding a webserver.

> the md event counts would be out of sync and unless you're using bitmapped
> intent logging this would cause a full resync.  if the raid wasn't online
> you could probably use one of the mdadm options to force the two devices
> to be a sync'd raid1 ... but i'm guessing you wouldn't be able to do it
> online.

I will look into intent logging. This is the first I've heard of it, thanks.

> other 2.6.x bleeding edge options are to mark one drive as write-mostly
> so that you have no read traffic competition while doing a backup... or
> just use the bitmap intent logging and a nbd to add a third, networked,
> copy of the drive on another machine.

This is also the first I've heard of nbd. Thanks, I'll look into that
too.

>One last thought, the slowness of the disk may result from the extended
>times to do directory operations which you mention. You don't have all
>those thousands of files in a single directory, do you? How long does it
>take to do an unsuccessful "find" from the root? Like "find /base -name
>spvgZy3G" or other name it's not going to find.

Individual directories contain up to about 150,000 files. If I run ls
-U on all directories, it completes in a reasonable amount of time (I
forget how much, but I think it is well under an hour). Reiserfs is
supposed to be good at this sort of thing. If I were to stat each
file, then it's a different story.

Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 20:28 Jeff Breidenbach
@ 2005-10-24 20:58 ` John Stoffel
  2005-10-25 22:18 ` David Greaves
  1 sibling, 0 replies; 25+ messages in thread
From: John Stoffel @ 2005-10-24 20:58 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

>>>>> "Jeff" == Jeff Breidenbach <jeff@jab.org> writes:


Jeff> # mount | grep md0
Jeff> /dev/md0 on /data1 type reiserfs (rw,noatime,nodiratime)

Ah, you're using reiserfs on here.  It may or may not be having
problems with all those files per-directory that you have.  Is there
any way you can split them up more into sub-directories?  

Old news servers used to run into this exact same problem, and what
they did was move all files starting with 'a' into the 'a/' directory,
all files starting with 'b' into b/... etc.  You can go down as many
levels as you want.  
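
A one-off migration can be as simple as something like this (bash; the path
is illustrative and this is untested):

cd /data1/somedir
for f in *; do
    [ -f "$f" ] || continue      # skip anything that is not a regular file
    d=${f:0:1}                   # first character of the file name
    mkdir -p "$d"
    mv -- "$f" "$d/"
done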

Jeff> Individual directories contain up to about 150,000 files. If I
Jeff> run ls -U on all directories, it completes in a reasonably
Jeff> amount of time (I forget how much, but I think it is well under
Jeff> an hour). Reiserfs is supposed to be good at this sort of
Jeff> thing. If I were to stat each file, then it's a different story.

Do you stat the files in inode order (not sure how reiserfs stores
files), when you're doing a readdir() on the directory contents?  You
don't want to bother sorting at all, you just want to pull them off
the disk as efficiently as possible.

I think you'll get a lot more performance out of your system if you can
just re-do how the application writes/reads the files you're using.
It almost sounds like some sort of cache system...

The other idea would be to use 'inotify' and just copy those files
which change to the cloned box.

Another idea, which would require more hardware would be to make some
readonly copies of the system and have all reads go there, and only
writes go to the master system.  If the master dies, you just promote a
slave into that role.  If a slave dies, you have extras running
around.  Then you could do your backups against the readonly systems,
in parallel to get the most performance out of your backups.

But knowing more about the application would help.  Millions of tiny
files aren't optimal these days.  

Oh yeah, what kinds of block size are you using on the filesystem?
And how many disks?  Splitting the load across more smaller disks will
probably help as well, since I suspect that your times are dominated
by seek and directory overhead, not actually reading of all these tiny
files.

John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
@ 2005-10-25  3:37 Jeff Breidenbach
  2005-10-25  4:07 ` dean gaudet
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-25  3:37 UTC (permalink / raw)
  To: linux-raid


Ok... thanks everyone!

David, you said you are worried about failure scenarios
involved with RAID splitting. Could you please elaborate?
My biggest concern is I'm going to accidentally trigger
a rebuild no matter what I try but maybe you have something
more serious in mind.

Brad, your suggestion about kernel 2.6.13 and intent logging and
having mdadm pull a disk sounds like a winner. I'm going to try it
if the software looks mature enough. Should I be scared?

Dean, the comment about "write-mostly" is confusing to me.  Let's say
I somehow marked one of the component drives write-mostly to quiet it
down. How do I get at it? Linux will not let me mount the component
partition if md0 is also mounted. Do you think "write-mostly" or
"write-behind" are likely enough to be magic bullets that I should
learn all about them?

Bill, thanks for the suggestion to use nbd instead of netcat.  Netcat
is solid software and very fast, but does feel a little like duct
tape. You also suggested putting a third drive (local or nbd remote)
temporarily in the RAID1. What does that buy versus the current
practice of using dd_rescue to copy the data off md0? I'm not
imagining any I/O savings over the current approach.

John, I'm using 4KB blocks in reiserfs with tail packing. All sorts of
other details are in the dmesg output [1]. I agree seeks are a major
bottleneck, and I like your suggestion about putting extra spindles
in. Master-slave won't work because the data is continuously changing.
I'm not going to argue about the optimality of millions of tiny files
(go talk to Hans Reiser about that one!) but I definitely don't foresee
major application redesign any time soon.

Most importantly, thanks for the encouragement. So far it sounds like
there might be some ninja magic required, but I'm becoming
increasingly optimistic that it will be - somehow - possible to manage
disk contention in order to dramatically raise backup speeds.

Cheers,
Jeff

[1] http://www.jab.org/dmesg

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
@ 2005-10-25  4:07 ` dean gaudet
  2005-10-25  8:35 ` Norman Schmidt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 25+ messages in thread
From: dean gaudet @ 2005-10-25  4:07 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> Dean, the comment about "write-mostly" is confusing to me.  Let's say
> I somehow marked one of the component drives write-mostly to quiet it
> down. How do I get at it? Linux will not let me mount the component
> partition if md0 is also mounted. Do you think "write-mostly" or
> "write-behind" are likely enough to be magic bullets that I should
> learn all about them?

if one drive is write-mostly, and you remount the filesystem read-only... 
then no writes should be occurring... and you could dd from the component 
drive directly and get a consistent fs image.  (i'm assuming you can 
remount the filesystem read-only for the duration of the backup because it 
sounds like that's how you do it now; and i'm assuming you're happy enough 
with your dd_rescue image...)
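
i.e. roughly this, assuming sdd1 is the write-mostly member (destination
host and port are made up):

mount -o remount,ro /dev/md0 /data1
dd_rescue /dev/sdd1 - | netcat backuphost 7000
mount -o remount,rw /dev/md0 /data1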

myself i've been considering a related problem... i don't trust LVM/DM 
snapshots in 2.6.x yet, and i've been holding back a 2.4.x box waiting for 
them to stabilise... but that seems to be taking a long time.  the box 
happens to have a 3-way raid1 anyhow, and 2.6.x bitmapped intent logging 
would give me a great snapshot backup option:  just break off one disk 
during the backup and put it back in the mirror when done.

there's probably one problem with this 3-way approach... i'll need some 
way to get the fs (ext3) to reach a "safe" point where no log recovery 
would be required on the disk i break out of the mirror... because under 
no circumstances do you want to write on the disk while it's outside the 
mirror.  (LVM snapshotting in 2.4.x requires a "VFS lock" patch which does 
exactly this when you create a snapshot.)


> John, I'm using 4KB blocks in reiserfs with tail packing.

i didn't realise you were using reiserfs... i'd suggest disabling tail 
packing... but then i've never used reiser, and i've only ever seen 
reports of tail packing having serious performance impact.  you're really 
only saving yourself an average of half a block per inode... maybe try a 
smaller block size if the disk space is an issue due to lots of inodes.

-dean

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 10:57 Jeff Breidenbach
                   ` (2 preceding siblings ...)
  2005-10-24 19:05 ` Bill Davidsen
@ 2005-10-25  4:30 ` Thomas Garner
  2005-10-27  0:04 ` Christopher Smith
  4 siblings, 0 replies; 25+ messages in thread
From: Thomas Garner @ 2005-10-25  4:30 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Should there be any consideration for the utilization of the gigabit 
interface that is passing all of this backup traffic, as well as the 
speed of the drive that is doing all of the writing during this 
transaction?  Is the 18MB/s how fast the data is being copied over the 
network, or is it some metric within the host system?

Thomas

Jeff Breidenbach wrote:
> Hi all,
> 
> I have a two drive RAID1 serving data for a busy website. The
> partition is 500GB and contains millions of 10KB files. For reference,
> here's /proc/mdstat
> 
> Personalities : [raid1]
> md0 : active raid1 sdc1[0] sdd1[1]
>       488383936 blocks [2/2] [UU]
> 
> For backups, I set the md0 partition to readonly and then use dd_rescue
> + netcat to copy the partition over a gigabit network. Unfortunately,
> this process takes almost 10 hours. I'm only able to copy about 18MB/s
> from md0 due to disk contention with the webserver. If I had the full
> attention of a single disk, I could read at nearly 60MB/s.
> 
> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.
> 
> Am I insane? 
> 
> Everything on this system seems bottlenecked by disk I/O. That
> includes the rate web pages are served as well as the backup process
> described above. While I'm always hungry for performance tips, faster
> backups are the current focus. For those interested in gory details
> such as drive types, NCQ settings, kernel version and whatnot, I
> dumped a copy of dmesg output here: http://www.jab.org/dmesg
> 
> Cheers,
> Jeff
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
@ 2005-10-25  5:01 Jeff Breidenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-25  5:01 UTC (permalink / raw)
  To: linux-raid


On 10/24/05, Thomas Garner <tlg1466@neo.tamu.edu> wrote:
> Should there be any consideration for the utilization of the gigabit
> interface that is passing all of this backup traffic, as well as the
> speed of the drive that is doing all of the writing during this
> transaction?  Is the 18MB/s how fast the data is being copied over the
> network, or is it some metric within the host system?

The switched gigabit network is plenty fast. The bottleneck is
reading from the RAID1 while it is under contention.

Here are measurements from transferring a chunk of data from
/dev/zero, a single unmounted drive, and RAID1. Measurements are
reported by dd_rescue and reflect how fast data is moving over the
network. I was careful to use smart command line options with
dd_rescue, avoid contaminating Linux's disk cache, and make sure
results were repeatable.

MB/s   Operation
====   ============================
72.0   dd-rescue /dev/zero - | netcat
61.8   dd-rescue [unmounted single drive]  - | netcat
18.8   dd-rescue md0 - | netcat

dd_rescue v1.11 options:
 -B 4096 -q  -l -d -s 11G -m 200M -S 0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
  2005-10-25  4:07 ` dean gaudet
@ 2005-10-25  8:35 ` Norman Schmidt
  2005-10-25 17:51   ` John Stoffel
  2005-10-25 18:04 ` John Stoffel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Norman Schmidt @ 2005-10-25  8:35 UTC (permalink / raw)
  To: linux-raid

Jeff Breidenbach schrieb:
> Ok... thanks everyone!

Something from me:

What you should be able to do with software raid1 is the following:
Stop the raid, mount both underlying devices instead of the raid device, 
but of course READ ONLY. Both contain the complete data and filesystem, 
and in addition to that the md superblock at the end. Both should be 
identical copies of that.
Thus, you do not have to resync afterwards. You then can back up the one 
disk while serving the web server from the other. When you are done, 
unmount, assemble the raid, mount it and go on.

Eg. /dev/md1 consists of /dev/hda5 and /dev/hdc5 and has a working fs on it.

/dev/md1 is mounted on /data.

umount /dev/md1
mdadm -S /dev/md1

mount -r /dev/hda5 /data
mount -r /dev/hdc5 /backup

do your backup

umount /dev/hda5
umount /dev/hdc5

mdadm -A /dev/md1

mount /dev/md1 /data


Please could somebody double-check or confirm this first?

Hope this helps, Norman.

-- 
Norman Schmidt          Institut fuer Physikal. u. Theoret. Chemie
Dipl.-Chem. Univ.       Friedrich-Alexander-Universitaet
schmidt@naa.net         Erlangen-Nuernberg
+49 9131 852 7321       IT-Systembetreuer Physikalische Chemie

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  8:35 ` Norman Schmidt
@ 2005-10-25 17:51   ` John Stoffel
  2005-10-25 19:20     ` Norman Schmidt
  0 siblings, 1 reply; 25+ messages in thread
From: John Stoffel @ 2005-10-25 17:51 UTC (permalink / raw)
  To: schmidt; +Cc: linux-raid


Norman> What you should be able to do with software raid1 is the
Norman> following: Stop the raid, mount both underlying devices
Norman> instead of the raid device, but of course READ ONLY. Both
Norman> contain the complete data and filesystem, and in addition to
Norman> that the md superblock at the end. Both should be identical
Norman> copies of that.  Thus, you do not have to resync
Norman> afterwards. You then can backup the one disk while serving the
Norman> web server from the other. When you are done, unmount,
Norman> assemble the raid, mount it and go on.

Umm... so what you're proposing would mean that he would have to stop
his application, make sure nothing is accessing that volume, and then
restart the application?  

Somehow I don't see that happening.  

Why do you feel that you have to stop and umount the volume for the
splitting off of a read-only mirror pair for backups?  

John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
  2005-10-25  4:07 ` dean gaudet
  2005-10-25  8:35 ` Norman Schmidt
@ 2005-10-25 18:04 ` John Stoffel
  2005-10-25 18:13 ` Paul Clements
  2005-10-25 20:05 ` Bill Davidsen
  4 siblings, 0 replies; 25+ messages in thread
From: John Stoffel @ 2005-10-25 18:04 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid


Jeff> Ok... thanks everyone!

You're welcome!  :]

Jeff> John, I'm using 4KB blocks in reiserfs with tail packing. All
Jeff> sorts of other details are in the dmesg output [1]. I agree
Jeff> seeks are a major bottleneck, and I like your suggestion about
Jeff> putting extra spindles in. 

I think this will be the easiest and cheapest solution for you, esp if
you do stripes over mirror pairs.  RAID10.  Or whichever it is, I
can't seem to keep it straight no matter how I think it through.

In your case, getting two HPT302 controllers and two pairs of 100gb
disks, and using them to stripe and mirror across the six disks you
have, would almost certainly improve your performance.
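
Something like a stripe over two mirror pairs, i.e. (all device names made
up, untested):

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sde1 /dev/sdf1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdg1 /dev/sdh1
mdadm --create /dev/md3 --level=0 --chunk=64 --raid-devices=2 /dev/md1 /dev/md2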

Jeff> Master-slave won't work because the data is continuously
Jeff> changing. 

Inotify might be the solution here, just have it watch the filesystem,
and when you see a file change you push it to the slaves.  

Jeff>  I'm not going to argue about the optimality of millions of tiny
Jeff> files (go talk to Hans Reiser about that one!)  but I definitely
Jeff> don't foresee major application redesign any time soon.

I don't think that hashing files down a level is a major re-design,
esp if your application is well done already and has just a few
functions where files are opened/written/read/closed.  That's where
you'd put your intelligence.  

But again, I'd strongly suggest that you get more controllers and
smaller disks and spread your load over as many spindles as possible.
That should give you a big performance boost.  And of course, make
sure that you have the PCI bus bandwidth to handle it.  Stay away from
RAID5 in your situation, it would just destroy you even more.

I don't recall if Reiserfs has a shrink option, but if it does you might
be able to shrink the existing filesystem.

Heck, if you could get another pair of controllers, or a four channel
SATA controller with 4x120gb drives, you could make up a 4 way stripe,
probably 64k stripe width (or whatever the average size of your files
is), so that a single file read should hit just one disk.  Then you
can add that in as a third mirror to your existing pair of disks.
Then you'd simply pull one of the disks, toss in another controller
and four more disks and repeat the re-mirror.

Yes, you'd have more disks, more controllers, but better performance.  

You might also be able to cache more files in RAM, instead of on disk,
and if you can, laying them out in access order would be better as
well.  It all depends on how much time/money/effort you can spend
here.

And yes, I'll harp on this, but changing how your application stores
files could be a very cheap and simple way to get more performance.
Just hashing the files into smaller buckets will be a big win.  

And you might think about moving to ext3 with the dir_index option set
as well.  And a smaller block size to the filesystem, though you'll
need to bump up the inode counts by quite a bit.  
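
For example, something like this gives 1KB blocks, one inode per 2KB, and
hashed directories (device name made up, untested):

mkfs.ext3 -b 1024 -i 2048 -O dir_index /dev/sdX1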

You do have a test system you can play with right?  One that has a
good subset of your data?  Just playing with filesystems and config
options might also give you a performance boost.  

To minimize downtime, throw money in the form of disks and controllers
at the system.  Shutdown, add them in, reboot.  Then you can add in a
new half to the existing MD array, pull out some old disks, add in
others, all while serving data.  Yes, you will take a performance hit
while the mirroring happens, but you won't have downtime.

Good luck,
John

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
                   ` (2 preceding siblings ...)
  2005-10-25 18:04 ` John Stoffel
@ 2005-10-25 18:13 ` Paul Clements
  2005-10-25 20:05 ` Bill Davidsen
  4 siblings, 0 replies; 25+ messages in thread
From: Paul Clements @ 2005-10-25 18:13 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Jeff Breidenbach wrote:

> your suggestion about kernel 2.6.13 and intent logging and
> having mdadm pull a disk sounds like a winner. I'm going to try it
> if the software looks mature enough. Should I be scared?

There have been a couple bug fixes in the bitmap stuff since 2.6.13 was 
released, but it's stable. You'll need mdadm 2.x as well.

--
Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25 17:51   ` John Stoffel
@ 2005-10-25 19:20     ` Norman Schmidt
  0 siblings, 0 replies; 25+ messages in thread
From: Norman Schmidt @ 2005-10-25 19:20 UTC (permalink / raw)
  To: linux-raid

John Stoffel schrieb:
> Norman> What you should be able to do with software raid1 is the
> Norman> following: Stop the raid, mount both underlying devices
> Norman> instead of the raid device, but of course READ ONLY. Both
> Norman> contain the complete data and filesystem, and in addition to
> Norman> that the md superblock at the end. Both should be identical
> Norman> copies of that.  Thus, you do not have to resync
> Norman> afterwards. You then can backup the one disk while serving the
> Norman> web server from the other. When you are done, unmount,
> Norman> assemble the raid, mount it and go on.
> 
> Umm... so what you're proposing would mean that he would have to stop
> his application, make sure nothing is accessing that volume, and then
> restart the application?  
> 
> Somehow I don't see that happening.  
> 
> Why do you feel that you have to stop and umount the volume for the
> splitting off of a read-only mirror pair for backups?  
> 
> John

All this "pulling a disk" sounded to me like a resync, and with that 
much data and slow access this resync would take ages.

And as far as I remember, there was a suggestion with stopping or 
interrupting applications anyway, was there not? Unfortunately, I have 
thrown most of the posts away.

If you keep the volume running, ok - it should still work from the data 
integrity point of view. But if you keep using the data from a 
functional raid1 (/dev/md1 on /data), you cannot disentangle the access, 
since the raid1 will read from both mirrors. But what was desired was a 
faster backup and being able to continue serving the data.

If one disk (without the read-balancing of raid1) would provide enough 
treoughput for the webserver, one could even do the following:

mount /dev/md1 /writeaccess
mount -r /dev/hda5 /data
mount -r /dev/hdc5 /backup

So write via /writeaccess (mirrored to both drives) and read from one 
for the webserver (statically) and from the other one for the backup.

Could that work?

Norman.
-- 
Norman Schmidt          Institut fuer Physikal. u. Theoret. Chemie
Dipl.-Chem. Univ.       Friedrich-Alexander-Universitaet
schmidt@naa.net         Erlangen-Nuernberg
+49 9131 852 7321       IT-Systembetreuer Physikalische Chemie

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
                   ` (3 preceding siblings ...)
  2005-10-25 18:13 ` Paul Clements
@ 2005-10-25 20:05 ` Bill Davidsen
  2005-10-26 18:15   ` Dan Stromberg
  4 siblings, 1 reply; 25+ messages in thread
From: Bill Davidsen @ 2005-10-25 20:05 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> 
> Ok... thanks everyone!
> 
> David, you said you are worried about failure scenarios
> involved with RAID splitting. Could you please elaborate?
> My biggest concern is I'm going to accidentally trigger
> a rebuild no matter what I try but maybe you have something
> more serious in mind.
> 
> Brad, your suggestion about kernel 2.6.13 and intent logging and
> having mdadm pull a disk sounds like a winner. I'm going to try it
> if the software looks mature enough. Should I be scared?
> 
> Dean, the comment about "write-mostly" is confusing to me.  Let's say
> I somehow marked one of the component drives write-mostly to quiet it
> down. How do I get at it? Linux will not let me mount the component
> partition if md0 is also mounted. Do you think "write-mostly" or
> "write-behind" are likely enough to be magic bullets that I should
> learn all about them?
> 
> Bill, thanks for the suggestion to use nbd instead of netcat.  Netcat
> is solid software and very fast, but does feel a little like duct
> tape. You also suggested putting a third drive (local or nbd remote)
> temporarily in the RAID1. What does that buy versus the current
> practice of using dd_rescue to copy the data off md0? I'm not
> imagining any I/O savings over the current approach.

As a paranoid admin, you (a) reduce read-only time, (b) never have
unmirrored data running, and (c) it does let you send from an unused drive 
(or you can get a real hot swap bay and put the mirrored drive in the 
safe).

I have one other thought, if you want to just stream this to another drive 
and can stand long r/o mounts (or play with intent stuff, carefully), that 
is to:
 open a socket to something on the other machine which is going to write a 
single BIG data file, or to a partition of the same size.

 open the partition as a file (open /dev/md0)

 use the sendfile() system call to blast the "file" to the socket without 
using user memory.

Based on my vast experience with one test program, this should work ;-) It 
will be limited by how fast you can write it at the other end, I suspect.

====

I still think you should be able to do incrementals. If that's Reiser3 
you're using, it may not be performing as well as ext3 with hashing would, 
but I lack the time to test that properly.

> 
> John, I'm using 4KB blocks in reiserfs with tail packing. All sorts of
> other details are in the dmesg output [1]. I agree seeks are a major
> bottleneck, and I like your suggestion about putting extra spindles
> in. Master-slave won't work because the data is continuously changing.
> I'm not going to argue about the optimality of millions of tiny files
> (go talk to Hans Reiser about that one!) but I definitely don't foresee
> major application redesign any time soon.
> 
> Most importantly, thanks for the encouragement. So far it sounds like
> there might be some ninja magic required, but I'm becoming
> increasingly optimistic that it will be - somehow - possible to manage
> disk contention in order to dramatically raise backup speeds.
> 
> Cheers,
> Jeff
> 
> [1] http://www.jab.org/dmesg
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
Doing interesting things with little computers since 1979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 20:28 Jeff Breidenbach
  2005-10-24 20:58 ` John Stoffel
@ 2005-10-25 22:18 ` David Greaves
  1 sibling, 0 replies; 25+ messages in thread
From: David Greaves @ 2005-10-25 22:18 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Jeff Breidenbach wrote:

>Individual directories contain up to about 150,000 files. If I run ls
>-U on all directories, it completes in a reasonable amount of time (I
>forget how much, but I think it is well under an hour). Reiserfs is
>supposed to be good at this sort of thing. If I were to stat each
>file, then it's a different story.
>  
>
Have you tried / can you try XFS?

IIRC it is very good indeed at this kind of scenario (used to be an
*excellent* nntp server fs)

David


^ permalink raw reply	[flat|nested] 25+ messages in thread

* split RAID1 during backups?
@ 2005-10-26  8:17 Jeff Breidenbach
  2005-10-27 13:23 ` Bill Davidsen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-26  8:17 UTC (permalink / raw)
  To: linux-raid


Norman> What you should be able to do with software raid1 is the
Norman> following: Stop the raid, mount both underlying devices
Norman> instead of the raid device, but of course READ ONLY. Both
Norman> contain the complete data and filesystem, and in addition to
Norman> that the md superblock at the end. Both should be identical
Norman> copies of that.  Thus, you do not have to resync
Norman> afterwards. You then can backup the one disk while serving the
Norman> web server from the other. When you are done, unmount,
Norman> assemble the raid, mount it and go on.

I tried both variants of Norman's suggestion on a test machine and
they worked great. Shutting down and restarting md0 did not trigger a
rebuild. Perfect! And I could mount component partitions
read-only at any time. However on the production machine the
component partitions refused to mount, claiming to be "already
mounted". Despite the fact that the component drives do not show up
anywhere in lsof or mtab. When I saw this, I got nervous and did not
even try stopping md0 on the production machine.

# mount -o ro /dev/sdc1 backup
mount: /dev/sdc1 already mounted or backup busy

The two machines hardly match. The test machine has a 2.4.27 kernel
and JBOD drives hanging off a 3ware 7xxx controller. The production
machine has a 2.6.12 kernel and Intel SATA controllers. Both machines
have mdadm 1.9.0, and the discrepancy in behavior seems weird to
me. Any insights?

Paul> There have been a couple bug fixes in the bitmap stuff since
Paul> 2.6.13 was released, but it's stable. You'll need mdadm 2.x as
Paul> well.

It turns out Debian has not yet packaged 2.6.13 even in the unstable
branch. I will wait for this to happen before trying out the whizzy
intent-logging and write-mostly suggestions. I'm brave, but not THAT
brave. 

Dean> i didn't realise you were using reiserfs... i'd suggest
Dean> disabling tail packing... but then i've never used reiser, and
Dean> i've only ever seen reports of tail packing having serious
Dean> performance impact.

Done, thanks.

Bill> If you want to try something "which used to work" see nbd,
Bill> export 500GB from another machine, add the network block device
Bill> to the mirror, let it sync, break the mirror. Haven't tried
Bill> since 2.4.19 or so.

Wow, nbd (network block device) sounds really useful. I wonder if it
is a good way to provide more spindles to a hungry webserver.  Plus
they had a major release yesterday. While I've been focusing on
managing disk contention, if there's an easy way to reduce it, that's
definitely fair game.

Some of the other suggestions I'm going to hold off on. For example,
sendfile() doesn't really address the bottleneck of disk contention.
I'm also not so anxious to switch filesystems. That's a two week
endeavor that doesn't really address the contention issue. And it's
also a little hard for me to imagine that someone is going to beat the
pants off of reiserfs, especially since reiserfs was specifically
designed to deal with lots of small files efficiently. Finally, I'm
not going to focus on incremental backups if there's any prayer of
getting a 500GB full backup in 3 hours.  Full backups provide a LOT of
warm fuzzies.

Again, thank you all very much.

-Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-25 20:05 ` Bill Davidsen
@ 2005-10-26 18:15   ` Dan Stromberg
  0 siblings, 0 replies; 25+ messages in thread
From: Dan Stromberg @ 2005-10-26 18:15 UTC (permalink / raw)
  To: davidsen; +Cc: Jeff Breidenbach, linux-raid, strombrg


I noticed some discussion of both netcat and NBD in this thread, and
these are both things I've investigated at various points during the
last 365.25 days.  :)


On the subject of netcat:

        You might be interested in my pnetcat program at
        http://dcs.nac.uci.edu/~strombrg/pnetcat.html .  It's similar to
        netcat, but perhaps behaves itself better with regard to EOF's,
        can read/write to/from sockets/stdin/stdout arbitrarily (EG,
        both to and from two different sockets, or to and from stdout
        and stdin), like netcat it can also do TCP and UDP (but UDP is
        without any form of error detection - not sure where this is in
        netcat), is at least in some cases faster due to use of large
        blocksizes and adjustable TCP windows, and it's written in a
        VHLL making it ultra-easy to read and maintain.


On the subject of NBD:

        ENBD is supposed to be a funded, improved version of NBD.  Last
        I checked, there was a version of NBD in the Linus kernels, but
        ENBD was still distributed as patches.  GNBD, which I haven't
        examined in isolation (although I have done some GFS overtop of
        GNBD, which didn't work out that well - GFS was pitched to us by
        IBM but didn't work out - it wouldn't do what we needed even on
        the GFS -roadmap-) may or may not be more reliable than ENBD.
        
        My NBD/ENBD notes: http://dcs.nac.uci.edu/~strombrg/nbd.html
        
        If someone experiments with GNBD for getting > 2T filesystems,
        or better still, > 16T filesystems, -without- GFS overtop of it
        (EG, putting an XFS or JFS or ReiserFS overtop of a large device
        tacked together with MD or LVM or LVM2 or EVMS) I'd very much
        like to hear about that!



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-24 10:57 Jeff Breidenbach
                   ` (3 preceding siblings ...)
  2005-10-25  4:30 ` Thomas Garner
@ 2005-10-27  0:04 ` Christopher Smith
  4 siblings, 0 replies; 25+ messages in thread
From: Christopher Smith @ 2005-10-27  0:04 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

Jeff Breidenbach wrote:
> Hi all,
> 

[...]

> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.

Isn't this just the sort of scenario LVM snapshots are meant for?  It 
might not help with the duration aspect, but it will mean your services 
aren't down/non-redundant for the entire time it takes to back up.

> Everything on this system seems bottlenecked by disk I/O. That
> includes the rate web pages are served as well as the backup process
> described above. While I'm always hungry for performance tips, faster
> backups are the current focus. For those interested in gory details
> such as drive types, NCQ settings, kernel version and whatnot, I
> dumped a copy of dmesg output here: http://www.jab.org/dmesg

I think this might be one of those situations where SCSI really does 
offer a significant performance advantage, although if you're actually 
filling up that 500G, it'll be quite a bit more expensive.

See if you can get hold of a reasonably sized array using SCSI drives 
and do some comparative benchmarking.
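
A crude comparison could be as simple as this (device and paths are
illustrative, and it assumes no whitespace in file names):

# raw sequential read off the array
dd if=/dev/sdX1 of=/dev/null bs=1M count=4096
# seek-heavy pass over a sample of the small files
time find /mnt/test -type f | head -n 100000 | xargs cat > /dev/null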

You might also want to experiment with different filesystems, although 
that may not be feasible...

CS

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
  2005-10-26  8:17 Jeff Breidenbach
@ 2005-10-27 13:23 ` Bill Davidsen
  0 siblings, 0 replies; 25+ messages in thread
From: Bill Davidsen @ 2005-10-27 13:23 UTC (permalink / raw)
  To: Jeff Breidenbach; +Cc: linux-raid

On Wed, 26 Oct 2005, Jeff Breidenbach wrote:

> 
> Norman> What you should be able to do with software raid1 is the
> Norman> following: Stop the raid, mount both underlying devices
> Norman> instead of the raid device, but of course READ ONLY. Both
> Norman> contain the complete data and filesystem, and in addition to
> Norman> that the md superblock at the end. Both should be identical
> Norman> copies of that.  Thus, you do not have to resync
> Norman> afterwards. You then can backup the one disk while serving the
> Norman> web server from the other. When you are done, unmount,
> Norman> assemble the raid, mount it and go on.
> 
> I tried both variants of Norman's suggestion on a test machine and
> they worked great. Shutting down and restarting md0 did not trigger a
> rebuild. Perfect! And I could mount component partitions
> read-only at any time. However on the production machine the
> component partitions refused to mount, claiming to be "already
> mounted". Despite the fact that the component drives do not show up
> anywhere in lsof or mtab. When I saw this, I got nervous and did not
> even try stopping md0 on the production machine.

As long as md0 is running I suspect the partition will be marked as in 
use. So you have to stop it. If the 2.4 kernel didn't detect that, I would 
call it a bug.

> 
> # mount -o ro /dev/sdc1 backup
> mount: /dev/sdc1 already mounted or backup busy
> 
> The two machines hardly match. The test machine has a 2.4.27 kernel
> and JBOD drives hanging off a 3ware 7xxx controller. The production
> machine has a 2.6.12 kernel and Intel SATA controllers. Both machines
> have mdadm 1.9.0, and the discrepancy in behavior seems weird to
> me. Any insights?

	[___snip___]
> 
> Bill> If you want to try something "which used to work" see nbd,
> Bill> export 500GB from another machine, add the network block device
> Bill> to the mirror, let it sync, break the mirror. Haven't tried
> Bill> since 2.4.19 or so.
> 
> Wow, nbd (network block device) sounds really useful. I wonder if it
> is a good way to provide more spindles to a hungry webserver.  Plus
> they had a major release yesterday. While I've been focusing on
> managing disk contention, if there's an easy way to reduce it, that's
> definitely fair game.
> 
> Some of the other suggestions I'm going to hold off on. For example,
> sendfile() doesn't really address the bottleneck of disk contention.

sendfile() bypasses the copy to user buffer, which in turn will bypass 
copy to system buffers, which eliminates contention for buffer space. Use 
vmstat to check, if you have a lot of system time and lots of space in 
buffers of various kinds, there's a good possibility that the problem is 
there.

> I'm also not so anxious to switch filesystems. That's a two week
> endeavor that doesn't really address the contention issue. And it's
> also a little hard for me to imagine that someone is going to beat the
> pants off of reiserfs, especially since reiserfs was specifically
> designed to deal with lots of small files efficiently. Finally, I'm
> not going to focus on incremental backups if there's any prayer of
> getting a 500GB full backup in 3 hours.  Full backups provide a LOT of
> warm fuzzies.
> 
> Again, thank you all very much.


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
Doing interesting things with little computers since 1979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: split RAID1 during backups?
@ 2005-10-30  3:06 Jeff Breidenbach
  0 siblings, 0 replies; 25+ messages in thread
From: Jeff Breidenbach @ 2005-10-30  3:06 UTC (permalink / raw)
  To: linux-raid


Thanks to good advice from many people, here are my findings and
conclusions.

(1) Splitting the RAID works. I have now implemented this technique on
the production system and am making a backup right now.

(2) NBD is cool, works well on Debian, and is very convenient. A
couple experiments suggest it may be slower compared to netcat for
blasting data across the network. By slower, I mean less throughput to
the point where the network can become a bottleneck. I don't have
conclusive data yet, so take with a large grain of salt. I am using
netcat for now.

(3) End-to-end throughput is not quite as high as I'd hoped.  At this
point it appears the limiting factor is the speed (throughput) of the
destination disks. During earlier testing, I had been dumping bits to
/dev/null on the destination machine instead of the actual destination
partition. No worries, this can be addressed.

(4) I'll play with fancier options like "write-mostly" when Debian
releases a 2.6.13 kernel, and when I'm convinced that I'm not going to
accidentally introduce slower disks into the RAID and bog the entire
system down for writes.

>sendfile() bypasses the copy to user buffer, which in turn will bypass
>copy to system buffers, which eliminates contention for buffer space. Use
>vmstat to check, if you have a lot of system time and lots of space in
>buffers of various kinds, there's a good possibility that the problem is
>there.

I use the -d option in dd_rescue, which invokes O_DIRECT and therefore
doesn't trash Linux's disk cache. Unfortunately, because of the very
random access patterns of the web server, cache misses are extremely
common anyway.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2005-10-30  3:06 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-25  3:37 split RAID1 during backups? Jeff Breidenbach
2005-10-25  4:07 ` dean gaudet
2005-10-25  8:35 ` Norman Schmidt
2005-10-25 17:51   ` John Stoffel
2005-10-25 19:20     ` Norman Schmidt
2005-10-25 18:04 ` John Stoffel
2005-10-25 18:13 ` Paul Clements
2005-10-25 20:05 ` Bill Davidsen
2005-10-26 18:15   ` Dan Stromberg
  -- strict thread matches above, loose matches on Subject: below --
2005-10-30  3:06 Jeff Breidenbach
2005-10-26  8:17 Jeff Breidenbach
2005-10-27 13:23 ` Bill Davidsen
2005-10-25  5:01 Jeff Breidenbach
2005-10-24 20:28 Jeff Breidenbach
2005-10-24 20:58 ` John Stoffel
2005-10-25 22:18 ` David Greaves
2005-10-24 12:07 Jeff Breidenbach
2005-10-24 13:26 ` Paul Clements
2005-10-24 18:55 ` dean gaudet
2005-10-24 10:57 Jeff Breidenbach
2005-10-24 11:22 ` Jurriaan Kalkman
2005-10-24 11:37 ` Brad Campbell
2005-10-24 19:05 ` Bill Davidsen
2005-10-25  4:30 ` Thomas Garner
2005-10-27  0:04 ` Christopher Smith
