* [linux-lvm] LVM2 scalability within volume group
From: Dave Olien @ 2004-03-17 17:36 UTC
  To: linux-lvm


Greetings,

I'm doing some evaluation of LVM2 on large systems, with lots of disk
devices.  I'm currently using LVM2.2.00.08, along with device-mapper.1.00.07.
I plan to eventually upgrade to the latest CVS trees for LVM2.  I recall
earlier mail saying that it's faster than what I'm using.

The first thing I noticed is that creating a VG with lots of PVs is a bad
idea.  Creating a volume group with one PV takes about 12
seconds elapsed time.  Adding a new PV initially takes about 5 seconds.
But this grows to 15 seconds when adding the 40th PV, 25 seconds adding
the 60th PV.  Adding the 200th PV takes about 6 minutes.

Activating this volume group (with 200 PVs) takes about 48 minutes.

I could spread the PVs out among the available 99 VGs.
The individual VGs seem to be independent of each other, so having
large numbers of VGs with a few PVs each performs OK.

But still, would it make sense for adding PVs to a VG to scale better?
I'm guessing that the current lack of scaling comes from the way redundant
metadata is stored.  Are redundant copies of metadata being updated on
every PV within the VG whenever a new PV is added?  Even so, why should
200 reads/writes take so long?

Having redundant copies of metadata is a good thing.  But how about
allowing the administrator to set a limit on the degree of redundancy when
a VG is created?  You could limit a VG to having, for example, 10 redundant
copies.  Then adding more PVs beyond the 10th would encounter less overhead.

Am I missing something important?

Thanks!
Dave Olien

* Re: [linux-lvm] LVM2 scalability within volume group
From: Alasdair G Kergon @ 2004-03-17 18:00 UTC
  To: LVM general discussion and development

On Wed, Mar 17, 2004 at 09:36:38AM -0800, Dave Olien wrote:
> Having redundant copies of metadata is a good thing.  But how about
> allowing the administrator to set a limit on the degree of redundancy when
> a VG is created?  You could limit a VG to having, for example, 10 redundant
> copies.  Then adding more PVs beyond the 10th would encounter less overhead.
> Am I missing something important?

There'll be a VG-level option for this eventually; until then, use the
pvcreate options to say how many copies of metadata you want on each PV.
e.g. pvcreate --metadatacopies 0
[Careful use of the --restorefile option lets you reduce it on a PV already in the VG.]

For complex VGs you should increase the space set aside for metadata too:
  --metadatasize

See the pvcreate man page.
 
Alasdair
-- 
agk@redhat.com

* Re: [linux-lvm] LVM2 scalability within volume group
From: Alasdair G Kergon @ 2004-03-17 18:12 UTC
  To: LVM general discussion and development

On Wed, Mar 17, 2004 at 09:36:38AM -0800, Dave Olien wrote:
> I plan to eventually upgrade to the latest CVS trees for LVM2.  I recall
> earlier mail that it's faster than what I'm using.
 
Other speed-ups: 
  Add a filter to lvm.conf so it doesn't waste time looking at non-PVs.
  Invoke commands using the lvm shell (configure --enable-readline) so the 
tools retain the maximum amount of internal state between commands.
[I'll fix this to avoid the warning when piping]

Alasdair
-- 
agk@redhat.com

* Re: [linux-lvm] LVM2 scalability within volume group
From: Dave Olien @ 2004-03-17 18:15 UTC
  To: LVM general discussion and development


Thanks!  The pvcreate man page I have installed now (from LVM2.00.08)
doesn't mention this option.  I'll move to the cvs version and give it
a try.

Dave

* Re: [linux-lvm] LVM2 scalability within volume group
From: Dave Olien @ 2004-03-22 22:41 UTC
  To: LVM general discussion and development; +Cc: agk


I finally tried reducing metadata redundancy, and re-ran my experiment
with the single volume group containing 200 physical volumes.  I constructed
the new volume group with redundant metadata on only 6 PVs, instead of
the default copy on every physical volume.  This helped a lot.  Here is a
comparison of the new configuration with the old configuration:

        - Time to add a PV to a VG with a large number of PVs
          (elapsed time, seconds):

                PV number       New      Old
                1st               5        3
                40th              6       13
                60th              7       24
                200th            15      426

        - Time to create a logical volume within that volume group:

                New             Old
                30 seconds      14 minutes

        - Time to activate the volume group:

                New             Old
                29 seconds      45 minutes

While this is a big improvement, 15 seconds still seems a long time
for adding that 200th PV.  Likewise 29 seconds to activate the VG
is much better.  But can these be made faster?

I did an strace on some of these commands.  It seems that every
command opens about 480 file descriptors.  I looked at the
/etc/lvm/.cache file. It looks like every device listed there is opened
for every command.  I wasn't able to reduce this .cache file very much,
because even though I was using only 200 devices in one volume group,
I still wanted to put the other 200 devices into other volume groups.  

Can these user-level commands be made smarter in this regard?

Is this something that using the lvm(8) shell would help?  On a large
system, re-activating lots of large volume groups could take a while.
Could the startup script for LVM benefit from running an lvm(8) script to
do the startup work?


* Re: [linux-lvm] LVM2 scalability within volume group
From: Alasdair G Kergon @ 2004-03-26 22:34 UTC
  To: LVM general discussion and development

On Mon, Mar 22, 2004 at 02:41:45PM -0800, Dave Olien wrote:
> While this is a big improvement, 15 seconds still seems a long time
> for adding that 200th PV.  Likewise 29 seconds to activate the VG
> is much better.  But can these be made faster?

Yes, I've a list of performance enhancements waiting to be made - they
aren't top priority yet though.
 
> Can these user-level commands be made smarter in this regard?

Yes.  
  (a) Additional internal state can be written to the cache file.
      [The index of which PVs are in which VGs; the index of which
       UUIDs were found on which devices.]
  (b) Some disk reads are still duplicated and can be cached safely.

> Is this something that using the lvm(8) shell would help?  

Yes, this is equivalent to point (a).
Either pipe the commands through it, or try linking against
the library I checked into CVS today.  
(configure --enable-cmdlib;  brief documentation in lvm2cmd.h)

Quick example prog below.

Alasdair



#include <stdio.h>

#include "lvm2cmd.h"

/* All output gets passed to this function line-by-line */
void test_log_fn(int level, const char *file, int line, const char *format)
{
        /* Extract and process output here instead of printing it */

        /* Only print messages logged at level 4; drop everything else */
        if (level != 4)
                return;

        printf("%s\n", format);
}

int main(int argc, char **argv)
{
        void *handle;
        int r;

        /* Route all library output through test_log_fn */
        lvm2_log_fn(test_log_fn);

        /* One handle is reused for every command run below */
        handle = lvm2_init();

        lvm2_log_level(handle, 1);
        r = lvm2_run(handle, "vgs --noheadings vg1");

        /* More commands here */

        lvm2_exit(handle);

        return r;
}
 

* Re: [linux-lvm] LVM2 scalability within volume group
From: Dave Olien @ 2004-03-30 23:19 UTC
  To: agk; +Cc: linux-lvm


Thanks for the feedback.  I'll be experimenting more with this over the
next few weeks.  I'd love to try changes as they come available.

Dave Olien
