* [linux-lvm] hard-lock seems to have caused serious LVM problems
@ 2001-01-15 11:59 dmeyer
  0 siblings, 0 replies; 7+ messages in thread
From: dmeyer @ 2001-01-15 11:59 UTC
  To: linux-lvm

Last night, my machine (running Linux-2.4.0, LVM-0.9, and the 0.9
utilities) locked up hard.  On reboot, vgscan can only find one of my
VGs.  vgscan results in:

# vgscan
vgscan -- reading all physical volumes (this may take a while...)
vgscan -- found active volume group "main_vg"
vgscan -- found inactive volume group "misc_vg"
vgscan -- ERROR "vg_read_with_pv_and_lv(): allocated LE of LV" can't get data of volume group "misc_vg" from physical volume(s)
vgscan -- ERROR "vg_read_with_pv_and_lv(): allocated LE of LV"
creating "/etc/lvmtab" and "/etc/lvmtab.d"

The LV on main_vg works just fine, but I can't get at anything in
misc_vg.

vgcfgrestore isn't helping.  I get:
# vgcfgrestore -v -f ./lvmconf/misc_vg.conf -n misc_vg -l
vgcfgrestore -- locking logical volume manager
vgcfgrestore -- restoring volume group "misc_vg" from "./lvmconf/misc_vg.conf"
vgcfgrestore -- checking existence of "./lvmconf/misc_vg.conf"
vgcfgrestore -- reading volume group data for "misc_vg" from "./lvmconf/misc_vg.conf"
vgcfgrestore -- ERROR: different structure size stored in "./lvmconf/misc_vg.conf" than expected in file vg_cfgrestore.c [line 120]
vgcfgrestore -- ERROR "vg_cfgrestore(): read" restoring volume group "misc_vg"

Hacking in some extra debugging code, it looks like the first
VGCFG_READ in vgcfgrestore() is expecting a vg_t to be 2484 bytes, but
the actual struct on-disk is only 2248 bytes.
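
To make the arithmetic concrete: the two sizes differ by 236 bytes.
Below is a minimal sketch of the kind of size comparison that produces
the error above.  The helper names are hypothetical and are not taken
from the LVM sources; only the two byte counts come from the debugging
output described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of LVM: signed difference between the
 * record size the tools expect and the size actually stored. */
static long record_size_mismatch(size_t stored, size_t expected)
{
    return (long)expected - (long)stored;
}

/* Report a mismatch on stderr.  Returns 0 if the sizes agree, -1 if not. */
static int check_record_size(const char *what, size_t stored, size_t expected)
{
    assert(what != NULL);
    long diff = record_size_mismatch(stored, expected);
    if (diff == 0)
        return 0;
    fprintf(stderr, "%s: stored %zu bytes, expected %zu (off by %ld)\n",
            what, stored, expected, diff);
    return -1;
}
```

Calling check_record_size("vg_t", 2248, 2484) returns -1 and reports a
difference of 236 bytes.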

All other diagnostic output is going to be too long for the list, so
please look at http://www.dmeyer.net/~dmeyer/lvm for files I reference
below. 

As far as I can tell (which isn't very far, really), the PVs
themselves are OK - I can run pvdata and get nothing that looks (to
me, at least) horribly suspicious.  I put the results from pvdata -a
for all 5 PVs in pvdata.<partition>.

vgscan -d seg faults.  However, by adding

   if (uuidstr[0] != '/') {
     return -1;
   }

to the beginning of lvm_check_uuid in lvm_uuid.c, I managed to keep
vgscan from dying on me.  Anyway, the results from vgscan -d are also
on my web page.  There are actually 4 versions:  0.9 and 0.9.1-beta1,
and both patched (i.e. with the code above) and unpatched.

I've also dd'd the first 32k of each of the 5 partitions, in case
that might help.  /etc/lvm* from the previous night's backups is
there as well.

If anyone can suggest a course of action, I'd really appreciate it.

     Dave


* Re: [linux-lvm] hard-lock seems to have caused serious LVM problems
@ 2001-01-15 15:00 dmeyer
  2001-01-15 16:25 ` Heinz J. Mauelshagen
  2001-01-15 17:09 ` Andreas Dilger
  0 siblings, 2 replies; 7+ messages in thread
From: dmeyer @ 2001-01-15 15:00 UTC
  To: linux-lvm

Thanks to help from Jan Niehusmann, I have more information, now.
After applying this patch:

> The following patch from Jan (with a minor correction "against" segfaults :-)
> corrected the problem for me:
------------------------------------------------------------------------------
*** pv_read_all_pv_of_vg.c.orig	Mon Nov 20 03:47:20 2000
--- pv_read_all_pv_of_vg.c.patched	Sat Jan 13 18:31:00 2001
***************
*** 101,117 ****
        for ( p = 0; pv_tmp[p] != NULL; p++) {
           if ( strncmp ( pv_tmp[p]->vg_name, vg_name, NAME_LEN) == 0) {
              pv_this_sav = pv_this;
              if ( ( pv_this = realloc ( pv_this,
!                                        ( np + 2) * sizeof ( pv_t*))) == NULL) {
                 fprintf ( stderr, "realloc error in %s [line %d]\n",
                                   __FILE__, __LINE__);
                 ret = -LVM_EPV_READ_ALL_PV_OF_VG_MALLOC;
                 if ( pv_this_sav != NULL) free ( pv_this_sav);
                 goto pv_read_all_pv_of_vg_end;
              }
!             pv_this[np] = pv_tmp[p];
!             pv_this[np+1] = NULL;
!             np++;
           }
        }
  
--- 101,117 ----
        for ( p = 0; pv_tmp[p] != NULL; p++) {
           if ( strncmp ( pv_tmp[p]->vg_name, vg_name, NAME_LEN) == 0) {
              pv_this_sav = pv_this;
+ 	    if ( np < pv_tmp[p]->pv_number) np = pv_tmp[p]->pv_number;
              if ( ( pv_this = realloc ( pv_this,
!                                        ( np + 1) * sizeof ( pv_t*))) == NULL) {
                 fprintf ( stderr, "realloc error in %s [line %d]\n",
                                   __FILE__, __LINE__);
                 ret = -LVM_EPV_READ_ALL_PV_OF_VG_MALLOC;
                 if ( pv_this_sav != NULL) free ( pv_this_sav);
                 goto pv_read_all_pv_of_vg_end;
              }
! 	    pv_this[pv_tmp[p]->pv_number-1] = pv_tmp[p];
!             pv_this[np] = NULL;
           }
        }
  
vgscan stopped giving me an error.  Unfortunately, it stopped
mentioning my second VG (named misc_vg) at all :-(.

misc_vg has 5 PVs, /dev/{hdb1,hdb5,hdb6,sda1,sda2}.  It turns out,
vgscan was ignoring misc_vg because it didn't think all the PVs were
online.  It was reading the uuid list from /dev/sda1, and /dev/sda1
only had 4 PVs in its uuid list.  Commenting out the block of code in
pv_read_all_pv_of_vg.c that starts with "if (uuids > 0) {" fixes the
problem, though I kind of doubt that it's the right fix.
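
The idea in the patched loop above, storing each PV at index
pv_number - 1 instead of appending it, can be sketched in isolation.
This is only an illustration under assumptions: pv_sketch_t and
place_by_number are hypothetical stand-ins, not LVM code, and unlike
the patch this version sets newly allocated slots to NULL so that gaps
left by missing PVs are well defined.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for LVM's pv_t; only the field used here (assumption). */
typedef struct {
    unsigned pv_number;   /* 1-based position of the PV within its VG */
} pv_sketch_t;

/*
 * Store pv at index pv_number - 1 of a NULL-terminated pointer array,
 * growing the array when needed.  *count is the number of usable slots,
 * not counting the terminator.  Returns 0 on success, -1 on error, in
 * which case the array is left unchanged.
 */
static int place_by_number(pv_sketch_t ***arr, size_t *count, pv_sketch_t *pv)
{
    assert(arr != NULL && count != NULL && pv != NULL);
    if (pv->pv_number == 0)
        return -1;                       /* PV numbers start at 1 */

    size_t need = pv->pv_number;
    if (*arr == NULL || need > *count) {
        size_t new_count = need > *count ? need : *count;
        size_t start = (*arr == NULL) ? 0 : *count;
        pv_sketch_t **tmp = realloc(*arr, (new_count + 1) * sizeof *tmp);
        if (tmp == NULL)
            return -1;                   /* old array is still valid */
        /* Clear every new slot plus the terminator. */
        for (size_t i = start; i <= new_count; i++)
            tmp[i] = NULL;
        *arr = tmp;
        *count = new_count;
    }
    (*arr)[pv->pv_number - 1] = pv;
    return 0;
}
```

Placing PV number 3 into an empty array yields slots 0 and 1 set to
NULL, slot 2 pointing at the PV, and a NULL terminator at index 3.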

Does the following seem right?
# pvdata -U /dev/hdb1 /dev/hdb5 /dev/hdb6 /dev/sda1 /dev/sda2
--- List of physical volume UUIDs ---

000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
001: --- EMPTY ---
002: --- EMPTY ---
003: --- EMPTY ---
004: --- EMPTY ---
--- List of physical volume UUIDs ---

000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
002: --- EMPTY ---
003: --- EMPTY ---
004: --- EMPTY ---
--- List of physical volume UUIDs ---

000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
003: --- EMPTY ---
004: --- EMPTY ---
--- List of physical volume UUIDs ---

000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
004: --- EMPTY ---
--- List of physical volume UUIDs ---

000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
004: fdNejQ2DAp9A8KN0UrePxscwoY8vqVSu

Each PV only has the uuids from the PVs before it.  Should each PV
have the complete list of uuids in its VG (in which case there's
something screwy with my PVs)?  Or did pv_read_all_pv_of_vg somehow
pick the wrong PV to read the uuid list from?  Or something else?

     Dave
     


* [linux-lvm] hard-lock seems to have caused serious LVM problems
@ 2001-01-15 16:03 dmeyer
  2001-01-15 17:22 ` Heinz J. Mauelshagen
  0 siblings, 1 reply; 7+ messages in thread
From: dmeyer @ 2001-01-15 16:03 UTC
  To: linux-lvm

In article <20010115162523.A31019@srv.t-online.de> you write:
> Every PV is assumed to hold all the PV UUIDs of all PVs in the volume group
> the PV belongs to.

The PVs and VGs were created with 0.8final - I migrated to 0.9 tools
when I upgraded to linux-2.4.0.  Could it be related to that somehow?

Interestingly enough, something seems to have fixed my PVs in the
meantime - now they all have every UUID.  Either activating the VG or
running vgck must have done it.

-- 
David M. Meyer
dmeyer@dmeyer.net


* Re: [linux-lvm] hard-lock seems to have caused serious LVM problems
  2001-01-15 15:00 dmeyer
@ 2001-01-15 16:25 ` Heinz J. Mauelshagen
  2001-01-15 17:09 ` Andreas Dilger
  1 sibling, 0 replies; 7+ messages in thread
From: Heinz J. Mauelshagen @ 2001-01-15 16:25 UTC
  To: linux-lvm

On Mon, Jan 15, 2001 at 10:00:48AM -0500, dmeyer@dmeyer.net wrote:
> Thanks to help from Jan Niehusmann, I have more information, now.
> After applying this patch:
> 
> > The following patch from Jan (with a minor correction "against" segfaults :-)
> > corrected the problem for me:
> ------------------------------------------------------------------------------
> *** pv_read_all_pv_of_vg.c.orig	Mon Nov 20 03:47:20 2000
> --- pv_read_all_pv_of_vg.c.patched	Sat Jan 13 18:31:00 2001
> ***************
> *** 101,117 ****
>         for ( p = 0; pv_tmp[p] != NULL; p++) {
>            if ( strncmp ( pv_tmp[p]->vg_name, vg_name, NAME_LEN) == 0) {
>               pv_this_sav = pv_this;
>               if ( ( pv_this = realloc ( pv_this,
> !                                        ( np + 2) * sizeof ( pv_t*))) == NULL) {
>                  fprintf ( stderr, "realloc error in %s [line %d]\n",
>                                    __FILE__, __LINE__);
>                  ret = -LVM_EPV_READ_ALL_PV_OF_VG_MALLOC;
>                  if ( pv_this_sav != NULL) free ( pv_this_sav);
>                  goto pv_read_all_pv_of_vg_end;
>               }
> !             pv_this[np] = pv_tmp[p];
> !             pv_this[np+1] = NULL;
> !             np++;
>            }
>         }
>   
> --- 101,117 ----
>         for ( p = 0; pv_tmp[p] != NULL; p++) {
>            if ( strncmp ( pv_tmp[p]->vg_name, vg_name, NAME_LEN) == 0) {
>               pv_this_sav = pv_this;
> + 	    if ( np < pv_tmp[p]->pv_number) np = pv_tmp[p]->pv_number;
>               if ( ( pv_this = realloc ( pv_this,
> !                                        ( np + 1) * sizeof ( pv_t*))) == NULL) {
>                  fprintf ( stderr, "realloc error in %s [line %d]\n",
>                                    __FILE__, __LINE__);
>                  ret = -LVM_EPV_READ_ALL_PV_OF_VG_MALLOC;
>                  if ( pv_this_sav != NULL) free ( pv_this_sav);
>                  goto pv_read_all_pv_of_vg_end;
>               }
> ! 	    pv_this[pv_tmp[p]->pv_number-1] = pv_tmp[p];
> !             pv_this[np] = NULL;
>            }
>         }
>   
> vgscan stopped giving me an error.  Unfortunately, it stopped
> mentioning my second VG (named misc_vg) at all :-(.
> 
> misc_vg has 5 PVs, /dev/{hdb1,hdb5,hdb6,sda1,sda2}.  It turns out,
> vgscan was ignoring misc_vg because it didn't think all the PVs were
> online.  It was reading the uuid list from /dev/sda1, and /dev/sda1
> only had 4 PVs in its uuid list.  Commenting out the block of code in
> pv_read_all_pv_of_vg.c that starts with "if (uuids > 0) {" fixes the
> problem, though I kind of doubt that it's the right fix.
> 
> Does the following seem right?

No.
Every PV is assumed to hold all the PV UUIDs of all PVs in the volume group
the PV belongs to.

I am going to check whether there's any obvious reason that the array is screwed up...

> # pvdata -U /dev/hdb1 /dev/hdb5 /dev/hdb6 /dev/sda1 /dev/sda2
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: --- EMPTY ---
> 002: --- EMPTY ---
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: --- EMPTY ---
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
> 004: fdNejQ2DAp9A8KN0UrePxscwoY8vqVSu
> 
> Each PV only has the uuids from the PVs before it.  Should each PV
> have the complete list of uuids in its VG (in which case there's
> something screwy with my PVs)?  Or did pv_read_all_pv_of_vg somehow
> pick the wrong PV to read the uuid list from?  Or something else?
> 
>      Dave
>      

-- 

Regards,
Heinz    -- The LVM Guy --

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Sistina Software Inc.
Senior Consultant/Developer                       Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen@Sistina.com                           +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


* Re: [linux-lvm] hard-lock seems to have caused serious LVM problems
  2001-01-15 15:00 dmeyer
  2001-01-15 16:25 ` Heinz J. Mauelshagen
@ 2001-01-15 17:09 ` Andreas Dilger
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Dilger @ 2001-01-15 17:09 UTC
  To: linux-lvm

Dave writes:
> Does the following seem right?

> # pvdata -U /dev/hdb1 /dev/hdb5 /dev/hdb6 /dev/sda1 /dev/sda2
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: --- EMPTY ---
> 002: --- EMPTY ---
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: --- EMPTY ---
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: --- EMPTY ---
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
> 004: --- EMPTY ---
> --- List of physical volume UUIDs ---
> 
> 000: pXMXm8FIECSb7mGPEIX3qVgQFbt21sKd
> 001: efjtqFYhTIqyLO2cBURu5zN7rLJsG4dF
> 002: 931SNJ6F66g3n3qA9Nts3r4jqe4TOHW8
> 003: m2wuLpJ9AQmzXnbk4sCuQu8hjAev7pax
> 004: fdNejQ2DAp9A8KN0UrePxscwoY8vqVSu
> 
> Each PV only has the uuids from the PVs before it.  Should each PV
> have the complete list of uuids in its VG (in which case there's
> something screwy with my PVs)?  Or did pv_read_all_pv_of_vg somehow
> pick the wrong PV to read the uuid list from?  Or something else?

No, this looks wrong.  Each PV should have the UUID from all PVs in the VG.
What you can do in the meantime (if you are confident with such things) is
to copy the UUIDs from /dev/sda2 (or whichever has all 5 UUIDs) to all of
the other disks.  You can do this with "od -Ad -a /dev/sda2" to find where
the UUIDs are stored in the VG (note that each UUID has 32 ASCII characters
and 96 NULs), and "dd" to copy but you obviously need to be careful.
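
As a rough illustration of that layout (32 ASCII characters plus 96
NULs, so 128 bytes per entry), the sketch below walks a buffer that is
assumed to hold only the UUID list, already read from the PV.  Where
the list starts on disk is not stated in this thread and is not
assumed here; the function names are hypothetical, not LVM library
calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UUID_CHARS 32                       /* ASCII characters per UUID */
#define UUID_PAD   96                       /* NUL bytes after each UUID */
#define UUID_SLOT  (UUID_CHARS + UUID_PAD)  /* 128 bytes per list entry */

/* Count the leading entries in buf that hold a UUID, stopping at the
 * first empty entry (first byte NUL) or at the end of the buffer. */
static size_t count_uuid_slots(const unsigned char *buf, size_t buf_len)
{
    assert(buf != NULL);
    size_t n = 0;
    while ((n + 1) * UUID_SLOT <= buf_len && buf[n * UUID_SLOT] != '\0')
        n++;
    return n;
}

/* Copy the UUID in entry idx into out as a NUL-terminated string.
 * Returns 0 on success, -1 if idx is out of range or the entry is empty. */
static int get_uuid(const unsigned char *buf, size_t buf_len, size_t idx,
                    char out[UUID_CHARS + 1])
{
    assert(buf != NULL && out != NULL);
    if ((idx + 1) * UUID_SLOT > buf_len || buf[idx * UUID_SLOT] == '\0')
        return -1;
    memcpy(out, buf + idx * UUID_SLOT, UUID_CHARS);
    out[UUID_CHARS] = '\0';
    return 0;
}
```

Comparing count_uuid_slots() across the five PVs would show which ones
carry a short list before anything is copied with dd.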

It may also work to change pv_read_all_pv_of_vg() to call pv_read_uuidlist()
for each PV and find the one with the longest list of UUIDs (appropriately
saving uuids_max and pv_uuid_list_max).  However, I'm not sure if that is a
permanent solution - it should really be the same list on all PVs.
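
The selection step could look roughly like the sketch below.  It does
not call pv_read_uuidlist(), whose signature is not shown in this
thread; it assumes the per-PV UUID counts have already been gathered
into a hypothetical pv_uuid_count_t array and only picks the largest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-PV summary; not an LVM type. */
typedef struct {
    const char *dev;   /* device name, e.g. "/dev/sda2" */
    size_t uuids;      /* number of non-empty UUID entries on that PV */
} pv_uuid_count_t;

/* Return the index of the PV with the longest UUID list (the first one
 * wins on a tie), or -1 if the array is empty. */
static int longest_uuid_list(const pv_uuid_count_t *pvs, size_t n)
{
    assert(pvs != NULL || n == 0);
    if (n == 0)
        return -1;
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (pvs[i].uuids > pvs[best].uuids)
            best = i;
    }
    return (int)best;
}
```

With the counts from Dave's pvdata output (1, 2, 3, 4 and 5 entries
for hdb1, hdb5, hdb6, sda1 and sda2), this selects /dev/sda2.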

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: [linux-lvm] hard-lock seems to have caused serious LVM problems
  2001-01-15 16:03 [linux-lvm] hard-lock seems to have caused serious LVM problems dmeyer
@ 2001-01-15 17:22 ` Heinz J. Mauelshagen
  0 siblings, 0 replies; 7+ messages in thread
From: Heinz J. Mauelshagen @ 2001-01-15 17:22 UTC
  To: linux-lvm

On Mon, Jan 15, 2001 at 11:03:58AM -0500, dmeyer@dmeyer.net wrote:
> In article <20010115162523.A31019@srv.t-online.de> you write:
> > Every PV is assumed to hold all the PV UUIDs of all PVs in the volume group
> > the PV belongs to.
> 
> The PVs and VGs were created with 0.8final - I migrated to 0.9 tools
> when I upgraded to linux-2.4.0.  Could it be related to that somehow?

No, it shouldn't.
0.9 created PV UUIDs automagically in case there were none before.

> 
> Interestingly enough, something seems to have fixed my PVs in the
> meantime - now they all have every UUID.  Either activating the VG or
> running vgck must have done it.

No, it doesn't happen while activating VGs or checking them.

The following tools call vg_write_with_pv_and_lv() in the library
which in turn calls pv_write_uuidlist() to do the UUID list update:

lvcreate
lvextend
lvreduce
lvremove
lvrename
vgcfgrestore
vgchange (in case you change the max LV property)
vgcreate
vgexport
vgextend
vgimport
vgmerge
vgreduce
vgrename
vgscan
vgsplit

You must have run one of those tools in order to update the UUID list.
It is still rather interesting what caused the mess you reported in the first
place.

> 
> -- 
> David M. Meyer
> dmeyer@dmeyer.net

-- 

Regards,
Heinz    -- The LVM Guy --



* [linux-lvm] hard-lock seems to have caused serious LVM problems
@ 2001-01-15 20:33 dmeyer
  0 siblings, 0 replies; 7+ messages in thread
From: dmeyer @ 2001-01-15 20:33 UTC
  To: linux-lvm

In article <20010115172200.A31217@srv.t-online.de> you write:
> On Mon, Jan 15, 2001 at 11:03:58AM -0500, dmeyer@dmeyer.net wrote:
> > Interestingly enough, something seems to have fixed my PVs in the
> > meantime - now they all have every UUID.  Either activating the VG or
> > running vgck must have done it.
> 
> No, it doesn't happen while activating VGs or checking them.
> 
> The following tools call vg_write_with_pv_and_lv() in the library
> which in turn calls pv_write_uuidlist() to do the UUID list update:

Maybe calling vgscan on the messed-up VG, once I managed to get it
online, is what did it, then.  I didn't call any of the others.

> You must have run one of those tools in order to update the UUID list.
> It is still rather interesting what caused the mess you reported in the first
> place.

I wonder if it wasn't somehow related to calling vgscan -d and getting
seg faults in the middle of it.  Or, more likely, running vgscan in
the debugger trying to extract more information about what was going
on.  

In any case, I hope fervently never to have the opportunity to find
out again :-).

-- 
David M. Meyer
dmeyer@dmeyer.net
