All of lore.kernel.org
 help / color / mirror / Atom feed
* Internal error: Pool read_vg crc mismatch only when running in test environment
@ 2013-06-20 23:18 Tony Asleson
  2013-06-21 13:53 ` Zdenek Kabelac
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Asleson @ 2013-06-20 23:18 UTC (permalink / raw)
  To: lvm-devel

Writing some new unit test cases for my latest liblvm patch set and at
the moment I am running into a case where I can run the unit test case
against real disk and it works, but if I run it in the test environment
with loop back devices I am getting an abort with:

"Internal error: Pool read_vg crc mismatch."

Any ideas why this error isn't occurring on both?

Thanks,
Tony

Snippet from output.


VERBOSE= \
cluster_LVM_TEST_LOCKING=3 \
lvmetad_LVM_TEST_LVMETAD=1 \
./lib/harness normal:api/pytest.sh \
	      cluster:api/pytest.sh \
	      lvmetad:api/pytest.sh
Running normal:api/pytest.sh ...                                     FAILED.
-- FAILED normal:api/pytest.sh ------------------------------------

#If you change this change the unit test case too.
aux prepare_pvs 3
#pytest.sh:17+ aux prepare_pvs 3
## preparing loop device....ok (/dev/loop0)
## preparing 3 devices...ok
  Physical volume "@TESTDIR@/dev/mapper/@PREFIX at pv1" successfully created
  Physical volume "@TESTDIR@/dev/mapper/@PREFIX at pv2" successfully created
  Physical volume "@TESTDIR@/dev/mapper/@PREFIX at pv3" successfully created

#Locate the python binding library to use.
python_lib=`find $abs_top_builddir -name lvm.so`
find $abs_top_builddir -name lvm.so
##pytest.sh:20+ find /home/tasleson/lvm2 -name lvm.so
#pytest.sh:20+
python_lib=/home/tasleson/lvm2/python/build/lib.linux-x86_64-2.7/lvm.so
if [ "$python_lib" != "" ]
then
	export PYTHONPATH=`dirname $python_lib`:$PYTHONPATH
	python_lvm_unit.py -v
else
	echo "Unable to test python bindings as library not available"
fi
#pytest.sh:21+ '['
/home/tasleson/lvm2/python/build/lib.linux-x86_64-2.7/lvm.so '!=' '' ']'
dirname $python_lib
##pytest.sh:23+ dirname
/home/tasleson/lvm2/python/build/lib.linux-x86_64-2.7/lvm.so
#pytest.sh:23+ export
PYTHONPATH=/home/tasleson/lvm2/python/build/lib.linux-x86_64-2.7:
#pytest.sh:23+
PYTHONPATH=/home/tasleson/lvm2/python/build/lib.linux-x86_64-2.7:
#pytest.sh:24+ python_lvm_unit.py -v
testConfigFindBool (__main__.TestLvm) ... ok
testConfigReload (__main__.TestLvm) ... ok
testConfig_override (__main__.TestLvm) ... ok
testDupeLvCreate (__main__.TestLvm) ... ok
testLvActiveInactive (__main__.TestLvm) ... ok
testLvMisc (__main__.TestLvm) ... FAIL
testLvProperty (__main__.TestLvm) ... ok
testLvRename (__main__.TestLvm) ... ok
testLvResize (__main__.TestLvm) ... ok
testLvSeg (__main__.TestLvm) ... ok
testLvSize (__main__.TestLvm) ... ok
testLvSuspend (__main__.TestLvm) ... ok
testLvTags (__main__.TestLvm) ... ok
testOpenClose (__main__.TestLvm) ... ok
testPercentToFloat (__main__.TestLvm) ... ok
testPvGetters (__main__.TestLvm) ...   Internal error: Pool read_vg crc
mismatch.
api/pytest.sh: line 27: 14608 Aborted                 (core dumped)
python_lvm_unit.py -v
set +vx; STACKTRACE; set -vx
##pytest.sh:24+ set +vx
## - api/pytest.sh:24
## 0 STACKTRACE() called from lib/utils:24
[New LWP 14608]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `python
/home/tasleson/lvm2/test/api/python_lvm_unit.py -v'.
Program terminated with signal 6, Aborted.
#0  0x0000003055035ba5 in raise () from /lib64/libc.so.6
#0  0x0000003055035ba5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x0000003055037358 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fb6d2249b3b in print_log (level=2, file=0x7fb6d1ffb073
"mm/pool.c", line=178, dm_errno_or_class=-1, format=0x7fb6d1ffb158
"Internal error: Pool %s crc mismatch.") at log/log.c:375
        ap = {{gp_offset = 48, fp_offset = 48, overflow_arg_area =
0x7fff895beef0, reg_save_area = 0x7fff895bee30}}
        buf = " \331\374\000\000\000\000\000\240l\aU0\000\000\000
\331\374\000\000\000\000\000\353\265\006U0\000\000\000\n\212\024\001\000\000\000\000\305\232$\322\266\177\000\000\063\370\000\000\000\000\000\000\210M\000\001\000\000\000\000\353\267+\322\266\177\000\000uG\aU\b\000\000\000\235\266+\322\266\177\000\000`\001\000\000\a\000\000\000
\\\003U0\000\000\000\002", '\000' <repeats 28 times>, "811", '\000'
<repeats 16 times>"\360,
\030\bW0\000\000\000\b:;W0\000\000\000B\246D\324\266\177\000\000x\230\v\001\000\000\000\000\240>8W0\000\000\000\300\000\\\211\377\177\000\000\207\r\005U0",
'\000' <repeats 11 times>,
"0\000\000\000\060\000\000\000\200\353[\211\377\177\000\000\300\352[\211\377\177\000\000@;*\322\266\177\000\000\000\000\000\024\000\000\000\000\365\240*\322\266\177\000\000+\003\000\000\000\000\000\000\357"...
        locn = "\000nternal error: Pool read_vg crc
mismatch.\000\000\a\000\000\000
\331\374\000\000\000\000\000\353\265\006U0\000\000\000PI\000\001\000\000\000\000\305\232$\322\266\177\000\000?q\030\001\000\000\000\000Fx\027\001\000\000\000\000\330\240*\322\266\177\000\000\000\000\000\000@\000\000\000\365\240*\322\266\177\000\000\217\000\000\000\a",
'\000' <repeats 21 times>, "\027\001", '\000' <repeats 14 times>"\377,
\377", '\000' <repeats 12 times>, "\002", '\000' <repeats 15
times>"\260,
\332[\211\377\177\000\000\000\000\000\000\000\000\000\000\377\377\377\377\000\000\000\000\000\360\025\001\000\000\000\000\377\377\377\377\377\177\000\000\000\004;U0",
'\000' <repeats 12 times>, "\004;U0", '\000' <repeats 12 times>"\360"...
        bufused = 0
        n = 42
        message = 0x7fff895bd9d0 ""
        trformat = 0x7fb6d1ffb158 "Internal error: Pool %s crc mismatch."
        newbuf = 0x8965c3e <Address 0x8965c3e out of bounds>
        use_stderr = 0
        log_once = 0
        fatal_internal_error = 1
        msglen = 42
#3  0x00007fb6d1feb838 in dm_pool_unlock (p=0x1159460, crc=1) at
mm/pool.c:178
No locals.
#4  0x00007fb6d221db10 in lvmcache_vginfo_holders_dec_and_test_for_zero
(vginfo=0x1082010) at cache/lvmcache.c:821
No locals.
#5  0x00007fb6d22804cd in release_vg (vg=0x1173150) at metadata/vg.c:87
No locals.
#6  0x00007fb6d221c39f in _free_cached_vgmetadata (vginfo=0x1082010) at
cache/lvmcache.c:145
No locals.
#7  0x00007fb6d221c77d in _drop_metadata (vgname=0x7fff895c0090
"thin_vg", drop_precommitted=0) at cache/lvmcache.c:258
        vginfo = 0x1082010
        info = 0x1082020
#8  0x00007fb6d221c897 in lvmcache_drop_metadata (vgname=0x7fff895c0090
"thin_vg", drop_precommitted=0) at cache/lvmcache.c:295
No locals.
#9  0x00007fb6d22a4366 in _file_lock_resource (cmd=0x1044b60,
resource=0x7fff895c0090 "thin_vg", flags=33, lv=0x0) at
locking/file_locking.c:267
        lockfile = '\000' <repeats 784 times>"\220,
\371[\211\377\177\000\000\200\371[\211\377\177\000\000M\241+\322\266\177\000\000\020\373[\211\377\177\000\000J\241+\322\266\177\000\000\063k\004U0",
'\000' <repeats 11 times>"\356,
\223\004U0\000\000\000\320\371[\211\377\177\000\000\300\371[\211\377\177\000\000M\241+\322\266\177\000\000P\373[\211\377\177\000\000J\241+\322\266\177\000\000\063k\004U0",
'\000' <repeats 11 times>"\356,
\223\004U0\000\000\000\002\000\000\000\000\000\000\000\001", '\000'
<repeats 19 times>"\377,
\377\377\377\000\000\000\000\000\000\000\000>\217\aU0", '\000' <repeats
15 times>, "\n\000\000\000\316\b+\322\266\177", '\000' <repeats 18
times>"\377,
\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\356\223\004U0",
'\000' <repeats 15 times>,
"\n\000\000\000\377\377\377\377\377\377\377\377\000\000\000\000\000\000\000\000\207\371[\211\377\177",
'\000' <repeats 26 times>"\372"...
        origin_only = 0
        revert = 0
#10 0x00007fb6d2247dcc in _lock_vol (cmd=0x1044b60,
resource=0x7fff895c0090 "thin_vg", flags=33, lv_op=LV_NOOP, lv=0x0) at
locking/locking.c:382
        lck_type = 1
        lck_scope = 0
        ret = 0
        __PRETTY_FUNCTION__ = "_lock_vol"
#11 0x00007fb6d22480cb in lock_vol (cmd=0x1044b60, vol=0xfbbfb4
"thin_vg", flags=33, lv=0x0) at locking/locking.c:460
        resource = "thin_vg", '\000' <repeats 250 times>
        lv_op = LV_NOOP
        lck_type = 1
#12 0x00007fb6d226a98c in _vg_lock_and_read (cmd=0x1044b60,
vg_name=0xfbbfb4 "thin_vg", vgid=0x0, lock_flags=33, status_flags=0,
misc_flags=0) at metadata/metadata.c:4060
        vg = 0x0
        consistent = 0
        consistent_in = 32767
        failure = 0
        already_locked = 0
#13 0x00007fb6d226ad89 in vg_read (cmd=0x1044b60, vg_name=0xfbbfb4
"thin_vg", vgid=0x0, flags=0) at metadata/metadata.c:4175
        status = 0
        lock_flags = 33
#14 0x00007fb6d22167cb in lvm_vg_open (libh=0x1044b60, vgname=0xfbbfb4
"thin_vg", mode=0x7fb6d452e4dc "r", flags=0) at lvm_vg.c:197
        internal_flags = 0
        vg = 0xfff260
#15 0x00007fb6d24fadc0 in liblvm_lvm_vg_open (self=0x0, args=('thin_vg',
'r')) at liblvm_python.c:465
        vgname = 0xfbbfb4 "thin_vg"
        mode = 0x7fb6d452e4dc "r"
        vgobj = 0x7fb6d44e9150
#16 0x00000030570dd281 in call_function (oparg=<optimized out>,
pp_stack=0x7fff895c0488) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4098
        callargs = ('thin_vg', 'r')
        flags = <optimized out>
        tstate = <optimized out>
        func = <built-in function vgOpen>
        w = <optimized out>
        na = <optimized out>
        nk = <optimized out>
        n = <optimized out>
        pfunc = 0x115af20
        x = <optimized out>
#17 PyEval_EvalFrameEx (f=f at entry=Frame 0x115ad70, for file
/home/tasleson/lvm2/test/api/python_lvm_unit.py, line 97, in
_get_pv_test
(self=<TestLvm(_resultForDoCleanups=<TextTestResult(_original_stdout=<file
at remote 0x7fb6d45411e0>, dots=False, skipped=[], _mirrorOutput=False,
stream=<_WritelnDecorator(stream=<file@remote 0x7fb6d4541270>) at
remote 0x1006dd0>, testsRun=16, buffer=None, _original_stderr=<file at
remote 0x7fb6d4541270>, showAll=True, _stdout_buffer=None,
_stderr_buffer=None, _moduleSetUpFailed=False, expectedFailures=[],
errors=[], descriptions=True, _previousTestClass=<type@remote
0x10a5e90>, unexpectedSuccesses=[],
failures=[(<TestLvm(_resultForDoCleanups=<...>,
_type_equality_funcs={<type@remote 0x3057384e20>: 'assertSetEqual',
<type@remote 0x3057380420>: 'assertListEqual', <type@remote
0x305738e820>: 'assertMultiLineEqual', <type@remote 0x3057389760>:
'assertTupleEqual', <type@remote 0x3057384fc0>: 'assertSetEqual',
<type@remote 0x30573829e0>: 'assertDictEqual'}, _testMethodDoc=None,
_testMethodNa...(truncated), throwflag=throwflag at entry=0) at
/usr/src/debug/Python-2.7.3/Python/ceval.c:2740
        sp = 0x115af28
        stack_pointer = <optimized out>
        next_instr = 0x7fb6d4462964 "}\004"
        opcode = <optimized out>
        oparg = <optimized out>
        why = 1
        err = 0
        x = <optimized out>
        v = <optimized out>
        w = <optimized out>
        u = <optimized out>
        t = <optimized out>
        stream = 0x0
        fastlocals = 0x115aee8
        freevars = 0x115af18
        retval = 0x0
        tstate = <optimized out>
        co = 0x7fb6d444eeb0
        instr_ub = -1
        instr_lb = 0
        instr_prev = -1
        first_instr = 0x7fb6d446293c "t"
        names = ('lvm', 'listVgNames', 'vgOpen', 'listPVs', 'len', 'None')
        consts = (None, 0)



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Internal error: Pool read_vg crc mismatch only when running in test environment
  2013-06-20 23:18 Internal error: Pool read_vg crc mismatch only when running in test environment Tony Asleson
@ 2013-06-21 13:53 ` Zdenek Kabelac
  2013-06-25 23:28   ` Tony Asleson
  0 siblings, 1 reply; 6+ messages in thread
From: Zdenek Kabelac @ 2013-06-21 13:53 UTC (permalink / raw)
  To: lvm-devel

Dne 21.6.2013 01:18, Tony Asleson napsal(a):
> Writing some new unit test cases for my latest liblvm patch set and at
> the moment I am running into a case where I can run the unit test case
> against real disk and it works, but if I run it in the test environment
> with loop back devices I am getting an abort with:
>
> "Internal error: Pool read_vg crc mismatch."
>
> Any ideas why this error isn't occurring on both?
>

This happens - if you have requested 'read-only'  VG struct,
and you have modifed something in this vgmem pool
(either writing to struct, or just using vgmem pool for allocation)

Zdenek



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Internal error: Pool read_vg crc mismatch only when running in test environment
  2013-06-21 13:53 ` Zdenek Kabelac
@ 2013-06-25 23:28   ` Tony Asleson
  2013-06-26  7:49     ` Zdenek Kabelac
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Asleson @ 2013-06-25 23:28 UTC (permalink / raw)
  To: lvm-devel

On 06/21/2013 08:53 AM, Zdenek Kabelac wrote:
> Dne 21.6.2013 01:18, Tony Asleson napsal(a):
>> Writing some new unit test cases for my latest liblvm patch set and at
>> the moment I am running into a case where I can run the unit test case
>> against real disk and it works, but if I run it in the test environment
>> with loop back devices I am getting an abort with:
>>
>> "Internal error: Pool read_vg crc mismatch."
>>
>> Any ideas why this error isn't occurring on both?
>>
> 
> This happens - if you have requested 'read-only'  VG struct,
> and you have modifed something in this vgmem pool
> (either writing to struct, or just using vgmem pool for allocation)

My original inquiry was for ideas why I see this on loop back devices
and not actual devices.  This response doesn't seem to match what I am
seeing.

The stack trace shows that we are getting this error during an
lvm_vg_open.  I can re-create the error regardless if I open the vg
struct as read-only or read-write.

>From my initial debug it appears that if the vginfo->vg_use_count > 1
(in this case 3) we pass 1 as the second parameter to dm_pool_unlock
which is a crc check of the pool which it finds to be different.  At
this point it would seem I am exacerbating some type of caching bug or
that somewhere along the path I am inadvertently changing the contents
of the vg struct pointer with my latest patches.

Regards,
Tony



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Internal error: Pool read_vg crc mismatch only when running in test environment
  2013-06-25 23:28   ` Tony Asleson
@ 2013-06-26  7:49     ` Zdenek Kabelac
  2013-06-26 22:49       ` Tony Asleson
  0 siblings, 1 reply; 6+ messages in thread
From: Zdenek Kabelac @ 2013-06-26  7:49 UTC (permalink / raw)
  To: lvm-devel

Dne 26.6.2013 01:28, Tony Asleson napsal(a):
> On 06/21/2013 08:53 AM, Zdenek Kabelac wrote:
>> Dne 21.6.2013 01:18, Tony Asleson napsal(a):
>>> Writing some new unit test cases for my latest liblvm patch set and at
>>> the moment I am running into a case where I can run the unit test case
>>> against real disk and it works, but if I run it in the test environment
>>> with loop back devices I am getting an abort with:
>>>
>>> "Internal error: Pool read_vg crc mismatch."
>>>
>>> Any ideas why this error isn't occurring on both?
>>>
>>
>> This happens - if you have requested 'read-only'  VG struct,
>> and you have modifed something in this vgmem pool
>> (either writing to struct, or just using vgmem pool for allocation)
>
> My original inquiry was for ideas why I see this on loop back devices
> and not actual devices.  This response doesn't seem to match what I am
> seeing.

It's hard to give advice if I do not see the actual code.


>
> The stack trace shows that we are getting this error during an
> lvm_vg_open.  I can re-create the error regardless if I open the vg
> struct as read-only or read-write.
>
>  From my initial debug it appears that if the vginfo->vg_use_count > 1
> (in this case 3) we pass 1 as the second parameter to dm_pool_unlock
> which is a crc check of the pool which it finds to be different.  At
> this point it would seem I am exacerbating some type of caching bug or
> that somewhere along the path I am inadvertently changing the contents
> of the vg struct pointer with my latest patches.

There is internal debug support for this kind of problems which mprotects
vg structure, so any write access to locked vg structure crashes the 
application and you may look at stack trace.

But it needs some hand modification of make.tmpl file (no configure option)
Uncomment #DEFS += -DDEBUG_ENFORCE_POOL_LOCKING  and rebuild and retest with
unlimited coredump size.

Zdenek



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Internal error: Pool read_vg crc mismatch only when running in test environment
  2013-06-26  7:49     ` Zdenek Kabelac
@ 2013-06-26 22:49       ` Tony Asleson
  2013-06-28  9:14         ` Zdenek Kabelac
  0 siblings, 1 reply; 6+ messages in thread
From: Tony Asleson @ 2013-06-26 22:49 UTC (permalink / raw)
  To: lvm-devel

On 06/26/2013 02:49 AM, Zdenek Kabelac wrote:
> Dne 26.6.2013 01:28, Tony Asleson napsal(a):
>> On 06/21/2013 08:53 AM, Zdenek Kabelac wrote:
>>> Dne 21.6.2013 01:18, Tony Asleson napsal(a):
>>>> Writing some new unit test cases for my latest liblvm patch set and at
>>>> the moment I am running into a case where I can run the unit test case
>>>> against real disk and it works, but if I run it in the test environment
>>>> with loop back devices I am getting an abort with:
>>>>
>>>> "Internal error: Pool read_vg crc mismatch."
>>>>
>>>> Any ideas why this error isn't occurring on both?
>>>>
>>>
>>> This happens - if you have requested 'read-only'  VG struct,
>>> and you have modifed something in this vgmem pool
>>> (either writing to struct, or just using vgmem pool for allocation)
>>
>> My original inquiry was for ideas why I see this on loop back devices
>> and not actual devices.  This response doesn't seem to match what I am
>> seeing.
> 
> It's hard to give advice if I do not see the actual code.

Patch set was posted a while back.  Specifically the issue I am running
into has to do with the functionality to list all PVs (including orphan)

Specific patch is:
https://www.redhat.com/archives/lvm-devel/2013-May/msg00036.html

>> The stack trace shows that we are getting this error during an
>> lvm_vg_open.  I can re-create the error regardless if I open the vg
>> struct as read-only or read-write.
>>
>>  From my initial debug it appears that if the vginfo->vg_use_count > 1
>> (in this case 3) we pass 1 as the second parameter to dm_pool_unlock
>> which is a crc check of the pool which it finds to be different.  At
>> this point it would seem I am exacerbating some type of caching bug or
>> that somewhere along the path I am inadvertently changing the contents
>> of the vg struct pointer with my latest patches.
> 
> There is internal debug support for this kind of problems which mprotects
> vg structure, so any write access to locked vg structure crashes the
> application and you may look at stack trace.
> 
> But it needs some hand modification of make.tmpl file (no configure option)
> Uncomment #DEFS += -DDEBUG_ENFORCE_POOL_LOCKING  and rebuild and retest
> with
> unlimited coredump size.

Thanks for the tip, this helped quite a bit!  I modified the code in the
above referenced patch to retain the vg pointer when retrieving the list
of PVs.  The liblvm library uses the vg->vgmem pool to allocate strings
to be returned to the user when getting information pertaining to a PV.
 The problem is that the vg gets cached with a crc.  We then allocate
memory from the vgmem pool and later when we go to clear out the cache
entry we fail the crc check.  When I enable the
DEBUG_ENFORCE_POOL_LOCKING we fail after we retrieve a vg from the cache
and then I try to allocate something from the vgmem pool.

Your comment about opening the vg "r" vs. "w" still seems incorrect to
me, but if it is indeed correct then we have a problem with the library
that precedes any of my changes.  We use the vg->vgmem for many things
and if this argument is indeed true then all those allocations will be
causing the vg struct to change, thus causing crc errors if the user is
allowed to open the vg struct as read-only.  In my code review and
testing it appears that we aren't hitting this because when we call into
vg_open we don't have a vgid, so we fail early in lvmcache_get_vg and
thus we never add them to the cache and thus we never call dm_pool_lock
on them.

If you could point out the bit of code that actually determines that we
mprotect memory based on the user doing an vg open read only vs. read
write that would be most helpful.  I'm not seeing it at the moment and I
would like to understand this better.

Currently I am thinking about just backing out the change to retrieve
the pv list and use the cmd->mem for allocations instead of the
vg->vgmem.  Previously I brought up just putting such things on the heap
and letting the users of the library free them, but that was dismissed
because existing library users would then have memory leaks.

Thanks,
Tony



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Internal error: Pool read_vg crc mismatch only when running in test environment
  2013-06-26 22:49       ` Tony Asleson
@ 2013-06-28  9:14         ` Zdenek Kabelac
  0 siblings, 0 replies; 6+ messages in thread
From: Zdenek Kabelac @ 2013-06-28  9:14 UTC (permalink / raw)
  To: lvm-devel

Dne 27.6.2013 00:49, Tony Asleson napsal(a):
> On 06/26/2013 02:49 AM, Zdenek Kabelac wrote:
>> Dne 26.6.2013 01:28, Tony Asleson napsal(a):
>>> On 06/21/2013 08:53 AM, Zdenek Kabelac wrote:
>>>> Dne 21.6.2013 01:18, Tony Asleson napsal(a):
>>>>> Writing some new unit test cases for my latest liblvm patch set and at
>>>>> the moment I am running into a case where I can run the unit test case
>>>>> against real disk and it works, but if I run it in the test environment
>>>>> with loop back devices I am getting an abort with:
>>>>>
>>>>> "Internal error: Pool read_vg crc mismatch."
>>>>>
>>>>> Any ideas why this error isn't occurring on both?
>>>>>
>>>>
>>>> This happens - if you have requested 'read-only'  VG struct,
>>>> and you have modifed something in this vgmem pool
>>>> (either writing to struct, or just using vgmem pool for allocation)
>>>
>>> My original inquiry was for ideas why I see this on loop back devices
>>> and not actual devices.  This response doesn't seem to match what I am
>>> seeing.
>>
>> It's hard to give advice if I do not see the actual code.
>
> Patch set was posted a while back.  Specifically the issue I am running
> into has to do with the functionality to list all PVs (including orphan)
>
> Specific patch is:
> https://www.redhat.com/archives/lvm-devel/2013-May/msg00036.html
>
>>> The stack trace shows that we are getting this error during an
>>> lvm_vg_open.  I can re-create the error regardless if I open the vg
>>> struct as read-only or read-write.
>>>
>>>   From my initial debug it appears that if the vginfo->vg_use_count > 1
>>> (in this case 3) we pass 1 as the second parameter to dm_pool_unlock
>>> which is a crc check of the pool which it finds to be different.  At
>>> this point it would seem I am exacerbating some type of caching bug or
>>> that somewhere along the path I am inadvertently changing the contents
>>> of the vg struct pointer with my latest patches.
>>
>> There is internal debug support for this kind of problems which mprotects
>> vg structure, so any write access to locked vg structure crashes the
>> application and you may look at stack trace.
>>
>> But it needs some hand modification of make.tmpl file (no configure option)
>> Uncomment #DEFS += -DDEBUG_ENFORCE_POOL_LOCKING  and rebuild and retest
>> with
>> unlimited coredump size.
>
> Thanks for the tip, this helped quite a bit!  I modified the code in the
> above referenced patch to retain the vg pointer when retrieving the list
> of PVs.  The liblvm library uses the vg->vgmem pool to allocate strings
> to be returned to the user when getting information pertaining to a PV.
>   The problem is that the vg gets cached with a crc.  We then allocate
> memory from the vgmem pool and later when we go to clear out the cache
> entry we fail the crc check.  When I enable the
> DEBUG_ENFORCE_POOL_LOCKING we fail after we retrieve a vg from the cache
> and then I try to allocate something from the vgmem pool.
>
> Your comment about opening the vg "r" vs. "w" still seems incorrect to
> me, but if it is indeed correct then we have a problem with the library
> that precedes any of my changes.  We use the vg->vgmem for many things
> and if this argument is indeed true then all those allocations will be
> causing the vg struct to change, thus causing crc errors if the user is
> allowed to open the vg struct as read-only.  In my code review and
> testing it appears that we aren't hitting this because when we call into
> vg_open we don't have a vgid, so we fail early in lvmcache_get_vg and
> thus we never add them to the cache and thus we never call dm_pool_lock
> on them.
>
> If you could point out the bit of code that actually determines that we
> mprotect memory based on the user doing an vg open read only vs. read
> write that would be most helpful.  I'm not seeing it at the moment and I
> would like to understand this better.
>
> Currently I am thinking about just backing out the change to retrieve
> the pv list and use the cmd->mem for allocations instead of the
> vg->vgmem.  Previously I brought up just putting such things on the heap
> and letting the users of the library free them, but that was dismissed
> because existing library users would then have memory leaks.


yeah there could be some design fault which is using mem pool from read-only
volume group.  vg->vgmem  is only used for allocation of objects directly 
related to VG  -  return of some copied buffers is not something which is 
supposed touch this memory pool as well as creating PV lists
(cmd mempool or just temporarily created new pool needs to be used here.)

If you hold only read lock, there should be no need to modify vg structure.

Zdenek




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-06-28  9:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-20 23:18 Internal error: Pool read_vg crc mismatch only when running in test environment Tony Asleson
2013-06-21 13:53 ` Zdenek Kabelac
2013-06-25 23:28   ` Tony Asleson
2013-06-26  7:49     ` Zdenek Kabelac
2013-06-26 22:49       ` Tony Asleson
2013-06-28  9:14         ` Zdenek Kabelac

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.