[linux-lvm] What really works?


* [linux-lvm] What really works?
       [not found] <E15vISM-0005YX-00.2001-10-21-14-15-58@mail5.svr.pol.co.uk>
@ 2001-10-21 14:25 ` Jason A. Lixfeld
  2001-10-22 14:37   ` [linux-lvm] " Theodore Tso
  2001-10-22 17:04   ` Stephen C. Tweedie
  0 siblings, 2 replies; 10+ messages in thread
From: Jason A. Lixfeld @ 2001-10-21 14:25 UTC (permalink / raw)
  To: ext3-users; +Cc: linux-lvm

Folks, I'm really stressed here.  I'm sending this to both lists to see
if anyone can offer any assistance.

	I have 2 boxes.  Box1 has 10x80GB drives in it and 1 2GB drive
that the OS is installed on.  Box2 has 6x60GB drives in it and an 8GB
drive that the OS is installed on.  Here's the layout for box1:

Linux version 2.4.12-ac3 (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)

PDC20268: IDE controller on PCI bus 00 dev 40
PDC20268: chipset revision 1
PDC20268: not 100% native mode: will probe irqs later
PDC20268: ROM enabled at 0xe7000000
PDC20268: pci-config space interrupt mirror fixed.
PDC20268: (U)DMA Burst Bit ENABLED Primary MASTER Mode Secondary MASTER Mode.
    ide2: BM-DMA at 0xb400-0xb407, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xb408-0xb40f, BIOS settings: hdg:pio, hdh:pio
PDC20267: IDE controller on PCI bus 00 dev 48
PCI: Found IRQ 10 for device 00:09.0
PDC20267: chipset revision 2
PDC20267: not 100% native mode: will probe irqs later
PDC20267: ROM enabled at 0xe8000000
PDC20267: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
    ide4: BM-DMA at 0xc800-0xc807, BIOS settings: hdi:pio, hdj:pio
    ide5: BM-DMA at 0xc808-0xc80f, BIOS settings: hdk:pio, hdl:pio
PDC20267: IDE controller on PCI bus 00 dev 50
PCI: Found IRQ 12 for device 00:0a.0
PDC20267: chipset revision 2
PDC20267: not 100% native mode: will probe irqs later
PDC20267: ROM enabled at 0xe9000000
PDC20267: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
    ide6: BM-DMA at 0xdc00-0xdc07, BIOS settings: hdm:pio, hdn:pio
    ide7: BM-DMA at 0xdc08-0xdc0f, BIOS settings: hdo:pio, hdp:pio
hda: Maxtor 82100A4, ATA DISK drive
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x50)
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x50)
hde: Maxtor 98196H8, ATA DISK drive
hdf: Maxtor 98196H8, ATA DISK drive
hdg: Maxtor 98196H8, ATA DISK drive
hdh: Maxtor 98196H8, ATA DISK drive
hdi: Maxtor 98196H8, ATA DISK drive
hdj: MAXTOR 4K080H4, ATA DISK drive
hdk: MAXTOR 4K080H4, ATA DISK drive
hdl: MAXTOR 4K080H4, ATA DISK drive
hdm: Maxtor 98196H8, ATA DISK drive
hdn: Maxtor 98196H8, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide2 at 0xa400-0xa407,0xa802 on irq 15
ide3 at 0xac00-0xac07,0xb002 on irq 15
ide4 at 0xc000-0xc007,0xc402 on irq 10
ide5 at 0xb800-0xb807,0xbc02 on irq 10
ide6 at 0xd400-0xd407,0xd802 on irq 12
hda: 4124736 sectors (2112 MB) w/256KiB Cache, CHS=1023/64/63, DMA
hde: 156312576 sectors (80032 MB) w/2048KiB Cache, CHS=155072/16/63, UDMA(66)
hdf: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(66)
hdg: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(66)
hdh: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(66)
hdi: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
hdj: 156301487 sectors (80026 MB) w/2000KiB Cache, CHS=155060/16/63, UDMA(100)
hdk: 156301487 sectors (80026 MB) w/2000KiB Cache, CHS=155060/16/63, UDMA(100)
hdl: 156301487 sectors (80026 MB) w/2000KiB Cache, CHS=155060/16/63, UDMA(100)
hdm: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
hdn: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
Partition check:
 hda: hda1 hda2
 hde: hde1
 hdf: [PTBL] [9964/255/63] hdf1
 hdg: hdg1
 hdh: unknown partition table
 hdi: hdi1
 hdj: hdj1
 hdk: hdk1
 hdl: hdl1
 hdm: hdm1
 hdn: hdn1

# /sbin/pvscan
pvscan -- reading all physical volumes (this may take a while...)
pvscan -- ACTIVE   PV "/dev/hdm" of VG "foo1" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdn" of VG "foo3" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdk" of VG "foo3" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdl" of VG "foo2" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdi" of VG "foo1" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdj" of VG "foo2" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdg" of VG "foo3" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hde" of VG "foo1" [74.52 GB / 0 free]
pvscan -- total: 8 [603.47 GB] / in use: 8 [603.47 GB] / in no VG: 0 [0]

# /sbin/lvscan
lvscan -- ACTIVE            "/dev/foo1/pc" [227.14 GB]
lvscan -- ACTIVE            "/dev/foo2/cons" [149.00 GB]
lvscan -- ACTIVE            "/dev/foo3/mov" [227.12 GB]
lvscan -- 3 logical volumes with 603.27 GB total in 3 volume groups
lvscan -- 3 active logical volumes
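
One quick consistency check on the two listings above is to total the
physical volumes per volume group and compare the result with the logical
volume sizes.  A small sketch, with the pvscan lines pasted in as a
here-document; the function name sum_by_vg is made up for illustration:

```shell
#!/bin/sh
# sum_by_vg: read pvscan-style lines on stdin and print "VG total_GB"
# for each volume group, sorted by name.
sum_by_vg() {
    awk '$4 == "PV" && $7 == "VG" {
             vg = $8; gsub(/"/, "", vg)   # strip the quotes around the VG name
             sum[vg] += substr($9, 2)     # drop the leading "[" from the size
         }
         END { for (vg in sum) printf "%s %.2f\n", vg, sum[vg] }' | sort
}

sum_by_vg <<'EOF'
pvscan -- ACTIVE   PV "/dev/hdm" of VG "foo1" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdn" of VG "foo3" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdk" of VG "foo3" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdl" of VG "foo2" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdi" of VG "foo1" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdj" of VG "foo2" [74.50 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hdg" of VG "foo3" [76.31 GB / 0 free]
pvscan -- ACTIVE   PV "/dev/hde" of VG "foo1" [74.52 GB / 0 free]
EOF
```

The totals (227.14, 149.00 and 227.12 GB) match the three logical volumes
in the lvscan output, so each logical volume takes up all of the space in
its volume group.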

--

History:  The LVMs and the RAID are brand new, not upgraded from a
previous version of LVM or EXT3 or anything.  The OS is old (RedHat
7.0).  To make a long story short, I was wrestling around for about a
week trying to find a good kernel rev that would work with the
FastTrack100 card, had all the recent features and fixes of LVM and all
the recent features and fixes of EXT3.  I compiled and installed the
latest e2fsprogs and util-linux as well.  I even tried a bunch of
different compilers before I found that 2.4.10-ac11 was a kernel rev
that finally started working ok.  Previous attempts at stock kernels and
various LVM/EXT3/FastTrack patches resulted in all kinds of errors
ranging from ATARAID errors to LVM errors to physical hard drive errors.
It was a big, huge mess.  Anyway, 2.4.10-ac11 worked fine for about 5
days.  We started to get low on space on the RAID so deleted stuff off
of one of the LVMs to make room and then we moved stuff from the raid
over to the LVM we had just free'd up space on.  As we were doing this,
kjournald chewed up all of the mem and CPU and then all these EXT3
errors started showing up in the logs.  All of a sudden, EXT3 crashed
and I had to make a trek into the office to hard boot the box.  I went
and rebooted the box and when it came back up, I decided to do forced
fscks of 2 of the LVMs that were having problems (/dev/foo3/mov was
getting data moved over to it and /dev/foo2/cons wasn't doing anything,
but it was showing up in dmesg as having some problems as well before
the reboot).  I fsck'd both LVMs and fsck found a whole WHACK of errors
on both LVMs.  After the fscks finished and I deleted the lost+found, I
discovered that quite a large chunk of both LVMs is gone because of what
fsck thought to be bad data.  One of the lost+founds is stuck on my LVM
and I can't delete it.  Every time I try, it gives me permission denied
errors. Chattr -I doesn't work either (that's another problem, how the
hell do I get rid of all that stuff in there now).

Anyways, as it stands now every time I run a forced fsck on one of the
LVMs, it finds all kinds of errors, which results in data-loss as fsck
attempts to fix the problems.

I have just upgraded the kernel to 2.4.12-ac3, and now I get these
errors from the ATARAID:

attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586940816, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=588251536, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=588513680, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586940816, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=588251536, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=588513680, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
attempt to access beyond end of device
72:01: rw=0, want=586678672, limit=160079661
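
For scale, the figures in these messages can be put side by side.  This is
a rough sketch under two assumptions that are not stated in the thread:
that 2.4 kernels print want and limit in 1 KiB blocks, and that the 72:01
prefix is the device's major:minor number in hexadecimal.

```shell
#!/bin/sh
# Figures copied from the kernel messages above.
want=586678672
limit=160079661

# Assumption: both figures are counts of 1 KiB blocks.
kib_to_gib() { echo $(( $1 / 1024 / 1024 )); }

echo "device size (limit): about $(kib_to_gib "$limit") GiB"
echo "requested offset (want): about $(kib_to_gib "$want") GiB"
echo "overshoot factor: about $(( want / limit ))x"

# Assumption: "72:01" is major:minor printed in hexadecimal.
echo "device: major $(( 0x72 )), minor $(( 0x01 ))"
```

Under those assumptions the request lands at roughly 559 GiB on a device
of roughly 152 GiB, which is about the size of two of the 80 GB drives
combined.  Major 114 is, as far as I know, the number used by the 2.4
ataraid driver.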

I'm using these settings on my drives (via hdparm):

/sbin/hdparm -m16 -c1 -d1 -a8 /dev/hd[e-n]

Also, for the ATARAID ppl who read this, as seen above, the drives that
are connected to the Promise FastTrack100 card come up as UDMA66,
instead of UDMA100.  How do I fix this?

New problem, I try to fsck another EXT3 LVM:

# /sbin/fsck /dev/ARCHiVE1/PC 
fsck 1.25 (20-Sep-2001)
e2fsck 1.25 (20-Sep-2001)
Superblock has a bad ext3 journal (inode 8).
Clear<y>? yes

*** ext3 journal has been deleted - filesystem is now ext2 only ***

/dev/ARCHiVE1/PC was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes

What's going on there?!

Why is all this happening?  What do I need to do to get this all
working?  This isn't complicated, is it?  10 drives, 3 LVMs and one
ATARAID?!

PLEASE help me get this working.  I'm sick and tired of wasting my days
fixing all this stuff... 

-JL


* [linux-lvm] Re: What really works?
  2001-10-21 14:25 ` [linux-lvm] What really works? Jason A. Lixfeld
@ 2001-10-22 14:37   ` Theodore Tso
  2001-10-22 14:42     ` Bryan-TheBS-Smith
  2001-10-22 21:13     ` [linux-lvm] " Jason A. Lixfeld
  2001-10-22 17:04   ` Stephen C. Tweedie
  1 sibling, 2 replies; 10+ messages in thread
From: Theodore Tso @ 2001-10-22 14:37 UTC (permalink / raw)
  To: Jason A. Lixfeld; +Cc: ext3-users, linux-lvm

On Sun, Oct 21, 2001 at 03:25:56PM -0400, Jason A. Lixfeld wrote:
> Previous attempts at stock kernels and
> various LVM/EXT3/FastTrack patches resulted in all kinds of errors
> ranging from ATARAID errors to LVM errors to physical hard drive errors.

If you see physical disk errors or LVM errors, it's likely that the
filesystem may have been corrupted.  Sometimes the filesystem
corruption may not be noticeable until much later, and the longer you
wait, the worse your data may get scrambled.  So if you were seeing
all sorts of disk or LVM errors, don't just assume that the problems
all went away when you moved to a "stable" kernel.  

So after seeing these kinds of disk problems, *always* run fsck -f to
make sure the filesystem is in a sane state before you continue.  Any
journaling filesystem, like ext3, will protect you against needing to
run fsck after an unclean shutdown, but none of them will save you if
you have low-level disk/LVM errors.  In those cases, you still need to
have a filesystem checker to try to pick up the pieces caused by
hardware problems, kernel bugs, etc.
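
As a minimal sketch of the check described above: the helper below only
prints the command it would run, so it can be reviewed before anything
touches the disks.  It assumes the three logical volumes named in the
lvscan output earlier in the thread; the function name check_cmd is made
up for illustration.  The -f flag forces a check even if the filesystem is
marked clean, and -n opens it read-only and answers "no" to every repair
prompt.

```shell
#!/bin/sh
# check_cmd DEVICE: print a forced, read-only e2fsck command for DEVICE.
# Nothing is executed here; the output is meant to be reviewed first.
check_cmd() {
    printf 'e2fsck -f -n %s\n' "$1"
}

# Logical volumes taken from the lvscan output in the original report.
for lv in /dev/foo1/pc /dev/foo2/cons /dev/foo3/mov; do
    check_cmd "$lv"
done
```

Run the printed commands as root with the filesystems unmounted, and drop
-n only once you are ready for e2fsck to make repairs.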

>  One of the lost+founds is stuck on my LVM and I can't delete it.
> Every time I try, it gives me permission denied errors. Chattr -I
> doesn't work either (that's another problem, how the hell do I get
> rid of all that stuff in there now).

It's "chattr -i" (case matters).  

Also, what version of e2fsprogs are you using?  Recent versions should
have offered to get rid of corrupted files for you.  

> Why is all this happening?  What do I need to do to get this all
> working?  This isn't complicated, is it?  10 drives, 3 LVMs and one
> ATARAID?!

Umm.... you're using bleeding edge 2.4 kernels, LVMs, and ext3?  Err,
yeah, this is still complicated.  Given all of the changes going into
the kernel, even though a lot of work had gone into making earlier
versions of LVM and ext3 play nicely together, your mileage may
definitely vary (and I haven't even included the ATARAID aspects).  I
wouldn't put any production data on it just yet, until things have
settled down a little.

						- Ted


* [linux-lvm] Re: What really works?
  2001-10-22 14:37   ` [linux-lvm] " Theodore Tso
@ 2001-10-22 14:42     ` Bryan-TheBS-Smith
  2001-10-22 21:13     ` [linux-lvm] " Jason A. Lixfeld
  1 sibling, 0 replies; 10+ messages in thread
From: Bryan-TheBS-Smith @ 2001-10-22 14:42 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Jason A. Lixfeld, ext3-users, linux-lvm

Theodore Tso wrote:
> (and I haven't even included the ATARAID aspects)

Just say *NO* to "trick BIOS" ATA RAID.  Especially from a company
that seems to be doing little to guarantee Linux compatibility.

-- TheBS

-- 
Bryan "TheBS" Smith    mailto:b.j.smith@ieee.org   chat:thebs413
Engineer  AbsoluteValue Systems, Inc.  http://www.linux-wlan.org
President     SmithConcepts, Inc.   http://www.SmithConcepts.com
----------------------------------------------------------------
The US National ID is an enhanced Social Security Number.  It
will give those who abuse it more information than ever before.
And just like the SSN, they will ignore all the regulations.


* [linux-lvm] RE: What really works?
  2001-10-22 14:37   ` [linux-lvm] " Theodore Tso
  2001-10-22 14:42     ` Bryan-TheBS-Smith
@ 2001-10-22 21:13     ` Jason A. Lixfeld
  2001-10-23  5:35       ` [linux-lvm] " Stephen C. Tweedie
  1 sibling, 1 reply; 10+ messages in thread
From: Jason A. Lixfeld @ 2001-10-22 21:13 UTC (permalink / raw)
  To: 'Theodore Tso'; +Cc: ext3-users, linux-lvm

> If you see physical disk errors or LVM errors, it's likely 
> that the filesystem may have been corrupted.  Sometimes the 
> filesystem corruption may not be noticeable until much later, 
> and the longer you wait, the worse your data may get 
> scrambled.  So if you were seeing all sorts of disk or LVM 
> errors, don't just assume that the problems all went away 
> when you moved to a "stable" kernel.  

Good advice.

> So after seeing these kinds of disk problems, *always* run fsck 
> -f to make sure the filesystem is in a sane state before you 
> continue.  Any journaling filesystem, like ext3, will protect 
> you against needing to run fsck after an unclean shutdown, 
> but none of them will save you if you have low-level disk/LVM 
> errors.  In those cases, you still need to have a filesystem 
> checker to try to pick up the pieces caused by hardware 
> problems, kernel bugs, etc.

More good advice.  I ran one on the ATARAID and one of the LVMs.  I'll
run one on the other LVMs.  Actually, I'll run one on ALL the
filesystems to see if they are still clean or if something is scattering
data all over the place.

> >  One of the lost+founds is stuck on my LVM and I can't delete it.
> > Every time I try, it gives me permission denied errors. Chattr -I
> > doesn't work either (that's another problem, how the hell do I get
> > rid of all that stuff in there now).
> 
> It's "chattr -i" (case matters).  

Uhm, it was "chattr -i" that I tried.  The latest version of Outlook for
Windows (shoot me, I know!) thinks it's so smart so it capitalizes the
first letter of every new sentence.

> Also, what version of e2fsprogs are you using?  Recent 
> versions should have offered to get rid of corrupted files for you.  

If tune2fs -v is any indication, I'm running v1.25.

This is still a very annoying problem.  chattr -i <-- I corrected it
this time :)  Doesn't make the files or directories deletable.  Any
other ideas?


* [linux-lvm] Re: What really works?
  2001-10-22 21:13     ` [linux-lvm] " Jason A. Lixfeld
@ 2001-10-23  5:35       ` Stephen C. Tweedie
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 2001-10-23  5:35 UTC (permalink / raw)
  To: Jason A. Lixfeld; +Cc: 'Theodore Tso', ext3-users, linux-lvm

Hi,

On Mon, Oct 22, 2001 at 10:13:48PM -0400, Jason A. Lixfeld wrote:

> This is still a very annoying problem.  chattr -i <-- I corrected it
> this time :)  Doesn't make the files or directories deletable.  Any
> other ideas?

Use "chattr -ia".  The "-a" (append-only) attribute will also pin the
files against deletion.

Failing that, "debugfs" lets you forcibly delete files: "clri
<filename>" will clear the inode by force, and a subsequent e2fsck
will fix up the fs summary information to be consistent with the
destroyed file (including deleting the directory entry pointing to the
nuked inode).
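
The two approaches above could look roughly like this.  The sketch only
prints the commands.  The mount point /mnt/mov, the entry name
/lost+found/STUCK_FILE and the helper names are assumptions for
illustration, and /dev/foo3/mov is simply one of the volumes from the
original report (the thread does not say which volume holds the stuck
lost+found).  debugfs needs -w to open the filesystem read-write and -R to
run a single request, and the path given to clri is relative to the root
of that filesystem.

```shell
#!/bin/sh
# Hypothetical locations, for illustration only.
MNT=/mnt/mov
DEV=/dev/foo3/mov
STUCK=/lost+found/STUCK_FILE   # placeholder for one undeletable entry

# Step 1: show the attributes, then clear immutable (i) and append-only (a).
attr_cmds() {
    printf 'lsattr -d %s/lost+found\n' "$1"
    printf 'chattr -R -ia %s/lost+found\n' "$1"
}

# Step 2 (last resort): clear the inode by force, then let e2fsck tidy up.
# The filesystem should be unmounted before these are run.
clri_cmds() {
    printf 'debugfs -w -R "clri %s" %s\n' "$2" "$1"
    printf 'e2fsck -f %s\n' "$1"
}

attr_cmds "$MNT"
clri_cmds "$DEV" "$STUCK"
```

Trying the attribute change first is the gentler option; clri destroys the
inode outright, so it only makes sense for entries you have already
decided to throw away.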

Cheers,
 Stephen


* [linux-lvm] Re: What really works?
  2001-10-21 14:25 ` [linux-lvm] What really works? Jason A. Lixfeld
  2001-10-22 14:37   ` [linux-lvm] " Theodore Tso
@ 2001-10-22 17:04   ` Stephen C. Tweedie
  2001-10-22 17:22     ` Andreas Dilger
  2001-10-23  4:42     ` Joe Thornber
  1 sibling, 2 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 2001-10-22 17:04 UTC (permalink / raw)
  To: Jason A. Lixfeld; +Cc: ext3-users, linux-lvm

Hi,

On Sun, Oct 21, 2001 at 03:25:56PM -0400, Jason A. Lixfeld wrote:
> Folks, I'm really stressed here.  I'm sending this to both lists to see
> if anyone can offer any assistance.

> Anyway, 2.4.10-ac11 worked fine for about 5
> days.  We started to get low on space on the RAID so deleted stuff off
> of one of the LVMs to make room and then we moved stuff from the raid
> over to the LVM we had just free'd up space on.

We test ext3 extensively under load, but if it has particular problems
over LVM I'd be interested in knowing.  All I can suggest right now to
narrow things down is that you see whether ext2 works any better.

Just glancing over the LVM code, though, I don't think that their
locking code is safe in the presence of other filesystem activity.  

lvm_do_pe_lock_unlock does try to flush existing IO, but they do it
with

		pe_lock_req.lock = UNLOCK_PE;
		fsync_dev(pe_lock_req.data.lv_dev);
		pe_lock_req.lock = LOCK_PE;

which (a) doesn't wait for existing IO to complete if that IO was
submitted externally to the buffer cache (so it won't catch
raw IO, direct IO, journal activity, or RAID1 ios); and (b) it allows
new IO to be submitted while the fsync is going on, so when it
eventually sets LOCK_PE state again, we can have loads of new IO
freshly submitted to the device by the time the lock is re-asserted.  

LVM folks, am I missing something here?  I can't see how you can
assert that the device is truly quiescent after the LOCK_PE has been
set.  

The 1.0.1-rc4 code seems to be improved in that it does another
fsync_dev after finally setting LOCK_PE, but fsync_dev is still
inadequate here for any IO submitted directly via submit_bh(), rather
than through the buffer cache.  This bug would be more likely to hit
ext3 than ext2, as ext3 uses submit_bh directly for a lot of its
journal IO, but there are plenty of cases outside ext3 which will also
hit this problem.

Cheers,
 Stephen


* [linux-lvm] Re: What really works?
  2001-10-22 17:04   ` Stephen C. Tweedie
@ 2001-10-22 17:22     ` Andreas Dilger
  2001-10-23  5:32       ` Stephen C. Tweedie
  2001-10-23  4:42     ` Joe Thornber
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas Dilger @ 2001-10-22 17:22 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Jason A. Lixfeld, ext3-users, linux-lvm

On Oct 22, 2001  23:05 +0100, Stephen C. Tweedie wrote:
> Just glancing over the LVM code, though, I don't think that their
> locking code is safe in the presence of other filesystem activity.  
>
> lvm_do_pe_lock_unlock does try to flush existing IO, but they do it
> with
> 
> 		pe_lock_req.lock = UNLOCK_PE;
> 		fsync_dev(pe_lock_req.data.lv_dev);
> 		pe_lock_req.lock = LOCK_PE;

Note that the code you reference is only in use when the Logical Extent
is being moved from one disk to another (shouldn't be done in normal
circumstances).  Also, this code has been reworked in the LVM CVS and
recent LVM releases to be more robust.

> which (a) doesn't wait for existing IO to complete if that IO was
> submitted externally to the buffer cache (so it won't catch
> raw IO, direct IO, journal activity, or RAID1 ios); and (b) it allows
> new IO to be submitted while the fsync is going on, so when it
> eventually sets LOCK_PE state again, we can have loads of new IO
> freshly submitted to the device by the time the lock is re-asserted.  

The "external I/O" problem is a known issue (raw IO) because it is not
flushed.  Note that in newer kernels, all write I/O which is done to
the LE being moved is put into a queue at LVM mapping time, so the
above fsync is not an issue for it (it gets resubmitted when the move
is done).

> LVM folks, am I missing something here?  I can't see how you can
> assert that the device is truly quiescent after the LOCK_PE has been
> set.  

See lvm_map() in newer LVM code for the _queue_io() calls.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* [linux-lvm] Re: What really works?
  2001-10-22 17:22     ` Andreas Dilger
@ 2001-10-23  5:32       ` Stephen C. Tweedie
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 2001-10-23  5:32 UTC (permalink / raw)
  To: Stephen C. Tweedie, Jason A. Lixfeld, ext3-users, linux-lvm

Hi,

On Mon, Oct 22, 2001 at 04:23:26PM -0600, Andreas Dilger wrote:

> > lvm_do_pe_lock_unlock does try to flush existing IO, but they do it
> > with
> > 
> > 		pe_lock_req.lock = UNLOCK_PE;
> > 		fsync_dev(pe_lock_req.data.lv_dev);
> > 		pe_lock_req.lock = LOCK_PE;
> 
> Note that the code you reference is only in use when the Logical Extent
> is being moved from one disk to another (shouldn't be done in normal
> circumstances).

Right, and the user in question had a LVM working robustly for 5 days
before trying to move a partition, at which point the filesystem
started giving errors all over the place.  It wasn't 100% clear from
the bug report whether the "move" was a fs-level copy or an LVM-level
PE move, though.

> Also, this code has been reworked in the LVM CVS and
> recent LVM releases to be more robust.

Yep, saw that.

> > which (a) doesn't wait for existing IO to complete if that IO was
> > submitted externally to the buffer cache (so it won't catch
> > raw IO, direct IO, journal activity, or RAID1 ios); and (b) it allows
> > new IO to be submitted while the fsync is going on, so when it
> > eventually sets LOCK_PE state again, we can have loads of new IO
> > freshly submitted to the device by the time the lock is re-asserted.  
> 
> The "external I/O" problem is a known issue (raw IO) because it is not
> flushed.  Note that in newer kernels, all write I/O which is done to
> the LE being moved is put into a queue at LVM mapping time, so the
> above fsync is not an issue for it (it gets resubmitted when the move
> is done).

It's still an issue, because you haven't waited for the previous
external IO to complete.  The 1.0.1-rc4 code looks much more robust in
its locking against newly submitted IO (case (b) above), but doesn't
address (a) yet, and for the ext3 journal, that's a big problem.

Any block device which assumes that IO is done through the buffer
cache is broken in this respect.  The 2.2 raid1/5 reconstruction code
had the same problem, but 2.4 fixed that.

Cheers,
 Stephen


* Re: [linux-lvm] Re: What really works?
  2001-10-22 17:04   ` Stephen C. Tweedie
  2001-10-22 17:22     ` Andreas Dilger
@ 2001-10-23  4:42     ` Joe Thornber
  2001-11-02  9:40       ` Stephen C. Tweedie
  1 sibling, 1 reply; 10+ messages in thread
From: Joe Thornber @ 2001-10-23  4:42 UTC (permalink / raw)
  To: linux-lvm

Stephen,

On Mon, Oct 22, 2001 at 11:05:24PM +0100, Stephen C. Tweedie wrote:
> LVM folks, am I missing something here?  I can't see how you can
> assert that the device is truly quiescent after the LOCK_PE has been
> set.  

You're not missing anything.  The correct thing to do (as pointed out
by Andrea Arc.) is to hook b_end_io such that we can keep track of all
'in flight' io's, and only start moving once these have completed.
This is implemented in the new device-mapper driver that we are
switching to shortly.  I consider 'live' pvmove broken in the current
LVM.

- Joe


* Re: [linux-lvm] Re: What really works?
  2001-10-23  4:42     ` Joe Thornber
@ 2001-11-02  9:40       ` Stephen C. Tweedie
  0 siblings, 0 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 2001-11-02  9:40 UTC (permalink / raw)
  To: linux-lvm; +Cc: Stephen Tweedie, Joe Thornber

Hi,

On Tue, Oct 23, 2001 at 10:43:59AM +0100, Joe Thornber wrote:
 
> You're not missing anything.  The correct thing to do (as pointed out
> by Andrea Arc.) is to hook b_end_io such that we can keep track of all
> 'in flight' io's, and only start moving once these have completed.

Precisely.

> This is implemented in the new device-mapper driver that we are
> switching to shortly.  I consider 'live' pvmove broken in the current
> LVM.

Just while we're on the subject, does anyone have a timescale for
fixing this?

Cheers,
 Stephen

