Linux LVM users
* [linux-lvm] Using Oracle with lvm AND rawio: read(512) from /dev/raw/...
@ 2000-12-01 21:28 Michael Tokarev
  2000-12-01 23:22 ` Andreas Dilger
  2000-12-02 20:46 ` Jorg de Jong
  0 siblings, 2 replies; 15+ messages in thread
From: Michael Tokarev @ 2000-12-01 21:28 UTC
  To: linux-lvm

Hello!

I finally got lvm working (at least at first glance) with
RedHat's 2.2.17-8 kernel (from rawhide) and lvm-0.9.new_raid.patch,
patched by Andreas Dilger (with small additional changes).

So far, so good.

As far as I can see, the lvm patch has to go with sct's rawio patch, so
I conclude that they should work together.  Right? :)

My main goal is to use an Oracle database with raw devices on
top of lvm (using a number of disks, so that the total
storage size is large and needs to be managed intelligently).
This is IMHO a great thing to have with linux -- Oracle's
best results can be achieved on raw devices, and those need
to be managed (using disk partitions is a PITA here).

I've made a simple LV and attached a raw device on top of it,
using the `raw' utility.  And what I've noticed is that I
can't write 512-byte blocks to it.  The only block sizes
I can use are 1024, 2048, 3072, etc., i.e. 1024*n.  With
the plain lvm device it is ok (or seems to be), but with the /dev/raw
device write/read gives an "invalid argument" error message.

The bad thing is that Oracle tries to write 512 bytes
_when creating a tablespace_ (I've set it up to use 4k
blocks, so it will read/write 4096*n blocks after ts
creation).  I attached some strace output from the oracle
process when creating a tablespace, below.

/dev/raw/raw100 bound to /dev/vg0/ora0 lv (128M).
 dd if=/dev/zero of=/dev/raw/raw100 bs=512
  dd: /dev/raw/raw100: Invalid argument
  1+0 records in
  0+0 records out

But what's interesting is that I have already set up
some databases to use raw devices, and they are working
well (no glitches found so far).  I used "plain"
disk partitions for this, and softraid devices, e.g.
  partition => rawdevice => oracle datafile
  partition,partition => raid0 => rawdevice => oracle

/dev/raw/raw1 bound to /dev/sda2 (1G)
 dd if=/dev/zero of=/dev/raw/raw1 bs=512
 ^C
 281835+0 records in
 281834+0 records out

(I just hit ^C here.  The process would otherwise complete
correctly.)

So the question: why does read/write fail with rawio
on top of lvm when requesting an "incorrect" block size?

Strace excerpt below.  What I noticed is that
oracle tried to use different methods here, but
all failed.  Some of them used only sizes that are
multiples of 1024, but also failed.
BTW, does anybody know what "pwrite()" is?

Oracle 8.1.6 EE (Oracle8iR2) for Linux.

I think that we are all interested in resolving this
particular issue.  I'll be glad to try different things
here as well, and to provide any additional info or
share my experience with this...  And just one thing -- may this
be due to something strange in my lvm patch (0.9-2.2.17-new_raid +
Andreas' "patch for patch" + my *minor* tweaks)?

Thank you.

Regards,
 Michael.

SQL> create tablespace x datafile '/dev/raw/raw100' size 100m reuse;
...
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDONLY)       = 9
close(9)                                = 0
open("/dev/raw/raw100", O_RDWR)         = 9
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
fstat(408, {st_mode=S_IFREG|0640, st_size=1077248, ...}) = 0
fstat(407, 0xbfffb65c)                  = -1 EBADF (Bad file descriptor)
dup2(9, 407)                            = 407
close(9)                                = 0
fcntl(407, F_SETFD, FD_CLOEXEC)         = 0
fcntl(407, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl(407, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDWR)         = 9
lseek(9, 0, SEEK_SET)                   = 0
write(9, "\0\0\0\0\0\20\0\0\0d\0\0]\\[Z\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(9, 104861184, SEEK_SET)           = 104861184
read(9, 0x92e4800, 512)                 = -1 EINVAL (Invalid argument)
close(9)                                = 0
fcntl(407, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(407)                              = 0
old_mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40191000
old_mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x401d2000
old_mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40213000
old_mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40254000
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDONLY)       = 9
close(9)                                = 0
open("/dev/raw/raw100", O_RDWR)         = 9
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
fstat(407, 0xbfffaf58)                  = -1 EBADF (Bad file descriptor)
dup2(9, 407)                            = 407
close(9)                                = 0
fcntl(407, F_SETFD, FD_CLOEXEC)         = 0
fcntl(407, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl(407, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDWR)         = 9
lseek(9, 0, SEEK_SET)                   = 0
write(9, "\0\0\0\0\0\20\0\0\377\377\377\377]\\[Z\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(9, 4294966784, SEEK_SET)          = -1 EINVAL (Invalid argument)
close(9)                                = 0
fcntl(407, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(407)                              = 0
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDONLY)       = 9
close(9)                                = 0
gettimeofday({975703953, 657435}, NULL) = 0
open("/dev/raw/raw100", O_RDWR)         = 9
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
fstat(407, 0xbfffb378)                  = -1 EBADF (Bad file descriptor)
dup2(9, 407)                            = 407
close(9)                                = 0
fcntl(407, F_SETFD, FD_CLOEXEC)         = 0
pwrite(407, "\0\2\0\0\1\0\0\0\0\0\0\0\0\0\1\1\0\0\0\0\0\0\0\0\0\0\0"..., 262144, 4096) = -1 EINVAL (Invalid argument)
pwrite(407, "\0\2\0\0A\0\0\0\0\0\0\0\0\0\1\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 262144, 266240) = -1 EINVAL (Invalid argument)
pwrite(407, "\0\2\0\0\201\0\0\0\0\0\0\0\0\0\1\1\0\0\0\0\0\0\0\0\0\0"..., 262144, 528384) = -1 EINVAL (Invalid argument)
pwrite(407, "\0\2\0\0\301\0\0\0\0\0\0\0\0\0\1\1\0\0\0\0\0\0\0\0\0\0"..., 262144, 790528) = -1 EINVAL (Invalid argument)
stat("/dev/raw/raw100", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 100), ...}) = 0
open("/dev/raw/raw100", O_RDWR)         = 9
lseek(9, 0, SEEK_SET)                   = 0
write(9, "\0\0\0\0\0\20\0\0\0\1\0\0]\\[Z\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(9, 1052160, SEEK_SET)             = 1052160
read(9, 0x92e4800, 512)                 = -1 EINVAL (Invalid argument)
close(9)                                = 0
fcntl(407, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(407)                              = 0
gettimeofday({975703953, 665286}, NULL) = 0
gettimeofday({975703953, 665470}, NULL) = 0
close(6)                                = 0
open("/usr/oracle/dbs/orcl/bgdump/alert_orcl.log", O_WRONLY|O_APPEND|O_CREAT, 0664) = 6
... "ORA-19502 signalled during: crea"
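
The offsets in the trace above can be checked against the two candidate block sizes.  What follows is only a minimal sketch, under the assumption (not confirmed anywhere in this thread) that the raw device rejects a request whose offset or length is not a multiple of the device's block size.  `is_aligned` and `classify` are hypothetical helper names; the numbers are copied from the strace output.

```python
# Offsets and lengths copied from the strace output above.
# Assumption: a request is acceptable only if both its offset and its
# length are multiples of the block size being tested.

def is_aligned(offset: int, length: int, block_size: int) -> bool:
    """Return True if offset and length are both multiples of block_size."""
    return offset % block_size == 0 and length % block_size == 0


REQUESTS = [
    ("write", 0, 4096),          # succeeded in the trace
    ("read", 104861184, 512),    # EINVAL in the trace
    ("read", 1052160, 512),      # EINVAL in the trace
]


def classify(block_size: int) -> dict:
    """Map each traced request to its alignment verdict for block_size."""
    return {(op, off, n): is_aligned(off, n, block_size) for op, off, n in REQUESTS}


if __name__ == "__main__":
    for bs in (512, 1024):
        for (op, off, n), ok in classify(bs).items():
            print(f"bs={bs:4d} {op:5s} offset={off:9d} len={n:4d} aligned={ok}")
```

Under this assumption the two failing read() calls are aligned for a 512-byte block size but not for a 1024-byte one, which would be consistent with the dd results.  It does not explain the failing pwrite() calls, whose offsets and lengths are all multiples of 1024; the trace alone cannot say what went wrong there.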

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from /dev/raw/...
From: Andreas Dilger @ 2000-12-01 23:22 UTC
  To: linux-lvm

Michael Tokarev writes:
> The bad thing is that Oracle tries to write 512 bytes
> _when creating a tablespace_ (I've set it up to use 4k
> blocks, so it will read/write 4096*n blocks after ts
> creation).  I attached some strace output from the oracle
> process when creating a tablespace, below.

One "hack" you could try when creating the tablespaces initially is
to symlink /dev/raw/raw100 to the block device you are using
(in this case /dev/vg0/ora0), and then when you are done with
the tablespace creation remove the symlink and set up the raw
device as before.

I'm not 100% sure this will work, however.  Is the /dev/raw directory
a "virtual filesystem" like proc or /dev/shm?  The other possibility
is that Oracle will treat it differently because it is a block device
and not a character device, but I'm not sure of that either.

As to the real problem, I have no idea.  There were a couple of changes
that Heinz made to LVM w.r.t. block sizes, but I think that was limited
to removing constants, and not changing the actual block handling.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from  /dev/raw/...
From: Michael Ju. Tokarev @ 2000-12-02  0:35 UTC
  To: linux-lvm

Andreas Dilger wrote:
> 
> Michael Tokarev writes:
> > The bad thing is that Oracle tries to write 512 bytes
> > _when creating a tablespace_ (I've set it up to use 4k
> > blocks, so it will read/write 4096*n blocks after ts
> > creation).  I attached some strace output from the oracle
> > process when creating a tablespace, below.
> 
> One "hack" you could try when creating the tablespaces initially is
> to symlink /dev/raw/raw100 to the block device you are using
> (in this case /dev/vg0/ora0), and then when you are done with
> the tablespace creation remove the symlink and set up the raw
> device as before.

What's *still* unclear to me is the difference between character and
block specials.  In principle, a block device should only allow
block access (i.e. multiples of 512 or 1024 or whatever size),
while character devs should allow a read of 1 byte.  For example,
solaris doesn't allow "any-size" i/o on block devices -- things
go bad there (when I read one byte, I get it, but the next byte
read will actually be the 513th one, not the 2nd, etc.).  Linux sometimes
gives us errors (invalid argument) in such cases, and sometimes
not (lvm devices don't generate these errors).  The whole rawio
thing, as it seemed *to me*, is meant to provide some layer between
block and char i/o, so it should be possible to use any read/write
size with it.  On the other hand, rawio (as stated on the sgi oss site
at least), while it uses character devices, requires even more
restrictions to be met -- including aligned memory buffers
for read/write requests etc.  I'm confused here... :(

> I'm not 100% sure this will work, however.

This will not work.  I don't know if it's an oracle bug or not
(it seems to be), but if I give it a *block* special file (like
lvm's lv), it will try to remove that file (device) first and
then complain that "file already exists" (funny ;).  But this
triggers something in my mind -- I can create the tablespace in a
regular file and then copy that file to the lv.  I'm not
sure whether this will work (will check later - interesting),
but for sure this isn't a solution, since we'll not be able
to extend that datafile from oracle (like extending a filesystem
on top of an lv), since this again will require writing/reading
as at initial creation time, -- so almost all the benefit of lvm will
go away.

> Is the /dev/raw directory
> a "virtual filesystem" like proc or /dev/shm?  The other possibility

No, it's just a plain subdirectory with a bunch (254) of files named
raw1..raw254.  It looks like RedHat uses this layout (since there
is no standard layout for rawdevs as there is for sda, hdb, etc.) to be
prepared for the 2.4 kernel (I use 2.2.17).  Linux has no character
devices for disks etc. for now, so the rawio patch is for that
purpose (e.g. solaris always had /dev/dsk/xxx devices -- these are
character specials, and /dev/rdsk/xxx, block specials (or the
opposite, I don't remember) that have the same major/minor but
different type; linux lacks this).  For now, linux's way is
ugly -- one has to "bind" a block device to one of the /dev/raw/xxx
devices using the supplied `raw' utility, and after that one can open
that /dev/raw/xxx and use it.  One thing that stopped me from trying
lvm in "my real life" is that I didn't know if it can work together
with the rawio patch.  For now, the lvm patch requires the rawio patch, so
I concluded that they work together well and tried that...

> is that Oracle will treat it differently because it is a block device
> and not a character device, but I'm not sure of that either.

Sure it will, and it will refuse to do something... ;)

Seriously, for me it's not critical that I can't use lvm for this
sort of thing (yet); other working solutions already
exist.  With lvm, things should be far easier.  The main goal
of my original post is to ensure (or make it so) that linux works
in this environment (there should be high demand for raw partitions
on top of lvm at the enterprise level, and oracle is the main consumer
of those raw devices).

> 
> As to the real problem, I have no idea.  There were a couple of changes
> that Heinz made to LVM w.r.t. block sizes, but I think that was limited
> to removing constants, and not changing the actual block handling.

This again reminds me of my ignorance about block/char specials -- should
*block* lvm devices have some *block* limits or not ?! ;) But ok.

I need to check whether the "stock" lvm patches for a stock kernel work
here (using the "retired" lvm-0.9-2.2.17-new_raid.patch and a patch
for that patch isn't a clean experiment ;) ), and will post results
here (but this will be on monday at the earliest, as I won't reboot
machines at work from home -- all my experiments are at work, I
have no hardware to test things on my home machine).  If that
works, I'll see what's different between the two situations...

BTW, here we are using both lvm and softraid together.  The
test machine has a hardware raid controller (5x18Gb, level 5,
total ~72Gb, 68Gb used for one pv that I want to manage for
oracle datafiles), and another four 18G disks on a different
controller, used as a number of softraid1 pairs (again,
with raw devices on top and oracle usage).  So it is important
that all subsystems work -- lvm, md and rawio.  Md together with
rawio works pretty well, but not lvm+rawio.  I don't want to
try lvm+md... :)

BTW, one (little?) question.  There was a thread on the list
recently with the subject "lvm 0.9 - make_req_fn", of which I don't
have the beginning (it was when the mailing list switched from
msede.com to sistina.com).  Is there any place where this
can be found? (sistina's archives are silent about this).
I ask because I hit a problem with exactly this (make_req_fn)
when I tried to apply (really, adapt) the patches to the
redhat-patched kernel, and wanted to ask the list about it too (and
found only the end of that thread...).

Thanks!

Regards,
 Michael.

P.S.  I like the lvm tools -- at first look it is a *very* nice
tool collection, with a very friendly interface/concepts...

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Andreas Dilger @ 2000-12-02  5:29 UTC
  To: linux-lvm

Michael Tokarev writes:
> What's *still* unclear to me is the difference between character and
> block specials.  In principle, a block device should only allow
> block access (i.e. multiples of 512 or 1024 or whatever size),
> while character devs should allow a read of 1 byte.

Actually, it is just the opposite for devices like hard drives:
the block interface will read a whole block no matter what size of
I/O you do, and it will cache the block for any smaller read/write
actions (i.e. read whole block-modify part of block-write whole block).
The character interface does NOT buffer the data, so the application
is forced to do correct block-sized and aligned reads/writes itself.
The benefit of the character device is that it doesn't do caching.
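
The read-modify-write cycle described above can be sketched in a few lines.  This is only an illustration, not kernel code: `rmw_write` is a hypothetical helper, the 1024-byte block size is an assumption taken from elsewhere in this thread, and an in-memory `io.BytesIO` stands in for the device.

```python
import io


def rmw_write(dev, offset: int, data: bytes, block_size: int = 1024) -> None:
    """Write `data` at `offset` using only whole, block-aligned I/O."""
    start = (offset // block_size) * block_size                 # round down
    end = -(-(offset + len(data)) // block_size) * block_size   # round up
    dev.seek(start)
    buf = bytearray(dev.read(end - start))          # read the whole block(s)
    buf.extend(b"\0" * (end - start - len(buf)))    # pad if the image is short
    rel = offset - start
    buf[rel:rel + len(data)] = data                 # modify part of the block
    dev.seek(start)
    dev.write(buf)                                  # write the whole block(s) back


if __name__ == "__main__":
    dev = io.BytesIO(b"A" * 4096)   # stand-in for a 4 KiB device
    rmw_write(dev, 1500, b"xyz")    # a 3-byte write becomes one 1024-byte read + write
    print(dev.getvalue()[1498:1505])
```

An application talking to the uncached character interface has to do this bookkeeping itself, or simply never issue I/O that is smaller than a block.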

> but for sure this isn't a solution, since we'll not be able
> to extend that datafile from oracle (like extending a filesystem
> on top of an lv), since this again will require writing/reading
> as at initial creation time, -- so almost all the benefit of lvm will
> go away.

You don't have to extend the datafiles for Oracle, just add a new
datafile to the tablespace.

Cheers, Andreas
-- 
Andreas Dilger                               TurboLabs filesystem development

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from /dev/raw/...
From: Christoph Hellwig @ 2000-12-02 10:45 UTC
  To: linux-lvm; +Cc: linux-lvm

On Sat, Dec 02, 2000 at 03:35:18AM +0300, Michael Ju. Tokarev wrote:
> What's *still* unclear to me is the difference between character and
> block specials.  In principle, a block device should only allow
> block access (i.e. multiples of 512 or 1024 or whatever size),
> while character devs should allow a read of 1 byte.  For example,
> solaris doesn't allow "any-size" i/o on block devices -- things
> go bad there (when I read one byte, I get it, but the next byte
> read will actually be the 513th one, not the 2nd, etc.).  Linux sometimes
> gives us errors (invalid argument) in such cases, and sometimes
> not (lvm devices don't generate these errors).  The whole rawio
> thing, as it seemed *to me*, is meant to provide some layer between
> block and char i/o, so it should be possible to use any read/write
> size with it.  On the other hand, rawio (as stated on the sgi oss site
> at least), while it uses character devices, requires even more
> restrictions to be met -- including aligned memory buffers
> for read/write requests etc.  I'm confused here... :(

Ok.  The block devices themselves don't have a direct interface to
userspace.  They are only called by ll_rw_block.  The block device
specials are implemented in fs/block_dev.c, and allow cached access
to the block devices (currently through the buffer cache, but it may
move to the pagecache soon).  The rawio stuff, on the other hand, allows
raw, uncached access to the block devices.  It has another advantage:
by using kiobufs the data does not have to be copied from user- to
kernelspace or vice versa.

> > Is the /dev/raw directory
> > a "virtual filesystem" like proc or /dev/shm?  The other possibility
> 
> No, it's just a plain subdirectory with a bunch (254) of files named
> raw1..raw254.  It looks like RedHat uses this layout (since there
> is no standard layout for rawdevs as there is for sda, hdb, etc.) to be
> prepared for the 2.4 kernel (I use 2.2.17).

Nope.  You can use whatever /dev layout you want with every kernel you
want unless you use devfs.  The newer raw versions just use the /dev/raw
directory instead of having the files directly in /dev.

> Linux has no character
> devices for disks etc. for now, so the rawio patch is for that
> purpose (e.g. solaris always had /dev/dsk/xxx devices -- these are
> character specials, and /dev/rdsk/xxx, block specials (or the
> opposite, I don't remember)

Yepp. The opposite... (rdsk = raw disk).

> that have the same major/minor but
> different type; linux lacks this).  For now, linux's way is
> ugly -- one has to "bind" a block device to one of the /dev/raw/xxx
> devices using the supplied `raw' utility, and after that one can open
> that /dev/raw/xxx and use it.  One thing that stopped me from trying
> lvm in "my real life" is that I didn't know if it can work together
> with the rawio patch.  For now, the lvm patch requires the rawio patch, so
> I concluded that they work together well and tried that..

LVM doesn't really need rawio itself, only the kiobuf infrastructure included
in the rawio patch.  Without snapshot LVs it should be usable without
that patch.  And yes, LVM works with rawio.  Rawio works with every
block device, because it needs no special support in the block device
layer.

> > 
> > As to the real problem, I have no idea.  There were a couple of changes
> > that Heinz made to LVM w.r.t. block sizes, but I think that was limited
> > to removing constants, and not changing the actual block handling.
> 
> This again reminds me of my ignorance about block/char specials -- should
> *block* lvm devices have some *block* limits or not ?! ;) But ok.

Block lvm devices shouldn't have a limit on the number of bytes read
or written.  Actually, "block special device" is a very confusing name.  These
are just device files that allow cached access to the block devices.

	Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from  /dev/raw/...
From: Jorg de Jong @ 2000-12-02 20:46 UTC
  To: linux-lvm

Hi,

Your mail prompted me to try this on my system!

SQL> create tablespace x datafile '/dev/raw/raw1' ;
create tablespace x datafile '/dev/raw/raw1'

Tablespace created.

and I did not have any problems whatsoever!
My setup is:
- rh7.0 
- 2.4.11 stock kernel with 
- plain old LVM 8.0 (usertools patched)
- Oracle is 8.1.6 SE

regards 
-- 
Jorg de Jong
Work : mailto:jorg.de.jong@ict.nl 
Play : mailto:j.e.s.de.jong@freeler.nl

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Michael Ju. Tokarev @ 2000-12-02 23:24 UTC
  To: linux-lvm

Andreas Dilger wrote:
> 
[]
> Actually, it is just the opposite for devices like hard drives:
> the block interface will read a whole block no matter what size of
> I/O you do, and it will cache the block for any smaller read/write
> actions (i.e. read whole block-modify part of block-write whole block).
> The character interface does NOT buffer the data, so the application
> is forced to do correct block-sized and aligned reads/writes itself.
> The benefit of the character device is that it doesn't do caching.

Oh, thanks!  This makes sense to me now -- after almost 10 years
of experience ! :^)))

[]
> You don't have to extend the datafiles for Oracle, just add a new
> datafile to the tablespace.

Oracle will work (a bit) better if it has only one datafile for
each tablespace, unless they are on different disks.  It will
try to load-balance data between the datafiles in a tablespace, and this
is wasted effort in a one-disk (or, as in my case, one-raid-array)
layout.  Simple rule -- make *one* datafile on each disk for each
appropriate tablespace, no more and no less, and do not add a second
datafile to the same disk, only grow the existing one.  BTW, should LVM
also do disk load balancing?  I saw some todo items about this (and
about remapping frequently accessed data to a faster disk etc.)...
Again, even a new datafile would have to be created somewhere using
the hacks described before (which do *not* work)...

I tried setting up a datafile on a filesystem and then copying it to a
raw device on top of an lv.  And this does *not* work -- when oracle first
opens the tablespace, it requests read(512) -- probably to read the datafile
header -- and fails exactly as when trying to create it, giving an
"unable to identify datafile ...: invalid argument" error message.
The trace is very similar to the one I posted before, with write() changed
to read().  A datafile created this way on a plain disk partition
works perfectly, as expected...  So the problem is somewhere in the
lvm+rawio combination -- only this pair together won't work.

The only thing left for me to try is a stock kernel with a clean lvm
patch, which I'll do on monday.

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Michael Ju. Tokarev @ 2000-12-03  1:34 UTC
  To: linux-lvm

"Michael Ju. Tokarev" wrote:
> 
> The only thing left for me to try is a stock kernel with a clean lvm
> patch, which I'll do on monday.

Ok, installed now:
  stock 2.2.17 kernel with
    rawio patch (from lvm source)
    lvm-0.9-2.2.17-stock.patch
  lvmtools 0.9

 # raw /dev/raw/raw100 /dev/vg0/lv0
 /dev/raw/raw100:        bound to major 58, minor 0
 # dd if=/dev/zero of=/dev/raw/raw100 bs=512
 dd: /dev/raw/raw100: Invalid argument
 1+0 records in
 0+0 records out
 # _

 # raw /dev/raw/raw100 /dev/sda4
 /dev/raw/raw100:        bound to major 8, minor 4
 # dd if=/dev/zero of=/dev/raw/raw100 bs=512
 dd: /dev/raw/raw100: No such device or address
 131073+0 records in
 131072+0 records out
 # _

So, as far as I can see, "stock lvm" also won't work with rawio
with 512-byte block i/o, while a disk partition with rawio
is pretty happy with that block size.
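
A read-only counterpart of the dd test above can be sketched as follows.  This is a hedged illustration, not a tool from the thread: `probe_read_sizes` is a hypothetical helper, and the anonymous `mmap` buffer is used only because it is page-aligned, on the assumption that the raw driver also cares about buffer alignment.

```python
import mmap


def probe_read_sizes(path: str, sizes=(512, 1024, 2048, 4096)) -> dict:
    """Try one read of each size at offset 0; map size -> errno (None on success)."""
    results = {}
    for size in sizes:
        buf = mmap.mmap(-1, size)        # anonymous mapping: page-aligned buffer
        try:
            # buffering=0 gives a raw file object, so the request size
            # is passed through to the read() call unchanged.
            with open(path, "rb", buffering=0) as dev:
                dev.readinto(buf)
            results[size] = None
        except OSError as exc:           # includes failures from open() itself
            results[size] = exc.errno
        finally:
            buf.close()
    return results


if __name__ == "__main__":
    import sys
    for size, err in probe_read_sizes(sys.argv[1]).items():
        print(size, "ok" if err is None else f"errno {err}")
```

Run against a device node such as /dev/raw/raw100 (with sufficient privileges), it reports for each request size whether the read was accepted, without writing anything to the volume.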

Question: *why* won't lvm+rawio work with 512-byte i/o?
And this is not an oracle question; oracle here is just
an example application that really uses 512-byte i/o.

I see in linux/include/linux/lvm.h :

 [...]
 #ifdef BLOCK_SIZE
 #undef BLOCK_SIZE
 #endif

 #ifdef CONFIG_ARCH_S390 
 #define BLOCK_SIZE     4096
 #else
 #define BLOCK_SIZE     1024  <<<<<<<<<========
 #endif

 #ifndef        SECTOR_SIZE
 #define SECTOR_SIZE    512
 #endif
 [...]

I'm not sure if it is legal to change this value...

Next 2 questions: can I change this value to 512?
And will this cure the problem?  For now I can't
experiment as I have no physical access to that
machine (only over a dialup modem), and I don't want to
risk rebooting it again with an untested kernel...

BTW, why is it defined at all?  I see that this value is used
to initialize the lvm_blocksizes[] static array, and that
array isn't changed anywhere, only read.  Only
in lvm_snap.c, if DEBUG_SNAPSHOT is #defined (it will not
compile if this is #defined, since the array referenced
there is static in another source file), is it checked whether the
underlying device has a different block size, and that value is used
instead in further calculations.
Why is lvm_blocksizes[] used at all while it is a constant
array ?! ;)

Little *strong* suggestion.  Please rename BLOCK_SIZE
and SECTOR_SIZE to LVM_BLOCK_SIZE and LVM_SECTOR_SIZE
in lvm.h -- the currently used names are common enough to
be a source of namespace collisions.

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Michael Ju. Tokarev @ 2000-12-04  8:42 UTC
  To: linux-lvm

"Michael Ju. Tokarev" wrote:
> 
[]
> I see in linux/include/linux/lvm.h :
> 
>  [...]
>  #ifdef BLOCK_SIZE
>  #undef BLOCK_SIZE
>  #endif
> 
>  #ifdef CONFIG_ARCH_S390
>  #define BLOCK_SIZE     4096
>  #else
>  #define BLOCK_SIZE     1024  <<<<<<<<<========
>  #endif
> 
>  #ifndef        SECTOR_SIZE
>  #define SECTOR_SIZE    512
>  #endif
>  [...]

Hm.  The same definitions are in the 2.4-pre kernel patch...
So 2.4 lvm should also *not* work for 512-byte i/o size.
But Jorg de Jong says that he successfully created a
tablespace on a raw dev with a 2.4-pre11 kernel... I'm
confused...  And for S390 -- is the smallest i/o size 4K ?!

[]
> Little *strong* suggestion.  Please rename BLOCK_SIZE
> and SECTOR_SIZE to LVM_BLOCK_SIZE and LVM_SECTOR_SIZE
> in lvm.h -- the currently used names are common enough to
> be a source of namespace collisions.

The same for 2.4.  Let's change those definitions while
it isn't *too* late...

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Jorg de Jong @ 2000-12-04 14:14 UTC
  To: linux-lvm

"Michael Ju. Tokarev" wrote:
> 

Hi Michael, 

I had a look at your original message and I found two things:

- the man page of raw says that you cannot use dd on a raw device !?!
- furthermore, I suspect that your logical volume may be
too small to create the requested table on.

The oracle documentation says on the subject :

>Raw Device Setup
>
>Keep in mind the following items when creating raw devices: 
>
>    When creating the volumes, ensure that the owner and group are oracle and oinstall, respectively. 
>
>     The size of an Oracle datafile created in a raw partition must be at least two Oracle block sizes smaller than the size of the
>     raw partition. 
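
The size rule quoted above is easy to check with a little arithmetic.  A minimal sketch, assuming that the "128M" LV and the "size 100m" clause from the original post both mean MiB, and using the 4k Oracle block size mentioned there; `max_datafile_bytes` is a hypothetical helper name.

```python
MIB = 1024 * 1024


def max_datafile_bytes(partition_bytes: int, oracle_block_bytes: int) -> int:
    """Largest datafile allowed by the 'two Oracle blocks smaller' rule."""
    limit = partition_bytes - 2 * oracle_block_bytes
    if limit <= 0:
        raise ValueError("partition too small for any datafile")
    return limit


if __name__ == "__main__":
    lv_bytes = 128 * MIB      # LV size from the original post (assumed MiB)
    block = 4096              # Oracle block size from the original post
    requested = 100 * MIB     # 'size 100m' in the CREATE TABLESPACE statement
    limit = max_datafile_bytes(lv_bytes, block)
    print(f"limit={limit} requested={requested} fits={requested <= limit}")
```

With these figures the requested datafile is well under the limit, so under these assumptions the rule alone would not account for the failure.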


-- 
Jorg de Jong
Work : mailto:jorg.de.jong@ict.nl 
Play : mailto:j.e.s.de.jong@freeler.nl

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
From: Michael Tokarev @ 2000-12-04 14:42 UTC
  To: linux-lvm


Jorg de Jong wrote:
> 
> Hi Michael,
> 
> I had a look at your original message and I found two things:
> 
> - the man page of raw says that you cannot use dd on a raw device !?!

This is because of (possible) alignment problems.  The dd command that I use
seems to align data accordingly.  At least I didn't notice any problems
using it with plain "rawed" partitions, as shown by my examples, and it
works well with a raw device on top of an lvm volume, as long as I specify
a block size of 1024, 2048 etc., but NOT with the 512 (default) block size.

> - furthermore, I suspect that your logical volume may be
> too small to create the requested table on.

I'd think that a 128Mb volume is sufficient for a 100Mb tablespace... ;)

> The oracle documentation says on the subject :
> 
> >Raw Device Setup
> >
> >Keep in mind the following items when creating raw devices:
> >
> >    When creating the volumes, ensure that the owner and group are oracle and oinstall, respectively.

If the oinstall group is used at all.  The devices simply should be readable
and writable by the oracle process owner, which is ok in my case:
  # ls -l /dev/raw/raw100
  crw-rw----    1 oracle   disk     162, 100 Aug 14 19:26 /dev/raw/raw100
  # _
If it were due to permission problems, then plain partitions also
would not work (they work well), and oracle would return a "permission
denied" error (which it does if I change the permissions).

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
  2000-12-04  8:42           ` Michael Ju. Tokarev
  2000-12-04 14:14             ` Jorg de Jong
@ 2000-12-04 14:45             ` Michael Tokarev
  2000-12-04 15:16               ` Michael Tokarev
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Tokarev @ 2000-12-04 14:45 UTC (permalink / raw)
  To: linux-lvm

"Michael Ju. Tokarev" wrote:
[]
> 
> Hm.  The same definitions are in the 2.4-pre kernel patch...
> So 2.4 lvm should also *not* work for 512-byte i/o size.
> But Jorg de Jong claims that he successfully created a
> tablespace on a raw dev with the 2.4-pre11 kernel... I'm
> confused...  And for S390 -- is the smallest i/o size 4K ?!

I changed BLOCK_SIZE to 512 (from 1024) and recompiled.
Now lvm doesn't work at all: any i/o (read/write) on
any lv gives a "no such device or address" error...

Any ideas? :(((

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
  2000-12-04 14:45             ` Michael Tokarev
@ 2000-12-04 15:16               ` Michael Tokarev
  2000-12-04 19:35                 ` Jorg de Jong
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Tokarev @ 2000-12-04 15:16 UTC (permalink / raw)
  To: linux-lvm

Michael Tokarev wrote:
> 
[]
> 
> I changed BLOCK_SIZE to 512 (from 1024) and recompiled.
> Now lvm doesn't work at all: any i/o (read/write) on
> any lv gives a "no such device or address" error...
> 
> Any ideas? :(((

Oh, my bad, I'm sorry!..  It worked with BLOCK_SIZE=512, I forgot to
vgchange -ay !

So, with BLOCK_SIZE=512, it is ok, and dd ... bs=512 is also ok.
With BLOCK_SIZE=1024, dd bs=512 fails with "invalid argument".

What I also noticed is that regardless of the actual block size
used in lvm, stat() on a raw device on top of it reports
st_blksize=4096 (neither 512 nor 1024) -- is this ok?
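
For reference, st_blksize can also be read directly, without strace. POSIX
describes st_blksize as a preferred i/o block size for the object, so as far
as I can tell it is a hint and not necessarily the smallest i/o size the
device accepts. A minimal sketch in Python; the function name is an
assumption, and the device path in the comment is just the one used in this
thread.

```python
import os


def reported_blksize(path):
    """Return st_blksize as reported by stat(2) for path."""
    return os.stat(path).st_blksize


# On the setup discussed here one would call, for example:
#   reported_blksize("/dev/raw/raw100")
print(reported_blksize("."))
```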

And -- Jorg de Jong, can you please strace the ts creation
process?  I'm very interested in the i/o buffer sizes used there.

If e.g. your ORACLE_SID=orcl, do the following:

  oracle> sqlplus internal/
  SQL>

then, in another window/terminal, find the pid of the "oracleorcl"
(oracle$ORACLE_SID) process started by sqlplus:
  # ps -fu oracle
  ...
  oracle   14343 13857  0 18:02 pts/3    00:00:00 sqlplus
  oracle   14344 14343  0 18:02 ?        00:00:00 oracleorcl (DESCRIPTION=(LOCAL=...
           ^^^^^ ^^^^^                            ^^^^^^^^^^
and do the following:
  [ other window ]
  # strace -o trc -v -p 14344 ($PID of that process)

  [ sqlplus window ]
  SQL> create tablespace ... ;

  [ other window ]
  ^C
  # _  

Then examine the `trc' file made by strace.  I'm interested in:
 1) st_blksize of /dev/raw/rawNN device
 2) read/write sizes requested by oracle on that device.
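
The read/write sizes can be pulled out of the `trc' file with a few lines of
Python instead of reading it by eye. This is a rough sketch: the regular
expression only handles plain read() and write() lines as strace usually
prints them, the function name is made up, and the descriptor number has to
be taken from the open() line for the raw device in the trace. The buffer
address in the comment is purely illustrative.

```python
import re
from collections import Counter

# Matches lines such as
#   read(11, "\0\0"..., 512) = 512
#   read(11, 0x8049000, 512) = -1 EINVAL (Invalid argument)
IO_LINE = re.compile(r'^(read|write)\((\d+), .*, (\d+)\)\s*=\s*(-?\d+)')


def io_sizes(lines, fd):
    """Count (syscall, requested size, succeeded) for one descriptor."""
    counts = Counter()
    for line in lines:
        m = IO_LINE.match(line.strip())
        if m and int(m.group(2)) == fd:
            ok = int(m.group(4)) >= 0
            counts[(m.group(1), int(m.group(3)), ok)] += 1
    return counts
```

Typical use would be io_sizes(open("trc"), fd); any entry whose size is not
a multiple of 1024 is the interesting one.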

And also try:
  dd of=/dev/null if=/dev/raw/raw100 bs=512
   and the same with bs=1024
  (this will not destroy data)
 (/dev/raw/raw100 bound to one of LVs).

If you successfully created the tablespace, then
either oracle uses only 1024*n i/o (my 8.1.6 uses 512
sometimes) or your kernel allows 512-byte i/o...

Thank you!

Regards,
 Michael.

* Re: [linux-lvm] Using Oracle with lvm AND rawio: read(512) from
  2000-12-04 15:16               ` Michael Tokarev
@ 2000-12-04 19:35                 ` Jorg de Jong
  2000-12-04 20:01                   ` [linux-lvm] Final Q: i/o with 512 bytes: lvm+raw... [was: Oracle, lvm, rawio, ...] Michael Tokarev
  0 siblings, 1 reply; 15+ messages in thread
From: Jorg de Jong @ 2000-12-04 19:35 UTC (permalink / raw)
  To: linux-lvm

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

Michael Tokarev wrote:
> 
> And -- Jorg de Jong, can you please strace the ts creation
> process?  I'm very interested in the i/o buffer sizes used there.
> 
....
>   # strace -o trc -v -p 14344 ($PID of that process)
> 
>   [ sqlplus window ]
>   SQL> create tablespace ... ;
> 
>   [ other window ]
>   ^C
>   # _
> 
> Then examine the `trc' file made by strace.  I'm interested in:
>  1) st_blksize of /dev/raw/rawNN device
>  2) read/write sizes requested by oracle on that device.
> 
I successfully created a tablespace on a raw device.
See the attachment... 

> And also try:
>   dd of=/dev/null if=/dev/raw/raw100 bs=512
>    and the same with bs=1024
>   (this will not destroy data)
>  (/dev/raw/raw100 bound to one of LVs).
> 

[root] /tmp > dd of=/dev/null if=/dev/raw/raw1 bs=512
1024000+0 records in
1024000+0 records out
[root] /tmp > dd of=/dev/null if=/dev/raw/raw1 bs=1024
512000+0 records in
512000+0 records out

Also no problems here !

> If you successfully created the tablespace, then
> either oracle uses only 1024*n i/o (my 8.1.6 uses 512
> sometimes) or your kernel allows 512-byte i/o...
> 
> Thank you!
> 
> Regards,
>  Michael.

-- 
Jorg de Jong
Work : mailto:jorg.de.jong@ict.nl 
Play : mailto:j.e.s.de.jong@freeler.nl

[-- Attachment #2: trc.gz --]
[-- Type: application/x-gzip, Size: 2679 bytes --]

* [linux-lvm] Final Q: i/o with 512 bytes: lvm+raw... [was: Oracle, lvm, rawio, ...]
  2000-12-04 19:35                 ` Jorg de Jong
@ 2000-12-04 20:01                   ` Michael Tokarev
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Tokarev @ 2000-12-04 20:01 UTC (permalink / raw)
  To: linux-lvm

Jorg de Jong wrote:
[]
> I successfully created a tablespace on a raw device.
> See the attachment...

Oh, Thank you.  Thanks.

Now I have 2 last questions for the LVM gurus.

1. 512-byte i/o works on the 2.4 kernel and fails on the 2.2 kernel.
For example, the following sequence of syscalls:

 open("/dev/raw/raw1", O_RDWR)           = 11
 lseek(11, 0, SEEK_SET)                  = 0
 write(11, "\0\0\0\0\0 \0\0\0\0\0\0]\\[Z\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
 lseek(11, 7680, SEEK_SET)               = 7680
 read(11, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512

will succeed on the 2.4 kernel (/dev/raw/raw1 bound to an LV), while the last
read will fail on the 2.2 kernel with EINVAL.  Why the difference?
With 1024-byte i/o, it succeeds on both.
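
For anyone who wants to repeat the test without Oracle, the sequence above
can be replayed with a short Python script. This is only a sketch under some
assumptions: the function name is made up, it needs os.readv/os.writev
(Unix only), and it uses an anonymous mmap as the buffer so that the memory
is page-aligned, since raw devices may also reject unaligned buffers. Note
that it WRITES 8192 zero bytes at the start of the target, so it should only
be pointed at a scratch file or a scratch LV.

```python
import errno
import mmap
import os


def replay(path, offset=7680, size=512):
    """write(8192) at 0, lseek(offset), read(size); return the number of
    bytes read, or the errno name (e.g. "EINVAL") if the read fails."""
    buf = mmap.mmap(-1, 8192)          # page-aligned, zero-filled buffer
    fd = os.open(path, os.O_RDWR)
    try:
        os.lseek(fd, 0, os.SEEK_SET)
        os.writev(fd, [buf])           # same as write(fd, buf, 8192)
        os.lseek(fd, offset, os.SEEK_SET)
        try:
            return os.readv(fd, [memoryview(buf)[:size]])
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
```

On a regular file replay(path) returns 512. On a raw device bound to an LV,
the result described in this message would correspond to 512 on 2.4 and
"EINVAL" on 2.2.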

2.  Is it ok to change BLOCK_SIZE in lvm.h to 512?  It seems to work
with that, but I'm not sure about all cases.  What drawbacks could there be?
At least that definition in lvm.h looks strange, since for S390 it is even
"better" -- 4096, and if I set it that way, lvm+rawio allows me to write
using 4096-byte i/o, but not 1024...

And one last thing.  Does anybody know why rawio always reports st_blksize=4096?
It seems that st_blksize should match the block size of the "underlying" device,
which in the case of lvm is BLOCK_SIZE (from lvm.h)...

Thanks to all who answered my (a bit stupid) questions, especially to Jorg de Jong
for his testing with Oracle.

Regards,
 Michael.
