Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication

* Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
@ 2004-01-18  8:20 James W McMechan
  2004-01-18 16:13 ` BlaisorBlade
  2004-01-19  4:10 ` [uml-user] " Jeff Dike
  0 siblings, 2 replies; 6+ messages in thread
From: James W McMechan @ 2004-01-18  8:20 UTC (permalink / raw)
  To: sdw, user-mode-linux-user, user-mode-linux-devel

The stackable COW was not merged. I have a new, more abstracted
version I am working on, but Jeff seemed to want to replace the LVM
driver layer above the ubd device to implement the COW features,
rather than having COW as a system feature below the ubd layer.
Below the ubd layer is where I need it, so that it works like a
hardware function. The LVM system can do its own snapshot feature
above the ubd driver.
That puts LVM in the hands of the UML user for setting up a snapshot,
while COW is set up by the host admin, completely outside of the UML
and (mostly) invisible to the admin of the UML (guest root). This
provides two separate but useful capabilities, one to the host admin
and one to the guest admin.

Mostly I have been tinkering with the ISAM-style COW, so that
filesystems that do not do sparse files will work better, but there
are a lot of corner cases to look at, so it is a bit slow.

The dynamic remount could be done in much less than one second if you
prepare as a separate step. Then it is just a matter of replacing the
file handle in the device structure with a new handle: all writes
before the switch go to the original COW, and all writes afterwards
go to the new COW. Since that is just a single pointer write, it does
not have too many problems; only closing while the snapshot is
running is likely to break things. Once the new COW is in place, the
lower-layer COWs can be merged, since they are then read-only. If it
is mounted r/w on more than one device, things would get messy, but
that is bad anyway; I don't consider it a major bug, but rather a
user problem.
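
A minimal user-space sketch of the handle swap described above. The
struct cow_dev type and both function names are hypothetical and are
not taken from the ubd driver, which keeps far more state per device;
the sketch also ignores locking against requests that are in flight.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-device structure. */
struct cow_dev {
    int cow_fd;   /* file that currently receives writes */
};

/* Write through whatever COW file is current at the time of the call. */
static ssize_t cow_dev_write(struct cow_dev *dev, const void *buf,
                             size_t len, off_t off)
{
    return pwrite(dev->cow_fd, buf, len, off);
}

/* Switch to a COW file that was opened and prepared beforehand.
 * Returns the old descriptor; the caller should keep it open until no
 * request can still be using it, and may then merge the old file. */
static int cow_dev_switch(struct cow_dev *dev, int new_fd)
{
    int old_fd = dev->cow_fd;

    dev->cow_fd = new_fd;
    return old_fd;
}
```

Writes issued before cow_dev_switch() land in the old file and writes
issued afterwards land in the new one, which is the property the
remount relies on.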

I have been thinking of some more ubd flags, e.g. ubd=C10H128S63 to
change the disk layout, or ubd1C10 for the ubd device 1, cylinder 10
case as an individual example.
  C for cylinder
  H for heads
  S for sectors
  P for padding, i.e. a 512-byte header padding, so that raw devices
    can be checked without failing due to I/O errors dropping out
  M for mmap size (index, data)
  V to select which header version to create:
    V0 is a raw data file, don't check for COW
    V1 is the first version header, not portable
    V2 is the second one, with the wrong math
    V3 is my version with the separated offset length; being
       separated allows a program to fix the math errors on
       detection without having to redo the header format
    V4 is my version of ISAM
  L for symlinks as names
  U for update in place, which only really applies to the moo
    program, but I was thinking all the options should be defined
  ?? sector size should have an option
  ?? for page size, i.e. use 64k even on i386, so the COW can be
     read on, say, an alpha
  A to set AIO mode
  R for read-only, so that all options are uppercase
  S for sync data at each read/write
  ?? for sync on barriers for 2.6
The types of mmaping I think might be needed:
  no mmap in use -- for when mmap does not work; some filesystems do
    not mmap well
  index mmap -- like now, only the index/bitmap is mmaped; uses a
    fair amount of address space
  full mmap -- map in both the index/bitmap and the data; uses a huge
    amount of address space
  paged index -- map in a (few) pages of the index at a time; I have
    run with one page nicely
  paged data -- map in a (few) pages of the data at a time; I am not
    sure that this is helpful?
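
The "paged index" case can be sketched in user space as mapping only
the page that holds the bit of interest. This is an illustration under
assumptions: paged_bitmap_test() is a hypothetical name, the bitmap is
assumed to hold one bit per sector with the least significant bit
first, and a real version would cache the mapped page rather than map
and unmap on every lookup.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Test the bit for `sector` in a bitmap that starts at byte offset
 * `bitmap_off` of file `fd`, mapping only the one page that holds it.
 * The caller must make sure that byte exists in the file.
 * Returns 0 or 1, or -1 if the page cannot be mapped. */
static int paged_bitmap_test(int fd, off_t bitmap_off, unsigned long sector)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t byte_off = bitmap_off + (off_t)(sector / 8);
    off_t page_off = byte_off - (byte_off % page);  /* mmap wants aligned offsets */
    unsigned char *map;
    int bit;

    map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, page_off);
    if (map == MAP_FAILED)
        return -1;
    bit = (map[byte_off - page_off] >> (sector % 8)) & 1;
    munmap(map, (size_t)page);
    return bit;
}
```

Only one page of address space is in use at any time, whatever the
size of the index.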

The problem I see with mmaping is that, in order to mmap properly, I
need to kmalloc a buffer of the right size first and then mmap on top
of the buffer, so that the kernel does not try to use the mmaped
space for other purposes outside of the ubd_user/cow_user functions.
It would be free to do that if the space is not kmalloc'ed. Then, if
it tries to read/write that address space, the _user function would
be using the mmaped area and not the kernel memory, or the kernel
would overwrite the index/bitmap with kernel data.
Ick, the mind boggles.

The types of headers I think should be present:
  COW -- what we have now
  ISAM -- which will work without sparse files
  DISK -- just to keep disk image info, CHS etc.
  HEADER -- like DISK, but treats the data in the disk as a separate
    file, sort of like the backing file in a COW setup; several
    different HEADERs could be set up for one image file, each with a
    different layout for the C/H/S, as an example

The problem I see with the ubd=mmap is that it does not have a
failure path for a non-page-aligned data sector; it looks like it
just drops it. I think this would occur mostly on metadata updates.
2.4.23-1um was where I was looking, and I could be wrong, but if it
does, I would expect massive data corruption.

My infrastructure currently uses mmap, but I hope to make that
optional since it gobbles address space. ISAM uses 8 bytes/sector as
index entries and so gobbles 64 times as much as regular COW files,
and I have problems with the regular COW now. If I get the paged
version, it will be much better.
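
The factor of 64 follows from 8 bytes (64 bits) per sector against
one bit per sector. A small worked calculation, assuming 512-byte
sectors and a one-bit-per-sector bitmap for the regular COW format
(both are assumptions here, and the function names are made up for
the example):

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size */

/* Number of whole sectors in an image of the given size. */
static uint64_t disk_sectors(uint64_t image_bytes)
{
    return image_bytes / SECTOR_SIZE;
}

/* One bit per sector, rounded up to whole bytes (assumed layout of
 * the regular COW bitmap). */
static uint64_t bitmap_index_bytes(uint64_t sectors)
{
    return (sectors + 7) / 8;
}

/* Eight bytes per sector, as stated for the ISAM index. */
static uint64_t isam_index_bytes(uint64_t sectors)
{
    return sectors * 8;
}
```

For a 1 GiB image this gives 2097152 sectors, a 256 KiB bitmap and a
16 MiB ISAM index, so mapping the whole ISAM index costs 64 times the
address space.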
The only other calls it uses are ubd_malloc/ubd_free (kmalloc/kfree),
open, close, pread and pwrite, all of which are being abstracted
through common_open, common_close, common_pread and common_pwrite, so
that ubd_user gets much nicer and ubd_kern stops having bits of the
COW layer stuck in the device structure.

Then later the entire section can be replaced
if desired with a different implementation so
long as it provides open/close/pread/pwrite
or something similar.

Oh yes, as a side note: 2.6.0 and 2.4.25pre have the fix for the Oops
on the host kernel when reading /dev/shm with hostfs, so if that was
bothering you, the 2.6 or next 2.4 kernel should help.

James McMechan

-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

* Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
  2004-01-18  8:20 [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication James W McMechan
@ 2004-01-18 16:13 ` BlaisorBlade
  2004-01-19  3:59   ` Matt Zimmerman
  2004-01-19  4:10 ` [uml-user] " Jeff Dike
  1 sibling, 1 reply; 6+ messages in thread
From: BlaisorBlade @ 2004-01-18 16:13 UTC (permalink / raw)
  To: James W McMechan; +Cc: user-mode-linux-user, user-mode-linux-devel

On Sunday 18 January 2004 at 09:20, James W McMechan wrote:

> The types of mmaping I think might be needed:

> no mmap in use -- for when mmap does not
> work some fs do not mmap well

> index mmap -- like now only the index/bitmap
> is mmaped uses a fair amount of address space

> full mmap -- map in both the index/bitmap and
> the data, uses a huge amount of address space

> paged index -- map in a (few) pages of the
> index at a time, I have run with one page nicely

> paged data -- map in a (few) pages of the data
> at a time, I am not sure that this is helpful?

If I've correctly understood what Jeff explains, this would be very useful to 
reduce memory consumption.

> The problem I see with mmaping is that in order
> to mmap properly I need to kmalloc a buffer
> of the right size first, and then mmap on top of
> the buffer, so that the kernel does not try to
> use the space that is mmaped for other purposes
> outside of the ubd_user/cow_user functions
> which it would be free to do if it is not kmalloc'ed
> and then if it tries to read/write that address space
> the _user function would be using the mmaped
> area not the kernel memory, or when the kernel
> overwrites the index/bitmap with kernel data
> Ick the mind boggles

Ok, I've checked, and physmem_subst_mapping does exactly this (although I don't
see a real reason why kmalloc should reserve the correct space; maybe
this is the bug).

> The problem I see with the ubd=mmap is that
> it does not have a failure path for a
> non-page-aligned data sector, it looks like it
> just drops it, this would I think occur mostly
> on metadata updates, 2.4.23-1um was where
> I was looking, and I could be wrong, but if it
> does I would expect massive data corruption

In fact, ubd=mmap does have data corruption. If you have seen why, post *this only*
to Jeff Dike (he needed several reports before he started thinking mmap was buggy). I
say *this only* because this mail was quite long and a bit hard to read.
Also, there is a user (I'm going to ask him whether he uses ubd=mmap) who
reports some problems (I'm forwarding his message to you).

However, I've seen the missing failure path, but comparing with 2.4.20-6um
(the latest patch I had available on my HD without ubd=mmap), it seems that
mmap is just not used. Also, mmap_fd is called by prepare_request, which does
nothing if the alignment is wrong, while the actual write is done anyway in
do_io. And I'm sure of this, since the return value of a failing mmap_fd is
the same as when ubd=mmap is not active (from mmap_fd):

        if(!ubd_do_mmap)
                return(-1);

        /* The buffer must be page aligned */
        if(((unsigned long) req->buffer % UBD_MMAP_BLOCK_SIZE) != 0)
                return(-1);


> Oh yes as a side note 2.6.0 and 2.4.25pre
> have the fix for Oops on the host kernel
> when reading /dev/shm with hostfs so if
> that was bothering you the 2.6 or next
> 2.4 kernel should help.

Yes, I saw it. Thanks!

-- 
cat <<EOSIGN
Paolo Giarrusso, aka Blaisorblade
Linux Kernel 2.4.23/2.6.0 on an i686; Linux registered user n. 292729
EOSIGN




* Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
  2004-01-18 16:13 ` BlaisorBlade
@ 2004-01-19  3:59   ` Matt Zimmerman
  0 siblings, 0 replies; 6+ messages in thread
From: Matt Zimmerman @ 2004-01-19  3:59 UTC (permalink / raw)
  To: user-mode-linux-user, user-mode-linux-devel

On Sun, Jan 18, 2004 at 05:13:57PM +0100, BlaisorBlade wrote:

> On Sunday 18 January 2004 at 09:20, James W McMechan wrote:
> > Oh yes as a side note 2.6.0 and 2.4.25pre
> > have the fix for Oops on the host kernel
> > when reading /dev/shm with hostfs so if
> > that was bothering you the 2.6 or next
> > 2.4 kernel should help.
> 
> Yes, I saw it. Thanks!

Ooh, good.  I stopped using hostfs-on-tmpfs for a number of things due to
this bug, having sent the oops to linux-kernel and having received no
response.  I'm glad it's fixed.

-- 
 - mdz


* Re: [uml-user] Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
  2004-01-18  8:20 [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication James W McMechan
  2004-01-18 16:13 ` BlaisorBlade
@ 2004-01-19  4:10 ` Jeff Dike
  2004-01-19 18:26   ` Adam Heath
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Dike @ 2004-01-19  4:10 UTC (permalink / raw)
  To: James W McMechan; +Cc: sdw, user-mode-linux-user, user-mode-linux-devel

On Sun, Jan 18, 2004 at 12:20:58AM -0800, James W McMechan wrote:
> The stackable COW was not merged,
> I have a new more abstracted version I
> am working on, but Jeff seemed to want
> to replace the LVM driver layer above
> the ubd device to implement the COW
> features rather then having COW as a
> system feature below the ubd layer where
> I need it, for it to work like a hardware
> function. 

What does COW have to do with the LVM driver?  They're completely independent
of each other.

The COW stuff is inside the ubd driver, rather than below it.  I want it
to be above the ubd driver, so that it is completely outside, and a ubd device
deals with only one file.

This will clean up the code, allow stackable COW files as a trivial 
side-effect, and allow COW volumes to be mounted on the host.  What won't
you be able to do if this happens?

> and COW being
> setup by the host admin completely
> outside of the UML invisible (mostly)
> to the admin of the UML (guest root)
> Which provides two separate but
> useful capabilities to the host admin
> and the guest admin

This is another thing a separate COW driver will make possible.  A COW volume
could be mounted on the host and be passed to UML as a device.

> The Dynamic remount could be done
> in much less than one second, if you
> prepare as a separate step, then
> it is just replacing the file handle in the
> device structure with a new handle
> all writes before will occur to the
> original COW and afterwards all
> writes will occur to the new COW

Just to be clear, you're suggesting copying a COW file on the host, then
switching the ubd driver to use the copy?  I don't see how that helps, because
the COW file needs to be quiescent during the copy.

> I have been thinking of some more
> ubd flags ubd=C10H128S63 to
> change disk layout 

This is better than stuffing geometry in the COW header.  Are there any actual
uses that you know of for specifying the geometry of a ubd device?

> or ubd1C10
> for the ubd device 1 cylinder 10
> case as a individual example.
> C for cylinder
> H for heads
> S for sectors

Again, interesting, but are there any uses for this?

> P for padding i.e. 512 byte header
> padding so that raw devices can be
> checked without failing due to I/O
> errors dropping out.

I think this happens automatically since I/O happens with 512 granularity
anyway.

> M for mmap size index,data

Do you mean the mmap unit?

> V to select what version header to create
> V0 is a raw data file don't check for COW
> V1 the first version header not portable
> V2 the second one with the wrong math
> V3 my version with the separated offset length
> separated allows for using a program to fix the
> math errors on detection without having to
> redo the header format

I don't want to support old COW versions.  I don't see any point in allowing
UML to create V1 or V2 COW files.

V0 is needed, but that's the wrong name.  Some other switch is needed to
tell the driver to treat the file as data even if it looks like a COW file.

> V4 my version of ISAM

This is a new cow_format, not a new version of the COW file (and I realize
that sounds confusing :-).  IOW, you set the cow_format field in the header
to 1, after getting me to assign 1 as the ISAM cow_format.

> L for symlinks as names

What is this?  I don't remember seeing this before.

> U for update in place
> which only really applies to the moo
> program, but I was thinking should
> have all the options defined.

Well, if the ubd (or COW) driver isn't going to do update in place (and I don't
see why that makes sense), then it should just be specific to uml_moo.

> ?? sector size should have a option
> ?? for page size i.e. use 64k even on i386
> so COW can be read on say a alpha

Yeah.

> A to set AIO mode

I think this should be auto-detected and used.  Maybe there should be a no-AIO
for debugging.  In any case, AIO is a lot less useful with ubd-mmap, unless
there is an AIO way to wait for a page to be faulted in.

> R for readonly so all options are uppercase
> S for sync data at each read/write
> ?? for sync on barriers for 2.6

Are you thinking barriers for journalling filesystems?  That would be useful,
but I've done no thinking about that.

> The types of mmaping I think might be needed
> no mmap in use -- for when mmap does not
> work some fs do not mmap well
> index mmap -- like now only the index/bitmap
> is mmaped uses a fair amount of address space
> full mmap -- map in both the index/bitmap and
> the data, uses a huge amount of address space
> paged index -- map in a (few) pages of the
> index at a time, I have run with one page nicely
> paged data -- map in a (few) pages of the data
> at a time, I am not sure that this is helpful?

The paged index would be useful (and that's probably the only way mapping the
index could be useful on x86).  It's not obvious to me that you can keep the
right parts mapped.  Also, the fact that only little pieces of the index
are mapped at any time makes it a lot harder to make mapping pay.  With only
two words being changed at a time, the writes are pretty damn cheap.

> The problem I see with mmaping is that in order
> to mmap properly I need to kmalloc a buffer
> of the right size first, and then mmap on top of
> the buffer, so that the kernel does not try to
> use the space that is mmaped for other purposes
> outside of the ubd_user/cow_user functions
> which it would be free to do if it is not kmalloc'ed
> and then if it tries to read/write that address space
> the _user function would be using the mmaped
> area not the kernel memory, or when the kernel
> overwrites the index/bitmap with kernel data
> Ick the mind boggles

You'd use __get_free_pages for this, not kmalloc.  But that's exactly what
happens, and it works cleanly.  You allocate the pages, overmap them from
the device, and then unmap and replace the original pages on free.

> The types of headers I think should be present
> COW -- what we have now
> ISAM -- which will work without sparse files
> DISK -- just to keep disk image info CHS etc
> HEADER -- like disk but treats the data in
> the disk as a separate file sort of like the 
> backing file in a COW setup
> several different HEADERs could be setup
> for one image file, each with a different layout
> for the C/H/S as a example

Those are interesting.  However, I must again ask who's crying for these :-)

> The problem I see with the ubd=mmap is that
> it does not have a failure path for a
> non-page-aligned data sector, it looks like it
> just drops it, this would I think occur mostly
> on metadata updates, 2.4.23-1um was where
> I was looking, and I could be wrong, but if it
> does I would expect massive data corruption

I am looking for ubd-mmap data corruption, but I don't see any there.  In
the case that the data can't be mapped (wrong size, wrong alignment, etc),
then it just falls back to the old read/write code.

> Oh yes as a side note 2.6.0 and 2.4.25pre
> have the fix for Oops on the host kernel
> when reading /dev/shm with hostfs so if
> that was bothering you the 2.6 or next
> 2.4 kernel should help.

Yup, I saw that.  Nice work.

				Jeff

* Re: [uml-user] Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
  2004-01-19  4:10 ` [uml-user] " Jeff Dike
@ 2004-01-19 18:26   ` Adam Heath
  2004-01-19 19:40     ` Jeff Dike
  0 siblings, 1 reply; 6+ messages in thread
From: Adam Heath @ 2004-01-19 18:26 UTC (permalink / raw)
  To: Jeff Dike
  Cc: James W McMechan, sdw, user-mode-linux-user,
	user-mode-linux-devel

On Sun, 18 Jan 2004, Jeff Dike wrote:

> > A to set AIO mode
>
> I think this should be auto-detected and used.  Maybe there should be a no-AIO
> for debugging.  In any case, AIO is a lot less useful with ubd-mmap, unless
> there is an AIO way to wait to a page to be faulted in.

The problem is that you can't request multiple pages at once with mmap.  Only
one page at a time (because the page you requested will fault, which will pause
the process while the system pages it in).

With AIO, you can request multiple pages at once, and they will return in
random (or efficient) order.  AIO can actually be faster than mmap in this
case.

In fact, mmap can be slower in some cases than normal reading, if the kernel
only pages in one block at a time (because of other reasons).

* Re: [uml-user] Re: [uml-devel] Dynamic remount with variable COW stacking/merging needed, support for snapshot replication
  2004-01-19 18:26   ` Adam Heath
@ 2004-01-19 19:40     ` Jeff Dike
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Dike @ 2004-01-19 19:40 UTC (permalink / raw)
  To: Adam Heath
  Cc: James W McMechan, sdw, user-mode-linux-user,
	user-mode-linux-devel

adam@doogie.org said:
> The problem is that you can't request multiple pages at once with
> mmap.  Only one page at a time(because the page you requested will
> fault, which will pause the process while the system pages it in). 

No, mmap itself doesn't fault.  It fiddles some data structures and returns
immediately.  The fault happens when the page is first touched.  That's why
the ubd IO thread touches the page after a request is satisfied by mmap - to
fault the page in in an asynchronous way.
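
A small user-space sketch of the map-then-touch pattern described
above. The names map_page() and touch_page() are hypothetical; in the
driver the touch is done by the IO thread, so the thread that issued
the request does not block on the fault.

```c
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map one page of fd at a page-aligned offset. This only sets up the
 * mapping; the file data is not read at this point.
 * Returns NULL on failure. */
static void *map_page(int fd, off_t off)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, off);

    return p == MAP_FAILED ? NULL : p;
}

/* Read one byte through a volatile pointer. If the page is not yet
 * resident, this access is what takes the fault and waits for it. */
static unsigned char touch_page(const void *page_addr)
{
    return *(const volatile unsigned char *)page_addr;
}
```

Calling touch_page() from a helper thread moves the wait for the page
fault off the thread that set up the mapping.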

I would love for there to be aio page fault notification.

> In fact, mmap can be slower in some cases than normal reading, if the
> kernel only pages in one block at a time(because of other reasons). 

In a lot of cases, actually.  The tlb operations associated with changing
mappings are expensive, and can outweigh a lot of copying.

				Jeff


