* RE: AFS-based VBD backend
@ 2004-12-23 11:41 Ian Pratt
  2004-12-23 12:23 ` Steve Traugott
  2004-12-23 12:36 ` Andrew Warfield
  0 siblings, 2 replies; 19+ messages in thread
From: Ian Pratt @ 2004-12-23 11:41 UTC (permalink / raw)
  To: Steve Traugott, xen-devel

> I think most of this could be done in python -- the backend 
> driver itself might be a relatively thin layer to translate 
> block addressing to and from file byte locations, talk to the 
> frontend, and do a periodic
> fsync() on the underlying file to write the changes back to 
> the AFS server.  

You don't want to do it in python, but you do want to do it in user
space to avoid the deadlocks with AFS.

The blocktap backend driver is what you want. I'm not sure of its
current state, but the plan is to enable you to terminate a blk device
channel in user-space.

A future revision of blocktap could use kiovecs to avoid having to copy
data into user space, and thus would give good performance.

The kernel loop driver works fine with most file systems, but AFS is
very 'special' and doesn't really follow the design of the other file
systems.

Of course, you could use 'unfsd -r' and export your AFS root file
systems via same-machine NFS. This works pretty well, but I can't say
I've hammered it.

Ian
 


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/

* RE: AFS-based VBD backend
@ 2004-12-23 13:38 Ian Pratt
  2005-01-04  2:54 ` Steve Traugott
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Pratt @ 2004-12-23 13:38 UTC (permalink / raw)
  To: Steve Traugott; +Cc: xen-devel

 

> > Do you still see hangs with 2.6.9 kernels?
> 
> We're not on 2.6.x yet, due to lagging OpenAFS support.  

Interesting. Is 2.6 OpenAFS support being actively worked on? Who by?
 
> I am 
> seeing the complaints about NFS roots in early 2.6 kernels 
> though -- do you have any reason to think it's better now than 2.4?

Use dom0 2.4 for AFS support, run unfsd to export to 2.6 domU's.

It's well worth a try...
 
> > > One variation on that theme that I have tested is an enbd server 
> > > running on an AFS client, serving block devices to dom0 on another 
> > > machine.  ;-) Slow, but seems stable.
> > > Haven't tried same-machine with that either, because it eats about 
> > > 30% CPU on the enbd server under load.
> > 
> > We use redhat gnbd in preference to enbd.
> 
> I noticed that.  You said something last October about 
> running both the gnbd client and server in dom0 on a number 
> of machines -- I thought importing from the same machine 
> would cause a deadlock, so I've been trying to figure out 
> what you're actually doing.

It doesn't make sense to do a loopback import (not sure whether it would
deadlock or not, but I can believe it).

Running client and server in dom0 and importing/exporting to other
machines seems to work fine.

Ian


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/

* RE: AFS-based VBD backend
@ 2004-12-23 12:31 Ian Pratt
  2004-12-23 13:17 ` Steve Traugott
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Pratt @ 2004-12-23 12:31 UTC (permalink / raw)
  To: Steve Traugott; +Cc: xen-devel

> > The blocktap backend driver is what you want. I'm not sure of its 
> > current state, but the plan is to enable you to terminate a blk 
> > device channel in user-space.
> 
> Cool!  Where was the most recent version of that?

Unstable tree, but probably works in 2.0 too.

> > Of course, you could use 'unfsd -r' and export your AFS root file 
> > systems via same-machine NFS. This works pretty well, but I can't 
> > say I've hammered it.
> 
> I'm one of the folks who ran into the NFS root hangs early on 
> -- trying very hard to get off of it, hence this messing 
> about with alternatives.
> Granted, I'm not using same-machine NFS, don't know how much 
> that would reduce the hangs.

Do you still see hangs with 2.6.9 kernels?

> One variation on that theme that I have tested is an enbd 
> server running on an AFS client, serving block devices to 
> dom0 on another machine.  ;-) Slow, but seems stable.  
> Haven't tried same-machine with that either, because it eats 
> about 30% CPU on the enbd server under load.

We use redhat gnbd in preference to enbd.

Ian 


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/

* AFS-based VBD backend
@ 2004-12-23  8:55 Steve Traugott
  2004-12-23 10:37 ` Luciano Miguel Ferreira Rocha
  2004-12-28 19:21 ` Johannes Formann
  0 siblings, 2 replies; 19+ messages in thread
From: Steve Traugott @ 2004-12-23  8:55 UTC (permalink / raw)
  To: xen-devel

Hi All,

Has anyone ever put any thought (or code) into an (Open)AFS-based
virtual block device backend for Xen?  This driver would use a large
file stored in AFS as its underlying "device", as opposed to a loop
device or an LVM or physical device.  If I understand Xen's interface
architecture correctly, this backend would be stacked something like
this:

    ext3 or other standard filesystem in domN
    existing block device frontend in domN
    new "afs file" backend in dom0
    large file in /afs in dom0
    AFS client in dom0
    AFS server somewhere on net

You would configure this using something like:

    disk = ['afsfile:/full/path/to/afs/file,sda1,w']
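
As a rough illustration of how management code in dom0 might pick such
an entry apart, here is a minimal sketch.  It assumes the same
"type:path,device,mode" layout as the example above; the 'afsfile' type
is only the proposal from this message, and DiskSpec and
parse_disk_spec are hypothetical names, not part of Xen's tools.

```python
from typing import NamedTuple


class DiskSpec(NamedTuple):
    """One parsed entry of a Xen-style 'disk' list (hypothetical helper)."""
    backend: str  # VBD type, e.g. the proposed 'afsfile'
    path: str     # backing file path as seen from dom0
    vdev: str     # device name presented to the guest, e.g. 'sda1'
    mode: str     # 'r' for read-only, 'w' for read-write


def parse_disk_spec(spec: str) -> DiskSpec:
    """Split 'type:path,vdev,mode' into its four parts."""
    # Raises ValueError if there are not exactly three comma-separated fields.
    target, vdev, mode = spec.split(",")
    backend, _, path = target.partition(":")
    if not backend or not path:
        raise ValueError("expected 'type:path' before the first comma: %r" % spec)
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w': %r" % spec)
    return DiskSpec(backend, path, vdev, mode)


spec = parse_disk_spec("afsfile:/full/path/to/afs/file,sda1,w")
```

A bind/unbind script could then branch on spec.backend to decide
whether any AFS token handling is needed before the file is opened.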

Only dom0 would need to be an AFS client; the kerberos ticket(s) would
be owned and renewed by management code in dom0 only; the other domains
would not need to be kerberized, would not have access to any keytabs in
dom0, etc.  Each domain could have a single kerberos ID and its own AFS
volume(s) for storage of its "disks", but the individual users or
daemons in each domain would not be aware of kerberos at all.  

I think most of this could be done in python -- the backend driver
itself might be a relatively thin layer to translate block addressing to
and from file byte locations, talk to the frontend, and do a periodic
fsync() on the underlying file to write the changes back to the AFS
server.  
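
To make the "thin layer" idea concrete, here is a minimal sketch of the
file-side half only: sector numbers are mapped to byte offsets in the
backing file, and dirty data is fsync()ed once a configurable interval
has passed.  The 512-byte sector size, the 5-second default, and the
FileBackedDisk class are all assumptions for illustration; talking to
the frontend is left out entirely.

```python
import os
import time

SECTOR_SIZE = 512  # assumed sector size, not taken from the Xen interface


def byte_offset(sector: int) -> int:
    """Translate a block (sector) number into a byte position in the file."""
    return sector * SECTOR_SIZE


class FileBackedDisk:
    """Hypothetical file-backed block store with periodic fsync()."""

    def __init__(self, path: str, sync_interval: float = 5.0):
        self.fd = os.open(path, os.O_RDWR)
        self.sync_interval = sync_interval
        self.last_sync = time.monotonic()
        self.dirty = False

    def read_sectors(self, sector: int, count: int) -> bytes:
        return os.pread(self.fd, count * SECTOR_SIZE, byte_offset(sector))

    def write_sectors(self, sector: int, data: bytes) -> None:
        if len(data) % SECTOR_SIZE:
            raise ValueError("data must be a whole number of sectors")
        # A real backend would also loop on short writes.
        os.pwrite(self.fd, data, byte_offset(sector))
        self.dirty = True
        self.maybe_sync()

    def maybe_sync(self, force: bool = False) -> None:
        """fsync() if data is dirty and the interval has elapsed (or forced)."""
        now = time.monotonic()
        if self.dirty and (force or now - self.last_sync >= self.sync_interval):
            os.fsync(self.fd)  # on AFS this is what pushes changes to the server
            self.dirty = False
            self.last_sync = now

    def close(self) -> None:
        self.maybe_sync(force=True)
        os.close(self.fd)
```

Because the sync check only runs on writes here, a real backend would
also want a timer so that an idle disk with dirty data still gets
flushed.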

There'd be nothing to keep someone from using this backend on top of
ordinary, non-AFS files; this might provide better performance (one less
layer) than going through the loop device driver.  Perhaps the VBD type
name might even want to be 'rawfile' or somesuch instead of 'afsfile',
though an AFS-specific bind/unbind script might be useful for token
management.  

Some potential FAQ answers, before I get bombarded:  ;-)

Q:  Why is this useful?
A:  Because AFS can be run over the Internet, has excellent security,
    client-side caching, server-side replication and snapshots, and
    would lend itself well to an environment where the AFS clients and
    servers might be in different geographical locations, owned by two
    different parties, hosting Xen domN filesystems owned in turn by
    other parties.  

Q:  Why not just use native AFS in the unprivileged domains?
A:  Because then those domains would have to be kerberized AFS clients,
    and the users/owners of those domains would have to have kerberos
    IDs, they'd have to be knowledgeable in AFS ACLs, the AFS
    directory-based security model, daemon token renewal, and so on.
    The root-user domain owners would have to know how to manage
    kerberos users, and they'd have to have kerberos admin rights.  This
    is too much to expect.  The typical Xen customer wants to be able to
    just use normal *nix tools to add or delete a user -- that won't work
    with kerberos.  All users of all domains would be in the same
    kerberos namespace too -- there could be only one "jim" across all
    domains, even though those domains are supposed to be independent
    *nix machines owned by different parties -- very difficult to
    explain.

Q:  Why not just use a loop device on top of AFS, with the 'file:' 
    VBD type?
A:  Loop devices on top of AFS files hang with large volumes of 
    I/O -- looks like a deadlock of some sort (in my tests, a dd of
    around 200-300 MB into an AFS-based loop device appears to
    consistently hang the kernel, even with a 500 MB or larger AFS
    cache).  In addition, an unmodified loop.c will not fsync() the
    underlying file; changes won't get written back to the AFS server
    until loop teardown.  I've added an fsync() to the worker thread of
    loop.c to take care of this every few seconds; that seems to work
    but I can't really stress test it much because of the hang problem.

Q:  Why not use iSCSI, nbd, drbd, gnbd, or enbd?
A:  While these each seem to do their job well, none offer all of the 
    maturity, client-side caching, WAN-optimized protocols, volume
    management, backups, easy snapshots, scalability, central
    administration, redundancy, replication, or kerberized security of
    AFS.  

What did I miss?  ;-)

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org 
http://www.stevegt.com -- http://Infrastructures.Org


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now. 
http://productguide.itmanagersjournal.com/


end of thread, other threads:[~2005-01-18  0:35 UTC | newest]

Thread overview: 19+ messages
2004-12-23 11:41 AFS-based VBD backend Ian Pratt
2004-12-23 12:23 ` Steve Traugott
2004-12-23 12:36 ` Andrew Warfield
2004-12-23 13:48   ` Steve Traugott
2004-12-23 13:59     ` Andrew Warfield
2005-01-18  0:35     ` Steve Traugott
  -- strict thread matches above, loose matches on Subject: below --
2004-12-23 13:38 Ian Pratt
2005-01-04  2:54 ` Steve Traugott
2005-01-04  2:59   ` Kris Van Hees
2004-12-23 12:31 Ian Pratt
2004-12-23 13:17 ` Steve Traugott
2004-12-23  8:55 Steve Traugott
2004-12-23 10:37 ` Luciano Miguel Ferreira Rocha
2004-12-23 10:42   ` Keir Fraser
2004-12-23 10:56     ` Luciano Miguel Ferreira Rocha
2004-12-23 12:39       ` Steve Traugott
2004-12-23 11:58     ` Steve Traugott
2004-12-23 11:18   ` Steve Traugott
2004-12-28 19:21 ` Johannes Formann
