Our introduction to Reiser-list

All of lore.kernel.org
 help / color / mirror / Atom feed

* Our introduction to Reiser-list
@ 2005-10-25 22:58 Peter van Hardenberg
  2005-10-25 23:08 ` Hans Reiser
  2005-10-26 21:00 ` Lares Moreau
  0 siblings, 2 replies; 24+ messages in thread
From: Peter van Hardenberg @ 2005-10-25 22:58 UTC (permalink / raw)
  To: reiserfs-list; +Cc: ycoady, Nordman, Ryan, huaning

Hello all,

I'm a student at the University of Victoria. Between myself and a few fellow 
students we have embarked on a quest to do some experiments with the Reiser4 
metadata system to show it off and provide some real world use cases.

We'll be spending lots of time on this project between now and the end of the 
year. If list members have ideas for interesting experiments, please, join in 
and suggest them.

So far, we have a few nifty ideas:

1) tie the 'extract' utility to a shell script to copy existing metadata down 
into the filesystem

2) create a (very slow) shell which operates on the filesystem by xpath query. 
doing it right would be difficult, but faking it should be easy.

We are all inexperienced with kernel hacking and will probably ask some stupid 
questions as we go forward. Our individual interests vary and we may find 
that we have bitten off more than we can chew, but I expect it will be an 
interesting ride, if nothing else. 

Our first contribution will be a practical guide to installing Reiser4 (with 
metadata enabled) under Ubuntu 5.10. 

All the best,
-pvh

-- 
Peter van Hardenberg (pvh@pvh.ca)
Victoria, BC, Canada

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-25 22:58 Our introduction to Reiser-list Peter van Hardenberg
@ 2005-10-25 23:08 ` Hans Reiser
  2005-10-26  0:04   ` Peter van Hardenberg
  2005-10-26 21:00 ` Lares Moreau
  1 sibling, 1 reply; 24+ messages in thread
From: Hans Reiser @ 2005-10-25 23:08 UTC (permalink / raw)
  To: Peter van Hardenberg; +Cc: reiserfs-list, ycoady, Nordman, Ryan, huaning

Peter van Hardenberg wrote:

>Hello all,
>
>I'm a student at the University of Victoria. Between myself and a few fellow 
>students we have embarked on a quest to do some experiments with the Reiser4 
>metadata system to show it off and provide some real world use cases.
>
>We'll be spending lots of time on this project between now and the end of the 
>year. If list members have ideas for interesting experiments, please, join in 
>and suggest them.
>
>So far, we have a few nifty ideas:
>
>1) tie the 'extract' utility to a shell script to copy existing metadata down 
>into the filesystem
>  
>
what is the extract utility?

>2) create a (very slow) shell which operates on the filesystem by xpath query. 
>  
>
can you say three paragraphs here?

>doing it right would be difficult, but faking it should be easy.
>
>We are all inexperienced with kernel hacking and will probably ask some stupid 
>questions as we go forward. Our individual interests vary and we may find 
>that we have bitten off more than we can chew, but I expect it will be an 
>interesting ride, if nothing else. 
>  
>
well, you are all more qualified than I was when I started work on
reiserfs......

>Our first contribution will be a practical guide to installing Reiser4 (with 
>metadata enabled) under Ubuntu 5.10. 
>  
>
Oh, that sounds great, it will go onto our website under the install
button if you are kind enough to allow that.

>All the best,
>-pvh
>
>  
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-25 23:08 ` Hans Reiser
@ 2005-10-26  0:04   ` Peter van Hardenberg
  2005-10-26  2:42     ` Hubert Chan
  0 siblings, 1 reply; 24+ messages in thread
From: Peter van Hardenberg @ 2005-10-26  0:04 UTC (permalink / raw)
  To: reiserfs-list

On October 25, 2005 04:08 pm, you wrote:
> Peter van Hardenberg wrote:
> >Hello all,
> >
> >I'm a student at the University of Victoria. Between myself and a few
> > fellow students we have embarked on a quest to do some experiments with
> > the Reiser4 metadata system to show it off and provide some real world
> > use cases.
> >
> >We'll be spending lots of time on this project between now and the end of
> > the year. If list members have ideas for interesting experiments, please,
> > join in and suggest them.
> >
> >So far, we have a few nifty ideas:
> >
> >1) tie the 'extract' utility to a shell script to copy existing metadata
> > down into the filesystem
>
> what is the extract utility?
>

Extract is a utility that recognizes many different file formats and extracts 
their metadata. For example:

pvh@arroyo:~ $ /usr/bin/extract music/Audioslave\ -\ Like\ a\ Stone.mp3
format - 192 bps, 44100 hz, 4m58 stereo
resource-type - MPEG V1
mimetype - audio/mpeg
description - Audioslave: Like a Stone (final mixdown)
comment - finalmix by Francis/AndyWarh
genre - Rock
date - 2002
album - final mixdown
artist - Audioslave
title - Like a Stone
genre - 17
comment -
date - 2002C
album - final mixdownT
artist - AudioslaveT
title - Like a StoneT
pvh@arroyo:~ $


> >2) create a (very slow) shell which operates on the filesystem by xpath
> > query.
>
> can you say three paragraphs here?
>

I envision creating a simple shell in a scripting language which will operate 
on an XML representation of the filesystem which maps, in the simple case, 
directories and files such that simple xpath queries appear identical to 
normal filesystem queries.

For example, if the XML looked like this:

<home>
   <pvh>
      <music>
          <audioslave.mp3 artist="Audioslave"/>
          <radiohead.mp3 artist="Radiohead"/>
      </music>
   </pvh>
   <norbs>
      <music>
         <chuckberry.mp3 artist="Chuck Berry"/>
         <ledzeppelin.mp3 artist="Led Zeppelin"/>
      </music>
   </norbs>
</home>


You could retrieve the first <music> node via "/home/pvh/music".
The query "//music" would return a list containing the nodes in any music 
element. The query "//norbs[//@artist = 'Chuck Berry']" would only return 
norbs' node if somewhere in that node was a file with a Chuck Berry artist 
attribute.

For more information, check out any good xPath tutorial. There would have to 
be some minor modifications due to namespace limitations of XML elements, but 
the principle would remain the same.

>
> >Our first contribution will be a practical guide to installing Reiser4
> > (with metadata enabled) under Ubuntu 5.10.
>
> Oh, that sounds great, it will go onto our website under the install
> button if you are kind enough to allow that. 
>

Of course.

-p

-- 
Peter van Hardenberg (pvh@pvh.ca)
Victoria, BC, Canada

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26  0:04   ` Peter van Hardenberg
@ 2005-10-26  2:42     ` Hubert Chan
  2005-10-26 12:44       ` Peter Foldiak
  0 siblings, 1 reply; 24+ messages in thread
From: Hubert Chan @ 2005-10-26  2:42 UTC (permalink / raw)
  To: reiserfs-list

On Tue, 25 Oct 2005 17:04:55 -0700, Peter van Hardenberg <pvh@uvic.ca> said:

[...]

> I envision creating a simple shell in a scripting language which will
> operate on an XML representation of the filesystem which maps, in the
> simple case, directories and files such that simple xpath queries
> appear identical to normal filesystem queries.

Although it's not exactly the same, you may be interested in a research
proposal that I wrote for a course last year on embedding XML databases
into the filesystem namespace.  At least some of the references may be
useful.

http://www.cs.uwaterloo.ca/~hy3chan/cs741project.ps

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26  2:42     ` Hubert Chan
@ 2005-10-26 12:44       ` Peter Foldiak
  2005-10-26 16:10         ` Peter van Hardenberg
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Foldiak @ 2005-10-26 12:44 UTC (permalink / raw)
  To: Peter van Hardenberg; +Cc: reiserfs-list

Hubert Chan wrote:

>On Tue, 25 Oct 2005 17:04:55 -0700, Peter van Hardenberg <pvh@uvic.ca> said:
>  
>
>>I envision creating a simple shell in a scripting language which will
>>operate on an XML representation of the filesystem which maps, in the
>>simple case, directories and files such that simple xpath queries
>>appear identical to normal filesystem queries.
>>    
>>
>
>Although it's not exactly the same, you may be interested in a research
>proposal that I wrote for a course last year on embedding XML databases
>into the filesystem namespace.  At least some of the references may be
>useful.
>
>http://www.cs.uwaterloo.ca/~hy3chan/cs741project.ps
>
Also take a look at the part of a (record-breakingly long?) thread "file 
as a directory" on this list (also copied to lkml) near and after:

http://www.ussg.iu.edu/hypermail/linux/kernel/0411.3/0044.html

There, I suggested that file selection should be unified with 
part-of-file selection using XPath-like syntax.

To do what you are suggesting efficiently would be really nice. To do it 
inefficiently (with a shell script) may still be interesting, possibly 
even useful. But as XPath was originally inteded for selection inside 
XML files, it would also be nice if (at least for your XML files) you 
could select inside your files using a unified syntax!
 Peter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 12:44       ` Peter Foldiak
@ 2005-10-26 16:10         ` Peter van Hardenberg
  2005-10-26 16:43           ` Chester R. Hosey
  2005-10-26 21:04           ` Nate Diller
  0 siblings, 2 replies; 24+ messages in thread
From: Peter van Hardenberg @ 2005-10-26 16:10 UTC (permalink / raw)
  To: reiserfs-list

On October 26, 2005 05:44 am, you wrote:
> Also take a look at the part of a (record-breakingly long?) thread "file
> as a directory" on this list (also copied to lkml) near and after:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0411.3/0044.html
>
> There, I suggested that file selection should be unified with
> part-of-file selection using XPath-like syntax.
>
> To do what you are suggesting efficiently would be really nice. To do it
> inefficiently (with a shell script) may still be interesting, possibly
> even useful. But as XPath was originally inteded for selection inside
> XML files, it would also be nice if (at least for your XML files) you
> could select inside your files using a unified syntax!
>  Peter

The objections raised about this on the LKML are quite valid. I could see 
there being value in this kind of access, and an "XML" plugin or a 
"dictionary file" plugin could be useful, given sufficient time to mature and 
address issues of stability and security. 

Personally, I LIKE the bytestream. I think it is a sensible enough 
building-block for abstraction. I imagine the filesystem as a tree with named 
branches that has streams hanging off it like Christmas ornaments.

Ontologically, this is a nice simplification. Every node has the potential to 
be a file, no node has the requirement of being a file. The names are managed 
independently from the streams. Further, keeping the contents of streams 
opaque in the general case makes sense to me. Streams are already both simple 
and flexible, and a mechanism already exists for putting assigning a unique 
namespace to data.

In fact, instead of creating XML files with a plugin, I would personally try 
splitting them into many small files and write a userspace DOM-library that 
maps calls like "node.getChildren" to calls like "readdir(NODE)". So: instead 
of XMLfile/entry/@attribute parsing an XML file, there would actually be 
files in there.

Although I freely acknowledge my inexperience, I believe the real problems are 
related to graph traversal algorithms. Linus has commented on the obvious 
hardlink issues. I imagine there are more gremlins lurking in the shadows on 
this one. Garbage collectors have largely given up on reference counting, a 
luxury afforded by blazingly fast access to small amounts of storage. I am 
not particularly up on the research though.

Also: I still have not been able to USE files as directories. Yes, I can reach 
file/..../, but that is only one special case. I can crash things like it was 
going out of style by playing with these file-directories. Does nobody have 
any experience with this? What kind of work is it going to take me to get 
this going?

-pvh

-- 
Peter van Hardenberg (pvh@pvh.ca)
Victoria, BC, Canada

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 16:10         ` Peter van Hardenberg
@ 2005-10-26 16:43           ` Chester R. Hosey
  2005-10-26 17:12             ` Hans Reiser
                               ` (2 more replies)
  2005-10-26 21:04           ` Nate Diller
  1 sibling, 3 replies; 24+ messages in thread
From: Chester R. Hosey @ 2005-10-26 16:43 UTC (permalink / raw)
  To: reiserfs-list

Peter van Hardenberg wrote:
> Although I freely acknowledge my inexperience, I believe the real problems are 
> related to graph traversal algorithms. Linus has commented on the obvious 
> hardlink issues. I imagine there are more gremlins lurking in the shadows on 
> this one. Garbage collectors have largely given up on reference counting, a 
> luxury afforded by blazingly fast access to small amounts of storage. I am 
> not particularly up on the research though.

Just a suggestion from the uninformed peanut gallery...

Hans already plans on having a repacker, which will run incrementally in
the background. Might it make sense to do incremental GC, possibly even
in combination with the repacker's traversal of the disk?

Chet


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 16:43           ` Chester R. Hosey
@ 2005-10-26 17:12             ` Hans Reiser
  2005-10-26 20:43             ` David Masover
  2005-10-26 22:40             ` Nate Diller
  2 siblings, 0 replies; 24+ messages in thread
From: Hans Reiser @ 2005-10-26 17:12 UTC (permalink / raw)
  To: Chester R. Hosey, Nate Diller; +Cc: reiserfs-list

Nate, please comment on your notion of having list all functionality,
and how allowing cycles could be ok under such a scenario.

Hans

Chester R. Hosey wrote:

>Peter van Hardenberg wrote:
>  
>
>>Although I freely acknowledge my inexperience, I believe the real problems are 
>>related to graph traversal algorithms. Linus has commented on the obvious 
>>hardlink issues. I imagine there are more gremlins lurking in the shadows on 
>>this one. Garbage collectors have largely given up on reference counting, a 
>>luxury afforded by blazingly fast access to small amounts of storage. I am 
>>not particularly up on the research though.
>>    
>>
>
>Just a suggestion from the uninformed peanut gallery...
>
>Hans already plans on having a repacker, which will run incrementally in
>the background. Might it make sense to do incremental GC, possibly even
>in combination with the repacker's traversal of the disk?
>
>Chet
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 16:43           ` Chester R. Hosey
  2005-10-26 17:12             ` Hans Reiser
@ 2005-10-26 20:43             ` David Masover
  2005-10-26 22:40             ` Nate Diller
  2 siblings, 0 replies; 24+ messages in thread
From: David Masover @ 2005-10-26 20:43 UTC (permalink / raw)
  To: Chester R. Hosey; +Cc: reiserfs-list

Chester R. Hosey wrote:
> Peter van Hardenberg wrote:
> 
>>Although I freely acknowledge my inexperience, I believe the real problems are 
>>related to graph traversal algorithms. Linus has commented on the obvious 
>>hardlink issues. I imagine there are more gremlins lurking in the shadows on 
>>this one. Garbage collectors have largely given up on reference counting, a 
>>luxury afforded by blazingly fast access to small amounts of storage. I am 
>>not particularly up on the research though.
> 
> 
> Just a suggestion from the uninformed peanut gallery...
> 
> Hans already plans on having a repacker, which will run incrementally in
> the background. Might it make sense to do incremental GC, possibly even
> in combination with the repacker's traversal of the disk?

You're not the first person to suggest GC instead of refcounting.  I 
still say, if at all possible, let's not let it come to that.

Try this:  I have a box which I call "the server" because it's headless 
and it does things like my one-man email operation.  It has a TV tuner 
card on it, and it has an 80 gig hard drive.

It wouldn't take a lot of TV to fill up 80 gigs.  My desktop has a 500 
gig RAID, which I use for games, my Windows install, and so on.

So, I can pull the TV from my server onto my desktop relatively easily 
-- there's a gigabit crossover between them, and NFS is fast enough. 
That way, I keep the server disk usage below 50%, even though I don't 
leave the desktop on all the time, and even though it can take awhile 
before I watch the shows I'm recording.  Even if I just choose to record 
from a particular channel for a full day, then skim through the 
recording to see if there's anything interesting.

With grabage collection, the idea is that maybe once a week, the 
repacker runs, and frees space at the same time.  In other words, if I 
delete something, I may not get the space back for most of a week.  With 
the current reference counting scheme, I get the space back immediately.

In virtual machines and such, garbage collection is fast, so it can be 
run much more frequently, even on demand -- need more RAM?  Run the 
garbage collector, flush the buffers, and you have RAM.

You can't do that with an FS, because the garbage collection would take 
insanely long, and you'd never know when it'd hit.  Kind of like lazy 
allocation, only worse.  Lazy allocation means that after awhile, my RAM 
fills up and Reiser4 decides to flush to disk, making my FS access 
unresponsive for a few seconds, sometimes 10 or 20.  It's better now, 
not sure if that's because I've got 2 gigs of RAM on my desktop instead 
of half a gig or because the new version of Reiser4 is smarter about it.

But, imagine that annoying random insane disk activity, effectively a 
few seconds of a frozen system, only you very likely have to lock the 
entire FS, and it takes several minutes or hours instead of a few 
seconds.  That's why you can't do on-disk garbage collection on demand.

Also, if you keep disk usage low, it's easier to keep things 
defragmented.  In RAM, no one cares -- use all the RAM, if it gets out 
of order, so what?  It's called "Random Access Memory" for a reason. 
And don't tell me you repack every time you collect garbage, because it 
already takes too long, and repacking would make it take longer.  And if 
you tried to do it in the same pass, you'd end up with a perfectly 
defragmented FS, except for the hundreds of tiny, randomly distributed 
holes where the recently collected garbage was.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 16:43           ` Chester R. Hosey
  2005-10-26 17:12             ` Hans Reiser
  2005-10-26 20:43             ` David Masover
@ 2005-10-26 22:40             ` Nate Diller
  2005-10-26 17:02               ` John Gilmore
  2 siblings, 1 reply; 24+ messages in thread
From: Nate Diller @ 2005-10-26 22:40 UTC (permalink / raw)
  To: Chester R. Hosey; +Cc: reiserfs-list

On 10/26/05, Chester R. Hosey <chosey@nauticom.net> wrote:
> Peter van Hardenberg wrote:
> > Although I freely acknowledge my inexperience, I believe the real problems
> are 
> > related to graph traversal algorithms. Linus has commented on the obvious
> 
> > hardlink issues. I imagine there are more gremlins lurking in the shadows
> on 
> > this one. Garbage collectors have largely given up on reference counting,
> a 
> > luxury afforded by blazingly fast access to small amounts of storage. I am
> 
> > not particularly up on the research though.
> 
> Just a suggestion from the uninformed peanut gallery...
> 
> Hans already plans on having a repacker, which will run incrementally in
> the background. Might it make sense to do incremental GC, possibly even
> in combination with the repacker's traversal of the disk?
> 
I'd reply to Hans' email directly, but in a moment of weakness he top-posted.  Anyway, I don't subscribe to this list from my namesys account.

Ok, the garbage collection problem.  Your suggestion is insightful for an important reason, although it turns out not to solve the problem, it actually helps illustrate it quite well.

File Locality
  There are a few schools of thought on how to group data that is not formally structured (ie, not a relational database).  One is to use temporal locality, meaning group things together which are created close together in time.  Another is locality of reference, which is optimal if you can get it.  Unfortunately, it's difficult to keep track of which files are accessed close together in time, without incurring lots of overhead.  It's also impossible to know in advance, so it can really only be used in a repacker.
  Hans chose to use semantic locality to group data in the on-disk tree, meaning that files are grouped based on their name, and their parent directory.  This is done by a key assignment plugin, which basically aligns everything in the tree by its position alphabetically (this is the purpose of the firbration plugin) withinin its parent directory.  This means that if you loop over the results of readdir() reading each file, you will traverse through part of the tree incrementally, eliminating disk seeks and speeding up access immensely.
  Since the disk is basically a static array of blocks, and the whole tree obviously cannot be cached at once, the tree sometimes has to get divided up when it is written to disk.  If this happens a lot, traversing the tree in order can actually introduce a lot of seeks, jumping around to the various peices of the tree, rather than none like I just described.  This is the problem which the repacker solves, its job is to "defragment" the tree itself, and since there is only one tree, once it is done the disk's free space will also be consolidated.

Graphs
  This semantic locality scheme doesn't perfectly map to the posix namespace, since posix allows for hard links.  A directory tree with hard links is not precisely a tree, it is actually a directed graph, or digraph.  In order to make management of the namespace easier, however, there is a limitation that directories cannot be hard linked, meaning that the namespace is actually an acyclic directed graph, or ADG.
  Even with an ADG there is the problem that semantic locality does not correspond to in-tree (on-disk) locality when a file has more than one hard link, since it only exists in one place on disk but more than one place in the namespace.  In practice this poses no real issues, but it makes it difficult to imagine a repacker that could track down all the parent directories of a particular file, without lots of overhead.

File-as-Directory
  The issue with file-as-directory (FaD) is that it removes the Acyclic property of the namespace graph.  This is because anything which contains file data can be hard-linked, even if that item is also a directory.  It would be unreasonable to discard hard links entirely, they are quite useful and would be much more useful, in fact, with FaD enabled.  So the new namespace would be a directed graph, with cycles.
  Since unix operating systems are responsible for deleting unused data in file systems (garbage collection), any algorithm used for that purpose has to satisfy strict requirements, for CPU and memory use, but more importantly for reliability.  The algorithm must not fail the deletion unless the system is OOM, and it must always free the file's resources.  Reference counting has always been used for this task.
  It's been awhile since I took graph theory, and I got a C in it anyway, but the algorithms I have seen for determining graph connectivity end up traversing to every node in the graph in the worst case.  This means that the file system would potentially have to read in the directory data for every link to every file in the system, for a single deletion operation.

my proposal
  First I should note that this is my personal suggestion, and it is not endorsed by Hans in any way.
  I have always had a strange view of the role of the OS, and in my view, there is no reason that the kernel should be required perform garbage collection for the file system.  Rather, a program should be able to do garbage collection for itself, or delegate the task to a daemon.  The program simply needs to be able to get a list of all the nodes (files, ie. inodes) in the system, and be able to instruct the kernel to delete them if they are unused.
  Unfortunately, I've run out of time, and I'll have to articulate my idea in detail in another email.  It's not the easy way to do it; I don't think that this can work in Linux (or Reiser4) in the near future, so no need for flames on that account.  There are also a number of practical problems to address, such as whether inode numbers are persistant, but I'm confident that this is the long-term solution.

NATE

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 22:40             ` Nate Diller
@ 2005-10-26 17:02               ` John Gilmore
  2005-10-27  0:55                 ` Hubert Chan
                                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: John Gilmore @ 2005-10-26 17:02 UTC (permalink / raw)
  To: reiserfs-list

On Wednesday 26 October 2005 22:40, Nate Diller wrote:
> File-as-Directory
>   The issue with file-as-directory (FaD) is that it removes the Acyclic
> property of the namespace graph.  This is because anything which contains
> file data can be hard-linked, even if that item is also a directory.  It
> would be unreasonable to discard hard links entirely, they are quite useful
> and would be much more useful, in fact, with FaD enabled.  So the new
> namespace would be a directed graph, with cycles. Since unix operating
> systems are responsible for deleting unused data in file systems (garbage
> collection), any algorithm used for that purpose has to satisfy strict
> requirements, for CPU and memory use, but more importantly for reliability.
>  The algorithm must not fail the deletion unless the system is OOM, and it
> must always free the file's resources.  Reference counting has always been
> used for this task. It's been awhile since I took graph theory, and I got a
> C in it anyway, but the algorithms I have seen for determining graph
> connectivity end up traversing to every node in the graph in the worst
> case.  This means that the file system would potentially have to read in
> the directory data for every link to every file in the system, for a single
> deletion operation.

If the issue is really the matter of removing the acyclic property, wouldn't 
that be solved by requiring that the file contains no non-dynamically 
generated subdirectories? That way, the graph is still acyclic, making 
reference counting work again.

I had understood that a big part of the issue with file-as-directory was the 
same as the issue with hard links to directories. Which I thought is that if 
you move one directory into another, you can lose the connection to the root 
of the filesystem -- I.E. ../../.. etc is no longer guaranteed to get you 
to / -- And of course this also means that there is no way to get from / to 
your freshly moved files, and you've just lost a chunk of disk space to 
complete inaccessibility. fsck would have to be made smarter specifically to 
be able to figure that one out, too.

Forbiding subdirectories in file-as-directory solves that problem too, because 
a normal file cannot be moved into anothers' file-as-directory. You could 
make it lose its meta-data, I suppose, but that's potentially lossy, which mv 
isn't allowed to be.

Actually, when I had first read about file-as-directory, I had assumed that 
the content was either dynamically generated from the on-disk metadata (like 
uid, gid, etc) or stored as a "sideband" type stream in the file itself, like 
some of the MAC OS file systems (and others) do, or generated via a plugin 
connecting to user-space, for ID3 tags on mp3 files and other information 
which can easily be obtained from the file itself.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 17:02               ` John Gilmore
@ 2005-10-27  0:55                 ` Hubert Chan
  2005-10-27  6:49                 ` Peter van Hardenberg
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Hubert Chan @ 2005-10-27  0:55 UTC (permalink / raw)
  To: reiserfs-list

On Wed, 26 Oct 2005 17:02:06 +0000, John Gilmore <jgilmore@glycou.com> said:

> On Wednesday 26 October 2005 22:40, Nate Diller wrote:
>> File-as-Directory
>> The issue with file-as-directory (FaD) is that it removes the Acyclic
>> property of the namespace graph.

More precisely, it removes an easy criterion which prevents directed
cycles.  There are other ways to prevent directed cycles, but preventing
directories from being hard-linked is an easy way to do it.

One way, which I had suggested before, and I had promised to work out
some of the nasty details, but haven't had time to yet (it's still on my
todo list), is to keep parent pointers for all nodes.  Whenever you
create a new hardlink, you walk up the tree and see if that new hardlink
will create a cycle.  While this *may* require, in the worst case, a
traversal of the entire tree, it is highly unlikely to do so, under most
peoples' file organization scheme.  (Hard links are rare, and very deep
and narrow hierarchies are rare too.)

[...]

>> It's been awhile since I took graph theory, and I got a C in it
>> anyway, but the algorithms I have seen for determining graph
>> connectivity end up traversing to every node in the graph in the
>> worst case.

That is correct.  Any algorithm for determining graph connectivity, or
checking for directed cycles, by necessity has to traverse the entire
graph in the worst case.  So you definitely don't want to restart your
algorithm from scratch on every link/unlink operation.  The best that
you could hope for is that you can maintain a separate data structure
that you update on every link/unlink operation, which allows you to
easily determine whether the graph is connected, or if it has a directed
cycle.  But from my own studies in this area, it doesn't seem very
likely that such a data structure can me easily maintained.

>> This means that the file system would potentially have to read in the
>> directory data for every link to every file in the system, for a
>> single deletion operation.

> If the issue is really the matter of removing the acyclic property,
> wouldn't that be solved by requiring that the file contains no
> non-dynamically generated subdirectories? That way, the graph is still
> acyclic, making reference counting work again.

I think we would still want to have hardlinks to files that contain
dynamically generated subdirectories; one of the purposes for
file-as-dir is to be able to attach arbitrary metadata to a file.

But yes, preventing hardlinks to files that contain non-dynamically
generated subdirectories would ensure that there are no directed cycles
(as long as we are sure that the dynamically-generated subdirectories
don't do stupid things).

> I had understood that a big part of the issue with file-as-directory
> was the same as the issue with hard links to directories. Which I
> thought is that if you move one directory into another, you can lose
> the connection to the root of the filesystem

I think that you can run into this problem even without (multiple) hard
links to directories.  And I believe this is solved by making sure that
when you move directory A into directory B, you check the pathname to
make sure that A isn't a prefix of B.

I believe that as long as you check that, and ensure that the graph has
no directed cycles, and never unlink a non-empty directory (or rather a
directory with no non-dynamically generated subdirectories), you won't
have to worry about losing the connection to the root.

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 17:02               ` John Gilmore
  2005-10-27  0:55                 ` Hubert Chan
@ 2005-10-27  6:49                 ` Peter van Hardenberg
  2005-10-27 11:17                   ` David Masover
  2005-10-27  8:44                 ` Hans Reiser
  2005-10-27 12:05                 ` Alexander G. M. Smith
  3 siblings, 1 reply; 24+ messages in thread
From: Peter van Hardenberg @ 2005-10-27  6:49 UTC (permalink / raw)
  To: reiserfs-list

On October 26, 2005 10:02 am, John Gilmore wrote:
> Actually, when I had first read about file-as-directory, I had assumed that
> the content was either dynamically generated from the on-disk metadata
> (like uid, gid, etc) or stored as a "sideband" type stream in the file
> itself, like some of the MAC OS file systems (and others) do, or generated
> via a plugin connecting to user-space, for ID3 tags on mp3 files and other
> information which can easily be obtained from the file itself.

And I thought the whole idea was to unify the namespace and make things like 
ID3 tags obsolete...

-pvh

-- 
Peter van Hardenberg (pvh@pvh.ca)
Victoria, BC, Canada

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27  6:49                 ` Peter van Hardenberg
@ 2005-10-27 11:17                   ` David Masover
  2005-10-27 19:20                     ` Peter van Hardenberg
  0 siblings, 1 reply; 24+ messages in thread
From: David Masover @ 2005-10-27 11:17 UTC (permalink / raw)
  To: Peter van Hardenberg; +Cc: reiserfs-list

Peter van Hardenberg wrote:
> On October 26, 2005 10:02 am, John Gilmore wrote:
> 
>>Actually, when I had first read about file-as-directory, I had assumed that
>>the content was either dynamically generated from the on-disk metadata
>>(like uid, gid, etc) or stored as a "sideband" type stream in the file
>>itself, like some of the MAC OS file systems (and others) do, or generated
>>via a plugin connecting to user-space, for ID3 tags on mp3 files and other
>>information which can easily be obtained from the file itself.
> 
> 
> And I thought the whole idea was to unify the namespace and make things like 
> ID3 tags obsolete...

The two are not mutually exclusive.  You unify the namespace, and use
that to access things like ID3 tags.  Of course, eventually ID3 tags
become obsolete, and the information is instead stored outside of the
file itself, as a separate stream (treated as a file).  You'd have a
standard way of serializing any given file and all its metadata, so that
"something like id3" doesn't have to be re-invented for every file type
that has metadata, and so that similar metadata can be accessed through
a standard mechanism -- searching for a particular artist should return
songs (using id3 tags) and music videos (using the mpeg equivalent) and
maybe even song lyrics (using separate metadata).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27 11:17                   ` David Masover
@ 2005-10-27 19:20                     ` Peter van Hardenberg
  2005-10-27 20:44                       ` Jonathan Briggs
  0 siblings, 1 reply; 24+ messages in thread
From: Peter van Hardenberg @ 2005-10-27 19:20 UTC (permalink / raw)
  To: reiserfs-list

On October 27, 2005 04:17 am, David Masover wrote:
> Peter van Hardenberg wrote:
> > On October 26, 2005 10:02 am, John Gilmore wrote:
> >
> > And I thought the whole idea was to unify the namespace and make things
> > like ID3 tags obsolete...
>
> The two are not mutually exclusive.  You unify the namespace, and use
> that to access things like ID3 tags.  Of course, eventually ID3 tags
> become obsolete, and the information is instead stored outside of the
> file itself, as a separate stream (treated as a file).  You'd have a
> standard way of serializing any given file and all its metadata, so that
> "something like id3" doesn't have to be re-invented for every file type
> that has metadata, and so that similar metadata can be accessed through
> a standard mechanism -- searching for a particular artist should return
> songs (using id3 tags) and music videos (using the mpeg equivalent) and
> maybe even song lyrics (using separate metadata).

It's much easier, more extensible, and more secure to create a utility which 
ties together a number of userspace metadata libraries to create static files 
than to move them all down into kernel space. I feel plugins providing 
pseudofiles should only be used when there is no viable alternative.

-- 
Peter van Hardenberg (pvh@pvh.ca)
Victoria, BC, Canada

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27 19:20                     ` Peter van Hardenberg
@ 2005-10-27 20:44                       ` Jonathan Briggs
  0 siblings, 0 replies; 24+ messages in thread
From: Jonathan Briggs @ 2005-10-27 20:44 UTC (permalink / raw)
  To: Peter van Hardenberg; +Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

On Thu, 2005-10-27 at 12:20 -0700, Peter van Hardenberg wrote:
> On October 27, 2005 04:17 am, David Masover wrote:
> > Peter van Hardenberg wrote:
> > > On October 26, 2005 10:02 am, John Gilmore wrote:
> > >
> > > And I thought the whole idea was to unify the namespace and make things
> > > like ID3 tags obsolete...
> >
> > The two are not mutually exclusive.  You unify the namespace, and use
> > that to access things like ID3 tags.  Of course, eventually ID3 tags
> > become obsolete, and the information is instead stored outside of the
> > file itself, as a separate stream (treated as a file).  You'd have a
> > standard way of serializing any given file and all its metadata, so that
> > "something like id3" doesn't have to be re-invented for every file type
> > that has metadata, and so that similar metadata can be accessed through
> > a standard mechanism -- searching for a particular artist should return
> > songs (using id3 tags) and music videos (using the mpeg equivalent) and
> > maybe even song lyrics (using separate metadata).
> 
> It's much easier, more extensible, and more secure to create a utility which 
> ties together a number of userspace metadata libraries to create static files 
> than to move them all down into kernel space. I feel plugins providing 
> pseudofiles should only be used when there is no viable alternative.

And with transactions, it can be safe against becoming out of sync
(which is one of the arguments for doing it in-kernel).

Scenario:
        Reiser4 detects a file update about to happen.
        Transaction opens.  Changes are now invisible outside
        transaction.
                File contents are updated.  
                User space helper is called.
                Metadata files are updated.
                User space helper exits.
        Transaction closes.  File and metadata updates are now visible.
-- 
Jonathan Briggs <jbriggs@esoft.com>
eSoft, Inc.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 17:02               ` John Gilmore
  2005-10-27  0:55                 ` Hubert Chan
  2005-10-27  6:49                 ` Peter van Hardenberg
@ 2005-10-27  8:44                 ` Hans Reiser
  2005-10-27 12:05                 ` Alexander G. M. Smith
  3 siblings, 0 replies; 24+ messages in thread
From: Hans Reiser @ 2005-10-27  8:44 UTC (permalink / raw)
  To: John Gilmore; +Cc: reiserfs-list

John Gilmore wrote:

>On Wednesday 26 October 2005 22:40, Nate Diller wrote:
>  
>
>>File-as-Directory
>>  The issue with file-as-directory (FaD) is that it removes the Acyclic
>>property of the namespace graph.  This is because anything which contains
>>file data can be hard-linked, even if that item is also a directory.  It
>>would be unreasonable to discard hard links entirely, they are quite useful
>>and would be much more useful, in fact, with FaD enabled.  So the new
>>namespace would be a directed graph, with cycles. Since unix operating
>>systems are responsible for deleting unused data in file systems (garbage
>>collection), any algorithm used for that purpose has to satisfy strict
>>requirements, for CPU and memory use, but more importantly for reliability.
>> The algorithm must not fail the deletion unless the system is OOM, and it
>>must always free the file's resources.  Reference counting has always been
>>used for this task. It's been awhile since I took graph theory, and I got a
>>C in it anyway, but the algorithms I have seen for determining graph
>>connectivity end up traversing to every node in the graph in the worst
>>case.  This means that the file system would potentially have to read in
>>the directory data for every link to every file in the system, for a single
>>deletion operation.
>>    
>>
>
>If the issue is really the matter of removing the acyclic property, wouldn't 
>that be solved by requiring that the file contains no non-dynamically 
>generated subdirectories?
>
More precisely, that a file with two hard links contains no
non-dynamically generated subdirectories.    Hmmm.  Yes, that works I
think.... 

> That way, the graph is still acyclic, making 
>reference counting work again.
>
>I had understood that a big part of the issue with file-as-directory was the 
>same as the issue with hard links to directories. Which I thought is that if 
>you move one directory into another, you can lose the connection to the root 
>of the filesystem -- I.E. ../../.. etc is no longer guaranteed to get you 
>to / -- And of course this also means that there is no way to get from / to 
>your freshly moved files, and you've just lost a chunk of disk space to 
>complete inaccessibility. fsck would have to be made smarter specifically to 
>be able to figure that one out, too.
>
>Forbiding subdirectories in file-as-directory solves that problem too, because 
>a normal file cannot be moved into anothers' file-as-directory. You could 
>make it lose its meta-data, I suppose, but that's potentially lossy, which mv 
>isn't allowed to be.
>
>Actually, when I had first read about file-as-directory, I had assumed that 
>the content was either dynamically generated from the on-disk metadata (like 
>uid, gid, etc) or stored as a "sideband" type stream in the file itself, like 
>some of the MAC OS file systems (and others) do, or generated via a plugin 
>connecting to user-space, for ID3 tags on mp3 files and other information 
>which can easily be obtained from the file itself.
>
>
>
>  
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 17:02               ` John Gilmore
                                   ` (2 preceding siblings ...)
  2005-10-27  8:44                 ` Hans Reiser
@ 2005-10-27 12:05                 ` Alexander G. M. Smith
  2005-10-27 12:41                   ` John Gilmore
  2005-10-27 16:40                   ` Hans Reiser
  3 siblings, 2 replies; 24+ messages in thread
From: Alexander G. M. Smith @ 2005-10-27 12:05 UTC (permalink / raw)
  To: John Gilmore; +Cc: reiserfs-list

John Gilmore wrote on Wed, 26 Oct 2005 17:02:06 +0000:
> I had understood that a big part of the issue with file-as-directory was the 
> same as the issue with hard links to directories. Which I thought is that if 
> you move one directory into another, you can lose the connection to the root 
> of the filesystem -- I.E. ../../.. etc is no longer guaranteed to get you 
> to / -- And of course this also means that there is no way to get from / to 
> your freshly moved files, and you've just lost a chunk of disk space to 
> complete inaccessibility. fsck would have to be made smarter specifically to 
> be able to figure that one out, too.

The file move operation has to check that it doesn't break the graph into
two graphs, thus disconnecting some files from the root.  Or you can think
of it as being a way of deleting a whole subgraph of files all at once,
kind of like an rmdir that works better than usual :-)

Speaking of connecting to the root, one concept I found useful was to have
a "True Path" to the root.  One of the hard links to a file / directory is
considered to be the main one and the rest are auxiliary (easier done if
each file/dir stores a list of parents).  The main one is guaranteed to
lead to the root (a recursive property) and is used for ".." in directories
and the equivalent operation for files.  Then when you want a canonical
path, you build it by following the main parents up to the root.

The "True Path" comes in useful for faking hard links for POSIX compatibility
by misrepresenting them as symbolic links using the dynamically generated true
path string.  As well, if you remove a hard link and it wasn't the main parent
then you don't have to do any graph traversal to fix up things; all items
will still have a valid link to the root through their unchanged main parent.

- Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27 12:05                 ` Alexander G. M. Smith
@ 2005-10-27 12:41                   ` John Gilmore
  2005-10-28 12:29                     ` Alexander G. M. Smith
  2005-10-27 16:40                   ` Hans Reiser
  1 sibling, 1 reply; 24+ messages in thread
From: John Gilmore @ 2005-10-27 12:41 UTC (permalink / raw)
  To: reiserfs-list

On Thursday 27 October 2005 12:05, Alexander G. M. Smith wrote:
> John Gilmore wrote on Wed, 26 Oct 2005 17:02:06 +0000:
> > I had understood that a big part of the issue with file-as-directory was
> > the same as the issue with hard links to directories. Which I thought is
> > that if you move one directory into another, you can lose the connection
> > to the root of the filesystem -- I.E. ../../.. etc is no longer
> > guaranteed to get you to / -- And of course this also means that there is
> > no way to get from / to your freshly moved files, and you've just lost a
> > chunk of disk space to complete inaccessibility. fsck would have to be
> > made smarter specifically to be able to figure that one out, too.
>
> The file move operation has to check that it doesn't break the graph into
> two graphs, thus disconnecting some files from the root.  Or you can think
> of it as being a way of deleting a whole subgraph of files all at once,
> kind of like an rmdir that works better than usual :-)
>
> Speaking of connecting to the root, one concept I found useful was to have
> a "True Path" to the root.  One of the hard links to a file / directory is
> considered to be the main one and the rest are auxiliary (easier done if
> each file/dir stores a list of parents).  The main one is guaranteed to
> lead to the root (a recursive property) and is used for ".." in directories
> and the equivalent operation for files.  Then when you want a canonical
> path, you build it by following the main parents up to the root.
>
> The "True Path" comes in useful for faking hard links for POSIX
> compatibility by misrepresenting them as symbolic links using the
> dynamically generated true path string.  As well, if you remove a hard link
> and it wasn't the main parent then you don't have to do any graph traversal
> to fix up things; all items will still have a valid link to the root
> through their unchanged main parent.

I'm getting confused. So forgive my if I become incoherent in my attempts to 
understand...

<rambling>

I'm failing to see the difference between a "true path" and a "not guaranteed 
true path" -- Don't all paths (that point "upwards") have to lead to root 
eventually? Hrm... Maybe you mean that an "upward path" (also called a "back 
reference" no?) is the one that is guaranteed to not lead into a cycle? Or at 
least to lead back out... So this "True Path" is a short name for "shortest 
path to root?" 

Or maybe you're storing only one parent pointer, and it's the "True Path" - 
but thats very expensive to update when the "true" reference is unlinked.

Or maybe you are talking about a backreference that isn't updated -- for 
efficiency reasons? Leading to stale backreferences, which of course would be 
more efficient only on delete, and then you'd still have to... Nevermind, no 
gains here.

I don't see a way to (safely) move (multiple hard-linked) directories around 
without requiring that the new connection to root be atomically verified. And 
that requires parent pointers. And multiple parent pointers, which in turn 
makes verifying the connection to root complex, and a "True Path" concept 
might be usefull. But it might better be called a 'guaranteed path'. But it's 
then just one way of maintaining some information to help you decide which of 
several parents you're going to ascend through when verifying a new 
connection to root.

Maybe there a better way to safely move (and delete, etc) multiple hard-linked 
directories?

Maybe some more complex method - some rule to insure the graph stays acyclic? 
But there's nothing inherently wrong with cycles, they are just hard to deal 
with.

</Rambling>

Wouldn't requiring parent pointers (and multiple parent pointers, at that) 
require that updates be done in (at least) two places every time that you did 
any linking/unlinking/link-changing? And wouldn't that kill performance 
pretty badly?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27 12:41                   ` John Gilmore
@ 2005-10-28 12:29                     ` Alexander G. M. Smith
  0 siblings, 0 replies; 24+ messages in thread
From: Alexander G. M. Smith @ 2005-10-28 12:29 UTC (permalink / raw)
  To: John Gilmore; +Cc: reiserfs-list

John Gilmore wrote on Thu, 27 Oct 2005 12:41:50 +0000:
> I'm failing to see the difference between a "true path" and a "not guaranteed 
> true path" -- Don't all paths (that point "upwards") have to lead to root 
> eventually? Hrm... Maybe you mean that an "upward path" (also called a "back 
> reference" no?) is the one that is guaranteed to not lead into a cycle? Or at 
> least to lead back out... So this "True Path" is a short name for "shortest 
> path to root?"

This is in a file system that allows multiple parents - also known as hard
links to directories.  If you happen to repeatedly go up the wrong parents,
you can get stuck in a cycle.

> Or maybe you're storing only one parent pointer, and it's the "True Path" - 
> but thats very expensive to update when the "true" reference is unlinked.

Hans Reiser wrote on Thu, 27 Oct 2005 09:40:09 -0700:
> What do you do when unlinking the true path parent and there remain
> links to the others, do you check for connection to root then during unlink?

That's the expensive part.  Picking the new true path parent when the old
one is deleted can involve a traversal of the whole file system (if a link
near the root is affected).  At least all the descendants of the unlinked
directory will have to be processed (except for ordinary files - they're
inert).

> Wouldn't requiring parent pointers (and multiple parent pointers, at that) 
> require that updates be done in (at least) two places every time that you did 
> any linking/unlinking/link-changing? And wouldn't that kill performance 
> pretty badly?

Right - both the parent directory and the affected children need to be updated,
even regular hard link creation or file creation require two disk writes.  One
affecting the directory's data (list of children) and the other is in the target
file's inode (list of parents).  Regular performance will be less or the same
(creating a file in an ordinary file system requires writes to the directory
and to the file's inode anyway).  Only delete / rename performance can
occasionally be horrible.

Guess I should replay the AGMS link documentation (Alias Generated Morphing
Symbolic links, morphing since they appear as dynamically changing symbolic
links under POSIX though they are really hard links - otherwise "ls -R" and
other tools break).  Of note is the algorithm for doing delete updating at
the end of the listing.

- Alex

Because the presence of AGMS links allows for cycles in the file name tree
(turning it into a directed graph), we need graph traversal algorithms for
ensuring that all directories (except the root) always have a valid main
parent directory.  Another desired feature is that errors (such as out of
memory) during a rename or delete leave the file system unchanged.  This
leads to a transaction type of approach, where the code figures out all the
work that needs to be done, allocates items needed (such as strings for new
names), then goes and implements the changes in a way which avoids doing
anything that could fail.  If it fails at any step before the final
implementation step, it just frees what it had allocated and exits.

Most operations (create, stat, write, etc) don't involve changes that affect
more than one or two things in the file system, so they aren't covered here.
So only deletion and renaming need this fancy treatment.  Renaming because
rename a/b x/y unlinks b from directory a, unlinks y from directory x (maybe
deletes it too if it doesn't have any other parents) and adds a link in
directory x to b.

Deletion (actually, it's more like unlink; deletion of the object doesn't
happen until the last parent goes) of a directory can disconnect a
ring/cycle of subdirectories from the rest of the file system.  An example
of this is a tree of directories A/B/C with an AGMS link named D inside C
that goes back to B. B is essentially a subdirectory of directories A and C,
with B's main parent directory ".." leading to A and B's extra parent
directory "..." leading to C.

    A _      A has B as a subdirectory, ordinary (hard link) connection.
    |/ \
    B  |     B has C as a subdirectory.
    |  ^
    C  |     C contains an AGMS link named D.
    |  ^
   (D) |     D is an AGMS link pointing to B, () to show it is an AGMS link.
    \__/

Now remove the A-B link.  As part of that operation, C is promoted to be the
new main parent directory of B, and the now useless AGMS link D is
deallocated.

  A   _
     / \
    B  |     B has C as a subdirectory.
    |  ^
    C  |     C has B as a subdirectory.
    \__/

B now has C as a parent and C has B as a parent, so they won't get
deallocated because they still have parents.  However, there's no way of
accessing the B/C directories from anywhere else in the file system, or for
getting from B/C to the root via ".." operations.  We want to detect that.

As well you can unlink a directory which formerly provided the current path
to the root, leaving some other directories without a series of ".."
operations leading to the root.  Their main parent directories need to be
updated.  For example in directory A add an AGMS link X pointing to
directory C and then unlink B from A (cd A ; ln -d B/C X ; rmdir B).  B's
main parent becomes C (via AGMS link D which gets merged with B since you
can't have an AGMS link as the main parent directory).  Also the main parent
of C becomes A (via merging with X), since it has a connection to the root
while B just uselessly circles around to C again.  Watch out when merging a
directory with an AGMS link - there may be other AGMS links pointing to the
former AGMS link, which need to be fixed up to point to the directory.
Finally, C used to be inside B but when it switched its main parent
directory from B to A, it leaves behind a new AGMS link named Z in directory
B.

Initial situation:

   A   _      A has B as a subdirectory, and contains AGMS link X.
   /\ / \
 (X) B  |     B has C as a subdirectory.  Main parent is A.
   \ |  ^     X is an AGMS link in A which points to C.  Main parent A.
     C  |     C contains an AGMS link named D.  Main parent B.
     |  ^
    (D) |     D is an AGMS link pointing to B.  Main parent C.
     \__/

After "rmdir B":

   A          A just contains directory C.
   |   _
    \ / \
     C  |     C is a directory which contains B.  Main parent A.
     |  |
     B  |     B is a directory which contains AGMS link Z.
     |  ^
    (Z) |     Z is an AGMS link to C.
     \__/

It's actually not that simple, since when something merges with an AGMS
link, it takes on the name of the link.  Also, when a placeholder is left
behind, it has the old name of the thing that used to be there.  The actual
names would be:

   A          A just contains directory X (formerly an AGMS link).
   |   _
    \ / \
     X  |     X is a directory (formerly C), contains D.  Main parent is A.
     |  |
     D  |     D is a directory (formerly B) which contains AGMS link C.
     |  ^
    (C) |     C is a left behind placeholder AGMS link to X.
     \__/

/Test:
drwxrwxrwx   0 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   1 agmsmith agmsmith        0 Mar 17 11:42 ..
drwxrwxrwx   1 agmsmith agmsmith        2 Mar 17 11:42 a
/Test/a:
drwxrwxrwx   1 agmsmith agmsmith        2 Mar 17 11:42 .
drwxrwxrwx   0 agmsmith agmsmith        1 Mar 17 11:42 ..
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 b
lrwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:43 x -> /Test/a/b/c
/Test/a/b:
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   1 agmsmith agmsmith        2 Mar 17 11:42 ..
lrwxrwxrwx   0 agmsmith agmsmith        2 Mar 17 11:42 ... -> /Test/a/b/c
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 c
/Test/a/b/c:
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 ..
lrwxrwxrwx   0 agmsmith agmsmith        2 Mar 17 11:42 ... -> /Test/a
lrwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:43 d -> /Test/a/b

And after rmdir /Test/a/b the listing is:

/Test:
drwxrwxrwx   0 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   1 agmsmith agmsmith        0 Mar 17 11:42 ..
drwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:42 a
/Test/a:
drwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   0 agmsmith agmsmith        1 Mar 17 11:42 ..
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 x
/Test/a/x:
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:42 ..
lrwxrwxrwx   0 agmsmith agmsmith        2 Mar 17 11:42 ... -> /Test/a/x/d
drwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:42 d
/Test/a/x/d:
drwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:42 .
drwxrwxrwx   2 agmsmith agmsmith        1 Mar 17 11:42 ..
lrwxrwxrwx   1 agmsmith agmsmith        1 Mar 17 11:44 c -> /Test/a/x

The detection of cycles and fixing up of main parent directories is done in
three steps:

Step 1.
  Find all children of the unlinked objects.  This is done by having a list
  of objects needing to be examined, initialised with the one or two objects
  being unlinked.  Then while the pending examination list isn't empty, an
  object is removed from the pending list and added to the list of children
  found, and its children (just directory or AGMS link type children,
  including ones through a removed link but not through the new added link)
  are added to the pending examination list.

Step 2.
  Find the new main parent directory or AGMS link for each affected item.
  Start by adding all children found in step 1 to an orphan list.  Then
  while the orphan list isn't empty, repeatedly scan through it for an
  object that has some parent directory (excluding parents through removed
  links but including parents through the new added link) or parent AGMS
  link which isn't in also in the orphan list.  Mark that parent as the new
  main parent and remove the object from the orphan list (yes, other orphans
  can now use the object as a main parent if they are children), and resume
  scanning at the next orphan.  If the scan gets to the end of the list,
  restart scanning at the beginning.  If the scan goes from beginning to end
  of the non-empty list without finding any parents then we have a
  disconnected cycle, and an error code needs to be returned to the user.

Step 3.
  Change the main parents to the new choices.  For each item being
  re-parented, create a new AGMS link to the item in the former main parent
  directory.  If an AGMS link is the new main parent, merge with the AGMS
  link (replacing the AGMS link in effect, references to the old AGMS link
  are changed to point to the re-parented item, and the re-parented item
  takes on the name of the old AGMS link).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-27 12:05                 ` Alexander G. M. Smith
  2005-10-27 12:41                   ` John Gilmore
@ 2005-10-27 16:40                   ` Hans Reiser
  1 sibling, 0 replies; 24+ messages in thread
From: Hans Reiser @ 2005-10-27 16:40 UTC (permalink / raw)
  To: Alexander G. M. Smith; +Cc: John Gilmore, reiserfs-list

Alexander G. M. Smith wrote:

>John Gilmore wrote on Wed, 26 Oct 2005 17:02:06 +0000:
>  
>
>>I had understood that a big part of the issue with file-as-directory was the 
>>same as the issue with hard links to directories. Which I thought is that if 
>>you move one directory into another, you can lose the connection to the root 
>>of the filesystem -- I.E. ../../.. etc is no longer guaranteed to get you 
>>to / -- And of course this also means that there is no way to get from / to 
>>your freshly moved files, and you've just lost a chunk of disk space to 
>>complete inaccessibility. fsck would have to be made smarter specifically to 
>>be able to figure that one out, too.
>>    
>>
>
>The file move operation has to check that it doesn't break the graph into
>two graphs, thus disconnecting some files from the root.  Or you can think
>of it as being a way of deleting a whole subgraph of files all at once,
>kind of like an rmdir that works better than usual :-)
>
>Speaking of connecting to the root, one concept I found useful was to have
>a "True Path" to the root.  One of the hard links to a file / directory is
>considered to be the main one and the rest are auxiliary (easier done if
>each file/dir stores a list of parents).  The main one is guaranteed to
>lead to the root (a recursive property) and is used for ".." in directories
>and the equivalent operation for files.  Then when you want a canonical
>path, you build it by following the main parents up to the root.
>  
>
What do you do when unlinking the true path parent and there remain
links to the others, do you check for connection to root then during unlink?

>The "True Path" comes in useful for faking hard links for POSIX compatibility
>by misrepresenting them as symbolic links using the dynamically generated true
>path string.  As well, if you remove a hard link and it wasn't the main parent
>then you don't have to do any graph traversal to fix up things; all items
>will still have a valid link to the root through their unchanged main parent.
>
>- Alex
>
>
>  
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 16:10         ` Peter van Hardenberg
  2005-10-26 16:43           ` Chester R. Hosey
@ 2005-10-26 21:04           ` Nate Diller
  2005-10-26 21:09             ` Hans Reiser
  1 sibling, 1 reply; 24+ messages in thread
From: Nate Diller @ 2005-10-26 21:04 UTC (permalink / raw)
  To: Peter van Hardenberg; +Cc: reiserfs-list

On 10/26/05, Peter van Hardenberg <pvh@uvic.ca> wrote:
> On October 26, 2005 05:44 am, you wrote:
....
> Also: I still have not been able to USE files as directories. Yes, I can
> reach
> file/..../, but that is only one special case. I can crash things like it
> was
> going out of style by playing with these file-directories. Does nobody have
>
> any experience with this? What kind of work is it going to take me to get
> this going?
>
I do have some experience with this, with code about 9 months old now.
 There are two basic problems.  One is that the code has suffered
bitrot over the past year, so it will have bugs compared to when I
played with it.  The other problem is that it causes serious confusion
to the Linux VFS, the dentry cache in particular, which has become
increasingly complex in its efforts to stop sucking so bad.  It may be
useable as a toy by fixing a few of the recent bugs I mentioned, but
it will not be compatable with Linux until someone such as myself
spends about a year working on it.  And even then, I bet it would not
be merged for another year at least.

The other way to do cool metadata stuff in Reiser4 is sys_reiser4(),
which was in a serious state of not being useable last I tried it. 
However, it's more likely work at some point, perhaps you can look
into helping finish it off, although I expect it will involve lots of
core reiser4 hacking.

My verdict: file-as-dir is dead, and will not return to the world of
the living in its current form

NATE

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-26 21:04           ` Nate Diller
@ 2005-10-26 21:09             ` Hans Reiser
  0 siblings, 0 replies; 24+ messages in thread
From: Hans Reiser @ 2005-10-26 21:09 UTC (permalink / raw)
  To: Nate Diller; +Cc: Peter van Hardenberg, reiserfs-list

Nate Diller wrote:

>On 10/26/05, Peter van Hardenberg <pvh@uvic.ca> wrote:
>  
>
>>On October 26, 2005 05:44 am, you wrote:
>>    
>>
>....
>  
>
>>Also: I still have not been able to USE files as directories. Yes, I can
>>reach
>>file/..../, but that is only one special case. I can crash things like it
>>was
>>going out of style by playing with these file-directories. Does nobody have
>>
>>any experience with this? What kind of work is it going to take me to get
>>this going?
>>
>>    
>>
>I do have some experience with this, with code about 9 months old now.
> There are two basic problems.  One is that the code has suffered
>bitrot over the past year, so it will have bugs compared to when I
>played with it.  The other problem is that it causes serious confusion
>to the Linux VFS, the dentry cache in particular, which has become
>increasingly complex in its efforts to stop sucking so bad.  It may be
>useable as a toy by fixing a few of the recent bugs I mentioned, but
>it will not be compatable with Linux until someone such as myself
>spends about a year working on it.  And even then, I bet it would not
>be merged for another year at least.
>
>The other way to do cool metadata stuff in Reiser4 is sys_reiser4(),
>which was in a serious state of not being useable last I tried it. 
>However, it's more likely work at some point, perhaps you can look
>into helping finish it off, although I expect it will involve lots of
>core reiser4 hacking.
>
>My verdict: file-as-dir is dead, and will not return to the world of
>the living in its current form
>
>NATE
>
>
>  
>



file-as-dir has bugs.  These bugs require substantial vfs fixes or vfs
sidestepping.  That will be done, but not right now, we have to get the
rest of the code into the kernel first, and then work on it.  I disagree
with Nate's characterization of things.

Hans

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Our introduction to Reiser-list
  2005-10-25 22:58 Our introduction to Reiser-list Peter van Hardenberg
  2005-10-25 23:08 ` Hans Reiser
@ 2005-10-26 21:00 ` Lares Moreau
  1 sibling, 0 replies; 24+ messages in thread
From: Lares Moreau @ 2005-10-26 21:00 UTC (permalink / raw)
  Cc: reiserfs-list

[-- Attachment #1: Type: text/plain, Size: 1753 bytes --]

I am too an amateur FS geek.
A link you might like to look at.
http://www.geocities.com/ravikiran_uvs/
It contains a Simple file system. That has helped me, and may be of
interest to you.


On Tue, 2005-10-25 at 15:58 -0700, Peter van Hardenberg wrote:
> Hello all,
> 
> I'm a student at the University of Victoria. Between myself and a few fellow 
> students we have embarked on a quest to do some experiments with the Reiser4 
> metadata system to show it off and provide some real world use cases.
> 
> We'll be spending lots of time on this project between now and the end of the 
> year. If list members have ideas for interesting experiments, please, join in 
> and suggest them.
> 
> So far, we have a few nifty ideas:
> 
> 1) tie the 'extract' utility to a shell script to copy existing metadata down 
> into the filesystem
> 
> 2) create a (very slow) shell which operates on the filesystem by xpath query. 
> doing it right would be difficult, but faking it should be easy.
> 
> We are all inexperienced with kernel hacking and will probably ask some stupid 
> questions as we go forward. Our individual interests vary and we may find 
> that we have bitten off more than we can chew, but I expect it will be an 
> interesting ride, if nothing else. 
> 
> Our first contribution will be a practical guide to installing Reiser4 (with 
> metadata enabled) under Ubuntu 5.10. 
> 
> All the best,
> -pvh
> 
-- 
Lares Moreau <lares.moreau@gmail.com>  | LRU: 400755 http://counter.li.org
Gentoo x86 Arch Tester                 |               ::0 Alberta, Canada
Public Key: 0D46BB6E @ subkeys.pgp.net |           Encrypted Mail Prefered
Key fingerprint = 0CA3 E40D F897 7709 3628  C5D4 7D94 483E 0D46 BB6E

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2005-10-28 12:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-10-25 22:58 Our introduction to Reiser-list Peter van Hardenberg
2005-10-25 23:08 ` Hans Reiser
2005-10-26  0:04   ` Peter van Hardenberg
2005-10-26  2:42     ` Hubert Chan
2005-10-26 12:44       ` Peter Foldiak
2005-10-26 16:10         ` Peter van Hardenberg
2005-10-26 16:43           ` Chester R. Hosey
2005-10-26 17:12             ` Hans Reiser
2005-10-26 20:43             ` David Masover
2005-10-26 22:40             ` Nate Diller
2005-10-26 17:02               ` John Gilmore
2005-10-27  0:55                 ` Hubert Chan
2005-10-27  6:49                 ` Peter van Hardenberg
2005-10-27 11:17                   ` David Masover
2005-10-27 19:20                     ` Peter van Hardenberg
2005-10-27 20:44                       ` Jonathan Briggs
2005-10-27  8:44                 ` Hans Reiser
2005-10-27 12:05                 ` Alexander G. M. Smith
2005-10-27 12:41                   ` John Gilmore
2005-10-28 12:29                     ` Alexander G. M. Smith
2005-10-27 16:40                   ` Hans Reiser
2005-10-26 21:04           ` Nate Diller
2005-10-26 21:09             ` Hans Reiser
2005-10-26 21:00 ` Lares Moreau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.