Re: File as a directory - file-as-dir vs. link-dirs (again)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
@ 2005-11-18  3:42 Leo Comerford
  2005-11-19  3:13 ` Hubert Chan
  0 siblings, 1 reply; 9+ messages in thread
From: Leo Comerford @ 2005-11-18  3:42 UTC (permalink / raw)
  To: Alexander G. M. Smith; +Cc: reiserfs-list

(This is the third and final choke-sized chunk. In order to keep any
replies together, I suggest that people reply to this part unless the
reply is very specific to one of the others.)

File-as-dir is a flawed way of expressing parent-child relations.
Unfortunately, when it comes to relations, expressing two-way
parent-child links and providing a tree view of them is what
file-as-dir does /best/.

Even simple two-way relationships that don't have an obvious
parent-child nature cause additional problems. Say we decided to
create metadata to record which of the men are friends. So if Dean
gets along with his brother Ed we could create

/(something)/friend/aardvark:
/(something)/friend/aardvark:1 (which is the file also known as
'/(whatever)/portrait/Ed')
/(something)/friend/aardvark:2 (which is the file also known as
'/(whatever)/portrait/Dean')

using link-directories. In fact, if we have anonymous last name
segments, we can just create

/(something)/friend/aardvark: , which links anonymously to both the Ed
and Dean photos.

But try to express this using subfiles: which of the two brothers will
we arbitrarily choose to make the subfile of the other?

In general, because the subfile relationship is always parent-child,
to express a symmetric relationship in it we have to make up spurious
extra data, declaring one participant in the relationship to be the
'parent' when no such distinction exists. Ed and Dean are unlikely to
care about this, but try deciding whether Sales worksClosely with
Marketing or Marketing worksClosely with Sales on your firm's
computerised org chart. (Apparently things like LDAPisation projects
have provoked wars over less.) And in the link-directory example using
anonymous links, even the dumbest program that knows nothing about
either /(something)/friend or friendship can tell that
/(something)/friend/aardvark: is symmetric. In the link-directory
example that doesn't use anonymous links, it doesn't know that - and
subfile metadata will actively give it the false parent/child
information. And of course even if we already know that a specific
relationship is symmetric, or if it's not important that we find out,
problems two and four from part two bite hard. For example, reliably
finding all of Ed's friends' photos requires looking for both all his
photo's ;friends children and all its ;friends parents every time. We
have similar problems for relationships that aren't symmetric, but for
which we don't want to have to declare one role to be the parent of
the other. Which party in a is-husband-of/is-wife-of relation should
be indicated as the parent?
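
As a rough illustration of the lookup asymmetry described above, here
is a minimal Python sketch. It models both representations as
in-memory data rather than a real filesystem; the data layout and the
function names (friends_via_link_dirs, friends_via_subfiles) are
assumptions made for this example only.

```python
# Hypothetical in-memory model; the paths mirror the examples above.
ED = "/(whatever)/portrait/Ed"
DEAN = "/(whatever)/portrait/Dean"

# Link-directory with anonymous links: an unordered set of participants.
friend_link_dirs = {
    "/(something)/friend/aardvark:": frozenset({ED, DEAN}),
}

def friends_via_link_dirs(photo):
    """Everyone who shares a friend link-directory with `photo`."""
    found = set()
    for members in friend_link_dirs.values():
        if photo in members:
            found |= members - {photo}
    return found

# Subfile metadata: (parent, child) pairs. The direction is arbitrary,
# which is exactly the spurious extra data discussed above.
friend_subfiles = [(DEAN, ED)]

def friends_via_subfiles(photo):
    """Both directions must be checked, since either party may be the 'parent'."""
    children = {c for p, c in friend_subfiles if p == photo}
    parents = {p for p, c in friend_subfiles if c == photo}
    return children | parents
```

Both functions return the same answer here, but only the subfile
version has to search in two directions to get it.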

Then there are (>2)-way relations. Here's a good example of a
three-way relation, lifted from the
Rumbaugh-Blaha-Premerlani-Eddy-Lorensen OO book. Say that we have
files representing programmers, software projects and programming
languages. Now say that, for example, Bob is using Algol 68 on the
Foomatic and both SNOBOL and PL/1 on Project Omega, while Dean is
coding in PL/1 on the Computron and in PILOT on Project Omega, and
Todd is formally specifying the Foomatic in Z. We would represent this
information using link-directories by creating

/(thingy)/impl-lang/aardvark:coder --> /(whatever)/portrait/Bob
/(thingy)/impl-lang/aardvark:lang   --> /bin/algol68
/(thingy)/impl-lang/aardvark:proj    --> /(whatever)/projects/foomatic
/(thingy)/impl-lang/zebra:coder --> /(whatever)/portrait/Dean
/(thingy)/impl-lang/zebra:lang   --> /bin/pilot
/(thingy)/impl-lang/zebra:proj    --> /(whatever)/projects/foomatic

and so on: one link-directory for each triple of programmer, project
and language. If we want to express the same information using subfile
metadata we are going to have to create something like

/(whatever)/portrait/Bob;impl-lang/1/lang   --> /bin/algol68
/(whatever)/portrait/Bob;impl-lang/1/proj   --> /(whatever)/projects/foomatic
/(whatever)/portrait/Dean;impl-lang/1/lang  --> /bin/pilot
/(whatever)/portrait/Dean;impl-lang/1/proj  --> /(whatever)/projects/foomatic

and so on. Problem two is worse in this case. Not only do we have to
look through the path"name"s of /(whatever)/projects/foomatic in order
to find out what programmers are working on it, but in order to find
out what languages Bob is using on the Foomatic we have to find the
/(whatever)/portrait/Bob;impl-lang/* directories among the path"name"s
of /(whatever)/projects/foomatic and then examine those directories'
./lang names. And to find out what projects Bob is working on, we
have to list all the /(whatever)/projects/* files which are linked
from /(whatever)/portrait/Bob;impl-lang/*/proj . All this is
basically the same as working with link-directories using
base-filesystem commands; indeed /(whatever)/portrait/Bob;impl-lang/1
is basically /(thingy)/impl-lang/aardvark: shoved under an arbitrary
choice of one of the three files it relates.
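
To make the three-way queries concrete, here is a small Python sketch.
It treats each link-directory as one (coder, lang, proj) triple and
uses short labels in place of the full paths; the ImplLang type, the
labels and the helper names are all hypothetical.

```python
from typing import NamedTuple

class ImplLang(NamedTuple):
    """One link-directory in the impl-lang relation (one role per field)."""
    coder: str
    lang: str
    proj: str

# The assignments described in the text, with short labels for the files.
RELATION = [
    ImplLang("Bob", "algol68", "foomatic"),
    ImplLang("Bob", "snobol", "omega"),
    ImplLang("Bob", "pl1", "omega"),
    ImplLang("Dean", "pl1", "computron"),
    ImplLang("Dean", "pilot", "omega"),
    ImplLang("Todd", "z", "foomatic"),
]

def select(**roles):
    """Return the triples whose named roles have the given values."""
    return [t for t in RELATION
            if all(getattr(t, role) == value for role, value in roles.items())]

def langs_used(coder, proj):
    return sorted({t.lang for t in select(coder=coder, proj=proj)})

def coders_on(proj):
    return sorted({t.coder for t in select(proj=proj)})

def projects_of(coder):
    return sorted({t.proj for t in select(coder=coder)})
```

No role is privileged: every query is the same select() with a
different choice of which roles are fixed.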

We created tools so that we could handle parent-child relations
expressed as link-directories without clunkiness; naturally we can do
similar things for relations of other kinds. One generally useful tool
would be something like rels below:

$ cd /(whatever)/portrait/Dean
$ rels
/(something)/father-son (:son) :father
/(thingy)/impl-lang (:coder) :lang :proj
/(thingy)/impl-lang (:coder) :lang :proj
/(something)/friend
$

rels lists the link-directories of which /(whatever)/portrait/Dean is
a descendant. The animal names at the end of each link-directory's
"name" have been omitted, because they don't convey any information
beyond distinguishing between different link-directories in the
relation-directory. (The last "name"-segment of a link-directory isn't
always thus; in my other email to you shortly I discuss how programs
can sensibly identify the ones that are.) Some other compression is
obviously possible too. It would be possible to create a program (or
an ls option) that worked like ls -P (as described above) except that
instead of printing pathnames through link-directories it would
substitute the corresponding rels entry. And those who really,
absolutely demand to deal with relations via subfile metadata could
create a set of tree operators to simulate it by presenting the
non-relational pathnames of the base filesystem tree as well as
pathnames like this:

/(whatever)/portrait/Dean;(something)/father-son[son]:father
/(whatever)/portrait/Dean;(thingy)/impl-lang/zebra[coder]:lang
/(whatever)/portrait/Dean;(thingy)/impl-lang/zebra[coder]:proj
/(whatever)/portrait/Dean;(thingy)/impl-lang/giraffe[coder]:lang
/(whatever)/portrait/Dean;(thingy)/impl-lang/giraffe[coder]:proj
/(whatever)/portrait/Dean;(something)/friend

. (If (something)/father-son and so on seem rather bulky in this
context, remember problem one from part two; real-world
subfile-metadata names will probably be just as long.)
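
A tool like rels could be sketched roughly as follows in Python. This
is only an in-memory model: LINK_DIRS, the (role, target) pair layout
and the use of an empty role name for anonymous links are assumptions
for the example, and the giraffe: targets are made up.

```python
import posixpath

BOB = "/(whatever)/portrait/Bob"
DEAN = "/(whatever)/portrait/Dean"
ED = "/(whatever)/portrait/Ed"

# Each link-directory maps to its (role, target) links.
# An empty role name stands for an anonymous link.
LINK_DIRS = {
    "/(something)/father-son/manticore:": [("father", BOB), ("son", DEAN)],
    "/(thingy)/impl-lang/zebra:": [
        ("coder", DEAN), ("lang", "/bin/pilot"),
        ("proj", "/(whatever)/projects/foomatic")],
    "/(thingy)/impl-lang/giraffe:": [  # hypothetical targets
        ("coder", DEAN), ("lang", "/bin/pl1"),
        ("proj", "/(whatever)/projects/computron")],
    "/(something)/friend/aardvark:": [("", ED), ("", DEAN)],
}

def rels(path):
    """List each relation `path` takes part in, its own role, then the other roles."""
    lines = []
    for name, links in LINK_DIRS.items():
        own = [role for role, target in links if target == path]
        if not own:
            continue
        others = [role for role, target in links if target != path]
        parts = [posixpath.dirname(name)]  # drop the animal name
        parts += [f"(:{role})" for role in own if role]
        parts += [f":{role}" for role in others if role]
        lines.append(" ".join(parts))
    return lines
```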

Other possible things include:

$ cd /(whatever)/portrait/Bob
$ go /(something)/father-son/manticore:son
$ pwd
/(whatever)/portrait/Dean
$ langs-used
pl1 pilot
$ go /(something)/friend
$ pwd
/(whatever)/portrait/Ed
$ go /(something)/father-son/:father
$ pwd
/(whatever)/portrait/Bob
$

.

Using subfile metadata automatically creates a rooted-digraph
representation of the (meta)data: if Mike is the father of Bob, you
express that by using a subfile link to make Bob's photo an actual
subfile of Mike's in the base filesystem "tree". So you can go "down"
the subfile link from Mike to Bob and (maybe) "up" again to Mike. We
saw earlier how we can instead provide tree presentations of arbitrary
link-directory metadata. This alternative approach, providing

$ pwg
^Mike-Ted-Todd
$ lsg
Andy

instead of

$ pwd
/(whatever)/portrait/Mike;son-photo;son-photo
$ ls
son-photo employs random-stuff irrelevant whats_this ~ [etc. etc.]

, is more powerful and more pleasant even if all you want is a single
rooted-digraph presentation of the (meta)data you are using. But of
course we don't always want to look at everything as a rooted digraph.
Some data we don't want to present as a rooted digraph at all. Imagine
we have a large body of heavily interconnected /(something)/friend
links, making for a big and definitely rootless graph. We could just
present this as a rooted digraph by arbitrarily choosing one person's
photo to be the root, but we really don't want to have to do this,
just as we don't want to have to choose one person's photo to be the
parent in an individual photo-of-friend/photo-of-friend relationship.
So we need operators to explore and manipulate rootless graphs too.
Something like

$ pwd
/(whatever)/portrait/Ed
$ defrel /(something)/friend
# some metadata off /(something)/friend/ is specifying a name-segment
# directory, as with /(something)/father-son/ above, so we do:
$ go Dean
$ rel
Ed
$

would provide the basic "ls" and "cd", and obviously we can do much
more. And one important application of a set of generic rootless-graph
operators is that it gives us a graph presentation of both all
symmetric two-way links (like /(something)/friend ) and all asymmetric
two-way links for which we don't have metadata to indicate which role
to think of as the parent.
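
A minimal sketch of what generic rootless-graph operators might
compute, in Python. The friendships other than Ed and Dean, and the
function names, are hypothetical; each frozenset stands for one
anonymous-link link-directory.

```python
from collections import deque

# One frozenset per friend link-directory; no member is the "parent".
FRIEND_LINK_DIRS = [
    frozenset({"Ed", "Dean"}),
    frozenset({"Dean", "Jeff"}),   # hypothetical extra friendship
    frozenset({"Todd", "Andy"}),   # hypothetical, not connected to the others
]

def rel(node):
    """The rootless-graph 'ls': every node directly related to `node`."""
    return sorted({member
                   for members in FRIEND_LINK_DIRS if node in members
                   for member in members if member != node})

def reachable(start):
    """Every node that can be reached from `start` by repeated 'go' steps."""
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in rel(queue.popleft()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen
```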

And naturally data doesn't have to be relational to demand a
non-rooted-digraph representation. Say I attach metadata to the
/(whatever)/portrait files specifying for each photo the co-ordinates of
the pictured man's house. Operators which present a geometric rather
than a graph view can then be employed:

$ pwd # current location
/(whatever)/portrait/Bob
$ pwg # current location
^Mike-Bob
$ loc # current location
39° 45.38' N 105° 00.55' W  1610
$ range 5 # everything within 5 kilometres
/(whatever)/portrait/Ed /(whatever)/portrait/Jeff
$ cg Dean # move from father to son
$ up 500; north 20 # move up 500 m and north 20km
$ loc
41° 04.203' N 81° 31.442' W 782
$ pwd
/(stuff)/coords/earth/039_56.163N105_00.55W2110/aardvark

. :)
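
For what it's worth, the "range" operator above could be approximated
with the standard haversine great-circle formula. The sketch below is
Python; Bob's and Dean's coordinates are taken from the session above,
while Ed's and Jeff's positions, the spherical-earth radius and the
function names are assumptions for the example. Altitude is ignored.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical-earth approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# (latitude, longitude) in decimal degrees, west negative.
PLACES = {
    "Bob": (39 + 45.38 / 60, -(105 + 0.55 / 60)),
    "Dean": (41 + 4.203 / 60, -(81 + 31.442 / 60)),
    "Ed": (39.78, -105.02),     # hypothetical
    "Jeff": (39.74, -104.99),   # hypothetical
}

def nearby(name, km):
    """Everything within `km` kilometres of `name`, excluding `name` itself."""
    lat, lon = PLACES[name]
    return sorted(other for other, (olat, olon) in PLACES.items()
                  if other != name and haversine_km(lat, lon, olat, olon) <= km)
```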

One of the nice things about using rooted digraphs to represent data
is that so many things can be thought of as special cases of them, and
so represented as them. For example, we can sensibly represent a stack
as a tree, with the tail as the root. But of course all trees that
represent stacks in this way have additional constraints: for example,
there is at most one child per parent. So while we can use all the
generic "tree" operators on our stack-as-a-tree, we can also provide
other operators that won't (reliably) work on other tree
representations, including an operator to move to the head and
operators to push and pop. So where possible, we should create
non-rooted-graph presentations of the filesystem by extending the
rooted-graph presentation; we could implement a completely new set of
operators to present stacks and the like, but why do so? Only when the
new presentation can't reasonably be seen as an extension of the
rooted-graph presentation should we make a whole new set of operators;
this is the case with the rootless-graph and geometric presentations
discussed above.
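
The stack-as-a-tree idea can be sketched like this in Python. The Node
class and the operator names are hypothetical; the point is only that
push, pop and head are extra operators layered on an ordinary tree,
and that they refuse to run when the one-child constraint fails.

```python
class Node:
    """A generic tree node; nothing here is specific to stacks."""
    def __init__(self, value):
        self.value = value
        self.children = []

def is_stack(root):
    """True if every node from the root down has at most one child."""
    node = root
    while node.children:
        if len(node.children) > 1:
            return False
        node = node.children[0]
    return True

def head(root):
    """Move from the tail (root) to the head (the only leaf)."""
    if not is_stack(root):
        raise ValueError("tree does not satisfy the stack constraint")
    node = root
    while node.children:
        node = node.children[0]
    return node

def push(root, value):
    head(root).children.append(Node(value))

def pop(root):
    if not is_stack(root):
        raise ValueError("tree does not satisfy the stack constraint")
    if not root.children:
        raise IndexError("only the tail is left; nothing to pop")
    parent = root
    while parent.children[0].children:
        parent = parent.children[0]
    return parent.children.pop().value
```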

There is also a lot of information that can reasonably be presented as
a rooted digraph but which we may want to present in other ways too.
One example is the base filesystem "tree" metadata itself. In many
ways it's best to think of the filesystem as consisting of files, with
attributes (their (full, opaque) pathnames) attached, floating around
inside their volumes in a completely unstructured fashion. Directories
are basically searches-by-attribute which return an unordered set of
the matching files, with the additional wrinkle that files which have
a more specialised version of the attribute appear in subdirectories.
(For example, all the files with opaque pathname '/usr/[aardvark]' are
children of /usr/, while all the files with the opaque pathname
'/usr/bin/[zebra]' are children of /usr/bin/, despite the fact that
having the pathname '/usr/bin/[zebra]' means that '/usr' is also
asserted of them.) So we need an operator which works like ls/lsg/etc.
except that instead of listing the children of a file it lists all
(and only) its opaque descendants. Beyond that though, you don't
actually much need extra shell operators to support the "bunch of
files searchable by attribute" way of looking at the filesystem; most
of what you need is at the levels above (the visual presentation of
directories/search results in the GUI) and below (using mount() to
expose persistent queries as directories). So it's actually an
untypical, bad example. :)
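
The difference between listing children and listing opaque descendants
can be shown with a short Python sketch. It models files as labels
with one or more full pathnames attached; the FILES table, the file
labels and the function names are assumptions for the example.

```python
from pathlib import PurePosixPath

# Each file carries its full pathnames as attributes.
FILES = {
    "file-a": ["/usr/aardvark"],
    "file-z": ["/usr/bin/zebra"],
    "file-n": ["/home/leo/notes", "/usr/bin/notes"],  # hypothetical two-name file
}

def children(directory):
    """Files with a pathname directly inside `directory` (what ls shows)."""
    d = PurePosixPath(directory)
    return {f for f, names in FILES.items()
            if any(PurePosixPath(n).parent == d for n in names)}

def opaque_descendants(directory):
    """Files with any pathname of which `directory` is a prefix."""
    d = PurePosixPath(directory)
    return {f for f, names in FILES.items()
            if any(d in PurePosixPath(n).parents for n in names)}
```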

Speaking of GUIs, the improved filesystem GUI mentioned earlier which
works as a skin over the generic rooted-digraph operators (ch et al.)
can obviously provide a skin over other sets of generic operators too.
For example, it can provide a special GUI representation for stacks,
queues and deques as a thin skin over a set of generic
stack/queue/deque operators (itself an extension of the set of
generic rooted-digraph operators, as discussed above). Similarly it
could provide a GUI for unrooted graphs expressed through the generic
graph operators, a 2D or 3D-plot representation of data presented
through the geometric operators, and so on. So it could present a GUI
to the /(something)/friend data that decorates each node with
/(something)/father-son and /(stuff)/coords/earth information, or
indeed use a specialised position-on-earth GUI to display the
/(stuff)/coords/earth data of the /(whatever)/portrait files with the
/(something)/friend information represented as great-circle lines
charted between the locations of each pair of friends. The important
point here is how thin a layer this GUI is, knowing nothing about the
syntax or semantics of the data it is representing beyond what it gets
from the operators it supports. This gives it extreme flexibility:
flicking between the two GUI presentations described above, or
replacing the /(something)/friend lines on the globe display with
/(something)/father-son ones, is a matter of one or two commands
rather than recoding GUI components. The power this could afford is
considerable.

And finally, there is the information that we want to be able to view
both as a rooted digraph and as ... a different rooted digraph (or as
more than two different ones). For example, the subfile-metadata
representation of the picture-of-father/picture-of-son metadata
discussed above presents it as a descendant chart showing (photos of)
(some of) the descendants of (most obviously) Mike. But parent/child
relationships can just as easily be thought of as creating a pedigree,
giving information about people's ancestors. In other words, it's just
as correct to think of (for example) Joe as being at the root of a
pedigree. Now it happens that the partial pedigrees expressed by our
photo-of-father/photo-of-son relationship metadata are all degenerate
trees, but that's only because everyone has only one father. Bring
mothers and daughters into the picture as well and the answer to
"which way is root?" becomes entirely relative, no pun intended. So we
might want to be able to view parent/child (in the biology sense)
relationships both as descendant charts and as pedigrees. This is easy
to do using the link-directories approach: just tell the "tree-view"
operators to regard :son rather than :father as being the rootward
role, or vice versa. (Note that we can do something similar with
rootless graphs: specify a root node using a command like

$ setroot /(whatever)/friend/Ed

and then you can use the rooted-digraph operators on the graph.) But
the subfile-metadata approach only provides us with one presentation
or the other unless we duplicate or rejig the data, or use a custom
persistent query.
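
Switching between the two views amounts to choosing which role counts
as rootward. A minimal Python sketch, using the father-son pairs
mentioned in this thread; the tuple layout and the function names are
hypothetical.

```python
ROLES = ("father", "son")

# One tuple per father-son link-directory, in ROLES order.
FATHER_SON = [("Mike", "Bob"), ("Bob", "Dean"), ("Bob", "Ed")]

def step(node, from_role, to_role):
    """Follow every link in which `node` fills `from_role` to its `to_role` party."""
    i, j = ROLES.index(from_role), ROLES.index(to_role)
    return sorted(link[j] for link in FATHER_SON if link[i] == node)

def leafward(node, rootward_role):
    """List the 'children' of `node` in the tree view whose rootward role is given."""
    other = "son" if rootward_role == "father" else "father"
    return step(node, rootward_role, other)
```

With "father" as the rootward role the same data reads as a descendant
chart; with "son" it reads as a pedigree. Nothing is duplicated or
rewritten in between.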

In sum: "file-as-a-directory" gives us nothing, in terms of power,
unambiguousness, or convenience, that we can't get from
link-directories plus a small set of convenience utilities. The
reverse is emphatically not the case. Furthermore, link-directories
plus some more convenience utilities give us powerful things that are
pretty much beyond the ken of "file-as-a-directory" altogether.

But why not build the new tools to work on top of subfile metadata
rather than link-directories? Firstly, because subfile metadata is not
a sound foundation to build them on. Due to things like problem five
in part two and the problems with symmetric links above, subfile
metadata is ambiguous and sometimes downright misleading, so the
amount you can safely infer from it without extra context information
is limited. Secondly, once we have implemented the tools - once we
can, for example, present the ;son-photo subfile metadata as a
pedigree using the rooted-digraph operators and use those operators to
navigate and tweak the pedigree just as easily as if it were expressed
in the base filesystem tree - the advantages of using subfile metadata
are gone. The ;son-photo links give us a rough-and-ready tree
representation of the descendant chart for free, but we can get a
better, cleaner tree representation of the descendant chart using the
same class of operators we use to navigate the pedigree. So why put up
with the pain that the file-as-dir kludge will cause us when it no
longer satisfies any of our needs? Providing both file-as-dir and
link-directories is a bad idea too, for roughly the same reasons.

> - Alex
>

Leo.

--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-18  3:42 Leo Comerford
@ 2005-11-19  3:13 ` Hubert Chan
  2005-11-20  7:03   ` Leo Comerford
  0 siblings, 1 reply; 9+ messages in thread
From: Hubert Chan @ 2005-11-19  3:13 UTC (permalink / raw)
  To: reiserfs-list

P.S. your relational model can easily be expressed using file-as-dir
(well, actually, just standard directories):

/(something)/father-son/aardvark/father is a symlink to
  '/(whatever)/portrait/Mike'
/(something)/father-son/aardvark/son is a symlink to
  '/(whatever)/portrait/Bob'
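
On a POSIX system that layout can be tried out directly with ordinary
directories and symlinks. The Python sketch below is only an
illustration: it builds everything under a temporary directory, and
the helper names (make_father_son, role_target) are made up for the
example.

```python
import os
import tempfile

def make_father_son(root, tuple_name, father_path, son_path):
    """Create <root>/father-son/<tuple_name>/{father,son} as symlinks."""
    pair_dir = os.path.join(root, "father-son", tuple_name)
    os.makedirs(pair_dir)
    os.symlink(father_path, os.path.join(pair_dir, "father"))
    os.symlink(son_path, os.path.join(pair_dir, "son"))
    return pair_dir

def role_target(pair_dir, role):
    """Return the path that the given role's symlink points at."""
    return os.readlink(os.path.join(pair_dir, role))

with tempfile.TemporaryDirectory() as root:
    portraits = os.path.join(root, "portrait")
    os.makedirs(portraits)
    mike = os.path.join(portraits, "Mike")
    bob = os.path.join(portraits, "Bob")
    for path in (mike, bob):
        open(path, "w").close()          # empty stand-ins for the photos
    pair = make_father_son(root, "aardvark", mike, bob)
    print(role_target(pair, "father"))   # the path of the Mike stand-in
```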

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.



* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
@ 2005-11-19 19:58 Alexander G. M. Smith
  2005-12-08  8:53 ` Leo Comerford
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander G. M. Smith @ 2005-11-19 19:58 UTC (permalink / raw)
  To: Leo Comerford; +Cc: reiserfs-list

Leo Comerford wrote on Fri, 18 Nov 2005 03:42:50 +0000:
> [^.*$]

Just a few points I thought of while reading through your text:

Genealogy is an extremely structured arrangement of data, most people won't be
doing something that complex - think of photo filing instead.  Also cycles
exist everywhere, even in genealogy.  So cycles should be supported by default.

You had a separate directory storing relationship links.  How about making that
a subdirectory of the person?  If I wanted to do genealogy-as-a-file-system, I'd
have a "children" subdirectory under the person; it would contain hard links to
all the person's children.  If you want to find a person's mother or father,
examine the list of their parent directories (a cyclic file system has more
parents than just "..") to find the ones called "children".  The person's
parents are the holders of those "children" directories.

I wouldn't worry about naming conflicts (such as "children" being a magic name)
since most people only define a few dozen relationships, at least in BeOS.  E-mail
was the most complex, followed by MP3 audio and photo tagging and not too much
else.  So just stick a prefix on the Children directory to mark it as special,
or mark it with a special file type (a subclass of directory in Apple's new
typing scheme).  BeOS used a short prefix, like "MAIL:" so you'd have
"MAIL:subject" and "MAIL:from" as e-mail attribute names (though that's actually
bad, since news articles have subjects too).

Admittedly one difficulty is in representing tuples.  And double ended links.
Like the Friendship relation.  BeOS handled that by tagging things with group
names, and then indexing the tags.  So e-mail contacts all had a "META:group"
attribute containing a comma separated list of group names.  The BeOS indexed
attribute query engine could then quickly turn up all contacts belonging to a
particular group.  Though finding a list of all groups wasn't as elegant.
Which reminds me that attribute-like things should include arrays, with global
indexing support to find things inside an array (also useful for arrays of
keywords in word processing documents).
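
The group-tag indexing idea can be sketched in a few lines of Python.
This is not the BeOS API, just an in-memory illustration: the contact
names, the dictionary layout and build_group_index are assumptions;
only the "META:group" attribute name and its comma separated format
come from the description above.

```python
from collections import defaultdict

# Hypothetical contacts, each with a comma separated "META:group" attribute.
CONTACTS = {
    "Ed": {"META:group": "family, friends"},
    "Dean": {"META:group": "family"},
    "Jeff": {"META:group": "friends,work"},
    "Todd": {},
}

def build_group_index(contacts):
    """Map each group name to the set of contacts tagged with it."""
    index = defaultdict(set)
    for name, attrs in contacts.items():
        for group in attrs.get("META:group", "").split(","):
            group = group.strip()
            if group:
                index[group].add(name)
    return index

INDEX = build_group_index(CONTACTS)
```

Treating the attribute as an array and indexing its elements makes
both lookups cheap: INDEX[group] gives the members of a group, and
sorted(INDEX) gives the list of all groups.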

So to sum up, it seems that you're way more power hungry than I.  I just want
something to make finding photos easier, not a whole database equivalent system
(I'd use a database for that).  Early versions of BeOS did use a database as
the file system, which turned out to be more trouble than it was worth.  A file
system can and perhaps should be something lighter.  But not as lightweight as
plain Unix file systems - I want better searching and linking.

- Alex


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-19  3:13 ` Hubert Chan
@ 2005-11-20  7:03   ` Leo Comerford
  2005-11-20 22:35     ` Leo Comerford
  2005-11-21  6:40     ` Hubert Chan
  0 siblings, 2 replies; 9+ messages in thread
From: Leo Comerford @ 2005-11-20  7:03 UTC (permalink / raw)
  To: Hubert Chan; +Cc: reiserfs-list

On 11/19/05, Hubert Chan <hubert@uhoreg.ca> wrote:
> P.S. your relational model can easily be expressed using file-as-dir
> (well, actually, just standard directories):
>
> /(something)/father-son/aardvark/father is a symlink to
>   '/(whatever)/portrait/Mike'
> /(something)/father-son/aardvark/son is a symlink to
>   '/(whatever)/portrait/Bob'
>

Yes, absolutely. Yes, my relational model *does* use standard
directories, with three differences.

1) foofs's internal implementation of link-directories and "other"
directories might be different. Or it might not. Entirely unimportant
at this level.

2) gc might treat some link-directories differently to
predicate-directories. (If Bob and Mike have been deleted, I don't
want /(something)/father-son/aardvark/ lying around.)

3)

> --
> Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
> PGP/GnuPG key: 1024D/124B61FA
> Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
> Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.
>
>


--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-20  7:03   ` Leo Comerford
@ 2005-11-20 22:35     ` Leo Comerford
  2005-11-21  6:40     ` Hubert Chan
  1 sibling, 0 replies; 9+ messages in thread
From: Leo Comerford @ 2005-11-20 22:35 UTC (permalink / raw)
  To: Hubert Chan; +Cc: reiserfs-list

(Apologies for email snafu - hit the wrong button.)

Difference 2 can also be left aside for now.

3) Say we just used standard directories to indicate relations. Then
if a user comes across  these "names" -

/(something)/father-son
/(something)/father-son/aardvark
/(something)/father-son/aardvark/son
/(something)/father-son/aardvark/father

- while browsing the filesystem she can use her common sense to guess
that /(something)/father-son expresses a father-son relation.
'father-son' likely suggests a relationship between fathers and sons,
and indeed /(something)/father-son/aardvark has children called father
and son. Pretty obvious. Similarly, if she comes across

/usr/bin
/usr/bin/alpha
/usr/bin/bravo
[etc. etc.]

, if she is familiar with Unix she will know that /usr/bin/ indicates
user binaries. Every (non-directory) file in /usr/bin/ isA '/usr/bin',
a user binary. (Also, common sense might suggest that /usr/bin/ has so
many children that it's unlikely to be one giant relationship.) But
what if she comes across

/(something)/mumble
/(something)/mumble/thingy/alpha
/(something)/mumble/thingy/bravo

? Is alpha a '/(something)/mumble/thingy', or is it in the 'alpha'
role of a '/(something)/mumble' relationship with another file? (Or it
could be in the 'thingy/alpha' role of a '/(something)' relationship.)

This matters a lot. The distinction between being a foo and being a
party in a foo relationship is clear and important. For example, there
is a big difference between being a marriage and being a married
person. So we need to know which one is meant. What's more, we need to
be *told*, because the other two solutions - guessing and knowing
already - aren't good enough. The person who created
/(something)/mumble/thingy would be able to tell us that if only she
had some way of indicating to us that we should interpret
/(something)/mumble/thingy as an instance of a relation. And that is
(to a first approximation) all that link-directories are - directories
with a simple binary flag set at creation time to indicate how they
should be interpreted.
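
In other words, something like the following Python sketch. It is an
in-memory model only; the Directory class, the is_link_dir field name
and describe() are hypothetical, and a real filesystem would store the
flag however it liked.

```python
from dataclasses import dataclass, field
import posixpath

@dataclass
class Directory:
    path: str
    is_link_dir: bool = False   # the one-bit flag, chosen at creation time
    entries: dict = field(default_factory=dict)   # entry name -> target file

def describe(directory, entry):
    """Say what membership in `directory` asserts about the target of `entry`."""
    target = directory.entries[entry]
    if directory.is_link_dir:
        relation = posixpath.dirname(directory.path)
        return f"{target} fills the '{entry}' role in an instance of {relation}"
    return f"{target} is a {directory.path}"

plain = Directory("/(something)/mumble/thingy", False, {"alpha": "file-1"})
linked = Directory("/(something)/mumble/thingy", True, {"alpha": "file-1"})
```

The names and contents of the two directories are identical; only the
flag tells a program (or a person) which reading is meant.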

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-20  7:03   ` Leo Comerford
  2005-11-20 22:35     ` Leo Comerford
@ 2005-11-21  6:40     ` Hubert Chan
  2005-12-23  5:39       ` Leo Comerford
  2005-12-23  5:59       ` Leo Comerford
  1 sibling, 2 replies; 9+ messages in thread
From: Hubert Chan @ 2005-11-21  6:40 UTC (permalink / raw)
  To: reiserfs-list

On Sun, 20 Nov 2005 07:03:03 +0000, Leo Comerford <lrc1@st-andrews.ac.uk> said:

> On 11/19/05, Hubert Chan <hubert@uhoreg.ca> wrote:
>> P.S. your relational model can easily be expressed using file-as-dir
>> (well, actually, just standard directories):
>> 
>> /(something)/father-son/aardvark/father is a symlink to
>> '/(whatever)/portrait/Mike'
>> /(something)/father-son/aardvark/son is a symlink to
>> '/(whatever)/portrait/Bob'

> Yes, absolutely. Yes, my relational model *does* use standard
> directories, with three differences.

> 1) foofs's internal implementation of link-directories and "other"
> directories might be different. Or it might not. Entirely unimportant
> at this level.

Yes.  I think that we should distinguish between how the data is stored
and how it is exported to the filesystem.  I think that anyone who tries
to take a relational table and translate it directly into a
filesystem implementation is nuts.  They are very different data models,
and should not be confused.  (A filesystem is probably more like an
object-oriented database, or, if I remember correctly -- it's been a
while since my database course -- a network model database, and not
like a relational model.)

However, I think that it is perfectly reasonable (or at least not
completely insane) to export a relational table through a filesystem
interface, and have filesystem operations reflect on the underlying
table.

> 2) gc might treat some link-directories differently to
> predicate-directories. (If Bob and Mike have been deleted, I don't
> want /(something)/father-son/aardvark/ lying around.)

Yes, I think that would require some entirely different magic
(e.g. your relational model would need to know about primary keys and
such).

> 3) Say we just used standard directories to indicate relations. Then
> if a user comes across these "names" ...

[...]

> But what if she comes across

> /(something)/mumble
> /(something)/mumble/thingy/alpha
> /(something)/mumble/thingy/bravo

> ? Is alpha a '/(something)/mumble/thingy', or is it in the 'alpha'
> role of a '/(something)/mumble' relationship with another file? (Or it
> could be in the 'thingy/alpha' role of a '/(something)' relationship.)

A few points
- things should not be named "mumble" or "thingy" -- things should be
  named descriptively (obviously -- of course, people don't always
  follow these obvious rules)
- the user should be in charge of how he/she organizes the data, and so
  he/she should pick the names that he/she wants to use.  It shouldn't
  be mandated.  And anyone else who uses the data should have a
  reasonable expectation to have any confusing things documented.  At
  least database schemas are usually documented for those people who
  need to use them.
- you can still have some sort of marker to indicate what role each part
  of the name takes (e.g. the "...." delimiter to indicate
  pseudofiles).  Or you can use a special naming convention (e.g. tuples
  have a special prefix).  But I think that trying to introduce a new
  delimiter that does basically the same thing as '/' is going to cause
  a lot of problems.  (See Rob Pike's paper, "The Hideous Name", if you
  haven't read it already, for more on this.  It's cited in Hans'
  "Future Vision" paper.)

As a completely different side issue, I think that using random names
such as "aardvark" or "zebra" to refer to tuples is a bad idea (and I
know this isn't part of your proposal).  If you use things that are real
words, people will get confused, since they will try to associate
meaning to something that doesn't have meaning.  (e.g. "why is my
relationship called 'dodo', while Bob's is called 'tiger'?")  I think
that it's best to just assign random meaningless strings, so that people
will know that they are meaningless.

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-19 19:58 File as a directory - file-as-dir vs. link-dirs (again) - 3/3 Alexander G. M. Smith
@ 2005-12-08  8:53 ` Leo Comerford
  0 siblings, 0 replies; 9+ messages in thread
From: Leo Comerford @ 2005-12-08  8:53 UTC (permalink / raw)
  To: Alexander G. M. Smith; +Cc: reiserfs-list

On 11/19/05, Alexander G. M. Smith <agmsmith@rogers.com> wrote:
> Leo Comerford wrote on Fri, 18 Nov 2005 03:42:50 +0000:
> > [^.*$]
>
> Just a few points I thought of while reading through your text:
>
> Genealogy is an extremely structured arrangement of data, most people won't be
> doing something that complex - think of photo filing instead.  Also cycles
> exist everywhere, even in genealogy.  So cycles should be supported by default.

Cycles? Sure. Hence all the quibbling about the generic "tree"
operators actually being rooted directed graph operators.

(In the case of genealogy, parent-child digraphs are acyclic unless
they involve Beeblebroxes, though they're *shudder* not invariably
/trees/.)

It's true that a lot of file metadata doesn't involve recursive
structure or suchlike, but you don't have to get very advanced or
esoteric to find file metadata that does. Expose your email's
In-Reply-To: information as file metadata and you have recursive
structure.

(Sidebar: Actually, you can think of the process of improving the
filesystem - large parts of that process, anyway - simply as the
process of fixing email. Every email ought to be an ordinary file like
any other document; instead emails live as entries in spool files
around which mail agents perform ritual lockfile dances. A filesystem
which handles smaller files more efficiently allows us to store each
email as a file, but you have to keep those files in a Maildir and
continue observing special safe-access rituals. Introduce transactions
and you can just put your emails into ordinary folders, but everyone
knows by now that just putting each of your emails into one folder or
another is a completely inadequate way of organising them. Introduce
pathname-listing and every email can usefully be in several
directories at once; this means that we can indicate all the
categories and "labels" we put on our emails by just putting them in
directories instead of having to use special data formats
understandable only to email clients (or worse, only to one email
client). But there is still the (meta)data in the email headers
themselves - we don't want to have to either duplicate or ignore it in
our directory metadata. So we use mount() to expose persistent queries
on the header data as directories.)

> You had a separate directory storing relationship links.  How about making that
> a subdirectory of the person?  If I wanted to do genealogy-as-a-file-system, I'd
> have a "children" subdirectory under the person; it would contain hard links to
> all the person's children.  If you want to find a person's mother or father,
> examine the list of their parent directories (a cyclic file system has more
> parents than just "..") to find the ones called "children".  The person's
> parents are the holders of those "children" directories.
>

You can use the same one-to-many approach with link-directories:
instead of creating a link directory for each (biological)
parent-child pair, create one for each parent which links it to all
its children. (Having the individual link-directories is better in one
way: it's safe to go from the one-to-one to the one-to-many form
without context knowledge, but not /vice versa/. In the case of the
parent-child relationship, if a person is a parent to a bunch of
children then that person is individually a parent to them all. But if
a person hasLessMoneyThan a bunch of people, then (s)he may not have
less money than any one of them.)
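
A minimal Python sketch of that asymmetry, using made-up names and
figures (nothing here comes from a real filesystem interface): grouping
individual links into a one-to-many link is mechanical, but ungrouping
is only valid when the relation holds of each member individually.

```python
# Hypothetical one-to-one parent-child links: (parent, child).
pairs = {("bob", "ann"), ("bob", "cat")}

def group(pairs):
    """Collapse one-to-one links into one-to-many links; always safe."""
    grouped = {}
    for left, right in pairs:
        grouped.setdefault(left, set()).add(right)
    return grouped

def ungroup(grouped):
    """Expand one-to-many links into one-to-one links.
    Only meaningful if the relation holds of every member on its own."""
    return {(left, right)
            for left, rights in grouped.items()
            for right in rights}

# The parent-child relation distributes, so the round trip loses nothing.
round_trip_ok = ungroup(group(pairs)) == pairs

# hasLessMoneyThan, asserted of a group taken together, does not distribute.
money = {"dee": 50, "ann": 30, "cat": 40}   # illustrative figures only
holds_for_group = money["dee"] < money["ann"] + money["cat"]
holds_pairwise = [money["dee"] < money[p] for p in ("ann", "cat")]
```

Here holds_for_group is true while every entry of holds_pairwise is
false, which is exactly the case where ungrouping would assert
something that was never said.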

The (biological) parent-child relationship is kind to subfile metadata
again here: the one-to-many link makes the (biological) parent an even
more obvious candidate to go "on top". Problem five from part two is
as strong as ever, on the other hand: there's no way for a program to
tell without context knowledge that bob;children asserts a
relationship between bob and several other files, rather than
asserting some attributes of bob or describing some subparts of it.

> I wouldn't worry about naming conflicts (such as "children" being a magic name)
> since most people only define a few dozen relationships, at least in BeOS.  [snip]

If the names are only being created and used by one person, then yes,
conflicts are likely to be rare and easily dealt with by hand. But if
you have more than one person involved, and especially if you are
trying to use different bundles of names created independently by
different groups, then people will soon resort to the usual defensive
practices used in package naming. Since any application could
potentially define and use a bunch of its own subfile names just as it
can create several (ordinary) directories, this situation will arise
as soon as subfile names become popular.

[snip]

>
> So to sum up, it seems that you're way more power hungry than I.  I just want
> something to make finding photos easier, not a whole database equivalent system
> (I'd use a database for that).  Early versions of BeOS did use a database as
> the file system, which turned out to be more trouble than it was worth.  A file
> system can and perhaps should be something lighter.  But not as lightweight as
> plain Unix file systems - I want better searching and linking.

I could say quite a lot about this, but I'll try not to.

First, as the Reply-To: example shows, fairly high-powered filesystem
stuff is (at least) useful in many pretty ordinary and common
situations. And the goal of namespace integration involves the
high-powered stuff: eliminating things like RDBs or mbox files
altogether, or at least assimilating their interfaces to the
filesystem interface.

But I'm not just greedy for power here, I'm also greedy for elegance.
The reason why I dislike things like subfile metadata is not simply
because they're not powerful enough, but also because they're ad-hoc
weld-ons. I want to see the power of the Unix filesystem multiplied
without diluting its cleanness. The price to pay is that any
filesystem implementation which fully supports all that additional
power (with "good enough" performance all round) is going to be harder
to create than a traditional Unix filesystem, or (probably) even a
traditional Unix filesystem with a couple of extra widgets attached.

(Someone's going to have to sit down someday and come up with
conventions to disambiguate the n different meanings of the word
'filesystem' in the Unix context...)

> - Alex
>

Leo.

--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-21  6:40     ` Hubert Chan
@ 2005-12-23  5:39       ` Leo Comerford
  2005-12-23  5:59       ` Leo Comerford
  1 sibling, 0 replies; 9+ messages in thread
From: Leo Comerford @ 2005-12-23  5:39 UTC (permalink / raw)
  To: Hubert Chan; +Cc: reiserfs-list

On 11/21/05, Hubert Chan <hubert@uhoreg.ca> wrote:

[snip]

> A few points
> - things should not be named "mumble" or "thingy" -- things should be
>  named descriptively (obviously -- of course, people don't always
>  follow these obvious rules)

Well, sure. But even with the best efforts to choose clear and
descriptive "names", the meaning of every "name" will not always be
clear to every person, never mind to programs. For 'mumble' read
'[mumble]'.

> - the user should be in charge of how he/she organizes the data, and so
>  he/she should pick the names that he/she wants to use.  It shouldn't
>  be mandated.  And anyone else who uses the data should have a
>  reasonable expectation to have any confusing things documented.  At
>  least database schemas are usually documented for those people who
>  need to use them.

Documentation is good and important, but for all sorts of good and bad
reasons few filesystems will ever have good, up-to-date documentation
handy for every path"name". And, as Future Vision says, no-one has the
time or the inclination to study up on the format of every database
they might want to use. People like to learn by exploration, even when
they do also read the docs. And a shell utility can't use
human-readable documentation any more than it can apply human Unix
lore or common sense to interpreting the filesystem. To such a
program, every segment name is akin to 'mumble'; without some
additional information, it can't tell that /usr/bin is not an instance
of the relation /usr .

Which brings us back to semantics. I've compared the filesystem
interface to an ADT. But if that were all it is, if there were no
conventions about how to interpret a path"name", then it /would/ be
necessary to get out the manual and read up on the meaning of every
new path"name" you come across, because you would almost never be able
to infer anything about its meaning from looking at it. But the
filesystem isn't (just) a persistent-storage data structure or ADT;
it's a language through which both people and programs communicate.
I've described the semantics of that language before - '/usr/bin' is a
predicate which is asserted of all and only the opaque descendants of
/usr/bin, '/usr' is a predicate which is asserted of all and only the
opaque descendants of /usr, '/usr/passwd' is a predicate which is
asserted of /usr/passwd, etc. etc. ad nauseam. So if I come across
/foo/bar which links to a non-directory file then I know that the
predicate '/foo/bar' is asserted of that file (and that file only).
Even if 'foo' and/or 'bar' is mysterious to me, I already know a good
deal about the intended meaning of this bit of the filesystem, and I
can use what I know to help me deduce the meanings of mysterious
"name"-segments. (To plag^H^H^H^Hparaphrase one David Moser, imagine
walking into an office and seeing a Post-It note stuck on the side of
something. Even if the note contains many nonslarkish English
flutzpahs, you can glork much more of its pluggandisp than if it were
scríofa i dteanga éigin eile.) Having this language also means that
even programs which never know the meaning of any "name"-segment can
extract useful information from pathnames in virtue of their form. For
example, listing the common attributes of two files is a matter of
listing the intersection of their pathnames.
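
As a rough sketch of that last operation (in Python, with invented
paths; the helper name predicates is mine, not part of any real
interface), treat every prefix of every pathname linking to a file as
a predicate asserted of it, then intersect the two sets:

```python
from pathlib import PurePosixPath

def predicates(pathnames):
    """Every predicate asserted of a file: each pathname that links to
    it, plus each ancestor directory of that pathname (root excluded)."""
    found = set()
    for name in pathnames:
        path = PurePosixPath(name)
        found.add(str(path))
        found.update(str(parent) for parent in path.parents
                     if str(parent) != "/")
    return found

# Hypothetical emails, each linked from two directories.
mail_a = ["/mail/from/alex/msg1", "/mail/label/reiserfs/msg1"]
mail_b = ["/mail/from/hubert/msg2", "/mail/label/reiserfs/msg2"]

common = sorted(predicates(mail_a) & predicates(mail_b))
print(common)
```

The program needs no idea what 'label' or 'reiserfs' means; the shared
predicates fall out of the form of the pathnames alone.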

But if we start using directories to assert relations as well as
predicates without distinguishing those directories which assert
instances of a relation, then we make every sentence in the language
ambiguous. Now any given full path"name" /might/ assert a predicate,
or it might assert an instance of a relation instead. (Or in fact an
instance of any one of several relations, since '/foo:bar/baz',
'/foo/bar:baz', and '/foo:bar:baz' all ambiguate to '/foo/bar/baz' .)
Such an ambiguous language is much less useful. Before, for example,
it took a simple shell command to find the predicates asserted of a
file. When the ambiguity is introduced, that simple operation becomes
an exercise in manual-reading and guesswork.
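
The collision is easy to show mechanically. A small Python sketch,
assuming ':' is the link-directory delimiter as proposed in this
thread (the function name flatten is only for illustration):

```python
def flatten(pathname):
    """Write '/' everywhere, including where ':' was meant."""
    return pathname.replace(":", "/")

# Four pathnames that say four different things...
distinct = ["/foo:bar/baz", "/foo/bar:baz", "/foo:bar:baz", "/foo/bar/baz"]

# ...and the single spelling they share once ':' is no longer written.
flattened = {flatten(p) for p in distinct}
```

All four inputs map to the same string, so nothing downstream can
recover which of them was intended.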

Speaking of databases, if you ask someone like C.J. Date what the most
important feature of the relational database is, he won't talk about
view-construction or even ACID properties. He certainly won't say
anything about performance. The answer he will give you is
"well-defined semantics". While a subgraph of a network database is
basically just a bit of persistent-storage data-structure whose
meaning can only be discerned by reading the documentation, a table in
an RDB can (must) always be understood as expressing the present
instances of some relation.

> - you can still have some sort of marker to indicate what role each part
>  of the name takes (e.g. the "...." delimiter to indicate
>  pseudofiles).  Or you can use a special naming convention (e.g. tuples
>  have a special prefix).  But I think that trying to introduce a new
>  delimiter that does basically the same thing as '/' is going to cause
>  a lot of problems.  (See Rob Pike's paper, "The Hideous Name", if you
>  haven't read it already, for more on this.  It's cited in Hans'
>  "Future Vision" paper.)

':' does basically the same thing as '/' in the same sense that OR
does basically the same thing as AND. Writing '/' when you mean ':' is
like always writing OR when you mean AND (so P & (Q | R) becomes P &
(Q & R) ). Introducing ambiguity is not a great way to save a
primitive. Now you /can/ eliminate OR without ambiguity, by expressing
it in terms of AND and NOT  (so P & (Q | R) becomes P & ~(~Q & ~R) ).
(Or indeed by expressing all three in terms of NAND.) But this
approach won't work with ':' and '/', because it's 100% impossible to
express (multi-place) relations in terms of single-place predicates.
That leaves three basic options: introduce syntax for link-directories
(or some other primitive that indicates relations), never express
anything that requires relations, or embrace ambiguity.
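
For anyone who wants to check the Boolean half of that analogy, here
is a brute-force truth-table sketch in Python. It only covers the
AND/OR/NOT/NAND claims; it says nothing about ':' and '/' themselves.

```python
from itertools import product

def nand(a, b):
    return not (a and b)

# NOT, AND and OR expressed purely in terms of NAND.
def not_(a):
    return nand(a, a)

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

rows = list(product([False, True], repeat=3))

# Eliminating OR via AND and NOT preserves meaning on every row.
rewrite_ok = all((p and (q or r)) == (p and not (not q and not r))
                 for p, q, r in rows)

# The NAND-only definitions agree with the built-in operators.
nand_ok = all(not_(p) == (not p)
              and and_(p, q) == (p and q)
              and or_(p, q) == (p or q)
              for p, q, r in rows)

# Simply writing AND where OR was meant changes the result on some rows.
conflated = [(p, q, r) for p, q, r in rows
             if (p and (q or r)) != (p and (q and r))]
```

The first two checks pass on all eight rows; the third list is
non-empty, which is the ambiguity being complained about.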

Going for option 1 then creates the problem of how to hide the
relation information from (or indeed reveal it to) existing code that
doesn't know anything about new syntax for relations. This is
basically exactly the same problem as indicating the start of subfile
metadata to such code, so all the same fixes involving special
"name"-segments or segment prefixes etc. are available. I'll just add
that whatever kludges have to be applied in the
POSIX-compatibility interface, they shouldn't blight the
next-generation interface too; a delimiter like ':' is clearly nicer
than special magic "name"-segments when backwards compatibility is not
an issue.

[snip]

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


* Re: File as a directory - file-as-dir vs. link-dirs (again) - 3/3
  2005-11-21  6:40     ` Hubert Chan
  2005-12-23  5:39       ` Leo Comerford
@ 2005-12-23  5:59       ` Leo Comerford
  1 sibling, 0 replies; 9+ messages in thread
From: Leo Comerford @ 2005-12-23  5:59 UTC (permalink / raw)
  To: Hubert Chan; +Cc: reiserfs-list, Alexander G. M. Smith

On 11/21/05, Hubert Chan <hubert@uhoreg.ca> wrote:

[snip]

>
> As a completely different side issue, I think that using random names
> such as "aardvark" or "zebra" to refer to tuples is a bad idea (and I
> know this isn't part of your proposal).  If you use things that are real
> words, people will get confused, since they will try to associate
> meaning to something that doesn't have meaning.  (e.g. "why is my
> relationship called "dodo", while Bob's is called "tiger"?)  I think
> that it's best to just assign random meaningless strings, so that people
> will know that they are meaningless.
>

(This addresses the issue with last "name"-segments of
link-directories which I said in part 3 that I'd get back to.)

Meaningless "name" segments are annoying to those who know or guess
that they're meaningless. Worse, they're misleading to those people
and programs that don't. (After all, they amount to making up spurious
information.) The ideal solution is to throw them away completely:
having anonymous final segments allows two non-directory files having
the predicates '/foo/aardvark' and '/foo/zebra' to both simply be
'/foo'. It has some slightly weird effects, though; for example, when
'/foo/aardvark:bar' and '/foo/zebra:bar' both become '/foo/:bar', what
happens to

cd /foo/:bar

? One solution is to dynamically generate filler text for anonymous
final segments whenever text is necessary (in a POSIX legacy
interface, for example), based on the linked file's inode and that of
the volume; the format of the filler text should allow programs (and
humans) to detect it as such when they can't find out some other,
out-of-band, way.
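
One possible shape for such filler text, sketched in Python. The
'~anon-<volume>-<inode>' format is purely an assumption made up for
this example, and the device number from fstat() merely stands in for
"the inode of the volume"; the only point is that the text is derived
from the two numbers and is recognisable as filler.

```python
import os
import re
import tempfile

# Hypothetical filler format: "~anon-<volume id in hex>-<inode in hex>".
FILLER_RE = re.compile(r"^~anon-([0-9a-f]+)-([0-9a-f]+)$")

def filler_segment(volume_id, inode):
    """Generate filler text for an anonymous final segment."""
    return f"~anon-{volume_id:x}-{inode:x}"

def is_filler(segment):
    """Let programs detect that a segment carries no meaning."""
    return FILLER_RE.match(segment) is not None

# Derive a segment from a real file's device and inode numbers.
with tempfile.NamedTemporaryFile() as handle:
    info = os.fstat(handle.fileno())
    segment = filler_segment(info.st_dev, info.st_ino)
```

A legacy interface could then present '/foo/:bar' with such a segment
substituted in, and anything that cares can test is_filler() rather
than guess.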

[snip]

Leo.
--
Leo Richard Comerford - http://www.st-and.ac.uk/~lrc1 - accept no namesakes :)


Thread overview: 9+ messages
2005-11-19 19:58 File as a directory - file-as-dir vs. link-dirs (again) - 3/3 Alexander G. M. Smith
2005-12-08  8:53 ` Leo Comerford
  -- strict thread matches above, loose matches on Subject: below --
2005-11-18  3:42 Leo Comerford
2005-11-19  3:13 ` Hubert Chan
2005-11-20  7:03   ` Leo Comerford
2005-11-20 22:35     ` Leo Comerford
2005-11-21  6:40     ` Hubert Chan
2005-12-23  5:39       ` Leo Comerford
2005-12-23  5:59       ` Leo Comerford
