From: Hans Reiser <reiser@namesys.com>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Erik Andersen <andersen@codepoet.org>, linux-kernel@vger.kernel.org
Subject: Re: Things that Longhorn seems to be doing right
Date: Thu, 30 Oct 2003 22:23:49 +0300
Message-ID: <3FA16545.6070704@namesys.com>
In-Reply-To: <20031030174809.GA10209@thunk.org>
Theodore Ts'o wrote:
>On Thu, Oct 30, 2003 at 11:05:05AM +0300, Hans Reiser wrote:
>
>
>>What a performance nightmare. Updating a user space database every time
>>a file changes --- let's move to a micro-kernel architecture for all of
>>the kernel the same day.....;-)
>>
>>
>
>Nope, the user space database only needs to change when the file
>metadata changes.
>
>
Do you mean like it does with every file create?
>
>
>>Not to mention that SQL is utterly unsuited for semi-structured data
>>queries (what people store in filesystems is semi-structured data), and
>>would only be effective for those fields that you require every file to
>>have.
>>
>>
>
>Your assumption here is that the only thing that people search and
>index on is semi-structured data.
>
No, my assumption is that structured data is a special case of
semi-structured data, and should be modeled that way.
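A minimal sketch of that modeling claim, in Python. The field names and the "required" schema are hypothetical, invented only for illustration: records are free-form mappings, and a rigid table is just the special case where every record carries exactly the same fields.

```python
# Hypothetical illustration: semi-structured records are mappings with
# arbitrary attributes. A "structured" table is the special case in which
# every record carries exactly the same set of fields.

REQUIRED_FIELDS = {"name", "size"}  # assumed schema, for illustration only

files = [
    {"name": "report.txt", "size": 1200, "author": "mary"},
    {"name": "photo.jpg", "size": 50000, "dpi": 150, "subject": "Mary Schmidt"},
    {"name": "notes", "size": 10},
]


def is_structured(records, required):
    """Return True if every record has exactly the required fields."""
    return all(set(r) == set(required) for r in records)


def project(records, required):
    """View the records as a table over the required fields only.

    Records lacking a required field are skipped; optional attributes
    (author, dpi, subject) are dropped from the table view.
    """
    return [
        {k: r[k] for k in sorted(required)}
        for r in records
        if required <= set(r)
    ]


table = project(files, REQUIRED_FIELDS)
```

The table view can always be derived from the semi-structured records, but it only covers the fields every file is required to have; the optional attributes are not recoverable from it.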
>
>In addition, even for text-based files, in the future, files will very
>likely not be straight ASCII, but some kind of rich text based format
>with formatting, unicode, etc.
>
Formatting does not make text table-structured.
> And even general, unstructured
>text-based indexing is hard enough that putting that into the kernel
>is just as bad as putting an SQL optimizer into the kernel.
>
Well, since I don't think that SQL belongs in the filesystem, and I
think that text indexing should be done by users choosing how to index
their text, including choosing whether to use an automatic indexer or do
it by hand, and I think that the automatic indexer probably belongs in
user space (I could be wrong, but I would at least choose to do version
1 of such a thing in user space, perhaps using a language other than C),
I have to say that we are agreeing here. Surely it is an accident, but
oh well.;-)
> That I
>would claim would have to be done in userspace, as part of the
>overhead when OpenOffice saves the file. (Note that some of the
>Linux-based office suites store files as gzip'ed XML files, which
>again argues that putting it in the kernel is insane --- why should we
>compress the file, only to have the kernel uncompress it and then
>re-parse the XML just so they can index it? Much better to have
>OpenOffice do the indexing while it has the uncompressed, parsed out
>text tree in memory. And if the indexes need to be updated in
>userspace, then life is much, much, much simpler if the lookups are
>also done in userspace --- especially when complex SQL query
>optimizations may be required.)
>
>
Well, I agree.
You are missing my argument, though. I am saying that the indexes and
name space belong in the kernel, not that the auto-indexer belongs in
the kernel.
>
>
>>How about you send him a patch that removes all of that networking stuff
>>from the kernel and puts it into user space where it belongs.;-) There
>>was this Windows user on Slashdot some time ago who claimed that it
>>wasn't just the browser that should be unbundled from the kernel, the
>>whole networking stack was unfairly bundled and locked out the companies
>>that used to provide DOS with networking stacks (the user didn't have in
>>mind patching the windows kernel and recompiling, he really thought it
>>should all be in user space). Your kind of fellow.....
>>
>>
>
>Networking has definite performance requirements on a per-packet basis
>which require that it be in the kernel. Given that indexing happens
>rarely (i.e., only when a file is saved), the same arguments simply
>don't apply. If you consider how often a user is going to ask the
>question, "Give me a list of all photographs taken between June 10,
>1993 and July 24, 1996 which contains Mary Schmidt as a subject and
>whose resolution is at least 150 dpi",
>
Uh, all the time, if there is a namespace that lets him. How often do
you use Google? How often do you memorize the primary key of an object
in a relational database and use only that, versus how often do you run
a richer query?
> it definitely demonstrates why
>this doesn't need to be in the kernel.
>
>If you consider the amount of data that needs to be shovelled back and
>forth between the kernel's network device driver and a userspace
>networking stack and then back down into the kernel to the socket
>layer when processing a TCP connection over a 10 gigabit Ethernet
>link, it's clear why it has to be in the kernel.
>
> When you consider
>how much data needs to be referenced when doing indexing, and in fact
>that it may exist in uncompressed form only in the userspace
>application, you'll see why indeed it's better to do it in userspace.
>
>The bottom line is that if a case can be made that some portion of the
>functionality required by WinFS needs to be in the kernel, and in the
>filesystem layer specifically, I'm all in favor of it. But it has to
>be justified. To date, I haven't seen a justification for why the
>database processing aspect of things needs to be in the kernel.
>
> - Ted
>
>
>
>
In general, arguments over whether functionality belongs in the kernel
or a userspace library are not as easy as you tend to suggest. I think
you are a bit inclined to assume that what Unix does today is the right
thing for 2006. The kernel will probably grow at roughly the same rate
that computer horsepower grows, and the 30-year trend of putting more
and more into the kernel will continue.
Most filesystem namespace functionality belongs in the kernel because
subnames tend to invoke the functionality of other subnames when one
creates a richly compounding filesystem name space. There are, however,
exceptions to this. I would put directory lookup in the kernel. I
would put vicinity set intersection in the kernel. I would put set
difference in the kernel. I would put set union in the kernel. I would
put inheritance in the kernel. I would generally continue to put namei
in the kernel. Maybe macro expansion belongs in user space libraries; I
haven't thought enough about it to say. Probably the main reason I
don't want the auto-indexer in the kernel is irrational: I don't want to
design it, I want to see a lot of experiments, and I think the
psychological barriers to entry are lower for user space experiments.
Other valid reasons might be that string processing tools are richer in
user space, and that sys_reiser4() will provide efficient batch
operations that will overcome most of the pain of context switches to
the kernel for each index update.
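As a rough illustration of the set operations named above, here is a user-space Python sketch that answers the photograph query quoted earlier by combining per-attribute result sets. Everything in it is an assumption made for illustration: the photo data, the select() helper, and the use of plain set intersection (it does not model "vicinity" set intersection, and it says nothing about how a kernel implementation or sys_reiser4() would work).

```python
from datetime import date

# Hypothetical object store: id -> attributes. Data invented for illustration.
photos = {
    1: {"taken": date(1994, 5, 1), "subject": "Mary Schmidt", "dpi": 300},
    2: {"taken": date(1995, 1, 9), "subject": "Mary Schmidt", "dpi": 72},
    3: {"taken": date(1997, 3, 3), "subject": "Mary Schmidt", "dpi": 300},
    4: {"taken": date(1994, 8, 2), "subject": "Bob", "dpi": 600},
}


def select(attr, predicate):
    """Return the set of ids whose attribute value satisfies the predicate."""
    return {oid for oid, attrs in photos.items() if predicate(attrs[attr])}


# One result set per criterion of the query.
in_range = select("taken", lambda d: date(1993, 6, 10) <= d <= date(1996, 7, 24))
of_mary = select("subject", lambda s: s == "Mary Schmidt")
sharp = select("dpi", lambda v: v >= 150)

# The richer query is then just set algebra over those result sets.
result = in_range & of_mary & sharp   # intersection: all criteria hold
either = of_mary | sharp              # union: at least one criterion holds
mary_low_res = of_mary - sharp        # difference: Mary, but below 150 dpi
```

Each criterion yields a set of object ids, so compound names reduce to intersection, union, and difference over those sets, whichever side of the kernel boundary ends up computing them.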
--
Hans