* using reiserfs as a DB
@ 2002-04-21 20:53 Phil Howard
  2002-04-22 13:20 ` Oleg Drokin
  0 siblings, 1 reply; 11+ messages in thread
From: Phil Howard @ 2002-04-21 20:53 UTC (permalink / raw)
  To: reiserfs-list

Given the balanced tree directory structure of reiserfs, it seems it
could be usable as a DB in place of a DB library (such as Berkeley DB).
Has anyone done any timing/benchmarks of reiserfs used as a replacement
for a DB library, as compared to one such as Berkeley DB? There would
be an advantage to using conventional file tools to access the data
instead of having to code some up for a DB library. The issue would
certainly involve the open/read/close timings for reiserfs for each
piece of data accessed. The uses for which I have an interest in doing
this would mostly be small data, usually less than 128 bytes, and almost
always less than 512 bytes. For example, one use involves indexing a
lot of (100s to maybe even 1000000) URLs under special short keywords.

-- 
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| phil-nospam@ipal.net | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB
  2002-04-21 20:53 using reiserfs as a DB Phil Howard
@ 2002-04-22 13:20 ` Oleg Drokin
  2002-04-22 13:47   ` Hans Reiser
  2002-04-22 17:44   ` Phil Howard
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Drokin @ 2002-04-22 13:20 UTC (permalink / raw)
  To: Phil Howard; +Cc: reiserfs-list

Hello!

On Sun, Apr 21, 2002 at 03:53:28PM -0500, Phil Howard wrote:

> Given the balanced tree directory structure of reiserfs, it seems it
> could be usable as a DB in place of a DB library (such as Berkeley DB).
> Has anyone done any timing/benchmarks of reiserfs used as a replacement
> for a DB library, as compared to one such as Berkeley DB? There would
> be an advantage to using conventional file tools to access the data
> instead of having to code some up for a DB library. The issue would
> certainly involve the open/read/close timings for reiserfs for each
> piece of data accessed. The uses for which I have an interest in doing
> this would most be small data, usually less than 128 bytes, and almost
> always less than 512 bytes. For example, one use involves indexing a
> lot of (100s to maybe even 1000000) URLs under special short keywords.

I do not have any numbers, but take into account that while a DB
generally has to update atime/mtime/ctime on only 3 files (or even 2),
in the case of a filesystem each file accessed will have its atime
and/or mtime/ctime changed (you can turn off atime updates, of course).
Also, directory lookups ain't going to be free either.
I've not heard of a test like the one you are describing, so feel free
to implement one that will suit all your needs.

But I remember that the squid people decided lookup/open/close
operations were too expensive for them, and raw reiserfs access was
born, where you were able to directly access filesystem objects by the
keys.

Bye,
    Oleg

^ permalink raw reply [flat|nested] 11+ messages in thread
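Since nobody in the thread has numbers, the per-record open/read/close cost discussed above can be measured with a small test along these lines. This is only a sketch under assumptions that are not from the thread: the function names, the key format and the record sizes are all made up, and the files land wherever the temporary directory happens to live, so to measure reiserfs the directory would have to be on a reiserfs mount. Repeated runs will mostly hit the page cache, and mounting with noatime removes the atime updates mentioned above.

```python
import os
import tempfile
import time


def build_flat_store(directory, records):
    """Write each value to its own small file, named by its key."""
    for key, value in records.items():
        with open(os.path.join(directory, key), "wb") as f:
            f.write(value)


def timed_lookups(directory, keys):
    """Fetch every key with one open/read/close cycle.

    Returns (elapsed seconds, total bytes read).
    """
    total = 0
    start = time.perf_counter()
    for key in keys:
        fd = os.open(os.path.join(directory, key), os.O_RDONLY)
        try:
            # Records are assumed to be at most 512 bytes, as in the question.
            total += len(os.read(fd, 512))
        finally:
            os.close(fd)
    return time.perf_counter() - start, total


def run_benchmark(count=1000, size=128):
    """Create `count` records of `size` bytes in a flat directory and time reads."""
    # Hypothetical key format; any short unique file name would do.
    records = {"key%06d" % i: bytes([i % 256]) * size for i in range(count)}
    with tempfile.TemporaryDirectory() as directory:
        build_flat_store(directory, records)
        elapsed, total = timed_lookups(directory, records)
    return elapsed, total


if __name__ == "__main__":
    elapsed, total = run_benchmark()
    print("read %d bytes in %.4f s" % (total, elapsed))
```

For a comparison with a DB library, the same set of keys would be fetched through that library's lookup call inside the same timing loop.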
* Re: using reiserfs as a DB 2002-04-22 13:20 ` Oleg Drokin @ 2002-04-22 13:47 ` Hans Reiser 2002-04-22 14:03 ` Nikita Danilov 2002-04-22 17:44 ` Phil Howard 1 sibling, 1 reply; 11+ messages in thread From: Hans Reiser @ 2002-04-22 13:47 UTC (permalink / raw) To: Oleg Drokin; +Cc: Phil Howard, reiserfs-list, god Oleg Drokin wrote: >Hello! > >On Sun, Apr 21, 2002 at 03:53:28PM -0500, Phil Howard wrote: > > >>Given the balanced tree directory structure of reiserfs, it seems it >>could be usable as a DB in place of a DB library (such as Berkeley DB). >>Has anyone done any timing/benchmarks of reiserfs used as a replacement >>for a DB library, as compared to one such as Berkeley DB? There would >>be an advantage to using conventional file tools to access the data >>instead of having to code some up for a DB library. The issue would >>certainly involve the open/read/close timings for reiserfs for each >>piece of data accessed. The uses for which I have an interest in doing >>this would most be small data, usually less than 128 bytes, and almost >>always less than 512 bytes. For example, one use involves indexing a >>lot of (100s to maybe even 1000000) URLs under special short keywords. >> >> > >I do not have any numbers, but take in account that while DB database >generally have to updata atime/mtime/ctime on only 3 files (or even 2), >in case of a filesystem each file accessed will change atime and/or mtime/ctime. > >(you can turn off atime updates of course). Also directory lookups ain't going >to be free either. >I've not heard of a test like you are describing, so feel free to implement >one that will suit all your needs. > >But I remember that squid people decided lookup/open/close operations are >too expensive for them and raw reiserfs access was born, where you was able >directly access filesystems objects by the keys. > >Bye, > Oleg > > > > The reiser4() system call, is specifically designed to alleviate this problem. 
Can you discuss this with Nikita, who is I think at best half-convinced that reiser4() has any purpose;-), so that he can understand what you want and why? hans ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB 2002-04-22 13:47 ` Hans Reiser @ 2002-04-22 14:03 ` Nikita Danilov 0 siblings, 0 replies; 11+ messages in thread From: Nikita Danilov @ 2002-04-22 14:03 UTC (permalink / raw) To: Phil Howard; +Cc: Oleg Drokin, Hans Reiser, reiserfs-list Hans Reiser writes: > Oleg Drokin wrote: > > >Hello! > > > >On Sun, Apr 21, 2002 at 03:53:28PM -0500, Phil Howard wrote: > > > > > >>Given the balanced tree directory structure of reiserfs, it seems it > >>could be usable as a DB in place of a DB library (such as Berkeley DB). > >>Has anyone done any timing/benchmarks of reiserfs used as a replacement > >>for a DB library, as compared to one such as Berkeley DB? There would > >>be an advantage to using conventional file tools to access the data > >>instead of having to code some up for a DB library. The issue would > >>certainly involve the open/read/close timings for reiserfs for each > >>piece of data accessed. The uses for which I have an interest in doing > >>this would most be small data, usually less than 128 bytes, and almost > >>always less than 512 bytes. For example, one use involves indexing a > >>lot of (100s to maybe even 1000000) URLs under special short keywords. > >> > >> > > > >I do not have any numbers, but take in account that while DB database > >generally have to updata atime/mtime/ctime on only 3 files (or even 2), > >in case of a filesystem each file accessed will change atime and/or mtime/ctime. > > > >(you can turn off atime updates of course). Also directory lookups ain't going > >to be free either. > >I've not heard of a test like you are describing, so feel free to implement > >one that will suit all your needs. > > > >But I remember that squid people decided lookup/open/close operations are > >too expensive for them and raw reiserfs access was born, where you was able > >directly access filesystems objects by the keys. 
 > >
 > >Bye,
 > >    Oleg
 > >
 > >
 > The reiser4() system call, is specifically designed to alleviate this
 > problem. Can you discuss this with Nikita, who is I think at best
 > half-convinced that reiser4() has any purpose;-), so that he can
 > understand what you want and why?

Some parts of Phil Howard's concern are in fact addressed by the
reiser4() system call: avoiding the overhead of multiple system calls,
and avoiding the overhead of creating relatively heavy-weight objects
(file descriptors and all related paraphernalia) that are going to be
recycled shortly.

But, I guess, another implied idea is to use the internal reiser{fs|4}
tree directly as an indexing structure, rather than using the "semantic"
tree of user-visible pathnames for this purpose. This was implemented in
reiserfs-raw, a back-end for the squid cache. Interestingly, reiserfs-raw
was used for exactly this: storing pages indexed by the md5 of their URL.

 >
 > hans
 >

Nikita.

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB 2002-04-22 13:20 ` Oleg Drokin 2002-04-22 13:47 ` Hans Reiser @ 2002-04-22 17:44 ` Phil Howard 2002-04-22 18:12 ` Yura Umanets 2002-04-23 8:50 ` Nikita Danilov 1 sibling, 2 replies; 11+ messages in thread From: Phil Howard @ 2002-04-22 17:44 UTC (permalink / raw) To: Oleg Drokin; +Cc: reiserfs-list On Mon, Apr 22, 2002 at 05:20:09PM +0400, Oleg Drokin wrote: | On Sun, Apr 21, 2002 at 03:53:28PM -0500, Phil Howard wrote: | > Given the balanced tree directory structure of reiserfs, it seems it | > could be usable as a DB in place of a DB library (such as Berkeley DB). | > Has anyone done any timing/benchmarks of reiserfs used as a replacement | > for a DB library, as compared to one such as Berkeley DB? There would | > be an advantage to using conventional file tools to access the data | > instead of having to code some up for a DB library. The issue would | > certainly involve the open/read/close timings for reiserfs for each | > piece of data accessed. The uses for which I have an interest in doing | > this would most be small data, usually less than 128 bytes, and almost | > always less than 512 bytes. For example, one use involves indexing a | > lot of (100s to maybe even 1000000) URLs under special short keywords. | | I do not have any numbers, but take in account that while DB database | generally have to updata atime/mtime/ctime on only 3 files (or even 2), | in case of a filesystem each file accessed will change atime and/or mtime/ctime. | | (you can turn off atime updates of course). Also directory lookups ain't going | to be free either. | I've not heard of a test like you are describing, so feel free to implement | one that will suit all your needs. | | But I remember that squid people decided lookup/open/close operations are | too expensive for them and raw reiserfs access was born, where you was able | directly access filesystems objects by the keys. "By the keys" means what? 
Are the keys the filenames/paths, or are they an
internal manifestation obtained by looking up those keys? What I envision
for some of these needs is a pretty much "flat" directory structure where
the application key would be the filename in the directory. One example of
this would be a lookup table translating a ham radio callsign into a web URL
for that ham operator's web site (the keys in this case would be small
strings, 3 to 6 characters, and potentially a rather tight space if it
scales up).

Does the raw interface simply shortcut access to files in a normal reiserfs
mounted filesystem, which can also still be accessed the usual way, or is it
a special object which can only be accessed that way (if so, then it loses
the advantage of being able to use conventional tools that work on files, and
ends up being pretty much a DB lib implemented in kernel space)? Since most
operations would be open() file, read() file once (because nothing would be
larger than one block), and close(), a single system call that allowed one to
just fetch the contents given a name would certainly be a plus for the server
component.

-- 
-----------------------------------------------------------------
| Phil Howard - KA9WGN |   Dallas   | http://linuxhomepage.com/ |
| phil-nospam@ipal.net | Texas, USA | http://phil.ipal.org/     |
-----------------------------------------------------------------

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB
  2002-04-22 17:44   ` Phil Howard
@ 2002-04-22 18:12     ` Yura Umanets
  2002-04-22 19:26       ` Richard Emslie
  2002-04-22 23:16       ` Phil Howard
  2002-04-23  8:50     ` Nikita Danilov
  1 sibling, 2 replies; 11+ messages in thread
From: Yura Umanets @ 2002-04-22 18:12 UTC (permalink / raw)
  To: Phil Howard; +Cc: reiserfs-list

Phil Howard wrote:

> "By the keys" means what? Are the keys the filenames/paths,

A key in reiserfs looks like a four-dimensional coordinate of a
certain item in the filesystem.

The first component is the identifier of the directory where the given
object (file or directory) lies. The second is the identifier of the
given object. The third component is the offset inside the object: if
the object is a file, it is the offset inside this file; if it is a
directory, it is the hashed name of the first entry in this direntry.
Finally, the last component is the type of the item (statdata,
direntry, direct item, indirect item).

Therefore it is very fast to access the corresponding object item: just
$tree_height blocks need to be read.

> Does the raw interface simply shortcut access to files in a normal reiserfs
> mounted filesystem, which can also still be accessed the usual way, or is it
> a special object which can only be accessed that way (if so, then it loses
> the advantage of being able to use conventional tools that work on files, and
> ends up being pretty much a DB lib implemented in kernel space). Since most
> operations would be open() file, read() file once (because nothing would be
> larger than one block), and close(), a single system call that allowed to
> just fetch the contents given a name would certainly be a plus for the server
> component.

An example of raw access to files and directories can be seen at
http://reiserfs.linux.kiev.ua/progsreiserfs-0.3.0.tar.gz in the
files: object.c, file.c, dir.c

^ permalink raw reply [flat|nested] 11+ messages in thread
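The four-component key described above can be pictured as a tuple that sorts component by component. What follows is only a toy model, not reiserfs code: the field names, the numeric type codes and the ToyTree class are assumptions made for illustration, and a sorted list with binary search stands in for the balanced tree.

```python
import bisect
from collections import namedtuple

# Field names are illustrative; they follow the four components listed above.
Key = namedtuple("Key", ["dir_id", "object_id", "offset", "item_type"])

# Illustrative type codes; the real on-disk values are not given in this thread.
STAT_DATA, DIRENTRY, DIRECT, INDIRECT = 0, 1, 2, 3


class ToyTree:
    """Sorted list standing in for the balanced tree of (key, item) pairs."""

    def __init__(self):
        self._keys = []
        self._items = []

    def insert(self, key, item):
        # Keep both lists ordered by key, as a tree would.
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._items.insert(pos, item)

    def lookup(self, key):
        # Binary search plays the role of the $tree_height block reads.
        pos = bisect.bisect_left(self._keys, key)
        if pos < len(self._keys) and self._keys[pos] == key:
            return self._items[pos]
        return None

    def items_of(self, dir_id, object_id):
        """All items of one object are adjacent: they share a key prefix."""
        lo = bisect.bisect_left(self._keys, (dir_id, object_id))
        hi = bisect.bisect_left(self._keys, (dir_id, object_id + 1))
        return list(zip(self._keys[lo:hi], self._items[lo:hi]))


if __name__ == "__main__":
    tree = ToyTree()
    tree.insert(Key(2, 7, 1, DIRECT), b"body of object 7")
    tree.insert(Key(2, 5, 0, STAT_DATA), "stat data of object 5")
    tree.insert(Key(2, 5, 1, DIRECT), b"body of object 5")
    print(tree.lookup(Key(2, 5, 1, DIRECT)))
    print(tree.items_of(2, 5))
```

Because the directory identifier comes first in the ordering, objects created in the same directory end up next to each other in this model.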
* Re: using reiserfs as a DB 2002-04-22 18:12 ` Yura Umanets @ 2002-04-22 19:26 ` Richard Emslie 2002-04-23 6:44 ` Oleg Drokin 2002-04-22 23:16 ` Phil Howard 1 sibling, 1 reply; 11+ messages in thread From: Richard Emslie @ 2002-04-22 19:26 UTC (permalink / raw) To: Yura Umanets; +Cc: Phil Howard, reiserfs-list@namesys.com On Mon, 22 Apr 2002, Yura Umanets wrote: > Phil Howard wrote: > > > "By the keys" means what? Are the keys the filenames/paths, > > The key in reiserfs seems like four-dimentional coordinate of the > certain item in filesystem. > > The first component is identifier of directory where given object (file > or directory) lies. Second - identifier of the given object. Third > component - offset inside object. If object is file, then offset is > offset inside this file, if directory - hashed name of first entry in > this direntry. And finally last component is type of the item (statdata, > direntry, direct item, indirect item). > > Therefore it is very fast to access corresponding object item. This is > just $tree_height blocks to be read. > > > Does the raw interface simply shortcut access to files in a normal reiserfs > > mounted filesystem, which can also still be accessed the usual way, or is it > > a special object which can only be accessed that way (if so, then it loses > > the advantage of being able to use conventional tools that work on files, and > > ends up being pretty much a DB lib implemented in kernel space). Since most > > operations would be open() file, read() file once (because nothing would be > > larger than one block), and close(), a single system call that allowed to > > just fetch the contents given a name would certainly be a plus for the server > > component. 
>
>
> The instance of what is a raw access to files and directories you can
> see on http://reiserfs.linux.kiev.ua/progsreiserfs-0.3.0.tar.gz in
> files: object.c, file.c, dir.c
>
>

Sorry for sounding dumb, but am I right in saying this code does not go
near the reiserfs kernel code, i.e. it accesses the disk directly at
block level?

Is this reiserfs-raw? If so, how does this benefit from reiserfs's
internal tree?

If this has nothing to do with reiserfs-raw, how can one access a
partition when mounted raw? I.e. open(pathname) doesn't make much sense.

Cheers,
Richard

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB 2002-04-22 19:26 ` Richard Emslie @ 2002-04-23 6:44 ` Oleg Drokin 0 siblings, 0 replies; 11+ messages in thread From: Oleg Drokin @ 2002-04-23 6:44 UTC (permalink / raw) To: Richard Emslie; +Cc: Yura Umanets, Phil Howard, reiserfs-list@namesys.com Hello! On Mon, Apr 22, 2002 at 08:26:19PM +0100, Richard Emslie wrote: > > The instance of what is a raw access to files and directories you can > > see on http://reiserfs.linux.kiev.ua/progsreiserfs-0.3.0.tar.gz in > > files: object.c, file.c, dir.c > Sorry for sounding dumb but am I right in saying this code does not go > near reiserfs kernel code. ie it is directly accessing at block level? Yes. > Is this reiserfs-raw? If so how does this benfit from the reiserfs's > internal tree? No. Reiserfs-raw is a different thing. In reiserfs-raw you actually mount your fs, and then access the data through ioctls. > If this has nothing to do with reiserfs-raw, how can one access a > partition when mounted raw? ie open(pathname) doesn't make much sense. You can do it through ioctl. Nikita should know the details. Bye, Oleg ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB 2002-04-22 18:12 ` Yura Umanets 2002-04-22 19:26 ` Richard Emslie @ 2002-04-22 23:16 ` Phil Howard 2002-04-23 6:46 ` Oleg Drokin 1 sibling, 1 reply; 11+ messages in thread From: Phil Howard @ 2002-04-22 23:16 UTC (permalink / raw) To: Yura Umanets; +Cc: reiserfs-list On Mon, Apr 22, 2002 at 09:12:36PM +0300, Yura Umanets wrote: | Phil Howard wrote: | | >"By the keys" means what? Are the keys the filenames/paths, | | The key in reiserfs seems like four-dimentional coordinate of the | certain item in filesystem. | | The first component is identifier of directory where given object (file | or directory) lies. Second - identifier of the given object. Third | component - offset inside object. If object is file, then offset is | offset inside this file, if directory - hashed name of first entry in | this direntry. And finally last component is type of the item (statdata, | direntry, direct item, indirect item). How is the application going to know what the key is for a particular file? How is the application going to translate what it has as a key, into the kind of key the raw interface uses? How costly is this lookup? -- ----------------------------------------------------------------- | Phil Howard - KA9WGN | Dallas | http://linuxhomepage.com/ | | phil-nospam@ipal.net | Texas, USA | http://phil.ipal.org/ | ----------------------------------------------------------------- ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB
  2002-04-22 23:16       ` Phil Howard
@ 2002-04-23  6:46         ` Oleg Drokin
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Drokin @ 2002-04-23 6:46 UTC (permalink / raw)
  To: Phil Howard; +Cc: Yura Umanets, reiserfs-list

Hello!

On Mon, Apr 22, 2002 at 06:16:45PM -0500, Phil Howard wrote:

> | The first component is identifier of directory where given object (file
> | or directory) lies. Second - identifier of the given object. Third
> | component - offset inside object. If object is file, then offset is
> | offset inside this file, if directory - hashed name of first entry in
> | this direntry. And finally last component is type of the item (statdata,
> | direntry, direct item, indirect item).
> How is the application going to know what the key is for a particular
> file? How is the application going to translate what it has as a key,

It seems I used the wrong word. What was used to access files was in
fact the md5 sums of their names (the URL in the squid case).

> into the kind of key the raw interface uses? How costly is this lookup?

Once Nikita appears, he can explain better, because he invented the
code, I believe.

Bye,
    Oleg

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: using reiserfs as a DB 2002-04-22 17:44 ` Phil Howard 2002-04-22 18:12 ` Yura Umanets @ 2002-04-23 8:50 ` Nikita Danilov 1 sibling, 0 replies; 11+ messages in thread From: Nikita Danilov @ 2002-04-23 8:50 UTC (permalink / raw) To: Phil Howard; +Cc: reiserfs-list Hello, Phil Howard writes: > On Mon, Apr 22, 2002 at 05:20:09PM +0400, Oleg Drokin wrote: > > | On Sun, Apr 21, 2002 at 03:53:28PM -0500, Phil Howard wrote: > | > Given the balanced tree directory structure of reiserfs, it seems it > | > could be usable as a DB in place of a DB library (such as Berkeley DB). > | > Has anyone done any timing/benchmarks of reiserfs used as a replacement > | > for a DB library, as compared to one such as Berkeley DB? There would > | > be an advantage to using conventional file tools to access the data > | > instead of having to code some up for a DB library. The issue would > | > certainly involve the open/read/close timings for reiserfs for each > | > piece of data accessed. The uses for which I have an interest in doing > | > this would most be small data, usually less than 128 bytes, and almost > | > always less than 512 bytes. For example, one use involves indexing a > | > lot of (100s to maybe even 1000000) URLs under special short keywords. > | > | I do not have any numbers, but take in account that while DB database > | generally have to updata atime/mtime/ctime on only 3 files (or even 2), > | in case of a filesystem each file accessed will change atime and/or mtime/ctime. > | > | (you can turn off atime updates of course). Also directory lookups ain't going > | to be free either. > | I've not heard of a test like you are describing, so feel free to implement > | one that will suit all your needs. > | > | But I remember that squid people decided lookup/open/close operations are > | too expensive for them and raw reiserfs access was born, where you was able > | directly access filesystems objects by the keys. > > "By the keys" means what? 
Are the keys the filenames/paths, or are they an
 > internal manifestation obtained by looking up those keys? What I envision
 > in some needs ideas are pretty much "flat" directory structures where the
 > application key would be the filename in the directory. One example of this
 > would be a lookup table translating a ham radio callsign into a web URL for
 > that ham operators web site (the keys in this case would be small strings,
 > 3 to 6 characters, and potentially a rather tight space if it scales up).
 >
 > Does the raw interface simply shortcut access to files in a normal reiserfs
 > mounted filesystem, which can also still be accessed the usual way, or is it
 > a special object which can only be accessed that way (if so, then it loses
 > the advantage of being able to use conventional tools that work on files, and
 > ends up being pretty much a DB lib implemented in kernel space). Since most
 > operations would be open() file, read() file once (because nothing would be
 > larger than one block), and close(), a single system call that allowed to
 > just fetch the contents given a name would certainly be a plus for the server
 > component.
 >

I shall try to answer these and other questions about reiserfs-raw.

Internally, reiserfs stores almost all file-system meta-data (directory
entries, on-disk inodes, and pointers to blocks with file data) and some
file-system data ("tails", the last portions of file bodies) in a
balanced tree similar to the ones described in standard CS textbooks.

Specifically, each file-system object (directory, regular file, symbolic
link, etc.) is represented as a sequence of "items". Each item is stored
in the tree under some "key". In reiser3.x a key is 16 bytes. To obtain
meta-data, the file system composes a key and performs a tree lookup
(the search_by_key() function).
The key of an item is composed from a unique identifier of the object
("objectid", also used as the inode number), its "packing locality",
which happens to be the objectid of the directory where the object was
created (*the* parent directory, so to speak), the item type, and the
"offset" within the object. For a regular file the offset really is the
offset within the file; for a directory, the offset of a directory entry
is, roughly speaking, the hash of the name stored in this directory
entry.

As I said, reiserfs just uses this tree (referred to as "internal") to
build the user-visible file system structure (which itself is a tree,
called "semantic") on top of it. Note that these trees are not even
close to being isomorphic.

Reiserfs-raw implemented an API to access the internal reiserfs tree
directly, that is, without going through the semantic tree first. An
application using this API is responsible for:

(1) assigning keys to objects. The application creates an anonymous
    object by giving its objectid. There are no directories. The only
    way to access the object later is by knowing its objectid. Of
    course, an objectid can be stored in the tree itself, but this way
    one just builds some sort of directories.

(2) keeping track of object lifetime. In standard file systems, the
    directory tree also serves as a garbage collector: when the link
    count drops to zero, the object is recycled. In reiserfs-raw there
    are no directories and hence no garbage collector is provided by
    the system.

Reiserfs-raw was designed as a back-end for SquidNG (Squid New
Generation), a project to rewrite squid and get rid of some of its
limitations (mainly the necessity to keep all cache meta-data in memory
all the time). It was mainly implemented by Yury Shevchuk
<sizif@botik.ru> and sponsored by IntegratedLinux (not sure how they are
named today). Joe Cooper (joe@swelltech.com) maintained SquidNG
(http://www.swelltech.com/pengies/joe/squidng.html).

Later Arkadi E. Shishlov (arkadi@it.lv) ported reiserfs-raw to 2.4
kernels (http://kvin.lv/arkadi/reiserfs-raw/).
I cannot help mentioning that SquidNG+reiserfs-raw outperformed all
other Squids by a large margin at the official benchmarking event.

Namesys doesn't support reiserfs-raw.

Nikita.

^ permalink raw reply [flat|nested] 11+ messages in thread
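The two responsibilities listed above (assigning keys and tracking object lifetime) can be illustrated with a toy flat store. This is a sketch under stated assumptions and does not talk to reiserfs-raw, whose real ioctl interface is not shown in this thread: the RawStore class and its method names are invented, a dict stands in for the internal tree, and using the whole md5 digest as one integer key is a guess at how a URL might map to an objectid.

```python
import hashlib


class RawStore:
    """Toy stand-in for a flat keyed object store without directories."""

    def __init__(self):
        # A dict plays the role of the internal tree in this model.
        self._objects = {}

    @staticmethod
    def key_for(url):
        # The thread says objects were addressed by the md5 of the URL.
        # Treating the full digest as an integer key is an assumption.
        digest = hashlib.md5(url.encode("utf-8")).digest()
        return int.from_bytes(digest, "big")

    def put(self, url, body):
        """(1) The application, not the file system, assigns the key."""
        key = self.key_for(url)
        self._objects[key] = body
        return key

    def get(self, key):
        # No path lookup: the object is reachable only through its key.
        return self._objects.get(key)

    def delete(self, key):
        """(2) No link counts exist, so the application must free objects."""
        return self._objects.pop(key, None) is not None


if __name__ == "__main__":
    store = RawStore()
    key = store.put("http://example.com/", b"cached page")
    print(store.get(key))
    print(store.delete(key))
```

An object whose key has been forgotten can never be found or freed in such a scheme, which is the cost of dropping directories that point (2) describes.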