From: Boaz Harrosh <bharrosh@panasas.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>,
	Matthew Wilcox <matthew@wil.cx>,
	Benny Halevy <bhalevy@panasas.com>, Jeff Garzik <jeff@garzik.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@ZenIV.linux.org.uk>,
	Avishay Traeger <avishay@gmail.com>,
	open-osd development <osd-dev@open-osd.org>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 7/9] exofs: mkexofs
Date: Sun, 04 Jan 2009 17:20:42 +0200
Message-ID: <4960D3CA.2000202@panasas.com>
In-Reply-To: <1230739053.3408.74.camel@localhost.localdomain>

James Bottomley wrote:
> On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
>> Andrew Morton wrote:
>>> On Tue, 16 Dec 2008 17:33:48 +0200
>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>
>>>> We need a mechanism to prepare the file system (mkfs).
>>>> I chose to implement that by means of a couple of
>>>> mount-options. Because there is no user-mode API for committing
>>>> OSD commands. And also, all this stuff is highly internal to
>>>> the file system itself.
>>>>
>>>> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
>>>>   can be executed by kernel code just before mount. An mkexofs utility
>>>>   can now be implemented by means of a script that mounts and unmount the
>>>>   file system with proper options.
>>> Doing mkfs in-kernel is unusual.  I don't think the above description
>>> sufficiently helps the uninitiated understand why mkfs cannot be done
>>> in userspace as usual.  Please flesh it out a bit.
>> There are a few main reasons.
>> - There is no user-mode API for initiating OSD commands. Such a subsystem
>>   would be hundredfold bigger then the mkfs code submitted. I think it would be
>>   hard and stupid to maintain a complex user-mode API just for creating
>>   a couple of objects and writing a couple of on disk structures.
> 
> This is really a reflection of the whole problem with the OSD paradigm.

Certainly not a problem of the OSD paradigm, just maybe a problem
of the current code boundaries laid out by years of block-devices.

> In theory, a filesystem on OSD is a thin layer of metadata mapping
> objects to files.  Get this right and the storage will manage things,
- objects to files.  Get this right and the storage will manage things,
+ files to objects.  Get this right and the storage will manage things,
[objects to files is what some of the osd-targets do.]
> like security and access and attributes (there's even a natural mapping
> to the VFS concept of extended attributes).  Plus, the storage has
> enough information to manage persistence, backups and replication.
> 

Sounds perfect to me.

> The real problem is that no-one has actually managed to come up with a
> useful VFS<->OSD mapping layer (even by extending or altering the VFS).
> Every filesystem that currently uses OSD has a separate direct OSD
> speaking interface (i.e. it slices out the block layer to do this and
> talks directly to the storage).

I'm not sure what you mean.
Let's take VFS<->BLOCKS mapping for example. Each FS has its own
interpretation of what that means. Is btrfs less perfect than xfs,
or vice versa?
I guess you did not mean "mapping" but meant "Interface" or API.
(or more likely I misunderstand the meaning of "mapping" ;)

Well, that is exactly what I was attempting to submit: a general-purpose,
low-level but easy-to-use objects API for kernel clients, be it a
dead-simple exofs or a complex multi-head beast like a pNFS-Objects
file system. The same library/API/Interface will be used for NFS-Clients,
NFSD-Servers, reconstruction, security, whatever.

The block-layer is not sliced out; only the elevator function is, since
BIO merging, if any, is not device-global but per-object/file, and the
elevator does not currently support that. (Profiling shows that it will
be needed.)

BTW, the block-based filesystems are just a big minority in the Kernel. The
majority do not use the block-layer either.

> 
> I suppose this could be taken to show that such a layer is impossibly
> complex, as you assert, but its lack is reflected in strange looking
> design decisions like in-kernel mkfs.  It would also mean that there
> would be very little layered code sharing between ODS based filesystems.
- would be very little layered code sharing between ODS based filesystems.
+ would be very little layered code sharing between OSD based filesystems.

I disagree.
All the OSD-Based file systems (In Linux) should absolutely only use the
open-osd library submitted. I myself will work on a couple. If anything is
missing that could not be added later, I would like to know about it.

A user-mode Interface is another matter. There are some ideas, and some are
already implemented.
[Hosted on open-osd.org
 see: http://git.open-osd.org/gitweb.cgi?p=osc-osd/.git;a=summary
 look inside the osd-initiator directory]
And I have a toy interface that adds no new entries into the Kernel in
the form of an OSDVFS module, that will let you access the raw OSD device
through the VFS name-space.

The lack of any user-mode API just reflects the lack of any current
need/priority, or the fact that I'm the only one working on OSD. It is nothing
that could not be solved in two weeks of pragmatic work. Surely it's not a
paradigm problem.

> 
>> - I intend to refactor the code further to make use of more super.c services,
>>   so to make this addition even smaller. Also future direction of raid over
>>   multiple objects will make even more kernel infrastructure needed which
>>   will need even more user-mode code duplication.
>> - I anticipate problems that are not yet addressed in this body of work
>>   but will be in the future, mainly that a single OSD-target (lun) can
>>   be shared by lots of FSs, and a single FS can span many OSD-targets.
>>   Some central management is much easier to do in Kernel.
>>
>>> What are the dependencies for this filesystem code?  I assume that it
>>> depends on various block- and scsi-level patches?  Which ones, and
>>> what is their status, and is this code even compileable without them?
>>>
>> This OSD-based file system is dependent on the open-osd initiator library
>> code that I've submitted for inclusion for 2.6.29. It has been sitting
>> in linux-next for a while now, and has not been receiving any comments
>> for the last two updated patchsets I've sent to scsi-misc/lkml. However
>> it has not yet been submitted into Jame's scsi-misc git tree, and James
>> is the ultimate maintainer that should submit this work. I hope it will
>> still be submitted into 2.6.29, as this code is totally self sufficient
>> and does not endangers or changes any other Kernel subsystems.
>> (All the needed ground work was already submitted to Linus since 2.6.26)
>> So why should it not?
> 
> I don't like it mainly because it's not truly a useful general framework
> for others to build on.  However, as argued above, there might not
> actually be such a useful framework, so as long as the only two
> consumers (you and Lustre) want an interface like this, I'll put it in.
> 

Time will tell, but I believe the exact opposite. I believe and strive
for this OSD body of work to be useful for anybody that needs to talk
T10-OSD in Linux, for any purpose. Anything missing should be
easy to add.

> James
> 
> 

To summarize the way I see it:
- James is right in that we can not currently see the full OSD picture since
  we do not have a user-mode API, so the usefulness of it all is unclear.
  [I will send an RFD soon, and hope all interested will chime in on the
   discussion]
- That said, all the submitted code is still relevant and useful,
  though in a few places it takes the route of pragmatic-easy vs
  long-term-correctness. [Which can be fixed]
- exofs/OSD is not the first FS that depends on a non-block-dev/its-own
  stack. The lower level (OSD) is represented to the kernel as a char-dev +
  additional API, as is common in other FS/stack models. The lower OSD
  level, though, has the potential to be a generic layer that can be used by
  lots of users and use cases, not only the FS type.

Thank you James for your consideration
Boaz
