From: Jeff Garzik <jeff@garzik.org>
To: Benny Halevy <bhalevy@panasas.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
	open-osd development <osd-dev@open-osd.org>,
	Boaz Harrosh <bharrosh@panasas.com>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	linux-kernel@vger.kernel.org, avishay@gmail.com,
	viro@ZenIV.linux.org.uk, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [osd-dev] [PATCH 7/9] exofs: mkexofs
Date: Thu, 01 Jan 2009 04:54:16 -0500
Message-ID: <495C92C8.5040702@garzik.org>
In-Reply-To: <495C8B65.4010202@panasas.com>

Benny Halevy wrote:
> On Dec. 31, 2008, 17:57 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>> On Wed, 2008-12-31 at 17:19 +0200, Boaz Harrosh wrote:
>>> Andrew Morton wrote:
>>>> On Tue, 16 Dec 2008 17:33:48 +0200
>>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>>
>>>>> We need a mechanism to prepare the file system (mkfs).
>>>>> I chose to implement that by means of a couple of
>>>>> mount-options. Because there is no user-mode API for committing
>>>>> OSD commands. And also, all this stuff is highly internal to
>>>>> the file system itself.
>>>>>
>>>>> - Added two mount options mkfs=0/1,format=capacity_in_meg, so mkfs/format
>>>>>   can be executed by kernel code just before mount. An mkexofs utility
>>>>>   can now be implemented by means of a script that mounts and unmounts the
>>>>>   file system with proper options.
>>>> Doing mkfs in-kernel is unusual.  I don't think the above description
>>>> sufficiently helps the uninitiated understand why mkfs cannot be done
>>>> in userspace as usual.  Please flesh it out a bit.
>>> There are a few main reasons.
>>> - There is no user-mode API for initiating OSD commands. Such a subsystem
>>>   would be a hundredfold bigger than the mkfs code submitted. I think it would be
>>>   hard and stupid to maintain a complex user-mode API just for creating
>>>   a couple of objects and writing a couple of on-disk structures.
>> This is really a reflection of the whole problem with the OSD paradigm.
>>
>> In theory, a filesystem on OSD is a thin layer of metadata mapping
>> objects to files.  Get this right and the storage will manage things,
>> like security and access and attributes (there's even a natural mapping
>> to the VFS concept of extended attributes).  Plus, the storage has
>> enough information to manage persistence, backups and replication.
>>
>> The real problem is that no-one has actually managed to come up with a
>> useful VFS<->OSD mapping layer (even by extending or altering the VFS).
>> Every filesystem that currently uses OSD has a separate direct OSD
>> speaking interface (i.e. it slices out the block layer to do this and
>> talks directly to the storage).
>>
>> I suppose this could be taken to show that such a layer is impossibly
>> complex, as you assert, but its lack is reflected in strange looking
>> design decisions like in-kernel mkfs.  It would also mean that there
>> would be very little layered code sharing between OSD-based filesystems.
> 
> I think that we may need to gain some more experience to extract the
> commonalities of such file systems.  Currently we came up with the
> lowest common denominator: the osd initiator library, which deals
> with command formatting and execution, including attrs, sense status,
> and security.

Not putting words in James' mouth, but I definitely agree that the 
in-kernel mkfs raises a red flag or two.  mkfs.ext3 for block-based 
filesystems has direct and intimate knowledge of ext3 filesystem 
structure, and it writes that information from userland directly to the 
block(s) necessary.

Similarly, mkfs for an object-based filesystem should be issuing SCSI 
commands to the OSD device from userland, AFAICS.
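
As a rough illustration of what issuing a SCSI command from userland can
look like on Linux, here is a minimal sketch built on the sg driver's SG_IO
ioctl. It sends a plain INQUIRY as a stand-in: a real mkfs for an OSD would
have to build OSD CDBs, which are long variable-length CDBs that this sketch
does not attempt. The names SgIoHdr, build_inquiry_cdb and send_inquiry are
hypothetical, as are the default buffer sizes and timeout; the structure
layout and constants are meant to mirror <scsi/sg.h>.

```python
import ctypes
import fcntl
import os

SG_IO = 0x2285            # ioctl request number, as defined in <scsi/sg.h>
SG_DXFER_FROM_DEV = -3    # data flows from the device to the host
INQUIRY = 0x12            # SPC opcode, used here only as a stand-in command


class SgIoHdr(ctypes.Structure):
    """ctypes mirror of struct sg_io_hdr (sg v3 interface)."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def build_inquiry_cdb(alloc_len):
    """Return a 6-byte standard INQUIRY CDB requesting alloc_len bytes."""
    if not 0 <= alloc_len <= 0xFFFF:
        raise ValueError("allocation length must fit in 16 bits")
    return bytes([INQUIRY, 0, 0, (alloc_len >> 8) & 0xFF, alloc_len & 0xFF, 0])


def send_inquiry(dev_path, alloc_len=96, timeout_ms=5000):
    """Issue INQUIRY through SG_IO and return the bytes the device sent back."""
    cdb_bytes = build_inquiry_cdb(alloc_len)
    cdb = (ctypes.c_ubyte * len(cdb_bytes))(*cdb_bytes)
    data = ctypes.create_string_buffer(alloc_len)
    sense = ctypes.create_string_buffer(32)

    hdr = SgIoHdr()
    hdr.interface_id = ord("S")          # required marker for the sg v3 header
    hdr.dxfer_direction = SG_DXFER_FROM_DEV
    hdr.cmd_len = len(cdb_bytes)
    hdr.mx_sb_len = len(sense)
    hdr.dxfer_len = alloc_len
    hdr.dxferp = ctypes.addressof(data)
    hdr.cmdp = ctypes.addressof(cdb)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = timeout_ms             # milliseconds

    fd = os.open(dev_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, SG_IO, hdr)      # the kernel fills in the output fields
    finally:
        os.close(fd)

    if hdr.status != 0:
        raise OSError("SCSI status 0x%02x" % hdr.status)
    return data.raw[: alloc_len - hdr.resid]
```

Calling send_inquiry("/dev/sg0") needs a real sg device node and enough
privileges to open it; the device path is only an example. The point is that
the whole command, CDB and data buffers included, is assembled in userland
and handed to the kernel as-is.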


> To provide a higher level abstraction that would help with "administrative"
> tasks like mkfs and the like we already tossed an idea in the past -
> a file system that will represent the contents of an OSD in a namespace,
> for example: partition_id / object_id / {data, attrs / ..., ctl / ...}.
> Such a file system could provide a generic mapping which one could
> use to easily develop management applications for the OSD.  That said,
> it's out of the scope of exofs which focuses mostly on the filesystem
> data and metadata paths.

That's far too complex for what is necessary.  Just issue SCSI commands 
from userland.  We don't need an abstract interface specifically for 
low-level details.  The VFS is that abstract interface; anything else 
should be low-level and purpose-built.

	Jeff
