public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
       [not found] ` <1234185129-31858-1-git-send-email-bharrosh@panasas.com>
@ 2009-02-16  4:18   ` FUJITA Tomonori
  2009-02-16  8:49     ` Boaz Harrosh
  0 siblings, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2009-02-16  4:18 UTC (permalink / raw)
  To: bharrosh
  Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel,
	James.Bottomley, jens.axboe, linux-scsi

On Mon,  9 Feb 2009 15:12:09 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> This patch includes osd infrastructure that will be used later by
> the file system.
> 
> Also the declarations of constants, on disk structures,
> and prototypes.
> 
> And the Kbuild+Kconfig files needed to build the exofs module.
> 
> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> ---
>  fs/exofs/Kbuild   |   30 +++++++
>  fs/exofs/Kconfig  |   13 +++
>  fs/exofs/common.h |  181 +++++++++++++++++++++++++++++++++++++++++
>  fs/exofs/exofs.h  |  139 ++++++++++++++++++++++++++++++++
>  fs/exofs/osd.c    |  230 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 593 insertions(+), 0 deletions(-)
>  create mode 100644 fs/exofs/Kbuild
>  create mode 100644 fs/exofs/Kconfig
>  create mode 100644 fs/exofs/common.h
>  create mode 100644 fs/exofs/exofs.h
>  create mode 100644 fs/exofs/osd.c

> +static void _osd_read(struct osd_request *or,
> +	const struct osd_obj_id *obj, uint64_t offset, struct bio *bio)
> +{
> +	osd_req_read(or, obj, bio, offset);
> +	EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
> +		_LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size),
> +		_LLU(offset));
> +}
> +
> +#ifdef __KERNEL__

Hmm?

> +static struct bio *_bio_map_pages(struct request_queue *req_q,
> +				  struct page **pages, unsigned page_count,
> +				  size_t length, gfp_t gfp_mask)
> +{
> +	struct bio *bio;
> +	int i;
> +
> +	bio = bio_alloc(gfp_mask, page_count);
> +	if (!bio) {
> +		EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count);
> +		return NULL;
> +	}
> +
> +	for (i = 0; i < page_count && length; i++) {
> +		size_t use_len = min(length, PAGE_SIZE);
> +
> +		if (use_len !=
> +		    bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) {
> +			EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p "
> +				  "use_len=%Zd page_count=%d length=%Zd\n",
> +				  req_q, pages[i], use_len, page_count, length);
> +			bio_put(bio);
> +			return NULL;
> +		}
> +
> +		length -= use_len;
> +	}
> +
> +	WARN_ON(length);
> +	return bio;
> +}

1) exofs builds bios by hand.
2) exofs passes bio to OSD SCSI ULD.

As a result, exofs and OSD SCSI ULD need to know the internals of bio;
that is, you reinvent the bio handling infrastructure, as pointed out
in another thread on scsi-ml.

_bio_map_pages is called where the VFS passes in an array of pointers
to page frames.

Why can't you simply pass the array to the OSD SCSI ULD? Then the OSD SCSI ULD
can use the block layer helper functions to build a request out of
pages without knowing the internals of bio.


> +int osd_req_read_pages(struct osd_request *or,
> +	const struct osd_obj_id *obj, u64 offset, u64 length,
> +	struct page **pages, int page_count)
> +{
> +	struct request_queue *req_q = or->osd_dev->scsi_device->request_queue;
> +	struct bio *bio = _bio_map_pages(req_q, pages, page_count, length,
> +					 GFP_KERNEL);
> +
> +	if (!bio)
> +		return -ENOMEM;
> +
> +	_osd_read(or, obj, offset, bio);
> +	return 0;
> +}
> +#endif /* def __KERNEL__ */
> +
> +int osd_req_read_kern(struct osd_request *or,
> +	const struct osd_obj_id *obj, u64 offset, void* buff, u64 len)
> +{
> +	struct request_queue *req_q = or->osd_dev->scsi_device->request_queue;
> +	struct bio *bio = bio_map_kern(req_q, buff, len, GFP_KERNEL);
> +
> +	if (!bio)
> +		return -ENOMEM;
> +
> +	_osd_read(or, obj, offset, bio);
> +	return 0;
> +}
> +
> +static void _osd_write(struct osd_request *or,
> +	const struct osd_obj_id *obj, uint64_t offset, struct bio *bio)
> +{
> +	osd_req_write(or, obj, bio, offset);
> +	EXOFS_DBGMSG("osd_req_write(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
> +		_LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size),
> +		_LLU(offset));
> +}
> +
> +#ifdef __KERNEL__
> +int osd_req_write_pages(struct osd_request *or,
> +	const struct osd_obj_id *obj, u64 offset, u64 length,
> +	struct page **pages, int page_count)
> +{
> +	struct request_queue *req_q = or->osd_dev->scsi_device->request_queue;
> +	struct bio *bio = _bio_map_pages(req_q, pages, page_count, length,
> +					 GFP_KERNEL);
> +
> +	if (!bio)
> +		return -ENOMEM;
> +
> +	_osd_write(or, obj, offset, bio);
> +	return 0;
> +}
> +#endif /* def __KERNEL__ */
> +
> +int osd_req_write_kern(struct osd_request *or,
> +	const struct osd_obj_id *obj, u64 offset, void* buff, u64 len)
> +{
> +	struct request_queue *req_q = or->osd_dev->scsi_device->request_queue;
> +	struct bio *bio = bio_map_kern(req_q, buff, len, GFP_KERNEL);
> +
> +	if (!bio)
> +		return -ENOMEM;
> +
> +	_osd_write(or, obj, offset, bio);
> +	return 0;
> +}
> -- 
> 1.6.0.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  4:18   ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
@ 2009-02-16  8:49     ` Boaz Harrosh
  2009-02-16  9:00       ` FUJITA Tomonori
  0 siblings, 1 reply; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-16  8:49 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel,
	James.Bottomley, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> On Mon,  9 Feb 2009 15:12:09 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> This patch includes osd infrastructure that will be used later by
>> the file system.
>>
>> Also the declarations of constants, on disk structures,
>> and prototypes.
>>
>> And the Kbuild+Kconfig files needed to build the exofs module.
>>
>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>> ---
>>  fs/exofs/Kbuild   |   30 +++++++
>>  fs/exofs/Kconfig  |   13 +++
>>  fs/exofs/common.h |  181 +++++++++++++++++++++++++++++++++++++++++
>>  fs/exofs/exofs.h  |  139 ++++++++++++++++++++++++++++++++
>>  fs/exofs/osd.c    |  230 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 593 insertions(+), 0 deletions(-)
>>  create mode 100644 fs/exofs/Kbuild
>>  create mode 100644 fs/exofs/Kconfig
>>  create mode 100644 fs/exofs/common.h
>>  create mode 100644 fs/exofs/exofs.h
>>  create mode 100644 fs/exofs/osd.c
> 
>> +static void _osd_read(struct osd_request *or,
>> +	const struct osd_obj_id *obj, uint64_t offset, struct bio *bio)
>> +{
>> +	osd_req_read(or, obj, bio, offset);
>> +	EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
>> +		_LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size),
>> +		_LLU(offset));
>> +}
>> +
>> +#ifdef __KERNEL__
> 
> Hmm?
> 

Yep, this file also compiles in user mode.

>> +static struct bio *_bio_map_pages(struct request_queue *req_q,
>> +				  struct page **pages, unsigned page_count,
>> +				  size_t length, gfp_t gfp_mask)
>> +{
>> +	struct bio *bio;
>> +	int i;
>> +
>> +	bio = bio_alloc(gfp_mask, page_count);
>> +	if (!bio) {
>> +		EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count);
>> +		return NULL;
>> +	}
>> +
>> +	for (i = 0; i < page_count && length; i++) {
>> +		size_t use_len = min(length, PAGE_SIZE);
>> +
>> +		if (use_len !=
>> +		    bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) {
>> +			EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p "
>> +				  "use_len=%Zd page_count=%d length=%Zd\n",
>> +				  req_q, pages[i], use_len, page_count, length);
>> +			bio_put(bio);
>> +			return NULL;
>> +		}
>> +
>> +		length -= use_len;
>> +	}
>> +
>> +	WARN_ON(length);
>> +	return bio;
>> +}
> 
> 1) exofs builds bios by hand.
> 2) exofs passes bio to OSD SCSI ULD.
> 
> As a result, exofs and OSD SCSI ULD need to know the internal of bio,
> that is, you reinvent the bio handling infrastructure, as pointed out
> in another thread in scsi-ml.
> 
> _bio_map_pages is called where the VFS passes an array of a pointer to
> a page frame.
> 
> Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD
> can use the block layer helper functions to build a request out of
> pages without knowing the internal of bio.
> 
> 

Because this code is actually wrong and temporary and will change soon.
At the VFS write_pages level I do not get a linear array of page pointers but a
linked list of pages. This will not fit any current model. Also, looking
ahead I will have RAID 0, 1, 5, and 6 on objects of different devices; bio
is the perfect collector for memory information in this situation.

exofs is not the first and only file system that uses bios. Proof of
that is that the block layer exports a bio submit routine.

As I said on the other thread, I could live without it for a short while,
but I will regret it badly and it will hurt performance in the long run.

<snip>

Boaz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  8:49     ` Boaz Harrosh
@ 2009-02-16  9:00       ` FUJITA Tomonori
  2009-02-16  9:19         ` Boaz Harrosh
  0 siblings, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2009-02-16  9:00 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

On Mon, 16 Feb 2009 10:49:39 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> FUJITA Tomonori wrote:
> > On Mon,  9 Feb 2009 15:12:09 +0200
> > Boaz Harrosh <bharrosh@panasas.com> wrote:
> > 
> >> This patch includes osd infrastructure that will be used later by
> >> the file system.
> >>
> >> Also the declarations of constants, on disk structures,
> >> and prototypes.
> >>
> >> And the Kbuild+Kconfig files needed to build the exofs module.
> >>
> >> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
> >> ---
> >>  fs/exofs/Kbuild   |   30 +++++++
> >>  fs/exofs/Kconfig  |   13 +++
> >>  fs/exofs/common.h |  181 +++++++++++++++++++++++++++++++++++++++++
> >>  fs/exofs/exofs.h  |  139 ++++++++++++++++++++++++++++++++
> >>  fs/exofs/osd.c    |  230 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  5 files changed, 593 insertions(+), 0 deletions(-)
> >>  create mode 100644 fs/exofs/Kbuild
> >>  create mode 100644 fs/exofs/Kconfig
> >>  create mode 100644 fs/exofs/common.h
> >>  create mode 100644 fs/exofs/exofs.h
> >>  create mode 100644 fs/exofs/osd.c
> > 
> >> +static void _osd_read(struct osd_request *or,
> >> +	const struct osd_obj_id *obj, uint64_t offset, struct bio *bio)
> >> +{
> >> +	osd_req_read(or, obj, bio, offset);
> >> +	EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
> >> +		_LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size),
> >> +		_LLU(offset));
> >> +}
> >> +
> >> +#ifdef __KERNEL__
> > 
> > Hmm?
> > 
> 
> Yep, this file also compiles in user mode.

Even if you do, is it a good thing to add __KERNEL__ to fs/*.c?


> >> +static struct bio *_bio_map_pages(struct request_queue *req_q,
> >> +				  struct page **pages, unsigned page_count,
> >> +				  size_t length, gfp_t gfp_mask)
> >> +{
> >> +	struct bio *bio;
> >> +	int i;
> >> +
> >> +	bio = bio_alloc(gfp_mask, page_count);
> >> +	if (!bio) {
> >> +		EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	for (i = 0; i < page_count && length; i++) {
> >> +		size_t use_len = min(length, PAGE_SIZE);
> >> +
> >> +		if (use_len !=
> >> +		    bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) {
> >> +			EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p "
> >> +				  "use_len=%Zd page_count=%d length=%Zd\n",
> >> +				  req_q, pages[i], use_len, page_count, length);
> >> +			bio_put(bio);
> >> +			return NULL;
> >> +		}
> >> +
> >> +		length -= use_len;
> >> +	}
> >> +
> >> +	WARN_ON(length);
> >> +	return bio;
> >> +}
> > 
> > 1) exofs builds bios by hand.
> > 2) exofs passes bio to OSD SCSI ULD.
> > 
> > As a result, exofs and OSD SCSI ULD need to know the internal of bio,
> > that is, you reinvent the bio handling infrastructure, as pointed out
> > in another thread in scsi-ml.
> > 
> > _bio_map_pages is called where the VFS passes an array of a pointer to
> > a page frame.
> > 
> > Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD
> > can use the block layer helper functions to build a request out of
> > pages without knowing the internal of bio.
> > 
> > 
> 
> Because this code is actually wrong and temporary and will change soon.
> At the VFS write_pages level I do not get a linear array of page pointers but a
> linked list of pages. This will not fit any current model.

Then, why can't you pass a linked list of pages?


> Also looking
> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
> is the perfect collector for memory information in this situation.

You will add such features to exofs, handling multiple devices
internally?


> exofs is not the first and only file system who is using bios. Proof of
> the matter is that block exports a bio submit routine.

It seems that exofs just passes pages and the ULD sends a SCSI command
including these pages. I don't see why exofs needs to handle bios
directly.


> As I said on the other thread, I could live without it for now, for a short while,
> but I will regret it badly and it will hurt performance in the long run.
> 
> <snip>
> 
> Boaz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  9:00       ` FUJITA Tomonori
@ 2009-02-16  9:19         ` Boaz Harrosh
  2009-02-16  9:27           ` Jeff Garzik
  2009-02-16  9:38           ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
  0 siblings, 2 replies; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-16  9:19 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel,
	James.Bottomley, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> On Mon, 16 Feb 2009 10:49:39 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> FUJITA Tomonori wrote:
>>> On Mon,  9 Feb 2009 15:12:09 +0200
>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>
>>>> This patch includes osd infrastructure that will be used later by
>>>> the file system.
>>>>
>>>> Also the declarations of constants, on disk structures,
>>>> and prototypes.
>>>>
>>>> And the Kbuild+Kconfig files needed to build the exofs module.
>>>>
>>>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
>>>> ---
>>>>  fs/exofs/Kbuild   |   30 +++++++
>>>>  fs/exofs/Kconfig  |   13 +++
>>>>  fs/exofs/common.h |  181 +++++++++++++++++++++++++++++++++++++++++
>>>>  fs/exofs/exofs.h  |  139 ++++++++++++++++++++++++++++++++
>>>>  fs/exofs/osd.c    |  230 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  5 files changed, 593 insertions(+), 0 deletions(-)
>>>>  create mode 100644 fs/exofs/Kbuild
>>>>  create mode 100644 fs/exofs/Kconfig
>>>>  create mode 100644 fs/exofs/common.h
>>>>  create mode 100644 fs/exofs/exofs.h
>>>>  create mode 100644 fs/exofs/osd.c
>>>> +static void _osd_read(struct osd_request *or,
>>>> +	const struct osd_obj_id *obj, uint64_t offset, struct bio *bio)
>>>> +{
>>>> +	osd_req_read(or, obj, bio, offset);
>>>> +	EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n",
>>>> +		_LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size),
>>>> +		_LLU(offset));
>>>> +}
>>>> +
>>>> +#ifdef __KERNEL__
>>> Hmm?
>>>
>> Yep, this file also compiles in user mode.
> 
> Even if you do, it's a good thing to add __KERNEL__ to fs/*.c?
> 
> 
>>>> +static struct bio *_bio_map_pages(struct request_queue *req_q,
>>>> +				  struct page **pages, unsigned page_count,
>>>> +				  size_t length, gfp_t gfp_mask)
>>>> +{
>>>> +	struct bio *bio;
>>>> +	int i;
>>>> +
>>>> +	bio = bio_alloc(gfp_mask, page_count);
>>>> +	if (!bio) {
>>>> +		EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count);
>>>> +		return NULL;
>>>> +	}
>>>> +
>>>> +	for (i = 0; i < page_count && length; i++) {
>>>> +		size_t use_len = min(length, PAGE_SIZE);
>>>> +
>>>> +		if (use_len !=
>>>> +		    bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) {
>>>> +			EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p "
>>>> +				  "use_len=%Zd page_count=%d length=%Zd\n",
>>>> +				  req_q, pages[i], use_len, page_count, length);
>>>> +			bio_put(bio);
>>>> +			return NULL;
>>>> +		}
>>>> +
>>>> +		length -= use_len;
>>>> +	}
>>>> +
>>>> +	WARN_ON(length);
>>>> +	return bio;
>>>> +}
>>> 1) exofs builds bios by hand.
>>> 2) exofs passes bio to OSD SCSI ULD.
>>>
>>> As a result, exofs and OSD SCSI ULD need to know the internal of bio,
>>> that is, you reinvent the bio handling infrastructure, as pointed out
>>> in another thread in scsi-ml.
>>>
>>> _bio_map_pages is called where the VFS passes an array of a pointer to
>>> a page frame.
>>>
>>> Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD
>>> can use the block layer helper functions to build a request out of
>>> pages without knowing the internal of bio.
>>>
>>>
>> Because this code is actually wrong and temporary and will change soon.
>> At the VFS write_pages level I do not get a linear array of page pointers but a
>> linked list of pages. This will not fit any current model.
> 
> Then, why can't you pass a linked list of pages?
> 

What? How would I do that? I mean, how do I move from a linked list of
pages to a request?

> 
>> Also looking
>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
>> is the perfect collector for memory information in this situation.
> 
> You will add such features to exofs, handling multiple devices
> internally?
> 

Multiple objects on multiple devices, yes.

> 
>> exofs is not the first and only file system who is using bios. Proof of
>> the matter is that block exports a bio submit routine.
> 
> Seems that exofs just passes pages and the ULD sends a SCSI command
> including these pages. I don't see how exofs needs to handle bio
> directly.
> 

How do you propose to collect these pages and keep them, without allocating
an extra list, without pre-allocating a struct request, and without re-inventing
the bio structure?

> 
>> As I said on the other thread, I could live without it for now, for a short while,
>> but I will regret it badly and it will hurt performance in the long run.
>>
>> <snip>
>>
>> Boaz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  9:19         ` Boaz Harrosh
@ 2009-02-16  9:27           ` Jeff Garzik
  2009-02-16 10:19             ` Boaz Harrosh
  2009-02-16  9:38           ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Garzik @ 2009-02-16  9:27 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

Boaz Harrosh wrote:
> FUJITA Tomonori wrote:
>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> Also looking
>>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
>>> is the perfect collector for memory information in this situation.

>> You will add such features to exofs, handling multiple devices
>> internally?

> Multiple objects on Multiple devices, Yes.

That sort of feature does not belong in exofs, but in something separate.
Ideally we should be able to share "MD for OSD" with other OSD
filesystems, and with the "osdblk" device that I will produce once libosd
hits upstream.

	Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  9:19         ` Boaz Harrosh
  2009-02-16  9:27           ` Jeff Garzik
@ 2009-02-16  9:38           ` FUJITA Tomonori
  2009-02-16 10:29             ` Boaz Harrosh
  1 sibling, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2009-02-16  9:38 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

On Mon, 16 Feb 2009 11:19:21 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> >> Also looking
> >> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
> >> is the perfect collector for memory information in this situation.
> > 
> > You will add such features to exofs, handling multiple devices
> > internally?
> > 
> 
> Multiple objects on Multiple devices, Yes.

I thought that exofs is kinda example (reference) file system.

Nobody has seen your code. Let's discuss when we have the
code. Over-designing for what we've not seen is not a good idea.


> >> exofs is not the first and only file system who is using bios. Proof of
> >> the matter is that block exports a bio submit routine.
> > 
> > Seems that exofs just passes pages and the ULD sends a SCSI command
> > including these pages. I don't see how exofs needs to handle bio
> > directly.
> > 
> 
> How do you propose to collect these pages? and keep them without allocating
> an extra list? without pre-allocating a struct request? and without re-inventing
> the bio structure?

I don't think that allocating an extra list (or something) to keep
them hurts performance. We can talk about it when you have the real
performance results.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  9:27           ` Jeff Garzik
@ 2009-02-16 10:19             ` Boaz Harrosh
  2009-02-16 11:05               ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
  0 siblings, 1 reply; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-16 10:19 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

Jeff Garzik wrote:
> Boaz Harrosh wrote:
>> FUJITA Tomonori wrote:
>>> Boaz Harrosh <bharrosh@panasas.com> wrote:
>>>> Also looking
>>>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
>>>> is the perfect collector for memory information in this situation.
> 
>>> You will add such features to exofs, handling multiple devices
>>> internally?
> 
>> Multiple objects on Multiple devices, Yes.
> 
> That sort of feature does not belong in exofs, but somewhat separate. 
> Ideally we should be able to share "MD for OSD" with other OSD 
> filesystems, and the "osdblk" device that I will produce once libosd 
> hits upstream.
> 

No can do. exofs is meant to be a reference implementation of a pNFS-objects
file serving system. Have you read the spec of the pNFS-objects layout? They define
RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
for its clients as NFS, so it needs to have all the infrastructure and knowledge
of a client pNFS-objects layout driver.

But don't worry, the plan is that the layout driver and exofs will reuse the
same library code that does all that. There will not be a single line of
duplicated code.

In fact, one of the things I wanted to talk about at LSF is a generic, bio-based
(or something else) RAID engine that could be used by all RAID users in the
kernel: DM, MD, btrfs, exofs/pNFS-objects, TUX3, ZFS, and so on. And I don't
mean just the low-level memory-pointer XOR functions, but also the higher level
of memory splitters/collectors, abstract-device lists, and RAID description
structures. (Because RAIDs can be stacked, like 10, 50, and all kinds of crazy
things.)

> 	Jeff

Boaz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16  9:38           ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
@ 2009-02-16 10:29             ` Boaz Harrosh
  2009-02-17  0:20               ` FUJITA Tomonori
  0 siblings, 1 reply; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-16 10:29 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel,
	James.Bottomley, jens.axboe, linux-scsi

FUJITA Tomonori wrote:
> On Mon, 16 Feb 2009 11:19:21 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>>>> Also looking
>>>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio
>>>> is the perfect collector for memory information in this situation.
>>> You will add such features to exofs, handling multiple devices
>>> internally?
>>>
>> Multiple objects on Multiple devices, Yes.
> 
> I thought that exofs is kinda example (reference) file system.
> 
> Nobody has seen your code. Let's discuss when we have the
> code. Over-designing for what we've not seen is not a good idea.
> 

Thanks for the insults, and high credit ;)

Yes, it's a "kinda example (reference) file system" of a pNFS-objects
file system. What can I do, life is tough.

> 
>>>> exofs is not the first and only file system who is using bios. Proof of
>>>> the matter is that block exports a bio submit routine.
>>> Seems that exofs just passes pages and the ULD sends a SCSI command
>>> including these pages. I don't see how exofs needs to handle bio
>>> directly.
>>>
>> How do you propose to collect these pages? and keep them without allocating
>> an extra list? without pre-allocating a struct request? and without re-inventing
>> the bio structure?
> 
> I don't think that allocating an extra list (or something) to keep
> them hurts performance. We can talk about it when you have the real
> performance results.

So you are the one re-inventing the wheel here. I thought I was the one
doing that, only you called me names without ever showing me where.

But please answer just one question for me, and please don't write back
if you do not answer it:

Why are other filesystems allowed to use bios? Are they going to stop? Who is
going to remove that?

And as I said, I am going to remove it for now, so please be patient. You
never heard me refuse to do it, did you?

Boaz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
  2009-02-16 10:19             ` Boaz Harrosh
@ 2009-02-16 11:05               ` Jeff Garzik
  2009-02-16 12:45                 ` Boaz Harrosh
                                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jeff Garzik @ 2009-02-16 11:05 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

Boaz Harrosh wrote:
> No can do. exofs is meant to be a reference implementation of a pNFS-objects
> file serving system. Have you read the spec of the pNFS-objects layout? They define
> RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
> for its clients as NFS, so it needs to have all the infrastructure and knowledge
> of a client pNFS-objects layout driver.

Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
server, once v4.0 support is working well.


pNFS The Theory:   is wise and necessary:  permit clients to directly 
connect to data storage, rather than copying through the metadata 
server(s).  This is what every distributed filesystem is doing these 
days -- direct to data server for bulk data read/write.

pNFS The Specification:   is an utter piece of shit.  I can only presume 
some shady backroom deal in a smoke-filled room was the reason this saw 
the light of day.


In a sane world, NFS clients would speak... NFS.

In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
be added to the list?  :)

But anything beyond the NFS protocol for talking client <-> data servers 
is code bloat complexity madness for an NFS client that wishes to be 
compatible with "most of the NFS 4.1 world".

An ideal NFS client for pNFS should be asked to do these two things, and 
nothing more:

1) send metadata transactions to one or more metadata servers, using 
well-known NFS protocol

2) send data to one or more data servers, using well-known NFS protocol 
subset designed for storage (v4.1, section 13.6)

But no.

pNFS has forced a huge complexity on the NFS client, by permitting an 
unbounded number of network protocols.  A "layout plugin" layer is 
required.  SCSI and OSD support are REQUIRED for any reasonably 
compatible setup going forward.

But even more than the technical complexity, this is the first time in 
NFS history that NFS has required a protocol besides... NFS.

pNFS means that a useful, compatible NFS client must know all these 
storage protocols, in addition to NFS.

Furthermore, enabling proprietary layout types means that it is easy for 
a "compatible" v4.1 client to be denied parallel access to data 
available to other "compatible" v4.1 clients:

	Client A: Linux, fully open source

	Client B: Linux, with closed source module for
		  layout type SuperWhizBang storage

	Both Client A and Client B can claim to be NFS v4.1 and pNFS
	compatible,
	yet Client A must read data through the metadata
	server because it lacks the SuperWhizBang storage plugin.

pNFS means a never-ending arms race for the best storage layout, where 
NFS clients are inevitably compatible with a __random subset__ of total 
available layout types.  pNFS will be a continuing train wreck of 
fly-by-night storage companies, and their pet layout types & storage 
protocols.

It is a support nightmare, an admin nightmare, a firewall nightmare, a 
client implementor's nightmare, but a storage vendor's wet dream.

NFS was never beautiful, but at least until v4.0 it was well known and 
widely cross-compatible.  And only required one network protocol: NFS.

	Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
  2009-02-16 11:05               ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
@ 2009-02-16 12:45                 ` Boaz Harrosh
  2009-02-16 15:50                 ` James Bottomley
  2009-02-16 16:23                 ` Benny Halevy
  2 siblings, 0 replies; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-16 12:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

Jeff Garzik wrote:
> Boaz Harrosh wrote:
>> No can do. exofs is meant to be a reference implementation of a pNFS-objects
>> file serving system. Have you read the spec of the pNFS-objects layout? They define
>> RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
>> for its clients as NFS, so it needs to have all the infrastructure and knowledge
>> of a client pNFS-objects layout driver.
> 
> Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
> server, once v4.0 support is working well.
> 
> 
> pNFS The Theory:   is wise and necessary:  permit clients to directly 
> connect to data storage, rather than copying through the metadata 
> server(s).  This is what every distributed filesystem is doing these 
> days -- direct to data server for bulk data read/write.
> 
> pNFS The Specification:   is an utter piece of shit.  I can only presume 
> some shady backroom deal in a smoke-filled room was the reason this saw 
> the light of day.
> 
> 
> In a sane world, NFS clients would speak... NFS.
> 
> In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
> SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
> be added to the list?  :)
> 
> But anything beyond the NFS protocol for talking client <-> data servers 
> is code bloat complexity madness for an NFS client that wishes to be 
> compatible with "most of the NFS 4.1 world".
> 
> An ideal NFS client for pNFS should be asked to do these two things, and 
> nothing more:
> 
> 1) send metadata transactions to one or more metadata servers, using 
> well-known NFS protocol
> 
> 2) send data to one or more data servers, using well-known NFS protocol 
> subset designed for storage (v4.1, section 13.6)
> 
> But no.
> 
> pNFS has forced a huge complexity on the NFS client, by permitting an 
> unbounded number of network protocols.  A "layout plugin" layer is 
> required.  SCSI and OSD support are REQUIRED for any reasonably 
> compatible setup going forward.
> 
> But even more than the technical complexity, this is the first time in 
> NFS history that NFS has required a protocol besides... NFS.
> 
> pNFS means that a useful, compatible NFS client must know all these 
> storage protocols, in addition to NFS.
> 
> Furthermore, enabling proprietary layout types means that it is easy for 
> a "compatible" v4.1 client to be denied parallel access to data 
> available to other "compatible" v4.1 clients:
> 
> 	Client A: Linux, fully open source
> 
> 	Client B: Linux, with closed source module for
> 		  layout type SuperWhizBang storage
> 
> 	Both Client A and Client B can claim to be NFS v4.1 and pNFS
> 	compatible,
> 	yet Client A must read data through the metadata
> 	server because it lacks the SuperWhizBang storage plugin.
> 
> pNFS means a never-ending arms race for the best storage layout, where 
> NFS clients are inevitably compatible with a __random subset__ of total 
> available layout types.  pNFS will be a continuing train wreck of 
> fly-by-night storage companies, and their pet layout types & storage 
> protocols.
> 
> It is a support nightmare, an admin nightmare, a firewall nightmare, a 
> client implementor's nightmare, but a storage vendor's wet dream.
> 
> NFS was never beautiful, but at least until v4.0 it was well known and 
> widely cross-compatible.  And only required one network protocol: NFS.
> 
> 	Jeff
> 

I hear you. I'm paying close attention and noting down all of the above
hazard signals. However, please allow me my own outlook on the matter.
Perhaps one day soon (probably not at LSF, no travel budget approval yet)
we will meet and can talk about it more closely, and maybe I could
convince you of other aspects as well.

But pragmatically speaking, there is nothing I can do about all of the above.
My job is to show an OO reference implementation of pNFS-objects, a public
and signed protocol. I admit that pNFS-objects is the pet of Panasas, which is
my employer and the inventor of pNFS. I hope to remove it from your
SuperWhizBang category above; in fact, my job is to make sure it never belongs
there. I want an open-standard implementation from day one, so there will be
no questions. I understand that you are arguing about the do-or-die of the OSD
protocol under pNFS. For me that is just that much more of a challenge,
swimming upstream like a salmon. Everyone is doing "Files"; I get to do
"Objects". I hope that when the code finally arrives, soon, and gets used, its
merits, performance, security, and ease of use will win users over, big time.
(Let's compare notes: what is the minimal NFS DS implementation you can imagine?
 What would you say an OSD target is, not counting all the extras an OSD gives
 you, like no proprietary back-channel protocol between MDS and DS inside the cluster?)

But I do hear you, really, you have very valid points that must be taken into consideration.

Thanks
Boaz


* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
  2009-02-16 11:05               ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
  2009-02-16 12:45                 ` Boaz Harrosh
@ 2009-02-16 15:50                 ` James Bottomley
  2009-02-16 16:27                   ` Benny Halevy
  2009-02-16 16:23                 ` Benny Halevy
  2 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2009-02-16 15:50 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Boaz Harrosh, FUJITA Tomonori, avishay, akpm, linux-fsdevel,
	osd-dev, linux-kernel, jens.axboe, linux-scsi

On Mon, 2009-02-16 at 06:05 -0500, Jeff Garzik wrote:
> Boaz Harrosh wrote:
> > No can do. exofs is meant to be a reference implementation of a pNFS-objects
> > file serving system. Have you read the spec of pNFS-objects layout? they define
> > RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
> > for its clients as NFS, so it needs to have all the infrastructure and knowledge
> > of a client pNFS-objects layout driver.
> 
> Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
> server, once v4.0 support is working well.
> 
> 
> pNFS The Theory:   is wise and necessary:  permit clients to directly 
> connect to data storage, rather than copying through the metadata 
> server(s).  This is what every distributed filesystem is doing these 
> days -- direct to data server for bulk data read/write.
> 
> pNFS The Specification:   is an utter piece of shit.  I can only presume 
> some shady backroom deal in a smoke-filled room was the reason this saw 
> the light of day.
> 
> 
> In a sane world, NFS clients would speak... NFS.
> 
> In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
> SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
> be added to the list?  :)

Heh, it's one of the endearing faults of the storage industry that we
never learn from our mistakes ... particularly in storage protocols.

Actually, perhaps that's mischaracterised: we never actually learn
from our successes.  For example, most popular storage protocols solve
about 80% of the problem (NFSv2), get something bolted on to take that to
95% (locking), and rule for decades.  We end up obsessing about the 5%
and produce something that's like 10x the overhead to solve it.
Customers, for some unfathomable reason, hate complexity  (I suspect
principally because it in some measure equals expense) so the 100%
solution (which actually turns out to be a 95% one because the over
engineered complexity adds another 5% of different problems that take
years to find) tends to work its way into a niche and stay there ...
eventually fading.

If you're really lucky, the niche evolves into something sustainable.
For example iSCSI: blew its early promise, pulled a bunch of unnecessary
networking into the protocol and ended up too big to fit in disk
firmware (thus destroying the ability to have a simple network tap to
replace storage fabric).  It's been slowly fading until Virtualisation
came along.  Now all the other solutions to getting storage into virtual
machines are so horrible and arcane that iSCSI looks like a winner (if
the alternative is Frankenstein's monster, Grendel's mother suddenly
looks more attractive as a partner).

So, trust the customer ... if it's so horrible it shouldn't have seen
the light of day, the chances are that no-one will buy it anyway.

James




* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
  2009-02-16 11:05               ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
  2009-02-16 12:45                 ` Boaz Harrosh
  2009-02-16 15:50                 ` James Bottomley
@ 2009-02-16 16:23                 ` Benny Halevy
  2 siblings, 0 replies; 17+ messages in thread
From: Benny Halevy @ 2009-02-16 16:23 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Boaz Harrosh, FUJITA Tomonori, avishay, akpm, linux-fsdevel,
	osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi

On Feb. 16, 2009, 13:05 +0200, Jeff Garzik <jeff@garzik.org> wrote:
> Boaz Harrosh wrote:
>> No can do. exofs is meant to be a reference implementation of a pNFS-objects
>> file serving system. Have you read the spec of pNFS-objects layout? they define
>> RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
>> for its clients as NFS, so it needs to have all the infrastructure and knowledge
>> of a client pNFS-objects layout driver.
> 
> Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
> server, once v4.0 support is working well.
> 
> 
> pNFS The Theory:   is wise and necessary:  permit clients to directly 
> connect to data storage, rather than copying through the metadata 
> server(s).  This is what every distributed filesystem is doing these 
> days -- direct to data server for bulk data read/write.
> 
> pNFS The Specification:   is an utter piece of shit.  I can only presume 
> some shady backroom deal in a smoke-filled room was the reason this saw 
> the light of day.
> 
> 
> In a sane world, NFS clients would speak... NFS.
> 
> In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
> SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
> be added to the list?  :)
> 
> But anything beyond the NFS protocol for talking client <-> data servers 
> is code bloat complexity madness for an NFS client that wishes to be 
> compatible with "most of the NFS 4.1 world".
> 
> An ideal NFS client for pNFS should be asked to do these two things, and 
> nothing more:
> 
> 1) send metadata transactions to one or more metadata servers, using 
> well-known NFS protocol
> 
> 2) send data to one or more data servers, using well-known NFS protocol 
> subset designed for storage (v4.1, section 13.6)
> 
> But no.
> 
> pNFS has forced a huge complexity on the NFS client, by permitting an 
> unbounded number of network protocols.  A "layout plugin" layer is 
> required.  SCSI and OSD support are REQUIRED for any reasonably 
> compatible setup going forward.
> 
> But even more than the technical complexity, this is the first time in 
> NFS history that NFS has required a protocol besides... NFS.
> 
> pNFS means that a useful, compatible NFS client must know all these 
> storage protocols, in addition to NFS.
> 
> Furthermore, enabling proprietary layout types means that it is easy for 
> a "compatible" v4.1 client to be denied parallel access to data 
> available to other "compatible" v4.1 clients:
> 
> 	Client A: Linux, fully open source
> 
> 	Client B: Linux, with closed source module for
> 		  layout type SuperWhizBang storage
> 
> 	Both Client A and Client B can claim to be NFS v4.1 and pNFS
> 	compatible,
> 	yet Client A must read data through the metadata
> 	server because it lacks the SuperWhizBang storage plugin.

At least, for SuperWhizBang to comply with NFSv4.1 requirements it
needs to follow the IETF process as an open protocol and address
important semantic details (as listed by NFSv4.1) on top of the
wire data structures, like security considerations and client fencing.

Being open source or not is somewhat orthogonal to that, since
from the protocol specification one should be able to implement
a fully compliant client and/or server.

> 
> pNFS means a never-ending arms race for the best storage layout, where 
> NFS clients are inevitably compatible with a __random subset__ of total 
> available layout types.  pNFS will be a continuing train wreck of 
> fly-by-night storage companies, and their pet layout types & storage 
> protocols.
> 
> It is a support nightmare, an admin nightmare, a firewall nightmare, a 
> client implementor's nightmare, but a storage vendor's wet dream.

What you're basically saying is similar to rejecting the filesystem export
kABI since it would cause a never-ending arms race for the best file
system.  (a.k.a. "640KB of RAM and the FAT file system is all that a sane
user will ever need"... ;-)

I believe that competition is good.  Good for customers, for whom the pNFS
protocol was designed, and good for vendors as well.  The extra complexity
is there since one size does not fit all.  Different storage technologies
fit certain applications better than others, and exposing storage for
direct client access can convey these strengths all the way up to the host
running the application.

Besides, the pNFS specification was also driven by customers that today
use proprietary clustered filesystems over block or object-based
storage.  These customers want pNFS to be a standard that stirs up
competition, for several reasons:
- encourage open source implementations
- second source availability
- building best-of-breed systems by integrating
  parts from different vendors
- reuse of existing storage and networking infrastructure

Benny

> 
> NFS was never beautiful, but at least until v4.0 it was well known and 
> widely cross-compatible.  And only required one network protocol: NFS.
> 
> 	Jeff
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils)
  2009-02-16 15:50                 ` James Bottomley
@ 2009-02-16 16:27                   ` Benny Halevy
  0 siblings, 0 replies; 17+ messages in thread
From: Benny Halevy @ 2009-02-16 16:27 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeff Garzik, Boaz Harrosh, FUJITA Tomonori, avishay, akpm,
	linux-fsdevel, osd-dev, linux-kernel, jens.axboe, linux-scsi

On Feb. 16, 2009, 17:50 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Mon, 2009-02-16 at 06:05 -0500, Jeff Garzik wrote:
>> Boaz Harrosh wrote:
>>> No can do. exofs is meant to be a reference implementation of a pNFS-objects
>>> file serving system. Have you read the spec of pNFS-objects layout? they define
>>> RAID 0, 1, 5, and 6. In pNFS the MDS is supposed to be able to write the data
>>> for its clients as NFS, so it needs to have all the infrastructure and knowledge
>>> of a client pNFS-objects layout driver.
>> Yes, I have studied pNFS!  I plan to add v4.1 and pNFS support to my NFS 
>> server, once v4.0 support is working well.
>>
>>
>> pNFS The Theory:   is wise and necessary:  permit clients to directly 
>> connect to data storage, rather than copying through the metadata 
>> server(s).  This is what every distributed filesystem is doing these 
>> days -- direct to data server for bulk data read/write.
>>
>> pNFS The Specification:   is an utter piece of shit.  I can only presume 
>> some shady backroom deal in a smoke-filled room was the reason this saw 
>> the light of day.
>>
>>
>> In a sane world, NFS clients would speak... NFS.
>>
>> In the crazy world of pNFS, NFS clients are now forced to speak NFS, 
>> SCSI, RAID, and any number of proprietary layout types.  When will HTTP 
>> be added to the list?  :)
> 
> Heh, it's one of the endearing faults of the storage industry that we
> never learn from our mistakes ... particularly in storage protocols.
> 
> Actually, perhaps that's mischaracterised: we never actually learn
> from our successes.  For example, most popular storage protocols solve
> about 80% of the problem (NFSv2), get something bolted on to take that to
> 95% (locking), and rule for decades.  We end up obsessing about the 5%
> and produce something that's like 10x the overhead to solve it.
> Customers, for some unfathomable reason, hate complexity  (I suspect
> principally because it in some measure equals expense) so the 100%
> solution (which actually turns out to be a 95% one because the over
> engineered complexity adds another 5% of different problems that take
> years to find) tends to work its way into a niche and stay there ...
> eventually fading.
> 
> If you're really lucky, the niche evolves into something sustainable.
> For example iSCSI: blew its early promise, pulled a bunch of unnecessary
> networking into the protocol and ended up too big to fit in disk
> firmware (thus destroying the ability to have a simple network tap to
> replace storage fabric).  It's been slowly fading until Virtualisation
> came along.  Now all the other solutions to getting storage into virtual
> machines are so horrible and arcane that iSCSI looks like a winner (if
> the alternative is Frankenstein's monster, Grendel's mother suddenly
> looks more attractive as a partner).
> 
> So, trust the customer ... if it's so horrible it shouldn't have seen
> the light of day, the chances are that no-one will buy it anyway.

I completely agree with this sentence.
And no customer, whatsoever, that I've talked to about pNFS had
any reservations about supporting multiple layout types.  On the
contrary...

Benny

> 
> James
> 
> 


* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-16 10:29             ` Boaz Harrosh
@ 2009-02-17  0:20               ` FUJITA Tomonori
  2009-02-17  8:10                 ` [osd-dev] " Boaz Harrosh
  0 siblings, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2009-02-17  0:20 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev,
	linux-kernel, James.Bottomley, jens.axboe, linux-scsi

On Mon, 16 Feb 2009 12:29:06 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> >>>> exofs is not the first and only file system who is using bios. Proof of
> >>>> the matter is that block exports a bio submit routine.
> >>> Seems that exofs just passes pages and the ULD sends a SCSI command
> >>> including these pages. I don't see how exofs needs to handle bio
> >>> directly.
> >>>
> >> How do you propose to collect these pages? and keep them without allocating
> >> an extra list? without pre-allocating a struct request? and without re-inventing
> >> the bio structure?
> > 
> > I don't think that allocating an extra list (or something) to keep
> > them hurts performance. We can talk about it when you have the real
> > performance results.
> 
> So you are the one that starts to invent the wheel here. I thought I was
> the one that does that, only you only called me by names, because you never showed
> me where.
> 
> But please only answer one question for me: Please don't write back if you do not
> answer this question:
> 
> Why do other filesystems allow to use bios? are they going to stop? Who is going
> to remove that?

Can you stop the argument, "exofs is similar to the existing
traditional file systems hence it should be treated equally"?  It's
simply untrue.  Does anyone except the Panasas people make the same
argument?

We are talking about the design of exofs, which also affects the
design of the OSD ULD (including the library) living in the SCSI
mid-layer.  It's something completely different from the existing
traditional file systems that work nicely on top of the block
layer.

As discussed in another thread, now OSD ULD reinvents the bio handling
infrastructure because of the design of exofs. But OSD ULD can use the
block layer helper functions to avoid the re-invention if we change
the exofs design to take pages instead of bios. For now, it works
perfectly for exofs. In the future, we might change it but we don't
know until you submit patches (or the performance results) that show
taking pages doesn't work for exofs nicely.
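[Editorial aside: the "take pages" interface Tomonori describes — the filesystem hands the OSD library a plain page array plus a byte length, and the library builds the bio-like structure internally (compare the _bio_map_pages() helper quoted earlier in the thread) — can be sketched as a user-space toy model. All toy_* names and the fixed 4096-byte page size are illustrative stand-ins, not kernel APIs:]

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* Minimal bio-like container that the library would build internally. */
struct toy_vec { void *page; unsigned int offset, len; };
struct toy_bio { struct toy_vec *vecs; unsigned int vcnt; unsigned long size; };

/* Library-side helper in the "take pages" design: the caller supplies
 * a page array and a total byte length; the library maps it into its
 * own fragment list (the extra structure the thread argues about). */
static struct toy_bio *toy_map_pages(void **pages, unsigned int page_count,
				     size_t length)
{
	struct toy_bio *b = calloc(1, sizeof(*b));

	b->vecs = calloc(page_count, sizeof(*b->vecs));
	for (unsigned int i = 0; i < page_count && length; i++) {
		unsigned int chunk = length < TOY_PAGE_SIZE ?
				     (unsigned int)length : TOY_PAGE_SIZE;

		b->vecs[b->vcnt] = (struct toy_vec){ pages[i], 0, chunk };
		b->vcnt++;
		b->size += chunk;
		length -= chunk;
	}
	return b;
}
```

[The point of contention is only where this mapping loop lives: in the library (as above) or in the filesystem, which then hands over a finished bio.]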

I guess that we need to evolve the block layer to support OSD stuff
more cleanly than we've discussed recently.  But again, we can do that
when we definitely need to.


* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-17  0:20               ` FUJITA Tomonori
@ 2009-02-17  8:10                 ` Boaz Harrosh
  2009-02-27  8:09                   ` FUJITA Tomonori
  0 siblings, 1 reply; 17+ messages in thread
From: Boaz Harrosh @ 2009-02-17  8:10 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, linux-scsi, jeff, linux-kernel, avishay, osd-dev,
	jens.axboe, linux-fsdevel, akpm

FUJITA Tomonori wrote:
> 
> Can you stop the argument, "exofs is similar to the existing
> traditional file systems hence it should be treated equally"?  It's
> simply untrue.  Does anyone except the Panasas people make the same
> argument?
> 

No I will not; it is true.  exofs is just a regular old filesystem,
nothing different.

> We are talking about the design of exofs, which also affects the
> design of OSD ULD (including the library) living in SCSI
> mid-layer. 

The ULD belongs to SCSI, but the library could sit elsewhere; how
is that an argument?

> It's something completely different from existing
> traditional file systems that work nicely on the top of the block
> layer.
> 

Nicely is a matter of opinion.  I think that building a bio in stages
in the background, then at the point of execution building a request
from the bio and executing it, is a nice design that makes sure nothing
is duplicated or copied, and wheels are not re-invented.  The current
kernel design is nice; why change it?

> As discussed in another thread, now OSD ULD reinvents the bio handling
> infrastructure because of the design of exofs. 

Not true; show me where.  You keep saying that.  Where in the code is it
reinvented?

> But OSD ULD can use the
> block layer helper functions to avoid the re-invention if we change
> the exofs design to take pages instead of bios.

That, above, is exactly a re-invention of the block layer.  What was all that
scatterlist-pointer and scsi_execute_async() machinery that you worked
so hard to get rid of?  It was a list of pages+offsets+lengths, that's what
it was.  Now you ask me to do the same: keep an external structure of
pages+offsets+lengths, pass it three layers down, and at some point in
time force new block-layer interfaces, which do not fully exist today,
to prepare a request for submission.

No!  The decision was: keep preparation of the request local and submit it
in place, without intermediate structures.  From memory to request
in one stage.

That's what I want.  The bio lets me do that already; lots of file
systems have been doing it that way for ages.

All I'm asking for is one small blk_make_request() that is, for BLOCK_PC
requests, the parallel of what generic_make_request() is for BLOCK_FS requests.

If someone wanted a filesystem over tape drives, over st.c or osst.c,
he would design it similarly: collect bios in the background, point and shoot.
The blk_map_xxx functions were made to satisfy user-mode interfaces; for
filesystems it has been bios for ages.
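[Editorial aside: the flow Boaz argues for — build a bio incrementally as pages are collected, then turn it into a request in one step with no intermediate page list — can be sketched as a user-space toy model. The toy_* names are hypothetical stand-ins for struct bio, struct request, and the proposed blk_make_request(); this is not the kernel API:]

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct bio: a growable list of (page, offset, length)
 * fragments, built up in stages as the filesystem collects dirty pages. */
struct toy_frag { void *page; unsigned int offset, len; };
struct toy_bio {
	struct toy_frag *frags;
	unsigned int count, cap;
	unsigned long total;
};

/* Stage 1 (repeated): append one page fragment to the bio in flight. */
static void toy_bio_add(struct toy_bio *b, void *page,
			unsigned int off, unsigned int len)
{
	if (b->count == b->cap) {
		b->cap = b->cap ? b->cap * 2 : 4;
		b->frags = realloc(b->frags, b->cap * sizeof(*b->frags));
	}
	b->frags[b->count] = (struct toy_frag){ page, off, len };
	b->count++;
	b->total += len;
}

/* Toy stand-in for a request wrapping the finished bio. */
struct toy_request { struct toy_bio *bio; unsigned long data_len; };

/* Stage 2 (once, at submission): memory-to-request in one step,
 * with no intermediate pages+offsets+lengths structure. */
static struct toy_request *toy_make_request(struct toy_bio *bio)
{
	struct toy_request *rq = malloc(sizeof(*rq));

	rq->bio = bio;
	rq->data_len = bio->total;
	return rq;
}
```

[The design point the toy illustrates: all page bookkeeping lives in the bio the filesystem already builds, so the submission step needs no second container.]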

> For now, it works

> perfectly for exofs. In the future, we might change it but we don't
> know until you submit patches (or the performance results) that show
> taking pages doesn't work for exofs nicely.
> 

I don't know about you, but me, I don't have to do some work to know
it's bad.  I can imagine beforehand that it is bad.  I usually run
hundreds of simulations in my head, discarding any bad options until I
find the one way I like.  Usually the shortest, easiest way is also the
best (since I'm very lazy).
Like with bidi, for example: why not just take two requests instead of
one?  But I was sent to do all that gigantic work so everyone could see
that.

> I guess that we need to evolve the block layer to support OSD stuff
> cleanly than we've discussed recently. But again we can do when we
> definitely need to do.

It's not that big and long an evolution.  It is a simple:

struct request *blk_make_request(struct bio*, gfp_t gfp);

And we are done.  Simpler than that?  I don't know.

Boaz


* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-17  8:10                 ` [osd-dev] " Boaz Harrosh
@ 2009-02-27  8:09                   ` FUJITA Tomonori
  2009-03-01 10:43                     ` Boaz Harrosh
  0 siblings, 1 reply; 17+ messages in thread
From: FUJITA Tomonori @ 2009-02-27  8:09 UTC (permalink / raw)
  To: bharrosh
  Cc: fujita.tomonori, James.Bottomley, linux-scsi, jeff, linux-kernel,
	avishay, osd-dev, jens.axboe, linux-fsdevel, akpm

On Tue, 17 Feb 2009 10:10:15 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> FUJITA Tomonori wrote:
> > 
> > Can you stop the argument, "exofs is similar to the existing
> > traditional file systems hence it should be treated equally"?  It's
> > simply untrue.  Does anyone except the Panasas people make the same
> > argument?
> > 
> 
> No I will not, it is true. exofs is just a regular old filesystem
> nothing different.

After reading this, I gave up discussing this issue with you, but I
am still waiting for the fixes you promised:

http://marc.info/?l=linux-scsi&m=123445759718253&w=2


Thanks,


* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils
  2009-02-27  8:09                   ` FUJITA Tomonori
@ 2009-03-01 10:43                     ` Boaz Harrosh
  0 siblings, 0 replies; 17+ messages in thread
From: Boaz Harrosh @ 2009-03-01 10:43 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: James.Bottomley, linux-scsi, jeff, linux-kernel, avishay, osd-dev,
	jens.axboe, linux-fsdevel, akpm

FUJITA Tomonori wrote:
> On Tue, 17 Feb 2009 10:10:15 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
>> FUJITA Tomonori wrote:
>>> Can you stop the argument, "exofs is similar to the existing
>>> traditional file systems hence it should be treated equally"?  It's
>>> simply untrue.  Does anyone except the Panasas people make the same
>>> argument?
>>>
>> No I will not, it is true. exofs is just a regular old filesystem
>> nothing different.
> 
> After reading this, I gave up discussing this issue with you but I
> still wait for your fixes that you promised:
> 
> http://marc.info/?l=linux-scsi&m=123445759718253&w=2
> 
> 
> Thanks,
> --

They are on the way; I have not forgotten.

Boaz



end of thread, other threads:[~2009-03-01 10:44 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <49902A9E.3070002@panasas.com>
     [not found] ` <1234185129-31858-1-git-send-email-bharrosh@panasas.com>
2009-02-16  4:18   ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
2009-02-16  8:49     ` Boaz Harrosh
2009-02-16  9:00       ` FUJITA Tomonori
2009-02-16  9:19         ` Boaz Harrosh
2009-02-16  9:27           ` Jeff Garzik
2009-02-16 10:19             ` Boaz Harrosh
2009-02-16 11:05               ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
2009-02-16 12:45                 ` Boaz Harrosh
2009-02-16 15:50                 ` James Bottomley
2009-02-16 16:27                   ` Benny Halevy
2009-02-16 16:23                 ` Benny Halevy
2009-02-16  9:38           ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
2009-02-16 10:29             ` Boaz Harrosh
2009-02-17  0:20               ` FUJITA Tomonori
2009-02-17  8:10                 ` [osd-dev] " Boaz Harrosh
2009-02-27  8:09                   ` FUJITA Tomonori
2009-03-01 10:43                     ` Boaz Harrosh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox