* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils [not found] ` <1234185129-31858-1-git-send-email-bharrosh@panasas.com> @ 2009-02-16 4:18 ` FUJITA Tomonori 2009-02-16 8:49 ` Boaz Harrosh 0 siblings, 1 reply; 17+ messages in thread From: FUJITA Tomonori @ 2009-02-16 4:18 UTC (permalink / raw) To: bharrosh Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi On Mon, 9 Feb 2009 15:12:09 +0200 Boaz Harrosh <bharrosh@panasas.com> wrote: > This patch includes osd infrastructure that will be used later by > the file system. > > Also the declarations of constants, on disk structures, > and prototypes. > > And the Kbuild+Kconfig files needed to build the exofs module. > > Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > --- > fs/exofs/Kbuild | 30 +++++++ > fs/exofs/Kconfig | 13 +++ > fs/exofs/common.h | 181 +++++++++++++++++++++++++++++++++++++++++ > fs/exofs/exofs.h | 139 ++++++++++++++++++++++++++++++++ > fs/exofs/osd.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 593 insertions(+), 0 deletions(-) > create mode 100644 fs/exofs/Kbuild > create mode 100644 fs/exofs/Kconfig > create mode 100644 fs/exofs/common.h > create mode 100644 fs/exofs/exofs.h > create mode 100644 fs/exofs/osd.c > +static void _osd_read(struct osd_request *or, > + const struct osd_obj_id *obj, uint64_t offset, struct bio *bio) > +{ > + osd_req_read(or, obj, bio, offset); > + EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n", > + _LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size), > + _LLU(offset)); > +} > + > +#ifdef __KERNEL__ Hmm? 
> +static struct bio *_bio_map_pages(struct request_queue *req_q, > + struct page **pages, unsigned page_count, > + size_t length, gfp_t gfp_mask) > +{ > + struct bio *bio; > + int i; > + > + bio = bio_alloc(gfp_mask, page_count); > + if (!bio) { > + EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count); > + return NULL; > + } > + > + for (i = 0; i < page_count && length; i++) { > + size_t use_len = min(length, PAGE_SIZE); > + > + if (use_len != > + bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) { > + EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p " > + "use_len=%Zd page_count=%d length=%Zd\n", > + req_q, pages[i], use_len, page_count, length); > + bio_put(bio); > + return NULL; > + } > + > + length -= use_len; > + } > + > + WARN_ON(length); > + return bio; > +} 1) exofs builds bios by hand. 2) exofs passes bio to OSD SCSI ULD. As a result, exofs and OSD SCSI ULD need to know the internal of bio, that is, you reinvent the bio handling infrastructure, as pointed out in another thread in scsi-ml. _bio_map_pages is called where the VFS passes an array of a pointer to a page frame. Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD can use the block layer helper functions to build a request out of pages without knowing the internal of bio. 
> +int osd_req_read_pages(struct osd_request *or, > + const struct osd_obj_id *obj, u64 offset, u64 length, > + struct page **pages, int page_count) > +{ > + struct request_queue *req_q = or->osd_dev->scsi_device->request_queue; > + struct bio *bio = _bio_map_pages(req_q, pages, page_count, length, > + GFP_KERNEL); > + > + if (!bio) > + return -ENOMEM; > + > + _osd_read(or, obj, offset, bio); > + return 0; > +} > +#endif /* def __KERNEL__ */ > + > +int osd_req_read_kern(struct osd_request *or, > + const struct osd_obj_id *obj, u64 offset, void* buff, u64 len) > +{ > + struct request_queue *req_q = or->osd_dev->scsi_device->request_queue; > + struct bio *bio = bio_map_kern(req_q, buff, len, GFP_KERNEL); > + > + if (!bio) > + return -ENOMEM; > + > + _osd_read(or, obj, offset, bio); > + return 0; > +} > + > +static void _osd_write(struct osd_request *or, > + const struct osd_obj_id *obj, uint64_t offset, struct bio *bio) > +{ > + osd_req_write(or, obj, bio, offset); > + EXOFS_DBGMSG("osd_req_write(p=%llX, ob=%llX, l=%llu, of=%llu)\n", > + _LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size), > + _LLU(offset)); > +} > + > +#ifdef __KERNEL__ > +int osd_req_write_pages(struct osd_request *or, > + const struct osd_obj_id *obj, u64 offset, u64 length, > + struct page **pages, int page_count) > +{ > + struct request_queue *req_q = or->osd_dev->scsi_device->request_queue; > + struct bio *bio = _bio_map_pages(req_q, pages, page_count, length, > + GFP_KERNEL); > + > + if (!bio) > + return -ENOMEM; > + > + _osd_write(or, obj, offset, bio); > + return 0; > +} > +#endif /* def __KERNEL__ */ > + > +int osd_req_write_kern(struct osd_request *or, > + const struct osd_obj_id *obj, u64 offset, void* buff, u64 len) > +{ > + struct request_queue *req_q = or->osd_dev->scsi_device->request_queue; > + struct bio *bio = bio_map_kern(req_q, buff, len, GFP_KERNEL); > + > + if (!bio) > + return -ENOMEM; > + > + _osd_write(or, obj, offset, bio); > + return 0; > +} > -- > 1.6.0.1 > > -- > To 
unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/
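[Editor's note: the length-splitting loop in the quoted _bio_map_pages() is the crux of the review comment above. Its per-page arithmetic can be modelled in plain userspace C — this is a sketch only; `split_into_pages` and `MODEL_PAGE_SIZE` are illustrative names, not kernel API.]

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* stand-in for the kernel's PAGE_SIZE */

/*
 * Model of the length-splitting loop in _bio_map_pages(): each page
 * carries at most PAGE_SIZE bytes, so only the final page may be
 * partial.  Returns the number of pages consumed, or -1 if page_count
 * is too small to cover length -- the leftover-length case the kernel
 * code flags with WARN_ON(length).
 */
static int split_into_pages(size_t length, unsigned page_count,
                            size_t *seg_len /* out: page_count entries */)
{
    unsigned i;

    for (i = 0; i < page_count && length; i++) {
        size_t use_len = length < MODEL_PAGE_SIZE ? length
                                                  : MODEL_PAGE_SIZE;

        seg_len[i] = use_len;
        length -= use_len;
    }
    return length ? -1 : (int)i;
}
```

For example, 10000 bytes over 3 pages yields segments of 4096, 4096 and 1808 bytes; a short final segment is normal, while leftover length means the caller passed too few pages.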
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 4:18 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori @ 2009-02-16 8:49 ` Boaz Harrosh 2009-02-16 9:00 ` FUJITA Tomonori 0 siblings, 1 reply; 17+ messages in thread From: Boaz Harrosh @ 2009-02-16 8:49 UTC (permalink / raw) To: FUJITA Tomonori Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi FUJITA Tomonori wrote: > On Mon, 9 Feb 2009 15:12:09 +0200 > Boaz Harrosh <bharrosh@panasas.com> wrote: > >> This patch includes osd infrastructure that will be used later by >> the file system. >> >> Also the declarations of constants, on disk structures, >> and prototypes. >> >> And the Kbuild+Kconfig files needed to build the exofs module. >> >> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> >> --- >> fs/exofs/Kbuild | 30 +++++++ >> fs/exofs/Kconfig | 13 +++ >> fs/exofs/common.h | 181 +++++++++++++++++++++++++++++++++++++++++ >> fs/exofs/exofs.h | 139 ++++++++++++++++++++++++++++++++ >> fs/exofs/osd.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 5 files changed, 593 insertions(+), 0 deletions(-) >> create mode 100644 fs/exofs/Kbuild >> create mode 100644 fs/exofs/Kconfig >> create mode 100644 fs/exofs/common.h >> create mode 100644 fs/exofs/exofs.h >> create mode 100644 fs/exofs/osd.c > >> +static void _osd_read(struct osd_request *or, >> + const struct osd_obj_id *obj, uint64_t offset, struct bio *bio) >> +{ >> + osd_req_read(or, obj, bio, offset); >> + EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n", >> + _LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size), >> + _LLU(offset)); >> +} >> + >> +#ifdef __KERNEL__ > > Hmm? > Yep, this file also complies in user mode. 
>> +static struct bio *_bio_map_pages(struct request_queue *req_q, >> + struct page **pages, unsigned page_count, >> + size_t length, gfp_t gfp_mask) >> +{ >> + struct bio *bio; >> + int i; >> + >> + bio = bio_alloc(gfp_mask, page_count); >> + if (!bio) { >> + EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count); >> + return NULL; >> + } >> + >> + for (i = 0; i < page_count && length; i++) { >> + size_t use_len = min(length, PAGE_SIZE); >> + >> + if (use_len != >> + bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) { >> + EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p " >> + "use_len=%Zd page_count=%d length=%Zd\n", >> + req_q, pages[i], use_len, page_count, length); >> + bio_put(bio); >> + return NULL; >> + } >> + >> + length -= use_len; >> + } >> + >> + WARN_ON(length); >> + return bio; >> +} > > 1) exofs builds bios by hand. > 2) exofs passes bio to OSD SCSI ULD. > > As a result, exofs and OSD SCSI ULD need to know the internal of bio, > that is, you reinvent the bio handling infrastructure, as pointed out > in another thread in scsi-ml. > > _bio_map_pages is called where the VFS passes an array of a pointer to > a page frame. > > Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD > can use the block layer helper functions to build a request out of > pages without knowing the internal of bio. > > Because actually this code is wrong and temporary and will change soon. At vfs write_pages I do not get a linear array of page pointers but a link-list of pages. This will not fit any current model. Also looking ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio is the perfect collector for memory information in this situation. exofs is not the first and only file system who is using bios. Proof of the matter is that block exports a bio submit routine. As I said on the other thread, I could live without it for now, for a short while, but I will regret it badly and it will hurt performance in the long run. 
<snip>

Boaz
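[Editor's note: the mismatch Boaz describes — writepages hands the filesystem a linked list of pages while an array-based helper wants a flat array — can be sketched in userspace C. All names here are illustrative (the real list is chained through struct page); the extra flattening pass and its allocation are exactly the overhead under debate.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page on the writepages list. */
struct model_page {
    int id;                  /* identifies the page in this sketch */
    struct model_page *next; /* the VFS hands pages chained like this */
};

/*
 * Flatten a linked list of pages into an array, as would be needed
 * before an array-taking helper could consume them.  Returns the
 * number of entries written, or -1 if the array is too small.
 */
static int pages_list_to_array(struct model_page *head,
                               struct model_page **arr, size_t cap)
{
    size_t n = 0;

    for (; head; head = head->next) {
        if (n == cap)
            return -1;
        arr[n++] = head;
    }
    return (int)n;
}
```

The point of contention is whether this pass (and sizing the array up front) is acceptable, or whether the pages should be collected directly into a bio as they arrive.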
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 8:49 ` Boaz Harrosh @ 2009-02-16 9:00 ` FUJITA Tomonori 2009-02-16 9:19 ` Boaz Harrosh 0 siblings, 1 reply; 17+ messages in thread From: FUJITA Tomonori @ 2009-02-16 9:00 UTC (permalink / raw) To: bharrosh Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi On Mon, 16 Feb 2009 10:49:39 +0200 Boaz Harrosh <bharrosh@panasas.com> wrote: > FUJITA Tomonori wrote: > > On Mon, 9 Feb 2009 15:12:09 +0200 > > Boaz Harrosh <bharrosh@panasas.com> wrote: > > > >> This patch includes osd infrastructure that will be used later by > >> the file system. > >> > >> Also the declarations of constants, on disk structures, > >> and prototypes. > >> > >> And the Kbuild+Kconfig files needed to build the exofs module. > >> > >> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> > >> --- > >> fs/exofs/Kbuild | 30 +++++++ > >> fs/exofs/Kconfig | 13 +++ > >> fs/exofs/common.h | 181 +++++++++++++++++++++++++++++++++++++++++ > >> fs/exofs/exofs.h | 139 ++++++++++++++++++++++++++++++++ > >> fs/exofs/osd.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> 5 files changed, 593 insertions(+), 0 deletions(-) > >> create mode 100644 fs/exofs/Kbuild > >> create mode 100644 fs/exofs/Kconfig > >> create mode 100644 fs/exofs/common.h > >> create mode 100644 fs/exofs/exofs.h > >> create mode 100644 fs/exofs/osd.c > > > >> +static void _osd_read(struct osd_request *or, > >> + const struct osd_obj_id *obj, uint64_t offset, struct bio *bio) > >> +{ > >> + osd_req_read(or, obj, bio, offset); > >> + EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n", > >> + _LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size), > >> + _LLU(offset)); > >> +} > >> + > >> +#ifdef __KERNEL__ > > > > Hmm? > > > > Yep, this file also complies in user mode. Even if you do, it's a good thing to add __KERNEL__ to fs/*.c? 
> >> +static struct bio *_bio_map_pages(struct request_queue *req_q, > >> + struct page **pages, unsigned page_count, > >> + size_t length, gfp_t gfp_mask) > >> +{ > >> + struct bio *bio; > >> + int i; > >> + > >> + bio = bio_alloc(gfp_mask, page_count); > >> + if (!bio) { > >> + EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count); > >> + return NULL; > >> + } > >> + > >> + for (i = 0; i < page_count && length; i++) { > >> + size_t use_len = min(length, PAGE_SIZE); > >> + > >> + if (use_len != > >> + bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) { > >> + EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p " > >> + "use_len=%Zd page_count=%d length=%Zd\n", > >> + req_q, pages[i], use_len, page_count, length); > >> + bio_put(bio); > >> + return NULL; > >> + } > >> + > >> + length -= use_len; > >> + } > >> + > >> + WARN_ON(length); > >> + return bio; > >> +} > > > > 1) exofs builds bios by hand. > > 2) exofs passes bio to OSD SCSI ULD. > > > > As a result, exofs and OSD SCSI ULD need to know the internal of bio, > > that is, you reinvent the bio handling infrastructure, as pointed out > > in another thread in scsi-ml. > > > > _bio_map_pages is called where the VFS passes an array of a pointer to > > a page frame. > > > > Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD > > can use the block layer helper functions to build a request out of > > pages without knowing the internal of bio. > > > > > > Because actually this code is wrong and temporary and will change soon. > At vfs write_pages I do not get a linear array of page pointers but a > link-list of pages. This will not fit any current model. Then, why can't you pass a link-list of pages? > Also looking > ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio > is the perfect collector for memory information in this situation. You will add such features to exofs, handling multiple devices internally? 
> exofs is not the first and only file system who is using bios. Proof of > the matter is that block exports a bio submit routine. Seems that exofs just passes pages and the ULD sends a SCSI command including these pages. I don't see how exofs needs to handle bio directly. > As I said on the other thread, I could live without it for now, for a short while, > but I will regret it badly and it will hurt performance in the long run. > > <snip> > > Boaz
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 9:00 ` FUJITA Tomonori @ 2009-02-16 9:19 ` Boaz Harrosh 2009-02-16 9:27 ` Jeff Garzik 2009-02-16 9:38 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori 0 siblings, 2 replies; 17+ messages in thread From: Boaz Harrosh @ 2009-02-16 9:19 UTC (permalink / raw) To: FUJITA Tomonori Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi FUJITA Tomonori wrote: > On Mon, 16 Feb 2009 10:49:39 +0200 > Boaz Harrosh <bharrosh@panasas.com> wrote: > >> FUJITA Tomonori wrote: >>> On Mon, 9 Feb 2009 15:12:09 +0200 >>> Boaz Harrosh <bharrosh@panasas.com> wrote: >>> >>>> This patch includes osd infrastructure that will be used later by >>>> the file system. >>>> >>>> Also the declarations of constants, on disk structures, >>>> and prototypes. >>>> >>>> And the Kbuild+Kconfig files needed to build the exofs module. >>>> >>>> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> >>>> --- >>>> fs/exofs/Kbuild | 30 +++++++ >>>> fs/exofs/Kconfig | 13 +++ >>>> fs/exofs/common.h | 181 +++++++++++++++++++++++++++++++++++++++++ >>>> fs/exofs/exofs.h | 139 ++++++++++++++++++++++++++++++++ >>>> fs/exofs/osd.c | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> 5 files changed, 593 insertions(+), 0 deletions(-) >>>> create mode 100644 fs/exofs/Kbuild >>>> create mode 100644 fs/exofs/Kconfig >>>> create mode 100644 fs/exofs/common.h >>>> create mode 100644 fs/exofs/exofs.h >>>> create mode 100644 fs/exofs/osd.c >>>> +static void _osd_read(struct osd_request *or, >>>> + const struct osd_obj_id *obj, uint64_t offset, struct bio *bio) >>>> +{ >>>> + osd_req_read(or, obj, bio, offset); >>>> + EXOFS_DBGMSG("osd_req_read(p=%llX, ob=%llX, l=%llu, of=%llu)\n", >>>> + _LLU(obj->partition), _LLU(obj->id), _LLU(bio->bi_size), >>>> + _LLU(offset)); >>>> +} >>>> + >>>> +#ifdef __KERNEL__ >>> Hmm? >>> >> Yep, this file also complies in user mode. 
> > Even if you do, it's a good thing to add __KERNEL__ to fs/*.c? > > >>>> +static struct bio *_bio_map_pages(struct request_queue *req_q, >>>> + struct page **pages, unsigned page_count, >>>> + size_t length, gfp_t gfp_mask) >>>> +{ >>>> + struct bio *bio; >>>> + int i; >>>> + >>>> + bio = bio_alloc(gfp_mask, page_count); >>>> + if (!bio) { >>>> + EXOFS_DBGMSG("Failed to bio_alloc page_count=%d\n", page_count); >>>> + return NULL; >>>> + } >>>> + >>>> + for (i = 0; i < page_count && length; i++) { >>>> + size_t use_len = min(length, PAGE_SIZE); >>>> + >>>> + if (use_len != >>>> + bio_add_pc_page(req_q, bio, pages[i], use_len, 0)) { >>>> + EXOFS_ERR("Failed bio_add_pc_page req_q=%p pages[i]=%p " >>>> + "use_len=%Zd page_count=%d length=%Zd\n", >>>> + req_q, pages[i], use_len, page_count, length); >>>> + bio_put(bio); >>>> + return NULL; >>>> + } >>>> + >>>> + length -= use_len; >>>> + } >>>> + >>>> + WARN_ON(length); >>>> + return bio; >>>> +} >>> 1) exofs builds bios by hand. >>> 2) exofs passes bio to OSD SCSI ULD. >>> >>> As a result, exofs and OSD SCSI ULD need to know the internal of bio, >>> that is, you reinvent the bio handling infrastructure, as pointed out >>> in another thread in scsi-ml. >>> >>> _bio_map_pages is called where the VFS passes an array of a pointer to >>> a page frame. >>> >>> Why can't you simply pass the array to OSD SCSI ULD? Then OSD SCSI ULD >>> can use the block layer helper functions to build a request out of >>> pages without knowing the internal of bio. >>> >>> >> Because actually this code is wrong and temporary and will change soon. >> At vfs write_pages I do not get a linear array of page pointers but a >> link-list of pages. This will not fit any current model. > > Then, why can't you pass a link-list of pages? > What? How to do that? I mean how to move from link-list of pages to request? > >> Also looking >> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. 
bio >> is the perfect collector for memory information in this situation. > > You will add such features to exofs, handling multiple devices > internally? > Multiple objects on Multiple devices, Yes. > >> exofs is not the first and only file system who is using bios. Proof of >> the matter is that block exports a bio submit routine. > > Seems that exofs just passes pages and the ULD sends a SCSI command > including these pages. I don't see how exofs needs to handle bio > directly. > How do you propose to collect these pages? And keep them without allocating an extra list? Without pre-allocating a struct request? And without re-inventing the bio structure? > >> As I said on the other thread, I could live without it for now, for a short while, >> but I will regret it badly and it will hurt performance in the long run. >> >> <snip> >> >> Boaz
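[Editor's note: Boaz's argument is that a bio already serves as a preallocated collector of (page, offset, length) segments with a running byte count. A toy userspace model of that role — field names echo bio's bi_vcnt/bi_size, but nothing here is kernel API:]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 16  /* fixed capacity, like a bio allocated for N vecs */

struct seg { void *page; unsigned off, len; };

struct collector {
    struct seg vec[MAX_SEGS];
    unsigned cnt;        /* number of segments, like bio->bi_vcnt */
    unsigned long size;  /* total bytes collected, like bio->bi_size */
};

/*
 * Append one segment.  Returns the length accepted: the full len on
 * success, 0 when the collector is full -- mirroring how
 * bio_add_pc_page() signals failure by returning less than len.
 */
static int collector_add(struct collector *c, void *page,
                         unsigned off, unsigned len)
{
    if (c->cnt == MAX_SEGS)
        return 0;
    c->vec[c->cnt].page = page;
    c->vec[c->cnt].off = off;
    c->vec[c->cnt].len = len;
    c->cnt++;
    c->size += len;
    return (int)len;
}
```

Because the segment vector and byte count live in one preallocated object, no extra page list or early struct request is needed — which is the property Boaz is pointing at.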
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 9:19 ` Boaz Harrosh @ 2009-02-16 9:27 ` Jeff Garzik 2009-02-16 10:19 ` Boaz Harrosh 2009-02-16 9:38 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori 1 sibling, 1 reply; 17+ messages in thread From: Jeff Garzik @ 2009-02-16 9:27 UTC (permalink / raw) To: Boaz Harrosh Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi Boaz Harrosh wrote: > FUJITA Tomonori wrote: >> Boaz Harrosh <bharrosh@panasas.com> wrote: >>> Also looking >>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio >>> is the perfect collector for memory information in this situation. >> You will add such features to exofs, handling multiple devices >> internally? > Multiple objects on Multiple devices, Yes. That sort of feature does not belong in exofs, but somewhat separate. Ideally we should be able to share "MD for OSD" with other OSD filesystems, and the "osdblk" device that I will produce once libosd hits upstream. Jeff
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 9:27 ` Jeff Garzik @ 2009-02-16 10:19 ` Boaz Harrosh 2009-02-16 11:05 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik 0 siblings, 1 reply; 17+ messages in thread From: Boaz Harrosh @ 2009-02-16 10:19 UTC (permalink / raw) To: Jeff Garzik Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi Jeff Garzik wrote: > Boaz Harrosh wrote: >> FUJITA Tomonori wrote: >>> Boaz Harrosh <bharrosh@panasas.com> wrote: >>>> Also looking >>>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio >>>> is the perfect collector for memory information in this situation. > >>> You will add such features to exofs, handling multiple devices >>> internally? > >> Multiple objects on Multiple devices, Yes. > > That sort of feature does not belong in exofs, but somewhat separate. > Ideally we should be able to share "MD for OSD" with other OSD > filesystems, and the "osdblk" device that I will produce once libosd > hits upstream. > No can do. exofs is meant to be a reference implementation of a pNFS-objects file serving system. Have you read the spec of pNFS-objects layout? they define RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data for its clients as NFS, so it needs to have all the infra structure and knowledge of an Client pNFS-object layout drive. But don't worry, the plans are that layout-drive and exofs will reuse all the same library code that does all that. There will not be a single line of duplicate code. In fact one of the things I wanted to talk about in LSF is a generic, BIO based (or some thing else), RAID engine, That could be used by all RAIDers in Kernel, DM, MD, btrfs, exofs pNFS-objects, TUX3, ZFS and so on. 
And I don't mean just the low-level memory-pointers XOR functions, but the higher-level memory splitters/collectors, abstract-device lists, and RAID description structures. (Because RAIDs can be stacked like 10, 50, and all kinds of crazy things.) > Jeff

Boaz
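[Editor's note: for the low-level end of the shared RAID engine sketched above, the RAID-5 parity rule is plain byte-wise XOR — parity = D0 ^ D1 ^ ... ^ Dn-1 — so any single lost buffer is rebuilt by XORing the parity with the survivors. A minimal illustration, not the kernel's actual xor_blocks() interface:]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise XOR of ndata buffers of len bytes each into parity.
 * Because XOR is its own inverse, the same routine both computes
 * parity and reconstructs a missing data buffer from parity plus
 * the surviving buffers.
 */
static void xor_parity(unsigned char *parity, unsigned char **data,
                       size_t ndata, size_t len)
{
    size_t d, i;

    memset(parity, 0, len);
    for (d = 0; d < ndata; d++)
        for (i = 0; i < len; i++)
            parity[i] ^= data[d][i];
}
```

The higher-level pieces Boaz lists — splitters/collectors, device lists, stacked layouts — would sit above a primitive like this and decide which buffers feed it.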
* pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) 2009-02-16 10:19 ` Boaz Harrosh @ 2009-02-16 11:05 ` Jeff Garzik 2009-02-16 12:45 ` Boaz Harrosh ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Jeff Garzik @ 2009-02-16 11:05 UTC (permalink / raw) To: Boaz Harrosh Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi Boaz Harrosh wrote: > No can do. exofs is meant to be a reference implementation of a pNFS-objects > file serving system. Have you read the spec of pNFS-objects layout? they define > RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data > for its clients as NFS, so it needs to have all the infra structure and knowledge > of an Client pNFS-object layout drive. Yes, I have studied pNFS! I plan to add v4.1 and pNFS support to my NFS server, once v4.0 support is working well. pNFS The Theory: is wise and necessary: permit clients to directly connect to data storage, rather than copying through the metadata server(s). This is what every distributed filesystem is doing these days -- direct to data server for bulk data read/write. pNFS The Specification: is an utter piece of shit. I can only presume some shady backroom deal in a smoke-filled room was the reason this saw the light of day. In a sane world, NFS clients would speak... NFS. In the crazy world of pNFS, NFS clients are now forced to speak NFS, SCSI, RAID, and any number of proprietary layout types. When will HTTP be added to the list? :) But anything beyond the NFS protocol for talking client <-> data servers is code bloat complexity madness for an NFS client that wishes to be compatible with "most of the NFS 4.1 world". 
An ideal NFS client for pNFS should be asked to do these two things, and nothing more: 1) send metadata transactions to one or more metadata servers, using well-known NFS protocol 2) send data to one or more data servers, using well-known NFS protocol subset designed for storage (v4.1, section 13.6) But no. pNFS has forced a huge complexity on the NFS client, by permitting an unbounded number of network protocols. A "layout plugin" layer is required. SCSI and OSD support are REQUIRED for any reasonably compatible setup going forward. But even more than the technical complexity, this is the first time in NFS history that NFS has required a protocol besides... NFS. pNFS means that a useful, compatible NFS client must know all these storage protocols, in addition to NFS. Furthermore, enabling proprietary layout types means that it is easy for a "compatible" v4.1 client to be denied parallel access to data available to other "compatible" v4.1 clients: Client A: Linux, fully open source Client B: Linux, with closed source module for layout type SuperWhizBang storage Both Client A and Client B can claim to be NFS v4.1 and pNFS compatible, yet Client A must read data through the metadata server because it lacks the SuperWhizBang storage plugin. pNFS means a never-ending arms race for the best storage layout, where NFS clients are inevitably compatible with a __random subset__ of total available layout types. pNFS will be a continuing train wreck of fly-by-night storage companies, and their pet layout types & storage protocols. It is a support nightmare, an admin nightmare, a firewall nightmare, a client implementor's nightmare, but a storage vendor's wet dream. NFS was never beautiful, but at least until v4.0 it was well known and widely cross-compatible. And only required one network protocol: NFS. Jeff
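[Editor's note: the "layout plugin layer" objected to above amounts to a dispatch on the layout type the server returns, with a fallback through the MDS when no plugin matches. A toy model — the layout type numbers 1/2/3 for files/OSD/block follow the NFSv4.1 drafts that became RFC 5661; the function names and return values are purely illustrative:]

```c
#include <assert.h>
#include <stddef.h>

/* Layout type numbers from the NFSv4.1 specification. */
enum { LAYOUT_NFS_FILES = 1, LAYOUT_OSD2 = 2, LAYOUT_BLOCK = 3 };

typedef int (*layout_read_fn)(void);

/* Direct-to-data-server read over NFS: the parallel fast path. */
static int files_read(void) { return 100; }

/* No plugin for this layout type: fall back through the MDS,
 * losing the parallelism pNFS was supposed to provide. */
static int mds_fallback_read(void) { return -1; }

/*
 * The "plugin layer" in miniature: map a server-announced layout
 * type to an I/O routine.  This sketch's client carries only the
 * files plugin, so OSD, block, and any vendor type hit the fallback.
 */
static layout_read_fn lookup_layout(int type)
{
    switch (type) {
    case LAYOUT_NFS_FILES:
        return files_read;
    default:
        return mds_fallback_read;
    }
}
```

This is exactly the Client A / Client B scenario in the rant: two "compatible" clients differ only in which entries their dispatch table carries.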
* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) 2009-02-16 11:05 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik @ 2009-02-16 12:45 ` Boaz Harrosh 2009-02-16 15:50 ` James Bottomley 2009-02-16 16:23 ` Benny Halevy 2 siblings, 0 replies; 17+ messages in thread From: Boaz Harrosh @ 2009-02-16 12:45 UTC (permalink / raw) To: Jeff Garzik Cc: FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi Jeff Garzik wrote: > Boaz Harrosh wrote: >> No can do. exofs is meant to be a reference implementation of a pNFS-objects >> file serving system. Have you read the spec of pNFS-objects layout? they define >> RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data >> for its clients as NFS, so it needs to have all the infra structure and knowledge >> of an Client pNFS-object layout drive. > > Yes, I have studied pNFS! I plan to add v4.1 and pNFS support to my NFS > server, once v4.0 support is working well. > > > pNFS The Theory: is wise and necessary: permit clients to directly > connect to data storage, rather than copying through the metadata > server(s). This is what every distributed filesystem is doing these > days -- direct to data server for bulk data read/write. > > pNFS The Specification: is an utter piece of shit. I can only presume > some shady backroom deal in a smoke-filled room was the reason this saw > the light of day. > > > In a sane world, NFS clients would speak... NFS. > > In the crazy world of pNFS, NFS clients are now forced to speak NFS, > SCSI, RAID, and any number of proprietary layout types. When will HTTP > be added to the list? :) > > But anything beyond the NFS protocol for talking client <-> data servers > is code bloat complexity madness for an NFS client that wishes to be > compatible with "most of the NFS 4.1 world". 
> > An ideal NFS client for pNFS should be asked to do these two things, and > nothing more: > > 1) send metadata transactions to one or more metadata servers, using > well-known NFS protocol > > 2) send data to one or more data servers, using well-known NFS protocol > subset designed for storage (v4.1, section 13.6) > > But no. > > pNFS has forced a huge complexity on the NFS client, by permitting an > unbounded number of network protocols. A "layout plugin" layer is > required. SCSI and OSD support are REQUIRED for any reasonably > compatible setup going forward. > > But even more than the technical complexity, this is the first time in > NFS history that NFS has required a protocol besides... NFS. > > pNFS means that a useful. compatible NFS client must know all these > storage protocols, in addition to NFS. > > Furthermore, enabling proprietary layout types means that it is easy for > a "compatible" v4.1 client to be denied parallel access to data > available to other "compatible" v4.1 clients: > > Client A: Linux, fully open source > > Client B: Linux, with closed source module for > layout type SuperWhizBang storage > > Both Client A and Client B can claim to be NFS v4.1 and pNFS > compatible, > yet Client A must read data through the metadata > server because it lacks the SuperWhizBang storage plugin. > > pNFS means a never-ending arms race for the best storage layout, where > NFS clients are inevitably compatibly with a __random subset__ of total > available layout types. pNFS will be a continuing train wreck of > fly-by-night storage companies, and their pet layout types & storage > protocols. > > It is a support nightmare, an admin nightmare, a firewall nightmare, a > client implementor's nightmare, but a storage vendor's wet dream. > > NFS was never beautiful, but at least until v4.0 it was well known and > widely cross-compatible. And only required one network protocol: NFS. > > Jeff > I hear you. 
I'm paying close attention and noting down all of the above hazardous signals. However, please allow me my own outlook on the matter. Perhaps one day soon (probably not at LSF, no travel budget approval yet) we will meet and we can talk about it more closely, and maybe I could convince you of other aspects as well. But pragmatically speaking, there is nothing I can do about all of the above. My job is to show an OO reference implementation of pNFS-objects, a public and signed protocol. I admit that pNFS-objects is Panasas's pet; Panasas is my employer and the inventor of pNFS. I hope to remove it from your SuperWhizBang category above, please. Actually, my job is to make sure it will not be in that category. I want an open-standard implementation from day one, so there will be no questions. I understand that you argue about the do-or-die of the OSD protocol under pNFS. For me it is just that much more of a challenge, swimming upstream like a salmon. Everyone is doing "Files"; I get to do "Objects". I hope that when the code finally arrives, soon, and gets used, its merits, performance, security, and ease-of-use will win users over, big time. (Let's compare notes: what is the minimal NFS DS implementation you can imagine? What would you say an OSD target is? Not counting all the extras an OSD gives you, like no proprietary back-channel protocol between MDS and DS inside the cluster.) But I do hear you, really; you have very valid points that must be taken into consideration. Thanks Boaz
* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) 2009-02-16 11:05 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik 2009-02-16 12:45 ` Boaz Harrosh @ 2009-02-16 15:50 ` James Bottomley 2009-02-16 16:27 ` Benny Halevy 2009-02-16 16:23 ` Benny Halevy 2 siblings, 1 reply; 17+ messages in thread From: James Bottomley @ 2009-02-16 15:50 UTC (permalink / raw) To: Jeff Garzik Cc: Boaz Harrosh, FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, jens.axboe, linux-scsi On Mon, 2009-02-16 at 06:05 -0500, Jeff Garzik wrote: > Boaz Harrosh wrote: > > No can do. exofs is meant to be a reference implementation of a pNFS-objects > > file serving system. Have you read the spec of pNFS-objects layout? they define > > RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data > > for its clients as NFS, so it needs to have all the infra structure and knowledge > > of an Client pNFS-object layout drive. > > Yes, I have studied pNFS! I plan to add v4.1 and pNFS support to my NFS > server, once v4.0 support is working well. > > > pNFS The Theory: is wise and necessary: permit clients to directly > connect to data storage, rather than copying through the metadata > server(s). This is what every distributed filesystem is doing these > days -- direct to data server for bulk data read/write. > > pNFS The Specification: is an utter piece of shit. I can only presume > some shady backroom deal in a smoke-filled room was the reason this saw > the light of day. > > > In a sane world, NFS clients would speak... NFS. > > In the crazy world of pNFS, NFS clients are now forced to speak NFS, > SCSI, RAID, and any number of proprietary layout types. When will HTTP > be added to the list? :) Heh, it's one of the endearing faults of the storage industry that we never learn from our mistakes ... particularly in storage protocols. 
Actually, perhaps that's a mischaracterisation: we never actually learn from our successes. For example, most popular storage protocols solve about 80% of the problem (NFSv2), get something bolted on to take that to 95% (locking), and rule for decades. We end up obsessing about the remaining 5% and produce something with 10x the overhead to solve it. Customers, for some unfathomable reason, hate complexity (I suspect principally because it in some measure equals expense), so the 100% solution (which actually turns out to be a 95% one, because the over-engineered complexity adds another 5% of different problems that take years to find) tends to work its way into a niche and stay there ... eventually fading. If you're really lucky, the niche evolves into something sustainable. For example iSCSI: blew its early promise, pulled a bunch of unnecessary networking into the protocol, and ended up too big to fit in disk firmware (thus destroying the ability to have a simple network tap to replace storage fabric). It had been slowly fading until Virtualisation came along. Now all the other solutions to getting storage into virtual machines are so horrible and arcane that iSCSI looks like a winner (if the alternative is Frankenstein's monster, Grendel's mother suddenly looks more attractive as a partner). So, trust the customer ... if it's so horrible it shouldn't have seen the light of day, the chances are that no-one will buy it anyway. James
* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) 2009-02-16 15:50 ` James Bottomley @ 2009-02-16 16:27 ` Benny Halevy 0 siblings, 0 replies; 17+ messages in thread From: Benny Halevy @ 2009-02-16 16:27 UTC (permalink / raw) To: James Bottomley Cc: Jeff Garzik, Boaz Harrosh, FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, jens.axboe, linux-scsi On Feb. 16, 2009, 17:50 +0200, James Bottomley <James.Bottomley@HansenPartnership.com> wrote: > On Mon, 2009-02-16 at 06:05 -0500, Jeff Garzik wrote: >> Boaz Harrosh wrote: >>> No can do. exofs is meant to be a reference implementation of a pNFS-objects >>> file serving system. Have you read the spec of pNFS-objects layout? they define >>> RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data >>> for its clients as NFS, so it needs to have all the infra structure and knowledge >>> of an Client pNFS-object layout drive. >> Yes, I have studied pNFS! I plan to add v4.1 and pNFS support to my NFS >> server, once v4.0 support is working well. >> >> >> pNFS The Theory: is wise and necessary: permit clients to directly >> connect to data storage, rather than copying through the metadata >> server(s). This is what every distributed filesystem is doing these >> days -- direct to data server for bulk data read/write. >> >> pNFS The Specification: is an utter piece of shit. I can only presume >> some shady backroom deal in a smoke-filled room was the reason this saw >> the light of day. >> >> >> In a sane world, NFS clients would speak... NFS. >> >> In the crazy world of pNFS, NFS clients are now forced to speak NFS, >> SCSI, RAID, and any number of proprietary layout types. When will HTTP >> be added to the list? :) > > Heh, it's one of the endearing faults of the storage industry that we > never learn from our mistakes ... particularly in storage protocols. > > Actually, perhaps that's a mischaracterised: we never actually learn > from our successes. 
For example, most popular storage protocols solve > about 80% of the problem (NFSv2) get something bolted on to take that to > 95% (locking) and rule for decades. We end up obsessing about the 5% > and produce something that's like 10x the overhead to solve it. > Customers, for some unfathomable reason, hate complexity (I suspect > principally because it in some measure equals expense) so the 100% > solution (which actually turns out to be a 95% one because the over > engineered complexity adds another 5% of different problems that take > years to find) tends to work its way into a niche and stay there ... > eventually fading. > > If you're really lucky, the niche evolves into something sustainable. > For example iSCSI: blew its early promise, pulled a bunch of unnecessary > networking into the protocol and ended up too big to fit in disk > firmware (thus destroying the ability to have a simple network tap to > replace storage fabric). It's been slowly fading until Virtualisation > came along. Now all the other solutions to getting storage into virtual > machines are so horrible and arcane that iSCSI looks like a winner (if > the alternative is Frankenstein's monster, Grendel's mother suddenly > looks more attractive as a partner). > > So, trust the customer ... if it's so horrible it shouldn't have seen > the light of day, the chances are that no-one will buy it anyway. I completely agree with this sentence. And no customer, whatsoever, that I've talked to about pNFS had any reservations about supporting multiple layout types. On the contrary... Benny > > James > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) 2009-02-16 11:05 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik 2009-02-16 12:45 ` Boaz Harrosh 2009-02-16 15:50 ` James Bottomley @ 2009-02-16 16:23 ` Benny Halevy 2 siblings, 0 replies; 17+ messages in thread From: Benny Halevy @ 2009-02-16 16:23 UTC (permalink / raw) To: Jeff Garzik Cc: Boaz Harrosh, FUJITA Tomonori, avishay, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi On Feb. 16, 2009, 13:05 +0200, Jeff Garzik <jeff@garzik.org> wrote: > Boaz Harrosh wrote: >> No can do. exofs is meant to be a reference implementation of a pNFS-objects >> file serving system. Have you read the spec of pNFS-objects layout? they define >> RAID 0, 1, 5, and 6. In pNFS the MDS is suppose to be able to write the data >> for its clients as NFS, so it needs to have all the infra structure and knowledge >> of an Client pNFS-object layout drive. > > Yes, I have studied pNFS! I plan to add v4.1 and pNFS support to my NFS > server, once v4.0 support is working well. > > > pNFS The Theory: is wise and necessary: permit clients to directly > connect to data storage, rather than copying through the metadata > server(s). This is what every distributed filesystem is doing these > days -- direct to data server for bulk data read/write. > > pNFS The Specification: is an utter piece of shit. I can only presume > some shady backroom deal in a smoke-filled room was the reason this saw > the light of day. > > > In a sane world, NFS clients would speak... NFS. > > In the crazy world of pNFS, NFS clients are now forced to speak NFS, > SCSI, RAID, and any number of proprietary layout types. When will HTTP > be added to the list? :) > > But anything beyond the NFS protocol for talking client <-> data servers > is code bloat complexity madness for an NFS client that wishes to be > compatible with "most of the NFS 4.1 world". 
> > An ideal NFS client for pNFS should be asked to do these two things, and > nothing more: > > 1) send metadata transactions to one or more metadata servers, using > well-known NFS protocol > > 2) send data to one or more data servers, using > well-known NFS protocol subset designed for storage (v4.1, section 13.6) > > But no. > > pNFS has forced a huge complexity on the NFS client, by permitting an > unbounded number of network protocols. A "layout plugin" layer is > required. SCSI and OSD support are REQUIRED for any reasonably > compatible setup going forward. > > But even more than the technical complexity, this is the first time in > NFS history that NFS has required a protocol besides... NFS. > > pNFS means that a useful, compatible NFS client must know all these > storage protocols, in addition to NFS. > > Furthermore, enabling proprietary layout types means that it is easy for > a "compatible" v4.1 client to be denied parallel access to data > available to other "compatible" v4.1 clients: > > Client A: Linux, fully open source > > Client B: Linux, with closed source module for > layout type SuperWhizBang storage > > Both Client A and Client B can claim to be NFS v4.1 and pNFS > compatible, > yet Client A must read data through the metadata > server because it lacks the SuperWhizBang storage plugin. At least, for SuperWhizBang to comply with NFSv4.1 requirements it needs to follow the IETF process as an open protocol and address important semantic details (as listed by nfsv4.1) on top of the wire data structures, like security considerations and client fencing. Being open source or not is somewhat orthogonal to that, since from the protocol specification one should be able to implement a fully compliant client and/or server. > > pNFS means a never-ending arms race for the best storage layout, where > NFS clients are inevitably compatible with a __random subset__ of total > available layout types.
pNFS will be a continuing train wreck of > fly-by-night storage companies, and their pet layout types & storage > protocols. > > It is a support nightmare, an admin nightmare, a firewall nightmare, a > client implementor's nightmare, but a storage vendor's wet dream. What you're basically saying is similar to rejecting the filesystem export kABI, since that would cause a never-ending arms race for the best file system. (a.k.a. "640KB of RAM and a FAT file system is likely all that a sane user will ever need"... ;-) I believe that competition is good. Good for the customers, for whom the pNFS protocol was designed, and good for vendors as well. The extra complexity is there because one size does not fit all. Different storage technologies fit certain applications better than others, and exposing storage for direct client access can convey these strengths all the way up to the host running the application. Besides, the pNFS specification was also driven by customers that use proprietary clustered filesystems today, over block or object-based storage. These customers want pNFS to be a standard to stir up competition, for several reasons: - encourage open source implementations - second source availability - building best-of-breed systems by integrating parts from different vendors - reuse of existing storage and networking infrastructure Benny > > NFS was never beautiful, but at least until v4.0 it was well known and > widely cross-compatible. And only required one network protocol: NFS. > > Jeff
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 9:19 ` Boaz Harrosh 2009-02-16 9:27 ` Jeff Garzik @ 2009-02-16 9:38 ` FUJITA Tomonori 2009-02-16 10:29 ` Boaz Harrosh 1 sibling, 1 reply; 17+ messages in thread From: FUJITA Tomonori @ 2009-02-16 9:38 UTC (permalink / raw) To: bharrosh Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi On Mon, 16 Feb 2009 11:19:21 +0200 Boaz Harrosh <bharrosh@panasas.com> wrote: > >> Also looking > >> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio > >> is the perfect collector for memory information in this situation. > > > > You will add such features to exofs, handling multiple devices > > internally? > > > > Multiple objects on Multiple devices, Yes. I thought that exofs is kinda example (reference) file system. Nobody has seen your code. Let's discuss when we have the code. Over-designing for what we've not seen is not a good idea. > >> exofs is not the first and only file system who is using bios. Proof of > >> the matter is that block exports a bio submit routine. > > > > Seems that exofs just passes pages and the ULD sends a SCSI command > > including these pages. I don't see how exofs needs to handle bio > > directly. > > > > How do you propose to collect these pages? and keep them without allocating > an extra list? without pre-allocating a struct request? and without re-inventing > the bio structure? I don't think that allocating an extra list (or something) to keep them hurts performance. We can talk about it when you have the real performance results. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 9:38 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori @ 2009-02-16 10:29 ` Boaz Harrosh 2009-02-17 0:20 ` FUJITA Tomonori 0 siblings, 1 reply; 17+ messages in thread From: Boaz Harrosh @ 2009-02-16 10:29 UTC (permalink / raw) To: FUJITA Tomonori Cc: avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi FUJITA Tomonori wrote: > On Mon, 16 Feb 2009 11:19:21 +0200 > Boaz Harrosh <bharrosh@panasas.com> wrote: > >>>> Also looking >>>> ahead I will have RAID 0, 1, 5, and 6 on objects of different devices. bio >>>> is the perfect collector for memory information in this situation. >>> You will add such features to exofs, handling multiple devices >>> internally? >>> >> Multiple objects on Multiple devices, Yes. > > I thought that exofs is kinda example (reference) file system. > > Nobody has seen your code. Let's discuss when we have the > code. Over-designing for what we've not seen is not a good idea. > Thanks for the insults, and high credit ;) Yes it's "kinda example (reference) file system" of a pNFS-objects file system. What can I do life is tough. > >>>> exofs is not the first and only file system who is using bios. Proof of >>>> the matter is that block exports a bio submit routine. >>> Seems that exofs just passes pages and the ULD sends a SCSI command >>> including these pages. I don't see how exofs needs to handle bio >>> directly. >>> >> How do you propose to collect these pages? and keep them without allocating >> an extra list? without pre-allocating a struct request? and without re-inventing >> the bio structure? > > I don't think that allocating an extra list (or something) to keep > them hurts performance. We can talk about it when you have the real > performance results. So you are the one that starts to invent the wheel here. 
I thought I was the one that does that, only you called me names, because you never showed me where. But please answer just one question for me; please don't write back if you do not answer this question: Why are other filesystems allowed to use bios? Are they going to stop? Who is going to remove that? And as I said, I am going to remove it for now, please be patient. You have never heard from me that I refuse to do it, did you? Boaz
* Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-16 10:29 ` Boaz Harrosh @ 2009-02-17 0:20 ` FUJITA Tomonori 2009-02-17 8:10 ` [osd-dev] " Boaz Harrosh 0 siblings, 1 reply; 17+ messages in thread From: FUJITA Tomonori @ 2009-02-17 0:20 UTC (permalink / raw) To: bharrosh Cc: fujita.tomonori, avishay, jeff, akpm, linux-fsdevel, osd-dev, linux-kernel, James.Bottomley, jens.axboe, linux-scsi On Mon, 16 Feb 2009 12:29:06 +0200 Boaz Harrosh <bharrosh@panasas.com> wrote: > >>>> exofs is not the first and only file system who is using bios. Proof of > >>>> the matter is that block exports a bio submit routine. > >>> Seems that exofs just passes pages and the ULD sends a SCSI command > >>> including these pages. I don't see how exofs needs to handle bio > >>> directly. > >>> > >> How do you propose to collect these pages? and keep them without allocating > >> an extra list? without pre-allocating a struct request? and without re-inventing > >> the bio structure? > > > > I don't think that allocating an extra list (or something) to keep > > them hurts performance. We can talk about it when you have the real > > performance results. > > So you are the one that starts to invent the wheel here. I thought I was > the one that does that, only you only called me by names, because you never showed > me where. > > But please only answer one question for me: Please don't write back if you do not > answer this question: > > Why do other filesystems allow to use bios? are they going to stop? Who is going > to remove that? Can you stop with the argument "exofs is similar to the existing traditional file systems, hence it should be treated equally"? It's simply untrue. Does anyone except the Panasas people insist on the same argument? We are talking about the design of exofs, which also affects the design of the OSD ULD (including the library) living in the SCSI mid-layer.
It's something completely different from the existing traditional file systems that work nicely on top of the block layer. As discussed in another thread, the OSD ULD now reinvents the bio handling infrastructure because of the design of exofs. But the OSD ULD could use the block layer helper functions to avoid that re-invention if we change the exofs design to take pages instead of bios. For now, that works perfectly for exofs. In the future we might change it, but we won't know until you submit patches (or performance results) showing that taking pages doesn't work nicely for exofs. I guess that we need to evolve the block layer to support OSD stuff more cleanly than we've discussed recently. But again, we can do that when we definitely need to.
* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-17 0:20 ` FUJITA Tomonori @ 2009-02-17 8:10 ` Boaz Harrosh 2009-02-27 8:09 ` FUJITA Tomonori 0 siblings, 1 reply; 17+ messages in thread From: Boaz Harrosh @ 2009-02-17 8:10 UTC (permalink / raw) To: FUJITA Tomonori Cc: James.Bottomley, linux-scsi, jeff, linux-kernel, avishay, osd-dev, jens.axboe, linux-fsdevel, akpm FUJITA Tomonori wrote: > > Can you stop the argument, "exofs is similar to the existing > traditional file systems hence it should be treated equally". It's > simply untrue. Does anyone except for panasas people insist the same > argument? > No I will not, it is true. exofs is just a regular old filesystem, nothing different. > We are talking about the design of exofs, which also affects the > design of OSD ULD (including the library) living in SCSI > mid-layer. The ULD belongs to scsi but the library could sit elsewhere; how is that an argument? > It's something completely different from existing > traditional file systems that work nicely on the top of the block > layer. > Nicely is a matter of opinion. I think that building a bio in stages in the background, then at the point of execution building a request from the bio and executing it, is a nice design that makes sure nothing is duplicated, copied, and wheels are not re-invented. The current kernel design is nice; why change it? > As discussed in another thread, now OSD ULD reinvents the bio handling > infrastructure because of the design of exofs. Not true, show me where? You keep saying that. Where in the code is it reinvented? > But OSD ULD can use the > block layer helper functions to avoid the re-invention if we change > the exofs design to take pages instead of bios. That, above, is exactly a re-invention of the block layer. What was all that scatterlist-pointers and scsi_execute_async() cleanup that you worked so hard to get rid of? It was a list of pages+offsets+lengths, that's what it was.
Now you ask me to do the same: keep an external structure of pages+offsets+lengths, pass them three layers down, and at some point in time force new block-layer interfaces, which do not fully exist today, to prepare a request for submission. No! The decision was: keep preparation of the request local and submit it in place, without intermediate structures. From-memory-to-request in one stage. That's what I want. The bio lets me do that already; lots of file systems already do. All I'm asking for is one small blk_make_request() that is a parallel of generic_make_request() of the BLOCK_FS side, for BLOCK_PC requests. If someone wanted a filesystem over tape drives, over st.c or osst.c, he would design it similarly: collect bios in the background, point and shoot. The blk_map_xxx functions were made to satisfy user-mode interfaces; for filesystems it has been bio for ages. > For now, it works > perfectly for exofs. In the future, we might change it but we don't > know until you submit patches (or the performance results) that show > taking pages doesn't work for exofs nicely. > I don't know about you, but me, I don't have to do the work to know it's bad. I can imagine beforehand that it is bad. I usually run hundreds of simulations in my head, discarding any bad options until I find the one way I like. Usually the short, easiest way is also the best. (Since I'm very lazy.) Like with bidi, for example: why not just take two requests instead of one? But I was sent to do all that gigantic work so everyone would see that. > I guess that we need to evolve the block layer to support OSD stuff > cleanly than we've discussed recently. But again we can do when we > definitely need to do. It's not that big and long an evolution. It is simply: struct request *blk_make_request(struct bio*, gfp_t gfp); And we are done. Simpler than that? I don't know. Boaz
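The helper Boaz asks for could look roughly like the following sketch. This is hypothetical code, not from the thread: it adds a request_queue argument to the one-line signature he proposes, and it assumes the kernel-internal helpers blk_get_request(), blk_queue_bounce(), and blk_rq_append_bio() behave as they did in kernels of that era. Treat it as pseudocode under those assumptions, not a definitive implementation.

```c
/* Hypothetical sketch: turn a pre-built bio chain into a BLOCK_PC
 * request without any intermediate page list, following the usual
 * ERR_PTR error convention.  Assumes blk_get_request(),
 * blk_queue_bounce() and blk_rq_append_bio() exist as internal
 * block-layer helpers. */
struct request *blk_make_request(struct request_queue *q, struct bio *bio,
                                 gfp_t gfp_mask)
{
        struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);

        if (!rq)
                return ERR_PTR(-ENOMEM);

        for_each_bio(bio) {
                struct bio *bounce_bio = bio;
                int ret;

                /* Bounce pages the device cannot reach, then append
                 * the bio to the request, merging segments as the
                 * queue limits allow. */
                blk_queue_bounce(q, &bounce_bio);
                ret = blk_rq_append_bio(q, rq, bounce_bio);
                if (ret) {
                        blk_put_request(rq);
                        return ERR_PTR(ret);
                }
        }

        return rq;
}
```

The design point of the sketch is the one argued above: the caller hands over a finished bio chain and gets back a ready-to-execute request, so no pages+offsets+lengths structure ever exists between the filesystem and the block layer.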
* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-17 8:10 ` [osd-dev] " Boaz Harrosh @ 2009-02-27 8:09 ` FUJITA Tomonori 2009-03-01 10:43 ` Boaz Harrosh 0 siblings, 1 reply; 17+ messages in thread From: FUJITA Tomonori @ 2009-02-27 8:09 UTC (permalink / raw) To: bharrosh Cc: fujita.tomonori, James.Bottomley, linux-scsi, jeff, linux-kernel, avishay, osd-dev, jens.axboe, linux-fsdevel, akpm On Tue, 17 Feb 2009 10:10:15 +0200 Boaz Harrosh <bharrosh@panasas.com> wrote: > FUJITA Tomonori wrote: > > > > Can you stop the argument, "exofs is similar to the existing > > traditional file systems hence it should be treated equally". It's > > simply untrue. Does anyone except for panasas people insist the same > > argument? > > > > No I will not, it is true. exofs is just a regular old filesystem > nothing different. After reading this, I gave up discussing this issue with you, but I am still waiting for the fixes you promised: http://marc.info/?l=linux-scsi&m=123445759718253&w=2 Thanks,
* Re: [osd-dev] [PATCH 1/8] exofs: Kbuild, Headers and osd utils 2009-02-27 8:09 ` FUJITA Tomonori @ 2009-03-01 10:43 ` Boaz Harrosh 0 siblings, 0 replies; 17+ messages in thread From: Boaz Harrosh @ 2009-03-01 10:43 UTC (permalink / raw) To: FUJITA Tomonori Cc: James.Bottomley, linux-scsi, jeff, linux-kernel, avishay, osd-dev, jens.axboe, linux-fsdevel, akpm FUJITA Tomonori wrote: > On Tue, 17 Feb 2009 10:10:15 +0200 > Boaz Harrosh <bharrosh@panasas.com> wrote: > >> FUJITA Tomonori wrote: >>> Can you stop the argument, "exofs is similar to the existing >>> traditional file systems hence it should be treated equally". It's >>> simply untrue. Does anyone except for panasas people insist the same >>> argument? >>> >> No I will not, it is true. exofs is just a regular old filesystem >> nothing different. > > After reading this, I gave up discussing this issue with you but I > still wait for your fixes that you promised: > > http://marc.info/?l=linux-scsi&m=123445759718253&w=2 > > > Thanks, > -- They are on the way, I have not forgotten Boaz ^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
[not found] <49902A9E.3070002@panasas.com>
[not found] ` <1234185129-31858-1-git-send-email-bharrosh@panasas.com>
2009-02-16 4:18 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
2009-02-16 8:49 ` Boaz Harrosh
2009-02-16 9:00 ` FUJITA Tomonori
2009-02-16 9:19 ` Boaz Harrosh
2009-02-16 9:27 ` Jeff Garzik
2009-02-16 10:19 ` Boaz Harrosh
2009-02-16 11:05 ` pNFS rant (was Re: [PATCH 1/8] exofs: Kbuild, Headers and osd utils) Jeff Garzik
2009-02-16 12:45 ` Boaz Harrosh
2009-02-16 15:50 ` James Bottomley
2009-02-16 16:27 ` Benny Halevy
2009-02-16 16:23 ` Benny Halevy
2009-02-16 9:38 ` [PATCH 1/8] exofs: Kbuild, Headers and osd utils FUJITA Tomonori
2009-02-16 10:29 ` Boaz Harrosh
2009-02-17 0:20 ` FUJITA Tomonori
2009-02-17 8:10 ` [osd-dev] " Boaz Harrosh
2009-02-27 8:09 ` FUJITA Tomonori
2009-03-01 10:43 ` Boaz Harrosh