* [RFC][PATCH] Direct I/O for the SCSI tapes @ 2002-07-27 12:38 Kai Makisara 2002-07-28 4:39 ` Douglas Gilbert 0 siblings, 1 reply; 5+ messages in thread From: Kai Makisara @ 2002-07-27 12:38 UTC (permalink / raw) To: linux-scsi The URL http://www.kolumbus.fi/kai.makisara/st-dio.html contains explanation and a link to a patch implementing direct I/O in the SCSI tape driver (st). Before adding this to the official kernel, I would like to get some feedback on this patch. If no one is interested in this enhancement and/or it seems to be useless, it will probably be left to collect dust. Kai ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC][PATCH] Direct I/O for the SCSI tapes 2002-07-27 12:38 [RFC][PATCH] Direct I/O for the SCSI tapes Kai Makisara @ 2002-07-28 4:39 ` Douglas Gilbert 2002-07-28 11:03 ` Kai Makisara 0 siblings, 1 reply; 5+ messages in thread From: Douglas Gilbert @ 2002-07-28 4:39 UTC (permalink / raw) To: Kai Makisara; +Cc: linux-scsi, ingo.oeser Kai Makisara wrote: > > The URL http://www.kolumbus.fi/kai.makisara/st-dio.html contains > explanation and a link to a patch implementing direct I/O in the SCSI tape > driver (st). Before adding this to the official kernel, I would like to > get some feedback on this patch. If no one is interested in this > enhancement and/or it seems to be useless, it will probably be left to > collect dust. Kai, Faster throughput and less CPU overhead seem pretty good reasons to use direct IO. With larger disk sizes, faster tape transfer speeds are needed to backup them up in a reasonable time. So I hope your patch doesn't collect dust. Also most high speed application that use sg are _streaming_ to disks [I have one report of > 320 MB/sec]. Perhaps people may one day use a purpose built command set for streaming like SSC rather than SBC (as used by direct access devices). It was interesting to see that your patch used variants of Ingo Oeser's sg_map_user_pages() [see Kai's version shown below] **. My interest in those functions is that I would like to use them in sg (since kiobufs have been removed from lk 2.5). According to Ingo several other char drivers that previously used kiobufs are probably looking for replacements as well. Since they are general routines of use to char drivers that build scatter gather lists (block drivers have bio) perhaps they should be placed in some common area. Doug Gilbert ** lkml 2002-07-19 22:39:18 Ingo Oeser "Re: [never mind] kiobufs and highmem" Kai's st_map_user_pages() and st_unmap_user_pages() follow: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv static int st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages, unsigned long uaddr, size_t count, int rw, unsigned long max_pfn) { int res, i, j; unsigned int nr_pages; struct page **pages; nr_pages = ((uaddr & ~PAGE_MASK) + count - 1 + ~PAGE_MASK) >> PAGE_SHIFT; /* User attempted Overflow! */ if ((uaddr + count) < uaddr) return -EINVAL; /* Too big */ if (nr_pages > max_pages) return -ENOMEM; /* Hmm? */ if (count == 0) return 0; if ((pages = kmalloc(max_pages * sizeof(*pages), GFP_ATOMIC)) == NULL) return -ENOMEM; /* Try to fault in all of the necessary pages */ down_read(¤t->mm->mmap_sem); /* rw==READ means read from drive, write into memory area */ res = get_user_pages( current, current->mm, uaddr, nr_pages, rw == READ, 0, /* don't force */ pages, NULL); up_read(¤t->mm->mmap_sem); /* Errors and no page mapped should return here */ if (res < nr_pages) goto out_unmap; for (i=0; i < nr_pages; i++) { /* FIXME: flush superflous for rw==READ, * probably wrong function for rw==WRITE */ flush_dcache_page(pages[i]); if (page_to_pfn(pages[i]) > max_pfn) goto out_unlock; /* ?? Is locking needed? I don't think so */ /* if (TestSetPageLocked(pages[i])) goto out_unlock; */ } /* Populate the scatter/gather list */ sgl[0].page = pages[0]; sgl[0].offset = uaddr & ~PAGE_MASK; if (nr_pages > 1) { sgl[0].length = PAGE_SIZE - sgl[0].offset; count -= sgl[0].length; for (i=1; i < nr_pages ; i++) { sgl[i].offset = 0; sgl[i].page = pages[i]; sgl[i].length = count < PAGE_SIZE ? count : PAGE_SIZE; count -= PAGE_SIZE; } } else { sgl[0].length = count; } kfree(pages); return nr_pages; out_unlock: /* for (j=0; j < i; j++) unlock_page(pages[j]); */ res = 0; out_unmap: if (res > 0) for (j=0; j < res; j++) page_cache_release(pages[j]); kfree(pages); return res; } /* And unmap them... */ static int st_unmap_user_pages(struct scatterlist *sgl, const unsigned int nr_pages, int dirtied) { int i; for (i=0; i < nr_pages; i++) { if (dirtied && !PageReserved(sgl[i].page)) SetPageDirty(sgl[i].page); /* unlock_page(sgl[i].page); */ /* FIXME: cache flush missing for rw==READ * FIXME: call the correct reference counting function */ page_cache_release(sgl[i].page); } return 0; } ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC][PATCH] Direct I/O for the SCSI tapes 2002-07-28 4:39 ` Douglas Gilbert @ 2002-07-28 11:03 ` Kai Makisara 2002-07-28 22:59 ` Ingo Oeser 0 siblings, 1 reply; 5+ messages in thread From: Kai Makisara @ 2002-07-28 11:03 UTC (permalink / raw) To: Douglas Gilbert; +Cc: linux-scsi, ingo.oeser On Sun, 28 Jul 2002, Douglas Gilbert wrote: ... > It was interesting to see that your patch used variants of Ingo Oeser's > sg_map_user_pages() [see Kai's version shown below] **. My interest in > those functions is that I would like to use them in sg (since kiobufs > have been removed from lk 2.5). According to Ingo several other char > drivers that previously used kiobufs are probably looking for > replacements as well. > Kiobufs seem to be going out (although Linus has not yet given his opinion) and since there is nothing equivalent in the kernel, I had to code something. I started from Ingo's function and looked at the kiobuf, bio, dio, and aio code and tried to use the best ideas. > Since they are general routines of use to char drivers that > build scatter gather lists (block drivers have bio) perhaps they > should be placed in some common area. > It seems clear that there are many of us who would like to see something like this in the kernel. I fully agreed with Ingo when he posted his messages to lkml but no one has followed up the discussion yet. There have been some indications on lkml that Ben LaHaise's aio and/or kvecs would be included in 2.6. As far as I know, kvecs are quite near what I want and st should use kvecs if they are included. We don't want more functions mapping user buffers than necessary. If kvecs will not be in 2.6, then we want something else. This is also needed while waiting because it is useful to have the drivers otherwise tested as long as possible. Here I see two viable alternatives: 1. We make functions that all the users want and agree on and try to get these into the kernel. These would be either permanent solutions or removed if kvecs (or something else) is introduced into the kernel. Linus has not always been thrilled with temporary solutions. 2. Everybody makes their own temporary solution. Even in this alternative we can agree on the interface and implementation and then the eventual cleanup will be easy. > Doug Gilbert > > ** lkml 2002-07-19 22:39:18 Ingo Oeser "Re: [never mind] kiobufs and highmem" > > > Kai's st_map_user_pages() and st_unmap_user_pages() follow: > vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv > > static int st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages, > unsigned long uaddr, size_t count, int rw, > unsigned long max_pfn) This function does contain some details tailored for st: it checks the page accessibility and backs out if there is even one non-accessible page. For general use, it might be better to separate this from the mapping so that st would use if ((nbr_sg = cio_map_user_pages(sgl, max_pages, uaddr, count, rw)) { for (i=0; i < nbr_sg; i++) { if (page_to_pfn(sgl->page) > max_pfn) { cio_unmap_use_pages(sgl, 0); nbr_sg = 0; } } } I don't think this would significantly more cycles than my current version. It would probably be best to do possible locking in a separate function. There it would probably be advantageous to stop early if an unlockable page is found (or optionally wait until the pages are unlocked). The locking and unlocking functions can be left undefined until someone tells what he/she needs. I would like to see the following minimum properties in the mapping function: - it maps all pages or none - it pins the pages into memory - it returns the number of pages or error - it returns the page pointers and page lengths (lengths will be useful if large pages will be implemented) and offset in first page - it does whatever cache flushing is needed Is the scatter/gather list the correct way to return the results? It is convenient in many cases. However, if kvecs or something else is used in future, mapping from those to scatter/gather list is needed. It might be easier to code that now. The mapping function might be made to return the results in something that looks exactly like kvecs ;-? Any opinions? Kai ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC][PATCH] Direct I/O for the SCSI tapes 2002-07-28 11:03 ` Kai Makisara @ 2002-07-28 22:59 ` Ingo Oeser 2002-07-29 18:45 ` Kai Makisara 0 siblings, 1 reply; 5+ messages in thread From: Ingo Oeser @ 2002-07-28 22:59 UTC (permalink / raw) To: Kai Makisara; +Cc: Douglas Gilbert, linux-scsi [Please keep me CC'ed on replies, since I'm not subscribed to linux-scsi. Thanks for CC'ing me so far!] On Sun, Jul 28, 2002 at 02:03:16PM +0300, Kai Makisara wrote: > There have been some indications on lkml that Ben LaHaise's aio and/or > kvecs would be included in 2.6. As far as I know, kvecs are quite near > what I want and st should use kvecs if they are included. We don't want > more functions mapping user buffers than necessary. Mapping user buffers is one step only. I want mapping of user buffers to bus adresses to do scatter-gather. We need it in so many places now, so I like the idea of simplifying it the way we agree on. The kvecs do only user page pinning. We need struct scatterlist since we need to get the pages anyway (remember: we get a pointer to the struct page, where we easyly get to the physical address, which is already the bus adress on many arches AFAIR). With kvecs we get only half the work done. I also would like to see the mapping function entered very early in the call chain (my own use is one stack frame after the file operations!). That's why I used the stack for the temporary buffer (which should be very small anyway, so we don't pin too much pages). > 1. We make functions that all the users want and agree on and try to get > these into the kernel. These would be either permanent solutions or > removed if kvecs (or something else) is introduced into the kernel. Linus > has not always been thrilled with temporary solutions. Even then we would still need sth. which does the physical mapping and must allocate another array for it. I wanted to unify this on the argument, that scatter-gather goes together with DMA an bus adresses in the COMMON case (which we optimize for, remember ;-)). I've thought I overlooked sth. important, until Doug fowarded me your work and testing. > It would probably be best to do possible locking in a separate function. > There it would probably be advantageous to stop early if an unlockable > page is found (or optionally wait until the pages are unlocked). The > locking and unlocking functions can be left undefined until someone tells > what he/she needs. > > I would like to see the following minimum properties in the mapping > function: > - it maps all pages or none > - it pins the pages into memory > - it returns the number of pages or error > - it returns the page pointers and page lengths (lengths will be useful if > large pages will be implemented) and offset in first page > - it does whatever cache flushing is needed - locks partial pages used (first and/or last might be) to avoid a partial page write for the unused remainder. I agree completly. > Is the scatter/gather list the correct way to return the results? It is > convenient in many cases. s/many/the common/ so I would add it to the minimum properties. Otherwise we allocate just another array again for the scatter-gather list. > However, if kvecs or something else is used in > future, mapping from those to scatter/gather list is needed. It might be > easier to code that now. The mapping function might be made to return the > results in something that looks exactly like kvecs ;-? I would prefer 2 seperate functions for the kvec and the scatterlist cases. My basic idea was to prepare the sg-list from user space buffers and thus enable very simple user space DMA. I don't want it to get lost ;-) Regards Ingo Oeser -- Science is what we can tell a computer. Art is everything else. --- D.E.Knuth ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC][PATCH] Direct I/O for the SCSI tapes 2002-07-28 22:59 ` Ingo Oeser @ 2002-07-29 18:45 ` Kai Makisara 0 siblings, 0 replies; 5+ messages in thread From: Kai Makisara @ 2002-07-29 18:45 UTC (permalink / raw) To: Ingo Oeser; +Cc: Douglas Gilbert, linux-scsi On Mon, 29 Jul 2002, Ingo Oeser wrote: > [Please keep me CC'ed on replies, since I'm not subscribed to > linux-scsi. Thanks for CC'ing me so far!] ... > Mapping user buffers is one step only. I want mapping of user > buffers to bus adresses to do scatter-gather. We need it in so > many places now, so I like the idea of simplifying it the way we > agree on. The kvecs do only user page pinning. > > We need struct scatterlist since we need to get the pages anyway > (remember: we get a pointer to the struct page, where we easyly > get to the physical address, which is already the bus adress on > many arches AFAIR). > > With kvecs we get only half the work done. > I agree. I have just been exploring the other possibilities if sgl_map_user_pages() is not accepted into the kernel. map_user_kvec() (at least in aio-20020619.diff) returns all the information (page pointers, offset, lengths) that sgl_map_user_pages() puts into the s/g list. Mapping to a separate s/g list is needed anyway if the user wants to combine adjacent pages into single s/g entries. This remapping can, of course, be also made from s/g list to s/g list. If the user does not need compaction, then nothing has to be done if the mapping function returns a s/g list. My only valid argument in favour of kvecs is maintenance of the mm code within the mapping function. If the kernel contains a maintained function mapping directly to s/g lists, I will be happy to use it. > I also would like to see the mapping function entered very early > in the call chain (my own use is one stack frame after > the file operations!). That's why I used the stack for the > temporary buffer (which should be very small anyway, so we don't > pin too much pages). > In st the number of pages can be quite large (256 or even larger) and this is why I changed the allocation to kmalloc. In a general function this may also be safer. ... > > I would like to see the following minimum properties in the mapping > > function: > > - it maps all pages or none > > - it pins the pages into memory > > - it returns the number of pages or error > > - it returns the page pointers and page lengths (lengths will be useful if > > large pages will be implemented) and offset in first page > > - it does whatever cache flushing is needed > - locks partial pages used (first and/or last might be) to > avoid a partial page write for the unused remainder. > Is this locking necessary in all cases where the mapping function would be used? (I am asking because it is not a very light operation in a SMP system. Also I can see two alternative strategies: some callers would like to wait until locking succeeds, others (like st) would like to tear down the mapping if locking fails.) Kai ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-07-29 18:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-07-27 12:38 [RFC][PATCH] Direct I/O for the SCSI tapes Kai Makisara 2002-07-28 4:39 ` Douglas Gilbert 2002-07-28 11:03 ` Kai Makisara 2002-07-28 22:59 ` Ingo Oeser 2002-07-29 18:45 ` Kai Makisara
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox