All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] Direct I/O for the SCSI tapes
@ 2002-07-27 12:38 Kai Makisara
  2002-07-28  4:39 ` Douglas Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Kai Makisara @ 2002-07-27 12:38 UTC (permalink / raw)
  To: linux-scsi

The URL http://www.kolumbus.fi/kai.makisara/st-dio.html contains
explanation and a link to a patch implementing direct I/O in the SCSI tape
driver (st). Before adding this to the official kernel, I would like to
get some feedback on this patch. If no one is interested in this
enhancement and/or it seems to be useless, it will probably be left to
collect dust.

	Kai






^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Direct I/O for the SCSI tapes
  2002-07-27 12:38 [RFC][PATCH] Direct I/O for the SCSI tapes Kai Makisara
@ 2002-07-28  4:39 ` Douglas Gilbert
  2002-07-28 11:03   ` Kai Makisara
  0 siblings, 1 reply; 5+ messages in thread
From: Douglas Gilbert @ 2002-07-28  4:39 UTC (permalink / raw)
  To: Kai Makisara; +Cc: linux-scsi, ingo.oeser

Kai Makisara wrote:
> 
> The URL http://www.kolumbus.fi/kai.makisara/st-dio.html contains
> explanation and a link to a patch implementing direct I/O in the SCSI tape
> driver (st). Before adding this to the official kernel, I would like to
> get some feedback on this patch. If no one is interested in this
> enhancement and/or it seems to be useless, it will probably be left to
> collect dust.

Kai,
Faster throughput and less CPU overhead seem pretty good reasons
to use direct IO. With larger disk sizes, faster tape transfer speeds 
are needed to backup them up in a reasonable time. So I hope your
patch doesn't collect dust. Also most high speed application that
use sg are _streaming_ to disks [I have one report of > 320 MB/sec].
Perhaps people may one day use a purpose built command set for
streaming like SSC rather than SBC (as used by direct access devices).

It was interesting to see that your patch used variants of Ingo Oeser's 
sg_map_user_pages() [see Kai's version shown below] **. My interest in 
those functions is that I would like to use them in sg (since kiobufs 
have been removed from lk 2.5). According to Ingo several other char
drivers that previously used kiobufs are probably looking for
replacements as well.

Since they are general routines of use to char drivers that
build scatter gather lists (block drivers have bio) perhaps they 
should be placed in some common area.

Doug Gilbert

** lkml 2002-07-19 22:39:18 Ingo Oeser "Re: [never mind] kiobufs and highmem"


Kai's st_map_user_pages() and st_unmap_user_pages() follow:
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

static int st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages,
                            unsigned long uaddr, size_t count, int rw,
                            unsigned long max_pfn)
{
       int res, i, j;
       unsigned int nr_pages;
       struct page **pages;

       nr_pages = ((uaddr & ~PAGE_MASK) + count - 1 + ~PAGE_MASK) >> PAGE_SHIFT;

       /* User attempted Overflow! */
       if ((uaddr + count) < uaddr)
               return -EINVAL;

       /* Too big */
        if (nr_pages > max_pages)
               return -ENOMEM;

       /* Hmm? */
       if (count == 0)
               return 0;

       if ((pages = kmalloc(max_pages * sizeof(*pages), GFP_ATOMIC)) == NULL)
               return -ENOMEM;

        /* Try to fault in all of the necessary pages */
       down_read(&current->mm->mmap_sem);
        /* rw==READ means read from drive, write into memory area */
       res = get_user_pages(
               current,
               current->mm,
               uaddr,
               nr_pages,
               rw == READ,
               0, /* don't force */
               pages,
               NULL);
       up_read(&current->mm->mmap_sem);

       /* Errors and no page mapped should return here */
       if (res < nr_pages)
               goto out_unmap;

        for (i=0; i < nr_pages; i++) {
                /* FIXME: flush superflous for rw==READ,                 
		 * probably wrong function for rw==WRITE
                 */
               flush_dcache_page(pages[i]);
               if (page_to_pfn(pages[i]) > max_pfn)
                       goto out_unlock;
               /* ?? Is locking needed? I don't think so */
               /* if (TestSetPageLocked(pages[i]))
                  goto out_unlock; */
        }

       /* Populate the scatter/gather list */
       sgl[0].page = pages[0];
       sgl[0].offset = uaddr & ~PAGE_MASK;
       if (nr_pages > 1) {
               sgl[0].length = PAGE_SIZE - sgl[0].offset;
               count -= sgl[0].length;
               for (i=1; i < nr_pages ; i++) {
                       sgl[i].offset = 0;
                       sgl[i].page = pages[i];
                       sgl[i].length = count < PAGE_SIZE ? count : PAGE_SIZE;
                       count -= PAGE_SIZE;
               }
       }
       else {
               sgl[0].length = count;
       }

       kfree(pages);
       return nr_pages;

 out_unlock:
       /* for (j=0; j < i; j++)
          unlock_page(pages[j]); */
       res = 0;
 out_unmap:
       if (res > 0)
               for (j=0; j < res; j++)
                       page_cache_release(pages[j]);
       kfree(pages);
       return res;
}

/* And unmap them... */
static int st_unmap_user_pages(struct scatterlist *sgl, const unsigned int nr_pages,
                              int dirtied)
{
       int i;

       for (i=0; i < nr_pages; i++) {
               if (dirtied && !PageReserved(sgl[i].page))
                       SetPageDirty(sgl[i].page);
               /* unlock_page(sgl[i].page); */
               /* FIXME: cache flush missing for rw==READ
                * FIXME: call the correct reference counting function
                */
               page_cache_release(sgl[i].page);
       }

       return 0;
}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Direct I/O for the SCSI tapes
  2002-07-28  4:39 ` Douglas Gilbert
@ 2002-07-28 11:03   ` Kai Makisara
  2002-07-28 22:59     ` Ingo Oeser
  0 siblings, 1 reply; 5+ messages in thread
From: Kai Makisara @ 2002-07-28 11:03 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: linux-scsi, ingo.oeser

On Sun, 28 Jul 2002, Douglas Gilbert wrote:

...
> It was interesting to see that your patch used variants of Ingo Oeser's
> sg_map_user_pages() [see Kai's version shown below] **. My interest in
> those functions is that I would like to use them in sg (since kiobufs
> have been removed from lk 2.5). According to Ingo several other char
> drivers that previously used kiobufs are probably looking for
> replacements as well.
>
Kiobufs seem to be going out (although Linus has not yet given his
opinion) and since there is nothing equivalent in the kernel, I had to
code something. I started from Ingo's function and looked at the kiobuf,
bio, dio, and aio code and tried to use the best ideas.

> Since they are general routines of use to char drivers that
> build scatter gather lists (block drivers have bio) perhaps they
> should be placed in some common area.
>
It seems clear that there are many of us who would like to see something
like this in the kernel. I fully agreed with Ingo when he posted his
messages to lkml but no one has followed up the discussion yet.

There have been some indications on lkml that Ben LaHaise's aio and/or
kvecs would be included in 2.6. As far as I know, kvecs are quite near
what I want and st should use kvecs if they are included. We don't want
more functions mapping user buffers than necessary.

If kvecs will not be in 2.6, then we want something else. This is also
needed while waiting because it is useful to have the drivers otherwise
tested as long as possible. Here I see two viable alternatives:

1. We make functions that all the users want and agree on and try to get
these into the kernel. These would be either permanent solutions or
removed if kvecs (or something else) is introduced into the kernel. Linus
has not always been thrilled with temporary solutions.

2. Everybody makes their own temporary solution. Even in this alternative
we can agree on the interface and implementation and then the
eventual cleanup will be easy.

> Doug Gilbert
>
> ** lkml 2002-07-19 22:39:18 Ingo Oeser "Re: [never mind] kiobufs and highmem"
>
>
> Kai's st_map_user_pages() and st_unmap_user_pages() follow:
> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>
> static int st_map_user_pages(struct scatterlist *sgl, const unsigned int max_pages,
>                             unsigned long uaddr, size_t count, int rw,
>                             unsigned long max_pfn)

This function does contain some details tailored for st: it checks the
page accessibility and backs out if there is even one non-accessible page.
For general use, it might be better to separate this from the mapping so
that st would use

	if ((nbr_sg = cio_map_user_pages(sgl, max_pages, uaddr, count, rw)) {
		for (i=0; i < nbr_sg; i++) {
			if (page_to_pfn(sgl->page) > max_pfn) {
				cio_unmap_use_pages(sgl, 0);
				nbr_sg = 0;
			}
		}
	}

I don't think this would significantly more cycles than my current
version.

It would probably be best to do possible locking in a separate function.
There it would probably be advantageous to stop early if an unlockable
page is found (or optionally wait until the pages are unlocked). The
locking and unlocking functions can be left undefined until someone tells
what he/she needs.

I would like to see the following minimum properties in the mapping
function:
- it maps all pages or none
- it pins the pages into memory
- it returns the number of pages or error
- it returns the page pointers and page lengths (lengths will be useful if
  large pages will be implemented) and offset in first page
- it does whatever cache flushing is needed

Is the scatter/gather list the correct way to return the results? It is
convenient in many cases. However, if kvecs or something else is used in
future, mapping from those to scatter/gather list is needed. It might be
easier to code that now. The mapping function might be made to return the
results in something that looks exactly like kvecs ;-?

Any opinions?

	Kai



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Direct I/O for the SCSI tapes
  2002-07-28 11:03   ` Kai Makisara
@ 2002-07-28 22:59     ` Ingo Oeser
  2002-07-29 18:45       ` Kai Makisara
  0 siblings, 1 reply; 5+ messages in thread
From: Ingo Oeser @ 2002-07-28 22:59 UTC (permalink / raw)
  To: Kai Makisara; +Cc: Douglas Gilbert, linux-scsi

[Please keep me CC'ed on replies, since I'm not subscribed to
linux-scsi. Thanks for CC'ing me so far!]

On Sun, Jul 28, 2002 at 02:03:16PM +0300, Kai Makisara wrote:
> There have been some indications on lkml that Ben LaHaise's aio and/or
> kvecs would be included in 2.6. As far as I know, kvecs are quite near
> what I want and st should use kvecs if they are included. We don't want
> more functions mapping user buffers than necessary.
 
Mapping user buffers is one step only. I want mapping of user
buffers to bus adresses to do scatter-gather. We need it in so
many places now, so I like the idea of simplifying it the way we
agree on. The kvecs do only user page pinning. 

We need struct scatterlist since we need to get the pages anyway
(remember: we get a pointer to the struct page, where we easyly
get to the physical address, which is already the bus adress on
many arches AFAIR).

With kvecs we get only half the work done.

I also would like to see the mapping function entered very early
in the call chain (my own use is one stack frame after
the file operations!). That's why I used the stack for the
temporary buffer (which should be very small anyway, so we don't
pin too much pages).

> 1. We make functions that all the users want and agree on and try to get
> these into the kernel. These would be either permanent solutions or
> removed if kvecs (or something else) is introduced into the kernel. Linus
> has not always been thrilled with temporary solutions.
 
Even then we would still need sth. which does the physical
mapping and must allocate another array for it. I wanted to unify
this on the argument, that scatter-gather goes together with DMA
an bus adresses in the COMMON case (which we optimize for,
remember ;-)).

I've thought I overlooked sth. important, until Doug fowarded me
your work and testing.

> It would probably be best to do possible locking in a separate function.
> There it would probably be advantageous to stop early if an unlockable
> page is found (or optionally wait until the pages are unlocked). The
> locking and unlocking functions can be left undefined until someone tells
> what he/she needs.
> 
> I would like to see the following minimum properties in the mapping
> function:
> - it maps all pages or none
> - it pins the pages into memory
> - it returns the number of pages or error
> - it returns the page pointers and page lengths (lengths will be useful if
>   large pages will be implemented) and offset in first page
> - it does whatever cache flushing is needed
  - locks partial pages used (first and/or last might be) to
    avoid a partial page write for the unused remainder.
  
I agree completly.

> Is the scatter/gather list the correct way to return the results? It is
> convenient in many cases. 

s/many/the common/

so I would add it to the minimum properties. Otherwise we
allocate just another array again for the scatter-gather list.

> However, if kvecs or something else is used in
> future, mapping from those to scatter/gather list is needed. It might be
> easier to code that now. The mapping function might be made to return the
> results in something that looks exactly like kvecs ;-?

I would prefer 2 seperate functions for the kvec and the
scatterlist cases. My basic idea was to prepare the sg-list from
user space buffers and thus enable very simple user space DMA. 
I don't want it to get lost ;-)


Regards

Ingo Oeser
-- 
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC][PATCH] Direct I/O for the SCSI tapes
  2002-07-28 22:59     ` Ingo Oeser
@ 2002-07-29 18:45       ` Kai Makisara
  0 siblings, 0 replies; 5+ messages in thread
From: Kai Makisara @ 2002-07-29 18:45 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Douglas Gilbert, linux-scsi

On Mon, 29 Jul 2002, Ingo Oeser wrote:

> [Please keep me CC'ed on replies, since I'm not subscribed to
> linux-scsi. Thanks for CC'ing me so far!]
...
> Mapping user buffers is one step only. I want mapping of user
> buffers to bus adresses to do scatter-gather. We need it in so
> many places now, so I like the idea of simplifying it the way we
> agree on. The kvecs do only user page pinning.
>
> We need struct scatterlist since we need to get the pages anyway
> (remember: we get a pointer to the struct page, where we easyly
> get to the physical address, which is already the bus adress on
> many arches AFAIR).
>
> With kvecs we get only half the work done.
>
I agree. I have just been exploring the other possibilities if
sgl_map_user_pages() is not accepted into the kernel. map_user_kvec() (at
least in aio-20020619.diff) returns all the information (page pointers,
offset, lengths) that sgl_map_user_pages() puts into the s/g list. Mapping
to a separate s/g list is needed anyway if the user wants to combine
adjacent pages into single s/g entries.

This remapping can, of course, be also made from s/g list to s/g list. If
the user does not need compaction, then nothing has to be done if the
mapping function returns a s/g list. My only valid argument in favour of
kvecs is maintenance of the mm code within the mapping function. If the
kernel contains a maintained function mapping directly to s/g lists,
I will be happy to use it.

> I also would like to see the mapping function entered very early
> in the call chain (my own use is one stack frame after
> the file operations!). That's why I used the stack for the
> temporary buffer (which should be very small anyway, so we don't
> pin too much pages).
>
In st the number of pages can be quite large (256 or even larger) and this
is why I changed the allocation to kmalloc. In a general function this may
also be safer.

...
> > I would like to see the following minimum properties in the mapping
> > function:
> > - it maps all pages or none
> > - it pins the pages into memory
> > - it returns the number of pages or error
> > - it returns the page pointers and page lengths (lengths will be useful if
> >   large pages will be implemented) and offset in first page
> > - it does whatever cache flushing is needed
>   - locks partial pages used (first and/or last might be) to
>     avoid a partial page write for the unused remainder.
>
Is this locking necessary in all cases where the mapping function would be
used? (I am asking because it is not a very light operation in a SMP
system. Also I can see two alternative strategies: some callers would like
to wait until locking succeeds, others (like st) would like to tear down
the mapping if locking fails.)

	Kai



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2002-07-29 18:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-07-27 12:38 [RFC][PATCH] Direct I/O for the SCSI tapes Kai Makisara
2002-07-28  4:39 ` Douglas Gilbert
2002-07-28 11:03   ` Kai Makisara
2002-07-28 22:59     ` Ingo Oeser
2002-07-29 18:45       ` Kai Makisara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.