From: Boaz Harrosh <bharrosh@panasas.com>
To: Tejun Heo <tj@kernel.org>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org,
	fujita.tomonori@lab.ntt.co.jp
Subject: Re: [PATCH 08/17] bio: reimplement bio_copy_user_iov()
Date: Wed, 01 Apr 2009 18:50:35 +0300
Message-ID: <49D38D4B.7020701@panasas.com>
In-Reply-To: <1238593472-30360-9-git-send-email-tj@kernel.org>

On 04/01/2009 04:44 PM, Tejun Heo wrote:
> Impact: more modular implementation
> 
> Break down bio_copy_user_iov() into the following steps.
> 
> 1. bci and page allocation
> 2. copying data if WRITE
> 3. create bio accordingly
> 
> bci is now responsible for managing any copy related resources.  Given
> source iov, bci_create() allocates bci and fills it with enough pages
> to cover the source iov.  The allocated pages are described with a
> sgl.
> 
> Note that new allocator always rounds up rq_map_data->offset to page
> boundary to simplify implementation and guarantee enough DMA padding
> area at the end.  As the only user, scsi sg, always passes in zero
> offset, this doesn't cause any actual behavior difference.  Also,
> nth_page() is used to walk to the next page rather than directly
> adding to struct page *.
> 
> Copying back and forth is done using bio_memcpy_sgl_uiov() which is
> implemented using sg mapping iterator and iov iterator.
> 
> The last step is done using bio_create_from_sgl().
> 
> This patch by itself adds one more level of indirection via sgl and
> more code but components factored out here will be used for future
> code refactoring.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Hi dear Tejun

I've looked hard and deep into your patchset, and I would like to
suggest an improvement.

[Option 1]
What your code actually uses from the sgl code base is:
 for_each_sg
 sg_mapping_iter and its
	sg_miter_start, sg_miter_next
 ... (what else)

I would like you to define the above for bvec(s), just the way you like
them. Then the code works directly on the destination bvec inside the final
bio: one less copy, no intermediate allocation, and no kmalloc of
bigger-than-page buffers.

These are all small inlines; duplicating them will not affect
kernel size at all. You are not using the chaining ability of sgl(s),
so they can be simplified. You will see that dropping the intermediate
copy simplifies the code even more.

Since no outside user currently needs sgl(s), no functionality is lost.
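
A minimal userspace sketch of what [Option 1] could look like, assuming a
simplified stand-in for struct bio_vec in which "page" is an ordinary buffer
rather than a struct page *. The names bvec_model, bvec_miter,
bvec_miter_start, bvec_miter_next and bvec_copy_from_buf are hypothetical;
they only mirror the shape of sg_miter_start/sg_miter_next and are not
existing kernel API. Mapping and unmapping of highmem pages is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct bio_vec: page is a plain buffer in this model. */
struct bvec_model {
	unsigned char *page;
	unsigned int len;
	unsigned int offset;
};

/* Hypothetical iterator, shaped loosely after sg_mapping_iter. */
struct bvec_miter {
	struct bvec_model *bvecs;
	unsigned int nr;	/* number of entries in bvecs */
	unsigned int idx;	/* next entry to visit */
	unsigned char *addr;	/* start of the current chunk */
	size_t length;		/* length of the current chunk */
};

static void bvec_miter_start(struct bvec_miter *it,
			     struct bvec_model *bvecs, unsigned int nr)
{
	assert(bvecs != NULL || nr == 0);
	it->bvecs = bvecs;
	it->nr = nr;
	it->idx = 0;
	it->addr = NULL;
	it->length = 0;
}

/* Advance to the next non-empty bvec; returns 0 once the array is exhausted. */
static int bvec_miter_next(struct bvec_miter *it)
{
	while (it->idx < it->nr) {
		struct bvec_model *bv = &it->bvecs[it->idx++];

		if (bv->len == 0)
			continue;
		it->addr = bv->page + bv->offset;
		it->length = bv->len;
		return 1;
	}
	it->addr = NULL;
	it->length = 0;
	return 0;
}

/*
 * Copy up to len bytes from src straight into the bvec array, with no
 * intermediate list. Returns the number of bytes actually copied.
 */
static size_t bvec_copy_from_buf(struct bvec_model *bvecs, unsigned int nr,
				 const void *src, size_t len)
{
	struct bvec_miter it;
	const unsigned char *p = src;
	size_t done = 0;

	bvec_miter_start(&it, bvecs, nr);
	while (done < len && bvec_miter_next(&it)) {
		size_t n = it.length < len - done ? it.length : len - done;

		memcpy(it.addr, p + done, n);
		done += n;
	}
	return done;
}
```

The copy loop writes directly into the final bvec array, which is the point
of the suggestion: the data is placed once, and nothing larger than the bvec
array itself has to be allocated.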

[Option 2]
Keep a pointer to an sgl, not a bvec, in the bio; again the code works on the
final destination. Later users of the block layer that call blk_rq_fill_sgl
(blk_rq_map_sg) will just get a copy of the pointer, which saves another
allocation and copy. This option will spill outside the scope of the current
patches, into bvec hacking code.


I do like your long-term vision of separating the DMA part from the virtual part
of scatterlists. Note how they are actually two disjoint lists altogether. After
dma_map does its thing, the dma physical list might be shorter than the virtual
one, and the sizes might not correspond at all. The dma mapping code regards the
dma part as an empty list that gets appended to while processing; any match
between segments is accidental. (That is: inside the scatterlist, the virtual
address most probably does not match the dma address.)
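
The "two disjoint lists" point can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the DMA address is
taken to equal the physical address (no IOMMU), adjacent segments are merged
up to a maximum segment size, and virt_seg, dma_seg and model_dma_map are
hypothetical names, not the kernel's dma_map_sg() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-side segment: where the data lives and how long it is. */
struct virt_seg {
	uint64_t phys;
	uint32_t len;
};

/* Device-side segment, built separately from the CPU-side list. */
struct dma_seg {
	uint64_t dma_addr;
	uint32_t dma_len;
};

/*
 * Build the DMA list by appending to an initially empty output array.
 * A segment that starts exactly where the previous DMA segment ends is
 * merged into it, as long as the result stays within max_seg bytes.
 * out must have room for nents entries. Returns the number of DMA
 * segments produced, which may be smaller than nents.
 */
static size_t model_dma_map(const struct virt_seg *in, size_t nents,
			    struct dma_seg *out, uint32_t max_seg)
{
	size_t n = 0;

	assert((in != NULL && out != NULL) || nents == 0);

	for (size_t i = 0; i < nents; i++) {
		if (n > 0 &&
		    out[n - 1].dma_addr + out[n - 1].dma_len == in[i].phys &&
		    (uint64_t)out[n - 1].dma_len + in[i].len <= max_seg) {
			out[n - 1].dma_len += in[i].len;
			continue;
		}
		out[n].dma_addr = in[i].phys;
		out[n].dma_len = in[i].len;
		n++;
	}
	return n;
}
```

With four input segments that form two physically contiguous runs, this
model emits two DMA segments, so entry i of the DMA list no longer describes
entry i of the CPU-side list.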

So [option 1] matches more closely to that vision.

Historically code was doing
  Many-sources => scatterlist => biovec => scatterlist => dma-scatterlist

Only as of 2.6.30 can we say that we have removed a step, to do:
  Many-sources => biovec => scatterlist => dma-scatterlist

Now you want to bring the extra step back, and I hate it.
[Option 2] can make the chain even shorter:
  Many-sources => scatterlist => dma-scatterlist

Please consider [option 1]. It will only add some source code;
it will not increase the compiled code size (it may even decrease it),
and it will be fast.

Please consider that this code path is used by me, in exofs and
pNFS-objects, in a very, very hot path where memory pressure is a
common scenario.

And I have one more question.
Are you sure kmalloc of bigger-than-page buffers is safe? As I
understand it, that tries to allocate physically contiguous pages,
which gets harder as time passes, and the last time I tried this with a
kmem_cache (due to a bug) it crashed the kernel randomly after 2 minutes of use.
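
For context on the question above: a buffer larger than one page has to come
from a physically contiguous run of pages, and the page allocator hands those
out in power-of-two blocks ("orders"). A minimal userspace sketch of that
arithmetic, assuming a 4096-byte page; MODEL_PAGE_SIZE and model_get_order
are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Smallest order such that (1 << order) pages cover size bytes.
 * A zero size is rejected here rather than given a meaning.
 */
static unsigned int model_get_order(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);
	pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under this model a 4096-byte request is order 0, a 4097-byte request is
already order 1 (two contiguous pages), and a 12288-byte request is order 2
(four contiguous pages). The higher the order, the more the allocation
depends on free memory not being fragmented.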

Thanks
Boaz
