From: Tejun Heo <tj@kernel.org>
To: Boaz Harrosh <bharrosh@panasas.com>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org,
fujita.tomonori@lab.ntt.co.jp
Subject: Re: [PATCH 08/17] bio: reimplement bio_copy_user_iov()
Date: Thu, 02 Apr 2009 17:59:39 +0900 [thread overview]
Message-ID: <49D47E7B.3030209@kernel.org> (raw)
In-Reply-To: <49D47653.90507@panasas.com>
Hello, Boaz.
Boaz Harrosh wrote:
>> Yeah, we can switch to dealing with bvecs instead of sgls. For kernel
>> mappings, it wouldn't make any difference. For user single mappings,
>> it'll remove an intermediate step. For user sg mappings, if we make
>> gup use sgl, it won't make much difference.
>
> "gup use sgl" was just a stupid suggestion to try eliminate one more
> allocation. Realy Realy stupid.
>
> for gup there is a much better, safer, way. Just allocate on the stack
> a constant size say 64 pointers and add an inner-loop on the 64 pages.
> The extra code loop is much cheaper then an allocation and is one less
> point of failure.
Great.
> So now that we got gup out of the way, will you go directly to bvecs?
As said before, I don't think exposing bio at the driver interface is
a good idea copy or not. For internal implementation, if there's
enough benefit, yeah.
>> Frankly, for copy paths, I don't think all this matters at all. If
>> each request is small, other overheads will dominate. For large
>> requests, copying to and from sgl isn't something worth worrying
>> about.
>
> It is. Again, I tested that. Each new allocation in the order of
> the number of pages in a request adds 3-5% on a RAM I/O
Sure, if you do things against ram, anything will show up. Do you
have something more realistic?
> And it adds a point of failure. The less the better.
That is not the only measure. Things are always traded off using
multiple evaluation metrics. The same goes with the _NO_ allocation
thing.
>> For mapping paths, if we change gup to use sgl (we really can't make
>> it swallow bvec), I don't think it will make whole lot of difference
>> one way or the other. It might end up using slightly more cachelines,
>> but that will be about it.
>
> gup to use sgl, is out of the way, that was stupid idea to try eliminate
> one more dynamic allocation.
Yeah, I do agree gup is much better off with pages array.
>> I do agree using bvec as internal page carrier would be nicer (sans
>> having to implement stuff manually) but I'm not convinced whether
>> it'll be worth the added code.
>
> We can reach a point where there is no intermediate dynamic alloctions
> and no point of failures, but for the allocation of the final bio+biovec.
There is no need to obsess about no point of failure. Obsessing about
single evaluation metric is a nice and fast way toward a mess.
> And that "implement stuff manually" will also enable internal bio
> code cleanups since now biovec is that much smarter.
>
> <snip>
>>> And I have one more question. Are you sure kmalloc of
>>> bigger-then-page buffers are safe?
>> It has higher chance of failure but is definitely safe.
>
> Much much higher, not just 100% higher. And it will kill the
> use of block-layer in nommu systems.
Didn't know nommu would kill high order allocation or were you talking
about vmalloc?
>>> As I understood it, that tries to allocate physically contiguous
>>> pages which degrades as time passes, and last time I tried this with
>>> a kmem_cache (do to a bug) it crashed the kernel randomly after 2
>>> minutes of use.
>> Then, that's a bug.
>
> Fixing the bug will not help, the allocation will fail and the IO will
> not get through.
Under enough memory pressure, Non-fs IOs are gonna fail somewhere
along the line. If you don't like that, use mempool backed allocation
or preallocated data structures. Also, I believe it is generally
considered better to fail IO than oopsing. :-P
>> Yeah, high order allocations fail more easily as time passes. But
>> low order (1 maybe 2) allocations aren't too bad.
>
> No, we are not talking about 1 or 2. Today since Jens put the scatterlist
> chaining we can have lots and lots of them chained together. At the time
> Jens said that bio had the chaining option for a long time, only scatterlists
> limit the size of the request.
Alright, let's than chain bios but let's please do it inside bio
proper.
>> If it matters we can make bio_kmalloc() use vmalloc() for bvecs if it
>> goes over PAGE_SIZE but again given that nobody reported the
>
> No that is a regression, vmaloc is an order of a magnitude slower then
> just plain BIO chaining
It all depends on how frequent those multiple allocations will be.
> spurious
>> GFP_DMA in bounce_gfp which triggers frenzy OOM killing spree for
>> large SG_IOs, I don't think this matters all that much.
>>
>
> This BUG was only for HW, ie ata. But for stuff like iscsi FB sas
> they did fine.
The bug was for all controllers which couldn't do 64bit DMA on 64bit
machines.
> The fact of the matter is that people, me included, run in systems
> with very large request for a while now, large like 4096 pages which
> is 16 chained bios. Do you want to allocate that with vmalloc?
The patchset was written the way it's written because I couldn't see
any pressing performance requirement for the current existing users.
Optimizing for SG_IO against memory is simply not a good idea.
If you have in-kernel users which often require large transfers, sure,
chaining bio would be better. Still, let's do it _inside_ bio. Would
your use case be happy with something like blk_rq_map_kern_bvec(rq,
bvec) which can append to rq? If we convert internal implementation
over to bvec, blk_rq_map_kern_sg() can be sg->bvec converting wrapper
around blk_rq_map_kern_bvec().
> I love your cleanups, and your courage, which I don't have.
Thanks a lot but it's not like the code and Jens are scary monsters I
need to slay, so I don't think I'm being particularly courageous. :-)
> I want to help in any way I can. If you need just tell me what, and
> I'll be glad to try or test anything. This is fun I like to work and
> think on these things
Great, your reviews are comments are very helpful, so please keep them
coming. :-)
Jens, Tomo, what do you guys think?
Thanks.
--
tejun
next prev parent reply other threads:[~2009-04-02 9:00 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-04-01 13:44 [RFC PATCHSET block] block: blk-map updates and API cleanup Tejun Heo
2009-04-01 13:44 ` [PATCH 01/17] blk-map: move blk_rq_map_user() below blk_rq_map_user_iov() Tejun Heo
2009-04-01 13:44 ` [PATCH 02/17] scatterlist: improve atomic mapping handling in mapping iterator Tejun Heo
2009-04-01 13:44 ` [PATCH 03/17] blk-map: improve alignment checking for blk_rq_map_user_iov() Tejun Heo
2009-04-01 13:44 ` [PATCH 04/17] bio: bio.h cleanup Tejun Heo
2009-04-01 13:44 ` [PATCH 05/17] bio: cleanup rw usage Tejun Heo
2009-04-02 8:36 ` Boaz Harrosh
2009-04-02 9:02 ` Tejun Heo
2009-04-02 9:07 ` Boaz Harrosh
2009-04-02 9:13 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 06/17] blk-map/bio: use struct iovec instead of sg_iovec Tejun Heo
2009-04-01 14:50 ` Boaz Harrosh
2009-04-01 15:32 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 07/17] blk-map/bio: rename stuff Tejun Heo
2009-04-01 13:44 ` [PATCH 08/17] bio: reimplement bio_copy_user_iov() Tejun Heo
2009-04-01 15:50 ` Boaz Harrosh
2009-04-01 23:57 ` Tejun Heo
2009-04-02 8:24 ` Boaz Harrosh
2009-04-02 8:59 ` Tejun Heo [this message]
2009-04-02 9:54 ` Boaz Harrosh
2009-04-02 1:38 ` Tejun Heo
2009-04-02 7:34 ` Boaz Harrosh
2009-04-02 7:51 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 09/17] bio: collapse __bio_map_user_iov(), __bio_unmap_user() and __bio_map_kern() Tejun Heo
2009-04-01 13:44 ` [PATCH 10/17] bio: use bio_create_from_sgl() in bio_map_user_iov() Tejun Heo
2009-04-01 16:33 ` Boaz Harrosh
2009-04-01 22:20 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 11/17] bio: add sgl source support to bci and implement bio_memcpy_sgl_sgl() Tejun Heo
2009-04-01 13:44 ` [PATCH 12/17] bio: implement bio_{map|copy}_kern_sgl() Tejun Heo
2009-04-01 13:44 ` [PATCH 13/17] blk-map: implement blk_rq_map_kern_sgl() Tejun Heo
2009-04-01 16:50 ` Boaz Harrosh
2009-04-01 22:25 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 14/17] scsi: replace custom rq mapping with blk_rq_map_kern_sgl() Tejun Heo
2009-04-01 17:00 ` Boaz Harrosh
2009-04-01 17:05 ` James Bottomley
2009-04-01 17:17 ` Boaz Harrosh
2009-04-13 7:42 ` FUJITA Tomonori
2009-04-13 9:38 ` Tejun Heo
2009-04-13 10:07 ` FUJITA Tomonori
2009-04-13 12:59 ` Borislav Petkov
2009-04-14 0:44 ` FUJITA Tomonori
2009-04-14 10:01 ` Borislav Petkov
2009-04-14 23:44 ` FUJITA Tomonori
2009-04-15 4:25 ` Tejun Heo
2009-04-15 7:26 ` Borislav Petkov
2009-04-15 7:48 ` FUJITA Tomonori
2009-04-15 8:13 ` Borislav Petkov
2009-04-16 3:06 ` Tejun Heo
2009-04-16 3:06 ` Tejun Heo
2009-04-16 5:44 ` Borislav Petkov
2009-04-16 6:07 ` Tejun Heo
2009-04-16 6:07 ` Tejun Heo
2009-04-16 6:29 ` Borislav Petkov
2009-04-16 6:30 ` Tejun Heo
2009-04-16 6:30 ` Tejun Heo
2009-04-16 5:53 ` [PATCH 1/3] ide: add helpers for preparing sense requests Borislav Petkov
2009-04-16 5:53 ` [PATCH 2/3] ide-cd: convert to using generic sense request Borislav Petkov
2009-04-16 5:54 ` [PATCH 3/3] ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer Borislav Petkov
2009-04-01 13:44 ` [PATCH 15/17] bio/blk-map: kill unused stuff and un-export internal functions Tejun Heo
2009-04-01 13:54 ` Boaz Harrosh
2009-04-01 14:06 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 16/17] blk-map/bio: remove superflous @len parameter from blk_rq_map_user_iov() Tejun Heo
2009-04-01 17:12 ` Boaz Harrosh
2009-04-01 22:17 ` Tejun Heo
2009-04-01 13:44 ` [PATCH 17/17] blk-map/bio: remove superflous @q from blk_rq_map_{user|kern}*() Tejun Heo
2009-04-01 17:05 ` Boaz Harrosh
2009-04-01 14:08 ` [RFC PATCHSET block] block: blk-map updates and API cleanup Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49D47E7B.3030209@kernel.org \
--to=tj@kernel.org \
--cc=axboe@kernel.dk \
--cc=bharrosh@panasas.com \
--cc=fujita.tomonori@lab.ntt.co.jp \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.