From: Jaehoon Chung <jh80.chung@samsung.com>
To: Per Forlin <per.forlin@linaro.org>
Cc: linux-mmc@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, dev@lists.linaro.org,
	Chris Ball <cjb@laptop.org>,
	Kyungmin Park <kyungmin.park@samsung.com>
Subject: Re: [PATCH 0/5] mmc: add double buffering for mmc block requests
Date: Tue, 18 Jan 2011 11:35:09 +0900
Message-ID: <4D34FC5D.5080605@samsung.com>
In-Reply-To: <1294856043-13447-1-git-send-email-per.forlin@linaro.org>

Hi Per,

This is an interesting approach, so we would like to test your double
buffering in our environment (Samsung SoC).

Did you test with SDHCI?
If so, I would like to know how much the performance improved.

Thanks,
Jaehoon Chung

Per Forlin wrote:
> Add support to prepare one MMC request while another is active on
> the host. This is done by making issue_rw_rq() asynchronous.
> The increase in throughput is proportional to the time it takes to
> prepare a request and how fast the memory is. The faster the MMC/SD is,
> the more significant the prepare request time becomes. Measurements on U5500
> and U8500 on eMMC show a significant performance gain for DMA on MMC for large
> reads. In the PIO case there is some gain in performance for large reads too.
> There seems to be little or no performance gain for writes; I don't have a good
> explanation for this yet.
> 
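To make the claim above concrete, here is a minimal timing sketch in C. It is
not taken from the patch series: the function names and the sample numbers are
assumptions chosen only for illustration, and the model ignores post-processing
and scheduling overhead. It compares issuing each request strictly in sequence
with preparing request N+1 while request N is active on the host.

```c
#include <assert.h>

/* Hypothetical model: every request is prepared, then transferred, one by one. */
unsigned long serial_time_us(unsigned long n, unsigned long prep_us,
			     unsigned long xfer_us)
{
	return n * (prep_us + xfer_us);
}

/*
 * Hypothetical model: the next request is prepared while the current one is
 * being transferred. Only the first prepare and the last transfer are not
 * overlapped; every step in between costs the slower of the two stages.
 */
unsigned long overlapped_time_us(unsigned long n, unsigned long prep_us,
				 unsigned long xfer_us)
{
	unsigned long stage = prep_us > xfer_us ? prep_us : xfer_us;

	if (n == 0)
		return 0;
	return prep_us + (n - 1) * stage + xfer_us;
}

/* Time saved by overlapping, as a whole percentage of the serial time. */
unsigned long saved_percent(unsigned long n, unsigned long prep_us,
			    unsigned long xfer_us)
{
	unsigned long serial = serial_time_us(n, prep_us, xfer_us);
	unsigned long overlapped = overlapped_time_us(n, prep_us, xfer_us);

	if (serial == 0)
		return 0;
	/* Overlapping can never be slower than the serial case in this model. */
	assert(overlapped <= serial);
	return (serial - overlapped) * 100 / serial;
}
```

With 4 requests, 20 us of preparation and 100 us of transfer each, the model
gives 480 us serial versus 420 us overlapped, a saving of about 12%. Halving
the transfer time to 50 us raises the saving to about 21%, which is the sense
in which a faster card makes the prepare time more significant.
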
> There are two optional hooks, pre_req() and post_req(), that the host driver
> may implement in order to improve double buffering. In the DMA case pre_req()
> may do dma_map_sg() and prepare the DMA descriptor, and post_req() runs
> dma_unmap_sg().
> 
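The ordering of the two hooks can be sketched in user-space C as below. This is
only an illustration under stated assumptions: the real hook signatures are not
shown in this message, so every name here (sketch_req, sketch_host_ops,
sketch_run_queue and so on) is hypothetical, the "mapped" flag merely stands in
for dma_map_sg()/dma_unmap_sg(), and the transfer itself is reduced to a
"start"/"done" pair in a trace string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_LEN 256

struct sketch_req {
	int id;
	int mapped;	/* stands in for "dma_map_sg() has been done" */
};

/* Both hooks are optional; a host that lacks them leaves the pointers NULL. */
struct sketch_host_ops {
	void (*pre_req)(struct sketch_req *req, char *trace);
	void (*post_req)(struct sketch_req *req, char *trace);
};

/* Append "<what><id> " to the trace, truncating if the buffer is full. */
static void trace_add(char *trace, const char *what, int id)
{
	size_t used = strlen(trace);

	snprintf(trace + used, TRACE_LEN - used, "%s%d ", what, id);
}

void sketch_pre_req(struct sketch_req *req, char *trace)
{
	req->mapped = 1;
	trace_add(trace, "pre", req->id);
}

void sketch_post_req(struct sketch_req *req, char *trace)
{
	req->mapped = 0;
	trace_add(trace, "post", req->id);
}

/* Run n requests, preparing request i+1 while request i is still active. */
void sketch_run_queue(const struct sketch_host_ops *ops,
		      struct sketch_req *reqs, int n, char *trace)
{
	int i;

	assert(ops != NULL && trace != NULL);
	trace[0] = '\0';
	if (n <= 0)
		return;

	if (ops->pre_req)
		ops->pre_req(&reqs[0], trace);
	trace_add(trace, "start", reqs[0].id);

	for (i = 0; i < n; i++) {
		/* Request i is active: prepare its successor now. */
		if (i + 1 < n && ops->pre_req)
			ops->pre_req(&reqs[i + 1], trace);
		trace_add(trace, "done", reqs[i].id);
		/* Start the successor before cleaning up request i. */
		if (i + 1 < n)
			trace_add(trace, "start", reqs[i + 1].id);
		if (ops->post_req)
			ops->post_req(&reqs[i], trace);
	}
}
```

For two requests with both hooks set, the trace reads
"pre0 start0 pre1 done0 start1 post0 done1 post1 ", so pre1 runs while request
0 is active and post0 runs while request 1 is active. With both pointers NULL
the same loop degrades to a plain start/done sequence.
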
> The mmci host driver implementation for double buffering is not intended
> nor ready for mainline yet. It is only an example of how to implement
> pre_req() and post_req(). The reason for this is that the basic DMA support
> for MMCI is not complete yet. The mmci patches are sent in a separate patch
> series "[FYI 0/4] arm: mmci: example implementation of double buffering".
> 
> Issues/Questions for issue_rw_rq() in block.c:
> * Is it safe to claim the host for the first MMC request and wait to release
>   it until the MMC queue is empty again? Or must the host be claimed and
>   released for every request?
> * Is it possible to predict the result from __blk_end_request()?
>   If there are no errors for a completed MMC request and
>   blk_rq_bytes(req) == data.bytes_xfered, is it guaranteed that
>   __blk_end_request() will return 0?
> 
> Here follow the IOZone results for u8500 v1.1 on eMMC.
> The numbers for DMA are a bit too good here because the
> CPU speed is lower than on u8500 v2. This makes the cache handling
> even more significant.
> 
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
> 
> Relative diff: VANILLA-MMC-PIO -> 2BUF-MMC-PIO
> cpu load is abs diff
>                                                         random  random
>         KB      reclen  write   rewrite read    reread  read    write
>         51200   4       +0%     +0%     +0%     +0%     +0%     +0%
>         cpu:            +0.1    -0.1    -0.5    -0.3    -0.1    -0.0
> 
>         51200   8       +0%     +0%     +6%     +6%     +8%     +0%
>         cpu:            +0.1    -0.1    -0.3    -0.4    -0.8    +0.0
> 
>         51200   16      +0%     -2%     +0%     +0%     -3%     +0%
>         cpu:            +0.0    -0.2    +0.0    +0.0    -0.2    +0.0
> 
>         51200   32      +0%     +1%     +0%     +0%     +0%     +0%
>         cpu:            +0.1    +0.0    -0.3    +0.0    +0.0    +0.0
> 
>         51200   64      +0%     +0%     +0%     +0%     +0%     +0%
>         cpu:            +0.1    +0.0    +0.0    +0.0    +0.0    +0.0
> 
>         51200   128     +0%     +1%     +1%     +1%     +1%     +0%
>         cpu:            +0.0    +0.2    +0.1    -0.3    +0.4    +0.0
> 
>         51200   256     +0%     +0%     +1%     +1%     +1%     +0%
>         cpu:            +0.0    -0.0    +0.1    +0.1    +0.1    +0.0
> 
>         51200   512     +0%     +1%     +2%     +2%     +2%     +0%
>         cpu:            +0.1    +0.0    +0.2    +0.2    +0.2    +0.1
> 
>         51200   1024    +0%     +2%     +2%     +2%     +3%     +0%
>         cpu:            +0.2    +0.1    +0.2    +0.5    -0.8    +0.0
> 
>         51200   2048    +0%     +2%     +3%     +3%     +3%     +0%
>         cpu:            +0.0    -0.2    +0.4    +0.8    -0.5    +0.2
> 
>         51200   4096    +0%     +1%     +3%     +3%     +3%     +1%
>         cpu:            +0.2    +0.1    +0.9    +0.9    +0.5    +0.1
> 
>         51200   8192    +1%     +0%     +3%     +3%     +3%     +1%
>         cpu:            +0.2    +0.2    +1.3    +1.3    +1.0    +0.0
> 
>         51200   16384   +0%     +1%     +3%     +3%     +3%     +1%
>         cpu:            +0.2    +0.1    +1.0    +1.3    +1.0    +0.5
> 
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
>                                                         random  random
>         KB      reclen  write   rewrite read    reread  read    write
>         51200   4       +0%     -3%     +6%     +5%     +5%     +0%
>         cpu:            +0.0    -0.2    -0.6    -0.1    +0.3    +0.0
> 
>         51200   8       +0%     +0%     +7%     +7%     +7%     +0%
>         cpu:            +0.0    +0.1    +0.8    +0.6    +0.9    +0.0
> 
>         51200   16      +0%     +0%     +7%     +7%     +8%     +0%
>         cpu:            +0.0    -0.0    +0.7    +0.7    +0.8    +0.0
> 
>         51200   32      +0%     +0%     +8%     +8%     +9%     +0%
>         cpu:            +0.0    +0.1    +0.7    +0.7    +0.3    +0.0
> 
>         51200   64      +0%     +1%     +9%     +9%     +9%     +0%
>         cpu:            +0.0    +0.0    +0.8    +0.7    +0.8    +0.0
> 
>         51200   128     +1%     +0%     +13%    +13%    +14%    +0%
>         cpu:            +0.2    +0.0    +1.0    +1.0    +1.1    +0.0
> 
>         51200   256     +1%     +2%     +8%     +8%     +11%    +0%
>         cpu:            +0.0    +0.3    +0.0    +0.7    +1.5    +0.0
> 
>         51200   512     +1%     +2%     +16%    +16%    +17%    +0%
>         cpu:            +0.2    +0.2    +2.2    +2.1    +2.2    +0.1
> 
>         51200   1024    +1%     +2%     +20%    +20%    +20%    +1%
>         cpu:            +0.2    +0.1    +2.6    +1.9    +2.6    +0.0
> 
>         51200   2048    +0%     +2%     +22%    +22%    +21%    +0%
>         cpu:            +0.0    +0.3    +2.3    +2.9    +2.1    -0.0
> 
>         51200   4096    +1%     +2%     +23%    +23%    +23%    +1%
>         cpu:            +0.2    +0.1    +2.0    +3.2    +3.1    +0.0
> 
>         51200   8192    +1%     +5%     +24%    +24%    +24%    +1%
>         cpu:            +1.4    -0.0    +4.2    +3.0    +2.8    +0.1
> 
>         51200   16384   +1%     +3%     +24%    +24%    +24%    +2%
>         cpu:            +0.0    +0.3    +3.4    +3.8    +3.7    +0.1
> 
> Here follow the IOZone results for u5500 on eMMC.
> These numbers for DMA are more in line with expectations.
> 
> Command line used: ./iozone -az -i0 -i1 -i2 -s 50m -I -f /iozone.tmp -e -R -+u
> 
> Relative diff: VANILLA-MMC-DMA -> 2BUF-MMC-MMCI-DMA
> cpu load is abs diff
>                                                         random  random
>         KB      reclen  write   rewrite read    reread  read    write
>         51200   128     +1%     +1%     +10%    +9%     +10%    +0%
>         cpu:            +0.1    +0.0    +1.3    +0.1    +0.8    +0.1
> 
>         51200   256     +2%     +2%     +7%     +7%     +9%     +0%
>         cpu:            +0.1    +0.4    +0.5    +0.6    +0.7    +0.0
> 
>         51200   512     +2%     +2%     +12%    +12%    +12%    +1%
>         cpu:            +0.4    +0.6    +1.8    +2.4    +2.4    +0.2
> 
>         51200   1024    +2%     +3%     +14%    +14%    +14%    +0%
>         cpu:            +0.3    +0.1    +2.1    +1.4    +1.4    +0.2
> 
>         51200   2048    +3%     +3%     +16%    +16%    +16%    +1%
>         cpu:            +0.2    +0.2    +2.5    +1.8    +2.4    -0.2
> 
>         51200   4096    +3%     +3%     +17%    +17%    +18%    +3%
>         cpu:            +0.1    -0.1    +2.7    +2.0    +2.7    -0.1
> 
>         51200   8192    +3%     +3%     +18%    +18%    +18%    +3%
>         cpu:            -0.1    +0.2    +3.0    +2.3    +2.2    +0.2
> 
>         51200   16384   +3%     +3%     +18%    +18%    +18%    +4%
>         cpu:            +0.2    +0.2    +2.8    +3.5    +2.4    -0.0
> 
> Per Forlin (5):
>   mmc: add member in mmc queue struct to hold request data
>   mmc: Add a block request prepare function
>   mmc: Add a second mmc queue request member
>   mmc: Store the mmc block request struct in mmc queue
>   mmc: Add double buffering for mmc block requests
> 
>  drivers/mmc/card/block.c |  337 ++++++++++++++++++++++++++++++----------------
>  drivers/mmc/card/queue.c |  171 +++++++++++++++---------
>  drivers/mmc/card/queue.h |   31 +++-
>  drivers/mmc/core/core.c  |   77 +++++++++--
>  include/linux/mmc/core.h |    7 +-
>  include/linux/mmc/host.h |    8 +
>  6 files changed, 432 insertions(+), 199 deletions(-)
> 

