* Forum for asking questions related to block device drivers
@ 2013-04-10 20:53 neha naik
2013-04-11 5:15 ` Rajat Sharma
2013-04-11 7:47 ` Forum for asking questions related to block device drivers Bjørn Mork
0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-10 20:53 UTC (permalink / raw)
To: kernelnewbies
Hi All,
Nobody has replied to my query here, so I am wondering whether there is
a forum for block device drivers where I can post it.
Please tell me if there is any such forum.
Thanks,
Neha
---------- Forwarded message ----------
From: neha naik <nehanaik27@gmail.com>
Date: Tue, Apr 9, 2013 at 10:18 AM
Subject: Passthrough device driver performance is low on reads compared to
writes
To: kernelnewbies at kernelnewbies.org
Hi All,
I have written a passthrough block device driver using the 'make_request'
call. This block device driver simply passes any request that comes to it
down to LVM.
However, the read performance of my passthrough driver is around 65MB/s
(measured with dd) and the write performance is around 140MB/s, for a dd block
size of 4096.
The write performance more or less matches LVM's write performance,
but the read performance on LVM alone is around 365MB/s.
I am posting the snippets of code which I think are relevant here:
static int passthrough_make_request(
        struct request_queue * queue, struct bio * bio)
{
    passthrough_device_t * passdev = queue->queuedata;
    bio->bi_bdev = passdev->bdev_backing;
    generic_make_request(bio);
    return 0;
}
For initializing the queue I am using the following:
blk_queue_make_request(passdev->queue, passthrough_make_request);
passdev->queue->queuedata = sbd;
passdev->queue->unplug_fn = NULL;
bdev_backing = passdev->bdev_backing;
blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
    blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
}
I browsed through the dm code in the kernel to see if there is some flag or
something else which I am not using that is causing this huge performance
penalty, but I have not found anything.
If you have any ideas about what I might be doing wrong, please tell
me.
Thanks in advance.
Regards,
Neha
^ permalink raw reply [flat|nested] 13+ messages in thread
* Forum for asking questions related to block device drivers
2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
@ 2013-04-11 5:15 ` Rajat Sharma
2013-04-11 15:09 ` neha naik
2013-04-11 7:47 ` Forum for asking questions related to block device drivers Bjørn Mork
1 sibling, 1 reply; 13+ messages in thread
From: Rajat Sharma @ 2013-04-11 5:15 UTC (permalink / raw)
To: kernelnewbies

Hi,

On Thu, Apr 11, 2013 at 2:23 AM, neha naik <nehanaik27@gmail.com> wrote:
> [...]
> For initializing the queue i am using following:
>
> blk_queue_make_request(passdev->queue, passthrough_make_request);
> passdev->queue->queuedata = sbd;
> passdev->queue->unplug_fn = NULL;
> bdev_backing = passdev->bdev_backing;
> blk_queue_stack_limits(passdev->queue, bdev_get_queue(bdev_backing));
> if ((bdev_get_queue(bdev_backing))->merge_bvec_fn) {
>     blk_queue_merge_bvec(sbd->queue, sbd_merge_bvec_fn);
> }

What is the implementation of sbd_merge_bvec_fn? Please debug through
it to check whether requests are being merged. Maybe that is the cause
of the lower performance?

> [...]

-Rajat
* Forum for asking questions related to block device drivers
2013-04-11 5:15 ` Rajat Sharma
@ 2013-04-11 15:09 ` neha naik
2013-04-11 17:53 ` Rajat Sharma
0 siblings, 1 reply; 13+ messages in thread
From: neha naik @ 2013-04-11 15:09 UTC (permalink / raw)
To: kernelnewbies

Hi,
I am calling the merge function of the block device driver below me (since
mine is only a passthrough). Does this not work?
When I looked at which read requests were coming in, I saw that dd with
count=1 retrieves 4 pages, so I tried the 'direct' flag. But even with
direct I/O my read performance is way lower than my write performance.

Regards,
Neha

On Wed, Apr 10, 2013 at 11:15 PM, Rajat Sharma <fs.rajat@gmail.com> wrote:
> [...]
> What is the implementation for sbd_merge_bvec_fn? Please debug through
> it to check requests are merging or not? May be that is the cause of
> lower performance?
> [...]
* Forum for asking questions related to block device drivers
2013-04-11 15:09 ` neha naik
@ 2013-04-11 17:53 ` Rajat Sharma
2013-04-11 18:50 ` neha naik
0 siblings, 1 reply; 13+ messages in thread
From: Rajat Sharma @ 2013-04-11 17:53 UTC (permalink / raw)
To: kernelnewbies

So you mean the direct I/O read performance of your passthrough device is
lower than the direct I/O read performance of LVM?

On Thu, Apr 11, 2013 at 8:39 PM, neha naik <nehanaik27@gmail.com> wrote:
> Hi,
> I am calling the merge function of the block device driver below me(since
> mine is only pass through). Does this not work?
> When i tried seeing what read requests were coming then i saw that when i
> issue dd with count=1 it retrieves 4 pages,
> so i tried with 'direct' flag. But even with direct io my read performance
> is way lower than my write performance.
>
> Regards,
> Neha
> [...]
* Forum for asking questions related to block device drivers
2013-04-11 17:53 ` Rajat Sharma
@ 2013-04-11 18:50 ` neha naik
2013-04-11 19:49 ` Greg Freemyer
2013-04-15 7:02 ` simple question about struct pointer Ben Wu
0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-11 18:50 UTC (permalink / raw)
To: kernelnewbies

Yes. Interestingly, my direct write I/O performance is better than my direct
read I/O performance for my passthrough device, and that doesn't make any
kind of sense to me.

pdev0 = passthrough device on top of LVM

root@voffice-base:/home/neha/sbd# time dd if=/dev/pdev0 of=/dev/null bs=4096 count=1024 iflag=direct
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 4.09488 s, 1.0 MB/s

real 0m4.100s
user 0m0.028s
sys 0m0.000s

root@voffice-base:/home/neha/sbd# time dd if=/dev/shm/image of=/dev/pdev0 bs=4096 count=1024 oflag=direct
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.0852398 s, 49.2 MB/s

real 0m0.090s
user 0m0.004s
sys 0m0.012s

Thanks,
Neha

On Thu, Apr 11, 2013 at 11:53 AM, Rajat Sharma <fs.rajat@gmail.com> wrote:
> so you mean direct I/O read of your passthrough device is lower than
> direct I/O read of lvm?
> [...]
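As a quick sanity check on the two dd runs above, the reported rates and the average time per 4096-byte record can be recomputed from the byte counts and elapsed times. This is only a sketch: the function names are made up for illustration, and it assumes each dd record corresponds to one request.

```c
/* Throughput in MB/s, using 10^6 bytes per MB as dd does in its summary. */
double throughput_mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

/* Average elapsed time per dd record, in milliseconds.
 * Assumption: one record is one request to the device. */
double ms_per_record(double seconds, double records)
{
    return seconds * 1000.0 / records;
}
```

With the figures above, the direct read works out to about 4.0 ms per record (4.09488 s / 1024) and the direct write to about 0.083 ms per record (0.0852398 s / 1024).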
* Forum for asking questions related to block device drivers
2013-04-11 18:50 ` neha naik
@ 2013-04-11 19:49 ` Greg Freemyer
2013-04-11 20:48 ` neha naik
2013-04-15 7:02 ` simple question about struct pointer Ben Wu
1 sibling, 1 reply; 13+ messages in thread
From: Greg Freemyer @ 2013-04-11 19:49 UTC (permalink / raw)
To: kernelnewbies

On Thu, Apr 11, 2013 at 2:50 PM, neha naik <nehanaik27@gmail.com> wrote:
> Yes. Interestingly my direct write i/o performance is better than my direct
> read i/o performance for my passthrough device... And that doesn't make any
> kind of sense to me.
>
> pdev0 = pass through device on top of lvm
>
> root@voffice-base:/home/neha/sbd# time dd if=/dev/pdev0 of=/dev/null bs=4096
> count=1024 iflag=direct
> 1024+0 records in
> 1024+0 records out
> 4194304 bytes (4.2 MB) copied, 4.09488 s, 1.0 MB/s
>
> real 0m4.100s
> user 0m0.028s
> sys 0m0.000s
>
> root@voffice-base:/home/neha/sbd# time dd if=/dev/shm/image of=/dev/pdev0
> bs=4096 count=1024 oflag=direct
> 1024+0 records in
> 1024+0 records out
> 4194304 bytes (4.2 MB) copied, 0.0852398 s, 49.2 MB/s
>
> real 0m0.090s
> user 0m0.004s
> sys 0m0.012s
>
> Thanks,
> Neha

I assume your issue is caching somewhere.

If it is in the top levels of the kernel, dd has various options (fsync,
fdatasync, etc.) that should address that. I note you aren't using any of
them.

You mention LVM. It should pass cache flush commands down, but some
flavors of mdraid did not, the last I knew. E.g., RAID 6 used to
discard cache flush commands, IIRC. I don't know if that was ever
fixed or not.

If the cache is in hardware, then dd's cache flushing calls may or may
not get propagated all the way to the device. Some battery-backed
caches intentionally reply ACK to a cache flush command
without actually doing it.

Further, you're only writing 4MB. Not much of a test for most
devices. A SATA drive will typically have at least 32MB of cache.
One way to ensure that results are not being skewed by the various
caches up and down the storage stack is to write so much data that you
overwhelm the caches. That can be a huge amount of data on some
systems. E.g., a server with 128 GB of RAM may use tens of GB for
cache.

As you can see, testing the write path for performance can take
significant effort to ensure caches are not biasing your results.

HTH
Greg
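For readers unfamiliar with the dd flush options mentioned above: conv=fdatasync asks dd to flush the output file's data before it finishes, which at the system-call level means fdatasync(2) on the output descriptor. The following is a minimal userspace sketch of that write-then-flush pattern, not dd's actual implementation; the function name write_and_sync and its return convention are invented for illustration. Note that the flush acts on the output file only, so it does not change how the input side is read, and (as the message above points out) it cannot force a hardware cache that chooses to acknowledge flushes without performing them.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path, then call fdatasync(2) so the data is
 * pushed out of the kernel's page cache before the function returns.
 * Returns the number of bytes written, or -1 on any error.
 * (Hypothetical helper for illustration; EINTR handling is omitted.)
 */
ssize_t write_and_sync(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* write(2) may write fewer bytes than requested, so loop. */
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    /* Flush file data (not necessarily all metadata) to the device. */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    return (ssize_t)done;
}
```

Timing a call like this, rather than the write() calls alone, is what keeps the page cache from flattering a write benchmark.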
* Forum for asking questions related to block device drivers
2013-04-11 19:49 ` Greg Freemyer
@ 2013-04-11 20:48 ` neha naik
2013-04-11 22:06 ` Arlie Stephens
2013-04-11 23:02 ` Greg Freemyer
0 siblings, 2 replies; 13+ messages in thread
From: neha naik @ 2013-04-11 20:48 UTC (permalink / raw)
To: kernelnewbies

Hi Greg,
Thanks a lot. Everything you said made complete sense to me, but when I
tried running with the following options my read is still so slow (basically,
with direct I/O at 1MB/s it will take 32 minutes to read 32MB of data),
yet my write is doing fine. Should I use some other dd options? (I
understand that with direct we bypass all caches, but direct doesn't
guarantee that everything is written when the call returns to the user,
which is why I am using fdatasync.)

time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct conv=fdatasync
time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s

real 0m17.785s
user 0m0.152s
sys 0m1.564s

I interrupted the dd for the read because it was taking too much time at
1MB/s:

time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct conv=fdatasync
^C150046+0 records in
150045+0 records out
614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s

real 10m0.201s
user 0m2.576s
sys 0m0.000s

Thanks,
Neha

On Thu, Apr 11, 2013 at 1:49 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> [...]
> I assume your issue is caching somewhere.
>
> If in the top levels of the kernel, dd has various fsync, fdatasync,
> etc. options that should address that. I note you aren't using any of
> them.
> [...]
* Forum for asking questions related to block device drivers
2013-04-11 20:48 ` neha naik
@ 2013-04-11 22:06 ` Arlie Stephens
2013-04-11 23:02 ` Greg Freemyer
1 sibling, 0 replies; 13+ messages in thread
From: Arlie Stephens @ 2013-04-11 22:06 UTC (permalink / raw)
To: kernelnewbies

Hi Neha,

On Apr 11 2013, neha naik wrote:
> HI Greg,
> Thanks a lot. Everything you said made complete sense to me but when i
> tried running with following options my read is so slow (basically with
> direct io, that with 1MB/s it will just take 32minutes to read 32MB data)
> yet my write is doing fine. Should i use some other options of dd (though i
> understand that with direct we bypass all caches, but direct doesn't
> guarantee that everything is written when call returns to user for which i
> am using fdatasync).

I'm no kind of expert, but the last time I found myself timing dd, I
found that the block size was critical, and 4096 bytes is a very small
block size from dd's point of view. On FreeBSD at least, cranking it
up to at least 1MB did great things for its performance. What happens
with "bs=1M"?

> time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
> conv=fdatasync
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
> 262144+0 records out
> 1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s
>
> real 0m17.785s
> user 0m0.152s
> sys 0m1.564s
>
>
> I interrupted the dd for read because it was taking too much time with
> 1MB/s :
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> conv=fdatasync
> ^C150046+0 records in
> 150045+0 records out
> 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
>
>
> real 10m0.201s
> user 0m2.576s
> sys 0m0.000s
>
> Thanks,
> Neha
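To put numbers on the block-size point: for a fixed amount of data, bs=4096 issues 256 times as many reads as bs=1M. The sketch below counts the reads and estimates the total time under one loudly stated assumption, namely that every read costs the same fixed latency regardless of its size, which may well not hold on real hardware. The function names are illustrative, and the 4 ms figure used in the note after the code is simply what the interrupted read quoted above averages (600.197 s for 150045 records).

```c
#include <stdint.h>

/* Number of reads needed to transfer total_bytes with the given block size
 * (rounded up for a partial final block). */
uint64_t request_count(uint64_t total_bytes, uint64_t block_size)
{
    return (total_bytes + block_size - 1) / block_size;
}

/* Estimated elapsed seconds, ASSUMING each read costs latency_ms no matter
 * how large it is. This is a model for illustration, not a measurement. */
double fixed_latency_seconds(uint64_t total_bytes, uint64_t block_size,
                             double latency_ms)
{
    return (double)request_count(total_bytes, block_size) * latency_ms / 1000.0;
}
```

Under that assumption, reading 1 GiB at 4 ms per read takes 262144 reads and roughly 1049 s with bs=4096, but 1024 reads and roughly 4 s with bs=1M. Whether the real device behaves this way is exactly what trying bs=1M would show.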
* Forum for asking questions related to block device drivers
2013-04-11 20:48 ` neha naik
2013-04-11 22:06 ` Arlie Stephens
@ 2013-04-11 23:02 ` Greg Freemyer
2013-04-12 18:01 ` neha naik
1 sibling, 1 reply; 13+ messages in thread
From: Greg Freemyer @ 2013-04-11 23:02 UTC (permalink / raw)
To: kernelnewbies

On Thu, Apr 11, 2013 at 4:48 PM, neha naik <nehanaik27@gmail.com> wrote:
> [...]
> I interrupted the dd for read because it was taking too much time with 1MB/s
> :
> time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> conv=fdatasync
> ^C150046+0 records in
> 150045+0 records out
> 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
>
>
> real 10m0.201s
> user 0m2.576s
> sys 0m0.000s

Before reading the below, please note that rotating disks are made of
zones with a constant number of sectors/track. Below I treat
1 track as holding 1MB of data. I believe that is roughly accurate
for an outer track with nearly 3" of diameter. An inner track with
roughly 2" of diameter would only hold 2/3 of 1MB. I am
ignoring that for simplicity's sake. You can worry about it yourself
separately.

====
When you use iflag=direct, you are telling the kernel: I know what I'm
doing, just do it.

So let's do some math and see if we can figure it out. I assume you
are working with rotating media as your backing store for the LVM
volumes.

A rotating disk at 6000 RPM takes 10 milliseconds per revolution.
(I'm using this because the math is easy. Check the specs for your
drives.)

With iflag=direct, you have taken control of interacting with a
rotating disk that can only read a given sector once every rotation.
That is, the relevant sectors are only below the read head once every
10 msecs.

So you are saying: give me 4KB every time the data rotates below the
read head. That happens about 100 times per second, so per my logic
you should be seeing a 400KB/sec read rate.

You are actually getting roughly twice that. Thus my question is: what
is happening in your setup that you are getting 10KB per rotation
instead of the 4KB you asked for? (The answer could be that you have
15K rpm drives instead of the 6K rpm drives I calculated for.)

My laptop is giving 20MB/sec with bs=4KB, which implies I'm getting 50x
the speed I expect from the above theory. I have to assume some form
of read-ahead is going on and reading 256KB at a time. That logic may
be in my laptop's disk and not the kernel. (I don't know for sure.)

Arlie recommended 1 MB reads. That should be a lot faster because a
disk track is roughly 1 MB, so you are telling the disk: as you spin,
when you get to the sector I care about, do a continuous read for a
full rotation (1MB). By the time you ask for the next 1MB, I would
expect it will be too late to get the very next sector, so the drive
would do a full rotation looking for your sector, then do a continuous
1MB read.

So, if my logic is right, the drive itself is doing:

rotation 1: searching for first sector of read
rotation 2: read 1MB continuously
rotation 3: searching for first sector of next read
rotation 4: read 1MB continuously

I just checked my laptop's drive, and with bs=1MB it actually achieves
more or less the max transfer rate, so for it at least, with 1MB reads
the CPU / drive controller is able to keep up with the rotating disk
and not have the 50% wasted rotations that I would actually expect.

Again it appears something is doing some read-ahead. Let's assume my
laptop's disk does a 256KB read-ahead every time it gets a read
request. So when it gets that 1MB request, it actually reads
1MB+256KB, but it returns the first 1MB to the CPU as soon as it has
it. Thus when the 1MB is returned to the CPU, the drive is still
working on the next 256KB and putting it in the on-disk cache. If 256KB
is 1/4 of a track's data, then it takes the disk about 2.5 msecs to
read that data from the rotating platter into the drive's internal
controller cache. If during those 2.5 msecs the CPU issues the next 1MB
read request, the disk will just continue reading and not have any
dead time.

If you want to understand exactly what is happening, you would need to
monitor exactly what is going back and forth across the SATA bus. Is
the kernel doing a read-ahead even with direct I/O? Is the drive doing
some kind of read-ahead? Etc.

If you are going to work with direct I/O, hopefully the above gives you
a new way to think about things.

Greg
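The back-of-the-envelope model in the message above can be written down as a few lines of arithmetic. This is a sketch of that simplified model only (one request served per revolution, roughly 1 MB per track), not a description of how any particular drive behaves; the function names are made up, and 4KB is taken as 4096 bytes.

```c
/* Revolutions per second for a disk spinning at rpm. */
double revs_per_second(double rpm)
{
    return rpm / 60.0;
}

/* Bytes per second if exactly one request of request_bytes completes per
 * revolution, i.e. no read-ahead and no request queuing (model assumption). */
double one_request_per_rev_rate(double rpm, double request_bytes)
{
    return revs_per_second(rpm) * request_bytes;
}

/* Milliseconds needed to pass over the given fraction of one track. */
double ms_for_track_fraction(double rpm, double fraction)
{
    return fraction * 1000.0 / revs_per_second(rpm);
}
```

With these, 6000 RPM and 4096-byte requests give 409600 bytes/s (the "400KB/sec" above), 15000 RPM gives 1024000 bytes/s (about the 1.0 MB/s dd reported), and a quarter of a track at 6000 RPM takes 2.5 ms.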
* Forum for asking questions related to block device drivers
2013-04-11 23:02 ` Greg Freemyer
@ 2013-04-12 18:01 ` neha naik
0 siblings, 0 replies; 13+ messages in thread
From: neha naik @ 2013-04-12 18:01 UTC (permalink / raw)
To: kernelnewbies

Hi Greg,
I am using an SSD underneath. However, my problem is not exactly related to
the disk cache. I think I should give some more background.
These are my key points:
1. Reads on my passthrough driver on top of LVM are slower than reads on
   just LVM (with or without any kind of direct I/O).
2. Reads on my passthrough driver (on top of LVM) are slower than writes on
   my passthrough driver (on top of LVM).
3. If I disable LVM readahead (we can do that for all block device
   drivers), then its read performance becomes almost equal to the read
   performance of my passthrough driver.
   This suggested that LVM readahead was helping LVM's performance. But if
   it helps LVM's performance, then it should also help the performance of
   my passthrough driver (which is sitting on top of it). This led me to
   think that either I am doing something in my device driver which
   disables the LVM readahead, or LVM readahead gets switched off when LVM
   is not interacting with the kernel directly.

Given this, I think there may be some issue with how I have written my
device driver (or rather, how I have used the API).
I am using the 'merge_bvec_fn' function of the LVM device underneath it,
which I thought would have merged the I/Os (since we are doing sequential
I/O). But that is clearly not the case. When I print the pages that come to
my driver, I see that each time 'make_request' gets called with one page.
Shouldn't it be merging the I/O using the LVM merge function, or does it
not work like that? That is, should each driver write its own
'merge_bvec_fn' and not rely on the driver beneath it to take care of that?
Or is there some problem with how I pass the request to LVM (should I be
calling something else or passing some kind of flag)?
Regards,
Neha

On Thu, Apr 11, 2013 at 5:02 PM, Greg Freemyer <greg.freemyer@gmail.com> wrote:
> On Thu, Apr 11, 2013 at 4:48 PM, neha naik <nehanaik27@gmail.com> wrote:
> > Hi Greg,
> > Thanks a lot. Everything you said made complete sense to me but when i
> > tried running with following options my read is so slow (basically with
> > direct io, that with 1MB/s it will just take 32minutes to read 32MB data)
> > yet my write is doing fine. Should i use some other options of dd (though i
> > understand that with direct we bypass all caches, but direct doesn't
> > guarantee that everything is written when call returns to user for which i
> > am using fdatasync).
> >
> > time dd if=/dev/shm/image of=/dev/sbd0 bs=4096 count=262144 oflag=direct
> > conv=fdatasync
> > time dd if=/dev/pdev0 of=/dev/null bs=4096 count=2621262144+0 records in
> > 262144+0 records out
> > 1073741824 bytes (1.1 GB) copied, 17.7809 s, 60.4 MB/s
> >
> > real 0m17.785s
> > user 0m0.152s
> > sys 0m1.564s
> >
> >
> > I interrupted the dd for read because it was taking too much time with 1MB/s:
> > time dd if=/dev/pdev0 of=/dev/null bs=4096 count=262144 iflag=direct
> > conv=fdatasync
> > ^C150046+0 records in
> > 150045+0 records out
> > 614584320 bytes (615 MB) copied, 600.197 s, 1.0 MB/s
> >
> >
> > real 10m0.201s
> > user 0m2.576s
> > sys 0m0.000s
>
> Before reading the below, please note that rotating disks are made of
> zones with a constant number of sectors/track. In the below I discuss
> 1 track as holding 1MB of data. I believe that is roughly accurate
> for an outer track with near 3" of diameter. An inner track with
> roughly 2" of diameter would only have 2/3rds of 1MB of data. I am
> ignoring that for simplicity's sake. You can worry about it yourself
> separately.
>
> ====
> When you use iflag=direct, you are telling the kernel, I know what I'm
> doing, just do it.
>
> So let's do some math and see if we can figure it out. I assume you
> are working with rotating media as your backing store for the LVM
> volumes.
>
> A rotating disk with 6000 RPMs takes 10 milliseconds per revolution.
> (I'm using this because the math is easy. Check the specs for your
> drives.)
>
> With iflag=direct, you have taken control of interacting with a
> rotating disk that can only read data once every rotation. That is,
> relevant sectors are only below the read head once every 10 msecs.
>
> So, you are saying, give me 4KB every time the data rotates below the
> read head. That happens about 100 times per second, so per my logic
> you should be seeing 400KB/sec read rate.
>
> You are actually getting roughly twice that. Thus my question is what
> is happening in your setup that you are getting 10KB per rotation
> instead of the 4KB you asked for. (The answer could be that you have
> 15K rpm drives, instead of the 6K rpm drives I calculated for.)
>
> My laptop is giving 20MB/sec with bs=4KB which implies I'm getting 50x
> the speed I expect from the above theory. I have to assume some form
> of read-ahead is going on and reading 256KB at a time. That logic may
> be in my laptop's disk and not the kernel. (I don't know for sure.)
>
> Arlie recommended 1 MB reads. That should be a lot faster because a
> disk track is roughly 1 MB, so you are telling the disk: As you spin,
> when you get to the sector I care about, do a continuous read for a
> full rotation (1MB). By the time you ask for the next 1MB, I would
> expect it will be too late to get the very next sector, so the drive
> would do a full rotation looking for your sector, then do a continuous
> 1MB read.
>
> So, if my logic is right the drive itself is doing:
>
> rotation 1: searching for first sector of read
> rotation 2: read 1MB continuously
> rotation 3: searching for first sector of next read
> rotation 4: read 1MB continuously
>
> I just checked my laptop's drive, and with bs=1MB it actually achieves
> more or less max transfer rate, so for it at least with 1MB reads the
> cpu / drive controller is able to keep up with the rotating disk and
> not have the 50% wasted rotations that I would actually expect.
>
> Again it appears something is doing some read ahead. Let's assume my
> laptop's disk does a 256KB readahead every time it gets a read
> request. So when it gets that 1MB request, it actually reads
> 1MB+256KB, but it returns the first 1MB to the cpu as soon as it has
> it. Thus when the 1MB is returned to the cpu, the drive is still
> working on the next 256KB and putting it in on-disk cache. If 256KB
> is 1/4 of a track's data, then it takes the disk about 2.5 msecs to
> read that data from the rotating platter to the drive's internal
> controller cache. If during that 2.5 msecs the cpu issues the next 1MB
> read request, the disk will just continue reading and not have any dead
> time.
>
> If you want to understand exactly what is happening you would need to
> monitor exactly what is going back and forth across the sata bus. Is
> the kernel doing a read-ahead even with direct io? Is the drive doing
> some kind of read ahead? etc.
>
> If you are going to work with direct io, hopefully the above gives you
> a new way to think about things.
> Greg
* simple question about struct pointer
2013-04-11 18:50 ` neha naik
2013-04-11 19:49 ` Greg Freemyer
@ 2013-04-15 7:02 ` Ben Wu
2013-04-15 9:48 ` arshad hussain
1 sibling, 1 reply; 13+ messages in thread
From: Ben Wu @ 2013-04-15 7:02 UTC (permalink / raw)
To: kernelnewbies

Dear All,
I'm new to Linux kernel programming and find struct pointers difficult to
understand. What does the statement
struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai);
mean? Why is one struct pointer assigned to another struct pointer?

static int s3c2412_i2s_hw_params(struct snd_pcm_substream *substream,
                                 struct snd_pcm_hw_params *params,
                                 struct snd_soc_dai *cpu_dai)
{
        struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai);
        struct s3c_dma_params *dma_data;
        ....................................................................
}
* simple question about struct pointer
2013-04-15 7:02 ` simple question about struct pointer Ben Wu
@ 2013-04-15 9:48 ` arshad hussain
0 siblings, 0 replies; 13+ messages in thread
From: arshad hussain @ 2013-04-15 9:48 UTC (permalink / raw)
To: kernelnewbies

On Mon, Apr 15, 2013 at 12:32 PM, Ben Wu <crayben@yahoo.cn> wrote:
>
> Dear All, I'm new to Linux kernel programming and find struct pointers
> difficult to understand. What does the statement
> struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai)
> mean? Why is one struct pointer assigned to another struct pointer?
>
>
> static int s3c2412_i2s_hw_params(struct snd_pcm_substream *substream,
>                                  struct snd_pcm_hw_params *params,
>                                  struct snd_soc_dai *cpu_dai)
> {
>         struct s3c_i2sv2_info *i2s = snd_soc_dai_get_drvdata(cpu_dai);
>         struct s3c_dma_params *dma_data;
>         ....................................................................
> }
>

Have a look at the "cdecl" utility. It will help you greatly.
However, the statement above means that snd_soc... is a function which
takes an argument which is a pointer and returns a pointer to struct
s3c_..
Here the pointer i2s is declared and initialized in the same statement.
If it were left uninitialized, it would do nasty things when you tried to
dereference it.

Thanks,
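The pattern described in this answer can be sketched in plain user-space C. This is a minimal illustration, not the ALSA code: struct dai, struct i2s_info, dai_set_drvdata(), dai_get_drvdata() and hw_params_rate() are hypothetical stand-ins invented for the example. It assumes the accessor hands back an opaque void * (which, as far as I know, is also what the kernel's snd_soc_dai_get_drvdata() does), so the assignment to a typed struct pointer needs no cast in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver-private state, standing in for struct s3c_i2sv2_info. */
struct i2s_info {
    unsigned int rate;
};

/* Hypothetical device object, standing in for struct snd_soc_dai. */
struct dai {
    void *drvdata; /* opaque pointer owned by the driver */
};

/* Store the driver's private data pointer in the device object. */
void dai_set_drvdata(struct dai *d, void *data)
{
    assert(d != NULL);
    d->drvdata = data;
}

/* Return the stored pointer; the caller knows its real type. */
void *dai_get_drvdata(const struct dai *d)
{
    assert(d != NULL);
    return d->drvdata;
}

/* Mirrors the shape of the hw_params callback in the question. */
unsigned int hw_params_rate(const struct dai *cpu_dai)
{
    /*
     * Declare the pointer and initialize it in one statement.
     * No struct is copied: i2s simply points at the object that was
     * registered earlier. void * converts implicitly in C.
     */
    struct i2s_info *i2s = dai_get_drvdata(cpu_dai);

    /* Guard against data that was never set before dereferencing. */
    return i2s ? i2s->rate : 0;
}
```

A caller would first register its state with dai_set_drvdata(&d, &info) and later recover the same address through dai_get_drvdata(&d), so both pointers refer to one object rather than to two copies.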
* Forum for asking questions related to block device drivers
2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
2013-04-11 5:15 ` Rajat Sharma
@ 2013-04-11 7:47 ` Bjørn Mork
1 sibling, 0 replies; 13+ messages in thread
From: Bjørn Mork @ 2013-04-11 7:47 UTC (permalink / raw)
To: kernelnewbies

neha naik <nehanaik27@gmail.com> writes:
> Nobody has replied to my query here. So i am just wondering if there is
> a forum for block device driver where i can post my query.
> Please tell me if there is any such forum.

The "get_maintainer" script will tell you such things. Try running for
example

scripts/get_maintainer.pl -f drivers/block/

from the top level kernel source directory.

(The answer seems to be NO. The only list pointed to by the script is
linux-kernel at vger.kernel.org.)

Bjørn
end of thread, other threads:[~2013-04-15 9:48 UTC | newest]
Thread overview: 13+ messages
2013-04-10 20:53 Forum for asking questions related to block device drivers neha naik
2013-04-11 5:15 ` Rajat Sharma
2013-04-11 15:09 ` neha naik
2013-04-11 17:53 ` Rajat Sharma
2013-04-11 18:50 ` neha naik
2013-04-11 19:49 ` Greg Freemyer
2013-04-11 20:48 ` neha naik
2013-04-11 22:06 ` Arlie Stephens
2013-04-11 23:02 ` Greg Freemyer
2013-04-12 18:01 ` neha naik
2013-04-15 7:02 ` simple question about struct pointer Ben Wu
2013-04-15 9:48 ` arshad hussain
2013-04-11 7:47 ` Forum for asking questions related to block device drivers Bjørn Mork