linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
       [not found]               ` <1928955380.8220327.1482302041315.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-12-22  2:10                 ` Laurence Oberman
       [not found]                   ` <1337539588.8422488.1482372617094.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Laurence Oberman @ 2016-12-22  2:10 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Wednesday, December 21, 2016 1:34:01 AM
> Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Tuesday, December 20, 2016 10:31:34 PM
> > Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Tuesday, December 20, 2016 3:44:42 PM
> > > Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > Sent: Tuesday, December 20, 2016 2:43:48 PM
> > > > Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
> > > > 
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > > To: "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > > > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > > Sent: Tuesday, December 20, 2016 2:33:26 PM
> > > > > Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
> > > > > 
> > > > > Hello Bart
> > > > > 
> > > > > I pulled the latest linux-next and built kernels for both server and
> > > > > client
> > > > > to rerun all my EDR tests for srp.
> > > > > 
> > > > > For some reason the I/O size is being capped again to 1MB in my
> > > > > testing.
> > > > > Using my same testbed.
> > > > > Remember we spent a lot of time making sure we could do 4MB I/O :)
> > > > > 
> > > > > Its working fine in the RHEL 7.3 kernel so before I start going back
> > > > > testing
> > > > > upstream kernels decided to ask.
> > > > > 
> > > > > Have you tested large I/O with latest linux-next
> > > > > 
> > > > > Server Configuration
> > > > > ---------------------
> > > > > Linux fedstorage.bos.redhat.com 4.9.0+
> > > > > 
> > > > > [root@fedstorage modprobe.d]# cat ib_srp.conf
> > > > > options ib_srp cmd_sg_entries=64 indirect_sg_entries=2048
> > > > > 
> > > > > [root@fedstorage modprobe.d]# cat ib_srpt.conf
> > > > > options ib_srpt srp_max_req_size=8296
> > > > > 
> > > > > Also Using
> > > > > 
> > > > > # Set the srp_sq_size
> > > > > for i in
> > > > > /sys/kernel/config/target/srpt/0xfe800000000000007cfe900300726e4e
> > > > > /sys/kernel/config/target/srpt/0xfe800000000000007cfe900300726e4f
> > > > > do
> > > > > 	echo 16384 > $i/tpgt_1/attrib/srp_sq_size
> > > > > done
> > > > > 
> > > > > Client Configuration
> > > > > --------------------
> > > > > Linux ibclient 4.9.0+
> > > > > 
> > > > > [root@ibclient modprobe.d]# cat ib_srp.conf
> > > > > options ib_srp cmd_sg_entries=255 indirect_sg_entries=2048
> > > > > 
> > > > > dd if=/dev/sdw bs=4096k of=/dev/null iflag=direct
> > > > > 
> > > > > ### RECORD    4 >>> ibclient <<< (1482261733.001) (Tue Dec 20
> > > > > 14:22:13
> > > > > 2016)
> > > > > ###
> > > > > # DISK STATISTICS (/sec)
> > > > > #
> > > > > <---------reads---------------><---------writes--------------><--------averages-------->
> > > > > Pct
> > > > > #Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged
> > > > > IOs
> > > > > Size
> > > > > Wait  RWSize  QLen  Wait SvcTim Util
> > > > > 14:22:13 sdw         1373184      0 1341 1024     2       0      0
> > > > > 0
> > > > > 0
> > > > > 0    1024     3     2      0   97
> > > > > 
> > > > > 
> > > > > If I reboot into my 7.3 kernel its back to what I expect
> > > > > 
> > > > >  dd if=/dev/sdw bs=4096k of=/dev/null iflag=direct
> > > > > 
> > > > > 
> > > > > ### RECORD    3 >>> ibclient <<< (1482262254.001) (Tue Dec 20
> > > > > 14:30:54
> > > > > 2016)
> > > > > ###
> > > > > # DISK STATISTICS (/sec)
> > > > > #
> > > > > <---------reads---------------><---------writes--------------><--------averages-------->
> > > > > Pct
> > > > > #Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged
> > > > > IOs
> > > > > Size
> > > > > Wait  RWSize  QLen  Wait SvcTim Util
> > > > > 14:30:54 sdw         172032    129   42 4096     3       0      0
> > > > > 0
> > > > > 0
> > > > > 0    4096     1     3      3   130
> > > > > 
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > > in
> > > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > 
> > > > 
> > > > Hi Bart,
> > > > 
> > > > Just FYI
> > > > 
> > > > That dd snap was just as I had stopped the dd.
> > > > 
> > > > Here is a stable dd snap with the RHEL kernel, I just noticed the
> > > > merging,
> > > > need to reboot back into upstream to compare again.
> > > > No merging seen in upstream.
> > > > 
> > > > Thanks
> > > > Laurence
> > > > 
> > > > ### RECORD    6 >>> ibclient <<< (1482262723.001) (Tue Dec 20 14:38:43
> > > > 2016)
> > > > ###
> > > > # DISK STATISTICS (/sec)
> > > > #
> > > > <---------reads---------------><---------writes--------------><--------averages-------->
> > > > Pct
> > > > #Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs
> > > > Size
> > > > Wait  RWSize  QLen  Wait SvcTim Util
> > > > 14:38:43 sdw         1200128    879  293 4096     3       0      0    0
> > > > 0
> > > > 0    4096     1     3      3   95
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > in
> > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > 
> > > Replying to my own message to keep the thread going
> > > 
> > > 
> > > This is Linux ibclient 4.8.0-rc4
> > > 
> > > Behaves like I expected, and I see the 4MB I/O sizes. as I had already
> > > tested
> > > this.
> > > 
> > > dd if=/dev/sdw bs=4096k of=/dev/null iflag=direct
> > > 
> > > 
> > > ### RECORD    6 >>> ibclient <<< (1482266543.001) (Tue Dec 20 15:42:23
> > > 2016)
> > > ###
> > > # DISK STATISTICS (/sec)
> > > #
> > > <---------reads---------------><---------writes--------------><--------averages-------->
> > > Pct
> > > #Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs
> > > Size
> > > Wait  RWSize  QLen  Wait SvcTim Util
> > > 15:42:23 sdw         278528    201   68 4096     2       0      0    0
> > > 0
> > > 0    4096     1     2      2   206
> > > 
> > > Rebooting back into 4.9 and will bisect leading to it once I confirm
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > So this is where I got to here:
> > 
> > These all worked and I get the 4MB I/O size with direct I/O
> > 
> > v4.8-rc4
> > v4.8-rc5
> > v4.8-rc6
> > v4.8-rc7
> > v4.8-rc8
> > v4.9
> > v4.9-rc1
> > v4.9-rc2
> > v4.9-rc3
> > v4.9-rc4
> > v4.9-rc5
> > v4.9-rc6
> > v4.9-rc7
> > v4.9-rc8
> > 
> > Then git checkout master and build final test kernel
> > 4.9.0+
> > 
> > This one fails
> > 
> > # DISK STATISTICS (/sec)
> > #
> > <---------reads---------------><---------writes--------------><--------averages-------->
> > Pct
> > #Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size
> > Wait  RWSize  QLen  Wait SvcTim Util
> > 22:12:48 sdw         1413120      0 1380 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   99
> > 22:12:49 sdw         1409024      0 1376 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   98
> > 22:12:50 sdw         1445888      0 1412 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   98
> > 22:12:51 sdw         1429504      0 1396 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   98
> > 22:12:52 sdw         1426432      0 1393 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   98
> > 22:12:53 sdw         1408000      0 1375 1024     2       0      0    0
> > 0
> > 0    1024     3     2      0   98
> >                                 ***      ****
> >                                 ****
> > 
> > The last commit for 4.9-rc8 was 3e5de27
> > 
> > Between this and the master checkout HEAD there are 2313 lines of commits
> > This was the giant merge.
> > 
> > Something broke here.
> > I guess I will have to bisect the hard way, unless somebody realizes which
> > of
> > the commits broke it.
> > 
> > Commits for SRP, none of these look like candidates to break the I/O size
> > and
> > merge
> > 
> > 9032ad7 Merge branches 'misc', 'qedr', 'reject-helpers', 'rxe' and 'srp'
> > into
> > merge-test
> > 4fa354c IB/srp: Make writing the add_target sysfs attr interruptible
> > 290081b IB/srp: Make mapping failures easier to debug
> > 3787d99 IB/srp: Make login failures easier to debug
> > 042dd76 IB/srp: Introduce a local variable in srp_add_one()
> > 1a1faf7 IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
> > 0d38c24 IB/srpt: Report login failures only once
> > 
> > dm related, however I am testing against an sd, not dm device here
> > 
> > ef548c5 dm flakey: introduce "error_writes" feature
> > e99dda8f dm cache policy smq: use hash_32() instead of hash_32_generic()
> > 027c431 dm crypt: reject key strings containing whitespace chars
> > b446396 dm space map: always set ev if sm_ll_mutate() succeeds
> > 0c79ce0 dm space map metadata: skip useless memcpy in
> > metadata_ll_init_index()
> > 314c25c dm space map metadata: fix 'struct sm_metadata' leak on failed
> > create
> > 58fc4fe Documentation: dm raid: define data_offset status field
> > 11e2968 dm raid: fix discard support regression
> > affa9d2 dm raid: don't allow "write behind" with raid4/5/6
> > 54cd640 dm mpath: use hw_handler_params if attached hw_handler is same as
> > requested
> > c538f6e dm crypt: add ability to use keys from the kernel key retention
> > service
> > 0637018 dm array: remove a dead assignment in populate_ablock_with_values()
> > 6080758 dm ioctl: use offsetof() instead of open-coding it
> > b23df0d dm rq: simplify use_blk_mq initialization
> > 41c73a4 dm bufio: drop the lock when doing GFP_NOIO allocation
> > d12067f dm bufio: don't take the lock in dm_bufio_shrink_count
> > 9ea61ca dm bufio: avoid sleeping while holding the dm_bufio lock
> > 5b8c01f dm table: simplify dm_table_determine_type()
> > 301fc3f dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM
> > device
> > 6936c12 dm table: fix 'all_blk_mq' inconsistency when an empty table is
> > loaded
> > 
> > I have time next week so will try narrow it down via bisect by commit
> > 
> > Thanks
> > Laurence
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> Started with bisect at > 6000 revisions and 13 cycles between v4.9-rc8 and
> master.
> bisect good for v4.9-rc8, bisect bad for master HEAD
> 
> Down to around 700 now, but its late, going to bed now so will finish
> tomorrow
> 
> Thanks
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hi Bart, Christoph

After multiple bisects (6000 revisions, 13 cycles), I got to this one.
Of course there are a huge amount of block layer changes as we know in rc10.

[loberman@ibclient linux-next.orig]$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[542ff7bf18c63cf403e36a4a1c71d86dc120d924] block: new direct I/O implementation

This commit is the one that seems to have changed the behavior.
Max I/O size restricted 1MB, even when 4MB I/O is requested, no merging seen.

This is not going to only affect SRP targets.

I will review the code and will be happy to test any patches.
I will leave the test bed in place.


commit 542ff7bf18c63cf403e36a4a1c71d86dc120d924
Author: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Date:   Wed Nov 16 23:14:22 2016 -0700

    block: new direct I/O implementation
    
    Similar to the simple fast path, but we now need a dio structure to
    track multiple-bio completions.  It's basically a cut-down version
    of the new iomap-based direct I/O code for filesystems, but without
    all the logic to call into the filesystem for extent lookup or
    allocation, and without the complex I/O completion workqueue handler
    for AIO - instead we just use the FUA bit on the bios to ensure
    data is flushed to stable storage.
    
    Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
    Signed-off-by: Jens Axboe <axboe-b10kYP2dOMg@public.gmane.org>

diff --git a/fs/block_dev.c b/fs/block_dev.c
index a1b9abe..35cc494 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -270,11 +270,161 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
        return ret;
 }
 
+struct blkdev_dio {
+       union {
+               struct kiocb            *iocb;
+               struct task_struct      *waiter;
+       };
+       size_t                  size;
+       atomic_t                ref;
+       bool                    multi_bio : 1;
+       bool                    should_dirty : 1;
+       bool                    is_sync : 1;
+       struct bio              bio;
+};
+
+static struct bio_set *blkdev_dio_pool __read_mostly;
+
+static void blkdev_bio_end_io(struct bio *bio)
+{
+       struct blkdev_dio *dio = bio->bi_private;
+       bool should_dirty = dio->should_dirty;
+
+       if (dio->multi_bio && !atomic_dec_and_test(&dio->ref)) {
+               if (bio->bi_error && !dio->bio.bi_error)
+                       dio->bio.bi_error = bio->bi_error;
+       } else {
+               if (!dio->is_sync) {
+                       struct kiocb *iocb = dio->iocb;
+                       ssize_t ret = dio->bio.bi_error;
+
+                       if (likely(!ret)) {
+                               ret = dio->size;
+                               iocb->ki_pos += ret;
+                       }
+
+                       dio->iocb->ki_complete(iocb, ret, 0);
+                       bio_put(&dio->bio);
+               } else {
+                       struct task_struct *waiter = dio->waiter;
+
+                       WRITE_ONCE(dio->waiter, NULL);
+                       wake_up_process(waiter);
+               }
+       }
+
+       if (should_dirty) {
+               bio_check_pages_dirty(bio);
+       } else {
+               struct bio_vec *bvec;
+               int i;
+
+               bio_for_each_segment_all(bvec, bio, i)
+                       put_page(bvec->bv_page);
+               bio_put(bio);
+       }
+}
+
 static ssize_t
-blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+__blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 {
        struct file *file = iocb->ki_filp;
        struct inode *inode = bdev_file_inode(file);
+       struct block_device *bdev = I_BDEV(inode);
+       unsigned blkbits = blksize_bits(bdev_logical_block_size(bdev));
+       struct blkdev_dio *dio;
+       struct bio *bio;
+       bool is_read = (iov_iter_rw(iter) == READ);
+       loff_t pos = iocb->ki_pos;
+       blk_qc_t qc = BLK_QC_T_NONE;
+       int ret;
+
+       if ((pos | iov_iter_alignment(iter)) & ((1 << blkbits) - 1))
+               return -EINVAL;
+
+       bio = bio_alloc_bioset(GFP_KERNEL, nr_pages, blkdev_dio_pool);
+       bio_get(bio); /* extra ref for the completion handler */
+
+       dio = container_of(bio, struct blkdev_dio, bio);
+       dio->is_sync = is_sync_kiocb(iocb);
+       if (dio->is_sync)
+               dio->waiter = current;
+       else
+               dio->iocb = iocb;
+
+       dio->size = 0;
+       dio->multi_bio = false;
+       dio->should_dirty = is_read && (iter->type == ITER_IOVEC);
+
+       for (;;) {
+               bio->bi_bdev = bdev;
+               bio->bi_iter.bi_sector = pos >> blkbits;
+               bio->bi_private = dio;
+               bio->bi_end_io = blkdev_bio_end_io;
+
+               ret = bio_iov_iter_get_pages(bio, iter);
+               if (unlikely(ret)) {
+                       bio->bi_error = ret;
+                       bio_endio(bio);
+                       break;
+               }
+
+               if (is_read) {
+                       bio->bi_opf = REQ_OP_READ;
+                       if (dio->should_dirty)
+                               bio_set_pages_dirty(bio);
+               } else {
+                       bio->bi_opf = dio_bio_write_op(iocb);
+                       task_io_account_write(bio->bi_iter.bi_size);
+               }
+
+               dio->size += bio->bi_iter.bi_size;
+               pos += bio->bi_iter.bi_size;
+
+               nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES);
+               if (!nr_pages) {
+                       qc = submit_bio(bio);
+                       break;
+               }
+
+               if (!dio->multi_bio) {
+                       dio->multi_bio = true;
+                       atomic_set(&dio->ref, 2);
+               } else {
+                       atomic_inc(&dio->ref);
+               }
+
+               submit_bio(bio);
+               bio = bio_alloc(GFP_KERNEL, nr_pages);
+       }
+
+       if (!dio->is_sync)
+               return -EIOCBQUEUED;
+
+       for (;;) {
+               set_current_state(TASK_UNINTERRUPTIBLE);
+               if (!READ_ONCE(dio->waiter))
+                       break;
+
+               if (!(iocb->ki_flags & IOCB_HIPRI) ||
+                   !blk_mq_poll(bdev_get_queue(bdev), qc))
+                       io_schedule();
+       }
+       __set_current_state(TASK_RUNNING);
+
+       ret = dio->bio.bi_error;
+       if (likely(!ret)) {
+               ret = dio->size;
+               iocb->ki_pos += ret;
+       }
+
+       bio_put(&dio->bio);
+       return ret;
+}
+
+static ssize_t
+blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
        int nr_pages;
 
        nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
@@ -282,10 +432,18 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
                return 0;
        if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
                return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
-       return __blockdev_direct_IO(iocb, inode, I_BDEV(inode), iter,
-                                   blkdev_get_block, NULL, NULL,
-                                   DIO_SKIP_DIO_COUNT);
+
+       return __blkdev_direct_IO(iocb, iter, min(nr_pages, BIO_MAX_PAGES));
+}
+
+static __init int blkdev_init(void)
+{
+       blkdev_dio_pool = bioset_create(4, offsetof(struct blkdev_dio, bio));
+       if (!blkdev_dio_pool)
+               return -ENOMEM;
+       return 0;
 }
+module_init(blkdev_init);
 
 int __sync_blockdev(struct block_device *bdev, int wait)
 {

Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Testing latest linux-next 4.9 ib_srp and ib_srpt
       [not found]                   ` <1337539588.8422488.1482372617094.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-12-22  6:23                     ` Christoph Hellwig
       [not found]                       ` <20161222062321.GA30326-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2016-12-22  6:23 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Bart Van Assche, Christoph Hellwig,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

I just got CC'ed to a massive chain of full quotes.  If you want an
answer from me please write a readable mail, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Testing latest linux-next 4.9 ib_srp and ib_srpt sees I/O capped at 1MB and no merging
       [not found]                       ` <20161222062321.GA30326-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-12-22 13:17                         ` Laurence Oberman
       [not found]                           ` <1661819060.8462293.1482412678092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Laurence Oberman @ 2016-12-22 13:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA


Hello Christoph, apologies, here is a clear summary of the issue.

During testing of the latest linux-next with rc-10 block layer changes I noticed that I/O was being capped at 1MB size and no merging was seen.

The issue was not apparent on 4.8.0-rc8 or earlier.

dd if=/dev/sdw bs=4096k of=/dev/null iflag=direct

### RECORD    6 >>> ibclient <<< (1482266543.001) (Tue Dec 20 15:42:23 2016) ###
# DISK STATISTICS (/sec)
#                   <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size  Wait  RWSize  QLen  Wait SvcTim Util
15:42:23 sdw         278528    201   68 4096     2       0      0    0    0     0    4096     1     2      2   206

Then git checkout master and build final test kernel
4.9.0+

This one clearly shows the I.O at 1MB and no merging.

# DISK STATISTICS (/sec)
#                   <---------reads---------------><---------writes--------------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  Wait  KBytes Merged  IOs Size  Wait  RWSize  QLen  Wait SvcTim Util
22:12:48 sdw         1413120      0 1380 1024     2       0      0    0    0     0    1024     3     2      0   99
22:12:49 sdw         1409024      0 1376 1024     2       0      0    0    0     0    1024     3     2      0   98
22:12:50 sdw         1445888      0 1412 1024     2       0      0    0    0     0    1024     3     2      0   98
22:12:51 sdw         1429504      0 1396 1024     2       0      0    0    0     0    1024     3     2      0   98
22:12:52 sdw         1426432      0 1393 1024     2       0      0    0    0     0    1024     3     2      0   98
22:12:53 sdw         1408000      0 1375 1024     2       0      0    0    0     0    1024     3     2      0   98
                                ***      ****                                         ****

After multiple bisects (6000 revisions, 13 cycles), I got to this one.
Of course there are a huge amount of block layer changes as we know in rc10.

[loberman@ibclient linux-next.orig]$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[542ff7bf18c63cf403e36a4a1c71d86dc120d924] block: new direct I/O implementation

This commit is the one that seems to have changed the behavior.
Max I/O size restricted 1MB, even when 4MB I/O is requested, no merging seen.

This is not going to only affect SRP targets.

I will be happy to test any patches and the test bed is always in place.


commit 542ff7bf18c63cf403e36a4a1c71d86dc120d924
Author: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Date:   Wed Nov 16 23:14:22 2016 -0700

    block: new direct I/O implementation
    
    Similar to the simple fast path, but we now need a dio structure to
    track multiple-bio completions.  It's basically a cut-down version
    of the new iomap-based direct I/O code for filesystems, but without
    all the logic to call into the filesystem for extent lookup or
    allocation, and without the complex I/O completion workqueue handler
    for AIO - instead we just use the FUA bit on the bios to ensure
    data is flushed to stable storage.
    
    Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
    Signed-off-by: Jens Axboe <axboe-b10kYP2dOMg@public.gmane.org>

Many Thanks
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Testing latest linux-next 4.9 ib_srp and ib_srpt sees I/O capped at 1MB and no merging
       [not found]                           ` <1661819060.8462293.1482412678092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-12-22 15:40                             ` Christoph Hellwig
       [not found]                               ` <20161222154049.GA4638-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2016-12-22 19:54                               ` kbuild test robot
  0 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2016-12-22 15:40 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Christoph Hellwig, Bart Van Assche,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hi Laurance,

please try the patch below:

---
>From 69febe1cfb55844862f768447432249781001f9c Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Date: Thu, 22 Dec 2016 16:38:29 +0100
Subject: block: add back plugging in __blkdev_direct_IO

This allows sending larger than 1 MB requess to devices that support
large I/O sizes.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Reported-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/block_dev.c | 3 +++
 fs/iomap.c     | 1 -
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7c45072..206a92a 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -328,6 +328,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = bdev_file_inode(file);
 	struct block_device *bdev = I_BDEV(inode);
+	struct blk_plug plug;
 	struct blkdev_dio *dio;
 	struct bio *bio;
 	bool is_read = (iov_iter_rw(iter) == READ);
@@ -353,6 +354,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 	dio->multi_bio = false;
 	dio->should_dirty = is_read && (iter->type == ITER_IOVEC);
 
+	blk_start_plug(&plug);
 	for (;;) {
 		bio->bi_bdev = bdev;
 		bio->bi_iter.bi_sector = pos >> 9;
@@ -394,6 +396,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		submit_bio(bio);
 		bio = bio_alloc(GFP_KERNEL, nr_pages);
 	}
+	blk_finish_plug(&plug);
 
 	if (!dio->is_sync)
 		return -EIOCBQUEUED;
diff --git a/fs/iomap.c b/fs/iomap.c
index 354a123..3adf1e1 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -844,7 +844,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct iomap_ops *ops,
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos, end = iocb->ki_pos + count - 1, ret = 0;
 	unsigned int flags = IOMAP_DIRECT;
-	struct blk_plug plug;
 	struct iomap_dio *dio;
 
 	lockdep_assert_held(&inode->i_rwsem);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Testing latest linux-next 4.9 ib_srp and ib_srpt sees I/O capped at 1MB and no merging
       [not found]                               ` <20161222154049.GA4638-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-12-22 17:59                                 ` Laurence Oberman
  2016-12-22 19:55                                 ` block: add back plugging in __blkdev_direct_IO kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: Laurence Oberman @ 2016-12-22 17:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA


----- Original Message -----
> From: "Christoph Hellwig" <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> To: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: "Christoph Hellwig" <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>, "Bart Van Assche" <Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>,
> linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Thursday, December 22, 2016 10:40:49 AM
> Subject: Re: Testing latest linux-next 4.9 ib_srp and ib_srpt sees I/O capped at 1MB and no merging
> 
> Hi Laurance,
> 
> please try the patch below:
> 
> ---
> From 69febe1cfb55844862f768447432249781001f9c Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Date: Thu, 22 Dec 2016 16:38:29 +0100
> Subject: block: add back plugging in __blkdev_direct_IO
> 
> This allows sending larger than 1 MB requess to devices that support
> large I/O sizes.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Reported-by: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/block_dev.c | 3 +++
>  fs/iomap.c     | 1 -
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 7c45072..206a92a 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -328,6 +328,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter
> *iter, int nr_pages)
>  	struct file *file = iocb->ki_filp;
>  	struct inode *inode = bdev_file_inode(file);
>  	struct block_device *bdev = I_BDEV(inode);
> +	struct blk_plug plug;
>  	struct blkdev_dio *dio;
>  	struct bio *bio;
>  	bool is_read = (iov_iter_rw(iter) == READ);
> @@ -353,6 +354,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter
> *iter, int nr_pages)
>  	dio->multi_bio = false;
>  	dio->should_dirty = is_read && (iter->type == ITER_IOVEC);
>  
> +	blk_start_plug(&plug);
>  	for (;;) {
>  		bio->bi_bdev = bdev;
>  		bio->bi_iter.bi_sector = pos >> 9;
> @@ -394,6 +396,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter
> *iter, int nr_pages)
>  		submit_bio(bio);
>  		bio = bio_alloc(GFP_KERNEL, nr_pages);
>  	}
> +	blk_finish_plug(&plug);
>  
>  	if (!dio->is_sync)
>  		return -EIOCBQUEUED;
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 354a123..3adf1e1 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -844,7 +844,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> struct iomap_ops *ops,
>  	size_t count = iov_iter_count(iter);
>  	loff_t pos = iocb->ki_pos, end = iocb->ki_pos + count - 1, ret = 0;
>  	unsigned int flags = IOMAP_DIRECT;
> -	struct blk_plug plug;
>  	struct iomap_dio *dio;
>  
>  	lockdep_assert_held(&inode->i_rwsem);
> --
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hello Christoph

The patch works and I now see 4MB I/O

# DISK STATISTICS (/sec)
#                   <---------reads---------><---------writes---------><--------averages--------> Pct
#Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
11:53:58 sdah        143360    105   35 4096       0      0    0    0    4096     1    28     28   99
11:53:59 sdah        139264    102   34 4096       0      0    0    0    4096     1    29     29   99
11:54:00 sdah        143360    105   35 4096       0      0    0    0    4096     1    28     28   99

I think you forgot to remove calls to blk_start_plug and blk_finish_plug in fs/iomap.c in your patch.
I took them out and built the test kernel that way.

Let me know if you will just remove those in the final or want a patch.


Thanks fo rthe super quick response

Regards
Laurence
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: block: add back plugging in __blkdev_direct_IO
  2016-12-22 15:40                             ` Christoph Hellwig
       [not found]                               ` <20161222154049.GA4638-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2016-12-22 19:54                               ` kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2016-12-22 19:54 UTC (permalink / raw)
  Cc: kbuild-all, Laurence Oberman, Christoph Hellwig, Bart Van Assche,
	linux-rdma, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]

Hi Christoph,

[auto build test ERROR on linus/master]
[also build test ERROR on next-20161222]
[cannot apply to v4.9]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christoph-Hellwig/block-add-back-plugging-in-__blkdev_direct_IO/20161223-002453
config: x86_64-rhel (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   fs/iomap.c: In function 'iomap_dio_rw':
>> fs/iomap.c:897:18: error: 'plug' undeclared (first use in this function)
     blk_start_plug(&plug);
                     ^~~~
   fs/iomap.c:897:18: note: each undeclared identifier is reported only once for each function it appears in

vim +/plug +897 fs/iomap.c

ff6a9292 Christoph Hellwig 2016-11-30  891  		WARN_ON_ONCE(ret);
ff6a9292 Christoph Hellwig 2016-11-30  892  		ret = 0;
ff6a9292 Christoph Hellwig 2016-11-30  893  	}
ff6a9292 Christoph Hellwig 2016-11-30  894  
ff6a9292 Christoph Hellwig 2016-11-30  895  	inode_dio_begin(inode);
ff6a9292 Christoph Hellwig 2016-11-30  896  
ff6a9292 Christoph Hellwig 2016-11-30 @897  	blk_start_plug(&plug);
ff6a9292 Christoph Hellwig 2016-11-30  898  	do {
ff6a9292 Christoph Hellwig 2016-11-30  899  		ret = iomap_apply(inode, pos, count, flags, ops, dio,
ff6a9292 Christoph Hellwig 2016-11-30  900  				iomap_dio_actor);

:::::: The code at line 897 was first introduced by commit
:::::: ff6a9292e6f633d596826be5ba70d3ef90cc3300 iomap: implement direct I/O

:::::: TO: Christoph Hellwig <hch@lst.de>
:::::: CC: Dave Chinner <david@fromorbit.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38270 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: block: add back plugging in __blkdev_direct_IO
       [not found]                               ` <20161222154049.GA4638-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2016-12-22 17:59                                 ` Laurence Oberman
@ 2016-12-22 19:55                                 ` kbuild test robot
  1 sibling, 0 replies; 7+ messages in thread
From: kbuild test robot @ 2016-12-22 19:55 UTC (permalink / raw)
  Cc: kbuild-all-JC7UmRfGjtg, Laurence Oberman, Christoph Hellwig,
	Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

Hi Christoph,

[auto build test ERROR on linus/master]
[also build test ERROR on next-20161222]
[cannot apply to v4.9]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christoph-Hellwig/block-add-back-plugging-in-__blkdev_direct_IO/20161223-002453
config: x86_64-lkp (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   fs/iomap.c: In function 'iomap_dio_rw':
>> fs/iomap.c:897:18: error: 'plug' undeclared (first use in this function)
     blk_start_plug(&plug);
                     ^~~~
   fs/iomap.c:897:18: note: each undeclared identifier is reported only once for each function it appears in

vim +/plug +897 fs/iomap.c

ff6a9292 Christoph Hellwig 2016-11-30  891  		WARN_ON_ONCE(ret);
ff6a9292 Christoph Hellwig 2016-11-30  892  		ret = 0;
ff6a9292 Christoph Hellwig 2016-11-30  893  	}
ff6a9292 Christoph Hellwig 2016-11-30  894  
ff6a9292 Christoph Hellwig 2016-11-30  895  	inode_dio_begin(inode);
ff6a9292 Christoph Hellwig 2016-11-30  896  
ff6a9292 Christoph Hellwig 2016-11-30 @897  	blk_start_plug(&plug);
ff6a9292 Christoph Hellwig 2016-11-30  898  	do {
ff6a9292 Christoph Hellwig 2016-11-30  899  		ret = iomap_apply(inode, pos, count, flags, ops, dio,
ff6a9292 Christoph Hellwig 2016-11-30  900  				iomap_dio_actor);

:::::: The code at line 897 was first introduced by commit
:::::: ff6a9292e6f633d596826be5ba70d3ef90cc3300 iomap: implement direct I/O

:::::: TO: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
:::::: CC: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24656 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-12-22 19:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <07c07529-4636-fafb-2598-7358d8a1460d@sandisk.com>
     [not found] ` <52e8398f-a146-721c-3b92-0892b4abbff8@intel.com>
     [not found]   ` <1482166487.25336.10.camel@sandisk.com>
     [not found]     ` <1918919536.8196250.1482262406057.JavaMail.zimbra@redhat.com>
     [not found]       ` <2052479881.8196880.1482263028727.JavaMail.zimbra@redhat.com>
     [not found]         ` <1668746735.8200653.1482266682585.JavaMail.zimbra@redhat.com>
     [not found]           ` <120559766.8215321.1482291094018.JavaMail.zimbra@redhat.com>
     [not found]             ` <1928955380.8220327.1482302041315.JavaMail.zimbra@redhat.com>
     [not found]               ` <1928955380.8220327.1482302041315.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-12-22  2:10                 ` Testing latest linux-next 4.9 ib_srp and ib_srpt Laurence Oberman
     [not found]                   ` <1337539588.8422488.1482372617094.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-12-22  6:23                     ` Christoph Hellwig
     [not found]                       ` <20161222062321.GA30326-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-12-22 13:17                         ` Testing latest linux-next 4.9 ib_srp and ib_srpt sees I/O capped at 1MB and no merging Laurence Oberman
     [not found]                           ` <1661819060.8462293.1482412678092.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-12-22 15:40                             ` Christoph Hellwig
     [not found]                               ` <20161222154049.GA4638-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2016-12-22 17:59                                 ` Laurence Oberman
2016-12-22 19:55                                 ` block: add back plugging in __blkdev_direct_IO kbuild test robot
2016-12-22 19:54                               ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).