From: Minchan Kim <minchan@kernel.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: kernel-team <kernel-team@lge.com>, Jan Kara <jack@suse.cz>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
Dave Chinner <david@fromorbit.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@lst.de>, linux-mm <linux-mm@kvack.org>,
seungho1.park@lge.com, Andrew Morton <akpm@linux-foundation.org>,
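"karam . lee" <karam.lee@lge.com>,
Dan Williams <dan.j.williams@intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Vishal Verma <vishal.l.verma@intel.com>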
Subject: Re: [PATCH v1 2/6] fs: use on-stack-bio if backing device has BDI_CAP_SYNC capability
Date: Mon, 21 Aug 2017 15:13:39 +0900
Message-ID: <20170821061339.GA2544@bbox>
In-Reply-To: <1046cd1e-35f2-2663-4886-64e6e4f2093c@kernel.dk>

Hi Jens,
On Wed, Aug 16, 2017 at 09:56:12AM -0600, Jens Axboe wrote:
> On 08/15/2017 10:48 PM, Minchan Kim wrote:
> > Hi Jens,
> >
> > On Mon, Aug 14, 2017 at 10:17:09AM -0600, Jens Axboe wrote:
> >> On 08/14/2017 09:38 AM, Jens Axboe wrote:
> >>> On 08/14/2017 09:31 AM, Minchan Kim wrote:
> >>>>> Secondly, generally you don't have slow devices and fast devices
> >>>>> intermingled when running workloads. That's the rare case.
> >>>>
> >>>> Not true. zRam is a really popular swap for embedded devices, where
> >>>> low-cost products have really slow NAND compared to
> >>>> lz4/lzo [de]compression.
> >>>
> >>> I guess that's true for some cases. But as I said earlier, the recycling
> >>> really doesn't care about this at all. They can happily coexist, and not
> >>> step on each other's toes.
> >>
> >> Dusted it off, result is here against -rc5:
> >>
> >> http://git.kernel.dk/cgit/linux-block/log/?h=cpu-alloc-cache
> >>
> >> I'd like to split the amount of units we cache and the amount of units
> >> we free; right now they are both CPU_ALLOC_CACHE_SIZE. This means that
> >> once we hit that count, we free all of them, and then store the one we
> >> were asked to free. That always keeps 1 local, but maybe it'd make more
> >> sense to just free CPU_ALLOC_CACHE_SIZE/2 (or something like that)
> >> so that we retain more than 1 per cpu in case an app preempts when
> >> sleeping for IO and the new task on that CPU then issues IO as well.
> >> Probably minor.
> >>
> >> Ran a quick test on nullb0 with 32 sync readers. The test was O_DIRECT
> >> on the block device, so I disabled the __blkdev_direct_IO_simple()
> >> bypass. With the above branch, we get ~18.0M IOPS, and without we get
> >> ~14M IOPS. Both ran with iostats disabled, to avoid any interference
> >> from that.
> >
> > Looks promising.
> > If bio recycling works well enough, I think we don't need to introduce
> > a new split in the path for on-stack bio.
> > I will test your version on zram-swap!
>
> Thanks, let me know how it goes. It's quite possible that we'll need
> a few further tweaks, but at least the basis should be there.
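Jens's idea in the quoted mail, decoupling the number of units kept in the per-cpu cache from the number released when it overflows, can be illustrated with the user-space sketch below. It is not the code in the cpu-alloc-cache branch: CACHE_MAX (standing in for CPU_ALLOC_CACHE_SIZE), FREE_BATCH, struct alloc_cache and the cache_* helpers are hypothetical names, malloc()/free() replace the real bio allocator, a plain struct replaces a per-cpu variable (an in-kernel version would presumably access it with preemption disabled), and all cached objects are assumed to have the same size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical knobs: CACHE_MAX plays the role of CPU_ALLOC_CACHE_SIZE,
 * FREE_BATCH is the separate "how many to release at once" value,
 * set to half as suggested in the quoted mail. */
#define CACHE_MAX  8
#define FREE_BATCH (CACHE_MAX / 2)

/* One CPU's stash of recycled, same-sized objects (assumption). */
struct alloc_cache {
    void *objs[CACHE_MAX];
    int nr;
};

/* Reuse the most recently freed object if there is one (LIFO keeps the
 * cache-hot object first); otherwise fall back to the general allocator. */
static void *cache_alloc(struct alloc_cache *c, size_t size)
{
    if (c->nr > 0)
        return c->objs[--c->nr];
    return malloc(size);
}

/* When the cache is full, release only FREE_BATCH objects instead of all
 * of them, so CACHE_MAX - FREE_BATCH stay cached for the next task that
 * issues IO on this CPU. Then cache the object being freed. */
static void cache_free(struct alloc_cache *c, void *obj)
{
    if (c->nr == CACHE_MAX) {
        for (int i = 0; i < FREE_BATCH; i++)
            free(c->objs[--c->nr]);
    }
    assert(c->nr < CACHE_MAX);
    c->objs[c->nr++] = obj;
}

/* Release everything still cached, e.g. at teardown. */
static void cache_drain(struct alloc_cache *c)
{
    while (c->nr > 0)
        free(c->objs[--c->nr]);
}
```

With FREE_BATCH equal to CACHE_MAX, cache_free() degenerates into the current behaviour described above (free everything, keep the one being freed); the split only changes how many units survive an overflow.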
Sorry for my late reply.

I just finished swap-in testing with zram-swap, where latency is critical.

For the test, I made a memcg and put $NR_CPU (12 on my machine) processes
in it. Each process consumes 1G, so the total is 12G; my system has 16GB
of memory, so there was no global reclaim.

Then I ran "echo 1 > /mnt/memcg/group/force.empty" to swap all the pages
out. The programs then wait for my signal; I send the signal to every
process so that each one swaps in all of its pages, and I measure the
elapsed time of the swap-in.

The value is the average time (usec) each process took to swap in its
1G of pages. I repeated the run 10 times and the stddev was very stable.
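The per-process side of the swap-in measurement described above could look roughly like the sketch below. It is not the program used for these numbers: it assumes SIGUSR1 is the "start swapping in" signal and that reading one byte per page is enough to fault every swapped-out page back in, and prepare_go_signal(), wait_for_go() and touch_pages_usec() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t go;

static void on_go(int sig)
{
    (void)sig;
    go = 1;
}

/* Install the SIGUSR1 handler and block the signal, so a signal sent
 * before we start waiting stays pending instead of being lost.
 * The previous mask is returned through *old. */
static int prepare_go_signal(sigset_t *old)
{
    struct sigaction sa = { 0 };
    sigset_t block;

    sa.sa_handler = on_go;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, old);
}

/* Sleep until SIGUSR1 arrives; sigsuspend() atomically installs the old
 * mask (SIGUSR1 unblocked) while waiting, then restores the blocked mask. */
static void wait_for_go(const sigset_t *old)
{
    while (!go)
        sigsuspend(old);
}

/* Read one byte per page and return the elapsed time in microseconds.
 * A page that was swapped out takes a fault and is swapped back in. */
static long long touch_pages_usec(const volatile char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct timespec t0, t1;
    char sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < len; off += page)
        sink ^= buf[off];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    return (t1.tv_sec - t0.tv_sec) * 1000000LL +
           (t1.tv_nsec - t0.tv_nsec) / 1000;
}
```

A driver built on this would allocate and fill 1G with non-zero data, call prepare_go_signal(), wait in wait_for_go() while the group is emptied through force.empty, and then report touch_pages_usec() for the whole buffer. Blocking SIGUSR1 before waiting avoids a hang if the signal is sent before the process reaches sigsuspend().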

swapin (usec):
base (with rw_page)  1100806.73 (100.00%)
no-rw_page           1146856.95 (104.18%)
Jens's pcp           1146910.00 (104.19%)
onstack-bio          1114872.18 (101.28%)

In my test, there is no difference between dynamic bio allocation
(i.e., no-rw_page) and the pcp approach, but onstack-bio is much faster:
it is within about 1.3% of rw_page.
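The percentages in the tables in this mail are each value divided by the "base (with rw_page)" value of the same table, times 100. The small helpers below, with hypothetical names, reproduce that normalization and check that a computed ratio rounds to the two-decimal figure that was printed; they are only a way to re-derive the columns, not part of any test tooling used here.

```c
#include <assert.h>
#include <stdio.h>

/* A measurement as a percentage of the baseline row
 * ("base (with rw_page)"), as printed in the tables. */
static double rel_pct(double value, double base)
{
    assert(base > 0.0);
    return 100.0 * value / base;
}

/* Nonzero if rel_pct() lies within half of the last printed digit
 * of a two-decimal table entry. */
static int matches_printed(double value, double base, double printed)
{
    double d = rel_pct(value, base) - printed;
    return (d < 0 ? -d : d) <= 0.005;
}

/* Print one row in the same layout as the tables: label, raw value,
 * percentage of the baseline. */
static void print_row(const char *label, double value, double base)
{
    printf("%-20s %12.2f (%6.2f%%)\n", label, value, rel_pct(value, base));
}
```

For example, print_row("onstack-bio", 1114872.18, 1100806.73) reproduces the 101.28% entry of the swapin table.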

The swapout test measures the elapsed time of
"echo 1 > /mnt/memcg/test_group/force.empty", so the unit is seconds.

swapout (sec):
base (with rw_page)  7.72 (100.00%)
no-rw_page           8.36 (108.29%)
Jens's pcp           8.31 (107.64%)
onstack-bio          8.19 (106.09%)

Swapout with rw_page is faster than with any of the alternatives, which
take 6% or more longer.
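A gap like "6%" can be stated two ways that are easy to mix up: the extra time an alternative takes relative to rw_page (what the percentage column shows), or the time rw_page saves relative to the alternative. The sketch below, with hypothetical helper names, computes both for the swapout numbers; for onstack-bio (8.19 s vs 7.72 s) the first is about 6.09% and the second about 5.74%.

```c
#include <assert.h>

/* Extra time the alternative takes, as a percentage of the baseline:
 * (t_other - t_base) / t_base. This matches the table's column minus 100. */
static double slowdown_pct(double t_other, double t_base)
{
    assert(t_base > 0.0);
    return 100.0 * (t_other - t_base) / t_base;
}

/* Time the baseline saves, as a percentage of the alternative's time:
 * (t_other - t_base) / t_other. Always smaller than slowdown_pct()
 * when t_other > t_base. */
static double saving_pct(double t_other, double t_base)
{
    assert(t_other > 0.0);
    return 100.0 * (t_other - t_base) / t_other;
}
```

Reporting the first form ("the others take 6% or more longer") keeps the wording consistent with the normalized columns in the tables.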

I also tried pmbench without memcg to see the performance under global
reclaim, with a background IO job reading data from an HDD. The value is
the average time (usec) elapsed per page access, so smaller is better.

pmbench (usec per page access):
base (with rw_page)  14.42 (100.00%)
no-rw_page           15.66 (108.60%)
Jens's pcp           15.81 (109.64%)
onstack-bio          15.42 (106.93%)

The result is similar to the swapout test in memcg.
A gap of 6% or more is not trivial, so I doubt we can remove rw_page
at this moment. :(

I will look into the details with perf.
If you have further optimizations or suggestions, feel free to
tell me. I am happy to test them.

Thanks.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm