Re: mmotm 2009-06-02-16-11 uploaded (readahead)

public inbox for linux-nfs@vger.kernel.org

* Re: mmotm 2009-06-02-16-11 uploaded (readahead)
From: Wu Fengguang @ 2009-06-09  4:51 UTC
  To: Andrew Morton
  Cc: Jens Axboe, Randy Dunlap, linux-kernel@vger.kernel.org,
	hifumi.hisashi-gVGce1chcLdL9jVzuh4AOg@public.gmane.org,
	Vladislav Bolkhovitin, Bart Van Assche, Beheer InterCommIT,
	linux-nfs, scst-devel

On Tue, Jun 09, 2009 at 12:38:17PM +0800, Andrew Morton wrote:
> On Tue, 9 Jun 2009 05:59:16 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > ...
> > > Doing a block-specific call from inside page_cache_async_readahead() is
> > > a bit of a layering violation - this may not be a block-backed
> > > filesystem at all.
> > > 
> > > otoh, perhaps blk_run_backing_dev() is wrongly named and defined in the
> > > wrong place.  Perhaps non-block-backed backing_devs want to implement
> > > an unplug-style function too?  In which case the whole thing should be
> > > renamed and moved outside blkdev.h.
> > > 
> > > If we don't want to do that, shouldn't backing_dev_info.unplug* be
> > > wrapped in #ifdef CONFIG_BLOCK?  And wasn't it a layering violation to
> > > put block-specific things into the backing_dev_info?
> > > 
> > > Jens, talk to me!
> > > 
> > > From the readahead POV: does it make sense to call the backing-dev's
> > > "unplug" function even if that isn't a block-based device?  Or was this
> > > just a weird block-device-only performance problem?  Hard to say.
> > 
> > Layering-wise, I don't think it's that bad. It would have looked cleaner
> > to do:
> > 
> >         blk_run_address_space(mapping);
> > 
> > instead, but we would still need to make that available outside of
> > CONFIG_BLOCK as well.
> > 
> > What I don't like about the patch is that it's a heuristic, an "I poked
> > this and it made that faster" with nobody really understanding why.
> 
> Well.  I _think_ we understand it.  I'm not sure that we understand why
> it made scst faster though.

Because the NFS/SCST servers are running RAID?

Also, a client-side NFS/SCST IO request may be split up and served
by a pool of server processes, which introduces the same disorder
as in a RAID configuration. But I wonder whether the blk_* calls work
for them, or whether NFS/SCST have a "plug" concept at all.

> > And
> > it's second-guessing the block layer unplugging, so perhaps the real fix
> > should be going on there. Or perhaps it's just fine and this
> > micro-optimization just helps this one case and that's great.
> > 
> > So ho humm, not terribly excited about it, but I guess we can shove it
> > in there for testing. But let's please use blk_run_address_space() and
> > add an empty stub for that.
> 
> But blk_anything() shouldn't be in the readahead code - readahead isn't
> specific to block-based devices!

Yup, the "#ifdef CONFIG_BLOCK" looks ugly.

Thanks,
Fengguang

> y:/usr/src/25> egrep "blk|block" mm/readahead.c 
> #include <linux/blkdev.h>
>  * block layer to abandon the readahead if request allocation would block.
>  * force_page_cache_readahead() will ignore queue congestion and will block on
> y:/usr/src/25> 
> 
> 
> From a layering POV we should have some mapping_start_io(address_space *)
> which of course calls blk_run_address_space() if it's block-backed
> and calls <something else> if it's not block-backed.  Problem is, if
> the backing device is, say, NFS then we have no reason to believe that
> starting IO at this time is beneficial to NFS.
> 
> But sure, the world wouldn't end if we put a block-specific IO hint in
> there.  It just isn't quite right.


* Re: mmotm 2009-06-02-16-11 uploaded (readahead)
From: Vladislav Bolkhovitin @ 2009-06-09 11:01 UTC
  To: Wu Fengguang
  Cc: Andrew Morton, Jens Axboe, Randy Dunlap,
	linux-kernel@vger.kernel.org,
	hifumi.hisashi-gVGce1chcLdL9jVzuh4AOg@public.gmane.org,
	Bart Van Assche, Beheer InterCommIT, linux-nfs, scst-devel


Wu Fengguang, on 06/09/2009 08:51 AM wrote:
> Because the NFS/SCST servers are running RAID?
> 
> Also, a client-side NFS/SCST IO request may be split up and served
> by a pool of server processes, which introduces the same disorder
> as in a RAID configuration. But I wonder whether the blk_* calls work
> for them, or whether NFS/SCST have a "plug" concept at all.

Yes, I agree about the disorder. In Beheer's case there are both
RAID and IO reordering caused by IO submission by a pool of SCST IO
threads. So, your comment in the patch could well explain why the
blk_run_backing_dev() patch recovers read-ahead and, hence, improves
performance in this case.

But I also agree that it would be good to prove that theory with some
block/RA/SCST traces, because there might be other similar issues in the
RA code that could be discovered with a better understanding of the
problem. We can ask Beheer to prepare the necessary traces.

Vlad

