From: Mark Lord <liml@rtr.ca>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark Lord <lkml@rtr.ca>, Matthew Wilcox <matthew@wil.cx>,
IDE/ATA development list <linux-ide@vger.kernel.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: QUEUE_FLAG_CLUSTER: not working in 2.6.24 ?
Date: Thu, 13 Dec 2007 15:14:55 -0500
Message-ID: <476192BF.5050308@rtr.ca>
In-Reply-To: <20071213200958.GK10104@kernel.dk>
Jens Axboe wrote:
> On Thu, Dec 13 2007, Mark Lord wrote:
>> Jens Axboe wrote:
>>> On Thu, Dec 13 2007, Jens Axboe wrote:
>>>> On Thu, Dec 13 2007, Mark Lord wrote:
>>>>> Jens Axboe wrote:
>>>>>> On Thu, Dec 13 2007, Mark Lord wrote:
>>>>>>> Mark Lord wrote:
>>>>>>>> Jens Axboe wrote:
>>>>>>>>> On Thu, Dec 13 2007, Mark Lord wrote:
>>>>>>>>>> Matthew Wilcox wrote:
>>>>>>>>>>> On Thu, Dec 13, 2007 at 01:48:18PM -0500, Mark Lord wrote:
>>>>>>>>>>>> Problem confirmed. 2.6.23.8 regularly generates segments up to
>>>>>>>>>>>> 64KB for libata, but 2.6.24 uses only 4KB segments and a *few*
>>>>>>>>>>>> 8KB segments.
>>>>>>>>>>> Just a suspicion ... could this be slab vs slub? ie check your
>>>>>>>>>>> configs are the same / similar between the two kernels.
>>>>>>>>>> ..
>>>>>>>>>>
>>>>>>>>>> Mmmm.. a good thought, that one.
>>>>>>>>>> But I just rechecked, and both have CONFIG_SLAB=y
>>>>>>>>>>
>>>>>>>>>> My guess is that something got changed around when Jens
>>>>>>>>>> reworked the block layer for 2.6.24.
>>>>>>>>>> I'm going to dig around in there now.
>>>>>>>>> I didn't rework the block layer for 2.6.24 :-). The core block layer
>>>>>>>>> changes since 2.6.23 are:
>>>>>>>>>
>>>>>>>>> - Support for empty barriers. Not a likely candidate.
>>>>>>>>> - Shared tag queue fixes. Totally unlikely.
>>>>>>>>> - sg chaining support. Not likely.
>>>>>>>>> - The bio changes from Neil. Of the bunch, the most likely suspects
>>>>>>>>>   in this area, since it changes some of the code involved with
>>>>>>>>>   merges and blk_rq_map_sg().
>>>>>>>>> - Lots of simple stuff, again very unlikely.
>>>>>>>>>
>>>>>>>>> Anyway, it sounds odd for this to be a block layer problem if you do
>>>>>>>>> see occasional segments being merged. So it sounds more like the
>>>>>>>>> input data having changed.
>>>>>>>>>
>>>>>>>>> Why not just bisect it?
>>>>>>>> ..
>>>>>>>>
>>>>>>>> Because the early 2.6.24 series failed to boot on this machine
>>>>>>>> due to bugs in the block layer -- so the code that caused this
>>>>>>>> regression is probably in the stuff from before the kernels became
>>>>>>>> usable here.
>>>>>>> ..
>>>>>>>
>>>>>>> That sounds more harsh than intended --> the earlier 2.6.24 kernels
>>>>>>> (up to the first couple of -rc* ones) failed here because of
>>>>>>> incompatibilities between the block/bio changes and libata.
>>>>>>>
>>>>>>> That's better, I think!
>>>>>> No worries, I didn't pick it up as harsh, just as an odd conclusion :-)
>>>>>>
>>>>>> If I were you, I'd just start from the first -rc that booted for you. If
>>>>>> THAT has the bug, then we'll think of something else. If you don't get
>>>>>> anywhere, I can run some tests tomorrow and see if I can reproduce it
>>>>>> here.
>>>>> ..
>>>>>
>>>>> I believe that *anyone* can reproduce it, since it's broken long before
>>>>> the requests ever get to SCSI or libata. Which also means that *anyone*
>>>>> who wants to can bisect it, as well.
>>>>>
>>>>> I don't do "bisects".
>>>> It was just a suggestion on how to narrow it down, do as you see fit.
>>>>
>>>>> But I will dig a bit more and see if I can find the culprit.
>>>> Sure, I'll dig around as well.
>>> Just tried something simple. I only see one 12kb segment so far, so not
>>> a lot by any stretch. I also DON'T see any missed-merge signs, so it
>>> would appear that the pages in the request are simply not physically
>>> contiguous.
>>>
>>> diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
>>> index e30b1a4..1e34b6f 100644
>>> --- a/block/ll_rw_blk.c
>>> +++ b/block/ll_rw_blk.c
>>> @@ -1330,6 +1330,8 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
>>>  				goto new_segment;
>>>  
>>>  			sg->length += nbytes;
>>> +			if (sg->length > 8192)
>>> +				printk("sg_len=%d\n", sg->length);
>>>  		} else {
>>>  new_segment:
>>>  			if (!sg)
>>> @@ -1349,6 +1351,8 @@ new_segment:
>>>  				sg = sg_next(sg);
>>>  			}
>>>  
>>> +			if (bvprv && (page_address(bvprv->bv_page) + bvprv->bv_len == page_address(bvec->bv_page)))
>>> +				printk("missed merge\n");
>>>  			sg_set_page(sg, bvec->bv_page, nbytes,
>>>  				    bvec->bv_offset);
>>>  			nsegs++;
>>>  		}
>>>
>> ..
>>
>> Yeah, the first part is similar to my own hack.
>>
>> For testing, try "dd if=/dev/sda of=/dev/null bs=4096k".
>> That *really* should end up using contiguous pages on most systems.
>>
>> I figured out the git thing, and am now building some in-between kernels
>> to try.
>
> OK, it's a vm issue, I have tens of thousands of "backward" pages after a
> boot - IOW, bvec->bv_page is the page before bvprv->bv_page, not the
> reverse. So it looks like that bug got reintroduced.
...
Mmm.. shouldn't one of the front- or back- merge logics work for either order?
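
For reference, here is a toy model of why the page order matters for
segment sizes. It is only a sketch, not kernel code: it assumes a segment
is extended only when the next page physically follows the previous one
(the same condition the "missed merge" check in the hack above tests for).
The name count_segments() and the use of bare page frame numbers are made
up for illustration.

```c
#include <stddef.h>

/*
 * Toy model (hypothetical, not kernel code): count the scatter-gather
 * segments needed for a run of pages, given their page frame numbers
 * in the order they appear in the request.
 *
 * Assumption: a segment grows only when the next page frame number
 * directly follows the previous one.
 */
size_t count_segments(const unsigned long *pfn, size_t n)
{
	size_t nsegs = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Start a new segment unless this page follows the last one. */
		if (i == 0 || pfn[i] != pfn[i - 1] + 1)
			nsegs++;
	}
	return nsegs;
}
```

Under that assumption, sixteen pages with ascending frame numbers collapse
into one 64KB segment, while the same sixteen pages handed out in
descending order give sixteen separate 4KB segments.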