From: Nigel Cunningham <nigel@tuxonice.net>
To: Hugh Dickins <hughd@google.com>
Cc: Mark Lord <kernel@teksavvy.com>,
LKML <linux-kernel@vger.kernel.org>,
pm list <linux-pm@lists.linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: 2.6.35 Regression: Ages spent discarding blocks that weren't used!
Date: Sun, 08 Aug 2010 08:47:50 +1000 [thread overview]
Message-ID: <4C5DE296.6080509@tuxonice.net> (raw)
In-Reply-To: <AANLkTinU24v6Mty54YL3gv3M5S+4jdN5vHD54KknchAG@mail.gmail.com>
Hi.
On 07/08/10 08:07, Hugh Dickins wrote:
> On Thu, Aug 5, 2010 at 9:40 PM, Nigel Cunningham<nigel@tuxonice.net> wrote:
>> On 06/08/10 11:15, Hugh Dickins wrote:
>>> On Wed, Aug 4, 2010 at 11:28 PM, Nigel Cunningham<nigel@tuxonice.net>
>>> wrote:
>>>>>> On 05/08/10 13:58, Hugh Dickins wrote:
>>>>>
>>>>> I agree it would make more sense to discard swap when freeing rather
>>>>> than when allocating, I wish we could. But at the freeing point we're
>>>>> often holding a page_table spinlock at an outer level, and it's just
>>>>> one page we're given to free. Freeing is an operation you want to be
>>>>> comfortable doing when you're short of resources, whereas discard is a
>>>>> kind of I/O operation which needs resources.
>>
>> The more I think about this, the more it seems to me that doing the discard
>> at allocation time is just wrong. How about a strategy in which you do the
>> discard immediately if resources permit, or flag it as in need of doing (at
>> a future swap free of that page or swap off time) if things are too
>> pressured at the moment? I haven't put thought into what data structures
>> could be used for that - just want to ask for now if you'd be happy with the
>> idea of looking into a strategy like that.
>
> We're going round in circles here. I have already agreed with you
> that doing it at swap free time seems more natural, but explained that
> there are implementation difficulties there, so doing it at allocation
> time proved both much less messy and more efficient. I can imagine
> advances that would sway me to putting in more effort there
> (duplicating the scan for a free 1MB every time a page of swap is
> freed? doesn't sound efficient to me, but if it saves us from an
> inefficiency of too many or too late discards, perhaps it could work
> out). However, a recently introduced regression does not make a
> strong case for adding in hacks - not yet anyway.
Okay; sorry.
>> I've done the bisect now - spent the time today instead of on the place, and
>
> That's great, many thanks!
>
>> it took me to fbbf055692aeb25c54c49d9ca84532de836fbba0. This is Christoph's
>> Hellwig's patch "block: fix DISCARD_BARRIER requests".
>
> That's
> block: fix DISCARD_BARRIER requests
>
> Filesystems assume that DISCARD_BARRIER are full barriers, so that they
> don't have to track in-progress discard operation when submitting new I/O.
> But currently we only treat them as elevator barriers, which don't
> actually do the nessecary queue drains.
>
> where the discard barrier changes from REQ_SOFTBARRIER to REQ_HARDBARRIER.
>
> If REQ_SOFTBARRIER means that the device is still free to reorder a
> write, which was issued after discard completion was reported, before
> the discard (so later discarding the data written), then certainly I
> agree with Christoph (now Cc'ed) that the REQ_HARDBARRIER is
> unavoidable there; but if not, then it's not needed for the swap case.
> I hope to gain a little more enlightenment on such barriers shortly.
>
> What does seem over the top to me, is for mm/swapfile.c's
> blkdev_issue_discard()s to be asking for both BLKDEV_IFL_WAIT and
> BLKDEV_IFL_BARRIER: those swap discards were originally written just
> to use barriers, without needing to wait for completion in there. I'd
> be interested to hear if cutting out the BLKDEV_IFL_WAITs makes the
> swap discards behave acceptably again for you - but understand that
> you won't have a chance to try that until later next week.
Well, I've just arrived at the hotel, and I want to delay going to bed
until a 'proper' hour local time, so I expect I'll do it this evening -
especially a one liner like that. Might even give it a go now...
Nigel
next prev parent reply other threads:[~2010-08-07 22:47 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-04 1:40 2.6.35 Regression: Ages spent discarding blocks that weren't used! Nigel Cunningham
2010-08-04 8:59 ` Stefan Richter
2010-08-04 9:16 ` Nigel Cunningham
2010-08-04 12:44 ` Mark Lord
2010-08-04 18:02 ` Martin K. Petersen
2010-08-04 21:22 ` Nigel Cunningham
2010-08-05 3:58 ` Hugh Dickins
2010-08-05 6:28 ` Nigel Cunningham
2010-08-06 1:15 ` Hugh Dickins
2010-08-06 4:40 ` Nigel Cunningham
2010-08-06 22:07 ` Hugh Dickins
2010-08-07 22:47 ` Nigel Cunningham [this message]
2010-08-13 11:54 ` Christoph Hellwig
2010-08-13 18:15 ` Hugh Dickins
2010-08-14 11:43 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4C5DE296.6080509@tuxonice.net \
--to=nigel@tuxonice.net \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=kernel@teksavvy.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox