public inbox for linux-kernel@vger.kernel.org
From: Nigel Cunningham <nigel@tuxonice.net>
To: Hugh Dickins <hughd@google.com>
Cc: Mark Lord <kernel@teksavvy.com>,
	LKML <linux-kernel@vger.kernel.org>,
	pm list <linux-pm@lists.linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: 2.6.35 Regression: Ages spent discarding blocks that weren't used!
Date: Sun, 08 Aug 2010 08:47:50 +1000	[thread overview]
Message-ID: <4C5DE296.6080509@tuxonice.net> (raw)
In-Reply-To: <AANLkTinU24v6Mty54YL3gv3M5S+4jdN5vHD54KknchAG@mail.gmail.com>

Hi.

On 07/08/10 08:07, Hugh Dickins wrote:
> On Thu, Aug 5, 2010 at 9:40 PM, Nigel Cunningham <nigel@tuxonice.net> wrote:
>> On 06/08/10 11:15, Hugh Dickins wrote:
>>> On Wed, Aug 4, 2010 at 11:28 PM, Nigel Cunningham <nigel@tuxonice.net> wrote:
>>>> On 05/08/10 13:58, Hugh Dickins wrote:
>>>>>
>>>>> I agree it would make more sense to discard swap when freeing rather
>>>>> than when allocating, I wish we could.  But at the freeing point we're
>>>>> often holding a page_table spinlock at an outer level, and it's just
>>>>> one page we're given to free.  Freeing is an operation you want to be
>>>>> comfortable doing when you're short of resources, whereas discard is a
>>>>> kind of I/O operation which needs resources.
>>
>> The more I think about this, the more it seems to me that doing the discard
>> at allocation time is just wrong. How about a strategy in which you do the
>> discard immediately if resources permit, or flag it as in need of doing (at
>> a future swap free of that page or swap off time) if things are too
>> pressured at the moment? I haven't put thought into what data structures
>> could be used for that - just want to ask for now if you'd be happy with the
>> idea of looking into a strategy like that.
>
> We're going round in circles here.  I have already agreed with you
> that doing it at swap free time seems more natural, but explained that
> there are implementation difficulties there, so doing it at allocation
> time proved both much less messy and more efficient.  I can imagine
> advances that would sway me to putting in more effort there
> (duplicating the scan for a free 1MB every time a page of swap is
> freed? doesn't sound efficient to me, but if it saves us from an
> inefficiency of too many or too late discards, perhaps it could work
> out).  However, a recently introduced regression does not make a
> strong case for adding in hacks - not yet anyway.

Okay; sorry.

>> I've done the bisect now - spent the time today instead of on the plane, and
>
> That's great, many thanks!
>
>> it took me to fbbf055692aeb25c54c49d9ca84532de836fbba0. This is Christoph
>> Hellwig's patch "block: fix DISCARD_BARRIER requests".
>
> That's
>      block: fix DISCARD_BARRIER requests
>
>      Filesystems assume that DISCARD_BARRIER are full barriers, so that they
>      don't have to track in-progress discard operation when submitting new I/O.
>      But currently we only treat them as elevator barriers, which don't
>      actually do the necessary queue drains.
>
> where the discard barrier changes from REQ_SOFTBARRIER to REQ_HARDBARRIER.
>
> If REQ_SOFTBARRIER means that the device is still free to reorder a
> write, which was issued after discard completion was reported, before
> the discard (so later discarding the data written), then certainly I
> agree with Christoph (now Cc'ed) that the REQ_HARDBARRIER is
> unavoidable there; but if not, then it's not needed for the swap case.
>   I hope to gain a little more enlightenment on such barriers shortly.
>
> What does seem over the top to me, is for mm/swapfile.c's
> blkdev_issue_discard()s to be asking for both BLKDEV_IFL_WAIT and
> BLKDEV_IFL_BARRIER: those swap discards were originally written just
> to use barriers, without needing to wait for completion in there.  I'd
> be interested to hear if cutting out the BLKDEV_IFL_WAITs makes the
> swap discards behave acceptably again for you - but understand that
> you won't have a chance to try that until later next week.
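>
> [Editorial note: if the 2.6.35 call sites in mm/swapfile.c are as I remember
> them, the one-liner experiment Hugh describes would be roughly the diff below.
> Treat the exact context lines and argument list as an assumption reconstructed
> from the 2.6.35-era blkdev_issue_discard() signature, not a verified patch;
> the point is only that BLKDEV_IFL_BARRIER stays while BLKDEV_IFL_WAIT goes,
> so the discard is still ordered but the caller no longer blocks on it.]
>
> ```diff
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ discard_swap_cluster()
>  	err = blkdev_issue_discard(si->bdev, start_block, nr_blocks,
> -				   GFP_KERNEL,
> -				   BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
> +				   GFP_KERNEL, BLKDEV_IFL_BARRIER);
> ```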

Well, I've just arrived at the hotel, and I want to delay going to bed 
until a 'proper' hour local time, so I expect I'll do it this evening - 
especially a one-liner like that. Might even give it a go now...

Nigel


Thread overview: 15+ messages
2010-08-04  1:40 2.6.35 Regression: Ages spent discarding blocks that weren't used! Nigel Cunningham
2010-08-04  8:59 ` Stefan Richter
2010-08-04  9:16   ` Nigel Cunningham
2010-08-04 12:44 ` Mark Lord
2010-08-04 18:02   ` Martin K. Petersen
2010-08-04 21:22   ` Nigel Cunningham
2010-08-05  3:58     ` Hugh Dickins
2010-08-05  6:28       ` Nigel Cunningham
2010-08-06  1:15         ` Hugh Dickins
2010-08-06  4:40           ` Nigel Cunningham
2010-08-06 22:07             ` Hugh Dickins
2010-08-07 22:47               ` Nigel Cunningham [this message]
2010-08-13 11:54               ` Christoph Hellwig
2010-08-13 18:15                 ` Hugh Dickins
2010-08-14 11:43                   ` Christoph Hellwig
