Re: 2.6.35 Regression: Ages spent discarding blocks that weren't used!

From: Nigel Cunningham <nigel@tuxonice.net>
To: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: linux-kernel@vger.kernel.org,
	linux-pm@lists.linux-foundation.org, linux-scsi@vger.kernel.org
Subject: Re: 2.6.35 Regression: Ages spent discarding blocks that weren't used!
Date: Wed, 04 Aug 2010 19:16:55 +1000
Message-ID: <4C593007.7040708@tuxonice.net>
In-Reply-To: <4C592BFE.7070701@s5r6.in-berlin.de>

Hi.

On 04/08/10 18:59, Stefan Richter wrote:
> (adding Cc: linux-scsi)
>
> Nigel Cunningham wrote:
>> I've just given hibernation a go under 2.6.35, and at first I thought
>> there was some sort of hang in freezing processes. The computer sat
>> there for aaaaaages, apparently doing nothing. Switched from TuxOnIce to
>> swsusp to see if it was specific to my code but no - the problem was
>> there too. I used the nifty new kdb support to get a backtrace, which was:
>>
>> get_swap_page_of_type
>> discard_swap_cluster
>> blk_dev_issue_discard
>> wait_for_completion
>>
>> Adding a printk in discard swap cluster gives the following:
>>
>> [   46.758330] Discarding 256 pages from bdev 800003 beginning at page 640377.
>> [   47.003363] Discarding 256 pages from bdev 800003 beginning at page 640633.
>> [   47.246514] Discarding 256 pages from bdev 800003 beginning at page 640889.
>>
>> ...
>>
>> [  221.877465] Discarding 256 pages from bdev 800003 beginning at page 826745.
>> [  222.121284] Discarding 256 pages from bdev 800003 beginning at page 827001.
>> [  222.365908] Discarding 256 pages from bdev 800003 beginning at page 827257.
>> [  222.610311] Discarding 256 pages from bdev 800003 beginning at page 827513.
>>
>> So allocating 4GB of swap on my SSD now takes 176 seconds instead of
>> virtually no time at all. (This code is completely unchanged from 2.6.34).
>>
>> I have a couple of questions:
>>
>> 1) As far as I can see, there haven't been any changes in mm/swapfile.c
>> that would cause this slowdown, so something in the block layer has
>> (from my point of view) regressed. Is this a known issue?
>
> Perhaps ATA TRIM is enabled for this SSD in 2.6.35 but not in 2.6.34?
> Or the discard code has been changed to issue many moderately sized ATA
> TRIMs instead of a single huge one, and the former was much more optimal
> for your particular SSD?

Mmmm. I wonder how I can tell. Would something in dmesg or hdparm -I show it? Output from both follows.

ata3.00: ATA-8: ARSSD56GBP, 1916, max UDMA/133
ata3.00: 500118192 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
ata3.00: configured for UDMA/133
scsi 2:0:0:0: Direct-Access     ATA      ARSSD56GBP       1916 PQ: 0 ANSI: 5
sd 2:0:0:0: Attached scsi generic sg1 type 0
sd 2:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4
sd 2:0:0:0: [sda] Attached SCSI disk

/dev/sda:

ATA device, with non-removable media
	Model Number:       ARSSD56GBP
	Serial Number:      DC2210200F1B40015
	Firmware Revision:  1916
Standards:
	Supported: 8 7 6 5
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors:  500118192
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	device size with M = 1024*1024:      244198 MBytes
	device size with M = 1000*1000:      256060 MBytes (256 GB)
	cache/buffer size  = unknown
	Nominal Media Rotation Rate: Solid State Device
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 1	Current = 1
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	DOWNLOAD_MICROCODE
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	   *	DMA Setup Auto-Activate optimization
	    	Device-initiated interface power management
	   *	Software settings preservation
	   *	Data Set Management determinate TRIM supported
Security:
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
Checksum: correct
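
A minimal sketch of checking this programmatically, assuming the hdparm -I text above has been captured to a string. The helper names (trim_lines, discard_max_bytes) and the sysfs_root parameter are made up for illustration. The queue/discard_max_bytes attribute is only present on kernels that expose discard limits in sysfs; a missing file or a value of 0 is read here as "the block layer will not send discards to this device", which is an assumption worth verifying against the running kernel.

```python
import os


def trim_lines(hdparm_text):
    """Return (enabled, description) for every hdparm -I line mentioning TRIM.

    hdparm marks enabled features with a leading '*' in the
    Commands/features table.
    """
    hits = []
    for line in hdparm_text.splitlines():
        if "TRIM" in line:
            stripped = line.strip()
            enabled = stripped.startswith("*")
            hits.append((enabled, stripped.lstrip("*").strip()))
    return hits


def discard_max_bytes(dev, sysfs_root="/sys"):
    """Read queue/discard_max_bytes for a block device such as 'sda'.

    Returns None when the attribute does not exist (older kernel or
    hypothetical path), otherwise the value as an int.
    """
    path = os.path.join(sysfs_root, "block", dev, "queue", "discard_max_bytes")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Calling trim_lines() on the hdparm output above should pick out the "Data Set Management determinate TRIM supported" line, and discard_max_bytes("sda") can be compared between a 2.6.34 and a 2.6.35 boot to see whether discard became enabled for the device.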


>> 2) Why are we calling discard_swap_cluster anyway? The swap was unused
>> and we're allocating it. I could understand calling it when freeing
>> swap, but when allocating?
>
> At the moment when the administrator creates swap space, the kernel can
> assume that he has no use anymore for the data that may have existed
> previously at this space.  Hence instruct the SSD's flash translation
> layer to return all these blocks to the list of unused logical blocks
> which do not have to be read and backed up whenever another logical
> block within the same erase block is written to.
>
> However, I am surprised that this is done every time (?) when preparing
> for hibernation.

It's not specific to hibernation. From a quick scan, the discard code is
called from a few places in mm/swapfile.c, in both the swap allocation
and free paths.
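
As a rough sanity check on the printk output quoted earlier, the per-discard cost can be estimated from the timestamps. This is a sketch only: the function names are hypothetical, the log lines are the ones from the quoted message, and the 4 KiB page size is an assumption (typical for x86) rather than something stated in the log.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

LOG = """\
[   46.758330] Discarding 256 pages from bdev 800003 beginning at page 640377.
[   47.003363] Discarding 256 pages from bdev 800003 beginning at page 640633.
[   47.246514] Discarding 256 pages from bdev 800003 beginning at page 640889.
[  221.877465] Discarding 256 pages from bdev 800003 beginning at page 826745.
[  222.121284] Discarding 256 pages from bdev 800003 beginning at page 827001.
[  222.365908] Discarding 256 pages from bdev 800003 beginning at page 827257.
[  222.610311] Discarding 256 pages from bdev 800003 beginning at page 827513.
"""

LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Discarding (\d+) pages from bdev (\S+) "
    r"beginning at page (\d+)"
)


def parse_discards(log):
    """Return a list of (timestamp_s, nr_pages, start_page) tuples."""
    records = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            records.append((float(m.group(1)), int(m.group(2)), int(m.group(4))))
    return records


def contiguous_deltas(records):
    """Time between back-to-back discards.

    Only pairs whose page ranges are adjacent are counted, so the
    elided middle of the log ("...") does not skew the result.
    """
    deltas = []
    for (t0, n0, p0), (t1, _, p1) in zip(records, records[1:]):
        if p1 == p0 + n0:
            deltas.append(t1 - t0)
    return deltas


def summarize(log):
    """Mean seconds per discard and the implied discard rate in MiB/s."""
    records = parse_discards(log)
    deltas = contiguous_deltas(records)
    mean = sum(deltas) / len(deltas)
    mib_per_discard = records[0][1] * PAGE_SIZE / 2**20
    return {"mean_s": mean, "mib_per_s": mib_per_discard / mean}
```

With the sample lines this works out to roughly a quarter of a second per 256-page discard, which is consistent with the 176 seconds between the first and last timestamps for the range of pages shown.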

Regards,

Nigel

