linux-ide.vger.kernel.org archive mirror
From: Robert Hancock <hancockrwd@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Szabolcs Szakacsits <szaka@ntfs-3g.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Grant Grundler <grundler@google.com>,
	Linux IDE mailing list <linux-ide@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Jens Axboe <jens.axboe@oracle.com>,
	Arjan van de Ven <arjan@infradead.org>
Subject: Re: Implementing NVMHCI...
Date: Sun, 12 Apr 2009 12:35:49 -0600	[thread overview]
Message-ID: <49E23485.4020904@gmail.com> (raw)
In-Reply-To: <alpine.LFD.2.00.0904121012500.4583@localhost.localdomain>

Linus Torvalds wrote:
> IOW, when you allocate a new 32kB cluster, you will have to allocate 8 
> pages to do IO on it (since you'll have to initialize the diskspace), but 
> you can still literally treat those pages as _individual_ pages, and you 
> can write them out in any order, and you can free them (and then look them 
> up) one at a time.
> 
> Notice? The cluster size really only ends up being a disk-space allocation 
> issue, not an issue for actually caching the end result or for the actual 
> size of the IO.

Right. I didn't realize we were actually that smart (not writing out 
the entire cluster when dirtying one page), but I guess it makes sense.
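
To make that concrete, with 4kB pages and a 32kB cluster, a write only 
dirties the pages it actually touches. A purely illustrative sketch 
(not actual mm/ code; the names here are made up):

#define PAGE_SIZE    4096u
#define CLUSTER_SIZE 32768u			/* 8 pages per cluster */

struct page_range { unsigned int first, last; };

/* Pages dirtied by writing 'len' bytes (len > 0) at byte offset 'off'
 * within one cluster.  Only these pages need writing back later; the
 * rest of the cluster stays clean and can be reclaimed page by page. */
static struct page_range pages_dirtied(unsigned int off, unsigned int len)
{
	struct page_range r = {
		.first = off / PAGE_SIZE,
		.last  = (off + len - 1) / PAGE_SIZE,
	};
	return r;
}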

> 
> The hardware sector size is very different. If you have a 32kB hardware 
> sector size, that implies that _all_ IO has to be done with that 
> granularity. Now you can no longer treat the eight pages as individual 
> pages - you _have_ to write them out and read them in as one entity. If 
> you dirty one page, you effectively dirty them all. You can not drop and 
> re-allocate pages one at a time any more.
> 
> 				Linus

I suspect that in this case, trying to gang together multiple pages 
inside the VM to actually handle it this way all the way through would 
be insanity. My guess is that the only sane way to do it is a 
read-modify-write approach when writing out the data (in the block 
layer, maybe?), where the read can be optimized away if the pages 
covering the entire hardware sector are already in cache, or if the 
write is large enough to replace the entire sector. I assume we already 
do something like this in the md code for cases like software RAID 5 
with a stripe size larger than 4KB.
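
Roughly the decision I have in mind, as a standalone sketch (not 
existing block-layer or md code; the types and helpers are made up for 
illustration):

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE	 4096u
#define HW_SECTOR_SIZE	 32768u		/* e.g. a 32kB hardware sector */
#define PAGES_PER_SECTOR (HW_SECTOR_SIZE / PAGE_SIZE)

struct cached_page {
	bool uptodate;		/* contents match (or supersede) disk */
};

struct hw_sector {
	struct cached_page pages[PAGES_PER_SECTOR];	/* in offset order */
};

static bool sector_fully_cached(const struct hw_sector *s)
{
	for (size_t i = 0; i < PAGES_PER_SECTOR; i++)
		if (!s->pages[i].uptodate)
			return false;
	return true;
}

/* 'off'/'len' describe the byte range being overwritten within this
 * sector.  Returns true if we have to read the sector first, merge the
 * dirty pages into it, and write the whole thing back (the R-M-W case). */
static bool sector_needs_read(const struct hw_sector *s,
			      size_t off, size_t len)
{
	if (off == 0 && len == HW_SECTOR_SIZE)
		return false;		/* write replaces the whole sector */
	if (sector_fully_cached(s))
		return false;		/* sector can be assembled from cache */
	return true;
}

Either way the device only ever sees full 32kB transfers; the question 
is just whether the kernel has to fetch the sector before writing it.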

That would obviously have some performance drawbacks compared to a 
smaller sector size. But if the device is determined to use bigger 
sectors internally one way or another, and the alternative is that the 
drive does read-modify-write internally to emulate smaller sectors 
(which seems to be the case for some devices), it may make more sense 
to do it in the kernel, where we have more information and can do it 
more efficiently. (Though at least on the normal ATA disk side of 
things, 4K is the biggest number I've heard tossed about for a future 
expanded sector size; flash devices like this may be another story.)

Thread overview: 45+ messages
     [not found] <20090412091228.GA29937@elte.hu>
2009-04-12 15:14 ` Implementing NVMHCI Szabolcs Szakacsits
2009-04-12 15:20   ` Alan Cox
2009-04-12 16:15     ` Avi Kivity
2009-04-12 17:11       ` Linus Torvalds
2009-04-13  6:32         ` Avi Kivity
2009-04-13 15:10           ` Linus Torvalds
2009-04-13 15:38             ` James Bottomley
2009-04-14  7:22             ` Andi Kleen
2009-04-14 10:07               ` Avi Kivity
2009-04-14  9:59             ` Avi Kivity
2009-04-14 10:23               ` Jeff Garzik
2009-04-14 10:37                 ` Avi Kivity
2009-04-14 11:45                   ` Jeff Garzik
2009-04-14 11:58                     ` Szabolcs Szakacsits
2009-04-17 22:45                       ` H. Peter Anvin
2009-04-14 12:08                     ` Avi Kivity
2009-04-14 12:21                       ` Jeff Garzik
2009-04-25  8:26                 ` Pavel Machek
2009-04-12 15:41   ` Linus Torvalds
2009-04-12 17:02     ` Robert Hancock
2009-04-12 17:20       ` Linus Torvalds
2009-04-12 18:35         ` Robert Hancock [this message]
2009-04-13 11:18         ` Avi Kivity
2009-04-12 17:23     ` James Bottomley
     [not found]     ` <6934efce0904141052j3d4f87cey9fc4b802303aa73b@mail.gmail.com>
2009-04-15  6:37       ` Artem Bityutskiy
2009-04-30 22:51         ` Jörn Engel
2009-04-30 23:36           ` Jeff Garzik
2009-04-11 17:33 Jeff Garzik
2009-04-11 19:32 ` Alan Cox
2009-04-11 19:52   ` Linus Torvalds
2009-04-11 20:21     ` Jeff Garzik
2009-04-11 21:49     ` Grant Grundler
2009-04-11 22:33       ` Linus Torvalds
2009-04-12  5:08         ` Leslie Rhorer
2009-04-11 23:25       ` Alan Cox
2009-04-11 23:51         ` Jeff Garzik
2009-04-12  0:49           ` Linus Torvalds
2009-04-12  1:59             ` Jeff Garzik
2009-04-12  1:15         ` david
2009-04-12  3:13           ` Linus Torvalds
2009-04-12 14:23         ` Mark Lord
2009-04-12 17:29           ` Jeff Garzik
2009-04-11 19:54   ` Jeff Garzik
2009-04-11 21:08     ` John Stoffel
2009-04-11 21:31       ` John Stoffel
