Re: [REVIEW] NVM Express driver


From: Andi Kleen <andi@firstfloor.org>
To: Matthew Wilcox <willy@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [REVIEW] NVM Express driver
Date: Sun, 13 Mar 2011 18:14:19 +0100
Message-ID: <20110313171419.GL2499@one.firstfloor.org>
In-Reply-To: <20110312055146.GA4183@linux.intel.com>

On Sat, Mar 12, 2011 at 12:51:46AM -0500, Matthew Wilcox wrote:
> Is there a good API to iterate through each socket, then each core in a
> socket, then each HT sibling?  eg, if I have 20 queues and 2x6x2 CPUs,

Not for this particular order. You also have to handle
CPU hotplug in any case.

And whatever you do, don't add NR_CPUS arrays.

> I want to assign at least one queue to each core; some threads will get
> their own queues and others will have to share with their HT sibling.

Please write a generic library function for this if you do this.
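
A minimal user-space sketch of what such a helper could compute, for
illustration only. Nothing here is an existing kernel API: struct topo,
assign_queues() and the socket/core/sibling CPU numbering are assumptions
made for the example. The caller supplies the array, so there is no
NR_CPUS-sized table, and hotplug is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology description; not a kernel structure. */
struct topo {
	unsigned int sockets;
	unsigned int cores_per_socket;
	unsigned int threads_per_core;
};

/*
 * Fill queue_of[cpu] for every logical CPU, with CPUs numbered
 * socket-major, then core, then HT sibling (an assumption of this sketch).
 * Every core gets at least one queue; leftover queues give some cores one
 * queue per sibling. Returns 0 on success, -1 on bad input or when there
 * are fewer queues than cores.
 */
int assign_queues(const struct topo *t, unsigned int nr_queues,
		  unsigned int *queue_of, size_t nr_cpus)
{
	unsigned int ncores = t->sockets * t->cores_per_socket;
	unsigned int base, extra, next = 0, cpu = 0;

	if (ncores == 0 || t->threads_per_core == 0)
		return -1;
	if (nr_cpus != (size_t)ncores * t->threads_per_core)
		return -1;
	if (nr_queues < ncores)
		return -1;

	base = nr_queues / ncores;	/* queues every core receives */
	extra = nr_queues % ncores;	/* cores that receive one more */

	for (unsigned int core = 0; core < ncores; core++) {
		unsigned int nq = base + (core < extra ? 1 : 0);

		/* More queues than siblings on a core would go unused. */
		if (nq > t->threads_per_core)
			nq = t->threads_per_core;

		/* Siblings share the core's queues round-robin. */
		for (unsigned int s = 0; s < t->threads_per_core; s++)
			queue_of[cpu++] = next + s % nq;
		next += nq;
	}
	return 0;
}
```

With 20 queues on 2x6x2 CPUs this gives the first 8 cores a queue per
thread and leaves the siblings of the last 4 cores sharing one queue each.
The extra queues go to the lowest-numbered cores, so they are not balanced
across sockets; a real helper would probably want to spread them.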

> 
> > > +	nprps = DIV_ROUND_UP(length, PAGE_SIZE);
> > > +	npages = DIV_ROUND_UP(8 * nprps, PAGE_SIZE);
> > > +	prps = kmalloc(sizeof(*prps) + sizeof(__le64 *) * npages, GFP_ATOMIC);
> > > +	prp_page = 0;
> > > +	if (nprps <= (256 / 8)) {
> > > +		pool = dev->prp_small_pool;
> > > +		prps->npages = 0;
> > 
> > 
> > Unchecked GFP_ATOMIC allocation? That will oops soon.
> > Besides, GFP_ATOMIC is a very risky thing to do in a low-memory situation,
> > which can trigger writeouts.
> 
> Ah yes, thank you.  There are a few other places like this.  Bizarrely,
> they've not oopsed during the xfstests runs.

You need suitable background load. If you run it in LTP the harness has
support for background load. For GFP_ATOMIC exhaustion you typically
need something interrupt intensive, like a lot of networking.
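
For reference, a user-space model of the quoted size computation with the
missing NULL check added. This is a sketch, not the driver code: malloc()
stands in for kmalloc(..., GFP_ATOMIC), PAGE_SIZE is assumed to be 4096,
and struct prps_model, prp_list_pages() and alloc_prps_model() are
hypothetical names. Unlike the quoted patch, it simply records the computed
page count in npages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u		/* assumption: 4 KiB pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the driver's PRP bookkeeping structure. */
struct prps_model {
	unsigned int npages;
	uint64_t *list[];	/* one pointer per page of PRP entries */
};

/* Number of pages needed to hold the 8-byte PRP entries for 'length'. */
unsigned int prp_list_pages(unsigned int length)
{
	unsigned int nprps = DIV_ROUND_UP(length, PAGE_SIZE);

	return DIV_ROUND_UP(8 * nprps, PAGE_SIZE);
}

struct prps_model *alloc_prps_model(unsigned int length)
{
	unsigned int npages = prp_list_pages(length);
	struct prps_model *prps;

	prps = malloc(sizeof(*prps) + sizeof(uint64_t *) * npages);
	if (!prps)
		return NULL;	/* caller must fail or defer the I/O */

	prps->npages = npages;
	return prps;
}
```

The point is only that the result is tested before prps->npages is
written; what the caller does on NULL is a separate design question.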

> 
> My plan for this is, instead of using a mempool, to submit partial I/Os
> in the rare cases where a write cannot allocate memory.  I have the
> design in my head, just not committed to code yet.  The design also
> avoids allocating any memory in the driver for I/Os that do not cross
> a page boundary.

I don't remember the latest status, but there have been a lot of
improvements in dirty page handling since that "no memory allocation
on writeout" rule was introduced. With GFP_NOFS it may not be as big
a problem as it used to be.

Copying linux-mm in case there are deep thoughts on this there.

GFP_ATOMIC alone is definitely still a bad idea there.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
