linux-mm.kvack.org archive mirror
* page-able RDMA
@ 2011-12-12 16:09 Sagi Grimberg
From: Sagi Grimberg @ 2011-12-12 16:09 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm, Or Gerlitz, Shachar Raindel

Hey all,

InfiniBand allows a remote host to access the memory of a local process
without involving the local CPU. This is called "RDMA". Currently, this
is implemented by having the task register the address-space region that
will be accessible through the network, using a special API call
(ibv_reg_mr). This call pins the region's pages in RAM (using
get_user_pages), makes them DMA-mappable, and adds a device-specific
mapping for the region. The memory stays pinned until the user removes
the registration through another API call (ibv_dereg_mr).

I am working on a prototype that enables pageable memory for an
InfiniBand driver using mmu_notifier. Such a task requires managing a
secondary page table (PT) for all relevant pages of a given process.
This can be done using the mmu_notifier invalidation callback mechanism.

The pages will _NOT_ be pinned in RAM, and all MMU actions will be
reflected in the device's secondary PT. In the other direction, the
device will raise page-fault events towards the driver when it tries to
operate on an unmapped page. The driver will then request that the
relevant pages be mapped. Once the pages are in memory, the driver will
update the device's secondary PT.
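
To make the intended flow concrete, here is a minimal user-space sketch
of the bookkeeping, not the prototype itself. It is a toy model under
stated assumptions: struct secondary_pt, the spt_* helpers and NPAGES
are hypothetical names, and no real mmu_notifier or device interface is
used. It only shows the two directions described above: an invalidation
drops the device entry, and a device fault re-populates it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8 /* hypothetical size of the registered region, in pages */

/* Toy model of the device's secondary page table: one entry per page. */
struct secondary_pt {
    bool present[NPAGES];      /* true if the device may access the page */
    unsigned long pfn[NPAGES]; /* frame number used by the device */
};

/*
 * Stand-in for the invalidation callback: the CPU-side mapping of page
 * idx is going away, so the device entry is dropped before returning.
 */
static void spt_invalidate(struct secondary_pt *spt, size_t idx)
{
    assert(idx < NPAGES);
    spt->present[idx] = false;
}

/*
 * Stand-in for the device page-fault path: the driver has brought the
 * page in (the caller supplies the frame number here) and now updates
 * the secondary PT.
 */
static void spt_handle_fault(struct secondary_pt *spt, size_t idx,
                             unsigned long new_pfn)
{
    assert(idx < NPAGES);
    spt->pfn[idx] = new_pfn;
    spt->present[idx] = true;
}

/* Device access: true on a hit, false if a page fault must be raised. */
static bool spt_device_access(const struct secondary_pt *spt, size_t idx,
                              unsigned long *pfn_out)
{
    assert(idx < NPAGES);
    if (!spt->present[idx])
        return false;
    *pfn_out = spt->pfn[idx];
    return true;
}
```

In this model a device access to an unmapped page fails, the fault
handler maps it, and a later invalidation makes the next access fault
again. The real driver would additionally have to quiesce the HW before
the invalidation returns, which the sketch does not model.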

The work on the prototype has raised several fundamental questions:

Since the device needs to stop any ongoing operations on a page that is
being invalidated, one must make sure that, upon return from the
invalidation callback, the device is in sync with the page about to be
freed and has halted all reads/writes to it. This flushing action is
somewhat expensive, since it blocks on the HW, possibly for a long time
(tens of milliseconds).
* Are the invalidation callbacks allowed to sleep (invalidate_page
specifically), thus allowing us to schedule while waiting for the HW
sync?

Another goal is to batch invalidations to improve performance. Being
able to delay a page invalidation could speed things up considerably
for us.
So one should know when it is OK to delay an invalidation. For a
swap-based invalidation it's probably OK to delay, but for a user unmap
action, delaying the invalidation can lead to bad results.
* Can one refuse an invalidation initiated on a page? What is the state
of such a page?
* What is your opinion about providing the notifiers with extra
information regarding the cause of the invalidation (swap, unmap,
page migration, etc.)?
   Or about splitting the notifier into "invalidation that we can
postpone" and "invalidation that must happen now"?
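
As a rough illustration of what the last two questions are asking for,
the following user-space sketch models a batching policy keyed on the
invalidation cause. Everything in it is hypothetical: enum inval_cause,
struct inval_batch and the inval_* helpers do not exist in the
mmu_notifier API, and treating page migration as "must happen now" is
my own conservative assumption. Whether the kernel can tolerate any
deferral at all is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL: the notifier does not pass a cause today. */
enum inval_cause { INVAL_SWAP, INVAL_UNMAP, INVAL_MIGRATE };

#define BATCH_MAX 16 /* hypothetical batch capacity */

struct inval_batch {
    unsigned long addr[BATCH_MAX]; /* addresses waiting for a HW sync */
    size_t count;                  /* invalidations currently queued */
    unsigned flushes;              /* number of expensive HW syncs so far */
};

/* Policy assumption: only swap-driven invalidations may be postponed. */
static bool inval_can_defer(enum inval_cause cause)
{
    return cause == INVAL_SWAP;
}

/* Models one HW sync covering every queued invalidation. */
static void inval_flush(struct inval_batch *b)
{
    if (b->count == 0)
        return;
    b->count = 0;
    b->flushes++;
}

/* Queue an invalidation; sync at once if it cannot wait or if full. */
static void inval_submit(struct inval_batch *b, unsigned long addr,
                         enum inval_cause cause)
{
    assert(b->count < BATCH_MAX);
    b->addr[b->count++] = addr;
    if (!inval_can_defer(cause) || b->count == BATCH_MAX)
        inval_flush(b);
}
```

With this policy, two swap-driven invalidations followed by one unmap
cost a single HW sync instead of three, which is the kind of saving we
are after.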

I had a short private email exchange on the matter with Andrea, which
is now naturally moving here, so I added this short intro to bring
people up to speed on that correspondence. The original thread will
follow this mail.

