linux-mm.kvack.org archive mirror
From: Sagi Grimberg <sagi@grimberg.me>
To: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Linux API <linux-api@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Dave Chinner <david@fromorbit.com>,
	linux-xfs@vger.kernel.org, Linux MM <linux-mm@kvack.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	Jeff Layton <jlayton@poochiereds.net>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v9 0/6] MAP_DIRECT for DAX userspace flush
Date: Mon, 16 Oct 2017 15:02:52 +0300
Message-ID: <e29eb9ed-2d87-cde8-4efa-50de1fff0c04@grimberg.me>
In-Reply-To: <20171014015752.GA25172@obsidianresearch.com>


Hey folks, (chiming in very late here...)

>>> I think, if you want to build a uAPI for notification of MR lease
>>> break, then you need to show how it fits into the above software model:
>>>   - How it can be hidden in an RDMA-specific library
>>
>> So, here's a strawman: can ibv_poll_cq() start returning ibv_wc_status
>> == IBV_WC_LOC_PROT_ERR when file coherency is lost? This would make
>> the solution generic across DAX and non-DAX. What's your feeling for
>> how well applications are prepared to deal with that status return?
> 
> Stuffing an entry into the CQ is difficult. The CQ is in user memory,
> and on several pieces of hardware it is written by DMA from the HCA,
> so the kernel can't just stuff something in there. It can be done
> with HW support by having the HCA DMA it via an exception path or
> something, but even then, you run into questions like CQ overflow and
> accounting issues, since it is not meant for this.

But why should the kernel ever need to mangle the CQ? If a lease break
deregisters the MR, the device is expected to generate remote
protection errors on its own.

And in that case, I think we need a query mechanism rather than an event
mechanism, so that when the application starts seeing protection errors
it can query which MR was affected (I think most, if not all, devices
have that information in their internal completion queue entries).
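To make the query-on-error idea concrete, here is a minimal C sketch. It uses hypothetical mock types (wc, wc_status, wr_mr_entry and query_failed_mr are illustrative names, not libibverbs API). As far as I know, the real struct ibv_wc only guarantees wr_id, status, qp_num and vendor_err on an error completion and does not name the MR. So this sketch approximates the proposed query with an application-side table that maps wr_id to the rkey the work request used; a device-backed query would replace that table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, loosely modelled on enum ibv_wc_status and
 * struct ibv_wc from <infiniband/verbs.h>; not the real definitions. */
enum wc_status { WC_SUCCESS, WC_LOC_PROT_ERR, WC_REM_ACCESS_ERR, WC_OTHER_ERR };

struct wc {
    uint64_t wr_id;          /* cookie the application attached to the WR */
    enum wc_status status;
};

/* Application-side bookkeeping: which MR (by rkey) each posted WR used. */
struct wr_mr_entry {
    uint64_t wr_id;
    uint32_t rkey;
};

/* Protection-type failures that a revoked MR would plausibly produce. */
static int is_protection_error(enum wc_status s)
{
    return s == WC_LOC_PROT_ERR || s == WC_REM_ACCESS_ERR;
}

/* Hypothetical query: return the rkey of the MR behind a failed completion,
 * or 0 if the completion is not a protection error or its WR is unknown. */
static uint32_t query_failed_mr(const struct wc *wc,
                                const struct wr_mr_entry *tbl, size_t n)
{
    assert(wc != NULL && (tbl != NULL || n == 0));
    if (!is_protection_error(wc->status))
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tbl[i].wr_id == wc->wr_id)
            return tbl[i].rkey;
    return 0;
}
```

For example, with a table {{1, 0x1111}, {2, 0x2222}}, a completion {2, WC_REM_ACCESS_ERR} yields 0x2222, while a successful completion yields 0. The application pays for the lookup only after an error has already occurred.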

> 
> So, you need a side channel of some kind, either in certain drivers or
> generically.
> 
>>>   - How a lease break can be done hitlessly, so the library user never
>>>     needs to know it is happening or see failed/missed transfers

I agree that the application should not be aware of lease breaks, but
seeing failed transfers is perfectly acceptable given that an access
violation is happening (my assumption is that failed transfers are
reported as error completions in the user completion queue). What we
need is a framework that helps user space recover sanely: query which
MR had the access violation, restore it, and re-establish the queue
pair.
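The three recovery steps (query, restore, re-establish) could look roughly like the sketch below. It assumes a simplified, hypothetical model of the QP states in libibverbs' enum ibv_qp_state (SQD/SQE omitted) and a hypothetical conn struct. Re-registration is faked by bumping the key, whereas a real library would call ibv_reg_mr() again and get whatever key the device assigns. Each modify_qp() call stands in for one ibv_modify_qp() call. The key point the sketch encodes is that a QP in the error state has to go back through RESET before it can reach RTS again.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of enum ibv_qp_state (SQD/SQE omitted). */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS, QPS_ERR };

/* Hypothetical per-connection state kept by an RDMA library. */
struct conn {
    enum qp_state qp;
    uint32_t rkey;     /* key of the MR the peer targets */
    int mr_valid;      /* cleared when a lease break revokes the MR */
};

/* Legal transitions in this simplified model. */
static int legal_transition(enum qp_state from, enum qp_state to)
{
    switch (to) {
    case QPS_RESET:
    case QPS_ERR:
        return 1;                        /* reachable from any state */
    case QPS_INIT:
        return from == QPS_RESET || from == QPS_INIT;
    case QPS_RTR:
        return from == QPS_INIT;
    case QPS_RTS:
        return from == QPS_RTR || from == QPS_RTS;
    }
    return 0;
}

/* Stands in for one ibv_modify_qp() call. */
static void modify_qp(struct conn *c, enum qp_state to)
{
    assert(legal_transition(c->qp, to));
    c->qp = to;
}

/* Recover after the query step blamed failed_rkey. Returns the new key
 * (to be sent to the peer), or 0 if there is nothing to recover. */
static uint32_t recover(struct conn *c, uint32_t failed_rkey)
{
    if (c->qp != QPS_ERR || failed_rkey != c->rkey)
        return 0;

    /* Restore the MR: hypothetical re-registration yielding a new key. */
    c->rkey += 1;
    c->mr_valid = 1;

    /* Re-establish the QP: an errored QP must go back through RESET. */
    modify_qp(c, QPS_RESET);
    modify_qp(c, QPS_INIT);
    modify_qp(c, QPS_RTR);
    modify_qp(c, QPS_RTS);
    return c->rkey;
}
```

For an RC connection the peer typically also has to reset its side of the queue pair and learn the new rkey, so in practice that part would need an out-of-band exchange that this sketch does not model.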

>>
>> iommu redirection should be hitless and behave like the page cache case,
>> where RDMA targets pages that are no longer part of the file.
> 
> Yes, if the iommu can be fenced properly it sounds doable.
> 
>>>   - Whatever fast path checking is needed does not kill performance
>>
>> What do you consider a fast path? I was assuming that memory
>> registration is a slow path, and iommu operations are asynchronous, so
>> they should not impact performance of ongoing operations beyond typical
>> iommu overhead.
> 
> ibv_poll_cq() and ibv_post_send() would be a fast path.
> 
> Where this struggled before is that, once you create a side channel,
> you also have to check that side channel, and checking it at high
> performance is quite hard. Even quiescing things to be able to tear
> down the MR has performance implications on post send...

This is exactly why I think we should not have a side channel at all, but
instead provide building blocks to recover sanely from error completions.
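A sketch of why an error-driven design keeps the fast path cheap, using hypothetical mock completion types rather than the real libibverbs ones (process_batch, handle_error and poll_stats are illustrative names). The only per-completion cost is the status check the application already performs after polling the CQ; everything else happens on the error branch. A side-channel design would instead add a separate check on every ibv_poll_cq() or ibv_post_send().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of a completion; not the real struct ibv_wc. */
enum wc_status { WC_SUCCESS, WC_LOC_PROT_ERR, WC_REM_ACCESS_ERR, WC_OTHER_ERR };
struct wc {
    uint64_t wr_id;
    enum wc_status status;
};

struct poll_stats {
    size_t completed;
    size_t failed;
};

/* Slow path, only reached on error completions. A real library would query
 * the failed MR, restore it and re-establish the QP here; this sketch only
 * counts the failure. */
static void handle_error(const struct wc *wc, struct poll_stats *st)
{
    (void)wc;
    st->failed++;
}

/* Process a batch of completions, e.g. as returned by one poll of the CQ.
 * The only per-completion branch is on the status the CQE already carries;
 * no separate side-channel flag is consulted on the fast path. */
static void process_batch(const struct wc *wcs, int n, struct poll_stats *st)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (wcs[i].status == WC_SUCCESS)
            st->completed++;            /* fast path */
        else
            handle_error(&wcs[i], st);  /* slow path */
    }
}
```

With a batch of {SUCCESS, REM_ACCESS_ERR, SUCCESS}, two completions take the fast path and one is handed to the slow path.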
