netdev.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Jerome Glisse <jglisse@redhat.com>
Cc: "Tim Sell" <timothy.sell@unisys.com>,
	linux-doc@vger.kernel.org,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Zaibo Xu" <xuzaibo@huawei.com>,
	zhangfei.gao@foxmail.com, linuxarm@huawei.com,
	haojian.zhuang@linaro.org, "Christoph Lameter" <cl@linux.com>,
	"Hao Fang" <fanghao11@huawei.com>,
	"Gavin Schenk" <g.schenk@eckelmann.de>,
	"Leon Romanovsky" <leon@kernel.org>,
	"RDMA mailing list" <linux-rdma@vger.kernel.org>,
	"Vinod Koul" <vkoul@kernel.org>,
	"Doug Ledford" <dledford@redhat.com>,
	"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
	"David Kershner" <david.kershner@unisys.com>,
	"Kenneth Lee" <nek.in.cn@gmail.com>,
	"Johan Hovold" <johan@kernel.org>,
	"Cyrille Pitchen" <cyrille.pitchen@free-electrons.com>,
	"Sagar Dharia" <sdharia@codeaurora.org>
Subject: Re: [RFCv3 PATCH 1/6] uacce: Add documents for WarpDrive/uacce
Date: Mon, 19 Nov 2018 14:41:10 -0700
Message-ID: <20181119214110.GJ4890@ziepe.ca>
In-Reply-To: <20181119213320.GG4593@redhat.com>

On Mon, Nov 19, 2018 at 04:33:20PM -0500, Jerome Glisse wrote:
> On Mon, Nov 19, 2018 at 02:26:38PM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 19, 2018 at 03:26:15PM -0500, Jerome Glisse wrote:
> > > On Mon, Nov 19, 2018 at 01:11:56PM -0700, Jason Gunthorpe wrote:
> > > > On Mon, Nov 19, 2018 at 02:46:32PM -0500, Jerome Glisse wrote:
> > > > 
> > > > > > ?? How can O_DIRECT be fine but RDMA not? They use exactly the same
> > > > > > get_user_pages flow, right? Can we do what O_DIRECT does in RDMA and
> > > > > > be fine too?
> > > > > > 
> > > > > > AFAIK the only difference is the length of the race window. You'd have
> > > > > > to fork and fault during the shorter time O_DIRECT has get_user_pages
> > > > > > open.
> > > > > 
> > > > > Well in O_DIRECT case there is only one page table, the CPU
> > > > > page table and it gets updated during fork() so there is an
> > > > > ordering there and the race window is small.
> > > > 
> > > > Not really, in O_DIRECT case there is another 'page table', we just
> > > > call it a DMA scatter/gather list and it is sent directly to the block
> > > > device's DMA HW. The sgl plays exactly the same role as the various HW
> > > > page list data structures that underlie RDMA MRs.
> > > > 
> > > > What matters here is not a page table, it is whether the DMA address
> > > > of the page is active for DMA on the HW.
> > > > 
> > > > Like you say, the only difference is that the race is hopefully small
> > > > with O_DIRECT (though that is not really small, NVMeof for instance
> > > > has windows as large as connection timeouts, if you try hard enough)
> > > > 
> > > > So we probably can trigger this trouble with O_DIRECT and fork(), and
> > > > I would call it a bug :(
> > > 
> > > I cannot think of any scenario that would be a bug with O_DIRECT.
> > > Do you have one in mind? When you fork() and make other syscalls that
> > > affect the memory of your process in another thread, you should
> > > expect inconsistent results. The kernel is not here to provide a fully
> > > safe environment to the user; the user can shoot itself in the foot and
> > > that's fine as long as it only affects the process itself and no one
> > > else. We should not be in the business of making everything baby
> > > proof :)
> > 
> > Sure, I set up AIO with O_DIRECT and launch a read.
> > 
> > Then I fork and dirty the READ target memory using the CPU in the
> > child.
> > 
> > As you described, in this case the fork will retain the physical page
> > that is undergoing O_DIRECT DMA, and the parent gets a newly copied page.
> > 
> > The DMA completes, and the child gets the page that was DMA'd to. The
> > parent gets an unchanged copied page.
> > 
> > The parent gets the AIO completion, but can't see the data.
> > 
> > I'd call that a bug with O_DIRECT. The only correct outcome is that
> > the parent will always see the O_DIRECT data. Fork should not cause
> > the *parent* to malfunction. I agree the child cannot make any
> > prediction what memory it will see.
> > 
> > I assume the same flow is possible using threads and read().
> > 
> > It is really no different than the RDMA bug with fork.
> > 
> 
> Yes and that's expected behavior :) If you fork() and have anything
> still in flight at the time of fork that can change your process address
> space (including data in it) then all bets are off.
> 
> At least this is my reading of the fork() syscall.

Not mine. I can't think of anything else that would have this
behavior.

All traditional syscalls will properly dirty the pages of the
parent, i.e. if I call read() in a thread and fork() in another thread,
then not seeing the data after read() completes is clearly a bug. All
other syscalls are the same.

It is bonkers that opening the file with O_DIRECT would change this
basic behavior. I'm calling it a bug :)

Jason
