Re: Corruption with O_DIRECT and unaligned user buffers

From: Andrea Arcangeli <aarcange@redhat.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Tim LaBerge <tim.laberge@quantum.com>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: Corruption with O_DIRECT and unaligned user buffers
Date: Sat, 20 Dec 2008 17:02:20 +0100
Message-ID: <20081220160220.GE6383@random.random>
In-Reply-To: <20081219151118.A0AC.KOSAKI.MOTOHIRO@jp.fujitsu.com>

Hello!

On Fri, Dec 19, 2008 at 03:34:20PM +0900, KOSAKI Motohiro wrote:
> I think gup_pte_range() doesn't change pte attribute.
> Could you explain why get_user_pages_fast() is evil?

It's evil because it was assumed that just relying on
local_irq_disable() to keep the SMP TLB flush IPI from running would be
enough to simulate a 'current' pagetable walk, and so to let the
current task run entirely lockless.

The problem is that being totally lockless prevents us from knowing
whether a page is under direct I/O or not. And if a page is under
direct I/O that writes to memory (direct I/O that only reads from
memory we couldn't care less about, it's always ok) we can't merge
pages in ksm, we can't mark the pte readonly in fork, etc... If we do,
things break.

The entirely lockless (but atomic) pagetable walk done by the cpu is
different from gup_fast because the one done by the cpu will never end
up writing to the page through the PCI bus with DMA, so the moment the
IPI runs, whatever I/O was going on is interrupted. That's not the case
for gup_fast: when gup_fast returns and the IPI runs, and the page then
becomes available for sharing in ksm or its pte gets marked readonly,
the direct DMA is still in flight. That's why gup_fast *can't* be 100%
lockless as it is today, otherwise it's unfixable and broken, and it's
not just ksm. This very O_DIRECT bug in fork is 100% unfixable without
adding some serialization to gup_fast. So my patch fixes it fully only
for kernels before the introduction of gup_fast...

My suggestion is to reintroduce the big reader lock (br_lock) of
2.4 and have gup_fast take the read side of it, and fork/ksm take the
write side. It must not be a write-starving lock like the 2.4 one
though, or fork would hang forever on large SMP. It should still be
faster than get_user_pages.
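
To make the idea concrete, here is a rough userspace sketch of a big
reader lock, assuming pthreads: readers take only the lock of their own
slot, the writer has to take all of them. All the names (struct brlock,
br_read_lock, NR_SLOTS, ...) are made up for illustration, a "slot" is
only a stand-in for a CPU, and this is neither the 2.4 br_lock code nor
a patch for gup_fast. It also does nothing special about the
write-starvation concern.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

/* One lock per slot, crudely padded so slots don't share a cacheline. */
struct brlock {
	struct {
		pthread_mutex_t lock;
		char pad[64];
	} slot[NR_SLOTS];
};

static void brlock_init(struct brlock *b)
{
	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_init(&b->slot[i].lock, NULL);
}

/* Read side (the gup_fast role): touches only the caller's own slot. */
static void br_read_lock(struct brlock *b, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	pthread_mutex_lock(&b->slot[slot].lock);
}

/* Non-blocking read side, returns 0 if the slot lock was taken. */
static int br_read_trylock(struct brlock *b, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	return pthread_mutex_trylock(&b->slot[slot].lock);
}

static void br_read_unlock(struct brlock *b, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	pthread_mutex_unlock(&b->slot[slot].lock);
}

/* Write side (the fork/ksm role): takes every slot, always in the same order. */
static void br_write_lock(struct brlock *b)
{
	for (int i = 0; i < NR_SLOTS; i++)
		pthread_mutex_lock(&b->slot[i].lock);
}

static void br_write_unlock(struct brlock *b)
{
	for (int i = NR_SLOTS - 1; i >= 0; i--)
		pthread_mutex_unlock(&b->slot[i].lock);
}
```

The read side stays cheap because it never touches a cacheline shared
with the other slots, while the write side pays a cost proportional to
the number of slots. In this sketch two readers using the same slot
also exclude each other, which is an artifact of using a plain mutex
per slot.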

> Why rhel can't use memory barrier?

Oh it can, I just didn't implement it immediately as I wanted to ship a
simpler patch first, but given the 27% slowdown measured in a later
email, I'll definitely have to replace the TestSetPageLocked with
smp_rmb and see if the introduced overhead goes away.

Thread overview: 18+ messages
2008-11-14 17:04 Corruption with O_DIRECT and unaligned user buffers Tim LaBerge
2008-11-19  4:25 ` Nick Piggin
2008-11-19  6:52   ` Nick Piggin
2008-11-19 16:58   ` Andrea Arcangeli
2008-12-18 15:29     ` Andrea Arcangeli
2008-12-19  2:21       ` KAMEZAWA Hiroyuki
2008-12-19  5:06         ` KAMEZAWA Hiroyuki
2008-12-19  6:34       ` KOSAKI Motohiro
2008-12-20 16:02         ` Andrea Arcangeli [this message]
2008-12-19  7:19       ` KAMEZAWA Hiroyuki
2008-12-19  7:44         ` Li Zefan
2008-12-19  8:45           ` Li Zefan
2008-12-19 20:27           ` Andrea Arcangeli
2008-12-20 15:55         ` Andrea Arcangeli
2008-12-19 11:51       ` Li Zefan
2008-12-19 12:14         ` KOSAKI Motohiro
2008-12-19 12:58         ` Hugh Dickins
2008-12-19 20:34         ` Andrea Arcangeli