linux-fsdevel.vger.kernel.org archive mirror
* testing stable pages being modified
From: Zach Brown @ 2013-05-30 22:36 UTC (permalink / raw)
  To: linux-fsdevel, linux-btrfs

'stable' pages have always been a bit of a fiction.  It's easy to
intentionally modify stable pages under io with some help from page
references that ignore mappings and page state.

Here's a little test that uses O_DIRECT to get the pinned aio ring pages
under IO and then has event completion stores modify them while they're
in flight.

It's a nice quick way to test the consequences of stable pages being
modified.  It can be used to burp out ratelimited csum failure kernel
messages with btrfs, for example.

- z

#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <assert.h>
#include <libaio.h>

int main(int argc, char **argv)
{
	size_t total = 1 * 1024 * 1024;
	size_t page_size = sysconf(_SC_PAGESIZE);
	struct iovec *iov;
	size_t iov_nr = total / page_size;
	void *junk;
	io_context_t ctx = NULL;
	int nr_iocbs = 3;
	struct iocb iocbs[nr_iocbs];
	struct iocb *iocb_ptrs[nr_iocbs];
	struct io_event events[nr_iocbs];
	int ret;
	int fd;
	int nr;
	int i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file_to_overwrite>\n", argv[0]);
		exit(1);
	}

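	iov = calloc(iov_nr, sizeof(*iov));
	/* O_DIRECT generally needs an aligned buffer; malloc() doesn't promise one */
	ret = posix_memalign(&junk, page_size, total);
	assert(iov && ret == 0);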

	fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644);
	assert(fd >= 0);

	ret = io_setup(nr_iocbs, &ctx);
	assert(ret >= 0);

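	/*
	 * io_context_t is the address of the aio ring that io_setup()
	 * mapped into our address space.  Point every iovec at its first
	 * page so the O_DIRECT writes pin a page that the kernel keeps
	 * storing completion events into.
	 */
	for (i = 0; i < iov_nr; i++) {
		iov[i].iov_base = ctx;
		iov[i].iov_len = page_size;
	}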

	/* initial write to allocate the file region */
	ret = writev(fd, iov, iov_nr);
	assert(ret == total);

	/*
	 * Keep one of each of these iocbs in flight:
	 *
	 * [0]: hopefully fast 0 byte read to keep churning events
	 * [1]: dio read of file bytes to trigger csum verification
	 * [2]: dio write of unstable event pages
	 */
	io_prep_pread(&iocbs[0], fd, junk, 0, 0);
	io_prep_pread(&iocbs[1], fd, junk, total, 0);
	io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0);

	for (i = 0; i < nr_iocbs; i++)
		iocb_ptrs[i] = &iocbs[i];
	nr = nr_iocbs;

	for(;;) {
		ret = io_submit(ctx, nr, iocb_ptrs);
		assert(ret == nr);

		nr = io_getevents(ctx, 1, nr_iocbs, events, NULL);
		assert(nr > 0);

		/* resubmit exactly the iocbs that just completed */
		for (i = 0; i < nr; i++)
			iocb_ptrs[i] = events[i].obj;
	}

	return 0;
}


* Re: testing stable pages being modified
From: Chris Mason @ 2013-05-31  5:11 UTC (permalink / raw)
  To: Zach Brown, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org

Quoting Zach Brown (2013-05-30 18:36:10)
> 'stable' pages have always been a bit of a fiction.  It's easy to
> intentionally modify stable pages under io with some help from page
> references that ignore mappings and page state.
> 
> Here's a little test that uses O_DIRECT to get the pinned aio ring pages
> under IO and then has event completion stores modify them while they're
> in flight.
> 
> It's a nice quick way to test the consequences of stable pages being
> modified.  It can be used to burp out ratelimited csum failure kernel
> messages with btrfs, for example.

Changing O_DIRECT in flight has always been a deep dark corner case, and
crc errors are the expected result.  Have you found anyone doing this in
real life?

I do like the small test program though, we should extend it into a test
to make sure crcs are really crcing.

-chris



* Re: testing stable pages being modified
From: Zach Brown @ 2013-05-31  6:24 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org

> Changing O_DIRECT in flight has always been a deep dark corner case, and
> crc errors are the expected result.  Have you found anyone doing this in
> real life?

Agreed; and no, I haven't heard of people accidentally modifying stable
pages.

> I do like the small test program though, we should extend it into a test
> to make sure crcs are really crcing.

Yeah, the little test is more to make sure that the error paths handle
the modification case as expected :).

- z


* Re: testing stable pages being modified
From: Josef Bacik @ 2013-05-31 13:29 UTC (permalink / raw)
  To: Zach Brown
  Cc: Chris Mason, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org

On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:
> > Changing O_DIRECT in flight has always been a deep dark corner case, and
> > crc errors are the expected result.  Have you found anyone doing this in
> > real life?
> 
> Agreed; and no, I haven't heard of people accidentally modifying stable
> pages.
> 

Windows does this.  It will also send down the same page for different
offsets, which is why we have that special check in check_direct_IO for
reads, since that would cause lots of csum errors too.  I tried to fix the
modified-in-flight problem by checking the csums of the pages in the IO
completion handler and resubmitting the IO if the pages had changed, but
of course this dramatically reduced performance for all of the
well-behaved O_DIRECT applications, so in the end I just set nodatasum for
that VM image and carried on.  I'm not sure what the solution is for this
problem.  Thanks,

Josef


* Re: testing stable pages being modified
From: Chris Mason @ 2013-05-31 13:53 UTC (permalink / raw)
  To: Josef Bacik, Zach Brown
  Cc: linux-fsdevel@vger.kernel.org, linux-btrfs@vger.kernel.org

Quoting Josef Bacik (2013-05-31 09:29:07)
> On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:
> > > Changing O_DIRECT in flight has always been a deep dark corner case, and
> > > crc errors are the expected result.  Have you found anyone doing this in
> > > real life?
> > 
> > Agreed; and no, I haven't heard of people accidentally modifying stable
> > pages.
> > 
> 
> Windows does this.  It will also send down the same page for different
> offsets, which is why we have that special check in check_direct_IO for
> reads, since that would cause lots of csum errors too.  I tried to fix the
> modified-in-flight problem by checking the csums of the pages in the IO
> completion handler and resubmitting the IO if the pages had changed, but
> of course this dramatically reduced performance for all of the
> well-behaved O_DIRECT applications, so in the end I just set nodatasum for
> that VM image and carried on.  I'm not sure what the solution is for this
> problem.  Thanks,

Ugh, I forgot about the windows case.  KVM should really be copying the
pages for sectors where IO is already in flight.

We could do the copies ourselves: mount -o dio_copies

-chris



* Re: testing stable pages being modified
From: Josef Bacik @ 2013-05-31 14:34 UTC (permalink / raw)
  To: Chris Mason
  Cc: Josef Bacik, Zach Brown, linux-fsdevel@vger.kernel.org,
	linux-btrfs@vger.kernel.org

On Fri, May 31, 2013 at 07:53:29AM -0600, Chris Mason wrote:
> Quoting Josef Bacik (2013-05-31 09:29:07)
> > On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:
> > > > Changing O_DIRECT in flight has always been a deep dark corner case, and
> > > > crc errors are the expected result.  Have you found anyone doing this in
> > > > real life?
> > > 
> > > Agreed; and no, I haven't heard of people accidentally modifying stable
> > > pages.
> > > 
> > 
> > Windows does this.  It will also send down the same page for different
> > offsets, which is why we have that special check in check_direct_IO for
> > reads, since that would cause lots of csum errors too.  I tried to fix the
> > modified-in-flight problem by checking the csums of the pages in the IO
> > completion handler and resubmitting the IO if the pages had changed, but
> > of course this dramatically reduced performance for all of the
> > well-behaved O_DIRECT applications, so in the end I just set nodatasum for
> > that VM image and carried on.  I'm not sure what the solution is for this
> > problem.  Thanks,
> 
> Ugh, I forgot about the windows case.  KVM should really be copying the
> pages for sectors where IO is already in flight.
> 
> We could do the copies ourselves: mount -o dio_copies
>

That would work for us, but what about other people who rely on stable pages,
like *fs on iSCSI and such?  It might be good to have a generic mount option
that the VFS notices, so the copying happens before the IO gets to the file
system.  That way we're all safe and don't end up with a different solution
and mount option for each fs.  Thanks,

Josef 

