From: Nick Piggin <npiggin@kernel.dk>
To: Mike Snitzer <snitzer@redhat.com>
Cc: LVM general discussion and development <linux-lvm@redhat.com>,
	Spelic <spelic@shiftmail.org>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, npiggin@kernel.dk, dm-devel@redhat.com
Subject: Re: Bugs in mkfs.xfs, device mapper, xfs, and /dev/ram
Date: Sat, 4 Dec 2010 04:11:40 +1100
Message-ID: <20101203171140.GA11889@amd>
In-Reply-To: <20101202212227.GA22703@redhat.com>

On Thu, Dec 02, 2010 at 04:22:27PM -0500, Mike Snitzer wrote:
> On Thu, Dec 02 2010 at  9:17am -0500,
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Thu, Dec 02, 2010 at 03:14:28PM +0100, Spelic wrote:
> > > On 12/02/2010 03:11 PM, Christoph Hellwig wrote:
> > > >I'm pretty sure you have CONFIG_DEBUG_BLOCK_EXT_DEVT enabled.  This
> > > >option must never be enabled, as it causes block devices to be
> > > > >randomly renumbered.  Together with the ramdisk driver overloading
> > > >the BLKFLSBUF ioctl to discard all data it guarantees you to get
> > > >data loss like yours.
> > > 
> > > Nope...
> > > 
> > > # CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
> > 
> > Hmm, I suspect dm-linear's dumb forwarding of ioctls has the same
> > effect.
> 
> For the benefit of others:
> - mkfs.xfs will avoid sending BLKFLSBUF to any device whose major is
>   ramdisk's major, this dates back to 2004:
>   http://oss.sgi.com/archives/xfs/2004-08/msg00463.html
> - but because a kpartx partition overlay (linear DM mapping) is used for
>   the /dev/ram0p1 device, mkfs.xfs only sees a device with DM's major 
> - so mkfs.xfs sends BLKFLSBUF to the DM device blissfully unaware that
>   the backing device (behind the DM linear target) is a brd device
> - DM will forward the BLKFLSBUF ioctl to brd, which triggers
>   drivers/block/brd.c:brd_ioctl (nuking the entire ramdisk in the
>   process)
> 
> So coming full circle this is what hch was referring to when he
> mentioned:
> 1) "ramdisk driver overloading the BLKFLSBUF ioctl ..."
> 2) "dm-linear's dumb forwarding of ioctls ..."
> 
> I really can't see DM adding a specific check for ramdisk's major when
> forwarding the BLKFLSBUF ioctl.
> 
> brd has direct partition support (see commit d7853d1f8932c) so maybe
> kpartx should just blacklist /dev/ram devices?
> 
> Alternatively, what about switching brd away from overloading BLKFLSBUF
> to a real implementation of (overloaded) BLKDISCARD support in brd.c?
> One that doesn't blindly nuke the entire device but that properly
> processes the discard request.

Yeah the situation really sucks (mkfs.jfs doesn't work on ramdisk
for the same reason).
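
For anyone following along, the major-number check from the quoted
summary can be sketched roughly as below. This is an illustration only,
not the actual mkfs.xfs code: the helper name is_ramdisk_major() is made
up, and SKETCH_RAMDISK_MAJOR simply mirrors the value of RAMDISK_MAJOR
(1) from <linux/major.h>.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>	/* major(), makedev() on glibc */

/* Mirrors RAMDISK_MAJOR from <linux/major.h>. */
#define SKETCH_RAMDISK_MAJOR 1

/*
 * Hypothetical helper: return 1 if the device number belongs to the
 * ramdisk driver, 0 otherwise.  A tool would call this on st_rdev from
 * stat(2) before deciding whether BLKFLSBUF is safe to send.
 */
int is_ramdisk_major(dev_t rdev)
{
	return major(rdev) == SKETCH_RAMDISK_MAJOR;
}
```

For /dev/ram0 itself the check returns 1 and the flush is skipped. For a
kpartx mapping on top of it, the tool only sees DM's major (allocated
dynamically; 253 is a common value but that is an assumption here), so
the check returns 0 and BLKFLSBUF goes through to brd.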

Unfortunately I want to keep the ioctl for compatibility, but adding
new, saner ones would be welcome. A non-default config option or
load-time parameter for brd that skips the special case would also be
fine, if that would help testing on older userspace.

DISCARD is actually a problem for brd. To get proper correctness, you
need to preload brd with pages; otherwise, when doing stress tests, I/O
can require memory allocations and deadlock. If we add a discard that
frees pages, that reintroduces the same problem. If you find any option
useful for testing, however, patches are fine -- brd is pretty much only
useful for testing nowadays.
