From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Jul 2012 09:41:04 -0400
From: Mike Snitzer
Message-ID: <20120702134104.GC785@redhat.com>
References: <4FEEE5E5.5060800@redhat.com> <4FEF66C4.20001@redhat.com> <4FF04950.30503@redhat.com> <4FF0AEA9.3090900@redhat.com> <4FF0C91B.8050700@redhat.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"
Subject: Re: [linux-lvm] Regression with FALLOC_FL_PUNCH_HOLE in 3.5-rc kernel
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
To: Lukáš Czerner
Cc: amwang@redhat.com, Zdenek Kabelac, Hugh Dickins, linux-kernel@vger.kernel.org, Joe Thornber, LVM general discussion and development, Alasdair G Kergon

On Mon, Jul 02 2012 at 6:35am -0400,
Lukáš Czerner wrote:

> >
> > So you're testing rather old kernel so you might be missing some
> > fixes there. Could you rerun the test with the recent kernel ?
> >
> > Also it appears that the bug here happens because dm requested a
> > destination page which is within the kernel space. It seems that
> > this has been initiated by the write request from the mirror target.
> > So I do not immediately see how punch hole (discard) is involved at
> > all. You might have been lucky enough to hit a different bug
> > probably ?
> >
> > Looking at git log, this commit has been brought to my attention:
> >
> > 0c535e0d6f463365c29623350dbd91642363c39b dm io: fix discard support
> >
> > seems related to this crash.
> >
> > Please retest with recent kernel.

Ah, you beat me to recommending that fix ;)

> So from the original backtrace for the problem Zdenek is seeing on 3.5.0-rc4
> (https://lkml.org/lkml/2012/6/30/98) I think that this is
> problem in the device mapper itself. I do not think it has anything
> to do with tmpfs or mm. According to bisects from Zdenek it clearly
> shows that the problem appears when the discard support for the loop
> device is added, so it is most likely related to the dm discard support.

What about using scsi_debug with the dm-mirror target?

Never say never: the dm-mirror and/or dm-io code could still have an
issue, but the commit referenced above did fix discard with the mirror
target back in 3.3.

> Anyway, the backtrace points to the NULL pointer dereference in
> dm_rh_region_context() which is simple function:
>
> void *dm_rh_region_context(struct dm_region *reg)
> {
>         return reg->rh->context;
> }
>
> so either reg, or reg->rh is NULL. Now the only place this is used is
> from recovery_complete() in dm-raid1.c. So this is somewhat related
> to raid recovery. I am not familiar with the dm code, but can
> someone from the dm team look at this ?

I'll coordinate with Zdenek.

> But just to be sure to rule out the punch hole thing Zdenek can you
> run your tests on the "real" discard capable device ? Or at least on
> the device which does not convert discard requests into punch hole ?
> You can use scsi_debug to create such device:
>
> modprobe scsi_debug dev_size_mb=16 sector_size=512 num_tgts=1 lbpu=1

Great minds think alike ;)
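
To make the "either reg or reg->rh is NULL" reasoning from the quoted
backtrace analysis concrete, here is a minimal userspace sketch. It is
not kernel code: the two structs are simplified, hypothetical stand-ins
for the real region-hash structures, and classify_region() and
region_context_checked() are illustrative names that do not exist in
the dm sources.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures.
 * The real definitions carry many more fields. */
struct dm_region_hash {
	void *context;
};

struct dm_region {
	struct dm_region_hash *rh;
};

/* Which pointer, if any, would fault in "reg->rh->context". */
enum rh_fault { RH_OK, RH_REG_NULL, RH_RH_NULL };

/* Illustrative helper: walk the same pointer chain the accessor walks
 * and report the first NULL link instead of dereferencing it. */
static enum rh_fault classify_region(const struct dm_region *reg)
{
	if (reg == NULL)
		return RH_REG_NULL;
	if (reg->rh == NULL)
		return RH_RH_NULL;
	return RH_OK;
}

/* Illustrative checked variant of the accessor: returns NULL where the
 * unchecked one-liner would oops. */
static void *region_context_checked(const struct dm_region *reg)
{
	return classify_region(reg) == RH_OK ? reg->rh->context : NULL;
}
```

A check like this only tells you which link in the chain is NULL. It
does not explain why recovery_complete() was handed such a region in
the first place, which is the part that still needs investigating.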