Date: Fri, 13 Jan 2017 13:16:40 +0100
From: Lars Ellenberg
Message-ID: <20170113121640.GC9172@soda.linbit>
References: <07fb8e78-2050-a2ba-3e71-c21e989d57f3@knebb.de> <20170110094258.GB4322@soda.linbit> <20170112170053.GE4650@soda.linbit>
In-Reply-To: <20170112170053.GE4650@soda.linbit>
Subject: Re: [linux-lvm] [DRBD-user] LVM on top of DRBD [actually: mkfs.ext4 then mount results in detach on RHEL 7 on VMWare]
Reply-To: LVM general discussion and development
To: LVM general discussion and development, drbd-user@lists.linbit.com

On Thu, Jan 12, 2017 at 06:00:53PM +0100, Lars Ellenberg wrote:
> On Wed, Jan 11, 2017 at 06:23:08PM +0100, knebb@knebb.de wrote:
> > Hi Lars and all,
> >
> >
> > >> I have to cross-post to LVM as well to DRBD mailing list as I have no
> > >> clue where the issue is- if it's not a bug...
> > >>
> > >> I can not get working LVM on top of drbd- I am getting I/O errors
> > >> followed by "diskless" state.
> > > For some reason, (some? not only?) VMWare virtual disks tend to pretend
> > > to support "write same", even if they fail such requests later.
> > >
> > > DRBD treats such failed WRITE-SAME the same way as any other backend
> > > error, and by default detaches.
> > Ok, it is beyond my knowledge, but I understand what the "write-same"
> > command does. But if the underlying physical disk offers the command and
> > reports an error when used this should apply to mkfs.ext4 on the device/
> > partition as well, shouldn't it?
>
> In this case, it happens on first mount.
> Also, it is not an "EIO", but an "EOPNOTSUPP".
>
> What really happens is that the file system code calls
> blkdev_issue_zeroout(),
> which will try discard, if discard is available and discard zeroes data,
> or, if discard (with discard zeroes data) is not available or returns
> failure, tries write-same with ZERO_PAGE,
> or, if write-same is not available or returns failure,
> tries __blkdev_issue_zeroout() (which uses "normal" writes).
>
> At least in "current upstream", probably very similar in your
> almost-3.10.something kernel.
>
> DRBD sits in between, sees the failure return of write-same,
> and handles it by detaching.
>
> > drbd detaches when an error is
> > reported- but why does Linux not report an error without drbd? And why
> > does this only happen when using LVM in-between? Should be the same when
> > LVM is not used....
>
> Yes. And it is, as far as I can tell.
>
> > > Older kernels (RHEL 6) and also older drbd (8.3) are not affected, because they
> > > don't know about write-same.
> > My primary host is running CentOS7 while the secondary is older
> > (CentOS6). I will try to create the ext4 on the secondary and then
> > switch to primary.
> >
> > > Or tell the system that the backend does not support write-same:
> > > Check setting:
> > > grep ^ /sys/block/*/device/scsi_disk/*/max_write_same_blocks
> > > disable:
> > > echo 0 | tee /sys/block/*/device/scsi_disk/*/max_write_same_blocks
> > >
> > A "find /sys -name "*same*"" does not report any files named
>
> double check that, please.
> all my centos7 / RHEL 7 (and other distributions with sufficiently new
> kernel) have that.
>
> there are both the read-only /sys/block/*/queue/write_same_max_bytes
> and the write-able /sys/devices/*/*/*/host*/target*/*/scsi_disk/*/max_write_same_blocks
>
> > "max_write_same_blocks". On none of the both nodes. So I can not
> > disable nor verify if it's enabled. I assume no as it does not exist. So
> > this might not be the reason.
>
> show us lsblk -t and lsblk -D from the box that detaches.
> (the "7" one)
>
> It may also be that a discard failed, in which case it could be
> devicemapper pretending discard was supported, and the backend failing
> that discard request. Or some combination there.
>
> Your original logs show
> > Jan 7 10:58:44 backuppc kernel: EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)
> > Jan 7 10:58:48 backuppc kernel: block drbd1: local WRITE IO error sector 5296+3960 on sdc
>
> The "+..." part is the length (number of sectors) of the request.
> We don't allow "normal" requests of that size, so this is either a
> discard or write-same.
> > > Jan 7 10:58:48 backuppc kernel: block drbd1: disk( UpToDate -> Failed )
> > > Jan 7 10:58:48 backuppc kernel: block drbd1: IO ERROR: neither local nor remote data, sector 29096+3968
> > > Jan 7 10:58:48 backuppc kernel: dm-2: WRITE SAME failed. Manually zeroing.
>
> And here we see that at least some WRITE SAME was issued, and returned failure.
> and device mapper, which in your case sits above DRBD,
> and consumes that error, has its own fallback code for failed write-same.

Correcting myself, the presence of the warning message misled me.
The 3.10 kernel still has that warning message directly in
blkdev_issue_zeroout(), so that's not the device mapper fallback,
but simply the mechanism I described above, with additional
"log that I took the fallback because of failure".

Which means DISCARDS have not even been tried,
or we'd have a message about that as well.
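
To make that fallback order easier to follow, here is a rough sketch.
It is a simplified Python model of the sequence described in this
thread, not kernel code. The Backend class, its fields, the
issue_zeroout() name and the wording of the discard log message are all
made up for illustration; only the "WRITE SAME failed. Manually
zeroing." text is taken from the logs quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backend:
    """Hypothetical stand-in for what a block device advertises and does."""
    discard_zeroes_data: bool = False    # discard available and zeroes data
    discard_works: bool = True           # False: discard is advertised but fails
    advertises_write_same: bool = False
    write_same_works: bool = True        # False: write-same is advertised but fails
    log: List[str] = field(default_factory=list)


def issue_zeroout(dev: Backend) -> str:
    """Return the method that ended up zeroing the range."""
    # Step 1: discard, only if it is available and guarantees zeroes.
    if dev.discard_zeroes_data:
        if dev.discard_works:
            return "discard"
        # Assumed message text, for illustration only.
        dev.log.append("discard failed, falling back")
    # Step 2: write-same with a zero page, if the device claims support.
    if dev.advertises_write_same:
        if dev.write_same_works:
            return "write-same"
        dev.log.append("WRITE SAME failed. Manually zeroing.")
    # Step 3: plain writes of zeroes.
    return "normal writes"


# A disk that pretends to support write-same but fails the request.
vmware_like = Backend(advertises_write_same=True, write_same_works=False)
method = issue_zeroout(vmware_like)
```

For such a disk, method ends up as "normal writes" and the log holds
only the WRITE SAME message, with nothing about discard, which matches
the reasoning above that discards were never tried.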
> Which can no longer be serviced, because DRBD already detached.
>
> So yes,
> I'm pretty sure that I did not pull my "best guess" out of thin air only
>
> ;-)

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD® and LINBIT® are registered trademarks of LINBIT
__
please don't Cc me, but send to list -- I'm subscribed