From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Gennadiy Nerubayev <parakie@gmail.com>
Cc: James Bottomley <James.Bottomley@suse.de>,
	Christof Schmitt <christof.schmitt@de.ibm.com>,
	Boaz Harrosh <bharrosh@panasas.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: Wrong DIF guard tag on ext2 write
Date: Mon, 26 Jul 2010 16:22:39 +0400
Message-ID: <4C4D7E0F.1000602@vlnb.net>
In-Reply-To: <AANLkTi=OhopP4qud6EVffrNU3jLrA2KzRca=a4T9GhHx@mail.gmail.com>

Gennadiy Nerubayev, on 07/24/2010 12:51 AM wrote:
>>>> The real life problem we can see in an active-active DRBD-setup. In this
>>>> configuration 2 nodes act as a single SCST-powered SCSI device and they both
>>>> run DRBD to keep their backstorage in-sync. The initiator uses them as a
>>>> single multipath device in an active-active round-robin load-balancing
>>>> configuration, i.e. sends requests to both nodes in parallel, then DRBD
>>>> takes care to replicate the requests to the other node.
>>>>
>>>> The problem is that sometimes DRBD complains about concurrent local
>>>> writes, like:
>>>>
>>>> kernel: drbd0: scsi_tgt0[12503] Concurrent local write detected! [DISCARD
>>>> L] new: 144072784s +8192; pending: 144072784s +8192
>>>>
>>>> This message means that DRBD detected that both nodes received
>>>> overlapping writes on the same block(s) and DRBD can't figure out which one
>>>> to store. This is possible only if the initiator sent the second write
>>>> request before the first one completed.
>>>>
>>>> The topic of the discussion could well explain the cause of that. But,
>>>> unfortunately, people who reported it forgot to note which OS they run on
>>>> the initiator, i.e. I can't say for sure it's Linux.
>>>
>>> Sorry for the late chime in, but here's some more information of
>>> potential interest as I've previously inquired about this to the drbd
>>> mailing list:
>>>
>>> 1. It only happens when using blockio mode in IET or SCST. Fileio,
>>> nv_cache, and write_through do not generate the warnings.
>>
>> Some explanations for those who are not familiar with the terminology:
>>
>>   - "Fileio" means Linux IO stack on the target receives IO via
>> vfs_readv()/vfs_writev()
>>
>>   - "NV_CACHE" means all the cache synchronization requests
>> (SYNCHRONIZE_CACHE, FUA) from the initiator are ignored
>>
>>   - "WRITE_THROUGH" means write through, i.e. the corresponding backend file
>> for the device is opened with the O_SYNC flag.
>>
>>> 2. It happens on active/passive drbd clusters (on the active node
>>> obviously), NOT active/active. In fact, I've found that doing round
>>> robin on active/active is a Bad Idea (tm) even with a clustered
>>> filesystem, until at least the target software is able to synchronize
>>> the command state of either node.
>>> 3. Linux and ESX initiators can generate the warning, but I've so far
>>> only been able to reliably reproduce it using a Windows initiator and
>>> sqlio or iometer benchmarks. I'll be trying again using iometer when I
>>> have the time.
>>> 4. It only happens using a random write io workload (any block size),
>>> with initiator threads>1, OR initiator queue depth>1. The higher
>>> either of those is, the more spammy the warnings become.
>>> 5. The transport does not matter (reproduced with iSCSI and SRP)
>>> 6. If DRBD is disconnected (primary/unknown), the warnings are not
>>> generated. As soon as it's reconnected (primary/secondary), the
>>> warnings will reappear.
>>
>> It would be great if you could prove or disprove our suspicions that Linux
>> can produce several write requests for the same blocks simultaneously. To be
>> sure we need:
>>
>> 1. The initiator is Linux. Windows and ESX are not needed for this
>> particular case.
>>
>> 2. If you are able to reproduce it, we will need a full description of which
>> application was used on the initiator to generate the load, and in which mode.
>>
>> Target and DRBD configuration doesn't matter, you can use any.
>
> I just tried, and this particular DRBD warning is not reproducible
> with io (iometer) coming from a Linux initiator (2.6.30.10). The same
> iometer parameters were used as on windows, and both the base device
> as well as filesystem (ext3) were tested, both negative. I'll try a
> few more tests, but it seems that this is a nonissue with a Linux
> initiator.

OK, but to be completely sure, could you also check with load
generators other than IOmeter, please? IOmeter on Linux is a lot less
effective than on Windows because it uses sync IO, while we need a
heavy load with many IOs in flight to trigger the problem we are
discussing, if it exists. Plus, to catch it we need an FS on the
initiator side rather than raw devices. So something like fio over
files on an FS, or diskbench, should be more appropriate. Please don't
use direct IO, to avoid the bug Dave Chinner pointed out to us.
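
To illustrate the kind of load I mean, here is a minimal sketch in
Python 3, not a replacement for fio. The function name and all of its
parameters are hypothetical, chosen only for this example. It assumes
the file lives on a filesystem exported by the target, and it opens the
file without O_DIRECT and without O_SYNC, so the writes are buffered.
Several threads overwrite random blocks of a small file, so the same
block is likely to be dirtied again while an earlier copy is still
being written back. Whether that actually puts overlapping writes on
the wire is exactly the open question.

```python
import os
import random
import tempfile
import threading


def random_write_load(path, file_size=1 << 20, block_size=8192,
                      threads=4, writes_per_thread=64, seed=0):
    """Issue buffered random overwrites from several threads.

    Returns the total number of bytes written by all threads.
    """
    blocks = file_size // block_size
    # Plain O_RDWR: no O_DIRECT, no O_SYNC, so writes go via the page cache.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    written = [0] * threads
    try:
        os.ftruncate(fd, file_size)

        def worker(idx):
            rng = random.Random(seed + idx)       # reproducible per thread
            payload = bytes([idx % 256]) * block_size
            for _ in range(writes_per_thread):
                offset = rng.randrange(blocks) * block_size
                written[idx] += os.pwrite(fd, payload, offset)

        pool = [threading.Thread(target=worker, args=(i,))
                for i in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        os.fsync(fd)  # flush remaining dirty pages before returning
    finally:
        os.close(fd)
    return sum(written)


if __name__ == "__main__":
    # Demo on a temporary file; point `path` at the exported FS for a real run.
    with tempfile.TemporaryDirectory() as tmp:
        print(random_write_load(os.path.join(tmp, "load.bin")))
```

For a real run the file should sit on the initiator's mount of the
exported device, and the thread and write counts should be raised well
above these defaults.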

Also, you mentioned above that Linux can generate the warning. Can
you recall in which configuration you saw it, including the kernel
version, the load application, and its settings?

Thanks,
Vlad
