From: Vladislav Bolkhovitin <vst@vlnb.net>
To: James Bottomley <James.Bottomley@suse.de>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>,
	Boaz Harrosh <bharrosh@panasas.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>,
	Gennadiy Nerubayev <parakie@gmail.com>
Subject: Re: Wrong DIF guard tag on ext2 write
Date: Thu, 03 Jun 2010 15:20:02 +0400
Message-ID: <4C078FE2.9000804@vlnb.net>
In-Reply-To: <1275398876.21962.6.camel@mulgrave.site>

James Bottomley, on 06/01/2010 05:27 PM wrote:
> On Tue, 2010-06-01 at 12:30 +0200, Christof Schmitt wrote:
>> What is the best strategy to continue with the invalid guard tags on
>> write requests? Should this be fixed in the filesystems?
> 
> For write requests, as long as the page dirty bit is still set, it's
> safe to drop the request, since it's already going to be repeated.  What
> we probably want is an error code we can return that the layer that sees
> both the request and the page flags can make the call.
> 
>> Another idea would be to pass invalid guard tags on write requests
>> down to the hardware, expect an "invalid guard tag" error and report
>> it to the block layer where a new checksum is generated and the
>> request is issued again. Basically implement a retry through the whole
>> I/O stack. But this also sounds complicated.
> 
> No, no ... as long as the guard tag is wrong because the fs changed the
> page, the write request for the updated page will already be queued or
> in-flight, so there's no need to retry.

There's one interesting problem here, at least theoretically, with SCSI
or similar transports that allow a command queue depth >1 and are
allowed to reorder queued requests internally. I don't know the FS/block
layers well enough to tell whether sending several requests for the
same page is really possible, but we see a real-life problem that
would be well explained if it is.

The problem arises if the second (rewrite) request (SCSI command) for
the same page is queued to the corresponding device before the original
request has finished. Since the device is allowed to reorder requests
freely, there's a chance that the original write request hits
permanent storage *AFTER* the retry request. The data changes the retry
is carrying would then be lost, and the result is data corruption.
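
To make the race concrete, here is a minimal sketch in Python. It is only
an illustration under stated assumptions, not how any real device or the
block layer works: the payload strings and the final_contents() helper are
hypothetical, and the "device" is modelled as a single block that keeps
whatever was committed last.

```python
from itertools import permutations


def final_contents(writes, commit_order):
    """Return what one block holds after the device commits the queued
    writes in the given order (the last commit wins)."""
    block = None
    for index in commit_order:
        block = writes[index]
    return block


# Hypothetical payloads: index 0 is the original write (stale page data),
# index 1 is the rewrite queued after the filesystem changed the page.
writes = ["old page contents", "new page contents"]

# A device that may reorder freely can commit in any permutation.
outcomes = {
    order: final_contents(writes, order)
    for order in permutations(range(len(writes)))
}

for order, contents in outcomes.items():
    print(order, "->", contents)
```

Only the in-order commit (0, 1) leaves the new data on storage; the
reordered commit (1, 0) leaves the stale data there.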

For a single parallel SCSI or SAS device such a race may look practically
impossible, but for sophisticated clusters where many nodes pretend to
be a single SCSI device in a load-balancing configuration, it becomes
very real.

We see the real-life problem in an active-active DRBD setup. In this
configuration 2 nodes act as a single SCST-powered SCSI device, and
both run DRBD to keep their backing storage in sync. The initiator uses
them as a single multipath device in an active-active round-robin
load-balancing configuration, i.e. it sends requests to both nodes in
parallel, and DRBD then takes care of replicating the requests to the
other node.

The problem is that DRBD sometimes complains about concurrent local
writes, like:

kernel: drbd0: scsi_tgt0[12503] Concurrent local write detected! 
[DISCARD L] new: 144072784s +8192; pending: 144072784s +8192

This message means that DRBD detected that both nodes received 
overlapping writes on the same block(s) and DRBD can't figure out which 
one to store. This is possible only if the initiator sent the second 
write request before the first one completed.
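
The overlap condition itself is simple to state. The sketch below is not
DRBD's code; it assumes that the numbers in the log line are a start
sector followed by a length in bytes, and that sectors are 512 bytes.
The names SECTOR_SIZE and writes_overlap() are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors


def writes_overlap(start_a, bytes_a, start_b, bytes_b):
    """Return True if two writes touch at least one common sector.

    Each write is given as (start sector, length in bytes). The length is
    rounded up to whole sectors, and the ranges are treated as half-open
    intervals [start, end).
    """
    end_a = start_a + -(-bytes_a // SECTOR_SIZE)  # ceiling division
    end_b = start_b + -(-bytes_b // SECTOR_SIZE)
    return start_a < end_b and start_b < end_a


# Values taken from the log line above: both writes cover the same range.
same_range = writes_overlap(144072784, 8192, 144072784, 8192)

# A write starting right after the first one ends does not overlap it.
adjacent = writes_overlap(144072784, 8192, 144072784 + 16, 8192)

print(same_range, adjacent)
```

With the values from the message, the two writes cover exactly the same
16 sectors, so whichever one is stored last decides the final contents.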

The topic of this discussion could well explain the cause. Unfortunately,
the people who reported it didn't note which OS they run on the
initiator, so I can't say for sure that it's Linux.

Vlad

