From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shaohua Li
Subject: Re: [PATCH] md: make suspend range wait timed out
Date: Mon, 19 Jun 2017 17:54:43 -0700
Message-ID: <20170620005443.hil23ffizavctgkq@kernel.org>
References: <877f0c7gvr.fsf@notabene.neil.brown.name> <20170616155204.myffyxp5tuoctcoo@kernel.org> <87vant3rw5.fsf@notabene.neil.brown.name>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <87vant3rw5.fsf@notabene.neil.brown.name>
Sender: linux-raid-owner@vger.kernel.org
To: NeilBrown
Cc: linux-raid@vger.kernel.org, Shaohua Li , Mikulas Patocka
List-Id: linux-raid.ids

On Mon, Jun 19, 2017 at 07:30:50AM +1000, Neil Brown wrote:
> On Fri, Jun 16 2017, Shaohua Li wrote:
>
> > On Fri, Jun 16, 2017 at 01:26:00PM +1000, Neil Brown wrote:
> >> On Fri, Jun 09 2017, Shaohua Li wrote:
> >>
> >> > From: Shaohua Li
> >> >
> >> > The suspend range is controlled by userspace. If userspace doesn't
> >> > clear the suspend range, a thread can wait on the range forever, and
> >> > we can't even kill it. This is bad behavior. Add a timeout to the
> >> > wait; if the timeout expires, we return an IO error. The app
> >> > controlling the suspend range acts like part of the disk firmware:
> >> > if the disk doesn't respond for a long time, a timed-out IO error is
> >> > returned.
> >> >
> >> > A simple search of the SCSI code shows the maximum IO timeout is
> >> > 120s, so I use that value here too.
> >>
> >> I really don't like this. It is an API change with no real
> >> justification. Has the current behavior caused a problem?
> >
> > This certainly causes a problem. Setting the suspend range will make
> > the application stall forever; don't you think that is a problem?
>
> I agree that it could cause a problem. I'm asking whether it actually,
> in practice, causes a problem. Do you have reports from people saying
> "the IO to my RAID array is hanging, what is wrong?" where you looked
> into it and found that suspend_hi was larger than suspend_lo?
>
> And if that does happen, is this really the best way to fix it?

I don't have reports about this issue. I just think the behavior is bad.

>
> >> Both md and dm export APIs which allow IO to be suspended while
> >> changes are made. Changing that to a timed-out period needs, I think,
> >> to be clearly justified.
> >>
> >> If it is changed to a timed-out period, then that should be explicit,
> >> rather than having each request independently time out.
> >> i.e. when the suspend is initiated, the end-time should be computed,
> >> and any IO would block until that time, not block for some number of
> >> seconds.
> >
> > Ok, this makes sense. We can add a timeout, and if it expires, clear
> > the suspend range. Userspace should know what it is doing.
> >
> >>
> >> If an md device is left suspended, then the current code will block IO
> >> indefinitely. This patch will add a 2-minute timeout to every single
> >> request, which will mean IO proceeds, but extremely slowly. I don't
> >> see that as a useful improvement.
> >
> > It returns an error, so the application will not dispatch more IO. But
> > I agree that a timeout to clear the suspend looks like a better policy.
>
> Write errors only get back to the application if it calls fsync(), and
> many don't do that. Write errors can easily cause a filesystem to go
> read-only, and require an fsck. I think we should be very cautious
> about triggering write errors.
>
> NFS will hang indefinitely rather than return an error if the server is
> not available. That can certainly be annoying, but the alternative has
> been tried, and it leads to random data corruption.
> The two cases are only comparable at a very high level, but I think
> this result should encourage substantial caution.

It's hard to say whether an IO error or an infinite wait is better, but
since there is a better option in this case, I don't want to argue. I'll
repost a patch that resets the suspend range after a timeout, assuming
that is your suggestion.

Thanks,
Shaohua