* Problem with DISCARD and RAID5
From: NeilBrown @ 2012-11-01 6:38 UTC
To: Shaohua Li; +Cc: linux RAID, Jens Axboe, lkml

Hi Shaohua,
 I've been doing some testing and discovered a problem with your discard
 support for RAID5.

 The code in blkdev_issue_discard assumes that the 'granularity' is a power
 of 2, and for example subtracts 1 to get a mask.

 However RAID5 sets the granularity to be the stripe size which often is not
 a power of two. When this happens you can easily get into an infinite loop.

 I suspect that to make this work properly, blkdev_issue_discard will need to
 be changed to allow 'granularity' to be an arbitrary value.
 When it is a power of two, the current masking can be used.
 When it is anything else, it will need to use sector_div().

Could you look into this please?

Thanks,
NeilBrown
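To illustrate the masking problem Neil describes, here is a minimal user-space sketch. It is not the kernel's blkdev_issue_discard code: the helper names and the `sector_t` typedef are assumptions made for this example, and plain `%` stands in for the kernel's sector_div().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type (assumption) */

/* Round down by masking. Correct only when granularity is a power of two. */
static sector_t round_down_mask(sector_t sector, uint32_t granularity)
{
    return sector & ~((sector_t)granularity - 1);
}

/* Round down by remainder. Correct for any non-zero granularity. */
static sector_t round_down_mod(sector_t sector, uint32_t granularity)
{
    assert(granularity != 0);
    return sector - (sector % granularity);
}

/* Use the cheap mask when it is valid, the remainder otherwise. */
static sector_t round_down_any(sector_t sector, uint32_t granularity)
{
    assert(granularity != 0);
    if ((granularity & (granularity - 1)) == 0)
        return round_down_mask(sector, granularity);
    return round_down_mod(sector, granularity);
}
```

With a granularity of 384 sectors (an assumed layout of 3 data disks with 128-sector chunks), round_down_mask(1000, 384) returns 640, which is not a multiple of 384, while round_down_mod(1000, 384) returns 768. A loop that relies on the masked value to make progress in whole granules can therefore misbehave.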
* Re: Problem with DISCARD and RAID5
From: Shaohua Li @ 2012-11-02 1:40 UTC
To: NeilBrown; +Cc: linux RAID, Jens Axboe, lkml

On Thu, Nov 01, 2012 at 05:38:54PM +1100, NeilBrown wrote:
>
> Hi Shaohua,
> I've been doing some testing and discovered a problem with your discard
> support for RAID5.
>
> The code in blkdev_issue_discard assumes that the 'granularity' is a power
> of 2, and for example subtracts 1 to get a mask.
>
> However RAID5 sets the granularity to be the stripe size which often is not
> a power of two. When this happens you can easily get into an infinite loop.
>
> I suspect that to make this work properly, blkdev_issue_discard will need to
> be changed to allow 'granularity' to be an arbitrary value.
> When it is a power of two, the current masking can be used.
> When it is anything else, it will need to use sector_div().

Yep, looks like we need to use sector_div. And this isn't the only problem. Discard
requests can be merged, and the merge check only checks max_discard_sectors.
That means the split requests in blkdev_issue_discard can be merged again. The
split never works.

I'm wondering what the purpose of discard_alignment and discard_granularity is. Are
there devices with discard_granularity not 1 sector? If a bio isn't discard
aligned, what will the device do? Further, why does the driver handle alignment/granularity
if the device will ignore misaligned requests?

Jens, can you share some hints please?

Thanks,
Shaohua
* Re: Problem with DISCARD and RAID5
From: Dave Chinner @ 2012-11-05 21:48 UTC
To: Shaohua Li; +Cc: NeilBrown, linux RAID, Jens Axboe, lkml

On Fri, Nov 02, 2012 at 09:40:58AM +0800, Shaohua Li wrote:
> On Thu, Nov 01, 2012 at 05:38:54PM +1100, NeilBrown wrote:
> >
> > Hi Shaohua,
> > I've been doing some testing and discovered a problem with your discard
> > support for RAID5.
> >
> > The code in blkdev_issue_discard assumes that the 'granularity' is a power
> > of 2, and for example subtracts 1 to get a mask.
> >
> > However RAID5 sets the granularity to be the stripe size which often is not
> > a power of two. When this happens you can easily get into an infinite loop.
> >
> > I suspect that to make this work properly, blkdev_issue_discard will need to
> > be changed to allow 'granularity' to be an arbitrary value.
> > When it is a power of two, the current masking can be used.
> > When it is anything else, it will need to use sector_div().
>
> Yep, looks like we need to use sector_div. And this isn't the only problem. Discard
> requests can be merged, and the merge check only checks max_discard_sectors.
> That means the split requests in blkdev_issue_discard can be merged again. The
> split never works.
>
> I'm wondering what the purpose of discard_alignment and discard_granularity is. Are
> there devices with discard_granularity not 1 sector?

Most certainly. Thin provisioned storage often has granularity in the
order of megabytes....

> If a bio isn't discard
> aligned, what will the device do?

Up to the device.

> Further, why does the driver handle alignment/granularity
> if the device will ignore misaligned requests?

When you send a series of sequential unaligned requests, the device
may ignore them all. Hence you end up with nothing being discarded,
even though the entire range being discarded is much, much larger
than the discard granularity....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
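Dave's point can be made concrete with a small model. The sketch below assumes a hypothetical device that discards only the granules fully covered by a single request and ignores everything else; real devices may behave differently, and the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type (assumption) */

/* Granules fully covered by one request [start, start + len). */
static uint64_t whole_granules(sector_t start, sector_t len, uint32_t gran)
{
    assert(gran != 0);
    sector_t first = (start + gran - 1) / gran; /* first granule fully inside */
    sector_t last = (start + len) / gran;       /* one past the last one */
    return last > first ? last - first : 0;
}

/* Total granules discarded when the range is issued in fixed-size chunks. */
static uint64_t discarded_in_chunks(sector_t start, sector_t len,
                                    sector_t chunk, uint32_t gran)
{
    uint64_t total = 0;

    assert(chunk != 0);
    while (len > 0) {
        sector_t n = len < chunk ? len : chunk;

        total += whole_granules(start, n, gran);
        start += n;
        len -= n;
    }
    return total;
}
```

Under this model, with a 2048-sector granularity (1 MiB at 512-byte sectors), a 204800-sector range starting at sector 8 discards nothing when issued as 2048-sector chunks, because every chunk straddles a granule boundary. Issued as one request, the same range discards 99 granules.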
* Re: Problem with DISCARD and RAID5
From: Jens Axboe @ 2012-11-06 8:06 UTC
To: Dave Chinner; +Cc: Shaohua Li, NeilBrown, linux RAID, lkml

On 2012-11-05 22:48, Dave Chinner wrote:
> On Fri, Nov 02, 2012 at 09:40:58AM +0800, Shaohua Li wrote:
>> On Thu, Nov 01, 2012 at 05:38:54PM +1100, NeilBrown wrote:
>>>
>>> Hi Shaohua,
>>> I've been doing some testing and discovered a problem with your discard
>>> support for RAID5.
>>>
>>> The code in blkdev_issue_discard assumes that the 'granularity' is a power
>>> of 2, and for example subtracts 1 to get a mask.
>>>
>>> However RAID5 sets the granularity to be the stripe size which often is not
>>> a power of two. When this happens you can easily get into an infinite loop.
>>>
>>> I suspect that to make this work properly, blkdev_issue_discard will need to
>>> be changed to allow 'granularity' to be an arbitrary value.
>>> When it is a power of two, the current masking can be used.
>>> When it is anything else, it will need to use sector_div().
>>
>> Yep, looks like we need to use sector_div. And this isn't the only problem. Discard
>> requests can be merged, and the merge check only checks max_discard_sectors.
>> That means the split requests in blkdev_issue_discard can be merged again. The
>> split never works.
>>
>> I'm wondering what the purpose of discard_alignment and discard_granularity is. Are
>> there devices with discard_granularity not 1 sector?
>
> Most certainly. Thin provisioned storage often has granularity in the
> order of megabytes....

Can't really do too much about that...

>> If a bio isn't discard
>> aligned, what will the device do?
>
> Up to the device.

We should not send those down, if they are violating the restrictions
set by the driver.

>> Further, why does the driver handle alignment/granularity
>> if the device will ignore misaligned requests?
>
> When you send a series of sequential unaligned requests, the device
> may ignore them all. Hence you end up with nothing being discarded,
> even though the entire range being discarded is much, much larger
> than the discard granularity....

That's just tough luck, unfortunately. Shaohua, I'd suggest sending down
whatever discards you can, IFF they are aligned according to the
restrictions being set. If that ends up not discarding to devices that
have large alignment/size constraints, nothing we can do about that.

--
Jens Axboe
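One way to read Jens's suggestion is to trim each discard to the largest sub-range that satisfies the restrictions and drop the rest. The sketch below is an assumption about how that could look in user-space C, not code from the block layer; `struct discard_range` and `trim_to_aligned` are hypothetical names, and `align` models an offset in the spirit of discard_alignment.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type (assumption) */

struct discard_range {
    sector_t start;
    sector_t len; /* 0 means nothing aligned is left to send */
};

/*
 * Shrink [start, start + len) so that both ends satisfy
 * (sector % gran) == align. Works for any non-zero gran.
 */
static struct discard_range trim_to_aligned(sector_t start, sector_t len,
                                            uint32_t gran, uint32_t align)
{
    struct discard_range out = { 0, 0 };
    sector_t end = start + len;
    sector_t up, down;

    assert(gran != 0 && align < gran);

    up = (align + gran - start % gran) % gran;  /* sectors to skip at the front */
    down = (end % gran + gran - align) % gran;  /* sectors to drop at the back */

    if (down > end || end - down <= start + up)
        return out; /* no whole granule fits */

    out.start = start + up;
    out.len = end - down - out.start;
    return out;
}
```

For example, with gran 384 and align 0, the range starting at sector 1000 with length 1000 is trimmed to start 1152 with length 768, that is two whole granules. A range too small to contain an aligned granule comes back with len 0 and would simply not be sent.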
* Re: Problem with DISCARD and RAID5
From: Shaohua Li @ 2012-11-07 5:02 UTC
To: Jens Axboe; +Cc: Dave Chinner, NeilBrown, linux RAID, lkml

On Tue, Nov 06, 2012 at 09:06:16AM +0100, Jens Axboe wrote:
> On 2012-11-05 22:48, Dave Chinner wrote:
> > On Fri, Nov 02, 2012 at 09:40:58AM +0800, Shaohua Li wrote:
> >> On Thu, Nov 01, 2012 at 05:38:54PM +1100, NeilBrown wrote:
> >>>
> >>> Hi Shaohua,
> >>> I've been doing some testing and discovered a problem with your discard
> >>> support for RAID5.
> >>>
> >>> The code in blkdev_issue_discard assumes that the 'granularity' is a power
> >>> of 2, and for example subtracts 1 to get a mask.
> >>>
> >>> However RAID5 sets the granularity to be the stripe size which often is not
> >>> a power of two. When this happens you can easily get into an infinite loop.
> >>>
> >>> I suspect that to make this work properly, blkdev_issue_discard will need to
> >>> be changed to allow 'granularity' to be an arbitrary value.
> >>> When it is a power of two, the current masking can be used.
> >>> When it is anything else, it will need to use sector_div().
> >>
> >> Yep, looks like we need to use sector_div. And this isn't the only problem. Discard
> >> requests can be merged, and the merge check only checks max_discard_sectors.
> >> That means the split requests in blkdev_issue_discard can be merged again. The
> >> split never works.
> >>
> >> I'm wondering what the purpose of discard_alignment and discard_granularity is. Are
> >> there devices with discard_granularity not 1 sector?
> >
> > Most certainly. Thin provisioned storage often has granularity in the
> > order of megabytes....
>
> Can't really do too much about that...
>
> >> If a bio isn't discard
> >> aligned, what will the device do?
> >
> > Up to the device.
>
> We should not send those down, if they are violating the restrictions
> set by the driver.
>
> >> Further, why does the driver handle alignment/granularity
> >> if the device will ignore misaligned requests?
> >
> > When you send a series of sequential unaligned requests, the device
> > may ignore them all. Hence you end up with nothing being discarded,
> > even though the entire range being discarded is much, much larger
> > than the discard granularity....
>
> That's just tough luck, unfortunately. Shaohua, I'd suggest sending down
> whatever discards you can, IFF they are aligned according to the
> restrictions being set. If that ends up not discarding to devices that
> have large alignment/size constraints, nothing we can do about that.

So we have two problems here:
1. As Neil described, blkdev_issue_discard assumes alignment and granularity
are a power of 2. We can fix it with sector_div, for example.
2. Discard requests can be merged. The merge check currently ignores alignment
and granularity. So it's possible that unaligned requests are merged into an
aligned one, or that one aligned request and one unaligned request are merged
into an unaligned one. Just ignore unaligned requests, so that such merges
will not happen?
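The second problem can be sketched as a merge predicate that looks at more than max_discard_sectors. This is a hypothetical user-space illustration, not the block layer's merge code: the structs and `can_back_merge` are invented here, and the policy shown (merge only when both requests are already aligned) is just one possible answer to the question above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type (assumption) */

struct discard_limits {
    sector_t max_discard_sectors;
    uint32_t granularity; /* non-zero, not necessarily a power of two */
};

struct discard_req {
    sector_t start;
    sector_t len;
};

/* A request is aligned when it starts and ends on granule boundaries. */
static bool is_aligned(const struct discard_req *r,
                       const struct discard_limits *l)
{
    return r->start % l->granularity == 0 && r->len % l->granularity == 0;
}

/* Allow appending b to a only if the result still honours every limit. */
static bool can_back_merge(const struct discard_req *a,
                           const struct discard_req *b,
                           const struct discard_limits *l)
{
    if (a->start + a->len != b->start)
        return false; /* not contiguous */
    if (a->len + b->len > l->max_discard_sectors)
        return false; /* would undo the split done at submission */
    return is_aligned(a, l) && is_aligned(b, l);
}
```

Because both inputs must be aligned, the merged request is aligned too, so an aligned and an unaligned request can never combine into an unaligned one.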