* [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. @ 2021-03-23 7:52 Erwin van Londen 2021-03-25 16:07 ` Benjamin Block 0 siblings, 1 reply; 17+ messages in thread From: Erwin van Londen @ 2021-03-23 7:52 UTC (permalink / raw) To: dm-devel [-- Attachment #1.1: Type: text/plain, Size: 299 bytes --] Hello All, Just wondering if there were any plans to incorporate FPIN congestion/latency notifications in dm-multipath to disperse IO over non-affected paths. Regards, Erwin van Londen -- Kind regards, Erwin van Londen http://erwinvanlonden.net PGP key: http://erwinvanlonden.net/pgp-key-id/ [-- Attachment #1.2: Type: text/html, Size: 3702 bytes --] [-- Attachment #2: Type: text/plain, Size: 97 bytes --] -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-23 7:52 [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications Erwin van Londen @ 2021-03-25 16:07 ` Benjamin Block 2021-03-26 11:15 ` Muneendra Kumar M 0 siblings, 1 reply; 17+ messages in thread From: Benjamin Block @ 2021-03-25 16:07 UTC (permalink / raw) To: Erwin van Londen; +Cc: Muneendra, dm-devel

On Tue, Mar 23, 2021 at 05:52:33PM +1000, Erwin van Londen wrote:
> Hello All,
>
> Just wondering if there were any plans to incorporate FPIN congestion/latency notifications in dm-multipath to disperse IO over non-affected paths.

For what it's worth, general support in the kernel for a new path state in answer to existing FPIN notifications was added earlier this year:
https://lore.kernel.org/linux-scsi/1609969748-17684-1-git-send-email-muneendra.kumar@broadcom.com/T/

But this only adds a new port state and support for it in one particular driver (lpfc). I'm not aware of any other driver supporting this new state yet, but I might have missed it. Also, the port state is not set in the kernel, but has to be set by something external, unlike with RSCNs, where we set the state in the kernel.

What it does, once a path is set into the 'Marginal' state, is to not retry commands on the same shaky path once they have already failed one time.

As far as dm-multipath is concerned, I asked about that as well when this patch series was developed:
https://lore.kernel.org/linux-scsi/20201002162633.GA8365@t480-pf1aa2c2/

Hannes answered that in the thread:
https://lore.kernel.org/linux-scsi/ca995d96-608b-39b9-8ded-4a6dd7598660@suse.de/

Not sure what happened in between; I haven't seen anything on the mpath topic yet.

--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen / Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
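To make the "has to be set by something external" point concrete, here is a minimal sketch of what such an external helper might do. It assumes that the patch series linked above exposes the state through the rport's `port_state` sysfs attribute and that the strings "Marginal" and "Online" are accepted there; both the path layout and the accepted values should be checked against the running kernel. The `sysfs_root` parameter and the function name are purely illustrative and exist so the helper can be exercised against a scratch directory.

```python
from pathlib import Path

# Assumption: states an external tool may write to an rport's port_state
# attribute after the patch series referenced above. Verify on your kernel.
WRITABLE_STATES = ("Marginal", "Online")


def set_rport_state(rport: str, state: str,
                    sysfs_root: str = "/sys/class/fc_remote_ports") -> Path:
    """Write `state` to <sysfs_root>/<rport>/port_state and return that path.

    `rport` is a remote-port directory name such as "rport-1:0-2"
    (hypothetical example name).
    """
    if state not in WRITABLE_STATES:
        raise ValueError(f"unsupported port state: {state!r}")
    attr = Path(sysfs_root) / rport / "port_state"
    if not attr.exists():
        raise FileNotFoundError(attr)
    attr.write_text(state + "\n")
    return attr
```

On a real system this needs root privileges; writing "Online" again would be the way to clear the marginal state once the fabric has recovered.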
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-25 16:07 ` Benjamin Block @ 2021-03-26 11:15 ` Muneendra Kumar M 2021-03-31 0:22 ` Erwin van Londen 0 siblings, 1 reply; 17+ messages in thread From: Muneendra Kumar M @ 2021-03-26 11:15 UTC (permalink / raw) To: Benjamin Block, Erwin van Londen; +Cc: dm-devel

Hi Benjamin,
My replies are below.

On Tue, Mar 23, 2021 at 05:52:33PM +1000, Erwin van Londen wrote:
>> Hello All,
>>
>> Just wondering if there were any plans to incorporate FPIN congestion/latency notifications in dm-multipath to disperse IO over non-affected paths.
>
> For whats worth, general support in Kernel for a new path state in answer to existing FPIN notifications was added earlier this year:
> https://lore.kernel.org/linux-scsi/1609969748-17684-1-git-send-email-muneendra.kumar@broadcom.com/T/
>
> But this only adds a new port-state and support of it for one particular driver (lpfc). Not aware of any other driver supporting this new state yet, but I might have missed it. Also, the port-state is not set in kernel, but has to be set by something external, unlike with RSCNs, where we set the state in the kernel.

We had a discussion with Marvell, and they are adding the support in their (qlaxx) driver.

> What it does, once a path is set into 'Marginal' state, is to not retry commands on the same shaky path, once it already failed one time already.

Yes.

> As far as dm-multipath is concerned, I asked that as well when this patch series was developed:
> https://lore.kernel.org/linux-scsi/20201002162633.GA8365@t480-pf1aa2c2/
> Hannes answered that in the thread:
> https://lore.kernel.org/linux-scsi/ca995d96-608b-39b9-8ded-4a6dd7598660@suse.de/
> Not sure what happened in between, didn't see anything on the mpath topic yet.

As Hannes mentioned in his reply, we have an external daemon called fctxpd which acts on FPIN-LI events and moves the path into the marginal path group as well as setting the port state to marginal. This daemon is part of epel8. The repository with our changes is here:
https://github.com/brocade/bsn-fc-txptd

The above code was reviewed by Benjamin Marzinski from Red Hat.

Note: The latest release, which adds support for setting the port state to marginal, will be available on epel8 in a week's time.

As we have all the support in the kernel for FPIN registration, notifications, and also setting the port_state to marginal, we had an initial discussion with Hannes about adding FPIN-based native support in dm multipathd for FPIN Congestion/Latency notifications. I will take the initiative and start the discussion with Benjamin Marzinski and get this work done with the help of Hannes.

Regards,
Muneendra.

-- This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-26 11:15 ` Muneendra Kumar M @ 2021-03-31 0:22 ` Erwin van Londen 2021-03-31 7:25 ` Hannes Reinecke 0 siblings, 1 reply; 17+ messages in thread From: Erwin van Londen @ 2021-03-31 0:22 UTC (permalink / raw) To: Muneendra Kumar M, Benjamin Block; +Cc: dm-devel

Hello Muneendra, Benjamin,

The FPIN mechanisms that have been developed offer a whole plethora of options, and they are not mainly about putting paths into a marginal state. The mpio layer could utilise the various triggers, like congestion and latency, and not just use a marginal state as the decisive point. If a path is somewhat congested, the amount of IO dispersed over that path could be reduced by a flexible margin, depending on how often and which FPINs are actually received. If, for instance, an FPIN is received saying that an upstream port is throwing physical errors, you may exclude that path entirely from having IOs queued to it. If it is a latency-related problem where credit shortages come into play, you may just need to queue very small IOs to it. The SCSI CDB will tell the size of the IO. Congestion notifications could be used to add an artificial delay, to reduce the workload on these paths and schedule IOs on another path.

I'm not really sure what the possibilities are from a DM-Multipath viewpoint, but I feel that if the OS options are not properly aligned with what the FC protocol and HBA drivers are able to provide, we may miss a good opportunity to optimize the dispersion of IOs and improve overall performance.
Regards,
Erwin
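The "flexible margin" idea in the message above can be sketched as a per-path weight that shrinks with each FPIN and slowly recovers when none arrive. This is only an illustration of the proposal, not anything dm-multipath implements: the class, the event names, and the penalty values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fraction of the current weight removed per received FPIN.
# A link-integrity event (physical errors) excludes the path outright.
PENALTY = {
    "link_integrity": 1.0,
    "delivery": 0.5,
    "congestion": 0.25,
    "peer_congestion": 0.1,
}


@dataclass
class PathWeight:
    name: str
    weight: float = 1.0  # 1.0 = full share, 0.0 = no IO queued to this path

    def on_fpin(self, kind: str) -> None:
        """Reduce the weight according to the kind of FPIN received."""
        self.weight = max(0.0, self.weight * (1.0 - PENALTY[kind]))

    def recover(self, step: float = 0.1) -> None:
        """Called periodically while no FPINs arrive for this path."""
        self.weight = min(1.0, self.weight + step)


def shares(paths):
    """Return the fraction of IO each path should receive."""
    total = sum(p.weight for p in paths)
    if total == 0:
        return {p.name: 0.0 for p in paths}
    return {p.name: p.weight / total for p in paths}
```

Frequent notifications compound, so a path that keeps receiving congestion FPINs drifts towards zero share, while a single event only trims it.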
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-31 0:22 ` Erwin van Londen @ 2021-03-31 7:25 ` Hannes Reinecke 2021-03-31 8:12 ` Erwin van Londen 2021-03-31 9:57 ` Martin Wilck 0 siblings, 2 replies; 17+ messages in thread From: Hannes Reinecke @ 2021-03-31 7:25 UTC (permalink / raw) To: Erwin van Londen, Muneendra Kumar M, Benjamin Block Cc: dm-devel, Martin Wilck

Hi Erwin,

On 3/31/21 2:22 AM, Erwin van Londen wrote:
> Hello Muneendra, benjamin,
>
> The fpin options that are developed do have a whole plethora of options and do not mainly trigger paths being in a marginal state. The mpio layer could utilise the various triggers like congestion and latency and not just use a marginal state as a decisive point. If a path is somewhat congested the amount of io's dispersed over these paths could just be reduced by a flexible margin depending on how often and which fpins are actually received. If for instance an fpin is received that an upstream port is throwing physical errors you may exclude it entirely from queueing IO's to it. If it is a latency related problem where credit shortages come in play you may just need to queue very small IO's to it. The scsi CDB will tell the size of the IO. Congestion notifications may just be used for potentially adding an artificial delay to reduce the workload on these paths and schedule them on another.
>
As correctly noted, FPINs come with a variety of options. And I'm not certain we can do everything correctly; a degraded path is simple, but for congestion there is only _so_ much we can do. The typical cause for congestion is, say, a 32G host port talking to a 16G (or even 8G) target port _and_ a 32G target port.

So the host cannot 'tune down' its link to 8G; doing so would impact performance on the 32G target port. (And we would suffer reverse congestion whenever that target port sends frames.)

And throttling things on the SCSI layer only helps _so_ much, as the real congestion is due to the speed with which the frames are sequenced onto the wire. Which is not something we from the OS can control.

From another POV this is arguably a fabric mis-design; so it _could_ be alleviated by separating out the ports with lower speeds into their own zone (or even onto a separate SAN); that would trivially make the congestion go away.

But for that the admin first should be _alerted_, and this really is my primary goal: having FPINs show up in the message log, to alert the admin that the fabric is not performing well.

A second step will be massaging FPINs into DM multipath, and having them influence the path priority or path status. But how this could best be integrated is currently under discussion.

> Not really sure what the possibilities are from a DM-Multipath viewpoint, but I feel if the OS options are not properly aligned with what the FC protocol and HBA drivers are able to provide we may miss a good opportunity to optimize the dispersion of IO's and improve overall performance.
>
Looking at the size of the commands is one possibility, but at this time this presumes too much about how we _think_ FPINs will be generated. I'd rather do some more tests to figure out under which circumstances we can expect which type of FPINs, and then start looking for ways to integrate them.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-31 7:25 ` Hannes Reinecke @ 2021-03-31 8:12 ` Erwin van Londen 2021-03-31 9:57 ` Martin Wilck 1 sibling, 0 replies; 17+ messages in thread From: Erwin van Londen @ 2021-03-31 8:12 UTC (permalink / raw) To: Hannes Reinecke, Muneendra Kumar M, Benjamin Block Cc: dm-devel, Wilck, Martin [-- Attachment #1.1: Type: text/plain, Size: 4360 bytes --] Hello Hannes, Thanks for responding. On Wed, 2021-03-31 at 09:25 +0200, Hannes Reinecke wrote: > Hi Erwin, > > On 3/31/21 2:22 AM, Erwin van Londen wrote: > > Hello Muneendra, benjamin, > > > > The fpin options that are developed do have a whole plethora of > > options > > and do not mainly trigger paths being in a marginal state. Th mpio > > layer > > could utilise the various triggers like congestion and latency and > > not > > just use a marginal state as a decisive point. If a path is > > somewhat > > congested the amount of io's dispersed over these paths could just > > be > > reduced by a flexible margin depending on how often and which fpins > > are > > actually received. If for instance and fpin is recieved that an > > upstream > > port is throwing physical errors you may exclude is entirely from > > queueing IO's to it. If it is a latency related problem where > > credit > > shortages come in play you may just need to queue very small IO's > > to it. > > The scsi CDB will tell the size of the IO. Congestion notifications > > may > > just be used for potentially adding an artificial delay to reduce > > the > > workload on these paths and schedule them on another. > > > As correctly noted, FPINs come with a variety of options. > And I'm not certain we can everything correctly; a degraded path is > simple, but for congestion there is only _so_ much we can do. > The typical cause for congestion is, say, a 32G host port talking to > a > 16G (or even 8G) target port _and_ a 32G target port. 
Congestion can also be caused by a change in workload characteristics where, for example, read and write workload start interfering. The funnel principle would not apply in that case. > > So the host cannot 'tune down' it's link to 8G; doing so would impact > performance on the 32G target port. > (And we would suffer reverse congestion whenever that target port > sends > frames). > > And throttling things on the SCSI layer only helps _so_ much, as the > real congestion is due to the speed with which the frames are > sequenced > onto the wire. Which is not something we from the OS can control. If you can interleave IOs with an artificial delay depending on the type and frequency these FPINS arrive you would be able to prevent latency buildup in the san. > > From another POV this is arguably a fabric mis-design; so it _could_ > be > alleviated by separating out the ports with lower speeds into its own > zone (or even on a separate SAN); that would trivially make the > congestion go away. The entire FPIN concept was designed to be able to provide clients with the option to respond and react to changing behaviours in sans. A mis- design is often not really the case but ongoing changes and continuous provisioning is mainly contributing to the case. > > But for that the admin first should be _alerted_, and this really is > my > primary goal: having FPINs showing up in the message log, to alert > the > admin that his fabric is not performing well. I think the FC drivers are already having facilities to do that or they will have that shortly. dm-multipath is not really required to handle the notifications but would be useful if actions have been done based on fpins. > > A second step will be to massaging FPINs into DM multipath, and have > it > influencing the path priority or path status. But this is currently > under discussion how it could be integrated best. 
OK > > > Not really sure what the possibilities are from a DM-Multipath > > viewpoint, but I feel if the OS options are not properly aligned > > with > > what the FC protocol and HBA drivers are able to provide we may > > miss a > > good opportunity to optimize the dispersion of IO's and improve > > overall > > performance. > > > Looking at the size of the commands is one possibility, but at this > time > this presumes too much on how we _think_ FPINs will be generated. > I'd rather do some more tests to figure out under which circumstances > we > can expect which type of FPINs, and then start looking for ways on > how > to integrate them. The FC protocol only describes the framework and not the values that need to be adhered to. That depends on the end devices and their capabilities. > > Cheers, > > Hannes [-- Attachment #1.2: Type: text/html, Size: 6076 bytes --] [-- Attachment #2: Type: text/plain, Size: 97 bytes --] -- dm-devel mailing list dm-devel@redhat.com https://listman.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 17+ messages in thread
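The "artificial delay depending on the type and frequency these FPINs arrive" suggestion above could look roughly like the sliding-window sketch below. It is a toy model only: the class name, window length, step size, and cap are made-up parameters, and whether a host-side delay actually relieves fabric congestion is exactly the point under debate in this thread.

```python
from collections import deque


class FpinDelay:
    """Derive an inter-IO delay from how many FPINs arrived recently."""

    def __init__(self, window_s: float = 60.0, step_us: int = 50,
                 max_us: int = 1000):
        self.window_s = window_s   # how long an FPIN keeps influencing the delay
        self.step_us = step_us     # extra delay per FPIN still inside the window
        self.max_us = max_us       # upper bound so a path is never starved
        self.events = deque()      # arrival timestamps, oldest first

    def record(self, now: float) -> None:
        """Note that an FPIN for this path arrived at time `now` (seconds)."""
        self.events.append(now)

    def delay_us(self, now: float) -> int:
        """Delay in microseconds to insert before the next IO on this path."""
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        return min(self.max_us, len(self.events) * self.step_us)
```

Once notifications stop, the old timestamps age out of the window and the delay falls back to zero without any explicit reset.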
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-31 7:25 ` Hannes Reinecke 2021-03-31 8:12 ` Erwin van Londen @ 2021-03-31 9:57 ` Martin Wilck 2021-03-31 10:48 ` Muneendra Kumar M 1 sibling, 1 reply; 17+ messages in thread From: Martin Wilck @ 2021-03-31 9:57 UTC (permalink / raw) To: erwin@erwinvanlonden.net, muneendra.kumar@broadcom.com, bblock@linux.ibm.com, hare@suse.de Cc: dm-devel@redhat.com On Wed, 2021-03-31 at 09:25 +0200, Hannes Reinecke wrote: > Hi Erwin, > > On 3/31/21 2:22 AM, Erwin van Londen wrote: > > Hello Muneendra, benjamin, > > > > The fpin options that are developed do have a whole plethora of > > options > > and do not mainly trigger paths being in a marginal state. Th mpio > > layer > > could utilise the various triggers like congestion and latency and > > not > > just use a marginal state as a decisive point. If a path is > > somewhat > > congested the amount of io's dispersed over these paths could just > > be > > reduced by a flexible margin depending on how often and which fpins > > are > > actually received. If for instance and fpin is recieved that an > > upstream > > port is throwing physical errors you may exclude is entirely from > > queueing IO's to it. If it is a latency related problem where > > credit > > shortages come in play you may just need to queue very small IO's > > to it. > > The scsi CDB will tell the size of the IO. Congestion notifications > > may > > just be used for potentially adding an artificial delay to reduce > > the > > workload on these paths and schedule them on another. > > > As correctly noted, FPINs come with a variety of options. > And I'm not certain we can everything correctly; a degraded path is > simple, but for congestion there is only _so_ much we can do. > The typical cause for congestion is, say, a 32G host port talking to > a > 16G (or even 8G) target port _and_ a 32G target port. 
>
> So the host cannot 'tune down' it's link to 8G; doing so would impact performance on the 32G target port. (And we would suffer reverse congestion whenever that target port sends frames).
>
> And throttling things on the SCSI layer only helps _so_ much, as the real congestion is due to the speed with which the frames are sequenced onto the wire. Which is not something we from the OS can control.
>
> From another POV this is arguably a fabric mis-design; so it _could_ be alleviated by separating out the ports with lower speeds into its own zone (or even on a separate SAN); that would trivially make the congestion go away.
>
> But for that the admin first should be _alerted_, and this really is my primary goal: having FPINs showing up in the message log, to alert the admin that his fabric is not performing well.
>
> A second step will be to massaging FPINs into DM multipath, and have it influencing the path priority or path status. But this is currently under discussion how it could be integrated best.

If there was any discussion, I haven't been involved :-)

I haven't looked into FPIN much so far. I'm rather sceptical about its usefulness for dm-multipath. Being a property of FC-2, FPIN works at least 2 layers below dm-multipath. dm-multipath is agnostic of protocol and transport properties by design. User space multipathd can cross these layers and tune dm-multipath based on lower-level properties, but such actions have rather large latencies.

As you know, dm-multipath has 3 switches for routing IO via different paths:

 1) priority groups,
 2) path status (good / failed),
 3) path selector algorithm.

1) and 2) are controlled by user space, and have high latency.

The current "marginal" concept in multipathd watches paths for repeated failures, and configures the kernel to avoid using paths that are considered marginal, using methods 1) and 2). This is a very-high-latency algorithm that changes state on the time scale of minutes. There is no concept for "delaying" or "pausing" IO on paths on a short time scale.

The only low-latency mechanism is 3). But it's block level; no existing selector looks at transport-level properties.

That said, I can quite well imagine a feedback mechanism based on throttling or delays applied in the FC drivers. For example, if a remote port was throttled by the driver in response to FPIN messages, its bandwidth would decrease, and a path selector like "service-time" would automatically assign less IO to such paths. This wouldn't need any changes in dm-multipath or multipath-tools; it would work entirely on the FC level.

Talking about improving the current "marginal" algorithm in multipathd, and knowing that it's slow, FPIN might provide additional data that would be good to have. Currently, multipathd only has 2 inputs: "good<->bad" state transitions based either on kernel I/O errors or path checker results, and failure statistics from multipathd's internal "io_err_stat" thread, which only reads sector 0. This could obviously be improved, but there may actually be lower-hanging fruit than evaluating FPIN notifications (for example, I've pondered utilizing the kernel's blktrace functionality to detect unusually long IO latencies or bandwidth drops).

Talking about FPIN, is it planned to notify user space about such fabric events, and if yes, how?

Thanks,
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Software Solutions Germany GmbH HRB 36809, AG Nürnberg GF: Felix Imendörffer
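The feedback effect described above (a throttled port drains more slowly, so a selector that looks at outstanding work sends it less IO) can be illustrated with a small simulation. This is a simplified model of the idea behind a "service-time" style selector, namely picking the path with the smallest (in-flight bytes + new IO size) / relative throughput; it is not the kernel implementation, and the tick-based drain model and all numbers are assumptions made for the example.

```python
def pick_path(in_flight, throughput, io_size):
    """Return the path with the smallest estimated service time."""
    return min(in_flight,
               key=lambda p: (in_flight[p] + io_size) / throughput[p])


def simulate(drain_per_tick, ticks=1000, io_size=4096, ios_per_tick=4):
    """Dispatch IOs over paths that complete `drain_per_tick[p]` bytes per tick.

    The selector is NOT told about the throttling: every path has the same
    relative throughput. It only sees in-flight bytes piling up.
    """
    in_flight = {p: 0 for p in drain_per_tick}
    throughput = {p: 1 for p in drain_per_tick}
    assigned = {p: 0 for p in drain_per_tick}
    for _ in range(ticks):
        for _ in range(ios_per_tick):
            p = pick_path(in_flight, throughput, io_size)
            in_flight[p] += io_size
            assigned[p] += 1
        # Completions: each path retires up to its per-tick drain rate.
        for p, rate in drain_per_tick.items():
            in_flight[p] = max(0, in_flight[p] - rate)
    return assigned
```

Calling `simulate({"healthy": 16384, "throttled": 4096})` shows the throttled path ending up with a clearly smaller share of the dispatched IOs, even though nothing above the driver knows why that path is slow.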
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications. 2021-03-31 9:57 ` Martin Wilck @ 2021-03-31 10:48 ` Muneendra Kumar M 2021-03-31 11:45 ` Martin Wilck 2021-04-01 2:48 ` Erwin van Londen 0 siblings, 2 replies; 17+ messages in thread From: Muneendra Kumar M @ 2021-03-31 10:48 UTC (permalink / raw) To: Martin Wilck, erwin, bblock, hare; +Cc: dm-devel

Hi Martin,
Below are my replies.

> If there was any discussion, I haven't been involved :-)
>
> I haven't looked into FPIN much so far. I'm rather sceptic with it's usefulness for dm-multipath. Being a property of FC-2, FPIN works at least 2 layers below dm-multipath. dm-multipath is agnostic against protocol and transport properties by design. User space multipathd can cross these layers and tune dm-multipath based on lower-level properties, but such actions have rather large latencies.
>
> As you know, dm-multipath has 3 switches for routing IO via different paths:
>
>  1) priority groups,
>  2) path status (good / failed)
>  3) path selector algorithm
>
> 1) and 2) are controlled by user space, and have high latency.
>
> The current "marginal" concept in multipathd watches paths for repeated failures, and configures the kernel to avoid using paths that are considered marginal, using methods 1) and 2). This is a very-high-latency algorithm that changes state on the time scale of minutes. There is no concept for "delaying" or "pausing" IO on paths on short time scale.
>
> The only low-latency mechanism is 3). But it's block level, no existing selector looks at transport-level properties.
>
> That said, I can quite well imagine a feedback mechanism based on throttling or delays applied in the FC drivers. For example, if a remote port was throttled by the driver in response to FPIN messages, it's bandwidth would decrease, and a path selector like "service-time" would automatically assign less IO to such paths. This wouldn't need any changes in dm-multipath or multipath-tools, it would work entirely on the FC level.

[Muneendra] Agreed.

> Talking about improving the current "marginal" algorithm in multipathd, and knowing that it's slow, FPIN might provide additional data that would be good to have. Currently, multipathd only has 2 inputs, "good<->bad" state transitions based either on kernel I/O errors or path checker results, and failure statistics from multipathd's internal "io_err_stat" thread, which only reads sector 0. This could obviously be improved, but there may actually be lower-hanging fruit than evaluating FPIN notifications (for example, I've pondered utilizing the kernel's blktrace functionality to detect unusually long IO latencies or bandwidth drops).
>
> Talking about FPIN, is it planned to notify user space about such fabric events, and if yes, how?

[Muneendra] Yes. FC drivers, when receiving FC FPIN ELSs, call a SCSI transport routine with the FPIN payload. The transport pushes this as an "event" via netlink. An app bound to the local address used by the SCSI transport can receive the event and parse it.

Benjamin has added a marginal path group (multipath marginal pathgroups) to dm-multipath:
https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/

One of the intentions of Benjamin's patch (support for marginal paths) is to support the FPIN events we receive from the fabric. On receiving an FPIN-LI, our intention was to place all the affected paths into the marginal path group.

Below are the 4 types of descriptors returned in an FPIN:

• Link Integrity (LI): some error on a link that affected frames. This is the main one for a "flaky path".
• Delivery Notification (DN): something explicitly knew about a dropped frame and is reporting it. Usually, something like a CRC error means you can't trust the frame header, so it's an LI error. But if you do have a valid frame and drop it, for example because of a fabric edge timer (don't queue it for more than 250-600ms), then it becomes a DN type. Could be a flaky path, but not necessarily.
• Congestion (CN): the fabric is saying it's congested sending to "your" port. If a host receives it, the fabric is saying it has more frames for the host than the host is pulling in, so it's backing up the fabric. What should happen is that the load from the host is lowered, but that is across all targets, and perhaps not all targets are in the mpio path list.
• Peer Congestion (PCN): this goes along with CN, in that the fabric is now telling the other devices in the zone that send traffic to the congested port that this port is backing up. The idea is that these peers send less load to the congested port. This shouldn't really tie into mpio. Some of the current thinking is that targets could see this and reduce their transmission rate to a host down to the link speed of the host.

On receiving congestion notifications, our intention is to slow down the workload from the host gradually until it stops receiving the congestion notifications. We still need to validate how we can achieve this reduction in workload with the help of dm-multipath.

As Hannes mentioned in his earlier mail, our primary goal is that the admin should first be _alerted_: having FPINs show up in the message log, to alert the admin that the fabric is not performing well.

Regards,
Muneendra.
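As a companion to the four descriptor types listed above, the following sketch walks an FPIN ELS payload and names the descriptors it contains. The layout (one command byte, three zero bytes, a big-endian 32-bit total descriptor length, then a sequence of tag/length/value descriptors) and the tag values are my reading of the kernel's `include/uapi/scsi/fc/fc_els.h`; treat them as assumptions and check the header before relying on them. The function name and the returned labels are made up for this example.

```python
import struct

# Assumed values, taken from my reading of the kernel uapi header fc_els.h.
ELS_FPIN = 0x16
FPIN_DESC = {
    0x00020001: "link_integrity",   # LI
    0x00020002: "delivery",         # DN
    0x00020003: "peer_congestion",  # PCN
    0x00020004: "congestion",       # CN
}


def fpin_descriptor_types(payload: bytes):
    """Return the descriptor type names found in an FPIN ELS payload."""
    if len(payload) < 8:
        raise ValueError("payload too short for an FPIN header")
    # Header: 1 command byte, 3 reserved bytes, 32-bit length of all descriptors.
    cmd, total_len = struct.unpack_from(">B3xI", payload, 0)
    if cmd != ELS_FPIN:
        raise ValueError(f"not an FPIN ELS (command 0x{cmd:02x})")
    types = []
    off, end = 8, min(len(payload), 8 + total_len)
    while off + 8 <= end:
        # Each descriptor: 32-bit tag, 32-bit length of the value that follows.
        tag, dlen = struct.unpack_from(">II", payload, off)
        types.append(FPIN_DESC.get(tag, f"unknown(0x{tag:08x})"))
        off += 8 + dlen
    return types
```

A consumer following the intentions stated in the message would then move the affected paths into the marginal path group for `link_integrity`, lower the host's load for `congestion`, and largely leave `peer_congestion` to the peers.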
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-03-31 11:45 UTC
To: erwin@erwinvanlonden.net, muneendra.kumar@broadcom.com, bblock@linux.ibm.com, hare@suse.de
Cc: dm-devel@redhat.com

On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
> > Talking about FPIN, is it planned to notify user space about such
> > fabric events, and if yes, how?
>
> [Muneendra] Yes. FC drivers, when receiving FC FPIN ELS's, are calling
> a scsi transport routine with the FPIN payload. The transport is
> pushing this as an "event" via netlink. An app bound to the local
> address used by the scsi transport can receive the event and parse it.
>
> Benjamin has added a marginal_path group (multipath marginal
> pathgroups) in the dm-multipath.
> https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
>
> One of the intentions of Benjamin's patch (support for marginal path)
> is to support the FPIN events we receive from the fabric.
> On receiving the fpin-li our intention was to place all the paths that
> are affected into the marginal path group.

I'm aware of Ben's work, but I hadn't realized it had anything to do with FPIN. As of today, multipathd doesn't listen on the NETLINK_SCSITRANSPORT socket. Does any user space tool do this? Google didn't show me anything.

Regards,
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Hannes Reinecke @ 2021-03-31 11:53 UTC
To: Martin Wilck, erwin@erwinvanlonden.net, muneendra.kumar@broadcom.com, bblock@linux.ibm.com
Cc: dm-devel@redhat.com

On 3/31/21 1:45 PM, Martin Wilck wrote:
> On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
>>> Talking about FPIN, is it planned to notify user space about such
>>> fabric events, and if yes, how?
>>
>> [Muneendra] Yes. FC drivers, when receiving FC FPIN ELS's, are calling
>> a scsi transport routine with the FPIN payload. The transport is
>> pushing this as an "event" via netlink. An app bound to the local
>> address used by the scsi transport can receive the event and parse it.
>>
>> Benjamin has added a marginal_path group (multipath marginal
>> pathgroups) in the dm-multipath.
>> https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
>>
>> One of the intentions of Benjamin's patch (support for marginal path)
>> is to support the FPIN events we receive from the fabric.
>> On receiving the fpin-li our intention was to place all the paths that
>> are affected into the marginal path group.
>
> I'm aware of Ben's work, but I hadn't realized it had anything to do
> with FPIN. As of today, multipathd doesn't listen on the
> NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
> Google didn't show me anything.

I did, once, but that was years ago.

Cheers,

Hannes
--
Dr. Hannes Reinecke, Kernel Storage Architect
hare@suse.de, +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Muneendra Kumar M @ 2021-03-31 11:57 UTC
To: Hannes Reinecke, Martin Wilck, erwin, bblock
Cc: dm-devel

Hi Martin,

>> I'm aware of Ben's work, but I hadn't realized it had anything to do
>> with FPIN. As of today, multipathd doesn't listen on the
>> NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
>> Google didn't show me anything.
>
> I did, once, but that was years ago.

We have a user space daemon called fctxpd (Broadcom's Fiber Channel Transport Daemon, the one Benjamin mentioned in his patch). It acts on fpin-li events by listening on the NETLINK_SCSITRANSPORT socket, and it moves the affected path into the marginal path group on receiving FPIN events.
This daemon is part of epel8.
The source, including our changes, is available here:
https://github.com/brocade/bsn-fc-txptd

Regards,
Muneendra.
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-03-31 12:41 UTC
To: erwin@erwinvanlonden.net, muneendra.kumar@broadcom.com, bblock@linux.ibm.com, hare@suse.de
Cc: dm-devel@redhat.com

On Wed, 2021-03-31 at 17:27 +0530, Muneendra Kumar M wrote:
> Hi Martin,
>
> > > I'm aware of Ben's work, but I hadn't realized it had anything to
> > > do with FPIN. As of today, multipathd doesn't listen on the
> > > NETLINK_SCSITRANSPORT socket. Does any user space tool do this?
> > > Google didn't show me anything.
> >
> > I did, once, but that was years ago.
>
> We have user space daemon (Broadcom's Fiber Channel Transport Daemon)
> called fctxpd (Benjamin was talking in his patch) which acts on
> fpin-li events by listening on NETLINK_SCSITRANSPORT socket.
> And it sets the path to marginal path group on receiving FPIN events.
> This daemon is part of epel8.
> Below is the path for the same where we have changes
> https://github.com/brocade/bsn-fc-txptd
>
> Regards,
> Muneendra.

I see, and this daemon uses multipathd's "set marginal" command to make multipathd act on it.

I can see now that Ben talked about "Broadcom's Fiber Channel Transport Daemon" back then, but he didn't go into details, and I either overlooked it entirely or forgot about it. I recall wondering whether the "set marginal" command had any use other than manual testing.

I wonder if we could / should incorporate this functionality into multipathd itself. But anyway, it seems that this part of the FPIN mechanism works already.

Thanks
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer
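The mechanism described in the messages above, where a user space tool binds to the NETLINK_SCSITRANSPORT socket and parses FC events, might look roughly like the sketch below. This is an illustration under stated assumptions, not the fctxpd implementation: the multicast group bit, the FPIN event code, and the struct layout are quoted from memory of the kernel headers (`scsi_netlink_fc.h`, `scsi_transport_fc.h`) and should be verified before use. The function names are hypothetical.

```python
import socket
import struct

NETLINK_SCSITRANSPORT = 18       # netlink protocol number from linux/netlink.h
SCSI_NL_GRP_FC_EVENTS = 1 << 2   # assumed multicast group bit for FC events
FCH_EVT_LINK_FPIN = 0x501        # assumed event code for FPIN events

# nlmsghdr: length, type, flags, sequence, port id
NLMSG_HDR = struct.Struct("=IHHII")
# Assumed layout of scsi_nl_hdr followed by the fixed part of fc_nl_event:
# version, transport, magic, msgtype, msglen,
# seconds, vendor_id, host_no, event_datalen, event_num, event_code
FC_EVENT = struct.Struct("=BBHHHQQHHII")


def parse_fc_event(msg: bytes):
    """Parse one netlink message into a dict, or return None if too short."""
    if len(msg) < NLMSG_HDR.size + FC_EVENT.size:
        return None
    fields = FC_EVENT.unpack_from(msg, NLMSG_HDR.size)
    host_no, datalen, event_num, event_code = fields[7:11]
    start = NLMSG_HDR.size + FC_EVENT.size
    return {
        "host_no": host_no,
        "event_num": event_num,
        "event_code": event_code,
        "data": msg[start:start + datalen],
    }


def listen():
    """Block on the FC event group and print FPIN events (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         NETLINK_SCSITRANSPORT)
    sock.bind((0, SCSI_NL_GRP_FC_EVENTS))  # (port id, multicast group mask)
    while True:
        event = parse_fc_event(sock.recv(65535))
        if event and event["event_code"] == FCH_EVT_LINK_FPIN:
            print("FPIN on host%d, %d payload bytes"
                  % (event["host_no"], len(event["data"])))
```

Only `listen()` touches the socket, so `parse_fc_event()` can be exercised with captured or synthetic buffers. Acting on the event, for example by asking multipathd to mark a path marginal, is deliberately left out.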
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-04-01 2:48 UTC
To: Muneendra Kumar M, Martin Wilck, bblock, hare
Cc: dm-devel

Hello Muneendra

On Wed, 2021-03-31 at 16:18 +0530, Muneendra Kumar M wrote:
> Hi Martin,
> Below are my replies.
>
> > If there was any discussion, I haven't been involved :-)
> >
> > I haven't looked into FPIN much so far. I'm rather sceptical about
> > its usefulness for dm-multipath. Being a property of FC-2, FPIN works
> > at least 2 layers below dm-multipath. dm-multipath is agnostic
> > against protocol and transport properties by design. User space
> > multipathd can cross these layers and tune dm-multipath based on
> > lower-level properties, but such actions have rather large latencies.
> >
> > As you know, dm-multipath has 3 switches for routing IO via
> > different paths:
> >
> > 1) priority groups,
> > 2) path status (good / failed)
> > 3) path selector algorithm
> >
> > 1) and 2) are controlled by user space, and have high latency.
> >
> > The current "marginal" concept in multipathd watches paths for
> > repeated failures, and configures the kernel to avoid using paths
> > that are considered marginal, using methods 1) and 2). This is a
> > very-high-latency algorithm that changes state on the time scale of
> > minutes. There is no concept for "delaying" or "pausing" IO on paths
> > on short time scale.
> >
> > The only low-latency mechanism is 3). But it's block level, no
> > existing selector looks at transport-level properties.
> >
> > That said, I can quite well imagine a feedback mechanism based on
> > throttling or delays applied in the FC drivers. For example, if a
> > remote port was throttled by the driver in response to FPIN messages,
> > its bandwidth would decrease, and a path selector like "service-time"
> > would automatically assign less IO to such paths. This wouldn't need
> > any changes in dm-multipath or multipath-tools, it would work
> > entirely on the FC level.
>
> [Muneendra] Agreed.

I think the only way the FC drivers can respond to this is by delaying the R_RDY primitives, resulting in fewer credits being available for the remote side to use. That only works on the link layer and not fabric wide. It cannot change link speed at all, as that would bounce a port and result in all sorts of state changes. That being said, this is already the existing behavior and not really tied to fpins. The goal of the fpin method was to provide a more proactive method and inform the OS layer of fabric issues so it could act upon them by adjusting the IO profile.

> > Talking about improving the current "marginal" algorithm in
> > multipathd, and knowing that it's slow, FPIN might provide additional
> > data that would be good to have. Currently, multipathd only has 2
> > inputs, "good<->bad" state transitions based either on kernel I/O
> > errors or path checker results, and failure statistics from
> > multipathd's internal "io_err_stat" thread, which only reads sector
> > 0. This could obviously be improved, but there may actually be
> > lower-hanging fruit than evaluating FPIN notifications (for example,
> > I've pondered utilizing the kernel's blktrace functionality to detect
> > unusually long IO latencies or bandwidth drops).
> >
> > Talking about FPIN, is it planned to notify user space about such
> > fabric events, and if yes, how?
>
> [Muneendra] Yes. FC drivers, when receiving FC FPIN ELS's, are calling
> a scsi transport routine with the FPIN payload. The transport is
> pushing this as an "event" via netlink. An app bound to the local
> address used by the scsi transport can receive the event and parse it.
>
> Benjamin has added a marginal_path group (multipath marginal
> pathgroups) in the dm-multipath.
> https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
>
> One of the intentions of Benjamin's patch (support for marginal path)
> is to support the FPIN events we receive from the fabric.
> On receiving the fpin-li our intention was to place all the paths that
> are affected into the marginal path group.

I think this should all be done in kernel space, as we're talking sub-millisecond timings here when it comes to fpins and the reaction time expected. I may be wrong but I'll leave that up to you.

> Below are the 4 types of descriptors returned in an FPIN:
> • Link Integrity (LN): some error on a link that affected frames,
>   which is the main one for "flaky path"
> • Delivery Notification (DN): something explicitly knew about a
>   dropped frame and is reporting it. Usually, things like a CRC error
>   say you can't trust the frame header, so it's an LI error. But if
>   you do have a valid frame, but drop it, such as a fabric edge timer
>   (don't queue it more than 250-600ms), then it becomes a DN type.
>   Could be flaky path, but not necessarily.
> • Congestion (CN): fabric is saying it's congested sending to "your"
>   port. Meaning if a host receives it, fabric is saying it has more
>   frames for the host than it's pulling in so it's backing up the
>   fabric. What should happen is load by the host should be lowered,
>   but it's across all targets. Not all targets are perhaps in the
>   mpio path list
> • Peer Congestion (PCN): this goes along with CN in that the fabric
>   is now telling the other devices in the zone sending traffic to
>   that congested port that the other port is backing up. So the idea
>   is these peers send less load to the congested port. Shouldn't
>   really tie to mpio. Some of the current thinking is targets could
>   see this and reduce their transmission rate to a host to the link
>   speed of the host
>
> On receiving the congestion notifications our intention is to slow
> down the workload gradually from the host until it stops receiving the
> congestion notifications.
> We need to validate how we can achieve this decrease of the workload
> with the help of dm-multipath.

Would it be possible to piggyback on the service time path selector for this where it pertains to latency?

Another thing is that at some stage the IO queueing decision needs to take into account the various different FPIN descriptors. A remote delivery notification due to slow drain behaviour is very different from ISL congestion or any physical issues.

> As Hannes mentioned in his earlier mail our primary goal is that the
> admin first should be _alerted_, having FPINs showing up in the
> message log, to alert the admin that his fabric is not performing
> well.

This is a bit of a reactive approach that should be a secondary objective. Having been in storage/fc support for 20 years, I know that most admins are not really responsive to this, and taking action based on event entries takes a very, very long time. From an operations perspective any sort of manual action should be avoided as much as possible.

> Regards,
> Muneendra.
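The quoted goal of slowing the host workload gradually until congestion notifications stop is not specified further in this thread. One way to picture it is an additive-increase/multiplicative-decrease loop on the allowed queue depth. The sketch below swaps in that well-known technique as an assumption; the class name, parameters, and the idea of a "quiet interval" are all hypothetical and are not taken from any driver or from dm-multipath.

```python
class CongestionThrottle:
    """Hypothetical queue-depth governor driven by congestion notifications."""

    def __init__(self, max_depth=64, min_depth=1, step=4):
        self.max_depth = max_depth   # depth used when the fabric is healthy
        self.min_depth = min_depth   # never throttle below this
        self.step = step             # additive recovery per quiet interval
        self.depth = max_depth       # currently allowed outstanding IOs

    def on_congestion(self):
        """Halve the allowed depth each time a congestion FPIN arrives."""
        self.depth = max(self.min_depth, self.depth // 2)
        return self.depth

    def on_quiet_interval(self):
        """Recover slowly after an interval without any notification."""
        self.depth = min(self.max_depth, self.depth + self.step)
        return self.depth
```

Halving on each notification backs off quickly while congestion persists, and the small additive step avoids jumping straight back to the load that triggered the notification. Whether such a governor belongs in the FC driver, in a path selector, or elsewhere is exactly the open question in this discussion.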
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Martin Wilck @ 2021-04-01 10:16 UTC
To: erwin@erwinvanlonden.net, muneendra.kumar@broadcom.com, bblock@linux.ibm.com, hare@suse.de
Cc: dm-devel@redhat.com

On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > Benjamin has added a marginal_path group (multipath marginal
> > pathgroups) in the dm-multipath.
> > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> >
> > One of the intentions of Benjamin's patch (support for marginal
> > path) is to support the FPIN events we receive from the fabric.
> > On receiving the fpin-li our intention was to place all the paths
> > that are affected into the marginal path group.
> I think this should all be done in kernel space as we're talking
> sub-millisecond timings here when it comes to fpins and the reaction
> time expected. I may be wrong but I'll leave that up to you.

Sub-ms latency is impossible with this setup (kernel -> broadcom FC daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs that would suggest taking a path offline on the time scale of minutes. I suppose that would work well for LN FPINs, but not for the other types.

> > On receiving the congestion notifications our intention is to slow
> > down the workload gradually from the host until it stops receiving
> > the congestion notifications.
> > We need to validate how we can achieve this decrease of the workload
> > with the help of dm-multipath.
> Would it be possible to piggyback on the service time path selector
> in this when it pertains latency?

Not on service-time itself, but someone could write a new path selector algorithm. IMO we'd still have the problem that this would be seen as a layering violation. In the long run dm-mpath may need to add transport-specific callbacks. But for a proof-of-concept, a selector algorithm with layering violations would be ok, I believe.

Regards
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix Imendörffer
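To make the idea of a new, transport-aware path selector more concrete, here is a toy user space model. It is only a sketch under assumptions: the cost function is a simplified reading of what a service-time style selector does (estimated time to drain the in-flight bytes plus the new IO), and the `congested` flag and `congestion_penalty` factor are hypothetical stand-ins for whatever an FPIN-aware selector would actually receive. Real path selectors are kernel C modules and look nothing like this.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    in_flight_bytes: int
    relative_throughput: int = 1
    congested: bool = False  # hypothetical flag set by FPIN handling


def service_time(path: Path, io_size: int) -> float:
    """Simplified estimate: queued bytes plus the new IO over throughput."""
    return (path.in_flight_bytes + io_size) / path.relative_throughput


def pick_path(paths, io_size, congestion_penalty=4.0):
    """Pick the cheapest path; congested paths cost `congestion_penalty` times more."""
    def cost(path):
        base = service_time(path, io_size)
        return base * congestion_penalty if path.congested else base
    return min(paths, key=cost)
```

With no path flagged, the model reduces to plain service-time selection, which already steers IO away from a path whose in-flight bytes pile up because the driver throttled it. The penalty factor is the layering violation mentioned above: the block-level selector would be consuming a transport-level signal.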
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-04-01 22:04 UTC
To: Martin Wilck, muneendra.kumar@broadcom.com, bblock@linux.ibm.com, hare@suse.de
Cc: dm-devel@redhat.com

Hello Martin,

On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:
> On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > > Benjamin has added a marginal_path group (multipath marginal
> > > pathgroups) in the dm-multipath.
> > > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> > >
> > > One of the intentions of Benjamin's patch (support for marginal
> > > path) is to support the FPIN events we receive from the fabric.
> > > On receiving the fpin-li our intention was to place all the paths
> > > that are affected into the marginal path group.
> > I think this should all be done in kernel space as we're talking
> > sub-millisecond timings here when it comes to fpins and the reaction
> > time expected. I may be wrong but I'll leave that up to you.
>
> Sub-ms latency is impossible with this setup (kernel -> broadcom FC
> daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs
> that would suggest taking a path offline on the time scale of minutes.
> I suppose that would work well for LN FPINs, but not for the other
> types.

I agree. I was hoping the FC drivers would be able to play a role in this and provide a direct hook into the FPIN notifications, in such a way that userspace daemons would not be required and multipath would be able to play a direct role here.

When it comes to latency in a SAN we are indeed talking about sub-ms time scales: a problem in one part of the fabric has an immediate effect on multiple initiators and targets, due to the shared nature of the beast.

> > > On receiving the congestion notifications our intention is to slow
> > > down the workload gradually from the host until it stops receiving
> > > the congestion notifications.
> > > We need to validate how we can achieve this decrease of the
> > > workload with the help of dm-multipath.
> > Would it be possible to piggyback on the service time path selector
> > in this when it pertains latency?
>
> Not on service-time itself, but someone could write a new path
> selector algorithm. IMO we'd still have the problem that this would be
> seen as a layering violation. In the long run dm-mpath may need to add
> transport-specific callbacks. But for a proof-of-concept, a selector
> algorithm with layering violations would be ok, I believe.

Is that an offer of volunteering?? :-)

> Regards
> Martin
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Muneendra Kumar M @ 2021-04-05 5:30 UTC
To: Erwin van Londen, Martin Wilck, bblock, hare
Cc: dm-devel

Hi Erwin,
Below are my replies.

On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:
> On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > > Benjamin has added a marginal_path group (multipath marginal
> > > pathgroups) in the dm-multipath.
> > > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> > >
> > > One of the intentions of Benjamin's patch (support for marginal
> > > path) is to support the FPIN events we receive from the fabric.
> > > On receiving the fpin-li our intention was to place all the paths
> > > that are affected into the marginal path group.
> > I think this should all be done in kernel space as we're talking
> > sub-millisecond timings here when it comes to fpins and the reaction
> > time expected. I may be wrong but I'll leave that up to you.
>
> Sub-ms latency is impossible with this setup (kernel -> broadcom FC
> daemon -> multipathd -> kernel). It's only suitable for "fatal" FPINs
> that would suggest taking a path offline on the time scale of minutes.
> I suppose that would work well for LN FPINs, but not for the other
> types.

>>I agree. I was hoping the FC drivers would be able to play a role in
>>this and provide a direct hook into the FPIN notifications in such a
>>way that userspace daemons would not be required and multipath would
>>be able to play a direct role here.
>>When it comes to latency in a san we're indeed talking about sub-ms
>>when it comes to impacting other parts of the fabrics having an
>>immediate effect on multiple initiators and targets due to the shared
>>nature of the beast.

> > > On receiving the congestion notifications our intention is to slow
> > > down the workload gradually from the host until it stops receiving
> > > the congestion notifications.
> > > We need to validate how we can achieve this decrease of the
> > > workload with the help of dm-multipath.
> > Would it be possible to piggyback on the service time path selector
> > in this when it pertains latency?
>
> Not on service-time itself, but someone could write a new path
> selector algorithm. IMO we'd still have the problem that this would be
> seen as a layering violation. In the long run dm-mpath may need to add
> transport-specific callbacks. But for a proof-of-concept, a selector
> algorithm with layering violations would be ok, I believe.

>>Is that an offer of volunteering?? :-)

[Muneendra] To address all these issues we are planning to come up with a new dm path selector algorithm. It should address the above concerns: the FC drivers will hook directly into the FPIN notifications, in such a way that userspace daemons would not be required and multipath would be able to play a direct role here.
We will come up with more details regarding the new dm path selector algorithm for FPIN notifications.

Regards,
Muneendra.
* Re: [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications.
From: Erwin van Londen @ 2021-04-05 5:55 UTC
To: Muneendra Kumar M, Martin Wilck, bblock, hare
Cc: dm-devel

Hello Muneendra,

On Mon, 2021-04-05 at 11:00 +0530, Muneendra Kumar M wrote:
> Hi Erwin,
> Below are my replies.
>
> On Thu, 2021-04-01 at 10:16 +0000, Martin Wilck wrote:
> > On Thu, 2021-04-01 at 12:48 +1000, Erwin van Londen wrote:
> > > > Benjamin has added a marginal_path group (multipath marginal
> > > > pathgroups) in the dm-multipath.
> > > > https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@redhat.com/
> > > >
> > > > One of the intentions of Benjamin's patch (support for marginal
> > > > path) is to support the FPIN events we receive from the fabric.
> > > > On receiving the fpin-li our intention was to place all the
> > > > paths that are affected into the marginal path group.
> > > I think this should all be done in kernel space as we're talking
> > > sub-millisecond timings here when it comes to fpins and the
> > > reaction time expected. I may be wrong but I'll leave that up to
> > > you.
> >
> > Sub-ms latency is impossible with this setup (kernel -> broadcom FC
> > daemon -> multipathd -> kernel). It's only suitable for "fatal"
> > FPINs that would suggest taking a path offline on the time scale of
> > minutes.
> > I suppose that would work well for LN FPINs, but not for the other
> > types.
>
> >>I agree. I was hoping the FC drivers would be able to play a role in
> >>this and provide a direct hook into the FPIN notifications in such a
> >>way that userspace daemons would not be required and multipath would
> >>be able to play a direct role here.
> >>When it comes to latency in a san we're indeed talking about sub-ms
> >>when it comes to impacting other parts of the fabrics having an
> >>immediate effect on multiple initiators and targets due to the
> >>shared nature of the beast.
>
> > > > On receiving the congestion notifications our intention is to
> > > > slow down the workload gradually from the host until it stops
> > > > receiving the congestion notifications.
> > > > We need to validate how we can achieve this decrease of the
> > > > workload with the help of dm-multipath.
> > > Would it be possible to piggyback on the service time path
> > > selector in this when it pertains latency?
> >
> > Not on service-time itself, but someone could write a new path
> > selector algorithm. IMO we'd still have the problem that this would
> > be seen as a layering violation. In the long run dm-mpath may need
> > to add transport-specific callbacks. But for a proof-of-concept, a
> > selector algorithm with layering violations would be ok, I believe.
>
> >>Is that an offer of volunteering?? :-)
>
> [Muneendra] To address all the issues we are planning to come up with
> new dm-path selector algorithm which should address the above concerns
> where FC drivers will do a direct hook into the FPIN notifications in
> such a way that userspace daemons would not be required and multipath
> would be able to play a direct role here.
> Will come up with more details regarding the new dm-path selector
> algorithm for FPIN notifications.

That is awesome. Thank you very much. If you need any input or feedback then please let me know.

> Regards,
> Muneendra.
end of thread, other threads:[~2021-04-06 11:42 UTC | newest]

Thread overview: 17+ messages
2021-03-23  7:52 [dm-devel] dm-multipath - IO queue dispatch based on FPIN Congestion/Latency notifications Erwin van Londen
2021-03-25 16:07 ` Benjamin Block
2021-03-26 11:15 ` Muneendra Kumar M
2021-03-31  0:22 ` Erwin van Londen
2021-03-31  7:25 ` Hannes Reinecke
2021-03-31  8:12 ` Erwin van Londen
2021-03-31  9:57 ` Martin Wilck
2021-03-31 10:48 ` Muneendra Kumar M
2021-03-31 11:45 ` Martin Wilck
2021-03-31 11:53 ` Hannes Reinecke
2021-03-31 11:57 ` Muneendra Kumar M
2021-03-31 12:41 ` Martin Wilck
2021-04-01  2:48 ` Erwin van Londen
2021-04-01 10:16 ` Martin Wilck
2021-04-01 22:04 ` Erwin van Londen
2021-04-05  5:30 ` Muneendra Kumar M
2021-04-05  5:55 ` Erwin van Londen