Re: [RFC] hot plug failure handle mechanism

From: "Guo, Jia" <jia.guo@intel.com>
To: Matan Azrad <matan@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>,
	"gaetan.rivet@6wind.com" <gaetan.rivet@6wind.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	Mordechay Haimovsky <motih@mellanox.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Zhang, Qi Z" <qi.z.zhang@intel.com>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"jblunck@infradead.org" <jblunck@infradead.org>,
	"shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>
Subject: Re: [RFC] hot plug failure handle mechanism
Date: Fri, 25 May 2018 15:49:28 +0800
Message-ID: <c47fc51a-c42f-7bfd-ae92-045e90dff4bf@intel.com>
In-Reply-To: <VI1PR0501MB260802D33FA9258B37EC7320D26A0@VI1PR0501MB2608.eurprd05.prod.outlook.com>

Hi Matan,


On 5/24/2018 10:57 PM, Matan Azrad wrote:
> Hi Guo
>
> Some questions.
>
> From: Guo Jia
>> As we know, hot plug is an important feature, whether it is used for
>> fail-safe and consumption management of datacenter devices or for
>> dynamic deployment and SR-IOV live migration in SDN/NFV. It can bring
>> greater flexibility and continuity to networking services in many
>> industry use cases.
>>
>> So let us see what DPDK, as an important networking framework that
>> combines packet control-path/fast-path libraries with a diverse set of PMD
>> drivers, can do to help applications that want to build a hot plug
>> solution while they are processing packets with DPDK.
>>
>> We already have a general device event mechanism, the failsafe driver, the
>> bonding driver and a hot plug/unplug API in the framework, and applications
>> can use these APIs to develop their functionality. What is missing is hot
>> plug failure handling: removing a device at run time causes the application
>> to trigger an MMIO error and crash, and there is no mechanism to handle
>> that failure when a device is hot-unplugged. At present, the kernel only
>> guarantees safe hotplug handling on the kernel side. On the user-mode
>> side, no third-party tool such as udev or driverctl specifically covers
>> this part of the mechanism. Considering the feasibility of the
>> implementation, run-time performance and generality across almost all
>> user-mode PMD drivers, a general hot plug failure handle mechanism in the
>> DPDK framework is proposed here.
>>
>> The hot plug failure handle mechanism would work as follows:
>> 1. Add a new bus op, "handle_hot_unplug", to the bus to handle bus
>> read/write errors. It is bus-specific, and each kind of bus can implement
>> its own logic.
>> 2. Implement the PCI bus specific op "pci_handle_hot_unplug". Based on the
>> failure address, this function remaps the memory that belongs to the
>> corresponding unplugged device.
>> 3. Implement a new sigbus handler and register it when device event
>> monitoring starts. Once an MMIO sigbus error occurs, it triggers the hot
>> plug failure handle mechanism above, so that an app that is processing
>> packets does not break and crash and can go on to clean up, fail-safe or
>> other work.
> Can you explain more about what happens with all the threads: master thread, host thread, data-path threads?
> Can the signal occur only in a data-path thread, or also in a control thread?
Let me explain this first. The sigbus handler is registered per process.
Because of how signal delivery works, the sigbus error may arrive in either
a control thread or a data-path thread, but in both cases it goes to the
common sigbus handler. The handler finds the device according to the
failure address and then remaps the memory for that device.
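
A minimal sketch of that lookup-and-remap step is below. It is only an
illustration under stated assumptions: struct dev_region, struct bus_ops,
register_region() and the region table are hypothetical names, not existing
DPDK code, and placing anonymous memory over the dead mapping with MAP_FIXED
is just one possible way to do the remap.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping: one entry per mapped device resource. */
struct dev_region {
	const char *dev_name;
	void *base;
	size_t len;
};

/* Hypothetical bus ops table; only the hot-unplug hook is shown. */
struct bus_ops {
	int (*handle_hot_unplug)(void *fault_addr);
};

#define MAX_REGIONS 8
static struct dev_region regions[MAX_REGIONS];
static size_t nb_regions;

/* Record a mapping so that a later fault address can be matched to it. */
void register_region(const char *name, void *base, size_t len)
{
	assert(nb_regions < MAX_REGIONS);
	regions[nb_regions++] = (struct dev_region){ name, base, len };
}

/* Return the region that contains addr, or NULL if no device owns it. */
static struct dev_region *find_region(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	for (size_t i = 0; i < nb_regions; i++) {
		uintptr_t start = (uintptr_t)regions[i].base;

		if (a >= start && a - start < regions[i].len)
			return &regions[i];
	}
	return NULL;
}

/* Put anonymous memory at the same virtual address as the dead mapping. */
static int pci_handle_hot_unplug(void *fault_addr)
{
	struct dev_region *r = find_region(fault_addr);
	void *p;

	if (r == NULL)
		return -1; /* the address does not belong to a known device */

	p = mmap(r->base, r->len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}

const struct bus_ops pci_bus_ops = {
	.handle_hot_unplug = pci_handle_hot_unplug,
};
```

In this sketch the sigbus handler would only need to pass the failure address
to the bus op; an address outside every registered region returns -1, so a
genuine sigbus can still be treated as fatal.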
> What about resource leaks? (Mainly relevant for control threads.)
> If you jump from the signal address to the restart address, how can you clean up the work that was in progress when the signal arrived?
It will not use a long jump to return to a restart address. It just catches
the sigbus event, does the failure handling, and then lets the thread
continue from its current position.
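
To make the "no long jump" point concrete, here is a small Linux-only sketch.
It is not DPDK code: the device is simulated by a file mapping that is
truncated to provoke the sigbus, and install_sigbus_handler() and
simulate_unplug_read() are hypothetical names. Note that mmap() is not on the
POSIX list of async-signal-safe functions, so calling it from the handler is
an assumption that holds in practice on Linux rather than a guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *dev_base;   /* start of the simulated device mapping */
static size_t dev_len;   /* its length in bytes */
static volatile sig_atomic_t sigbus_handled;

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	uintptr_t a = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)dev_base;

	(void)sig;
	(void)ctx;

	/* A fault outside the device mapping is a genuine error. */
	if (dev_base == NULL || a < start || a - start >= dev_len)
		_exit(EXIT_FAILURE);

	/* Back the same virtual addresses with anonymous memory. */
	if (mmap(dev_base, dev_len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(EXIT_FAILURE);

	sigbus_handled = 1;
	/* Plain return: the faulting access is retried and now succeeds. */
}

/* Register the process-wide handler; returns 0 on success. */
int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/* Map a file, truncate it ("unplug"), then read through the mapping.
 * Returns the byte read after recovery, or -1 on a setup error. */
int simulate_unplug_read(void)
{
	char path[] = "/tmp/hotplug-demo-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	int fd, v;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}
	p[0] = 0xAB;                 /* the "device" works before the unplug */
	dev_base = (void *)p;
	dev_len = (size_t)page;
	sigbus_handled = 0;

	if (ftruncate(fd, 0) != 0) { /* the backing store goes away */
		munmap((void *)p, page);
		close(fd);
		return -1;
	}
	v = p[0];                    /* sigbus here; execution resumes on this line */

	munmap((void *)p, page);
	close(fd);
	dev_base = NULL;
	return v;
}
```

After the handler returns, the read completes against the zero-filled
replacement page, so the thread never leaves its current call stack and no
cleanup of a half-finished operation is needed at that point.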
> Matan.
>> 4. We will also demonstrate the solution with testpmd, as an example of the
>> whole procedure:
>> device unplug -> failure handle -> stop forwarding -> stop port -> close port
>> -> detach port.
>>
>> Best regards,
>>
>> Jeff Guo

Thread overview: 10+ messages
2018-05-24  6:55 [RFC] hot plug failure handle mechanism Guo, Jia
2018-05-24 14:57 ` Matan Azrad
2018-05-25  7:49   ` Guo, Jia [this message]
2018-05-29 11:20 ` Bruce Richardson
2018-06-04  1:56   ` Guo, Jia
2018-06-06 12:54     ` Bruce Richardson
2018-06-06 13:11       ` Ananyev, Konstantin
2018-06-07  2:14       ` Guo, Jia
2018-06-14 21:37         ` Thomas Monjalon
2018-06-15  8:31           ` Guo, Jia
