From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hongyang Yang Subject: Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices Date: Thu, 10 Jul 2014 09:38:30 +0800 Message-ID: <53BDEE96.10303@cn.fujitsu.com> References: <1401932069-16460-1-git-send-email-yanghy@cn.fujitsu.com> <1401932069-16460-5-git-send-email-yanghy@cn.fujitsu.com> <21392.42936.240674.451323@mariner.uk.xensource.com> <5396B4BF.2060209@cn.fujitsu.com> <21437.52514.486908.472983@mariner.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <21437.52514.486908.472983@mariner.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Jackson Cc: ian.campbell@citrix.com, wency@cn.fujitsu.com, stefano.stabellini@eu.citrix.com, andrew.cooper3@citrix.com, yunhong.jiang@intel.com, eddie.dong@intel.com, xen-devel@lists.xen.org, rshriram@cs.ubc.ca, roger.pau@citrix.com, laijs@cn.fujitsu.com List-Id: xen-devel@lists.xenproject.org Hi Ian, Thank you for the reply, I have some comments below... On 07/10/2014 07:15 AM, Ian Jackson wrote: > Hongyang Yang writes ("Re: [PATCH v10 4/5] remus: implement remus network buffering for nic devices"): >> Sorry for the late reply, just notice this comment... > > And here I am being even later replying... > >> On 06/06/2014 01:24 AM, Ian Jackson wrote: >>> As far as I can see this attempts to search for an ifb which is not in >>> use. >>> >>> I see you claim a lock to ensure that you don't fail due to races with >>> other copies of this script. >>> >>> But are there potentially other things (not Xen related, parhaps) in >>> the system which might try to allocate an ifb using a similar >>> approach ? How do we deal with the potential race with them ? > > Have I missed an answer to this question ? > >>> Also: I think you should: >>> - write the IFB name to xenstore _before_ starting to configure it >>> - in the loop I quote above, check in xenstore that the ifb is not >>> in use by another domain Sorry for mention this explicit,your comment here has already been addressed: +#return 0 if the ifb is free +check_ifb() { + local installed=`nl-qdisc-list -d $1` + [ -n "$installed" ] && return 1 + + for domid in `xenstore-list "/local/domain" 2>/dev/null || true` + do + [ $domid -eq 0 ] && continue + xenstore-exists "/libxl/$domid/remus/netbuf" || continue + for devid in `xenstore-list "/libxl/$domid/remus/netbuf" 2>/dev/null || true` + do + local path="/libxl/$domid/remus/netbuf/$devid/ifb" + xenstore-exists $path || continue + local ifb=`xenstore-read "$path" 2>/dev/null || true` + [ "$ifb" = "$1" ] && return 1 + done + done + + return 0 +} + +setup_ifb() { + + for ifb in `ifconfig -a -s|egrep ^ifb|cut -d ' ' -f1` + do + check_ifb "$ifb" || continue + IFB="$ifb" + break + done + + if [ -z "$IFB" ] + then + fatal "Unable to find a free IFB device for $vifname" + fi + + #not using xenstore_write that automatically exits on error + #because we need to cleanup + _xenstore_write "$XENBUS_PATH/ifb" "$IFB" || xs_write_failed "$vifname" "$IFB" + do_or_die ip link set dev "$IFB" up +} >>> >>> Otherwise there seems to be the following risk: >>> 1. You pick ifbX using the loop above >>> 2. You start to configure ifbX, eventually resulting in a >>> configuration which makes it not show up as free >>> 3. Something bad happens and you fail, before writing the >>> ifb name to xenstore >>> >>> In this case, the ifb would be leaked. (I see you do try to avoid >>> this with xs_write_failed, but scripts can fail for other reasons.) >> >> If the setup failed for any reason, we will call teardown in the remus >> device framework, so the ifb won't be leaked. > > I'm afraid that you can't assume that your script will necessarily > execute the teardown. Perhaps the script got killed by the OOM > killer, or something else horrible went wrong. > > So you need to make sure that all the states the system goes through, > however briefly and no matter what cleanup will be attempted on > failure. > >>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl >>>> index 4278a6b..50bf1ef 100644 >>>> --- a/tools/libxl/libxl_types.idl >>>> +++ b/tools/libxl/libxl_types.idl >>>> @@ -566,6 +566,8 @@ libxl_domain_remus_info = Struct("domain_remus_info",[ >>>> ("interval", integer), >>>> ("blackhole", bool), >>>> ("compression", bool), >>>> + ("netbuf", bool), >>>> + ("netbufscript", string), >>> >>> I think netbuf should be a defbool, not a bool. Indeed, perhaps this >>> is true of the other options too. Is there some reason it shouldn't >>> default to enabled ? >> >> The netbuffering is enabled by default, this option is used to disable >> the netbuffering support. > > Maybe I haven't followed the code here correctly, but I think that has > to be done with a defbool. Where is the default set ? I don't see it > in this patch. It was set in the main_remus function in netbuf command switch patch, default is on, and will only be switched off when user passes an -n option, so I think this does not need to be done with a defbool: diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 68df548..b324f34 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -7175,8 +7175,9 @@ int main_remus(int argc, char **argv) r_info.interval = 200; r_info.blackhole = 0; r_info.compression = 1; + r_info.netbuf = 1; - SWITCH_FOREACH_OPT(opt, "bui:s:e", NULL, "remus", 2) { + SWITCH_FOREACH_OPT(opt, "buni:s:N:e", NULL, "remus", 2) { @@ -7186,6 +7187,12 @@ int main_remus(int argc, char **argv) case 'u': r_info.compression = 0; break; + case 'n': + r_info.netbuf = 0; + break; + case 'N': > > Thanks, > Ian. > . > -- Thanks, Yang.