public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
Subject: RE: Patches for fine-grained rsocket preloading
Date: Fri, 05 Sep 2014 19:27:34 +0530	[thread overview]
Message-ID: <73d9b26bb2890baca83977c699925a73@imap.linux.ibm.com> (raw)
In-Reply-To: <1828884A29C6694DAF28B7E6B8A8237399DC1580-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>

Hi Sean,

Thanks for reviewing the patches.  Please find answers to your review 
comments below:

> On 2014-09-04 23:37, Hefty, Sean wrote:
>> Could you please go through the provided changes and ensure that
>> they are part of the regular rsocket head stream.
> 
> It would be helpful to separate patches into different email messages.

Sorry, it was my mistake.  I have corrected it - now v2 of patches from 
1 to 4 are
sent in different email messages with a subject template

"[PATCH v2 {patch_no}/4] rsockets: {short_description}".

> 
>> Subject: [patch 1/3] subsystem: RDMA CM
>> commit 725d209ea390c643447c337b178442e5ef7b8b50
>> Author: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> Date:   Tue Aug 5 16:18:58 2014 +0530
>> 
>>      By default the R-Sockets pre-loading library intercepts all the
>> stream
>>      and datagram sockets belonging to a launched program processes 
>> and
>> threads.
>> 
>>      However, distributed application and database servers may require
>> fine
>>      grained interception to ensure that only the processes which are
>> listening
>>      for remote connections on the RDMA transport need to be enabled 
>> with
>> RDMA
>>      while remaining can continue to use TCP as before.  This allows
>> proper
>>      communication happening between various server components 
>> locally.
>> 
>>      A configuration file based mechanism is introduced to facilitate
>> this
>>      fine grained interception mechanism.  As part of preload
>> initialization,
>>      the configuration file is scanned and an in-memory record store 
>> is
>> created
>>      with all the entries found.  When a request is made to intercept 
>> a
>> socket,
>>      its attributes are cross checked with stored records to see 
>> whether
>> we
>>      should proceed with rsocket switch over.
>> 
>>      Note: Right now, the fine grained interception mechanism is 
>> enabled
>> only
>>            for newly created sockets.  Going forward, this can be 
>> extened
>> to
>>            select connections based on the specified host/IP addresses
>> and
>>            ports as well.
>> 
>>      "preload_config" is the name of the configuration file which 
>> should
>> exist
>>      in the default configuration location (usually the full path to 
>> this
>>      configuration file is:
>> <install-root>/etc/rdma/rsocket/preload_config)
>>      of an installed rsocket library.
>>      The sample format for this configuration file is shown below:
>> 
>>      @#
>>      @# Sample config file for preloading in a program specific way
>>      @#
>>      @# Each line entry should have the following format:
>>      @#
>>      @#   prog_name <space> dom_spec <space> type_spec <space> 
>> proto_spec
>>      @#
>>      @# where,
>>      @#
>>      @# prog_name  - program or command name (string without spaces)
>>      @# dom_spec   - one or more socket domain strings separated by
>> commas
>>      @#            - format: {*|domain,[,domain,...]}
>>      @#            - '*' means any valid domain
>>      @#            - valid domains: inet/inet6/ib
>>      @# type_spec  - one or more socket type strings separated by 
>> commas
>>      @#            - format: {*|type[,type,...]}
>>      @#            - '*' means any valid type
>>      @#            - valid types: stream/dgram
>>      @# proto_spec - one or more socket protocol strings separated by
>> commas
>>      @#            - format: {*|protocol[,protocol,...]}
>>      @#            - '*' means any valid protocol
>>      @#            - valid protocols: tcp/udp
>>      @# <space>    - one ore more tab or space characters
>>      @#
>>      @# Note:
>>      @#  Lines beginning with '#' character are treated as comments.
>>      @#  Comments at the end of an entry are allowed and should be
>> preceded
>>      @#  by '#' character.
>>      @#  Blank lines are ignored.
>>      @
>>      @progA inet stream tcp # intercept progA's internet stream 
>> sockets
>>      @progB inet6 dgram udp # intercept progB's ipv6 datagram sockets
>>      @progC * * * # intercept progC's sockets
>> 
>>      Signed-off-by: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>      Reviewed-by: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> 
> Overall, the patch concepts look reasonable.  See comments below.

Thanks for your appreciation.

> 
>>      ---
>> diff --git a/src/preload.c b/src/preload.c
>> index fb2149b..8c043aa 100644
>> --- a/src/preload.c
>> +++ b/src/preload.c
>> @@ -50,6 +50,8 @@
>>   #include <netinet/tcp.h>
>>   #include <unistd.h>
>>   #include <semaphore.h>
>> +#include <ctype.h>
>> +#include <stdlib.h>
>> 
>>   #include <rdma/rdma_cma.h>
>>   #include <rdma/rdma_verbs.h>
>> @@ -57,6 +59,10 @@
>>   #include "cma.h"
>>   #include "indexer.h"
>> 
>> +#ifndef LINE_MAX
>> +#define LINE_MAX 2048
>> +#endif /* LINE_MAX */
> 
> Any reason why this can't just be an enum and a smaller value?

This can't be an enum because it is used in allocating temporary
buffer for line parsing.

Smaller value, yes.

Now in revised patch v2 1/4, I have replaced it with hard coded
value of 512.  Hope this value covers most of the general
configuration cases.

> 
>> +static config_entry_t *entryp = NULL;
>> +static int16_t nentries = 0;
>> +static int16_t config_scanned = 0;
>> +static int16_t config_avail = 0;
>> +extern char *program_invocation_short_name;
> 
> Static variables are initialized to 0 by default (C standard).

Yes, that's true.  Just made them explicit for clarity.
In revised patch v2 1/4, the initialization are removed.

> 
>> +/* free in-memory config store
>> + * should be called only once during finalization
>> + */
>> +static void free_preload_config(void)
>> +{
>> +   int i;
>> +
>> +   if (entryp) {
>> +       for (i = 0; i < nentries; i++) {
>> +           if (entryp[i].name) {
>> +               free(entryp[i].name);
>> +           }
>> +       }
>> +       free(entryp);
>> +   }
>> +
>> +   return;
>> +}
>> +
>> +/* check whether interception is required for this socket
>> + * compares the provided attributes with that available in the
>> in-memory
>> + * data store for the current process
>> + * sets-up in-memory config store if it's already not done
>> + */
>> +static int intercept_socket(int domain, int type, int protocol)
>> +{
>> +   int i;
>> +
>> +   if (!config_scanned) {
>> +       pthread_mutex_lock(&mut);
>> +       if (scan_preload_config() == 0) {
>> +           config_avail = 1;
>> +       }
>> +       if (entryp) {
>> +           atexit(free_preload_config);
>> +       }
>> +       config_scanned = 1;
>> +       pthread_mutex_unlock(&mut);
> 
> There's already an init_preload() function that handles all
> initialization.  Please move the scanning of the file there.  You can
> then drop the config_scanned variable.

Done.  Please refer to patch v2 1/4.
Scanning of file is moved to init_preload() function.
config_scanned variable is now dropped.

> 
>> +   }
>> +
>> +   if (!config_avail) {
>> +       return -1;
>> +   }
> 
> This can probably go.  I'm not sure why the return value is -1, versus
> just 1 or 0.

Done.  Please refer to patch v2 1/4.
The return value of -1 is now removed and it's now replaced
with explicit config_avail checking in socket() method.

> 
>> +
>> +   /* locate the config entry */
>> +   for (i = 0; i < nentries; i++) {
>> +       if (strncmp(entryp[i].name, program_invocation_short_name,
>> strlen(entryp[i].name)) == 0) {
>> +           break;
>> +       }
>> +   }
>> +   if (i == nentries) {
>> +       return 0;
>> +   }
>> +
>> +   /* match domain field */
>> +   if (!(entryp[i].domain & (1 << domain))) {
>> +       return 0;
>> +   }
>> +
>> +   /* match type field */
>> +   if (!(entryp[i].type & (1 << type))) {
>> +       return 0;
>> +   }
>> +
>> +   /* match protocol field only if protocol is specified */
>> +   if (protocol && !(entryp[i].protocol & (1 << protocol))) {
>> +       return 0;
>> +   }
>> +
>> +   /* entry matched */
>> +   return 1;
>> +}
>> +
>>   static int fd_open(void)
>>   {
>>      struct fd_info *fdi;
>> @@ -404,6 +631,10 @@ int socket(int domain, int type, int protocol)
>>      static __thread int recursive;
>>      int index, ret;
>> 
>> +   if (intercept_socket(domain, type, protocol) == 0) {
>> +       goto real;
>> +   }
>> +
>>      if (recursive)
>>          goto real;
>> 
>> 
>> Subject: [patch 2/3] subsystem: RDMA CM
>> commit bf77b79be41b9ed288be6c7e858d463838febc84
>> Author: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> Date:   Tue Aug 5 16:38:05 2014 +0530
>> 
>>      Some complex server class applications start multiple listening
>> threads
>>      to receive client connection requests simultaneously from 
>> different
>> sources.
>>      When rsocket tries to intercept sockets belonging to such
>> applications,
>>      the listen() call made multiple times from different listening
>> threads
>>      should not result in error on par with the corresponding TCP
>> behavior.
>> 
>>      The return value of rlisten() is modified to ensure that it 
>> wouldn't
>>      bail out when it is invoked more than once with the same socket
>> attributes
>>      from different listening threads.  This behavior is gated by a
>> runtime
>>      parameter 'multi_listeners' with the associated configuration 
>> file
>> so it
>>      does not affect in any way the rest of rsocket applications.
> 
> I didn't follow this.  If rsocket is behaving differently than normal
> sockets, I'm fine updating it so that it matches the desired behavior.
>  It doesn't need to be an option.
> 
> That said, can you provide more details on what the desired behavior
> should be and what rsockets is doing differently?
> 

Accepted.  Your subsequent understanding and modified changes are 
correct.
I am referring to your latest patch on this.


>> 
>>      Signed-off-by: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>      Reviewed-by: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>      ---
>> 
>> diff --git a/src/rsocket.c b/src/rsocket.c
>> index 7007897..3879a70 100644
>> --- a/src/rsocket.c
>> +++ b/src/rsocket.c
>> @@ -115,6 +115,7 @@ static uint16_t def_rqsize = 384;
>>   static uint32_t def_mem = (1 << 17);
>>   static uint32_t def_wmem = (1 << 17);
>>   static uint32_t polling_time = 10;
>> +static uint16_t multi_listeners = 0;
>> 
>>   /*
>>    * Immediate data format is determined by the upper bits
>> @@ -542,6 +543,11 @@ void rs_configure(void)
>>          def_iomap_size = (uint8_t) rs_value_to_scale(
>>              (uint16_t) rs_scale_to_value(def_iomap_size, 8), 8);
>>      }
>> +
>> +   if ((f = fopen(RS_CONF_DIR "/multi_listeners", "r"))) {
>> +       (void) fscanf(f, "%hu", &multi_listeners);
>> +       fclose(f);
>> +   }
>>      init = 1;
>>   out:
>>      pthread_mutex_unlock(&mut);
>> @@ -1175,8 +1181,12 @@ int rlisten(int socket, int backlog)
>>      if (!rs)
>>          return ERR(EBADF);
>>      ret = rdma_listen(rs->cm_id, backlog);
>> -   if (!ret)
>> +   if (!ret) {
>> +       rs->state = rs_listening;
>> +   } else if (multi_listeners == 1 && errno == EINVAL) {
>>          rs->state = rs_listening;
> 
> This is what I don't get.  The listen called failed, but we want to
> mark that the rsocket is in the listening state anyway?

Even though this logic is working, later I understood that this
is not the correct way of fixing the issue.  As I said above,
your latest fix on this is good.

> 
>> +       ret = 0;
>> +   }
>>      return ret;
>>   }
>> 
>> 
>> Subject: [patch 3/3] subsystem: RDMA CM
>> commit 275aafe76c98718ad6989cbe1cf35fb82e36cd2c
>> Author: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> Date:   Tue Aug 5 16:45:41 2014 +0530
>> 
>>      While waiting for a completion queue event, rsocket's logic by
>> default
>>      bails out when interrupted.  Because of this, on the passive side
>>      ongoing connection establishments are abruptly terminated without
>>      fully accepting the incoming connection requests.
>> 
>>      The solution is to modify the completion event waiting logic to
>> ensure
>>      that it retries for the event upon interruption instead of 
>> returning
>>      with an error.  This behavior is gated by a runtime parameter
>>      'restart_onintr' with the associated configuration file so it 
>> does
>>      not affect in any way the rest of rsocket applications.
>> 
>>      Signed-off-by: Sreedhar Kodali <srkodali-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>      Reviewed-by: Pradeep Satyanarayana <pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>>      ---
>> 
>> diff --git a/src/rsocket.c b/src/rsocket.c
>> index 3879a70..c7e5b11 100644
>> --- a/src/rsocket.c
>> +++ b/src/rsocket.c
>> @@ -116,6 +116,7 @@ static uint32_t def_mem = (1 << 17);
>>   static uint32_t def_wmem = (1 << 17);
>>   static uint32_t polling_time = 10;
>>   static uint16_t multi_listeners = 0;
>> +static uint16_t restart_onintr = 0;
>> 
>>   /*
>>    * Immediate data format is determined by the upper bits
>> @@ -548,6 +549,11 @@ void rs_configure(void)
>>          (void) fscanf(f, "%hu", &multi_listeners);
>>          fclose(f);
>>      }
>> +
>> +   if ((f = fopen(RS_CONF_DIR "/restart_onintr", "r"))) {
>> +       (void) fscanf(f, "%hu", &restart_onintr);
>> +       fclose(f);
>> +   }
>>      init = 1;
>>   out:
>>      pthread_mutex_unlock(&mut);
>> @@ -1973,10 +1979,14 @@ static int rs_get_cq_event(struct rsocket *rs)
>>      if (!rs->cq_armed)
>>          return 0;
>> 
>> +resume_get_cq_event:
>>      ret = ibv_get_cq_event(rs->cm_id->recv_cq_channel, &cq, 
>> &context);
>>      if (!ret) {
>>          ibv_ack_cq_events(rs->cm_id->recv_cq, 1);
>>          rs->cq_armed = 0;
>> +   } else if (restart_onintr == 1 && errno == EINTR) {
>> +       errno = 0;
>> +       goto resume_get_cq_event;
> 
> Could this lead to the thread blocking forever?  Is there a reason to
> make this a conditional check?

Could be. This is made a conditional check because most of the cases
this is only useful on the server side.

> 
>>      } else if (errno != EAGAIN) {
>>          rs->state = rs_error;
>>      }
> 
> - Sean

Thank You.

- Sreedhar

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

      parent reply	other threads:[~2014-09-05 13:57 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-06  6:43 Patches for fine-grained rsocket preloading Sreedhar Kodali
     [not found] ` <17f3d712c0ef986a2cfe8eb11d99f66d-FJGp5E75HVmZamtmwQBW5tBPR1lH4CV8@public.gmane.org>
2014-09-04 18:07   ` Hefty, Sean
     [not found]     ` <1828884A29C6694DAF28B7E6B8A8237399DC1580-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2014-09-04 18:15       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A8237399DC15B4-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2014-09-05 14:00           ` Sreedhar Kodali
2014-09-05 13:57       ` Sreedhar Kodali [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=73d9b26bb2890baca83977c699925a73@imap.linux.ibm.com \
    --to=srkodali-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=pradeeps-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox