public inbox for linux-rdma@vger.kernel.org
* Problem with XRC userspace
@ 2010-02-15 12:56 Jack Morgenstein
From: Jack Morgenstein @ 2010-02-15 12:56 UTC
  To: rdreier-FYB4Gu1CFyUAvxtiuMwx3w
  Cc: rolandd-FYB4Gu1CFyUAvxtiuMwx3w, Tziporet Koren,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, changquing.tang-VXdhtT5mjnY

Roland,

The issue described in the e-mails I am forwarding below is still a real problem.

If I put the XRC srq fields after the pthread_cond_t, the RHEL4/5 incompatibility kicks in, in a big way.
If I put them before the pthread_cond_t, we still have a problem with "events_completed", as indicated
by Gleb Natapov of Voltaire, not to mention that all apps using libibverbs would have to be recompiled
(no backwards libibverbs binary compatibility).

(Note that this bug still exists, without the XRC changes, for both  struct ibv_qp  and  struct ibv_srq
    -- see __ibv_ack_async_event() in libibverbs/src/device.c:
        case IBV_EVENT_QP_FATAL:
        case IBV_EVENT_QP_REQ_ERR:
        case IBV_EVENT_QP_ACCESS_ERR:
        case IBV_EVENT_COMM_EST:
        case IBV_EVENT_SQ_DRAINED:
        case IBV_EVENT_PATH_MIG:
        case IBV_EVENT_PATH_MIG_ERR:
        case IBV_EVENT_QP_LAST_WQE_REACHED:
        {
                struct ibv_qp *qp = event->element.qp;

                pthread_mutex_lock(&qp->mutex);
===>            ++qp->events_completed;
                pthread_cond_signal(&qp->cond);
                pthread_mutex_unlock(&qp->mutex);

                return;
        }

        case IBV_EVENT_SRQ_ERR:
        case IBV_EVENT_SRQ_LIMIT_REACHED:
        {
                struct ibv_srq *srq = event->element.srq;

                pthread_mutex_lock(&srq->mutex);
===>            ++srq->events_completed;
                pthread_cond_signal(&srq->cond);
                pthread_mutex_unlock(&srq->mutex);

                return;
        }
)

In the OFED distribution, since it is installed as a set of packages, we simply moved the mutex and cond fields
to the end of the structures involved (as a userspace fix).  What on earth can we do for the mainstream
(see below -- "Yikes...")?

Any ideas? Another "compat" layer, and incrementing the ABI version, maybe?

-Jack

----------  Forwarded Message  ----------

Subject: [ofa-general] Another XRC binary compatable issue for different pthread version.
Date: Sunday 17 February 2008 20:31
From: "Tang, Changqing" <changquing.tang-VXdhtT5mjnY@public.gmane.org>
To: "general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org" <general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org>


Hi:
        Here is the ibv_srq structure:

struct ibv_srq {
        struct ibv_context     *context;
        void                   *srq_context;
        struct ibv_pd          *pd;
        uint32_t                handle;

        pthread_mutex_t         mutex;
        pthread_cond_t          cond;
        uint32_t                events_completed;

        uint32_t                xrc_srq_num;
        struct ibv_xrc_domain  *xrc_domain;
        struct ibv_cq          *xrc_cq;
};

On a Red Hat 5 system, which has a newer pthread version, 'pthread_cond_t' is larger
than on a Red Hat 4 system.

So if I compile the code on a Red Hat 5 system, it won't run on a Red Hat 4 system, and
vice versa.
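
A minimal sketch of why this happens, assuming two made-up condition-variable
sizes (demo_cond_small_t and demo_cond_large_t are hypothetical stand-ins, not
the real glibc types, and the demo_* names are not part of libibverbs). Every
field placed after 'cond' lands at a different offset depending on which size
the compiler saw, while the fields in front of it stay put:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <pthread.h>

/* Hypothetical stand-ins for an older, smaller condition variable and a
 * newer, larger one. The sizes are made up for illustration only. */
typedef struct { uint32_t words[3]; } demo_cond_small_t;
typedef struct { uint64_t words[6]; } demo_cond_large_t;

/* Same field order as the structure above, with only the cond type swapped. */
#define DEMO_SRQ(name, cond_type)                       \
        struct name {                                   \
                void            *context;               \
                void            *srq_context;           \
                void            *pd;                    \
                uint32_t         handle;                \
                pthread_mutex_t  mutex;                 \
                cond_type        cond;                  \
                uint32_t         events_completed;      \
                uint32_t         xrc_srq_num;           \
        }

DEMO_SRQ(demo_srq_small, demo_cond_small_t);
DEMO_SRQ(demo_srq_large, demo_cond_large_t);

/* Fields in front of cond sit at the same offset in both layouts. */
static_assert(offsetof(struct demo_srq_small, handle) ==
              offsetof(struct demo_srq_large, handle),
              "fields before cond must not move");

/* Number of bytes by which events_completed moves when cond grows. */
size_t demo_events_completed_shift(void)
{
        return offsetof(struct demo_srq_large, events_completed) -
               offsetof(struct demo_srq_small, events_completed);
}
```

A nonzero result from demo_events_completed_shift() means that code built
against one layout would read or write 'events_completed' (and anything after
it) at the wrong address in an object laid out by the other.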


--CQ
_______________________________________________
general mailing list
general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


-------------------------------------------------------


----------  Forwarded Message  ----------

Subject: RE: [ofa-general] Another XRC binary compatable issue for	different pthread version.
Date: Monday 18 February 2008 17:29
From: "Tang, Changqing" <changquing.tang-VXdhtT5mjnY@public.gmane.org>
To: Gleb Natapov <glebn-smomgflXvOZWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>, "general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org" <general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org>


Does any application code access the events_completed field?  HP-MPI does not.

If no user code accesses 'mutex', 'cond' and 'events_completed', I suggest
putting the XRC fields in the middle of this structure.


--CQ


> -----Original Message-----
> From: Gleb Natapov [mailto:glebn-smomgflXvOZWk0Htik3J/w@public.gmane.org]
> Sent: Monday, February 18, 2008 9:21 AM
> To: Tang, Changqing
> Cc: Roland Dreier; general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> Subject: Re: [ofa-general] Another XRC binary compatable
> issue for different pthread version.
>
> On Mon, Feb 18, 2008 at 03:15:01PM +0000, Tang, Changqing wrote:
> >
> > Without using XRC fields, everything seems to work OK.
> >
> It only seems so. Access to events_completed should also be
> problematic.
>
> > --CQ
> >
> >
> > > -----Original Message-----
> > > From: Roland Dreier [mailto:rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org]
> > > Sent: Monday, February 18, 2008 7:24 AM
> > > To: Tang, Changqing
> > > Cc: general-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
> > > Subject: Re: [ofa-general] Another XRC binary compatable issue for
> > > different pthread version.
> > >
> > >  >         Here is the ibv_srq structure:
> > >  >
> > >  > struct ibv_srq {
> > >  ...
> > >  >         pthread_cond_t          cond;
> > >
> > >  > On redhat 5 system, since it has a new pthread version,
> > >  > 'pthread_cond_t' is larger than on redhat 4 system.
> > >
> > > Yikes... I don't see any way to handle this without breaking the
> > > libibverbs ABI for all existing binaries, since we have to move
> > > pthread_cond_t out of all exposed structures....
> > >
> > > Any ideas??
> > >
> > >  - R.
> > >
>
> --
>                         Gleb.
>


-------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Problem with XRC userspace
@ 2010-02-15 17:34   ` Jason Gunthorpe
From: Jason Gunthorpe @ 2010-02-15 17:34 UTC
  To: Jack Morgenstein
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, rolandd-FYB4Gu1CFyUAvxtiuMwx3w,
	Tziporet Koren, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	changquing.tang-VXdhtT5mjnY

On Mon, Feb 15, 2010 at 02:56:56PM +0200, Jack Morgenstein wrote:

> If I put the XRC srq fields after the pthread_cond_t, the RHEL4/5
> incompatibility kicks in, in a big way.  If I put them before the
> pthread_cond_t, we still have a problem with "events_completed", as
> indicated by Gleb Natapov of Voltaire, not to mention requiring all
> apps using libibverbs to recompile (no backwards libibverbs binary
> compatibility).

So, er, is this trying to say that RH changed the size of
pthread_cond_t, and because this internal structure is exposed via the
header file rather than being opaque you can get the app thinking the
size is X and the library thinking it is Y and presumably both link to
different symvers for things like pthread_cond_XX?

> In the OFED distribution, since it is installed as a set of
> packages, we simply moved the mutex and cond fields to the end of
> the structures involved (as a userspace fix).  What on earth can we
> do for the mainstream ( see below -- "Yikes...")?

Well, no matter what, you have to rev at least the ibverbs API toward
the driver. You cannot actually change ibv_srq at all without breaking
all the drivers too. Look at mlx4, it allocates a

struct mlx4_srq {
        struct ibv_srq                  ibv_srq;
        struct mlx4_buf                 buf;

During ibv_create_srq - you cannot increase the size of ibv_srq
without breaking this API.
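
A minimal sketch of that breakage, assuming two hypothetical versions of a
public base structure (demo_base_v1, demo_base_v2) and a hypothetical driver
object (demo_drv_srq) that embeds the old one as its first member. None of
these names come from libibverbs or mlx4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" public structure that the driver was compiled against. */
struct demo_base_v1 {
        void     *context;
        uint32_t  handle;
};

/* Hypothetical "new" public structure with extra fields appended. */
struct demo_base_v2 {
        void     *context;
        uint32_t  handle;
        uint32_t  xrc_srq_num;
        void     *xrc_domain;
};

/* Driver object built against v1: base first, driver data right behind it. */
struct demo_drv_srq {
        struct demo_base_v1 base;
        uint32_t            drv_private;
};

static_assert(offsetof(struct demo_drv_srq, base) == 0,
              "base must be the first member");

/* Nonzero if a library that believes the base is v2 would reach into
 * memory the driver uses for its own fields. */
int demo_base_growth_hits_driver_data(void)
{
        return sizeof(struct demo_base_v2) >
               offsetof(struct demo_drv_srq, drv_private);
}
```

Because the driver decides how much memory to allocate and where its own
fields start, a library that writes the appended fields would scribble over
drv_private in an object allocated by an old driver.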

Soo.. going ahead and breaking the driver API (rev the symver on
ibv_cmd_create_srq I guess), it seems pretty simple to fixup:

struct ibv_srq {
         struct ibv_context     *context;
         void                   *srq_context;
         struct ibv_pd          *pd;
         uint32_t                handle;

         uint32_t                xrc_srq_num;
         struct ibv_xrc_domain  *xrc_domain;
         struct ibv_cq          *xrc_cq;

         uint32_t private[64]; // Something more sneaky for the 64..
};

struct ibv_srq_private {
         pthread_mutex_t         mutex;
         pthread_cond_t          cond;
         uint32_t                events_completed;
};

Then use:
// compile-time check that private[] can hold struct ibv_srq_private
COMPILE_BUG(sizeof(srq->private) >= sizeof(struct ibv_srq_private));
pthread_cond_signal(&((struct ibv_srq_private *)srq->private)->cond);

Since pthread_cond_t can be different sizes the app *cannot* touch it
directly and the events_completed cannot be accessed without taking
the lock, this shouldn't cause any problems. You can put the xrc items
anywhere in the structure so long as the other 4 public members do not
change offset.
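
A minimal, self-contained sketch of this opaque-storage idea, assuming
hypothetical names throughout (demo_srq, demo_srq_private, private_area and
the demo_* functions are illustrations, not the libibverbs API). The public
structure carries only fixed-size fields plus a reserved area, and the library
alone interprets that area:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Public view: fixed-size members plus opaque reserved storage.
 * The size 32 is an arbitrary choice for this sketch. */
struct demo_srq {
        void     *context;
        uint32_t  handle;
        uint32_t  xrc_srq_num;
        uint64_t  private_area[32];
};

/* Library-only view of the reserved storage. */
struct demo_srq_private {
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        uint32_t        events_completed;
};

/* Fail the build if this libpthread's types do not fit. */
static_assert(sizeof(((struct demo_srq *)0)->private_area) >=
              sizeof(struct demo_srq_private),
              "private_area too small for this libpthread");

/* The cast mirrors the sketch in the mail; only library code does this. */
static struct demo_srq_private *demo_priv(struct demo_srq *srq)
{
        return (struct demo_srq_private *)srq->private_area;
}

int demo_srq_init(struct demo_srq *srq)
{
        struct demo_srq_private *p = demo_priv(srq);
        int rc = pthread_mutex_init(&p->mutex, NULL);

        if (rc)
                return rc;
        rc = pthread_cond_init(&p->cond, NULL);
        if (rc) {
                pthread_mutex_destroy(&p->mutex);
                return rc;
        }
        p->events_completed = 0;
        return 0;
}

/* Same lock / increment / signal sequence as the ack path quoted earlier. */
void demo_srq_ack_event(struct demo_srq *srq)
{
        struct demo_srq_private *p = demo_priv(srq);

        pthread_mutex_lock(&p->mutex);
        ++p->events_completed;
        pthread_cond_signal(&p->cond);
        pthread_mutex_unlock(&p->mutex);
}

/* The counter is only ever read with the lock held. */
uint32_t demo_srq_events_completed(struct demo_srq *srq)
{
        struct demo_srq_private *p = demo_priv(srq);
        uint32_t n;

        pthread_mutex_lock(&p->mutex);
        n = p->events_completed;
        pthread_mutex_unlock(&p->mutex);
        return n;
}
```

With this shape, sizeof(struct demo_srq) and the offsets of the public members
no longer depend on the pthread types, so an application never needs to know
how large pthread_cond_t is.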

Realistically, the only things that need to be built against the same
libpthread are the drivers and libibverbs itself; as long as the
3 private items remain at the end of the structure the apps won't
care. If this was a really big deal then the symver of
ibv_cmd_create_srq would need to be determined based on the libpthread
it was linked to.

Jason