Opened 8 years ago

Closed 8 years ago

#1902 closed defect (fixed)

Crash when endpoint has multiple worker threads and SIP TCP transport is disconnected during incoming call handling

Reported by: nanang Owned by: bennylp
Priority: normal Milestone: release-2.5
Component: pjsua-lib Version: trunk
Keywords: Cc:
Backport to 1.x milestone: Backported: no

Description (last modified by nanang)

Scenario:

  1. Run pjsua with multiple worker threads, e.g: with --thread-cnt=2 param.
  2. Receive a call via TCP transport and disconnect the transport immediately (can be simulated by adding code to shutdown the transport in the incoming call callback pjsua_call_on_incoming()).
  3. The thread handling incoming call may crash at any point when accessing the dialog (usually still inside pjsua_call_on_incoming() function), as the dialog is already destroyed prematurely by other working thread after transport is disconnected.

Sample call stack (crash when trying to use dialog's pool while dialog is already destroyed):

pj_pool_alloc_from_block(pj_pool_block * block, unsigned __int64 size) Line 50	C
pj_pool_alloc(pj_pool_t * pool, unsigned __int64 size) Line 60	C
pj_strdup(pj_pool_t * pool, pj_str_t * dst, const pj_str_t * src) Line 40	C
pjsip_tpmgr_find_local_addr2(pjsip_tpmgr * tpmgr, pj_pool_t * pool, pjsip_tpmgr_fla2_param * prm) Line 1510	C
pjsua_acc_get_uac_addr(int acc_id, pj_pool_t * pool, const pj_str_t * dst_uri, pjsip_host_port * addr, pjsip_transport_type_e * p_tp_type, int * secure, const void * * p_tp) Line 3118	C
pjsua_call_on_incoming(pjsip_rx_data * rdata) Line 1389	C
...

During fixing this issue, we found a couple of quite major issues:

  1. Dialog being created by pjsip_dlg_create_uas() may be destroyed by other thread before the function returns, so application has no chance to lock the dialog. It may happen with this scenario:
    1. Initial INVITE is coming, pjsip_dlg_create_uas() is being executed to create UAS dialog.
    2. pjsip_dlg_create_uas() created transaction for the incoming message and registered the just created dialog.
    3. Transport disconnected event received by other thread, it triggers transaction to move its state to terminated.
    4. The just created dialog receives on_tsx_state() and get destroyed!

This is a quite fatal bug, so we decided to deprecate pjsip_dlg_create_uas() immediately and introduced pjsip_dlg_create_uas_and_inc_lock(). Application not affected by this bug, e.g: app with a single worker thread, can continue using pjsip_dlg_create_uas() by setting DEPRECATED_FOR_TICKET_1902 to zero in config_site.h.

  1. In case of transport error on multi worker threads application, there is a possibility of a transaction user receiving two on_tsx_state() callbacks concurrently, one for terminated state notification and another for destroyed state notification, this is due to delayed terminated state notification when transport error occurs (the delay was introduced to avoid deadlock, see #1646). For dialog, this can be harmful as one of pjsip_dlg_on_tsx_state() may destroy the dialog, while another pjsip_dlg_on_tsx_state() is waiting for dialog lock.

This ticket fixes this issue by serializing the transaction state notifications.

Thanks Itay Bianco for the report.

Change History (3)

comment:1 Changed 8 years ago by nanang

  • Description modified (diff)

comment:2 Changed 8 years ago by nanang

In r5241, fixed:

  • Crash when endpoint has multiple worker threads and SIP TCP transport is disconnected during incoming call handling.
  • Deprecated pjsip_dlg_create_uas(), replaced by pjsip_dlg_create_uas_and_inc_lock().
  • Serialized transaction state notifications (terminated and destroyed) in case of transport error.

comment:3 Changed 8 years ago by nanang

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.