Opened 4 months ago

Closed 4 months ago

Last modified 3 months ago

#2191 closed defect (fixed)

Crash due to double timer entry scheduling in SIP transport

Reported by: nanang Owned by: nanang
Priority: normal Milestone: release-2.9
Component: pjsip Version: trunk
Keywords: Cc:
Backport to 1.x milestone: Backported: no

Description

Reported callstack trace:

#0  pj_atomic_get (atomic_var=0x0) at ../src/pj/os_core_unix.c:929
#1  pjsip_transport_destroy (tp=0x2a2df78) at ../src/pjsip/sip_transport.c:1309
#2  transport_idle_callback (timer_heap=0x2824280, entry=0x2a2e070)
    at ../src/pjsip/sip_transport.c:1016
#3  pj_timer_heap_poll (ht=0x2824280, next_delay=0x7fa5bd086cb0)
    at ../src/pj/timer.c:650
#4  pjsip_endpt_handle_events2 (endpt=0x2823f58, max_timeout=0x7fa5bd086d00,
    p_count=0x7fa5bd086d18) at ../src/pjsip/sip_endpoint.c:715
#5  pjsua_handle_events (msec_timeout=10) at ../src/pjsua-lib/pjsua_core.c:2134
#6  worker_thread (arg=0x0) at ../src/pjsua-lib/pjsua_core.c:768

The callstack above shows that it was about to destroy transport 0x2a2df78 (note that pj_atomic_get() is just the beginning stage of a transport destroy), while the log shows that the transport had actually been destroyed:

tcpc0x2a2df78  TCP transport destroyed normally

Thanks Adam Wientzek for the report.

Investigation

Both transport destroy operations seem to be performed via timer entry, so the culprit seems to be double timer entry scheduling, which after investigating further the scenario is possible indeed (i.e: timer heap has check for it already, but outside the timer heap lock!). So the fix should be about double timer entry scheduling prevention.

After investigating further, most SIP transport implementations (UDP, TCP, TLS) actually have a group lock for synchronization with socket, it should be a good thing to also integrate the group lock with timer.

Change History (2)

comment:1 Changed 4 months ago by nanang

  • Owner set to nanang
  • Resolution set to fixed
  • Status changed from new to closed

In 5971:

Fixed #2191:

  • Stricter double timer entry scheduling prevention.
  • Integrate group lock in SIP transport, e.g: for add/dec ref, for timer scheduling.


comment:2 Changed 3 months ago by nanang

In 5991:

Re #2191: Fixed crash in SIP transport destroy due to bug introduced by r5971, i.e: group lock is set after registering tp to tpmgr, so tpmgr won't call pj_grp_lock_add_ref(), but in unregisteration, group lock is set, so tpmgr will call pj_grp_lock_dec_ref().

Note: See TracTickets for help on using tickets.