#2191 closed defect (fixed)
Crash due to double timer entry scheduling in SIP transport
Reported by: | nanang | Owned by: | nanang |
---|---|---|---|
Priority: | normal | Milestone: | release-2.9 |
Component: | pjsip | Version: | trunk |
Keywords: | Cc: | ||
Backport to 1.x milestone: | Backported: | no |
Description
Reported callstack trace:
#0 pj_atomic_get (atomic_var=0x0) at ../src/pj/os_core_unix.c:929 #1 pjsip_transport_destroy (tp=0x2a2df78) at ../src/pjsip/sip_transport.c:1309 #2 transport_idle_callback (timer_heap=0x2824280, entry=0x2a2e070) at ../src/pjsip/sip_transport.c:1016 #3 pj_timer_heap_poll (ht=0x2824280, next_delay=0x7fa5bd086cb0) at ../src/pj/timer.c:650 #4 pjsip_endpt_handle_events2 (endpt=0x2823f58, max_timeout=0x7fa5bd086d00, p_count=0x7fa5bd086d18) at ../src/pjsip/sip_endpoint.c:715 #5 pjsua_handle_events (msec_timeout=10) at ../src/pjsua-lib/pjsua_core.c:2134 #6 worker_thread (arg=0x0) at ../src/pjsua-lib/pjsua_core.c:768
The callstack above shows that it was about to destroy transport 0x2a2df78 (note that pj_atomic_get() is just the beginning stage of a transport destroy), while the log shows that the transport had actually been destroyed:
tcpc0x2a2df78 TCP transport destroyed normally
Thanks Adam Wientzek for the report.
Investigation
Both transport destroy operations seem to be performed via timer entry, so the culprit seems to be double timer entry scheduling, which after investigating further the scenario is possible indeed (i.e: timer heap has check for it already, but outside the timer heap lock!). So the fix should be about double timer entry scheduling prevention.
After investigating further, most SIP transport implementations (UDP, TCP, TLS) actually have a group lock for synchronization with socket, it should be a good thing to also integrate the group lock with timer.
Change History (2)
comment:1 Changed 5 years ago by nanang
- Owner set to nanang
- Resolution set to fixed
- Status changed from new to closed
comment:2 Changed 5 years ago by nanang
In 5991:
In 5971: