Mutex Lock Order in PJSUA-LIB
There are many types of mutexes used in PJSIP, and both the library and application MUST obey a uniform lock order to avoid deadlocks.
To lock a particular call in PJSUA-LIB, this mutex lock ordering MUST be used:
- lock the dialog
- lock the user agent mutex
- lock PJSUA-LIB
- lock the application's mutex
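The ordering rule above can be illustrated with a small, self-contained sketch. It does not use PJSIP at all: the rank names, the mutex array, and the ordered_lock()/ordered_unlock() helpers are hypothetical and exist only to show the idea that each thread may take a lock only if it comes later in the order than everything it already holds, and must release in reverse.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical ranks mirroring the order listed above; not PJSIP types. */
enum lock_rank { RANK_DIALOG, RANK_USER_AGENT, RANK_PJSUA, RANK_APP, RANK_COUNT };

static pthread_mutex_t rank_mutex[RANK_COUNT] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Per-thread stack of the ranks this thread currently holds. */
static _Thread_local enum lock_rank held[RANK_COUNT];
static _Thread_local int held_count = 0;

/* Lock `rank` only if it comes later in the order than every lock already
 * held by this thread. Returns 0 on success, -1 on an ordering violation. */
int ordered_lock(enum lock_rank rank)
{
    assert(rank >= 0 && rank < RANK_COUNT);
    if (held_count > 0 && rank <= held[held_count - 1])
        return -1;
    pthread_mutex_lock(&rank_mutex[rank]);
    held[held_count++] = rank;
    return 0;
}

/* Unlock `rank` only if it is the most recently acquired lock, which
 * enforces release in reverse order. Returns 0 on success, -1 otherwise. */
int ordered_unlock(enum lock_rank rank)
{
    assert(rank >= 0 && rank < RANK_COUNT);
    if (held_count == 0 || held[held_count - 1] != rank)
        return -1;
    held_count--;
    pthread_mutex_unlock(&rank_mutex[rank]);
    return 0;
}
```

In this sketch a thread may skip a rank (for example, take the dialog lock and then the PJSUA-LIB lock without the user agent mutex), but it can never go back and take an earlier rank while holding a later one.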
Acquiring Locks in the Callbacks
The lock ordering above mirrors the path of an incoming SIP message. The first two locks (the dialog and user agent mutexes) are acquired automatically in PJSIP's user agent layer, and PJSIP guarantees that the locking sequence for these two mutexes is always obeyed. If the application wants to acquire the dialog's mutex, it only needs to call pjsip_dlg_inc_lock() or pjsip_dlg_try_inc_lock(), and PJSIP will lock the mutexes in the correct sequence (the application must release the dialog lock with pjsip_dlg_dec_lock()). Note that the user agent mutex is not held unless a dialog is being created or destroyed.
The callback then reaches PJSUA-LIB, which acquires its mutex with PJSUA_LOCK() or PJSUA_TRY_LOCK() (and later releases it with PJSUA_UNLOCK() once it has finished processing the event). These locking macros are currently not part of the PJSUA public API, so if the application ever wants to use them directly, it must include the <pjsua-lib/pjsua_internal.h> header file.
Finally, the callback reaches the application. At this stage, all locks (dialog and PJSUA-LIB) are held, and the application can safely work with the call. If the application employs its own mutex, that mutex may be acquired now.
Once the callback returns, the locks will be released in the reverse order.
Acquiring Locks from the Application
When the application initiates an API call (for example, from a user interface event), it MUST ensure that the lock ordering above is obeyed at all times; otherwise a deadlock can occur.
If the application does not use any mutexes in its own code, then nothing needs to be done: PJSUA-LIB guarantees that the locks are always acquired in the standard order. If the application does use its own mutex, it must take the following steps to acquire the locks in the standard order:
- Call the acquire_call() function (declared in <pjsua-lib/pjsua_internal.h>). This function acquires all the locks in the standard order above. Note that it may return a value other than PJ_SUCCESS if the locks cannot be acquired. A common reason for failure is that another thread currently holds the mutex and the function has detected that waiting for it would result in a deadlock.
- Once acquire_call() returns successfully, the application may then acquire its own mutex.
Once the application has finished processing, it must release the locks in the reverse order:
- Release the application's mutex.
- Release the call's mutex by calling pjsip_dlg_dec_lock() (the dialog handle was returned by the acquire_call() function).
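The acquire and release steps above can be modelled with plain pthread mutexes. This is only a sketch of the ordering, not PJSUA-LIB code: demo_acquire_call() and demo_release_call() are hypothetical stand-ins for acquire_call() and pjsip_dlg_dec_lock(), and call_lock stands in for whatever locks the real function holds on return. Check <pjsua-lib/pjsua_internal.h> for the actual signature before using the real function.

```c
#include <pthread.h>

/* Stand-in for the locks that acquire_call() takes; NOT a PJSIP object. */
static pthread_mutex_t call_lock = PTHREAD_MUTEX_INITIALIZER;
/* The application's own mutex, guarding application state. */
static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;
static int app_call_state = 0;

/* Hypothetical stand-in for acquire_call(): returns 0 on success, or -1
 * without holding anything if the lock cannot be taken right now. */
static int demo_acquire_call(void)
{
    return pthread_mutex_trylock(&call_lock) == 0 ? 0 : -1;
}

/* Hypothetical stand-in for pjsip_dlg_dec_lock() on the returned dialog. */
static void demo_release_call(void)
{
    pthread_mutex_unlock(&call_lock);
}

/* Update application state for a call, following the standard order. */
int app_set_call_state(int new_state)
{
    /* Step 1: library-side locks first. On failure nothing is held,
     * so there is nothing to undo. */
    if (demo_acquire_call() != 0)
        return -1;

    /* Step 2: only now take the application's mutex. */
    pthread_mutex_lock(&app_lock);
    app_call_state = new_state;

    /* Release in reverse order: application mutex, then the call. */
    pthread_mutex_unlock(&app_lock);
    demo_release_call();
    return 0;
}
```

The point of the sketch is the shape of the function: the application mutex is never taken before the call is acquired, and a failed acquire is reported to the caller instead of being retried while holding the application mutex.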
Although a hard deadlock (where the application simply freezes) should never happen when the above guidelines are followed, there may be cases where the application enters a soft deadlock state. When this happens, the application does not freeze permanently; rather, it freezes for a few seconds while PJSUA-LIB tries to acquire the locks, and this message may appear in the log afterwards:
Timed-out trying to acquire PJSUA mutex (possibly system has deadlocked) in pjsua_xxx
This could happen for example in the following scenario:
- application callback is called
- application posts a job to a worker thread, and blocks until it gets result from the worker thread
- the worker thread calls a PJSUA-LIB call API
- when PJSUA-LIB tries to acquire the locks with acquire_call() on behalf of the worker thread, it fails to get them because they are currently held by the callback thread (from the first step above). In this case, rather than deadlocking permanently, PJSUA-LIB times out after retrying for a few seconds and returns a PJ_ETIMEDOUT error to the caller of acquire_call().
This is a generic application design problem, and is not caused by PJSUA-LIB (in fact PJSUA-LIB has helped us here by not deadlocking permanently).
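The scenario can be reproduced in miniature with pthreads. The sketch below is a model under stated assumptions, not PJSUA-LIB code: lib_lock stands in for the locks held during a callback, acquire_with_retry() is a hypothetical helper that retries a try-lock a bounded number of times, and the retry count and sleep interval are arbitrary values chosen so the example finishes quickly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Stand-in for the locks held while a callback runs; NOT a PJSIP object. */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Try to lock `m` up to `max_retry` times, sleeping `sleep_ms` between
 * attempts. Returns 0 with the lock held, or ETIMEDOUT with nothing held. */
int acquire_with_retry(pthread_mutex_t *m, int max_retry, long sleep_ms)
{
    struct timespec pause_time;

    assert(m != NULL && max_retry > 0);
    pause_time.tv_sec = sleep_ms / 1000;
    pause_time.tv_nsec = (sleep_ms % 1000) * 1000000L;
    for (int i = 0; i < max_retry; ++i) {
        if (pthread_mutex_trylock(m) == 0)
            return 0;
        nanosleep(&pause_time, NULL);
    }
    return ETIMEDOUT;
}

/* Worker thread: plays the role of the job that calls a call API. */
static void *worker(void *arg)
{
    int *result = arg;

    *result = acquire_with_retry(&lib_lock, 10, 20);
    if (*result == 0)
        pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Models a callback that blocks on a worker while the lock is held.
 * Returns the worker's result. */
int blocking_callback(void)
{
    pthread_t thread;
    int worker_result = -1;

    pthread_mutex_lock(&lib_lock);        /* held for the whole callback */
    if (pthread_create(&thread, NULL, worker, &worker_result) == 0)
        pthread_join(thread, NULL);       /* callback waits for the worker */
    pthread_mutex_unlock(&lib_lock);
    return worker_result;
}
```

Because the callback thread does not release lib_lock until the worker has finished, the worker can never succeed; the bounded retry is what turns a permanent freeze into a short stall followed by an error.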
As for the solution, quoting the famous saying: I'll leave that as an exercise for the reader. ;-)
Soft Deadlock When Working with Multiple Calls
Ticket #1371 reports another soft deadlock case when working with multiple calls.