Ticket #1573 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Deadlock in SSL socket (thanks to Jeff Anderson for detailed report)

Reported by: ming Owned by: bennylp
Priority: normal Milestone: release-2.1
Component: pjlib Version: trunk
Keywords: Cc:
Backport to 1.x milestone: release-1.16 Backported: yes

Description (last modified by ming) (diff)

Thread 1 is reading data from the SSL socket with this sequence of events:

  1. ioqueue_dispatch_read_event() in ioqueue_common_abs.c locks the ioqueue key’s mutex and calls on_read_complete()
    /* Lock the key. */
    pj_mutex_lock(h->mutex);
    …
    …
    if (h->cb.on_read_complete && !IS_CLOSING(h)) {
        (*h->cb.on_read_complete)…
    }
    
  2. on_data_read calls asock_on_data_read() in ssl_sock_ossl.c. Here it locks the socket’s write_mutex:
    pj_lock_acquire(ssock->write_mutex);
    size_ = SSL_read(ssock->ossl_ssl, data_, size_);
    pj_lock_release(ssock->write_mutex);
    

Thread 2 is writing data to the SSL socket with this sequence of events:

  1. pj_ssl_sock_send() is called which acquires the lock to the socket’s write mutex:
    pj_lock_acquire(ssock->write_mutex);
    
  2. Eventually pj_ioqueue_send() is called from ioqueue_common_abs.c which locks the ioqueue key’s mutex:
    pj_mutex_lock(key->mutex);
    

So in thread 1 we lock the ioqueue key mutex followed by the socket write_mutex. In thread 2 we lock the write_mutex followed by the ioqueue key mutex.

Another deadlock scenario: It’s the same problem with one thread locking the ioqueue key’s mutex and the ssock.write_mutex in opposite orders:

Thread 1:

  1. In pj_ssl_sock_send() ssl_write is called and the ssl sock’s write_mutex is locked before the call:
    pj_lock_acquire(ssock->write_mutex);
    …
    …
    status = ssl_write(ssock, send_key, data, *size, flags);
    
  2. Call to ssl_write eventually leads to pj_ioqueue_send() being called and the ioqueue key’s mutex is locked (and deadlocked here):
    pj_mutex_lock(key->mutex);
    

Thread 2:

  1. A write event leads to ioqueue_dispatch_write_event being called which locks the ioqueue key’s mutex and calls the write callback:
    pj_mutex_lock(h->mutex);
    …
    if (h->cb.on_write_complete && !IS_CLOSING(h)) {
        (*h->cb.on_write_complete)(h, (pj_ioqueue_op_key_t*)write_op,
        write_op->written);
    }
    
  2. asock_on_data_sent is called which locks the ssl sock’s write_mutex, and here we deadlock on the acquire:
    /* Update write buffer state */
    pj_lock_acquire(ssock->write_mutex);
    

Note that SSL sock creates activesock with whole_data setting set as PJ_TRUE, so this causes the concurrency to be disabled in activesock.c line 219:

    if (asock->whole_data) {
	/* Must disable concurrency otherwise there is a race condition*/
	pj_ioqueue_set_concurrency(asock->key, 0);
    }

Change History

comment:1 Changed 6 years ago by ming

  • Description modified (diff)

comment:2 Changed 6 years ago by nanang

  • Status changed from new to closed
  • Resolution set to fixed

In 4247:

Fix #1573:

  • Never hold lock while calling pj_activesock_send*() to avoid deadlock.
  • Refactor the sending buffer management.

comment:3 Changed 6 years ago by nanang

In 4248:

Re #1573: Fixed bad cast on ioqueue send key to send data in asock_on_data_sent().

comment:4 Changed 6 years ago by nanang

In 4249:

Re #1573: Fixed improper unlock in do_handshake().

comment:5 Changed 6 years ago by ming

In 4392:

Re #1573: Backported to 1.x

comment:6 Changed 6 years ago by ming

  • Backported set
Note: See TracTickets for help on using tickets.