Version 8 (modified by bennylp, 15 years ago)


Using APS-Direct and VAS-Direct in PJMEDIA

Table of Contents

  1. Non-PCM Format
  2. Passthrough Codecs
  3. (Symbian) Sound Device API
  4. New Audio Switchboard (the non-mixing conference bridge)

APS-Direct and VAS-Direct are our codenames for functionality that uses the hardware codecs supported by Nokia APS and/or VAS directly, bypassing media processing in PJMEDIA. These features will be introduced gradually, beginning in PJSIP version 1.1.

APS stands for Audio Proxy Server, and it is available as a plug-in for Nokia S60 3rd Edition up to Feature Pack 2. It has been deprecated for FP2 devices and above and is being replaced by VoIP Audio Services (VAS), which is available as a plug-in for S60 FP1/FP2 devices and will be built into later S60 versions.


Nokia APS and VAS support codecs such as G.711 (PCMA and PCMU), G.729, iLBC, and AMR-NB, though the availability of these codecs may vary by handset type. Using these codecs instead of the software codecs (in PJMEDIA-CODEC) has significant benefits, chiefly performance (hardware versus software codecs, latency) and codec licensing (the licenses/royalties come with the codecs provided on the device).

Due to these benefits, the ability to use these codecs in PJSIP applications is very desirable.

Note that non-APS codecs, e.g. GSM, Speex/8000, and G.722, can still be used as usual.


Before you start working with APS-Direct, please make sure you understand the concepts behind it so that you can design the application appropriately.

The whole point of APS-Direct is to enable an end-to-end media flow in encoded audio format, that is, from the microphone device down to the network/socket and from the network/socket to the speaker device. This may sound obvious, but it has the following serious implications for your application design.

No mixing

As will be explained later, we have developed a new variant of the conference bridge called the audio switchboard. This object has the same API as the bridge, but it lacks the bridge's mixing capability. The implication is that you can't have two slots transmitting to the same slot in the switchboard.

So you can't have two calls active and connected to the audio device at the same time. You can have more than one call, but only one of them can be active; the others must be put on hold.

One format rule

The sound device can only handle one format at a time, meaning that if it is currently opened with G.729 format (for one call, for example), you can't feed it PCM frames, for example from the tone generator or PCM WAV files.

All PJMEDIA features that work with PCM audio will no longer work if the audio device is currently opened in codec mode. This includes the tone generator (tonegen) and WAV files. If you wish to use any of the features above, you must close the sound device and re-open it in PCM mode.
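
The one-format rule can be pictured with a small self-contained model. This is only a sketch: snd_dev, dev_fmt, and the helper functions below are hypothetical names invented for illustration and are not part of PJMEDIA. It shows that a PCM source (such as the tone generator) is accepted only after the device has been closed and reopened in PCM mode.

```c
/* Hypothetical model of the one-format rule; none of this is PJMEDIA API. */
typedef enum dev_fmt {
    FMT_PCM, FMT_PCMU, FMT_PCMA, FMT_G729, FMT_ILBC, FMT_AMR_NB
} dev_fmt;

typedef struct snd_dev {
    int     is_open;   /* non-zero while the device is open      */
    dev_fmt fmt;       /* the single format it was opened with   */
} snd_dev;

static void dev_open(snd_dev *dev, dev_fmt fmt)
{
    dev->is_open = 1;
    dev->fmt = fmt;
}

static void dev_close(snd_dev *dev)
{
    dev->is_open = 0;
}

/* A frame can be fed only if the device is open in that same format. */
static int dev_accepts(const snd_dev *dev, dev_fmt frame_fmt)
{
    return dev->is_open && dev->fmt == frame_fmt;
}

/* Switch formats by closing and reopening the device.
 * Returns 1 if a reopen was needed, 0 if the format already matched. */
static int dev_switch_format(snd_dev *dev, dev_fmt fmt)
{
    if (dev_accepts(dev, fmt))
        return 0;
    if (dev->is_open)
        dev_close(dev);
    dev_open(dev, fmt);
    return 1;
}
```

In this model, a device opened with FMT_G729 rejects FMT_PCM frames until dev_switch_format() has reopened it in PCM mode.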


The use of APS-Direct and VAS-Direct is very different from traditional PJMEDIA media processing, the main difference being that the audio frames returned by/given to the sound device are now in encoded format rather than raw PCM. The following changes will be made to support this.

Non-PCM Format

Media ports may now support non-PCM media, and this is signaled by adding a new "format" field in the pjmedia_port_info.

typedef enum pjmedia_format { 
} pjmedia_format;

Support for a new frame type (in the enum pjmedia_frame_type): PJMEDIA_FRAME_TYPE_EXTENDED. When a frame's type is set to this value, the pjmedia_frame structure may be cast to the (new) pjmedia_frame_ext struct:

typedef struct pjmedia_frame_ext {
    pjmedia_frame   base;          /**< Base frame info */
    pj_uint16_t     samples_cnt;   /**< Number of samples in this frame */
    pj_uint16_t     subframe_cnt;  /**< Number of (sub)frames in this frame */

    /* Zero or more (sub)frames follow immediately after this,
     * each will be represented by pjmedia_frame_ext_subframe
     */
} pjmedia_frame_ext;

typedef struct pjmedia_frame_ext_subframe {
    pj_uint16_t     bitlen;   /**< Number of bits in the data */
    pj_uint8_t      data[1];  /**< The encoded data */
} pjmedia_frame_ext_subframe;
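
To make the subframe layout concrete, here is a minimal, self-contained sketch of appending and locating subframes. It does not use the real PJMEDIA types: ext_hdr, ext_append(), and ext_get() are hypothetical stand-ins, and the sketch assumes that subframes are stored back to back, each as a 16-bit bit length followed by the payload rounded up to whole bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the counters in pjmedia_frame_ext. */
typedef struct ext_hdr {
    uint16_t samples_cnt;    /* total samples carried by all subframes */
    uint16_t subframe_cnt;   /* number of subframes stored in the buffer */
} ext_hdr;

/* Append one subframe (bit length + payload) at offset *off of buf.
 * Returns 0 on success, -1 if the buffer has no room. */
static int ext_append(ext_hdr *hdr, uint8_t *buf, size_t cap, size_t *off,
                      const uint8_t *data, uint16_t bitlen, uint16_t samples)
{
    size_t nbytes = ((size_t)bitlen + 7) / 8;   /* round bits up to bytes */

    if (*off + sizeof(bitlen) + nbytes > cap)
        return -1;
    memcpy(buf + *off, &bitlen, sizeof(bitlen));
    memcpy(buf + *off + sizeof(bitlen), data, nbytes);
    *off += sizeof(bitlen) + nbytes;
    hdr->subframe_cnt = (uint16_t)(hdr->subframe_cnt + 1);
    hdr->samples_cnt = (uint16_t)(hdr->samples_cnt + samples);
    return 0;
}

/* Locate the n-th subframe by walking over the preceding ones.
 * Returns a pointer to its payload and stores its bit length,
 * or returns NULL if n is out of range. */
static const uint8_t *ext_get(const ext_hdr *hdr, const uint8_t *buf,
                              unsigned n, uint16_t *bitlen)
{
    size_t off = 0;
    uint16_t len;
    unsigned i;

    if (n >= hdr->subframe_cnt)
        return NULL;
    for (i = 0; ; ++i) {
        memcpy(&len, buf + off, sizeof(len));
        if (i == n)
            break;
        off += sizeof(len) + ((size_t)len + 7) / 8;
    }
    *bitlen = len;
    return buf + off + sizeof(len);
}
```

Because subframes have variable length, finding subframe n requires walking the n subframes before it; memcpy() is used for the length field so that unaligned offsets are safe.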

The stream must also support non-PCM audio frames in its get_frame() and put_frame() port interface.

Passthrough Codecs

While the actual codec encoding/decoding will be done by APS/VAS, "dummy" codec instances still need to be created in PJMEDIA:

  • PJMEDIA needs the list of codecs supported to be offered/negotiated in SDP,
  • some codecs have special framing requirements which are not handled by the hardware codecs, for example the framing rules of AMR codecs (RFC 3267).

Passthrough codecs will be implemented for: PCMA, PCMU, iLBC, G.729, and AMR-NB.
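
As an illustration of why the codec list is still needed for SDP, the following sketch builds an m=audio line from a table of passthrough codecs. The table and build_m_line() are hypothetical and not PJMEDIA code. The static payload types 0 (PCMU), 8 (PCMA), and 18 (G729) follow RFC 3551; the dynamic payload types chosen for iLBC and AMR are arbitrary picks for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical codec table entry; not a PJMEDIA structure. */
typedef struct pt_codec {
    const char *name;
    int         pt;          /* RTP payload type */
    unsigned    clock_rate;  /* Hz */
} pt_codec;

static const pt_codec pt_codecs[] = {
    { "PCMU", 0,  8000 },
    { "PCMA", 8,  8000 },
    { "G729", 18, 8000 },
    { "iLBC", 97, 8000 },   /* dynamic payload type, arbitrary pick */
    { "AMR",  96, 8000 }    /* dynamic payload type, arbitrary pick */
};

/* Write "m=audio <port> RTP/AVP <pt> ..." into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int build_m_line(char *buf, size_t cap, unsigned port)
{
    size_t used, i;
    int n = snprintf(buf, cap, "m=audio %u RTP/AVP", port);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;
    for (i = 0; i < sizeof(pt_codecs) / sizeof(pt_codecs[0]); ++i) {
        n = snprintf(buf + used, cap - used, " %d", pt_codecs[i].pt);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The hardware never sees this table; it only lets the signaling side advertise what the device can encode and decode.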

(Symbian) Sound Device API

The APS/VAS based sound device backends will support additional APIs:

  • to query the list of supported codecs/formats (for APS, the list is currently hardcoded),
  • to set which format to use when opening the sound device,
  • audio routing to loudspeaker or earpiece (this API is already available)

New Audio Switchboard (the non-mixing conference bridge)

Since audio frames are forwarded back and forth in encoded format, the traditional conference bridge cannot handle them. A new object will be added, which we call the audio switchboard (conf_switch.c). Its API will be compatible with the existing conference bridge API, so that it can replace the bridge in the application via a compile-time switch.

Understandably some conference bridge features will not be available:

  • audio mixing feature (no conferencing feature),
  • audio level adjustment and query,
  • passive ports.

Some of the features of the switchboard:

  • uses the conference bridge API to control it (so it is compile-time compatible),
  • one source may transmit to more than one destination (though obviously one destination cannot take more than one source, since the switchboard cannot mix audio). This is useful, for example, for implementing a call recording feature in the application.
  • it is optimized for low latency to the sound device (no delaybuf buffering for the microphone),
  • much more lightweight (footprint and performance),
  • supports routing audio from ports with different ptime settings.
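
The routing rule described in the list above (one source may feed several destinations, but a destination accepts only one source) can be sketched as follows. This is a simplified model with hypothetical names (sw_board, sw_connect(), sw_disconnect()); it is not the conf_switch.c implementation.

```c
#include <string.h>

#define SW_MAX_SLOTS 8

/* Hypothetical routing table: tx[src][dst] != 0 means src transmits to dst. */
typedef struct sw_board {
    unsigned char tx[SW_MAX_SLOTS][SW_MAX_SLOTS];
} sw_board;

static void sw_init(sw_board *sw)
{
    memset(sw, 0, sizeof(*sw));
}

/* Connect src to dst. Returns 0 on success, or -1 if a slot is invalid or
 * dst already listens to a different source (no mixing is possible). */
static int sw_connect(sw_board *sw, unsigned src, unsigned dst)
{
    unsigned s;

    if (src >= SW_MAX_SLOTS || dst >= SW_MAX_SLOTS)
        return -1;
    for (s = 0; s < SW_MAX_SLOTS; ++s) {
        if (s != src && sw->tx[s][dst])
            return -1;
    }
    sw->tx[src][dst] = 1;
    return 0;
}

static void sw_disconnect(sw_board *sw, unsigned src, unsigned dst)
{
    if (src < SW_MAX_SLOTS && dst < SW_MAX_SLOTS)
        sw->tx[src][dst] = 0;
}
```

For example, a call in slot 1 can transmit both to the sound device in slot 0 and to a recorder in slot 2, but a second call in slot 3 cannot also transmit to slot 0 until slot 1 has been disconnected from it.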

Using APS-Direct or VAS-Direct

Currently only APS-Direct is implemented. Here are the steps to build an application with the APS-Direct feature.

  1. Enable the APS sound device implementation as described here.
  2. Enable the audio switchboard, i.e. in config_site.h:
  3. Enable passthrough codecs, i.e. in config_site.h:
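
The config_site.h settings for steps 2 and 3 might look like the sketch below. The macro names are an assumption based on the compile-time settings in PJMEDIA's config.h and may differ between PJSIP versions, so check them against the headers of the version you are building.

```c
/* config_site.h (sketch; macro names are assumed, verify against
 * pjmedia/config.h in your PJSIP version) */

/* Step 2: use the audio switchboard instead of the conference bridge. */
#define PJMEDIA_CONF_USE_SWITCH_BOARD   1

/* Step 3: build the passthrough codecs. */
#define PJMEDIA_HAS_PASSTHROUGH_CODECS  1
```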

For building the sample application symbian_ua, those steps are enough since it is already prepared to use APS-Direct.

For a general application, there are a few more things to handle:

  • Reopen the sound device when the active format/codec needs to change, e.g. when a call is confirmed and the stream has been created, the sound device format should match the SDP negotiation result. Here is sample code for an application that uses pjsua-lib; the sound device is reopened in the on_stream_created() pjsua callback, replacing the precreated pjsua-lib sound device instance:
    /* Global sound port. */
    static pjmedia_snd_port *g_snd_port;

    /* Reopen sound device on on_stream_created() pjsua callback. */
    static void on_stream_created(pjsua_call_id call_id,
                                  pjmedia_session *sess,
                                  unsigned stream_idx,
                                  pjmedia_port **p_port)
    {
        pjmedia_port *conf;
        pjmedia_session_info sess_info;
        pjmedia_stream_info *strm_info;
        pjmedia_snd_setting setting;
        unsigned samples_per_frame;

        /* Get active format for this stream, based on SDP negotiation result. */
        pjmedia_session_get_info(sess, &sess_info);
        strm_info = &sess_info.stream_info[stream_idx];

        /* Init sound device setting based on stream info. */
        pj_bzero(&setting, sizeof(setting));
        setting.format = strm_info->param->info.format;
        setting.bitrate = strm_info->param->info.avg_bps;
        setting.cng = strm_info->param->setting.cng;
        setting.vad = strm_info->param->setting.vad;
        setting.plc = strm_info->param->setting.plc;

        /* Close sound device and get the conference port. */
        conf = pjsua_set_no_snd_dev();

        /* frm_ptime is in milliseconds, hence the division by 1000. */
        samples_per_frame = strm_info->param->info.clock_rate *
                            strm_info->param->info.frm_ptime *
                            strm_info->param->info.channel_cnt /
                            1000;

        /* Reset conference port attributes. */
        conf->info.samples_per_frame = samples_per_frame;
        conf->info.clock_rate = strm_info->param->info.clock_rate;
        conf->info.channel_count = strm_info->param->info.channel_cnt;
        conf->info.bits_per_sample = 16;

        /* Reopen sound device: create g_snd_port here with the new 'setting'
         * (the create call is not shown). */

        /* Connect sound to conference port. */
        pjmedia_snd_port_connect(g_snd_port, conf);
    }
  • Note that the sound device instance is now owned and managed by the application, so pjsua_media_config.snd_auto_close_time will not work. Here is sample code that uses the on_stream_destroyed() pjsua callback as the trigger for closing the sound device:
    /* Close sound device on on_stream_destroyed() pjsua callback. */
    static void on_stream_destroyed(pjsua_call_id call_id,
                                    pjmedia_session *sess,
                                    unsigned stream_idx)
    {
        if (g_snd_port) {
            pjmedia_snd_port_destroy(g_snd_port);
            g_snd_port = NULL;
        }
    }
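
The samples_per_frame value computed in on_stream_created() is the number of samples in one frame of the negotiated ptime. A minimal standalone sketch of that arithmetic, assuming frm_ptime is expressed in milliseconds (frame_samples() is a hypothetical helper, not a PJMEDIA function):

```c
/* Samples in one frame for a given clock rate (Hz), frame time (ms),
 * and channel count. Hypothetical helper, not part of PJMEDIA. */
static unsigned frame_samples(unsigned clock_rate, unsigned ptime_ms,
                              unsigned channel_cnt)
{
    return clock_rate * ptime_ms * channel_cnt / 1000;
}
```

For example, a mono 8000 Hz stream with 20 ms frames gives 160 samples per frame, and the same stream with 30 ms frames gives 240.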


Internal documentation:

  1. Using APS in PJSIP
  2. Known problems of PJSIP with APS and Symbian target in general
  3. PJMEDIA and PJMEDIA-CODEC Documentation

External links:

  1. Audio Proxy Server and VoIP Audio Services (PDF presentation)
  2. Audio Proxy Server - Forum Nokia
  3. Nokia VoIP Audio Service API - Forum Nokia