Version 2 (modified by bennylp, 16 years ago)

PJMEDIA MIPS Measurement

This page shows the CPU requirements, expressed in MIPS (Million Instructions Per Second), of various PJMEDIA components, which may be useful for evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, since many PJMEDIA compile-time settings, as well as compiler switches, can be changed to improve performance.

All the results below were obtained with the stock settings that come with the PJSIP distribution. The test source is in the pjmedia/src/test directory.

Test Method

Each test measures the overall performance for both directions. For example, the resampling test shows the total upsampling and downsampling time in a single result, and a codec test shows the total encoding and decoding time.

The test program depends on the MIPS value of the processor being set correctly at compile time. The test uses strictly one thread.
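
As a rough illustration of the method above (not the actual test program), the sketch below times how long a component takes to process one second's worth of audio on a single thread. The name `time_one_second_of_audio`, the `process_frame` callable, and the 20 ms frame size are assumptions made for this example only; for a codec, `process_frame` would perform both the encode and the decode so that both directions are counted.

```python
import time


def time_one_second_of_audio(process_frame, clock_rate, samples_per_frame):
    """Return the microseconds spent processing one second's worth of frames.

    process_frame is a hypothetical stand-in for the component under test;
    it is called once per frame with a list of samples.
    """
    frame = [0] * samples_per_frame              # silent placeholder input
    frame_count = clock_rate // samples_per_frame
    start = time.perf_counter()
    for _ in range(frame_count):                 # one thread, frames back to back
        process_frame(frame)
    elapsed_sec = time.perf_counter() - start
    return elapsed_sec * 1_000_000


# Usage: a do-nothing "component" that only records the frames it receives.
calls = []
usec = time_one_second_of_audio(calls.append, clock_rate=8000, samples_per_frame=160)
print(len(calls), "frames processed in", round(usec), "usec")
```

With 160 samples per frame at 8KHz, the loop runs 50 times, which corresponds to one second of audio.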

Interpreting the Results

Columns

There are four columns in the result table:

Clock Rate:
This shows the sampling rate of the component being tested, in Hz. We test both 8KHz and 16KHz. Most components can work at both rates, hence there are two result rows for the same component, each with a different clock rate. Some components (mostly codec related) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz).
Time (usec):
This shows the time elapsed to process 1 second worth of audio samples, in microseconds.
CPU (%):
From the elapsed time above, we can calculate how much CPU is needed to run this component in real-time. For example, if the elapsed time is 1 second (one million microseconds), this component will take 100% of CPU time when run in real-time. If the elapsed time is 0.5 second (500 thousand microseconds), this component will take 50% of the CPU time when run in real-time.

The CPU percentage may be larger than 100% if processing 1 second worth of audio samples takes more than 1 second.

MIPS:
Also from the elapsed time above, we can measure the MIPS needed to run this component in real-time, since we know (or we can assume) the MIPS value of the processor.
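
The two derived columns can be reproduced from the Time column with simple arithmetic. The sketch below is only an illustration; the function names are made up for this example, and the formula is inferred from the column descriptions above.

```python
def cpu_percent(elapsed_usec):
    """CPU share needed to process 1 second of audio in real-time."""
    return elapsed_usec / 1_000_000 * 100


def required_mips(elapsed_usec, processor_mips):
    """MIPS needed in real-time, given the (assumed) MIPS of the processor."""
    return elapsed_usec / 1_000_000 * processor_mips


# "8KHz conference bridge with 1 call" from the ARM9 result: 6682 usec,
# measured on a processor assumed to deliver 198 MIPS.
print(round(cpu_percent(6682), 3))
print(round(required_mips(6682, 198.0), 2))
```

For that row the sketch gives 0.668 (%) and 1.32 (MIPS), the same values as in the result table.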

Rows

Each row shows the measurement result of a particular component. The components tested are described below.

get from memplayer:
The memory/buffer based player port supplies the audio samples for almost all of the tests, so its time is included as overhead in all of those tests.
conference bridge with N call(s):
This measures the performance of the conference bridge with N calls. Note that we don't use actual calls for the test, since we only want to measure the conference bridge performance and not codec performance (which is measured in separate tests). For this test we use a memplayer for each call to supply audio to the bridge. During the test all the calls (ports) are connected to port zero, and port zero is connected to all calls. No connections are created among the calls.
upsample+downsample:
This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half so that the clock rate is the same as the original. This test measures linear resampling and polyphase resampling using the small filter and the large filter.
WSOLA PLC - N% loss:
This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio bursts and clock drift.
WSOLA discard N% excess:
This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timing for various excess percentages is shown. The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio bursts and clock drift.
echo canceller Nms tail len:
This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the audio samples.
tone generator with single/dual freq:
This measures the performance of the tone generator when continuously generating single or dual frequency tones.
codec encode/decode:
This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
stream TX/RX:
This test is intended to measure the performance/overhead of the stream, which consists of RTP/RTCP processing and de-jitter buffering. In addition, it also tests the performance of Secure RTP (SRTP) for various combinations of settings and codec bandwidth. Since the test also includes codec processing (actual encoding and decoding), you need to subtract the result of the corresponding codec test to obtain the overhead of the stream and SRTP only.
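
The subtraction described for the stream TX/RX rows can be sketched as follows. The Time (usec) values are copied from the 8KHz rows of the ARM9 result on this page; the dictionary layout and the function name `stream_overhead_usec` are only assumptions made for this illustration.

```python
# Time (usec) of the codec-only tests, 8KHz rows.
CODEC_USEC = {"G.711": 2701, "GSM": 75750}

# Time (usec) of the stream TX/RX tests, 8KHz rows.
STREAM_USEC = {
    ("G.711", "plain"): 6786,
    ("G.711", "SRTP 32bit"): 21688,
    ("GSM", "plain"): 82035,
    ("GSM", "SRTP 32bit"): 90890,
}


def stream_overhead_usec(codec, variant):
    """Stream (and SRTP, if enabled) time with the codec time removed."""
    return STREAM_USEC[(codec, variant)] - CODEC_USEC[codec]


for (codec, variant) in STREAM_USEC:
    print(codec, variant, stream_overhead_usec(codec, variant), "usec")
```

For G.711 this leaves 4085 usec for the plain stream and 18987 usec with 32bit SRTP, so the SRTP share alone is roughly the difference between the two.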

Results

PJSIP-0.9.0, ARM9 (ARM926EJ-S) Linux

Hardware: Olimex SAM9-L9260 board
Platform: Linux 2.6.23
Processor: ARM926EJ-S rev 5 (v5l)
Speed: 180 MHz
Assumed MIPS: 198 MIPS
BogoMIPS: 98.91
Compilation: arm-926-linux-gnu-gcc -O2 -msoft-float -DNDEBUG -DPJ_HAS_FLOATING_POINT=0

Result:

00:59:38.531 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=180Mhz,  198.0 MIPS
Clock  Item                                      Time     CPU    MIPS
 Rate                                           (usec)    (%)
----------------------------------------------------------------------
 8KHz get from memplayer                          181    0.018    0.04
 8KHz conference bridge with 1 call              6682    0.668    1.32
 8KHz conference bridge with 2 calls            11943    1.194    2.36
 8KHz conference bridge with 4 calls            22402    2.240    4.44
 8KHz conference bridge with 8 calls            42969    4.297    8.51
 8KHz conference bridge with 16 calls           83328    8.333   16.50
 8KHz upsample+downsample - linear               5815    0.581    1.15
 8KHz upsample+downsample - small filter        66786    6.679   13.22
 8KHz upsample+downsample - large filter       870754   87.075  172.41
 8KHz WSOLA PLC - 0% loss                         605    0.060    0.12
 8KHz WSOLA PLC - 2% loss                        1004    0.100    0.20
 8KHz WSOLA PLC - 5% loss                        1541    0.154    0.31
 8KHz WSOLA PLC - 10% loss                       1803    0.180    0.36
 8KHz WSOLA PLC - 20% loss                       3102    0.310    0.61
 8KHz WSOLA PLC - 50% loss                       8431    0.843    1.67
 8KHz WSOLA discard 2% excess                     214    0.021    0.04
 8KHz WSOLA discard 5% excess                     488    0.049    0.10
 8KHz WSOLA discard 10% excess                   1178    0.118    0.23
 8KHz WSOLA discard 20% excess                   2009    0.201    0.40
 8KHz WSOLA discard 50% excess                   6432    0.643    1.27
 8KHz echo canceller 100ms tail len            335870   33.587   66.50
 8KHz echo canceller 128ms tail len            336225   33.623   66.57
 8KHz echo canceller 200ms tail len            349240   34.924   69.15
 8KHz echo canceller 256ms tail len            363206   36.321   71.91
 8KHz echo canceller 400ms tail len            400026   40.003   79.21
 8KHz echo canceller 500ms tail len            426646   42.665   84.48
 8KHz echo canceller 512ms tail len            432291   43.229   85.59
 8KHz echo canceller 600ms tail len            454965   45.496   90.08
 8KHz echo canceller 800ms tail len            516487   51.649  102.26
 8KHz tone generator with single freq             920    0.092    0.18
 8KHz tone generator with dual freq              1428    0.143    0.28
 8KHz codec encode/decode - G.711                2701    0.270    0.53
 8KHz codec encode/decode - GSM                 75750    7.575   15.00
 8KHz codec encode/decode - iLBC              2856203  285.620  565.53
 8KHz codec encode/decode - Speex 8Khz         436162   43.616   86.36
 8KHz codec encode/decode - L16/8000/1           1704    0.170    0.34
 8KHz stream TX/RX - G.711                       6786    0.679    1.34
 8KHz stream TX/RX - G.711 SRTP 32bit           21688    2.169    4.29
 8KHz stream TX/RX - G.711 SRTP 32bit +auth     33501    3.350    6.63
 8KHz stream TX/RX - G.711 SRTP 80bit           21725    2.172    4.30
 8KHz stream TX/RX - G.711 SRTP 80bit +auth     33551    3.355    6.64
 8KHz stream TX/RX - GSM                        82035    8.203   16.24
 8KHz stream TX/RX - GSM SRTP 32bit             90890    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 32bit + auth      99334    9.933   19.67
 8KHz stream TX/RX - GSM SRTP 80bit             90893    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 80bit + auth      99356    9.936   19.67
16KHz get from memplayer                          239    0.024    0.05
16KHz conference bridge with 1 call             12780    1.278    2.53
16KHz conference bridge with 2 calls            23052    2.305    4.56
16KHz conference bridge with 4 calls            43174    4.317    8.55
16KHz conference bridge with 8 calls            82096    8.210   16.26
16KHz conference bridge with 16 calls          158565   15.856   31.40
16KHz upsample+downsample - linear              11469    1.147    2.27
16KHz upsample+downsample - small filter       133088   13.309   26.35
16KHz upsample+downsample - large filter      1739742  173.974  344.47
16KHz WSOLA PLC - 0% loss                         980    0.098    0.19
16KHz WSOLA PLC - 2% loss                        1910    0.191    0.38
16KHz WSOLA PLC - 5% loss                        3734    0.373    0.74
16KHz WSOLA PLC - 10% loss                       7867    0.787    1.56
16KHz WSOLA PLC - 20% loss                      13007    1.301    2.58
16KHz WSOLA PLC - 50% loss                      29022    2.902    5.75
16KHz WSOLA discard 2% excess                     551    0.055    0.11
16KHz WSOLA discard 5% excess                    1027    0.103    0.20
16KHz WSOLA discard 10% excess                   1973    0.197    0.39
16KHz WSOLA discard 20% excess                  10454    1.045    2.07
16KHz WSOLA discard 50% excess                  22276    2.228    4.41
16KHz echo canceller 100ms tail len            664649   66.465  131.60
16KHz echo canceller 128ms tail len            682686   68.269  135.17
16KHz echo canceller 200ms tail len            720924   72.092  142.74
16KHz echo canceller 256ms tail len            752928   75.293  149.08
16KHz echo canceller 400ms tail len            877528   87.753  173.75
16KHz echo canceller 500ms tail len            970559   97.056  192.17
16KHz echo canceller 512ms tail len            989839   98.984  195.99
16KHz echo canceller 600ms tail len           1065465  106.547  210.96
16KHz echo canceller 800ms tail len           1285075  128.508  254.44
16KHz tone generator with single freq            1617    0.162    0.32
16KHz tone generator with dual freq              2632    0.263    0.52
16KHz codec encode/decode - G.722              148080   14.808   29.32
16KHz codec encode/decode - Speex 16Khz        979202   97.920  193.88
16KHz codec encode/decode - L16/16000/1          3244    0.324    0.64
16KHz stream TX/RX - G.722                     155685   15.568   30.83
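
To work with a result listing like the one above, the rows can be parsed into fields. The sketch below is only an illustration: the regular expression is an assumption based on the layout shown here, not a format guaranteed by the test program. It picks out the components that need more than 100% CPU on this board, i.e. those that cannot run in real-time.

```python
import re

# clock rate, item description, time (usec), CPU (%), MIPS
ROW = re.compile(r"^\s*(\d+)KHz\s+(.+?)\s+(\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_row(line):
    """Return a dict for a result row, or None for headers and other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    rate, item, usec, cpu, mips = m.groups()
    return {
        "clock_khz": int(rate),
        "item": item,
        "usec": int(usec),
        "cpu_pct": float(cpu),
        "mips": float(mips),
    }


sample = """\
Clock  Item                                      Time     CPU    MIPS
 8KHz codec encode/decode - GSM                 75750    7.575   15.00
 8KHz codec encode/decode - iLBC              2856203  285.620  565.53
16KHz echo canceller 600ms tail len           1065465  106.547  210.96
"""

rows = [r for r in map(parse_row, sample.splitlines()) if r is not None]
too_slow = [(r["clock_khz"], r["item"]) for r in rows if r["cpu_pct"] > 100.0]
print(too_slow)
```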