Version 6 (modified by bennylp, 11 years ago)

PJMEDIA MIPS Measurement

This page shows typical performance characteristics of various PJMEDIA components, which may be useful when evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, as many PJMEDIA compilation flags and compiler switches can be set to increase (or decrease) performance.

Test Method

Each test measures the overall performance in both directions. For example, the resampling test reports the total upsampling and downsampling time in a single result, and a codec test reports the total encoding and decoding time.

The test program depends on the CPU_MHZ and MIPS values of the processor being set correctly at compile time. We used the MIPS information in the following links to assume the MIPS value of the processor:

To measure the MIPS score of a component, the program measures the time taken to process 1 second worth of audio samples using that component, then calculates the MIPS score from the configured MIPS value of the processor. Because of this, the calculated MIPS shouldn't be interpreted as a real MIPS value: it is based purely on time measurement, our assumed MIPS value for the processor may be wrong, and on platforms where floating-point is available, floating-point instructions will be used instead.
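
The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the code from mips_test.c: the names `measure_usec`, `mips_score` and the `process_frame` callable are hypothetical, and the 20 ms frame length is an assumption. The score is taken to be the elapsed time in seconds multiplied by the assumed MIPS value of the processor, which is consistent with the figures in the result tables.

```python
import time


def measure_usec(process_frame, clock_rate, frame_ms=20):
    """Time how long it takes to process 1 second worth of audio, in usec.

    process_frame is a hypothetical callable standing in for the component
    under test; it receives one frame of samples as a list of ints.
    The 20 ms default frame length is an assumption for this sketch.
    """
    samples_per_frame = clock_rate * frame_ms // 1000
    frame_count = 1000 // frame_ms       # number of frames in one second
    frame = [0] * samples_per_frame      # silence as dummy input

    start = time.perf_counter()
    for _ in range(frame_count):
        process_frame(frame)
    elapsed_sec = time.perf_counter() - start
    return elapsed_sec * 1_000_000


def mips_score(time_usec, assumed_cpu_mips):
    """Scale the elapsed time by the assumed MIPS value of the processor."""
    return time_usec / 1_000_000 * assumed_cpu_mips


if __name__ == "__main__":
    # Dummy workload; a real test would call into the component being measured.
    usec = measure_usec(lambda frame: sum(frame), clock_rate=8000)
    print(f"{usec:.0f} usec, {mips_score(usec, 198.0):.2f} MIPS")
```

Because the score is only a scaled time measurement, a wrong assumed MIPS value shifts every score by the same factor.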

The test uses strictly one thread only.

All the results below were obtained with the default settings that come with the PJSIP distribution. The test source code is available in the pjmedia/src/test directory (mips_test.c file).

Interpreting the Results

Columns

There are four columns in the result table:

Clock Rate:
This shows the sampling rate of the component being tested, in KHz. We test at both 8KHz and 16KHz. Most components can work at both 8KHz and 16KHz, hence there will be two result rows for the same component, one for each clock rate. Some components (mostly codec-related) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz), hence there will be only one result row for these components.
Time (usec):
This shows the time elapsed to process 1 second worth of audio samples, in microseconds.
CPU (%):
This shows how much CPU (in percent) this component will consume when run in real-time. The value is derived from the time measurement above. For example, if the time elapsed is 1 second then this component will take 100% of the CPU time when run in real-time, and if the time elapsed is 0.5 second then it will take 50% of the CPU time.

The CPU percentage may be larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second. This may happen when the test is performed on a slower processor.

MIPS:
The MIPS (Million Instructions per Second) score roughly means how many instructions will be executed by this component per second when we run this component in real-time. The value is derived from the time measurement above, and calculated based on the assumed MIPS value of the processor. Once again, the score may be incorrect for many reasons so it shouldn't be interpreted as an official/definite score, and especially one MUST NOT use the MIPS score to compare performance of different processor families/architectures.
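
As a worked check of how the columns relate, the sketch below parses one row copied from the ARM9 result table on this page and recomputes the CPU and MIPS columns from the Time column. The helper names (`parse_row`, `cpu_percent`, `mips_from_time`) and the regular expression are illustrative assumptions, not part of the test program.

```python
import re

# One row copied from the ARM9 result table (assumed MIPS: 198.0).
ROW = " 8KHz conference bridge with 1 call              6682    0.668    1.32"

# Clock rate, item description, then the three numeric columns.
ROW_RE = re.compile(r"^\s*(\d+)KHz\s+(.+?)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s*$")


def parse_row(line):
    """Split a result row into its columns plus the item name."""
    match = ROW_RE.match(line)
    if match is None:
        raise ValueError(f"not a result row: {line!r}")
    clock_khz, item, usec, cpu_col, mips_col = match.groups()
    return {
        "clock_khz": int(clock_khz),
        "item": item,
        "time_usec": int(usec),
        "cpu_pct": float(cpu_col),
        "mips": float(mips_col),
    }


def cpu_percent(time_usec):
    """Share of one real-time second spent processing, in percent."""
    return time_usec / 1_000_000 * 100


def mips_from_time(time_usec, assumed_cpu_mips):
    """Elapsed seconds scaled by the assumed MIPS value of the processor."""
    return time_usec / 1_000_000 * assumed_cpu_mips


if __name__ == "__main__":
    row = parse_row(ROW)
    print(row["item"])
    print(round(cpu_percent(row["time_usec"]), 3))            # CPU (%) column
    print(round(mips_from_time(row["time_usec"], 198.0), 2))  # MIPS column
```

The same arithmetic explains values above 100%: the iLBC row in the ARM9 table takes 2856203 usec, which is more than one second, so its CPU column reads 285.620.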

Rows

The rows show the measurement results for particular components. The components tested are described below.

get from memplayer:
The memory/buffer based player port supplies the audio samples for almost all of the tests, so its time is added as overhead to those tests.
conference bridge with N call(s):
This measures the performance of the conference bridge with N calls. Note that we don't use actual calls for the test, since we only want to measure the conference bridge performance and not codec performance (which is measured in separate tests). So for this test we use a memplayer for each "call" to supply audio to the bridge. During the test all the calls (ports) are connected to port zero, and port zero is connected to all calls. No connections among calls are created.
upsample+downsample:
This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it to half that rate again so that the final clock rate is the same as the original. This test measures linear resampling as well as non-linear resampling using the small filter and the large filter. Some resampling backends may not support selecting between linear/non-linear and small/large filter; in that case the results will be equal for all settings.
WSOLA PLC - N% loss:
This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown.

The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, the sound port, and the conference bridge to adapt to audio bursts and clock drift.

WSOLA discard N% excess:
This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timing for various excess percentages is shown.
echo canceller Nms tail len:
This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the AEC input.
tone generator with single/dual freq:
This measures the performance of the tone generator to continuously generate single or dual frequency tone for 1 second.
codec encode/decode:
This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
stream TX/RX:
This test is intended to measure the performance/overhead of the stream, which consists of the codec, RTP/RTCP processing, and de-jitter buffering. In addition, it also tests the performance of Secure RTP (SRTP) for various setting combinations and codec bandwidths. Since this test also includes codec processing (encoding and decoding), you need to subtract the result of the corresponding codec test from the stream result to obtain the overhead of the stream and SRTP only.
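
The subtraction described for the stream TX/RX rows can be illustrated with the G.711 figures from the 8KHz rows of the ARM9 result table on this page. The dictionary keys and the `overhead_usec` helper are hypothetical names chosen for this sketch.

```python
# Time (usec) values copied from the 8KHz rows of the ARM9 result table.
ARM9_USEC = {
    "codec G.711": 2701,
    "stream G.711": 6786,
    "stream G.711 SRTP 32bit": 21688,
    "stream G.711 SRTP 32bit +auth": 33501,
}


def overhead_usec(total_usec, baseline_usec):
    """Time left after removing the baseline measurement from the total."""
    return total_usec - baseline_usec


# Stream overhead alone: stream time minus codec time.
stream_only = overhead_usec(ARM9_USEC["stream G.711"], ARM9_USEC["codec G.711"])

# SRTP overhead alone: SRTP stream time minus plain stream time.
srtp_only = overhead_usec(
    ARM9_USEC["stream G.711 SRTP 32bit"], ARM9_USEC["stream G.711"]
)
srtp_auth_only = overhead_usec(
    ARM9_USEC["stream G.711 SRTP 32bit +auth"], ARM9_USEC["stream G.711"]
)

if __name__ == "__main__":
    print(stream_only)     # 4085
    print(srtp_only)       # 14902
    print(srtp_auth_only)  # 26715
```

The subtraction is only as reliable as the two measurements it combines. In the Pentium III table, for example, the G.722 stream row (20959 usec) is lower than the G.722 codec row (32803 usec), so the difference comes out negative.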

Results

PJSIP-0.9.0, Linux, ARM9 (ARM926EJ-S), gcc

Hardware: Olimex SAM9-L9260 board
Platform: Linux 2.6.23
Processor: ARM926EJ-S rev 5 (v5l)
Speed: 180 MHz
Assumed MIPS: 198 MIPS
BogoMIPS: 98.91
Compilation: arm-926-linux-gnu-gcc -O2 -msoft-float -DNDEBUG -DPJ_HAS_FLOATING_POINT=0
gcc: version 4.2.1 --with-cpu=arm926ej-s -march=armv5te -msoft-float --with-float=soft

Result:

00:59:38.531 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=180Mhz,  198.0 MIPS
Clock  Item                                      Time     CPU    MIPS
 Rate                                           (usec)    (%)
----------------------------------------------------------------------
 8KHz get from memplayer                          181    0.018    0.04
 8KHz conference bridge with 1 call              6682    0.668    1.32
 8KHz conference bridge with 2 calls            11943    1.194    2.36
 8KHz conference bridge with 4 calls            22402    2.240    4.44
 8KHz conference bridge with 8 calls            42969    4.297    8.51
 8KHz conference bridge with 16 calls           83328    8.333   16.50
 8KHz upsample+downsample - linear               5815    0.581    1.15
 8KHz upsample+downsample - small filter        66786    6.679   13.22
 8KHz upsample+downsample - large filter       870754   87.075  172.41
 8KHz WSOLA PLC - 0% loss                         605    0.060    0.12
 8KHz WSOLA PLC - 2% loss                        1004    0.100    0.20
 8KHz WSOLA PLC - 5% loss                        1541    0.154    0.31
 8KHz WSOLA PLC - 10% loss                       1803    0.180    0.36
 8KHz WSOLA PLC - 20% loss                       3102    0.310    0.61
 8KHz WSOLA PLC - 50% loss                       8431    0.843    1.67
 8KHz WSOLA discard 2% excess                     214    0.021    0.04
 8KHz WSOLA discard 5% excess                     488    0.049    0.10
 8KHz WSOLA discard 10% excess                   1178    0.118    0.23
 8KHz WSOLA discard 20% excess                   2009    0.201    0.40
 8KHz WSOLA discard 50% excess                   6432    0.643    1.27
 8KHz echo canceller 100ms tail len            335870   33.587   66.50
 8KHz echo canceller 128ms tail len            336225   33.623   66.57
 8KHz echo canceller 200ms tail len            349240   34.924   69.15
 8KHz echo canceller 256ms tail len            363206   36.321   71.91
 8KHz echo canceller 400ms tail len            400026   40.003   79.21
 8KHz echo canceller 500ms tail len            426646   42.665   84.48
 8KHz echo canceller 512ms tail len            432291   43.229   85.59
 8KHz echo canceller 600ms tail len            454965   45.496   90.08
 8KHz echo canceller 800ms tail len            516487   51.649  102.26
 8KHz tone generator with single freq             920    0.092    0.18
 8KHz tone generator with dual freq              1428    0.143    0.28
 8KHz codec encode/decode - G.711                2701    0.270    0.53
 8KHz codec encode/decode - GSM                 75750    7.575   15.00
 8KHz codec encode/decode - iLBC              2856203  285.620  565.53
 8KHz codec encode/decode - Speex 8Khz         436162   43.616   86.36
 8KHz codec encode/decode - L16/8000/1           1704    0.170    0.34
 8KHz stream TX/RX - G.711                       6786    0.679    1.34
 8KHz stream TX/RX - G.711 SRTP 32bit           21688    2.169    4.29
 8KHz stream TX/RX - G.711 SRTP 32bit +auth     33501    3.350    6.63
 8KHz stream TX/RX - G.711 SRTP 80bit           21725    2.172    4.30
 8KHz stream TX/RX - G.711 SRTP 80bit +auth     33551    3.355    6.64
 8KHz stream TX/RX - GSM                        82035    8.203   16.24
 8KHz stream TX/RX - GSM SRTP 32bit             90890    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 32bit + auth      99334    9.933   19.67
 8KHz stream TX/RX - GSM SRTP 80bit             90893    9.089   18.00
 8KHz stream TX/RX - GSM SRTP 80bit + auth      99356    9.936   19.67
16KHz get from memplayer                          239    0.024    0.05
16KHz conference bridge with 1 call             12780    1.278    2.53
16KHz conference bridge with 2 calls            23052    2.305    4.56
16KHz conference bridge with 4 calls            43174    4.317    8.55
16KHz conference bridge with 8 calls            82096    8.210   16.26
16KHz conference bridge with 16 calls          158565   15.856   31.40
16KHz upsample+downsample - linear              11469    1.147    2.27
16KHz upsample+downsample - small filter       133088   13.309   26.35
16KHz upsample+downsample - large filter      1739742  173.974  344.47
16KHz WSOLA PLC - 0% loss                         980    0.098    0.19
16KHz WSOLA PLC - 2% loss                        1910    0.191    0.38
16KHz WSOLA PLC - 5% loss                        3734    0.373    0.74
16KHz WSOLA PLC - 10% loss                       7867    0.787    1.56
16KHz WSOLA PLC - 20% loss                      13007    1.301    2.58
16KHz WSOLA PLC - 50% loss                      29022    2.902    5.75
16KHz WSOLA discard 2% excess                     551    0.055    0.11
16KHz WSOLA discard 5% excess                    1027    0.103    0.20
16KHz WSOLA discard 10% excess                   1973    0.197    0.39
16KHz WSOLA discard 20% excess                  10454    1.045    2.07
16KHz WSOLA discard 50% excess                  22276    2.228    4.41
16KHz echo canceller 100ms tail len            664649   66.465  131.60
16KHz echo canceller 128ms tail len            682686   68.269  135.17
16KHz echo canceller 200ms tail len            720924   72.092  142.74
16KHz echo canceller 256ms tail len            752928   75.293  149.08
16KHz echo canceller 400ms tail len            877528   87.753  173.75
16KHz echo canceller 500ms tail len            970559   97.056  192.17
16KHz echo canceller 512ms tail len            989839   98.984  195.99
16KHz echo canceller 600ms tail len           1065465  106.547  210.96
16KHz echo canceller 800ms tail len           1285075  128.508  254.44
16KHz tone generator with single freq            1617    0.162    0.32
16KHz tone generator with dual freq              2632    0.263    0.52
16KHz codec encode/decode - G.722              148080   14.808   29.32
16KHz codec encode/decode - Speex 16Khz        979202   97.920  193.88
16KHz codec encode/decode - L16/16000/1          3244    0.324    0.64
16KHz stream TX/RX - G.722                     155685   15.568   30.83

PJSIP-0.9.0, Linux, Pentium3, gcc

Hardware: IBM X21 Notebook
Platform: Linux 2.6.23
Processor: Pentium III
Speed: 700 MHz
Assumed MIPS: 1895.6 MIPS
BogoMIPS: 1395.36
Compilation: -O3 -march=pentium3 -fomit-frame-pointer -DNDEBUG
gcc: version 4.2.3

Result:

02:01:45.561 os_core_unix.c pjlib 0.9.0-trunk for POSIX initialized
MIPS test, with CPU=700Mhz, 1895.6 MIPS
Clock  Item                                      Time     CPU    MIPS
 Rate                                           (usec)    (%)       
----------------------------------------------------------------------
 8KHz get from memplayer                           23    0.002    0.04
 8KHz conference bridge with 1 call               800    0.080    1.52
 8KHz conference bridge with 2 calls             1395    0.140    2.64
 8KHz conference bridge with 4 calls             2522    0.252    4.78
 8KHz conference bridge with 8 calls             4704    0.470    8.92
 8KHz conference bridge with 16 calls            9146    0.915   17.34
 8KHz upsample+downsample - linear                589    0.059    1.12
 8KHz upsample+downsample - small filter         9563    0.956   18.13
 8KHz upsample+downsample - large filter        46644    4.664   88.42
 8KHz WSOLA PLC - 0% loss                         107    0.011    0.20
 8KHz WSOLA PLC - 2% loss                         240    0.024    0.45
 8KHz WSOLA PLC - 5% loss                         466    0.047    0.88
 8KHz WSOLA PLC - 10% loss                        524    0.052    0.99
 8KHz WSOLA PLC - 20% loss                        958    0.096    1.82
 8KHz WSOLA PLC - 50% loss                       2667    0.267    5.06
 8KHz WSOLA discard 2% excess                      57    0.006    0.11
 8KHz WSOLA discard 5% excess                     142    0.014    0.27
 8KHz WSOLA discard 10% excess                    364    0.036    0.69
 8KHz WSOLA discard 20% excess                    631    0.063    1.20
 8KHz WSOLA discard 50% excess                   2081    0.208    3.94
 8KHz echo canceller 100ms tail len             40050    4.005   75.92
 8KHz echo canceller 128ms tail len             33179    3.318   62.89
 8KHz echo canceller 200ms tail len             35161    3.516   66.65
 8KHz echo canceller 256ms tail len             37470    3.747   71.03
 8KHz echo canceller 400ms tail len             45104    4.510   85.50
 8KHz echo canceller 500ms tail len             50504    5.050   95.74
 8KHz echo canceller 512ms tail len             50940    5.094   96.56
 8KHz echo canceller 600ms tail len             56113    5.611  106.37
 8KHz echo canceller 800ms tail len             71677    7.168  135.87
 8KHz tone generator with single freq            1758    0.176    3.33
 8KHz tone generator with dual freq              3506    0.351    6.65
 8KHz codec encode/decode - G.711                 357    0.036    0.68
 8KHz codec encode/decode - GSM                 11382    1.138   21.58
 8KHz codec encode/decode - iLBC                46894    4.689   88.89
 8KHz codec encode/decode - Speex 8Khz          64428    6.443  122.13
 8KHz codec encode/decode - L16/8000/1            248    0.025    0.47
 8KHz stream TX/RX - G.711                        617    0.062    1.17
 8KHz stream TX/RX - G.711 SRTP 32bit            1751    0.175    3.32
 8KHz stream TX/RX - G.711 SRTP 32bit +auth      3161    0.316    5.99
 8KHz stream TX/RX - G.711 SRTP 80bit            1773    0.177    3.36
 8KHz stream TX/RX - G.711 SRTP 80bit +auth      3108    0.311    5.89
 8KHz stream TX/RX - GSM                        11755    1.176   22.28
 8KHz stream TX/RX - GSM SRTP 32bit             12439    1.244   23.58
 8KHz stream TX/RX - GSM SRTP 32bit + auth      13285    1.329   25.18
 8KHz stream TX/RX - GSM SRTP 80bit             12270    1.227   23.26
 8KHz stream TX/RX - GSM SRTP 80bit + auth      13358    1.336   25.32
16KHz get from memplayer                           27    0.003    0.05
16KHz conference bridge with 1 call              1522    0.152    2.89
16KHz conference bridge with 2 calls             2711    0.271    5.14
16KHz conference bridge with 4 calls             4772    0.477    9.05
16KHz conference bridge with 8 calls             8913    0.891   16.90
16KHz conference bridge with 16 calls           18759    1.876   35.56
16KHz upsample+downsample - linear               1136    0.114    2.15
16KHz upsample+downsample - small filter        19231    1.923   36.45
16KHz upsample+downsample - large filter        93066    9.307  176.42
16KHz WSOLA PLC - 0% loss                         177    0.018    0.34
16KHz WSOLA PLC - 2% loss                         534    0.053    1.01
16KHz WSOLA PLC - 5% loss                        1165    0.116    2.21
16KHz WSOLA PLC - 10% loss                       2796    0.280    5.30
16KHz WSOLA PLC - 20% loss                       4515    0.451    8.56
16KHz WSOLA PLC - 50% loss                      10482    1.048   19.87
16KHz WSOLA discard 2% excess                     168    0.017    0.32
16KHz WSOLA discard 5% excess                     326    0.033    0.62
16KHz WSOLA discard 10% excess                    654    0.065    1.24
16KHz WSOLA discard 20% excess                   3526    0.353    6.68
16KHz WSOLA discard 50% excess                   7507    0.751   14.23
16KHz echo canceller 100ms tail len             68547    6.855  129.94
16KHz echo canceller 128ms tail len             72619    7.262  137.66
16KHz echo canceller 200ms tail len             78054    7.805  147.96
16KHz echo canceller 256ms tail len             84739    8.474  160.63
16KHz echo canceller 400ms tail len            107738   10.774  204.23
16KHz echo canceller 500ms tail len            129879   12.988  246.20
16KHz echo canceller 512ms tail len            133796   13.380  253.62
16KHz echo canceller 600ms tail len            152166   15.217  288.45
16KHz echo canceller 800ms tail len            205415   20.542  389.38
16KHz tone generator with single freq            3489    0.349    6.61
16KHz tone generator with dual freq              6996    0.700   13.26
16KHz codec encode/decode - G.722               32803    3.280   62.18
16KHz codec encode/decode - Speex 16Khz        156629   15.663  296.91
16KHz codec encode/decode - L16/16000/1           434    0.043    0.82
16KHz stream TX/RX - G.722                      20959    2.096   39.73

PJSIP-0.9.0, Windows XP, Pentium 4, Visual Studio 2005

Hardware: HP PC
Platform: Windows XP SP2
Processor: Pentium 4 (single core, no Hyper-Threading)
Speed: 2.6 GHz
Assumed MIPS: 8102 MIPS
BogoMIPS: -
Compilation: Default Release settings (/O2)
Compiler: Visual Studio 2005

Result:

09:46:14.571 os_core_win32. pjlib 0.9.0-trunk for win32 initialized
MIPS test, with CPU=2666Mhz, 8102.0 MIPS
Clock  Item                                      Time     CPU    MIPS
 Rate                                           (usec)    (%)
----------------------------------------------------------------------
 8KHz get from memplayer                           11    0.001    0.09
 8KHz conference bridge with 1 call               337    0.034    2.73
 8KHz conference bridge with 2 calls              512    0.051    4.15
 8KHz conference bridge with 4 calls              919    0.092    7.45
 8KHz conference bridge with 8 calls             1658    0.166   13.43
 8KHz conference bridge with 16 calls            3180    0.318   25.76
 8KHz upsample+downsample - linear                288    0.029    2.33
 8KHz upsample+downsample - small filter         7822    0.782   63.37
 8KHz upsample+downsample - large filter        38386    3.839  311.00
 8KHz WSOLA PLC - 0% loss                          53    0.005    0.43
 8KHz WSOLA PLC - 2% loss                          61    0.006    0.49
 8KHz WSOLA PLC - 5% loss                         103    0.010    0.83
 8KHz WSOLA PLC - 10% loss                        152    0.015    1.23
 8KHz WSOLA PLC - 20% loss                        195    0.020    1.58
 8KHz WSOLA PLC - 50% loss                        520    0.052    4.21
 8KHz WSOLA discard 2% excess                       8    0.001    0.06
 8KHz WSOLA discard 5% excess                      27    0.003    0.22
 8KHz WSOLA discard 10% excess                     74    0.007    0.60
 8KHz WSOLA discard 20% excess                    117    0.012    0.95
 8KHz WSOLA discard 50% excess                    370    0.037    3.00
 8KHz echo canceller 100ms tail len             20945    2.095  169.70
 8KHz echo canceller 128ms tail len             20484    2.048  165.96
 8KHz echo canceller 200ms tail len             21017    2.102  170.28
 8KHz echo canceller 256ms tail len             21562    2.156  174.69
 8KHz echo canceller 400ms tail len             23030    2.303  186.59
 8KHz echo canceller 500ms tail len             24102    2.410  195.27
 8KHz echo canceller 512ms tail len             24441    2.444  198.02
 8KHz echo canceller 600ms tail len             25380    2.538  205.63
 8KHz echo canceller 800ms tail len             28751    2.875  232.94
 8KHz tone generator with single freq              84    0.008    0.68
 8KHz tone generator with dual freq               125    0.013    1.01
 8KHz codec encode/decode - G.711                 135    0.014    1.09
 8KHz codec encode/decode - GSM                  6898    0.690   55.89
 8KHz codec encode/decode - iLBC                39783    3.978  322.32
 8KHz codec encode/decode - Speex 8Khz          24543    2.454  198.85
 8KHz codec encode/decode - L16/8000/1            161    0.016    1.30
 8KHz stream TX/RX - G.711                        298    0.030    2.41
 8KHz stream TX/RX - G.711 SRTP 32bit             633    0.063    5.13
 8KHz stream TX/RX - G.711 SRTP 32bit +auth      1063    0.106    8.61
 8KHz stream TX/RX - G.711 SRTP 80bit             634    0.063    5.14
 8KHz stream TX/RX - G.711 SRTP 80bit +auth      1066    0.107    8.64
 8KHz stream TX/RX - GSM                         7182    0.718   58.19
 8KHz stream TX/RX - GSM SRTP 32bit              7353    0.735   59.57
 8KHz stream TX/RX - GSM SRTP 32bit + auth       7693    0.769   62.33
 8KHz stream TX/RX - GSM SRTP 80bit              7313    0.731   59.25
 8KHz stream TX/RX - GSM SRTP 80bit + auth       7673    0.767   62.17
16KHz get from memplayer                            8    0.001    0.06
16KHz conference bridge with 1 call               592    0.059    4.80
16KHz conference bridge with 2 calls              907    0.091    7.35
16KHz conference bridge with 4 calls             1620    0.162   13.13
16KHz conference bridge with 8 calls             3055    0.306   24.75
16KHz conference bridge with 16 calls            5799    0.580   46.98
16KHz upsample+downsample - linear                560    0.056    4.54
16KHz upsample+downsample - small filter        15505    1.551  125.62
16KHz upsample+downsample - large filter        76944    7.694  623.40
16KHz WSOLA PLC - 0% loss                          52    0.005    0.42
16KHz WSOLA PLC - 2% loss                         263    0.026    2.13
16KHz WSOLA PLC - 5% loss                         113    0.011    0.92
16KHz WSOLA PLC - 10% loss                        383    0.038    3.10
16KHz WSOLA PLC - 20% loss                        742    0.074    6.01
16KHz WSOLA PLC - 50% loss                       1757    0.176   14.24
16KHz WSOLA discard 2% excess                       9    0.001    0.07
16KHz WSOLA discard 5% excess                      69    0.007    0.56
16KHz WSOLA discard 10% excess                    220    0.022    1.78
16KHz WSOLA discard 20% excess                    403    0.040    3.27
16KHz WSOLA discard 50% excess                   1301    0.130   10.54
16KHz echo canceller 100ms tail len             42084    4.208  340.96
16KHz echo canceller 128ms tail len             42697    4.270  345.93
16KHz echo canceller 200ms tail len             43782    4.378  354.72
16KHz echo canceller 256ms tail len             45008    4.501  364.65
16KHz echo canceller 400ms tail len             49519    4.952  401.20
16KHz echo canceller 500ms tail len             51945    5.194  420.86
16KHz echo canceller 512ms tail len             52492    5.249  425.29
16KHz echo canceller 600ms tail len             54984    5.498  445.48
16KHz echo canceller 800ms tail len             60065    6.006  486.65
16KHz tone generator with single freq             161    0.016    1.30
16KHz tone generator with dual freq               239    0.024    1.94
16KHz codec encode/decode - G.722                9354    0.935   75.79
16KHz codec encode/decode - Speex 16Khz         51086    5.109  413.90
16KHz codec encode/decode - L16/16000/1           304    0.030    2.46
16KHz stream TX/RX - G.722                       9570    0.957   77.54