Changes between Version 5 and Version 6 of PJMEDIA-MIPS

Timestamp:: Jul 5, 2008 9:40:48 AM (16 years ago)
Author:: bennylp
Comment:: --

PJMEDIA-MIPS

-                      v5
+                      v6
 = PJMEDIA MIPS Measurement =
 This page shows the CPU requirements/MIPS (Million Instructions Per Second) measurements of various PJMEDIA components which would be useful to evaluate PJMEDIA performance. Please do not interpret these numbers as an official or definite performance number, as these tests do not actually measure the actual number of instructions executed but rather the time, and there are many compilation flags in PJMEDIA as well as compiler switches that can be set to improve the performance.
+This page attempts to show the typical performance characteristics of various PJMEDIA components, which could be useful for evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, as there are many compilation flags in PJMEDIA as well as compiler switches that can be set to increase (or decrease) the performance.
 == Test Method ==
+Each test should measure the overall performance for both directions. So for example for resampling, the test shows the total upsample and downsample time in a single test,  and for codec it will show the total encoding and decoding time.
+The test program depends on a correct setting of MIPS value of the processor being set correctly during compilation time. The test uses strictly one thread only.
+To measure the MIPS value of a component, the program calculates the time to process 1 second worth of audio samples using that component, then calculates the MIPS value based on the configured MIPS value of the processor.
+All the results below are done with the stock settings that come with PJSIP distribution. The test source code is available in '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).
+Each test should measure the overall performance for both directions. So for example for resampling, the test shows the total upsample and downsample time in a single test, and for codec it will show the total encoding and decoding time.
+The test program depends on the CPU_MHZ and MIPS values of the processor being set correctly at compilation time. We used the MIPS information in the following links to assume the MIPS value of the processor:
+ - http://en.wikipedia.org/wiki/Million_instructions_per_second
+ - http://en.wikipedia.org/wiki/ARM_architecture
+To measure the MIPS score of a component, the program calculates the time to process 1 second worth of audio samples using that component, then calculates the MIPS score based on the configured MIPS value of the processor. Because of this, the calculated MIPS shouldn't be interpreted as a ''real'' MIPS value: it is based purely on time measurement, our assumed MIPS value for the processor may be wrong, and on platforms where floating-point is available, floating-point instructions will be used instead.
+The test uses strictly one thread only.
+All the results below were obtained with the default settings that come with the PJSIP distribution. The test source code is available in the '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).
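The timing step described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in {{{mips_test.c}}}: the function name `time_one_second_of_audio`, the `process_frame` callable, and the 20 ms frame length are all assumptions made for this example.

```python
import time


def time_one_second_of_audio(process_frame, clock_rate_hz, frame_ms=20):
    """Return the microseconds taken to process 1 second worth of audio.

    process_frame is a hypothetical callable standing in for the component
    under test; it is called once per frame with a list of samples.
    """
    samples_per_frame = clock_rate_hz * frame_ms // 1000
    frame_count = 1000 // frame_ms      # number of frames in 1 second of audio
    frame = [0] * samples_per_frame     # silent input frame

    start = time.perf_counter()
    for _ in range(frame_count):        # one thread, frames processed back to back
        process_frame(frame)
    elapsed_sec = time.perf_counter() - start
    return elapsed_sec * 1_000_000


# Example: a stand-in "component" that only sums the samples of each frame.
usec = time_one_second_of_audio(sum, clock_rate_hz=8000)
```

The returned value plays the role of the Time (usec) column; the CPU and MIPS figures are then derived from it.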
 == Interpreting the Results ==
 …
  '''Clock Rate:''' ::
   This shows the sampling rate of the component being tested, in Hz. We test both 8KHz and 16Hz. Most components can work in both 8KHz and 16KHz hence there will be two test results row for the same component, each with different clock rate. Some components (mostly related codec) can only work in one of the clock rate (e.g. GSM is only shown in 8KHz, while G.722 is only shown in 16KHz).
+  This shows the sampling rate of the component being tested, in KHz. We test both 8KHz and 16KHz. Most components can work at both 8KHz and 16KHz, hence there will be two test result rows for the same component, each with a different clock rate. Some components (mostly codec related) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz), hence there will only be one test result row for these components.
  '''Time (usec):''' ::
 …
  '''CPU (%):''' ::
  From the elapsed time above, we can measure how much CPU usage needed to run this component in real-time. For example, if the time elapsed is 1 second (one million microseconds) then this component will take 100% of CPU time when run in real-time. Or if the time elapsed is 0.5 second (500 thousands microseconds) then this component will take 50% of the CPU time when run in real-time.
  The CPU percentage maybe larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second.
+ This shows how much CPU usage (in percent) this component will consume when running in real-time. The value is derived from the time measurement above. For example, if the time elapsed is 1 second then this component will take 100% of CPU time when run in real-time. Or if the time elapsed is 0.5 second then this component will take 50% of the CPU time when run in real-time.
+ The CPU percentage may be larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second. This may happen when we perform the test on a slower processor.
  '''MIPS:''' ::
  Also from the elapsed time above, we can measure the MIPS needed to run this component in real-time, since we know (or we can assume) the MIPS value of the processor.
+ The MIPS (Million Instructions per Second) score roughly means how many instructions will be executed by this component per second when we run this component in real-time. The value is derived from the time measurement above, and calculated based on the assumed MIPS value of the processor. Once again, the score may be incorrect for many reasons so it shouldn't be interpreted as an official/definite score, and in particular one MUST NOT use the MIPS score to compare the performance of different processor families/architectures.
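As a worked example of how the CPU and MIPS columns follow from the elapsed time, here is a minimal sketch. The function names are made up for this illustration. The input figures (337 usec and an assumed 8102 MIPS processor) are taken from the "conference bridge with 1 call" row of the Pentium 4 results on this page.

```python
def cpu_percent(elapsed_usec):
    """CPU usage needed to process 1 second of audio in real-time."""
    return elapsed_usec / 1_000_000 * 100


def mips_score(elapsed_usec, assumed_cpu_mips):
    """Share of the processor's assumed MIPS consumed by the component."""
    return elapsed_usec / 1_000_000 * assumed_cpu_mips


print(cpu_percent(500_000))                 # 0.5 second elapsed -> 50.0
print(round(cpu_percent(337), 3))           # 0.034, as in the CPU (%) column
print(round(mips_score(337, 8102.0), 2))    # 2.73, as in the MIPS column
```

Both values scale linearly with the elapsed time, which is why an incorrect assumed MIPS value for the processor shifts every MIPS score by the same factor.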
 === Rows ===
 The rows shows the measurement result of a particular components. The components tested are described below.
+Each row shows the measurement result of a particular component. The components tested are described below.
  '''get from memplayer:''' ::
 …
  '''conference bridge with N call(s):''' ::
   This measures the performance of the conference bridge with N calls. Note that we don't use actual call for the test since we only want to measure the conference bridge performance and not codec performance (this will be measured in separate tests). So for this test we use memplayer for each ''call'' to supply audio to the bridge. During the test all the calls (ports) will be connected to port zero and port zero will be connected to all calls. No connection among calls is created.
+  This measures the performance of the conference bridge with N calls. Note that we don't use actual calls for the test since we only want to measure the conference bridge performance and not codec performance (this is measured in separate tests). So for this test we use a memplayer for each "call" to supply audio to the bridge. During the test all the calls (ports) are connected to port zero and port zero is connected to all calls. No connections among calls are created.
  '''upsample+downsample:''' ::
   This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsample it twice the clock rate, then downsample it half the clock rate again so that the clock rate now is the same as originally. This test measures linear resampling and polyphase resampling using small filter and large filter.
+  This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half so that the clock rate is the same as the original. This test measures both linear and non-linear resampling using small filter and large filter. Some resampling backend algorithms may not support selecting between linear/non-linear and small/large filter; in that case the results will be equal for all settings.
  '''WSOLA PLC - N% loss:''' ::
+  This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packet (a.k.a Packet Lost Concealment/PLC). Timing for various loss percentage is shown. The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.
+  This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown.
+  The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.
  '''WSOLA discard N% excess:''' ::
   This measures the performance of Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drifts). Timing for various excess percentage is shown. The WSOLA algorithm is used by both the delay buffer and PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio burst and clock drifts.
+  This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to discard excess audio samples (e.g. caused by clock drifts). Timing for various excess percentages is shown.
  '''echo canceller Nms tail len:''' ::
   This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the audio samples.
+  This measures the performance of the acoustic echo canceller (AEC) for various echo tail settings. The audio source is taken from memplayer, and there is no acoustic delay in the AEC input.
  '''tone generator with single/dual freq:''' ::
   This measures the performance of the tone generator to continuously generate single or dual frequency tone.
+  This measures the performance of the tone generator to continuously generate single or dual frequency tone for 1 second.
  '''codec encode/decode:''' ::
   This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
+  This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
  '''stream TX/RX:''' ::
   This test is intended to measure the performance/overhead of the stream, which consist of RTP/RTCP processing and de-jitter buffering. In addition it also tests the performance of Secure RTP (SRTP) for various settings combination and codec bandwidth. Since the test here also consists of codec processing (actual encoding and decoding), you need to subtract the result with the result of the corresponding codec to measure the overhead of the stream and SRTP only.
+  This test is intended to measure the performance/overhead of the stream, which consists of codec, RTP/RTCP processing, and de-jitter buffering. In addition it also tests the performance of Secure RTP (SRTP) for various setting combinations and codec bandwidths. Since the test here also includes codec processing (encoding and decoding), you need to subtract the result of the corresponding codec test from the stream result to measure the overhead of the stream and SRTP only.
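The subtraction described for the stream TX/RX rows can be sketched as below, using the first set of G.711 figures from the Pentium 4 results on this page. The helper name `overhead_usec` is made up for this example, and the second subtraction (SRTP result minus plain stream result) is an inference that follows the same reasoning rather than something the test itself reports.

```python
def overhead_usec(combined_usec, baseline_usec):
    """Time attributable to the extra layer: combined result minus baseline."""
    return combined_usec - baseline_usec


# Figures in usec, copied from the Pentium 4 results on this page.
codec_g711 = 135            # codec encode/decode - G.711
stream_g711 = 298           # stream TX/RX - G.711
stream_g711_srtp32 = 633    # stream TX/RX - G.711 SRTP 32bit

# Stream overhead without the codec: 298 - 135 = 163 usec
stream_only = overhead_usec(stream_g711, codec_g711)

# Stream plus SRTP overhead without the codec: 633 - 135 = 498 usec
stream_and_srtp = overhead_usec(stream_g711_srtp32, codec_g711)

# Assumed by the same reasoning: SRTP alone is 633 - 298 = 335 usec
srtp_only = overhead_usec(stream_g711_srtp32, stream_g711)
```

The same subtraction applies to the CPU and MIPS columns, since both are linear in the elapsed time.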
 …
 16KHz stream TX/RX - G.722                      20959    2.096   39.73
 }}}
+=== PJSIP-0.9.0, Windows XP, Pentium 4, Visual Studio 2005 ===
+||Hardware:||HP PC||
+||Platform:||Windows XP SP2||
+||Processor:||Pentium 4 (single core, no Hyper-Threading)||
+||Speed:||2.6 GHz||
+||Assumed MIPS:|| 8102 MIPS||
+||BogoMIPS:|| - ||
+||Compilation:|| Default Release settings (/O2) ||
+||Compiler:|| Visual Studio 2005 ||
+Result:
+{{{
+:46:14.571 os_core_win32. pjlib 0.9.0-trunk for win32 initialized
+MIPS test, with CPU=2666Mhz, 8102.0 MIPS
+Clock  Item                                      Time     CPU    MIPS
+ Rate                                           (usec)    (%)
+----------------------------------------------------------------------
+ 8KHz get from memplayer                           11    0.001    0.09
+ 8KHz conference bridge with 1 call               337    0.034    2.73
+ 8KHz conference bridge with 2 calls              512    0.051    4.15
+ 8KHz conference bridge with 4 calls              919    0.092    7.45
+ 8KHz conference bridge with 8 calls             1658    0.166   13.43
+ 8KHz conference bridge with 16 calls            3180    0.318   25.76
+ 8KHz upsample+downsample - linear                288    0.029    2.33
+ 8KHz upsample+downsample - small filter         7822    0.782   63.37
+ 8KHz upsample+downsample - large filter        38386    3.839  311.00
+ 8KHz WSOLA PLC - 0% loss                          53    0.005    0.43
+ 8KHz WSOLA PLC - 2% loss                          61    0.006    0.49
+ 8KHz WSOLA PLC - 5% loss                         103    0.010    0.83
+ 8KHz WSOLA PLC - 10% loss                        152    0.015    1.23
+ 8KHz WSOLA PLC - 20% loss                        195    0.020    1.58
+ 8KHz WSOLA PLC - 50% loss                        520    0.052    4.21
+ 8KHz WSOLA discard 2% excess                       8    0.001    0.06
+ 8KHz WSOLA discard 5% excess                      27    0.003    0.22
+ 8KHz WSOLA discard 10% excess                     74    0.007    0.60
+ 8KHz WSOLA discard 20% excess                    117    0.012    0.95
+ 8KHz WSOLA discard 50% excess                    370    0.037    3.00
+ 8KHz echo canceller 100ms tail len             20945    2.095  169.70
+ 8KHz echo canceller 128ms tail len             20484    2.048  165.96
+ 8KHz echo canceller 200ms tail len             21017    2.102  170.28
+ 8KHz echo canceller 256ms tail len             21562    2.156  174.69
+ 8KHz echo canceller 400ms tail len             23030    2.303  186.59
+ 8KHz echo canceller 500ms tail len             24102    2.410  195.27
+ 8KHz echo canceller 512ms tail len             24441    2.444  198.02
+ 8KHz echo canceller 600ms tail len             25380    2.538  205.63
+ 8KHz echo canceller 800ms tail len             28751    2.875  232.94
+ 8KHz tone generator with single freq              84    0.008    0.68
+ 8KHz tone generator with dual freq               125    0.013    1.01
+ 8KHz codec encode/decode - G.711                 135    0.014    1.09
+ 8KHz codec encode/decode - GSM                  6898    0.690   55.89
+ 8KHz codec encode/decode - iLBC                39783    3.978  322.32
+ 8KHz codec encode/decode - Speex 8Khz          24543    2.454  198.85
+ 8KHz codec encode/decode - L16/8000/1            161    0.016    1.30
+ 8KHz stream TX/RX - G.711                        298    0.030    2.41
+ 8KHz stream TX/RX - G.711 SRTP 32bit             633    0.063    5.13
+ 8KHz stream TX/RX - G.711 SRTP 32bit +auth      1063    0.106    8.61
+ 8KHz stream TX/RX - G.711 SRTP 80bit             634    0.063    5.14
+ 8KHz stream TX/RX - G.711 SRTP 80bit +auth      1066    0.107    8.64
+ 8KHz stream TX/RX - GSM                         7182    0.718   58.19
+ 8KHz stream TX/RX - GSM SRTP 32bit              7353    0.735   59.57
+ 8KHz stream TX/RX - GSM SRTP 32bit + auth       7693    0.769   62.33
+ 8KHz stream TX/RX - GSM SRTP 80bit              7313    0.731   59.25
+ 8KHz stream TX/RX - GSM SRTP 80bit + auth       7673    0.767   62.17
+16KHz get from memplayer                            8    0.001    0.06
+16KHz conference bridge with 1 call               592    0.059    4.80
+16KHz conference bridge with 2 calls              907    0.091    7.35
+16KHz conference bridge with 4 calls             1620    0.162   13.13
+16KHz conference bridge with 8 calls             3055    0.306   24.75
+16KHz conference bridge with 16 calls            5799    0.580   46.98
+16KHz upsample+downsample - linear                560    0.056    4.54
+16KHz upsample+downsample - small filter        15505    1.551  125.62
+16KHz upsample+downsample - large filter        76944    7.694  623.40
+16KHz WSOLA PLC - 0% loss                          52    0.005    0.42
+16KHz WSOLA PLC - 2% loss                         263    0.026    2.13
+16KHz WSOLA PLC - 5% loss                         113    0.011    0.92
+16KHz WSOLA PLC - 10% loss                        383    0.038    3.10
+16KHz WSOLA PLC - 20% loss                        742    0.074    6.01
+16KHz WSOLA PLC - 50% loss                       1757    0.176   14.24
+16KHz WSOLA discard 2% excess                       9    0.001    0.07
+16KHz WSOLA discard 5% excess                      69    0.007    0.56
+16KHz WSOLA discard 10% excess                    220    0.022    1.78
+16KHz WSOLA discard 20% excess                    403    0.040    3.27
+16KHz WSOLA discard 50% excess                   1301    0.130   10.54
+16KHz echo canceller 100ms tail len             42084    4.208  340.96
+16KHz echo canceller 128ms tail len             42697    4.270  345.93
+16KHz echo canceller 200ms tail len             43782    4.378  354.72
+16KHz echo canceller 256ms tail len             45008    4.501  364.65
+16KHz echo canceller 400ms tail len             49519    4.952  401.20
+16KHz echo canceller 500ms tail len             51945    5.194  420.86
+16KHz echo canceller 512ms tail len             52492    5.249  425.29
+16KHz echo canceller 600ms tail len             54984    5.498  445.48
+16KHz echo canceller 800ms tail len             60065    6.006  486.65
+16KHz tone generator with single freq             161    0.016    1.30
+16KHz tone generator with dual freq               239    0.024    1.94
+16KHz codec encode/decode - G.722                9354    0.935   75.79
+16KHz codec encode/decode - Speex 16Khz         51086    5.109  413.90
+16KHz codec encode/decode - L16/16000/1           304    0.030    2.46
+16KHz stream TX/RX - G.722                       9570    0.957   77.54
+}}}