Changes between Version 5 and Version 6 of PJMEDIA-MIPS


Timestamp: Jul 5, 2008 9:40:48 AM
Author: bennylp

= PJMEDIA MIPS Measurement =

This page attempts to show the typical performance characteristics of various PJMEDIA components, which could be useful for evaluating PJMEDIA performance. Please do not interpret these numbers as official or definitive performance figures, as there are many compilation flags in PJMEDIA as well as compiler switches that can be set to increase (or decrease) performance.

== Test Method ==

Each test measures the overall performance for both directions. For example, for resampling the test shows the total upsample and downsample time in a single test, and for codecs it shows the total encoding and decoding time.

The test program depends on the {{{CPU_MHZ}}} and MIPS values of the processor being set correctly at compile time. We used the MIPS information in the following links to estimate the MIPS value of each processor:
 - http://en.wikipedia.org/wiki/Million_instructions_per_second
 - http://en.wikipedia.org/wiki/ARM_architecture

To measure the MIPS score of a component, the program calculates the time needed to process 1 second worth of audio samples using that component, then calculates the MIPS score based on the configured MIPS value of the processor. Because of this, the calculated MIPS should not be interpreted as a ''real'' MIPS value: it is purely based on time measurement, our assumed MIPS value for the processor may be wrong, and on platforms where floating-point is available, floating-point instructions will be used instead of fixed-point ones.
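The time-to-score calculation described above can be sketched in C as below. This is a hypothetical illustration, not the code in {{{mips_test.c}}}: the callback type {{{frame_fn}}}, the example component {{{halve_gain()}}}, the helpers {{{time_one_second_usec()}}}, {{{cpu_percent()}}} and {{{mips_score()}}}, the millisecond framing, and the use of POSIX {{{clock_gettime()}}} (not available on Windows) are all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200112L /* for clock_gettime() */
#include <stdint.h>
#include <time.h>

/* Hypothetical component interface: process one frame of 16-bit PCM in place. */
typedef void (*frame_fn)(int16_t *frame, unsigned samples_per_frame);

/* Hypothetical example component: halve the signal level. */
static void halve_gain(int16_t *frame, unsigned samples_per_frame)
{
    for (unsigned i = 0; i < samples_per_frame; ++i)
        frame[i] = (int16_t)(frame[i] / 2);
}

/* Time how long `fn` needs to process 1 second worth of audio at `clock_rate`
 * Hz, fed in frames of `frame_ms` milliseconds. Returns elapsed usec, or -1.0
 * if frame_ms does not divide 1000 or the frame exceeds the scratch buffer. */
static double time_one_second_usec(frame_fn fn, unsigned clock_rate, unsigned frame_ms)
{
    static int16_t frame[960];                    /* 20 ms at 48 kHz */
    if (frame_ms == 0 || 1000 % frame_ms != 0)
        return -1.0;
    unsigned spf = clock_rate * frame_ms / 1000;  /* samples per frame */
    if (spf == 0 || spf > sizeof(frame) / sizeof(frame[0]))
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < 1000 / frame_ms; ++i)  /* frames in 1 second */
        fn(frame, spf);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) * 1e6 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
}

/* Share of one CPU needed to run the component in real time. */
static double cpu_percent(double elapsed_usec)
{
    return elapsed_usec / 1e6 * 100.0;
}

/* Time-based MIPS score: the fraction of a second used, scaled by the
 * processor's assumed MIPS rating (not a real instruction count). */
static double mips_score(double elapsed_usec, double assumed_mips)
{
    return elapsed_usec / 1e6 * assumed_mips;
}
```

For example, the Pentium 4 row "8KHz conference bridge with 1 call" (337 usec with 8102 assumed MIPS) gives 0.0337% CPU and about 2.73 MIPS, which matches the reported values after rounding.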

The test uses only a single thread.

All the results below were obtained with the default settings that come with the PJSIP distribution. The test source code is available in the '''pjmedia/src/test''' directory ({{{mips_test.c}}} file).


== Interpreting the Results ==
     

 '''Clock Rate:''' ::
   This shows the sampling rate of the component being tested, in KHz. We test both 8KHz and 16KHz. Most components can work at both 8KHz and 16KHz, hence there will be two test result rows for the same component, each with a different clock rate. Some components (mostly codecs) can only work at one of the clock rates (e.g. GSM is only shown at 8KHz, while G.722 is only shown at 16KHz), hence there will only be one test result row for these components.

 '''Time (usec):''' ::
     

 '''CPU (%):''' ::
   This shows how much CPU time (in percent) this component will consume when it runs in real-time. The value is derived from the time measurement above. For example, if the time elapsed is 1 second, then this component will take 100% of the CPU time when run in real-time; if the time elapsed is 0.5 second, it will take 50% of the CPU time.

   The CPU percentage may be larger than 100% if the time taken to process 1 second worth of audio samples is more than 1 second. This may happen when the test is performed on a slower processor.

 '''MIPS:''' ::
   The MIPS (Million Instructions Per Second) score roughly indicates how many million instructions this component executes per second when it runs in real-time. The value is derived from the time measurement above and calculated based on the assumed MIPS value of the processor. Once again, the score may be incorrect for many reasons, so it should not be interpreted as an official/definitive score; in particular, one MUST NOT use the MIPS score to compare the performance of different processor families/architectures.


=== Rows ===

The rows show the measurement results of particular components. The components tested are described below.

 '''get from memplayer:''' ::
     

 '''conference bridge with N call(s):''' ::
   This measures the performance of the conference bridge with N calls. Note that we don't use actual calls for the test, since we only want to measure the conference bridge performance and not codec performance (which is measured in separate tests). So for this test we use a memplayer for each "call" to supply audio to the bridge. During the test, all the calls (ports) are connected to port zero and port zero is connected to all calls. No connections among calls are created.
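The connection pattern used in this test (every call port connected to port zero, and port zero connected to every call port) can be sketched as one mixing step in C. This is a simplified, hypothetical illustration rather than the pjmedia conference bridge code: {{{conf_star_tick()}}} and {{{clip16()}}} are made-up names, and the real bridge also deals with things like per-port level adjustment and resampling, which are omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturate a 32-bit sum back into the 16-bit PCM range. */
static int16_t clip16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One frame of a star-shaped mix (hypothetical): port 0 hears the sum of all
 * call ports 1..nports-1, every call port hears port 0 only, and calls are
 * not connected to each other. in[p] and out[p] each hold `spf` samples. */
static void conf_star_tick(int16_t *const *in, int16_t *const *out,
                           size_t nports, size_t spf)
{
    for (size_t s = 0; s < spf; ++s) {
        int32_t mix = 0;
        for (size_t p = 1; p < nports; ++p)
            mix += in[p][s];
        out[0][s] = clip16(mix);
        for (size_t p = 1; p < nports; ++p)
            out[p][s] = in[0][s];
    }
}
```

In this sketch the per-frame work grows linearly with the number of calls, which is consistent with the roughly linear growth of the conference bridge rows in the results.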

 '''upsample+downsample:''' ::
   This measures the performance of the resampling algorithm used. The test gets the audio from the memplayer, upsamples it to twice the clock rate, then downsamples it by half again so that the clock rate is the same as the original. This test measures both linear and non-linear resampling using a small filter and a large filter. Some resampling backend algorithms may not support selecting between linear/non-linear and small/large filters; in that case the results will be equal for all settings.
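The linear variant of the upsample+downsample round trip can be sketched as follows. This is an illustrative assumption, not the pjmedia resampling backend: it upsamples by 2 with linear interpolation and downsamples by 2 by simply dropping every other sample, with no anti-aliasing filter, whereas the small and large filter settings apply filters that are not shown here. {{{upsample2_linear()}}} and {{{downsample2_drop()}}} are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Upsample by 2 (hypothetical): keep each input sample and insert the
 * midpoint between it and the next one; the last sample is held.
 * `out` must hold 2*n samples. */
static void upsample2_linear(const int16_t *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; ++i) {
        int32_t next = (i + 1 < n) ? in[i + 1] : in[i];
        out[2 * i] = in[i];
        out[2 * i + 1] = (int16_t)((in[i] + next) / 2);
    }
}

/* Downsample by 2 by keeping every other sample (no anti-aliasing filter).
 * `in` must hold 2*n_out samples. */
static void downsample2_drop(const int16_t *in, size_t n_out, int16_t *out)
{
    for (size_t i = 0; i < n_out; ++i)
        out[i] = in[2 * i];
}
```

Because the even output samples of the upsampler are copies of the input, this round trip returns the original samples exactly. The filtered settings spend far more computation per sample, which matches the large gap between the "linear" and "small/large filter" rows in the results.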

 '''WSOLA PLC - N% loss:''' ::
   This measures the performance of the Waveform Similarity based Overlap and Add (WSOLA) algorithm when it is used to generate/emulate lost packets (a.k.a. Packet Loss Concealment/PLC). Timing for various loss percentages is shown.

   The WSOLA algorithm is used by both the delay buffer and the PLC algorithm in pjmedia. The delay buffer itself is used by the splitcomb, sound port, and the conference bridge to adapt to audio bursts and clock drift.
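The two building blocks of WSOLA, a similarity search that finds where a template best matches earlier audio and an overlap-add that cross-fades two segments, can be sketched in C as below. This is a hypothetical, simplified sketch and not the pjmedia WSOLA implementation: it uses plain (unnormalized) cross-correlation as the similarity measure and a linear cross-fade window, and the names {{{wsola_best_offset()}}} and {{{overlap_add()}}} are made up.

```c
#include <stdint.h>

/* Return the shift (0..max_shift) at which `len` samples of `search` best
 * match the template `ref`, scored by plain cross-correlation (hypothetical).
 * `search` must hold at least len + max_shift samples. */
static int wsola_best_offset(const int16_t *ref, const int16_t *search,
                             int len, int max_shift)
{
    int best = 0;
    int64_t best_corr = INT64_MIN;
    for (int off = 0; off <= max_shift; ++off) {
        int64_t corr = 0;
        for (int i = 0; i < len; ++i)
            corr += (int64_t)ref[i] * search[off + i];
        if (corr > best_corr) {   /* strict '>' keeps the earliest best match */
            best_corr = corr;
            best = off;
        }
    }
    return best;
}

/* Cross-fade `len` samples: `a` fades out while `b` fades in, using a linear
 * window whose two weights always sum to `len`. */
static void overlap_add(const int16_t *a, const int16_t *b, int16_t *out, int len)
{
    for (int i = 0; i < len; ++i) {
        int32_t wa = len - i, wb = i;
        out[i] = (int16_t)((a[i] * wa + b[i] * wb) / len);
    }
}
```

To conceal a lost frame, a WSOLA-style algorithm would typically use the most recent audio as the template, search the history for a similar segment, and continue playback from what followed that segment, cross-fading at the joins.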

 '''WSOLA discard N% excess:''' ::
   This measures the performance of the WSOLA algorithm when it is used to discard excess audio samples (e.g. caused by clock drift). Timing for various excess percentages is shown.

 '''echo canceller Nms tail len:''' ::
   This measures the performance of the acoustic echo canceller (AEC) for various echo tail lengths. The audio source is taken from the memplayer, and there is no acoustic delay in the AEC input.

 '''tone generator with single/dual freq:''' ::
   This measures the performance of the tone generator when continuously generating a single or dual frequency tone for 1 second.
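A single or dual frequency tone can be generated as sketched below. This is an illustrative assumption, not the pjmedia tone generator: {{{gen_tone()}}} and {{{approx_sin()}}} are hypothetical names, and a short polynomial sine approximation is used so that the sketch does not depend on the floating-point math library; the real generator may use a different method.

```c
#include <stddef.h>
#include <stdint.h>

#define TONE_PI     3.14159265358979323846
#define TONE_TWO_PI 6.28318530717958647692

/* sin(x) for x in [-PI, PI]: fold into [-PI/2, PI/2], then evaluate a
 * 7th-order Taylor polynomial (max error about 1.6e-4). */
static double approx_sin(double x)
{
    if (x > TONE_PI / 2)
        x = TONE_PI - x;
    else if (x < -TONE_PI / 2)
        x = -TONE_PI - x;
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}

/* Fill out[0..n) with a single (f2 == 0) or dual frequency tone at
 * `clock_rate` Hz (hypothetical helper). For dual tones each component gets
 * half the amplitude. Assumes f1 and f2 are below clock_rate / 2. */
static void gen_tone(int16_t *out, size_t n, unsigned clock_rate,
                     double f1, double f2, int16_t amplitude)
{
    double ph1 = 0.0, ph2 = 0.0;
    double d1 = TONE_TWO_PI * f1 / clock_rate;
    double d2 = TONE_TWO_PI * f2 / clock_rate;
    double scale = (f2 > 0.0) ? amplitude / 2.0 : amplitude;

    for (size_t i = 0; i < n; ++i) {
        double v = approx_sin(ph1);
        if (f2 > 0.0)
            v += approx_sin(ph2);
        out[i] = (int16_t)(v * scale);
        /* advance the phase accumulators, keeping them in [-PI, PI) */
        ph1 += d1; if (ph1 >= TONE_PI) ph1 -= TONE_TWO_PI;
        ph2 += d2; if (ph2 >= TONE_PI) ph2 -= TONE_TWO_PI;
    }
}
```

For example, {{{gen_tone(buf, 8000, 8000, 697.0, 1209.0, 10000)}}} fills {{{buf}}} with 1 second of the DTMF digit "1" at 8KHz.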

 '''codec encode/decode:''' ::
   This measures the time to encode and then decode 1 second worth of audio samples using the specified codec.
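To give an idea of what a single encode/decode round trip involves, the sketch below implements G.711 μ-law companding for one 16-bit sample, following the classic segment/mantissa formulation (bias 0x84, clip at 32635). It is a hypothetical illustration: {{{ulaw_encode()}}} and {{{ulaw_decode()}}} are made-up names, and pjmedia's G.711 codec may use lookup tables or A-law instead.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* added before the segment search, as in G.711 */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after biasing */

/* Compress one 16-bit linear PCM sample into an 8-bit mu-law code. */
static uint8_t ulaw_encode(int16_t sample)
{
    int sign = (sample < 0) ? 0x80 : 0x00;
    int mag = (sample < 0) ? -(int)sample : sample;
    if (mag > ULAW_CLIP)
        mag = ULAW_CLIP;
    mag += ULAW_BIAS;

    /* Segment (exponent) = position of the highest set bit among bits 14..7. */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;
    int mantissa = (mag >> (exponent + 3)) & 0x0F;

    /* G.711 transmits the bitwise complement of the code. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Expand an 8-bit mu-law code back into 16-bit linear PCM. */
static int16_t ulaw_decode(uint8_t code)
{
    code = (uint8_t)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int mag = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)(sign ? -mag : mag);
}
```

For example, the sample value 1000 is encoded as 0xCE and decoded back as 988, which shows the quantization error introduced by companding. A codec test in this style would apply the round trip to all 8000 samples of 1 second of 8KHz audio.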

 '''stream TX/RX:''' ::
   This test is intended to measure the performance/overhead of the stream, which consists of codec, RTP/RTCP processing, and de-jitter buffering. In addition, it also tests the performance of Secure RTP (SRTP) for various setting combinations and codec bandwidths. Since this test also includes codec processing (encoding and decoding), you need to subtract the result of the corresponding codec test from it to obtain the overhead of the stream and SRTP only.
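The subtraction described above is simple arithmetic; the hypothetical helper below (its name is made up for this example) just makes the step explicit, clamping small negative differences that timing noise could produce.

```c
/* Overhead of the stream alone (RTP/RTCP, jitter buffer and, if enabled,
 * SRTP): the stream TX/RX time minus the matching codec-only time, in usec.
 * Negative differences caused by timing noise are clamped to zero. */
static double stream_overhead_usec(double stream_usec, double codec_usec)
{
    double overhead = stream_usec - codec_usec;
    return (overhead > 0.0) ? overhead : 0.0;
}
```

Using the Pentium 4 results on this page, the 8KHz G.711 stream takes 298 usec and the G.711 codec alone 135 usec, so the stream overhead is about 163 usec; with "SRTP 80bit +auth" (1066 usec) the combined stream and SRTP overhead is about 931 usec.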


     
16KHz stream TX/RX - G.722                      20959    2.096   39.73
}}}


=== PJSIP-0.9.0, Windows XP, Pentium 4, Visual Studio 2005 ===

||Hardware:||HP PC||
||Platform:||Windows XP SP2||
||Processor:||Pentium 4 (single core, no Hyper-Threading)||
||Speed:||2.6 GHz||
||Assumed MIPS:|| 8102 MIPS||
||BogoMIPS:|| - ||
||Compilation:|| Default Release settings (/O2) ||
||Compiler:|| Visual Studio 2005 ||

Result:

{{{
09:46:14.571 os_core_win32. pjlib 0.9.0-trunk for win32 initialized
MIPS test, with CPU=2666Mhz, 8102.0 MIPS
Clock  Item                                      Time     CPU    MIPS
 Rate                                           (usec)    (%)
----------------------------------------------------------------------
 8KHz get from memplayer                           11    0.001    0.09
 8KHz conference bridge with 1 call               337    0.034    2.73
 8KHz conference bridge with 2 calls              512    0.051    4.15
 8KHz conference bridge with 4 calls              919    0.092    7.45
 8KHz conference bridge with 8 calls             1658    0.166   13.43
 8KHz conference bridge with 16 calls            3180    0.318   25.76
 8KHz upsample+downsample - linear                288    0.029    2.33
 8KHz upsample+downsample - small filter         7822    0.782   63.37
 8KHz upsample+downsample - large filter        38386    3.839  311.00
 8KHz WSOLA PLC - 0% loss                          53    0.005    0.43
 8KHz WSOLA PLC - 2% loss                          61    0.006    0.49
 8KHz WSOLA PLC - 5% loss                         103    0.010    0.83
 8KHz WSOLA PLC - 10% loss                        152    0.015    1.23
 8KHz WSOLA PLC - 20% loss                        195    0.020    1.58
 8KHz WSOLA PLC - 50% loss                        520    0.052    4.21
 8KHz WSOLA discard 2% excess                       8    0.001    0.06
 8KHz WSOLA discard 5% excess                      27    0.003    0.22
 8KHz WSOLA discard 10% excess                     74    0.007    0.60
 8KHz WSOLA discard 20% excess                    117    0.012    0.95
 8KHz WSOLA discard 50% excess                    370    0.037    3.00
 8KHz echo canceller 100ms tail len             20945    2.095  169.70
 8KHz echo canceller 128ms tail len             20484    2.048  165.96
 8KHz echo canceller 200ms tail len             21017    2.102  170.28
 8KHz echo canceller 256ms tail len             21562    2.156  174.69
 8KHz echo canceller 400ms tail len             23030    2.303  186.59
 8KHz echo canceller 500ms tail len             24102    2.410  195.27
 8KHz echo canceller 512ms tail len             24441    2.444  198.02
 8KHz echo canceller 600ms tail len             25380    2.538  205.63
 8KHz echo canceller 800ms tail len             28751    2.875  232.94
 8KHz tone generator with single freq              84    0.008    0.68
 8KHz tone generator with dual freq               125    0.013    1.01
 8KHz codec encode/decode - G.711                 135    0.014    1.09
 8KHz codec encode/decode - GSM                  6898    0.690   55.89
 8KHz codec encode/decode - iLBC                39783    3.978  322.32
 8KHz codec encode/decode - Speex 8Khz          24543    2.454  198.85
 8KHz codec encode/decode - L16/8000/1            161    0.016    1.30
 8KHz stream TX/RX - G.711                        298    0.030    2.41
 8KHz stream TX/RX - G.711 SRTP 32bit             633    0.063    5.13
 8KHz stream TX/RX - G.711 SRTP 32bit +auth      1063    0.106    8.61
 8KHz stream TX/RX - G.711 SRTP 80bit             634    0.063    5.14
 8KHz stream TX/RX - G.711 SRTP 80bit +auth      1066    0.107    8.64
 8KHz stream TX/RX - GSM                         7182    0.718   58.19
 8KHz stream TX/RX - GSM SRTP 32bit              7353    0.735   59.57
 8KHz stream TX/RX - GSM SRTP 32bit + auth       7693    0.769   62.33
 8KHz stream TX/RX - GSM SRTP 80bit              7313    0.731   59.25
 8KHz stream TX/RX - GSM SRTP 80bit + auth       7673    0.767   62.17
16KHz get from memplayer                            8    0.001    0.06
16KHz conference bridge with 1 call               592    0.059    4.80
16KHz conference bridge with 2 calls              907    0.091    7.35
16KHz conference bridge with 4 calls             1620    0.162   13.13
16KHz conference bridge with 8 calls             3055    0.306   24.75
16KHz conference bridge with 16 calls            5799    0.580   46.98
16KHz upsample+downsample - linear                560    0.056    4.54
16KHz upsample+downsample - small filter        15505    1.551  125.62
16KHz upsample+downsample - large filter        76944    7.694  623.40
16KHz WSOLA PLC - 0% loss                          52    0.005    0.42
16KHz WSOLA PLC - 2% loss                         263    0.026    2.13
16KHz WSOLA PLC - 5% loss                         113    0.011    0.92
16KHz WSOLA PLC - 10% loss                        383    0.038    3.10
16KHz WSOLA PLC - 20% loss                        742    0.074    6.01
16KHz WSOLA PLC - 50% loss                       1757    0.176   14.24
16KHz WSOLA discard 2% excess                       9    0.001    0.07
16KHz WSOLA discard 5% excess                      69    0.007    0.56
16KHz WSOLA discard 10% excess                    220    0.022    1.78
16KHz WSOLA discard 20% excess                    403    0.040    3.27
16KHz WSOLA discard 50% excess                   1301    0.130   10.54
16KHz echo canceller 100ms tail len             42084    4.208  340.96
16KHz echo canceller 128ms tail len             42697    4.270  345.93
16KHz echo canceller 200ms tail len             43782    4.378  354.72
16KHz echo canceller 256ms tail len             45008    4.501  364.65
16KHz echo canceller 400ms tail len             49519    4.952  401.20
16KHz echo canceller 500ms tail len             51945    5.194  420.86
16KHz echo canceller 512ms tail len             52492    5.249  425.29
16KHz echo canceller 600ms tail len             54984    5.498  445.48
16KHz echo canceller 800ms tail len             60065    6.006  486.65
16KHz tone generator with single freq             161    0.016    1.30
16KHz tone generator with dual freq               239    0.024    1.94
16KHz codec encode/decode - G.722                9354    0.935   75.79
16KHz codec encode/decode - Speex 16Khz         51086    5.109  413.90
16KHz codec encode/decode - L16/16000/1           304    0.030    2.46
16KHz stream TX/RX - G.722                       9570    0.957   77.54
}}}
