使用System.Diagnostics.Stopwatch对程序的运行时间精确地测量


背景

  现代CPU有多个内核,大容量高速缓存,指令管道和许多其他的东西影响特定测试场景一个算法的运行时间。白盒测试技术-如附加的调试器或者分析器-关闭 CPU的高速缓存线路,管道等。真正的运行时间是隐藏的,这样这些现代超标量处理器优化的计算方法执行速度要比使用分析器的没有优化的还要慢(由于更多的指令)。黑盒测试没有附加的调试器或分析器(运行时间测量),能发现算法的实际性能,并完成了算法的性能分析。

  设置测试方案

  最重要的是防止CPU内核或处理器之间的切换。对性能测试有很大的影响。这可以通过设置进程的ProcessorAffinity来实现:

  Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Use only the second core 

  为了能独占CPU内核,我们必须防止其他线程可以使用此CPU内核。我们设置进程和线程的优先级,来实现这一目标:

  Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

  Thread.CurrentThread.Priority = ThreadPriority.Highest;

  最后,但并非不重要的,在我们的测试需要热身阶段。在我的系统中,1000-1500毫秒热身阶段之后结果是稳定的。我们可以用stopwatch自己来控制热身(这里至少1200mS):

stopwatch.Start();
while (stopwatch.ElapsedMilliseconds < 1200)
{
    result = TestFunction(seed, count); 
}
stopwatch.Stop(); 

  下面是完整的示例:

using System;
using System.Diagnostics;
using System.Threading;
namespace PreciseMeasure
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch stopwatch = new Stopwatch();
            long seed = Environment.TickCount;     // Prevents the JIT Compiler 
                    // from optimizing Fkt calls away
            long result = 0;
            int count = 100000000;
            Console.WriteLine("20 Tests without correct preparation"); 
            Console.WriteLine("Warmup");
            for (int repeat = 0; repeat < 20; ++repeat)
            {
                stopwatch.Reset();
                stopwatch.Start();
                result ^= TestFunction(seed, count);
                stopwatch.Stop();
                Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks + 
                " mS: " +stopwatch.ElapsedMilliseconds);
            }
            // Uses the second Core or Processor for the Test
            Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); 
            // Prevents "Normal" processes 
            // from interrupting Threads
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;     
            // Prevents "Normal" Threads 
            Thread.CurrentThread.Priority = ThreadPriority.Highest;                         
            // from interrupting this thread
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("20 Tests with correct preparation"); 
            Console.WriteLine("Warmup");
            stopwatch.Reset();
            stopwatch.Start();
            while (stopwatch.ElapsedMilliseconds < 1200)  // A Warmup of 1000-1500 mS 
                                                          // stabilizes the CPU cache and pipeline.
            {
                result = TestFunction(seed, count);       // Warmup
            }
            stopwatch.Stop();
            for (int repeat = 0; repeat < 20; ++repeat)
            {
                stopwatch.Reset();
                stopwatch.Start();
                result ^= TestFunction(seed, count);
                stopwatch.Stop();
                Console.WriteLine("Ticks: " + stopwatch.ElapsedTicks + 
                " mS: " + stopwatch.ElapsedMilliseconds);
            }
            Console.WriteLine(result); // prevents optimizations (current compilers are 
        // too silly to analyze the dataflow that deep, but we never know )
        }
        public static long TestFunction(long seed, int count)
        {
            long result = seed;
            for (int i = 0; i < count; ++i)
            {
                result ^= i ^ seed; // Some useless bit operations
            }
            return result;
        }
    }
}

结果:

  没有正确的准备

Ticks: 1580367 mS: 572 <-- highest Value
Ticks: 1577003 mS: 571
Ticks: 1576140 mS: 571
Ticks: 1560964 mS: 565
Ticks: 1351663 mS: 489
Ticks: 1248383 mS: 452
Ticks: 1115361 mS: 404
Ticks: 1112813 mS: 403
Ticks: 1113112 mS: 403
Ticks: 1112012 mS: 402 <-- lowest Value
Ticks: 1330444 mS: 482
Ticks: 1558493 mS: 564
Ticks: 1501097 mS: 543
Ticks: 1517796 mS: 549
Ticks: 1542712 mS: 558
Ticks: 1574959 mS: 570
Ticks: 1483975 mS: 537
Ticks: 1390578 mS: 503
Ticks: 1546904 mS: 560
Ticks: 1349507 mS: 488 

  运行时间在402ms572ms之间不等。存在170 mS 或者42%差距。很显然,这些结果是没有用的。

  正确的准备:

Ticks: 1110518 mS: 402
Ticks: 1110540 mS: 402
Ticks: 1110543 mS: 402
Ticks: 1110684 mS: 402 <-- highest Value
Ticks: 1110508 mS: 402
Ticks: 1110553 mS: 402
Ticks: 1110600 mS: 402
Ticks: 1110433 mS: 402 <-- lowest Value
Ticks: 1110509 mS: 402
Ticks: 1110508 mS: 402
Ticks: 1110489 mS: 402
Ticks: 1110568 mS: 402
Ticks: 1110503 mS: 402
Ticks: 1110566 mS: 402
Ticks: 1110625 mS: 402
Ticks: 1110474 mS: 402
Ticks: 1110571 mS: 402
Ticks: 1110448 mS: 402
Ticks: 1110555 mS: 402
Ticks: 1110495 mS: 402 


  有20个相同的结果:402 ms ,只能用tick(内部CPU性能计数器值)分辨。测试结果存在251tick或者0,02 %差距。在我的系统中,Stopwatch的频率是每秒2760029 tick。测试之间的运行差别只有0,09毫秒。这用于衡量和比较算法运行是非常好的。

  兴趣点:

  其中一个很重要的事情应该牢记。没有做准备的最好(最低)结果值还没有做了准备的最差结果值好。CPU的上下文和核心交换对应用程序运行会产生巨大影响


« 
» 
快速导航

Copyright © 2016 phpStudy | 豫ICP备2021030365号-3