Problem description:

I need to process many lines from a database (possibly millions) in parallel in C#. The processing is quite quick (roughly 50 to 150 ms per line), but I cannot know the actual speed before runtime because it depends on the hardware and network.

The ThreadPool or the newer Task Parallel Library (TPL) seems to fit my needs, as I am new to threading and want the most efficient way to process the data.

However, these methods do not provide a way to control the execution speed of my tasks (lines/minute): I want to be able to set a maximum speed limit for the processing, or run it at full speed.

Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread speed'.

Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement it.

Furthermore, I'm worried about the efficiency cost of such a setup.

Could you suggest a way, or give some advice, on how to achieve this?

Thanks in advance for your answers.

Answer:

The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.

If you wish to throttle the total processing speed, there's nothing built-in that would support that.

You can measure the total processing speed as you proceed through the file and regulate speed by introducing (non-spinning) delays in each thread. The size of the delay can be dynamically adjusted in your code based on observed processing speed.
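As a minimal sketch of the delay-based approach described above: the `RateLimiter` class, its `WaitForTurn` method, and the `maxItemsPerMinute` parameter are hypothetical names invented for illustration, not part of the TPL. Each worker compares the elapsed time with the pace allowed by the limit and sleeps (without spinning) when the overall processing is running ahead. Because the limiter is shared by all threads, the cap can sit below the speed of a single thread.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the TPL): caps overall throughput at a
// given number of items per minute, shared by all worker threads.
public sealed class RateLimiter
{
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private readonly double _msPerItem;
    private long _started; // number of items admitted so far

    public RateLimiter(double maxItemsPerMinute)
    {
        _msPerItem = 60000.0 / maxItemsPerMinute;
    }

    // Call before processing each line. Blocks with Thread.Sleep (no spinning)
    // when the observed pace is ahead of the allowed pace.
    public void WaitForTurn()
    {
        long index = Interlocked.Increment(ref _started) - 1;
        double dueMs = index * _msPerItem; // earliest allowed start time for this item
        double aheadMs = dueMs - _clock.Elapsed.TotalMilliseconds;
        if (aheadMs > 0)
            Thread.Sleep(TimeSpan.FromMilliseconds(aheadMs));
    }
}

public static class ThrottleDemo
{
    public static void Main()
    {
        var lines = new[] { "a", "b", "c", "d", "e", "f" };
        var limiter = new RateLimiter(maxItemsPerMinute: 600); // one line per 100 ms
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        var watch = Stopwatch.StartNew();
        Parallel.ForEach(lines, options, line =>
        {
            limiter.WaitForTurn();
            // The real per-line work would go here.
        });
        Console.WriteLine($"Processed {lines.Length} lines in {watch.ElapsedMilliseconds} ms");
    }
}
```

To run at full speed, skip the call to `WaitForTurn` or pass a very large limit. Since sleeping blocks a pool thread, it is probably worth combining this with a bounded `MaxDegreeOfParallelism` so the pool does not keep adding threads that would only wait.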

Answer:

I do not see the advantage of limiting the speed, but I suggest you look into limiting the maximum degree of parallelism of the operation. That can be done via the MaxDegreeOfParallelism property of the ParallelOptions passed to Parallel.ForEach as the code works over the disparate lines of data. That way you control the number of slots, for lack of a better term, which can be increased or decreased depending on the criteria you are working under.

Here is an example that uses a ConcurrentBag to collect results from lines of disparate data, processed by at most 2 parallel tasks.

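   // Requires: using System; using System.Collections.Concurrent;
   //           using System.Collections.Generic; using System.Threading.Tasks;
   var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };

   // Thread-safe collection for results added from parallel iterations.
   var stringResult = new ConcurrentBag<string>();

   // Run at most 2 iterations concurrently.
   var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 2 };

   Parallel.ForEach( myLines, parallelOptions, line =>
   {
      if (line.Contains( "e" ))
         stringResult.Add( line );
   } );

   Console.WriteLine( string.Join( " | ", stringResult ) );
   // Outputs "Beta | Omega" or "Omega | Beta" (ConcurrentBag does not guarantee order)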

Note that ParallelOptions also has a TaskScheduler property, which you can use to refine how the work is scheduled. Finally, for more control, you may want to cancel the processing when a specific threshold is reached. If so, look into the CancellationToken property to exit the loop early.
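The cancellation idea could look roughly like the sketch below. The `CancelDemo` class, the `ProcessUntil` method, and the threshold of 1000 items are assumptions made for illustration; only `ParallelOptions.CancellationToken`, `CancellationTokenSource`, and `Parallel.ForEach` come from the framework. Note that iterations already in flight still finish, so the final count may slightly exceed the threshold.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancelDemo
{
    // Processes items until roughly `threshold` have been handled, then cancels.
    // Returns the number of items actually processed.
    public static int ProcessUntil(int threshold)
    {
        int processed = 0;
        using (var cts = new CancellationTokenSource())
        {
            var options = new ParallelOptions
            {
                MaxDegreeOfParallelism = 2,
                CancellationToken = cts.Token
            };
            try
            {
                Parallel.ForEach(Enumerable.Range(0, 100000), options, item =>
                {
                    // The real per-line work would go here.
                    if (Interlocked.Increment(ref processed) >= threshold)
                        cts.Cancel(); // ask the loop to stop starting new iterations
                });
            }
            catch (OperationCanceledException)
            {
                // Expected: Parallel.ForEach throws once the token is cancelled.
            }
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine($"Stopped after about {ProcessUntil(1000)} items");
    }
}
```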
