Problem description:

I wrote a function (masking) with 3 inputs:

  1. inputOCL - an oclMat
  2. comparisonValue - a double value
  3. method - an int variable determining the comparison method

For my example I chose method=1, which stands for CMP_GT, testing if inputOCL>comparisonValue element-wise.

The purpose of the function is to zero out all the elements in inputOCL that don't satisfy the given comparison.

Here is the function masking:

void masking(cv::ocl::oclMat inputOCL, double comparisonValue, int method){
    // NOTE: method can be set to 0-->5, corresponding to (==, >, >=, <, <=, !=)
    cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
    valueOCL.setTo(cv::Scalar(comparisonValue));
    cv::ocl::oclMat logicalOCL;
    // compare() writes 255 where the comparison holds and 0 elsewhere
    cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
    logicalOCL.convertTo(logicalOCL, inputOCL.type());
    cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
    // rescale by 1/255 to undo the factor of 255 introduced by the mask
    cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
}
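For reference, the arithmetic that masking performs can be sketched on the CPU with plain C++. This is only an illustration under stated assumptions: maskingReference and compareElement are hypothetical names that do not appear in the post, a std::vector<double> stands in for the oclMat, and the method codes are assumed to follow OpenCV's cv::CMP_* values (0 for ==, up to 5 for !=).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the original post.
// Method codes are assumed to match cv::CMP_*:
// 0: ==, 1: >, 2: >=, 3: <, 4: <=, 5: !=
inline bool compareElement(double a, double b, int method) {
    assert(method >= 0 && method <= 5);
    switch (method) {
        case 0: return a == b;
        case 1: return a > b;
        case 2: return a >= b;
        case 3: return a < b;
        case 4: return a <= b;
        case 5: return a != b;
        default: return false;
    }
}

// Zero out every element that does not satisfy the comparison,
// mirroring the compare / multiply / rescale steps of masking().
void maskingReference(std::vector<double>& data, double comparisonValue, int method) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        // The comparison mask holds 255 for true and 0 for false.
        const double logical = compareElement(data[i], comparisonValue, method) ? 255.0 : 0.0;
        // Multiply by the mask, then scale by 1/255 to restore the value.
        data[i] = (logical * data[i]) * (1 / 255.0);
    }
}
```

With method=1 and a comparison value of 1.5, the input {1.0, 2.0, 3.0} keeps its last two elements (up to floating-point rounding) and the first becomes 0.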

When I time the code below, I see a very large difference in runtime between calling the function and performing the same computation directly in main:

int main(int argc, char** argv){
    double value1 = 1.23456789012345;
    double value2 = 1.23456789012344;

    // initialize matrix
    cv::Mat I(5000, 5000, CV_64F, cv::Scalar(value1));

    // copy input to GPU
    cv::ocl::oclMat inputOCL(I);

    int method = 1;
    static double start_TIMER;

    // computation done in function
    start_TIMER = cv::getTickCount();
    masking(inputOCL, value2, method);
    std::cout << "\nFunction runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";

    // direct computation
    start_TIMER = cv::getTickCount();
    cv::ocl::oclMat valueOCL(inputOCL.size(), inputOCL.type());
    valueOCL.setTo(cv::Scalar(value2));
    cv::ocl::oclMat logicalOCL;
    cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
    logicalOCL.convertTo(logicalOCL, inputOCL.type());
    cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
    cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);
    std::cout << "\nDirect runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";
}

The runtimes can be seen in this screenshot:

Why is there such a large difference in runtimes?

Answer:

I want to thank asarsakov, who brought the matter of destructing oclMats to my attention, and DarkZeros, who noted that I forgot to destruct the second temporary oclMat in the function.

However, that is not the entire solution. It seems the only way I can get identical 'direct' and 'function' runtimes is to pass the oclMats by reference (cv::ocl::oclMat&) instead of by value (cv::ocl::oclMat).
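The scoping effect behind this can be pictured with a small sketch in plain C++. It is an illustration only: TrackedBuffer is a hypothetical stand-in that merely counts destructions, not the real cv::ocl::oclMat, and it does not model the actual cost of releasing device memory. It shows that temporaries declared inside a function are destroyed before the function returns (so before the caller can stop its timer), while buffers owned by the caller and passed by reference survive the call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device matrix; NOT the real cv::ocl::oclMat.
// It only counts how many buffers have been destroyed so far.
struct TrackedBuffer {
    static int released;
    std::vector<double> data;
    explicit TrackedBuffer(std::size_t n) : data(n, 0.0) {}
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;
    ~TrackedBuffer() { ++released; }
};
int TrackedBuffer::released = 0;

// Temporaries live inside the function: both are destroyed before it
// returns, so a timer stopped after the call includes their cleanup.
void maskWithLocalTemporaries(std::size_t n) {
    TrackedBuffer value(n);
    TrackedBuffer logical(n);
    value.data.assign(n, 1.0);
    logical.data.assign(n, 255.0);
}

// Buffers are owned by the caller and passed by reference: nothing is
// destroyed during the call, so cleanup happens when the caller decides.
void maskWithCallerBuffers(TrackedBuffer& value, TrackedBuffer& logical) {
    assert(value.data.size() == logical.data.size());
    value.data.assign(value.data.size(), 1.0);
    logical.data.assign(logical.data.size(), 255.0);
}
```

Calling maskWithLocalTemporaries raises TrackedBuffer::released by two before control returns to the caller, whereas maskWithCallerBuffers leaves the counter unchanged until the caller's own buffers go out of scope.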

See the following code (the entire code, function and all) for the final solution that yields identical results. The boolean variables at the beginning of main select the computation (direct or via the function) and whether the temporary oclMats are released inside the timed region.

#include <cstdio>    // printf
#include <iostream>  // std::cout
#include <conio.h>   // _getch (Windows-only)
#include "opencv2/ocl/ocl.hpp"

void masking(cv::ocl::oclMat &inputOCL, cv::ocl::oclMat &valueOCL, cv::ocl::oclMat &logicalOCL, double comparisonValue, int method){
// NOTE: the method input is 0-->5, corresponding to (==, >, >=, <, <=, !=)
valueOCL.setTo(cv::Scalar(comparisonValue));
cv::ocl::compare(inputOCL, valueOCL, logicalOCL, method);
logicalOCL.convertTo(logicalOCL, inputOCL.type());
cv::ocl::multiply(logicalOCL, inputOCL, inputOCL);
cv::ocl::multiply(1 / 255.0, inputOCL, inputOCL);   
}

int main(int argc, char** argv){

bool direct  = true; // true for direct, false for function
bool release = true; // true to release the temporary oclMats inside the timer, false to skip it

// initialize data  
int method = 1;
static double start_TIMER;
double value1 = 1.23456789012345;
double value2 = 1.23456789012344;
cv::Mat I(5000, 5000, CV_64F, cv::Scalar(value1));

if (direct){
    // direct computation
    cv::ocl::oclMat inputOCL1(I);
    cv::ocl::oclMat valueOCL1(inputOCL1.size(), inputOCL1.type());
    cv::ocl::oclMat logicalOCL1;
    start_TIMER = cv::getTickCount();
    valueOCL1.setTo(cv::Scalar(value2));
    cv::ocl::compare(inputOCL1, valueOCL1, logicalOCL1, method);
    logicalOCL1.convertTo(logicalOCL1, inputOCL1.type());
    cv::ocl::multiply(logicalOCL1, inputOCL1, inputOCL1);
    cv::ocl::multiply(1 / 255.0, inputOCL1, inputOCL1);     
    if (release){ valueOCL1.release(); logicalOCL1.release(); }
    std::cout << "\nDirect runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";
}

if (!direct){
    // computation done in function
    cv::ocl::oclMat inputOCL2(I);
    cv::ocl::oclMat valueOCL2(inputOCL2.size(), inputOCL2.type());
    cv::ocl::oclMat logicalOCL2;
    start_TIMER = cv::getTickCount();
    masking(inputOCL2, valueOCL2, logicalOCL2, value2, method);     
    if (release){ valueOCL2.release(); logicalOCL2.release(); }
    std::cout << "\nFunction runtime = " << ((double)(cv::getTickCount() - start_TIMER)) / cv::getTickFrequency() << " Seconds\n";      
}

printf("\nPress any key to exit...");
_getch();
return 0;
}