Problem description:

I'm trying to use multiprocessing.Pool to compare images by similarity. I have code that works on a single core, using a for loop or map() to iterate over the data, but it is dreadfully slow on large groups of images. For that reason I've been trying to switch to multiprocessing, but I can't seem to get it right. My main question is: why doesn't getssim() in the code below change the list?

The structure of the iterable looks something like this:

[[("images/000.jpg",np.ndarray),0.923],...]

Here the float is the similarity index of an image compared to the current image being tested. This is the (somewhat abbreviated) non-working code:

import os
import cv2
import glob
from skimage.measure import structural_similarity as ssim
import operator
import multiprocessing

def makeSimilarList(imagesdata):
    simImgList = [] #list of images ordered by their similarity
    while(imagesdata):
        simImg = findSimilar(imagesdata)
        simImgList.append(os.path.basename(simImg))
    return simImgList

def getssim(imgd):
    similarityIndex = ssim(img1,imgd[0][1])
    print(similarityIndex) #this prints correctly
    imgd[1] = similarityIndex
    return imgd #this appears to have no effect

def findSimilar(imagesdata):
    limg = imagesdata.pop()
    global img1 #making img1 accessible to getssim, a bad idea!
    img1 = limg[0][1]
    p = multiprocessing.Pool(processes=multiprocessing.cpu_count(),maxtasksperchild=2)
    p.map(getssim,imagesdata)
    p.close()
    p.join()
    imagesdata.sort(key=operator.itemgetter(1))
    return limg[0][0] #return name of image

images = [f for f in glob.glob(src + "*." + ftype)]
images.reverse()
imagesdata = [[(f,cv2.imread(f,0)),""] for f in images]
finalList = makeSimilarList(imagesdata)
with open("./simlist.txt", 'w') as f:
    f.write('\n'.join(finalList))

Thanks for the help!!

Answer:

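You forgot to use the value returned by p.map. Each worker process receives a pickled copy of the items it handles, so the assignment imgd[1] = similarityIndex inside getssim changes only the worker's copy; the updated items reach the parent process solely through the list that p.map returns. The key function should probably read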

def findSimilar(imagesdata):
    limg = imagesdata.pop()
    global img1  # making img1 accessible to getssim, a bad idea!
    img1 = limg[0][1]
    p = multiprocessing.Pool(maxtasksperchild=2)
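    # Slice assignment writes the results into the caller's list object;
    # a plain `imagesdata = ...` would only rebind the local name.
    imagesdata[:] = p.map(getssim, imagesdata)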
    p.close()
    p.join()
    imagesdata.sort(key=operator.itemgetter(1))
    return limg[0][0]  #return name of image

Since you don't give enough details, I could not test your code, but I think this was the crucial point.
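
For illustration, here is a minimal sketch of the same idea that also avoids the global. It is an assumption-laden stand-in, not your actual code: plain numbers take the place of the image arrays, a hypothetical score() function takes the place of ssim(), and rank_by_similarity() is a made-up name. The reference value is bound with functools.partial, so it is sent to the workers together with the function instead of relying on a global set at run time (which workers started with the spawn method, the default on Windows, would not see).

```python
import multiprocessing
from functools import partial


def score(reference, item):
    """Return a new [payload, score] pair; a higher score means more similar.

    Hypothetical stand-in for ssim(): the "images" here are plain numbers.
    """
    (name, value), _old_score = item
    return [(name, value), -abs(reference - value)]


def rank_by_similarity(reference, items):
    """Score every item against reference in worker processes, then sort."""
    with multiprocessing.Pool(processes=2) as pool:
        # Each worker gets a pickled copy of its item, so only the list
        # returned by map() carries the new scores back to the parent.
        scored = pool.map(partial(score, reference), items)
    # Slice assignment updates the caller's list object instead of
    # rebinding a local name.
    items[:] = scored
    items.sort(key=lambda pair: pair[1])
    return items


if __name__ == "__main__":
    data = [[("a.jpg", 10), ""], [("b.jpg", 3), ""], [("c.jpg", 7), ""]]
    rank_by_similarity(8, data)
    print(data)  # the most similar entry ends up last, ready for pop()
```

Because the list is sorted in ascending order of score, the next pop() in a loop like makeSimilarList would pick the entry most similar to the previous reference.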
