问题描述:

I have relatively small data set (circa 5mb) which I need to download (paid subscription) and sort on a monthly basis. I am hoping that someone can indicate a more efficient way to tackle this task.

The data is a record of planning applications within the last 30 days and whist the data is very informative, it is not organised very well - i.e. they do not provide the use type (e.g. residential, commercial, industrial etc.) nor do they provide the number of units. But they do provide a title for the application as well as a description.

An example title as follows:

STACKVILLE - 1680 APARTMENTS/RESTAURANT/RETAIL UNITS

An example description as follows:

Independent living scheme for older persons comprising 1100 one and 580 two bedroom apartments with support facilities a restaurant and bar retail units a cafe 20 reablement suites parking access landscaping and demolition works. 1 568 sq.m

At present, I am using Excel which is somewhat time consuming. I have an ever-growing list of circa 100 use classes, as I need to determine the exact usage, which I search the title and description for. If both the title and description return the same value then I am reasonably certain that is the use classification, if it is not found it is unlikely to be of interest - although I am not 100% certain my method catches everything that I am asking it to. If they return different values then I have to do a bit of manual sift. The formula I am using as follows:

=IFERROR(INDEX(Use_Classifications!$A$1:$A$100,MATCH(TRUE,INDEX(ISNUMBER(SEARCH(Use_Classifications!$A$1:$A$100,D2)),,),0)),"Not found")

After I have determined them I then need to determine the number of units and put this figure into a column to the right of the data for analysis.

Can anyone recommend a more efficient way to do this? I'm currently learning Python if that helps. Would also be willing to learn almost anything to speed this up as it's very time consuming.

相关阅读:
Top