Problem description:

Here are my classes:

    public class XDetail
    {
        public string Name { get; set; }
        public int ID { get; set; }
    }

    public class X
    {
        public int XID { get; set; }
        public int ID { get; set; }
    }

The ID field is shared between the two classes and links X to XDetail (a one-to-many relationship); in reality X and XDetail are typed DataRows. I read in a file with the following LINQ query, which shapes each line into an anonymous type:

    var results = (from line in File.ReadAllLines(file)
                   select new
                   {
                       XID = int.Parse(line.Substring(0, 8).TrimStart('0')),
                       Name = line.Substring(8, 255).Trim()
                   }).ToList();
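A minimal sketch of the same fixed-width parsing, assuming (as the Substring offsets suggest) an 8-character zero-padded XID followed by a 255-character Name field. int.Parse already accepts leading zeros, so TrimStart('0') is not needed, and dropping it avoids a FormatException on an all-zero field, where trimming would leave an empty string. The ParseLine helper, the padding of short lines, and the inline sample lines are hypothetical additions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parses one fixed-width record laid out as
// [0..8) zero-padded XID, [8..263) space-padded Name.
static (int XID, string Name) ParseLine(string line)
{
    // Pad short lines (e.g. trailing spaces stripped by an editor) so that
    // Substring(8, 255) cannot run past the end of the string.
    string padded = line.PadRight(8 + 255);

    // int.Parse accepts leading zeros: "00000002" -> 2, "00000000" -> 0.
    int xid = int.Parse(padded.Substring(0, 8));
    string name = padded.Substring(8, 255).Trim();
    return (xid, name);
}

// In the real program these would come from File.ReadAllLines(file).
string[] lines = { "00000002Bob2", "00000004NewGuy" };

List<(int XID, string Name)> results = lines.Select(ParseLine).ToList();

foreach (var r in results)
    Console.WriteLine($"XID: {r.XID}, Name: {r.Name}");
```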

This data is checked against the existing X/XDetail rows to make the appropriate changes or add new records. I wrap the results in a check in case .ToList() throws when the sequence has no results. XList is a List<X> and XDetailList is a List<XDetail>.
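For reference, Enumerable.ToList() does not throw on an empty sequence; it returns a list with Count == 0, so checking Count (or Any()) afterwards is enough. A minimal sketch, with an empty array standing in for a file that has no lines:

```csharp
using System;
using System.Linq;

// An empty input stands in for a file with no lines.
string[] lines = Array.Empty<string>();

var results = lines
    .Select(line => new { XID = int.Parse(line.Substring(0, 8)), Name = line.Substring(8).Trim() })
    .ToList();   // no exception: the projection never runs for an empty input

Console.WriteLine(results.Count);   // 0
if (results.Count == 0)
    Console.WriteLine("Nothing to process.");
```

Only operators such as First() or Single() throw on an empty sequence; ToList() just materializes whatever is there.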

From there I use a more involved LINQ query to match up the corresponding items:

    var changedData = from x in XList
                      join xDetail in XDetailList on x.ID equals xDetail.ID
                      where
                          (!results.Any(p => p.XID.Equals(x.XID))
                          || !results.Any(p => p.Name.Equals(xDetail.Name)))
                      select new
                      {
                          XValue = x,
                          XDetailValue = xDetail,
                          Result = (from result in results
                                    where result.Name.Equals(xDetail.Name)
                                    select result).SingleOrDefault()
                      };
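In the query above, each results.Any(...) call and the nested SingleOrDefault() walk the whole results list once per joined X/XDetail pair. A sketch that keeps the same filter and projection but precomputes a HashSet of result XIDs, a HashSet of result Names, and a lookup of results by Name, so each check becomes a hash lookup. Assumptions: value tuples stand in for the typed DataRows, and the data is the sample listed in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples stand in for the X and XDetail typed DataRows.
var xList = new List<(int XID, int ID)> { (1, 10), (2, 20), (3, 30) };
var xDetailList = new List<(string Name, int ID)> { ("Bob", 10), ("Joe", 20), ("Sam", 30) };
var results = new List<(int XID, string Name)>
{
    (2, "Bob2"), (3, "NotSam"), (4, "NewGuy"), (5, "NewGuy2"),
};

// Built once, instead of rescanning results for every joined pair.
var resultXids = new HashSet<int>(results.Select(r => r.XID));
var resultNames = new HashSet<string>(results.Select(r => r.Name));
var resultsByName = results.ToLookup(r => r.Name);

var changedData = (from x in xList
                   join xDetail in xDetailList on x.ID equals xDetail.ID
                   // Same condition as the two Any() calls, now one hash lookup each.
                   where !resultXids.Contains(x.XID) || !resultNames.Contains(xDetail.Name)
                   select new
                   {
                       XValue = x,
                       XDetailValue = xDetail,
                       // Like the original SingleOrDefault(): default if no match,
                       // exception if several results share the Name.
                       Result = resultsByName[xDetail.Name].SingleOrDefault(),
                   }).ToList();

foreach (var c in changedData)
    Console.WriteLine($"{c.XValue.XID} {c.XDetailValue.Name} -> {c.Result}");
```

This only changes the speed, not the meaning: with the sample data all three existing pairs pass the filter (the XID and Name checks are independent), and every Result is the default tuple because no result row reuses an existing Name.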

My new problem is that this query only gives me what has changed in X/XDetail, not what is new. To get what is new I have to run another query, which seemed fine while testing on a small data set (3 existing X/XDetail entries), but when I ran it on the real file with its ~7700 entries, the processing seemed to never end.

For example, suppose X/XDetail already contain:

    XID: 1, Name: Bob, ID: 10
    XID: 2, Name: Joe, ID: 20
    XID: 3, Name: Sam, ID: 30

and the results file contains:

    XID: 2, Name: Bob2
    XID: 3, Name: NotSam
    XID: 4, Name: NewGuy
    XID: 5, Name: NewGuy2

I'd like to be able to get a result set containing:

    {XID: 2, Name: Bob2}, x, xDetail
    {XID: 3, Name: NotSam}, x, xDetail
    {XID: 4, Name: NewGuy}, x, xDetail
    {XID: 5, Name: NewGuy2}, x, xDetail

I'd like the x and xDetail as part of the result set so that I can use those typed data rows to make the necessary changes.

I tried my hand at making such a query:

    var newData = from result in results
                  join x in XList on result.XID equals x.XID
                  join xDetail in XDetailList on x.ID equals xDetail.ID
                  where
                      (x.XID == result.XID && xDetail.Name != result.Name)
                  select new
                  {
                      XValue = x,
                      XDetailValue = xDetail,
                      Result = result
                  };

Because of the inner joins, I'm only ever going to get the changed items. I also need the data that isn't in X/XDetail yet, and I'd like to stop my system, which has been processing my ~7700-entry change file for the past 2.5 hours. I've stared at this and related queries too long to spot how to shape the where clause correctly.

Is there a way to structure the LINQ query to find both the changed data and the data that does not exist in X/XDetail, and return them in a new result set to process?

Answer:

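I think your performance problems come from the complexity of your queries, which are probably around O(n^2): for every joined X/XDetail pair, each results.Any(...) call and the nested SingleOrDefault() may scan the whole results list.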
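Rough numbers, using the sizes from the question (3 entries while testing, ~7700 in the real file) and assuming each incoming entry is compared against each existing row:

```latex
\begin{aligned}
n = 3:&\quad n^2 = 9 \text{ comparisons} \\
n = 7700:&\quad n^2 = 7700^2 = 59{,}290{,}000 \approx 5.9 \times 10^{7} \text{ comparisons} \\
\text{ratio:}&\quad \frac{59{,}290{,}000}{9} \approx 6.6 \times 10^{6}
\end{aligned}
```

So a query that looks instant on the test data can do millions of times more work on the real file.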

So, first, I suggest putting the current data into a lookup structure, like this (*):

    var joinedByXID = (from x in XList
                       join xDetail in XDetailList on x.ID equals xDetail.ID
                       select new { X = x, XDetail = xDetail })
                      .ToLookup(pair => pair.X.XID); // key by XID, which the queries below look up with r.XID
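A runnable sketch of this lookup, keyed by XID (the key the following queries look up with r.XID), with value tuples standing in for the typed DataRows and the question's sample rows. It also shows that an ILookup returns an empty sequence for a missing key instead of throwing, which is why Contains is only needed to tell "existing" from "new".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples stand in for the X and XDetail typed DataRows.
var xList = new List<(int XID, int ID)> { (1, 10), (2, 20), (3, 30) };
var xDetailList = new List<(string Name, int ID)> { ("Bob", 10), ("Joe", 20), ("Sam", 30) };

// Join once on ID, then index the joined pairs by XID.
var joinedByXID = (from x in xList
                   join xDetail in xDetailList on x.ID equals xDetail.ID
                   select new { X = x, XDetail = xDetail })
                  .ToLookup(p => p.X.XID);

Console.WriteLine(joinedByXID.Contains(2));              // True
Console.WriteLine(joinedByXID[2].Single().XDetail.Name); // Joe
Console.WriteLine(joinedByXID.Contains(4));              // False
Console.WriteLine(joinedByXID[4].Count());               // 0: missing key, no exception
```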

I'm not sure, but I assume that by "changed data" you mean entries whose XID already exists but with a new name, right?
If so, you can get the changed data with this query:

    var changedData = results
        .Where(r => joinedByXID.Contains(r.XID))
        .SelectMany(r => joinedByXID[r.XID]
            .Where(x => x.XDetail.Name != r.Name)
            .Select(old => new { XValue = old.X, XDetailValue = old.XDetail, Result = r }));

Then, if by "new data" you mean entries with a new XID (one not currently present in XList/XDetailList), you cannot match them with X/XDetail elements, because there aren't any, so that's simply:

    var newData = results
        .Where(r => !joinedByXID.Contains(r.XID));
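If one result set shaped like the one in the question is preferred (new entries simply carrying no x/xDetail), the two queries can be merged into a left outer join: a lookup indexer followed by DefaultIfEmpty(). This is a sketch, not part of the original answer: value tuples stand in for the typed DataRows, the data is the question's sample, and IsNew is a hypothetical convenience flag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Value tuples stand in for the typed DataRows; data is the question's sample.
var xList = new List<(int XID, int ID)> { (1, 10), (2, 20), (3, 30) };
var xDetailList = new List<(string Name, int ID)> { ("Bob", 10), ("Joe", 20), ("Sam", 30) };
var results = new List<(int XID, string Name)>
{
    (2, "Bob2"), (3, "NotSam"), (4, "NewGuy"), (5, "NewGuy2"),
};

var joinedByXID = (from x in xList
                   join xDetail in xDetailList on x.ID equals xDetail.ID
                   select new { X = x, XDetail = xDetail })
                  .ToLookup(p => p.X.XID);

// Left outer join: DefaultIfEmpty() yields a single null "old" pair for a new
// XID, so new entries survive instead of being dropped as in an inner join.
var toProcess = (from r in results
                 from old in joinedByXID[r.XID].DefaultIfEmpty()
                 where old == null || old.XDetail.Name != r.Name
                 select new
                 {
                     Result = r,
                     XValue = old?.X,             // null for a new XID
                     XDetailValue = old?.XDetail, // null for a new XID
                     IsNew = old == null,         // hypothetical convenience flag
                 }).ToList();

foreach (var item in toProcess)
    Console.WriteLine($"{item.Result.XID} {item.Result.Name} new={item.IsNew}");
```

This makes one pass over results with a hash lookup per entry, so the cost grows roughly linearly with the file size.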

(*)
Actually, to be even faster, you could arrange your data in a dictionary of dictionaries, where the outer key is XID and the inner key is Name.
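A minimal sketch of the dictionary-of-dictionaries idea, under assumptions: value tuples stand in for the DataRows, the existing rows are the question's sample, and the three result rows are illustrative (one deliberately unchanged). Classify is a hypothetical helper; with both keys indexed, each result needs at most two hash lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var xList = new List<(int XID, int ID)> { (1, 10), (2, 20), (3, 30) };
var xDetailList = new List<(string Name, int ID)> { ("Bob", 10), ("Joe", 20), ("Sam", 30) };
// Illustrative results: one changed, one unchanged, one new.
var results = new List<(int XID, string Name)> { (2, "Bob2"), (3, "Sam"), (4, "NewGuy") };

// Outer key: XID. Inner key: Name. Value: the joined (X, XDetail) pair.
// The inner ToDictionary throws if one XID has two details with the same Name.
var index = (from x in xList
             join d in xDetailList on x.ID equals d.ID
             select (X: x, XDetail: d))
            .GroupBy(p => p.X.XID)
            .ToDictionary(g => g.Key, g => g.ToDictionary(p => p.XDetail.Name));

// Hypothetical helper: "new" if the XID is unknown, "changed" if the XID
// exists under a different Name, "unchanged" otherwise.
string Classify((int XID, string Name) r) =>
    !index.TryGetValue(r.XID, out var byName) ? "new"
    : byName.ContainsKey(r.Name) ? "unchanged"
    : "changed";

var statuses = results.Select(r => (r.XID, Status: Classify(r))).ToList();
foreach (var s in statuses)
    Console.WriteLine($"XID {s.XID}: {s.Status}");
```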
