问题描述:

I have a VB.NET application that I use to load various files into a RichTextBox and then scan through the document to find specific words. It's similar to the Find function in Word. The app was running fine until a 5,150 line .sql document run through it and it's taking upwards of 10 minutes to run to completion.

Can anyone recommend a better way of coding it than I have below?

 If sqlText.Contains("GRANT") Then

Dim searchstring As String = "GRANT"

Dim count As New List(Of Integer)()

For i As Integer = 0 To rtbFile.Text.Length - 1

If rtbFile.Text.IndexOf(searchstring, i) <> -1 Then

count.Add(rtbFile.Text.IndexOf(searchstring, i))

End If

Next

Try

For i As Integer = 0 To count.Count - 1

rtbFile.Select(count(i), searchstring.Length)

rtbFile.SelectionBackColor = Color.Yellow

rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold)

count.RemoveAt(i)

Next

Catch ex As Exception

End Try

rtbFile.Select(rtbFile.Text.Length, 0)

rtbFile.SelectionBackColor = Color.White

rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Regular)

End If

网友答案:

This can also be done with a While loop and RichTextBox.Find():

    Dim searchstring As String = "GRANT"
    Dim index As Integer =  rtbFile.Find(searchstring, 0, RichTextBoxFinds.None)
    While index <> -1
        rtbFile.Select(index, searchstring.Length)
        rtbFile.SelectionBackColor = Color.Yellow
        rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold)
        index = rtbFile.Find(searchstring, index + searchstring.Length, RichTextBoxFinds.None)
    End While
网友答案:

That first loop is killing the performance, you are calling IndexOf for every character in the string. Also the two loops can be merged in to one. Change it to:

rtbFile.SelectionBackColor = Color.Yellow
rtbFile.SelectionFont = New Font(rtbFile.Font, FontStyle.Bold)

For Each m As Match in Regex.Matches(sertbFile.Text, searchstring)
     rtbFile.Select(m.Index, searchstring.Length)
Next
网友答案:

You've got a few bad things going on here:

First, the following code:

For i As Integer = 0 To rtbFile.Text.Length - 1
    If rtbFile.Text.IndexOf(searchstring, i) <> -1 Then
        count.Add(rtbFile.Text.IndexOf(searchstring, i))
    End If
Next

This is looping through every character in your string, and calling IndexOf on the entire string from that point forward. So your 50,000-character string is running IndexOf 50,000 times, on large strings.

You only need to call IndexOf as many times as you find a string. When your string is found, you increment your start index to that point, and keep searching only from that point.

Next thing, this code:

For i As Integer = 0 To count.Count - 1
    ...
    count.RemoveAt(i)
Next

The RemoveAt line is unnecessary. You're already looping through a list, so you don't need to remove the items as you go along. The way it stands, your loop will skip every other item in your list.

网友答案:

Whoops. I missed a very important point about the IndexOf (and incorrectly assumed it was fed with the end of the last match). See Magnus's answer.


I am not sure where the bottleneck is (and it might very well be from setting the selection itself), but here are my suggestions, roughly in order of priority:

  1. Invoke rtbFile.Text once to avoid any roundtrips to underlying control (perhaps a native Windows control?) and use a variable to store the resulting string. Once the string is obtained in .NET, just keep using it directly unless/until the text may change. If the control is native then a lot of work may be required to simply "get the text".

  2. Use normal item iteration over the count collection (not indexing) and do not remove from the front-of the List when assigning the selections. Removing from the front of a List is "expensive" in that it must shift all items down internally. Also, removing the element is unneeded here and dubious at best: since the collection being modified is also is being iterated which likely leads to incorrect behavior (skipped items), regardless of performance.

  3. Only call IndexOf once per loop and use a variable to avoid a duplicate search. This likely won't have an overall impact, but it does avoid some "extra" work. IndexOf itself is fine and doesn't need to be replaced.

YMMV.

相关阅读:
Top