This is rather a design problem. I don't know how to achieve this in
- I need to parse big files (> 10 million lines) which look like
2013-05-09 11:09:01 Local4.Debug 220.127.116.11 %MMT-7-715036: Group = 18.104.22.168, IP = 22.214.171.124, Sending keep-alive of type DPD R-U-THERE (seq number 0x7db7a2f3)
2013-05-09 11:09:01 Local4.Debug 126.96.36.199 %MMT-7-715046: Group = 188.8.131.52, IP = 184.108.40.206, constructing blank hash payload
2013-05-09 11:09:01 Local4.Debug 220.127.116.11 %MMT-7-715046: Group = 18.104.22.168, IP = 22.214.171.124, constructing qm hash payload
2013-05-09 11:09:01 Local4.Debug 126.96.36.199 %ASA-7-713236: IP = 188.8.131.52, IKE_DECODE SENDING Message (msgid=61216d3e) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
2013-05-09 11:09:01 Local4.Debug 172.22.10.111 %MMT-7-713236: IP = 184.108.40.206, IKE_DECODE RECEIVED Message (msgid=867466fe) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
Event that will be sent to server.
- How can I read this log file efficiently in the Akka model? I read that reading a file synchronously is better because of less magnetic tape movement.
- In that case, there could be a FileReaderActor per file, that would read each line and send it for processing to, let's say,
Router may have many actors working on a line (from file) and creating an Event. There would be 1
- I was also thinking of sending Events in batches to avoid too much data transfer over the network. In such cases, where should I keep accumulating these Events, and how would I know if all Events are generated from
I think I know what you're asking: if you read and process a file in the manner you describe, you risk building up a massive number of messages when the processing takes significantly longer than the reading. Also, if you are messaging over the network, you ideally want to minimize the number of messages you send. If your lines don't take long to process, then I wouldn't send them over the network to be processed.

Have you considered using futures instead? I don't know if your case is as simple as "Parallel File Processing: What are recommended ways?"; if it is, you should use streams. The thing with actors is that, although they are good for throttling, their main purpose is to wrap up state, and you don't have much of that when processing a file. Maybe you would be better off with futures; I show an example of that in "Executing Dependent tasks in parallel in Java".

But you could use actors as you say, and have the processing actors communicate with the reader actor and tell it to stop reading for, let's say, a second as soon as the number of messages waiting to be processed exceeds 1000000, or however many you decide.
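
To make the throttling idea concrete, here is a rough sketch in plain Java rather than Akka. It swaps the "pause the reader for a second" approach for a bounded `ArrayBlockingQueue`: the reader simply blocks while the queue is full, so the backlog can never exceed the capacity you pick. It also shows one possible answer to the batching questions: the consumer accumulates lines into batches, and a sentinel value marks the end of the file so the last partial batch gets flushed. The class name `BoundedLogPipeline`, the `EOF` sentinel, and the parameters are all made up for illustration, and the lines are kept as plain strings instead of being turned into real Event objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper, not part of Akka or any library.
final class BoundedLogPipeline {
    // End-of-input marker. It is compared by identity (==), so a log line
    // that happens to contain the text "EOF" is never mistaken for it.
    private static final String EOF = new String("EOF");

    /** Pushes lines through a bounded queue and returns them grouped into batches. */
    static List<List<String>> run(Iterable<String> lines, int queueCapacity, int batchSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        List<List<String>> sentBatches = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            List<String> batch = new ArrayList<>(batchSize);
            try {
                while (true) {
                    String line = queue.take();      // waits for the reader
                    if (line == EOF) {
                        break;                       // every line has been seen
                    }
                    batch.add(line);                 // a real consumer would build an Event here
                    if (batch.size() == batchSize) {
                        sentBatches.add(batch);      // stand-in for "send batch to server"
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) {
                    sentBatches.add(batch);          // flush the final partial batch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String line : lines) {
                queue.put(line);                     // blocks while the queue is full
            }
            queue.put(EOF);
            consumer.join();                         // results are safe to read after join
        } catch (InterruptedException e) {
            consumer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding lines", e);
        }
        return sentBatches;
    }
}
```

For a real file you could probably pass something like `Files.lines(path)::iterator` as the `lines` argument, which reads lazily instead of loading 10 million lines into memory. With several consumer threads you would need one sentinel per consumer and a thread-safe place to collect batches, so treat this as a starting point only.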