Problem description:

I am using the script below to stream tweets from Twitter with tweepy, and I am running into a problem with the coordinates field.

Whenever I get a tweet that has coordinates, I get this error:

(1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \': "\'Point\'", u\'coordinates\': \'(28.5383355,-81.3792365)\'})\' at line 1')

Also, my condition that is supposed to store only tweets with a geolocation has no effect: every incoming tweet seems to be stored in the database.

```python
import tweepy
import json
import MySQLdb
from dateutil import parser

WORDS = ['#bigdata', '#AI', '#datascience', '#machinelearning', '#ml', '#iot']

CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

HOST = ""
USER = ""
PASSWD = ""
DATABASE = ""

# This function takes the 'created_at', 'text', 'screen_name', 'tweet_id' and 'coordinates'
# values and stores them in a MySQL database
def store_data(created_at, text, screen_name, tweet_id, coordinates):
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    insert_query = "INSERT INTO twitter (tweet_id, screen_name, created_at, text, coordinates) VALUES (%s, %s, %s, %s, %s)"
    cursor.execute(insert_query, (tweet_id, screen_name, created_at, text, coordinates))
    db.commit()
    cursor.close()
    db.close()
    return

class StreamListener(tweepy.StreamListener):
    # This is a class provided by tweepy to access the Twitter Streaming API.

    def on_connect(self):
        # Called initially to connect to the Streaming API
        print("You are now connected to the streaming API.")

    def on_error(self, status_code):
        # On error - if an error occurs, display the error / status code
        print('An error has occurred: ' + repr(status_code))
        return False

    def on_data(self, data):
        # This is the meat of the script: it connects to your MySQL database and stores the tweet
        try:
            # Decode the JSON from Twitter
            datajson = json.loads(data)

            if datajson['coordinates'] == 'None':
                print('coordinates = None, skipped')
            else:
                # Grab the wanted data from the tweet
                text = datajson['text']
                screen_name = datajson['user']['screen_name']
                tweet_id = datajson['id']
                created_at = parser.parse(datajson['created_at'])
                coordinates = datajson['coordinates']

                # Print a message to the screen that we have collected a tweet
                print("Tweet collected at " + str(created_at))
                # print(datajson)

                # Insert the data into the MySQL database
                store_data(created_at, text, screen_name, tweet_id, coordinates)
        except Exception as e:
            print(e)

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# Set up the listener. The 'wait_on_rate_limit=True' is needed to help with Twitter API rate limiting.
listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True))
streamer = tweepy.Stream(auth=auth, listener=listener)

print("Tracking: " + str(WORDS))
streamer.filter(track=WORDS)
```

Answer:

SQL error

The coordinates value in the decoded tweet is not a number or a string; it is an object (a dict with a "type" key and a nested "coordinates" array).

You need to split this object into latitude and longitude and store each one in its own column.
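To see why the insert fails and what a two-column version looks like, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server. The table name tweets_demo, its latitude/longitude columns and the sample tweet are hypothetical; with MySQLdb the placeholders would be %s instead of ?. sqlite3 refuses to bind the dict outright, whereas MySQLdb, judging by the error message above, pastes the dict's text into the query and MySQL then rejects the syntax.

```python
import json
import sqlite3

# Hypothetical sample: the shape of a geotagged tweet after json.loads().
# Twitter's "coordinates" field is a GeoJSON Point: [longitude, latitude].
SAMPLE = ('{"id": 1, "coordinates": {"type": "Point", '
          '"coordinates": [-81.3792365, 28.5383355]}}')


def demo():
    tweet = json.loads(SAMPLE)
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
    conn.execute(
        "CREATE TABLE tweets_demo (tweet_id INTEGER, latitude REAL, longitude REAL)"
    )

    # 1) Binding the whole coordinates object fails: it is a dict, not a scalar.
    try:
        conn.execute(
            "INSERT INTO tweets_demo (tweet_id, latitude) VALUES (?, ?)",
            (tweet["id"], tweet["coordinates"]),
        )
        dict_failed = False
    except sqlite3.Error:
        dict_failed = True

    # 2) Splitting the point into two floats works (note the GeoJSON order).
    longitude, latitude = tweet["coordinates"]["coordinates"]
    conn.execute(
        "INSERT INTO tweets_demo (tweet_id, latitude, longitude) VALUES (?, ?, ?)",
        (tweet["id"], latitude, longitude),
    )
    row = conn.execute(
        "SELECT tweet_id, latitude, longitude FROM tweets_demo"
    ).fetchone()
    conn.close()
    return dict_failed, row


if __name__ == "__main__":
    print(demo())
```

For the real MySQL table this implies adding two numeric columns (for example DOUBLE) to the twitter table and passing the two floats as separate parameters.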

To get the latitude and longitude, instead of this line:

coordinates = datajson['coordinates']

do this (the inner array is in GeoJSON order, longitude first, then latitude):

longitude, latitude = datajson["coordinates"]["coordinates"]
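Because the inner array is in longitude, latitude order, it can help to do the unpacking in one small helper so the two values are never swapped. This is a sketch; the function name extract_lat_lon is hypothetical, and the use of .get() is an assumption that you also want to skip stream messages (such as deletion notices) that carry no coordinates key.

```python
from typing import Optional, Tuple


def extract_lat_lon(datajson: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a geotagged tweet, or None.

    Twitter's "coordinates" field is either null (None after json.loads)
    or a GeoJSON Point whose inner array is [longitude, latitude].
    """
    point = datajson.get("coordinates")  # .get() tolerates messages without the key
    if not point:
        return None
    longitude, latitude = point["coordinates"]  # GeoJSON order: longitude first
    return latitude, longitude


if __name__ == "__main__":
    geo = {"coordinates": {"type": "Point", "coordinates": [-81.3792365, 28.5383355]}}
    print(extract_lat_lon(geo))                   # (28.5383355, -81.3792365)
    print(extract_lat_lon({"coordinates": None}))  # None
```

Inside on_data, a None result can double as the "skip this tweet" signal, and the two floats go straight into separate latitude and longitude columns.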

Coordinates condition not taking effect

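'None' is a string, not the value None. When a tweet has no geotag, Twitter sends null in the coordinates field, and json.loads turns it into Python None. Since None == 'None' is False, the check never matches, so every tweet falls through to the else branch and gets stored.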

Replace:

if datajson['coordinates']=='None':

with:

if datajson['coordinates'] is None:

or better:

if not datajson['coordinates']:
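A quick way to confirm this is to decode a JSON null and compare it each way. The sample tweet below is made up; only the coordinates field matters.

```python
import json

# A non-geotagged tweet: Twitter sends JSON null, which json.loads maps to None.
datajson = json.loads('{"text": "hello #ai", "coordinates": null}')

print(datajson["coordinates"] == "None")  # False: None is not the string 'None'
print(datajson["coordinates"] is None)    # True
print(not datajson["coordinates"])        # True
```

The "is None" form matches only a missing geotag, while "not ..." also treats any other falsy value (such as an empty dict) as missing.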