How to get the latest tweet ID using the python-twitter search API
I'm trying to find a way to avoid getting the same tweets twice when using the search API. This is what I'm doing:
1. Make a request to Twitter
2. Store the tweets
3. Make another request to Twitter
4. Store the tweets
5. Compare the results of steps 2 and 4
Ideally, step 5 yields 0, meaning no overlapping tweets were received, so I'm not asking the Twitter server for the same information more than once.
But I think I'm stuck at step 3, where I have to make another call. I'm trying to use the 'since_id'
argument to get only the tweets after a certain point, but I'm not sure whether the value I'm using is correct.
Code:

import twitter


class test():

    def __init__(self):
        self.t_auth()
        self.hashtag = ['justinbieber']
        self.tweets_1 = []
        self.ids_1 = []
        self.created_at_1 = []
        self.tweet_text_1 = []
        self.last_id_1 = ''
        self.page_1 = 1
        self.tweets_2 = []
        self.ids_2 = []
        self.created_at_2 = []
        self.tweet_text_2 = []
        self.last_id_2 = ''
        self.page_2 = 1

        # First request: 15 pages of up to 100 tweets each
        for i in range(1, 16):
            self.tweets_1.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_1, page=self.page_1))
            self.page_1 += 1
        print len(self.tweets_1)
        for t in self.tweets_1:
            self.ids_1.insert(0, t.id)
            self.created_at_1.insert(0, t.created_at)
            self.tweet_text_1.insert(0, t.text)
            self.last_id_1 = t.id  # ends up as the id of the last tweet iterated

        # Second request: starts from the id remembered above
        self.last_id_2 = self.last_id_1
        for i in range(1, 16):
            self.tweets_2.extend(self.api.GetSearch(self.hashtag, per_page=100, since_id=self.last_id_2, page=self.page_2))
            self.page_2 += 1
        print len(self.tweets_2)
        for t in self.tweets_2:
            self.ids_2.insert(0, t.id)
            self.created_at_2.insert(0, t.created_at)
            self.tweet_text_2.insert(0, t.text)
            self.last_id_2 = t.id

        print 'total number of tweets in test 1: ', len(self.tweets_1)
        print 'last id of test 1: ', self.last_id_1
        print 'total number of tweets in test 2: ', len(self.tweets_2)
        print 'last id of test 2: ', self.last_id_2
        print '##################################'
        print '#############overlapping##########'

        # Compare both runs: each of these sets should be empty
        ids_overlap = set(self.ids_1).intersection(self.ids_2)
        tweets_text_overlap = set(self.tweet_text_1).intersection(self.tweet_text_2)
        created_at_overlap = set(self.created_at_1).intersection(self.created_at_2)
        print 'ids: ', len(ids_overlap)
        print 'text: ', len(tweets_text_overlap)
        print 'created_at: ', len(created_at_overlap)
        print ids_overlap
        print tweets_text_overlap
        print created_at_overlap

    def t_auth(self):
        consumer_key = "xxx"
        consumer_secret = "xxx"
        access_key = "xxx"
        access_secret = "xxx"
        self.api = twitter.Api(consumer_key, consumer_secret, access_key, access_secret)
        self.api.VerifyCredentials()
        return self.api


if __name__ == "__main__":
    test()
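A possible reason the 'since_id' value is off: the code above keeps the ID of the last tweet it iterates over. If the results come back newest first, which is how search results are usually ordered, that last tweet is the oldest one, so the second request would ask for nearly everything again. Since 'since_id' returns tweets with an ID greater than the given value, the marker to remember is the highest ID seen. The sketch below shows only that bookkeeping; `newest_id` is a hypothetical helper, not part of python-twitter, and it assumes tweet IDs grow over time.

```python
def newest_id(tweet_ids, previous=None):
    """Return the highest tweet ID seen so far.

    Hypothetical helper (not part of python-twitter). Assumes newer
    tweets have larger IDs. Falls back to `previous` when the batch
    is empty, and to None when nothing has been seen at all.
    """
    candidates = list(tweet_ids)
    if previous is not None:
        candidates.append(previous)
    return max(candidates) if candidates else None


# A batch that arrived newest first: the last element is the OLDEST tweet.
batch_ids = [1100, 1050, 1000]
since_id = newest_id(batch_ids)               # marker for the next request
since_id = newest_id([], previous=since_id)   # an empty batch keeps the marker
```

With real results, the IDs would come from something like `[t.id for t in self.tweets_1]`, and the returned value would be passed as 'since_id' in the next request.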
In addition to 'since_id', you can use 'max_id'. From the Twitter API documentation:
Iterating in a result set: parameters such as count, until, since_id and max_id allow you to control how you iterate through search results, since it could be a large set of tweets.
By setting these values dynamically, you can restrict your search results so that they do not overlap. For example, if max_id is set at 1100 and since_id is set at 1000, you will get tweets with IDs between those two values.
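To make the windowing idea concrete, here is a minimal sketch that runs against an in-memory stand-in rather than the real API. `fake_search` and `collect_new` are hypothetical names introduced for illustration. The stand-in assumes the semantics the Twitter documentation describes: 'since_id' is exclusive, 'max_id' is inclusive, and results come newest first.

```python
def fake_search(all_ids, since_id=None, max_id=None, count=100):
    """In-memory stand-in for a search call (hypothetical, for illustration).

    Returns IDs newest first, with since_id exclusive and max_id inclusive.
    """
    hits = [i for i in sorted(all_ids, reverse=True)
            if (since_id is None or i > since_id)
            and (max_id is None or i <= max_id)]
    return hits[:count]


def collect_new(search, since_id=None, count=100):
    """Page backwards through everything newer than since_id, without repeats."""
    collected = []
    max_id = None
    while True:
        batch = search(since_id=since_id, max_id=max_id, count=count)
        if not batch:
            break
        collected.extend(batch)
        # Next page: only tweets strictly older than the oldest one received.
        max_id = min(batch) - 1
    return collected


# The example from the text: since_id=1000 and max_id=1100.
window = fake_search(range(990, 1111), since_id=1000, max_id=1100, count=1000)

# Two collection runs: the second starts from the highest ID of the first.
timeline = list(range(1, 11))
first = collect_new(lambda **kw: fake_search(timeline, **kw), count=3)
timeline += [11, 12]  # new tweets arrive between the two runs
second = collect_new(lambda **kw: fake_search(timeline, **kw),
                     since_id=max(first), count=3)
overlap = set(first) & set(second)
```

In the example window, the tweet with ID 1000 is excluded and the one with ID 1100 is included. Lowering 'max_id' to one below the oldest ID received keeps consecutive pages from repeating a tweet, and starting the second run from the highest ID of the first keeps the two runs from overlapping.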