It’s been a while since I wrote my last article here. During the winter break I went to three national parks in California, Utah and Arizona, and the experience was astonishing. Obviously, all of my focus went into photography and adventure……
Alright, after I came home, I checked some forums as usual to see if there were any updates about technology and data science. Among those websites, V2EX is one of my favourites because it is an active, user-friendly site with a lot of interesting pieces of news. I know some people have already developed fascinating third-party apps for it on OS X, but I am more than happy to find my own way to do it.
I have known about RSS for a long time, which means I knew it before Google Reader closed…… I seldom use it now, because I really don’t need to overload my information flow with RSS, and not many websites are worth checking every day. To make an RSS reader for my own needs, I just need two things: a simple interface without ads and distractions, and a simple list of article titles that are clickable. What about the author and the description? Not important to me at all. I am making it because I need it to work exactly this way and nothing else.
Okay, import RSS, run…. No way… It is Python, but it is not that easy. I did some research on the Internet and settled on the two main packages I needed: feedparser (an elegant package for reading RSS/XML feeds) and Tkinter (to create a simple user interface).
Frankly speaking, I am really pissed off by the Tkinter package, because it is so un-Pythonic and full of confusing concepts I am not familiar with. I read a book about Tkinter before, and it felt like a PROGRAMMING LANGUAGE of its own! I have a terrible time with it every time I use it. feedparser, on the other hand, is incredibly easy to use: it automatically pulls out the author, published time, description and so on of the newest articles from a feed. If I had to split the time I spent on this project, I would say 65% of it went to the damn Tkinter.
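To give you an idea of how little work feedparser asks for, here is a minimal sketch (the feed address is just an example, and which attributes exist depends on the feed):

import feedparser

feed = feedparser.parse('https://www.v2ex.com/index.xml')  # example feed address
for entry in feed.entries[:5]:
    print(entry.title)      # article title
    print(entry.link)       # link to the article
    print(entry.published)  # published time, as the raw string from the feed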
Another tricky point in this program is the time zone issue. By default, the published time is in UTC. I tried to use “time.localtime()” to transform it, but it failed. YEAH, IT FAILED! The call executed successfully, but the result was still in UTC instead of my local time zone, PST. “time.strftime()” did not work either. After some debugging, I was pretty sure the timestamp I was building could not be converted correctly. Later I found a post that uses the calendar package to turn the parsed UTC time into a proper epoch timestamp, which then localizes correctly. Thanks, Stack Overflow!
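The gist of the fix looks like this (a minimal sketch with a made-up timestamp; calendar.timegm() is the piece that interprets the parsed struct_time as UTC before handing it off to the local-time functions):

import time
import calendar

pubDate = '2016-01-20T08:30:00Z'  # made-up example; real values come from the feed
pubTime = time.strptime(pubDate, '%Y-%m-%dT%H:%M:%SZ')  # parse the UTC string into a struct_time
timeStamp = calendar.timegm(pubTime)  # interpret that struct_time as UTC -> seconds since the epoch
print(time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime(timeStamp)))  # local time, e.g. 00:30 PST on the US west coast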
import feedparser
import re
import os
import requests
import time
import calendar
from Tkinter import *
import webbrowser


class Application(Frame):

    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()
        # Create the graphic interface
        self.url = Entry(self, width=50)
        self.url.pack()
        self.button = Button(self, text='Enter', command=self.get_input)
        self.button.pack()
        self.contents = StringVar()
        self.url.config(textvariable=self.contents)

    # create a canvas inside the frame
    def create_scrollbar(self):
        def myfunction(event):
            self.canvas.configure(scrollregion=self.canvas.bbox('all'), width=1000, height=500)
        self.canvas = Canvas(self)
        self.frame = Frame(self.canvas)
        self.myscrollbar = Scrollbar(self, orient='vertical', command=self.canvas.yview)
        self.canvas.configure(yscrollcommand=self.myscrollbar.set)
        self.myscrollbar.pack(side='right', fill='y')
        self.canvas.pack(side='left')
        self.canvas.create_window((0, 0), window=self.frame, anchor='nw')
        self.frame.bind('<Configure>', myfunction)

    # add the 'http://' prefix for those who are lazy
    def error_handler(self):
        str = self.contents.get()
        if re.match(r'http://', str, flags=0) == None:
            url = 'http://' + str
            try:
                requests.request('GET', url)
                return url
            except:
                Label(self, text='Not A Valid Address!', relief=RAISED).pack()
        else:
            url = str
            try:
                requests.request('GET', url)
                return url
            except:
                Label(self, text='Check your Address Spelling!', relief=RAISED).pack()

    # captures the user input and returns the url
    def get_input(self):
        global user_input
        user_input = self.error_handler()
        text = self.print_input(user_input)
        return user_input

    # print the RSS result in the frame; each title is also a hyperlink
    def print_input(self, input):
        text = parser(input)
        text = text.get_text()
        self.create_scrollbar()
        for key, value in text.items():
            t = Label(self.frame, text=key, fg="blue", cursor="hand2")
            # t.bind('<Button-1>', callback)
            t.bind('<Button-1>', lambda event, value=value: callback(value))
            t.pack()


# this function is for print_input above
def callback(link):
    webbrowser.open(link)


class parser(object):
    """This class takes the url input, gets the newest posts from the website,
    and creates a dictionary that maps each title to its link. I might consider
    putting more elements in it if the need arises."""

    def __init__(self, url):
        self.url = url
        self.feed = feedparser.parse(self.url)
        global lenItem
        lenItem = len(self.feed.entries)

    def get_text(self):
        content = {}
        for i in range(0, lenItem):
            title = self.feed.entries[i].title
            link = self.feed.entries[i].link
            # description = self.feed.entries[i].description
            # author = self.feed.entries[i].author
            pubDate = self.feed.entries[i].published
            pubTime = time.strptime(pubDate, '%Y-%m-%dT%H:%M:%SZ')
            timeStamp = calendar.timegm(pubTime)
            pubTime = time.strftime("%Y-%m-%d %H:%M:%S %Z", time.localtime(timeStamp))
            title = title.encode('utf8')
            # description = description.encode('utf8')
            # author = author.encode('utf8')
            link = link.encode('utf8')
            title = title + ' ' + pubTime
            content[title] = link
        return content


if __name__ == '__main__':
    root = Application()
    root.master.title('Input Web Address?')
    root.master.minsize(400, 40)
    root.mainloop()
This program is designed specifically for V2EX, but since I made an input box I have tried it on other websites too…. It kind of works, but it needs a few adjustments.
Firstly, since I am just stuffing the titles into Labels without any formatting, a long title sometimes exceeds the window limit, so users have to adjust the canvas size (the width=1000, height=500 set in create_scrollbar).
Secondly, since it shows the published time, and every website seems to use a different time format, it can choke: when I pointed it at The Verge, for example, it raised an error, and the user has to change the strptime format string in get_text to match.
Thirdly, some attributes are simply not available on some websites. For example, when I input the Reddit front page, it complained that description is not available, so I have commented out the description and author lines in get_text to make it more versatile (a more tolerant version is sketched below).
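If I ever need it to play nicer with other feeds, something like the sketch below would probably do (just a sketch; get_text_tolerant is a name I made up here). It leans on feedparser’s own date parsing, entry.published_parsed, instead of my hard-coded strptime format, and uses .get() so feeds that lack an attribute are skipped gracefully instead of blowing up:

import time
import calendar
import feedparser


def get_text_tolerant(url):
    """Like parser.get_text(), but tolerant of feeds with odd date formats
    or missing attributes."""
    feed = feedparser.parse(url)
    content = {}
    for entry in feed.entries:
        title = entry.get('title', '(no title)')
        link = entry.get('link', '')
        parsed = entry.get('published_parsed')  # struct_time in UTC, or None if the feed has no date
        if parsed is not None:
            stamp = calendar.timegm(parsed)  # UTC struct_time -> epoch seconds
            title = title + ' ' + time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime(stamp))
        content[title] = link
    return content

As for the long titles, a Tkinter Label also takes a wraplength option (in pixels), which would probably be a cleaner fix than asking everyone to widen the canvas.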
Alright, from today on, I will no longer open the browser, type in the address, and put up with the interface and its potential ads. All I have now are plain, old-fashioned titles and links, and I know I am gonna use it. 🙂