Python and iTunes

After getting through the chaos of February, I finally have two days off to do something interesting. I have been looking for a lyrics fetcher for OS X for a long time, but I have never found one.

iOS users can download Musixmatch, and Windows users can download MiniLyrics. Both apps work like a charm on their platforms, but I still could not find a versatile one for OS X. So I came up with an idea: why not try to make one with Python?

An ugly but simple V2EX RSS client

It’s been a while since I wrote my last article here. During the winter break I visited three national parks in California, Utah and Arizona, and the experience was astonishing. Obviously, I put all of my focus into photography and adventure...

Alright, after I came home, I checked some forums as usual to see if there were any updates about technology and data science. Among those websites, V2EX is one of my favourites because it is an active, user-friendly site with a lot of interesting news. I know some people have already developed fascinating third-party V2EX apps for OS X, but I was more than happy to figure out how to make one myself.

I have known about RSS for a long time, since before Google Reader closed... I seldom use it now, because I don’t need to overload my information flow with RSS, and not many websites are worth checking every day. To make an RSS reader for my needs, I only need two things: a simple interface without ads and distractions, and a plain list of clickable article titles. What about the author and description? Not important to me at all. I made it this way because this is exactly what I need.
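A minimal sketch of the “titles only” idea, assuming an RSS 2.0 feed in which every <item> has <title> and <link> children. The sample feed is made up, and V2EX’s real feed may use a different format (for example Atom), so treat the tag names as assumptions. It uses only the standard library’s xml.etree.ElementTree; a real client would first download the feed text, for instance with urllib.request.urlopen.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document for illustration only (not real V2EX data).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/t/1</link>
      <description>Ignored by this reader</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/t/2</link>
    </item>
  </channel>
</rss>"""


def extract_titles(feed_xml):
    """Return (title, link) pairs for every <item>; author and description are skipped."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        entries.append((title, link))
    return entries


# Print one clickable-looking line per article, nothing else.
for title, link in extract_titles(SAMPLE_FEED):
    print(f"{title} -> {link}")
```

Keeping only the (title, link) pairs matches the “no author, no description” requirement; a GUI layer would then turn each pair into a clickable label.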

[Screenshot, 2016-01-28]

Using the Plot.ly package to create a dashboard in R

I took DSO-510 Business Analytics this semester. For the final presentation, Prof. Selby asked us to create a webpage showing the results we produced in SAS. This is a challenging task, since everybody knows SAS is not good at fancy visualization. Without hesitating for long, I decided to use R instead.

However, R is not great at creating graphs either, at least for basic users. There are tons of settings and configurations you can tweak, but who on earth could get the hang of them right away? I asked some seniors for advice and got suggestions like Shiny (deployable on shinyapps.io), a very good tool made by RStudio for building interactive web apps in R. Although I like challenges, this time I hoped to make things easier without spending too much effort on learning tools or reinventing the wheel.

After researching some packages on the Internet, I came across Plot.ly, an incredibly easy package for creating fancy plots. It is so easy to install that a single command in R sets everything up. It contains a lot of color and design presets, and its official website has guides, tutorials and even example code to help users get started. I believe anyone with R experience will find it easy to learn. Most importantly, Plot.ly can save interactive graphs as HTML files, a feature that is perfect for anyone who wants to build a dashboard.

Once I had enough interactive graphs, I started to think about how to make a dashboard. Fortunately, I had seen some amazing dashboards in various presentations this semester, so I think the primary factors are “overwhelming” and “informative”. To create an “overwhelming” dashboard, I need to arrange the positions and sizes of the graphs carefully so that the whole work feels simply fabulous. To create an “informative” dashboard, the charts and plots need to be interactive, colorful and user-friendly, and luckily Plot.ly has done that for me. So now it is all about design.

1…2…3…Boom! Here comes the shitty work: R Dashboard

I am not a professional designer, and I prefer not to ask real designers, because I believe that what interests me will make a good design. To achieve that goal, I looked at successful dashboard webpages and wrote and edited my code to mimic them. If anyone is interested in how to arrange the plots, please take a look at the source code of my webpage; I believe that is the best way to learn and start a discussion. Happy learning!

Final Project of Managerial Statistics

This report is the final project for the course GSBA-524 in Fall 2015.

The Olympics, the biggest sports event in the world, is hosted by a selected country every four years, and roughly 200 countries take part each time. Naturally, many talented athletes fight hard to qualify for the Olympics and, hopefully, to proudly win a medal for their country. However, as a spectator I had never thought about the mysteries and secrets of those medalists. What makes them excellent? What interesting facts are there about them?

For the final project of GSBA-524, I was asked to dig deeper into this topic as a spectator and as a potential data analyst. All I had was a dirty dataset containing details of Olympic medalists from 2000 to 2012, along with each country’s GDP, population, HDI and ISO code. To be honest, I am a big basketball fan, and I love Kobe so much that I also plan to make this report a tribute to my beloved 2008 USA basketball team.

Scrapy, Finally….

After months of experiencing American culture and adapting to the course level at USC, I finally got a chance to review my Python skills. Fortunately, I still remember the basic commands quite well; perhaps that’s how interest can drive an ordinary person to learn something deeply enough.

In short, I still want to improve my skill at grabbing content from different kinds of websites. Everybody knows there are tons of Python packages that can do that, but you should be careful about their learning curves unless you major in computer science. This semester I have heard a lot of my classmates claim that Python is easy to learn, but COME ON, Python is, after all, a very powerful object-oriented programming language. If you have never touched programming before, it could drive you crazy. To sum up with an analogy, it’s like learning to play the guitar: at first it might seem easy, but if you want to play a BLUES SOLO on stage in front of thousands of people, you’d better invest a decent amount of time in it.

Selecting other columns when using aggregate functions in SQL

Thanks to INF 551, I finally got a chance to practice the SQL skills I had learned on my own. In this class, the instructor recommends that students use MySQL, which is not bad, but it is not as familiar to me as SQLite.

SQL is straightforward in most ways, but I ran into a tricky problem when I wanted to select another column while using aggregate functions such as sum() or count() in the SELECT clause. In standard SQL, a column that is not aggregated generally has to appear in the GROUP BY clause as well, so it cannot simply be listed next to the aggregate. In Excel or R, I can easily specify which column to show when applying some conditions, but SQL does not seem to provide such a user-friendly way to do that.

For example, when I wanted to select the column next to the maximum of the summed “sales_amount” column, I ended up nesting two SELECT subqueries in the query and using a WHERE clause to state the final condition. Note that the two subqueries in brackets are nearly identical except for their aliases.
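The original query is not shown here, so the following is only a reconstruction of the pattern described, run through Python’s built-in sqlite3 module. The table name sales, the salesperson column and all rows are hypothetical; only sales_amount comes from the post.

```python
import sqlite3

# Hypothetical table and made-up rows; only "sales_amount" comes from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Alice", 100), ("Bob", 250), ("Alice", 120), ("Carol", 90), ("Bob", 30)],
)

# Two nearly identical subqueries (aliases a and b): "a" lists each
# salesperson's total, "b" is used only to find the largest total, and the
# WHERE clause keeps the row(s) whose total equals that maximum.
query = """
SELECT a.salesperson, a.total
FROM (SELECT salesperson, SUM(sales_amount) AS total
      FROM sales GROUP BY salesperson) AS a
WHERE a.total = (SELECT MAX(b.total)
                 FROM (SELECT salesperson, SUM(sales_amount) AS total
                       FROM sales GROUP BY salesperson) AS b)
"""
print(conn.execute(query).fetchall())  # [('Bob', 280)]
```

Because the comparison uses equality with the maximum, every salesperson tied for the top total would be returned, not just one.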

I have searched the Internet but could not find a more elegant way to solve this issue. Another method is to create a new table or view to serve as a substitute for the subqueries in brackets.
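A sketch of the view-based variant, again with Python’s sqlite3 and the same hypothetical sales table: the aggregation is written once as a view (sales_totals, an invented name), and the outer query compares against its maximum. CREATE VIEW also exists in MySQL.

```python
import sqlite3

# Same hypothetical table and rows as an illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (salesperson TEXT, sales_amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Alice", 100), ("Bob", 250), ("Alice", 120), ("Carol", 90), ("Bob", 30)],
)

# Store the per-salesperson totals once, so the query no longer repeats them.
conn.execute("""
CREATE VIEW sales_totals AS
SELECT salesperson, SUM(sales_amount) AS total
FROM sales
GROUP BY salesperson
""")

# Keep the row(s) whose total equals the largest total in the view.
query = """
SELECT salesperson, total
FROM sales_totals
WHERE total = (SELECT MAX(total) FROM sales_totals)
"""
print(conn.execute(query).fetchall())  # [('Bob', 280)]
```

A shorter alternative is to sort the totals and keep the first row, e.g. `SELECT salesperson, SUM(sales_amount) AS total FROM sales GROUP BY salesperson ORDER BY total DESC LIMIT 1`, which works in both SQLite and MySQL but returns only one row when several salespeople tie for the maximum.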

Does anyone know a better solution?

Is Excel kidding me with ” ‘ ” ?

I just took my midterm exam, which was brutal for me. Of course it had something to do with Excel, and it was not a pleasant experience.

One question required me to convert a cell from DATE format to the day of the week, and I fully understood the formula =TEXT(A2, "dddd"). However, because I had been busy teaching myself programming languages this semester, I wrongly assumed that the single quote (') would work the same as the double quote ("). Quite a weird rule, isn’t it?

But no: Excel’s formulas still follow what feels like an outdated rule, where a text string must be wrapped in double quotes and single quotes are not accepted. (Excel uses single quotes for other purposes, such as sheet names with spaces in references like 'Sheet 1'!A1.) I spent about half an hour on it and still did not spot the problem. Only after submitting the exam did I realize that this was the cause.

So be careful the next time you use a format string in Excel. DO NOT USE SINGLE QUOTES!
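For comparison, a minimal Python sketch of the same DATE-to-weekday conversion: strftime("%A") plays the role of Excel’s "dddd" code. In Python, single and double quotes delimit identical strings, which is exactly the habit that does not carry over to Excel formulas. Note that %A follows the current locale; under Python’s default C locale it gives English day names.

```python
from datetime import date


def day_of_week(d):
    """Return the full weekday name, similar to Excel's =TEXT(A2, "dddd")."""
    # '%A' and "%A" are the same string in Python; quote style does not matter.
    return d.strftime('%A')


print(day_of_week(date(2000, 1, 1)))  # Saturday
```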

Hands on the Tkinter module but a sad story….

These days I’ve been studying the Tkinter module in Python quite hard because I find it really interesting. It is a tool that can make my teenage dream come true, since I have wanted to make something visually enjoyable for a long time.

To be honest, learning Tkinter is pretty complex because of the large amount of prerequisite knowledge about the Tk programming “protocol”. I have only worked with it for 3 days, but I want to create some programs to keep up my confidence and interest in learning. So I want to share a sad but true story made with the Tkinter module. Here’s the code:

Please don’t laugh at me, because I want to keep writing….(T_T)

Come on Python! Just Read My Clipboard

I don’t know whether you have been annoyed by MyStatLab, powered by Pearson. These days I have been busy doing homework on it. My homework is statistical analysis and requires a lot of work in RStudio. However, as everybody knows, R doesn’t support the xls format very well, so users have to convert the xls file into a csv file first.

That might not seem like a big deal, since csv is ubiquitous in the data world. Nevertheless, MyStatLab doesn’t provide the csv format and offers the xls format instead! It gives users two options: one is to copy the datasheet to the clipboard; the other is to download it as an Excel file.

This was a total nightmare when I tried to do my work on MyStatLab. Since I wanted to avoid the uncertainty of reading a txt file in RStudio (which means copying and pasting the data into a txt file, and you know we can’t simply rename “.txt” to “.csv” because it won’t work!), I had to download the Excel file, open it in Excel, save it as a csv file, and then read it in RStudio. All of that is repetitive, time-consuming and unpleasant.

[Screenshot, 2015-09-14]

After finishing my work painfully and stupidly, I calmed down and tried to solve the problem in Python. I hate repetitive work, and I believe machines should do it better. Therefore, I wrote a small program that reads my clipboard and saves its contents as a csv file.

P.S. I did consider reading the Excel file with a Python program and saving it as a csv file, but I don’t see a boost in efficiency compared with copying the data and running a Python script. Another reason is that writing csv files is supported natively by Python through its built-in csv module, whereas reading Excel files requires a third-party package.

Here is the code:

I referred to http://stackoverflow.com/questions/16188160/how-to-read-data-from-clipboard-and-pass-it-as-value-to-a-variable-in-python, which was quite useful and showed how to keep the Tkinter window from popping up. I also added support for UTF-8 encoding, because I might use the program to convert csv files whose headers contain Chinese characters.
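The original code is not reproduced here, so this is only a sketch of the same idea, not the program from this post. It assumes the copied datasheet lands on the clipboard as tab-separated text (typical for spreadsheet-style tables, but worth checking for MyStatLab) and reads it with Tkinter’s clipboard_get(), hiding the root window with withdraw() as discussed in the Stack Overflow thread above. The helper names tsv_to_rows and save_csv are hypothetical.

```python
import csv
import io


def read_clipboard():
    """Return the current clipboard text via Tkinter without showing a window."""
    import tkinter  # imported lazily so the CSV helpers work without a display

    root = tkinter.Tk()
    root.withdraw()  # hide the empty Tk window that would otherwise pop up
    try:
        return root.clipboard_get()
    finally:
        root.destroy()


def tsv_to_rows(text):
    """Split tab-separated text (the assumed clipboard format) into rows of cells."""
    # csv.reader copes with quoted cells; blank lines come back as [] and are dropped.
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def save_csv(rows, path):
    """Write rows to a UTF-8 csv file so non-ASCII headers (e.g. Chinese) survive."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

After copying the datasheet, calling `save_csv(tsv_to_rows(read_clipboard()), "data.csv")` (the file name is just an example) would produce a file that RStudio can open with read.csv().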

Something I like about SAS

I’ve been actively learning SAS for about a week. During this time, I found out some things about SAS that explain why it has stayed on top of statistical software for so long. I want to write down my impressions here as a reminder.

First of all, a question: what software do you think of when you hear people talk about statistics? Answers may vary: R, Python, SAS, STATA, SPSS... To be honest, SAS is the only one of them I had never touched. Because of my undergraduate major, I used STATA quite a lot, and my whole undergraduate paper relied on it. I have also tried to broaden my knowledge of statistical software, so I have installed and played with RStudio and SPSS before. Apart from the versatility of R, none of these tools stunned me, at least at first impression. I love Python very much, but I don’t have much experience with its statistics-related packages such as NumPy.

Only when I was forced to learn SAS did I discover its most interesting point: it separates the data input procedure from the data analysis procedure with DATA and PROC steps! I appreciate this design very much, because as the program file gets longer, it helps you find the part you want to navigate to more efficiently. There is some learning curve at first, but once you get the hang of it, you will treasure the clarity it gives you.

When I wrote my undergraduate paper using STATA, I felt it was similar to Python because both have a similar coding logic. It is simple, but still not easy for a non-programmer to understand at once. As for R, it simplifies a lot of coding commands and makes coding easier for statisticians, but it has a much steeper learning curve than SAS because it is more versatile and contains many packages that researchers don’t need. It’s true that people have different tastes, which is also why there are many similar but different tools in one area.

Frankly speaking, the coding structure of SAS is not as efficient as that of R; for example, I have to type “RUN;” to execute each step. What I like is the design of its logic, which separates data input from analysis. Just think about it: if Python could open every type of data file with a single “DATA” command, it would be a great convenience!
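Python has no built-in DATA command, but the separation praised here can be imitated with two plain functions: an input stage that picks a reader by file extension, and an analysis stage that only ever sees the parsed rows. This is a hypothetical sketch that covers just .csv and .json from the standard library; data_step and proc_means are invented names that merely borrow SAS vocabulary, and proc_means computes far less than the real PROC MEANS.

```python
import csv
import json
import os
import statistics

# Input stage lookup: file extension -> function that turns an open file into rows.
READERS = {
    ".csv": lambda f: list(csv.DictReader(f)),
    ".json": json.load,  # assumes the JSON file holds a list of objects
}


def data_step(path):
    """Input stage (loosely like a SAS DATA step): read a file into a list of dict rows."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file type: {suffix!r}")
    with open(path, encoding="utf-8", newline="") as f:
        return READERS[suffix](f)


def proc_means(rows, column):
    """Analysis stage (loosely like PROC MEANS): summarise one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }
```

Supporting another format would only mean adding an entry to READERS, while analysis code such as proc_means(data_step("scores.csv"), "score") stays unchanged; the file and column names there are only examples.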