Other articles


  1. Pandas date parsing performance

    Dates and times provide an unlimited source of hassles for anyone working with them. In this post I'll discuss a potential performance pitfall I encountered parsing dates in pandas. Conclusion: Create DatetimeIndices by parsing data with to_datetime(my_dates, format='my_format').
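    The conclusion above can be illustrated with a minimal sketch. It shows only the recommended call, not the benchmark from the full post. The sample dates, the "%Y-%m-%d" format string and the `prices` series are hypothetical; substitute the format that matches your own data.

```python
import pandas as pd

# Hypothetical sample data: date strings that all share one known layout.
raw_dates = ["2015-01-05", "2015-01-06", "2015-01-07"]

# An explicit strftime-style format tells pandas exactly how to read each
# string, rather than leaving it to work out the layout itself.
dates = pd.to_datetime(raw_dates, format="%Y-%m-%d")

# Given a list, to_datetime returns a DatetimeIndex, which can index a Series.
prices = pd.Series([10.0, 10.5, 10.2], index=dates)

print(type(dates).__name__)  # DatetimeIndex
print(prices.loc[pd.Timestamp("2015-01-06")])  # 10.5
```

    How much an explicit format helps depends on the data and on the pandas version, so timing both variants with timeit on your own data is the reliable way to check.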


  2. Saving time and space by working with gzip and bzip2 compressed files in Python

    File compression tools like gzip and bzip2 can compress text files into a fraction of their size, often to as little as 20% of the original. Data files often come compressed to save storage space and network bandwidth. A typical workflow is to decompress the file before analysis, but it can be more convenient to leave the file in its compressed form, especially if the uncompressed file would take up a significant amount of space. In this post I'll show how to work directly with compressed files in Python.
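    A minimal sketch of the idea, using only the standard library's gzip and bz2 modules. The count_lines helper and the sample file names are hypothetical, and choosing the decompressor from the file extension is an assumption of this sketch. Opening a file in "rt" mode decompresses it on the fly as it is read, so the uncompressed data never has to be written to disk.

```python
import bz2
import gzip
import os
import tempfile


def count_lines(path):
    """Count lines in a plain, .gz or .bz2 text file without decompressing it to disk."""
    # Pick an opener from the file extension (an assumption for this sketch).
    if path.endswith(".gz"):
        opener = gzip.open
    elif path.endswith(".bz2"):
        opener = bz2.open
    else:
        opener = open
    # "rt" mode streams decompressed, decoded text one line at a time.
    with opener(path, "rt", encoding="utf-8") as f:
        return sum(1 for _ in f)


# Hypothetical sample data: 1000 short, repetitive lines that compress well.
text = "".join(f"line {i}\n" for i in range(1000))

with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "data.txt.gz")
    bz2_path = os.path.join(tmp, "data.txt.bz2")
    # "wt" mode compresses text as it is written.
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write(text)
    with bz2.open(bz2_path, "wt", encoding="utf-8") as f:
        f.write(text)
    gz_count = count_lines(gz_path)
    bz2_count = count_lines(bz2_path)
    gz_size = os.path.getsize(gz_path)

raw_size = len(text.encode("utf-8"))
print(gz_count, bz2_count)  # 1000 1000
print(f"gzip file is {gz_size / raw_size:.0%} of the original size")
```

    Because the file object is iterated line by line, memory use stays small even when the uncompressed data would be much larger than the compressed file.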


  3. Installing Python for data science

    Installing all the Python libraries required for data science can be a challenge, especially on Windows machines. Unfortunately, the same compiled code that makes the libraries fast also makes them difficult to distribute to different system types. Luckily, there are a few free options for getting up and running painlessly. I …


  4. Get a list of all English words in Python

    The nltk library for Python contains a lot of useful data in addition to its functions. One convenient data set is a list of all English words, accessible like so:

    from nltk.corpus import words
    word_list = words.words()
    # prints 236736
    print(len(word_list))
    

    You will probably first have to download …

