Python Projects (Downloads)

Bib.tXt 2018-08

Bib.tXt takes as input a .txt file in LaTeX style (including \cite commands) and a corresponding .bib file. It outputs a .txt file in which every \cite command is converted into an author-year citation, with a reference list appended at the end.
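The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the Bib.tXt source: the BIB dictionary is a hypothetical, hand-written stand-in for a parsed .bib file (with placeholder entries), and the function name convert and the "(Author Year)" output format are assumptions.

```python
import re

# Hypothetical stand-in for a parsed .bib file (placeholder entries):
# citation key -> (author surname, year, full reference text).
BIB = {
    "doe2001": ("Doe", "2001", "Doe, J. (2001). An Example Title."),
    "roe2015": ("Roe", "2015", "Roe, R. (2015). Another Example Title."),
}

# Matches \cite{key} and \cite{key1, key2}.
CITE = re.compile(r"\\cite\{([^}]*)\}")


def convert(text, bib):
    """Replace cite commands with author-year citations and append a reference list."""
    used = []  # citation keys in order of first appearance

    def repl(match):
        parts = []
        for key in (k.strip() for k in match.group(1).split(",")):
            author, year, _ = bib[key]
            if key not in used:
                used.append(key)
            parts.append(f"{author} {year}")
        return "(" + "; ".join(parts) + ")"

    body = CITE.sub(repl, text)
    # Only cited entries are listed, sorted alphabetically.
    references = sorted(bib[key][2] for key in used)
    return body + "\n\nReferences\n" + "\n".join(references)


print(convert(r"See \cite{doe2001} and \cite{roe2015, doe2001}.", BIB))
```

A real .bib file would need a proper parser, and unknown citation keys would need handling; the sketch raises a KeyError for them.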

Resource Counter 2019-09

Resource Counter is a Python script written for Kalpavriksh. It counts the resources listed at by means of the Python library Beautiful Soup. (Update: 2021-01)

Contractions and Ambiguity 2019-12

The Python code for “Contractions and Ambiguity: Grammar Based Disambiguation of English Apostrophe+S Contractions in Movie Scripts” identifies and expands seven types of contractions in the scripts of ten Quentin Tarantino movies. (The .ipynb file can be opened with the open-source web application Jupyter Notebook.)
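To give a feel for the disambiguation problem, here is a deliberately simplified sketch. It is not the grammar used in the notebook: the word lists SUBJECTS and PARTICIPLES and the function expand are hypothetical, it looks only at the word that follows the contraction, and it handles only the straight apostrophe.

```python
import re

# Hypothetical mini-lexicons; the notebook's grammar rules are richer.
SUBJECTS = {"he", "she", "it", "that", "what", "who", "there", "here", "where"}
PARTICIPLES = {"been", "got", "gone", "done", "seen", "taken"}

# A word ending in 's, with the following word captured via lookahead.
PATTERN = re.compile(r"\b(\w+)'s\b(?=\s+(\w+))")


def expand(sentence):
    """Expand apostrophe+s contractions using the next word as a cue."""

    def repl(match):
        word, next_word = match.group(1), match.group(2)
        lower = word.lower()
        if lower == "let":
            return f"{word} us"
        if lower in SUBJECTS:
            # A following past participle suggests "has"; otherwise assume "is".
            aux = "has" if next_word.lower() in PARTICIPLES else "is"
            return f"{word} {aux}"
        return match.group(0)  # probably a possessive: leave unchanged

    return PATTERN.sub(repl, sentence)


print(expand("He's been here."))
print(expand("She's tired."))
print(expand("Let's go."))
print(expand("John's car is outside."))
```

A single-word lookahead misclassifies many cases (for example, an adverb between the contraction and the participle), which is why a grammar-based approach is needed in practice.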

Title Clusters in Vector Space 2020-03

The Python script for “Title Clusters in Vector Space: Clustering of Artwork Titles via Word Embeddings” applies clustering algorithms to word embeddings to group a set of 70,000 artwork titles from “The Tate Collection” into emergent categories.

National Register of Large Dams 2020-10

The first Python script uses the Camelot library to extract tables from the PDF publication “National Register of Large Dams” (June 2019), issued by the Central Water Commission of the Government of India. The second script corrects data errors using the pandas library and regular expressions.

Ten Thousand 2021-01

The dice game “Ten Thousand” implemented as a Python script.
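Rules for Ten Thousand vary from table to table, so the following is only a sketch of how a single roll might be scored. It assumes one common rule set (each 1 scores 100, each 5 scores 50, three 1s score 1000, three of any other face score 100 times the face value) and ignores straights and other bonus combinations. The function names score and roll are hypothetical and not taken from the script.

```python
import random
from collections import Counter


def score(dice):
    """Score one roll under the assumed rule set described above."""
    total = 0
    for face, count in Counter(dice).items():
        triples, rest = divmod(count, 3)
        # Each complete triple: 1000 for ones, otherwise 100 times the face.
        total += triples * (1000 if face == 1 else face * 100)
        # Leftover single 1s and 5s still score on their own.
        if face == 1:
            total += rest * 100
        elif face == 5:
            total += rest * 50
    return total


def roll(n=6, rng=random):
    """Roll n six-sided dice; rng may be a seeded random.Random instance."""
    return [rng.randint(1, 6) for _ in range(n)]


dice = roll()
print(dice, score(dice))
```

A full game would add turn logic on top of this: setting scoring dice aside, re-rolling the rest, and losing the turn's points on a roll that scores nothing.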