The past weekend was kind of cold in the Twin Cities area of Minneapolis and St. Paul. Seems like winter arrived somewhat ahead of time.
In the past few months I have been spending time learning and experimenting with machine learning (ML) and Big Data. Machine learning seems to require a lot of properly cleaned samples. This is one more case where garbage in implies garbage out. That said, the first step is to collect data. Data can come from different sources, e.g., databases, files, public repositories, the Internet, etc. In general, one can collect data from the Internet using two main approaches: web scraping and an API. I will cover both of these approaches in the following posts. Continue reading “BeautifulSoup”
It is Sunday again; it seems like last week came and went faster than usual.
When I browse YouTube videos on my phone, I tend to run into some that I would like to watch and, if possible, experiment with the subject. This post is associated with a video by Irfan Baqui. It is nice to get a challenge, understand what is required, solve it, and see how a fellow developer comes to the same solution using a different and, in some cases, the same approach. Continue reading “Odd Occuring Number in Array”
While I was waiting for some tests to complete I checked my Gmail and found a message from HackerRank suggesting a challenge. The Equal Stacks challenge may be found under Practice > Data Structures > Stacks > Equal Stacks. I read the description for the problem and decided to tackle it using stacks; how creative of me. Continue reading “Equal Stacks”
It is Sunday morning in the Twin Cities of Minneapolis and St. Paul. I woke up around 04:30 AM and spent the next couple of hours working on Machine Learning with Big Data. It is a Coursera course. I have one more week to complete this course; so far so good. After preparing and having breakfast with my better half, I returned to my computer. Continue reading “Transform Strings”
Yesterday I was talking with a coworker about the time it takes (me) to produce a post in this blog. Towards the end of the day, after a nice walk with my wife, I developed the code for this post. My inspiration came from a YouTube video by Irfan Baqui. I am a firm believer that in order to verify you understand some subject, you need to write about it. The reason for writing is that one explains the subject to the reader. Continue reading “Queue implemented with Stacks”
Lately I have not had the time to write in this blog. For the past several months I have been getting up seven days a week, no later than 04:30 AM. I am taking a specialization on Big Data and machine learning. Loving every minute but it does not leave time at the end of the day to sit down and do something in order to be able to write a post. Continue reading “Fibonacci Sequence”
It is possible to receive a request, create a process or thread, service the request, and return to the caller the results of the operation. Many years ago, creating a process was the default approach. The issue was that creating and destroying a process when done are quite expensive operations. Continue reading “Thread Pool”
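As a rough illustration of the alternative to creating a thread per request, here is a minimal sketch of a thread pool in Python using the standard `concurrent.futures` module. The handler `service_request` and the helper `serve_all` are hypothetical names made up for this example; they are not taken from the full post.

```python
from concurrent.futures import ThreadPoolExecutor


def service_request(request_id: int) -> str:
    """Hypothetical request handler; stands in for the real work."""
    return f"request {request_id} done"


def serve_all(request_ids, max_workers: int = 4):
    """Service every request using a fixed set of reusable worker threads."""
    # The worker threads are created once by the executor and reused for
    # all requests, instead of creating and destroying one per request.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(service_request, request_ids))


if __name__ == "__main__":
    for result in serve_all(range(5)):
        print(result)
```

The pool size of four is an arbitrary choice for the sketch; a real service would tune it to the workload.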
When indexing text based on word frequency / relevance, which may be applicable to web searches, one of the procedures used is to create a term frequency (tf) array followed by an inverse document frequency (idf) one. You can read more about this here.
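To make the tf and idf idea concrete, here is a minimal sketch in Python. It assumes one common textbook variant (tf as a word's relative frequency within a document, idf as log(N / document frequency)); the function names are hypothetical and other weightings exist.

```python
import math
from collections import Counter


def term_frequencies(doc_words):
    """Relative frequency of each word within a single document."""
    counts = Counter(doc_words)
    total = len(doc_words)
    return {word: count / total for word, count in counts.items()}


def inverse_document_frequencies(docs):
    """log(N / df) for each word, where df is the number of documents containing it."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document.
        doc_freq.update(set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


if __name__ == "__main__":
    # Tiny made-up corpus, already split into lowercase words.
    docs = [["the", "cat", "sat"], ["the", "dog", "sat", "down"]]
    idf = inverse_document_frequencies(docs)
    for doc in docs:
        tf = term_frequencies(doc)
        print({word: tf[word] * idf[word] for word in tf})
```

Words that appear in every document get an idf of zero under this variant, so they contribute nothing to the product.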
In a previous post I experimented with some text in order to build hashmaps with the words of sentences (to keep things in perspective for a blog post). In that post I used a string that I copied from a course I took some years ago. The string was already preprocessed. The text had already been stripped of punctuation marks. Continue reading “More than a List of Words”
Last week I was reading a post on Medium “First Steps in Data Science with Python NumPy” by Kshitij Bajracharya.
What caught my attention is his opening statement: “I’ve read that the best way to learn something is to blog about it”. I believe Kshitij hit it right on. The reason I agree is that I have long been a believer in “If you can’t explain it simply, you don’t understand it well enough”. This quote is attributed to Albert Einstein. Continue reading “Simple Problems in Python”
Have you ever wondered how computers search for text and similar images?
For example, if you use Windows, open a File Explorer window. From top to bottom the window has the title bar, the menu bar, and the toolbar. Under the toolbar there are two text fields. The one on the left displays the full path to the current folder / directory. The one on the right displays “Search <current_folder>” e.g., “Algorithms”. I have enabled “Index Properties and File Contents” on my computer. By default when you search, Windows will only search the file names and properties, not the contents of the files. Depending on your usage, you might need to index some or all of the files in all folders on your computer. In my case, I perform searches in all types of documents. If you mostly use the Office Suite, you might enable search only on folders holding your *.docx files. The reason for this is that the mechanism uses additional disk and memory to operate. Continue reading “Vector Model and Similarity Search”