Jirka's Public Notepad

Data Engineering | Python | SQL Server | Teradata

March 14, 2020 By Jiří Hubáček

Iterative approach – Photo Of The Day App

This post uses my personal project to demonstrate the iterative approach that is an integral part of agile development. The project started small, as a minimum viable product (MVP), and evolved as I added new features and refined the existing ones.

The centre point of this post is my Photo of the Day bot. It started as a simple script that one would invoke manually and run on a local machine, and it evolved into a dockerized application that runs in the cloud and is continuously built and deployed.

Iterative thinking – Anuradha Gajanayaka

The idea goes a long way back. At the beginning, there was no clear vision of which features and technologies would be used. Over time, I picked up some technologies from my consulting projects and others out of curiosity. Either way, starting a greenfield project is a different experience from using a technology at a client. One needs to dig deeper, start from scratch, tweak the configuration, customize and troubleshoot.

The simple script evolved into a continuously built and deployed dockerized application. It not only posts a photo to Twitter but also extracts EXIF information, posts on Telegram, integrates with Amazon's S3 and ECR, and auto-generates hashtags using one of TensorFlow's basic models.

The best learning experience is applying what one has learned to something practical. I only got this far because I wanted to make something that would cheer up the days of the people around me, and my own. ❤️

From the beginning …

During my travels, I shoot lots of beautiful photos, but only a handful of them ever see the light of day. If some do, it might only be as briefly as showing them to my friends at a cafe. And that's a shame.

❤️ Prague ❤️
❤️ Düsseldorf ❤️

MVP – Minimum Viable Product

The first version of the script scanned a photo folder, picked a random photo and posted it to Twitter via the API. Within a few hours, I had a working MVP.
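The "pick a random photo from a folder" step of the MVP can be sketched with the standard library alone. This is a minimal sketch, not the bot's actual code: the accepted file extensions are an assumption, `pick_random_photo` is a hypothetical name, and the Twitter upload itself is left out.

```python
import random
from pathlib import Path

# Assumed set of photo extensions; adjust to whatever the camera produces
PHOTO_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pick_random_photo(folder, rng=random):
    """Return a random photo path found anywhere under `folder` (hypothetical helper)."""
    photos = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in PHOTO_SUFFIXES
    )
    if not photos:
        raise FileNotFoundError(f"no photos found in {folder}")
    return rng.choice(photos)
```

Passing `rng` as a parameter keeps the function easy to test with a seeded `random.Random`; the returned path would then be handed to whatever code posts to Twitter.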

Pillow

The Twitter API can only post image files smaller than 3MB, and keeping an eye on the file size manually would not be practical. In the next iteration, I used the Pillow library to keep resizing the image until its file size drops under 3MB. Pillow can also extract a file's metadata, transform images and much more. My bot extracts the date, GPS coordinates and camera information.
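Both Pillow tasks can be sketched as follows. The sketch assumes "3MB" means 3 × 1024 × 1024 bytes, re-encodes as JPEG and shrinks by 10% per round; the step size, JPEG quality and helper names are illustrative assumptions, not the bot's settings. The metadata helper reads only the base EXIF block (make, model, date); GPS data lives in a separate EXIF sub-directory and is not covered here.

```python
import io

from PIL import ExifTags, Image

MAX_BYTES = 3 * 1024 * 1024  # assumed reading of the 3MB limit


def shrink_to_limit(img, max_bytes=MAX_BYTES, step=0.9, quality=90):
    """Return JPEG bytes of `img`, scaling it down by `step` until it fits in `max_bytes`."""
    img = img.convert("RGB")  # JPEG has no alpha channel
    while True:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        # Stop once the encoded file is small enough (or the image cannot shrink further)
        if buf.tell() < max_bytes or min(img.size) <= 1:
            return buf.getvalue()
        new_size = (max(1, int(img.width * step)), max(1, int(img.height * step)))
        img = img.resize(new_size, Image.LANCZOS)


def camera_info(img):
    """Map a few base EXIF tags to human-readable names (hypothetical helper)."""
    exif = img.getexif()
    named = {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}
    return {key: named.get(key) for key in ("Make", "Model", "DateTime")}
```

Shrinking in small steps keeps as much resolution as possible; a faster variant could estimate the scale factor from the ratio of the current size to the limit.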

Telegram

Another iteration focused on rewriting the spaghetti script, made up of a few functions, into an object-oriented design. I also refactored lots of the code and added support for posting to Telegram.
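One way such an object-oriented design could look is a common `Poster` interface with one class per channel, so adding Telegram means adding a class rather than touching the rest of the script. This is a hypothetical sketch, not the bot's actual class layout; the Telegram class calls the Bot API's `sendPhoto` method through the third-party `requests` library, and the token and chat ID are placeholders.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Photo:
    path: str
    caption: str


class Poster(ABC):
    """One posting channel (Twitter, Telegram, ...) behind a common interface."""

    @abstractmethod
    def post(self, photo: Photo) -> None: ...


class TelegramPoster(Poster):
    API_URL = "https://api.telegram.org/bot{token}/{method}"

    def __init__(self, token: str, chat_id: str):
        self.token = token
        self.chat_id = chat_id

    def endpoint(self, method: str = "sendPhoto") -> str:
        return self.API_URL.format(token=self.token, method=method)

    def post(self, photo: Photo) -> None:
        import requests  # third-party HTTP client, imported only when actually posting

        with open(photo.path, "rb") as f:
            resp = requests.post(
                self.endpoint(),
                data={"chat_id": self.chat_id, "caption": photo.caption},
                files={"photo": f},  # multipart upload of the image file
                timeout=30,
            )
        resp.raise_for_status()


class PhotoOfTheDayBot:
    """Sends the same photo to every configured channel."""

    def __init__(self, posters):
        self.posters = posters

    def run(self, photo: Photo) -> None:
        for poster in self.posters:
            poster.post(photo)
```

A Twitter class would implement the same `post` method, and the bot itself never needs to know which channels exist.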

TensorFlow

Some time later, I saw someone using TensorFlow for image recognition. What better way to try out a new technology than to implement it in the bot and present the outcome as hashtags? It's 2020, and I'm definitely not going to do that manually. 😀🤦🏼‍♀️
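The tweet embedded at the end of this post shows the output format: a probability followed by each synonym of the predicted label turned into a hashtag (spaces dropped, hyphens kept). A minimal sketch of that formatting step, assuming the model already returns `(label, probability)` pairs whose labels are comma-separated synonym lists; the model call itself is not shown and the helper names are hypothetical.

```python
def label_to_hashtags(label: str) -> list[str]:
    """'koala, koala bear' -> ['#koala', '#koalabear'] (spaces dropped, hyphens kept)."""
    return ["#" + synonym.replace(" ", "") for synonym in label.split(",") if synonym.strip()]


def format_predictions(predictions, top_k=3):
    """Render the top_k (label, probability) pairs as 'Content prediction: 74% #koala ...'."""
    best = sorted(predictions, key=lambda p: p[1], reverse=True)[:top_k]
    parts = [f"{round(prob * 100)}% " + " ".join(label_to_hashtags(label)) for label, prob in best]
    return "Content prediction: " + " ".join(parts)
```

For example, `label_to_hashtags("Madagascar cat, ring-tailed lemur")` yields `["#Madagascarcat", "#ring-tailedlemur"]`, matching the hashtags in the tweet.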

Docker

Because the image recognition model was bundled with TensorFlow, I decided to containerize the application to make building and running it a more automated and consistent process. Containerizing paid off later, as it made cloud hosting a relatively simple task.

S3 – Simple Storage Service

At this point, the bot relied on image data present in its own folder structure. Such tight coupling would be impractical for a cloud-hosted scheduled task. To decouple the data from the code, the photos had to be stored somewhere independent of the code: an ideal use case for AWS S3! The bot uses the boto3 library to retrieve an image from S3 and post it. Easy!
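A sketch of the S3 retrieval with boto3, assuming photos sit under an optional key prefix in a single bucket. The helper names, extension filter and bucket layout are hypothetical; the client calls (`get_paginator("list_objects_v2")`, `download_file`) are standard boto3 S3 client methods. The client is passed in, so the functions work with `boto3.client("s3")` in production and with a stub in tests.

```python
import random
from pathlib import Path

# Assumed photo extensions
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def list_photo_keys(s3, bucket, prefix=""):
    """Collect all photo keys under `prefix`, following S3's pagination."""
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # empty pages have no "Contents"
            if obj["Key"].lower().endswith(IMAGE_SUFFIXES):
                keys.append(obj["Key"])
    return keys


def fetch_random_photo(s3, bucket, dest_dir, prefix="", rng=random):
    """Download one randomly chosen photo into `dest_dir` and return its local path."""
    keys = list_photo_keys(s3, bucket, prefix)
    if not keys:
        raise FileNotFoundError(f"no photos in s3://{bucket}/{prefix}")
    key = rng.choice(keys)
    dest = Path(dest_dir) / Path(key).name
    s3.download_file(bucket, key, str(dest))
    return dest
```

In the container, a call such as `fetch_random_photo(boto3.client("s3"), "my-photo-bucket", "/tmp")` would pick up credentials from the environment; the bucket name here is a placeholder.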

ECS – Elastic Container Service

Up to this point, running the bot was a manual process that did not bring much added value. On my way back from a conference, I sat next to an AWS architect on the plane. During our conversation, I asked whether he had an idea of how to host a dockerized application in AWS, and he recommended taking a look at ECS, namely its Fargate launch type.

The Fargate launch type allows one to run Docker images without maintaining infrastructure such as EC2 cluster node instances. It's a lightweight launch type: one specifies a Task Definition (what to run, with resource quotas) and creates a task on the cluster.
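The "Task Definition plus task" workflow can be sketched with boto3's ECS client (`register_task_definition` and `run_task`). The family name, CPU and memory sizes, role ARN and subnets below are placeholders, not the bot's real configuration; Fargate tasks use the `awsvpc` network mode, and an execution role is what lets ECS pull a private image from ECR.

```python
def fargate_task_definition(family, image, execution_role_arn, cpu="256", memory="512"):
    """Keyword arguments for ecs.register_task_definition() for a one-container Fargate task."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # the network mode Fargate tasks use
        "cpu": cpu,  # CPU units: 256 = 0.25 vCPU
        "memory": memory,  # MiB; must be a valid pairing with cpu
        "executionRoleArn": execution_role_arn,  # lets ECS pull the image from ECR
        "containerDefinitions": [{"name": family, "image": image, "essential": True}],
    }


def run_task_once(ecs, cluster, task_definition, subnets):
    """Start a single Fargate task on the cluster."""
    return ecs.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": subnets, "assignPublicIp": "ENABLED"}
        },
    )
```

Usage would be `ecs = boto3.client("ecs")`, then `ecs.register_task_definition(**fargate_task_definition(...))` once per image version, and `run_task_once(ecs, ...)` whenever the bot should run.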

Pricing is based on the requested CPU and memory resources, with rates quoted per hour. My bot runs as a scheduled task once a day for approximately two minutes, and my bill is around 0.70 USD per month.
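Because the rates are per resource and per unit of time, a short daily job adds up to very little compute time: two minutes a day over 30 days is one task-hour per month. A small estimator, assuming the caller supplies the current regional rates (none are hard-coded here), and covering compute only, not storage or data transfer:

```python
def monthly_task_hours(minutes_per_run, runs_per_month):
    """Total running time per month, in hours."""
    return minutes_per_run * runs_per_month / 60


def monthly_compute_cost(minutes_per_run, runs_per_month, vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate):
    """Compute-only estimate: task-hours x (vCPU x vCPU rate + GB x memory rate)."""
    hours = monthly_task_hours(minutes_per_run, runs_per_month)
    return hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)
```

For the bot, `monthly_task_hours(2, 30)` returns `1.0`, so the monthly compute cost equals roughly one hour of the requested vCPU and memory.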

Azure DevOps Pipelines

The last piece of the puzzle was implementing a continuous deployment process. At the time, I was part of a project using Azure DevOps pipelines for infrastructure and data platform deployment.

We had a DevOps team that did all the heavy lifting, and my team only made minor adjustments to their foundation. I wanted to get some hands-on experience, so I set up a continuous build pipeline for my bot.

My build pipeline is hooked to the bot's master branch. On every new commit, the pipeline pulls the repository content, inserts secrets (Twitter, Telegram and S3 API keys), invokes the Docker build task and pushes the image to Elastic Container Registry (ECR).
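Pushing to ECR requires a `docker login` against the registry first. Outside of a ready-made pipeline task, the credentials can be obtained with boto3's `ecr.get_authorization_token()`, whose token is a base64-encoded `AWS:<password>` string. This is an illustrative sketch, not the pipeline the author built; the helper names and repository name are hypothetical.

```python
import base64


def ecr_login_credentials(ecr):
    """Return (username, password, registry) for `docker login` from an ECR client."""
    data = ecr.get_authorization_token()["authorizationData"][0]
    # The token decodes to "AWS:<password>"
    user, password = base64.b64decode(data["authorizationToken"]).decode().split(":", 1)
    registry = data["proxyEndpoint"].removeprefix("https://")
    return user, password, registry


def image_uri(registry, repository, tag="latest"):
    """Full image name to tag and push, e.g. <registry>/<repository>:<tag>."""
    return f"{registry}/{repository}:{tag}"
```

Feeding the password to `docker login --username AWS --password-stdin <registry>` keeps it off the command line and out of build logs.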

Conclusion

Building the application was a great experience. Throughout the process, I got to learn new technologies, find ways of applying them to the problem and make them work together. Where at the beginning there was only a Python script with third-party libraries, there now stand Docker, AWS with S3, ECR and ECS, and Azure DevOps.

The bot's ECS Scheduled Task is invoked every day at 6 am CET and posts to Twitter.
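ECS scheduled tasks are backed by EventBridge (formerly CloudWatch Events) rules, whose cron expressions are evaluated in UTC. 6 am CET (UTC+1) is therefore `cron(0 5 * * ? *)`. A fixed UTC schedule like this fires at 7 am local time during summer time (CEST, UTC+2). The sketch below uses boto3's `events.put_rule` and `put_targets`; the rule name, target ID and all ARNs are placeholders, not the bot's real setup.

```python
def daily_cron_utc(hour_local, utc_offset_hours=1):
    """EventBridge cron expression (evaluated in UTC) for a daily run at hour_local."""
    return f"cron(0 {(hour_local - utc_offset_hours) % 24} * * ? *)"


def schedule_daily_task(events, rule_name, cron, cluster_arn, role_arn, task_definition_arn, subnets):
    """Create or update a scheduled rule that starts one Fargate task per trigger."""
    events.put_rule(Name=rule_name, ScheduleExpression=cron, State="ENABLED")
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "photo-of-the-day",
            "Arn": cluster_arn,  # the target is the ECS cluster
            "RoleArn": role_arn,  # role that allows EventBridge to run the task
            "EcsParameters": {
                "TaskDefinitionArn": task_definition_arn,
                "TaskCount": 1,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {"Subnets": subnets, "AssignPublicIp": "ENABLED"}
                },
            },
        }],
    )
```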

Links: GitHub, Twitter

#photoOfTheDay shot on iPhone XS, in 2019. TwitterBot (GitHub: https://t.co/MdlNzXkyJo) #iPhoneXS #tensorflow Content prediction: 74% #koala #koalabear #kangaroobear #nativebear #Phascolarctoscinereus 5% #Madagascarcat #ring-tailedlemur #Lemurcatta 1% #indri #indris #Indriind… pic.twitter.com/cmXhw44cLP

— Jiří Hubáček (@hubacekjirka) January 19, 2020


Filed Under: Uncategorized Tagged With: CICD, DevOps, Docker, ECR, ECS, python, S3, Telegram, TensorFlow, Twitter


© 2021 · Jiří Hubáček, PGP