How I Built an AI-Powered Desktop App to Automate Creative Workflows

A practical case study on using Python and AI to solve real-world business bottlenecks.

The Problem: Our Team Was Drowning in Paperwork

Let me set the scene. Our company is in the business of doing good. We help Indonesian workers who have spent years in Japan navigate the bureaucratic maze to claim their hard-earned pension funds. It's rewarding work, but a crucial part of our process was, until recently, an absolute nightmare.

The Japanese government, in its infinite wisdom, sends us these official paper receipts for every single client whose pension process is finalized. Think of them as the golden ticket, confirming everything is squared away. These documents contain all the critical data: the client's name, their pension number, payout details, and so on.

"Our job? Extract that data. Simple, right?"

Wrong. Our previous method involved a human being, a pair of eyes, and a whole lot of patience. Team members would manually read each receipt and painstakingly type the information into an Excel spreadsheet. This data would then get passed to our finance department for the next steps.

Now, imagine doing this for 500 to 800 clients every single month.

Each document took about 3–5 minutes to process. If you do the math… well, you don't have to, because I already did. For 500 documents at the slow end of that range, that's roughly 2,500 minutes, or about 41 hours of mind-numbing, soul-crushing data entry. With an 8-hour workday, we were losing a full work week every month, and more in an 800-document month. It was a bottleneck so bad, we practically needed a traffic controller for our spreadsheets. Our team was spending more time with these receipts than with their families. It was repetitive, inefficient, and, let's be honest, a one-way ticket to burnout city.
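The arithmetic behind that estimate is easy to reproduce. The snippet below is only a back-of-the-envelope sketch; the function name `manual_hours` is made up for illustration, and the inputs are the figures quoted above (500 to 800 receipts a month, 3 to 5 minutes each).

```python
def manual_hours(documents: int, minutes_per_document: float) -> float:
    """Hours of manual data entry for one monthly batch."""
    return documents * minutes_per_document / 60


for documents in (500, 800):
    fastest = manual_hours(documents, 3)
    slowest = manual_hours(documents, 5)
    print(f"{documents} receipts: {fastest:.1f} to {slowest:.1f} hours")
# 500 receipts: 25.0 to 41.7 hours
# 800 receipts: 40.0 to 66.7 hours
```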

The Tech & The Strategy: The "Aha!" Moment

One afternoon, while staring at a mountain of these receipts, I noticed something. Despite the unique data on each one, the layout was always the same. The name was always in the top left, the pension number was always in its little box, and so on. They were all printed on the same, unchanging template.

A lightbulb went off in my head. It was one of those classic "shower thoughts," except I was fully clothed and surrounded by paper.

"Why are we doing this?" I thought. "This is a job for a robot."

And just like that, the mission was clear: automate the heck out of this process. The strategy was simple. If the format never changes, we don't need a human to find the data. We just need to tell a machine where to look. The obvious tool for the job was Artificial Intelligence, specifically something that could read text from an image.

Sample of a Japanese pension receipt

The Process: Teaching a Machine to Read

My initial thought, "Let's use AI!" was the easy part. The execution required a bit of research.

The core of the problem was Optical Character Recognition (OCR). But I knew we needed something more advanced than a basic scanner app from 2010. We needed something robust, accurate, and intelligent. That's what led me to the Google Vision AI platform. It's like OCR on steroids - incredibly powerful and spookily accurate at identifying and extracting text from images.
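To make that concrete, here is a minimal sketch of asking Cloud Vision for the text in one scanned image. It is not the app's actual code. It assumes the official `google-cloud-vision` Python package is installed and that Google Cloud credentials are already configured on the machine; `detect_words` and `vertices_to_box` are hypothetical helper names chosen for this example.

```python
def vertices_to_box(vertices):
    """Collapse a list of (x, y) corner points into (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def detect_words(image_path):
    """Return a list of (text, x_min, y_min, x_max, y_max) for one image.

    Assumption: requires `pip install google-cloud-vision` and valid
    credentials. The import is kept inside the function so the rest of
    this sketch can be read and run without the package.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as handle:
        image = vision.Image(content=handle.read())

    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)

    words = []
    # The first annotation holds the full page text; the rest are single words.
    for annotation in response.text_annotations[1:]:
        corners = [(v.x, v.y) for v in annotation.bounding_poly.vertices]
        words.append((annotation.description, *vertices_to_box(corners)))
    return words
```

Each word comes back with the corners of its bounding box, which is what makes a fixed-layout document such a good fit: position alone tells you which field a word belongs to.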

Here's the breakdown of how the solution works:

  1. Defining the Template: First, I took one of the receipts and mapped it out. I identified the specific coordinates - the rectangular "region" - for each piece of data we needed. For example, Name: {x: 50, y: 100, width: 200, height: 30}. I essentially created a treasure map for the AI.
Defining data regions on the receipt
  2. Building the App: I developed a simple desktop application using Python. The user interface is minimalist: "Select Scanned Receipts," "Choose Save Location," and a big, beautiful "Run" button.
  3. The Magic Workflow:
    • The user scans a whole batch of receipts into a folder as images (JPG or PNG).
    • They open the app and select that folder.
    • They hit "Run."
    • The app iterates through each image file. For every image, it sends a request to the Google Vision AI API, along with the predefined "regions" for the data we want.
    • The AI reads only the text within those specific areas on the image.
    • It sends the extracted text back to our application almost instantly.
    • The application neatly organizes the data (Name, Pension Number, etc.) and appends it as a new row in a CSV file. CSVs are great because they're lightweight and can be opened by virtually any spreadsheet program, like Excel or Google Sheets.
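The "treasure map" from step 1 can be sketched in a few lines of Python. This is an illustration under assumptions, not the app's real code: it assumes the OCR step has already returned each word with its bounding box and applies the regions locally. Only the `name` coordinates come from the example above; the `pension_number` region, the sample words, and the names `Region`, `TEMPLATE`, `text_in_region`, and `extract_fields` are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: float, py: float) -> bool:
        """True if the point lies inside this rectangle."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


TEMPLATE = {
    "name": Region(x=50, y=100, width=200, height=30),            # from step 1
    "pension_number": Region(x=50, y=150, width=200, height=30),  # hypothetical
}


def text_in_region(words, region):
    """Join, left to right, the words whose box centre falls inside the region.

    `words` is an iterable of (text, x_min, y_min, x_max, y_max).
    """
    hits = []
    for text, x0, y0, x1, y1 in words:
        if region.contains((x0 + x1) / 2, (y0 + y1) / 2):
            hits.append((x0, text))
    return " ".join(text for _, text in sorted(hits))


def extract_fields(words, template=TEMPLATE):
    """Map every field in the template to the text found in its region."""
    return {field: text_in_region(words, region)
            for field, region in template.items()}


# Invented sample data standing in for real OCR output.
sample_words = [
    ("SANTOSO", 105, 105, 180, 125),
    ("BUDI", 55, 105, 100, 125),
    ("1234-567890", 55, 155, 160, 175),
    ("RECEIPT", 300, 20, 400, 40),   # outside every region, so ignored
]
print(extract_fields(sample_words))
# {'name': 'BUDI SANTOSO', 'pension_number': '1234-567890'}
```

Testing the centre of each word box rather than its corners is a deliberate choice here: a scan that is shifted by a few pixels still lands the word in the right field.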

The entire process is designed to be as hands-off as possible. The user just tells the app where the files are and watches the magic happen.
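The batch loop itself needs very little code. Below is a minimal sketch under stated assumptions: the column names, the helper names (`find_receipts`, `append_row`, `run_batch`), and the stand-in `fake_extract` function are hypothetical, and the OCR step is passed in as a plain function so the example runs without any external service.

```python
import csv
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}
COLUMNS = ["file", "name", "pension_number"]  # hypothetical column set


def find_receipts(folder):
    """Return the scanned images in a folder, in a stable order."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in IMAGE_SUFFIXES)


def append_row(csv_path, row):
    """Append one receipt as a CSV row, writing the header if the file is new."""
    csv_path = Path(csv_path)
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def run_batch(folder, csv_path, extract):
    """Process every image in `folder`; `extract` maps an image path to a dict."""
    processed = 0
    for image_path in find_receipts(folder):
        fields = extract(image_path)
        append_row(csv_path, {"file": image_path.name, **fields})
        processed += 1
    return processed


# Usage with a stand-in extractor and empty placeholder files.
def fake_extract(image_path):
    return {"name": f"name from {image_path.stem}", "pension_number": "0000-000000"}


with tempfile.TemporaryDirectory() as tmp:
    scans = Path(tmp) / "scans"
    scans.mkdir()
    for filename in ("receipt_001.jpg", "receipt_002.PNG", "notes.txt"):
        (scans / filename).write_bytes(b"")
    output = Path(tmp) / "receipts.csv"
    processed = run_batch(scans, output, fake_extract)
    with output.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

print(processed, [row["file"] for row in rows])
```

Appending one row per receipt, rather than writing the file once at the end, means a crash halfway through a batch still leaves every finished receipt on disk.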

Desktop App UI

The Outcome: From 5 Days to 7 Minutes

So, what did this little experiment achieve?

Let's go back to our original benchmark. Manually processing 500 documents took about 41 hours, which is roughly 5 full workdays for one person.

With the new AI-powered app, processing those same 500 documents now takes about 7 minutes.

No, that's not a typo. Seven. Minutes.

The app processes the entire batch at once, firing off requests and compiling the data faster than you can brew a pot of coffee.

Now, you might be asking, "Is it 100% accurate?"

Of course not. And anyone who tells you their AI system is 100% perfect is probably trying to sell you something. AI, even Google's, can occasionally misread a character or get confused by a smudge on the paper.

But here's the beauty of it. Instead of spending 41 hours on manual data entry, our team now spends about 30–60 minutes verifying the AI's output. They open the CSV file, place it side-by-side with the scanned receipts, and quickly eyeball the data for any obvious errors. It's infinitely faster to proofread than to type from scratch.
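Putting the before and after side by side makes the trade-off explicit. This is only arithmetic on the figures quoted in this article (about 2,500 minutes manually, about 7 minutes of processing, 30 to 60 minutes of review); the names are illustrative.

```python
MANUAL_MINUTES = 2500        # 500 receipts at about 5 minutes each
APP_MINUTES = 7              # automated extraction for the same batch
REVIEW_MINUTES = (30, 60)    # human verification, best and worst case


def total_minutes(review_minutes: int) -> int:
    """End-to-end time for one batch: extraction plus human review."""
    return APP_MINUTES + review_minutes


for review in REVIEW_MINUTES:
    total = total_minutes(review)
    speedup = MANUAL_MINUTES / total
    print(f"{review} min of review: {total} min in total, about {speedup:.0f}x faster")
```

Even in the worst case, with a full hour of checking, the whole job fits into a little over an hour.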

We traded five days of tedious labor for one hour of quality assurance. I'd call that a win. Our team is happier, our process is ridiculously efficient, and we can now focus our energy on what actually matters: helping our clients. All thanks to a little bit of code and a "lazy" idea.