Media Artificial Intelligence

Creating Next-Generation AI Media Tool for Converting Language into Text

For our client in the media and AI industry, our team developed a solution that makes journalists’ work easier and more accurate.

About the client

Aiconix is an IT startup that aims to build a bridge between two worlds: one old – media, and one new – artificial intelligence. One of the goals on that path was to create a next-generation media tool for efficiently converting language into text, indexing and analyzing image content, and extracting data from videos.

About the project

The work on this project went through two phases.

In the first phase, our client needed an MVP (minimum viable product) to use in presentations and demos when pitching to investors. The MVP consisted of integrations with the most significant AI image processing providers, such as IBM, Amazon, Google and Microsoft. At this stage, the client needed a simple integration with these providers, a web application for administration and configuration, and separate cross-platform desktop applications for customers. Besides image processing, the MVP included speech-to-text processing with the same providers, using the platform we created.

The second phase began once the MVP was completed: we continued working toward a scalable, production-ready solution, which is still being continuously improved. The desktop application was discarded as obsolete, so we migrated the whole solution to Alongside image and audio processing, we also implemented video metadata analysis and text processing, as well as text translation.

The final outcome

Because this project is still ongoing, the final result is yet to be seen. Currently, the solution we developed for this client is used by media companies to significantly improve the speed and quality of their journalists’ work. The solution transcribes audio files with the lowest error rate among competing solutions. In addition, the Suggestion Service acts as an expert advisor, recommending images, videos and tweets for articles that are about to be published. Forbes recognizes this project as one of the top 25 startups in the DACH area. It has been released to production on Oracle Cloud, and its microservice architecture is distributed across more than 40 VMs.

Client's feedback

KoloTree – a software development service provider – who help us with the front- and backend development of our aingine platform. They are doing such a great job so we thought it’s about time to give them a little shout-out and say thank you for their work!

Our side of the story

The requirements we encountered in this project were new to our team, so we faced a few challenges.

The biggest challenge for us was to implement an algorithm that combines results from multiple AI providers in real time. We overcame it by applying functional and reactive programming principles when manipulating media streams.
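To make the idea of combining provider results more concrete, here is a minimal sketch in plain Java. It is not the project's actual algorithm: it assumes each provider returns a transcript with a confidence score and simply keeps the most confident one, and it uses the standard library's CompletableFuture as a stand-in for the reactive libraries used in the project. The names ProviderCombiner, Result and bestOf are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: query several providers concurrently, keep the best answer.
final class ProviderCombiner {

    /** Hypothetical result type: one provider's transcript with a confidence in [0, 1]. */
    public static final class Result {
        public final String provider;
        public final String text;
        public final double confidence;

        public Result(String provider, String text, double confidence) {
            this.provider = provider;
            this.text = text;
            this.confidence = confidence;
        }
    }

    /**
     * Calls every provider asynchronously and completes with the most confident
     * result, or with an empty Optional if every provider failed.
     */
    public static CompletableFuture<Optional<Result>> bestOf(
            List<Function<byte[], Result>> providers, byte[] audioChunk) {

        List<CompletableFuture<Optional<Result>>> calls = providers.stream()
                .map(p -> CompletableFuture
                        .supplyAsync(() -> Optional.of(p.apply(audioChunk)))
                        // A failing provider contributes nothing instead of failing the whole call.
                        .exceptionally(err -> Optional.<Result>empty()))
                .collect(Collectors.toList());

        return CompletableFuture
                .allOf(calls.toArray(new CompletableFuture[0]))
                .thenApply(done -> calls.stream()
                        .map(CompletableFuture::join) // safe: all calls have completed
                        .filter(Optional::isPresent)
                        .map(Optional::get)
                        .max(Comparator.comparingDouble(r -> r.confidence)));
    }
}
```

Each provider is modeled as a plain function, so the combining step stays a pure transformation over completed results. A production pipeline would more likely merge per-segment results on a stream rather than pick one transcript per chunk.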

Another challenge was handling streams of different audio and video formats, which we overcame by following established programming practices, in particular Domain-Driven Design and Test-Driven Development.

Working on this project has given us an opportunity to develop an outstanding solution using a large variety of technologies.

Key features

  • Creating SRT subtitles and transcriptions from audio files with higher quality than Google, Microsoft, Amazon or any other AI service
  • Extracting metadata from videos and images
  • Suggesting video files and tweets that journalists can embed in an article, based on the text they are writing
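As an illustration of the SRT output mentioned in the first feature, the sketch below renders timed transcript segments in the SubRip (.srt) layout: a running index, a start and end timestamp in the form HH:MM:SS,mmm, the text, and a blank line. The class and method names (SrtWriter, Cue, toSrt) are hypothetical and are not taken from the actual platform.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: render timed transcript segments as SubRip (.srt) text.
final class SrtWriter {

    /** Hypothetical cue type: one subtitle with start and end offsets in milliseconds. */
    public static final class Cue {
        public final long startMs;
        public final long endMs;
        public final String text;

        public Cue(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }
    }

    /** Formats a millisecond offset as HH:MM:SS,mmm (SRT uses a comma before the milliseconds). */
    static String timestamp(long ms) {
        long hours = ms / 3_600_000;
        long minutes = (ms / 60_000) % 60;
        long seconds = (ms / 1_000) % 60;
        long millis = ms % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, seconds, millis);
    }

    /** Renders the cues as numbered SRT blocks, each followed by a blank line. */
    public static String toSrt(List<Cue> cues) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < cues.size(); i++) {
            Cue cue = cues.get(i);
            out.append(i + 1).append('\n')
               .append(timestamp(cue.startMs)).append(" --> ").append(timestamp(cue.endMs)).append('\n')
               .append(cue.text).append("\n\n");
        }
        return out.toString();
    }
}
```

The cue timings would come from whichever speech-to-text result is selected; the sketch covers only the final formatting step.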


Media, Artificial Intelligence


Hamburg, Germany


  • For backend: Java 8+, Spring Framework, Vavr functional extension for Java, Reactor project for Java, Akka
  • UI:, Java FX
  • Messaging system: Apache Kafka
  • Database: MongoDB and MySQL

Team size

~20 members


Cooperation: 2018 – ongoing