Customer Success Story: The Cross Product

About the company

The Cross Product (TCP) is a SaaS company that offers automated tools to analyze 3D data from large-scale linear infrastructure such as railways, highways, and power lines. Our classification, modeling, and reverse-engineering tools provide efficient ways of managing such infrastructure, with applications ranging from risk assessment and asset management to intervention planning and more. Typical computations on this data require dedicated instances due to high memory requirements and can last up to several days. Given our targeted market, we don't expect a high volume of requests, so we don't need a highly scalable solution. However, it is critical that computations run correctly, that user data remains available, and that its storage cost is optimized.

Solution

We decided to run our own orchestrator, reachable through an API. We have divided our architecture into several services: authentication, licensing, data management, and computation. Users have access to a restricted set of applications dictated by their license. They can upload, download, and manage their data in their own user space in Object Storage. Once data is uploaded, a computation can be started with a request to our computation service, which records the requested computation in a PostgreSQL table. Each computation is a state machine; states typically include booting an instance, downloading the input data, running one or more Docker containers, and uploading the results back to Object Storage.

We implemented a producer/consumer model to manage the computations, built on Scaleway NATS JetStream. Each state is produced by the computation service and published to NATS, and an asynchronous process consumes the task and executes the corresponding commands on the instance over SSH. Once a task is completed, the instance informs the computation service, which in turn publishes the next state. The termination of a computation triggers an email via Transactional Email. For now, the services are merged into one monolithic container deployed on Serverless Containers. We plan to switch to Kubernetes for a more versatile, production-ready deployment.
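The produce/consume loop above can be sketched in a few lines of Python. This is a minimal, self-contained illustration using an in-process queue and hypothetical state names; in the real system the queue is NATS JetStream, the produced state is also recorded in PostgreSQL, and the consumer runs the commands on the instance over SSH.

```python
from enum import Enum
from queue import Queue

class State(Enum):
    BOOT_INSTANCE = "boot_instance"
    DOWNLOAD_INPUT = "download_input"
    RUN_CONTAINERS = "run_containers"
    UPLOAD_OUTPUT = "upload_output"
    DONE = "done"

# Linear transition table: completing one state triggers the next.
NEXT_STATE = {
    State.BOOT_INSTANCE: State.DOWNLOAD_INPUT,
    State.DOWNLOAD_INPUT: State.RUN_CONTAINERS,
    State.RUN_CONTAINERS: State.UPLOAD_OUTPUT,
    State.UPLOAD_OUTPUT: State.DONE,
}

def produce(queue: Queue, state: State) -> None:
    """Computation service: publish the next state as a task.
    (Stands in for a PostgreSQL update plus a JetStream publish.)"""
    queue.put(state)

def consume(queue: Queue, executed: list) -> State:
    """Asynchronous worker: pop a task and execute it.
    (Stands in for a JetStream consumer issuing commands over SSH.)"""
    state = queue.get()
    executed.append(state)  # here the actual commands would run
    return state

def run_computation() -> list:
    """Drive one computation from request to completion."""
    queue, executed = Queue(), []
    produce(queue, State.BOOT_INSTANCE)  # request received: start the machine
    while True:
        done = consume(queue, executed)
        if done is State.DONE:
            return executed  # this is where the completion email fires
        produce(queue, NEXT_STATE[done])  # instance reported back: next state
```

The one-state-at-a-time round trip is what lets the computation service keep the authoritative record of progress in PostgreSQL between steps.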

Having our own orchestrator allows us to finely tune the lifetime of instances and significantly reduce costs. We can also predict the required memory and number of cores according to the application and its inputs. We initially chose Scaleway because it is European and offers environmentally friendly services. We were pleasantly surprised by how many tools could be quickly spun up and used in a production-ready environment, which greatly helped us design our final architecture (especially NATS JetStream, Serverless Containers, and PostgreSQL).
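Predicting instance size per application can be as simple as a per-application baseline plus a scaling rule on input size. The function below is purely illustrative: the application names, coefficients, and instance sizes are made up, not TCP's actual model, which would be fit on past runs.

```python
# Illustrative per-application requirements:
# app name -> (base RAM in GB, cores, extra GB of RAM per GB of input).
BASE_REQUIREMENTS = {
    "classification": (16, 8, 2.0),
    "modeling": (32, 16, 4.0),
}

# Hypothetical RAM sizes of the instance types on offer, in GB.
INSTANCE_SIZES_GB = [16, 32, 64, 128, 256, 512]

def predict_instance(app: str, input_gb: float) -> tuple:
    """Return (ram_gb, cores) to request for a computation."""
    base_ram, cores, ram_per_gb = BASE_REQUIREMENTS[app]
    needed = base_ram + ram_per_gb * input_gb
    # Pick the smallest instance size that satisfies the estimate.
    ram_gb = next(s for s in INSTANCE_SIZES_GB if s >= needed)
    return ram_gb, cores
```

Because the orchestrator boots the instance itself, the predicted size can be passed directly to the instance-creation call, so each computation pays only for the capacity it actually needs.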


Results

In the last 6 months:

  • 800 new kilometers of infrastructure processed by our platform
  • 80 000 tokens used by our clients
  • 10 new corporate customers
  • 160k€ in revenue, of which 80k€ is recurring

Future plans

We are in the process of migrating from Serverless Containers to Kubernetes + Terraform. This will ensure that our dev and staging environments closely mirror production, and we will also benefit from a fully declarative architecture.
Looking ahead, we are considering making greater use of Scaleway GPUs. Initially, through the end of 2024, this would likely mean using GPU Instances for training, possibly inference, and other vectorized computations (filtering, clustering, etc.). We do not have a large-scale model yet, but if the need arises in the near future, we would consider leveraging Scaleway's GPU clusters.