Jon Krohn: 00:00:00
This is episode number 673 with Vincent Gosselin, CEO and Co-Founder of Taipy. Today’s episode is brought to you by Pathway, the reactive data processing framework, and by Posit, the open-source data science company.
00:00:18
Welcome to the SuperDataScience Podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
00:00:48
Welcome back to the SuperDataScience Podcast. I’m delighted to be joined today by the mind-bogglingly kind and deeply technical entrepreneur Vincent Gosselin. Vincent is CEO and Co-Founder of Taipy, an open-source Python library that works up and down the stack to both easily build web applications as well as back-end data pipelines. Having obtained his master’s from the prestigious Université Paris-Saclay in Computer Science and Artificial Intelligence in 1987, he’s been working with AI models for nearly 40 years, amassing a wealth of experience across a broad range of industries: semiconductors, finance, aerospace, and logistics, to name a few. He’s held roles including Director of Software Development at ILOG, Director of Advanced Analytics at IBM, and VP of Advanced Analytics at DecisionBrain. Today’s episode will appeal primarily to hands-on practitioners who are keen to hear how they can accelerate their productivity in Python, whether it’s on the front end to build a data-driven web application or on the back end to have scalable, reusable, and maintainable data pipelines.
00:01:49
That said, Vincent’s breadth of wisdom, honed over his decades-long AI career, may prove fascinating and informative to technical and non-technical listeners alike. In this episode, Vincent details the critical gaps in Python development that led him to create the open-source Taipy library, how much potential there is for data pipeline engineering to be improved, how shifting toward lower-code environments can accelerate Python development without sacrificing any flexibility, and the 50-year-old programming language that was designed for AI and that he was nostalgic for until Python emerged on the scene. Alright, you ready for this fascinating episode? Let’s go.
00:02:33
Vincent, welcome to the SuperDataScience Podcast. It’s great to have you here. Where in the world are you calling in from?
Vincent Gosselin: 00:02:39
Thanks Jon for having me. I’m coming from a place called Plateau de Saclay, which is just south of Paris, so that’s in France.
Jon Krohn: 00:02:48
Nice. So you are the CEO and Co-Founder of Taipy, which is an open-source Python platform. It’s a data-driven web application builder. Can you elaborate for us on what Taipy does?
Vincent Gosselin: 00:03:04
Okay, so let me give you a bit of perspective. I’ve worked for many years building applications in the AI world for large companies. We have done that for companies like the container terminals in Singapore, companies in the semiconductor business like Samsung and TSMC, McDonald’s, and so on and so forth, Disney for manpower planning, and so on. So large companies. We have done that for a long time in Java and Scala, and earlier on in C++. And these were heavy projects, and, you know, we really wanted to do all of that in Python, everything, I mean, all the different functions, front end, backend. And since we love Python, we said, well, let’s do that. And we realized there were some issues, one on the backend first, for the kind of applications we had in mind, which are mostly, you know, AI applications with end users, end users using the software, smart or not-so-smart software.
00:04:17
But there, there is a graphical interface, which is very important here. So on the graphical side, we were not happy with what Python was proposing in its ecosystem in those days, a few years back. We looked at several packages and we couldn’t find what we needed in terms of having a powerful graphical interface builder where you can really easily design, you know, graphics, charts, and pages, with a lot of interactions, multi-user support and so on, and easy to use. That’s the important thing. We wanted any Python developer to be able to develop graphical interfaces. The second component was on the backend. As you know, software is not only about user interfaces, it’s also about the backend. And we looked around and there were tools in the Python ecosystem also, but we found them either extremely complex, or for what we wanted, we would need like three or four of these packages. So we decided to build [inaudible 00:05:26] Taipy around these two packages, Taipy GUI and Taipy Core. That’s how it started. This is the origin of the product.
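To make this concrete, a minimal sketch of the page-definition style Vincent describes might look like the following, assuming Taipy GUI’s augmented-Markdown element syntax; the variable name, page content, and slider bounds are illustrative rather than taken from the episode:

```python
# A minimal, hypothetical Taipy GUI page: Python variables are bound into an
# augmented-Markdown string and rendered as a web application.
# Assumes `pip install taipy` and the <|{variable}|control|...|> element syntax.
from taipy.gui import Gui

forecast_horizon = 7  # days; bound to the slider below

page = """
# Demand Forecast

Forecast horizon (days): <|{forecast_horizon}|slider|min=1|max=30|>

You selected a horizon of <|{forecast_horizon}|> days.
"""

if __name__ == "__main__":
    Gui(page).run()  # starts a local web server and serves the page
```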
Jon Krohn: 00:05:41
Right. So there are these two, can we call them two separate products, GUI and Core?
Vincent Gosselin: 00:05:46
So, initially, they have been designed as two separate packages. So, when you do a pip install Taipy, for instance, you do get both packages if you want, and you can use one or the other. There are no connections, if you want, between them. But this is not what will happen soon. In the next release, we are going to link the two. For instance, our graphical components are basically built on top of Plotly, and there is the augmented Markdown capability that we provide to really build these interfaces quickly. On the backend we do provide pipelines. We have a great pipeline editor. We are the only tool to provide this functionality of quickly building your pipeline graphically, not only with tasks but also data, you know, the data is modeled and all the flows and so on.
00:06:47
And we have this concept, a very important concept called scenarios. Scenarios are really about being able to run your pipelines with different parameters and to keep these runs, these executions, and be able to compare them. You can compare two scenarios, you can compare them over time. But in the next release what we want is these two components to come together. So you will be able to graphically see all the scenarios, for instance, that have occurred in the past two weeks, select them, and visualize the data nodes automatically. And all of this is like a single graphical component that visualizes your backend components. So this is where we are going and that will be available in June for our next release.
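As a rough illustration of those concepts, here is a hedged sketch of how data nodes, a task, a pipeline, and a scenario might be wired together in the Taipy Core of that era; the ids, the forecasting function, and the exact keyword arguments are assumptions based on the Taipy documentation rather than anything stated in the episode:

```python
# Hypothetical Taipy Core configuration: data nodes feed a task, the task forms
# a pipeline, and the pipeline is wrapped in a scenario that can be re-run and
# compared with different parameters. All ids and the function are illustrative.
from taipy import Config


def forecast(history, horizon):
    # Placeholder for the real forecasting logic (e.g. a trained model).
    return history.tail(horizon)


# Data nodes: one pointing at external data, one acting as a user parameter,
# and one output node whose content Taipy stores for every run.
history_cfg = Config.configure_data_node("sales_history", storage_type="csv",
                                         default_path="sales_history.csv")
horizon_cfg = Config.configure_data_node("horizon", default_data=7)
predictions_cfg = Config.configure_data_node("predictions")

# A task is just a Python function plus its input and output data nodes.
forecast_task_cfg = Config.configure_task("forecast_task", forecast,
                                          input=[history_cfg, horizon_cfg],
                                          output=predictions_cfg)

# Pipelines group tasks; scenarios wrap pipelines and carry the parameters
# that end users will vary from run to run.
pipeline_cfg = Config.configure_pipeline("forecast_pipeline", [forecast_task_cfg])
scenario_cfg = Config.configure_scenario("weekly_forecast", [pipeline_cfg])
```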
Jon Krohn: 00:07:39
Nice. That sounds really exciting. So I understand the GUI component. It seems very easy for me to understand the Taipy GUI component. It’s difficult, or historically was difficult, to build graphical interfaces easily in Python, and so Taipy GUI offers the solution. For Taipy Core, could you elaborate for me a bit more on what the problem was and how Taipy Core is a solution? Like, you were talking about having to use three or four different applications previously to achieve the same objective, but maybe give us a specific use case of the way that things were before Taipy Core and how that same endpoint can now be reached much more easily with Taipy Core.
Vincent Gosselin: 00:08:36
Sure. So the first thing is, Taipy is all about pipelines, about building pipelines. And our objective is not only to cater for the needs of people who are already building pipelines, but we also want to bring people who are not using pipelines to use pipelines. A pipeline is also a kind of methodology to build your algorithms. We are still seeing a lot of Python developers, you know, involved with some kind of spaghetti programming, which should be solved with pipelines. For instance, in pipelines in Taipy Core you can do caching. So if one of your tasks has not changed, or its inputs have not changed, there’s no reason to run it again. So pipelines bring a lot of facility for doing that. The other aspect is also to make sure this corresponds to best practices.
00:09:39
When you build your pipelines, for instance, in machine learning, you really have to build your main algorithm, but you need to have, you know, what is called, I forget the name, like having the canary in the mine, you know, where you can have a second or third algorithm to compare with. So there are a few things relating to this. So first, again, it’s all about the ease of building these pipelines. We don’t want people to be struggling to build simple or complex pipelines. Hence this graphical editor, which is an extension of Visual Studio Code and allows anybody really to build pipelines where you have predefined data nodes that can read external data. A data node can even be your own parameter, a Python object that gets modified through the graphical interface, for instance, and you produce output data nodes; the task will do that.
00:10:44
And this data node, this output data node, will have to be stored and kept and so on. And the thing that we couldn’t find is really the concept of scenario. In the world of machine learning, scenarios would be referred to as experiments, I suppose, but we want to go way beyond just the realm of data science here. We want to look at the complete application. So in that sense, you know, most data algorithms will end up in the hands of end users, where you have graphics and they will be using your pipelines and they will be modifying these pipelines. How? By parameters. We have an example, for instance, for [inaudible 00:11:27] Marché, one of the largest retailers in Europe. When you deal with projections for the stores or the cashflow for the whole company, the Covid period is actually something for which no one had any data.
00:11:43
So it was really data that only the end user could inject into the pipelines to really look at different kinds of scenarios. So this capability to build scenarios is something really unique. We couldn’t find this anywhere in any of the existing tools. So that’s the heart of Taipy Core. And if you look at the other software, of course there are packages like Airflow, for instance. Airflow is really covering some of what we are doing, but it’s quite complicated. There is a big learning curve involved. There are some other tools, like Prefect, for instance, which is nice software, but it’s lacking this great editor. There’s no concept [inaudible 00:12:37] scenarios and this kind of thing. So again, it’s all about having the capability to build these pipelines easily, to track every single run, to be able to consider things like business cycles.
00:12:58
So that’s from our background. We come from, we work with big companies. So when you work for McDonald’s, for instance, and you do a forecast for each of their stores, every week they need to produce a forecast. So it’s kind of a weekly cycle. Another situation could be a monthly cycle or a daily cycle. So all the scenarios you run need to be sorted per bucket, a time bucket, a week, a day, a month, whatever. So all of this is kind of pre-built and, again, really easy to use to come up with your backend.
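Staying with the hypothetical configuration sketched earlier, the caching and business cycles Vincent describes would look roughly like this in Taipy Core; the `skippable` flag and the `Frequency` enum reflect my reading of the Taipy documentation and may differ between versions:

```python
# Hedged sketch, building on the earlier illustrative configuration: mark a task
# as skippable so Taipy can skip re-running it when its inputs are unchanged
# (the caching Vincent mentions), and attach a weekly frequency so every
# scenario is filed into a weekly time bucket (a cycle).
from taipy import Config
from taipy.config import Frequency

forecast_task_cfg = Config.configure_task("forecast_task", forecast,
                                          input=[history_cfg, horizon_cfg],
                                          output=predictions_cfg,
                                          skippable=True)  # reuse cached output

scenario_cfg = Config.configure_scenario("weekly_forecast", [pipeline_cfg],
                                         frequency=Frequency.WEEKLY)
```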
Jon Krohn: 00:14:08
Are you moving from batch to real-time? Pathway makes real-time machine learning and data processing simple. Run your pipeline in Python or SQL in the same manner as you would for batch processing. With Pathway, it will work as-is in streaming mode. Pathway will handle all of the data updates for you automatically. The free and source-available solution, based on a powerful Rust engine, ensures consistency at all times. Pathway makes it simple to enrich your data, create and process machine learning features, and draw conclusions quickly. All developers can access the enterprise-proven technology for free at pathway.com — check it out!
00:14:11
Nice. Okay, so to try to summarize back for you, what I now think I understand really well about this product, about this component of Taipy. So the Core component allows you to build pipelines easily. So, even people who haven’t felt like they could do pipelining before, they can now understand this great way of organizing your projects. So you don’t have, as you described it, spaghetti code linking everything together. So, you can have clearly defined nodes, clearly defined dataflows into and out of your analytics processes, your machine learning models, maybe some combination of those things because you could have some data inflowing, then some pre-processing steps, then maybe it goes through a machine learning model and then flows into some analytics, so that a report can be created, say on a daily cadence or a weekly cadence. And that whole flow can be captured as a well organized scenario.
Vincent Gosselin: 00:15:10
Absolutely. And for us, we consider that we have two kinds of users. So, the data scientist can of course program all of this, but he’s also a user. He can also build a graphical interface for his own pipeline and use that.
Jon Krohn: 00:15:28
A click-and-point user.
Vincent Gosselin: 00:15:29
Exactly. And you have, of course, the end user, the person who doesn’t know how to program in Python, who will be using this. But he also needs to run these scenarios to do some variation, to start a forecast with different values. For instance, the parameter for the forecast could be the date from which you want to start the forecast. You may want to try different values. So all of these are also scenarios.
Jon Krohn: 00:15:57
Right, right, right. So the, so you could potentially have a Python developer, a data scientist who is programming up a Taipy Core scenario, and then an end user at a specific McDonald’s location could take that scenario and rerun it on a different day of the week or with some different kind of input parameter.
Vincent Gosselin: 00:16:20
Exactly. And that’s creating more scenarios. And usually at the end of the day, you will pick one of these scenarios as an end user and say, you know, that’s the scenario that I want to be published, to be official.
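To ground that exchange, here is a hedged sketch of what this end-user what-if loop could look like against the illustrative configuration from earlier; `create_scenario`, `submit`, and `set_primary` follow my understanding of the Taipy Core API, and the data-node names are the made-up ones from the earlier sketch:

```python
# Hypothetical what-if loop: create two scenarios from the same configuration,
# vary a parameter, run both, compare the outputs, and mark the chosen one as
# the "official" (primary) scenario. Builds on scenario_cfg defined earlier.
import taipy as tp

tp.Core().run()  # start the Core service (required in recent Taipy versions)

baseline = tp.create_scenario(scenario_cfg, name="baseline")
what_if = tp.create_scenario(scenario_cfg, name="longer_horizon")

what_if.horizon.write(14)        # the end user changes a parameter
tp.submit(baseline)
tp.submit(what_if)

print(baseline.predictions.read())
print(what_if.predictions.read())

tp.set_primary(what_if)          # publish the chosen scenario as official
```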
Jon Krohn: 00:16:33
Right, right, right, right. Cool. All right, now I understand. That sounds great. And then coming up in June, and I think we’ll talk about this more later in the episode, these two key components of the Taipy software library, this open-source library Taipy, will change: we historically had the GUI part and the Core part being separate, but coming up soon, those will be blended together. And so, for example, you’ll be able to do things like visualize in the GUI component the way that your dataflows are set up within Core.
Vincent Gosselin: 00:17:13
Yeah. So, basically dataflows are quite simple. You model the data; this is modeled as data nodes and tasks, and that makes your pipeline. So yeah, and you can have several pipelines modeled. These will be run, and each run will happen with different values of your parameters. And this is what we call scenarios. So basically it’s four concepts: data nodes, tasks, pipelines, and scenarios. All of this can be visualized. So if I select a scenario, I want to know, you know, what are the data nodes? What data is inside each data node? So graphically you can have a selector of scenarios. So automatically you have all your scenarios appearing. You can select one and it’ll show, you know, what the data nodes are, and you can click on one of these data nodes and have a view of that data as a table, or as a chart or charts. So all of this will be kind of ready to use within the Taipy graphical objects, where you can not only visualize your own stuff, but also the backend itself. So that will make the programming and the development of the full application even faster.
Jon Krohn: 00:18:38
Nice.
Vincent Gosselin: 00:18:38
So, you bring basically the two elements together.
Jon Krohn: 00:18:41
And to dig into those. So we’ve talked about scenarios a fair bit now. To dig into the data nodes and the tasks a little bit more, I’m kind of getting the impression, and you can correct me on what I’m getting right and what I’m getting wrong about these, but maybe the data nodes are like the data structures, so it could be a table of data, whereas the tasks, these do work, these are more like the verbs. So the data nodes are like the nouns and the tasks are like the verbs doing work on those nouns.
Vincent Gosselin: 00:19:14
Yes, exactly. And to be even more concrete, there are predefined data nodes: a CSV data node, a JSON data node, a Parquet data node, and so on. A data node may not only be a pointer to some external data; it could also be a Python object, like, you know, a date that is entered through the graphical interface, that’s also a data node. And you have the [inaudible 00:19:43], the output data nodes, which are created by your tasks. So the task takes as input the input data nodes and as output the output data nodes, and the output data nodes will actually store the data inside each scenario. So these are data generated and kept by Taipy Core, and the task is purely a Python function.
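For illustration, the predefined data node flavors Vincent lists might be configured along these lines; the `configure_*_data_node` helpers, their keyword arguments, and the file paths are assumptions drawn from the Taipy documentation rather than from the episode:

```python
# Hedged sketch of the predefined data node flavors mentioned above, plus a
# "parameter" data node holding a plain Python object. Ids and paths are made up.
from datetime import datetime
from taipy import Config

raw_sales_cfg = Config.configure_csv_data_node("raw_sales",
                                               default_path="raw_sales.csv")
settings_cfg = Config.configure_json_data_node("settings",
                                               default_path="settings.json")
features_cfg = Config.configure_parquet_data_node("features",
                                                   default_path="features.parquet")

# A data node does not have to point at external data: it can simply hold a
# Python object, such as a date typed in through the graphical interface.
start_date_cfg = Config.configure_data_node("start_date",
                                            default_data=datetime(2023, 1, 1))
```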
Jon Krohn: 00:20:08
Right.
Vincent Gosselin: 00:20:10
So, it fits very well within your IDE, whatever IDE you’re using. One thing I want to say also is that we made a big point of making sure that Taipy was working not only in a .py or IDE environment, but also within Notebooks.
Jon Krohn: 00:20:29
Nice.
Vincent Gosselin: 00:20:29
So we are one of the only, I think we are the only, you know, full web application graphical interface that you can trigger from a Notebook.
Jon Krohn: 00:20:40
That’s great to hear, because that is often a nightmare. Like, I’ve tried lots of different kinds of visualizations, particularly if you’re trying to get a visualization outside of the specific browser tab that your Notebook is in.
Vincent Gosselin: 00:20:53
Exactly.
Jon Krohn: 00:20:53
It can be an absolute nightmare. So that’s great to hear.
Vincent Gosselin: 00:20:57
Yeah, I had to convince our R&D team of that on day one. We had a fairly strong argument on this and I won, and I’m happy to have won.
Jon Krohn: 00:21:05
Nice. I’m glad you won as well, Vincent. Nice. So let’s dig a bit more into the data pipelines aspect. This is something we’ve talked about in a lot of recent episodes of the show, data pipelines. It seems like a lot of organizations are thinking about this in a new way. So things like having your data pipelines be scalable, be reusable, be flexible, be maintainable, this is a challenge that a lot of organizations are confronting. So what do you think is the future of data pipelines, and how is Taipy addressing these kinds of pain points that organizations are experiencing with their data pipelines?
Vincent Gosselin: 00:21:51
Very good question. My feeling is that it’s still a not-so-mature area. That’s why we wanted to start with good graphical tools. I mean, there’s nothing like graphics to share across the development team. So that contributes, I feel, to best practices. So you first design your pipelines graphically and quickly. You can share that with the team. Of course, there are situations where you may need to use a program to build your pipeline, because there are situations where your pipeline is dynamic; you don’t know in advance what the pipeline will look like. So we still have this option, but most of the time your pipeline can basically be drawn in the graphical editor. So that’s very important. Some of the features that we’ll be looking at are, of course, improvements. Of course, with Taipy, you execute those pipelines. You have this concept of scenario, which we thought was very important to bring to the table. In terms of the roadmap, there are important things that will be coming soon. One is to stop having only DAGs. You remember, all these pipelines are supposed to be DAGs.
Jon Krohn: 00:23:23
Yeah. Actually, this whole time that you’ve been talking about this, I’ve been imagining that this is a Directed Acyclic Graph – a DAG.
Vincent Gosselin: 00:23:29
Exactly. So now we have, so these aren’t [crosstalk 00:23:32]. Well, at this stage, they still are, but I want to get rid of this. We need to get rid of that simply because we need to have IF nodes appear in the graph, where you can go back; that’s very useful when you want to model drift, especially graphically. I want to see this node where you can write rules to be able to detect, for instance, the reason for retraining. So we are going to have to allow going back. Of course, the cycle cannot be physically looping, so there will be some limits to that. But we will provide this in a future release. We would also like, and there are a few books that have been addressing this, to come up with templates, pipeline templates; there are good practices.
00:24:22
I still talk to students, or even people who are starting their career as data scientists, and they really forget, you know, what the best practices are for building their algorithms. So here I’m talking specifically about machine learning, how you do the testing, all these things. So it would help a lot if you had templates available. So we plan to have a way to store pipeline templates, a little bit like Hugging Face, but on the template side, not on the model side. There is a big, big opportunity here, we feel. And of course, the company itself can build its own pipeline templates to be reused by other developers and newcomers. And there is a lot of work to be done in that field.
Jon Krohn: 00:25:12
Yeah, and I think something that ties into that, which you really excel at at Taipy, is data pipeline versioning.
Vincent Gosselin: 00:25:20
Absolutely. So this is something also, so here we are not talking about code versioning. We are not even talking about, you know, versioning in MLOps, where you have registries and different versions for your machine learning models; you could do that, but I think MLOps tools are really the best for doing that. We are talking about the pipeline itself. When you build a pipeline, the pipeline goes way beyond just doing, you know, the training and the testing; there’s also a pipeline after scoring that you need to model. When you start doing this, it’s kind of … your pipeline is doomed to evolve. You may discover that you have a great external data source that will benefit your model. Again, I’m talking about the context of machine learning, but it works in other contexts where you don’t necessarily do machine learning, where you have a new data source, where you have a new task that you want to add. And your previous runs that you have kept for the past 12 months don’t work anymore. So you need to version your pipeline and also to have the tools to be able to migrate your previous runs so that they still work with the new version of your pipeline. So that’s something which is also fairly unique, and that we bring to the table here with Taipy Core.
Jon Krohn: 00:26:51
Nice. And that must be great not only for when an individual makes adaptations to a pipeline, but also, of course, as you made the point, across the organization: a newcomer comes in, they can see, okay, this is how things are today, I can see it clearly graphically, and you can see this history of how it evolved to where it is today.
Vincent Gosselin: 00:27:12
Absolutely. Yeah. You absolutely need to be able to do simple comparisons between versions that will tell you, oh, you can’t use that because this new data node has been added on that date, or this algorithm has a completely different version number or is brand new. So yeah, you need to track that, and you need to have different behaviors depending on whether you are in production mode, or just in staging mode, or just development mode.
Jon Krohn: 00:28:21
Every company wants to become more data-driven, especially with languages like R and Python. Unfortunately, traditional data science training is broken. The material is generic. You’re learning in isolation. You never end up applying anything you’ve learned. Posit Academy fixes this with collaborative, expert-led training that’s actually relevant to your job. Do you work in finance? Learn R and Python within the context of investment analysis. Are you a biostatistician? Learn while working through clinical analysis projects. Posit Academy is the ultimate learning experience for professional teams in any industry that want to learn R and Python for data science. 94% of learners are still coding 6 months later. Learn more at Posit.co/Academy
00:28:25
So you work with a lot of different companies. Some of them are small, some of them are big, they are in lots of different verticals. Are there any kinds of trends or generalizations you can make with respect to the kinds of companies that are successful at adopting these pipelines? Or are there, just, I guess, are there differences in general between how these different kinds of companies, big, small, different verticals, deal with data pipelines?
Vincent Gosselin: 00:28:52
Yeah, so here I would give you a bit of historical background also. So I want to say that this data thing, if you look back 30 years, has changed a lot. Before, data was not center stage. It was about, you know, doing an algorithm. So when we were dealing with, I don’t know, the Port of Singapore, you had to find the best algorithm to handle, as quickly as possible, the loading and discharge of containers on the container vessel. So that was the objective. But very often at the start of the project, you realized that the data wasn’t there, wasn’t of high quality, and you started to get into trouble. So this has changed, luckily. People are much more aware; especially organizations these days, large organizations, have made a huge effort to make data center stage. And now the question is almost the opposite.
00:29:51
I’ve got all this data, what can I do with it? So of course, this is where you come in and you help solve these problems. What you see is that there are big differences between companies. If you talk to companies in the semiconductor business, where everything is automated, you know, in the plants, companies like Samsung and TSMC, or McDonald’s, the data quality is extremely high, which of course creates a fantastic environment for you to be successful with your project. The main issue that we are seeing right now is not so much on the data side; it’s the fact that Python is fast becoming the mainstream language. It used to be a glue language that moved into being the AI language, the main language for AI, and then into the mainstream, to the point where even CIOs who didn’t want to have anything to do with Python, now they have to.
00:30:55
But that creates a need for tools to really scale up, to be used not only for pilots, but also for full applications. So the danger here is, and we have seen this, basically, Gartner also had a report on this, where 85% of pilots in Python stay pilots, not making it to the next stage for various reasons. Lack of skills. Because a lot of things were not reused; they were starting again from scratch after the pilot. So you have to have a JavaScript guy, you have to have another data guy, a DevOps guy, and so on and so forth. So this is not sustainable. And this is also why we have created Taipy, to really ease that process, to stop with silos. You know, the innovation group doing some pilots and the IT group doing the redevelopment from scratch.
00:31:54
That’s really crazy. We want somebody who is good, who knows Python, to be able to develop an application very quickly without having like 8 or 9 or 10 people around him, which also increases the cost, where you move from a pilot for 20k to a full application for half a million or more. There is also the silo issue, which is always an issue everywhere. I hear very often, you know, the data scientist group saying, okay, I’m exposing my algorithm, my algorithm is exposed, and that’s it. As if life has stopped. That’s not how it works. The algorithm will end up inside a full application. And this is where you need to continue the pipeline thing toward the execution side. The data scientist needs to be able to understand what the end user will be doing with his pipelines. So we see the pipeline really going beyond data science and covering the whole application.
Jon Krohn: 00:32:59
And so, in addition to having these data pipelines be accessible outside of just silos, it’s nice for data pipelines and code in general to be accessible beyond just people who are very serious developers. So there are advantages, in terms of being able to share what you’re doing and being able to easily collaborate on what you’re doing, to having low-code versions or, you know, low-code implementations. And you recently said in an interview with a French news network that, you know, you’re having more and more of these kinds of low-code functionalities within Taipy, and this seems to be a popular trend. Some people talk about it as like a democratization of machine learning or of data science. So yeah, what do you think about this general trend, and how do you think it’s impacting the whole data science lifecycle?
Vincent Gosselin: 00:34:05
Great question. So on that aspect, first of all, I want to say that we are low-code, not no-code. So coding is still center stage with Taipy. I mean, the level of customization is very important, and the tool has been designed really to allow for fairly significant customization. I don’t believe, you know, we are not in the field of complete auto-generation of code here at all. We need Python developers. Taipy is for Python developers, but, you know, we want their work to be much, much more efficient and also much more successful. We want Python, as I was saying, to go all the way from pilot to applications. We want Python to go beyond just the building of pipelines for AI or other types of algorithms, or … We will nevertheless go, in terms of roadmap, toward a bit more automation, even more. For instance, I was talking about, you know, these components that will bring together both graphics and backend objects.
00:35:19
For instance, if you want to see a data node, you know, you have the data node graphical object, which will immediately show you, if it’s a table, a table view, and you’ll be able to scroll to see that, or maybe some charts automatically. We are thinking of having, of course, a natural language interface, where you can query your table and automatically generate Taipy graphical objects. So this would be good. So that’s the level of automation that brings a lot of value to the data scientist and to the end user. So that’s the kind of thing. But at this stage, I’m not seeing any move toward going into no-code. And even our low-code still requires coding, because we want the product to be open and highly customizable.
Jon Krohn: 00:36:14
Nice. So then, what does it mean? So it’s very easy for me to understand in my head the idea of no-code. Okay? It’s like, some people are out there trying to build a no-code application where you just click and point, you drag. So a very good, well-known, publicly listed version of this kind of tool is Alteryx, where the idea is that you can build these visual pipelines and there’s no code at all; everything’s click and point. So what does it mean for something to be low-code? What’s high-code in contrast?
Vincent Gosselin: 00:36:51
But it’s true, the word low-code is almost undefined [inaudible 00:36:55] I’ve seen low-code that is very close to no-code for me, and low-code where you still have to do quite a bit of programming, which is the case for Taipy. The reason is that if you look at the graphical interface, we don’t want to have our hands tied as developers. You know, if I look at some no-code applications in terms of graphics, for instance, there are things I will not be able to do. I may want not only to visualize stuff, but I want to click on a particular cell in the table, and I want particular graphics to pop up, and I want to update maybe two or three pages that can be impacted by that thing.
00:37:38
So you need a very rich capability, you know, to provide a lot of freedom. That freedom, which is about customization, is very important. You don’t have that; the more you go toward the no-code field, the more your hands are tied, basically. So that’s why we are not competing with tools like Gradio, for instance; we are much more into the coding here, but we want to automate this to a level where, you know, you can build an interface very quickly. I can give you some examples, for instance, of some customers where they developed, so this was not no-code, they developed, for instance, in Python or in Java, with nothing, really from scratch, an application, which was a machine learning application using some standard tools for machine learning and even for the pipeline stuff. It took them like seven, eight months, four to five people, to do it with the graphics, the backend, everything.
00:38:51
The very same thing was developed in a month and a half with one to one and a half people. So you basically divide the time by three or four and the number of people by two, and you get something like a factor of 10 on cost. So that’s what we want, but we want to compromise on the customization level as little as possible, because we are developers. We know what we don’t want to provide; for instance, on the graphical stuff, we have no interest in looking at tools like, we are not a BI tool, for instance, where you do everything, because we don’t want to read [inaudible 00:39:34] stuff. We have seen what these tools can do. They have their own market, but we want much more flexibility. We don’t want to do that.
00:39:44
So that was a choice at the start, and I’m not seeing a reason for us to change this. Especially now, we are seeing a lot of CIOs moving into Python and they see Taipy as really a fantastic stepping stone for them to get into that world. It lowers the learning curve and so on. But they have serious applications to build. When you build the kind of applications for the different companies I mentioned, usually large retailers or logistics or manufacturers, these are fairly heavy-duty things, graphics-wise, backend-wise. If you were to do it fairly no-code, yeah, you won’t go very far. And again, our focus is really to have end users behind it, people who will be using your algorithms, your pipelines, on a day-to-day basis or once a month depending on the situation. But these people are demanding. They always want parameters, changes, great graphics. Yeah. And so.
Jon Krohn: 00:40:57
Nice. So I get it. So low-code means efficiency and automation wherever possible. It means a shallower learning curve while still allowing for a huge amount of flexibility to suit, you know, whatever kind of-
Vincent Gosselin: 00:41:10
Absolutely.
Jon Krohn: 00:41:11
And graphics, for example. Okay, cool. So this leads me to another question, which is, you’ve talked a lot from the beginning of this episode, and you can tell the way that you think about building the Taipy application, the way that you think about solving problems, you’re always thinking long term. You know, from your very first answer, you were like, let me start at the beginning and explain why we needed to do it this way. So when you were building Taipy, right from the outset through to today, what were the kinds of decisions that you made around the languages that you would use for this Taipy application? It’s quite complex in the sense that it covers the whole stack.
Vincent Gosselin: 00:42:03
Yes.
Jon Krohn: 00:42:04
You know, you have, you have to have very efficient backend operations, but you also have to have great visuals. So what are the kinds of decisions that you made as you began developing Taipy? How did you choose the programming languages, for example, that underpin it?
Vincent Gosselin: 00:42:23
Well, first of all, we are a bit of a strange startup in that we have contacts in large corporate organizations. So the software was built based on our experience with large companies. So right from day one, it was about building, you know, full-blown applications, not pilots only. Of course, we can use it for pilots, but not only for pilots. So that was one of the things. So we clearly set out, you know, what the objectives of Taipy were. We explained that to the R&D team. After that, of course, they make their own choices; on the graphical side, everything was built in JavaScript, and it’s built on top of Plotly. So one of the choices you have to make as an R&D team is which libraries you are going to build upon to build your software.
00:43:22
So we made the choice of Plotly, but we also want to open the product with an API to basically connect with any other graphical library. On the backend, we had to decide, for instance, how can we build a graphical editor? Well, after looking around, the choice was, let’s do it with Visual Studio Code extensions, and that worked out really well. So you always have to be on the lookout for the libraries in Python that can really help you make your software more powerful, more complete, while still keeping the philosophy, which is to be sufficiently customizable and easy to learn. So that’s really the thing. Otherwise, the R&D team itself, yeah, they use things like GitHub Copilot for their own programming and some other stuff. I’m sure I’m not completely aware of what tools they use internally.
Jon Krohn: 00:44:33
Nice. Well, I’m sure it’s great for our audience to hear the kinds of decisions that you’re making when you’re designing a platform like this and to be able to hear about it from conception like that. Going back even further into your history, pre-Taipy, Vincent, you’ve held several analytics leadership positions for consulting firms, like VP of Advanced Analytics for DecisionBrain. You were Director of Advanced Analytics for IBM. And so you were working on problems across a broad range of industries: apparel, container terminals, semiconductors, fast food, aerospace, finance. So you’ve alluded to some of these things already in this episode, like shipping containers in Singapore, McDonald’s [inaudible 00:45:17]. But if there are some kinds of insights that you can provide to our audience from having worked across all these different verticals with so many different kinds of companies, are there any trends that you see out there in terms of how companies manage their data or set up machine learning applications to, you know, learn from their data? I’d love to hear your thoughts on this, and then something that kind of ties into this personally: do you think that this kind of experience that you have, having worked for so many different kinds of companies across so many different kinds of industries, do you think that this kind of approach gives people an advantage when they then later come to build a general solution like Taipy?
Vincent Gosselin: 00:46:13
Okay, so a wide topic. Yeah. So definitely over the years we have seen, as I mentioned earlier, big differences in applications. And you definitely need to build up best practices over the years to educate your customers so they can be successful. They spend a lot of money on some of these systems, and the ROI can be very large. That’s why people very often say that AI is all about ROI. So what I want to say is that there are a few things. We see some areas of excellence; for instance, you may have in a company an R&D group that does extraordinary stuff with AI applications or some data. The challenge that we see is how to spread this to the operations, to the different functions within the company, whether at the finance level, like in [inaudible 00:47:31], where they really looked at optimizing and predicting cash flow for the whole company, to planners in the port of Singapore or the port of Hong Kong, who need to plan for the loading and unloading of vessels, to the factories of Samsung and TSMC, which need to dispatch all the different lots and wafers to all these expensive machines.
00:47:51
And we see really two categories. One is what we call the automated thing, so basically you don’t have any users. A lot of things have been put in place there, but it’s like an iceberg. You know, really the big thing in the future is to get all these smart algorithms into the hands of end users. That’s really what it’s all about, and that’s what Taipy has been designed for. And I’m seeing a big challenge here in how you [inaudible 00:48:35], and I’ve also seen some smart data scientists who are a bit dreaming sometimes about their own algorithms. Like, you know, I’ve exposed my algorithm, it works perfectly on this data, I’ve finished my job, and thinking that whatever the algorithm generates will be automatically applied.
00:48:55
That’s not the case when you deal with end users. They will want to do what-if analysis. That’s why we have these scenarios. They will want to change parameters, to play, and decide what really to do with it. So this is really the big challenge for AI in general: how do you make end users use these algorithms? And this is why we have created Taipy. And that’s the biggest challenge, especially now that, you know, everybody programs in Python. You have to be successful with this. And it’s not about, you know, what I call automated AI, where you do automatic image recognition, automatic this, automatic that. It’s about, you know, bringing humans to interact with these algorithms, however smart they are. And for this, you need intuitive interfaces. You need to do it easily, quickly, and to really connect the people developing the graphics, the algorithms, and the end users together. So very often, whether you read books or you listen to people, they usually stop at their area of expertise, which works too much as a silo. And I think that’s also the biggest opportunity for the future. So that’s what we’re seeing: AI in a lot of industries is very often used in isolation, as an automated AI algorithm, and this area where it’s used by end users is just beginning.
Jon Krohn: 00:50:47
Nice. Yeah, really exciting times; I think that we’re just getting going on this. Vincent, you’ve been in this industry for a very long time; it’s coming on 40 years that you’ve been working on artificial intelligence. And so no doubt you’ve seen a lot over those years. There have been a number of AI winters in those decades. So I would love to hear your perspective on what those were like, you know, that first AI winter in the eighties and the ones that have happened since. And then after you’ve answered that question, I might have you speculate about what’s gonna happen in the future.
Vincent Gosselin: 00:51:38
Okay, so yes, I’m old enough for that. It’s not quite 40 years yet, 35, I suppose, but yeah, you’re right. When I graduated, it was really, you know, the late eighties, and AI was everywhere. Everybody wanted to do AI. It was not as much as now, of course, but you know, if you could, you would go for an AI degree, and this is what I did, and I loved it. And the first couple of years, well, the first two, three years, were fantastic. You know, we were doing all these projects and so on, but I realized a lot of the projects failed. The technology was not there, the data wasn’t there. I remember some AI projects that we were involved in in those days; expert systems were considered AI, for instance. But the data and the rules were changing faster than we could keep up with. So it was doomed from the start.
Jon Krohn: 00:52:42
We should probably quickly break that down for our listeners who don’t know what expert systems are. So this was the kind of approach to AI where every path through the system was hard-coded by a programmer.
Vincent Gosselin: 00:52:56
Yes. It was basically rule programming. You were just defining if-then-else, and these rules were working in different modes; one was called forward chaining, the other backward chaining, so basically deducing or inducing, and this is how it was used. So that was a big thing in the heyday of AI in the early nineties, and then everything stopped. The Japanese were really the big promoters of AI in those days. The economy crashed in Japan, and they have not really recovered since, by the way. And really a lot of the financing and everything stopped, all of it due to failure, project failures big and small. The technology wasn’t there. The neural networks were really, really slow, not doing much. They were really toys at that stage. So that’s how it happened at that time. Luckily, I was able to work in an area of AI which was really kind of protected.
00:53:56
Not extremely visible. It was really about optimization, planning, scheduling with smart algorithms, you know, basically algorithms based on trees, also called, you know, operations research, mathematical modeling. And this was really performing. We solved really large problems. It was still a kind of AI; you were solving problems with smart algorithms behind it, based on trees. And that was really a lot of fun. But that was dampened somehow by the fact that suddenly everybody had to move into C++, then Java. So you remember also that the AI heyday in the late eighties, early nineties was not only about AI; there was a flurry of new languages, beautiful languages. In particular, Prolog was a fantastic language. And you have this feeling, like in Python, in fact, these are very different languages, but in Python you basically think of something and you move from the idea to an implementation very quickly. That was the capability you also had in Prolog, and all this disappeared completely. Very few people were left, so everybody moved to mainstream programming in C++ and Java. That time, I felt, was really not very exciting in terms of programming languages. It was very boring. In fact, I always hated having to physically write a lot of code to produce simple results. Coding was very heavy. This has changed completely now. So it’s like being reborn.
Jon Krohn: 00:55:45
Nice, and maybe reborn even more, now that, today, something that’s been really exciting for me and has made data science more exciting than ever has been these large language models that we can plug into and, in some cases, fine-tune. At the time of us recording this episode, GPT-4 has only been out for a week, but there are so many new capabilities that emerge from having access to these foundation models. I’ve been blown away. Over the course of the last week, there have been so many ideas that have come to my mind of new machine learning capabilities, or even just wondering whether I could now automate data labeling. You know, previously I would’ve had to painstakingly label data sets myself or offshore the labeling, and now GPT-4 can do it so, so confidently, like I…
00:56:45
So I often think I’m gonna give it something that is just too hard, that’s too abstract, where I’m even wondering, am I even explaining this well enough that a human could understand? And it gets exactly what I wanted and gives me the output that is exactly what I was hoping for. So is this also something, you know, I hear exactly what you’re saying about things like the Python programming language allowing you to move from idea to execution more rapidly than C++ or Java, and that’s really exciting. But yeah, I’m just curious if you also have had this same kind of experience that I’ve had in recent weeks.
Vincent Gosselin: 00:57:24
Yeah. Oh yeah, absolutely. At the same time, it’s extremely exciting. Even the weird thing about this is that even when you program them, if you use this, you can go on Hugging Face, you can use some of these models and do transfer learning, and you use them for your own purpose, like what we do with Taipy, for instance. That’s one of the things we want to automate, to automatically generate some code on the graphical side. Even though you understand how it works, what they have done and so on, you get surprised. That’s really the thing; it’s rare that you get surprised. I’ve never been surprised that much. It’s really the same kind of thing you’re saying, that maybe I haven’t done enough.
00:58:14
And you look at it and say, wow, how did it do it? And so there is this kind of gap, and I can understand also why it can be scary for a lot of people. At the same time, I think we have to think of all the use cases that you can get from these tools. That’s the interesting bit. That’s the creativity. You know, how can I use the power of this, well, in my life, when I do programming, when I build a library like this? I think we are going to discover a lot of new use cases that even the people who designed this, all the community behind it, in fact, have not thought about at the start. But for a company like us, again, it’s not going to make us believe that no-code with these large language models can be automated. I don’t believe it’ll happen, but we can really bring it into the right places inside the software to really make it even more successful. But it’s true. I mean, we live in a time which is really unsettling, I would say. Exciting, and a bit scary sometimes.
Jon Krohn: 00:59:37
Yeah, there’s certainly, I think it’s critical that if you are leading data science projects, or in your case a data-centric, a data-driven company, it’s critical that all of us need to be getting used to the kinds of things that GPT-4 can be doing that future foundational models will be able to do. Because if you’re not, you’re probably missing out on commercial opportunities, and your competitors are probably dabbling and trying to see how they can be-
Vincent Gosselin: 01:00:08
That’s exactly what we are seeing amongst the people we meet. Even before the large language models, you already had some kind of acceleration between companies, where the gap was getting wider between those that were really just thinking about doing something and those that were really well prepared, all the way to doing extremely sophisticated projects with huge ROI. This will accelerate that process even more. So the scary bit is that, I mean, if you are a CIO these days and you miss this, it’s almost like a death sentence. There’s no way; it’s going to be very hard to catch up. So there’s a whole culture change that needs to happen across the whole company, not only in IT, not only the AI or innovation groups. And that’s also going to create nightmares for CIOs, I’m sure. But again, you will see even more blatant differences between companies.
Jon Krohn: 01:01:24
Yeah. Nightmares, but also really incredible things that will come out of this out of all of this disruption. So it’s clear to me, and I’m sure many of our listeners from hearing you over the course of this episode, that you are a brilliant person to be working for with an almost unparalleled amount of experience in AI. And Taipy, clearly an amazing software library for people to be using open-source for data pipelining as well as for user interfaces. So there’s probably a lot of listeners out there wondering how they could be working for Taipy and what kinds of opportunities there are. So, Vincent are you doing any hiring and what do you look for in the people that you hire?
Vincent Gosselin: 01:02:14
Okay, so for a company like us, we are what we call a “freemium” company. So we have two objectives in life. One is to grow a huge community around Taipy on the open-source side. And, of course, the other one is to get large companies on board, and they often do take on the enterprise version of the software, which is mostly about support. Feature-wise, it’s basically the same product. So for this, we obviously need people: developer advocates for the community, recruiting, basically, developers working in our GitHub repository and creating additional things and working with the R&D team. We are also looking at evangelists who can talk about the product and talk to the community at large. And we are also looking for people for our own R&D. So basically we have back-end and front-end backgrounds: people who are, you know, excited about our graphical interface and people who are really into JavaScript [inaudible 01:03:24], this is what we will be looking at. And of course on the backend, which is more about, you know, pipelining, and a lot of this is developed in Python, so it’s more of a backend background.
Jon Krohn: 01:03:37
Awesome. Yeah, that makes perfect sense. Are there particular kinds of attributes that you look for in a developer advocate or a back-end developer or a front-end developer who applies to Taipy? Are there aspects that you look for where you’re like, yes, this is somebody that’s perfect for a company like ours?
Vincent Gosselin: 01:03:57
So these days, what hasn’t changed is that, you know, you still need to go through an interview and do some technical tests, of course, to check. Surprisingly, yes, experience doesn’t seem to be as important as before. And for people with the same number of years, the same backgrounds, you can see wide differences in capabilities, because it’s all about not only doing your job, but being curious, looking around at, you know, all the different channels you have to learn from. It’s absolutely mind-boggling. And that’s what creates a difference between candidates. It’s not only because you have a good degree, it’s because you’re curious. So curiosity might be the most important quality there; it makes a big difference. The rest, you know, is being able to work with a team and to be collaborative; being collaborative is more and more important. You know, the days where you had a very smart guy who was, you know, like God, and everybody was following him, I lived through that. That kind of model inside an R&D team doesn’t seem to be as successful as before, which is a good thing. You don’t have to deal with huge egos and that kind of thing anymore.
Jon Krohn: 01:05:24
Yeah. I think this pace, there’s just too much to know now for one person to be able to be that-
Vincent Gosselin: 01:05:30
Exactly.
Jon Krohn: 01:05:30
single point in.
Vincent Gosselin: 01:05:33
Exactly.
Jon Krohn: 01:05:33
Yeah.
Vincent Gosselin: 01:05:33
Exactly.
Jon Krohn: 01:05:35
So you mentioned curiosity there as one of the key things that makes somebody successful today. For our curious listeners, Vincent, do you have book recommendations?
Vincent Gosselin: 01:05:45
Yes, so again, there are a lot of books, a lot of good books, a lot of bad books. One book I would recommend, since we’re talking about, you know, natural language processing and all this, is Natural Language Processing with Transformers from O’Reilly. This also gives you a good intro to working with Hugging Face. So it’s really a very nice book, and it’s very recent, because now good books are recent books. You age very quickly as a book these days.
Jon Krohn: 01:06:20
Yeah, yeah.
Vincent Gosselin: 01:06:20
Then there are also books on, you know, design patterns. I like those because of what we’re talking about, you know, pipelines and being able to create best practices. There are already books like Machine Learning Design Patterns, also from O’Reilly, and Building Machine Learning Pipelines. These are good books for those looking at building pipelines and so on. And finally, I would recommend one of my favorite authors, François Chollet. Any book from him is usually extremely enjoyable. He has this way of making it easy to explain a difficult concept, in deep learning of course. He’s the guy who has been working on Keras, and definitely one of the rare authors who can explain deep learning well.
Jon Krohn: 01:07:18
Yeah. And so the big book from him, I think, would be Deep Learning with Python, published by Manning. The second edition came out in 2021, yeah.
Vincent Gosselin: 01:07:27
Absolutely.
Jon Krohn: 01:07:28
Nice. Those are great recommendations Vincent, I’m not surprised to get them from you, given how excellent this episode has been. For people who want to continue to learn from you after the episode, how should they follow you?
Vincent Gosselin: 01:07:40
Well, that’s easy. It’s all about our GitHub. So go to our repo, the Taipy repo, do stuff with it, and give us some stars if you want. Otherwise, it’s all about our LinkedIn page, and a lot of stuff is also available on our website. Of course, you need as much as possible in terms of documentation, demos, and things to learn the product. So the website is a really good place, and on Twitter we’re starting to be a bit more involved.
Jon Krohn: 01:08:17
Nice. Alright, Vincent thank you very much for taking all of this time with us today. No doubt as the CEO of Taipy, you have a lot that you need to get done in a given week, so I really appreciate you taking the time with us and I’m sure our listeners do as well. And yeah, thanks for a great episode and we look forward to seeing how the Taipy environment evolves in the future.
Vincent Gosselin: 01:08:42
Thank you, Jon. It was a real pleasure discussing with you today. Thanks.
Jon Krohn: 01:08:51
What an AI legend to be able to learn from. In today’s episode, Vincent filled us in on how the open-source Taipy GUI component makes it simple to build front-end user interfaces for data-driven Python web applications; how the open-source Taipy Core component makes building pipelines easy through combinations of data nodes for data inputs and outputs, tasks for performing operations on data, and scenarios for executing runs while allowing for flexible parameter variation; how Python is no longer just a glue or an ML language, but is increasingly the mainstream choice as an application’s core language; why Taipy selected Plotly and Visual Studio Code extensions as integral from the start; and how Prolog was a beautiful language designed for AI that, similar to Python today, enabled developers to move rapidly from idea to implementation. As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, the URLs for Vincent’s social media profiles, as well as my own social media profiles, at www.superdatascience.com/673. That’s www.superdatascience.com/673.
01:09:57
I encourage you to let me know your thoughts on this episode directly by tagging me in public posts or comments on LinkedIn, Twitter, or YouTube. Your feedback is invaluable for helping us shape future episodes of the show. And if you’d like to engage with me in person as opposed to just through social media, I’d love to meet you in real life at the Open Data Science Conference East, that’s ODSC East. It’ll be held in Boston from May 9th to 11th. I’ll be doing two half-day tutorials. One will introduce deep learning with hands-on demos in PyTorch and TensorFlow. And the other tutorial is brand new. It’ll be on fine-tuning, deploying, and commercializing with large language models including GPT-4. In addition to these two formal events, I’ll also just be hanging around grabbing beers and chatting with folks. It’d be so fun to see you there.
01:10:44
All right. Thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you. And thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team, producing another fascinating episode for us today. For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors whom I’ve hand selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this free show by checking out our sponsors’ links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. Finally, thanks of course to you for listening. It’s because you listen that I am here. Until next time my friend, keep on rocking it out there and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon.