SDS 683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller

Podcast Guest: Matar Haller

May 30, 2023

Matar Haller speaks to Jon Krohn about the challenges of identifying, analyzing and flagging malicious information online. In this episode, Matar explains how contextual AI and a “database of evil” can help resolve the multiple challenges of blocking dangerous content across a range of media, even those that are live-streamed.

Thanks to our Sponsors:
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
About Matar Haller
As the VP of Data & AI at ActiveFence, Matar Haller leads the Data Group, whose teams are responsible for the data and algorithms which fuel ActiveFence’s ability to ingest, detect and analyze harmful activity and malicious content at scale in an ever-changing, complex online landscape. Matar holds a Ph.D. in Neuroscience from the University of California at Berkeley, where she recorded and analyzed signals from electrodes surgically implanted in human brains. Matar is passionate about expanding leadership opportunities for women in STEM fields and has three children who surprise and inspire her every day.
Overview
According to the Alan Turing Institute, nearly 90% of people aged 18-34 “have witnessed or received harmful content online at least once” (turing.ac.uk, 2023). As VP of Data and AI at ActiveFence, a company that develops algorithms to detect and remove harmful user-generated posts, Matar Haller has a lot of work ahead of her. The ubiquity of dangerous content and its range across media formats (video, text, audio, image) make such data hard to monitor. As Matar explains, this is only the first step in a long process of weeding out illegal material online.
The first issue is that the divisions between acceptable and unacceptable content may not always be clear. This is why it is necessary to consider content such as “baby’s bath time” in its broader context: Is it an innocent video posted to share a happy moment? Or is it something more sinister? To assess this broader context, Matar explains, contextual AI will look at the user’s history of posts, whether it contains known logos or weapons, the language used to describe the post, and its user-generated tags.
Another obstacle that ActiveFence has to overcome is that users who want to spread misinformation and extreme media won’t give up easily, seeking instead to circumvent AI through a variety of means, from unusual spellings to video and audio played at altered speeds. To address these evasion techniques, ActiveFence hires intelligence analysts with expertise in finding misinformation. These analysts research the hashtags, trends and even emojis that dangerous groups might use, which they can then add to a “database of evil” that helps the ActiveFence team surface and block offending data.
Thanks to the Digital Services Act passed by the EU last year, users in EU countries can benefit from more guardrails against witnessing illegal content. But while this is a step in the right direction, some forms of content sharing, such as live-streaming, can be a challenge to monitor. Matar says that the scale of data, posted simultaneously, can bog down the process of detection. One method to solve this problem is to again use contextual AI, considering the user, their streaming and comments history, and the groups to which they belong, before producing a risk score for how likely that piece of content is to be harmful.
Listen to this episode to hear about the specific technologies ActiveFence uses to run its platform, Matar’s experience with the Insight Fellows Program, and MedTech’s potential capabilities for predicting brain seizures.
In this episode you will learn:
  • How ActiveFence helps its customers to moderate platform content [05:36]
  • How ActiveFence finds extreme social media users trying to evade detection [16:32]
  • How to monitor live-streaming content and analyze it for dangerous material [29:13]
  • The technologies ActiveFence uses to run its platform [35:54]
  • Matar’s experience of the Insight Fellows Program (Data Science Fellowship) [40:28]
  • Leadership opportunities for women in STEM [1:00:41]
  • Israel’s R&D edge for AI [1:13:19] 

Podcast Transcript

Jon Krohn: 00:00:05

This is episode number 683 with Dr. Matar Haller, VP of Data and AI at ActiveFence. Today’s episode is brought to you by Posit, the open-source data science company, by Anaconda, the world’s most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into A.I.
00:00:23
Welcome to the SuperDataScience podcast, the most listened-to podcast in the data science industry. Each week, we bring you inspiring people and ideas to help you build a successful career in data science. I’m your host, Jon Krohn. Thanks for joining me today. And now let’s make the complex simple.
00:00:55
Welcome back to the SuperDataScience podcast. Today I’m joined by the wildly intelligent data scientist and communicator, Matar Haller. Matar is the Vice President of Data and AI at ActiveFence, an Israeli firm that has raised over $100 million in venture capital to protect online platforms and their users from malicious behavior and malicious content. She’s renowned for her top-rated presentations at leading global data science conferences. She previously worked as Director of Algorithmic AI at SparkBeyond, an analytics platform. She holds a PhD in neuroscience from UC Berkeley, and prior to data science, she taught soldiers how to operate tanks. Today’s episode has some technical moments that will resonate particularly well with hands-on data science practitioners, but for the most part, the episode will be interesting to anyone who wants to hear from a brilliant person on cutting-edge AI applications. In this episode, Matar details the “database of evil” that ActiveFence has amassed for identifying malicious content; how contextual AI considers adjacent and potentially multimodal information when classifying data; how to continuously adapt AI systems to real-world adversarial actors; the machine learning model deployment stack she uses; the data she collected directly from human brains using recording electrodes and how this research relates to the brain-computer interfaces of the future; and why being a preschool teacher is a more intense job than the military. All right, you ready for this captivating episode? Let’s go.
00:02:27
Matar, welcome to the SuperDataScience podcast. It’s awesome to have you on the show. Where are you calling in from?
Matar Haller: 00:02:33
I’m calling in from Israel, sunny, sunny Israel. So thanks for having me. 
Jon Krohn: 00:02:37
Sunny, sunny Israel. Is that always true? Always Sunny Israel. 
Matar Haller: 00:02:40
Mhm, most of the time it’s pretty sunny. We have like two seasons. One is really long and it’s really, really hot. And the other one is shorter and beautiful and not as hot. But still, we have a lot of sun and that’s not- 
Jon Krohn: 00:02:54
[crosstalk 00:02:55] beaches. 
Matar Haller: 00:02:56
We have very nice beaches. We, it’s, we have tropic-like areas that are like more green and nice, and forests, wildflowers, mountains. Not all camels and deserts, although we have that too. 
Jon Krohn: 00:03:09
Cool. Well, I guess it isn’t cool, but I, sounds hot, but I will have to visit there sometime. I actually, I have a grandmother who recently visited and said that it was her favorite place she’s ever been. 
Matar Haller: 00:03:22
Oh, wow. Nice. So come visit I’ll introduce you to my chickens. 
Jon Krohn: 00:03:29
There you go. This episode brought to you by the Israel Tourism Board. And, but you do travel a lot as well. So you were recently in New York. You were at MLconf, the Machine Learning Conference, in New York, which I wasn’t able to make it to this year, but you were a speaker at MLconf. And Deborah Williams, who’s a friend of mine and the acquisitions editor at Pearson that I’ve worked with for the books that I’ve created, all the video content I’ve created, she wrote me a long email summarizing how MLconf had gone. And she said that by far the best speaker hands down, and not just her opinion, but the opinion of “everyone that she spoke to”, was that you, Matar, were by far the best speaker at MLconf. So I was like, well, get her on the show.
Matar Haller: 00:04:21
So that’s very, very flattering and now like, take your expectations and lower them. Thank you. Very, very flattering. Thank you. That was a fun, that was a fun conference. There’s lots of interesting ideas and good, good talks. So it, it was a, if she said that, there, it’s, there was a high bar. So thank you. 
Jon Krohn: 00:04:40
And so let’s dig into what you do. So you are the VP of Data and Artificial Intelligence at ActiveFence, which is a platform for content moderation, harmful content detection, and threat intelligence. And so to be clear, ActiveFence is not a company that is doing the content moderating. It’s not like there’s this army of people at ActiveFence that are monitoring for harmful content, but you provide tools, data- and AI-enhanced tools, that allow your customers to be able to do that content moderation themselves more efficiently. And this seems to be quite a good niche. I could see on Crunchbase that ActiveFence has over a hundred million dollars in funding. So yeah, it seems like a very valuable niche to be filling for your customers. So tell us a bit about what this means. How do you use AI to be moderating content? How’s that useful for threat intelligence, that kind of thing?
Matar Haller: 00:05:42
Sure. So, ActiveFence, you’re right. Like we are a platform that basically, our clients are any company that has user-generated content. So whether it’s, you know, comments or chats or uploading videos or audio or any place that you have a user that’s able to upload content, there’s a potential for misuse of that and for uploading malicious content. And our goal, our mission is basically to help platforms ensure that their users are safe, that they have safe online interactions. And so we do, we provide the tools to help them, to help them do that. And really, this is one of the biggest challenges that faces UGC platforms, platforms with user-generated content: basically, how can they detect this malicious behavior, especially since, as we know, items can be in any format, right? So we need to be able to detect whether it’s video, audio, text, images, all of that. And also it can be in any language, and it can also be any number of violations, right?
00:06:44
So you have sort of these, these big ones that, you know, you say absolutely not. Like I do not want child pornography. I do not want terror. I do not want white supremacy. But, but there’s, there’s like many, many, many more, and different, different companies, different platforms have different levels of sensitivity to it, right? Even something that you would say is as blatant as, I do not want child pornography. No one wants child pornography on their platform. But let’s define it, right? What does that mean? Is, you know, baby’s first bath. Is, is that, is that something that we need to be aware of?
00:07:12
And so the tools that we provide need to be sort of contextually aware of, you know, the policy the way that things are being used or presented. And so for me and for all, you know, my teams it’s a super, super interesting space to be in because not only are the algorithms that we use really exciting and sort of interesting, but I think the application right, we’re not, we’re not selling air, like we’re actually making it like impact, like making a real impact on like human interactions in a positive way. 
Jon Krohn: 00:07:45
Right, so to what extent can you tell us about those exciting algorithms? 
Matar Haller: 00:07:51
So, that’s, that’s an excellent question. Thank you for asking. So I think that, so there’s many different levels of things that we can do. So the first thing is that we sort of, we have our, a platform, right? And this, this is a platform that basically enables users, or like moderators, to come in to view the content, to look at sort of where, where it is, and then to make a decision whether or not something should be removed, right? And this is the platform that we provide to our users in order to basically ensure that we’re able to protect the wellbeing of the moderators and to make sure that they’re only seeing things they actually need to see in order to be more efficient. There’s absolutely no need to review everything. Most of the things are benign, and even within the things that are harmful, there isn’t really any need to view everything. In that, in that case, basically you want to make sure that you have some sort of automated content moderation on top. And that’s where sort of we, we come in. Yes.
Jon Krohn: 00:08:49
I guess that that ends up being important for the mental health of the people who are doing the content moderation as well, because I’ve read how for people in those roles, it can be quite a harrowing experience when you’re just watching beheadings and child porn all day.
Matar Haller: 00:09:02
Absolutely. Absolutely. Content, like moderator wellbeing is a huge, huge, huge issue. It’s in the news, like periodically it comes up as like this, this, this huge thing. And, and ActiveFence is like very, very concerned about this, right? We deal with data that is not pleasant, right? And so in the same way that I actively work to protect my data scientists and my engineers from exposure to this, and only when it’s really needed and with a lot of safeguards, we want to make sure that, you know, we’re all human and want to make sure that moderators are also protected in the same way. And so if there’s things that are sort of blatant, you know, a beheading, why do they need to watch that? There really isn’t a need, right? There’s things that are clearly, obviously violations are clearly obviously malicious and should just and should just be removed and banned.
00:09:52
And so the algorithms that we use, basically we use what we, we what, what we call contextual AI. What this means is that we look at sort of the item in the context that it is being used but also within the item, right? We have a, our data model basically enables us to take sort of an item even if it’s just an image, and start breaking it apart into the components that it has, so that then we can build those together into a coherent risk score where this risk score can take into account, you know, what, like, do we see any like weapons? Do we see any known logos? Do we see any known people of interest that we know have, you know, from their history or whatever, we know that they’re, you know, spewing hate speech or misinformation and so forth? 
00:10:36
And then all those components together can combine to basically say, yes, this item is very probably, very probable to be risky. And so that’s sort of how we, we build the full picture. And then, of course, there’s other layers of that, right? Even for example, for chats, right? You can say, well, I can just use keywords, right? Like, if I find the “N” word, then clearly this is very violative. But what if it’s someone saying, please don’t call me that. Or what if it’s a rap song? Or what if it’s you know, someone like, you know like community sort of re-owning a word. And so like, you know, I, you know, I’m proud to be a whatever some, some slur. And so in those cases, clearly I don’t want to ban that. And if I’m just doing keywords which are sort of contextually unaware then I lose that ability. And so in those cases, we do need to use sort of language models that are more contextually aware. And these language models need to be trained and tuned on these specific cases. Because these are the cases that are always interesting. 
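To make the idea of combining contextual components into a single risk score concrete, here is a minimal illustrative sketch in Python. The signal names, weights, and the way they are combined are hypothetical examples for illustration, not ActiveFence’s actual detectors or scoring logic.

```python
# Hypothetical sketch: fuse per-component detector outputs (object detection,
# logo matching, user history, contextual language model) into one risk score.
# Signal names, weights, and values are made up for illustration.

def combine_risk_signals(signals, weights):
    """Weighted average of detector scores, each in [0, 1]."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        return 0.0
    return sum(signals[name] * weights[name] for name in signals) / total_weight

signals = {
    "weapon_detected": 0.85,    # object detector on the image
    "known_hate_logo": 0.90,    # match against an analyst-curated logo list
    "user_history_risk": 0.40,  # prior behavior of the uploading user
    "caption_toxicity": 0.20,   # contextual language model on title/comments
}
weights = {
    "weapon_detected": 1.0,
    "known_hate_logo": 2.0,
    "user_history_risk": 1.0,
    "caption_toxicity": 1.5,
}

risk = combine_risk_signals(signals, weights)
print(f"combined risk score: {risk:.2f}")
```

The point is simply that no single signal decides the outcome; the context around the item moves the score up or down.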
Jon Krohn: 00:11:37
That does sound really interesting. And it sounds like the kind of thing that in this brave new world that we have of these really powerful large language models that this is the kind of thing that they could do really well, that a few years ago it might have been a lot tougher. And so it’s great that you, that you’re presumably able to leverage these kinds of new technologies, especially these kind of multimodal technologies that are emerging. So I don’t think it’s available to the public yet, but GPT-4 has this image component where you can have an image, you can provide an image, a photo of your fridge, and ask GPT-4, “What can I cook?”, based on the ingredients that you see in this image. And so that kind of multimodality, it sounds like it’s something that you’ve been working with for a while. 
Matar Haller: 00:12:30
Yeah, we’ve been looking a lot at multimodality because for you know, if I’m going back to the child pornography example, because that’s, for people something that’s like so obvious, right? Like, you should be able to know whether something is child porn or not. Like we all sort of viscerally know what is bad. And yet sometimes you’ll see, you know, a picture of a child and it looks fine, but, you know, it’s sort of like only the people that are in the know will know that, you know, whether like that’s a face of a victim that’s known, or in the comments, there’s links to off-platform sites or something about the angle or a logo that’s like, the picture itself is benign, but there’s a logo that’s associated with a studio that’s been associated with child porn or the title or the description.
00:13:12
And so sometimes it’s enough to look at the image, sometimes it’s enough to look at the surroundings. But oftentimes it’s the combination. And I mean in terms of like this, the generative AI, right now is this sort of like perfect storm for trust and safety, right? Because we’re gonna be having sort of US elections soon, and so political disinformation is something that’s like very, very pertinent. And is now sort of having these large language models sort of lowers the bar for the entry of bad actors, right? Suddenly, if it used to be that things could either be like really high quality, but low scale or low quality and easy to catch in high scale, now that’s not an issue, right? And so it’s like an enabling technology and, you know, can, it’s obviously, I don’t, like no fear mongering here, I think it’s like, has a lot of good that it can do. But we need to be aware of how it can be used and how we can kind of be prepared for it.
Jon Krohn: 00:14:15
This episode is brought to you by Posit: the open-source data science company. Posit makes the best tools for data scientists who love open source. Period. No matter which language they prefer. Posit’s popular RStudio IDE and enterprise products, like Posit Workbench, Connect, and Package Manager, help individuals, teams, and organizations scale R & Python development easily and securely. Produce higher-quality analysis faster with great data science tools. Visit Posit.co—that’s P-O-S-I-T dot co—to learn more. 
00:14:50
Yeah, no question. So we, you know, famously in the United States, in the 2016 election cycle, there was a lot of, there were foreign actors involved in creating disinformation in Eastern European kind of farms. And you can imagine, exactly like you’re saying, these kinds of tools like GPT-4 make it a lot easier to create a lot more content, because you don’t need to have a human typing out everything, so it’s so much cheaper. It’s probably like several orders of magnitude less expensive to be generating malicious content, misleading content, disinformation. So yeah, it is interesting that yeah, heading into yeah, and I mean, I guess we’re always heading into an election cycle somewhere. [crosstalk 00:15:31] And so it’s something-
Matar Haller: 00:15:33
It never ends. 
Jon Krohn: 00:15:34
Like, yeah, it’s crazy in the US to me that people in the, in the lower house, that they have a two-year election cycle, and so like, you spend a few months legislating and then it’s back to fundraising.
Matar Haller: 00:15:48
Exactly.
Jon Krohn: 00:15:48
It’s wild. 
Matar Haller: 00:15:50
Yeah. But, but, but I think it’s, what’s interesting is that like disinformation is only one aspect of it, right? We’re seeing like computer generated or, you know generative AI-generated child pornography. And then at this point, the question is it still violative? And I think yes, right? Like, we don’t want that stuff out there. I don’t care whether something is real or fake. It’s, it’s still child porn and it should be, it should be banned. And then, and then there’s a second level of like, well, unless I’m trying to find who the victim is, and then I do care, and then there’s like another level of detection that needs to be built on top of that. 
Jon Krohn: 00:16:27
Is it tricky? I mean, it must be tricky. Something that must add an extra level of complexity to this is that presumably the nefarious actors out there are constantly shifting and trying to evade detection by you. So, when many of our listeners and myself, when we’re building machine learning models, we don’t have to worry about somebody trying to outwit the model. You know, like you can build a machine learning classifier to detect images of cats and dogs, and it’s not like the cats are like trying to look like dogs and are gonna come up with ways of like, dressing to look more like dogs. So, …
Matar Haller: 00:17:08
Yeah, no, I mean, I think I may think about it in terms of like, how, how is what I do different from car detection, right? Like that’s sort of, you know, or cat detection or anything. Or, yeah. Anyway, there’s a million examples. And so I think there’s, in addition to the fact that it’s evasive and adversarial, right? So there’s, you know, examples of like QAnon, which is a group that’s banned on some platforms, will, you know, change their text to be cue, like C U E. And then you have to basically catch it by knowing to look for that. And to know to look for that, that’s already subject matter expertise. And that’s one thing [inaudible 00:17:44] we have, you know, you mentioned threat intelligence. And so we have intelligence analysts that this is what they do, right?
00:17:49
They, they’re experts in, you know, misinformation or in hate speech and or in terror and research these groups, they know about sort of the latest hashtags or trends like what emojis they’re using now and so forth. And then basically that is able to, you know, they’re able to sort of surface data that’s relevant. You know, for example, you know, the latest, there’s, you know a hate group that was founded in like this last June or this last October, and they’re already, you know, on, on different social networks with their logos spewing hate. And so to catch those, to know those to put, and then those can feed back into our algorithms, and I can know to look for those logos, to look for those phrases, to look for those actors. And so then I’m able to sort of stay on top of it on the fact that yes, it’s adversarial.
00:18:35
I have the subject matter expertise. It’s extremely non-stationary, right? I have new actors coming up all the time, right? If I’m looking for cats and dogs, like how much have they really changed? They’re, they’re not gonna change, right? Right. They’re gonna have four legs, a tail, and ears, versus here the landscape is very non, because it’s adversarial, it’s extremely non-stationary. And so that’s why I need to have my subject matter experts that are constantly feeding me more information of like, oh, this is a new slang term. Oh, this is a new slur. And so forth.
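As a rough illustration of how analyst-supplied terms and known spelling evasions (like “cue” for Q) might be matched in code, here is a hedged sketch; the substitution map and watchlist are invented examples, and a real system would layer contextual language models on top of anything this simple.

```python
# Hypothetical sketch: normalize common evasion tricks (character substitutions,
# punctuation inserted to break words apart) before matching against an
# analyst-maintained watchlist. The mappings and terms below are made up.
import re

SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})

# Canonical terms plus known aliases, as supplied by intelligence analysts.
WATCHLIST = {"qanon", "cue anon"}

def normalize(text):
    text = text.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z ]", "", text)  # strip punctuation used to split words

def flag(text):
    cleaned = normalize(text)
    collapsed = cleaned.replace(" ", "")  # also catch "q a n o n"-style spacing
    return any(term in cleaned or term.replace(" ", "") in collapsed for term in WATCHLIST)

print(flag("follow the Q-A-N-0-N drops"))  # True: substitutions and punctuation stripped
print(flag("cue anon latest"))             # True: analyst-supplied alias
```

The flywheel Matar describes is what keeps a list like this current: analysts add the new slang, slurs, and logos, and the automated layer immediately starts catching them.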
Jon Krohn: 00:19:04
Is there a flywheel that gives ActiveFence a defensible moat between this content moderation and the threat intelligence? So I’m just kind of, I was, I just kind of had this brainwave here, and you can correct me if I’m thinking about this incorrectly, but, so you have the content moderation aspect of the platform. So machine learning models are detecting “Hey, you know, we think that there’s, this is high-risk content over here” automatically. And then maybe that automation can assist the threat intelligence part of the company. And then the threat intelligence people in turn are keeping tabs on what’s going on through a combination of more manual intelligence work, as well as this automated, automatically assisted intelligence work that your content moderation side is helping with. And then they can feed back into the content moderation. Like, “Hey, like, here’s something else you need to be able to look for. We need to train a machine learning model to be able to check this kind of thing”. Because yeah, like, you know, there’s this new logo that you need to be looking out for.
Matar Haller: 00:20:02
Totally. You nailed it. Yeah. We love flywheels at ActiveFence. We always say like, no strategy deck is complete without a flywheel. And so, and so, absolutely. It’s exactly as you described. So we have our intelligence analysts that are, you know, finding, finding things, right? Those feed into the algorithms, make our algorithms better, and then we have incoming data collection, whether the data is like we go out and proactively collect it, or we get data, you know, clients are sending us data being like, “Hey, is this violative or not?” and then that’s basically can then be fed back into the intelligence analysts. So for things that come from clients, sometimes it’s things that they haven’t seen before that they, that they don’t know. Often they do know it, but because we’re also out there collecting data proactively, then that’s basically able to feed back in.
00:20:46
And one core component of this flywheel is something that is, we’ve, we’ve, it’s our proprietary database. It’s sort of a database of violative content. And what this means is that basically we have data that we’ve already identified, whether it’s images or audio or videos or texts that we’ve already identified as being violative of a policy or malicious, we can hash that. And then new content that comes in can be compared to that, right? And then that also helps us, first of all, to be more efficient. But also we can proactively enlarge it. We don’t need to wait for data to come in, right? We can proactively enlarge this proprietary database by going out there, going to sources that we know are problematic, and that’s what our intelligence analysts can help us with. And then next time that it comes in, basically we’ve, we’ve already seen it before. And so there’s definitely this sort of interaction, the flywheel between the intelligence analysts, the humans, and the AI, one sort of feeding off the other.
Jon Krohn: 00:21:43
Yes. So this is the database of evil that you’ve talked about publicly before, right? And so to give a bit of an analogy, this is kind of like how antivirus solutions have a database of known viruses, and then if that line of code that’s known to be malicious turns up on your own hardware, it can be compared against this database, and you can say, okay, this is the threat. We need to remove this part of your file system. So similarly, you have this database of bad content, of harmful content, and yeah, it’s proprietary. So, okay, that, I think that, yeah, you have more to say about that.
Matar Haller: 00:22:24
So my, the more thing that I have to say about that is only to bring us back to the, to the idea that we’re in an adversarial space. And if we sort of keep this in mind, it’s like we’re in a space that’s adversarial or evasive, that requires subject matter expertise. It’s non-stationary, it’s multi-dimensional. And so once we keep those, and, and it requires context, and once we keep those in mind, then it sort of helps us frame how we want to use this database of evil, or this proprietary base, but also like what it needs to sort of be robust against, right? So if we’re in a place that requires subject matter expertise, sure, then we can keep enlarging our data, like all of our databases, right, with our intelligence analysts. And it’s non-stationary, so we need to make sure that it’s always updated. Like having a snapshot of this database isn’t enough, right? There’s gonna be new things.
00:23:15
But also if we’re, if we’re in a place that’s inherently adversarial, then we need to make sure that this database is also robust to adversarial manipulations. What does that mean? For example, if I have like a very hateful song like, you know, glorifying the Holocaust, for example like a love song glorifying the Holocaust, right? These things exist. Then, and I know that this is banned on platforms, then I can speed it up, right? And then in the comments, or in the title or summary, I can say, listen to this at half speed. And then now, now I’ve basically like made it against all, you know, made it past all kinds of defenses. 
00:23:54
And so we need to make sure, and the same thing with images, right? I can like rotate it, I can grayscale it, I can mirror it, I can do all sorts of things. And so I need to make sure that my hashing any hashing algorithm that I have is robust to these manipulations up to a point, right? Because I’m always gonna, it’s always this idea of like precision versus recall. Like, do I want to now unfairly capture things that shouldn’t be captured, right? And unfairly say that they are violative, probably not. And so it’s a tricky line, but it’s, it’s, that’s the line that is when any content moderation algorithm we’re always trying to figure out what, where things should, where the boundaries should go. 
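For readers who want a feel for how a hash database can stay robust to light manipulations, here is a simplified sketch of a perceptual “difference hash” with a Hamming-distance lookup. This is a generic, well-known technique used purely for illustration; it is not ActiveFence’s hashing algorithm, the file names are placeholders, and audio and video fingerprinting follow the same near-duplicate principle with different features.

```python
# Hypothetical sketch: a perceptual difference hash (dHash) plus Hamming-distance
# matching against hashes of previously confirmed violative images. Unlike a
# cryptographic hash, small manipulations (grayscale, mild resizing, re-encoding)
# change only a few bits, so near-duplicates still match.
from PIL import Image

def dhash(path, hash_size=8):
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    pixels = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

# Placeholder file names standing in for a store of known-violative content.
known_hashes = {dhash("confirmed_violation.jpg")}
candidate = dhash("new_upload.jpg")

if any(hamming(candidate, h) <= 10 for h in known_hashes):
    print("near-duplicate of known violative content -> escalate")
```

The distance threshold is exactly the precision-versus-recall line Matar mentions: loosen it and you catch more manipulated copies, but you also risk matching benign content.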
Jon Krohn: 00:24:31
And I guess that’s key to why having a human in the loop in these kinds of decisions matters, so that people can, if they’re unfairly forced to remove content, there should be some kind of appeals process in the platform, or yeah, or a human reviewer that can make some final decisions. So I guess kind of going back to a point earlier in the conversation, when something is really flagrant and your risk score, which I assume is kind of similar to, in machine learning, having a binary classifier where you have a confidence on, yeah, how likely is this to be malicious content, harmful content? And so if that is very high, if you’re like, okay, this is 0.99999, we’re just like, there’s no point in sending this to a human to review. But if it’s 0.8 or 0.7, then like there might be something here somebody should review before a decision is made. And yeah, same thing on the flip side, when it’s, yeah, even things that are, yeah, so if something does get flagged automatically, because there’s still, everything is probabilistic in machine learning.
00:25:35
So there’s gonna be cases where the algorithm is very confident and there’s still, due to some circumstance you’re describing where like a group has re-owned something that has been a racial slur historically, there should be the opportunity for that person to say “no, like, I have,” like “I should be able to”, yeah, so these, so it kind of works both ways.
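A tiny sketch of the routing logic Jon describes, with hypothetical thresholds; in practice each platform sets these bands according to its own policy and appetite for manual review.

```python
# Hypothetical sketch: route items by model risk score. The thresholds are
# illustrative only; a children's platform might auto-action far more
# aggressively than a news platform, and auto-actions should stay appealable.

def route(risk_score, auto_action=0.99, review=0.70):
    if risk_score >= auto_action:
        return "auto-remove (appealable)"
    if risk_score >= review:
        return "queue for human moderator"
    return "allow"

for score in (0.999, 0.82, 0.15):
    print(score, "->", route(score))
```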
Matar Haller: 00:26:00
So our risk score is exactly as you describe, and like, everything is probabilistic, and also it’s, it’s a business use case where they want to, like, how much manual review they want, right? Like maybe for a child’s, like a platform for children, they say, you know what, just ban everything. Like, so what if the kids like can’t chat about, you know, it’s fine. But maybe for like other platforms, like a news platform-
Jon Krohn: 00:26:23
The kids can’t chat about racial slurs. 
Matar Haller: 00:26:26
Yeah, but fine, I, I’m okay with that. Like, I don’t care for me. My kids can just like type three words. That’s okay. And my, my kids anyway, they’re getting cell phones when they’re 35. Like until then, deal with it. But yeah, but so, so it’s exactly that, right? But other platforms would be like, you know what, even if it’s at, at 0.99, like for the things that are 0.99s, like, we, we want to review them, because we’d, we would rather err on the side of like free speech or whatever. And so I think that also in terms of appeals, that’s a super important point. That’s something that is definitely critical in this, because this is like, you know, people are posting things and they don’t want to be unfairly punished.
00:27:05
And actually right now I think that the world of trust and safety is having its GDPR moment, right? Like GDPR, for those that are not familiar, was like privacy regulation passed in the EU that ended up having a huge sweeping effect, because basically any time that a citizen of the EU is on an online platform, then GDPR is in effect, like, applies to them, in terms of, like, you know, what cookies and what can be stored and, and so forth. And you’ve probably all seen, like, you know, the notifications on your browsers about privacy regulations. And so now trust and safety is having its GDPR moment with the DSA, which is the Digital Services Act. It was passed by the EU last year, and it basically also puts in protections for trust and safety. It codifies them by law in sort of, in a similar way, with fines and so forth. And while it’s still new for smaller tech companies, the very big online companies, it’s like already rolling in and rolling out for them, and there’s like very strict regulations on them that they need to follow. And it’ll, it’ll trickle down probably to everyone. And so like regulation, fines are on the table.
00:28:15
And so these businesses need these tools to be compliant. And part of that is also auditing and understanding. Like, part of the DSA is like auditing and understanding why things were banned and, and explaining it and so forth. And so another thing that we invest in is explainability, right? Like if I’m giving a score, then I want to be able to explain why. Like, because a lot of times these things are, you need that subject matter expertise to understand that, oh, like this particular logo is actually associated with this particular terror group or hate group, or child pornography studio or whatever.
Jon Krohn: 00:28:48
Nice. Yeah. So I can see how the evolving regulatory landscape ends up being important, probably helpful to you as you develop these algorithms. We’ve talked already about harmful content kind of in static, you know, in posted content. But there’s also, there’s something that we hear a lot in the news recently, we, we see increasingly in the news, is not just content that’s been posted, but content that is streaming real-time. So there have been incidents in the US recently of shootings being live-streamed to social media platforms. And so this happening in real-time, that must add an extra layer of complexity to some of the work that you’re doing.
Matar Haller: 00:29:40
Absolutely. There’s been like horrific instances of live streaming in the US and, and elsewhere. And so there’s a couple of ways to approach that. One of which is that you, we can put really, really small content moderation models sort of on the edge device, right? So that, that does sort of something basic to catch sort of the blatant stuff and then, you know, raise it up for, for human review. Cause I think in these, in these cases of, of live streams, it’s, it’s tricky. We’re still learning the space, and we would want someone to just, like, if we flag it, to take a look at it, maybe err on the side of flagging too much, and then having someone take a look at it. Again, it’s, it’s always, it’s a business question of like, what’s the platform? What, what are we looking for?
00:30:21
And so, you know, have some sort of like detector of, I don’t know, gunshots or of something that, you know, a small [inaudible 00:30:29] model on the edge device is able to flag right away. We can also do something where we are, you know, once the content makes it to the servers, sample frames, like there’s a question of like, what do you want to moderate? Like every single frame? Do you want to sample every minute, every second? Like at what, there’s, there’s a huge question. I think what makes this so challenging is just the scale, right? You have like so much data streaming in, and then do the same thing, sample, and then look for maybe more, more complex things, right? So those are sort of like the typical things, and that’s where you’re really focusing on the content itself, right, that’s coming in.
00:30:58
But a lot of these times when, when you’re, when you’re live streaming something like this, then you know the perpetrator may have like, you know, pre-shared this somewhere. There’s people that are, you know, joining the stream that are commenting, and now suddenly you have a much, much richer source of information, right? You can look at who are the other users, who is the user that’s streaming, what else have they streamed in the past? What other groups have they been in? What are people writing in the comments, and so forth. And suddenly now you might be able to catch it, or at least flag it, just from the surrounding information, right? Like there’s, there’s enough indicators of risk from the things that are around it where sure, you want to moderate the content, you want to look at it and so forth. However, you would want to basically look at the other markers of risk around the content itself to make your job easier and faster, and more efficient.
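Here is a hedged sketch of how sampling frames at an interval, running a cheap first pass, and escalating flagged segments along with surrounding signals could look in code. The sampling interval, thresholds, and the `cheap_model`, `heavy_model`, and `context_risk` callables are all placeholders, not real ActiveFence components.

```python
# Hypothetical sketch: sample frames from a live stream at a configurable
# interval, run a cheap first-pass model (the kind that could sit on an edge
# device), and escalate flagged segments together with surrounding signals
# (comments, streamer history) to a heavier model or a human reviewer.

SAMPLE_EVERY_N_FRAMES = 30   # e.g. roughly one frame per second at 30 fps
FIRST_PASS_THRESHOLD = 0.5   # deliberately low: err on the side of flagging
ESCALATE_THRESHOLD = 0.8

def moderate_stream(frames, comments, streamer_profile,
                    cheap_model, heavy_model, context_risk):
    for i, frame in enumerate(frames):
        if i % SAMPLE_EVERY_N_FRAMES:
            continue                                  # skip unsampled frames
        if cheap_model(frame) < FIRST_PASS_THRESHOLD:
            continue                                  # benign at first pass
        combined = max(heavy_model(frame),
                       context_risk(comments, streamer_profile))
        if combined >= ESCALATE_THRESHOLD:
            yield i, combined                         # flag this segment
```

Tightening the sampling interval or lowering the thresholds trades compute and reviewer load against how quickly a violative stream gets caught.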
Jon Krohn: 00:31:44
Did you know that Anaconda is the world’s most popular platform for developing and deploying secure Python solutions faster? Anaconda’s solutions enable practitioners and institutions around the world to securely harness the power of open source. And their cloud platform is a place where you can learn and share within the Python community. Master your Python skills with on-demand courses, cloud-hosted notebooks, webinars and so much more! See why over 35 million users trust Anaconda by heading to www.superdatascience.com/anaconda — you’ll find the page pre-populated with our special code “SDS” so you’ll get your first 30 days free. Yep, that’s 30 days of free Python training at www.superdatascience.com/anaconda
00:32:29
Gotcha. So, yeah, so it obviously is more complex to be moderating harmful content when you are thinking about it in a real-time situation. But as you point out, having smaller threat detection models on the edge device, so on mobile phones, maybe on laptops, being able to detect these issues in real-time and potentially flag those to the social media platform. And then also once the real-time data is reaching the servers of these platforms, you can be sampling at some appropriate interval in order to try to detect harmful content, so yeah, the sound of gunshots, and then it can be reviewed as to whether this is like video game gunshots or not. And then something, so probably in a circumstance like that, whether it’s real-world gunshots or video game gunshots, we’re going to be able to tell more easily because of the contextual information that surrounds that. So the kind of text that people post in response is probably going to be quite different and classifiably different in a video game, where people there might be more like, well, I don’t want to even speculate on what-
Matar Haller: 00:33:41
Yeah, love, yeah. But, but, but I would say that, you know, if we’re thinking, if we’re thinking about it, and like, I’m kind of thinking out loud and like refining what I said earlier, is that, so you have something on the edge device that does like, you know, more basic, like a smaller model that does more basic content moderation. And then instead of it flagging to a human, like remember that everything is making it to the cloud. And so we’re sampling. And so things that have been flagged by the edge device can then either have like more like different, like more tightly spaced samples, or can have deeper analysis on them. Like, it, it’s, it’s basically a funnel, right? And again, it depends what you discover on the edge device. Like maybe you might want to right away flag it too and be like, listen, like this is not something that, that is very likely to be in the gray zone.
00:34:23
And then also you can look at the surrounding content. You don’t need to wait for the content to be uploaded to serve anything, right? Like, you have the surrounding content, it’s like text and who the user is, and like, where, you know, do, do we recognize this user? Where else have they posted before? The users that are commenting, like, where else are they? And this is where actually, like a graphical data model comes in handy, right? Because now you have all these relations between users and you can see like, what have they liked before, what groups are they in who have they interacted with, and so forth. And then if these are people that are known to us, then we can say, well actually, like this is a user that if we see them here, this is, it adds to the probability of risk.
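To illustrate the graphical data model Matar mentions, here is a toy sketch using networkx in which a streamer’s risk is nudged by how risky their known connections are; the graph, relations, and scores are fabricated examples, not real data.

```python
# Hypothetical sketch: a tiny graph of user relations (shared groups, comments,
# likes) used to compute a contribution to a streamer's risk from the users
# around them. All nodes, edges, and prior scores are made up for illustration.
import networkx as nx

g = nx.Graph()
g.add_edge("streamer_42", "user_a", relation="same_group")
g.add_edge("streamer_42", "user_b", relation="frequent_commenter")
g.add_edge("streamer_42", "user_c", relation="liked_posts")

prior_risk = {"user_a": 0.9, "user_b": 0.7, "user_c": 0.1}  # known-user scores

neighbours = list(g.neighbors("streamer_42"))
neighbour_risk = sum(prior_risk.get(u, 0.0) for u in neighbours) / len(neighbours)
print(f"risk contribution from surrounding users: {neighbour_risk:.2f}")
```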
Jon Krohn: 00:35:01
All right. So Matar, you’ve given us an interesting overview of how content moderation works in your automated platform for detecting harmful content. So things like contextual AI needing to be able to adapt to adversarial opponents, the flywheel between content moderation and threat intelligence that’s helpful to you, the “database of evil” and how there’s flexibility in the way that information’s hashed in there so that you can be detecting new variations that are adjacent to existing known harmful content. And then most recently we just talked about the specific circumstances of real-time streaming and how we can be addressing harmful content in those circumstances. So very interesting. And so I’m curious to what extent you can tell us about the kinds of technologies that you use to make the platform happen. So, you know, what kinds of programming languages, obviously we’re not, you can’t get into a level of detail that would allow adversarial actors to be more effective than adversarially-
Matar Haller: 00:36:09
Adversarial actors listen in now. Yeah, so I, and again, I’m giving it with the caveat that like, the parts of that that I deal with are sort of like the, the data, the MLOps, the engineering, the API. The world of front end is beautiful, mysterious to me. So I can like list technologies that they use there, but they don’t mean that much to me. I’m more, I’ve always been sort of like a backend geek. And so in terms of that, we do, obviously data people do Python, duh. We also use Node and TypeScript. We serve our models on Kubernetes. We have, we’ve done a lot of work in-house of selecting, of like writing stuff to basically select the correct instance type for a given model so that you get really good, the best utilization.
00:37:01
We’ve also, are working on like model-in versus model-out, being like, do we bake the model into the image, or when we spin up the pod, like do we bring the model in from outside, basically in order to maximize our, or minimize our, uptime? Because we basically need to be able to deal with really, really high throughput and low latency. And also we don’t want to be just like burning money on machines that are up for no reason. And so we have HPA that we then tune, and then we can basically spin up and spin down our machines as we need, and then be smart about which machines we’re spinning up. And also if we’re able to sometimes sort of put multiple models on the machine, or batch the requests to the machine, and we do all sorts of optimizations to make sure that we’re meeting our high-throughput SLAs.
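As a flavor of the throughput-versus-latency optimizations Matar describes, here is a hedged sketch of micro-batching inference requests: hold each request only briefly, run the model once per batch, and cap the wait so latency targets still hold. The queue shape, limits, and model call are placeholders, not ActiveFence’s serving code.

```python
# Hypothetical sketch: micro-batch incoming requests so the model runs on
# batches (better utilization and throughput) while capping how long any
# single request can wait (latency). Limits below are illustrative only.
import queue
import time

MAX_BATCH_SIZE = 16
MAX_WAIT_SECONDS = 0.02  # never hold a request longer than ~20 ms

def serve_loop(request_queue, model):
    while True:
        batch = [request_queue.get()]            # block for the first request
        deadline = time.monotonic() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        inputs = [req["input"] for req in batch]
        for req, score in zip(batch, model(inputs)):  # one forward pass per batch
            req["reply"](score)                       # hand the result back
```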
Jon Krohn: 00:37:52
Nice, very interesting, thank you for being able to go into even that level of detail. So clearly you have a really deep understanding of not just data science and modeling, but of backend engineering. So like scaling, being able to meet SLAs, Kubernetes. So, super interesting. I didn’t know from my research beforehand that you had that kind of expertise as well. So let’s dig into your background a little bit to see how this all came about. So you did a neuroscience PhD at UC Berkeley which is I think, a great decision. I also did a neuroscience PhD. So I, for me, that was something that I got into it because I was fascinated as to how chemicals, biology, physics create a conscious experience. And so like everything that you think, everything that you do in some way that we obviously are nowhere near fully elucidating can be reduced down to physical processes.
00:39:03
And so I wanted to dig into that as much as I could. But then as I got started in the PhD, I was like, wow dataset sizes are getting really big. It seems like there’s really interesting things that we could be doing there, detecting patterns in data, identifying causal direction in data. And so I went down this road of focusing on programming and machine learning because I knew that whether I stayed in academia or not, those would be transferrable skills. And I’m not surprised, I guess that that ended up being true. However so in your PhD, I know that you were recording activity from surgically implanted electrodes in human brains. And I made this joke before we started recording about how, you know, I felt, you know, I really feel like I made the right choice sticking to silicon experiments or analyzing data as opposed to doing things in you know, learning how to implant electrodes into a ferret.
00:40:03
And I was making this joke about how, you know, people in my cohort, in my PhD were doing that kind of thing. But then the exact person that I was thinking of ended up with a really nice job at Google DeepMind. So there’s, so it seems like you have insight into that. So I ended up getting here [crosstalk 00:40:21], but we kind of have this. So tell us about your PhD, how that relates to work you’re doing today. There’s also, there’s the Insight Data Science Fellowship program that you used to transition from your PhD, from your academic background, into industrial data science. So it’d be interesting to hear about that. And then to finally, have it all make sense, as to how I started off this entire long transition, is that I mentioned how you have, it’s, you have a rich understanding of the backend of a software platform. And so just kind of how this all came about, your rich depth of knowledge in the field.
Matar Haller: 00:41:04
Yes. The first question, let’s start with, with the first one. So my PhD, so yes, my PhD. I was recording electrodes, recording data from electrodes surgically implanted in human brains. Basically what I wanted was, you know, animal-quality data from humans, right? With animal research you can stick electrodes where you want, get really beautiful data. You could do it in slices. You could do, you know, recordings from monkeys trained for years to do a task, and, and then get just beautiful recordings of just like signals for hours and hours of neurons at work. And with humans, you’re often limited to things that are either slow, so fMRI, and it’s like, you know, you see it many, many, many, many, many degrees removed. You’re actually measuring blood flow. You’re not even measuring like direct brain activity. So you’re like measuring a side effect of thinking. Or you can do EEG, which is then, you know, electrical signals filtered through the scalp. And even with, with technologies like MEG and so forth, which is magnetoencephalography, it’s, it’s not the same. You’re not, you’re not at, you’re not on the brain. And then this really unique opportunity opens up in the laboratory of Dr. Robert Knight at Berkeley, which [inaudible 00:42:12] is basically to work with patients that are undergoing brain surgery, often for epilepsy. So epilepsy that can’t be treated with medicine, right? They keep having recurrent seizures. And so the only solution is to go and to surgically remove the problematic area of the brain. However, before that’s done, you need to map out the brain to ensure that it’s not that, you know, you stop the seizures, but the person is left, you know, aphasic, like they can’t speak, or you stop seizures and suddenly they can’t, you know, they’re blind.
00:42:40
And so what you want to do is you want to basically map out the areas of the cortex around the region of interest to ensure that first of all, you can localize exactly where the seizure is. Because remember, until you really get in there, everything is filtered through the scalp and you have, like, it’s really, you can’t tell. And so to figure out like where the focal point is and also what’s around it. And so these people, basically, what happens is they come in for surgery, they have a craniotomy. The scalp is, the skull is removed, the electrodes are implanted, and then they’re bandaged up, and then they’re in the hospital for a week with electrodes coming out of their brain, hooked up to a pre-amp, and then to an amp. And the best-case scenario for, and then their meds are tapered. And there, there’s, there’s all kinds of things.
00:43:29
All, many, many things are done to try to induce a seizure, because the best-case scenario is basically on the first day they have seizures. You localize it, you figure out exactly what’s around it, and then like the next day they’re, they’re in surgery, they remove it, and you’re done. Oftentimes, that’s not the case. You need to work to induce a seizure, sleep deprivation, strobe lights, all sorts of things to get them, right? And so we could, most of the time these people are like sitting in the hospital just kind of like waiting around, right, watching TV or, you know, whatever. And then we come in and, and if they, if they consent, then, then we can come and we can run all sorts of tasks. Basically, we ensure that the task is matched to the regions of the brain that are mapped, right? There’s no point in doing a memory task if, you know, the parts of the brain are only motor and so forth, or a motor task if they’re only looking at language regions.
00:44:18
And I found this job like very, very meaningful in terms of science outreach, like explain to them the value that they have for science and why I’m doing what I’m doing. And I spent a lot of time just, you know, talking to them about the brain and about my research and so forth. I also found it like emotionally incredibly difficult because you’re meeting these people pretty much at the worst time of their lives, right, like, this is just a terrible situation to be in. And so it’s, it’s challenging, it’s rewarding, it’s basically, you know, in a way that you don’t expect your PhD to be, right? Like, you go and you’re like, gimme data, I’ll analyze data. Great. And then there’s, there’s this.
00:44:57
And so I would come in and I would, the task that I was interested in, I was basically interested in sort of tracking the path of a decision in the brain. So I would basically, in the beginning I just had my one task, but then I realized that basically all the tasks that we recorded were decision tasks. So I entered into like, dipped my foot into like the world of, you know, big data, where I could basically take all the tasks that were run. And anytime that there’s sort of a decision to be made, then you have sort of like a motor decision, right? Like you see a stimulus, right? So that goes, or you hear something, so that goes into like your visual cortex or through auditory cortex, and then it needs to make it to the decision-making area of the brain, right, the prefrontal cortex, so it sort of moves forward in the brain. You need to make a decision.
00:45:37
The decision is made, and then it needs to go back to the motor cortex to execute the decision. And what I wanted to do was, I basically tracked that, that loop in the brain and was basically able to look at the activity in the prefrontal cortex and basically say, aha, look, like the sustained activity that I see in the prefrontal cortex correlates with, you know, the reaction time, right? I’m able to sort of see when they’ll trigger a reaction without looking at the motor cortex. I can look, I can see errors, I can see that like the amplitude is also correlated with errors, and basically tracking a thought through the brain. And because I was on the brain, I had extremely fast like recordings of what was going on. So it wasn’t filtered through anything.
Jon Krohn: 00:46:20
The future of AI shouldn’t be just about productivity. An AI agent with a capacity to grow alongside you long-term could become a companion that supports your emotional well-being. Paradot, an AI companion app developed by WithFeeling AI, reimagines the way humans interact with AI today. Using their proprietary Large Language Models, Paradot A.I. agents store your likes and dislikes in a long-term memory system, enabling them to recall important details about you and incorporate those details into dialog with you without LLMs’ typical context-window limitations. Explore what the future of human-A.I. interactions could be like this very day by downloading the Paradot app via the Apple App Store or Google Play, or by visiting paradot.ai on the web. 
00:47:03
That is so fascinating. I really did just sit at a computer and, like learn the programming languages and machine learning algorithms, which was interesting. But wow. I mean, yeah, you’re really doing real valuable work. And so I think some of that work would go back to like Wilder Penfield. 
Matar Haller: 00:47:25
Yes. So that’s, that’s when you’re actually looking and you’re, yeah, I mean, Wilder Penfield is like the grandfather of, of everything that we did. The electrodes that I was using, and again, like things here moved really, really fast. And my PhD was a while ago. I’m not dating myself, but it, it wasn’t yesterday. And electrodes have gotten since then smaller. There’s also a lot of single-unit recordings where you can actually put in an electrode and record from like smaller populations, right? My electrodes were, were pretty big and kind of far apart. So I’m recording from larger populations, but yes, it all comes back to like classic neuroscience. And as I was doing it, so that’s like the data collection, right? But then you collect the data, and it’s such a rich dataset that like it can sustain you forever.
00:48:11
And like the datasets that I collected are still being used for studies, because it’s like, so it’s like rare data. It’s expensive data, it’s rich data. You can look at it many different ways. And so I took my data and data from other studies, and then, and as I was, as I was working, I said, you know, the brain is amazing, right? There’s like, no, I don’t think anyone in our field can argue about that. However, what I found myself like drawn to were the algorithms, the machine learning, the statistics, the signal processing, the programming languages, like all of the things that maybe you would say, “oh, those are just the methods”, I found that those, that those were the parts of the papers that I was reading, those are like the algorithms. Those were the parts that I was like most drawn to. And so then it was kind of like a natural transition from there to be like, okay, like this is, this is what I, this is what I’m actually more, more interested in most of the time. So that was how I found myself there. I did an, like an NLP class towards the end of my PhD, kind of in secret. And then like the rest is history.
Jon Krohn: 00:49:15
Nice. Yeah. And then the Insight Program. So we’ve had guests on in the past that did this fellowship as well. So it’s intended for, I think primarily people who have already done an academic PhD, have a strong quantitative background like you did. You were doing tons of, like you say machine learning, kind of data science techniques. So things like time series analysis, dimensionality reduction, clustering, regression, data permutation. So you had all this existing experience. And then, so the Insight Data Science Program was that a useful transition after you’ve done all that? You’d done the secret NLP course, and yeah, was that still useful for making the transition to industry? 
Matar Haller: 00:50:03
Oh, absolutely. Insight Data Science is wonderful. They, what they do is they basically, they take people that, you know, we already have all of these skills and we’ve done, we’ve done data science and you know, we’ve done machine learning and we’ve done programming, but we’re totally, totally clueless about the real world because we’re academics, right? We know nothing. And basically they help us sort of frame what we’ve done in the context of industry. So talking about like startups, and funding, and jobs, and like, what is it like to work? And then also things just like best, best practices, right? Like, you know, version control and things like that, which some people do in their PhD, some people don’t do, like, some people’s PhD is like a hundred percent in like a Jupyter in like a single Jupyter notebook. And so basically like they kind of get you up to speed for like, and like some, like for gaps that you have there, but also just frame what you’ve done in the context of, okay, this is industry, like here is why what you have is valuable here is how you can use the things that you’ve done in industry.
00:51:05
So people come in and they show you, “Hey, look, here’s fraud detection at this company.” And you’re like, “Oh, hey, I’ve done that,” and they sort of tie it together for you. Also salary negotiation, all of these things. And then what you end up doing is, they say, okay, in your PhD you had five, six, seven years to work on something. Now you have three weeks to put together a project that actually brings concrete impact and that you can pitch. That puts you in the mindset of, what is a POC? And it also helps employers get around the bias of, “God, I don’t want to hire an academic, I’ll never get anything done.”
Jon Krohn: 00:51:45
Right, right, right. And I think there are a lot of relationships between Insight and future employers. Lots of employers are looking for great data science talent, like the people taken into the Insight Data Science Program, so you end up with this flywheel. And I just want to quickly go back, before we transition away from what you were doing with your PhD and how you got to what you’re doing today. I want to talk a little bit more about Wilder Penfield, who I mentioned was, I guess, the first person to map the human cortex to that level of detail. And I think it was the same situation, many decades ago, like the 1950s or something.
00:52:30
But the same kind of thing: epilepsy patients, open skull, and recording from these individual electrodes over the whole brain. And that gave this map of the whole cortex, so there’s this somatosensory homunculus and this motor homunculus. It’s really cool. I encourage our listeners, and I’ll try to remember to include in the show notes a link to some images of this homunculus. I think homunculus is Latin for little man. And the idea is that as you go over the motor cortex or the sensory cortex in the brain, there’s this map of your body. And it isn’t anatomically correct in terms of scale. So for example, for both the sensory and the motor homunculus, the hands are huge, because you have so much detailed sensory perception and fine motor control in your hands. And I remember the lips are huge-
Matar Haller: 00:53:37
Lips, the tongue. 
Jon Krohn: 00:53:39
But yes, [crosstalk 00:53:40] back is small. 
Matar Haller: 00:53:41
Yeah. Exactly. I think we had the same textbook. So one thing you can do is take a paperclip, right, and bend it so its two points are some distance apart. And you find the closest distance on your back at which you can still tell the points apart, because at some point you’re just not sensitive enough on your back to tell them apart. And then you put it on your lips and immediately you’re like, oh wow, these are super far apart. And that basically reflects the fact that you have less sensory representation of your back than of your lips in your sensory strip.
Jon Krohn: 00:54:13
Cool. So yeah, I wanted to recap on that. And there’s a very specific reason why I brought this back up, not just because it’s really interesting, which I think it is in and of itself. You were talking about recording electrodes, and I was thinking about the recording electrodes that Wilder Penfield would’ve been working with many decades ago, not many centuries, many decades ago.
Matar Haller: 00:54:36
Wow. Thousands of years ago. 
Jon Krohn: 00:54:39
This podcast will be listened to for millennia. It’ll all be confusing. So the recording electrodes he was working with would’ve been much bigger than the ones you were working with, and you were talking about how in recent years they’ve become even smaller. And that got me thinking about how there is this push, with companies like Elon Musk’s Neuralink, toward brain-computer interfaces that eventually aren’t just for people who have serious issues and have their skull opened. There’s this move towards, in our lifetime, potentially having some way of having recording electrodes on our brains without needing invasive surgery. So I don’t know if you have any thoughts on that. And I’m also gonna be asking you after this episode whether you happen to know any amazing guests who could dig deep into that topic here.
Matar Haller: 00:55:37
Ah, yeah, I have some ideas. So actually, even today, not all brain-computer interfaces are super invasive. I mean, it’s invasive in the sense that you have an electrode in the brain, that’s invasive, right? But not all of it requires a craniotomy that opens everything up. So for example, for Parkinson’s patients you have deep brain stimulation, where they basically put in an electrode targeted very, very specifically to the substantia nigra, which is a place in the brainstem where-
Jon Krohn: 00:56:13
The black substance.
Matar Haller: 00:56:15
Yeah, there you go. Someone remembers their neuroanatomy. It’s where dopamine is basically produced, and when cells there start to die, you need to stimulate it in order to get around the fact that it’s not functioning. And again, that’s just an electrode that’s brought in and used to stimulate, and there’s the question of how you decide how much to stimulate; there’s a device that you can calibrate and then decide how much to stimulate. So that doesn’t require a massive craniotomy, and it’s already a feedback loop that’s been around for a really long time.
00:56:54
You also have things that are more invasive, but long-term. So the patients I was discussing basically have the electrodes in temporarily, right? They have the craniotomy, the electrodes are put in, wires come out of the head. Once they have seizures and everything’s localized, in the best case they remove it, the skull goes back in, the electrodes are gone. However, there are also companies, for example NeuroPace, that actually permanently implant an electrode strip in the person’s brain. Then they’re able to record ongoing, and the idea is that they can predict, or give some sort of lead time before a seizure happens, and then stimulate to stop the seizure. And they record, and that’s uploaded to their servers. And that’s also already out there; there are patients walking around with it right now.
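To make that closed-loop idea a bit more concrete, here is a minimal Python sketch of the general record, detect, stimulate pattern Matar describes. It is purely illustrative: the sampling rate, window length, frequency bands, and threshold are invented for the example, and it has nothing to do with NeuroPace’s actual, clinically validated detection algorithms.

```python
import numpy as np
from scipy.signal import welch

FS = 250           # sampling rate in Hz (assumed for this sketch)
WINDOW_SEC = 2     # length of each analysis window, in seconds
THRESHOLD = 5.0    # band-power ratio that triggers stimulation (arbitrary, not clinical)

def band_power(segment, fs, low, high):
    """Average spectral power of `segment` between `low` and `high` Hz."""
    freqs, psd = welch(segment, fs=fs, nperseg=min(len(segment), fs))
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()

def seizure_risk(segment, fs=FS):
    """Toy risk score: fast activity relative to an alpha-band baseline."""
    fast = band_power(segment, fs, 25, 60)          # fast activity as a stand-in pre-seizure marker
    baseline = band_power(segment, fs, 8, 12) + 1e-12
    return fast / baseline

def closed_loop(recording, stimulate):
    """Slide over an ongoing recording and call `stimulate` when risk crosses the threshold."""
    window = FS * WINDOW_SEC
    for start in range(0, len(recording) - window, window):
        segment = recording[start:start + window]
        if seizure_risk(segment) > THRESHOLD:
            stimulate(start / FS)  # seconds into the recording at which we would stimulate

# Synthetic demo: a quiet 10 Hz background with a burst of 40 Hz activity near the end.
if __name__ == "__main__":
    t = np.arange(0, 60, 1 / FS)
    eeg = np.sin(2 * np.pi * 10 * t) + 0.1 * np.random.randn(len(t))
    eeg[-FS * 4:] += 3 * np.sin(2 * np.pi * 40 * t[:FS * 4])
    closed_loop(eeg, stimulate=lambda secs: print(f"stimulate at {secs:.1f}s"))
```

Real systems are patient-specific and clinically validated; the point here is just the shape of the loop: record continuously, score a short window, and act when the score crosses a threshold.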
00:57:43
And so I think Elon Musk’s Neuralink is like the next step of that, where it’s like, okay, it’s not clinical, and let’s see how we can get it smaller and smaller. So if we think about Moore’s Law, things fitting in more and more and getting smaller and smaller, I think we’ll be there shortly. And then it’s a matter of how you make it minimally invasive, because at some point the question is how long it can be in before cells start to die, before the body starts to reject it. And there’s a difference between just recording versus stimulating, and what does it mean to stimulate, and at what frequency?
00:58:22
And I think there are some really, really interesting questions, because, and here’s another tangent, we already see that brains oscillate, right? So we have oscillations in the brain, where different frequency bands are associated with different processes. So alpha, which is between 8 and 12 Hz, is often associated with visual cortex, and you have beta, which is about 15 to 30 Hz, and that’s for motor movement: when you initiate motor movement, you have beta suppression. So there are things like that. But what we also see is that there’s individual variability in these frequency bands, right? So my beta is not your beta, my alpha is not your alpha, my theta is not your theta, and so forth.
00:59:05
And so now, any time we’re going to go in and start stimulating, you’re going to say, okay, well, I’m going to stimulate at a particular frequency. But what is that frequency, right? How do I determine that frequency? How do I know what frequency is ideal for me versus for you, for whatever the desired result is? Now, again, right now that’s far away. But there are already stimulation protocols, even completely non-invasive ones. There are things like TMS, transcranial magnetic stimulation, where people have been experimenting with that, and there’s also research about using stimulation for psychiatric disorders and so forth. So it’s a huge, huge field. And hopefully that wasn’t too much of a tangent.
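Since the conversation names specific bands (alpha at roughly 8 to 12 Hz, beta at roughly 15 to 30 Hz) and the fact that “my alpha is not your alpha,” here is a small, hypothetical Python sketch of estimating an individual’s peak frequency within a canonical band, which is one reason a one-size-fits-all stimulation frequency is tricky. The band boundaries, sampling rate, and synthetic signals are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import welch

# Canonical band boundaries from the conversation; exact definitions vary across labs.
BANDS = {"alpha": (8, 12), "beta": (15, 30)}

def individual_peak_frequency(recording, fs, band="alpha"):
    """Return the frequency (Hz) where this recording's power peaks within a canonical band."""
    low, high = BANDS[band]
    freqs, psd = welch(recording, fs=fs, nperseg=4 * fs)  # roughly 0.25 Hz resolution
    in_band = (freqs >= low) & (freqs <= high)
    return freqs[in_band][np.argmax(psd[in_band])]

# Two synthetic 'subjects' whose alpha rhythms peak at different frequencies.
fs = 256
t = np.arange(0, 30, 1 / fs)
subject_a = np.sin(2 * np.pi * 9 * t) + 0.5 * np.random.randn(len(t))
subject_b = np.sin(2 * np.pi * 11 * t) + 0.5 * np.random.randn(len(t))

print(individual_peak_frequency(subject_a, fs))  # approximately 9 Hz
print(individual_peak_frequency(subject_b, fs))  # approximately 11 Hz
```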
Jon Krohn: 00:59:46
No, not at all. I obviously found it super fascinating. And I think any of these kinds of discussions, around how we can use technologies to adapt our brains, either to resolve some negative issue, like you’re saying, from seizures to psychiatric issues, all the way through to potentially having enhancements, are worth having. I know some of these brain-computer interface, BCI, technologies are designed not just for resolving issues but also to potentially augment human capabilities in ways that we probably can’t predict yet. So I think it’s super, super interesting, and yes, I will be following up with you to see if you have recommendations for people who could dig into a BCI episode.
01:00:40
So you mentioned, Matar, how your PhD was more intense than some other neuroscience PhDs, and certainly orders of magnitude more intense than my PhD was in terms of being really in the real world and dealing with patients. But that isn’t the only intense job you’ve had. So, am I reading this correctly? You were teaching children how to use tanks? Preschool tank instructor? No, wait, it’s two separate items. So you were a preschool teacher, and you were also a tank instructor. I’m curious as to whether those experiences helped prepare you for your career. And in particular, and this might seem tangential, but it wouldn’t surprise me if it somehow ties into an answer, I know that you’re passionate about expanding leadership opportunities for women in STEM careers, including data science. So I wonder if we can somehow tie those two topics together.
Matar Haller: 01:01:41
Yeah, sure. Why not? So, my military service: in Israel there’s mandatory military service. I actually took kind of a strange route. Normally, when you’re 18, that’s when you start your military service. I actually did my undergrad-
Jon Krohn: 01:02:01
You went to Berkeley.
Matar Haller: 01:02:03
I did.
Jon Krohn: 01:02:04
Right. Yeah. You went to Berkeley for an undergrad and then back to Israel to be a tank instructor and then back to Berkeley to do your PhD. 
Matar Haller: 01:02:13
Yes, correct. True story. And so- 
Jon Krohn: 01:02:15
And then now you’re back in Israel again. 
Matar Haller: 01:02:18
Why can’t I make a decision? Yes. So I did it a bit backwards, and I decided that for my military service, not that there’s a ton of flexibility in what you do, but there’s some, I wanted to try out for something very different from anything I would ever do. I said, I’m probably going to be in an office for the rest of my life, so I want to do something very different. And I also wanted to do something that was kind of scary to me, that I was pretty sure I’d fail at, that I thought would be really difficult, something completely out of my comfort zone, because I think that’s important.
01:02:56
And I said, you know, the risk here isn’t that high. Worst case, I won’t be that great in the military, and that’s fine. So I tried out to be an instructor, and then specifically I saw a tank and I was like, that machine is amazing, I want that. So I tried out specifically to be a tank instructor. The way it works, at least back then, is that they have women who train the soldiers. In a tank you have the gunner, the driver, the loader, and the commander, and my role was to train the gunners. Commanders have to do all of the roles, and officers have to know all of them, so I was also training the commanders and the officers.
01:03:55
And specifically for training the gunners, I was the trainer on the weapons subsystems, so basically all of the computers, the computer systems within the tank for the weapons subsystem. And it was kind of tricky to do my undergraduate degree before my military service, because I would ask my commanders questions like, “so the algorithm that it uses to figure out what angle to open fire at, does it learn through reinforcement learning?”, and they were just like, what? Who are you? What planet did you come from? So that’s what I did my military service in. And it was incredibly physically difficult, because they basically make us go through everything: I did basic training, had to do like a million and one pushups, run around outside a lot, not sleep. I was trained on all of the subsystems of the tank, not only the gunner’s, basically to learn everything that goes on there and then focus in on one.
01:05:03
Because only after you do basic training and learn everything do they say, okay, now you’re going to focus on this. And it basically checked all the boxes of being really, really hard, incredibly challenging. It turns out the physical part isn’t the tough part; it’s mentally very, very difficult. And that kind of set me up to be less afraid of failure, because it was tough. After that I was a preschool teacher, which was by far, by far the most difficult job I’ve ever had. By far.
Jon Krohn: 01:05:38
Oh, really? 
Matar Haller: 01:05:40
Oh, yeah. 
Jon Krohn: 01:05:41
Wow. Even as you were saying that, I thought you were going to say the opposite. I had this idea of cuddles and laughter, and how much nicer that would be.
Matar Haller: 01:05:50
Yes. Tons of cuddles, tons of laughter, but so physically draining, and so emotionally draining. You know, I would dream about my kids; it never leaves you. You dream about these kids and you’re thinking about them. And I was also more sick than I’ve ever been, constantly sick, always on some sort of antibiotics. No, but it’s very, very challenging. But it also gives you a different perspective; it’s another way to do something that’s really tough. And both the military and being a preschool teacher are incredibly, incredibly humbling. Very, very humbling. And I think that’s the biggest takeaway for me from those things that I did.
01:06:35
And now wait, wait. I need to tie it back in to women in STEM. So, now I’m a mom. I have three kids: a two-year-old, an almost-five-year-old, less than a month from five, and an eight-year-old. So first of all, if I’m tying it all into content moderation and why I do what I do, I think it’s extremely obvious. Online harm can turn into offline harm, and I do want to make all interactions safer. And then I see my daughter, she’s eight, and I see what it means to be a woman in this world and a leader in this world.
01:07:19
And I want to make sure that she has role models, so that she isn’t the only woman in her computer science class, been there, so that she isn’t the only woman in meetings, been there. I want to make sure that she has a much more welcoming environment for whatever she wants to do. And what’s really sad to me is that even now I’m hearing from her things like, oh, well, boys are better at that than me. No, not true, very not true, and here’s why it’s not true. So these are the kinds of things that, first of all, I want to make sure aren’t out there online, speaking of disinformation, but I also want to make sure that the environment she’s growing up into is much more welcoming.
Jon Krohn: 01:08:05
Nice. Well, it’s cool to hear how your passions come through across all aspects of your life, and that you’re tying together the personal things that you’d like to see in the world with what you’re doing professionally with respect to things like disinformation. So we were talking about you being in Israel; obviously that’s come up a number of times in this episode, along with the military service. Another thing that is unique about Israel is that it has very high R&D expenditure per capita, markedly higher than any other nation on the planet. And that probably creates an interesting flywheel with the strong tech startup ecosystem in Israel, which, you know, helps generate more things that R&D can be spent on.
01:09:00
But another interesting piece related to this, and I can’t remember if this was a podcast conversation I’ve had in the past, I don’t think it was, so I think this is the first time we’ve talked about it on air, is that my understanding is that another thing fueling tech startups in Israel is this mandatory military service. So you went and did tank instruction, but a lot of people, particularly, I suspect, a lot of people who already had undergraduate degrees like you did, end up doing things where they’re not training to be on the front lines; instead they’re training to do threat intelligence, they’re training to do signal detection, they’re using machine learning and data analysis in the field. And then, having developed that skill set over several years, when you finish you’re like, well, what could I do? And one idea that I guess a lot of these people have is, well, I could be making a startup, I could be using these technology skills in industry.
01:10:05
So we have these flywheels; I guess there are two flywheels here. There’s one where the mandatory military training leads people to become tech entrepreneurs, and that probably in turn is also helpful for military capabilities in general. And then you have this separate flywheel of R&D, where this strong tech ecosystem is a self-fulfilling prophecy of, “oh, great, we should be investing more in this,” and so then more people go into that. And yeah, I’ve now talked a lot, a lengthy transition. The floor is yours.
Matar Haller: 01:10:47
So yes, yes, and yes. So yeah, we have mandatory military service. It’s currently set, roughly speaking, at two years for women and three years for men, again with lots and lots of caveats. First of all, there’s definitely a big investment by the military in technology, whether it’s signal processing or AI or whatever. So you have people who are trained in that, like you said, and then they can go out with this skill set, and everyone’s hiring those people; we hire them too. But even for people who aren’t going into these sorts of fields, the fact that there’s mandatory military service means that from a young age you’re in a place where you’re picking up skills that are necessary to succeed in these companies, right?
01:11:39
So, for example, leadership skills, right? In most cases, in order to become an officer in our military, you have to start at the bottom. It’s not like in the US, where you have West Point and the Naval Academy or whatever, and that’s how you become an officer. Basically you start when you’re 18, and then, based on different parameters, you can elect or be chosen to do officer’s training. So then you have these people leaving the military with a skill set of being very focused, with leadership skills and managerial skills and time management skills, all these things where you go, oh, that’s what makes a successful entrepreneur or a successful CEO.
01:12:17
So yes, one of them is on-the-job training, and the other is just, in general, these other skills you need to have. And another thing that I think is really positive about the mandatory military service is that it’s this sort of equalizing force, right? Everyone goes into the military, almost, huge caveat, which is causing a lot of social unrest here right now, but we’ll leave that for a different time. But you go into the military and you’re mixed with different people, right? So that’s also a way of meeting people you wouldn’t necessarily otherwise meet, out of your echo chamber, out of your specific place. And that can also be an incubator for new relationships that can then go off and start new companies. And then yes, I think the fact that we have a very, very strong investment in R&D is also, like you said, a self-fulfilling prophecy. What do people end up doing? They go into this field, right? That’s what we know, that’s what we see, and it’s also a very good path to upward mobility for people.
Jon Krohn: 01:13:17
And so, with our field in particular, data science, do you think all of this R&D in Israel will give Israel an edge in AI technology specifically?
Matar Haller: 01:13:26
Yeah, so yes, absolutely. I think we’re already seeing that. I have people I work with, one of whom used to be very, very senior in the military in AI; I work with her very closely, and she’s now our VP of Product. She was very, very senior in the military, building AI infrastructure capability, so there’s already this sort of cross-pollination. We also have people that, like I said, we hire right out of the military, or in some rare cases we have people who start doing their studies first, right? The military says, okay, you can take this time, we pay for your studies, and then you sign on for a certain amount of time later with the military.
01:14:14
And in some cases we can also hire these people while they’re doing their studies, and then the skills they learn with us they can go and use in the military. So there is definite cross-pollination that we’re seeing. And I think it also makes AI a very, very strong and core component of the industry here, because it’s so useful, not only in the military but in general across all of the companies. And so there’s this very, very rich community here of researchers, practitioners, and so forth.
Jon Krohn: 01:14:49
Great answer, crystal clear and exciting to see what ActiveFence and other AI companies will be doing out of Israel in the coming years and the coming decades. This has been an awesome episode, Matar. So I was promised that you were this extraordinary speaker and you have proved to be an amazing communicator. It’s been a real joy to speak to you. 
Matar Haller: 01:15:11
Thank you. 
Jon Krohn: 01:15:11
And so I’m sure our audience loved this conversation as well. Thank you. And so we covered a lot of interesting topics, automated harmful content detection, neuroscience, military service, preschoolers, so I’m sure our listeners will want to hear more from you. So first my penultimate question that I always ask guests is whether you have a book recommendation for our audience? 
Matar Haller: 01:15:44
Of course. So this has nothing to do with anything that we talked about, but I really like the book Under the Banner of Heaven. It’s by Jon Krakauer, who wrote Into the Wild, I think.
Jon Krohn: 01:15:56
Oh yeah, I’ve heard that. 
Matar Haller: 01:15:59
I love reading books about sort of like other, other lives or other, other places. And so Under the Banner of Heaven is, is a good one. 
Jon Krohn: 01:16:09
Nice. Yeah, Jon Krakauer is an outstanding author, based on Into the Wild, so I’m sure that’s a great recommendation. He’s also an annoying person for me when I start typing my name into Google; he’s the one who comes up until I get to the “o” in my last name. So I’m always reminded of him.
Matar Haller: 01:16:29
Maybe that’s what primed me to think of that book, of all the books I could have recommended.
Jon Krohn: 01:16:33
There you go. And yeah, and then my final question for you is how should people follow you and glean more insights from you after the program? 
Matar Haller: 01:16:42
So, I’m on LinkedIn, like everyone. We also have an R&D tech blog for ActiveFence at engineering.activefence.com, and that’s where you can read more about the things we do and dive into some more details. And please feel free to shoot me an email or reach out to me on LinkedIn. I’m always happy to chat.
Jon Krohn: 01:17:06
Nice. Thank you for making that offer to our listeners, Matar, and thank you so much for being on the program, especially on such short notice; we booked you just days before recording this episode.
Matar Haller: 01:17:18
Oh, don’t say that. It makes it seem like I have no life. 
Jon Krohn: 01:17:22
Yeah, right. I mean, it actually just shows how kind you were to make this time, because you’ve got three kids and you’re the VP of Data and AI at a very fast-growing, highly valued company. So thank you for making the time, despite all that, to fit our SuperDataScience listeners in.
Matar Haller: 01:17:45
Happy to. 
Jon Krohn: 01:17:46
Nice. Well, yeah, so you mentioned potentially being on the show again in the future, and that sounds great to me. We can hear about how ActiveFence continues to shape this harmful content reduction space in the years to come. Thanks, Matar.
Matar Haller: 01:18:04
Thank you for having me. This was fascinating and a lot of fun. 
Jon Krohn: 01:18:12
I loved this conversation today. I hope you did too. In today’s episode, Matar filled us in on how an ML model such as a binary classifier can become contextual by taking into account additional context. For example, we can pull out a logo from an image, identify the individual in an image and compare it with a database, examine natural-language comments, and consider the content poster’s history and graph-network affiliations. She also talked about how real-time streaming of harmful content presents unique challenges that can be addressed by smaller models on edge devices like phones, by sampling on servers, and, again, by taking context into account. She talked about how we can create a flywheel of defensible commercial AI systems by amassing proprietary data curated by internal experts. And she talked about how she uses Python, Node.js, TypeScript, and Kubernetes for developing ML models, deploying them into production, and scaling them up for ActiveFence’s users.
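To make the “contextual” part concrete, here is a minimal, hypothetical Python sketch of the general pattern described above: start from a base classifier’s score on the media itself, then fold in contextual signals such as logo matches, comment language, the poster’s history, and graph-network affiliations. The field names and weights are invented for illustration; they are not ActiveFence’s actual system, which relies on learned models and expert-curated data rather than hand-tuned weights.

```python
from dataclasses import dataclass

@dataclass
class ContentSignals:
    """Hypothetical feature bundle for one piece of user-generated content."""
    classifier_score: float       # base model's probability that the media itself is harmful
    logo_match: bool              # a known harmful-group logo was detected in the image/video
    comment_toxicity: float       # NLP score over the natural-language comments (0-1)
    poster_prior_violations: int  # how many of this user's past posts were removed
    risky_group_memberships: int  # graph-network affiliations with previously flagged communities

def contextual_risk(s: ContentSignals) -> float:
    """Blend the base classifier with contextual signals into a single 0-1 risk score.

    The weights below are arbitrary placeholders; a production system would learn
    them (or use a second-stage model) rather than hand-tune them.
    """
    score = 0.5 * s.classifier_score
    score += 0.2 if s.logo_match else 0.0
    score += 0.15 * s.comment_toxicity
    score += min(0.1, 0.02 * s.poster_prior_violations)
    score += min(0.05, 0.025 * s.risky_group_memberships)
    return min(score, 1.0)

# A borderline piece of content can look very different once its context is considered.
ambiguous_post = ContentSignals(
    classifier_score=0.45,
    logo_match=True,
    comment_toxicity=0.8,
    poster_prior_violations=3,
    risky_group_memberships=2,
)
print(contextual_risk(ambiguous_post))  # about 0.66, up from a base score of 0.45 -> escalate for review
```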
01:19:07
As always, you can get all the show notes, including the transcript for this episode, the video recording, any materials mentioned on the show, and the URLs for Matar’s social media profiles, as well as my own, at www.superdatascience.com/683. That’s www.superdatascience.com/683. Your feedback is invaluable, both for spreading the word about this show and for helping me shape future episodes more to your liking. So please rate the show on whichever platform you listen to it through, and feel free to converse with me directly through public posts or comments on LinkedIn, Twitter, and YouTube. All right, thanks to my colleagues at Nebula for supporting me while I create content like this SuperDataScience episode for you, and thanks of course to Ivana, Mario, Natalie, Serg, Sylvia, Zara, and Kirill on the SuperDataScience team for producing another captivating episode for us today.
01:19:54
For enabling that super team to create this free podcast for you, we are deeply grateful to our sponsors whom I hand-selected as partners because I expect their products to be genuinely of interest to you. Please consider supporting this show by checking out our sponsors’ links, which you can find in the show notes. And if you yourself are interested in sponsoring an episode, you can get the details on how by making your way to jonkrohn.com/podcast. Finally, thanks of course to you for listening. It’s because you listen that I’m here. Until next time, my friend, keep on rocking it out there and I’m looking forward to enjoying another round of the SuperDataScience podcast with you very soon. 