How do we teach computers to see? In this episode, LexSet founders Francis Bitonti and Les Karpas talk about advances in Computer Vision.
Welcome to the Decoded Podcast, presented by GS1 US, where today's thought leaders help us crack the code on emerging technologies. Hello, everyone, my name is Reid Jackson, I'm the Senior Director of Corporate Development at GS1 US, and on today's episode, we're going to be discussing computer vision. Our two guests today are Francis Bitonti and Les Karpas. They're the co-founders of a startup called LexSet. They're innovators, patent holders, entrepreneurs, designers, authors, technologists, and more. Guys, welcome to the show.
Thank you so much for having us.
Thank you so much. It's a pleasure to be here.
Now we have, for the first time on the Decoded by GS1 US Podcast, two guests at one time, which always makes it fun. We'll do our very best here. Today's topic is computer vision. It's a big topic. It's also closely related to artificial intelligence, which is an even bigger topic, and there are so many things I want to discuss with you guys today, but I know we won't get through everything. But, if we could, let's just start with a little background on both of you and how you got involved with computer vision. Francis, if you would, let's start with you, a quick introduction.
Yeah, sure. My background is actually computer graphics, originally. I went back to school and got an architecture degree, and I've always been working on solving 3D spatial problems. For the last eight years, I founded and ran a product studio called Studio Bitonti, where we became very, very involved with 3D printing and building software applications for what we call generative design, using algorithms to solve design problems. That was actually how I met Les.
That is amazing. I really wasn't expecting that; I've met both of you individually, and we've talked. That is such a cool story. Boiling that down: you guys are working in this 3D printing space, you're talking at all these conferences, you keep bumping into one another. You find out that you like each other's personalities, and then you find this opportunity and you're like, let's start a business. Does that sum it up?
Yeah, that's pretty much what happened.
Yeah.
That is so cool. I also noticed, just doing a little research, you guys have a bunch of patents around 3D printing, manufacturing, and fabrication. It always blows my mind when I'm around people that are like, "I patented that, I patented that." It's very cool. Awesome. All right.
Sure. I would say that, in the last five to 10 years, we've seen a real renaissance in the power of GPU computing and graphics cards. That has ushered in a massive new wave of deep learning on top of the existing machine learning and AI that was the status quo before the revolution kicked off. This massive acceleration of compute unlocked all of these new applications that weren't previously possible, and it happened at the same time that drones and service robots and autonomous vehicles and other new technologies needed advanced forms of computer vision. That created a perfect storm for the technology to grow and advance really rapidly.
I would only add to that a bit. I think that at the core of all this is deep learning. You hear a lot of buzzwords, and we don't always know what all these things mean, but deep learning really just means you have many, many layers in a neural network. These techniques are not new, but to Les's point, they've been very, very expensive to compute. They're incredibly powerful and flexible. The neural net will in many ways solve the problem for you if you can throw enough data and compute at it.
I think what's happened is with GPUs and the economics of computing, we've reached a point where everybody can start to leverage these algorithms, and you don't really need a supercomputer. They're very adaptable and then they're capable of solving a whole wide array of problems. I think that's really what's driving it right now.
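To make "many, many layers" concrete, here's a toy sketch in plain Python. Everything in it is an arbitrary illustration, not code from any real system: each layer is just a weighted sum followed by a nonlinearity, and "deep" simply means stacking many of those layers.

```python
import random

random.seed(0)

def relu(x):
    # Standard rectified-linear nonlinearity, applied elementwise
    return [max(0.0, v) for v in x]

def dense(inputs, weights):
    # One fully connected layer: a weighted sum per output neuron
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]

def make_layer(n_in, n_out):
    # Random weights, purely for illustration (a real net learns these)
    return [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

# "Deep" just means many stacked layers: 4 inputs, 8 hidden layers
# of width 16, and 3 outputs.
layers = [make_layer(4, 16)] + [make_layer(16, 16) for _ in range(7)] + [make_layer(16, 3)]

def forward(x):
    for layer in layers[:-1]:
        x = relu(dense(x, layer))
    return dense(x, layers[-1])  # last layer left linear

out = forward([0.5, -0.2, 0.1, 0.9])
print(len(out))  # 3 output values
```

A real computer vision network swaps these dense layers for convolutions and learns the weights from data, but the layered structure is the same idea.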
We owe a debt of thanks to all of the crypto miners out there. They went and bought hundreds of thousands of graphics cards and put them all over the world so they could mine crypto, then suddenly the bottom dropped out of that market, and there were all of these GPUs sitting around looking for something important to do. Meanwhile, the whole AI community is like, hello. That just unlocked [inaudible 00:07:10] at unexpectedly low prices.
Reid, I don't want to hijack this conversation because clearly we could talk about this all day.
I know, it's tough.
The crypto space continues to push computing technology. To Les's point, crypto has moved on to specialized circuitry. They've gotten to computing problems where GPUs aren't working anymore, but they're continuing to be one of the applications that pushes these technologies. It's a good place to look, actually, if you want to see what I think the next generation of processing technologies are going to be.
I think you both bring up excellent points, and we see it across all the technologies we talk about. It really is a web of interconnections and leveraging across platforms. We're going to have conversations on the Decoded Podcast about AI, and there's general AI and pure AI and all these other levels as you go down, talking about the differences of off-the-shelf AI, and the fact that some people are using Excel spreadsheets with complicated macros and calling it AI.
No worries, we'll stay away from that.
Cool. I love on your website you guys actually call out and we're going to talk about LexSet as a company in a little bit here, but I still think this is appropriate. You guys call out that there's a dirty little secret in AI, and that it requires vast quantities of data that is annotated by humans. AI is only as smart as the data that it's trained on. Can you elaborate a little bit more on that, Francis?
Yeah. Right now, all over the world, there are people just labeling data. If you think about it, we're using photographs to teach these algorithms things. If I want to teach an algorithm to find chairs, or soda bottles, I need millions of pictures. Those pictures weren't meant to teach a computer anything. They were meant to just capture some patterns of light. What ends up happening is you need to go through and say, no, these pixels, these colors, these relate to a Coca-Cola bottle, these relate to a chair.
A lot of folks think, hey, here's a picture and it just goes to a computer, but there's still a ton of human work associated with this.
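To make that labeling work concrete, here's a hypothetical sketch of what one annotated photo might look like. The field names are illustrative, loosely in the spirit of the widely used COCO annotation style, not any particular company's schema: a human has drawn boxes around the pixels that correspond to each object class.

```python
# One hypothetical annotation record a human labeler might produce for a
# single photo. Field names are illustrative, COCO-style; "bbox" is
# [x, y, width, height] in pixels.
annotation = {
    "image_id": 12345,
    "file_name": "shelf_photo.jpg",
    "objects": [
        {"category": "soda_bottle", "bbox": [40, 110, 60, 180]},
        {"category": "chair", "bbox": [300, 50, 220, 400]},
    ],
}

# Training pipelines consume millions of records like this one, which is
# exactly the human effort being described.
labels = [obj["category"] for obj in annotation["objects"]]
print(labels)  # ['soda_bottle', 'chair']
```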
Yeah. My favorite anecdote to bring up is about the value of, I guess, click farm annotation. There's this company called Scale. They got a billion-dollar valuation last summer after raising $100 million from Founders Fund. The only thing they do is send massive amounts of photos over to click farms in Southeast Asia to be labeled. They just do that in a really efficient way. That is a billion-dollar valuation just on click farms.
I think that shows you how big of a problem this is, and how much money is being spent on getting data sets in shape for AI. I was talking to a customer the other day, and they actually had a lot of data, but the problem is that it's like looking for a glass of water in the middle of an ocean. You can't use it; it's not formatted correctly. They were an early stage startup, and it was actually cost prohibitive for them to get their own data labeled so that they could use it for AI.
A lot of folks miss that. As we've talked about, I've been involved in this big data space, if you will, for the last seven-plus years, really involved in it, in data centers and Hadoop environments, machine learning and unsupervised machine learning, taking different points of view. But it really comes down to this: it's not a data lake, it's a data swamp, and you still need this human interaction to make it happen.
It's very prevalent. There are examples all over the place. Cosmetics companies are some of the more public examples: in trying to build these applications where you can pre-visualize makeup, a lot of the facial tracking systems are being trained on data sets that are predominantly white males, and we're seeing all kinds of problems with that.
That's a bias problem that's not even related to humans, it's just related to the physical environment and a biased interpretation.
To just unpack that a little bit farther. The bias comes from the human that is collecting the data that goes into the data set.
Excellent point.
It can be really subtle, really subtle, that somebody just wasn't aware of a lack of variety in the data they were putting together. It's very much a problem that you don't see inherently because you don't notice what's missing in a dataset.
That happens to me all the time in my family because I'm the only boy and I have three sisters. When I say stuff, they look at me like, what the heck are you talking about? I don't think I'm doing anything wrong, but it's completely biased.
The fix for that bias is to create forced variation, to create variables in the system that is building the datasets. At LexSet, we create simulated data, and we create intentional randomization in that data on lots of different parameters to eliminate bias, even the bias that we didn't know was there. A really tangible example: most photographs are taken from about five to six feet off the ground, because that's how tall people are. That creates camera angle bias. They tend to be taken in environments where the lighting is pretty good, because people aren't used to taking photographs in bad lighting conditions. That creates inherent lighting bias. Just between those two things alone, that's a tremendous obstacle for an AI to overcome when it's only fed data from those conditions.
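In the synthetic-data literature, this forced variation is often called domain randomization. Here's a minimal sketch, with made-up parameter ranges rather than LexSet's actual values: each synthetic render draws its camera height, angle, and lighting from deliberately wide distributions instead of the narrow band that human photographers produce.

```python
import random

random.seed(42)

def random_scene_params():
    """Draw one set of render parameters from deliberately wide ranges.

    The ranges here are illustrative, not from any real pipeline: human
    photographers cluster around 5-6 feet of camera height and decent
    lighting, so the simulator samples far outside that cluster on purpose.
    """
    return {
        "camera_height_m": random.uniform(0.3, 3.0),   # not just eye level
        "camera_pitch_deg": random.uniform(-60, 60),   # not just straight ahead
        "light_intensity": random.uniform(0.1, 1.0),   # includes bad lighting
        "light_color_temp_k": random.uniform(2500, 7500),
    }

# Parameters for a batch of synthetic images
batch = [random_scene_params() for _ in range(1000)]

# Count frames outside the 1.5-1.8 m "human photographer" height band:
# most of the batch deliberately falls outside it.
unusual = sum(1 for p in batch if not 1.5 <= p["camera_height_m"] <= 1.8)
print(unusual)
```

The point of the sketch is the distribution, not the specific numbers: the camera-height and lighting biases described above simply cannot survive sampling like this.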
I love that you uncovered that for our listeners here, because most people listening to this know it from a high level and are dabbling in it; they're not data scientists, they're not manipulating data every day. When they hear bias, they think of personal, human, relationship-type bias, but you're talking about something practical, like I always take photographs in this lighting, or I always take photographs from this angle. Therefore, it has a bias associated with it, because we're not comparing it to other angles.
The other side of that is, certain things just happen less frequently than others. If you think about it from a self-driving car perspective, how often do you see a yellow traffic light? Not nearly as much as you see green and red. Some of these things are just harder to collect, because they don't happen as frequently, and even if you do collect them, you might not have the diversity you have for the other cases. Bias, I think, is one of the biggest engineering challenges in building these systems.
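One common, simple response to that kind of rarity is to weight underrepresented classes more heavily during training. Here's a hypothetical sketch with invented counts (real traffic-light datasets will differ): inverse-frequency weights make each yellow-light example count far more than a green one.

```python
# Hypothetical label counts for a traffic-light dataset: yellow lights
# show up far less often than red or green, as discussed above.
counts = {"green": 50000, "red": 45000, "yellow": 3000}

total = sum(counts.values())
n_classes = len(counts)

# Inverse-frequency weighting: weight_c = total / (n_classes * count_c),
# so common classes get weights below 1 and rare classes well above it.
weights = {c: total / (n_classes * n) for c, n in counts.items()}

for cls, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{cls}: {w:.2f}")
# green: 0.65
# red: 0.73
# yellow: 10.89
```

Weighting only rebalances the loss, though; it can't manufacture the diversity of rare examples, which is where synthetic data comes in.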
Wow, that's amazing. All right, let's transition here and get a little more into your company, LexSet. You guys are coming off an amazing year. You've won a ton of awards. I'm just going to read off a couple here. One being Startup To Watch at the Augmented World Expo. You won our startup lab pitch competition at GS1 Connect 2019. You were named Best Startup Showcase at Data Con LA. You won the popular vote at the Emerging AI Pioneer Showcase at the O'Reilly AI Conference. Most recently, you "only" won third place at the Verizon Built on 5G Challenge, but that third place paid a quarter million dollars.
I have not been given the go-ahead to discuss the demo that we're building with them ahead of a reveal they're choosing sometime this spring. I wish we could say more, but what we will say is that it showcases the relationship between computer vision and 5G, and the power that 5G can have in enabling computer vision systems, because you can create data and retrain computer vision models at a speed that has never been possible before.
I tried, folks. I didn't think that they would talk about it, but I tried. First, again, I do want to congratulate you guys on that, because that was stiff competition. The winner got a million dollars, but the top three all get to work with Verizon's labs for about eight weeks, I think, after the competition to come up with a solution. I'm excited to see what you guys come up with.
Sure. When we started, as you pointed out, we were doing visual search. The broader vision was to build an AI for interior design. I think I told you earlier, I have a background in architecture, and so does Les. When we set out down that path, we started building computer vision systems, and we started to run into all the problems that computer vision companies run into, data being one of them.
At that time, we were having a hard time getting training data, just like everybody else, and we had instructed one of our engineers to start rendering it. Again, our backgrounds are showing here; mine was in computer graphics before architecture. I was looking at these images, and I'm like, well, they're not very high fidelity. I'm pretty sure a computer can make something that looks as good as that. Then we started making these synthetically.
Wow, that's an interesting journey. Les, do you have any comments you want to make on that?
Just in short that, while we were trying to figure out having the best visual search system possible, we stumbled onto synthetic data being the way to make that true and then realized that wow, okay, this applies to so much more than visual search. We're thinking too small.
I've been dabbling in the computer vision space and getting educated as I go along. We've met with some companies, and I was lucky enough to take a tour of Sam's Club Now in Dallas, where, in essence, it's like an Amazon Go store. You walk in, you grab product, and you walk out and it's paid for and associated with you, I think. They have some other things in there with augmented reality and stuff, which is really cool.
There's a group called the Khronos Group, an industry trade group from the 3D space, primarily made up of people from the VR and AR industry, that is trying to create standardized formats for retail 3D content, meaning anything that's going to be shown in a VR or AR environment that can be purchased. If every retailer out there made 3D models of their products and made them easily accessible, not only would that help with computer vision problems, so the Amazon Gos of the world could function better, but it would make their products all the more sellable in a variety of formats. That's really the way the world is pointed: 3D digital shopping in various forms. The biggest thing that's missing is available 3D content from all of the retailers and CPG folks.
I'm so glad to hear you say that, because at GS1 US, we've been thinking along the same lines, and we've been talking with some other standards organizations and CPG companies. At CES this year, we were even approached by a software company, I won't name them, but they're a rather large one, about just what you said: we need more standards around 3D modeling of images, because it'll help everything scale faster, instead of doing it in this proprietary manner.
I also wanted to add onto that: the actual standards around how these datasets are formatted are a little bit of a Tower of Babel. We're distributing data to lots of different types of companies using lots of different types of technologies. What we're finding is that the formatting of this information, which is more or less the same, changes every time from customer to customer.
It's tough, because you're moving a lot of different parts and pieces and emotions and competitive advantages, but we do see it happen over time if you look at the history of things. I'm a networking guy; I started off in networks in the '90s, when there wasn't really a public internet. There were private networks, but they didn't talk to one another.
That's exactly right. We simulate lighting, we simulate basically all the physics around the optics. We simulate the materials that are on the objects. We capture and generate real world lighting conditions, and then we reproduce all of these digitally.
How long does something like that take? Just as a high-level comparison, if I'm doing that the traditional, old-fashioned way, if you will, taking the photos, going around, using the different cameras, using the different lighting, trying to create as many variables as possible, that's weeks, months, years, depending on the thing. What's the impact that your solution is going to have time-wise? Is it the same amount of time because of the compute, or is it drastically different?
It's drastically different. Some of our customers have reported three to four months to build a dataset, oftentimes more. We're delivering datasets in a couple of days once the simulations are configured. By the end of this year, we'll be doing it in a few hours. It's like a quantum leap in terms of how you're getting these datasets right now.
That is so cool.
Let me just tag onto that. Let's say you're Coca-Cola, and you change the design of your packaging, and you want that product to be sold at an Amazon Go or any other store that's doing cashierless checkout. Well, if you change the packaging, suddenly the computer vision systems in those stores won't be able to notice or see those products until you take new photos, have them re-annotated, and have the model retrained. But with LexSet, if Coca-Cola were to send us a 3D model of that bottle ahead of time before they shipped it, or even the day that they did ship it, we could retrain those computer vision AIs instantly to be able to recognize the new product.
I'm so glad you brought that up, because that's what I was thinking going through this: things change, and this enables me to go back in and make one quick edit. Well, maybe a little more than that, I'm oversimplifying, but I still have the datasets that I can manipulate because they're synthetic. It's not that I have to go reproduce all those new images in all those new ways and angles and lighting and everything. I simply tell the system the new way of doing it.
It's democratizing these tools.
My favorite word, democratizing. You guys have brought up Bitcoin and democratizing in this, I didn't expect it to go there, too funny.
We work in tech, right?
Yeah. A couple of other questions here related to the startup world. Share some things that you guys learned early on that you weren't expecting, or a major surprise that happened for you. If there aren't any, there aren't any, that's fine too, but I always find them interesting. You already talked about how you went off down this one path with visual search, and then you realized you needed training data, and you moved. Were there any others, like an industry that wasn't accepting, or any other surprises that you could share with us?
Absolutely. So many. One of the first ones, I think, was that we expected augmented reality to proliferate a lot more quickly than it did. We were trying to build this tool that was going to allow AR content to become XR content, to make all AR content spatially aware. We thought the world was going to need that in short order. We bet a little too quickly on that, but it's actually starting to come to fruition now. We were just three years early on that one.
That's very interesting. If I could just comment on that real quick: when I was at CES this year, I met with IDC, and I'm actually going to have them on a show we're recording next week to talk about specifically that, AR and VR, and how come they haven't taken off? There was a lot of hype, and it didn't go places. It is going places, but there are other nuances holding it up compared to autonomous vehicles and such.
Another one that I think we've noticed is that while AI and computer vision are exploding right now, they're exploding in very fragmented ways. Every given industry has two or three trailblazers that are doing amazing things with it, and then there are a whole bunch of people that are doing the wait-and-see. We're doing deals with a lot of the trailblazers, and then we're waiting for a lot of the wait-and-see folks to wake up.
That's common with your early adopters, bleeding edge, your main adopters, and your laggards. But I do also get that sense of fragmentation, because it's not like one industry is leading adoption; it's this company, it's that group, it's these folks. Sometimes there's no rhyme or reason, it's just the people behind the companies.
Computer vision itself is an industry, but it's servicing a lot of different verticals right now. Like Les says, you're seeing the early adopters in each of these verticals moving full speed ahead, but then we're still waiting for everybody to follow on. It's happening, though, very quickly. Between when we started the company and now, there's way more activity in the space, and it's actually moving amazingly fast.
Yeah. Any other major surprises?
Every day is a surprise in a startup. Anything that can happen will. Even though I'm a pretty imaginative, creative guy, even I'm surprised; you just don't know what's going to happen.
Some of the other, I guess, pleasant surprises come from inbound demand. We get industries that we never expected to be working with reaching out to us and actually moving forward. I can't talk too much about the details, but there's a TV broadcast studio doing things in the sports space that we're going to be developing some synthetic data for, so they can do metrics on some of the games that they're broadcasting. That was just a use case that we never thought of.
I love stories like that, where it's like, that's exactly what I was looking for. I wouldn't have thought of that; why would they need it? But as soon as you find out, you're like, oh, okay, that makes a lot of sense. Very, very cool. All right. A little rapid fire here, just some quick questions as we come close to the end of the show. Where do you see things going in the future? Make a bet here, make an estimate. Let's see how accurate you are. We'll come back in a year.
Okay. I think every camera will be running some form of AI, probably in the next five years.
I would agree with that, maybe even sooner.
That's one of mine. I'm trying to leave myself a little cushion so I'm right.
Yeah. Okay. Les, what do you think?
I think that AI development is going to get much, much easier. Right now, it's something that's in the realm of very elite engineers. I think in the very near term, within the next year or so it's going to become in the realm of data scientists, and we're going to see frameworks that make it a lot more approachable for a much, much larger population.
Do you think we have any fears of running into another AI winter? AI has been around for a long time, longer than computer vision, actually. We've had it... I'm just curious about your thoughts, because potentially everyone's already moved on to quantum computing, quantum computing. You still need to get down near absolute zero, minus 273 degrees Celsius, for it to work.
Call it big data, call it analytics, the ability to have a computer recognize stuff that you want it to recognize is going to get much, much easier to set up.
I'm not sure we're looking at a winter. I guess from my end, Les and I both saw the rise of the 3D printing hype, and 3D printing is slowly climbing its way out of that trough of disillusionment. With AI, though, we're seeing some really powerful applications that really affect companies' bottom lines, and enable customer experiences and products that just aren't possible any other way.
When Francis and I were in 3D printing, we saw a market where people were betting on what the technology would do in the future. Today, people are betting on what AI is doing right now. That's a really significant difference. The other side of it is that Francis is 100% right: general AI, thinking machines, that's not happening anytime soon, not in our lifetimes. But the ability to simply and easily teach a computer to recognize simple patterns is going to get really easy, and that's really useful for every industry on the planet.
I think you guys put it really well on that, and we'll leave it at that, move to the next question here. Who are your biggest competitors?
That's a good question. There's a lot of synthetic... well, it's not a lot. We've identified about five or six synthetic data players out there. We're not really running up against each other in fierce competition just yet. People are more or less settling into different parts of the industry. Some are certainly focused on defense; we've been very focused on things affecting interior environments; others, autonomous cars. Like we said earlier, the use cases are so diverse, the market's moving so fast, and the applications are so many that we haven't reached the saturation point yet where we're starting to fight with each other.
Competition is also healthy. It's good. You've seen it's pretty open, right, Les?
Yeah. It's just a super blue ocean. There aren't enough of us creating synthetic data in successful ways yet for us to be fighting.
It's the perfect type of competition right now. They're out there; the market's validated. It's not like, why am I the only one with this idea? But we're also not on top of each other just yet.
Right. A last couple of things here, because people always want to know, and do the best you can to answer this, because there are so many variables. People want to understand, one, how much does it cost? Am I looking at millions of dollars just to put my toe in the water on this, or can I get started for a couple hundred bucks or a couple thousand? Then, how long does it take? If I called you today and said, we had this meeting, and then tomorrow I signed a contract with you, are we working in hours, minutes, days, months? Just a general scenario, so people can have a little context.
To get started with us right now, it's probably somewhere between $5,000 and $10,000 just to get a sample going through the system, and we're looking at on the order of weeks to get going. Where we're pointed, as we build out a developer platform, is hours and hundreds of dollars to get started. But that's the position the company is in right now. We're not a Bugatti right now, and that's good. We welcome anybody who wants to give our technology a try.
For the corporates that are thinking about getting into the space, they should think about hiring one senior AI engineer and a junior engineer or two, setting aside a data budget of $5,000 to $20,000 a month, and then a processing budget that's about the same. That's about the correct allocation to really get rolling quickly with something substantial.
Well, guys, that's all the time that we have for today. I can't thank you both enough for one, taking the time out of your day and seriously dropping some good knowledge on us. The part about the bias and the synthetic and where things are going with AI and everything, it was worth the time. I want to thank the listeners for listening.
Thank you.
Thank you.
Bye for now.
Episode Summary
Computer Vision (CV) has become a buzzworthy term in commerce, but have you ever wondered how we actually teach computers to “see”? There is a lot more to Computer Vision than you might think. In this episode, hear how training data as a service is significantly cutting down on the time and manpower necessary to feed high-performance AI models the high-quality image data they need.
The views, information, or opinions expressed during DeCoded by GS1 US podcast series are solely those of the individuals involved and do not necessarily represent those of GS1 US, its employees or member companies. The podcast series is provided by GS1 US as a convenience and does not constitute or imply an endorsement, recommendation, or favoring by GS1 US of any of the identified companies, products, or services. GS1 US does not warrant or guarantee any of the products or services identified here, nor does it assume any legal liability or responsibility with respect to them.
About Francis Bitonti and Leslie Oliver Karpas
CEO, Francis Bitonti & President, Leslie Oliver Karpas, have deep entrepreneurial backgrounds in computer vision, generative design, and robotics. Together they've won four startup competitions for LexSet including Augmented World Expo (AWE), GS1 Connect, DataConLA, and the People's Choice Award at the O'Reilly Media AI Pioneers Showcase, and were nominated for a SXSW Interactive Innovation Award this year. They were also selected as one of the winners of the Verizon Built on 5G Challenge. Supporting them is a team of experts in computational geometry, deep learning, and 3D content.
About Reid Jackson
As Vice President, Corporate Development, Reid Jackson helps lead the investigation of new technologies, partnerships, and business opportunities to increase the relevance and reach of GS1 Standards. Drawing on his extensive IT background and experience implementing solutions for both large and small corporations in retail, grocery, healthcare and manufacturing, Mr. Jackson helps lead the exploration of collaboration opportunities to help businesses leverage emerging technologies including the Internet of Things (IoT), blockchain, artificial intelligence, machine learning and computer vision.
Suggest a Guest
Know someone who would be a great guest on the DeCoded by GS1 US podcast? Submit your suggestion here!