AI in Automotive – Webinar by RSIP Vision July 18

Hi again everybody, and thank you for joining us today for this webinar. This is one in our series of webinars on AI in automotive, and today we are going to talk about the depth estimation challenge. This is a challenge I assume most of you are familiar with, just because of the line of work you are in, and we actually deal with it on a daily basis in many of our automotive-related R&D projects. There are different sensors available today; we are going to talk about them briefly, and then we will have a deeper discussion of stereo cameras and the technologies around them. I have with me today Yoni Kasten, one of our 3D computer vision experts here at the company. Hi Yoni. And I am Shmulik, the global business development manager at the company. So let's start.

Okay, so basically: how do autonomous cars see the world? As you can see on the lower left part of the slide, this is an image that probably many of you are familiar with, from 1957, so this is not a new dream, but it is something many companies are involved in today and there is a lot of investment going on around it. I believe many of you saw the Tesla announcement just yesterday by the CEO, Elon Musk, where he said they believe full self-driving capabilities for hundreds of thousands of cars are going to be launched by the end of this year. So there is a lot going on, and there are a lot of discussions about whether this is actually going to happen now or later, whether it will take less time or more. What I want to do today, as I said, is briefly go over the technologies and then dive into the stereo topic. What we actually need in order to have this kind of autonomous capability is a real-time obstacle detection system, and these capabilities need to be activated and available in all weather conditions. That is a very important part, because we cannot have a car that will drive you to work on a clear day but will not drive you back if it is snowing or raining. Going a bit deeper, we are in fact talking about the capability to do environmental analysis around the car, to understand what is going on beside the car, actually all around it, so we can better understand the situation and move forward.

This is just one slide out of many like it; I just wanted to show a short example of how much is going on in the market. This is a slide of lidar manufacturers: today, if I remember correctly, there are more than 70 companies developing different kinds of lidars, and lidar is one of the technologies available in the market that we are going to talk about. Estimations are that this market is going to reach over 100 billion dollars by 2030, so that is the reason so many companies are running towards it, and many believe it is going to be the revolution behind the next generation of cars.

When we talk about autonomous driving, there are a few phases that need to happen again and again, in fractions of a second, simultaneously, with dedicated sub-modules, so we can actually understand what is going on and give the right directions to the car. The first one, perception, is trying to understand what we see around the car: what is going on, what the objects around the car are, to detect them, to classify them, to understand which one is more dangerous, which one is closer or farther away, and so on. That is the basic stage; a lot of the activity and effort here uses AI technologies you are probably familiar with, and if you attended at least our last webinar, we spoke about that. This is one stage where there is a big difference between vendors and between the different kinds of solutions and capabilities. The second phase is localization, and this involves dedicated hardware and software as well. Basically, it integrates the perception data with very accurate maps to better understand exactly where the car is; it includes using things like GPS, IMUs and many other kinds of technologies to combine the understanding of what is going on outside the car with exactly where we are at a certain moment. The third phase is path planning: since we are talking about autonomous driving, the basic step is for the car, or the computer, to decide what the optimal path will be for getting from the specific point where the car is right now to the destination, and this process needs to run again and again over time, because the data coming in from the perception and the localization can influence these decisions and maybe alter the path or change the time estimation. This is a very important stage. The last one, of course, is control: actually giving the commands to the driving controls in the car. Eventually something needs to move the steering wheel, or tell the car to accelerate, stop, turn or blink, and that is the final stage. As I said before, this keeps happening again and again for the entire time the car is getting to its destination.
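As a purely conceptual illustration of that repeating cycle, here is a minimal Python sketch. Every name in it is a hypothetical placeholder standing in for a full sub-module, not a real API:

```python
import time

# Hypothetical placeholder stubs, only so the sketch runs end to end;
# in a real stack each one is a dedicated sub-module.
def read_sensors():                 return {"camera": None, "lidar": None, "radar": None, "imu": None}
def perceive(frame):                return []            # detect and classify objects around the car
def localize(frame, hd_map):        return (0.0, 0.0)    # fuse perception, GPS and IMU with accurate maps
def plan_path(pose, dest, objects): return [pose, dest]  # optimal path, re-planned every cycle
def send_controls(path):            pass                 # steering, acceleration, braking, signalling

def driving_loop(destination, hd_map=None, period_s=0.05, max_cycles=3):
    """Perception -> localization -> path planning -> control, repeated every cycle."""
    for _ in range(max_cycles):     # in reality: until the destination is reached
        frame = read_sensors()
        objects = perceive(frame)
        pose = localize(frame, hd_map)
        path = plan_path(pose, destination, objects)
        send_controls(path)
        time.sleep(period_s)

driving_loop(destination=(100.0, 200.0))
```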
Having said that, there are a few kinds of sensors available today and being used by the different vendors. I am sure most of you are familiar with them, so we will go over them briefly. The first one is the optical one, RGB, or what we simply know as cameras; the second one is lidar technology; the third is radar; and the fourth one is ultrasonic technology. If you take a look at the slide, it shows the positioning of the sensors around autonomous cars today, and at least most believe it will stay pretty similar when we get to the last stage of actually manufacturing these cars and deploying them everywhere; in general it should be about the same. In most of these test cars today you will see a lidar unit on top of the vehicle, in this configuration or in different ones; in many cases this is a rotating unit giving 360-degree coverage. You will see the radar sensors along the sides, and a smaller lidar unit at the front as well. You do not see it in this image, but in many cars you will also have ultrasonic sensors on the back fender, in most cases just for reversing and parking the car. One more sensor you will have inside the car is the inertial measurement unit, the IMU, which is more related to the positioning system. So this is the general array of how the sensors look when installed in a car.

Okay, so let's dive into the sensors themselves. Cameras: I think the most important thing to say about cameras is that they are already around. Many cars already have cameras embedded in them; my own car is actually covered with cameras, 360 degrees, with different kinds of capabilities. Of course it is not autonomous, but it has, I would say, more advanced ADAS capabilities using these cameras. So this is a big advantage: they are already here. The pricing of cameras is low, so there is no barrier to using this technology on the way to autonomous driving cars, and the best thing about cameras is the resolution. They are very good when we need to actually detect and recognize objects, and that is the main capability being used today in the perception and understanding of what is going on outside the car, and by the way also inside the car in many applications. So this is a leading sensor, and many believe it will stay the leading sensor in the autonomous vehicle. Of course there are disadvantages: these cameras have limited capabilities in more severe conditions, so with low visibility, rain, snow or fog they are quite limited, and at night it is more complicated. Yes, we can add infrared cameras, which are a bit different and give us more capabilities, but we pay with a certain degree of resolution and so on. So there is some kind of trade-off between using a camera and other sensors.

The second sensor is lidar, and probably most of you have heard about this. It actually uses laser beams to calculate the depth of objects around the car: it sends the laser, the light bounces back, and we measure the time it took to get back to the sensor. The good thing about it is that it gives us 360-degree coverage around the car, if we are talking about a rotating unit like the one we saw earlier in the pictures, so it gives us good coverage. It is very accurate and precise, and it gives us the depth estimation already embedded, so you do not need to extract it out of the data like you do with cameras; we will get to that later. And of course it is a very robust technology. The main disadvantage is the cost: these systems are still significantly more expensive than regular cameras, and that is a challenge when we talk about building thousands of cars and a lot more than that. They also have certain limitations with regard to weather: fog or rainy conditions will in many cases interfere with the measurements and the perception that comes out of them, so that is one thing to remember.

A further sensor would be the radar. This uses radio waves to sense objects around the car. Once again, you can see on the lower right part of the screen an example comparing lidar and a high-resolution radar, and one of the clear things is that the radar output is not that sharp; the resolution is a lot lower than with lidar or an RGB camera, and that is one of the disadvantages. But the very good thing about radar technology is that it is weather resistant: it does not matter if it is raining or foggy, the radar will see through, at least in most cases. Also, when you look at the pricing of the sensor, it is positioned in a good place where it is not too expensive, and accordingly many manufacturers are embedding it in their solutions as well, and at least some of them believe this is going to be a main sensor in these cars.
I think one important thing to say is that no sensor is perfect; you know that, and you understood that. You have here a kind of spider-web diagram showing a comparison between the different kinds of sensors, but I think the details are not what matters in this presentation. What is important is to understand that it needs to be some kind of combined solution, and that is the path most of the manufacturers are taking today: sensor fusion is the solution, taking the good parts of at least two or more technologies, combining them, and getting very robust results, something that can actually give the needed capabilities in all weather conditions. So this was a quick brief of the different sensor technologies available in the market, and without taking more time I will pass the microphone to Yoni, who will dive into the main subject, depth estimation from stereo cameras. Yoni?

Okay, thank you Shmulik. So, as was said, I will talk about depth estimation from a stereo camera, and here is the outline for my part: I will start with a brief introduction to the problem, then I will review some classic methods that do depth estimation from stereo cameras, then I will talk about end-to-end learning using deep learning, and we will have a glance at our next webinar's topic, domain adaptation.

Depth estimation is important for many applications, like autonomous driving, drones, augmented reality, gaming and robotics, and the basic question is: given a pixel, what is the depth of this pixel? From a single view we can only know the direction that this point comes from, but we cannot say what the depth of the 3D point is. But when we have two cameras and corresponding points between the two images, each camera gives us a 3D ray, and the intersection of the two rays is the 3D point, so then we can tell what the depth of this point is. Here is an example from a stereo camera: here is the right view and this is the left view; again, right view and left view. We can see that corresponding pixels between the two images have a large disparity when the depth is small, but when the depth is large, we will have a small disparity between the corresponding pixels. So the problem of depth estimation is actually to find, for each pixel in the left image, the corresponding pixel in the right image, and to estimate the disparity between the corresponding pixels. Then, given the calibration of the stereo camera, we can directly convert the disparity map to a depth estimation. To give a numerical example: suppose we have this correspondence and the pixel coordinates of the points; in this case the disparity is 50 pixels, and we will represent the value 50 with a red color in the disparity map. If we take a point with a larger depth, we will have a smaller disparity, which is only 30 pixels, and we will represent it with a blue color in the disparity map.
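To make that conversion concrete, here is a minimal sketch, assuming a rectified stereo pair with a known focal length and baseline; the calibration numbers below are made-up, KITTI-like values rather than the ones from the slide:

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (in pixels) to metric depth.

    For a rectified stereo pair: depth = focal_length * baseline / disparity.
    Pixels with zero or unknown disparity are mapped to infinite depth.
    """
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_length_px * baseline_m / disparity_px[valid]
    return depth

# The 50-pixel disparity (the "red" point above) comes out closer
# than the 30-pixel disparity (the "blue" point).
f_px, baseline_m = 720.0, 0.54   # hypothetical calibration values
print(disparity_to_depth([50.0, 30.0], f_px, baseline_m))  # ~[7.8 m, 13.0 m]
```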
So we need to find, for each pixel in the left image, the corresponding pixel in the right image, and actually we only need to search along this line. We need some similarity measure that will help us choose the right pixel. The basic similarity measure is called SAD, the sum of absolute differences over the pixels in a patch, and it actually measures the distance between two patches, two windows inside the images: if the distance is low, we will say the patches are similar, and if the distance is high, we will say the patches are not similar. This way we can go over all the patches in the right image and choose the one with the smallest distance. Here is an example of disparity maps computed using only SAD, for different window sizes.
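For illustration, here is a minimal, unoptimized sketch of this kind of SAD block matching with a simple winner-takes-all choice over a fixed disparity range; it assumes a rectified grayscale pair given as NumPy arrays and is meant only to show the idea, not to be fast or robust:

```python
import numpy as np

def sad_block_matching(left, right, max_disp=64, window=5):
    """Naive SAD block matching on a rectified grayscale stereo pair.

    For every pixel in the left image, compare a (window x window) patch
    with patches shifted 0..max_disp-1 pixels to the left in the right
    image, and keep the disparity with the smallest sum of absolute
    differences (winner-takes-all).
    """
    h, w = left.shape
    half = window // 2
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    disparity = np.zeros((h, w), dtype=np.int32)

    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_cost = 0, np.inf
            for d in range(min(max_disp, x - half + 1)):
                patch_r = right[y - half:y + half + 1,
                                x - d - half:x - d + half + 1]
                cost = np.abs(patch_l - patch_r).sum()   # the SAD measure
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```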
However, when we have a homogeneous area, the SAD measure will give us similar results for all the candidate pixels, so we will not know which one to choose. You can see that in this example: on the left you can see the image, on the right the ground truth, and in the center you can see that the SAD measure fails in these homogeneous regions. So how do we solve this? The assumption is that nearby windows mostly have a similar depth, so we can use this to constrain the unstable areas, and this can be done using what is called a smoothness term, in addition to the SAD measure. You can see the results using belief propagation or graph cuts, which can use such smoothness terms; the problem with these methods is that they are really slow. The SGM approach, which is semi-global matching, solves this by applying the smoothness constraint along only one direction of the image at a time, which can be solved very fast; after doing that for different directions, it combines all the directions together to get a consistent estimation of the disparity map. Here you can see the result: SGM, on the bottom, performs really well compared to belief propagation and graph cuts, which are much slower than SGM. So this is the most successful classic approach.
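To give a feel for the semi-global part, here is a hedged sketch of the core SGM cost aggregation along a single direction (left to right along each row). The penalties P1 and P2 are illustrative values; a full implementation would repeat this along several directions, sum the aggregated volumes and only then take the winner-takes-all disparity per pixel:

```python
import numpy as np

def sgm_aggregate_left_to_right(cost_volume, p1=10.0, p2=120.0):
    """Aggregate a matching-cost volume along one direction (left -> right).

    cost_volume has shape (H, W, D): a per-pixel cost for every candidate
    disparity (for example SAD costs). The recurrence adds a small penalty
    p1 for disparity changes of +/-1 between neighbouring pixels and a
    larger penalty p2 for bigger jumps; that is the smoothness constraint.
    """
    h, w, d = cost_volume.shape
    agg = np.zeros_like(cost_volume, dtype=np.float32)
    agg[:, 0, :] = cost_volume[:, 0, :]
    for x in range(1, w):
        prev = agg[:, x - 1, :]                      # (H, D) costs at the previous pixel
        prev_min = prev.min(axis=1, keepdims=True)   # best previous cost per row
        same = prev                                                                  # same disparity
        up = np.pad(prev[:, 1:], ((0, 0), (0, 1)), constant_values=np.inf) + p1      # neighbour had d + 1
        down = np.pad(prev[:, :-1], ((0, 0), (1, 0)), constant_values=np.inf) + p1   # neighbour had d - 1
        jump = prev_min + p2                                                         # any larger change
        best_prev = np.minimum(np.minimum(same, up), np.minimum(down, jump))
        agg[:, x, :] = cost_volume[:, x, :] + best_prev - prev_min
    return agg

# A full SGM would sum such aggregations over e.g. 8 directions and then
# take the argmin over the disparity axis at each pixel.
```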
The first method that used deep learning for depth estimation was published by Žbontar and LeCun, and they treated the comparison between patches as a classification problem: the input is two patches, and the output is whether they are the same or different. So in this case the network should say that these are the same windows, and in this case the network will say that these are different windows; this is actually a classification deep network. What they did is replace the classic SAD measure with a learned similarity measure, but the rest of the pipeline remained the same, meaning they still did the SGM optimization with the smoothness constraint to get the depth map. At the time of their submission they ranked first among all the previous methods, and you can see that OpenCV's SGBM ranked ninth at that time. MC-CNN actually had two versions: one of them was the accurate but slower one, and they also had a fast version which was both accurate and even faster than SGBM. By the way, both methods are not fast enough for real-time applications, and, for example, this is something we do at RSIP Vision: we take an application and make it run faster by constraining the algorithm to work well for that specific application.

Okay, so what can we learn using deep learning? The first thing, as we saw before, is to learn the similarity measure between patches. Another thing that can be learned is the smoothness constraint and the optimization part: for example, for SGM we can learn how to set its parameters using deep learning, or we can learn how to combine the different directions of estimation. Another thing that can be learned is the post-processing, meaning taking an initial estimation and improving it using deep learning. But we will talk about end-to-end learning, which takes the two images as input and uses a single convolutional neural network to get the depth directly from the two images. The big challenge here is that KITTI, which is the most common dataset, has only 200 pairs for training. This is not a problem for a patch-based network, because one training sample contains thousands of patch matches for training a patch-based network, but for end-to-end learning one training sample is just one training sample, and that is not enough to train an end-to-end network for stereo estimation.

So the first method that did end-to-end deep learning for stereo was DispNet. For this they generated a synthetic dataset with images and their corresponding depth; they then used a U-Net-like architecture to estimate the depth directly from the two RGB images, and they trained this network using the synthetic dataset. Here you can see a comparison between the methods at that time: DispNet, the end-to-end network, performs close to MC-CNN, and both of them performed much better than SGM, the classic approach; however, DispNet was much faster than MC-CNN. And here you can see the ranking at that time: MC-CNN ranked second and DispNet ranked fourth, but DispNet was much faster. By the way, today both are ranked much lower, because this is a very active research field.

Another approach that we will review is GC-Net. If we go back to the traditional approach, we need to match a pixel from the left image to pixels in the right image, and usually we define a range of disparities that we would like to consider, between d_min and d_max; then we can plot the cost for each disparity. If we consider only this pixel, we can just use the winner-takes-all selection, which takes the disparity with the minimal cost, but if we want to consider more pixels, we can create a volume whose size is the height of the image times the width of the image times the number of disparities we consider, where for each pixel we store this matching-cost vector; then we can optimize the cost volume. GC-Net uses this cost volume and applies neural networks directly on it. It was published two years ago by Kendall et al., and it was the first end-to-end method that ranked first on both KITTI datasets; today they are ranked much lower on the KITTI datasets. Here is their idea: they take the input images and extract features from them, then they concatenate the features into the cost volume, without telling the network directly how to compare the features; then they apply 3D convolutional neural networks using the U-Net architecture, and at the end they use a differentiable winner-takes-all, which is actually a soft argmax, to get the disparity and the depth estimation.
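To illustrate these two building blocks, here is a minimal NumPy sketch of (a) a cost volume of size height by width by number of disparities and (b) the differentiable winner-takes-all, i.e. a soft argmax over the disparity axis. It is only a conceptual sketch: GC-Net itself concatenates the left and right feature vectors and lets the 3D convolutions learn the comparison, whereas here a fixed L1 feature distance is used for simplicity:

```python
import numpy as np

def build_cost_volume(feat_left, feat_right, max_disp):
    """Cost volume of shape (H, W, max_disp) from dense feature maps (H, W, C).

    The cost at pixel (y, x) and disparity d compares the left feature with
    the right feature shifted d pixels to the left, using an L1 distance.
    """
    h, w, _ = feat_left.shape
    volume = np.full((h, w, max_disp), np.inf, dtype=np.float32)
    for d in range(max_disp):
        diff = np.abs(feat_left[:, d:, :] - feat_right[:, :w - d, :]).sum(axis=2)
        volume[:, d:, d] = diff
    return volume

def soft_argmax_disparity(cost_volume):
    """Differentiable winner-takes-all over the disparity axis (soft argmax).

    Costs are turned into a probability distribution with a softmax over the
    negated costs, and the expected disparity is returned per pixel; unlike a
    hard argmin, this is differentiable and can be trained end to end.
    """
    logits = -cost_volume
    logits = logits - logits.max(axis=2, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=2, keepdims=True)
    disparities = np.arange(cost_volume.shape[2], dtype=np.float32)
    return (probs * disparities).sum(axis=2)
```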
You can see a comparison between GC-Net and DispNet: for the background pixels GC-Net got much better results than DispNet, on the foreground areas DispNet was still better, but if we consider all the pixels, GC-Net was much better than DispNet. It is slower, though, and it can only process one pair per batch because it requires much more memory; on the other hand, in terms of convergence time GC-Net converges faster, because it requires far fewer iterations to converge.

The last approach that I will review is GA-Net, and this is a very new method: it was published one month ago at CVPR 2019, and it is currently ranked second on KITTI. The nice thing about this method is that it uses classic approaches inside a deep neural network. Here is their architecture: first they do feature extraction on the images and store the result in a cost volume, like GC-Net, but then they have special layers called SGA, semi-global guided aggregation, which is a learned approximation of the classic SGM approach, and this helps them find the right depth globally using the deep network. Then they have an LGA, local guided aggregation, layer that improves the initial estimation of the depth. This performs well, and I think that today this is the best way to go for depth estimation using deep learning. Here are some visual examples: this is the input image, this is the result by GC-Net and this is the result by GA-Net; blue means far away, red is closer and green is very close. So again, GC-Net and GA-Net. Here is another example, with the results by GC-Net and GA-Net, and you can see that GA-Net is much cleaner than GC-Net. And the last example: this is the input image, and the results by GC-Net and GA-Net.

As we said before, this is a very active research field, and five new papers were published at CVPR just last month, so it is very important to keep updated on the recent methods for depth estimation using deep learning. In our next webinar we will talk about a very important problem called domain adaptation. The idea is that if you train a network on a specific environment, it is not guaranteed that it will work in a new environment: you train your network in New York, and you want it to perform well in Los Angeles. This challenge is a very active research field, and we will talk about it next time. So thank you very much.

Okay, great, thank you Yoni, thank you very much. I hope everybody learned from this session; Yoni went deep into the latest methods, I would say, for handling this kind of estimation out of the cameras. This slide just shows a handful of the companies today that are actually working on developing and manufacturing camera-based depth estimation solutions for the automotive industry; I just wanted to give you some examples of how this technology is actually being used in the market. This basically concludes this part of the session. We have more time, so let's see if we have a few questions from the audience, and we will be happy to address them. In any case, if any of you are leaving, we thank you for taking the time for this webinar. So let's give it a minute and see what questions come in from the audience.

Okay, so one question we have here is: can GA-Net perform in real time? Yoni? Okay, so GA-Net with their accurate model cannot; it is faster than GC-Net, but it is not fast enough for a real-time application. They do have a real-time model that runs at a speed of 15 frames per second on a GPU; this model is less accurate, but still much more accurate than DispNet, which runs in real time. Okay, I hope that gave the answer; let's see what else we have here.

We have a question: why do deep networks work well for stereo matching? Okay, so deep networks can actually capture semantic information from the images. As people, we know that if we see a car in the left image, we should search for a car in the right image, and the pixels should correspond; in the classic methods such semantic perception could not be used, but with deep networks it became possible, so that is the reason I believe they work so well. Okay, great, thank you. Let's see what else we have here, just a second.

Okay, so we have here: can you show the images again? In which of these cases does GC-Net achieve better accuracy? Okay, so just a second, let's go back to the images again. So I think in this image I would choose GC-Net over GA-Net, because on the left part you can see that GA-Net smoothed this barrier away, and for an application like autonomous driving I would like to know that there is a barrier here. So I think that sometimes GA-Net, although it is good at the surfaces, smooths out the obstacles, and that is a problem especially for autonomous driving. Okay, thank you. Let's take maybe two more and then we will complete this webinar.

So we have here: how sensitive is GA-Net to different cameras? So actually they performed several experiments, and you can see the details in their paper. The case they are most interested in is the KITTI dataset, from which we took the examples, but they did other comparisons that are contained in their paper. Okay, let's take one last one: is there a model that can run in real time on the CPU? Okay, so that is always a challenge, right, Yoni? Yes, especially when you do not have a big GPU available. Actually, most deep neural network solutions are very good at using the GPU and are very fast there, but there is what I would call a classic approach from 2014 which did not use deep learning but optimized the SGM approach to be more accurate and faster, and it performs well on the CPU; however, its error is larger than that of the real-time version of GA-Net. So I think it is clear that the best solution will be to have a GPU, and in any case, if you have a GPU, you should definitely use the learning approaches. Yes, okay, great.

So I see we have no more questions on the panel, and that is great; we are even ahead of time, so you have a few more minutes left for yourselves, and it is time to say thank you. First of all, Yoni, thank you very much for joining us today and sharing all this knowledge and understanding of the field; we appreciate it, thank you. And thank you, the audience, everybody, for taking the time for this webinar. I hope you found it educational and interesting. Please do not hesitate to contact us if you have any more questions, and we will be happy to see you in the coming webinars on different kinds of AI in automotive topics. Thank you very much, and have a great day.

