We have made some behind-the-scene changes at ENLIGHTEN3D that changed our RSS feed. This will be the last post on this feed, so to continue receiving updates, follow this link to re-subscribe.
http://feeds.feedburner.com/Enlighten3d
By Christian Laforte
CVPR 2008, the leading computer vision research conference, is starting tomorrow (June 24th) in Alaska. I won’t be attending this year, but fortunately, http://gmazars.info/conf/cvpr2008.html has the full list of selected papers.
What is segmentation and why it matters
In this blog post I will explore advances in one of the most popular vision research topics, segmentation. Segmentation consists of dividing an image into several regions that look similar. Think of it as cutting the contour of an object with scissors, or in Photoshop, separating a foreground object from the background.
Segmentation is a critical step in many vision problems such as recognizing an object in a cluttered room, or a person in a crowd. A classical example is isolating a tiger from a jungle scene.
Sumatran Tiger in the wild, photograph from Richard Ness
If you’re not a computer vision scientist you may think this segmentation stuff is overly complicated and useless. But say you want to photoshop yourself riding this tiger to impress your online friends. (After making your face prettier of course.) The first thing you’ll need to do (after buying, learning and practicing Photoshop) is to separate the tiger from the background. Now imagine your online friends call your bluff and ask you to post a video of the scene, and you’ll understand why teaching a computer to do it for you would be pretty helpful.
We humans can segment the tiger effortlessly (in our mind at least), since we have a mental model of what a tiger and a jungle look like, built through years of experience, and our eyes and our brains have evolved through natural selection. Individuals who couldn’t spot the tiger in the jungle didn’t survive too long. Replicating this intelligence in a computer algorithm is still an active research area. Humans have an easier time, but even Photoshop experts sometimes have trouble with harder images that involve repeating patterns, transparency, motion blur and depth of field:
Bengal tiger cub from National Geographic
Manually segmenting the cub from the other tiger is challenging, and doing it automatically is not yet possible. Several new papers show possible ways forward.
Using Contours to Detect and Localize Junctions in Natural Images (PDF)
Michael Maire, Pablo Arbeláez, Charless Fowlkes, Jitendra Malik
The paper provides a state-of-the-art solution for the related problems of finding contours (segmentation curves) and finding junctions (points where multiple contours meet). The contours are found by combining local and global information. The local cues are combined in a multiscale oriented signal including brightness, color and texture gradients. The global information is considered to be in the first 9 generalized eigenvectors, from which a signal is extracted with Gaussian directional derivatives at multiple orientations. The local and global information are then linearly combined, resulting in a globalized probability of boundary, which claims the top spot in the standard Berkeley segmentation benchmark.
Original image
A set of so-so contour lines. Too many lines in the textured areas.
A near perfect set of contour lines, produced by Maire’s algorithm
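To make the "linearly combined" step described above a bit more concrete, here is a toy sketch in Python. It is not the authors' code: the function name `combine_cues`, the weights and the sample values are all invented for illustration, and each "signal" is just a short list of per-pixel boundary strengths.

```python
# Toy sketch of a "globalized probability of boundary" as a weighted sum.
# All weights and signal values below are made up for illustration.

def combine_cues(local_cues, global_cue, local_weights, global_weight):
    """Linearly combine per-pixel boundary signals.

    local_cues: list of equally sized lists, one per local cue
                (e.g. brightness or texture gradients).
    global_cue: list holding the global signal, same length.
    """
    combined = []
    for i in range(len(global_cue)):
        value = global_weight * global_cue[i]
        for cue, weight in zip(local_cues, local_weights):
            value += weight * cue[i]
        combined.append(value)
    return combined


# Three pixels along a scanline: textured interior, true edge, flat region.
brightness = [0.6, 0.8, 0.0]
texture = [0.2, 0.9, 0.0]
global_signal = [0.0, 1.0, 0.0]

gpb = combine_cues([brightness, texture], global_signal,
                   local_weights=[0.25, 0.25], global_weight=0.5)
```

In this made-up example the textured pixel has a fairly strong local brightness response, but since the global signal does not support it, it ends up well below the true edge in the combined result. That is the intuition behind getting fewer spurious lines in textured areas.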
Maire and his colleagues then proceed to leverage this superior contour detection algorithm to identify junctions, using an energy minimization approach. Open contours can therefore be extended to their likely junction points. The results of this approach are compared with those of a novel Harris operator, along with human-provided expected results. As you can see in the example below, the approach yields smooth, nice contours. Junctions are detected in their expected location even in heavily textured boundaries.
Original image
Resulting contours and junctions
One question that remains unanswered is… how fast are these algorithms? How well do they deal with very large images? Since they don’t mention performance, I wouldn’t be surprised if it took several seconds or minutes to process one image.
Other CVPR papers related to segmentation
Edge preserving spatially varying mixtures for image segmentation (PDF)
Giorgos Sfikas, Christophoros Nikou, Nikolaos Galatsanos
Proposes a hierarchical Bayesian model based on Gaussian mixture models with a prior enforcing spatial smoothness. I skimmed through it very quickly, so I can’t offer an intelligent review. Unlike the first approach, it reportedly doesn’t require tweaking parameters, but the results aren’t as compelling IMHO.
Segmentation by transduction (PDF)
Olivier Duchenne, Jean-Yves Audibert, Renaud Keriven, Jean Ponce, Florent Segonne
Olivier Duchenne and his colleagues describe a semi-interactive background segmentation technique, inspired by GrabCut and Graph Cuts. Such semi-interactive techniques rely on hints provided by the user to help the computer segment an object from the rest of the image, like brush strokes on the foreground and background, or tracing a rectangle surrounding the foreground object. The best way to explain it is with an example image from Duchenne’s paper:
This particular paper produces decent results pretty fast (2 seconds to 3 minutes on a standard computer with a single thread). Some more results:
Such techniques will eventually make it simple for anyone to cut a picture cleanly. In the meantime, don’t throw away that old Photoshop magic wand.
By Christian Laforte
Fast, out-of-the-box 3D on the web remains an elusive dream.
Theatre Magique has an informative review of the most active 3D flash libraries (e.g. Papervision). He notes that while the applications are promising, the performance is still lacking, and that the much publicized 3D capability in Flash 10 (the next version currently in Beta) doesn’t really help.
“In the end, Flash is still not ready for 3D. The Z dimension is now affordable thanks to the devoted work of some flashers but their engines are built on a weak platform, that is the Flash Player. It still cannot handle bitmap processing quickly, which is the biggest barrier to 3D. [...] Is Adobe waiting for some miracles from the community to move on to 3D ? Flash Player 10 is a beginning to this but it doesn’t even include Z-sorting so you end up doing it manually…You end up taking Away3D and waiting for the Flash Player 11 patiently.”
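For readers wondering what "doing Z-sorting manually" means in the quote above, here is a minimal sketch of the usual approach (the painter's algorithm), written in Python rather than ActionScript. The face layout, the function names and the convention that a larger z means farther from the camera are assumptions made for this example, not anything taken from Flash or Away3D.

```python
# Painter's-algorithm sketch: sort faces by depth, draw the farthest first.
# Face layout and the larger-z-is-farther convention are assumptions.

def average_depth(face):
    """Mean z of a face given as a list of (x, y, z) vertices."""
    return sum(vertex[2] for vertex in face) / len(face)


def z_sort(faces):
    """Return faces ordered back to front (largest average z first)."""
    return sorted(faces, key=average_depth, reverse=True)


near = [(0, 0, 1.0), (1, 0, 1.0), (0, 1, 1.0)]
far = [(0, 0, 9.0), (1, 0, 9.0), (0, 1, 9.0)]
middle = [(0, 0, 4.0), (1, 0, 5.0), (0, 1, 6.0)]

draw_order = z_sort([near, far, middle])
```

Drawing the faces in `draw_order` lets nearer faces paint over farther ones. Sorting by average depth is only an approximation: it can give the wrong result when faces intersect or overlap each other cyclically, and the sort has to be redone whenever the camera or the objects move.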
Still, I can’t leave you on a sad note, so see this effective Flash-based FIFA promotion by Electronic Arts. Note the 2D characters drawn on billboards, still cool nonetheless.
Disclosure: Adobe is one of our clients, but we have nothing to do with 3D in Flash. (We contributed the COLLADA translator for the 3D capability in Photoshop.)
So what can you do if you need fast 3D on the web, right now? There are many plug-ins that can help. One of them, our Feeling Engine, is amongst the fastest, most powerful and programmer-friendly. It’s fast because it takes maximum advantage of your graphics card and it’s written in C++, instead of slower languages like ActionScript or Java. Because it is the official COLLADA viewer, it has the most complete support for animated 3D models from Maya, 3ds Max and most other professional tools. And it can easily be extended through C++ and Javascript, so web developers don’t have to become 3D experts to use 3D in their web site.
Finally, we support both fast server-side and client-side use, which means that many applications won’t require their users to install a plug-in.
The main downside of the Feeling Engine? So far we’ve preferred to focus on development and performance optimization rather than marketing.
By Christian Laforte
When I think of 3D Graphics enthusiasts, the image that comes to mind is the stereotypical male that loves video games, gadgets, maths and programming (basically me). That’s why I was happy to discover the refreshingly different Elaine Polvinen and her blog.

Elaine is a pioneer in virtual fashion design (she started experimenting with digital design techniques in ‘87). Her blog explores the new possibilities in fashion design offered by virtual worlds and new techniques to help motivate weight loss using 3D avatars and gadgets like Wii Fit. Definitely worth a look.

As seen in VFT blog: Kozmara – a fashion designer in real and virtual world
By Joshua Koopferstock
French car manufacturer Renault is teaming up with Holografika, creators of the glasses-free HoloVizio display, and Oktal, a provider of simulation software, to develop a 3D holographic screen for cars. That appears to be all the detail on this for the moment, so we are left to wonder, what are they going to DO with a 3D display in a vehicle?

A Renault Mégane Coupé Cabriolet (image from www.renault.com)
I’m not much of a car person myself, but a few half-formed ideas come to mind. There are certain occasions when 2D maps do not provide enough detail for navigation (when roads go over or under each other), and if nothing else, 3D maps would be a more impressive visual feature. Is the future 3D Google Earth in every car? While we may be working toward that, it is still a ways off.
Perhaps it is a way to more accurately visualize the exterior of a vehicle. Renault’s product line does include cars with a rear parking sensor, and it would certainly be helpful to be able to visualize this in 3D.
At this point, it’s impossible to say exactly what they have in mind. Even if they’re just doing this as eye-candy, though, it should spur car enthusiasts and programmers to think of new and interesting ways to use the 3D displays. I look forward to seeing what, if anything, actually comes to market from this partnership.
via Optics.org (free registration required to view article)
By Christian Laforte
Do you know what CUDA and OpenCL stand for and how they could make your computer 50 times faster? If so, you can safely jump to the “Ending the mess” section below. Otherwise read on for a gentle introduction.
A computer has two important processing units: the CPU and GPU. Think of them as the two brothers in Rain Man.
The GPU is the ultimate autistic savant. He’s really, really good at counting stuff and doing a lot of complex math at the same time. For example, he can multiply two long sequences of numbers in his head, faster than you can type it in a calculator. But ask him to do something he’s not used to like buying milk and he’ll just ignore you or throw a fit. He only listens to people who know him well. Even if you spend years learning to communicate with a given autistic savant, chances are you’ll have to start anew when you meet a new one.
The CPU is your regular guy. He can do all kinds of stuff that the savant can’t. He gets along well with everybody, as long as they speak English. If he learns to take advantage of the savant, the two of them can do amazing things like count cards at blackjack.
In other words, the GPU is a natural at some operations that involve repetitive calculations, like those necessary for drawing 3D graphics and doing basic image manipulation. It can do those operations hundreds of times faster than regular CPUs. The high performance comes at the cost of ease of programming. As long as you stick with basic 3D graphics, it’s pretty easy. But say you want to make your financial application run 50 times faster, or make your protein folding simulator run a hundred times faster. It can be done, but for every ten thousand programmers out there, you’ll be lucky if you find one GPU expert capable of achieving that. (Shameless hint: you’ll have a much higher chance to find what you’re looking for at Feeling Software.)
The main GPU vendors (NVIDIA, AMD and Intel) have created new programming languages in the hope of simplifying this process for non-GPU experts. For the time being NVIDIA seems to be leading the pack with CUDA. AMD has the Stream SDK. Intel provides Ct. Apple has OpenCL. And Microsoft has… well Microsoft doesn’t have anything yet, it’s too busy introducing new bugs in Vista.
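If you are curious what those "repetitive calculations" look like to a programmer, here is a rough CPU-only sketch in Python of the pattern that GPU languages are built around: one small function (a "kernel") that handles a single element, applied to every element. The names `multiply_kernel` and `launch` are made up for this illustration and are not part of CUDA or any other real GPU API.

```python
# CPU-only sketch of the data-parallel pattern GPUs are built for.
# "multiply_kernel" and "launch" are illustrative names, not a real GPU API.

def multiply_kernel(i, a, b, out):
    """Work for one element; it reads and writes index i only."""
    out[i] = a[i] * b[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch: here the n calls just run one by one.

    Because no call depends on another, a GPU is free to run
    many of them at the same time.
    """
    for i in range(n):
        kernel(i, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(a)
launch(multiply_kernel, len(a), a, b, result)
```

The savant's trick of multiplying two long sequences of numbers is exactly this shape of problem. The hard part, and the reason GPU experts are rare, is rewriting less regular work (like that financial application) so that it breaks down into independent pieces like these.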
History lesson
Wouldn’t it be nice if everyone could play nice together, if we could get any computing-intensive application to run ten or a hundred times faster without having to deal with all these vendor-specific languages? A potential solution is looming. Before I give you the answer, a short history lesson:
Back when the dinosaurs roamed the earth, in the early 1990s, there were dozens of 3D graphics workstation vendors, whose names have long been forgotten, who tried to dominate the 3D graphics industry. One of them, SGI, offered generously to transform its superior technology into an open standard called OpenGL. With OpenGL, programming simple 3D graphics algorithms on different hardware became easy.
Shortly after, SGI tragically lost its mind. Every smart graphics engineer fled the company to join upstarts like NVIDIA and ATI (bought by AMD recently). On its death bed, SGI gave up OpenGL and allowed a standards group called Khronos to take it over, where it has since evolved at a moderate but consistent rate. Khronos also maintains other 3D standards like COLLADA.
Ending the mess
Today Khronos announced that it wants to repeat the feat. This time, they are starting a Compute Working Group so NVIDIA, AMD, Intel and many more can try to agree on a cross-platform standard, i.e. a programming language that will run super fast on multi-core CPUs and on GPUs alike. History shows that this is a good move, one that can make it possible for new applications to come to life, applications that would have been too complicated to implement otherwise.
I have seen a couple of these standardization cycles in action, and since I’m not participating in this working group (so far) I will simply speculate from past experience and from the little that has been publicly stated so far. (Feel free to chime in in the comments section!) I think there are important questions that will make or break this initiative:
- Are the main players in a collaborative mindset, or are they trying to dominate each other? In previous cycles (e.g. GLSL) Apple acted as the mediator, since it badly needed that technology and had a strong business influence on NVIDIA and ATI. Intel didn’t have a strong participation. This time Intel is very active in 3D graphics. ATI/AMD is in a tough spot where they badly need to get back some market share, maybe through deals with Apple. I’m hopeful that there will be enough collaboration to create a decent standard, with unusual high-performance features continuing to be exposed only in the proprietary standards.
- Typically, the resulting standard is strongly inspired by one precedent. OpenGL was derived from IrisGL from SGI. COLLADA came straight from Sony. Arguably GLSL was inspired by early versions of NVIDIA Cg and a long line of predecessors. This time around, Apple apparently likes NVIDIA’s CUDA which is likely to form the basis of their OpenCL. OpenCL would be the right name for the resulting standard, if the other players can accept it. So my bet is on CUDA rechristened as OpenCL, with bits of Intel Ct thrown in to keep everyone happy. AMD will keep its low-level interface and will implement the high-level standard on top.
- How long will it take for something usable to come out of this? In the past, such standards have typically taken at least a year, often because modifications were necessary to make the hardware compatible. In this case, it sounds like the technology will primarily be software-based (e.g. compilers), so it could happen much faster if a majority of the vendors agree to play nice. The deciding factor here will be the chair of the working group. A diplomatic yet decisive chair can make things happen ten times faster than a less decisive or a visibly biased one. Since I don’t know yet who is chairing the group, I will refrain from commenting publicly.