Dr. Alexey Abramov

    Reich, S. and Abramov, A. and Papon, J. and Wörgötter, F. and Dellen, B. (2013).
    A Novel Real-time Edge-Preserving Smoothing Filter. International Conference on Computer Vision Theory and Applications, 5-14.
    BibTeX:
    @inproceedings{reichabramovpapon2013,
      author = {Reich, S. and Abramov, A. and Papon, J. and Wörgötter, F. and Dellen, B.},
      title = {A Novel Real-time Edge-Preserving Smoothing Filter},
      pages = {5 - 14},
      booktitle = {International Conference on Computer Vision Theory and Applications},
      year = {2013},
      location = {Barcelona (Spain)},
      month = {February 21-24},
      url = {http://www.visapp.visigrapp.org/Abstracts/2013/VISAPP_2013_Abstracts.htm}}
    Abstract: The segmentation of textured and noisy areas in images is a very challenging task due to the large variety of objects and materials in natural environments, which cannot be solved by a single similarity measure. In this paper, we address this problem by proposing a novel edge-preserving texture filter, which smudges the color values inside uniformly textured areas, thus making the processed image more workable for color-based image segmentation. Due to the highly parallel structure of the method, the implementation on a GPU runs in real-time, allowing us to process standard images within tens of milliseconds. By preprocessing images with this novel filter before applying a recent real-time color-based image segmentation method, we obtain significant improvements in performance for images from the Berkeley dataset, outperforming an alternative version using a standard bilateral filter for preprocessing. We further show that our combined approach leads to better segmentations in terms of a standard performance measure than graph-based and mean-shift segmentation for the Berkeley image dataset.
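    Code sketch: the paper's own texture filter is not reproduced here; as a point of reference, below is a minimal NumPy sketch of the standard bilateral filter that the authors use as their preprocessing baseline (grayscale only; the window radius and the two sigmas are illustrative choices). The proposed filter differs in that it also smooths across uniformly textured areas, which is exactly what this intensity-based baseline does not do.
```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_space=2.0, sigma_range=0.1):
    """Edge-preserving smoothing: average neighbours weighted by both
    spatial distance and intensity difference, so strong edges are kept."""
    img = img.astype(np.float64)
    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros_like(img)
    # precompute the spatial (domain) kernel
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx**2 + yy**2) / (2 * sigma_space**2))
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_range**2))
            weights = spatial * rng
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out

if __name__ == "__main__":
    # noisy two-region test image: smoothing should not blur the block edge
    noisy = np.clip(np.kron(np.eye(2) * 0.8, np.ones((16, 16)))
                    + 0.05 * np.random.randn(32, 32), 0, 1)
    smoothed = bilateral_filter(noisy)
    print(smoothed.shape)
```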
    Papon, J. and Abramov, A. and Schoeler, M. and Wörgötter, F. (2013).
    Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2027-2034. DOI: 10.1109/CVPR.2013.264.
    BibTeX:
    @inproceedings{paponabramovschoeler2013,
      author = {Papon, J. and Abramov, A. and Schoeler, M. and Wörgötter, F.},
      title = {Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds},
      pages = {2027 - 2034},
      booktitle = {IEEE Conference on Computer Vision and Pattern Recognition CVPR},
      year = {2013},
      location = {Portland, OR, USA},
      month = {06},
      doi = {10.1109/CVPR.2013.264}}
    Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
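    Code sketch: a reference implementation of VCCS is available in the Point Cloud Library as its supervoxel module (pcl::SupervoxelClustering). The toy NumPy fragment below only illustrates the central constraint described above, namely that labels may grow only through occupied, adjacent voxels and therefore never bridge empty space; the voxel size, seed spacing and plain breadth-first growth are simplifying assumptions, and the real method additionally weights color, normal and spatial distance when growing.
```python
import numpy as np
from collections import deque

def supervoxel_labels(points, voxel_size=0.05, seed_stride=4):
    """Toy connectivity-constrained clustering: voxelize a point cloud, pick
    regularly spaced seed voxels, and grow labels breadth-first through
    occupied 6-connected neighbours only (never across empty space)."""
    vox = np.floor(points / voxel_size).astype(int)
    occupied = {tuple(v) for v in vox}
    seeds = [v for v in occupied
             if all(c % seed_stride == 0 for c in v)] or [next(iter(occupied))]
    labels, queue = {}, deque()
    for lbl, s in enumerate(seeds):
        labels[s] = lbl
        queue.append(s)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:                      # multi-source BFS = growth in lockstep
        v = queue.popleft()
        for d in offsets:
            n = (v[0] + d[0], v[1] + d[1], v[2] + d[2])
            if n in occupied and n not in labels:
                labels[n] = labels[v]
                queue.append(n)
    # map voxel labels back to the points (-1 = not reachable from any seed)
    return np.array([labels.get(tuple(v), -1) for v in vox])

if __name__ == "__main__":
    pts = np.random.rand(2000, 3)
    print(np.unique(supervoxel_labels(pts)))
```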
    Papon, J. and Abramov, A. and Aksoy, E. and Wörgötter, F. (2012).
    A modular system architecture for online parallel vision pipelines. IEEE Workshop on Applications of Computer Vision (WACV), 361-368. DOI: 10.1109/WACV.2012.6163002.
    BibTeX:
    @inproceedings{paponabramovaksoy2012,
      author = {Papon, J. and Abramov, A. and Aksoy, E. and Wörgötter, F.},
      title = {A modular system architecture for online parallel vision pipelines},
      pages = {361-368},
      booktitle = {Applications of Computer Vision WACV, 2012 IEEE Workshop on},
      year = {2012},
      month = {jan},
      doi = {10.1109/WACV.2012.6163002}}
    Abstract: We present an architecture for real-time, online vision systems which enables development and use of complex vision pipelines integrating any number of algorithms. Individual algorithms are implemented using modular plugins, allowing integration of independently developed algorithms and rapid testing of new vision pipeline configurations. The architecture exploits the parallelization of graphics processing units (GPUs) and multi-core systems to speed processing and achieve real-time performance. Additionally, the use of a global memory management system for frame buffering permits complex algorithmic flow (e.g. feedback loops) in online processing setups, while maintaining the benefits of threaded asynchronous operation of separate algorithms. To demonstrate the system, a typical real-time system setup is described which incorporates plugins for video and depth acquisition, GPU-based segmentation and optical flow, semantic graph generation, and online visualization of output. Performance numbers are shown which demonstrate the insignificant overhead cost of the architecture as well as speed-up over strictly CPU and single threaded implementations.
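    Code sketch: a hypothetical miniature of the plugin idea (all names invented): each stage is a plugin with a single processing function, stages run in their own threads, and shared frame buffers decouple them, which is roughly the role the global memory management system plays in the described architecture.
```python
import queue
import threading

class Plugin:
    """A hypothetical pipeline stage: pulls frames from its input buffer,
    processes them, and pushes results into its output buffer."""
    def __init__(self, name, func, src, dst):
        self.name, self.func, self.src, self.dst = name, func, src, dst

    def run(self):
        while True:
            frame = self.src.get()
            if frame is None:              # poison pill shuts the stage down
                self.dst.put(None)
                break
            self.dst.put(self.func(frame))

# stand-ins for acquisition / segmentation / visualization plugins
def acquire(frame):
    return frame
def segment(frame):
    return {"frame": frame, "labels": frame % 3}
def visualize(result):
    print("frame", result["frame"], "labels", result["labels"])
    return result

if __name__ == "__main__":
    # shared frame buffers decouple the stages and allow them to run
    # asynchronously in separate threads
    buffers = [queue.Queue() for _ in range(4)]
    stages = [Plugin("acquire", acquire, buffers[0], buffers[1]),
              Plugin("segment", segment, buffers[1], buffers[2]),
              Plugin("visualize", visualize, buffers[2], buffers[3])]
    threads = [threading.Thread(target=s.run) for s in stages]
    for t in threads:
        t.start()
    for i in range(5):                     # feed a few dummy "frames"
        buffers[0].put(i)
    buffers[0].put(None)
    for t in threads:
        t.join()
```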
    Papon, J. and Abramov, A. and Wörgötter, F. (2012).
    Occlusion Handling in Video Segmentation via Predictive Feedback. Computer Vision ECCV 2012. Workshops and Demonstrations, 233-242, 7585. DOI: 10.1007/978-3-642-33885-4_24.
    BibTeX:
    @incollection{paponabramovwoergoetter2012,
      author = {Papon, J. and Abramov, A. and Wörgötter, F.},
      title = {Occlusion Handling in Video Segmentation via Predictive Feedback},
      pages = {233-242},
      booktitle = {Computer Vision ECCV 2012. Workshops and Demonstrations},
      year = {2012},
      volume= {7585},
      publisher = {Springer Berlin Heidelberg},
      series = {Lecture Notes in Computer Science},
      doi = {10.1007/978-3-642-33885-4_24}}
    Abstract: We present a method for unsupervised on-line dense video segmentation which utilizes sequential Bayesian estimation techniques to resolve partial and full occlusions. Consistent labeling through occlusions is vital for applications which move from low-level object labels to high-level semantic knowledge - tasks such as activity recognition or robot control. The proposed method forms a predictive loop between segmentation and tracking, with tracking predictions used to seed the segmentation kernel, and segmentation results used to update tracked models. All segmented labels are tracked, without the use of a priori models, using parallel color-histogram particle filters. Predictions are combined into a probabilistic representation of image labels, a realization of which is used to seed segmentation. A simulated annealing relaxation process allows the realization to converge to a minimal energy segmented image. Found segments are subsequently used to repopulate the particle sets, closing the loop. Results on the Cranfield benchmark sequence demonstrate that the prediction mechanism allows on-line segmentation to maintain temporally consistent labels through partial & full occlusions, significant appearance changes, and rapid erratic movements. Additionally, we show that tracking performance matches state-of-the-art tracking methods on several challenging benchmark sequences.
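    Code sketch: the per-label trackers named above are color-histogram particle filters; below is a minimal single-target sketch of one predict/weight/resample cycle. The random-walk motion model, Bhattacharyya weighting and all parameter values are assumptions for illustration; in the paper, one such filter runs per segment label and its prediction is fed back to seed the next segmentation.
```python
import numpy as np

def color_hist(patch, bins=8):
    """Normalised histogram of an (N, 3) array of RGB values in [0, 1]."""
    h, _ = np.histogramdd(patch, bins=(bins,) * 3, range=((0, 1),) * 3)
    h = h.ravel()
    return h / (h.sum() + 1e-12)

def particle_filter_step(particles, weights, image, ref_hist,
                         patch=8, motion_std=3.0):
    """One predict/weight/resample cycle for a single tracked segment."""
    H, W, _ = image.shape
    # predict: random-walk motion model
    particles = particles + np.random.randn(*particles.shape) * motion_std
    particles[:, 0] = np.clip(particles[:, 0], patch, H - patch - 1)
    particles[:, 1] = np.clip(particles[:, 1], patch, W - patch - 1)
    # weight: Bhattacharyya similarity between reference and observed histogram
    for i, (y, x) in enumerate(particles.astype(int)):
        window = image[y - patch:y + patch, x - patch:x + patch].reshape(-1, 3)
        bc = np.sum(np.sqrt(color_hist(window) * ref_hist))
        weights[i] = np.exp(-20.0 * (1.0 - bc))
    weights /= weights.sum()
    # resample (systematic resampling would be the usual refinement)
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

if __name__ == "__main__":
    img = np.random.rand(120, 160, 3)
    ref = color_hist(img[40:56, 60:76].reshape(-1, 3))   # reference segment
    parts = np.tile([48.0, 68.0], (200, 1))
    w = np.full(200, 1.0 / 200)
    parts, w = particle_filter_step(parts, w, img, ref)
    print(parts.mean(axis=0))   # estimated segment position
```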
    Abramov, A. and Papon, J. and Pauwels, K. and Wörgötter, F. and Dellen, B. (2012).
    Depth-supported real-time video segmentation with the Kinect. IEEE Workshop on Applications of Computer Vision (WACV). DOI: 10.1109/WACV.2012.6163000.
    BibTeX:
    @inproceedings{abramovpaponpauwels2012,
      author = {Abramov, A. and Papon, J. and Pauwels, K. and Wörgötter, F. and Dellen, B.},
      title = {Depth-supported real-time video segmentation with the Kinect},
      booktitle = {IEEE workshop on the Applications of Computer Vision WACV},
      year = {2012},
      doi = {10.1109/WACV.2012.6163000}}
    Abstract: We present a real-time technique for the spatiotemporal segmentation of color/depth movies. Images are segmented using a parallel Metropolis algorithm implemented on a GPU utilizing both color and depth information, acquired with the Microsoft Kinect. Segments represent the equilibrium states of a Potts model, where tracking of segments is achieved by warping obtained segment labels to the next frame using real-time optical flow, which reduces the number of iterations required for the Metropolis method to encounter the new equilibrium state. By including depth information into the framework, true object boundaries can be found more easily, also improving the temporal coherency of the method. The algorithm has been tested on videos of medium resolution showing human manipulations of objects. The framework provides an inexpensive visual front end for visual preprocessing of videos in industrial settings and robot labs which can potentially be used in various applications.
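    Code sketch: a small NumPy illustration of the label-warping step described above: the segment labels of frame t are transported along the optical flow to initialize the relaxation on frame t+1. Nearest-neighbour scattering is used, and hole handling as well as the GPU implementation are omitted.
```python
import numpy as np

def warp_labels(labels, flow):
    """Transport integer segment labels along a dense flow field.
    labels: (H, W) int array for frame t
    flow:   (H, W, 2) array of (dy, dx) displacements from t to t+1
    Returns an (H, W) initialisation for frame t+1 (nearest-neighbour);
    pixels nothing maps to keep label 0 and are re-decided by the relaxation."""
    H, W = labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    yt = np.clip(np.rint(ys + flow[..., 0]), 0, H - 1).astype(int)
    xt = np.clip(np.rint(xs + flow[..., 1]), 0, W - 1).astype(int)
    warped = np.zeros_like(labels)
    warped[yt, xt] = labels[ys, xs]
    return warped

if __name__ == "__main__":
    labels = np.zeros((60, 80), dtype=int)
    labels[20:40, 30:50] = 1                          # one moving segment
    flow = np.zeros((60, 80, 2)); flow[..., 1] = 5.0  # shifts 5 px to the right
    print(np.argwhere(warp_labels(labels, flow) == 1).min(axis=0))  # ~[20, 35]
```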
    Abramov, A. and Papon, J. and Pauwels, K. and Wörgötter, F. and Dellen, B. (2012).
    Real-time Segmentation of Stereo Videos on a Resource-limited System with a Mobile GPU. IEEE Transactions on Circuits and Systems for Video Technology, 1292-1305, 22, 9. DOI: 10.1109/TCSVT.2012.2199389.
    BibTeX:
    @article{abramovpaponpauwels2012a,
      author = {Abramov, A. and Papon, J. and Pauwels, K. and Wörgötter, F. and Dellen, B.},
      title = {Real-time Segmentation of Stereo Videos on a Resource-limited System with a Mobile GPU},
      pages = {1292 - 1305},
      journal = {IEEE Transactions on Circuits and Systems for Video Technology},
      year = {2012},
      volume= {22},
      number = {9},
      month = {09},
      doi = {10.1109/TCSVT.2012.2199389}}
    Abstract: In mobile robotic applications, visual information needs to be processed fast despite resource limitations of the mobile system. Here, a novel real-time framework for model-free spatiotemporal segmentation of stereo videos is presented. It combines real-time optical flow and stereo with image segmentation and runs on a portable system with an integrated mobile graphics processing unit. The system performs online, automatic, and dense segmentation of stereo videos and serves as a visual front end for preprocessing in mobile robots, providing a condensed representation of the scene that can potentially be utilized in various applications, e.g., object manipulation, manipulation recognition, visual servoing. The method was tested on real-world sequences with arbitrary motions, including videos acquired with a moving camera.
    Aksoy, E E. and Abramov, A. and Dörr, J. and Ning, K. and Dellen, B. and Wörgötter, F. (2011).
    Learning the semantics of object-action relations by observation. The International Journal of Robotics Research, 1229-1249, 30.
    BibTeX:
    @article{aksoyabramovdoerr2011,
      author = {Aksoy, E E. and Abramov, A. and Dörr, J. and Ning, K. and Dellen, B. and Wörgötter, F.},
      title = {Learning the semantics of object-action relations by observation},
      pages = {1229-1249},
      journal = {The International Journal of Robotics Research},
      year = {2011},
      volume= {30},
      url = {http://ijr.sagepub.com/content/30/10/1229.abstract}}
    Abstract: Recognizing manipulations performed by a human and the transfer and execution of this by a robot is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scenery in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
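    Code sketch: a much-simplified illustration of the semantic event chain (SEC) idea: per-frame spatial relations between segment pairs are kept only at frames where some relation changes (the topological transitions), and two chains are compared with a crude common-substring score. The relation coding and the row-wise comparison are assumptions for illustration; the paper uses richer relations and proper sub-string search over the transition matrix.
```python
import numpy as np

def event_chain(relations):
    """relations: list of (K,) vectors, one per frame, encoding the spatial
    relation of each segment pair (e.g. 0 = no contact, 2 = touching).
    Keep only the frames where at least one relation changes."""
    columns = [relations[0]]
    for rel in relations[1:]:
        if not np.array_equal(rel, columns[-1]):
            columns.append(rel)
    return np.stack(columns, axis=1)       # rows: segment pairs, cols: events

def row_similarity(a, b):
    """Longest common contiguous run between two relation rows,
    normalised by the longer row (a crude stand-in for substring search)."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best / max(len(a), len(b))

def sec_similarity(sec1, sec2):
    """Compare two event chains row by row (same pair ordering assumed)."""
    return np.mean([row_similarity(r1, r2) for r1, r2 in zip(sec1, sec2)])

if __name__ == "__main__":
    # two segment pairs, relations sampled over seven frames
    frames = [np.array(v) for v in
              ([0, 0], [0, 0], [2, 0], [2, 0], [2, 2], [0, 2], [0, 2])]
    sec_a = event_chain(frames)
    sec_b = event_chain([np.array(v) for v in ([0, 0], [2, 0], [2, 2], [0, 2])])
    print(sec_a)
    print("similarity:", sec_similarity(sec_a, sec_b))
```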
    Abramov, A. and Kulvicius, T. and Wörgötter, F. and Dellen, B. (2011).
    Real-Time Image Segmentation on a GPU. Facing the Multicore-Challenge, 131-142, 6310. DOI: 10.1007/978-3-642-16233-6_14.
    BibTeX:
    @inproceedings{abramovkulviciuswoergoetter2011,
      author = {Abramov, A. and Kulvicius, T. and Wörgötter, F. and Dellen, B.},
      title = {Real-Time Image Segmentation on a GPU},
      pages = {131-142},
      booktitle = {Facing the Multicore-Challenge},
      year = {2011},
      volume= {6310},
      doi = {10.1007/978-3-642-16233-6_14}}
    Abstract: Efficient segmentation of color images is important for many applications in computer vision. Non-parametric solutions are required in situations where little or no prior knowledge about the data is available. In this paper, we present a novel parallel image segmentation algorithm which segments images in real-time in a non-parametric way. The algorithm finds the equilibrium states of a Potts model in the superparamagnetic phase of the system. Our method maps perfectly onto the graphics processing unit (GPU) architecture and has been implemented using the NVIDIA Compute Unified Device Architecture (CUDA) framework. For images of 256 x 320 pixels we obtained a frame rate of 30 Hz, which demonstrates the applicability of the algorithm to video-processing tasks in real-time.
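    Code sketch: a single-threaded toy version of the underlying model (the paper's contribution is the massively parallel GPU formulation): each pixel carries a Potts label, neighbouring pixels are coupled the more strongly the more similar their colors are, and low-temperature Metropolis updates let labels align within uniform regions. The number of labels, coupling constant and temperature below are illustrative choices.
```python
import numpy as np

def potts_metropolis(img, n_labels=8, sweeps=30, beta=4.0, T=0.1, rng=None):
    """Toy Metropolis relaxation of a Potts model for colour segmentation.
    img: (H, W, 3) float array in [0, 1]. Bonds are strong between
    similar-coloured neighbours, so labels tend to align in uniform regions."""
    rng = rng or np.random.default_rng(0)
    H, W, _ = img.shape
    labels = rng.integers(0, n_labels, size=(H, W))
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def coupling(p, q):                       # similarity-dependent bond
        return np.exp(-beta * np.linalg.norm(img[p] - img[q]))

    def local_energy(p, lab):
        e = 0.0
        for dy, dx in offsets:
            q = (p[0] + dy, p[1] + dx)
            if 0 <= q[0] < H and 0 <= q[1] < W and labels[q] == lab:
                e -= coupling(p, q)
        return e

    for _ in range(sweeps):
        for i in rng.permutation(H * W):       # sequential sweep; the paper
            p = (i // W, i % W)                # updates pixels in parallel
            new = rng.integers(0, n_labels)
            dE = local_energy(p, new) - local_energy(p, labels[p])
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                labels[p] = new
    return labels

if __name__ == "__main__":
    img = np.zeros((24, 24, 3)); img[:, 12:] = 1.0   # two uniform halves
    seg = potts_metropolis(img)
    print(len(np.unique(seg[:, :12])), len(np.unique(seg[:, 12:])))
```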
    Abramov, A. and Aksoy, E E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B. (2010).
    3D semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs. 5th International Symposium on 3D Data Processing, Visualization and Transmission.
    BibTeX:
    @inproceedings{abramovaksoydoerr2010,
      author = {Abramov, A. and Aksoy, E E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B.},
      title = {3d semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs},
      booktitle = {5th International Symposium 3D Data Processing, Visualization and Transmission},
      year = {2010}}
    Abstract: A novel real-time framework for model-free stereo-video segmentation and stereo-segment tracking is presented, combining real-time optical flow and stereo with image segmentation running separately on two GPUs. The stereo-segment tracking algorithm achieves a frame rate of 23 Hz for regular videos with a frame size of 256x320 pixels and nearly real time for stereo videos. The computed stereo segments are used to construct 3D segment graphs, from which main graphs, representing a relevant change in the scene, are extracted, which allow us to represent a movie of e.g. 396 original frames by only 12 graphs, each containing only a small number of nodes, providing a condensed description of the scene while preserving data-intrinsic semantics. Using this method, human activities, e.g. the handling of objects, can be encoded in an efficient way. The method has potential applications for manipulation action recognition and learning, and provides a vision front end for applications in cognitive robotics.
    Aksoy, E E. and Abramov, A. and Wörgötter, F. and Dellen, B. (2010).
    Categorizing object-action relations from semantic scene graphs. IEEE International Conference on Robotics and Automation ICRA, 398-405. DOI: 10.1109/ROBOT.2010.5509319.
    BibTeX:
    @inproceedings{aksoyabramovwoergoetter2010,
      author = {Aksoy, E E. and Abramov, A. and Wörgötter, F. and Dellen, B.},
      title = {Categorizing object-action relations from semantic scene graphs},
      pages = {398-405},
      booktitle = {IEEE International Conference on Robotics and Automation ICRA},
      year = {2010},
      month = {05},
      doi = {10.1109/ROBOT.2010.5509319}}
    Abstract: In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting object-action relations. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and needs a priori representations neither for objects nor for actions. Essentially, actions are recognized without requiring prior object knowledge and objects are categorized solely based on their exhibited role within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics, and provides a way forward for trial-and-error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks, for example in imitation learning in developmental and cognitive robotics.
    Schoeler, M. and Stein, S. and Papon, J. and Abramov, A. and Wörgötter, F. (2014).
    Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications. International Conference on Computer Vision Theory and Applications (VISAPP), 1-10.
    BibTeX:
    @inproceedings{schoelersteinpapon2014,
      author = {Schoeler, M. and Stein, S. and Papon, J. and Abramov, A. and Wörgötter, F.},
      title = {Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications},
      pages = {1 - 10},
      booktitle = {International Conference on Computer Vision Theory and Applications VISAPP},
      year = {2014},
      month = {January}}
    Abstract: Today most recognition pipelines are trained at an off-line stage, providing systems with pre-segmented images and predefined objects, or at an on-line stage, which requires a human supervisor to tediously control the learning. Self-supervised on-line training of recognition pipelines without human intervention is a highly desirable goal, as it allows systems to learn unknown, environment-specific objects on-the-fly. We propose a fast and automatic system, which can extract and learn unknown objects with minimal human intervention by employing a two-level pipeline combining the advantages of RGB-D sensors for object extraction and high-resolution cameras for object recognition. Furthermore, we significantly improve recognition results with local features by implementing a novel keypoint orientation scheme, which leads to highly invariant but discriminative object signatures. Using only one image per object for training, our system is able to achieve a recognition rate of 79% for 18 objects, benchmarked on 42 scenes with random poses, scales and occlusion, while only taking 7 seconds for the training. Additionally, we evaluate our orientation scheme on the state-of-the-art 56-object SDU-dataset, boosting accuracy for one training view per object by +37% to 78% and peaking at a performance of 98% for 11 training views.
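    Code sketch: the paper's novel orientation scheme is not detailed in the abstract; for context, the classical way a keypoint is given a reproducible orientation (as in SIFT-style pipelines) is a magnitude-weighted histogram of local gradient directions, sketched below. Descriptors computed relative to that angle become rotation invariant.
```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Classical keypoint orientation assignment: build a magnitude-weighted
    histogram of gradient directions over the patch and return the peak."""
    gy, gx = np.gradient(patch.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])    # centre of the peak bin

if __name__ == "__main__":
    yy, xx = np.mgrid[0:32, 0:32]
    patch = np.sin(0.4 * xx)          # vertical stripes: gradients along x
    print(np.degrees(dominant_orientation(patch)))
```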
    Aksoy, E E. and Abramov, A. and Wörgötter, F. and Scharr, H. and Fischbach, A. and Dellen, B. (2015).
    Modeling leaf growth of rosette plants using infrared stereo image sequences. Computers and Electronics in Agriculture, 78-90, 110. DOI: 10.1016/j.compag.2014.10.020.
    BibTeX:
    @article{aksoyabramovwoergoetter2015,
      author = {Aksoy, E E. and Abramov, A. and Wörgötter, F. and Scharr, H. and Fischbach, A. and Dellen, B.},
      title = {Modeling leaf growth of rosette plants using infrared stereo image sequences},
      pages = {78 - 90},
      journal = {Computers and Electronics in Agriculture},
      year = {2015},
      volume= {110},
      url = {http://www.sciencedirect.com/science/article/pii/S0168169914002816},
      doi = {http://dx.doi.org/10.1016/j.compag.2014.10.020}}
    Abstract: In this paper, we present a novel multi-level procedure for finding and tracking leaves of a rosette plant, in our case up to three-week-old tobacco plants, during early growth from infrared-image sequences. This allows measuring important plant parameters, e.g. leaf growth rates, in an automatic and non-invasive manner. The procedure consists of three main stages: preprocessing, leaf segmentation, and leaf tracking. Leaf-shape models are applied to improve leaf segmentation, and further used for measuring leaf sizes and handling occlusions. Leaves typically grow radially away from the stem, a property that is exploited in our method, reducing the dimensionality of the tracking task. We successfully tested the method on infrared image sequences showing the growth of tobacco-plant seedlings up to an age of about 30 days, which allows measuring relevant plant growth parameters such as leaf growth rate. By robustly fitting a suitably modified autocatalytic growth model to all growth curves from plants under the same treatment, average plant growth models could be derived. Future applications of the method include plant-growth monitoring for optimizing plant production in greenhouses or plant phenotyping for plant research.
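    Code sketch: the abstract does not spell out the "suitably modified" growth model; as a worked example of the fitting step, below a standard autocatalytic (logistic) curve A(t) = K / (1 + exp(-r (t - t0))) is robustly fitted to a synthetic leaf-area series with SciPy. The soft-L1 loss stands in for the robust fitting mentioned above, and all numbers are synthetic.
```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_growth(t, K, r, t0):
    """Autocatalytic (logistic) growth: area saturates at K,
    grows at relative rate r around the inflection time t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

if __name__ == "__main__":
    # synthetic leaf-area measurements (mm^2) over 30 days, with noise
    days = np.linspace(0, 30, 31)
    true = logistic_growth(days, K=400.0, r=0.35, t0=15.0)
    rng = np.random.default_rng(1)
    measured = true + rng.normal(0, 12.0, size=days.size)
    # robust fit: soft_l1 loss reduces the influence of occasional
    # mis-segmented (outlier) measurements
    popt, _ = curve_fit(logistic_growth, days, measured,
                        p0=(measured.max(), 0.2, days.mean()),
                        method='trf', loss='soft_l1', f_scale=20.0)
    print("K=%.1f  r=%.3f  t0=%.1f" % tuple(popt))
```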
