Johannes Dörr

    Aksoy, E. E. and Abramov, A. and Dörr, J. and Ning, K. and Dellen, B. and Wörgötter, F. (2011).
    Learning the semantics of object-action relations by observation. The International Journal of Robotics Research, 30(10), 1229-1249.
    BibTeX:
    @article{aksoyabramovdoerr2011,
      author = {Aksoy, E. E. and Abramov, A. and Dörr, J. and Ning, K. and Dellen, B. and Wörgötter, F.},
      title = {Learning the semantics of object-action relations by observation},
      journal = {The International Journal of Robotics Research},
      year = {2011},
      volume = {30},
      number = {10},
      pages = {1229-1249},
      url = {http://ijr.sagepub.com/content/30/10/1229.abstract}}
    Abstract: Recognizing manipulations performed by a human, and transferring and executing them on a robot, is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scene in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way, and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
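    A minimal sketch of the SEC idea in Python, assuming a toy relation alphabet; the segment names, data, and the simple in-order matching score below are hypothetical illustrations, not the paper's implementation:

    from itertools import groupby

    # Spatial relations between a pair of tracked segments per frame:
    # 'N' = not touching, 'T' = touching (toy alphabet, hypothetical data).
    relations = {
        ("hand", "cup"):  list("NNNTTTTNNN"),
        ("cup", "table"): list("TTTTTTNNTT"),
    }

    def event_chain(rel):
        """Keep only the decisive time points: frames where some pairwise
        relation changes; each surviving column is one SEC entry."""
        pairs = sorted(rel)
        n_frames = len(next(iter(rel.values())))
        columns = [tuple(rel[p][t] for p in pairs) for t in range(n_frames)]
        # collapse runs of identical columns -> topological transitions only
        return [col for col, _ in groupby(columns)]

    def similarity(sec_a, sec_b):
        """Crude substring-style score: fraction of columns of the shorter
        chain found, in order, in the longer one."""
        short, long_ = sorted((sec_a, sec_b), key=len)
        hits, j = 0, 0
        for col in short:
            while j < len(long_) and long_[j] != col:
                j += 1
            if j < len(long_):
                hits, j = hits + 1, j + 1
        return hits / len(short)

    sec = event_chain(relations)
    print(sec)                   # condensed description of the manipulation
    print(similarity(sec, sec))  # 1.0 for a type-identical manipulation

    The paper's actual relation set and comparison are richer (including occlusion handling and statistical learning of archetypal SECs); the sketch only mirrors the condensation-at-transitions step.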
    Abramov, A. and Aksoy, E. E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B. (2010).
    3D semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs. 5th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT).
    BibTeX:
    @inproceedings{abramovaksoydoerr2010,
      author = {Abramov, A. and Aksoy, E. E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B.},
      title = {3D semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs},
      booktitle = {5th International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT)},
      year = {2010}}
    Abstract: A novel real-time framework for model-free stereo-video segmentation and stereo-segment tracking is presented, combining real-time optical flow and stereo with image segmentation running separately on two GPUs. The stereo-segment tracking algorithm achieves a frame rate of 23 Hz for regular videos with a frame size of 256x320 pixels, and nearly real time for stereo videos. The computed stereo segments are used to construct 3D segment graphs, from which main graphs, each representing a relevant change in the scene, are extracted. This allows us to represent a movie of, e.g., 396 original frames by only 12 graphs, each containing only a small number of nodes, providing a condensed description of the scene while preserving data-intrinsic semantics. Using this method, human activities, e.g. the handling of objects, can be encoded in an efficient way. The method has potential applications for manipulation action recognition and learning, and provides a vision front-end for applications in cognitive robotics.
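    A minimal sketch of the main-graph extraction step, assuming per-frame segment adjacency graphs are already given; the toy data below are hypothetical, and the real pipeline computes these graphs from GPU-based stereo segmentation:

    # One set of adjacency edges per frame; nodes are segment ids
    # (hypothetical toy scene, not data from the paper).
    frames = [
        frozenset({(1, 2), (2, 3)}),  # hand(1) touches cup(2) on table(3)
        frozenset({(1, 2), (2, 3)}),
        frozenset({(2, 3)}),          # hand released the cup
        frozenset({(2, 3)}),
        frozenset({(1, 3), (2, 3)}),  # hand touches the table
    ]

    def main_graphs(graphs):
        """Drop frames whose edge set equals the previous frame's; what
        remains is the condensed sequence of relevant scene changes."""
        kept = []
        for g in graphs:
            if not kept or g != kept[-1]:
                kept.append(g)
        return kept

    print(len(frames), "frames ->", len(main_graphs(frames)), "main graphs")

    Here five frames reduce to three main graphs, the same kind of condensation that shrinks the 396-frame movie to 12 graphs in the paper's example.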
