Dr. Eren Erdal Aksoy

Group(s): Computer Vision
Email: eaksoy@gwdg.de


    Zenker, S. and Aksoy, E E. and Goldschmidt, D. and Wörgötter, F. and Manoonpong, P. (2013).
    Visual Terrain Classification for Selecting Energy Efficient Gaits of a Hexapod Robot. IEEE/ASME International Conference on Advanced Intelligent Mechatronics, 577-584. DOI: 10.1109/AIM.2013.6584154.
    BibTeX:
    @inproceedings{zenkeraksoygoldschmidt2013,
      author = {Zenker, S. and Aksoy, E E. and Goldschmidt, D. and Wörgötter, F. and Manoonpong, P.},
      title = {Visual Terrain Classification for Selecting Energy Efficient Gaits of a Hexapod Robot},
      pages = {577-584},
      booktitle = {IEEE/ASME International Conference on Advanced Intelligent Mechatronics},
      year = {2013},
      location = {Wollongong (Australia)},
      month = {Jul 9-12},
      doi = {10.1109/AIM.2013.6584154},
      abstract = {Legged robots need to be able to classify and recognize different terrains to adapt their gait accordingly. Recent works in terrain classification use different types of sensors (like stereovision, 3D laser range, and tactile sensors) and their combination. However, such sensor systems require more computing power, produce extra load to legged robots, and/or might be difficult to install on a small size legged robot. In this work, we present an online terrain classification system. It uses only a monocular camera with a feature-based terrain classification algorithm which is robust to changes in illumination and view points. For this algorithm, we extract local features of terrains using either Scale Invariant Feature Transform (SIFT) or Speed Up Robust Feature (SURF). We encode the features using the Bag of Words (BoW) technique, and then classify the words using Support Vector Machines (SVMs) with a radial basis function kernel. We compare this feature-based approach with a color-based approach on the Caltech-256 benchmark as well as eight different terrain image sets (grass, gravel, pavement, sand, asphalt, floor, mud, and fine gravel). For terrain images, we observe up to 90% accuracy with the feature-based approach. Finally, this online terrain classification system is successfully applied to our small hexapod robot AMOS II. The output of the system providing terrain information is used as an input to its neural locomotion control to trigger an energy-efficient gait while traversing different terrains.}}
    Abstract: Legged robots need to be able to classify and recognize different terrains to adapt their gait accordingly. Recent works in terrain classification use different types of sensors (like stereovision, 3D laser range, and tactile sensors) and their combination. However, such sensor systems require more computing power, produce extra load to legged robots, and/or might be difficult to install on a small size legged robot. In this work, we present an online terrain classification system. It uses only a monocular camera with a feature-based terrain classification algorithm which is robust to changes in illumination and view points. For this algorithm, we extract local features of terrains using either Scale Invariant Feature Transform (SIFT) or Speed Up Robust Feature (SURF). We encode the features using the Bag of Words (BoW) technique, and then classify the words using Support Vector Machines (SVMs) with a radial basis function kernel. We compare this feature-based approach with a color-based approach on the Caltech-256 benchmark as well as eight different terrain image sets (grass, gravel, pavement, sand, asphalt, floor, mud, and fine gravel). For terrain images, we observe up to 90% accuracy with the feature-based approach. Finally, this online terrain classification system is successfully applied to our small hexapod robot AMOS II. The output of the system providing terrain information is used as an input to its neural locomotion control to trigger an energy-efficient gait while traversing different terrains.
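    Code sketch (Python): A minimal, hedged illustration of the kind of bag-of-visual-words pipeline the abstract describes (local features, a k-means vocabulary, and an RBF-kernel SVM). It is not the paper's implementation; the vocabulary size, SVM parameters, and helper names are assumptions chosen for illustration.
      # Bag-of-visual-words terrain classifier sketch (illustrative, not the paper's code).
      import numpy as np
      import cv2                                   # OpenCV build with SIFT support assumed
      from sklearn.cluster import KMeans
      from sklearn.svm import SVC

      def extract_descriptors(gray_images):
          sift = cv2.SIFT_create()
          out = []
          for img in gray_images:
              _, desc = sift.detectAndCompute(img, None)
              out.append(desc if desc is not None else np.empty((0, 128), np.float32))
          return out

      def bow_histograms(per_image_desc, vocab):
          hists = np.zeros((len(per_image_desc), vocab.n_clusters), np.float32)
          for i, desc in enumerate(per_image_desc):
              if len(desc):
                  words, counts = np.unique(vocab.predict(desc), return_counts=True)
                  hists[i, words] = counts / counts.sum()
          return hists

      def train_terrain_classifier(images, labels, vocab_size=200):
          desc = extract_descriptors(images)
          vocab = KMeans(n_clusters=vocab_size, n_init=4, random_state=0)
          vocab.fit(np.vstack([d for d in desc if len(d)]))
          clf = SVC(kernel="rbf", C=10.0, gamma="scale")
          clf.fit(bow_histograms(desc, vocab), labels)
          return vocab, clf

      def classify_terrain(images, vocab, clf):
          return clf.predict(bow_histograms(extract_descriptors(images), vocab))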
    Aein, M J. and Aksoy, E E. and Tamosiunaite, M. and Papon, J. and Ude, A. and Wörgötter, F. (2013).
    Toward a library of manipulation actions based on Semantic Object-Action Relations. IEEE/RSJ International Conference on Intelligent Robots and Systems. DOI: 10.1109/IROS.2013.6697011.
    BibTeX:
    @inproceedings{aeinaksoytamosiunaite2013,
      author = {Aein, M J. and Aksoy, E E. and Tamosiunaite, M. and Papon, J. and Ude, A. and Wörgötter, F.},
      title = {Toward a library of manipulation actions based on Semantic Object-Action Relations},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems},
      year = {2013},
      doi = {10.1109/IROS.2013.6697011},
      abstract = {The goal of this study is to provide an architecture for a generic definition of robot manipulation actions. We emphasize that the representation of actions presented here is procedural. Thus, we will define the structural elements of our action representations as execution protocols. To achieve this, manipulations are defined using three levels. The top level defines objects, their relations and the actions in an abstract and symbolic way. A mid-level sequencer, with which the action primitives are chained, is used to structure the actual action execution, which is performed via the bottom level. This (lowest) level collects data from sensors and communicates with the control system of the robot. This method enables robot manipulators to execute the same action in different situations, i.e. on different objects with different positions and orientations. In addition, two methods of detecting action failure are provided, which are necessary to handle faults in the system. To demonstrate the effectiveness of the proposed framework, several different actions are performed on our robotic setup and results are shown. This way we are creating a library of human-like robot actions, which can be used by higher-level task planners to execute more complex tasks.}}
    Abstract: The goal of this study is to provide an architecture for a generic definition of robot manipulation actions. We emphasize that the representation of actions presented here is procedural. Thus, we will define the structural elements of our action representations as execution protocols. To achieve this, manipulations are defined using three levels. The top level defines objects, their relations and the actions in an abstract and symbolic way. A mid-level sequencer, with which the action primitives are chained, is used to structure the actual action execution, which is performed via the bottom level. This (lowest) level collects data from sensors and communicates with the control system of the robot. This method enables robot manipulators to execute the same action in different situations, i.e. on different objects with different positions and orientations. In addition, two methods of detecting action failure are provided, which are necessary to handle faults in the system. To demonstrate the effectiveness of the proposed framework, several different actions are performed on our robotic setup and results are shown. This way we are creating a library of human-like robot actions, which can be used by higher-level task planners to execute more complex tasks.
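    Code sketch (Python): A hedged structural sketch of the three-level organization the abstract outlines (a symbolic top level, a mid-level sequencer that chains action primitives, and a bottom level talking to sensors and the robot controller). All class and method names are invented for illustration and do not come from the paper.
      from dataclasses import dataclass
      from typing import Callable, List

      class RobotInterface:
          """Bottom level: sensing and control (stubbed out here)."""
          def move_to(self, pose): pass
          def close_gripper(self): pass
          def object_pose(self, name): return None

      @dataclass
      class Primitive:
          name: str
          run: Callable[[RobotInterface], bool]      # returns False to signal failure

      @dataclass
      class SymbolicAction:
          """Top level: action name plus the objects and relations it involves."""
          name: str
          objects: List[str]
          primitives: List[Primitive]                # mid level: the execution protocol

      def execute(action: SymbolicAction, robot: RobotInterface) -> bool:
          """Mid-level sequencer: chain primitives and abort on a detected failure."""
          for prim in action.primitives:
              if not prim.run(robot):
                  print(f"{action.name}: primitive '{prim.name}' failed")
                  return False
          return True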
    Papon, J. and Kulvicius, T. and Aksoy, E E. and Wörgötter, F. (2013).
    Point Cloud Video Object Segmentation using a Persistent Supervoxel World-Model. IEEE/RSJ International Conference on Intelligent Robots and Systems IROS, 3712-3718. DOI: 10.1109/IROS.2013.6696886.
    BibTeX:
    @inproceedings{paponkulviciusaksoy2013,
      author = {Papon, J. and Kulvicius, T. and Aksoy, E E. and Wörgötter, F.},
      title = {Point Cloud Video Object Segmentation using a Persistent Supervoxel World-Model},
      pages = {3712-3718},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems IROS},
      year = {2013},
      location = {Tokyo (Japan)},
      month = {November 3-8},
      doi = {10.1109/IROS.2013.6696886},
      abstract = {Robust visual tracking is an essential precursor to understanding and replicating human actions in robotic systems. In order to accurately evaluate the semantic meaning of a sequence of video frames, or to replicate an action contained therein, one must be able to coherently track and segment all observed agents and objects. This work proposes a novel online point cloud based algorithm which simultaneously tracks 6DoF pose and determines spatial extent of all entities in indoor scenarios. This is accomplished using a persistent supervoxel world-model which is updated, rather than replaced, as new frames of data arrive. Maintenance of a world model enables general object permanence, permitting successful tracking through full occlusions. Object models are tracked using a bank of independent adaptive particle filters which use a supervoxel observation model to give rough estimates of object state. These are united using a novel multi-model RANSAC-like approach, which seeks to minimize a global energy function associating world-model supervoxels to predicted states. We present results on a standard robotic assembly benchmark for two application scenarios - human trajectory imitation and semantic action understanding - demonstrating the usefulness of the tracking in intelligent robotic systems.}}
    Abstract: Robust visual tracking is an essential precursor to understanding and replicating human actions in robotic systems. In order to accurately evaluate the semantic meaning of a sequence of video frames, or to replicate an action contained therein, one must be able to coherently track and segment all observed agents and objects. This work proposes a novel online point cloud based algorithm which simultaneously tracks 6DoF pose and determines spatial extent of all entities in indoor scenarios. This is accomplished using a persistent supervoxel world-model which is updated, rather than replaced, as new frames of data arrive. Maintenance of a world model enables general object permanence, permitting successful tracking through full occlusions. Object models are tracked using a bank of independent adaptive particle filters which use a supervoxel observation model to give rough estimates of object state. These are united using a novel multi-model RANSAC-like approach, which seeks to minimize a global energy function associating world-model supervoxels to predicted states. We present results on a standard robotic assembly benchmark for two application scenarios - human trajectory imitation and semantic action understanding - demonstrating the usefulness of the tracking in intelligent robotic systems.
    Aksoy, E E. and Tamosiunaite, M. and Vuga, R. and Ude, A. and Geib, C. and Steedman, M. and Wörgötter, F. (2013).
    Structural bootstrapping at the sensorimotor level for the fast acquisition of action knowledge for cognitive robots. IEEE International Conference on Development and Learning and Epigenetic Robotics ICDL-EPIROB, 1--8. DOI: 10.1109/DevLrn.2013.6652537.
    BibTeX:
    @inproceedings{aksoytamosiunaitevuga2013,
      author = {Aksoy, E E. and Tamosiunaite, M. and Vuga, R. and Ude, A. and Geib, C. and Steedman, M. and Wörgötter, F.},
      title = {Structural bootstrapping at the sensorimotor level for the fast acquisition of action knowledge for cognitive robots},
      pages = {1--8},
      booktitle = {IEEE International Conference on Development and Learning and Epigenetic Robotics ICDL-EPIROB},
      year = {2013},
      location = {Osaka (Japan)},
      month = {08},
      doi = {10.1109/DevLrn.2013.6652537},
      abstract = {Autonomous robots are faced with the problem of encoding complex actions (e.g. complete manipulations) in a generic and generalizable way. Recently we had introduced the Semantic Event Chains (SECs) as a new representation which can be directly computed from a stream of 3D images and is based on changes in the relationships between objects involved in a manipulation. Here we show that the SEC framework can be extended (called extended SEC) with action-related information and used to achieve and encode two important cognitive properties relevant for advanced autonomous robots: The extended SEC enables us to determine whether an action representation (1) needs to be newly created and stored in its entirety in the robots memory or (2) whether one of the already known and memorized action representations just needs to be refined. In human cognition these two processes (1 and 2) are known as accommodation and assimilation. Thus, here we show that the extended SEC representation can be used to realize these processes originally defined by Piaget for the first time in a robotic application. This is of fundamental importance for any cognitive agent as it allows categorizing observed actions in new versus known ones, storing only the relevant aspects.}}
    Abstract: Autonomous robots are faced with the problem of encoding complex actions (e.g. complete manipulations) in a generic and generalizable way. Recently we had introduced the Semantic Event Chains (SECs) as a new representation which can be directly computed from a stream of 3D images and is based on changes in the relationships between objects involved in a manipulation. Here we show that the SEC framework can be extended (called extended SEC) with action-related information and used to achieve and encode two important cognitive properties relevant for advanced autonomous robots: The extended SEC enables us to determine whether an action representation (1) needs to be newly created and stored in its entirety in the robots memory or (2) whether one of the already known and memorized action representations just needs to be refined. In human cognition these two processes (1 and 2) are known as accommodation and assimilation. Thus, here we show that the extended SEC representation can be used to realize these processes originally defined by Piaget for the first time in a robotic application. This is of fundamental importance for any cognitive agent as it allows categorizing observed actions in new versus known ones, storing only the relevant aspects.
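    Code sketch (Python): A hedged sketch of the assimilation-versus-accommodation decision described in the abstract: a newly observed event-chain representation is compared against the models already in memory and either refines the best match (assimilation) or is stored as a new model (accommodation). The similarity measure and threshold below are placeholders, not the paper's actual method.
      import numpy as np

      def sec_similarity(a: np.ndarray, b: np.ndarray) -> float:
          """Toy similarity: fraction of equal entries after padding to a common shape."""
          rows, cols = max(a.shape[0], b.shape[0]), max(a.shape[1], b.shape[1])
          pad = lambda m: np.pad(m, ((0, rows - m.shape[0]), (0, cols - m.shape[1])), constant_values=-1)
          return float(np.mean(pad(a) == pad(b)))

      def update_memory(memory: dict, label_hint: str, new_sec: np.ndarray, threshold: float = 0.75):
          """memory maps an action name to the list of event chains observed for it."""
          best, best_sim = None, 0.0
          for name, samples in memory.items():
              sim = max(sec_similarity(s, new_sec) for s in samples)
              if sim > best_sim:
                  best, best_sim = name, sim
          if best is not None and best_sim >= threshold:
              memory[best].append(new_sec)           # assimilation: refine the known model
              return best, "assimilated"
          memory[label_hint] = [new_sec]             # accommodation: store a new model
          return label_hint, "accommodated"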
    Papon, J. and Abramov, A. and Aksoy, E. and Wörgötter, F. (2012).
    A modular system architecture for online parallel vision pipelines. IEEE Workshop on Applications of Computer Vision (WACV), 361-368. DOI: 10.1109/WACV.2012.6163002.
    BibTeX:
    @inproceedings{paponabramovaksoy2012,
      author = {Papon, J. and Abramov, A. and Aksoy, E. and Wörgötter, F.},
      title = {A modular system architecture for online parallel vision pipelines},
      pages = {361-368},
      booktitle = {IEEE Workshop on Applications of Computer Vision (WACV)},
      year = {2012},
      month = {jan},
      doi = {10.1109/WACV.2012.6163002},
      abstract = {We present an architecture for real-time, online vision systems which enables development and use of complex vision pipelines integrating any number of algorithms. Individual algorithms are implemented using modular plugins, allowing integration of independently developed algorithms and rapid testing of new vision pipeline configurations. The architecture exploits the parallelization of graphics processing units (GPUs) and multi-core systems to speed processing and achieve real-time performance. Additionally, the use of a global memory management system for frame buffering permits complex algorithmic flow (e.g. feedback loops) in online processing setups, while maintaining the benefits of threaded asynchronous operation of separate algorithms. To demonstrate the system, a typical real-time system setup is described which incorporates plugins for video and depth acquisition, GPU-based segmentation and optical flow, semantic graph generation, and online visualization of output. Performance numbers are shown which demonstrate the insignificant overhead cost of the architecture as well as speed-up over strictly CPU and single threaded implementations.}}
    Abstract: We present an architecture for real-time, online vision systems which enables development and use of complex vision pipelines integrating any number of algorithms. Individual algorithms are implemented using modular plugins, allowing integration of independently developed algorithms and rapid testing of new vision pipeline configurations. The architecture exploits the parallelization of graphics processing units (GPUs) and multi-core systems to speed processing and achieve real-time performance. Additionally, the use of a global memory management system for frame buffering permits complex algorithmic flow (e.g. feedback loops) in online processing setups, while maintaining the benefits of threaded asynchronous operation of separate algorithms. To demonstrate the system, a typical real-time system setup is described which incorporates plugins for video and depth acquisition, GPU-based segmentation and optical flow, semantic graph generation, and online visualization of output. Performance numbers are shown which demonstrate the insignificant overhead cost of the architecture as well as speed-up over strictly CPU and single threaded implementations.
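    Code sketch (Python): A minimal, hedged sketch of a plugin-style pipeline in the spirit of the abstract: each processing stage runs in its own thread and frames flow through bounded queues, so stages operate asynchronously. The stage names, buffer sizes, and sentinel-based shutdown are illustrative choices, not the architecture's actual design.
      import threading, queue

      class Plugin(threading.Thread):
          """One pipeline stage: reads frames from 'inbox', writes results to 'outbox'."""
          def __init__(self, name, process, inbox, outbox):
              super().__init__(daemon=True)
              self.name, self.process = name, process
              self.inbox, self.outbox = inbox, outbox
          def run(self):
              while True:
                  frame = self.inbox.get()
                  if frame is None:                  # sentinel: shut down and propagate
                      self.outbox.put(None)
                      break
                  self.outbox.put(self.process(frame))

      def build_pipeline(stages):
          """stages: list of (name, fn); returns the pipeline's input and output queues."""
          queues = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]
          for i, (name, fn) in enumerate(stages):
              Plugin(name, fn, queues[i], queues[i + 1]).start()
          return queues[0], queues[-1]

      if __name__ == "__main__":
          # Two toy stages standing in for e.g. segmentation and graph generation.
          inq, outq = build_pipeline([("segment", str.lower), ("graph", str.split)])
          inq.put("Two Segmented Words")
          inq.put(None)
          while (item := outq.get()) is not None:
              print(item)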
    Wörgötter, F. and Aksoy, E. E. and Krüger, N. and Piater, J. and Ude, A. and Tamosiunaite, M. (2013).
    A Simple Ontology of Manipulation Actions based on Hand-Object Relations. IEEE Transactions on Autonomous Mental Development, 117-134, 5, 2. DOI: 10.1109/TAMD.2012.2232291.
    BibTeX:
    @article{woergoetteraksoykrueger2013,
      author = {Wörgötter, F. and Aksoy, E. E. and Krüger, N. and Piater, J. and Ude, A. and Tamosiunaite, M.},
      title = {A Simple Ontology of Manipulation Actions based on Hand-Object Relations},
      pages = {117 - 134},
      journal = {IEEE Transactions on Autonomous Mental Development},
      year = {2013},
      volume= {05},
      number = {02},
      month = {06},
      doi = {10.1109/TAMD.2012.2232291},
      abstract = {Humans can perform a multitude of different actions with their hands (manipulations). In spite of this, so far there have been only a few attempts to represent manipulation types trying to understand the underlying principles. Here we first discuss how manipulation actions are structured in space and time. For this we use as temporal anchor points those moments where two objects (or hand and object) touch or un-touch each other during a manipulation. We show that by this one can define a relatively small tree-like manipulation ontology. We find less than 30 fundamental manipulations. The temporal anchors also provide us with information about when to pay attention to additional important information, for example when to consider trajectory shapes and relative poses between objects. As a consequence a highly condensed representation emerges by which different manipulations can be recognized and encoded. Examples of manipulations recognition and execution by a robot based on this representation are given at the end of this study.}}
    Abstract: Humans can perform a multitude of different actions with their hands (manipulations). In spite of this, so far there have been only a few attempts to represent manipulation types trying to understand the underlying principles. Here we first discuss how manipulation actions are structured in space and time. For this we use as temporal anchor points those moments where two objects (or hand and object) touch or un-touch each other during a manipulation. We show that by this one can define a relatively small tree-like manipulation ontology. We find less than 30 fundamental manipulations. The temporal anchors also provide us with information about when to pay attention to additional important information, for example when to consider trajectory shapes and relative poses between objects. As a consequence a highly condensed representation emerges by which different manipulations can be recognized and encoded. Examples of manipulations recognition and execution by a robot based on this representation are given at the end of this study.
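    Code sketch (Python): A small illustration of the temporal-anchor idea from the abstract: given a boolean touching signal between two entities (hand/object or object/object) over the frames of a manipulation, the anchor points are simply the frames where touching starts or ends. This is an illustrative reduction, not the paper's full ontology machinery.
      from typing import List, Tuple

      def anchor_points(contact: List[bool]) -> List[Tuple[int, str]]:
          """Return (frame index, 'touch' | 'untouch') for every change in the contact signal."""
          anchors = []
          for t in range(1, len(contact)):
              if contact[t] and not contact[t - 1]:
                  anchors.append((t, "touch"))
              elif contact[t - 1] and not contact[t]:
                  anchors.append((t, "untouch"))
          return anchors

      # Example: hand-object contact over eight frames.
      print(anchor_points([False, False, True, True, True, False, False, True]))
      # -> [(2, 'touch'), (5, 'untouch'), (7, 'touch')]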
    Aksoy, E E. and Dellen, B. and Tamosiunaite, M. and Wörgötter, F. (2011).
    Execution of a Dual-Object Pushing Action with Semantic Event Chains. IEEE-RAS Int. Conf. on Humanoid Robots, 576-583. DOI: 10.1109/Humanoids.2011.6100833.
    BibTeX:
    @inproceedings{aksoydellentamosiunaite2011,
      author = {Aksoy, E E. and Dellen, B. and Tamosiunaite, M. and Wörgötter, F.},
      title = {Execution of a Dual-Object Pushing Action with Semantic Event Chains},
      pages = {576-583},
      booktitle = {IEEE-RAS Int. Conf. on Humanoid Robots},
      year = {2011},
      doi = {10.1109/Humanoids.2011.6100833},
      abstract = {Execution of a manipulation after learning from demonstration many times requires intricate planning and control systems or some form of manual guidance for a robot. Here we present a framework for manipulation execution based on the so called "Semantic Event Chain" which is an abstract description of relations between the objects in the scene. It captures the change of those relations during a manipulation and thereby provides the decisive temporal anchor points by which a manipulation is critically defined. Using semantic event chains a model of a manipulation can be learned. We will show that it is possible to add the required control parameters (the spatial anchor points) to this model, which can then be executed by a robot in a fully autonomous way. The process of learning and execution of semantic event chains is explained using a box pushing example.}}
    Abstract: Execution of a manipulation after learning from demonstration many times requires intricate planning and control systems or some form of manual guidance for a robot. Here we present a framework for manipulation execution based on the so called "Semantic Event Chain" which is an abstract description of relations between the objects in the scene. It captures the change of those relations during a manipulation and thereby provides the decisive temporal anchor points by which a manipulation is critically defined. Using semantic event chains a model of a manipulation can be learned. We will show that it is possible to add the required control parameters (the spatial anchor points) to this model, which can then be executed by a robot in a fully autonomous way. The process of learning and execution of semantic event chains is explained using a box pushing example.
    Aksoy, E E. and Abramov, A. and Dörr, J. and Kejun, N. and Dellen, B. and Wörgötter, F. (2011).
    Learning the semantics of object-action relations by observation. The International Journal of Robotics Research, 1229-1249, 30, 10.
    BibTeX:
    @article{aksoyabramovdoerr2011,
      author = {Aksoy, E E. and Abramov, A. and Dörr, J. and Kejun, N. and Dellen, B. and Wörgötter, F.},
      title = {Learning the semantics of object-action relations by observation},
      pages = {1229-1249},
      journal = {The International Journal of Robotics Research},
      year = {2011},
      volume= {30},
      url = {http://ijr.sagepub.com/content/30/10/1229.abstract},
      abstract = {Recognizing manipulations performed by a human and the transfer and execution of this by a robot is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scenery in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.}}
    Abstract: Recognizing manipulations performed by a human and the transfer and execution of this by a robot is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scenery in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. The performance of the algorithm is demonstrated on a set of real videos showing hands manipulating various objects and performing different actions. In experiments with a robotic arm, we show that the SEC can be learned by observing human manipulations, transferred to a new scenario, and then reproduced by the machine.
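    Code sketch (Python): A hedged sketch of the core event-chain construction the abstract describes: given, for every pair of tracked segments, a spatial-relation code per frame, only the frames where some relation changes are kept, yielding the condensed transition matrix. The relation codes (0 = apart, 1 = touching) are an assumed encoding for illustration only.
      import numpy as np

      def semantic_event_chain(relations: np.ndarray) -> np.ndarray:
          """relations: (num_segment_pairs, num_frames) integer matrix of spatial relations."""
          keep = [0]
          for t in range(1, relations.shape[1]):
              if not np.array_equal(relations[:, t], relations[:, keep[-1]]):
                  keep.append(t)                     # a relation changed: this frame is an event
          return relations[:, keep]

      # Toy example: two segment pairs over six frames.
      rels = np.array([[0, 0, 1, 1, 0, 0],           # hand - object
                       [1, 1, 1, 0, 0, 0]])          # object - support surface
      print(semantic_event_chain(rels))
      # keeps frames 0, 2, 3, 4 -> [[0 1 1 0]
      #                             [1 1 0 0]]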
    Abramov, A. and Aksoy, E E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B. (2010).
    3d semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs. 5th International Symposium 3D Data Processing, Visualization and Transmission.
    BibTeX:
    @inproceedings{abramovaksoydoerr2010,
      author = {Abramov, A. and Aksoy, E E. and Dörr, J. and Wörgötter, F. and Pauwels, K. and Dellen, B.},
      title = {3d semantic representation of actions from efficient stereo-image-sequence segmentation on GPUs},
      booktitle = {5th International Symposium 3D Data Processing, Visualization and Transmission},
      year = {2010},
      abstract = {A novel real-time framework for model-free stereo-video segmentation and stereo-segment tracking is presented, combining real-time optical flow and stereo with image segmentation running separately on two GPUs. The stereo-segment tracking algorithm achieves a frame rate of 23 Hz for regular videos with a frame size of 256x320 pixels and nearly real time for stereo videos. The computed stereo segments are used to construct 3D segment graphs, from which main graphs, representing a relevant change in the scene, are extracted, which allow us to represent a movie of e.g. 396 original frames by only 12 graphs, each containing only a small number of nodes, providing a condensed description of the scene while preserving data-intrinsic semantics. Using this method, human activities, e.g. the handling of objects, can be encoded in an efficient way. The method has potential applications for manipulation action recognition and learning, and provides a vision front-end for applications in cognitive robotics.}}
    Abstract: A novel real-time framework for model-free stereo-video segmentation and stereo-segment tracking is presented, combining real-time optical flow and stereo with image segmentation running separately on two GPUs. The stereo-segment tracking algorithm achieves a frame rate of 23 Hz for regular videos with a frame size of 256x320 pixels and nearly real time for stereo videos. The computed stereo segments are used to construct 3D segment graphs, from which main graphs, representing a relevant change in the scene, are extracted, which allow us to represent a movie of e.g. 396 original frames by only 12 graphs, each containing only a small number of nodes, providing a condensed description of the scene while preserving data-intrinsic semantics. Using this method, human activities, e.g. the handling of objects, can be encoded in an efficient way. The method has potential applications for manipulation action recognition and learning, and provides a vision front-end for applications in cognitive robotics.
    Aksoy, E E. and Abramov, A. and Wörgötter, F. and Dellen, B. (2010).
    Categorizing object-action relations from semantic scene graphs. IEEE International Conference on Robotics and Automation ICRA, 398-405. DOI: 10.1109/ROBOT.2010.5509319.
    BibTeX:
    @inproceedings{aksoyabramovwoergoetter2010,
      author = {Aksoy, E E. and Abramov, A. and Wörgötter, F. and Dellen, B.},
      title = {Categorizing object-action relations from semantic scene graphs},
      pages = {398-405},
      booktitle = {IEEE International Conference on Robotics and Automation ICRA},
      year = {2010},
      month = {05},
      doi = {10.1109/ROBOT.2010.5509319},
      abstract = {In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting object-action relations. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and needs no a priori representation for either objects or actions. Essentially, actions are recognized without requiring prior object knowledge and objects are categorized solely based on their exhibited role within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics and provides a way forward for trial-and-error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks, for example in imitation learning in developmental and cognitive robotics.}}
    Abstract: In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting object-action relations. The method is applied to several artificial and real action scenes containing limited context. The central novelty of this approach is that it is model free and needs no a priori representation for either objects or actions. Essentially, actions are recognized without requiring prior object knowledge and objects are categorized solely based on their exhibited role within an action sequence. Thus, this approach is grounded in the affordance principle, which has recently attracted much attention in robotics and provides a way forward for trial-and-error learning of object-action relations through repeated experimentation. It may therefore be useful for recognition and categorization tasks, for example in imitation learning in developmental and cognitive robotics.
    Dellen, B. and Aksoy, E E. and Wörgötter, F. (2009).
    Segment Tracking via a Spatiotemporal Linking Process including Feedback Stabilization in an n-D Lattice Model. Sensors, 9355-9379, 9, 11. DOI: 10.3390/s91109355.
    BibTeX:
    @article{dellenaksoywoergoetter2009,
      author = {Dellen, B. and Aksoy, E E. and Wörgötter, F.},
      title = {Segment Tracking via a Spatiotemporal Linking Process including Feedback Stabilization in an n-D Lattice Model},
      pages = {9355-9379},
      journal = {Sensors},
      year = {2009},
      volume= {9},
      number = {11},
      url = {http://www.mdpi.com/1424-8220/9/11/9355},
      doi = {10.3390/s91109355},
      abstract = {Model-free tracking is important for solving tasks such as moving-object tracking and action recognition in cases where no prior object knowledge is available. For this purpose, we extend the concept of spatially synchronous dynamics in spin-lattice models to the spatiotemporal domain to track segments within an image sequence. The method is related to synchronization processes in neural networks and based on superparamagnetic clustering of data. Spin interactions result in the formation of clusters of correlated spins, providing an automatic labeling of corresponding image regions. The algorithm obeys detailed balance. This is an important property as it allows for consistent spin-transfer across subsequent frames, which can be used for segment tracking. Therefore, in the tracking process the correct equilibrium will always be found, which is an important advance as compared with other more heuristic tracking procedures. In the case of long image sequences, i.e. movies, the algorithm is augmented with a feedback mechanism, further stabilizing segment tracking.}}
    Abstract: Model-free tracking is important for solving tasks such as moving-object tracking and action recognition in cases where no prior object knowledge is available. For this purpose, we extend the concept of spatially synchronous dynamics in spin-lattice models to the spatiotemporal domain to track segments within an image sequence. The method is related to synchronization processes in neural networks and based on superparamagnetic clustering of data. Spin interactions result in the formation of clusters of correlated spins, providing an automatic labeling of corresponding image regions. The algorithm obeys detailed balance. This is an important property as it allows for consistent spin-transfer across subsequent frames, which can be used for segment tracking. Therefore, in the tracking process the correct equilibrium will always be found, which is an important advance as compared with other more heuristic tracking procedures. In the case of long image sequences, i.e. movies, the algorithm is augmented with a feedback mechanism, further stabilizing segment tracking.
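    Code sketch (Python): A deliberately small sketch of the spin-lattice idea behind superparamagnetic clustering: every pixel carries a Potts spin, neighbouring pixels with similar intensities are strongly coupled, and a Metropolis sweep (which satisfies detailed balance) lets similar regions settle into a common label. The number of spin states, temperature, and coupling width are illustrative; the paper's spatiotemporal linking and feedback mechanism are not reproduced here.
      import numpy as np

      def metropolis_sweep(image, spins, q=10, T=0.5, sigma=20.0, rng=None):
          """One Metropolis sweep of a Potts model whose couplings follow pixel similarity."""
          rng = rng or np.random.default_rng(0)
          h, w = image.shape
          for i in range(h):
              for j in range(w):
                  old, new = spins[i, j], rng.integers(q)
                  dE = 0.0
                  for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                      ni, nj = i + di, j + dj
                      if 0 <= ni < h and 0 <= nj < w:
                          J = np.exp(-((float(image[i, j]) - float(image[ni, nj])) ** 2) / (2 * sigma ** 2))
                          dE += J * (int(spins[ni, nj] == old) - int(spins[ni, nj] == new))
                  if dE <= 0 or rng.random() < np.exp(-dE / T):
                      spins[i, j] = new
          return spins

      # Usage idea: spins = np.random.default_rng(0).integers(10, size=img.shape); run several sweeps,
      # then connected regions of equal spin serve as segment labels.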
    Krüger, N. and Ude, A. and Petersen, H. and Nemec, B. and Ellekilde, L. and Savarimuthu, T. and Rytz, J. and Fischer, K. and Buch, A. and Kraft, D. and Mustafa, W. and Aksoy, E. and Papon, J. and Kramberger, A. and Wörgötter, F. (2014).
    Technologies for the Fast Set-Up of Automated Assembly Processes. KI - Künstliche Intelligenz, 1-9. DOI: 10.1007/s13218-014-0329-9.
    BibTeX:
    @article{kruegerudepetersen2014,
      author = {Krüger, N. and Ude, A. and Petersen, H. and Nemec, B. and Ellekilde, L. and Savarimuthu, T. and Rytz, J. and Fischer, K. and Buch, A. and Kraft, D. and Mustafa, W. and Aksoy, E. and Papon, J. and Kramberger, A. and Wörgötter, F.},
      title = {Technologies for the Fast Set-Up of Automated Assembly Processes},
      pages = {1-9},
      journal = {KI - Künstliche Intelligenz},
      year = {2014},
      language = {English},
      publisher = {Springer Berlin Heidelberg},
      url = {http://dx.doi.org/10.1007/s13218-014-0329-9},
      doi = {10.1007/s13218-014-0329-9},
      abstract = {In this article, we describe technologies facilitating the set-up of automated assembly solutions which have been developed in the context of the IntellAct project (2011-2014). Tedious procedures are currently still required to establish such robot solutions. This hinders especially the automation of so called few-of-a-kind production. Therefore, most production of this kind is done manually and thus often performed in low-wage countries. In the IntellAct project, we have developed a set of methods which facilitate the set-up of a complex automatic assembly process, and here we present our work on tele-operation, dexterous grasping, pose estimation and learning of control strategies. The prototype developed in IntellAct is at a TRL4 (corresponding to demonstration in lab environment).}}
    Abstract: In this article, we describe technologies facilitating the set-up of automated assembly solutions which have been developed in the context of the IntellAct project (2011-2014). Tedious procedures are currently still required to establish such robot solutions. This hinders especially the automation of so called few-of-a-kind production. Therefore, most production of this kind is done manually and thus often performed in low-wage countries. In the IntellAct project, we have developed a set of methods which facilitate the set-up of a complex automatic assembly process, and here we present our work on tele-operation, dexterous grasping, pose estimation and learning of control strategies. The prototype developed in IntellAct is at a TRL4 (corresponding to demonstration in lab environment).
    Schlette, C. and Buch, A. and Aksoy, E. and Steil, T. and Papon, J. and Savarimuthu, T. and Wörgötter, F. and Krüger, N. and Roßmann, J. (2014).
    A new benchmark for pose estimation with ground truth from virtual reality. Production Engineering, 745-754, 8, 6. DOI: 10.1007/s11740-014-0552-0.
    BibTeX:
    @article{schlettebuchaksoy2014,
      author = {Schlette, C. and Buch, A. and Aksoy, E. and Steil, T. and Papon, J. and Savarimuthu, T. and Wörgötter, F. and Krüger, N. and Roßmann, J.},
      title = {A new benchmark for pose estimation with ground truth from virtual reality},
      pages = {745-754},
      journal = {Production Engineering},
      year = {2014},
      volume= {8},
      number = {6},
      language = {English},
      publisher = {Springer Berlin Heidelberg},
      url = {http://dx.doi.org/10.1007/s11740-014-0552-0},
      doi = {10.1007/s11740-014-0552-0},
      abstract = {The development of programming paradigms for industrial assembly currently gets fresh impetus from approaches in human demonstration and programming-by-demonstration. Major low- and mid-level prerequisites for machine vision and learning in these intelligent robotic applications are pose estimation, stereo reconstruction and action recognition. As a basis for the machine vision and learning involved, pose estimation is used for deriving object positions and orientations and thus target frames for robot execution. Our contribution introduces and applies a novel benchmark for typical multi-sensor setups and algorithms in the field of demonstration-based automated assembly. The benchmark platform is equipped with a multi-sensor setup consisting of stereo cameras and depth scanning devices (see Fig. 1). The dimensions and abilities of the platform have been chosen in order to reflect typical manual assembly tasks. Following the eRobotics methodology, a simulatable 3D representation of this platform was modelled in virtual reality. Based on a detailed camera and sensor simulation, we generated a set of benchmark images and point clouds with controlled levels of noise as well as ground truth data such as object positions and time stamps. We demonstrate the application of the benchmark to evaluate our latest developments in pose estimation, stereo reconstruction and action recognition and publish the benchmark data for objective comparison of sensor setups and algorithms in industry.}}
    Abstract: The development of programming paradigms for industrial assembly currently gets fresh impetus from approaches in human demonstration and programming-by-demonstration. Major low- and mid-level prerequisites for machine vision and learning in these intelligent robotic applications are pose estimation, stereo reconstruction and action recognition. As a basis for the machine vision and learning involved, pose estimation is used for deriving object positions and orientations and thus target frames for robot execution. Our contribution introduces and applies a novel benchmark for typical multi-sensor setups and algorithms in the field of demonstration-based automated assembly. The benchmark platform is equipped with a multi-sensor setup consisting of stereo cameras and depth scanning devices (see Fig. 1). The dimensions and abilities of the platform have been chosen in order to reflect typical manual assembly tasks. Following the eRobotics methodology, a simulatable 3D representation of this platform was modelled in virtual reality. Based on a detailed camera and sensor simulation, we generated a set of benchmark images and point clouds with controlled levels of noise as well as ground truth data such as object positions and time stamps. We demonstrate the application of the benchmark to evaluate our latest developments in pose estimation, stereo reconstruction and action recognition and publish the benchmark data for objective comparison of sensor setups and algorithms in industry.
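    Code sketch (Python): A hedged example of the kind of pose-error metric such a ground-truth benchmark enables: translation error as the Euclidean distance between estimated and true positions, rotation error as the geodesic angle between the two rotation matrices. The 4x4 homogeneous-transform convention is an assumption for illustration.
      import numpy as np

      def pose_errors(T_est: np.ndarray, T_gt: np.ndarray):
          """T_est, T_gt: 4x4 homogeneous transforms; returns (translation error, rotation error in degrees)."""
          trans_err = float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))
          R_delta = T_est[:3, :3] @ T_gt[:3, :3].T
          cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
          return trans_err, float(np.degrees(np.arccos(cos_angle)))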
    Aksoy, E E. and Tamosiunaite, M. and Wörgötter, F. (2014).
    Model-free incremental learning of the semantics of manipulation actions. Robotics and Autonomous Systems, 1-42. DOI: 10.1016/j.robot.2014.11.003.
    BibTeX:
    @article{aksoytamosiunaitewoergoetter2014,
      author = {Aksoy, E E. and Tamosiunaite, M. and Wörgötter, F.},
      title = {Model-free incremental learning of the semantics of manipulation actions},
      pages = {1-42},
      journal = {Robotics and Autonomous Systems},
      year = {2014},
      url = {http://www.sciencedirect.com/science/article/pii/S0921889014002450},
      doi = {10.1016/j.robot.2014.11.003},
      abstract = {Understanding and learning the semantics of complex manipulation actions are intriguing and non-trivial issues for the development of autonomous robots. In this paper, we present a novel method for an on-line, incremental learning of the semantics of manipulation actions by observation. Recently, we had introduced the Semantic Event Chains (SECs) as a new generic representation for manipulations, which can be directly computed from a stream of images and is based on the changes in the relationships between objects involved in a manipulation. We here show that the SEC concept can be used to bootstrap the learning of the semantics of manipulation actions without using any prior knowledge about actions or objects. We create a new manipulation action benchmark with 8 different manipulation tasks including in total 120 samples to learn an archetypal SEC model for each manipulation action. We then evaluate the learned SEC models with 20 long and complex chained manipulation sequences including in total 103 manipulation samples. Thereby we put the event chains to a decisive test asking how powerful is action classification when using this framework. We find that we reach up to 100 % and 87 % average precision and recall values in the validation phase and 99 % and 92 % in the testing phase. This supports the notion that SECs are a useful tool for classifying manipulation actions in a fully automatic way.}}
    Abstract: Understanding and learning the semantics of complex manipulation actions are intriguing and non-trivial issues for the development of autonomous robots. In this paper, we present a novel method for an on-line, incremental learning of the semantics of manipulation actions by observation. Recently, we had introduced the Semantic Event Chains (SECs) as a new generic representation for manipulations, which can be directly computed from a stream of images and is based on the changes in the relationships between objects involved in a manipulation. We here show that the SEC concept can be used to bootstrap the learning of the semantics of manipulation actions without using any prior knowledge about actions or objects. We create a new manipulation action benchmark with 8 different manipulation tasks including in total 120 samples to learn an archetypal SEC model for each manipulation action. We then evaluate the learned SEC models with 20 long and complex chained manipulation sequences including in total 103 manipulation samples. Thereby we put the event chains to a decisive test asking how powerful is action classification when using this framework. We find that we reach up to 100 % and 87 % average precision and recall values in the validation phase and 99 % and 92 % in the testing phase. This supports the notion that SECs are a useful tool for classifying manipulation actions in a fully automatic way.
    Aksoy, E E. and Abramov, A. and Wörgötter, F. and Scharr, H. and Fischbach, A. and Dellen, B. (2015).
    Modeling leaf growth of rosette plants using infrared stereo image sequences. Computers and Electronics in Agriculture, 78-90, 110. DOI: 10.1016/j.compag.2014.10.020.
    BibTeX:
    @article{aksoyabramovwoergoetter2015,
      author = {Aksoy, E E. and Abramov, A. and Wörgötter, F. and Scharr, H. and Fischbach, A. and Dellen, B.},
      title = {Modeling leaf growth of rosette plants using infrared stereo image sequences},
      pages = {78 - 90},
      journal = {Computers and Electronics in Agriculture},
      year = {2015},
      volume= {110},
      url = {http://www.sciencedirect.com/science/article/pii/S0168169914002816},
      doi = {10.1016/j.compag.2014.10.020},
      abstract = {In this paper, we present a novel multi-level procedure for finding and tracking leaves of a rosette plant, in our case up to 3 weeks old tobacco plants, during early growth from infrared-image sequences. This allows measuring important plant parameters, e.g. leaf growth rates, in an automatic and non-invasive manner. The procedure consists of three main stages: preprocessing, leaf segmentation, and leaf tracking. Leaf-shape models are applied to improve leaf segmentation, and further used for measuring leaf sizes and handling occlusions. Leaves typically grow radially away from the stem, a property that is exploited in our method, reducing the dimensionality of the tracking task. We successfully tested the method on infrared image sequences showing the growth of tobacco-plant seedlings up to an age of about 30 days, which allows measuring relevant plant growth parameters such as leaf growth rate. By robustly fitting a suitably modified autocatalytic growth model to all growth curves from plants under the same treatment, average plant growth models could be derived. Future applications of the method include plant-growth monitoring for optimizing plant production in greenhouses or plant phenotyping for plant research.}}
    Abstract: In this paper, we present a novel multi-level procedure for finding and tracking leaves of a rosette plant, in our case up to 3 weeks old tobacco plants, during early growth from infrared-image sequences. This allows measuring important plant parameters, e.g. leaf growth rates, in an automatic and non-invasive manner. The procedure consists of three main stages: preprocessing, leaf segmentation, and leaf tracking. Leaf-shape models are applied to improve leaf segmentation, and further used for measuring leaf sizes and handling occlusions. Leaves typically grow radially away from the stem, a property that is exploited in our method, reducing the dimensionality of the tracking task. We successfully tested the method on infrared image sequences showing the growth of tobacco-plant seedlings up to an age of about 30 days, which allows measuring relevant plant growth parameters such as leaf growth rate. By robustly fitting a suitably modified autocatalytic growth model to all growth curves from plants under the same treatment, average plant growth models could be derived. Future applications of the method include plant-growth monitoring for optimizing plant production in greenhouses or plant phenotyping for plant research.
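    Code sketch (Python): A hedged sketch of fitting an autocatalytic (logistic-type) growth curve to per-leaf area measurements, which is the kind of model fitting the abstract refers to; the exact model variant used in the paper may differ, and the parameter names and initial guesses below are illustrative.
      import numpy as np
      from scipy.optimize import curve_fit

      def autocatalytic_growth(t, A_max, k, t0):
          """Leaf area saturating at A_max, with growth rate k and inflection at time t0."""
          return A_max / (1.0 + np.exp(-k * (t - t0)))

      def fit_leaf_growth(days, areas):
          p0 = [float(np.max(areas)), 0.5, float(np.mean(days))]   # rough initial guess
          params, _ = curve_fit(autocatalytic_growth, days, areas, p0=p0, maxfev=10000)
          return dict(zip(["A_max", "k", "t0"], params))

      # Example with synthetic measurements over roughly three weeks:
      days = np.arange(0, 22, dtype=float)
      areas = autocatalytic_growth(days, 400.0, 0.45, 12.0) + np.random.default_rng(0).normal(0, 5, days.size)
      print(fit_leaf_growth(days, areas))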
    Vuga, R. and Aksoy, E E. and Wörgötter, F. and Ude, A. (2015).
    Probabilistic semantic models for manipulation action representation and extraction. Robotics and Autonomous Systems, 40 - 56, 65. DOI: 10.1016/j.robot.2014.11.012.
    BibTeX:
    @article{vugaaksoywoergoetter2015,
      author = {Vuga, R. and Aksoy, E E. and Wörgötter, F. and Ude, A.},
      title = {Probabilistic semantic models for manipulation action representation and extraction},
      pages = {40 - 56},
      journal = {Robotics and Autonomous Systems},
      year = {2015},
      volume= {65},
      url = {http://www.sciencedirect.com/science/article/pii/S0921889014002851},
      doi = {10.1016/j.robot.2014.11.012},
      abstract = {In this paper we present a hierarchical framework for representation of manipulation actions and its applicability to the problem of top-down action extraction from observation. The framework consists of novel probabilistic semantic models, which encode contact relations as probability distributions over the action phase. The models are action descriptive and can be used to provide probabilistic similarity scores for newly observed action sequences. The lower level of the representation consists of parametric hidden Markov models, which encode trajectory information.}}
    Abstract: In this paper we present a hierarchical framework for representation of manipulation actions and its applicability to the problem of top-down action extraction from observation. The framework consists of novel probabilistic semantic models, which encode contact relations as probability distributions over the action phase. The models are action descriptive and can be used to provide probabilistic similarity scores for newly observed action sequences. The lower level of the representation consists of parametric hidden Markov models, which encode trajectory information.
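    Code sketch (Python): A minimal illustration of a probabilistic contact model over normalized action phase, in the spirit of the abstract: each demonstration's binary contact signal is resampled onto a common phase axis and averaged, giving an estimate of the contact probability at every phase. The bin count and linear resampling are illustrative simplifications, not the paper's model.
      import numpy as np

      def contact_probability(demos, n_bins=20):
          """demos: list of 1-D 0/1 contact arrays of possibly different lengths."""
          phase = np.linspace(0.0, 1.0, n_bins)
          resampled = [np.interp(phase, np.linspace(0.0, 1.0, len(d)), d) for d in demos]
          return phase, np.mean(resampled, axis=0)      # estimated P(contact | phase)

      demos = [np.array([0, 0, 1, 1, 1, 0]), np.array([0, 1, 1, 1, 0])]
      phase, p = contact_probability(demos, n_bins=5)
      print(np.round(p, 2))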
    Aksoy, E. E. and Schoeler, M. and Wörgötter, F. (2014).
    Testing Piaget's ideas on robots: Assimilation and accommodation using the semantics of actions. IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob), 107-108. DOI: 10.1109/DEVLRN.2014.6982962.
    BibTeX:
    @inproceedings{aksoyschoelerwoergoetter2014,
      author = {Aksoy, E. E. and Schoeler, M. and Wörgötter, F.},
      title = {Testing Piaget's ideas on robots: Assimilation and accommodation using the semantics of actions},
      pages = {107-108},
      booktitle = {IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob)},
      year = {2014},
      month = {Oct},
      doi = {10.1109/DEVLRN.2014.6982962},
      abstract = {The proposed framework addresses the problem of implementing a high level}}
    Abstract: The proposed framework addresses the problem of implementing a high level
    Savarimuthu, R. and Papon, J. and Buch, A. G. and Aksoy, E. and Mustafa, W. and Wörgötter, F. and Krüger, N. (2015).
    An Online Vision System for Understanding Complex Assembly Tasks. International Conference on Computer Vision Theory and Applications, 1 - 8. DOI: 10.5220/0005260804540461.
    BibTeX:
    @inproceedings{savarimuthupaponbuch2015,
      author = {Savarimuthu, R. and Papon, J. and Buch, A. G. and Aksoy, E. and Mustafa, W. and Wörgötter, F. and Krüger, N.},
      title = {An Online Vision System for Understanding Complex Assembly Tasks},
      pages = {1 - 8},
      booktitle = {International Conference on Computer Vision Theory and Applications},
      year = {2015},
      location = {Berlin (Germany)},
      month = {March 11 - 14},
      doi = {10.5220/0005260804540461},
      abstract = {We present an integrated system for the recognition, pose estimation and simultaneous tracking of multiple objects in 3D scenes. Our target application is a complete semantic representation of dynamic scenes which requires three essential steps: recognition of objects, tracking their movements, and identification of interactions between them. We address this challenge with a complete system which uses object recognition and pose estimation to initiate object models and trajectories, a dynamic sequential octree structure to allow for full 6DOF tracking through occlusions, and a graph-based semantic representation to distil interactions. We evaluate the proposed method on real scenarios by comparing tracked outputs to ground truth trajectories and we compare the results to Iterative Closest Point and Particle Filter based trackers.}}
    Abstract: We present an integrated system for the recognition, pose estimation and simultaneous tracking of multiple objects in 3D scenes. Our target application is a complete semantic representation of dynamic scenes which requires three essential steps: recognition of objects, tracking their movements, and identification of interactions between them. We address this challenge with a complete system which uses object recognition and pose estimation to initiate object models and trajectories, a dynamic sequential octree structure to allow for full 6DOF tracking through occlusions, and a graph-based semantic representation to distil interactions. We evaluate the proposed method on real scenarios by comparing tracked outputs to ground truth trajectories and we compare the results to Iterative Closest Point and Particle Filter based trackers.
    Aksoy, E. E. and Aein, M. J. and Tamosiunaite, M. and Wörgötter, F. (2015).
    Semantic parsing of human manipulation activities using on-line learned models for robot imitation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2875-2882. DOI: 10.1109/IROS.2015.7353773.
    BibTeX:
    @inproceedings{aksoyaeintamosiunaite2015,
      author = {Aksoy, E. E. and Aein, M. J. and Tamosiunaite, M. and Wörgötter, F.},
      title = {Semantic parsing of human manipulation activities using on-line learned models for robot imitation},
      pages = {2875-2882},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      year = {2015},
      location = {Hamburg, Germany},
      month = {Sept},
      doi = {10.1109/IROS.2015.7353773},
      abstract = {Human manipulation activity recognition is an important yet challenging task in robot imitation. In this paper, we introduce, for the first time, a novel method for semantic decomposition and recognition of continuous human manipulation activities by using on-line learned individual manipulation models. Solely based on the spatiotemporal interactions between objects and hands in the scene, the proposed framework can parse not only sequential and concurrent (overlapping) manipulation streams but also basic primitive elements of each detected manipulation. Without requiring any prior object knowledge, the framework can furthermore extract object-like scene entities that are performing the same role in the detected manipulations. The framework was evaluated on our new egocentric activity dataset which contains 120 different samples of 8 single atomic manipulations (e.g. Cutting and Stirring) and 20 long and complex activity demonstrations such as}}
    Abstract: Human manipulation activity recognition is an important yet challenging task in robot imitation. In this paper, we introduce, for the first time, a novel method for semantic decomposition and recognition of continuous human manipulation activities by using on-line learned individual manipulation models. Solely based on the spatiotemporal interactions between objects and hands in the scene, the proposed framework can parse not only sequential and concurrent (overlapping) manipulation streams but also basic primitive elements of each detected manipulation. Without requiring any prior object knowledge, the framework can furthermore extract object-like scene entities that are performing the same role in the detected manipulations. The framework was evaluated on our new egocentric activity dataset which contains 120 different samples of 8 single atomic manipulations (e.g. Cutting and Stirring) and 20 long and complex activity demonstrations such as
    Aksoy, E. E. and Wörgötter, F. (2014).
    Piaget ve Robotlar: Özümseme ve Uyumsama [Piaget and Robots: Assimilation and Accommodation]. Türkiye Otonom Robotlar Konferans (TORK), 1-2.
    BibTeX:
    @inproceedings{aksoyeeandwoergoetter2014,
      author = {Aksoy, E. E. and Wörgötter, F.},
      title = {Piaget ve Robotlar: Özümseme ve Uyumsama},
      pages = {1-2},
      booktitle = {Türkiye Otonom Robotlar Konferans (TORK)},
      year = {2014},
      location = {Ankara, Turkey},
      month = {November 6 - 7}}
    Abstract:
    Review:
    Agostini, A. and Aein, M. J. and Szedmak, S. and Aksoy, E. E. and Piater, J. and Wörgötter, F. (2015).
    Using Structural Bootstrapping for Object Substitution in Robotic Executions of Human-like Manipulation Tasks. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 6479-6486. DOI: 10.1109/IROS.2015.7354303.
    BibTeX:
    @inproceedings{agostiniaeinszedmak2015,
      author = {Agostini, A. and Aein, M. J. and Szedmak, S. and Aksoy, E. E. and Piater, J. and Wörgötter, F.},
      title = {Using Structural Bootstrapping for Object Substitution in Robotic Executions of Human-like Manipulation Tasks},
      pages = {6479-6486},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
      year = {2015},
      location = {Hamburg, Germany},
      month = {Sept},
      doi = {10.1109/IROS.2015.7354303},
      abstract = {In this work we address the problem of finding replacements for missing objects that are needed for the execution of human-like manipulation tasks. This is a common problem that humans solve easily, drawing on their natural knowledge to find object substitutions: using a knife as a screwdriver or a book as a cutting board. In robotic applications, on the other hand, the objects required for the task must be included in advance in the problem definition. If any of these objects is missing from the scenario, the conventional approach is to manually redefine the problem according to the objects available in the scene. In this work we propose an automatic way of finding object substitutions for the execution of manipulation tasks. The approach uses a logic-based planner to generate a plan from a prototypical problem definition and searches for replacements in the scene when some of the objects involved in the plan are missing. This is done by means of a repository of objects and attributes with roles, which is used to identify the affordances of the unknown objects in the scene. Planning actions are grounded using a novel approach that encodes the semantic structure of manipulation actions. The system was evaluated on a KUKA arm platform for the task of preparing a salad, with successful results.}}
    Abstract: In this work we address the problem of finding replacements for missing objects that are needed for the execution of human-like manipulation tasks. This is a common problem that humans solve easily, drawing on their natural knowledge to find object substitutions: using a knife as a screwdriver or a book as a cutting board. In robotic applications, on the other hand, the objects required for the task must be included in advance in the problem definition. If any of these objects is missing from the scenario, the conventional approach is to manually redefine the problem according to the objects available in the scene. In this work we propose an automatic way of finding object substitutions for the execution of manipulation tasks. The approach uses a logic-based planner to generate a plan from a prototypical problem definition and searches for replacements in the scene when some of the objects involved in the plan are missing. This is done by means of a repository of objects and attributes with roles, which is used to identify the affordances of the unknown objects in the scene. Planning actions are grounded using a novel approach that encodes the semantic structure of manipulation actions. The system was evaluated on a KUKA arm platform for the task of preparing a salad, with successful results.
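    The core substitution step described above, finding a scene object whose attributes best match those of a missing object, can be illustrated in a few lines of Python. The repository entries and the simple attribute-overlap score below are illustrative assumptions; the paper itself uses a richer repository of objects, attributes and roles together with a logic-based planner.

    # Hypothetical attribute repository; the real knowledge base is larger.
    REPOSITORY = {
        "cutting_board": {"flat", "rigid", "graspable", "support_surface"},
        "book":          {"flat", "rigid", "graspable"},
        "knife":         {"rigid", "graspable", "sharp_edge"},
        "screwdriver":   {"rigid", "graspable", "thin_tip"},
        "sponge":        {"graspable", "soft"},
    }

    def find_substitute(missing_object, scene_objects):
        """Return the scene object sharing the most attributes with the
        missing object, or None if no candidate shares any attribute."""
        required = REPOSITORY.get(missing_object, set())
        best, best_score = None, 0
        for candidate in scene_objects:
            score = len(required & REPOSITORY.get(candidate, set()))
            if score > best_score:
                best, best_score = candidate, score
        return best

    # Example: no cutting board in the scene -> a book is proposed instead.
    print(find_substitute("cutting_board", ["book", "sponge", "screwdriver"]))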
    Review:
    Ziaeetabar, F. and Aksoy, E. E. and Wörgötter, F. and Tamosiunaite, M. (2017).
    Semantic Analysis of Manipulation Actions Using Spatial Relations. IEEE International Conference on Robotics and Automation (ICRA) (accepted).
    BibTeX:
    @inproceedings{ziaeetabaraksoywoergoetter2017,
      author = {Ziaeetabar, F. and Aksoy, E. E. and Wörgötter, F. and Tamosiunaite, M.},
      title = {Semantic Analysis of Manipulation Actions Using Spatial Relations},
      booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
      year = {2017},
      month = {May - June},
      note = {accepted},
      abstract = {Recognition of human manipulation actions, together with their analysis and execution by a robot, is an important issue. Also, perception of spatial relationships between objects is central to understanding the meaning of manipulation actions. Here we would like to merge these two notions and analyze manipulation actions using symbolic spatial relations between objects in the scene. Specifically, we define procedures for extraction of symbolic human-readable relations based on Axis Aligned Bounding Box object models and use sequences of those relations for action recognition from image sequences. Our framework is inspired by the so-called Semantic Event Chain framework, which analyzes touching and un-touching events of different objects during the manipulation. However, our framework uses fourteen spatial relations instead of two. We show that our relational framework is able to differentiate between more manipulation actions than the original Semantic Event Chains. We quantitatively evaluate the method on the MANIAC dataset containing 120 videos of eight different manipulation actions and obtain 97% classification accuracy, which is 12% higher than that of the original Semantic Event Chains.}}
    Abstract: Recognition of human manipulation actions, together with their analysis and execution by a robot, is an important issue. Also, perception of spatial relationships between objects is central to understanding the meaning of manipulation actions. Here we would like to merge these two notions and analyze manipulation actions using symbolic spatial relations between objects in the scene. Specifically, we define procedures for extraction of symbolic human-readable relations based on Axis Aligned Bounding Box object models and use sequences of those relations for action recognition from image sequences. Our framework is inspired by the so-called Semantic Event Chain framework, which analyzes touching and un-touching events of different objects during the manipulation. However, our framework uses fourteen spatial relations instead of two. We show that our relational framework is able to differentiate between more manipulation actions than the original Semantic Event Chains. We quantitatively evaluate the method on the MANIAC dataset containing 120 videos of eight different manipulation actions and obtain 97% classification accuracy, which is 12% higher than that of the original Semantic Event Chains.
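    The relation-extraction step described above can be sketched directly from axis-aligned bounding boxes. The paper defines fourteen relations; the snippet below derives only a few illustrative ones (touching/overlapping, above, below, apart), and the exact predicates, thresholds and example boxes are assumptions, not the authors' definitions.

    from dataclasses import dataclass

    @dataclass
    class AABB:
        xmin: float
        ymin: float
        zmin: float
        xmax: float
        ymax: float
        zmax: float

    def overlaps_1d(a_min, a_max, b_min, b_max):
        # Closed intervals overlap (or just touch) on one axis.
        return a_min <= b_max and b_min <= a_max

    def relations(a: AABB, b: AABB):
        """Return a set of symbolic spatial relations holding between a and b."""
        rels = set()
        overlap_xy = (overlaps_1d(a.xmin, a.xmax, b.xmin, b.xmax)
                      and overlaps_1d(a.ymin, a.ymax, b.ymin, b.ymax))
        if overlap_xy and overlaps_1d(a.zmin, a.zmax, b.zmin, b.zmax):
            rels.add("touching_or_overlapping")
        elif overlap_xy and a.zmin > b.zmax:
            rels.add("above")
        elif overlap_xy and a.zmax < b.zmin:
            rels.add("below")
        else:
            rels.add("apart")
        return rels

    # Example: a cup resting on top of a table.
    table = AABB(0.0, 0.0, 0.0, 1.0, 1.0, 0.8)
    cup   = AABB(0.4, 0.4, 0.8, 0.5, 0.5, 0.9)
    print(relations(cup, table))                  # {'touching_or_overlapping'}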
    Review:
