Dr. Jan Markus Schoeler

Group(s): Computer Vision
Email: mschoeler@gwdg.de

    Papon, J. and Abramov, A. and Schoeler, M. and Wörgötter, F. (2013).
    Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds. IEEE Conference on Computer Vision and Pattern Recognition CVPR, 2027 - 2034. DOI: 10.1109/CVPR.2013.264.
    BibTeX:
    @inproceedings{paponabramovschoeler2013,
      author = {Papon, J. and Abramov, A. and Schoeler, M. and Wörgötter, F.},
      title = {Voxel Cloud Connectivity Segmentation - Supervoxels for Point Clouds},
      pages = {2027 - 2034},
      booktitle = {IEEE Conference on Computer Vision and Pattern Recognition CVPR},
      year = {2013},
      location = {Portland, OR, USA},
      month = {06},
      doi = {10.1109/CVPR.2013.264}
    }
    		
    Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as super pixels, is a widely used preprocessing step in segmentation algorithms. Super pixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that super pixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three dimensional geometric relationships between observed data points which can be used to prevent super pixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
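    Code sketch: a minimal, illustrative Python version of the core idea in this abstract, namely that supervoxel labels are grown outward from seeds strictly along adjacent occupied voxels, so segments cannot bridge empty space. It omits the colour/normal distances used by the actual VCCS algorithm (available in PCL); all helper names and parameter values below are made up for illustration.

import numpy as np
from collections import deque

def voxelize(points, resolution):
    """Map each point to an occupied voxel; return voxel key -> point indices."""
    keys = np.floor(points / resolution).astype(int)
    voxels = {}
    for i, key in enumerate(map(tuple, keys)):
        voxels.setdefault(key, []).append(i)
    return voxels

def grow_supervoxels(points, voxel_res=0.02, seed_res=0.1):
    voxels = voxelize(points, voxel_res)
    centroids = {k: points[idx].mean(axis=0) for k, idx in voxels.items()}
    # One seed voxel per coarse cell; seeds always lie on occupied voxels.
    seeds = {}
    for k, c in centroids.items():
        seeds.setdefault(tuple(np.floor(c / seed_res).astype(int)), k)
    # Breadth-first label growth over the 26-connected occupied-voxel graph:
    # labels only flow through adjacent occupied voxels, so supervoxels
    # never bridge regions of empty space.
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
               for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
    labels = {}
    queue = deque((seed, label) for label, seed in enumerate(seeds.values()))
    while queue:
        key, label = queue.popleft()
        if key in labels or key not in voxels:
            continue
        labels[key] = label
        for off in offsets:
            queue.append((tuple(np.add(key, off)), label))
    return labels  # voxel key -> supervoxel label

points = np.random.rand(2000, 3)
print(len(set(grow_supervoxels(points).values())), "supervoxels")
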
    Stein, S. and Schoeler, M. and Papon, J. and Wörgötter, F. (2014).
    Object Partitioning using Local Convexity. Conference on Computer Vision and Pattern Recognition CVPR, 304-311. DOI: 10.1109/CVPR.2014.46.
    BibTeX:
    @inproceedings{steinschoelerpapon2014,
      author = {Stein, S. and Schoeler, M. and Papon, J. and Wörgötter, F.},
      title = {Object Partitioning using Local Convexity},
      pages = {304-311},
      booktitle = {Conference on Computer Vision and Pattern Recognition CVPR},
      year = {2014},
      location = {Columbus, OH, USA},
      month = {06},
      doi = {10.1109/CVPR.2014.46}
    }
    		
    Abstract: The problem of how to arrive at an appropriate 3D-segmentation of a scene remains difficult. While current state-of-the-art methods continue to gradually improve in benchmark performance, they also grow more and more complex, for example by incorporating chains of classifiers, which require training on large manually annotated data sets. As an alternative to this, we present a new, efficient learning- and model-free approach for the segmentation of 3D point clouds into object parts. The algorithm begins by decomposing the scene into an adjacency-graph of surface patches based on a voxel grid. Edges in the graph are then classified as either convex or concave using a novel combination of simple criteria which operate on the local geometry of these patches. This way the graph is divided into locally convex connected subgraphs, which - with high accuracy - represent object parts. Additionally, we propose a novel depth dependent voxel grid to deal with the decreasing point-density at far distances in the point clouds. This improves segmentation, allowing the use of fixed parameters for vastly different scenes. The algorithm is straight-forward to implement and requires no training data, while nevertheless producing results that are comparable to state-of-the-art methods which incorporate high-level concepts involving classification, learning and model fitting.
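    Code sketch: a simplified, hypothetical version of the edge classification this abstract describes: a connection between two adjacent patches is treated as convex when the normals open away from each other along the line joining the patch centroids. The tolerance and the exact form of the test are assumptions, not the paper's full set of criteria.

import numpy as np

def edge_is_convex(c1, n1, c2, n2, angle_tol_deg=10.0):
    """Classify the connection between two adjacent surface patches.

    c1, c2 are patch centroids, n1, n2 unit surface normals. The connection is
    convex when the normals 'open away' from each other along the line joining
    the centroids (the surface bends outward): n1 . d > n2 . d, with d pointing
    from patch 2 to patch 1. A small tolerance treats near-flat joins as convex.
    """
    d = np.asarray(c1, float) - np.asarray(c2, float)
    d /= np.linalg.norm(d)
    opening = np.dot(n1, d) - np.dot(n2, d)
    return opening > -np.sin(np.deg2rad(angle_tol_deg))

# Convex shoulder: normals point apart across the join.
print(edge_is_convex([0, 0, 0], [0, 0, 1], [1, 0, -0.2], [0.5, 0, 0.87]))       # True
# Concave valley: normals point toward each other across the join.
print(edge_is_convex([0, 0, 0], [0.5, 0, 0.87], [1, 0, 0.2], [-0.5, 0, 0.87]))  # False
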
    Stein, S. and Wörgötter, F. and Schoeler, M. and Papon, J. and Kulvicius, T. (2014).
    Convexity based object partitioning for robot applications. IEEE International Conference on Robotics and Automation (ICRA), 3213-3220. DOI: 10.1109/ICRA.2014.6907321.
    BibTeX:
    @inproceedings{steinwoergoetterschoeler2014,
      author = {Stein, S. and Wörgötter, F. and Schoeler, M. and Papon, J. and Kulvicius, T.},
      title = {Convexity based object partitioning for robot applications},
      pages = {3213-3220},
      booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
      year = {2014},
      month = {05},
      doi = {10.1109/ICRA.2014.6907321}
    }
    		
    Abstract: The idea that connected convex surfaces, separated by concave boundaries, play an important role for the perception of objects and their decomposition into parts has been discussed for a long time. Based on this idea, we present a new bottom-up approach for the segmentation of 3D point clouds into object parts. The algorithm approximates a scene using an adjacency-graph of spatially connected surface patches. Edges in the graph are then classified as either convex or concave using a novel, strictly local criterion. Region growing is employed to identify locally convex connected subgraphs, which represent the object parts. We show quantitatively that our algorithm, although conceptually easy to grasp and fast to compute, produces results that are comparable to far more complex state-of-the-art methods which use classification, learning and model fitting. This suggests that convexity/concavity is a powerful feature for object partitioning using 3D data. Furthermore we demonstrate that for many objects a natural decomposition into
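    Code sketch: the region-growing step mentioned in the abstract, reduced to its combinatorial core: once the edges of the patch-adjacency graph are labelled convex or concave, object parts are simply the connected components reachable through convex edges. The helper below is illustrative, not the authors' implementation.

from collections import defaultdict

def locally_convex_components(num_patches, edges):
    """edges: iterable of (i, j, is_convex). Returns one part label per patch."""
    adj = defaultdict(list)
    for i, j, is_convex in edges:
        if is_convex:                 # concave edges act as part boundaries
            adj[i].append(j)
            adj[j].append(i)
    labels = [-1] * num_patches
    current = 0
    for start in range(num_patches):
        if labels[start] != -1:
            continue
        stack = [start]               # grow one part through convex edges only
        while stack:
            p = stack.pop()
            if labels[p] != -1:
                continue
            labels[p] = current
            stack.extend(adj[p])
        current += 1
    return labels

# Patches 0-1-2 joined convexly, patch 3 attached through a concave edge:
print(locally_convex_components(4, [(0, 1, True), (1, 2, True), (2, 3, False)]))
# -> [0, 0, 0, 1]
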
    Schoeler, M. and Stein, S. and Papon, J. and Abramov, A. and Wörgötter, F. (2014).
    Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications. International Conference on Computer Vision Theory and Applications VISAPP, 1 - 10.
    BibTeX:
    @inproceedings{schoelersteinpapon2014,
      author = {Schoeler, M. and Stein, S. and Papon, J. and Abramov, A. and Wörgötter, F.},
      title = {Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications},
      pages = {1 - 10},
      booktitle = {International Conference on Computer Vision Theory and Applications VISAPP},
      year = {2014},
      month = {January}
    }
    		
    Abstract: Today most recognition pipelines are trained at an off-line stage, providing systems with pre-segmented images and predefined objects, or at an on-line stage, which requires a human supervisor to tediously control the learning. Self-Supervised on-line training of recognition pipelines without human intervention is a highly desirable goal, as it allows systems to learn unknown, environment specific objects on-the-fly. We propose a fast and automatic system, which can extract and learn unknown objects with minimal human intervention by employing a two-level pipeline combining the advantages of RGB-D sensors for object extraction and high-resolution cameras for object recognition. Furthermore, we significantly improve recognition results with local features by implementing a novel keypoint orientation scheme, which leads to highly invariant but discriminative object signatures. Using only one image per object for training, our system is able to achieve a recognition rate of 79% for 18 objects, benchmarked on 42 scenes with random poses, scales and occlusion, while only taking 7 seconds for the training. Additionally, we evaluate our orientation scheme on the state-of-the-art 56-object SDU-dataset boosting accuracy for one training view per object by +37% to 78% and peaking at a performance of 98% for 11 training views.
    Schoeler, M. and Wörgötter, F. and Aein, M. and Kulvicius, T. (2014).
    Automated generation of training sets for object recognition in robotic applications. 23rd International Conference on Robotics in Alpe-Adria-Danube Region (RAAD), 1-7. DOI: 10.1109/RAAD.2014.7002247.
    BibTeX:
    @inproceedings{schoelerwoergoetteraein2014,
      author = {Schoeler, M. and Wörgötter, F. and Aein, M. and Kulvicius, T.},
      title = {Automated generation of training sets for object recognition in robotic applications},
      pages = {1-7},
      booktitle = {23rd International Conference on Robotics in Alpe-Adria-Danube Region (RAAD)},
      year = {2014},
      month = {Sept},
      doi = {10.1109/RAAD.2014.7002247}
    }
    		
    Abstract: Object recognition plays an important role in robotics, since objects/tools first have to be identified in the scene before they can be manipulated/used. The performance of object recognition largely depends on the training dataset. Usually such training sets are gathered manually by a human operator, a tedious procedure, which ultimately limits the size of the dataset. One reason for manual selection of samples is that results returned by search engines often contain irrelevant images, mainly due to the problem of homographs (words spelled the same but with different meanings). In this paper we present an automated and unsupervised method, coined Trainingset Cleaning by Translation (TCT), for generation of training sets which are able to deal with the problem of homographs. For disambiguation, it uses the context provided by a command like "tighten the nut" together with a combination of public image searches, text searches and translation services. We compare our approach against plain Google image search qualitatively as well as in a classification task and demonstrate that our method indeed leads to a task-relevant training set, which results in an improvement of 24.1% in object recognition for 12 ambiguous classes. In addition, we present an application of our method to a real robot scenario.
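    Code sketch: a schematic outline of the disambiguation idea described above. The verb context steers the translation of an ambiguous noun into several languages before image search, so homographs such as "nut" resolve to the intended sense. translate_in_context and image_search are hypothetical placeholders, not real APIs.

def translate_in_context(phrase, noun, language):
    """Translate `phrase` into `language`, then return the word `noun` maps to."""
    raise NotImplementedError("stand-in for a translation service")

def image_search(query, language, max_results=100):
    """Return image URLs for `query` from a public image search engine."""
    raise NotImplementedError("stand-in for an image search API")

def build_training_set(command, noun, languages=("de", "fr", "es")):
    urls = []
    for lang in languages:
        # "tighten the nut" -> e.g. German "Mutter" rather than "Nuss",
        # so the homograph is resolved before the image search is issued.
        disambiguated = translate_in_context(command, noun, lang)
        urls.extend(image_search(disambiguated, lang))
    return urls
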
    Papon, J. and Schoeler, M. and Wörgötter, F. (2015).
    Spatially Stratified Correspondence Sampling for Real-Time Point Cloud Tracking. IEEE Winter Conference on Applications of Computer Vision (WACV), 124-131. DOI: 10.1109/WACV.2015.24.
    BibTeX:
    @inproceedings{paponschoelerwoergoetter2015,
      author = {Papon, J. and Schoeler, M. and Wörgötter, F.},
      title = {Spatially Stratified Correspondence Sampling for Real-Time Point Cloud Tracking},
      pages = {124-131},
      booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
      year = {2015},
      month = {Jan},
      doi = {10.1109/WACV.2015.24}
    }
    		
    Abstract: In this paper we propose a novel spatially stratified sampling technique for evaluating the likelihood function in particle filters. In particular, we show that in the case where the measurement function uses spatial correspondence, we can greatly reduce computational cost by exploiting spatial structure to avoid redundant computations. We present results which quantitatively show that the technique permits equivalent, and in some cases, greater accuracy, as a reference point cloud particle filter at significantly faster run-times. We also compare to a GPU implementation, and show that we can exceed their performance on the CPU. In addition, we present results on a multi-target tracking application, demonstrating that the increases in efficiency permit online 6DoF multi-target tracking on standard hardware.
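    Code sketch: an illustrative, NumPy-only reading of the stratified idea: model points are binned into coarse spatial cells once, and each particle's likelihood is then estimated from one randomly drawn point per cell rather than from the full cloud. The cell size, the Gaussian residual model and the brute-force nearest-neighbour search are assumptions made for brevity, not the paper's implementation.

import numpy as np

def build_strata(points, cell=0.05):
    """Partition point indices into coarse spatial cells (the strata)."""
    keys = np.floor(points / cell).astype(int)
    strata = {}
    for i, key in enumerate(map(tuple, keys)):
        strata.setdefault(key, []).append(i)
    return list(strata.values())

def stratified_log_likelihood(model_pts, observed_pts, pose, strata,
                              sigma=0.01, rng=None):
    rng = rng or np.random.default_rng()
    sample = np.array([rng.choice(idx) for idx in strata])   # one point per stratum
    transformed = model_pts[sample] @ pose[:3, :3].T + pose[:3, 3]
    # Nearest-neighbour residual per sampled point (brute force for brevity).
    d2 = ((transformed[:, None, :] - observed_pts[None, :, :]) ** 2).sum(-1).min(1)
    return -0.5 * np.sum(d2) / sigma ** 2

model = np.random.rand(500, 3)
observed = model + np.random.normal(0, 0.005, model.shape)
strata = build_strata(model)
print(stratified_log_likelihood(model, observed, np.eye(4), strata))
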
    Schoeler, M. and Wörgötter, F. and Papon, J. and Kulvicius, T. (2015).
    Unsupervised generation of context-relevant training-sets for visual object recognition employing multilinguality. IEEE Winter Conference on Applications of Computer Vision (WACV), 805-812. DOI: 10.1109/WACV.2015.112.
    BibTeX:
    @inproceedings{schoelerwoergoetterpapon2015,
      author = {Schoeler, M. and Wörgötter, F. and Papon, J. and Kulvicius, T.},
      title = {Unsupervised generation of context-relevant training-sets for visual object recognition employing multilinguality},
      pages = {805-812},
      booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
      year = {2015},
      month = {Jan},
      doi = {10.1109/WACV.2015.112}
    }
    		
    Abstract: Image based object classification requires clean training data sets. Gathering such sets is usually done manually by humans, which is time-consuming and laborious. On the other hand, directly using images from search engines creates very noisy data due to ambiguous noun-focused indexing. However, in daily speech nouns and verbs are always coupled. We use this for the automatic generation of clean data sets by the here-presented TRANSCLEAN algorithm, which through the use of multiple languages also solves the problem of polysemes (a single spelling with multiple meanings). Thus, we use the implicit knowledge contained in verbs, e.g. in an imperative such as
    Aksoy, E. E. and Schoeler, M. and Wörgötter, F. (2014).
    Testing Piaget's ideas on robots: Assimilation and accommodation using the semantics of actions. IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob), 107-108. DOI: 10.1109/DEVLRN.2014.6982962.
    BibTeX:
    @inproceedings{aksoyschoelerwoergoetter2014,
      author = {Aksoy, E. E. and Schoeler, M. and Wörgötter, F.},
      title = {Testing Piaget's ideas on robots: Assimilation and accommodation using the semantics of actions},
      pages = {107-108},
      booktitle = {IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob)},
      year = {2014},
      month = {Oct},
      doi = {10.1109/DEVLRN.2014.6982962}
    }
    		
    Abstract: The proposed framework addresses the problem of implementing a high level
    Schoeler, M. and Papon, J. and Wörgötter, F. (2015).
    Constrained planar cuts - Object partitioning for point clouds. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5207-5215. DOI: 10.1109/CVPR.2015.7299157.
    BibTeX:
    @inproceedings{schoelerpaponwoergoetter2015,
      author = {Schoeler, M. and Papon, J. and Wörgötter, F.},
      title = {Constrained planar cuts - Object partitioning for point clouds},
      pages = {5207-5215},
      booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      year = {2015},
      location = {Boston, MA, USA},
      month = {June},
      doi = {10.1109/CVPR.2015.7299157}
    }
    		
    Abstract: While humans can easily separate unknown objects into meaningful parts, recent segmentation methods can only achieve similar partitionings by training on human-annotated ground-truth data. Here we introduce a bottom-up method for segmenting 3D point clouds into functional parts which does not require supervision and achieves equally good results. Our method uses local concavities as an indicator for inter-part boundaries. We show that this criterion is efficient to compute and generalizes well across different object classes. The algorithm employs a novel locally constrained geometrical boundary model which proposes greedy cuts through a local concavity graph. Only planar cuts are considered and evaluated using a cost function, which rewards cuts orthogonal to concave edges. Additionally, a local clustering constraint is applied to ensure the partitioning only affects relevant locally concave regions. We evaluate our algorithm on recordings from an RGB-D camera as well as the Princeton Segmentation Benchmark, using a fixed set of parameters across all object classes. This stands in stark contrast to most reported results which require either knowing the number of parts or annotated ground-truth for learning. Our approach outperforms all existing bottom-up methods (reducing the gap to human performance by up to 50 %) and achieves scores similar to top-down data-driven approaches.
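    Code sketch: a toy cost in the spirit of the abstract's description, rewarding candidate cut planes that pass close to concave boundary points while staying roughly orthogonal to the local surface there. The weighting, thresholds and the way candidate planes are proposed in the paper are more involved; everything below is illustrative.

import numpy as np

def plane_cut_score(plane_normal, plane_offset, concave_pts, concave_normals,
                    distance_scale=0.02):
    """Score a candidate cut plane n . x + offset = 0 against concave edge points."""
    n = np.asarray(plane_normal, float)
    n /= np.linalg.norm(n)
    dist = np.abs(concave_pts @ n + plane_offset)        # point-to-plane distance
    proximity = np.exp(-(dist / distance_scale) ** 2)    # only nearby concavities count
    # A cut slicing 'across' a concavity has its normal lying in the surface,
    # i.e. orthogonal to the local surface normals at the concave points.
    orthogonality = 1.0 - np.abs(concave_normals @ n)
    return float(np.sum(proximity * orthogonality))

# Concave groove along the line x = 0 on an upward-facing surface:
pts = np.column_stack([np.zeros(50), np.linspace(0.0, 1.0, 50), np.zeros(50)])
normals = np.tile([0.0, 0.0, 1.0], (50, 1))
print(plane_cut_score([1.0, 0.0, 0.0], 0.0, pts, normals))  # ~50: cuts through the groove
print(plane_cut_score([0.0, 0.0, 1.0], 0.0, pts, normals))  # ~0: lies in the surface
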
    Papon, J. and Schoeler, M. (2015).
    Semantic Pose using Deep Networks Trained on Synthetic RGB-D. IEEE International Conference on Computer Vision (ICCV), 1-9.
    BibTeX:
    @inproceedings{paponschoeler2015,
      author = {Papon, J. and Schoeler, M.},
      title = {Semantic Pose using Deep Networks Trained on Synthetic RGB-D},
      pages = {1-9},
      booktitle = {IEEE International Conference on Computer Vision (ICCV)},
      year = {2015},
      location = {Santiago, Chile},
      month = {12},
      url = {http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Papon_Semantic_Pose_Using_ICCV_2015_paper.pdf}
    }
    		
    Abstract: In this work we address the problem of indoor scene understanding from RGB-D images. Specifically, we propose to find instances of common furniture classes, their spatial extent, and their pose with respect to generalized class models. To accomplish this, we use a deep, wide, multi-output convolutional neural network (CNN) that predicts class, pose, and location of possible objects simultaneously. To overcome the lack of large annotated RGB-D training sets (especially those with pose), we use an on-the-fly rendering pipeline that generates realistic cluttered room scenes in parallel to training. We then perform transfer learning on the relatively small amount of publicly available annotated RGB-D data, and find that our model is able to successfully annotate even highly challenging real scenes. Importantly, our trained network is able to understand noisy and sparse observations of highly cluttered scenes with a remarkable degree of accuracy, inferring class and pose from a very limited set of cues. Additionally, our neural network is only moderately deep and computes class, pose and position in tandem, so the overall run-time is significantly faster than existing methods, estimating all output parameters simultaneously in parallel on a GPU in seconds.
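    Code sketch: a minimal PyTorch example of the architectural pattern the abstract relies on, a shared convolutional trunk with separate heads that predict class, pose and location in a single forward pass. The layer sizes, the quaternion pose head and the 3D-centre location head are assumptions for illustration, not the paper's network.

import torch
import torch.nn as nn

class MultiOutputNet(nn.Module):
    def __init__(self, in_channels=4, num_classes=10):   # RGB-D input: 4 channels
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(128, num_classes)  # object category
        self.pose_head = nn.Linear(128, 4)             # orientation as a quaternion
        self.loc_head = nn.Linear(128, 3)              # object centre in the scene

    def forward(self, x):
        features = self.trunk(x)
        quat = nn.functional.normalize(self.pose_head(features), dim=1)
        return self.class_head(features), quat, self.loc_head(features)

net = MultiOutputNet()
logits, pose, loc = net(torch.randn(2, 4, 128, 128))
print(logits.shape, pose.shape, loc.shape)  # (2, 10) (2, 4) (2, 3)
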
    Abelha, P. and Guerin, F. and Schoeler, M. (2016).
    A Model-Based Approach to Finding Substitute Tools in 3D Vision Data. IEEE International Conference on Robotics and Automation (ICRA) (accepted).
    BibTeX:
    @inproceedings{abelhaguerinschoeler2016,
      author = {Abelha, P. and Guerin, F. and Schoeler, M.},
      title = {A Model-Based Approach to Finding Substitute Tools in 3D Vision Data},
      booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
      year = {2016},
      note = {accepted}
    }
    		
    Abstract: A robot can feasibly be given knowledge of a set of tools for manipulation activities (e.g. hammer, knife, spatula). If the robot then operates outside a closed environment it is likely to face situations where the tool it knows is not available, but alternative unknown tools are present. We tackle the problem of finding the best substitute tool based solely on 3D vision data. Our approach handcodes simple models of known tools in terms of superquadrics and relationships among them. Our system attempts to fit these models to pointclouds of unknown tools, producing a numeric value for how good a fit is. This value can be used to rate candidate substitutes. We explicitly control how closely each part of a tool must match our model, under direction from parameters of a target task. We allow bottom-up information from segmentation to dictate the sizes that should be considered for various parts of the tool. These ideas allow for a flexible matching so that tools may be superficially quite different, but similar in the way that matters. We evaluate our systems ratings relative to other approaches and relative to human performance in the same task. This is an approach to knowledge transfer, via a suitable representation and reasoning engine, and we discuss how this could be extended to transfer in planning.
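    Code sketch: the standard superquadric inside-outside function and a simple goodness-of-fit score of the kind such model-based matching builds on. The axis-aligned parameterisation and the particular scoring are common textbook choices, not necessarily the exact cost used in the paper.

import numpy as np

def inside_outside(points, a1, a2, a3, e1, e2):
    """Superquadric inside-outside function F: 1 on the surface, <1 inside, >1 outside."""
    x, y, z = np.abs(points).T
    xy = (x / a1) ** (2.0 / e2) + (y / a2) ** (2.0 / e2)
    return xy ** (e2 / e1) + (z / a3) ** (2.0 / e1)

def fit_score(points, params):
    """Lower is better: mean squared deviation of F**e1 from 1 over the point cloud."""
    a1, a2, a3, e1, e2 = params
    f = inside_outside(points, a1, a2, a3, e1, e2)
    return float(np.mean((f ** e1 - 1.0) ** 2))

# Points filling a box-like part with semi-axes (0.1, 0.1, 0.3) score well
# against a matching box-like superquadric and poorly against a thin rod:
pts = np.random.uniform([-0.1, -0.1, -0.3], [0.1, 0.1, 0.3], size=(500, 3))
print(fit_score(pts, (0.1, 0.1, 0.3, 0.1, 0.1)))    # small
print(fit_score(pts, (0.01, 0.01, 0.3, 0.1, 0.1)))  # much larger
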
    Schoeler, M. and Wörgötter, F. (2015).
    Bootstrapping the Semantics of Tools: Affordance analysis of real world objects on a per-part basis. IEEE Transactions on Autonomous Mental Development (TAMD), 8(2), 84-98. DOI: 10.1109/TAMD.2015.2488284.
    BibTeX:
    @article{schoelerwoergoetter2015,
      author = {Schoeler, M. and Wörgötter, F.},
      title = {Bootstrapping the Semantics of Tools: Affordance analysis of real world objects on a per-part basis},
      pages = {84-98},
      journal = {IEEE Transactions on Autonomous Mental Development (TAMD)},
      year = {2015},
      volume= {8},
      number = {2},
      month = {06},
      doi = {10.1109/TAMD.2015.2488284}
    }
    		
    Abstract: This study shows how understanding of object functionality arises by analyzing objects at the level of their parts where we focus here on primary tools. First, we create a set of primary tool functionalities, which we speculate is related to the possible functions of the human hand. The function of a tool is found by comparing it to this set. For this, the unknown tool is segmented, using a data-driven method, into its parts and evaluated using the geometrical part constellations against the training set. We demonstrate that various tools and even uncommon tool-versions can be recognized. The system
    Gressmann, F. and Lüddecke, T. and Ivanovska, T. and Schoeler, M. and Wörgötter, F. (2017).
    Part-driven Visual Perception of 3D Objects. Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017), 370-377. DOI: 10.5220/0006211203700377.
    BibTeX:
    @conference{gressmannlueddeckeivanovska2017,
      author = {Gressmann, F. and Lüddecke, T. and Ivanovska, T. and Schoeler, M. and Wörgötter, F.},
      title = {Part-driven Visual Perception of 3D Objects},
      pages = {370-377},
      booktitle = {Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
      year = {2017},
      organization = {ScitePress},
      publisher = {ScitePress},
      doi = {10.5220/0006211203700377}
    }
    		
    Abstract: During the last years, approaches based on convolutional neural networks (CNN) had substantial success in visual object perception. CNNs turned out to be capable of extracting high-level features of objects, which allow for fine-grained classification. However, some object classes exhibit tremendous variance with respect to their instances appearance. We believe that considering object parts as an intermediate representation could be helpful in these cases. In this work, a part-driven perception of everyday objects with a rotation estimation is implemented using deep convolution neural networks. The used network is trained and tested on artificially generated RGB-D data. The approach has a potential to be used for part recognition of realistic sensor recordings in present robot systems.
