DeepGen
DeepGen Project Page
Advisors: Mohammad Keshavarzi, Allen Yang
Team Members: Flaviano Christian Reyes, Ritika Shrivastava, Songwen Su
Video link: https://www.youtube.com/watch?v=oMTvVkss-38
Introduction

Context
Due to recent developments in both software and hardware, augmented reality is becoming increasingly popular, with applications in many fields such as gaming, healthcare, and education. When designing these AR and VR applications, one recurring problem is the variance encountered when the same application is used in different environments.
Motivation
The environment settings of different users can vary greatly. At the moment, most augmented reality applications do not take environmental context into consideration when projecting virtual objects into the surrounding environment. This can break the suspension of disbelief and hurt the user experience. To narrow the scope of such a large problem, our team decided to focus on the problem of indoor scene synthesis.

The Problem
The general goal of indoor scene synthesis is to produce a feasible furniture layout of various object classes that addresses both functional and aesthetic criteria. Given a room with some existing objects, this project focuses on finding an optimal location at which to place a new object.
Team Process
Literature Review

PlanIT: Combines a high-level relation graph representation with spatial prior neural networks.

GRAINS: Input scene represented as a tree whose nodes represent objects or relationships between objects. Training involves encoding and recreating the tree.

SceneGraphNet: Neural message passing approach to augment an input 3D indoor scene with new objects matching their surroundings. 

SceneGen: The system takes a semantically segmented scene as input and outputs positional and orientation probability maps for placing virtual content.
Final Prototype: 
System description
A designer wants to place a furniture object “F” with label “F_L” into a room “R”, which may already contain other objects. The designer’s primary question is where to place “F” in “R”, and our algorithm provides a solution to this problem. First, we sample points uniformly in space from the room at some resolution; for instance, at a resolution of 50, we sample 50x50 = 2500 points. For each sampled point “P”, our system pretends that “F” has been placed so that its center coincides with “P”. Given this hypothetical placement, we generate a feature vector describing “F”’s position relative to everything that already exists within the scene, including the walls and the floor. We also generate a scene graph describing the spatial relationships between “F” and the existing objects. The feature vector and scene graph associated with “F” and “P” are passed to our model, which outputs a probability describing the likelihood of the placement. Once all sampled points have been scored, the point “P*” with the maximum probability of placement is where we finally place object “F”.
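The uniform sampling scheme described above can be sketched as follows. This is a minimal illustration, assuming a rectangular floor plan given by its minimum and maximum (x, y) corners; the function names are our own, not from the project's codebase.

```python
import numpy as np

def sample_grid_points(room_min, room_max, resolution=50):
    """Sample a resolution x resolution grid of candidate placement
    centers uniformly over the room's (x, y) floor extents."""
    xs = np.linspace(room_min[0], room_max[0], resolution)
    ys = np.linspace(room_min[1], room_max[1], resolution)
    gx, gy = np.meshgrid(xs, ys)
    # Flatten the grid into a list of (x, y) candidate points.
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

# A 4m x 3m room sampled at resolution 50 yields 50 x 50 = 2500 points.
points = sample_grid_points((0.0, 0.0), (4.0, 3.0), resolution=50)
```

Each of these points is then scored independently, which makes the search trivially parallelizable.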
Final Prototype: 
Implementation details
Input
Given an object of size (r_x, r_y, r_z) with label L.

Data Structure

Feature Vector
     - Average distance from a possible placement to each furniture group
     - Counts per furniture group detailing how many members of a furniture group are surrounding a target object
     - Counts per furniture group detailing how many members of a furniture group are intersecting a target object
     - Counts per furniture group detailing how many members of a furniture group are supporting a target object
     - Three closest furniture groups, stored in order within a vector [A, B, C]. For instance, A represents the closest furniture group to the target object, while C represents the farthest of the three.
     - One-hot encoded vector representing the relative position of the target object within the room. Specifically, [1, 0] represents that the object is nearest one wall, [0, 1] represents that the object is equally near two walls (a corner), and [0, 0] represents that an object is positioned in the middle of the room.
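The pieces listed above concatenate into a single summary vector per candidate placement. The sketch below assumes a hypothetical set of four furniture groups and hand-picked example values; the real groups and statistics come from the scene data.

```python
import numpy as np

# Hypothetical furniture groups; the real set comes from the dataset's labels.
GROUPS = ["seating", "tables", "storage", "lighting"]

def build_feature_vector(avg_dist, surround, intersect, support, closest3, wall_code):
    """Concatenate the per-group statistics described above into one
    summary feature vector for a candidate placement."""
    parts = [
        np.asarray(avg_dist, dtype=float),   # average distance per group
        np.asarray(surround, dtype=float),   # surrounding counts per group
        np.asarray(intersect, dtype=float),  # intersection counts per group
        np.asarray(support, dtype=float),    # supporting counts per group
        np.asarray(closest3, dtype=float),   # indices of the 3 closest groups
        np.asarray(wall_code, dtype=float),  # [1,0] wall / [0,1] corner / [0,0] middle
    ]
    return np.concatenate(parts)

vec = build_feature_vector(
    avg_dist=[1.2, 0.8, 2.5, 3.0],
    surround=[2, 1, 0, 0],
    intersect=[0, 0, 0, 0],
    support=[0, 1, 0, 0],
    closest3=[1, 0, 2],   # e.g. "tables" closest, then "seating", then "storage"
    wall_code=[1, 0],     # nearest one wall
)
```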

Scene Graph
There are six scene graph relationships generated per object placement. They represent:

     (1) Intersection: If another object in a scene intersects the target object, then an edge is connected from the other object node to the target object node.

     (2) Surrounding: If another object is within the proximity of the target object, then an edge is connected from the other object node to the target object node.

     (3) Support by: If the top plane of another object’s bounding box is within a threshold distance (0.05 meters) of the target object’s bottom plane, then an edge is connected from the other object node to the target object node.

     (4) Supporting: If the bottom plane of another object’s bounding box is within a threshold distance (0.05 meters) of the target object’s top plane, then an edge is connected from the other object node to the target object node.

    (5) Relative position: The nodes in this graph are associated with the walls, floor, and target object. If a wall is within the proximity of the target object, then an edge is drawn from the node associated with that wall to the target object node. If no wall meets this criterion, an edge is drawn from the floor node to the target object node.

    (6) Co-occurring: The nodes in this graph are associated with the other objects in the scene and the target object. An edge is drawn from each object that appears in the same scene as the target object to the target object node, so that the network can learn which co-occurrence relationships matter for the target’s furniture group.
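The support relationships above reduce to a simple plane-distance test on axis-aligned bounding boxes. A minimal sketch, using only the vertical criterion stated above (the box representation and function name are our own):

```python
SUPPORT_THRESHOLD = 0.05  # meters, as in the "support by" / "supporting" criteria

def is_supported_by(target_box, other_box, threshold=SUPPORT_THRESHOLD):
    """True when the other object's top plane lies within `threshold` of
    the target's bottom plane, i.e. the other object supports the target.
    Boxes are axis-aligned dicts with 'min' and 'max' (x, y, z) corners."""
    other_top = other_box["max"][2]
    target_bottom = target_box["min"][2]
    return abs(other_top - target_bottom) <= threshold

# A lamp resting on a table: the table's top (z = 0.75) is within 0.05 m
# of the lamp's bottom (z = 0.76), so a "support by" edge is created.
table = {"min": (0.0, 0.0, 0.0), "max": (1.0, 1.0, 0.75)}
lamp = {"min": (0.2, 0.2, 0.76), "max": (0.4, 0.4, 1.1)}
```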

Node: Each furniture node in a scene graph has an associated feature vector, which includes a one-hot encoding of the furniture group label as well as the node’s distance ordering relative to the target object. For example, if the ordering is 5, then the object is the 5th closest to the target object.

Edge: If two objects are connected in a scene graph, then they meet a certain spatial-relationship criterion. For example, in intersection scene graphs, an edge connects two objects if their bounding boxes intersect.

Note: A default node is included so the model can recognize when a scene graph lacks other nodes. Its feature vector consists of an all-zero one-hot encoded vector and an ordering number of -1.


Algorithm

(1) We sample points from a room R.

(2) For each sample point, we pretend that the given object is placed at the sample point.

(3) We generate a feature vector and a set of scene graphs based on the placement and size of the object.

(4) Given the feature vector and the set of scene graphs associated with this object placement, we pass these encodings into our network. Our network will output a probability describing the likelihood that our placement is plausible.

(5) We repeat steps 2-4 for all points sampled from the room.

(6) We place the object at the sample point with the highest output probability.
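The steps above amount to an exhaustive argmax over the sampled points. A minimal sketch, where `score_fn` stands in for feature/scene-graph generation plus the network forward pass (the toy scorer below is purely illustrative):

```python
import numpy as np

def best_placement(points, score_fn):
    """Score every sampled point with the model and return the point
    with the highest placement probability, plus that probability."""
    probs = np.array([score_fn(p) for p in points])
    return points[int(np.argmax(probs))], float(probs.max())

# Toy scorer that prefers points near (1, 1); the real system scores a
# point via its feature vector and scene graphs.
points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 2.0]])
score = lambda p: float(np.exp(-np.sum((p - np.array([1.0, 1.0])) ** 2)))
p_star, prob = best_placement(points, score)
```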
Network Architecture

- There is a graph neural network for each furniture group label. Therefore, plausibility depends on the object’s label.

- Each scene graph, except for the co-occurrence scene graph, is passed through an associated graph attention layer. The co-occurrence scene graph is passed through a dynamic edge layer, which learns which co-occurrence relationships are important to a furniture group L.

- Once each scene graph has passed through its associated layer, the messages that have reached the target node in each scene graph are concatenated with the summary feature vector. This combined vector is passed through an MLP to output the plausibility probability.
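The final readout stage can be sketched as follows. The attention and dynamic-edge layers are abstracted into precomputed per-graph messages, and all dimensions and weights here are illustrative placeholders, not the trained model's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def plausibility(graph_messages, summary_vec, W1, W2):
    """Concatenate the per-graph messages that reached the target node
    with the summary feature vector, then score the placement with a
    two-layer MLP ending in a sigmoid."""
    h = np.concatenate(graph_messages + [summary_vec])
    h = np.maximum(W1 @ h, 0.0)     # hidden layer with ReLU
    return float(sigmoid(W2 @ h))   # placement plausibility in (0, 1)

# Six scene graphs each contribute an 8-dim message; 21-dim summary vector.
messages = [rng.standard_normal(8) for _ in range(6)]
summary = rng.standard_normal(21)
W1 = rng.standard_normal((32, 6 * 8 + 21))
W2 = rng.standard_normal(32)
prob = plausibility(messages, summary, W1, W2)
```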
References

Kai Wang et al. “PlanIT: Planning and Instantiating Indoor Scenes with Relation Graph and Spatial Prior Networks”. In: ACM Transactions on Graphics (TOG) 38.4 (2019).

Manyi Li et al. “GRAINS: Generative recursive autoencoders for indoor scenes”. In: ACM Transactions on Graphics (TOG) 38.2 (2019), p. 12.

Yang Zhou, Zachary While, and Evangelos Kalogerakis. “SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation”. In: Proceedings of the IEEE International Conference on Computer Vision. 2019, pp. 7384–7392.

Mohammad Keshavarzi et al. “SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors”. In: arXiv preprint arXiv:2009.12395 (2020).


