Recognizing safety violations in construction environments is critical yet remains underexplored in computer vision. Existing models predominantly rely on 2D object detection, which fails to capture the complexities of real-world violations due to: (i) an oversimplified task formulation that treats violation recognition merely as object detection, (ii) inadequate validation under realistic conditions, (iii) the absence of standardized baselines, and (iv) limited scalability caused by the unavailability of synthetic dataset generators for diverse construction scenarios. To address these challenges, we introduce Safe-Construct, the first framework to reformulate violation recognition as a 3D multi-view engagement task, leveraging scene-level worker-object context and 3D spatial understanding. We also propose the Synthetic Indoor Construction Site Generator (SICSG) to create diverse, scalable training data, overcoming these data limitations. Safe-Construct achieves a 7.6% improvement over state-of-the-art methods across four violation types. We rigorously evaluate our approach in near-realistic settings comprising four violations, four workers, and 14 objects, under challenging conditions such as occlusions (worker-object, worker-worker) and variable illumination (back-lighting, overexposure, sunlight). By integrating 3D multi-view spatial understanding with synthetic data generation, Safe-Construct sets a new benchmark for scalable and robust safety monitoring in high-risk industries.
We are the first to formulate violation recognition as a 3D multi-view engagement task. By leveraging geometry-based modeling and multi-view inputs, our approach achieves occlusion-robust, scene-level understanding that surpasses existing state-of-the-art methods.
Safe-Construct is the first framework to decouple violation criteria from training data, enabling scalable generalization to new violation types without the need to collect additional real-world datasets.
We introduce the Synthetic Indoor Construction Site Generator (SICSG), a novel custom engine that generates physically realistic scene variations, such as changes in illumination, occlusion, and perspective, imparting spatial awareness and physical common sense to the model.
We conduct the first evaluation in a 3D multi-camera indoor construction setup, comprising four safety violations, four workers, and 14 objects across diverse conditions: occlusions, lighting variations, and camera distances that produce significant scale changes in worker bodies, all of which substantially increase scene complexity. Safe-Construct consistently outperforms prior methods. Moreover, it is the first model tailored specifically for indoor construction settings.
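The worker-object engagement idea behind the formulation above can be illustrated with a minimal sketch. This is not the paper's actual implementation: the keypoint layout, distance threshold, and `engaged` helper are hypothetical, standing in for whatever geometric criterion the full method uses over triangulated 3D poses.

```python
import numpy as np

def engaged(worker_joints, object_center, thresh=0.5):
    """Hypothetical engagement test: True if any triangulated 3D worker
    joint lies within `thresh` meters of the object's 3D position.

    worker_joints: (J, 3) array of 3D joint positions (meters).
    object_center: (3,) array, 3D object position (meters).
    """
    dists = np.linalg.norm(worker_joints - object_center, axis=1)
    return bool(dists.min() < thresh)

# Example: a worker whose hand is ~0.14 m from a ladder counts as engaged.
joints = np.array([[0.0, 0.0, 1.7],   # head
                   [0.3, 0.1, 1.0],   # hand
                   [0.0, 0.0, 0.0]])  # foot
ladder = np.array([0.4, 0.1, 1.1])
print(engaged(joints, ladder))  # True
```

Because the test operates on 3D positions rather than 2D boxes, a rule like "a second worker must hold the step ladder" reduces to counting how many workers are engaged with the ladder object, independent of the camera view.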
We show the 2D re-projection of worker and object poses onto the image plane. Rows (a) and (b) show two safe scenarios, while (c) and (d) illustrate two violations: (a) a worker wearing a hard hat; (b) a second worker holding the Step Ladder while the first worker climbs it; (c) only one worker carrying a Large Window that should be carried by two workers, a violation scenario (the small window is shown in magenta); (d) two workers standing on the Platform simultaneously.
(a) A case where the hard hat is not detected. We mark this as a violation based on the previous frame; i.e., unless the worker wears the hard hat again, all subsequent frames are tagged as violations. (b) Increasing the number of views improves model prediction.
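The carry-forward rule described in (a) can be sketched as follows. The three-state `detections` encoding (hat seen / hat absent / no detection available) is an assumption made for illustration; the source only states that once the hat is not detected, frames stay tagged as violations until it is detected again.

```python
def tag_violations(detections):
    """Carry-forward tagging: once the hard hat is lost, every frame is a
    violation until the hat is detected again.

    detections: per-frame values, True (hat seen), False (hat absent),
    or None (no detection available; inherit the previous state --
    the None case is a hypothetical extension for missed detections).
    Returns a list of bools where True means "violation".
    """
    tags, wearing = [], True  # assume compliance before the first frame
    for d in detections:
        if d is not None:
            wearing = d  # update state only when a detection is available
        tags.append(not wearing)
    return tags

print(tag_violations([True, None, False, None, None, True]))
# → [False, False, True, True, True, False]
```

The state persists across frames with no detection, so a momentary detector dropout does not flip the tag back and forth.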
@misc{chharia2025safeconstructredefiningconstructionsafety,
title={Safe-Construct: Redefining Construction Safety Violation Recognition as 3D Multi-View Engagement Task},
author={Aviral Chharia and Tianyu Ren and Tomotake Furuhata and Kenji Shimada},
year={2025},
eprint={2504.10880},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.10880},
}