<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="main" level="a">A Comparative Study of Deep Learning Models for Symbol Detection in Technical Drawings</title>
        <author>
          <persName n="1" ref="https://orcid.org/0000-0003-1354-7817" type="ORCID">
            <forename>Benedikt</forename>
            <surname>Faltin</surname>
            <placeName type="affiliation">Ruhr-University Bochum, Germany</placeName>
          </persName>
          <persName n="2">
            <forename>Damaris</forename>
            <surname>Gann</surname>
            <placeName type="affiliation">Ruhr-University Bochum, Germany</placeName>
          </persName>
          <persName n="3" ref="https://orcid.org/0000-0002-2729-7743" type="ORCID">
            <forename>Markus</forename>
            <surname>König</surname>
            <placeName type="affiliation">Ruhr-University Bochum, Germany</placeName>
          </persName>
        </author>
        <respStmt>
          <resp>This is a section of <title>CONVR 2023 - Proceedings of the 23rd International Conference on Construction Applications of Virtual Reality</title> (DOI: <idno type="DOI">10.36253/979-12-215-0289-3</idno>) by </resp>
          <name>Pietro Capone, Vito Getuli, Farzad Pour Rahimian, Nashwan Dawood, Alessandro Bruttini, Tommaso Sorbi</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <publisher>Firenze University Press</publisher>
        <pubPlace>Florence</pubPlace>
        <date when="2023">2023</date>
        <idno type="DOI">https://doi.org/10.36253/10.36253/979-12-215-0289-3.87</idno>
        <availability>
          <p>Available for academic research purposes</p>
          <p>Open Access</p>
          <p>Copyright Author(s)</p>
          <licence source="text" target="https://creativecommons.org/licenses/by-nc/4.0/legalcode">
            <p>Content licence CC BY-NC 4.0</p>
          </licence>
          <licence source="metadata" target="https://creativecommons.org/publicdomain/zero/1.0/legalcode">
            <p>Metadata licence CC0 1.0</p>
          </licence>
        </availability>
      </publicationStmt>
      <sourceDesc>
        <p>This is original content, published for academic research purposes</p>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <appInfo>
        <application version="2.2" ident="Booksflow">
          <desc>Digital edition XML powered by Booksflow</desc>
        </application>
      </appInfo>
    </encodingDesc>
    <profileDesc>
      <abstract xml:lang="en">
        <p>Symbols are a universal way to convey complex information in technical drawings since they can represent a wide range of elements, including components, materials, or relationships, in a concise and space-saving manner. Therefore, accurate detection of symbols is a crucial step toward the digital and automatic interpretation of pixel-based drawings. To enhance the efficiency of the digitization process, current research focuses on automating this symbol detection using deep learning models. However, the ever-increasing repertoire of model architectures makes it difficult for researchers and practitioners alike to keep an overview of the latest advancements and to select the most suitable model architecture for their respective use cases. To provide guidance, this contribution conducts a comparative study of prevalent and state-of-the-art model architectures for the task of symbol detection in pixel-based construction drawings. To this end, the study evaluates six object detection model architectures: YOLOv5, YOLOv7, YOLOv8, Swin-Transformer, ConvNeXt, and Faster-RCNN. These models are trained and tested on two distinct datasets from the bridge and residential building domains, both representing substantial sub-sectors of the construction industry. The models are evaluated based on five criteria: detection accuracy, robustness to data scarcity, training time, inference time, and model size. In summary, our comparative study highlights the performance and capabilities of different deep learning models for symbol detection in construction drawings. Through the comprehensive evaluation and practical insights, this research facilitates the advancement of automated symbol detection by showing the strengths and weaknesses of the model architectures, thus providing users with valuable guidance in choosing the most appropriate model for their real-world applications.</p>
      </abstract>
      <textClass>
        <keywords>
          <list>
            <item>Computer Vision</item>
            <item>Technical Drawings</item>
            <item>Symbol Detection</item>
            <item>Comparative Study</item>
          </list>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p>It is available online at https://doi.org/10.36253/10.36253/979-12-215-0289-3.87<ref target="https://doi.org/10.36253/10.36253/979-12-215-0289-3.87" /></p>
      <div>
        <listBibl>
          <head>References</head>
          <bibl n="137457">
            <bibl>Adam, S., Ogier, J. M., Cariou, C., Mullot, R., Labiche, J., &amp; Gardes, J. (2000). Symbol and character recognition: application to engineering drawings. International Journal on Document Analysis and Recognition, 3(2), 89–101.</bibl>
            <idno type="DOI">10.1007/s100320000033</idno>
          </bibl>
          <bibl n="138769">
            <bibl>Ah-Soon, C. (1998). A constraint network for symbol detection in architectural drawings. In K. Tombre &amp; A.K. Chhabra (Eds.), Lecture Notes in Computer Science. Springer.</bibl>
            <idno type="DOI">10.1007/3-540-64381-8_41</idno>
          </bibl>
          <bibl n="136681">
            <bibl>Brößner, P., Hohlmann, B., &amp; Radermacher, K. (2022). Transformer vs. CNN: A Comparison on Knee Segmentation in Ultrasound Images. In F. Rodriguez Y Baena, J. W. Giles &amp; E. Stindel (Eds.), Proceedings of the 20th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, Vol. 5, 31–36.</bibl>
            <idno type="DOI">10.29007/cqcv</idno>
          </bibl>
          <bibl n="137832">
            <bibl>Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., &amp; Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 248–255.</bibl>
            <idno type="DOI">10.1109/CVPR.2009.5206848</idno>
          </bibl>
          <bibl n="136965">
            <bibl>Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., &amp; Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv.</bibl>
            <idno type="DOI">10.48550/arXiv.2010.11929</idno>
          </bibl>
          <bibl n="138903">
            <bibl>Elyan, E., Jamieson, L., &amp; Ali-Gombe, A. (2020). Deep learning for symbols detection and classification in engineering drawings. Neural Networks, 129, 91–102.</bibl>
            <idno type="DOI">10.1016/j.neunet.2020.05.025</idno>
          </bibl>
          <bibl n="136640">
            <bibl>Elyan, E., Moreno-García, C. F., &amp; Johnston, P. (2020). Symbols in Engineering Drawings (SiED): An Imbalanced Dataset Benchmarked by Convolutional Neural Networks. In L. Iliadis, P. P. Angelov, C. Jayne, &amp; E. Pimenidis (Eds.), Proceedings of the 21st EANN (Engineering Applications of Neural Networks) 2020 Conference, 215–224. Springer.</bibl>
            <idno type="DOI">10.1007/978-3-030-48791-1_16</idno>
          </bibl>
          <bibl n="136717">
            <bibl>Faltin, B., Schönfelder, P., &amp; König, M. (2023). Inferring Interconnections of Construction Drawings for Bridges Using Deep Learning-based Methods. In E. Hjelseth, S. F. Sujan &amp; R. J. Scherer (Eds.), ECPPM 2022-eWork and eBusiness in Architecture, Engineering and Construction 2022, 343–350. CRC Press.</bibl>
            <idno type="DOI">10.1201/9781003354222</idno>
          </bibl>
          <bibl n="136677">Faltin, B., Schönfelder, P., &amp; König, M. (2023). Improving Symbol Detection on Engineering Drawings Using a Keypoint-Based Deep Learning Approach. The 30th EG-ICE: International Conference on Intelligent Computing in Engineering. https://www.ucl.ac.uk/bartlett/construction/sites/bartlett_construction/files/1889.pdf</bibl>
          <bibl n="138982">
            <bibl>Gudigar, A., Chokkadi, S., &amp; U, R. (2016). A review on automatic detection and recognition of traffic sign. Multimedia Tools and Applications, 75(1), 333–364.</bibl>
            <idno type="DOI">10.1007/s11042-014-2293-7</idno>
          </bibl>
          <bibl n="138543">
            <bibl>He, K., Zhang, X., Ren, S., &amp; Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.</bibl>
            <idno type="DOI">10.1109/CVPR.2016.90</idno>
          </bibl>
          <bibl n="137333">
            <bibl>Huang, W., Sun, Q., Yu, A., Guo, W., Xu, Q., Wen, B., &amp; Xu, L. (2023). Leveraging Deep Convolutional Neural Network for Point Symbol Recognition in Scanned Topographic Maps. ISPRS International Journal of Geo-Information, 12(3), 128.</bibl>
            <idno type="DOI">10.3390/ijgi12030128</idno>
          </bibl>
          <bibl n="139015">
            <bibl>Jaiswal, A., Babu, A. R., Zadeh, M. Z., Banerjee, D., &amp; Makedon, F. (2021). A Survey on Contrastive Self-Supervised Learning. Technologies, 9(1), Article 2.</bibl>
            <idno type="DOI">10.3390/technologies9010002</idno>
          </bibl>
          <bibl n="136793">
            <bibl>Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., Kwon, Y., Michael, K., TaoXie, Fang, J., imyhxy, Lorna, Zan Yifu, Wong, C., V, A., Montes, D., Wang, Z., Fati, C., Nadar, J., Laughing, … Jain, M. (2022). ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation. Zenodo.</bibl>
            <idno type="DOI">10.5281/zenodo.3908559</idno>
          </bibl>
          <bibl n="139433">Jocher, G., Chaurasia, A., &amp; Qiu, J. (2023). YOLO by Ultralytics (Version 8.0.0). https://github.com/ultralytics/ultralytics</bibl>
          <bibl n="136744">
            <bibl>Kalervo, A., Ylioinas, J., Häikiö, M., Karhu, A., &amp; Kannala, J. (2019). CubiCasa5K: A Dataset and an Improved Multi-task Model for Floorplan Image Analysis. In M. Felsberg, P.-E. Forssén, I.-M. Sintorn &amp; J. Unger (Eds.), Image Analysis: 21st Scandinavian Conference, Vol. 11482, 28–40. Springer.</bibl>
            <idno type="DOI">10.1007/978-3-030-20205-7_3</idno>
          </bibl>
          <bibl n="136757">
            <bibl>Lin, T. Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., &amp; Dollár, P. (2014). Microsoft COCO: Common Objects in Context. In D. Fleet, T. Pajdla, B. Schiele &amp; T. Tuytelaars (Eds.), Computer Vision – ECCV 2014, Vol. 13, 740–755. Springer.</bibl>
            <idno type="DOI">10.1007/978-3-319-10602-1_48</idno>
          </bibl>
          <bibl n="137849">
            <bibl>Lim, J.-S., Astrid, M., Yoon, H.-J., &amp; Lee, S.-I. (2021). Small Object Detection using Context and Attention. 2021 International Conference on Artificial Intelligence in Information and Communication, 181–186.</bibl>
            <idno type="DOI">10.1109/ICAIIC51459.2021.9415217</idno>
          </bibl>
          <bibl n="137238">
            <bibl>Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., &amp; Guo, B. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 9992–10002.</bibl>
            <idno type="DOI">10.1109/ICCV48922.2021.00986</idno>
          </bibl>
          <bibl n="138202">
            <bibl>Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., &amp; Xie, S. (2022). A ConvNet for the 2020s. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 11976–11986.</bibl>
            <idno type="DOI">10.1109/CVPR52688.2022.01167</idno>
          </bibl>
          <bibl n="139665">
            <bibl>Loshchilov, I., &amp; Hutter, F. (2017). Decoupled weight decay regularization. arXiv.</bibl>
            <idno type="DOI">10.48550/arXiv.1711.05101</idno>
          </bibl>
          <bibl n="136896">
            <bibl>Mani, S., Haddad, M. A., Constantini, D., Douhard, W., Li, Q., &amp; Poirier, L. (2020). Automatic Digitization of Engineering Diagrams Using Deep Learning and Graph Search. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 176–177.</bibl>
            <idno type="DOI">10.1109/CVPRW50498.2020.00096</idno>
          </bibl>
          <bibl n="137297">
            <bibl>Moutik, O., Sekkat, H., Tigani, S., Chehri, A., Saadane, R., Tchakoucht, T. A., &amp; Paul, A. (2023). Convolutional Neural Networks or Vision Transformers: Who Will Win the Race for Action Recognitions in Visual Data? Sensors, 23(2), 734.</bibl>
            <idno type="DOI">10.3390/s23020734</idno>
          </bibl>
          <bibl n="138050">
            <bibl>Padilla, R., Passos, W. L., Dias, T. L. B., Netto, S. L., &amp; da Silva, E. A. B. (2021). A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics, 10(3), 279.</bibl>
            <idno type="DOI">10.3390/electronics10030279</idno>
          </bibl>
          <bibl n="137393">
            <bibl>Ren, S., He, K., Girshick, R., &amp; Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149.</bibl>
            <idno type="DOI">10.1109/TPAMI.2016.2577031</idno>
          </bibl>
          <bibl n="137410">
            <bibl>Rombach, R., Blattmann, A., Lorenz, D., Esser, P., &amp; Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684–10695.</bibl>
            <idno type="DOI">10.1109/CVPR52688.2022.01042</idno>
          </bibl>
          <bibl n="138626">
            <bibl>Schmidt, S., Rao, Q., Tatsch, J., &amp; Knoll, A. (2020). Advanced Active Learning Strategies for Object Detection. Proceedings of the IEEE Intelligent Vehicles Symposium, 871–876.</bibl>
            <idno type="DOI">10.1109/IV47402.2020.9304565</idno>
          </bibl>
          <bibl n="138447">
            <bibl>Wang, D., Zhang, J., Du, B., Xia, G. S., &amp; Tao, D. (2023). An Empirical Study of Remote Sensing Pretraining. IEEE Transactions on Geoscience and Remote Sensing, 61.</bibl>
            <idno type="DOI">10.1109/TGRS.2022.3176603</idno>
          </bibl>
          <bibl n="139153">
            <bibl>Wang, C.Y., Bochkovskiy, A., &amp; Liao, H.Y. (2022). YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv.</bibl>
            <idno type="DOI">10.48550/arXiv.2207.02696</idno>
          </bibl>
          <bibl n="138164">
            <bibl>Zaidi, S. S. A., Ansari, M. S., Aslam, A., Kanwal, N., Asghar, M., &amp; Lee, B. (2022). A survey of modern deep learning based object detection models. Digital Signal Processing, 126, Article 103514.</bibl>
            <idno type="DOI">10.1016/j.dsp.2022.103514</idno>
          </bibl>
          <bibl n="138248">
            <bibl>Ziran, Z., &amp; Marinai, S. (2018). Object Detection in Floor Plan Images. In L. Pancioni, F. Schwenker, &amp; E. Trentin (Eds.), Artificial Neural Networks in Pattern Recognition, 383–394. Springer.</bibl>
            <idno type="DOI">10.1007/978-3-319-99978-4_30</idno>
          </bibl>
        </listBibl>
      </div>
    </body>
  </text>
</TEI>