OCR for Smart Data Extraction from PDF and Images with NER

Posted By: lucky_aut

Duration: 1h 45m | .MP4 1280x720, 30 fps(r) | AAC, 44100 Hz, 2ch | 540 MB
Genre: eLearning | Language: English

Learn data extraction, labelling and training with spaCy, and build a solution using Python, Pandas, OCR and NER concepts

What you'll learn:
Understand data extraction from different types of documents, such as PDFs, Word documents and scanned images
Learn how to use Tesseract and PyTesseract to recognize text in images
Learn how to use spaCy to label data and train an NER model on custom data
Use Pandas to convert extracted data to CSV format
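The last step above, turning extracted entities into CSV, can be sketched as follows. This is a minimal illustration, not the course's code: it assumes the NER stage yields `(label, text)` pairs per document, and the names `entities_to_rows` and `rows_to_csv` are hypothetical. It uses the standard-library `csv` module so it runs without extra dependencies; with Pandas, the equivalent final step is `pd.DataFrame(rows).to_csv("entities.csv", index=False)`.

```python
import csv
import io
from collections import defaultdict


def entities_to_rows(docs):
    """Group (label, text) entity pairs into one row per document.

    `docs` maps a document name to a list of (label, text) pairs, e.g. the
    output of an NER model (an assumption of this sketch). Multiple values
    for the same label are joined with "; ".
    """
    rows = []
    for name, entities in docs.items():
        grouped = defaultdict(list)
        for label, text in entities:
            grouped[label].append(text)
        row = {"document": name}
        row.update({label: "; ".join(values) for label, values in grouped.items()})
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; columns are the union of all row keys."""
    fieldnames = ["document"]
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval="" leaves a blank cell when a document lacks a label.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample input standing in for real NER output.
    sample = {
        "invoice_01.pdf": [("ORG", "Acme Corp"), ("MONEY", "$420")],
        "card_02.png": [("PERSON", "Jane Doe"), ("ORG", "Globex")],
    }
    print(rows_to_csv(entities_to_rows(sample)))
```

One row per document with one column per entity label keeps the CSV flat, which is usually what downstream spreadsheet or database imports expect.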

Requirements:
Basic Python programming knowledge

Description:
Gain a competitive edge in computer vision by learning how to perform smart data extraction from PDFs and images.
The technology landscape has brought cognitive skills to the forefront, with a major emphasis on intelligent data extraction. This task is made harder by the wide variety of input documents, such as PDFs with structured data, scanned PDFs and Word documents. This course tackles that problem by helping you understand these formats and then showing you how to perform smart data extraction using Python, Pandas, OCR, Tesseract, PyTesseract, OpenCV, spaCy and NER concepts.
The course guides you through building a common pipeline that handles multiple data formats, following a structured workflow: data extraction using OCR, data labelling with spaCy, training a model on custom NER data, and validating the model through prediction. Towards the end, we combine everything learned to build a Smart Text Extractor application.
The course explains the text data extraction workflow in depth, first covering the technology concepts and then implementing them in code. Every implementation comes with a detailed code walkthrough, and 12 supporting source code files are available for download. In addition, a quiz at the end of the course helps you assess your knowledge and identify areas for improvement.
Enroll in this course and enhance your cognitive capabilities. Here are just a few of the topics we will cover:
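One way to handle this variety of inputs is to route each file to a format-specific extractor and normalize the result into a single text format before the NER stage. The sketch below is an illustration under assumptions, not the course's pipeline: `TextPipeline` and `with_ocr_fallback` are hypothetical names, and real extractors would wrap a PDF text-layer reader, an OCR call such as `pytesseract.image_to_string`, or a Word document reader. Because structured and scanned PDFs share the `.pdf` extension, the fallback helper treats a PDF with an empty text layer as scanned and sends it to OCR instead.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical extractor type: takes a file path, returns plain text.
Extractor = Callable[[Path], str]


class TextPipeline:
    """Route each document to a format-specific extractor, then normalize the text."""

    def __init__(self) -> None:
        self._extractors: Dict[str, Extractor] = {}

    def register(self, suffix: str, extractor: Extractor) -> None:
        # Store extensions in lower case so ".PDF" and ".pdf" match.
        self._extractors[suffix.lower()] = extractor

    def run(self, path: Path) -> str:
        suffix = path.suffix.lower()
        if suffix not in self._extractors:
            raise ValueError(f"unsupported format: {suffix or '<none>'}")
        raw = self._extractors[suffix](path)
        # Common output format: collapse runs of whitespace so every
        # source type reaches the NER stage as the same kind of string.
        return " ".join(raw.split())


def with_ocr_fallback(text_layer: Extractor, ocr: Extractor) -> Extractor:
    """Use a PDF's embedded text if present; otherwise treat it as scanned."""

    def extract(path: Path) -> str:
        text = text_layer(path)
        return text if text.strip() else ocr(path)

    return extract
```

Keeping extractors pluggable means adding a new format only requires registering one more function; the downstream labelling and NER steps never see the difference.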
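The labelling step usually records entities as character-offset spans, for example in the `(text, {"entities": [(start, end, label)]})` shape common in spaCy training examples, and token-level schemes such as IOB are derived from those spans. The function below is a minimal sketch of that conversion, not spaCy's implementation: `offsets_to_iob` is a hypothetical name, and it tokenizes on whitespace only, whereas spaCy's tokenizer also splits punctuation (spaCy 3 provides `spacy.training.offsets_to_biluo_tags` for the related BILUO scheme).

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label); end is exclusive


def offsets_to_iob(text: str, entities: List[Span]) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens with IOB labels from character spans.

    A token gets B-LABEL if it starts a span, I-LABEL if it lies inside one,
    and O otherwise. Spans are assumed to align with token boundaries; a
    token that only partially overlaps a span is tagged O.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        start, end = match.start(), match.end()
        tag = "O"
        for ent_start, ent_end, label in entities:
            if start >= ent_start and end <= ent_end:
                tag = ("B-" if start == ent_start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    # Hypothetical labelled example in (text, {"entities": [...]}) form.
    text = "Jane Doe works at Acme Corp in Berlin"
    annotation = {"entities": [(0, 8, "PERSON"), (18, 27, "ORG"), (31, 37, "GPE")]}
    for token, tag in offsets_to_iob(text, annotation["entities"]):
        print(f"{token}\t{tag}")
```

For the sample sentence this tags "Jane" as B-PERSON, "Doe" as I-PERSON, "Acme Corp" as B-ORG I-ORG, "Berlin" as B-GPE, and the remaining tokens as O.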

· Understanding the Basics of Data Conversion
· Conversion and Extraction from Structured PDF Documents
· Conversion of Scanned PDF Documents to Text
· Conversion and Extraction of Data from Word Documents to Text
· Common Pipeline Format for All Document Types
· Image Reading using PIL and OpenCV
· Tesseract for Extraction
· Tesseract Page Segmentation Mode (PSM) and OCR Engine Mode (OEM)
· Extraction of Data from Images
· PyTesseract Operations for Converting Documents to Readable Text
· Named Entity Recognition (NER)
· spaCy Entity Types
· IOB Format
· Labelling with spaCy for NER
· Training a spaCy Model on Custom Data for NER
· Predicting with the Trained spaCy Model
· Pandas
· Converting Data to CSV Output using a DataFrame
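Tesseract's page segmentation mode (PSM) and OCR engine mode (OEM) are passed to PyTesseract as a command-line style config string. The helper below is a small sketch, not part of PyTesseract: `tesseract_config` is a hypothetical name, and the ranges it checks (PSM 0 to 13, OEM 0 to 3) are those of Tesseract 4 and later. Only a few PSM values are described; the full list is printed by `tesseract --help-psm`.

```python
# Selected Tesseract 4+ page segmentation modes and all OCR engine modes.
PSM_DESCRIPTIONS = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    11: "Sparse text: find as much text as possible in no particular order",
}
OEM_DESCRIPTIONS = {
    0: "Legacy engine only",
    1: "Neural nets LSTM engine only",
    2: "Legacy + LSTM engines",
    3: "Default, based on what is available",
}


def tesseract_config(psm: int = 3, oem: int = 3) -> str:
    """Build a Tesseract config string, rejecting out-of-range modes.

    Note: OEM 0 and 2 need traineddata files that include the legacy model.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be in 0..13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be in 0..3, got {oem}")
    return f"--oem {oem} --psm {psm}"


# Usage with PyTesseract (requires Tesseract installed; "scan.png" is hypothetical):
#   import pytesseract
#   from PIL import Image
#   text = pytesseract.image_to_string(Image.open("scan.png"), config=tesseract_config(psm=6))
```

As a rule of thumb, PSM 6 suits a cropped block of body text, PSM 7 a single field such as a date or amount, and PSM 11 scattered text like a business card.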

Who this course is for:
Python developers who want to learn data extraction using OCR
NLP and NER enthusiasts who are keen to explore text labelling
Computer vision professionals
OCR engineers
