Developer Guide - Document Scanning System

Detailed technical documentation for developing and extending the system

Overview

The document scanning system is a complete system for managing, scanning, and processing financial documents. It is built as a standalone module within the TechLabs platform and includes:

  • 📤 Document upload - supports PDF, JPG, and PNG up to 10MB
  • 🔍 Multi-provider OCR - Tesseract, Google Cloud Vision, AWS Textract
  • 📧 Email integration - Gmail OAuth 2.0 + IMAP
  • 📊 Financial analysis - automatic extraction of invoice details
  • 🛡️ Fraud detection - detection of duplicates and anomalies
  • Approval Workflow - multi-stage approval system
  • 🏢 Multi-tenant - support for multiple companies
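
The upload limits in the list above can be checked before a file ever reaches OCR. The following is a minimal sketch, not the system's actual validator: `validate_upload`, `ALLOWED_EXTENSIONS`, and `MAX_UPLOAD_BYTES` are hypothetical names, accepting `.jpeg` alongside `.jpg` is an assumption, and 10MB is assumed to mean 10 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits taken from the feature list; names and the .jpeg alias are assumptions.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of validation errors (empty when the upload is acceptable)."""
    errors = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        errors.append("file exceeds the 10MB limit")
    return errors
```

Returning a list of messages rather than raising lets an endpoint report every problem in a single response.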

Code statistics:

Backend:
~59,000 lines of Python
Frontend:
~106,000 lines (HTML/JS/CSS)
Database:
16 tables, 14 models

Technology Stack

Backend
Python 3.9+, Flask 2.0+, PostgreSQL 15, Redis, Celery

Main modules:
SQLAlchemy (ORM), Pillow (Image Processing), pytesseract (OCR), google-cloud-vision, boto3 (AWS), cryptography (Encryption)

Frontend
JavaScript ES6+, Bootstrap 5

Libraries:
Bootstrap Icons, Fetch API (Async/Await), Custom Manager Classes

Infrastructure
Docker, Traefik, Gunicorn

Deployment:
Docker Compose, Traefik Reverse Proxy, SSL/TLS Certificates

Architecture

High-level flow diagram:

┌─────────────┐
│   Browser   │
└──────┬──────┘
       │ HTTPS (Traefik)
       ▼
┌─────────────┐      ┌──────────────┐
│ Flask App   │◄────►│ PostgreSQL   │
│ (Gunicorn)  │      │   Database   │
└──────┬──────┘      └──────────────┘
       │
       ├──────────►┌──────────────┐
       │           │    Redis     │
       │           │   (Cache)    │
       │           └──────────────┘
       │
       ├──────────►┌──────────────┐
       │           │    Celery    │
       │           │   Workers    │
       │           └──────┬───────┘
       │                  │
       │                  ▼
       │           ┌──────────────┐
       │           │ OCR Services │
       │           │ (Tesseract,  │
       │           │  GCP, AWS)   │
       │           └──────────────┘
       │
       └──────────►┌──────────────┐
                   │ Email APIs   │
                   │ (Gmail/IMAP) │
                   └──────────────┘

System layers:

1. Presentation Layer (UI)
  • 27 HTML templates (Jinja2)
  • JavaScript Managers (ES6 Classes)
  • Bootstrap 5 RTL
  • Real-time Updates
2. API Layer
  • RESTful APIs (40+ endpoints)
  • JSON Request/Response
  • Authentication & Authorization
  • Rate Limiting
3. Business Logic Layer
  • Document Processing
  • OCR Orchestration
  • Fraud Detection Algorithms
  • Approval Workflow Engine
4. Data Layer
  • SQLAlchemy ORM
  • PostgreSQL Database
  • Redis Cache
  • File Storage System
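
As an illustration of the duplicate checks in the fraud detection part of the Business Logic Layer, documents can be grouped by the fields that identify an invoice. This is a rough sketch under stated assumptions, not the system's actual algorithm: documents are represented as plain dicts using the `ScannedDocument` field names, and `duplicate_key` and `find_duplicates` are hypothetical helpers.

```python
from collections import defaultdict


def duplicate_key(doc: dict) -> tuple:
    """Normalize the fields that identify an invoice into a comparable key."""
    return (
        (doc.get("supplier_tax_id") or "").strip(),
        (doc.get("invoice_number") or "").strip().upper(),
        str(doc.get("total_amount")),
    )


def find_duplicates(docs: list[dict]) -> list[list[int]]:
    """Group document ids that share supplier, invoice number and amount."""
    groups = defaultdict(list)
    for doc in docs:
        # Skip documents where OCR did not extract an invoice number
        if not doc.get("invoice_number"):
            continue
        groups[duplicate_key(doc)].append(doc["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```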

File Structure

app/
├── routes/
│   └── routes_email_scanning.py         # 320 lines - HTML routes
├── api/
│   ├── api_email_scanning.py            # 825 lines - Core API
│   └── api_email_scanning_enhanced.py   # 619 lines - Enhanced API
├── models/
│   └── models_email_scanning.py         # 765 lines - Database models
├── services/
│   ├── email_scanning_ocr_service.py   # ~500 lines - OCR service
│   ├── email_scanning_tasks.py         # 11,486 lines - Celery tasks
│   ├── email_integration_service.py    # ~15,000 lines - Email service
│   └── gmail_service.py                # ~10,000 lines - Gmail OAuth
├── static/
│   ├── js/modules/
│   │   ├── email-scanning-manager.js   # 912 lines - Main manager
│   │   └── email-integration-accounts.js # 518 lines - Email UI
│   ├── css/modules/
│   │   └── email-scanning.css          # ~5,000 lines - Styles
│   └── docs/email-scanning/
│       ├── index.html
│       ├── user-guide.html
│       ├── api-reference.html
│       └── technical.html
└── templates/email-scanning/
    ├── dashboard.html
    ├── documents.html
    ├── upload.html
    └── ... (24 more templates)

Database

Main models:

ScannedDocument

Table: scanned_documents

Description: a scanned document; the central table of the system

Python
class ScannedDocument(db.Model):
    __tablename__ = 'scanned_documents'

    # Primary Key
    id = db.Column(db.Integer, primary_key=True)

    # File Info
    filename = db.Column(db.String(255), nullable=False)
    original_filename = db.Column(db.String(255))
    file_path = db.Column(db.String(512))
    file_size = db.Column(db.Integer)
    upload_date = db.Column(db.DateTime, default=datetime.utcnow)

    # Ownership
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    company_id = db.Column(db.Integer, db.ForeignKey('scanning_companies.id'))

    # OCR Results
    ocr_status = db.Column(db.String(50))  # pending/processing/completed/failed
    ocr_confidence = db.Column(db.Float)
    extracted_text = db.Column(db.Text)

    # Financial Data (extracted)
    supplier_name = db.Column(db.String(255))
    supplier_tax_id = db.Column(db.String(50))
    invoice_number = db.Column(db.String(100))
    total_amount = db.Column(db.Numeric(12, 2))
    currency = db.Column(db.String(3), default='ILS')
    invoice_date = db.Column(db.Date)
    due_date = db.Column(db.Date)

    # Status & Workflow
    status = db.Column(db.String(50), default='pending')
    approval_status = db.Column(db.String(50))
    category_id = db.Column(db.Integer, db.ForeignKey('document_categories.id'))

    # Relationships
    category = db.relationship('DocumentCategory', backref='documents')
    line_items = db.relationship('DocumentLineItem', backref='document')
    versions = db.relationship('DocumentVersion', backref='document')

DocumentCategory

Table: document_categories

Python
class DocumentCategory(db.Model):
    __tablename__ = 'document_categories'

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100), nullable=False)
    description = db.Column(db.Text)
    color = db.Column(db.String(7))  # Hex color
    icon = db.Column(db.String(50))

    # Auto-categorization
    auto_categorize = db.Column(db.Boolean, default=False)
    keywords = db.Column(db.JSON)  # List of keywords

    # Multi-tenant
    company_id = db.Column(db.Integer, db.ForeignKey('scanning_companies.id'))
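
The `auto_categorize` flag and `keywords` list in the model above suggest keyword-based matching against the OCR text. A minimal sketch of that idea follows, assuming categories are passed as plain dicts and that a simple case-insensitive substring count is good enough; `pick_category` is a hypothetical helper, not a function from the codebase.

```python
from typing import Optional


def pick_category(text: str, categories: list[dict]) -> Optional[int]:
    """Return the id of the category whose keywords match the text most often."""
    haystack = text.lower()
    best_id, best_hits = None, 0
    for category in categories:
        # Only categories that opted in take part in auto-categorization
        if not category.get("auto_categorize"):
            continue
        hits = sum(
            1 for keyword in category.get("keywords") or []
            if keyword.lower() in haystack
        )
        # Strictly greater: on a tie the earlier category wins
        if hits > best_hits:
            best_id, best_hits = category["id"], hits
    return best_id
```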

EmailAccount

Table: email_accounts

Python
class EmailAccount(db.Model):
    __tablename__ = 'email_accounts'

    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(255), nullable=False)
    provider = db.Column(db.String(50))  # gmail/imap/outlook

    # IMAP Configuration
    imap_server = db.Column(db.String(255))
    imap_port = db.Column(db.Integer)
    imap_username = db.Column(db.String(255))
    imap_password_encrypted = db.Column(db.Text)  # Fernet-encrypted

    # Gmail OAuth
    gmail_token_encrypted = db.Column(db.Text)
    gmail_refresh_token_encrypted = db.Column(db.Text)

    # Auto-fetch
    auto_fetch = db.Column(db.Boolean, default=False)
    fetch_interval = db.Column(db.Integer, default=300)  # seconds
    last_fetch = db.Column(db.DateTime)

    # Relationships
    messages = db.relationship('EmailMessage', backref='account')
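
The `auto_fetch`, `fetch_interval`, and `last_fetch` columns together decide when an account should be polled again. One way to express that rule is sketched below; `is_fetch_due` is a hypothetical helper, and treating an account that was never fetched as immediately due is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fetch_due(auto_fetch: bool, last_fetch: Optional[datetime],
                 fetch_interval: int, now: datetime) -> bool:
    """Decide whether an account should be fetched at time `now`."""
    if not auto_fetch:
        return False
    # Assumption: an account that was never fetched is due right away
    if last_fetch is None:
        return True
    return now - last_fetch >= timedelta(seconds=fetch_interval)
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test without patching the clock.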

All tables:

#   Table name               Purpose
1   scanned_documents        Scanned documents
2   document_categories      Categories
3   document_ocr_jobs        OCR job queues
4   document_line_items      Invoice line items
5   document_exports         Export history
6   document_notifications   Notifications
7   document_shares          Document sharing
8   document_versions        Versions
9   scanning_companies       Companies (Multi-tenant)
10  email_accounts           Email accounts
11  email_messages           Email messages
12  email_attachments        Attachments
13  email_processing_rules   Automation rules
14  email_fetch_logs         Logs
15  documents                General documents
16  portal_documents         Portal documents

Backend - Python/Flask

Example: creating an API endpoint

Python - Flask
from flask import Blueprint, request, jsonify
from app.models_email_scanning import ScannedDocument
from app import db
from flask_login import login_required, current_user

email_scanning_api = Blueprint('email_scanning_api', __name__)

@email_scanning_api.route('/api/email-scanning/documents', methods=['GET'])
@login_required
def get_documents():
    """
    Get list of documents with pagination and filtering
    """
    # Get query parameters
    page = request.args.get('page', 1, type=int)
    per_page = request.args.get('per_page', 20, type=int)
    category_id = request.args.get('category_id', type=int)
    search = request.args.get('search', '')

    # Build query
    query = ScannedDocument.query.filter_by(user_id=current_user.id)

    # Apply filters
    if category_id:
        query = query.filter_by(category_id=category_id)

    if search:
        query = query.filter(
            db.or_(
                ScannedDocument.original_filename.ilike(f'%{search}%'),
                ScannedDocument.supplier_name.ilike(f'%{search}%'),
                ScannedDocument.invoice_number.ilike(f'%{search}%')
            )
        )

    # Paginate
    pagination = query.order_by(
        ScannedDocument.upload_date.desc()
    ).paginate(page=page, per_page=per_page, error_out=False)

    # Serialize results
    documents = [{
        'id': doc.id,
        'filename': doc.filename,
        'original_filename': doc.original_filename,
        'upload_date': doc.upload_date.isoformat() if doc.upload_date else None,
        'category_name': doc.category.name if doc.category else None,
        'status': doc.status,
        # Compare against None so that a legitimate amount of 0 is not dropped
        'total_amount': float(doc.total_amount) if doc.total_amount is not None else None
    } for doc in pagination.items]

    # Return success response
    return success_response({
        'documents': documents,
        'total': pagination.total,
        'pages': pagination.pages,
        'current_page': page,
        'per_page': per_page
    })

def success_response(data=None, message='Success', status_code=200):
    """Helper function for consistent API responses"""
    response = {
        'success': True,
        'message': message
    }
    if data is not None:
        response['data'] = data
    return jsonify(response), status_code

Backend development tips:

✅ Best Practices:
  • Use the @login_required decorator on all endpoints
  • Always return responses in the success_response() format
  • Validate all input parameters
  • Use SQLAlchemy transactions for database writes
  • Add logging for every critical operation
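
As one example of input validation, the `page` and `per_page` query parameters can be clamped before they reach the query. This is a sketch only: `parse_pagination` is a hypothetical helper, and the cap of 100 items per page is an assumed value. It accepts any mapping with a `.get` method, such as Flask's `request.args`.

```python
def parse_pagination(args, default_per_page: int = 20,
                     max_per_page: int = 100) -> tuple[int, int]:
    """Parse page/per_page from query args, clamping them to safe ranges."""
    def to_int(value, fallback):
        # Fall back when the parameter is missing or not a number
        try:
            return int(value)
        except (TypeError, ValueError):
            return fallback

    page = max(1, to_int(args.get("page"), 1))
    per_page = to_int(args.get("per_page"), default_per_page)
    per_page = min(max(1, per_page), max_per_page)
    return page, per_page
```

Capping `per_page` prevents a single request from asking the database for an unbounded number of rows.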

Frontend - JavaScript

Manager Class Pattern

JavaScript ES6
class EmailScanningManager {
    constructor() {
        this.documents = [];
        this.categories = [];
        this.currentPage = 1;
        this.totalPages = 1;
        this.selectedCategory = null;
    }

    async init() {
        console.log('🔧 Initializing EmailScanning Manager...');

        try {
            // Load initial data
            await this.loadCategories();
            await this.loadDocuments();

            // Setup event listeners
            this.setupEventListeners();

            console.log('✅ EmailScanning Manager initialized');
        } catch (error) {
            console.error('❌ Initialization failed:', error);
        }
    }

    async loadDocuments(page = 1) {
        try {
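            // Build the query string, including the category filter when one is selected
            const params = new URLSearchParams({ page, per_page: 20 });
            if (this.selectedCategory) {
                params.set('category_id', this.selectedCategory);
            }

            const response = await fetch(
                `/api/email-scanning/documents?${params}`,
                {
                    method: 'GET',
                    credentials: 'include',
                    headers: {
                        'Content-Type': 'application/json'
                    }
                }
            );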

            if (!response.ok) {
                throw new Error(`HTTP ${response.status}`);
            }

            const result = await response.json();

            // IMPORTANT: Unwrap success_response
            const data = result.data || result;

            this.documents = Array.isArray(data)
                ? data
                : (data.documents || []);
            this.totalPages = data.pages || 1;
            this.currentPage = page;

            this.renderDocuments();
        } catch (error) {
            console.error('Error loading documents:', error);
            this.showError('שגיאה בטעינת מסמכים');
        }
    }

    renderDocuments() {
        const container = document.getElementById('documents-container');
        if (!container) return;

        container.innerHTML = '';

        this.documents.forEach(doc => {
            const docCard = this.createDocumentCard(doc);
            container.appendChild(docCard);
        });
    }

    createDocumentCard(doc) {
        const card = document.createElement('div');
        card.className = 'document-card';
        card.innerHTML = `
            
${doc.original_filename || doc.filename}
${doc.status}

ספק: ${doc.supplier_name || 'לא זוהה'}

סכום: ${doc.total_amount ? `₪${doc.total_amount}` : '-'}

תאריך: ${this.formatDate(doc.upload_date)}

        `;
        return card;
    }

    setupEventListeners() {
        // Upload button
        const uploadBtn = document.getElementById('upload-btn');
        if (uploadBtn) {
            uploadBtn.addEventListener('click', () => this.showUploadModal());
        }

        // Category filter
        const categoryFilter = document.getElementById('category-filter');
        if (categoryFilter) {
            categoryFilter.addEventListener('change', (e) => {
                this.selectedCategory = e.target.value;
                this.loadDocuments(1);
            });
        }
    }

    showError(message) {
        // Show toast notification
        const toast = document.createElement('div');
        toast.className = 'toast-notification error';
        toast.textContent = message;
        document.body.appendChild(toast);
        setTimeout(() => toast.remove(), 3000);
    }
}

// Initialize on page load
const manager = new EmailScanningManager();
document.addEventListener('DOMContentLoaded', () => manager.init());

Important - handling API responses:

The API returns responses wrapped by success_response, so always extract the data from result.data:

// ✅ CORRECT
const result = await response.json();
const data = result.data || result;
this.documents = Array.isArray(data) ? data : (data.documents || []);

// ❌ WRONG
const data = await response.json();
this.documents = data.documents;  // Will fail!

OCR System

The system supports 3 OCR providers with automatic fallback:

1. Tesseract OCR (local)
  • ✅ Free and fast
  • ✅ Supports Hebrew
  • ⚠️ Relatively low accuracy (75-85%)
Python
import logging

import pytesseract
from PIL import Image

logger = logging.getLogger(__name__)

def extract_text_tesseract(image_path, lang='heb+eng'):
    """Extract text using Tesseract OCR"""
    try:
        image = Image.open(image_path)
        text = pytesseract.image_to_string(image, lang=lang)
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
        # Tesseract reports conf = -1 for boxes that contain no word; skip those
        confs = [float(c) for c in data['conf'] if float(c) >= 0]
        avg_conf = sum(confs) / len(confs) if confs else 0.0

        return {
            'text': text,
            'confidence': avg_conf / 100,  # scale 0-100 to 0-1
            'provider': 'tesseract'
        }
    except Exception as e:
        logger.error(f"Tesseract OCR failed: {e}")
        return None
2. Google Cloud Vision
  • ✅ High accuracy (90-95%)
  • ✅ Excellent Hebrew support
  • ⚠️ Requires an API key and is a paid service
Python
from google.cloud import vision

def extract_text_google_vision(image_path):
    """Extract text using Google Cloud Vision"""
    try:
        client = vision.ImageAnnotatorClient()

        with open(image_path, 'rb') as image_file:
            content = image_file.read()

        image = vision.Image(content=content)
        response = client.document_text_detection(image=image)

        if response.error.message:
            raise Exception(response.error.message)

        text = response.full_text_annotation.text
        confidence = response.full_text_annotation.pages[0].confidence

        return {
            'text': text,
            'confidence': confidence,
            'provider': 'google_vision'
        }
    except Exception as e:
        logger.error(f"Google Vision OCR failed: {e}")
        return None
3. AWS Textract
  • ✅ Excellent for invoices and structured forms
  • ✅ Automatically identifies fields (Supplier, Total, etc.)
  • ⚠️ Limited Hebrew support

Asynchronous OCR process:

Python - Celery
from celery import shared_task

@shared_task(bind=True, max_retries=3)
def process_document_ocr(self, document_id, provider='auto'):
    """
    Celery task to process document OCR asynchronously
    """
    try:
        document = ScannedDocument.query.get(document_id)
        if not document:
            logger.error(f"Document {document_id} not found")
            return

        # Update status
        document.ocr_status = 'processing'
        db.session.commit()

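        # Perform OCR with fallback, keeping the best result seen so far
        result = None
        providers = ['tesseract', 'google_vision', 'aws_textract']

        if provider != 'auto':
            providers = [provider]

        for prov in providers:
            logger.info(f"Trying OCR provider: {prov}")
            candidate = perform_ocr(document.file_path, prov)
            if not candidate:
                continue  # provider failed; a later failure must not discard earlier results

            if result is None or candidate['confidence'] > result['confidence']:
                result = candidate

            if result['confidence'] > 0.7:
                break

        if not result:
            raise Exception("All OCR providers failed")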

        # Extract financial data
        extracted_data = extract_invoice_data(result['text'])

        # Update document
        document.extracted_text = result['text']
        document.ocr_confidence = result['confidence']
        document.ocr_provider = result['provider']
        document.ocr_status = 'completed'

        # Update extracted fields
        document.supplier_name = extracted_data.get('supplier')
        document.invoice_number = extracted_data.get('invoice_number')
        document.total_amount = extracted_data.get('total_amount')

        db.session.commit()

        logger.info(f"OCR completed for document {document_id}")

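    except Exception as e:
        logger.error(f"OCR failed for document {document_id}: {e}")

        # Discard any half-applied changes, then record the failure.
        # Re-query the document: it may be unbound if the first lookup raised.
        db.session.rollback()
        document = ScannedDocument.query.get(document_id)
        if document:
            document.ocr_status = 'failed'
            db.session.commit()

        # Retry with exponential backoff (60s, 120s, 240s)
        raise self.retry(exc=e, countdown=60 * (2 ** self.request.retries))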
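
The task above calls `extract_invoice_data`, which is not shown in this guide. The following is a rough sketch of what such a function could look like, not the system's actual extractor: the regular expressions are illustrative and English-only, and taking the first non-empty line as the supplier name is an assumption. The returned keys match the ones the task reads (`supplier`, `invoice_number`, `total_amount`).

```python
import re
from decimal import Decimal, InvalidOperation

# Illustrative patterns only; real invoices (and Hebrew text) need more variants.
INVOICE_NUMBER_RE = re.compile(
    r"invoice\s*(?:no\.?|number|#)?\s*[:\-]?\s*([A-Z0-9\-]*\d[A-Z0-9\-]*)",
    re.IGNORECASE,
)
TOTAL_RE = re.compile(
    r"\btotal\s*[:\-]?\s*(?:₪|ILS)?\s*([\d,]+(?:\.\d{1,2})?)",
    re.IGNORECASE,
)


def extract_invoice_data(text: str) -> dict:
    """Pull supplier, invoice number and total amount out of OCR text."""
    data = {"supplier": None, "invoice_number": None, "total_amount": None}

    # Assumption: the supplier name is the first non-empty line
    for line in text.splitlines():
        if line.strip():
            data["supplier"] = line.strip()
            break

    match = INVOICE_NUMBER_RE.search(text)
    if match:
        data["invoice_number"] = match.group(1)

    match = TOTAL_RE.search(text)
    if match:
        try:
            # Strip thousands separators; Decimal fits the Numeric(12, 2) column
            data["total_amount"] = Decimal(match.group(1).replace(",", ""))
        except InvalidOperation:
            pass
    return data
```

The `\b` before `total` keeps the pattern from matching inside words such as "Subtotal".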

Email Integration

Gmail OAuth 2.0

Python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

def fetch_gmail_messages(account):
    """Fetch messages from Gmail using OAuth 2.0"""
    try:
        # Decrypt stored tokens
        access_token = decrypt(account.gmail_token_encrypted)
        refresh_token = decrypt(account.gmail_refresh_token_encrypted)

        # Create credentials
        creds = Credentials(
            token=access_token,
            refresh_token=refresh_token,
            token_uri='https://oauth2.googleapis.com/token',
            client_id=app.config['GMAIL_CLIENT_ID'],
            client_secret=app.config['GMAIL_CLIENT_SECRET']
        )

        # Build Gmail service
        service = build('gmail', 'v1', credentials=creds)

        # Fetch messages
        results = service.users().messages().list(
            userId='me',
            labelIds=['INBOX'],
            q='has:attachment',
            maxResults=50
        ).execute()

        messages = results.get('messages', [])

        # Process each message
        for msg_meta in messages:
            msg = service.users().messages().get(
                userId='me',
                id=msg_meta['id'],
                format='full'
            ).execute()

            # Save to database
            save_email_message(account.id, msg)

            # Download attachments
            process_attachments(account.id, msg)

        # Update last fetch
        account.last_fetch = datetime.utcnow()
        db.session.commit()

    except Exception as e:
        logger.error(f"Gmail fetch failed: {e}")
        raise

IMAP Integration

Python
import imaplib
import email

def fetch_imap_messages(account):
    """Fetch messages from IMAP server"""
    try:
        # Decrypt password
        password = decrypt(account.imap_password_encrypted)

        # Connect to IMAP server
        if account.use_ssl:
            imap = imaplib.IMAP4_SSL(account.imap_server, account.imap_port)
        else:
            imap = imaplib.IMAP4(account.imap_server, account.imap_port)

        # Login
        imap.login(account.imap_username, password)

        # Select INBOX
        imap.select('INBOX')

        # Search for unread messages (attachments are picked out per message below)
        _, message_numbers = imap.search(None, 'UNSEEN')

        for num in message_numbers[0].split():
            # Fetch message
            _, msg_data = imap.fetch(num, '(RFC822)')
            email_body = msg_data[0][1]
            message = email.message_from_bytes(email_body)

            # Save to database
            save_email_message(account.id, message)

            # Process attachments
            if message.is_multipart():
                for part in message.walk():
                    if part.get_content_disposition() == 'attachment':
                        save_attachment(account.id, part)

        # Logout
        imap.close()
        imap.logout()

        # Update last fetch
        account.last_fetch = datetime.utcnow()
        db.session.commit()

    except Exception as e:
        logger.error(f"IMAP fetch failed: {e}")
        raise
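
The attachment loop in the IMAP example can be isolated into a small function that uses only the standard library `email` package. This is a sketch under stated assumptions: `list_document_attachments` and `SUPPORTED_EXTENSIONS` are hypothetical names, and filtering by file extension is an assumption based on the supported upload types.

```python
from email.message import EmailMessage, Message
from pathlib import Path

# Assumed to mirror the supported upload types (PDF, JPG, PNG)
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def list_document_attachments(message: Message) -> list[tuple[str, bytes]]:
    """Return (filename, payload) pairs for attachments the scanner can process."""
    attachments = []
    for part in message.walk():
        if part.get_content_disposition() != "attachment":
            continue
        filename = part.get_filename()
        if not filename or Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        # decode=True reverses the transfer encoding (e.g. base64) and returns bytes
        payload = part.get_payload(decode=True)
        if payload:
            attachments.append((filename, payload))
    return attachments
```

A message built with `EmailMessage.add_attachment` can serve as a test fixture, so the logic can be exercised without a live IMAP server.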
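
🔐 Security:

All passwords and tokens are encrypted with Fernet (from the cryptography library: AES-128 in CBC mode with HMAC-SHA256 authentication) before being stored in the database. The encryption key is kept in the ENCRYPTION_KEY environment variable.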

Deployment

Docker Compose Setup

YAML
version: '3.8'

services:
  web:
    build: .
    container_name: techlabs-web
    volumes:
      - ./app:/app/app
      - ./logs:/app/logs
      - ./uploads:/app/uploads
    environment:
      - FLASK_ENV=production
      # Credentials must match POSTGRES_USER / POSTGRES_PASSWORD of the db service
      - DATABASE_URL=postgresql://user:password@db:5432/techlabs
      - REDIS_URL=redis://redis:6379/0
      - ENCRYPTION_KEY=${ENCRYPTION_KEY}
    depends_on:
      - db
      - redis
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.techlabs.rule=Host(`labs.levor.io`)"
      - "traefik.http.routers.techlabs.tls=true"
      - "traefik.http.routers.techlabs.tls.certresolver=letsencrypt"

  db:
    image: postgres:15
    container_name: techlabs-db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=techlabs
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password

  redis:
    image: redis:7-alpine
    container_name: techlabs-redis

  celery:
    build: .
    command: celery -A app.celery worker --loglevel=info
    depends_on:
      - redis
      - db
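    environment:
      - FLASK_ENV=production
      # The worker reads documents and tokens, so it needs the same settings as web
      - DATABASE_URL=postgresql://user:password@db:5432/techlabs
      - REDIS_URL=redis://redis:6379/0
      - ENCRYPTION_KEY=${ENCRYPTION_KEY}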

volumes:
  postgres_data:

Running:

Bash
# Build and start
docker-compose up -d --build

# View logs
docker-compose logs -f web

# Run migrations
docker exec techlabs-web flask db upgrade

# Restart services
docker-compose restart

# Stop
docker-compose down

Testing

Unit Tests

Python - pytest
import pytest
from app import create_app, db
from app.models_email_scanning import ScannedDocument, DocumentCategory

@pytest.fixture
def app():
    app = create_app('testing')
    with app.app_context():
        db.create_all()
        yield app
        db.session.remove()
        db.drop_all()

@pytest.fixture
def client(app):
    return app.test_client()

def test_create_document(client, app):
    """Test document creation"""
    with app.app_context():
        doc = ScannedDocument(
            filename='test.pdf',
            original_filename='חשבונית.pdf',
            user_id=1
        )
        db.session.add(doc)
        db.session.commit()

        assert doc.id is not None
        assert doc.filename == 'test.pdf'

def test_api_get_documents(client):
    """Test GET /api/email-scanning/documents"""
    # Login first
    client.post('/auth/login', data={
        'email': 'test@example.com',
        'password': 'password'
    })

    # Make API request
    response = client.get('/api/email-scanning/documents')

    assert response.status_code == 200
    data = response.get_json()
    assert data['success'] == True
    assert 'data' in data

# Run tests
# pytest tests/ -v

API Tests

Bash
#!/bin/bash
# test_api.sh

BASE_URL="https://labs.levor.io"

# Login and get cookie
curl -c cookies.txt -X POST \
  ${BASE_URL}/auth/login \
  -d "email=admin@techlab.co.il" \
  -d "password=yourpassword"

# Test document endpoint
curl -b cookies.txt \
  ${BASE_URL}/api/email-scanning/documents | jq

# Test upload
curl -b cookies.txt -X POST \
  ${BASE_URL}/api/email-scanning/documents/upload \
  -F "file=@invoice.pdf" \
  -F "category_id=1" | jq

Extending the System

Adding a new API endpoint

  1. Add a route in api_email_scanning.py:
Python
@email_scanning_api.route('/api/email-scanning/custom-report', methods=['GET'])
@login_required
def custom_report():
    """Your custom endpoint"""
    # Your logic here
    data = {
        'report': 'Your data'
    }
    return success_response(data)
  2. Add the call in JavaScript:
JavaScript
async loadCustomReport() {
    const response = await fetch('/api/email-scanning/custom-report', {
        credentials: 'include'
    });
    const result = await response.json();
    const data = result.data || result;
    console.log(data.report);
}

Adding a new OCR provider

Python
def extract_text_custom_provider(image_path):
    """
    Add your custom OCR provider
    """
    try:
        # Your OCR logic
        text = your_ocr_api.extract(image_path)

        return {
            'text': text,
            'confidence': 0.95,
            'provider': 'custom'
        }
    except Exception as e:
        logger.error(f"Custom OCR failed: {e}")
        return None

# Register in email_scanning_ocr_service.py
OCR_PROVIDERS = {
    'tesseract': extract_text_tesseract,
    'google_vision': extract_text_google_vision,
    'aws_textract': extract_text_aws_textract,
    'custom': extract_text_custom_provider  # ← Add here
}
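
A registry like the one above is typically consumed by a small dispatch function. The sketch below shows one possible shape for `perform_ocr`, the function the Celery task calls; it is not the system's actual implementation. Here the registry is passed in explicitly as a third argument so the example stands alone; in the real module it could default to `OCR_PROVIDERS`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def perform_ocr(file_path: str, provider: str,
                registry: dict[str, Callable[[str], Optional[dict]]]) -> Optional[dict]:
    """Look up a provider in the registry and run it, returning None on failure."""
    extractor = registry.get(provider)
    if extractor is None:
        logger.warning("Unknown OCR provider: %s", provider)
        return None
    try:
        return extractor(file_path)
    except Exception:
        # A misbehaving provider should not break the fallback chain
        logger.exception("OCR provider %s raised", provider)
        return None
```

Returning `None` for both unknown and failing providers lets the caller treat every failure the same way and move on to the next provider.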

You're done!

You now know the whole system and can start developing. Good luck!