Shit in. Shit out.

Clean does exactly what it sounds like. It cleans data.

Warning: you are now a fair way down into the details. This describes one of several local scripts we run when cleaning data in connection with a migration.

70

thousand contacts cleaned (at least)


20

customers who now get better-quality data into their systems

Description of Clean

(summary by Claude)


# Frei Data Clean

A Python tool for cleaning and validating customer contact and company data from CSV files.

## Features

### Data Processing

- Processes both contact and company data
- Validates and standardizes:
  - Email addresses (format and domain)
  - Phone numbers (Norwegian and international formats)
  - Names (first name and last name)
- Handles multiple emails per record (extras moved to 'other_emails')
- Caches domain validation results (24-hour expiry)
- Identifies duplicate records
- Supports custom output paths and CSV delimiters
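
The 24-hour domain cache mentioned above could look roughly like the following. This is a minimal sketch, not the tool's actual code: `DomainCache`, `check_domain`, and the in-memory dictionary are hypothetical names and choices made for illustration, and the real script may well persist its cache to disk instead.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour expiry, as stated in the feature list


class DomainCache:
    """Remember domain lookups so each domain is re-checked at most once per TTL."""

    def __init__(
        self,
        check_domain: Callable[[str], bool],  # hypothetical lookup, e.g. an MX query
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.time,  # injectable so expiry can be tested
    ) -> None:
        self._check_domain = check_domain
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[bool, float]] = {}

    def is_valid(self, domain: str) -> bool:
        domain = domain.strip().lower()
        now = self._clock()
        cached = self._entries.get(domain)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh: skip the expensive lookup
        result = self._check_domain(domain)
        self._entries[domain] = (result, now)
        return result
```

Because contact lists tend to repeat a handful of domains many times, a cache like this means the slow network check runs once per domain rather than once per row.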

### Name Processing

- Handles both full_name and first_name/last_name inputs
- Splits full names at the last space (everything before it becomes the first name)
- Standardizes case (Title Case)
- Preserves compound surnames
- Handles empty values and whitespace
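
As an illustration of the name rules listed above, here is a small sketch. The function name `clean_name` is hypothetical, and two behaviours are assumptions rather than documented facts: a single-word full name is treated as a first name, and "compound surnames" is read as hyphenated surnames such as `Hansen-Berg`.

```python
from typing import Optional, Tuple


def clean_name(
    full_name: Optional[str] = "",
    first_name: Optional[str] = "",
    last_name: Optional[str] = "",
) -> Tuple[str, str]:
    """Return (first_name, last_name) in Title Case with whitespace collapsed."""
    first = " ".join((first_name or "").split())
    last = " ".join((last_name or "").split())
    if not first and not last:
        collapsed = " ".join((full_name or "").split())
        # Split on the LAST space: "Kari Anne Hansen" -> "Kari Anne" + "Hansen".
        first, _, last = collapsed.rpartition(" ")
        if not first:
            # Assumption: a single-word name is kept as the first name.
            first, last = last, ""
    # str.title() capitalizes after hyphens too, so "hansen-berg" -> "Hansen-Berg".
    return first.title(), last.title()
```

For example, `clean_name(full_name="  kari  anne   HANSEN-berg ")` returns `("Kari Anne", "Hansen-Berg")`. Note that `str.title()` is a blunt instrument: it would turn `McDonald` into `Mcdonald`, so a production version would probably need exceptions.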

### Email Validation

- Format validation
- Domain MX record checking
- Multiple email handling
- Duplicate detection
- Status tracking:
  - 'valid' - Valid format and domain
  - 'invalid domain' - Valid format but invalid domain
  - 'invalid format' - Invalid email format
  - 'duplicate' - Duplicate email address
  - '' (empty) - Empty email field
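
The status values above can be sketched as a small classifier. Everything here is illustrative: `split_emails`, `email_status`, the loose regular expression, and the separator set are assumptions, and `domain_ok` is a stand-in for whatever MX lookup the tool actually performs. The order of the checks (duplicate before domain) is also a guess.

```python
import re
from typing import Callable, List, Set, Tuple

# Deliberately loose format check: something@something.tld, no spaces or extra "@".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Assumption: multiple addresses are separated by ";", "," or whitespace.
SEPARATORS_RE = re.compile(r"[;,\s]+")


def split_emails(raw: str) -> Tuple[str, List[str]]:
    """Return (primary_email, other_emails) from one raw CSV cell."""
    parts = [p.lower() for p in SEPARATORS_RE.split(raw.strip()) if p]
    if not parts:
        return "", []
    return parts[0], parts[1:]


def email_status(email: str, seen: Set[str], domain_ok: Callable[[str], bool]) -> str:
    """Classify one address using the status strings from the list above."""
    if not email:
        return ""
    if not EMAIL_RE.fullmatch(email):
        return "invalid format"
    if email in seen:
        return "duplicate"
    seen.add(email)  # remember the address for later duplicate checks
    domain = email.rsplit("@", 1)[1]
    return "valid" if domain_ok(domain) else "invalid domain"
```

A caller would keep one `seen` set for the whole file, pass each row's primary address through `email_status`, and write the remaining addresses to the `other_emails` column.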

### Phone Number Processing

- Norwegian number formatting
- International number support
- Shortcode handling
- Mobile/landline detection
- Status tracking:
  - 'valid norwegian' - Valid Norwegian format
  - 'valid intl' - Valid international format
  - 'shortcode' - 5-digit shortcode
  - 'invalid length' - Wrong number of digits
  - 'invalid format' - Doesn't match any valid format
  - 'moved to mobile' - Number moved from the landline field to the mobile field
  - 'moved to phone' - Number moved from the mobile field to the landline field
  - 'empty' - No number provided
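
A rough sketch of how these phone statuses might be assigned follows. The function names, the `phone`/`mobile` field names, the `+47...` output format, and the 8 to 15 digit window for international numbers are all assumptions made for illustration. The sketch relies on Norwegian numbers being 8 digits long with country code 47, and on mobile numbers starting with 4 or 9.

```python
import re
from typing import Tuple


def classify_phone(raw: str) -> Tuple[str, str]:
    """Return (normalized_number, status) for one raw phone value."""
    text = (raw or "").strip()
    if not text:
        return "", "empty"
    if re.search(r"[^\d\s+().-]", text):
        return text, "invalid format"  # letters or other unexpected characters
    digits = re.sub(r"\D", "", text)
    international = text.startswith("+") or text.startswith("00")
    if text.startswith("00"):
        digits = digits[2:]  # drop the "00" international prefix
    if international and not digits.startswith("47"):
        # Assumption: accept 8-15 digits (15 is the E.164 maximum).
        if 8 <= len(digits) <= 15:
            return "+" + digits, "valid intl"
        return text, "invalid length"
    if international:
        digits = digits[2:]  # drop the Norwegian country code
    if len(digits) == 5:
        return digits, "shortcode"
    if len(digits) != 8:
        return text, "invalid length"
    return "+47" + digits, "valid norwegian"


def is_norwegian_mobile(normalized: str) -> bool:
    """True for +47 numbers whose first subscriber digit is 4 or 9."""
    return normalized.startswith("+47") and len(normalized) == 11 and normalized[3] in "49"


def route_number(raw: str, field: str) -> Tuple[str, str, str]:
    """Return (target_field, number, status); field is 'phone' or 'mobile'."""
    number, status = classify_phone(raw)
    if status == "valid norwegian":
        target = "mobile" if is_norwegian_mobile(number) else "phone"
        if target != field:
            return target, number, "moved to " + target
    return field, number, status
```

With these definitions, `route_number("912 34 567", "phone")` returns `("mobile", "+4791234567", "moved to mobile")`, while the same number arriving in the `mobile` field keeps the status `valid norwegian`.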