Reference guide

The tf_crnn.data_handler

Data handling for input function

data_loader(csv_filename, params[, labels]) Loads, preprocesses (data augmentation, padding) and feeds the data
padding_inputs_width(image, target_shape, …) Given an input image, will pad it to return a target_shape size padded image.
augment_data(image, max_rotation) Data augmentation on an image (padding, brightness, contrast, rotation)
random_rotation(img, max_rotation, crop) Rotates an image with a random angle.
random_padding(image, max_pad_w, max_pad_h) Given an image will pad its border adding a random number of rows and columns
serving_single_input(fixed_height, min_width) Serving input function needed for export (in TensorFlow).

Config for training

Alphabet(lookup_alphabet_file, blank_symbol) Object for alphabet / symbols units.
TrainingParams(**kwargs) Object for parameters related to the training.
Params(**kwargs) Object for general parameters
import_params_from_json(model_directory, …) Read the exported json file with parameters of the experiment.

Model

deep_cnn(input_imgs, input_channels, …) CNN part of the CRNN network.
deep_bidirectional_lstm(inputs, params, …) Recurrent part of the CRNN network.
crnn_fn(features, labels, mode, params) CRNN model definition for tf.Estimator.
get_words_from_chars(characters_list, …[, …]) Joins separated characters to form words.

Loading exported model

PredictionModel(model_dir, session, signature) Helper class to load an exported model and apply it to image segments for transcription.

tf_crnn.data_handler.augment_data(image: tf.Tensor, max_rotation: float = 0.1) → tf.Tensor

Data augmentation on an image (padding, brightness, contrast, rotation)

Parameters:
  • image – Tensor
  • max_rotation – float, maximum permitted rotation (in radians)
Returns:

Tensor

tf_crnn.data_handler.data_loader(csv_filename: Union[List[str], str], params: tf_crnn.config.Params, labels=True, batch_size: int = 64, data_augmentation: bool = False, num_epochs: int = None, image_summaries: bool = False)

Loads, preprocesses (data augmentation, padding) and feeds the data

Parameters:
  • csv_filename – filename or list of filenames
  • params – Params object containing all the parameters
  • labels – transcription labels
  • batch_size – number of samples per batch
  • data_augmentation – flag to enable or disable data augmentation
  • num_epochs – number of times to feed the data
  • image_summaries – flag to show image summaries or not
Returns:

data_loader function

tf_crnn.data_handler.padding_inputs_width(image: tf.Tensor, target_shape: Tuple[int, int], increment: int) → Tuple[tf.Tensor, tf.Tensor]

Given an input image, will pad it to return a target_shape size padded image. There are 3 cases:

  • image width > target width : simple resizing to shrink the image
  • image width >= 0.5*target width : pad the image
  • image width < 0.5*target width : replicates the image segment and appends it
Parameters:
  • image – Tensor of shape [H,W,C]
  • target_shape – final shape after padding [H, W]
  • increment – reduction factor due to pooling between input width and output width, this makes sure that the final width will be a multiple of increment
Returns:

(image padded, output width)
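
The three cases above can be sketched as a small decision function. This only illustrates the documented thresholds: padding_case is a hypothetical helper that is not part of tf_crnn, and it does not perform the actual resizing, padding, or rounding to a multiple of increment.

```python
def padding_case(image_width: int, target_width: int) -> str:
    """Name the documented case that applies (hypothetical helper)."""
    if image_width > target_width:
        # Wider than the target: the image is resized (shrunk).
        return "resize"
    if image_width >= 0.5 * target_width:
        # At least half the target width: the image is padded.
        return "pad"
    # Narrower than half the target: the segment is replicated and appended.
    return "replicate"


for width in (300, 100, 99):
    print(width, padding_case(width, target_width=200))
```

Note that an image exactly as wide as the target falls into the "pad" case, where no extra columns are needed.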

tf_crnn.data_handler.random_padding(image: tf.Tensor, max_pad_w: int = 5, max_pad_h: int = 10) → tf.Tensor

Given an image will pad its border adding a random number of rows and columns

Parameters:
  • image – image to pad
  • max_pad_w – maximum padding in width
  • max_pad_h – maximum padding in height
Returns:

a padded image

tf_crnn.data_handler.random_rotation(img: tf.Tensor, max_rotation: float = 0.1, crop: bool = True) → tf.Tensor

Rotates an image with a random angle. See https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders for formulae

Parameters:
  • img – Tensor
  • max_rotation – maximum angle to rotate (radians)
  • crop – boolean to crop or not the image after rotation
Returns:

Tensor
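
The Stack Overflow answer linked in the description gives a closed-form size for the largest axis-aligned rectangle that fits inside a rotated image. The sketch below reproduces that formula in plain Python; the function name is hypothetical, and whether random_rotation computes its crop in exactly this way is an assumption.

```python
import math
from typing import Tuple


def rotated_rect_with_max_area(w: float, h: float, angle: float) -> Tuple[float, float]:
    """Width and height of the largest axis-aligned rectangle inside a
    w x h rectangle rotated by `angle` radians (hypothetical helper)."""
    if w <= 0 or h <= 0:
        return 0.0, 0.0
    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two corners of the crop touch the longer side.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr = (w * cos_a - h * sin_a) / cos_2a
        hr = (h * cos_a - w * sin_a) / cos_2a
    return wr, hr


print(rotated_rect_with_max_area(100, 32, 0.1))
```

With an angle of zero the function returns the original size, which is a quick sanity check on the formula.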

tf_crnn.data_handler.serving_single_input(fixed_height: int = 32, min_width: int = 8)

Serving input function needed for export (in TensorFlow). Features to serve :

  • images : greyscale image
  • input_filename : filename of image segment
  • input_rgb: RGB image segment
Parameters:
  • fixed_height – height of the image to format the input data with
  • min_width – minimum width to resize the image
Returns:

serving_input_fn
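
To make the roles of fixed_height and min_width concrete, here is a minimal sketch of the shape arithmetic such a serving function might apply. It assumes the image is resized to fixed_height while keeping its aspect ratio and that the width is then clamped to at least min_width; resized_shape is a hypothetical helper, and the real resizing logic may differ.

```python
from typing import Tuple


def resized_shape(height: int, width: int,
                  fixed_height: int = 32, min_width: int = 8) -> Tuple[int, int]:
    """Target (height, width) under the aspect-ratio assumption stated above."""
    ratio = width / height
    # Scale the width with the height, but never go below min_width.
    new_width = max(int(round(ratio * fixed_height)), min_width)
    return fixed_height, new_width


print(resized_shape(64, 200))  # wide segment
print(resized_shape(64, 4))    # very narrow segment, clamped to min_width
```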

class tf_crnn.config.Alphabet(lookup_alphabet_file: str = None, blank_symbol: str = '$')

Object for alphabet / symbols units.

Variables:
  • _blank_symbol (str) – Blank symbol used for CTC
  • _alphabet_units (List[str]) – list of elements composing the alphabet. The units may be a single character or multiple characters.
  • _codes (List[int]) – Each alphabet unit has a unique corresponding code.
  • _nclasses (int) – number of alphabet units.
alphabet_units
blank_symbol
check_input_file_alphabet(csv_filenames: List[str], discarded_chars: str = ';|\t\n\r\x0b\x0c', csv_delimiter: str = ';') → None

Checks whether the labels of the input files contain only characters that are in the Alphabet.

Parameters:
  • csv_filenames – list of csv filenames
  • discarded_chars – discarded characters
  • csv_delimiter – character delimiting field in the csv file
Returns:

codes
classmethod create_lookup_from_labels(csv_files: List[str], export_lookup_filename: str, original_lookup_filename: str = None)

Create a lookup dictionary for csv files containing labels. Exports a json file with the Alphabet.

Parameters:
  • csv_files – list of files to get the labels from (should be of format path;label)
  • export_lookup_filename – filename to export alphabet lookup dictionary
  • original_lookup_filename – original lookup filename to update (optional)
Returns:

n_classes
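
The idea of mapping alphabet units to unique integer codes can be sketched in a few lines. This is an illustration only: build_lookup and encode are hypothetical helpers, every unit is assumed to be a single character, and the way the real Alphabet class orders its codes or reserves one for the blank symbol is not shown here.

```python
from typing import Dict, Iterable, List


def build_lookup(labels: Iterable[str]) -> Dict[str, int]:
    """Assign one integer code per distinct character found in the labels."""
    units = sorted({char for label in labels for char in label})
    return {unit: code for code, unit in enumerate(units)}


def encode(label: str, lookup: Dict[str, int]) -> List[int]:
    """Translate a label into its list of codes."""
    return [lookup[char] for char in label]


lookup = build_lookup(["cab", "bad"])
codes = encode("bad", lookup)
print(lookup, codes)
```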
class tf_crnn.config.Params(**kwargs)

Object for general parameters

Variables:
  • input_shape (Tuple[int, int]) – input shape of the image to batch (this is the shape after data augmentation). The original image will be either resized or padded, depending on its original size
  • input_channels (int) – number of color channels for input image
  • csv_delimiter (str) – character to delimit csv input files
  • string_split_delimiter (str) – character that delimits each alphabet unit in the labels
  • num_gpus (int) – number of gpus to use
  • lookup_alphabet_file (str) – json file that contains the mapping alphabet units <-> codes
  • csv_files_train (str) – csv filename which contains the (path;label) of each training sample
  • csv_files_eval (str) – csv filename which contains the (path;label) of each eval sample
  • output_model_dir (str) – output directory where the model will be saved and exported
  • keep_prob_dropout (float) – keep probability
  • num_beam_paths (int) – number of paths (transcriptions) to return for ctc beam search (only used when predicting)
  • data_augmentation (bool) – if True augments data on the fly
  • data_augmentation_max_rotation (float) – maximum permitted rotation to apply to the image during training (radians)
  • input_data_n_parallel_calls (int) – number of parallel calls to make when using Dataset.map()
keep_prob_dropout
show_experiment_params() → dict

Returns a dictionary with the variables of the class.
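
As an illustration of how csv_delimiter and string_split_delimiter relate to one line of a (path;label) csv file, the sketch below splits a line into the image path and its alphabet units. parse_sample is a hypothetical helper, and the '|' used for string_split_delimiter is an assumed value chosen for the example, not a documented default.

```python
from typing import List, Tuple


def parse_sample(line: str, csv_delimiter: str = ";",
                 string_split_delimiter: str = "|") -> Tuple[str, List[str]]:
    """Split one csv line into (image path, list of alphabet units)."""
    path, label = line.rstrip("\n").split(csv_delimiter, 1)
    # Drop empty strings produced by leading or trailing delimiters.
    units = [unit for unit in label.split(string_split_delimiter) if unit]
    return path, units


path, units = parse_sample("img/0001.png;h|e|l|l|o\n")
print(path, units)
```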

class tf_crnn.config.TrainingParams(**kwargs)

Object for parameters related to the training.

Variables:
  • n_epochs (int) – number of epochs to run the training (default: 50)
  • train_batch_size (int) – batch size during training (default: 64)
  • eval_batch_size (int) – batch size during evaluation (default: 128)
  • learning_rate (float) – initial learning rate (default: 1e-4)
  • learning_decay_rate (float) – decay rate for exponential learning rate (default: .96)
  • learning_decay_steps (int) – decay steps for exponential learning rate (default: 1000)
  • evaluate_every_epoch (int) – evaluate every ‘evaluate_every_epoch’ epoch (default: 5)
  • save_interval (int) – save the model every ‘save_interval’ step (default: 1e3)
  • optimizer (str) – which optimizer to use (‘adam’, ‘rms’, ‘ada’) (default: ‘adam’)
to_dict() → dict
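
The interaction of learning_rate, learning_decay_rate and learning_decay_steps can be illustrated with the usual exponential decay formula. This sketch assumes a continuous (non-staircase) schedule; decayed_learning_rate is a hypothetical helper and the schedule actually applied during training may differ.

```python
def decayed_learning_rate(step: int,
                          learning_rate: float = 1e-4,
                          learning_decay_rate: float = 0.96,
                          learning_decay_steps: int = 1000) -> float:
    """Learning rate after `step` training steps under exponential decay."""
    return learning_rate * learning_decay_rate ** (step / learning_decay_steps)


for step in (0, 1000, 5000):
    print(step, decayed_learning_rate(step))
```

With the documented defaults, the rate is multiplied by 0.96 every 1000 steps.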
tf_crnn.config.import_params_from_json(model_directory: str = None, json_filename: str = None) → dict

Read the exported json file with parameters of the experiment.

Parameters:
  • model_directory – directory where the model was exported
  • json_filename – filename of the file
Returns:

a dictionary containing the parameters of the experiment

tf_crnn.model.crnn_fn(features, labels, mode, params)

CRNN model definition for tf.Estimator. Combines deep_cnn and deep_bidirectional_lstm to define the model and adds loss computation and CTC decoder.

Parameters:
  • features – dictionary with keys: ‘images’, ‘images_widths’, ‘filenames’
  • labels – string containing the transcriptions. Flattened (1D) array with encoded labels (one code per character)
  • mode – TRAIN, EVAL, PREDICT
  • params – dictionary with keys: ‘Params’, ‘TrainingParams’
Returns:

tf_crnn.model.deep_bidirectional_lstm(inputs: tf.Tensor, params: tf_crnn.config.Params, summaries: bool = True) → tf.Tensor

Recurrent part of the CRNN network. Uses a bidirectional LSTM.

Parameters:
  • inputs – output of deep_cnn
  • params – parameters of the model
  • summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns:

Tuple : (tensor [width(time), batch, n_classes], raw transcription codes)
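
Raw per-time-step codes are not yet a transcription: in CTC, repeated codes are merged and blank codes are dropped. The sketch below shows that greedy collapse in plain Python. It assumes the raw transcription codes are one class index per time step and that the blank symbol has its own code; ctc_collapse is a hypothetical helper and does not replace the CTC decoder used by the model.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_collapse(codes: Sequence[int], blank_code: int) -> List[int]:
    """Merge consecutive repeated codes, then remove the blank code."""
    return [code for code, _ in groupby(codes) if code != blank_code]


raw_codes = [0, 0, 4, 1, 1, 4, 1, 2, 4, 4]  # 4 stands for the blank here
print(ctc_collapse(raw_codes, blank_code=4))
```

The blank between the two runs of 1 is what allows a doubled character to survive the merge.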

tf_crnn.model.deep_cnn(input_imgs: tf.Tensor, input_channels: int, is_training: bool, summaries: bool = True) → tf.Tensor

CNN part of the CRNN network.

Parameters:
  • input_imgs – input images [B, H, W, C]
  • input_channels – input channels, 1 for greyscale images, 3 for RGB color images
  • is_training – flag to indicate training or not
  • summaries – flag to enable bias and weight histograms to be visualized in Tensorboard
Returns:

tensor of shape [batch, final_width, final_height x final_features]

class tf_crnn.loader.PredictionModel(model_dir: str, session: tf.Session = None, signature: str = 'predictions')

Helper class to load an exported model and apply it to image segments for transcription.

Variables:
  • session (tf.Session) – tf.Session within which to run the loading process
  • model – loaded exported model
Parameters:
  • model_dir – directory containing the saved model files.
  • session – tf.Session in which to load the model
  • signature – which signature to use to select the type of input:

    • predictions (default) : input a grayscale image
    • rgb_images : input an RGB image
    • filename : input the filename of the image segment
predict(input_to_predict: Union[numpy.ndarray, str]) → dict

Get transcription for input data.

Parameters:
  • input_to_predict – input data in the format specified by signature when instantiating the object
Returns:

a dictionary with the predictions