Reference guide

The tf_crnn.data_handler

Data handling for input function

`data_loader`(csv_filename, params[, labels, …])	Loads, preprocesses (data augmentation, padding) and feeds the data.
`padding_inputs_width`(image, target_shape, …)	Pads an input image to the target_shape size.
`augment_data`(image, max_rotation)	Data augmentation on an image (padding, brightness, contrast, rotation).
`random_rotation`(img, max_rotation, crop)	Rotates an image by a random angle.
`random_padding`(image, max_pad_w, max_pad_h)	Pads the border of an image with a random number of rows and columns.
`serving_single_input`(fixed_height, min_width)	Serving input function needed for export (in TensorFlow).

Config for training

`Alphabet`(lookup_alphabet_file, blank_symbol)	Object for alphabet / symbols units.
`TrainingParams`(**kwargs)	Object for parameters related to the training.
`Params`(**kwargs)	Object for general parameters.
`import_params_from_json`(model_directory, …)	Read the exported json file with parameters of the experiment.

Model

`deep_cnn`(input_imgs, input_channels, …)	CNN part of the CRNN network.
`deep_bidirectional_lstm`(inputs, params, …)	Recurrent part of the CRNN network.
`crnn_fn`(features, labels, mode, params)	CRNN model definition for `tf.estimator.Estimator`.
`get_words_from_chars`(characters_list, …[, …])	Joins separated characters to form words.

Loading exported model

`PredictionModel`(model_dir, session, signature)	Helper class to load an exported model and apply it to image segments for transcription.

tf_crnn.data_handler.augment_data(image: tf.Tensor, max_rotation: float = 0.1) → tf.Tensor

Data augmentation on an image (padding, brightness, contrast, rotation).

Parameters:

image – Tensor
max_rotation – float, maximum permitted rotation (in radians)

Returns:	Tensor

tf_crnn.data_handler.data_loader(csv_filename: Union[List[str], str], params: tf_crnn.config.Params, labels=True, batch_size: int = 64, data_augmentation: bool = False, num_epochs: int = None, image_summaries: bool = False)

Loads, preprocesses (data augmentation, padding) and feeds the data.

Parameters:

csv_filename – filename or list of filenames
params – Params object containing all the parameters
labels – transcription labels
batch_size – batch size
data_augmentation – flag to enable data augmentation
num_epochs – number of times the data is fed
image_summaries – flag to show image summaries

Returns:	data_loader function

tf_crnn.data_handler.padding_inputs_width(image: tf.Tensor, target_shape: Tuple[int, int], increment: int) → Tuple[tf.Tensor, tf.Tensor]

Pads an input image to the target_shape size. There are three cases:

image width > target width : simple resizing to shrink the image

image width >= 0.5*target width : pad the image

image width < 0.5*target width : replicates the image segment and appends it

Parameters:

image – Tensor of shape [H,W,C]
target_shape – final shape after padding [H, W]
increment – reduction factor due to pooling between input width and output width; this makes sure that the final width will be a multiple of increment

Returns:	(image padded, output width)
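
The three cases can be sketched in plain Python. This only illustrates the case selection described above; it is not the library's TensorFlow implementation, and the helper name `choose_width_strategy` and its return labels are hypothetical.

```python
def choose_width_strategy(image_width: int, target_width: int) -> str:
    """Return which of the three documented cases applies (hypothetical helper)."""
    if image_width > target_width:
        return "resize"      # wider than the target: shrink the image
    if image_width >= 0.5 * target_width:
        return "pad"         # at least half the target width: pad the image
    return "replicate"       # narrower than half: replicate the segment and append it


for width in (300, 150, 40):
    print(width, choose_width_strategy(width, target_width=200))
```

An image whose width already equals the target width falls into the padding case, where no extra columns are needed.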

tf_crnn.data_handler.random_padding(image: tf.Tensor, max_pad_w: int = 5, max_pad_h: int = 10) → tf.Tensor

Pads the border of an image with a random number of rows and columns.

Parameters:

image – image to pad
max_pad_w – maximum padding in width
max_pad_h – maximum padding in height

Returns:	a padded image

tf_crnn.data_handler.random_rotation(img: tf.Tensor, max_rotation: float = 0.1, crop: bool = True) → tf.Tensor

Rotates an image by a random angle. See https://stackoverflow.com/questions/16702966/rotate-image-and-crop-out-black-borders for the formulae.

Parameters:

img – Tensor
max_rotation – maximum angle to rotate (radians)
crop – boolean, whether to crop the image after rotation

Returns:	the rotated (and optionally cropped) image
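
The linked discussion gives a closed-form expression for the largest axis-aligned rectangle that fits inside a rotated w x h rectangle, which is what cropping out the black borders requires. A minimal sketch of that formula in plain Python follows. Whether `random_rotation` uses exactly this variant is an assumption, and the helper name `largest_rotated_rect` is hypothetical.

```python
import math
from typing import Tuple


def largest_rotated_rect(w: float, h: float, angle: float) -> Tuple[float, float]:
    """Width and height of the largest axis-aligned rectangle inside a
    w x h rectangle rotated by `angle` radians (hypothetical helper)."""
    if w <= 0 or h <= 0:
        return 0.0, 0.0
    width_is_longer = w >= h
    side_long, side_short = (w, h) if width_is_longer else (h, w)
    sin_a, cos_a = abs(math.sin(angle)), abs(math.cos(angle))
    if side_short <= 2.0 * sin_a * cos_a * side_long or abs(sin_a - cos_a) < 1e-10:
        # Half-constrained case: two crop corners touch the longer side.
        x = 0.5 * side_short
        wr, hr = (x / sin_a, x / cos_a) if width_is_longer else (x / cos_a, x / sin_a)
    else:
        # Fully constrained case: the crop touches all four sides.
        cos_2a = cos_a * cos_a - sin_a * sin_a
        wr, hr = (w * cos_a - h * sin_a) / cos_2a, (h * cos_a - w * sin_a) / cos_2a
    return wr, hr


# Crop size for a 100 x 32 text-line image rotated by the default max_rotation.
print(largest_rotated_rect(100, 32, 0.1))
```

For wide, short text-line images even a small angle removes a noticeable share of the height, which is one reason to keep `max_rotation` small.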

tf_crnn.data_handler.serving_single_input(fixed_height: int = 32, min_width: int = 8)

Serving input function needed for export (in TensorFlow). Features to serve:

images : greyscale image

input_filename : filename of image segment

input_rgb : RGB image segment

Parameters:

fixed_height – height of the image to format the input data with
min_width – minimum width to resize the image

Returns:	serving_input_fn

class tf_crnn.config.Alphabet(lookup_alphabet_file: str = None, blank_symbol: str = '$')

Object for alphabet / symbols units.

Variables:

_blank_symbol (str) – blank symbol used for CTC
_alphabet_units (List[str]) – list of elements composing the alphabet. The units may be a single character or multiple characters.
_codes (List[int]) – each alphabet unit has a unique corresponding code.
_nclasses (int) – number of alphabet units.

alphabet_units

blank_symbol

check_input_file_alphabet(csv_filenames: List[str], discarded_chars: str = ';|\t\n\r\x0b\x0c', csv_delimiter: str = ';') → None

Checks that the labels of the input files contain only characters that are in the Alphabet.

Parameters:

csv_filenames – list of the csv filenames
discarded_chars – discarded characters
csv_delimiter – character delimiting fields in the csv file
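
A check of this kind can be sketched with the standard library alone. The sketch below is an illustration, not the method's actual code: it assumes single-character alphabet units and rows of the form path;label, reads from in-memory lines instead of files, and the helper name `find_unknown_chars` is hypothetical.

```python
import csv
from typing import Iterable, List, Set


def find_unknown_chars(csv_lines: Iterable[str], alphabet_units: List[str],
                       discarded_chars: str = ';|\t\n\r\x0b\x0c',
                       csv_delimiter: str = ';') -> Set[str]:
    """Collect label characters that are missing from the alphabet (hypothetical helper)."""
    known = set(alphabet_units)
    unknown = set()
    for row in csv.reader(csv_lines, delimiter=csv_delimiter):
        if len(row) < 2:
            continue  # skip rows without a label field
        for char in row[1]:
            if char in discarded_chars:
                continue  # ignored characters never count as unknown
            if char not in known:
                unknown.add(char)
    return unknown


lines = ["img_0.png;abc", "img_1.png;abd"]
print(find_unknown_chars(lines, alphabet_units=["a", "b", "c"]))
```

An empty result means every label can be encoded with the given alphabet.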

codes

classmethod create_lookup_from_labels(csv_files: List[str], export_lookup_filename: str, original_lookup_filename: str = None)

Create a lookup dictionary for csv files containing labels. Exports a json file with the Alphabet.

Parameters:

csv_files – list of files to get the labels from (should be of format path;label)
export_lookup_filename – filename to export alphabet lookup dictionary
original_lookup_filename – original lookup filename to update (optional)
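
The idea of deriving a lookup from labels can be sketched as follows. This is a simplified illustration under stated assumptions: units are single characters, codes are assigned in sorted order starting at 0, and the result is printed as JSON rather than written to a file. The JSON layout that the library actually exports may differ, and `build_lookup` is a hypothetical name.

```python
import csv
import json
from typing import Dict, Iterable


def build_lookup(csv_lines: Iterable[str], csv_delimiter: str = ';') -> Dict[str, int]:
    """Map every character seen in the labels to an integer code (hypothetical helper)."""
    units = set()
    for row in csv.reader(csv_lines, delimiter=csv_delimiter):
        if len(row) >= 2:
            units.update(row[1])  # add each character of the label
    # Sorting makes the unit -> code assignment reproducible across runs.
    return {unit: code for code, unit in enumerate(sorted(units))}


lookup = build_lookup(["img_0.png;cab", "img_1.png;bad"])
print(json.dumps(lookup))
```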

n_classes

class tf_crnn.config.Params(**kwargs)

Object for general parameters.

Variables:

input_shape (Tuple[int, int]) – input shape of the image to batch (this is the shape after data augmentation). The original image will be either resized or padded depending on its original size
input_channels (int) – number of color channels for input image
csv_delimiter (str) – character to delimit csv input files
string_split_delimiter (str) – character that delimits each alphabet unit in the labels
num_gpus (int) – number of gpus to use
lookup_alphabet_file (str) – json file that contains the mapping alphabet units <-> codes
csv_files_train (str) – csv filename which contains the (path;label) of each training sample
csv_files_eval (str) – csv filename which contains the (path;label) of each eval sample
output_model_dir (str) – output directory where the model will be saved and exported
keep_prob_dropout (float) – keep probability
num_beam_paths (int) – number of paths (transcriptions) to return for ctc beam search (only used when predicting)
data_augmentation (bool) – if True augments data on the fly
data_augmentation_max_rotation (float) – maximum permitted rotation to apply to the image during training (radians)
input_data_n_parallel_calls (int) – number of parallel calls to make when using Dataset.map()

keep_prob_dropout

show_experiment_params() → dict

Returns a dictionary with the variables of the class.

class tf_crnn.config.TrainingParams(**kwargs)

Object for parameters related to the training.

Variables:

n_epochs (int) – number of epochs to run the training (default: 50)
train_batch_size (int) – batch size during training (default: 64)
eval_batch_size (int) – batch size during evaluation (default: 128)
learning_rate (float) – initial learning rate (default: 1e-4)
learning_decay_rate (float) – decay rate for exponential learning rate (default: .96)
learning_decay_steps (int) – decay steps for exponential learning rate (default: 1000)
evaluate_every_epoch (int) – evaluate every ‘evaluate_every_epoch’ epoch (default: 5)
save_interval (int) – save the model every ‘save_interval’ steps (default: 1e3)
optimizer (str) – which optimizer to use (‘adam’, ‘rms’, ‘ada’) (default: ‘adam’)

to_dict() → dict
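
To see how `learning_rate`, `learning_decay_rate` and `learning_decay_steps` interact, the sketch below evaluates the standard continuous exponential decay formula with the documented defaults. Whether training applies this exact (non-staircase) schedule is an assumption, and `decayed_learning_rate` is a hypothetical helper, not part of `tf_crnn`.

```python
def decayed_learning_rate(step: int,
                          learning_rate: float = 1e-4,
                          learning_decay_rate: float = 0.96,
                          learning_decay_steps: int = 1000) -> float:
    """Exponentially decayed learning rate at a given training step (hypothetical helper)."""
    return learning_rate * learning_decay_rate ** (step / learning_decay_steps)


for step in (0, 1000, 10000):
    print(step, decayed_learning_rate(step))
```

With the defaults, the rate is multiplied by 0.96 once every 1000 steps.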

tf_crnn.config.import_params_from_json(model_directory: str = None, json_filename: str = None) → dict

Read the exported json file with parameters of the experiment.

Parameters:

model_directory – directory where the model was exported
json_filename – filename of the file

Returns:	a dictionary containing the parameters of the experiment

tf_crnn.model.crnn_fn(features, labels, mode, params)

CRNN model definition for tf.estimator.Estimator. Combines deep_cnn and deep_bidirectional_lstm to define the model and adds loss computation and CTC decoder.

Parameters:

features – dictionary with keys: ‘images’, ‘images_widths’, ‘filenames’
labels – string containing the transcriptions. Flattened (1D) array with encoded label (one code per character)
mode – TRAIN, EVAL, PREDICT
params – dictionary with keys: ‘Params’, ‘TrainingParams’

tf_crnn.model.deep_bidirectional_lstm(inputs: tf.Tensor, params: tf_crnn.config.Params, summaries: bool = True) → tf.Tensor

Recurrent part of the CRNN network. Uses a bidirectional LSTM.

Parameters:

inputs – output of `deep_cnn`
params – parameters of the model
summaries – flag to enable bias and weight histograms to be visualized in Tensorboard

Returns:	Tuple : (tensor [width(time), batch, n_classes], raw transcription codes)

tf_crnn.model.deep_cnn(input_imgs: tf.Tensor, input_channels: int, is_training: bool, summaries: bool = True) → tf.Tensor

CNN part of the CRNN network.

Parameters:

input_imgs – input images [B, H, W, C]
input_channels – input channels, 1 for greyscale images, 3 for RGB color images
is_training – flag to indicate training or not
summaries – flag to enable bias and weight histograms to be visualized in Tensorboard

Returns:	tensor of shape [batch, final_width, final_height x final_features]

class tf_crnn.loader.PredictionModel(model_dir: str, session: tf.Session = None, signature: str = 'predictions')

Helper class to load an exported model and apply it to image segments for transcription.

Variables:

session (tf.Session) – `tf.Session` within which to run the loading process
model – loaded exported model

Parameters:

model_dir – directory containing the saved model files
session – `tf.Session` to load the model
signature – which signature to use to select the type of input:

predictions (default) : input a grayscale image

rgb_images : input an RGB image

filename : input the filename of the image segment

predict(input_to_predict: Union[numpy.ndarray, str]) → dict

Get transcription for input data.

Parameters:

input_to_predict – input data of the format specified in signature when instantiating the object

Returns:	a dictionary with the predictions