In the last part part we were using VisionKit to accurately recognize text on captured images. In this part we will bring everything together and create a searchable PDF by combining the image and recognized texts. Lets recap what is needed and what is already done:
Part 1 ✅
Foundations ✅
Scan the document ✅
Part 2
Recognize the text on scanned images ✅
Part 3
Create a PDF with the images
Place the recognized text behind each image on the PDF
Save and display the PDF
Create a PDF
Before we start right into coding I want to highlight what actually a searchable PDF is. It is a combination of a text and image layer where the textual layer is placed behind the image. The position of the text characters should be in the exact same position as the character is placed on the image. This in the end leads to the fact that you can select the text behind the image and it looks like you would select the text which is shown on the image. I think the following image visualizes this concept very well:
Last time we stopped with the PDFCreator class and a method signature that will take an array of UIImage and returns Data which can be used to create a PDF out of it.
Let us continue by implementing the createSearchablePDF(from:) method. The first step is to invoke the pdfData method on an UIGraphicsPDFRenderer instance. This method is responsible for creating the raw data that can later be used as the return type of our method. Next up we keep a reference to the underlying core graphics context. This can be used to draw things on the PDF. We will use it for drawing our scanned UIImages and the recognized text onto the PDF.
Drawing image and text
Having the base setup for the method, we can start iterating over the images we scanned in part 2. The workflow for processing a single image will be the following:
Take the image
Recognize the text on the image
Create a PDF page with the dimensions of the image
Use the recognized text to draw text
Draw the image above the text on the pdf
Having everything in place, let's start with the actual implementation of the method.
As you may have already seen, we are using two methods here that we did not yet define - writeTextOnTextBoxLevel(recognizedText:on:bounds) and draw(image:on:withSize). We will have a more detailed look on these now.
Drawing the text
Drawing text on the PDF page is a little bit tricky in this context. We need to exactly know where to draw the text as well as choosing a font size which tries to match the original font size as close as possible in order to "simulate" the selection of the text on the image in the end.
Thanks to the VNRecognizedText we already have the information about where the text was recognized on the images. Furthermore we get the exact bounding boxes of each recognized line of text. Now the task is to find a size for the system font which tries to fit exactly into this bounding box. For that purpose we will utilise a binary search algorithm from the FittableFontLabel package. This implementation gave me the best results in terms of matching font sizes. I only did some small modifications in order to make it fit into the current context, but all credit here goes to Tom Barens and his awesome library.
Like we did it with the PDFCreator class, the FontSizeCalculator is implemented as a singleton and exposes a single public method called fontSizeThatFits(text:maxFontSize:minFontScale:rectSize) which calculates the best matching font size for a given boundary. I do not want to go into further detail here, since the purpose of the series is to find out how to create a searchable PDF and not how to implement a binary search for a font size.
Having the font size calculation in place we can start to implement the writeTextOnTextBoxLevel(recognizedText:on:bounds) method. Like with the createSearchablePDF(from:) method, I want to provide a high level overview about what has to be done within the method:
Get the bounding box of the recognized text
Translate the bounding coordinates to image coordinates
Calculate the best matching font size
Prepare an NSAttributedString with the recognized text and font size
Draw the string inside the transformed bounding coordinates
Having our plan we can start with the implementation of the method.
This is the full implementation of drawing text based on the complete text box level of the recognized text. You can also draw each individual character on the PDF but in my tests this gave me worse results than just draw the whole line of recognized text.
Drawing the image
In contrast to drawing the text, drawing an image is rather simple. The only thing necessary is to call the draw method of UIImage. Please keep in mind that you do not need to provide the CGContext variable per se. It is just to demonstrate that the draw method of UIImage is always executed within the current graphics context.
Creating the actual PDF
The last step which is missing, is to convert the raw Data into an actual PDF file. For that we take the returned Data instance from createSearchablePDF(from:) and initialize a PDFDocument instance. If this is successful, you can do whatever you want with the PDF, for example saving it to the documents directory of the user.
Conclusion
That was quite a lot of code today but in the end we managed to finish up a working example of creating a searchable PDF of scanned documents. I hope you found this rather longly series interesting. It was the first time I wrote a series of connected posts about a more complex topic - there are for sure some places where I could improve my writing here. If you have suggestions or question please don't hesitate to reach out to me !