Working with interactive forms
pikepdf provides two interfaces for working with interactive forms. There is a low-level
interface, pikepdf.AcroForm, which is exposed as the
pikepdf.Pdf.acroform property. There is also a higher-level interface available
in the pikepdf.form module, which provides several abstractions to make usage
easier.
Extracting Form Data
It is relatively easy to extract basic form data from a PDF.
>>> from pikepdf.form import Form
>>> form = Form(pdf)
>>> data = {}
>>> for field_name, field in form.items():
...     if field.is_text or field.is_choice or field.is_radio_button:
...         data[field_name] = field.value
...     elif field.is_checkbox:
...         data[field_name] = field.checked
Inspecting the Form
The form allows retrieving specific named fields via dict-like access. There are several useful properties common to all fields. The most useful of these are:
- alternate_name, which is a human-readable label for the field
- fully_qualified_name, which is the machine-readable key which identifies this field
- is_required
- is_text
- is_checkbox
- is_radio_button
- is_pushbutton
- is_choice
>>> field = form['MyField']
>>> field.fully_qualified_name
"MyField"
>>> field.alternate_name
"Applicant's first given name"
>>> field.is_text
True
>>> field.is_required
False
Fields with duplicate names are supported. Accessing them by name returns a list of fields
instead of a single field. Accessing attributes directly on this list (e.g. field.value)
will proxy to the first field in the list.
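The common properties above can be gathered into a small summary per field. The following is a minimal sketch, not part of pikepdf: the helpers field_kind and describe_field are hypothetical names, and the demo object is a stand-in that only mimics the attributes listed above, so the example runs without a PDF.

```python
from types import SimpleNamespace

# (label, flag) pairs built from the is_* properties listed above.
# The first flag that is set decides the label.
KIND_FLAGS = (
    ("text", "is_text"),
    ("checkbox", "is_checkbox"),
    ("radio_button", "is_radio_button"),
    ("pushbutton", "is_pushbutton"),
    ("choice", "is_choice"),
)


def field_kind(field):
    """Return a short label for the field type, or "unknown" if no flag is set."""
    for label, flag in KIND_FLAGS:
        if getattr(field, flag, False):
            return label
    return "unknown"


def describe_field(field):
    """Summarize the properties that are common to all fields."""
    return {
        "name": field.fully_qualified_name,
        "label": field.alternate_name,
        "required": field.is_required,
        "kind": field_kind(field),
    }


# Stand-in for a real field (hypothetical; mirrors the session above).
demo = SimpleNamespace(
    fully_qualified_name="MyField",
    alternate_name="Applicant's first given name",
    is_required=False,
    is_text=True,
)
summary = describe_field(demo)
```

With a real form, the same helper would be called as describe_field(form['MyField']). Fields that set none of the listed flags, such as signature fields, fall through to "unknown" in this sketch.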
Filling Form Data
Before filling a form, you will need to determine how you will deal with appearance streams. In addition to merely holding values, PDF form fields must explicitly declare how the filled-in value should look. This is known as the appearance stream. There are several options available.
First, you may choose not to generate appearance streams at all. Most full-fat PDF readers
are capable of generating these appearance streams themselves, so depending on your use
case it may be acceptable to leave appearance stream generation to the end-user
application. This is the default behavior of the pikepdf.form.Form class.
If you do need or want to generate appearance streams, you must provide the class you wish
to use to accomplish this task. There are two possible implementations provided with
pikepdf: pikepdf.form.DefaultAppearanceStreamGenerator and
pikepdf.form.ExtendedAppearanceStreamGenerator. To use either of these, simply pass
the class as the second argument to the constructor:
>>> from pikepdf.form import Form, DefaultAppearanceStreamGenerator
>>> form = Form(pdf, DefaultAppearanceStreamGenerator)
The differences between these two options are explained in the documentation for each class.
Lastly, you may implement your own class for generating appearance streams that better
fits your specific use case. It must implement the interface provided by
pikepdf.form.AppearanceStreamGenerator.
After filling a form, you may also wish to flatten it. This converts the interactive form fields into normal, un-editable text. This can be done as follows:
pdf.flatten_annotations()
Generating appearance streams is required if you wish to flatten the form.
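Whichever appearance stream option is chosen, filling comes down to assigning to value or checked on each field. The following is a minimal sketch of filling several fields from a mapping. The helper fill_form is a hypothetical name, not a pikepdf function, and the demo uses stand-in objects in place of a real Form so it runs without a PDF. Radio buttons and fields with duplicate names are deliberately left out, since their assignment behavior is not covered here.

```python
from types import SimpleNamespace


def fill_form(form, values):
    """Assign each value to the named field; return the names that were skipped."""
    skipped = []
    for name, new_value in values.items():
        field = form[name]  # dict-like access by field name
        if field.is_checkbox:
            field.checked = bool(new_value)
        elif field.is_text or field.is_choice:
            field.value = new_value
        else:
            # Field types this sketch does not handle are reported, not guessed at.
            skipped.append(name)
    return skipped


# Stand-in form (hypothetical): a plain dict of objects with the same attributes.
demo_form = {
    "MyTextField": SimpleNamespace(
        is_checkbox=False, is_text=True, is_choice=False, value=""
    ),
    "MyCheckbox": SimpleNamespace(
        is_checkbox=True, is_text=False, is_choice=False, checked=False
    ),
    "MySigField": SimpleNamespace(
        is_checkbox=False, is_text=False, is_choice=False
    ),
}
skipped = fill_form(
    demo_form,
    {"MyTextField": "Hello World!", "MyCheckbox": True, "MySigField": "ignored"},
)
```

With a real document, form would be a pikepdf.form.Form, and pdf.flatten_annotations() could follow once an appearance stream generator has been supplied.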
Text Fields
Text fields can resemble an HTML text input, an HTML textarea, a password field, a
file upload, or a rich text input. pikepdf supports only the first two
options, which can be distinguished from one another using the is_multiline property.
The underlying value of the text field is stored in the value property. The field
may also have a default_value which should be used when resetting the form.
>>> text_field = form['MyTextField']
>>> text_field.is_multiline
False
>>> text_field.default_value
''
>>> text_field.value
''
>>> text_field.max_length
75
>>> text_field.value = "Hello World!"
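Since a text field exposes max_length, a caller may want to check the length before assigning. The following is a minimal sketch of such a guard. The helper set_text is a hypothetical name, and it assumes that max_length may be None when the field declares no limit; whether pikepdf itself rejects overlong values is not assumed either way.

```python
from types import SimpleNamespace


def set_text(field, text):
    """Assign text to a text field after checking it against max_length."""
    # Assumption: a missing or None max_length means "no limit".
    limit = getattr(field, "max_length", None)
    if limit is not None and len(text) > limit:
        raise ValueError(
            f"text has {len(text)} characters, but the field allows at most {limit}"
        )
    field.value = text


# Stand-in for a real text field (hypothetical; mirrors the session above).
demo_field = SimpleNamespace(value="", max_length=75)
set_text(demo_field, "Hello World!")
```

With a real form, this would be called as set_text(form['MyTextField'], "Hello World!").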
Checkbox Fields
Checkbox fields behave much like checkboxes in HTML forms accessed from JavaScript.
The checked property tells you whether the box is checked. If access to the underlying
value is needed, it can be fetched via the value property.
Unlike HTML checkboxes, however, there is a value for both the on and off states, and
thus value will return different values depending on whether the box is checked. The
value for the off state is a pikepdf.Name with the value “/Off”. The value for the
on state is variable, and can be retrieved from the on_value property.
>>> checkbox = form['MyCheckbox']
>>> checkbox.checked
False
>>> checkbox.value
pikepdf.Name("/Off")
>>> checkbox.on_value
pikepdf.Name("/Yes")
>>> checkbox.states
(pikepdf.Name("/Yes"), pikepdf.Name("/Off"))
>>> checkbox.checked = True
>>> checkbox.value
pikepdf.Name("/Yes")
Choice Fields
Choice fields may be either list boxes or comboboxes, as determined by the is_combobox
property. If the field is a combobox, it may optionally have an editable text box attached
to it, as shown by the allows_edit property. Editable choice fields may store
arbitrary values, but otherwise choice fields are limited to those options which are
returned via the options property.
>>> field = form['MyChoiceField']
>>> field.is_combobox
True
>>> field.allows_edit
False
>>> field.options[0].display_name
"Pike"
>>> field.options[2].select()
>>> field.value
"Trout"
>>> field.value = "Pike"
Signature Fields
pikepdf does not support signature fields, but does include a utility function to stamp an image over the top of the field’s bounding box. The stamped image must be a PDF.
>>> form_pdf = Pdf.open(...)
>>> sig_pdf = Pdf.open(...)
>>> form = Form(form_pdf)
>>> form['MySigField'].stamp_overlay(sig_pdf.pages[0])
To stamp an image that is not already a PDF, you will need to use an image processing library, such as Pillow, to convert it:
>>> from io import BytesIO
>>> from PIL import Image
>>> img = Image.open(img).convert('RGB')
>>> img_as_pdf = BytesIO()
>>> img.save(img_as_pdf, 'pdf')
>>> img_as_pdf.seek(0)
>>> sig_pdf = Pdf.open(img_as_pdf)