Making pdf Documents
© Brooke Clarke 2006 - 2007
Manual or Auto Feed
Dots Per Inch
Black & White, Gray scale or Color
Scanning Blank Pages
Portrait or Landscape
Angle Correction Rotation
What's wrong with most Bookmarks
This came about because I wanted to make pdf versions of military
and test instrument Technical Manuals that range in size from
dozens to hundreds of pages. So here are some of the things I have
learned in the past few years. There are three main steps to
making a great manual: Scanning, Post Processing and Acrobat
processing. The following is based on the idea that a CD-ROM or
DVD will be used as the distribution medium, not an online
document. When a document is to be online the file size must
be minimized, both to reduce the storage requirement and to shorten
the download time, at the expense of quality.
The first, but for most also
essentially the last, step is to scan the document.
Manual or Auto Feed
There are two kinds of scanners, manual and auto feed. I use a
manual flat bed scanner and it has the advantage that when
individual pages are scanned they are aligned and not rotated. An
auto feeder saves on labor, but creates a number of problems.
Since there needs to be some clearance between the edges of the
paper and the feed slot, say it's 0.1 inches, the paper can rotate
by some small amount (0.1"/8.5" works out to about 0.7 degrees).
A rotation of that size is very noticeable, and typically the
pages have some rotation. The feed rollers sometimes grab two
sheets, either skipping a sheet, making some combination of two
pages, or distorting a single sheet by smearing the letters.
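The rotation estimate above is simple geometry, and a minimal Python sketch makes it easy to try other clearances. The function names are mine, and the 11 inch length and 300 DPI used for the drift figure are assumptions for illustration, not numbers from the scanner.

```python
import math

def skew_angle_degrees(clearance_in, page_width_in):
    """Worst-case rotation if the sheet shifts by the clearance across its width."""
    return math.degrees(math.atan2(clearance_in, page_width_in))

def drift_pixels(angle_deg, page_length_in, dpi):
    """Sideways drift, in pixels, of a straight edge over the page length."""
    return math.tan(math.radians(angle_deg)) * page_length_in * dpi

angle = skew_angle_degrees(0.1, 8.5)
print(round(angle, 2))                         # 0.67
print(round(drift_pixels(angle, 11.0, 300)))   # about 39 pixels over 11 inches
```

A drift of a few dozen pixels down the page is why even a fraction of a degree is easy to see.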
When copying double sided sheets the back side image bleeds through
and shows up in the scan of the front side. The main reason for
bleed through is that the scanner lid has a white lining. This is
almost as bad as a mirror and reflects light back through the
paper. In my opinion the lid should be painted flat black, or what
I do is tape a sheet of flat black paper to the lid. Now light is
only reflected by the front of the page, knocking the bleed through
down a lot.
My HP 6200C ScanJet has a white surface under the lid. It has
died and HP AFAICT does not make a replacement scanner. The Xerox
7600 flat bed scanner I'm now using has a flat black surface under
the lid.
The four images below show what bleed through looks like on a
histogram. The upper left "Exposure Adjustment" window shows the
classical bleed through peak on the right. In the image below it
you can see the word "INDEX" in the bleed through.
The upper right "Exposure Adjustment" window shows the highlight
cursor has been moved from 233 to 169. 169 was chosen because it's
at the left toe of the bleed through curve. The image below it
shows the result. The only change made was to the highlight cursor.
Both images are the raw .bmp files directly from the HP 6200 scanner.
The next step in eliminating bleed through is to set the white
threshold using the histogram. On the HP 6200
flat bed scanner, when doing gray scale or color scans, you
can adjust the black threshold, white threshold and the gamma
(and more stuff with the 3 color channels). The threshold controls
are directly below the histogram and move cursors on the histogram.
So by placing the white (right hand side) cursor just to the left
of the toe of the hump that's the bleed through you eliminate it
completely. Note this is a trade off since you are also wiping
out some of the highlight detail in the image.
Note that if the page is Black and White (no gray) then by scanning
in gray scale and setting the scanner controls you can completely
eliminate bleed through. But if there is a photograph or other
gray scale on the page, eliminating bleed through and the quality
of the image are a tradeoff and the histogram gets to be very
important.
When scanning a bound book insert a sheet of black paper behind
the page being scanned.
I tried the HP 8400 flat bed scanner and although they "show" a
histogram, there's no way it can be used as described above since
the controls were somewhere else and there were no cursors. I sent
it back and stuck with the 6200.
See Post Processing Bleed Through to
fix bleed through in an existing image file.
I expect that what most people do is scan in jpg or pdf format.
This is a mistake if you're going to do any post processing since
these are normally lossy formats and degrade each time there's a
new generation. A non lossy format like Bit Map (.bmp) is a better
choice to maintain high quality. Bit map also includes the physical
size of the image which is not the case with Tagged Image Format
(TIF).
When working with hundreds of pages there will be mistakes and you
will need to rescan a page or two. So it's a very good idea if the
file naming scheme somehow will allow you to match a file to the
actual page in the book. Most of the TMs that I scan use a
chapter-page system where the first page in chapter 4 is called 4-1.
Also the file name should be such that the computer file manager
will alphabetize them into the same order that they appear in the
book. Otherwise you will need to do a lot of manual work to get
the pages in order.
The answer for me has been a file name like nnn-mmm.bmp, where
nnn is the chapter number starting with 000 for stuff prior to
chapter 1, and after the last chapter keep using the next number,
so if Appendix A comes after chapter 9 then it's 010-mmm.bmp. mmm
is the page number. Note nnn and mmm are always say 3 digits so the
front cover is 001-001, not 1-1. This is needed to keep the
sort order correct.
A schematic might be 004-037L.bmp for the left side, with a
matching name for the right side. If there are more than two scans
you can use A, B, C etc. This way when making multiple scans of
what is really one page number you don't use up page numbers that
are needed for other actual pages. For a huge book you may need to
allow for a 4 digit page number.
When scanning you don't need to manually enter all of the file
name. When the save file button is pressed the default file name
is the last name stored and you can just place the cursor in front
of the digit that needs to be changed and type: delete, the new
digit.
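To show why the zero padding matters, here is a minimal Python sketch of the nnn-mmm.bmp scheme. The function name scan_name and the R suffix for a right-hand scan are my assumptions for illustration. The sketch builds padded names and compares their sort order with unpadded ones.

```python
def scan_name(chapter, page, part="", digits=3, ext=".bmp"):
    """Build a zero-padded chapter-page file name such as 004-037L.bmp."""
    return f"{chapter:0{digits}d}-{page:0{digits}d}{part}{ext}"

padded = [
    scan_name(10, 1),        # appendix after chapter 9
    scan_name(4, 37, "R"),   # hypothetical right half of a schematic
    scan_name(4, 37, "L"),   # left half of the same schematic
    scan_name(4, 1),
    scan_name(0, 2),         # front matter
]
print(sorted(padded))
# ['000-002.bmp', '004-001.bmp', '004-037L.bmp', '004-037R.bmp', '010-001.bmp']

# Without padding, chapter 10 sorts ahead of chapter 4.
print(sorted(["10-1.bmp", "4-37.bmp", "4-1.bmp", "9-2.bmp"]))
# ['10-1.bmp', '4-1.bmp', '4-37.bmp', '9-2.bmp']
```

For a huge book, passing digits=4 gives the wider page field.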
Dots Per Inch (DPI)
This has a lot to do with the source material. If the source is
line art or text made prior to laser printers then 300 DPI is
very good. But if there are schematic diagrams with very small
print size (like a C size drawing that has been photo reduced)
then 600 DPI is needed. Photos are discussed separately.
Black & White, Gray scale or Color
If the source document has color then the scan should be in
color. When scanning very old books where the pages have yellowed,
sometimes using color will make the post processing easier. For
everything else I use gray scale. Black and White has an advantage
when you are trying for the smallest file size, but for me it's too
much of a quality reduction. Note that even though you are
scanning a black and white document you need gray scale so that
when half a pixel sees black and the other half sees white it can
make gray. If B&W was used in this situation there would be a half
pixel error either into the black or into the white.
For most documents you can set the frame size to just a little
smaller than the page image, then there will not be black borders.
When working with schematics it's good to expand the frame size to
get as much of the schematic as possible so that when stitching you
have more choice of where to place the seam. But remember to set
it back for text pages.
Scanning Blank Pages
Books are laid out so that new chapters always start on a right
hand (odd numbered) page. This is a good thing to do for a printed
book since it allows thumbing for new chapters, but has no value in
an electronic only book. If making a pdf where it's planned to
print all of it then scanning blank pages will maintain the odd on
the right concept.
This is where a number of things get
fixed and the advantages of an electronic manual start to show
up. This is done using a photo editing package like
Photoshop. These packages process image files and although they
can do some text that's not their main use.
Recently I received images of pages scanned by someone else that
had noticeable bleed through on many pages. But in Photoshop you
can use Image\Adjust\Levels and on the histogram move the right
cursor to the left so that it's on the toe of the bleed through
curve. Also moving the left cursor to the right makes the blacks
blacker and makes the whole page look better. This is much better
than trying to use the "magic wand" to get rid of the bleed
through. This works so well because the bleed through is in the
form of light gray, not the black images that are desired.
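What the two Levels cursors do to each pixel can be sketched in a few lines of Python. This is a simplified linear model under my own assumptions (8 bit gray values, no gamma), not Photoshop's actual implementation. Everything at or above the white cursor becomes pure white, which is why light gray bleed through disappears, and everything at or below the black cursor becomes pure black.

```python
def apply_levels(pixels, black=0, white=255):
    """Stretch 8 bit gray values so `black` maps to 0 and `white` to 255."""
    if white <= black:
        raise ValueError("white point must be above black point")
    scale = 255.0 / (white - black)
    out = []
    for value in pixels:
        stretched = round((value - black) * scale)
        out.append(max(0, min(255, stretched)))  # clip to the 0..255 range
    return out

# Hypothetical row of pixels: dark text, then light gray bleed through
# (values near 200), then paper white.
row = [12, 40, 169, 200, 233, 250]
print(apply_levels(row, black=30, white=169))
# [0, 18, 255, 255, 255, 255]
```

The white cursor value of 169 is the one used in the scanner example earlier; the black cursor value of 30 is only an illustration.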
June 2017 update: After scanning a page on the Xerox 7600 in
black & white mode directly into Photoshop the Image\Mode is
color. By changing the mode to grayscale then using
Image\Adjustments\Levels to move the right hand slider to eliminate
bleed through the results are better than with the mode at
Color. But before doing this the page is rotated if needed
and cropped. Some erasing may be needed at the gutter.
Portrait or Landscape
In a hard bound book all the pages must be in the same
orientation, but that's not the case for a pdf document. So if
there's a figure or table that's better viewed in landscape mode
the page should be rotated into landscape format. Note that if the
document is printed Adobe will automatically rotate it.
Angle Correction Rotation
If an auto feeder was used in any of the prior generations then
there will be pages with rotations typically less than 2 degrees
that need to be rotated to within about 0.3 degrees of true. 0.7
degrees is very noticeable and anything over 1 degree is really
noticeable. If it's a schematic that's going to be stitched the
pages need to be the same rotation. This means that you can
stitch a couple of pages where they are both 0.6 degrees, but not
if one is 0.0 and the other is 0.6.
The idea is to remove
things that are not wanted. Binder and staple
holes are an example. Older copy processes have the tendency to
leave small black specks much like finely ground pepper. Old
books where the pages have yellowed have a grainy background. If
you have set the frame size too big or the page got rotated there
will be black borders that need to be erased. Some copy machines
leave streaks, like there was a scratch on the drum. The fold
lines in a schematic are another thing that can be erased. A
properly made scan of a clean page may not need any cleaning. An
antique book may need an hour of cleaning for each page.
Sometimes rather than erasing to white you need to use a copy and
paste method, like for eliminating the binder holes in a color
cover. The image at the left comes from a 1928 catalog. Many hours
of cleaning were required to get
it looking like this. The photo is a reduced resolution version; at
full size it's even more impressive. Note that like all the
illustrations in the catalog this is hand drawn using K&E
supplies, not a photo.
Fold out pages need to be stitched together. This makes it
easy to look at a schematic on the computer screen. When a
4 page fold out schematic is broken up into separate pages it's
almost impossible to work with it on a computer. What's most
commonly done is to get out the tape and scissors and make a hard
copy.
Most schematics that were drawn prior to laser printers were done
either by hand or a plotting machine, but in either case there was
a pen or pencil used that could not draw anything finer than about
0.3 mm. This is a much wider line than a laser printer can
draw. So if you have a schematic that's on a B (2 x letter size)
or C (4 x letter size) sheet you can stitch it together. Do not
shrink the page size, leave it as is since the Adobe print
default will shrink the page to fit the printer. This way the
user has the Adobe option to use tiling (cut and tape) to get a
full size print. There was a time when photo reductions were used,
typically to move a drawing to the next smaller sheet size. For
these you need to go up to 600 DPI when scanning to maintain the
detail.
When stitching you can place the stitch anywhere in the area where
the images overlap. Rather than just take all of the second image
it's good to look for a place where there's a minimal amount of
lines that cross the stitch. It's common that there's a small scale
and rotation difference between the two images that are being
joined, so even if you pick a good stitch line you may still need
to choose where on the line the best match will occur, and as you
go farther away the match gets poorer.
Line art, like schematics, can be
stored as either an image or vector file. Photoshop, Paint, etc.
are image processing programs. Autocad and the old HP ME
(Mechanical Engineering) are vector processing programs. A vector
based pdf "D" size drawing fits into a few hundred kilo bytes, but
if the same file with the same resolution is converted to an image
it will be a few hundred Mega bytes, i.e. about 1,000 times
larger. This was made real to me when making the web page for the
HP E1938 OCXO
where the complete
package was very small as a vector pdf but huge when done as an
image. I haven't found a free image to vector converter. If
you know of one please let me know.
Often in schematics a box is drawn
around part of the circuit to define some function. Since these
lines look very similar to the trace lines they add confusion.
But they can be manually erased and replaced by either a colored
or a gray line, greatly adding to the understanding of the
schematic.
The trace lines can also be made much more understandable by doing
things like making all the ground lines wider and solid black.
The Vcc lines can be made red and the signal lines some other
color. This goes even further in making the schematic easier to
understand at a glance.
Some colors, like yellow, look good on the computer screen, but do
not show up when a page is printed. So when choosing colors be sure
they have at least 15% of red, blue and green components, so
there's some gray to print. Better is to make a trial print to test
them.
In Acrobat 7 there is a "make pdf from
multiple files" option and a browse function. So if you have
files named as described above it's just a few clicks and you will
have a single document that combines all the pages. I put this in
the intro to the Acrobat processing sections because it's just the
beginning, not the end of what's needed.
An electronic document is different from a physical document and
how you find what you want is different. You can not "thumb" an
electronic document like you can a book. But a book does not have
the instant access that you get with an electronic document. When
pages are numbered using the chapter-page method there's no way to
correlate that with the pdf document page number. I
think good bookmarks are by far the best way of navigating a pdf
document. A pdf document without bookmarks is next to useless
for use at a computer. All you can do is print it and use the hard
copy, what a waste!
It's an art to name the bookmarks to keep them both short and
meaningful. Nesting folders is part of keeping the length of the
bookmark names short and also logically dividing the document.
For documents of about 25 pages and up bookmarks make a world of
difference in how easy it is to find something.
When the bookmarks are set up like the table of contents and List
of Illustrations and List of Tables (TOC, LOI and LOT) you have all
of these handy no matter what page you are on. Just click on the
Bookmarks tab, open a folder or two and you're on a new page.
I overlooked the use of bookmarks for
some time. Note that all the free TMs on LOGSA have bookmarks, but
also note that they are useless. This means that when you buy a
CD-ROM with a bunch of TMs it's also probably the case that the
bookmarks are useless. I think that someone that knew little about
bookmarks wrote the mil spec for how a TM is to be made and the
spec has a paragraph saying that there will be a bookmark for each
paragraph, figure, and table and sure enough that's what they
have. The problem is that the bookmark names are not meaningful.
For example "Chapter 3" is the name of the bookmark for "Ch 3
Operation". The bookmark name for a paragraph may be "Chapter 4-
Section 3 -Paragraph 4.1.4". This has two problems: one, it does
not tell you what's in this paragraph, and two, it's too long.
Bookmarks are in a collapsible frame to the left of the main
frame. You can click on the "Bookmarks" tab on the left to open
them and you can click on the button at the center of the divider
bar (the little bumps in bottom right of the illustration)
to close them. The divider bar can also be grabbed and
moved. So you can see that good bookmarks both tell you what you
will get when you click on them and also are as short as possible.
In the illustration they use "CHAPTER 1" instead of "Ch 1". All
capitals is like someone is SHOUTING, not pleasant. Also it takes
up 8 spaces when 4 work better.
Another problem with the LOGSA TMs is that the bookmarks depend on
a logical order for the paragraph numbers. If there's a paragraph
number typo caused
by the OCR then all the bookmarks for the rest of the
chapter are missing.
When someone makes a pdf document without bookmarks and then locks
out any changes, which includes the ability for the user to add
bookmarks, then they have really made a useless document.
Some vendors use pdf documents for their data sheets, which in some
cases are really books of 50 or more pages. I have helped some
change from using the worthless type of bookmark to using better
ones.
A good bookmark gives you a good idea
of what you will get if you click on it. It's also as short as
possible. Since bookmarks can be nested a good way to solve
the "Chapter 4- Section 3 -Paragraph 4.1.4" length problem is to
have a folder for "Ch 4 Maint" and a sub folder for "DS Maint" and
then a bookmark for P4 "P4-Cal Adj" and a sub bookmark for
"P4.1 VFO" and a sub sub bookmark for "P4.1.4 VFO Max Freq Adj".
This way the folder a bookmark is in tells you its context. There
can then be several bookmarks called "Scope" but each is in a
different section.
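The nesting idea can be sketched as a small tree in Python. The titles are the ones from the example above; the tuple layout and the function names are my own assumptions for illustration and are not an Acrobat data format. The sketch prints the outline and compares the longest nested title with the flat name.

```python
# Each bookmark is (title, [child bookmarks]).
bookmarks = (
    "Ch 4 Maint", [
        ("DS Maint", [
            ("P4-Cal Adj", [
                ("P4.1 VFO", [
                    ("P4.1.4 VFO Max Freq Adj", []),
                ]),
            ]),
        ]),
    ],
)

def outline(node, depth=0):
    """Return the tree as indented lines, two spaces per nesting level."""
    title, children = node
    lines = ["  " * depth + title]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines

def longest_title(node):
    """Length of the longest single bookmark title in the tree."""
    title, children = node
    return max([len(title)] + [longest_title(child) for child in children])

print("\n".join(outline(bookmarks)))
flat_name = "Chapter 4- Section 3 -Paragraph 4.1.4"
print(longest_title(bookmarks), "characters nested vs", len(flat_name), "flat")
```

Each nested title is shorter than the flat name and, unlike the flat name, says what the paragraph is about.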
This is from TM11-5820-667-35 for the PRC-77. I think you can see
that good bookmarks allow you to find what you want very quickly.
Notice in the illustration "Sec II Schematics & Block dia". By
making a bookmark for each section all the indented bookmarks in
that section no longer need to carry any of the section title,
making them shorter. Another example is where there are a number of
bookmarks that all relate to the same item in a longer list of
different items. Adding a new bookmark-folder allows
removing the common name from all the sub bookmarks, i.e.
bookmarks are context sensitive and each one does not need to carry
all the higher level names.
Under the Figures in the above illustration I should have added a
short prefix for schematics or blk for block diagrams.
Links work like web page links.
They can be placed on about anything in a pdf document. First of
all, LOGSA manuals come with links on the Table of Contents, List
of Illustrations and List of Tables entries, so you can use a
conventional book navigation approach. They also typically have
links in the body of the document whenever there's a reference to
some other part of the document. For example references to figures
are linked, as are references to other paragraphs. Some manuals
have every Index entry linked to the referenced pages. If the
bookmarks are good as described above there's less need for links,
but they are still very handy, i.e. one click and you're at the
referenced page. And using the BIG back arrow (not the previous
page arrow) you can go back to the page prior to the link, making
it easy to have a look at where the link is pointing and then
return.
Optical Character Recognition
There are different types of pdf documents. Most of, but not all,
pdf documents have each letter of the text as a letter. This
allows searching the text but also allows correcting it. When
an antique book is scanned you can leave the image of the book to
appear in the pdf document and hide the OCR text behind the
image. This allows searching but you can not change the
appearance. Without bookmarks OCR allows finding things, but with
a good set of bookmarks it's not as important. With Acrobat
7 you can just click a button and add OCR for the whole document
(although it takes some time and memory).
I used Omni Page Pro 11 for some time. You have quite a bit of
control over what it does and the file format for the output. The
three main windows are a list of the source pages, the active page
being worked where it brings up questionable conversions and asks
for your input, and the output window running the target
app, like Word. One problem is that it may make a mistake and
not ask you what to do. Another is assigning different fonts to
similar text or making some text bold and some not. Omni Page has
ZERO online support and the quality of the phone support leaves
something to be desired.
Acrobat 7 has built in OCR capability but I have not figured out
how to really use it. So far I just click and let it run. I have
not been able to exercise much control of what it will do or
correct what it has done. If you know about Acrobat 7 OCR let
me know.
I don't use page number links since
good bookmarks work so well. If a document has poor or non
existent bookmarks, links or OCR then adding bookmarks where the
name and target is a page number would allow translating a body
text, TOC or Index reference to a page number into a way to get to
that page number. Note typically there is NO way in an electronic
document to get to any given page number since the pdf file page
number almost never correlates with the number printed at the
bottom of the page.
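The mismatch between printed chapter-page numbers and pdf page numbers can be illustrated with a small lookup table. This is a hypothetical Python sketch: the chapter sizes and the front matter count are invented for the example, and it assumes every printed page, including blanks, was scanned in order with no fold outs split across files.

```python
def build_page_index(pages_per_chapter, front_matter_pages=0):
    """Map printed labels like '4-1' to 1-based pdf page numbers."""
    index = {}
    pdf_page = front_matter_pages
    for chapter, count in pages_per_chapter:
        for page in range(1, count + 1):
            pdf_page += 1
            index[f"{chapter}-{page}"] = pdf_page
    return index

# Hypothetical manual: 6 front matter pages, then four chapters.
index = build_page_index([(1, 12), (2, 30), (3, 8), (4, 20)], front_matter_pages=6)
print(index["1-1"])  # 7
print(index["4-1"])  # 57
```

A table like this is what a set of page number bookmarks would encode by hand.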
There are some new features in Acrobat 9 Pro
that are really nice. You can rotate a page or group of pages and
also Crop a page. Often when someone else has scanned a document
they do not rotate the pages so they can be read at a computer.
Note: the reader when printing has the capability of rotating them
to match the paper i.e. landscape or portrait both print
correctly. This often has the effect of adding a lot of white
space around the image so cropping inside Acrobat is very handy.
Autocad drawings can be made into pdf formats with different
versions of pdf. The fancier version keeps the layers separated and
allows the reader to turn them on and off individually.
A bound book needs a "gutter" on the
edge where the pages are attached to each other to form the back or
spine of the book. There are also borders at the top, bottom and
outside edges. If the page is scanned and all these white spaces
are included, when the page is displayed on screen the print and
images will be smaller than they would be if the white space were
cropped. All printers have a minimum margin for each edge, so when
the page is printed the printer's margins will be added to those of
the page, making the printed text and images smaller than they were
in the original. So cropping improves both the on screen and
printed versions of the document.
An area where an electronic document is
very different from a printed one is the case of photos. A
pdf document allows the user to change the size of the displayed
image. I find this very useful since my reading vision is not as
good as it once was, but it's fantastic to be able to zoom far in
on a high resolution color photo.
Taking a high resolution color photo takes
some skill and is the subject of many books and college level
classes. When using a digital camera if at all possible use the
raw file format (the one that makes the largest file size). Note
that a color scanner can make a 30 Mega byte file at only 300 DPI
and HUGE files at higher DPI values like 600 or 1200. These allow
macroscopic views or even microscopic views when enlarged. You
can see way more in one of these images than you otherwise could.
I frequently see things in my photos that I did not see with my
own eyes.
Making a high resolution color photo into a pdf does not result in
a file size reduction and may even make it larger, but I feel it's
the right thing to do.
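The file sizes mentioned above follow from simple arithmetic, sketched here in Python. The function name is mine, and the sketch assumes an uncompressed letter size scan at 3 bytes per pixel (24 bit color) and ignores the small file header.

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed image size: pixels wide x pixels high x bytes per pixel."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel

for dpi in (300, 600, 1200):
    size = raw_scan_bytes(8.5, 11, dpi)
    print(dpi, "DPI:", round(size / 1e6, 1), "million bytes")
# 300 DPI: 25.2 million bytes
# 600 DPI: 101.0 million bytes
# 1200 DPI: 403.9 million bytes
```

The 300 DPI result is in the same range as the 30 Mega byte figure above (a slightly larger scan area accounts for the difference), and each doubling of the DPI multiplies the size by four.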
page created 22 May