Saturday, May 21, 2022

Fixing EXIF Errors in JPEG files like "Error: Bad format (0) for IFD1 entry 0"

I've been working on my great-grandmother's photos and finding jewels from 1910 like this one:

Unfortunately, many of the photos from 110 years ago do not have any names written on the back.

As I think about how to preserve my family's story for the next 110 or 300 years, I'd like to write the description on the back of my digital pictures by injecting EXIF image descriptions into the photos themselves.

To do this, I downloaded the free command-line tool "exiftool" for my Mac.

Then, I added photo captions into jpeg files with this command:

exiftool -imageDescription="My image description" mypic.jpg

but got errors like this:

"Error: Bad format (0) for IFD1 entry 0"

I purchased the tool "metaImage" from the Mac store to add descriptions into the jpeg, but it wouldn't inject the caption into my files. I contacted their tech support and got a quick solution.

Jérémy Vizzini from neededapps.com (creator of "metaImage") solved the problem for me. The issue was that my jpeg files were corrupt. (Some of the photos were from 2003, so the EXIF standards may have changed, or may not have been rigorously followed back then.) Jérémy gave me this snippet of exiftool code to fix the jpeg:

  exiftool -all= -tagsfromfile @ -all:all -unsafe -icc_profile mypic.jpg

This fixed the problem. Thanks Jérémy! Now my jpegs will be ready for the next 300 years.
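
In case it helps anyone doing a similar cleanup, here is a rough sketch of how the same two steps could be run over a whole folder of scans and then verified (the folder name and caption below are made up, so adjust to taste). exiftool accepts a directory argument, recurses into subfolders with -r, and leaves a "_original" backup copy of each file it rewrites.

  # rebuild the metadata in every image under ./scans to clear out the corrupt entries
  exiftool -r -all= -tagsfromfile @ -all:all -unsafe -icc_profile ./scans

  # write the caption, then read it back to confirm it stuck
  exiftool -r -imageDescription="Great-grandmother, 1910" ./scans
  exiftool -r -imageDescription ./scans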

How do you archive your photos to preserve them for 300 years?

Wednesday, May 04, 2022

How to Find Duplicate Images on a Mac or Linux Machine

This is how to recursively search all directories on a Mac or Linux machine for duplicate images with a single line of awkward bash script. This method will find duplicates anywhere on your disk below your current directory (try "~") and will list every copy, even when a file has been duplicated more than once.

Many commercial products exist to easily find and delete duplicate images, like those reviewed here, but if you are like me, don't like to download apps willy-nilly for a single task, and have a bit of shell-scripting experience, you can use the following line of tortured bash script to find duplicate files.

find . -type f \( -name "*.jpg" -o -name "*.gif" \) | awk '{print "\"" $0 "\""}' | xargs shasum -a 256 | sort > checksumAndFilename.tmp && cat checksumAndFilename.tmp | awk '{print $1}' | uniq -D | uniq > checksum.tmp && grep -f checksum.tmp checksumAndFilename.tmp | tee duplicates.tmp && echo "output in \"duplicates.tmp\"" && rm checksumAndFilename.tmp checksum.tmp

The basic idea is to search the current directory and all subdirectories for image files, calculate a hash for each file, sort the hashes, and then list adjacent identical hashes, which are the duplicates. (If this is too much for your brain, just go to imymac and buy an app.)

Ok, let's go through the command in detail.

  1. Get all the image files in your directory and below. Update "*.jpg" to "*.png" or whatever you need.

    find . -type f \( -name "*.jpg" -o -name "*.gif" \)

  2. Surround each file name with double quotes, since some people still insist on the horrible, dastardly, awful practice of including spaces in names. (The quotes keep xargs from splitting those names into pieces.)

    awk '{print "\"" $0 "\""}'

  3. Pipe the names of the files into shasum to generate a hash

    xargs shasum -a 256

  4. Sort by the hash value so duplicates will be adjacent and write to a temp file

    sort > checksumAndFilename.tmp

    checksumAndFilename.tmp looks like this. The files with the same hash value would be duplicates.

    ff45b77226369d27b67772e72dfe8dc3387eff06  ./2010-07-04-2224-July4_036.jpg
    ff65e3611973092e61127439af6b3c82d0ee055a  ./2010-12-29-1408-IMG_9638.jpg
    ff680170b0451868a1bda027c801b78f55067366  ./2010-12-24-1010-IMG_9235.jpg
    ff918f6f8230deb3cd2208602dadb5c6f88039dc  ./2010-03-14-2025-IPhone_8146.jpg
    

    We are almost done, but how do we see only the hash values that are duplicated?

  5. Get only the hash values that are duplicates

    cat checksumAndFilename.tmp | awk '{print $1}' | uniq -D | uniq > checksum.tmp

  6. checksum.tmp looks like this. These are only the hash values that are duplicated.

    0526e5586cc1e4d2d97e5cc813c8d9b698bc3df2
    075a137c8857c8b38555cf632d906ed0581b9224
    
  7. We have only the duplicated hash values. Let's match the hashes back with their filenames.

    grep -f checksum.tmp checksumAndFilename.tmp

  8. We can see the first two are duplicates, and the next two are as well.

      
    0526e5586cc1e4d2d97e5cc813c8d9b698bc3df2  ./2010-11-28-0926-IMG_0300.jpg
    0526e5586cc1e4d2d97e5cc813c8d9b698bc3df2  ./IMG_0300.jpg
    075a137c8857c8b38555cf632d906ed0581b9224  ./2010-06-08-photoshoot012.jpg
    075a137c8857c8b38555cf632d906ed0581b9224  ./2010-06-08-photoshoot_012.jpg
    
  9. Write to the output file and the screen

    tee duplicates.tmp

  10. Let's remind ourselves where the output lives

    echo "output in \"duplicates.tmp\""

  11. Clean up our mess

    rm checksumAndFilename.tmp checksum.tmp

My gut tells me there are some ways to clean this script up. Please add a comment if you can improve the script.
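
For example, here is one possible tightening, sketched but not battle-tested: have find emit NUL-delimited names so the quoting trick in step 2 isn't needed, and let a single awk pass replace the temp files and the grep from steps 4 through 8.

    find . -type f \( -name "*.jpg" -o -name "*.gif" \) -print0 \
      | xargs -0 shasum -a 256 \
      | sort \
      | awk '{ n[$1]++
               if (n[$1] == 1) first[$1] = $0   # remember the first file seen with this hash
               if (n[$1] == 2) print first[$1]  # a second copy appeared, so emit the first one too
               if (n[$1] >= 2) print }' \
      | tee duplicates.tmp
    echo "output in \"duplicates.tmp\""

The result is the same idea as before, duplicates on the screen and in duplicates.tmp, but with no intermediate files to clean up.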