Words by Coleman: September 2018

Saturday, September 29, 2018

Book Review: Blood Solutions by BJ Smith

Detective stories aren't usually on my bookshelf, but when I found out my coworker had written a novel, I wanted to read it.

From the first page, I was pleasantly surprised by how good the writing was. I spend a lot of time reading for critique groups and beta-trades, and this was well plotted and polished. The story itself kept me engaged and thinking through a quick, enjoyable read. The detective was an interesting character, but I felt closer to the psychologist by the end. My favorite part was the mystery: there was a perfect mix of clues that kept me guessing until the end.

His second book just came out, so let me know what you think!

Wednesday, September 5, 2018

Grep within Microsoft Word Files

How many of you are actually UNIX users who write fiction in Microsoft Word for the convenience of sharing comments with others? Wait, it's just me? Regardless of the limited audience, I need to save this cool trick for my own reference.

The issue: I want to bring back a scene that I wrote and then discarded. I'm sure it lives in some previous version, but opening every previous version and searching by hand is tedious. (Yes, this is partly because I save so many previous versions despite my inability to open and search them. Specifically, there are 229 docx files of my current work-in-progress.)

The real thanks goes to the last answer to this question on Stack Overflow: How do I grep in microsoft word files? I can't upvote it, but the theory and specifics are invaluable.

docx files are just zipped xml files.

So all you have to do is unzip them,

use sed to strip off the xml tags

and you have a grep-able text file.
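
For anyone who would rather not wrestle with sed, here is a minimal Python sketch of the same idea, using only the standard library. The function name docx_to_text is made up for illustration, and the sketch assumes the body of the document lives in word/document.xml and is UTF-8 encoded.

```python
import re
import zipfile


def docx_to_text(path):
    """Return the text of a .docx file with the XML tags stripped out."""
    # A .docx file is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Same idea as stripping tags with sed: delete anything shaped like <...>.
    return re.sub(r"<[^>]+>", "", xml)
```

Like the sed approach, this leaves XML entities such as &amp;amp; encoded and runs every paragraph together, so treat it as a starting point rather than a faithful converter.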

(Grep is an invaluable Unix command, and one of the main reasons I use a Mac: I can open a terminal and use the tools I know. Grep returns every line of a text file that contains a given string.)

I wrapped that answer's command (the unzip and sed line below) in a loop in order to convert all my docx files. I'm not going to explain all the Unix commands; feel free to comment if you want to know more.

ls *.docx >! list.all
grep -v -e "~" list.all >! list.docx
mkdir text_files
set list = `cat list.docx`
foreach file ( $list )
    unzip -p "$file" word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g' >! text_files/$file:r.txt
    ls text_files/$file:r.txt
end

Now I should be able to search for my text string (orach, in this case) through each file.
cd text_files
grep orach *

However, the sed command above turns the entire document into one long line, and since grep returns the whole line containing a match, that is far too much output. I spent the rest of my morning writing time trying to get sed to insert a newline at each replacement, but it led me down a rabbit hole. It all comes down to escaping the newline, and the rules differ across flavors of Unix/Linux and across shells. (For the record, OS X on a Mac is BSD Unix and my shell is tcsh.) I can get the \n into the file, but it shows up as those two literal characters rather than an actual newline, with a substitution like 's#<[^>]\{1,\}>#\\\n#g'.
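
One way around the newline fight is to do the tag stripping somewhere that treats "\n" as a real newline. The Python sketch below is one possibility, not a tested replacement for the sed line: it assumes Word closes each paragraph with a </w:p> tag (the usual w: prefix), and the function name docx_to_lines is invented for the example.

```python
import re
import zipfile


def docx_to_lines(path):
    """Return the document text as a list with one entry per Word paragraph."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Assumption: paragraphs end with </w:p>. Turn each one into a real newline.
    xml = xml.replace("</w:p>", "\n")
    # Then strip every remaining tag, as the sed expression does.
    text = re.sub(r"<[^>]+>", "", xml)
    # Drop the empty lines left behind by blank paragraphs.
    return [line for line in text.split("\n") if line.strip()]
```

Writing "\n".join(docx_to_lines(path)) to a text file should give grep one paragraph per line, so a match returns a paragraph instead of the whole manuscript.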

So meanwhile, all I need to know are the filenames that match; then I can open them in 'less' and pull the scene out.

foreach file ( * )
    set test = `grep orach $file`
    if ( $#test > 0 ) echo "$file matches"
    unset test
end
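
Grep can also report just the filenames on its own: grep -l orach * prints the name of each matching file and nothing else. The same check can be sketched in Python, shown here as an illustration only; the function name files_containing is hypothetical, and the sketch assumes the converted files are plain text sitting in one directory.

```python
from pathlib import Path


def files_containing(directory, needle):
    """Return the sorted names of files in `directory` whose text contains `needle`."""
    matches = []
    for path in sorted(Path(directory).iterdir()):
        # Skip subdirectories; undecodable bytes are replaced rather than fatal.
        if path.is_file() and needle in path.read_text(encoding="utf-8", errors="replace"):
            matches.append(path.name)
    return matches


if __name__ == "__main__" and Path("text_files").is_dir():
    for name in files_containing("text_files", "orach"):
        print(name, "matches")
```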
