Monday, July 25, 2016

How To Import Text Files Into SQL Server Tables

Occasionally you need to import data from a fixed-format file.  "BULK INSERT" provides an easy way to do that.

Suppose you have a three-line column-formatted file of pet information as shown below
(the first two rows are a character-position ruler and do not appear in the actual file):


0        1         2         3         4         5
12345678901234567890123456789012345678901234567890
CC  1Dog     Rover                United States   
CC 82Turtle  Yurtle               Germany         
CC  5Llama   Rama                 Peru          

And you want to put them in this table:


CREATE TABLE MYPETS
(
  [Age]    INT         NOT NULL,
  [Animal] CHAR(10)    NOT NULL,
  [Name]   VARCHAR(30) NOT NULL
)
GO


You first need to create an XML description of the data.

In the "RECORD" section you specify which character positions hold which data fields.  The "ROW" section describes how to stuff those fields into the table.

A few notes:

You don't have to import all the fields.  You can skip a field by omitting it from the "ROW" section (in the example below, fields 1 and 5 are defined in the RECORD but never mapped).

You do need to describe the entire row.  Getting the total RECORD length right is a little tricky because of the "end-of-line" characters: a Windows file ends each line with CR+LF (two characters), while a Unix file ends with LF alone (one).
If you get errors like this:



Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 (Name).


You may need to experiment with the length of the final column.
When parsing integers I found it's easiest to convert all the "INT" columns to "CHAR" and look at the output.  Often it's off by one or two characters.  Then convert back to INT (see the debug sketch just after the format file below).

Here's our XML format file:


<BCPFORMAT  
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"  
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
  <RECORD> 
    <FIELD ID="1" xsi:type="CharFixed" LENGTH="2"/>  
    <FIELD ID="2" xsi:type="CharFixed" LENGTH="3"/>  
    <FIELD ID="3" xsi:type="CharFixed" LENGTH="8"/>  
    <FIELD ID="4" xsi:type="CharFixed" LENGTH="20"/>  
    <FIELD ID="5" xsi:type="CharFixed" LENGTH="20"/>  
  </RECORD> 
  <ROW> 
    <COLUMN SOURCE="2" NAME="AGE" xsi:type="SQLINT"/> 
    <COLUMN SOURCE="3" NAME="ANIMAL" xsi:type="SQLCHAR"/> 
    <COLUMN SOURCE="4" NAME="NAME" xsi:type="SQLCHAR"/> 
  </ROW> 
</BCPFORMAT> 
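
If the AGE column is the one misbehaving, the debugging trick mentioned above looks like this: read the field as text instead of as an integer, import into a scratch table whose Age column is CHAR, and eyeball where the characters land.  This is just a sketch; switch the type back to SQLINT once the offsets are right.

    <COLUMN SOURCE="2" NAME="AGE" xsi:type="SQLCHAR"/>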



Finally, to put it all together, use the "BULK INSERT" command to read the data file with your XML format file:



BULK INSERT MYPETS FROM 'c:\Users\BDCOOPER\Documents\mypets.txt'
WITH
(
FIRSTROW = 1,  -- first row to import; increase this if the file has header rows
--LASTROW = 12,  -- optional; useful for debugging
ROWTERMINATOR = '\n',
--ERRORFILE = 'c:\Users\BDCOOPER\Documents\MyPets.error',
FORMATFILE = 'c:\Users\BDCOOPER\Documents\MyPets.xml'
)
GO
select top 10 * from MyPets

The "LASTROW" option is useful for debugging.  If your process dies in the middle of your file, set LASTROW to 2 and see what is being imported into your database.  Usually it's just off by a column or two.

ERRORFILE can be useful for viewing errors, but it's also a pain because Enterprise Manager can retain a lock on the file.

If all goes well you will get your text data imported into SQL Server:



Age   Animal    Name
  1   Dog       Rover
 82   Turtle    Yurtle
  5   Llama     Rama



You can also use BULK INSERT to read comma-separated files by defining a field terminator.
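
Here's a minimal sketch (the .csv file and its path are hypothetical; note that without a format file the fields must appear in the same order as the table's columns):

BULK INSERT MYPETS FROM 'c:\Users\BDCOOPER\Documents\mypets.csv'
WITH
(
FIELDTERMINATOR = ',',  -- fields are separated by commas
ROWTERMINATOR = '\n'    -- one record per line
)
GO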

Enjoy.

Saturday, July 23, 2016

Austin .Net Users Group - Continuous Integration/Continuous Deployment with Docker by Gabriel Schenker


April 11, 2016 - How Docker simplifies continuous integration and continuous deployment

Forty-five people attended the .Net Users Group to hear Gabriel Schenker talk about his experience with Docker containers.  Here are my random notes.



Continuous Delivery goes through a manual QA process
Continuous Deployment skips QA and goes straight to live
Blue/Green deployment is non-destructive, in parallel
Blue is current site, Green is new site 
visit https://lostechies.com/gabrielschenker/2016/04/07/blue-green-deployment-in-docker-cloud/ for details.

Containers are here to stay.

DockerToolbox
$ docker build -t adnug3 .   # builds an image from the Dockerfile in the current directory
(docker images are layered; images can be made from pre-existing images with additional parts)
$ docker images   # lists the local docker images
$ docker run -d --name adnug -p 5000:5000 adnug3   # runs the image detached, mapping port 5000
$ docker ps   # lists the running docker containers
$ docker logs adnug   # shows the container's console output
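
For context, "docker build -t adnug3 ." consumes a Dockerfile in the current directory.  The talk didn't show its Dockerfile, so here's only a hypothetical sketch of what a 2016-era ASP.NET Core one might look like (the base image, port, and commands are my assumptions):

FROM microsoft/dotnet:latest        # assumed base image with the .NET Core SDK
WORKDIR /app
COPY . .                            # copy the project into the image
RUN dotnet restore                  # pull NuGet dependencies into the image
EXPOSE 5000                         # the port mapped by "docker run -p 5000:5000"
ENV ASPNETCORE_URLS http://+:5000   # make Kestrel listen on all interfaces
ENTRYPOINT ["dotnet", "run"]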

Why not run TeamCity in a container?
Publishing Container Images
DockerCloud has images
You can publish images to the Docker Hub registry 
Usually today we use Octopus Deploy or rake or PowerShell scripts

cloud.docker.com can deploy to AWS or Azure

Load balancers can be run from a docker image and front multiple docker nodes,
useful for blue/green testing

Thursday, July 14, 2016

Using NuGet Package Restore Instead Of Checking Into Source Control Is Evil


NuGet is great and wonderful and makes acquiring packages for a .Net project easy, and it encourages consistency in directory structure, which is always a happy thing.

But this trend towards using NuGet Package Restore instead of checking into source control is evil.

Dynamically downloading NuGet files has these supposed advantages:
  1. Reduces the amount of disk storage for version-controlled files
  2. Reduces the number of binaries you need to check in.  We all know that checking binaries into a version control system is bad because Git doesn't play well with binaries.
Well, neither of those is compelling to me.  Disk space is cheap.  Really.  Don't worry about it.  You can store binaries in Git and it's OK.  Really.  Philosophically it's bad to store binaries, but in the real world no kittens die.

Seven Reasons not to do dynamic downloading of NuGet packages:

1. The most compelling reason for not pulling your libraries directly from NuGet.org for each build is security.  NuGet.org is a huge security risk.  A bad guy could quietly inject malware into NuGet packages and the resulting malware would be embedded the next day via continuous delivery in hundreds of projects on the web.

A security researcher would hopefully find the malware in a few days, but by then the malware has compromised thousands of machines.

By taking a library and putting it in version control you protect yourself by giving the internet time to test the libraries.  You may hear that your package was corrupted, and you have time to fix it.  After a few weeks, the packages are probably fine.

2. NuGet will go away someday.  It won't be tomorrow, but eventually a better technology will replace it and if you try to build your system in 10 years you may be out of luck. I know this is true and wrote about it extensively on my GeoCities page.

3. A package author may remove her code from the NuGet feed the night before you compile your next release.  Sure you can get a backup from somewhere, but it takes time.

4. NuGet.org goes offline.  NuGet.org is not guaranteed to always be up.  I've been working from home late at night and couldn't compile because NuGet.org was offline.  Eventually I contacted a coworker who emailed me the missing libraries, but is that any way to develop software?

5. Late at night, your build server may lose internet connectivity and your six-hour build fails.

6. Reading libraries from a disk is faster than pulling them from the internet.

7. Dynamically downloading dependencies complicates the build.  You have to remember to do a "dotnet restore -v Minimal".  OK, it's not much more complicated, but it's an extra step (see the sketch below).
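
For what it's worth, here's the difference as a hypothetical build script (a sketch; the exact commands depend on your setup):

dotnet restore -v Minimal   # the extra step: download dependencies before building
dotnet build                # with checked-in dependencies, this is the whole build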

In conclusion, spend an extra $100 on a terabyte of disk space and check in all your dependencies.  Your co-workers in 10 years will thank you.

Monday, July 04, 2016

How Not to Write User Documentation

One of my pet peeves is user documentation that is written backwards in time.  For example, I received this today:

"Enable and Set Automatic updates to "Automatically Download and Install them" in Tools >  Options section."

Although this is a trivial example, in my mind I have to push the instructions onto a "stack" and then pop them off as I go.  This would have been easier:
"Run XXX.  Select "Tools > Options".  Then set "Automatic updates" to "Automatically Download and Install them"."

If you are writing user instructions, please write them chronologically in order of what your tender user would do, not what you just did.

Tuesday, June 21, 2016

Setting Up a New Windows 7 Box

Here's my list of things to do with a new Windows 7 box in 2016: 
 
Set Preferences
-----------
Right-click RecycleBin, select "Properties" and uncheck "Display delete confirmation dialog" box
In the taskbar, right-click, select Properties and set "Taskbar buttons:" to "Always Combine"
Add environmental variable "HOME" and set to C:\home\mfincher so emacs can find me
Add C:\home\mfincher\bin to environmental variable "Path"
Create C:\tmp so ange-ftp can work.
Turn off error reporting, "My Computer"/Properties/Advanced system settings/Advanced/Startup and Recovery/Settings/System failure, set "Write debugging information" to "(none)"


Install Utility Programs
------------------------
Install Emacs and aspell (https://www.emacswiki.org/emacs/AspellWindows)
Install ssh
Install SoapUI from https://www.soapui.org/downloads/latest-release.html
Install Chrome and Firefox, set up sync in Firefox to get old bookmarks and addins
Install Launchy from https://sourceforge.net/projects/launchy/files/
Install AutoHotKey from https://autohotkey.com/download/
Install ConEmu from https://sourceforge.net/projects/conemu/
Install Postman inside Chrome
Install Notepad++ from https://notepad-plus-plus.org
Install Mouse drivers
Install Visual Studio
Install pdf reader sumatra - http://blog.kowalczyk.info/software/sumatrapdf/download.html
Install Paint.Net http://www.getpaint.net/download.html
Install Mezer Tools https://www.bayden.com/mezer/ Windows-s for capture Windows-c for calipers 
Install 7-zip for zip/tar files from http://www.7-zip.org/
Install Process Explorer  http://technet.microsoft.com/en-us/sysinternals
Install cygwin, add c:\cygwin64\bin to "Path" environmental variable so emacs can see "gzip"
        or unxutils from https://sourceforge.net/projects/unxutils/

Install Microsoft Programs
------------------------
Set PowerShell to allow unsigned local scripts: run "Set-ExecutionPolicy RemoteSigned" from an elevated prompt
Install SQL Server 
Install Visual Studio and ReSharper
 
Copy Files From Old Computer
----------------------------
Copy My Documents, My Pictures, c:\home, c:\opt
Copy over all the old databases
Copy over inetpub directories
Copy over old Microsoft Mail archives, *.pst files
Copy root directory of Perforce files, C:\workfiles
Copy the old %AppData%\Launchy\Launchy.ini config file over to the new machine 
Copy Themes from %userprofile%\AppData\Local\Microsoft\Windows\Themes to C:\Users\fincherm\AppData\Local\Microsoft\Windows\Themes
 

Monday, June 06, 2016

Teaching My Nine Year Old Daughter to Program with LightBot.com

I introduced my nine-year-old daughter to lightbot.com to show her how to program.  LightBot.com is a wonderful site where you can download a little game to a device (she has a $50 Android tablet) and learn to program a little robot.

The game is amazing at teaching programming.  After learning the basics, the site introduces kids to the concept of functions and recursion (although it's called 'loops').  Kids think they are just playing a game, when in reality they are learning some basics of programming.

Friday, May 13, 2016

Why Kanban Doesn't Have Sprints. Agile Austin Lunch May 13, 2016

Thirty Agile Austiners attended a free lunch at Kasasa to discuss the value of time-boxing a sprint.


What are the benefits of a timebox?
Predictability / Focus Priorities / Cadence / Estimation / well documented, it comes in a box

Problems with Timebox:
1.  If testing takes four days at the end of a sprint, what are the developers doing those last four days?
The timebox encourages cross-functional team members: developers who test the last few days.  Is this the best use of their time?
2.  No room to do important bug fixes or build infrastructure.  Too much noise to estimate accurately.
3.  Have to do lots of padding for noise.
4.  Poor quality, and no refactoring, results from cramming something in at the last minute.
5.  A timebox has a negative effect on morale, as a sprint is considered a failure if only 9 of 10 items were done.  When this goes on sprint after sprint, morale suffers.  Then developers start to pad estimates, which then start to look excessive.

Notes:
Kanban - one piece flow - watch a single work item carefully to see blockers
Kanban is flow based, Scrum is iteration based.