An experience importing a DTS file with SSIS

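A DBA in Xi'an gave me a data file that uses a comma as the column delimiter and a double quote (") as the text qualifier for string columns. It was probably exported by DTS. In SSIS I set the column delimiter and the qualifier, and specified which columns use the qualifier and which do not, but the import still failed. Looking at the data, I found that the failure occurs when a column value itself contains a " character. Such a quote is already written as two " characters, which is the VB escaping convention, yet for some reason SSIS still does not recognize it. In principle, as long as a " does not appear on its own, it can be treated as being inside the string, so there should be no ambiguity. Google did not turn up an answer right away, so I tried changing the double quotes to single quotes, and data in that form imported without problems.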

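So I set out to find a way to automate the replacement, and ended up spending most of a day just to finish this import... Toward the end it was mostly stubbornness: I refused to believe I couldn't get the replacement done...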

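In the end I solved it with a VBScript plus regular expressions. The first consideration is that a bare "" must not be replaced, because it may be an empty-string column. But I did not consider "" at the start or end of a line, because I applied the RegExp one line at a time; I was worried the machine could not cope if I applied it to the whole file.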

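Along the way I also wrote a VBScript that follows the " escaping logic directly: it uses a flag to track whether the current position is inside a string and processes the whole file sequentially. That approach was too slow, and I gave up after it had run for more than ten minutes.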
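The flag-based scan described above can be sketched as follows. This is a JavaScript illustration rather than the original VBScript, and it makes no claim about speed. The function name `replaceEscapedQuotes` is hypothetical, and replacing each escaped "" with a single quote (') is my reading of the workaround described earlier in the post.

```javascript
// Walk the text once, tracking whether we are inside a quoted string.
// Inside a string, a doubled quote ("") is an escaped quote: emit ' instead.
// Assumption: unquoted fields never contain a " character.
function replaceEscapedQuotes(text) {
  const out = [];
  let inString = false; // the "flag" from the post
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch !== '"') {
      out.push(ch);
    } else if (!inString) {
      inString = true;          // opening text qualifier
      out.push(ch);
    } else if (text[i + 1] === '"') {
      out.push("'");            // escaped quote inside the string
      i++;                      // skip the second quote of the pair
    } else {
      inString = false;         // closing text qualifier
      out.push(ch);
    }
  }
  return out.join("");
}
```

Because the flag decides whether a "" is an empty field or an escaped quote, no special cases for line starts, line ends, or neighboring commas are needed.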

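The regular expressions are nothing more than [^,]""[^,], plus ""\n and \n"". The last two were added later, and EditPlus is also convenient for doing the replacement. Inside the VBScript, when I wanted to replace everything in one pass rather than calling the replacement recursively (in a loop), I copied the text into another string, recorded all the matches, and applied the replacements to the copy. Only that way does the Matches collection stay valid after one match has been replaced. I also used a (?=) positive lookahead to make sure none of several consecutive " characters was missed. Later I went back to recursion anyway, because I found that the second " of a "" pair does not need to be replaced separately, and that after a replacement the character positions shift forward by one. I'll copy the exact regular expressions here for the record when I'm back at work the day after tomorrow. Even this was not thought through completely: a case like a"", which should have been replaced, was not, and I think I patched it in EditPlus with [^,]'",. There is also ,""[^,], plus the start-of-line and end-of-line cases.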
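The exact expressions were never recorded here, so the following is not a reconstruction of them. It is an alternative sketch in JavaScript that avoids the edge cases listed above by matching each quoted field as a whole and only then replacing "" inside it. The function name `replaceInQuotedFields` is hypothetical, and the sketch assumes that unquoted fields contain no " characters and that no field spans multiple lines.

```javascript
// Match one complete quoted field: an opening ", then any run of
// non-quote characters or escaped quote pairs (""), then a closing ".
// The callback rewrites only the field body, so empty fields ("")
// and quotes at the start or end of a line need no special handling.
function replaceInQuotedFields(line) {
  return line.replace(/"((?:[^"]|"")*)"/g, function (match, body) {
    return '"' + body.replace(/""/g, "'") + '"';
  });
}
```

For example, `replaceInQuotedFields('"a""",""')` leaves the empty second field alone and turns the first field into `"a'"`.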

In short, there is quite a lot to watch out for when using regular expressions; thankfully EditPlus supports them.

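VBScript is quite handy for writing small, simple programs, although its performance on large files is mediocre. SSIS's settings are fairly fine-grained, and if it weren't for its failure to recognize that this file's format is valid, it would be a nearly perfect tool. One more thing: since a collation can easily be changed at any time, just have SSIS import the file using the file's own code page and change the target table's collation to match SSIS, which saves a lot of time spent trying to match them up.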

 

JS tips

Getting the path of the current web application:

1. To get the path of the current page, use document.URL.substring(0, document.URL.lastIndexOf("/")).

2. To get the root path, use var path1 = document.URL.substring(0, document.URL.indexOf("/", document.URL.indexOf("/", 8) + 1));
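The two expressions can be wrapped in small helpers that take the URL as a parameter, which makes them easy to try outside a browser. The names `currentDir` and `appRoot` and the example URL are hypothetical. The sketch assumes the URL has at least one path segment below the host, and `appRoot` returns the host plus the first path segment, which is how I read "root path" here.

```javascript
// Everything up to (not including) the last "/" of the URL.
function currentDir(url) {
  return url.substring(0, url.lastIndexOf("/"));
}

// Host plus the first path segment.
// Searching from index 8 skips the slashes in "http://" or "https://".
function appRoot(url) {
  var hostEnd = url.indexOf("/", 8);             // "/" right after the host
  var segmentEnd = url.indexOf("/", hostEnd + 1); // "/" after the first segment
  return url.substring(0, segmentEnd);
}
```

In a page, these would be called as `currentDir(document.URL)` and `appRoot(document.URL)`.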

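If you are not sure whether a variable has been defined, especially when a js file is included by other pages and you cannot be sure which page is including it, you need to check whether the variable is defined: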

Using v != undefined is wrong, because it throws a ReferenceError if v was never declared. Use typeof(v) != "undefined" instead. Be careful.
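A small sketch of the difference, using a deliberately undeclared name (`someUndeclaredName`; both function names are hypothetical too):

```javascript
// typeof never throws, even for a name that was never declared.
function checkWithTypeof() {
  return typeof someUndeclaredName !== "undefined";
}

// A direct comparison has to read the variable first, which throws
// for an undeclared name; return the error's name to show that.
function checkWithComparison() {
  try {
    return someUndeclaredName != undefined;
  } catch (e) {
    return e.name;
  }
}
```

Here `checkWithTypeof()` returns false, while `checkWithComparison()` ends up in the catch branch and returns "ReferenceError".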

 

March 20, 2008

http://www.infoq.com/interviews/johanna-rothman-risk-games

The interview discusses several schedule games that should be avoided:

Split Focus:

But there are so many senior managers who because they multitask all the time, every day, that’s their job, they have forgotten or really don’t know that technical people need substantial time to really invest in the work that they are doing, they need to think. Sometimes thinking involves picking up a pen, sometimes thinking involves typing at the computer, sometimes it involves talking with other people, and sometimes people just sit back in their chair and think. And if they are constantly moving from one project to another there is no way. And that’s crazy. Sometimes it happens where managers assign you two projects at the same time, just so you can’t get bored. Very few of us actually get bored these days.

Bring me a rock:

you bring in your best schedule and then some senior manager says “Bring it in”. And so you go back with your team and figure it out. Of course the first thing people say is: “Let’s cut testing”, like that’s going to help you get the product done faster. It’s not, but people work on it. “We can parallelize this, we can cut this corner, we can do this,” none of that stuff ever works. And you bring in the date by two weeks. And you bring back the date to the senior manager and they say: “Not good enough”.

and the Solution:

or you can ask a bunch of questions about what is driving this project, there are context-free questions to ask, you can say: “What does ‘done’ mean? What are our release criteria? Let’s make sure we are both talking about the same thing”. You can move from time box to iterations so that you can get as much of the stuff done in priority order (preferably by value, not by risk), so that you can keep making progress as you go. And then when they say: “You are done”, you are actually done with stuff