
I'm not sure what modules or tools you are talking about, but if you use SQL Server Integration Services (formerly Data Transformation Services), you basically have a data processing pipeline which supports pretty much every transformation on the planet.

And obviously it supports arbitrary text-encodings, although sometimes you will need to be explicit about it.

If you used the simplified wizards, all the options may not have been there, but you should have been given the option to export/save the job as a package, and then you can open, modify, test and debug that before running the job for real.

Seriously. SQL Server has some immensely kick-ass and über-capable tooling compared to pretty much every other database out there.

To even suggest it doesn't support UTF8 is ludicrous.



He's probably talking about "bcp" which indeed doesn't support utf-8.

So why would someone even use bcp instead of SSIS? SSIS might be nice for performing repeated imports of data that has a fixed format, but for quick and dirty exports/imports it's really frustrating to use. It's not even smart enough to scan an entire data file and suggest appropriate field lengths and formats. Every single time I try to import a .csv file it craps out and doesn't even show where the error occurred, and that's after clicking through a bunch of steps in a GUI. At least with bcp you can easily rerun the import/export from the command line.

SQL Server and SQL Server Management Studio are generally great, but I would not include SSIS when lauding them.
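
For illustration, the kind of full-file scan described above (read every row, then suggest a width per column) is only a few lines of script. This is a minimal sketch in Python, not anything SSIS or bcp provides; the function name and sample data are made up, and lengths are counted in Python characters, which may not match how a given column type counts length.

```python
import csv
import io

def suggest_field_lengths(csv_text):
    """Scan every row and return {column name: longest value seen, in characters}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    widths = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name] or ""  # short rows yield None for missing fields
            widths[name] = max(widths[name], len(value))
    return widths

# Hypothetical sample data, only to show the shape of the result.
sample = "id,name,city\n1,Ann,Oslo\n2,Bartholomew,Reykjavik\n"
print(suggest_field_lengths(sample))
```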


If SQL Server supports UTF8, Microsoft manages to hide that fact well. http://technet.microsoft.com/en-us/library/ms176089.aspx:

char [ ( n ) ] Fixed-length, non-Unicode string data.

http://technet.microsoft.com/en-us/library/ms186939.aspx

Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

So, (var)char is "non-Unicode", and n(var)char is UCS-2 only.
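
To see why "UCS-2 only" matters, here is a small Python sketch (the helper name is made up) that counts 16-bit code units per character. Characters in the Basic Multilingual Plane fit in one unit; anything above U+FFFF needs a surrogate pair in UTF-16, which plain UCS-2 has no way to express as a single character. It says nothing about how SQL Server itself treats such pairs.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u00e9"                 # e with acute accent, inside the BMP
supplementary_char = "\U0001D11E"   # musical G clef, outside the BMP

print(utf16_code_units(bmp_char))            # one code unit
print(utf16_code_units(supplementary_char))  # two code units (a surrogate pair)
```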

That is in agreement with http://blogs.msdn.com/b/qingsongyao/archive/2009/04/10/sql-s..., which claims the glass is half full ("In summary, SQL Server DOES support storing all Unicode characters; although it has its own limitation.")

On the other hand, we have http://msdn.microsoft.com/en-us/library/ms143726.aspx that seems to state that SQL Server 2012 has proper Unicode collations. UTF8 still is nowhere to be found, though.


To be fair, the format in which data is stored in the DB and the format used for importing data are two entirely different things.

If you want to treat data as a stream of bytes hardcore UTF8 & PHPesque style (this function is "binary safe" woo) with no regard to the actual text involved, feel free to store it as bytes. SQL Server supports that.

If you want to store it as Unicode text feel free to use the ntext and nvarchar types. I'm pretty sure that's what you intend to do anyway, even though you insist on calling it UTF8.


I'm not the original complainer about UTF8 support, but "If you want to store it as Unicode text feel free to use the ntext and nvarchar types." comes at a price: for the oh-so-common almost-ASCII text collections, it blows up your disk usage and I/O bandwidth for actual data by a factor of almost 2. For shortish fields, the difference probably doesn't matter much, but if you store, say, web pages or blog posts, it can add up.
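
The "factor of almost 2" is easy to check outside the database. A rough Python sketch, with a made-up sample string and helper name, comparing raw encoded sizes only; it ignores row overhead, compression, and anything else the storage engine does.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the same text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Almost-ASCII text: one accented character among plain ASCII markup and words.
almost_ascii = "<p>Caf\u00e9 menu: coffee, tea, cake</p>"

utf8_bytes, utf16_bytes = encoded_sizes(almost_ascii)
ratio = utf16_bytes / utf8_bytes
print(utf8_bytes, utf16_bytes, round(ratio, 2))
```

Each ASCII character costs one byte in UTF-8 and two in UTF-16, so the ratio approaches 2 as the share of ASCII grows.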


The various command-line and text-friendly bulk/bcp tools are the pain-point here.

SSIS is insanely powerful and performant, but it's also insanely cumbersome and script-unfriendly. Microsoft has finally started embracing the power of plain text and scriptable tools in their web-stack and .NET, but SSIS represents a holdover from their heavyweight GUI-and-wizard days.


Fair enough. I don't use those tools too often, so I tend to forget they're around as well.

That said, if you are doing repeatable jobs (and not just one-off imports) you can still create an SSIS package, and then run the package from your script using the package-runner and appropriate config-data.
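
A rough sketch of what that could look like from a script, assuming the package-runner meant here is the dtexec command-line utility and that it is on the PATH. The package and config file names are hypothetical, and only the /File and /ConfigFile options are used; check the dtexec documentation for your version before relying on this.

```python
import subprocess

def build_dtexec_command(package_path, config_path=None):
    """Build the argument list for running a saved .dtsx package with dtexec."""
    command = ["dtexec", "/File", package_path]
    if config_path is not None:
        command += ["/ConfigFile", config_path]
    return command

def run_package(package_path, config_path=None):
    """Run the package and return the process exit code (needs dtexec installed)."""
    completed = subprocess.run(build_dtexec_command(package_path, config_path))
    return completed.returncode

# Hypothetical file names; this only prints the command, it does not run it.
print(build_dtexec_command("import_orders.dtsx", "import_orders.dtsConfig"))
```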



