Unicode and Oracle SQLcl…on Windows

thatjeffsmith SQL Developer 22 Comments

Tell Others About This Story:

I have an UMLAUT table. Pretty simple really…

  CREATE TABLE "HR"."UMLAUT" 
   (	"CHARACTERS" VARCHAR2(2)
   ) ;
REM INSERTING INTO UMLAUT
SET DEFINE OFF;
INSERT INTO UMLAUT (CHARACTERS) VALUES ('Ñ');
INSERT INTO UMLAUT (CHARACTERS) VALUES ('Ä');
INSERT INTO UMLAUT (CHARACTERS) VALUES ('Ñ');
COMMIT;

Looks great in SQLDev, but not so much in SQLcl…

Weird...

Weird…

Everything is UNICODE by default in Java applications. It’s converted to Unicode coming in and going out… [docs]

This is a HUGE advantage in the Java world. We don’t have to do anything special when it comes to supporting Unicode.

So what’s happening above?

Windows CMD by default isn’t setup to use a Unicode code page.

So yeah, that’s not going to work.

UNTIL you do this.

So what does 'chcp 65001' do?

So what does ‘chcp 65001’ do?

The CHCP command changes the code page for the CMD window. And the 65001 code page is for UTF-8.

But, you probably don’t want to have to do this every time you open a CMD window.

So, buy a Mac and use a proper BASH shell environment 😉

Or, you can follow the advice here on StackOverflow. You basically edit the Windows registry to start CMD out with the Unicode code page.

Related Posts

Tell Others About This Story:

Comments 22

  1. The current version of SQLcl (18.3.0.0) no longer includes a bin\sql.bat loader. The “-Dfile.encoding=UTF-8” argument can also be passed to Java by creating a Windows environment variable:

    JAVA_TOOL_OPTIONS=’-Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8′
    (the variable value should not have quotation marks)

    You’ll know you did it right because the first line shown when instantiating SQLcl will say “Picked up JAVA_TOOL_OPTIONS…”:

    H:\>sql.exe user/[email protected]:1521/sid
    Picked up JAVA_TOOL_OPTIONS: -Duser.language=en -Duser.region=US -Dfile.encoding=UTF-8

  2. Hi,
    We had a great time this past week trying to configure SQLcl in Windows 7.
    In the end the following 3 Items had to be set.

    1. Font in the properties of the SQLcl window has to be set to a true type font. (right-click the title at the top of the window to access)
    This can be set permanently by changing the defaults font for command prompt window (CMD) located in the windows\system32 folder and the one located in windows\sysWOW64.
    (not sure which of the 2 SQLcl pulls it’s font setting from but it’s one of them)

    2. Set the encoding in SQLcl.
    set encoding cp1252 in my case.
    An easy way to find the correct one is to run the ‘show encoding’ command in SQL Developer’s editor.

    3. Set to the Code Page for the HOST
    in sqlcl this can be done with the following command:
    HOST CHCP 1252

    1. thatjeffsmith Post
      Author
  3. Using 4.1.5.21.78 on Windows, the sql.bat script changes the codepage to 65001 (Unicode).

    However, it does not restore the previous codepage before exiting. This causes subsequent applications to not work as expected.

    1. Still not fixed in sqldeveloper-17.4.1.054.0712-no-jre.

      If sql.bat is going to change the codepage, then sql.bat must reset it to the codepage the user was using before sql.bat started.

  4. Hi Jeff,

    This is an old thread, but I have some interesting findings with 4.2.0.16.355.0402 on Windows 10.
    – This is not yet mentioned in this thread : you should switch your console font from “Raster Font” to a truetype font, e.g. “Consolas”. Your link to chcp explains this.
    – The chcp 65001 is already set in sql.bat, and makes sure the output in sqlcl is always ok, whatever is the value in “show/set encoding”, except when you do “set sqlformat ansiconsole”. Then it is wrong and you need to add ” -Dfile.encoding=UTF-8″ to STD_ARGS in bin\sql.bat to get it right. Setting encoding to UTF-8 is not enough. When you add the file.encoding, then the “show encoding” will default to UTF-8.
    – Since the bin\sql.bat changes the chcp permanent, we have to reset it to original settings afterward, or you have to change NLS_LANG to use AL32UTF8 or UTF8 ( did not see any differences ). But you have to be aware that all input and output files will have to be in UTF-8 encoding.
    – Using cmd.exe or powershell with codepage 65001 has some strange results which scare me away : sqlplus prints fine, except when a strange character is on the first column of the line, then it is not OK. If you spool to a file and display the contents with “type”, then it is fine. It might be something in sqlplus 11.2
    – If you test with non-printable characters, you will see different things depending on the TTF font choosen, e.g. the separators:
    select ‘FS=’||chr(28)||’,GS=’||chr(29)||’,RS=’||chr(30)||’,US=’||chr(31)||’;’ from dual;

    Regards,
    Dirk

  5. Reading your post, I was hoping that this time around, we can actually use sqlcl in production. Not so. Our database has EE8MSWIN1250 code page. Our clients are Windows 7 and 10 (CP1250).
    Our result of changing cmd to chcp 65001 and the latest sqlcl is this:

    On Windows 7
    ~~~~~~~~~~~~~
    – sqlcl will display extra charaters. We enter 10 umlauts, we’ll get extra output, like this:

    ŠšČčĐđĆ掞
    Ć掞
    ž

    Only the first line should display.
    – real show stopper is that we can’t write *any* umlauts on command line of sqlcl if chcp is set to 65001 (instead of umlaut, we see square)

    Windows 10
    ~~~~~~~~~~~~~
    – sqlcl will display umlauts correctly if we set chcp to 65001 (no extra characters)
    – the problem of entering umlats on command line of sqlcl is the same as on Windows 7. Show stopper.

    The end result is that sqlcl is (still) completely unusable on Windows 7/10 in real life. At least for us, using 8-bit DB (EE8MSWIN1250) on backend and Win7/10 clients with Eastern Europe (CP1250) codepage.

    That said, I found your recommendation (link tothe stackoverflow) of tweaking cmd globally in registry very dangerous. chcp 65001 makes cmd.exe (and powershell as well) broken. And what about all the other uses of cmd (sqlldr, sqlplus, etc.).

    Linux & Macs are the only real alternative, this certainly works for me, but all our developers use Windows 7 or 10 and sqlcl is unfortunately of no use for them :-(. Pity.

    Regards,
    AlesK

    I

    1. thatjeffsmith Post
      Author

      Maybe you need a different code age. Win 10 working and Win 7 not working seems to be the problem isn’t with SQLcl, but with your setup, somewhere.

      How does a UTF-8 code page break cmd.exe? As always, test and verify. If it works, then change the registry to default to that. If it doesn’t work, then don’t do it.

      Did you check the fonts as well? That plays a big piece in the equation, you have to pick a font that supports the character you’re working with.

      Can you give me a table with a set of inserts I can test against?

      SQLcl isn’t ready for production yet, it’s still an Early Adopter. That’s about to change though.

    2. Hi Jeff,

      Win 10 works 50% :), select’s works, sort of, we can’t use umlauts nowhere in the WHERE condition, which is a deal breaker. But the output at least doesn’t reproduce extra characters.

      > How does a UTF-8 code page break cmd.exe?

      Sorry, bad wording. Setting UTF-8 in registra makes cmd.exe unusable for other work. Let’s say that I want to insert some line with umlauts in SQL*Plus (setting NLS_LANG to SLOVENIAN_SLOVENIA.UTF8 beforehand). The result immediate sqlplus crash (11.2.0.4). No one can use SQL*Plus to run existing scripts any more….well, that’s why I consider cmd.exe being “broken” if we set to UTF-8.

      You can use table you posted, just insert:

      insert into UMLAUT values(‘Šš’);
      insert into UMLAUT values(‘Čč’);
      insert into UMLAUT values(‘Žž’);

      Of course, if you wish to reproduce my real environment, then you’ll need at least db with EE8MSWIN1250 characterset — if that’s relevant at all.
      Regarding fonts, I’m aware of them, they were suspect #1, so I tried every one of them without success. And it’s not just my machine, we tested this on 8 of them, from Lenovo Windows default installations that shipped to us with factory default installations, to our Windows Enterprise installment.

      I’ll try my luck with ConEmu.

      Regards,
      Ales

    3. thatjeffsmith Post
      Author
    4. Hi Jeff,

      some good and bad news about sqlcl and UTF8 on Windows.
      I think you’ll have to patch official launcher (sql.bat) file, adding parameter -Dfile_encoding=UTF-8, my line looks like:

      java -Dfile.encoding=UTF-8 %STD_ARGS% %DEBUG% -cp “%CPFILE%” oracle.dbtools.raptor.scriptrunner.cmdline.SqlCli %*

      That’ll partly solve the issue I was complaining above. Instead of seeing square instead of umlaut, we see proper umlaut followed by the square. Better, but still weird. No matter what we did, cmd.exe and powershell behaves like this.

      The only workable solution at this point is to use ConEmu instead of cmd.exe (of course, java -Dfile_encoding=UTF-8….is still mandatory in sql.bat).

      HTH,
      Ales

    5. Forgot to mention that ConEmu is mandatory for Windows 7 clients (99% of my “clients”), on Windows 10, cmd.exe works as it should if -Dfile.encoding=UTF-8 swith is added to sql.bat.

      Regards,
      Ales

    6. thatjeffsmith Post
      Author

      we added in latest SQLcl…

      I am HR on orcl > help set encodi
      SET ENCODING
      SET ENCODING { UTF8,GBK,…}

      I am HR on orcl > show encodings
      current Encoding:Cp1252
      List of available encodings:
      Big5
      Big5-HKSCS
      CESU-8
      Cp1252
      EUC-JP
      EUC-KR
      GB18030
      GB2312
      GBK
      IBM-Thai
      IBM00858
      IBM01140
      IBM01141
      IBM01142
      IBM01143
      IBM01144
      IBM01145
      IBM01146
      IBM01147
      IBM01148
      IBM01149
      IBM037
      IBM1026
      IBM1047
      IBM273
      IBM277
      IBM278
      IBM280
      IBM284
      IBM285
      IBM290

    7. Thanks Jeff, but the SET ENCODING does not work for us.

      On Windows 7:
      the only workaround is to replace the line in sql.bat

      SET STD_ARGS=-Djava.awt.headless=true -Xss10M
      with
      SET STD_ARGS=-Djava.awt.headless=true -Xss10M -Dfile.encoding=UTF-8

      and use ConEmu instead of cmd.exe/powershell.ece.

      On Windows 10: the change in sql.bat is enough, you don’t need ConEmu.

      You can see the complete example here:
      https://dbaportal.eu/2016/11/17/unicode-and-oracle-sqlclon-windows-solved/

      Regards,
      Ales

  6. you don’t need unicode. it should be fine with
    C:> set NLS_LANG=american_america.we8pc850

    SQL> select distinct CLIENT_CHARSET from v$session_connect_info where sid in (select sid from v$mystat);

    CLIENT_CHARSET
    —————————————-
    WE8PC850
    SQL> select * from umlaut;

    CH

    Ñ
    Ä
    Ñ

    1. thatjeffsmith Post
      Author
    1. thatjeffsmith Post
      Author
    2. thatjeffsmith Post
      Author

Leave a Reply

Your email address will not be published. Required fields are marked *